[API analysis] How to call the Microsoft Edge browser Read Aloud feature

1. Sources

  • GitHub: MsEdgeTTS, edge-TTS-record
  • 吾爱破解 forum thread: 微软语音助手免费版 (a free Microsoft voice assistant with multiple features, claimed as a first release)

2. Preparation

  • Feature under study: the Edge browser
  • Packet capture: Fiddler
  • Request replay: Postman

3. Analysis steps

  • Step 1: work out how Edge's Read Aloud feature can be invoked from JavaScript. Fiddler captured no traffic for these calls.
const voices = speechSynthesis.getVoices()

function speakbyvoice(text, voice) {
  var utter = new SpeechSynthesisUtterance(text)
  for (let v of voices) {
    if (v.name.includes(voice)) {
      utter.voice = v
      break
    }
  }
  speechSynthesis.speak(utter)
  return utter
}

speakbyvoice("hello world", "Xiaoxiao")
  • Step 2: capture traffic from edge-TTS-record instead. This caught one HTTP request and one WebSocket connection. Comparing them against the MsEdgeTTS source code shows:
/*
 * Reproduced successfully in Postman.
 * Fetches the list of available voices; equivalent to speechSynthesis.getVoices().
 * http url: =6A5AA1D4EAFF4E9FB37E23D68491D6F4
 * method: GET
 */
{
  uri: "",
  query: { trustedclienttoken: "6A5AA1D4EAFF4E9FB37E23D68491D6F4" },
  method: "GET"
}

/*
 * Reproduced successfully in Postman.
 * Opens a wss connection carrying the text and the audio data; equivalent to speechSynthesis.speak(utter).
 * wss url: wss://speech.platform.bing/consumer/speech/synthesize/readaloud/edge/v1?TrustedClientToken=
 * send: two messages - first the desired audio format, then the SSML text
 *       (generate a random request id by taking a GUID and removing the "-" separators)
 * receive: the webm audio bytes sit in the body of messages carrying the same request id;
 *          use "Path:audio\r\n" to locate where the body starts
 * Known issues:
 *   1. In the first (audio format) message, only webm-24khz-16bit-mono-opus connects successfully;
 *      any other format causes an immediate disconnect.
 *   2. The SSML message does not support the mstts namespace - this is a cut-down version of the
 *      Azure speech service, so markup such as xmlns:mstts="****", <mstts:express-as/>, <p/> and <s/> is rejected.
 */
{
  uri: "",
  query: { trustedclienttoken: "6A5AA1D4EAFF4E9FB37E23D68491D6F4" },
  sendmessage: {
    audioformat: `
X-Timestamp:Mon Jul 11 2022 17:50:42 GMT+0800 (China Standard Time)
Content-Type:application/json; charset=utf-8
Path:speech.config

{"context":{"synthesis":{"audio":{"metadataoptions":{"sentenceBoundaryEnabled":"false","wordBoundaryEnabled":"true"},"outputFormat":"webm-24khz-16bit-mono-opus"}}}}`,
    ssml: `
X-RequestId:7e956ecf481439a86eb1beec26b4db5a
Content-Type:application/ssml+xml
X-Timestamp:Mon Jul 11 2022 17:50:42 GMT+0800 (China Standard Time)Z
Path:ssml

<speak version='1.0' xmlns='' xml:lang='en-US'><voice name='Microsoft Server Speech Text to Speech Voice (zh-CN, XiaoxiaoNeural)'><prosody pitch='+0Hz' rate='+0%' volume='+0%'>hello world</prosody></voice></speak>`
  }
}

4. Writing the code

  • WebSocket library: WebSocketSharp. If the latest version fails to install, drop back to an older one; the latest prerelease at the time of writing is 1.0.3-rc11.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using WebSocketSharp; // NuGet package: WebSocketSharp (author: sta; version used here: 1.0.3-rc10)

namespace ConsoleTest
{
    internal class Program
    {
        static string ConvertToAudioFormatWebSocketString(string outputformat)
        {
            return "Content-Type:application/json; charset=utf-8\r\nPath:speech.config\r\n\r\n{\"context\":{\"synthesis\":{\"audio\":{\"metadataoptions\":{\"sentenceBoundaryEnabled\":\"false\",\"wordBoundaryEnabled\":\"false\"},\"outputFormat\":\"" + outputformat + "\"}}}}";
        }

        static string ConvertToSsmlText(string lang, string voice, string text)
        {
            return $"<speak version='1.0' xmlns='' xmlns:mstts='' xml:lang='{lang}'><voice name='{voice}'>{text}</voice></speak>";
        }

        static string ConvertToSsmlWebSocketString(string requestId, string lang, string voice, string msg)
        {
            return $"X-RequestId:{requestId}\r\nContent-Type:application/ssml+xml\r\nPath:ssml\r\n\r\n{ConvertToSsmlText(lang, voice, msg)}";
        }

        static void Main(string[] args)
        {
            var url = "wss://speech.platform.bing/consumer/speech/synthesize/readaloud/edge/v1?trustedclienttoken=6A5AA1D4EAFF4E9FB37E23D68491D6F4";
            var Language = "en-US";
            var Voice = "Microsoft Server Speech Text to Speech Voice (zh-CN, XiaoxiaoNeural)";
            var audioOutputFormat = "webm-24khz-16bit-mono-opus";
            var binary_delim = "Path:audio\r\n";
            var msg = "Hello world";
            var sendRequestId = Guid.NewGuid().ToString().Replace("-", "");
            var dataBuffers = new Dictionary<string, List<byte>>();

            var webSocket = new WebSocket(url);
            webSocket.SslConfiguration.ServerCertificateValidationCallback = (sender, certificate, chain, sslPolicyErrors) => true;
            webSocket.OnOpen += (sender, e) => Console.WriteLine("[Log] WebSocket Open");
            webSocket.OnClose += (sender, e) => Console.WriteLine("[Log] WebSocket Close");
            webSocket.OnError += (sender, e) => Console.WriteLine("[Error] error message: " + e.Message);
            webSocket.OnMessage += (sender, e) =>
            {
                if (e.IsText)
                {
                    var data = e.Data;
                    var requestId = Regex.Match(data, @"X-RequestId:(?<requestId>.*?)\r\n").Groups["requestId"].Value;
                    if (data.Contains("Path:turn.start"))
                    {
                        // Start-of-turn signal; nothing to do.
                    }
                    else if (data.Contains("Path:turn.end"))
                    {
                        // End-of-turn signal; safe to close the socket here.
                        // dataBuffers[requestId] = null;
                        // Do not copy the line above from MsEdgeTTS: after the audio has been sent,
                        // one final text message still arrives to mark the end of the audio.
                        webSocket.Close();
                    }
                    else if (data.Contains("Path:response"))
                    {
                        // Context response; nothing to do.
                    }
                    else
                    {
                        Console.WriteLine("unknown message: " + data); // should normally never happen
                    }
                }
                else if (e.IsBinary)
                {
                    var data = e.RawData;
                    var requestId = Regex.Match(e.Data, @"X-RequestId:(?<requestId>.*?)\r\n").Groups["requestId"].Value;
                    if (!dataBuffers.ContainsKey(requestId))
                        dataBuffers[requestId] = new List<byte>();
                    if (data[0] == 0x00 && data[1] == 0x67 && data[2] == 0x58)
                    {
                        // Last (empty) audio fragment - marks the end of the audio stream.
                    }
                    else
                    {
                        var index = e.Data.IndexOf(binary_delim) + binary_delim.Length;
                        dataBuffers[requestId].AddRange(data.Skip(index));
                    }
                }
            };

            webSocket.Connect();
            var audioconfig = ConvertToAudioFormatWebSocketString(audioOutputFormat);
            webSocket.Send(audioconfig);
            webSocket.Send(ConvertToSsmlWebSocketString(sendRequestId, Language, Voice, msg));
            while (webSocket.IsAlive) { } // busy-wait until the server closes the connection
            Console.WriteLine("Received audio byte length: " + dataBuffers[sendRequestId].Count);
            Console.ReadKey(true);
        }
    }
}
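The per-frame header splitting that the OnMessage handler performs can also be sketched on its own: find the "Path:audio\r\n" delimiter in a binary frame and keep only the bytes after it. A minimal Node.js sketch (the `extractAudio` helper is hypothetical, for illustration only):

```javascript
// Everything after the "Path:audio\r\n" header line is audio payload.
const DELIM = Buffer.from("Path:audio\r\n");

function extractAudio(frame) {
  const i = frame.indexOf(DELIM);
  if (i === -1) return null; // this frame carries no audio body
  return frame.subarray(i + DELIM.length);
}

// Example: a fabricated frame with text headers followed by 4 audio bytes.
const frame = Buffer.concat([
  Buffer.from("X-RequestId:abc123\r\nContent-Type:audio/webm\r\nPath:audio\r\n"),
  Buffer.from([0x1a, 0x45, 0xdf, 0xa3]),
]);
console.log(extractAudio(frame)); // the 4 audio bytes
```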

5. Conclusion

The simulated WebSocket request works. The drawback, as the Postman tests showed, is that the audio outputFormat parameter only works as webm-24khz-16bit-mono-opus, so a library such as ffmpeg is still needed to convert the audio into other formats. I have not yet found a convenient library for that, so I am recording my progress here for now.
