HTML5 Speech Synthesis API - Only able to use the "native" voice - javascript

I have a web application that makes use of the HTML5 speech synthesis API and it works - but only with the native voice. Here's my code:
var msg = new SpeechSynthesisUtterance();
var voices;

window.speechSynthesis.onvoiceschanged = function() {
    voices = window.speechSynthesis.getVoices();
};

$("#btnRead").click(function() {
    speak();
});

function speak() {
    msg = new SpeechSynthesisUtterance();
    msg.rate = 0.8;
    msg.text = $("#contentView").html();
    msg.voice = voices[10];
    msg.lang = 'en-GB';
    window.speechSynthesis.speak(msg);
}
voices[10] is the only voice that works, and when I log it to the console I can see that it's the native voice, which seems to suggest that the other voices aren't being loaded properly. Yet they all still appear in the voices array when I log it to the console.
Anyone have any ideas? I'm sure I'm probably missing something relatively simple but I've been wrestling with this for a bit now! I'm using Google Chrome version 42.0.2311.90 which should support the speech synthesis API as far as I can tell.

I've only just started playing with speechSynthesis, so I haven't spent much time on it, but I stumbled on your question and I believe the answer is that the voice you select does not support the language you give it, so you get a fallback to the native voice.
If you read the docs and select the voice by name, it works (at least on my PC):
https://developers.google.com/web/updates/2014/01/Web-apps-that-talk-Introduction-to-the-Speech-Synthesis-API?hl=en
var msg = new SpeechSynthesisUtterance('Awesome!');
msg.voice = speechSynthesis.getVoices().filter(function(voice) {
    return voice.name == 'Google UK English Male';
})[0];
// now say it like you mean it:
speechSynthesis.speak(msg);
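Note also that Chrome populates the voice list asynchronously, so getVoices() can return an empty array if it's called too early. A minimal sketch (assuming the same 'Google UK English Male' voice is installed) that only picks the voice once voiceschanged has fired:
var spokenOnce = false;
window.speechSynthesis.onvoiceschanged = function() {
    if (spokenOnce) return; // voiceschanged can fire more than once
    var voices = window.speechSynthesis.getVoices();
    var msg = new SpeechSynthesisUtterance('Awesome!');
    // Pick the voice by name only after the list has actually been populated.
    msg.voice = voices.filter(function(voice) {
        return voice.name == 'Google UK English Male';
    })[0];
    window.speechSynthesis.speak(msg);
    spokenOnce = true;
};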
Hope this helps you or others searching for it.

Related

Edge browser Speech Synthesis stop working after call speak with some empty paragraph text using Xiaoxiao voice

I'm using the SpeechSynthesis APIs in the Microsoft Edge browser, but something has gone wrong.
Here is what I have so far (already minimized to the smallest case that reproduces the result).
I have the Chinese language pack installed on Windows, and the Edge browser also has some online voices available. You may need the same environment installed to make the following snippet work.
const speak = (p, voice) => {
    p.split('\n').forEach(text => {
        const ssu = new SpeechSynthesisUtterance(text);
        ssu.lang = 'zh-CN';
        ssu.voice = voice;
        speechSynthesis.speak(ssu);
        console.log(ssu);
    });
};

let voices = [];
const getVoice = voiceURI => {
    const voice = voices.find(voice => voice.voiceURI === voiceURI);
    return voice;
};

const huihuiURI = "Microsoft Huihui - Chinese (Simplified, PRC)";
const xiaoxiaoURI = "Microsoft Xiaoxiao Online (Natural) - Chinese (Mainland)";

const voicesChanged = () => {
    voices = speechSynthesis.getVoices();
    if (getVoice(huihuiURI) && getVoice(xiaoxiaoURI)) {
        mainarea.hidden = false;
    }
};
speechSynthesis.addEventListener('voiceschanged', voicesChanged);
voicesChanged();

huihui.addEventListener('click', () => {
    speak('第一段\n……\n第二段', getVoice(huihuiURI));
});
xiaoxiao.addEventListener('click', () => {
    speak('第一段\n……\n第二段', getVoice(xiaoxiaoURI));
});

<main id=mainarea hidden>
    <button id=huihui>Microsoft Huihui</button>
    <button id=xiaoxiao>Microsoft Xiaoxiao Online</button>
</main>
The script first waits for the two speech voices to become available, and then shows the two buttons. When a button is clicked, it tries to speak the text with the specified voice.
When I click the Huihui button, it works correctly. But when I try the Xiaoxiao voice, only the first paragraph is spoken. The Xiaoxiao voice refuses to speak the utterance that contains no words, and simply stops working instead of skipping it and continuing to the next one. I'm not sure why this happens. (You may need to reload / reopen the webpage to test the different buttons.)
In my project the text to speak will come from user input (out of my control), so I don't think I can strip empty paragraphs before sending them to the SpeechSynthesis APIs.
I want to know what's wrong here and how I can fix it, so that I can use the Xiaoxiao voice to speak the whole text.
In case it matters: I'm using Microsoft Edge Version 92.0.902.67 (Official build) (64-bit) on Microsoft Windows [Version 10.0.19043.1151].
I made some tests and found that the issue happens on some versions of Windows 10. On Windows 10 version 20H2, OS build 19042.630, it works well with both voices. But on Windows 10 version 1909, OS build 18363.1679, I can reproduce the same issue. The Edge version is the same on both machines, 92.0.902.67 (Official build) (64-bit), so I think the issue may be related to the OS build.
In the scenario where the Xiaoxiao voice stops working, I observed that it can't speak paragraphs that contain only symbols (for example a paragraph that is just "......"), and after that it stops speaking everything else. Based on this, I think the only workaround is not to speak the article paragraph by paragraph, but to speak the whole article at once.
In that case you don't need to split the text by \n, and you can edit the first part of the JavaScript like below. It can then speak the whole text with the Xiaoxiao voice:
const speak = (p, voice) => {
    const ssu = new SpeechSynthesisUtterance(p);
    ssu.lang = 'zh-CN';
    ssu.voice = voice;
    speechSynthesis.speak(ssu);
    console.log(ssu);
};

Why Can't I Choose Female "Microsoft Zira" Voice in SpeechSynthesisUtterance()?

My following HTML page correctly speaks the text, but it does not speak in the female voice Microsoft Zira Desktop - English (United States). Question: What may I be missing here, and how can we make it speak in a female voice?
Remark: I tried this HTML in MS Edge and Google Chrome multiple times, with and without refreshing the page, but it keeps speaking with the same male voice. It seems to be ignoring the msg.voice value in the JavaScript below. I am using Windows 10 - that probably should not matter.
<!DOCTYPE html>
<html>
<head>
    <script>
        function myFunction() {
            var msg = new SpeechSynthesisUtterance();
            msg.voice = speechSynthesis.getVoices().filter(function(voice) {
                return voice.name == "Microsoft Zira Desktop - English (United States)";
            })[0];
            msg.text = document.getElementById("testDiv").textContent;
            window.speechSynthesis.speak(msg);
        }
    </script>
</head>
<body>
    <h2>JavaScript in Head</h2>
    <div id="testDiv">SQL Managed Instance gives you an instance of SQL Server but removes much of the <b>overhead</b> of managing a <u>virtual machine</u>. Most of the features available in SQL Server are available in SQL Managed Instance. This option is ideal for customers who want to use instance-scoped features and want to move to Azure without rearchitecting their applications.</div>
    <button type="button" onclick="myFunction()">Try it</button>
</body>
</html>
UPDATE
Per a suggestion from user #Frazer, I ran speechSynthesis.getVoices() in my Google Chrome console, and the results do contain the Microsoft Zira ... voice.
Observation:
Following this advice, I moved the <script> block to the end of the body block (just before </body>), but I still get the same male voice. HOWEVER, when I replaced the voice Microsoft Zira Desktop - English (United States) with Google UK English Female, the following happens: on the first click of the Try it button the speaker is still the default male, but on every subsequent click I correctly get the Google UK English Female voice. Note: Microsoft Zira Desktop - English (United States) does nothing in the above scenario. This leads me to believe that this technology is still experimental - as mentioned here.
Why Does it Work for Some Browsers?
I have an answer to a similar question here, Why is the voiceschanged event fired on page load?, but I think your situation is sufficiently different to merit a new answer.
First, why does it only work sometimes? Because the voice list (including "Microsoft Zira Desktop - English (United States)") is populated asynchronously, and it is often not available yet by the time the next line executes. Basically, you should wait until onvoiceschanged has fired before actually calling getVoices() to get the voices.
To quote the docs...
With Chrome however, you have to wait for the event to fire before populating the list, hence the bottom if statement seen below. (Source: MDN Web Docs: SpeechSynthesis.onvoiceschanged) (Emphasis mine.)
If the list hasn't populated yet, the female voice isn't available, so the default male voice plays instead.
Because `getVoices()` Populates Its List Asynchronously, Treat It as Asynchronous
Try running your code like so...
var msg = new SpeechSynthesisUtterance();
var voices = window.speechSynthesis.getVoices();
window.speechSynthesis.onvoiceschanged = function() {
    voices = window.speechSynthesis.getVoices();
};

function myFunction() {
    console.log(voices);
    msg.voice = voices.filter(function(voice) {
        return voice.name == "Microsoft Zira - English (United States)";
    })[0];
    console.log(msg.voice);
    msg.text = document.getElementById("testDiv").textContent;
    window.speechSynthesis.speak(msg);
}
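If you prefer not to keep a module-level voices variable, here's a small promise-based variant of the same idea (my own sketch, not part of the original answer; the startsWith check is just an assumption to cover both the "Microsoft Zira" and "Microsoft Zira Desktop" names):
function loadVoices() {
    return new Promise(resolve => {
        const voices = window.speechSynthesis.getVoices();
        if (voices.length) {
            resolve(voices);
            return;
        }
        // Chrome fires voiceschanged once the list is actually available.
        window.speechSynthesis.addEventListener('voiceschanged', () => {
            resolve(window.speechSynthesis.getVoices());
        }, { once: true });
    });
}

function myFunction() {
    loadVoices().then(voices => {
        const msg = new SpeechSynthesisUtterance(document.getElementById("testDiv").textContent);
        msg.voice = voices.find(voice => voice.name.startsWith("Microsoft Zira"));
        window.speechSynthesis.speak(msg);
    });
}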
P.S. Here is my own coding example of how I handle the voices loading on a text-to-audio reader: GreenGluon CMS: text-audio.js Also here: PronounceThat.com pronounce-that.js
Add the disabled attribute to your button, then try this just before the </body> tag, with either the Zira or the Chrome voice.
speechSynthesis.onvoiceschanged = () => {
    voices = speechSynthesis.getVoices();
    if (voices.length) document.querySelector("button").disabled = false;
};

let voices = speechSynthesis.getVoices();

const myFunction = () => {
    const msg = new SpeechSynthesisUtterance();
    msg.voice = voices.filter(voice => {
        return voice.name === "Microsoft Zira Desktop - English (United States)";
    })[0];
    msg.text = document.getElementById("testDiv").textContent;
    speechSynthesis.speak(msg);
};

microsoft speech recognition + nodejs

Is the Node.js Cognitive Services Speech SDK still supported? I know how to do this with the browser-based SDK, but it looks like the Node.js version doesn't work: it doesn't capture any microphone input.
Notably, there are no published examples that use AudioConfig.fromDefaultMicrophoneInput for Node.js. The Node.js SDK works perfectly fine with AudioConfig.fromStreamInput.
Here's the relevant code:
var speechsdk = require("microsoft-cognitiveservices-speech-sdk");

var subscriptionKey = ";)";
var serviceRegion = "eastus"; // e.g., "westus"

const speech_Config = speechsdk.SpeechConfig.fromSubscription(subscriptionKey, serviceRegion, "en-US");
const audioConfig = speechsdk.AudioConfig.fromDefaultMicrophoneInput();

let speech_recognizer = new speechsdk.SpeechRecognizer(speech_Config, audioConfig);

speech_recognizer.recognizeOnceAsync(
    function (result) {
        console.log(result);
        speech_recognizer.close();
        speech_recognizer = undefined;
    },
    function (err) {
        console.trace("err - " + err);
        speech_recognizer.close();
        speech_recognizer = undefined;
    });
I get an error saying: window is not defined
npm: https://www.npmjs.com/package/microsoft-cognitiveservices-speech-sdk
For this error, Microsoft engineers have an explanation for it here.
It is due to the default microphone support using the Web Audio API to conjure a microphone stream. The Node environment doesn't support this.
As a workaround, for pure Node code you can use a file, or a push or pull stream, to get audio into the speech recognition engine.
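For example, a minimal sketch that pushes a WAV file into the recognizer through a push stream (the key, region and file name below are placeholders):
const fs = require("fs");
const sdk = require("microsoft-cognitiveservices-speech-sdk");

// Placeholders - use your real key and region.
const subscriptionKey = "<your-key>";
const serviceRegion = "eastus";

const speechConfig = sdk.SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
speechConfig.speechRecognitionLanguage = "en-US";

// Feed audio from a WAV file through a push stream instead of the microphone.
const pushStream = sdk.AudioInputStream.createPushStream();
fs.createReadStream("sample.wav")
    .on("data", chunk => pushStream.write(chunk.slice()))
    .on("end", () => pushStream.close());

const audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

recognizer.recognizeOnceAsync(result => {
    console.log(result.text);
    recognizer.close();
});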
Hope it helps : )

Does the function utils.device.checkHasPositionalTracking() still exist?

I'm getting an error in desktop Chrome that utils.device.checkHasPositionalTracking() is "not a function".
If it is obsolete, where can I find an updated list of utils.device methods for device detection? The official documentation seems to be outdated and lists deprecated methods. The browser doesn't seem to recognize this one in particular at all.
let mobile = AFRAME.utils.device.isMobile();

// isOculusGo and isGearVR have been replaced with isMobileVR
//let gearVR = AFRAME.utils.device.isGearVR();
//let oculusGo = AFRAME.utils.device.isOculusGo();
let mobileVR = AFRAME.utils.device.isMobileVR;

//let tracking = AFRAME.utils.device.checkHasPositionalTracking(); //not working
let headset = AFRAME.utils.device.checkHeadsetConnected();

if (mobile) {
    console.log("Viewer is mobile.");
}
if (mobileVR) {
    console.log("Viewer is MobileVR.");
}
/*if (tracking) {
    console.log("Viewer has positional tracking.");
}*/
if (headset) {
    console.log("Headset Connected.");
}
The previous code results in "Viewer is MobileVR" even though I'm testing on a desktop computer.
Yes, it's gone; I couldn't find it in the A-Frame source code either.
My pull request removing it from the documentation was approved just now:
https://github.com/aframevr/aframe/pull/4255
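If you still need something like the old check, a rough substitute (my own heuristic, not an official A-Frame API) is to combine the device helpers that do still exist:
// Heuristic only: assume a connected headset that is not a mobile/standalone
// (3DoF) viewer has positional tracking.
const hasHeadset = AFRAME.utils.device.checkHeadsetConnected();
const isMobileVR = AFRAME.utils.device.isMobileVR();

if (hasHeadset && !isMobileVR) {
    console.log("Viewer probably has positional tracking.");
}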

Is there a way to use the Javascript SpeechRecognition API with an audio file?

I want to use the SpeechRecognition api with an audio file (mp3, wave, etc.)
Is that possible?
The short answer is No.
The Web Speech API specification does not prohibit this (the browser could allow the end user to choose a file to use as input), but the audio input stream is never provided to the calling JavaScript code (in the current draft version), so you don't have any way to read or change the audio that is fed into the speech recognition service.
This specification was designed so that the JavaScript code only has access to the resulting text coming from the speech recognition service.
Basically you can use it only with the default audio input device, which is chosen at the OS level...
Therefore you just need to play your file into your default audio input.
Two options are possible:
Option 1:
Install https://www.vb-audio.com/Cable/
Update the system settings to use the VCable device as the default audio output and audio input
Play your file with any audio player you have
Recognize it... e.g. even using the standard demo UI https://www.google.com/intl/fr/chrome/demos/speech.html
I tested this today, and it works perfectly :-)
Option 2:
THIS IS NOT TESTED BY ME, so I cannot confirm that it works, but you may be able to feed an audio file into Chrome using Selenium... like this:
DesiredCapabilities capabilities = DesiredCapabilities.chrome();
ChromeOptions options = new ChromeOptions();
options.addArguments("--allow-file-access-from-files",
        "--use-fake-ui-for-media-stream",
        "--allow-file-access",
        "--use-file-for-fake-audio-capture=D:\\PATH\\TO\\WAV\\xxx.wav",
        "--use-fake-device-for-media-stream");
capabilities.setCapability(ChromeOptions.CAPABILITY, options);
ChromeDriver driver = new ChromeDriver(capabilities);
But I'm not sure whether this stream will replace the default audio input.
Andri deleted this post, but I will repost it as I believe it to be the most accurate answer, besides the hackish answers above:
According to MDN you CAN'T do that. You can't feed any stream into the recognition service.
That's a big problem... you cannot even select the microphone used by SpeechRecognition.
That is done on purpose; Google wants to sell their CLOUD SPEECH API.
You need to use services like the CLOUD SPEECH API instead.
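For reference, a minimal sketch (my own, not from the deleted post) of transcribing a local file with the Cloud Speech-to-Text Node.js client @google-cloud/speech; the file name, encoding and sample rate are placeholder assumptions:
const fs = require('fs');
const speech = require('@google-cloud/speech');

async function transcribe(path) {
    const client = new speech.SpeechClient();
    const [response] = await client.recognize({
        // LINEAR16 / 16 kHz are placeholders - match them to your actual file.
        config: { encoding: 'LINEAR16', sampleRateHertz: 16000, languageCode: 'en-US' },
        audio: { content: fs.readFileSync(path).toString('base64') },
    });
    return response.results.map(r => r.alternatives[0].transcript).join('\n');
}

transcribe('sample.wav').then(console.log).catch(console.error);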
You could probably just start the SpeechRecognition engine using the mic and play the audio file back through the speakers so it feeds into the mic. It worked for me when I tested it.
Yes, it is possible to get the text transcript of the playback of an audio file using webkitSpeechRecognition. The quality of the transcript depends upon the quality of the audio playback.
const recognition = new webkitSpeechRecognition();
const audio = new Audio();

recognition.continuous = true;
recognition.interimResults = true;

recognition.onresult = function(event) {
    if (event.results[0].isFinal) {
        // do stuff with `event.results[0][0].transcript`
        console.log(event.results[0][0].transcript);
        recognition.stop();
    }
};

recognition.onaudiostart = e => {
    console.log("audio capture started");
};

recognition.onaudioend = e => {
    console.log("audio capture ended");
};

audio.oncanplay = () => {
    recognition.start();
    audio.play();
};

audio.src = "/path/to/audio";
jsfiddle https://jsfiddle.net/guest271314/guvn1yq6/
