I'm playing around with HTML5 voice recognition.
Currently I have a function like this:
function doSomething() {
    listen("name");
    console.log("done");
}
The "listen" Function works currently like this:
recognition = new webkitSpeechRecognition();
recognition.lang = "de-DE";
recognition.continuous = false;
//recognition.interimResults = true;
recognition.onresult = function(event) {
    result = event.results[event.resultIndex];
    confidence = result[0].confidence;
    result = result[0].transcript.trim();
};
//TODO: remove old results, work with results
recognition.start();
What happens is that Chrome asks for microphone access and immediately runs the console.log.
What I want is for the console.log to wait until the speech recognition is done. Like this:
Chrome asks for mic access
User says something
Something is done with what the user said
The console.log and everything that follows is executed.
How can I do that?
Thank you!
JavaScript programming is event-driven. The code is not a sequence of statements to execute, but a description of events to handle and reactions to them.
If you want to perform some action once speech is recognized, you need to put it into an event handler, in your case:
recognition.onresult = function(event) {
    result = event.results[event.resultIndex];
    confidence = result[0].confidence;
    result = result[0].transcript.trim();
    console.log("done");
};
You can access outer variables inside the handler function and do more complex things.
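If you prefer to keep the sequential look of doSomething, another option is to wrap the recognition in a Promise and await it. This is only a sketch under the same assumptions as your code (webkitSpeechRecognition, German language); the "name" argument and the actual result handling are left out:

function listen() {
    return new Promise(function(resolve, reject) {
        var recognition = new webkitSpeechRecognition();
        recognition.lang = "de-DE";
        recognition.continuous = false;
        recognition.onresult = function(event) {
            // Resolve with the recognized transcript
            resolve(event.results[event.resultIndex][0].transcript.trim());
        };
        recognition.onerror = function(event) {
            reject(event.error);
        };
        recognition.start();
    });
}

async function doSomething() {
    var result = await listen();
    // Something is done with what the user said, then:
    console.log("done", result);
}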
There are many explanations of event-driven programming on the web, but the most complete one is Chapter 17, "Handling Events", of JavaScript: The Definitive Guide, 6th Edition.
Safari on iOS puts a scrubber on its lock screen for simple HTMLAudioElements. For example:
const a = new Audio();
a.src = 'https://example.com/audio.m4a';
a.play();
JSFiddle: https://jsfiddle.net/0seckLfd/
The lock screen will allow me to choose a position in the currently playing audio file.
How can I disable the ability for the user to scrub the file on the lock screen? The metadata showing is fine, and being able to pause/play is also acceptable, but I'm also fine with disabling it all if I need to.
DISABLE Player on lock screen completely
If you want to completely remove the lock screen player, you could do something like this:
const a = new Audio();

document.querySelector('button').addEventListener('click', (e) => {
    a.src = 'http://sprott.physics.wisc.edu/wop/sounds/Bicycle%20Race-Full.m4a';
    a.play();
});

document.addEventListener('visibilitychange', () => {
    if (document.hidden) a.src = undefined;
});
https://jsfiddle.net/5s8c9eL0/3/
That stops the player when changing tabs or locking the screen
(code to be cleaned up and improved depending on your needs)
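Note that assigning undefined actually sets src to the string "undefined"; if you just want to tear the element down, a slightly cleaner variant of the same idea (assuming you don't need to resume playback afterwards) would be:

document.addEventListener('visibilitychange', () => {
    if (document.hidden) {
        a.pause();
        a.removeAttribute('src'); // drop the source entirely
        a.load();                 // reset the element so there is nothing left to scrub
    }
});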
From my understanding, you can't block/hide the scrubbing commands unless you can tag the audio as a live stream. That being said, you can use JS to refuse scrubbing server-side; reference the answer here. Although that answer speaks of video, it also works with audio.
The lock screen / control center scrubber can also be avoided by using Web Audio API.
This is an example of preloading a sound and playing it, with commentary and error handling:
try {
    // An <audio> element is simpler for sound effects,
    // but on iOS/iPadOS it shows up in the Control Center, as if it's music you'd want to play/pause/etc.
    // Also, on subsequent plays, it only plays part of the sound.
    // And the Web Audio API is better for playing sound effects anyway because it can play a sound
    // overlapping with itself, without maintaining a pool of <audio> elements.
    window.audioContext = window.audioContext || new AudioContext(); // Interoperate with other things using the Web Audio API, assuming they use the same global & pattern.

    const audio_buffer_promise =
        fetch("audio/sound.wav")
            .then(response => response.arrayBuffer())
            .then(array_buffer => audioContext.decodeAudioData(array_buffer));

    var play_sound = async function () {
        audioContext.resume(); // in case it was not allowed to start until a user interaction
        // Note that this should be before waiting for the audio buffer,
        // so that it works the first time (it would no longer be "within a user gesture").
        // This only works if play_sound is called during a user gesture (at least once);
        // otherwise audioContext.resume() needs to be called externally.

        const audio_buffer = await audio_buffer_promise; // Promises can be awaited any number of times. This waits for the fetch the first time, and is instant the next time.
        // Note that if the fetch failed, it will not retry. One could instead rely on HTTP caching and just
        // fetch() each time, but that would be a little less efficient as it would need to decode the audio
        // file each time, so the best option might be custom caching with request error handling.

        const source = audioContext.createBufferSource();
        source.buffer = audio_buffer;
        source.connect(audioContext.destination);
        source.start();
    };
} catch (error) {
    console.log("AudioContext not supported", error);
    play_sound = function() {
        // no-op
        // console.log("SFX disabled because AudioContext setup failed.");
    };
}
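For completeness, a hypothetical way to hook this up (the #play-button id is just an example, not part of the code above):

// Call play_sound from a user gesture at least once so the AudioContext is allowed to start.
document.querySelector('#play-button').addEventListener('click', () => {
    play_sound();
});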
I did a search for a way to help you, but I did not find an effective way to disable the commands. However, I found a way to customize them, which may help you; follow the Apple tutorial link.
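For what it's worth, on the web side this kind of lock screen customization is usually done with the Media Session API (support varies by browser and OS version, and this is only a sketch, not necessarily what the Apple tutorial describes):

const audio = new Audio('https://example.com/audio.m4a');

if ('mediaSession' in navigator) {
    // Control what metadata shows up on the lock screen / control center.
    navigator.mediaSession.metadata = new MediaMetadata({
        title: 'Track title',
        artist: 'Artist name'
    });
    // Register only the actions you want to expose, e.g. play/pause.
    navigator.mediaSession.setActionHandler('play', () => audio.play());
    navigator.mediaSession.setActionHandler('pause', () => audio.pause());
}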
I think what's left to do now is wait and see if iOS 13 brings some option that does what you want.
I've been trying to think of some ideas for what I could make with JavaScript using the Web Audio API. I know that, depending on the user's browser, it sometimes won't let you play audio without a user gesture of some sort. I've been doing some research on how to do it, and the ways I found are pretty useful, but the problem is that different developers found different ways to do it. For example:
Using the audioContext.resume() and audioContext.suspend() methods to unlock web audio by changing its state:
function unlockAudioContext(context) {
    if (context.state !== "suspended") return;
    const b = document.body;
    const events = ["touchstart", "touchend", "mousedown", "keydown"];
    events.forEach(e => b.addEventListener(e, unlock, false));
    function unlock() { context.resume().then(clean); }
    function clean() { events.forEach(e => b.removeEventListener(e, unlock)); }
}
Creating an empty buffer and playing it to unlock web audio:
var unlocked = false;
var context = new (window.AudioContext || window.webkitAudioContext)();

function init(e) {
    if (unlocked) return;

    // create empty buffer and play it
    var buffer = context.createBuffer(1, 1, 22050);
    var source = context.createBufferSource();
    source.buffer = buffer;
    source.connect(context.destination);

    /*
    Phonograph.js uses this method to start it:
    source.start(context.currentTime);
    paulbakaus.com suggests using this method to start it:
    source.noteOn(0);
    */
    if (source.start) {
        source.start(context.currentTime);
    } else {
        source.noteOn(0);
    }

    setTimeout(function() {
        if (!unlocked) {
            if (source.playbackState === source.PLAYING_STATE || source.playbackState === source.FINISHED_STATE) {
                unlocked = true;
                window.removeEventListener("touchend", init, false);
            }
        }
    }, 0);
}

window.addEventListener("touchend", init, false);
I mostly know how both of these methods work, but my question is: what is going on here, what is the difference, and which method is better?
And can someone please explain this source.playbackState property of an AudioBufferSourceNode? I've never heard of that property before; it doesn't even have an article or get mentioned on the Mozilla MDN website.
Also, as a bonus question (which you don't have to answer): if both of these methods are useful, would it be possible to put them together as one, if you know what I mean?
Sorry if that is a lot to ask. Thanks :)
resources:
https://paulbakaus.com/tutorials/html5/web-audio-on-ios/
https://github.com/Rich-Harris/phonograph/blob/master/src/init.ts
https://www.mattmontag.com/web/unlock-web-audio-in-safari-for-ios-and-macos
Both methods work, but I find the first (resume context in a user gesture) to be cleaner. The AudioBufferSource method is a kind of gross hack for backward compatibility with old sites that started playing buffers in a user gesture. This method doesn't work if you don't start the buffer from a gesture. (I think.)
Which one you want to use is up to you.
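For example, with the first approach a typical setup looks like this (a hypothetical sketch; the context stays suspended until the first gesture, after which you can play whenever you like):

const ctx = new (window.AudioContext || window.webkitAudioContext)();
unlockAudioContext(ctx); // from the first snippet above

// Later, once any user gesture has resumed the context, this works programmatically:
function playBeep() {
    const osc = ctx.createOscillator(); // simple test tone instead of a real sample
    osc.connect(ctx.destination);
    osc.start();
    osc.stop(ctx.currentTime + 0.2);
}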
I'm trying to build a small web page with JavaScript that plays back a speech synthesis part (generated from text) in between two MP3s.
As the onend event of the spoken part does not work for whatever reason, I wanted to create a recursive function to help me out. For that, I use the "speaking" property of SpeechSynthesis. But for whatever reason, speaking is never true.
I debugged and also tried several statements (see code), but it never ever turns out to be true. Is there something wrong with the code? Otherwise, how do I report a bug in this library?
function doSpeech() {
    var synth = window.speechSynthesis;
    var utterance1 = new SpeechSynthesisUtterance('How about we say this now? This is quite a long sentence to say.');
    var utterance2 = new SpeechSynthesisUtterance('We should say another sentence too, just to be on the safe side.');

    synth.speak(utterance1);

    if (synth.speaking) {
        doNothing();
    } else {
        playEnd();
    }
}
playEnd() just plays an mp3 if synth is speaking.
Please note that when I put playEnd() inside the if branch, it won't play. I can put whatever code in there, it is never reached, as synth.speaking is never true. This example is close to the one in the Mozilla documentation on this (https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis/speaking). I wanted to test it, as the recursion never worked.
EDIT: The recursion still won't do it in my specific code. Am I missing something here?
function doSpeech() {
    var synth = window.speechSynthesis;
    var speech = new SpeechSynthesisUtterance();

    speech.text = getText();
    speech.lang = "en-US";
    speech.voice = speechSynthesis.getVoices().filter(function(voice) { return voice.name == 'Google UK English Male'; })[0];

    speech.addEventListener('start', function() {
        speechEndLoop(synth);
    });

    synth.speak(speech);
}

function speechEndLoop(x) {
    if (x.speaking) {
        speechEndLoop(x);
    } else {
        playEnd();
    }
}
It works perfectly; the problem is that your code checks the status immediately. This can be an issue because it depends on how the API converts the text to audio (using the local text-to-speech of the operating system or using the Google servers):
function doSpeech() {
    var synth = window.speechSynthesis;
    var utterance1 = new SpeechSynthesisUtterance('How about we say this now? This is quite a long sentence to say. Make it longer !');
    var utterance2 = new SpeechSynthesisUtterance('We should say another sentence too, just to be on the safe side. even longer !');

    synth.speak(utterance1);

    // If you check immediately (js code executed in less than ms) the
    // status won't be true
    if (synth.speaking) {
        console.log("This is usually printed, if the utterance uses the default voice of the browser (native)");
    }

    // Wait 500ms to check for the status of the utterance
    setTimeout(function() {
        if (synth.speaking) {
            console.log("This will be printed if starts after 500ms :)");
        }
    }, 500);
}
doSpeech();
In my case, both of the console.log statements are printed. But if in your case they aren't, execute your code only after the start event of the utterance:
function doSpeech() {
    var synth = window.speechSynthesis;
    var msg = new SpeechSynthesisUtterance();

    msg.text = "We should say another sentence too, just to be on the safe side. even longer !";

    msg.addEventListener('start', function () {
        if (synth.speaking) {
            console.log("This will be printed !");
        }
    });

    synth.speak(msg);
}
doSpeech();
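About the EDIT: speechEndLoop calls itself synchronously, so it blocks the page (and quickly overflows the call stack) before the speaking state has any chance to change. A minimal non-blocking sketch of the same idea, re-checking on a timer (playEnd is your own function):

function speechEndLoop(synth) {
    if (synth.speaking) {
        // Check again shortly instead of recursing synchronously,
        // so the browser gets a chance to update synth.speaking.
        setTimeout(function() { speechEndLoop(synth); }, 100);
    } else {
        playEnd();
    }
}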
It's pretty nice to work with the plain API yourself, but if you want something more robust for the text-to-speech problem, I recommend the JS library Artyom.js; it offers a pretty nice wrapper around the Speech Synthesis API. Even with this library you will see the same behaviour:
function doSpeech() {
    let assistant = new Artyom();

    assistant.say('How about we say this now? This is quite a long sentence to say. Make it longer !', {
        onStart: function() {
            if (assistant.isSpeaking()) {
                console.log("This will be shown");
            }
        }
    });

    if (assistant.isSpeaking()) {
        console.log("This won't appear !");
    }
}
doSpeech();
I am trying out a simple example with SpeechSynthesis.
<script>
voices = window.speechSynthesis.getVoices()
var utterance = new SpeechSynthesisUtterance("Hello World");
utterance.voice = voices[4];
utterance.lang = voices[4].lang;
window.speechSynthesis.speak(utterance);
</script>
But this gives an error that voices is undefined. I found that getVoices() loads asynchronously. I saw this answer and updated my code as shown below to use a callback.
<script>
window.speechSynthesis.onvoiceschanged = function() {
    voices = window.speechSynthesis.getVoices();
    var utterance = new SpeechSynthesisUtterance("Hello World");
    utterance.voice = voices[4];
    utterance.lang = voices[4].lang;
    window.speechSynthesis.speak(utterance);
};
</script>
But for some strange reason, the text is spoken three times instead of once. How can I fix this code?
I can't replicate your issue, but try adding an event listener so that your function runs after the voices are loaded.
let voices, utterance;
function speakVoice() {
    voices = this.getVoices();
    utterance = new SpeechSynthesisUtterance("Hello World");
    utterance.voice = voices[1];
    speechSynthesis.speak(utterance);
}
speechSynthesis.addEventListener('voiceschanged', speakVoice);
This can be seen in many JS Bin-type demos. For example:
http://jsbin.com/sazuca/1/edit?html,css,js,output
https://codepen.io/matt-west/pen/wGzuJ
This behaviour is seen in Chrome (which uses the voiceschanged event) when a non-local voice is used. Another effect is that the list of voices is often triplicated.
The W3C specification says:
voiceschanged event
Fired when the contents of the SpeechSynthesisVoiceList, that the getVoices method will return, have changed. Examples include: server-side synthesis where the list is determined asynchronously, or when client-side voices are installed/uninstalled.
...so I presume that the event is fired once when Chrome gets the voices and then twice more when the first non-local voice is used.
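You can see this for yourself by logging each time the event fires (a quick diagnostic, not part of the fix):

// Log every voiceschanged event and how many voices are reported at that point.
speechSynthesis.addEventListener('voiceschanged', function() {
    console.log('voiceschanged fired, voices:', speechSynthesis.getVoices().length);
});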
Given that there doesn't seem to be a way to distinguish which change is triggering the event, I have been using this ugly bit of code:
// Add voices to dropdown list
loadVoices();

// For browsers that use the voiceschanged event
speechSynthesis.onvoiceschanged = function(e) {
    // Load the voices into the dropdown
    loadVoices();
    // Don't add more options when voiceschanged fires again
    speechSynthesis.onvoiceschanged = null;
};
Here loadVoices() is the function that adds the voices to a select element's options. It's not ideal, but it does work on all browsers (with speech synthesis), whether they use onvoiceschanged or not.
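A slightly tidier variant of the same idea, assuming you only ever want loadVoices() to run once for the event, is to use the once option of addEventListener instead of nulling the handler:

// Populate the dropdown immediately for browsers that already have the voices...
loadVoices();

// ...and once more when voiceschanged fires, but never again after that.
speechSynthesis.addEventListener('voiceschanged', loadVoices, { once: true });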
I faced the same problem just now, and the solution is pretty easy.
Just declare the voices globally, not just inside the onclick function, and
set the voice twice:
utterance.voice = window.speechSynthesis.getVoices()[Math.floor(Math.random() * 6)];

setTimeout(() => {
    utterance.voice = window.speechSynthesis.getVoices()[Math.floor(Math.random() * 6)];
}, 1000);
Here utterance is the variable containing the SpeechSynthesisUtterance().
The Brave browser only supports 6 voices compared to Chrome's 24,
which is why I choose a random voice between 1 and 6.
You can simply add this code and use SpeechSynthesis in your project; it works for me.
var su;
su = new SpeechSynthesisUtterance();
su.text = "Hello World";
speechSynthesis.speak(su);
speechSynthesis.cancel();
I recently implemented a basic web app which relied on Google's TTS URL to generate clear MP3 files for playback on the front end.
This has since been subject to an additional security check, meaning I have had to update the code base to use alternative methods.
One such alternative is JavaScript's Speech Synthesis API, i.e. SpeechSynthesisUtterance() and window.speechSynthesis.speak('...'). This works really well on my desktop and laptop, but as soon as I use it on my iOS devices, the rate of the audio is accelerated significantly.
Can anyone suggest what I can do to resolve this?
See below for example code:
var msg = new SpeechSynthesisUtterance();
msg.text = item.title;
msg.voice = "Google UK English Male";
msg.rate = 0.7;
msg.onend = function() {
    console.log('message has ended');
    $('.word-img').removeClass('img-isplaying');
};

msg.onerror = function() {
    console.log('ERROR WITH SPEECH API');
    $('.word-img').removeClass('img-isplaying');
};
window.speechSynthesis.speak(msg);
iOS doesn't allow the new SpeechSynthesis API to be used programmatically; the user must trigger the action explicitly. I can understand this decision, but I don't understand why the API doesn't work in web apps, like playing audio files does: that is not possible in iOS's default Safari, but it does work in web apps.
Here is a little trick:
<a id="trigger_me" onclick="speech_text()"></a>
<script>
function speech_text() {
    var msg = new SpeechSynthesisUtterance();
    /* ... */
}
/* and now you must trigger the event for #trigger_me */
$('#trigger_me').trigger('click');
</script>
This works only with native DOM elements. If you add a new tag programmatically into the DOM like...
$('body').append('<a id="trigger_me" onclick="speech_text()"></a>');
... the function will not be triggered. It seems that iOS Safari registers events for these special internal functions only once after DOM load.
OK, I solved this problem today. The problem is that iOS will not let the speech API run programmatically unless it has been triggered at least once through a user interaction.
So we can listen for a user interaction and trigger one silent speech, which lets us speak programmatically later.
Here is my code.
let hasEnabledVoice = false;
document.addEventListener('click', () => {
    if (hasEnabledVoice) {
        return;
    }
    const lecture = new SpeechSynthesisUtterance('hello');
    lecture.volume = 0;
    speechSynthesis.speak(lecture);
    hasEnabledVoice = true;
});
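After that first silent utterance, later calls can then be made programmatically, for example (hypothetical text):

// Somewhere later, no user gesture required any more:
const msg = new SpeechSynthesisUtterance('This is spoken programmatically.');
speechSynthesis.speak(msg);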