Continuous Speech Recognition on browser like "ok google" or "hey siri" - javascript

I am doing a POC, and my requirement is to implement a feature like "OK Google" or "Hey Siri" in the browser.
I am using Chrome's Web Speech API. I noticed that I can't keep recognition running continuously, because it terminates automatically after a certain period of time; I understand this is for security reasons. As a workaround, I restart the SpeechRecognition from its end event whenever it terminates, but this is not a good way to implement such a solution: if I open two instances of the same application in different browser tabs, it doesn't work, and if another application in my browser also uses speech recognition, neither application behaves as expected. I am looking for a better approach to solve this problem.
Thanks in advance.

Since your problem is that you can't run the SpeechRecognition continuously for long periods of time, one way would be to start the SpeechRecognition only when you get some input in the mic.
This way only when there is some input, you will start the SR, looking for your magic_word.
If the magic_word is found, then you will be able to use the SR normally for your other tasks.
Mic input can be detected with the Web Audio API, which is not bound by the time restriction that SR suffers from. You can feed it a LocalMediaStream from MediaDevices.getUserMedia.
For more info on the script below, you can see this answer.
Here is how you could attach it to a SpeechRecognition:
const magic_word = ##YOUR_MAGIC_WORD##;
// initialize our SpeechRecognition object
let recognition = new webkitSpeechRecognition();
recognition.lang = 'en-US';
recognition.interimResults = false;
recognition.maxAlternatives = 1;
recognition.continuous = true;
// detect the magic word
recognition.onresult = e => {
// extract all the transcripts
var transcripts = [].concat.apply([], [...e.results]
.map(res => [...res]
.map(alt => alt.transcript)
)
);
if(transcripts.some(t => t.indexOf(magic_word) > -1)){
//do something awesome, like starting your own command listeners
}
else{
// didn't understand...
}
}
// called when we detect silence
function stopSpeech(){
recognition.stop();
}
// called when we detect sound
function startSpeech(){
try{ // calling it twice will throw...
recognition.start();
}
catch(e){}
}
// request a LocalMediaStream
navigator.mediaDevices.getUserMedia({audio:true})
// add our listeners
.then(stream => detectSilence(stream, stopSpeech, startSpeech))
.catch(e => console.error(e.message)); // e.g. permission denied or no microphone
function detectSilence(
stream,
onSoundEnd = _=>{},
onSoundStart = _=>{},
silence_delay = 500,
min_decibels = -80
) {
const ctx = new AudioContext();
const analyser = ctx.createAnalyser();
const streamNode = ctx.createMediaStreamSource(stream);
streamNode.connect(analyser);
analyser.minDecibels = min_decibels;
const data = new Uint8Array(analyser.frequencyBinCount); // will hold our data
let silence_start = performance.now();
let triggered = false; // trigger only once per silence event
function loop(time) {
requestAnimationFrame(loop); // we'll loop every 60th of a second to check
analyser.getByteFrequencyData(data); // get current data
if (data.some(v => v)) { // if there is data above the given db limit
if(triggered){
triggered = false;
onSoundStart();
}
silence_start = time; // set it to now
}
if (!triggered && time - silence_start > silence_delay) {
onSoundEnd();
triggered = true;
}
}
loop();
}
As a plunker, since neither StackSnippets nor jsfiddle's iframes will allow gUM in two versions...
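One thing to watch in the onresult handler above: indexOf is case-sensitive, and recognizers often capitalize words. Here is a minimal sketch of a more forgiving check. The names extractTranscripts and containsMagicWord are hypothetical helpers, not part of the Web Speech API, and the sketch only assumes that results is array-like, as SpeechRecognitionResultList is.

```javascript
// Hypothetical helpers (not part of the Web Speech API).

// Flatten an array-like of results, each an array-like of alternatives,
// into a flat array of transcript strings.
function extractTranscripts(results) {
  return Array.from(results).flatMap(res =>
    Array.from(res, alt => alt.transcript)
  );
}

// Case-insensitive substring match of the magic word in any transcript.
function containsMagicWord(transcripts, magicWord) {
  const needle = magicWord.trim().toLowerCase();
  if (!needle) return false; // an empty magic word never matches
  return transcripts.some(t => t.toLowerCase().includes(needle));
}
```

Inside recognition.onresult you could then call containsMagicWord(extractTranscripts(e.results), magic_word) instead of the indexOf test.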

Related

How to use COCO-SSD to search for specific objects, in order to take measurements from them

I am attempting to create an app that detects a ball and another object, say a racket, while taking a video. Both objects must stay in the camera's FOV the whole time to activate a function that then calculates the average speed of the ball. I have set up a site that requests permission to access the webcam and displays it on screen, along with a recording button and so on. However, my issues start turning up in my JavaScript. EDIT: The Start Recording button contains id="togglerecordingmodeEl" onclick="toggleRecording()"
HTML:
Start Recording
Preview
JavaScript:
let video = null; // video element
let detector = null; // detector object
let detections = []; // store detection result
let videoVisibility = true;
let detecting = false;
const toggleRecordingEl = document.getElementById('toggleRecordingEl');
function toggleRecording() {
if (!video || !detector) return;
if (!detecting) {
detect();
toggleDetectingEl.innerText = 'Stop Detecting';
} else {
toggleDetectingEl.innerText = 'Start Detecting';
}
detecting = !detecting;
}
function detect() {
// instruct "detector" object to start detect object from video element
// and "onDetected" function is called when object is detected
detector.detect(video, onDetected);
}
// callback function. it is called when object is detected
function onDetected(error, results) {
if (error) {
console.error(error);
}
detections = results;
}
In summary, I want it to first recognise two objects in the FOV, and only if these two objects are in the FOV, then I can work on a new function. I have used https://medium.com/the-web-tub/object-detection-with-javascript-the-easy-way-74fbe98741cf & https://developer.mozilla.org/en-US/docs/Web/API/MediaStream_Recording_API/Recording_a_media_element to construct this app so far.
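For the "only if both objects are in the FOV" part, a minimal sketch could look like the following. It assumes each detection is an object with label and confidence fields, which is the result shape I would expect from the ml5 detector used in the linked article, and it assumes the COCO class names 'sports ball' and 'tennis racket'. allObjectsInView and the 0.5 threshold are made up for illustration.

```javascript
// Hypothetical helper: true only when every required label was detected
// with at least minConfidence.
// Assumption: detections is an array of { label: string, confidence: number }.
function allObjectsInView(detections, requiredLabels, minConfidence = 0.5) {
  const seen = new Set(
    (detections || [])
      .filter(d => d.confidence >= minConfidence)
      .map(d => d.label)
  );
  return requiredLabels.every(label => seen.has(label));
}

// Possible use inside onDetected(error, results):
//   if (allObjectsInView(results, ['sports ball', 'tennis racket'])) {
//     // both objects are visible in this frame
//   }
```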

WebRTC 'playoutDelayHint' automatically synchronizes all tracks

I wrote a simple application that streams from one master to several clients. Since the master may use something like an IP webcam (~1 s latency) together with the internal microphone (no latency), I wanted to add a delay to the audio track. Unfortunately, the delay does not seem to work on Firefox, and on Chrome it automatically synchronizes all tracks to the highest playoutDelayHint that is set, so everything becomes delayed by one second. I checked both consumers' RTP receiver values for both tracks: only audio has playoutDelayHint set to one second, and that doesn't change over time, but after a few seconds of streaming the video becomes delayed by one second too.
const stream = new MediaStream;
[...]
let el = document.querySelector('#remote_video');
[...]
function addVideoAudio(consumer) {
if (consumer.kind === 'video') {
el.setAttribute('playsinline', true);
consumer._rtpReceiver.playoutDelayHint = 0;
} else {
el.setAttribute('playsinline', true);
el.setAttribute('autoplay', true);
consumer._rtpReceiver.playoutDelayHint = 1;
}
stream.addTrack(consumer.track.clone());
el.srcObject = stream;
el.consumer = consumer;
}
Even when I add another video element and another MediaStream, so that every stream (consumer) gets its own HTML element, I still get the same effect:
const stream1 = new MediaStream;
const stream2 = new MediaStream;
[...]
let el1 = document.querySelector('#remote_video');
let el2 = document.querySelector('#remote_audio');
[...]
function addVideoAudio(consumer) {
if (consumer.kind === 'video') {
el1.setAttribute('playsinline', true);
consumer._rtpReceiver.playoutDelayHint = 0;
stream1.addTrack(consumer.track);
el1.srcObject = stream1;
el1.consumer = consumer;
} else {
el2.setAttribute('playsinline', true);
el2.setAttribute('autoplay', true);
consumer._rtpReceiver.playoutDelayHint = 1;
stream2.addTrack(consumer.track);
el2.srcObject = stream2;
el2.consumer = consumer;
}
}
Is it possible to delay only one track and why does the delay only (kinda) work on chrome?
Thanks in advance. :)
You can use jitterBufferDelayHint to delay the audio.
Weirdly enough, setting playoutDelayHint on one track delays both the video and the audio.
But to delay the audio only, it seems that also setting jitterBufferDelayHint fixes it.
audioReceiver.playoutDelayHint = 1;
audioReceiver.jitterBufferDelayHint = 1;
This behavior might change over time.
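Since both hints are non-standard and may be missing (the question mentions Firefox), it can help to feature-detect them before assigning. This is only a sketch: delayAudioReceiver is a hypothetical helper, and the property names are simply the ones used above.

```javascript
// Hypothetical helper: set the delay hints only where the receiver exposes them.
// Returns the list of properties that were actually set.
function delayAudioReceiver(receiver, seconds) {
  const applied = [];
  for (const prop of ['playoutDelayHint', 'jitterBufferDelayHint']) {
    if (prop in receiver) {
      receiver[prop] = seconds;
      applied.push(prop);
    }
  }
  return applied;
}

// Possible use, assuming `pc` is your RTCPeerConnection:
//   pc.getReceivers()
//     .filter(r => r.track && r.track.kind === 'audio')
//     .forEach(r => delayAudioReceiver(r, 1));
```

An empty return value tells you the browser ignored the request, so you can fall back to something else.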

Failed to execute 'start' on 'SpeechRecognition': recognition has already started

I am using a wrapper of Web Speech API for Angular6. I am trying to implement a system of starting-stopping after each 3.5s in order to be able to manipulate the results for these small parts.
Even though I stop the recognition, before starting it again, I keep getting this error Failed to execute 'start' on 'SpeechRecognition': recognition has already started.
As suggested in this post, I first verify whether the speech recognition is active or not and only if not active, I try to start it. https://stackoverflow.com/a/44226843/6904971
Here is the code:
constructor( private http: Http, private service: SpeechRecognitionService, private links: LinksService) {
var recognizing; // will get bool values to verify if recognition is active
this.service.onresult = (e) => {
this.message = e.results[0].item(0).transcript;
};
this.service.onstart = function () {
recognizing = true;
};
this.service.onaudiostart = function () {
recognizing = true;
};
this.service.onerror = function (event) {
recognizing = false;
};
this.service.onsoundstart = function () {
recognizing = true;
};
this.service.onsoundstart = function () {
recognizing = true;
};
this.record = () => {
this.service.start();
setInterval(root.ongoing_recording(), 3500);
};
var root = this;
var speech = '';
this.stop_recording = () => {
this.service.stop();
};
this.ongoing_recording = ()=> {
setTimeout(function(){
if( recognizing === true){
root.service.stop();
root.service.onend = (e) => {
recognizing = false;
speech = root.message;
var sentence = document.createElement('span');
sentence.innerHTML = speech + " ";
document.body.appendChild(sentence);
}
}
}, 3500);
setTimeout(function(){
if(recognizing === false){
root.service.start();
}
}, 3510);
};
}
start() {
this.service.start();
}
stop() {
this.service.stop();
}
record(){
this.record();
}
stop_recording(){
this.stop_recording();
}
ongoing_recording(){
this.ongoing_recording();
}
I think that the timing might not be good (with the setTimeout and interval). Any help would be much appreciated. Thank you! :)
I used Web Speech API for voice search functionality in my site and I was facing a similar sort of situation. It has one microphone icon which toggles the speech recognition on and off. It was working fine in the normal on and off of the button that started speech recognition but was breaking only if you test it rigorously with a continuous button toggle.
Solution:
The thing that worked for me is:
try{
//line of code to start the speech recognition
}
catch{
//line of code to stop the speech recognition
}
So I wrapped the .start() method which was breaking the application in a try block and then added the catch block to stop it. And even if it comes across this problem, on the next button click to turn on the speech recognition, it works. I hope you would be able to extract something from it.
One observation:
you run setInterval() every 3500 ms to invoke ongoing_recording(), but then use setTimeout() with 3500 ms again within ongoing_recording().
Besides that, logging inside the error handler (where recognizing is also set to false) could help in finding a solution:
in past versions of the SpeechRecognition implementation, not every error actually stopped the recognition (I don't know if that is still the case).
So it might be that recognizing is reset because of an error that did not actually stop the recognition; if this really is the cause of the error when restarting recognition, the exception could simply be caught and ignored.
It might also be worth trying to restart the recognition in the onend handler (and onerror).
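A rough sketch of the "restart in onend" idea, to be adapted rather than taken as the definitive way. createRestartingRecognizer is a made-up name, and recognition can be any object with start()/stop() and onstart/onend/onerror handlers, such as a SpeechRecognition instance. It relies on the end event firing after every stop.

```javascript
// Hypothetical wrapper: tracks the running state in one place and restarts
// recognition from the end event instead of from timers.
function createRestartingRecognizer(recognition) {
  let running = false; // true between the start and end events
  let wanted = false;  // true while the caller wants recognition active

  function safeStart() {
    if (running) return;
    try {
      recognition.start();
    } catch (err) {
      // start() throws if recognition has already been started
      console.warn('start() ignored:', err.message);
    }
  }

  recognition.onstart = () => { running = true; };
  recognition.onerror = event => { console.warn('recognition error:', event.error); };
  recognition.onend = () => {
    running = false;
    if (wanted) safeStart(); // only restart once the previous session really ended
  };

  return {
    start() { wanted = true; safeStart(); },
    stop() { wanted = false; recognition.stop(); },
    // End the current chunk; the end handler starts the next one.
    restart() { if (running) recognition.stop(); },
    isRunning: () => running,
  };
}
```

For the 3.5 s chunks in the question, you could call start() once and then run setInterval(() => recognizer.restart(), 3500), so that start() is never called while a session is still active.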
I am not sure what is causing it in your code, but I had the same error, and in my case the cause was that I was calling start() twice in a row. What fixed it was adding a variable that tracks whether the recognition has started or stopped, so that if it has started and I click again, the handler returns recognition.stop() instead of calling start() again.
let recognition = new SpeechRecognition();
let status = 0;
document.querySelector(".mic").addEventListener("click",() => {
if (status == 1) {
status = 0;
return recognition.stop();
}
recognition.start();
status = 1;
recognition.onresult = function (event) {
status=0;
var text = event.results[0][0].transcript;
recognition.stop();
};
recognition.onspeechend = function () {
status = 0;
recognition.stop();
};
});

deinitialize audio recording started via getUserMedia

I'm trying to create audio stream from browser and send it to server.
Here is the code:
let recording = false;
let localStream = null;
const session = {
audio: true,
video: false
};
function start () {
recording = true;
navigator.webkitGetUserMedia(session, initializeRecorder, onError);
}
function stop () {
recording = false;
localStream.getAudioTracks()[0].stop();
}
function initializeRecorder (stream) {
localStream = stream;
const audioContext = window.AudioContext;
const context = new audioContext();
const audioInput = context.createMediaStreamSource(localStream);
const bufferSize = 2048;
// create a javascript node
const recorder = context.createScriptProcessor(bufferSize, 1, 1);
// specify the processing function
recorder.onaudioprocess = recorderProcess;
// connect stream to our recorder
audioInput.connect(recorder);
// connect our recorder to the previous destination
recorder.connect(context.destination);
}
function onError (e) {
console.log('error:', e);
}
function recorderProcess (e) {
if (!recording) return;
const left = e.inputBuffer.getChannelData(0);
// send left to server here (socket.io can do the job). We dont need stereo.
}
When the start function is called, the samples can be caught in recorderProcess.
When the stop function is called, the mic icon in the browser disappears, but...
unless I put if (!recording) return at the beginning of recorderProcess, it still processes samples.
Unfortunately that's not a solution at all: the samples are still being received by recorderProcess, and if I call start once more, it gets all the samples from the previous stream as well as from the new one.
My question is:
How can I stop/start recording without this issue?
Or, if that's not the best solution:
How can I completely remove the stream in the stop function, so I can safely initialize it again at any time?
recorder.disconnect() should help.
You might want to consider the new MediaRecorder functionality in Chrome Canary shown at https://webrtc.github.io/samples/src/content/getusermedia/record/ (currently video-only I think) instead of the WebAudio API.
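To spell out the disconnect idea a bit: the nodes created in initializeRecorder are local to that function, so stop() cannot reach them. A sketch, assuming you keep them in a holder object (the nodes object and teardownRecorder are hypothetical names):

```javascript
// Hypothetical teardown: release everything initializeRecorder created.
// Assumption: nodes = { stream, audioInput, recorder, context }.
function teardownRecorder(nodes) {
  const { stream, audioInput, recorder, context } = nodes;
  if (recorder) {
    recorder.onaudioprocess = null; // no more callbacks into recorderProcess
    recorder.disconnect();
  }
  if (audioInput) audioInput.disconnect();
  if (stream) stream.getTracks().forEach(track => track.stop());
  // AudioContext.close() returns a promise; skip it if already closed
  return context && context.state !== 'closed'
    ? context.close()
    : Promise.resolve();
}
```

Each start() then builds a fresh context and fresh nodes, so samples from the old stream should no longer reach the new session.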

Web Audio API resume from pause

I often read that it's not possible to pause/resume audio files with the Web Audio API.
But now I saw an example where they actually made it possible to pause and resume. I tried to figure out how they did it. I thought maybe source.looping = false is the key, but it wasn't.
For now my audio is always re-playing from the start.
This is my current code
var context = new (window.AudioContext || window.webkitAudioContext)();
function AudioPlayer() {
this.source = context.createBufferSource();
this.analyser = context.createAnalyser();
this.stopped = true;
}
AudioPlayer.prototype.setBuffer = function(buffer) {
this.source.buffer = buffer;
this.source.looping = false;
};
AudioPlayer.prototype.play = function() {
this.source.connect(this.analyser);
this.analyser.connect(context.destination);
this.source.noteOn(0);
this.stopped = false;
};
AudioPlayer.prototype.stop = function() {
this.analyser.disconnect();
this.source.disconnect();
this.stopped = true;
};
Does anybody know what to do, to get it work?
Oskar's answer and ayke's comment are very helpful, but I was missing a code example. So I wrote one: http://jsfiddle.net/v3syS/2/ I hope it helps.
var url = 'http://thelab.thingsinjars.com/web-audio-tutorial/hello.mp3';
var ctx = new webkitAudioContext();
var buffer;
var sourceNode;
var startedAt;
var pausedAt;
var paused;
function load(url) {
var request = new XMLHttpRequest();
request.open('GET', url, true);
request.responseType = 'arraybuffer';
request.onload = function() {
ctx.decodeAudioData(request.response, onBufferLoad, onBufferError);
};
request.send();
};
function play() {
sourceNode = ctx.createBufferSource();
sourceNode.connect(ctx.destination);
sourceNode.buffer = buffer;
paused = false;
if (pausedAt) {
startedAt = Date.now() - pausedAt;
sourceNode.start(0, pausedAt / 1000);
}
else {
startedAt = Date.now();
sourceNode.start(0);
}
};
function stop() {
sourceNode.stop(0);
pausedAt = Date.now() - startedAt;
paused = true;
};
function onBufferLoad(b) {
buffer = b;
play();
};
function onBufferError(e) {
console.log('onBufferError', e);
};
document.getElementById("toggle").onclick = function() {
if (paused) play();
else stop();
};
load(url);
In current browsers (Chrome 43, Firefox 40) there are now 'suspend' and 'resume' methods available for AudioContext:
var audioCtx = new AudioContext();
susresBtn.onclick = function() {
if(audioCtx.state === 'running') {
audioCtx.suspend().then(function() {
susresBtn.textContent = 'Resume context';
});
} else if(audioCtx.state === 'suspended') {
audioCtx.resume().then(function() {
susresBtn.textContent = 'Suspend context';
});
}
}
(modified example code from https://developer.mozilla.org/en-US/docs/Web/API/AudioContext/suspend)
Actually the web-audio API can do the pause and play task for you. It knows the current state of the audio context (running or suspended), so you can do this in this easy way:
susresBtn.onclick = function() {
if(audioCtx.state === 'running') {
audioCtx.suspend()
} else if(audioCtx.state === 'suspended') {
audioCtx.resume()
}
}
I hope this can help.
Without spending any time checking the source of your example, I'd say you'll want to use the noteGrainOn method of the AudioBufferSourceNode (https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#methodsandparams-AudioBufferSourceNode)
Just keep track of how far into the buffer you were when you called noteOff, and then do noteGrainOn from there when resuming on a new AudioBufferSourceNode.
Did that make sense?
EDIT:
See comments below for updated API calls.
EDIT 2, 2019: See MDN for updated API calls; https://developer.mozilla.org/en-US/docs/Web/API/AudioBufferSourceNode/start
As a fix for Chrome, every time you want to play a sound, do it like this:
if(audioCtx.state === 'suspended') {
audioCtx.resume().then(function() {
audio.play();
});
}else{
audio.play();
}
The lack of a built-in pause functionality in the WebAudio API seems like a major oversight to me. Possibly, in the future it will be possible to do this using the planned MediaElementSource, which will let you hook up an element (which supports pausing) to Web Audio. For now, most workarounds seem to be based on remembering playback time (such as described in imbrizi's answer). Such a workaround has issues when looping sounds (does the implementation loop gaplessly or not?), and when you allow the playbackRate of sounds to change dynamically (as both affect timing). Another, equally hack-ish and technically incorrect, but much simpler workaround you can use is:
source.playbackRate.value = paused ? 0.0000001 : 1; // playbackRate is an AudioParam, so set its .value
Unfortunately, 0 is not a valid value for playbackRate (which would actually pause the sound). However, for many practical purposes, some very low value, like 0.0000001, is close enough, and it won't produce any audible output.
UPDATE: This is only valid for Chrome. Firefox (v29) does not yet implement the MediaElementAudioSourceNode.mediaElement property.
Assuming that you already have the AudioContext reference and your media source (e.g. via the AudioContext.createMediaElementSource() method call), you can call MediaElement.play() and MediaElement.pause() on your source, e.g.
source.mediaElement.pause();
source.mediaElement.play();
No need for hacks and workarounds, it's supported.
If you are working with an <audio> tag as your source, you should not call pause directly on the audio element in your JavaScript, that will stop playback.
In 2017, using ctx.currentTime works well for keeping track of the point in the song. The code below uses one button (songStartPause) that toggles between a play & pause button. I used global variables for simplicity's sake. The variable musicStartPoint keeps track of what time you're at in the song. The music api keeps track of time in seconds.
Set your initial musicStartPoint at 0 (beginning of the song)
var ctx = new webkitAudioContext();
var buff, src;
var musicLoaded = false;
var musicStartPoint = 0;
var songOnTime, songEndTime;
var songOn = false;
songStartPause.onclick = function() {
if(!songOn) {
if(!musicLoaded) {
loadAndPlay();
musicLoaded = true;
} else {
play();
}
songOn = true;
songStartPause.innerHTML = "||" //a fancy Pause symbol
} else {
songOn = false;
src.stop();
setPausePoint();
songStartPause.innerHTML = ">" //a fancy Play symbol
}
}
Use ctx.currentTime to subtract the time the song ends from when it started, and append this length of time to however far you were in the song initially.
function setPausePoint() {
songEndTime = ctx.currentTime;
musicStartPoint += (songEndTime - songOnTime);
}
Load/play functions.
function loadAndPlay() {
var req = new XMLHttpRequest();
req.open("GET", "//mymusic.com/unity.mp3")
req.responseType = "arraybuffer";
req.onload = function() {
ctx.decodeAudioData(req.response, function(buffer) {
buff = buffer;
play();
})
}
req.send();
}
function createBuffer() {
src = ctx.createBufferSource();
src.buffer = buff;
}
function connectNodes() {
src.connect(ctx.destination);
}
Lastly, the play function tells the song to start at the specified musicStartPoint (and to play it immediately), and also sets the songOnTime variable.
function play(){
createBuffer()
connectNodes();
songOnTime = ctx.currentTime;
src.start(0, musicStartPoint);
}
*Sidenote: I know it might look cleaner to set songOnTime up in the click function, but I figure it makes sense to grab the time code as close as possible to src.start, just like how we grab the pause time as close as possible to src.stop.
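The bookkeeping above can also be kept in one small object, which makes it easier to handle the case where the accumulated time runs past the end of the buffer. A sketch only: PlaybackPosition is a hypothetical helper, and it assumes a playbackRate of 1 and that wrapping around to the start (as a looping source would) is what you want.

```javascript
// Hypothetical helper: tracks the resume offset in seconds, driven by a clock
// such as ctx.currentTime.
class PlaybackPosition {
  constructor(duration) {
    this.duration = duration; // buffer length in seconds
    this.offset = 0;          // where the next start() should resume from
    this.startedAt = null;    // clock value at the last start(), null while paused
  }

  // Call right before src.start(0, offset); returns the offset to use.
  start(now) {
    this.startedAt = now;
    return this.offset;
  }

  // Call right after src.stop(); returns the new resume offset.
  pause(now) {
    if (this.startedAt === null) return this.offset; // already paused
    const elapsed = now - this.startedAt;
    this.offset = (this.offset + elapsed) % this.duration; // wrap past the end
    this.startedAt = null;
    return this.offset;
  }
}
```

In the code above that would mean src.start(0, position.start(ctx.currentTime)) in play() and position.pause(ctx.currentTime) in place of setPausePoint(), with position created as new PlaybackPosition(buff.duration) once the buffer is decoded.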
I didn't follow the full discussion, but I will soon. I simply headed over to the HAL demo to understand. For those who now do as I did, I would like to explain:
1 - how to make this code work now.
2 - a trick to get pause/play from this code.
1: Replace noteOn(xx) with start(xx) and put any valid URL in sound.load(). I think that's all I did. You will get a few errors in the console that are fairly self-explanatory. Follow them, or not: sometimes you can ignore them and it still works. They relate to the -webkit prefix on some functions, and the new names are given.
2: At some point, when it works, you may want to pause the sound.
Pausing will work. But, as everybody knows, pressing play again raises an error. As a result, the code in this.play() after the failing source_.start(0) is not executed.
I simply enclosed that line in a try/catch:
this.play = function() {
analyser_ = context_.createAnalyser();
// Connect the processing graph: source -> analyser -> destination
source_.connect(analyser_);
analyser_.connect(context_.destination);
try{
source_.start(0);
}
catch(e){
// start() throws if this source was already started; ignore and carry on
}
this.playing = true;
(function callback(time) {
processAudio_(time);
reqId_ = window.webkitRequestAnimationFrame(callback);
})();
};
And it works : you can use play/pause.
I would like to mention that this HAL simulation is really incredible. Follow those simple steps, it's worth it !
