I want to use the SpeechRecognition API with an audio file (MP3, WAV, etc.).
Is that possible?
The short answer is No.
The Web Speech API specification does not prohibit this (the browser could let the end user choose a file as input), but in the current draft the audio input stream is never exposed to the calling JavaScript code, so there is no way to read or change the audio that is fed to the speech recognition service.
The specification was designed so that JavaScript code only has access to the result text coming back from the speech recognition service.
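To make that concrete, everything the page can observe comes through result events like the ones in this minimal sketch; note that there is no property anywhere for supplying your own audio source:

const SpeechRecognition =
    window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

// Transcript text is the only thing handed back to the page.
recognition.onresult = function(event) {
  console.log(event.results[0][0].transcript);
};

// start() always captures the browser-selected microphone;
// it takes no argument for passing a file or stream.
recognition.start();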
Basically you can only use it with the default audio input device, which is chosen at the OS level...
Therefore you just need to play your file into your default audio input.
Two options are possible:
1.
Install https://www.vb-audio.com/Cable/ (VB-CABLE, a virtual audio device)
Update your system settings to use the VB-CABLE device as the default audio output and audio input
Play your file with any audio player you have
Recognize it, e.g. even using the standard demo UI: https://www.google.com/intl/fr/chrome/demos/speech.html
I tested this today, and it works perfectly :-)
2.
THIS IS NOT TESTED BY ME, so I cannot confirm that it works, but you may be able to feed an audio file into Chrome using Selenium, like so:
ChromeOptions options = new ChromeOptions();
options.addArguments(
    "--allow-file-access-from-files",
    "--use-fake-ui-for-media-stream",   // auto-accept the microphone permission prompt
    "--allow-file-access",
    "--use-file-for-fake-audio-capture=D:\\PATH\\TO\\WAV\\xxx.wav",
    "--use-fake-device-for-media-stream");
ChromeDriver driver = new ChromeDriver(options);
But I'm not sure whether this fake stream will actually replace the default audio input.
Andri deleted this post, but I will repost it, as I believe it to be the most accurate answer besides the hackish workarounds above:
According to MDN you CAN'T do that. You can't feed an arbitrary stream into the recognition service.
That's a big problem... You can't even select the microphone used by SpeechRecognition.
That is done on purpose; Google wants to sell their Cloud Speech API.
For file input you need to use a service like the Cloud Speech API instead.
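For reference, here is a minimal Node.js (18+) sketch of what transcribing a file with the Cloud Speech-to-Text REST API looks like; the API key, file name, sample rate, and language are placeholders, so check the current API docs rather than trusting this verbatim:

// Send a local WAV file to Google's Cloud Speech-to-Text REST API.
// API_KEY and "audio.wav" are placeholders; the request shape follows
// the v1 "speech:recognize" method.
const fs = require("fs");

const API_KEY = "YOUR_API_KEY"; // placeholder
const audioBytes = fs.readFileSync("audio.wav").toString("base64");

fetch(`https://speech.googleapis.com/v1/speech:recognize?key=${API_KEY}`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    config: {
      encoding: "LINEAR16",     // uncompressed WAV
      sampleRateHertz: 16000,   // must match the file
      languageCode: "en-US",
    },
    audio: { content: audioBytes },
  }),
})
  .then((res) => res.json())
  .then((data) => {
    // Each result carries one or more alternative transcripts.
    for (const result of data.results || []) {
      console.log(result.alternatives[0].transcript);
    }
  });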
You could probably just start the SpeechRecognition engine on the mic and play the audio file back through the speakers so it feeds into the mic. It worked for me when I tested it.
Yes, it is possible to get the text transcript of the playback of an audio file using webkitSpeechRecognition. The quality of the transcript depends upon the quality of the audio playback.
const recognition = new webkitSpeechRecognition();
const audio = new Audio();
recognition.continuous = true;
recognition.interimResults = true;

recognition.onresult = function(event) {
  if (event.results[0].isFinal) {
    // do stuff with `event.results[0][0].transcript`
    console.log(event.results[0][0].transcript);
    recognition.stop();
  }
}

recognition.onaudiostart = e => {
  console.log("audio capture started");
}

recognition.onaudioend = e => {
  console.log("audio capture ended");
}

audio.oncanplay = () => {
  // start recognizing just before playback so the whole file is captured
  recognition.start();
  audio.play();
}

audio.src = "/path/to/audio";
jsfiddle: https://jsfiddle.net/guest271314/guvn1yq6/
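Note that recognition is still listening on the default microphone here; the played-back audio presumably reaches it through the speakers (or through a loopback device like the virtual cable described above), which is why the playback quality affects the transcript.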
My tablet is running Chrome 52.0.2743.98 but will not output sound when I go to this Web Audio example page.
When I inspect the audio context in the console, I can see that the currentTime is always 0.
Pasting the following code from MDN also produces no sound:
var audioCtx = new (window.AudioContext || window.webkitAudioContext)();
var oscillator = audioCtx.createOscillator();
oscillator.type = 'square';
oscillator.frequency.value = 3000; // value in hertz
oscillator.connect(audioCtx.destination);
oscillator.start();
These two examples work well on my laptop with Chrome 52.0.2743.116.
How can I get Chrome to output sound from the Web Audio API?
For Chrome on Android, my recollection is that audio will only start if it is triggered by a user interaction (e.g. a touch or click event). See also https://bugs.chromium.org/p/chromium/issues/detail?id=178297
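A minimal sketch of that workaround, assuming the page has a button with id "play": create and start everything inside the gesture handler.

// Start the oscillator from inside a user-gesture handler so
// Chrome on Android allows audio output.
document.getElementById('play').addEventListener('click', function() {
  var audioCtx = new (window.AudioContext || window.webkitAudioContext)();
  var oscillator = audioCtx.createOscillator();
  oscillator.type = 'square';
  oscillator.frequency.value = 3000; // value in hertz
  oscillator.connect(audioCtx.destination);
  oscillator.start();
});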
My goal here is to get some Speech to Text recognition working with Meteor. I know that most device keyboards already have a microphone button that enables Speech to Text, but I want to avoid having the user's keyboard pop up.
To do this, I have been trying to use the webkitSpeechRecognition API used here. I have it working on desktop just fine. But when I navigate to my localhost on my phone, or actually build the app on a mobile phone (Android Galaxy S5), weird things start to happen.
When navigating to localhost:
Rather than having the finalized text appear as it does on desktop, I get every iteration of what my recognition object heard. For example, if I said 'Hello Stack',
I get: 'hellohellohello stackhello stack'
When launching Mobile App:
The microphone never turns on. None of my console.logs come through and nothing ever happens.
Here is a Gist of all my code, and here are the relevant parts. Everything else is pretty standard Meteor template code.
recognition = new webkitSpeechRecognition();
recognition.continuous = true;
recognition.interimResults = true;

Session.set('final_span', '');
Session.set('interim_span', '');
var final_transcript = '';

recognition.onstart = function() {
  Session.set('listening', true);
  recognizing = true;
  console.log('started');
}

recognition.onresult = function(event) {
  var interim_transcript = '';
  // walk only the results added since the last event
  for (var i = event.resultIndex; i < event.results.length; ++i) {
    if (event.results[i].isFinal) {
      final_transcript += event.results[i][0].transcript;
    } else {
      interim_transcript += event.results[i][0].transcript;
    }
  }
  Session.set('final_span', final_transcript);
  Session.set('interim_span', interim_transcript);
}
I've looked for packages to solve this for me and didn't find any that worked.
TL;DR: any insight on how to get speech recognition working on a mobile device with Meteor would be appreciated.
I want to develop a web app for mobile phones that records audio from the microphone and plays music at the same time.
With getUserMedia() I get the stream and create a MediaStreamSource in my AudioContext. At the same time I create a BufferSource which plays music. In desktop Chrome this setup works, but when I start the same web app in Chrome on my Nexus 5 and allow it to use the microphone, the music is muted.
Success Callback for getUserMedia:
function gotStream(stream) {
  // route the microphone stream into the Web Audio graph and meter it
  mediaStreamSource = audioContext.createMediaStreamSource(stream);
  meter = createAudioMeter(audioContext);
  mediaStreamSource.connect(meter);
  info = document.getElementById('info');
  outputData();
}
Play Music Function:
function playSound(buffer) {
  // play the decoded music buffer through a gain node
  source = audioContext.createBufferSource();
  source.buffer = buffer;
  gainNode = audioContext.createGain();
  source.connect(gainNode);
  gainNode.connect(audioContext.destination);
  source.start(0);
}
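For context, the buffer passed to playSound() would typically be produced along these lines (a sketch; 'music.mp3' is a placeholder URL, and audioContext is the same global context used above):

// Fetch and decode the music file, then hand the buffer to playSound().
fetch('music.mp3')
  .then(function(response) { return response.arrayBuffer(); })
  .then(function(data) { return audioContext.decodeAudioData(data); })
  .then(playSound);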
Is that the expected behaviour, or am I doing something wrong?
I have a web application that makes use of the HTML5 speech synthesis API, and it works, but only with the native voice. Here's my code:
var msg = new SpeechSynthesisUtterance();
var voices;
window.speechSynthesis.onvoiceschanged = function() {
  voices = window.speechSynthesis.getVoices();
};

$("#btnRead").click(function() {
  speak();
});

function speak() {
  msg = new SpeechSynthesisUtterance();
  msg.rate = 0.8;
  msg.text = $("#contentView").html();
  msg.voice = voices[10];
  msg.lang = 'en-GB';
  window.speechSynthesis.speak(msg);
}
voices[10] is the only voice that works, and when I log it to the console I can see that it's the native voice, which seems to suggest that the other voices aren't being loaded properly, even though they still appear in the voices array when it's logged to the console.
Anyone have any ideas? I'm sure I'm probably missing something relatively simple, but I've been wrestling with this for a while now. I'm using Google Chrome version 42.0.2311.90, which should support the speech synthesis API as far as I can tell.
I just started playing with speechSynthesis, so I haven't spent much time on it. I stumbled on your question, and I believe the answer is that the voice you select does not support the language you give it, so you get a fallback.
If you read the docs and select a voice the way they show, it works (at least on my PC):
https://developers.google.com/web/updates/2014/01/Web-apps-that-talk-Introduction-to-the-Speech-Synthesis-API?hl=en
var msg = new SpeechSynthesisUtterance('Awesome!');
msg.voice = speechSynthesis.getVoices().filter(function(voice) {
  return voice.name == 'Google UK English Male';
})[0];

// now say it like you mean it:
speechSynthesis.speak(msg);
Hope this helps you or others searching for it.
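One caveat worth adding: getVoices() can return an empty list until the voiceschanged event fires, so a more defensive version of the same idea waits for it first (a sketch; 'Google UK English Male' is just the example voice name from the docs):

// Wait for the voice list to load before picking a voice by name;
// getVoices() may be empty until 'voiceschanged' fires in Chrome.
function speakWhenReady(text) {
  var voices = speechSynthesis.getVoices();
  if (!voices.length) {
    speechSynthesis.onvoiceschanged = function() { speakWhenReady(text); };
    return;
  }
  var msg = new SpeechSynthesisUtterance(text);
  msg.voice = voices.filter(function(voice) {
    return voice.name == 'Google UK English Male';
  })[0]; // undefined falls back to the default voice
  speechSynthesis.speak(msg);
}

speakWhenReady('Awesome!');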