Speech-to-text Recognition is not accurate - javascript

I am trying to implement Speech-to-text recognition in my React website, and I am using the react-speech-recognition package from npm. I am using the exact code they have specified in the package description over here: npm
It works with everyday speech, anything I say, but when I introduce technical jargon, it goes way off!
Here's what I am trying to say to it (it's aviation jargon):
Cleared to enter the CTR, not above 1500 feet, join and report on a right downwind runway 19, QNH 1018, squawk 2732
This is what I get in response:
please to enter the city are not above 15 feet heart penetrate join and report on a ride on the wind blown away 9 theme
What else do I need to do to fix the accuracy of the recognition?

That package leverages the SpeechRecognition interface of your browser's Web Speech API. The React library's API allows you to get the underlying SpeechRecognition object via a call to the getRecognition() method.
The underlying SpeechRecognition object's API allows for the addition of Grammars using the JSpeech Grammar Format. Here's an example. So in theory, you could provide more information about the words you're expecting to hear in your app, and thereby improve performance.
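For example, a minimal sketch of attaching a grammar, assuming a Chromium-based browser where the constructor is webkit-prefixed and assuming the aviation terms below are the ones you care about:

import SpeechRecognition from 'react-speech-recognition';

// Grab the underlying browser SpeechRecognition object from the React library
const recognition = SpeechRecognition.getRecognition();

// Hint the recognizer with domain vocabulary in JSpeech Grammar Format
const SpeechGrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;
const grammar = '#JSGF V1.0; grammar aviation; public <term> = CTR | QNH | squawk | downwind | runway ;';
const grammarList = new SpeechGrammarList();
grammarList.addFromString(grammar, 1); // weight of 1 = strongest hint
recognition.grammars = grammarList;

Be aware that some browsers accept the grammars property but give it little or no weight in practice, which feeds into the caveats below.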
But there are caveats, including:
Browser support is very limited for speech recognition generally, and for the addition of grammars specifically. If you don't have control over which browser your users will be using, the quality of recognition will vary, and recognition might not work at all unless you use polyfills.
Depending on how the speech recognition is implemented, things like hardware configuration and the Operating System may impact speech recognition results.
Speech recognition is an extremely inexact science. The best automatic speech recognition software/services only boast about 85% accuracy, even with ordinary speech. The engines built into your browser probably won't even be that good.
You may be able to get better accuracy from cloud-based speech services. Azure Cognitive Services, for example, allows you to create custom voice models, custom grammars, etc. Of course, they also charge you based on usage, and they charge more if you're using customizations.
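As a rough sketch with the microsoft-cognitiveservices-speech-sdk package (the key, region and phrases below are placeholders), a phrase list is the lightest-weight customization for biasing recognition toward domain terms:

import * as sdk from 'microsoft-cognitiveservices-speech-sdk';

// Placeholder subscription key and region
const speechConfig = sdk.SpeechConfig.fromSubscription('YOUR_KEY', 'YOUR_REGION');
const audioConfig = sdk.AudioConfig.fromDefaultMicrophoneInput();
const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

// Bias the service toward aviation vocabulary without training a custom model
const phraseList = sdk.PhraseListGrammar.fromRecognizer(recognizer);
['CTR', 'QNH', 'squawk', 'downwind'].forEach(p => phraseList.addPhrase(p));

recognizer.recognizeOnceAsync(result => console.log(result.text));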

Related

Custom Keyword continuous recognition option using Azure Cognitive Speech Services sdk for Javascript

Using the Custom Keyword Recognizer provided by Microsoft Cognitive Speech Services, I would like to set up continuous recognition in the browser using the microsoft-cognitiveservices-speech-sdk npm package. Is there a way to set up continuous keyword recognition? As of right now, only the one-time recognition option is available as part of the SDK. Once the keyword is recognized, it would trigger Speech to Text services to process further speech. After performing the required action, the keyword recognition would once again take over.
Is there a way to accomplish this without using Custom Commands, which provides continuous keyword recognition?
startKeywordRecognitionAsync is available only in the devices SDK, not in the other SDKs.
It starts speech recognition with keyword spotting and keeps going until stopKeywordRecognitionAsync() is called.
Not sure whether this will be applicable to your scenario.
At this point, the devices SDK only works with Roobo dev kits and Azure Kinect DK.
Coming to your requirement, you can always re-arm the keyword recognizer by calling it again (probably in a loop of sorts, or some alternate mechanism that meets your requirement).
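As a sketch of that re-arm idea only: waitForKeyword and handleCommand below are hypothetical placeholders (keyword spotting itself is not exposed outside the devices SDK), while recognizeOnceAsync is the regular single-shot speech-to-text call from the JavaScript SDK.

// Placeholder for whatever single-shot keyword recognition your SDK/device actually exposes
async function waitForKeyword() { /* resolve when the keyword is heard */ }

// Placeholder for whatever action you take on the recognized utterance
function handleCommand(text) { console.log('command:', text); }

async function keywordLoop(recognizer) {
    while (true) {
        await waitForKeyword();                              // 1. wait for the wake word
        const result = await new Promise((resolve, reject) =>
            recognizer.recognizeOnceAsync(resolve, reject)); // 2. run speech-to-text once
        handleCommand(result.text);                          // 3. act on the utterance
        // 4. loop around, which re-arms the keyword step
    }
}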

In WebBluetooth development, How do i know UUIDs for services and characteristics that is offered by the device BLE peripheral?

I am assigned to use WebBluetooth functionality to connect to a Bluetooth BLE Printer and do some PRINTING.
Upon reading the documentation, I figured that the way to do this is through some Web Bluetooth libraries that enable you to CONNECT -> DISCOVER SERVICE -> DISCOVER CHARACTERISTIC and then operate on that (a sketch of this flow follows below).
My problem is how to access these; in the sample code I see, they put hex UUIDs in for filtering.
My question now is: how do I know the CHARACTERISTIC UUIDs and SERVICE UUIDs on my BLE printer so that I can control it to do some printing?
I would really appreciate any inputs regarding this.
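For reference, here is a bare sketch of the CONNECT -> DISCOVER SERVICE -> DISCOVER CHARACTERISTIC flow mentioned above that logs the UUIDs the device exposes. Note that Web Bluetooth only lets a page touch services it declares up front, so the service UUID below is a placeholder you would replace with your printer's (vendors often document it, or a native BLE scanner app can reveal it):

// Placeholder service UUID - Web Bluetooth requires declaring the services you want up front
const PRINTER_SERVICE_UUID = '000018f0-0000-1000-8000-00805f9b34fb';

async function listPrinterUuids() {
    const device = await navigator.bluetooth.requestDevice({
        acceptAllDevices: true,
        optionalServices: [PRINTER_SERVICE_UUID],
    });
    const server = await device.gatt.connect();                      // CONNECT
    const services = await server.getPrimaryServices();              // DISCOVER SERVICE
    for (const service of services) {
        console.log('service', service.uuid);
        const characteristics = await service.getCharacteristics();  // DISCOVER CHARACTERISTIC
        for (const ch of characteristics) {
            console.log('  characteristic', ch.uuid, ch.properties);
        }
    }
}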

Is it possible to interact with an arbitrary client peripheral from a webapp?

I know a webapp can access media devices like microphones and webcams, and other hardware like a smartphone's GPS. As far as I know, that's done with tightly constrained protocols for each specific type of device.
However, I have an advanced scientific camera. It is only useful with a computer - it ships with a desktop application for controlling it and taking photos. It also ships with a C/C++ SDK to interface with it in your own applications.
The browser obviously doesn't recognize it as a webcam. Even if it did, all but the most basic functionality would be ignored. The camera is on the client side. Is it possible to write a webapp that can interface with that camera and use all of its features? I'm not looking for a full solution, I just don't even know what to google.
Any amount of hand-rolled solution is fair game here: anything from plain JavaScript to browser plug-ins to a custom desktop middleware pseudo-driver that sits between the camera and the browser. Even binding the client to a specific OS is fine.
You can do basic video capture and screen grabs with Silverlight:
https://msdn.microsoft.com/en-us/library/ff602282(v=vs.95).aspx
It is also scriptable from JavaScript:
https://msdn.microsoft.com/en-us/library/cc645085(v=vs.95).aspx
The problem is, Silverlight is going away. Officially not until October 2021, though, so it might still be an option until the browser vendors catch up with HTML5 Media Capture and Streams (a basic sketch follows below):
http://w3c.github.io/mediacapture-main/getusermedia.html
For anything beyond basic capture, though, you're probably looking at a custom browser extension to control the camera's functions through its provided API.
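For the basic-capture path, the Media Capture and Streams spec linked above boils down to getUserMedia; here is a minimal sketch, assuming the camera even shows up as a standard video input (a specialized scientific camera may not):

// Ask for a video stream and pipe it into a <video> element for preview
navigator.mediaDevices.getUserMedia({ video: true })
    .then(stream => {
        const video = document.querySelector('video'); // assumes a <video> tag is on the page
        video.srcObject = stream;
        video.play();
    })
    .catch(err => console.error('camera access failed', err));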

JavaScript Speech-to-Text for blind people

I'm developing a website, and I would like to help blind people use it by voice, so I will use:
Text-to-speech, to give some possibilities to the user
Speech-to-text, to allow the user to select one by voice
I already have some text-to-speech JavaScript libraries (like speak.js), but now I need a good speech-to-text one. There are some solutions for this purpose (like speechapi), but they use Java Applets or Flash, and I want to depend only on JavaScript, to avoid plugins.
I'm trying HTML5's speech input with x-webkit-speech and Google Chrome, and it is good, but you need to click an icon (and blind people can't use a mouse well). Is it possible to use x-webkit-speech by pressing a key? Do you know any alternative API (JavaScript)?
Thank you!
Is it possible to use x-webkit-speech by pressing a key?
According to this post and this post, you cannot override the need to click the microphone to start speech input.
What x-webkit-speech is doing is using the audio capture capabilities of HTML5 and sending the audio to Google's servers for processing, returning the results as JSON. This blogger has reverse-engineered it. You could develop a JavaScript library that looks for a key press to start capturing audio on HTML5-enabled browsers and sends it to Google's service or to one you have created. The downside to using Google's service is that it is an unsupported API and subject to change at any time. The downside to developing your own service is that it can be expensive to develop and maintain.
Do you know any alternative API (JavaScript)?
This post and this post list some services available for speech recognition. I did not see Nuance listed; you may be able to use the Dragon Mobile SDK for this. And you may want to check out iSpeech.
Google Translate has a very good text-to-speech engine; I use it to read text aloud. For example, if you have the text "Welcome to Stack Overflow", you can call:
http://translate.google.com/translate_tts?ie=UTF-8&q=Welcome%20to%20stack%20overflow&tl=en&total=1&idx=0&textlen=23&prev=input
then use the browser's audio support to play it.
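A small sketch of the playback side (the translate_tts endpoint is unofficial, so it may be blocked or rate-limited; treat the URL as illustrative):

// Build the unofficial TTS URL and play it through an HTML5 Audio element
var text = 'Welcome to stack overflow';
var url = 'http://translate.google.com/translate_tts?ie=UTF-8&tl=en&q=' + encodeURIComponent(text);
new Audio(url).play();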
For speech input you can manually activate the listening process; see here:
http://code.google.com/chrome/extensions/experimental.speechInput.html

Interfacing a midi keyboard or other real-time midi input with javascript?

I want to create a simple visualization tool that would represent my playing of a MIDI keyboard on the screen. I play a relatively novel instrument type called the harmonic table:
http://en.wikipedia.org/wiki/Harmonic_table_note_layout
http://www.soundonsound.com/newspix/image/axis49.jpg
And want to build tools to ease their use and to teach others how to use them.
However, I can't find a good way to get MIDI into a JavaScript environment (or, for that matter, into Flash or Java without a large helping of jiggery-pokery slightly beyond my reach, plus code from what look to be rather stale and abandoned open source projects. Neither language is one I'm too enthused to work in for this project in any case).
Is there a suitable library or application that I have missed, that will enable me to do this?
While searching around for another solution (Flash-based, using the functions of the Red5 open source Flash server - really ugly, but I'm desperate at this point) I found someone who had done exactly what I needed, using Java to interface with the hardware. They had started with a Flash solution and recently ported it to JavaScript. Yay!
http://www.abumarkub.net/abublog/?p=505
Don't let the caveats about 'proof of concept' discourage you: the basic functionality appears solid, at least with everything I have been able to throw at it.
So now I'm on my way, and so is anyone else who wants to build JavaScript-based MIDI interfaces/synths/what have you.
I can manipulate real-time midi in javascript! This is much better than flying cars and jetboots.
I have made an NPAPI browser plugin that lets you communicate with MIDI devices from JavaScript.
Currently it works on OSX in Chrome and Safari, and on Windows in Chrome.
I am working on support for other browsers (Firefox, Internet Explorer) and other operating systems (Linux, Android, iOS).
See: http://abumarkub.net/abublog/?p=754
EDIT:
I recently published this module: https://github.com/hems/midi2funk. It's a node.js module that listens to MIDI and broadcasts it through socket.io, so if you have the luxury of running a node.js service locally alongside your client side, you might get some fun out of it...
~~~~~
A few other handy links, roughly ordered by what I think would be most useful for you:
midibridge.js - A Javascript API for interacting with MIDI devices
midi.js sequencing in javascript
jasmid - A Javascript MIDI file reader and synthesiser
Second web midi api working draft published - 11/12/2012
Jazz Soft - MIDI IN / OUT PLUGIN FOR BROWSER
Edit: just realised the thread is old; hopefully the links will help someone ( :
The Web MIDI API is now real in Google Chrome 43+. I even wrote a library to make it easier to work with it. If you are still interested and do not care that it currently only works in Chrome, check it out: https://github.com/cotejp/webmidi
Nowadays browsers support MIDI listening. All you need is:
navigator.requestMIDIAccess().then(requestMIDIAccessSuccess, requestMIDIAccessFailure);
and then listen for incoming messages:
function requestMIDIAccessSuccess(midi) {
    // Attach a message handler to every connected MIDI input
    var inputs = midi.inputs.values();
    for (var input = inputs.next(); input && !input.done; input = inputs.next()) {
        console.log('midi input', input);
        input.value.onmidimessage = midiOnMIDImessage;
    }
    midi.onstatechange = midiOnStateChange;
}
See working example here
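The snippet above references three callbacks it never defines; minimal versions (my assumption of what they would do) might look like:

function midiOnMIDImessage(event) {
    // event.data is a Uint8Array, e.g. [status, note, velocity] for note on/off
    console.log('midi message', event.data);
}

function midiOnStateChange(event) {
    console.log('midi port state change', event.port.name, event.port.state);
}

function requestMIDIAccessFailure(error) {
    console.log('could not access MIDI devices', error);
}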
Most browsers don't allow access to any hardware except the keyboard and mouse - for obvious security reasons - so it's unlikely that you could access a MIDI device unless it's plugged in as one of those devices.
You could try finding a driver that would translate midi output to key presses, and then deal with those in the browser, but this would be a single-computer solution only.
I am really excited by the upcoming Web MIDI API. As far as I know, it's only under discussion and hasn't made it into any browsers yet.
Combined with the Web Audio API which has started to be implemented in some browsers already, it will be possible to have complete software synthesis in the browser. Exciting times.
Since the Web MIDI API is still a draft, there is no way to access MIDI events directly in the browser.
A simple workaround could be to write a small server that registers MIDI events and communicates them to your JavaScript over a WebSocket. This could be done quite easily in Python.
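The browser side of that workaround is small; here is a sketch, where the port and message format are assumptions that must match whatever your local MIDI-forwarding server sends:

// Connect to a hypothetical local server that forwards MIDI events as JSON
var socket = new WebSocket('ws://localhost:8080');
socket.onmessage = function (msg) {
    var midiEvent = JSON.parse(msg.data); // e.g. { type: 'noteon', note: 60, velocity: 100 }
    console.log('midi from server', midiEvent);
};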
