Language problems with WEB API speech recognition on Chrome browser - javascript

I'm working on a JavaScript project that uses the Web Speech API for speech recognition. I'm using some code I found that lets you choose between many languages in a "curtain" menu for the speech recognition. The problem is that it currently only works for Swedish (I live in Sweden) and English. For those two I don't even need to set the language; it transcribes English and Swedish perfectly when I speak. But when I choose Spanish it doesn't work.
When I choose Spanish in the settings of the Chrome browser and put it at the top of the list (before Swedish and English), it works. It doesn't work if I put it in second or third place. Is there any way I can solve this from JavaScript? I don't want to change the language settings in the browser every time I want the speech recognition program to work in Spanish.
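One thing worth checking from JavaScript is whether the menu selection is actually written to the recognizer's lang property before start() is called. The sketch below assumes the menu value is a BCP 47 tag such as es-ES. The helper name createRecognizer and the #language selector are hypothetical, and the sketch does not guarantee that Chrome honours the tag independently of its own language list.

```javascript
// Sketch only: createRecognizer is a hypothetical helper, not part of the Web Speech API.
function createRecognizer(langTag, RecognitionCtor) {
  const recognition = new RecognitionCtor();
  recognition.lang = langTag;           // BCP 47 tag, e.g. 'es-ES', 'sv-SE', 'en-US'
  recognition.continuous = false;       // stop after one utterance
  recognition.interimResults = false;   // only report final results
  return recognition;
}

// Browser-only wiring; skipped where the API is unavailable.
if (typeof window !== 'undefined') {
  const Ctor = window.SpeechRecognition || window.webkitSpeechRecognition;
  const menu = document.querySelector('#language'); // assumed <select> element
  if (Ctor && menu) {
    menu.addEventListener('change', () => {
      // Create a fresh recognizer for the newly selected language.
      const recognition = createRecognizer(menu.value, Ctor);
      recognition.onresult = (event) => {
        console.log(event.results[0][0].transcript);
      };
      recognition.onerror = (event) => console.error(event.error);
      recognition.start();
    });
  }
}
```

Creating a new recognizer per selection keeps the language explicit for each session instead of relying on the browser default.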

Related

Speech-to-text Recognition is not accurate

I am trying to implement Speech-to-text recognition in my React website, and I am using the react-speech-recognition package from npm. I am using the exact code they have specified in the package description over here: npm
It works with everyday speech, but when I introduce technical jargon, it goes way off!
Here's what I am trying to say to it, it's aviation jargon:
Cleared to enter the CTR, not above 1500 feet, join and report on a right downwind runway 19, QNH 1018, squawk 2732
This is what I get in response:
please to enter the city are not above 15 feet heart penetrate join and report on a ride on the wind blown away 9 theme
What else do I need to do to fix the accuracy of the recognition?
That package leverages the Speech Recognition Interface of your browser's Web Speech API. The React Library's API allows you to get the underlying SpeechRecognition object via a call to the getRecognition() method.
The underlying SpeechRecognition object's API allows for the addition of Grammars using the JSpeech Grammar Format. Here's an example. So in theory, you could provide more information about the words you're expecting to hear in your app, and thereby improve performance.
But there are caveats, including:
Browser support for speech recognition is very limited in general, and for grammars in particular. If you don't control which browser your users run, recognition quality will vary, and it might not work at all unless you use polyfills.
Depending on how the speech recognition is implemented, things like hardware configuration and the Operating System may impact speech recognition results.
Speech recognition is an extremely inexact science. The best automatic speech recognition software/services only boast about 85% accuracy, even with ordinary speech. The ones built into your browser probably won't be even that good.
You may be able to get better accuracy from cloud-based speech services. Azure Cognitive Services, for example, allows you to create custom voice models, custom grammars, etc. Of course, they also charge you based on usage, and they charge more if you're using customizations.
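The grammar approach mentioned above can be sketched roughly as follows. The helper names buildJsgf and attachGrammar and the term list are hypothetical, and given the caveats above, a browser may accept the grammar and still ignore it.

```javascript
// Build a JSGF grammar string from a list of expected phrases.
// buildJsgf is a hypothetical helper, not part of any library.
function buildJsgf(name, phrases) {
  const alternatives = phrases.map((p) => p.trim().toLowerCase()).join(' | ');
  return `#JSGF V1.0; grammar ${name}; public <${name}> = ${alternatives} ;`;
}

// Attach the grammar to a SpeechRecognition object.
function attachGrammar(recognition, GrammarListCtor, jsgfText) {
  const list = new GrammarListCtor();
  list.addFromString(jsgfText, 1); // weight in the range 0.0 to 1.0
  recognition.grammars = list;
  return recognition;
}

// Assumed vocabulary, taken from the phrase in the question.
const aviationTerms = ['CTR', 'QNH', 'squawk', 'downwind', 'runway'];
const jsgf = buildJsgf('aviation', aviationTerms);

// Browser-only wiring; skipped where the API is unavailable.
if (typeof window !== 'undefined') {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const GrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;
  if (Recognition && GrammarList) {
    // With react-speech-recognition, the object returned by
    // getRecognition() would be passed here instead of a new instance.
    attachGrammar(new Recognition(), GrammarList, jsgf);
  }
}
```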

progressive web app OCR SDK (JavaScript)

We have developed a web-application which uses OCR technology. We use a paid API-service for that which works pretty well.
Our backend/server uses a Ruby/Rails based API and our frontend/client uses VUE.js.
Now the OCR tech was server side but we want to grow into the offline/PWA (progressive web app) market as well and are currently developing an offline-version of our app. Since the OCR-part of our app has to run in the client, we'd move the OCR tech also client side which means we have to use some sort of PWA compatible OCR tech, e.g. a JavaScript SDK
I have probably searched the whole internet but there does not really seem to be a solution. It all came down to two major providers:
tesseract / probably the biggest OCR project (open source). They offer a JS SDK (tesseract.js) -> http://tesseract.projectnaptha.com/ but this works pretty badly. We have compared its results to the API that we're using and the results are terrible. Hence, we cannot use it.
ABBYY, which is probably the most well-known OCR provider on the web, and they offer OCR scanning for reasonable prices. After calling them, it turns out they unfortunately do not provide any solution for PWAs either, only for native apps.
multiple other small projects which use everything except JS libraries unfortunately
Before giving up and considering developing a native app (which would be quite cost-intensive for us), I wanted to ask here whether there is any OCR solution for PWAs that I have not considered yet. Thx
You can use the ocrad.js open source javascript OCR library: http://antimatter15.com/ocrad.js/demo.html | https://github.com/antimatter15/ocrad.js
It's 3 MB in size, but it works well on lots of examples.
Not a JavaScript solution, but you could try Scandit. It works offline on nearly all platforms and can be integrated into web apps using Cordova: https://www.scandit.com/tag/ocr/ Or use the newest web platform: https://www.scandit.com/scandit-launches-barcode-scanner-sdk-for-web-brings-scanning-to-the-browser/
Try it here:
https://websdk.scandit.com/
Imense offers a compiled client side JavaScript OCR engine aimed at the ID reading market (limited character set). There is a demo that reads text from USB camera input at https://www.imense.co.uk/OCR.html
The library is not free, and the demo requires HTML5 support.

Is it possible to interact with an arbitrary client peripheral from a webapp?

I know a webapp can access media devices like microphones and webcams, and other hardware like a smartphone's GPS. As far as I know, that's done with tightly constrained protocols for each specific type of device.
However, I have an advanced scientific camera. It is only useful with a computer - it ships with a desktop application for controlling it and taking photos. It also ships with a C/C++ SDK to interface with it in your own applications.
The browser obviously doesn't recognize it as a webcam. Even if it did, all but the most basic functionality would be ignored. The camera is on the client side. Is it possible to write a webapp that can interface with that camera and use all of its features? I'm not looking for a full solution, I just don't even know what to google.
Any amount of hand-rolled solution is fair game here: anything from plain JavaScript to browser plug-ins to a custom desktop middleware pseudo-driver that sits between the camera and the browser. Even binding the client to a specific OS is fine.
You can do basic video capture and screen grabs with Silverlight:
https://msdn.microsoft.com/en-us/library/ff602282(v=vs.95).aspx
It is also scriptable from JavaScript:
https://msdn.microsoft.com/en-us/library/cc645085(v=vs.95).aspx
The problem is that Silverlight is going away. Officially not until October 2021, though, so it might still be an option until the browser vendors come online with HTML5 Media Capture and Streams:
http://w3c.github.io/mediacapture-main/getusermedia.html
For anything beyond basic capture, though, you're probably looking at a custom browser extension to control the camera's functions through its provided API.
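For reference, basic capture with the Media Capture and Streams API mentioned above looks roughly like the sketch below. It only covers devices the browser already exposes as standard video inputs, so it would not reach vendor-specific camera features. The helper name openCamera and the video element on the page are assumptions.

```javascript
// Request a video stream. In a browser, mediaDevices is navigator.mediaDevices.
// openCamera is a hypothetical helper; deviceId is optional.
function openCamera(mediaDevices, deviceId) {
  const video = deviceId ? { deviceId: { exact: deviceId } } : true;
  return mediaDevices.getUserMedia({ video, audio: false });
}

// Browser-only wiring; skipped where the API is unavailable.
if (typeof navigator !== 'undefined' && navigator.mediaDevices &&
    typeof document !== 'undefined') {
  openCamera(navigator.mediaDevices)
    .then((stream) => {
      const videoElement = document.querySelector('video'); // assumed <video> element
      if (videoElement) {
        videoElement.srcObject = stream;
        return videoElement.play();
      }
      return undefined;
    })
    .catch((err) => console.error('Camera access failed:', err.name));
}
```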

JavaScript Speech-to-Text for blind people

I'm developing a website, and I would like to help blind people use it by voice, so I will use:
Text-to-speech, to present some possibilities to the user
Speech-to-text, to allow the user to use her voice to select one
I already have some text-to-speech JavaScript libraries (like speak.js), but now I need a good speech-to-text one. There are some solutions for this purpose (like speechapi), but they use Java Applets or Flash, and I want to depend only on JavaScript, to avoid plugins.
I'm trying HTML5's speech input with x-webkit-speech and Google Chrome, and it is good, but you need to click on an icon (and blind people can't use a mouse well). Is it possible to use x-webkit-speech by pressing a key? Do you know any alternative API (JavaScript)?
Thank you!
Is it possible to use x-webkit-speech by pressing a key?
According to this post and this post, you cannot trigger the start of speech input other than by clicking the microphone.
What x-webkit-speech does is use the audio capture capabilities of HTML5 and send the audio to Google's servers for processing, which return the results as JSON. This blogger has reverse engineered it. You could develop a JavaScript library that listens for a key press to start capturing audio on HTML5-enabled browsers and sends it to Google's service or to one you have created. The downside to using Google's service is that it is an unsupported API and subject to change at any time. The downside to developing your own service is that it can be expensive to develop and maintain.
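The key-press idea can be sketched as below. Note that this swaps in the standard Web Speech API (SpeechRecognition) rather than the x-webkit-speech attribute or Google's unsupported endpoint. The helper name bindKeyToRecognition and the choice of F2 as the trigger key are assumptions for illustration.

```javascript
// Start recognition when a chosen key is pressed.
// `target` is anything with addEventListener (e.g. document).
// bindKeyToRecognition is a hypothetical helper.
function bindKeyToRecognition(target, recognition, key) {
  let listening = false;
  // Allow a new session once the current one has ended.
  // This overwrites any existing onend handler on the object.
  recognition.onend = () => { listening = false; };
  target.addEventListener('keydown', (event) => {
    if (event.key !== key || listening) return; // ignore other keys and repeats
    listening = true;
    recognition.start();
  });
}

// Browser-only wiring; skipped where the API is unavailable.
if (typeof window !== 'undefined') {
  const Ctor = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (Ctor) {
    const recognition = new Ctor();
    recognition.onresult = (event) => {
      console.log('Heard:', event.results[0][0].transcript);
    };
    bindKeyToRecognition(document, recognition, 'F2'); // assumed trigger key
  }
}
```

The listening flag avoids calling start() on a session that is already running, which the API treats as an error.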
Do you know any alternative API (JavaScript)?
This post and this post list some services available for speech recognition. I did not see Nuance listed. You may be able to use the Dragon Mobile SDK for this. And you may want to check into iSpeech.
Google Translate is a very good text-to-speech engine. I have used it to read text aloud. For example, if you have the text "Welcome to Stack Overflow", you can call it like this:
http://translate.google.com/translate_tts?ie=UTF-8&q=Welcome%20to%20stack%20overflow&tl=en&total=1&idx=0&textlen=23&prev=input
Then use the browser's audio support to play it.
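A minimal sketch of that idea follows, assuming the endpoint still accepts the parameters shown in the URL above. It is not a documented API, so it may be blocked or changed at any time. The helper name buildTtsUrl is hypothetical.

```javascript
// Build a translate_tts URL using the same parameters as the example URL above.
// buildTtsUrl is a hypothetical helper; the endpoint itself is undocumented.
function buildTtsUrl(text, lang) {
  const query = [
    'ie=UTF-8',
    `q=${encodeURIComponent(text)}`,
    `tl=${encodeURIComponent(lang)}`,
    'total=1',
    'idx=0',
    `textlen=${text.length}`,
    'prev=input',
  ].join('&');
  return `http://translate.google.com/translate_tts?${query}`;
}

// Browser-only playback; skipped where the Audio constructor is unavailable.
if (typeof Audio !== 'undefined') {
  const audio = new Audio(buildTtsUrl('Welcome to stack overflow', 'en'));
  // play() returns a promise that rejects if playback is blocked or the request fails.
  audio.play().catch((err) => console.error('Playback failed:', err));
}
```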
For speech input you can manually activate the listening process; see here:
http://code.google.com/chrome/extensions/experimental.speechInput.html

Which Browser/IDE for rapid add-on development/prototyping?

I'd like to develop an extension for a browser which does the following.
Prerequisite: Text has been selected, add-on has been triggered (e.g. by click in context menu)
read selected text
pass the text to a (e.g. RESTful) webservice
retrieve a list of comments from the webservice
show them in the browser
optional: show also an input field below to send another comment to the webservice
Writing a Firefox add-on became quite annoying since I haven't found proper documentation or an IDE (with a handy build process).
Which Browser/IDE combination do you recommend for rapid add-on development/prototyping?
For Google Chrome, you use web technologies to create extensions (AFAIK Firefox is the same). Google Chrome extensions are documented pretty well: http://code.google.com/chrome/extensions/index.html
For the case you have mentioned, I have answered another user's question on how to capture selected text and send it to a service, with a working example that you can learn from if you want:
Chrome Extension: how to capture selected text and send to a web service
Regarding the tools you can use, it depends on what you're comfortable with. Personally, I just use an editor that has syntax highlighting, such as Vim or Notepad2. Some people use Dreamweaver, Emacs, Notepad, etc. In the end it all comes down to your taste.
Good luck!
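As a rough sketch of the flow in the question (context menu click, read the selection, query a web service), something like the following could live in a Chrome extension's background script. The service URL, the menu id, and the helper names are placeholders. The extension's manifest would also need the contextMenus permission, and newer manifest versions usually expect the menu to be created from an install handler.

```javascript
// Placeholder endpoint; replace with the real comment service.
const COMMENTS_URL = 'https://example.com/api/comments';

// Ask the (hypothetical) web service for comments about the selected text.
function fetchComments(selectedText, fetchImpl) {
  const url = `${COMMENTS_URL}?text=${encodeURIComponent(selectedText)}`;
  return fetchImpl(url).then((response) => response.json());
}

// Add a context menu entry that only appears when text is selected.
function registerContextMenu(chromeApi, onSelection) {
  chromeApi.contextMenus.create({
    id: 'show-comments',
    title: 'Show comments for "%s"', // %s is replaced by the selected text
    contexts: ['selection'],
  });
  chromeApi.contextMenus.onClicked.addListener((info) => {
    if (info.menuItemId === 'show-comments' && info.selectionText) {
      onSelection(info.selectionText);
    }
  });
}

// Extension-only wiring; skipped outside Chrome extensions.
if (typeof chrome !== 'undefined' && chrome.contextMenus) {
  registerContextMenu(chrome, (text) => {
    fetchComments(text, fetch)
      .then((comments) => console.log(comments))
      .catch((err) => console.error('Request failed:', err));
  });
}
```

Showing the comments and the optional input field would then be a matter of passing the result to a popup or an injected content script.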
