I am trying to show my laptop's speaker output level in my application. I am new to WebRTC and the Web Audio API, so I want to confirm whether this feature is possible. The application is an Electron application with a calling feature: when the user at the other end of the call speaks, the application should display an output level that varies with the sound. I have tried WebRTC and the Web Audio API and have looked at a sample. I am able to log values, but they change when I speak into the microphone, whereas I need only the speaker's values, not the microphone's.
export class OutputLevelsComponent implements OnInit {
constructor() { }
ngOnInit(): void {
this.getAudioLevel()
}
getAudioLevel() {
try {
navigator.mediaDevices.enumerateDevices().then(devices => {
console.log("device:", devices);
let constraints = {
audio : {
deviceId: devices[3].deviceId
}
}
navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
console.log("stream test: ", stream);
this.handleSuccess(stream)
});
});
} catch(e) {
console.log("error getting media devices: ", e);
}
}
handleSuccess(stream: any) {
console.log("stream: ", stream);
var context = new AudioContext();
var analyser = context.createScriptProcessor(1024, 1, 1);
var source = context.createMediaStreamSource(stream);
source.connect(analyser);
// source.connect(context.destination);
analyser.connect(context.destination);
opacify();
function opacify() {
analyser.onaudioprocess = function(e) {
// no need to get the output buffer anymore
var int = e.inputBuffer.getChannelData(0);
var max = 0;
for (var i = 0; i < int.length; i++) {
max = int[i] > max ? int[i] : max;
}
if (max > 0.01) {
console.log("max: ", max);
}
}
}
}
}
I have tried the above code, which uses enumerateDevices() to get the list of devices and getUserMedia() to open a stream. For demo purposes I take the last device, whose kind property is 'audiooutput', and access its stream.
Please let me know if this is even possible with Web Audio API. If not, is there any other tool that can help me implement this feature?
Thanks in advance.
You would need to use your handleSuccess() function with the stream that you get from the remote end. That stream usually gets exposed as part of the track event.
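As a rough sketch of that idea, assuming you already have an RTCPeerConnection for the call: the helper names meterRemoteAudio and peakLevel are hypothetical, and an AnalyserNode is swapped in for the ScriptProcessorNode used in the question. Unlike the loop in the question, the peak here is taken over absolute values, so negative samples count too.

```javascript
// Peak absolute amplitude of one block of samples, in the range 0..1.
function peakLevel(samples) {
  let peak = 0;
  for (const sample of samples) {
    const magnitude = Math.abs(sample);
    if (magnitude > peak) peak = magnitude;
  }
  return peak;
}

// Hypothetical helper: reports the level of the remote audio of an
// existing RTCPeerConnection by listening for its 'track' event.
function meterRemoteAudio(peerConnection, onLevel) {
  peerConnection.addEventListener('track', (event) => {
    if (event.track.kind !== 'audio') return;

    // Use the stream that carries the remote track, or wrap the track itself.
    const stream = event.streams[0] || new MediaStream([event.track]);
    const context = new AudioContext();
    const source = context.createMediaStreamSource(stream);
    const analyser = context.createAnalyser();
    analyser.fftSize = 1024;
    source.connect(analyser); // metering only, nothing is routed to the speakers

    const samples = new Float32Array(analyser.fftSize);
    const tick = () => {
      analyser.getFloatTimeDomainData(samples);
      onLevel(peakLevel(samples));
      requestAnimationFrame(tick);
    };
    tick();
  });
}
```

Usage would look like meterRemoteAudio(pc, level => console.log('remote level:', level)), where pc is your own connection object. Because only the remote track is fed into the analyser, speaking into the local microphone should not move the meter.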
The problem is likely linked to the machine you are running on. On macOS, there is no way to capture system audio output from browser APIs, as it requires a signed kernel extension. Potential workarounds are BlackHole or Soundflower. On Windows, the code should work fine though.
I've implemented the Google Cloud Speech to Text API using realtime streamed audio for the past few weeks. While initially everything looked really well, I've been testing the product on some more devices lately and have found some real weird irregularities when it comes to some iDevices.
First of all, here are the relevant code pieces:
Frontend (React Component)
constructor(props) {
super(props);
this.audio = props.audio;
this.socket = new SocketClient();
this.bufferSize = 2048;
}
/**
* Initializes the users microphone and the audio stream.
*
* @return {void}
*/
startAudioStream = async () => {
const AudioContext = window.AudioContext || window.webkitAudioContext;
this.audioCtx = new AudioContext();
this.processor = this.audioCtx.createScriptProcessor(this.bufferSize, 1, 1);
this.processor.connect(this.audioCtx.destination);
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
/* Debug through instant playback:
this.audio.srcObject = stream;
this.audio.play();
return; */
this.globalStream = stream;
this.audioCtx.resume();
this.input = this.audioCtx.createMediaStreamSource(stream);
this.input.connect(this.processor);
this.processor.onaudioprocess = (e) => {
this.microphoneProcess(e);
};
this.setState({ streaming: true });
}
/**
* Processes microphone input and passes it to the server via the open socket connection.
*
* @param {AudioProcessingEvent} e
* @return {void}
*/
microphoneProcess = (e) => {
const { speaking, askingForConfirmation, askingForErrorConfirmation } = this.state;
const left = e.inputBuffer.getChannelData(0);
const left16 = Helpers.downsampleBuffer(left, 44100, 16000);
if (speaking === false) {
this.socket.emit('stream', {
audio: left16,
context: askingForConfirmation || askingForErrorConfirmation ? 'zip_code_yes_no' : 'zip_code',
speechContext: askingForConfirmation || askingForErrorConfirmation ? ['ja', 'nein', 'ne', 'nö', 'falsch', 'neu', 'korrektur', 'korrigieren', 'stopp', 'halt', 'neu'] : ['$OPERAND'],
});
}
}
Helpers (DownsampleBuffer)
/**
* Downsamples a given audio buffer from sampleRate to outSampleRate.
* #param {Array} buffer The audio buffer to downsample.
* #param {number} sampleRate The original sample rate.
* #param {number} outSampleRate The new sample rate.
* #return {Array} The downsampled audio buffer.
*/
static downsampleBuffer(buffer, sampleRate, outSampleRate) {
if (outSampleRate === sampleRate) {
return buffer;
}
if (outSampleRate > sampleRate) {
throw new Error('Downsampling rate should be smaller than original sample rate');
}
const sampleRateRatio = sampleRate / outSampleRate;
const newLength = Math.round(buffer.length / sampleRateRatio);
const result = new Int16Array(newLength);
let offsetResult = 0;
let offsetBuffer = 0;
while (offsetResult < result.length) {
const nextOffsetBuffer = Math.round((offsetResult + 1) * sampleRateRatio);
let accum = 0;
let count = 0;
for (let i = offsetBuffer; i < nextOffsetBuffer && i < buffer.length; i++) {
accum += buffer[i];
count++;
}
result[offsetResult] = Math.min(1, accum / count) * 0x7FFF;
offsetResult++;
offsetBuffer = nextOffsetBuffer;
}
return result.buffer;
}
Backend (Socket Server)
io.on('connection', (socket) => {
logger.debug('New client connected');
const speechClient = new SpeechService(socket);
socket.on('stream', (data) => {
const audioData = data.audio;
const context = data.context;
const speechContext = data.speechContext;
speechClient.transcribe(audioData, context, speechContext);
});
});
Backend (Speech Client / Transcribe Function where data is sent to GCloud)
async transcribe(data, context, speechContext, isFile = false) {
if (!this.recognizeStream) {
logger.debug('Initiating new Google Cloud Speech client...');
let waitingForMoreData = false;
// Create new stream to the Google Speech client
this.recognizeStream = this.speechClient
.streamingRecognize({
config: {
encoding: 'LINEAR16',
sampleRateHertz: 16000,
languageCode: 'de-DE',
speechContexts: speechContext ? [{ phrases: speechContext }] : undefined,
},
interimResults: false,
singleUtterance: true,
})
.on('error', (error) => {
if (error.code === 11) {
this.recognizeStream.destroy();
this.recognizeStream = null;
return;
}
this.socket.emit('error');
this.recognizeStream.destroy();
this.recognizeStream = null;
logger.error(`Received error from Google Cloud Speech client: ${error.message}`);
})
.on('data', async (gdata) => {
if ((!gdata.results || !gdata.results[0]) && gdata.speechEventType === 'END_OF_SINGLE_UTTERANCE') {
logger.debug('Received END_OF_SINGLE_UTTERANCE - waiting 300ms for more data before restarting stream');
waitingForMoreData = true;
setTimeout(() => {
if (waitingForMoreData === true) {
// User was silent for too long - restart stream
this.recognizeStream.destroy();
this.recognizeStream = null;
}
}, 300);
return;
}
waitingForMoreData = false;
const transcription = gdata.results[0].alternatives[0].transcript;
logger.debug(`Transcription: ${transcription}`);
// Emit transcription and MP3 file of answer
this.socket.emit('transcription', transcription);
const filename = await ttsClient.getAnswerFromTranscription(transcription, 'fairy', context); // TODO-Final: Dynamic character
if (filename !== null) this.socket.emit('speech', `${config.publicScheme}://${config.publicHost}:${config.publicPort}/${filename}`);
// Restart stream
if (this.recognizeStream) this.recognizeStream.destroy();
this.recognizeStream = null;
});
}
// eslint-disable-next-line security/detect-non-literal-fs-filename
if (isFile === true) fs.createReadStream(data).pipe(this.recognizeStream);
else this.recognizeStream.write(data);
}
Now, the behavior varies heavily throughout my tested devices. I've originally developed on an iMac 2017 using Google Chrome as a browser. Works like a charm. Then, tested on an iPhone 11 Pro and iPad Air 4, both on Safari and as a full-screen web app. Again, works like a charm.
Afterwards I tried an iPad Pro 12.9" 2017. Suddenly, Google Cloud sometimes doesn't return a transcription at all, and at other times it returns text that only with a lot of imagination sounds like what was actually spoken. Same behavior on an iPad 5 and an iPhone 6 Plus.
I don't really know where to go from here. What I've read so far is that with the iPhone 6s (no idea about iPads, unfortunately) the hardware sample rate was changed from 44.1 kHz to 48 kHz. So I thought this might be it and played around with the sample rates everywhere in the code, with no success. I've also noticed that my iMac with Google Chrome runs at 44.1 kHz as well, just like the "old" iPads where transcription doesn't work. Likewise, the new iPads run at 48 kHz - and here everything works fine. So this can't be it.
What I've noticed as well: When I connect some AirPods to the "broken" devices and use them as audio input, everything works again. So this must have something to do with processing of the internal microphone of those devices. I just don't know what exactly.
Could anyone lead me to the right direction? What has changed between these device generations in regards to audio and the microphone?
Update 1: I've now implemented a quick function which writes the streamed PCM data from the frontend to a file in the backend using node-wav. I think, I'm getting closer now - on the devices, where the speech recognition goes nuts, I sound like a chipmunk (extremely high-pitched). I've also noticed that the binary audio data is flowing in way slower than on the devices where everything is working fine. So this probably has to do with sample/bit rate, encoding or something. Unfortunately I'm not an audio expert, so not sure what to do next.
Update 2: After a lot of trial and error, I've found that if I set the sample rate to about 9500 to 10000 in the Google Cloud RecognizeConfig, everything works. When I set this as the sample rate for the node-wav file output, it sounds okay as well. If I reset the "outgoing" sample rate to GCloud to 16000 again and downsample the audio input to about 25000 instead of 16000 in the frontend from 44100 (see "Frontend (React Component)" in the "microphoneProcess" function), it works as well. So there seems to be some kind of ~0.6 factor in sample rate differences. However, I still don't know where this behavior is coming from: Both Chrome on the working iMac and Safari on the "broken" iPads have an audioContext.sampleRate of 44100. Therefore, when I downsample them to 16000 in the code, I'd suppose both should work, whereas only the iMac works. It seems like the iPad is working with a different sample rate internally?
After a ton of trial and error, I've found the problem (and the solution).
It seems that "older" iDevice models - like the 2017 iPad Pro - have the peculiarity of automatically adjusting the microphone sample rate to the rate of the audio being played. Even though the hardware sample rate of those devices is set to 44.1 kHz, the rate changes as soon as some audio is played. This can be observed with something like this:
const audioCtx = new webkitAudioContext();
console.log(`Current sample rate: ${audioCtx.sampleRate}`); // 44100
const audio = new Audio();
audio.src = 'some_audio.mp3';
await audio.play();
console.log(`Current sample rate: ${audioCtx.sampleRate}`); // Sample rate of the played audio
In my case I played some synthesized speech from Google Text-to-Speech before opening the speech transcription socket. Those sound files have a sample rate of 24 kHz - exactly the sample rate Google Cloud received my audio input in.
The solution therefore was - something I should have done anyway - to downsample everything to 16 kHz (see my helper function in the question), not from a hard-coded 44.1 kHz but from the current sample rate of the audio context. So I've changed my microphoneProcess() function like this:
const left = e.inputBuffer.getChannelData(0);
const left16 = Helpers.downsampleBuffer(left, this.audioCtx.sampleRate, 16000);
Conclusion: Do not trust Safari with the sample rate on page load. It might change.
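To make the fix concrete, here is a minimal sketch of a downsampler that is always fed the live rate. The names downsampleToInt16 and handleAudioProcess are made up for illustration, and this is a simplified variant of the helper in the question rather than the exact code I run: it averages input samples per output sample and, as an addition of mine, clamps on both the positive and the negative side before converting to 16-bit.

```javascript
// Downsamples Float32 samples to 16-bit PCM by averaging the input samples
// that fall into each output sample. Both rates are passed in by the caller.
function downsampleToInt16(samples, inputRate, outputRate) {
  if (outputRate > inputRate) {
    throw new RangeError('outputRate must not exceed inputRate');
  }
  const ratio = inputRate / outputRate;
  const result = new Int16Array(Math.floor(samples.length / ratio));
  for (let i = 0; i < result.length; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(samples.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += samples[j];
    const mean = end > start ? sum / (end - start) : 0;
    const clamped = Math.max(-1, Math.min(1, mean)); // keep within [-1, 1]
    result[i] = Math.round(clamped * 0x7fff);
  }
  return result;
}

// Hypothetical onaudioprocess handler: the rate is read from the buffer that
// is actually delivered, never from a constant captured at page load.
function handleAudioProcess(event, send) {
  const input = event.inputBuffer;
  send(downsampleToInt16(input.getChannelData(0), input.sampleRate, 16000).buffer);
}
```

Reading the rate from the delivered buffer (or from this.audioCtx.sampleRate at call time, as above) means a rate change after playback is picked up on the next block.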
I have an HTML5 video element I'm trying to increase the volume of.
I'm using the code I found in this answer
However there is no sound coming out of the speakers. If I disable it sound is fine.
videoEl.muted = true //tried with this disabled or enabled
if(!window.audio)
window.audio = amplify(vol)
else
window.audio.amplify(vol)
...
export function amplify(multiplier) {
const media = document.getElementById('videoEl')
// @ts-ignore
var context = new(window.AudioContext || window.webkitAudioContext),
result = {
context: context,
source: context.createMediaElementSource(media),
gain: context.createGain(),
media,
amplify: function(multiplier) {
result.gain.gain.value = multiplier;
},
getAmpLevel: function() {
return result.gain.gain.value;
}
};
result.source.connect(result.gain)
result.gain.connect(context.destination)
result.amplify(multiplier)
return result;
}
That value is set to 3 for testing.
Any idea why I'm getting no sound?
I also have Howler running for other audio files, could it be blocking the web audio API?
I'm a beginner in the Web Development world. I want to pair a BLE device (NRF52840 DK) with my mobile device through a web page. I tested an example for Web Bluetooth and it works fine on my PC as you can see on the following image:
Pairing my PC (Chrome) with a BLE device successfully
I served it with the popular Live Server extension for VS Code. When I tried to access that page on my mobile (Android) using my IP and port and pressed the button, nothing happened.
Pairing my mobile (Chrome) with a BLE device unsuccessfully
Is there something that I'm not taking into consideration?
Here is the HTML & JS code:
<!DOCTYPE html>
<html>
<head>
<title>BLE WebApp</title>
</head>
<body>
<form>
<button>Connect with BLE device</button>
</form>
<script>
var deviceName = 'Nordic_HRM'
function isWebBluetoothEnabled() {
if (!navigator.bluetooth) {
console.log('Web Bluetooth API is not available in this browser!')
return false
}
return true
}
function getDeviceInfo() {
let chosenHeartRateService = null;
console.log('Requesting Bluetooth Device...')
navigator.bluetooth.requestDevice({ filters: [{ services: ['heart_rate'] }] })
.then(device => device.gatt.connect())
.then(server => {
console.log("Getting HR Service…")
return server.getPrimaryService('heart_rate');
})
.then(service => {
chosenHeartRateService = service;
return Promise.all([
service.getCharacteristic('heart_rate_measurement')
.then(handleHeartRateMeasurementCharacteristic),
]);
})
}
function handleHeartRateMeasurementCharacteristic(characteristic) {
return characteristic.startNotifications()
.then(char => {
characteristic.addEventListener('characteristicvaluechanged',
onHeartRateChanged);
});
}
function onHeartRateChanged(event) {
const characteristic = event.target;
console.log(parseHeartRate(characteristic.value));
}
function parseHeartRate(data) {
const flags = data.getUint8(0);
const rate16Bits = flags & 0x1;
const result = {};
let index = 1;
if (rate16Bits) {
result.heartRate = data.getUint16(index, /*littleEndian=*/true);
index += 2;
} else {
result.heartRate = data.getUint8(index);
index += 1;
}
const contactDetected = flags & 0x2;
const contactSensorPresent = flags & 0x4;
if (contactSensorPresent) {
result.contactDetected = !!contactDetected;
}
const energyPresent = flags & 0x8;
if (energyPresent) {
result.energyExpended = data.getUint16(index, /*littleEndian=*/true);
index += 2;
}
const rrIntervalPresent = flags & 0x10;
if (rrIntervalPresent) {
const rrIntervals = [];
for (; index + 1 < data.byteLength; index += 2) {
rrIntervals.push(data.getUint16(index, /*littleEndian=*/true));
}
result.rrIntervals = rrIntervals;
}
return result;
}
document.querySelector('form').addEventListener('submit', function(event) {
event.stopPropagation()
event.preventDefault()
if (isWebBluetoothEnabled()) {
getDeviceInfo()
}
})
</script>
</body>
</html>
I don't think it's possible in the way you describe it.
Web Bluetooth only works on pages served over HTTPS. It also works on localhost for testing purposes. Since you access the web application via an IP address that doesn't have HTTPS enabled, Web Bluetooth is not available.
https://web.dev/bluetooth/#https-only
You can test this by adding an element to the page whose text shows whether Web Bluetooth is available.
function isWebBluetoothEnabled() {
if (!navigator.bluetooth) {
console.log('Web Bluetooth API is not available in this browser!')
document.getElementById('bluetoothState').innerText = 'Not available'
return false
}
document.getElementById('bluetoothState').innerText = 'Available'
return true
}
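A slightly more specific check could also report why the API is missing. This is only a sketch: describeWebBluetoothSupport is a hypothetical helper name, and it relies on the standard isSecureContext property to tell "page not served securely" apart from "browser has no Web Bluetooth".

```javascript
// Returns a human-readable status for the given global object (normally window).
function describeWebBluetoothSupport(env) {
  if (!env.isSecureContext) {
    return 'Not available: page is not a secure context (use HTTPS or localhost)';
  }
  if (!env.navigator || !env.navigator.bluetooth) {
    return 'Not available: this browser does not expose navigator.bluetooth';
  }
  return 'Available';
}
```

In the page you could then write document.getElementById('bluetoothState').innerText = describeWebBluetoothSupport(window), assuming an element with that id exists. When opened over plain HTTP via the IP address, the first message should appear on the phone.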
Use mkcert to create a certificate.
Usage:
mkcert 127.0.0.1 localhost 0.0.0.0 192.168.0.X
Then reference the generated files in the settings.json file in the .vscode directory (create it if it doesn't exist):
{
"liveServer.settings.https": {
"enable": true,
"cert": "/full/path/to/file/192.168.0.108+3.pem",
"key": "/full/path/to/file/192.168.0.108+3-key.pem",
"passphrase": ""
}
}
This will produce a one-time warning because of the nature of self-signed certificates, but it makes sure that almost all HTTPS features work.
The issue I faced with this method is that Chrome does not consider it secure enough to install a PWA on a local device.
I am doing a POC, and my requirement is to implement a feature like "OK Google" or "Hey Siri" in the browser.
I am using Chrome's Web Speech API. I noticed that I can't keep recognition running continuously, because it terminates automatically after a certain period of time; I understand this is due to security concerns. As a hack, I restart SpeechRecognition from its end event whenever it terminates, but this is not a good way to implement such a solution: if I have two instances of the same application open in different browser tabs, it doesn't work, and if another application in my browser also uses speech recognition, neither application behaves as expected. I am looking for a better approach to this problem.
Thanks in advance.
Since your problem is that you can't run the SpeechRecognition continuously for long periods of time, one way would be to start the SpeechRecognition only when you get some input in the mic.
This way only when there is some input, you will start the SR, looking for your magic_word.
If the magic_word is found, then you will be able to use the SR normally for your other tasks.
Input on the mic can be detected with the Web Audio API, which is not bound by the time restriction that SR suffers from. You can feed it a LocalMediaStream from MediaDevices.getUserMedia.
For more info on the script below, you can see this answer.
Here is how you could attach it to a SpeechRecognition:
const magic_word = ##YOUR_MAGIC_WORD##;
// initialize our SpeechRecognition object
let recognition = new webkitSpeechRecognition();
recognition.lang = 'en-US';
recognition.interimResults = false;
recognition.maxAlternatives = 1;
recognition.continuous = true;
// detect the magic word
recognition.onresult = e => {
// extract all the transcripts
var transcripts = [].concat.apply([], [...e.results]
.map(res => [...res]
.map(alt => alt.transcript)
)
);
if(transcripts.some(t => t.indexOf(magic_word) > -1)){
//do something awesome, like starting your own command listeners
}
else{
// didn't understand...
}
}
// called when we detect silence
function stopSpeech(){
recognition.stop();
}
// called when we detect sound
function startSpeech(){
try{ // calling it twice will throw...
recognition.start();
}
catch(e){}
}
// request a LocalMediaStream
navigator.mediaDevices.getUserMedia({audio:true})
// add our listeners
.then(stream => detectSilence(stream, stopSpeech, startSpeech))
.catch(e => console.error(e.message));
function detectSilence(
stream,
onSoundEnd = _=>{},
onSoundStart = _=>{},
silence_delay = 500,
min_decibels = -80
) {
const ctx = new AudioContext();
const analyser = ctx.createAnalyser();
const streamNode = ctx.createMediaStreamSource(stream);
streamNode.connect(analyser);
analyser.minDecibels = min_decibels;
const data = new Uint8Array(analyser.frequencyBinCount); // will hold our data
let silence_start = performance.now();
let triggered = false; // trigger only once per silence event
function loop(time) {
requestAnimationFrame(loop); // we'll loop every 60th of a second to check
analyser.getByteFrequencyData(data); // get current data
if (data.some(v => v)) { // if there is data above the given db limit
if(triggered){
triggered = false;
onSoundStart();
}
silence_start = time; // set it to now
}
if (!triggered && time - silence_start > silence_delay) {
onSoundEnd();
triggered = true;
}
}
loop();
}
As a plunker, since neither StackSnippets nor jsfiddle's iframes will allow gUM in two versions...
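The start/stop logic of detectSilence() can also be looked at in isolation. Below is a small sketch of just that state machine, with no audio APIs involved; createSilenceGate is a hypothetical name, and the caller is assumed to supply a boolean "is there sound right now" plus a timestamp in milliseconds on every check.

```javascript
// Returns an update(hasSound, now) function that fires onSoundEnd once after
// silenceDelay ms without sound, and onSoundStart when sound resumes after that.
function createSilenceGate({
  silenceDelay = 500,
  onSoundStart = () => {},
  onSoundEnd = () => {},
} = {}) {
  let silenceStart = null; // time of the last check that contained sound
  let silent = false;      // true once onSoundEnd fired for the current silence

  return function update(hasSound, now) {
    if (silenceStart === null) silenceStart = now;
    if (hasSound) {
      if (silent) {
        silent = false;
        onSoundStart();
      }
      silenceStart = now;
    } else if (!silent && now - silenceStart > silenceDelay) {
      silent = true;
      onSoundEnd();
    }
  };
}
```

Inside the animation-frame loop above, this would be driven with something like gate(data.some(v => v), time), passing stopSpeech as onSoundEnd and startSpeech as onSoundStart.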
I am building a project similar to this example with jsartoolkit5, and I would like to be able to select the back camera of my device instead of letting Chrome on Android select the front one as default.
According to the example in this demo, I have added the code below to switch camera automatically if the device has a back camera.
var videoElement = document.querySelector('canvas');
function successCallback(stream) {
window.stream = stream; // make stream available to console
videoElement.src = window.URL.createObjectURL(stream);
videoElement.play();
}
function errorCallback(error) {
console.log('navigator.getUserMedia error: ', error);
}
navigator.mediaDevices.enumerateDevices().then(
function(devices) {
for (var i = 0; i < devices.length; i++) {
if (devices[i].kind == 'videoinput' && devices[i].label.indexOf('back') !== -1) {
if (window.stream) {
videoElement.src = null;
window.stream.stop();
}
var constraints = {
video: {
optional: [{
sourceId: devices[i].deviceId
}]
}
};
navigator.getUserMedia(constraints, successCallback, errorCallback);
}
}
}
);
The issue is that it works perfectly for a <video> tag, but unluckily jsartoolkit renders the content inside a canvas and it consequently throws an error.
I have also tried to follow the instructions in this closed issue in the Github repository, but this time I get the following error: DOMException: play() can only be initiated by a user gesture.
Do you know or have any suggestion on how to solve this issue?
Thanks in advance for your replies!
Main problem :
You are mixing old and new getUserMedia syntax.
navigator.getUserMedia is deprecated, and navigator.mediaDevices.getUserMedia should be preferred.
Also, I think that optional is not part of the constraints dictionary anymore.
Default Solution
This part is almost a duplicate of this answer : https://stackoverflow.com/a/32364912/3702797
You should be able to call directly
navigator.mediaDevices.getUserMedia({
video: {
facingMode: {
exact: 'environment'
}
}
})
But Chrome still has this bug, and even though @jib's answer states that it should work with the adapter.js polyfill, I was unable to make it work on my Chrome for Android.
So the previous syntax will currently work only on Firefox for Android.
For Chrome, you'll indeed need to use enumerateDevices, along with adapter.js, to make it work, but don't mix up the syntaxes, and everything should be fine:
let handleStream = s => {
document.body.append(
Object.assign(document.createElement('video'), {
autoplay: true,
srcObject: s
})
);
}
navigator.mediaDevices.enumerateDevices().then(devices => {
let sourceId = null;
// enumerate all devices
for (var device of devices) {
// if there is still no video input, or if this is the rear camera
if (device.kind == 'videoinput' &&
(!sourceId || device.label.indexOf('back') !== -1)) {
sourceId = device.deviceId;
}
}
// we didn't find any video input
if (!sourceId) {
throw 'no video input';
}
let constraints = {
video: {
sourceId: sourceId
}
};
navigator.mediaDevices.getUserMedia(constraints)
.then(handleStream);
});
<script src="https://webrtc.github.io/adapter/adapter-latest.js"></script>
Fiddle for chrome which need https.
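For reference, the same selection can be written with the standard deviceId constraint instead of sourceId. Treat this as a sketch rather than something I tested on every Chrome for Android version: pickCameraId, cameraConstraints and openRearCamera are hypothetical helper names, matching the label against "back" is a heuristic, and device labels are only filled in once the user has granted camera permission.

```javascript
// Picks the rear camera if a label mentions "back", else the first video input.
function pickCameraId(devices) {
  let chosen = null;
  for (const device of devices) {
    if (device.kind !== 'videoinput') continue;
    if (chosen === null || /back/i.test(device.label)) {
      chosen = device.deviceId;
    }
  }
  return chosen;
}

// Builds promise-style getUserMedia constraints for one specific device.
function cameraConstraints(deviceId) {
  return { video: { deviceId: { exact: deviceId } } };
}

// Browser-only wiring: enumerate, pick, then request the stream.
async function openRearCamera() {
  const devices = await navigator.mediaDevices.enumerateDevices();
  const id = pickCameraId(devices);
  if (id === null) throw new Error('no video input');
  return navigator.mediaDevices.getUserMedia(cameraConstraints(id));
}
```

openRearCamera().then(handleStream) would then plug into the handleStream function from the snippet above.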
Make it work with jsartoolkit
You'll have to fork jsartoolkit project and edit artoolkit.api.js.
The main project currently disables mediaDevices.getUserMedia(), so you'll need to enable it again, and you'll also have to add a check for a sourceId option, which we'll add later in the ARController.getUserMediaThreeScene() call.
You can find a rough and ugly implementation of these edits in this fork.
So once it is done, you'll have to rebuild the js files, and then remember to include adapter.js polyfill in your code.
Here is a working fiddle that uses one of the project's demo.