I'm trying to make a script that gets the audio from an audio element and plays it on headset and laptop speakers at the same time.
Is this possible?
This is the code that I wrote so far:
const outputs = (await navigator.mediaDevices.enumerateDevices()).filter(({ kind, deviceId }) => kind === 'audiooutput' && deviceId !== 'default' && deviceId !== 'communications' );
var players = [];
document.querySelectorAll('audio').forEach((y)=>{
players.push([]); let l = players.length-1;
outputs.forEach((x)=>{
if(players[l].length===0) players[l].push(y);
else {
let p = y.cloneNode();
p.srcObject = players[l][0].srcObject.clone();
players[l].push(p);
}
});
})
players.forEach((a)=>{
a.forEach((o, i)=>{
o.setSinkId(outputs[i].deviceId);
o.play();
})
})
The issue with this code is that the audio plays only on the other speaker instead of on both.
Note that the window has access to mic so I can see all the devices from navigator.mediaDevices.enumerateDevices().
The script is intended to work mainly on Edge and Chrome.
I found a solution!
Since I wasn't able to play sound on multiple devices at once in the same window, I got this working by creating iframes that contain audio elements.
I give all of those audio elements the same source, set each one to output to a specific device, and sync play/stop with event listeners.
This works because iframes are separated from the main window. I think they even get their own process (in Chromium browsers).
My code looks like this: (note I'm not using static sources, but srcObjects)
navigator.mediaDevices.getUserMedia({audio: true}).then(s=>{
s.getTracks().forEach(x=>x.stop()); //stop mic use because we need only outputs
navigator.mediaDevices.enumerateDevices().then(o=>{
const outputs = o.filter(({ kind, deviceId }) => kind === 'audiooutput' && deviceId !== 'default' && deviceId !== 'communications');
let audioSrc = getAudioSrc(), players = [];
audioSrc.src.pause(); // Pause source to start audio in sync ?
audioSrc.src.addEventListener("pause", () => players.forEach(x=>x.pause()));
audioSrc.src.addEventListener("play", () => players.forEach(x=>x.play()));
outputs.forEach((x)=>{
let ifrm = makeIFrame(), audioEl = makeAudio(ifrm.document, audioSrc.s, x.deviceId);
players.push(audioEl);
});
});
}).catch(e=>console.log(e));
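The helper makeAudio() is not shown in the snippet above. As a rough sketch only, an audio-element helper could look something like the following. The name makeAudioForDevice, its parameters (the iframe's document, a MediaStream, and an output deviceId), and its error handling are assumptions made for illustration, not the code used in the answer.

```javascript
// Hypothetical helper: creates an <audio> element inside the given document,
// attaches the stream, and routes it to one specific output device.
function makeAudioForDevice(doc, stream, deviceId) {
  const audioEl = doc.createElement('audio');
  audioEl.srcObject = stream;
  doc.body.appendChild(audioEl);

  // setSinkId() returns a promise and is not available in every browser,
  // so feature-detect it and report failures instead of throwing.
  if (typeof audioEl.setSinkId === 'function') {
    audioEl.setSinkId(deviceId).catch((err) => {
      console.error('Could not route audio to device', deviceId, err);
    });
  }
  return audioEl;
}
```

Each iframe would get one such element, so every output device ends up with its own player.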
I am trying to show my laptop's speaker level in my application. I am new to WebRTC and the Web Audio API, so I just wanted to confirm whether this feature is possible. The application is an Electron application with a calling feature, so when the user at the other end of the call speaks, the application should display an output level that varies with the sound. I have tried using WebRTC and the Web Audio API, and have even looked at a sample. I am able to log values, but they change when I speak into the microphone, whereas I need only the speaker's values, not the microphone's.
export class OutputLevelsComponent implements OnInit {
constructor() { }
ngOnInit(): void {
this.getAudioLevel()
}
getAudioLevel() {
try {
navigator.mediaDevices.enumerateDevices().then(devices => {
console.log("device:", devices);
let constraints = {
audio : {
deviceId: devices[3].deviceId
}
}
navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
console.log("stream test: ", stream);
this.handleSuccess(stream)
});
});
} catch(e) {
console.log("error getting media devices: ", e);
}
}
handleSuccess(stream: any) {
console.log("stream: ", stream);
var context = new AudioContext();
var analyser = context.createScriptProcessor(1024, 1, 1);
var source = context.createMediaStreamSource(stream);
source.connect(analyser);
// source.connect(context.destination);
analyser.connect(context.destination);
opacify();
function opacify() {
analyser.onaudioprocess = function(e) {
// no need to get the output buffer anymore
var int = e.inputBuffer.getChannelData(0);
var max = 0;
for (var i = 0; i < int.length; i++) {
max = int[i] > max ? int[i] : max;
}
if (max > 0.01) {
console.log("max: ", max);
}
}
}
}
}
I have tried the code above, which uses enumerateDevices() to list the devices and getUserMedia() to open a stream. For demo purposes I take the last device whose kind property is 'audiooutput' and access its stream.
Please let me know if this is even possible with Web Audio API. If not, is there any other tool that can help me implement this feature?
Thanks in advance.
You would need to use your handleSuccess() function with the stream that you get from the remote end. That stream usually gets exposed as part of the track event.
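A rough sketch of that wiring, assuming the call uses an RTCPeerConnection (called peerConnection here, a hypothetical name, since the question does not show how the call is set up). The watchRemoteAudio() and peakLevel() helpers are illustrations as well. Unlike the loop in the question, peakLevel() takes the absolute value of each sample, so negative peaks count too.

```javascript
// Calls onStream with the remote MediaStream whenever a remote audio track arrives.
function watchRemoteAudio(peerConnection, onStream) {
  peerConnection.addEventListener('track', (event) => {
    if (event.track.kind === 'audio' && event.streams.length > 0) {
      onStream(event.streams[0]);
    }
  });
}

// Returns the peak absolute sample value (0..1) of one block of PCM samples,
// e.g. the Float32Array from e.inputBuffer.getChannelData(0).
function peakLevel(samples) {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    const value = Math.abs(samples[i]);
    if (value > peak) peak = value;
  }
  return peak;
}
```

With this, watchRemoteAudio(peerConnection, (stream) => this.handleSuccess(stream)) would feed the remote audio, rather than the local microphone, into the level meter.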
The problem is likely linked to the machine you are running on. On macOS, there is no way to capture system audio output from browser APIs, as it requires a signed kernel extension. Potential workarounds are BlackHole or Soundflower. On Windows, the code should work fine though.
I've implemented the Google Cloud Speech to Text API using realtime streamed audio over the past few weeks. While initially everything looked really good, I've been testing the product on some more devices lately and have found some really weird irregularities on some iDevices.
First of all, here are the relevant code pieces:
Frontend (React Component)
constructor(props) {
super(props);
this.audio = props.audio;
this.socket = new SocketClient();
this.bufferSize = 2048;
}
/**
* Initializes the users microphone and the audio stream.
*
* @return {void}
*/
startAudioStream = async () => {
const AudioContext = window.AudioContext || window.webkitAudioContext;
this.audioCtx = new AudioContext();
this.processor = this.audioCtx.createScriptProcessor(this.bufferSize, 1, 1);
this.processor.connect(this.audioCtx.destination);
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
/* Debug through instant playback:
this.audio.srcObject = stream;
this.audio.play();
return; */
this.globalStream = stream;
this.audioCtx.resume();
this.input = this.audioCtx.createMediaStreamSource(stream);
this.input.connect(this.processor);
this.processor.onaudioprocess = (e) => {
this.microphoneProcess(e);
};
this.setState({ streaming: true });
}
/**
* Processes microphone input and passes it to the server via the open socket connection.
*
* @param {AudioProcessingEvent} e
* @return {void}
*/
microphoneProcess = (e) => {
const { speaking, askingForConfirmation, askingForErrorConfirmation } = this.state;
const left = e.inputBuffer.getChannelData(0);
const left16 = Helpers.downsampleBuffer(left, 44100, 16000);
if (speaking === false) {
this.socket.emit('stream', {
audio: left16,
context: askingForConfirmation || askingForErrorConfirmation ? 'zip_code_yes_no' : 'zip_code',
speechContext: askingForConfirmation || askingForErrorConfirmation ? ['ja', 'nein', 'ne', 'nö', 'falsch', 'neu', 'korrektur', 'korrigieren', 'stopp', 'halt', 'neu'] : ['$OPERAND'],
});
}
}
Helpers (DownsampleBuffer)
/**
* Downsamples a given audio buffer from sampleRate to outSampleRate.
* @param {Array} buffer The audio buffer to downsample.
* @param {number} sampleRate The original sample rate.
* @param {number} outSampleRate The new sample rate.
* @return {Array} The downsampled audio buffer.
*/
static downsampleBuffer(buffer, sampleRate, outSampleRate) {
if (outSampleRate === sampleRate) {
return buffer;
}
if (outSampleRate > sampleRate) {
throw new Error('Downsampling rate should be smaller than original sample rate');
}
const sampleRateRatio = sampleRate / outSampleRate;
const newLength = Math.round(buffer.length / sampleRateRatio);
const result = new Int16Array(newLength);
let offsetResult = 0;
let offsetBuffer = 0;
while (offsetResult < result.length) {
const nextOffsetBuffer = Math.round((offsetResult + 1) * sampleRateRatio);
let accum = 0;
let count = 0;
for (let i = offsetBuffer; i < nextOffsetBuffer && i < buffer.length; i++) {
accum += buffer[i];
count++;
}
// Clamp to [-1, 1] on both sides before scaling to 16-bit PCM
result[offsetResult] = Math.max(-1, Math.min(1, accum / count)) * 0x7FFF;
offsetResult++;
offsetBuffer = nextOffsetBuffer;
}
return result.buffer;
}
Backend (Socket Server)
io.on('connection', (socket) => {
logger.debug('New client connected');
const speechClient = new SpeechService(socket);
socket.on('stream', (data) => {
const audioData = data.audio;
const context = data.context;
const speechContext = data.speechContext;
speechClient.transcribe(audioData, context, speechContext);
});
});
Backend (Speech Client / Transcribe Function where data is sent to GCloud)
async transcribe(data, context, speechContext, isFile = false) {
if (!this.recognizeStream) {
logger.debug('Initiating new Google Cloud Speech client...');
let waitingForMoreData = false;
// Create new stream to the Google Speech client
this.recognizeStream = this.speechClient
.streamingRecognize({
config: {
encoding: 'LINEAR16',
sampleRateHertz: 16000,
languageCode: 'de-DE',
speechContexts: speechContext ? [{ phrases: speechContext }] : undefined,
},
interimResults: false,
singleUtterance: true,
})
.on('error', (error) => {
if (error.code === 11) {
this.recognizeStream.destroy();
this.recognizeStream = null;
return;
}
this.socket.emit('error');
this.recognizeStream.destroy();
this.recognizeStream = null;
logger.error(`Received error from Google Cloud Speech client: ${error.message}`);
})
.on('data', async (gdata) => {
if ((!gdata.results || !gdata.results[0]) && gdata.speechEventType === 'END_OF_SINGLE_UTTERANCE') {
logger.debug('Received END_OF_SINGLE_UTTERANCE - waiting 300ms for more data before restarting stream');
waitingForMoreData = true;
setTimeout(() => {
if (waitingForMoreData === true) {
// User was silent for too long - restart stream
this.recognizeStream.destroy();
this.recognizeStream = null;
}
}, 300);
return;
}
waitingForMoreData = false;
const transcription = gdata.results[0].alternatives[0].transcript;
logger.debug(`Transcription: ${transcription}`);
// Emit transcription and MP3 file of answer
this.socket.emit('transcription', transcription);
const filename = await ttsClient.getAnswerFromTranscription(transcription, 'fairy', context); // TODO-Final: Dynamic character
if (filename !== null) this.socket.emit('speech', `${config.publicScheme}://${config.publicHost}:${config.publicPort}/${filename}`);
// Restart stream
if (this.recognizeStream) this.recognizeStream.destroy();
this.recognizeStream = null;
});
}
// eslint-disable-next-line security/detect-non-literal-fs-filename
if (isFile === true) fs.createReadStream(data).pipe(this.recognizeStream);
else this.recognizeStream.write(data);
}
Now, the behavior varies heavily throughout my tested devices. I've originally developed on an iMac 2017 using Google Chrome as a browser. Works like a charm. Then, tested on an iPhone 11 Pro and iPad Air 4, both on Safari and as a full-screen web app. Again, works like a charm.
Afterwards I tried an iPad Pro 12.9" 2017. Suddenly, Google Cloud sometimes doesn't return a transcription at all; at other times it returns text that only with a lot of imagination resembles what was actually spoken. Same behavior on an iPad 5 and an iPhone 6 Plus.
I don't really know where to go from here. What I've read so far is that with the iPhone 6s (no idea about iPads, unfortunately) the hardware sample rate was changed from 44.1 kHz to 48 kHz. So I thought this might be it and played around with the sample rates everywhere in the code, with no success. Also, I've noticed that my iMac with Google Chrome also runs at 44.1 kHz, like the "old" iPads where transcription doesn't work. Likewise, the new iPads run at 48 kHz, and there everything works fine. So this can't be it.
What I've noticed as well: When I connect some AirPods to the "broken" devices and use them as audio input, everything works again. So this must have something to do with processing of the internal microphone of those devices. I just don't know what exactly.
Could anyone lead me to the right direction? What has changed between these device generations in regards to audio and the microphone?
Update 1: I've now implemented a quick function which writes the streamed PCM data from the frontend to a file in the backend using node-wav. I think, I'm getting closer now - on the devices, where the speech recognition goes nuts, I sound like a chipmunk (extremely high-pitched). I've also noticed that the binary audio data is flowing in way slower than on the devices where everything is working fine. So this probably has to do with sample/bit rate, encoding or something. Unfortunately I'm not an audio expert, so not sure what to do next.
Update 2: After a lot of trial and error, I've found that if I set the sample rate to about 9500 to 10000 in the Google Cloud RecognizeConfig, everything works. When I set this as the sample rate for the node-wav file output, it sounds okay as well. If I reset the "outgoing" sample rate to GCloud to 16000 again and downsample the audio input to about 25000 instead of 16000 in the frontend from 44100 (see "Frontend (React Component)" in the "microphoneProcess" function), it works as well. So there seems to be some kind of ~0.6 factor in sample rate differences. However, I still don't know where this behavior is coming from: both Chrome on the working iMac and Safari on the "broken" iPads have an audioContext.sampleRate of 44100. Therefore, when I downsample them to 16000 in the code, I'd expect both to work, whereas only the iMac works. It seems like the iPad is working with a different sample rate internally?
After a ton of trial and error, I've found the problem (and the solution).
It seems like "older" iDevice models, like the 2017 iPad Pro, have a peculiarity: they automatically adjust the microphone sample rate to the rate of played audio. Even though the hardware sample rate of those devices is set to 44.1 kHz, the rate changes as soon as some audio is played. This can be observed with something like this:
const audioCtx = new webkitAudioContext();
console.log(`Current sample rate: ${audioCtx.sampleRate}`); // 44100
const audio = new Audio();
audio.src = 'some_audio.mp3';
await audio.play();
console.log(`Current sample rate: ${audioCtx.sampleRate}`); // Sample rate of the played audio
In my case I played some synthesized speech from Google Text-to-Speech before opening the speech transcription socket. Those sound files have a sample rate of 24 kHz, exactly the sample rate at which Google Cloud received my audio input.
The solution therefore was (something I should have done anyway) to downsample everything to 16 kHz (see my helper function in the question), not from a hard-coded 44.1 kHz but from the current sample rate of the audio context. So I've changed my microphoneProcess() function like this:
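As a back-of-the-envelope check, here is a small sketch of the arithmetic. It assumes the microphone really delivered 24 kHz samples while the code still downsampled as if they were 44.1 kHz. The effectiveRate() helper is hypothetical and only illustrates the ratio. The result is in the same ballpark as the 9500 to 10000 found by trial and error in the question, though not identical.

```javascript
// Real sample rate of data that was downsampled with a wrong input-rate assumption.
// actualRate:  rate the samples were really captured at
// assumedRate: rate the downsampling code believed they had
// targetRate:  rate the code was trying to produce
function effectiveRate(actualRate, assumedRate, targetRate) {
  return (actualRate * targetRate) / assumedRate;
}

const rate = effectiveRate(24000, 44100, 16000);
console.log(Math.round(rate)); // 8707

// Interpreting that data as 16 kHz plays it back too fast,
// which would explain the high-pitched "chipmunk" recordings.
console.log((16000 / rate).toFixed(2)); // 1.84
```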
const left = e.inputBuffer.getChannelData(0);
const left16 = Helpers.downsampleBuffer(left, this.audioCtx.sampleRate, 16000);
Conclusion: Do not trust Safari with the sample rate on page load. It might change.
I am trying to write a small library for convenient manipulations with audio. I know about the autoplay policy for media elements, and I play audio after a user interaction:
const contextClass = window.AudioContext || window.webkitAudioContext;
const context = this.audioContext = new contextClass();
if (context.state === 'suspended') {
const clickCb = () => {
this.playSoundsAfterInteraction();
window.removeEventListener('touchend', clickCb);
this.usingAudios.forEach((audio) => {
if (audio.playAfterInteraction) {
const promise = audio.play();
if (promise !== undefined) {
promise.then(_ => {
}).catch(error => {
// If playing isn't allowed
console.log(error);
});
}
}
});
};
window.addEventListener('touchend', clickCb);
}
On Android Chrome and on desktop browsers everything works. But on mobile Safari the promise is rejected with this error:
the request is not allowed by the user agent or the platform in the current context safari
I have tried creating the audio elements after an interaction and changing their "src" property. In every case I get this error.
I just create the audio in JS:
const audio = new Audio(base64);
add it to an array and try to play it. But nothing happens.
I also tried creating and playing it a few seconds after the interaction: still nothing.
I am trying to turn on the flash in a web application running in Chrome on Windows 10, on a Panasonic tablet.
According to tutorials, I should get torch set to true when calling track.getCapabilities().
However, I don't get torch at all in the returned object.
Native apps are able to turn on the flashlight.
See code:
video.onloadedmetadata = function(e) {
video.play();
setTimeout(() => {
const track = mediaStream.getVideoTracks()[0];
const capabilities = typeof track.getCapabilities === 'function' && track.getCapabilities() || {};
if (capabilities.torch) {
hasTorchMode = true;
}
}, 250);
};
Any clue how to solve this?