The PCM audio data is captured in Unity3D in real time. All of that data is streamed to an HTML page via WebSockets. The general setup is Socket.IO with a node.js server.
My main task is adding smooth audio playback to a live video+audio streaming solution on all platforms. This is my work in progress (video streaming): https://youtu.be/82_-a7WF3vs
The audio & video streaming part works well on non-HTML/non-WebGL platforms.
However, I couldn't get smooth audio playback in HTML with JavaScript. It runs in real time, but there are lag-related glitches that sound like noise...
One of my concerns is that web browsers do not support multi-threading, which adds some lag when receiving streaming data and playing it back at the same time.
Below is my core script for PCM playback. I hope someone can help me improve it.
var startTime = 0;
var audioCtx = new AudioContext();
function ProcessAudioData(_byte) {
    ReadyToGetFrame_aud = false;
    // read metadata
    SourceSampleRate = ByteToInt32(_byte, 0);
    SourceChannels = ByteToInt32(_byte, 4);
    // convert byte[] to float
    var BufferData = _byte.slice(8, _byte.length);
    AudioFloat = new Float32Array(BufferData.buffer);
    //=====================playback=====================
    if (AudioFloat.length > 0) StreamAudio(SourceChannels, AudioFloat.length, SourceSampleRate, AudioFloat);
    //=====================playback=====================
    ReadyToGetFrame_aud = true;
}
function StreamAudio(NUM_CHANNELS, NUM_SAMPLES, SAMPLE_RATE, AUDIO_CHUNKS) {
    // NUM_SAMPLES counts interleaved samples across all channels
    var NUM_FRAMES = NUM_SAMPLES / NUM_CHANNELS;
    var audioBuffer = audioCtx.createBuffer(NUM_CHANNELS, NUM_FRAMES, SAMPLE_RATE);
    for (var channel = 0; channel < NUM_CHANNELS; channel++) {
        // This gives us the Float32Array that holds this channel's samples
        var nowBuffering = audioBuffer.getChannelData(channel);
        // De-interleave: each channel holds only NUM_FRAMES samples
        for (var i = 0; i < NUM_FRAMES; i++) {
            var order = i * NUM_CHANNELS + channel;
            nowBuffering[i] = AUDIO_CHUNKS[order];
        }
    }
    var source = audioCtx.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(audioCtx.destination);
    source.start(startTime);
    startTime += audioBuffer.duration;
}
How to stream PCM audio on HTML without lag?
There is always some lag with digital audio, no matter what you do. This has nothing to do with the web browser itself.
All those data will be streaming to HTML via WebSockets.
Why? The data is only going one direction so you can use a regular HTTP response and not have to worry about the overhead of Web Sockets.
One of my concern is that Web Browsers do not support multi-threading
This isn't really accurate.
It runs real-time but I found some lagging issue like noise...
What your code appears to do is take a PCM frame it receives and play it immediately. This isn't good, as the sound is wrecked if you don't play your received buffers contiguously. You must take the data and schedule it to play immediately after the current data is finished, and not a sample early or too late.
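A minimal sketch of that kind of back-to-back scheduling, assuming the Web Audio API is used for playback. The names `createChunkScheduler` and `leadTime` are hypothetical, and the 50 ms default margin is only an assumption to tune. It keeps a running start time and, if playback has fallen behind (first chunk or a network stall), restarts slightly in the future instead of scheduling in the past.

```javascript
// Hypothetical helper: schedules AudioBuffers back to back on a Web Audio context.
// `ctx` is expected to be an AudioContext; `leadTime` (seconds) is an assumed safety margin.
function createChunkScheduler(ctx, leadTime = 0.05) {
  let nextStart = 0; // context time at which the next chunk should begin

  return function schedule(audioBuffer) {
    // If we have fallen behind (first chunk, or a stall), restart slightly
    // in the future rather than asking for a start time in the past.
    if (nextStart < ctx.currentTime) {
      nextStart = ctx.currentTime + leadTime;
    }
    const source = ctx.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(ctx.destination);
    source.start(nextStart);

    const startedAt = nextStart;
    nextStart += audioBuffer.duration; // next chunk begins exactly where this one ends
    return startedAt;
  };
}
```

In a browser this would be used as `const schedule = createChunkScheduler(new AudioContext());` followed by `schedule(audioBuffer)` for every received chunk.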
Traditionally this means doing your own buffering and setting up a ScriptProcessorNode to read from those buffers. However, this also requires some DIY resampling because the encoded rate may not be the same as the playback rate.
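As a rough illustration of the DIY resampling step, here is a sketch that uses plain linear interpolation on mono Float32 samples. The function name `resampleLinear` is hypothetical, and linear interpolation is just one simple choice, not necessarily what a production player would use.

```javascript
// Hypothetical helper: resamples mono Float32 PCM by linear interpolation.
function resampleLinear(input, fromRate, toRate) {
  if (fromRate === toRate || input.length === 0) return Float32Array.from(input);
  const outLength = Math.max(1, Math.round(input.length * toRate / fromRate));
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Blend the two neighbouring input samples
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

Note that this sketch applies no low-pass filter, so downsampling with it can alias. It is meant to show the bookkeeping, not to be a high-quality resampler.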
These days, I think that MediaSource Extensions supports PCM decoding, so you can just pipe your data through that and let the underlying system do all the work for you.
Related
I've worded my title, and tags in a way that should be searchable for both video and audio, as this question isn't specific to one. My specific case only concerns audio though, so my question body will be written specific to that.
First, the big picture:
I'm sending audio to multiple P2P clients who will connect and disconnect at random intervals. The audio I'm sending is a stream, but each client only needs the part of the stream from the point at which they connected. Here's how I solved that:
Every {timeout} (e.g. 1000ms), create a new audio blob
Blob will be a full audio file, with all metadata it needs to be playable
As soon as a blob is created, convert it to an array buffer (better browser support), and upload it to the client over WebRTC (or WebSockets if they don't support it)
That works well. There is a delay, but if you keep the timeout low enough, it's fine.
Now, my question:
How can I play my "stream" without having any audible delay?
I say stream, but I didn't implement it using the Streams API, it is a queue of blobs, that gets updated every time the client gets new data.
I've tried a lot of different things like:
Creating a BufferSource, and merging two blobs (converted to audioBuffers) then playing that
Passing an actual stream from Stream API to clients instead of blobs
Playing blobs sequentially, relying on ended event
Loading next blob while current blob is playing
Each has problems, difficulties, or still results in an audible delay.
Here's my most recent attempt at this:
let firstTime = true;
const chunks = [];
Events.on('audio-received', ({ detail: audioChunk }) => {
    chunks.push(audioChunk);
    if (firstTime && chunks.length > 2) {
        const currentAudio = document.createElement("audio");
        currentAudio.controls = true;
        currentAudio.preload = 'auto';
        document.body.appendChild(currentAudio);
        currentAudio.src = URL.createObjectURL(chunks.shift());
        currentAudio.play();
        const nextAudio = document.createElement("audio");
        nextAudio.controls = true;
        nextAudio.preload = 'auto';
        document.body.appendChild(nextAudio);
        nextAudio.src = URL.createObjectURL(chunks.shift());
        let currentAudioStartTime, nextAudioStartTime;
        currentAudio.addEventListener("ended", () => {
            nextAudio.play();
            nextAudioStartTime = new Date();
            if (chunks.length) {
                currentAudio.src = URL.createObjectURL(chunks.shift());
            }
        });
        nextAudio.addEventListener("ended", () => {
            currentAudio.play();
            currentAudioStartTime = new Date();
            console.log(currentAudioStartTime - nextAudioStartTime);
            if (chunks.length) {
                nextAudio.src = URL.createObjectURL(chunks.shift());
            }
        });
        firstTime = false;
    }
});
The audio-received event gets called every ~1000ms. This code works; it plays each "chunk" after the last one was played, but on Chrome, there is a ~300ms delay that's very audible. It plays the first chunk, then goes quiet, then plays the second, so on. On Firefox the delay is 50ms.
Can you help me?
I can try to create a reproducible example if that would help.
I am trying to do the following:
On the server I encode H.264 packets into a WebM (MKV) container structure, so that each cluster gets a single frame packet. Only the first data chunk is different, as it contains something called the Initialization Segment. Here it is explained quite well.
Then I stream those clusters one by one in a binary stream via WebSocket to a browser, which is Chrome.
It probably sounds weird that I use the H.264 codec and not VP8 or VP9, which are the native codecs for the WebM video format. But it appears that the HTML video tag has no problem playing this sort of video container. If I just write the whole stream to a file and pass it to video.src, it plays fine. But I want to stream it in real time. That's why I am breaking the video into chunks and sending them over WebSocket.
On the client, I am using the MediaSource API. I have little experience in Web technologies, but I found that's probably the only way to go in my case.
And it doesn't work. I am getting no errors, the stream runs OK, and the video object emits no warnings or errors (checking via the developer console).
The client side code looks like this:
<script>
$(document).ready(function () {
    var sourceBuffer;
    var player = document.getElementById("video1");
    var mediaSource = new MediaSource();
    player.src = URL.createObjectURL(mediaSource);
    mediaSource.addEventListener('sourceopen', sourceOpen);
    //array with incoming segments:
    var mediaSegments = [];
    var ws = new WebSocket("ws://localhost:8080/echo");
    ws.binaryType = "arraybuffer";
    player.addEventListener("error", function (err) {
        $("#id1").append("video error " + err.error + "\n");
    }, false);
    player.addEventListener("playing", function () {
        $("#id1").append("playing\n");
    }, false);
    player.addEventListener("progress", onProgress);
    ws.onopen = function () {
        $("#id1").append("Socket opened\n");
    };
    function sourceOpen()
    {
        sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001E"');
    }
    function onUpdateEnd()
    {
        if (!mediaSegments.length)
        {
            return;
        }
        sourceBuffer.appendBuffer(mediaSegments.shift());
    }
    var initSegment = true;
    ws.onmessage = function (evt) {
        if (evt.data instanceof ArrayBuffer) {
            var buffer = evt.data;
            //the first segment is always 'initSegment'
            //it must be appended to the buffer first
            if (initSegment == true)
            {
                sourceBuffer.appendBuffer(buffer);
                sourceBuffer.addEventListener('updateend', onUpdateEnd);
                initSegment = false;
            }
            else
            {
                mediaSegments.push(buffer);
            }
        }
    };
});
</script>
I also tried different profile codes for the MIME type, even though I know that my codec is "high profile". I tried the following profiles:
avc1.42E01E baseline
avc1.58A01E extended profile
avc1.4D401E main profile
avc1.64001E high profile
In some examples I found from 2-3 years ago, I have seen developers using type="video/x-matroska", but probably a lot has changed since then, because now even video.src doesn't handle this sort of MIME type.
Additionally, to make sure the chunks I am sending through the stream are not corrupted, I opened a local streaming session in VLC player, and it played the stream progressively with no issues.
The only thing I suspect is that the MediaCodec doesn't know how to handle this sort of hybrid container. I wonder, then, why the video object plays such a video fine. Am I missing something in my client-side code? Or does the MediaCodec API indeed not support this type of media?
PS: For those curious why I am using MKV container and not MPEG DASH, for example. The answer is - container simplicity, data writing speed and size. EBML structures are very compact and easy to write in real time.
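For reference, the segment-append flow in the client code above can be sketched as a small queue around a SourceBuffer. This is only an illustration under assumptions: `createAppendQueue` is a hypothetical name, and the sketch relies on two documented behaviours, namely that `appendBuffer()` must not be called while `updating` is true and that `updateend` fires when an append finishes. Whether this is related to the playback problem described here is not established.

```javascript
// Hypothetical helper: serializes appendBuffer() calls on a SourceBuffer-like object.
function createAppendQueue(sourceBuffer) {
  const pending = [];

  function flush() {
    // appendBuffer() throws if a previous append is still in progress.
    if (sourceBuffer.updating || pending.length === 0) return;
    sourceBuffer.appendBuffer(pending.shift());
  }

  // 'updateend' fires after each append completes; send the next segment then.
  sourceBuffer.addEventListener('updateend', flush);

  return function enqueue(segment) {
    pending.push(segment);
    flush(); // appends right away when the buffer is idle
  };
}
```

With this shape, a segment that arrives while the buffer is idle is appended immediately instead of waiting for an `updateend` event that may already have fired.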
Trying to follow the example here, which is basically a c&p of this
Think I got most of the parts down, except all the node.connect()'s
From what I understand, this sequence of code is needed to provide the audio analyzer with an audio stream:
var source = audioCtx.createMediaStreamSource(stream);
source.connect(analyser);
analyser.connect(audioCtx.destination);
I can't seem to make sense of it as it looks rather ouroboros-y to me.
And unfortunately, I can't seem to find any documentation on .connect() so quite lost and would appreciate any clarification!
Oh and I'm loading an .mp3 via pure javascript new Audio('db.mp3').play(); and am trying to use that as the source without creating an <audio> element.
Can a mediaStream object be created from this to feed into .createMediaStreamSource(stream)?
connect simply defines the output for the filters.
In this case, your source loads the stream into the buffer and writes to the input of the next filter which is defined by the connect function. This is repeated for your analyser filter.
Think of it as pipes.
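To make the pipe picture concrete, here is a small sketch. `connectChain` is a hypothetical helper, not part of the Web Audio API; it simply calls `connect()` on each node with the next node as the target. The commented usage assumes a browser and the `db.mp3` file from the question, and uses `createMediaElementSource()`, which accepts an `Audio` object directly, so no MediaStream is needed for that case.

```javascript
// Hypothetical helper: connects each node's output to the next node's input,
// like joining pipe segments end to end. Returns the last node in the chain.
function connectChain(...nodes) {
  for (let i = 0; i < nodes.length - 1; i++) {
    nodes[i].connect(nodes[i + 1]);
  }
  return nodes[nodes.length - 1];
}

// Browser usage (assumes 'db.mp3' exists, as in the question):
// const audioCtx = new AudioContext();
// const audio = new Audio('db.mp3');
// const source = audioCtx.createMediaElementSource(audio);
// const analyser = audioCtx.createAnalyser();
// connectChain(source, analyser, audioCtx.destination); // source -> analyser -> speakers
// audio.play();
```

The analyser passes audio through unchanged, so connecting it on to the destination is what keeps the sound audible while it is being analysed.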
Here is a sample code snippet that I wrote a few years back using the Web Audio API.
this.scriptProcessor = this.audioContext.createScriptProcessor(this.scriptProcessorBufferSize,
    this.scriptProcessorInputChannels,
    this.scriptProcessorOutputChannels);
this.scriptProcessor.connect(this.audioContext.destination);
this.scriptProcessor.onaudioprocess = updateMediaControl.bind(this);
//Set up the Gain Node with a default value of 1(max volume).
this.gainNode = this.audioContext.createGain();
this.gainNode.connect(this.audioContext.destination);
this.gainNode.gain.value = 1;
sewi.AudioResourceViewer.prototype.playAudio = function(){
    if(this.audioBuffer){
        this.source = this.audioContext.createBufferSource();
        this.source.buffer = this.audioBuffer;
        this.source.connect(this.gainNode);
        this.source.connect(this.scriptProcessor);
        this.beginTime = Date.now();
        this.source.start(0, this.offset);
        this.isPlaying = true;
        this.controls.update({playing: this.isPlaying});
        updateGraphPlaybackPosition.call(this, this.offset);
    }
};
As you can see, my source is connected to both a gainNode and a scriptProcessor, and each of those is connected to the destination. When the audio starts playing, the data flows along two paths, source->gainNode->destination and source->scriptProcessor->destination, through the "pipes" defined by connect(). As the audio data passes through the gainNode, the volume can be adjusted by scaling the amplitude of the audio wave. In parallel, the same data is passed to the script processor so that events can be attached and triggered while the audio is being processed.
I've been trying to create polyphonic WAV playback with node.js on raspberry pi 3 running latest raspbian:
shelling out to aplay/mpg123/some other program - allows me to only play single sound at once
I tried combination of https://github.com/sebpiq/node-web-audio-api and https://github.com/TooTallNate/node-speaker (sample code below) but audio quality is very low, with a lot of distortions
Is there anything I'm missing here? I know I could easily do it in another programming language (I was able to write C++ code with SDL, and Python with pygame), but the question is if it's possible with node.js :)
Here's my current web-audio-api + node-speaker code:
var AudioContext = require('web-audio-api').AudioContext;
var Speaker = require('speaker');
var fs = require('fs');
var track1 = './tracks/1.wav';
var track2 = './tracks/1.wav';
var context = new AudioContext();
context.outStream = new Speaker({
    channels: context.format.numberOfChannels,
    bitDepth: context.format.bitDepth,
    sampleRate: context.format.sampleRate
});
function play(audioBuffer) {
    if (!audioBuffer) { return; }
    var bufferSource = context.createBufferSource();
    bufferSource.connect(context.destination);
    bufferSource.buffer = audioBuffer;
    bufferSource.loop = false;
    bufferSource.start(0);
}
var audioData1 = fs.readFileSync(track1);
var audioData2 = fs.readFileSync(track2);
var audioBuffer1, audioBuffer2;
context.decodeAudioData(audioData1, function(audioBuffer) {
    audioBuffer1 = audioBuffer;
    if (audioBuffer1 && audioBuffer2) { playBoth(); }
});
context.decodeAudioData(audioData2, function(audioBuffer) {
    audioBuffer2 = audioBuffer;
    if (audioBuffer1 && audioBuffer2) { playBoth(); }
});
function playBoth() {
    console.log('playing...');
    play(audioBuffer1);
    play(audioBuffer2);
}
audio quality is very low, with a lot of distortions
According to the WebAudio spec (https://webaudio.github.io/web-audio-api/#SummingJunction):
No clipping is applied at the inputs or outputs of the AudioNode to allow a maximum of dynamic range within the audio graph.
Now if you're playing two audio streams, it's possible that summing them results in a value that's beyond the acceptable range, which sounds like distortion.
Try lowering the volume of each audio stream by first piping it through a GainNode, like so:
function play(audioBuffer) {
    if (!audioBuffer) { return; }
    var bufferSource = context.createBufferSource();
    var gainNode = context.createGain();
    gainNode.gain.value = 0.5; // for instance, find a good value
    bufferSource.connect(gainNode);
    gainNode.connect(context.destination);
    bufferSource.buffer = audioBuffer;
    bufferSource.loop = false;
    bufferSource.start(0);
}
Alternatively, you could use a DynamicsCompressorNode, but manually setting the gain gives you more control over the output.
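The arithmetic behind this can be shown with a tiny sketch that works on raw sample arrays rather than an audio graph. `mixSamples` and `peak` are hypothetical helper names and the sample values are made up for illustration. Summing two signals that each stay within [-1, 1] can produce peaks outside that range, and scaling by a gain of 0.5 brings the sum back inside.

```javascript
// Hypothetical helper: sums two sample arrays and applies a gain to the result.
function mixSamples(a, b, gain = 1) {
  const out = new Float32Array(Math.min(a.length, b.length));
  for (let i = 0; i < out.length; i++) {
    out[i] = gain * (a[i] + b[i]);
  }
  return out;
}

// Hypothetical helper: largest absolute sample value in the array.
function peak(samples) {
  let max = 0;
  for (const s of samples) max = Math.max(max, Math.abs(s));
  return max;
}

// Illustrative sample values (not taken from any real recording)
const trackA = new Float32Array([0.75, -0.5]);
const trackB = new Float32Array([0.75, -0.75]);
console.log(peak(mixSamples(trackA, trackB)));      // 1.5  -> outside [-1, 1], would clip
console.log(peak(mixSamples(trackA, trackB, 0.5))); // 0.75 -> back inside the valid range
```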
This isn't exactly answer-worthy but I can't post comments at the moment ><
I had a similar problem with an app made using the JS audio API, and the rather easy fix was lowering the quality of the audio and changing the format.
In your case, what I could think of is setting the bit depth and sampling frequency as low as possible without affecting the listener's experience (e.g. 44.1 kHz and 16-bit depth).
You might also try changing the format. WAV, in theory, should be quite good at not being CPU-intensive; however, there are other uncompressed formats (e.g. .aiff).
You may try using multiple cores of the pi:
https://nodejs.org/api/cluster.html
Although this may prove a bit complicated, if you are doing the audio-streaming in parallel with other unrelated processes, you could try moving the audio on a separate CPU.
An (easy) thing you could try would be running node with more RAM, although, in your case, I doubt that is possible.
The biggest problem, however, might be the code. Sadly, I am not experienced with the modules you are using and as such can give no real advice on that (hence why I said this is not answer-worthy :p).
When you create the Speaker instance, set the parameters like this:
channels = 1 // you can try with 1 or 2 and see which gives the best quality
bitDepth = 16
sampleRate = 48000 // normally 44100 for speaking and higher for music playing
From node you can spawn two aplay processes, each playing one file. Use detached: true to allow node to continue running.
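A minimal sketch of that approach, assuming `aplay` is installed and on the PATH. `playAll` is a hypothetical helper, and its `spawnFn` parameter exists only so the function can be exercised without audio hardware. Whether two `aplay` processes can actually share the sound device at the same time depends on the ALSA configuration of the system.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: starts one detached `aplay` process per file.
// `spawnFn` defaults to child_process.spawn and can be replaced for testing.
function playAll(files, spawnFn = spawn) {
  return files.map((file) => {
    const child = spawnFn('aplay', [file], { detached: true, stdio: 'ignore' });
    child.unref(); // do not keep the Node event loop waiting on this child
    return child;
  });
}

// Usage (paths taken from the question's example):
// playAll(['./tracks/1.wav', './tracks/1.wav']);
```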
Question: I am using the Web Audio API. I need to buffer a non-stop audio stream, like a radio stream, and when I get a notification, I need to take the past 3 s of audio data and send it to the server. How can I achieve that? Node.js has a built-in Buffer, but it does not seem to be a circular buffer; if I write a non-stop stream into it, it seems to overflow.
Background to help you understand my question:
I am implementing an ambient audio based web authentication method. Briefly, I need to compare two pieces of audio signal (one from the client, and one from the anchor device, they are all time synced with server), if they are similar enough, the authentication request will be approved by the server. The audio recording is implemented on both the client and the anchor device using web Audio API.
I need to manage a buffer on the anchor device to stream the ambient audio. The anchor device is supposed to be running all the time, so the stream is not going to be ended.
You can capture the audio from a stream using the ScriptProcessorNode. Whilst this is deprecated, no browser as of now actually implements the new AudioWorker.
// `context` is assumed to be an existing AudioContext
var N = 1024;
var time = 3; // Desired capture time in seconds
var frame_holder = [];
var time_per_frame = N / context.sampleRate;
var num_frames = Math.ceil(time / time_per_frame); // Minimum number of frames to cover `time`
var script = context.createScriptProcessor(N, 1, 1);
script.connect(context.destination);
script.onaudioprocess = function(e) {
    var input = e.inputBuffer.getChannelData(0);
    var output = e.outputBuffer.getChannelData(0);
    var copy = new Float32Array(input.length);
    for (var n = 0; n < input.length; n++) {
        output[n] = 0.0; // Null this as I guess you are capturing microphone
        copy[n] = input[n];
    }
    // Add in the current frame
    frame_holder.push(copy);
    // Drop the oldest frames once we hold more than 3s worth
    if (frame_holder.length > num_frames) {
        frame_holder = frame_holder.slice(frame_holder.length - num_frames);
    }
};
Then for actual transmission, you just need to string the copied frames together. It is easier than trying to keep one long array though of course that is also possible.
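Stringing the frames together could look like the sketch below. `concatFrames` is a hypothetical helper name. It allocates one Float32Array of the total length and copies each captured frame into place, oldest first.

```javascript
// Hypothetical helper: joins an array of Float32Array frames into one Float32Array.
function concatFrames(frames) {
  const total = frames.reduce((sum, frame) => sum + frame.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const frame of frames) {
    out.set(frame, offset); // copy this frame right after the previous one
    offset += frame.length;
  }
  return out;
}

// Usage: const last3s = concatFrames(frame_holder);
// last3s.buffer is an ArrayBuffer that can be sent over a WebSocket or in a fetch body.
```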