Steaming fragmented Webm over websocket to MediaSouce - javascript

I am trying to do the following:
On the server I encode h264 packets into Webm (MKV) container structure, so that each cluster gets a single frame packet.Only the first data chunk is different as it contains something called Initialization Segment.Here it is explained quite well.
Then I stream those clusters one by one in a binary stream via WebSocket to a broweser, which is Chrome.
It probably sounds weird that I use h264 codec and not VP8 or VP9, which are native codec for Webm Video Format. But it appears that html video tag has no problem to play this sort of video container. If I just write the whole stream to a file and pass it to video.src, it is played fine. But I want to stream it in real-time.That's why I am breaking the video into chunks and sending them over websocket.
On the client, I am using MediaSource API. I have little experience in Web technologies, but I found that's probably the only way to go in my case.
And it doesn't work.I am getting no errors, the streams runs ok, and the video object emits no warning or errors (checking via developer console).
The client side code looks like this:
<script>
$(document).ready(function () {
var sourceBuffer;
var player = document.getElementById("video1");
var mediaSource = new MediaSource();
player.src = URL.createObjectURL(mediaSource);
mediaSource.addEventListener('sourceopen', sourceOpen);
//array with incoming segments:
var mediaSegments = [];
var ws = new WebSocket("ws://localhost:8080/echo");
ws.binaryType = "arraybuffer";
player.addEventListener("error", function (err) {
$("#id1").append("video error "+ err.error + "\n");
}, false);
player.addEventListener("playing", function () {
$("#id1").append("playing\n");
}, false);
player.addEventListener("progress",onProgress);
ws.onopen = function () {
$("#id1").append("Socket opened\n");
};
function sourceOpen()
{
sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001E"');
}
function onUpdateEnd()
{
if (!mediaSegments.length)
{
return;
}
sourceBuffer.appendBuffer(mediaSegments.shift());
}
var initSegment = true;
ws.onmessage = function (evt) {
if (evt.data instanceof ArrayBuffer) {
var buffer = evt.data;
//the first segment is always 'initSegment'
//it must be appended to the buffer first
if(initSegment == true)
{
sourceBuffer.appendBuffer(buffer);
sourceBuffer.addEventListener('updateend', onUpdateEnd);
initSegment = false;
}
else
{
mediaSegments.push(buffer);
}
}
};
});
I also tried different profile codes for MIME type,even though I know that my codec is "high profile.I tried the following profiles:
avc1.42E01E baseline
avc1.58A01E extended profile
avc1.4D401E main profile
avc1.64001E high profile
In some examples I found from 2-3 years ago, I have seen developers using type= "video/x-matroska", but probably alot changed since then,because now even video.src doesn't handle this sort of MIME.
Additionally, in order to make sure the chunks I am sending through the stream are not corrupted, I opened a local streaming session in VLC player and it played it progressively with no issues.
The only thing I suspect that the MediaCodec doesn't know how to handle this sort of hybrid container.And I wonder then why video object plays such a video ok.Am I missing something in my client side code? Or MediacCodec API indeed doesn't support this type of media?
PS: For those curious why I am using MKV container and not MPEG DASH, for example. The answer is - container simplicity, data writing speed and size. EBML structures are very compact and easy to write in real time.

Related

How to play media files sequentially without visible break?

I've worded my title, and tags in a way that should be searchable for both video and audio, as this question isn't specific to one. My specific case only concerns audio though, so my question body will be written specific to that.
First, the big picture:
I'm sending audio to multiple P2P clients who will connect and disconnect a random intervals. The audio I'm sending is a stream, but each client only needs the part of the stream from whence they connected. Here's how I solved that:
Every {timeout} (e.g. 1000ms), create a new audio blob
Blob will be a full audio file, with all metadata it needs to be playable
As soon as a blob is created, convert to array buffer (better browser support), and upload to client over WebRTC (or WebSockets if they don't support)
That works well. There is a delay, but if you keep the timeout low enough, it's fine.
Now, my question:
How can I play my "stream" without having any audible delay?
I say stream, but I didn't implement it using the Streams API, it is a queue of blobs, that gets updated every time the client gets new data.
I've tried a lot of different things like:
Creating a BufferSource, and merging two blobs (converted to audioBuffers) then playing that
Passing an actual stream from Stream API to clients instead of blobs
Playing blobs sequentially, relying on ended event
Loading next blob while current blob is playing
Each has problems, difficulties, or still results in an audible delay.
Here's my most recent attempt at this:
let firstTime = true;
const chunks = [];
Events.on('audio-received', ({ detail: audioChunk }) => {
chunks.push(audioChunk);
if (firstTime && chunks.length > 2) {
const currentAudio = document.createElement("audio");
currentAudio.controls = true;
currentAudio.preload = 'auto';
document.body.appendChild(currentAudio);
currentAudio.src = URL.createObjectURL(chunks.shift());
currentAudio.play();
const nextAudio = document.createElement("audio");
nextAudio.controls = true;
nextAudio.preload = 'auto';
document.body.appendChild(nextAudio);
nextAudio.src = URL.createObjectURL(chunks.shift());
let currentAudioStartTime, nextAudioStartTime;
currentAudio.addEventListener("ended", () => {
nextAudio.play()
nextAudioStartTime = new Date();
if (chunks.length) {
currentAudio.src = URL.createObjectURL(chunks.shift());
}
});
nextAudio.addEventListener("ended", () => {
currentAudio.play()
currentAudioStartTime = new Date();
console.log(currentAudioStartTime - nextAudioStartTime)
if (chunks.length) {
nextAudio.src = URL.createObjectURL(chunks.shift());
}
});
firstTime = false;
}
});
The audio-received event gets called every ~1000ms. This code works; it plays each "chunk" after the last one was played, but on Chrome, there is a ~300ms delay that's very audible. It plays the first chunk, then goes quiet, then plays the second, so on. On Firefox the delay is 50ms.
Can you help me?
I can try to create a reproducible example if that would help.

Is it possible to merge multiple webm blobs/clips into one sequential video clientside?

I already looked at this question -
Concatenate parts of two or more webm video blobs
And tried the sample code here - https://developer.mozilla.org/en-US/docs/Web/API/MediaSource -- (without modifications) in hopes of transforming the blobs into arraybuffers and appending those to a sourcebuffer for the MediaSource WebAPI, but even the sample code wasn't working on my chrome browser for which it is said to be compatible.
The crux of my problem is that I can't combine multiple blob webm clips into one without incorrect playback after the first time it plays. To go straight to the problem please scroll to the line after the first two chunks of code, for background continue reading.
I am designing a web application that allows a presenter to record scenes of him/herself explaining charts and videos.
I am using the MediaRecorder WebAPI to record video on chrome/firefox. (Side question - is there any other way (besides flash) that I can record video/audio via webcam & mic? Because MediaRecorder is not supported on not Chrome/Firefox user agents).
navigator.mediaDevices.getUserMedia(constraints)
.then(gotMedia)
.catch(e => { console.error('getUserMedia() failed: ' + e); });
function gotMedia(stream) {
recording = true;
theStream = stream;
vid.src = URL.createObjectURL(theStream);
try {
recorder = new MediaRecorder(stream);
} catch (e) {
console.error('Exception while creating MediaRecorder: ' + e);
return;
}
theRecorder = recorder;
recorder.ondataavailable =
(event) => {
tempScene.push(event.data);
};
theRecorder.start(100);
}
function finishRecording() {
recording = false;
theRecorder.stop();
theStream.getTracks().forEach(track => { track.stop(); });
while(tempScene[0].size != 1) {
tempScene.splice(0,1);
}
console.log(tempScene);
scenes.push(tempScene);
tempScene = [];
}
The function finishRecording gets called and a scene (an array of blobs of mimetype 'video/webm') gets saved to the scenes array. After it gets saved. The user can then record and save more scenes via this process. He can then view a certain scene using this following chunk of code.
function showScene(sceneNum) {
var sceneBlob = new Blob(scenes[sceneNum], {type: 'video/webm; codecs=vorbis,vp8'});
vid.src = URL.createObjectURL(sceneBlob);
vid.play();
}
In the above code what happens is the blob array for the scene gets turning into one big blob for which a url is created and pointed to by the video's src attribute, so -
[blob, blob, blob] => sceneBlob (an object, not array)
Up until this point everything works fine and dandy. Here is where the issue starts
I try to merge all the scenes into one by combining the blob arrays for each scene into one long blob array. The point of this functionality is so that the user can order the scenes however he/she deems fit and so he can choose not to include a scene. So they aren't necessarily in the same order as they were recorded in, so -
scene 1: [blob-1, blob-1] scene 2: [blob-2, blob-2]
final: [blob-2, blob-2, blob-1, blob-1]
and then I make a blob of the final blob array, so -
final: [blob, blob, blob, blob] => finalBlob
The code is below for merging the scene blob arrays
function mergeScenes() {
scenes[scenes.length] = [];
for(var i = 0; i < scenes.length - 1; i++) {
scenes[scenes.length - 1] = scenes[scenes.length - 1].concat(scenes[i]);
}
mergedScenes = scenes[scenes.length - 1];
console.log(scenes[scenes.length - 1]);
}
This final scene can be viewed by using the showScene function in the second small chunk of code because it is appended as the last scene in the scenes array. When the video is played with the showScene function it plays all the scenes all the way through. However, if I press play on the video after it plays through the first time, it only plays the last scene.
Also, if I download and play the video through my browser, the first time around it plays correctly - the subsequent times, I see the same error.
What am I doing wrong? How can I merge the files into one video containing all the scenes? Thank you very much for your time in reading this and helping me, and please let me know if I need to clarify anything.
I am using a element to display the scenes
The file's headers (metadata) should only be appended to the first chunk of data you've got.
You can't make an new video file by just pasting one after the other, they've got a structure.
So how to workaround this ?
If I understood correctly your problem, what you need is to be able to merge all the recorded videos, just like if it were only paused.
Well this can be achieved, thanks to the MediaRecorder.pause() method.
You can keep the stream open, and simply pause the MediaRecorder. At each pause event, you'll be able to generate a new video containing all the frames from the beginning of the recording, until this event.
Here is an external demo because stacksnippets don't works well with gUM...
And if ever you needed to also have shorter videos from between each resume and pause events, you could simply create new MediaRecorders for these smaller parts, while keeping the big one running.

rewriting Java code to JS - creating an audio from bytes?

I'm trying to rewrite some (very simple) android code I found written in Java into a static HTML5 app (I don't need a server to do anything, I'd like to keep it that way). I have extensive background in web development, but basic understanding of Java, and even less knowledge in Android development.
The only function of the app is to take some numbers and convert them into an audio chirp from bytes. I have absolutely no problem translating the mathematical logic into JS. Where I'm having trouble is when it gets to actually producing the sound. This is the relevant parts of the original code:
import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;
// later in the code:
AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT, minBufferSize, AudioTrack.MODE_STATIC);
// some math, and then:
track.write(sound, 0, sound.length); // sound is an array of bytes
How do I do this in JS? I can use a dataURI to produce the sound from the bytes, but does that allow me to control the other information here (i.e., sample rate, etc.)? In other words: What's the simplest, most accurate way to do this in JS?
update
I have been trying to replicate what I found in this answer. This is the relevant part of my code:
window.onload = init;
var context; // Audio context
var buf; // Audio buffer
function init() {
if (!window.AudioContext) {
if (!window.webkitAudioContext) {
alert("Your browser does not support any AudioContext and cannot play back this audio.");
return;
}
window.AudioContext = window.webkitAudioContext;
}
context = new AudioContext();
}
function playByteArray( bytes ) {
var buffer = new Uint8Array( bytes.length );
buffer.set( new Uint8Array(bytes), 0 );
context.decodeAudioData(buffer.buffer, play);
}
function play( audioBuffer ) {
var source = context.createBufferSource();
source.buffer = audioBuffer;
source.connect( context.destination );
source.start(0);
}
However, when I run this I get this error:
Uncaught (in promise) DOMException: Unable to decode audio data
Which I find quite extraordinary, as it's such a general error it manages to beautifully tell me exactly squat about what is wrong. Even more surprising, when I debugged this step by step, even though the chain of the errors starts (expectedly) with the line context.decodeAudioData(buffer.buffer, play); it actually runs into a few more lines within the jQuery file (3.2.1, uncompressed), going through lines 5208, 5195, 5191, 5219, 5223 and lastly 5015 before erroring out. I have no clue why jQuery has anything to do with it, and the error gives me no idea what to try. Any ideas?
If bytes is an ArrayBuffer it is not necessary to create a Uint8Array. You can pass ArrayBuffer bytes as parameter to AudioContext.decodeAudioData() which returns a Promise, chain .then() to .decodeAudioData(), call with play function as parameter.
At javascript at stacksnippets, <input type="file"> element is used to accept upload of audio file, FileReader.prototype.readAsArrayBuffer() creates ArrayBuffer from File object, which is passed to playByteArray.
window.onload = init;
var context; // Audio context
var buf; // Audio buffer
var reader = new FileReader(); // to create `ArrayBuffer` from `File`
function init() {
if (!window.AudioContext) {
if (!window.webkitAudioContext) {
alert("Your browser does not support any AudioContext and cannot play back this audio.");
return;
}
window.AudioContext = window.webkitAudioContext;
}
context = new AudioContext();
}
function handleFile(file) {
console.log(file);
reader.onload = function() {
console.log(reader.result instanceof ArrayBuffer);
playByteArray(reader.result); // pass `ArrayBuffer` to `playByteArray`
}
reader.readAsArrayBuffer(file);
};
function playByteArray(bytes) {
context.decodeAudioData(bytes)
.then(play)
.catch(function(err) {
console.error(err);
});
}
function play(audioBuffer) {
var source = context.createBufferSource();
source.buffer = audioBuffer;
source.connect(context.destination);
source.start(0);
}
<input type="file" accepts="audio/*" onchange="handleFile(this.files[0])" />
I solved it myself. I read more into the MDN docs explaining AudioBuffer and realized two important things:
I didn't need to decodeAudioData (since I'm creating the data myself, there's nothing to decode). I actually took that bit from the answer I was replicating and it retrospect, it was entirely needless.
Since I'm working with a 16 Bit PCM stereo, that meant I needed to use the Float32Array (2 Channels, each 16 Bit).
Granted, I still had a problem with some of my calculations that resulted in a distorted sound, but as far as producing the sound itself, I ended up doing this really simple solution:
function playBytes(bytes) {
var floats = new Float32Array(bytes.length);
bytes.forEach(function( sample, i ) {
floats[i] = sample / 32767;
});
var buffer = context.createBuffer(1, floats.length, 48000),
source = context.createBufferSource();
buffer.getChannelData(0).set(floats);
source.buffer = buffer;
source.connect(context.destination);
source.start(0);
}
I can probably optimize it a bit further - the 32767 part should happen before this, in the part where I'm calculating the data, for example. Also, I'm creating a Float32Array with two channels, then outputting one of them cause I really don't need both. I couldn't figure out if there's a way to create one channel mono file with Int16Array, or if that's even necessary\better.
Anyway, that's essentially it. It's really just the most basic solution, with some minimal understanding on my part of how to handle my data correctly. Hope this helps anyone out there.

Playing audio from stream with HTML5

I've got problem with playing audio from stream in web browser. I am streaming voice via BinaryJS as 2048-length Int16 arrays to client's browser and I am getting very noisy output. This is how I am receiving the stream:
var
client,
audioContext,
audioBuffer,
sourceNode,
sampleSize = 2048,
function main() {
try {
audioContext = new AudioContext();
} catch(e) {
alert('Web Audio API is not supported in this browser');
}
sourceNode = audioContext.createBufferSource();
sourceNode.connect(audioContext.destination);
connectClient();
playSound();
};
function playSound() {
sourceNode.start(0);
};
function onError(e) {
console.log('error' + e);
};
function connectClient() {
client = new BinaryClient('ws://localhost:3000');
client.on('open', function() {
console.log('stream opened');
});
client.on('stream', function(stream, meta){
stream.on('data', function(data){
var integers = new Int16Array(data);
var audioBuffer = audioContext.createBuffer(1, 2048, 4410);
audioBuffer.getChannelData(0).set(integers);
sourceNode.buffer = audioBuffer;
});
});
};
main();
What should I do to get clean audio?
I think your playback technique here is causing the noise – first of all, can you ensure that your stream event fires often enough to provide continuous data for Web Audio? If it doesn't, then the noise you hear is the pops and clicks after each bit of audio stops playing. Also, updating the data by swapping ArrayBuffers on sourceNode isn't a good idea.
I would suggest using a ScriptProcessorNode with your desired buffer size – this gives you the ability to insert raw data into the graph at audio rate, which should reduce the noise. Look at this demo to see how to set up the node, basically you just need to fill the outputBuffer with your stream data.
This will only really help if your streaming is faster than the audio rate!
Good luck,
jakub

FFmpeg converting from video to audio missing duration

I'm attempting to load YouTube videos via their direct video URL (retrieved using ytdl-core). I load them using the request library. I then pipe the result to a stream, which is used as the input to ffmpeg (via fluent-ffmpeg). The code looks something like this:
var getAudioStream = function(req, res) {
var requestUrl = 'http://youtube.com/watch?v=' + req.params.videoId;
var audioStream = new PassThrough();
var videoUrl;
ytdl.getInfo(requestUrl, { downloadURL: true }, function(err, info) {
res.setHeader('Content-Type', 'audio/x-wav');
res.setHeader('Accept-Ranges', 'bytes');
videoUrl = info.formats ? info.formats[0].url : '';
request(videoUrl).pipe(audioStream);
ffmpeg()
.input(audioStream)
.outputOptions('-map_metadata 0')
.format('wav')
.pipe(res);
});
};
This actually works just fine, and the frontend successfully receives just the audio in WAV format and is playable. However, the audio is missing any information about its size or duration (and all other metadata). This also makes it unseekable.
I'm assuming this is lost somewhere during the ffmpeg stage, because if I load the video directly via the URL passed to request it loads and plays fine, and has a set duration/is seekable. Any ideas?
It isn't possible to know the output size nor duration until it is finished. FFmpeg cannot know this information ahead of time in most cases. Even if it could, the way you are executing FFmpeg it prevents you from accessing the extra information.
Besides, to support seeking you need to support range requests. This isn't possible either, short of encoding the file up to the byte requested and streaming from there on.
Basically, this isn't possible by the nature of what you're doing.

Categories

Resources