I have an Icecast setup running on a server. The clients that will be connecting to it are <audio> tags in web pages, either through HTML5 or Flash. I am currently using audio.js to achieve this (specifically, the Flash fallback).
The problem is that the audio is played concurrently with, but separately from, a stream of images. (It's a 10-fps JPEG stream.) I need the audio to match up with the images as closely as possible. Unfortunately, the audio is sometimes delayed by as much as 7 seconds before it starts playing.
Some information:
The image stream cannot be delayed to match the audio. The audio must speed up to match the images.
The Icecast server config has <burst-on-connect> set to 0 to minimize latency.
There is essentially no lag when playing via VLC (perhaps a few hundred ms, which is acceptable).
Put another way, when viewing the images and playing the audio via VLC, everything is sufficiently aligned. Unfortunately, using VLC is not an option in the endgame.
Since VLC has no lag, that tells me that the web browser (Chrome, Firefox, IE) is buffering the audio before playing it.
The question: How do I prevent the web browser from buffering the audio? I want it to play immediately as soon as it has anything available. I'm currently using audio.js, but other similar technologies are acceptable.
Additional information: I've set audio.js to autoplay and preload=none.
Thanks for your help!
A buffer is always necessary. Networks are packet switched. Data comes in chunks, not continuously. In fact, there are many buffers:
Capture buffer (at the sound card)
Codec buffer (codecs work on a chunk of samples at a time)
Network buffer to server
Server-side buffer (typically very large, 10+ seconds)
Network buffer to client
Client buffer (typically 2-3 seconds)
Client codec buffer
Client sound device buffer
Each buffer adds latency, as you have noticed. The only buffer you really have control over is the server-side buffer, which is configured by the <burst-on-connect> setting. Making this buffer larger lets the server fill all downstream buffers very quickly on connect, which enables an extremely fast start to playback. You have set it to zero, which means the downstream buffers can only fill as fast as data arrives from the encoder.
Client-side, you have absolutely no control over the buffering, nor should you. Clients are free to implement a codec in whatever way they choose. Some codecs can begin streaming right away, and others can't. Some devices have to re-sample your audio to fit within their playback, and others don't.
What it sounds like you really want to do is synchronize a video stream and an audio stream. For that, you should just stream video to begin with. Video formats are designed to keep audio and video in sync. Icecast even supports streaming video in a few formats.
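To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes the burst is delivered effectively instantly (the network is much faster than the stream bitrate) and that the client waits for a fixed number of seconds of audio before it starts playing. The function names and the example numbers (128 kbps, a 3-second client buffer, a 65536-byte burst) are illustrative assumptions, not values taken from the setup above. In Icecast, the amount sent on connect is, as far as I know, set in bytes with <burst-size>; check the documentation for your version.

```javascript
// Bytes of encoded audio needed to hold `seconds` of a stream at `kbps`.
function bytesForSeconds(kbps, seconds) {
  return Math.ceil((kbps * 1000 / 8) * seconds);
}

// Seconds the client must wait before its buffer is full, assuming the
// burst arrives instantly and the rest trickles in at the stream bitrate.
function secondsToFillClientBuffer(kbps, clientBufferSeconds, burstBytes) {
  const needed = bytesForSeconds(kbps, clientBufferSeconds);
  const missing = Math.max(0, needed - burstBytes);
  return missing / (kbps * 1000 / 8);
}

// Hypothetical numbers: 128 kbps stream, client wants 3 s buffered.
console.log(secondsToFillClientBuffer(128, 3, 0));     // no burst
console.log(secondsToFillClientBuffer(128, 3, 65536)); // 64 KiB burst
```

With no burst, the wait equals the client's buffer length; with a burst at least as large as the client buffer, the wait drops to zero. The cost is that the burst contains older audio, so playback starts sooner but sits further behind live.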
Related
I am developing an audio chat. Sound is recorded in a browser (Firefox 91.02, JavaScript) using MediaRecorder, which produces blobs of audio type. I use the MediaRecorder.start(timeslice) method to get a chunk (blob) of the sound every timeslice milliseconds. The blobs will be sent to a server, which relays them to other clients, who will listen in their browsers.
The test version records sound and plays it back with an Audio object in the same browser (without sending it over the network). I try to play each blob independently as it arrives.
The problem I have run into is that only the first blob plays correctly; the others do not play (the browser says the data format is not suitable). I guess this happens because I try to play the sound blobs independently.
The question is: is there a way to record sound blobs that can each be played independently, without joining them for the player (other than stopping and restarting the MediaRecorder periodically)?
I have a WebRTC React app where users can simulcast their stream to YouTube, Facebook, etc. (like restream.io).
I want to send both streams (screen share and webcam) as one video (half screen share and half webcam, webcam overlaid on the screen share, captions on top of the video), like studio.restream.io.
Everything works fine when drawing the streams on a canvas and piping the data over a WebSocket to the backend, where it is transcoded to RTMP and sent to Facebook, YouTube, etc. (This method only works on a high-end PC.)
The only problem with this method is that drawing the streams on a canvas takes a lot of CPU and the browser hangs (it only works when you have a GPU).
The question is how to optimize this.
Do we need a back-end service to merge the videos using ffmpeg? Or
is there any way to do it in the browser?
In general, the canvas operations (and a lot of other drawing-related operations) in the browser assume a GPU is available and are very slow when they have to run on the CPU.
For what you're doing, you probably do need to run the browser on hardware that has a GPU.
You're right that you can do this kind of compositing more flexibly using ffmpeg or GStreamer. We've used both ffmpeg and GStreamer pretty extensively, at Daily.co.
For our production live streaming workers, we use GStreamer running on AWS instances without GPUs. Our media servers forward the WebRTC RTP tracks as raw RTP to a GStreamer process, which decodes the tracks, composites the video tracks, mixes the audio tracks, and encodes to RTMP. GStreamer has a steep learning curve and is a totally different toolkit than the browser, but it's also efficient and flexible in ways that running in the browser can't quite be.
I'm sending audio from getUserMedia over a WebRTC MediaStream. I want to send all types of audio data, not just voices. My problem is that the audio cuts out if there are no voices even if there is some background noise that the microphone picks up. I thought the problem was with the voiceActivityDetection parameter when creating the offer, however even when I disable it the problem persists. How should I go about sending the essentially raw audio data over a MediaStream without the voice processing?
Unfortunately, when these APIs were designed, they were built with a lot of processing enabled by default. Fortunately, they can be disabled with your getUserMedia audio constraints:
navigator.mediaDevices.getUserMedia({ audio: {
  autoGainControl: false,
  echoCancellation: false,
  noiseSuppression: false
} });
This will greatly improve your audio quality. Note however that the whole WebRTC stack will still have effects on your audio. A WebRTC call is optimized for realtime voice communication. Audio is constantly adjusted in speed to keep to as realtime as possible. Comfort noise is often added (and can be disabled by munging your SDP). Audio bitrates are often set very low (also adjustable via SDP). If audio quality matters to you (such as for music), WebRTC probably isn't the best choice. Consider MediaRecorder instead.
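As a rough illustration of the SDP munging mentioned above, the sketch below merges extra parameters into the Opus a=fmtp line of an SDP string. The helper name setOpusParams is hypothetical. The parameter names used in the example (usedtx, stereo, maxaveragebitrate) come from the Opus RTP payload format (RFC 7587), but whether a given browser honors them, and whether you need to munge the offer, the answer, or both, is an assumption you should verify for your setup.

```javascript
// Merge extra parameters into the Opus a=fmtp line of an SDP string.
// Returns the SDP unchanged if no Opus payload type is present.
function setOpusParams(sdp, params) {
  const lines = sdp.split('\r\n'); // SDP lines are CRLF-terminated
  const rtpmapIndex = lines.findIndex(l => /^a=rtpmap:\d+ opus\/48000/i.test(l));
  if (rtpmapIndex === -1) return sdp;

  // The payload type number links the rtpmap line to its fmtp line.
  const pt = lines[rtpmapIndex].match(/^a=rtpmap:(\d+)/)[1];
  const prefix = `a=fmtp:${pt} `;
  const fmtpIndex = lines.findIndex(l => l.startsWith(prefix));

  // Start from the existing parameters, then overwrite or append ours.
  const merged = new Map();
  if (fmtpIndex !== -1) {
    for (const pair of lines[fmtpIndex].slice(prefix.length).split(';')) {
      const [key, value] = pair.trim().split('=');
      if (key) merged.set(key, value);
    }
  }
  for (const [key, value] of Object.entries(params)) merged.set(key, String(value));

  const fmtp = prefix + [...merged].map(([k, v]) => `${k}=${v}`).join(';');
  if (fmtpIndex !== -1) lines[fmtpIndex] = fmtp;
  else lines.splice(rtpmapIndex + 1, 0, fmtp); // no fmtp yet: add one after rtpmap
  return lines.join('\r\n');
}

// Example with a minimal, made-up SDP fragment.
const exampleSdp =
  'm=audio 9 UDP/TLS/RTP/SAVPF 111\r\n' +
  'a=rtpmap:111 opus/48000/2\r\n' +
  'a=fmtp:111 minptime=10;useinbandfec=1\r\n';
console.log(setOpusParams(exampleSdp, { usedtx: 0, stereo: 1, maxaveragebitrate: 128000 }));
```

In a real call you would apply something like this to the description's sdp string before passing it to setLocalDescription or setRemoteDescription. Treat it as a starting point; it does not remove processing that happens elsewhere in the WebRTC stack.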
I would like to play an H.264 stream from a live camera in HTML5 (MSE).
What I want to achieve is to push raw H.264 frames over a WebSocket and leave the fragmentation to the browser itself.
The browser should wait for the SPS/PPS units and then do the fragmentation for MSE playback.
The server (camera) that creates the stream can run ffmpeg to fragment the H.264 into MP4, but it starts slowly (it takes too long to find the SPS/PPS). This is why I want to do the whole thing on the client side.
I have tried MP4Box.js without success. I am now looking at mux.js, but I'm still unsure how to do this.
(If I run ffmpeg and wait until it finds the SPS/PPS, I can display the stream in the HTML5 video element with MSE.)
Technically, the ffmpeg command line will contain '-analyzeduration 32' to keep ffmpeg from hanging my device. (But this also prevents it from waiting for the correct SPS/PPS units, which is why I need to do it on the client side.)
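The "wait for SPS/PPS" step described above can be sketched independently of any muxing library. The code below assumes the camera sends an Annex B byte stream (NAL units separated by 00 00 01 or 00 00 00 01 start codes) and that each WebSocket message contains whole NAL units; both are assumptions about the camera, not something stated above. The names splitNalUnits, nalType and ParameterSetGate are hypothetical, and the actual fragmentation to fMP4 for MSE is not shown.

```javascript
// Split an H.264 Annex B buffer into NAL units (start codes removed).
function splitNalUnits(bytes) {
  const bounds = []; // [index of 00 00 01, index of first payload byte]
  for (let i = 0; i + 2 < bytes.length; i++) {
    if (bytes[i] === 0 && bytes[i + 1] === 0 && bytes[i + 2] === 1) {
      bounds.push([i, i + 3]);
      i += 2; // skip past the start code
    }
  }
  return bounds.map(([, payload], n) => {
    let end = n + 1 < bounds.length ? bounds[n + 1][0] : bytes.length;
    // Drop zero padding, including the extra zero of a 4-byte start code.
    while (end > payload && bytes[end - 1] === 0) end--;
    return bytes.subarray(payload, end);
  });
}

// The NAL unit type is the low 5 bits of the first byte (7 = SPS, 8 = PPS).
function nalType(nal) {
  return nal[0] & 0x1f;
}

// Remembers the latest SPS and PPS; push() returns true once both are known,
// which is the point where an init segment could be built.
class ParameterSetGate {
  constructor() {
    this.sps = null;
    this.pps = null;
  }
  push(nal) {
    const type = nalType(nal);
    if (type === 7) this.sps = nal;
    if (type === 8) this.pps = nal;
    return this.sps !== null && this.pps !== null;
  }
}

// Example: a made-up buffer holding an SPS, a PPS and an IDR slice header.
const sample = new Uint8Array([
  0, 0, 0, 1, 0x67, 0x42, 0x00, 0x1f,
  0, 0, 1, 0x68, 0xce,
  0, 0, 0, 1, 0x65, 0x88,
]);
const gate = new ParameterSetGate();
for (const nal of splitNalUnits(sample)) {
  console.log(nalType(nal), gate.push(nal));
}
```

Until push() returns true, incoming frames would be dropped or queued; after that, the stored SPS and PPS plus the following NAL units can be handed to whatever produces the MP4 fragments.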
I'm transferring a live audio stream between 2 Electron window processes using WebRTC. There are no ICE or STUN servers, or anything like that, the connection is established manually through Electron IPC communication (based on this code).
Note: from the technical point of view regarding the audio streams themselves, this is very similar (if not identical) to streaming between 2 browser tabs on the same domain, so this is primarily not a question regarding Electron itself, although Electron IPC would be obviously substituted with a browser equivalent.
The audio stream works, I can transmit audio from one window to another in real-time, as it is generated. That is, I can generate audio (Web Audio API) in window "A" and listen to it through an <audio> element in window "B", or do processing on it using a separate AudioContext in window "B" (although there is some latency).
However, the audio data is heavily altered during streaming: it becomes mono, its quality drops, and there is significant latency. After fiddling around, I've learned that WebRTC does pretty much everything I don't need, including encoding the audio stream with an audio codec, encrypting the transfer, running echo cancellation, and so on.
I don't need these. I need to simply transfer raw audio data through local WebRTC without altering the audio in any way. It needs to be float32 accurate to the sample.
How can I do this with WebRTC?
Why use WebRTC then?
I need to do custom audio processing inside the Web Audio API.
The only way to do this is using a ScriptProcessorNode, which is unusable in production code when there's essentially anything on the page, because it is broken by design (it processes audio on the UI thread, and even slight UI interactions cause audio glitches).
So basically, because of this (and to the best of my knowledge), my only option is to transfer audio with WebRTC streams to another window process, perform ScriptProcessorNode processing there (nothing more is happening in that window, empty DOM, so the processing is always nice and smooth), then send the results back.
This works, but the audio is altered during streaming, which I want to avoid (see above).
Why not use AudioWorklet?
Because Electron is 5 versions behind Chrome unfortunately (version 59 at the moment), and simply does not ship AudioWorklet yet.