Streaming Icecast Audio & Metadata with Javascript and the Web Audio API - javascript

I've been trying to figure out the best way to go about implementing an idea I've had for a while.
Currently, I have an icecast mp3 stream of a radio scanner, with "now playing" metadata that is updated in realtime depending on what channel the scanner has landed on. When using a dedicated media player such as VLC, the metadata is perfectly lined up with the received audio and it functions exactly as I want it to - essentially a remote radio scanner. I would like to implement something similar via a webpage, and on the surface this seems like a simple task.
If all I wanted to do was stream audio, using simple <audio> tags would suffice. However, HTML5 audio players have no concept of the embedded in-stream metadata that icecast encodes along with the mp3 audio data. While I could query the current "now playing" metadata from the icecast server status json, due to client & serverside buffering there could be upwards of 20 seconds of delay between audio and metadata when done in this fashion. When the scanner is changing its "now playing" metadata upwards of every second in some cases, this is completely unsuitable for my application.
There is a very interesting Node.JS solution that was developed with this exact goal in mind - realtime metadata in a radio scanner application: icecast-metadata-js. This shows that it is indeed possible to handle both audio and metadata from a single icecast stream. The live demo is particularly impressive: https://eshaz.github.io/icecast-metadata-js/
However, I'm looking for a solution that can run totally clientside without needing a Node.JS installation and it seems like that should be relatively trivial.
After searching most of the day today, it seems that there are several similar questions asked on this site and elsewhere, without any cohesive, well-laid out answers or recommendations. From what I've been able to gather so far, I believe my solution is to use a Javascript streaming function (such as fetch) to pull the raw mp3 & metadata from the icecast server, playing the audio via Web Audio API and handling the metadata blocks as they arrive. Something like the diagram below:
I'm wondering if anyone has any good reading and/or examples for playing mp3 streams via the Web Audio API. I'm still a relative novice at most things JS, but I get the basic idea of the API and how it handles audio data. What I'm struggling with is the proper way to implement a) the live processing of data from the mp3 stream, and b) detecting metadata chunks embedded in the stream and handling those accordingly.
Apologies if this is a long-winded question, but I wanted to give enough backstory to explain why I want to go about things the specific way I do.
Thanks in advance for the suggestions and help!

I'm glad you found my library icecast-metadata-js! This library can actually be used both client-side and in NodeJS. All of the source code for the live demo, which runs completely client side, is here in the repository: https://github.com/eshaz/icecast-metadata-js/tree/master/src/demo. The streams in the demo are unaltered and are just normal Icecast streams on the server side.
What you have in your diagram is essentially correct. ICY metadata is interlaced within the actual MP3 "stream" data. The metadata interval, i.e. how often ICY metadata updates are sent, can be configured in the Icecast server configuration XML. How closely the metadata tracks the audio also depends on how frequently and accurately your source sends metadata updates to Icecast. The software used in the police scanner on my demo page updates almost exactly in time with the audio.
Usually, the default metadata interval is 16,000 bytes, meaning that for every 16,000 stream (MP3) bytes, a metadata update will be sent from Icecast. The metadata update always begins with a length byte. If the length byte is greater than 0, the length of the metadata that follows is the length byte * 16.
ICY Metadata is a string of key='value' pairs delimited by a semicolon. Any unused length in the metadata update is null padded.
e.g. "StreamTitle='The Stream Title';StreamUrl='https://example.com';\0\0\0\0\0\0"
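As a rough illustration of that format, a metadata string could be turned into a plain object along these lines. This is only a sketch: `parseIcyMetadata` is a made-up helper name (not part of icecast-metadata-js), and it assumes that a value never contains the sequence `';` itself.

```javascript
// Parse an ICY metadata string such as "StreamTitle='a';StreamUrl='b';"
// into a plain object. Hypothetical helper, for illustration only.
function parseIcyMetadata(raw) {
  // Drop the null padding at the end of the metadata block.
  const text = raw.replace(/\0+$/, "");
  const result = {};
  // A value ends at a quote that is followed by ';' (or by the end of the
  // string), so apostrophes inside a title such as "Don't Stop" survive.
  const pattern = /(\w+)='(.*?)'(?:;|$)/g;
  for (const [, key, value] of text.matchAll(pattern)) {
    result[key] = value;
  }
  return result;
}
```

Calling it with the example string above should yield an object with a `StreamTitle` and a `StreamUrl` property.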
read [metadataInterval bytes] -> Stream data
read [1 byte] -> Metadata Length
if [Metadata Length > 0]
    read [Metadata Length * 16 bytes] -> Metadata
| Byte length | Response data | Action |
| --- | --- | --- |
| ICY Metadata Interval | stream data | send to your audio decoder |
| 1 | metadata length byte | use to determine length of metadata string (do not send to audio decoder) |
| Metadata Length * 16 | metadata string | decode and update your "Now Playing" (do not send to audio decoder) |
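The read loop above could be sketched as a small state machine that accepts response chunks of any size, since a chunk boundary can fall in the middle of a metadata block. `IcyDemuxer` and its `onStream` / `onMetadata` callbacks are illustrative names and not the API of icecast-metadata-js, and the metadata is assumed to be UTF-8 text.

```javascript
// Splits an ICY response into audio bytes and metadata strings.
// Illustrative sketch only; not the icecast-metadata-js implementation.
class IcyDemuxer {
  constructor(metaInt, { onStream, onMetadata }) {
    this.metaInt = metaInt;       // value of the Icy-MetaInt response header
    this.onStream = onStream;     // receives Uint8Array slices of audio data
    this.onMetadata = onMetadata; // receives the metadata string
    this.state = "stream";        // "stream" | "length" | "metadata"
    this.remaining = metaInt;     // bytes left in the current section
    this.metaBytes = [];          // metadata collected across chunk boundaries
  }

  // chunk: Uint8Array read from the response body
  push(chunk) {
    let i = 0;
    while (i < chunk.length) {
      if (this.state === "stream") {
        const n = Math.min(this.remaining, chunk.length - i);
        this.onStream(chunk.subarray(i, i + n));
        i += n;
        this.remaining -= n;
        if (this.remaining === 0) this.state = "length";
      } else if (this.state === "length") {
        // The length byte counts 16-byte blocks and is never sent to the decoder.
        this.remaining = chunk[i] * 16;
        i += 1;
        if (this.remaining > 0) {
          this.state = "metadata";
          this.metaBytes = [];
        } else {
          this.state = "stream";
          this.remaining = this.metaInt;
        }
      } else {
        const n = Math.min(this.remaining, chunk.length - i);
        for (let k = i; k < i + n; k++) this.metaBytes.push(chunk[k]);
        i += n;
        this.remaining -= n;
        if (this.remaining === 0) {
          // Assumes UTF-8 text; strip the null padding before reporting it.
          const text = new TextDecoder()
            .decode(Uint8Array.from(this.metaBytes))
            .replace(/\0+$/, "");
          this.onMetadata(text);
          this.state = "stream";
          this.remaining = this.metaInt;
        }
      }
    }
  }
}
```

In a browser, each `Uint8Array` returned by `response.body.getReader().read()` would be passed to `push`.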
The initial GET request to your Icecast server will need to include the Icy-MetaData: 1 header, which tells Icecast to supply the interlaced metadata. The response header will contain the ICY metadata interval Icy-MetaInt, which should be captured (if possible) and used to determine the metadata interval.
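A minimal sketch of that request with the fetch API might look like the following. The function names (`icyRequestInit`, `readMetaInt`, `openIcyStream`) and the `streamUrl` parameter are placeholders chosen for this example, and `readMetaInt` returns `null` when the header is missing or not readable, in which case the interval has to be determined some other way.

```javascript
// Request options that ask Icecast to interlace ICY metadata into the response.
function icyRequestInit() {
  return { method: "GET", headers: { "Icy-MetaData": "1" } };
}

// Read the metadata interval from the response headers.
// Returns null if the header is absent or not exposed to the page.
function readMetaInt(headers) {
  const raw = headers.get("Icy-MetaInt");
  const value = raw === null ? NaN : Number.parseInt(raw, 10);
  return Number.isInteger(value) && value > 0 ? value : null;
}

// Hypothetical wiring: open the stream and hand back the interval and body.
async function openIcyStream(streamUrl) {
  const response = await fetch(streamUrl, icyRequestInit());
  if (!response.ok) {
    throw new Error(`Stream request failed: HTTP ${response.status}`);
  }
  return { metaInt: readMetaInt(response.headers), body: response.body };
}
```

The returned `body` is a `ReadableStream` whose chunks still contain both audio and metadata, so it needs to be split before anything is passed to an audio decoder.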
In the demo, I'm using the client-side fetch API to make that GET request, and the response data is supplied into an instance of IcecastReadableStream which splits out the stream and metadata, and makes each available via callbacks. I'm using the Media Source API to play the stream data, and to get the timing data to properly synchronize the metadata updates.
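One simple way to think about the synchronization, which is not necessarily how the demo does it, is to remember how many audio bytes had arrived before each metadata update and to hold the update back until playback reaches that position. The sketch below assumes a constant bitrate stream and a player whose `currentTime` starts at 0 with the first stream byte; `MetadataQueue` and its method names are hypothetical.

```javascript
// Holds metadata updates until playback reaches the point in the audio
// where they were received. Assumes constant bitrate (CBR) audio.
class MetadataQueue {
  constructor(bitrateKbps) {
    this.bytesPerSecond = (bitrateKbps * 1000) / 8;
    this.pending = []; // stays sorted by time because offsets only grow
  }

  // streamOffset: number of audio bytes received before this metadata block
  add(metadata, streamOffset) {
    this.pending.push({ metadata, time: streamOffset / this.bytesPerSecond });
  }

  // Returns (and removes) every update whose time the player has reached.
  takeDue(currentTime) {
    const due = [];
    while (this.pending.length > 0 && this.pending[0].time <= currentTime) {
      due.push(this.pending.shift().metadata);
    }
    return due;
  }
}
```

In a page, `takeDue(audioElement.currentTime)` could be called from a `timeupdate` handler or a timer, and the last returned entry shown as "Now Playing".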
This is the bare-minimum CORS configuration needed for reading ICY Metadata:
Access-Control-Allow-Origin: '*' // this can be scoped further down to your domain also
Access-Control-Allow-Methods: 'GET, OPTIONS'
Access-Control-Allow-Headers: 'Content-Type, Icy-Metadata'
icecast-metadata-js can detect the ICY metadata interval if needed, but it's better to allow clients to read it from the header with this additional CORS configuration:
Access-Control-Expose-Headers: 'Icy-MetaInt'
Also, I'm planning on releasing a new feature (after I finish with Ogg metadata) that encapsulates the fetch api logic so that all a user needs to do is supply an Icecast endpoint, and get audio / metadata back.

Related

How can I take one frame from each MediaStream?

In our JavaScript app we are trying to extract a single frame from each video tag. Each tag shows a MediaStream from a user (with an audioTrack and a videoTrack), which the user obtained using navigator.mediaDevices.getUserMedia and then sent using the peerjs API, with peerConnection.answer(stream).
Now I am trying to extract a single frame from the MediaStream's videoTrack, which will then be sent to another server.
I have not dealt with MediaStreams in the past and would like to know any suggestions on how to implement this. I will include the entire code in the next few days, but it will take some time to crop the relevant segments. Thank you.

Send live recording from HTML frontend to Google Cloud Speech via Flask backend

Alright, so I'm working on a class project and I'm trying to send a recording made using javascript's navigator.mediaDevices.getUserMedia and MediaRecorder classes to the backend of my web application (written in Python, Flask) and to the Google Speech to Text API (google-cloud-speech)
So far, I've gotten to the point of making a recording, but I can't seem to get it to the Google Cloud API successfully. Here's how I'm trying to do it:
Use navigator.mediaDevices.getUserMedia to recognize the user's microphone
Use the resulting audio stream to make a MediaRecorder object
Use that recorder object to make a blob with the resulting audio (with {'type' : 'audio/flac'})
Base64Encode it and write it to a hidden form element, and submit the corresponding form
From there, the resulting POST request goes to my Python Flask backend, where it reads in the Base64 encoded string as a... string
Attempt to use the google-cloud-speech client to decode the text
It's not working. I'm using the Python library, and I can't seem to send the base64 string directly (because the Python library wants bytes instead). I've tried base64decoding the string back into bytes, but when I ran it through the API, I always seem to get empty ([]) results. After looking this up briefly, it seems that sample rate could be a problem. I've attempted to set the sample rate on the navigator.mediaDevices.getUserMedia() object to 16000--the constructor looks like this:
navigator.mediaDevices.getUserMedia({ audio: true, sampleRate: 16000 })
and the config part of my client.recognize() call (in my Python backend) looks like this:
config = speech.RecognitionConfig(
encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
sample_rate_hertz=int(sampleRate),
language_code="en-US",
)
Does anyone have any idea what the issue(s) are here?

Bypass the 6 downloads limit for multiple videos watching

I have to code a website with the capability of watching many live streams (video-surveillance cameras) at the same time.
So far, I'm using MJPEG and JS to play my live videos and it is working well ... but only up to 6 streams!
Indeed, I'm stuck with the limit of 6 parallel downloads that most browsers have.
Does someone know how to bypass this limit? Is there a tip?
So far, my options are:
increase the limit (only possible on Firefox) but I don't like messing with my users browser settings
merge the streams into one big stream/video on the server side, so that I have only one download at a time. But then I won't be able to deal with each stream individually, will I?
Switch to a JPEG stream and deal with a queue of images to be refreshed on the front side (but if I have, say, 15 streams, I'm afraid I will overwhelm the client browser with requests (15 x 25 images/s))
Do I have any other options? Is there a tip or a library? For example, could I merge my streams into one big pipe (so one download at a time) but have access to each one individually in the front-end code?
I'm sure I'm on the right stack-exchange site to ask this, if I'm not please tell me ;-)
Why not stream (if you have control over the server side and the line is capable) in one connection? You make one request for all 15 streams to be sent/streamed in one connection (not one big stream), so the headers of each chunk have to match the appropriate stream ID. Read more: http://qnimate.com/what-is-multiplexing-in-http2/
More in-depth here: https://hpbn.co/http2/
With HTTP/1.0 and HTTP/1.1 you are out of luck for this scenario. Back when they were developed, a single video or MP3 file was already heavy stuff (workarounds were e.g. torrent libraries, but those are unreliable and not suited for most scenarios apart from mere downloading/streaming). For your interactive scenario HTTP/2 is the way to go, in my opinion.
As Codebreaker007 said, I would prefer HTTP2 stream multiplexing too. It is specifically designed to get around the very problem of too many concurrent connections.
However, if you are stuck with HTTP1.x I don't think you're completely out of luck. It is possible to merge the streams in a way so that the clientside can destructure and manipulate the individual streams, although admittedly it takes a bit more work, and you might have to resort to clientside polling.
The idea is simple - define a really simple data structure:
[streamCount len1 data1 len2 data2 ...]
Byte 0 ~ 3: 32-bit unsigned int, number of merged streams
Byte 4 ~ 7: 32-bit unsigned int, length of data of stream 1 (len1)
Byte 8 ~ 8+len1-1: binary data of stream 1
Byte 8+len1 ~ 8+len1+3: 32-bit unsigned int, length of data of stream 2 (len2)
...
Each data is allowed to have a length of 0, and is handled no differently in this case.
On the clientside, poll continuously for more data, expecting this data structure. Then destructure it and pipe the data to the individual streams' buffer. Then you can still manipulate the component streams individually.
On the serverside, cache the data from individual component streams in memory. Then in each response empty the cache, compose this data structure and send.
But again, this is very much a plaster solution. I would recommend using HTTP2 stream as well, but this would be a reasonable fallback.
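The framing described above could be sketched in JavaScript roughly as follows. `packStreams` and `unpackStreams` are hypothetical helper names, and big-endian byte order for the 32-bit fields is an assumption, since the layout above does not specify one; both sides just need to agree.

```javascript
// Pack several binary buffers into [streamCount len1 data1 len2 data2 ...].
// All 32-bit fields are written big-endian (an assumption of this sketch).
function packStreams(buffers) {
  const total = 4 + buffers.reduce((sum, b) => sum + 4 + b.length, 0);
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  view.setUint32(0, buffers.length);
  let offset = 4;
  for (const b of buffers) {
    view.setUint32(offset, b.length); // a length of 0 is allowed
    out.set(b, offset + 4);
    offset += 4 + b.length;
  }
  return out;
}

// Reverse of packStreams: returns one Uint8Array per component stream.
function unpackStreams(packed) {
  const view = new DataView(packed.buffer, packed.byteOffset, packed.byteLength);
  const count = view.getUint32(0);
  const buffers = [];
  let offset = 4;
  for (let i = 0; i < count; i++) {
    const len = view.getUint32(offset);
    buffers.push(packed.subarray(offset + 4, offset + 4 + len));
    offset += 4 + len;
  }
  return buffers;
}
```

On the client, each polled response would be passed through `unpackStreams`, and the resulting buffers appended to the corresponding per-stream buffer.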

WebAudio API - Get audio data from an AudioNode

Given an AudioNode, is there any way to directly get the audio data from it? The data could be an ArrayBuffer, an AudioBuffer, a TypedArray or something similar.
I don't want to use any kind of Media Stream stuff.
Connect a ScriptProcessorNode or AudioWorkletNode to the output of the node you're interested in. These will give you buffers of audio data. You'll have to figure out how and where to save all the data, but these nodes will give you the audio data. If you can, use AudioWorkletNode, which isn't subject to main thread loading issues.

Web Audio API: Collect all audio informations at "once"

I know that I can collect audio data of a currently playing audio source with getByteFrequencyData(), and I'll get back a Uint8Array.
Right now I collect all data of one audio file by pushing the current data into an array on each animation frame. For example:
I have an audio file with a duration of 20 min.
Then after 20 min I have all the audio data in one array, which looks kind of like this:
var data = [Uint8Array[1024], Uint8Array[1024], Uint8Array[1024], Uint8Array[1024], ... ];
Is there a faster way to get all this audio data, so I don't have to wait the full 20 minutes of the video and can get the audio data nearly instantly?
It would be good to receive the audio information in fixed steps of, say, 50 ms!
Instead of using an AudioContext, use an OfflineAudioContext to process the data. This can run much faster than real time. To get consistent data at well defined times, you'll also need to use a browser that implements the recently added suspend and resume feature for offline contexts so that you can sample the data for getByteFrequencyData at well-defined time intervals.
