Streaming the Microphone output via HTTP POST using chunked transfer - javascript

We are trying to build an app to broadcast live audio to multiple subscribers. The server (written in Go) accepts PCM data in chunks, and a client using PyAudio is able to tap into the microphone and send this data using the code below. We have tested this and it works: the audio plays from any browser with the subscriber URL.
import pyaudio
import requests

p = pyaudio.PyAudio()

CHUNK = 1024              # frames per buffer
FORMAT = pyaudio.paInt16  # 16 bits per sample
RATE = 44100              # 44.1 kHz sampling rate
CHANNELS = 1              # mono

STREAM = p.open(
    format=FORMAT,
    channels=CHANNELS,
    rate=RATE,
    input=True,
    frames_per_buffer=CHUNK
)

print("initialized stream")

def get_chunks(stream):
    while True:
        try:
            chunk = stream.read(CHUNK, exception_on_overflow=False)
            yield chunk
        except IOError as ioe:
            print("error %s" % ioe)

url = "https://<server-host>/stream/publish/<uuid>/"
s = requests.session()
s.headers.update({'Content-Type': "audio/x-wav;codec=pcm"})
resp = s.post(url, data=get_chunks(STREAM))
But we need browser, iOS and Android clients to do the same thing as the above client does. We are able to fetch the audio from the mic using the getUserMedia API in the browser, but are unable to send this audio to the server the way the Python code above does. Can someone point us in the right direction?

This is about a year old now, so I am sure you've moved on, but I think the approach to use from the browser is to stream the data over a WebSocket rather than over HTTP.
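A minimal sketch of that approach, assuming a server that accepts binary WebSocket frames (the endpoint URL and the 250 ms timeslice are placeholders, not part of the original setup):

```javascript
// Sketch only: capture the mic with getUserMedia/MediaRecorder and push each
// encoded chunk over a WebSocket. The endpoint URL is hypothetical. Note that
// MediaRecorder emits compressed audio (typically webm/opus), not raw PCM, so
// the server would need to decode it, or the client would need a Web Audio
// worklet to produce PCM like the PyAudio publisher does.
function startStreaming(wsUrl) {
  const ws = new WebSocket(wsUrl);
  ws.binaryType = 'arraybuffer';
  ws.onopen = async () => {
    const media = await navigator.mediaDevices.getUserMedia({ audio: true });
    const recorder = new MediaRecorder(media);
    recorder.ondataavailable = (event) => {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(event.data); // Blob of encoded audio
      }
    };
    recorder.start(250); // emit a chunk roughly every 250 ms
  };
  return ws;
}
```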

Related

trim or cut ArrayBuffer from audio by timestamp in node

I am fetching data from a remote URL that hosts some audio. For instance: https://www.listennotes.com/e/p/98bcfa3fd1b44727913385938788bcc5/
I do this with the following code:
const buffer = await (await fetch(url)).arrayBuffer();
How do I trim/cut this ArrayBuffer of audio by time? For example, I might want the ArrayBuffer/Blob between 12 seconds and 60 seconds.
All the solutions I have found are web solutions. I am hoping for a way to do this server-side with Node.
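For uncompressed PCM WAV data this can be done with plain byte arithmetic in Node; a sketch, assuming a canonical 44-byte header and a known sample format (real files can carry extra chunks, so a production version should parse the header, and compressed formats like the MP3 behind that URL need a decoder such as ffmpeg first):

```javascript
// Hypothetical helper: trim a PCM WAV Buffer between two timestamps (seconds)
// by computing byte offsets into the sample data, then rebuilding a minimal
// RIFF/WAVE header for the trimmed slice.
function trimWav(buffer, startSec, endSec,
                 { sampleRate = 44100, channels = 1, bitsPerSample = 16 } = {}) {
  const bytesPerSec = sampleRate * channels * (bitsPerSample / 8);
  const headerSize = 44; // canonical PCM WAV header; real files may differ
  const start = headerSize + Math.floor(startSec * bytesPerSec);
  const end = Math.min(buffer.length, headerSize + Math.floor(endSec * bytesPerSec));
  const pcm = buffer.subarray(start, end); // zero-copy view of the samples

  const header = Buffer.alloc(headerSize);
  header.write('RIFF', 0);
  header.writeUInt32LE(36 + pcm.length, 4);
  header.write('WAVE', 8);
  header.write('fmt ', 12);
  header.writeUInt32LE(16, 16);                            // fmt chunk size
  header.writeUInt16LE(1, 20);                             // audio format: PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(bytesPerSec, 28);                   // byte rate
  header.writeUInt16LE(channels * (bitsPerSample / 8), 32); // block align
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36);
  header.writeUInt32LE(pcm.length, 40);
  return Buffer.concat([header, pcm]);
}
```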

Split websocket message in multiple frames

I am using the native JavaScript WebSocket in the browser, and we have an application hosted on AWS where every request goes through API Gateway.
In some cases the request data goes up to 60 KB, and then my WebSocket connection closes automatically. In the AWS documentation, I found the following explanation of this issue:
https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-known-issues.html
API Gateway supports message payloads up to 128 KB with a maximum frame size of 32 KB. If a message exceeds 32 KB, you must split it into multiple frames, each 32 KB or smaller. If a larger message is received, the connection is closed with code 1009.
I tried to find out how to split a message into multiple frames using the native JavaScript WebSocket, but could not find any frame-related config in the documentation or anywhere else.
I did find something about message fragmentation, but it seems like a custom solution that I would need to implement on both the frontend and the backend:
https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_servers#message_fragmentation
As far as I know, you cannot do this using the JS AWS SDK "postToConnection" API. The best you can do is write your own poor man's fragmentation and send the chunks as independent messages.
const splitInChunks =
  (sizeInBytes: number) =>
  (buffer: Buffer): Buffer[] => {
    const size = Buffer.byteLength(buffer);
    let start = 0;
    let end = sizeInBytes;
    const chunks: Buffer[] = [];
    do {
      chunks.push(buffer.subarray(start, end));
      start += sizeInBytes;
      end += sizeInBytes;
    } while (start < size);
    return chunks;
  };
Where sizeInBytes must be smaller than 32 KB. Then you iterate over the chunks:
await Promise.all(chunks.map(c => apiGatewayClient.postToConnection({ data: JSON.stringify(c), connectionId: myConnectionId })))
This may run into rate limits depending on the number of chunks, so consider sending the requests serially rather than in parallel.
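A sketch of the serial variant (the helper name is made up; the `post` callback stands in for the `postToConnection` call above):

```javascript
// Send each chunk only after the previous one has completed, instead of
// firing them all at once with Promise.all. Slower, but avoids bursts that
// can trip API Gateway rate limits.
async function sendSerially(chunks, post) {
  for (const chunk of chunks) {
    await post(chunk); // e.g. c => apiGatewayClient.postToConnection({...})
  }
}
```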
Final remark: Buffer.prototype.subarray is very efficient because it does not reallocate memory: the new chunks point at the same memory space of the original buffer. Think pointer arithmetic in C.
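That zero-copy behaviour is easy to verify: writing through a subarray view mutates the original buffer.

```javascript
const original = Buffer.from([1, 2, 3, 4, 5]);
const chunk = original.subarray(1, 3); // a view, not a copy: same underlying memory
chunk[0] = 99;                         // write through the view...
console.log(original[1]);              // 99: ...and the parent buffer sees it
```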

How to read binary data response from AWS when doing a GET directly to an S3 URI in browser?

Some general context: This is an app that uses the MERN stack, but the question is more specific to AWS S3 data.
I have an S3 bucket set up and I store images and files from my app there. I usually generate signed URLs with the server and do a direct upload from the browser.
Within my app DB I store the object URIs as strings, and an image, for example, I can render with an <img/> tag no problem. So far so good.
However, when they are PDFs and I want to let the user download the PDF I stored in S3, doing an <a href={s3Uri} download> just causes the PDF to open in another window/tab instead of prompting the user to download. I believe this is because the download attribute only works for same-origin URLs, so you cannot force a download of a file from an external origin (correct me if I'm wrong, please).
So my next attempt was to do an HTTP fetch of the resource directly using axios; it looks something like this:
axios.create({
  baseURL: attachment.fileUrl,
  headers: { common: { Authorization: '' } }
})
  .get('')
  .then(res => {
    console.log(res)
    console.log(typeof res.data)
    console.log(Buffer.from(res.data).toString())
  })
So by doing this I am successfully reading the response headers (useful because then I can handle images and files differently), BUT when I try to read the binary data returned, I have been unsuccessful at parsing it or even determining how it is encoded. It looks like this:
%PDF-1.3
3 0 obj
<</Type /Page
/Parent 1 0 R
/Resources 2 0 R
/Contents 4 0 R>>
endobj
4 0 obj
<</Filter /FlateDecode /Length 1811>>
stream
x�X�R�=k=E׷�������Na˅��/���� �[�]��.�,��^ �wF0�.��Ie�0�o��ݧO_IoG����p��4�BJI���g��d|��H�$�12(R*oB��:%먺�����:�R�Ф6�Xɔ�[:�[��h�(�MQ���>���;l[[��VN�hK/][�!�mJC
.... and so on
I have another function I use to allow users to download PDFs that I store directly in my database as base64 strings. These are PDFs my app generates, and they are fairly small, so I store them directly in the DB, as opposed to the ones I store in AWS S3, which are user-submitted and can be several MB in size (the ones in my DB are just a few KB).
The function I use to process my base64 PDFs and provide a downloadable link to the user looks like this:
export const makePdfUrlFromBase64 = (base64) => {
  const binaryImg = atob(base64);
  const binaryImgLength = binaryImg.length;
  const arrayBuffer = new ArrayBuffer(binaryImgLength);
  const uInt8Array = new Uint8Array(arrayBuffer);
  for (let i = 0; i < binaryImgLength; i++) {
    uInt8Array[i] = binaryImg.charCodeAt(i);
  }
  const outputBlob = new Blob([uInt8Array], {type: 'application/pdf'});
  return URL.createObjectURL(outputBlob)
}
HOWEVER, when I try to apply this function to the data returned from AWS, I get this error:
DOMException: Failed to execute 'atob' on 'Window': The string to be decoded contains characters outside of the Latin1 range.
So what kind of binary data encoding do i have here from AWS?
Note: I am able to render an image with this binary data by passing the src in the img tag like this:
<img src={`data:${res.headers['Content-Type']};base64,${res.data}`} />
which is my biggest hint that this is some form of base64?
PLEASE! If anyone has a clue how I can achieve my goal here, I'm all ears! The goal is to be able to prompt the user to download the resource which I have at an S3 URI. I can link to it and they can open it in the browser, then download manually, but I want to force the prompt.
Does anybody know what kind of data is being returned here? Any way to parse it as a stream? A buffer?
I have tried to stringify it with JSON and to log it to the console as a string; I'm open to all suggestions at this point.
You're doing all kinds of unneeded conversions. When you do the GET request, you already have the data in the desired format.
const response = await fetch(attachment.fileUrl, {
  headers: { Authorization: '' }
});
const blob = await response.blob();
return URL.createObjectURL(blob);
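If the goal is specifically to force the download prompt, the object URL can then be fed to a temporary anchor with a download attribute. A sketch (the function name and the filename argument are placeholders); this works because blob: URLs are same-origin, so the download attribute is honored:

```javascript
// Sketch: fetch the S3 object as a Blob, then force a download prompt via a
// temporary <a download> element instead of navigating to the file.
async function downloadFromS3(fileUrl, filename) {
  const response = await fetch(fileUrl);
  const blob = await response.blob();
  const objectUrl = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = objectUrl;
  a.download = filename;        // honored for same-origin blob: URLs
  document.body.appendChild(a);
  a.click();
  a.remove();
  URL.revokeObjectURL(objectUrl); // free the object URL once triggered
}
```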

Send audio data represented as a numpy array from Python to JavaScript

I have a TTS (text-to-speech) system that produces audio in numpy-array form whose data type is np.float32. This system is running in the backend and I want to transfer the data from the backend to the frontend to be played when a certain event happens.
The obvious solution for this problem is to write the audio data on disk as a wav file and then pass the path to the frontend to be played. This worked fine, but I don't want to do that for administrative reasons. I just want to transfer only the audio data (numpy array) to the frontend.
What I have done till now is the following:
backend
text = "Hello"
wav, sr = tts_model.synthesize(text)
data = {"snd": wav.tolist()}
flask_response = app.response_class(response=flask.json.dumps(data),
                                    status=200,
                                    mimetype='application/json')
# then return flask_response
frontend
// gets wav from backend
let arrayData = new Float32Array(wav);
let blob = new Blob([ arrayData ]);
let url = URL.createObjectURL(blob);
let snd = new Audio(url);
snd.play()
That is what I have done so far, but JavaScript throws the following error:
Uncaught (in promise) DOMException: Failed to load because no supported source was found.
This is the gist of what I'm trying to do. I'm sorry, you can't reproduce the error as you don't have the TTS system, so here is an audio file generated by it which you can use to see what I'm doing wrong.
Other things I tried:
Change the audio datatype to np.int8 or np.int16, to be cast in the JavaScript by Int8Array() and Int16Array() respectively.
Tried different types when creating the blob, such as {"type": "application/text;charset=utf-8;"} and {"type": "audio/ogg; codecs=opus;"}.
I have been struggling with this issue for so long, so any help is appreciated!
Convert wav array of values to bytes
Right after synthesis you can convert the numpy array of wav samples to a bytes object, then encode it via base64.
import io
import base64
from scipy.io.wavfile import write

byte_io = io.BytesIO()
write(byte_io, sr, wav)
byte_io.seek(0)  # rewind before reading back
wav_bytes = byte_io.read()
audio_data = base64.b64encode(wav_bytes).decode('UTF-8')
This can be used directly to create html audio tag as source (with flask):
<audio controls src="data:audio/wav;base64, {{ audio_data }}"></audio>
So, all you need is to convert wav and sr into audio_data representing a raw .wav file, and use it as a parameter of render_template in your Flask app (a solution without sending).
Or, if you send audio_data, then in the .js file where you receive the response, use audio_data to construct the URL (placed as the src attribute just like in the html):
// get audio_data from response
let snd = new Audio("data:audio/wav;base64, " + audio_data);
snd.play()
because:
Audio(url) return value:
A new HTMLAudioElement object, configured to be used for playing back the audio from the file specified by url. The new object's preload property is set to auto and its src property is set to the specified URL or null if no URL is given. If a URL is specified, the browser begins to asynchronously load the media resource before returning the new object.
Your sample as-is does not work out of the box (it does not play). However, these do work:
StarWars3.wav: OK (retrieved from cs.uic.edu)
your sample encoded as PCM16 instead of PCM32: OK (check the wav metadata)
Flask
from flask import Flask, render_template, json
import base64

app = Flask(__name__)

with open("sample_16.wav", "rb") as binary_file:
    # Read the whole file at once
    data = binary_file.read()
wav_file = base64.b64encode(data).decode('UTF-8')

@app.route('/wav')
def hello_world():
    data = {"snd": wav_file}
    res = app.response_class(response=json.dumps(data),
                             status=200,
                             mimetype='application/json')
    return res

@app.route('/')
def stat():
    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=True)
js
<audio controls></audio>
<script>
;(async _ => {
  const res = await fetch('/wav')
  let {snd: b64buf} = await res.json()
  document.querySelector('audio').src = "data:audio/wav;base64, " + b64buf;
})()
</script>
Original Poster Edit
So, what I ended up doing (using this solution) that solved my problem was the following:
First, change the datatype from np.float32 to np.int16:
wav = (wav * np.iinfo(np.int16).max).astype(np.int16)
Write the numpy array into a temporary wav file using scipy.io.wavfile:
from scipy.io import wavfile
wavfile.write(".tmp.wav", sr, wav)
Read the bytes from the tmp file:
# read the bytes
with open(".tmp.wav", "rb") as fin:
    wav = fin.read()
Delete the temporary file
import os
os.remove(".tmp.wav")

FFmpeg converting from video to audio missing duration

I'm attempting to load YouTube videos via their direct video URL (retrieved using ytdl-core). I load them using the request library. I then pipe the result to a stream, which is used as the input to ffmpeg (via fluent-ffmpeg). The code looks something like this:
var getAudioStream = function(req, res) {
  var requestUrl = 'http://youtube.com/watch?v=' + req.params.videoId;
  var audioStream = new PassThrough();
  var videoUrl;
  ytdl.getInfo(requestUrl, { downloadURL: true }, function(err, info) {
    res.setHeader('Content-Type', 'audio/x-wav');
    res.setHeader('Accept-Ranges', 'bytes');
    videoUrl = info.formats ? info.formats[0].url : '';
    request(videoUrl).pipe(audioStream);
    ffmpeg()
      .input(audioStream)
      .outputOptions('-map_metadata 0')
      .format('wav')
      .pipe(res);
  });
};
This actually works just fine, and the frontend successfully receives just the audio in WAV format and is playable. However, the audio is missing any information about its size or duration (and all other metadata). This also makes it unseekable.
I'm assuming this is lost somewhere during the ffmpeg stage, because if I load the video directly via the URL passed to request it loads and plays fine, and has a set duration/is seekable. Any ideas?
It isn't possible to know the output size or duration until the encode is finished; FFmpeg cannot know this information ahead of time in most cases. Even if it could, the way you are executing FFmpeg prevents you from accessing that extra information.
Besides, to support seeking you need to support range requests. This isn't possible either, short of encoding the file up to the byte requested and streaming from there on.
Basically, this isn't possible by the nature of what you're doing.
