Send audio data represented as a numpy array from Python to JavaScript - javascript

I have a TTS (text-to-speech) system that produces audio as a numpy array whose data type is np.float32. The system runs in the backend, and I want to transfer its output to the frontend to be played when a certain event happens.
The obvious solution is to write the audio data to disk as a wav file and then pass the path to the frontend to be played. This worked fine, but I don't want to do that for administrative reasons; I want to transfer only the audio data (the numpy array) to the frontend.
What I have done till now is the following:
backend
text = "Hello"
wav, sr = tts_model.synthesize(text)
data = {"snd": wav.tolist()}
flask_response = app.response_class(response=flask.json.dumps(data),
                                    status=200,
                                    mimetype='application/json')
# then return flask_response
frontend
// gets wav from backend
let arrayData = new Float32Array(wav);
let blob = new Blob([ arrayData ]);
let url = URL.createObjectURL(blob);
let snd = new Audio(url);
snd.play()
That's what I have done so far, but JavaScript throws the following error:
Uncaught (in promise) DOMException: Failed to load because no supported source was found.
This is the gist of what I'm trying to do. I'm so sorry, you can't reproduce the error as you don't have the TTS system, so this is an audio file generated by it which you can use to see what I'm doing wrong.
Other things I tried:
Changing the audio datatype to np.int8 or np.int16, to be cast in JavaScript by Int8Array() and Int16Array() respectively.
Trying different types when creating the blob, such as {"type": "application/text;charset=utf-8;"} and {"type": "audio/ogg; codecs=opus;"}.
I have been struggling with this issue for so long, so any help is appreciated!!

Convert wav array of values to bytes
Right after synthesis you can convert the numpy wav array to a bytes object and then encode it via base64.
import base64
import io
from scipy.io.wavfile import write

byte_io = io.BytesIO()
write(byte_io, sr, wav)
wav_bytes = byte_io.getvalue()  # read the full buffer; .read() here would start at the current position
audio_data = base64.b64encode(wav_bytes).decode('UTF-8')
This can be used directly as the source of an HTML audio tag (with Flask):
<audio controls src="data:audio/wav;base64, {{ audio_data }}"></audio>
So, all you need is to convert wav and sr into audio_data representing the raw .wav file, and pass it as a parameter of render_template in your Flask app. (Solution without sending)
Or, if you send audio_data, then in the .js file where you accept the response, use audio_data to construct a URL (which would be placed as the src attribute, like in the html above):
// get audio_data from response
let snd = new Audio("data:audio/wav;base64, " + audio_data);
snd.play()
because:
Audio(url) Return value:
A new HTMLAudioElement object, configured to be used for playing back the audio from the file specified by url. The new object's preload property is set to auto and its src property is set to the specified URL, or null if no URL is given. If a URL is specified, the browser begins to asynchronously load the media resource before returning the new object.

Your sample as-is does not work out of the box. (It does not play.)
However, these do:
StarWars3.wav: OK, retrieved from cs.uic.edu
your sample re-encoded as PCM16 instead of PCM32: OK (check the wav metadata)
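The PCM16-vs-PCM32 distinction above is the crux: the float32 sample did not play while the 16-bit version did. A minimal sketch of the float-to-int16 scaling (stdlib only; the clamp is my addition to guard against rounding overflow, not part of the original answer):

```python
# scale a float sample in [-1.0, 1.0] to a 16-bit PCM value
def float_to_pcm16(sample):
    # clamp first so out-of-range floats can't overflow the int16 range
    clamped = max(-1.0, min(1.0, sample))
    return int(clamped * 32767)

# a few illustrative samples
samples = [0.0, 0.5, -0.5, 1.0, -1.0]
pcm = [float_to_pcm16(s) for s in samples]
```

The same scaling appears in the original poster's edit further down, written with numpy as `(wav * np.iinfo(np.int16).max).astype(np.int16)`.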
Flask
from flask import Flask, render_template, json
import base64

app = Flask(__name__)

with open("sample_16.wav", "rb") as binary_file:
    # Read the whole file at once
    data = binary_file.read()
wav_file = base64.b64encode(data).decode('UTF-8')

@app.route('/wav')
def hello_world():
    data = {"snd": wav_file}
    res = app.response_class(response=json.dumps(data),
                             status=200,
                             mimetype='application/json')
    return res

@app.route('/')
def stat():
    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=True)
js
<audio controls></audio>
<script>
;(async _ => {
  const res = await fetch('/wav')
  let {snd: b64buf} = await res.json()
  document.querySelector('audio').src = "data:audio/wav;base64, " + b64buf;
})()
</script>
Original Poster Edit
So, what I ended up doing (based on this solution), which solved my problem, is the following:
First, change the datatype from np.float32 to np.int16:
wav = (wav * np.iinfo(np.int16).max).astype(np.int16)
Write the numpy array into a temporary wav file using scipy.io.wavfile:
from scipy.io import wavfile
wavfile.write(".tmp.wav", sr, wav)
Read the bytes from the tmp file:
# read the bytes
with open(".tmp.wav", "rb") as fin:
    wav = fin.read()
Delete the temporary file
import os
os.remove(".tmp.wav")
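For what it's worth, the temporary file in the steps above can be avoided by writing the wav container into an in-memory buffer instead. A sketch using the standard-library wave module in place of scipy.io.wavfile (the generated tone is a hypothetical stand-in for the TTS output):

```python
import base64
import io
import math
import struct
import wave

# hypothetical stand-in for the TTS output: 1 second of a 440 Hz tone
sr = 22050
samples = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]

# scale floats in [-1, 1] to int16 PCM, as in the steps above
pcm16 = [int(s * 32767) for s in samples]

# write the .wav container into an in-memory buffer instead of a temp file
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)   # 16-bit samples
    w.setframerate(sr)
    w.writeframes(struct.pack("<%dh" % len(pcm16), *pcm16))
wav_bytes = buf.getvalue()

# base64-encode for a JSON response or a data: URL
audio_b64 = base64.b64encode(wav_bytes).decode("ascii")
```

The resulting audio_b64 string can be sent exactly like the base64 payload in the accepted answer, with no file on disk and no cleanup step.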

Related

Sending image over websockets (from python backend to javascript frontend)

I'm trying to send a frame from a video feed in OpenCV (Python) over websockets to a React frontend.
I can encode and send it fine, but when I try to decode it I get an error saying the string wasn't properly encoded.
backend code
# open url to view info
stream_url = urlopen(url)
# read information from url, convert into a byte array
stream_feed = stream_url.read()
image = np.asarray(bytearray(stream_feed), dtype="uint8")
str_frame = str(b64encode(frame))
payload = json.dumps({
    'image': str_frame
})
await remoteSocket.send(payload)
frontend code
const image = decodeURIComponent(atob(orgImg))
const img = new Image()
img.src = image
img.alt = 'frame'
The string seems to contain a b before the single quotes; I thought that might be it, so I trimmed it off, but no luck. I've also tried it without decodeURIComponent.
And I've tried it by adding data:image/jpeg;base64 to the start.
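A likely culprit, going by the stray b mentioned above: calling str() on the bytes returned by b64encode() keeps the b'...' wrapper inside the payload, while .decode() yields clean text that atob() on the frontend can handle. A sketch with hypothetical frame bytes:

```python
import base64
import json

# hypothetical stand-in for the encoded frame bytes
image = b"\xff\xd8\xff\xe0fake-jpeg-bytes"

# str() on bytes keeps the b'...' wrapper, which then ends up inside the payload
wrong = str(base64.b64encode(image))

# .decode() yields plain ASCII text that atob() on the frontend can decode
right = base64.b64encode(image).decode("ascii")

payload = json.dumps({"image": right})
```

With the decoded form, no trimming of quotes or prefixes is needed on the JavaScript side.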

How to read binary data response from AWS when doing a GET directly to an S3 URI in browser?

Some general context: This is an app that uses the MERN stack, but the question is more specific to AWS S3 data.
I have an S3 bucket set up and I store images and files from my app there. I usually generate signed URLs with the server and do a direct upload from the browser.
Within my app's db I store the object URIs as strings, and an image, for example, I can then render with an <img/> tag no problem. So far so good.
However, when they are PDFs and I want to let the user download the PDF I stored in S3, doing an <a href={s3Uri} download> just causes the pdf to be opened in another window/tab instead of prompting the user to download. I believe this is because the download attribute depends on same-origin, and you cannot download a file from an external resource (correct me if I'm wrong, please).
So then my next attempt is to do an HTTP fetch of the resource directly using axios; it looks something like this:
axios.create({
  baseURL: attachment.fileUrl,
  headers: {common: {Authorization: ''}}
})
.get('')
.then(res => {
  console.log(res)
  console.log(typeof res.data)
  console.log(new Buffer.from(res.data).toString())
})
So by doing this I am successfully reading the response headers (useful because then I can handle images and files differently), BUT when I try to read the binary data returned, I have been unsuccessful at parsing it or even determining how it is encoded. It looks like this:
%PDF-1.3
3 0 obj
<</Type /Page
/Parent 1 0 R
/Resources 2 0 R
/Contents 4 0 R>>
endobj
4 0 obj
<</Filter /FlateDecode /Length 1811>>
stream
x�X�R�=k=E׷�������Na˅��/���� �[�]��.�,��^ �wF0�.��Ie�0�o��ݧO_IoG����p��4�BJI���g��d|��H�$�12(R*oB��:%먺�����:�R�Ф6�Xɔ�[:�[��h�(�MQ���>���;l[[��VN�hK/][�!�mJC
.... and so on
I have another function I use to allow users to download PDFs that I store directly in my database as base64 strings. These are PDFs my app generates; they are fairly small, so I store them directly in the DB, as opposed to the ones in AWS S3, which are user-submitted and can be several MBs in size (the ones in my db are just a few KB).
The function I use to process my base64 pdfs and provide a downloadable link to the users looks like this:
export const makePdfUrlFromBase64 = (base64) => {
  const binaryImg = atob(base64);
  const binaryImgLength = binaryImg.length;
  const arrayBuffer = new ArrayBuffer(binaryImgLength);
  const uInt8Array = new Uint8Array(arrayBuffer);
  for (let i = 0; i < binaryImgLength; i++) {
    uInt8Array[i] = binaryImg.charCodeAt(i);
  }
  const outputBlob = new Blob([uInt8Array], {type: 'application/pdf'});
  return URL.createObjectURL(outputBlob);
}
HOWEVER, when I try to apply this function to the data returned from AWS, I get this error:
DOMException: Failed to execute 'atob' on 'Window': The string to be decoded contains characters outside of the Latin1 range.
So what kind of binary encoding do I have here from AWS?
Note: I am able to render an image with this binary data by passing the src in the img tag like this:
<img src={`data:${res.headers['Content-Type']};base64,${res.data}`} />
which is my biggest hint that this is some form of base64?
PLEASE! If anyone has a clue how I can achieve my goal here, I'm all ears! The goal is to be able to prompt the user to download the resource which I have at an S3 URI. I can link to it and they can open it in the browser and then download manually, but I want to force the prompt.
Anybody know what kind of data is being returned here? Any way to parse it as a stream? A buffer?
I have tried to stringify it with JSON and to log it to the console as a string; I'm open to all suggestions at this point.
You're doing all kinds of unneeded conversions. When you do the GET request, you already have the data in the desired format:
const response = await fetch(attachment.fileUrl, {
  headers: { Authorization: '' }
});
const blob = await response.blob();
return URL.createObjectURL(blob);

Google TTS in Django: Create Audio File in Javascript from base64 String

I am currently using Google's TTS Python API "synthesize_text" function in one of my Django views.
def synthesize_text(text):
    """Synthesizes speech from the input string of text."""
    from google.cloud import texttospeech
    client = texttospeech.TextToSpeechClient()
    input_text = texttospeech.types.SynthesisInput(text=text)
    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.types.VoiceSelectionParams(
        language_code='en-US',
        ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
    audio_config = texttospeech.types.AudioConfig(
        audio_encoding=texttospeech.enums.AudioEncoding.MP3)
    response = client.synthesize_speech(input_text, voice, audio_config)
    # The response's audio_content is binary.
    # Removing this because I do not care about writing the audio file
    # ----------------------------------------------------
    '''
    with open('output.mp3', 'wb') as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')
    '''
    # ----------------------------------------------------
    # instead return the encoded audio_content to decode and play in Javascript
    return response.audio_content

def my_view(request):
    test_audio_content = synthesize_text('Test audio.')
    return render(request, 'my_template.html', {'test_audio_content': test_audio_content})
The only change I made to the "synthesize_text" function is that I return the audio_content instead of writing it out to an audio file. This is because I don't care about storing the file, and instead just want to play it in my template using Javascript. Google claims they encode the audio_content in base64: "Cloud Text-to-Speech API allows you to convert words and sentences into base64 encoded audio data of natural human speech. You can then convert the audio data into a playable audio file like an MP3 by decoding the base64 data." So I tried creating and playing the audio file with the following code as suggested here:
<!-- my_template.html -->
<script>
var audio_content = "{{ test_audio_content }}";
var snd = new Audio("data:audio/mp3;base64," + audio_content);
console.log(snd);
snd.play();
</script>
But I get the following error:
Uncaught (in promise) DOMException: Failed to load because no supported source was found.
I logged the audio_content, and it starts as b'ÿóDÄH.. not sure if that is base64 or not.
Also I tried to decode the audio_content by doing:
var decoded_content = window.atob(audio_content);
And that gave me an error as well, claiming it isn't base64.
From your example:
The response's audio_content is binary
This means that you'll need to encode the result as base64 first before you can use it:
import base64
...
return base64.b64encode(response.audio_content).decode('ascii')
Then this should work with your JS snippet exactly as you intended.
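Incidentally, the b'ÿóDÄH.. the poster logged is consistent with raw MP3 bytes rather than base64: 0xFF 0xF3 is an MPEG audio frame-sync pattern, and those bytes read as ÿóDÄH when interpreted as Latin-1 text. A quick check (the bytes are illustrative, not taken from the actual response):

```python
import base64

# hypothetical first bytes of a raw MP3 stream: 0xFF 0xF3 is an MPEG frame sync
mp3_head = bytes([0xFF, 0xF3, 0x44, 0xC4, 0x48])

# interpreted as Latin-1 text, these bytes are exactly the logged prefix
assert mp3_head.decode("latin-1") == "ÿóDÄH"

# after b64encode().decode('ascii') the payload is plain ASCII, safe for atob()
encoded = base64.b64encode(mp3_head).decode("ascii")
```

This is also why window.atob() failed: it was handed raw binary, not a base64 string.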

Decode a pickled file that has been encoded as a Blob

Overview:
I am trying to get save/load functionality working as part of a web app I am building, but cannot properly reload a file after I've downloaded it.
Backend:
I have a list of lists in Python that looks something like
[[bytes, bytes, int, list, list, str], [...], [...], etc].
This is the data I care about. I then pickle it using
with open(file_path, 'wb') as fp:
    pickle.dump(save_this_arr, fp)
and send it using Flask's send_file:
return send_file(file_path, as_attachment=True)
Frontend:
On the front end, I am creating a blob, encoding a data url, and then setting it as the src of a hidden <iframe>:
let blob = new Blob([response.data], { type: "application/octet-stream" });
let url = window.URL.createObjectURL(blob);
self.downloader.src = url
This works fine and gets me a file that I can re-upload.
Problem:
I am getting stuck on how to properly decode the URL so that I can pickle.load the result. The two links below seem like they're what I need, but I'm getting UnicodeDecodeErrors when I apply it to my code.
Current Attempt:
with open(file_path, "rb") as fid:
    contents = fid.read()
data = urllib.parse.parse_qs(contents, encoding='utf-16')
with open(file_path, 'wb') as fid:
    fid.write(text)
with open(file_path, 'rb') as fid:
    myList = pickle.load(fid)
EDIT:
The original question asked about decoding a url because I misunderstood what window.URL.createObjectURL(blob) was doing. From this blog post, I realized that we are actually creating a reference to an in-memory blob. So what I actually want to do is read a Blob in Python.
References:
Url decode UTF-8 in Python
decoding URL encoded byte stream data in python
I'm not sure why I was unable to decode the blob directly, but encoding to a base64 string before writing the file works.
Backend (writing to disk):
import base64
with open(file_path, 'wb') as fp:
    data = pickle.dumps(save_this_arr)
    encoded = base64.b64encode(data)
    fp.write(encoded)
Frontend (copied from question - no change):
let blob = new Blob([response.data], { type: "application/octet-stream" });
let url = window.URL.createObjectURL(blob);
self.downloader.src = url
Backend (reading from disk):
with open(file_path, "rb") as fid:
    contents = fid.read()
decoded = base64.b64decode(contents)
myList = pickle.loads(decoded)
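The whole encode/decode cycle can be sanity-checked locally before involving the frontend at all; a sketch with a toy list standing in for the real data:

```python
import base64
import pickle

# toy stand-in for the real [[bytes, bytes, int, list, list, str], ...] data
save_this_arr = [[b"\x00\x01", b"raw", 42, [1, 2], ["a"], "label"]]

# backend, before download: pickle, then base64-encode
encoded = base64.b64encode(pickle.dumps(save_this_arr))

# backend, after re-upload: base64-decode, then unpickle
restored = pickle.loads(base64.b64decode(encoded))
assert restored == save_this_arr
```

Because the on-disk payload is base64, it survives any text-oriented handling in the browser round trip, which is presumably why this works where the raw pickle did not.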

Streaming the Microphone output via HTTP POST using chunked transfer

We are trying to build an app to broadcast live audio to multiple subscribers. The server (written in Go) accepts PCM data in chunks, and a client using PyAudio is able to tap into the microphone and send this data using the code below. We have tested this and it works: the audio plays from any browser with the subscriber URL.
import pyaudio
import requests

p = pyaudio.PyAudio()

# frames per buffer ?
CHUNK = 1024
# 16 bits per sample ?
FORMAT = pyaudio.paInt16
# 44.1k sampling rate ?
RATE = 44100
# number of channels
CHANNELS = 1

STREAM = p.open(
    format=FORMAT,
    channels=CHANNELS,
    rate=RATE,
    input=True,
    frames_per_buffer=CHUNK
)

print("initialized stream")

def get_chunks(stream):
    while True:
        try:
            chunk = stream.read(CHUNK, exception_on_overflow=False)
            yield chunk
        except IOError as ioe:
            print("error %s" % ioe)

url = "https://<server-host>/stream/publish/<uuid>/"
s = requests.session()
s.headers.update({'Content-Type': "audio/x-wav;codec=pcm"})
resp = s.post(url, data=get_chunks(STREAM))
But we need a browser, iOS, and Android client to do the same thing as the above client does. We are able to fetch the audio from the mic using the getUserMedia API in the browser, but are unable to send this audio to the server the way the Python code above does. Can someone shed some light in the right direction?
This is about a year old now, so I am sure you've moved on, but I think the approach to use from the browser is to stream the data over a WebSocket rather than over HTTP.
