Decode a pickled file that has been encoded as a Blob

Decode a pickled file that has been encoded as a Blob - javascript

Overview:
I am trying to get save/load functionality working as part of a web app I am building, but cannot properly reload a file after I've downloaded it.
Backend:
I have a list a of lists in python that looks something like
[[bytes, bytes, int, list, list, str], [...], [...], etc].
This is the data I care about. I then pickle it using
with open(file_path, 'wb') as fp:
pickle.dump(save_this_arr, fp)
and send it using Flask's send_file:
return send_file(file_path, as_attachment=True)
Frontend:
On the front end, I am creating a blob, encoding a data url, and then setting it as the src of a hidden <iframe>:
let blob = new Blob([response.data], { type: "application/octet-stream" });
let url = window.URL.createObjectURL(blob);
self.downloader.src = url
This works fine and gets me a file that I can re-upload.
Problem:
I am getting stuck on how to properly decode the URL so that I can pickle.load the result. The two links below seem like they're what I need, but I'm getting UnicodeDecodeErrors when I apply it to my code.
Current Attempt:
with open(file_path, "rb") as fid:
contents = fid.read()
data = urllib.parse.parse_qs(contents, encoding='utf-16')
with open(file_path, 'wb') as fid:
fid.write(text)
with open(file_path, 'rb') as fid:
myList = pickle.load(fid)
EDIT:
The original question asked about decoding a url because I misunderstood what window.URL.createObjectURL(blob) was doing. From this blog post, I realized that we are actually creating a reference to an in-memory blob. So what I actually want to do is read a Blob in Python.
References:
Url decode UTF-8 in Python
decoding URL encoded byte stream data in python

I'm not sure why I was unable to decode the blob directly, but encoding to a base64 string before writing the file works.
Backend (writing to disk):
import base64
with open(file_path, 'wb') as fp:
data = pickle.dumps(save_this_arr)
encoded = base64.b64encode(data)
fp.write(encoded)
Frontend (copied from question - no change):
let blob = new Blob([response.data], { type: "application/octet-stream" });
let url = window.URL.createObjectURL(blob);
self.downloader.src = url
Backend (reading from disk):
with open(file_path, "rb") as fid:
contents = fid.read()
decoded = base64.b64decode(contents)
myList = pickle.loads(decoded)

Related

How do I recreate the original file using Javascript?

Working on a Javascript-based file encryption program. It works great for text files, but I'm trying to do other data (images, etc) and the decrypted file isn't matching. In my program, I use the following code to read the file that the user provides:
let reader = new FileReader();
let file = UI.file.input.files[0];
reader.readAsText(file, 'UTF-8');
The reader then adds that to a Javascript object that contains other information, like the date and file name. Then, that's stringified using JSON.stringify() and encrypted. When the encryption is completed, the file gets saved as a different file format with the encrypted JSON inside of it. The code for that is:
let message = // ENCRYPTED FILE STRING
let file = new Blob([message], {
type: 'text/plain'
});
let url = URL.createObjectURL(file);
The url is then attached to a link element on the page. That's all working fine.
To decrypt it, the file is again provided by the user and runs through the same reader as used above. It's decrypted successfully and the resulting string is again put into an object using JSON. Up to that point, it works exactly as it's supposed to. It works fine if what I'm decrypting is a text file, but if I do an image file it all goes bad.
To save the decrypted file, I use this code:
let message = // DECRYPTED DATA CONVERTED TO AN OBJECT
let fileName = message.name;
let fileType = message.type;
let file = new Blob([message.data], {
type: fileType
});
let url = URL.createObjectURL(file);
The original file and and original file type are both pulled from the file before it's encrypted and added to the object that has the file data. Like I said, there's no problem with the encryption or decryption process. I've used a HEX viewer to check and the string that is being put into the encryption process is identical to the one coming out. I'm guessing the issue is somewhere in my final block of code.
My best guess is something to do with encoding, although I'm not sure exactly what. Any help would be greatly appreciated.

You can use FileReader.readAsDataURL and strip the data:*/*;base64, from the string. Then you'll have a base64 encoded string that you can put into JSON.

Send Audio data represent as numpy array from python to Javascript

I have a TTS (text-to-speech) system that produces audio in numpy-array form whose data type is np.float32. This system is running in the backend and I want to transfer the data from the backend to the frontend to be played when a certain event happens.
The obvious solution for this problem is to write the audio data on disk as a wav file and then pass the path to the frontend to be played. This worked fine, but I don't want to do that for administrative reasons. I just want to transfer only the audio data (numpy array) to the frontend.
What I have done till now is the following:
backend
text = "Hello"
wav, sr = tts_model.synthesize(text)
data = {"snd", wav.tolist()}
flask_response = app.response_class(response=flask.json.dumps(data),
status=200,
mimetype='application/json' )
# then return flask_response
frontend
// gets wav from backend
let arrayData = new Float32Array(wav);
let blob = new Blob([ arrayData ]);
let url = URL.createObjectURL(blob);
let snd = new Audio(url);
snd.play()
That what I have done till now, but the JavaScript throws the following error:
Uncaught (in promise) DOMException: Failed to load because no supported source was found.
This is the gist of what I'm trying to do. I'm so sorry, you can't repreduce the error as you don't have the TTS system, so this is an audio file generated by it which you can use to see what I'm doing wrong.
Other things I tried:
Change the audio datatype to np.int8, np.int16 to be casted in the JavaScript by Int8Array() and int16Array() respectively.
tried different types when creating the blob such as {"type": "application/text;charset=utf-8;"} and {"type": "audio/ogg; codecs=opus;"}.
I have been struggling in this issue for so long, so any help is appriciated !!

Convert wav array of values to bytes
Right after synthesis you can convert numpy array of wav to byte object then encode via base64.
import io
from scipy.io.wavfile import write
bytes_wav = bytes()
byte_io = io.BytesIO(bytes_wav)
write(byte_io, sr, wav)
wav_bytes = byte_io.read()
audio_data = base64.b64encode(wav_bytes).decode('UTF-8')
This can be used directly to create html audio tag as source (with flask):
<audio controls src="data:audio/wav;base64, {{ audio_data }}"></audio>
So, all you need is to convert wav, sr to audio_data representing raw .wav file. And use as parameter of render_template for your flask app. (Solution without sending)
Or if you send audio_data, in .js file where you accept response, use audio_data to construct url (would be placed as src attribute like in html):
// get audio_data from response
let snd = new Audio("data:audio/wav;base64, " + audio_data);
snd.play()
because:
Audio(url) Return value:
A new HTMLAudioElement object, configured to be used for playing back the audio from the file specified by url.The new object's preload property is set to auto and its src property is set to the specified URL or null if no URL is given. If a URL is specified, the browser begins to asynchronously load the media resource before returning the new object.

Your sample as is does not work out of the box. (Does not play)
However with:
StarWars3.wav: OK. retrieved from cs.uic.edu
your sample encoded in PCM16 instead of PCM32: OK (check the wav metadata)
Flask
from flask import Flask, render_template, json
import base64
app = Flask(__name__)
with open("sample_16.wav", "rb") as binary_file:
# Read the whole file at once
data = binary_file.read()
wav_file = base64.b64encode(data).decode('UTF-8')
#app.route('/wav')
def hello_world():
data = {"snd": wav_file}
res = app.response_class(response=json.dumps(data),
status=200,
mimetype='application/json')
return res
#app.route('/')
def stat():
return render_template('index.html')
if __name__ == '__main__':
app.run(debug = True)
js
<audio controls></audio>
<script>
;(async _ => {
const res = await fetch('/wav')
let {snd: b64buf} = await res.json()
document.querySelector('audio').src="data:audio/wav;base64, "+b64buf;
})()
</script>
Original Poster Edit
So, what I ended up doing before (using this solution) that solved my problem is to:
First, change the datatype from np.float32 to np.int16:
wav = (wav * np.iinfo(np.int16).max).astype(np.int16)
Write the numpy array into a temporary wav file using scipy.io.wavfile:
from scipy.io import wavfile
wavfile.write(".tmp.wav", sr, wav)
Read the bytes from the tmp file:
# read the bytes
with open(".tmp.wav", "rb") as fin:
wav = fin.read()
Delete the temporary file
import os
os.remove(".tmp.wav")

base64 encode in python, decode in javascript

A Python backend reads a binary file, base64 encodes it, inserts it into a JSON doc and sends it to a JavaScript frontend:
#Python
with open('some_binary_file', 'rb') as in_file:
return base64.b64encode(in_file.read()).decode('utf-8')
The JavaScript frontend fetches the base64 encoded string from the JSON document and turns it into a binary blob:
#JavaScript
b64_string = response['b64_string'];
decoded_file = atob(b64_string);
blob = new Blob([decoded_file], {type: 'application/octet-stream'});
Unfortunately when downloading the blob the encoding seems to be wrong but I'm not sure where the problem is. E.g. it is an Excel file that I can't open anymore. In the Python part I've tried different decoders ('ascii', 'latin1') but that doesn't make a difference. Is there a problem with my code?

I found the answer here. The problem was at the JavaScript side. It seems only applying atob to the base64 encoded string doesn't work for binary data. You'll have to convert that into a typed byte array. What I ended up doing (LiveScript):
byte_chars = atob base64_str
byte_numbers = [byte_chars.charCodeAt(index) for bc, index in byte_chars]
byte_array = new Uint8Array byte_numbers
blob = new Blob [byte_array], {type: 'application/octet-stream'}

Decode Base64 encoded PDF content in browser

We transform HTML to PDF in the backend (PHP) using dompdf. The generated output from dompdf is Base64 encoded with
$output = $dompdf->output();
base64_encode($output);
This Base64 encoded content is saved as a file on the server. When we decode this file content like this:
cat /tmp/55acbaa9600f4 | base64 -D > test.pdf
we get a proper PDF file.
But when we transfer the Base64 content to the client as a string value inside a JSON object (the server provides a RESTful API...):
{
"file_data": "...the base64 string..."
}
And decode it with atob() and then create a Blob object to download the file later on, the PDF is always "empty"/broken.
$scope.downloadFileData = function(doc) {
DocumentService.getFileData(doc).then(function(data) {
var decodedFileData = atob(data.file_data);
var file = new Blob([decodedFileData], { type: doc.file_type });
saveAs(file, doc.title + '.' + doc.extension);
});
};
When we log the decoded content, it seems that the content is "broken", because several symbols are not the same as when we decode the content on the server using base64 -D.
When we encode/decode the content of simple text/plain documents, it's working as expected. But all binary (or not ASCII formats) are not working.
We have searched the web for many hours, but didn't find a solution for this that works for us. Does anyone have the same problem and can provide us with a working solution? Thanks in advance!
This is a example for a on the server Base64 encoded content of a PDF document:
JVBERi0xLjMKMSAwIG9iago8PCAvVHlwZSAvQ2F0YWxvZwovT3V0bGluZXMgMiAwIFIKL1BhZ2VzIDMgMCBSID4+CmVuZG9iagoyIDAgb2JqCjw8IC9UeXBlIC9PdXRsaW5lcyAvQ291bnQgMCA+PgplbmRvYmoKMyAwIG9iago8PCAvVHlwZSAvUGFnZXMKL0tpZHMgWzYgMCBSCl0KL0NvdW50IDEKL1Jlc291cmNlcyA8PAovUHJvY1NldCA0IDAgUgovRm9udCA8PCAKL0YxIDggMCBSCj4+Cj4+Ci9NZWRpYUJveCBbMC4wMDAgMC4wMDAgNjEyLjAwMCA3OTIuMDAwXQogPj4KZW5kb2JqCjQgMCBvYmoKWy9QREYgL1RleHQgXQplbmRvYmoKNSAwIG9iago8PAovQ3JlYXRvciAoRE9NUERGKQovQ3JlYXRpb25EYXRlIChEOjIwMTUwNzIwMTMzMzIzKzAyJzAwJykKL01vZERhdGUgKEQ6MjAxNTA3MjAxMzMzMjMrMDInMDAnKQo+PgplbmRvYmoKNiAwIG9iago8PCAvVHlwZSAvUGFnZQovUGFyZW50IDMgMCBSCi9Db250ZW50cyA3IDAgUgo+PgplbmRvYmoKNyAwIG9iago8PCAvRmlsdGVyIC9GbGF0ZURlY29kZQovTGVuZ3RoIDY2ID4+CnN0cmVhbQp4nOMy0DMwMFBAJovSuZxCFIxN9AwMzRTMDS31DCxNFUJSFPTdDBWMgKIKIWkKCtEaIanFJZqxCiFeCq4hAO4PD0MKZW5kc3RyZWFtCmVuZG9iago4IDAgb2JqCjw8IC9UeXBlIC9Gb250Ci9TdWJ0eXBlIC9UeXBlMQovTmFtZSAvRjEKL0Jhc2VGb250IC9UaW1lcy1Cb2xkCi9FbmNvZGluZyAvV2luQW5zaUVuY29kaW5nCj4+CmVuZG9iagp4cmVmCjAgOQowMDAwMDAwMDAwIDY1NTM1IGYgCjAwMDAwMDAwMDggMDAwMDAgbiAKMDAwMDAwMDA3MyAwMDAwMCBuIAowMDAwMDAwMTE5IDAwMDAwIG4gCjAwMDAwMDAyNzMgMDAwMDAgbiAKMDAwMDAwMDMwMiAwMDAwMCBuIAowMDAwMDAwNDE2IDAwMDAwIG4gCjAwMDAwMDA0NzkgMDAwMDAgbiAKMDAwMDAwMDYxNiAwMDAwMCBuIAp0cmFpbGVyCjw8Ci9TaXplIDkKL1Jvb3QgMSAwIFIKL0luZm8gNSAwIFIKPj4Kc3RhcnR4cmVmCjcyNQolJUVPRgo=
If you atob() this, you don't get the same result as on the console with base64 -D. Why?

Your issue looks identical to the one I needed to solve recently.
Here is what worked for me:
const binaryImg = atob(base64String);
const length = binaryImg.length;
const arrayBuffer = new ArrayBuffer(length);
const uintArray = new Uint8Array(arrayBuffer);
for (let i = 0; i < length; i++) {
uintArray[i] = binaryImg.charCodeAt(i);
}
const fileBlob = new Blob([uintArray], { type: 'application/pdf' });
saveAs(fileBlob, 'filename.pdf');
It seems that only doing a base64 decode is not enough...you need to put the result into a Uint8Array. Otherwise, the pdf pages appear blank.
I found this solution here:
https://github.com/sayanee/angularjs-pdf/issues/110#issuecomment-579988190

You can use btoa() and atob() work in some browsers:
For Exa.
var enc = btoa("this is some text");
alert(enc);
alert(atob(enc));
To JSON and base64 are completely independent.
Here's a JSON stringifier/parser (and direct GitHub link).
Here's a base64 Q&A. Here's another one.

How can I write a file in ISO-8859-1 with Chrome's File API

At the moment I am writing a file like this:
var blob = new Blob([contents], {type: 'text/plain;charset=iso-8859-1'});
fileWriter.write(blob);
However, when I run file -i on the resulting file the charset is always UTF-8.
The variable contents is encoded in ISO-8859-1 on the server side, and then communicated over the wire in base64:
def write_csv_file
filewriter = RMS::LabelFile.for_order(self.order)
csv = filewriter.to_csv
csv = csv.encode("ISO-8859-1")
csv = Base64.encode64(csv)
%Q{<script type="text/javascript" charset="ISO-8859-1">
var csv_data = #{csv.inspect.gsub('\n', '')};
csv_data = window.atob(csv_data);
parent.phn.filewriter.writeFile("#{self.order.order_number}.csv", csv_data, 'ISO-8859-1');
</script>
}
end
I've checked and double checked that the encoding is still ISO-8859-1 on the client side in the javascript. It seems like Blob and fileWriter are changing the encoding before they write. Examining the W3C's working draft it seems as though Blob converts DOMStrings to UTF-8 before writing them.

Develop Reference

JavaScript is the programming language of the Web.

Decode a pickled file that has been encoded as a Blob - javascript

Related

How do I recreate the original file using Javascript?

Send Audio data represent as numpy array from python to Javascript

base64 encode in python, decode in javascript

Decode Base64 encoded PDF content in browser

How can I write a file in ISO-8859-1 with Chrome's File API

Categories

Resources