I'm trying to read and write raw data from files using Mozilla's add-on SDK. Currently I'm reading data with something like:
function readnsIFile(fileName, callback){
var nsiFile = new FileUtils.File(fileName);
NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
var data = NetUtil.readInputStreamToString(inputStream, inputStream.available(),{charset:"UTF-8"});
callback(data, status, nsiFile);
});
}
This works for text files, but when I start messing with raw bytes outside of Unicode's normal range, it doesn't work. For example, if a file contains the byte 0xff, then that byte and anything past that byte isn't read at all. Is there any way to read (and write) raw data using the SDK?
You've specified an explicit charset in the options to NetUtil.readInputStreamToString.
When you omit the charset option, then the data will be read as raw bytes. (Source)
function readnsIFile(fileName, callback){
var nsiFile = new FileUtils.File(fileName);
NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
// Do not specify a charset at all!
var data = NetUtil.readInputStreamToString(inputStream, inputStream.available());
callback(data, status, nsiFile);
});
}
The suggestion to use io/byte-streams is OK as well, but keep in mind that that SDK module is still marked experimental, and that using ByteReader via io/file as the example suggests is not a good idea because this would be sync I/O on the main thread.
I don't really see the upside, as you'd use NetUtil anyway.
Anyway, this should work:
const {ByteReader} = require("sdk/io/byte-streams");
function readnsIFile(fileName, callback){
var nsiFile = new FileUtils.File(fileName);
NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
var reader = new ByteReader(inputStream);
var data = reader.read(); // read the entire stream
reader.close();
callback(data, status, nsiFile);
});
}
Also, please keep in mind that reading large files like this is problematic. Not only will the whole file be buffered in memory, obviously, but:
The file is read as a char (byte) array first, so there will be a temporary buffer in the stream of at least file.size length (via asyncFetch).
Both NetUtil.readInputStreamToString and ByteReader will use another char (byte) array to read the result from the inputStream into, but ByteReader will do that in 32K chunks, while NetUtil.readInputStreamToString will use a big buffer of file.length.
The data is then read into the resulting jschar/wchar_t (word) array, aka a JavaScript string, i.e. you need at least file.size * 2 bytes in memory.
E.g., reading a 1MB file would require more than fileSize * 4 = 4MB of memory (NetUtil.readInputStreamToString) and/or more than fileSize * 3 = 3MB of memory (ByteReader) during the read operation. After the operation, 2MB of that memory will still be alive, storing the resulting data in a JavaScript string.
Reading a 1MB file might be OK, but a 10MB file might already be problematic on mobile (Firefox for Android, Firefox OS), and a 100MB file would be problematic even on desktop.
You can also read the data directly into an ArrayBuffer (or Uint8Array), which has more efficient storage for byte arrays than a JavaScript string and avoids the temporary buffers of NetUtil.readInputStreamToString and/or ByteReader.
function readnsIFile(fileName, callback){
var nsiFile = new FileUtils.File(fileName);
NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
var bs = Cc["@mozilla.org/binaryinputstream;1"].createInstance(Ci.nsIBinaryInputStream);
bs.setInputStream(inputStream);
var len = inputStream.available();
var data = new Uint8Array(len);
bs.readArrayBuffer(len, data.buffer);
bs.close();
callback(data, status, nsiFile);
});
}
PS: The MDN documentation might state something about "iso-8859-1" being the default if the charset option is omitted in the NetUtil.readInputStreamToString call, but the documentation is wrong. I'll fix it.
I have a webpage with file upload functionality. The upload is performed in 5MB chunks. I want to calculate a hash for each chunk before sending it to the server. The chunks are represented by Blob objects. In order to calculate the hash, I am reading each blob into an ArrayBuffer using a native FileReader. Here is the code:
var reader = new FileReader();
var getHash = function (blob, callback) {
reader.onloadend = function (e) {
var hash = util.hash(e.target.result);
callback(hash);
};
reader.readAsArrayBuffer(blob);
}
var processChunk = function (chunk) {
if (chunk) {
getHash(chunk, function (hash) {
util.sendToServer(chunk, hash, function() {
// this callback is called when chunk upload is finished
processChunk(chunks.shift());
});
});
}
}
var chunks = file.splitIntoChunks(); // gets an array of blobs
processChunk(chunks.shift());
The problem: using FileReader.readAsArrayBuffer seems to eat up a lot of memory which is not released. So far I have tested with a 5GB file on the following browsers:
Chrome 55.0.2883.87 m (64-bit): the memory goes up to 1-2GB quickly and oscillates around that. Sometimes it goes all the way up and the browser tab crashes. It can use more memory than the size of the chunks read. E.g. after reading 500MB of chunks the process already uses 700MB of memory.
Firefox 50.1.0: memory usage oscillates around 300-600MB
Code adjustments I have tried - all to no avail:
re-using the same FileReader instance for all chunks (as suggested in this question)
creating new FileReader for each chunk
adding timeout before starting new chunk
setting the FileReader and the ArrayBuffer to null after each read
The question: is there a way to fix the problem? Is this a bug in the FileReader implementations or am I doing something wrong?
EDIT: Here is a JSFiddle https://jsfiddle.net/andy250/pjt9udeu/
This is a bug in Chrome on Windows. It is reported here: https://bugs.chromium.org/p/chromium/issues/detail?id=674903
I'm not 100% sure, but from what I've read, when I send a blob (binary data) over a websocket, the blob does not carry any file information (the official specification also states that websockets only send the raw binary), such as:
the filesize
the mimetype
user info (explained later)
i'm using https://github.com/websockets/ws
Testing:
Sending the blob directly from a file input.
ws.send(this.files[0]) //this should already contain the info
Creating a new blob from the file with the native JavaScript API, setting the proper mimetype.
ws.send(new Blob([this.files[0]],{type:this.files[0].type})); //also this
On both sides you can get only the raw blob, without any other information.
Is it possible to prepend, say, 4kb of predefined JSON data, also converted to binary, that contains important information like the mimetype and the filesize, and then just split off the 4kb when needed?
{"mime":"txt/plain","size":345}____________4KB_REST_OF_THE_BINARY
OR
ws.send({"mime":"txt\/plain","size":345})
ws.send(this.files[0])
Even if the first one is the worst solution ever, it would allow me to send everything at once.
The second one has a big problem:
it's a chat that also allows sending files like documents, images, music, and videos.
I could write some sort of handshaking system, sending the file/user info before I send the binary data.
BUT
if another person also sends a file, since it's async, the handshaking system has no way to determine which file is the right one for the correct user and mimetype.
So how do you properly send a binary file in a multiuser async environment?
I know I can convert to base64, but that's about 30% bigger.
Btw, I'm totally disappointed with Apple... while Chrome displays all binary data properly, my iOS devices are not able to handle blobs; only images will show in blob or base64 format, not even a simple txt file. Basically only an <img> tag can read dynamic files.
How everything works (now):
user sends a file
Node.js gets the binary data and the user info... but not the mimetype, filename, or size.
Node.js broadcasts the raw binary file to all the users (can't specify user & file info).
clients create a blob URL (who sent that? XD).
EDIT
what i have now:
Client 1 (sends a file), Chrome:
fileInput.addEventListener('change',function(e){
var file=this.files[0];
ws.send(new Blob([file],{
type:file.type //<- SET MIMETYPE
}));
//file.size
},false);
note: file is already a blob ... but this is how you would normally create a new blob specifying the mimetype.
Server (broadcasts the binary data to the other clients), Node.js:
aaaaaand the mimetype is gone...
ws.addListener('message',function(binary){
var b=0,c=wss.clients.length;
while(b<c){
wss.clients[b++].send(binary)
}
});
Client 2 (receives the binary), Chrome:
ws.addEventListener('message',function(msg){
var blob=new Blob([msg.data],{
type:'application/octet-stream' //<- LOST
});
var file=window.URL.createObjectURL(blob);
},false);
note: msg.data is already a blob... but this is how you would normally create a new blob, specifying the mimetype, which is lost.
In client 2 I need the mimetype, and naturally I also need the info about the user, which can be retrieved from client 1 or the server (not a good choice)...
You're a bit out of luck with this because Node doesn't support the Blob interface, so any data you send or receive in binary with Node is just binary. You would have to have something that knew how to interpret a Blob object.
Here's an idea, and let me know if this works. Reading through the documentation for websockets/ws, it says it supports sending and receiving ArrayBuffers, which means you can use TypedArrays.
Here's where it gets nasty. You set a certain fixed n number of bytes at the beginning of every TypedArray to signal the mime type encoded in utf8 or what have you, and the rest of your TypedArray contains your file's bytes.
I would recommend using Uint8Array because UTF-8 code units are 8 bits long, so your text will probably be readable when encoded that way. As for the file bits, you'll probably just end up writing those down somewhere and appending an ending to it.
Also note, this method of interpretation works both ways whether from Node or in the Browser.
This solution is really just a form of type casting and you might get some unexpected results. The fixed length of your mime type field is crucial.
Here it is illustrated. Copy, paste, set the image file to whatever you want and then run that. You'll see the mime type I set pop out.
var fs = require('fs');
//https://stackoverflow.com/questions/8609289/convert-a-binary-nodejs-buffer-to-javascript-arraybuffer
function toUint8Array(buffer) {
var ab = new ArrayBuffer(buffer.length);
var array = new Uint8Array(ab);
for (var i = 0; i < buffer.length; ++i) {
array[i] = buffer[i];
}
return array;
}
//data is a raw Buffer object
fs.readFile('./ducklings.png', function (err, data) {
var mime = new Buffer('image/png');
var allBuffed = Buffer.concat([mime, data]);
var array = toUint8Array(allBuffed);
var mimeBytes = array.subarray(0,9); //number of characters in mime Buffer
console.log(String.fromCharCode.apply(null, mimeBytes));
});
Here's how you do it on the client side:
SOLUTION A: GET A PACKAGE
Get buffer, an implementation of Node's Buffer API for browsers. The solution to concatenate Byte buffers will work exactly as before. You can append fields like To: and what not as well. The way you format your headers in order to best serve your clients will be an evolving process I'm sure.
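For instance, a rough sketch using the buffer package in the browser could look like the following. The packMessage/unpackMessage helpers and the 16-byte, space-padded mime field are only illustrative assumptions, not anything ws or the buffer package defines:
// Requires a bundler (browserify/webpack) that provides the "buffer" package.
var Buffer = require('buffer').Buffer;
// Prepend a fixed 16-byte, space-padded mime header (assumed layout).
function packMessage(mimeType, fileBytes) {
  var header = Buffer.alloc(16, ' ');   // 16 bytes of spaces
  header.write(mimeType);               // write the mime type at offset 0
  return Buffer.concat([header, Buffer.from(fileBytes)]);
}
// Split the fixed-length header back off on the receiving side.
function unpackMessage(data) {
  var buf = Buffer.from(data);
  return {
    mime: buf.slice(0, 16).toString('utf8').trim(),
    bytes: buf.slice(16)
  };
}
The same helpers work in Node itself, since the package mirrors Node's Buffer API.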
SOLUTION B: OLD SCHOOL
STEP 1: Convert your Blob to an ArrayBuffer
Notes: How to convert a String to an ArrayBuffer
var fr = new FileReader();
fr.addEventListener('loadend', function () {
//Asynchronous action in part 2.
var message = concatenateBuffers(headerStringAsBuffer, fr.result);
ws.send(message);
});
fr.readAsArrayBuffer(blob);
STEP 2: Concatenate ArrayBuffers
function concatenateBuffers(buffA, buffB) {
var byteLength = buffA.byteLength + buffB.byteLength;
var resultBuffer = new ArrayBuffer(byteLength);
//wrap ArrayBuffer in a typedArray/view
var resultView = new Uint8Array(resultBuffer);
var viewA = new Uint8Array(buffA);
var viewB = new Uint8Array(buffB);
//Copy 8 bit integers AKA Bytes
resultView.set(viewA);
resultView.set(viewB, viewA.byteLength);
return resultView.buffer
}
STEP 3: Receive and Reblob
I'm not going to repeat how to convert the concatenated string bytes back into a string because I've done it in the server example, but turning the file bytes into a blob of your mime type is fairly simple:
new Blob([buffer.slice(offset, buffer.byteLength)], {type: mimetype});
This Gist by robnyman goes into further details on how you would use an image transmitted via XHR, put it into localstorage, and use it in an image tag on your page.
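To make the receiving end concrete, here is a minimal sketch for the browser. It assumes the sender prepended a fixed-length, space-padded mime header as discussed above; HEADER_LENGTH is an assumed value that must match whatever the sender used:
var HEADER_LENGTH = 16;        // must match the sender's fixed header length (assumption)
ws.binaryType = 'arraybuffer'; // receive ArrayBuffers instead of Blobs
ws.addEventListener('message', function (msg) {
  var bytes = new Uint8Array(msg.data);
  // Decode the header bytes back into a string and strip the padding.
  var mime = String.fromCharCode.apply(null, bytes.subarray(0, HEADER_LENGTH)).trim();
  // The rest of the buffer is the file itself.
  var blob = new Blob([bytes.subarray(HEADER_LENGTH)], { type: mime });
  var url = window.URL.createObjectURL(blob); // usable in an <img> tag, download link, etc.
});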
I liked @Breedly's idea of prepending a fixed-length byte array to indicate the mime type of the ArrayBuffer, so I created this npm package that I use when dealing with websockets, but maybe others might find it useful.
Example usage
const {
arrayBufferWithMime,
arrayBufferMimeDecouple
} = require('arraybuffer-mime')
// some image array buffer
const uint8 = new Uint8Array(1)
uint8[0] = 1
const ab = uint8.buffer
const mime = 'image/png'
const abWithMime = arrayBufferWithMime(ab, mime)
const { mime: decodedMime, arrayBuffer } = arrayBufferMimeDecouple(abWithMime)
console.log(decodedMime) // "image/png"
console.log(arrayBuffer) // ArrayBuffer
I'm trying to decompress zlib'ed XML such as the following:
https://drive.google.com/file/d/0B52P0MZLTdw8ZzQwQzVpZGZVZWc
Uploading to online decompress services works, such as: http://i-tools.org/gzip
In PHP, I'm using this code and works just fine, I get the XML string:
$raw = file_get_contents("file_here");
$uncompressed = zlib_decode($raw);
However, I want to do this in JavaScript.
The app is a client-side Chrome extension which uses chrome.devtools.network to read from the network logs.
It reads binary responses. An example is at the Google Drive link at the top.
JS needs to decompress that response to its original XML, which is then parsed into an object.
The only problem I have is the zlib decompress part.
As of the latest update, the decompression libraries work but unpacking doesn't. Please skip to the Sept 16 update at the bottom.
I have already tried several JavaScript libraries and still cannot make it work:
Pako: https://github.com/nodeca/pako
unpack() code: https://codereview.stackexchange.com/questions/3569/pack-and-unpack-bytes-to-strings
function unpack(str) {
var bytes = [];
for(var i = 0, n = str.length; i < n; i++) {
var char = str.charCodeAt(i);
bytes.push(char >>> 8, char & 0xFF);
}
return bytes;
}
$.get("file_here", function(response){
var charData = unpack(response);
var binData = new Uint8Array(charData);
var data = pako.inflate(binData);
var strData = String.fromCharCode.apply(null, new Uint16Array(data));
console.log(strData);
});
Error: Uncaught incorrect header check
The result is the same even when passing the response in directly:
new Uint8Array(response);
pako.inflate(response);
Imaya's zlib: https://github.com/imaya/zlib.js
$.get("file_here", function(response){
var inflate = new Zlib.Inflate(response);
var output = inflate.decompress();
console.log(output);
});
Error: Uncaught Error: unsupported compression method inflate.js:60
Still using Imaya's zlib, combining with this Stack Overflow question:
Decompress gzip and zlib string in javascript
$.get("file_here", function(response){
var response = response.split('').map(function(e) {
return e.charCodeAt(0);
});
var inflate = new Zlib.Inflate(response);
var output = inflate.decompress();
console.log(output);
});
Error: Uncaught Error: invalid fcheck flag:29 inflate.js:65
dankogai's js-deflate: https://github.com/dankogai/js-deflate
console.log(RawDeflate.inflate(response));
Output: empty
augustl's js-inflate: https://github.com/augustl/js-inflate
console.log(JSInflate.inflate(response));
Output: empty
zlib-browserify: https://github.com/brianloveswords/zlib-browserify
Error: ReferenceError: exports is not defined
This is just a wrapper for Imaya's zlib. I think this uses RequireJS? I'm not even sure how to use it. Can it even be used without installing anything, with just jQuery/JS? The app, as mentioned, is a downloadable Chrome extension with just HTML importing JS files.
UPDATE Sept 16, 2014
It seems the problem is with the JavaScript unpack( ) function. When I use the ByteArray generated by PHP: http://pastebin.com/uDWvK94B, the JavaScript decompression functions work.
PHP unpacking that works:
$unpacked = unpack("C*", $raw);
For the JavaScript unpack( ) code that I use, which doesn't work, see top of the post under Pako section.
So the new question is: why does JavaScript generate different ByteArray values than PHP does?
Is it really a problem with the unpack() function?
Or is it that when JS fetches the file, the encoding or something else changes, so the bytes get messed up?
And lastly, what is your suggested fix?
UPDATE Sept 20, 2014
With more research, and with some of the answers here giving leads:
Sebastian S raised the idea that the problem was in the manner of retrieving the data and that it had something to do with text encodings.
user3995789 provided an example showing that it will work even without the unpack() function, though outside the context of Chrome extensions.
Isaac provided examples in the context of Chrome extensions, but they still did not work.
With that I researched further, combining all leads, which led me to the theory that the reason behind all this is that Chrome is unable to get "raw" data through its request.getContent function. See here for the Chrome documentation for the said function.
As of now, I have taken the issue to Chrome, see here.
UPDATE March 24, 2015
Although the problem was not fully resolved, the answer I think was most useful to me was from @Sebastian S, who proposed that the way I was taking or receiving the data was at fault and that a bad conversion was the cause, which turned out to be very close to the actual problem.
jQuery reads the response as UTF-8 text; you have to read the raw file. This function will work:
function readTextFile(file)
{
var rawFile = new XMLHttpRequest();
rawFile.open('GET', file, true);
rawFile.responseType = 'arraybuffer';
rawFile.onload = function (response)
{
var words = new Uint8Array(rawFile.response);
console.log(words[1]);
console.log(pako.ungzip(words));
};
rawFile.send();
}
For more information see this answer.
I understand that you want to use zlib decompression inside a Chrome extension while reading response bodies from the network log.
You first need to retrieve the base64 content that will be decompressed. You can achieve this using the getContent method.
function zlibDecompress(base64Content){
// var base64Content = base64Content.split(',')[1]; // Not sure if need to keep it
// Decode base64 (convert ascii to binary)
var strData = atob(base64Content);
// Convert binary string to character-number array
var charData = strData.split('').map(function(x){return x.charCodeAt(0);});
// Turn number array into byte-array
var binData = new Uint8Array(charData);
// Pako inflate
var data = pako.inflate(binData, { to: 'string' });
return data;
}
chrome.devtools.network.onRequestFinished.addListener(
function(request) {
request.getContent(
function(content, encoding){
if(encoding == 'base64'){
var output = zlibDecompress(content);
}
}
);
}
);
https://developer.chrome.com/extensions/devtools_network#type-Request
Using XMLHttpRequest :
<script type="text/javascript" src="pako.js"></script>
<script type="text/javascript">
function zlibDecompress(url){
var xhr = new XMLHttpRequest();
xhr.open('GET', url, true);
xhr.responseType = 'blob';
xhr.onload = function(oEvent) {
// Base64 encode
var reader = new window.FileReader();
reader.readAsDataURL(xhr.response);
reader.onloadend = function() {
base64data = reader.result;
var base64 = base64data.split(',')[1];
// Decode base64 (convert ascii to binary)
var strData = atob(base64);
// Convert binary string to character-number array
var charData = strData.split('').map(function(x){return x.charCodeAt(0);});
// Turn number array into byte-array
var binData = new Uint8Array(charData);
// Pako inflate
var data = pako.inflate(binData, { to: 'string' });
console.log(data);
}
};
xhr.send();
}
zlibDecompress('fileurl');
</script>
If you want to use XMLHttpRequest within a Chrome extension, add the host permissions to your manifest:
{
"name": "My extension",
...
"permissions": [
"http://www.domain.com/", // The domain that hold the file
"http://*/" // Or every domain
],
...
}
https://developer.chrome.com/extensions/xhr
Feel free to ask if you have any questions ;)
In my opinion, the question you should really be asking is: how do you retrieve the compressed data? As soon as it becomes a UTF-16 string, the trouble begins. I'm not even sure the conversion from raw byte data to JavaScript strings is lossless.
As you wrote something about PHP, I assume you're communicating with some sort of backend. If this is true, there are options to handle binary data with native means. Maybe this can help you: https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/Sending_and_Receiving_Binary_Data
I'm writing a handler for accessing binary data. Currently I'm doing this:
...
d = new FileReader();
d.onload = function (e) {
var response_buffer, data, byte_len, temp_float, tmp_data, len, i;
response_buffer = e.target.result;
data = new DataView(response_buffer);
byte_len = data.byteLength;
// create empty placeholder for binary data received
tmp_data = new Float32Array(byte_len / Float32Array.BYTES_PER_ELEMENT);
len = tmp_data.length;
// Pre-parse
for (i = 0; i < len; i += 1) {
tmp_data[i] = data.getFloat32(i * Float32Array.BYTES_PER_ELEMENT, true);
}
...
Which works fine and "pre-processes" my fetched data (file/http) into my tmp_data array.
However I need to be able to handle large files as well (like 1GB+). My idea was to try and fetch only part of the file because from the binary structure I would know exactly what offsets I would have to fetch.
Question:
Is it possible to XHR for a "range" of bytes from a large file if I know offset and length instead of having to fetch the whole file?
Thanks!
If your server respects it, you can set the Range: request header, which does exactly that.
You could also implement your own polling strategy where you ask for the data in chunks of x bytes and pass along an offset. Similar to how an endless scroll works on a web page.
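For example, here is a small sketch of a ranged request; the URL and offsets are placeholders, and the server has to honor the Range header (replying with 206 Partial Content) for this to work:
// Fetch `length` bytes starting at `offset`, assuming the server supports Range requests.
function fetchRange(url, offset, length, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.responseType = 'arraybuffer';
  // bytes=start-end, where end is inclusive
  xhr.setRequestHeader('Range', 'bytes=' + offset + '-' + (offset + length - 1));
  xhr.onload = function () {
    if (xhr.status === 206) {
      // 206 Partial Content: the server honored the range
      callback(new DataView(xhr.response));
    } else {
      // 200 means the server ignored the Range header and sent the whole file
      callback(null);
    }
  };
  xhr.send();
}
// e.g. read 1024 bytes starting at byte offset 4096
fetchRange('/data.bin', 4096, 1024, function (view) { /* parse Float32 values here */ });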
I need to store a lot of text in WebSQL, so I decided to compress the text with zip.js
and store the compressed Blobs.
From the documentation you can compress a blob as follows:
function zipBlob(filename, blob, callback) {
// use a zip.BlobWriter object to write zipped data into a Blob object
zip.createWriter(new zip.BlobWriter("application/zip"), function(zipWriter) {
// use a BlobReader object to read the data stored into blob variable
zipWriter.add(filename, new zip.BlobReader(blob), function() {
// close the writer and calls callback function
zipWriter.close(callback);
});
}, onerror);
}
Although this works, I don't understand why you need to specify a filename. Is this really necessary? And, is this file always removed after compression?
Check this answer here - it does not require a filename, and I'll bet it's much easier to use. I've tried quite a few JavaScript compression/decompression implementations and have been stung by problems such as limits on the size of the original data, overall speed, efficiency, and so forth. It is oddly difficult to find a good compression/decompression implementation in JavaScript, but thankfully this one hasn't failed me yet (and I've used it quite a bit):
Compressing a blob in javascript
The implementation you have currently requires the filename because it is trying to be consistent with the zip format, so that you can save it, for example, to your desktop and open it with your favorite zip utility. It sounds like your challenge is very similar to mine: I needed to save and restore compressed items out of local storage in the browser as well as on the server.
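For reference, a filename-free approach could look something like the sketch below, using pako as an example deflate library; the answer linked above may use a different implementation, so treat this purely as an illustration:
// Compress a Blob to a smaller Blob without any filename (sketch, assumes pako is loaded).
function compressBlob(blob, callback) {
  var reader = new FileReader();
  reader.onloadend = function () {
    var compressed = pako.deflate(new Uint8Array(reader.result));
    callback(new Blob([compressed])); // store this Blob in WebSQL
  };
  reader.readAsArrayBuffer(blob);
}
// Decompress it back into the original text.
function decompressBlob(blob, callback) {
  var reader = new FileReader();
  reader.onloadend = function () {
    callback(pako.inflate(new Uint8Array(reader.result), { to: 'string' }));
  };
  reader.readAsArrayBuffer(blob);
}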
The filename is necessary according to this implementation. It wouldn't be necessary if you were only compressing the data, but zip.js builds zip files, which store files, which must have filenames.
In your original example, zipWriter.add() effectively converts your blob into a new file and adds it to a zip – and the "filename" parameter is the name you want that new file to have.
Here's an example that uses zip.js to add multiple blobs to a zip and then downloads it with FileSaver.js:
function zipBlob() {
zip.createWriter(new zip.BlobWriter("application/zip"), function(writer) {
var files = ["abcd", "123"];
var f = 0;
function nextFile(f) {
var fblob = new Blob([files[f]], { type: "text/plain" });
writer.add("file"+f, new zip.BlobReader(fblob), function() {
// callback
f++;
if (f < files.length) {
nextFile(f);
} else close();
});
}
function close() {
// close the writer
writer.close(function(blob) {
// save with FileSaver.js
saveAs(blob, "example.zip");
});
}
nextFile(f);
}, onerror);
}