Javascript - Reading Binary Records with FileReader into ArrayBuffers

I have a binary data file where every x bytes is a record, and I have a format/mask (however you prefer to see it) to decipher that data. It's like: short short int short float double, blah blah. So I'm reading this file with the File API; I'll need to be using ArrayBuffers eventually, but I'm not there yet... So my question is twofold. Firstly, and most directly, what is the best way to read every x bytes from a binary file into an ArrayBuffer?
Secondly, as I'm running into some problems... why is the below script filling 5 GB+ of RAM nearly immediately when reading a 500 KB binary file?
$('input[type="file"]').change(function(event) {
// FileList object
var files = event.target.files;
for (var i = 0, f; f = files[i]; i++) {
var reader = new FileReader();
// closures and magnets, how do they work
reader.onload = (function(f) {
return function(event) {
// data file starts with header XML
// indexOf +9 for </HEADER> and +1 for null byte
var data_start = event.target.result.indexOf('</HEADER>')+10,
// leverage jQuery for XML
header = $(event.target.result.slice(0,data_start)),
rec_len = parseInt(header.find('REC_LEN').text(),10);
// var ArrayBuffer
// define ArrayBufferView
// loop through records
for (var i = data_start; i<event.target.result.length; i+=rec_len) {
// fill ArrayBuffer
// add data to global data []
console.log(i+' : '+event.target.result.slice(i, i+rec_len));
}
};
})(f);
// Read as Binary
reader.readAsBinaryString(f);
}
});

A couple of general tips, at least:
Using a DataView is flexible but a bit slow -- it should be faster than parseInt called on strings, but not as fast as array views. The upside is that it supports different byte orders, if your binary data requires it. Use reader.readAsArrayBuffer(f), then in your onload callback, use something like
var dv = new DataView(arrayBuffer, [startCoord, [endCoord]]),
    result = [];
// in some loop for i...
  result[i] = [];
  result[i].push(dv.getInt8(coord));
  // coord += offset;
  result[i].push(dv.getFloat32(coord));
// end some loop
As I mentioned, faster would be to create multiple views on the ArrayBuffer, but you can't (to my knowledge) change the cursor position as you go -- so your mixed data types will be an issue.
To put the results into a typed array, just declare something like var col1 = new Uint8Array(length);. The typed array subclasses are listed here. Note that in my experience, typed arrays don't gain you much in terms of performance. Google around for some jsperf tests of typed arrays.
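For a concrete starting point, here is a minimal sketch of that record loop using readAsArrayBuffer and a DataView. It assumes the XML header has already been located, so data_start and rec_len are known, and it assumes a made-up record layout of one Int8 followed by one Float32 (adjust to your actual format/mask):
var reader = new FileReader();
reader.onload = function(event) {
  var buffer = event.target.result;          // an ArrayBuffer, not a string
  var dv = new DataView(buffer, data_start); // view starting after the XML header
  var records = [];
  for (var i = 0; i + rec_len <= dv.byteLength; i += rec_len) {
    var rec = [];
    rec.push(dv.getInt8(i));        // first field: signed byte
    rec.push(dv.getFloat32(i + 1)); // second field: 32-bit float, big-endian by default
    records.push(rec);
  }
  console.log(records);
};
reader.readAsArrayBuffer(f);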

Related

Browser read integer from binary string

I have for example here this string
'x���'
Which you may possibly not see depending on the device you're using. That's the number 2024000250 encoded as a 32-bit signed big-endian integer, which I've generated using Node:
let buffer = new Buffer(4);
buffer.writeInt32BE(2024000250).toString();
I'm receiving the 4 bytes in question on the client side but I can't seem to find how to turn them back into an integer...
I might be dead wrong here, but as far as I remember Unicode characters can be between 2 and 4 bytes. When you transfer your binary data as text to the client side, you risk corrupting this information, because the client is going to interpret it as Unicode.
If I were to convert that text to a blob on client side:
var b = new Blob(['x���'],{type:"application/octet-stream"});
b.size; //10
As you can see I receive 10 bytes, which is wrong, it should have been 4.
Since you are using Node on the server side, you can build the data there:
function createBuffer(v){
  var b = new ArrayBuffer(4),
      vw = new DataView(b);
  vw.setInt32(0, v);
  return b;
}
This will create your buffer. Now you cannot just send this to the client as it is; represent it either as JSON or directly as a binary string. To represent it as a binary string you don't need the above function; you could have done:
("0".repeat(32) + (2024000250).toString(2)).slice(-32); //"01111000101000111100101011111010"
If you want JSON, you can do:
function convertBuffToBinaryStr(buff){
  var res = [],
      l = buff.byteLength,
      v = new DataView(buff);
  for (var i = 0; i < l; ++i){
    res.push(v.getUint8(i));
  }
  return JSON.stringify(res);
}
Now try seeing what this outputs:
convertBuffToBinaryStr(createBuffer(2024000250)); //"[120,163,202,250]"
Back on the client-side you have to interpret this:
function interpret(json){
  json = JSON.parse(json);
  return parseInt(json.map((d) => ("0".repeat(8) + d.toString(2)).slice(-8)).join(""), 2);
}
Now try:
interpret("[120,163,202,250]"); //2024000250
Note: for your interpret function, you would have to use a DataView, setUint8 for each byte and then getInt32 at the end; since you are using signed integers, the version above won't work for all cases.
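A sketch of that DataView-based version might look like this (interpretSigned is a made-up name, for illustration only):
function interpretSigned(json) {
  var bytes = JSON.parse(json),
      buf = new ArrayBuffer(bytes.length),
      view = new DataView(buf);
  // write each received byte back into the buffer...
  bytes.forEach(function(b, i) { view.setUint8(i, b); });
  // ...then read all four back as one signed, big-endian 32-bit integer
  return view.getInt32(0);
}
interpretSigned("[120,163,202,250]"); // 2024000250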
Well I finally got around to getting this to work.
It is not quite what I started off with but I'm just gonna post this for some other lost souls.
It is worth mentioning that ibrahim's answer contains most of the necessary information, but it addresses the XY problem that my question ended up being.
I just send my binary data as binary:
let buffer = new Buffer(4);
buffer.writeInt32BE(2024000250);
// websocket connection
connection.send(buffer);
Then in the browser
// message listener
let reader = new FileReader();
reader.addEventListener('loadend', () => {
  let view = new DataView(reader.result);
  // there goes the precious data
  console.log(view.getInt32());
});
reader.readAsArrayBuffer(message.data);
In all honesty this tickles my gag reflex. Why am I using a file reader to get some data out of a binary message? There is a good chance a better way of doing this exists, if so please add to this answer.
Other methods I found are the fetch API which is no better than the file reader in terms of hack rating and Blob.prototype.arrayBuffer which is not yet supported fully.
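For reference, the Blob.prototype.arrayBuffer route would look roughly like this where it is supported, assuming message.data is a Blob as in the FileReader version above:
message.data.arrayBuffer().then((buf) => {
  // same precious data, no FileReader involved
  console.log(new DataView(buf).getInt32(0));
});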

Adding ID3 tags with the HTML5 filesystem API

I have a scenario where I am building a podcast web application that allows listening to and storing .mp3 podcast files.
I am trying to implement a basic web interface where someone can add the entire ID3 tag from the client side (the file will be stored locally on the client side; this client is not every listener, preferably just the person who gets the raw podcast file without any ID3 tag). He then hosts this one page locally, adds the correct ID3 tags, and then copies these .mp3 files to a WebDAV folder.
I do understand that edits need to be done on a server, but it would be really helpful if it could all be done locally in the browser.
Of course there is no ready library to edit the files, so I decided to use the HTML5 filesystem API, i.e. drop the file into the virtual file system, edit it there and then copy it back to the local system (for copying there is a ready library, FileSaver.js).
I have been able to do the following:
1) associate the .mp3 file dropped on a drop zone with the filesystem API using webkitGetAsEntry
2) copy this file into the filesystem API.
Part of the code looks like:
function onDrop(e) {
  e.preventDefault();
  e.stopPropagation();
  var items = e.dataTransfer.items;
  var files = e.dataTransfer.files;
  for (var i = 0, item; item = items[i]; ++i) {
    // Skip this one if we didn't get a file.
    if (item.kind != 'file') {
      continue;
    }
    var entry = item.webkitGetAsEntry();
    if (entry.isFile) {
      // Copy the dropped entry into local filesystem.
      entry.copyTo(cwd, null, function(copiedEntry) {
        //setLoadingTxt({txt: DONE_MSG});
        renderMp3Writer(entry);
      });
    }
  }
}
My confusion is how do I add the entire ID3 tag? I am lost at this point as I am not sure about:
1) Can we add the entire ID3 tag to the file from the fileWriter method?
2) If yes, would this be a binary edit, or how would it work?
Any help would be useful. I tried the below, but I am guessing I am wrong:
var blob1 = new Blob(['ID3hTIT2ga'], {type: 'audio/mp3'});
fileWriter.write(blob1);
You need to build an ID3 buffer, then create a buffer large enough to hold both the ID3 tag and the MP3 file, insert the ID3 tag and append the MP3 data.
For this you need the ID3 specification, and you can use typed arrays with a DataView to build your buffer.
The ID3 overall structure is defined like this (see link above):
+-----------------------------+
|      Header (10 bytes)      |
+-----------------------------+
|       Extended Header       |
| (variable length, OPTIONAL) |
+-----------------------------+
|  Frames (variable length)   |
+-----------------------------+
|           Padding           |
| (variable length, OPTIONAL) |
+-----------------------------+
| Footer (10 bytes, OPTIONAL) |
+-----------------------------+
At this point the buffer length is unknown, so you need to do this in steps. There are several ways to do this: you can build up small buffer segments for each field and then sum them up into a single buffer, or you can make a larger buffer you know can hold all the fields you want to include and copy the sum of the fields from that buffer to the final one.
The latter tends to be simpler, and as we're dealing with very small sizes this could be the best way (considering that each fragment in the first approach has its own overhead).
So the first thing you need to do is to define the header. The header is defined this way:
ID3v2/file identifier "ID3"
ID3v2 version $04 00
ID3v2 flags %abcd0000 (note: bit-representation)
ID3v2 size 4 * %0xxxxxxx (note: bit-representation/mask)
ID3 and version are fixed values (other versions exist of course, but let's follow the current one).
You can probably ignore most of the flags, if not all, by setting them to 0. But check the docs for your use-case, for example if you want to use extended headers.
Size is defined as:
The ID3v2 tag size is stored as a 32 bit synchsafe integer (section 6.2), making a total of 28 effective bits (representing up to 256MB). The ID3v2 tag size is the sum of the byte length of the extended header, the padding and the frames after unsynchronisation. If a footer is present this equals to ('total size' - 20) bytes, otherwise ('total size' - 10) bytes.
An example how you can build your buffer. First define a buffer big enough to hold all the data as well as a DataView:
var id3buffer = new ArrayBuffer(1024), // 1kb "space"
    view = new DataView(id3buffer);
The DataView defaults to big-endian which is perfect, so all we need to do now is to fill in the data where it should be. We can make a few helper methods to help us move position at the same time as we write. Positions for DataView are byte-bound:
var pos = 0; // global start position

function setU8(value) {
  view.setUint8(pos++, value);
}

function setU16(value) {
  view.setUint16(pos, value);
  pos += 2;
}

function setU32(value) {
  view.setUint32(pos, value);
  pos += 4;
}
etc. You can make helpers to write Unicode text strings (see TextEncoder, for example) and so forth.
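For instance, a small string helper along those lines could look like this (a sketch that assumes the setU8 helper and pos counter above, and UTF-8 as the target encoding):
function setText(str) {
  var bytes = new TextEncoder().encode(str); // UTF-8 bytes
  for (var i = 0; i < bytes.length; i++) {
    setU8(bytes[i]);
  }
}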
To define the header, we can write in the "magic" word ID3. You could convert a string or, since it's only 3 bytes, just write it straightforwardly; "ID3" is 0x49 0x44 0x33 in hex, so:
setU8(0x49); // at pos 0
setU8(0x44); // at pos 1
setU8(0x33); // at pos 2
Since we made a wrapper we don't need to worry about the buffer position.
Then write in the version (according to the spec, v2.4.0 is stored as 0x0400; the major version "2" is not written):
setU16(0x0400); // default is big-endian so this works
Now you can continue with flags and size (see specs).
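For example, it could look something like this (a sketch only: the flags byte is left all zero, tagSize is a placeholder for the size computed per the quoted definition, and intToSyncsafe is the helper shown further down):
setU8(0);                       // %abcd0000 with no flags set
setU32(intToSyncsafe(tagSize)); // 4 * %0xxxxxxx synchsafe size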
When the ID3 tag is filled in, pos will hold the total length, so make a new buffer for the ID3 tag plus the MP3 data:
var mp3 = new ArrayBuffer(pos + mp3Buffer.byteLength),
    view8 = new Uint8Array(mp3);
The view8 view will allow us to do a simple copy to destination:
// create a segment from the tag buffer that will fit target:
var segment = new Uint8Array(view.buffer, 0, n); // replace n with actual length
view8.set(segment, 0);
view8.set(new Uint8Array(mp3Buffer), pos);
If everything went OK you now have an MP3 with an ID3 tag (remember to check for existing ID3 tags; you need to scan to the end).
You can now send the ArrayBuffer to server, or convert to Blob for IndexedDB, or to an Object-URL if you want to present a link for download (none shown here as answer is becoming out-of-scope).
This should be enough to get you started - as said, you need to study the specs. If you're not familiar with typed array, check those out as well.
Also see the site for other resources (frames etc.).
Sync-safe values
"MP3" files uses frames which starts with 11 bits, all set to 1. If the size field of the header happen to contain 11 bits set to 1, the decoder could mistakenly interpret it as sound data. To avoid this the concept of sync-safe integers are used making sure that each byte's MSB (most signicant bit, bit 7) always is set to 0. The bit is moved to the left, the next byte is shifted one bit, for ID3 tag 4 times (hence the 4x %01111111).
Here is how to encode and decode sync-safe integers using JavaScript (from Wikipedia C/C++ source):
// test values
var value = 0xfffffff,
    sync = intToSyncsafe(value);

document.write("<pre>Original size: 0x" + value.toString(16) + "<br>");
document.write("Synch-safe   : 0x" + sync.toString(16) + "<br>");
document.write("Decoded value: 0x" + syncsafeToInt(sync).toString(16) + "</pre>");

function intToSyncsafe(value) {
  var out, mask = 0x7f;
  while (mask ^ 0x7fffffff) {
    out = value & ~mask;
    out <<= 1;
    out |= value & mask;
    mask = ((mask + 1) << 8) - 1;
    value = out;
  }
  return out;
}
function syncsafeToInt(value) {
  var out = 0, mask = 0x7F000000;
  while (mask) {
    out >>= 1;
    out |= value & mask;
    mask >>= 8;
  }
  return out;
}
The sync-safe value would show the bits like %01111111011111110111111101111111 for the example value used in the demo above.
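To tie this back to the TIT2 example from the question, a single ID3v2.4 text frame could be written with the same helpers, roughly like this (a sketch; the layout of frame ID, synchsafe size, two flag bytes, then an encoding byte followed by the text comes from the frames section of the spec and should be double-checked there):
function setTextFrame(id, text) {
  var bytes = new TextEncoder().encode(text);          // frame body as UTF-8
  for (var i = 0; i < 4; i++) setU8(id.charCodeAt(i)); // frame ID, e.g. "TIT2"
  setU32(intToSyncsafe(bytes.length + 1));             // frame size: encoding byte + text
  setU16(0);                                           // frame flags
  setU8(0x03);                                         // $03 = UTF-8 text encoding
  for (var j = 0; j < bytes.length; j++) setU8(bytes[j]);
}
// setTextFrame('TIT2', 'My Podcast Episode');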

How do I use a typed array with offset in node?

I am writing a mat file parser using jBinary, which is built on top of jDataView. I have a working parser with lots of tests, but it runs very slowly for moderately sized data sets of around 10 MB. I profiled with look and found that a lot of time is spent in tagData. In the linked tagData code, ints/uints/single/doubles/whatever are read one by one from the file and pushed to an array. Obviously, this isn't super-efficient. I want to replace this code with a typed array view of the underlying bytes to remove all the reading and pushing.
I have started to migrate the code to use typed arrays as shown below. The new code preserves the old functionality for all types except 'miINT8'. The new functionality tries to view the buffer b starting at offset s and with length l, consistent with the docs. I have confirmed that the s being passed to the Int8Array constructor is non-zero, even going so far as to hard-code it to 5. In all cases, the output of console.log(elems.byteOffset) is 0. In my tests, I can see that the Int8Array is indeed starting from the beginning of the buffer and not at offset s as I intend.
What am I doing wrong? How do I get the typed array to start at position s instead of position 0? I have tested this on node.js version 10.25 as well as 12.0 with the same results in each case. Any guidance appreciated as I'm totally baffled by this one!
tagData: jBinary.Template({
  baseType: ['array', 'type'],
  read: function (ctx) {
    var view = this.binary.view
    var b = view.buffer
    var s = view.tell()
    var l = ctx.tag.numBytes
    var e = s + l
    var elems
    switch (ctx.tag.type) {
      case 'miINT8':
        elems = new Int8Array(b, s, l); view.skip(l); console.log(elems.byteOffset); break;
      default:
        elems = []
        while (view.tell() < e && view.tell() < view.byteLength) {
          elems.push(this.binary.read(ctx.tag.type))
        }
    }
    return elems
  }
}),
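For what it's worth, the documented three-argument constructor does honour the offset when the first argument is a plain ArrayBuffer; a standalone check, independent of jBinary, looks like this:
var ab = new ArrayBuffer(16),
    sub = new Int8Array(ab, 5, 4); // byteOffset 5, length 4
console.log(sub.byteOffset);       // 5
console.log(sub.length);           // 4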

Javascript runs out of memory in major browsers due to array size

I am trying to create an array that will be MAASSSIVVEE... I read somewhere that JavaScript can create an array of up to 4.xx billion elements or so. The array I am trying to create will likely be in the quadrillions or higher. I don't even know where to go from here. I am going to assume that JS is not the proper solution for this, but I'd like to give it a try. It is for the client side, and I would prefer not to bog down my server with this if there are multiple people using it at once. Also, I'm not looking to learn a new language, as I am just getting into JS and code in general.
Could I possibly use setTimeout(..., 0) breaks in the totalcombos function? Time is not really an issue; I wouldn't mind if it took a few minutes to calculate, but right now it just crashes.
I have tried this using a dedicated worker, but it still crashes the host. The worker code is what I am posting, as the host code is irrelevant to this question (it only compiles the original objects and posts them, then receives the messages back).
The code (sorry for the mess... I'm a coding noob and just an enthusiast):
onmessage = function(event){
  // this has been tested on the very small sample size below, and still runs out of memory
  // all the objects in these first arrays are formatted as follows.
  // {"pid":"21939","name":"John Smith","position":"QB","salary":"9700","fppg":"23"}
  // "PID" is unique to each object, everything else could appear in another object.
  // There are no repeated objects.
  var qbs = **group of 10 objects like above**
  var rbs = **group of 10 objects like above**
  var wrs = **group of 10 objects like above**
  var tes = **group of 10 objects like above**
  var ks = **group of 10 objects like above**
  var ds = **group of 10 objects like above**

  // This code works great and fast with small sets. ie (qbs, rbs, wrs)
  function totalcombos() {
    var r = [], arg = arguments, max = arg.length - 1;
    function helper(arr, i) {
      for (var j = 0; j < arg[i].length; j++) {
        var a = arr.slice(0); // clone arr
        if (a.indexOf(arg[i][j]) != -1) {
          j++;
        } else
          a.push(arg[i][j]);
        if (i == max) {
          r.push(a);
        } else
          helper(a, i + 1);
      }
    }
    helper([], 0);
    return r;
  };

  // WAY TOO BIG...commented out so as not to crash when run
  // var tCom = totalcombos(qbs, rbs, wrs, tes, ks, ds);
  // postMessage(tCom.length);
}
When the sets get larger, like 50 objects in each, it just crashes as it runs out of memory. I reduce the set with other code, but it will still be very large. How would I fix it?
I am trying to create all the possible combinations and then go through and reduce from there based on the total salary of each group.
When working with data, regardless of language or platform, it's usually best practice to only load the data you actually need; otherwise you run into errors or bottlenecks, as you are finding. With 50 entries in each of six groups you are looking at 50^6, roughly 15.6 billion, combinations, which no browser will hold in memory.
If your data is being stored somewhere like a database, a JSON file, a web service or an API (anything, basically), you'd be better off querying that data source to retrieve only what you need, or at least to reduce the size of the array data you're trying to traverse.
As an analogy, if you're trying to load the whole internet into memory on a PC with only 2 GB of RAM, you're going to have a really bad time. :)

Is it possible to iterate through every word in stdin through means of Javascript?

I need to know if it's possible to iterate through every word input through stdin into a program using JavaScript. If so, may I get any leads on how to do so?
With Node:
var stdin = process.openStdin();
var buf = '';

stdin.on('data', function(d) {
  buf += d.toString(); // when data is received on stdin, stash it in a string buffer
                       // call toString because d is actually a Buffer (raw bytes)
  pump();              // then process the buffer
});

function pump() {
  var pos;
  while ((pos = buf.indexOf(' ')) >= 0) { // keep going while there's a space somewhere in the buffer
    if (pos == 0) {        // if there's more than one space in a row, the buffer will now start with a space
      buf = buf.slice(1);  // discard it
      continue;            // so that the next iteration will start with data
    }
    word(buf.slice(0, pos));  // hand off the word
    buf = buf.slice(pos + 1); // and slice the processed data off the buffer
  }
}

function word(w) { // here's where we do something with a word
  console.log(w);
}
Processing stdin is much more complicated than a simple string split because Node presents stdin as a Stream (which emits chunks of incoming data as Buffers), not as a string. (It does the same thing with network streams and file I/O.)
This is a good thing because stdin can be arbitrarily large. Consider what would happen if you piped a multi-gigabyte file into your script. If it loaded stdin into a string first, it would first take a long time, then crash when you run out of RAM (specifically, process address space).
By handling stdin as a stream, you're able to handle arbitrarily large input with good performance, since your script only deals with small chunks of data at a time. The downside is obviously increased complexity.
The above code will work on any size input and doesn't break if a word gets chopped in half between chunks.
Assuming you're using an environment that has console.log and standard input is a string, then you can do this.
Input:
var stdin = "I hate to write more than enough.";
stdin.split(/\s/g).forEach(function(word){
console.log(word)
});
Outputs:
I
hate
to
write
more
than
enough.
