base64 or hex encoded Int8 binary string to Int32Array - javascript

I have a binary string that must travel over a ws websocket (I can't use websocket.io) and so is being JSON.stringified. e.g. var msg.data = data.toString('base64')
On the other end, I want that data back not as byte wide binary, but as an array of 32 bit integers. e.g. if the binary data is [0, 0, 0, 1] going in, I want [1] coming out. Each output element is 4 bytes.
If I just take the binary string directly, I can new Int32Array(data) and I'm golden; the result is 1/4 the length of the original and each 32 bit element is made up from 4 of the original byte wide elements.
But when I've encoded it, then decoded with var data = Buffer.from(msg.data, 'base64') then new Int32Array(data)is the same length as the original, and each 32 byte element is made from ONE of the original 8 byte elements. Int32Array.from(data) does the same.
I'm not finding any answer by searching, everyone appears to be ok with byte wide data.

.buffer
I was forgetting .buffer.
new Int32Array(data.buffer) works perfectly.

Related

How can I convert from a Javascript Buffer to a String only containing 0 and 1

So I currently am trying to implement the huffman alg and it works fine for decoding and encoding. However, I store the encoded data as follows.
The result of the encoding function is a list containing many strings made up of 0 and 1 and all are varying length.
If i'd safe them in a normal txt file it would take up more space, if Id store them how they are in a binary file it could be that for example an 'e' which would have the code 101 would be stored in a full 8 bits looking like '00000101' which is wasteful and wont take up less storage then the original txt file. I took all the strings in the list and put them into one string and split it into equal parts of length 8 to store them more effectively.
However if I wanna read the data now, instead of 0 and 1 I get utf-8 chars, even some escape characters.
I'm reading the file with fs.readFileSync("./encoded.bin", "binary"); but javascript then thinks it's a buffer already and converts it to a string and it gets all weird... Any solutions or ideas to convert it back to 0 and 1?
I also tried to switch the "binary" in fs.readFileSync("./encoded.bin", "binary"); to a "utf-8" which helped with not crashing my terminal but still is "#��C��Ʃ��Ԧ�y�Kf�g��<�e�t"
To clarify, my goal in the end is to read out the massive string of binary data which would look like this "00011001000101001010" and actually get this into a string...
You can convert a String of 1s and 0s to the numerical representation of a byte using Number.parseInt(str, 2) and to convert it back, you can use nr.toString(2).
The entire process will look something like this:
const original = '0000010100000111';
// Split the string in 8 char long substrings
const stringBytes = original.match(/.{8}/g);
// Convert the 8 char long strings to numerical byte representations
const numBytes = stringBytes.map((s) => Number.parseInt(s, 2));
// Convert the numbers to an ArrayBuffer
const buffer = Uint8Array.from(numBytes);
// Write to file
// Read from file and reverse the process
const decoded = [...buffer].map((b) => b.toString(2).padStart(8, '0')).join('');
console.log('original', original, 'decoded', decoded, 'same', original === decoded);
var binary = fs.readFileSync("./binary.bin");
binary = [...binary].map((b) => b.toString(2).padStart(8, "0")).join("");
console.log(binary);
//Output will be like 010000111011010

Extract a buffer having different types of data with it

Can a buffer have both string and image associated with it? If so, how to extract them separately.
An example case would be a buffer with image data and also file name data.
I have worked with sharedArrayBuffers/arrayBuffers before.
If you are storing image pixel data, it's going to be a u32-int array, with 4 8-bit segment controlling rbga respectively... yes: you CAN tack on string data at the front in the form of a 'header' if you encode it and decode it to int values... but I have a hard time understanding why that might be desirable. because working with raw pixel data that is ONLY pixel-data is simpler. (I usually just stick it as a property of an object, with whatever other data I want to store)
Data Buffers
Typed arrays
You can use ArrayBuffer to create a buffer to hold the data. You then create a view using a typed array. eg unsigned characters Uint8Array. Types can be 8-16-32-64 bit (un/signed integers), float - double (32 - 64 bit floating point)
One buffer can have many view. You can read and write to view just like any JS array. The values are automatically converted to the correct type when you write to a buffer, and converted to Number when you read from a view
Example
Using buffer and views to read different data types
For example say you have file data that has a 4 character header, followed by a 16 bit unsigned integer chunk length, then 2 signed 16 bit integer coordinates, and more data
const fileBuffer = ArrayBuffer(fileSizeInBytes);
// Create a view of the buffer so we can fill it with file data
const dataRaw = new Uint8Array(data);
// load the data into dataRaw
// To get a string from the data we can create a util function
function readBufferString(buffer, start, length) {
// create a view at the position of the string in the buffer
const bytes = new Uint8Array(buffer, start, length);
// read each byte converting to JS unicode string
var str = "", idx = 0;
while (idx < length) { str += String.fromCharCode(bytes[idx++]) }
return str;
}
// get 4 char chunk header at start of buffer
const header = readBufferString(fileBuffer, 0, 4);
if (header === "HEAD") {
// Create views for 16 bit signed and unsigned integers
const ints = new Int16Array(fileBuffer);
const uints = new Uint16Array(fileBuffer);
const length = uints[2]; // get the length as unsigned int16
const x = ints[3]; // get the x coord as signed int16
const y = ints[4]; // get the y coord as signed int16
A DataView
The above example is one way of extracting the different types of data from a single buffer. However there could be an problem with older files and some data sources regarding the order of bytes that create multi byte types (eg 32 integers). This is called endianness
To help with using the correct endianness and to simplify access to all the different data types in a buffer you can use a DataView
The data view lets you read from the buffer by type and endianness. For example to read a unsigned 64bit integer from a buffer
// fileBuffer is a array buffer with the data
// Create a view
const dataView = new DataView(fileBuffer);
// read the 64 bit uint starting at the first byte in the buffer
// Note the returned value is a BigInt not a Number
const bInt = dataView.getBigUint64(0);
// If the int was in little endian order you would use
const bInt = dataView.getBigUint64(0, true); // true for little E
Notes
Buffers are not dynamic. That means they can not grow and shrink and that you must know how large the buffer needs to be when you create it.
Buffers tend to be a little slower than JavaScript's standard array as there is a lot type coercion when read or writing to buffers
Buffers can be transferred (Zero copy transfer) across threads making them ideal for distributing large data structures between WebWorkers. There is also a SharedArrayBuffer that lets you create true parallel processing solutions in JS

javascript turn float-allocated number into smaller unsigned numbers

in javascript numbers are always allocated as double precision floats. This is fine if you aren't sending huge amounts of these as binary without compression, or don't need to conserve memory. If you need to make these numbers smaller how do you do so?
The obvious goal would be to store numbers into the smallest possible byte size, for example 208 : 1 byte, 504 : 2 bytes. Even better would be smallest number of bit size, for example 208 : 8 bits, 504 : 9 bits.
example:
//myNetwork is a supposed network API that sends as binary
var x = 208;
myNetwork.send(x); // sends 01000011010100000000000000000000
myNetwork.send(x.toString()); //sends 001100100011000000111000
There is also typed arrays, but turning into a typed array is tricky if it isn't already a blob or file. On certain network APIs in Javascript the raw data is often represented as a string before you can touch it.
encoding
//myNetwork is a supposed network API that sends as binary
var x = 208;
myNetwork.send(String.fromCharCode(x)); //sends 11010000 , also known as Ð
decoding
var receivedString = "Ð";
var decodedNum = receivedString.charCodeAt(0); //208
The string method mentioned is 24 bits, whereas this is only 8 bits.
The drawback of this method is that there is obviously some waste if you want less than byte sized values. For example, you should be able to store 512 values in 9 bits, however you'd be forced to go up to 16 bits (2 bytes) which is 65,535 values because in unicode characters are all byte-sized. However, it is fine if you'll be utilizing the full range of values.

How do you decide which typed array to use?

I am trying to create a view of ArrayBuffer object in order to JSONify it.
var data = { data: new Uint8Array(arrayBuffer) }
var json = JSON.stringify(data)
It seems that the size of the ArrayBuffer does not matter even with the smallest Uint8Array. I did not get any RangeError so far:) If so, how do I decide which typed array to use?
You decide based on the data stored in the buffer, or, better said, based on your interpretation of that data.
Also an Uint8Array is not an 8 bit array, it's an array of unsigned 8 bit integers. It can have any length. A Uint8Array created from the same ArrayBuffer as a Uint16Array is going to be twice as long, because every byte in the ArrayBuffer is going to be "placed" as one element of the Uint8Array, while for the Uint16Array each pair of bytes is going to "become" one element in the array.
A good explanation of what happens is if we try thinking in binary. Try running this:
var buffer = new ArrayBuffer(2);
var uint8View = new Uint8Array(buffer);
var uint16View = new Uint16Array(buffer);
uint8View[0] = 2;
uint8View[1] = 1;
console.log(uint8View[0].toString(2));
console.log(uint8View[1].toString(2));
console.log(uint16View[0].toString(2));
The output is going to be
10
1
100000010
because displayed as an unsigned 8 bit integer in binary, 2 is 00000010 and 1 is 00000001. (toString strips leading zeroes).
Uint8Array represents an array of bytes. As I said, an element is an unsigned 8 bit integer. We just pushed two bytes to it.
In memory those two bytes are stored side by side as 00000001 00000010 (binary form again used to make things clearer).
Now when you initialize a Uint16Array over the same buffer it's going to contain the same bytes, but because an element is a unsigned 16 bit integer (two bytes), when you access uint16View[0] it's going to take the first two bytes and give them back to you. So 0000000100000010, which is 100000010 with no leading zeroes.
If you interpret this data as base 10 (decimal) integers you'll know it's 0000000100000010 to base 10 (258).
Neither Uint8Array nor Uint16Array store any data themselves. They are simply different ways of accessing bytes in an ArrayBuffer.
how one chooses which one to use? It's not based on preference but on the underlying data. ArrayBuffer is to be used when you receive some binary data from some external source (web socket maybe) and already know what the data represents. It might be a list of unsigned 8 bit integers, or one of signed 16 bit ones, or even a mixed list where you know the first element is an 8 bit integer and the next one is a 16 bit one. Then you can use DataView to read typed items from it.
If you don't know what the data represents you can't choose what to use.

javascript hex codes with TCP/IP communication

I’m using a node module ‘net’ to create a client application that sends data through a TCP socket. The server-side application accepts this message if it starts and ends with a correct hex code, just for example the data packet would start with a hex “0F” and ends with a hex “0F1C”. How would I create these hex codes with javascript ? I found this code to convert a UTF-8 string into a hex code, not sure if this is what I need as I don’t have much experience with TCP/IP socket connections. Heres some javascript I've used to convert a utf-8 to a hex code. But I'm not sure this is what I'm looking for? Does anyone have experience with TCP/IP transfers and/or javascript hex codes?.
function toHex(str,hex){
try{
hex = unescape(encodeURIComponent(str))
.split('').map(function(v){
return v.charCodeAt(0).toString(16)
}).join('')
}
catch(e){
hex = str
console.log('invalid text input: ' + str)
}
return hex
}
First of all, you do not need to convert your data string into hex values, in order to send it over TCP. Every string in node.js is converted to bytes when sent over the network.
Normally, you'd send over a string like so:
var data = "ABC";
socket.write(data); // will send bytes 65 66 67, or in hex: 44 45 46
Node.JS also allows you to pass Buffer objects to functions like .write().
So, probably the easiest way to achieve what you wish, is to create an appropriate buffer to hold your data.
var data = "ABC";
var prefix = 0x0F; // JavaScript allows hex numbers.
var suffix = 0x0FC1;
var dataSize = Buffer.byteLength(data);
// compute the required buffer length
var bufferSize = 1 + dataSize + 2;
var buffer = new Buffer(bufferSize);
// store first byte on index 0;
buffer.writeUInt8(prefix, 0);
// store string starting at index 1;
buffer.write(data, 1, dataSize);
// stores last two bytes, in big endian format for TCP/IP.
buffer.writeUInt16BE(suffix, bufferSize - 2);
socket.write(buffer);
Explanation:
The prefix hex value 0F requires 1 byte of space. The suffix hex value 0FC1 actually requires two bytes (a 16-bit integer).
When computing the number of required bytes for a string (JavaScript strings are UTF-16 encoded!), str.length is not accurate most of the times, especially when your string has non-ASCII characters in it. For this, the proper way of getting the byte size of a string is to use Buffer.byteLength().
Buffers in node.js have static allocations, meaning you can't resize them after you created them. Hence, you'll need to compute the size of the buffer -in bytes- before creating it. Looking at our data, that is 1 (for our prefix) + Buffer.byteLength(data) (for our data) + 2 (for our suffix).
After that -imagine buffers as arrays of bytes (8-bit values)-, we'll populate the buffer, like so:
write the first byte (the prefix) using writeUInt8(byte, offset) with offset 0 in our buffer.
write the data string, using .write(string[, offset[, length]][, encoding]), starting at offset 1 in our buffer, and length dataSize.
write the last two bytes, using .writeUInt16BE(value, offset) with offset bufferSize - 2. We're using writeUInt16BE to write the 16-bit value in big-endian encoding, which is what you'd need for TCP/IP.
Once we've filled our buffer with the correct data, we can send it over the network, using socket.write(buffer);
Additional tip:
If you really want to convert a large string to bytes, (e.g. to later print as hex), then Buffer is also great:
var buf = Buffer.from('a very large string');
// now you have a byte represetantion of the string.
Since bytes are all 0-255 decimal values, you can easily print them as hex values in console, like so:
for (i = 0; i < buf.length; i++) {
const byte = buf[i];
const hexChar = byte.toString(16); // convert the decimal `byte` to hex string;
// do something with hexChar, e.g. console.log(hexChar);
}

Categories

Resources