JavaScript: turn a float-allocated number into smaller unsigned numbers

In JavaScript, numbers are always allocated as double-precision floats. That's fine as long as you aren't sending huge amounts of them as binary without compression and don't need to conserve memory. But if you do need to make these numbers smaller, how do you do so?
The obvious goal would be to store each number in the smallest possible number of bytes, for example 208 in 1 byte and 504 in 2 bytes. Even better would be the smallest number of bits, for example 208 in 8 bits and 504 in 9 bits.
example:
//myNetwork is a supposed network API that sends as binary
var x = 208;
myNetwork.send(x); // sends 01000011010100000000000000000000
myNetwork.send(x.toString()); //sends 001100100011000000111000
There are also typed arrays, but converting to a typed array is tricky if the data isn't already a blob or file. With certain network APIs in JavaScript, the raw data is often represented as a string before you can touch it.

encoding
//myNetwork is a supposed network API that sends as binary
var x = 208;
myNetwork.send(String.fromCharCode(x)); //sends 11010000 , also known as Ð
decoding
var receivedString = "Ð";
var decodedNum = receivedString.charCodeAt(0); //208
The toString() approach from the question sends 24 bits, whereas this sends only 8.
The drawback of this method is that there is obviously some waste if you want values smaller than a byte. For example, you should be able to store 512 distinct values in 9 bits, but you'd be forced up to 16 bits (2 bytes), which is 65,536 values, because character codes only come in whole-byte sizes. It is fine, however, if you'll be utilizing the full range of values.
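If a value doesn't fit in one byte, you can split it across two character codes yourself. Here is a minimal sketch (encode16 and decode16 are illustrative names, not part of any real network API):
function encode16(n) {
    return String.fromCharCode((n >> 8) & 0xFF, n & 0xFF); // high byte, then low byte
}
function decode16(s) {
    return (s.charCodeAt(0) << 8) | s.charCodeAt(1);
}
var packed = encode16(504); // two characters with codes 1 and 248
console.log(decode16(packed)); // 504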

Related

How to convert/store a double precision (64bit) number to a 32 bit float or a UInt16 in javascript

Assume I can bear the loss of digits/precision to some degree. I find that sending a 64-bit (8-byte) number over the network is sometimes overkill. I want the data to use less bandwidth but maintain a certain accuracy. But I don't know the correct way to store a number as 32-bit or 16-bit data in JavaScript.
Here's how you can convert a JavaScript number to an array buffer holding a 32 bit float or a 16 bit unsigned integer:
let float64 = 3.141592653589793238462643383279502884197;
let float32View = new DataView(new ArrayBuffer(4));
float32View.setFloat32(0, float64);
let uint16View = new DataView(new ArrayBuffer(2));
uint16View.setUint16(0, float64);
console.log(float64);
console.log(float32View.getFloat32(0));
console.log(uint16View.getUint16(0));
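If you then need the raw bytes to actually transmit, here is a minimal sketch (assuming the receiver knows the width and the byte order in advance; DataView defaults to big endian):
let float32Bytes = new Uint8Array(float32View.buffer); // the 4 bytes of the float32
let uint16Bytes = new Uint8Array(uint16View.buffer);   // the 2 bytes of the uint16
console.log(float32Bytes);
console.log(uint16Bytes);
// Decoding on the other side mirrors the get* calls above, e.g.
// new DataView(float32Bytes.buffer).getFloat32(0)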

How do you decide which typed array to use?

I am trying to create a view of ArrayBuffer object in order to JSONify it.
var data = { data: new Uint8Array(arrayBuffer) }
var json = JSON.stringify(data)
It seems that the size of the ArrayBuffer does not matter, even with the smallest Uint8Array; I have not gotten any RangeError so far :) So how do I decide which typed array to use?
You decide based on the data stored in the buffer, or, better said, based on your interpretation of that data.
Also, a Uint8Array is not an 8-bit array; it's an array of unsigned 8-bit integers, and it can have any length. A Uint8Array created from the same ArrayBuffer as a Uint16Array is going to be twice as long, because every byte in the ArrayBuffer becomes one element of the Uint8Array, while for the Uint16Array each pair of bytes becomes one element.
A good way to see what happens is to think in binary. Try running this:
var buffer = new ArrayBuffer(2);
var uint8View = new Uint8Array(buffer);
var uint16View = new Uint16Array(buffer);
uint8View[0] = 2;
uint8View[1] = 1;
console.log(uint8View[0].toString(2));
console.log(uint8View[1].toString(2));
console.log(uint16View[0].toString(2));
The output is going to be
10
1
100000010
because displayed as an unsigned 8 bit integer in binary, 2 is 00000010 and 1 is 00000001. (toString strips leading zeroes).
Uint8Array represents an array of bytes. As I said, an element is an unsigned 8 bit integer. We just pushed two bytes to it.
In memory those two bytes are stored side by side: byte 0 is 00000010 and byte 1 is 00000001 (binary form again, to make things clearer).
Now when you initialize a Uint16Array over the same buffer it contains the same bytes, but because an element is an unsigned 16-bit integer (two bytes), accessing uint16View[0] takes the first two bytes together. Typed arrays use the platform's byte order, which in practice is little endian, so the second byte is the more significant one and you get 0000000100000010, which prints as 100000010 with no leading zeroes.
Interpreted as a base-10 (decimal) integer, 0000000100000010 is 258.
Neither Uint8Array nor Uint16Array store any data themselves. They are simply different ways of accessing bytes in an ArrayBuffer.
How does one choose which one to use? It's not based on preference but on the underlying data. An ArrayBuffer is what you use when you receive binary data from some external source (a web socket, maybe) and already know what that data represents. It might be a list of unsigned 8-bit integers, or one of signed 16-bit ones, or even a mixed list where you know the first element is an 8-bit integer and the next one is a 16-bit one. In that case you can use a DataView to read typed items from it.
If you don't know what the data represents you can't choose what to use.
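As an illustration of that mixed-list case, here is a minimal sketch; the layout and the big-endian byte order are assumptions the sender and receiver must agree on, DataView just reads what you tell it to:
var mixedBuffer = new ArrayBuffer(3);
var writer = new DataView(mixedBuffer);
writer.setUint8(0, 42);     // byte 0: an unsigned 8-bit value
writer.setUint16(1, 1000);  // bytes 1-2: an unsigned 16-bit value, big endian by default
var reader = new DataView(mixedBuffer);
console.log(reader.getUint8(0));  // 42
console.log(reader.getUint16(1)); // 1000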

Buffer to integer. Having trouble understanding this line of code

I'm looking for help in understanding this line of code in the npm module hash-index.
The purpose of this module is to be a function which returns the SHA-1 hash of an input, modulo the second argument you pass.
The specific function in this module that I don't understand is this one that takes a Buffer as input and returns an integer:
var toNumber = function (buf) {
  return buf.readUInt16BE(0) * 0xffffffff + buf.readUInt32BE(2)
}
I can't seem to figure out why those specific offsets of the buffer are chosen and what the purpose of multiplying by 0xffffffff is.
This module is really interesting to me and any help in understanding how it's converting buffers to integers would be greatly appreciated!
It combines the first six bytes of the buffer into a single number.
First, it reads the first two bytes (UInt16) of the buffer, using big endian, and multiplies that by 0xFFFFFFFF.
Then it reads the next four bytes (UInt32, bytes 2 through 5) and adds them to the product, resulting in a number built from the first 6 bytes of the buffer.
Example: consider [Buffer BB AA CC CC DD EE ...], so readUInt16BE(0) is 0xbbaa and readUInt32BE(2) is 0xccccddee.
0xbbaa * 0xffffffff = 0xbba9ffff4456
0xbba9ffff4456 + 0xccccddee = 0xbbaacccc2244
And regarding the offsets, it reads them this way:
the first read covers bytes 0 to 1 (converted to a UInt16),
the second read covers bytes 2 to 5 (converted to a UInt32).
So to sum it up, it builds a number from the first 6 bytes of the buffer using big-endian order (though, because 0xffffffff is one less than 2^32, the result is not exactly the six bytes concatenated) and returns it to the calling function.
Hope that answers your question.
Wikipedia's Big Endian entry
EDIT
As someone pointed out in the comments, I was totally wrong about 0xFFFFFFFF being a left shift by 32; it's just a number multiplication. I'm assuming it's some kind of internal convention for building the value the module expects.
EDIT 2
After looking on the function in the original context, I've come to this conclusion:
This function is a part of a hashing flow, and it works in that manner:
The main flow receives a string input and a maximum number for the hash output; it then takes the string input and plugs it into the SHA-1 hash function.
SHA-1 hashing returns a Buffer, it takes that Buffer, and applies the hash-indexing on it, as can be seen in the following code excerpt:
return toNumber(crypto.createHash('sha1').update(input).digest()) % max
It also applies a modulo to make sure the returned hash index doesn't exceed the maximum allowed value.
Multiplication by 2 is equivalent to shifting the bits left by 1, so multiplying by 2^32 would be equivalent to shifting the bits left 32 times; here the multiplier is 0xffffffff, which is 2^32 - 1, so the result falls slightly short of a true 32-bit shift.
Here is a similar question already answered:
Bitwise Logic in C
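For comparison, here is a sketch (not the module's code) of a version that concatenates the first six bytes exactly by multiplying by 2^32 instead of 0xffffffff; 48 bits still fit losslessly in a JavaScript number:
var toNumber48 = function (buf) {
  // 0x100000000 is 2^32, so this really does shift the 16-bit part left by 32 bits
  return buf.readUInt16BE(0) * 0x100000000 + buf.readUInt32BE(2)
  // Node's buf.readUIntBE(0, 6) gives the same result
}
console.log(toNumber48(Buffer.from([0xbb, 0xaa, 0xcc, 0xcc, 0xdd, 0xee])).toString(16)) // 'bbaaccccddee'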

Reassembling negative Python marshal int's into Javascript numbers

I'm writing a client-side Python bytecode interpreter in Javascript (specifically Typescript) for a class project. Parsing the bytecode was going fine until I tried out a negative number.
In Python, marshal.dumps(2) gives 'i\x02\x00\x00\x00' and marshal.dumps(-2) gives 'i\xfe\xff\xff\xff'. This makes sense as Python represents integers using two's complement with at least 32 bits of precision.
In my Typescript code, I use the equivalent of Node.js's Buffer class (via a library called BrowserFS, instead of ArrayBuffers etc.) to read the data. When I see the character 'i' (i.e. buffer.readUInt8(offset) == 105, signalling that the next thing is an int), I then call readInt32LE on the next offset to read a little-endian signed long (4 bytes). This works fine for positive numbers but not for negative numbers: for 1 I get '1', but for '-1' I get something like '-272777233'.
I guess that Javascript represents numbers in 64-bit (floating point?). So, it seems like the following should work:
var longval = buffer.readInt32LE(offset); // reads a 4-byte long, gives -272777233
var low32Bits = longval & 0xffff0000; //take the little endian 'most significant' 32 bits
var newval = ~low32Bits + 1; //invert the bits and add 1 to negate the original value
//but now newval = 272826368 instead of -2
I've tried a lot of different things and I've been stuck on this for days. I can't figure out how to recover the original value of the Python integer from the binary marshal string using Javascript/Typescript. Also I think I deeply misunderstand how bits work. Any thoughts would be appreciated here.
Some more specific questions might be:
Why would buffer.readInt32LE work for positive ints but not negative?
Am I using the correct method to get the 'most significant' or 'lowest' 32 bits (i.e. does & 0xffff0000 work how I think it does?)
Separate but related: in an actual 'long' number (i.e. longer than '-2'), I think there is a sign bit and a magnitude, and I think this information is stored in the 'highest' 2 bits of the number (i.e. at number & 0x000000ff?) -- is this the correct way of thinking about this?
The sequence ef bf bd is the UTF-8 encoding of the "Unicode replacement character", which decoders substitute for bytes that are not valid in the encoding being decoded.
It sounds like whatever method you're using to download the data is getting accidentally run through a UTF-8 decoder and corrupting the raw datastream. Be sure you're using blob instead of text, or whatever the equivalent is for the way you're downloading the bytecode.
This got messed up only for negative values because small positive values produce bytes below 0x80, which are valid single-byte UTF-8 and get translated 1:1 from the original byte stream, while the 0xfe/0xff bytes of the negative values are not valid UTF-8 and get replaced.
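A minimal sketch of reading a marshalled int once the data arrives as real bytes rather than text; the byte values below are the marshal.dumps(-2) output quoted in the question:
var bytes = new Uint8Array([0x69, 0xfe, 0xff, 0xff, 0xff]); // 'i' followed by -2 as a little-endian int32
var view = new DataView(bytes.buffer);
if (view.getUint8(0) === 0x69) {        // 0x69 is the type code 'i'
  var value = view.getInt32(1, true);   // true = little endian
  console.log(value);                   // -2
}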

Is the smallest usable memory size 2 bytes, In javascript?

To represent a single character in JS, we use " or ', i.e. we use a string of length 1.
But since JavaScript uses the UTF-16 encoding, that character would be 16 bits as well.
Would it be safe, then, to say that 2 bytes is the smallest representable size possible in JS?
Or is there some data structure which can represent a single byte (8 bits) that I am missing?
EDIT:
I understand that using the data type with the word size (usually 4 bytes) is always the most efficient, but I just wondered whether the creators of JS had cared to include something.
ActionScript 3 derives from ECMAScript as well, and it includes a variable-size data type, i.e. an integer in AS3 takes 1 to 4 bytes of storage.
Note : I'm not really looking for a hack to fit in multiple bytes into a larger datatype.
Guess the answer is most probably no.
Well, not in ECMAScript, but in Node.js you have Buffer and in browsers you have Uint8Array.
var b = new Uint8Array(16); // holds 16 bytes
Technically, you can have an array of 8 booleans - but there might be significant overhead in storing the array itself.
Taking a page from C-style unions, sure, you can create your own data structure that does just that, using a number as a bitmap. Note that JavaScript's bitwise operators work on 32 bits, so a single number can only hold 4 such "bytes" (addresses 0 to 3):
var memory = 0;
function setbyte(addr, value) {
    // clear the old byte at this byte offset, then OR in the new value
    memory = (memory & ~(0xFF << (addr * 8))) | ((value & 0xFF) << (addr * 8));
}
function getbyte(addr) {
    // shift the wanted byte down and mask off everything else
    return 0xFF & (memory >> (addr * 8));
}
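A quick usage check of the sketch above (values are arbitrary):
setbyte(1, 0xAB);
console.log(getbyte(1)); // 171
console.log(getbyte(0)); // 0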
