Buffer to integer. Having trouble understanding this line of code - javascript

I'm looking for help in understanding this line of code in the npm moudle hash-index.
The purpose of this module is to be a function which returns the sha-1 hash of an input mod by the second argument you pass.
The specific function in this module that I don't understand is this one that takes a Buffer as input and returns an integer:
var toNumber = function (buf) {
return buf.readUInt16BE(0) * 0xffffffff + buf.readUInt32BE(2)
}
I can't seem to figure out why those specific offsets of the buffer are chosen and what the purpose of multiplying by 0xffffffff is.
This module is really interesting to me and any help in understanding how it's converting buffers to integers would be greatly appreciated!

It prints the first UINT32 (Unsigned Integer 32 bits) in the buffer.
First, it reads the first two bytes (UINT16) of the buffer, using Big Endian, then, it multiplies it by 0xFFFFFFFF.
Then, it reads the second four bytes (UINT32) in the buffer, and adds it to the multiplied number - resulting in a number constructed from the first 6 bytes of the buffer.
Example: Consider [Buffer BB AA CC CC DD ... ]
0xbb * 0xffffffff = 0xbaffffff45
0xbaffffff45 + 0xaaccccdd = 0xbbaacccc22
And regarding the offsets, it chose that way:
First time, it reads from byte 0 to byte 1 (coverts to type - UINT16)
second time, it reads from byte 2 to byte 5 (converts to type - UINT32)
So to sum it up, it constructs a number from the first 6 bytes of the buffer using big endian notation, and returns it to the calling function.
Hope that's answers your question.
Wikipedia's Big Endian entry
EDIT
As someone pointed in the comments, I was totally wrong about 0xFFFFFFFF being a left-shift of 32, it's just a number multiplication - I'm assuming it's some kind of inner protocol to calculate a correct legal buffer header that complies with what they expect.
EDIT 2
After looking on the function in the original context, I've come to this conclusion:
This function is a part of a hashing flow, and it works in that manner:
Main flow receives a string input and a maximum number for the hash output, it then takes the string input, plugs it in the SHA-1 hashing function.
SHA-1 hashing returns a Buffer, it takes that Buffer, and applies the hash-indexing on it, as can be seen in the following code excerpt:
return toNumber(crypto.createHash('sha1').update(input).digest()) % max
Also, it uses a modulu to make sure the hash index returned doesn't exceed the maximum possible hash.

Multiplication by 2 is equivalent to a shift of bits to the left by 1, so the purpose of multiplying by 2^16 is the equivalent of shifting the bits left 16 times.
Here is a similar question already answered:
Bitwise Logic in C

Related

Decode a prepended VarInt from a byte stream of unknown size in javascript/Nodejs

I am writing a small utility library for me to request server status of a given minecraft host in js on node. I am using the Server List Ping Protocol as outlined here (https://wiki.vg/Server_List_Ping) and got it mostly working as expected, albeit having big trouble working with unsupported Data Types (VarInt) and had to scour the internet to find a way of converting js nums into VarInts in order to craft the necessary packet buffers:
function toVarIntBuffer(integer) {
let buffer = Buffer.alloc(0);
while (true) {
let tmp = integer & 0b01111111;
integer >>>= 7;
if (integer != 0) {
tmp |= 0b10000000;
}
buffer = Buffer.concat([buffer, Buffer.from([tmp])]);
if (integer <= 0) break;
}
return buffer;
}
Right now I am able to request a server status by sending the handshake packet and then the query packet and do receive a JSON response with the length of the response prepended as a VarInt.
However the issue is here, where I simply don't know how to safely identify the VarInt from the beginning of the JSON response (as it can be anywhere up to 5 byte) and decode it back to a readable num so I can get the proper length of the response byte stream.
[...] as with all strings this is prefixed by its length as a VarInt
(from the protocol documentation)
My current super hacky workaround is to concatenate the chunks as String until the concatenated string contains the same count of '{'s and '}'s (meaning a full json object) and slice the json response at the first '{' before parsing it.
However I am very unhappy with this hacky, inefficient, unelegant and possibly unreliable way of solving the issue and would rather decode the VarInt in front of the JSON response in order to get a proper length to compare against.
I don't know this protocol, but VarInt in protobuf are coded with the MSB bit:
Each byte in a varint, except the last byte, has the most significant
bit (msb) set – this indicates that there are further bytes to come.
The lower 7 bits of each byte are used to store the two's complement
representation of the number in groups of 7 bits, least significant
group first.
Note: Too long for a comment, so posting as an answer.
Update: I browsed a bit through the URL you gave, and it is indeed the ProtoBuf VarInt. It is also described there with pseudo-code:
https://wiki.vg/Protocol#VarInt_and_VarLong
VarInt and VarLong Variable-length format such that smaller numbers
use fewer bytes. These are very similar to Protocol Buffer Varints:
the 7 least significant bits are used to encode the value and the most
significant bit indicates whether there's another byte after it for
the next part of the number. The least significant group is written
first, followed by each of the more significant groups; thus, VarInts
are effectively little endian (however, groups are 7 bits, not 8).
VarInts are never longer than 5 bytes, and VarLongs are never longer
than 10 bytes.
Pseudocode to read and write VarInts and VarLongs:
Thanks to the reference material that #thst pointed me to, I was able to slap together a working way of reading VarInts in javascript.
function readVarInt(buffer) {
let value = 0;
let length = 0;
let currentByte;
while (true) {
currentByte = buffer[length];
value |= (currentByte & 0x7F) << (length * 7);
length += 1;
if (length > 5) {
throw new Error('VarInt exceeds allowed bounds.');
}
if ((currentByte & 0x80) != 0x80) break;
}
return value;
}
buffer must be a byte stream starting with the VarInt, ideally using the std Buffer class.

How do you decide which typed array to use?

I am trying to create a view of ArrayBuffer object in order to JSONify it.
var data = { data: new Uint8Array(arrayBuffer) }
var json = JSON.stringify(data)
It seems that the size of the ArrayBuffer does not matter even with the smallest Uint8Array. I did not get any RangeError so far:) If so, how do I decide which typed array to use?
You decide based on the data stored in the buffer, or, better said, based on your interpretation of that data.
Also an Uint8Array is not an 8 bit array, it's an array of unsigned 8 bit integers. It can have any length. A Uint8Array created from the same ArrayBuffer as a Uint16Array is going to be twice as long, because every byte in the ArrayBuffer is going to be "placed" as one element of the Uint8Array, while for the Uint16Array each pair of bytes is going to "become" one element in the array.
A good explanation of what happens is if we try thinking in binary. Try running this:
var buffer = new ArrayBuffer(2);
var uint8View = new Uint8Array(buffer);
var uint16View = new Uint16Array(buffer);
uint8View[0] = 2;
uint8View[1] = 1;
console.log(uint8View[0].toString(2));
console.log(uint8View[1].toString(2));
console.log(uint16View[0].toString(2));
The output is going to be
10
1
100000010
because displayed as an unsigned 8 bit integer in binary, 2 is 00000010 and 1 is 00000001. (toString strips leading zeroes).
Uint8Array represents an array of bytes. As I said, an element is an unsigned 8 bit integer. We just pushed two bytes to it.
In memory those two bytes are stored side by side as 00000001 00000010 (binary form again used to make things clearer).
Now when you initialize a Uint16Array over the same buffer it's going to contain the same bytes, but because an element is a unsigned 16 bit integer (two bytes), when you access uint16View[0] it's going to take the first two bytes and give them back to you. So 0000000100000010, which is 100000010 with no leading zeroes.
If you interpret this data as base 10 (decimal) integers you'll know it's 0000000100000010 to base 10 (258).
Neither Uint8Array nor Uint16Array store any data themselves. They are simply different ways of accessing bytes in an ArrayBuffer.
how one chooses which one to use? It's not based on preference but on the underlying data. ArrayBuffer is to be used when you receive some binary data from some external source (web socket maybe) and already know what the data represents. It might be a list of unsigned 8 bit integers, or one of signed 16 bit ones, or even a mixed list where you know the first element is an 8 bit integer and the next one is a 16 bit one. Then you can use DataView to read typed items from it.
If you don't know what the data represents you can't choose what to use.

Reassembling negative Python marshal int's into Javascript numbers

I'm writing a client-side Python bytecode interpreter in Javascript (specifically Typescript) for a class project. Parsing the bytecode was going fine until I tried out a negative number.
In Python, marshal.dumps(2) gives 'i\x02\x00\x00\x00' and marshal.dumps(-2) gives 'i\xfe\xff\xff\xff'. This makes sense as Python represents integers using two's complement with at least 32 bits of precision.
In my Typescript code, I use the equivalent of Node.js's Buffer class (via a library called BrowserFS, instead of ArrayBuffers and etc.) to read the data. When I see the character 'i' (i.e. buffer.readUInt8(offset) == 105, signalling that the next thing is an int), I then call readInt32LE on the next offset to read a little-endian signed long (4 bytes). This works fine for positive numbers but not for negative numbers: for 1 I get '1', but for '-1' I get something like '-272777233'.
I guess that Javascript represents numbers in 64-bit (floating point?). So, it seems like the following should work:
var longval = buffer.readInt32LE(offset); // reads a 4-byte long, gives -272777233
var low32Bits = longval & 0xffff0000; //take the little endian 'most significant' 32 bits
var newval = ~low32Bits + 1; //invert the bits and add 1 to negate the original value
//but now newval = 272826368 instead of -2
I've tried a lot of different things and I've been stuck on this for days. I can't figure out how to recover the original value of the Python integer from the binary marshal string using Javascript/Typescript. Also I think I deeply misunderstand how bits work. Any thoughts would be appreciated here.
Some more specific questions might be:
Why would buffer.readInt32LE work for positive ints but not negative?
Am I using the correct method to get the 'most significant' or 'lowest' 32 bits (i.e. does & 0xffff0000 work how I think it does?)
Separate but related: in an actual 'long' number (i.e. longer than '-2'), I think there is a sign bit and a magnitude, and I think this information is stored in the 'highest' 2 bits of the number (i.e. at number & 0x000000ff?) -- is this the correct way of thinking about this?
The sequence ef bf bd is the UTF-8 sequence for the "Unicode replacement character", which Unicode encoders use to represent invalid encodings.
It sounds like whatever method you're using to download the data is getting accidentally run through a UTF-8 decoder and corrupting the raw datastream. Be sure you're using blob instead of text, or whatever the equivalent is for the way you're downloading the bytecode.
This got messed up only for negative values because positive values are within the normal mapping space of UTF-8 and thus get translated 1:1 from the original byte stream.

Is the smallest usable memory size 2 bytes, In javascript?

To represent a single character in JS, we use " or ', i.e use a string of length 1.
But since JavaScript uses the UTF-16 encoding, the character would be 16 bit as well.
Would it be safe to thus say, 2 bytes is the smallest re presentable size, that is possible in JS.
Or would there be some data structure, which can represent a single byte (8 bit) which I am missing.
EDIT:
I understand using the datatype with word size (usually 4 bytes) is always the most efficient, but I just wondered if the creators of JS would have cared to include something in.
Actionscript 3 derives from ECMAScript as well & it includes a variable size data type i.e an integer in AS3 is 1 to 4 bytes of storage.
Note : I'm not really looking for a hack to fit in multiple bytes into a larger datatype.
Guess the answer is most probably no.
Well, not in ECMAScript but in node.js you have Buffer and in browsers you have Uint8Array
var b = new UInt8Array(16); //Holds 16 bytes
Technically, you can have an array of 8 booleans - but there might be significant overhead in storing the array itself.
Taking a page from c-style unions, sure, you can create your own data structure which does just that. Using a large int as a bitmap.
var memory = 0;
function setbyte(addr, value) {
memory = memory | (value << addr);
}
function getbyte(addr) {
return 0xFF & (memory>>addr)
}

Dealing With Binary / Bitshifts in JavaScript

I am trying to perform some bitshift operations and dealing with binary numbers in JavaScript.
Here's what I'm trying to do. A user inputs a value and I do the following with it:
// Square Input and mod with 65536 to keep it below that value
var squaredInput = (inputVal * inputVal) % 65536;
// Figure out how many bits is the squared input number
var bits = Math.floor(Math.log(squaredInput) / Math.log(2)) + 1;
// Convert that number to a 16-bit number using bitshift.
var squaredShifted = squaredInput >>> (16 - bits);
As long as the number is larger than 46, it works. Once it is less than 46, it does not work.
I know the problem is the in bitshift. Now coming from a C background, I know this would be done differently, since all numbers will be stored in 32-bit format (given it is an int). Does JavaScript do the same (since it vars are not typed)?
If so, is it possible to store a 16-bit number? If not, can I treat it as 32-bits and do the required calculations to assume it is 16-bits?
Note: I am trying to extract the middle 4-bits of the 16-bit value in squaredInput.
Another note: When printing out the var, it just prints out the value without the padding so I couldn't figure it out. Tried using parseInt and toString.
Thanks
Are you looking for this?
function get16bitnumber( inputVal ){
return ("0000000000000000"+(inputVal * inputVal).toString(2)).substr(-16);
}
This function returns last 16 bits of (inputVal*inputVal) value.By having binary string you could work with any range of bits.
Don't use bitshifting in JS if you don't absolutely have to. The specs mention at least four number formats
IEEE 754
Int32
UInt32
UInt16
It's really confusing to know which is used when.
For example, ~ applies a bitwise inversion while converting to Int32. UInt16 seems to be used only in String.fromCharCode. Using bitshift operators converts the operands to either UInt32 or to Int32.
In your case, the right shift operator >>> forces conversion to UInt32.
When you type
a >>> b
this is what you get:
ToUInt32(a) >>> (ToUInt32(b) & 0x1f)

Categories

Resources