Beginner here trying to understand, at a low level, how websockets work. I am trying to create my own implementation; however, I am very confused by the logic of parsing the data frame that gets sent from client => server.
I know the buffer that is received on the server side consists of multiple bytes, with the first two being the main header information (fin bit, length, opcode, mask, etc).
I found the following code on SO that parses both of those bytes, and from testing, it DOES indeed return the correct values.
let index = 0;
frame = {
    data: Buffer.alloc(0),
    fin: (buffer[index] & 128) === 128,
    length: buffer[index + 1] & 127,
    masked: (buffer[index + 1] & 128) === 128,
    opcode: buffer[index] & 15
}
My main question, though, is: HOW exactly is this returning the correct values?
I know buffer[index] and buffer[index + 1] refer to the first and second byte, and the AND operator is used to compare the binary values of each, outputting 1 wherever the corresponding bits in both numbers are 1, otherwise 0... but...
Where do the numbers after the & operator come from? e.g. opcode is 15, length is 127.
HOW exactly does using the AND operator on these values give the right result? This is what I really don't understand.
I apologize if this is basic computer science concepts that I'm not understanding, but if anyone out there is able to explain to me what exactly is occurring with this code, it would be so much appreciated.
I get that it looks like a normal boolean AND comparison, but it is actually a bitwise AND operation that is being performed.
To be a bit more specific: buffer[index] & 15 for the opcode says compare buffer[index], as a binary number, with 15 (the highest allowed opcode for websockets, i.e. a mask for the lowest 4 bits) bit by bit, and return the result as an integer. The opcode itself tells which frame type is being sent. (If you are curious you can deep dive on this here: https://www.rfc-editor.org/rfc/rfc6455#section-11.8.)
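For a concrete illustration, take a hypothetical first two bytes 0x81 and 0x85 (a final, masked text frame with a 5-byte payload); the masks pick out each field:
const firstByte = 0b10000001;  // 0x81: FIN = 1, opcode = 0x1 (text frame)
const secondByte = 0b10000101; // 0x85: MASK = 1, payload length = 5

console.log((firstByte & 128) === 128);  // true -> FIN bit  (mask 1000 0000)
console.log(firstByte & 15);             // 1    -> opcode   (mask 0000 1111)
console.log((secondByte & 128) === 128); // true -> MASK bit (mask 1000 0000)
console.log(secondByte & 127);           // 5    -> length   (mask 0111 1111)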
On the length part (the & 127), I refer to this answer on SO, since it is a solid one: how to work out payload size from html5 websocket
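Per RFC 6455, that 7-bit value is the actual payload length only when it is 125 or less; 126 and 127 mean the real length follows in the next 2 or 8 bytes. A rough sketch, assuming buffer is a Node.js Buffer and Node >= 12 for readBigUInt64BE:
let length = buffer[index + 1] & 127;
let payloadOffset = index + 2;
if (length === 126) {
    // The next 2 bytes hold a 16-bit big-endian length.
    length = buffer.readUInt16BE(index + 2);
    payloadOffset += 2;
} else if (length === 127) {
    // The next 8 bytes hold a 64-bit big-endian length.
    length = Number(buffer.readBigUInt64BE(index + 2));
    payloadOffset += 8;
}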
For further reading on the operator see the section in my source on bitwise logical operators.
Source: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Expressions_and_Operators
I am writing a small utility library in JS on Node for requesting the server status of a given Minecraft host. I am using the Server List Ping Protocol as outlined here (https://wiki.vg/Server_List_Ping) and got it mostly working as expected, albeit with big trouble working with data types that JS doesn't support natively (VarInt). I had to scour the internet to find a way of converting JS numbers into VarInts in order to craft the necessary packet buffers:
function toVarIntBuffer(integer) {
    let buffer = Buffer.alloc(0);
    while (true) {
        // Take the lowest 7 bits of the remaining value.
        let tmp = integer & 0b01111111;
        integer >>>= 7;
        if (integer != 0) {
            // More bits remain, so set the continuation (most significant) bit.
            tmp |= 0b10000000;
        }
        buffer = Buffer.concat([buffer, Buffer.from([tmp])]);
        if (integer <= 0) break;
    }
    return buffer;
}
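For example, 300 does not fit in 7 bits and so takes two bytes, while 1 takes one:
console.log(toVarIntBuffer(300)); // <Buffer ac 02>
console.log(toVarIntBuffer(1));   // <Buffer 01>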
Right now I am able to request a server status by sending the handshake packet and then the query packet, and I do receive a JSON response with the length of the response prepended as a VarInt.
However, here is the issue: I simply don't know how to safely identify the VarInt at the beginning of the JSON response (as it can be anywhere up to 5 bytes long) and decode it back into a readable number so I can get the proper length of the response byte stream.
[...] as with all strings this is prefixed by its length as a VarInt
(from the protocol documentation)
My current super hacky workaround is to concatenate the chunks as a string until the concatenated string contains the same count of '{'s and '}'s (meaning a full JSON object) and slice the JSON response at the first '{' before parsing it.
However, I am very unhappy with this hacky, inefficient, inelegant and possibly unreliable way of solving the issue, and would rather decode the VarInt in front of the JSON response in order to get a proper length to compare against.
I don't know this protocol, but VarInts in Protocol Buffers are encoded with the MSB as a continuation bit:
Each byte in a varint, except the last byte, has the most significant bit (msb) set – this indicates that there are further bytes to come. The lower 7 bits of each byte are used to store the two's complement representation of the number in groups of 7 bits, least significant group first.
Note: Too long for a comment, so posting as an answer.
Update: I browsed a bit through the URL you gave, and it is indeed the ProtoBuf VarInt. It is also described there with pseudo-code:
https://wiki.vg/Protocol#VarInt_and_VarLong
VarInt and VarLong: Variable-length format such that smaller numbers use fewer bytes. These are very similar to Protocol Buffer Varints: the 7 least significant bits are used to encode the value and the most significant bit indicates whether there's another byte after it for the next part of the number. The least significant group is written first, followed by each of the more significant groups; thus, VarInts are effectively little endian (however, groups are 7 bits, not 8). VarInts are never longer than 5 bytes, and VarLongs are never longer than 10 bytes.
Pseudocode to read and write VarInts and VarLongs is given there as well.
Thanks to the reference material that @thst pointed me to, I was able to slap together a working way of reading VarInts in JavaScript.
function readVarInt(buffer) {
    let value = 0;
    let length = 0;
    let currentByte;
    while (true) {
        currentByte = buffer[length];
        // Take the lower 7 bits of this byte and shift them into position.
        value |= (currentByte & 0x7F) << (length * 7);
        length += 1;
        if (length > 5) {
            throw new Error('VarInt exceeds allowed bounds.');
        }
        // Stop once the continuation bit (0x80) is no longer set.
        if ((currentByte & 0x80) != 0x80) break;
    }
    return value;
}
buffer must be a byte stream starting with the VarInt, ideally using the standard Buffer class.
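As a quick sanity check of the round trip (using the toVarIntBuffer function from the question), and a rough sketch of stripping the length prefix from a response buffer (response is a hypothetical Buffer holding the reply; the real Server List Ping reply has more than one VarInt before the JSON, so adjust the offsets to your packet layout):
console.log(readVarInt(toVarIntBuffer(300))); // 300

// Count how many bytes the VarInt prefix occupies: every byte except the
// last one has the continuation bit (0x80) set.
let prefixLength = 1;
while (response[prefixLength - 1] & 0x80) prefixLength += 1;

const jsonLength = readVarInt(response);
const json = response.slice(prefixLength, prefixLength + jsonLength).toString('utf8');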
Hello dear swarm intelligence,
One of my current private projects is in the field of the internet of things, specifically LoRaWan and TTN. For easy data-handling I decided to use node-red which is a node-js based flow tool to process the received data.
This is the first time I have ever come into contact with the JavaScript world (apart from minor reading ;)). Here's the problem:
I am transmitting a C-style signed int16_t, divided into two 8-bit halves, via TTN. On the receiving side I want to merge these two halves back into a signed 16-bit value. The problem is that JavaScript's bitwise operations work on 32-bit integers, which means that by simply merging them like this:
newMsg.payload=(msg.payload[1]<<8)|(msg.payload[0]);
I lose the sign information and just get the unsigned interpretation of the data, since it is not stored as a 32-bit two's complement value.
Since I am not yet very familiar with the JavaScript "standard library", this seems like a hard problem to me!
Any help will be appreciated.
var unsignedValue = (msg.payload[1] << 8) | (msg.payload[0]);
if (unsignedValue & 0x8000) {
    // If the sign bit is set, then fill the upper 16 bits of the result with ones.
    newMsg.payload = unsignedValue | 0xffff0000;
} else {
    // If the sign bit is not set, then the result is the same as the unsigned value.
    newMsg.payload = unsignedValue;
}
Note that this still stores the value as a signed 32-bit integer, but with the right value.
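An equivalent, shorter trick is to let the sign-propagating right shift do the sign extension; a sketch, assuming msg.payload is a byte array or Buffer with the least significant byte first, as in the question:
// Shift the 16-bit value into the upper half of the 32-bit range, then shift
// back with >> so the sign bit is propagated into the upper 16 bits.
var raw = (msg.payload[1] << 8) | msg.payload[0];
newMsg.payload = (raw << 16) >> 16;

// If msg.payload is a Node.js Buffer, it can also be read directly as a
// signed little-endian 16-bit value:
// newMsg.payload = msg.payload.readInt16LE(0);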
Coming back to this topic after two years (via another thread where I saw people discussing the same thing), I still don't understand what is going on.
Following this SO post:
String length in bytes in JavaScript
I want to understand this part of JavaScript! I am also interested in calculating the kB size of a Bitcoin transaction before I push it to the blockchain. The more important of the two, though, is that I finally understand what these users are doing, because it has come up more than once and I just don't get it!
I've tried three of the functions outlined in the answers, but they all seem to do nothing more than return the string.length, whereas I would expect them to return a different value (the size of the string in bytes/kilobytes/megabytes).
function byteCount(s) {
    return encodeURI(s).split(/%..|./).length - 1;
}
console.log(byteCount('hello'),'hello'.length);//5,5
function getLengthInBytes(str) {
    var b = str.match(/[^\x00-\xff]/g);
    return (str.length + (!b ? 0 : b.length));
}
console.log(getLengthInBytes('hello'),'hello'.length);//5,5
console.log((new TextEncoder('utf-8').encode('hello')).length,'hello'.length);//5,5
It's annoying that this makes no sense to me! Clearly these people would not be discussing how to get something they can easily get with string.length, so what are they trying (and succeeding) to return?
Should the string instead be binary? (like so: How to convert text to binary code in JavaScript?)
There are a lot of different characters in the world.
They don't all fit in one byte of data. That's why some characters use more than one byte of data.
Some examples: "Äüöôś"
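For example, with the byteCount function from the question (the TextEncoder line gives the same number):
var s = 'Äüöôś';
console.log(s.length);                           // 5  (UTF-16 code units)
console.log(byteCount(s));                       // 10 (UTF-8 bytes)
console.log(new TextEncoder().encode(s).length); // 10 (UTF-8 bytes)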
You are testing with basic ASCII characters (well, they are UTF-8, but you can think of them a little like ASCII, since these characters are encoded identically in both). Try an extended character.
console.log((new TextEncoder('utf-8').encode('😁')).length, '😁'.length);//4,2
I'm looking for help in understanding this line of code in the npm module hash-index.
The purpose of this module is to be a function which returns the SHA-1 hash of an input, modulo the second argument you pass.
The specific function in this module that I don't understand is this one that takes a Buffer as input and returns an integer:
var toNumber = function (buf) {
    return buf.readUInt16BE(0) * 0xffffffff + buf.readUInt32BE(2)
}
I can't seem to figure out why those specific offsets of the buffer are chosen and what the purpose of multiplying by 0xffffffff is.
This module is really interesting to me and any help in understanding how it's converting buffers to integers would be greatly appreciated!
It builds a number from the first six bytes of the buffer.
First, it reads the first two bytes (a UInt16) of the buffer, using big endian, then it multiplies that value by 0xFFFFFFFF.
Then, it reads the next four bytes (a UInt32) of the buffer and adds them to the multiplied number - resulting in a number constructed from the first 6 bytes of the buffer.
Example: Consider [Buffer BB AA CC CC DD EE ...]
0xbbaa * 0xffffffff = 0xbba9ffff4456
0xbba9ffff4456 + 0xccccddee = 0xbbaacccc2244
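You can confirm this in Node with a hypothetical buffer holding those six bytes:
var toNumber = function (buf) {
    return buf.readUInt16BE(0) * 0xffffffff + buf.readUInt32BE(2)
}

var buf = Buffer.from([0xbb, 0xaa, 0xcc, 0xcc, 0xdd, 0xee]);
console.log(toNumber(buf).toString(16)); // 'bbaacccc2244'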
And regarding the offsets, they are chosen this way:
The first read covers bytes 0 to 1 (converted to a UInt16).
The second read covers bytes 2 to 5 (converted to a UInt32).
So to sum it up, it constructs a number from the first 6 bytes of the buffer using big endian notation, and returns it to the calling function.
Hope that answers your question.
Wikipedia's Big Endian entry
EDIT
As someone pointed out in the comments, I was totally wrong about 0xFFFFFFFF being a left shift by 32; it's just a multiplication - I'm assuming it's some internal convention of the module for turning the digest bytes into a number in the expected range.
EDIT 2
After looking at the function in its original context, I've come to this conclusion:
This function is part of a hashing flow, and it works in this manner:
The main flow receives a string input and a maximum number for the hash output; it then takes the string input and plugs it into the SHA-1 hashing function.
SHA-1 hashing returns a Buffer; it takes that Buffer and applies the indexing to it, as can be seen in the following code excerpt:
return toNumber(crypto.createHash('sha1').update(input).digest()) % max
Also, it uses a modulo to make sure the hash index returned doesn't exceed the given maximum.
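Putting the two excerpts together, the whole flow looks roughly like this (a sketch reconstructed from the snippets above, not the module's exact source):
var crypto = require('crypto');

var toNumber = function (buf) {
    return buf.readUInt16BE(0) * 0xffffffff + buf.readUInt32BE(2)
}

// Hash the input with SHA-1, turn the leading bytes of the digest into a
// number, and reduce it modulo max to get an index in [0, max).
function hashIndex(input, max) {
    return toNumber(crypto.createHash('sha1').update(input).digest()) % max
}

console.log(hashIndex('hello', 1024)); // a stable bucket index between 0 and 1023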
Multiplication by 2 is equivalent to a shift of bits to the left by 1, so the purpose of multiplying by 2^16 is the equivalent of shifting the bits left 16 times.
Here is a similar question already answered:
Bitwise Logic in C
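For instance:
console.log(5 * 2 === 5 << 1);      // true
console.log(5 * 65536 === 5 << 16); // true, 65536 is 2^16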
I'm writing a client-side Python bytecode interpreter in Javascript (specifically Typescript) for a class project. Parsing the bytecode was going fine until I tried out a negative number.
In Python, marshal.dumps(2) gives 'i\x02\x00\x00\x00' and marshal.dumps(-2) gives 'i\xfe\xff\xff\xff'. This makes sense as Python represents integers using two's complement with at least 32 bits of precision.
In my TypeScript code, I use the equivalent of Node.js's Buffer class (via a library called BrowserFS, instead of ArrayBuffers etc.) to read the data. When I see the character 'i' (i.e. buffer.readUInt8(offset) == 105, signalling that the next thing is an int), I then call readInt32LE on the next offset to read a little-endian signed long (4 bytes). This works fine for positive numbers but not for negative numbers: for 1 I get '1', but for '-1' I get something like '-272777233'.
I guess that Javascript represents numbers in 64-bit (floating point?). So, it seems like the following should work:
var longval = buffer.readInt32LE(offset); // reads a 4-byte long, gives -272777233
var low32Bits = longval & 0xffff0000; //take the little endian 'most significant' 32 bits
var newval = ~low32Bits + 1; //invert the bits and add 1 to negate the original value
//but now newval = 272826368 instead of -2
I've tried a lot of different things and I've been stuck on this for days. I can't figure out how to recover the original value of the Python integer from the binary marshal string using Javascript/Typescript. Also I think I deeply misunderstand how bits work. Any thoughts would be appreciated here.
Some more specific questions might be:
Why would buffer.readInt32LE work for positive ints but not negative?
Am I using the correct method to get the 'most significant' or 'lowest' 32 bits (i.e. does & 0xffff0000 work how I think it does?)
Separate but related: in an actual 'long' number (i.e. longer than '-2'), I think there is a sign bit and a magnitude, and I think this information is stored in the 'highest' 2 bits of the number (i.e. at number & 0x000000ff?) -- is this the correct way of thinking about this?
The sequence ef bf bd is the UTF-8 encoding of the "Unicode replacement character", which decoders substitute for byte sequences that are not valid in the encoding being decoded.
It sounds like whatever method you're using to download the data is getting accidentally run through a UTF-8 decoder and corrupting the raw datastream. Be sure you're using blob instead of text, or whatever the equivalent is for the way you're downloading the bytecode.
This got messed up only for negative values because the bytes of small positive values fall within the ASCII range of UTF-8 and thus get translated 1:1 from the original byte stream.
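Once the raw bytes arrive uncorrupted, readInt32LE already handles the two's complement for you; a small sketch with the marshal output for -2 hard-coded as bytes:
// 'i' type code (0x69) followed by -2 as a little-endian 32-bit
// two's complement value (fe ff ff ff).
const buf = Buffer.from([0x69, 0xfe, 0xff, 0xff, 0xff]);

console.log(String.fromCharCode(buf.readUInt8(0))); // 'i'
console.log(buf.readInt32LE(1));                    // -2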