For example, I have this string:
'x���'
You may not be able to see it, depending on the device you're using. That's the number 2024000250 encoded as a 32-bit signed big-endian integer, which I've generated using Node:
let buffer = Buffer.alloc(4); // new Buffer(4) also works but is deprecated
buffer.writeInt32BE(2024000250);
buffer.toString();
I'm receiving the 4 bytes in question on the client side but I can't seem to find how to turn them back into an integer...
I might be dead wrong here, but as far as I remember Unicode characters can take between 1 and 4 bytes in UTF-8. When you transfer your binary data as text to the client side, you risk corrupting it, because the client is going to interpret the bytes as Unicode.
If I were to convert that text to a Blob on the client side:
var b = new Blob(['x���'],{type:"application/octet-stream"});
b.size; //10
As you can see I receive 10 bytes when it should have been 4: the 'x' survives as 1 byte, but each of the 3 bytes that aren't valid UTF-8 was decoded as a replacement character, which re-encodes to 3 bytes, so 1 + 3×3 = 10.
Since you are using Node, you can transfer the data directly as a binary string. On the server side:
function createBuffer(v){
  var b = new ArrayBuffer(4),
      vw = new DataView(b);
  vw.setInt32(0, v);
  return b;
}
This will create your buffer. Now, you cannot just send it to the client as it is; represent it either as JSON or directly as a binary string. To represent it as a binary string you don't need the above function; you could have done:
("0".repeat(32) + (2024000250).toString(2)).slice(-32); //"01111000101000111100101011111010"
If you want JSON, you can do:
function convertBuffToBinaryStr(buff){
  var res = [],
      l = buff.byteLength,
      v = new DataView(buff);
  for (var i = 0; i < l; ++i){
    res.push(v.getUint8(i));
  }
  return JSON.stringify(res);
}
Now try seeing what this outputs:
convertBuffToBinaryStr(createBuffer(2024000250)); //"[120,163,202,250]"
Back on the client-side you have to interpret this:
function interpret(json){
  json = JSON.parse(json);
  return parseInt(json.map((d)=>("0".repeat(8) + d.toString(2)).slice(-8)).join(""), 2);
}
Now try:
interpret("[120,163,202,250]"); //2024000250
Note: for your interpret function you actually have to use a DataView: setUint8 for each byte, then getInt32 at the end. Since you are using signed integers, the parseInt version above won't work for all cases.
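A minimal sketch of that signed-safe variant (interpretSigned is a name invented for this example):
function interpretSigned(json){
  var bytes = JSON.parse(json),
      buff = new ArrayBuffer(4),
      v = new DataView(buff);
  // write each transferred byte back into a real buffer...
  for (var i = 0; i < bytes.length; ++i){
    v.setUint8(i, bytes[i]);
  }
  // ...then read all 4 bytes back as one signed, big-endian 32-bit integer
  return v.getInt32(0);
}
interpretSigned("[120,163,202,250]"); //2024000250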
Well I finally got around to getting this to work.
It is not quite what I started off with but I'm just gonna post this for some other lost souls.
It is worth mentioning that ibrahim's answer contains most of the necessary information, but it addresses the XY problem that my question turned out to be.
I just send my binary data as binary:
let buffer = Buffer.alloc(4);
buffer.writeInt32BE(2024000250);
// websocket connection
connection.send(buffer);
Then in the browser
// message listener
let reader = new FileReader();
reader.addEventListener('loadend', () => {
  let view = new DataView(reader.result);
  // there goes the precious data
  console.log(view.getInt32(0));
});
reader.readAsArrayBuffer(message.data);
In all honesty this tickles my gag reflex. Why am I using a FileReader to get some data out of a binary message? There is a good chance a better way of doing this exists; if so, please add to this answer.
Other methods I found are the fetch API, which is no better than the FileReader in terms of hack rating, and Blob.prototype.arrayBuffer, which is not yet fully supported.
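One cleaner route worth noting: the WebSocket API lets you set binaryType to 'arraybuffer', so message.data arrives as an ArrayBuffer and the FileReader disappears entirely. A minimal sketch, assuming socket is the browser-side connection:
socket.binaryType = 'arraybuffer'; // message.data will be an ArrayBuffer, not a Blob
socket.addEventListener('message', (message) => {
  let view = new DataView(message.data);
  console.log(view.getInt32(0)); // 2024000250
});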
Related
I got the following JavaScript code and I need to convert it to Python (I'm not an expert in hashing, so sorry for my limited knowledge on this subject):
function generateAuthHeader(dataToSign) {
  let apiSecretHash = new Buffer("Rbju7azu87qCTvZRWbtGqg==", 'base64');
  let apiSecret = apiSecretHash.toString('ascii');
  var hash = CryptoJS.HmacSHA256(dataToSign, apiSecret);
  return hash.toString(CryptoJS.enc.Base64);
}
When I ran generateAuthHeader("abc") it returned +jgBeooUuFbhMirhh1KmQLQ8bV4EXjRorK3bR/oW37Q=
So I tried writing the following Python code:
import base64
import hashlib
import hmac

def generate_auth_header(data_to_sign):
    api_secret_hash = bytearray(base64.b64decode("Rbju7azu87qCTvZRWbtGqg=="))
    hash = hmac.new(api_secret_hash, data_to_sign.encode(), digestmod=hashlib.sha256).digest()
    return base64.b64encode(hash).decode()
But when I ran generate_auth_header("abc") it returned a different result: aOGo1XCa5LgT1CIR8C1a10UARvw2sqyzWWemCJBJ1ww=
Can someone tell me what is wrong with my Python code and what I need to change?
The base64 string is one I generated myself for this post.
UPDATE:
This is the document I'm working with:
//Converting the Rbju7azu87qCTvZRWbtGqg== (key) into byte array
//Converting the data_to_sign into byte array
//Generate the hmac signature
It seems like apiSecretHash and api_secret_hash are different, but I don't quite understand why, as the equivalent of new Buffer() in Node.js is bytearray() in Python.
It took me 2 days of looking it up and asking people in the Python Discord, and I finally got an answer. Let me summarize the problem:
The two languages turn the decoded byte array into different secrets.
JavaScript
apiSecret = "E8nm,ns:\u0002NvQY;F*"
Python
api_secret_hash = b'E\xb8\xee\xed\xac\xee\xf3\xba\x82N\xf6QY\xbbF\xaa'
Once we replaced the hash in the Python code with that string, it returned the same result:
def generate_auth_header(data_to_sign):
    api_secret_hash = "E8nm,ns:\u0002NvQY;F*".encode()
    hash = hmac.new(api_secret_hash, data_to_sign.encode(), digestmod=hashlib.sha256).digest()
    return base64.b64encode(hash).decode()
The ASCII encoding in Node.js is implemented here: https://github.com/nodejs/node/blob/a2a32d8beef4d6db3a8c520572e8a23e0e51a2f8/src/string_bytes.cc#L636-L647
case ASCII:
  if (contains_non_ascii(buf, buflen)) {
    char* out = node::UncheckedMalloc(buflen);
    if (out == nullptr) {
      *error = node::ERR_MEMORY_ALLOCATION_FAILED(isolate);
      return MaybeLocal<Value>();
    }
    force_ascii(buf, out, buflen);
    return ExternOneByteString::New(isolate, out, buflen, error);
  } else {
    return ExternOneByteString::NewFromCopy(isolate, buf, buflen, error);
  }
There is this force_ascii() function, called when the data contains non-ASCII characters, which is implemented here: https://github.com/nodejs/node/blob/a2a32d8beef4d6db3a8c520572e8a23e0e51a2f8/src/string_bytes.cc#L531-L573
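In short, force_ascii unsets the highest bit of each byte. You can see the effect directly in Node (a quick illustration, not from the original post):
// 0xb8 & 0x7f === 0x38, which is the character '8'
Buffer.from([0x45, 0xb8]).toString('ascii'); // 'E8'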
So we need to mangle the hash the same way Node.js does, which gives us the final version of the Python code:
import base64
import hashlib
import hmac

def generate_auth_header(data_to_sign):
    # convert to bytearray so the for loop below can modify the values
    api_secret_hash = bytearray(base64.b64decode("Rbju7azu87qCTvZRWbtGqg=="))
    # "force" characters into the ASCII range, like Node's 'ascii' decoding does
    for i in range(len(api_secret_hash)):
        api_secret_hash[i] &= 0x7f
    hash = hmac.new(api_secret_hash, data_to_sign.encode(), digestmod=hashlib.sha256).digest()
    return base64.b64encode(hash).decode()
Now it returns the same result as the Node.js version.
Thank you Mark from the Python Discord for helping me understand and fix this!
I hope anyone in the future trying to convert a byte array from JavaScript to Python learns about this quirk of Node.js's Buffer toString('ascii') behavior.
I have a Kafka stream with Avro messages, using the confluent.io Kafka package. This works fine for the Java applications, but I am now trying to read these messages in JavaScript.
I've been attempting to use the kafka-node + avsc packages to decode the messages from a Buffer to a string, using the schema. I know that Confluent puts 5 bytes at the front of the message: a magic byte (0) + 4 bytes for the schema id.
So I slice the Buffer to remove those bytes and attempt to send the rest to avsc to decode, but I get an error:
return this.buf.utf8Slice(pos, pos + len);

RangeError: out of range index
    at RangeError (native)
    at Tap.readString (C:\git\workflowapps\workItemsApp\node_modules\avsc\lib\utils.js:452:19)
    at StringType._read (C:\git\workflowapps\workItemsApp\node_modules\avsc\lib\types.js:612:58)
Also, attempting to decode this manually leaves lots of non-UTF-8 characters, and I am losing data that way.
Sample Code:
consumer.on('message', function(message) {
  var val = message.value.slice(4);
  sails.log.info('val buffer', val, val.length);
  sails.log.info('hex', val.toString('hex'));
  var type = avro.parse({"type": "record",
    "name": "StatusEvent",
    "fields": [{"name": "ApplicationUUID", "type": "string"},
               {"name": "StatusUUID", "type": "string"},
               {"name": "Name", "type": "string"},
               {"name": "ChangedBy", "type": "string"},
               {"name": "ChangedByUUID", "type": "string"},
               {"name": "ChangedAt", "type": "long"}]
  });
  var decodedValue = type.fromBuffer(val);
  sails.log.info('Decoded', decodedValue);
});
Your slice(4) should be slice(5) (otherwise you're only skipping 4 out of the 5 header bytes). You might also be able to find helpful information here.
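For illustration, a minimal sketch of the header handling, reusing type and the consumer from the question and assuming message.value arrives as a Buffer (as it appears to above); the schema-id read is only there to show the wire format:
consumer.on('message', function(message) {
  // Confluent wire format: [magic byte 0x00][4-byte schema id][Avro payload]
  var schemaId = message.value.readInt32BE(1); // bytes 1-4
  var val = message.value.slice(5);            // skip all 5 header bytes
  var decodedValue = type.fromBuffer(val);
  sails.log.info('Decoded', decodedValue, 'schema id', schemaId);
});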
I am writing a mat file parser using jBinary, which is built on top of jDataView. I have a working parser with lots of tests, but it runs very slowly for moderately sized data sets of around 10 MB. I profiled with look and found that a lot of time is spent in tagData. In the linked tagData code, ints/uints/single/doubles/whatever are read one by one from the file and pushed to an array. Obviously, this isn't super-efficient. I want to replace this code with a typed array view of the underlying bytes to remove all the reading and pushing.
I have started to migrate the code to use typed arrays as shown below. The new code preserves the old functionality for all types except 'miINT8'. The new functionality tries to view the buffer b starting at offset s and with length l, consistent with the docs. I have confirmed that the s being passed to the Int8Array constructor is non-zero, even going so far as to hard-code it to 5. In all cases, the output of console.log(elems.byteOffset) is 0. In my tests, I can see that the Int8Array is indeed starting from the beginning of the buffer and not at offset s as I intend.
What am I doing wrong? How do I get the typed array to start at position s instead of position 0? I have tested this on node.js version 10.25 as well as 12.0 with the same results in each case. Any guidance appreciated as I'm totally baffled by this one!
tagData: jBinary.Template({
  baseType: ['array', 'type'],
  read: function (ctx) {
    var view = this.binary.view
    var b = view.buffer
    var s = view.tell()
    var l = ctx.tag.numBytes
    var e = s + l
    var elems
    switch (ctx.tag.type) {
      case 'miINT8':
        elems = new Int8Array(b, s, l); view.skip(l); console.log(elems.byteOffset); break;
      default:
        elems = []
        while (view.tell() < e && view.tell() < view.byteLength) {
          elems.push(this.binary.read(ctx.tag.type))
        }
    }
    return elems
  }
}),
I have a binary data file where every x bytes is a record, and I have some format/mask (however you prefer to see it) to decipher that data. It's like short, short, int, short, float, double, blah blah. So I'm reading this file with the File API. I'll need to be using ArrayBuffers eventually, but I'm not there yet... So my question is twofold. Firstly, and most directly, what is the best way to read every x bytes from a binary file into an ArrayBuffer?
Secondly, as I'm running into some problems... why does the below script fill 5 GB+ of RAM almost immediately when reading a 500 KB binary file?
$('input[type="file"]').change(function(event) {
  // FileList object
  var files = event.target.files;
  for (var i = 0, f; f = files[i]; i++) {
    var reader = new FileReader();
    // closures and magnets, how do they work
    reader.onload = (function(f) {
      return function(event) {
        // data file starts with header XML
        // indexOf +9 for </HEADER> and +1 for null byte
        var data_start = event.target.result.indexOf('</HEADER>') + 10,
            // leverage jQuery for XML
            header = $(event.target.result.slice(0, data_start)),
            rec_len = parseInt(header.find('REC_LEN').text(), 10);
        // var ArrayBuffer
        // define ArrayBufferView
        // loop through records
        for (var i = data_start; i < event.target.result.length; i += rec_len) {
          // fill ArrayBuffer
          // add data to global data []
          console.log(i + ' : ' + event.target.result.slice(i, i + rec_len));
        }
      };
    })(f);
    // Read as Binary
    reader.readAsBinaryString(f);
  }
});
A couple of general tips at least:
Using a DataView is flexible but a bit slow: it should be faster than parseInt called on strings, but not as fast as typed array views. The upside is that it supports different byte orders, if your binary data requires that. Use reader.readAsArrayBuffer(f), then in your onload callback, use something like:
var dv = new DataView(arrayBuffer /*, byteOffset, byteLength (both optional) */),
    result = [];
// in some loop for i...
result[i] = [];
result[i].push(dv.getInt8(coord));
// coord += offset;
result[i].push(dv.getFloat32(coord));
// end some loop
As I mentioned, faster would be to create multiple views on the ArrayBuffer, but you can't (to my knowledge) change the cursor position as you go, so your mixed data types will be an issue.
To put the results into a typed array, just declare something like var col1 = new Uint8Array(length);. The typed array subclasses are listed here. Note that in my experience, typed arrays don't gain you much in terms of performance. Google around for some jsperf tests of typed arrays.
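To tie the pieces together, here is a minimal sketch of a record loop over the whole file, reusing reader and f from the script above; the one-int8-plus-one-float32 record layout is invented for the example:
reader.onload = function (event) {
  var dv = new DataView(event.target.result),
      recLen = 5, // assumed layout: 1 int8 + 1 float32 per record
      records = [];
  for (var off = 0; off + recLen <= dv.byteLength; off += recLen) {
    records.push({
      flag:  dv.getInt8(off),
      value: dv.getFloat32(off + 1) // big-endian by default; pass true for little-endian
    });
  }
  console.log(records.length + ' records parsed');
};
reader.readAsArrayBuffer(f);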
I'm having some trouble receiving packet data from a TCP stream. I think this is due in part to not understanding the server's responses.
My code (Objective-C):
unsigned type = 0;
unsigned bufferFirstByte = 0;
unsigned bufferSecondByte = 0;
unsigned bufferThirdByte = 0;

NSScanner *hexToInt = [NSScanner scannerWithString:[mutableBuffer objectAtIndex:0]];
[hexToInt scanHexInt:&bufferFirstByte];
hexToInt = [NSScanner scannerWithString:[mutableBuffer objectAtIndex:1]];
[hexToInt scanHexInt:&bufferSecondByte];
hexToInt = [NSScanner scannerWithString:[mutableBuffer objectAtIndex:2]];
[hexToInt scanHexInt:&bufferThirdByte];
hexToInt = [NSScanner scannerWithString:[mutableBuffer objectAtIndex:0]];
[hexToInt scanHexInt:&type];

int len = (bufferFirstByte << 8) + bufferSecondByte;
if (!([mutableBuffer count] < (3 + len))) {
    NSArray *payload = [mutableBuffer subarrayWithRange:NSMakeRange(2, ([mutableBuffer count] - 2))];
    NSLog(@"length %d", len);
    [self processReceive:type length:len payload:payload];
}
It is somewhat modeled on this JavaScript code:
self.receive = function (itemName, data) {
  self.log("Receiving: " + self.toHex(data));
  self.ourData += data;
  while (self.ourData.length >= 3) {
    var type = self.ourData.charCodeAt(0);
    var len = (self.ourData.charCodeAt(1) << 8) + self.ourData.charCodeAt(2);
    if (self.ourData.length < (3 + len)) { // sanity check: buffer doesn't contain all the data advertised in the packet
      break;
    }
    var payload = self.ourData.substr(3, len);
    self.ourData = self.ourData.substr(3 + len);
    self.processMessage(type, len, payload); // process payload
  }
};
The reason for the modeling is that the CommandFusion JavaScript project is talking to the same server I am (a Crestron controller).
However, I could never get the len part to work, and I think that's what's causing my problem. When looking at a sample packet (05:00:06:00:00:03:00:52:00), len would equal 1280 (see the math in note A below) even though the data portion is only 9 bytes.
Currently my code works, but it misses certain data. This happens because of the streaming that TCP does (some packets are conjoined while others are fragmented). Without knowing the data segment size I cannot fix the issue, and I believe the answer is the len variable; I just don't see how to implement it properly.
My question comes down to this: how can I determine the size of the data segment from this len variable, or control my receive method to accept only one data segment at a time (which, from my research, is not possible, since TCP is a stream)?
I have a feeling there will be questions, so I'm going to attempt to answer a few of them here.
A. How do you come up with 1280? Look at the math in the method ((self.ourData.charCodeAt(1) << 8) + self.ourData.charCodeAt(2);): (5 << 8) + 0 = 1280 decimal.
B. Why are you using different indexes?
You will notice that the indexes for what data goes where (payload, len, type) differ. This is merely because they keep their payload/data bytes as a string and mine is an array; in the end it is the same data being referenced.
Use the following logic (a sketch in code follows the list):
1) Do you have enough bytes to determine the length? If not, receive some more bytes and try again.
2) Do you have enough bytes to have one complete unit? If not, receive some more bytes and try again.
3) Extract one complete unit using the decoded length from step 1 and process it.
4) If we have no leftover bytes, we are done.
5) Return to step 1 to process the leftover bytes.
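A minimal sketch of that loop in JavaScript, modeled on the framing from the question (onBytes and process are names invented for the example):
var pending = '';
function onBytes(chunk) {
  pending += chunk;
  while (pending.length >= 3) {                  // 1) enough bytes for the 3-byte header?
    var len = (pending.charCodeAt(1) << 8) + pending.charCodeAt(2);
    if (pending.length < 3 + len) break;         // 2) whole unit not received yet
    var unit = pending.substr(0, 3 + len);       // 3) extract one complete unit
    pending = pending.substr(3 + len);           // 4)/5) keep the leftovers and loop again
    process(unit);
  }
}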
OK, so I got some help from (this group), which you might not be able to see without a login for the group. In any case, there is a 3-byte header, so my len, which is 6 and not 1280 like I thought, is actually 9 once the 3 bytes for the header are added. And this gets me the value I was looking for (9), since the data segment is 9 bytes.
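A worked version of that math on the sample packet:
// sample packet: 05 00 06 00 00 03 00 52 00
var type  = 0x05;               // byte 0
var len   = (0x00 << 8) + 0x06; // bytes 1 and 2 -> 6
var total = 3 + len;            // 3-byte header + 6-byte payload -> 9 bytes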
Thanks for the suggestions, David; one up for some good basic knowledge.