Out of range index this.buf.utf8Slice - javascript

I have a Kafka stream of Avro messages produced with the confluent.io Kafka package. This works fine for the Java applications, but I am now trying to read these messages in JavaScript.
I've been attempting to use the kafka-node and avsc packages to decode the messages from a Buffer using the schema. I know that Confluent prefixes each message with 5 bytes: a magic byte (0) followed by 4 bytes for the schema ID.
So I slice the Buffer to remove those bytes and pass the rest to avsc to decode, but I get an error:
return this.buf.utf8Slice(pos, pos + len);
RangeError: out of range index
at RangeError (native)
at Tap.readString (C:\git\workflowapps\workItemsApp\node_modules\avsc\lib\utils.js:452:19)
at StringType._read (C:\git\workflowapps\workItemsApp\node_modules\avsc\lib\types.js:612:58)
Also attempting to manually decode this leaves lots of non-utf8 characters and I am losing data that way.
Sample Code:
consumer.on('message', function(message) {
var val = message.value.slice(4);
sails.log.info('val buffer', val, val.length);
sails.log.info('hex',val.toString('hex'));
var type = avro.parse({"type":"record",
"name":"StatusEvent",
"fields":[{"name":"ApplicationUUID","type":"string"},
{"name":"StatusUUID","type":"string"},
{"name":"Name","type":"string"},
{"name":"ChangedBy","type":"string"},
{"name":"ChangedByUUID","type":"string"},
{"name":"ChangedAt","type":"long"}]
});
var decodedValue = type.fromBuffer(val);
sails.log.info('Decoded', decodedValue);
});

Your slice(4) should be slice(5) (otherwise you're only skipping 4 out of the 5 header bytes). You might also be able to find helpful information here.
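As a minimal sketch of the header handling described above, the 5-byte prefix can be checked and stripped before the body is handed to avsc. The function name splitConfluentFrame and the sample bytes are hypothetical and only for illustration; the layout (magic byte 0, then a 4-byte big-endian schema id) is the one stated in the question.

```javascript
// Split a Confluent-framed Kafka message into its schema id and Avro body.
// Assumed layout (from the question): 1 magic byte (0), a 4-byte big-endian
// schema id, then the Avro-encoded record.
function splitConfluentFrame(buf) {
  if (buf.length < 5) {
    throw new RangeError('message shorter than the 5-byte header');
  }
  if (buf.readUInt8(0) !== 0) {
    throw new Error('unexpected magic byte: ' + buf.readUInt8(0));
  }
  return {
    schemaId: buf.readInt32BE(1), // bytes 1..4
    payload: buf.slice(5)         // everything after the header
  };
}

// Hypothetical frame: magic byte, schema id 42, then four body bytes
// (06 66 6f 6f is the Avro encoding of the string "foo").
var frame = Buffer.from([0x00, 0x00, 0x00, 0x00, 0x2a, 0x06, 0x66, 0x6f, 0x6f]);
var parts = splitConfluentFrame(frame);
console.log(parts.schemaId, parts.payload.length);
```

In the consumer from the question, parts.payload would take the place of val in type.fromBuffer(val).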

Related

NodeJS convert to Byte Array code return different results compare to python

I have the following JavaScript code and I need to convert it to Python (I'm not an expert in hashing, so apologies for my limited knowledge of the subject):
function generateAuthHeader(dataToSign) {
let apiSecretHash = new Buffer("Rbju7azu87qCTvZRWbtGqg==", 'base64');
let apiSecret = apiSecretHash.toString('ascii');
var hash = CryptoJS.HmacSHA256(dataToSign, apiSecret);
return hash.toString(CryptoJS.enc.Base64);
}
when I ran generateAuthHeader("abc") it returned +jgBeooUuFbhMirhh1KmQLQ8bV4EXjRorK3bR/oW37Q=
So I tried writing the following Python code:
def generate_auth_header(data_to_sign):
    api_secret_hash = bytearray(base64.b64decode("Rbju7azu87qCTvZRWbtGqg=="))
    hash = hmac.new(api_secret_hash, data_to_sign.encode(), digestmod=hashlib.sha256).digest()
    return base64.b64encode(hash).decode()
But when I ran generate_auth_header("abc") it returned a different result aOGo1XCa5LgT1CIR8C1a10UARvw2sqyzWWemCJBJ1ww=
Can someone tell me what is wrong with my Python code and what I need to change?
The base64 is the string I generated myself for this post
UPDATE:
this is the document I'm working with
//Converting the Rbju7azu87qCTvZRWbtGqg== (key) into byte array
//Converting the data_to_sign into byte array
//Generate the hmac signature
It seems that apiSecretHash and api_secret_hash are different, but I don't quite understand why, as I thought the equivalent of new Buffer() in NodeJS is bytearray() in Python.
It took me 2 days of searching and asking people in the Python Discord, and I finally got an answer. Let me summarize the problem:
The API secret ends up as different bytes in the two languages.
JavaScript
apiSecret = "E8nm,ns:\u0002NvQY;F*"
Python
api_secret_hash = b'E\xb8\xee\xed\xac\xee\xf3\xba\x82N\xf6QY\xbbF\xaa'
Once we replaced the key in the Python code with that string, it returned the same result:
def generate_auth_header(data_to_sign):
    api_secret_hash = "E8nm,ns:\u0002NvQY;F*".encode()
    hash = hmac.new(api_secret_hash, data_to_sign.encode(), digestmod=hashlib.sha256).digest()
    return base64.b64encode(hash).decode()
encoding for ASCII in node.js you can find here https://github.com/nodejs/node/blob/a2a32d8beef4d6db3a8c520572e8a23e0e51a2f8/src/string_bytes.cc#L636-L647
case ASCII:
if (contains_non_ascii(buf, buflen)) {
char* out = node::UncheckedMalloc(buflen);
if (out == nullptr) {
*error = node::ERR_MEMORY_ALLOCATION_FAILED(isolate);
return MaybeLocal<Value>();
}
force_ascii(buf, out, buflen);
return ExternOneByteString::New(isolate, out, buflen, error);
} else {
return ExternOneByteString::NewFromCopy(isolate, buf, buflen, error);
}
The force_ascii() function is called when the data contains non-ASCII characters; it is implemented here https://github.com/nodejs/node/blob/a2a32d8beef4d6db3a8c520572e8a23e0e51a2f8/src/string_bytes.cc#L531-L573
So the Python code needs to mask the key bytes the same way NodeJS does, which gives the final version of the Python code:
import base64
import hashlib
import hmac
def generate_auth_header(data_to_sign):
    # convert to bytearray so the for loop below can modify the values
    api_secret_hash = bytearray(base64.b64decode("Rbju7azu87qCTvZRWbtGqg=="))
    # "force" characters to be in ASCII range by clearing the top bit of each byte
    for i in range(len(api_secret_hash)):
        api_secret_hash[i] &= 0x7f
    hash = hmac.new(api_secret_hash, data_to_sign.encode(), digestmod=hashlib.sha256).digest()
    return base64.b64encode(hash).decode()
now it returned the same result as NodeJS one
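For comparison, here is a sketch of the same masking on the NodeJS side without CryptoJS. It assumes Node's built-in crypto module and Buffer.from in place of the deprecated new Buffer; the helper name maskedKey is hypothetical. It relies on the documented behaviour that decoding a Buffer with 'ascii' clears the highest bit of each byte.

```javascript
var crypto = require('crypto');

// Decode the base64 secret, then clear the top bit of every byte. This is
// the same transformation Buffer#toString('ascii') applies when decoding.
function maskedKey(base64Secret) {
  var key = Buffer.from(base64Secret, 'base64');
  for (var i = 0; i < key.length; i++) {
    key[i] &= 0x7f;
  }
  return key;
}

// HMAC-SHA256 over the data, keyed with the masked bytes, as base64.
function generateAuthHeader(dataToSign) {
  return crypto.createHmac('sha256', maskedKey('Rbju7azu87qCTvZRWbtGqg=='))
    .update(dataToSign, 'utf8')
    .digest('base64');
}

var key = maskedKey('Rbju7azu87qCTvZRWbtGqg==');
var viaAscii = Buffer.from('Rbju7azu87qCTvZRWbtGqg==', 'base64').toString('ascii');
console.log(key.toString('latin1') === viaAscii);
console.log(generateAuthHeader('abc'));
```

Because every masked byte is below 0x80, the key string that CryptoJS receives and the masked byte array are the same sequence of bytes, which is why the Python loop above reproduces the JavaScript result.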
Thank you Mark from the python discord for helping me understand and fix this!
I hope anyone converting byte-array code from JavaScript to Python in the future knows about this difference in how the NodeJS Buffer handles the 'ascii' encoding.

Browser read integer from binary string

I have for example here this string
'x���'
Which you may not be able to see, depending on the device you're using. That's the number 2024000250 encoded as a 32-bit signed big-endian integer, which I've generated using Node:
let buffer = new Buffer(4);
buffer.writeInt32BE(2024000250);
buffer.toString();
I'm receiving the 4 bytes in question on the client side but I can't seem to find how to turn them back into an integer...
I might be dead wrong here, but as far as I remember a Unicode character encoded as UTF-8 can take between 1 and 4 bytes. When you transfer your binary data as text to the client side you risk corrupting it, because the client is going to interpret the bytes as Unicode text.
If I were to convert that text to a blob on client side:
var b = new Blob(['x���'],{type:"application/octet-stream"});
b.size; //10
As you can see I receive 10 bytes, which is wrong, it should have been 4.
You can transfer the data directly as a binary string, since you are using Node, on the server side:
function createBuffer(v){
var b = new ArrayBuffer(4),
vw = new DataView(b);
vw.setInt32(0,v);
return b;
}
This will create your buffer. You cannot just send it to the client as it is, though; either represent it as JSON or directly as a binary string. To represent it as a binary string you don't need the above function; you could have done:
("0".repeat(32) + (2024000250).toString(2)).slice(-32); //"01111000101000111100101011111010"
If you want json, you can do:
function convertBuffToBinaryStr(buff){
var res = [],
l = buff.byteLength,
v = new DataView(buff);
for (var i = 0; i < l; ++i){
res.push(v.getUint8(i));
}
return JSON.stringify(res);
}
Now try seeing what this outputs:
convertBuffToBinaryStr(createBuffer(2024000250)); //"[120,163,202,250]"
Back on the client-side you have to interpret this:
function interpret(json){
json = JSON.parse(json);
return parseInt(json.map((d)=>("0".repeat(8) + d.toString(2)).slice(-8)).join(""),2);
}
Now try:
interpret("[120,163,202,250]"); //2024000250
Note: For your interpret function, you have to use a DataView to setUint8 each byte and then use getInt32 at the end, since you are using signed integers; the above won't work for all cases.
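A sketch of the DataView variant that the note describes might look like this. The name interpretSigned is hypothetical; it takes the same JSON byte array as interpret above.

```javascript
// Rebuild a signed 32-bit big-endian integer from a JSON array of 4 bytes.
function interpretSigned(json) {
  var bytes = JSON.parse(json);
  var view = new DataView(new ArrayBuffer(4));
  bytes.forEach(function (b, i) {
    view.setUint8(i, b);
  });
  return view.getInt32(0); // DataView reads big-endian unless told otherwise
}

console.log(interpretSigned('[120,163,202,250]')); // 2024000250
console.log(interpretSigned('[255,255,255,255]')); // -1
```

The parseInt-based version would return 4294967295 for the second input, because it never treats the top bit as a sign bit.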
Well I finally got around to getting this to work.
It is not quite what I started off with but I'm just gonna post this for some other lost souls.
It is worth mentioning that ibrahim's answer contains most of the necessary information but is trying to satisfy the XY problem which my question ended up being.
I just send my binary data, as binary
let buffer = new Buffer(4);
buffer.writeInt32BE(2024000250);
// websocket connection
connection.send(buffer);
Then in the browser
// message listener
let reader = new FileReader();
reader.addEventListener('loadend', () => {
let view = new DataView(reader.result);
// there goes the precious data
console.log(view.getInt32(0));
});
reader.readAsArrayBuffer(message.data);
In all honesty this tickles my gag reflex. Why am I using a file reader to get some data out of a binary message? There is a good chance a better way of doing this exists, if so please add to this answer.
Other methods I found are the fetch API which is no better than the file reader in terms of hack rating and Blob.prototype.arrayBuffer which is not yet supported fully.
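One alternative worth sketching: a browser WebSocket can deliver binary messages as an ArrayBuffer directly when its binaryType property is set to 'arraybuffer', which removes the need for a FileReader. In the sketch below, socket is a hypothetical WebSocket instance and readInt32 is a hypothetical helper; the wiring is shown in comments because it only runs in a browser with a live connection.

```javascript
// Read the first signed 32-bit big-endian integer out of an ArrayBuffer.
function readInt32(arrayBuffer) {
  return new DataView(arrayBuffer).getInt32(0);
}

// Browser-side wiring (not executed here; `socket` is hypothetical):
//   socket.binaryType = 'arraybuffer';
//   socket.addEventListener('message', function (event) {
//     console.log(readInt32(event.data));
//   });

// Stand-in for a received message, encoded the same way as on the server.
var sample = new ArrayBuffer(4);
new DataView(sample).setInt32(0, 2024000250);
console.log(readInt32(sample)); // 2024000250
```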

adding id3 tags html5 filesystem api

I have a scenario where I am building a podcast web application that lets users listen to and store .mp3 podcast files.
I am trying to implement a basic web interface where someone can add the entire ID3 tag on the client side. The file will be stored locally on the client; this client is not a regular user but the person who receives the raw podcast file, preferably without any ID3 tag. He then hosts this one page locally, adds the correct ID3 tags and copies the .mp3 files to a WebDAV folder.
I do understand that edits normally need to be done on the server, but it would be really helpful if it could all be done locally in the browser.
Of course there is no ready-made library to edit files, so I decided to use the HTML5 FileSystem API, i.e. drop the file into the virtual file system, edit it there and then copy it back to the local system (for copying there is a ready-made library, FileSaver.js).
I have been able to do the following:
1) associate the mp3 file dropped at a drop zone to filesystem api using webkitGetAsEntry
2) copy this file then to the file system api.
part of the code looks like:
function onDrop(e)
{
e.preventDefault();
e.stopPropagation();
var items = e.dataTransfer.items;
var files = e.dataTransfer.files;
for (var i = 0, item; item = items[i]; ++i)
{
// Skip this one if we didn't get a file.
if (item.kind != 'file') {
continue;
}
var entry = item.webkitGetAsEntry();
if (entry.isFile)
{
// Copy the dropped entry into local filesystem.
entry.copyTo(cwd, null, function(copiedEntry) {
//setLoadingTxt({txt: DONE_MSG});
renderMp3Writer(entry);
My confusion is how to add the entire ID3 tag. I am lost at this point, as I am not sure about:
1) Can we add the entire ID3 tag to the file using the fileWriter method?
2) If yes, would this be a binary edit, and how would it be done?
Any help would be useful. I tried the below, but I am guessing it is wrong.
var blob1 = new Blob(['ID3hTIT2ga'], {type: 'audio/mp3'});
fileWriter.write(blob1);
You need to build an ID3 buffer, then create a buffer large enough to hold both the ID3 tag and the MP3 file, insert the ID3 tag and append the MP3 data.
For this you need the ID3 specification, and you use typed arrays with a DataView to build your array.
The overall ID3 structure is defined like this (see link above):
+-----------------------------+
|      Header (10 bytes)      |
+-----------------------------+
|       Extended Header       |
| (variable length, OPTIONAL) |
+-----------------------------+
|   Frames (variable length)  |
+-----------------------------+
|           Padding           |
| (variable length, OPTIONAL) |
+-----------------------------+
| Footer (10 bytes, OPTIONAL) |
+-----------------------------+
At this point the buffer length is unknown, so you need to do this in steps. There are several ways to do it: you can build up small buffer segments for each field and then concatenate them into a single buffer, or you can make a larger buffer that you know can hold all the fields you want to include and copy the used part of that buffer to the final one.
The latter tends to be simpler, and as we're dealing with very small sizes it could be the best way (considering that each fragment in the first approach has its own overhead).
So the first thing you need to do is to define the header. The header is defined this way:
ID3v2/file identifier      "ID3"
ID3v2 version              $04 00
ID3v2 flags                %abcd0000      (note: bit-representation)
ID3v2 size             4 * %0xxxxxxx      (note: bit-representation/mask)
The ID3 identifier and version are fixed values (other versions exist of course, but let's follow the current one).
You can probably ignore most of the flags, if not all, by setting them to 0. But check the docs for your use case, for example if you want to use extended headers.
Size is defined:
The ID3v2 tag size is stored as a 32 bit synchsafe integer (section
6.2), making a total of 28 effective bits (representing up to 256MB).
The ID3v2 tag size is the sum of the byte length of the extended
header, the padding and the frames after unsynchronisation. If a
footer is present this equals to ('total size' - 20) bytes, otherwise
('total size' - 10) bytes.
An example how you can build your buffer. First define a buffer big enough to hold all the data as well as a DataView:
var id3buffer = new ArrayBuffer(1024), // 1kb "space"
view = new DataView(id3buffer);
The DataView defaults to big-endian which is perfect, so all we need to do now is to fill in the data where it should be. We can make a few helper methods to help us move position at the same time as we write. Positions for DataView are byte-bound:
var pos = 0; // global start position
function setU8(value) {
view.setUint8(pos++, value)
}
function setU16(value) {
view.setUint16(pos, value);
pos += 2;
}
function setU32(value) {
view.setUint32(pos, value);
pos += 4;
}
And so on. You can also make helpers to write Unicode text strings (see TextEncoder, for example) and so forth.
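As a sketch of such a text helper, the following writes a string as UTF-8 using TextEncoder and advances the position in the same style as the setU8 helpers. The name setText and the 64-byte buffer are hypothetical, and the block sets up its own buffer and pos so that it can run on its own.

```javascript
var buffer = new ArrayBuffer(64); // scratch space for this example only
var bytes = new Uint8Array(buffer);
var pos = 0;                      // current write position

// Write a string as UTF-8 at the current position and advance past it.
// Returns the number of bytes written, which can differ from text.length.
function setText(text) {
  var encoded = new TextEncoder().encode(text);
  bytes.set(encoded, pos);
  pos += encoded.length;
  return encoded.length;
}

var first = setText('ID3');        // 3 bytes: 0x49 0x44 0x33
var second = setText('\u00dcber'); // 5 bytes: U+00DC takes 2 bytes in UTF-8
console.log(first, second, pos);   // 3 5 8
```

Returning the byte count is a design choice here: frame sizes in the tag are expressed in bytes, not characters, so the caller usually needs that number.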
To define the header, we can write in the "magic" word ID3. You could convert a string, or since it's only 3 bytes also just write it straight-forward. ID3 = 0x494433 in hex so:
setU8(0x49); // at pos 0
setU8(0x44); // at pos 1
setU8(0x33); // at pos 2
Since we made a wrapper we don't need to worry about the buffer position.
Then write the version (according to the spec, v2.4.0 is stored as 0x0400; the leading major version 2 is not included):
setU16(0x0400); // default is big-endian so this works
Now you can continue with flags and size (see specs).
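A possible sketch of the complete 10-byte header, including flags and size, follows. It assumes no flags are set and no footer is present, and it writes the size as four bytes that each carry 7 bits of the value with the top bit left at 0, as the %0xxxxxxx mask in the header definition requires. The function name writeId3Header and its parameters are hypothetical.

```javascript
// Write a 10-byte ID3v2.4 header into `view`, starting at offset 0.
// `tagSize` is the byte length of everything after the header (frames plus
// padding, no footer) and must fit in 28 bits.
function writeId3Header(view, tagSize) {
  view.setUint8(0, 0x49);    // 'I'
  view.setUint8(1, 0x44);    // 'D'
  view.setUint8(2, 0x33);    // '3'
  view.setUint16(3, 0x0400); // version 2.4.0, big-endian
  view.setUint8(5, 0);       // flags: none set
  // Size: 7 bits per byte, most significant group first, top bit always 0.
  view.setUint8(6, (tagSize >>> 21) & 0x7f);
  view.setUint8(7, (tagSize >>> 14) & 0x7f);
  view.setUint8(8, (tagSize >>> 7) & 0x7f);
  view.setUint8(9, tagSize & 0x7f);
  return 10; // next free position after the header
}

var header = new DataView(new ArrayBuffer(10));
var next = writeId3Header(header, 257); // size bytes become 00 00 02 01
```

Since the size is only known once the frames have been written, one option is to reserve these 10 bytes first, write the frames, and fill the header in last.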
When the ID3 tag has been written in full, pos holds its total length. So make a new buffer large enough for the ID3 tag plus the MP3 data:
var mp3 = new ArrayBuffer(pos + mp3Buffer.byteLength),
view8 = new Uint8Array(mp3);
The view8 view will allow us to do a simple copy to destination:
// create a segment from the tag buffer that will fit target:
var segment = new Uint8Array(view.buffer, 0, pos); // pos = number of tag bytes written
view8.set(segment, 0);
view8.set(new Uint8Array(mp3Buffer), pos); // wrap the ArrayBuffer so its bytes are copied
If everything went OK you now have an MP3 with an ID3 tag (remember to check for existing ID3 tags first - you need to scan to the end of the file).
You can now send the ArrayBuffer to the server, convert it to a Blob for IndexedDB, or create an Object-URL if you want to present a link for download (none shown here, as the answer is becoming out-of-scope).
This should be enough to get you started. As said, you need to study the specs. If you're not familiar with typed arrays, check those out as well.
Also see the site for other resources (frames etc.).
Sync-safe values
MP3 files use frames that start with a sync word of 11 bits, all set to 1. If the size field of the header happened to contain 11 consecutive bits set to 1, a decoder could mistakenly interpret it as sound data. To avoid this, sync-safe integers are used: the MSB (most significant bit, bit 7) of every byte is always 0, so each byte carries only 7 bits of the value. The bits that would have landed in bit 7 are shifted left into the next byte; for the ID3 tag size this is done across 4 bytes (hence the 4 * %0xxxxxxx).
Here is how to encode and decode sync-safe integers using JavaScript (adapted from the C/C++ source on Wikipedia):
// test values
var value = 0xfffffff,
sync = intToSyncsafe(value);
document.write("<pre>Original size: 0x" + value.toString(16) + "<br>");
document.write("Synch-safe : 0x" + sync.toString(16) + "<br>");
document.write("Decoded value: 0x" + syncsafeToInt(sync).toString(16) + "</pre>");
function intToSyncsafe(value) {
var out, mask = 0x7f;
while(mask ^ 0x7fffffff) {
out = value & ~mask;
out <<= 1;
out |= value & mask;
mask = ((mask + 1) << 8) - 1;
value = out;
}
return out
}
function syncsafeToInt(value) {
var out = 0, mask = 0x7F000000;
while (mask) {
out >>= 1;
out |= value & mask;
mask >>= 8;
}
return out;
}
The sync-safe value has the bit pattern %01111111011111110111111101111111 for the example value used in the demo above.

Conversion Error : ArrayBuffer to Int16Array

This should be straightforward, but I am not sure why I am getting the error. I am using the constructor with an ArrayBuffer as the parameter, as shown on MDN, but I get an invalid arguments error. (P.S. I have checked with a DataView that the data is Int16 only.)
The code is:
var view = new DataView(arrayBuf);
console.log('arrayBuf.byteLength : '+arrayBuf.byteLength);
console.log('data at 0 : '+view.getInt16(0));
console.log('data at 1 : '+view.getInt16(1));
var int16arry = new Int16Array(arrayBuf);
the console output is:
"arrayBuf.byteLength : 117"
"data at 0 : 22720"
"data at 1 : -16315"
Error: invalid arguments
what is my mistake?
The short answer is that your ArrayBuffer has the wrong size: an Int16Array needs a byte length that is a multiple of 2, and yours is 117 bytes. You can use:
var int16Array = new Int16Array(arrayBuf, 0, Math.floor(arrayBuf.byteLength / 2));
to hack away the problem.
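A small sketch of the failure and the workaround, using an empty 117-byte buffer as a stand-in for the real data:

```javascript
var odd = new ArrayBuffer(117); // stand-in for the 117-byte buffer in the question
var threw = false;
try {
  new Int16Array(odd); // byte length must be a multiple of 2
} catch (e) {
  threw = e instanceof RangeError;
}

// Passing an explicit element count makes the view ignore the trailing byte.
var usable = new Int16Array(odd, 0, Math.floor(odd.byteLength / 2));
console.log(threw, usable.length); // true 58
```

Note that typed arrays read values in the platform's byte order (little-endian on most machines), whereas DataView.getInt16 defaults to big-endian, so the numbers in the Int16Array may not match the ones printed through the DataView.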
Case specific comment:
I have tried reading the source for your library, but I am unable to see why you are getting that extra byte (or what is missing).
The data you are getting is supposed to be 16-bit ints, but for some reason you have other data in there that takes up an odd number of bytes. As far as I can tell from the source, there should be some doubles (JavaScript floats) in there as well, meaning that "hacking" away the problem might not work.

parsing out data segment from a TCP stream

I'm having some trouble receiving packet data from a TCP stream. I think this is due in part to not understanding the server's responses.
My code (Objective-C):
unsigned type=0;
unsigned bufferFirstByte=0;
unsigned bufferSecondByte=0;
unsigned bufferThirdByte=0;
NSScanner *hexToInt = [NSScanner scannerWithString:[mutableBuffer objectAtIndex:0]];
[hexToInt scanHexInt:&bufferFirstByte];
hexToInt = [NSScanner scannerWithString:[mutableBuffer objectAtIndex:1]];
[hexToInt scanHexInt:&bufferSecondByte];
hexToInt = [NSScanner scannerWithString:[mutableBuffer objectAtIndex:2]];
[hexToInt scanHexInt:&bufferThirdByte];
hexToInt = [NSScanner scannerWithString:[mutableBuffer objectAtIndex:0]];
[hexToInt scanHexInt:&type];
int len = (bufferSecondByte<<8)+bufferSecondByte;
if (![mutableBuffer count]<(3+len)) {
NSArray *payload = [mutableBuffer subarrayWithRange:NSMakeRange(2,([mutableBuffer count] - 2))];
NSLog(@"length %d",len);
[self processReceive:type length:len payload:payload];
}
It is somewhat modelled on this JavaScript code:
self.receive = function (itemName, data) {
self.log("Receiving: " + self.toHex(data));
self.ourData += data;
while (self.ourData.length >= 3) {
var type = self.ourData.charCodeAt(0);
var len = (self.ourData.charCodeAt(1) << 8) + self.ourData.charCodeAt(2);
if (self.ourData.length < (3 + len)) { // sanity check: buffer doesn't contain all the data advertised in the packet
break;
}
var payload = self.ourData.substr(3,len);
self.ourData = self.ourData.substr(3 + len);
self.processMessage(type, len, payload); // process payload
}
};
The reason for the modelling is that the command fusion JavaScript project is talking to the same server I am (a Crestron controller).
However, I could never get the len calculation to work, and I think that's what is causing my problem. When looking at a sample packet (05:00:06:00:00:03:00:52:00) the len would equal 1280 (see math above) even though the data portion is only 9 bytes.
Currently my code works, but it misses certain data. This happens because of the way TCP streams data (some packets are conjoined while others are fragmented). Without knowing the data segment size I cannot fix the issue, and I believe the answer to that is the len variable, but I don't see how to implement it properly.
My question comes down to this: how can I determine the size of the data segment from this len variable, or control my receive method so that it only accepts one data segment at a time (which from my research is not possible, since TCP is designed as a stream)?
I have a feeling there will be questions, so I'm going to attempt to answer a few of them here.
A. How do you come up with 1280? Look at the math in the method: (self.ourData.charCodeAt(1) << 8) + self.ourData.charCodeAt(2) gives (5<<8)+0 = 1280 (decimal).
B. Why are you using different indexes?
You will notice that the indexes for the data (payload, len, type) differ. This is merely because they hold their payload/data bytes as a string and mine is an array. In the end it is the same data being referenced.
Use the following logic:
1) Do you have enough bytes to determine the length? If not, receive some more bytes and try again.
2) Do you have enough bytes to have one complete unit? If not, receive some more bytes and try again.
3) Extract one complete unit using the decoded length from step 1 and process it.
4) If we have no leftover bytes, we are done.
5) Return to step 1 to process the leftover bytes.
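The steps above can be sketched as a small accumulator. This assumes the framing from the question's JavaScript model: one type byte, then a 2-byte big-endian payload length, then the payload. The names createFramer and onUnit are hypothetical.

```javascript
// Accumulates TCP chunks and emits each complete unit as (type, payload).
// Assumed framing: 1 type byte, 2-byte big-endian payload length, payload.
function createFramer(onUnit) {
  var pending = Buffer.alloc(0);
  return function receive(chunk) {
    pending = Buffer.concat([pending, chunk]);
    while (pending.length >= 3) {                    // step 1: header complete?
      var len = pending.readUInt16BE(1);
      if (pending.length < 3 + len) break;           // step 2: wait for more bytes
      onUnit(pending[0], pending.slice(3, 3 + len)); // step 3: one full unit
      pending = pending.slice(3 + len);              // steps 4-5: keep leftovers
    }
  };
}

var units = [];
var receive = createFramer(function (type, payload) {
  units.push({ type: type, payload: payload });
});

// The sample packet from the question, arriving split across two chunks.
receive(Buffer.from([0x05, 0x00, 0x06, 0x00]));
receive(Buffer.from([0x00, 0x03, 0x00, 0x52, 0x00]));
console.log(units.length, units[0].type, units[0].payload.length); // 1 5 6
```

The same loop also handles the conjoined case: if one chunk contains several units, the while loop emits each of them before returning.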
OK, so I got some help from (this group), which you might not be able to see without a login for the group. In any case, there is a 3-byte header. So my len, which is 6 and not 1280 like I thought, is actually 9 once the 3 is added for the header. This gets me the value I was looking for (9), since the data segment is 9 bytes.
Thanks for the suggestions David, one up for some good basic knowledge.
