String of Binary to Buffer in Node.js - javascript

I'm trying to convert a string of 0s and 1s into the equivalent Buffer by parsing the character stream as a UTF-16 encoding.
For example:
var binary = "01010101010101000100010"
The result of that would be the following Buffer
<Buffer 55 54>
Please note that Buffer.from(string, "binary") is not valid here, as it creates a buffer where each individual 0 or 1 is parsed as its own Latin-1 one-byte encoded character. From the Node.js documentation:
'latin1': A way of encoding the Buffer into a one-byte encoded string (as defined by the IANA in RFC 1345, page 63, to be the Latin-1 supplement block and C0/C1 control codes).
'binary': Alias for 'latin1'.
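To illustrate, a quick sketch of what the 'binary'/'latin1' encoding actually does with such a string (each character becomes its own byte):
// Each '0'/'1' character is stored as its own byte (0x30/0x31),
// which is not the bit-packing we want:
console.log(Buffer.from('0101', 'binary')); // <Buffer 30 31 30 31>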

Use "".match to find all the groups of 16 bits.
Convert the binary string to number using parseInt
Create an Uint16Array and convert it to a Buffer
Tested on node 10.x
function binaryStringToBuffer(string) {
  // Collect every complete group of 16 bits; trailing bits that do not
  // fill a full group (here the last 7 characters) are dropped.
  const groups = string.match(/[01]{16}/g);
  // Parse each group as a base-2 number.
  const numbers = groups.map(binary => parseInt(binary, 2));
  // View the Uint16Array's underlying bytes as a Buffer. Note that the
  // byte order follows the machine's endianness.
  return Buffer.from(new Uint16Array(numbers).buffer);
}
console.log(binaryStringToBuffer("01010101010101000100010"));
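Note that the expected output <Buffer 55 54> is big-endian, while the Uint16Array trick uses the machine's native byte order (little-endian on most platforms), so it may print <Buffer 54 55>. If a fixed byte order matters, here is a sketch that writes each group explicitly with writeUInt16BE:
function binaryStringToBufferBE(string) {
  const groups = string.match(/[01]{16}/g) || [];
  const buf = Buffer.alloc(groups.length * 2);
  groups.forEach((bits, i) => {
    // writeUInt16BE stores each 16-bit value big-endian,
    // so "0101010101010100" (0x5554) becomes the bytes 55 54.
    buf.writeUInt16BE(parseInt(bits, 2), i * 2);
  });
  return buf;
}
console.log(binaryStringToBufferBE("01010101010101000100010")); // <Buffer 55 54>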

Related

How to convert a hex binary string to Uint8Array

I have this string of bytes represented in hex:
const s = "\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff\x8bV23J15O4\xb14\xb1H61417KKLL\xb50L5U\x8a\x05\x00\xf6\xaa\x8e.\x1c\x00\x00\x00"
I would like to convert it to Uint8Array in order to further manipulate it.
How can it be done?
Update:
The binary string is coming from a Python backend. In Python I can create this representation correctly:
encoded = base64.b64encode(b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff\x8bV23J15O4\xb14\xb1H61417KKLL\xb50L5U\x8a\x05\x00\xf6\xaa\x8e.\x1c\x00\x00\x00')
Since JavaScript strings support \x escapes, this should work to convert a Python byte string to a Uint8Array:
const s = "\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff\x8bV23J15O4\xb14\xb1H61417KKLL\xb50L5U\x8a\x05\x00\xf6\xaa\x8e.\x1c\x00\x00\x00";
const array = Uint8Array.from([...s].map(v => v.charCodeAt(0)));
console.log(array);
In Node.js, one uses Buffer.from to convert a string into a Buffer.
If the original argument is a base64-encoded string, as in the Python example:
const buffer = Buffer.from(encodedString, 'base64');
If it's a UTF-8 encoded string:
const buffer = Buffer.from(encodedString);
Buffers are instances of Uint8Array, so they can be used wherever a Uint8Array is expected. Quoting from the docs:
The Buffer class is a subclass of JavaScript's Uint8Array class and extends it with methods that cover additional use cases. Node.js APIs accept plain Uint8Arrays wherever Buffers are supported as well.
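A minimal round-trip sketch (the base64 value here is a stand-in example, not the one produced by the Python snippet above):
const encodedString = 'SGVsbG8='; // base64 for "Hello" (example value)
const buffer = Buffer.from(encodedString, 'base64');
console.log(buffer);                       // <Buffer 48 65 6c 6c 6f>
console.log(buffer instanceof Uint8Array); // true: Buffer subclasses Uint8Array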
const s = "\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff\x8bV23J15O4\xb14\xb1H61417KKLL\xb50L5U\x8a\x05\x00\xf6\xaa\x8e.\x1c\x00\x00\x00"
// btoa encodes a binary string as base64 ASCII (atob does the reverse)
let str = btoa(s)
let encoder = new TextEncoder()
let typedarr = encoder.encode(str) // encode() returns a Uint8Array
console.log(typedarr)
Note that this produces the bytes of the base64 text itself, not the decoded binary data; to recover the original bytes, use the charCodeAt mapping shown above.

Why is the output the same when I use different encodings on Node.js using fs.readFileSync()?

I am trying to understand why, when I use the readFileSync method with different encodings (for example utf-8, hex, ascii), I get the same output on the console, and why, when I don't pass any specific encoding, I receive the output in utf-8 format.
I mean, shouldn't I receive the file's contents in the file's own format (in this case .sol) if I don't specify any encoding, and in utf-8 format only if I specify utf-8?
I think there is something I am not understanding about how encoding works.
const path = require('path');
const fs = require('fs');
const solc = require('solc');
const inboxPath = path.resolve(__dirname, 'contracts', 'Inbox.sol');
const source = fs.readFileSync(inboxPath, 'utf8');
console.log(solc.compile(source, 1));
When you call it like this:
const source = fs.readFileSync(inboxPath, 'utf8');
You are passing the utf8 encoding to the function. It will read the data from the file and apply the encoding to the data to convert it to a string.
If you call it like this, with no encoding:
const source = fs.readFileSync(inboxPath);
It will give you the raw binary data in a Buffer object and you will get something that (if you console.log(source)) will look like this:
<Buffer 54 65 73 74 69 6e 67 20 4e 6f 64 65 2e 6a 73 20 72 65 61 64 46 69 6c 65 28 29>
That display of the Buffer data shows one hexadecimal value for each 8 bits of the binary data (for our viewing convenience). The 54 at the start of the buffer corresponds to the letter T, so if you converted that buffer to a string with either the utf8 or ascii encoding, you would get a string that starts with T.
If your data is made up entirely of characters with character codes below 128, then interpreting it with the utf8 and ascii encodings gives identical results. This is because for codes below 128, utf8 uses the character's code directly as a single byte. Only for codes of 128 and above does utf8 use more than one byte per character (1-4 bytes, depending on the actual code). There are 1,112,064 code points in Unicode, and since a single byte can represent only 256 unique values, it obviously takes more than one byte to represent them all; utf8 is the variable-length encoding that handles this.
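A quick sketch illustrating the point: for pure-ASCII text the encodings agree byte for byte, but a character above 127 takes several utf8 bytes.
console.log(Buffer.from('T', 'utf8'));      // <Buffer 54>
console.log(Buffer.from('T', 'ascii'));     // <Buffer 54>, identical
console.log(Buffer.from('\u20ac', 'utf8')); // Euro sign: <Buffer e2 82 ac>, three bytes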
Your function call here:
console.log(solc.compile(source, 1));
is apparently expecting a string in the source argument, so you HAVE to give it a string. If you don't pass an encoding, like this:
const source = fs.readFileSync(inboxPath);
Then, source is a Buffer object (not a string) and the solc.compile(source, 1) function does not like that and gives you an error. You apparently need to pass that function a string. So, the code you show in your question:
const inboxPath = path.resolve(__dirname, 'contracts', 'Inbox.sol');
const source = fs.readFileSync(inboxPath, 'utf8');
console.log(solc.compile(source, 1));
is properly getting a string from fs.readFileSync() and then passing that string to solc.compile().
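To see the difference directly, a small sketch (assuming Inbox.sol exists at the resolved path):
const raw = fs.readFileSync(inboxPath);          // no encoding -> Buffer
const text = fs.readFileSync(inboxPath, 'utf8'); // encoding    -> string
console.log(Buffer.isBuffer(raw));          // true
console.log(typeof text);                   // 'string'
console.log(raw.toString('utf8') === text); // true: same bytes, decoded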

Why can String.prototype.charCodeAt() convert a binary string into a Uint8Array?

Suppose I have a base64 encoded string and I want to convert it into an ArrayBuffer, I can do it in this way:
// base64 decode the string to get the binary data
const binaryString = window.atob(base64EncodedString);
// convert from a binary string to an ArrayBuffer
const buf = new ArrayBuffer(binaryString.length);
const bufView = new Uint8Array(buf);
for (let i = 0, strLen = binaryString.length; i < strLen; i++) {
  bufView[i] = binaryString.charCodeAt(i);
}
// get ArrayBuffer: `buf`
From String.prototype.charCodeAt(), it will return an integer between 0 and 65535 representing the UTF-16 code unit at the given index. But a Uint8Array's value range is [0, 255].
I was initially thinking that the code unit we obtain from charCodeAt() could go out of the bounds of the Uint8Array range. Then I checked the built-in atob() function, which returns a binary string containing the decoded data. Each character of that string has a code in the range 0 to 255, which fits within the range of Uint8Array, and that's why we are safe to use charCodeAt() in this case.
That's my understanding. I'm not sure if I interpret this correctly. Thanks for your help!
So it looks like my understanding is correct.
Thanks to @Konrad, and here is his add-up:
charCodeAt is designed to support UTF-16. And UTF-16 was designed to be backward compatible with Latin-1 (and ASCII), so the first 256 code points have exactly the same values as in Latin-1 encoding.
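A small sketch confirming this: every character atob() yields has a code in 0-255, so it fits in a Uint8Array without truncation.
const binaryString = atob('SGVsbG8='); // decodes to "Hello" (example value)
const codes = [...binaryString].map(c => c.charCodeAt(0));
console.log(codes);                  // [72, 101, 108, 108, 111], all < 256
console.log(Uint8Array.from(codes)); // Uint8Array(5) [72, 101, 108, 108, 111]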

Conversion from buffer of bytes to string then back to bytes in NodeJS / Javascript

Like the title states, I am just trying to encode some bytes in a string, then decode them back to bytes. The conversion of a Uint8Array of bytes to a string and then back to an array does not happen correctly. I am just wondering what encoding I should use to make the round trip work.
I try this as a dummy example:
var bytes = serializeToBinary(); // giving me bytes
console.log('bytes type:'+ Object.prototype.toString.call(bytes));
console.log('bytes length:'+ bytes.length);
var bytesStr = bytes.toString('base64'); // gives me a string that looks like '45,80,114,98,97,68,111'
console.log('bytesStr length:'+ bytesStr.length);
console.log('bytesStr type:'+ Object.prototype.toString.call(bytesStr));
var decodedbytesStr = Buffer.from(bytesStr, 'base64');
console.log('decodedbytesStr type:'+ Object.prototype.toString.call(decodedbytesStr));
console.log('decodedbytesStr length:'+ decodedbytesStr.length);
Output:
bytes type:[object Uint8Array]
bytes length:4235
bytesStr type:[object String]
bytesStr length:14161
decodedbytesStr type:[object Uint8Array]
decodedbytesStr length:7445
Shouldn't decodedbytesStr length and bytes length be the same?
TypedArray does not support .toString('base64'). The base64 argument is ignored, and you simply get a string representation of the array's values, separated by commas. This is not a base64 string, so Buffer.from(bytesStr, 'base64') is not processing it correctly.
You want to call .toString('base64') on a Buffer instead. When creating bytesStr, simply build a Buffer from your Uint8Array first:
var bytesStr = Buffer.from(bytes).toString('base64');
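With that change the round trip is lossless; a short sketch with made-up example bytes:
const bytes = Uint8Array.from([1, 2, 3, 250, 255]); // example data
const bytesStr = Buffer.from(bytes).toString('base64');
const decoded = Buffer.from(bytesStr, 'base64');
console.log(bytesStr);                        // "AQID+v8="
console.log(decoded.length === bytes.length); // true: lengths now match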

javascript hex codes with TCP/IP communication

I'm using the node module 'net' to create a client application that sends data through a TCP socket. The server-side application accepts a message if it starts and ends with a correct hex code; for example, the data packet would start with a hex "0F" and end with a hex "0F1C". How would I create these hex codes with JavaScript? Here's some JavaScript I've used to convert a UTF-8 string into a hex code, but I'm not sure this is what I need, as I don't have much experience with TCP/IP socket connections. Does anyone have experience with TCP/IP transfers and/or JavaScript hex codes?
function toHex(str, hex) {
  try {
    hex = unescape(encodeURIComponent(str))
      .split('')
      .map(function (v) {
        return v.charCodeAt(0).toString(16);
      })
      .join('');
  } catch (e) {
    hex = str;
    console.log('invalid text input: ' + str);
  }
  return hex;
}
First of all, you do not need to convert your data string into hex values in order to send it over TCP. Every string in Node.js is converted to bytes when sent over the network.
Normally, you'd send over a string like so:
var data = "ABC";
socket.write(data); // will send bytes 65 66 67, or in hex: 41 42 43
Node.JS also allows you to pass Buffer objects to functions like .write().
So, probably the easiest way to achieve what you wish, is to create an appropriate buffer to hold your data.
var data = "ABC";
var prefix = 0x0F; // JavaScript allows hex numbers.
var suffix = 0x0FC1;
var dataSize = Buffer.byteLength(data);
// compute the required buffer length
var bufferSize = 1 + dataSize + 2;
var buffer = new Buffer(bufferSize);
// store first byte on index 0;
buffer.writeUInt8(prefix, 0);
// store string starting at index 1;
buffer.write(data, 1, dataSize);
// stores last two bytes, in big endian format for TCP/IP.
buffer.writeUInt16BE(suffix, bufferSize - 2);
socket.write(buffer);
Explanation:
The prefix hex value 0F requires 1 byte of space. The suffix hex value 0F1C actually requires two bytes (a 16-bit integer).
When computing the number of required bytes for a string (JavaScript strings are UTF-16 encoded!), str.length is not accurate most of the time, especially when your string has non-ASCII characters in it. The proper way of getting the byte size of a string is to use Buffer.byteLength().
Buffers in node.js have static allocations, meaning you can't resize them after you create them. Hence, you'll need to compute the size of the buffer -in bytes- before creating it. Looking at our data, that is 1 (for our prefix) + Buffer.byteLength(data) (for our data) + 2 (for our suffix).
After that -imagine buffers as arrays of bytes (8-bit values)-, we'll populate the buffer, like so:
write the first byte (the prefix) using writeUInt8(byte, offset) with offset 0 in our buffer.
write the data string, using .write(string[, offset[, length]][, encoding]), starting at offset 1 in our buffer, and length dataSize.
write the last two bytes, using .writeUInt16BE(value, offset) with offset bufferSize - 2. We're using writeUInt16BE to write the 16-bit value in big-endian encoding, which is what you'd need for TCP/IP.
Once we've filled our buffer with the correct data, we can send it over the network, using socket.write(buffer);
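A quick sanity check of the framing (no socket needed): with data "ABC", the buffer should contain the prefix byte, the three ASCII bytes, and the big-endian suffix.
console.log(buffer);                 // <Buffer 0f 41 42 43 0f 1c>
console.log(buffer.toString('hex')); // "0f4142430f1c"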
Additional tip:
If you really want to convert a large string to bytes, (e.g. to later print as hex), then Buffer is also great:
var buf = Buffer.from('a very large string');
// now you have a byte representation of the string.
Since bytes are all 0-255 decimal values, you can easily print them as hex values in console, like so:
for (let i = 0; i < buf.length; i++) {
  const byte = buf[i];
  const hexChar = byte.toString(16); // convert the decimal `byte` value to a hex string;
  // do something with hexChar, e.g. console.log(hexChar);
}
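One wrinkle worth noting: toString(16) drops leading zeros, so a byte value like 10 prints as "a" rather than "0a". A padded variant, if fixed two-digit output is wanted:
const padded = [...buf].map(b => b.toString(16).padStart(2, '0')).join(' ');
console.log(padded); // e.g. "61 20 76 65 72 79 ..."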
