I am working on a function that iterates over PCM data. I am getting chunks of data of varying size and I am currently handling this by buffer concatenation. Problem is, I am quite sure that this approach is a performance killer.
One of the simplest algorithms takes 500 chunks of 4800 bytes each (each chunk is a "grain") and repeats each of them 3 times, like this:
buf = <grain1, grain1, grain1, ..., grain500, grain500, grain500>
function(){
  // ...
  let buf = Buffer.alloc(0) // returned buffer, mutated
  // nGrains is defined somewhere else in the function
  // example: nGrains = 500
  for(let i=0;i<nGrains;i++){
    // a chunk of PCM data
    // example: grain.byteLength = 4800
    const grain = Buffer.from(this._getGrain())
    // example: nRepeats = 3
    for(let j=0;j<nRepeats;j++)
      buf = Buffer.concat([buf, grain])
  }
  return buf
}
I feel like these performance-heavy operations (1500 mutating concatenations) could be avoided if there were some way to write "raw data" directly at a given offset into a pre-allocated buffer. I made the following helper function, which gave me HUGE performance improvements, but I feel like I am doing something wrong...
function writeRaw(buf, rawBytes, offset) {
  // copy rawBytes into buf one byte at a time, starting at offset
  for (let i = 0; i < rawBytes.byteLength; i++) {
    buf.writeUInt8(rawBytes.readUInt8(i), offset + i)
  }
  return buf
}
My function now looks like this:
function(){
  // ...
  const buf = Buffer.alloc(len) // returned buffer; the binding is const, its contents are filled in place
  for(let i=0;i<nGrains;i++){
    const grain = Buffer.from(this._getGrain())
    for(let j=0;j<nRepeats;j++)
      writeRaw(buf, grain, (i * nRepeats + j) * grainSize)
  }
  return buf
}
My question is: is there a cleaner (or more standard) way to do this instead of iterating over bytes? Buffer.write only seems to work for strings, although something like it would be ideal...
There is Buffer.copy.
const buf = Buffer.alloc(len);
for(let i = 0; i < nGrains; i++){
  const grain = Buffer.from(this._getGrain());
  for(let j=0;j<nRepeats;j++)
    grain.copy(/*to*/ buf, /*at*/ (i * nRepeats + j) * grainSize);
}
You could also use buf.fill(), which repeats a Buffer value over the whole range from start to end:
const buf = Buffer.alloc(len);
for(let i = 0; i < nGrains; i++) {
  const grain = Buffer.from(this._getGrain());
  buf.fill(grain, i * nRepeats * grainSize, (i + 1) * nRepeats * grainSize);
}
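A self-contained sketch that checks the approaches against each other in Node. getGrain(i, grainSize) is a hypothetical stand-in for this._getGrain() that fills grain i with the byte value i % 256, and the sizes are scaled down so the slow concatenation version finishes quickly. Uint8Array.prototype.set(), which Buffer inherits, is included as a further option that writes a whole array at an offset:

```javascript
// Hypothetical stand-in for this._getGrain(): grain i is filled with the byte value i % 256.
function getGrain(i, grainSize) {
  return Buffer.alloc(grainSize, i % 256);
}

// Original approach: repeated Buffer.concat (copies the whole result every time).
function withConcat(nGrains, nRepeats, grainSize) {
  let buf = Buffer.alloc(0);
  for (let i = 0; i < nGrains; i++) {
    const grain = getGrain(i, grainSize);
    for (let j = 0; j < nRepeats; j++) buf = Buffer.concat([buf, grain]);
  }
  return buf;
}

// Pre-allocate once, then copy each grain to its offset.
function withCopy(nGrains, nRepeats, grainSize) {
  const buf = Buffer.alloc(nGrains * nRepeats * grainSize);
  for (let i = 0; i < nGrains; i++) {
    const grain = getGrain(i, grainSize);
    for (let j = 0; j < nRepeats; j++) grain.copy(buf, (i * nRepeats + j) * grainSize);
  }
  return buf;
}

// Pre-allocate once; fill() repeats the grain across the [start, end) range.
function withFill(nGrains, nRepeats, grainSize) {
  const buf = Buffer.alloc(nGrains * nRepeats * grainSize);
  for (let i = 0; i < nGrains; i++) {
    const grain = getGrain(i, grainSize);
    buf.fill(grain, i * nRepeats * grainSize, (i + 1) * nRepeats * grainSize);
  }
  return buf;
}

// Pre-allocate once; set() (inherited from Uint8Array) writes a whole array at an offset.
function withSet(nGrains, nRepeats, grainSize) {
  const buf = Buffer.alloc(nGrains * nRepeats * grainSize);
  for (let i = 0; i < nGrains; i++) {
    const grain = getGrain(i, grainSize);
    for (let j = 0; j < nRepeats; j++) buf.set(grain, (i * nRepeats + j) * grainSize);
  }
  return buf;
}

const expected = withConcat(50, 3, 480);
console.log(withCopy(50, 3, 480).equals(expected)); // true
console.log(withFill(50, 3, 480).equals(expected)); // true
console.log(withSet(50, 3, 480).equals(expected));  // true
```

All three pre-allocating versions avoid re-copying the growing result; fill() additionally removes the inner repeat loop.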
I have found 3 methods to convert Uint8Array to BigInt and all of them give different results for some reason. Could you please tell me which one is correct and which one should I use?
Using the bigint-conversion library: its bigintConversion.bufToBigint() function returns a BigInt. The implementation is as follows:
export function bufToBigint (buf: ArrayBuffer|TypedArray|Buffer): bigint {
  let bits = 8n
  if (ArrayBuffer.isView(buf)) bits = BigInt(buf.BYTES_PER_ELEMENT * 8)
  else buf = new Uint8Array(buf)
  let ret = 0n
  for (const i of (buf as TypedArray|Buffer).values()) {
    const bi = BigInt(i)
    ret = (ret << bits) + bi
  }
  return ret
}
Using DataView:
let view = new DataView(arr.buffer, 0);
let result = view.getBigUint64(0, true);
Using a FOR loop:
let result = BigInt(0);
for (let i = arr.length - 1; i >= 0; i++) {
result = result * BigInt(256) + BigInt(arr[i]);
}
I'm honestly confused about which one is right, since all of them return a result, but each returns a different one.
I'm fine with either BE or LE, but I'd just like to know why these 3 methods give different results.
One reason for the different results is that they use different endianness.
Let's turn your snippets into a form where we can execute and compare them:
let source_array = new Uint8Array([
0xff, 0xee, 0xdd, 0xcc, 0xbb, 0xaa, 0x99, 0x88,
0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11]);
let buffer = source_array.buffer;
function method1(buf) {
  let bits = 8n
  if (ArrayBuffer.isView(buf)) {
    bits = BigInt(buf.BYTES_PER_ELEMENT * 8)
  } else {
    buf = new Uint8Array(buf)
  }
  let ret = 0n
  for (const i of buf.values()) {
    const bi = BigInt(i)
    ret = (ret << bits) + bi
  }
  return ret
}
function method2(buf) {
  let view = new DataView(buf, 0);
  return view.getBigUint64(0, true);
}
function method3(buf) {
  let arr = new Uint8Array(buf);
  let result = BigInt(0);
  for (let i = arr.length - 1; i >= 0; i--) {
    result = result * BigInt(256) + BigInt(arr[i]);
  }
  return result;
}
console.log(method1(buffer).toString(16));
console.log(method2(buffer).toString(16));
console.log(method3(buffer).toString(16));
Note that this includes a bug fix for method3: where you wrote for (let i = arr.length - 1; i >= 0; i++), you clearly meant i-- at the end.
For "method1" this prints: ffeeddccbbaa998877665544332211
Because method1 is a big-endian conversion (first byte of the array is most-significant part of the result) without size limit.
For "method2" this prints: 8899aabbccddeeff
Because method2 is a little-endian conversion (first byte of the array is least significant part of the result) limited to 64 bits.
If you switch the second getBigUint64 argument from true to false, you get big-endian behavior: ffeeddccbbaa9988.
To eliminate the size limitation, you'd have to add a loop: using getBigUint64 you can get 64-bit chunks, which you can assemble using shifts similar to method1 and method3.
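A sketch of that loop for the big-endian case, assuming the input is a Uint8Array (bytesToBigIntBE is a name made up for this example). It reads whole 8-byte chunks with getBigUint64(offset, false) and folds any leftover bytes in one at a time; on the sample array it should agree with method1:

```javascript
// Big-endian Uint8Array -> BigInt of arbitrary size, reading 64 bits at a time.
function bytesToBigIntBE(bytes) {
  // Respect byteOffset/byteLength so subarrays work too.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let result = 0n;
  let offset = 0;
  // Whole 8-byte chunks: shift the accumulator left by 64 bits and add the chunk.
  for (; offset + 8 <= bytes.byteLength; offset += 8) {
    result = (result << 64n) | view.getBigUint64(offset, false);
  }
  // Leftover 0-7 bytes: fold them in one at a time.
  for (; offset < bytes.byteLength; offset++) {
    result = (result << 8n) | BigInt(view.getUint8(offset));
  }
  return result;
}

const source = new Uint8Array([
  0xff, 0xee, 0xdd, 0xcc, 0xbb, 0xaa, 0x99, 0x88,
  0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11]);
console.log(bytesToBigIntBE(source).toString(16)); // ffeeddccbbaa998877665544332211
```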
For "method3" this prints: 112233445566778899aabbccddeeff
Because method3 is a little-endian conversion without size limit. If you reverse the for-loop's direction, you'll get the same big-endian behavior as method1: result * 256n gives the same value as result << 8n; the latter is a bit faster.
(Side note: BigInt(0) and BigInt(256) are needlessly verbose, just write 0n and 256n instead. Additional benefit: 123456789123456789n does what you'd expect, BigInt(123456789123456789) does not.)
So which method should you use? That depends on:
(1) Do your incoming arrays assume BE or LE encoding?
(2) Are your BigInts limited to 64 bits or arbitrarily large?
(3) Is this performance-critical code, or are all approaches "fast enough"?
Taking a step back: if you control both parts of the overall process (converting BigInts to Uint8Array, then transmitting/storing them, then converting back to BigInt), consider simply using hexadecimal strings instead: that'll be easier to code, easier to debug, and significantly faster. Something like:
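function serialize(bigint) {
  // BigInt() rejects hex strings with a sign, so keep the minus outside the "0x" part
  return bigint < 0n ? "-0x" + (-bigint).toString(16) : "0x" + bigint.toString(16);
}
function deserialize(serialized_bigint) {
  return serialized_bigint.startsWith("-")
    ? -BigInt(serialized_bigint.slice(1))
    : BigInt(serialized_bigint);
}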
If you need to store really big integers that aren't limited to 64 or 128 bits, and you also need to keep negative numbers, then this is a solution for you:
function encode(n) {
  let hex, bytes
  // zigzag: shift left by 1 and, if n is negative, xor with -1 (flip all bits),
  // so the sign ends up in the lowest bit and the result is non-negative
  n = (n << 1n) ^ (n < 0n ? -1n : 0n)
  // convert to hex
  hex = n.toString(16)
  // pad to an even number of digits if necessary
  if (hex.length % 2) hex = '0' + hex
  // convert hex to bytes
  bytes = hex.match(/.{1,2}/g).map(byte => parseInt(byte, 16))
  return bytes
}
function decode(bytes) {
  let hex, n
  // convert bytes back into hex
  hex = bytes.map(e => e.toString(16).padStart(2, '0')).join('')
  // convert hex to BigInt
  n = BigInt(`0x`+hex)
  // undo zigzag: shift right by 1 and xor with -1 if the lowest (sign) bit was set
  n = (n >> 1n) ^ (n & 1n ? -1n : 0n)
  return n
}
const input = document.querySelector('input')
input.oninput = () => {
  console.clear()
  const bytes = encode(BigInt(input.value))
  // TODO: save or transmit these bytes
  // new Uint8Array(bytes)
  console.log(bytes.join(','))
  const n = decode(bytes)
  console.log(n.toString(10)+'n') // because the Stack Overflow snippet console can't render BigInts
}
input.oninput()
<input type="number" value="-39287498324798237498237498273323423" style="width: 100%">
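The snippet above depends on a DOM input element. A DOM-free sketch of the same zigzag idea for Node, which builds a Uint8Array directly with shifts instead of going through a hex string (encodeBytes and decodeBytes are names made up here). It should produce the same byte sequence as encode() for the same input:

```javascript
// BigInt -> Uint8Array: zigzag-map the sign into the lowest bit, then emit big-endian bytes.
function encodeBytes(n) {
  let z = (n << 1n) ^ (n < 0n ? -1n : 0n); // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
  const bytes = [];
  do {
    bytes.unshift(Number(z & 0xffn)); // most significant byte ends up first
    z >>= 8n;
  } while (z > 0n); // do/while so that 0 still yields one byte
  return Uint8Array.from(bytes);
}

// Uint8Array -> BigInt: rebuild the big-endian value, then undo the zigzag mapping.
function decodeBytes(bytes) {
  let z = 0n;
  for (const b of bytes) z = (z << 8n) | BigInt(b);
  return (z >> 1n) ^ (z & 1n ? -1n : 0n);
}

const bytes = encodeBytes(-39287498324798237498237498273323423n);
console.log(bytes.join(","));
console.log(decodeBytes(bytes) === -39287498324798237498237498273323423n); // true
```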
I need to add compression to my project, and I decided to use the LZJB algorithm because it is fast and the code is small. I found this library: https://github.com/copy/jslzjb-k
But the API is not very nice: to decompress the data you need the input buffer length (because a Uint8Array cannot grow, you have to allocate the output up front). So I want to save the length of the input buffer in the first few bytes of the Uint8Array, so that I can extract that value and create the output Uint8Array based on it.
I want the function that returns a Uint8Array from an integer to be generic, maybe storing the number of bytes in the first byte so you know how much data you need to extract to read the integer. I guess I need to extract those bytes and use some bit shifting to get the original number, but I'm not exactly sure how to do this.
So how can I write a generic function that converts an integer into Uint8Array that can be embedded into a bigger array and then extract that number?
Here are working functions (based on Converting javascript Integer to byte array and back)
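function numberToBytes(number) {
  // minimal number of bytes (at least 1, so 0 is encoded too);
  // you can use a constant number of bytes instead, e.g. 8 or 4
  const len = Math.max(1, Math.ceil(Math.log2(number + 1) / 8));
  const byteArray = new Uint8Array(len);
  // little-endian: least significant byte first
  for (let index = 0; index < byteArray.length; index++) {
    const byte = number & 0xff;
    byteArray[index] = byte;
    number = (number - byte) / 256;
  }
  return byteArray;
}
function bytesToNumber(byteArray) {
  let result = 0;
  // walk from the most significant (last) byte down to the first
  for (let i = byteArray.length - 1; i >= 0; i--) {
    result = (result * 256) + byteArray[i];
  }
  return result;
}
By using const len = Math.max(1, Math.ceil(Math.log2(number + 1) / 8)); the array has only as many bytes as needed: the + 1 gives values such as 256 (= 2^8) the extra byte they need, and Math.max keeps 0 from producing an empty array. If you want a fixed size, you can use a constant 8 or 4 instead. These functions are meant for non-negative integers up to Number.MAX_SAFE_INTEGER.
In my case, I just saved the number of length bytes in the first byte.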
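A sketch of that layout for the LZJB use case, assuming the length is a non-negative integer below 2^53 (writeLengthPrefix, readLengthPrefix and their helpers are hypothetical names, and the helpers count bytes with a loop rather than Math.log2). The first byte holds how many bytes the length occupies, followed by the length itself in little-endian order and then the payload, e.g. the compressed data:

```javascript
// Hypothetical helper: minimal little-endian bytes for a non-negative integer < 2^53.
function lengthToBytes(length) {
  let count = 1;
  while (count < 7 && length >= 2 ** (8 * count)) count++;
  const bytes = new Uint8Array(count);
  for (let i = 0; i < count; i++) {
    bytes[i] = length % 256;
    length = Math.floor(length / 256);
  }
  return bytes;
}

function bytesToLength(bytes) {
  let length = 0;
  for (let i = bytes.length - 1; i >= 0; i--) length = length * 256 + bytes[i];
  return length;
}

// Layout: [count][length bytes, little-endian][payload...]
function writeLengthPrefix(length, payload) {
  const lengthBytes = lengthToBytes(length);
  const out = new Uint8Array(1 + lengthBytes.length + payload.length);
  out[0] = lengthBytes.length;
  out.set(lengthBytes, 1);
  out.set(payload, 1 + lengthBytes.length);
  return out;
}

function readLengthPrefix(packed) {
  const count = packed[0];
  const length = bytesToLength(packed.subarray(1, 1 + count));
  // subarray() returns a view into the same memory, so the payload is not copied
  const payload = packed.subarray(1 + count);
  return { length, payload };
}

const packed = writeLengthPrefix(70000, Uint8Array.of(1, 2, 3));
const decoded = readLengthPrefix(packed);
console.log(decoded.length); // 70000
```

On the decompression side, decoded.length would be the size to allocate for the output Uint8Array.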
General answer
These functions allow any non-negative integer (they use BigInts internally, but also accept Number arguments) to be encoded into, and decoded from, any part of a Uint8Array. The byte count is stored in a single byte, so the number can be up to 255 bytes long. It is somewhat overkill, but I wanted to learn how to work with arbitrary-sized integers in JS.
// n can be a non-negative bigint or number
// bs is an optional Uint8Array of sufficient size
// if unspecified, a large-enough Uint8Array will be allocated
// start (optional) is the offset
// where the length-prefixed number will be written
// returns the resulting Uint8Array
function writePrefixedNum(n, bs, start) {
  start = start || 0;
  let len = start+2; // start, length, and 1 byte min
  // one more byte for every power of 256 that n reaches (note <=, so 256 gets 2 bytes)
  for (let i=0x100n; i<=n; i<<=8n, len++) /* increment length */;
  if (bs === undefined) {
    bs = new Uint8Array(len);
  } else if (bs.length < len) {
    throw new RangeError(`byte array too small; ${bs.length} < ${len}`);
  }
  let r = BigInt(n);
  // little-endian: least significant byte right after the count byte
  for (let pos = start+1; pos < len; pos++) {
    bs[pos] = Number(r & 0xffn);
    r >>= 8n;
  }
  bs[start] = len-start-1; // write byte-count to start byte
  return bs;
}
// bs must be a Uint8Array from where the number will be read
// start (optional, defaults to 0)
// is where the length-prefixed number can be found
// returns a bigint, which can be coerced to a number using Number()
function readPrefixedNum(bs, start) {
  start = start || 0;
  let size = bs[start]; // read byte-count from start byte
  let n = 0n;
  // the count byte plus `size` data bytes must fit: indices start .. start+size
  if (bs.length < start+size+1) {
    throw new RangeError(`byte array too small; ${bs.length} < ${start+size+1}`);
  }
  for (let pos = start+size; pos >= start+1; pos--) {
    n <<= 8n;
    n |= BigInt(bs[pos]);
  }
  return n;
}
function test(n) {
  const offset = 2;
  let bs = writePrefixedNum(n, undefined, offset);
  console.log(bs);
  let result = readPrefixedNum(bs, offset);
  console.log(n, result, "correct?", n == result)
}
test(0)
test(256)
test(0x1020304050607080n)
test(0x0807060504030201n)
Simple 4-byte answer
This answer encodes 4-byte integers to and from Uint8Arrays.
function intToArray(i) {
  return Uint8Array.of(
    (i&0xff000000)>>24,
    (i&0x00ff0000)>>16,
    (i&0x0000ff00)>> 8,
    (i&0x000000ff)>> 0);
}
function arrayToInt(bs, start) {
  start = start || 0;
  const bytes = bs.subarray(start, start+4);
  let n = 0;
  for (const byte of bytes.values()) {
    n = (n<<8)|byte;
  }
  return n;
}
for (let v of [123, 123<<8, 123<<16, 123<<24]) {
  let a = intToArray(v);
  let r = arrayToInt(a, 0);
  console.log(v, a, r);
}
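DataView can do the same byte shuffling with explicit endianness, and unlike arrayToInt (whose | result is a signed 32-bit value), getUint32 returns values at or above 2^31 as positive numbers. A sketch that writes into and reads from an arbitrary offset of a larger Uint8Array (writeUint32BE and readUint32BE are names chosen for this example):

```javascript
// Write a 32-bit unsigned integer big-endian at `offset` inside an existing Uint8Array.
function writeUint32BE(bytes, value, offset = 0) {
  // byteOffset/byteLength keep the DataView aligned with subarrays
  new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).setUint32(offset, value, false);
  return bytes;
}

// Read a 32-bit unsigned integer big-endian from `offset`.
function readUint32BE(bytes, offset = 0) {
  return new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).getUint32(offset, false);
}

const bytes = new Uint8Array(6);
writeUint32BE(bytes, 0xdeadbeef, 2);
console.log(Array.from(bytes));      // [ 0, 0, 222, 173, 190, 239 ]
console.log(readUint32BE(bytes, 2)); // 3735928559
```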
Posting this one-liner in case it is useful to anyone who is looking to work with numbers below 2^53. It uses a bitwise mask for the low 7 bits, but divides by 128 instead of using >> 7, because JavaScript's >> converts its operand to a 32-bit signed integer and would corrupt values of 2^31 and above. It has no need for constants or values other than the input to be defined.
export const encodeUvarint = (n: number): Uint8Array => n >= 0x80
  ? Uint8Array.from([(n & 0x7f) | 0x80, ...encodeUvarint(Math.floor(n / 128))])
  : Uint8Array.from([n & 0xff]);
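The answer only shows the encoder. A sketch of an encoder/decoder pair in plain JavaScript (decodeUvarint and its { value, length } return shape are choices made for this example), using arithmetic instead of shifts so values up to Number.MAX_SAFE_INTEGER survive. Returning the consumed length lets the decoder read a varint from inside a larger array:

```javascript
// LEB128-style unsigned varint: 7 data bits per byte, high bit set on every byte except the last.
function encodeUvarint(n) {
  const out = [];
  while (n >= 0x80) {
    out.push((n % 0x80) | 0x80); // low 7 bits plus the continuation flag
    n = Math.floor(n / 0x80);    // division instead of >> 7 to stay exact above 2^31
  }
  out.push(n);
  return Uint8Array.from(out);
}

// Decode starting at `offset`; returns the value and the number of bytes consumed.
function decodeUvarint(bytes, offset = 0) {
  let value = 0;
  let scale = 1;
  let pos = offset;
  for (;;) {
    if (pos >= bytes.length) throw new RangeError("truncated varint");
    const byte = bytes[pos++];
    value += (byte & 0x7f) * scale;
    if (byte < 0x80) break; // no continuation flag: this was the last byte
    scale *= 0x80;
  }
  return { value, length: pos - offset };
}

console.log(Array.from(encodeUvarint(300)));             // [ 172, 2 ]
console.log(decodeUvarint(encodeUvarint(2 ** 31)).value); // 2147483648
```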
I want to do the equivalent of ArrayBuffer.slice without actually copying the contents. The use case is converting a very large (50 MB) ArrayBuffer into a string.
Below you can see that I am using new Uint16Array(buffer, start, chunkSize). This copies the values from the ArrayBuffer, which is slow. Any ideas on how to make this more performant?
function arrayBufferToStr(buffer: ArrayBuffer) {
  // Use chunks so String.fromCharCode.apply doesn't exceed the call stack
  // chunkSize is in UTF-16 code units: 1024 * 64 units = 128 KiB per chunk
  const chunkSize = 1024 * 64
  // Stop right before the last chunk so the final `new Uint16Array`
  // can omit its length argument and take the rest of the buffer
  const endCondition = buffer.byteLength - (chunkSize * 2)
  let str = ""
  let start = 0
  for (start = 0; start < endCondition; start += chunkSize * 2) {
    str += String.fromCharCode.apply(null, new Uint16Array(buffer, start, chunkSize))
  }
  const view = new Uint16Array(buffer, start)
  buffer = null
  str += String.fromCharCode.apply(null, view)
  return str
}
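For reference, new Uint16Array(buffer, byteOffset, length) creates a view over the same memory rather than a copy; it is ArrayBuffer.prototype.slice() that copies. A small sketch demonstrating the difference, plus TextDecoder with the "utf-16le" encoding as a possible alternative that decodes a whole buffer in one call without going through fromCharCode.apply (note that TextDecoder replaces unpaired surrogates with U+FFFD, which String.fromCharCode does not):

```javascript
const buffer = new ArrayBuffer(8);
const bytes = new Uint8Array(buffer);

// Typed-array view: same memory, no copy.
const view = new Uint16Array(buffer, 2, 2);
// ArrayBuffer.prototype.slice: a new ArrayBuffer holding copied bytes.
const copy = new Uint16Array(buffer.slice(2, 6));

bytes[2] = 0xff;
bytes[3] = 0xff;
console.log(view.buffer === buffer); // true
console.log(view[0]);                // 65535, the write through `bytes` is visible
console.log(copy[0]);                // 0, the copy was taken before the write

// Decoding a whole UTF-16LE byte sequence in one call.
const utf16le = Uint8Array.of(0x48, 0x00, 0x69, 0x00);
console.log(new TextDecoder("utf-16le").decode(utf16le)); // Hi
```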
I am working with JavaScript typed arrays, and I need to compress them as much as possible for networking purposes.
The smallest built-in typed array in JavaScript uses 8 bits per entry, which stores numbers between 0 and 255.
However, the data I'm working with only contains numbers between 0 and 3, which can be stored using 2 bits.
So my question is, if I have an 8 bit array that is populated with data only using numbers between 0 and 3, how can I "convert" it into a 2 bit array?
I know I'll need to use a bit operator, but I'm not sure how to make a mask that will only focus on 2 bits at a time.
A longer example is hard to fit into a comment :)
Up front, please note that network data is very often compressed already, e.g. with gzip (especially when data volume is a concern and the network libraries are set up properly). However, this is not always the case, and it would still not be as compact as packing the bits manually.
You need to keep track of two things: the current array index and the current 2-bit slot inside the byte that is being read or written. For writing, | is useful; for reading, &. Shifts (<< or >>) select the slot's position.
const randomTwoBitData = () => Math.floor(Math.random() * 4);
//Array of random 2-Bit data
const sampleData = Array(256).fill().map(e => randomTwoBitData());
//four entries per 8-Bit
let buffer = new Uint8Array(sampleData.length / 4);
//Writing data. I made my life easy
//because the data length is divisible by four and fits perfectly.
for (let i = 0; i < sampleData.length; i += 4) {
  buffer[i / 4] =
    sampleData[i] |
    (sampleData[i + 1] << 2) |
    (sampleData[i + 2] << 4) |
    (sampleData[i + 3] << 6);
}
//padding for console logging
const pad = (s, n) => "0".repeat(Math.max(0, n - s.length)) + s;
//Some output to see results at the middle
console.log(`buffer: ${pad(buffer[31].toString(2), 8)}, ` +
`original data: ${pad(sampleData[127].toString(2), 2)}, ` +
`${pad(sampleData[126].toString(2), 2)}, ` +
`${pad(sampleData[125].toString(2), 2)}, ` +
`${pad(sampleData[124].toString(2), 2)}`);
console.log("(order of original data inverted for readability)");
console.log("");
//Reading back:
let readData = [];
buffer.forEach(byte => {
  readData.push(byte & 3); // 3 is 00000011 binary
  readData.push((byte & 12) >> 2); // 12 is 00001100 binary
  readData.push((byte & 48) >> 4); // 48 is 00110000 binary
  readData.push((byte & 192) >> 6); // 192 is 11000000 binary
});
//Check if data read from compacted buffer is the same
//as the original
console.log(`original data and re-read data are identical: ` +
readData.every((e, i) => e === sampleData[i]));
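The loop above relies on the data length being a multiple of four. A sketch that drops that assumption (pack2bit and unpack2bit are names made up here), using the same bit layout, i.e. the first value in the lowest two bits. Because the final byte can contain padding slots, the number of values has to be transmitted alongside the packed bytes:

```javascript
// Pack values in the range 0..3, four per byte; value i goes into bits 2*(i % 4) and up.
function pack2bit(values) {
  const packed = new Uint8Array(Math.ceil(values.length / 4));
  for (let i = 0; i < values.length; i++) {
    packed[i >> 2] |= (values[i] & 3) << ((i & 3) * 2);
  }
  return packed;
}

// `count` is needed because the last byte may hold up to three padding slots.
function unpack2bit(packed, count) {
  const values = new Uint8Array(count);
  for (let i = 0; i < count; i++) {
    values[i] = (packed[i >> 2] >> ((i & 3) * 2)) & 3;
  }
  return values;
}

const values = [3, 0, 1, 2, 3];
const packed = pack2bit(values);
console.log(Array.from(packed));                            // [ 147, 3 ]
console.log(Array.from(unpack2bit(packed, values.length))); // [ 3, 0, 1, 2, 3 ]
```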
Here is a function that converts an 8-bit number into an array of four 2-bit values using & and >>, plus its inverse:
function convert8to2(val){
  var arr = [];
  arr.push((val&parseInt('11000000', 2))>>6);
  arr.push((val&parseInt('00110000', 2))>>4);
  arr.push((val&parseInt('00001100', 2))>>2);
  arr.push((val&parseInt('00000011', 2)));
  return arr;
}
function convert2to8(arr){
  if(arr.length != 4)
    throw new Error('expected an array of exactly 4 values');
  return (arr[0]<<6)+(arr[1]<<4)+(arr[2]<<2)+arr[3];
}
// 228 = 11100100
var arr = convert8to2(228);
console.log(arr);
console.log(convert2to8(arr));
Here is an example that converts random bytes into 2-bit arrays and back:
var randomData = [];
for(var i=0;i<10;i++){
  // random byte in the range 0..255 (inclusive)
  randomData.push(Math.floor(Math.random() * 256));
}
console.log(randomData);
var arrayOf2 = [];
for(var i=0;i<randomData.length;i++){
  arrayOf2.push(convert8to2(randomData[i]));
}
console.log(arrayOf2);
var arrayOf8 = [];
for(var i=0;i<arrayOf2.length;i++){
  arrayOf8.push(convert2to8(arrayOf2[i]));
}
console.log(arrayOf8);
I'm learning about Blockchain and wanted to create an example of creating an address, purely for educational purposes - WOULD NOT BE DONE ANYWHERE NEAR PRODUCTION.
Task: create 160 random bits, convert them to hex, convert that to base 58, then test correctness by reversing the process.
It kind of works; however, I get an intermittent false when comparing the binary before and after. The hexStringToBinary function returns strings of varying lengths:
const bs58 = require('bs58');
//20 * 8 = 160
function generate20Bytes () {
  let byteArray = [];
  let bytes = 0;
  while (bytes < 20) {
    let byte = '';
    while (byte.length < 8) {
      byte += Math.floor(Math.random() * 2);
    }
    byteArray.push(byte);
    bytes++;
  }
  return byteArray;
}
//the issue is probably from here
function hexStringToBinary (string) {
  return string.match(/.{1,2}/g)
    .map(hex => parseInt(hex, 16).toString(2).padStart(8, '0'));
}
const addressArray = generate20Bytes();
const binaryAddress = addressArray.join('');
const hex = addressArray.map(byte => parseInt(byte, 2).toString(16)).join('');
console.log(hex);
// then lets convert it to base 58
const base58 = bs58.encode(Buffer.from(hex));
console.log('base 58');
console.log(base58);
// lets see if we can reverse the process
const destructuredHex = bs58.decode(base58).toString();
console.log('hex is the same');
console.log(hex === destructuredHex);
// lets convert back to a binary string
const destructuredAddress = hexStringToBinary(destructuredHex).join('');
console.log('destructured address');
console.log(destructuredAddress);
console.log('binaryAddress address');
console.log(binaryAddress);
//intermittent false/true
console.log(destructuredAddress === binaryAddress);
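I got round to refactoring with TDD and realised it wasn't zero-filling hex values below 16: a byte such as 00001111 became "f" instead of "0f", so the hex string came out too short and hexStringToBinary split it at the wrong places. My playground repo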
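A sketch of that fix, kept independent of bs58 so it runs on its own (binaryBytesToHex is a name made up here, and hexStringToBinary is a variant of the question's function that splits into exact pairs). Every byte is padded to two hex digits, so 20 bytes always give a 40-character hex string that splits back cleanly. Separately, Buffer.from(hex) in the question base58-encodes the 40 ASCII characters of the hex text rather than the 20 bytes they describe; Buffer.from(hex, "hex") is probably what was intended, with Buffer.from(bs58.decode(base58)).toString("hex") on the way back.

```javascript
// Array of 8-character binary strings -> hex, padding every byte to two hex digits.
function binaryBytesToHex(byteStrings) {
  return byteStrings
    .map(byte => parseInt(byte, 2).toString(16).padStart(2, "0"))
    .join("");
}

// Inverse: split into exact 2-digit pairs and pad each byte back to 8 bits.
function hexStringToBinary(hex) {
  return hex.match(/.{2}/g).map(pair => parseInt(pair, 16).toString(2).padStart(8, "0"));
}

const addressArray = ["00001111", "11110000", "00000001"];
const hex = binaryBytesToHex(addressArray);
console.log(hex); // 0ff001
console.log(hexStringToBinary(hex).join("") === addressArray.join("")); // true
```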