I want to do the equivalent of ArrayBuffer.slice without actually copying the contents. The use case is converting a very large (50mb) ArrayBuffer into a string
Below you can see that I am using new Uint16Array(buffer, start, chunkSize)). This copies the value from the ArrayBuffer, which is slow. Any ideas on how to make this more performant?
function arrayBufferToStr(buffer: ArrayBuffer) {
// Use chunks to not go over call stack
// chunks of 1024 bytes * 64
const chunkSize = 1024 * 64
// To right before the last chunk so the last `String.fromCharCode`
// can skip passing a byteLength argument
const endCondition = buffer.byteLength - (chunkSize * 2)
let str = ""
let start = 0
for (start = 0; start < endCondition; start += chunkSize * 2) {
str += String.fromCharCode.apply(null, new Uint16Array(buffer, start, chunkSize))
}
const view = new Uint16Array(buffer, start)
buffer = null
str += String.fromCharCode.apply(null, view)
return str
}
Related
I have found 3 methods to convert Uint8Array to BigInt and all of them give different results for some reason. Could you please tell me which one is correct and which one should I use?
Using bigint-conversion library. We can use bigintConversion.bufToBigint() function to get a BigInt. The implementation is as follows:
export function bufToBigint (buf: ArrayBuffer|TypedArray|Buffer): bigint {
let bits = 8n
if (ArrayBuffer.isView(buf)) bits = BigInt(buf.BYTES_PER_ELEMENT * 8)
else buf = new Uint8Array(buf)
let ret = 0n
for (const i of (buf as TypedArray|Buffer).values()) {
const bi = BigInt(i)
ret = (ret << bits) + bi
}
return ret
}
Using DataView:
let view = new DataView(arr.buffer, 0);
let result = view.getBigUint64(0, true);
Using a FOR loop:
let result = BigInt(0);
for (let i = arr.length - 1; i >= 0; i++) {
result = result * BigInt(256) + BigInt(arr[i]);
}
I'm honestly confused which one is right since all of them give different results but do give results.
I'm fine with either BE or LE but I'd just like to know why these 3 methods give a different result.
One reason for the different results is that they use different endianness.
Let's turn your snippets into a form where we can execute and compare them:
let source_array = new Uint8Array([
0xff, 0xee, 0xdd, 0xcc, 0xbb, 0xaa, 0x99, 0x88,
0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11]);
let buffer = source_array.buffer;
function method1(buf) {
let bits = 8n
if (ArrayBuffer.isView(buf)) {
bits = BigInt(buf.BYTES_PER_ELEMENT * 8)
} else {
buf = new Uint8Array(buf)
}
let ret = 0n
for (const i of buf.values()) {
const bi = BigInt(i)
ret = (ret << bits) + bi
}
return ret
}
function method2(buf) {
let view = new DataView(buf, 0);
return view.getBigUint64(0, true);
}
function method3(buf) {
let arr = new Uint8Array(buf);
let result = BigInt(0);
for (let i = arr.length - 1; i >= 0; i--) {
result = result * BigInt(256) + BigInt(arr[i]);
}
return result;
}
console.log(method1(buffer).toString(16));
console.log(method2(buffer).toString(16));
console.log(method3(buffer).toString(16));
Note that this includes a bug fix for method3: where you wrote for (let i = arr.length - 1; i >= 0; i++), you clearly meant i-- at the end.
For "method1" this prints: ffeeddccbbaa998877665544332211
Because method1 is a big-endian conversion (first byte of the array is most-significant part of the result) without size limit.
For "method2" this prints: 8899aabbccddeeff
Because method2 is a little-endian conversion (first byte of the array is least significant part of the result) limited to 64 bits.
If you switch the second getBigUint64 argument from true to false, you get big-endian behavior: ffeeddccbbaa9988.
To eliminate the size limitation, you'd have to add a loop: using getBigUint64 you can get 64-bit chunks, which you can assemble using shifts similar to method1 and method3.
For "method3" this prints: 112233445566778899aabbccddeeff
Because method3 is a little-endian conversion without size limit. If you reverse the for-loop's direction, you'll get the same big-endian behavior as method1: result * 256n gives the same value as result << 8n; the latter is a bit faster.
(Side note: BigInt(0) and BigInt(256) are needlessly verbose, just write 0n and 256n instead. Additional benefit: 123456789123456789n does what you'd expect, BigInt(123456789123456789) does not.)
So which method should you use? That depends on:
(1) Do your incoming arrays assume BE or LE encoding?
(2) Are your BigInts limited to 64 bits or arbitrarily large?
(3) Is this performance-critical code, or are all approaches "fast enough"?
Taking a step back: if you control both parts of the overall process (converting BigInts to Uint8Array, then transmitting/storing them, then converting back to BigInt), consider simply using hexadecimal strings instead: that'll be easier to code, easier to debug, and significantly faster. Something like:
function serialize(bigint) {
return "0x" + bigint.toString(16);
}
function deserialize(serialized_bigint) {
return BigInt(serialized_bigint);
}
If you need to store really big integers that isn't bound to any base64 or 128 and also keep negative numbers then this is a solution for you...
function encode(n) {
let hex, bytes
// shift all numbers 1 step to the left and xor if less then 0
n = (n << 1n) ^ (n < 0n ? -1n : 0n)
// convert to hex
hex = n.toString(16)
// pad if neccesseery
if (hex.length % 2) hex = '0' + hex
// convert hex to bytes
bytes = hex.match(/.{1,2}/g).map(byte => parseInt(byte, 16))
return bytes
}
function decode(bytes) {
let hex, n
// convert bytes back into hex
hex = bytes.map(e => e.toString(16).padStart(2, 0)).join('')
// Convert hex to BigInt
n = BigInt(`0x`+hex)
// Shift all numbers to right and xor if the first bit was signed
n = (n >> 1n) ^ (n & 1n ? -1n : 0n)
return n
}
const input = document.querySelector('input')
input.oninput = () => {
console.clear()
const bytes = encode(BigInt(input.value))
// TODO: Save or transmit this bytes
// new Uint8Array(bytes)
console.log(bytes.join(','))
const n = decode(bytes)
console.log(n.toString(10)+'n') // cuz SO can't render bigints...
}
input.oninput()
<input type="number" value="-39287498324798237498237498273323423" style="width: 100%">
I need to add compression to my project and I decided to use the LZJB algorithm that is fast and the code is small. Found this library https://github.com/copy/jslzjb-k
But the API is not very nice because to decompress the file you need input buffer length (because Uint8Array is not dynamic you need to allocate some data). So I want to save the length of the input buffer as the first few bytes of Uint8Array so I can extract that value and create output Uint8Array based on that integer value.
I want the function that returns Uint8Array from integer to be generic, maybe save the length of the bytes into the first byte so you know how much data you need to extract to read the integer. I guess I need to extract those bytes and use some bit shifting to get the original number. But I'm not exactly sure how to do this.
So how can I write a generic function that converts an integer into Uint8Array that can be embedded into a bigger array and then extract that number?
Here are working functions (based on Converting javascript Integer to byte array and back)
function numberToBytes(number) {
// you can use constant number of bytes by using 8 or 4
const len = Math.ceil(Math.log2(number) / 8);
const byteArray = new Uint8Array(len);
for (let index = 0; index < byteArray.length; index++) {
const byte = number & 0xff;
byteArray[index] = byte;
number = (number - byte) / 256;
}
return byteArray;
}
function bytesToNumber(byteArray) {
let result = 0;
for (let i = byteArray.length - 1; i >= 0; i--) {
result = (result * 256) + byteArray[i];
}
return result;
}
by using const len = Math.ceil(Math.log2(number) / 8); the array have only bytes needed. If you want a fixed size you can use a constant 8 or 4.
In my case, I just saved the length of the bytes in the first byte.
General answer
These functions allow any integer (it uses BigInts internally, but can accept Number arguments) to be encoded into, and decoded from, any part of a Uint8Array. It is somewhat overkill, but I wanted to learn how to work with arbitrary-sized integers in JS.
// n can be a bigint or a number
// bs is an optional Uint8Array of sufficient size
// if unspecified, a large-enough Uint8Array will be allocated
// start (optional) is the offset
// where the length-prefixed number will be written
// returns the resulting Uint8Array
function writePrefixedNum(n, bs, start) {
start = start || 0;
let len = start+2; // start, length, and 1 byte min
for (let i=0x100n; i<n; i<<=8n, len ++) /* increment length */;
if (bs === undefined) {
bs = new Uint8Array(len);
} else if (bs.length < len) {
throw `byte array too small; ${bs.length} < ${len}`;
}
let r = BigInt(n);
for (let pos = start+1; pos < len; pos++) {
bs[pos] = Number(r & 0xffn);
r >>= 8n;
}
bs[start] = len-start-1; // write byte-count to start byte
return bs;
}
// bs must be a Uint8Array from where the number will be read
// start (optional, defaults to 0)
// is where the length-prefixed number can be found
// returns a bigint, which can be coerced to int using Number()
function readPrefixedNum(bs, start) {
start = start || 0;
let size = bs[start]; // read byte-count from start byte
let n = 0n;
if (bs.length < start+size) {
throw `byte array too small; ${bs.length} < ${start+size}`;
}
for (let pos = start+size; pos >= start+1; pos --) {
n <<= 8n;
n |= BigInt(bs[pos])
}
return n;
}
function test(n) {
const array = undefined;
const offset = 2;
let bs = writePrefixedNum(n, undefined, offset);
console.log(bs);
let result = readPrefixedNum(bs, offset);
console.log(n, result, "correct?", n == result)
}
test(0)
test(0x1020304050607080n)
test(0x0807060504030201n)
Simple 4-byte answer
This answer encodes 4-byte integers to and from Uint8Arrays.
function intToArray(i) {
return Uint8Array.of(
(i&0xff000000)>>24,
(i&0x00ff0000)>>16,
(i&0x0000ff00)>> 8,
(i&0x000000ff)>> 0);
}
function arrayToInt(bs, start) {
start = start || 0;
const bytes = bs.subarray(start, start+4);
let n = 0;
for (const byte of bytes.values()) {
n = (n<<8)|byte;
}
return n;
}
for (let v of [123, 123<<8, 123<<16, 123<<24]) {
let a = intToArray(v);
let r = arrayToInt(a, 0);
console.log(v, a, r);
}
Posting this one-liner in case it is useful to anyone who is looking to work with numbers below 2^53. This strictly uses bitwise operations and has no need for constants or values other than the input to be defined.
export const encodeUvarint = (n: number): Uint8Array => n >= 0x80
? Uint8Array.from([(n & 0x7f) | 0x80, ...encodeUvarint(n >> 7)])
: Uint8Array.from([n & 0xff]);
I made extensive 2 days research on the topic, but there is no really well explained piece that would work.
So the flow is following:
load mp3 (store bought) cbr 320 into wavesurfer
apply all the changes you need
download processed result back to mp3 file (without usage of server)
Ive seen online apps that can do that, nothing is transmitted to server, all happens in the browser.
when we inspect wavesurfer, we have access to these:
The goal would be to use already available references from wavesurfer to produce the download mp3.
from my understanding this can be done with MediaRecorder, WebCodecs API or some libraries like lamejs.
Ive tried to find working example of how to do it with two first methods but without luck. I also tried to do it with lamejs using their example provided on the git but i am getting errors from the lib that are hard to debug, most likely related to providing wrong input.
So far i only managed to download wav file using following script:
handleCopyRegion = (region, instance) => {
var segmentDuration = region.end - region.start;
var originalBuffer = instance.backend.buffer;
var emptySegment = instance.backend.ac.createBuffer(
originalBuffer.numberOfChannels,
Math.ceil(segmentDuration * originalBuffer.sampleRate),
originalBuffer.sampleRate
);
for (var i = 0; i < originalBuffer.numberOfChannels; i++) {
var chanData = originalBuffer.getChannelData(i);
var emptySegmentData = emptySegment.getChannelData(i);
var mid_data = chanData.subarray(
Math.ceil(region.start * originalBuffer.sampleRate),
Math.ceil(region.end * originalBuffer.sampleRate)
);
emptySegmentData.set(mid_data);
}
return emptySegment;
};
bufferToWave = (abuffer, offset, len) => {
var numOfChan = abuffer.numberOfChannels,
length = len * numOfChan * 2 + 44,
buffer = new ArrayBuffer(length),
view = new DataView(buffer),
channels = [],
i,
sample,
pos = 0;
// write WAVE header
setUint32(0x46464952); // "RIFF"
setUint32(length - 8); // file length - 8
setUint32(0x45564157); // "WAVE"
setUint32(0x20746d66); // "fmt " chunk
setUint32(16); // length = 16
setUint16(1); // PCM (uncompressed)
setUint16(numOfChan);
setUint32(abuffer.sampleRate);
setUint32(abuffer.sampleRate * 2 * numOfChan); // avg. bytes/sec
setUint16(numOfChan * 2); // block-align
setUint16(16); // 16-bit (hardcoded in this demo)
setUint32(0x61746164); // "data" - chunk
setUint32(length - pos - 4); // chunk length
// write interleaved data
for (i = 0; i < abuffer.numberOfChannels; i++)
channels.push(abuffer.getChannelData(i));
while (pos < length) {
for (i = 0; i < numOfChan; i++) {
// interleave channels
sample = Math.max(-1, Math.min(1, channels[i][offset])); // clamp
sample = (0.5 + sample < 0 ? sample * 32768 : sample * 32767) | 0; // scale to 16-bit signed int
view.setInt16(pos, sample, true); // update data chunk
pos += 2;
}
offset++; // next source sample
}
// create Blob
return new Blob([buffer], { type: "audio/wav" });
function setUint16(data) {
view.setUint16(pos, data, true);
pos += 2;
}
function setUint32(data) {
view.setUint32(pos, data, true);
pos += 4;
}
};
const cutSelection = this.handleCopyRegion(
this.wavesurfer.regions.list.cut,
this.wavesurfer
);
const blob = this.bufferToWave(cutSelection, 0, cutSelection.length);
// you can now download wav from the blob
Is there a way to avoid making wav and right away make mp3 and download it, or if not make mp3 from that wav, if so how it can be done?
I mainly tried to use wavesurfer.backend.buffer as input, because this reference is AudioBuffer and accessing .getChannelData(0|1) gives you left and right channels. But didnt accomplish anything, maybe i am thinking wrong.
Alright, here is the steps we need to do:
Get buffer data from the wavesurfer player
Analyze the buffer to get the number of Channels(STEREO or MONO), channels data and Sample rate.
Use lamejs library to convert buffer to the MP3 blob file
Then we can get that download link from blob
Here is a quick DEMO
and also the JS code:
function downloadMp3() {
var MP3Blob = analyzeAudioBuffer(wavesurfer.backend.buffer);
console.log('here is your mp3 url:');
console.log(URL.createObjectURL(MP3Blob));
}
function analyzeAudioBuffer(aBuffer) {
let numOfChan = aBuffer.numberOfChannels,
btwLength = aBuffer.length * numOfChan * 2 + 44,
btwArrBuff = new ArrayBuffer(btwLength),
btwView = new DataView(btwArrBuff),
btwChnls = [],
btwIndex,
btwSample,
btwOffset = 0,
btwPos = 0;
setUint32(0x46464952); // "RIFF"
setUint32(btwLength - 8); // file length - 8
setUint32(0x45564157); // "WAVE"
setUint32(0x20746d66); // "fmt " chunk
setUint32(16); // length = 16
setUint16(1); // PCM (uncompressed)
setUint16(numOfChan);
setUint32(aBuffer.sampleRate);
setUint32(aBuffer.sampleRate * 2 * numOfChan); // avg. bytes/sec
setUint16(numOfChan * 2); // block-align
setUint16(16); // 16-bit
setUint32(0x61746164); // "data" - chunk
setUint32(btwLength - btwPos - 4); // chunk length
for (btwIndex = 0; btwIndex < aBuffer.numberOfChannels; btwIndex++)
btwChnls.push(aBuffer.getChannelData(btwIndex));
while (btwPos < btwLength) {
for (btwIndex = 0; btwIndex < numOfChan; btwIndex++) {
// interleave btwChnls
btwSample = Math.max(-1, Math.min(1, btwChnls[btwIndex][btwOffset])); // clamp
btwSample = (0.5 + btwSample < 0 ? btwSample * 32768 : btwSample * 32767) | 0; // scale to 16-bit signed int
btwView.setInt16(btwPos, btwSample, true); // write 16-bit sample
btwPos += 2;
}
btwOffset++; // next source sample
}
let wavHdr = lamejs.WavHeader.readHeader(new DataView(btwArrBuff));
//Stereo
let data = new Int16Array(btwArrBuff, wavHdr.dataOffset, wavHdr.dataLen / 2);
let leftData = [];
let rightData = [];
for (let i = 0; i < data.length; i += 2) {
leftData.push(data[i]);
rightData.push(data[i + 1]);
}
var left = new Int16Array(leftData);
var right = new Int16Array(rightData);
//STEREO
if (wavHdr.channels===2)
return bufferToMp3(wavHdr.channels, wavHdr.sampleRate, left,right);
//MONO
else if (wavHdr.channels===1)
return bufferToMp3(wavHdr.channels, wavHdr.sampleRate, data);
function setUint16(data) {
btwView.setUint16(btwPos, data, true);
btwPos += 2;
}
function setUint32(data) {
btwView.setUint32(btwPos, data, true);
btwPos += 4;
}
}
function bufferToMp3(channels, sampleRate, left, right = null) {
var buffer = [];
var mp3enc = new lamejs.Mp3Encoder(channels, sampleRate, 128);
var remaining = left.length;
var samplesPerFrame = 1152;
for (var i = 0; remaining >= samplesPerFrame; i += samplesPerFrame) {
if (!right)
{
var mono = left.subarray(i, i + samplesPerFrame);
var mp3buf = mp3enc.encodeBuffer(mono);
}
else {
var leftChunk = left.subarray(i, i + samplesPerFrame);
var rightChunk = right.subarray(i, i + samplesPerFrame);
var mp3buf = mp3enc.encodeBuffer(leftChunk,rightChunk);
}
if (mp3buf.length > 0) {
buffer.push(mp3buf);//new Int8Array(mp3buf));
}
remaining -= samplesPerFrame;
}
var d = mp3enc.flush();
if(d.length > 0){
buffer.push(new Int8Array(d));
}
var mp3Blob = new Blob(buffer, {type: 'audio/mpeg'});
//var bUrl = window.URL.createObjectURL(mp3Blob);
// send the download link to the console
//console.log('mp3 download:', bUrl);
return mp3Blob;
}
Let me know if you have any question about the code
I am working on a function that iterates over PCM data. I am getting chunks of data of varying size and I am currently handling this by buffer concatenation. Problem is, I am quite sure that this approach is a performance killer.
One of the simplest algorithm consists of chunking 500 chunks of 4800 bytes (= grain), and repeating them 3 times as such :
buf = <grain1, grain1, grain1, ..., grain500, grain500, grain500>
function(){
// ...
let buf = Buffer.alloc(0) // returned buffer, mutated
// nGrains is defined somewhere else in the function
// example: nGrains = 500
for(let i=0;i<nGrains;i++){
// a chunk of PCM DATA
// example: grain.byteLength = 4800
const grain = Buffer.from(this._getGrain())
// example: nRepeats = 3
for(let j=0;j<nRepeats;j++)
buf = Buffer.concat([buf, grain])
}
return buf
}
I feel like these performance heavy operations (1500 mutating concatenations) could be avoided if there were some sort of way to directly write "raw data" from a given offset to a pre-size-allocated buffer. I made the following helper function that gave me HUGE performance improvements, but I feel like I am doing something wrong...
function writeRaw(buf, rawBytes, offset) => {
for(i=0;i<rawBytes.byteLength;i++){
buf.writeUInt8(rawBytes.readUInt8(i), offset + i)
}
return buf
}
My function now looks like this:
function(){
// ...
const buf = Buffer.alloc(len) // returned buffer, immutable
for(let i=0;i<nGrains;i++){
const grain = Buffer.from(this._getGrain())
for(let j=0;j<nRepeats;j++)
writeRaw(buf, grain, (i * nRepeats + j) * grainSize)
}
return buf
}
My question is : Is there a cleaner way (or more standard way) to do this instead of iterating over bytes ? Buffer.write only seems to work for strings, although this would be ideal...
There is Buffer.copy.
const buf = Buffer.alloc(len);
for(let i = 0; i < nGrains; i++){
const grain = Buffer.from(this._getGrain());
for(let j=0;j<nRepeats;j++)
grain.copy(/*to*/ buf, /*at*/ (i * nRepeats + j) * grainSize);
}
You could also use Buffer.fill:
const buf = Buffer.alloc(len);
for(let i = 0; i < nGrains; i++) {
const grain = Buffer.from(this._getGrain());
buf.fill(grain, i * nRepeats * grainSize, (i + 1) * nRepeats * grainSize);
}
BigQuery uses Javascript for its user-defined functions. Input and outputs that are BYTES in BigQuery are mapped to and from base64-encoded strings in Javascript.
BigQuery doesn't have the browser window object, so atob and btoa are missing. Is there an easy way to encode and decode in the Bigquery JS environment, or do you have to include a library for doing the mapping?
You'll need to include a library, but it's fairly straightforward once you get the JavaScript onto Cloud Storage, and you can use this approach for other common libraries that you want to import. I found an implementation in a StackOverflow post, and I put these contents in a file named btoa_atob.js:
(function () {
var chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=';
function InvalidCharacterError(message) {
this.message = message;
}
InvalidCharacterError.prototype = new Error;
InvalidCharacterError.prototype.name = 'InvalidCharacterError';
// encoder
// [https://gist.github.com/999166] by [https://github.com/nignag]
btoa = function (input) {
var str = String(input);
for (
// initialize result and counter
var block, charCode, idx = 0, map = chars, output = '';
// if the next str index does not exist:
// change the mapping table to "="
// check if d has no fractional digits
str.charAt(idx | 0) || (map = '=', idx % 1);
// "8 - idx % 1 * 8" generates the sequence 2, 4, 6, 8
output += map.charAt(63 & block >> 8 - idx % 1 * 8)
) {
charCode = str.charCodeAt(idx += 3/4);
if (charCode > 0xFF) {
throw new InvalidCharacterError("'btoa' failed: The string to be encoded contains characters outside of the Latin1 range.");
}
block = block << 8 | charCode;
}
return output;
};
// decoder
// [https://gist.github.com/1020396] by [https://github.com/atk]
atob = function (input) {
var str = String(input).replace(/[=]+$/, ''); // #31: ExtendScript bad parse of /=
if (str.length % 4 == 1) {
throw new InvalidCharacterError("'atob' failed: The string to be decoded is not correctly encoded.");
}
for (
// initialize result and counters
var bc = 0, bs, buffer, idx = 0, output = '';
// get next character
buffer = str.charAt(idx++);
// character found in table? initialize bit storage and add its ascii value;
~buffer && (bs = bc % 4 ? bs * 64 + buffer : buffer,
// and if not first of each 4 characters,
// convert the first 8 bits to one ascii character
bc++ % 4) ? output += String.fromCharCode(255 & bs >> (-2 * bc & 6)) : 0
) {
// try to find character in table (0-63, not found => -1)
buffer = chars.indexOf(buffer);
}
return output;
};
}());
Then I copied the file to my Cloud Storage:
gsutil cp btoa_atob.js gs://my-bucket/
Then I wrote a dummy function that uses it:
#standardSQL
CREATE TEMP FUNCTION Foo(b BYTES) RETURNS STRING LANGUAGE js AS """
var result = atob(b);
// ... process result of atob.
return result;
"""
OPTIONS (library='gs://my-bucket/btoa_atob.js');
SELECT Foo(b'\xa0b1\xff\xee');