truncate the string to 1 MB size limit - javascript

I need to cut the string: basically, if the string is longer than 1 MB I should cut it down to this size.
I am using these functions to check the string size
function __to_mb(bytes) {
return bytes / Math.pow(1024, 2)
}
function __size_mb(str) {
return __to_mb(Buffer.byteLength(str, 'utf8'))
}
Then I check the size of string like this
if (__size_mb(str) > 1) { /* do something */ }
But how to cut it?

A Javascript string consists of 16-bit sequences, with some characters using one 16-bit sequence and others needing two 16-bit sequences.
There is no easy way to just take an amount of bytes and consider it done - there might be a 2x 16-bit character straddling the cut-off location, which would then be cut in half.
To make a safe cut, we can use str.codePointAt(index) which was introduced in ES2015. It knows which characters are 16-bit and which are 2x 16-bit. It combines either 1 or 2 of these 16-bit values into an integer result value.
If codePointAt() returns a value <= 2^16-1 then we have a 16-bit character at offset index.
If codePointAt() returns a value >= 2^16 then we have a 2x 16-bit character at offsets index and index+1.
Unfortunately this means going through the entire string to assess each index. This may seem awkward, and it may even be slow, but I am not aware of a faster or smarter way of doing this.
Demo:
var str = "abç🔥😂déΩf👍g😏h"; // string of 13 characters
console.log("str.length = " + str.length); // shows 17 because of double-width chars
console.log("size in bytes = " + str.length * 2); // length * 2 gives size in bytes as UTF-16
var maxByteLengths = [8, 16, 24, 32, 40];
for (var maxBytes of maxByteLengths) {
var data = safeCutOff(str, maxBytes);
console.log(maxBytes + " bytes -> " + data.text + " (" + data.bytes + " bytes)");
}
function safeCutOff(str, maxBytes) {
let widthInBytes = 0;
for (var index = 0; index < str.length; /* index is incremented below */ ) {
let positionsUsed = str.codePointAt(index) <= 0xFFFF ? 1 : 2;
const newWidthInBytes = widthInBytes + 2 * positionsUsed; // declared locally to avoid an implicit global
if (newWidthInBytes > maxBytes)
break;
index += positionsUsed;
widthInBytes = newWidthInBytes;
}
return { text: str.substring(0, index), bytes: widthInBytes };
}
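The question measures size with Buffer.byteLength(str, 'utf8'), that is, in UTF-8 bytes, whereas the demo above counts two bytes per 16-bit unit. A minimal sketch of cutting at a UTF-8 byte limit instead, assuming Node.js (for Buffer). The name truncateUtf8 is made up for illustration:

```javascript
// Hypothetical helper: cut a string so its UTF-8 encoding fits in maxBytes,
// without splitting a multi-byte character. Assumes Node.js Buffer.
function truncateUtf8(str, maxBytes) {
  const buf = Buffer.from(str, 'utf8');
  if (buf.length <= maxBytes) {
    return str; // already small enough
  }
  let end = maxBytes;
  // Bytes of the form 10xxxxxx are continuation bytes. If buf[end] is one,
  // cutting here would split a character, so back up to its lead byte.
  while (end > 0 && (buf[end] & 0xC0) === 0x80) {
    end -= 1;
  }
  return buf.toString('utf8', 0, end);
}

console.log(truncateUtf8("abç🔥", 7)); // "abç" (the 4-byte emoji does not fit)
```

For the 1 MB limit from the question the call would be truncateUtf8(str, 1024 * 1024).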

Related

Compress group of arrays into smallest possible string

This question can be answered using JavaScript, Underscore or jQuery functions.
given 4 arrays:
[17,17,17,17,17,18,18,18,18,18,19,19,19,19,19,20,20,20,20,20] => x coordinate of unit
[11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15] => y coordinate of unit
[92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92] => x (unit moves to this direction x)
[35,36,37,38,39,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39] => y (unit moves to this direction y)
They are closely related to each other.
For example, the first elements of all arrays, [17,11,92,35], are the x/y coordinates of a unit followed by the x/y coordinates of that unit's target.
So there are 5*4 = 20 units in total. Every unit has slightly different stats.
Visually, the unit x/y coordinates in these 4 arrays look like an army of 20 units "x" (targeting "o"):
xxxxx o
xxxxx o
xxxxx o
xxxxx o
There will always be 4 arrays. Even for 1 unit there will be 4 arrays, each of size 1. This is the simplest and most common situation.
In the real situation, every unit has 20 different stats (keys) in total, and 14 of those keys are usually identical to those of the other units in its group.
So they are grouped as an army with the same stats. The only difference is the coordinates of each unit and the coordinates of its target.
I need to compress all this data into as little data as possible, which can later be decompressed.
There can also be more complex situations, when all these 14 keys happen to be the same but the coordinates do not follow the pattern at all. Example:
[17,17,17,17,17,18,18,18, 215, 18,18,19,19,19,19,19,20,20,20,20,20] => x coordinate of unit
[11,12,13,14,15,11,12,13, 418, 14,15,11,12,13,14,15,11,12,13,14,15] => y coordinate of unit
[92,92,92,92,92,92,92,92, -78, 92,92,92,92,92,92,92,92,92,92,92,92] => x (unit moves to this direction x)
[35,36,37,38,39,35,36,37, -887, 38,39,35,36,37,38,39,35,36,37,38,39] => y (unit moves to this direction y)
In this situation I need to extract this array as 2 different armies. When there are fewer than 3 units in an army, I simply write these units without the pattern, as [215,418,-78,-887],[..], and if there are more than 2 units in an army, I need a compressed string with a pattern, which can be decompressed later. In this example there are 21 units. They just have to be split into an army of 1 unit and an army of (5x4 = 20) units.
In assumption that every n units has a pattern,
encode units with
n: sequence units count
ssx: start of source x
dsx: difference of source x
ssy: start of source y
dsy: difference of source y
stx: start of target x
dtx: difference of target x
sty: start of target y
dty: difference of target y
by the array: [n,ssx,dsx,ssy,dsy,stx,dtx,sty,dty]
so that the units:
[17,17,17,17,17],
[11,12,13,14,15],
[92,92,92,92,92],
[35,36,37,38,39]
are encoded:
[5,17,0,11,1,92,0,35,1]
of course if you know in advance that, for example the y targets are always the same for such a sequence you can give up the difference parameter, to have:
[n,ssx,dsx,ssy,---,stx,dtx,sty,---] => [n,ssx,dsx,ssy,stx,dtx,sty], and so on.
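Decoding such a pattern back into the four arrays could look like the sketch below. The name expandPattern is hypothetical, and the sketch assumes the full nine-element form [n,ssx,dsx,ssy,dsy,stx,dtx,sty,dty] with a constant step per unit:

```javascript
// Hypothetical decoder for one [n,ssx,dsx,ssy,dsy,stx,dtx,sty,dty] pattern.
// Returns [sourceX, sourceY, targetX, targetY], each of length n.
function expandPattern([n, ssx, dsx, ssy, dsy, stx, dtx, sty, dty]) {
  const sourceX = [], sourceY = [], targetX = [], targetY = [];
  for (let i = 0; i < n; i++) {
    sourceX.push(ssx + i * dsx); // start value plus i steps of the difference
    sourceY.push(ssy + i * dsy);
    targetX.push(stx + i * dtx);
    targetY.push(sty + i * dty);
  }
  return [sourceX, sourceY, targetX, targetY];
}

console.log(expandPattern([5, 17, 0, 11, 1, 92, 0, 35, 1]));
// [[17,17,17,17,17],[11,12,13,14,15],[92,92,92,92,92],[35,36,37,38,39]]
```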
For interruption of a pattern like you mentioned in your last example, you can use other 'extra' arrays, and then insert them in the sequence, with:
exsx: extra unit starting x
exsy: extra unit starting y
extx: extra unit target x
exty: extra unit target y
m: insert extra unit at
so that the special case is encoded:
{
patterns:[
[5,17,0,11,1,92,0,35,1],
[5,18,0,11,1,92,0,35,1],
[5,19,0,11,1,92,0,35,1],
[5,20,0,11,1,92,0,35,1]
],
extras: [
[215,418,-78,-887,8] // 8 - because you want to insert this unit at index 8
]
}
Again, this is a general encoding. Any specific properties for the patterns may further reduce the encoded representation.
Hope this helps.
High compression using bitstreams
You can encode sets of values into a bit stream, which lets you remove unused bits. The numbers you have shown are no larger than 887 in magnitude (ignoring the sign), which means every number fits into 10 bits, saving 54 bits per number (JavaScript uses 64-bit numbers).
Run length compression
You also have many repeated sets of numbers which you can use run length compression on. You set a flag in the bitstream that indicates that the following set of bits represents a repeated sequence of numbers, then you have the number of repeats and the value to repeat. For sequences of random numbers you just keep them as is.
If you use run-length compression you create a block type structure in the bit stream, this makes it possible to embed further compression. As you have many numbers that are below 128 many of the numbers can be encoded into 7 bits, or even less. For a small overhead (in this case 2 bits per block) you can select the smallest bit size to pack all the numbers in that block in.
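To make the run-length idea concrete, here is a simplified sketch that works on plain [count, value] pairs rather than on the bit-level blocks described in this answer. The function names are made up for illustration:

```javascript
// Simplified run-length coding on arrays (not the bitstream block format).
// Each run is stored as a [count, value] pair.
function runLengthEncode(values) {
  const runs = [];
  for (const value of values) {
    const last = runs[runs.length - 1];
    if (last && last[1] === value) {
      last[0] += 1; // same value as the previous one: extend the run
    } else {
      runs.push([1, value]); // different value: start a new run
    }
  }
  return runs;
}

function runLengthDecode(runs) {
  const out = [];
  for (const [count, value] of runs) {
    for (let i = 0; i < count; i++) out.push(value);
  }
  return out;
}

console.log(runLengthEncode([17, 17, 17, 18, 18, 92])); // [[3,17],[2,18],[1,92]]
```

Runs of length 1 cost more than the raw value in this form, which is why the format in this answer keeps non-repeating numbers in separate sequence blocks.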
Variable bit depth numbers
I have created a number type value that represent the number of bits used to store numbers in a block. Each block has a number type and all numbers in the block use that type. There are 4 number types that can be encoded into 2 bits.
00 = 4 bit numbers. Range 0-15
01 = 5 bit numbers. Range 0-31
10 = 7 bit numbers. Range 0-127
11 = 10 bit numbers. Range 0-1023
The bitstream
To make this easy you will need a bit stream read/write. It allows you to easily write and read bits from a stream of bits.
// Simple unsigned bit stream
// Read and write to and from a bit stream.
// Numbers are stored as Big endian
// Does not comprehend sign so wordlength should be less than 32 bits
// methods
// eof(); // returns true if read pos > buffer bit size
// read(numberBits); // number of bits to read as one number. No sign so < 32
// write(value,numberBits); // value to write, number of bits to write < 32
// getBuffer(); // return object with buffer and array of numbers, and bitLength the total number of bits
// setBuffer(buffer,bitLength); // the buffers as an array of numbers, and bitLength the total number of bits
// Properties
// wordLength; // read only length of a word.
function BitStream(){
var buffer = [];
var pos = 0;
var numBits = 0;
const wordLength = 16;
this.wordLength = wordLength;
// read a single bit
var readBit = function(){
var b = buffer[Math.floor(pos / wordLength)]; // get word
b = (b >> ((wordLength - 1) - (pos % wordLength))) & 1;
pos += 1;
return b;
}
// write a single bit. Will fill bits with 0 if write pos is moved past buffer length
var writeBit = function(bit){
var rP = Math.floor(pos / wordLength);
if(rP >= buffer.length){ // check that the buffer has values at current pos.
var end = buffer.length; // fill buffer up to pos with zero
while(end <= rP){
buffer[end] = 0;
end += 1;
}
}
var b = buffer[rP];
bit &= 1; // mask out any unwanted bits
bit <<= (wordLength - 1) - (pos % wordLength);
b |= bit;
buffer[rP] = b;
pos += 1;
}
// returns true if past eof
this.eof = function(){
return pos >= numBits;
}
// reads number of bits as a Number
this.read = function(bits){
var v = 0;
while(bits > 0){
v <<= 1;
v |= readBit();
bits -= 1;
}
return v;
}
// writes value to bit stream
this.write = function(value,bits){
var v;
while(bits > 0){
bits -= 1;
writeBit( (value >> bits) & 1 );
}
}
// returns the buffer and length
this.getBuffer = function(){
return {
buffer : buffer,
bitLength : pos,
};
}
// set the buffer and length and returns read write pos to start
this.setBuffer = function(_buffer,bitLength){
buffer = _buffer;
numBits = bitLength;
pos = 0;
}
}
A format for your numbers
Now to design the format. The first bit read from a stream is a sequence flag, if 0 then the following block will be a repeated value, if 1 the following block will be a sequence of random numbers.
Block bits : description;
repeat block holds a repeated number
bit 0 : Val 0 = repeat
bit 1 : Val 0 = 4bit repeat count or 1 = 5bit repeat count
then either
bits 2,3,4,5 : 4 bit number of repeats - 1
bits 6,7 : 2 bit Number type
or
bits 2,3,4,5,6 : 5 bit number of repeats - 1
bits 7,8 : 2 bit Number type
Followed by
Then a value that will be repeated depending on the number type
End of block
sequence block holds a sequence of random numbers
bit 0 : Val 1 = sequence
bit 1 : Val 0 = positive sequence Val 1 = negative sequence
bits 2,3,4,5 : 4 bit number of numbers in sequence - 1
bits 6,7 : 2 bit Number type
then the sequence of numbers in the number format
End of block.
Keep reading blocks until the end of file.
Encoder and decoder
The following object will encode and decode a flat array of numbers. It only handles numbers up to 10 bits long, so no values over 1023 or under -1023.
If you want larger numbers you will have to change the number types that are used. To do this change the arrays
const numberSize = [0,0,0,0,0,1,2,2,3,3,3]; // the number bit depth
const numberBits = [4,5,7,10]; // the number bit depth lookup;
For example, to allow numbers of up to 12 bits (-4095 to 4095; the sign bit is stored in the block encoding), use the arrays below, in which the 7 bit number type has also been changed to 8 bits. The first array is used to look up the number type from the bit count: if a number needs bitCount bits, numberSize[bitCount] gives its number type, and numberBits[numberSize[bitCount]] gives the number of bits used to store it.
const numberSize = [0,0,0,0,0,1,2,2,2,3,3,3,3]; // the number bit depth
const numberBits = [4,5,8,12]; // the number bit depth lookup;
function ArrayZip(){
var zipBuffer = 0;
const numberSize = [0,0,0,0,0,1,2,2,3,3,3]; // the number bit depth lookup;
const numberBits = [4,5,7,10]; // the number bit depth lookup;
this.encode = function(data){ // encodes the data
var pos = 0;
function getRepeat(){ // returns the number of repeat values
var p = pos + 1;
if(data[pos] < 0){
return 1; // ignore negative numbers
}
while(p < data.length && data[p] === data[pos]){
p += 1;
}
return p - pos;
}
function getNoRepeat(){ // returns the number of non repeat values
// if the sequence has negative numbers then
// the length is returned as a negative
var p = pos + 1;
if(data[pos] < 0){ // negative numbers
while(p < data.length && data[p] !== data[p-1] && data[p] < 0){
p += 1;
}
return -(p - pos);
}
while(p < data.length && data[p] !== data[p-1] && data[p] >= 0){
p += 1;
}
return p - pos;
}
function getMax(count){
var max = 0;
var p = pos;
while(count > 0){
max = Math.max(Math.abs(data[p]),max);
p += 1;
count -= 1;
}
return max;
}
var out = new BitStream();
while(pos < data.length){
var reps = getRepeat();
if(reps > 1){
var bitCount = numberSize[Math.ceil(Math.log(getMax(reps) + 1) / Math.log(2))];
if(reps < 16){
out.write(0,1); // repeat header
out.write(0,1); // use 4 bit repeat count;
out.write(reps-1,4); // write 4 bit number of reps
out.write(bitCount,2); // write 2 bit number size
out.write(data[pos],numberBits[bitCount]);
pos += reps;
}else {
if(reps > 32){ // if more than can fit in one repeat block split it
reps = 32;
}
out.write(0,1); // repeat header
out.write(1,1); // use 5 bit repeat count;
out.write(reps-1,5); // write 5 bit number of reps
out.write(bitCount,2); // write 2 bit number size
out.write(data[pos],numberBits[bitCount]);
pos += reps;
}
}else{
var seq = getNoRepeat(); // get number no repeats
var neg = seq < 0 ? 1 : 0; // found negative numbers
seq = Math.min(16,Math.abs(seq));
// check if last value is the start of a repeating block
if(seq > 1){
var tempPos = pos;
pos += seq;
seq -= getRepeat() > 1 ? 1 : 0;
pos = tempPos;
}
// get the max bit count to hold numbers
var bitCount = numberSize[Math.ceil(Math.log(getMax(seq) + 1) / Math.log(2))];
out.write(1,1); // sequence header
out.write(neg,1); // write negative flag
out.write(seq - 1,4); // write sequence length;
out.write(bitCount,2); // write 2 bit number size
while(seq > 0){
out.write(Math.abs(data[pos]),numberBits[bitCount]);
pos += 1;
seq -= 1;
}
}
}
// get the bit stream buffer
var buf = out.getBuffer();
// start bit stream with number of trailing bits. There are 4 bits used of 16 so plenty
// of room for alternative encoding flags.
var str = String.fromCharCode(buf.bitLength % out.wordLength);
// convert bit stream to characters
for(var i = 0; i < buf.buffer.length; i ++){
str += String.fromCharCode(buf.buffer[i]);
}
// return encoded string
return str;
}
this.decode = function(zip){
var count,rSize,header,_in,i,data,decompressed,endBits,numSize,val,neg;
data = []; // holds character codes
decompressed = []; // holds the decompressed array of numbers
endBits = zip.charCodeAt(0); // get the trailing bits count
for(i = 1; i < zip.length; i ++){ // convert string to numbers
data[i-1] = zip.charCodeAt(i);
}
_in = new BitStream(); // create a bitstream to read the bits
// set the buffer data and length
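// endBits is 0 when the last word is completely used, so count it as a full word
_in.setBuffer(data,(data.length - 1) * _in.wordLength + (endBits || _in.wordLength));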
while(!_in.eof()){ // do until eof
header = _in.read(1); // read header bit
if(header === 0){ // is repeat header
rSize = _in.read(1); // get repeat count size
if(rSize === 0){
count = _in.read(4); // get 4 bit repeat count
}else{
count = _in.read(5); // get 5 bit repeat count
}
numSize = _in.read(2); // get 2 bit number size type
val = _in.read(numberBits[numSize]); // get the repeated value
while(count >= 0){ // add to the data count + 1 times
decompressed.push(val);
count -= 1;
}
}else{
neg = _in.read(1); // read neg flag
count = _in.read(4); // get 4 bit seq count
numSize = _in.read(2); // get 2 bit number size type
while(count >= 0){
if(neg){ // if negative numbers convert to neg
decompressed.push(-_in.read(numberBits[numSize]));
}else{
decompressed.push(_in.read(numberBits[numSize]));
}
count -= 1;
}
}
}
return decompressed;
}
}
The best way to store a bit stream is as a string. JavaScript has Unicode strings, so we can pack 16 bits into every character.
The results and how to use.
You need to flatten the array. If you need extra info to reinstate the multi-dimensional arrays, just add that to the array and let the compressor compress it along with the rest.
// flatten the array
var data = [17,17,17,17,17,18,18,18,18,18,19,19,19,19,19,20,20,20,20,20,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39];
var zipper = new ArrayZip();
var encoded = zipper.encode(data); // packs the 80 numbers in data into 21 characters.
// compression rate of the data array 5120 bits to 336 bits
// 93% compression.
// or as a flat 7bit ascii string as numbers 239 characters (including ,)
// 239 * 7 bits = 1673 bits to 336 bits 80% compression.
var decoded = zipper.decode(encoded);
I did not notice the negative numbers at first so the compression does not do well with the negative values.
var data = [17,17,17,17,17,18,18,18, 215, 18,18,19,19,19,19,19,20,20,20,20,20, 11,12,13,14,15,11,12,13, 418, 14,15,11,12,13,14,15,11,12,13,14,15, 92,92,92,92,92,92,92,92, -78, 92,92,92,92,92,92,92,92,92,92,92,92, 35,36,37,38,39,35,36,37, -887, 38,39,35,36,37,38,39,35,36,37,38,39]
var encoded = zipper.encode(data); // packs the 84 numbers in data into 33 characters.
// compression rate of the data array 5376 bits to 528 bits
var decoded = zipper.decode(encoded);
Summary
As you can see this results in a very high compression rate (almost twice as good as LZ compression). The code is far from optimal, and you could easily implement a multi-pass compressor with various settings (there are 12 spare bits at the start of the encoded string that can be used to select options to improve compression).
Also, I did not see the negative numbers until I came back to post, so the fix for negatives is not good. You can squeeze some more out of it by modifying the BitStream to understand negatives (i.e. use the >>> operator).

Find line by character position

I have a string of around 4MB (4 million characters) and around 30,000 lines in a variable. Next I have the index of a character, let's say 3605506. What would be the quickest, most efficient way to find which line this character is on? I need to do this hundreds of times after each other, so that's why it's relatively important it's efficient.
Pass the string and an index to the function below. It splits the string on newline characters and checks whether the running count has passed the index value.
function getlineNumberofChar(data,index) {
var perLine = data.split('\n');
var total_length = 0;
for (var i = 0; i < perLine.length; i++) {
total_length += perLine[i].length;
if (total_length >= index)
return i + 1;
}
}
Similar to brute_force but with the off-by-1 error fixed. Also returns column number.
const lines = code.split('\n')
function findLineColForByte(lines, index) {
let totalLength = 0
let lineStartPos = 0
for (let lineNo = 0; lineNo < lines.length; lineNo++) {
totalLength += lines[lineNo].length + 1 // Because we removed the '\n' during split.
if (index < totalLength) {
const colNo = index - lineStartPos
return [lineNo + 1, colNo]
}
lineStartPos = totalLength
}
}
You mentioned that
I need to do this hundreds of times after each other, so that's why it's relatively important it's efficient.
Most of these solutions require the computations to be done for each lookup, which means you are doing a lot of work over and over again.
To checkpoint some of these computations would (could) improve efficiency greatly.
Of course, first things first we need to split the lines up:
/**
* Returns a tuple (array with two elements) containing the split lines
* and whether or not the last character was a newLine
*
* @param {string} stringData The string to split
*
* @return {array} a tuple containing the lines
* and a boolean for if the last line has a newLine
*/
function splitLines( stringData ) {
var lines = stringData.split("\n");
if(stringData.slice(-1) === '\n') {
lines.pop(); // Remove last empty line
return [lines, true];
} else {
return [lines, false];
}
}
This ensures that our last line is not an empty string. If that does not matter to you, you don't need to check for it.
Next up is computing the cumulative character count for each line, that is, after line x there have been n total characters.
/**
* Returns an array with the cumulative character count from the beginning,
* based on the line number
*
* @param {array} lineData The lines of the string
* @param {boolean} lastLineHasNewLineChar Whether or not the last line had a newLineChar
*
* @return {array} The cumulative character counts for each line
* (e.g.) Line 0 has 18 chars plus a newLine, or 19; Line 1 has 8 chars, so 28, etc, etc.
*/
function buildLineEndingPositions( lineData, lastLineHasNewLineChar = false ) {
var cumulativeSum = (sum => lineCharCount => sum += lineCharCount)(0); // Start sum at 0, keep adding the chars from each line.
var numLines = lineData.length;
var lineLengths = lineData.map( (line, index) => {
if(numLines - 1 === index && !lastLineHasNewLineChar) {
return line.length; // last line, last char was not a new line
} else {
return line.length + 1; // new line char was stripped
}
});
return lineLengths.map(cumulativeSum);
}
Finally, we can compute these once and reuse them for any number of future lookups by character position to determine the line (the first line whose cumulative character count is greater than the zero-based character position).
const testString = "There once was a guy from france\nHe really liked to dance\nUntil one day, his legs ran away\nIdk where I was going with this";
const [testLines, lastLineHadNewLineChar] = splitLines(testString);
const cumulativeCharCounts = buildLineEndingPositions(testLines, lastLineHadNewLineChar);
console.log(cumulativeCharCounts); //[33, 58, 91, 122]
By iterating through the cumulativeCharCounts we can now determine the line number by comparing the desired character position against each entry until we reach the first cumulative count that is greater than the position. The split and cumulative counts are computed once and reused, so each of the hundreds of calls carries less overhead.
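Because the cumulative counts are sorted, one possible lookup is a binary search rather than a linear scan. This is a sketch under assumptions: lineIndexForPosition is a made-up name, positions are zero-based, a newline character is counted as part of the line it ends, and the function returns a zero-based line index (or -1 when the position is out of range).

```javascript
// Hypothetical lookup: binary search over the precomputed cumulative counts.
function lineIndexForPosition(cumulativeCharCounts, position) {
  let low = 0;
  let high = cumulativeCharCounts.length - 1;
  if (position < 0 || high < 0 || position >= cumulativeCharCounts[high]) {
    return -1; // position lies outside the string
  }
  // Find the first line whose cumulative count is greater than position.
  while (low < high) {
    const mid = (low + high) >> 1;
    if (position < cumulativeCharCounts[mid]) {
      high = mid; // the answer is mid or something before it
    } else {
      low = mid + 1; // position lies after line mid
    }
  }
  return low;
}

const counts = [33, 58, 91, 122]; // the cumulativeCharCounts logged above
console.log(lineIndexForPosition(counts, 32)); // 0 (the newline ending line 0)
console.log(lineIndexForPosition(counts, 33)); // 1 (first character of line 1)
```

With around 30,000 lines this takes roughly 15 comparisons per lookup instead of up to 30,000.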
// Let this be your 4MB string.
var str = "This \n is a\n test\n string."
// Let this be the index of the character you are finding within the 4MB string.
var index = str.indexOf("test")
// Create substring from beginning to index of character.
var substr = str.substring(0, index)
// Count the number of new lines.
var numberOfLines = (function(){
try{
// Add 1 to final result to account for the first line.
return substr.match(new RegExp("\n", "g")).length + 1
} catch(e){
// Return 1 if none found because the character is found on the first line.
return 1
}})()

SIP2 Checksum Calculation in Javascript

I'm working on a REST interface to a library system that uses the SIP2 protocol (https://en.wikipedia.org/wiki/Standard_Interchange_Protocol) and was able to get things working on a system that doesn't require error correction without a problem. However, my code is now talking to another system that requires checksums, described as so in the specification:
"To calculate the checksum add each character as an unsigned binary number, take the lower 16 bits of the total and perform a 2's complement. The checksum field is the result represented by four hex digits."
I've taken a few runs at this but no matter what I do I can't get a checksum back that matches my example message. I'm probably making this harder than it should be (seems like it would be easier in a lower-level language with proper binary types, etc.). Here's my latest attempt:
var checksum = 0;
var message = "63AOAA21221021780249|AD9999|AY0AZ";
// add each character as an unsigned binary number
for(var i=0;i<message.length;i++){
checksum += message[i].charCodeAt();
}
console.log("character sum: " + checksum);
// take the lower 16 bits of the total
checksum = checksum.toString(2);
console.log("character sum binary representation: " + checksum);
while(checksum.length < 16){
checksum = "0" + checksum;
}
checksum = checksum.substr(0,16);
console.log("lower 16 bits of character total: " + checksum);
// convert to dec
checksum = parseInt(checksum,2);
console.log("checksum dec: " + checksum);
// perform 2's complement
checksum = (checksum & 0xFFFF) * -1;
console.log("2s complement: " + checksum.toString(2));
// convert to 4 hex digits
checksum = dec2hex(checksum);
console.log("checksum hex: " + checksum);
function dec2hex(i) {
return (i+0x10000).toString(16).substr(-4).toUpperCase();
}
The expected checksum for the string above is "F39A".
I found a way to get there, it's probably not the most elegant approach, and certainly not high-performance, but it works reliably.
Here's the code, should some other unfortunate soul find themselves looking to answer this question. I plan to bundle this up with some other SIP2-related bits in a library, but for now here's a function that will generate the checksum.
function sip2_checksum(message){
var checksum_int = 0;
var checksum_binary_string = "";
var checksum_binary_string_inverted = "";
var checksum_binary_string_inverted_plus1 = "";
var checksum_hex_string = "";
// add each character as an unsigned binary number
for(var i=0;i<message.length;i++){
checksum_int += message[i].charCodeAt();
}
// convert integer to binary representation stored in a string
while(checksum_int > 0){
checksum_binary_string = (checksum_int % 2).toString() + checksum_binary_string;
checksum_int = Math.floor(checksum_int / 2);
}
// pad binary string to 16 bits
while(checksum_binary_string.length < 16){
checksum_binary_string = "0" + checksum_binary_string;
}
// invert the binary string
for(var i=0;i<checksum_binary_string.length;i++){
var inverted_value = "X"; // something weird to make mistakes jump out
if(checksum_binary_string[i] == "1"){
inverted_value = "0";
} else {
inverted_value = "1";
}
checksum_binary_string_inverted += inverted_value;
}
// add 1 to the binary string
var carry_bit = true;
for(var i=checksum_binary_string_inverted.length - 1;i>=0;i--){
if(carry_bit){
if(checksum_binary_string_inverted[i] === "0"){
checksum_binary_string_inverted_plus1 = "1" + checksum_binary_string_inverted_plus1;
carry_bit = false;
} else {
checksum_binary_string_inverted_plus1 = "0" + checksum_binary_string_inverted_plus1;
carry_bit = true;
}
} else {
checksum_binary_string_inverted_plus1 = checksum_binary_string_inverted[i] + checksum_binary_string_inverted_plus1;
}
}
// convert binary string to hex string and uppercase it because that's what the gateway likes
checksum_hex_string = parseInt(checksum_binary_string_inverted_plus1,2).toString(16).toUpperCase();
return checksum_hex_string;
}
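The same steps from the quoted specification (sum the character codes, keep the lower 16 bits, take the two's complement, format as four hex digits) can also be written with bitwise operators. This is only a sketch: the name sip2ChecksumCompact is made up, and it assumes an environment with String.prototype.padStart (ES2017).

```javascript
// Hypothetical compact variant of the checksum described in the spec quote.
function sip2ChecksumCompact(message) {
  let sum = 0;
  for (let i = 0; i < message.length; i++) {
    sum += message.charCodeAt(i); // add each character as an unsigned number
  }
  // Negating and masking with 0xFFFF gives the 16-bit two's complement
  // of the lower 16 bits of the sum.
  const checksum = (-sum) & 0xFFFF;
  // Always emit exactly four uppercase hex digits.
  return checksum.toString(16).toUpperCase().padStart(4, '0');
}

console.log(sip2ChecksumCompact("A")); // "FFBF" (0x10000 - 0x41)
```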

Convert two 32 bit integers to one signed 64 bit integer string

I have a 64 bit unsigned integer I need to represent in PostgreSQL. I've broken it down into two 32 bit unsigned integers, high and low. To allow Postgres to accept it, I need to convert high and low to a string representing a signed 64 bit integer.
How can I go about converting two 32 bit unsigned integers to a string representing in decimal a signed 64 bit integer?
I've done exactly this in Javascript in a quick'n'dirty-but-works'n'fast manner at: Int64HighLowToFromString, using 53-bit mantissa double precision arithmetic and 32-bit bit operations, specialized for decimal input/output.
function Int64HiLoToString(hi,lo){
hi>>>=0;lo>>>=0;
var sign="";
if(hi&0x80000000){
sign="-";
lo=(0x100000000-lo)>>>0;
hi=0xffffffff-hi+ +(lo===0);
}
var dhi=~~(hi/0x5af4),dhirem=hi%0x5af4;
var dlo=dhirem*0x100000000+dhi*0xef85c000+lo;
dhi += ~~(dlo/0x5af3107a4000);
dlo%=0x5af3107a4000;
var slo=""+dlo;
if(dhi){
slo="000000000000000000".slice(0,14-slo.length)+dlo;
return sign+dhi+slo;
}else{
return sign+slo;
}
}
Most likely this is what you needed.
I adapted the base conversion code from https://codegolf.stackexchange.com/questions/1620/arbitrary-base-conversion. Mistakes are mine, clevernesses are theirs.
I also had to add a bunch of code to deal with negative numbers (twos complement).
This code is ecmascript5, and will need slight reworking to work in older browsers.
function convert(hi, lo) {
function invertBit(bit) {
return bit == "0" ? "1" : "0";
}
function binaryInvert(binaryString) {
return binaryString.split("").map(invertBit).join("");
}
function binaryIncrement(binaryString) {
var idx = binaryString.lastIndexOf("0");
return binaryString.substring(0, idx) + "1" + binaryInvert(binaryString.substring(idx + 1));
}
function binaryDecrement(binaryString) {
var idx = binaryString.lastIndexOf("1");
return binaryString.substring(0, idx) + binaryInvert(binaryString.substring(idx));
}
function binaryAbs(binaryString) {
if (binaryString[0] === "1") {
return binaryInvert(binaryDecrement(binaryString));
}
return binaryString;
}
function to32Bits(val) {
var binaryString = val.toString(2);
if (binaryString[0] === "-") {
binaryString = Array(33 - (binaryString.length - 1)).join("1") + binaryInvert(binaryString.substr(1));
return binaryIncrement(binaryString);
}
return Array(33 - binaryString.length).join("0") + binaryString;
}
var fullBinaryNumber = to32Bits(hi) + to32Bits(lo);
var isNegative = fullBinaryNumber[0] === "1";
fullBinaryNumber = binaryAbs(fullBinaryNumber);
var result = "";
while (fullBinaryNumber.length > 0) {
var remainingToConvert = "", resultDigit = 0;
for (var position = 0; position < fullBinaryNumber.length; ++position) {
var currentValue = Number(fullBinaryNumber[position]) + resultDigit * 2;
var remainingDigitToConvert = Math.floor(currentValue / 10);
resultDigit = currentValue % 10;
if (remainingToConvert.length || remainingDigitToConvert) {
remainingToConvert += remainingDigitToConvert;
}
}
fullBinaryNumber = remainingToConvert;
result = resultDigit + result;
}
return (isNegative?"-":"") + result;
}
Examples:
> // most negative number -2^63 (just the most significant bit set)
> convert(1 << 31, 0)
'-9223372036854775808'
> // largest positive number
> convert(0x7fffffff, 0xffffffff)
'9223372036854775807'
> // -1 is all bits set.
> convert(0xffffffff, 0xffffffff)
'-1'
According to "JavaScript can't handle 64-bit integers, can it?", native numbers in JavaScript have 53 bits of mantissa, so JS can't deal with 64-bit integers without specialized libraries.
Whatever the datatype and implementation limits, I assume you want to compute the Two's complement of the initial 64 bits unsigned number, to convert it from the [0 ... 2^64-1] range into the [-2^63 ... 2^63-1] range.
high is presumably the initial unsigned 64 bits number divided by 2^32, and low is the remainder.
The conversion to a signed 64 bits should go like this:
if high>=2^31 then
s64 = -(2^64-(high*2^32+low))
else
s64 = high*2^32+low;
In a PostgreSQL function, this can be done using the exact-precision numeric type to avoid overflows in intermediate multiplications, and downcast the final result to bigint (signed 64 bits):
create function concat64(bigint, bigint) returns bigint
as $$
select (case when $1>=2147483648
then -(18446744073709551616::numeric-($1*4294967296::numeric+$2))
else $1*4294967296::numeric+$2 end)::bigint;
$$ language sql;
The input arguments have to be bigint (64 bits) because postgres doesn't have unsigned types.
They're assumed to be in the [0..4294967295] range and the output should be in the [-9223372036854775808..9223372036854775807] range.
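In JavaScript engines that support BigInt (ES2020), the same conversion can be sketched without manual two's complement arithmetic. The function name hiLoToSignedString is made up for illustration, and hi and lo are assumed to be 32-bit values:

```javascript
// Hypothetical helper: combine two 32-bit halves and reinterpret the
// 64-bit pattern as a signed integer, returned as a decimal string.
function hiLoToSignedString(hi, lo) {
  // >>> 0 normalizes each half to an unsigned 32-bit value first.
  const unsigned = (BigInt(hi >>> 0) << 32n) | BigInt(lo >>> 0);
  // asIntN(64, x) wraps x into the range [-2^63, 2^63 - 1].
  return BigInt.asIntN(64, unsigned).toString();
}

console.log(hiLoToSignedString(0xffffffff, 0xffffffff)); // "-1"
console.log(hiLoToSignedString(0x7fffffff, 0xffffffff)); // "9223372036854775807"
```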

Converting large integer to 8 byte array in JavaScript

I'm trying to convert a large number into an 8 byte array in javascript.
Here is an IMEI that I am passing in: 45035997012373300
var bytes = new Array(8);
for(var k=0;k<8;k++) {
bytes[k] = value & (255);
value = value / 256;
}
This ends up giving the byte array: 48,47,7,44,0,0,160,0. Converted back to a long, the value is 45035997012373296, which is 4 less than the correct value.
Any idea why this is and how I can fix it to serialize into the correct bytes?
Since you are converting from decimal to bytes, dividing by 256 is an operation that is pretty easily simulated by splitting up a number in a string into parts. There are two mathematical rules that we can take advantage of.
The right-most n digits of a decimal number can determine divisibility by 2^n.
10^n will always be divisible by 2^n.
Thus we can take the number and split off the right-most 8 digits to find the remainder (i.e., & 255), divide the right part by 256, and then also divide the left part of the number by 256 separately. The remainder from the left part can be shifted into the right part of the number (the right-most 8 digits) by the formula n*10^8 \ 256 = (q*256+r)*10^8 \ 256 = q*256*10^8\256 + r*10^8\256 = q*10^8 + r*5^8, where \ is integer division and q and r are quotient and remainder, respectively for n \ 256. This yields the following method to do integer division by 256 for strings of up to 23 digits (15 normal JS precision + 8 extra yielded by this method) in length:
function divide256(n)
{
if (n.length <= 8)
{
return (Math.floor(parseInt(n) / 256)).toString();
}
else
{
var top = n.substring(0, n.length - 8);
var bottom = n.substring(n.length - 8);
var topVal = Math.floor(parseInt(top) / 256);
var bottomVal = Math.floor(parseInt(bottom) / 256);
var rem = (100000000 / 256) * (parseInt(top) % 256);
bottomVal += rem;
topVal += Math.floor(bottomVal / 100000000); // shift back possible carry
bottomVal %= 100000000;
if (topVal == 0) return bottomVal.toString();
else return topVal.toString() + ("00000000" + bottomVal).slice(-8); // keep the low part 8 digits wide
}
}
Technically this could be implemented to divide an integer of any arbitrary size by 256, simply by recursively breaking the number into 8-digit parts and handling the division of each part separately using the same method.
Here is a working implementation that calculates the correct byte array for your example number (45035997012373300): http://jsfiddle.net/kkX2U/.
[52, 47, 7, 44, 0, 0, 160, 0]
Your value and the largest JavaScript integer compared:
45035997012373300 // Yours
9007199254740992 // JavaScript's biggest integer
JavaScript cannot represent your original value exactly as an integer; that's why your script breaking it down gives you an inexact representation.
Related:
var diff = 45035997012373300 - 45035997012373298;
// 0 (not 2)
Edit: If you can express your number as a hexadecimal string:
function bytesFromHex(str,pad){
if (str.length%2) str="0"+str;
var bytes = str.match(/../g).map(function(s){
return parseInt(s,16);
});
if (pad) for (var i=bytes.length;i<pad;++i) bytes.unshift(0);
return bytes;
}
var imei = "a000002c072f34";
var bytes = bytesFromHex(imei,8);
// [0,160,0,0,44,7,47,52]
If you need the bytes ordered from least-to-most significant, throw a .reverse() on the result.
Store the IMEI as a hex string (if you can), then parse the string in that manner; this way you keep the precision when you build the array. I will be back with a PoC when I get home to my regular computer, if this question has not been answered.
something like:
function parseHexString(str){
var array = []; // collects one byte per pair of hex digits
for (var i=0, j=0; i<str.length; i+=2, j++){
array[j] = parseInt(str.substr(i, 2), 16);
}
return array;
}
or close to that whatever...
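If the number is only available in decimal, engines with BigInt support (ES2020) can keep full precision when the value is passed as a string. A minimal sketch, with the hypothetical name decimalStringToBytesLE, producing the bytes from least to most significant as in the question:

```javascript
// Hypothetical helper: split a decimal string into byteCount bytes,
// least significant byte first, without losing precision.
function decimalStringToBytesLE(decimal, byteCount) {
  let value = BigInt(decimal); // parse from a string, not a Number literal
  const bytes = [];
  for (let k = 0; k < byteCount; k++) {
    bytes.push(Number(value & 0xFFn)); // lowest 8 bits
    value >>= 8n;                      // move on to the next byte
  }
  return bytes;
}

console.log(decimalStringToBytesLE("45035997012373300", 8));
// [52, 47, 7, 44, 0, 0, 160, 0]
```

The value has to stay a string until BigInt parses it; writing it as a Number literal first would already round it.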
