I just took a coding test online and this one question really bothered me. My solution was correct but was rejected for being unoptimized. The question is as following:
Write a function combineTheGivenNumber taking two arguments:
numArray: number[]
num: a number
The function should check all the concatenation pairs that can result in making a number equal to num and return their count.
E.g. if numArray = [1, 212, 12, 12] & num = 1212 then we will have return value of 3 from combineTheGivenNumber
The pairs are as following:
numArray[0]+numArray[1]
numArray[2]+numArray[3]
numArray[3]+numArray[2]
The function I wrote for this purpose is as following:
function combineTheGivenNumber(numArray, num) {
//convert all numbers to strings for easy concatenation
numArray = numArray.map(e => e+'');
//also convert the `hay` to string for easy comparison
num = num+'';
let pairCounts = 0;
// itereate over the array to get pairs
numArray.forEach((e,i) => {
numArray.forEach((f,j) => {
if(i!==j && num === (e+f)) {
pairCounts++;
}
});
});
return pairCounts;
}
console.log('Test 1: ', combineTheGivenNumber([1,212,12,12],1212));
console.log('Test 2: ', combineTheGivenNumber([4,21,42,1],421));
From my experience, I know conversion of number to string is slow in JS, but I am not sure whether my approach is wrong/lack of knowledge or does the tester is ignorant of this fact. Can anyone suggest further optimization of the code snipped?
Elimination of string to number to string will be a significant speed boost but I am not sure how to check for concatenated numbers otherwise.
Elimination of string to number to string will be a significant speed boost
No, it won't.
Firstly, you're not converting strings to numbers anywhere, but more importantly the exercise asks for concatenation so working with strings is exactly what you should do. No idea why they're even passing numbers. You're doing fine already by doing the conversion only once for each number input, not every time your form a pair. And last but not least, avoiding the conversion will not be a significant improvement.
To get a significant improvement, you should use a better algorithm. #derpirscher is correct in his comment: "[It's] the nested loop checking every possible combination which hits the time limit. For instance for your example, when the outer loop points at 212 you don't need to do any checks, because regardless, whatever you concatenate to 212, it can never result in 1212".
So use
let pairCounts = 0;
numArray.forEach((e,i) => {
if (num.startsWith(e)) {
//^^^^^^^^^^^^^^^^^^^^^^
numArray.forEach((f,j) => {
if (i !== j && num === e+f) {
pairCounts++;
}
});
}
});
You might do the same with suffixes, but it becomes more complicated to rule out concatenation to oneself there.
Optimising further, you can even achieve a linear complexity solution by putting the strings in a lookup structure, then when finding a viable prefix just checking whether the missing part is an available suffix:
function combineTheGivenNumber(numArray, num) {
const strings = new Map();
for (const num of numArray) {
const str = String(num);
strings.set(str, 1 + (strings.get(str) ?? 0));
}
const whole = String(num);
let pairCounts = 0;
for (const [prefix, pCount] of strings) {
if (!whole.startsWith(prefix))
continue;
const suffix = whole.slice(prefix.length);
if (strings.has(suffix)) {
let sCount = strings.get(suffix);
if (suffix == prefix) sCount--; // no self-concatenation
pairCounts += pCount*sCount;
}
}
return pairCounts;
}
(the proper handling of duplicates is a bit difficile)
I like your approach of going to strings early. I can suggest a couple of simple optimizations.
You only need the numbers that are valid "first parts" and those that are valid "second parts"
You can use the javascript .startsWith and .endsWith to test for those conditions. All other strings can be thrown away.
The lengths of the strings must add up to the length of the desired answer
Suppose your target string is 8 digits long. If you have 2 valid 3-digit "first parts", then you only need to know how many valid 5-digit "second parts" you have. Suppose you have 9 of them. Those first parts can only combine with those second parts, and give you 2 * 9 = 18 valid pairs.
You don't actually need to keep the strings!
It struck me that if you know you have 2 valid 3-digit "first parts", you don't need to keep those actual strings. Knowing that they are valid 2-digit first parts is all you need to know.
So let's build an array containing:
How many valid 1-digit first parts do we have?,
How many valid 2-digit first parts do we have?,
How many valid 3-digit first parts do we have?,
etc.
And similarly an array containing the number of valid 1-digit second parts, etc.
X first parts and Y second parts can be combined in X * Y ways
Except if the parts are the same length, in which case we are reusing the same list, and so it is just X * (Y-1).
So not only do we not need to keep the strings, but we only need to do the multiplication of the appropriate elements of the arrays.
5 1-char first parts & 7 3-char second parts = 5 * 7 = 35 pairs
6 2-char first part & 4 2-char second parts = 6 * (4-1) = 18 pairs
etc
So this becomes extremely easy. One pass over the strings, tallying the "first part" and "second part" matches of each length. This can be done with an if and a ++ of the relevant array element.
Then one pass over the lengths, which will be very quick as the array of lengths will be very much shorter than the array of actual strings.
function combineTheGivenNumber(numArray, num) {
const sElements = numArray.map(e => "" + e);
const sTarget = "" + num;
const targetLength = sTarget.length
const startsByLen = (new Array(targetLength)).fill(0);
const endsByLen = (new Array(targetLength)).fill(0);
sElements.forEach(sElement => {
if (sTarget.startsWith(sElement)) {
startsByLen[sElement.length]++
}
if (sTarget.endsWith(sElement)) {
endsByLen[sElement.length]++
}
})
// We can now throw away the strings. We have two separate arrays:
// startsByLen[1] is the count of strings (without attempting to remove duplicates) which are the first character of the required answer
// startsByLen[2] similarly the count of strings which are the first 2 characters of the required answer
// etc.
// and endsByLen[1] is the count of strings which are the last character ...
// and endsByLen[2] is the count of strings which are the last 2 characters, etc.
let pairCounts = 0;
for (let firstElementLength = 1; firstElementLength < targetLength; firstElementLength++) {
const secondElementLength = targetLength - firstElementLength;
if (firstElementLength === secondElementLength) {
pairCounts += startsByLen[firstElementLength] * (endsByLen[secondElementLength] - 1)
} else {
pairCounts += startsByLen[firstElementLength] * endsByLen[secondElementLength]
}
}
return pairCounts;
}
console.log('Test 1: ', combineTheGivenNumber([1, 212, 12, 12], 1212));
console.log('Test 2: ', combineTheGivenNumber([4, 21, 42, 1], 421));
Depending on a setup, the integer slicing can be marginally faster
Although in the end it falls short
Also, when tested on higher N values, the previous answer exploded in jsfiddle. Possibly a memory error.
As far as I have tested with both random and hand-crafted values, my solution holds. It is based on an observation, that if X, Y concantenated == Z, then following must be true:
Z - Y == X * 10^(floor(log10(Y)) + 1)
an example of this:
1212 - 12 = 1200
12 * 10^(floor((log10(12)) + 1) = 12 * 10^(1+1) = 12 * 100 = 1200
Now in theory, this should be faster then manipulating strings. And in many other languages it most likely would be. However in Javascript as I just learned, the situation is a bit more complicated. Javascript does some weird things with casting that I haven't figured out yet. In short - when I tried storing the numbers(and their counts) in a map, the code got significantly slower making any possible gains from this logarithm shenanigans evaporate. Furthermore, storing them in a custom-crafted data structure isn't guaranteed to be faster since you have to build it etc. Also it would be quite a lot of work.
As it stands this log comparison is ~ 8 times faster in a case without(or with just a few) matches since the quadratic factor is yet to kick in. As long as the possible postfix count isn't too high, it will outperform the linear solution. Unfortunately it is still quadratic in nature with the breaking point depending on a total number of strings as well as their length.
So if you are searching for a needle in a haystack - for example you are looking for a few pairs in a huge heap of numbers, this can help. In the other case of searching for many matches, this won't help. Similarly, if the input array was sorted, you could use binary search to push the breaking point further up.
In the end, unless you manage to figure out how to store ints in a map(or some custom implementation of it) in a way that doesn't completely kill the performance, the linear solution of the previous answer will be faster. It can still be useful even with the performance hit if your computation is going to be memory heavy. Storing numbers takes less space then storing strings.
var log10 = Math.log(10)
function log10floored(num) {
return Math.floor(Math.log(num) / log10)
}
function combineTheGivenNumber(numArray, num) {
count = 0
for (var i=0; i!=numArray.length; i++) {
let portion = num - numArray[i]
let removedPart = Math.pow(10, log10floored(numArray[i]))
if (portion % (removedPart * 10) == 0) {
for (var j=0; j!=numArray.length; j++) {
if (j != i && portion / (removedPart * 10) == numArray[j] ) {
count += 1
}
}
}
}
return count
}
//The previous solution, that I used for timing, comparison and check purposes
function combineTheGivenNumber2(numArray, num) {
const strings = new Map();
for (const num of numArray) {
const str = String(num);
strings.set(str, 1 + (strings.get(str) ?? 0));
}
const whole = String(num);
let pairCounts = 0;
for (const [prefix, pCount] of strings) {
if (!whole.startsWith(prefix))
continue;
const suffix = whole.slice(prefix.length);
if (strings.has(suffix)) {
let sCount = strings.get(suffix);
if (suffix == prefix) sCount--; // no self-concatenation
pairCounts += pCount*sCount;
}
}
return pairCounts;
}
var myArray = []
for (let i =0; i!= 10000000; i++) {
myArray.push(Math.floor(Math.random() * 1000000))
}
var a = new Date()
t1 = a.getTime()
console.log('Test 1: ', combineTheGivenNumber(myArray,15285656));
var b = new Date()
t2 = b.getTime()
console.log('Test 2: ', combineTheGivenNumber2(myArray,15285656));
var c = new Date()
t3 = c.getTime()
console.log('Test1 time: ', t2 - t1)
console.log('test2 time: ', t3 - t2)
Small update
As long as you are willing to take a performance hit with the setup and settle for the ~2 times performance, using a simple "hashing" table can help.(Hashing tables are nice and tidy, this is a simple modulo lookup table. The principle is similar though.)
Technically this isn't linear, practicaly it is enough for the most cases - unless you are extremely unlucky and all your numbers fall in the same bucket.
function combineTheGivenNumber(numArray, num) {
count = 0
let size = 1000000
numTable = new Array(size)
for (var i=0; i!=numArray.length; i++) {
let idx = numArray[i] % size
if (numTable[idx] == undefined) {
numTable[idx] = [numArray[i]]
} else {
numTable[idx].push(numArray[i])
}
}
for (var i=0; i!=numArray.length; i++) {
let portion = num - numArray[i]
let removedPart = Math.pow(10, log10floored(numArray[i]))
if (portion % (removedPart * 10) == 0) {
if (numTable[portion / (removedPart * 10) % size] != undefined) {
let a = numTable[portion / (removedPart * 10) % size]
for (var j=0; j!=a.length; j++) {
if (j != i && portion / (removedPart * 10) == a[j] ) {
count += 1
}
}
}
}
}
return count
}
Here's a simplified, and partially optimised approach with 2 loops:
// let's optimise 'combineTheGivenNumber', where
// a=array of numbers AND n=number to match
const ctgn = (a, n) => {
// convert our given number to a string using `toString` for clarity
// this isn't entirely necessary but means we can use strict equality later
const ns = n.toString();
// reduce is an efficient mechanism to return a value based on an array, giving us
// _=[accumulator], na=[array number] and i=[index]
return a.reduce((_, na, i) => {
// convert our 'array number' to an 'array number string' for later concatenation
const nas = na.toString();
// iterate back over our array of numbers ... we're using an optimised/reverse loop
for (let ii = a.length - 1; ii >= 0; ii--) {
// skip the current array number
if (i === ii) continue;
// string + number === string, which lets us strictly compare our 'number to match'
// if there's a match we increment the accumulator
if (a[ii] + nas === ns) ++_;
}
// we're done
return _;
}, 0);
}
This question can be answered using javascript, underscore or Jquery functions.
given 4 arrays:
[17,17,17,17,17,18,18,18,18,18,19,19,19,19,19,20,20,20,20,20] => x coordinate of unit
[11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15] => y coordinate of unit
[92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92] => x (unit moves to this direction x)
[35,36,37,38,39,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39] => y (unit moves to this direction y)
They are very related to each other.
For example first element of all arrays: [17,11,92,35] is unit x/y coordinates and also x/y coordinates of this units target.
So here are totally 5*4 = 20 units. Every unit has slightly different stats.
These 4 arrays of units x/y coordinates visually looks like an army of 20 units "x" (targeting "o"):
xxxxx o
xxxxx o
xxxxx o
xxxxx o
There will always be 4 arrays. Even if 1 unit, there will be 4 arrays, but each size of 1. This is the simplest situation and most common.
In real situation, every unit has totally 20 different stats(keys) and 14 keys are mostly exact to other group of units - all 14 keys.
So they are grouped as an army with same stats. Difference is only coordinates of the unit and also coordinates of the units target.
I need to compress all this data into as small as possible data, which later can be decompressed.
There can also be more complex situations, when all these 14 keys are accidently same, but coordinates are totally different from pattern. Example:
[17,17,17,17,17,18,18,18, 215, 18,18,19,19,19,19,19,20,20,20,20,20] => x coordinate of unit
[11,12,13,14,15,11,12,13, 418, 14,15,11,12,13,14,15,11,12,13,14,15] => y coordinate of unit
[92,92,92,92,92,92,92,92, -78, 92,92,92,92,92,92,92,92,92,92,92,92] => x (unit moves to this direction x)
[35,36,37,38,39,35,36,37, -887, 38,39,35,36,37,38,39,35,36,37,38,39] => y (unit moves to this direction y)
In this situation i need to extract this array as for 2 different armies. When there are less than 3 units in army, i just simply write these units without the
pattern - as [215,418,-78,-887],[..] and if there are more than 2 units army, i need a compressed string with pattern, which can be decompressed later. In this example there are 21 units. It just has to be splitted into armies of 1 unit and (5x4 = 20) untis army.
In assumption that every n units has a pattern,
encode units with
n: sequence units count
ssx: start of source x
dsx: difference of source x
ssy: start of source y
dsy: difference of source y
stx: start of target x
dtx: difference of target x
sty: start of target y
dty: difference of target y
by the array: [n,ssx,dsx,ssy,dsy,stx,dtx,sty,dty]
so that the units:
[17,17,17,17,17],
[11,12,13,14,15],
[92,92,92,92,92],
[35,36,37,38,39]
are encoded:
[5,17,0,11,1,92,0,35,1]
of course if you know in advance that, for example the y targets are always the same for such a sequence you can give up the difference parameter, to have:
[n,ssx,dsx,ssy,---,stx,dtx,sty,---] => [n,ssx,dsx,ssy,stx,dtx,sty], and so on.
For interruption of a pattern like you mentioned in your last example, you can use other 'extra' arrays, and then insert them in the sequence, with:
exsx: extra unit starting x
exsy: extra unit starting y
extx: extra unit target x
exty: extra unit target y
m: insert extra unit at
so that the special case is encoded:
{
patterns:[
[5,17,0,11,1,92,0,35,1],
[5,18,0,11,1,92,0,35,1],
[5,19,0,11,1,92,0,35,1],
[5,17,0,11,1,92,0,35,1]
],
extras: [
[215,418,-78,-887,8] // 8 - because you want to insert this unit at index 8
]
}
Again, this is a general encoding. Any specific properties for the patterns may further reduce the encoded representation.
Hope this helps.
High compression using bitstreams
You can encode sets of values into a bit stream allowing you to remove unused bits. The numbers you have shown are not greater than -887 (ignoring the negative) and that means you can fit all the numbers into 10 bits saving 54 bits per number (Javascript uses 64 bit numbers).
Run length compression
You also have many repeated sets of numbers which you can use run length compression on. You set a flag in the bitstream that indicates that the following set of bits represents a repeated sequence of numbers, then you have the number of repeats and the value to repeat. For sequences of random numbers you just keep them as is.
If you use run-length compression you create a block type structure in the bit stream, this makes it possible to embed further compression. As you have many numbers that are below 128 many of the numbers can be encoded into 7 bits, or even less. For a small overhead (in this case 2 bits per block) you can select the smallest bit size to pack all the numbers in that block in.
Variable bit depth numbers
I have created a number type value that represent the number of bits used to store numbers in a block. Each block has a number type and all numbers in the block use that type. There are 4 number types that can be encoded into 2 bits.
00 = 4 bit numbers. Range 0-15
01 = 5 bit numbers. Range 0-31
10 = 7 bit numbers. Range 0-127
11 = 10 bit numbers. Range 0-1023
The bitstream
To make this easy you will need a bit stream read/write. It allows you to easily write and read bits from a stream of bits.
// Simple unsigned bit stream
// Read and write to and from a bit stream.
// Numbers are stored as Big endian
// Does not comprehend sign so wordlength should be less than 32 bits
// methods
// eof(); // returns true if read pos > buffer bit size
// read(numberBits); // number of bits to read as one number. No sign so < 32
// write(value,numberBits); // value to write, number of bits to write < 32
// getBuffer(); // return object with buffer and array of numbers, and bitLength the total number of bits
// setBuffer(buffer,bitLength); // the buffers as an array of numbers, and bitLength the total number of bits
// Properties
// wordLength; // read only length of a word.
function BitStream(){
var buffer = [];
var pos = 0;
var numBits = 0;
const wordLength = 16;
this.wordLength = wordLength;
// read a single bit
var readBit = function(){
var b = buffer[Math.floor(pos / wordLength)]; // get word
b = (b >> ((wordLength - 1) - (pos % wordLength))) & 1;
pos += 1;
return b;
}
// write a single bit. Will fill bits with 0 if wite pos is moved past buffer length
var writeBit = function(bit){
var rP = Math.floor(pos / wordLength);
if(rP >= buffer.length){ // check that the buffer has values at current pos.
var end = buffer.length; // fill buffer up to pos with zero
while(end <= rP){
buffer[end] = 0;
end += 1;
}
}
var b = buffer[rP];
bit &= 1; // mask out any unwanted bits
bit <<= (wordLength - 1) - (pos % wordLength);
b |= bit;
buffer[rP] = b;
pos += 1;
}
// returns true is past eof
this.eof = function(){
return pos >= numBits;
}
// reads number of bits as a Number
this.read = function(bits){
var v = 0;
while(bits > 0){
v <<= 1;
v |= readBit();
bits -= 1;
}
return v;
}
// writes value to bit stream
this.write = function(value,bits){
var v;
while(bits > 0){
bits -= 1;
writeBit( (value >> bits) & 1 );
}
}
// returns the buffer and length
this.getBuffer = function(){
return {
buffer : buffer,
bitLength : pos,
};
}
// set the buffer and length and returns read write pos to start
this.setBuffer = function(_buffer,bitLength){
buffer = _buffer;
numBits = bitLength;
pos = 0;
}
}
A format for your numbers
Now to design the format. The first bit read from a stream is a sequence flag, if 0 then the following block will be a repeated value, if 1 the following block will be a sequence of random numbers.
Block bits : description;
repeat block holds a repeated number
bit 0 : Val 0 = repeat
bit 1 : Val 0 = 4bit repeat count or 1 = 5bit repeat count
then either
bits 2,3,4,5 : 4 bit number of repeats - 1
bits 6,7 : 2 bit Number type
or
bits 2,3,4,5,6 : 5 bit number of repeats - 1
bits 7,8 : 2 bit Number type
Followed by
Then a value that will be repeated depending on the number type
End of block
sequence block holds a sequence of random numbers
bit 0 : Val 1 = sequence
bit 1 : Val 0 = positive sequence Val 1 = negative sequence
bits 2,3,4,5 : 4 bit number of numbers in sequence - 1
bits 6,7 : 2 bit Number type
then the sequence of numbers in the number format
End of block.
Keep reading blocks until the end of file.
Encoder and decoder
The following object will encode and decode the a flat array of numbers. It will only handles numbers upto 10 bits long, So no values over 1023 or under -1023.
If you want larger numbers you will have to change the number types that are used. To do this change the arrays
const numberSize = [0,0,0,0,0,1,2,2,3,3,3]; // the number bit depth
const numberBits = [4,5,7,10]; // the number bit depth lookup;
If you want max number to be 12 bits -4095 to 4095 ( the sign bit is in the block encoding). I have also shown the 7 bit number type changed to 8. The first array is used to look up the bit depth, if I have a 3 bit number you get the number type with numberSize[bitcount] and the bits used to store the number numberBits[numberSize[bitCount]]
const numberSize = [0,0,0,0,0,1,2,2,2,3,3,3,3]; // the number bit depth
const numberBits = [4,5,8,12]; // the number bit depth lookup;
function ArrayZip(){
var zipBuffer = 0;
const numberSize = [0,0,0,0,0,1,2,2,3,3,3]; // the number bit depth lookup;
const numberBits = [4,5,7,10]; // the number bit depth lookup;
this.encode = function(data){ // encodes the data
var pos = 0;
function getRepeat(){ // returns the number of repeat values
var p = pos + 1;
if(data[pos] < 0){
return 1; // ignore negative numbers
}
while(p < data.length && data[p] === data[pos]){
p += 1;
}
return p - pos;
}
function getNoRepeat(){ // returns the number of non repeat values
// if the sequence has negitive numbers then
// the length is returned as a negative
var p = pos + 1;
if(data[pos] < 0){ // negative numbers
while(p < data.length && data[p] !== data[p-1] && data[p] < 0){
p += 1;
}
return -(p - pos);
}
while(p < data.length && data[p] !== data[p-1] && data[p] >= 0){
p += 1;
}
return p - pos;
}
function getMax(count){
var max = 0;
var p = pos;
while(count > 0){
max = Math.max(Math.abs(data[p]),max);
p += 1;
count -= 1;
}
return max;
}
var out = new BitStream();
while(pos < data.length){
var reps = getRepeat();
if(reps > 1){
var bitCount = numberSize[Math.ceil(Math.log(getMax(reps) + 1) / Math.log(2))];
if(reps < 16){
out.write(0,1); // repeat header
out.write(0,1); // use 4 bit repeat count;
out.write(reps-1,4); // write 4 bit number of reps
out.write(bitCount,2); // write 2 bit number size
out.write(data[pos],numberBits[bitCount]);
pos += reps;
}else {
if(reps > 32){ // if more than can fit in one repeat block split it
reps = 32;
}
out.write(0,1); // repeat header
out.write(1,1); // use 5 bit repeat count;
out.write(reps-1,5); // write 5 bit number of reps
out.write(bitCount,2); // write 2 bit number size
out.write(data[pos],numberBits[bitCount]);
pos += reps;
}
}else{
var seq = getNoRepeat(); // get number no repeats
var neg = seq < 0 ? 1 : 0; // found negative numbers
seq = Math.min(16,Math.abs(seq));
// check if last value is the start of a repeating block
if(seq > 1){
var tempPos = pos;
pos += seq;
seq -= getRepeat() > 1 ? 1 : 0;
pos = tempPos;
}
// ge the max bit count to hold numbers
var bitCount = numberSize[Math.ceil(Math.log(getMax(seq) + 1) / Math.log(2))];
out.write(1,1); // sequence header
out.write(neg,1); // write negative flag
out.write(seq - 1,4); // write sequence length;
out.write(bitCount,2); // write 2 bit number size
while(seq > 0){
out.write(Math.abs(data[pos]),numberBits[bitCount]);
pos += 1;
seq -= 1;
}
}
}
// get the bit stream buffer
var buf = out.getBuffer();
// start bit stream with number of trailing bits. There are 4 bits used of 16 so plenty
// of room for aulturnative encoding flages.
var str = String.fromCharCode(buf.bitLength % out.wordLength);
// convert bit stream to charcters
for(var i = 0; i < buf.buffer.length; i ++){
str += String.fromCharCode(buf.buffer[i]);
}
// return encoded string
return str;
}
this.decode = function(zip){
var count,rSize,header,_in,i,data,endBits,numSize,val,neg;
data = []; // holds character codes
decompressed = []; // holds the decompressed array of numbers
endBits = zip.charCodeAt(0); // get the trailing bits count
for(i = 1; i < zip.length; i ++){ // convert string to numbers
data[i-1] = zip.charCodeAt(i);
}
_in = new BitStream(); // create a bitstream to read the bits
// set the buffer data and length
_in.setBuffer(data,(data.length - 1) * _in.wordLength + endBits);
while(!_in.eof()){ // do until eof
header = _in.read(1); // read header bit
if(header === 0){ // is repeat header
rSize = _in.read(1); // get repeat count size
if(rSize === 0){
count = _in.read(4); // get 4 bit repeat count
}else{
count = _in.read(5); // get 5 bit repeat count
}
numSize = _in.read(2); // get 2 bit number size type
val = _in.read(numberBits[numSize]); // get the repeated value
while(count >= 0){ // add to the data count + 1 times
decompressed.push(val);
count -= 1;
}
}else{
neg = _in.read(1); // read neg flag
count = _in.read(4); // get 4 bit seq count
numSize = _in.read(2); // get 2 bit number size type
while(count >= 0){
if(neg){ // if negative numbers convert to neg
decompressed.push(-_in.read(numberBits[numSize]));
}else{
decompressed.push(_in.read(numberBits[numSize]));
}
count -= 1;
}
}
}
return decompressed;
}
}
The best way to store a bit stream is as a string. Javascript has Unicode strings so we can pack 16 bits into every character
The results and how to use.
You need to flatten the array. If you need to add extra info to reinstate the multi/dimensional arrays just add that to the array and let the compressor compress it along with the rest.
// flatten the array
var data = [17,17,17,17,17,18,18,18,18,18,19,19,19,19,19,20,20,20,20,20,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39];
var zipper = new ArrayZip();
var encoded = zipper.encode(data); // packs the 80 numbers in data into 21 characters.
// compression rate of the data array 5120 bits to 336 bits
// 93% compression.
// or as a flat 7bit ascii string as numbers 239 charcters (including ,)
// 239 * 7 bits = 1673 bits to 336 bits 80% compression.
var decoded = zipper.decode(encoded);
I did not notice the negative numbers at first so the compression does not do well with the negative values.
var data = [17,17,17,17,17,18,18,18, 215, 18,18,19,19,19,19,19,20,20,20,20,20, 11,12,13,14,15,11,12,13, 418, 14,15,11,12,13,14,15,11,12,13,14,15, 92,92,92,92,92,92,92,92, -78, 92,92,92,92,92,92,92,92,92,92,92,92, 35,36,37,38,39,35,36,37, -887, 38,39,35,36,37,38,39,35,36,37,38,39]
var encoded = zipper.encode(data); // packs the 84 numbers in data into 33 characters.
// compression rate of the data array 5376 bits to 528 bits
var decoded = zipper.decode(encoded);
Summary
As you can see this results in a very high compression rate (almost twice as good as LZ compression). The code is far from optimal and you could easily implement a multi pass compressor with various settings ( there are 12 spare bits at the start of the encoded string that can be used to select many options to improve compression.)
Also I did not see the negative numbers until I came back to post so the fix for negatives is not good, so you can some more out of it by modifying the bitStream to understand negatives (ie use the >>> operator)