This question can be answered using JavaScript, Underscore, or jQuery functions.
Given 4 arrays:
[17,17,17,17,17,18,18,18,18,18,19,19,19,19,19,20,20,20,20,20] => x coordinate of unit
[11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15] => y coordinate of unit
[92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92] => x (unit moves to this direction x)
[35,36,37,38,39,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39] => y (unit moves to this direction y)
They are closely related to each other.
For example, the first element of each array, [17,11,92,35], gives a unit's x/y coordinates and the x/y coordinates of that unit's target.
So there are 5*4 = 20 units in total. Every unit has slightly different stats.
Visually, these 4 arrays of x/y coordinates look like an army of 20 units "x" (targeting "o"):
xxxxx o
xxxxx o
xxxxx o
xxxxx o
There will always be 4 arrays. Even with 1 unit there will be 4 arrays, each of size 1. This is the simplest and most common situation.
In the real situation, every unit has 20 different stats (keys) in total, and 14 of those keys are usually identical across a group of units.
So they are grouped as an army with the same stats. The only differences are the coordinates of each unit and the coordinates of its target.
I need to compress all this data into as little data as possible, which can later be decompressed.
There can also be more complex situations, where all 14 keys happen to be the same but some coordinates do not follow the pattern at all. Example:
[17,17,17,17,17,18,18,18, 215, 18,18,19,19,19,19,19,20,20,20,20,20] => x coordinate of unit
[11,12,13,14,15,11,12,13, 418, 14,15,11,12,13,14,15,11,12,13,14,15] => y coordinate of unit
[92,92,92,92,92,92,92,92, -78, 92,92,92,92,92,92,92,92,92,92,92,92] => x (unit moves to this direction x)
[35,36,37,38,39,35,36,37, -887, 38,39,35,36,37,38,39,35,36,37,38,39] => y (unit moves to this direction y)
In this situation I need to split the arrays into 2 different armies. When an army has fewer than 3 units, I simply write those units without the pattern, as [215,418,-78,-887],[..], and when an army has more than 2 units, I need a compressed string with the pattern, which can be decompressed later. In this example there are 21 units. They just have to be split into an army of 1 unit and an army of 5x4 = 20 units.
Assuming that every n units follow a pattern,
encode the units with
n: number of units in the sequence
ssx: start of source x
dsx: difference of source x
ssy: start of source y
dsy: difference of source y
stx: start of target x
dtx: difference of target x
sty: start of target y
dty: difference of target y
as the array: [n,ssx,dsx,ssy,dsy,stx,dtx,sty,dty]
so that the units:
[17,17,17,17,17],
[11,12,13,14,15],
[92,92,92,92,92],
[35,36,37,38,39]
are encoded as:
[5,17,0,11,1,92,0,35,1]
Of course, if you know in advance that, for example, the y differences are always the same for such sequences, you can drop those difference parameters, giving:
[n,ssx,dsx,ssy,---,stx,dtx,sty,---] => [n,ssx,dsx,ssy,stx,dtx,sty], and so on.
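The encoding above could be sketched roughly as follows. The helper names `encodeRun` and `decodeRun` are hypothetical, and the sketch assumes the four arrays have equal length and that each one changes by a constant step.

```javascript
// Hypothetical helpers, not part of the answer above.
// Encodes one run of units as [n, ssx, dsx, ssy, dsy, stx, dtx, sty, dty].
// Assumes every array has the same length and a constant step.
function encodeRun(sx, sy, tx, ty) {
    var out = [sx.length];
    [sx, sy, tx, ty].forEach(function (a) {
        var step = a.length > 1 ? a[1] - a[0] : 0;
        for (var i = 1; i < a.length; i++) {
            if (a[i] - a[i - 1] !== step) {
                throw new Error("not an arithmetic sequence");
            }
        }
        out.push(a[0], step);
    });
    return out;
}

// Expands a pattern back into the four coordinate arrays.
function decodeRun(p) {
    var arrays = [[], [], [], []];
    for (var i = 0; i < p[0]; i++) {
        for (var k = 0; k < 4; k++) {
            arrays[k].push(p[1 + 2 * k] + i * p[2 + 2 * k]);
        }
    }
    return arrays;
}

var pattern = encodeRun(
    [17, 17, 17, 17, 17],
    [11, 12, 13, 14, 15],
    [92, 92, 92, 92, 92],
    [35, 36, 37, 38, 39]
);
console.log(JSON.stringify(pattern)); // [5,17,0,11,1,92,0,35,1]
```

Calling `decodeRun(pattern)` returns the four original arrays again.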
To handle an interruption of a pattern, like the one in your last example, you can use separate 'extra' arrays and then insert them into the sequence, with:
exsx: extra unit starting x
exsy: extra unit starting y
extx: extra unit target x
exty: extra unit target y
m: index at which to insert the extra unit
so that the special case is encoded:
{
patterns:[
[5,17,0,11,1,92,0,35,1],
[5,18,0,11,1,92,0,35,1],
[5,19,0,11,1,92,0,35,1],
[5,20,0,11,1,92,0,35,1]
],
extras: [
[215,418,-78,-887,8] // 8 - because you want to insert this unit at index 8
]
}
Again, this is a general encoding. Any specific properties of the patterns may further reduce the encoded representation.
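A decoder for this structure might look like the sketch below. The function name `decodeArmy` is hypothetical, and it assumes that `m` is the index the extra unit should have in the final list, so extras are inserted in ascending order of `m`.

```javascript
// Hypothetical decoder for the { patterns, extras } object described above.
function decodeArmy(encoded) {
    var units = []; // each unit is [x, y, targetX, targetY]
    encoded.patterns.forEach(function (p) {
        for (var i = 0; i < p[0]; i++) {
            units.push([
                p[1] + i * p[2],
                p[3] + i * p[4],
                p[5] + i * p[6],
                p[7] + i * p[8]
            ]);
        }
    });
    // Insert extras in ascending index order so each m is a final position.
    encoded.extras.slice()
        .sort(function (a, b) { return a[4] - b[4]; })
        .forEach(function (e) {
            units.splice(e[4], 0, e.slice(0, 4));
        });
    return units;
}

var army = decodeArmy({
    patterns: [
        [5, 17, 0, 11, 1, 92, 0, 35, 1],
        [5, 18, 0, 11, 1, 92, 0, 35, 1],
        [5, 19, 0, 11, 1, 92, 0, 35, 1],
        [5, 20, 0, 11, 1, 92, 0, 35, 1]
    ],
    extras: [
        [215, 418, -78, -887, 8]
    ]
});
console.log(army.length);             // 21
console.log(JSON.stringify(army[8])); // [215,418,-78,-887]
```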
Hope this helps.
High compression using bitstreams
You can encode sets of values into a bit stream, allowing you to remove unused bits. None of the numbers you have shown has a magnitude greater than 887 (ignoring the sign), which means every number fits into 10 bits, saving 54 bits per number (JavaScript uses 64-bit numbers).
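The 10 bit figure can be checked with a small sketch. The helper `bitsNeeded` is a hypothetical name, and it counts only the magnitude, on the assumption that the sign is stored separately.

```javascript
// Number of bits needed to store the magnitude of n (sign handled separately).
function bitsNeeded(n) {
    var v = Math.abs(n);
    var bits = 0;
    while (v > 0) {
        bits += 1;
        v = Math.floor(v / 2);
    }
    return bits;
}

var values = [17, 92, 215, 418, -78, -887];
var widest = Math.max.apply(null, values.map(bitsNeeded));
console.log(widest);      // 10
console.log(64 - widest); // 54 bits saved per number
```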
Run length compression
You also have many repeated sets of numbers which you can use run length compression on. You set a flag in the bitstream that indicates that the following set of bits represents a repeated sequence of numbers, then you have the number of repeats and the value to repeat. For sequences of random numbers you just keep them as is.
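As a minimal illustration of the run-length idea on plain arrays (before any bit packing), the sketch below collapses repeats into [count, value] pairs. The name `runLengths` is hypothetical and this is not the block format used later in this answer.

```javascript
// Minimal run-length pass: turns an array into [count, value] pairs.
function runLengths(data) {
    var runs = [];
    for (var i = 0; i < data.length; i++) {
        var last = runs[runs.length - 1];
        if (last && last[1] === data[i]) {
            last[0] += 1;
        } else {
            runs.push([1, data[i]]);
        }
    }
    return runs;
}

var xs = [17,17,17,17,17,18,18,18,18,18,19,19,19,19,19,20,20,20,20,20];
console.log(JSON.stringify(runLengths(xs)));
// [[5,17],[5,18],[5,19],[5,20]]
```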
If you use run-length compression you create a block-type structure in the bit stream, which makes it possible to embed further compression. As many of your numbers are below 128, they can be encoded in 7 bits or even fewer. For a small overhead (in this case 2 bits per block) you can select the smallest bit size that holds all the numbers in that block.
Variable bit depth numbers
I have created a number type value that represents the number of bits used to store the numbers in a block. Each block has a number type, and all numbers in the block use that type. There are 4 number types, which can be encoded in 2 bits.
00 = 4 bit numbers. Range 0-15
01 = 5 bit numbers. Range 0-31
10 = 7 bit numbers. Range 0-127
11 = 10 bit numbers. Range 0-1023
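Choosing the number type for a block could be sketched like this. `TYPE_BITS` and `numberTypeFor` are hypothetical names; the sketch simply picks the first type in the table above whose range covers the largest magnitude in the block.

```javascript
// The four number types from the table above: type index -> bits per number.
var TYPE_BITS = [4, 5, 7, 10];

// Returns the smallest number type that can hold every value in the block.
function numberTypeFor(block) {
    var max = Math.max.apply(null, block.map(Math.abs));
    for (var type = 0; type < TYPE_BITS.length; type++) {
        if (max < Math.pow(2, TYPE_BITS[type])) {
            return type;
        }
    }
    throw new Error("value too large for a 10 bit number type");
}

console.log(numberTypeFor([11, 12, 13, 14, 15])); // 0 (4 bit)
console.log(numberTypeFor([17, 18, 19, 20]));     // 1 (5 bit)
console.log(numberTypeFor([35, 36, 92]));         // 2 (7 bit)
console.log(numberTypeFor([215, 418, 887]));      // 3 (10 bit)
```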
The bitstream
To make this easy you will need a bit stream reader/writer. It lets you easily write bits to and read bits from a stream of bits.
// Simple unsigned bit stream
// Read and write to and from a bit stream.
// Numbers are stored as Big endian
// Does not comprehend sign so word length should be less than 32 bits
// methods
// eof(); // returns true if read pos >= buffer bit size
// read(numberBits); // number of bits to read as one number. No sign so < 32
// write(value,numberBits); // value to write, number of bits to write < 32
// getBuffer(); // returns an object with buffer, an array of numbers, and bitLength, the total number of bits
// setBuffer(buffer,bitLength); // the buffer as an array of numbers, and bitLength the total number of bits
// Properties
// wordLength; // read only length of a word.
function BitStream(){
var buffer = [];
var pos = 0;
var numBits = 0;
const wordLength = 16;
this.wordLength = wordLength;
// read a single bit
var readBit = function(){
var b = buffer[Math.floor(pos / wordLength)]; // get word
b = (b >> ((wordLength - 1) - (pos % wordLength))) & 1;
pos += 1;
return b;
}
// write a single bit. Will fill bits with 0 if write pos is moved past buffer length
var writeBit = function(bit){
var rP = Math.floor(pos / wordLength);
if(rP >= buffer.length){ // check that the buffer has values at current pos.
var end = buffer.length; // fill buffer up to pos with zero
while(end <= rP){
buffer[end] = 0;
end += 1;
}
}
var b = buffer[rP];
bit &= 1; // mask out any unwanted bits
bit <<= (wordLength - 1) - (pos % wordLength);
b |= bit;
buffer[rP] = b;
pos += 1;
}
// returns true if at or past eof
this.eof = function(){
return pos >= numBits;
}
// reads number of bits as a Number
this.read = function(bits){
var v = 0;
while(bits > 0){
v <<= 1;
v |= readBit();
bits -= 1;
}
return v;
}
// writes value to bit stream
this.write = function(value,bits){
while(bits > 0){
bits -= 1;
writeBit( (value >> bits) & 1 );
}
}
// returns the buffer and length
this.getBuffer = function(){
return {
buffer : buffer,
bitLength : pos,
};
}
// set the buffer and length and returns read write pos to start
this.setBuffer = function(_buffer,bitLength){
buffer = _buffer;
numBits = bitLength;
pos = 0;
}
}
A format for your numbers
Now to design the format. The first bit read from the stream is a sequence flag: if 0, the following block is a repeated value; if 1, the following block is a sequence of random numbers.
Block bits : description;
repeat block holds a repeated number
bit 0 : Val 0 = repeat
bit 1 : Val 0 = 4bit repeat count or 1 = 5bit repeat count
then either
bits 2,3,4,5 : 4 bit number of repeats - 1
bits 6,7 : 2 bit Number type
or
bits 2,3,4,5,6 : 5 bit number of repeats - 1
bits 7,8 : 2 bit Number type
followed by
the value to repeat, stored with the bit width given by the number type
End of block
sequence block holds a sequence of random numbers
bit 0 : Val 1 = sequence
bit 1 : Val 0 = positive sequence Val 1 = negative sequence
bits 2,3,4,5 : 4 bit number of numbers in sequence - 1
bits 6,7 : 2 bit Number type
then the sequence of numbers in the number format
End of block.
Keep reading blocks until the end of file.
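To make the repeat block layout concrete, the sketch below writes one block as a string of 0 and 1 characters with a space between fields. It is illustrative only: `BITS_PER_TYPE`, `toBits` and `repeatBlock` are hypothetical names, and it uses the 4 bit count whenever repeats - 1 fits in 4 bits.

```javascript
// Illustrative only: lays out one repeat block as '0'/'1' characters
// so the fields are easy to see. All names here are hypothetical.
var BITS_PER_TYPE = [4, 5, 7, 10];

function toBits(value, width) {
    var s = value.toString(2);
    while (s.length < width) {
        s = "0" + s;
    }
    return s;
}

function repeatBlock(count, value) {
    if (count < 1 || count > 32 || value < 0 || value > 1023) {
        throw new Error("out of range for this format");
    }
    var type = 0;
    while (value >= Math.pow(2, BITS_PER_TYPE[type])) {
        type += 1;
    }
    var wide = count > 16; // count - 1 no longer fits in 4 bits
    return [
        "0",                               // bit 0: repeat block
        wide ? "1" : "0",                  // bit 1: size of the repeat count
        toBits(count - 1, wide ? 5 : 4),   // number of repeats - 1
        toBits(type, 2),                   // number type
        toBits(value, BITS_PER_TYPE[type]) // the repeated value
    ].join(" ");
}

console.log(repeatBlock(5, 17));  // 0 0 0100 01 10001
console.log(repeatBlock(20, 92)); // 0 1 10011 10 1011100
```

So five 17s take 13 bits and twenty 92s take 16 bits in this layout.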
Encoder and decoder
The following object will encode and decode a flat array of numbers. It only handles numbers up to 10 bits long, so no values over 1023 or under -1023.
If you want larger numbers you will have to change the number types that are used. To do this change the arrays
const numberSize = [0,0,0,0,0,1,2,2,3,3,3]; // the number bit depth
const numberBits = [4,5,7,10]; // the number bit depth lookup;
For example, to allow numbers up to 12 bits, -4095 to 4095 (the sign bit is in the block encoding), use the arrays below. I have also changed the 7 bit number type to 8 bits. The first array is used to look up the number type from the bit count: for a number that needs bitCount bits, the number type is numberSize[bitCount] and the number of bits used to store it is numberBits[numberSize[bitCount]].
const numberSize = [0,0,0,0,0,1,2,2,2,3,3,3,3]; // the number bit depth
const numberBits = [4,5,8,12]; // the number bit depth lookup;
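Since the two arrays have to agree with each other, one option is to derive the lookup from the bit widths. The sketch below does that with a hypothetical helper `buildNumberSize`, assuming `numberBits` is sorted in ascending order.

```javascript
// Hypothetical helper: derives the numberSize lookup from numberBits so the
// two arrays cannot drift apart. Index = bits needed, value = number type.
function buildNumberSize(numberBits) {
    var lookup = [];
    var maxBits = numberBits[numberBits.length - 1];
    for (var bits = 0; bits <= maxBits; bits++) {
        var type = 0;
        while (numberBits[type] < bits) {
            type += 1;
        }
        lookup.push(type);
    }
    return lookup;
}

console.log(JSON.stringify(buildNumberSize([4, 5, 7, 10])));
// [0,0,0,0,0,1,2,2,3,3,3]
console.log(JSON.stringify(buildNumberSize([4, 5, 8, 12])));
// [0,0,0,0,0,1,2,2,2,3,3,3,3]
```

Both results match the hand-written arrays shown above.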
function ArrayZip(){
var zipBuffer = 0;
const numberSize = [0,0,0,0,0,1,2,2,3,3,3]; // the number bit depth lookup;
const numberBits = [4,5,7,10]; // the number bit depth lookup;
this.encode = function(data){ // encodes the data
var pos = 0;
function getRepeat(){ // returns the number of repeat values
var p = pos + 1;
if(data[pos] < 0){
return 1; // ignore negative numbers
}
while(p < data.length && data[p] === data[pos]){
p += 1;
}
return p - pos;
}
function getNoRepeat(){ // returns the number of non repeat values
// if the sequence has negative numbers then
// the length is returned as a negative
var p = pos + 1;
if(data[pos] < 0){ // negative numbers
while(p < data.length && data[p] !== data[p-1] && data[p] < 0){
p += 1;
}
return -(p - pos);
}
while(p < data.length && data[p] !== data[p-1] && data[p] >= 0){
p += 1;
}
return p - pos;
}
function getMax(count){
var max = 0;
var p = pos;
while(count > 0){
max = Math.max(Math.abs(data[p]),max);
p += 1;
count -= 1;
}
return max;
}
var out = new BitStream();
while(pos < data.length){
var reps = getRepeat();
if(reps > 1){
var bitCount = numberSize[Math.ceil(Math.log(getMax(reps) + 1) / Math.log(2))];
if(reps < 16){
out.write(0,1); // repeat header
out.write(0,1); // use 4 bit repeat count;
out.write(reps-1,4); // write 4 bit number of reps
out.write(bitCount,2); // write 2 bit number size
out.write(data[pos],numberBits[bitCount]);
pos += reps;
}else {
if(reps > 32){ // if more than can fit in one repeat block split it
reps = 32;
}
out.write(0,1); // repeat header
out.write(1,1); // use 5 bit repeat count;
out.write(reps-1,5); // write 5 bit number of reps
out.write(bitCount,2); // write 2 bit number size
out.write(data[pos],numberBits[bitCount]);
pos += reps;
}
}else{
var seq = getNoRepeat(); // get number no repeats
var neg = seq < 0 ? 1 : 0; // found negative numbers
seq = Math.min(16,Math.abs(seq));
// check if last value is the start of a repeating block
if(seq > 1){
var tempPos = pos;
pos += seq;
seq -= getRepeat() > 1 ? 1 : 0;
pos = tempPos;
}
// get the max bit count to hold numbers
var bitCount = numberSize[Math.ceil(Math.log(getMax(seq) + 1) / Math.log(2))];
out.write(1,1); // sequence header
out.write(neg,1); // write negative flag
out.write(seq - 1,4); // write sequence length;
out.write(bitCount,2); // write 2 bit number size
while(seq > 0){
out.write(Math.abs(data[pos]),numberBits[bitCount]);
pos += 1;
seq -= 1;
}
}
}
// get the bit stream buffer
var buf = out.getBuffer();
// start bit stream with number of trailing bits. Only 4 of the 16 bits are used so there is plenty
// of room for alternative encoding flags.
var str = String.fromCharCode(buf.bitLength % out.wordLength);
// convert bit stream to characters
for(var i = 0; i < buf.buffer.length; i ++){
str += String.fromCharCode(buf.buffer[i]);
}
// return encoded string
return str;
}
this.decode = function(zip){
var count,rSize,header,_in,i,data,endBits,numSize,val,neg,decompressed;
data = []; // holds character codes
decompressed = []; // holds the decompressed array of numbers
endBits = zip.charCodeAt(0); // get the trailing bits count
for(i = 1; i < zip.length; i ++){ // convert string to numbers
data[i-1] = zip.charCodeAt(i);
}
_in = new BitStream(); // create a bitstream to read the bits
// set the buffer data and length
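// endBits is 0 when the last word is completely used, so treat 0 as a full word
_in.setBuffer(data,data.length === 0 ? 0 : (data.length - 1) * _in.wordLength + (endBits || _in.wordLength));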
while(!_in.eof()){ // do until eof
header = _in.read(1); // read header bit
if(header === 0){ // is repeat header
rSize = _in.read(1); // get repeat count size
if(rSize === 0){
count = _in.read(4); // get 4 bit repeat count
}else{
count = _in.read(5); // get 5 bit repeat count
}
numSize = _in.read(2); // get 2 bit number size type
val = _in.read(numberBits[numSize]); // get the repeated value
while(count >= 0){ // add to the data count + 1 times
decompressed.push(val);
count -= 1;
}
}else{
neg = _in.read(1); // read neg flag
count = _in.read(4); // get 4 bit seq count
numSize = _in.read(2); // get 2 bit number size type
while(count >= 0){
if(neg){ // if negative numbers convert to neg
decompressed.push(-_in.read(numberBits[numSize]));
}else{
decompressed.push(_in.read(numberBits[numSize]));
}
count -= 1;
}
}
}
return decompressed;
}
}
A compact way to store a bit stream is as a string. JavaScript strings are made of 16 bit UTF-16 code units, so we can pack 16 bits into every character.
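The packing step on its own can be sketched as below. `wordsToString` and `stringToWords` are hypothetical names. One caveat to keep in mind: such strings can contain NUL characters and unpaired surrogates, so they are safe in memory but may not survive a conversion to UTF-8 unchanged.

```javascript
// Packs an array of 16 bit words into a string, one word per character.
function wordsToString(words) {
    var s = "";
    for (var i = 0; i < words.length; i++) {
        s += String.fromCharCode(words[i] & 0xFFFF);
    }
    return s;
}

// Reads the 16 bit words back out of the string.
function stringToWords(s) {
    var words = [];
    for (var i = 0; i < s.length; i++) {
        words.push(s.charCodeAt(i));
    }
    return words;
}

var packed = wordsToString([0, 4660, 65535]);
console.log(packed.length);                         // 3
console.log(JSON.stringify(stringToWords(packed))); // [0,4660,65535]
```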
The results and how to use.
You need to flatten the array. If you need extra info to reinstate the multidimensional arrays, just add it to the array and let the compressor compress it along with the rest.
// flatten the array
var data = [17,17,17,17,17,18,18,18,18,18,19,19,19,19,19,20,20,20,20,20,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,11,12,13,14,15,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,92,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39,35,36,37,38,39];
var zipper = new ArrayZip();
var encoded = zipper.encode(data); // packs the 80 numbers in data into 21 characters.
// compression rate of the data array 5120 bits to 336 bits
// 93% compression.
// or as a flat 7 bit ASCII string of numbers, 239 characters (including ,)
// 239 * 7 bits = 1673 bits to 336 bits 80% compression.
var decoded = zipper.decode(encoded);
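To get the four original arrays back from the decoded flat array, something like the following sketch would do. `flattenUnits` and `unflattenUnits` are hypothetical names, and the sketch assumes the flat array is exactly four equal-length arrays joined end to end, as in the data above.

```javascript
// Hypothetical helpers: the flat array is four equal-length arrays joined end to end.
function flattenUnits(sx, sy, tx, ty) {
    return [].concat(sx, sy, tx, ty);
}

function unflattenUnits(flat) {
    if (flat.length % 4 !== 0) {
        throw new Error("length must be a multiple of 4");
    }
    var n = flat.length / 4;
    return [
        flat.slice(0, n),
        flat.slice(n, 2 * n),
        flat.slice(2 * n, 3 * n),
        flat.slice(3 * n)
    ];
}

var flat = flattenUnits([17, 18], [11, 12], [92, 92], [35, 36]);
console.log(JSON.stringify(flat));                 // [17,18,11,12,92,92,35,36]
console.log(JSON.stringify(unflattenUnits(flat))); // [[17,18],[11,12],[92,92],[35,36]]
```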
I did not notice the negative numbers at first, so the compression does not do well with negative values.
var data = [17,17,17,17,17,18,18,18, 215, 18,18,19,19,19,19,19,20,20,20,20,20, 11,12,13,14,15,11,12,13, 418, 14,15,11,12,13,14,15,11,12,13,14,15, 92,92,92,92,92,92,92,92, -78, 92,92,92,92,92,92,92,92,92,92,92,92, 35,36,37,38,39,35,36,37, -887, 38,39,35,36,37,38,39,35,36,37,38,39]
var encoded = zipper.encode(data); // packs the 84 numbers in data into 33 characters.
// compression rate of the data array 5376 bits to 528 bits
var decoded = zipper.decode(encoded);
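The percentages quoted in the comments can be reproduced from the bit counts. The sketch below uses a hypothetical helper `savedPercent` and takes the character counts (21 and 33) from the comments above rather than measuring them.

```javascript
// Percentage of bits saved, rounded to the nearest whole percent.
function savedPercent(originalBits, packedBits) {
    return Math.round((1 - packedBits / originalBits) * 100);
}

console.log(savedPercent(80 * 64, 21 * 16)); // 93 (first example vs 64 bit numbers)
console.log(savedPercent(239 * 7, 21 * 16)); // 80 (first example vs 7 bit ASCII text)
console.log(savedPercent(84 * 64, 33 * 16)); // 90 (second example vs 64 bit numbers)
```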
Summary
As you can see, this results in a very high compression rate (almost twice as good as LZ compression). The code is far from optimal, and you could easily implement a multi-pass compressor with various settings (there are 12 spare bits at the start of the encoded string that can be used to select options to improve compression).
Also, I did not see the negative numbers until I came back to post, so the fix for negatives is not good. You can squeeze some more out of it by modifying the BitStream to understand negatives (i.e. use the >>> operator).
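One possible way to handle negatives, which is not what the code above does, is to map signed values to unsigned ones before packing. The sketch below uses zigzag mapping, a different technique swapped in for illustration; `zigzag` and `unzigzag` are hypothetical names and it assumes 32 bit integers. Note that the mapped values need one more bit, so -887 would no longer fit the 10 bit number type.

```javascript
// Zigzag mapping: 0, -1, 1, -2, 2 ... become 0, 1, 2, 3, 4 ...
// Assumes values fit in a 32 bit signed integer.
function zigzag(n) {
    return (n << 1) ^ (n >> 31);
}

// Inverse mapping; >>> shifts without sign extension.
function unzigzag(z) {
    return (z >>> 1) ^ -(z & 1);
}

var mixed = [92, -78, 35, -887];
var mapped = mixed.map(function (n) { return zigzag(n); });
console.log(JSON.stringify(mapped)); // [184,155,70,1773]
console.log(JSON.stringify(mapped.map(function (z) { return unzigzag(z); })));
// [92,-78,35,-887]
```

With this mapping, positive and negative numbers could share a block and the per-block negative flag would not be needed.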