Why is diff-match-patch line diff broken beyond 65K lines? - javascript

I am trying to use the Google diff-match-patch library for line diffs:
https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs. I get wrong patches when the combined number of lines in both inputs goes beyond 65,536 (2^16).
Is that a bug (in my code or in diff-match-patch), or am I hitting a known limitation of JavaScript/Node.js? Is there anything I can do to use d-m-p with larger files?
I am using Node v6.3.1 and diff-match-patch 1.0.4.
This script reproduces the problem:
var diff_match_patch = require("diff-match-patch")
// function copied from google wiki
// https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs
function diff_lineMode(text1, text2) {
var dmp = new diff_match_patch();
var a = dmp.diff_linesToChars_(text1, text2);
var lineText1 = a.chars1;
var lineText2 = a.chars2;
var lineArray = a.lineArray;
var diffs = dmp.diff_main(lineText1, lineText2, false);
dmp.diff_charsToLines_(diffs, lineArray);
return diffs;
}
// reproduce problem by diffing string with many lines to "abcb"
for (let size = 65534; size < 65538; size += 1) {
let text1 = "";
for (let i = 0; i < size; i++) {
text1 += i + "\n";
}
var patches = diff_lineMode(text1, "abcb")
console.log("######## Size: " + size + ": patches " + patches.length)
for (let i = 0; i < patches.length; i++) {
// patch[0] is action, patch[1] is value
var action = patches[i][0] < 0 ? "remove" : (patches[i][0] > 0 ? "add" : "keep")
console.log("patch" + i + ": " + action + "\n" + patches[i][1].substring(0, 10))
}
}
Giving these outputs:
######## Size: 65534: patches 2
patch0: remove
0
1
2
3
4
patch1: add
abcb
######## Size: 65535: patches 2
patch0: remove
0
1
2
3
4
patch1: add
######## Size: 65536: patches 2
patch0: keep
0
patch1: remove
1
2
3
4
5
######## Size: 65537: patches 3
patch0: remove
0
patch1: keep
1
patch2: remove
2
3
4
5
6

It's a limitation of ES5 combined with the algorithm, which maps each line to a 16-bit Unicode character. With ES6 the mapping could be extended to the full 21-bit code point range instead, covering longer files.
To speed up line diffing, the algorithm does not compare the whole texts, but replaces each line with a single Unicode character. Each character in the replacement maps to one unique line in a hashmap. The number of Unicode characters is limited, however, and the current implementation simply overflows.
This will not cause false positives (identical lines will still be considered identical), but it may miss some line differences, at a low probability of about 1/65K per line for natural diffs.
It also prevents the patches from being mapped back to the original text lines reliably: different lines were mapped to the same character, so the inverse process maps all such characters to the first mapped line.
It should be possible to scale correct diffing to much larger inputs by using a larger target space of symbols, for example by using 2 or 3 characters to represent each unique line.
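The overflow described above can be reproduced without the library. The following is a minimal sketch, assuming the line-to-character mapping is done with String.fromCharCode (which keeps only the low 16 bits of its argument); the variable names are illustrative and not taken from diff-match-patch.

```javascript
// String.fromCharCode truncates its argument to 16 bits, so line index 65537
// produces the same one-character token as line index 1.
const tokenForLine1 = String.fromCharCode(1);
const tokenForLine65537 = String.fromCharCode(65536 + 1);
console.log(tokenForLine1 === tokenForLine65537); // true

// String.fromCodePoint accepts values up to 0x10FFFF, so the same indexes
// stay distinct (at the cost of two UTF-16 code units above 0xFFFF).
const wideToken = String.fromCodePoint(65536 + 1);
console.log(wideToken === tokenForLine1); // false
console.log(wideToken.length); // 2

// Caveat: a real mapping would have to skip 0xD800-0xDFFF, because two lone
// surrogates placed next to each other read as a single astral character.
const glued = String.fromCodePoint(0xd83d) + String.fromCodePoint(0xde00);
console.log(glued === String.fromCodePoint(0x1f600)); // true
```

Whether a wider mapping is enough in practice also depends on the rest of the diff code treating each token as one unit, which this sketch does not cover.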

Related

Emojis to/from codepoints in Javascript

In a hybrid Android/Cordova game that I am creating I let users provide an identifier in the form of an Emoji + an alphanumeric - i.e. 0..9,A..Z,a..z - name. For example
🙋‍️Stackoverflow
Server-side, the user identifiers are stored with the Emoji and Name parts separated, with only the Name part required to be unique. From time to time the game displays a "league table" so the user can see how well they are performing compared to other players. For this purpose the server sends back a sequence of ten "high score" values consisting of Emoji, Name and Score.
This is then presented to the user in a table with three columns - one each for Emoji, Name and Score. And this is where I have hit a slight problem. Initially I had quite naively assumed that I could figure out the Emoji by simply looking at handle.codePointAt(0). When it dawned on me that an Emoji could in fact be a sequence of one or more 16 bit Unicode values I changed my code as follows
Part I: Dissecting the user-supplied "handle"
var i = 0, username,
codepoints = [],
handle = "🙋‍️StackOverflow",
len = handle.length;
while ((i < len) && (255 < handle.codePointAt(i)))
{codepoints.push(handle.codePointAt(i));i += 2;}
username = handle.substring(codepoints.length + 1);
At this point I have the "dissected" handle with
codepoints = [128587, 8205, 65039];
username = 'Stackoverflow';
A note of explanation for the i += 2 and the use of handle.length above. This article suggests that
handle.codePointAt(n) will return the code point for the full surrogate pair if you hit the leading surrogate. In my case since the Emoji has to be first character the leading surrogates for the sequence of 16 bit Unicodes for the emoji are at 0,2,4....
From the same article I learnt that String.length in Javascript will return the number of 16 bit code units.
Part II: Regenerating the emojis for the "league table"
Suppose the league table data squirted back to the app by my servers has the entry {emoji: [128583, 8205, 65039],username:"Stackexchange",points:100} for the emoji character 🙇‍️. Now here is the bothersome thing. If I do
var origCP = [],
i = 0,
origEmoji = '🙇‍️',
origLen = origEmoji.length;
while ((i < origLen) && (255 < origEmoji.codePointAt(i)))
{origCP.push(origEmoji.codePointAt(i));i += 2;}
I get
origLen = 5, origCP = [128583, 8205, 65039]
However, if I regenerate the emoji from the provided data
var reEmoji = String.fromCodePoint.apply(String,[128583, 8205, 65039]),
reEmojiLen = reEmoji.length;
I get
reEmoji = '🙇‍️'
reEmojiLen = 4;
So while reEmoji has the correct emoji its reported length has mysteriously shrunk down to 4 code units in place of the original 5.
If I then extract code points from the regenerated emoji
var reCP = [],
i = 0;
while ((i < reEmojiLen) && (255 < reEmoji.codePointAt(i)))
{reCP.push(reEmoji.codePointAt(i));i += 2;}
which gives me
reCP = [128583, 8205];
Even curiouser, origEmoji.codePointAt(3) gives the trailing surrogate pair value of 9794 while reEmoji.codePointAt(3) gives the value of the next full surrogate pair, 65039.
I could at this point just say
Do I really care?
After all, I just want to show the league table emojis in a separate column, so as long as I am getting the right emoji the niceties of what is happening under the hood do not matter. However, this might well be storing up problems for the future.
Can anyone here shed any light on what is happening?
Emojis are more complicated than just single chars. They come in "sequences", e.g. a ZWJ sequence (combining multiple emojis into one image) or a presentation sequence (providing different variations of the same symbol), and some more; see TR51 for all the nasty details.
If you "dump" your string like this
str = "🙋‍️StackOverflow"
console.log(...[...str].map(x => x.codePointAt(0).toString(16)))
you'll see that it's actually an (incorrectly formed) zwj-sequence wrapped in a presentation sequence.
So, to slice emojis accurately, you need to iterate the string as an array of code points (not code units!) and extract the supplementary-plane code points (> 0xffff) plus ZWJs and variation selectors. Example:
function sliceEmoji(str) {
let res = ['', ''];
for (let c of str) {
let n = c.codePointAt(0);
let isEmoji = n > 0xffff || n === 0x200d || (0xfe00 <= n && n <= 0xfe0f);
res[1 - isEmoji] += c;
}
return res;
}
function hex(str) {
return [...str].map(x => x.codePointAt(0).toString(16))
}
myStr = "🙋‍️StackOverflow"
console.log(sliceEmoji(myStr))
console.log(sliceEmoji(myStr).map(hex))
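To see why the two lengths in the question can differ, it helps to count code units and code points separately. This is a sketch under an assumption: that the original emoji also contained U+2642 (decimal 9794, the value the question reports for codePointAt(3)), which the list sent back by the server lacks. The variable and helper names are illustrative.

```javascript
// Code points assumed for the original emoji (with U+2642) and for the
// list that was sent back by the server (without it).
const original = String.fromCodePoint(0x1f647, 0x200d, 0x2642, 0xfe0f);
const rebuilt = String.fromCodePoint(128583, 8205, 65039);

// .length counts UTF-16 code units: only the first code point needs two.
console.log(original.length); // 5
console.log(rebuilt.length);  // 4

// Iterating a string yields whole code points, so no "i += 2" is needed
// and no code point is skipped.
const toCodePoints = (s) => Array.from(s, (ch) => ch.codePointAt(0));
console.log(toCodePoints(original)); // [128583, 8205, 9794, 65039]
console.log(toCodePoints(rebuilt));  // [128583, 8205, 65039]
```

Under that assumption, stepping by two code units lands on indexes 0, 2 and 4 of the original string and so never visits index 3, which would explain both the missing value and the shorter regenerated string.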

Rubik's Cube Scrambling Algorithm - JavaScript

I have been working on a Rubik’s Cube timer website, and I need to make a scrambling algorithm. I’ll go over how the scrambling algorithm should work:
Each face has its own letter, its initial. For example, if you want to move the front face, you write “F”. If you want to move the right face, you write “R”, and so on. Just note that the bottom face is D, for down. So you have D U R L B F.
If there is nothing after the letter, you turn the face clockwise. If there is an apostrophe “'”, you turn it counter-clockwise. If there is a 2, you turn it twice. Now the thing is that you cannot have the same letter twice in a row, as the moves could cancel (for example “.. U U’ ...” would be the same as doing nothing). So far, I have this taken care of in my algorithm.
The problem comes when you have one letter, then its opposite, then the first letter again (for example “.. U D U’ ...” would mean Up clockwise, Down clockwise, Up counter-clockwise).
I have no idea how to check for these and avoid them automatically. Here’s the code:
<div id="Scramble"></div>
<script>
generateScramble();
function generateScramble() {
// Possible Letters
var array = new Array(" U", " D", " R", " L", " F", " B")
// Possible switches
var switches = ["", "\'", "2"];
var array2 = new Array(); // The Scramble.
var last = ''; // Last used letter
var random = 0;
for (var i = 0; i < 20; i++) {
// keep drawing until the new letter
// differs from the last one
do {
random = Math.floor(Math.random() * array.length);
} while (last == array[random])
// assigns the new one as the last one
last = array[random];
// the scramble item is the letter
// with (or without) a switch
var scrambleItem = array[random] + switches[parseInt(Math.random()*switches.length)];
array2.push(scrambleItem); // Get letters in random order in the array.
}
var scramble = "Scramble: ";
// Appends all scramble items to scramble variable
for(i=0; i<20; i++) {
scramble += array2[i];
}
document.getElementById("Scramble").innerHTML = scramble; // Display the scramble
}
</script>
For starters, God's Number is 20 for the Rubik's Cube, so you need at most 20 moves instead of 25. I assume you are not doing scrambling (as your title suggests) but instead generating solution command strings for a generate-and-test type of solver. There are too many sequences that cancel each other, and checking for all of them would most likely be slower than actually trying them out.
The problem is that even O(n^20) is huge, so you need to lower the 20. That is done with a LUT holding semi-solved states. For example, create a table holding the states for all combinations of 5-turn scrambles. Then use that as the end condition, turning your solver into O(n^15 + n^5) = O(n^15) ...
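For the narrower case raised in the question (a letter, its opposite, then the first letter again), one possible approach is to remember the last two faces and reject a third consecutive move on the same axis. This is only a sketch: the function name, the axis table and the injectable random parameter are assumptions for illustration, not part of the original code.

```javascript
// Hypothetical helper (not from the original post): picks random moves but
// rejects a face that repeats the previous one, or that would be the third
// consecutive move on the same axis (e.g. "U D U'").
function generateScrambleMoves(length = 20, random = Math.random) {
  const faces = ["U", "D", "R", "L", "F", "B"];
  const axisOf = { U: 0, D: 0, R: 1, L: 1, F: 2, B: 2 };
  const suffixes = ["", "'", "2"];
  const moves = [];
  let last = null;       // face of the previous move
  let beforeLast = null; // face of the move before that

  while (moves.length < length) {
    const face = faces[Math.floor(random() * faces.length)];
    const sameFace = face === last;
    const thirdOnAxis =
      last !== null &&
      beforeLast !== null &&
      axisOf[face] === axisOf[last] &&
      axisOf[last] === axisOf[beforeLast];
    if (sameFace || thirdOnAxis) continue; // draw again

    moves.push(face + suffixes[Math.floor(random() * suffixes.length)]);
    beforeLast = last;
    last = face;
  }
  return moves;
}

console.log("Scramble: " + generateScrambleMoves().join(" "));
```

Opposite faces such as "U D" are still allowed next to each other; only a return to the same axis on the third move is filtered out.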

Adding space to fill array length

I am working with a javascript program that needs to be formatted a certain way. Basically, I need to have each section of information from an array be a set length, for example 12 characters long, and no more than that.
The problem I am running into comes when a value in the array is NOT 12 characters long. If I have a value that is less than the 12 characters the remaining character allotment needs to be filled with blank spaces.
The length of each section of information varies in size and is not always 12. How can I add X number of blank spaces, should the length not meet the maximum requirement, for each section?
This is where I am at with adding space:
str = str + new Array(str.length).join(' ');
I am pretty sure what I have above is wrong but I believe I am on the right track with the .join function. Any ideas?
EDIT: I was asked to show the wanted outcome. It is a bit complicated because this JavaScript is being run from a web report tool and not from something like Visual Studio, so it's not traditional JS.
The expected outcome should look something like:
Sample Image
So, as shown above, the data is on one line, cutting off longer strings of information or filling in blank spaces if a value is too short for the "column", to keep that nice even look.
Try this code and leverage the wonders of the map function.
Let's say your array is:
var myArr = ["123456789012", "12345678901", "123"];
Now just apply this function:
var padded = myArr.map(function(item) { // evaluate each item inside the array
  var strLength = item.length;
  if (strLength < 12) {
    return item + ' '.repeat(12 - strLength); // add the extra spaces as needed
  } else {
    return item; // return the item as-is because its length is 12 or more
  }
});
What you are looking for is ' '.repeat(x), where x is the number of times you want to repeat the string. It could be '*'.repeat(2) and you would get '**'. If you want to understand more about it, look at the docs.
Depending on which version of JavaScript you are running, this might work:
if (str.length < 12) str += ' '.repeat(12 - str.length);
Not exactly sure how you're set up, but something like the following will accept an array and return another array with all its values being 12 characters in length.
var array = ['Test', 'Testing', 'Tested', 'This is not a Test'];
var adjustedArray = correctLength(array, 12);
function correctLength(array, length) {
array.map(function(v, i) {
if (array[i].length < length) {
array[i] += Array((length+1) - array[i].length).join('_');
}
// might not need this if values are already no greater than 12
array[i] = array[i].substring(0, length);
});
return array;
}
console.log(adjustedArray);
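Where the runtime supports ES2017, padding and cutting can be combined with String.prototype.padEnd and slice. This is a sketch only; the helper name toFixedWidth is hypothetical, and the web report tool mentioned in the question may not provide padEnd, in which case the Array(...).join(' ') form above remains the fallback.

```javascript
// Hypothetical helper: pads short values with spaces and cuts long ones,
// so every result is exactly `width` characters long.
function toFixedWidth(value, width) {
  return String(value).padEnd(width, " ").slice(0, width);
}

const columns = ["Test", "Testing", "Tested", "This is not a Test"];
const fixed = columns.map((value) => toFixedWidth(value, 12));

// Brackets make the trailing spaces visible in the output.
console.log(fixed.map((value) => "[" + value + "]").join("\n"));
```

For the join-based form in the question, the array length would need to be the number of missing characters plus one (for example new Array(12 - str.length + 1).join(' ')), since join places the separator between elements.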

compressing a string of 0's and 1's in js

Introduction
I'm currently working on John Conway's Game of Life in JS. I have the game working (view here) and I'm working on extra functionality such as sharing your "grid / game" with your friends. To do this I'm extracting the values of the grid (whether each cell is alive or dead) into a long string of 0's and 1's.
This string has a variable length since the grid is not always the same size. for example:
grid 1 has a length and width of 30 => so the string's length is 900
grid 2 has a length and width of 50 => so the string's length is 2500
The problem
As you can see, these strings of 0's and 1's are way too long to copy around and share.
However hard I try, I don't seem to be able to come up with code that would compress a string this long into one that is easy to handle.
Any ideas on how to compress (and decompress) this?
I have considered simply writing down every possible grid option for the grid sizes 1x1 to 100x100 and giving them a key/reference to use as a shareable code. Doing that by hand would be madness, but maybe one of you has an idea of how to create an algorithm that can do this?
GitHub repository
In case it wasn't already obvious, the string you're trying to store looks like a binary string.
Counting systems
Binary is the base-2 number system. This essentially means that there are two characters being used to count. Normally we are used to counting in base-10 (decimal). In computer science the hexadecimal system (base-16) is also widely used.
Since you're not storing the bits as bits but as bytes (use var a = 0b1100001; if you ever wish to store them as bits), the 'binary' you wish to store takes as much space as any other string of the same length.
Since you're using the binary system, each position has just 2 possible values. In hexadecimal, a single position can hold 16 possible values. This is already a big improvement when it comes to storing the data compactly. As an example, 0b11111111 and 0xff both represent the decimal number 255.
In your situation that would shave 6 bytes off every 8 bytes you have to store. In the end you'd be left with a string just 1/4 of the length of the original string.
Javascript implementation
Essentially what we want to do is to interpret the string you store as binary and retrieve the hexadecimal value. Luckily JavaScript has built in functionality to achieve stuff like this:
var bin =
'1110101110100011' +
'0000101111100001' +
'1010010101011010' +
'0000110111011111' +
'1111111001010101' +
'0111000011100001' +
'1011010100110001' +
'0111111110010100' +
'0111110110100101' +
'0000111101100111' +
'1100001111011100' +
'0101011100001111' +
'0110011011001101' +
'1000110010001001' +
'1010100010000011' +
'0011110000000000';
var returnValue = '';
for (var i = 0; i < parseInt(bin.length / 8); i++) {
// Pad to two hex digits so that bytes below 0x10 keep their leading zero.
returnValue += ('0' + parseInt(bin.substr(i*8, 8), 2).toString(16)).slice(-2);
}
console.log(bin.length); // Prints 256
console.log(returnValue.length); // Prints 64
We're saying "parse this string and interpret it like a base-2 number and store it as a hexadecimal string".
Decoding works the same way in the other direction: read the hexadecimal string two characters at a time, parse each pair as a base-16 number and write it out as a base-2 string.
Please note
A prerequisite for this code to work correctly is that the length of the binary string is divisible by 8. See the following example:
parseInt('00011110', 2).toString(16); // returns '1e'
parseInt('1e', 16).toString(2); // returns '11110'
// Technically both representations still have the same decimal value
When decoding, you should add leading zeros until you have a full byte (8 bits).
In case the number of positions you have to store is not divisible by 8 you can, for example, add padding and put a number at the front of the output string to indicate how many positions to strip.
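The encode and decode directions, including the leading-zero handling mentioned in the note above, can be sketched as a pair of helpers. The function names are illustrative, the input length is assumed to be a multiple of 8, and padStart requires an ES2017 runtime.

```javascript
// Illustrative helpers (names are not from the answer above).
// Assumes bits.length is a multiple of 8.
function bitsToHex(bits) {
  let hex = "";
  for (let i = 0; i < bits.length; i += 8) {
    const byte = parseInt(bits.slice(i, i + 8), 2);
    hex += byte.toString(16).padStart(2, "0"); // always two hex digits
  }
  return hex;
}

function hexToBits(hex) {
  let bits = "";
  for (let i = 0; i < hex.length; i += 2) {
    const byte = parseInt(hex.slice(i, i + 2), 16);
    bits += byte.toString(2).padStart(8, "0"); // restore leading zeros
  }
  return bits;
}

const sample = "0001111011111111";
console.log(bitsToHex(sample));            // "1eff"
console.log(hexToBits(bitsToHex(sample))); // "0001111011111111"
```

Padding on both sides is what makes the round trip lossless: without it, '00011110' would come back as '11110'.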
Wait, there's more
To get even shorter strings you can build a lookup table with 256 characters, in which you look up the character associated with each byte value. (This works because you're still storing the value as a string.) Sadly neither the ASCII nor the UTF-8 encodings are suited for this, as there are blocks of values which have no characters defined.
It may look like:
// Go fill this array until you have 256 values within it.
var lookup = ['A', 'B', 'C', 'D'];
var smallerValue = lookup[0x00];
This way you can have 256 possible values at a single position, AND you have used your byte to the fullest.
Please note that no real compression is happening here. We're rather utilising data types to be used more efficiently for your current use case.
If we make the assumption that the grid contains many more 0's than 1's, you may want to try this simple compression scheme:
convert the binary string to a hexadecimal string
convert '00' sub-strings to the 'z' symbol
convert 'zz' sub-strings to the 'Z' symbol
we could go further, but let's stop here for the demo
Below is an example with a 16x16 grid:
var bin =
'0000000000000000' +
'0000001000000000' +
'0000011100000000' +
'0000001000000000' +
'0000000000000000' +
'0000000000111000' +
'0000100000111000' +
'0000000000111000' +
'0000000000000000' +
'0000000000000000' +
'0000000010000000' +
'0000000101000000' +
'0000000010000000' +
'0000000000000000' +
'0000100000000000' +
'0000000000000000';
var packed = bin
.match(/(.{4})/g)
.map(function(x) {
return parseInt(x, 2).toString(16);
})
.join('')
.replace(/00/g, 'z')
.replace(/zz/g, 'Z');
This will produce the string "Z02z07z02ZZ380838z38ZZz8z14z08Zz8Zz".
The unpacking process is doing the exact opposite:
var bin = packed
.replace(/Z/g, 'zz')
.replace(/z/g, '00')
.split('')
.map(function(x) {
return ('000' + parseInt(x, 16).toString(2)).substr(-4, 4);
})
.join('');
Note that this code will only work correctly if the length of the input string is a multiple of 4. If it's not the case, you'll have to pad the input and crop the output.
EDIT : 2nd method
If the input is completely random -- with roughly as many 0's as 1's and no specific repeating patterns -- the best you can do is probably to convert the binary string to a BASE64 string. It will be significantly shorter (this time with a fixed compression ratio of about 17%) and can still be copied/pasted by the user.
Packing:
var bin =
'1110101110100011' +
'0000101111100001' +
'1010010101011010' +
'0000110111011111' +
'1111111001010101' +
'0111000011100001' +
'1011010100110001' +
'0111111110010100' +
'0111110110100101' +
'0000111101100111' +
'1100001111011100' +
'0101011100001111' +
'0110011011001101' +
'1000110010001001' +
'1010100010000011' +
'0011110000000000';
var packed =
btoa(
bin
.match(/(.{8})/g)
.map(function(x) {
return String.fromCharCode(parseInt(x, 2));
})
.join('')
);
Will produce the string "66ML4aVaDd/+VXDhtTF/lH2lD2fD3FcPZs2MiaiDPAA=".
Unpacking:
var bin =
atob(packed)
.split('')
.map(function(x) {
return ('0000000' + x.charCodeAt(0).toString(2)).substr(-8, 8);
})
.join('');
Or if you want to go a step further, you can consider using something like base91 instead, for a reduced encoding overhead.
LZ-string
Using LZ-string I was able to compress the "code" quite a bit.
By simply compressing it to base64 like this:
var compressed = LZString.compressToBase64(string)
Decompressing is also just as simple as this:
var decompressed = LZString.decompressFromBase64(compressed)
However the length of this compressed string is still pretty long given that you have about as many 0s as 1s (not given in the example)
example
But the compression does work.
ANSWER
For any of you who are wondering how exactly I ended up doing it, here's how:
First I made sure every string passed in would be padded with leading 0s until its length was divisible by 8 (saving the number of 0s used to pad, since they're needed while decompressing).
I used Corstian's answer and functions to compress my string (interpreted as binary) into a hexadecimal string, although I had to make one slight alteration.
Not every binary substring with a length of 8 will return exactly 2 hex characters, so for those cases I ended up just adding a 0 in front of the hex substring. The hex substring will have the same value but its length will now be 2.
Next up I used the technique from Arnauld's answer: taking every double character and replacing it with a single character (one not used in the hexadecimal alphabet, to avoid conflicts). I did this twice for every hexadecimal character.
For example:
the hex string 11 will become h and hh will become H
01101111 will become 0h0H
Since most grids are going to have more dead cells than alive ones, I made sure the 0s would compress even further, using Arnauld's method again but going a step further.
00 -> g | gg -> G | GG -> w | ww -> W | WW -> x | xx -> X | XX-> y | yy -> Y | YY -> z | zz -> Z
This resulted in Z representing 4096 (binary) 0s
The last step of the compression was adding the amount of leading 0s in front of the compressed string, so we can shave those off at the end of decompressing.
This is how the returned string looks in the end.
amount of leading 0s-compressed string so a 64*64 empty grid, will result in 0-Z
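The claim that a single Z stands for 4096 binary zeros can be checked with a small sketch of the zero-substitution chain alone. The helper names are illustrative, and split/join is used here in place of the regular-expression replaces; it substitutes the same non-overlapping occurrences from left to right.

```javascript
// Substitution pairs listed above, applied in this order when packing.
const zeroPairs = [
  ["00", "g"], ["gg", "G"], ["GG", "w"], ["ww", "W"], ["WW", "x"],
  ["xx", "X"], ["XX", "y"], ["yy", "Y"], ["YY", "z"], ["zz", "Z"],
];

// Each step halves a run of the previous symbol.
function collapseZeros(hex) {
  return zeroPairs.reduce((s, [from, to]) => s.split(from).join(to), hex);
}

// Unpacking applies the same pairs backwards, last pair first.
function expandZeros(packed) {
  return zeroPairs
    .slice()
    .reverse()
    .reduce((s, [from, to]) => s.split(to).join(from), packed);
}

const emptyGridHex = "0".repeat(1024); // 1024 hex digits = 4096 bits
console.log(collapseZeros(emptyGridHex));       // "Z"
console.log(expandZeros("Z") === emptyGridHex); // true
```

Ten halving steps turn 1024 hex zeros into one symbol, which matches the 64*64 empty grid example.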
Decompressing is practically doing everything the other way around.
First, splitting off the number that represents how many leading 0s we used as padding from the compressed string.
Then, using Arnauld's technique, turning the further "compressed" characters back into hexadecimal code.
Taking this hex string and turning it back into binary code, making sure, as Corstian pointed out, that every binary substring has a length of 8 (if not, we pad the substring with leading 0s until it does have a length of exactly 8).
And then the last step is to shave off the leading 0s we used as padding to make the initial string's length divisible by 8.
The functions
Function I use to compress:
/**
* Compresses a binary string into a compressed string.
* Returns the compressed string.
*/
Codes.compress = function(bin) {
bin = bin.toString(); // To make sure the binary is a string;
var returnValue = ''; // Empty string to add our data to later on.
// If the length of the binary string is not divisible by 8 the compression
// won't work correctly. So we add leading 0s to the string and store the number
// of leading 0s in a variable.
// Determining the amount of 'padding' needed.
var padding = ((Math.ceil(bin.length/8))*8)-bin.length;
// Adding the leading 0s to the binary string.
for (var i = 0; i < padding; i++) {
bin = '0'+bin;
}
for (var i = 0; i < parseInt(bin.length / 8); i++) {
// Determining the substring.
var substring = bin.substr(i*8, 8)
// Determining the hexValue of this binary substring.
var hexValue = parseInt(substring, 2).toString(16);
// Not all binary values produce two hex digits. For example:
// '00000011' gives just a '3' while what we want would be '03'. So we add a 0 in front.
if(hexValue.length == 1) hexValue = '0'+hexValue;
// Adding this hexValue to the end string which we will return.
returnValue += hexValue;
}
// Compressing the hex string even further.
// If there's any double hex chars in the string it will take those and compress those into 1 char.
// Then if we have multiple of those chars these are compressed into 1 char again.
// For example: the hex string "ff" will result in a "v" and "ffff" will result in a "V".
// Also: "11" will result in a "h" and "1111" will result in a "H"
// For the 0s this process is repeated a few times.
// (string with 4096 0s) (this would represent a 64*64 EMPTY grid)
// will result in a "Z".
var returnValue = returnValue.replace(/00/g, 'g')
.replace(/gg/g, 'G')
// Since 0s are probably more likely to exist in our binary and hex, we go a step further compressing them like this:
.replace(/GG/g, 'w')
.replace(/ww/g, 'W')
.replace(/WW/g, 'x')
.replace(/xx/g, 'X')
.replace(/XX/g, 'y')
.replace(/yy/g, 'Y')
.replace(/YY/g, 'z')
.replace(/zz/g, 'Z')
//Rest of the chars...
.replace(/11/g, 'h')
.replace(/hh/g, 'H')
.replace(/22/g, 'i')
.replace(/ii/g, 'I')
.replace(/33/g, 'j')
.replace(/jj/g, 'J')
.replace(/44/g, 'k')
.replace(/kk/g, 'K')
.replace(/55/g, 'l')
.replace(/ll/g, 'L')
.replace(/66/g, 'm')
.replace(/mm/g, 'M')
.replace(/77/g, 'n')
.replace(/nn/g, 'N')
.replace(/88/g, 'o')
.replace(/oo/g, 'O')
.replace(/99/g, 'p')
.replace(/pp/g, 'P')
.replace(/aa/g, 'q')
.replace(/qq/g, 'Q')
.replace(/bb/g, 'r')
.replace(/rr/g, 'R')
.replace(/cc/g, 's')
.replace(/ss/g, 'S')
.replace(/dd/g, 't')
.replace(/tt/g, 'T')
.replace(/ee/g, 'u')
.replace(/uu/g, 'U')
.replace(/ff/g, 'v')
.replace(/vv/g, 'V');
// Adding the number of leading 0s that need to be ignored when decompressing to the string.
returnValue = padding+'-'+returnValue;
// Returning the compressed string.
return returnValue;
}
The function I use to decompress:
/**
* Decompresses the compressed string back into a binary string.
* Returns the decompressed string.
*/
Codes.decompress = function(compressed) {
var returnValue = ''; // Empty string to add our data to later on.
// Splitting the input on '-' to separate the number of padding 0s and the actual hex code.
var compressedArr = compressed.split('-');
var paddingAmount = compressedArr[0]; // Setting a variable equal to the amount of leading 0s used while compressing.
compressed = compressedArr[1]; // Setting the compressed variable to the actual hex code.
// Decompressing further compressed characters.
compressed = compressed // Decompressing the further compressed 0s (even further than the rest of the chars).
.replace(/Z/g, 'zz')
.replace(/z/g, 'YY')
.replace(/Y/g, 'yy')
.replace(/y/g, 'XX')
.replace(/X/g, 'xx')
.replace(/x/g, 'WW')
.replace(/W/g, 'ww')
.replace(/w/g, 'GG')
.replace(/G/g, 'gg')
.replace(/g/g, '00')
// Rest of chars...
.replace(/H/g, 'hh')
.replace(/h/g, '11')
.replace(/I/g, 'ii')
.replace(/i/g, '22')
.replace(/J/g, 'jj')
.replace(/j/g, '33')
.replace(/K/g, 'kk')
.replace(/k/g, '44')
.replace(/L/g, 'll')
.replace(/l/g, '55')
.replace(/M/g, 'mm')
.replace(/m/g, '66')
.replace(/N/g, 'nn')
.replace(/n/g, '77')
.replace(/O/g, 'oo')
.replace(/o/g, '88')
.replace(/P/g, 'pp')
.replace(/p/g, '99')
.replace(/Q/g, 'qq')
.replace(/q/g, 'aa')
.replace(/R/g, 'rr')
.replace(/r/g, 'bb')
.replace(/S/g, 'ss')
.replace(/s/g, 'cc')
.replace(/T/g, 'tt')
.replace(/t/g, 'dd')
.replace(/U/g, 'uu')
.replace(/u/g, 'ee')
.replace(/V/g, 'vv')
.replace(/v/g, 'ff');
for (var i = 0; i < parseInt(compressed.length / 2); i++) {
// Determining the substring.
var substring = compressed.substr(i*2, 2);
// Determining the binValue of this hex substring.
var binValue = parseInt(substring, 16).toString(2);
// If the length of the binary value is not equal to 8 we add leading 0s (js deletes the leading 0s)
// For instance the binary number 00011110 is equal to the hex number 1e,
// but simply running the code above will return 11110. So we have to add the leading 0s back.
if (binValue.length != 8) {
// Determining how many 0s to add:
var diffrence = 8 - binValue.length;
// Adding the 0s:
for (var j = 0; j < diffrence; j++) {
binValue = '0'+binValue;
}
}
// Adding the binValue to the end string which we will return.
returnValue += binValue
}
var decompressedArr = returnValue.split('');
returnValue = ''; // Emptying the return variable.
// Deleting the not needed leading 0s used as padding.
for (var i = paddingAmount; i < decompressedArr.length; i++) {
returnValue += decompressedArr[i];
}
// Returning the decompressed string.
return returnValue;
}
URL shortener
I still found the "compressed" strings a little long for sharing / pasting around. So I used a simple URL shortener (view here) to make this process a little easier for the user.
Now you might ask, then why did you need to compress this string anyway?
Here's why:
First of all, my project is hosted on GitHub Pages (gh-pages). The info page of gh-pages tells us that the URL can't be any longer than 2000 chars. This would mean that the max grid size would be the square root of (2000 - length of the base URL), which isn't that big. By using this "compression" we are able to share much larger grids.
The second reason is that it's a challenge. I find dealing with problems like these fun and also helpful, since you learn a lot.
Live
You can view the live version of my project here. and/or find the github repository here.
Thank you
I want to thank everyone who helped me with this problem, especially Corstian and Arnauld, since I ended up using their answers to reach my final functions.
Sooooo.... thanks guys! Appreciate it!
In the Game of Life there is a board of ones and zeros. I want to be able to back up to a previous generation (board size 4800), so I save every 16 cells as a 4-digit hexadecimal value, which is 1/4 the size. http://innerbeing.epizy.com/cwebgl/gameoflife.html [g = Go] [b = Backup]
function drawGen(n) {
stop(); var i = clamp(n,0,brw*brh-1), hex = gensave[i].toString();
echo(":",i, n,nGEN); nGEN = i; var str = '';
for (var i = 0; i < parseInt(hex.length / 4); i++)
str = str + pad(parseInt(hex.substr(i*4,4), 16).toString(2),16,'0');
for (var j=0;j<Board.length;j++) Board[j] = intr(str.substr(j,1));
drawBoard();
}
function Bin2Hex(n) {
var i = n.indexOf("1"); /// leading Zeros = NAN
if (i == -1) return "0000";
i = right(n,i*-1);
return pad(parseInt(i,2).toString(16),4,'0');
}
function saveGen(n) {
var b = Board.join(''), str = ''; /// concat array to string 10101
for (var i = 0; i < parseInt(b.length / 16); i++)
str = str + Bin2Hex(b.substr(i*16,16));
gensave[n] = str;
}
function right(st,n) {
var s = st.toString();
if (!n) return s;
if (n < 0) return s.substr(n * -1,s.length + n);
return s.substr(s.length - n,n);
}
function pad(str, l, padwith) {
var s = str;
while (s.length < l) s = padwith + s;
return s;
}

Javascript Code 128 String Builder - Ascii Value > 127 issue

I've been trying to build a small html/javascript based code 128 type B text builder.
I have it working for most barcodes, but I'm running into an issue if the value used creates a checksum that is a character that is greater than ascii 127. I'm not sure what I should be using to replace that value in that case. I've read of adding 'Code 3' and 'FNC X' values in the barcode, but it's not clear in what format, with braces and should FNC be 'FNC4' or 'FNC 4', or if that is relevant to the checksum.
I'm using the free 128 font from this site, http://jtbarton.com/Barcodes/BarcodeStringBuilderExample.aspx.
I've tried various conditions, such as if the value is >127, take the existing ascii value instead of adding 32, but the barcode is then not readable.
I have a jsfiddle here, https://jsfiddle.net/jcqvag5g/ . If you use a value like 500.77005.YELLO.XXXXX.0160828, the barcode text is invalid.
Any insight would be appreciated. I haven't found a working solution at the moment. It could also be the specific barcode font I'm using, so recommendations for other solid 128 fonts would also be appreciated.
This is the main js code.
function textTo128(str) {
/*
* Generate 128 Barcode text, suitable for copying and pasting.
*/
var len = str.length; //str.length - get length of string, used to generate the checksum.
var type128 = 104; // 128 Type B start
var typeClose = 106;
var total = 104;
var i; // Counter Variable
for(i=0;i<len;i++){
total += ((i+1) * (str.charCodeAt(i)-32)); //Multiply char position with decimal value of character, keep running total
}
var modVal = total % 103; // Use Modulus to find our checksum
var checksum = String.fromCharCode(modVal+32);
if(modVal+32>126){alert(modVal+32);};
document.getElementById('barcodeTotal').innerHTML = String.fromCharCode(type128+100) + str + checksum + String.fromCharCode(typeClose+100);
}
Thanks,
-David
The checksum character code should be increased by a further 18 if it is larger than 126:
var checksum = String.fromCharCode(modVal+32 > 126 ? modVal+32+18 : modVal+32);
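The checksum value itself is independent of the font and can be separated from the font-specific character mapping. Below is a minimal sketch: code128BChecksumValue follows the weighted-sum calculation from the question, while checksumCharForFont simply restates the +18 shift from the answer above and should be treated as an assumption about this particular font. Both function names are hypothetical.

```javascript
// Code 128 set B: start value 104; each character's value is its ASCII code
// minus 32, weighted by its 1-based position; the sum is taken modulo 103.
function code128BChecksumValue(text) {
  let total = 104;
  for (let i = 0; i < text.length; i++) {
    total += (i + 1) * (text.charCodeAt(i) - 32);
  }
  return total % 103;
}

// Font-specific step (assumption taken from the answer above): values whose
// plain character code would exceed 126 are shifted by a further 18.
function checksumCharForFont(value) {
  const code = value + 32;
  return String.fromCharCode(code > 126 ? code + 18 : code);
}

console.log(code128BChecksumValue("A")); // 34
console.log(checksumCharForFont(34));    // "B"
```

Keeping the two steps apart makes it easier to swap in a different mapping if another Code 128 font places its high-value symbols elsewhere.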
