JavaScript Pi Spigot Algorithm not working - javascript

I have translated the following C++ code:
#include <iostream>
using namespace std;
#define NDIGITS 100
#define LEN (NDIGITS/4+1)*14
long a[LEN];
long b;
long c = LEN;
long d;
long e = 0;
long f = 10000;
long g;
long h = 0;
int main(void) {
cout<<b<<endl;
for(; (b=c-=14) > 0 ;){
for(; --b > 0 ;){
d *= b;
if( h == 0 )
d += 2000*f;
else
d += a[b]*f;
g=b+b-1;
a[b] = d % g;
d /= g;
}
h = printf("%ld",e+d/f);
d = e = d % f;
}
getchar();
return 0;
}
Into JavaScript:
function mod(n, m) {
return ((m % n) + n) % n;
} // mod function to fix javascript modulo bug
function calculate(NDIGITS){
var LEN = (NDIGITS / 4 + 1) * 14,
out = "",
a = [],
b = 0,
c = LEN,
d = 0,
e = 0,
f = 10000,
g = 0,
h = 0;
for( ; a.length != LEN; a.push(0));
for( ; (b=c-=14) > 0 ; ){
for(; --b > 0 ;){
d *= b;
if(h == 0)
d += 2000*f;
else
d += a[b]*f;
g=b+b-1;
a[b] = mod(d, g);
d /= g;
};
h = 4;
out += e + d / f;
d = e = mod(d, f);
};
return out;
};
calculate(100);
The problem is, the C++ (which is correct) output looks like this:
314159265358979323846264338327952884197169399375105820974944592307816406286208998628034825342117067
But the JavaScript (which is wrong) output looks like this:
3141.59265358979345928.3358757688158002.0385670499462603.1996016540431161.44919092773639662.2465149363658988.6127837844255865.38922090756173.61883094848226189.6324225085448150.3443440509899223.2179589088062808.1943642437717982.8973948575671840.86646781354151140.38694447211833938.5632867441137341.458720505086448.7384444661472807.14448220310268936.5521832735086764.9290682040381301.76585926509928223.4135991546457438.115065010927
Where did I mess up in my coding? Thanks for the help.

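There are two problems in the translation:
JavaScript's / operator always performs floating-point division, whereas C++ truncates when both operands are integers, so the results of d / g and d / f have to be passed through Math.floor.
The arguments of the mod function are exchanged: with the signature mod(n, m), the call mod(d, g) evaluates g % d instead of d % g.
Here is code that produces the same result as the C++ code for the given sample of 100 digits: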
function mod(m, n) {
return ((m % n) + n) % n;
} // mod function to fix javascript modulo bug
function calculate(NDIGITS) {
var LEN = (Math.floor(NDIGITS / 4) + 1) * 14, // integer division, as in the C++ macro
out = "",
a = [],
b = 0,
c = LEN,
d = 0,
e = 0,
f = 10000,
g = 0,
h = 0;
for (; a.length !== LEN; a.push(0));
for (; (b = c -= 14) > 0;) {
for (; --b > 0;) {
d *= b;
if (h === 0) {
d += 2000 * f;
} else {
d += a[b] * f;
}
g = b + b - 1;
a[b] = mod(d, g);
d = Math.floor(d / g);
}
h = Math.floor(e + d / f);
out += h;
h = String(h).length; // like printf's return value: the number of characters written
d = e = mod(d, f);
}
return out;
}
console.log(calculate(100));
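The two pitfalls named in this answer can be seen in isolation. The snippet below is a minimal sketch: intDiv is a hypothetical helper name that does not appear in the original code, and mod mirrors the corrected helper from the answer.

```javascript
// Integer division: C++ truncates for integer operands, JavaScript does not.
function intDiv(a, b) {
  return Math.trunc(a / b); // truncates toward zero, like C++ integer division
}

// Modulo with the dividend as the first argument; the result is
// non-negative whenever n is positive.
function mod(m, n) {
  return ((m % n) + n) % n;
}

console.log(7 / 2);        // 3.5
console.log(intDiv(7, 2)); // 3
console.log(mod(7, 3));    // 1
console.log(mod(3, 7));    // 3 (the value the swapped arguments would produce for 7 and 3)
console.log(-7 % 3);       // -1 (the sign behaviour the helper guards against)
console.log(mod(-7, 3));   // 2
```

For the non-negative values used in the spigot loop, Math.floor and Math.trunc give the same result, so either works there.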

Related

Function for computing percentage of how similar an array of text strings are to each other in javascript

Let's say I have the following array of strings: var array_of_strings = ["abc", "abcd"];
My goal is to run a function and have it return roughly 75% (0.75), implying that the strings have roughly 75% in common. "Roughly" here means within a certain error range, say 5% or some settable number.
I'm currently using the Levenshtein algorithm to compute the differences between the strings. However, this is extremely slow and taxing on the CPU in my situation, as the strings I'm using are thousands and thousands of lines long.
Levenshtein tells me how the strings differ. While that is useful in certain situations, in my particular use case I only want to see by roughly what percentage the strings differ from each other, not necessarily what each difference actually is.
The Levenshtein implementation I'm currently using is below (borrowed from another answer here on Stack Overflow). It returns how many differences it found, which I can then use to calculate a percentage difference, but it's very slow! It sometimes takes a couple of seconds to run and freezes up the computer as well.
async function levenshtein(s, t) {
return new Promise((resolve, reject) => {
console.log("levenshtein active");
if (s === t) {
resolve(0); // a plain return here would leave the promise pending forever
return;
}
var n = s.length, m = t.length;
if (n === 0 || m === 0) {
resolve(n + m);
return;
}
var x = 0, y, a, b, c, d, g, h, k;
var p = new Array(n);
for (y = 0; y < n;) {
p[y] = ++y;
}
for (; (x + 3) < m; x += 4) {
var e1 = t.charCodeAt(x);
var e2 = t.charCodeAt(x + 1);
var e3 = t.charCodeAt(x + 2);
var e4 = t.charCodeAt(x + 3);
c = x;
b = x + 1;
d = x + 2;
g = x + 3;
h = x + 4;
for (y = 0; y < n; y++) {
k = s.charCodeAt(y);
a = p[y];
if (a < c || b < c) {
c = (a > b ? b + 1 : a + 1);
}
else {
if (e1 !== k) {
c++;
}
}
if (c < b || d < b) {
b = (c > d ? d + 1 : c + 1);
}
else {
if (e2 !== k) {
b++;
}
}
if (b < d || g < d) {
d = (b > g ? g + 1 : b + 1);
}
else {
if (e3 !== k) {
d++;
}
}
if (d < g || h < g) {
g = (d > h ? h + 1 : d + 1);
}
else {
if (e4 !== k) {
g++;
}
}
p[y] = h = g;
g = d;
d = b;
b = c;
c = a;
}
}
for (; x < m;) {
var e = t.charCodeAt(x);
c = x;
d = ++x;
for (y = 0; y < n; y++) {
a = p[y];
if (a < c || d < c) {
d = (a > d ? d + 1 : a + 1);
}
else {
if (e !== s.charCodeAt(y)) {
d = c + 1;
}
else {
d = c;
}
}
p[y] = d;
c = a;
}
h = d;
}
resolve(h);
})
}
My question is: is there a way to calculate the difference faster when large strings are used? In my case accuracy is not too important, as long as the difference is known to within a rough percentage.
For example, if a research paper was published and I have the original paper and a student's paper, I want to know whether roughly 10% of the student's paper is plagiarized.
Maybe cutting random parts out of the strings could save time, but this feels very dirty/inefficient.
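For reference, the 75% figure in the question corresponds to dividing the edit distance by the length of the longer string. The snippet below is a minimal sketch of that conversion only: levenshteinDistance is a plain, unoptimised reference implementation and similarity is a hypothetical helper name, so it illustrates the ratio rather than solving the speed problem.

```javascript
// Plain two-row Levenshtein distance (reference implementation, not optimised).
function levenshteinDistance(s, t) {
  var prev = [], curr = [], i, j, tmp;
  for (j = 0; j <= t.length; j++) prev[j] = j;
  for (i = 1; i <= s.length; i++) {
    curr[0] = i;
    for (j = 1; j <= t.length; j++) {
      var cost = s.charCodeAt(i - 1) === t.charCodeAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    tmp = prev; prev = curr; curr = tmp; // the row just filled becomes "prev"
  }
  return prev[t.length];
}

// Hypothetical helper: 1 means identical, 0 means nothing in common.
function similarity(s, t) {
  var longest = Math.max(s.length, t.length);
  if (longest === 0) return 1; // two empty strings are identical
  return 1 - levenshteinDistance(s, t) / longest;
}

console.log(similarity("abc", "abcd")); // 0.75
```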

Unknown JSON response from Sencha-based API

I am having trouble decoding the following JSON response from a Sencha-based API.
I tried treating it as a Base64 string and decoding it, but the end result was always malformed and contained many unrecognized characters.
function handleServerReponse(a) {
if (a.substr(0, 5) == 'I:Qc[') {
var f = 5;
var e = (a.substr(5) + '').split('');
var d = [];
for (var b = e.length - 1; b >= 0; b--) {
d[b] = String.fromCharCode(e[b].charCodeAt(0) - f)
}
var c = d.join('');
//c = fix_utf8(base64_decode(c));
//c = base64_decode(c);
a = c
}
return a
}
function fix_utf8(c) {
var d = []
, a = 0
, b = 0
, e = 0
, f = 0;
while (a < c.length) {
b = c.charCodeAt(a);
if (b < 128) {
d.push(String.fromCharCode(b));
a++
} else {
if (b > 191 && b < 224) {
e = c.charCodeAt(a + 1);
d.push(String.fromCharCode((b & 31) << 6 | e & 63));
a += 2
} else {
e = c.charCodeAt(a + 1);
f = c.charCodeAt(a + 2);
d.push(String.fromCharCode((b & 15) << 12 | (e & 63) << 6 | f & 63));
a += 3
}
}
}
return d.join('')
}
function base64_decode(a) {
var e = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=';
var f, j, n, l, m, h, i, d, b = 0, g = 0, k = '', c = [];
if (!a) {
return a
}
a += '';
do {
l = e.indexOf(a.charAt(b++));
m = e.indexOf(a.charAt(b++));
h = e.indexOf(a.charAt(b++));
i = e.indexOf(a.charAt(b++));
d = l << 18 | m << 12 | h << 6 | i;
f = d >> 16 & 255;
j = d >> 8 & 255;
n = d & 255;
if (h == 64) {
c[g++] = String.fromCharCode(f)
} else {
if (i == 64) {
c[g++] = String.fromCharCode(f, j)
} else {
c[g++] = String.fromCharCode(f, j, n)
}
}
} while (b < a.length);k = c.join('');
return k
}
function utf8_from_str(s) {
for(var i=0, enc = encodeURIComponent(s), a = []; i < enc.length;) {
if(enc[i] === '%') {
a.push(parseInt(enc.substr(i+1, 2), 16))
i += 3
} else {
a.push(enc.charCodeAt(i++))
}
}
return a
}
function utf8_to_str(a) {
for(var i=0, s=''; i<a.length; i++) {
var h = a[i].toString(16)
if(h.length < 2) h = '0' + h
s += '%' + h
}
return decodeURIComponent(s)
}
The JS functions above just try to convert the response into UTF-8, but there are still unrecognized characters in the resulting JSON.
The API response I'm trying to decode is
I:Qc[j~O~_]S6gMWNougj~OVhr>|_]O5jZyqjXN;NoZ}SYl~TYN}SnNxNp}{griuiM[p_XN;Nn5}RYhzRoZ9RV~Nn|nYLK5f]W6_LZnTnNRn99RY^~TYJnQHOYiLK5_XN;NpSGNn|nWpqVZ~N;No^|SRnQHOIf]W:NotnYJJlXp>RYJJnQHOff]GLf]_qNotnTYN|RhnQHOG_LW~_]SNotnSYZ7RHGIVZ:JYJ[RXZiN[HGJZnNxNqOm_LK~XZVnTnOVRYqHRYF8WXNxNqGZj]GqNotnZ5_XNn|n[]SqV7>p_[WuiL}qNotnZ7qz_7}qNJ_mg\qxjXGX_]Su_L[z^7ZnQHOH_\WNotnR~NxNpOmiLmNotnRnNxNqS}WsVnTnN~QIV6R~NxNp}{iKSujrZnTnN}SX|8RIFnQHORg8WYf]uqV\S~_]RnTnN|QoVnQHO__\K~Vs[ugMVnTnN}TY^7Nn|nf]SY^\6qY\KugLqz_5>~W]mqg]G5NotnYr=nQHOG[p5nTnNpRn|6SoFxSIF|Nn|nV]_mf\}m^r}qW]K6f]W:NotnOIVTH||RoVnQHOugp_{hr[ogL>i]OqNotnYr=nQHOR^]S5[MOmgsSr_]OX_\SJ^]WqY\>ziLm__\K~NotnV][sNIN|RYNnQHOR^]S5[MOmgsSr_]O\^\}6_XN;NnV}QIh8RH|6RIFnQHOuh5}uh8Wq_J_{hqSmgLZnTnOTg~NxNp}uh8WNou<Nrm~_\^nTnN{ioJ{hMO{hL[~iLqqh~>VRYqHRYF8WX>xf]S5h~O>QHOIg7:5^\S5h~N;j~Othr[rNotnQ8^}Q8G~g8GqhsWu_]R{ZIJ:VoJ|S5Z{hL[~h7>zh~>rf\[x_MR{Y8_qhs_u_]hnkX|nZLm{iL>[rqp_\>Nou<Nrm~_\^nTnN{ioJ{hMO{hL[~iLqqh~>VRYqHRYF8WX>|fL>5g8S7f\Wqg8RnkX|n[]GshrKp_Z}ugryLg8Oq^7}{h8[~_]RnTsxn_s[z^8Wug79nTnOofLKz_7[Yi\O^8OuhMWug79nQHOxf\:wiL[9iHN;NqGx_\K_XG[hLi~^\WqNs5xNq[|_8Om_L[Rf\:w[MOmgsSm^8Wug7:Nou<Nr_6grS5f\>zNotn^7mmgriqZ8[nh7S~f]G5f\>zNn|ngLqzf8WqjMVnTnOVgL[mh7Zl[]GshrKp_XO>QHO[hLi~^\WqYLqzf5qzir[iL6qgsWGgrKxj]Suh~N;j~Ori\:oiLq{gnN;NrSt^\:s_[S6^sSohrq|iLq{gnNxNr}ugry5_]m5NotnZL}q^]SqNK[|_8Om_LZnkX|n[]GshrKp_Z}ugryIg76|^]Om^r}qh~N;j~Ori\:oiLq{gnN;NrSt^\:s_[S6^sSohrq|iLq{gnNxNr}ugry5_]m5NotnZL}q^]SqNK[|_8Om_LZnk]6ikVBB
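The functions in this question suggest a three-step scheme: subtract a fixed offset from every character after the I:Qc[ prefix, Base64-decode the result, and read the bytes as UTF-8. The sketch below chains those steps with the standard atob and TextDecoder functions on a self-made sample. The names unshift, base64ToUtf8 and makeSample are hypothetical, and whether the real API uses exactly this scheme is an assumption. One thing that may be worth checking: with an offset of 5, the Base64 character z (code 122) becomes code 127, a control character that is easily lost when a response is copied as text, and a single missing character misaligns all Base64 input after it.

```javascript
// Step 1: undo the character shift applied after the prefix.
function unshift(text, prefix, shift) {
  if (text.indexOf(prefix) !== 0) return text; // not an encoded response
  var out = '';
  for (var i = prefix.length; i < text.length; i++) {
    out += String.fromCharCode(text.charCodeAt(i) - shift);
  }
  return out;
}

// Steps 2 and 3: Base64 -> raw bytes -> UTF-8 text.
function base64ToUtf8(b64) {
  var binary = atob(b64); // one character per byte
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return new TextDecoder('utf-8').decode(bytes);
}

// Hypothetical encoder, used only to build a self-made sample for the demo.
function makeSample(jsonText, prefix, shift) {
  var bytes = new TextEncoder().encode(jsonText);
  var binary = '';
  for (var i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  var b64 = btoa(binary);
  var out = prefix;
  for (var j = 0; j < b64.length; j++) {
    out += String.fromCharCode(b64.charCodeAt(j) + shift);
  }
  return out;
}

// Made-up sample data, not the real API response.
var sample = makeSample('{"City":"LA JOLLA","note":"caf\u00e9"}', 'I:Qc[', 5);
var decoded = base64ToUtf8(unshift(sample, 'I:Qc[', 5));
console.log(JSON.parse(decoded).City); // LA JOLLA
```

Decoding the bytes with TextDecoder avoids hand-written UTF-8 handling such as fix_utf8, which only covers sequences of up to three bytes.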

Decode encrypted/Base64 string

I am working on a project where I need to decode the following string:
I:Qc[j~O~_]S6gMWNougj~OVhr>|_]O5jZyqjXN;NoZ}SYl~TYN}SnNxNp}{griuiM[p_XN;Nn5}RYhzRoZ9RV~Nn|nYLK5f]W6_LZnTnNRn99RY^~TYJnQHOYiLK5_XN;NpSGNn|nWpqVZ~N;No^|SRnQHOIf]W:NotnYJJlXp>RYJJnQHOff]GLf]_qNotnTYN|RhnQHOG_LW~_]SNotnSYZ7RHGIVZ:JYJ[RXZiN[HGJZnNxNqOm_LK~XZVnTnOVRYqHRYF8WXNxNqGZj]GqNotnZ5_XNn|n[]SqV7>p_[WuiL}qNotnZ7qz_7}qNJ_mg\qxjXGX_]Su_L[z^7ZnQHOH_\WNotnR~NxNpOmiLmNotnRnNxNqS}WsVnTnN~QIV6R~NxNp}{iKSujrZnTnN}SX|8RIFnQHORg8WYf]uqV\S~_]RnTnN|QoVnQHO__\K~Vs[ugMVnTnN}TY^7Nn|nf]SY^\6qY\KugLqz_5>~W]mqg]G5NotnYr=nQHOG[p5nTnNpRn|6SoFxSIF|Nn|nV]_mf\}m^r}qW]K6f]W:NotnOIVTH||RoVnQHOugp_{hr[ogL>i]OqNotnYr=nQHOR^]S5[MOmgsSr_]OX_\SJ^]WqY\>ziLm__\K~NotnV][sNIN|RYNnQHOR^]S5[MOmgsSr_]O\^\}6_XN;NnV}QIh8RH|6RIFnQHOuh5}uh8Wq_J_{hqSmgLZnTnOTg~NxNp}uh8WNou<Nrm~_\^nTnN{ioJ{hMO{hL[~iLqqh~>VRYqHRYF8WX>xf]S5h~O>QHOIg7:5^\S5h~N;j~Othr[rNotnQ8^}Q8G~g8GqhsWu_]R{ZIJ:VoJ|S5Z{hL[~h7>zh~>rf\[x_MR{Y8_qhs_u_]hnkX|nZLm{iL>[rqp_\>Nou<Nrm~_\^nTnN{ioJ{hMO{hL[~iLqqh~>VRYqHRYF8WX>|fL>5g8S7f\Wqg8RnkX|n[]GshrKp_Z}ugryLg8Oq^7}{h8[~_]RnTsxn_s[z^8Wug79nTnOofLKz_7[Yi\O^8OuhMWug79nQHOxf\:wiL[9iHN;NqGx_\K_XG[hLi~^\WqNs5xNq[|_8Om_L[Rf\:w[MOmgsSm^8Wug7:Nou<Nr_6grS5f\>zNotn^7mmgriqZ8[nh7S~f]G5f\>zNn|ngLqzf8WqjMVnTnOVgL[mh7Zl[]GshrKp_XO>QHO[hLi~^\WqYLqzf5qzir[iL6qgsWGgrKxj]Suh~N;j~Ori\:oiLq{gnN;NrSt^\:s_[S6^sSohrq|iLq{gnNxNr}ugry5_]m5NotnZL}q^]SqNK[|_8Om_LZnkX|n[]GshrKp_Z}ugryIg76|^]Om^r}qh~N;j~Ori\:oiLq{gnN;NrSt^\:s_[S6^sSohrq|iLq{gnNxNr}ugry5_]m5NotnZL}q^]SqNK[|_8Om_LZnk]6ikVBB
I believe I was able to extract the following JavaScript functions that are needed to decode the string:
function handleServerReponse(a) {
if (a.substr(0, 5) == 'I:Qc[') {
var f = 5;
var e = (a.substr(5) + '').split('');
var d = [];
for (var b = e.length - 1; b >= 0; b--) {
d[b] = String.fromCharCode(e[b].charCodeAt(0) - f)
}
a = d.join('');
}
return a
}
function fix_utf8(c) {
var d = []
, a = 0
, b = 0
, e = 0
, f = 0;
while (a < c.length) {
b = c.charCodeAt(a);
if (b < 128) {
d.push(String.fromCharCode(b));
a++
} else {
if (b > 191 && b < 224) {
e = c.charCodeAt(a + 1);
d.push(String.fromCharCode((b & 31) << 6 | e & 63));
a += 2
} else {
e = c.charCodeAt(a + 1);
f = c.charCodeAt(a + 2);
d.push(String.fromCharCode((b & 15) << 12 | (e & 63) << 6 | f & 63));
a += 3
}
}
}
return d.join('')
}
function base64_decode(a) {
var e = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=';
var f, j, n, l, m, h, i, d, b = 0, g = 0, k = '', c = [];
if (!a) {
return a
}
a += '';
do {
l = e.indexOf(a.charAt(b++));
m = e.indexOf(a.charAt(b++));
h = e.indexOf(a.charAt(b++));
i = e.indexOf(a.charAt(b++));
d = l << 18 | m << 12 | h << 6 | i;
f = d >> 16 & 255;
j = d >> 8 & 255;
n = d & 255;
if (h == 64) {
c[g++] = String.fromCharCode(f)
} else {
if (i == 64) {
c[g++] = String.fromCharCode(f, j)
} else {
c[g++] = String.fromCharCode(f, j, n)
}
}
} while (b < a.length);k = c.join('');
return k
}
function utf8_from_str(s) {
for(var i=0, enc = encodeURIComponent(s), a = []; i < enc.length;) {
if(enc[i] === '%') {
a.push(parseInt(enc.substr(i+1, 2), 16))
i += 3
} else {
a.push(enc.charCodeAt(i++))
}
}
return a
}
function utf8_to_str(a) {
for(var i=0, s=''; i<a.length; i++) {
var h = a[i].toString(16)
if(h.length < 2) h = '0' + h
s += '%' + h
}
return decodeURIComponent(s)
}
After different attempts/combinations, this is how I am using the code:
var c = handleServerReponse('encoded string above');
c = base64_decode(c);
c = utf8_from_str(c);
c = utf8_to_str(c);
c = fix_utf8(c);
And this is the result I am getting. As you can see, it seems to be a JSON string, but it is not decoded properly. Any hints on how I can decode it properly?
{"resultHȔ뜙\힒ٞHLMNM苈㛛隝YHLLMˌ펌C""¤ƗF燖FR#⠈蠄؈䄈ɍхє訉
谉%AL計؀Ӣ,"City":"LA JOLLA","ZipFive":"9201ȋٙ霒#⣓Sc4䄄TĔt傄E""¥&F$䂣⥃䣓tR"¥G熒#⥴e""¥W6T6憕F熆R#⥶榶ƒf層ቕͥᕹ픈ɉ䈎茈눐蝚#⣢"¥7gB#⣢ÃS2"¤ƷE6禒#⣓Ró"¤ƷE6禔
ɕ̈舀萈ɥᲂuilt":"1966","isS﹓憖洷$W斗B#⤦⢂$dң⢃"ÓcÃ"¤fᅉᕕŕ呤計и,024","inForeclo]\鈎蓛ȋ㘜ݕ蛜ٙ\䙴FFT�ᥑr":"Aug 2012","LastTransferXǖR#⢃ósÓ"¦紆緆VDf祶ƒ#⤦⢂$Ɨ7D詬顉梺"/v1/properties/P19B107E/lists"},"Cont`ݜȎ陈݌Kܜ뜙\횙\˔NP쌍ыܙ\웛싙ꖆG2䷦W'f旲'҂%淆噥ៈꜙb#⢷c燦熗'F旲僓䣓tR熆熷7fᕽ̉䰉U]Ʌᕱ幭퉕�͕ɕ̈鬉镹푥�艍ᅹ핍Ҙܚ\[ۈ눛淆W炣⥆ƅe UpgraH圙ܘYS浇&綖7F涤詬陕鍑ﮢ:"changeSubscriptkۈ눛[ꝙ^X\و\ܘYH圙ܘFTƖ洖禕ѵ幑酱卥̈鬉鞣tion":"chc陔ݘ옜ꜝ[ۈ눛[ꝙ^X\و\ܘYH圙ܘYS[ꐛۜ\蘛\Ȏ�F涢#⦶湝单鍍ɥQ彸谉᥹푕ᐈ艁ᕅ͔ᕁ퉅ᔉ嵵䀀

Speeding up Levenshtein distance calculation in Ionic app

What I'm doing: I'm developing a mobile dictionary app for a number of languages.
How I'm doing it: Using the Ionic framework with a combination of some Angular and some pure JS (imported from a working online dictionary site for the same languages).
The problem: Our search function is an approximate search that uses a Levenshtein distance calculator to rank all entries in the dictionary with respect to the query form. When the dictionary has up to 1,500 words, this isn't a problem at all on phones, but when the dictionary has around 10,000 words, there is a delay of about 5-8 seconds before results are shown, despite it being instantaneous in a web browser using "ionic serve". When I profile with Firebug, the JavaScript that takes the longest to run is the distance calculation, so my working assumption is that this is where I should start, but I'm open to any suggestions at all.
Here's the distance calculator:
/**
* editDistance.js
*
* A simple Levenshtein distance calculator, except weighted such
* that insertions at the beginning and deletions at the end cost less.
*
* AUTHOR: Pat Littell
* LAST UPDATED: 2015-05-16
*/
var distanceCalculator = {
insertionCost : 1.0,
deletionCost : 1.0,
insertionAtBeginningCost : 0.11,
deletionAtEndCost : 0.1,
substitutionCost : 1.0,
getEditDistance : function(a, b) {
if(a.length === 0) return b.length;
if(b.length === 0) return a.length;
var matrix = [];
var currentInsertionCost, currentDeletionCost, currentSubstitutionCost; // declared locally to avoid implicit globals
// increment along the first column of each row
var i;
for(i = 0; i <= b.length; i++){
matrix[i] = [i * this.insertionAtBeginningCost];
}
// increment each column in the first row
var j;
for(j = 0; j <= a.length; j++){
matrix[0][j] = j;
}
// Fill in the rest of the matrix
for(i = 1; i <= b.length; i++){
for(j = 1; j <= a.length; j++){
currentInsertionCost = matrix[i][j-1] + this.insertionCost;
currentSubstitutionCost = matrix[i-1][j-1] + (b.charAt(i-1) != a.charAt(j-1) ? this.substitutionCost : 0);
currentDeletionCost = matrix[i-1][j] + (j==a.length ? this.deletionAtEndCost : this.deletionCost);
matrix[i][j] = Math.min(currentSubstitutionCost, Math.min(currentInsertionCost, currentDeletionCost));
}
}
return matrix[b.length][a.length];
},
// Given a query <a> and a series of targets <bs>, return the least distance to any target
getLeastEditDistance : function(a, bs) {
var that = this;
return Math.min.apply(null, bs.map(function(b) {
return that.getEditDistance(a,b);
}));
}
}
First of all, if you have a known dictionary, you will get the fastest solution with something like a Levenshtein automaton, which finds all candidates in linear time. You can't beat this with a general-purpose implementation.
With that said, this implementation of Levenshtein distance is a few times faster than yours. Note that it computes the plain, unweighted distance, so it does not reproduce the cheaper insertions at the beginning and deletions at the end.
function distance(s, t) {
if (s === t) {
return 0;
}
var n = s.length, m = t.length;
if (n === 0 || m === 0) {
return n + m;
}
var x = 0, y, py, a, b, c, d, e, f, k;
var p = new Array(n);
for (y = 0; y < n;) {
p[y] = ++y;
}
for (; (x + 3) < m; x += 4) {
var tx0 = t.charCodeAt(x);
var tx1 = t.charCodeAt(x + 1);
var tx2 = t.charCodeAt(x + 2);
var tx3 = t.charCodeAt(x + 3);
a = x;
b = x + 1;
c = x + 2;
d = x + 3;
e = x + 4;
for (y = 0; y < n; y++) {
k = s.charCodeAt(y);
py = p[y];
if (py < a || b < a) {
a = (py > b ? b + 1 : py + 1);
}
else {
if (tx0 !== k) {
a++;
}
}
if (a < b || c < b) {
b = (a > c ? c + 1 : a + 1);
}
else {
if (tx1 !== k) {
b++;
}
}
if (b < c || d < c) {
c = (b > d ? d + 1 : b + 1);
}
else {
if (tx2 !== k) {
c++;
}
}
if (c < d || e < d) {
d = (c > e ? e + 1 : c + 1);
}
else {
if (tx3 !== k) {
d++;
}
}
p[y] = e = d;
d = c;
c = b;
b = a;
a = py;
}
}
for (; x < m;) {
tx0 = t.charCodeAt(x);
a = x;
b = ++x;
for (y = 0; y < n; y++) {
py = p[y];
if (py < a || b < a) {
b = (py > b ? b + 1 : py + 1);
}
else {
if (tx0 !== s.charCodeAt(y)) {
b = a + 1;
}
else {
b = a;
}
}
p[y] = b;
a = py;
}
f = b;
}
return f;
}
I would also not use map in getLeastEditDistance; it is very slow. Just use a normal loop. Math.min with many arguments is also not very performant.
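A minimal sketch of the loop-based variant suggested here. The distance function is passed in as a parameter, which is an assumption made to keep the example self-contained; plainDistance is a hypothetical stand-in for whichever distance function is actually used.

```javascript
// Hypothetical stand-in for the real distance function (plain Levenshtein).
function plainDistance(s, t) {
  var prev = [], curr = [], i, j, tmp;
  for (j = 0; j <= t.length; j++) prev[j] = j;
  for (i = 1; i <= s.length; i++) {
    curr[0] = i;
    for (j = 1; j <= t.length; j++) {
      var cost = s.charCodeAt(i - 1) === t.charCodeAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    tmp = prev; prev = curr; curr = tmp;
  }
  return prev[t.length];
}

// Given a query a and targets bs, return the least distance to any target.
// A plain loop keeps a running minimum instead of building an intermediate
// array with map and spreading it into Math.min.
function getLeastEditDistance(a, bs, distanceFn) {
  var least = Infinity; // same result as Math.min.apply(null, []) for empty bs
  for (var i = 0; i < bs.length; i++) {
    var d = distanceFn(a, bs[i]);
    if (d < least) least = d;
  }
  return least;
}

console.log(getLeastEditDistance('kitten', ['sitting', 'mitten', 'kitchen'], plainDistance)); // 1
```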
I am working with Levenshtein distances myself and have not found a good way to improve performance, so I would not recommend using it in a non-batch application.
I suggest you use another approach based on a search tree. A binary or ternary search tree can also find near matches.
A good place to start is these articles:
http://www.codeproject.com/Articles/5819/Ternary-Search-Tree-Dictionary-in-C-Faster-String
or
http://www.codeproject.com/Articles/68500/Balanced-Binary-Search-Tree-BST-Search-Delete-InOr
The code is relatively simple, so it should not take much time to port it to JavaScript.

Fastest general purpose Levenshtein Javascript implementation

I am looking for a good general purpose Levenshtein implementation in Javascript. It must be fast and be useful for short and long strings. It should also be used many times (hence the caching). The most important thing is that it calculates a plain simple Levenshtein distance. I came up with this:
var levenshtein = (function() {
var row2 = [];
return function(s1, s2) {
if (s1 === s2) {
return 0;
} else {
var s1_len = s1.length, s2_len = s2.length;
if (s1_len && s2_len) {
var i1 = 0, i2 = 0, a, b, c, c2, row = row2;
while (i1 < s1_len)
row[i1] = ++i1;
while (i2 < s2_len) {
c2 = s2.charCodeAt(i2);
a = i2;
++i2;
b = i2;
for (i1 = 0; i1 < s1_len; ++i1) {
c = a + (s1.charCodeAt(i1) === c2 ? 0 : 1);
a = row[i1];
b = b < a ? (b < c ? b + 1 : c) : (a < c ? a + 1 : c);
row[i1] = b;
}
}
return b;
} else {
return s1_len + s2_len;
}
}
};
})();
Now I have two questions:
Can this be faster? I know by writing out the first iteration of each loop one can gain about 20%.
Is this code well written to serve as general purpose code, to be used in a library for instance?
We had a competition for fun at work about making the fastest Levenshtein implementation, and I came up with a faster one. First of all, I must say that it was not easy to beat your solution, which was the fastest I could find "out there". :)
This was tested with Node.js, and my benchmark results indicate that this implementation is ~15% faster on short texts (random words of 2-10 characters) and over twice as fast on longer texts (lengths of 30+ random characters).
Note: I removed the array caching from all implementations.
function levenshtein(s, t) {
if (s === t) {
return 0;
}
var n = s.length, m = t.length;
if (n === 0 || m === 0) {
return n + m;
}
var x = 0, y, a, b, c, d, g, h, k;
var p = new Array(n);
for (y = 0; y < n;) {
p[y] = ++y;
}
for (; (x + 3) < m; x += 4) {
var e1 = t.charCodeAt(x);
var e2 = t.charCodeAt(x + 1);
var e3 = t.charCodeAt(x + 2);
var e4 = t.charCodeAt(x + 3);
c = x;
b = x + 1;
d = x + 2;
g = x + 3;
h = x + 4;
for (y = 0; y < n; y++) {
k = s.charCodeAt(y);
a = p[y];
if (a < c || b < c) {
c = (a > b ? b + 1 : a + 1);
}
else {
if (e1 !== k) {
c++;
}
}
if (c < b || d < b) {
b = (c > d ? d + 1 : c + 1);
}
else {
if (e2 !== k) {
b++;
}
}
if (b < d || g < d) {
d = (b > g ? g + 1 : b + 1);
}
else {
if (e3 !== k) {
d++;
}
}
if (d < g || h < g) {
g = (d > h ? h + 1 : d + 1);
}
else {
if (e4 !== k) {
g++;
}
}
p[y] = h = g;
g = d;
d = b;
b = c;
c = a;
}
}
for (; x < m;) {
var e = t.charCodeAt(x);
c = x;
d = ++x;
for (y = 0; y < n; y++) {
a = p[y];
if (a < c || d < c) {
d = (a > d ? d + 1 : a + 1);
}
else {
if (e !== s.charCodeAt(y)) {
d = c + 1;
}
else {
d = c;
}
}
p[y] = d;
c = a;
}
h = d;
}
return h;
}
On longer texts it reaches almost 3 times the speed of your implementation if it initially caches the s.charCodeAt(y) values of the inner loop in a Uint32Array. Longer texts also seemed to benefit from using a Uint16Array as the distance cost array; note that a Uint16Array element cannot hold values above 65535, so this variant assumes neither string is longer than that. Here is the code for that solution:
function levenshtein(s, t) {
if (s === t) {
return 0;
}
var n = s.length, m = t.length;
if (n === 0 || m === 0) {
return n + m;
}
var x = 0, y, a, b, c, d, g, h;
var p = new Uint16Array(n);
var u = new Uint32Array(n);
for (y = 0; y < n;) {
u[y] = s.charCodeAt(y);
p[y] = ++y;
}
for (; (x + 3) < m; x += 4) {
var e1 = t.charCodeAt(x);
var e2 = t.charCodeAt(x + 1);
var e3 = t.charCodeAt(x + 2);
var e4 = t.charCodeAt(x + 3);
c = x;
b = x + 1;
d = x + 2;
g = x + 3;
h = x + 4;
for (y = 0; y < n; y++) {
a = p[y];
if (a < c || b < c) {
c = (a > b ? b + 1 : a + 1);
}
else {
if (e1 !== u[y]) {
c++;
}
}
if (c < b || d < b) {
b = (c > d ? d + 1 : c + 1);
}
else {
if (e2 !== u[y]) {
b++;
}
}
if (b < d || g < d) {
d = (b > g ? g + 1 : b + 1);
}
else {
if (e3 !== u[y]) {
d++;
}
}
if (d < g || h < g) {
g = (d > h ? h + 1 : d + 1);
}
else {
if (e4 !== u[y]) {
g++;
}
}
p[y] = h = g;
g = d;
d = b;
b = c;
c = a;
}
}
for (; x < m;) {
var e = t.charCodeAt(x);
c = x;
d = ++x;
for (y = 0; y < n; y++) {
a = p[y];
if (a < c || d < c) {
d = (a > d ? d + 1 : a + 1);
}
else {
if (e !== u[y]) {
d = c + 1;
}
else {
d = c;
}
}
p[y] = d;
c = a;
}
h = d;
}
return h;
}
All benchmark results are from my own tests; results may differ with your test data.
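To reproduce such measurements on your own data, a small timing harness along the following lines could be used. This is a rough sketch, not the benchmark used for the numbers above: referenceLevenshtein, randomWord and benchmark are hypothetical names, and Date.now() only gives coarse wall-clock timings.

```javascript
// Reference two-row Levenshtein, used here only as the function under test.
function referenceLevenshtein(s, t) {
  var prev = [], curr = [], i, j, tmp;
  for (j = 0; j <= t.length; j++) prev[j] = j;
  for (i = 1; i <= s.length; i++) {
    curr[0] = i;
    for (j = 1; j <= t.length; j++) {
      curr[j] = Math.min(
        prev[j] + 1,
        curr[j - 1] + 1,
        prev[j - 1] + (s.charCodeAt(i - 1) === t.charCodeAt(j - 1) ? 0 : 1)
      );
    }
    tmp = prev; prev = curr; curr = tmp;
  }
  return prev[t.length];
}

// Hypothetical helper: a random lowercase word with a length in [minLen, maxLen].
function randomWord(minLen, maxLen) {
  var len = minLen + Math.floor(Math.random() * (maxLen - minLen + 1));
  var word = '';
  for (var i = 0; i < len; i++) {
    word += String.fromCharCode(97 + Math.floor(Math.random() * 26));
  }
  return word;
}

// Runs fn over neighbouring word pairs and reports the elapsed wall-clock time.
function benchmark(fn, words, rounds) {
  var checksum = 0; // accumulated so the calls cannot be optimised away
  var start = Date.now();
  for (var r = 0; r < rounds; r++) {
    for (var i = 1; i < words.length; i++) {
      checksum += fn(words[i - 1], words[i]);
    }
  }
  return { ms: Date.now() - start, checksum: checksum };
}

var shortWords = [], longWords = [];
for (var n = 0; n < 200; n++) {
  shortWords.push(randomWord(2, 10));  // short texts: 2-10 characters
  longWords.push(randomWord(30, 60));  // longer texts: 30+ characters
}
console.log('short:', benchmark(referenceLevenshtein, shortWords, 20).ms, 'ms');
console.log('long: ', benchmark(referenceLevenshtein, longWords, 20).ms, 'ms');
```

Passing each candidate implementation as fn over the same word lists gives comparable timings; the checksum also doubles as a quick check that two implementations agree.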
The two main differences between this solution and yours (and some other fast ones) are, I think:
It does not compare the characters in the inner loop when that is not necessary.
A sort of "loop unrolling" in the outer loop, doing 4 rows at a time in the Levenshtein "matrix". This was a major performance win.
http://jsperf.com/levenshtein-distance/24
I will put this solution on GitHub when I find the time :)
Update:
Finally, I put the solution on GitHub:
https://github.com/gustf/js-levenshtein. It's a bit modified/optimized, but it is the same base algorithm.
