A bug in a string comparison algorithm - javascript

Description
I'm trying to implement a JS version of the Levenshtein distance function, using the matrix method described on the Wikipedia page.
Problem
The algorithm mostly works as expected: it returns the difference between the strings (the number of edits needed to make the strings equal), except that it ignores index 0. No matter what character is at index 0, it is always considered "correct":
levenshteinDistance('cat', 'cave') // 2 (correct)
levenshteinDistance('cat', 'cap') // 1 (correct)
levenshteinDistance('cat', 'hat') // 0 (should be 1)
levenshteinDistance('cat', 'rat') // 0 (should be 1)
levenshteinDistance('cat', 'bat') // 0 (should be 1)
Code
https://codepen.io/aQW5z9fe/pen/mdPvJqV?editors=0011
function levenshteinDistance (string1, string2, options) {
if (string1 === string2) { return 0 }
let matrix = []
let cost
let i
let j
// Init first column of each row
for (i = 0; i <= string1.length; i++) {
matrix[i] = [i]
}
// Init each column in the first row
for (j = 0; j <= string2.length; j++) {
matrix[0][j] = j
}
// Fill in the rest of the matrix
for (i = 1; i <= string1.length; i++) {
for (j = 1; j <= string2.length; j++) {
// Set cost
cost = string1[i] === string2[j]
? 0
: 1
// Set the distances
matrix[i][j] = Math.min(
matrix[i - 1][j] + 1, // deletion
matrix[i][j - 1] + 1, // insertion
matrix[i - 1][j - 1] + cost // substitution
)
if (
options.allowTypos &&
i > 1 &&
j > 1 &&
string1[i] === string2[j - 1] &&
string1[i - 1] === string2[j]
) {
matrix[i][j] = Math.min(
matrix[i][j],
matrix[i - 2][j - 2] + 1
) // transposition
}
}
}
return matrix[string1.length][string2.length]
}
console.log(
levenshteinDistance('cat', 'hat', { allowTypos: true })
)

I think you just made a small mistake. This line:
cost = string1[i] === string2[j]
Should be :
cost = string1[i-1] === string2[j-1]
The matrix indices i and j start at 1, while the string indices start at 0. Without the shift you never check the cost for the first letter of each string, and the substitution cost for every later cell is computed from the wrong pair of characters.
EDIT:
The part inside the transpose section/ allow typo section should also be changed from:
string1[i] === string2[j - 1] &&
string1[i - 1] === string2[j]
to
string1[i-1] === string2[j - 2] &&
string1[i - 2] === string2[j-1]
After looking at the Wikipedia article: for some reason it uses 1-indexed arrays for the strings and 0-indexed arrays for the matrix, so I guess that was the root of the problem.
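Putting both index fixes together, the whole function might look like the sketch below. It keeps the structure of the code in the question; the only additions beyond the two fixes are the shortened parameter names and the `options = {}` default, which is an assumption made here so that calls without a third argument do not throw.

```javascript
// Sketch of the corrected function: string indices are shifted by one
// relative to the matrix indices (matrix row/column 0 is the empty prefix).
function levenshteinDistance(a, b, options = {}) { // default value is an assumption
  if (a === b) return 0
  const matrix = []
  // First column: distance from each prefix of a to the empty string
  for (let i = 0; i <= a.length; i++) matrix[i] = [i]
  // First row: distance from the empty string to each prefix of b
  for (let j = 0; j <= b.length; j++) matrix[0][j] = j
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      // Cell (i, j) compares the i-th and j-th characters, i.e. indices i-1 and j-1
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      matrix[i][j] = Math.min(
        matrix[i - 1][j] + 1,       // deletion
        matrix[i][j - 1] + 1,       // insertion
        matrix[i - 1][j - 1] + cost // substitution
      )
      if (
        options.allowTypos && i > 1 && j > 1 &&
        a[i - 1] === b[j - 2] &&
        a[i - 2] === b[j - 1]
      ) {
        // transposition of two adjacent characters
        matrix[i][j] = Math.min(matrix[i][j], matrix[i - 2][j - 2] + 1)
      }
    }
  }
  return matrix[a.length][b.length]
}

console.log(levenshteinDistance('cat', 'hat', { allowTypos: true })) // 1
```

With this version the examples from the question that previously returned 0 now return 1.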

Related

Javascript - adding Integers using arrays

I am using an array of digits in order to calculate large powers of 2. Each element is added to itself, afterwards the carries are calculated, and this loops n - 1 times until I end up with the number as an array. I do this in order to avoid the 15-digit precision limit that JavaScript numbers have.
Everything works fine until I reach n = 42, where the carries start to be overlooked and numbers aren't reduced, producing wrong answers.
I tried changing the way the carry is processed inside the while loop, from basic addition to integer division and modulus.
It sounds stupid, but I added an extra loop to check whether any elements are greater than 10, and it didn't find any.
for (var n = 1; n <= 100; n++) {
for (var i = 0, x = [2]; i < n - 1; i++) { // Loop for amount of times to multiply
x.unshift(0)
for (var j = x.length - 1; j > 0; j--) { // Double each element of the array
x[j] += x[j]
}
for (j = x.length - 1; x[j] > 0; j--) { // Check if element >= 10 and carry
while (x[j] >= 10) {
x[j - 1] += Math.floor(x[j] / 10)
x[j] = x[j] % 10
}
}
if (x[0] === 0) {
x.shift()
}
}
console.log('N: ' + n + ' Array: ' + x)
}
The expected results are that each element in the array will be reduced into a single number and will "carry" onto the element to its left like :
N: 1 Array: 2
N: 2 Array: 4
N: 3 Array: 8
N: 4 Array: 1,6
N: 5 Array: 3,2
N: 6 Array: 6,4
but starting at n=42 carries get bugged looking like this:
N: 42 Array: 4,2,18,18,0,4,6,5,1,1,1,0,4
N: 43 Array: 8,4,36,36,0,8,12,10,2,2,2,0,8
N: 44 Array: 1,7,5,9,2,1,8,6,0,4,4,4,1,6
N: 45 Array: 2,14,10,18,4,2,16,12,0,8,8,8,3,2
N: 46 Array: 7,0,3,6,8,7,4,4,1,7,7,6,6,4
N: 47 Array: 14,0,7,3,7,4,8,8,3,5,5,3,2,8
What's the error that could be throwing it off like this?
I think the reason your code doesn't work is this line: for (j = x.length - 1; x[j] > 0; j--) { // Check if element >= 10 and carry. The condition should be j > 0, not x[j] > 0. As written, the carry pass stops at the first digit that equals 0, so everything to the left of that digit is never reduced.
Also, you don't need your second loop, for (var i = 0, x = [2]; i < n - 1; i++) {. There is no reason to recalculate everything on every iteration; you can reuse the previous result.
You can also double the values this way: x = x.map(n => n * 2) (it seems a bit more conventional to me).
And there is no need for x[j - 1] += Math.floor(x[j] / 10); it can be just x[j - 1] += 1. The digits are at most 9, so doubled they are at most 18, and even with a carry of 1 coming in the carry going out is never more than 1.
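The code could be:
let x = [1] // starting point: 2^0, so the first iteration produces 2^1
for (let n = 1; n <= 100; n++) {
  // prepend a 0 to make room for a carry, then double every digit
  x = [0, ...x].map(d => d * 2)
  // carry from right to left; the carry is never more than 1
  for (let j = x.length - 1; j > 0; j--) {
    if (x[j] >= 10) {
      x[j - 1] += 1
      x[j] %= 10
    }
  }
  // drop the leading 0 if it was not needed
  if (x[0] === 0) {
    x = x.slice(1)
  }
  console.log('N: ' + n + ' Array: ' + x)
}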
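If all you want are large powers of 2, why go through the hassle of using lists to calculate them? A much shorter recursive function looks like this, although note that it only tracks the leading digits as a floating-point value that is rescaled to stay below 10, so it is not an exact replacement: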
function BigPow2(x, acc=2.0) {
//document.writeln(acc);
acc = acc >= 5 ? acc / 5 : acc * 2;
return x <= 1 ? acc : BigPow2(x-1, acc);
}
Or alternatively, use BigInt?
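For the BigInt route, a minimal sketch could look like the following. The helper name `powerOfTwoDigits` is made up for this example; it relies only on the built-in BigInt exponentiation operator and returns the digits as an array so the output matches the format used in the question.

```javascript
// Hypothetical helper: digits of 2^n as an array, computed exactly with BigInt.
function powerOfTwoDigits(n) {
  const value = 2n ** BigInt(n)            // exact integer, no 15-digit limit
  return Array.from(value.toString(), Number) // one array element per decimal digit
}

console.log('N: 42 Array: ' + powerOfTwoDigits(42))
// N: 42 Array: 4,3,9,8,0,4,6,5,1,1,1,0,4
```

BigInt values cannot be mixed with ordinary numbers in arithmetic, which is why the exponent is converted with `BigInt(n)` first.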

How to stop a For Loop in a middle and continue from there down back in JavaScript

I have a JavaScript code like so:
var myArray = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20];
for (var i = 0, di = 1; i >= 0; i += di) {
if (i == myArray.length - 1) { di = -1; }
document.writeln(myArray[i]);
}
I need it to stop right in the middle, at 10, and from 10 start counting back down to 0.
So far, I've managed to make it go from 0 to 20 and from 20 back to 0.
How can I stop it in the middle and go back from there?
Any help is appreciated!
Here is an example using a function which accepts the array and the number of items you want to display forwards and backwards:
var myArray = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20];
if(myArray.length === 1){
ShowXElementsForwardsAndBackwards(myArray, 1);
}
else if(myArray.length === 0) {
//Do nothing as there are no elements in array and dividing 0 by 2 would be undefined
}
else {
ShowXElementsForwardsAndBackwards(myArray, (myArray.length / 2));
}
function ShowXElementsForwardsAndBackwards(mYarray, numberOfItems){
if (numberOfItems >= mYarray.length) {
throw "More Numbers requested than length of array!";
}
for(let x = 0; x < numberOfItems; x++){
document.writeln(mYarray[x]);
}
for(let y = numberOfItems - 1; y >= 0; y--){
document.writeln(mYarray[y]);
}
}
Just divide your array length by 2
var myArray = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20];
for (var i = 0, di = 1; i >= 0; i += di) {
if (i == ((myArray.length / 2) -1 )) { di = -1; }
document.writeln(myArray[i]);
}
Could Array.reverse() help you in this matter?
const array = [0,1,3,4,5,6,7,8,9,10,11,12,13,14,15]
const getArrayOfAmount = (array, amount) => array.filter((item, index) => index < amount)
let arraySection = getArrayOfAmount(array, 10)
let reversed = [...arraySection].reverse()
console.log(arraySection)
console.log(reversed)
And then you can "do stuff" with each array using whatever array manipulation you desire.
Couldn’t you just check if you’ve made it halfway and then subtract your current spot from the length?
for (let i = 0; i < myArray.length; i++) {
  if (Math.round(i / myArray.length) == 1) {
    // second half: mirror the index so we walk back towards the start
    document.writeln(myArray[myArray.length - i - 1]);
  } else {
    document.writeln(myArray[i]);
  }
}
Unless I’m missing something?
You could move the checking into the condition block of the for loop.
var myArray = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20];
for (
var i = 0, l = (myArray.length >> 1) - 1, di = 1;
i === l && (di = -1), i >= 0;
i += di
) {
document.writeln(myArray[i]);
}
If you capture the midpoint ( half the length of the array ), just start working your step in the opposite direction.
const N = 20;
let myArray = [...Array(N).keys()];
let midpoint = Math.round(myArray.length/2)
for ( let i=1, step=1; i; i+=step) {
if (i === midpoint)
step *= -1
document.writeln(myArray[i])
}
To make things clearer, I've:
Started the loop iterator variable (i) at 1; this also means the array has an unused 0 value at index 0; in other words, myArray[0] == 0 is never shown
Set the loop terminating condition to i, which means the loop stops when i == 0 because 0 is falsy
Renamed the di to step, which is more consistent with other terminology
The midpoint uses Math.round() so that a fractional half-length rounds up to the next integer (e.g., 15/2 == 7.5 but you want it to be 8)
The midpoint is a variable for performance reasons; calculating the midpoint in the loop body is redundant and less efficient since it only needs to be calculated once
For practical purpose, made sizing the array dynamic using N
Updated to ES6/ES7 -- this is now non-Internet Explorer-friendly [it won't work in IE ;)] primarily due to the use of the spread operator (...) ... but that's easily avoidable
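For comparison, the same walk can be written without a direction variable, as two plain loops. This is only a sketch: the function name `upAndBack` and the choice of turning point (the element at index length / 2 - 1, shown once) are assumptions made for this example, and it returns the visited values instead of writing them to the document.

```javascript
// Hypothetical helper: walk from the first element to the middle one and back.
function upAndBack(items) {
  if (items.length === 0) return []
  // Index of the turning point; for 20 items this is index 9 (the value 10)
  const mid = Math.max(0, Math.floor(items.length / 2) - 1)
  const out = []
  for (let i = 0; i <= mid; i++) out.push(items[i])     // counting up
  for (let i = mid - 1; i >= 0; i--) out.push(items[i]) // counting back down
  return out
}

const myArray = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
console.log(upAndBack(myArray).join(' '))
// 1 2 3 4 5 6 7 8 9 10 9 8 7 6 5 4 3 2 1
```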

Javascript's equivalent of R's findInterval() or Python's bisect.bisect_left

I can't find how to determine which interval an element belongs to, based on an Array, in JavaScript. I want the behavior of bisect.bisect_left from Python. Here is some sample code:
import bisect
a = [10,20,30,40]
print(bisect.bisect_left(a,0)) #0 because 0 <= 10
print(bisect.bisect_left(a,10)) #0 because 10 <= 10
print(bisect.bisect_left(a,15)) #1 because 10 < 15 < 20
print(bisect.bisect_left(a,25)) #2 ...
print(bisect.bisect_left(a,35)) #3 ...
print(bisect.bisect_left(a,45)) #4
I know this would be easy to implement, but why re-invent the wheel?
In case anyone else lands here, here's an implementation of bisect_left that actually runs in O(log N) and should work regardless of the interval between list elements. Note that it does not sort the input list and, as-is, will likely blow the stack if you pass it an unsorted list. It's also only set up to work with numbers, but it should be easy enough to adapt it to accept a comparison function. Take this as a starting point, not necessarily your destination. Improvements are certainly welcome!
function bisect(sortedList, el){
if(!sortedList.length) return 0;
if(sortedList.length == 1) {
return el > sortedList[0] ? 1 : 0;
}
let lbound = 0;
let rbound = sortedList.length - 1;
return bisect(lbound, rbound);
// note that this function depends on closure over lbound and rbound
// to work correctly
function bisect(lb, rb){
if(rb - lb == 1){
if(sortedList[lb] < el && sortedList[rb] >= el){
return lb + 1;
}
if(sortedList[lb] == el){
return lb;
}
}
if(sortedList[lb] > el){
return 0;
}
if(sortedList[rb] < el){
return sortedList.length
}
let midPoint = lb + (Math.floor((rb - lb) / 2));
let midValue = sortedList[midPoint];
if(el <= midValue){
rbound = midPoint
}
else if(el > midValue){
lbound = midPoint
}
return bisect(lbound, rbound);
}
}
console.log(bisect([1,2,4,5,6], 3)) // => 2
console.log(bisect([1,2,4,5,6], 7)) // => 5
console.log(bisect([0,1,1,1,1,2], 1)) // => 1
console.log(bisect([0,1], 0)) // => 0
console.log(bisect([1,1,1,1,1], 1)) // => 0
console.log(bisect([1], 2)); // => 1
console.log(bisect([1], 1)); // => 0
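The adaptation to a comparison function mentioned above could be sketched like this. It is an iterative variant rather than a change to the recursive code, and the name `bisectLeftBy` and the default comparator are assumptions made for this example; the comparator follows the same convention as Array.prototype.sort (negative when the first argument sorts first).

```javascript
// Hypothetical helper: leftmost insertion point for value in a sorted array,
// using a sort()-style comparator. The input must already be sorted accordingly.
function bisectLeftBy(sorted, value, compare = (a, b) => a - b) {
  let lo = 0
  let hi = sorted.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1 // midpoint without going negative
    if (compare(sorted[mid], value) < 0) {
      lo = mid + 1 // sorted[mid] sorts before value: answer is to the right
    } else {
      hi = mid     // sorted[mid] is >= value: answer is mid or to the left
    }
  }
  return lo
}

console.log(bisectLeftBy([10, 20, 30, 40], 15)) // 1
// Comparing strings by length instead of by value
console.log(bisectLeftBy(['a', 'bb', 'dddd'], 'ccc', (a, b) => a.length - b.length)) // 2
```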
Speaking of re-inventing the wheel, I'd like to join the conversation:
function bisectLeft(arr, value, lo=0, hi=arr.length) {
while (lo < hi) {
const mid = (lo + hi) >> 1;
if (arr[mid] < value) {
lo = mid + 1;
} else {
hi = mid;
}
}
return lo;
}
I believe that is the schoolbook implementation of bisection. In fact, you'll find something very similar inside the d3-array package mentioned in another answer.
Using the d3-array npm package:
const d3 = require('d3-array');
var a = [10,20,30,40];
console.log(d3.bisectLeft(a,0));
console.log(d3.bisectLeft(a,10));
console.log(d3.bisectLeft(a,15));
console.log(d3.bisectLeft(a,25));
console.log(d3.bisectLeft(a,35));
console.log(d3.bisectLeft(a,45));
output:
0
0
1
2
3
4
A faster way than the previously accepted answer that works for same size intervals is:
var array = [5, 20, 35, 50]
//Intervals:
// <5: 0
// [5-20): 1
// [20-35): 2
// [35-50): 3
// >=50: 4
var getPosition = function(array, x) {
if (array.length == 0) return;
if (array.length == 1) return (x < array[0]) ? 0 : 1;
return Math.floor((x - array[0]) / (array[1] - array[0])) + 1
}
console.log(getPosition(array, 2)); //0
console.log(getPosition(array, 5)); //1
console.log(getPosition(array, 15));//1
console.log(getPosition(array, 20));//2
console.log(getPosition(array, 48));//3
console.log(getPosition(array, 50));//4
console.log(getPosition(array, 53));//4
console.log("WHEN SIZE: 1")
array = [5];
//Intervals:
// <5: 0
// >=5: 1
console.log(getPosition(array, 3));
console.log(getPosition(array, 5));
console.log(getPosition(array, 6));
There are no built-in bisection functions in JavaScript, so you will have to roll your own. Here is my personal reinvention of the wheel:
var array = [10, 20, 30, 40]
function bisectLeft (array, x) {
for (var i = 0; i < array.length; i++) {
if (array[i] >= x) return i
}
return array.length
}
console.log(bisectLeft(array, 5))
console.log(bisectLeft(array, 15))
console.log(bisectLeft(array, 25))
console.log(bisectLeft(array, 35))
console.log(bisectLeft(array, 45))
function bisectRight (array, x) {
for (var i = 0; i < array.length; i++) {
if (array[i] > x) return i
}
return array.length
}

Peak and Flag Codility latest challenge

I'm trying to solve the latest codility.com question (just to enhance my skills). I tried a lot but cannot get more than 30 marks there, so now I am curious what exactly I am missing in my solution.
The question says
A non-empty zero-indexed array A consisting of N integers is given. A peak is an array element which is larger than its neighbours. More precisely, it is an index P such that
0 < P < N − 1 and A[P − 1] < A[P] > A[P + 1]
For example, the following array A:
A[0] = 1
A[1] = 5
A[2] = 3
A[3] = 4
A[4] = 3
A[5] = 4
A[6] = 1
A[7] = 2
A[8] = 3
A[9] = 4
A[10] = 6
A[11] = 2
has exactly four peaks: elements 1, 3, 5 and 10.
You are going on a trip to a range of mountains whose relative heights are represented by array A. You have to choose how many flags you should take with you. The goal is to set the maximum number of flags on the peaks, according to certain rules.
Flags can only be set on peaks. What's more, if you take K flags, then the distance between any two flags should be greater than or equal to K. The distance between indices P and Q is the absolute value |P − Q|.
For example, given the mountain range represented by array A, above, with N = 12, if you take:
> two flags, you can set them on peaks 1 and 5;
> three flags, you can set them on peaks 1, 5 and 10;
> four flags, you can set only three flags, on peaks 1, 5 and 10.
You can therefore set a maximum of three flags in this case.
Write a function that, given a non-empty zero-indexed array A of N integers, returns the maximum number of flags that can be set on the peaks of the array.
For example, given the array above
the function should return 3, as explained above.
Assume that:
N is an integer within the range [1..100,000];
each element of array A is an integer within the range [0..1,000,000,000].
Complexity:
expected worst-case time complexity is O(N);
expected worst-case space complexity is O(N), beyond input storage (not counting the
storage required for input arguments).
So I tried this code according to my understanding of question
var A = [1,5,3,4,3,4,1,2,3,4,6,2];
function solution(A) {
array = new Array();
for (i = 1; i < A.length - 1; i++) {
if (A[i - 1] < A[i] && A[i + 1] < A[i]) {
array.push(i);
}
}
//console.log(A);
//console.log(array);
var position = array[0];
var counter = 1;
var len = array.length;
for (var i = 0; i < len; i++) {
if (Math.abs(array[i+1] - position) >= len) {
position = array[i+1];
counter ++;
}
}
console.log("total:",counter);
return counter;
}
The above code works for the sample array: [1,5,3,4,3,4,1,2,3,4,6,2]
It gets peaks at indices [1, 3, 5, 10] and sets flags at 1, 5, and 10 (total 3).
But codility.com says it fails on the array [7, 10, 4, 5, 7, 4, 6, 1, 4, 3, 3, 7]
My code gets peaks at indices [1, 4, 6, 8] and sets flags at 1 and 6 (total 2),
but codility.com says it should be 3 flags (no idea why).
Am I misunderstanding the question?
Please, I am only looking for a hint or the algorithm. I know this question has already been asked by someone and solved in a private chatroom. I tried to get help from that person on that page, but members flagged my posts as inappropriate answers, so I am asking the question here again.
P.S: You can try coding the challenge yourself here!
This is a solution with better upper complexity bounds:
time complexity: O(sqrt(N) * log(N))
space complexity: O(1) (over the original input storage)
Python implementation
from math import sqrt
def transform(A):
    peak_pos = len(A)
    last_height = A[-1]
    for p in range(len(A) - 1, 0, -1):
        if (A[p - 1] < A[p] > last_height):
            peak_pos = p
        last_height = A[p]
        A[p] = peak_pos
    A[0] = peak_pos
def can_fit_flags(A, k):
    flag = 1 - k
    for i in range(k):
        # plant the next flag at A[flag + k]
        if flag + k > len(A) - 1:
            return False
        flag = A[flag + k]
    return flag < len(A) # last flag planted successfully
def solution(A):
    transform(A)
    lower = 0
    upper = int(sqrt(len(A))) + 2
    assert not can_fit_flags(A, k=upper)
    while lower < upper - 1:
        next = (lower + upper) // 2
        if can_fit_flags(A, k=next):
            lower = next
        else:
            upper = next
    return lower
Description
O(N) preprocessing (done inplace):
A[i] := next peak or end position after or at position i
(i for a peak itself, len(A) after last peak)
If we can plant k flags then we can certainly plant k' < k flags as well.
If we can not plant k flags then we certainly can not plant k' > k flags either.
We can always set 0 flags.
Let us assume we can not set X flags.
Now we can use binary search to find out exactly how many flags can be planted.
Steps:
1. X/2
2. X/2 +- X/4
3. X/2 +- X/4 +- X/8
...
log2(X) steps in total
With the preprocessing done before, each step testing whether k flags can be planted can be performed in O(k) operations:
flag(0) = next(0)
flag(1) = next(flag(0) + k)
...
flag(k-1) = next(flag(k-2) + k)
total cost - worst case - when X - 1 flags can be planted:
== X * (1/2 + 3/4 + ... + (2^k - 1)/(2^k))
== X * (log2(X) - 1 + (<1))
<= X * log(X)
Using X == N would work, and would most likely also be sublinear, but is not good enough to use in a proof that the total upper bound for this algorithm is under O(N).
Now everything depends on finding a good X, and since k flags take about k^2 positions to fit, it seems like a good upper limit on the number of flags should be found somewhere around sqrt(N).
If X == sqrt(N) or something close to it works, then we get an upper bound of O(sqrt(N) * log(sqrt(N))) which is definitely sublinear and since log(sqrt(N)) == 1/2 * log(N) that upper bound is equivalent to O(sqrt(N) * log(N)).
Let's look for a more exact upper bound on the number of required flags around sqrt(N):
we know k flags require at least Nk := k^2 - k + 3 elements
by solving the equation k^2 - k + 3 - N = 0 over k we find that if k >= 3, then any number of flags <= the resulting k can fit in some sequence of length N and a larger one can not; solution to that equation is 1/2 * (1 + sqrt(4N - 11))
for N >= 9 we know we can fit 3 flags
==> for N >= 9, k = floor(1/2 * (1 + sqrt(4N - 11))) + 1 is a strict upper bound on the number of flags we can fit in N
for N < 9 we know 3 is a strict upper bound but those cases do not concern us for finding the big-O algorithm complexity
floor(1/2 * (1 + sqrt(4N - 11))) + 1
== floor(1/2 + sqrt(N - 11/4)) + 1
<= floor(sqrt(N - 11/4)) + 2
<= floor(sqrt(N)) + 2
==> floor(sqrt(N)) + 2 is also a good strict upper bound for a number of flags that can fit in N elements + this one holds even for N < 9 so it can be used as a generic strict upper bound in our implementation as well
If we choose X = floor(sqrt(N)) + 2 we get the following total algorithm upper bound:
O((floor(sqrt(N)) + 2) * log(floor(sqrt(N)) + 2))
{floor(...) <= ...}
O((sqrt(N) + 2) * log(sqrt(N) + 2))
{for large enough N >= 4: sqrt(N) + 2 <= 2 * sqrt(N)}
O(2 * sqrt(N) * log(2 * sqrt(N)))
{lose the leading constant}
O(sqrt(N) * (log(2) + log(sqrt(N))))
O(sqrt(N) * log(2) + sqrt(N) * log(sqrt(N)))
{lose the lower order bound}
O(sqrt(N) * log(sqrt(N)))
{as noted before, log(sqrt(N)) == 1/2 * log(N)}
O(sqrt(N) * log(N))
QED
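The bound derived above can be checked numerically with a small sketch. The function names are made up for this example; the formulas k^2 - k + 3 and floor(sqrt(N)) + 2 are taken from the derivation, and the loop simply confirms that the largest flag count whose minimum length fits into N elements always stays below the bound.

```javascript
// Minimum array length needed to place k flags at distance >= k
// (first peak at index 1, last peak at 1 + (k - 1) * k, plus one trailing element).
function minLengthForFlags(k) {
  return k * k - k + 3
}

// Strict upper bound on the number of flags, used as X in the text.
function flagUpperBound(n) {
  return Math.floor(Math.sqrt(n)) + 2
}

// Largest k whose minimum length still fits into n elements.
function largestFittingFlagCount(n) {
  let k = 0
  while (minLengthForFlags(k + 1) <= n) k++
  return k
}

for (let n = 1; n <= 10000; n++) {
  if (largestFittingFlagCount(n) >= flagUpperBound(n)) {
    throw new Error('bound violated for N = ' + n)
  }
}
console.log(largestFittingFlagCount(12), flagUpperBound(12)) // 3 5
```

For the sample array with N = 12, at most 3 flags can fit in any array of that length, and the bound used to start the binary search is 5.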
Missing 100% PHP solution :)
function solution($A)
{
$p = array(); // peaks
for ($i=1; $i<count($A)-1; $i++)
if ($A[$i] > $A[$i-1] && $A[$i] > $A[$i+1])
$p[] = $i;
$n = count($p);
if ($n <= 2)
return $n;
$maxFlags = min(intval(ceil(sqrt(count($A)))), $n); // max number of flags
$distance = $maxFlags; // required distance between flags
// try to set max number of flags, then 1 less, etc... (2 flags are already set)
for ($k = $maxFlags-2; $k > 0; $k--)
{
$left = $p[0];
$right = $p[$n-1];
$need = $k; // how many more flags we need to set
for ($i = 1; $i<=$n-2; $i++)
{
// found one more flag for $distance
if ($p[$i]-$left >= $distance && $right-$p[$i] >= $distance)
{
if ($need == 1)
return $k+2;
$need--;
$left = $p[$i];
}
if ($right - $p[$i] <= $need * ($distance+1))
break; // impossible to set $need more flags for $distance
}
if ($need == 0)
return $k+2;
$distance--;
}
return 2;
}
import java.util.Arrays;
import java.lang.Integer;
import java.util.ArrayList;
import java.util.List;
public int solution(int[] A)
{
ArrayList<Integer> array = new ArrayList<Integer>();
for (int i = 1; i < A.length - 1; i++)
{
if (A[i - 1] < A[i] && A[i + 1] < A[i])
{
array.add(i);
}
}
if (array.size() == 1 || array.size() == 0)
{
return array.size();
}
int sf = 1;
int ef = array.size();
int result = 1;
while (sf <= ef)
{
int flag = (sf + ef) / 2;
boolean suc = false;
int used = 0;
int mark = array.get(0);
for (int i = 0; i < array.size(); i++)
{
if (array.get(i) >= mark)
{
used++;
mark = array.get(i) + flag;
if (used == flag)
{
suc = true;
break;
}
}
}
if (suc)
{
result = flag;
sf = flag + 1;
}
else
{
ef = flag - 1;
}
}
return result;
}
C++ solution, O(N) detected
#include <algorithm>
int solution(vector<int> &a) {
if(a.size() < 3) return 0;
std::vector<int> peaks(a.size());
int last_peak = -1;
peaks.back() = last_peak;
for(auto i = ++a.rbegin();i != --a.rend();i++)
{
int index = a.size() - (i - a.rbegin()) - 1;
if(*i > *(i - 1) && *i > *(i + 1))
last_peak = index;
peaks[index] = last_peak;
}
peaks.front() = last_peak;
int max_flags = 0;
for(int i = 1;i*i <= a.size() + i;i++)
{
int next_peak = peaks[0];
int flags = 0;
for(int j = 0;j < i && next_peak != -1;j++, flags++)
{
if(next_peak + i >= a.size())
next_peak = -1;
else
next_peak = peaks[next_peak + i];
}
max_flags = std::max(max_flags, flags);
}
return max_flags;
}
100% Java solution with O(N) complexity.
https://app.codility.com/demo/results/trainingPNYEZY-G6Q/
class Solution {
public int solution(int[] A) {
// write your code in Java SE 8
int[] peaks = new int[A.length];
int peakStart = 0;
int peakEnd = 0;
//Find the peaks.
//We don't want to traverse the array where peaks hasn't started, yet,
//or where peaks doesn't occur any more.
//Therefore, find start and end points of the peak as well.
for(int i = 1; i < A.length-1; i++) {
if(A[i-1] < A[i] && A[i+1] < A[i]) {
peaks[i] = 1;
peakEnd = i + 1;
}
if(peakStart == 0) {
peakStart = i;
}
}
int x = 1;
//The maximum number of flags can be √N
int limit = (int)Math.ceil(Math.sqrt(A.length));
int prevPeak = 0;
int counter = 0;
int max = Integer.MIN_VALUE;
while(x <= limit) {
counter = 0;
prevPeak = 0;
for(int y = peakStart; y < peakEnd; y++) {
//Find the peak points when we have x number of flags.
if(peaks[y] == 1 && (prevPeak == 0 || x <= (y - prevPeak))) {
counter++;
prevPeak = y;
}
//If we don't have any more flags stop.
if(counter == x ) {
break;
}
}
//if the number of flags set on the peaks starts to reduce stop searching.
if(counter <= max) {
return max;
}
//Keep the maximum number of flags we set on.
max = counter;
x++;
}
return max;
}
}
There is a ratio between the number of flags we can take with us and
the number of flags we can set. We can not set more than √N number of
flags since N/√N = √N. If we set more than √N, we will end up with
decreasing number of flags set on the peaks.
When we increase the numbers of flags we take with us, the number of
flags we can set increases up to a point. After that point the number
of flags we can set will decrease. Therefore, when the number of
flags we can set starts to decrease once, we don't have to check the
rest of the possible solutions.
We mark the peak points at the beginning of the code, and we also
mark the first and the last peak points. This reduces the unnecessary
checks where the peaks starts at the very last elements of a large
array or the last peak occurs at the very first elements of a large
array.
Here is a C++ Solution with 100% score
int test(vector<int> &peaks,int i,int n)
{
int j,k,sum,fin,pos;
fin = n/i;
for (k=0; k< i; k++)
{
sum=0;
for (j=0; j< fin; j++)
{ pos = j + k * fin;
sum=sum + peaks[ pos ];
}
if (0==sum) return 0;
}
return 1;
}
int solution(vector<int> &A) {
// write your code in C++98
int i,n,max,r,j,salir;
n = A.size();
vector<int> peaks(n,0);
if (0==n) return 0;
if (1==n) return 0;
for (i=1; i< (n-1) ; i++)
{
if ( (A[i-1] < A[i]) && (A[i+1] < A[i]) ) peaks[i]=1;
}
i=1;
max=0;
salir =0;
while ( ( i*i < n) && (0==salir) )
{
if ( 0== n % i)
{
r=test(peaks,i,n);
if (( 1==r ) && (i>max)) max=i;
j = n/i;
r=test(peaks,j,n);
if (( 1==r ) && (j>max)) max=j;
if ( max > n/2) salir =1;
}
i++;
}
if (0==salir)
{
if (i*i == n)
{
if ( 1==test(peaks,i,n) ) max=i;
}
}
return max;
}
The first idea is that we cannot set more than about sqrt(N) flags. Imagine that we take K flags: we then need roughly K * K elements to set all of them, because K is the minimal distance between the flags. So, if we have N elements, it is impossible to set more than about sqrt(N) flags.
function solution(A) {
const peaks = searchPeaks(A);
const maxFlagCount = Math.floor(Math.sqrt(A.length)) + 1;
let result = 0;
for (let i = 1; i <= maxFlagCount; ++i) {
const flagsSet = setFlags(peaks, i);
result = Math.max(result, flagsSet);
}
return result;
}
function searchPeaks(A) {
const peaks = [];
for (let i = 1; i < A.length - 1; ++i) {
if (A[i] > A[i - 1] && A[i] > A[i + 1]) {
peaks.push(i);
}
}
return peaks;
}
function setFlags(peaks, flagsTotal) {
let flagsSet = 0;
let lastFlagIndex = -flagsTotal;
for (const peakIndex of peaks) {
if (peakIndex >= lastFlagIndex + flagsTotal) {
flagsSet += 1;
lastFlagIndex = peakIndex;
if (flagsSet === flagsTotal) {
return flagsSet;
}
}
}
return flagsSet;
}
This solution is intended to run in O(N): we iterate over A once to find the peaks, then try each flag count from 1 to about sqrt(N). If placing k flags takes O(k) steps, the total is O(N + 1 + 2 + 3 + ... + sqrt(N)) = O(N + N/2) = O(N). (Strictly speaking, setFlags scans the list of peaks, so a single attempt can take more than O(k) steps when the peaks are dense.)
The solution above is pretty fast and gets a 100% result, but it can be optimized further. The idea is to binary search the flag count. Take F flags and try to set them all. If excess flags are left, the answer is less than F. But if all the flags have been set and there is space for more flags, the answer is greater than F.
function solution(A) {
const peaks = searchPeaks(A);
const maxFlagCount = Math.floor(Math.sqrt(A.length)) + 1;
return bSearchFlagCount(A, peaks, 1, maxFlagCount);
}
function searchPeaks(A) {
const peaks = [];
for (let i = 1; i < A.length - 1; ++i) {
if (A[i] > A[i - 1] && A[i] > A[i + 1]) {
peaks.push(i);
}
}
return peaks;
}
function bSearchFlagCount(A, peaks, start, end) {
const mid = Math.floor((start + end) / 2);
const flagsSet = setFlags(peaks, mid);
if (flagsSet == mid) {
return mid;
} else if (flagsSet < mid) {
return end > start ? bSearchFlagCount(A, peaks, start, mid) : mid - 1;
} else {
return bSearchFlagCount(A, peaks, mid + 1, end);
}
}
function setFlags(peaks, flagsTotal) {
let flagsSet = 0;
let lastFlagIndex = -flagsTotal;
for (const peakIndex of peaks) {
if (peakIndex >= lastFlagIndex + flagsTotal) {
flagsSet += 1;
lastFlagIndex = peakIndex;
// It only matters that we can set more flags than were taken.
// It doesn't matter how many extra flags can be set.
if (flagsSet > flagsTotal) {
return flagsSet;
}
}
}
return flagsSet;
}
Here is the official Codility solution to the task.
My C++ solution with 100% result
bool check(const vector<int>& v, int flags, int mid) {
if (not v.empty()) {
flags--;
}
int start = 0;
for (size_t i = 1; i < v.size(); ++i) {
if (v[i] - v[start] >= mid) {
--flags;
start = i;
}
}
return flags <= 0;
}
int solution(vector<int> &A) {
vector<int> peaks;
for (size_t i = 1; i < A.size() - 1; ++i) {
if (A[i] > A[i - 1] and A[i] > A[i + 1]) {
peaks.push_back(i);
}
}
int low = 0;
int high = peaks.size();
int res = 0;
while (low <= high) {
int mid = high - (high - low) / 2;
if (check(peaks, mid, mid)) {
low = mid + 1;
res = mid;
} else {
high = mid - 1;
}
}
return res;
}
public int solution(int[] A) {
int p = 0;
int q = 0;
int k = 0;
for (int i = 0; i < A.length; i++) {
if (i > 0 && i < A.length && (i + 1) < A.length - 1) {
if (A[i] > A[i - 1] && A[i] > A[i + 1]) {
p = i;
if (i < A.length / 2)
k++;
}
if (i > 0 && i < A.length && (A.length - i + 1) < A.length) {
if (A[A.length - i] > A[A.length - i - 1]
&& A[A.length - i] > A[A.length - i + 1] ) {
q = A.length - i;
if (i < A.length / 2)
k++;
else {
if (Math.abs(p - q) < k && p != q)
k--;
}
}
}
}
}
return k;
}
import sys
def get_max_num_peaks(arr):
peaks = [i for i in range(1, len(arr)-1, 1) if arr[i]>arr[i-1] and arr[i]>arr[i+1]]
max_len = [1 for i in peaks]
smallest_diff = [0 for i in peaks]
smallest_diff[0] = sys.maxint
for i in range(1, len(peaks), 1):
result = 1
for j in range(0, i, 1):
m = min(smallest_diff[j], peaks[i]-peaks[j])
if smallest_diff[j]>0 and m>=max_len[j]+1:
max_len[i] = max_len[j]+1
smallest_diff[i] = m
result = max(result, max_len[i])
return result
if __name__ == "__main__":
result = get_max_num_peaks([7, 10, 4, 5, 7, 4, 6, 1, 4, 3, 3, 7])
print result
I used DP to solve this problem. Here is the python code:
The maximum number of flags that can be set for the peaks ending at i is one more than the maximum for the peaks ending at j, provided that min(min_diff(0 .. j), distance from j to i) is no less than max_len(0 .. j) + 1.
Please correct me if I'm wrong or if there is an O(N) solution.
I know that an answer has already been provided by Francesco Malagrino, but I have written my own code. For the arrays {1,5,3,4,3,4,1,2,3,4,6,2} and {7, 10, 4, 5, 7, 4, 6, 1, 4, 3, 3, 7} my code works just fine, but when I submitted it to the Codility tests it failed on {9, 9, 4, 3, 5, 4, 5, 2, 8, 9, 3, 1}.
My answer was a maximum of 3 flags. The way I understand it, it is supposed to be 3, but
the correct answer is 2, which also agrees with Francesco Malagrino's solution.
What is wrong in my code, and why should the answer be only 2, given that
the distances between peaks 4, 6 and 9 seem to follow the rule?
private static int getpeak(int[] a) {
List<Integer> peak = new ArrayList<Integer>();
int temp1 = 0;
int temp2 = 0;
int temp3 = 0;
for (int i = 1; i <= (a.length - 2); i++) {
temp1 = a[i - 1];
temp2 = a[i];
temp3 = a[i + 1];
if (temp2 > temp1 && temp2 > temp3) {
peak.add(i);
}
}
Integer[] peakArray = peak.toArray(new Integer[0]);
int max = 1;
int lastFlag = 0;
for (int i = 1; i <= peakArray.length - 1; i++) {
int gap = peakArray[i] - peakArray[lastFlag];
gap = Math.abs(gap);
if (gap >= i+1) {
lastFlag = i;
max = max + 1;
}
}
return max;
}
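One way to read the rule: the required distance depends on the number of flags carried, not on how many have been placed so far. With K = 3 flags, every pair of neighbouring flags must be at least 3 apart, and for the array above the peaks are at indices 4, 6 and 9, where 6 - 4 = 2 < 3, so only two of the three flags can be set. The sketch below is a plain reference check written for this explanation (the function names are made up here); it tries every flag count and is not tuned for the O(N) requirement.

```javascript
// Indices of all peaks: elements strictly larger than both neighbours.
function findPeaks(A) {
  const peaks = []
  for (let i = 1; i < A.length - 1; i++) {
    if (A[i - 1] < A[i] && A[i] > A[i + 1]) peaks.push(i)
  }
  return peaks
}

// Greedily place up to k flags, each at least k positions after the previous one.
function flagsPlaced(peaks, k) {
  let placed = 0
  let last = -Infinity
  for (const p of peaks) {
    if (p - last >= k) {
      placed++
      last = p
      if (placed === k) break // no flags left to place
    }
  }
  return placed
}

// Try every possible number of carried flags and keep the best result.
function maxFlags(A) {
  const peaks = findPeaks(A)
  let best = 0
  for (let k = 1; k <= peaks.length; k++) {
    best = Math.max(best, flagsPlaced(peaks, k))
  }
  return best
}

console.log(maxFlags([9, 9, 4, 3, 5, 4, 5, 2, 8, 9, 3, 1])) // 2
console.log(maxFlags([7, 10, 4, 5, 7, 4, 6, 1, 4, 3, 3, 7])) // 3
```

In the code in the question, the gap is compared against i + 1 (the position in the peak list), which is not the same as the number of flags carried.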
I came up with an algorithm for this problem that is O(N) and passed all of the Codility tests. The main idea is that the number of flags cannot be more than the square root of N. So to keep the total order linear, each iteration should take at most about sqrt(N) steps too, that is, about as many steps as the number of flags being tried.
So first, I built an array nextPeak that for each index of A provides the closest peak at or after that index.
Then, in the second part, I iterate f over all possible numbers of flags, from the square root of N down to 0, to find the maximum number of flags that can be applied on the array. In each iteration, I try to apply the flags and use the nextPeak array to find the next peak in constant time.
The code looks like this:
public int solution(int[] A){
if( A==null || A.length<3){
return 0;
}
int[] next = new int[A.length];
int nextPeak=-1;
for(int i =1; i<A.length; i++){
if(nextPeak<i){
for(nextPeak=i; nextPeak<A.length-1; nextPeak++){
if(A[nextPeak-1]<A[nextPeak] && A[nextPeak]>A[nextPeak+1]){
break;
}
}
}
next[i] = nextPeak;
}
next[0] = next[1];
int max = (int) Math.sqrt(A.length); // same value as before, without the deprecated Double constructor
boolean failed = true ;
int f=max;
while(f>0 && failed){
int v=0;
for(int p=0; p<A.length-1 && next[p]<A.length-1 && v<f; v++, p+=max){
p = next[p];
}
if(v<f){
f--;
} else {
failed = false;
}
}
return f;
}
Here is a 100% Java solution
class Solution {
public int solution(int[] A) {
int[] nextPeaks = nextPeaks(A);
int flagNumebr = 1;
int result = 0;
while ((flagNumebr-1)*flagNumebr <= A.length) {
int flagPos = 0;
int flagsTaken = 0;
while (flagPos < A.length && flagsTaken < flagNumebr) {
flagPos = nextPeaks[flagPos];
if (flagPos == -1) {
// we arrived at the end of the peaks;
break;
}
flagsTaken++;
flagPos += flagNumebr;
}
result = Math.max(result, flagsTaken);
flagNumebr++;
}
return result;
}
private boolean[] createPeaks(int[] A) {
boolean[] peaks = new boolean[A.length];
for (int i = 1; i < A.length-1; i++) {
if (A[i - 1] < A[i] && A[i] > A[i + 1]) {
peaks[i] = true;
}
}
return peaks;
}
private int[] nextPeaks (int[] A) {
boolean[] peaks = createPeaks(A);
int[] nextPeaks = new int[A.length];
// the last position is always -1
nextPeaks[A.length-1] = -1;
for (int i = A.length-2; i >= 0 ; i--) {
nextPeaks[i] = peaks[i] ? i : nextPeaks[i+1];
}
return nextPeaks;
}
}
To solve this problem:
you have to find the peaks
calculate the distance (index difference) between every 2 peaks
initially, the number of flags equals the number of peaks
compare the distance between every 2 peaks with the initially specified number of flags (|P - Q| >= K)
after the comparison you will find that you have to skip some peaks
the final maximum number of flags equals the number of remaining peaks
** I'm still searching for how to write the best optimized code for this problem
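The steps above can be sketched in JavaScript roughly as follows. This is only an unoptimized illustration, not a reference solution: the names findPeaks, flagsPlaced and maxFlags are made up for this sketch. It tries every flag count K from the number of peaks downward, and greedily keeps a peak only when it lies at least K indices after the last kept one.

```javascript
// Indices i where A[i - 1] < A[i] > A[i + 1].
function findPeaks(A) {
  const peaks = [];
  for (let i = 1; i < A.length - 1; i++) {
    if (A[i] > A[i - 1] && A[i] > A[i + 1]) peaks.push(i);
  }
  return peaks;
}

// Greedily place up to k flags that must be at least k indices apart.
function flagsPlaced(peaks, k) {
  let placed = 0;
  let nextAllowed = -Infinity;
  for (const p of peaks) {
    if (p >= nextAllowed) {
      placed++;
      nextAllowed = p + k;
      if (placed === k) break;
    }
  }
  return placed;
}

// Try every flag count from the number of peaks downward (not optimized).
function maxFlags(A) {
  const peaks = findPeaks(A);
  for (let k = peaks.length; k > 0; k--) {
    if (flagsPlaced(peaks, k) === k) return k;
  }
  return 0;
}

console.log(maxFlags([9, 9, 4, 3, 5, 4, 5, 2, 8, 9, 3, 1]));
```

For the array {9, 9, 4, 3, 5, 4, 5, 2, 8, 9, 3, 1} the peaks are at 4, 6 and 9; three flags would need gaps of at least 3, but 6 - 4 is only 2, so only two flags fit.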
C# Solution with 100% points.
using System;
using System.Collections.Generic;
class Solution {
public int solution(int[] A) {
// write your code in C# 6.0 with .NET 4.5 (Mono)
List<int> peaks = new List<int>();
for (int i = 1; i < A.Length - 1; i++)
{
if (A[i - 1] < A[i] && A[i + 1] < A[i])
{
peaks.Add(i);
}
}
if (peaks.Count == 1 || peaks.Count == 0)
{
return peaks.Count;
}
int leastFlags = 1;
int mostFlags = peaks.Count;
int result = 1;
while (leastFlags <= mostFlags)
{
int flags = (leastFlags + mostFlags) / 2;
bool suc = false;
int used = 0;
int mark = peaks[0];
for (int i = 0; i < peaks.Count; i++)
{
if (peaks[i] >= mark)
{
used++;
mark = peaks[i] + flags;
if (used == flags)
{
suc = true;
break;
}
}
}
if (suc)
{
result = flags;
leastFlags = flags + 1;
}
else
{
mostFlags = flags - 1;
}
}
return result;
}
}
100% working JS solution:
function solution(A) {
let peaks = [];
for (let i = 1; i < A.length - 1; i++) {
if (A[i] > A[i - 1] && A[i] > A[i + 1]) {
peaks.push(i);
}
}
let n = peaks.length;
if (n <= 2) {
return n;
}
let maxFlags = Math.min(n, Math.ceil(Math.sqrt(A.length)));
let distance = maxFlags;
let rightPeak = peaks[n - 1];
for (let k = maxFlags - 2; k > 0; k--) {
let flags = k;
let leftPeak = peaks[0];
for (let i = 1; i <= n - 2; i++) {
if (peaks[i] - leftPeak >= distance && rightPeak - peaks[i] >= distance) {
if (flags === 1) {
return k + 2;
}
flags--;
leftPeak = peaks[i];
}
if (rightPeak - peaks[i] <= flags * (distance + 1)) {
break;
}
}
if (flags === 0) {
return k + 2;
}
distance--;
}
return 2;
}
100 % python O(N) detected.
import math
def solution(A):
    N=len(A)
    #Trivial cases
    if N<3:
        return 0
    Flags_Idx=[]
    for p in range(1,N-1):
        if A[p-1]<A[p] and A[p]>A[p+1] :
            Flags_Idx.append(p)
    if len(Flags_Idx)==0:
        return 0
    if len(Flags_Idx)<=2:
        return len(Flags_Idx)
    Start_End_Flags=Flags_Idx[len(Flags_Idx)-1]-Flags_Idx[0]
    #Maximum number of flags N is such that Start_End_Flags/(N-1)>=N
    #After solving a second degree equation we obtain the maximal value of N
    num_max_flags=math.floor(1.0+math.sqrt(4*Start_End_Flags+1.0))/2.0
    #Set the current number of flags to its total number
    len_flags=len(Flags_Idx)
    min_peaks=len(Flags_Idx)
    p=0
    #Compute the minimal number of flags by checking each index
    #and comparing to the maximal theoretical value num_max_flags
    while p<len_flags-1:
        add = 1
        #Move to the next flag until the condition Flags_Idx[p+add]-Flags_Idx[p]>=min(num_max_flags,num_flags)
        while Flags_Idx[p+add]-Flags_Idx[p]<min(num_max_flags,min_peaks):
            min_peaks-=1
            if p+add<len_flags-1:
                add+=1
            else:
                p=len_flags
                break
        p+=add
    if num_max_flags==min_peaks:
        return min_peaks
    #Bisect the remaining flags : check the condition
    #for flags in [min_peaks,num_max_flags]
    num_peaks=min_peaks
    for nf in range (min_peaks,int(num_max_flags)+1):
        cnt=1
        p=0
        while p<len_flags-1:
            add = 1
            while Flags_Idx[p+add]-Flags_Idx[p]<nf:
                if p+add<len_flags-1:
                    add+=1
                else:
                    cnt-=1
                    p=len_flags
                    break
            p+=add
            cnt+=1
        num_peaks=max(min(cnt,nf),num_peaks)
    return num_peaks
I first computed the maximal possible number of flags satisfying the condition Interval/(N-1) >= N, where Interval is the index difference between the first and last flag. Then I walk through all the flags, comparing against the minimum of this value and the current number of flags, and decrement the count whenever the condition is not satisfied.
This yields the minimal number of flags, which I use as a starting point to check the condition on the remaining ones (in the interval [min_flag, max_flag]).
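The bound mentioned above can be written out as follows. This is just a sketch of the algebra, writing I for the index difference between the first and last peak and N > 1 for the number of flags: N flags leave N - 1 gaps, and each gap must be at least N long.

```latex
\frac{I}{N-1} \ge N
\;\Longleftrightarrow\; N^2 - N - I \le 0
\;\Longrightarrow\; N \le \frac{1 + \sqrt{1 + 4I}}{2}
```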
100% python solution which is far simpler than the one posted above by #Jurko Gospodnetić
https://github.com/niall-oc/things/blob/master/codility/flags.py
https://app.codility.com/demo/results/training2Y78NP-VHU/
You don't need a binary search for this problem. The maximum number of flags is the square root of the spread between the first and last peak, plus 1. For example, a first peak at index 9 and a last peak at index 58 give a spread of 49; sqrt(49) is 7, so the maximum is 7 + 1 = 8. So try 8 flags, then 7, then 6, and so on. You can stop as soon as a count works; there is no need to flog a dead horse!
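As a small illustration of that starting point, here is a sketch in JavaScript. The function name maxFlagsUpperBound is hypothetical and not taken from the linked code. It only computes the upper bound from which a descending search could begin; the bound is not always reachable, so each count still has to be checked against the actual peaks.

```javascript
// Upper bound on the number of flags, given the first and last peak indices.
// K flags need K - 1 gaps of at least K, so (K - 1)^2 <= (K - 1) * K <= spread.
function maxFlagsUpperBound(firstPeak, lastPeak) {
  const spread = lastPeak - firstPeak;
  return Math.floor(Math.sqrt(spread)) + 1;
}

console.log(maxFlagsUpperBound(9, 58)); // spread 49, so the search starts at 8
```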
def solution(A):
    # range stops at len(A)-1 so that A[x+1] never runs past the end
    peak=[x for x in range(1,len(A)-1) if A[x-1]<A[x]>A[x+1]]
    max_flag=len(peak)
    for x in range(1,max_flag+1):
        for y in range(x-1):
            if abs(peak[y]-peak[y+1])>=max_flag:
                max_flag=max_flag-1
    print(max_flag)
I got 100% with this solution in Java. I did one thing for the first loop to find peaks, i.e. after finding the peak I am skipping the next element as it is less than the peak.
I know this solution can be further optimized by group members but this is the best I can do as of now, so please let me know how can I optimize this more.
Detected time complexity: O(N)
https://app.codility.com/demo/results/trainingG35UCA-7B4/
public static int solution(int[] A) {
int N = A.length;
if (N < 3)
return 0;
ArrayList<Integer> peaks = new ArrayList<Integer>();
for (int i = 1; i < N - 1; i++) {
if (A[i] > A[i - 1]) {
if (A[i] > A[i + 1]) {
peaks.add(i);
i++;// skip for next as A[i + 1] < A[i] so no need to check again
}
}
}
int size = peaks.size();
if (size < 2)
return size;
int k = (int) Math.sqrt(peaks.get(size - 1) - peaks.get(0))+1; // added 1 to round off
int flagsLeft = k - 1; // one flag is used for first element
int maxFlag = 0;
int prevEle = peaks.get(0);
while (k > 0) { // will iterate in descending order
flagsLeft = k - 1; // reset first peak flag
prevEle = peaks.get(0); // reset the flag to first element
for (int i = 1; i < size && flagsLeft > 0; i++) {
if (peaks.get(i) - prevEle >= k) {
flagsLeft--;
prevEle = peaks.get(i);
}
if ((size - 1 - i) < flagsLeft) { // as no. of peaks < flagsLeft
break;
}
}
if (flagsLeft == 0 && maxFlag < k) {
maxFlag = k;
break; // will break at first highest flag as iterating in desc order
}
k--;
}
return maxFlag;
}
int solution(int A[], int N) {
int i,j,k;
int count=0;
int countval=0;
int count1=0;
int flag;
for(i=1;i<N-1;i++)
{
if((A[i-1]<A[i]) && (A[i]>A[i+1]))
{
printf("%d %d\n",A[i],i);
A[count++]=i;
i++;
}
}
j=A[0];
k=0;
if (count==1 || count==0)
return count;
if (count==2)
{
if((A[1]-A[0])>=count)
return 2;
else
return 1;
}
flag=0;
// contval=count;
count1=1;
countval=count;
while(1)
{
for(i=1;i<count;i++)
{
printf("%d %d\n",A[i],j);
if((A[i]-j)>=countval)
{
printf("Added %d %d\n",A[i],j);
count1++;
j=A[i];
}
/* if(i==count-1 && count1<count)
{
j=A[0];
i=0;
count1=1;
}*/
}
printf("count %d count1 %d \n",countval,count1);
if (count1<countval)
{
count1=1;
countval--;
j=A[0];
}
else
{
break; }
}
return countval;
}

Optimizing String Matching Algorithm

function levenshtein(a, b) {
    var i, j, cost, d = [];
    if (a.length == 0) { return b.length; }
    if (b.length == 0) { return a.length; }
    for (i = 0; i <= a.length; i++) {
        d[i] = new Array();
        d[i][0] = i;
    }
    for (j = 0; j <= b.length; j++) {
        d[0][j] = j;
    }
    for (i = 1; i <= a.length; i++) {
        for (j = 1; j <= b.length; j++) {
            if (a.charAt(i - 1) == b.charAt(j - 1)) {
                cost = 0;
            } else {
                cost = 1;
            }
            d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
            if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2) && a.charAt(i - 2) == b.charAt(j - 1)) {
                d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + cost)
            }
        }
    }
    return d[a.length][b.length];
}
function suggests(suggWord) {
    var sArray = [];
    for (var z = words.length; --z;) {
        if (levenshtein(words[z], suggWord) < 2) {
            sArray.push(words[z]);
        }
    }
}
Hello.
I'm using the above implementation of the Damerau-Levenshtein algorithm. It's fast enough in a normal PC browser, but on a tablet it takes ~2/3 seconds.
Basically, I'm comparing the word sent to a suggest function against every word in my dictionary, and adding it to my array if the distance is less than 2.
The dictionary is an array of approximately 600,000 words (699KB).
The aim is to build a suggest-word feature for my JavaScript spell checker.
Any suggestions on how to speed this up? Or a different way of doing this?
One thing you can do if you are only looking for distances less than some threshold is to compare the lengths first. For example, if you only want distances less than 2, then the absolute value of the difference of the two strings' lengths must be less than 2 as well. Doing this will often allow you to avoid even doing the more expensive Levenshtein calculation.
The reasoning behind this is that two strings that differ in length by 2, will require at least two insertions (and thus a resulting minimum distance of 2).
You could modify your code as follows:
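function suggests(suggWord) {
    var sArray = [];
    // z-- (rather than --z) so that words[0] is checked as well
    for (var z = words.length; z--;) {
        // Cheap length test first; only then the expensive distance
        if (Math.abs(suggWord.length - words[z].length) < 2) {
            if (levenshtein(words[z], suggWord) < 2) {
                sArray.push(words[z]);
            }
        }
    }
    return sArray;
}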
I don't do very much javascript, but I think this is how you could do it.
Part of the problem is that you have a large array of dictionary words, and are doing at least some processing for every one of those words. One idea would be to have a separate array for each word length, and organize your dictionary words into those instead of one big array (or, if you must have the one big array, for alphabetical lookups or whatever, use arrays of indexes into that big array). Then, if you have a suggWord that's 5 characters long, you only have to look through the arrays of 4-, 5-, and 6-letter words. You can then remove the Math.abs length test in my code above, because you know you are only looking at words whose length could match. This saves you from doing anything at all with a large chunk of your dictionary words.
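A minimal sketch of that length-bucket idea, assuming the dictionary is a plain array of strings. The names buildLengthIndex and candidatesByLength are invented for this illustration, and the index would be built once up front rather than on every lookup.

```javascript
// Group dictionary words by length (build this once).
function buildLengthIndex(words) {
  const byLength = new Map();
  for (const word of words) {
    if (!byLength.has(word.length)) byLength.set(word.length, []);
    byLength.get(word.length).push(word);
  }
  return byLength;
}

// Collect only the words whose length is within maxDistance of suggWord.
function candidatesByLength(byLength, suggWord, maxDistance = 1) {
  const out = [];
  const lo = suggWord.length - maxDistance;
  const hi = suggWord.length + maxDistance;
  for (let len = lo; len <= hi; len++) {
    const bucket = byLength.get(len);
    if (!bucket) continue;
    for (const word of bucket) out.push(word);
  }
  return out;
}

const index = buildLengthIndex(['a', 'cat', 'cart', 'carts', 'category']);
console.log(candidatesByLength(index, 'cart'));
```

Only the returned candidates would then be passed to the Levenshtein function.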
Levenshtein is relatively expensive, and more so with longer words. If Levenshtein is simply too expensive to run that many times, especially with longer words, you can exploit another consequence of your threshold, which only accepts words that match exactly or have a distance of 1 (one insertion, deletion, substitution, or transposition). Given that requirement, you can further filter candidates for the Levenshtein calculation by checking that either their first character or their last character matches (unless either word has a length of 1 or 2, in which case Levenshtein should be cheap to do). In fact, you could check for a match of either the first n characters or the last n characters, where n = (suggWord.length-1)/2. Words that don't pass that test cannot match via Levenshtein. For this you would want a primary array of dictionary words ordered alphabetically and, in addition, an array of indexes into that array, ordered alphabetically by the words' reversed characters. Then you could do a binary search into both of those arrays, and only run the Levenshtein calculation on the small subset of words whose first or last n characters match the start or end of suggWord, and whose length differs by at most one character.
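The prefix/suffix test on its own could look something like the sketch below. The function name mayBeWithinOneEdit is hypothetical, n is rounded down, and the sorted index arrays and binary search are left out. It is only a pre-filter: a true result still needs the real distance calculation, while a false result means the distance is at least 2.

```javascript
// Cheap pre-filter for "distance <= 1": a single edit can disturb only one
// end of the word, so the first n or the last n characters must still agree.
function mayBeWithinOneEdit(suggWord, candidate) {
  if (Math.abs(suggWord.length - candidate.length) > 1) return false;
  const n = Math.floor((suggWord.length - 1) / 2);
  if (n <= 0) return true; // very short words: just run Levenshtein
  return suggWord.slice(0, n) === candidate.slice(0, n) ||
         suggWord.slice(-n) === candidate.slice(-n);
}

console.log(mayBeWithinOneEdit('spelling', 'spalling'));
console.log(mayBeWithinOneEdit('spelling', 'sparkles'));
```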
I had to optimize the same algorithm. What worked best for me was to cache the d array: you create it at a large size (the maximum string length you expect) outside of the levenshtein function, so that you don't have to reinitialize it on every call.
In my case, in Ruby, it made a huge difference in performance. But of course it depends on the size of your words array...
function levenshtein(a, b, d) {
    var i, j, cost;
    if (a.length == 0) { return b.length; }
    if (b.length == 0) { return a.length; }
    // d already holds the first row and column, so only the inner cells are filled
    for (i = 1; i <= a.length; i++) {
        for (j = 1; j <= b.length; j++) {
            if (a.charAt(i - 1) == b.charAt(j - 1)) {
                cost = 0;
            } else {
                cost = 1;
            }
            d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
            if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2) && a.charAt(i - 2) == b.charAt(j - 1)) {
                d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + cost)
            }
        }
    }
    return d[a.length][b.length];
}
function suggests(suggWord) {
    // Declared locally to avoid implicit globals; supports words up to 999 characters
    var i, j, d = [];
    for (i = 0; i <= 999; i++) {
        d[i] = new Array();
        d[i][0] = i;
    }
    for (j = 0; j <= 999; j++) {
        d[0][j] = j;
    }
    var sArray = [];
    // z-- (rather than --z) so that words[0] is checked as well
    for (var z = words.length; z--;) {
        if (levenshtein(words[z], suggWord, d) < 2) {
            sArray.push(words[z]);
        }
    }
    return sArray;
}
There are some simple things you can do in your code to RADICALLY improve execution speed. I completely rewrote your code for performance, static typing compliance with JIT interpretation, and JSLint compliance:
var levenshtein = function (a, b) {
    "use strict";
    var i = 0,
        j = 0,
        cost = 1,
        d = [],
        x = a.length,
        y = b.length,
        ai = "",
        bj = "",
        xx = x + 1,
        yy = y + 1;
    if (x === 0) {
        return y;
    }
    if (y === 0) {
        return x;
    }
    for (i = 0; i < xx; i += 1) {
        d[i] = [];
        d[i][0] = i;
    }
    for (j = 0; j < yy; j += 1) {
        d[0][j] = j;
    }
    for (i = 1; i < xx; i += 1) {
        for (j = 1; j < yy; j += 1) {
            ai = a.charAt(i - 1);
            bj = b.charAt(j - 1);
            if (ai === bj) {
                cost = 0;
            } else {
                cost = 1;
            }
            d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
            if (i > 1 && j > 1 && ai === b.charAt(j - 2) && a.charAt(i - 2) === bj) {
                d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + cost);
            }
        }
    }
    return d[x][y];
};
Looking up the length of the array at each interval of a multidimensional lookup is very costly. I also beautified your code using http://prettydiff.com/ so that I could read it in half the time. I also removed some redundant look ups in your arrays. Please let me know if this executes faster for you.
You should store all the words in a trie. This is space-efficient compared to storing the words in a plain dictionary. To look up a word, you traverse the trie character by character until you reach a node that marks the end of a word.
Edit
As I mentioned in my comment, for a Levenshtein distance of 0 or 1 you don't need to go through all the words. Two words have a Levenshtein distance of 0 if they are equal. The problem then boils down to generating all the words that have a Levenshtein distance of 1 from a given word. Let's take an example:
array
For the above word, the words at Levenshtein distance 1 include
parray, aprray, arpray, arrpay, arrapy, arrayp (insertion of a character)
Here p can be replaced by any other letter.
These words also have a Levenshtein distance of 1:
rray, aray, arry, arra (deletion of a character)
And finally these words:
prray, apray, arpay, arrpy and arrap (substitution of a character)
Here again, p can be replaced by any other letter.
So if you look up only these particular combinations rather than all the words, you will get to your solution. If you know how the Levenshtein algorithm works, this is simply that algorithm reverse engineered.
A final example, which is your use case:
Suppose pary is the word you get as input, and it should be corrected to part from the dictionary. For pary you don't need to look at words starting with ab, for example, because for any word starting with ab the Levenshtein distance will be greater than 1.
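The candidate-generation idea can be sketched as below. This is an illustration under a few assumptions rather than the answer's own code: the dictionary is held in a Set instead of a trie, the alphabet is lowercase a-z, and the names editsAtDistanceOne and suggestFromSet are made up. Transpositions are included as well, since the question uses the Damerau variant.

```javascript
// Every string reachable from `word` by one insertion, deletion,
// substitution or adjacent transposition (the word itself is included
// too, via "substituting" a letter with itself).
function editsAtDistanceOne(word, alphabet = 'abcdefghijklmnopqrstuvwxyz') {
  const results = new Set();
  for (let i = 0; i <= word.length; i++) {
    const head = word.slice(0, i);
    const tail = word.slice(i);
    if (tail.length > 0) results.add(head + tail.slice(1)); // deletion
    if (tail.length > 1) {
      results.add(head + tail[1] + tail[0] + tail.slice(2)); // transposition
    }
    for (const ch of alphabet) {
      if (tail.length > 0) results.add(head + ch + tail.slice(1)); // substitution
      results.add(head + ch + tail); // insertion
    }
  }
  return results;
}

// Keep only the generated candidates that are real dictionary words.
function suggestFromSet(word, dictionary) {
  const found = [];
  for (const candidate of editsAtDistanceOne(word)) {
    if (dictionary.has(candidate)) found.push(candidate);
  }
  return found.sort();
}

const dictionary = new Set(['part', 'party', 'pray', 'par', 'dog']);
console.log(suggestFromSet('pary', dictionary));
```

The number of generated candidates grows with the word length and alphabet size, not with the dictionary size, which is the point of the approach.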
