String Search Algorithm Implementation

I have implemented a naive string search algorithm that counts the number of times a substring occurs in a string. I wrote the implementation in JavaScript and Python.
Algorithm (from Topcoder):
function brute_force(text[], pattern[])
{
    // let n be the size of the text and m the size of the
    // pattern
    count = 0
    for(i = 0; i < n; i++) {
        for(j = 0; j < m && i + j < n; j++)
            if(text[i + j] != pattern[j]) break;
            // mismatch found, break the inner loop
        if(j == m) // match found
            count+=1
    }
    // return only after every start position has been tried
    return count
}
JavaScript implementation:
a = "Rainbow";
b = "Rain";
count = 0;
function findSubStr(Str, SubStr){
    for (i = 0; i<a.length; i++){
        for (j = 0; j < b.length; j++)
            if(a[i + j] != b[j]) break;
        document.write('j = ', j, '<br/>')
        if (j == b.length)
            count+=1;
    }
    return count;
}
document.write("Count is ",findSubStr(a,b), '<br/>');
Python implementation:
a = "Rainbow"
b = "Rain"
def SubStrInStr(Str, SubStr):
    count = 0
    for i in range(len(Str)):
        for j in range(len(SubStr)):
            print(j)
            if (a[i + j] != b[j]):
                break
        if (j+1 == len(SubStr)):
            count+=1
    return count
print(SubStrInStr(a, b))
Now my question is about the line that implements if (j == b.length): it works well in JavaScript, but in Python I need to add 1 to the value of j or subtract 1 from the length of b. I don't know why this is happening.

for x in range(4):
Unlike in JavaScript, a Python for loop visits each element of a sequence. Here the sequence is 0, 1, 2, 3, so the last value x takes is 3.
for(x = 0; x < 4; x++)
In JavaScript, x is incremented to 4 before the condition x < 4 fails and the loop ends. The last value x takes is 4.

You are confused because the two versions are not identical. After for (j = 0; j < b.length; j++) runs to completion, the final value of j is b.length (in the case where b is a substring of a), but in Python things are a little different. range(len("1234")) produces the values 0, 1, 2, 3, so your for behaves more like a foreach: j keeps the last value of the sequence, and this is why you have to add one. I hope that was clear enough. If not, please ask for details.
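To make the difference concrete, here is a small sketch in JavaScript that compares the two loop styles. The function names are made up for this illustration, and the second function only approximates Python's for j in range(limit) by iterating over an array of the same values.

```javascript
// Counter-style loop: j is incremented once more before the condition fails,
// so after the loop j equals limit.
function lastValueCounterLoop(limit) {
  let j;
  for (j = 0; j < limit; j++) {
    // no body needed; we only care about j after the loop
  }
  return j;
}

// Sequence-style loop: the variable only ever holds elements of the sequence,
// so the last value it holds is limit - 1.
function lastValueSequenceLoop(limit) {
  const values = Array.from({ length: limit }, (_, index) => index);
  let last;
  for (const j of values) {
    last = j;
  }
  return last;
}

console.log(lastValueCounterLoop(4));  // 4
console.log(lastValueSequenceLoop(4)); // 3
```

This is why the JavaScript check compares j with b.length, while the Python check has to compare j + 1 with len(SubStr).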

I don't know JavaScript, but I have implemented naive search in Python, covering all the cases in the simplest way I could.
Take a look at it below.
It returns the number of times the pattern was found.
def naive_pattern_search(data, search):
    n = len(data)    # Length of the data.
    m = len(search)  # Length of the pattern to be searched.
    c = 0            # Counts how many times the pattern occurs.
    if m == 0:
        return -1    # An empty pattern is treated as not found.
    # Try every start position at which the pattern still fits in the data.
    for i in range(n - m + 1):
        j = 0
        # While data and search hold the same element, move to the next index.
        while j < m and data[i + j] == search[j]:
            j += 1
        # j reaches m only if the whole pattern matched at position i.
        if j == m:
            c += 1
    # If the pattern occurs more than 0 times, it was found.
    if c > 0:
        return c
    else:
        return -1
Input : abcabcabcabcabc
Output: Pattern Found : 5 Times

I think your Python implementation has a problem. If you set b = "Raiy", the function incorrectly returns 1. You may have misunderstood the edge condition.
The two condition statements should be at the same level.
a = "Rainbow"
b = "Rain"
def SubStrInStr(Str, SubStr):
    count = 0
    for i in range(len(Str)):
        for j in range(len(SubStr)):
            # print (j)
            # Stop at a mismatch, or when the pattern runs past the end of a.
            if (i + j >= len(a) or a[i + j] != b[j]):
                break
            # Reached only when every character up to j has matched.
            if (j+1 == len(SubStr)):
                count+=1
    return count
print(SubStrInStr(a, b))

Related

Understanding a Javascript Decrypter

I'm looking at a codewars solution for decrypting a set of numbers.
For example,
012345 => 304152 => 135024
So the 012 would take on every other index starting from 1, and 345 would take on every other index starting from 0. The solution is written as...
const decrypt = (s, n) => {
    if (!s) return s;
    const l = Math.floor(s.length / 2);
    for (let i = 0; i < n; i++) {
        let x = s.slice(0, l), y = s.slice(l);
        s = '';
        for (let j = 0; j < l + 1; j++)
            s += (y[j] ? y[j] : '') + (x[j] ? x[j] : '');
    }
    return s;
}
From my understanding, the coder sliced the string into two halves, and the for loop runs through the digits of the first half together with the digits of the second half. I'm not sure why there needs to be a ternary operator there and what conditions it's looking for. I'd appreciate any help!
I tried s += y[j] + x[j] instead and thought it would decrypt. I get the right answer, but there's an extra undefined at the end of the result.
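One way to see what the ternaries guard against is to run the inner loop with and without them. The sketch below is only an illustration: interleaveUnguarded and interleaveGuarded are hypothetical helper names that perform a single pass of the loop from the solution above. Indexing a string past its end yields undefined, and the loop runs to j = l, so x[l] is always out of range and y[l] exists only when the length is odd.

```javascript
// One pass of the inner loop WITHOUT the ternary guards.
function interleaveUnguarded(s) {
  const l = Math.floor(s.length / 2);
  const x = s.slice(0, l), y = s.slice(l);
  let out = '';
  for (let j = 0; j < l + 1; j++) {
    out += y[j] + x[j]; // at j === l, x[j] is undefined
  }
  return out;
}

// One pass of the inner loop WITH the guards from the original solution.
function interleaveGuarded(s) {
  const l = Math.floor(s.length / 2);
  const x = s.slice(0, l), y = s.slice(l);
  let out = '';
  for (let j = 0; j < l + 1; j++) {
    out += (y[j] ? y[j] : '') + (x[j] ? x[j] : '');
  }
  return out;
}

console.log(interleaveUnguarded('01234'));  // "20314undefined"
console.log(interleaveGuarded('01234'));    // "20314"
console.log(interleaveUnguarded('012345')); // "304152NaN"
console.log(interleaveGuarded('012345'));   // "304152"
```

For an odd length the last step adds '4' + undefined, which becomes the text "4undefined". For an even length both operands are undefined, and undefined + undefined is NaN. The ternaries replace those missing characters with an empty string.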

What is the difference between "hello".indexOf("") and "hello".indexOf("h")? [duplicate]

Why does this happen in JavaScript?
'abc'.indexOf('a'); //0
'abc'.indexOf(''); //0
whereas for the other falsy values, the result is -1:
'abc'.indexOf(); //-1
'abc'.indexOf(false); //-1
'abc'.indexOf(undefined); //-1
I have also seen these questions:
Q1: I didn't understand the answer in depth.
Q2: it is about Java, not JavaScript.

The answer, fundamentally, is: Because that's how the function is specified to behave. And it makes sense, from a certain perspective.
The main bit related to returning 0 when you search for an empty string is this:
Return the smallest possible integer k not smaller than start such that k + searchLen is not greater than len, and for all nonnegative integers j less than searchLen, the code unit at index k + j within S is the same as the code unit at index j within searchStr; but if there is no such integer k, return the value -1.
Since the length of the search string is 0, both halves of that "and" are satisfied by k = 0: k + searchLen is not greater than the length of the string, and for all nonnegative integers less than the search length (there are none), the code units match.
Or roughly speaking, in code:
function indexOf(searchString, position = 0) {
    let s = String(this);
    let searchStr = String(searchString);
    let len = s.length;
    let start = Math.min(Math.max(position, 0), len);
    let searchLen = searchStr.length;
    let k = start; // begin at the clamped start position
    while (k + searchLen <= len) {
        if (s.substring(k, k + searchLen) === searchStr) {
            break;
        }
        ++k;
    }
    const found = k + searchLen <= len;
    return found ? k : -1;
}
Since k + searchLen (0 + 0) is <= len, and the empty substring at k equals searchStr, k (0) is returned.
Live Example:
console.log(indexOf.call("abcd", ""));
Another way to look at it is this answer related to Java...or to life in general.
Re your question passing in non-strings: One of the first steps is:
Let searchStr be ? ToString(searchString).
...which is let searchStr = String(searchString); in my rough code above. That means false becomes "false" and undefined becomes "undefined". "abc" doesn't contain either "false" or "undefined".
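The string conversion is easy to check directly. The following lines are a small illustration using only the built-in String and String.prototype.indexOf; the sample strings are made up for the example.

```javascript
// indexOf converts its argument to a string before searching.
console.log(String(false));                 // "false"
console.log(String(undefined));             // "undefined"

// "abc" contains neither "false" nor "undefined".
console.log('abc'.indexOf(false));          // -1
console.log('abc'.indexOf());               // -1

// Strings that do contain those words produce a match.
console.log('it is false'.indexOf(false));  // 6
console.log('undefined'.indexOf());         // 0

// The empty string matches at the start position.
console.log('abc'.indexOf(''));             // 0
```

So the falsy values are not treated as "nothing to search for"; they are searched for as the text "false" and "undefined".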

Need help to fix a snippet from my math grid maze solver

The idea behind the following code is to test whether any number between 0 and 13 plus any other number in that range equals 13. If so, both numbers should be saved to separate arrays at the same index, so that I end up with all possible combinations that reach 13 in two arrays. But when I run my code I only get two combinations, 0+13 and 13+0. Here is the code:
var number1 = [];
var number2 = [];
var index = 0;
var i = 0;
var j = 0;
//Tests if i + j (from the loop) add up to 13
var test = function(i, j) {
    if (i + j === 13) {
        number1[index] = i;
        number2[index] = j;
        index =+ 1;
    }
}
//1st loop generates i from 0 to 13 in 0.5 step.
for (i = 0; i < 13.5; i += 0.5) {
    //same for j, this number should test with i every loop
    for (j = 0; j < 13.5; j += 0.5) {
        test(i, j);
    }
}
//outputs the 2 arrays, the matching numbers should be stored in
for (i = 0; i < number1.length; i++) {
    console.log(number1[i]);
    console.log(number2[i]);
}

Change index =+ 1 to index += 1.
index =+ 1 sets index to 1 (it is parsed as index = +1); it does not increment it by 1, as you want.
See Expressions and operators: Assignment operators on MDN.
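Here is a short sketch of the difference, followed by a version of the pair collection that uses +=. The function name collectPairs and its parameters are invented for this example; it is one possible way to write the corrected loop, not the only one.

```javascript
// `=+` is an assignment of a unary-plus expression, not an increment.
let n = 5;
n =+ 1;          // same as n = +1
console.log(n);  // 1

let m = 5;
m += 1;          // increments
console.log(m);  // 6

// Collect every pair (i, j) in steps of `step` with i + j === target.
function collectPairs(target, step) {
  const first = [];
  const second = [];
  let index = 0;
  for (let i = 0; i <= target; i += step) {
    for (let j = 0; j <= target; j += step) {
      if (i + j === target) {
        first[index] = i;
        second[index] = j;
        index += 1; // move to the next free slot
      }
    }
  }
  return { first, second };
}

const pairs = collectPairs(13, 0.5);
console.log(pairs.first.length); // 27
```

With =+ every match after the first is written to slot 1, which is why only 0+13 and the last match, 13+0, survive.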

Optimizing String Matching Algorithm

function levenshtein(a, b) {
    var i, j, cost, d = [];
    if (a.length == 0) { return b.length; }
    if (b.length == 0) { return a.length; }
    for (i = 0; i <= a.length; i++) {
        d[i] = new Array();
        d[i][0] = i;
    }
    for (j = 0; j <= b.length; j++) {
        d[0][j] = j;
    }
    for (i = 1; i <= a.length; i++) {
        for (j = 1; j <= b.length; j++) {
            if (a.charAt(i - 1) == b.charAt(j - 1)) {
                cost = 0;
            } else {
                cost = 1;
            }
            d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
            if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2) && a.charAt(i - 2) == b.charAt(j - 1)) {
                d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + cost);
            }
        }
    }
    return d[a.length][b.length];
}
function suggests(suggWord) {
    var sArray = [];
    // z-- (rather than --z) also visits words[0]
    for (var z = words.length; z--;) {
        if (levenshtein(words[z], suggWord) < 2) {
            sArray.push(words[z]);
        }
    }
    return sArray;
}
Hello.
I'm using the above implementation of the Damerau-Levenshtein algorithm. It's fast enough in a normal PC browser, but on a tablet it takes ~2/3 seconds.
Basically, I'm comparing the word sent to a suggest function to every word in my dictionary, and if the distance is less than 2, I add it to my array.
The dictionary is an array of approximately 600,000 words (699KB).
The aim of this is to build a word suggestion feature for my JavaScript spell checker.
Any suggestions on how to speed this up? Or a different way of doing this?

One thing you can do if you are only looking for distances less than some threshold is to compare the lengths first. For example, if you only want distances less than 2, then the absolute value of the difference of the two strings' lengths must be less than 2 as well. Doing this will often allow you to avoid even doing the more expensive Levenshtein calculation.
The reasoning behind this is that two strings that differ in length by 2, will require at least two insertions (and thus a resulting minimum distance of 2).
You could modify your code as follows:
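function suggests(suggWord) {
    var sArray = [];
    // z-- (rather than --z) also visits words[0]
    for (var z = words.length; z--;) {
        // Cheap length check first; only then the expensive distance.
        if (Math.abs(suggWord.length - words[z].length) < 2) {
            if (levenshtein(words[z], suggWord) < 2) {
                sArray.push(words[z]);
            }
        }
    }
    return sArray;
}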
I don't do very much javascript, but I think this is how you could do it.
Part of the problem is that you have a large array of dictionary words and do at least some processing for every one of them. One idea is to keep a separate array for each word length and organize your dictionary words into those instead of one big array (or, if you must have the one big array, for alphabetical lookups or whatever, use arrays of indexes into that big array). Then, if you have a suggWord that's 5 characters long, you only have to look through the arrays of 4-, 5-, and 6-letter words. You can then remove the Math.abs(length - length) test in my code above, because you know you are only looking at words of a length that could match. This saves you from doing anything with a large chunk of your dictionary words.
Levenshtein is relatively expensive, and more so with longer words. If Levenshtein is simply too expensive to run very many times, especially with longer words, you can take advantage of another side effect of your threshold, which only considers words that match exactly or have a distance of 1 (one insertion, deletion, substitution, or transposition). Given that requirement, you can further filter candidates for the Levenshtein calculation by checking that either their first character or their last character matches (unless either word has a length of 1 or 2, in which case Levenshtein should be cheap). In fact, you could check for a match of either the first n characters or the last n characters, where n = (suggWord.length-1)/2. If a word doesn't pass that test, you can assume it won't match via Levenshtein. For this you would want a primary array of dictionary words ordered alphabetically and, in addition, an array of indexes into that array ordered alphabetically by the words' reversed characters. Then you could do a binary search into both of those arrays and only run the Levenshtein calculation on the small subset of words whose first or last n characters match the start or end of suggWord, and whose length differs by at most one character.
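The length-bucket idea could look roughly like the sketch below. The names buildLengthBuckets and candidatesByLength are made up for this illustration, and a Map keyed by word length is just one way to hold the separate arrays; the prefix and suffix filtering with binary search is not shown.

```javascript
// Group dictionary words by length, once, when the dictionary is loaded.
function buildLengthBuckets(words) {
  const buckets = new Map();
  for (const word of words) {
    if (!buckets.has(word.length)) {
      buckets.set(word.length, []);
    }
    buckets.get(word.length).push(word);
  }
  return buckets;
}

// Return only the words whose length is within maxDistance of suggWord's length.
function candidatesByLength(buckets, suggWord, maxDistance) {
  const result = [];
  const shortest = suggWord.length - maxDistance;
  const longest = suggWord.length + maxDistance;
  for (let len = shortest; len <= longest; len++) {
    const bucket = buckets.get(len);
    if (bucket) {
      for (const word of bucket) {
        result.push(word);
      }
    }
  }
  return result;
}

const buckets = buildLengthBuckets(['a', 'to', 'cat', 'part', 'party', 'parties']);
console.log(candidatesByLength(buckets, 'pary', 1)); // [ 'cat', 'part', 'party' ]
```

Only the returned candidates would then be passed to levenshtein, so words of clearly wrong length are never touched at lookup time.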

I had to optimize the same algorithm. What worked best for me was to cache the d array: create it once at a large size (the maximum string length you expect) outside the levenshtein function, so that you don't have to reinitialize it on every call.
In my case, in Ruby, it made a huge difference in performance. But of course it depends on the size of your words array...
function levenshtein(a, b, d) {
    var i, j, cost;
    if (a.length == 0) { return b.length; }
    if (b.length == 0) { return a.length; }
    for (i = 1; i <= a.length; i++) {
        for (j = 1; j <= b.length; j++) {
            if (a.charAt(i - 1) == b.charAt(j - 1)) {
                cost = 0;
            } else {
                cost = 1;
            }
            d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
            if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2) && a.charAt(i - 2) == b.charAt(j - 1)) {
                d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + cost);
            }
        }
    }
    return d[a.length][b.length];
}
function suggests(suggWord)
{
    var i, j, d = [];
    // Build the matrix once, sized for words of up to 999 characters.
    for (i = 0; i <= 999; i++) {
        d[i] = new Array();
        d[i][0] = i;
    }
    for (j = 0; j <= 999; j++) {
        d[0][j] = j;
    }
    var sArray = [];
    // z-- (rather than --z) also visits words[0]
    for (var z = words.length; z--;)
    {
        if (levenshtein(words[z], suggWord, d) < 2)
        {sArray.push(words[z]);}
    }
    return sArray;
}

There are some simple things you can do in your code to RADICALLY improve execution speed. I completely rewrote your code for performance, static typing compliance with JIT interpretation, and JSLint compliance:
var levenshtein = function (a, b) {
    "use strict";
    var i = 0,
        j = 0,
        cost = 1,
        d = [],
        x = a.length,
        y = b.length,
        ai = "",
        bj = "",
        xx = x + 1,
        yy = y + 1;
    if (x === 0) {
        return y;
    }
    if (y === 0) {
        return x;
    }
    for (i = 0; i < xx; i += 1) {
        d[i] = [];
        d[i][0] = i;
    }
    for (j = 0; j < yy; j += 1) {
        d[0][j] = j;
    }
    for (i = 1; i < xx; i += 1) {
        for (j = 1; j < yy; j += 1) {
            ai = a.charAt(i - 1);
            bj = b.charAt(j - 1);
            if (ai === bj) {
                cost = 0;
            } else {
                cost = 1;
            }
            d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
            if (i > 1 && j > 1 && ai === b.charAt(j - 2) && a.charAt(i - 2) === bj) {
                d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + cost);
            }
        }
    }
    return d[x][y];
};
Looking up the length of the array at each interval of a multidimensional lookup is very costly. I also beautified your code using http://prettydiff.com/ so that I could read it in half the time. I also removed some redundant look ups in your arrays. Please let me know if this executes faster for you.

You should store all the words in a trie. This is space-efficient compared to a dictionary that stores every word in full. To match a word, you traverse the trie character by character until you reach a node that marks the end of a word.
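As a rough illustration of that structure, here is a minimal trie sketch. The helper names (createTrieNode, trieInsert, trieHas) and the node layout with a Map of children and an isWord flag are assumptions made for this example, not something taken from the question.

```javascript
// Each node holds its child nodes and a flag marking the end of a word.
function createTrieNode() {
  return { children: new Map(), isWord: false };
}

// Add a word by walking (and creating) one node per character.
function trieInsert(root, word) {
  let node = root;
  for (const ch of word) {
    if (!node.children.has(ch)) {
      node.children.set(ch, createTrieNode());
    }
    node = node.children.get(ch);
  }
  node.isWord = true;
}

// A word is present only if the walk succeeds AND ends on an end-of-word node.
function trieHas(root, word) {
  let node = root;
  for (const ch of word) {
    node = node.children.get(ch);
    if (!node) {
      return false;
    }
  }
  return node.isWord;
}

const root = createTrieNode();
trieInsert(root, 'part');
trieInsert(root, 'party');
console.log(trieHas(root, 'part'));  // true
console.log(trieHas(root, 'par'));   // false
console.log(trieHas(root, 'parts')); // false
```

Words that share a prefix, such as part and party, share the nodes for that prefix, which is where the space saving comes from.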
Edit
As I mentioned in my comment, for a Levenshtein distance of 0 or 1 you don't need to go through all the words. Two words have a Levenshtein distance of 0 if they are equal. So the problem boils down to predicting all the words that have a Levenshtein distance of 1 from a given word. Let's take an example:
array
For the above word if you want to find Levenshtein distance of 1, the examples will be
parray, aprray, arpray, arrpay, arrayp (Insertion of a character)
Here p can be substituted by any other letter.
Also for these words, Levenshtein distance is 1
rray, aray, arry (Deletion of a character)
And finally for these words:
prray, apray, arpay, arrpy and arrap (Substitution of a character)
Here again, p can be substituted with any other letter.
So if you look up for these particular combinations only and not all the words, you will get to your solution. If you know how a Levenshtein algorithm works, we have reverse engineered it.
A final example which is your usecase:
Suppose pary is the word you get as input, and it should be corrected to part from the dictionary. For pary you don't need to look at words starting with ab, for example, because any word starting with ab has a Levenshtein distance greater than 1 from it.
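A sketch of this "generate the candidates, then look them up" approach is shown below. It makes several assumptions that are not in the question: the dictionary is held in a Set, the alphabet is lowercase a to z, and the function names are invented. It also adds adjacent transpositions, since the question uses the Damerau variant of the distance.

```javascript
// Generate every string at edit distance 1 from `word`.
function editsAtDistanceOne(word, alphabet = 'abcdefghijklmnopqrstuvwxyz') {
  const results = new Set();
  for (let i = 0; i <= word.length; i++) {
    const head = word.slice(0, i);
    const tail = word.slice(i);
    for (const ch of alphabet) {
      results.add(head + ch + tail);            // insertion
      if (tail) {
        results.add(head + ch + tail.slice(1)); // substitution
      }
    }
    if (tail) {
      results.add(head + tail.slice(1));        // deletion
    }
    if (tail.length > 1) {
      results.add(head + tail[1] + tail[0] + tail.slice(2)); // transposition
    }
  }
  results.delete(word); // e.g. substituting a letter with itself
  return results;
}

// Keep the word itself (distance 0) and every generated candidate that exists.
function suggestFromSet(dictionary, word) {
  const suggestions = [];
  if (dictionary.has(word)) {
    suggestions.push(word);
  }
  for (const candidate of editsAtDistanceOne(word)) {
    if (dictionary.has(candidate)) {
      suggestions.push(candidate);
    }
  }
  return suggestions;
}

const dictionary = new Set(['part', 'party', 'pay', 'pray', 'abacus']);
console.log(suggestFromSet(dictionary, 'pary').sort());
// [ 'part', 'party', 'pay', 'pray' ]
```

The number of candidates depends only on the length of the input word and the alphabet size, not on the size of the dictionary, so each suggestion costs a few hundred Set lookups instead of 600,000 distance calculations.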
