Improve string - prefix matching performance

I'm looking for a way to speed up my naive string matching process:
// Treat this as pseudo code
function find(input: string, prefixes: string[]) {
for (let i = 0; i < prefixes.length; i++) {
const prefix = prefixes[i];
if (input.startsWith(prefix)) {
return prefix;
}
}
return null;
}
const prefixes = [ "Hey", "Hi", "Hola", ... ];
const prefix = find("Hey, I'm Michael", prefixes);
I've looked into some probabilistic data structures like the Bloom filter, but I couldn't find one that fits my needs. That said, I don't actually need the prefix that matched, nor do I need a 100% guarantee that a match exists. I only need to know whether the input definitely does NOT start with any prefix, or whether it might.
I've also come across an article about burst tries, which as far as I could understand serve a similar purpose. Frankly, though, I'm not deep enough into algorithms to grasp the full implementation details and make sure this is what I'm looking for.
Side note:
I assume that 99.95% of the input this function will be getting is not going to match any prefix. Therefore I would like for this to be an optimization step to only process strings that will likely have a prefix I'm looking for.
Any help or suggestions would be very much appreciated :3

If the prefixes are known in advance and can be preprocessed, you might try a trie, especially if they are going to be as short as 10 characters. Each check would then take on the order of 10 comparisons. I'm not sure how much better one could do.
function buildTrie(trie, words){
for (let word of words){
let _trie = trie;
for (let i=0; i<word.length; i++){
const letter = word[i];
_trie[letter] = _trie[letter] || {};
if (i == word.length - 1)
_trie[letter]['leaf'] = true;
_trie = _trie[letter];
}
}
return trie;
}
function find(trie, str, i=0){
const letter = str[i];
if (!trie[letter])
return false;
if (trie[letter]['leaf'])
return true;
return find(trie[letter], str, i + 1);
}
const prefixes = [ "Hey", "Heya", "Hi", "Hola"];
const trie = buildTrie({}, prefixes)
console.log(trie)
console.log(find(trie, "Hey, I'm Michael")); // true
console.log(find(trie, "Heuy, I'm Michael")); // false
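The recursive lookup above can also be written as a loop. The following is only a minimal sketch, not part of the original answer: the names buildPrefixTrie, startsWithAny and END are made up for illustration, and nested Map objects stand in for the plain objects so that no input character can collide with the end marker.

```javascript
// Hypothetical names throughout; END marks "a stored prefix ends here".
const END = Symbol('end');

function buildPrefixTrie(prefixes) {
  const root = new Map();
  for (const prefix of prefixes) {
    let node = root;
    for (const ch of prefix) {
      if (!node.has(ch)) node.set(ch, new Map());
      node = node.get(ch);
    }
    node.set(END, true);
  }
  return root;
}

function startsWithAny(root, input) {
  let node = root;
  for (const ch of input) {
    if (node.has(END)) return true;       // a whole prefix has been consumed
    node = node.get(ch);
    if (node === undefined) return false; // no stored prefix continues with ch
  }
  return node.has(END); // input ran out: match only if it ended exactly on a prefix
}

const greetingTrie = buildPrefixTrie(["Hey", "Heya", "Hi", "Hola"]);
console.log(startsWithAny(greetingTrie, "Hey, I'm Michael"));  // true
console.log(startsWithAny(greetingTrie, "Heuy, I'm Michael")); // false
console.log(startsWithAny(greetingTrie, "He"));                // false
```

Since most inputs are expected not to match, the lookup usually stops after the first one or two characters.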

This has no logical difference from the answer by גלעד ברקן, but it displays working with a trie in a quite different code style. (It also uses $ instead of leaf as a terminator; a Symbol would be a good alternative.)
const trie = (words) =>
words .reduce (insertWord, {})
const insertWord = (trie, [c, ...cs]) =>
c ? {...trie, [c]: insertWord (trie [c] || {}, cs)} : {...trie, $: 1}
const hasPrefix = (trie) => ([c, ...cs]) =>
'$' in trie ? true : c ? c in trie && hasPrefix (trie [c]) (cs) : false
const testPrefixes = (prefixes) =>
hasPrefix (trie (prefixes))
const hasGreeting = testPrefixes (["Hey", "Hi", "Hola", "Howdy"])
console .log (hasGreeting ("Hey, I'm Michael"))
console .log (hasGreeting ("Hello, Michael. I'm Michelle"))
console .log (trie ((["Hey", "Hi", "Hola", "Howdy"])))
testPrefixes accepts a list of prefixes and returns a function that will report on whether a string starts with one of those prefixes. It does this by creating a trie and partially applying it to hasPrefix. Internally, the trie is built by folding insertWord over an initial empty object.
Of course this only makes sense if your use-case has prefixes that are reused for multiple calls. If not, I see little better than const testPrefixes = (prefixes) => (word) => prefixes .some ((pfx) => word .startsWith (pfx))

To search a string for many possible substrings at once, you can borrow the idea behind the Rabin-Karp algorithm.
In my program Banmoron I used this algorithm to pick out malicious requests by substring search. See the sources on GitHub.
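As a rough illustration of how the hashing idea could be applied to prefixes (this is a sketch under my own assumptions, not the Banmoron source): hash every prefix once, then extend a single running hash over the start of the input and look it up at each prefix length. All names, the base 31 and the modulus are arbitrary choices made for the example. A hit only means "might match", because hashes can collide; a miss means the input definitely starts with none of the prefixes.

```javascript
// Polynomial hash step. BASE and MOD are arbitrary illustrative choices,
// small enough that h * BASE + code stays below Number.MAX_SAFE_INTEGER.
const BASE = 31;
const MOD = 1000000007;
const hashStep = (h, code) => (h * BASE + code) % MOD;

// Preprocess: for every prefix length, remember the hashes seen at that length.
function buildPrefixHashes(prefixes) {
  const byLength = new Map(); // length -> Set of hashes
  let maxLength = 0;
  for (const prefix of prefixes) {
    let h = 0;
    for (let i = 0; i < prefix.length; i++) {
      h = hashStep(h, prefix.charCodeAt(i));
    }
    if (!byLength.has(prefix.length)) byLength.set(prefix.length, new Set());
    byLength.get(prefix.length).add(h);
    maxLength = Math.max(maxLength, prefix.length);
  }
  return { byLength, maxLength };
}

// Returns false when no prefix can match, true when one might (collisions possible).
function mightStartWithAny({ byLength, maxLength }, input) {
  if (byLength.has(0)) return true; // an empty prefix matches everything
  let h = 0;
  const limit = Math.min(maxLength, input.length);
  for (let i = 0; i < limit; i++) {
    h = hashStep(h, input.charCodeAt(i)); // extend the hash by one character
    const hashes = byLength.get(i + 1);
    if (hashes !== undefined && hashes.has(h)) return true;
  }
  return false;
}

const greetingHashes = buildPrefixHashes(["Hey", "Hi", "Hola"]);
console.log(mightStartWithAny(greetingHashes, "Hey, I'm Michael")); // true
console.log(mightStartWithAny(greetingHashes, "Good morning"));     // false
```

A true result would still need a real startsWith check if exact answers are required.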

Edit: It makes sense that startsWith is faster than indexOf, but I struggle to find benchmarks comparing the two. The one I did find depends on the browser, and Chrome runs indexOf faster than startsWith. I would love to know more about this. Below is the original answer:
I have taken to using .indexOf a lot recently for most of these cases. From what I have read, it performs better than most hand-written loops and is easy to use. Here is an example with two functions:
findOne: Returns the first prefix that is found (breaking out of the loop early, which can improve performance).
findAll: Loops through all prefixes and returns an array of the found prefixes.
If you are specifically looking for prefixes (that is, the index value is equal to 0), just change the condition to
input.indexOf(prefixes[i]) === 0
instead of
input.indexOf(prefixes[i]) >= 0
Here is the code snippet:
const exampleString = "Hello, I am Michael, bey👋";
const examplePrefixes = ["Hello", "Holla", "bey👋"];
function findOne(input, prefixes) {
// Loop through prefixes to test if is in the string
for (let i = 0; i < prefixes.length; i++) {
// If the prefix does not exist in the string it returns a value of -1
if (input.indexOf(prefixes[i]) >= 0) {
// Return the prefix value if it is found
return prefixes[i];
}
}
// Return null if nothing is found
return null
}
function findAll(input, prefixes) {
// Initialize return array
let retArr = [];
// Loop through prefixes to test if is in the string
for (let i = 0; i < prefixes.length; i++) {
// If the prefix does not exist in the string it returns a value of -1
if (input.indexOf(prefixes[i]) >= 0) {
// If the prefix exists, push it onto the return array
retArr.push(prefixes[i]);
}
}
// return the array after looping through each prefix
return retArr.length !==0 ? retArr : null
}
let value1 = findOne(exampleString, examplePrefixes);
let value2 = findAll(exampleString, examplePrefixes);
console.log(value1); // Hello
console.log(value2); // [ 'Hello', 'bey👋' ]

Related

Isomorphic Strings Function Always Returns True

I am attempting the Isomorphic Strings problem on LeetCode and am having issues with my current solution. I'm sure there are plenty of answers on exactly how to complete this problem, but I would really prefer to finish it through my own thought process before learning the best possible way to do it. For reference, here is the problem: https://leetcode.com/problems/isomorphic-strings/?envType=study-plan&id=level-1
This is my code as it is right now:
var isIsomorphic = function(s, t) {
const map = new Map();
const array1 = [...s];
const array2 = [...t];
for (i = 0; i < s.length; i++) {
if ((map.has(array1[i]) === true) && (map.has(array2[i]) === true)) {
if (map.get(array1[i]) !== array2[i]) {
return false;
} else {
continue;
}
} else if (map.has(array1[i]) === false) {
map.set(array1[i], array2[i]);
}
}
return true;
};
It's messy but I can't figure out why it isn't giving me the desired results. Right now, it seems to always return true for any given values, even though I have the initial if statement to return false if it ever comes across previously-mapped values that don't match. Am I missing something obvious? This is my first question on SA, so I apologize if the format is wrong.

The map is set like:
map.set(array1[i], array2[i]);
The key is the character in the first string, and the value is the corresponding character in the second string. So, when iterating over a new character, checking map.has only makes sense if the character being passed is from the first string. Doing map.has(array2[i]) === true does not test anything useful, because the second string's characters are not keys of the Map.
You need to perform two tests: that the 1st string's character corresponds to the 2nd string's character (which you're doing right), and that the 2nd string's character is not already set to a different 1st string's character (which needs to be fixed). For this second condition, consider having another Map that's the reverse of the first one - the keys are the characters from the 2nd string, and the values are the characters from the 1st string. (You don't have to have another Map - you could also iterate through the .entries of the first, check that, for every entry's value that matches the 2nd character, the entry's key matches the 1st - but that could feel a bit messy.)
Cleaning up your code some, there's also no need to turn the strings into arrays, and === true can be omitted entirely, and the i variable should be declared with let.
You also might want to check if the length of the first string is equal to the length of the second.
var isIsomorphic = function(s1, s2) {
if (s1.length !== s2.length) return false;
const map1to2 = new Map();
const map2to1 = new Map();
for (let i = 0; i < s1.length; i++) {
// Check that s1 here corresponds to s2
if (map1to2.has(s1[i]) && map1to2.get(s1[i]) !== s2[i]) {
return false;
}
// And that s2 here doesn't correspond to any other s1
if (map2to1.has(s2[i]) && map2to1.get(s2[i]) !== s1[i]) {
return false;
}
map1to2.set(s1[i], s2[i]);
map2to1.set(s2[i], s1[i]);
}
return true;
};
console.log(isIsomorphic('aa', 'bb')); // true
console.log(isIsomorphic('aa', 'ba')); // false
console.log(isIsomorphic('badc', 'baba')); // false
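The "you don't have to have another Map" remark can be sketched with one Map plus a Set of second-string characters that are already taken. This is a variation written for illustration, and isIsomorphicWithSet is a made-up name:

```javascript
// One Map (s1 char -> s2 char) plus a Set of s2 chars that are already mapped to.
function isIsomorphicWithSet(s1, s2) {
  if (s1.length !== s2.length) return false;
  const forward = new Map();
  const used = new Set();
  for (let i = 0; i < s1.length; i++) {
    const a = s1[i];
    const b = s2[i];
    if (forward.has(a)) {
      // a was seen before: it must map to the same character again
      if (forward.get(a) !== b) return false;
    } else {
      // a is new: b must not already belong to a different character
      if (used.has(b)) return false;
      forward.set(a, b);
      used.add(b);
    }
  }
  return true;
}

console.log(isIsomorphicWithSet('egg', 'add'));   // true
console.log(isIsomorphicWithSet('foo', 'bar'));   // false
console.log(isIsomorphicWithSet('badc', 'baba')); // false
```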

Filter array of strings, keeping only ones starting with vowels

I realise I've massively overengineered this, but as I'm just starting out with JS, I can't think of how to condense this into something not entirely ridiculous. I know I'm probably going to kick myself here, but can someone refactor this for me?
The aim was to create a new array from a provided one, one that only contained strings starting with vowels. It also needed to be case insensitive.
let results = []
for (let i = 0; i < strings.length; i++) {
if ((strings[i].startsWith('a')) || (strings[i].startsWith('A')) || (strings[i].startsWith('e')) || (strings[i].startsWith('E')) || (strings[i].startsWith('i')) || (strings[i].startsWith('I')) || (strings[i].startsWith('o')) || (strings[i].startsWith('O')) || (strings[i].startsWith('u')) || (strings[i].startsWith('U'))) {
results.push(strings[i])
}
}
return results

You can use a single RegExp and Array.prototype.filter() for that:
console.log([
'Foo',
'Bar',
'Abc',
'Lorem',
'Ipsum'
].filter(str => /^[aeiou]/i.test(str)));
Array.prototype.filter() returns a new array with all the elements that pass (return a truthy value) the predicate.
RegExp.prototype.test() returns true if the RegExp finds a match on the string you pass in.
Then, /^[aeiou]/i means:
^ matches the start of the string.
[aeiou] matches any of the characters inside the square brackets, a single time.
i is a case-insensitive modifier.

I'd use Array#filter and a regular expression:
let rex = /^[aeiou]/i;
let results = strings.filter(str => rex.test(str));
/^[aeiou]/i says "At the beginning of the string (^), match a, e, i, o, or u, case-insensitive (the i flag)."
Live Example:
let strings = [
"I'll match",
"But I won't",
"And I will",
"This is another one that won't",
"Useful match here"
];
let rex = /^[aeiou]/i;
let results = strings.filter(str => rex.test(str));
console.log(results);

Other answers are great, but please consider this approach shown below.
If you are new to JS, it will certainly help you understand cornerstones of JS like its array methods.
The map() method creates a new array with the results of calling a provided function on every element in the calling array.
var new_array = arr.map(function callback(currentValue, index, array) {
// Return element for new_array
}, thisArg)
Try using a REPL website like https://repl.it/ in order to see what these methods do...
The following is my proposed answer...
function onlyVowels(array) {
// For every element (word) in array we will...
return array.map((element) => {
// ...convert the word to an array with only characters...
return (element.split('').map((char) => {
// ...and map will only return those matching the regex expression
// (a || e || i || o || u)
// parameter /i makes it case insensitive
// parameter /g makes it global so it does not return after
// finding first match or "only one"
return char.match(/[aeiou]/ig)
// After getting an array with only the vowels the join function
// converts it to a string, thus returning the desired value
})).join('')
})
};
function test() {
var input = ['average', 'exceptional', 'amazing'];
var expected = ['aeae', 'eeioa', 'aai']
var actual = onlyVowels(input)
console.log(expected);
console.log(actual);
};
test()

Code Refactoring. Trying to improve my code

My code passed, no problem. But I would like your opinion on what I could have improved in it: unnecessary things, tips, better or faster ways to do the same thing. I'm open to any kind of feedback. Lately I'm trying to focus on how fast I can solve a problem, and this one took me almost 5 hours.
This code comes from the 2D Array HourGlass problem.
My thought process was to make up a model of what I wanted, then loop through the lines and rows with for loops, and that's how I came up with this result.
Also, I wanted to improve from thinking of WHAT the code should do, other than HOW. It's hard, but any tips I would really appreciate.
Since I'm coding only Front End stuff, my solving problems is literally shit.
Thanks !
function hourglassSum(arr) {
let newInput = arr
let arrAnswer = []
for(let line in newInput){
for (let row in newInput){
let newRow = parseInt(row)
let newLine = parseInt(line)
if(newLine < 4){
let a =newInput[newLine +0][newRow]
let b =newInput[newLine +0][newRow+1]
let c =newInput[newLine +0][newRow+2]
let d =newInput[newLine +1][newRow+1]
let e =newInput[newLine +2][newRow]
let f =newInput[newLine +2][newRow+1]
let g =newInput[newLine +2][newRow+2]
if(a,b,c,d,e,f,g == undefined){
break
}
arrAnswer.push([a,b,c,d,e,f,g].reduce((item1,item2)=> item1 + item2, 0))
}
}
}
let answer = arrAnswer.reduce((item1, item2) => (item1 > item2 ) ? item1: item2 )
return answer
}

if(a,b,c,d,e,f,g == undefined) Are you expecting this to check if any of your 7 values are undefined?
Based on the comma operator specs I believe it is only checking g == undefined.
The comma operator evaluates each of its operands (from left to right) and returns the value of the last operand.
If you really mean to check whether any of the values is undefined, here's one way you could do it
if([a,b,c,d,e,f,g].indexOf(undefined)>=0) ...
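A small demonstration of the difference, using three made-up values instead of seven:

```javascript
const a = 1, b = undefined, g = 7;

// The comma operator discards a and b; only `g == undefined` decides the result.
const commaCheck = (a, b, g == undefined);
console.log(commaCheck); // false, although b is undefined

// Checking every value explicitly:
const anyUndefined = [a, b, g].includes(undefined);
console.log(anyUndefined); // true
```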

Your code has a lot of redundancies:
let newInput = arr
Unnecessary.
let answer = arrAnswer.reduce((...
Stuffing it in a var is unnecessary, since you just return it on the next line.
As far as I can tell, your entire code can be changed to the following:
const hourglassSum = input => {
  return input
    .flatMap((row, i) => { // NEVER use for..in with arrays. Use the array methods or for..of
      return row.map((_, j) => {
        // Use boolean short-circuiting, we'll filter later.
        // Your original code could throw here
        // if a row access resulted in undefined.
        const intermediate = input[i + 1] !== undefined &&
          input[i + 2] !== undefined &&
          [
            input[i][j],
            input[i][j + 1],
            input[i][j + 2],
            input[i + 1][j + 1],
            input[i + 2][j],
            input[i + 2][j + 1],
            input[i + 2][j + 2],
          ];
        // boolean again, check to make sure we don't pollute the
        // arithmetic, then sum the seven cells to an int
        return intermediate &&
          intermediate.every(x => x !== undefined) &&
          intermediate.reduce((a, b) => a + b, 0);
      });
    })
    .filter(x => x !== false) // weed out falses (but keep sums of 0)
    .reduce((a, b) => Math.max(a, b)); // Math.max replaces ternary
};
This is arguably more readable, definitely less error prone, slightly shorter, and makes better use of built-ins like Math.max and the array methods. It is also consistent rather than mixing functional style with loops. One thing it isn't is faster, but you make it correct first, then fast.
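For comparison, here is a plain nested-loop sketch of the same computation. It is not from the answer above: hourglassSumLoops is a made-up name, and the code assumes a rectangular grid of numbers that is at least 3x3. Bounding the loops up front means no cell access can come back undefined.

```javascript
function hourglassSumLoops(grid) {
  let best = -Infinity;
  // r and c are the top-left corner of each 3x3 window that fits in the grid
  for (let r = 0; r + 2 < grid.length; r++) {
    for (let c = 0; c + 2 < grid[r].length; c++) {
      const sum =
        grid[r][c] + grid[r][c + 1] + grid[r][c + 2] +
        grid[r + 1][c + 1] +
        grid[r + 2][c] + grid[r + 2][c + 1] + grid[r + 2][c + 2];
      if (sum > best) best = sum;
    }
  }
  return best;
}

const sampleGrid = [
  [1, 1, 1, 0, 0, 0],
  [0, 1, 0, 0, 0, 0],
  [1, 1, 1, 0, 0, 0],
  [0, 0, 2, 4, 4, 0],
  [0, 0, 0, 2, 0, 0],
  [0, 0, 1, 2, 4, 0],
];
console.log(hourglassSumLoops(sampleGrid)); // 19
```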

Simple way to force javascript to always return an array

I stumbled upon the YQL API to query for WOEIDs for use in Twitter, but I can see the output is not always an array. The API returns an object, and I'm interested in the value of response.query.results, which returns the following:
if there are no results, it returns null
if there is only one result, it returns an object
if there are multiple results, it returns an array
I want the result to always be an array. I can solve this by checking the result using the following code:
var count = response.query.count;
if(count === 0) {
return [];
} else if(count === 1) {
var arr = [];
arr.push(response.query.results);
return arr;
} else {
return response.query.results;
}
Is there a javascript or lodash function that can simplify the above code? It seems _.forEach and _.toArray will treat each property as an object if provided with a single object.

You could use Array#concat with a default array if response.query.results is falsy.
return [].concat(response.query.results || []);
If response.query.results can hold a falsy value such as zero, you could use the nullish coalescing operator ?? instead of logical OR ||. It keeps every value except undefined and null.
return [].concat(response.query.results ?? []);
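Wrapped in a small helper (toArray is a made-up name), the three cases from the question behave like this:

```javascript
// null/undefined -> [], a single value -> [value], an array -> a shallow copy of it
const toArray = (value) => [].concat(value ?? []);

console.log(toArray(null));               // []
console.log(toArray({ a: 1 }));           // [ { a: 1 } ]
console.log(toArray([{ a: 1 }, { a: 2 }])); // [ { a: 1 }, { a: 2 } ]
console.log(toArray(0));                  // [ 0 ]
```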

https://lodash.com/docs/4.17.4#concat
_.concat([],response.query.results);
would also do it.
but as @Brian pointed out, we need to handle null being equivalent to [], so you can add
_.concat([],_.isNull(response.query.results)?[]:response.query.results);
note that this is more correct because it will work for results with falsey values (like 0 and false etc)
in general, lodash is more robust than built in javascript. this usually works in your favour. one place this can trip you up is if results was a string (which is an array of characters)
https://github.com/lodash/lodash/blob/4.17.4/lodash.js#L6928
function concat() {
var length = arguments.length;
if (!length) {
return [];
}
var args = Array(length - 1),
array = arguments[0],
index = length;
while (index--) {
args[index - 1] = arguments[index];
}
return arrayPush(isArray(array) ? copyArray(array) : [array], baseFlatten(args, 1));
}

Similar to Tom's answer above, the lodash function castArray (https://lodash.com/docs/4.17.15#castArray), introduced in Lodash v4.4, could also work for this. It has the marginal benefit that its intent is slightly clearer than [].concat(x)
const _ = require('lodash')
_.castArray(null) // [null]
_.castArray({a:1}) // [{a:1}]
_.castArray([{a:1},{a:2}]) // [{a:1},{a:2}]
To deal with the null, considerations are similar to answers above, depending on how you want to handle unexpected values. A ternary with _.isNull would work, or else ?? is useful, e.g.:
const castArrayRemovingNullUndef = x => _.castArray(x ?? [])
const castArrayRemovingNull = x => _.castArray(_.isNull(x) ? [] :x)
castArrayRemovingNull(null) // []
castArrayRemovingNull({a:1}) // [{a:1}]
castArrayRemovingNull([{a:1},{a:2}]) // [{a:1},{a:2}]

regex to extract array indices

I'm still having a hard time to understand regex... :-/
Given strings (JavaScript-like expressions) like these...
foo[0]
foo[4][2]
foo[4][2][234523][3]
...I'm trying to deconstruct the indices in regex, so that I have
the name of the variable: foo
the single indices: for example 4, 2, 234523 and 3 in the last example
while not accepting invalid syntax like
foo[23]bar[55]
foo[123]bar
[123]bla
foo[urrrr]
It would be nice to also ignore whitespace like foo [13] or foo[ 123 ] but that's not important.
Is that possible with regex?
I was able to extract the brackets with var matches = s.match(/\[([0-9]?)\]/g); but that includes the brackets in the result, is missing the variable name (could get around that) and also does not respect the edge cases as described above.

You'll have to use loops to extract multiple matches. Here's one way:
function run(string) {
var match;
if(match = string.match(/^([^[]+)\s*(\[\s*(\d+)\s*\]\s*)+\s*$/)) {
var variable = match[1], indices = [];
var re = /\[\s*(\d+)\s*\]/g;
while(match = re.exec(string)) {
indices.push(+match[1]);
}
return { variable: variable, indices: indices };
} else {
return null;
}
}
var strings = [
"foo[0]",
"foo[4][2]",
"foo[4][2][234523][3]",
"foo [13]",
"foo[ 123 ]",
"foo[1] [2]",
"foo$;bar%[1]",
// The following are invalid
"foo[23]bar[55]",
"foo[123]bar",
"[123]bla",
"foo[urrrr]",
];
// Demo
strings.forEach(function(string) {
document.write("<pre>" + JSON.stringify(run(string), null, 4) + "</pre>");
});
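With String.prototype.matchAll (ES2020) the explicit exec loop can be folded into one expression. The following is a sketch, not the answer's code: parseIndexed is a made-up name, and it validates identifiers more strictly than the pattern above (letters, digits, _ and $ only, not starting with a digit).

```javascript
function parseIndexed(expr) {
  // Step 1: validate the whole expression and split it into name and index part.
  const whole = /^\s*([A-Za-z_$][\w$]*)\s*((?:\[\s*\d+\s*\]\s*)+)$/.exec(expr);
  if (whole === null) return null;
  // Step 2: collect every index from the bracketed part.
  const indices = Array.from(
    whole[2].matchAll(/\[\s*(\d+)\s*\]/g),
    (m) => Number(m[1])
  );
  return { name: whole[1], indices };
}

console.log(parseIndexed("foo[4][2][234523][3]")); // { name: 'foo', indices: [ 4, 2, 234523, 3 ] }
console.log(parseIndexed("foo[ 123 ]"));           // { name: 'foo', indices: [ 123 ] }
console.log(parseIndexed("foo[23]bar[55]"));       // null
```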

That is not possible.
You can test whether it is a correct statement, and as long as you know how many indices you have you can select them, but there is no way to capture a group multiple times with JavaScript's .exec: a repeated capture group only keeps its last match.
However the language is regular. So it would be this:
^([a-zA-Z][a-zA-Z_0-9]*)(\[[0-9]*\])*
The first group will match the variable, and the second group (with the * quantifier, 0-n times) the index.
So if you want to do this I recommend to use another parsing approach:
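function parse(str) {
  let idx = 0;
  // Advance to the last character of the name (stop at the end if there is no '[')
  while (idx + 1 < str.length && str[idx + 1] != '[') {
    idx++;
  }
  let name = str.slice(0, idx + 1);
  let indices = [];
  while (str[idx + 1] == '[') {
    idx++;
    let startIdx = idx;
    // Advance to the character just before the closing ']'
    while (idx + 1 < str.length && str[idx + 1] != ']') {
      idx++;
    }
    indices.push(str.slice(startIdx + 1, idx + 1));
    idx++;
  }
  return {name, indices};
}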

Here is a small ES6 version of the two-step regular expression approach to get the desired array:
function interpret(s) {
return (/^(\w+)\s*((?:\[\s*\d+\s*\]\s*)*)$/.exec(s) || [,null]).slice(1).reduce(
(fun, args) => [fun].concat(args.match(/\d+/g)));
}
var s = 'foo[4][2][234523][3]';
var result = interpret(s);
console.log(result);
It first gets the 2 main parts via exec(), which returns the complete match, the function name and the rest in an array (with 3 elements). Then with slice(1) it ignores the first of those three. The two others are passed to reduce.
The reduce callback will only be called once, since there is no initial value provided.
This is convenient, as it actually means the callback gets the two parts as its two arguments. It applies the second regular expression to split the index string, and returns the final array.
The || [,null] will take care of the case when the original match fails: it ensures that reduce acts on [null] and thus will return null.
