optimize search through large js string array? - javascript

If I have a large JavaScript string array with over 10,000 elements,
how do I search through it quickly?
Right now I have a JavaScript string array that stores job descriptions,
and I'm allowing the user to dynamically filter the returned list as they type into an input box.
So say I have a string array like so:
var descArr = ["flipping burgers", "pumping gas", "delivering mail"];
and the user wants to search for: "p"
How would I be able to search a string array that has 10000+ descriptions in it quickly?
Obviously I can't sort the description array since they're descriptions, so binary search is out. And since the user can search by "p" or "pi" or any combination of letters, this partial search means that I can't use associative arrays (i.e. searchDescArray["pumping gas"] )
to speed up the search.
Any ideas anyone?

Since regular expression engines in current browsers have become very fast, how about doing it that way? Instead of an array, pass one gigantic string and separate the entries with an identifier.
Example:
String "flipping burgers""pumping gas""delivering mail"
Regex: "([^"]*ping[^"]*)"
With the /g (global) flag you get all the matches. Make sure the user does not search for your string separator.
You can even add an id into the string with something like:
String "11 flipping burgers""12 pumping gas""13 delivering mail"
Regex: "(\d+) ([^"]*ping[^"]*)"
Example: http://jsfiddle.net/RnabN/4/ (30000 strings, limit results to 100)
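A minimal sketch of the joined-string idea above, in case the fiddle is unavailable. The helper names (buildHaystack, escapeRegExp, searchHaystack) are made up for illustration, and the code assumes the double quote never occurs inside a description:

```javascript
// Hypothetical helpers; assumes '"' never appears inside a description.
function buildHaystack(descriptions) {
  // Wrap every description in separators: "a""b""c"
  return descriptions.map(function (d) { return '"' + d + '"'; }).join('');
}

function escapeRegExp(text) {
  // Escape regex metacharacters so user input is matched literally.
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function searchHaystack(haystack, query, limit) {
  // Drop the separator from the query, as the answer recommends.
  query = query.split('"').join('');
  if (query === '') return [];
  var pattern = new RegExp('"([^"]*' + escapeRegExp(query) + '[^"]*)"', 'gi');
  var results = [];
  var match;
  // exec() with the g flag walks through the matches one by one.
  while (results.length < limit && (match = pattern.exec(haystack)) !== null) {
    results.push(match[1]);
  }
  return results;
}

var haystack = buildHaystack(['flipping burgers', 'pumping gas', 'delivering mail']);
console.log(searchHaystack(haystack, 'ping', 100)); // ['flipping burgers', 'pumping gas']
```

Stopping at a result limit keeps the work bounded when the user has typed only one or two letters.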

There's no way to speed up an initial array lookup without making some changes. You can speed up consecutive lookups by caching results and mapping them to patterns dynamically.
1.) Adjust your data format. This makes initial lookups somewhat speedier. Basically, you precache.
var data = {
a : ['Ant farm', 'Ant massage parlor'],
b : ['Bat farm', 'Bat massage parlor']
// etc
}
2.) Setup cache mechanics.
var searchFor = function(str, list, caseSensitive, reduce){
str = str.replace(/(?:^\s*|\s*$)/g, ''); // trim whitespace
str = str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
var found = [];
// no 'g' flag: a global regex keeps lastIndex between test() calls
var reg = new RegExp('^\\s?' + str, caseSensitive ? '' : 'i');
var i = list.length;
while(i--){
if(reg.test(list[i])){
found.push(list[i]);
reduce && list.splice(i, 1); // remove only the matched entry
}
}
return found;
}
var lookUp = function(str, caseSensitive){
str = str.replace(/(?:^\s*|\s*$)/g, ''); // trim whitespace
if(!str) return [];
if(data[str]) return data[str]; // cached (or precached) result
var firstChar = caseSensitive ? str[0] : str[0].toLowerCase();
var list = data[firstChar];
if(!list) return (data[str] = []);
// we cache on data since it's already a caching object.
return (data[str] = searchFor(str, list, caseSensitive));
}
3.) Use the following script to create a precache object. I suggest you run this once and use JSON.stringify to create a static cache object. (or do this on the backend)
// we need the searchFor function from above; this might take a while
var preCache = function(arr){
var chars = "abcdefghijklmnopqrstuvwxyz".split('');
var cache = {};
var i = chars.length;
while(i--){
// reduce is true, so we're destroying the original list here.
cache[chars[i]] = searchFor(chars[i], arr, false, true);
}
return cache;
}
Probably a bit more code than you expected, but optimization and performance don't come for free.

This may not be an answer for you, as I'm making some assumptions about your setup, but if you have server side code and a database, you'd be far better off making an AJAX call back to get the cut down list of results, and using a database to do the filtering (as they're very good at this sort of thing).
As well as the database benefit, you'd also benefit from not outputting this much data (10000 variables) to a web based front end - if you only return those you require, then you'll save a fair bit of bandwidth.

I can't reproduce the problem. I created a naive implementation, and most browsers do the search across 10000 15-char strings in a single-digit number of milliseconds. I can't test in IE6, but I wouldn't expect it to be more than 100 times slower than the fastest browsers, which would still be virtually instant.
Try it yourself: http://ebusiness.hopto.org/test/stacktest8.htm (Note that the creation time is not relevant to the issue, that is just there to get some data to work on.)
One thing you could be doing wrong is trying to render all results; that would be quite a huge job when the user has only entered a single letter, or a common letter combination.
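A rough sketch of such a naive search with a cap on the number of results handed to the renderer. The function name and the cap of 100 are assumptions for illustration, not taken from the linked test page:

```javascript
// Hypothetical helper: linear scan that stops once maxResults matches are found.
function filterDescriptions(descriptions, query, maxResults) {
  var needle = query.toLowerCase();
  var results = [];
  for (var i = 0; i < descriptions.length && results.length < maxResults; i++) {
    // Case-insensitive substring match anywhere in the description.
    if (descriptions[i].toLowerCase().indexOf(needle) !== -1) {
      results.push(descriptions[i]);
    }
  }
  return results;
}

// Made-up test data: 10000 short strings.
var demoData = [];
for (var n = 0; n < 10000; n++) {
  demoData.push('job ' + n);
}

var visible = filterDescriptions(demoData, 'job 99', 100);
console.log(visible.length); // 100, even though more entries match
```

Only the capped list needs to be turned into DOM nodes, which is usually the expensive part.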

I suggest trying a ready made JS function, for example the autocomplete from jQuery. It's fast and it has many options to configure.
Check out the jQuery autocomplete demo

Using a Set for large datasets (1M+) is around 3500 times faster than Array .includes()
You must use a Set if you want speed.
I just wrote a node script that needs to look up a string in a 1.3M array.
Using Array's .includes for 10K lookups:
39.27 seconds
Using Set .has for 10K lookups:
0.01084 seconds
Use a Set.
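A small sketch of the two calls being compared, with made-up data. Note that Set.prototype.has only answers exact-match lookups, so on its own it does not cover the partial matching ("p", "pi") asked about in the question:

```javascript
var descriptions = ['flipping burgers', 'pumping gas', 'delivering mail'];
var descriptionSet = new Set(descriptions);

// Array.prototype.includes scans the array element by element.
console.log(descriptions.includes('pumping gas')); // true

// Set.prototype.has uses a hash-based lookup instead of a scan.
console.log(descriptionSet.has('pumping gas')); // true

// Neither call does substring matching.
console.log(descriptionSet.has('pump')); // false
```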

Related

CSV string to array when there is \n in body [duplicate]

This question already has answers here:
How to parse CSV data that contains newlines in field using JavaScript
(2 answers)
Closed 10 months ago.
I'm trying to convert a CSV string into an array of arrays or an array of objects. The issue is that there are a bunch of \n in the body of the incoming request, which cause the rows to split in the wrong places and break the code. I'm trying to make the conversion work even with \n in the body.
The string looks like this; all the message fields in the incoming request start with \" and end with \".
"id,urn,title,body,risk,s.0.id,s.1.id,s.2.id,a.0.id,a.1.id,a.2.id,a.3.id
302,25,\"Secure Data\",\"Banking can save a lot of time but it’s not without risks. Scammers treat your bank account as a golden target –
it can be a quick and untraceable way to get money from you\n\n**TOP TIPS**\n\n**Always read your banks rules.** These tips don’t replace your banks rules - \
in fact we fully support them. If you don’t follow their rules, you may not get your money back if you are defrauded \n\n**Saving passwords or allowing auto-complete.**
Saving passwords in your browser is great for remembering them but if a hacker is able to access your computer, they will also have access to your passwords.
When on your banking site the password box we recommend you don’t enable the auto-complete function – a hacked device means they are able to gain access using this method \n\n**Use a
PIN number on your device.** It’s really important to lock your device when you’re not using it.\",,2,20,52,1,2,3,4"
I have shortened it since there is a lot of content, but the incoming string is basically the above. The big string which is messing up my code starts at Banking can save and ends at not using it. I have several other records with the same type of body, and it always comes inside \" body \". I have been trying to write a function that separates the content of this CSV string into an array of arrays or an array of objects.
This is what I attempted:
function csv_To_Array(str, delimiter = ",") {
const header_cols = str.slice(0, str.indexOf("\n")).split(delimiter);
const row_data = str.slice(str.indexOf("\n") + 1).split("\n");
const arr = row_data.map(function (row) {
const values = row.split(delimiter);
const el = header_cols.reduce(function (object, header, index) {
object[header] = values[index];
return object;
}, {});
return el;
});
// return the array
return arr;
}
I have also thought about using a regex, where I would split on a comma or a \n, although if there is a \" it would split when it finds the next \":
array.split(/,/\n(?!\d)/))
Try this:
csvData = csvData.replace(/(\r\n|\n|\r)/gm, ""); // replace() returns a new string, so assign the result
Once you've used that to replace the new lines, or removed them, this code will help you get started with understanding how to build an array from the new CSV string:
const splitTheArrayAndLogIt = () => {
const everySingleCharacter = csvData.split(""); // <-- this is a new array
console.log(everySingleCharacter);
const splitAtCommas = csvData.split(",");
console.log(splitAtCommas);
}
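Stripping every line break also removes the breaks between rows, so an alternative is to walk the string character by character and only treat a newline as a row separator when it is outside a quoted field. The following is a minimal sketch under that assumption (fields wrapped in double quotes, "" for a literal quote); parseCsv and csvToObjects are hypothetical names, and this is not a complete CSV implementation:

```javascript
// Hypothetical quote-aware parser: returns an array of rows, each an array of fields.
function parseCsv(text, delimiter) {
  delimiter = delimiter || ',';
  var rows = [];
  var row = [];
  var field = '';
  var inQuotes = false;

  for (var i = 0; i < text.length; i++) {
    var ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') { field += '"'; i++; } // doubled quote
      else if (ch === '"') { inQuotes = false; }                    // closing quote
      else { field += ch; }                                         // newlines are kept
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === delimiter) {
      row.push(field); field = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++; // treat \r\n as one line break
      row.push(field); field = '';
      rows.push(row); row = [];
    } else {
      field += ch;
    }
  }
  // Flush the last row when the text does not end with a line break.
  if (field !== '' || row.length > 0) { row.push(field); rows.push(row); }
  return rows;
}

// Uses the first row as header names, like csv_To_Array in the question.
function csvToObjects(text, delimiter) {
  var rows = parseCsv(text, delimiter);
  var headers = rows[0];
  return rows.slice(1).map(function (values) {
    var obj = {};
    headers.forEach(function (header, index) { obj[header] = values[index]; });
    return obj;
  });
}

var sample = 'id,title,body\n302,"Secure Data","line one\n\n**TOP TIPS**, with a comma"\n';
console.log(csvToObjects(sample));
```

With the sample above, the body field keeps its embedded newlines and comma instead of splitting the row.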

Searching a key in javascript object

I am building a Chrome extension and have the following data:
var data = {}
data["Five, ok"] = "another one"
chrome.storage.sync.set(data)
chrome.storage.sync.get(function(content){console.log(content)})
>> {'Five, ok': "Another one ", 'ok, done': "New one"}
This can grow bigger with many values. (Each key is a comma-separated value.)
I want to get all keys which include either of these (2 different cases; these are user-given values):
1. ok
2. done
These values are dynamic. What is the best way to achieve this in JavaScript/jQuery?
chrome.storage.sync.get(function(content) {
var keys = Object.keys(content);
var keysOK = keys.filter(function(key){ return /\b(ok|done)\b/i.test(key); }); // search() returns -1 (truthy) when nothing matches, so use test()
console.log(keysOK);
});
/\b(ok|done)\b/i finds the keys containing either ok or done
/\bok\b/i finds the keys containing ok
/\bdone\b/i finds the keys containing done
The i at the end makes the search case-insensitive.
As @charlietfl commented, it's not efficient. However, chrome.storage.sync doesn't allow more than about 100 kB of data anyway, so it's probably not an issue.
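Since the search terms are dynamic, the pattern can be built at runtime instead of being hard-coded. A small sketch, assuming the terms arrive as an array of strings; keysContaining and escapeRegExp are hypothetical helper names, and the sample object stands in for what chrome.storage.sync.get would pass to its callback:

```javascript
function escapeRegExp(text) {
  // Escape regex metacharacters so user input is matched literally.
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns the keys of obj that contain any of the terms as a whole word.
function keysContaining(obj, terms) {
  if (terms.length === 0) return [];
  var pattern = new RegExp('\\b(' + terms.map(escapeRegExp).join('|') + ')\\b', 'i');
  return Object.keys(obj).filter(function (key) { return pattern.test(key); });
}

var content = { 'Five, ok': 'Another one', 'ok, done': 'New one', 'seven': 'x' };
console.log(keysContaining(content, ['done'])); // ['ok, done']
console.log(keysContaining(content, ['ok']));   // ['Five, ok', 'ok, done']
```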

producing a word from a string in javascript

I have a string which is name=noazet difficulty=easy and I want to produce the two words noazet and easy. How can I do this in JavaScript?
I tried var s = word.split("=");
but it doesn't give me what I want.
In this case, you can do it with that split:
var s = "name=noazet difficulty=easy";
var arr = s.split('=');
var name = arr[1].split(' ')[0]; //= "noazet"
var easy = arr[2]; //= "easy"
here, s.split('=') returns an array:
["name","noazet difficulty","easy"]
You can try the following code:
word.split(' ').map(function(part){return part.split('=')[1];});
It will return an array of two elements, the first of which is the name ("noazet") and the second the difficulty ("easy"):
["noazet", "easy"]
word.split("=") will give you an array of strings created by cutting the input at each "=" character, in your case:
results = ["name", "noazet", "difficulty", "easy"]
If you want to access noazet and easy, these are at indices 1 and 3, i.e.
results[1] //which is "noazet"
(EDIT: if you have a space in your input, as just appeared in your edit, then you need to split by a space first - " ")
Based on your data structure, I'd expect the desired data to be always available in the odd numbered indices - but first of all I'd advise using a different data representation. Where is this string word coming from, user input?
Just as an aside, a better idea than making an array out of your input might be to map it into an object. For example:
var s = "name=noazet difficulty=easy";
var obj = s.split(" ").reduce(function(c,n) {
var a = n.split("=");
c[a[0]] = a[1];
return c;
}, {});
This will give you an object that looks like this:
{
name: "noazet",
difficulty: "easy"
}
Which makes getting the right values really easy:
var difficulty = obj.difficulty; // or obj["difficulty"];
And this is more robust since you don't need to hard code array indexes or worry about what happens if you set an input string where the keys are reversed, for example:
var s = "difficulty=easy name=noazet";
Will produce an equivalent object, but would break your code if you hard coded array indexes.
You may be able to get away with splitting it twice: first on spaces, then on equals signs. This would be one way to do that:
function parsePairs(s) {
return s.split(' ').reduce(
function (dict, pair) {
var parts = pair.split('=');
dict[parts[0]] = parts.slice(1).join('=');
return dict;
},
{}
);
}
This gets you an object with keys equal to the first part of each pair (before the =), and values equal to the second part of each pair (after the =). If a string has multiple equal signs, only the first one is used to obtain the key; the rest become part of the value. For your example, it returns {"name":"noazet", "difficulty":"easy"}. From there, getting the values is easy.
The magic happens in the Array.prototype.reduce callback. We've used String.prototype.split to get each name=value pair already, so we split that on equal signs. The first string from the split becomes the key, and then we join the rest of the parts with an = sign. That way, everything after the first = gets included in the value; if we didn't do that, then an = in the value would get cut off, as would everything after it.
Depending on the browsers you need to support, you may have to polyfill Array.prototype.reduce, but polyfills for that are everywhere.

Sending an array with JavaScript to the next page

The combination of my methods of declaring an array, adding elements to the array and applying the method toString() does not work. Essentially I enter a certain number (between one and five) of values into the text fields fontVorto1, fontVorto2, fontVorto3 ……… in the HTML part of the document.
When I decide to leave the remaining text fields empty, I click on a button to assign them to an array, by way of the following function:
function difinNombroFv () {
var fontVortoj = new array();
fontVortoj[0] = document.getElementsByName("fontVorto1")[0].value;
fontVortoj[1] = document.getElementsByName("fontVorto2")[0].value;
fontVortoj[2] = document.getElementsByName("fontVorto3")[0].value;
……………….
and put them together in a string:
x = fontVortoj.toString();
document.getElementsByName("fontVorto")[0].value = x;
(the extra variable x is not needed) to enable me sending them to the next document, where I want to unserialize them with
$fontVortoj = unserialize($_POST["fontVorto"]);
I tested the method toString() by inserting an alert(x), but the result was that x held the value of "fontVorto1" only.
I met solutions with JSON, jQuery etc., but I never used those "languages", only HTML, JavaScript, PHP.
Will my Christmas day be spoiled because of this simple problem ;>)?
A couple of things to note:
1. In var fontVortoj = new array(); the constructor name is wrong. JavaScript is case-sensitive, so new array() throws a ReferenceError. It should be:
var fontVortoj = new Array();
2. If you now call fontVortoj.toString(), it will convert the array and return a string with the array elements separated by commas.
3. You can rebuild the array from the string in PHP by using the explode function (unserialize expects PHP's own serialization format, not a comma-separated string).
4. You can rebuild the array from the string in JavaScript by using the split function.
Apparently I misunderstood the question to begin with.
To serialize an array, you can use .join()
By default, it will give you the values, joined by commas.
To deserialize, use .split()
If there's a chance that there might be commas in your values, choose a more elaborate string for joining:
var ar = ["a", "b"];
var serialized = ar.join("|"); // "a|b"
var deserialized = serialized.split("|"); //["a", "b"]
The string that you use for joining and splitting can be as long as you like.
If you want to be completely covered against any values, then you need to look at JSON.stringify() & JSON.parse(). Note that very old browsers (such as IE7 and earlier) lack native JSON support and need a polyfill.
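A short sketch of the JSON round trip, with made-up values that contain both a comma and a pipe to show that no separator has to be reserved. On the PHP side the matching call would be json_decode($_POST["fontVorto"]) rather than unserialize:

```javascript
// Made-up sample values; in the question these come from the text inputs.
var fontVortoj = ['unu, du', 'tri|kvar', 'kvin'];

// Serialize for the hidden form field.
var serialized = JSON.stringify(fontVortoj);
console.log(serialized); // ["unu, du","tri|kvar","kvin"]

// Deserialize again (JavaScript side).
var restored = JSON.parse(serialized);
console.log(restored.length); // 3
```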

How do I match this text faster?

I'm building an autosuggest for names. When the user types in the textbox, it hits the server and runs this:
var names = [ /* list of 1000 names */ ]; //I have a list of 1000 names, this is static.
var query = 'alex';
var matched_names = [];
//This is when it gets slow....
names.forEach(function(name){
if(name.indexOf(query) >= 0){
matched_names.push(name);
}
});
return matched_names;
How can I make this faster? I'm using Node.js
If the names are static then move this code to the client and run it there. The only reason to run code like this on the server is if the data source is dynamic in some way.
Doing this logic client-side will dramatically improve performance.
You should probably use filter instead, for one thing, because it's native:
var names = [ /* list of 1000 names */ ];
var query = 'alex';
var matched_names = names.filter(function(name) {
return name.indexOf(query) > -1;
});
return matched_names;
If you store the names in sorted order, then you can use binary search to find the region of names within the sorted order that start with the fragment of name that the user has typed so far, instead of checking all the names one by one.
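A minimal sketch of that prefix lookup, assuming the names are already lowercased and sorted with the default Array.prototype.sort order; lowerBound and namesWithPrefix are hypothetical helper names. It only finds names that start with the query, not names that contain it somewhere in the middle:

```javascript
// Returns the first index in sortedNames whose value is >= target.
function lowerBound(sortedNames, target) {
  var lo = 0;
  var hi = sortedNames.length;
  while (lo < hi) {
    var mid = (lo + hi) >>> 1;
    if (sortedNames[mid] < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// All names sharing a prefix sit next to each other in sorted order,
// so find the start by binary search and walk forward while the prefix matches.
function namesWithPrefix(sortedNames, prefix) {
  var start = lowerBound(sortedNames, prefix);
  var end = start;
  while (end < sortedNames.length && sortedNames[end].indexOf(prefix) === 0) {
    end++;
  }
  return sortedNames.slice(start, end);
}

var sortedNames = ['zoe', 'alex', 'bob', 'alexander', 'alan', 'alexa'].sort();
console.log(namesWithPrefix(sortedNames, 'alex')); // ['alex', 'alexa', 'alexander']
```

The sort happens once up front; each keystroke then costs a binary search plus one step per matching name.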
On a system with a rather odd programming language, where I wanted to find all matches containing what the user had typed so far in any position, I got a satisfactory result for not much implementation effort by reviving http://en.wikipedia.org/wiki/Key_Word_in_Context. (Once at university I searched through a physical KWIC index, printed out from an IBM lineprinter, and then bound as a document for just this purpose.)
I would suggest doing this on the client side and preferring (for now) a while loop over a filter/forEach approach:
var names = [ /* list of 1000 names */ ]
, query = 'alex'
, i = names.length
, matched_names = [];
while(i--){
if(names[i].indexOf(query) > -1){
matched_names.push(names[i]);
}
}
return matched_names;
In the linked benchmark this was faster than filter/forEach, even though those are natively supported: http://jsperf.com/function-loops/4
