Fastest way to search string for substring using jQuery - javascript

I am using DataTables plugin for jQuery. Within the DataTable I have approximately 16 tr rows with 4 td columns each. The DataTables plugin provides an API extension that allows searching for a string in all cells of the table or in all cells of a specified column.
The search extension returns an array of row indices where a match was found. For example, [3, 7, 10, 11]. The search extension originally supported an exact match search which I had to modify from:
if (val == sSearch)
to:
if (val.indexOf(sSearch) > 0)
My customization is certainly the cause of the performance issues I'm having, but it was necessary since the contents of the cells are updated dynamically and therefore unpredictable for performing an exact match search.
An example haystack:
<input id="_HeatOfRejection" class="form-control text-right text-box single-line" type="text" name="HeatOfRejection" measureid="HeatLoad" value="5000.0" uomid="MBH">
An example needle:
' measureid=\"HeatLoad\" '
The average time required to perform a search for needle is ~17.5ms and since the inner loop contains ~16 different needles with an outer loop causing additional loops of the inner loop, the processing time is too noticeable. It's not horrible, but can take 2-3 seconds. On that note, this is not a critical function.
What I want to know is whether there is a faster way than indexOf() to perform this search. Using a jQuery selector might be faster, but the id is unknown and unimportant to the search. Multiple controls can contain the needle, so I have to search the entire column.

To search for an element having an attribute with a specific value, you can use
$("[measureid='HeatLoad']");
It will return all the elements having attribute 'measureid' with value 'HeatLoad'.
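If the search has to stay string-based, note that indexOf() returns 0 for a match at the very start of the string and -1 when there is no match, so a "> 0" test silently skips leading matches. A minimal sketch, assuming the cell HTML is already available as an array of strings; the helper name findMatchingRows and the sample cells are hypothetical and not part of the DataTables API:

```javascript
// Hypothetical helper: return the indices of cells whose HTML contains the needle.
function findMatchingRows(cellHtml, needle) {
  const matches = [];
  for (let i = 0; i < cellHtml.length; i++) {
    // includes() is true for a match at any position, including index 0.
    if (cellHtml[i].includes(needle)) {
      matches.push(i);
    }
  }
  return matches;
}

// Made-up sample cells, loosely modelled on the haystack in the question.
const cells = [
  '<input name="HeatOfRejection" measureid="HeatLoad" value="5000.0">',
  '<input name="FlowRate" measureid="Flow" value="12.5">',
  'measureid="HeatLoad" at the very start of the string',
];
console.log(findMatchingRows(cells, 'measureid="HeatLoad"'));
```

The third sample cell is the case that a "> 0" comparison would miss.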

Related

Using Google sheets to tally data sets

I have tried many formulas but I am still not able to get what I want, so I need help writing an Apps Script for it. The problem is that I have to match two data sets and return the value of the adjacent cell. I want the sheet to take the value from the first cell of the first row of one sheet, compare it against every cell of a row in another sheet (in the same workbook), and then paste the matched value in front of the cell that matches it. My data sets are not identical, so I cannot use VLOOKUP; I want to compute how closely each pair matches as a percentage, and the highest percentage should be considered the match. Kindly visit this link for an example in Google Sheets. [https://docs.google.com/spreadsheets/d/1u_-64UvpirL2JHpgA--GDa263wVb2idIhIYZlFnX2xQ/edit?usp=sharing]
There are a variety of ways to do this sort of partial matching, depending on the real data and how sophisticated you need the match logic to be.
Let's start with the simplest solution first. Did you know you can use wildcards in VLOOKUP? See Vlookup in Google Sheets using wildcards for partial matches.
So for your example data, add a column C to "Set 1" with the formula:
=VLOOKUP("*" & A2 & "*",'Set 2'!A1:A5,1,FALSE)
Obviously, this method fails if "Baseball bat" was supposed to be the result for "Ball" instead of "Ballroom". VLOOKUP will simply return the first result that matches. This method is also case-insensitive. Finally, this method only works for appending data to set 1 from set 2, not the other way around. Without knowing more about the actual dataset, it's hard to give a solid solution.
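For the "highest percentage wins" matching the question asks about, one possible approach is a character-bigram similarity score (the Dice coefficient). This is a swapped-in technique rather than anything from the answer above, and the function names bigrams, similarityPercent and bestMatch are hypothetical. A minimal sketch in plain JavaScript, which Apps Script can also run:

```javascript
// Split a lower-cased string into overlapping two-character pieces.
function bigrams(text) {
  const s = String(text).toLowerCase();
  const out = [];
  for (let i = 0; i < s.length - 1; i++) {
    out.push(s.slice(i, i + 2));
  }
  return out;
}

// Dice coefficient as a percentage: 2 * shared bigrams / total bigrams.
function similarityPercent(a, b) {
  const left = bigrams(a);
  const right = bigrams(b);
  const total = left.length + right.length;
  if (left.length === 0 || right.length === 0) return 0;
  let shared = 0;
  for (const pair of left) {
    const at = right.indexOf(pair);
    if (at !== -1) {
      shared++;
      right.splice(at, 1); // each bigram may be matched only once
    }
  }
  return (200 * shared) / total;
}

// Return the candidate with the highest similarity to value.
function bestMatch(value, candidates) {
  let best = { match: null, percent: 0 };
  for (const candidate of candidates) {
    const percent = similarityPercent(value, candidate);
    if (percent > best.percent) best = { match: candidate, percent: percent };
  }
  return best;
}

console.log(bestMatch("Ball", ["Ballroom", "Baseball bat"]));
```

Unlike the wildcard VLOOKUP, this scores every candidate instead of stopping at the first hit, but whether bigram overlap is the right notion of "percentage" depends on the real data.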

What data Structure to use for storing and searching within huge JSON / Dict type object in Browser

I am creating a ReactJS app. The app has over 100,000 entities on screen, which I am plotting using WebGL. The properties of these entities are stored in a JSON/dict-type object. Whenever a user applies a filter, I need to go through the values, compare the properties, and select the IDs (type UUID4) of those not matching the filter, so that I can set their visibility to false in the WebGL container.
I am presently using an Array of the following type :-
spriteProps = [
  {id: "xxxx-...-xxxx", color: "Blue", Length: 10, points: 50},
  {id: "yyyy-...-yyyy", color: "Red", Length: 25, points: 112},
  .....
]
The user may want to see all entities which are Blue in color and have a length less than 100. So I have to iterate through each value and check which values match the filter.
However, this is very slow.
What is the best data structure to use in this situation to get the best performance? Is there any JS library I can use to improve performance?
Thanks.
https://www.ag-grid.com/react-getting-started/
Ag-grid is a good library that will help you implement what you are looking for.
I have used it with data that was an array of objects and the dataset was very large. It should fit your needs perfectly.
Sorting and searching works seamlessly. Your properties will be the column headings and you can sort and filter based on columns. Selecting rows and pinning specific rows is also possible.
Basically, in this use case you need to filter a large set of data.
Crossfilter is a very good option. Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser.
https://github.com/crossfilter/crossfilter
You can try the following binary search approach.
Choose a property that is likely to be used in the filtering criteria. I'm choosing length here. If the user applies filters that don't include length, fall back to simply iterating the array in sequence.
1. When the data is available, sort the array in ascending or descending order based on the length property.
2. When the user applies filters, perform a binary search to find the index above or below which all elements are within the given length.
3. Iterate over the section of the array containing elements within the given length and turn visibility off for elements with a different color property.
4. Then iterate over the other section of the array, containing elements greater than the given length, and turn visibility off for all of these elements.
We can see that we are visiting every element in the array. So, this approach isn't any better than visiting each element in the array sequentially.
If all the elements have visibility off at the beginning and if we have to turn visibility on for selected elements, then we can avoid visiting the second section of the array (point 4), and this binary search approach will be useful in such case.
But since it isn't the case, we have to visit every element in the array and therefore time complexity couldn't get better than linear time O(n).
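To make the steps concrete, here is a minimal sketch of the sort-then-binary-search idea for the filter "Blue and Length less than 100" from the question. The helper names lowerBound and idsToHide and the three sample sprites are hypothetical. As the answer says, every element is still visited, so this stays O(n):

```javascript
// Return the first index whose Length is >= limit (array sorted ascending by Length).
function lowerBound(sorted, limit) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid].Length < limit) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Collect the ids to hide for the filter "color === wanted && Length < maxLength".
function idsToHide(sorted, wanted, maxLength) {
  const cut = lowerBound(sorted, maxLength);
  const hidden = [];
  // Section within the length limit: hide only elements with the wrong color.
  for (let i = 0; i < cut; i++) {
    if (sorted[i].color !== wanted) hidden.push(sorted[i].id);
  }
  // Section at or above the limit: hide everything.
  for (let i = cut; i < sorted.length; i++) {
    hidden.push(sorted[i].id);
  }
  return hidden;
}

// Made-up data with placeholder ids; sorted once when the data arrives.
const sprites = [
  { id: "c", color: "Blue", Length: 150, points: 7 },
  { id: "a", color: "Blue", Length: 10, points: 50 },
  { id: "b", color: "Red", Length: 25, points: 112 },
].sort((x, y) => x.Length - y.Length);

console.log(idsToHide(sprites, "Blue", 100));
```

If all elements started hidden and only matches had to be shown, the second loop could be dropped, which is the case where the binary search pays off.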

Target search results number in variable

I'm trying to target the number of search results on our website for each search term so that I can see how many results each one pulls in.
I'm working off of this article, but I can't get the javascript function correct to pull out the number (which could be as high as 2000) and put it into a variable.
<div class="search-results-text"><strong>732 results</strong> found for ‘<strong>search term</strong>’</div>
Hoping someone can help me out with the javascript function that would grab that number before "results". Thanks!
You would probably get away with a custom Javascript variable like this:
function() {
return document.querySelector('.search-results-text strong').innerText.split(" ")[0];
}
The querySelector call with the CSS selector gets the element, innerText is its text without the markup, and split breaks the string up on spaces, which gives you an array. The first element of that array is your number (arrays are indexed starting at zero, so [0] refers to the first element).
This is not particularly elegant (for one you probably want to add some sort of error handling), and you could actually replace document.querySelector('.search-results-text strong').innerText with a DOM type variable in GTM (which by default returns the text of the element).
I don't think you can get the number with CSS selectors alone.
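The error handling mentioned above could look roughly like the following. This is only a sketch: the helper name parseResultCount is hypothetical, and it assumes the text always has the form "<number> results", optionally with thousands separators. It returns a number rather than a string, and undefined when the text does not match:

```javascript
// Hypothetical helper: pull the leading number out of text such as "732 results".
function parseResultCount(text) {
  // Accept optional thousands separators, e.g. "1,204 results".
  var match = /^\s*(\d[\d,]*)\s+results?\b/i.exec(text || "");
  if (!match) return undefined;
  return parseInt(match[1].replace(/,/g, ""), 10);
}

console.log(parseResultCount("732 results"));

// Inside a GTM Custom JavaScript variable this might be wrapped as:
//   function() {
//     var el = document.querySelector('.search-results-text strong');
//     return el ? parseResultCount(el.innerText) : undefined;
//   }
```

Checking that the element exists before reading innerText avoids a thrown error on pages that have no search results block.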

Google App Script to Append Value from one Cell to String of Numbers in another Cell

I’ve been trying to figure out how to write a script which will take the value from one cell and append it to the end of a string of numbers in another cell of that same row. The newly appended number needs to be separated by a comma from the previously appended value, and the whole string needs to be wrapped between brackets. EX. [2,3,3,4.5,2.5,2.1,1.3,0.4]. The script will need to loop through all of the rows containing data on a named sheet beginning with the third row.
The above image is obviously just an example containing only two rows of data. The actual spreadsheet will contain well over a thousand rows, so the operation must be done programmatically and will run weekly using a timed trigger.
To be as specific as I can, what I need help with is to first know if something like the appending is even possible in Google App Scripts. I've spent hours searching and I can't seem to find a way to append a new value (ex. cell A3) to the current string (ex. cell B3) without overwriting it completely.
In full disclosure; I'm a middle school teacher trying to put something together for my school.
To be as specific as I can, what I need help with is to first know if something like the appending is even possible in Google App Scripts.
Seeing the expected result, it's inserting rather than appending, as the string should be added before the last character (]). Anyway, yes, this is possible by using JavaScript string handling methods.
Use getValue() to get the cell values, both the Current GPA and the GPA History.
One way to build the new string is to use replace.
Example using pure JavaScript:
var currentGPA = 3.5;
var gpaHistory = '[2,3.1,2.4]';
// Insert the new value just before the closing bracket.
gpaHistory = gpaHistory.replace(']', ',' + currentGPA + ']');
console.info(gpaHistory); // [2,3.1,2.4,3.5]
Once you get the modified gpaHistory, use setValue(gpaHistory) to add this value to the spreadsheet.
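For well over a thousand rows, the same idea can be applied to a whole block of values at once. A minimal sketch, assuming each row is a [currentGPA, gpaHistory] pair in the shape that Range.getValues() returns; the function names appendToHistory and updateHistories are hypothetical:

```javascript
// Insert a value before the closing bracket of a history string like "[2,3.1,2.4]".
function appendToHistory(history, value) {
  var text = String(history).trim();
  // An empty cell or "[]" starts a new list instead of producing "[,3.5]".
  if (text === "" || text === "[]") return "[" + value + "]";
  // If the text does not end with "]", it is returned unchanged.
  return text.replace(/\]$/, "," + value + "]");
}

// rows: 2D array with [currentGPA, gpaHistory] per row.
// Returns a one-column 2D array holding the updated history strings.
function updateHistories(rows) {
  return rows.map(function (row) {
    return [appendToHistory(row[1], row[0])];
  });
}

console.log(updateHistories([[3.5, "[2,3.1,2.4]"], [2.1, ""]]));
```

In Apps Script, the rows might be read with something like sheet.getRange(3, 1, sheet.getLastRow() - 2, 2).getValues() and the result written back with sheet.getRange(3, 2, rows.length, 1).setValues(...), assuming the current value is in column A and the history in column B starting at row 3. Reading and writing in one batch is usually much faster than calling getValue() and setValue() per cell.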

How to search for closest tag set match in JavaScript?

I have a set of documents, each annotated with a set of tags, which may contain spaces. The user supplies a set of possibly misspelled tags and I want to find the documents with the highest number of matching tags (optionally weighted).
There are several thousand documents and tags but at most 100 tags per document.
I am looking for a lightweight and performant solution where the search runs entirely on the client side using JavaScript, but some preprocessing of the index with node.js is possible.
My idea is to create an inverted index of tags to documents using a multiset, and a fuzzy index that finds the correct spelling of a misspelled tag. Both are created in a preprocessing step in node.js and serialized as JSON files. In the search step, for each item of the query set I want to consult the fuzzy index first to get the most likely correct tag and, if one exists, consult the inverted index and add the result set to a bag (a counted set). After doing this for all input tags, the contents of the bag, sorted in descending order of count, should provide the best matching documents.
My Questions
This seems like a common problem, is there already an implementation for it that I can reuse? I looked at lunr.js and fuse.js but they seem to have a different focus.
Is this a sensible approach to the problem? Do you see any obvious improvements?
Is it better to keep the fuzzy step separate from the inverted index or is there a way to combine them?
You should be able to achieve what you want using Lunr, here is a simplified example (and a jsfiddle):
var documents = [{
  id: 1, tags: ["foo", "bar"],
},{
  id: 2, tags: ["hurp", "durp"]
}]
var idx = lunr(function (builder) {
  builder.ref('id')
  builder.field('tags')
  documents.forEach(function (doc) {
    builder.add(doc)
  })
})
console.log(idx.search("fob~1"))
console.log(idx.search("hurd~2"))
This takes advantage of a couple of features in Lunr:
If a document field is an array, then Lunr assumes the elements are already tokenised. This allows you to index tags that include spaces as-is, i.e. "foo bar" would be treated as a single tag (if this is what you wanted; it wasn't clear from the question).
Fuzzy search is supported, here using the query string format. The number after the tilde is the maximum edit distance, there is some more documentation that goes into the details.
The results will be sorted by which document best matches the query, in simple terms, documents that contain more matching tags will rank higher.
Is it better to keep the fuzzy step separate from the inverted index or is there a way to combine them?
As ever, it depends. Lunr maintains two data structures, an inverted index and a graph. The graph is used for doing the wildcard and fuzzy matching. It keeps separate data structures to facilitate storing extra information about a term in the inverted index that is unrelated to matching.
Depending on your use case, it would be possible to combine the two. An interesting approach would be a finite state transducer, so long as the data you want to store is simple, e.g. an integer (think document id). There is an excellent article about this data structure, which is similar to what is used in Lunr - http://blog.burntsushi.net/transducers/
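For comparison, the hand-rolled approach described in the question (an inverted index plus a bag of counts) can be sketched in a few lines of plain JavaScript. This is not how Lunr works internally. The function names buildInvertedIndex and rankByTags are hypothetical, and the fuzzy-correction step is left out, so the query tags are assumed to be spelled correctly already:

```javascript
// Build a map from each tag to the ids of the documents carrying it.
function buildInvertedIndex(documents) {
  const index = new Map();
  for (const doc of documents) {
    for (const tag of doc.tags) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(doc.id);
    }
  }
  return index;
}

// Count matching tags per document (the "bag") and sort by count, descending.
function rankByTags(index, queryTags) {
  const bag = new Map();
  for (const tag of queryTags) {
    for (const id of index.get(tag) || []) {
      bag.set(id, (bag.get(id) || 0) + 1);
    }
  }
  return [...bag.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, count]) => ({ id, count }));
}

const docs = [
  { id: 1, tags: ["foo", "bar"] },
  { id: 2, tags: ["hurp", "durp"] },
];
const tagIndex = buildInvertedIndex(docs);
console.log(rankByTags(tagIndex, ["foo", "bar", "durp"]));
```

Because the index is a plain Map of arrays, it could be converted to an object and serialized as JSON in the node.js preprocessing step the question mentions.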
