I'm looking for basic search functionality in JavaScript.
The scenario: the user enters one or more words and hits a button. JavaScript then looks through an array of strings for items that probably relate to the entered search sentence.
The only function I'm aware of right now is "string.search", which returns the position of the string you are searching for. This is good, but not for all cases. Here are a few examples the search function should cover.
Let's assume I have the following string: "This is a good day" in my array. The following search terms should return true when tested against my search function.
Search Term 1: This a good day
Search Term 2: This day
Search Term 3: This was good
Search Term 4: good dy -the user made a typo-
So nothing particular or specific. Just basic search functionality that predicts (at a low level, and language agnostic) whether the search term is related to the strings in the tested array.
Was the last a typo for 'day'?
If not, you could simply split the search sentence, as well as the original string using the split() function.
Then you would iterate over the search words and for each make sure they appear in the source string. As soon as you don't find the word, you stop the search.
This is assuming that all the search words should be AND'ed, not OR'ed.
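The split-and-check idea above could look roughly like the sketch below. The function name matchesAllWords and the sample array are made up for illustration, and lowercasing both sides is an extra assumption so that "this" matches "This".

```javascript
// Returns true when every word of the search term appears as a whole word
// in the text (all words are AND'ed). Comparison is case-insensitive.
function matchesAllWords(text, searchTerm) {
  const toWords = (s) => s.toLowerCase().split(/\s+/).filter(Boolean);
  const textWords = new Set(toWords(text));
  // every() stops at the first search word that is not found
  return toWords(searchTerm).every((word) => textWords.has(word));
}

// Hypothetical data, only for demonstration
const items = ["This is a good day", "Another rainy night"];
const hits = items.filter((item) => matchesAllWords(item, "This day"));
console.log(hits); // ["This is a good day"]
```

With strict AND matching, "This was good" and "good dy" do not match, so this covers only the first two search terms from the question.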
Does that help?
I guess what you are looking for is a pattern matching based live search similar to finite-state-automata-like (FSA) searching:
This link shows an example that'll allow you to search case-insensitively:
Example: Array contains 'This is a good day'
Searching for any (or all) of the following is valid:
THis a Day
Thagd (Th is a g oo d day)
good dy -intended typo-
etc.
A case-sensitive (albeit not perfect FSA based) version can be found here. There is also one by John Resig. I don't have a link to his demo, but it'd be worth looking at: it's a JavaScript/jQuery port of the first link I mentioned.
Hope this helps!
This is not as simple as one would think. We're talking fuzzy matching and Levenshtein distance / algorithm.
See this past question:
Getting the closest string match
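For reference, here is a minimal sketch of Levenshtein distance in JavaScript, plus a small helper that uses it to tolerate typos. The helper name fuzzyIncludesWord and the default limit of one edit are assumptions for illustration, not something taken from the linked question.

```javascript
// Levenshtein distance: the minimum number of single-character insertions,
// deletions and substitutions needed to turn string a into string b.
function levenshtein(a, b) {
  // prev[j] = distance between the processed prefix of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i chars of a into an empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: true if some word in text is within maxEdits of word.
function fuzzyIncludesWord(text, word, maxEdits = 1) {
  const target = word.toLowerCase();
  return text
    .toLowerCase()
    .split(/\s+/)
    .some((w) => levenshtein(w, target) <= maxEdits);
}

console.log(levenshtein("dy", "day")); // 1
console.log(fuzzyIncludesWord("This is a good day", "dy")); // true
```

A fixed limit of one edit is probably too loose for very short words, so you may want to scale it with the word length.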
the language itself is not that important, but I'd figure I'd stick with Javascript.
Essentially, I have thousands of "comments" each month and would like to have a naive happiness 'evaluation' by automation based on searching 10,000 words within those comments (average word count of each comment is 21 words, taking everything so far).
The way the formula works (borrowed from Hedonometer) is to take the 'happiness' score of each word in the text (if found in the 10k list) and average them.
I'll test a few things and maybe edit the results back in here, but I'm not even sure where to begin. It seems like very heavy data lifting (though it only needs to be done once per comment, of course), and maybe it's better suited to R or SQL (likely not), but I'm not sure.
I believe this problem is sometimes referred to as 'bag of words' or 'term frequency saturation'.
You could create a hash table from your words like so (abbreviated) :
let wordRanks = {'hate':-100,'love':100,'ok':10};
Then have a string like this and split it into words.
let str = `I hate love it's just ok`;
let words = str.split(' ');
Then you can iterate through the words and get a score :
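let commentScore = 0;
words.forEach(function (word) {
  // Only words that have an entry in the hash table contribute to the score
  if (wordRanks[word]) {
    commentScore += wordRanks[word]; // values are already numbers, no parseInt needed
  }
});
console.log(commentScore); // should be 10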
Using a hash table shouldn't be computationally expensive for the lookup. This should work, although you may have to split the words better to remove trailing punctuation: I had a comma after love in my initial code, and it gave the wrong result because there was no hash table match for 'love,'.
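To get the average the question asks for (rather than a sum), and to sidestep the trailing punctuation problem, something like the following might work. The scores are invented for the example and are not real Hedonometer values. The tokenizer only keeps ASCII letters and apostrophes, which is an assumption that will not suit every language.

```javascript
// Hypothetical word scores, made up for this example
const wordScores = new Map([
  ["hate", 1],
  ["love", 9],
  ["ok", 5],
]);

// Averages the scores of the words that appear in the score table.
// Returns null when no word of the comment is in the table.
function averageHappiness(comment, scores) {
  // Lowercase, then keep runs of letters/apostrophes; this drops commas, periods, etc.
  const words = comment.toLowerCase().match(/[a-z']+/g) || [];
  const found = words.filter((w) => scores.has(w)).map((w) => scores.get(w));
  if (found.length === 0) return null;
  return found.reduce((sum, s) => sum + s, 0) / found.length;
}

console.log(averageHappiness("I hate it, love it, it's just ok.", wordScores)); // 5
```

Each word costs one Map lookup, so a few thousand comments of about 21 words each should be quick to process in plain JavaScript.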
I'd definitely go with Python's Natural Language Toolkit (NLTK). It comes with a set of functions that will make your life easier, such as text frequencies, removing duplicates, removing stop words, finding synonyms, etc. The idea is to reduce the size of your text as much as possible before doing the sentiment analysis.
In a similar project my approach was:
Remove neutral words, pronouns, prepositions, determiners, names, etc.
Remove duplicates.
Check for word synonyms as I progressed into the text and remove them from the rest of the text.
Dynamically create a sentiment threshold score for a paragraph, so once it reached that score I'd stop working on that paragraph and move on to the next one. I did the same for the text overall.
Hope this works!
I have been looking for a few hours how to do this particular regular expression magic with little to no luck.
I have been playing around with parsing some of my own medical data (why not?) which unfortunately comes in the form of a very unstructured text document with no tags (XML or HTML).
Specifically, as a prototype, I only want to match what my LDL delta (cholesterol change) is as a percentage.
In the form it shows up in a few different ways:
LDL change since last visit: 10%
or
LDL change since last visit:
10%
or
LDL change since last visit:
10%
I have been trying to do this in JavaScript using the native RegExp engine for a few hours (more than I want to admit) with little success. I am by no means a RegExp expert, but I have been looking at an expression like this:
(?<=LDL change since last visit)*(0*(100\.00|[0-9]?[0-9]\.[0-9]{0,2})%)
I know this does not work in JS because of the lack of support for ?<=. I tested these in Ruby, but even there they were not successful. Could anybody walk me through some ways of doing this?
EDIT:
Since this particular metric shows up a few times in different areas, I would like the regex to match them all and have them be accessible in multiple groups. Say matching group 0 corresponds to the Lipid Profile section and matching group 1 corresponds to the Summary.
Lipid profile
...
LDL change since last visit:
10%
...
Summary of Important Metrics
...
LDL change since last visit: 10%
...
A lookbehind solution is complicated because most languages only support fixed or finite length lookbehind assertions. Therefore it's easier to use a capturing group instead. (Also, the * quantifier after the lookbehind that you used makes no sense).
And since you don't really need to validate the number (right?), I would simply do
# Ruby: capture the number between the label and the percent sign
regexp = /LDL change since last visit:\s*([\d.]+)%/
match = regexp.match(subject)
if match
  match = match[1]
else
  match = nil
end
If you expect multiple matches per string, use .scan():
subject.scan(/LDL change since last visit:\s*([\d.]+)%/)
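Since the question asks about JavaScript, here is a rough equivalent of the Ruby snippets above, using the same pattern with the g flag and String.prototype.matchAll. The function name findLdlChanges and the sample report text are made up for illustration.

```javascript
// Same pattern as the Ruby version; the g flag is required by matchAll.
// \s* also matches line breaks, so the value may sit on the next line.
const ldlPattern = /LDL change since last visit:\s*([\d.]+)%/g;

// Returns every captured percentage in document order, as numbers.
function findLdlChanges(text) {
  return Array.from(text.matchAll(ldlPattern), (m) => parseFloat(m[1]));
}

// Hypothetical report text for demonstration
const report = [
  "Lipid profile",
  "LDL change since last visit:",
  "   10%",
  "Summary of Important Metrics",
  "LDL change since last visit: 12.5%",
].join("\n");

console.log(findLdlChanges(report)); // [10, 12.5]
```

The results come back in document order, so index 0 would be the Lipid profile value and index 1 the Summary value, provided the sections always appear in that order.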
Ok, this is a multipart concept. However I'm sure if I can figure this piece out, the rest will follow.
What I have is an array of words and phrases, and a textarea where people can type. What I want to do is search the array for matches or similarities to what the user is typing. The closest thing I can think of is an autocomplete function. But that's not entirely what I want: yes, in part what I want is autocomplete functionality, but there is so much more to it in the long run that an existing autocomplete is a bit bulky for my needs.
My aim is to trigger the search as they type, after the user hits the spacebar. Up to this point I am good. My issue is that my logic is flawed from here. I want to be able to take the entire body of text up to the point of hitting the spacebar and check it against my array of words and phrases, but I'm not sure how. Currently I split() the textarea's text with a space as the delimiter, but I realize now that that's not right. What I was thinking initially was to split it, check it against the other array, and it would be a happy day if something matched. Then I realized I have phrases: if I check word by word, I will never match a phrase.
Well, hopefully this makes sense. I need to walk through the logic on this. There really isn't any code currently, as I am not debugging; I am trying to figure out a logic that works, so I can move forward.
UPDATE:
Check this fiddle: http://jsfiddle.net/VwNHN/
You will need to tweak it to your requirements, but it will give a fair idea of how the below logic can be implemented.
Well, the logic upon keypress (probably any key and not just spacebar) can be something like:
1) Get your current cursor position - say X
Refer for example: http://demo.vishalon.net/getset.htm
2) Get N characters to the left of X, i.e. a substring of the whole text from index X-N to X, and store it in Y. You will need to settle on a value for N (for ex: 100). N should be the length of the longest word/phrase you are looking to match.
ex: if full text is "hello world i am a sentence", and cursor is at the end, and N is 10, Y would be "a sentence"
3) Split Y by space character and store each split in an array incrementally and then reverse it - lets call the array PHRASES
ex: if Y is "this is a sentence" - then PHRASES would be
[ "this is a sentence", "this is a", "this is", "this" ]
4) Check your array of words/phrases with each item in the PHRASES - the longest matching parts will come first and the short matching ones will come last - this set of matches is your auto-complete list.
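Steps 2 to 4 above could be sketched roughly as follows. This is not the code from the fiddle. The function names and the sample dictionary are invented for the example, and the cursor position from step 1 is simply passed in as a number.

```javascript
// Steps 2 and 3: take up to maxLength characters left of the cursor, split on
// whitespace, build the incremental phrases and return them longest first.
function candidatePhrases(text, cursor, maxLength) {
  const y = text.slice(Math.max(0, cursor - maxLength), cursor).trim();
  if (y === "") return [];
  const words = y.split(/\s+/);
  // "this", "this is", "this is a", ... then reversed
  const phrases = words.map((_, i) => words.slice(0, i + 1).join(" "));
  return phrases.reverse();
}

// Step 4: keep the candidates that exist in the dictionary (case-insensitive).
function findMatches(text, cursor, maxLength, dictionary) {
  const known = new Set(dictionary.map((p) => p.toLowerCase()));
  return candidatePhrases(text, cursor, maxLength).filter((p) =>
    known.has(p.toLowerCase())
  );
}

const text = "this is a sentence";
console.log(candidatePhrases(text, text.length, 100));
// ["this is a sentence", "this is a", "this is", "this"]
console.log(findMatches(text, text.length, 100, ["This is", "sentence", "this"]));
// ["this is", "this"]
```

Note that the window can start in the middle of a word, and that the phrases are anchored at the start of the window, as in the steps above. To match phrases that end at the cursor you would build the list from the other end.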
I would split the problem into at least three branches:
Search event triggered by the user
Search function
Visualization of results
If I understood what you're trying to implement, I would trigger a search on any 'onkeypress' event, unless your array is too big (otherwise it will hang on every key press).
Then, the search function: you have to search in an array, so I would search element by element. jQuery provides a nice jQuery.each() function. Also, I would consider _.each(list, iterator, [context]) from the Underscore library.
Visualization of results: it's not clear to me what you want to show (a grid, a table...?), but if every element of the array is associated with a different DOM object, then you could modify its properties at runtime, maybe with jQuery.
Let me know if you need more.
Is there a library available for Auto Suggest/Complete for cases like the following
Searching for "Vir" returns both "West Virginia" and "Virginia"
Thanks
EDIT
Sorry for not explaining it more. In the problem above, I do not want a "contains" search, but a prefix search on word boundaries. So "est" should not return "West Virginia", but "wes" or "vir" should.
The list is around 500 items large.
Proposed Solution
I modified the trie implementation by Mike de Boer https://github.com/mikedeboer/trie to solve this. I split an item on word boundaries and stored each word in the trie. For the last letter of each word I stored the index of the item that the word came from in the trie node. When user searches, I return a list of indices and then get the corresponding items from the main list.
What do you guys think?
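For anyone who wants to try the idea, here is a minimal self-contained sketch of such a word-level trie. It is not Mike de Boer's implementation or API. The class name WordTrie, the helper buildIndex and the sample list are all made up for illustration.

```javascript
// A tiny trie: each node keeps its children and the indices of the items
// that contain a word ending exactly at that node.
class WordTrie {
  constructor() {
    this.root = { children: new Map(), indices: new Set() };
  }

  insert(word, index) {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), indices: new Set() });
      }
      node = node.children.get(ch);
    }
    node.indices.add(index); // stored at the last letter of the word
  }

  // Returns the sorted indices of all items having a word that starts with prefix.
  search(prefix) {
    let node = this.root;
    for (const ch of prefix) {
      node = node.children.get(ch);
      if (!node) return [];
    }
    const found = new Set();
    const stack = [node];
    while (stack.length > 0) {
      const current = stack.pop();
      current.indices.forEach((i) => found.add(i));
      current.children.forEach((child) => stack.push(child));
    }
    return [...found].sort((a, b) => a - b);
  }
}

// Splits every item on non-word characters and indexes each word.
function buildIndex(items) {
  const trie = new WordTrie();
  items.forEach((item, index) => {
    item.toLowerCase().split(/\W+/).filter(Boolean)
      .forEach((word) => trie.insert(word, index));
  });
  return trie;
}

const states = ["Virginia", "West Virginia", "Vermont"]; // sample data
const stateTrie = buildIndex(states);
const suggest = (query) => stateTrie.search(query.toLowerCase()).map((i) => states[i]);

console.log(suggest("vir")); // ["Virginia", "West Virginia"]
console.log(suggest("est")); // []
```

For a list of around 500 items, a plain filter that checks whether any word of an item starts with the query would likely be fast enough too. The trie mostly pays off for larger lists.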
First, you should try to use Google or search previous questions before asking such a straightforward question.
To answer your question, you could use jQuery UI, which has, among many other widgets, one called Autocomplete.
If you're familiar with jQuery this should be pretty easy to implement.
http://jqueryui.com/demos/autocomplete/
you can use jquery autocomplete. ah, you've answered that already by yourself!
http://bit.ly/uXHRR0
Try this:
http://code.drewwilson.com/entry/autosuggest-jquery-plugin
I have two paragraphs of text: one is saved in a file, while the other is entered by a user trying to type the same paragraph. Now I want to compare the two and tell the user how efficiently he copied the paragraph. Are there any techniques for doing this?
I was thinking of these issues which make it complex.
What if the user spelled a word wrong
What if the user skipped a word in between
What if the user skipped two words and the rest of the text is same.
Do a diff on the input and the file. There is a JavaScript library for that here:
http://code.google.com/p/google-diff-match-patch/
It will tell you exactly what is different; you can then use this information to determine the efficiency of the copy.
You're looking for a friendly diff output. Try something like this:
Javascript Diff Algorithm
The sample should be simple enough:
var diff = diffString(
  "The red brown fox jumped over the rolling log.",
  "The brown spotted fox leaped over the rolling log"
);
Working example: http://jsbin.com/uhalo3
You can do this in 2 ways:
1) This one gives quite a precise report:
Measure the time the user took to write.
Use split to make an array of every word in your file, and do the same for the entered text.
Compare each word entered by the user with the word at the same position in your list, and also with the ones before and after it (you need to see whether he skipped a word; otherwise everything from that point on will go wrong).
Count the errors (you can use Levenshtein distance to count how many mistakes were in each word).
Give the report.
2) Use Levenshtein distance over the 2 strings (yes, treat all the text as a single string).
This one is much easier to use, but the report is not as precise.
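As a middle ground between the two approaches, one option is to run the same edit-distance algorithm over arrays of words instead of characters, so a skipped word counts as a single error and does not shift everything after it. This is a variation on the suggestions above, not exactly either of them. The function names and the scoring formula (1 minus errors divided by the longer word count) are assumptions for illustration.

```javascript
// Edit distance between two sequences (arrays of words here): the minimum
// number of insertions, deletions and substitutions of whole elements.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical score in [0, 1]: 1 means the typed text matches word for word.
function copyAccuracy(original, typed) {
  const toWords = (s) => s.trim().split(/\s+/).filter(Boolean);
  const a = toWords(original);
  const b = toWords(typed);
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1; // both texts are empty
  return 1 - editDistance(a, b) / longest;
}

console.log(copyAccuracy("the quick brown fox", "the quick fox"));       // 0.75 (one skipped word)
console.log(copyAccuracy("the quick brown fox", "the quikc brown fox")); // 0.75 (one misspelled word)
```

A misspelled word counts as one full error here. If you want partial credit for near misses, you could compare the mismatching words character by character as a second pass.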