Ok, this is a multipart concept. However I'm sure if I can figure this piece out, the rest will follow.
What I have is an array of words and phrases, and a textarea that people can type in. I want to search the array for matches or similarities to what the user is typing. The closest thing I can think of is an autocomplete function, but that's not entirely what I want. Part of what I want is autocomplete functionality, but in the end I need much more, and an existing autocomplete is a bit bulky for my needs.
My aim is to trigger the search as the user types, each time they hit the spacebar. Up to this point I am good; my logic is flawed from here. I want to take the entire body of text up to the point of hitting the spacebar and check it against my array of words and phrases, but I'm not sure how. Currently I split() the textarea's value with a space as the delimiter, but I realize now that that's not right. My initial thought was to split it, check each piece against the other array, and call it a happy day if something matched. Then I realized that I have phrases too, and a single split-off word will never match a whole phrase.
Hopefully this makes sense. I need to walk through the logic on this. There really isn't any code yet, as I am not debugging; I am trying to figure out logic that works so I can move forward.
UPDATE:
Check this fiddle: http://jsfiddle.net/VwNHN/
You will need to tweak it to your requirements, but it will give a fair idea of how the below logic can be implemented.
Well, the logic upon keypress (probably any key and not just spacebar) can be something like:
1) Get your current cursor position - say X
Refer for example: http://demo.vishalon.net/getset.htm
2) Get N characters to the left of X, i.e. a substring of the whole text from index X-N to X, and store it in Y. You will need to settle on a value for N (e.g. 100); N should be the length of the longest word/phrase you are looking to match.
ex: if full text is "hello world i am a sentence", and cursor is at the end, and N is 10, Y would be "a sentence"
3) Split Y by the space character, store each split in an array incrementally, and then reverse it - let's call the array PHRASES
ex: if Y is "this is a sentence" - then PHRASES would be
[ "this is a sentence", "this is a", "this is", "this" ]
4) Check your array of words/phrases with each item in the PHRASES - the longest matching parts will come first and the short matching ones will come last - this set of matches is your auto-complete list.
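Steps 2 to 4 could be sketched roughly like this. It is only a sketch under a few assumptions: buildPhrases and findMatches are made-up names, "matching" is taken to mean a case-insensitive prefix match, and the cursor position would come from something like textarea.selectionStart in the browser.

```javascript
// Hypothetical helpers illustrating steps 2-4; names and matching rule are assumptions.

// Steps 2 and 3: take up to n characters left of the cursor, then build
// the incremental phrases, longest first.
function buildPhrases(text, cursor, n) {
  const y = text.slice(Math.max(0, cursor - n), cursor).trim();
  if (y === '') return [];
  const words = y.split(/\s+/);
  const phrases = [];
  for (let i = 1; i <= words.length; i++) {
    phrases.push(words.slice(0, i).join(' '));
  }
  return phrases.reverse();
}

// Step 4: collect dictionary entries that start with one of the phrases.
// Because the longest phrase is checked first, its matches come first.
function findMatches(phrases, dictionary) {
  const matches = [];
  for (const phrase of phrases) {
    const needle = phrase.toLowerCase();
    for (const entry of dictionary) {
      if (entry.toLowerCase().startsWith(needle) && !matches.includes(entry)) {
        matches.push(entry);
      }
    }
  }
  return matches;
}

const phrases = buildPhrases('this is a sentence', 18, 100);
console.log(phrases);
console.log(findMatches(phrases, ['this is', 'this is a test', 'that']));
```

In a page you would call buildPhrases(textarea.value, textarea.selectionStart, N) from the key handler and feed the result into findMatches.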
I would split the problem into at least three parts:
search event triggered by the user
search function
visualization of results
If I understood what you're trying to implement, I would trigger a search on every 'onkeypress' event, unless your array is too big (in which case it will hang on every keypress).
Then, the search function: you have to search in an array, so I would search element by element. jQuery provides a nice jQuery.each() function. Also, I would consider _.each(list, iterator, [context]) from the Underscore library.
Visualization of results: it's not clear to me what you want to show (a grid, a table...?), but if every element of the array is associated with a different DOM object, then you could modify its properties at runtime, maybe with jQuery.
Let me know if you need more.
I'm working on this simple, straightforward text content filtering mechanism on our post commenting module where people are prohibited from writing foul, expletive words.
So far I'm able to compare comment contents (word by word, using .includes()) against the blacklisted words we have in the database. But to save the space, time and effort of entering a database entry for each variant, such as 'Fucking' and 'Fuck', I want to create a mechanism that checks whether a word contains a blacklisted word.
This way, we just enter 'Fuck' in the database. When a visitor's comment contains 'Fucking' or 'Motherfucker', the function will automatically detect that there is a word in the comment that contains 'fuck' and then perform the necessary actions.
I've been thinking of integrating .substring() but I guess that's not what I need.
Btw, I'm using React (in case you know of any built-in functions). As much as possible, I want to avoid using libraries for this mechanism.
Thanks a heap!
"handover".indexOf("hand")
It returns the index of the match if it exists, otherwise -1.
To ignore case, define all your blacklisted words in lower case and then use this:
"HANDOVER".toLowerCase().indexOf("hand")
To detect whether a string has another string inside it, you can simply use the .includes() method. It does not work on a word-by-word basis but checks for a sequence of characters, so it should meet your requirements. It returns a boolean indicating whether the string is inside the other string.
var sentence = 'Stackoverflow';
console.log(sentence.includes("flow"));
You were on the right track with .includes()
console.log('handover'.includes('hand'));
Returns true
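Putting the answers above together, a minimal sketch of the blacklist check might look like this. findBlacklisted is a made-up name and the list entries are placeholders; in your case the list would come from the database.

```javascript
// Hypothetical helper: returns the blacklisted entries that occur anywhere
// inside the comment (substring match, case-insensitive).
function findBlacklisted(comment, blacklist) {
  const text = comment.toLowerCase();
  return blacklist.filter((word) => text.includes(word.toLowerCase()));
}

const blacklist = ['hand', 'foo']; // placeholder entries
console.log(findBlacklisted('Please HANDOVER the keys', blacklist));
```

Keep in mind that plain substring matching will also flag harmless words that happen to contain a blacklisted sequence, so short entries need some care.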
I am wondering if it would be possible to disable only the stop word filtering in the MongoDB text search. Sometimes I just want to search for words like "you", "I", "was", etc. I would still like to take advantage of the stemming. Just not the stop word filtering.
db.collection.find({$text: {$search: "you"}})
The above would not return any results.
But a traditional approach like
db.collection.find({shortDescription: new RegExp(".*you.*",'i')}) would give me what I want.
So, how can I use the text search but still be able to search for these words (stop words)?
You can disable stop words by changing the language value of your text index when you create it. From the MongoDB documentation:
If you specify a language value of "none", then the text search uses simple tokenization with no list of stop words and no stemming [source].
So create your index using:
db.collection.createIndex(
{ content : "text" },
{ default_language: "none" }
)
[code source]
When you insert text into a text-indexed field, the index values are created after the text has been filtered.
So when you search for a stop word, it is not present in the list of index values. That's the reason a text search is never going to find stop words. It's by design and probably not configurable. You have to use a regex for such criteria; as far as I know, there is no other way.
Since you want the stemming, I assume there will never be just stop words, but always at least one "normal" word too. On top of that, I hope you know exactly which stop words you want.
If that's the case, I suggest putting the stop words in quotes. As the docs say, if there are phrases present, "the search performs a logical AND of the phrase with the individual terms in the search string." And thankfully, it appears that stop words are not stripped from phrases.
For example, assume a collection with the following documents:
{"text": "I love blueberries"},
{"text": "She loves blueberries"},
{"text": "She loved the last blueberry most."}
Searching for blueberry, blueberry I or blueberries she returns all three documents each time. But searching for blueberries "she" only returns the last two documents, i.e. stemming is considered and the existence of the stop word is enforced.
Sadly, this won't work if you're searching for just stop words, i.e. searching for "she" will return nothing. Also, you can't OR several stop words: If you add "and me" to each of the first two documents so that they become "I love blueberries and me" and "She loves blueberries and me" respectively, searching for blueberry "she" "me" will only return the second document.
However, beware of extremely short stop words that may be part of other words: On my test database, searching for blueberry "I" returned both the first and second documents -- I assume due to the i in "blueberries".
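If you go the quoting route, building the $search string could look something like the sketch below. The STOP_WORDS set and buildSearchString are made up for illustration; you would fill the set with the stop words you actually want to enforce.

```javascript
// Hypothetical stop-word list; replace with the words you care about.
const STOP_WORDS = new Set(['she', 'i', 'me', 'you', 'was']);

// Wraps every stop word in double quotes so that it is treated as a phrase,
// and leaves the other terms alone so that they are still stemmed.
function buildSearchString(input) {
  return input
    .split(/\s+/)
    .filter(Boolean)
    .map((term) => (STOP_WORDS.has(term.toLowerCase()) ? `"${term}"` : term))
    .join(' ');
}

// The resulting object can be passed to db.collection.find().
const query = { $text: { $search: buildSearchString('blueberries she') } };
console.log(query.$text.$search); // blueberries "she"
```

The caveats above still apply: quoted stop words are ANDed together, and a search made up only of stop words will return nothing.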
The language itself is not that important, but I figure I'd stick with JavaScript.
Essentially, I have thousands of "comments" each month and would like to have a naive happiness 'evaluation' by automation based on searching 10,000 words within those comments (average word count of each comment is 21 words, taking everything so far).
The way the formula works (borrowed from Hedonometer) -- is take the 'happiness' score of each word in the text (if found in the 10k list) and average it.
I'll test a few things and maybe edit back in the results here, but I'm not even sure where to begin. Seems like very heavy data lifting (Though only needs to be done once per comment of course) -- and maybe it's better suited to R or SQL (likely not), but not sure.
I believe this problem is sometimes referred to as 'bag of words' or 'term frequency saturation'.
You could create a hash table from your words like so (abbreviated) :
let wordRanks = {'hate':-100,'love':100,'ok':10};
Then have a string like this and split it into words.
let str = `I hate love it's just ok`;
let words = str.split(' ');
Then you can iterate through the words and get a score :
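let commentScore = 0;
words.forEach(function (word) {
  // Only words present in the hash table contribute to the score.
  if (wordRanks[word]) {
    // The scores are already numbers, so no parseInt is needed.
    commentScore += wordRanks[word];
  }
});
console.log(commentScore); // 10 for the sample string (-100 + 100 + 10)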
A hash table lookup shouldn't be computationally expensive, so this should work. You may have to split the words more carefully to remove trailing punctuation, though: I had a comma after love in my initial code, and it gave the wrong result because there was no hash table match for 'love,'.
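To deal with the punctuation and get the average the question asks for (rather than a sum), something along these lines might do. It is a sketch: averageHappiness is a made-up name, the scores are the abbreviated ones from above, and the tokenizer only handles unaccented Latin letters and apostrophes.

```javascript
const wordRanks = { hate: -100, love: 100, ok: 10 }; // abbreviated sample scores

// Averages the scores of the words that appear in the lookup table.
// Returns null when no word of the comment is in the table.
function averageHappiness(comment, ranks) {
  // Assumption: a "word" is a run of letters/apostrophes; punctuation is dropped.
  const words = comment.toLowerCase().match(/[a-z']+/g) || [];
  let total = 0;
  let matched = 0;
  for (const word of words) {
    // hasOwnProperty avoids false hits on inherited keys such as "constructor".
    if (Object.prototype.hasOwnProperty.call(ranks, word)) {
      total += ranks[word];
      matched += 1;
    }
  }
  return matched === 0 ? null : total / matched;
}

console.log(averageHappiness("I hate it, love, it's just ok.", wordRanks));
```

Each comment costs one lookup per word, so thousands of short comments a month against a 10,000-word table should not be heavy lifting.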
I'd definitely go with Python's Natural Language Toolkit (NLTK). It comes with a set of functions that will make your life easier, such as computing text frequencies, removing duplicates, removing stop words, finding synonyms, etc. The idea is to reduce the size of your text as much as possible before doing the sentiment analysis.
In a similar project my approach was:
Remove neutral words, pronouns, prepositions, determiners, names, etc.
Remove duplicates.
Check for word synonyms as I progressed into the text and remove them from the rest of the text.
Dynamically create a sentiment threshold score for a paragraph, so that once it reached that score I'd stop working on that paragraph and move on to the next one; the same goes for the text overall.
Hope this works!
I'm looking for a basic search functionality with JavaScript.
The scenario: the user enters one or more words and hits a button. JavaScript looks through an array of strings for items that probably relate to the entered search sentence.
The only function I'm aware of right now is "string.search", which returns the position of the string you are searching for. This is good, but not for all cases. Here are a few examples the search function should cover.
Let's assume I have the following string: "This is a good day" in my array. The following search terms should return true when tested against my search function.
Search Term 1: This a good day
Search Term 2: This day
Search Term 3: This was good
Search Term 4: good dy -the user made a typo-
So nothing particular or specific. Just basic search functionality that predicts (at a low level, and language-agnostically) whether the search term is related to the strings in the tested array.
Was the last a typo for 'day'?
If not, you could simply split the search sentence, as well as the original string using the split() function.
Then you would iterate over the search words and for each make sure they appear in the source string. As soon as you don't find the word, you stop the search.
This is assuming that all the search words should be AND'ed, not OR'ed.
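A rough sketch of that split-and-check approach, assuming case should be ignored (matchesAllWords is a made-up name):

```javascript
// Returns true only if every word of the search term occurs as a whole word
// in the source string (the search words are AND'ed).
function matchesAllWords(searchTerm, source) {
  const sourceWords = new Set(source.toLowerCase().split(/\s+/).filter(Boolean));
  const searchWords = searchTerm.toLowerCase().split(/\s+/).filter(Boolean);
  // every() stops at the first search word that is missing.
  return searchWords.length > 0 && searchWords.every((word) => sourceWords.has(word));
}

const source = 'This is a good day';
console.log(matchesAllWords('This a good day', source)); // true
console.log(matchesAllWords('This day', source)); // true
console.log(matchesAllWords('This was good', source)); // false
console.log(matchesAllWords('good dy', source)); // false
```

Note that search terms 3 and 4 from the question come back false here, because 'was' and 'dy' are not in the source string. Covering those needs OR'ed words or some kind of fuzzy matching.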
Does that help?
I guess what you are looking for is a pattern matching based live search similar to finite-state-automata-like (FSA) searching:
This link shows an example that'll allow you to search case-insensitively:
Example: Array contains 'This is a good day'
Searching for any (or all) of the following is valid:
THis a Day
Thagd (Th is a g oo d day)
good dy -intended typo-
etc.
A case-sensitive (albeit not perfectly FSA-based) version can be found here. There is also one by John Resig; I don't have a link to his demo, but it'd be worth looking at. It's a JavaScript/jQuery port of the first link I mentioned.
Hope this helps!
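One simple way to read the examples above is as a subsequence match: every character of the search term has to appear in the target in the same order, with anything allowed in between. The sketch below implements that reading. It is an assumption on my part, not the code behind the links, and isSubsequence is a made-up name.

```javascript
// Case-insensitive subsequence check: walks the target once and advances
// through the query whenever the next query character is found.
function isSubsequence(query, target) {
  const q = query.toLowerCase();
  const t = target.toLowerCase();
  let i = 0;
  for (let j = 0; j < t.length && i < q.length; j++) {
    if (t[j] === q[i]) i++;
  }
  return i === q.length;
}

const target = 'This is a good day';
console.log(isSubsequence('THis a Day', target)); // true
console.log(isSubsequence('Thagd', target)); // true
console.log(isSubsequence('good dy', target)); // true
console.log(isSubsequence('This was good', target)); // false
```

This handles dropped characters, like the 'good dy' typo, but not wrong or extra ones, and an empty query matches everything.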
This is not as simple as one would think. We're talking fuzzy matching and Levenshtein distance / algorithm.
See this past question:
Getting the closest string match
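For reference, a compact Levenshtein distance in JavaScript could look like the following. The closestMatch helper is a made-up name, and it simply picks the candidate with the smallest distance; how large a distance still counts as "related" is something you would have to tune.

```javascript
// Levenshtein distance: minimum number of single-character insertions,
// deletions and substitutions needed to turn a into b.
function levenshtein(a, b) {
  // prev holds the distances for the previous row of the DP table.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: returns the candidate closest to the search term.
function closestMatch(term, candidates) {
  let best = null;
  let bestDistance = Infinity;
  for (const candidate of candidates) {
    const d = levenshtein(term.toLowerCase(), candidate.toLowerCase());
    if (d < bestDistance) {
      best = candidate;
      bestDistance = d;
    }
  }
  return { match: best, distance: bestDistance };
}

console.log(levenshtein('good dy', 'good day')); // 1
console.log(closestMatch('good dy', ['bad night', 'good day']));
```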
I have two paragraphs of text: one is saved in a file, while the other is entered by a user who is trying to type the same paragraph. Now I want to compare the two and tell the user how accurately they copied it. Any techniques on how to do it?
I was thinking of these issues which make it complex.
What if the user spelled a word wrong
What if the user skipped a word in between
What if the user skipped two words and the rest of the text is same.
Do a diff on the input and the file. There is a JavaScript library for that here:
http://code.google.com/p/google-diff-match-patch/
It will tell you exactly what is different, and you can then use this information to determine how accurate the copy is.
You're looking for a friendly diff output. Try something like this:
Javascript Diff Algorithm
The sample should be simple enough:
var diff = diffString(
"The red brown fox jumped over the rolling log.",
"The brown spotted fox leaped over the rolling log"
);
Working example: http://jsbin.com/uhalo3
You can do this in 2 ways:
1) This one gives quite a precise report:
Measure the time the user took to write.
Use split() to make an array of every word in your file, and do the same for the entered text.
Compare each word entered by the user with the word at the same position in your list, and also with the one before and the one after (you need to see whether they skipped a word, otherwise everything from there on will go wrong).
Count the errors (you can use Levenshtein distance to count how many mistakes were in each word).
Give the report.
2) Use Levenshtein distance over the 2 strings (yes, treat the whole text as a single string). This one is much easier to use, but the report is not so precise.
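A middle ground between the two is to run the same edit-distance idea over words instead of characters, so a skipped word costs one error instead of throwing off everything after it. This is only a sketch of that variation, not exactly either method above; wordEditDistance and copyAccuracy are made-up names, and the accuracy formula is just one reasonable choice.

```javascript
// Edit distance over arrays of words: each missing, extra or wrong word
// counts as one error.
function wordEditDistance(expectedWords, typedWords) {
  let prev = Array.from({ length: typedWords.length + 1 }, (_, j) => j);
  for (let i = 1; i <= expectedWords.length; i++) {
    const curr = [i];
    for (let j = 1; j <= typedWords.length; j++) {
      const cost = expectedWords[i - 1] === typedWords[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[typedWords.length];
}

// Returns a score between 0 and 1, where 1 means a perfect copy.
function copyAccuracy(expected, typed) {
  const expectedWords = expected.trim().split(/\s+/).filter(Boolean);
  const typedWords = typed.trim().split(/\s+/).filter(Boolean);
  const longest = Math.max(expectedWords.length, typedWords.length);
  if (longest === 0) return 1; // two empty texts count as identical
  return 1 - wordEditDistance(expectedWords, typedWords) / longest;
}

console.log(copyAccuracy('the quick brown fox', 'the brown fox')); // 0.75
```

A misspelled word counts as a full error here; to grade spelling more gently you could compare the mismatched word pairs with a character-level Levenshtein distance as well.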