comparing two sets of text - javascript

I have two paragraphs of text: one is saved in a file, and the other is typed by a user who is trying to reproduce the same paragraph. I want to compare the two and tell the user how accurately they copied it. Are there any techniques for doing this?
These cases make it complex:
What if the user misspelled a word?
What if the user skipped a word in the middle?
What if the user skipped two words and the rest of the text is the same?

Do a diff on the input and the file. There is a JavaScript library for that here:
http://code.google.com/p/google-diff-match-patch/
It will tell you exactly what is different, and you can then use this information to score how well the text was copied.

You're looking for a friendly diff output. Try something like this:
Javascript Diff Algorithm
The sample should be simple enough:
var diff = diffString(
"The red brown fox jumped over the rolling log.",
"The brown spotted fox leaped over the rolling log"
);
Working example: http://jsbin.com/uhalo3

You can do this in two ways.
The first gives quite a precise report:
Measure the time the user took to write.
Use split to make an array of every word in your file, and do the same for the entered text.
Compare each word entered by the user with the word at the same position in your list, and also with the one before and the one after (you need to see whether a word was skipped, otherwise everything from that point on will be misaligned).
Count the errors (you can use the Levenshtein distance to count the mistakes in each word).
Give the report.
The second is to use the Levenshtein distance over the two strings (yes, treat each whole text as a single string).
This one is much easier to use, but the report is not as precise.
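A minimal sketch of the second approach, assuming a plain dynamic-programming Levenshtein implementation. The names `levenshtein` and `copyAccuracy`, and the 0-to-1 accuracy score, are illustrative choices and do not come from any library mentioned above. Because the function only indexes its arguments, it also accepts arrays of words, which gives a rough word-level variant of the first approach:

```javascript
// Levenshtein distance between two sequences (strings or arrays of words).
// Keeps only the previous row of the DP table to save memory.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical score: 1 means identical, 0 means nothing in common.
function copyAccuracy(original, typed) {
  const longest = Math.max(original.length, typed.length);
  return longest === 0 ? 1 : 1 - levenshtein(original, typed) / longest;
}
```

For a word-level score, split both texts first, for example `copyAccuracy(original.split(/\s+/), typed.split(/\s+/))`. A skipped word then costs one edit instead of one edit per character.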

Related

Comparing two strings, find the indexes of added words, ignoring "edited" words

Is it possible, with no libraries and no record of the changes as they are made?
The example below is probably more heavily edited than the text I will be evaluating (although much shorter). In my use case I will be editing a transcription that is usually very accurate. I need to know whether a new word has been added, and where, so that I can approximate a timecode for the new word and shift the existing timecodes forward to compensate for the offset the new word has created.
originalString = "Hello, this sum example txt. Hopefully this is possible!";
editedString = "Hello, this is some example text. Maybe this is not impossible.";
//The edited words are: "sum/some", "txt/text", "Hopefully/Maybe", "possible/impossible"
//It would be useful to get the edited words also - but not essential
//the new words are: "is", "not"
//Output would be
newWordIdxs = [2, 9];
First post here and have only been coding for 3 or 4 months so any tips on how to ask better questions are very welcome.
How would you determine that a word was edited? In the example you posted, 'some' could just as well have been added and 'sum' removed. It is more logical to find which words or characters were removed or added by excluding the words or characters that stay the same, in sequence. This JavaScript library may help: https://www.npmjs.com/package/diff
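One library-free way to approximate this is sketched below. It assumes that words are compared in lowercase with punctuation stripped, and that within each changed region a removed word is "edited into" whichever added word is closest to it by edit distance. Those rules and all function names are assumptions made for illustration. They are not part of the question or of the `diff` package.

```javascript
// Lowercase words with punctuation dropped (an assumed definition of "word").
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Character-level Levenshtein distance between two words.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Longest common subsequence of words, as [originalIndex, editedIndex] pairs.
function matchedPairs(a, b) {
  const t = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      t[i][j] = a[i] === b[j] ? t[i + 1][j + 1] + 1 : Math.max(t[i + 1][j], t[i][j + 1]);
    }
  }
  const pairs = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { pairs.push([i, j]); i++; j++; }
    else if (t[i + 1][j] >= t[i][j + 1]) i++;
    else j++;
  }
  return pairs;
}

// Indexes (in the edited word list) of words that have no counterpart in the original.
function addedWordIndexes(original, edited) {
  const a = tokenize(original);
  const b = tokenize(edited);
  // A sentinel pair makes the text after the last match count as a gap too.
  const anchors = [...matchedPairs(a, b), [a.length, b.length]];
  const added = [];
  let ai = 0, bi = 0;
  for (const [nextA, nextB] of anchors) {
    const removed = a.slice(ai, nextA);
    const candidates = [];
    for (let k = bi; k < nextB; k++) candidates.push(k);
    // Pair each removed word with its closest remaining candidate: that pair is an edit.
    for (const word of removed) {
      if (candidates.length === 0) break;
      let best = 0;
      for (let c = 1; c < candidates.length; c++) {
        if (editDistance(word, b[candidates[c]]) < editDistance(word, b[candidates[best]])) best = c;
      }
      candidates.splice(best, 1);
    }
    added.push(...candidates); // leftovers have no counterpart, so treat them as new
    ai = nextA + 1;
    bi = nextB + 1;
  }
  return added;
}
```

With the two strings from the question, `addedWordIndexes(originalString, editedString)` returns `[2, 9]`. The pairing is greedy, so heavily rewritten passages can be classified differently from how a human would read them.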

Javascript - Most efficient way to search thousand texts for thousands of words?

The language itself is not that important, but I figured I'd stick with JavaScript.
Essentially, I have thousands of "comments" each month and would like an automated, naive happiness 'evaluation' based on searching those comments for 10,000 words (the average comment so far is 21 words long).
The formula (borrowed from Hedonometer) takes the 'happiness' score of each word in the text (if it is found in the 10k list) and averages the scores.
I'll test a few things and maybe edit the results in here, but I'm not even sure where to begin. It seems like very heavy data lifting (though it only needs to be done once per comment, of course). Maybe it's better suited to R or SQL (likely not), but I'm not sure.
I believe this problem is sometimes referred to as 'bag of words' or 'term frequency saturation'.
You could create a hash table from your words like so (abbreviated) :
let wordRanks = {'hate':-100,'love':100,'ok':10};
Then have a string like this and split it into words.
let str = `I hate love it's just ok`;
let words = str.split(' ');
Then you can iterate through the words and get a score :
let commentScore = 0;
words.forEach(function (word) {
  // hasOwnProperty avoids false hits on inherited keys such as "constructor"
  if (Object.prototype.hasOwnProperty.call(wordRanks, word)) {
    commentScore += wordRanks[word];
  }
});
console.log(commentScore); // should be 10
A hash table lookup shouldn't be computationally expensive. This should work, although you may have to split the words more carefully to remove trailing punctuation. My initial code had a comma after love and gave the wrong result, because there was no hash table match for 'love,'.
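The question asks for an average rather than a sum, and the answer notes that punctuation breaks the lookup. A minimal sketch that handles both, assuming a `Map` of scores and a simple regular-expression tokenizer, follows. The function name `averageHappiness` and the three scores are placeholders, not real Hedonometer values.

```javascript
// Placeholder scores; a real list would hold about 10,000 entries.
const wordScores = new Map([
  ["hate", -100],
  ["love", 100],
  ["ok", 10],
]);

// Average score of the words that appear in the list, or null if none do.
function averageHappiness(comment, scores) {
  // Lowercase and keep only letter runs (plus apostrophes), which drops punctuation.
  const words = comment.toLowerCase().match(/[a-z']+/g) || [];
  let total = 0;
  let found = 0;
  for (const word of words) {
    if (scores.has(word)) {
      total += scores.get(word);
      found++;
    }
  }
  return found === 0 ? null : total / found;
}
```

Each lookup is constant time on average, so the cost per comment grows with the number of words in the comment and not with the size of the word list. At roughly 21 words per comment, thousands of comments per month is a small workload.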
I'd definitely go with Python's Natural Language Toolkit (NLTK). It comes with a set of functions that will make your life easier, such as word frequencies, duplicate removal, stop-word removal, and synonym lookup. The idea is to reduce the size of your text as much as possible before doing the sentiment analysis.
In a similar project my approach was:
Remove neutral words, pronouns, prepositions, determiners, names, etc.
Remove duplicates.
Check for word synonyms as I progressed through the text and remove them from the rest of the text.
Dynamically create a sentiment threshold score for each paragraph, so that once a paragraph reached that score I'd stop working on it and move on to the next one. I did the same for the text overall.
Hope this works!

Basic search functionality with JavaScript

I'm looking for a basic search functionality with JavaScript.
The scenario: the user enters one or more words and hits a button. JavaScript then looks through an array of strings for items that probably relate to the search sentence that was entered.
The only function I'm aware of right now is "string.search", which returns the position of the string you are searching for. This is good, but it doesn't cover all cases. Here are a few examples the search function should cover.
Let's assume I have the following string: "This is a good day" in my array. The following search terms should return true when tested against my search function.
Search Term 1: This a good day
Search Term 2: This day
Search Term 3: This was good
Search Term 4: good dy -the user made a typo-
So nothing particular or specific. Just a basic search function that predicts (at a low level, and in a language-agnostic way) whether the search term is related to the strings in the tested array.
Was the last a typo for 'day'?
If not, you could simply split the search sentence, as well as the original string using the split() function.
Then you would iterate over the search words and for each make sure they appear in the source string. As soon as you don't find the word, you stop the search.
This is assuming that all the search words should be AND'ed, not OR'ed.
Does that help?
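A minimal sketch of that split-and-check idea, assuming case-insensitive comparison and whitespace as the only separator; the function name `matchesAllWords` is illustrative:

```javascript
// True when every word of the query appears as a whole word in the text (AND logic).
function matchesAllWords(query, text) {
  const textWords = new Set(text.toLowerCase().split(/\s+/).filter(Boolean));
  const queryWords = query.toLowerCase().split(/\s+/).filter(Boolean);
  // every() stops at the first query word that is missing.
  return queryWords.every((word) => textWords.has(word));
}

// Filter an array of strings down to the ones that match.
function search(query, items) {
  return items.filter((item) => matchesAllWords(query, item));
}
```

Against "This is a good day", this accepts "This a good day" and "This day" but rejects "This was good" and "good dy". Covering those two cases would need OR logic or a fuzzy word comparison instead of exact equality.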
I guess what you are looking for is a pattern-matching live search, similar to finite-state-automaton (FSA) searching:
This link shows an example that'll allow you to search case-insensitively:
Example: Array contains 'This is a good day'
Searching for any (or all) of the following is valid:
THis a Day
Thagd (Th is a g oo d day)
good dy -intended typo-
etc.
A case-sensitive (albeit not perfect FSA-based) version can be found here. There is also one by John Resig. I don't have a link to his demo, but it would be worth looking at: it's a JavaScript/jQuery port of the first link I mentioned.
Hope this helps!
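The three example queries above all behave like case-insensitive subsequence matches: the characters of the query appear in the target in the same order, with gaps allowed. A minimal sketch of that check follows. It is an assumption about what the linked demos do, not their actual code, and the name `isSubsequence` is illustrative.

```javascript
// True when the characters of query occur in text in order, ignoring case.
function isSubsequence(query, text) {
  const q = query.toLowerCase();
  const t = text.toLowerCase();
  let qi = 0; // index of the next query character still to be found
  for (let ti = 0; ti < t.length && qi < q.length; ti++) {
    if (t[ti] === q[qi]) qi++;
  }
  return qi === q.length;
}
```

This kind of match is very permissive, so on a large array it tends to return many loosely related items. Ranking results by how tightly the matched characters cluster is a common refinement.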
This is not as simple as one would think. We're talking fuzzy matching and Levenshtein distance / algorithm.
See this past question:
Getting the closest string match

jQuery, JavaScript auto complete concept, but not

OK, this is a multipart concept. However, I'm sure that if I can figure this piece out, the rest will follow.
What I have is an array of words and phrases, and a textarea that people can type into. I want to search the array for matches or similarities to what the user is typing. The closest thing I can think of is an autocomplete function, but that's not entirely what I want. Part of what I want is autocomplete functionality, but there is so much more in the end that an existing autocomplete is a bit bulky for my needs.
My aim is to trigger the search as the user types, each time they hit the spacebar. Up to this point I am good. My issue is that my logic is flawed from here. I want to take the entire body of text up to the point of hitting the spacebar and check it against my array of words and phrases, but I'm not sure how. Currently I split() the textarea value with a space as the delimiter, but I realize now that that's not right. My initial thinking was to split it, check it against the other array, and be happy if something matched. Then I realized that I have phrases, and if I split on spaces I will never match a phrase.
Hopefully this makes sense. I need to walk through the logic on this. There really isn't any code yet, as I am not debugging; I am trying to figure out a logic that works so I can move forward.
UPDATE:
Check this fiddle: http://jsfiddle.net/VwNHN/
You will need to tweak it to your requirements, but it will give a fair idea of how the below logic can be implemented.
Well, the logic upon keypress (probably any key and not just spacebar) can be something like:
1) Get your current cursor position - say X
Refer for example: http://demo.vishalon.net/getset.htm
2) Get N characters to the left of X. i.e. a substring of the whole text from index X-N to X - store it in Y. You will need to fix on a value for N (for ex: 100). N is the longest word/phrase you are looking to match.
ex: if full text is "hello world i am a sentence", and cursor is at the end, and N is 10, Y would be "a sentence"
3) Split Y on the space character, store each progressively longer run of words in an array, and then reverse it. Let's call the array PHRASES.
ex: if Y is "this is a sentence" - then PHRASES would be
[ "this is a sentence", "this is a", "this is", "this" ]
4) Check your array of words/phrases against each item in PHRASES. The longest matching parts will come first and the shortest will come last. This set of matches is your auto-complete list.
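Steps 2 to 4 can be sketched as follows. The cursor position is passed in as a plain number so the example runs without a DOM, and the names `candidatePhrases` and `findMatches` are hypothetical. This is not the code from the fiddle.

```javascript
// Steps 2 and 3: take up to n characters left of the cursor and build
// progressively longer phrases from them, longest first.
function candidatePhrases(text, cursor, n) {
  // Math.max guards against a negative start, which slice() would count from the end.
  const y = text.slice(Math.max(0, cursor - n), cursor);
  const words = y.split(" ").filter(Boolean);
  const phrases = [];
  for (let i = 1; i <= words.length; i++) {
    phrases.push(words.slice(0, i).join(" "));
  }
  return phrases.reverse();
}

// Step 4: keep the candidates that exist in the word/phrase list, longest first.
function findMatches(phrases, dictionary) {
  const known = new Set(dictionary);
  return phrases.filter((phrase) => known.has(phrase));
}
```

In a page, the cursor position would typically come from the textarea's `selectionStart` property inside a key event handler.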
I would split the problem into at least three branches:
Search event triggered by the user
Search function
Visualization of results
If I understood what you're trying to implement, I would trigger a search on every 'onkeypress' event, unless your array is too big (in which case the page will hang on every key press).
Then the search function: you have to search in an array, so I would search element by element. jQuery provides a nice jQuery.each() function. I would also consider _.each(list, iterator, [context]) from the Underscore library.
Visualization of results: it's not clear to me what you want to show (a grid, a table...?), but if every element of the array is associated with a different DOM object, you could modify its properties at runtime, maybe with jQuery.
Let me know if you need more.

autocomplete search multiple parts of string, then returns the most likely ones

Kind of like this question
I have many text snippets that I use many, many times a day. I want to build something that can search a database, or preferably a JavaScript array, full of sentence-length strings and return the most likely one. Most autocomplete widgets only match what you type in the sequence you type it. I do not remember seeing one that does what I describe.
For example; Say I have this item in my array:
"a yellow banana"
When I search for "a banana" it won't show me anything.
It only works when I type "a yello" ... etc.
Is it possible to also return matches when multiple words are present in an item's name, but in different places?
So that when I type, for example, "fox quick dog", it returns:
"The quick brown fox jumps over the lazy dog"
got the idea and question from actb.js
Thanks for helping me be more lazy
Maybe this could help: jQuery Quick Search
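A plain-JavaScript sketch of the behaviour asked for follows, assuming that "most likely" means "contains the most search words" and that partial words such as "yello" should count. The function name `rankSnippets` is illustrative, and this is unrelated to how jQuery Quick Search or actb.js work internally.

```javascript
// Rank snippets by how many of the typed words they contain, in any order.
function rankSnippets(query, snippets) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return snippets
    .map((snippet) => {
      const lower = snippet.toLowerCase();
      // includes() matches substrings, so a partial word like "yello" still counts.
      const hits = terms.filter((term) => lower.includes(term)).length;
      return { snippet, hits };
    })
    .filter((entry) => entry.hits > 0)
    .sort((x, y) => y.hits - x.hits)
    .map((entry) => entry.snippet);
}
```

Substring matching means a very short term such as "a" hits almost every snippet. Requiring all terms to match, or ignoring terms below some length, would tighten the results.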
