JavaScript webpage version comparison - javascript

In order to expedite our 'content update review process', which is used in approving web page content for publishing, I'm looking to implement a JavaScript function that will compare two webpage versions.
So far, I've created a page that will load the content to be compared from the new and old versions of a particular page. Is there a (relatively) simple way to iterate through the html of each using JavaScript/jQuery and highlight what content has changed or is missing?
Since there would be so many HTML-specific details to handle (this is essentially an HTML text compare), is there a JavaScript library I can use?
I should add that my first choice would be to implement this in PHP. Unfortunately, we have many constraints that only permit us to use limited resources such as JavaScript.

Version Control is a non-trivial problem. It's probably not something you should implement from scratch, either, if this is part of your "content update review process."
Instead, consider using a tool like Subversion, Git, or your favorite source control solution.
If you really wanna do this, you can go from something as simple as Regex matching to DOM matching. There's no "magic library" that I'm aware of that will encapsulate this for you, so it'll be work. Work that you'll probably do wrong.
Seriously consider a version control provider, or use a CMS that has built in versioning of pages. If you're feeling squirrely, check out an open source CMS (like Drupal) and try to figure out how they implement versioning, then reverse engineer/re-engineer it yourself. I hope the inefficiency in that is obvious.

I would do this in 3 steps
1/ segment the content into 2 arrays
for each page
. choose a separator, like the "." or ""
. you have the content as a big string, split it and build an array
2/ compare the arrays
loop on these 2 arrays containing the segmented content, let's say A[idxA] and B[idxB]
. if A[idxA] == B[idxB] then idxA++ and idxB++
. else find if there is an index where A[idxA] == B[index]
. if there is, mark all indexes between idxB and index as "B modified"
. else, mark idxA as "A modified"
3/ display the differences
At the end you should have all the indexes where A and B are not equal. You can then join the 2 arrays after adding some markups to highlight the differences.
It is not a perfect solution; it will be wrong sometimes, but not often if you choose your separator well. If you want it perfect, you will have to test several candidate matches and compute the number of differences for each in order to minimise it.
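
The three steps above can be sketched roughly as follows. This is only a minimal illustration, not a full diff algorithm: the function names, the "." separator and the del/ins markup are assumptions made for the example, and it has the same limitation as the description (the first match found is taken, with no attempt to minimise the number of differences).

```javascript
// Step 1: segment a page's text into an array (separator is an assumption).
function segment(text, separator) {
  return text.split(separator)
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });
}

// Step 2: walk both arrays and record the indexes that differ.
function compareSegments(a, b) {
  var modifiedA = [];
  var modifiedB = [];
  var idxA = 0;
  var idxB = 0;
  while (idxA < a.length && idxB < b.length) {
    if (a[idxA] === b[idxB]) {
      idxA++;
      idxB++;
      continue;
    }
    // Look ahead in B for the current segment of A.
    var index = b.indexOf(a[idxA], idxB);
    if (index !== -1) {
      // Everything in B before the match is marked "B modified".
      for (var i = idxB; i < index; i++) modifiedB.push(i);
      idxB = index;
    } else {
      // The segment of A has no counterpart in B: "A modified".
      modifiedA.push(idxA);
      idxA++;
    }
  }
  // Whatever is left over on either side is unmatched.
  for (; idxA < a.length; idxA++) modifiedA.push(idxA);
  for (; idxB < b.length; idxB++) modifiedB.push(idxB);
  return { modifiedA: modifiedA, modifiedB: modifiedB };
}

// Step 3: wrap the flagged segments in markup and join the array again.
function highlight(segments, modified, tag, separator) {
  return segments.map(function (s, i) {
    return modified.indexOf(i) !== -1 ? "<" + tag + ">" + s + "</" + tag + ">" : s;
  }).join(separator + " ");
}
```

For example, compare segment(oldText, ".") with segment(newText, "."), then call highlight on the old array with "del" and on the new array with "ins" to build the two marked-up versions.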

Related

When should I split a JSON into smaller parts?

I want to use the cmudict file in a web page. It contains 170000 words with their phonetic transcriptions (in ARPAbet symbols).
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
I want to use it in JSON format, search for any word entered by the user and return an explanation of how to pronounce it syllable by syllable. The second part is not very complex in search terms, as there are only 39 different phonemes, but the first one, with its 170000 entries, may take too much time if the user enters a whole text instead of a single word to transcribe.
I wonder if it's worth splitting the JSON into, for example, 26 parts (one per initial letter) and searching only in the corresponding file.
Also, I don't know if JSON is the best format for this, but I want to use it on a free blog like Tumblr or Blogger (or similar; the thing is that I don't want to spend money on this), and JavaScript is what they support. I'd welcome suggestions on this too.
Well, that is a tough call, since you must consider download size. I would shorten the names of all your properties to be as small as possible, so instead of repeating "description" : "the short description", I would go with "sd" : "the short description". You are trying to use JavaScript to serve a data file, which is okay since you can rely on caching and whatnot, but the initial download size may be rather large. I would do something like var myDictionary = { }; at the top of the file; that way you can reference the variable, since it is in the global space. It is an interesting experiment for sure.
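
As a rough sketch of the global-variable idea: if the data file is an object keyed by word, each lookup is a direct property access rather than a scan through all the entries, so transcribing a whole text is one lookup per word. The two sample entries and the transcribe helper below are illustrative assumptions, not the real file layout.

```javascript
// Hypothetical excerpt of the data file: word -> ARPAbet phonemes.
var myDictionary = {
  "hello": "HH AH0 L OW1",
  "world": "W ER1 L D"
};

// Look up every word of a text; words that are not in the dictionary give null.
function transcribe(text, dictionary) {
  return text.toLowerCase().split(/\s+/).filter(Boolean).map(function (word) {
    // Strip punctuation but keep apostrophes (e.g. "don't").
    var clean = word.replace(/[^a-z']/g, "");
    // hasOwnProperty avoids false hits on inherited names like "constructor".
    return Object.prototype.hasOwnProperty.call(dictionary, clean)
      ? dictionary[clean]
      : null;
  });
}
```

With this shape, splitting the file by initial letter would mostly affect how much has to be downloaded up front, not how long each search takes.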

Extract document extensions from clicks

I'm using this technique to extract the click events in my SharePoint site. It uses jQuery and a regular expression to capture clicks and report them as events to Google Analytics.
I'm also just past total newbie with regex -- It is starting to make some sense to me, but I have a lot to learn still. So here goes.
I have a preapproved list of filetypes that I am interested in based on the site listed above.
var filetypes = /\.(zip|pdf|doc.*|xls.*|ppt.*|mp3|txt|wma|mov|avi|wmv|flv|wav|jpg)$/i;
But it isn't quite working like I need. With the $ I assume it is trying to match to the end of the line. But often in SharePoint we get links like this:
example.org/sharepoint/_layouts/15/wopiframe.aspx?sourcedoc=/sharepointlibrary/the%20document%20name.docx&action=default&defaultitemopen=1
The two problems I have are that I can't count on the file name being before the query or hash, and I can't count on it being at the end. On top of that, there are all the different Microsoft Office extensions to cover.
I found this thread on extracting extensions, but it doesn't seem to work correctly.
I've put together this version
var filetypes = /\.(zip|pdf|doc|xls|ppt|mp3|txt|wma|mov|avi|wmv|flv|wav|jpg)[A-Za-z]*/i;
I changed the office bits from doc.* to just plain doc and added the optional alpha character afterwards. And removed the $ end anchor. It seems to be working with my test sample, but I don't know if there are gotchas that I don't understand.
Does this seem like a good solution, or is there a better way to match a predetermined list of extensions (including, for example, the Office variants like doc, docx, docm) that appear either before the query string or as one parameter in the query string?
I would go with the following which matches file name and extension:
/[^/]+\.(zip|pdf|doc[xm]?|xlsx?|ppt|mp3|txt|wma|mov|avi|wmv|flv|wav|jpg)/i
It outputs the%20document%20name.docx from your example.
There may be other formats that it might not work on but should get you what you want.
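
A small sketch of how that expression could be used on a link's href. The extractFile helper and its return shape are assumptions for illustration; the pattern is the one above with the slash inside the character class escaped.

```javascript
var fileWithExtension = /[^\/]+\.(zip|pdf|doc[xm]?|xlsx?|ppt|mp3|txt|wma|mov|avi|wmv|flv|wav|jpg)/i;

// Returns the matched file name and its extension, or null when the URL
// contains none of the listed extensions.
function extractFile(href) {
  var match = fileWithExtension.exec(href);
  if (!match) return null;
  return { fileName: match[0], extension: match[1].toLowerCase() };
}

var sample = "example.org/sharepoint/_layouts/15/wopiframe.aspx" +
  "?sourcedoc=/sharepointlibrary/the%20document%20name.docx&action=default&defaultitemopen=1";
var found = extractFile(sample);
// found.fileName is "the%20document%20name.docx", found.extension is "docx"
```

decodeURIComponent(found.fileName) turns the %20 sequences back into spaces if a readable name is needed for the event label.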

Javascript library to manage translation forms

Is anybody aware of any javascript tool (compatible with jQuery, tinymce or any other clientside library) able to manage the following requirements?
I need to show translation forms in which every field (either input or textarea) could contain some segment variables or code sections (mostly HTML).
For example:
"Hello {{firstname}}, this is your personal page."
or
"You improved your personal score of <strong>{{n}} points</strong>."
Of course I obtain these segments from a template parser and I need to show them to a set of translators that will perform localization towards many languages. I know that in many cases I can (and should!) avoid variables and code inside translation segments, but in many other cases I really can't.
The problem is: I would like to manage coherence about variables and code directly on the browser (I trust my translators but a bit more of UI/UX help is always a good thing!).
A nice approach could be to provide the set of variables and code tags, ready to be inserted with a single click (in order to avoid misspelled variables or incorrect code syntax), plus a bit of pre-submit validation to be sure everything was inserted.
I've seen this approach in other websites, such as Facebook or Freelancer.com (who have the power and the ability to reimplement the whole thing from scratch!).
Do you know about any almost-ready tool/library for this purpose?
Thank you all in advance for any suggestion.
If you are asking for a library to translate text - here is Google Translate API: https://developers.google.com/translate/?csw=1
If you are asking for a library which can take user input, perform validation, and insert into the DOM - then jQuery has everything you need.
If you are asking for something else, let me know and I'll edit my answer.
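
For the pre-submit validation part, a minimal sketch in plain JavaScript could look like the following. It is not taken from any existing library: the function names are made up, and it assumes variables are written as {{name}} and that code sections are simple HTML tags that must be copied verbatim.

```javascript
// Collect {{variable}} placeholders and HTML tags from a segment.
function extractTokens(text) {
  return text.match(/\{\{\s*\w+\s*\}\}|<\/?[a-z][^>]*>/gi) || [];
}

// Return the tokens of the source segment that the translation lacks.
// Each token in the translation can satisfy only one token of the source.
function missingTokens(source, translation) {
  var remaining = extractTokens(translation);
  return extractTokens(source).filter(function (token) {
    var pos = remaining.indexOf(token);
    if (pos === -1) return true;
    remaining.splice(pos, 1);
    return false;
  });
}
```

A submit handler could call missingTokens for every field and block the submission (or show a warning next to the field) when the returned array is not empty.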

Run Database Stored RegEx against DOM

I have a question about how to approach a certain scenario before I get halfway through it and figure out it was not the best option.
I work for a large company that has a team that creates tools for the team mates to use that aren’t official enterprise tools. We have no access to the database directly, just access to an internal server to store our files to run and be able to access the main site with javascript etc (same domain).
What I am working on is a tool with a ton of options that allow you to select what I will call “data points” on a page.
These are things like “Account status, Balance, Name, Phone number, email etc”, and the tool saves them to an Excel sheet.
So you input account numbers, choose what you need and then using IE Objects it navigates to the page and scrapes data you request.
My question is as follows..
I want to make the scraping part pretty Dynamic in the way it works. I want to be able to add new datapoints on the fly.
My goal or idea is to store the regular expression needed to get the specific piece of data in the table with the “data point option”.
If I choose “Name”, it knows the expression for name in the database to run against the DOM.
What would be the best way to go about creating that type of function in JavaScript / jQuery?
I need to pass a Regex to a function, have it run against the DOM and then return the result.
I have a feeling that there will be things that require more than 1 step to get the information etc.
I am just trying to think of the best way to approach it without having to hardcode 200+ expressions into the file as the page may get updated and need to be changed.
Any ideas?
IRobotSoft scraper may be the tool you are looking for. Check this forum and see if questions are similar to what you are doing: http://irobotsoft.org/bb/YaBB.pl?board=newcomer. It is free.
What it uses is not regular expression but a language called HTQL, which may be more suitable for extracting web pages. It also supports regular expression, but not as the main language.
It organizes all your actions well with a visual interface, so you can dynamically compose actions or tasks for changing needs.
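
If you would rather stay with the design from the question (one stored expression per data point), a minimal sketch could look like this. The row shape (name, pattern, flags), the sample patterns and the function names are all assumptions; the only built-in relied on is the RegExp constructor, which accepts a pattern string and a flags string.

```javascript
// Hypothetical rows as they might come back from the "data point" table.
var dataPoints = [
  { name: "Name", pattern: "Name:\\s*([^<]+)", flags: "i" },
  { name: "Balance", pattern: "Balance:\\s*\\$([\\d.,]+)", flags: "i" },
  { name: "Email", pattern: "Email:\\s*([^<\\s]+)", flags: "i" }
];

// Build the expression from its stored string and run it against the markup.
// Returns the first capture group if there is one, else the whole match.
function extractDataPoint(html, dataPoint) {
  var regex = new RegExp(dataPoint.pattern, dataPoint.flags || "");
  var match = regex.exec(html);
  if (!match) return null;
  return (match[1] !== undefined ? match[1] : match[0]).trim();
}

// Run every selected data point and collect the results by name.
function scrape(html, points) {
  var result = {};
  points.forEach(function (point) {
    result[point.name] = extractDataPoint(html, point);
  });
  return result;
}
```

In the browser the html argument could be document.body.innerHTML of the loaded page. Data points that need more than one step would need something richer than a single pattern, for instance an ordered list of patterns per row.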

Fulltext search ignoring comments

I want fulltext search for my JavaScript code, but I'm usually not interested in matches from the comments.
How can I have fulltext search ignoring any commented match? Such a feature would increase my productivity as a programmer.
Also, how can I do the opposite: search within the comments only?
(I'm currently using TextMate, but happy to change.)
See our Source Code Search Engine (SCSE). This tool indexes your code base using the language structure to guide the indexing; it can do so for many languages including JavaScript. Search queries are then stated in terms of abstract language tokens, e.g., to find identifiers involving the string "tax" multiplied by some constant, you'd write:
I=*tax* '*' N
This will search all indexed languages only for identifiers (in each language) followed by a '*' token, followed by some kind of number. Because the tool understands language structure, it isn't confused by whitespace, formatting or intervening comments. Because it understands comments, you can search inside just comments (say, for authors):
C=*Author*
Given a query, the SCSE finds all the hits across the code base (possibly millions of lines), and offers these as a set of choices; clicking on a choice pulls up the file with the hit in the middle, outlined where the match occurs.
If you insist on searching just raw text, the SCSE provides grep-style searches. If you have only a small set of files, this is still pretty fast. If you have a big set of files, this is a lot slower than language-structure based searches. In both cases, grep-like searches get you more hits, usually at the cost of false positives (e.g., finding "tax" in a comment, or finding a variable named "Authorization_code"). But at least you have the choice.
While this doesn't operate from inside an editor, you can launch your editor (for most editors) on a file once you've found the hit you want.
Use UltraEdit. It supports full-text search that ignores comments, as well as search within comments only.
How about the NetBeans way (Find Symbol in the Navigate menu)?
It searches all variables, functions, objects, etc.
Or you could customize JSLint if you want to integrate it into a web application or something like that.
I personally use Notepad++, which is a great free code editor. It seems you need an editor that supports regular expression search (in one or many files). If you know regex, you can build powerful searches, such as inside or outside JavaScript comments. The work will be to build the right expression and test it on one file containing all the different cases, to be sure it will not miss things during a real search. Or you can google for 'javascript comments regular expression' or something similar.
Then have a look at the Notepad++ plugins; one is 'RegEx Helper', which helps with building regular expressions.
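
If a single regular expression gets too fragile, another option is a small scanner that blanks out either the comments or the code before searching. The sketch below is not how any of the editors mentioned above work; maskSource and searchLines are made-up names, and the scanner is deliberately naive: it understands // and /* */ comments and quoted strings, but not regex literals or template literal interpolation.

```javascript
// Returns a copy of `source` in which either the comments (keep = "code") or
// everything except the comments (keep = "comments") is replaced by spaces.
// Newlines are preserved, so line numbers still line up with the original.
function maskSource(source, keep) {
  var out = "";
  var state = "code"; // "code", "line", "block", or the open quote character
  var i = 0;
  while (i < source.length) {
    var ch = source[i];
    var next = source[i + 1];
    var isComment = false;
    var width = 1;
    if (state === "code") {
      if (ch === "/" && next === "/") { state = "line"; isComment = true; width = 2; }
      else if (ch === "/" && next === "*") { state = "block"; isComment = true; width = 2; }
      else if (ch === '"' || ch === "'" || ch === "`") { state = ch; }
    } else if (state === "line") {
      if (ch === "\n") { state = "code"; } else { isComment = true; }
    } else if (state === "block") {
      isComment = true;
      if (ch === "*" && next === "/") { state = "code"; width = 2; }
    } else {
      // Inside a string: skip escaped characters, stop at the closing quote.
      if (ch === "\\") { width = 2; }
      else if (ch === state) { state = "code"; }
    }
    var chunk = source.substr(i, width);
    var visible = isComment === (keep === "comments");
    out += visible ? chunk : chunk.replace(/[^\n]/g, " ");
    i += width;
  }
  return out;
}

// Returns the 1-based line numbers whose kept part contains `term`.
function searchLines(source, term, keep) {
  var hits = [];
  maskSource(source, keep).split("\n").forEach(function (line, n) {
    if (line.indexOf(term) !== -1) hits.push(n + 1);
  });
  return hits;
}
```

Calling searchLines(source, "tax", "code") ignores matches in comments, while searchLines(source, "tax", "comments") searches within the comments only.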
