Full-text search ignoring comments - JavaScript

I want full-text search for my JavaScript code, but I'm usually not interested in matches from the comments.
How can I make full-text search ignore any match inside a comment? Such a feature would increase my productivity as a programmer.
Also, how can I do the opposite: search within the comments only?
(I'm currently using TextMate, but happy to change.)

See our Source Code Search Engine (SCSE). This tool indexes your code base using the language structure to guide the indexing; it can do so for many languages including JavaScript. Search queries are then stated in terms of abstract language tokens, e.g., to find identifiers involving the string "tax" multiplied by some constant, you'd write:
I=*tax* '*' N
This will search all indexed languages only for identifiers (in each language) followed by a '*' token, followed by some kind of number. Because the tool understands language structure, it isn't confused by whitespace, formatting, or intervening comments. Because it understands comments, you can search inside just comments (say, for authors):
C=*Author*
Given a query, the SCSE finds all the hits across the code base (possibly millions of lines), and offers these as a set of choices; clicking a choice pulls up the file with the hit centered and outlined where the match occurs.
If you insist on searching just raw text, the SCSE provides grep-style searches. If you have only a small set of files, this is still pretty fast. If you have a big set of files, this is a lot slower than language-structure-based searches. In both cases, grep-like searches get you more hits, usually at the cost of false positives (e.g., finding "tax" in a comment, or finding a variable named "Authorization_code"). But at least you have the choice.
While this doesn't operate from inside an editor, you can launch your editor (for most editors) on a file once you've found the hit you want.

Use UltraEdit. It fully supports full-text search that ignores comments, and can also search within comments only.

How about the NetBeans way (Find Symbol in the Navigate menu)?
It searches all variables, functions, objects, etc.
Or you could customize JSLint if you want to integrate it into a web application or something like that.

I personally use Notepad++, which is a great free code editor. It seems you need an editor supporting regular-expression search (in one or many files). If you know regular expressions you can use powerful searches, like in/out of JavaScript comments. The work will be to build the right expression and test it against one file covering all the different cases, to be sure it will not miss things during the real search; or you can google for 'javascript comments regular expression' or something like that. A rough sketch of such an expression follows below.
Then have a look at the Notepad++ plugins; one is 'RegEx Helper', which helps with building regular expressions.
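For reference, the comment-matching part of such an expression might look like the following rough sketch. Caveat: it will also match comment-like text inside string and regex literals, so treat the results as approximate.

// Rough sketch: match most JavaScript comments (block and line).
// Caveat: also matches comment-like text inside string/regex literals.
var commentPattern = /\/\*[\s\S]*?\*\/|\/\/[^\n]*/g;

var source = "var x = 1; // counter\n/* block\n comment */ var y = 2;";

// Search within comments only:
var comments = source.match(commentPattern) || [];
console.log(comments); // [ '// counter', '/* block\n comment */' ]

// Search ignoring comments: strip them first, then search the rest.
var stripped = source.replace(commentPattern, '');
console.log(stripped.indexOf('counter')); // -1: only appeared in a comment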

Related

Semi-obfuscate/uglify JavaScript

I know about JS minifiers and obfuscators. I was wondering if there is any existing tool (or any fast-to-code solution) to partially obfuscate JavaScript. By partially I mean that it should become difficult to read, but not appear uglified/minified. It should keep indentation, but lose comments, and partially change variable names, making them unclear without converting them to "a, b, c" like an obfuscator would.
The purpose of this could be to take an explicit and reusable code and make it implicit and difficult to be reused by other people, without making it impossible to work with for yourself.
Any idea from where to start to achieve this ? Maybe editing an existing obfuscator ?
[This answer is a direct response to OP's request].
Semantic Designs JavaScript obfuscator will do what you want, but you'll need two passes.
On the first pass, run it as an obfuscator; it will rename identifiers (although you can control how much or how that is done) and strip whitespace and comments. If you limit its ability to rename the identifiers, you lose some of the strength of the obfuscator, but that's your choice.
On the second pass, run it as a prettyprinter; it will introduce nice indentation again.
(In fact, the idea for obfuscation came from building a prettyprinter; if you can print pretty, surely it is easy to print ugly.)
From the point of view of working with the code, you are better off working with your master copy any way you like, complete with your indentation and nice commentary as documentation. When you are ready to obfuscate, you run the obfuscator and ship the obfuscated result. Errors reported in the obfuscated result that involve obfuscated names can be mapped back to the original names, using the map of obfuscated <--> original names produced during the obfuscation step.
This is a product of my company. I'd provide a link but SO hates it when I do that, so you'll have to find it via my bio or googling.
PS: It works exactly as @georg suggests, by parsing to an AST, mangling, and prettyprinting. It doesn't use esprima.
I'm not aware of a tool that would meet your specific requirements, but it seems relatively easy to create, given that the vital parts already exist (a rough sketch follows the list):
- parse the source into an AST, using esprima or similar
- manipulate the tree in the way you want (e.g. remove comments, mangle identifiers, etc.)
- rebuild the source from the tree using escodegen
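A minimal sketch of that pipeline, assuming Node.js with the esprima and escodegen npm packages. The identifier mangling here is deliberately naive and could produce collisions; a real tool would need a scope-aware rename.

// Parse -> mangle -> regenerate. Comments are dropped because
// esprima.parseScript() does not attach them by default.
var esprima = require('esprima');
var escodegen = require('escodegen');

var source = 'var taxRate = 0.2;\nvar total = price * taxRate;';

// 1. Parse the source into an AST.
var ast = esprima.parseScript(source);

// 2. Walk the tree and rename every identifier (naively).
function mangle(node) {
  Object.keys(node).forEach(function (key) {
    var child = node[key];
    if (Array.isArray(child)) {
      child.forEach(function (c) { if (c && c.type) mangle(c); });
    } else if (child && child.type) {
      mangle(child);
    }
  });
  if (node.type === 'Identifier') {
    node.name = '_' + node.name.slice(0, 2); // unclear, but not 'a, b, c'
  }
}
mangle(ast);

// 3. Regenerate nicely indented source from the tree.
console.log(escodegen.generate(ast));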

JavaScript library to manage translation forms

Is anybody aware of any JavaScript tool (compatible with jQuery, TinyMCE or any other client-side library) able to manage the following requirements?
I need to show translation forms in which every field (either input or textarea) could contain some segment variables or code sections (mostly HTML).
For example:
"Hello {{firstname}}, this is your personal page."
or
"You improved your personal score of <strong>{{n}} points</strong>."
Of course I obtain these segments from a template parser and I need to show them to a set of translators that will perform localization towards many languages. I know that in many cases I can (and should!) avoid variables and code inside translation segments, but in many other cases I really can't.
The problem is: I would like to manage coherence about variables and code directly on the browser (I trust my translators but a bit more of UI/UX help is always a good thing!).
A nice approach could be providing the set of variables and code tags, ready to be inserted by means of a single click (in order to avoid misspelled variables or incorrect code syntax), plus a bit of pre-submit validation to be sure everything was inserted.
I've seen this approach in other websites, such as Facebook or Freelancer.com (who have the power and the ability to reimplement the whole thing from scratch!).
Do you know about any almost-ready tool/library for this purpose?
Thank you all in advance for any suggestion.
If you are asking for a library to translate text - here is Google Translate API: https://developers.google.com/translate/?csw=1
If you are asking for a library which can take user input, perform validation, and insert into the DOM - then jQuery has everything you need. A rough sketch is below.
If you are asking for something else, let me know and I'll edit my answer.
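For the placeholder-insertion part of the question, a rough jQuery sketch could look like this (all element ids and class names here are made up for the example):

// Click a tag button to insert its placeholder at the cursor position,
// and check before submit that every expected placeholder is present.
var expectedTags = ['{{firstname}}', '{{n}}'];

$('.tag-button').on('click', function () {
  var tag = $(this).text();
  var field = $('#translation')[0];
  var start = field.selectionStart, end = field.selectionEnd;
  field.value = field.value.slice(0, start) + tag + field.value.slice(end);
  field.selectionStart = field.selectionEnd = start + tag.length;
  field.focus();
});

$('#translation-form').on('submit', function (e) {
  var value = $('#translation').val();
  var missing = expectedTags.filter(function (t) {
    return value.indexOf(t) === -1;
  });
  if (missing.length) {
    e.preventDefault(); // block submission until all placeholders exist
    alert('Missing placeholders: ' + missing.join(', '));
  }
});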

Generalizing XPaths

I would like to seek your help with a problem I am trying to tackle involving XPaths.
I am trying to generalize multiple XPaths provided by a user to get an XPath that would best 'fit' all the provided examples. This is for a web scraping system I am building.
E.g., if the user gives the following XPaths (each pointing to a link in the 'Spotlight' section of the Google News page):
Good examples:
/html/body/div[@id='page']/div/div[@id='main-wrapper']/div[@id='main']/div/div/div[3]/div[1]/table[@id='main-am2-pane']/tbody/tr/td[@id='rt-col']/div[3]/div[@id='s_en_us:ir']/div[2]/div[1]/div[2]/a[@id='MAE4AUgAUABgAmoCdXM']/span
/html/body/div[@id='page']/div/div[@id='main-wrapper']/div[@id='main']/div/div/div[3]/div[1]/table[@id='main-am2-pane']/tbody/tr/td[@id='rt-col']/div[3]/div[@id='s_en_us:ir']/div[2]/div[6]/div[2]/a[@id='MAE4AUgFUABgAmoCdXM']/span
/html/body/div[@id='page']/div/div[@id='main-wrapper']/div[@id='main']/div/div/div[3]/div[1]/table[@id='main-am2-pane']/tbody/tr/td[@id='rt-col']/div[3]/div[@id='s_en_us:ir']/div[2]/div[12]/div[2]/a[@id='MAE4AUgLUABgAmoCdXM']/span
Bad Examples: (pointing to a link in another section)
/html/body/div[@id='page']/div/div[@id='main-wrapper']/div[@id='main']/div/div/div[3]/div[1]/table[@id='main-am2-pane']/tbody/tr/td[@id='lt-col']/div[2]/div[@id='replaceable-section-blended']/div[1]/div[4]/div/h2/a[@id='MAA4AEgFUABgAWoCdXM']/span
It should be able to generalize and produce an XPath expression that would select all the links in the 'Spotlight' section. (It should be able to throw out the incorrect XPath given.)
Generalized XPath
/html/body/div[@id='page']/div/div[@id='main-wrapper']/div[@id='main']/div/div/div[3]/div[1]/table[@id='main-am2-pane']/tbody/tr/td[@id='rt-col']/div[3]/div[@id='s_en_us:ir']/div[2]/div/div[2]/a[@id='MAE4AUgLUABgAmoCdXM']/span
Could you kindly advise me on how to go about it? I was thinking of using the Longest Common Substring strategy, but that would over-generalize if a bad example is given (like the fourth example above). Are there any libraries, or any open source software, covering this area?
I saw some similar posts (finding common ancestor from a group of xpath? and Howto find the first common XPath ancestor in Javascript?) However they are talking about longest common ancestor.
I am writing it in Javascript as a form of a firefox extension.
Thanks for your time and any help would be greatly appreciated!
The question here is an automaton minimization problem. So you have (Xpath1|Xpath2|Xpath3) and you would like to get a minimal automaton Xpath4 which matches the same nodes. There is also the question of minimization with or without information loss, like JPEG. For exact minimization you could google "Algorithms for Minimization of Finite-State Automata".
OK, the simplest way is finding a common subsequence: convert each XPath operator to a character and run a character-based substring finder over the list of strings. So we have, for example:
adcba, acba, adba --common substring--> aba --general reg exp--> a.*b.*a --convert back to xpath--> ...
You can also try to set something less general in place of .*
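A much simpler (and cruder) alternative to the automaton approach is a step-wise generalization: split each XPath into steps, keep the steps that agree across all examples, and drop the positional predicate where only the index differs. A rough sketch follows; rejecting structurally different "bad examples" is left to the caller, e.g. by trying subsets of the input.

// Generalize a list of XPaths by wildcarding differing predicates.
function generalizeXPaths(xpaths) {
  var stepLists = xpaths.map(function (xp) { return xp.split('/'); });
  var len = stepLists[0].length;
  // Paths of different depth can't be generalized this way.
  if (!stepLists.every(function (s) { return s.length === len; })) return null;

  var result = [];
  for (var i = 0; i < len; i++) {
    var steps = stepLists.map(function (s) { return s[i]; });
    if (steps.every(function (s) { return s === steps[0]; })) {
      result.push(steps[0]);                       // identical: keep as-is
    } else {
      var names = steps.map(function (s) { return s.replace(/\[.*\]$/, ''); });
      if (names.every(function (n) { return n === names[0]; })) {
        result.push(names[0]);                     // same tag, differing predicate
      } else {
        return null;                               // structurally different
      }
    }
  }
  return result.join('/');
}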

JavaScript webpage version comparison

In order to expedite our 'content update review process', which is used in approving web page content for publishing, I'm looking to implement a JavaScript function that will compare two webpage versions.
So far, I've created a page that will load the content to be compared from the new and old versions of a particular page. Is there a (relatively) simple way to iterate through the html of each using JavaScript/jQuery and highlight what content has changed or is missing?
Since there would be so many HTML-specific details (since this is essentially an HTML text compare), is there a JavaScript library I can use?
I should add that my first choice would be to implement this in PHP. Unfortunately, we have many constraints that only permit us to use limited resources such as JavaScript.
Version Control is a non-trivial problem. It's probably not something you should implement from scratch, either, if this is part of your "content update review process."
Instead, consider using a tool like Subversion, Git, or your favorite source control solution.
If you really wanna do this, you can go from something as simple as Regex matching to DOM matching. There's no "magic library" that I'm aware of that will encapsulate this for you, so it'll be work. Work that you'll probably do wrong.
Seriously consider a version control provider, or use a CMS that has built in versioning of pages. If you're feeling squirrely, check out an open source CMS (like Drupal) and try to figure out how they implement versioning, then reverse engineer/re-engineer it yourself. I hope the inefficiency in that is obvious.
I would do this in 3 steps
1/ segment the content into 2 arrays
for each page
. choose a separator, like the "." or ""
. you have the content as a big string, split it and build an array
2/ compare the arrays
loop on these 2 arrays containing the segmented content, let's say A[idxA] and B[idxB]
. if A[idxA] == B[idxB] then idxA++ and idxB++
. else find if there is an index where A[idxA] == B[index]
. if there is, mark all indexes between idxB and index as "B modified"
. else, mark idxA as "A modified"
3/ display the differences
At the end you should have all the indexes where A and B are not equal. You can then join the 2 arrays after adding some markups to highlight the differences.
It is not a perfect solution and it will be wrong sometimes, but not often if you choose your separator correctly. If you want it perfect, you will have to test several matches and compute the number of differences in order to minimise it. A rough sketch of the three steps is below.
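Here is a sketch of those three steps, using "." as the separator; the returned marks then drive whatever highlighting you add in step 3:

// Segment two texts and mark the indexes where they diverge.
function diffSegments(textA, textB) {
  var A = textA.split('.');                      // 1/ segment both contents
  var B = textB.split('.');
  var marks = [];
  var idxA = 0, idxB = 0;
  while (idxA < A.length && idxB < B.length) {   // 2/ compare the arrays
    if (A[idxA] === B[idxB]) {
      idxA++; idxB++;                            // segments match: advance both
    } else {
      var found = B.indexOf(A[idxA], idxB);
      if (found !== -1) {                        // B gained segments before a match
        for (var i = idxB; i < found; i++) marks.push({ side: 'B', index: i });
        idxB = found;
      } else {                                   // A's segment changed or vanished
        marks.push({ side: 'A', index: idxA });
        idxA++;
      }
    }
  }
  return marks;                                  // 3/ feed these into highlighting
}

console.log(diffSegments('Hello. Old text. Bye.', 'Hello. New text. Bye.'));
// -> [ { side: 'A', index: 1 }, { side: 'B', index: 1 } ]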

Wiktionary API - meaning of words

I would like to get the meaning of a selected word using the Wiktionary API.
The retrieved content should be the same as presented in "Word of the day": only the basic meaning, without etymology, synonyms, etc.
for example
"postiche n
Any item of false hair worn on the head or face, such as a false beard or wig."
I tried using the documentation but I can't find a similar example. Can anybody help with this problem?
Although MediaWiki has an API (api.php), it might be easiest for your purposes to just use the action=raw parameter to index.php if you just want to retrieve the source code of one revision (not wrapped in XML, JSON, etc., as opposed to the API).
For example, this is the raw word of the day page for November 14:
http://en.wiktionary.org/w/index.php?title=Wiktionary:Word_of_the_day/November_14&action=raw
What's unfortunate is that the format of wiki pages focuses on presentation (for the human reader) rather than on semantics (for the machine), so you should not be surprised that there is no "get word definition" API command. Instead, your script will have to make sense of the numerous text formatting templates that Wiktionary editors have created and used, as well as complex presentational formatting syntax, including headings, unordered lists, and others. For example, here is the source code for the page "overflow":
http://en.wiktionary.org/w/index.php?title=overflow&action=raw
There is a "generate XML parse tree" option in the API, but it doesn't break much of the presentational formatting into XML. Just see for yourself:
http://en.wiktionary.org/w/api.php?action=query&titles=overflow&prop=revisions&rvprop=content&rvgeneratexml=&format=jsonfm
In case you are wondering whether there exists a parser for MediaWiki-format pages other than MediaWiki, no, there isn't. At least not anything written in JavaScript that's currently maintained (see list of alternative parsers, and check the web sites of the two listed ones). And even then, supporting most/all of the common templates will be a big challenge. Good luck.
OK, I admit defeat.
There are some files relating to Wiktionary in Pywikipediabot and, looking at the code, it does look like you should be able to get it to parse meaning/definition fields for you.
However, the last half an hour has convinced me otherwise. The code is not well written and I wonder if it has ever worked.
So I defer to idealmachine's answer, but I thought I would post this to save anyone else from making the same mistakes. :)
As mentioned earlier, the content of the Wiktionary pages is in a human-readable format, wikitext, so the MediaWiki API doesn't let you get a word's meaning directly: the data is not structured.
However, each page follows a specific convention, so it's not that hard to extract the meanings from the wikitext. Also, there are some APIs, like Wordnik or Lingua Robot, that parse Wiktionary content and provide it in JSON format.
MediaWiki does have an API but it's low-level and has no support for anything specific to each wiki. For instance it has no encyclopedia support for Wikipedia and no dictionary support for Wiktionary. You can retrieve the raw wikitext markup of a page or a section using the API but you will have to parse it yourself.
The first caveat is that each Wiktionary has evolved its own format but I assume you are only interested in the English Wiktionary. One cheap trick many tools use is to get the first line which begins with the '#' character. This will usually be the text of the definition of the first sense of the first homonym.
Another caveat is that every Wiktionary uses many wiki templates so if you are looking at the raw text you will see plenty of these. The only way to reliably expand these templates is by calling the API with action=parse.
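A rough sketch of the cheap trick mentioned above: fetch the raw wikitext and take the first definition line. Run it server-side (e.g. Node 18+ with the global fetch); in a browser the raw index.php endpoint will likely be blocked by CORS, and the line will still contain raw wiki templates like {{...}} that you must strip or expand yourself.

// Fetch raw wikitext for a word and print the first '#' definition line.
var word = 'overflow';
fetch('https://en.wiktionary.org/w/index.php?title=' +
      encodeURIComponent(word) + '&action=raw')
  .then(function (res) { return res.text(); })
  .then(function (wikitext) {
    var line = wikitext.split('\n').find(function (l) {
      // '#' starts a definition; '#:' and '#*' are examples/quotations.
      return l.charAt(0) === '#' && l.charAt(1) !== ':' && l.charAt(1) !== '*';
    });
    console.log(line); // first sense of the first homonym, roughly
  });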
