If we obfuscate code, how do we then debug and modify it? - javascript

I have learned that an obfuscator "obscures" the code: it makes the code less readable, but the code still executes.
It replaces symbol names with non-meaningful ones
It replaces numeric constants with expressions
It replaces characters in strings with their hex escapes
So if we obfuscate the code and something goes wrong in production, how can we fix it?
And if we want to modify the code, how do we go about it?
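As a rough illustration of the three transformations listed above, here is a hand-written before/after pair. It is not the output of any particular obfuscator, and the names `applyDiscount` and `_0x1a` are made up for the example.

```javascript
// Readable original (hypothetical example function).
function applyDiscount(price) {
  const rate = 20;
  return "Total: " + (price - (price * rate) / 100);
}

// Hand-obfuscated equivalent: meaningless names, the constants 20 and 100
// written as expressions, and the string "Total: " written with hex escapes.
function _0x1a(_0x2b) {
  const _0x3c = 0x5 << 2; // 5 << 2 === 20
  return "\x54\x6f\x74\x61\x6c\x3a\x20" + (_0x2b - (_0x2b * _0x3c) / 0x64);
}

// Both calls print the same string.
console.log(applyDiscount(50));
console.log(_0x1a(50));
```

The second function is much harder to read, but the engine treats both the same way, which is why the fix always has to happen in the readable version.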

You fix the problem in the original code and then run it through the obfuscator again.

You do not debug with obfuscated code. You should turn off the obfuscation on development environments, since it is only needed for production.

Quentin has it right: don't throw away the original source, and then you can debug the problem there. (Likewise for "modifications" to the code).
A bit trickier: how do you diagnose the problem when it occurs while running the obfuscated version? I don't know how others do this. However, our tools provide an invertible mapping from the original source to the obfuscated version as an additional output of the obfuscation process. When a problem is found in the running code, if the obfuscated identifiers "nearby" the problem (e.g., in the offending statement or function, or in a crash backtrace) can be captured, the inverse map can be used to recover the corresponding unobfuscated identifiers, which can then be used to locate the offending construct in the original source code.
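A minimal sketch of the inverse-map idea, assuming the obfuscator emits a simple obfuscated-name to original-name table. The map entries, the file name `app.min.js` and the stack trace are all invented for illustration; a real tool's map format will differ.

```javascript
// Hypothetical inverse map (obfuscated name -> original name), of the kind an
// obfuscator could emit alongside its output. The entries are invented.
const inverseMap = new Map([
  ["a3", "loadInvoice"],
  ["q7", "parseTotals"],
  ["zz", "InvoiceStore"],
]);

// Replace every identifier-like token that appears in the map; leave the rest alone.
function deobfuscateTrace(trace, map) {
  return trace.replace(/[A-Za-z_$][\w$]*/g, (token) =>
    map.has(token) ? map.get(token) : token
  );
}

// Invented stack trace captured from the obfuscated build.
const trace = [
  "TypeError: Cannot read properties of undefined (reading 'total')",
  "    at q7 (app.min.js:1:2048)",
  "    at zz.a3 (app.min.js:1:1987)",
].join("\n");

console.log(deobfuscateTrace(trace, inverseMap));
```

Very short obfuscated names can collide with ordinary words in an error message, so in practice you would probably restrict the substitution to the stack frames rather than the whole text.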

While I of course agree with the above posters, they're kind of negating the question, not answering it.
Anyway, if you use Chrome, you can open the dev tools (right click anywhere > Inspect element > Sources tab). Find the obfuscated file in the file navigator on the left, then click the two curly braces ({}) at the bottom.
Once you do that, the variable names are still obfuscated, but at least it's formatted properly. This is extremely useful for when you're worried that the minification process itself is breaking something.


Spaces in equal signs

I'm just wondering whether removing the spaces before and after equal signs makes a difference in performance, as in these two code snippets:
int i = 0;
int i=0;
I'm using the first one, but my friend who is learning HTML/JavaScript told me that my coding is inefficient. Is that true in HTML/JavaScript? Is it a big hit to performance? Is it the same in C++/C# and other programming languages? As for indentation, he said 3 spaces are better than a tab, but I'm already used to coding like this. So I just want to know if he is correct.
Your friend is a bit misguided.
The extra spaces in the code will make a small difference in the size of the JS file which could make a small difference in the download speed, though I'd be surprised if it was noticeable or meaningful.
The extra spaces are unlikely to make a meaningful difference in the time to parse the file.
Once the file is parsed, the extra spaces will not make any difference in execution speed since they are not part of the parsed code.
If you really want to optimize download or parse speed, the way to do that is to write your code in the most readable fashion possible for best maintainability and then use a minimizer for the deployed code and this is a standard practice by many web sites. This will give you the best of both worlds - maintainable, readable code and minimum deployed size.
A minimizer will remove all unnecessary spacing, shorten the names of variables, remove comments, collapse lines, etc... all designed to make the deployed code as small as possible without changing the run-time meaning of the code at all.
C++ is a compiled language. As such, only the compiler that the developer uses sees any extra spaces (same with comments). Those spaces are gone once the code has been compiled into native code which is what the end-user gets and runs. So, issues about spaces between elements in a line are simply not applicable at all for C++.
JavaScript is usually described as an interpreted language: the source code is downloaded to the browser, which then parses it at run time into an internal form (bytecode or, in modern engines, JIT-compiled machine code) that the engine can run. The spaces in JavaScript will be part of the downloaded code (if you don't use a minimizer to remove them), but once the code is parsed, those extra spaces play no part in the run-time performance of the code. Thus, the spaces could have a small influence on the download time and perhaps an even smaller influence on the parse time (though I'm guessing it is unlikely to be measurable or meaningful). As I said above, the way to optimize this for JavaScript is to use spaces to enhance readability in the source code and then run a minimizer over the code to generate the deployed version. This preserves maximum readability and minimizes download size.
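If you want to convince yourself, a small experiment along these lines shows the point. The two source strings are made up for the example, and `new Function` is used only so both spellings can be parsed and run side by side.

```javascript
// Two spellings of the same function body: one with spaces, one without.
const spaced = "var i = 0; for (var n = 1; n <= 10; n = n + 1) { i = i + n; } return i;";
const compact = "var i=0;for(var n=1;n<=10;n=n+1){i=i+n;}return i;";

// new Function parses each string into a function; the parser discards the whitespace.
const runSpaced = new Function(spaced);
const runCompact = new Function(compact);

// Same result from both spellings.
console.log(runSpaced(), runCompact());

// The only difference is the number of characters that have to be downloaded.
console.log(spaced.length - compact.length);
```

The saving is a handful of characters per line, which is what a minimizer recovers for you automatically at deploy time.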
There is little (JavaScript) to no (C#, C++, Java) difference in performance. In the compiled languages in particular, both spellings of the source compile to exactly the same output.
Using spaces instead of tabs can be a good idea, but not because of performance. Rather, if you aren't careful, use of tabs can result in "tab rot", where there are tabs in some places and spaces in others, and the indentation of the source code depends on your tab settings, making it hard to read.

finding the meaning of the obfuscated javascript

I saw this piece of code in an obfuscated JavaScript file:
if(1s Q.is.ep=='a')
Do you have any idea what this might mean? I'm quite confused about the space.
thanks :)
The code looks like it was generated by Dean Edwards' packer (or a similar one). You could unpack it with this tool.
It is indeed JavaScript, but with keywords, methods and variables replaced by meaningless strings. The bottom half of the file you provided is actually a map between the obscured tokens and the original ones.
And this is the power of eval (but don't use eval if you can possibly do without it).
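To show the substitution idea on the line from the question, here is a small sketch. The dictionary below is pure guesswork (the real words come from the word list in the packed file), and the replacement loop is a simplification of what a packer-style decoder does, not a copy of it.

```javascript
// Hypothetical dictionary: which word each packed token stands for.
// In a real packed file these come from its word list; the values here are guesses.
const dictionary = new Map([
  ["1s", "typeof"],
  ["Q", "window"],
  ["is", "settings"],
  ["ep", "mode"],
]);

// Substitute each word-like token that has a dictionary entry; keep everything else.
function unpackTokens(packed, words) {
  return packed.replace(/\b\w+\b/g, (token) =>
    words.has(token) ? words.get(token) : token
  );
}

console.log(unpackTokens("if(1s Q.is.ep=='a')", dictionary));
```

Under that guessed dictionary the line becomes `if(typeof window.settings.mode=='a')`. The space makes sense once you see that a token followed by a space is probably a keyword or operator such as `typeof`, which has to stay separated from the identifier after it.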

Semi-obfuscate/uglify JavaScript

I know about JS minifiers and obfuscators. I was wondering if there is any existing tool (or any fast-to-code solution) to partially obfuscate JavaScript. By partially I mean that it should become difficult to read, but not appear as uglified/minified. It should keep indentation, but lose comments, and partially change variable names, making them unclear without converting them to "a, b, c" like an obfuscator.
The purpose of this could be to take an explicit and reusable code and make it implicit and difficult to be reused by other people, without making it impossible to work with for yourself.
Any idea where to start to achieve this? Maybe by editing an existing obfuscator?
[This answer is a direct response to OP's request].
Semantic Designs JavaScript obfuscator will do what you want, but you'll need two passes.
On the first pass, run it as an obfuscator; it will rename identifiers (although you can control how much or how that is done) and strip whitespace and comments. If you limit its ability to rename the identifiers, you lose some of the strength of the obfuscator, but that's your choice.
On the second pass, run it as a prettyprinter; it will introduce nice indentation again.
(In fact, the idea for obfuscation came from building a prettyprinter; if you can print-pretty, surely it is easy to print-ugly).
From the point of view of working with the code, you are better off working with your master copy any way you like, complete with your indentation and nice commentary as documentation. When you are ready to obfuscate, you run the obfuscator and ship the obfuscated result. Errors reported in the obfuscated result that involve obfuscated names can be mapped back to the original names, using the map of obfuscated <--> original names produced during the obfuscation step.
This is a product of my company. I'd provide a link but SO hates it when I do that, so you'll have to find it via my bio or googling.
PS: It works exactly as @georg suggests, by parsing to an AST, mangling, and prettyprinting. It doesn't use esprima.
I'm not aware of a tool that would meet your specific requirements, but it seems to be relatively easy to create, given that the vital parts already exist.
parse the source into an AST, using esprima or similar
manipulate the tree in the way you want (eg. remove comments, mangle identifiers etc)
rebuild the source from the tree using escodegen
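The middle step (manipulating the tree) could look roughly like the sketch below. To keep it self-contained, the tree is written by hand in the ESTree shape that a parser such as esprima produces, rather than parsed from source, and no code generator is called. The function names `mangle` and `walk` and the `v1, v2, ...` naming scheme are my own assumptions, and the renaming ignores scopes entirely, which a real tool must not do.

```javascript
// Hand-written tree in the ESTree shape a parser such as esprima produces for:
//   var totalPrice = 2; console.log(totalPrice);
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "totalPrice" },
          init: { type: "Literal", value: 2 },
        },
      ],
    },
    {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: {
          type: "MemberExpression",
          computed: false,
          object: { type: "Identifier", name: "console" },
          property: { type: "Identifier", name: "log" },
        },
        arguments: [{ type: "Identifier", name: "totalPrice" }],
      },
    },
  ],
};

// Minimal depth-first traversal; calls visit(node, parent, key) for every typed node.
function walk(node, visit, parent = null, key = null) {
  if (!node || typeof node !== "object") return;
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit, parent, key));
    return;
  }
  if (typeof node.type === "string") visit(node, parent, key);
  for (const k of Object.keys(node)) walk(node[k], visit, node, k);
}

// Rename declared variables to vague but still readable names (v1, v2, ...).
function mangle(root) {
  const renamed = new Map();
  // Pass 1: collect names introduced by variable declarations.
  walk(root, (node) => {
    if (node.type === "VariableDeclarator" && node.id.type === "Identifier" && !renamed.has(node.id.name)) {
      renamed.set(node.id.name, "v" + (renamed.size + 1));
    }
  });
  // Pass 2: rewrite identifiers, skipping non-computed property names such as
  // the `log` in console.log.
  walk(root, (node, parent, key) => {
    const isPropertyName =
      parent !== null && parent.type === "MemberExpression" && key === "property" && !parent.computed;
    if (node.type === "Identifier" && !isPropertyName && renamed.has(node.name)) {
      node.name = renamed.get(node.name);
    }
  });
  return renamed;
}

const nameMap = mangle(ast);
console.log(nameMap); // original name -> new name
```

Comments never reach the tree in the first place unless the parser is asked to attach them, so regenerating source from the mangled tree with a pretty-printing generator gives you indented code without comments.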

Finding comments in HTML

I have an HTML file and within it there may be Javascript, PHP and all this stuff people may or may not put into their HTML file.
I want to extract all comments from this html file.
I can point out two problems in doing this:
What is a comment in one language may not be a comment in another.
In JavaScript, the remainder of a line is commented out using the // marker. But URLs also contain //, so I may well eliminate parts of URLs if I simply replace // and the remainder of the line with nothing.
So this is not a trivial problem.
Is there anywhere some solution for this already available?
Has anybody already done this?
Problem 2: Isn't every URL quoted, as either "www.url.com" or 'www.url.com', when you write it in either language? I'm not sure. If that's the case, then all you have to do is parse the code and check whether a quote mark precedes the slashes to know if it's a real URL or just a comment.
Look into parser generators like ANTLR which has grammars for many languages and write a nesting parser to reliably find comments. Regular expressions aren't going to help you if accuracy is important. Even then, it won't be 100% accurate.
Problem 3: what looks like a comment in a language is not always a comment in that language.
<textarea><!-- not a comment --></textarea>
<script>var re = /[/*]not a comment[*/]/, str = "//not a comment";</script>
Problem 4, a comment embedded in a language may not obviously be a comment.
<button onclick="// this is a comment//
Problem 5, what is a comment may depend on how the browser is configured.
<noscript><!-- </noscript> Whether this is a comment depends on whether JS is turned on -->
<!--[if IE 8]>This is a comment, except on IE 8<![endif]-->
I had to solve this problem partially for contextual templating systems that elide comments from source code to prevent leaking software implementation details.
https://github.com/mikesamuel/html-contextual-autoescaper-java/blob/master/src/tests/com/google/autoesc/HTMLEscapingWriterTest.java#L1146 shows a testcase where a comment is identified in JavaScript, and later testcases show comments identified in CSS and HTML. You may be able to adapt that code to find comments. It will not handle comments in PHP code sections.
From your wording, it seems you are considering an approach based on regular expressions. That is painful to do on the whole file. Instead, try using tools to separate the interesting text from the uninteresting text, and then work on what is left in your sieve according to your keep/discard criteria. Have a look at HTML::Tree and TreeBuilder; they could be very useful for dealing with the HTML markup.
I would convert the HTML file into a character array and parse it. You can detect key strings like "<", "--" ,"www", "http", as you move forward and either skip or delete those segments.
The start/end indices will have to be identified properly, which is a challenge but you will have full power.
There are also other ways to simplify the process if performance is not a problem. For example, all tags can be grabbed with XML::Twig and the string can be parsed to detect JS comments.
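For the JavaScript part, the character-by-character idea could be sketched as follows. This is a simplification under stated assumptions: it only understands quoted strings, line comments and block comments. It does not recognize regular-expression literals or template literals (so the `/[/*]not a comment[*/]/` case above would still fool it), and it knows nothing about HTML or PHP. The function name `findJsComments` is made up.

```javascript
// Scan JavaScript source one character at a time and return the comments found.
// Limitation of this sketch: regex literals and template literals are not
// recognized, so comment-like text inside them can still be misreported.
function findJsComments(src) {
  const comments = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    const next = src[i + 1];
    if (ch === '"' || ch === "'") {
      // Skip a string literal, honoring backslash escapes.
      const quote = ch;
      i++;
      while (i < src.length && src[i] !== quote) {
        if (src[i] === "\\") i++;
        i++;
      }
      i++; // step past the closing quote
    } else if (ch === "/" && next === "/") {
      // Line comment: runs to the end of the line.
      let end = src.indexOf("\n", i);
      if (end === -1) end = src.length;
      comments.push(src.slice(i, end));
      i = end;
    } else if (ch === "/" && next === "*") {
      // Block comment: runs to the closing */ (or to end of input if unterminated).
      let end = src.indexOf("*/", i + 2);
      end = end === -1 ? src.length : end + 2;
      comments.push(src.slice(i, end));
      i = end;
    } else {
      i++;
    }
  }
  return comments;
}

const sample =
  'var url = "http://www.url.com"; // real comment\n' +
  "/* block */ var s = '// not a comment';";
console.log(findJsComments(sample));
```

On the sample it reports the line comment and the block comment, and it leaves both the URL and the `// not a comment` string alone, because the scanner is inside a string literal when it sees those slashes.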

What is a good stand-alone JavaScript formatter for fixing missing semicolons?

I'm trying to retrofit/fix lots of legacy web code and unfortunately most of it is poorly formatted JavaScript. I'm looking for a batch/scriptable utility that can fix JavaScript that is missing semicolons at the end of executable statements.
I've tried the beautify-cl.js script with Rhino, but that does not add semicolons. In addition, I have tried JSTidy, thinking I could modify it to be scriptable, but it strips all comments. Considering we have something like 2000-3000 files, any solution has to be scriptable.
The following topics were referenced, however none of the solutions were sufficient for various reasons:
Javascript Beautifier - Doesn't handle semicolon
Best source code formatter for Javascript? - Not scriptable
Any ideas/solutions? Thanks in advance.
I've found a winning combination in js-beautify and Google's Closure Linter:
# legacy.js is your poorly formatted JavaScript file, and will be modified in-place
js-beautify -s 2 -n -b end-expand -x -r legacy.js && fixjsstyle legacy.js
Explanation of js-beautify options:
-s 2: indent with two spaces
-n: ensure newline at end of file
-b end-expand: puts { braces at the end of the line, but always gives } braces their own line.
-x: unescape \xNN-escaped characters in strings
-r: make changes in-place
fixjsstyle, which is installed with the Closure Linter suite, makes changes in-place by default.
This pipeline retains comments (!), indents everything (mostly) how I like, adds semicolons where appropriate, and even changes double quotes to single quotes where feasible. Both commands can be given a list of files (e.g., **/*.js), instead of just one.
To install the required packages on Mac OS X:
npm install -g js-beautify
brew install closure-linter
Obviously you'll need to do this if you want to minify the files on deployment. Missing semicolons are probably the #1 reason JS files don't minify properly, so I understand your motivation.
Write a little Python (or whatever) script to run the file through jslint, then use the output from jslint to see which lines need semicolons, then spin through the js source and add them.
I think you can be fairly fearless here, because JavaScript implicitly adds the semicolons anyway.
Update: This set of tools may be what you are looking for. The "format" tab offers missing semicolon insertion.
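The "use the linter output to add them" step could be sketched like this. The report format (a 1-based `line` and a 0-based `column` per missing semicolon) is an assumption of mine, not jslint's actual output, so you would need a small adapter for whichever linter you run. The function name `insertSemicolons` is made up too.

```javascript
// Insert a semicolon at each reported position. `report` uses a hypothetical
// format: 1-based line number and 0-based column where the semicolon is missing.
function insertSemicolons(source, report) {
  const lines = source.split("\n");
  // Work from the last position to the first so earlier columns stay valid.
  const ordered = [...report].sort((a, b) => b.line - a.line || b.column - a.column);
  for (const { line, column } of ordered) {
    const text = lines[line - 1];
    lines[line - 1] = text.slice(0, column) + ";" + text.slice(column);
  }
  return lines.join("\n");
}

// Invented legacy snippet and the positions a linter might report for it.
const legacy = "var a = 1\nvar b = a + 1 // keep this comment\nalert(b)";
const warnings = [
  { line: 1, column: 9 },
  { line: 2, column: 13 },
  { line: 3, column: 8 },
];
console.log(insertSemicolons(legacy, warnings));
```

Because only the reported positions are touched, comments and the existing layout survive unchanged.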
If you use JavaScript Utility V2 at http://jsutility.pjoneil.net and use the formatting function, it will automatically replace missing semicolons.
In addition, if you use the compaction function, it will also replace missing semicolons so that the compaction will not cause any errors.
You shouldn't be doing a mass update on a lot of legacy code for the sole purpose of inserting semicolons. That's a classic case of "doing it wrong".
How would you test the results?
How would you ensure that no "functionality" (existing as a side effect of a bug caused by a missing semicolon) is lost?
What do you think adding semicolons to all these files will get you, besides larger files (I'm not knocking the use of semicolons) and massive amounts of untested code changes?
As gumbo said, use jslint. I would use it on the files as you edit them in your day-to-day work. As you edit these files, presumably you will be testing your changes at that time. That would be the ideal time to go crazy with semicolon insertion.
Also, if you're concerned about keeping 2000-3000 legacy JavaScript files alive and supported, you've got far bigger problems than semicolons.
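To make the worry about "functionality" that depends on a missing semicolon concrete, here is a contrived example. The snippet is invented, and `new Function` is used only so both variants can run side by side.

```javascript
// The same legacy snippet twice; the only difference is the semicolon after the
// first function expression.
const withoutSemicolon = [
  "var log = [];",
  "var a = function () { log.push('a'); return function () { log.push('inner'); }; }",
  "(function () { log.push('iife'); })();",
  "return log;",
].join("\n");

const withSemicolon = [
  "var log = [];",
  "var a = function () { log.push('a'); return function () { log.push('inner'); }; };",
  "(function () { log.push('iife'); })();",
  "return log;",
].join("\n");

// new Function wraps each snippet in a function body so that `return log` works.
const before = new Function(withoutSemicolon)();
const after = new Function(withSemicolon)();

console.log(before); // entries: a, inner
console.log(after);  // entries: iife
```

Without the semicolon, the parenthesized function on the next line is parsed as an argument list, so the first function is called immediately and the intended IIFE never runs. Adding the semicolon is the "correct" fix, but it changes what the page does, which is exactly why a mass edit needs testing.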
If http://jsutility.pjoneil.net throws too many errors (and is unable to format the code), you may try compressing it first with http://refresh-sf.com/yui/ (which will add the missing semicolons) and then go back to the pjoneil.net formatter to obtain pretty code with semicolons.
If you are using Visual Studio Code, then the Prettier formatter is the way to go:
You simply install it and then press the keyboard shortcut for Format Document. The JS file is reformatted, and any missing semicolons are filled in automatically.

