Can Google Closure-generated code with advanced settings be reverse engineered? - javascript

Does anybody know whether JavaScript code generated by Google Closure with advanced settings can be reverse engineered?
Google Closure renames most of the JS variable and function names, so I'm curious to know whether it's a good option for protecting code from being stolen.
Thanks

Yes, it can easily be reversed. Google Closure is a code optimizer, which is almost the opposite of a code obfuscator. That is not obvious to everyone, because some of the things it does actually obfuscate your code a little.
I'm talking about the variable and function renaming, and the whitespace and comment removal.
But it is not meant to protect. For instance, the new names it generates are always the same (deterministic), because it only renames variables and functions to replace them with shorter ones. It does not care about protection. It has nothing that changes the control flow, which is what you'll want if you seek protection, and nothing that obfuscates strings and other literals either. On the contrary, it uses techniques such as constant folding and constant propagation, which actually make your code simpler and easier to follow.
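To illustrate the determinism point, here is a minimal sketch of the kind of short-name generator a minifier might use. It is hypothetical and not Closure Compiler's actual renaming pass; `shortName` and `renameAll` are names made up for this example. Because the mapping depends only on the order in which identifiers are met, compiling the same source twice gives the same names, which saves bytes but hides nothing.

```javascript
// Hypothetical sketch: a deterministic short-name generator of the kind a
// minifier might use. This is NOT Closure Compiler's real renaming pass.
const ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Maps 0 -> "a", 25 -> "z", 51 -> "Z", 52 -> "aa", 53 -> "ab", ...
function shortName(index) {
  let name = "";
  let n = index;
  do {
    name = ALPHABET[n % ALPHABET.length] + name;
    n = Math.floor(n / ALPHABET.length) - 1;
  } while (n >= 0);
  return name;
}

// Assigns names in first-seen order, so the same input always produces
// the same mapping: good for size, useless as protection.
function renameAll(identifiers) {
  const mapping = new Map();
  for (const id of identifiers) {
    if (!mapping.has(id)) mapping.set(id, shortName(mapping.size));
  }
  return mapping;
}

const ids = ["loopCounter", "score", "loopCounter", "updateScore"];
console.log([...renameAll(ids)]);
```

Anyone who sees `a`, `b` and `c` in the output can rename them back to meaningful names by hand once they work out what each one does; the renaming adds no secret.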
For protection, I really think your best option is JScrambler. They have a good range of source code transformations and code traps that are meant for protection.

Yes, it can be reversed. However, if your code compiled with ADVANCED_OPTIMIZATIONS has a lot of inlining, it can be much harder to reverse engineer. Secondly, since there is a lot of constant folding, the code becomes much harder to read, e.g.:
/**
* @type {number}
* @const
*/
var FLAG_A=0x4FFFFFF0;
/**
* @type {number}
* @const
*/
var FLAG_B=0x01;
/** @type {number} */
var flags=FLAG_A|FLAG_B;
console.log(flags,(flags&FLAG_A)>0);
compiles to:
console.log(1342177265,!0);
This makes the compiled code much harder to read and to figure out what it's doing. If you rely less on inheritance and strings and move towards bit flags, someone who reverse engineers the code will still get something that works, but it will be much harder for them to change or extend.
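The folding in the example above can be sketched on a toy expression tree. The node shapes (`num`, `name`, `bin`) and the `fold` function are hypothetical and far simpler than Closure Compiler's real passes; the point is only that propagation substitutes known constants and folding evaluates operators whose operands are all constant, which is how `flags` collapses to `1342177265` and the comparison to `true` (which Closure prints as `!0`).

```javascript
// Operators this toy folder knows how to evaluate at compile time.
const OPS = {
  "|": (a, b) => a | b,
  "&": (a, b) => a & b,
  ">": (a, b) => a > b,
};

// Hypothetical mini-AST: {type:"num", value} | {type:"name", name}
// | {type:"bin", op, left, right}. Not Closure's real representation.
function fold(node, constants) {
  switch (node.type) {
    case "num":
      return node;
    case "name":
      // Constant propagation: replace a known constant with its value.
      return constants.has(node.name)
        ? { type: "num", value: constants.get(node.name) }
        : node;
    case "bin": {
      const left = fold(node.left, constants);
      const right = fold(node.right, constants);
      // Constant folding: both operands are known, so evaluate now.
      if (left.type === "num" && right.type === "num" && node.op in OPS) {
        return { type: "num", value: OPS[node.op](left.value, right.value) };
      }
      return { type: "bin", op: node.op, left, right };
    }
    default:
      throw new Error("unknown node type: " + node.type);
  }
}

const ref = (n) => ({ type: "name", name: n });
const bin = (op, left, right) => ({ type: "bin", op, left, right });

const constants = new Map([
  ["FLAG_A", 0x4ffffff0],
  ["FLAG_B", 0x01],
]);

// var flags = FLAG_A | FLAG_B;
const flags = fold(bin("|", ref("FLAG_A"), ref("FLAG_B")), constants);
constants.set("flags", flags.value);

// (flags & FLAG_A) > 0
const check = fold(
  bin(">", bin("&", ref("flags"), ref("FLAG_A")), { type: "num", value: 0 }),
  constants
);
console.log(flags.value, check.value);
```

The names `FLAG_A` and `FLAG_B` disappear from the output entirely, which is what makes the compiled line hard to interpret without the original source.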

By using a JavaScript beautifier, you can make the code readable to some extent, but you cannot completely reverse engineer it.

Related

Type expressions in JavaScript

I have seen Google using a lot of Type Expressions in various APIs. Can someone explain to me what good they do? Are they only there to highlight/colorcode a certain functions, or ease readability? Or do they serve some actual purpose?
I'm a bit confused, as they're used together with commented-out code but differ from regular comments by using /** @type */ instead of /* @type */ (which does not color-code the comment).
Could someone give me the how's and why's of Type Expressions?
You will see these in a lot of places other than Google. They can be used by Closure Compiler to improve optimization, and they also provide a bit of type safety by warning when there are type errors.
But more generally, these are JSDoc tags, which allow you to annotate and describe JavaScript code. The tags can document what types of data your code expects to receive and what it will return. This can then be used to automatically produce documentation about your code, and can also be used by text editors and IDEs to give feedback while you work. Many popular editors, such as Sublime Text and Visual Studio, support this.
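As a sketch, here is an ordinary function annotated with standard JSDoc tags (`lineTotal` and `labels` are just illustrative names). At runtime the comments are ignored entirely; a type-aware tool such as Closure Compiler with type checking enabled, or an editor that understands JSDoc, can use them to document the function and to warn about a call like `lineTotal("2", 4)`.

```javascript
/**
 * Returns the total price of an order line.
 * @param {number} unitPrice Price of one item.
 * @param {number} quantity Number of items.
 * @return {number} unitPrice multiplied by quantity.
 */
function lineTotal(unitPrice, quantity) {
  return unitPrice * quantity;
}

/** @type {Array<string>} */
const labels = ["draft", "sent"];

console.log(lineTotal(2.5, 4), labels.length);
```

Note that each annotation comment opens with `/**`: tools that read JSDoc look only at comments of that form and treat a plain `/*` comment as ordinary text, which is likely also why editors highlight the two differently.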
Lots of information here: http://usejsdoc.org/index.html

Semi-obfuscate/uglify JavaScript

I know about JS minifiers and obfuscators. I was wondering if there is any existing tool (or any fast-to-code solution) to partially obfuscate JavaScript. By partially I mean that it should become difficult to read, but not appear uglified/minified. It should keep indentation, but lose comments, and partially change variable names, making them unclear without converting them to "a, b, c" like an obfuscator.
The purpose of this could be to take an explicit and reusable code and make it implicit and difficult to be reused by other people, without making it impossible to work with for yourself.
Any idea where to start to achieve this? Maybe by editing an existing obfuscator?
[This answer is a direct response to OP's request].
Semantic Designs JavaScript obfuscator will do what you want, but you'll need two passes.
On the first pass, run it as an obfuscator; it will rename identifiers (although you can control how much or how that is done) and strip whitespace and comments. If you limit its ability to rename identifiers, you lose some of the strength of the obfuscator, but that's your choice.
On the second pass, run it as a prettyprinter; it will introduce nice indentation again.
(In fact, the idea for obfuscation came from building a prettyprinter; if you can print pretty, surely it is easy to print ugly.)
From the point of view of working with the code, you are better off working with your master copy any way you like, complete with your indentation and nice commentary as documentation. When you are ready to obfuscate, you run the obfuscator and ship the obfuscated result. Errors reported in the obfuscated result that involve obfuscated names can be mapped back to the original names, using the map of obfuscated <--> original names produced during the obfuscation step.
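That last step, mapping obfuscated names in an error report back to the originals, can be sketched as below. The map format (a `Map` from original name to obfuscated name) and `deobfuscateMessage` are assumptions for illustration, not the format any particular obfuscator writes.

```javascript
// Rewrites identifier-like tokens in an error message back to their
// original names, given the original -> obfuscated map from obfuscation.
function deobfuscateMessage(message, originalToObfuscated) {
  // Build the inverse map: obfuscated name -> original name.
  const inverse = new Map();
  for (const [original, obfuscated] of originalToObfuscated) {
    inverse.set(obfuscated, original);
  }
  // Replace whole identifier tokens that appear in the inverse map;
  // everything else in the message is left untouched.
  return message.replace(/[A-Za-z_$][\w$]*/g, (token) =>
    inverse.has(token) ? inverse.get(token) : token
  );
}

// Hypothetical name map produced during obfuscation.
const nameMap = new Map([
  ["loopCounter", "x3q"],
  ["updateScore", "k9z"],
]);
console.log(deobfuscateMessage("TypeError: x3q is undefined in k9z", nameMap));
```

Matching whole identifier tokens, rather than doing a plain substring replace, avoids rewriting `x3q` inside a longer name such as `x3qa`.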
This is a product of my company. I'd provide a link, but SO hates it when I do that, so you'll have to find it via my bio or by googling.
PS: It works exactly as @georg suggests, by parsing to an AST, mangling, and prettyprinting. It doesn't use esprima.
I'm not aware of a tool that would meet your specific requirements, but it seems to be relatively easy to create, given that the vital parts already exist.
parse the source into an AST, using esprima or similar
manipulate the tree in the way you want (eg. remove comments, mangle identifiers etc)
rebuild the source from the tree using escodegen
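A dependency-free sketch of steps 2 and 3 follows. It works on a hand-written tree in the ESTree shape that esprima produces, so that it runs on its own; in practice the tree would come from esprima and the output from escodegen, which handles far more node types and operator precedence. `renameIdentifiers`, the toy `generate` function and the replacement names are hypothetical. A real tool would also need scope analysis, because blindly renaming every `Identifier` also renames non-computed property names such as `obj.prop`.

```javascript
// Step 2: rename every Identifier node whose name appears in `renames`.
function renameIdentifiers(node, renames) {
  if (Array.isArray(node)) {
    for (const child of node) renameIdentifiers(child, renames);
  } else if (node !== null && typeof node === "object") {
    if (node.type === "Identifier" && renames.has(node.name)) {
      node.name = renames.get(node.name);
    }
    for (const key of Object.keys(node)) renameIdentifiers(node[key], renames);
  }
  return node;
}

// Step 3 (toy version): regenerate source for a handful of node types.
// Nested binary expressions are always parenthesized so output stays correct.
function generate(node) {
  const sub = (n) => (n.type === "BinaryExpression" ? "(" + generate(n) + ")" : generate(n));
  switch (node.type) {
    case "Program":
      return node.body.map((s) => generate(s)).join("\n");
    case "VariableDeclaration":
      return node.kind + " " + node.declarations.map((d) => generate(d)).join(", ") + ";";
    case "VariableDeclarator":
      return generate(node.id) + (node.init ? " = " + generate(node.init) : "");
    case "BinaryExpression":
      return sub(node.left) + " " + node.operator + " " + sub(node.right);
    case "Identifier":
      return node.name;
    case "Literal":
      return JSON.stringify(node.value);
    default:
      throw new Error("toy generator does not handle " + node.type);
  }
}

// Hand-written ESTree for: var total = price * 2;
const ast = {
  type: "Program",
  body: [{
    type: "VariableDeclaration",
    kind: "var",
    declarations: [{
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "total" },
      init: {
        type: "BinaryExpression",
        operator: "*",
        left: { type: "Identifier", name: "price" },
        right: { type: "Literal", value: 2 },
      },
    }],
  }],
};

const renames = new Map([["total", "vq1"], ["price", "vq2"]]);
console.log(generate(renameIdentifiers(ast, renames))); // var vq1 = vq2 * 2;
```

Because the renaming is driven by an explicit map, you can choose names that are unclear but not single letters, which is what the question asks for.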

How to obfuscate variable names in JavaScript?

I'm an artist that's written a simple game in Javascript. Yah! But go easy on me because I bruise like a peach!
I'm looking into making the game difficult to cheat at. So code obfuscation will make it difficult to cheat, right? Difficult, not impossible. I realise that, and could accidentally open a can of worms here...
Essentially, I'm looking for an online tool that renames variables; and don't say search and replace in TextPad :).
For example using http://packer.50x.eu/ on one line of code
var loopCounter = 0;
we get the result:
eval(function(p,a,c,k,e,d){e=function(c){return c};if(!''.replace(/^/,String)){while(c--){d[c]=k[c]||c}k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c])}}return p}('1 2=0;',3,3,'|var|loopCounter'.split('|'),0,{}))
The above looks like a mess, which is great; but it's quite easy to pick out English words like loopCounter. I would have expected it to make variable names obscure (single letters? words without nouns? names that look very similar?). Or should that have been my task anyway, as part of writing the code? Or is it a waste of time trying to disguise variable names, since a variable declaration is preceded by var and therefore there's no point in disguising it?
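The reason `loopCounter` stays visible is that this kind of packer compresses rather than renames: it stores the original words in a dictionary and rebuilds the source with `eval` at load time. The sketch below reproduces that decoding step for the output above, simplified to decimal indices (the packer's own `e` function can encode indices in other bases); `unpack` is a hypothetical name. Anyone can run the same step, or simply replace `eval` with `console.log`, to get the original code back.

```javascript
// Rebuilds source from a packed payload and its word list, mirroring the
// decoder passed to eval() above: each word token that is a dictionary
// index is replaced by the stored word, or kept as-is if that slot is empty.
function unpack(payload, words) {
  return payload.replace(/\b\w+\b/g, (token) => {
    const word = words[parseInt(token, 10)];
    return word ? word : token;
  });
}

const words = "|var|loopCounter".split("|"); // ["", "var", "loopCounter"]
console.log(unpack("1 2=0;", words)); // var loopCounter=0;
```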
After a lot of searching (and links to the above) I found this which allows obfuscated string variables. And that is what I was after.
There are a few online tools available for this: JavaScript Compressor, and then there's JavaScript Minifier, which you can also use for large images. Otherwise you could just google for some offline tools; they should be easy to find.
You could use the Javascript Obfuscator... your code will be very difficult to decode!
Hope it helps! ^_^

Coffeescript Static Analysis / Static Typechecking - Roadblocks

I think CoffeeScript is an awesome language! I was looking for projects, issues or features that add static analysis to CoffeeScript. However, after some searching I found that the CoffeeScript FAQ and this page suggest that static analysis might not be viable.
I was wondering whether there is a fundamental issue in implementing static analysis / static type checking in CoffeeScript, which would explain why something of this sort does not already exist in the compiler?
Also, is it something that is not possible for non-trivial checks but might work for straightforward analysis? By straightforward I mean checking trivial things like whether the user has defined a function twice with the same name (in a class) or at the top level (or perhaps at the top level across a collection of related .coffee files).
I would appreciate if anyone could please point out some examples that show why implementing static analysis / type checking is not straightforward / possible / worth spending time on?
Thank you very much!
This answer is a bit of a brain dump since I'm interested in this also. Hope it helps.
I use the Google Closure Compiler to statically analyze the code that CoffeeScript generates. It has a really good static analyzer, and I'm not sure if there's a good reason to reinvent the wheel here. The easy way is to just write the annotations by hand:
###*
* @param {number} x
* @param {number} y
* @return {number}
###
adder = (x, y) -> x + y
It's a bit verbose, but on the other hand you're borrowing the static analysis abilities of the closure compiler which is really powerful and is able to check a lot. I actually write type annotations in a slightly more concise way, then have a script to rewrite the coffee file. My code ends up looking like this:
#! {number} x {number} y @return {number}
adder = (x, y) -> x + y
I'm sure you can see that the rewriter is pretty straightforward.
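Here is one way such a rewriter might look, written in JavaScript so it can run under Node. The `#!` syntax comes from the answer, but the parsing rules (a `{type} name` pair per parameter, an optional `@return {type}`) and the name `expandAnnotations` are assumptions; the answerer's actual script is not shown.

```javascript
// Hypothetical sketch: expands lines like
//   #! {number} x {number} y @return {number}
// into a CoffeeScript block comment carrying Closure-style JSDoc tags.
function expandAnnotations(coffeeSource) {
  return coffeeSource
    .split("\n")
    .map((line) => {
      // Only "#!" followed by "{type}" counts, so shebang lines are left alone.
      const match = /^(\s*)#!\s*(\{.*)$/.exec(line);
      if (!match) return line;
      const [, indent, body] = match;
      const [paramsPart, returnPart] = body.split("@return");
      const out = [indent + "###*"];
      // Each "{type} name" pair before @return becomes a @param tag.
      for (const [, type, name] of paramsPart.matchAll(/\{([^}]+)\}\s*(\w+)/g)) {
        out.push(`${indent}* @param {${type}} ${name}`);
      }
      const ret = returnPart && /\{([^}]+)\}/.exec(returnPart);
      if (ret) out.push(`${indent}* @return {${ret[1]}}`);
      out.push(indent + "###");
      return out.join("\n");
    })
    .join("\n");
}

const source = "#! {number} x {number} y @return {number}\nadder = (x, y) -> x + y";
console.log(expandAnnotations(source));
```

The output is the verbose `###* ... ###` form shown earlier, which CoffeeScript passes through as a `/** ... */` comment for the Closure Compiler to read.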
A quick note before I move on. Be sure to compile your code with -b (bare) if you're running it through the closure compiler. The closure compiler is pretty good, but it's not smart enough to do data flow analysis. CoffeeScript wraps your code in an anonymous function by default, which will trip up the compiler.
Another option along the same path (this would break compatibility with CoffeeScript, but would be a lot cooler) would be to have the Coffee compiler compile something like this:
adder = (number x, number y): number -> x + y
into JS like this:
/**
* @param {number} x
* @param {number} y
* @return {number}
*/
var adder = function(x, y) {
return x + y;
};
which could then be fed into the Closure Compiler at compile time; if there were no errors, the compiler could then strip all the comments out.
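A minimal sketch of that translation step, assuming only the single-line form shown above (`name = (type arg, ...): returnType -> expression`) with simple type names. `compileTyped` is hypothetical, and copying the body through verbatim only works when the CoffeeScript expression is also valid JavaScript, as `x + y` is; a real implementation would live inside the CoffeeScript compiler.

```javascript
// Hypothetical translator for the proposed typed syntax. Handles only
// "name = (type arg, ...): returnType -> expression" on one line.
function compileTyped(line) {
  const m = /^(\w+)\s*=\s*\(([^)]*)\)\s*:\s*(\w+)\s*->\s*(.+)$/.exec(line.trim());
  if (!m) throw new Error("unsupported syntax: " + line);
  const [, name, paramList, returnType, body] = m;
  // "number x, number y" -> [{type: "number", argName: "x"}, ...]
  const params = paramList
    .split(",")
    .map((p) => p.trim())
    .filter(Boolean)
    .map((p) => {
      const [type, argName] = p.split(/\s+/);
      return { type, argName };
    });
  return [
    "/**",
    ...params.map((p) => ` * @param {${p.type}} ${p.argName}`),
    ` * @return {${returnType}}`,
    " */",
    `var ${name} = function(${params.map((p) => p.argName).join(", ")}) {`,
    `  return ${body};`,
    "};",
  ].join("\n");
}

console.log(compileTyped("adder = (number x, number y): number -> x + y"));
```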
Indeed, this guy appeared to be doing exactly this. Sadly, his work seems to be in an incomplete state.
In all of these cases, we defer the hard work - static typechecking - to the closure compiler. If you don't want to do this, I'd understand, but it'd be tough to convince me that it's worthwhile to build a whole new static analysis tool from scratch. :)
EDIT a year later: I just use TypeScript these days. :)
I'm not an expert in CoffeeScript, so this might be completely the wrong answer, but it basically boils down to this: CoffeeScript is a very expressive language, with most of its semantics determined dynamically (and possibly some strange edge cases). This is in stark contrast to languages like Standard ML, which have much more strictly defined semantics. In general, static analysis of higher-order languages is considered very hard: on real higher-order programs (Haskell, ML, and especially JavaScript, because of things like eval) it is hard because the flow of control is much more flexible. In fact, static analysis solutions for higher-order languages have really only been explored within the past twenty years or so. (Notably, see Matt Might's tutorial-style article on CFA, control-flow analysis.)
Basically, the reasons are this:
To do analysis, you have to deal with the problem of the expressive semantics that come from the flow of control you get by slamming around higher-order functions.
To do typing: these languages typically have a much richer set of available types. For example, there are often situations where, if you try to assign a static type to a variable in Ruby (as in C, Java, ML, etc.), you get an error, yet because certain paths of your program are never executed, it's all fine. Along with that, languages like Ruby add a plethora of implicit type conversions that are really used to do cool programming. The notable work in this area that I'm familiar with (dynamic analysis of static types for Ruby) comes from some of the people I work with, but there are certainly other examples.
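Both reasons can be illustrated with a few lines of JavaScript, the language CoffeeScript compiles to (the names are made up for this example). In `applyTwice` the function being called is whatever value reaches `f`, so an analyzer must track function values through the whole program. In `squareArea`, `size` can become a string on a path that is never taken if callers only pass squares, so a checker that gives `size` one static type would have to reject code that works in practice.

```javascript
// 1. Higher-order control flow: the callee of f(...) is not visible in the
// syntax; it depends on which function value flows into the parameter f.
function applyTwice(f, x) {
  return f(f(x));
}

const ops = { inc: (n) => n + 1, dbl: (n) => n * 2 };

// opName might come from user input, so a sound analysis has to assume f
// could be any function stored in ops (or anything else that reaches run).
function run(opName, x) {
  return applyTwice(ops[opName], x);
}

// 2. Path-sensitive typing: size is a number unless the non-square branch
// runs; if callers only ever pass squares, that branch never executes.
function squareArea(shape) {
  let size = shape.size;
  if (shape.kind !== "square") size = "unknown";
  return size * size;
}

console.log(run("inc", 3), run("dbl", 3), squareArea({ kind: "square", size: 3 }));
```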
Basically, the language is used in a much more dynamic way, with much more expressive semantics, and reasoning about that statically is much harder and prone to imprecision. The current approach to this is starting to look hybrid: you statically analyze part of a program, and then also require the programmer to give some test cases for a more refined analysis.
I hope this somewhat answers your question; again, sorry I can't directly address it as it applies to CoffeeScript, but there's a lot of work going on in analyzing things like JavaScript right now. I'll note that some of the real problems with JavaScript come from its strange semantics: prototypal inheritance is hard to reason about, and especially eval()! Typically, program analyses for these languages impose certain restrictions (for example, throwing out eval completely!) to make the analysis more feasible.

lightweight javascript to javascript parser

How would I go about writing a lightweight JavaScript-to-JavaScript parser? Something simple that can convert some snippets of code.
I would like to basically make the internal scope objects in functions public.
So something like this
var outer = 42;
window.addEventListener('load', function() {
var inner = 42;
function magic() {
var in_magic = inner + outer;
console.log(in_magic);
}
magic();
}, false);
Would compile to
__Scope__.set('outer', 42);
__Scope__.set('console', console);
window.addEventListener('load', constructScopeWrapper(__Scope__, function(__Scope__) {
__Scope__.set('inner', 42);
__Scope__.set('magic',constructScopeWrapper(__Scope__, function _magic(__Scope__) {
__Scope__.set('in_magic', __Scope__.get('inner') + __Scope__.get('outer'));
__Scope__.get('console').log(__Scope__.get('in_magic'));
}));
__Scope__.get('magic')();
}), false);
Demonstration Example
The motivation behind this is to serialize the state of functions and closures and keep them synchronized across different machines (client, server, multiple servers). For this I would need a representation of [[Scope]].
Questions:
Can I do this kind of compiler without writing a full JavaScript -> (slightly different) JavaScript compiler?
How would I go about writing such a compiler?
Can I re-use existing js -> js compilers?
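The compiled output above relies on `__Scope__` and `constructScopeWrapper`, which the question does not define. The sketch below is one possible reading, using a hypothetical `Scope` class whose `get` walks outward through enclosing scopes. It treats `set` as a declaration in the current scope; assigning to a variable that lives in an outer scope would need a separate operation that finds the owning scope first.

```javascript
// Hypothetical runtime for the compiled form: one Scope object per
// function invocation, linked to the scope where the function was created.
class Scope {
  constructor(parent = null) {
    this.parent = parent;
    this.vars = new Map();
  }

  // Declares (or overwrites) a variable in this scope.
  set(name, value) {
    this.vars.set(name, value);
    return value;
  }

  // Looks a name up lexically, innermost scope first.
  get(name) {
    for (let scope = this; scope !== null; scope = scope.parent) {
      if (scope.vars.has(name)) return scope.vars.get(name);
    }
    throw new ReferenceError(name + " is not defined");
  }
}

// Wraps fn so each call receives a fresh child of the defining scope as its
// first argument, followed by whatever arguments the caller passed.
function constructScopeWrapper(parentScope, fn) {
  return function (...args) {
    return fn.call(this, new Scope(parentScope), ...args);
  };
}

// Usage, mirroring the compiled example without relying on window:
const __Scope__ = new Scope();
__Scope__.set("outer", 42);
__Scope__.set("console", console);
const onLoad = constructScopeWrapper(__Scope__, function (__Scope__) {
  __Scope__.set("inner", 42);
  __Scope__.set("magic", constructScopeWrapper(__Scope__, function _magic(__Scope__) {
    __Scope__.set("in_magic", __Scope__.get("inner") + __Scope__.get("outer"));
    __Scope__.get("console").log(__Scope__.get("in_magic"));
  }));
  __Scope__.get("magic")();
});
onLoad(); // logs 84
```

With this design the chain of `Scope` objects is an explicit stand-in for `[[Scope]]` that could be walked and serialized, although the function values stored in it would still need their own serialization scheme.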
I don't think your task is easy or short, given that you want to access and restore all the program state. One issue is that you might have to capture the program state at any moment during a computation, right? That means the example as shown isn't quite right: it captures state roughly before execution of that code (except that you've precomputed the sum that initializes magic, and that won't happen before the code runs for the original JavaScript). I assume you might want to capture the state at any instant during execution.
The way you've stated your problem, you want a JavaScript parser in JavaScript.
I assume you are imagining that your existing JavaScript code J, includes such a JavaScript parser and whatever else is necessary to generate your resulting code G, and that when J starts up it feeds copies of itself to G, manufacturing the serialization code S and somehow loading that up.
(I think G is pretty big and hoary if it can handle all of Javascript)
So your JavaScript image contains J, big G, S and does an expensive operation (feed J to G) when it starts up.
What I think might serve you better is a tool G that processes your original JavaScript code J offline, and generates program state/closure serialization code S (to save and restore that state) that can be added to/replace J for execution. J+S are sent to the client, who never sees G or its execution. This decouples the generation of S from the runtime execution of J, saving on client execution time and space.
In this case, you want a tool that will make generation of such code S easiest. A pure JavaScript parser is a start but isn't likely to be enough; you'll need symbol table support to know which function code is connected to a function call F(...), and which variable definition in which scope corresponds to assignments or accesses to a variable V. You may need to actually modify your original code J to insert points of access where the program state can be captured. You may need flow analysis to find out where some values went. Insisting on doing all of this in JavaScript narrows your range of solutions.
For these tasks, you will likely find a program transformation tool useful. Such tools contain parsers for the language of interest, build ASTs representing the program, enable the construction of identifier-to-definition maps ("symbol tables"), can carry out modifications to the ASTs representing insertion of access points, or synthesis of ASTs representing your demonstration example, and then regenerate valid JavaScript code containing the modified J and the additions S.
Of all the program transformation systems that I know about (which includes all the ones at the Wikipedia site), none are implemented in JavaScript.
Our DMS Software Reengineering Toolkit is such a program transformation system, offering all the features I just described. (Yes, it's big and hoary; it has to be to handle the complexities of real computer languages.) It has a JavaScript front end that contains a complete JavaScript parser to ASTs, and the machinery to regenerate JavaScript code from modified or synthesized ASTs. (Also big and hoary; good thing that hoary + hoary is still just hoary.) Should it be useful, DMS also provides support for building control and dataflow analysis.
If you want something with a simple interface, you could try node-burrito: https://github.com/substack/node-burrito
It generates an AST using the uglify-js parser and then recursively walks the nodes. All you have to do is give a single callback which tests each node. You can alter the ones you need to change, and it outputs the resulting code.
I'd try to look for an existing parser to modify. Perhaps you could adapt JSLint/JSHint?
There is a problem with the rewriting above: you're not hoisting the initialization of magic to the top of the scope.
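A small sketch of the hoisting issue, using a plain `Map` as a stand-in for `__Scope__` (the function names are hypothetical). In the question's own example `magic()` is called after its declaration, so the problem only appears when a function is called earlier in its scope than it is declared, which JavaScript allows for function declarations.

```javascript
// Original code: calling magic before its declaration works, because
// function declarations are hoisted to the top of their scope.
function original() {
  const result = magic();
  function magic() {
    return "ok";
  }
  return result;
}

// Naive rewrite that keeps statement order: the call runs before set(),
// so get() returns undefined and calling it throws a TypeError.
function naiveRewrite() {
  const scope = new Map();
  const result = scope.get("magic")();
  scope.set("magic", function magic() {
    return "ok";
  });
  return result;
}

// Hoisting-aware rewrite: set() calls for function declarations are
// emitted first, mirroring what the engine does for the original code.
function hoistedRewrite() {
  const scope = new Map();
  scope.set("magic", function magic() {
    return "ok";
  });
  return scope.get("magic")();
}

console.log(original()); // ok
try {
  naiveRewrite();
} catch (e) {
  console.log(e instanceof TypeError); // true
}
console.log(hoistedRewrite()); // ok
```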
There are a number of projects out there that parse JavaScript.
Crockford's Pratt parser, which works well on JavaScript that fits within "The Good Parts" and less well on other JS.
The es-lab parser based on OMeta, which handles the full grammar, including a lot of corner cases that Crockford's parser misses. It may not perform as well as Crockford's.
The Narcissus parser and evaluator. I don't have much experience with this.
There are also a number of high-quality lexers for JavaScript that let you manipulate JS at the token level. This can be tougher than it sounds though since JavaScript is not lexically regular, and predicting semicolon insertion is difficult without a full parse.
My es5-lexer is a carefully constructed and efficient lexer for ECMAScript 5 that provides the ability to tokenize JavaScript. It is heuristic where JavaScript's grammar is not lexically regular, but the heuristic is very good. It also provides a means to transform a token stream so that an interpreter is guaranteed to interpret it the way the lexer interpreted the tokens. So if you don't trust your input, you can still be sure that the interpretation underlying the security transformations is sound, even if it is not correct according to the spec for some bizarre inputs.
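To make the "not lexically regular" point concrete: whether `/` begins a regular expression or is the division operator depends on the preceding token, which a lexer can only guess at without a parser. The sketch below is a simplified, hypothetical version of the common previous-token heuristic, not es5-lexer's actual rules. It gets `if (ok) /x/.test(s)` wrong, because `)` usually ends an expression but here ends an `if` condition, and it also misreads postfix `a++ / 2`.

```javascript
// Keywords after which an operand (and therefore a regex) may start.
const KEYWORDS_BEFORE_OPERAND = new Set([
  "return", "typeof", "instanceof", "in", "new", "delete",
  "void", "throw", "case", "do", "else",
]);

// Guesses whether a "/" following prevToken starts a regex literal (true)
// or is the division operator (false). prevToken is null at start of input.
function slashStartsRegex(prevToken) {
  if (prevToken === null) return true;
  if (KEYWORDS_BEFORE_OPERAND.has(prevToken)) return true;
  // Identifiers, numbers and string literals end an expression.
  if (/^[A-Za-z_$][\w$]*$/.test(prevToken)) return false;
  if (/^\.?\d/.test(prevToken)) return false;
  if (/^["']/.test(prevToken)) return false;
  // ")" "]" "}" usually end an expression; this is where the guess can fail
  // (after an if condition, or after a block rather than an object literal).
  if (prevToken === ")" || prevToken === "]" || prevToken === "}") return false;
  // Operators and punctuation such as "=", "(", ",", ";" expect an operand.
  return true;
}

console.log(slashStartsRegex("total"), slashStartsRegex("="), slashStartsRegex("return"));
```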
Your problem seems to be in the same family as the problems solved by JS obfuscators and JS compressors: they, like you, need to be able to parse the JS and reformat it into an equivalent script.
There was a good discussion on obfuscators here, and a possible solution to your problem could be to leverage the parser and generator parts of one of the FOSS versions.
One callout: your example code does not take into account the scopes of the variables you want to set/get, and that will eventually become a problem that you will have to solve.
Addition
Given the scope problem for functions defined in closures, you are unlikely to be able to solve this as a static parsing problem, because the scope variables outside the closure will have to be imported/exported to resolve, save and re-instantiate the scope. Hence you may need to dig into the evaluation engine itself, perhaps taking the V8 engine and hacking the interpreter, assuming that you do not need this to be generic across all script engines and can tie it to a single implementation that you control.
