How to obfuscate variable names in JavaScript?

I'm an artist who has written a simple game in JavaScript. Yay! But go easy on me because I bruise like a peach!
I'm looking into making it difficult to cheat at the game. So code obfuscation will make it difficult to cheat, right? Difficult, not impossible. I realise that, and I could accidentally be opening a can of worms here...
Essentially, I'm looking for an online tool that renames variables; and don't say search and replace in TextPad :).
For example, using http://packer.50x.eu/ on one line of code
var loopCounter = 0;
we get the result:
eval(function(p,a,c,k,e,d){e=function(c){return c};if(!''.replace(/^/,String)){while(c--){d[c]=k[c]||c}k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c])}}return p}('1 2=0;',3,3,'|var|loopCounter'.split('|'),0,{}))
The above looks like a mess, which is great, but it's still quite easy to pick out English words like loopCounter. I would have expected it to make variable names obscure (single letters? words without nouns? names that look very similar?). Or should that have been my task anyway as part of writing the code? Or is it a waste of time trying to obscure variable names, since a variable declaration is preceded by var and therefore there's no point in disguising it?

After a lot of searching (and links to the above) I found this which allows obfuscated string variables. And that is what I was after.

There are a few online tools available for this: JavaScript Compressor, and then there's JavaScript Minifier, which you can also use for large images. Otherwise you could just google some offline tools; I'm pretty sure they're easy to find.

You could use the Javascript Obfuscator... your code will be very difficult to decode!
Hope it helps! ^_^

Related

Semi-obfuscate/uglify JavaScript

I know about JS minifiers and obfuscators. I was wondering if there is any existing tool (or any fast-to-code solution) to partially obfuscate JavaScript. By partially I mean that it should become difficult to read, but not appear uglified/minified. It should keep indentation, but lose comments, and partially change variable names, making them unclear without converting them to "a, b, c" like an obfuscator.
The purpose of this would be to take explicit, reusable code and make it implicit and difficult for other people to reuse, without making it impossible to work with for yourself.
Any idea where to start to achieve this? Maybe by editing an existing obfuscator?

[This answer is a direct response to OP's request].
Semantic Designs JavaScript obfuscator will do what you want, but you'll need two passes.
On the first pass, run it as an obfuscator; it will rename identifiers (although you can control how much or how that is done) and strip whitespace and comments. If you limit its ability to rename the identifiers, you lose some of the strength of the obfuscator, but that's your choice.
On the second pass, run it as a prettyprinter; it will introduce nice indentation again.
(In fact, the idea for obfuscation came from building a prettyprinter; if you can print-pretty, surely it is easy to print-ugly).
From the point of view of working with the code, you are better off working with your master copy any way you like, complete with your indentation and nice commentary as documentation. When you are ready to obfuscate, you run the obfuscator and ship the obfuscated result. Errors reported in the obfuscated result that involve obfuscated names can be mapped back to the original names, using the map of obfuscated <--> original names produced during the obfuscation step.
This is a product of my company. I'd provide a link but SO hates it when I do that, so you'll have to find it via my bio or googling.
PS: It works exactly as @georg suggests, by parsing to an AST, mangling, and prettyprinting. It doesn't use esprima.

I'm not aware of a tool that would meet your specific requirements, but it seems to be relatively easy to create, given that the vital parts already exist.
parse the source into an AST, using esprima or similar
manipulate the tree in the way you want (eg. remove comments, mangle identifiers etc)
rebuild the source from the tree using escodegen
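To make the "strip comments, rename identifiers, keep indentation" idea concrete without depending on any library, here is a rough sketch. Note that it swaps in a much weaker technique than the one described above: it scans tokens with a regular expression instead of building an AST with esprima and regenerating with escodegen. The function name semiObfuscate and the _v0, _v1 naming scheme are made up for illustration. It only knows about names declared with var, let or const (the first name of each declaration), and it will be confused by regex literals, object-literal keys and scoping, which is exactly why a real parser is the better route.

```javascript
// Token-level sketch, NOT an AST-based tool. semiObfuscate and the _vN naming
// scheme are hypothetical; a robust version would parse the code properly.
function semiObfuscate(source) {
  // Pass 1: collect declared names (naive: first name after var/let/const).
  const names = new Map();
  const decl = /\b(?:var|let|const)\s+([A-Za-z_$][\w$]*)/g;
  for (const m of source.matchAll(decl)) {
    if (!names.has(m[1])) names.set(m[1], '_v' + names.size);
  }

  // Pass 2: walk comments, string literals and identifiers in source order.
  // Group 1 = comment, 2 = string, 3 = leading dot (property access), 4 = identifier.
  const token = /(\/\/[^\n]*|\/\*[\s\S]*?\*\/)|("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|(\.\s*)?([A-Za-z_$][\w$]*)/g;
  return source.replace(token, (match, comment, string, dot, ident) => {
    if (comment !== undefined) return '';   // drop comments
    if (string !== undefined) return match; // leave string contents alone
    if (dot !== undefined || !names.has(ident)) return match; // property or unknown name
    return names.get(ident);                // rename a declared variable
  });
}

const sample = [
  "var loopCounter = 0; // counts frames",
  "loopCounter += state.loopCounter + 'loopCounter'.length;"
].join('\n');
console.log(semiObfuscate(sample));
// var _v0 = 0;
// _v0 += state.loopCounter + 'loopCounter'.length;
```

Whitespace is never touched, so indentation survives; only comments disappear and declared names change.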

How to decompile this?

I am decompiling a Google Chrome extension, because it seems suspicious.
The extension was written in JavaScript, but can somebody tell me exactly what symbols like this are, and how to "translate" them back to normal strings?
"\x63\x68\x61\x72\x43\x6F\x64\x65"
Jsbin of the full file:
http://jsbin.com/OnEviRa/1/

The code seems to be using an array to hide all the strings, and they seem to have changed variable names to hide their meaning.
I can't give you a good automated way to change variable names to something meaningful, but the strings are easy.
You can evaluate the array, then use a regex to replace all occurrences of _0x13d2[number] with "evaluated result".
Here is a fiddle of it
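The \xNN sequences are ordinary hexadecimal character escapes inside a string literal, so "\x63\x68\x61\x72\x43\x6F\x64\x65" is just another way of writing "charCode". The replacement step described above could look roughly like the sketch below. Everything except the array name _0x13d2 and the first string is invented for the example (the second array entry, the line that uses it, and the helper name inlineStringArray). Since the extension is suspicious, this version avoids eval: it assumes the array holds only double-quoted strings with \x escapes, rewrites them as \u00NN and lets JSON.parse decode them.

```javascript
// The hex-escaped literal from the question is an ordinary string.
console.log("\x63\x68\x61\x72\x43\x6F\x64\x65" === "charCode"); // true

// Hypothetical stand-in for the extension source; String.raw keeps the
// backslashes exactly as they would appear in the obfuscated file.
const obfuscated = String.raw`var _0x13d2 = ["\x63\x68\x61\x72\x43\x6F\x64\x65", "\x6C\x65\x6E\x67\x74\x68"];
var n = s[_0x13d2[1]];`;

// inlineStringArray is a made-up helper name for this sketch.
function inlineStringArray(source, arrayName) {
  // Escape the name so characters such as $ are safe inside a RegExp.
  const name = arrayName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const decl = source.match(new RegExp('var\\s+' + name + '\\s*=\\s*(\\[[^\\]]*\\])'));
  if (!decl) return source;

  // JSON has no \xNN escape, but \u00NN means the same character.
  const strings = JSON.parse(decl[1].replace(/\\x([0-9A-Fa-f]{2})/g, '\\u00$1'));

  // Replace every NAME[number] lookup with the decoded string literal.
  const lookup = new RegExp(name + '\\[(\\d+)\\]', 'g');
  return source.replace(lookup, (match, index) =>
    index in strings ? JSON.stringify(strings[index]) : match);
}

console.log(inlineStringArray(obfuscated, '_0x13d2'));
// The second line becomes: var n = s["length"];
```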

Writing a Parser for javascript code

I want to extract JavaScript code and find out if there are any dynamic tag creations like document.createElement('script');. I have tried to do this with regular expressions, but regular expressions restrict me to matching only some formats, so I thought of writing a JavaScript parser which extracts all the keywords, strings and functions from the JavaScript code.

In general there is no way to know if a given line of code will ever run; you would need to solve the halting problem.
If you restrict your analysis to just finding occurrences of a function call, you don't make much progress. Naive methods will still be easy to trick: if you just regex match for document.createElement, you would not be able to match something as simple as document["create" + "Element"]. In general you would need to not only parse the code but evaluate it as well to get around this. And to be sure that you can evaluate the code you would again need to solve the halting problem.
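A small sketch of the trick described above. The detector name looksLikeScriptInjection is made up, and a stub object stands in for the browser's document so the comparison also runs outside a browser.

```javascript
// Naive detector: a plain text match for the call the question is looking for.
// looksLikeScriptInjection is a hypothetical helper name.
function looksLikeScriptInjection(code) {
  return /document\.createElement\s*\(/.test(code);
}

const direct = "document.createElement('script');";
const computed = 'document["create" + "Element"]("script");';

console.log(looksLikeScriptInjection(direct));   // true
console.log(looksLikeScriptInjection(computed)); // false

// Both spellings reach the same method. The stub below only imitates the
// shape of the real document object.
function sameMethod() {
  const document = { createElement: (tag) => '<' + tag + '>' };
  return document["create" + "Element"] === document.createElement;
}
console.log(sameMethod()); // true
```

The property name in the second form only exists once the string concatenation has been evaluated, which is why matching on the source text alone misses it.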

Maybe you should try using Burrito

Well, the first rule is: never use regex for big things like this, or the DOM, or ... . You have to parse it into tokens. The good news is that you don't have to write your own. There are a few JS-to-JS parsers.
UglifyJS
narcissus
Esprima
ZeParser
They may be a bit hard to work with, but it's still better to work with them. There are other projects that use these, such as burrito or code surgeon, so you can have a look at their source code and see how they use them.
But there is bad news too: people can still outsmart other people, let alone the parsers and the code they write. At the very least you need to evaluate the code with some execution-time variables and see whether it tries to access the DOM or not.

Special characters (and MooTools) are ruining my life

I'm working on localization for my toolkit.
My goal is that if you were a German web developer and you wanted to use a forEach loop, rather than type ['hey', 'there'].forEach(function () {}); you could type ['hey', 'there'].fürJeder(function () {});
I have all the words stored in an object at $.i18n.de.
In my JavaScript file I have
de: {
extend: 'verlänger',
forEach: 'fürJeder'
}
but when I go into the object to get the words they turn into verlÃ¤nger and fÃ¼rJeder.
I have no idea why.
Some details:
I'm on a MacBook Pro running 10.6.7
I'm using Kod as my editor.
I'm using Google Chrome as my browser.
I'm using Option + U + letter to type the ä and ü.
My question: How do I get the browser to handle these correctly?
I've tried using backslashes before them but it stays the same.
EDIT: Screw it. I just found out that the people who inspired me to do this did it as an April Fool's day joke. I really should've clicked on some of those links. It would've saved me 2 hours of trying to set up an API.

Turns out that this is a really bad idea to try and do.
Programming languages are almost exclusively written in English (JavaScript being one of those). That means that even if you write your program in a different language, keywords like return, var and function are still going to be in English and you're still going to have to use them, which would get confusing alongside functions, constants etc. that have non-English names.
The best solution is to just avoid using non-Latin characters in variable names altogether.
Even though it works in most modern browsers, it makes your code harder to write and more confusing.
Leave the coding to the English speakers.

Although I don't get the point of that (especially for German people, who are generally comfortable with English and share the same Germanic linguistic origin), the simplest workaround to avoid encoding issues is to replace those special characters with their Latin counterparts:
fürJeder -> fuerJeder
verlänger -> verlaenger
Uncommon spelling but still correct
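For what it's worth, the garbling shown in the question (verlÃ¤nger, fÃ¼rJeder) is exactly what you get when text saved as UTF-8 is read back as Latin-1, so a likely cause, though that is an assumption about the asker's setup, is that the script file is UTF-8 but the page does not declare that encoding. The sketch below reproduces the effect in Node.js using Buffer, and shows a second workaround that keeps the real spelling: writing the umlauts as \u escapes so the source file stays pure ASCII.

```javascript
// Node.js sketch. The umlauts are written as \u escapes so that this file's
// own encoding cannot affect the demonstration.
const original = 'verl\u00e4nger';                // "verlänger"
const utf8Bytes = Buffer.from(original, 'utf8');  // ä becomes the two bytes C3 A4
const misread = utf8Bytes.toString('latin1');     // each byte read as one character
console.log(misread);                             // verlÃ¤nger

// Reversing the mistake recovers the word.
const repaired = Buffer.from(misread, 'latin1').toString('utf8');
console.log(repaired === original);               // true

// Escaped form of the object from the question: ASCII-only source text.
const de = { extend: 'verl\u00e4nger', forEach: 'f\u00fcrJeder' };
console.log(de.forEach);                          // fürJeder
```

If the mismatch really is the cause, declaring the encoding on the page (for example with a meta charset of utf-8) should also fix it without changing the strings.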

Syntax / Logical checker In Javascript?

I'm building a solution for a client which allows them to create very basic code.
I've done some basic syntax validation, but I'm stuck at variable verification.
I know JSLint does this using JavaScript, and I was wondering if anyone knew of a good way to do this.
So for example say the user wrote the code
moose = "barry"
base = 0
if(moose == "barry"){base += 100}
Then I'm trying to find a way to verify that the "if" expression has the correct syntax, that the variable moose has been initialized, and so on,
but I want to do this without scanning character by character,
the code is a mini language built just for this application so is very very basic and doesn't need to manage memory or anything like that.
I had thought about splitting first by carriage return and then by space, but there is nothing to say the user won't write something like moose="barry" or if(moose=="barry")
and there is nothing to say the user won't keep the result of a condition inline.
Obviously compilers and interpreters do this on a much more extensive scale, but I'm not sure if they do it character by character and, if they do, how they have optimized it.
(The other option is to send it back to PHP to process, which would relieve the browser of the responsibility.)
Any suggestions?
Thanks
The use case is limited and the syntax will never be extended in this case. The language is a simple scripting language that lets the client create a unique cost based on their users' input. The end result will be processed by PHP regardless, to ensure the calculation can't be adjusted by the end user and to ensure there is some consistency.
So for example, say there is a base cost of £1.00
and there is a field on the form called "Additional Cost", the language will allow them manipulate the base cost relative to the "additional cost" field.
So
base = 1;
if(additional > 100 && additional < 150){base += 50}
elseif(additional == 150){base *= 150}
else{base += additional;}
This is a basic example of how the language would be used.
Thank you for all your answers,
I've investigated writing a parser, and creating one would be far more complex than is required.
Having run several tests with thousands of lines of code, I found that processing character by character only takes a few seconds, even on a single-core P4 with 512 MB of memory (which is far less than the customer uses).
I've decided to build a PHP-based syntax checker which will check the code and convert the variables etc. into valid PHP code while it's checking (so that it's ready to be called later without recompilation). Using this instead of JavaScript seems more appropriate and will allow for more complex code to arise without hindering the validation process.
It's only taken an hour and I have code which is able to check the validity of an if statement and isn't confused by nested ifs, spaces or odd expressions. There is very little left to be checked, whereas a parser and full-blown scripting language would have taken a lot longer.
You've all given me a lot to think about, and I've rated the relevant answers. Thank you.

If you really want to do this — and by that I mean if you really want your software to work properly and predictably, without a bunch of weird "don't do this" special cases — you're going to have to write a real parser for your language. Once you have that, you can transform any program in your language into a data structure. With that data structure you'll be able to conduct all sorts of analyses of the code, including procedures that at least used to be called use-definition and definition-use chain analysis.
If you concoct a "programming language" that enables some scripting in an application, then no matter how trivial you think it is, somebody will eventually write a shockingly large program with it.
I don't know of any readily-available parser generators that generate JavaScript parsers. Recursive descent parsers are not too hard to write, but they can get ugly to maintain and they make it a little difficult to extend the syntax (esp. if you're not very experienced crafting the original version).

You might want to look at JS/CC, which is a parser generator that generates a parser for a grammar, in JavaScript. You will need to figure out how to describe your language using BNF or EBNF. Also, JS/CC has its own syntax (which is somewhat close to actual BNF/EBNF) for specifying the grammar. Given the grammar, JS/CC will generate a parser for it.
Your other option, as Pointy said, is to write your own lexer and recursive-descent parser from scratch. Once you have a BNF/EBNF, it's not that hard. I recently wrote a parser from an EBNF in JavaScript (the grammar was pretty simple, so it wasn't that hard to write one; YMMV).
To address your comments about it being "client specific". I will also add my own experience here. If you're providing a scripting language and a scripting environment, there is no better route than an actual parser.
Handling special cases through a bunch of if-elses is going to be horribly painful and a maintenance nightmare. When I was a freshman in college, I tried to write my own language. This was before I knew anything about recursive-descent parsers, or just parsers in general. I figured out by myself that code can be broken down into tokens. From there, I wrote an extremely unwieldy parser using a bunch of if-elses, and also splitting the tokens by spaces and other characters (exactly what you described). The end result was terrible.
Once I read about recursive-descent parsers, I wrote a grammar for my language and easily created a parser in a tenth of the time it took me to write my original parser. Seriously, if you want to save yourself a lot of pain, write an actual parser. If you go down your current route, you're going to be fixing issues forever. You're going to have to handle cases where people put the space in the wrong place, or perhaps they have one too many (or one too few) spaces. The only other alternative is to provide an extremely rigid structure (i.e., you must have exactly x number of spaces following this statement), which is liable to make your scripting environment extremely unattractive. An actual parser will automatically fix all these problems.
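To give an idea of the size of such a parser, here is a rough sketch of a lexer plus recursive-descent checker for the kind of script shown in the question. The grammar is my own guess from the examples (assignments, if / elseif / else with braces, flat expressions without operator precedence), and the names tokenize and check are invented. It reports variables that are read before any assignment, and throws a SyntaxError on malformed input. The definition tracking is deliberately crude: a variable assigned inside any branch counts as defined afterwards.

```javascript
// Lexer: turns the source into tokens, so spacing no longer matters.
function tokenize(src) {
  const pattern = /\s*(?:(\d+(?:\.\d+)?)|("(?:[^"\\]|\\.)*")|([A-Za-z_]\w*)|(==|!=|<=|>=|&&|\|\||[-+*\/]=|[-+*\/<>=(){};]))/y;
  const tokens = [];
  let pos = 0;
  while (pos < src.length) {
    pattern.lastIndex = pos;
    const m = pattern.exec(src);
    if (!m) {
      if (/^\s*$/.test(src.slice(pos))) break; // only trailing whitespace left
      throw new SyntaxError('Unexpected character near offset ' + pos);
    }
    const [, num, str, id, op] = m;
    tokens.push(num !== undefined ? { type: 'num', text: num }
      : str !== undefined ? { type: 'str', text: str }
      : id !== undefined ? { type: 'id', text: id }
      : { type: 'op', text: op });
    pos = pattern.lastIndex;
  }
  return tokens;
}

// Parser/checker: one function per grammar rule. Returns a list of warnings
// about variables used before assignment; throws SyntaxError on bad syntax.
function check(src, predefined = []) {
  const BINARY = new Set(['+', '-', '*', '/', '<', '>', '<=', '>=', '==', '!=', '&&', '||']);
  const ASSIGN = new Set(['=', '+=', '-=', '*=', '/=']);
  const tokens = tokenize(src);
  const defined = new Set(predefined);
  const errors = [];
  let i = 0;
  const peek = () => (tokens[i] ? tokens[i].text : undefined);
  const expect = (text) => {
    if (peek() !== text) throw new SyntaxError(`Expected "${text}" but found "${peek()}"`);
    i++;
  };
  const use = (name) => {
    if (!defined.has(name)) errors.push(`"${name}" is used before it is assigned`);
  };

  function operand() {               // operand := number | string | name | "(" expr ")"
    const t = tokens[i++];
    if (!t) throw new SyntaxError('Unexpected end of input');
    if (t.type === 'num' || t.type === 'str') return;
    if (t.type === 'id') return use(t.text);
    if (t.text === '(') { expr(); expect(')'); return; }
    throw new SyntaxError(`Unexpected "${t.text}"`);
  }
  function expr() {                  // expr := operand (binaryOp operand)*
    operand();
    while (BINARY.has(peek())) { i++; operand(); }
  }
  function block() {                 // block := "{" statement* "}"
    expect('{');
    while (peek() !== '}') {
      if (peek() === undefined) throw new SyntaxError('Missing "}"');
      statement();
    }
    expect('}');
  }
  function statement() {             // statement := ifStatement | assignment
    const t = tokens[i++];
    if (t.text === 'if') {
      expect('('); expr(); expect(')'); block();
      while (peek() === 'elseif') { i++; expect('('); expr(); expect(')'); block(); }
      if (peek() === 'else') { i++; block(); }
      return;
    }
    if (t.type !== 'id') throw new SyntaxError(`Unexpected "${t.text}"`);
    const op = tokens[i++];
    if (!op || !ASSIGN.has(op.text)) throw new SyntaxError(`Expected an assignment after "${t.text}"`);
    if (op.text !== '=') use(t.text); // "+=" and friends read the old value
    expr();
    defined.add(t.text);
    if (peek() === ';') i++;          // semicolons are optional
  }

  while (i < tokens.length) statement();
  return errors;
}

console.log(check('moose="barry"\nif(moos=="barry"){base+=100}'));
// [ '"moos" is used before it is assigned', '"base" is used before it is assigned' ]
```

Fields supplied by the form can be passed as the second argument, for example check(script, ['additional']), so they count as already defined.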

Javascript has a function 'eval'.
var code = 'alert(1);';
eval(code);
It will show an alert. You can use eval to execute basic code.
