Javascript - Is there an easier way to read obfuscated, condensed code? - javascript

Is the code below an example of obfuscation? Is there a way to make code that has been obfuscated easier to interpret or follow the actions that lead through the script using chrome or firefox browsers? Are the original naming conventions somewhere behind the scenes? In other words, could I de-obfuscate the code short of downloading a couple hundred lines and then renaming everything based on what I think is happening?
main: function (a, b, c, d, e) {
var r,
o,
s = null != t ? t : {
},
l = i.helperMissing,
c = 'function',
d = e.escapeExpression;

That code is a result of minification and obfuscation which results in condensed code, and thus unreadable code. The entire point of minification is to reduce file size, so you the "real" function, attribute, and values aren't stored somewhere to be found. The idea of obfuscation is to make the code hard to decypher and "steal".
If you found this somewhere and the file ended in .min.js, you can always see if the same file exists in that location without the .min part in the filename.
For example
https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js
https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.js
Also, it would be near impossible for software to deobfuscate code, but you could read through it and try to understand its meaning. Here's a simple example
x=(a,b)=>a+b // obfuscated/minified code
function sum(number1, number2) { return number1 + number2 } // original code

Related

OOL (out of line) code

What is OOL (Out of line) code? I've found it in ION compiler but can't understand what is going on.
bool CodeGeneratorShared::generateOutOfLineCode() {
for (size_t i = 0; i < outOfLineCode_.length(); i++) {
// Add native => bytecode mapping entries for OOL sites.
// Not enabled on asm.js yet since asm doesn't contain bytecode mappings.
if (!gen->compilingAsmJS()) {
if (!addNativeToBytecodeEntry(outOfLineCode_[i]->bytecodeSite()))
return false;
}
if (!gen->alloc().ensureBallast())
return false;
JitSpew(JitSpew_Codegen, "# Emitting out of line code");
masm.setFramePushed(outOfLineCode_[i]->framePushed());
lastPC_ = outOfLineCode_[i]->pc();
outOfLineCode_[i]->bind(&masm);
outOfLineCode_[i]->generate(this);
}
return !masm.oom();
}
I've tried to use google to found information about it, but didn't have a success. Maybe you can give me some idea what it is? Thank you :)
I looked into the source, and it seems that "out of line" here means code that's generated after the normal code/function.
See e.g. CodeGenerator::generate basically looks like this:
generateProlog();
generateBody();
generateEpilog();
generateOutOfLineCode();
So out of line code is generated after the code's end. This is often used for exceptional control flow and to keep the code that invokes deoptimization, throws exception, etc. out of the instruction cache and the "normal" program code.
Let's assume we have a function int f(int a, int b) { return a / b; } and the languages semantics force us to throw an Exception if the divisor is 0. This is the code in pseudo assembly:
cmp b, 0
jump-if-not-zero lbl1
call throw_exception
lbl1:
div c, a, b
ret c
You can see that normal program flow needs to jump over the code that throws the exception. Usually b is non-zero in almost all cases so that seems kind of wasteful. With out of line code we can generate more efficient code:
cmp b, 0
jump-if-zero out-of-line1
div c, a, b
ret c
out-of-line1:
call throw_exception
Here we only jump for zero values which should be rare. The cmp and div instruction are also nearer to each other which is good for instruction cache usage.
In my JIT I am generating out of line code for null pointer exception throwing, failing asserts, etc. JS and IonMonkey may use it for different operations. One example for out of line code I've found is class OutOfLineTruncateF32OrF64ToI32 for WASM, that extends OutOfLineCode the base class of all out of line code.
What's also nice is that out of line code in IonMonkey can use the field rejoin to jump back to normal code flow.

JS: rename variables for refactor (using an AST, not text)

I often need to rename variables when refactoring code, which I currently do in a somewhat hacky way using regexs - I end up having to come with silly text workaround workarounds for the lack of actual structure, eg, rename 'req' to 'request' and avoid side effects with similar names like 'require'.
Thinking about this stuff: it's kind of like modifying the DOM with regexs: it just doesn't work.
I've learnt about ASTs and code structure modification tools like Esprima. Is there a tool to rename variables, Esprima based or otherwise?
1. grasp.js
It looks like http://graspjs.com/ does exactly this.
grasp selector --replace replacement file.js
For example, to rename 'el' to 'element':
grasp '#el' --replace 'element' index.js
Official grasp.replace docs.
2. vscode
Visual Studio Code also includes a real replacement tool. Just right click a token and choose rename symbol.
I know you've been asking for 'a tool'; but I think it's better to use just esprima itself and the various general purpose tools on top of esprima, and roll your own renamer. Because it's really really easy, and then you have more control. Here is a complete example in just 12 lines of code. It uses escodegen and estraverse, both on github, and, as far as I can see kind of 'the standard' to use in conjunction with esprima. While esprima essentially gives the parse function string -> abstract syntax tree, escodegen essentially gives the reverse of that, i.e. abstract syntax tree -> string. And estraverse 'walks the tree' with the traverse method, thus helping to analyze or modify it.
Here the code:
function rename(code, renamingObj){
var ast = esprima.parse(code);
function callback(node){
if (node.type==='Identifier') {
if (node.name in renamingObj){
node.name = renamingObj[node.name];
}
}
}
estraverse.traverse(ast, { enter: callback });
return escodegen.generate(ast);
}
Testcase:
function blah(x,y){
var difference = x + y;
var product = x - y;
var sum = x * y;
return 42;
}
var renamingObj = {
sum : 'product',
difference : 'sum',
product : 'difference'
};
run it:
rename(blah.toString(), renamingObj)
output:
function blah(x, y) {
var sum = x + y;
var difference = x - y;
var product = x * y;
return 42;
}
I would say, if you have something special to do, it's easier to modify above code than sifting through some tool documentation.
Our JavaScript Formatter/Obfuscator will do this reliably. It parses JavaScript, builds an AST, and either prints a pretty version, or an ugly (obfuscated) version. [The reason these are together is because pretty and ugly are opposite sides of the same coin!).
One of the things it does is scramble (whole) identifier names. By using the AST, it only renames "whole identifiers"; it won't confuse "req" with "require", or modify string or comment content by accident.
You can tell it to build a map of all identifier names, and then rename them all to themselves; change just one, and you get your rename effect. The identifier map looks like this:
x -> y
p -> q
...
directing that x gets renamed to y, p to q, etc. You want:
x -> x
p -> q
...
to change just p to q. Its pretty easy to produce the identity map
as your starting place.
I wouldn't say this is convenient, but it does work. It obviously doesn't
know anything about scopes. (someday!).
See my bio for details. Many SO moderators hate tool questions, and they especially seem to dislike it when I provide a tool answer that includes a link to the tool. So you'll have to track down the link yourself, sorry. (Complain on Meta if you think this is dumb).

I need a Javascript literal syntax converter/deobfuscation tools

I have searched Google for a converter but I did not find anything. Is there any tools available or I must make one to decode my obfuscated JavaScript code ?
I presume there is such a tool but I'm not searching Google with the right keywords.
The code is 3 pages long, this is why I need a tools.
Here is an exemple of the code :
<script>([][(![]+[])[!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+(!![]+[])[+!+[]]+(!![]+[])[+[]]][([][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(![]+[])[+!+[]]+(![]+[])[!+[]+!+[]]+(![]+[])[!+[]+!+[]]]()[(!![]+[])[!+[]+!+[]+!+[]]+(+(+[])+[][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[!+[]+!+[]+!+[]+[+[]]]+(![]+[])[+!+[]]+(![]+[])[!+[]+!+[]]])(([]+[])[([][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(!![]+
Thank you
This code is fascinating because it seems to use only nine characters ("[]()!+,;" and empty space U+0020) yet has some sophisticated functionality. It appears to use JavaScript's implicit type conversion to coerce arrays into various primitive types and their string representations and then use the characters from those strings to compose other strings which type out the names of functions which are then called.
Consider the following snippet which evaluates to the array filter function:
([][
(![]+[])[+[]] // => "f"
+ ([![]]+[][[]])[+!+[]+[+[]]] // => "i"
+ (![]+[])[!+[]+!+[]] // => "l"
+ (!![]+[])[+[]] // => "t"
+ (!![]+[])[!+[]+!+[]+!+[]] // => "e"
+ (!![]+[])[+!+[]] // => "r"
]) // => function filter() { /* native code */ }
Reconstructing the code as such is time consuming and error prone, so an automated solution is obviously desirable. However, the behavior of this code is so tightly bound to the JavaScript runtime that de-obsfucating it seems to require a JS interpreter to evaluate the code.
I haven't been able to find any tools that will work generally with this sort of encoding. It seems as though you'll have to study the code further and determine any patterns of usage (e.g. reliance on array methods) and figure out how to capture their usage (e.g. by wrapping high-level functions [such as Function.prototype.call]) to trace the code execution for you.
This question has already an accepted answer, but I will still post to clear some things up.
When this idea come up, some guy made a generator to encode JavaScript in this way. It is based on doing []["sort"]["call"]()["eval"](/* big blob of code here */). Therefore, you can decode the results of this encoder easily by removing the sort-call-eval part (i.e. the first 1628 bytes). In this case it produces:
if (document.cookie=="6ffe613e2919f074e477a0a80f95d6a1"){ alert("bravo"); }
else{ document.location="http://www.youtube.com/watch?v=oHg5SJYRHA0"; }
(Funny enough the creator of this code was not even able to compress it properly and save a kilobyte)
There is also an explanation of why this code doesn't work in newer browser anymore: They changed Array.prototype.sort so it does not return a reference to window. As far as I remember, this was the only way to get a reference to window, so this code is kind of broken now.

How do languages handle side effects of compound operators?

Assume such situation:
int a = (--t)*(t-2);
int b = (t/=a)+t;
In C and C++ this is undefined behaviour, as described here: Undefined behavior and sequence points
However, how does this situation look in:
JavaScript,
Java,
PHP...
C#
well, any other language which has compound operators?
I'm bugfixing a Javascript -> C++ port right now in which this got unnoticed in many places. I'd like to know how other languages generally handle this... Leaving the order undefined is somehow specific to C and C++, isn't it?
According to the ECMA Script specification, which I believe javascript is supposed to conform to, in multiplication and addition statements, it evaluates the left hand side before evaluating the right hand side. (see 11.5 and 11.6). I think this means that the code should be equivalent to
t = t - 1;
int a = t * (t - 2);
t = t / a;
int b = t + t;
However, you should not always trust the specification to be the same as the implementation!
Your best bet in confusing cases like this is to experiment with various inputs to the ambiguous lines of code in the original operating environment, and try to determine what it is doing. Make sure to test cases that can confirm a hypothesis, and also test cases that can falsify it.
Edit: Apparently most JavaScript implements the 3rd edition of ECMAScript, so I changed the link to that specification instead.
Practically speaking, if you have to ask or look up the rules for an expression, you shouldn't be using that expression in your code. Someone else will come back two years from now and get it wrong, then rewrite it and break the code.
If this was intended as a strictly theoretical question I unfortunately can't offer details regarding those other languages.
For javascript the following article should help.
This article clearly states whether a particular combination of
a OP b OP c goes from left-to-right and in which order.
I'm don't know about the other languages.
However, how does this situation look in: JS, Java, PHP, C#...
To be perfectly candid, int a = (--t)*(t-2); int b = (t/=a)+t; looks like crap.
It's nice to have fancy code that can be all pretty and elitist, but there's absolutely no need for it. The solution for every language when confronted with code like this is to add a couple more semi-colons (unless you're dealing with python):
--t;
int a = t * (t-2);
t /= a;
int b = t + t;
-or-
int b = t * 2;
-or-
int b = t << 1;
//whichever method you prefer
If a different order of operations is desired, the adjust the lines accordingly. If you're trying to fix old buggy code, fix the code, don't just re-implement someone else's spaghetti.
Edit to add:
I realized I never specifically answered the original question:
How do languages handle side effects of compound operators?
Poorly.

Good resources for extreme minified JavaScript (js1k-style)

As I'm sure most of the JavaScripters out there are aware, there's a new, Christmas-themed js1k. I'm planning on entering this time, but I have no experience producing such minified code. Does anyone know any good resources for this kind of thing?
Google Closure Compiler is a good javascript minifier.
There is a good online tool for quick use, or you can download the tool and run it as part of a web site build process.
Edit: Added a non-exhaustive list of tricks that you can use to minify JavaScript extremely, before using a minifier:
Shorten long variable names
Use shortened references to built in variables like d=document;w=window.
Set Interval
The setInterval function can take either a function or a string. Pass in a string to reduce the number of characters used: setInterval('a--;b++',10). Note that passing in a string forces an eval invokation so it will be slower than passing in a function.
Reduce Mathematical Calculations
Example a=b+b+b can be reduced to a=3*b.
Use Scientific Notation
10000 can be expressed in scientific notation as 1E4 saving 2 bytes.
Drop leading Zeroes
0.2 = .2 saves a byte
Ternery Operator
if (a > b) {
result = x;
}
else {
result = y;
}
can be expressed as result=a>b?x:y
Drop Braces
Braces are only required for blocks of more than one statement.
Operator Precedence
Rely on operator precedence rather than adding unneeded brackets which aid code readability.
Shorten Variable Assignment
Rather than function x(){a=1,b=2;...}() pass values into the function, function x(a,b){...}(1,2)
Think outside the box
Don't automatically reach for standard ways of doing things. Rather than using d.getElementById('p') to get a reference to a DOM element, could you use b.children[4] where d=document;b=body.
Original source for above list of tricks:
http://thingsinjars.com/post/293/the-quest-for-extreme-javascript-minification/
Spolto is right.
Any code minifier won't do the trick alone. You need to first optimize your code and then make some dirty manual tweaks.
In addition to Spolto's list of tricks I want to encourage the use of logical operators instead of the classical if else syntax. ex:
The following code
if(condition){
exp1;
}else{
exp2;
}
is somewhat equivalent to
condition&&exp1||exp2;
Another thing to consider might be multiple variable declaration :
var a = 1;var b = 2;var c = 1;
can be rewritten as :
var a=c=1,b=2;
Spolto is also right about the braces. You should drop them. But in addition, you should know that they can be dropped even for blocks of more expressions by writing the expressions delimited by a comma(with a leading ; of course) :
if(condition){
exp1;
exp2;
exp3;
}else{
exp4;
exp5;
}
Can be rewritten as :
if(condition)exp1,exp2,exp3;
else exp4,exp5;
Although it's not much (it saves you only 1 character/block for those who are counting), it might come in handy. (By the way, the latest Google Closure Compiler does this trick too).
Another trick worth mentioning is the controversial with functionality.
If you care more about the size, then you should use this because it might reduce code size.
For example, let's consider this object method:
object.method=function(){
this.a=this.b;
this.c++;
this.d(this.e);
}
This can be rewritten as :
object.method=function(){
with(this){
a=b;
c++;
d(e);
}
}
which is in most cases signifficantly smaller.
Something that most code packers & minifiers do not do is replacing large repeating tokens in the code with smaller ones. This is a nasty hack that also requires the use of eval, but since we're in it for the space, I don't think that should be a problem. Let's say you have this code :
a=function(){/*code here*/};
b=function(){/*code here*/};
c=function(){/*code here*/};
/*...*/
z=function(){/*code here*/};
This code has many "function" keywords repeating. What if you could replace them with a single(unused) character and then evaluate the code?
Here's how I would do it :
eval('a=F(){/*codehere*/};b=F(){/*codehere*/};c=F(){/*codehere*/};/*...*/z=F(){/*codehere*/};'.replace(/function/g,'F'));
Of course the replaced token(s) can be anything since our code is reduced to an evaluated string (ex: we could've replaced =function(){ with F, thus saving even more characters).
Note that this technique must be used with caution, because you can easily screw up your code with multiple text replacements; moreover, you should use it only in cases where it helps (ex: if you only have 4 function tokens, replacing them with a smaller token and then evaluating the code might actually increase the code length :
var a = "eval(''.replace(/function/g,'F'))".length,
b = ('function'.length-'F'.length)*4;
alert("you should" + (a<b?"":" NOT") + " use this technique!");
In the following link you'll find surprisingly good tricks to minify js code for this competition:
http://www.claudiocc.com/javascript-golfing/
One example: (extracted from section Short-circuit operators):
if (p) p=q; // before
p=p&&q; // after
if (!p) p=q; // before
p=p||q; // after
Or the more essoteric Canvas context hash trick:
// before
a.beginPath
a.fillRect
a.lineTo
a.stroke
a.transform
a.arc
// after
for(Z in a)a[Z[0]+(Z[6]||Z[2])]=a[Z];
a.ba
a.fc
a.ln
a.sr
a.to
a.ac
And here is another resource link with amazingly good tricks: https://github.com/jed/140bytes/wiki/Byte-saving-techniques
First off all, just throwing your code into a minifier won't help you that much. You need to have the extreme small file size in mind when you write the code. So in part, you need to learn all the tricks yourself.
Also, when it comes to minifiers, UglifyJS is the new shooting star here, its output is smaller than GCC's and it's way faster too. And since it's written in pure JavaScript it should be trivial for you to find out what all the tricks are that it applies.
But in the end it all comes down to whether you can find an intelligent, small solution for something that's awsome.
Also:
Dean Edwards Packer
http://dean.edwards.name/packer/
Uglify JS
http://marijnhaverbeke.nl/uglifyjs
A friend wrote jscrush packer for js1k.
Keep in mind to keep as much code self-similar as possible.
My workflow for extreme packing is: closure (pretty print) -> hand optimizations, function similarity, other code similarity -> closure (whitespace only) -> jscrush.
This packs away about 25% of the data.
There's also packify, but I haven't tested that myself.
This is the only online version of #cowboy's packer script:
http://iwantaneff.in/packer/
Very handy for packing / minifying JS

Categories

Resources