I apologize because this may be a bizarre question, but I'm pretty confused. Would anyone know why someone would use what I believe to be "a hexadecimal approach" to JavaScript? For example, I see someone naming variables like
_0x2f8b, _0xcb6545, _0x893751, _0x1d2177, etc.
Why would anyone ever do this? Also, I see code like
'to\x20bypass\x20this\x20link'
as well as hexadecimals for numbers such as 725392
0xb1190
So, how would anyone even get this kind of naming convention and why would they ever want to use this?
Code like this has been, 99% of the time, automatically mangled/obfuscated from the original source code, in an attempt to make it more difficult to reverse engineer.
For example, if you start with
const foo = 'bar';
const somethingElse = 5;
you might use an obfuscation tool to come up with
var _0x2f8b, _0xcb6545;
_0x2f8b = 'bar';
_0xcb6545 = 5;
and serve that to clients. Reading 200 lines of obfuscated code is a lot harder than reading 200 lines of the original source code.
Regarding "as well as hexadecimals for numbers such as 725392":
Same thing: it's easier for a human to make sense of 725392 (which may be a magic number important for the application) than 0xb1190.
This isn't something that would be present in source code.
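To check that these are only different spellings of the same values, something like the following can be pasted into a browser console or Node. The variable names are made up for illustration.

```javascript
// Hex escapes inside a string literal are decoded by the parser:
// \x20 is the character with code 0x20 (32 in decimal), i.e. a space.
const escaped = 'to\x20bypass\x20this\x20link';
const plain = 'to bypass this link';
console.log(escaped === plain); // true

// A hexadecimal number literal is the same number as its decimal spelling.
const hexNumber = 0xb1190;
console.log(hexNumber === 725392);   // true
console.log((725392).toString(16));  // "b1190"
```

Nothing changes at runtime; only the readability of the source differs.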
I was watching Bret Victor's talk "Inventing on Principle" the other night and decided to try and build the real time JavaScript editor he demoed. You can see it in action at 18:05 when he implements binary search.
It doesn't look like he ever released such an editor, but regardless, I thought I could learn a lot building one like it.
Here's what I have so far
What it can currently do:
Keep track of variables and their values (if assigned as literals)
Print them on the same line on the right
Show parsing errors
I'm using Electron and Angular to build the app, so it's a desktop app for OSX, but written in JavaScript and HTML.
For parsing, I'm using Acorn. So far it's a fantastic parser, but it's really hard to actually run the code after it's been parsed. Permitting only literal assignments such as var x = 1 is doable, but things get really complex fast once you try to do stuff as simple as var x = 1 + 2, due to how Acorn structures the parsed result.
I don't want to just eval the whole thing, since it could be dangerous and there are probably better ways to do it.
Ideally, I could find a safe way to evaluate the code on the left and keep track of all the variables somehow. Unfortunately, my research indicates that there is no access to private variables in JavaScript, so I'm hoping I can count on fellow developers' ingenuity to help me with this. Any hints on how to do this better/easier than with Acorn would be greatly appreciated.
If you need it, my code base is here: https://github.com/dchacke/nasherai
Try sandbox for safe evaluation of strings.
var s = new Sandbox()
s.run( '1 + 1 + " apples"', function( output ) {
// output.result == "2 apples"
})
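If running the whole source in a sandbox is more than is needed, another option is to walk the parsed tree and evaluate it yourself. Acorn produces nodes in the ESTree format, so var x = 1 + 2 becomes a VariableDeclaration whose init is a BinaryExpression. Below is a minimal sketch of such a walker; the function names evaluate and runDeclaration are made up here, only four operators are handled, and the tree is written by hand so that the example runs without the parser installed.

```javascript
// Evaluate a small subset of ESTree expression nodes.
// `scope` maps variable names to their current values.
function evaluate(node, scope) {
  switch (node.type) {
    case 'Literal':
      return node.value;
    case 'Identifier':
      if (!Object.prototype.hasOwnProperty.call(scope, node.name)) {
        throw new ReferenceError(node.name + ' is not defined');
      }
      return scope[node.name];
    case 'BinaryExpression': {
      const left = evaluate(node.left, scope);
      const right = evaluate(node.right, scope);
      switch (node.operator) {
        case '+': return left + right;
        case '-': return left - right;
        case '*': return left * right;
        case '/': return left / right;
        default: throw new Error('Unsupported operator: ' + node.operator);
      }
    }
    default:
      throw new Error('Unsupported node type: ' + node.type);
  }
}

// Record the value of each declared variable in the scope object.
function runDeclaration(declaration, scope) {
  for (const declarator of declaration.declarations) {
    scope[declarator.id.name] = declarator.init
      ? evaluate(declarator.init, scope)
      : undefined;
  }
}

// Hand-written tree in the ESTree shape for `var x = 1 + 2`.
const ast = {
  type: 'VariableDeclaration',
  kind: 'var',
  declarations: [{
    type: 'VariableDeclarator',
    id: { type: 'Identifier', name: 'x' },
    init: {
      type: 'BinaryExpression',
      operator: '+',
      left: { type: 'Literal', value: 1 },
      right: { type: 'Literal', value: 2 },
    },
  }],
};

const scope = {};
runDeclaration(ast, scope);
console.log(scope.x); // 3
```

Because every value passes through the scope object, the editor can read all variables after each statement without needing access to private variables.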
I found this (link) and wondered: what are the reasons behind this coding style?
Defining identifiers like _0x3384x4 seems rather unreadable for a human.
or writing object properties like:
{
"\x63\x68\x61\x72\x73": ' \uD83D\uDE23 ',
"\x63\x6C\x61\x73\x73": '_1az _1a- _2gc',
"\x6E\x61\x6D\x65": 'Bi\u1EC3u t\u01B0\u1EE3ng vui 18'
}
This could be written like this, couldn't it?
{ chars: " 😣 ", class: "_1az _1a- _2gc", name: "Biểu tượng vui 18" }
Is it because of some old computers that cannot display these characters? Is it a kind of uglifying, to protect the JavaScript code?
What kind of format is 0x7892x8? It looks like hex, but what does it represent? eval("0x7892") evaluates to 30866, but does 0x7892x8 mean the 8th version of 30866? That doesn't make sense to me.
It is not a coding style. It is obfuscation.
From Wikipedia:
In software development, obfuscation is the deliberate act of creating
obfuscated code, i.e. source or machine code that is difficult for
humans to understand. Like obfuscation in natural language, it may use
needlessly roundabout expressions to compose statements.
Programmers may deliberately obfuscate code to conceal its purpose
(security through obscurity) or its logic, in order to prevent
tampering, deter reverse engineering, or as a puzzle or recreational
challenge for someone reading the source code.
Programs known as obfuscators transform readable code into obfuscated
code using various techniques.
There are many tools out there, called obfuscators, which obfuscate code. Here is a JavaScript obfuscator, for example:
http://www.jsobfuscate.com/
As you correctly guessed, it is hexadecimal. For example, \x63 means 99 in decimal.
Now we take a look at a code table:
http://www.codetable.net/
And we see that 99 decimal represents the character c. So \x63 is basically c.
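The mapping can be reproduced in a few lines. This is only a sketch of the idea, not the code of any particular obfuscator; toHexEscapes and fromHexEscapes are names invented for this example, and the encoder assumes character codes below 0x100.

```javascript
// Encode every character as a \xNN escape (assumes char codes below 0x100).
function toHexEscapes(text) {
  return Array.from(text)
    .map((ch) => '\\x' + ch.charCodeAt(0).toString(16).padStart(2, '0'))
    .join('');
}

// Decode \xNN escapes back into plain characters.
function fromHexEscapes(escaped) {
  return escaped.replace(/\\x([0-9a-fA-F]{2})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

console.log(toHexEscapes('c'));     // \x63
console.log(toHexEscapes('chars')); // \x63\x68\x61\x72\x73
console.log(fromHexEscapes('\\x63\\x6C\\x61\\x73\\x73')); // class
```

The second call produces the same escapes as the "chars" property key in the question, and the third decodes the "class" key.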
As an exercise, I've been working on replicating this game. In case it becomes inaccessible, the premise of the game is to take a quote that's been scrambled by swapping pairs of letters (e.g. replace A with M and vice versa) and unscramble it to its original arrangement.
As I'm studying this game, I realize it's almost trivial to extract the solution from the source - there are any number of breakpoints you can place to access it. I've been trying to come up with a way to obscure the string in a way that it isn't immediately accessible, and the only thing I can think of is some kind of native obscuring function before the quote even has a chance to land in a variable. Something like this:
var litmus, quotes = [
"String One",
"String Two",
....
"String n",
];
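// Pick a random quote; using quotes.length lets every entry be chosen, including the last.
litmus = obscureString(quotes[Math.floor(Math.random() * quotes.length)]);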
This way the user can't summon up the raw quote, or even the random integer that was used - they're gone by the time the breakpoint hits.
My question is this: is there any kind of native function that would fit the role of obscureString() in the above example, even loosely? I'm aware JavaScript doesn't have any native encryption/hash methods, and any libraries that provide that functionality just provide a chance to drop a breakpoint. Thus, I'm hoping someone here can come up with a creative way to natively obscure a string, if it's even possible in JS.
Been crunching on it for a while, and I found a very makeshift solution.
The only native (read: non-user-corruptible) transformation/hash function I was able to find was window.btoa. It does exactly what I need, in letting me obscure a string before the user ever has a chance to get their hands on it. The problem, however, is that it has a counterpart window.atob, whose only purpose is to reverse the process.
To solve that, I was able to neutralize window.atob with the following line of code, essentially making window.btoa a one-way trip:
window.atob = function(f){ return f; };
Don't make a habit of this.
This is horrific practice, and I feel dirty for writing it. It's passable in this case because my application is small, self-contained, and won't ever need to rely on that function elsewhere - but I can't in good conscience recommend this as a general solution. Many browsers won't even let you override native functions in the first place.
Just wanted to post the answer in case someone found themselves in a similar situation needing a similar answer - this may be the closest we can get to a one-way native hash function for now.
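One caveat worth keeping in mind: Base64 is a reversible encoding rather than a hash, so replacing window.atob only removes the convenient built-in decoder. A determined player can paste in a few lines of their own, along the lines of this sketch (decodeBase64 is a made-up name, and it only handles characters with codes below 0x100, which is also the range btoa accepts).

```javascript
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode a Base64 string by hand, without calling atob.
function decodeBase64(encoded) {
  let bits = 0;      // bit buffer
  let bitCount = 0;  // number of valid bits currently in the buffer
  let output = '';
  for (const ch of encoded) {
    if (ch === '=') break;                        // padding marks the end
    bits = (bits << 6) | BASE64_ALPHABET.indexOf(ch);
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      output += String.fromCharCode((bits >> bitCount) & 0xff);
      bits &= (1 << bitCount) - 1;                // keep only the leftover bits
    }
  }
  return output;
}

// 'U3RyaW5nIE9uZQ==' is the Base64 form of one of the sample quotes.
console.log(decodeBase64('U3RyaW5nIE9uZQ==')); // String One
```

So this approach raises the effort needed slightly, but it does not make the quote unrecoverable.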
I'm an artist who's written a simple game in JavaScript. Yay! But go easy on me because I bruise like a peach!
I'm looking into making it difficult to cheat at the game. So code obfuscation will make it difficult to cheat, right? Difficult, not impossible. I realise that, and I could accidentally be opening a can of worms here...
Essentially, I'm looking for an online tool that renames variables; and don't say search and replace in TextPad :).
For example using http://packer.50x.eu/ on one line of code
var loopCounter = 0;
we get the result:
eval(function(p,a,c,k,e,d){e=function(c){return c};if(!''.replace(/^/,String)){while(c--){d[c]=k[c]||c}k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c])}}return p}('1 2=0;',3,3,'|var|loopCounter'.split('|'),0,{}))
The above looks like a mess, which is great, but it's quite easy to pick out English words like loopCounter. I would have expected it to make variable names obscure (single letters? words without nouns? names that look very similar?). Or should that have been my task anyway as part of writing the code? Or is trying to disguise variable names a waste of time, since a variable declaration is preceded by var and is therefore easy to spot anyway?
After a lot of searching (and links to the above) I found this which allows obfuscated string variables. And that is what I was after.
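For a sense of what the renaming step involves, here is a naive sketch. It is not how packer or any real obfuscator works internally; renameIdentifiers is a made-up helper, and it assumes the names contain only letters, digits, and underscores.

```javascript
// Replace each listed identifier with a generated hex-looking name.
// Naive on purpose: it also rewrites matches inside strings and comments,
// which a real tool avoids by working on the parsed syntax tree.
function renameIdentifiers(source, names) {
  let result = source;
  names.forEach((name, index) => {
    const replacement = '_0x' + (0x1000 + index).toString(16);
    result = result.replace(new RegExp('\\b' + name + '\\b', 'g'), replacement);
  });
  return result;
}

console.log(renameIdentifiers('var loopCounter = 0; loopCounter++;', ['loopCounter']));
// var _0x1000 = 0; _0x1000++;
```

The var keyword still shows where a variable is declared, but the name no longer says what the variable is for, which is the part that helps a reader.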
There are a few online tools available for this: a JavaScript compressor, and then there's a JavaScript minifier that you can also use for large files. Otherwise you could just search for some offline tools; I'm pretty sure they're easy to find.
You could use the JavaScript Obfuscator... your code will be very difficult to decode!
Hope it helps! ^_^
I'm building a solution for a client which allows them to create very basic code.
I've done some basic syntax validation, but I'm stuck at variable verification.
I know JSLint does this using JavaScript, and I was wondering if anyone knew of a good way to do this.
So for example say the user wrote the code
moose = "barry"
base = 0
if(moose == "barry"){base += 100}
Then I'm trying to find a way to verify that the "if" expression has the correct syntax, that the variable moose has been initialized, and so on,
but I want to do this without scanning character by character.
The code is a mini language built just for this application, so it is very basic and doesn't need to manage memory or anything like that.
I had thought about splitting first by carriage return and then by space, but there is nothing to say the user won't write something like moose="barry" or if(moose=="barry"),
and there is nothing to say the user won't keep the result of a condition inline.
Obviously compilers and interpreters do this on a much more extensive scale, but I'm not sure whether they do it character by character and, if they do, how they have optimized it.
(The other option is to send it back to PHP to process, which would relieve the browser of the responsibility.)
Any suggestions?
Thanks
The use case is limited, and the syntax will never be extended in this case. The language is a simple scripting language that enables the client to create a unique cost based on their users' input. The end result will be processed by PHP regardless, to ensure the calculation can't be adjusted by the end user and to ensure there is some consistency.
So for example, say there is a base cost of £1.00
and there is a field on the form called "Additional Cost"; the language will allow them to manipulate the base cost relative to the "additional cost" field.
So
base = 1;
if(additional > 100 && additional < 150){base += 50}
elseif(additional == 150){base *= 150}
else{base += additional;}
This is a basic example of how the language would be used.
Thank you for all your answers,
I've investigated a parser, and creating one would be far more complex than is required.
Having run several tests with thousands of lines of code, I found that character-by-character processing only takes a few seconds, even on a single-core P4 with 512 MB of memory (which is far less than the customer uses).
I've decided to build a PHP-based syntax checker which will check the input and convert the variables etc. into valid PHP code whilst it's checking it (so that it's ready to be called later without recompilation). Using this instead of JavaScript seems more appropriate and will allow for more complex code to arise without hindering the validation process.
It's only taken an hour, and I have code which is able to check the validity of an if statement and isn't confused by nested ifs, spaces or odd expressions. There is very little left to be checked, whereas a parser and a full-blown scripting language would have taken a lot longer.
You've all given me a lot to think about, and I've rated the relevant answers. Thank you.
If you really want to do this — and by that I mean if you really want your software to work properly and predictably, without a bunch of weird "don't do this" special cases — you're going to have to write a real parser for your language. Once you have that, you can transform any program in your language into a data structure. With that data structure you'll be able to conduct all sorts of analyses of the code, including procedures that at least used to be called use-definition and definition-use chain analysis.
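As a rough illustration of the kind of analysis that becomes easy once the program is a data structure, here is a sketch of a "used before assigned" check. The node shapes (assign, if, variable, binary) and the function names are assumptions made up for this example, and the tree for the program from the question (minus its base = 0 line) is written out by hand.

```javascript
// Collect the names of all variables read by an expression.
function collectUses(expr, uses) {
  if (expr.type === 'variable') uses.push(expr.name);
  if (expr.type === 'binary') {
    collectUses(expr.left, uses);
    collectUses(expr.right, uses);
  }
  return uses;
}

// Walk the statements in order and report variables read before assignment.
function findUndefinedVariables(statements, defined = new Set(), problems = []) {
  for (const stmt of statements) {
    if (stmt.type === 'assign') {
      const uses = collectUses(stmt.value, []);
      if (stmt.op !== '=') uses.push(stmt.name); // `+=` reads the old value first
      for (const name of uses) {
        if (!defined.has(name)) problems.push(name);
      }
      defined.add(stmt.name);
    } else if (stmt.type === 'if') {
      for (const name of collectUses(stmt.test, [])) {
        if (!defined.has(name)) problems.push(name);
      }
      findUndefinedVariables(stmt.body, defined, problems);
    }
  }
  return problems;
}

// moose = "barry"
// if (moose == "barry") { base += 100 }   <- base was never initialized
const exampleProgram = [
  { type: 'assign', name: 'moose', op: '=', value: { type: 'string', value: 'barry' } },
  {
    type: 'if',
    test: {
      type: 'binary', op: '==',
      left: { type: 'variable', name: 'moose' },
      right: { type: 'string', value: 'barry' },
    },
    body: [
      { type: 'assign', name: 'base', op: '+=', value: { type: 'number', value: 100 } },
    ],
  },
];

console.log(findUndefinedVariables(exampleProgram)); // [ 'base' ]
```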
If you concoct a "programming language" that enables some scripting in an application, then no matter how trivial you think it is, somebody will eventually write a shockingly large program with it.
I don't know of any readily-available parser generators that generate JavaScript parsers. Recursive descent parsers are not too hard to write, but they can get ugly to maintain and they make it a little difficult to extend the syntax (esp. if you're not very experienced crafting the original version).
You might want to look at JS/CC, which is a parser generator, written in JavaScript, that generates a parser for a grammar. You will need to figure out how to describe your language using BNF or EBNF. Also, JS/CC has its own syntax (which is somewhat close to actual BNF/EBNF) for specifying the grammar. Given the grammar, JS/CC will generate a parser for it.
Your other option, as Pointy said, is to write your own lexer and recursive-descent parser from scratch. Once you have a BNF/EBNF, it's not that hard. I recently wrote a parser from an EBNF in JavaScript (the grammar was pretty simple, so it wasn't that hard to write one; YMMV).
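To give an idea of the size of such a hand-written parser, here is a sketch for a small subset of the language in the question: assignments and if blocks, with no elseif/else. The grammar in the comments, the node shapes, and the function names are assumptions made for illustration, not a specification of the asker's language.

```javascript
// Lexer: split the source into tokens. Whitespace never becomes a token,
// so moose="barry" and moose = "barry" produce the same token list.
function tokenize(source) {
  return source.match(/\d+(?:\.\d+)?|"[^"]*"|[A-Za-z_]\w*|==|!=|<=|>=|&&|[-+*\/]=|\S/g) || [];
}

function parse(source) {
  const tokens = tokenize(source);
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];
  const expect = (token) => {
    if (next() !== token) throw new SyntaxError('Expected "' + token + '"');
  };

  // primary := NUMBER | STRING | NAME
  function parsePrimary() {
    const token = next();
    if (token === undefined) throw new SyntaxError('Unexpected end of input');
    if (/^\d/.test(token)) return { type: 'number', value: Number(token) };
    if (token[0] === '"' && token.length >= 2) return { type: 'string', value: token.slice(1, -1) };
    if (/^[A-Za-z_]/.test(token)) return { type: 'variable', name: token };
    throw new SyntaxError('Unexpected token ' + token);
  }

  // comparison := primary (("==" | "!=" | "<" | ">" | "<=" | ">=") primary)?
  function parseComparison() {
    const left = parsePrimary();
    if (['==', '!=', '<', '>', '<=', '>='].includes(peek())) {
      return { type: 'binary', op: next(), left, right: parsePrimary() };
    }
    return left;
  }

  // condition := comparison ("&&" comparison)*
  function parseCondition() {
    let node = parseComparison();
    while (peek() === '&&') {
      next();
      node = { type: 'binary', op: '&&', left: node, right: parseComparison() };
    }
    return node;
  }

  // statement := "if" "(" condition ")" "{" statement* "}"
  //            | NAME ("=" | "+=" | "-=" | "*=" | "/=") primary ";"?
  function parseStatement() {
    if (peek() === 'if') {
      next();
      expect('(');
      const test = parseCondition();
      expect(')');
      expect('{');
      const body = [];
      while (peek() !== '}' && peek() !== undefined) body.push(parseStatement());
      expect('}');
      return { type: 'if', test, body };
    }
    const target = parsePrimary();
    if (target.type !== 'variable') throw new SyntaxError('Expected a variable name');
    const op = next();
    if (!['=', '+=', '-=', '*=', '/='].includes(op)) {
      throw new SyntaxError('Expected an assignment operator');
    }
    const value = parsePrimary();
    if (peek() === ';') next(); // the trailing semicolon is optional
    return { type: 'assign', name: target.name, op, value };
  }

  const statements = [];
  while (pos < tokens.length) statements.push(parseStatement());
  return statements;
}

const tree = parse('moose="barry"\nbase = 0\nif(moose=="barry"){base += 100}');
console.log(tree.length);  // 3
console.log(tree[2].type); // if
```

Each grammar rule maps to one function, which is what keeps a recursive-descent parser readable; a malformed program such as one with a missing ")" raises a SyntaxError instead of being silently accepted.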
To address your comments about it being "client specific". I will also add my own experience here. If you're providing a scripting language and a scripting environment, there is no better route than an actual parser.
Handling special cases through a bunch of if-elses is going to be horribly painful and a maintenance nightmare. When I was a freshman in college, I tried to write my own language. This was before I knew anything about recursive-descent parsers, or just parsers in general. I figured out by myself that code can be broken down into tokens. From there, I wrote an extremely unwieldy parser using a bunch of if-elses, and also splitting the tokens by spaces and other characters (exactly what you described). The end result was terrible.
Once I read about recursive-descent parsers, I wrote a grammar for my language and easily created a parser in a tenth of the time it took me to write my original parser. Seriously, if you want to save yourself a lot of pain, write an actual parser. If you go down your current route, you're going to be fixing issues forever. You're going to have to handle cases where people put a space in the wrong place, or perhaps have one too many (or one too few) spaces. The only other alternative is to provide an extremely rigid structure (i.e., you must have exactly x spaces following this statement), which is liable to make your scripting environment extremely unattractive. An actual parser will automatically fix all these problems.
JavaScript has a function called eval.
var code = 'alert(1);';
eval(code);
This will show an alert. You can use eval to execute basic code.