I want to write a program that scans JavaScript code and replaces variable names with short, meaningless identifiers, without breaking the code.
I know the YUI Compressor and Google's Closure Compiler can do this. I am wondering how I can implement it myself.
Is it necessary to build the abstract syntax tree? If not, how can I find the candidate variables for renaming?
Most modern JavaScript compressors are actually compilers. They parse the JavaScript input into an abstract syntax tree, perform operations on the tree (some safe, some not) and then use the syntax tree to print out code. Both UglifyJS and Closure Compiler are true compilers.
Implementing your own compiler is a large project and requires a solid knowledge of compiler theory. The dragon book is a great resource to get started with.
You may be able to leverage existing work. I recommend starting from a non-optimizing compiler for reference.
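If you do leverage existing work, a library such as Terser (the successor to UglifyJS) already implements the whole parse / mangle / print pipeline, so you can study or reuse it instead of writing your own. A minimal sketch, assuming Terser is installed from npm (the sample source and the exact output are illustrative):

    // Node.js sketch using the Terser API (npm install terser)
    const { minify } = require("terser");

    const source =
      "function add(firstNumber, secondNumber) { return firstNumber + secondNumber; }";

    minify(source, { mangle: true, compress: false }).then((result) => {
      console.log(result.code);
      // prints something like: function add(n,r){return n+r}
    });

The mangle step is exactly the "rename variables without breaking the code" part: it works on the syntax tree, so it knows which identifiers are local bindings and which are properties or globals that must not be touched.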
I made http://www.whak.ca for whacking scripts into unreadable obfuscation; it has over 75 algorithms ready for obfuscating your code. There are also 20 JavaScript compression packers on http://www.whak.ca/packer/ that will obfuscate your code. All of these can be reverse engineered if someone wants your code badly enough, but people can pick locks and we still lock our doors...
Related
I'm looking for a way to protect some javascript code from reading/modifying. I know many people consider that impossible, but still...
From what I can see, Chrome's V8 engine does a number of optimizations when it sees JS code, probably compiles it (?) and then runs it.
So I'm wondering: is it possible to use V8's C++ API to compile the JS code into machine code/Chrome code and then feed that directly into Chrome (I don't care about other browsers)?
Supposedly it would not only be faster, but also not human-readable, something like ASM.
Is this possible?
WebAssembly is doing this, so I don't understand why we can't do it with JS code.
There's also EncloseJS and pkg that are doing a very similar thing.
V8 developer here. No, it is not possible to compile JavaScript ahead of time and send only the compiled code to the browser. V8 (and other virtual machines like it) contains compilers, but they cannot be used as standalone compilers to produce standalone binaries.
In theory, you could compile JavaScript to WebAssembly -- any two Turing-complete programming languages can in theory be compiled to each other. As far as I know, no such compiler exists today, though. One big reason is that the performance of the end result would be horrible (see the discussion with Andreas Rossberg for details); so, considering that browsers can execute JavaScript directly, people have little reason to develop such a thing. (It would also be a large and difficult task.)
As for your stated goal: your best shot at making JavaScript code unreadable is to minify it. In fact, that is effectively just as good as your idea to generate assembly, because disassemblers exist that turn assembly back into minified-like higher-level language code; they cannot reconstruct variable names or comments (because that information is lost during compilation), but they can reconstruct program logic.
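As a purely illustrative example of what that loss of information looks like (both snippets are invented for this answer, not output from any particular tool):

    // Readable source
    function calculateDiscountedPrice(basePrice, discountRate) {
      const discountAmount = basePrice * discountRate; // how much we take off
      return basePrice - discountAmount;
    }

    // Typical minified form: the logic survives, the names and comments do not
    function c(a, r) { return a - a * r }

    console.log(calculateDiscountedPrice(100, 0.2), c(100, 0.2)); // 80 80

A disassembler working on compiled machine code can get you back to roughly the second form, but never to the first.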
What I ended up doing is moving some of the logic from JavaScript into C++ and compiling that into NodeJS native modules (that's possible for Electron apps).
It works pretty well: it's very fast and the source is... as protected as it can get. You may need to worry about cross-platform issues, and compiling/linking can be a bit of a pain, but other than that it's great.
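For reference, the JavaScript side of that setup can be as small as a require of the compiled addon; the path and the exported function below are hypothetical:

    // Hypothetical: load a native addon built with node-gyp / node-addon-api.
    // The protected algorithm lives in compiled C++, not in readable JS.
    const native = require("./build/Release/secret_logic.node");

    console.log(native.computeLicenseKey("user@example.com"));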
WebAssembly is not doing that. And no, it's not possible either. The web is supposed to be both browser- and hardware-independent.
Moreover, a language like JS would not be faster if compiled offline -- it is only anything close to fast because it is dynamically compiled and optimised, taking dynamic profile information into account.
I saw those JavaScript front-end build tools, e.g. webpack, using the word "compile" from time to time. I am not sure what compiling JavaScript code means exactly; it's clearly not like compiling C/C++ code.
I think I understand the "build" process in general: bundle all JS code into one big file, minify/uglify the code, use Babel to transform ES6 syntax (transpile). But what does compiling mean here? How does it fit into the whole build process, or is it just another name for the whole build process?
Currently, I think it may just be another name for using Babel to transform ES6 syntax.
PS: After reading the SO question "Is Babel a compiler or transpiler?", I believe my question is not the same as that one, because it is not just related to Babel. For example, webpack also uses the term compiler (https://webpack.js.org/api/compiler/), and I do not understand its meaning there either!
Browserify uses the term compiler as well, e.g. https://github.com/robrichard/browserify-compile-templates: "Compiles underscore templates from HTML script tags into CommonJS in a browserify transform".
It's better to describe the process as "transpilation."
Javascript always executes in a specific environment: in Chrome and Electron, it's the V8 engine; in Firefox, it's SpiderMonkey; etc. Each of these engines supports a specific set of language features and not others. As an example, some engines only support var and do not support const or let. Some support async/await, and others only support Promise.
But web developers know about these other features, and they want to use them, even when they're writing for an engine that doesn't support those features. Why? Most new language features are designed with the goal of making it possible to express complicated concepts in simpler and cleaner ways. This is extremely important, because the number one job of code is to make its purpose clear.
Thus, most language features are essentially syntactic sugar for existing functionality. In those cases, it's always possible to express a routine using both new and old syntax. This is a logical necessity.
A transpiler like Babel can read a script written using advanced syntax, and then re-express the script using a restricted set of language features. Relying on an intermediate representation called an abstract syntax tree, it can produce code that is guaranteed to be functionally equivalent, even though it does the work using very different, more widely-supported control structures.
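For example, here is a script using newer syntax next to roughly what a transpiler might emit for an older engine (the "output" is hand-written to illustrate the idea, not literal Babel output):

    // --- modern source ---
    const squareModern = (n) => n ** 2;
    console.log([1, 2, 3].map(squareModern)); // [1, 4, 9]

    // --- roughly what a transpiler could emit for an ES5-only engine ---
    var squareLegacy = function (n) {
      return Math.pow(n, 2);
    };
    console.log([1, 2, 3].map(squareLegacy)); // [1, 4, 9]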
Perhaps we web developers have gotten lazy in our terminology. When we talk of "compiling" javascript, we aren't usually talking about converting the script to something like bytecode. We're talking about transpilation.
Other kinds of build tasks are also becoming quite common. These days, the front-end is obsessed with a million flavors of "templating," because it's extremely tedious and confusing to describe DOM changes using pure javascript, and because application complexity is increasing very rapidly. Some frameworks require you to convert source code to other intermediary forms that are later consumed by the web application at runtime. Others permit devs to describe UI using invented syntaxes that no browser is even attempting to support natively. Which tasks are needed varies by application depending on which frameworks are being used, the particulars of the application architecture, and the contours of the deployment environment, and that's just a start.
At its foundation, a web page is built using HTML, CSS, and javascript. That much hasn't changed. But today, most serious applications are built almost entirely in javascript (or something very much like it) and sass. Building the application is the process of applying a set of transformations to the source code to yield the final artifacts in those three bedrock languages.
We lump all that stuff under the term "compile."
You pretty much hit the nail on the head. When the compile (or, more appropriately, transpilation) operation happens on a JavaScript project, it can mean a number of things. As you mentioned, these range from minification and applying polyfills or shims to the literal act of "compiling" the scripts into a single bundle file for platform/browser consumption.
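As a tiny example of the polyfill part, a build step might inject something like the following simplified sketch (real builds usually pull a spec-compliant version from a library such as core-js):

    // Simplified polyfill sketch: only add the method if the engine lacks it
    if (!Array.prototype.includes) {
      Array.prototype.includes = function (searchElement, fromIndex) {
        var index = fromIndex || 0;
        for (; index < this.length; index++) {
          if (this[index] === searchElement) {
            return true; // found it
          }
        }
        return false;
      };
    }

    console.log([1, 2, 3].includes(2)); // true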
Transpilation, when using supersets of the JavaScript language such as TypeScript, ActionScript, or UnityScript, describes the process of converting the source x-script back into native JavaScript, which can in turn be interpreted by a browser (since the browser doesn't recognize the superset languages).
However, you are absolutely correct: we aren't compiling our JavaScript into binary, but the term gets thrown around a lot, which can lead to confusion. All that said, we are closing in on the adoption of WebAssembly and asm.js, which promise to bring bytecode running in the browser and, with it, some interesting possibilities, but alas... that's a story for another day ;)
You're right when you say these front-end JavaScript tools don't use the word compile in the same sense you're used to with build tools for languages like C/C++. C/C++ compilers turn source code into machine code.
These JavaScript build tools -- like webpack -- use the word compile in a sense that's more metaphorical than conventional.
When web build tools use the word compile, they mean that they are transpiling, minifying (a.k.a. uglifying), and bundling the source files so they are better optimized for client browsers and network requests (smaller file sizes, better browser compatibility, fewer HTTP requests thanks to bundled assets, etc.).
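For instance, a minimal webpack configuration that "compiles" in this sense -- bundling and minifying an application into one file -- might look something like this (entry/output paths are illustrative):

    // webpack.config.js
    const path = require("path");

    module.exports = {
      mode: "production",       // enables webpack's built-in minification
      entry: "./src/index.js",  // application entry point
      output: {
        path: path.resolve(__dirname, "dist"),
        filename: "bundle.js",  // single bundled artifact served to the browser
      },
    };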
I'm trying to get a more powerful regex library into JavaScript. The only solution I found is to compile the Oniguruma regex library to JavaScript using Emscripten.
I've installed Emscripten and tested it with its small test scripts, and I've downloaded the Oniguruma source code, but I still don't know what to do next.
Anyone familiar with emscripten?
When you use Emscripten, the general way of building/compiling C/C++ stays similar. What changes is that you don't use e.g. the gcc compiler, but the Emscripten compiler instead.
That said, there is the general question of whether you are familiar with C/C++ and, more specifically, with autotools (which seems to be the build system Oniguruma uses). If you are not, you will probably have a very hard time understanding what needs to be done and how.
Last I checked, Emscripten did not have support for libtool, so building with autotools will probably fail. Feel free to ask in the Emscripten IRC channel, though, whether this is indeed not possible.
Another way I can think of is using autotools to generate Makefiles and then writing custom targets for Emscripten programs. Beware that this is for advanced users, familiar with the make cruft.
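If the build eventually succeeds, the JavaScript side of calling into the compiled library typically goes through Emscripten's ccall/cwrap helpers. A hypothetical sketch, assuming the generated glue script (which defines the global Module) is loaded and that a C wrapper function named onig_search_simple plus the cwrap runtime helper were exported from the build:

    // Hypothetical: everything here depends on what your build actually exports
    Module.onRuntimeInitialized = function () {
      // cwrap builds a JS function around the exported C symbol
      var search = Module.cwrap("onig_search_simple", "number", ["string", "string"]);
      console.log(search("(?<=foo)bar", "foobar")); // e.g. a match offset, or -1
    };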
If these steps are too taxing for you, perhaps you should see whether a JavaScript library is sufficient for your needs.
A more realistic approach is to use http://xregexp.com. It adds many more features to regexes and compiles them down to JavaScript's more limited RegExp dialect, so you get the best of both features and performance. A regex library compiled with Emscripten is very unlikely to be performant enough to use in production. For some uses Emscripten is excellent, but in this case the overhead is probably not worth the cost.
The author of XRegExp even has an article on lookbehinds http://blog.stevenlevithan.com/archives/javascript-regex-lookbehind
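As a quick taste of what XRegExp adds (a sketch assuming npm install xregexp; note that how named groups are exposed on the match object differs slightly between XRegExp versions):

    const XRegExp = require("xregexp");

    // Free-spacing ('x') flag plus named capture groups
    const datePattern = XRegExp(
      "(?<year>  [0-9]{4} ) -?  # year  \n" +
      "(?<month> [0-9]{2} ) -?  # month \n" +
      "(?<day>   [0-9]{2} )     # day",
      "x"
    );

    const match = XRegExp.exec("2017-08-22", datePattern);
    console.log(match.groups ? match.groups.year : match.year); // "2017"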
So basically I have a number of concerns holding me back from coffeescript:
I'm not really an expert in JS yet; even though I've been using it for around 3 years now, I still feel like I'm missing something important about it. Since it's mostly a supporting technology for me, I never find the time to go into the depths of JS (which, I admit, might be the wrong attitude).
My JS knowledge will get even worse if I start using CoffeeScript.
I'm not sure if I can actually trust CoffeeScript, meaning the JS code it compiles to.
At times I don't understand the JS code CoffeeScript compiles to and, even worse, why it compiles like that.
I'd like to know your thoughts on the above points. The crucial one is:
How has using CoffeeScript affected your knowledge of JS? And how important do you think it is to fully understand JS before switching to CoffeeScript?
You should understand what problems Coffeescript is supposed to solve.
And for that, you should have a basic knowledge of JavaScript's "bad parts".
I suggest reading Douglas Crockford on that (there's a book, but also a lot of resources on the internet; just google "javascript bad parts").
Basically, the idea is that "Underneath all those awkward braces and semicolons, JavaScript has always had a gorgeous object model at its heart. CoffeeScript is an attempt to expose the good parts of JavaScript in a simple way." (taken from coffeescript's site).
There's a tool called JSLint that assists programmers in avoiding JavaScript pitfalls.
This tool analyzes your code and gives warnings about common mistakes, such as accidental global variables, semicolon insertion, namespace pollution, etc.
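For instance, a linter will flag code like the following invented snippet, which runs but not the way it reads:

    function buildGreeting(name) {
      greeting = "Hello, " + name; // missing `var`: creates an accidental global
      return                       // automatic semicolon insertion: returns undefined
        greeting + "!";            // this line is never reached
    }

    console.log(buildGreeting("world")); // undefined, not "Hello, world!"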
CoffeeScript translates to JavaScript, but the JavaScript it generates is a canonical subset, highly compliant with JSLint.
What's more, it generates JavaScript code that is valid in all browsers.
So it is not just a nice syntactic sugar layer, it really helps generate solid code.
I'd like to address your concerns.
1) If you've been using JS for three years, you probably have a pretty solid understanding of JS. If you haven't gained a solid understanding yet, it may be time to supplement your knowledge with one of the good JS books.
2) Coffee-script probably won't make your knowledge of JS worse. The way you design Coffee-script applications is the same way that you would design a JS application (for the most part), so the design skills you gain will transfer over. Program design, in my opinion, is the most important aspect of programming.
3) Why don't you trust the JS? Why do you trust any of the other compilers/interpreters/other tools you use? I doubt Coffee-script is bug free, but many people use it for many purposes. This means that a large set of behavior has been tested, often in production, so your use case has probably already been tried and tested.
4) Of course the JS generated by Coffee-script will look foreign to you, since the rules for generating it don't have human readability as a priority. Reading it, however, will increase your knowledge of JS as you see how peculiarly written programs run. This brings us back to point 1.
I think that the crucial thing to remember here is that Coffeescript IS javascript. Every Coffeescript statement or magic operator has a distinct concrete representation in Javascript. For instance (x) -> x * x in Coffeescript will translate directly to function (x) { return x * x; }.
You can't really write Coffeescript without being aware of the Javascript it will generate. For one thing, the generated Javascript is what you will have to debug. If anything, I believe that writing Coffeescript could possibly improve your understanding of Javascript, because it forces you to make decisions that are unique to Javascript. For instance, when you decide in Coffeescript to use => instead of ->, you are in reality making a decision about whether or not you want to bind this -- a very real Javascript problem.
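To make that concrete, the fat arrow compiles to JavaScript that captures the enclosing this, roughly along the lines of the hand-written pattern below (not literal CoffeeScript compiler output):

    // CoffeeScript:
    //   increment: -> @count += 1    # thin arrow: `this` depends on the caller
    //   increment: => @count += 1    # fat arrow: `this` is bound to the enclosing scope
    //
    // Roughly the pattern the fat-arrow version compiles to:
    var _this = this;
    var increment = function () {
      return _this.count += 1;
    };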
When (or if) to start using Coffeescript? I think the answer to this is more or less up to you. Try it out, and if you feel that it is easier to get your tasks done using Coffeescript, then stick to it. If you find it difficult to write the code in a different language from the one that runs (and thus the one you have to debug), then go back to Javascript.
So here's what I think about the topic:
JS is not a supporting technology (supporting what?). It is a language mostly used on the front-end, and there is a newer trend of using it on the back-end. Since browsers do not support CoffeeScript natively, unless you use it on the back-end I don't think there is much point in using CoffeeScript, although learning a new language is always a good idea.
Not at all. Actually, using CoffeeScript is like using a different language. Learning one cannot make you worse at the other, unless you stop learning the other one.
There is no evidence that CoffeeScript compiles to buggy or slow code. I have actually been using CoffeeScript for some time and I haven't observed any performance hit.
Actually, you don't need to understand why it compiles like that. If you are using CoffeeScript on the back-end, you don't even have to look at the code it compiles into (you only need the source code). As for using it for browser scripts, then yes, it may be a bit difficult to work with (debug). That's why I always advise writing normal JavaScript for browsers and using CoffeeScript on the back-end.
Now, as for the last question: I don't think CoffeeScript has affected my JS knowledge at all; I treat them as separate languages. Also, you don't need to know JS in order to switch to CoffeeScript (although you should), unless you want to use CoffeeScript on the front-end.
Also mastering JavaScript is always a good idea, no matter what. :)
Is there a way for me to pre-compile my JS code when building my solution? I would like to be made aware of common problems before I get to the browser. Ideally I would build the solution and, if necessary, have a plugin or a build-event call examine the JS code against a Java compiler.
Thank you very much in advance!
Despite what many of the other posters have said, in many cases (including the Mozilla SpiderMonkey engine found in the Firefox browser) JavaScript is in fact compiled into bytecodes, vaguely similar to (but not compatible with) the ones used in Java. You just don't see the compiler's output because it is never made available to you, only to the JavaScript bytecode interpreter. As far as I know, it's also not possible to save the compiled bytecodes for reuse (at least in the web browser context; in other uses of the SpiderMonkey engine I believe the compiled bytecodes can be kept in memory for reuse, but not in a form that can be saved to disk for future use).
I use a JavaScript shell, JSDB, which also uses the SpiderMonkey engine; when you load a file, it will complain about syntax errors before it runs even one line of code. This is not the same type of compilation as Java, though; JavaScript is a loosely typed language, so it won't catch problems the way a Java compiler would (i.e. complain about every last thing under the sun that it knows you haven't done right).
Having said that, I would second JSLint as it would probably catch many of your errors.
As a side note, the Rhino project lets you compile JavaScript into Java classes; I've never tried this, but it sounds interesting.
JavaScript is an interpreted language; it's not compiled until runtime. Also, JavaScript has very, very little to do with Java: Netscape released JavaScript around the same time Sun released Java, and there was some sort of marketing deal between them. Otherwise, they're unrelated.
I apologize for my misuse of the term 'compile'. I do fully understand the difference between compiled and interpreted languages. What I am interested in is to have my syntax checked routinely during a build so typos, invalid method calls and the like are flagged. I'm going to look into what Jason S recommended for this. I am also fully aware that JavaScript is not Java, but have read before that you could run your JS code through a Java Compiler for syntax checking. I was hoping to find something better integrated with VS.
Thank you very much to everyone understanding the intent of my request.
JavaScript isn't compiled, has nothing to do with Java apart from some shared syntax, and is best tested by loading your app into a browser.
There are some JS testing frameworks/tools available but I couldn't recommend any of them myself.
Try JSLint.VS