I think Coffeescript is an awesome language! I was looking for some projects / issues / features that add Static Analysis to Coffeescript. However after some searching I found that the Coffeescript faq and this page suggest that static analysis might not be viable.
I was wondering that if there is a fundamental issue in implementing static analysis / static type checking in Coffeescript, because of which something of this sort does not already exist in the compiler?
Also, is it something that is not possible to do for non trivial checks but might work only for straightforward analysis? When I say straightforward I mean checking for trivial stuff like, whether the user has defined a function twice by the same name (in a class) or on the top level (or perhaps on the top level in a collection of related .coffee files).
I would appreciate if anyone could please point out some examples that show why implementing static analysis / type checking is not straightforward / possible / worth spending time on?
Thank you very much!
This answer is a bit of a brain dump since I'm interested in this also. Hope it helps.
I use the Google Closure Compiler to statically analyze the code that CoffeeScript generates. It has a really good static analyzer, and I'm not sure if there's a good reason to reinvent the wheel here. The easy way is to just write the annotations by hand:
###*
* #param {number} x
* #param {number} y
* #return {number}
###
adder = (x, y) -> x + y
It's a bit verbose, but on the other hand you're borrowing the static analysis abilities of the closure compiler which is really powerful and is able to check a lot. I actually write type annotations in a slightly more concise way, then have a script to rewrite the coffee file. My code ends up looking like this:
#! {number} x {number} y #return {number}
adder = (x, y) -> x + y
I'm sure you can see that the rewriter is pretty straightforward.
A quick note before I move on. Be sure to compile your code with -b (bare) if you're running it through the closure compiler. The closure compiler is pretty good, but it's not smart enough to do data flow analysis. CoffeeScript wraps your code in an anonymous function by default, which will trip up the compiler.
Another option along the same path (this would break compatibility with CoffeeScript, but would be a lot cooler) would be to have the Coffee compiler compile something like this:
adder = (number x, number y): number -> x + y
into JS like this:
/***
* #param {number} x
* #param {number} y
* #return {number
*/
var adder = function(x, y) {
return x + y;
};
which could then be fed into the closure compiler on a compile - if there were no errors the compiler could then strip all the comments out.
Indeed, this guy appeared to be doing exactly this. Sadly, his work seems to be in an incomplete state.
In all of these cases, we defer the hard work - static typechecking - to the closure compiler. If you don't want to do this, I'd understand, but it'd be tough to convince me that it's worthwhile to build a whole new static analysis tool from scratch. :)
EDIT a year later: I just use typescript these days. :)
I'm not an expert in CoffeeScript, so this might be the completely wrong answer, but it basically boils down to this: CoffeeScript is a very expressive language, with most of the semantics dynamically determined (and possibly strange edge cases). This is in much contrast to languages like Standard ML, which have a much more strictly defined semantics. In general, doing static analysis on higher order languages is considered very hard. I.e., static analysis on real higher order programs (Haskell, ML, especially javascript because of things like eval) is just hard because the flow of control is much more flexible. In fact, the static analysis solutions for higher order languages have really only been explored within the past twenty years or so. (Notably, see Matt Might's article on a tutorial style description of CFA.)
Basically, the reasons are this:
To do analysis, you have to deal with the problem of the expressive semantics coming form the flow control you get by slamming around higher order functions.
To do typing, typically these languages have a much richer set of types that are available. For example, there are very typically situations where if you try to assign a static type to a variable in Ruby (as in C, Java, ML, etc...) you get an error, but because certain paths of your program are never executed, it's all fine. Along with that, languages like Ruby others add a plethora of implicit type conversions that are really used to do cool programming. The notable work in this area with which I'm familar (dynamic analysis of static types for Ruby) comes from some of the people I work with, but there are certainly other examples.
Basically, the language is used in a much more dynamic way, with a much more expressive semantics, and reasoning about that statically is much harder, and prone to be imprecise. The basic front of approaching this (these days) is starting to look hybrid: you can statically analyze part of a program, and than also require the programmer give some test cases to do some kind of refined analysis.
I hope this somewhat answers your questions, again, sorry I can't address directly the direct concerns of your question as it applies to CoffeeScript, but there's a lot of work going on in analyzing things like JavaScript right now. I'll note that some of the real problems with Javascript come from it's strange semantics, the prototypical inheritance is hard to reason about, and especially eval()! Typically program analyses for these languages impose certain restrictions (for example, throwing out eval completely!) to make the analysis more feasible!
Related
Does anybody know if JavaScript code that was generated by Google closure with advanced settings be reversed engineered?
Google closure renames most of the js variable and function names so I curious to know if it's a good option for protecting code from being stolen.
Thanks
Yes, it can easily be reversed. Google Closure is a code optimizer which is almost the opposite of code obfuscation. That is not always obvious to everyone because some of the things it does, actually obfuscate a little bit your code.
I talking about the variable and function renaming, whitespace and comment removal.
But it is not meant to protect. For instance, the new names it generates are always the same (deterministic). And the reason is that it only renames variables and functions to replace them with short ones. It does not care for protection. It does not have anything to change the control flow, which is what you'll want if you seek protection. It does not have anything to obfuscate strings and other literals as well. On the contrary, it uses techniques such as Constant Folding and Constant Propagation which actually make your code simpler and easier to follow.
For protection I really think your best option is JScrambler. They have good range of source code transformations and code traps that are meant for protection.
Yes it can be reversed, however, if your code in ADVANCED_OPTIMIZATIONS has a lot of inlining to it, it can be much harder to reverse engineer. Secondly, since there is a lot of constant folding, it makes the code much harder to read, ie:
/**
* #type {number}
* #const
*/
var FLAG_A=0x4FFFFFF0;
/**
* #type {number}
* #const
*/
var FLAG_B=0x01;
/** #type {number} */
var flags=FLAG_A|FLAG_B;
console.log(flags,(flags&FLAG_A)>0);
outputs to:
console.log(1342177265,!0);
This makes the actual code much harder to read and figure out what it's doing. The less you rely on inheritance and strings, and move towards bit flags -- will make the reverse engineered code work, but make it much harder to change or extend.
By using Javascript beautifier, you can convert code to some extend readble but not completly reverse engineered.
What's the fastest way to square a number in JavaScript?
function squareIt(number) {
return Math.pow(number,2);
}
function squareIt(number) {
return number * number;
}
Or some other method that I don't know about. I'm not looking for a golfed answer, but the answer that's likely to be shortest in the compiler, on the average.
Edit: I saw Why is squaring a number faster than multiplying two random numbers? which seemed to indicate that squaring is faster than multiplying two random numbers, and presumed that n*n wouldn't take advantage of this but that Math.pow(n,2) would. As jfriend00 pointed out in the comments, and then later in an answer, http://jsperf.com/math-pow-vs-simple-multiplication/10 seems to suggest that straight multiplication is faster in everything but Firefox (where both ways are similarly fast).
Note: Questions like this change over time as browser engines change how their optimizations work. For a recent look comparing:
Math.pow(x1, 2)
x1 * x1
x1 ** 2 // ES6 syntax
See this revised performance test and run it in the browsers you care about: https://jsperf.com/math-pow-vs-simple-multiplication/32.
As of April 2020, Chrome, Edge and Firefox show less than 1% difference between all three of the above methods.
If the jsperf link is not working (it seems to be occasionally down), then you can try this perf.link test case.
Original Answer from 2014:
All performance questions should be answered by measurement because specifics of the browser implementation and the particular scenario you care about are often what determine the outcome (thus a theoretical discussion is not always right).
In this case, performance varies greatly by browser implementation. Here are are results from a number of different browsers in this jsperf test: http://jsperf.com/math-pow-vs-simple-multiplication/10 which compares:
Math.pow(x1, 2)
x1 * x1
Longer bars are faster (greater ops/sec). You can see that Firefox optimizes both operations to be pretty much the same. In other browsers, the multiplication is significantly faster. IE is both the slowest and shows the greatest percentage difference between the two methods. Firefox is the fastest and shows the least difference between the two.
In ES6 you can do the following with Exponentiation (x ** y), which produces the same result as Math.pow(x,y):
function squareIt(number) {
return number ** 2;
}
console.log(squareIt(5));
or you can use a JavaScript library called BigInteger.js for the purpose.
alert(bigInt(5).square());
<script src="https://cdnjs.cloudflare.com/ajax/libs/big-integer/1.6.40/BigInteger.min.js"></script>
In general, x * x is either much faster than or about the same as a pow() call in any language. pow() is a general exponential designed to work with floating point arguments, and it usually uses a calculation that has a lot more operations than a single multiplication. It's notoriously slow. Some pow() implementations will helpfully filter out integer powers for special evaluations, like for x^4 it might do x2=x * x, x4=x2 * x2, but adding special conditions like that can slow down the general implementation, and the x * x vs. pow() rule is so well known among programmers you can't really count on the library implementation to help you out. This is standard advice in numerical analysis: never use pow() for x^2 (or x^.5). At best, it's no slower than the pow implementation, if it's optimized out as x * x at compile time, and at worst, it's horribly slower (and probably not as accurate either). You could go and test it on every possible platform you expect your code to run on, but in real life, there's no good reason to use pow() for squares. There can be good reasons to write a convenience function that does x * x, but if the language allows it, it's a good idea to make sure that it's marked up so that there's no actual function call going on at the machine level. Unfortunately, I don't think Javascript has anything like that, but I suspect that the JIT compilers are usually smart enough to render short functions like that without a jump.
Regarding the issue of x * x vs. x * y, the former would often be faster simply because it avoids a MOV at the machine level (aside from the considerations in the post you referenced), but it's pretty certain that the JS engine is smart enough not to do an extra MOV if the operand is already in a register. It's not going to load x from memory and then load it from memory again, or move it from one register into another. That's a basic behavior of optimizing compilers. You have to keep in mind that the compiler is going to do a lot of rearranging and consolidation of algebraic operations, so when you write x * x, a lot of things could be going on depending on what happened to x previously or happens to it later. This is another reason to avoid pow(), since the optimizer can do a lot of tricks with x * x that may not be available if it does a pow() call instead. Again, you can hope that it intelligently inlines pow(x,2) to x * x, but don't count on it.
I have read the question How to test and develop with asm.js?, and the accepted answer gives a link to http://kripken.github.com/mloc_emscripten_talk/#/.
The conclusion of that slide show is that "Statically-typed languages and especially C/C++ can be compiled effectively to JavaScript", so we can "expect the speed of compiled C/C++ to get to just 2X slower than native code, or better, later this year".
But what about non-statically-typed languages, such as regular JavaScript itself? Can it be compiled to asm.js?
Can JavaScript itself be compiled to asm.js?
Not really, because of its dynamic nature. It's the same problem as when trying to compile it to C or even to native code - you actually would need to ship a VM with it to take care of those non-static aspects. At least, such a VM is possible:
js.js is a JavaScript interpreter in JavaScript. Instead of trying to create an interpreter from scratch, SpiderMonkey is compiled into LLVM and then emscripten translates the output into JavaScript.
But if asmjs code runs faster than regular JS, then it makes sense to compile JS to asmjs, no?
No. asm.js is a quite restricted subset of JS that can be easily translated to bytecode. Yet you first would need to break down all the advanced features of JS to that subset for getting this advantage - a quite complicated task imo. But JavaScript engines are designed and optimized to translate all those advanced features directly into bytecode - so why bother about an intermediate step like asm.js? Js.js claims to be around 200 times slower than "native" JS.
And what about non-statically-typed languages in general?
The slideshow talks about that from …Just C/C++? onwards. Specifically:
Dynamic Languages
Entire C/C++ runtimes can be compiled and the original language
interpreted with proper semantics, but this is not lightweight
Source-to-source compilers from such languages to JavaScript ignore
semantic differences (for example, numeric types)
Actually, these languages depend on special VMs to be efficient
Source-to-source compilers for them lose out on the optimizations done in those VMs
In response to the general question "is it possible?" then the answer is that sure, both JavaScript and the asm.js subset are Turing complete so a translation exists.
Whether one should do this and expect a performance benefit is a different question. The short answer is "no, you shouldn't." I liken this to trying to compress a compressed file; yes, it is possible to run the compression algorithm, but in general you should not expect the resulting file to be smaller.
The short answer: The performance cost of dynamically-typed languages comes from the meaning of the code; a statically-typed program with an equivalent meaning would carry the same costs.
To understand this, it is important to understand why asm.js offers a performance benefit at all; or, more generally, why statically-typed languages perform better than dynamically-typed ones. The short answer is "run-time type checking takes time," and a longer answer would include the improved feasibility of optimizing statically-typed code. For example:
function a(x) { return x + 1; }
function b(x) { return x - 1; }
function c(x, y) { return a(x) + b(y); }
If x and y are both known to be integers, I can optimize function c to a couple of machine code instructions. If they could be integers or strings, the optimization problem becomes much harder; I have to treat these as string appends in some cases, and addition in other cases. In particular, there are four possible interpretations of the addition operation that occurs in c; it could be addition, or string append, or two different variants of coerce-to-string-and-append. As you add more possible types, the number of possible permutations grows; in the worst case for a dynamically-typed language, you have k^n possible interpretations of an expression involving n terms which could each have any number of k types. In a statically typed language, k=1, so there is always 1 interpretation of any given expression. Because of this, optimizers are fundamentally more efficient at optimizing statically-typed code than dynamically-typed code: There are fewer permutations to consider when searching for opportunities to optimize.
The point here is that when converting from dynamically-typed code to statically-typed code (as you'd be doing when going from JavaScript to asm.js), you have to account for the semantics of the original code. Meaning the type-checking still occurs (it's just now been spelled out statically-typed code) and all those permutations are still present to stifle the compiler.
A few facts about asm.js, which hopefully make the concept clear:
Yes you can write the asm.js dialect by hand.
If you did look at the examples for asm.js, they are very far from being user friendly. Obviously Javascript is not the front end language for creating this code.
Translating vanilla Javascript to asm.js dialect is not possible.
Think about it - if you already could translate standard Javascript in a fully statically manner, why would there be a need for asm.js? The sole existance of asm.js means that the Javascript JIT people at some people gave up on their promise that Javascript will get faster without any effort from the developer.
There are several reasons for this, but let's just say it would be really hard for the JIT to understand a dynamic language as good as a static compiler. And then probably for the developers to fully understand the JIT.
In the end it boils down to using the right tool for the task. If you want static, very performant code, use C / C++ ( / Java ) - if you want a dynamic language, use Javascript, Python, ...
asm.js has been created by the need of have a small subset of javascript which can be easily optimized. If you can have a way to convert javascript to javascript/asm.js, asm.js is not needed anymore - that method can be inserted in js interpreters directly.
In theory, it is possible to convert / compile / transpile any JavaScript operation to asm.js if it can be expressed with the limited subset of the language present in asm.js. In practice, however, there is no tool capable of converting ordinary JavaScript to asm.js at the moment (June, 2017).
Either way, it would make more sense to convert a language with static typing to asm.js, because static typing is a requirement of asm.js and the lack thereof one of the features of ordinary JavaScript that makes it exceptionally hard to compile to asm.js.
Back in 2013, when asm.js was hot, there has been an attempt to compile a statically typed language similar to JavaScript, but both the language and the compiler seem to have been abandoned.
Today, in 2017, JavaScipt subsets TypeScript and Flow would be suitable candidates for conversion to asm.js, but the core dev teams of neither language is interested in such conversion. LLJS had a fork that could compile to asm.js, but that project is pretty much dead. ThinScript is a much more recent attempt and is based on TypeScript, but it doesn't appear to be active either.
So, the best and easiest way to produce asm.js code is still to write your code in C/C++ and convert / compile / transpile it. However, it remains to be seen whether we'll even want to do this in the forseeable future. Web Assembly may soon replace asm.js altogether and there's already popping up TypeScript-like languages like TurboScript and AssemblyScript that convert to Web Assembly. In fact, TurboScript was originally based on ThinScript and used to compile to asm.js, but they appear to have abandoned this feature.
It may be possible to convert regular JavaScript to asm.js by first compiling it to C or C++, and then compiling the generated code to asm.js using Emscripten. I'm not sure if this would be practical, but it's an interesting concept nonetheless.
There is also a compiler called NectarJS that compiles JavaScript to WebAssembly and ASM.js.
check this http://badassjs.com/post/43420901994/asm-js-a-low-level-highly-optimizable-subset-of
basically you need check that your code would be asm.js compatible (no coercion or type casting, you need to manage the memory, etc). The idea behind this is write your code in javascript, detect the bottle neck and do the changes in your code for use asm.js and aot compilation instead jit and dynamic compilation...is a bit PITA but you can still use javascript or other languages like c++ or better..in a near future, lljs.....
When a programming language is statically typed, the compiler can be more precise about memory allocation and thus be generally more performant (with all other things equal).
I believe ES4 introduced optional type hinting (from what I understand, Adobe had a huge part in contributing to its spec due to actionscript). Does javascript officially support type hinting as a result? Will ES6 support optional type hinting for native variables?
If Javascript does support type hinting, are there any benchmarks that show how it pays off in terms of performance? I have not seen an open source project use this yet.
My understanding, from listening to many Javascript talks on the various sites, is that type-hinting won't do as much to help as people think it will.
In short, most Javascript objects tend to have the same "shape", if you will. That is, they will have the same properties created in the same order. This "shape" can be thought of as the "type" of the object.
An example:
function Point(x, y) {
this.x = x;
this.y = y;
}
All objects made from "Point" will have the same "shape" and the newer internal Javascript engines can do some fancy games to get faster lookup.
In Chrome (perhaps others), they use a high-bit flag to indicate if the rest of the number is an integer or a pointer.
With all of these fancy things going on, that just leaves typing for the human coders. I, for one, really like not having to worry about type and wouldn't use that feature.
You are semi-correct, though. Type hinting is a part of ActionScript 3 which is a derivative of ECMAScript -- but hinting has never made it into the standard. AFAIK, outside of wishful thinking, it hasn't been discussed.
This video describes things in far more detail:
http://www.youtube.com/watch?v=FrufJFBSoQY
I'm late, but since no one really answered you questions regarding the standards, I'll jump in.
Yes, type hinting was discussed as part of ECMAScript 4, and it looked like it was going to be the future of JavaScript... until ES4 bit the dust. ECMAScript 4 was abandoned and never finalized. ECMAScript 5 (the current standard) did not contain many of the things that were planned for ECMAScript 4 (including type hinting), and was really just a quickly beefed up version of the ECMAScript 3.1 draft -- to get some helpful features out the door in the wake of ES4's untimely demise.
As you mentioned, now they're working on churning out ECMAScript 6 (which has some totally awesome features!), but don't expect to see type hinting. The Adobe guys have, to a degree, parted ways with the ECMAScript committee, and the ES committee doesn't seem interested in bringing it back (I think for good reason).
If it is something you want, you might want to check out TypeScript. It's a brand new Microsoft project which is basically an attempt to be ES6+types. It's a superset of JavaScript (almost identical except for the inclusion of types), and it compiles to runnable JavaScript.
JavaScript JIT compilers have to do some pretty fancy stuff to determine the types of expressions and variables, since types are crucial to many optimizations. But the JavaScript compiler writers have spent the last five years doing all that work. The compilers are really smart now. Optional static types therefore would not improve the speed of a typical program.
Surprisingly, type annotations in ActionScript sometimes make the compiled code slower by requiring a type check (or implicit conversion) when a value is passed from untyped code to typed code.
There are other reasons you might want static types in a programming language, but the ECMAScript standards committee has no interest in adding them to JS.
ES7 (Not coming soon) has a new feature called guard might be the one you are asking.
The syntax now is a little similar to ES4 and TypeScript.
All use : and append the type to the variable.
But its not confirm syntax.
Javascript is prototype-based, so the 'type' of an object is entirely dynamic and able to change through its lifetime.
Have a look at Ben Firshman's findings on Javascript performance in regards to object types - http://jsconf.eu/2010/speaker/lessons_learnt_pushing_browser.html
Is there any C interpreter written in javascript or java ?
I don't need a full interpreter but I need to be able to do a step by step execution of the program and being able to see the values of variables, the stack...all that in a web interface.
The idea is to help C beginners by showing them the step by step execution of the program.
We are using GWT to build the interface so if something exists in Java we should be able to use it.
I can modify it to suit my needs but if I can avoid to write the parser / abstract-syntax tree walker / stack manipulation... that would be great.
Edit :
To be clear I don't want to simulate the complete C because some programs can be really tricky.
By step I mean a basic operation such as : expression evaluation, affectation, function call.
The C I want to simulate will contains : variables, for, while, functions, arrays, pointers, maths functions.
No goto, string functions, ctypes.h, setjmp.h... (at least for now).
Here is a prototype : http://www.di.ens.fr/~fevrier/war/simu.html
In this example we have manually converted the C code to a javascript representation but it's limited (expressions such as a == 2 || a = 1 are not handled) and is limited to programs manually converted.
We have a our disposal a C compiler on a remote server so we can check if the code is correct (and doesn't have any undefined behavior). The parsing / AST construction can also be done remotely (so any language) but the AST walking needs to be in javascript in order to run on the client side.
There's a C grammar available for antlr that you can use to generate a C parser in Java, and possibly JavaScript too.
There is em-scripten which converts LLVM languages to JS a little hacking on it and you may be able to produce a C interperter.
felixh's JSCPP project provides a C++ interpreter in Javascript, though with some limitations.
https://github.com/felixhao28/JSCPP
So an example program might look like this:
var JSCPP = require('JSCPP');
var launcher = JSCPP.launcher;
var code = 'int main(){int a;cin>>a;cout<<a;return 0;}';
var input = '4321';
var exitcode = launcher.run(code, input);
console.info('program exited with code ' + exitcode);
As of March 2015 this is under active development, so while it's usable there are still areas where it may continue to expand. Check the documentation for limitations. It looks like you can use it as a straight C interpreter with limited library support for now with no further issues.
I don't know of any C interpreters written in JavaScript, but here is a discussion of available C interpreters:
Is there an interpreter for C?
You might do better to look for any sort of virtual machine that runs on top of JavaScript, and then see if you can find a C compiler that emits the proper machine code for the VM. A likely one would seem to be LLVM; if you can find a JavaScript VM that can run LLVM, then you will be in great shape.
I did a few Google searches and found Emscripten, which translates C code into JavaScript directly! Perhaps you can do something with this:
https://github.com/kripken/emscripten/wiki
Perhaps you can modify Emscripten to emit a "sequence point" after each compiled line of C, and then you can make your simulated environment single-step from sequence point to sequence point.
I believe Emscripten is implementing LLVM, so it may actually have virtual registers; if so it might be ideal for your purposes.
I know you specified C code, but you might want to consider a JavaScript emulation of a simpler language. In particular, please consider FORTH.
FORTH runs on an extremely simple virtual machine. In FORTH there are two stacks, a data stack and a flow-of-control stack (called the "return" stack); plus some global memory. Originally FORTH was a 16-bit language but there are plenty of 32-bit FORTH implementations out there now.
Because FORTH code is sort of "close to the machine" it is easy to understand how it all works when you see it working. I learned FORTH before I learned C, and I found it to be a valuable learning experience.
There are several FORTH interpreters available in JavaScript already. The FORTH virtual machine is so simple, it doesn't take very long to implement it!
You could even then get a C-to-FORTH translator and let the students watch the FORTH virtual machine interpret compiled C code.
I consider this answer a long shot for you, so I'll stop writing here. If you are in fact interested in the idea, comment below and ask for more details and I will be happy to share them. It's been a long time since I wrote any FORTH code but I still remember it fondly, and I'd be happy to talk more about FORTH.
EDIT: Despite this answer being downvoted to a negative score, I am going to leave it up here. A simulation for educational purposes is IMHO more valuable if the simulation is simple and easy to understand. The simple stack-based virtual machine for FORTH is very simple, yet you could compile C code to run on it. (In the 80's there was even a CPU chip made that had FORTH instructions as its native machine code.) And, as I said, I personally studied FORTH when I was a complete beginner and it helped me to understand assembly language and C.
The question has no accepted answer, now over two years after it was asked. It could be that Loïc Février didn't find any suitable JavaScript interpreter. As I said, there already exist several JavaScript interpreters for the FORTH virtual machine. Therefore, this answer is a practical one.
C is a compiled language, not an interpreted language, and has features like pointers which JS completely doesn't support, so interpreting C in Javascript doesn't really make sense