Coverage-based fuzzing of JavaScript applications - javascript

The American Fuzzy Lop, and the conceptually related LLVM libfuzzer not only generate random fuzzy strings, but they also watch branch coverage of the code under test and use genetic algorithms to try to cover as many branches as possible. This increases the hit frequency of the more interesting code further downstream as otherwise most of the generated inputs will be stopped early in some deserialization or validation.
But those tools work at native code level, which is not useful for JavaScript applications as it would be trying to cover the interpreter, but not really the interpreted code.
So is there a way to fuzz JavaScript (preferably in browser, but tests running in node.js would help too) with coverage guidance?
I looked at the tools mentioned in this old question, but those that do javascript don't seem to mention anything about coverage profiling. And while radamsa mentions optionally pairing it with coverage analsysis, I haven't found any documentation on how to actually do it.
How can one fuzz-test java-script (in browser) application with coverage guidance?

Fuzzing a JavaScript engine draws a lot of attention as the number of browser users is about 4 Billion. Several works have been done to find bugs in JS engines, including popular large engines, e.g, v8, webkit, chakracore, gecko, or some small embedded engines, like jerryscript, QuickJS, jsish, mjs, mujs.
It is really difficult to find bugs using AFL as the mutation mechanisms provided by AFL is not practical for JS files, e.g, bitflip can hardly be a valid mutation. Since JS is a structured language, several works using ECMAScript grammar to mutate/generate JS files(seeds):
LangFuzz parses sample JS files and splits them into code fragments. It then recombines the fragments to produce test cases.
jsfunfuzz randomly generates syntactically valid JS statements from JS grammar manually written for fuzzing.
Dharma is a generation-based, context-free grammar fuzzer, generating files based on given grammar.
Superion extends AFL using tree-based mutation guided by JS grammar.
The above works can easily pass the syntax checks but fail at semantic checks. A lot of generated JS seeds are semantically invalid.
CodeAlchemist uses a semantics-aware approach to generate code segments based on a static type analysis.
There are two levels of bugs related to JS engines: simple parser/interpreter bugs and deep inside logic bugs. Recently, there is a trend that the number of simple bugs decreases while more and more deep bugs come out.
DIE uses aspect-preserving mutation to preserves the desirable properties of CVEs. It also using type analysis to generate semantic-valid bugs.
Some works focus on mutating intermediate representations.
Fuzzilli is a coverage-guided fuzzer based on mutation on the IR level. The mutations on IR can guarantee semantic validity and can be transferred to JS.
Fuzzing JS is an interesting and hot topic according to the top conference of security/SE in recent years. Hope this information is helpful.

Related

What makes pre-parsing a function faster than a full parse?

To improve performance JavaScript engines sometimes only fully parse functions when they are actually called.
For example, from the Spidermonkey source code:
Checking the syntax of a function is several times faster than doing a full parse/emit, and lazy parsing improves both performance and memory usage significantly when pages contain large amounts of code that never executes (which happens often).
What steps can the parser skip while still being able to validate the syntax?
It appears that in Spidermonkey some of the savings come from not emitting bytecode, like after a full parse. Does a full parse in e.g. V8 also include generating machine code?
First off a clarification: the two steps are called "pre-parsing" and "full parsing". "Lazy parsing" describes the strategy of doing the former first, and then the latter when needed.
The two big reasons why pre-parsing is faster are:
it doesn't build the AST (abstract syntax tree), which is usually the output of the parser
it doesn't do scope resolution for variables.
There are a few other internal steps done by the parser in preparation for code generation that aren't needed when only checking for errors, but the above two are the major points.
(Full) parsing and code generation are separate steps.

Semi-obfuscate/uglify JavaScript

I know about JS minfiers, obfuscators and minifiers. I was wondering if there is any existing tool (or any fast-to-code solution) to partially obfuscate JavaScript. By partially I mean that it should become difficult to read, but not appear as uglified/minified. It should keep indentation, but lose comments, and partially change variable names, making them unclear without converting them to "a, b, c" like an obfuscator.
The purpose of this could be to take an explicit and reusable code and make it implicit and difficult to be reused by other people, without making it impossible to work with for yourself.
Any idea from where to start to achieve this ? Maybe editing an existing obfuscator ?
[This answer is a direct response to OP's request].
Semantic Designs JavaScript obfuscator will do what you want, but you'll need two passes.
On the first pass, run it as obfuscator; it will rename identifiers (although you can control how much or how that is done), strip whitepspace and comments. If you limit its ability to rename the identifiers, you lose some the strength of the obfuscator but that's your choice.
On the second pass, run it as a prettyprinter; it will introduce nice indentation again.
(In fact, the idea for obfsucation came from building a prettyprinter; if you can print-pretty, surely it is easy to print-ugly).
From the point of view of working with the code, you are better off working with your master copy any way you like, complete with your indentation and nice commentary as documentation. When you are ready to obfsucate, you run the obfuscator, shipping the obfuscated result. Errors reported in the obfuscated result that involve obfuscated names can be mapped back to the original names, using the map of obfuscated <--> original names produced during the obfuscation step.
This a product of my company. I'd provide a link but SO hates it when I do that, so you'll have to find it via my bio or googling.
PS: It works exactly as #georg suggests, by parsing to an AST, mangling, and prettyprinting. It doesn't use esprima.
I'm not aware of a tool that would meet your specific requirements, but it seems to be relatively easy to create, given that the vital parts already exist.
parse the source into an AST, using esprima or similar
manipulate the tree in the way you want (eg. remove comments, mangle identifiers etc)
rebuild the source from the tree using escodegen

JS to Dart Conversion

This is a 2 part question. I am not lazy, simply not fundamentally fluent enough in JS to convert an entire library while referencing the Dart Synonyms page it seems. The Dart:js documentation explains how to access the JS global object as shown in this snippet, but if i'm not mistaken it's not what i'm looking for.
Q1: In the example snippet below, it wouldn't increase Angular's performance by utilizing Dart, correct?
var angular = context(['angular']);
var myapp = angular.module('myApp', ['ngResource','ngRoute']);
If I'm right, and I do need to convert libraries unavailable in Dart, jsparser and dart-synonym are really stumping me -- I can't find any simple documentation and when I look through the actual darts I get lost.
Dart Editor kicks an error when I try to run and build jsparser:
Unhandled exception:
'file:///C:/Work Root/Dart/jsparser-ec65c9e7467f/jsparser.dart': malformed type: line 26 pos 27: type 'Options' is not loaded
List args = new Options().arguments;
So I tried dart-synonym; it ran and built correctly, but then showed a clone of the Dart Synonyms page.
Q2: If accomplishing an automatic conversion is even possible, how do I use either of these?
Dart-synonym does not automatically convert other languages to Dart, it only provides a static synonym reference to allow manual conversion.
jsparser is meant to provide automatic conversion but the last commit is from more than a year ago. A lot has changed since then, and I doubt it will run without significant tweaks to the source. For instance, the Options class was removed a while back which is why you receive that malformed type error.
If you want to use Angular in Dart, you can use Google's own port: AngularDart
You could use a similar technique to amber-lang, particularly since Dart is essentially Smalltalk with JS syntax, while amber is Smalltalk that compiles to JS. Amber uses two base objects - STObject and JSObject, allowing ST code to call JS code and vice versa. Since amber-lang uses Pharo Smalltalk as its RI, a lib like SmaCC (a Smalltalk parser builder) could be used to generate the wrapper parsing code. It already provides such support for Java, Python, C and a number of other languages. The way JS works, you can't write, and certainly not debugm, a large or complex app. Dart is an attempt to do that the way ST does, with a strong type system and a semantic runtime equivalent to an interpreted language, with near-assembler speed, but with JS syntax, since Google has tons of traine node.js programmers on staff.
Creating a Smalltalk VM is much easier than something like the JVM, since it only includes the base Object, the code to interop with OS libs, and is itself written in Smalltalk and converted to C (or the cross platform libs to F-Script on MacOS) using SLANG (CLANG on MacOS). For that reason IBM Research did a Squeak/Pharo VM that can scale to over 1000 cores (RoarVM on GitHub). Doing that with the JVM would probably take a decade.
That Smalltalk is slow is an out of date notion (due to not being stack based, which no longer matters, and the work Sun did on JIT for Java, where the PoC was also in Smalltalk - called Strongtalk. Pharo's cogit JIT works essentially the same way - assembler code with pure interpreted semantics. I had to go away from Java due to the (lack of) speed of MSF4J microservices, which were themselves the fastest available in Java, and faster than anything in JS. I can run 256 microservices in Pharo ST with a quicker startup time, less memory use, better throughput and monitoring / management, than one express.js microservice.
Porting the 32 bit VM to a 64 bit UltraSparc was quite easy, and resulted in software that can filter and route large quantities of monitoring data significantly faster than a Cisco's offering - an IOS program running on a Cisco ASR-9010. The Sun/Oracle T5220's go used for about 1/600th the price of the ASR, which is a signicant advantage.
I like Dart, but I have to say to some degree for me it's just YAPL, since it doesn't do anything not already possible with a combination of PHaro and amber-lang. And the Smalltalk syntax (Ruby is similar) is both more readable and less verbose than JS (or Java for that matter). GO had some good ideas, but not enough to really generate much interest. ST has had 36 years of development, nothing brand new is going to offer equivalent tools or equivalent runtime stability.
Check out a4bp for an example of data analysis and visualization in Pharo. The website is also written in Pharo using Graphviz from within Smalltalk. SmallTalkHub is a combination of Pharo ST and amber-lang. Amber-lang could be used to wrap libs like Angular, until it becomes easy enough to write browser plugins for any arbitrary language and we aren't stuck with JS.

Can regular JavaScript be converted to asm.js, or is it only to speed up statically-typed low-level languages?

I have read the question How to test and develop with asm.js?, and the accepted answer gives a link to http://kripken.github.com/mloc_emscripten_talk/#/.
The conclusion of that slide show is that "Statically-typed languages and especially C/C++ can be compiled effectively to JavaScript", so we can "expect the speed of compiled C/C++ to get to just 2X slower than native code, or better, later this year".
But what about non-statically-typed languages, such as regular JavaScript itself? Can it be compiled to asm.js?
Can JavaScript itself be compiled to asm.js?
Not really, because of its dynamic nature. It's the same problem as when trying to compile it to C or even to native code - you actually would need to ship a VM with it to take care of those non-static aspects. At least, such a VM is possible:
js.js is a JavaScript interpreter in JavaScript. Instead of trying to create an interpreter from scratch, SpiderMonkey is compiled into LLVM and then emscripten translates the output into JavaScript.
But if asmjs code runs faster than regular JS, then it makes sense to compile JS to asmjs, no?
No. asm.js is a quite restricted subset of JS that can be easily translated to bytecode. Yet you first would need to break down all the advanced features of JS to that subset for getting this advantage - a quite complicated task imo. But JavaScript engines are designed and optimized to translate all those advanced features directly into bytecode - so why bother about an intermediate step like asm.js? Js.js claims to be around 200 times slower than "native" JS.
And what about non-statically-typed languages in general?
The slideshow talks about that from …Just C/C++? onwards. Specifically:
Dynamic Languages
Entire C/C++ runtimes can be compiled and the original language
interpreted with proper semantics, but this is not lightweight
Source-to-source compilers from such languages to JavaScript ignore
semantic differences (for example, numeric types)
Actually, these languages depend on special VMs to be efficient
Source-to-source compilers for them lose out on the optimizations done in those VMs
In response to the general question "is it possible?" then the answer is that sure, both JavaScript and the asm.js subset are Turing complete so a translation exists.
Whether one should do this and expect a performance benefit is a different question. The short answer is "no, you shouldn't." I liken this to trying to compress a compressed file; yes, it is possible to run the compression algorithm, but in general you should not expect the resulting file to be smaller.
The short answer: The performance cost of dynamically-typed languages comes from the meaning of the code; a statically-typed program with an equivalent meaning would carry the same costs.
To understand this, it is important to understand why asm.js offers a performance benefit at all; or, more generally, why statically-typed languages perform better than dynamically-typed ones. The short answer is "run-time type checking takes time," and a longer answer would include the improved feasibility of optimizing statically-typed code. For example:
function a(x) { return x + 1; }
function b(x) { return x - 1; }
function c(x, y) { return a(x) + b(y); }
If x and y are both known to be integers, I can optimize function c to a couple of machine code instructions. If they could be integers or strings, the optimization problem becomes much harder; I have to treat these as string appends in some cases, and addition in other cases. In particular, there are four possible interpretations of the addition operation that occurs in c; it could be addition, or string append, or two different variants of coerce-to-string-and-append. As you add more possible types, the number of possible permutations grows; in the worst case for a dynamically-typed language, you have k^n possible interpretations of an expression involving n terms which could each have any number of k types. In a statically typed language, k=1, so there is always 1 interpretation of any given expression. Because of this, optimizers are fundamentally more efficient at optimizing statically-typed code than dynamically-typed code: There are fewer permutations to consider when searching for opportunities to optimize.
The point here is that when converting from dynamically-typed code to statically-typed code (as you'd be doing when going from JavaScript to asm.js), you have to account for the semantics of the original code. Meaning the type-checking still occurs (it's just now been spelled out statically-typed code) and all those permutations are still present to stifle the compiler.
A few facts about asm.js, which hopefully make the concept clear:
Yes you can write the asm.js dialect by hand.
If you did look at the examples for asm.js, they are very far from being user friendly. Obviously Javascript is not the front end language for creating this code.
Translating vanilla Javascript to asm.js dialect is not possible.
Think about it - if you already could translate standard Javascript in a fully statically manner, why would there be a need for asm.js? The sole existance of asm.js means that the Javascript JIT people at some people gave up on their promise that Javascript will get faster without any effort from the developer.
There are several reasons for this, but let's just say it would be really hard for the JIT to understand a dynamic language as good as a static compiler. And then probably for the developers to fully understand the JIT.
In the end it boils down to using the right tool for the task. If you want static, very performant code, use C / C++ ( / Java ) - if you want a dynamic language, use Javascript, Python, ...
asm.js has been created by the need of have a small subset of javascript which can be easily optimized. If you can have a way to convert javascript to javascript/asm.js, asm.js is not needed anymore - that method can be inserted in js interpreters directly.
In theory, it is possible to convert / compile / transpile any JavaScript operation to asm.js if it can be expressed with the limited subset of the language present in asm.js. In practice, however, there is no tool capable of converting ordinary JavaScript to asm.js at the moment (June, 2017).
Either way, it would make more sense to convert a language with static typing to asm.js, because static typing is a requirement of asm.js and the lack thereof one of the features of ordinary JavaScript that makes it exceptionally hard to compile to asm.js.
Back in 2013, when asm.js was hot, there has been an attempt to compile a statically typed language similar to JavaScript, but both the language and the compiler seem to have been abandoned.
Today, in 2017, JavaScipt subsets TypeScript and Flow would be suitable candidates for conversion to asm.js, but the core dev teams of neither language is interested in such conversion. LLJS had a fork that could compile to asm.js, but that project is pretty much dead. ThinScript is a much more recent attempt and is based on TypeScript, but it doesn't appear to be active either.
So, the best and easiest way to produce asm.js code is still to write your code in C/C++ and convert / compile / transpile it. However, it remains to be seen whether we'll even want to do this in the forseeable future. Web Assembly may soon replace asm.js altogether and there's already popping up TypeScript-like languages like TurboScript and AssemblyScript that convert to Web Assembly. In fact, TurboScript was originally based on ThinScript and used to compile to asm.js, but they appear to have abandoned this feature.
It may be possible to convert regular JavaScript to asm.js by first compiling it to C or C++, and then compiling the generated code to asm.js using Emscripten. I'm not sure if this would be practical, but it's an interesting concept nonetheless.
There is also a compiler called NectarJS that compiles JavaScript to WebAssembly and ASM.js.
check this http://badassjs.com/post/43420901994/asm-js-a-low-level-highly-optimizable-subset-of
basically you need check that your code would be asm.js compatible (no coercion or type casting, you need to manage the memory, etc). The idea behind this is write your code in javascript, detect the bottle neck and do the changes in your code for use asm.js and aot compilation instead jit and dynamic compilation...is a bit PITA but you can still use javascript or other languages like c++ or better..in a near future, lljs.....

GWT reduce compiled javascript size

I have found that the size of the compiled JavaScript grows faster than I had expected. Adding a few lines of Java code to my project can increase the script size in several Kbs.
At the moment my compiled project weights 1Mb. I'm not using any external libraries except for those for MVP (Activities & Places) , testing (JUnit) and logging.
I would like to know if there are any coding practices/recommendations to keep the compiled script as small as possible. I'm not refering to code splitting, but to coding techniques or patterns that can make the compiled JavaScript effectively smaller.
Many thanks
GWT uses a "pay as you go" design philosophy, and since you're not allowed to use reflection the compiler can statically prove (on a method-by-method basis) that a section of code is "reachable", and eliminate those that are not. For example, if you never use the remove() method on ArrayList, then that code does not get included in the resulting JavaScript.
If you are seeing several kilobyte jumps with the addition of just a few lines, it probably means that you've introduced the use of a new type (and possibly one that depends on other new types) that you had not yet been using. It might also mean that you've made a change to send this new type "over the wire" back to the server, in which case a GWT generator had to include JavaScript for marshaling that type, and any new types that are reachable via its "has-a" and "is-a" references.
So if it were me, I would begin there: when you catch a 2-line change making a multi-kilobyte increase, start by looking at the types and asking whether it is a type that you have used before, and whether you're sending a new type over the wire, and whether that type also depends on other types under the hood.
One final thought: in Ray Ryan's 2009 presentation at Google I/O he mentioned a superstition that he had picked up from the GWT compiler team, where they recommended against using generic types (I'm not speaking of Java Generics here, but rather supertypes) as RPC arguments & return values. In particular, instead of having your RPC call take or return a Map, have it take or return a HashMap instead. The belief is that the GWT generator can then narrow the amount of serialization code that it has to create at compile time (because it could, for example, refrain from generating serialization code for a TreeMap).
I hope this helps.
GWT creates different output versions for each supported browser, so when you say the project size is 1MB are you then referring to the combined size of these ? (each browser only download's the one it actually needs).
I have tried to experiment with the generated output when using various inheritance/class/generics constructs. Unfortunately the extra complexity introduced far outweighs the small size improvements gained (when fx. dumping generics).
I have been on some large GWT projects (+50.000 lines) and have found that code obfuscating coupled with turning on compression on the web server to be the simplest most effective way to minimize the downloads. If this does not shrink the code enough, then look into GWT's compilation report which you can use to pinpoint potential problematic classes and places to insert code splitting.

Categories

Resources