Is there a C interpreter written in JavaScript or Java?
I don't need a full interpreter, but I need to be able to execute the program step by step and see the values of variables, the stack, and so on, all in a web interface.
The idea is to help C beginners by showing them the step by step execution of the program.
We are using GWT to build the interface so if something exists in Java we should be able to use it.
I can modify it to suit my needs, but if I can avoid writing the parser / abstract-syntax-tree walker / stack manipulation, that would be great.
Edit :
To be clear I don't want to simulate the complete C because some programs can be really tricky.
By step I mean a basic operation such as: expression evaluation, assignment, function call.
The C I want to simulate will contain: variables, for, while, functions, arrays, pointers, math functions.
No goto, string functions, ctype.h, setjmp.h... (at least for now).
Here is a prototype : http://www.di.ens.fr/~fevrier/war/simu.html
In this example we have manually converted the C code to a JavaScript representation, but the approach is limited: expressions such as a == 2 || a = 1 are not handled, and it only works for programs that have been converted by hand.
We have at our disposal a C compiler on a remote server, so we can check that the code is correct (and doesn't have any undefined behavior). The parsing / AST construction can also be done remotely (so in any language), but the AST walking needs to be in JavaScript in order to run on the client side.
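To make the client-side part concrete, here is a minimal sketch of a step-by-step AST walker built on a JavaScript generator. The AST shape (the `type`, `line`, `name`, `value`, `cond`, `body` fields) and the names `evalExpr` and `run` are assumptions made up for this illustration; a real server-side parser would dictate its own format. Only assignments and while loops are handled.

```javascript
// Hypothetical AST shape: every node is a plain object with a `type` field.
// Evaluates an expression node against the variable environment `env`.
function evalExpr(node, env) {
  switch (node.type) {
    case 'num': return node.value;
    case 'var': return env[node.name];
    case 'bin': {
      var l = evalExpr(node.left, env);
      var r = evalExpr(node.right, env);
      switch (node.op) {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        case '<': return l < r ? 1 : 0; // C comparisons yield an int
      }
      throw new Error('unknown operator ' + node.op);
    }
  }
  throw new Error('unknown expression ' + node.type);
}

// Generator that pauses (yields) after each basic step, handing the UI
// the source line and a copy of the variables at that moment.
function* run(stmts, env) {
  for (var i = 0; i < stmts.length; i++) {
    var s = stmts[i];
    if (s.type === 'assign') {
      env[s.name] = evalExpr(s.value, env);
      yield { line: s.line, env: Object.assign({}, env) };
    } else if (s.type === 'while') {
      while (true) {
        var c = evalExpr(s.cond, env);
        yield { line: s.line, env: Object.assign({}, env) }; // the test is a step
        if (c === 0) break;
        yield* run(s.body, env);
      }
    } else {
      throw new Error('unknown statement ' + s.type);
    }
  }
}

// AST for:  i = 0;  while (i < 2)  i = i + 1;
var program = [
  { type: 'assign', line: 1, name: 'i', value: { type: 'num', value: 0 } },
  { type: 'while', line: 2,
    cond: { type: 'bin', op: '<',
            left: { type: 'var', name: 'i' }, right: { type: 'num', value: 2 } },
    body: [
      { type: 'assign', line: 3, name: 'i',
        value: { type: 'bin', op: '+',
                 left: { type: 'var', name: 'i' }, right: { type: 'num', value: 1 } } }
    ] }
];

var env = {};
var steps = Array.from(run(program, env)); // a UI would call .next() per click instead
steps.forEach(function (s) { console.log('line ' + s.line, s.env); });
```

In a web interface, each click on a "step" button would call `next()` on the generator once, so the pause points come for free from `yield` rather than from hand-written state machines.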
There's a C grammar available for antlr that you can use to generate a C parser in Java, and possibly JavaScript too.
There is Emscripten, which converts LLVM-compiled languages to JS; with a little hacking on it you may be able to produce a C interpreter.
felixh's JSCPP project provides a C++ interpreter in Javascript, though with some limitations.
https://github.com/felixhao28/JSCPP
So an example program might look like this:
var JSCPP = require('JSCPP');
var launcher = JSCPP.launcher;
var code = 'int main(){int a;cin>>a;cout<<a;return 0;}';
var input = '4321';
var exitcode = launcher.run(code, input);
console.info('program exited with code ' + exitcode);
As of March 2015 this is under active development, so while it's usable there are still areas where it may continue to expand. Check the documentation for limitations. It looks like you can use it as a straight C interpreter with limited library support for now with no further issues.
I don't know of any C interpreters written in JavaScript, but here is a discussion of available C interpreters:
Is there an interpreter for C?
You might do better to look for any sort of virtual machine that runs on top of JavaScript, and then see if you can find a C compiler that emits the proper machine code for the VM. A likely one would seem to be LLVM; if you can find a JavaScript VM that can run LLVM, then you will be in great shape.
I did a few Google searches and found Emscripten, which translates C code into JavaScript directly! Perhaps you can do something with this:
https://github.com/kripken/emscripten/wiki
Perhaps you can modify Emscripten to emit a "sequence point" after each compiled line of C, and then you can make your simulated environment single-step from sequence point to sequence point.
I believe Emscripten is implementing LLVM, so it may actually have virtual registers; if so it might be ideal for your purposes.
I know you specified C code, but you might want to consider a JavaScript emulation of a simpler language. In particular, please consider FORTH.
FORTH runs on an extremely simple virtual machine. In FORTH there are two stacks, a data stack and a flow-of-control stack (called the "return" stack); plus some global memory. Originally FORTH was a 16-bit language but there are plenty of 32-bit FORTH implementations out there now.
Because FORTH code is sort of "close to the machine" it is easy to understand how it all works when you see it working. I learned FORTH before I learned C, and I found it to be a valuable learning experience.
There are several FORTH interpreters available in JavaScript already. The FORTH virtual machine is so simple, it doesn't take very long to implement it!
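As a rough illustration of how small such a machine can be, here is a sketch of a FORTH-like evaluator in JavaScript. It is not one of the existing interpreters mentioned above: the function name `runForth` is made up, it only has a data stack (no return stack, no colon definitions, no memory), and it supports just a handful of words.

```javascript
// Runs a whitespace-separated FORTH-like program and returns the data stack.
function runForth(source) {
  var stack = [];
  var words = {
    '+': function () { var b = stack.pop(), a = stack.pop(); stack.push(a + b); },
    '-': function () { var b = stack.pop(), a = stack.pop(); stack.push(a - b); },
    '*': function () { var b = stack.pop(), a = stack.pop(); stack.push(a * b); },
    'dup': function () { var a = stack.pop(); stack.push(a, a); },
    'swap': function () { var b = stack.pop(), a = stack.pop(); stack.push(b, a); },
    'drop': function () { stack.pop(); }
  };
  source.trim().split(/\s+/).forEach(function (token) {
    if (Object.prototype.hasOwnProperty.call(words, token)) {
      words[token]();                    // known word: execute it
    } else if (/^-?\d+$/.test(token)) {
      stack.push(parseInt(token, 10));   // integer literal: push it
    } else {
      throw new Error('unknown word: ' + token);
    }
  });
  return stack;
}

var squared = runForth('2 3 + dup *');   // computes (2 + 3) squared
var difference = runForth('7 2 swap -'); // computes 2 - 7
console.log(squared, difference);
```

Because every word only pushes to or pops from one visible array, showing the stack after each token is enough to visualize the whole execution.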
You could even then get a C-to-FORTH translator and let the students watch the FORTH virtual machine interpret compiled C code.
I consider this answer a long shot for you, so I'll stop writing here. If you are in fact interested in the idea, comment below and ask for more details and I will be happy to share them. It's been a long time since I wrote any FORTH code but I still remember it fondly, and I'd be happy to talk more about FORTH.
EDIT: Despite this answer being downvoted to a negative score, I am going to leave it up here. A simulation for educational purposes is IMHO more valuable if the simulation is simple and easy to understand. The stack-based virtual machine for FORTH is very simple, yet you could compile C code to run on it. (In the 80's there was even a CPU chip made that had FORTH instructions as its native machine code.) And, as I said, I personally studied FORTH when I was a complete beginner and it helped me to understand assembly language and C.
The question has no accepted answer, now over two years after it was asked. It could be that Loïc Février didn't find any suitable JavaScript interpreter. As I said, there already exist several JavaScript interpreters for the FORTH virtual machine. Therefore, this answer is a practical one.
C is a compiled language, not an interpreted language, and has features like pointers which JS doesn't support at all, so interpreting C in JavaScript doesn't really make sense.
I found this: click, and wondered what the reasons behind this coding style are.
Why define identifiers like _0x3384x4, which are nearly unreadable for a human?
or writing object properties like:
{
"\x63\x68\x61\x72\x73": ' \uD83D\uDE23 ',
"\x63\x6C\x61\x73\x73": '_1az _1a- _2gc',
"\x6E\x61\x6D\x65": 'Bi\u1EC3u t\u01B0\u1EE3ng vui 18'
}
This could be written like this, couldn't it?
{ chars: " 😣 ", class: "_1az _1a- _2gc", name: "Biểu tượng vui 18" }
Is it because some old computers cannot display these characters? Is it a kind of uglifying, to protect the JavaScript code?
What kind of format is 0x7892x8? It looks like hex, but what does it represent? eval("0x7892") evaluates to 30866, but would 0x7892x8 then mean the 8th version of 30866? That doesn't make sense to me.
It is not a coding style. It is obfuscation.
From Wikipedia:
In software development, obfuscation is the deliberate act of creating
obfuscated code, i.e. source or machine code that is difficult for
humans to understand. Like obfuscation in natural language, it may use
needlessly roundabout expressions to compose statements.
Programmers may deliberately obfuscate code to conceal its purpose
(security through obscurity) or its logic, in order to prevent
tampering, deter reverse engineering, or as a puzzle or recreational
challenge for someone reading the source code.
Programs known as obfuscators transform readable code into obfuscated
code using various techniques.
There are many tools out there, called obfuscators, which obfuscate code. Here is a JavaScript obfuscator, for example:
http://www.jsobfuscate.com/
As you already guessed correctly, it is hexadecimal. So, for example, \x63 means 99 in decimal.
Now we take a look into the Code Table:
http://www.codetable.net/
And we see that 99 decimal represents, for example, the character c. So \x63 is basically c.
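The decoding can be checked directly in JavaScript. The snippet below is only a sketch: the helper name `toHexEscapes` is made up, and it only handles characters with codes below 256. It shows that the escaped and plain spellings are the same string to the engine, and that a name like `_0x3384x4` is just an ordinary identifier (it starts with an underscore, so the digits in it carry no numeric meaning).

```javascript
// The escaped key and the readable key are the same string.
var obfuscatedKey = '\x63\x68\x61\x72\x73';
var readableKey = 'chars';
console.log(obfuscatedKey === readableKey);

// Going the other way: turn a plain string into \xNN escapes,
// roughly what an obfuscator does (hypothetical helper, codes < 256 only).
function toHexEscapes(text) {
  return text.split('').map(function (ch) {
    return '\\x' + ('0' + ch.charCodeAt(0).toString(16)).slice(-2);
  }).join('');
}
var escaped = toHexEscapes('chars');
console.log(escaped);

// \uXXXX escapes work the same way with UTF-16 code units; the pair
// D83D DE23 is the surrogate encoding of code point U+1F623.
var emoji = '\uD83D\uDE23';
var sameEmoji = String.fromCodePoint(0x1F623);

// An obfuscated-looking name is a legal identifier like any other.
var _0x3384x4 = 0x63;
var letter = String.fromCharCode(_0x3384x4);
console.log(letter);
```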
This is a two-part question. I am not lazy, simply not fluent enough in JS to convert an entire library while referencing the Dart Synonyms page, it seems. The dart:js documentation explains how to access the JS global object as shown in this snippet, but if I'm not mistaken it's not what I'm looking for.
Q1: In the example snippet below, it wouldn't increase Angular's performance by utilizing Dart, correct?
var angular = context(['angular']);
var myapp = angular.module('myApp', ['ngResource','ngRoute']);
If I'm right, and I do need to convert libraries unavailable in Dart, jsparser and dart-synonym are really stumping me -- I can't find any simple documentation and when I look through the actual darts I get lost.
Dart Editor kicks an error when I try to run and build jsparser:
Unhandled exception:
'file:///C:/Work Root/Dart/jsparser-ec65c9e7467f/jsparser.dart': malformed type: line 26 pos 27: type 'Options' is not loaded
List args = new Options().arguments;
So I tried dart-synonym; it ran and built correctly, but then showed a clone of the Dart Synonyms page.
Q2: If accomplishing an automatic conversion is even possible, how do I use either of these?
Dart-synonym does not automatically convert other languages to Dart, it only provides a static synonym reference to allow manual conversion.
jsparser is meant to provide automatic conversion but the last commit is from more than a year ago. A lot has changed since then, and I doubt it will run without significant tweaks to the source. For instance, the Options class was removed a while back which is why you receive that malformed type error.
If you want to use Angular in Dart, you can use Google's own port: AngularDart
You could use a similar technique to amber-lang, particularly since Dart is essentially Smalltalk with JS syntax, while amber is Smalltalk that compiles to JS. Amber uses two base objects - STObject and JSObject, allowing ST code to call JS code and vice versa. Since amber-lang uses Pharo Smalltalk as its RI, a lib like SmaCC (a Smalltalk parser builder) could be used to generate the wrapper parsing code. It already provides such support for Java, Python, C and a number of other languages. The way JS works, you can't write, and certainly not debug, a large or complex app. Dart is an attempt to do that the way ST does, with a strong type system and a semantic runtime equivalent to an interpreted language, with near-assembler speed, but with JS syntax, since Google has tons of trained node.js programmers on staff.
Creating a Smalltalk VM is much easier than something like the JVM, since it only includes the base Object, the code to interop with OS libs, and is itself written in Smalltalk and converted to C (or the cross platform libs to F-Script on MacOS) using SLANG (CLANG on MacOS). For that reason IBM Research did a Squeak/Pharo VM that can scale to over 1000 cores (RoarVM on GitHub). Doing that with the JVM would probably take a decade.
That Smalltalk is slow is an out of date notion (due to not being stack based, which no longer matters, and the work Sun did on JIT for Java, where the PoC was also in Smalltalk - called Strongtalk). Pharo's cogit JIT works essentially the same way - assembler code with pure interpreted semantics. I had to go away from Java due to the (lack of) speed of MSF4J microservices, which were themselves the fastest available in Java, and faster than anything in JS. I can run 256 microservices in Pharo ST with a quicker startup time, less memory use, better throughput and monitoring / management, than one express.js microservice.
Porting the 32 bit VM to a 64 bit UltraSparc was quite easy, and resulted in software that can filter and route large quantities of monitoring data significantly faster than Cisco's offering - an IOS program running on a Cisco ASR-9010. The Sun/Oracle T5220's go used for about 1/600th the price of the ASR, which is a significant advantage.
I like Dart, but I have to say to some degree for me it's just YAPL, since it doesn't do anything not already possible with a combination of Pharo and amber-lang. And the Smalltalk syntax (Ruby is similar) is both more readable and less verbose than JS (or Java for that matter). Go had some good ideas, but not enough to really generate much interest. ST has had 36 years of development; nothing brand new is going to offer equivalent tools or equivalent runtime stability.
Check out a4bp for an example of data analysis and visualization in Pharo. The website is also written in Pharo using Graphviz from within Smalltalk. SmallTalkHub is a combination of Pharo ST and amber-lang. Amber-lang could be used to wrap libs like Angular, until it becomes easy enough to write browser plugins for any arbitrary language and we aren't stuck with JS.
I have read the question How to test and develop with asm.js?, and the accepted answer gives a link to http://kripken.github.com/mloc_emscripten_talk/#/.
The conclusion of that slide show is that "Statically-typed languages and especially C/C++ can be compiled effectively to JavaScript", so we can "expect the speed of compiled C/C++ to get to just 2X slower than native code, or better, later this year".
But what about non-statically-typed languages, such as regular JavaScript itself? Can it be compiled to asm.js?
Can JavaScript itself be compiled to asm.js?
Not really, because of its dynamic nature. It's the same problem as when trying to compile it to C or even to native code - you actually would need to ship a VM with it to take care of those non-static aspects. At least, such a VM is possible:
js.js is a JavaScript interpreter in JavaScript. Instead of trying to create an interpreter from scratch, SpiderMonkey is compiled into LLVM and then emscripten translates the output into JavaScript.
But if asmjs code runs faster than regular JS, then it makes sense to compile JS to asmjs, no?
No. asm.js is a quite restricted subset of JS that can be easily translated to bytecode. Yet you first would need to break down all the advanced features of JS to that subset for getting this advantage - a quite complicated task imo. But JavaScript engines are designed and optimized to translate all those advanced features directly into bytecode - so why bother about an intermediate step like asm.js? Js.js claims to be around 200 times slower than "native" JS.
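For a feel of how restricted that subset is, here is a small hand-written module in the asm.js style. The module name `AsmAdder` is made up for this sketch, and whether a given engine actually validates and ahead-of-time compiles it is an engine detail; since asm.js is a subset of JavaScript, the code runs as ordinary JS either way.

```javascript
// A hand-written module in the asm.js style (hypothetical name AsmAdder).
function AsmAdder(stdlib, foreign, heap) {
  "use asm";
  function add(a, b) {
    a = a | 0;           // parameter type annotation: a is a 32-bit int
    b = b | 0;           // parameter type annotation: b is a 32-bit int
    return (a + b) | 0;  // return type annotation: 32-bit int
  }
  return { add: add };
}

// stdlib, foreign imports and heap are supplied when the module is linked.
var adder = AsmAdder(globalThis, {}, new ArrayBuffer(0x10000));
console.log(adder.add(2, 3));
console.log(adder.add(2.7, 3));        // the |0 coercion truncates 2.7 to 2
console.log(adder.add(2147483647, 1)); // wraps around like a 32-bit int
```

Every value has to carry an explicit coercion like `| 0`, which is the kind of bookkeeping a translator from ordinary JavaScript would need to derive for code that was never written with fixed types in mind.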
And what about non-statically-typed languages in general?
The slideshow talks about that from …Just C/C++? onwards. Specifically:
Dynamic Languages
Entire C/C++ runtimes can be compiled and the original language
interpreted with proper semantics, but this is not lightweight
Source-to-source compilers from such languages to JavaScript ignore
semantic differences (for example, numeric types)
Actually, these languages depend on special VMs to be efficient
Source-to-source compilers for them lose out on the optimizations done in those VMs
In response to the general question "is it possible?", the answer is: sure, both JavaScript and the asm.js subset are Turing complete, so a translation exists.
Whether one should do this and expect a performance benefit is a different question. The short answer is "no, you shouldn't." I liken this to trying to compress a compressed file; yes, it is possible to run the compression algorithm, but in general you should not expect the resulting file to be smaller.
The short answer: The performance cost of dynamically-typed languages comes from the meaning of the code; a statically-typed program with an equivalent meaning would carry the same costs.
To understand this, it is important to understand why asm.js offers a performance benefit at all; or, more generally, why statically-typed languages perform better than dynamically-typed ones. The short answer is "run-time type checking takes time," and a longer answer would include the improved feasibility of optimizing statically-typed code. For example:
function a(x) { return x + 1; }
function b(x) { return x - 1; }
function c(x, y) { return a(x) + b(y); }
If x and y are both known to be integers, I can optimize function c to a couple of machine code instructions. If they could be integers or strings, the optimization problem becomes much harder; I have to treat these as string appends in some cases, and addition in other cases. In particular, there are four possible interpretations of the addition operation that occurs in c; it could be addition, or string append, or two different variants of coerce-to-string-and-append. As you add more possible types, the number of possible permutations grows; in the worst case for a dynamically-typed language, you have k^n possible interpretations of an expression involving n terms, each of which could have any of k types. In a statically typed language, k=1, so there is always 1 interpretation of any given expression. Because of this, optimizers are fundamentally more efficient at optimizing statically-typed code than dynamically-typed code: There are fewer permutations to consider when searching for opportunities to optimize.
The point here is that when converting from dynamically-typed code to statically-typed code (as you'd be doing when going from JavaScript to asm.js), you have to account for the semantics of the original code. That means the type-checking still occurs (it has just been spelled out in statically-typed code), and all those permutations are still present to stifle the compiler.
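The effect is easy to observe with the functions a, b and c from above. The sketch below first runs them with different argument types, then shows a hypothetical `aSpelledOut` (a name invented here) that makes the type dispatch of `x + 1` explicit, which is roughly what a translator to a statically-typed target would have to emit. It only covers numbers and strings.

```javascript
function a(x) { return x + 1; }
function b(x) { return x - 1; }
function c(x, y) { return a(x) + b(y); }

// The same source text means different operations depending on runtime types.
var results = [c(1, 2), c('1', 2), c(1, '2'), c('1', '2')];
console.log(results);

// The dispatch hidden inside `x + 1`, written out by hand
// (hypothetical helper; only numbers and strings are handled).
function aSpelledOut(x) {
  if (typeof x === 'number') return x + 1;         // numeric addition
  if (typeof x === 'string') return x.concat('1'); // string append
  throw new TypeError('type not handled in this sketch');
}
console.log(aSpelledOut(1), aSpelledOut('1'));
```

The explicit version does the same work as the original; the checks have moved from the engine into the program text, which is why the translation alone does not buy any speed.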
A few facts about asm.js, which hopefully make the concept clear:
Yes you can write the asm.js dialect by hand.
If you did look at the examples for asm.js, they are very far from being user friendly. Obviously Javascript is not the front end language for creating this code.
Translating vanilla Javascript to asm.js dialect is not possible.
Think about it - if you could already translate standard JavaScript in a fully static manner, why would there be a need for asm.js? The mere existence of asm.js means that the JavaScript JIT people at some point gave up on their promise that JavaScript would get faster without any effort from the developer.
There are several reasons for this, but let's just say it would be really hard for the JIT to understand a dynamic language as well as a static compiler understands a static one. And then probably for the developers to fully understand the JIT.
In the end it boils down to using the right tool for the task. If you want static, very performant code, use C / C++ ( / Java ) - if you want a dynamic language, use Javascript, Python, ...
asm.js was created out of the need for a small subset of JavaScript that can be easily optimized. If there were a way to convert JavaScript to asm.js, asm.js would no longer be needed: that method could be built into JS engines directly.
In theory, it is possible to convert / compile / transpile any JavaScript operation to asm.js if it can be expressed with the limited subset of the language present in asm.js. In practice, however, there is no tool capable of converting ordinary JavaScript to asm.js at the moment (June, 2017).
Either way, it would make more sense to convert a language with static typing to asm.js, because static typing is a requirement of asm.js and the lack thereof one of the features of ordinary JavaScript that makes it exceptionally hard to compile to asm.js.
Back in 2013, when asm.js was hot, there was an attempt to compile a statically typed language similar to JavaScript, but both the language and the compiler seem to have been abandoned.
Today, in 2017, JavaScript supersets such as TypeScript and Flow would be suitable candidates for conversion to asm.js, but the core dev teams of neither language are interested in such a conversion. LLJS had a fork that could compile to asm.js, but that project is pretty much dead. ThinScript is a much more recent attempt and is based on TypeScript, but it doesn't appear to be active either.
So, the best and easiest way to produce asm.js code is still to write your code in C/C++ and convert / compile / transpile it. However, it remains to be seen whether we'll even want to do this in the foreseeable future. Web Assembly may soon replace asm.js altogether, and TypeScript-like languages such as TurboScript and AssemblyScript that convert to Web Assembly are already popping up. In fact, TurboScript was originally based on ThinScript and used to compile to asm.js, but they appear to have abandoned this feature.
It may be possible to convert regular JavaScript to asm.js by first compiling it to C or C++, and then compiling the generated code to asm.js using Emscripten. I'm not sure if this would be practical, but it's an interesting concept nonetheless.
There is also a compiler called NectarJS that compiles JavaScript to WebAssembly and ASM.js.
Check this: http://badassjs.com/post/43420901994/asm-js-a-low-level-highly-optimizable-subset-of
Basically you need to check that your code is asm.js compatible (no coercion or type casting, you need to manage the memory, etc.). The idea behind this is to write your code in JavaScript, detect the bottleneck, and change your code to use asm.js and AOT compilation instead of JIT and dynamic compilation. It is a bit of a PITA, but you can still use JavaScript, or other languages like C++ or, in the near future, LLJS.
Lua is small and can be easily embedded. The current JavaScript VMs are quite big and hard to integrate into existing applications.
So wouldn't it be possible to compile JavaScript to Lua or Lua bytecode?
Especially for the constraints in mobile applications this seems like a good fit. Being able to easily integrate one of the most popular scripting languages into any iPhone or Android app would be great.
I'm not very familiar with Lua so I don't know if this is technically feasible.
With Luvit there is an active project trying to port the Node.js architecture to Lua. So the evented JavaScript world can't be too far away from what's possible in Lua.
The wins of compiling Javascript to Lua are not as great as you might first imagine. Javascript's semantics are very different to Lua's (the LuaJIT author cites Lua's design as one of the main reasons LuaJIT can compete so favourably with Javascript JIT compilers).
Take this code:
if("1" == 1)
{
print("Yes");
}
This prints "Yes" in Javascript. The equivalent code in Lua does not, as strings are never equal to numbers in Lua. This may seem like a small point, but it has a fundamental consequence: we can no longer use Lua's built-in equality testing.
There are two solutions we could take. We could rewrite 1 == "1" to javascript_equals(1, "1"). Or we could wrap every Javascript value in Lua, and use Lua's metatables to override the == operator behaviour.
So we have already lost some efficiency from Lua by mapping Javascript to it. This is a simple example, but it continues like this all the way down. For example, all the operator rules are different between Javascript and Lua.
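To give an idea of what the javascript_equals helper from the first option has to do, here is a sketch of its logic. It is written in JavaScript, using only strict equality, purely to spell out the branches; a Lua runtime would need to mirror the same cases. The name `javascriptEquals` is made up, and objects, symbols and bigints are deliberately left out.

```javascript
// Hypothetical helper: JavaScript's loose equality for primitive values,
// expressed only in terms of strict comparisons and explicit conversions.
function javascriptEquals(a, b) {
  // null and undefined are equal to each other and to nothing else.
  if (a === null || a === undefined) return b === null || b === undefined;
  if (b === null || b === undefined) return false;
  // Same type: no conversion needed.
  if (typeof a === typeof b) return a === b;
  // Booleans are converted to numbers first.
  if (typeof a === 'boolean') return javascriptEquals(Number(a), b);
  if (typeof b === 'boolean') return javascriptEquals(a, Number(b));
  // Number versus string: the string is converted to a number.
  if (typeof a === 'number' && typeof b === 'string') return a === Number(b);
  if (typeof a === 'string' && typeof b === 'number') return Number(a) === b;
  return false; // objects, symbols and bigints are not handled in this sketch
}

console.log(javascriptEquals('1', 1), javascriptEquals('', false), javascriptEquals(null, 0));
```

Every comparison in translated code would pay for this chain of checks, which is the efficiency loss described above.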
We would even have to wrap Javascript objects, because they aren't the same as Lua tables. For example Javascript objects only support string keys, and coerce any index to a string:
> a = {}
{}
> a[1] = "Hello"
'Hello'
> a["1"]
'Hello'
You also have to watch out for Javascript's scoping rules, vararg functions, and so on.
Now, all of these things are surmountable, if someone were to put the effort into a full compiler. However any efficiency gains would soon be drowned out. You would essentially end up building a Javascript interpreter in Lua. Most Javascript interpreters are written in C and already optimised for Javascript's semantics.
So, doing it for efficiency is a lost cause. There may be other reasons - such as supporting Javascript in a Lua-only environment, though even then if possible just writing Lua bindings to an existing Javascript interpreter would probably be less work.
If you want to have a play with a Javascript->Lua source-to-source translator, take a look at js2lua, which is a toy project I created some time back. It's nowhere near complete, but playing with it would certainly give some food for thought. It already includes a Javascript lexer, so that hard work is done already.
How would I go about writing a lightweight JavaScript-to-JavaScript parser? Something simple that can convert some snippets of code.
I would like to basically make the internal scope objects in functions public.
So something like this
var outer = 42;
window.addEventListener('load', function() {
    var inner = 42;
    function magic() {
        var in_magic = inner + outer;
        console.log(in_magic);
    }
    magic();
}, false);
Would compile to
__Scope__.set('outer', 42);
__Scope__.set('console', console);
window.addEventListener('load', constructScopeWrapper(__Scope__, function(__Scope__) {
    __Scope__.set('inner', 42);
    __Scope__.set('magic', constructScopeWrapper(__Scope__, function _magic(__Scope__) {
        __Scope__.set('in_magic', __Scope__.get('inner') + __Scope__.get('outer'));
        __Scope__.get('console').log(__Scope__.get('in_magic'));
    }));
    __Scope__.get('magic')();
}), false);
Demonstration Example
The motivation behind this is to serialize the state of functions and closures and keep them synchronized across different machines (client, server, multiple servers). For this I would need a representation of [[Scope]].
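For reference, the compiled output above presupposes some runtime behind `__Scope__` and `constructScopeWrapper`. A minimal sketch of what that runtime might look like follows; the `Scope` constructor, its `snapshot` method and the exact semantics of `set` (declare in the current scope, like `var`) are assumptions made for illustration, not an existing library.

```javascript
// Hypothetical runtime assumed by the compiled output; not an existing library.
function Scope(parent) {
  this.parent = parent || null;
  this.vars = Object.create(null); // no inherited keys such as "toString"
}

// Declares (or overwrites) a variable in this scope, like `var name = value`.
Scope.prototype.set = function (name, value) {
  this.vars[name] = value;
};

// Looks a name up through the parent chain, innermost scope first.
Scope.prototype.get = function (name) {
  for (var s = this; s !== null; s = s.parent) {
    if (name in s.vars) return s.vars[name];
  }
  throw new ReferenceError(name + ' is not defined');
};

// Copies the chain into plain objects, innermost first, ready for a serializer.
Scope.prototype.snapshot = function () {
  var frames = [];
  for (var s = this; s !== null; s = s.parent) {
    frames.push(Object.assign({}, s.vars));
  }
  return frames;
};

// Wraps fn so that each call gets a fresh child scope as its first argument.
function constructScopeWrapper(parentScope, fn) {
  return function () {
    var child = new Scope(parentScope);
    var args = [child].concat(Array.prototype.slice.call(arguments));
    return fn.apply(this, args);
  };
}

// The example from above, minus the browser-specific event listener.
var __Scope__ = new Scope(null);
__Scope__.set('outer', 42);
var onLoad = constructScopeWrapper(__Scope__, function (__Scope__) {
  __Scope__.set('inner', 42);
  __Scope__.set('magic', constructScopeWrapper(__Scope__, function (__Scope__) {
    __Scope__.set('in_magic', __Scope__.get('inner') + __Scope__.get('outer'));
    return __Scope__.snapshot();
  }));
  return __Scope__.get('magic')();
});
var frames = onLoad();
console.log(frames.length, frames[0]);
```

Note that function values stored in a scope (such as `magic`) are not directly serializable, so a real implementation would still need a way to identify functions across machines.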
Questions:
Can I do this kind of compiler without writing a full JavaScript -> (slightly different) JavaScript compiler?
How would I go about writing such a compiler?
Can I re-use existing js -> js compilers?
I don't think your task is easy or short given that you want to access and restore all the program state. One of the issues is that you might have to capture the program state at any moment during a computation, right? That means the example as shown isn't quite right; that captures state sort of before execution of that code (except that you've precomputed the sum that initializes magic, and that won't happen before the code runs for the original JavaScript). I assume you might want to capture the state at any instant during execution.
The way you've stated your problem, you want a JavaScript parser in JavaScript.
I assume you are imagining that your existing JavaScript code J, includes such a JavaScript parser and whatever else is necessary to generate your resulting code G, and that when J starts up it feeds copies of itself to G, manufacturing the serialization code S and somehow loading that up.
(I think G is pretty big and hoary if it can handle all of Javascript)
So your JavaScript image contains J, big G, S and does an expensive operation (feed J to G) when it starts up.
What I think might serve you better is a tool G that processes your original JavaScript code J offline, and generates program state/closure serialization code S (to save and restore that state) that can be added to/replace J for execution. J+S are sent to the client, who never sees G or its execution. This decouples the generation of S from the runtime execution of J, saving on client execution time and space.
In this case, you want a tool that will make generation of such code S easiest. A pure JavaScript parser is a start but isn't likely enough; you'll need symbol table support to know which function code is connected to a function call F(...), and which variable definition in which scope corresponds to assignments or accesses to a variable V. You may need to actually modify your original code J to insert points of access where the program state can be captured. You may need flow analysis to find out where some values went. Insisting that all of this be in JavaScript narrows your range of solutions.
For these tasks, you will likely find a program transformation tool useful. Such tools contain parsers for the language of interest, build ASTs representing the program, enable the construction of identifier-to-definition maps ("symbol tables"), can carry out modifications to the ASTs representing insertion of access points, or synthesis of ASTs representing your demonstration example, and then regenerate valid JavaScript code containing the modified J and the additions S.
Of all the program transformation systems that I know about (which includes all the ones at the Wikipedia site), none are implemented in JavaScript.
Our DMS Software Reengineering Toolkit is such a program transformation system offering all the features I just described. (Yes, it's big and hoary; it has to be to handle the complexities of real computer languages). It has a JavaScript front end that contains a complete JavaScript parser to ASTs, and the machinery to regenerate JavaScript code from modified or synthesized ASTs. (Also big and hoary; good thing that hoary + hoary is still just hoary). Should it be useful, DMS also provides support for building control and dataflow analysis.
If you want something with a simple interface, you could try node-burrito: https://github.com/substack/node-burrito
It generates an AST using the uglify-js parser and then recursively walks the nodes. All you have to do is give a single callback which tests each node. You can alter the ones you need to change, and it outputs the resulting code.
I'd try to look for an existing parser to modify. Perhaps you could adapt JSLint/JSHint?
There is a problem with the rewriting above: you're not hoisting the initialization of magic to the top of the scope.
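The hoisting behaviour in question can be seen in a few lines. This is a small sketch with made-up names (`demo`, `later`); it only shows that a function declaration is usable before the line where it appears, while a `var` is merely declared, not initialized.

```javascript
function demo() {
  var early = typeof magic;    // the function declaration is hoisted with its body
  var earlyVar = typeof later; // only the name of a `var` is hoisted, not its value
  var later = 1;
  function magic() { return 'hoisted'; }
  return [early, earlyVar, magic()];
}

var observed = demo();
console.log(observed);
```

A rewrite that turns `function magic() {...}` into a `set('magic', ...)` call at the original position therefore changes behaviour for any code that calls magic earlier in the same scope.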
There are a number of projects out there that parse JavaScript.
Crock's Pratt parser which works well on JavaScript that fits within "The good parts" and less well on other JS.
The es-lab parser based on ometa which handles the full grammar including a lot of corner cases that Crock's parser misses. It may not perform as well as Crock's.
narcissus parser and evaluator. I don't have much experience with this.
There are also a number of high-quality lexers for JavaScript that let you manipulate JS at the token level. This can be tougher than it sounds though since JavaScript is not lexically regular, and predicting semicolon insertion is difficult without a full parse.
My es5-lexer is a carefully constructed and efficient lexer for EcmaScript 5 that provides the ability to tokenize JavaScript. It is heuristic where JavaScript's grammar is not lexically regular but the heuristic is very good and it provides a means to transform a token stream so that an interpreter is guaranteed to interpret it the way the lexer interpreted the tokens so if you don't trust your input, you can still be sure that the interpretation underlying the security transformations is sound even if not correct according to the spec for some bizarre inputs.
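Two short cases show why token-level processing is harder than it sounds. This is only an illustrative sketch with made-up variable names: the same characters `/g/` are two division operators in one context and a regular expression literal in another, and a line break after `return` triggers semicolon insertion.

```javascript
var g = 2, i = 1, x = 8;

// Here the slashes are division operators: (x / g) / i.
var quotient = x /g/ i;

// Here the same characters start a regular expression literal with flag i.
var pattern = /g/i;

// Semicolon insertion: the line break after `return` ends the statement,
// so the 42 on the next line is never returned.
function surprising() {
  return
    42;
}

console.log(quotient, pattern.test('G'), surprising());
```

A lexer cannot tell the first two cases apart from the characters alone; it has to know whether the previous token can end an expression.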
Your problem seems to be in the same family of problems as those solved by JS obfuscators and JS compressors: they, like you, need to be able to parse and reformat the JS into an equivalent script.
There was a good discussion on obfuscators here and the possible solution to your problem could be to leverage the parse and generator part from one of the FOSS versions.
One callout, your example code does not take into account the scopes of the variables you want to set/get and that will eventually become a problem that you will have to solve.
Addition
Given the scope problem for closure-defined functions, you are unlikely to be able to solve this as a static parsing problem, as the scope variables outside the closure will have to be imported/exported to resolve/save and re-instantiate scope. Hence you may need to dig into the evaluation engine itself, and perhaps get the V8 engine and make a hack to the interpreter itself -- that is assuming that you do not need this to be generic across all script engines and that you can tie it down to a single implementation which you control.