I want to analyze a large and confusing JS code. The code is heavily obfuscated and even tools like JStillery cannot work with it.
I would like to somehow build one of the open JS-interpreters, run it outside the browser and debug in more traditional ways, if necessary, applying patches inside the interpreter.
Unfortunately, the code uses DOM and cannot be executed without a browser.
The question is: are there any known techniques to take any external engine (such as V7, V8, DukTape, JerryScript, MuJS, quad-wheel, QuickJS, tiny-js, ...) and run code inside them that contains calls to DOM and other browser parts?
There are pure-JavaScript implementations of the DOM, such as https://github.com/jsdom/jsdom. Not sure how useful that is for your use case, but it does address your primary question: it allows you to run JavaScript that assumes a browser environment outside the browser environment.
I believe jsdom is fairly accurate in its implementation; there are other implementations out there that are more mock-like. Either way, there are probably some remaining differences, so heavily obfuscated code may well include mechanisms to detect emulated environments...
Related
In Node.js website, they say Node.js is a JavaScript runtime.
Are web browsers like Chrome, Firefox, Edge, ... JavaScript runtimes?
I thought of course, web browser is JS runtime. But I'm confused, In this video 12:10~ He says Web browser is not just JavaScript runtime because it can do more things
at one time, it can give us other things.
But I think V8 engine only can do one thing at one time, while JS runtime can do more things than one at one time.
Am I wrong?
A browser contains a Javascript engine (for example Chrome v8). The engine implements a Javascript runtime, which includes the call stack, heap and event loop. The browser also usually includes a set of APIs that augment the Javascript runtime and make asynchronous code execution possible.
NodeJS also implements a Javascript runtime using Chrome's v8 engine as well as the Libuv library (event loop and worker threads).
Here's a good video that breaks this all down:
https://www.youtube.com/watch?v=4xsvn6VUTwQ
They are right, a JavaScript runtime just executes the JavaScript code.
All Web browsers include a JavaScript runtime engine(RE) that executes js code for them but they also have other plugins like java or flash, as well as an html/dom parser and renderers that are not part of the RE, even if those modules were written in JavaScript it does not mean they would be part of the RE.
I don't have a lot of hope for this one, but I have to ask. I am hoping, for didactic purposes, to come up with some means by which a student could load a simple javascript program into a browser, and have it interact with them in the old-fashioned command line manner, where it prints a line and then reads a line of input. This works fine if you use prompt(), but the fact that it creates popups is aesthetically annoying, and today's browsers cheerfully volunteer to stifle the scripts which overuse it. The problem is, prompt() appears to be the only way browser Javascript has of actually pausing a script to wait for input. If we avoid it, that throws us immediately into having to deal with a real-time GUI event input model.
I've been looking for a way to fake it -- to set up some kind of environment in which it is possible for a Javascript method to wait for input and then return when it's given. The best possibility I've got so far is to connect it to a Java applet, but the java applet brand is kind of poisoned now and I doubt people would want to install the plugin. Could there be another way? Worker threads? A browser add-on? Some server-side trick? Does anyone have an idea?
This question is now years old... I wonder if the addition of promises and async/await to Javascript makes this any more possible now?
...I sort of got it to work. If the read-eval-print loop is in an async function and uses await for the line that reads input, you can indeed make an old fashioned linear read-eval-print loop in an event-driven browser environment. But I don't think you can make the main thread wait on input without an async function, except with the grandfathered prompt method (which they're now getting ready to deprecate).
The technical term for what you want is a REPL: a Read-Evaluate-Print-Loop. There are many REPLs out there.
The Best Browser-Based Solution
Use Google Chrome's inbuilt console! It's a Javascript REPL that can also let you interact with a web page. You can access it by using Ctrl+Shift+I on Windows and Unix inside Chrome, or the equivalent command on a Mac (Google it!).
To load a file and play with it, all your students need to do is create a directory structure like this:
project
|--> index.html
|--> javascript.js
make sure index.html has a script tag that points to javascript.js, and then open index.html with Chrome. Voila! You have loaded a Javascript file, and can now play around with it in Chrome.
It's the best solution because you get the full power of the DOM, can do REPL stuff, and is virtually painless - everyone uses Chrome, and your students can even go home and mess around with it completely.
A Browser-Based Alternative
Rather than reinvent the wheel, you can also use Repl.it.
It's a browser-based Javascript REPL website that supports inserting an arbitrary Javascript program, and interacting with its contents. This is the closest you'll probably get to meeting your requirements - it'll be unable to interact with the DOM (obviously), but it'll more than work.
Non-Browser-Based Alternative
If the requirement for a web-based solution can be relaxed, simply using Node.js' inbuilt REPL on a terminal can be more than sufficient.
You could install Node on your lab's computers, and have people play with Javascript in that capacity. There'll be no DOM, but you could certainly have them write functions and algorithms to solve simple problems. Plus, it'd be a good way to introduce them to the fact that Javascript is no longer client-side-bound.
The fact that you'll be interacting with Javascript in an actual terminal, rather than an emulated one in the browser, is another neat bonus.
If I've Misunderstood...
Some of the question comments make it sound like you want a way to be able to interact with terminal utilities using Javascript from a browser. If that is the case, this is impossible.
There is no way for Javascript to evaluate, parse or do anything on the command line that isn't written using Javascript. You cannot expect the equivalent of a bash ls using a browser-based solution - that's because browsers don't have access to your underlying filesystem, which is a good thing. You cannot run sed, awk, grep, etc. for the same reason - Unix utilities are inaccessible to a browser. There are ways to run Unix utilities using Node, of course, but then you will be teaching them how to use Node, rather than how to play with the Unix console.
If all you want, however, is a way to SSH from a browser into a common environment, there are certainly browser-based ways to do that. FireSSH is a Firefox plugin (now also ported to Chrome) that lets people SSH into a common server. They can then do ls, vim, etc. and have it run in the server, with the results piped back to their browser screen. You'll have to think carefully about security in this case, of course, but I think simply giving people user permissions for this server should more than suffice.
Note that FireSSH doesn't use Javascript to parse or do anything - all it is doing is relaying commands you type to a server, having the server execute those commands remotely, and then piping the results back to your screen.
Alternatives to Prompt
I added this after understanding OP's requirements in more detail.
This is a question that has been asked before. I am fond of library solutions, and in 2013, iocream.js was developed for just this sort of browser functionality. You can embed it in a page, and use the jin function to assign values.
If going with a Node.js solution, by far the best approach is to make use of the prompt library. I personally find it very useful for embedding within Node.js applications.
SpiderMonkey is Mozilla's C++-based Javascript engine, and supports a function called readline(). Unfortunately, there doesn't appear to be a mainstream browser implementation.
The first part of this question is actually a request for confirmation based on JavaScript-centric research I've been doing all afternoon. If I am incorrect about any of these items, please correct me!
The ECMA is the official standards body that "maintains" JavaScript
Any browser that wishes to support JavaScript (which all/most of them do), must include some sort of interpreter (what is it???) engine deep inside the browser code
When someone points a JavaScript-enabled browser to a URI that contains JavaScript code, that browser downloads the JS file(s) along with the HTML, CSS, etc., runs the JS through this interpreter, and the resultant output affects how the page is ultimately rendered
In addition to these items I've also heard terms like JavaScript "plug-ins" or modules that browsers can have. What are these plugins/modules and why would a browser need them if they are ECMA-compliant and already contain a JS interpreter?
Thanks in advance!
ECMA is the standards body that enforces the "standards" of JS. They keep the language consistent and documented (although knowing it's history, they didn't do good with that job as a "standards body")
A JavaScript engine is a piece of software embedded into the browser to parse JavaScript source code. It's what turns your JavaScript into actions on screen (and off screen). An example of these are V8 (Chrome), TraceMonkey (Firefox), Chakra (IE), Carakan (Opera) and Nitro/SquirrelFish (safari)
Before the above happens, one must introduce the JavaScript code into the browser to be parsed (usually using <script> tags)
JavaScript Plugins/Toolkits are just code developed by programmers to do stuff easily. they just do things that normally you would code 1000 lines for. these code also "improve" programming by giving consistency cross-browser. Examples of plugins/toolkits are jQuery (and it's UI plug jQueryUI), YUI, Dojo and so on.
Browser extensions/plugins on the otherhand "extends" the functionality of the browser. Examples are ADBlock (whick blocks page ads), FlashGet (Downloads flash files on the page). These guys are programmed into the browser rather than into the page. However, lately, these extensions are powered by JavaScript for the reason that it's easy to program
The ECMA is the official standards body that sets the standard for ECMAScript which JavaScript implements. So does ActionScript. ECMAScript covers all the programming essentials and basic structure of the language.
The ECMA spec does NOT cover browser-oriented APIs like the DOM. This is covered by the W3C DOM standard which is meant to define a language neutral API. IE has supported the ECMA spec fairly religously since IE 6 while almost completely ignoring the DOM stuff in favor of its own proprietary BS all the way up until IE9.
The spec itself is just a bunch of rules for how the language is supposed to work. As long as you write the same stuff and a given browser's interpreter gives the expected result as defined by the spec, it is ECMA compliant to whatever version being considered.
An interpreter parses and tokenizes the actual text you've written and turns it into instructions to be read by the browser's run-time environment. Modern browsers sport actual JIT compilers which actually convert your JS into bytecode as it's executed so that the browsers run-time environment itself doesn't have to translate.
Most browsers cache the actual binary of a js file. So pages on the same domain that link to the same server location won't have to download the same file twice when a new page links to it. This is the same as with any resource (images, css files, etc.) I don't believe they cache the results of any of the interpretation that goes down but I think in the case of the JITs the results of certain pre-execution routines (JIT prepwork basically) might be kept in memory (pure speculation on my part - but it seems kind of duh).
We've been a bit fast and loose with language in regards to usage of words like plugins, frameworks, tools, libraries etc... It's all just JavaScript, usually. You "plug them in" by linking a file or cutting and pasting into an existing one like any other JS. By plugin, however, people usually mean it works with some existing pre-fab JS, like JQuery, which tends to be expanded on by adding methods to the object that it returns (JQuery is just a big fancy function that builds and returns the same object every time you fire it basically). A library tends to be a large collection of pre-defined methods for doing all sorts of things. Like a warehouse 'o stuff you can use. I think of JQuery as more of a tool than a library because its focus is more on reducing cruft and normalizing browser differences. JQ by itself doesn't really do anything all that far removed from core JS methods. It just makes it much easier/faster to do them. It has a UI library, which is basically a large set of plug-ins that actually spit out pre-fab UI elements, HTML, CSS and all. A framework tends to be more of a system for building large-scale app-type structures on the front end. It's not just a bunch of methods to call, it's a way of building things aimed at making it easy to barn-raise entire app structures while skipping a lot of the more granular work one typically needs to do to keep things flexible (as a result, frameworks usually aren't particularly flexible but that doesn't mean they can't be).
Is there an implementation or emulation of the DOM which is purely javascript?
There is env.js, but that requires Rhino.
There's jsdom, but that requires Node.
Is there a solution that works in most any javascript interpreter, such as v8, without being tied to any particular interpreter or engine? That is, is there any DOM implementation in JS that without any set up or shims can be dropped into a javascript interpreter and just run?
In addition to the ones you have listed, I have heard good things about dom.js. It requires limited ES6 features such as const, WeakMap, and Proxy, so it will work in V8 and SpiderMonkey (Rhino) but not JavaScriptCore, Chakra, or others.
It's hard to guess at exactly what you're trying to do, here, but I'll take a stab at it, just to keep the conversation going:
If you're trying to manipulate a DOM from within a browser, can't you just use Jquery?
If you're trying to get a "headless browser", I'd check out PhantomJS.
I guess it's hard to imagine how you'd even run Javascript code without a browser, or Rhino, or Node, or PhantomJS, or some other JS interpreter environment...
There are lots of widgets provided by sites that are effectively bits of JavaScript that generate HTML through DOM manipulation or document.write(). Rather than slow the browser down even more with additional requests and trust yet another provider to be fast, reliable and not change the widget output, I want to execute* the JavaScript to generate the rendered HTML, and then save that HTML source.
Things I've looked into that seem unworkable or way too difficult:
The Links Browser (not lynx!)
Headless use of Xvfb plus Firefox plus Greasemonkey (yikes)
The all-Java browser toolkit Cobra (the best bet!)
Any ideas?
** Obviously you can't really execute the JavaScript completely, as it doesn't necessarily have an exit path, but you get the idea.
Wikipedia's "Server-side JavaScript" article lists numerous implementations, many of which are based on Mozilla's Rhino JavaScript-to-Java converter, or its cousin SpiderMonkey (the same engine as found in Firefox and other Gecko-based browsers). In particular, something simple like mod_js for Apache may suit your needs.
If you're just using plain JS, Rhino should do the trick. But if the JS code is actually calling DOM methods and so on, you're going to need a full-blown browser. Crowbar might help you.
Is this really going to make things faster for users without causing compatibility issues?
There's John Resig's project Bringing the Browser to the Server: "browser/DOM environment, written in JavaScript, that runs on top of Rhino; capable of running jQuery, Prototype, and MochiKit (at the very least)."