I'm writing some JavaScript code to parse user-entered functions (for spreadsheet-like functionality). Having parsed the formula I could convert it into JavaScript and run eval() on it to yield the result.
However, I've always shied away from using eval() if I can avoid it because it's evil (and, rightly or wrongly, I've always thought it is even more evil in JavaScript, because the code to be evaluated might be changed by the user).
So, when is it OK to use it?
I'd like to take a moment to address the premise of your question - that eval() is "evil". The word "evil", as used by programming language people, usually means "dangerous", or more precisely "able to cause lots of harm with a simple-looking command". So, when is it OK to use something dangerous? When you know what the danger is, and when you're taking the appropriate precautions.
To the point, let's look at the dangers in the use of eval(). There are probably many small hidden dangers just like everything else, but the two big risks - the reason why eval() is considered evil - are performance and code injection.
Performance - eval() runs the interpreter/compiler. If your code is compiled, then this is a big hit, because you need to call a possibly-heavy compiler in the middle of run-time. However, JavaScript is still mostly an interpreted language, which means that calling eval() is not a big performance hit in the general case (but see my specific remarks below).
Code injection - eval() potentially runs a string of code under elevated privileges. For example, a program running as administrator/root would never want to eval() user input, because that input could potentially be "rm -rf /etc/important-file" or worse. Again, JavaScript in a browser doesn't have that problem, because the program is running in the user's own account anyway. Server-side JavaScript could have that problem.
On to your specific case. From what I understand, you're generating the strings yourself, so assuming you're careful not to allow a string like "rm -rf something-important" to be generated, there's no code injection risk (but please remember, it's very very hard to ensure this in the general case). Also, if you're running in the browser then code injection is a pretty minor risk, I believe.
As for performance, you'll have to weigh that against ease of coding. It is my opinion that if you're parsing the formula, you might as well compute the result during the parse rather than run another parser (the one inside eval()). But it may be easier to code using eval(), and the performance hit will probably be unnoticeable. It looks like eval() in this case is no more evil than any other function that could possibly save you some time.
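To illustrate what "compute the result during the parse" could look like, here is a minimal recursive-descent sketch. It assumes a hypothetical formula grammar of numbers, + - * / and parentheses only (no cell references or functions), and the name evaluateFormula is made up for this example.

```javascript
// Hypothetical sketch: evaluate an arithmetic formula while parsing it, without eval().
function evaluateFormula(text) {
  // Reject anything outside the tiny grammar instead of silently skipping it.
  if (/[^\d\s.+\-*/()]/.test(text)) throw new SyntaxError('unsupported character');
  const tokens = text.match(/\d+(?:\.\d+)?|[-+*/()]/g) || [];
  let pos = 0;

  // primary := number | '(' sum ')' | '-' primary
  function parsePrimary() {
    const tok = tokens[pos++];
    if (tok === '(') {
      const value = parseSum();
      if (tokens[pos++] !== ')') throw new SyntaxError('expected )');
      return value;
    }
    if (tok === '-') return -parsePrimary();
    if (tok === undefined || !/^\d/.test(tok)) {
      throw new SyntaxError('unexpected token: ' + tok);
    }
    return parseFloat(tok);
  }

  // product := primary (('*' | '/') primary)*
  function parseProduct() {
    let value = parsePrimary();
    while (tokens[pos] === '*' || tokens[pos] === '/') {
      const op = tokens[pos++];
      const rhs = parsePrimary();
      value = op === '*' ? value * rhs : value / rhs;
    }
    return value;
  }

  // sum := product (('+' | '-') product)*
  function parseSum() {
    let value = parseProduct();
    while (tokens[pos] === '+' || tokens[pos] === '-') {
      const op = tokens[pos++];
      const rhs = parseProduct();
      value = op === '+' ? value + rhs : value - rhs;
    }
    return value;
  }

  const result = parseSum();
  if (pos !== tokens.length) throw new SyntaxError('unexpected token: ' + tokens[pos]);
  return result;
}

const formulaResult = evaluateFormula('1 + 2 * (3 - 1)');
```

Because the evaluator only understands the tokens it was written for, input such as alert(1) is rejected rather than executed.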
eval() isn't evil. Or, if it is, it's evil in the same way that reflection, file/network I/O, threading, and IPC are "evil" in other languages.
If, for your purpose, eval() is faster than manual interpretation, or makes your code simpler, or more clear... then you should use it. If neither, then you shouldn't. Simple as that.
When you trust the source.
In the case of JSON, it is fairly hard to tamper with the source, because it comes from a web server you control. As long as the JSON itself contains no data a user has uploaded, there is no major drawback to using eval.
In all other cases I would go to great lengths to ensure user-supplied data conforms to my rules before feeding it to eval().
Let's get real folks:
Every major browser now has a built-in console which your would-be hacker can use with abundance to invoke any function with any value - why would they bother to use an eval statement - even if they could?
If it takes 0.2 seconds to compile 2000 lines of JavaScript, what is my performance degradation if I eval four lines of JSON?
Even Crockford's explanation for 'eval is evil' is weak.
eval is Evil: The eval function is the most misused feature of JavaScript. Avoid it
As Crockford himself might say "This kind of statement tends to generate irrational neurosis. Don't buy it."
Understanding eval and knowing when it might be useful is way more important. For example, eval is a sensible tool for evaluating server responses that were generated by your software.
BTW: Prototype.js calls eval directly five times (including in evalJSON() and evalResponse()). jQuery uses it in parseJSON (via Function constructor).
I tend to follow Crockford's advice for eval(), and avoid it altogether. Even cases that appear to require it do not. For example, setTimeout() allows you to pass a function rather than a string to be evaluated.
setTimeout(function() {
    alert('hi');
}, 1000);
Even if it's a trusted source, I don't use it, because the code returned by JSON might be garbled, which could at best do something wonky, at worst, expose something bad.
Bottom Line
If you created or sanitized the code you eval, it is never evil.
Slightly More Detailed
eval is evil if running on the server using input submitted by a client that was not created by the developer or that was not sanitized by the developer.
eval is not evil if running on the client, even if using unsanitized input crafted by the client.
Obviously you should always sanitize the input, as to have some control over what your code consumes.
Reasoning
The client can run any arbitrary code they want to, even if the developer did not code it; This is true not only for what is evaled, but the call to eval itself.
Eval is complementary to compilation which is used in templating the code. By templating I mean that you write a simplified template generator that generates useful template code which increases development speed.
I have written a framework, where developers don't use EVAL, but they use our framework and in turn that framework has to use EVAL to generate templates.
Performance of EVAL can be increased by using the following method; instead of executing the script, you must return a function.
var a = eval("3 + 5");
It should be organized as
var f = eval("(function(a,b) { return a + b; })");
var a = f(3,5);
Caching f will certainly improve the speed.
Also Chrome allows debugging of such functions very easily.
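The caching idea could be sketched roughly as follows. This version uses the Function constructor rather than eval() of a function expression, and compiledCache and compileExpression are hypothetical names, not part of any framework mentioned here.

```javascript
// Hypothetical helper: compile each expression string once, then reuse the function.
const compiledCache = new Map();

function compileExpression(source) {
  let fn = compiledCache.get(source);
  if (fn === undefined) {
    // Parsing and compilation happen here, once per distinct source string.
    fn = new Function('a', 'b', 'return (' + source + ');');
    compiledCache.set(source, fn);
  }
  return fn;
}

const add = compileExpression('a + b');
const sum = add(3, 5); // 8
```

Later calls with the same source string return the cached function, so the cost of parsing is paid only once.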
Regarding security, using eval or not will hardly make any difference.
First of all, the browser invokes the entire script in a sandbox.
Any code that is evil in EVAL, is evil in the browser itself. The attacker or anyone can easily inject a script node in DOM and do anything if he/she can eval anything. Not using EVAL will not make any difference.
It is mostly poor server-side security that is harmful. Poor cookies validation or poor ACL implementation on the server causes most attacks.
Recent Java vulnerabilities, for example, were in Java's native code. JavaScript was and is designed to run in a sandbox, whereas applets were designed to run outside a sandbox with certificates and the like, which led to vulnerabilities and many other problems.
Writing code to imitate a browser is not difficult. All you have to do is make an HTTP request to the server with your favourite user agent string. All testing tools mock browsers anyway; if an attacker wants to harm you, EVAL is their last resort. They have many other ways to attack your server-side security.
The browser DOM has no access to files or to the user name. In fact, there is nothing on the machine that eval can give access to.
If your server-side security is solid enough to withstand attacks from anywhere, you should not worry about EVAL. As I mentioned, even if EVAL did not exist, attackers would still have many tools to hack into your server, irrespective of your browser's EVAL capability.
Eval is only good for generating some templates to do complex string processing based on something that is not known in advance. For example, I would prefer
"FirstName + ' ' + LastName"
As opposed to
"LastName + ' ' + FirstName"
As my display name, which can come from a database and which is not hardcoded.
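A small sketch of that display-name idea, assuming the expression string has been loaded from trusted configuration. The names displayNameExpr and formatDisplayName are hypothetical, and the Function constructor stands in for a direct eval() call.

```javascript
// Hypothetical: in practice displayNameExpr would come from a database or config table.
const displayNameExpr = "FirstName + ' ' + LastName";

// Compile the expression once into a function of the two fields it may reference.
const formatDisplayName = new Function(
  'FirstName',
  'LastName',
  'return ' + displayNameExpr + ';'
);

const displayName = formatDisplayName('Ada', 'Lovelace'); // 'Ada Lovelace'
```

Swapping the stored string for "LastName + ' ' + FirstName" changes the output order without touching the code.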
When debugging in Chrome (v28.0.1500.72), I found that variables are not bound to closures if they are not used in a nested function that produces the closure. I guess, that's an optimization of the JavaScript engine.
BUT: when eval() is used inside a function that causes a closure, ALL the variables of outer functions are bound to the closure, even if they are not used at all. If someone has the time to test if memory leaks can be produced by that, please leave me a comment below.
Here's my test code:
(function () {
    var eval = function (arg) {
    };
    function evalTest() {
        var used = "used";
        var unused = "not used";
        (function () {
            used.toString(); // Variable "unused" is visible in debugger
            eval("1");
        })();
    }
    evalTest();
})();
(function () {
    var eval = function (arg) {
    };
    function evalTest() {
        var used = "used";
        var unused = "not used";
        (function () {
            used.toString(); // Variable "unused" is NOT visible in debugger
            var noval = eval;
            noval("1");
        })();
    }
    evalTest();
})();
(function () {
    var noval = function (arg) {
    };
    function evalTest() {
        var used = "used";
        var unused = "not used";
        (function () {
            used.toString(); // Variable "unused" is NOT visible in debugger
            noval("1");
        })();
    }
    evalTest();
})();
What I'd like to point out here is that eval() need not refer to the native eval() function; it all depends on the name of the function. So when the native eval() is called through an alias (say var noval = eval; and then, in an inner function, noval(expression);), the evaluation of expression may fail when it refers to variables that should be part of the closure but actually are not.
I saw people advocate not using eval because it is evil, but I saw the same people use Function and setTimeout dynamically, so they use eval under the hood :D
BTW, if your sandbox is not secure enough (for example, if you're working on a site that allows code injection), eval is the least of your problems. The basic rule of security is that all input is evil, but in the case of JavaScript even JavaScript itself could be evil, because in JavaScript you can overwrite any function and you just can't be sure you're using the real one. So if malicious code runs before yours, you can't trust any JavaScript built-in function :D
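The difference between calling eval() by name and calling it through an alias can be seen in a short sketch. The function and variable names are made up, and the second result assumes no global variable called closureValue exists.

```javascript
function compareEvalCalls() {
  var closureValue = 'visible'; // local to this function

  // Direct call: the code is evaluated in the local scope.
  var direct = eval('typeof closureValue');

  // Calling through any other name makes the call indirect,
  // so the code is evaluated in the global scope instead.
  var alias = eval;
  var indirect = alias('typeof closureValue');

  return { direct: direct, indirect: indirect };
}

const evalScopes = compareEvalCalls();
// evalScopes.direct is 'string'; evalScopes.indirect is 'undefined'
```

This is why an aliased call cannot see the enclosing function's variables, and presumably why the engine is free to leave them out of the closure.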
Now the epilogue to this post is:
If you REALLY need it (80% of the time eval is NOT needed) and you're sure of what you're doing, just use eval (or better, Function ;) ). Closures and OOP cover 80-90% of the cases where eval can be replaced with another kind of logic; the rest is dynamically generated code (for example, if you're writing an interpreter) and, as you already said, evaluating JSON (here you can use Crockford's safe evaluation ;) ).
The only instance when you should be using eval() is when you need to run dynamic JS on the fly. I'm talking about JS that you download asynchronously from the server...
...And 9 times out of 10 you could easily avoid doing that by refactoring.
On the server side eval is useful when dealing with external scripts such as sql or influxdb or mongo. Where custom validation at runtime can be made without re-deploying your services.
For example an achievement service with following metadata
{
"568ff113-abcd-f123-84c5-871fe2007cf0": {
"msg_enum": "quest/registration",
"timely": "all_times",
"scope": [
"quest/daily-active"
],
"query": "`SELECT COUNT(point) AS valid from \"${userId}/dump/quest/daily-active\" LIMIT 1`",
"validator": "valid > 0",
"reward_external": "ewallet",
"reward_external_payload": "`{\"token\": \"${token}\", \"userId\": \"${userId}\", \"amountIn\": 1, \"conversionType\": \"quest/registration:silver\", \"exchangeProvider\":\"provider/achievement\",\"exchangeType\":\"payment/quest/registration\"}`"
},
"efdfb506-1234-abcd-9d4a-7d624c564332": {
"msg_enum": "quest/daily-active",
"timely": "daily",
"scope": [
"quest/daily-active"
],
"query": "`SELECT COUNT(point) AS valid from \"${userId}/dump/quest/daily-active\" WHERE time >= '${today}' ${ENV.DAILY_OFFSET} LIMIT 1`",
"validator": "valid > 0",
"reward_external": "ewallet",
"reward_external_payload": "`{\"token\": \"${token}\", \"userId\": \"${userId}\", \"amountIn\": 1, \"conversionType\": \"quest/daily-active:silver\", \"exchangeProvider\":\"provider/achievement\",\"exchangeType\":\"payment/quest/daily-active\"}`"
}
}
This allows:
Direct injection of objects/values through template-literal strings in JSON, useful for templating text
Use as a comparator, say, when we define rules in a CMS for validating quests or events
Cons of this:
There can be errors in the code that break things in the service if it is not fully tested.
If a hacker can write script on your system, then you are pretty much screwed.
One way to validate your script is keep the hash of your scripts somewhere safe, so you can check them before running.
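The hash check could be sketched like this in Node.js. It is only an illustration: sha256, runValidator and the variable names are hypothetical, and in a real service the trusted hash would be stored somewhere separate from the script metadata rather than computed next to it.

```javascript
const crypto = require('crypto');

// Hex-encoded SHA-256 digest of a script's source text.
function sha256(text) {
  return crypto.createHash('sha256').update(text, 'utf8').digest('hex');
}

// Hypothetical: the validator string from the metadata and its separately stored hash.
const validatorSource = 'valid > 0';
const trustedHash = sha256(validatorSource);

// Refuse to run a validator whose source no longer matches the trusted hash.
function runValidator(source, expectedHash, valid) {
  if (sha256(source) !== expectedHash) {
    throw new Error('validator script does not match its trusted hash');
  }
  return new Function('valid', 'return (' + source + ');')(valid);
}

const questPassed = runValidator(validatorSource, trustedHash, 3);
```

A script that was edited after the hash was recorded fails the comparison and never reaches the evaluator.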
Eval isn't evil, just misused.
If you created the code going into it or can trust it, it's alright.
People keep talking about how user input doesn't matter with eval. Well sort of~
If user input goes to the server, then comes back to the client, and that code is used in eval without being sanitized, congrats: you've opened Pandora's box for user data to be sent to whoever.
Depending on where the eval is (many websites are SPAs), eval could make it easier for the user to access application internals that otherwise wouldn't have been easy to reach. Now they can make a bogus browser extension that can tap into that eval and steal data again.
Just gotta figure what's the point of you using the eval. Generating code isn't really ideal when you could simply make methods to do that sort of thing, use objects, or the like.
Now a nice example of using eval.
Your server is reading the swagger file that you have created. Many of the URL params are created in the format {myParam}. So you'd like to read the URLs and then convert them to template strings without having to do complex replacements because you have many endpoints. So you may do something like this.
Note this is a very simple example.
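const params = { id: 5 };
const route = '/api/user/{id}';
// replace() returns a new string; wrap it in backticks so eval() sees a template literal
const template = '`' + route.replace(/{/g, '${params.') + '`';
// use eval(template); to do something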
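For comparison, the same substitution can be done without eval() by giving replace() a callback. This is an alternative technique, not the approach described above, and fillRoute, routeTemplate and routeParams are hypothetical names.

```javascript
const routeParams = { id: 5 };
const routeTemplate = '/api/user/{id}';

// Replace each {name} placeholder with the matching value, failing loudly if one is missing.
function fillRoute(template, values) {
  return template.replace(/\{(\w+)\}/g, function (match, key) {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      throw new Error('missing route parameter: ' + key);
    }
    return encodeURIComponent(String(values[key]));
  });
}

const filledRoute = fillRoute(routeTemplate, routeParams); // '/api/user/5'
```

The callback form keeps the placeholder names as plain data, so nothing in the route string is ever executed.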
eval is rarely the right choice. While there may be numerous instances where you can accomplish what you need to accomplish by concatenating a script together and running it on the fly, you typically have much more powerful and maintainable techniques at your disposal: associative-array notation (obj["prop"] is the same as obj.prop), closures, object-oriented techniques, functional techniques - use them instead.
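As a small illustration of the bracket-notation point, here is how a property or function chosen at run time can be reached without building code as a string. All names in the sketch are made up.

```javascript
const settings = { theme: 'dark', fontSize: 14 };
const propName = 'fontSize'; // e.g. arrives as a string at run time

// Instead of: eval('settings.' + propName)
const viaBrackets = settings[propName]; // 14

// Dispatching to a function by name, again without eval
const handlers = {
  double: function (n) { return n * 2; },
  negate: function (n) { return -n; },
};

function dispatch(name, arg) {
  if (!Object.prototype.hasOwnProperty.call(handlers, name)) {
    throw new Error('unknown handler: ' + name);
  }
  return handlers[name](arg);
}

const doubled = dispatch('double', 4); // 8
```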
As far as client script goes, I think the issue of security is a moot point. Everything loaded into the browser is subject to manipulation and should be treated as such. There is zero risk in using an eval() statement when there are much easier ways to execute JavaScript code and/or manipulate objects in the DOM, such as the URL bar in your browser.
javascript:alert("hello");
If someone wants to manipulate their DOM, I say swing away. Security to prevent any type of attack should always be the responsibility of the server application, period.
From a pragmatic standpoint, there's no benefit to using an eval() in a situation where things can be done otherwise. However, there are specific cases where an eval SHOULD be used. When so, it can definitely be done without any risk of blowing up the page.
<html>
<body>
<textarea id="output"></textarea><br/>
<input type="text" id="input" />
<button id="button" onclick="execute()">eval</button>
<script type="text/javascript">
var execute = function(){
var inputEl = document.getElementById('input');
var toEval = inputEl.value;
var outputEl = document.getElementById('output');
var output = "";
try {
output = eval(toEval);
}
catch(err){
// Error properties such as name and message are not enumerable, so read them directly
output = err.name + ": " + err.message;
}
outputEl.value = output;
}
</script>
</body>
</html>
Since no one has mentioned it yet, let me add that eval is super useful for WebAssembly-JavaScript interop. While it's certainly ideal to have pre-made scripts included in your page that your WASM code can invoke directly, sometimes it's not practicable and you need to pass in dynamic JavaScript from a WebAssembly language like C# to really accomplish what you need to do.
It's also safe in this scenario because you have complete control over what gets passed in. Well, I should say, it's no less safe than composing SQL statements using C#, which is to say it needs to be done carefully (properly escaping strings, etc.) whenever user-supplied data is used to generate the script. But with that caveat it has a clear place in interop situations and is far from "evil".
It's okay to use it if you have complete control over the code that's passed to the eval function.
Code generation. I recently wrote a library called Hyperbars which bridges the gap between virtual-dom and handlebars. It does this by parsing a handlebars template and converting it to hyperscript. The hyperscript is first generated as a string and, before being returned, is passed to eval() to turn it into executable code. I have found eval() in this particular situation to be the exact opposite of evil.
Basically from
<div>
{{#each names}}
<span>{{this}}</span>
{{/each}}
</div>
To this
(function (state) {
var Runtime = Hyperbars.Runtime;
var context = state;
return h('div', {}, [Runtime.each(context['names'], context, function (context, parent, options) {
return [h('span', {}, [options['#index'], context])]
})])
}.bind({}))
The performance of eval() isn't an issue in a situation like this either, because you only need to interpret the generated string once and can then reuse the executable output many times over.
You can see how the code generation was achieved if you're curious here.
There is no reason not to use eval() as long as you can be sure that the source of the code comes from you or the actual user. Even though he can manipulate what gets sent into the eval() function, that's not a security problem, because he is able to manipulate the source code of the web site and could therefore change the JavaScript code itself.
So... when should you not use eval()? eval() should be avoided only when there is a chance that a third party could change the code, for example by intercepting the connection between the client and your server (but if that is a problem, use HTTPS). You shouldn't use eval() to parse code that is written by others, as in a forum.
If it's really needed eval is not evil. But 99.9% of the uses of eval that I stumble across are not needed (not including setTimeout stuff).
For me the evil is not a performance or even a security issue (well, indirectly it's both). All such unnecessary uses of eval add to a maintenance hell. Refactoring tools are thrown off. Searching for code is hard. Unanticipated effects of those evals are legion.
My example of using eval: import.
How it's usually done.
var components = require('components');
var Button = components.Button;
var ComboBox = components.ComboBox;
var CheckBox = components.CheckBox;
...
// That quickly gets very boring
But with the help of eval and a little helper function it gets a much better look:
var components = require('components');
eval(importable('components', 'Button', 'ComboBox', 'CheckBox', ...));
importable might look like (this version doesn't support importing concrete members).
function importable(path) {
    var name;
    var pkg = eval(path);
    var result = '\n';
    for (name in pkg) {
        result += 'if (name !== undefined) throw "import error: name already exists";\n'.replace(/name/g, name);
    }
    for (name in pkg) {
        result += 'var name = path.name;\n'.replace(/name/g, name).replace('path', path);
    }
    return result;
}
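In environments that support ES2015, destructuring gives a similar one-line import without generating any code. The sketch below uses a stand-in object instead of require('components'), so componentsModule and its members are hypothetical.

```javascript
// Stand-in for require('components'); the object and its members are hypothetical.
const componentsModule = {
  Button: function Button() {},
  ComboBox: function ComboBox() {},
  CheckBox: function CheckBox() {},
};

// One statement binds all three names, with no generated code.
const { Button, ComboBox, CheckBox } = componentsModule;
```

Unlike the eval-based helper, this requires listing the names explicitly, which also keeps them visible to refactoring tools.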
I think any cases of eval being justified would be rare. You're more likely to use it thinking that it's justified than you are to use it when it's actually justified.
The security issues are the most well known. But also be aware that JavaScript uses JIT compilation and this works very poorly with eval. Eval is somewhat like a blackbox to the compiler, and JavaScript needs to be able to predict code ahead of time (to some extent) in order to safely and correctly apply performance optimisations and scoping. In some cases, the performance impact can even affect other code outside eval.
If you want to know more:
https://github.com/getify/You-Dont-Know-JS/blob/master/scope%20%26%20closures/ch2.md#eval
Only during testing, if possible. Also note that eval() is much slower than other specialized JSON etc. evaluators.
My belief is that eval is a very powerful function for client-side web applications, and safe... as safe as JavaScript, which is not. :-) The security issues are essentially a server-side problem because, now, with tools like Firebug, you can attack any JavaScript application.
When is JavaScript's eval() not evil?
I always try to discourage people from using eval. Almost always, a cleaner and more maintainable solution is available. Eval is not needed even for JSON parsing. Eval adds to maintenance hell. Not without reason, it is frowned upon by masters like Douglas Crockford.
But I found one example where it should be used:
When you need to pass the expression.
For example, I have a function that constructs a general google.maps.ImageMapType object for me, but I need to tell it the recipe, how should it construct the tile URL from the zoom and coord parameters:
my_func({
    name: "OSM",
    tileURLexpr: '"http://tile.openstreetmap.org/"+b+"/"+a.x+"/"+a.y+".png"',
    ...
});
function my_func(opts)
{
    return new google.maps.ImageMapType({
        getTileUrl: function (coord, zoom) {
            var b = zoom;
            var a = coord;
            return eval(opts.tileURLexpr);
        },
        ....
    });
}
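If the expression string is kept, one possible refinement is to compile it once instead of calling eval() for every tile. The sketch below assumes the same a (coordinate) and b (zoom) names as the snippet above. makeTileUrlFn is a hypothetical helper, and no Google Maps API is involved.

```javascript
// Hypothetical helper: compile the URL expression once, not on every tile request.
function makeTileUrlFn(tileURLexpr) {
  // 'a' is the tile coordinate and 'b' the zoom level, as in the snippet above.
  return new Function('a', 'b', 'return ' + tileURLexpr + ';');
}

const getOsmTileUrl = makeTileUrlFn(
  '"http://tile.openstreetmap.org/"+b+"/"+a.x+"/"+a.y+".png"'
);

const tileUrl = getOsmTileUrl({ x: 3, y: 7 }, 12);
// 'http://tile.openstreetmap.org/12/3/7.png'
```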
Eval is useful for code generation when you don't have macros.
For (a stupid) example, if you're writing a Brainfuck compiler, you'll probably want to construct a function that performs the sequence of instructions as a string, and eval it to return a function.
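A toy version of that idea might look like the following. It is a sketch only: it ignores the input instruction (,), uses a fixed 30000-cell tape, and builds the function with the Function constructor rather than eval(). The name compileBrainfuck is made up.

```javascript
// Translate each Brainfuck instruction into a JavaScript statement,
// then turn the whole generated string into a real function.
function compileBrainfuck(program) {
  const ops = {
    '>': 'p++;',
    '<': 'p--;',
    '+': 't[p]++;',
    '-': 't[p]--;',
    '.': 'out += String.fromCharCode(t[p]);',
    '[': 'while (t[p]) {',
    ']': '}',
  };
  // Uint8Array cells wrap around at 256 on their own.
  let body = 'var t = new Uint8Array(30000), p = 0, out = "";';
  for (const ch of program) {
    if (ops[ch]) body += ops[ch]; // every other character is treated as a comment
  }
  body += 'return out;';
  return new Function(body);
}

// 8 * 8 + 1 = 65, the character code of 'A'
const printA = compileBrainfuck('++++++++[>++++++++<-]>+.');
const brainfuckOutput = printA(); // 'A'
```

The generated function can be called repeatedly, so the translation cost is paid once per program.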
When you parse a JSON structure with a parse function (for example, jQuery.parseJSON), it expects perfectly formed JSON (each property name in double quotes). However, JavaScript is more flexible. Therefore, you can use eval() to get around that strictness.
Related
I want to parse users' expressions that evaluate to booleans using standard JavaScript, like:
var1 > obj1.prop1 && var2 + 1 <= 5
Since these expressions are written by the user, I want to be sure they are clean because they are going to be evaluated server-side with NodeJS.
Instead of having to parse the expression as text looking for patterns and reinvent the wheel, is there a way I can use the power of Node to directly evaluate the expression without the risk of code injection?
You might not like this answer. But you have to do work. There is no magic bullet.
Your question contradicts itself by requiring "standard javascript" and "without the risk of code injection". You cannot have both. Standard JavaScript allows expressions like 'require("fs").rmdirSync("/")'
The expression entered by the user has to be constrained to a seriously limited subset of JavaScript. The server must validate that the input is limited to this subset before attempting to evaluate it.
So first you need to think carefully about what limited subset is allowed. It looks like you want to allow constant integers like '5' and operators like '>', '&&' and '<='. You also allow access to variables like 'var1', 'obj1.prop1' and 'var2'. I'd imagine you need to be very specific about the list of allowed variables.
The key to preventing script injection is to define a subset that includes only things you know are safe. You should not try to start with the whole of JavaScript and exclude things you think are dangerous - because you will miss some.
Once you have carefully defined what the expressions may contain, you need to implement code to parse and validate expressions. You may find a library or standard code to do this, but you will have to modify or configure it to permit your specific requirements.
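As a rough sketch of the "define a safe subset, then validate" approach, the code below tokenizes an expression and accepts only numbers, a fixed list of names, and a handful of operators before evaluating it. The allowed names, the operator list, and the function names are all assumptions made for this example. It is an illustration of whitelisting, not a vetted sandbox.

```javascript
// Hypothetical whitelist: the only names an expression may mention.
const ALLOWED_NAMES = new Set(['var1', 'var2', 'obj1.prop1']);

// One token: a number, a (possibly dotted) name, or an allowed operator.
const TOKEN = /\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)|(&&|\|\||<=|>=|===|!==|[<>+\-()!]))/y;

function isAllowedExpression(source) {
  const text = source.trim();
  if (text === '') return false;
  let pos = 0;
  while (pos < text.length) {
    TOKEN.lastIndex = pos;
    const match = TOKEN.exec(text);
    if (match === null) return false; // character outside the subset
    if (match[2] !== undefined && !ALLOWED_NAMES.has(match[2])) return false; // unknown name
    pos = TOKEN.lastIndex;
  }
  return true;
}

function evaluateRestricted(source, scope) {
  if (!isAllowedExpression(source)) throw new Error('expression rejected');
  // Only reached for input made up entirely of whitelisted tokens.
  const fn = new Function('var1', 'var2', 'obj1', 'return (' + source + ');');
  return Boolean(fn(scope.var1, scope.var2, scope.obj1));
}

const restrictedResult = evaluateRestricted(
  'var1 > obj1.prop1 && var2 + 1 <= 5',
  { var1: 3, var2: 4, obj1: { prop1: 2 } }
);
```

Because the check starts from what is permitted, an input such as require("fs") fails on its first token instead of relying on a list of known-bad patterns.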
You can use the mathjs library, which comes with its own expression parser.
Website: http://mathjs.org/
In general, no matter what you finally use, I would launch a separate evaluation process that drops all privileges, communicate with it only over a Unix domain socket, and send it the code you want to evaluate.
Launch it as root, open the Unix domain socket, and then drop privileges to nobody.
process.setgid('nobody');
process.setuid('nobody');
One thing that you should avoid is to do something like this:
const root = global = {};
const require = function() {
    console.log('tried to call require', arguments);
}
eval("require('fs')");
This might appear to work at first glance, but, for example, ES6 introduced the import keyword, so even if you overwrite require, import could still be used to load a module.
Also the methods mentioned in Safely sandbox and execute user submitted JavaScript? like vm.runInContext('globalVar *= 2;', sandbox); would not help. The referenced sandcastle might be something you could look at, but even if you use a sandboxing library I would still suggest running it in an isolated, unprivileged process.
As James suggested in his answer, you should whitelist certain features instead of blacklisting harmful ones.
I have the following code at hand
var finalCompleteData = eval("("+jsonresponse.responseText+")");
When I used this, I received a security flaw error in Fortify saying that it might lead to Javascript Hacking. So, I changed it to
var finalCompleteData = window.json.parse(jsonresponse.responseText);
For this, Fortify did not show the error. What does the window.json.parse method do?
Can you please explain. Thanks in advance :-)
eval will execute any JavaScript code it is given, and that code runs with the full privileges of the page. This means that if your response text contains non-JSON content that is nevertheless valid JavaScript, eval will execute it. The sky is the limit here: it can add new functions, change variables, or redirect the page.
With window.json.parse only json will be evaluated, so the risk of rogue code getting entered is much much less.
eval is able to run any kind of javascript code - not just simple objects/arrays as JSON.parse would (it examines the contents - validating json). For this reason eval should be avoided in places where you cannot guarantee the input.
As others have mentioned, eval will execute any valid JavaScript code. Thus the following would cause an alert:
var jsObject = eval("alert('blah')");
You're essentially trusting any input from a given source, which is not safe in general. A malicious user could take advantage of the eval and execute harmful JavaScript.
JSON.parse, however, will only return successfully if the string passed in is valid JSON:
// gives "SyntaxError: JSON.parse"
var jsObject = JSON.parse("alert('blah')");
Thus it's not executing just anything it's given the way eval is.
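In practice the JSON.parse call is usually wrapped so that malformed or hostile input becomes an ordinary error value. The following is a minimal sketch, and parseJsonSafely is a hypothetical helper name.

```javascript
// Parse text as JSON and report failure as data instead of letting the exception escape.
function parseJsonSafely(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    // JSON.parse throws a SyntaxError for anything that is not valid JSON.
    return { ok: false, error: err.message };
  }
}

const goodResult = parseJsonSafely('{"name": "blah", "items": [1, 2]}');
const badResult = parseJsonSafely("alert('blah')");
// goodResult.ok is true; badResult.ok is false and nothing was executed
```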
Are there any Call-Graph and/or Control-Flow-Graph generators for JavaScript?
Call Graph - http://en.wikipedia.org/wiki/Call_graph
Control Flow Graph - http://en.wikipedia.org/wiki/Control_flow_graph
EDIT: I am looking specifically for a static tool that lets me access the graph using some API/code
To do this, you need:
parsing,
name resolution (handling scoping)
type analysis (while JavaScript is arguably "dynamically typed", there are all kinds of typed constants including function constants that are of specific interest here)
control flow analysis (to build up the structure of the control flow graphs within methods)
data flow analysis (to track where those types are generated/used)
what amounts to global points-to analysis (to track function constants passed between functions as values to a point of application).
How to do it is pretty well documented in the compiler literature. However, implementing this is a matter of considerable sweat, so answers of the form "you can use a parser result to get what you want" rather miss the point.
If you could apply all this machinery, what you'll get as a practical result is a conservative answer, e.g., "A may call B". This is all you know anyway, consider
void A(int x,y) { if (x>y) foo.B(); }
Because a tool sometimes simply can't reason about complex logic, you may get "A may call B" even when the application designer knows it isn't possible:
void A(int x) // programmer asserts x<4
{ if (x>5) foo.B(); }
eval makes the problem worse, because you need to track string value results that arrive at eval commands and parse them to get some kind of clue as to what code is being evaled, and which functions that eval'd code might call. Things get really nasty if somebody passes "eval" in a string to eval :-{ You also likely need to model the program execution context; I suspect there are lots of browser APIs that include callbacks.
If somebody offers you a tool that has all the necessary machinery completely configured to solve your problem out of the box, that would obviously be great. My suspicion is you won't get such an offer, because such a tool doesn't exist. The reason is all that infrastructure needed; it's hard to build and hardly anybody can justify it for just one tool. Even an "optimizing JavaScript compiler", if you can find one, likely won't have all this machinery, especially the global analysis, and what it does have is unlikely to be packaged in a form designed for easy consumption for your purpose.
I've been beating my head on this problem since I started programming in 1969 (some of my programs back then were compilers and I wanted all this stuff). The only way to get this is to amortize the cost of all this machinery across lots of tools.
My company offers the DMS Software Reengineering Toolkit, a package of generic compiler analysis and transformation machinery, with a variety of industrial-strength computer language front-ends (including C, C++, COBOL and yes, JavaScript). DMS offers APIs to enable custom tools to be constructed on its generic foundations.
The generic machinery listed at the top of the message is all present in DMS, including control flow graph and data flow analysis available through a clean documented API. That flow analysis has to be tied to specific language front ends. That takes some work too, and so we haven't done it for all languages yet. We have done this for C [tested on systems of 18,000 compilation units as a monolith, including computing the call graph for the 250,000 functions present, including indirect function calls!], COBOL and Java and we're working on C++.
DMS has the same "JavaScript" parser answer as other answers in this thread, and viewed from just that perspective DMS isn't any better than the other answers that say "build it on top of a parser". The distinction should be clear: the machinery is already present in DMS, so the work is not one of implementing the machinery and tying it to the parser; it is just tying it to the parser. This is still a bit of work, but a hell of a lot less than if you just start with a parser.
In general it isn't possible to do this. The reason is that functions are first-class and dynamically typed, so for example:
var xs = some_function();
var each = another_function();
xs.map(each);
There are two unknowns. One is the version of 'map' that is called (since JavaScript polymorphism can't be resolved statically in the general case), and the other is the value assigned to 'each', which also can't be statically resolved. The only static properties this code has are that some 'map' method is called with some value we got from 'another_function'.
If, however, that is enough information, there are two resources that might be helpful. One is a general-purpose Javascript parser, especially built using parser combinators (Chris Double's jsparse is a good one). This will let you annotate the parse tree as it is being constructed, and you can add a custom rule to invocation nodes to record graph edges.
The other tool that you might find useful (shameless plug) is a Javascript-to-Javascript compiler I wrote called Caterwaul. It lets you do pattern-matching against syntax trees and knows how to walk over them, which might be useful in your case. It could also help if you wanted to build a dynamic trace from short-term execution (probably your best bet if you want an accurate and detailed result).
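The dynamic-trace idea can also be sketched in plain JavaScript, without any library. This is only a minimal sketch, assuming the functions of interest are properties of a single object and call each other through `this`; `traceCalls`, `app` and the edge format are hypothetical names made up for illustration, not part of Caterwaul or jsparse.

```javascript
// Wrap every function-valued property of `target` so that each call
// records a "caller -> callee" edge. Hypothetical helper, illustration only.
function traceCalls(target) {
  var edges = [];          // recorded edges, e.g. "main -> helper"
  var stack = ["(top)"];   // names of the wrapped functions currently running

  Object.keys(target).forEach(function (name) {
    var original = target[name];
    if (typeof original !== "function") return;

    target[name] = function () {
      edges.push(stack[stack.length - 1] + " -> " + name);
      stack.push(name);
      try {
        return original.apply(this, arguments);
      } finally {
        stack.pop();       // restore the caller even if `original` throws
      }
    };
  });

  return edges;
}

// Made-up object under test.
var app = {
  main: function () { return this.helper(2) + this.helper(3); },
  helper: function (n) { return this.square(n); },
  square: function (n) { return n * n; }
};

var edges = traceCalls(app);
var result = app.main();
```

After the call, `edges` holds one entry per call actually made during that run, so the graph is only as complete as the actions you exercised.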
WALA is an open-source program analysis framework that can build static call graphs and control-flow graphs for JavaScript:
http://wala.sourceforge.net/wiki/index.php/Main_Page
One caveat is that the call graphs may be missing some edges in the presence of eval, with, and other hard-to-analyze constructs. Also, we're still working on scalability; WALA can't yet analyze jquery in a reasonable amount of time, but some other frameworks can be analyzed. Also, our documentation for building JavaScript call graphs isn't great at the moment (improving it is on my TODO list).
We're actively working on this code, so if you try it and run into issues, you can email the WALA mailing list (https://lists.sourceforge.net/lists/listinfo/wala-wala) or contact me.
I think http://doctorjs.org/ may fit your needs. It has a nice JSON API, is available on GitHub, and is backed by Mozilla. It's written in JS itself and generally does stuff pretty well (including dealing with polymorphism etc).
Here are a few solutions I can see:
Use Aptana Call Graph view
Aptana is an IDE based on Eclipse that lets you edit and debug JavaScript code.
Use Dynatrace
Dynatrace is a useful tool that lets you trace your code live and see the call graph and hot spots.
Use Firebug
The famous developer add-on for Firefox.
Most of the call graphs generated here will be dynamic, i.e. you'll see the call graph for a given set of actions. If you're looking for static call graphs, check Aptana first. Static call graphs may not let you see dynamic calls (code running through eval()).
For a js approach, check out arguments.callee.caller. It gives you the function that called the function you are in and you can recurse up the call stack. There is an example in this thread http://bytes.com/topic/javascript/answers/470251-recursive-functions-arguments-callee-caller.
Be aware that this may not work in all browsers, and you may run into some unexpected things when you get to the top of the "call stack" or reach native functions, so use at your own risk.
My own example works in IE9 and Chrome 10 (didn't test any other browsers).
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
<title></title>
</head>
<body onload="Function1()">
<form id="form1" runat="server">
<div>
</div>
</form>
</body>
<script type="text/javascript">
function Function1()
{
Function2();
}
function Function2()
{
Function3();
}
function Function3()
{
Function4();
}
function Function4()
{
var caller = arguments.callee.caller;
var stack = [];
while (caller != null)
{
stack.push(caller);//this is the text of the function. You can probably write some code to parse out the name and parameters.
var args = caller.arguments; //this is the arguments for that function. You can get the actual values here and do something with them if you want.
caller = caller.caller;
}
alert(stack);
}
</script>
</html>
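One caveat with the example above: arguments.callee throws a TypeError in strict-mode code. A possible alternative, shown here only as a sketch, is to read the `stack` property of a freshly created Error. That property is non-standard (though widely implemented) and its line format differs between engines; `captureStack`, `outer` and `inner` are hypothetical names.

```javascript
// Returns the current call stack as an array of text lines.
// Relies on the non-standard `stack` property of Error objects;
// the exact line format is engine-specific.
function captureStack() {
  "use strict";                 // arguments.callee would throw in here
  var raw = new Error().stack;
  if (typeof raw !== "string") {
    return [];                  // this engine does not expose a stack string
  }
  return raw.split("\n")
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line.length > 0; });
}

function outer() { return inner(); }
function inner() { return captureStack(); }

var frames = outer();           // lines mentioning captureStack, inner, outer, ...
```

Unlike the caller chain, this gives you text rather than function objects, so you cannot read the callers' arguments from it.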
The closest thing you can get to a Call Graph is manipulating a full Javascript AST. This is possible with Rhino; take a look at this article: http://tagneto.blogspot.com/2010/03/requirejs-kicking-some-ast.html
Example from the post:
//Set up shortcut to long Java package name,
//and create a Compiler instance.
var jscomp = Packages.com.google.javascript.jscomp,
compiler = new jscomp.Compiler(),
//The parse method returns an AST.
//astRoot is a kind of Node for the AST.
//Comments are not present as nodes in the AST.
astRoot = compiler.parse(jsSourceFile),
node = astRoot.getChildAtIndex(0);
//Use Node methods to get child nodes, and their types.
if (node.getChildAtIndex(1).getFirstChild().getType() === CALL) {
//Convert this call node and its children to JS source.
//This generated source does not have comments and
//may not be space-formatted exactly the same as the input
//source
var codeBuilder = new jscomp.Compiler.CodeBuilder();
compiler.toSource(codeBuilder, 1, node);
//Return the JavaScript source.
//Need to use String() to convert the Java String
//to a JavaScript String.
return String(codeBuilder.toString());
}
In either Javascript or Java, you could walk the AST to build whatever type of call graph or dependency chain you'd like.
Not related directly to NodeJS, but generally to JavaScript, SAP has released a Web IDE related to HANA (but also accessible freely from the HANA Cloud - see more details here http://scn.sap.com/community/developer-center/cloud-platform/blog/2014/04/15/sap-hana-web-ide-online-tutorial).
In this Web IDE, there is a REST-based service that analyzes JavaScript content with primary focus (but not only) on creating a Call Graph. There are many consumers of that service, like Code Navigation.
Some more information here (see the Function Flow part):
http://scn.sap.com/community/developer-center/hana/blog/2014/12/02/sap-hana-sps-09-new-developer-features-sap-hana-web-based-development-workbench
Note: I am the main developer of this service.
I've been arguing for some time against embedding server-side tags in JavaScript code, but was put on the spot today by a developer who seemed unconvinced.
The code in question was a legacy ASP application, although this is largely unimportant as it could equally apply to ASP.NET or PHP (for example).
The example in question revolved around the use of a constant that they had defined in ServerSide code.
'VB
Const MY_CONST = 1
If sMyVbVar = MY_CONST Then
'Do Something
End If
//JavaScript
if (sMyJsVar === "<%= MY_CONST%>"){
//DoSomething
}
My standard arguments against this are:
1. Script injection: The server-side tag could include code that can break the JavaScript code
2. Unit testing: Harder to isolate units of code for testing
3. Code separation: We should keep web page technologies apart as much as possible.
The reason for doing this was so that the developer did not have to define the constant in two places. They reasoned that as it was a value that they controlled, that it wasn't subject to script injection. This reduced my justification for (1) to "We're trying to keep the standards simple, and defining exception cases would confuse people"
The unit testing and code separation arguments did not hold water either, as the page itself was a horrible amalgam of HTML, JavaScript, ASP.NET, CSS, XML... you name it, it was there. No code that was ever going to be included in this page could possibly be unit tested.
So I found myself feeling like a bit of a pedant insisting that the code was changed, given the circumstances.
Are there any further arguments that might support my reasoning, or am I, in fact, being a bit pedantic in this insistence?
Script injection: The server-side tag could include code that can break the JavaScript code
So write the code properly and make sure that values are correctly escaped when introduced into the JavaScript context. If your framework doesn't include a JavaScript "quoter" tool (hint: the JSON support is probably all you need), write one.
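A "quoter" built on JSON support could look roughly like the following. It is written in JavaScript (as it might run on a server-side JavaScript platform) purely for illustration; the same idea would have to be expressed in whatever language your server pages use. `toScriptLiteral` is a hypothetical name, and the extra replacements are an assumption about what is needed for inline `<script>` blocks, not a complete XSS defence.

```javascript
// Serialize a value so it can be dropped into an inline <script> block.
// JSON.stringify takes care of quotes, backslashes and control characters;
// the extra replacements keep "</script>" and the U+2028/U+2029 line
// separators from ending the script or the string literal early.
function toScriptLiteral(value) {
  return JSON.stringify(value)
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

// Made-up hostile input for illustration.
var nasty = "Bob's \"Shop\" </script><script>alert(1)";
var literal = toScriptLiteral(nasty);
```

The result already includes its own surrounding quotes, so the page template would emit it as-is rather than wrapping it in quotes again.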
Unit testing. Harder to isolate units of code for testing
This is a good point, but if it's necessary for the server to drop things into the page for code to use, then it's necessary. I mean, there are times when this simply has to be done. A good way to do it is for the page to contain some sort of minimal block of data. Thus the server-munged JavaScript on the page really isn't "code" to be tested, it's just data. The real client code included from .js files can find the data and use it.
Thus, the page may contain:
<script>
(function(window) {
window['pageData'] = {
companyName: '<%= company.name %>',
// etc
};
})(this);
</script>
Now your nicely-encapsulated pure JavaScript code in ".js" files just has to check for window.pageData, and it's good to go.
Code Separation : We should keep web page technologies apart as much as possible.
Agreed, but it's simply a fact that sometimes server-side data needs to drive client-side behavior. To create hidden DOM nodes solely for the purpose of storing data and satisfying your rules is itself a pretty ugly practice.
Coding rules and aesthetics are Good Things. However, one should be pragmatic and take everything in perspective. It's important to remember that the context of such rules is not always a Perfect Divine Creation, and in the case of HTML, CSS, and JavaScript I think that fact is glaringly clear. In such an imperfect environment, hard-line rules can force you into unnecessary work and code that's actually harder to maintain.
edit — oh here's something else I just thought of; sort-of a compromise. A "trick" popularized (in part) by the jQuery gang with their "micro template" facility (apologies to the web genius who actually hit upon this first) is to use <script> tags that are sort-of "neutered":
<script id='pageData' type='text/plain'>
{
"companyName": "<%= company.name %>",
"accountType": "<%= user.primaryAccount.type %>"
}
</script>
Now the browser itself will not even execute that script - the "type" attribute isn't something it understands as being code, so it just ignores it. However, browsers do make the content of such scripts available, so your code can find the script by "id" value and then, via some safe JSON library or a native browser API if available, parse the notation and extract what it needs. The values still have to be properly quoted etc, but you're somewhat safer from XSS holes because it's being parsed as JSON and not as "live" full-blown JavaScript.
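Reading such a block back could be sketched as follows, assuming a native JSON.parse is available. Note that JSON.parse is strict: keys and strings must use double quotes, and comments or trailing commas are rejected. `readPageData` and `fakeDocument` are hypothetical names; the document is passed in as a parameter only so the sketch can be tried outside a browser.

```javascript
// Read and parse the JSON held in a non-executing <script> element.
// `doc` is normally the global `document`.
function readPageData(doc, id) {
  var el = doc.getElementById(id);
  if (!el) {
    return null;                 // no such element on this page
  }
  try {
    return JSON.parse(el.textContent);
  } catch (e) {
    return null;                 // malformed JSON: treated as "no data" here
  }
}

// Stand-in for a real document, for illustration only.
var fakeDocument = {
  getElementById: function (id) {
    return id === "pageData"
      ? { textContent: '{"companyName": "Acme", "accountType": "basic"}' }
      : null;
  }
};

var pageData = readPageData(fakeDocument, "pageData");
```

In a page you would call `readPageData(document, 'pageData')` from your ".js" file once the element has been parsed.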
The reason for doing this was so that the developer did not have to define the constant in two places.
To me, this is a better argument than any argument you can make against it. It is the DRY principle. And it greatly enhances code maintainability.
Every style guide/rule taken to extreme leads to an anti-pattern. In this case your insistence on separation of technologies breaks the DRY principle and can potentially make code harder to maintain. Even DRY itself, if taken to extreme, can lead to an anti-pattern: softcoding.
Code maintainability is a fine balance. Style guides are there to help maintain that balance. But you have to know when those very guides help and when they themselves become a problem.
Note that in the example you have given, the code would not break syntax highlighting or parsing (even Stack Overflow highlights it correctly), so the IDE argument would not work, since the IDE can still parse that code correctly.
It simply gets unreadable. You have to take a closer look to tell the different languages apart. If JavaScript and the mixed-in language use the same variable names, things get even worse. This is especially hard for people who have to look at other people's code.
Many IDEs have problems with syntax highlighting of heavily mixed documents, which can lead to the loss of Auto-Completion, proper Syntax Highlighting and so on.
It makes the code less re-usable. Think of a JavaScript function that does a common task, like echoing an array of things. If you separate the JavaScript-logic from the data it's iterating over, you can use the same function all over your application, and changes to this function have to be done only once. If the data it's iterating over is mixed with the JavaScript output loop you probably end up repeating the JavaScript code just because the mixed in language has an additional if-statement before each loop.
I'm currently developing a tutorial site for teaching the fundamentals of Web development (HTML, CSS, and JavaScript, for starters). I'd like a setup where I could give in-depth coverage of all sorts of topics and then provide a basic sandbox environment where the user could write code which solves the question asked at the end of each tutorial section.
For example, if I'd covered multiplication in a previous tutorial, and the user had just finished a lesson on functions being capable of returning values, I might request that they submit a function which returns the product of two parameters.
Is this not the perfect instance in which using dynamic function creation would be considered a good idea? Let's look at an example.
<script>
function check()
{
eval('var f = ' + document.getElementById('user_code').value);
if (f(5, 10) == 50)
{
// user properly wrote a function which
// returned the product of its parameters
}
}
</script>
Is this at all a bad idea? If so, please explain.
This sounds like it could work. However, the biggest challenge in your environment might be error handling. Students will surely make all sorts of errors:
Compile time errors, that will be detected in eval()
Run time errors, that will be detected when you call the function
Undetectable run time errors, such as an infinite loop or a stack overflow
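The first two kinds of error can be told apart by compiling and calling in separate steps. The following is a rough sketch under the assumption that the student submits a single function expression; `tryStudentCode` and the `stage` labels are made up for illustration. It does nothing about the third kind (an infinite loop would still hang the page).

```javascript
// Compile the student's source, then call it, reporting where it failed.
// Returns { stage: "compile" | "run", error } or { stage: "ok", value }.
function tryStudentCode(source, args) {
  var factory;
  try {
    // Syntax errors surface here, before any student code runs.
    factory = new Function("return (" + source + ");");
  } catch (e) {
    return { stage: "compile", error: e };
  }
  try {
    var fn = factory();          // evaluate the function expression
    return { stage: "ok", value: fn.apply(null, args) };
  } catch (e) {
    return { stage: "run", error: e };
  }
}

var good = tryStudentCode("function (a, b) { return a * b; }", [5, 10]);
var badSyntax = tryStudentCode("function (a, b) { return a * ; }", [5, 10]);
var badRuntime = tryStudentCode(
  "function (a, b) { return notDefinedAnywhere * b; }", [5, 10]);
```

Knowing the stage lets the tutorial show a different hint for "your code does not parse" than for "your code crashed when called".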
A more elaborate approach might parse the entered Javascript into a parse tree representation, then compare it to an expected parse tree. If it does not match, then point out what might be wrong and have the student try again. If it does match, then you can eval() and call the function, knowing that it will do what you expect.
Implementing a lexer and parser for Javascript in Javascript would be challenging but certainly not impossible.
Should work as long as you're operating this in a closed environment. Eval opens you up to code injection attacks, so I wouldn't put this on a publicly accessible web site, but if it's completely contained within your classroom you should be OK.
The code would work, but what if there is an error, syntactic or otherwise? Wrapping the call in a try block to catch any error and display it to the user would help things a little...
Not sure if this helps.
Sounds like you want to remake Firebug or even the new Developer Tools in IE8. Due to that, I'm going to have to say there is never a useful case. Not to mention the possibilities of script injection if this site goes public.
In your case, I feel that there is nothing wrong with this. Alternatively you can run the code by using new Function() to build stuff first and then run it. In theory, this would separate the stages of "compiling" and executing. However eval will check the code first and throw errors anyway:
var usercode = document.getElementById('user_code').value;
try {
var f = new Function( 'a','b','return (' + usercode + ')(a,b);' );
if ( f( 5, 10 ) === 50 ) {
// user properly wrote a function which
// returned the product of its parameters
}
else {
// user wrote code that ran but produced incorrect results
}
}
catch ( ex ) {
// user wrote something really bad
}
The problem with doing things in this manner is that the exceptions thrown may be nonsensical. "foo;;=bar" will report a "missing ) in parenthetical" error, while eval will throw a proper syntax error. You could bypass this by (regexp) grabbing the parameters and body from the user code first and then building it. But then, how would this be any better than an eval?
I think that your real problem will be helping users avoid the pitfalls of implicit globals. How are you going to help users avoid writing code that only works the second time it runs because a global was set the first time? Will you not need to implement a clean sandbox every run? I would take a look at how jsbin.com, firebug and similar tools handle these things.
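One partial measure for the implicit-globals problem is to compile the student's code in strict mode, where assigning to an undeclared variable throws a ReferenceError instead of quietly creating a global. This is a sketch only: `compileStrict` is a hypothetical helper, it assumes no global called `product` already exists, and it is in no way a full sandbox.

```javascript
// Build the checker in strict mode so that assigning to an undeclared
// variable throws instead of silently creating a global.
function compileStrict(usercode) {
  return new Function("a", "b",
    '"use strict"; return (' + usercode + ")(a, b);");
}

// Made-up student submissions: the first forgets `var`.
var leaky = "function (a, b) { product = a * b; return product; }";
var clean = "function (a, b) { var product = a * b; return product; }";

var leakyError = null;
try {
  compileStrict(leaky)(5, 10);
} catch (ex) {
  leakyError = ex;               // strict mode rejects the implicit global
}

var cleanResult = compileStrict(clean)(5, 10);
```

The student's function is nested inside the strict wrapper, so it inherits strict mode without the student having to write the directive.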
My feeling is that you should go with eval for now and change it for more elaborate stuff later if the need arises.