Safely parsing and evaluating user input


I'm working on a project that's essentially a templating domain-specific language. In my project, I accept lines of user input in the following form:
'{{index(1, 5)}}'
'{{firstName()}} X. {{lastName()}}'
'{{floating(-0.5, 0.5)}}'
'{{text(5, "words")}}'
Any command between double curly braces ({{ }}) has a corresponding JavaScript function that should be called when that command is encountered (for example, function index(min, max) {...} in the case of the first one).
I'm having a difficult time figuring out how to safely accept the input and call the appropriate function. I know that the way I'm doing it now isn't safe. I simply eval() anything between two sets of curly braces.
How can I parse these input strings such that I can flexibly match a function call between curly braces and execute that function with any parameters given, while still not blindly calling eval() with the code?
I've considered making a mapping (if command is index(), call function index() {}), but this doesn't seem very flexible; how do I collect and pass any parameters (e.g. {{index(2, 5)}}) if any are present?
This is written in Node.js.

This problem breaks down into:
1. Parsing the string
2. Evaluating the resulting function graph
3. Dispatching to each function (as part of #2 above)
Parsing the string
Unfortunately, with the requirements you have, parsing the {{...}} string is quite complex. You have at least these issues to deal with:
1. Functions can be nested: {{function1(function2(), 2, 3)}}.
2. Strings can contain (escaped) quotes, and can contain commas, so even without requirement #1 above the trivial approach to finding the discrete arguments (splitting on a comma) won't work.
So...you need a proper parser. You could try to cobble one together ad hoc, but this is where parser generators come into the picture, like PEG.js or Jison (those are just examples, not necessarily recommendations — I did happen to notice one of the Jison examples is a JSON parser, which would be about half the battle). Writing a parser is out of scope for answering a question on SO I'm afraid. :-)
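To give a feel for what such a parser has to do, here is a minimal hand-written sketch rather than a generated one. It assumes the text between {{ and }} has already been extracted, that arguments are only numbers, double-quoted strings, or nested calls, and that the node shapes ({ type: 'call' } and { type: 'literal' }) and the name parseExpression are hypothetical choices for illustration. A real grammar would likely need more (booleans, single quotes, better error messages).

```javascript
// Hypothetical node shapes: { type: 'call', name, args } and { type: 'literal', value }.
function parseExpression(src) {
  let pos = 0;

  function skipSpaces() {
    while (pos < src.length && /\s/.test(src[pos])) pos++;
  }

  function parseString() {
    // Scan to the closing quote, stepping over backslash escapes.
    const start = pos;
    pos++; // opening quote
    while (pos < src.length && src[pos] !== '"') {
      if (src[pos] === '\\') pos++;
      pos++;
    }
    if (pos >= src.length) throw new SyntaxError('Unterminated string');
    pos++; // closing quote
    // JSON.parse only ever sees a quoted string here and decodes its escapes.
    return { type: 'literal', value: JSON.parse(src.slice(start, pos)) };
  }

  function parseValue() {
    skipSpaces();
    if (src[pos] === '"') return parseString();

    const rest = src.slice(pos);
    const num = /^-?\d+(\.\d+)?/.exec(rest);
    if (num) {
      pos += num[0].length;
      return { type: 'literal', value: Number(num[0]) };
    }

    const ident = /^[A-Za-z_]\w*/.exec(rest);
    if (!ident) throw new SyntaxError('Unexpected input at position ' + pos);
    pos += ident[0].length;
    skipSpaces();
    if (src[pos] !== '(') throw new SyntaxError('Expected "(" at position ' + pos);
    pos++;

    const args = [];
    skipSpaces();
    if (src[pos] === ')') {
      pos++;
      return { type: 'call', name: ident[0], args: args };
    }
    while (true) {
      args.push(parseValue()); // recursion handles nested calls
      skipSpaces();
      if (src[pos] === ',') { pos++; continue; }
      if (src[pos] === ')') { pos++; break; }
      throw new SyntaxError('Expected "," or ")" at position ' + pos);
    }
    return { type: 'call', name: ident[0], args: args };
  }

  const node = parseValue();
  skipSpaces();
  if (pos !== src.length) throw new SyntaxError('Unexpected trailing input at position ' + pos);
  return node;
}

console.log(JSON.stringify(parseExpression('function1(function2(), 2, 3)')));
```

Nothing here executes user input; the parser only produces data, which is the property that makes it safer than eval().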
Evaluating the resulting function graph
Depending on what tool you use, your parser generator may handle this for you. (I'm pretty sure PEG.js and Jison both would, for instance.)
If not, then after parsing you'll presumably end up with an object graph of some sort, which gives you the functions and their arguments (which might be functions with arguments...which might be...).
functionA
    1
    "two"
    functionB
        "a"
        functionC
            42
    functionD
        27
functionA there has four arguments, the third of which is functionB with two arguments, and so on.
Your next task, then, is to evaluate those functions deepest first (and at the same depth, left-to-right) and replace them in the relevant arguments list with their result, so you'll need a depth-first traversal algorithm. By deepest first and left-to-right (top-to-bottom in the bullet list above) I mean that in the list above, you have to call functionC first, then functionB, then functionD, and finally functionA.
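The traversal described above can be sketched as a short recursive function. This assumes a hypothetical node shape ({ type: 'call', name, args } or { type: 'literal', value }) and a dispatch callback that performs the actual call; neither comes from any particular parser tool.

```javascript
// Evaluate arguments first (left to right), then hand the call to `dispatch`.
function evaluate(node, dispatch) {
  if (node.type === 'literal') return node.value;
  const args = node.args.map(function (arg) { return evaluate(arg, dispatch); });
  return dispatch(node.name, args);
}

// Small helpers to build the example tree from the list above.
const lit = function (value) { return { type: 'literal', value: value }; };
const call = function (name, args) { return { type: 'call', name: name, args: args }; };

const tree = call('functionA', [
  lit(1),
  lit('two'),
  call('functionB', [lit('a'), call('functionC', [lit(42)])]),
  call('functionD', [lit(27)])
]);

// A stand-in dispatcher that records the order in which calls happen.
const callOrder = [];
const result = evaluate(tree, function (name, args) {
  callOrder.push(name);
  return name + '(' + args.join(', ') + ')';
});

console.log(callOrder.join(' -> '));
// functionC -> functionB -> functionD -> functionA
```

Because each argument is fully evaluated before its parent is dispatched, the deepest-first, left-to-right order falls out of the recursion without any explicit stack.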
Dispatching to each function
Depending again on the tool you use, it may handle this bit too. Again I suspect PEG.js does, and I wouldn't be surprised if Jison did as well.
At the point where you're ready to call a function that (no longer) has function calls as arguments, you'll presumably have the function name and an array of arguments. Assuming you store your functions in a map:
var functions = {
index: function() { /* ... */ },
firstName: function() { /* ... */ },
// ...
};
...calling them is the easy bit:
functionResult = functions[functionName].apply(undefined, functionArguments);
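One caveat worth sketching: a plain object also inherits names such as constructor and toString, so a user-supplied name should be checked before it is looked up. The helper name callByName and the example table below are hypothetical.

```javascript
// Example table; the function bodies are placeholders for illustration.
const exampleFunctions = {
  add: function (a, b) { return a + b; },
  firstName: function () { return 'Ada'; }
};

// Only call names that are the table's own properties and really are functions.
function callByName(table, name, args) {
  const isOwn = Object.prototype.hasOwnProperty.call(table, name);
  if (!isOwn || typeof table[name] !== 'function') {
    throw new Error('Unknown template function: ' + name);
  }
  return table[name].apply(undefined, args);
}

console.log(callByName(exampleFunctions, 'add', [1, 2])); // 3
```

Without the own-property check, a template such as {{constructor()}} would resolve to the inherited Object constructor instead of being rejected.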
I'm sorry not to be able to say "Just do X, and you're there," but it really isn't a trivial problem. I would throw tools at it, I wouldn't invent this wheel myself.

If possible do not evaluate the user input.
If you need to evaluate it, evaluate it in controlled scope and environment.
The last one means instead of using eval() use new Function() or specially designed libraries like https://github.com/dtao/lemming.js
See http://www.2ality.com/2014/01/eval.html for more information about eval vs new Function()
For more sophisticated approach try creating your own parser, check https://stackoverflow.com/a/2630085/481422
Search for comment // ECMAScript parser in https://github.com/douglascrockford/JSLint/blob/master/jslint.js

You could try something like this:
Assuming you have an input string like this:
'{{floating(-0.5, 0.5)}}'
And all your actual functions are referenced in an object, like this:
var myFunctions = {
'index': function(){/* Do stuff */},
'firstName': function(){}
}
Then, this should work:
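function parse(input) {
// Strip the delimiters, then split the function name from the argument list
var temp = input.replace('{{', '').replace(')}}', '').split('('),
fn = temp[0],
args = temp[1].split(','); // not named "arguments", which is reserved inside functions
return myFunctions[fn].apply(this, args);
}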
Please note that this only works for simple function calls that don't have functions nested as their arguments. It also passes all arguments as strings, instead of the types that may be intended (Numbers, booleans, etc).
If you want to handle more complex strings, you'll need to use a proper parser or template engine, as @T.J. Crowder suggested in the comments.
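For the simple, non-nested case, one way to get typed arguments and survive commas inside strings is to let JSON.parse read the argument list as a JSON array. This is only a sketch under assumptions: the name parseSimpleCall is hypothetical, strings must be double-quoted, and nested calls are still not supported.

```javascript
// Parse a single '{{name(arg, ...)}}' command into a name and typed arguments.
function parseSimpleCall(input) {
  const match = /^\{\{\s*([A-Za-z_]\w*)\((.*)\)\s*\}\}$/.exec(input);
  if (!match) throw new SyntaxError('Not a simple call: ' + input);
  // Wrapping the argument text in [ ] makes it a JSON array, so numbers,
  // booleans and double-quoted strings come back with their intended types.
  // JSON.parse throws on anything else, including nested calls.
  const args = JSON.parse('[' + match[2] + ']');
  return { name: match[1], args: args };
}

console.log(parseSimpleCall('{{floating(-0.5, 0.5)}}'));
console.log(parseSimpleCall('{{text(5, "a, b")}}'));
```

The returned name should still be checked against the function map before anything is called.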

Related

passing single argument into a function that requires multiple arguments in javascript

I'm trying to read through some source code on the internet, and I'm getting confused because the author defined a function as:
var _0x80a1 = function (x, a) {...}
But then only calls it using statements like this:
_0x80a1("0x0")
How does that work?

JavaScript parameters are optional; you don't need to pass them. So you can do something like this:
function multiply(a, b) {
if(typeof b === 'undefined') {
b = 10;
}
return a * b;
}
console.log(multiply(5));
// expected output: 50
In newer versions of JS you can also do default parameters like this:
function multiply(a, b = 10) {
return a * b;
}
console.log(multiply(5));
// expected output: 50

No function "requires" an argument in JavaScript. It's not a strongly typed language.
I might be talking nonsense here, but I think a function's arguments are syntactic sugar in JS. You can always pass any number of arguments, regardless of the function's "signature", because the only thing that identifies a function in JS is its name (and the object on which it is called). That is why the arguments object exists.
So, as others pointed out, the second, third, or any other argument that wasn't given will simply be undefined.
An answer on this subject with examples
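The behaviour described above is easy to check directly. In this small sketch the function name inspectArgs is made up for illustration; length and arguments are standard language features.

```javascript
// Declares two parameters, but can be called with any number of arguments.
function inspectArgs(x, a) {
  return {
    x: x,
    a: a,                         // undefined when the caller omits it
    received: arguments.length,   // how many arguments were actually passed
    declared: inspectArgs.length  // how many parameters the function declares
  };
}

console.log(inspectArgs('0x0'));   // a is undefined, received is 1, declared is 2
console.log(inspectArgs(1, 2, 3)); // the third argument is only visible via `arguments`
```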

In JavaScript, function parameters are optional. If you're not making use of 'a' inside your function, then the JS engine doesn't care that it was omitted. If you are making use of 'a' inside your function, it will simply have the value undefined, which can lead to errors such as a TypeError when you try to read a property of it.
function f(x, a = 10) {
}
You can set default parameters like this. Now even if you're passing only one parameter to your function, it will run without any errors.

I was curious so tried to understand this a bit so I could try to answer.
The variable _0x80a1 is a literal bit definition (https://www.hexadecimaldictionary.com/hexadecimal/0x80A1/) representing e.g. 32929 in decimal.
I'm guessing JS internally numbers all functions when it's run. This leaves an entire integer (32766) 'vanilla' functions that can be compiled before using a literal as a function name might cause a conflict.
So the 'x' in the function def. looks like it's passing a string, but it might be just calling 'Function #0' in the _0x80a1 var/function. This would make sense if the function contains multiple 'sub functions', as then the 'a' variable can be an object collection (e.g. parameters), that can be passed to the sub-function.
Roughly, I think. I've not used JS for a while and just thought I'd try to help answer! ;-) Essentially a compact way to make a toolkit you can copy between projects, and know your references will all work as expected to your tools, without disrupting e.g. jQuery or other scripts. (Wouldn't be surprised if this is how JS is minified actually ;)).
Chris

replace js function keyword with f

I want to know if it's possible to do something like:
var f = function;
and then use f as if it were the js function keyword.
If something like this were possible I would use it for js code minimization, because I have a lot of functions in my code.

Similar to what Pointy pointed out in the comments, what you can do is use the function constructor to create a new function and pass it as string. In other words:
function f(s) { return new Function(s) };
var foo = f('var s = "hello"; return s');
alert(foo()); //=> "hello"
But again, I think this is unnecessary.
Edit: To add parameters you'd use the arguments object.
function f() {
var args = Array.prototype.slice.call(arguments);
var func = args.pop();
return new Function(args.join(','), func);
}
var add = f('a,b', 'return a + b');

It is not possible to do as you describe.
The closest I can think of is to make a function for generating functions, using eval to parse a passed string. Using eval however, is generally evil and should be avoided. Overall, I would not recommend doing something like this at all.
Also, it is worth noting that there is very little point in doing what you want, as JavaScript sent over the wire should be compressed first, so the word function will be represented by one token, regardless of how long the word is.
If you want your source code to be more readable and concise, I would recommend coffee-script which compiles to javascript, and allows you to define functions without using even the first letter of function. Chuck Norris would approve!
Summary of conclusions and recommendations
use coffee-script for cruft free source
compile it to verbose javascript you never look at
minify that javascript (uglify is a great compressor, or google's closure)
gzip it (will compress repetitive strings well)
send it
The key issue here is that after it is gzipped, the size of the word function becomes irrelevant, so the only remaining issue is the readability of your source code.

Yeah...to be quite honest, I wouldn't worry about trying to shorten the function keyword to 'f', which gives you back very little for the work and would also hurt the readability of your code. And, unless you have something like a million functions, I wouldn't focus on that. As @prodigitalson said, run it through a minifier. That should take care of the real minifications that are possible in your code and still maintain readability in your code base (which is more important than saving a few bytes).
Also, per my comment, there is something (potentially) on its way to replace the long 'function' keyword (something Brendan Eich has said he would have considered changing; you have to remember that he designed the language in 10 days or so). Read more about it here: Fat Arrow Syntax. Of course, these sorts of things can always change, but the standards definition bodies are looking at it.

You could do this with the hygienic macro functionality of http://sweetjs.org/

Amount of passed parameters to functions in JavaScript

JavaScript is a revelation to me. I thought it would be just another classical language like C#, Java, etc. But it isn't. The "dynamic world" is tough and unpredictable. I was astonished when I read that functions can receive as many parameters as you desire. Without any error! I don't like it at all. I want more "staticness", I want some sort of compile-time errors!
My question is: do I need to worry about that? Is it good practice to throw an exception if more parameters are passed than a particular function expects?
function foo(one, two, three)
{
// Is it good?
if(arguments.length > arguments.callee.length)
throw new Error("Wrong quantity of arguments in " + arguments.callee.name + "()");
/* Stuff */
}
foo(1, 2, 3, 4); // -> Error
foo(1, 2, 3); // -> OK
Should I be concerned about it at all?
Thanks in advance!

You probably should not be concerned. There is no blanket rule on how to handle errors like this. It depends entirely upon the type of error and the type of situation. In some cases, where it's a serious programming error and there is no way to proceed (like insufficient arguments to perform the desired function), it may make sense to throw an exception or return an error from the function. But, in other cases, an extra argument can just be safely ignored and you can continue on your merry way as if that argument was never passed.
As you get used to javascript, you will come to understand that many function arguments can be optional and a single function may be correctly called with zero, one, two or three or even N arguments and the code in the function can adapt appropriately. This actually allows you to do things that are not as easy to do in more "static" languages. It is even possible to adapt to the type of the arguments and do something appropriately based on the type of the argument. While this may sound like heresy to someone that only has experience in hard-typed languages, it can actually be extremely useful.
As you maintain a body of code over time, you will also come to find that it's nice to be able to add an argument to the definition of a function, add code to that function that defaults it to a reasonable value if it isn't passed and NOT have to change any of the prior code that was using that function, yet a few new places that need that new argument can start using it immediately. Rather than grepping through the entire codebase to fix up every caller of that function, you can just make one change in one file and immediately start using a new argument to the function without changing all the other callers. This is enormously useful.
So, in more direct answer to your question, an extra argument passed to a function is never a serious error in javascript. Your code could just ignore it and proceed. If you want to alert the developer who wrote that code that an unexpected argument was passed, you can notify them somehow (perhaps some warning text on the debug console) in the "debug" version of your function/library, but I see no reason why you should stop execution in the "production" version of your function/library when you can proceed without any harm.
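As a rough sketch of the "warn in debug, proceed anyway" idea, a wrapper can compare the number of arguments received with the function's declared length. The name warnOnExtraArgs and the optional warn callback are assumptions for illustration; note that a function's length does not count parameters with default values or rest parameters.

```javascript
// Wrap `fn` so that surplus arguments produce a warning but never stop execution.
function warnOnExtraArgs(fn, warn) {
  warn = warn || console.warn;
  return function () {
    if (arguments.length > fn.length) {
      warn(fn.name + '() expected at most ' + fn.length +
           ' arguments but received ' + arguments.length);
    }
    // Always forward the call, extra arguments included.
    return fn.apply(this, arguments);
  };
}

const checkedFoo = warnOnExtraArgs(function foo(one, two, three) {
  return one + two + three;
});

console.log(checkedFoo(1, 2, 3));    // 6, no warning
console.log(checkedFoo(1, 2, 3, 4)); // 6, plus a warning on the console
```

In a production build the wrapper could simply be skipped, so callers pay nothing for the check.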

You don't need to worry about this. If you pass too many arguments, the function will just ignore it. You should only throw an error if there are too few arguments. In that case, the function might not be able to run.

While I agree that the number of arguments isn't important (and won't cause a problem so long as you type-check the arguments you're getting before you use them), since an unused argument won't do anything, if you're particularly concerned you could just create a subset of the passed arguments and access that object internally:
function test(arg1, arg2, arg3) {
var slice = Array.prototype.slice,
subset = slice.call(arguments, 0, 3); // depending on how many arguments you want
}
Of course this means that you've now got to recover the parameters from the args object, and since surplus arguments seem to be perfectly safe this seems pointless. But it is still an option.
Albeit unnecessary.

Multiple arguments vs. options object

When creating a JavaScript function with multiple arguments, I am always confronted with this choice: pass a list of arguments vs. pass an options object.
For example I am writing a function to map a nodeList to an array:
function map(nodeList, callback, thisObject, fromIndex, toIndex){
...
}
I could instead use this:
function map(options){
...
}
where options is an object:
options={
nodeList:...,
callback:...,
thisObject:...,
fromIndex:...,
toIndex:...
}
Which one is the recommended way? Are there guidelines for when to use one vs. the other?
[Update] There seems to be a consensus in favor of the options object, so I'd like to add a comment: one reason why I was tempted to use the list of arguments in my case was to have a behavior consistent with the JavaScript built in array.map method.

Like many of the others, I often prefer passing an options object to a function instead of passing a long list of parameters, but it really depends on the exact context.
I use code readability as the litmus test.
For instance, if I have this function call:
checkStringLength(inputStr, 10);
I think that code is quite readable the way it is and passing individual parameters is just fine.
On the other hand, there are functions with calls like this:
initiateTransferProtocol("http", false, 150, 90, null, true, 18);
Completely unreadable unless you do some research. On the other hand, this code reads well:
initiateTransferProtocol({
"protocol": "http",
"sync": false,
"delayBetweenRetries": 150,
"randomVarianceBetweenRetries": 90,
"retryCallback": null,
"log": true,
"maxRetries": 18
});
It is more of an art than a science, but if I had to name rules of thumb:
Use an options parameter if:
You have more than four parameters
Any of the parameters are optional
You've ever had to look up the function to figure out what parameters it takes
If someone ever tries to strangle you while screaming "ARRRRRG!"

Multiple arguments are mostly for obligatory parameters. There's nothing wrong with them.
If you have optional parameters, it gets complicated. If one of them relies on the others, so that they have a certain order (e.g. the fourth one needs the third one), you still should use multiple arguments. Nearly all native ECMAScript and DOM methods work like this. A good example is the open method of XMLHttpRequest, where the last 3 arguments are optional - the rule is like "no password without a user" (see also MDN docs).
Option objects come in handy in two cases:
You've got so many parameters that it gets confusing: The "naming" will help you, you don't have to worry about the order of them (especially if they may change)
You've got optional parameters. The objects are very flexible, and without any ordering you just pass the things you need and nothing else (or undefineds).
In your case, I'd recommend map(nodeList, callback, options). nodelist and callback are required, the other three arguments come in only occasionally and have reasonable defaults.
Another example is JSON.stringify. You might want to use the space parameter without passing a replacer function - then you have to call …, null, 4). An options object might have been better, although it's not really reasonable for only 2 parameters.
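To make the JSON.stringify point concrete, here is the positional call next to a hypothetical options-style wrapper. The wrapper name stringifyWith and its replacer/space option names are made up for comparison; only JSON.stringify itself is a real API.

```javascript
const value = { protocol: 'http', retries: 3 };

// Positional form: the unused replacer slot has to be filled with null.
const pretty = JSON.stringify(value, null, 4);
console.log(pretty);

// Hypothetical options-style wrapper, for comparison only.
function stringifyWith(input, options) {
  const settings = options || {};
  return JSON.stringify(input, settings.replacer, settings.space);
}

console.log(stringifyWith(value, { space: 4 }) === pretty); // true
```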

Using the 'options as an object' approach is going to be best. You don't have to worry about the order of the properties and there's more flexibility in what data gets passed (optional parameters for example)
Creating an object also means the options could be easily used on multiple functions:
options={
nodeList:...,
callback:...,
thisObject:...,
fromIndex:...,
toIndex:...
}
function1(options){
alert(options.nodeList);
}
function2(options){
alert(options.fromIndex);
}

It can be good to use both. If your function has one or two required parameters and a bunch of optional ones, make the first two parameters required and the third an optional options hash.
In your example, I'd do map(nodeList, callback, options). Nodelist and callback are required, it's fairly easy to tell what's happening just by reading a call to it, and it's like existing map functions. Any other options can be passed as an optional third parameter.

I may be a little late to the party with this response, but I was searching for other developers' opinions on this very topic and came across this thread.
I very much disagree with most of the responders, and side with the 'multiple arguments' approach. My main argument being that it discourages other anti-patterns like "mutating and returning the param object", or "passing the same param object on to other functions". I've worked in codebases which have extensively abused this anti-pattern, and debugging code which does this quickly becomes impossible. I think this is a very Javascript-specific rule of thumb, since Javascript is not strongly typed and allows for such arbitrarily structured objects.
My personal opinion is that developers should be explicit when calling functions, avoid passing around redundant data and avoid modify-by-reference. It's not that this pattern precludes writing concise, correct code. I just feel it makes it much easier for your project to fall into bad development practices.
Consider the following terrible code:
function main() {
const x = foo({
param1: "something",
param2: "something else",
param3: "more variables"
});
return x;
}
function foo(params) {
params.param1 = "Something new";
bar(params);
return params;
}
function bar(params) {
params.param2 = "Something else entirely";
const y = baz(params);
return params.param2;
}
function baz(params) {
params.params3 = "Changed my mind";
return params;
}
Not only does this kind of code require more explicit documentation to specify intent, but it also leaves room for vague errors.
What if a developer modifies param1 in bar()? How long do you think it would take looking through a codebase of sufficient size to catch this?
Admittedly, this example is slightly disingenuous because it assumes developers have already committed several anti-patterns by this point. But it shows how passing objects containing parameters allows greater room for error and ambiguity, requiring a greater degree of conscientiousness and observance of const correctness.
Just my two-cents on the issue!

Your comment on the question:
in my example the last three are optional.
So why not do this? (Note: This is fairly raw JavaScript. Normally I'd use a default hash and update it with the options passed in by using Object.extend or jQuery.extend or similar.)
function map(nodeList, callback, options) {
options = options || {};
var thisObject = options.thisObject || {};
var fromIndex = options.fromIndex || 0;
var toIndex = options.toIndex || 0;
}
Since it's now much more obvious what's optional and what's not, all of these are valid uses of the function:
map(nodeList, callback);
map(nodeList, callback, {});
map(nodeList, callback, null);
map(nodeList, callback, {
thisObject: {some: 'object'},
});
map(nodeList, callback, {
toIndex: 100,
});
map(nodeList, callback, {
thisObject: {some: 'object'},
fromIndex: 0,
toIndex: 100,
});
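The "default hash updated with the options passed in" approach mentioned above can be sketched with the built-in Object.assign. The default values chosen here (toIndex defaulting to the list length and treated as an exclusive end) are assumptions for illustration, not part of the original function.

```javascript
function map(nodeList, callback, options) {
  // Start from the defaults and let any supplied options override them.
  // Object.assign ignores a null or undefined source, so `options` may be omitted.
  const settings = Object.assign({
    thisObject: undefined,
    fromIndex: 0,
    toIndex: nodeList.length // assumed default: process up to the end
  }, options);

  const result = [];
  for (let i = settings.fromIndex; i < settings.toIndex; i++) {
    result.push(callback.call(settings.thisObject, nodeList[i], i, nodeList));
  }
  return result;
}

const upper = function (s) { return s.toUpperCase(); };
console.log(map(['a', 'b', 'c'], upper));                   // [ 'A', 'B', 'C' ]
console.log(map(['a', 'b', 'c'], upper, { fromIndex: 1 })); // [ 'B', 'C' ]
```

Unlike the || pattern, merging defaults this way keeps explicitly passed falsy values such as fromIndex: 0 distinct from omitted ones.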

It depends.
Based on my observation of how popular libraries are designed, here are the scenarios where we should use an options object:
The parameter list is long (>4).
Some or all parameters are optional and they don't rely on a certain order.
The parameter list might grow in a future API update.
The API will be called from other code and the API name is not clear enough to tell the parameters' meaning, so it might need strong parameter names for readability.
And scenarios where we should use a parameter list:
The parameter list is short (<= 4).
Most or all of the parameters are required.
Optional parameters are in a certain order (e.g. $.get).
It is easy to tell the parameters' meaning from the API name.

An object is preferable, because if you pass an object it's easy to extend the number of properties in that object and you don't have to watch the order in which your arguments have been passed.

For a function that usually uses some predefined arguments, you had better use an options object. The opposite example would be something like a function that takes an unbounded number of arguments, like: setCSS({height:100},{width:200},{background:"#000"}).

I would look at large javascript projects.
In things like Google Maps, you will frequently see that instantiated objects require an object but functions require parameters. I would think this has to do with OPTION arguments.
If you need default arguments or optional arguments, an object would probably be better because it is more flexible. But if you don't, normal function arguments are more explicit.
Javascript has an arguments object too.
https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Functions_and_function_scope/arguments

Dynamic vs Inline RegExp performance in JavaScript

I stumbled upon this performance test, which says that RegExps in JavaScript are not necessarily slow: http://jsperf.com/regexp-indexof-perf
There's one thing I didn't get though: two cases involve something that I believed to be exactly the same:
RegExp('(?:^| )foo(?: |$)').test(node.className);
And
/(?:^| )foo(?: |$)/.test(node.className);
In my mind, those two lines were exactly the same, the second one being some kind of shorthand to create a RegExp object. Still, it's twice as fast as the first.
Those cases are called "dynamic regexp" and "inline regexp".
Could someone help me understand the difference (and the performance gap) between these two?

Nowadays, answers given here are not entirely complete/correct.
Starting from ES5, the literal syntax behaves the same as the RegExp() syntax regarding object creation: both of them create a new RegExp object every time the code path hits an expression in which they take part.
Therefore, the only difference between them now is how often that regexp is compiled:
With literal syntax - one time during initial code parsing and compiling
With RegExp() syntax - every time a new object gets created
See, for instance, Stoyan Stefanov's JavaScript Patterns book:
Another distinction between the regular expression literal and the constructor is that the literal creates an object only once during parse time. If you create the same regular expression in a loop, the previously created object will be returned with all its properties (such as lastIndex) already set from the first time. Consider the following example as an illustration of how the same object is returned twice.
function getRE() {
var re = /[a-z]/;
re.foo = "bar";
return re;
}
var reg = getRE(),
re2 = getRE();
console.log(reg === re2); // true
reg.foo = "baz";
console.log(re2.foo); // "baz"
This behavior has changed in ES5 and the literal also creates new objects. The behavior has also been corrected in many browser environments, so it's not to be relied on.
If you run this sample in all modern browsers or NodeJS, you get the following instead:
false
bar
Meaning that every time you're calling the getRE() function, a new RegExp object is created even with literal syntax approach.
The above not only explains why you shouldn't use RegExp() for immutable regexps (it's a very well-known performance issue today), but also explains:
(I am more surprised that inlineRegExp and storedRegExp have different results.)
The storedRegExp is about 5 - 20% faster across browsers than inlineRegExp because there is no overhead of creating (and garbage collecting) a new RegExp object every time.
Conclusion:
Always create your immutable regexps with literal syntax and cache them if they are to be re-used. In other words, don't rely on that difference in behavior in envs below ES5, and continue caching appropriately in envs above.
Why literal syntax? It has some advantages compared to constructor syntax:
It is shorter and doesn't force you to think in terms of class-like constructors.
When using the RegExp() constructor, you also need to escape quotes and double-escape backslashes. It makes regular expressions, which are hard to read and understand by their nature, even harder.
(Free citation from the same Stoyan Stefanov's JavaScript Patterns book).
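The double-escaping point can be seen in a small sketch; the pattern itself (digits, a dot, digits) is just an illustrative choice.

```javascript
// Literal syntax: backslashes are written once.
const literal = /\d+\.\d+/;

// Constructor syntax: each backslash must be escaped inside the string.
const constructed = new RegExp('\\d+\\.\\d+');

// Forgetting to double the backslashes silently changes the pattern:
// in a string literal '\d' is just 'd' and '\.' is just '.'.
const mistaken = new RegExp('\d+\.\d+');

console.log(literal.source === constructed.source); // true
console.log(mistaken.source);                       // d+.d+
console.log(constructed.test('3.14'), mistaken.test('3.14')); // true false
```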
Hence, it's always a good idea to stick with the literal syntax, unless your regexp isn't known at the compile time.
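When the pattern really is only known at run time, the constructor is unavoidable, but the compiled object can still be cached. A minimal sketch, assuming a class-name check like the one in the question; the names classPattern and classPatternCache are hypothetical.

```javascript
// Cache of compiled patterns, keyed by class name.
const classPatternCache = new Map();

function classPattern(className) {
  let re = classPatternCache.get(className);
  if (!re) {
    // Escape regex metacharacters so the class name is matched literally.
    const escaped = className.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    re = new RegExp('(?:^| )' + escaped + '(?: |$)');
    classPatternCache.set(className, re);
  }
  return re;
}

console.log(classPattern('foo').test('bar foo baz'));     // true
console.log(classPattern('foo') === classPattern('foo')); // true, compiled once
```

The cached patterns have no g or y flag, so test() does not carry lastIndex state between calls and sharing one object is safe here.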

The difference in performance is partly related to the syntax that is used: in /pattern/ and RegExp(/pattern/) (where you did not test the latter) the regular expression is only compiled once, but for RegExp('pattern') the expression is compiled on each usage. See Alexander's answer, which should be the accepted answer today.
Apart from the above, in your tests for inlineRegExp and storedRegExp you're looking at code that is initialized once when the source code text is parsed, while for dynamicRegExp the regular expression is created for each invocation of the method. Note that the actual tests run things like r = dynamicRegExp(element) many times, while the preparation code is only run once.
The following gives you about the same results, according to another jsPerf:
var reContains = /(?:^| )foo(?: |$)/;
...and
var reContains = RegExp('(?:^| )foo(?: |$)');
...when both are used with
function storedRegExp(node) {
return reContains.test(node.className);
}
Sure, the source code of RegExp('(?:^| )foo(?: |$)') might first be parsed into a String, and then into a RegExp, but I doubt that by itself will be twice as slow. However, the following will create a new RegExp(..) again and again for each method call:
function dynamicRegExp(node) {
return RegExp('(?:^| )foo(?: |$)').test(node.className);
}
If in the original test you'd only call each method once, then the inline version would not be a whopping 2 times faster.
(I am more surprised that inlineRegExp and storedRegExp have different results. This is explained in Alexander's answer too.)

In the second case, the regular expression object is created during the parsing of the language, and in the first case, the RegExp class constructor has to parse an arbitrary string.
