Building a LINQ-like query API in JavaScript

I'd like to write a JavaScript class that works like C#'s IQueryable<T> interface to enable nicely-formatted queries against a remote data source. In other words, I'd like to be able to write the following (using ES6 arrow syntax):
datasource.query(Sandwich)
.where(s => s.bread.type == 'rye')
.orderBy(s => s.ketchup.amount)
.take(5)
.select(s => ({ 'name': s.name }));
and turn that into something like
SELECT s.name AS name
FROM sandwich s
JOIN bread b ON b.sandwich_id = s.id
JOIN ketchup k on k.sandwich_id = s.id
WHERE b.type = 'rye'
ORDER BY k.amount
LIMIT 5;
with the SQL query (or whatever query language is used) being actually sent to the server. (Doing the filtering on the client side is not feasible because the server might return tons of data.)
In C#, this functionality is supported by the Expression class, which lets you construct an expression tree from a lambda function. But JavaScript has no equivalent, as far as I know. My original plan was to feed f.toString() to Esprima's parser for the function f passed as the argument to select(), where(), etc. and use that expression tree. This approach works great as long as the expressions refer only to literals, but when you try something like
var breadType = 'rye';
datasource.query(Sandwich)
.where(s => s.bread.type == breadType)
...
it fails, because you'll have a token breadType that you can't replace with a value. As far as I can tell, JavaScript has no way to introspect the function closure and get the value of breadType after the fact externally.
My next thought was that since Esprima will give me a list of tokens, I could modify the function body in-place to something like
return {
'breadType': breadType
};
and then call it, taking advantage of the fact that even if I can't access the closure, the function itself can. But modification of a function's code in-place also seems to be impossible.
Another approach that would not require Esprima would be to pass in a sentinel object as the argument to the inner function f and override its comparison operators, which is how SQLAlchemy's filter() works in Python. But Python provides operator overloading and JavaScript does not, so this also fails.
This leaves me with two inferior solutions. One is to do something like this:
var breadType = 'rye';
datasource.query(Sandwich)
.where(s => s.bread.type == breadType)
.forValues(() => ({
'breadType': breadType
}));
In other words, I could force the caller to provide the closure context manually. But this is pretty lame.
Another approach is to do the sentinel object thing but with functions instead of operators since operators can't be overloaded:
var breadType = 'rye';
datasource.query(Sandwich)
.where(s => s.bread.type.equals(breadType));
ES6's Proxy objects will make this simple to implement, but it's still not as good as the version with regular operators.
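A minimal sketch of that Proxy-based sentinel, assuming a hypothetical makeSentinel() helper and a standalone where() function (neither is part of any real library). Property accesses are recorded as a path, and equals() returns a plain description of the comparison that a query builder could later translate to SQL:

```javascript
// Hypothetical helper: builds a sentinel whose property accesses record a path.
function makeSentinel(path = []) {
  return new Proxy({}, {
    get(_target, prop) {
      if (prop === 'equals') {
        // Terminal call: return a plain description of the comparison.
        return value => ({ op: '=', column: path.join('.'), value });
      }
      // Any other property access extends the recorded path.
      return makeSentinel([...path, String(prop)]);
    }
  });
}

// Hypothetical where(): runs the callback once against the sentinel.
function where(predicate) {
  return predicate(makeSentinel());
}

const breadType = 'rye';
const condition = where(s => s.bread.type.equals(breadType));
console.log(condition); // { op: '=', column: 'bread.type', value: 'rye' }
```

Because the callback is actually executed, the closure variable breadType is resolved by the function itself, which is exactly what the toString() approach cannot do.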
Sorry for the long post. My ultimate question is whether it's possible to achieve this with the ideal syntax shown in the first code block and, if so, how to do it. Thanks!

No, this is indeed impossible for the reasons you outlined. If you want to support passing arbitrary closures as arguments, then your only choice is to execute the functions. You cannot transform them into SQL statements; at some point this will always fail, regardless of how much static code analysis you perform on the files.
I guess your best bet here is template literals, where you could have something like
var breadType = 'rye';
datasource.query(Sandwich, `
.where(s => s.bread.type == ${breadType})
.orderBy(s => s.ketchup.amount)
.take(5)
.select(s => ({ 'name': s.name }))
`)
so that you can keep your syntax as you want, but will have to supply all external variables explicitly.
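One possible refinement, sketched here as an assumption rather than as part of the suggestion above, is a tagged template. The tag function (called query here, a hypothetical name) receives the literal text and the interpolated values separately, so the values can be sent as parameters instead of being pasted into the query string:

```javascript
// Hypothetical tag function: keeps the query text and the interpolated
// values apart instead of concatenating them into one string.
function query(strings, ...values) {
  // Replace each interpolation with a numbered placeholder ($1, $2, ...).
  const text = strings.reduce((acc, part, i) => acc + '$' + i + part);
  return { text, params: values };
}

const breadType = 'rye';
const q = query`where(s => s.bread.type == ${breadType}).take(${5})`;
console.log(q.text);   // where(s => s.bread.type == $1).take($2)
console.log(q.params); // [ 'rye', 5 ]
```

The text part would still have to be parsed (for example with Esprima), but every external value now arrives explicitly.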

Related

What's the purpose of Symbol in terms of unique object identifiers?

Notes about 'Not a duplicate':
I've been told this is a duplicate of What is the use of Symbol in javascript ECMAScript 6?. Well, it doesn't seem right to me. The code they've given is this:
const door = {};
// library 1
const cake1 = Symbol('cake');
door[cake1] = () => console.log('chocolate');
// library 2
const cake2 = Symbol('cake');
door[cake2] = () => console.log('vanilla');
// your code
door[cake1]();
door[cake2]();
The only thing that makes this work is because cake1 and cake2 are different (unique) names. But the developer has explicitly given these; there is nothing offered by Symbol which helps here.
For example if you change cake1 and cake2 to cake and run it, it will error:
Uncaught SyntaxError: Identifier 'cake' has already been declared
If you're already having to manually come up with unique identifiers then how is Symbol helping?
If you execute this in your console:
Symbol('cake') === Symbol('cake');
It evaluates to false. So they're unique. But in order to actually use them, you're now having to come up with 2 key names (cake1 and cake2) which are unique. This has to be done manually by the developer; there's nothing in Symbol or JavaScript in general which will help with that. You're basically creating a unique identifier using Symbol but then having to assign it manually to...a unique identifier that you've had to come up with as a developer.
With regards to the linked post they cite this as an example which does not use Symbol:
const door = {};
// from library 1
door.cake = () => console.log('chocolate');
// from library 2
door.cake = () => console.log('vanilla');
// your code
door.cake();
They try to claim this is a problem and will only log "vanilla". Well clearly that's because door.cake isn't unique (it's declared twice). The "fix" is as simple as using cake1 and cake2:
door.cake1 = () => console.log('chocolate');
door.cake2 = () => console.log('vanilla');
door.cake1(); // Outputs "chocolate"
door.cake2(); // Outputs "vanilla"
That will now work and log both "chocolate" and "vanilla". In this case Symbol hasn't been used at all, and indeed has no bearing on that working. It's simply a case that the developer has assigned a unique identifier but they have done this manually and without using Symbol.
Original question:
I'm taking a course in JavaScript and the presenter is discussing Symbol.
At the beginning of the video he says:
The thing about Symbols is that every single one is unique and this makes them very valuable in terms of things like object property identifiers.
However he then goes on to say:
1. They are not enumerable in for...in loops.
2. They cannot be used in JSON.stringify. (It results in an empty object).
In the case of point (2) he gives this example:
console.log(JSON.stringify({key: 'prop'})); // object without Symbol
console.log(JSON.stringify({[Symbol('sym1')]: 'prop'})); // object using Symbol
This logs {"key":"prop"} and {} to the console respectively.
How does any of this make Symbol "valuable" in terms of being unique object keys or identifiers?
In my experience, two very common things you'd want to do with an object are to enumerate it or to convert its data to JSON to send via ajax or some such method.
I can't understand what the purpose of Symbol is at all, but especially why you would want to use them for making object identifiers, given that doing so will prevent you from doing those things later.
Edit - the following was part of the original question - but is a minor issue in comparison to the actual purpose of Symbol with respect to unique identifiers:
If you needed to send something like {[Symbol('sym1')]: 'prop'} to a backend via ajax what would you actually need to do in this case?
I replied to your comment in the other question, but since this is open I'll try to elaborate.
You are getting variable names mixed up with Symbols, which are unrelated to one another.
The variable name is just an identifier used to reference a value. If I create a variable and then assign it to another variable, both of them refer to the same value (or in the case of non-primitives in JavaScript, the same reference).
In that case, I can do something like:
const a = Symbol('a');
const b = a;
console.log(a === b); // true
That's because there is only 1 Symbol created and the reference to that Symbol is assigned to both a and b. That isn't what you would use Symbols for.
Symbols are meant to provide unique keys which are not the same as a variable name. Keys are used in objects (or similar). I think the simplicity of the other example may be causing the confusion.
Let us imagine a more complex example. Say I have a program that lets you create an address book of people. I am going to store each person in an object.
const addressBook = {};
const addPerson = ({ name, ...data }) => {
addressBook[name] = data;
};
const listOfPeople = [];
// new user is added in the UI
const newPerson = getPersonFromUserEntry();
listOfPeople.push(newPerson.name);
addPerson(newPerson);
In this case, I would use listOfPeople to display a list and when you click it, it would show the information for that user.
Now, the problem is, since I'm using the person's name, that isn't truly unique. If I have two "Bob Smith" entries added, the second will overwrite the first, and clicking either one in the UI built from "listOfPeople" will take you to the same person.
Now, instead of doing that, let's use a Symbol in addPerson(), return it, and store it in listOfPeople.
const addressBook = {};
const addPerson = ({ name, ...data }) => {
const symbol = Symbol(name);
addressBook[symbol] = data;
return symbol;
};
const listOfPeople = [];
// new user is added in the UI
const newPerson = getPersonFromUserEntry();
listOfPeople.push(addPerson(newPerson));
Now, every entry in listOfPeople is totally unique. If you click the first "Bob Smith" and use that symbol to look him up you'll get the right one. Ditto for the second. They are unique even though the base of the key is the same.
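A runnable version of the same idea, with two hard-coded people standing in for getPersonFromUserEntry() (the names and phone numbers are made-up sample data):

```javascript
const addressBook = {};

// Same idea as addPerson above, but self-contained so it can be run directly.
const addPerson = ({ name, ...data }) => {
  const key = Symbol(name); // unique even when the name repeats
  addressBook[key] = data;
  return key;
};

const first = addPerson({ name: 'Bob Smith', phone: '555-0100' });
const second = addPerson({ name: 'Bob Smith', phone: '555-0199' });

console.log(first === second);          // false
console.log(addressBook[first].phone);  // 555-0100
console.log(addressBook[second].phone); // 555-0199
console.log(first.description);         // Bob Smith
```

The description passed to Symbol() is only a label; it plays no part in the key's identity.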
As I mentioned in the other answer, the use-case for Symbol is actually fairly narrow. It is really only when you need to create a key you know will be wholly unique.
Another scenario where you might use it is if you have multiple independent libraries adding code to a common place. For example, the global window object.
If my library exports something to window named "getData" and someone else has a library that also exports a "getData", one of us is going to overwrite the other if they are loaded at the same time (whoever is loaded last wins).
However, if I want to be safer, instead of doing:
window.getData = () => {};
I can instead create a Symbol (whose reference I keep track of), store my getData() on window under that Symbol, and then call it with the symbol:
window[getDataSymbol]();
I can even export that Symbol to users of my library so they can use that to call it instead.
(Note, all of the above would be fairly poor naming, but again, just an example.)
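Spelled out in full, that might look like the sketch below. It uses globalThis instead of window so it also runs outside a browser; that substitution, and the two function bodies, are assumptions made for the example:

```javascript
// Created once inside the library; the description is only a debugging label.
const getDataSymbol = Symbol('getData');

// Another script's string-keyed global (hypothetical).
globalThis.getData = () => 'someone else';

// Our function is stored under the symbol, so it cannot overwrite the above.
globalThis[getDataSymbol] = () => 'my library';

console.log(globalThis.getData());        // someone else
console.log(globalThis[getDataSymbol]()); // my library
```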
Also, as someone mentioned in the comments, these Symbols are not for sharing between systems. If I call Symbol('a') that is totally unique to my system. I can't share it with anyone else. If you need to share between systems you have to make sure you are enforcing key uniqueness.
As a very practical example of the kind of problem Symbols solve, take AngularJS's use of $ and $$:
AngularJS Prefixes $ and $$: To prevent accidental name collisions with your code, AngularJS prefixes names of public objects with $ and names of private objects with $$. Please do not use the $ or $$ prefix in your code.
https://docs.angularjs.org/api
You'll sometimes have to deal with objects that are "yours", but that Angular adds its own $ and $$ prefixed properties to, simply as a necessity for tracking certain states. The $ are meant for public use, but the $$ you're not supposed to touch. If you want to serialise your objects to JSON or such, you need to use Angular's provided functions which strip out the $-prefixed properties, or you need to otherwise be aware of dealing with those properties correctly.
This would be a perfect case for Symbols. Instead of adding public properties to objects which are merely differentiated by a naming convention, Symbols allow you to add truly private properties which only your code can access and which don't interfere with anything else. In practice Angular would define a Symbol once somewhere which it shares across all its modules, e.g.:
export const PRIVATE_PREFIX = Symbol('$$');
Any other module now imports it:
import { PRIVATE_PREFIX } from 'globals';
function foo(userDataObject) {
userDataObject[PRIVATE_PREFIX] = { foo: 'bar' };
}
It can now safely add properties to any and all objects without worrying about name clashes and without having to advise the user about such things, and the user doesn't need to worry about Angular adding any of its own properties since they won't show up in for...in loops, Object.keys() or JSON.stringify(). Only code which has access to the PRIVATE_PREFIX constant can access these properties by key, and if that constant is properly scoped, that's only Angular-related code. (Reflection via Object.getOwnPropertySymbols() can still list them, so this is a convention that is hard to break by accident rather than a security boundary.)
Any other library or code could also add its own Symbol('$$') to the same object, and it would still not clash because they're different symbols. That's the point of Symbols being unique.
(Note that this Angular use is hypothetical, I'm just using its use of $$ as a starting point to illustrate the issue. It doesn't mean Angular actually does this in any way.)
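A small sketch of that behaviour, with made-up key names and payloads standing in for the two libraries:

```javascript
// Two independent pieces of code each create their own '$$' symbol.
const angularKey = Symbol('$$');
const otherLibKey = Symbol('$$');

const user = { name: 'Ada' };
user[angularKey] = { tracked: true };
user[otherLibKey] = { cacheId: 7 };

console.log(Object.keys(user));         // [ 'name' ]
console.log(JSON.stringify(user));      // {"name":"Ada"}
console.log(user[angularKey].tracked);  // true
console.log(user[otherLibKey].cacheId); // 7

// Symbol keys are skipped by ordinary enumeration but visible to reflection.
console.log(Object.getOwnPropertySymbols(user).length); // 2
```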
To expand on @samanime's excellent answer, I'd just like to really put emphasis on how Symbols are most commonly used by real developers.
Symbols prevent key name collision on objects.
Let's inspect the following page from MDN on Symbols. Under "Properties", you can see some built-in Symbols. We'll look at the first one, Symbol.iterator.
Imagine for a second that you're designing a language like JavaScript. You've added special syntax like for..of and would like to allow developers to define their own behavior when their special object or class is iterated over using this syntax. Perhaps for..of could check for a special function defined on the object/class, named iterator:
const myObject = {
iterator: function() {
console.log("I'm being iterated over!");
}
};
However, this presents a problem. What if some developer, for whatever reason, happens to name their own function property iterator:
const myObject = {
iterator: function() {
//Iterate over and modify a bunch of data
}
};
Clearly this iterator function is only meant to be called to perform some data manipulation, probably very infrequently. And yet if some consumer of this library were to think myObject is iterable and use for..of on it, JavaScript will go right ahead and call that function, thinking it's supposed to return an iterator.
This is called a name collision and even if you tell every developer very firmly "don't name your object properties iterator unless it returns a proper iterator!", someone is bound to not listen and cause problems.
Even if you don't think just that one example is worthy of this whole Symbol thing, just look at the rest of the list of well-known symbols. replace, match, search, hasInstance, toPrimitive... So many possible collisions! Even if every developer is made to never use these as keys on their objects, you're really restricting the set of usable key names and therefore developer freedom to implement things how they want.
Symbols are the perfect solution for this. Take the above example, but now JavaScript doesn't check for a property named "iterator", but instead for a property with a key exactly equal to the unique Symbol Symbol.iterator. A developer wishing to implement their own iterator function writes it like this:
const myObject = {
[Symbol.iterator]: function() {
console.log("I'm being iterated over!");
}
};
...and a developer wishing to simply not be bothered and use their own property named iterator can do so completely freely without any possible hiccups.
This is a pattern developers of libraries may implement for any unique key they'd like to check for on an object, the same way the JavaScript developers have done it. This way, the problem of name collisions and needing to restrict the valid namespace for properties is completely solved.
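For completeness, here is a sketch in which both properties coexist on one object and for...of really runs. The items array and the generator method are additions for the example; a generator is used because it returns a proper iterator without extra boilerplate:

```javascript
const myObject = {
  items: ['a', 'b', 'c'],

  // An ordinary string-keyed method; for...of never looks at it.
  iterator() {
    return 'unrelated data manipulation';
  },

  // The symbol-keyed method that for...of actually calls.
  *[Symbol.iterator]() {
    yield* this.items;
  }
};

const seen = [];
for (const item of myObject) {
  seen.push(item);
}
console.log(seen);                // [ 'a', 'b', 'c' ]
console.log(myObject.iterator()); // unrelated data manipulation
```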
Comment from the asker:
The bit which confused me on the linked OP is they've created 2 variables with the names cake1 and cake2. These names are unique and the developer has had to determine them so I didn't understand why they couldn't assign the variable to the same name, as a string (const cake1 = 'cake1'; const cake2 = 'cake2'). This could be used to make 2 unique key names since the strings 'cake1' !== 'cake2'. Also the answer says for Symbol you "can't share it" (e.g. between libraries) so what use is that in terms of avoiding conflict with other libraries or other developers code?
The linked OP I think is misleading - it seems the point was supposed to be that both symbols have the value "cake" and thus you technically have two duplicate property keys with the name "cake" on the object which normally isn't possible. However, in practice the capability for symbols to contain values is not really useful. I understand your confusion there, again, I think it was just another example of avoiding key name collision.
About the libraries, when a library is published, it doesn't publish the value generated for the symbol at runtime, it publishes code which, when added to your project, generates a completely unique symbol different than what the developers of the library had. However, this means nothing to users of the library. The point is that you can't save the value of a symbol, transfer it to another machine, and expect that symbol reference to work when running the same code. To reiterate, a library has code to create a symbol, it doesn't export the generated value of any symbols.
What's the purpose of Symbol in terms of unique object identifiers?
Well,
Symbol( 'description' ) !== Symbol( 'description' )
How does any of this make Symbol "valuable" in terms of being unique object keys or identifiers?
In a visitor pattern or chain of responsibility, some logic may attach additional metadata (imagine validation or ORM metadata) to any object, and that metadata is not meant to be persisted.
If you needed to send something like {[Symbol('sym1')]: 'prop'} to a backend via ajax what would you actually need to do in this case?
I can assure you that you won't need to do that. You would use { sym1: 'prop' } instead.
Now, this page even has a note about it
Note: If you are familiar with Ruby's (or another language) that also has a feature called "symbols", please don’t be misguided. JavaScript symbols are different.
As I said, they are useful for runtime metadata, not for the actual data.

Why does promise.join() take a function as its last parameter?

Say I have a step in a procedure that requires the retrieval of two objects. I would use join() to coordinate the retrievals:
return promise.join(retrieveA(), retrieveB())
.spread(function(A, B) {
// create something out of A and B
});
The documentation shows that you can also pass the handler as the last parameter:
return promise.join(retrieveA(), retrieveB(), function(A, B) {
// create something out of A and B
});
I'm curious as to what the rationale behind the existence of this option is.
Fact time: The reason .join was added was to make @spion happy. Not without reason though: using .join means you have a static and known number of promises, which makes using it with TypeScript a lot easier. Petka (Esailija) liked the idea and also the fact that it can be optimised further because it doesn't have to abide by the weird guarantees the other form has to abide by.
Over time, people (at least me) started using it for other use cases, namely using promises as proxies.
So, let's talk about what it does better:
Static Analysis
It's hard to statically analyse Promise.all since it works on an array containing an unknown number of promises of potentially different types. Promise.join can be typed since it can be seen as taking a tuple. For example, for the 3-promise case you can give it a type signature of (Promise<S>, Promise<U>, Promise<T>, ((S,U,T) -> Promise<K> | K)) -> Promise<K>, which simply can't be done in a type-safe way for Promise.all.
Proxying
It's very clean to use when writing promise code in the proxying style:
var user = getUser();
var comments = user.then(getComments);
var related = Promise.join(user, comments, getRelated);
Promise.join(user, comments, related, (user, comments, related) => {
// use all 3 here
});
It's faster
Since it doesn't need to keep the values of the given promises cached or perform all the checks that .all(...).spread(...) does, it'll perform slightly faster.
But... you really usually shouldn't care.
you can also pass the handler as the last parameter. I'm curious as to what the rationale behind the existence of this option is.
It is not an "option". It's the sole purpose of the join function.
Promise.join(promiseA, promiseB, …, function(a, b, …) { … })
is exactly equivalent to
Promise.all([promiseA, promiseB, …]).spread(function(a, b, …) { … })
But, as mentioned in the documentation, it
is much easier (and more performant) to use when you have a fixed amount of discrete promises
It relieves you of needing to use that array literal, and it doesn't need to create that intermediate promise object for the array result.
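To make the calling convention concrete, here is a sketch of a join() helper written with native promises. It is a hypothetical stand-in, not Bluebird's implementation, and it does not reproduce the optimisations mentioned above; retrieveA and retrieveB are stubbed out:

```javascript
// Hypothetical helper with the same calling convention as Promise.join.
function join(...args) {
  const handler = args.pop();            // last argument is the handler
  return Promise.all(args)               // wait for every input
    .then(values => handler(...values)); // spread them as separate arguments
}

// Stand-ins for retrieveA() and retrieveB().
const retrieveA = () => Promise.resolve('A');
const retrieveB = () => Promise.resolve('B');

const combined = join(retrieveA(), retrieveB(), (a, b) => a + b);
combined.then(result => console.log(result)); // AB
```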

Why can Array.prototype.forEach not be chained?

I learned today that forEach() returns undefined. What a waste!
If it returned the original array, it would be far more flexible without breaking any existing code. Is there any reason forEach returns undefined?
Is there any way to chain forEach with other methods like map & filter?
For example:
var obj = someThing.keys()
.filter(someFilter)
.forEach(passToAnotherObject)
.map(transformKeys)
.reduce(reduction)
This wouldn't work because forEach doesn't want to play nice: you have to run all the methods that come before the forEach a second time to get the array back into the state it was in when it was passed to forEach.
What you want is known as method cascading via method chaining. Describing them in brief:
Method chaining is when a method returns an object that has another method that you immediately invoke. For example, using jQuery:
$("#person")
.slideDown("slow")
.addClass("grouped")
.css("margin-left", "11px");
Method cascading is when multiple methods are called on the same object. For example, in some languages you can do:
foo
..bar()
..baz();
Which is equivalent to the following in JavaScript:
foo.bar();
foo.baz();
JavaScript doesn't have any special syntax for method cascading. However, you can simulate method cascading using method chaining if the first method call returns this. For example, in the following code if bar returns this (i.e. foo) then chaining is equivalent to cascading:
foo
.bar()
.baz();
Some methods like filter and map are chainable but not cascadable because they return a new array, but not the original array.
On the other hand the forEach function is not chainable because it doesn't return a new object. Now, the question arises whether forEach should be cascadable or not.
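The distinction can be checked directly by comparing return values. The sort() call is an extra example, not mentioned above, of a built-in array method that returns the same array it was called on:

```javascript
const numbers = [3, 1, 2];

// filter (like map) returns a new array: chainable, but not cascadable.
const evens = numbers.filter(n => n % 2 === 0);
console.log(evens === numbers);  // false

// sort mutates and returns the same array, so chaining on it is a cascade.
const sorted = numbers.sort();
console.log(sorted === numbers); // true

// forEach returns undefined, so nothing can follow it in a chain.
const result = numbers.forEach(n => n);
console.log(result);             // undefined
```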
Currently, forEach is not cascadable. However, that's not really a problem as you can simply save the result of the intermediate array in a variable and use that later:
var arr = someThing.keys()
.filter(someFilter);
arr.forEach(passToAnotherObject);
var obj = arr
.map(transformKeys)
.reduce(reduction);
Yes, this solution looks uglier than your desired solution. However, I like it more than your code for several reasons:
It is consistent because chainable methods are not mixed with cascadable methods. Hence, it promotes a functional style of programming (i.e. programming with no side effects).
Cascading is inherently an effectful operation because you are calling a method and ignoring the result. Hence, you're calling the operation for its side effects and not for its result.
On the other hand, chainable functions like map and filter don't have any side effects (if their input function doesn't have any side effects). They are used solely for their results.
In my humble opinion, mixing chainable methods like map and filter with cascadable functions like forEach (if it was cascadable) is sacrilege because it would introduce side effects in an otherwise pure transformation.
It is explicit. As The Zen of Python teaches us, “Explicit is better than implicit.” Method cascading is just syntactic sugar. It is implicit and it comes at a cost. The cost is complexity.
Now, you might argue that my code looks more complex than yours. If so, you would be judging the book by its cover. In their famous paper Out of the Tar Pit, the authors Ben Moseley and Peter Marks describe different types of software complexities.
The second biggest software complexity on their list is complexity caused by explicit concern with control flow. For example:
var obj = someThing.keys()
.filter(someFilter)
.forEach(passToAnotherObject)
.map(transformKeys)
.reduce(reduction);
The above program is explicitly concerned with control flow because you are explicitly stating that .forEach(passToAnotherObject) should happen before .map(transformKeys) even though it shouldn't have any effect on the overall transformation.
In fact, you can remove it from the equation altogether and it wouldn't make any difference:
var obj = someThing.keys()
.filter(someFilter)
.map(transformKeys)
.reduce(reduction);
This suggests that the .forEach(passToAnotherObject) didn't have any business being in the equation in the first place. Since it's a side effectful operation, it should be kept separate from pure code.
When you write it explicitly as I did above, not only are you separating pure code from side effectful code but also you can choose when to evaluate each computation. For example:
var arr = someThing.keys()
.filter(someFilter);
var obj = arr
.map(transformKeys)
.reduce(reduction);
arr.forEach(passToAnotherObject); // evaluate after pure computation
Yes, you are still explicitly concerned with control flow. However, at least now you know that .forEach(passToAnotherObject) has nothing to do with the other transformations.
Thus, you have eliminated some (but not all) of the complexity caused by explicit concern with control flow.
For these reasons, I believe that the current implementation of forEach is actually beneficial because it prevents you from writing code that introduces complexity due to explicit concern with control flow.
I know from personal experience from when I used to work at BrowserStack that explicit concern with control flow is a big problem in large-scale software applications. It is indeed a real world problem.
It's easy to write complex code because complex code is usually shorter (implicit) code. So it's always tempting to drop in a side effectful function like forEach in the middle of a pure computation because it requires less code refactoring.
However, in the long run it makes your program more complex. Think of what would happen a few years down the line when you quit the company that you work for and somebody else has to maintain your code. Your code now looks like:
var obj = someThing.keys()
.filter(someFilter)
.forEach(passToAnotherObject)
.forEach(doSomething)
.map(transformKeys)
.forEach(doSomethingElse)
.reduce(reduction);
The person reading your code now has to assume that all the additional forEach methods in your chain are essential, put in extra work to understand what each function does, figure out by herself that these extra forEach methods are not essential to compute obj, eliminate them from her mental model of your code and only concentrate on the essential parts.
That's a lot of unnecessary complexity added to your program, and you thought that it was making your program more simple.
It's easy to implement a chainable forEach function:
Array.prototype.forEachChain = function () {
this.forEach(...arguments);
return this;
};
const arr = [1,2,3,4];
const dbl = (v, i, a) => {
a[i] = 2 * v;
};
arr.forEachChain(dbl).forEachChain(dbl);
console.log(arr); // [4,8,12,16]

Is it possible to define a new statement?

I want to be able to define a statement in javascript. For example, I want to define a statement called file that works like a class.
function file() {
//code goes here
}
I want that to be used as a statement, like if, for, and return.
file filename(filename,purpose) {
//code goes here
}
Do I need to build a separate compiler or is it possible?
Please change the title if there is a better way to say it.
What are you trying to accomplish?
You can emulate some class-like structure in JavaScript using the Revealing Module Pattern.
Also, I've never seen a class work like what you've described -- typically you instantiate an object of a class, and then access the object's public properties. This can be done in JavaScript ('cept objects are created dynamically). For example:
// file 'class'
var file = function () {
var a; // private variable
function filename(name, purpose) {
// code goes here
}
// return public members
return {
filename: filename
};
};
// An object created from your 'class' with the member function 'filename()'
var aFile = file();
Then call your member function using the . operator, like so: aFile.filename(name, purpose);
This would be writing a new language based on JavaScript, much like CoffeeScript, among many others. Those languages need to be compiled to JS before being served to a web browser, for instance.
Take a look at a CoffeeScript-to-JS compiler to see how to go about this. For the record, I don't think this is a good approach.
Lastly I'll note that languages like Scala have very good DSL support, meaning it's easy to add features within the language. For instance, even + in Scala is library code, not native code. (More technically, could be written that way from a language standpoint.)
I want to be able to define a statement in javascript.
I want that to be used as a statement, like if, for, and return.
No, you cannot do this, as a Javascript parser would not be able to parse this.
If you really wish to do this, your only option would be to create your own language, and write a transpiler from your new language to Javascript, as @djechlin has pointed out.
I believe what you want is to implement control structures rather than statements, since the examples you gave (if, for and return) are control structures. If that is what you really mean then yes, you can do that with functions but not with the syntax you describe.
In javascript, functions are first class objects. That is, they can be assigned to variables and passed as arguments to other functions. For example, here's an implementation of if that uses no built-in control structure (no if, while, switch etc. and no ternary operator):
function IF (condition, callback) {
[function(){}, callback][+!!condition]();
}
You can use the above function as a replacement of if but the syntax is a bit unconventional:
IF ( a == b, function(){
console.log('hello');
});
But if you've been programming javascript long enough the above syntax would be familiar and you'd have encountered many similar control structures implemented as functions such as [].forEach() and setTimeout().
So, if you want to implement a control structure to parse a file for example, you can do something like this:
function parseFile (filename, callback) {
// code to process filename
callback(result);
}
Or even something like this:
function eachLine (filename, callback) {
// code to process filename
for (var i=0; i<file_content.length; i++) {
callback(file_content[i]);
}
}
which you can use like this:
eachLine("some_file.txt",function(line){
if (line.match(/hello/)) {
console.log('Found hello! This file is friendly.');
}
});
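For a version of eachLine that actually runs, the placeholder comment has to be replaced by real file handling. The sketch below assumes Node.js and a file small enough to read into memory in one go; neither assumption comes from the question:

```javascript
const fs = require('fs');

// Assumes Node.js and a file small enough to read into memory at once.
function eachLine(filename, callback) {
  // Read the whole file and split it into lines.
  const fileContent = fs.readFileSync(filename, 'utf8').split('\n');
  for (let i = 0; i < fileContent.length; i++) {
    callback(fileContent[i]);
  }
}
```

It is called exactly like the example above, with a file name and a function that receives one line at a time.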
If you don't need parameters you can do:
Object.defineProperty(window, 'newcmd', {
get: () => console.log("hello")
})
newcmd

Safely parsing and evaluating user input

I'm working on a project that's essentially a templating domain-specific language. In my project, I accept lines of user input in the following form:
'{{index(1, 5)}}'
'{{firstName()}} X. {{lastName()}}'
'{{floating(-0.5, 0.5)}}'
'{{text(5, "words")}}'
Any command between double curly braces ({{ }}) has a corresponding Javascript method that should be called when that command is encountered. (For example, function index(min, max) {...} in the case of the first one).
I'm having a difficult time figuring out how to safely accept the input and call the appropriate function. I know that the way I'm doing it now isn't safe. I simply eval() anything between two sets of curly braces.
How can I parse these input strings such that I can flexibly match a function call between curly braces and execute that function with any parameters given, while still not blindly calling eval() with the code?
I've considered making a mapping (if command is index(), call function index() {}), but this doesn't seem very flexible; how do I collect and pass any parameters (e.g. {{index(2, 5)}}) if any are present?
This is written in Node.js.
This problem breaks down into:
1. Parsing the string
2. Evaluating the resulting function graph
3. Dispatching to each function (as part of #2 above)
Parsing the string
Unfortunately, with the requirements you have, parsing the {{...}} string is quite complex. You have at least these issues to deal with:
1. Functions can be nested: {{function1(function2(), 2, 3)}}.
2. Strings can contain (escaped) quotes, and can contain commas, so even without requirement #1 above the trivial approach to finding the discrete arguments (splitting on a comma) won't work.
So...you need a proper parser. You could try to cobble one together ad hoc, but this is where parser generators come into the picture, like PEG.js or Jison (those are just examples, not necessarily recommendations — I did happen to notice one of the Jison examples is a JSON parser, which would be about half the battle). Writing a parser is out of scope for answering a question on SO I'm afraid. :-)
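To give a feel for what such a parser has to do, here is a minimal hand-written recursive-descent sketch for the text inside one pair of braces. It is not a substitute for a generated parser. The name parseCommand, the node shape { type: 'call', name, args } and the supported literals (numbers and double-quoted strings only) are all assumptions made for illustration.

```javascript
// Parses one command such as: functionA(1, "two", functionB("a"))
// Returns a tree of { type: 'call', name, args } nodes and literal values.
function parseCommand(source) {
  let pos = 0;

  function skipSpaces() {
    while (pos < source.length && /\s/.test(source[pos])) pos++;
  }

  function fail(message) {
    throw new SyntaxError(message + ' at position ' + pos);
  }

  function parseString() {
    // Match a double-quoted string, allowing backslash escapes inside it,
    // so commas and escaped quotes do not end the argument early.
    const match = /^"(?:[^"\\]|\\.)*"/.exec(source.slice(pos));
    if (!match) fail('Unterminated string');
    pos += match[0].length;
    return JSON.parse(match[0]); // JSON rules decode the escapes
  }

  function parseNumber() {
    const match = /^-?\d+(?:\.\d+)?/.exec(source.slice(pos));
    if (!match) fail('Expected a number');
    pos += match[0].length;
    return Number(match[0]);
  }

  function parseCall() {
    const match = /^[A-Za-z_]\w*/.exec(source.slice(pos));
    if (!match) fail('Expected a function name');
    pos += match[0].length;
    skipSpaces();
    if (source[pos] !== '(') fail('Expected "("');
    pos++;
    const args = [];
    skipSpaces();
    if (source[pos] !== ')') {
      for (;;) {
        args.push(parseArgument()); // recursion handles nested calls
        skipSpaces();
        if (source[pos] === ',') { pos++; continue; }
        break;
      }
    }
    if (source[pos] !== ')') fail('Expected ")"');
    pos++;
    return { type: 'call', name: match[0], args: args };
  }

  function parseArgument() {
    skipSpaces();
    const ch = source[pos];
    if (ch === '"') return parseString();
    if (ch === '-' || (ch >= '0' && ch <= '9')) return parseNumber();
    return parseCall();
  }

  skipSpaces();
  const tree = parseCall();
  skipSpaces();
  if (pos !== source.length) fail('Unexpected trailing input');
  return tree;
}
```

Nothing is executed at this stage; the result is only a description of which function was named and with which arguments, which is what makes it safer than eval().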
Evaluating the resulting function graph
Depending on what tool you use, your parser generator may handle this for you. (I'm pretty sure PEG.js and Jison both would, for instance.)
If not, then after parsing you'll presumably end up with an object graph of some sort, which gives you the functions and their arguments (which might be functions with arguments...which might be...).
functionA
    1
    "two"
    functionB
        "a"
        functionC
            42
    functionD
    27
functionA there has five arguments, the third of which is functionB with two arguments, and so on.
Your next task, then, is to evaluate those functions deepest first (and at the same depth, left-to-right) and replace them in the relevant arguments list with their result, so you'll need a depth-first traversal algorithm. By deepest first and left-to-right (top-to-bottom in the bullet list above) I mean that in the list above, you have to call functionC first, then functionB, then functionD, and finally functionA.
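A minimal sketch of that traversal follows. It assumes each function call is represented as a node of the form { type: 'call', name, args } and that anything else is a literal; that node shape, the evaluate name and the traced helper are hypothetical, chosen only for this illustration.

```javascript
// Depth-first evaluation: arguments are resolved left to right, and any
// argument that is itself a call is evaluated before its parent is called.
function evaluate(node, functions) {
  if (node === null || typeof node !== 'object' || node.type !== 'call') {
    return node; // literal value (number, string, ...)
  }
  const values = node.args.map(function (arg) {
    return evaluate(arg, functions);
  });
  return functions[node.name].apply(undefined, values);
}

// Demo: stand-in functions that record the order in which they are called.
const callOrder = [];
function traced(name) {
  return function () {
    callOrder.push(name);
    return name + '(' + Array.prototype.join.call(arguments, ', ') + ')';
  };
}
const demoFunctions = {
  functionA: traced('functionA'),
  functionB: traced('functionB'),
  functionC: traced('functionC'),
  functionD: traced('functionD')
};

// functionA(1, "two", functionB("a", functionC(42)), functionD(), 27)
const demoTree = {
  type: 'call', name: 'functionA', args: [
    1,
    'two',
    { type: 'call', name: 'functionB', args: [
      'a',
      { type: 'call', name: 'functionC', args: [42] }
    ] },
    { type: 'call', name: 'functionD', args: [] },
    27
  ]
};

const demoResult = evaluate(demoTree, demoFunctions);
// callOrder is now: functionC, functionB, functionD, functionA
```

The recursion does the "deepest first" part for free: a call cannot run until map() has finished evaluating all of its arguments.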
Dispatching to each function
Depending again on the tool you use, it may handle this bit too. Again I suspect PEG.js does, and I wouldn't be surprised if Jison did as well.
At the point where you're ready to call a function that (no longer) has function calls as arguments, you'll presumably have the function name and an array of arguments. Assuming you store your functions in a map:
var functions = {
index: function() { /* ... */ },
firstName: function() { /* ... */ },
// ...
};
...calling them is the easy bit:
functionResult = functions[functionName].apply(undefined, functionArguments);
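Since the function name comes from user input, it may be worth checking it against the map before calling. A plain object also inherits properties such as constructor and toString from Object.prototype, so a bare functions[functionName] lookup would find those too. The sketch below is one way to guard against that; the dispatch helper and the placeholder function bodies are assumptions for illustration.

```javascript
// Placeholder implementations, for illustration only.
const functions = {
  index: function (min, max) {
    return min + Math.floor(Math.random() * (max - min + 1));
  },
  firstName: function () { return 'Ada'; }
};

function dispatch(functionName, functionArguments) {
  // Accept only names defined directly on the map, never inherited ones.
  const isOwn = Object.prototype.hasOwnProperty.call(functions, functionName);
  if (!isOwn || typeof functions[functionName] !== 'function') {
    throw new Error('Unknown command: ' + functionName);
  }
  return functions[functionName].apply(undefined, functionArguments);
}
```

An alternative with the same effect would be to create the map with Object.create(null), so that it has no inherited properties in the first place.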
I'm sorry not to be able to say "Just do X, and you're there," but it really isn't a trivial problem. I would throw tools at it, I wouldn't invent this wheel myself.
If possible, do not evaluate the user input.
If you need to evaluate it, evaluate it in a controlled scope and environment.
The latter means using new Function() instead of eval(), or a specially designed library such as https://github.com/dtao/lemming.js
See http://www.2ality.com/2014/01/eval.html for more information about eval vs new Function()
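The scope difference can be seen in a small sketch (compareScopes and secret are hypothetical names). Direct eval() runs inside the calling function's scope, while the body passed to new Function() is compiled in the global scope and cannot see the caller's local variables.

```javascript
function compareScopes() {
  const secret = 'local value';
  // Direct eval sees the local variable `secret`.
  const seenByEval = eval('typeof secret');
  // new Function only sees globals, so `secret` is not defined there.
  const seenByFunction = new Function('return typeof secret')();
  return { seenByEval: seenByEval, seenByFunction: seenByFunction };
}
```

Note that code created with new Function() can still reach every global, so this only narrows the scope; it is not a sandbox for untrusted input.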
For a more sophisticated approach, try creating your own parser; see https://stackoverflow.com/a/2630085/481422
Search for the comment // ECMAScript parser in https://github.com/douglascrockford/JSLint/blob/master/jslint.js
You could try something like this:
Assuming you have an input string like this:
'{{floating(-0.5, 0.5)}}'
And all your actual functions are referenced in an object, like this:
var myFunctions = {
'index': function(){/* Do stuff */},
'firstName': function(){}
}
Then, this should work:
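function parse(input){
  // Strip the delimiters, then split the name from the argument list.
  var temp = input.replace('{{','').replace(')}}','').split('('),
      fn = temp[0],
      args = temp[1].split(','); // every argument is still a string here
  return myFunctions[fn].apply(this, args);
}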
Please note that this only works for simple function calls that don't have functions nested as their arguments. It also passes all arguments as strings, instead of the types that may be intended (Numbers, booleans, etc).
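If the arguments are limited to JSON-style literals (numbers, booleans, double-quoted strings), one possible way around the string-only limitation is to wrap the argument text in square brackets and hand it to JSON.parse. The sketch below does that. The render name, the COMMAND pattern and the placeholder function bodies are assumptions for illustration, and nested calls are still not supported.

```javascript
// Placeholder command implementations, for illustration only.
const myFunctions = {
  floating: function (min, max) { return (min + max) / 2; },
  text: function (count, unit) { return count + ' ' + unit; },
  firstName: function () { return 'Ada'; },
  lastName: function () { return 'Lovelace'; }
};

// Matches {{name(args)}} where args contains no parentheses at all,
// so nested calls (and strings containing parentheses) are not matched.
const COMMAND = /\{\{\s*([A-Za-z_]\w*)\(([^()]*)\)\s*\}\}/g;

function render(input) {
  return input.replace(COMMAND, function (whole, name, argText) {
    if (!Object.prototype.hasOwnProperty.call(myFunctions, name)) {
      return whole; // leave unknown commands untouched
    }
    // '[' + '-0.5, 0.5' + ']' parses to the array [-0.5, 0.5].
    // JSON.parse throws a SyntaxError if the arguments are not valid JSON.
    const args = JSON.parse('[' + argText + ']');
    return String(myFunctions[name].apply(undefined, args));
  });
}
```

Because JSON.parse understands quoting, a comma inside a double-quoted string no longer splits the argument in two, and numbers arrive as numbers rather than strings.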
If you want to handle more complex strings, you'll need to use a proper parser or template engine, as @T.J. Crowder suggested in the comments.
