Creating a Basic Formula Editor in JavaScript

Creating a Basic Formula Editor in JavaScript - javascript

I'm working on creating a basic RPG game engine prototype using JavaScript and canvas. I'm still working out some design specs on paper, and I've hit a bit of a problem I'm not quite sure how to tackle.
I will have a Character object that will have an array of Attribute objects. Attributes will look something like this:
function(name, value){
this.name = name;
this.value = value;
...
}
A Character will also have "skills" that are calculated off attributes. A skills value can also be determined by a formula entered by the user. A legit formula would look something like this:
((#attribute1Name + (#attribute2Name / 2) * 5)
where any text following the # sign represents the name of an attribute belonging to that character. The formula will be entered into a text field as a string.
What I'm having a problem with is understanding the proper way to parse and evaluate this formula. Initially, my plan was to do a simple replace on the attribute names and eval the expression (if invalid, the eval would fail). However, this presents a problem as it would allow for JavaScript injection into the field. I'm assuming I'll need some kind of FSM similar to an infix calculator to solve this, but I'm a little rusty on my computation theory (thanks corporate world!). I'm really not asking for someone to just hand me the code so much as I'd like to get your input on what is the best solution to this problem?
EDIT: Thanks for the responses. Unfortunately life has kept me busy and I haven't tried a solution yet. Will update when I get a result (good or bad).

Different idea, hence a separate suggestion:
eval() works fine, and there's no need to re-invent the wheel.
Assuming that there's only a small and fixed number of variables in your formula language, it would be sufficient to scan your way through the expression and verify that everything you encounter is either a parenthesis, an operator or one of your variable names. I don't think there would be any way to assemble those pieces into a piece of code that could have malicious side effects on eval.
So:
Scan the expression to verify that it draws from just a very limited vocabulary.
Let eval() work it out.
Probably the compromise with the least amount of work and code while bringing risk down to (near?) 0. At worst, a misuser could tack parentheses on a variable name in an attempt to execute the variable.

I think instead of letting them put the whole formula in, you could have select tags that have operations and values, and let them choose.
ie. a set of tags with attribute-operation-number:
<select> <select> <input type="text">
#attribute1Name1 + (check if input is number)
#attribute1Name2 -
#attribute1Name3 *
#attribute1Name4 /
etc.

There is a really simple solution: Just enter a normal JavaScript formula (i.e. as if you were writing a method for your object) and use this to reference the object you're working on.
To change this when evaluating the method use apply() or call() (see this answer).

I recently wrote a similar application. I probably invested far too much work, but I went the whole 9 yards and wrote both a scanner and a parser.
The scanner converted the text into a series of tokens; tokens are simple objects consisting of token type and value. For the punctuation marks, value = character, for numbers the values would be integers corresponding to the numeric value of the number, and for variables it would be (a reference to) a variable object, where that variable would be sitting in a list of objects having a name. Same variable object = same variable, natch.
The parser was a simple brute force recursive descent parser. Here's the code.
My parser does logic expressions, with AND/OR taking the place of +/-, but I think you can see the idea. There are several levels of expressions, and each tries to assemble as much of itself as it can, and calls to lower levels for parsing nested constructs. When done, my parser has generated a single Node containing a tree structure that represents the expression.
In your program, I guess you could just store that Node, as its structure will essentially represent the formula for its evaluation.
Given all that work, though, I'd understand just how tempting it would be to just cave in and use eval!

I'm fascinated by the task of getting this done by the simplest means possible.
Here's another approach:
Convert infix to postfix;
use a very simple stack-based calculator to evaluate the resulting expression.
The rationale here being, once you get rid of the complication of "* before +" and parentheses, the remaining calculation is very straightforward.

You could look at running the user-defined code in a sandbox to prevent attacks:
Is It Possible to Sandbox JavaScript Running In the Browser?

Related

In a stringified array is it possible to differentiate between quotes that were in a string and those that surrounded the string itself?

Some Context:
• I'm still learning to code atm (started less than a year ago)
• I'm mostly self taught at that since I think my computer science class feels
too slow.
• The website I'm learning on is code.org, specifically in the "game lab"
• The site's coding environments only use ES5 because they don't want to
update them to ES6 or something like that
• In class we're making function libraries and while not required, I want
mine to be "highly usable," for lack of a better term, while also being
reasonably short (prefer not to automate things if I can get them done
quicker somehow, but that's just personal preference).
So now for where the actual question comes in: in a stringified array, is it possible to differentiate between a quotation mark that was inside a string and a quotation mark that actually denotes a string? Because I noticed something confusing with the output of JSON.parse(JSON.stringify()) on code.org, specifically, if you write something like,
JSON.parse(JSON.stringify(['hi","hi']))
the output will be ["hi","hi"] which looks just like an array containing two strings (on code.org it doesn't show the \'s), but still contains just one, which is fine unless you're using a regular expression to detect whether or not a match is within a string (if every quotation mark after the match has a "partner"), which is what I'm doing in 4 different functions. One flattens a list (since ES5 doesn't have Array.prototype.flat()), one removes all instances of the arguments from a list, one removes all instances of specified operand types, and one replaces all instances of an argument with the one that follows it.
Now I know the odds of a string containing an odd number of quotation marks (whether single or double) is likely extremely low, but it still bothers me that not having a way to differentiate between quotes formerly within a string and quotes which formerly denoted a string (in an array after it's been stringified) as these functions otherwise function exactly as intended. The regular expression I'm using to determine if there's an even number of quotes left in the stringified array is /(?=[^"]*(?:(?:"[^"]*){2})*$)/ where you put the match before the lookahead assertion and anything you absolutely want to follow before the first [^"]*.
To highlight the actual issue I'm trying to solve, this is my flatten function (since it's the shortest of the 4), and yeah, yeah, I know "eval bad" but it's extremely convenient to use here since it shortens the actual modification into a single line, and I highly doubt anyone's actually going to find a way to abuse it given its implementation ("this" needs to be an array for splice to work, so if I'm not mistaken, there isn't really a way to abuse it, but tell me if I'm wrong, since I probably am).
Array.prototype.flatten = function() {
eval(('this.splice(0,this.length,' + JSON.stringify(this).replace(/[\[\]](?=[^"]*(?:(?:"[^"]*){2})*$)/g, '') + ')').replace(/,(?=((,[^"]*(?:(?:"[^"]*){2})*)*.$))/g, ''));
return this;
};
This works really well outside of the previously specified conditions, but if I were to call it with something like [1,'"'] it'd find 3 quotation marks after the \[ and wouldn't be able to remove it but would be able to remove the \], thus when eval actually gets to .splice(), it would look like eval('this.splice(0,this.length,[1,"\"")') causing the error Unexpected token ')' to be thrown
Any help on this is appreciated, even if it's just telling me it isn't possible, thanks for reading my ramblings.
TL;DR: in a stringified array is it possible to differentiate between " and \" (string wrapping quotes of strings within a stringified array and quotes within a string within a stringified array) in a regular expression or any other method using only the tools available in ES5 (site I'm learning on doesn't want to update their project environments for whatever reason)

You are having a problem because your input is not a context free grammar and can not be correctly parsed with regular expressions.
Can you explain why JSON.parse is unacceptable? It is even in ancient browsers and versions of node.js.
Someone writing a json parser might use bison or yacc, so if this is a learning experience consider playing with jison.

I ended up finding a way to do this, for whatever reason (either I didn't notice last night because I was tired or it legitimately changed overnight, though likely the former) I can now see the " when viewing the value of the the stringified array, and lo and behold modifying the regular expression so that it ignored instances of " resolved the issue.
New regular expression for quotation mark pair matching now reads:
// old even number of quotation marks after match check
/(?=[^"]*(?:(?:"[^"]*){2})*$)/
// new even number of quotation marks after match check
/(?=(\\"|[^"])*(?:(?:(?<!\\)"(\\"|[^"])*){2})*$)/
// (only real difference is that it accounts for the \)
Sorry for anyone who may have misunderstood the question due to how all over the place it was, I'm aware that I tend to end up writing a lot more than is necessary and it often leads to tangents that muddle my view of what I was initially asking, which in turn makes the point I'm actually trying to get across even harder to grasp at. Thanks to those who still tried to help me regardless of how much of a mess of a first question this was.

RegExp for parsing a Math Expression?

Hey I've written a fractal-generating program in JavaScript and HTML5 (here's the link), which was about a 2 year process including all the research I did on Complex math and fractal equations, and I was looking to update the interface, since it is quite intimidating for people to look at. While looking through the code I noticed that some of my old techniques for going about doing things were very inefficient, such as my Complex.parseFunction.
I'm looking for a way to use RegExp to parse components of the expression such as functions, operators, and variables, as well as implementing the proper order of operations for the expression. An example below might demonstrate what I mean:
//the first example parses an expression with two variables and outputs to string
console.log(Complex.parseFunction("i*-sinh(C-Z^2)", ["Z","C"], false))
"Complex.I.mult(Complex.neg(Complex.sinh(C.sub(Z.cPow(new Complex(2,0,2,0))))))"
//the second example parses the same expression but outputs to function
console.log(Complex.parseFunction("i*-sinh(C-Z^2)", ["Z","C"], true))
function(Z,C){
return Complex.I.mult(Complex.neg(Complex.sinh(C.sub(Z.cPow(new Complex(2,0,2,0))))));
}
I know how to handle RegExp using String.prototype.replace and all that, all I need is the RegExp itself. Please note that it should be able to tell the difference between the subtraction operator (e.g. "C-Z^2") and the negative function (e.g. "i*-(Z^2+C)") by noting whether it is directly after a variable or an operator respectively.

While you can use regular expressions as part of an expression parser, for example to break out tokens, regular expressions do not have the computational power to parse properly nested mathematical expressions. That is essentially one of the core results of computing theory (finite state automata vs. push down automata). You probably want to look at something like recursive-descent or LR parsing.
I also wouldn't worry too much about the efficiency of parsing an expression provided you only do it once. Given all of the other math you are doing, I doubt it is material.

JavaScript to evaluate simple math string like 5*1.2 (eval/white-list?)

I have an input onchange that converts numbers like 05008 to 5,008.00.
I am considering expanding on this, to allow simple calculations. For example, 45*5 would be converted automatically to 225.00.
I could use a character white-list ()+/*-0123456789., and then pass the result to eval, I think that these characters are safe to prevent any dangerous injections. That is assuming I use an appropriate try/catch, because a syntax error could be created.
Is this an OK white-list, and then pass it to eval?
Do recommend a revised white-list
Do you recommend a different approach (maybe there is already a function that does this)
I would prefer to keep it lightweight. That is why I like the eval/white-list approach. Very little code.
What do you recommend?

That whitelist looks safe to me, but it's not such a simple question. In some browsers, for example, an eval-string like this:
/.(.)/(34)
is equivalent to this:
new RegExp('.(.)').exec('34')
and therefore returns the array ['34','4']. Is that "safe"?
So while the approach can probably be made to work safely, it might be a very tricky proposition. If you do go forward with this idea, I think you should use a much more aggressive approach to validate your inputs. Your principle should be "this is a member of a well-defined set of strings that is known to be 'safe'" rather than "this is a member of an ill-defined set of strings that excludes all strings known to be 'unsafe'". Furthermore, to avoid any risk of operators peeking through that you hadn't considered (such as ++ or += or whatnot), I think you should insert a space in front of every non-digit-non-dot character; and to avoid any risk of parentheses triggering a function call, I think you should handle them yourself by repeatedly replacing (...) with a space plus the result of evaluating ... (after confirming that that result is a number) plus a space.
(By the way, how come = is in your whitelist? I just can't figure out what that's useful for!)

Given that extremely restrictive whitelist, I can't see any way of performing a malicious action beyond throwing an exception. The bracket trick won't work since it requires square brackets [].
Perhaps the safest option is to modify your page's default values parser to only accept numbers and throw out anything else. That way, potentially malicious code in a link will never make it to eval.
This only leaves the possibility of the user typing something malicious into a field, but why even bother worrying about that? The user already has access to a console (Dev Tools) they could use to execute arbitrary code.

An often overlooked issue with eval is that it causes problems for javascript minifiers.
Some minifiers like YUI take the safe route and stop renaming variables as soon as they see an eval statement. This means your javascript will work but your compressed file will be larger than it needs to be.
Other's like Google Closure Compiler will continue to rename variables but if you are not careful they can break your code. You should avoid passing strings with variable names in it to eval. so for example.
var input = "1+2*3";
var result = eval("input"); // unsafe
var result = eval(input); // safe

Use of letters for doing matrix math in Javascript

I'm doing a course in Quantum Computation. In it, we represent possible actions, or operators, by matrices. I've been looking into creating a webpage for solving these maths problems.
It is also a small challenge for myself in order to freshen up my JS.
I've been looking at various options, like Sylvester, MathJax and MathML.
Problem: However, none of the above appear to give functionality for using letters throughout my computation.
For instance, in Quantum Computation we often use multiply a matrix containing unknowns alpha and beta, with other matrices.
This is the sort of math I need to do:
http://i.stack.imgur.com/vH9Dk.gif
Ideally, I'd write this in the style of:
M=[[a],[b]], which of course, I cannot. Further, I'd be able to multiply to get "2*a" etc.
Any suggestions?

As suggested in the comments on the question, you could use strings. Then you just have to write your own matrix-matrix multiplication routine which will understand the difference between an entry containing a string and an entry containing a number.
However, as soon as you do more than one of these, you'll end up with expressions as well as variables and numbers. So we can generalise this to make every element be an expression. This is the beginnings of a symbolic algebra system as #High Performance Mark pointed out.
In javascript, I would guess that you want a set of expression objects, each implementing an interface including a method that returns whether the expression is determined or not yet. The gnarly bit is simplifying the resulting expressions to resolve the values of the variables.
Alternatively, do a bit more maths beforehand; move the variables out of the equations, and then let the code do the calculation.

Syntax / Logical checker In Javascript?

I'm building a solution for a client which allows them to create very basic code,
now i've done some basic syntax validation but I'm stuck at variable verification.
I know JSLint does this using Javascript and i was wondering if anyone knew of a good way to do this.
So for example say the user wrote the code
moose = "barry"
base = 0
if(moose == "barry"){base += 100}
Then i'm trying to find a way to clarify that the "if" expression is in the correct syntax, if the variable moose has been initialized etc etc
but I want to do this without scanning character by character,
the code is a mini language built just for this application so is very very basic and doesn't need to manage memory or anything like that.
I had thought about splitting first by Carriage Return and then by Space but there is nothing to say the user won't write something like moose="barry" or if(moose=="barry")
and there is nothing to say the user won't keep the result of a condition inline.
Obviously compilers and interpreters do this on a much more extensive scale but i'm not sure if they do do it character by character and if they do how have they optimized?
(Other option is I could send it back to PHP to process which would then releave the browser of responsibility)
Any suggestions?
Thanks
The use case is limited, the syntax will never be extended in this case, the language is a simple scripted language to enable the client to create a unique cost based on their users input the end result will be processed by PHP regardless to ensure the calculation can't be adjusted by the end user and to ensure there is some consistency.
So for example, say there is a base cost of £1.00
and there is a field on the form called "Additional Cost", the language will allow them manipulate the base cost relative to the "additional cost" field.
So
base = 1;
if(additional > 100 && additional < 150){base += 50}
elseif(additional == 150){base *= 150}
else{base += additional;}
This is a basic example of how the language would be used.
Thank you for all your answers,
I've investigated a parser and creating one would be far more complex than is required
having run several tests with 1000's of lines of code and found that character by character it only takes a few seconds to process even on a single core P4 with 512mb of memory (which is far less than the customer uses)
I've decided to build a PHP based syntax checker which will check the information and convert the variables etc into valid PHP code whilst it's checking it (so that it's ready to be called later without recompilation) using this instead of javascript this seems more appropriate and will allow for more complex code to arise without hindering the validation process
It's only taken an hour and I have code which is able to check the validity of an if statement and isn't confused by nested if's, spaces or odd expressions, there is very little left to be checked whereas a parser and full blown scripting language would have taken a lot longer
You've all given me a lot to think about and i've rated relevant answers thank you

If you really want to do this — and by that I mean if you really want your software to work properly and predictably, without a bunch of weird "don't do this" special cases — you're going to have to write a real parser for your language. Once you have that, you can transform any program in your language into a data structure. With that data structure you'll be able to conduct all sorts of analyses of the code, including procedures that at least used to be called use-definition and definition-use chain analysis.
If you concoct a "programming language" that enables some scripting in an application, then no matter how trivial you think it is, somebody will eventually write a shockingly large program with it.
I don't know of any readily-available parser generators that generate JavaScript parsers. Recursive descent parsers are not too hard to write, but they can get ugly to maintain and they make it a little difficult to extend the syntax (esp. if you're not very experienced crafting the original version).

You might want to look at JS/CC which is a parser generator that generates a parser for a grammer, in Javascript. You will need to figure out how to describe your language using a BNF and EBNF. Also, JS/CC has its own syntax (which is somewhat close to actual BNF/EBNF) for specifying the grammar. Given the grammer, JS/CC will generate a parser for that grammar.
Your other option, as Pointy said, is to write your own lexer and recursive-descent parser from scratch. Once you have a BNF/EBNF, it's not that hard. I recently wrote a parser from an EBNF in Javascript (the grammar was pretty simple so it wasn't that hard to write one YMMV).
To address your comments about it being "client specific". I will also add my own experience here. If you're providing a scripting language and a scripting environment, there is no better route than an actual parser.
Handling special cases through a bunch of if-elses is going to be horribly painful and a maintenance nightmare. When I was a freshman in college, I tried to write my own language. This was before I knew anything about recursive-descent parsers, or just parsers in general. I figured out by myself that code can be broken down into tokens. From there, I wrote an extremely unwieldy parser using a bunch of if-elses, and also splitting the tokens by spaces and other characters (exactly what you described). The end result was terrible.
Once I read about recursive-descent parsers, I wrote a grammar for my language and easily created a parser in a 10th of the time it took me to write my original parser. Seriously, if you want to save yourself a lot of pain, write an actual parser. If you go down your current route, you're going to be fixing issues forever. You're going to have to handle cases where people put the space in the wrong place, or perhaps they have one too many (or one too little) spaces. The only other alternative is to provide an extremely rigid structure (i.e, you must have exactly x number of spaces following this statement) which is liable to make your scripting environment extremely unattractive. An actual parser will automatically fix all these problems.

Javascript has a function 'eval'.
var code = 'alert(1);';
eval(code);
It will show alert. You can use 'eval' to execute basic code.

Develop Reference

JavaScript is the programming language of the Web.