I want a user to enter a string into a form (or use a js interface to build the string for them) which can then be logically evaluated server side.
For example: The event admin writes into a text area "((36&&37)||(37&&38))&&~42" which goes into a fee_option table. A registrant picked options 37 and 38. The registrant is then charged a certain fee because the expression evaluates to true. I can see a basic option of replacing each number with the result of in_array($one_number,$options_selected_by_user), running that through some kind of regex security check, and then sticking that into an eval($str), but this is really just a dumb example.
My questions are then:
Is there a conventional syntax for writing expressions to be evaluated apart from writing code to stick into an eval statement?
Is there a PHP library, script, or even blog post that provides help when writing such a thing? Put another way: does everyone write user-entered conditional statements for form builders and such from scratch every time?
Is there a JavaScript library specifically designed to build logical statements via a user interface, i.e. grouping ors, ands, and nots, probably with a little math too?
Symfony Expression Language does a great job for this task. Give it a try.
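For an expression syntax as small as the one in the question, a hand-written evaluator is also an option. Here is a minimal sketch, written in JavaScript for illustration (the same structure should port to PHP on the server). The function name evaluateOptionExpression and the choice of ~ as "not" are assumptions taken from the example string, not from any library. It never calls eval; every number is simply looked up in the set of selected options.

```javascript
// Hypothetical helper: evaluates expressions such as "((36&&37)||(37&&38))&&~42"
// against the option ids a registrant selected. No eval is involved.
function evaluateOptionExpression(expression, selectedOptions) {
  const selected = new Set(selectedOptions.map(Number));
  // Tokens: option ids, &&, ||, ~ and parentheses. Any other non-space
  // character becomes its own token and is rejected by the parser below.
  const tokens = expression.match(/\d+|&&|\|\||[~()]|\S/g) || [];
  let position = 0;
  const peek = () => tokens[position];
  const next = () => tokens[position++];

  // or := and ('||' and)*
  function parseOr() {
    let value = parseAnd();
    while (peek() === '||') {
      next();
      const right = parseAnd(); // always parse the right side, then combine
      value = value || right;
    }
    return value;
  }

  // and := not ('&&' not)*
  function parseAnd() {
    let value = parseNot();
    while (peek() === '&&') {
      next();
      const right = parseNot();
      value = value && right;
    }
    return value;
  }

  // not := '~' not | primary
  function parseNot() {
    if (peek() === '~') {
      next();
      return !parseNot();
    }
    return parsePrimary();
  }

  // primary := NUMBER | '(' or ')'
  function parsePrimary() {
    const token = next();
    if (token === '(') {
      const value = parseOr();
      if (next() !== ')') throw new SyntaxError('Expected ")"');
      return value;
    }
    if (/^\d+$/.test(token)) return selected.has(Number(token));
    throw new SyntaxError(`Unexpected token: ${token}`);
  }

  const result = parseOr();
  if (position < tokens.length) throw new SyntaxError(`Unexpected token: ${peek()}`);
  return result;
}
```

With options 37 and 38 selected, the example expression evaluates to true; adding option 42 makes it false. Anything outside the tiny grammar (letters, function calls) throws a SyntaxError instead of being executed.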
This may be a possible duplicate of this question here, but it doesn't really address and answer my question in a way that I (stupid-head) can understand.
OK, I've got a web page form as seen in my previous question. Before using $txtpost in a MySQL query, I now added $txtpost = htmlentities($txtpost, ENT_QUOTES);, which should protect me from XSS attacks but, as a user points out on php.net, won't protect me from JavaScript injections. That said, how can I prevent such JavaScript injections? As you can see in the code from the previous question, I don't know what exactly will be entered into the text field, so I can't only allow specific values. Note that all code from the previous question, which was wrong, is now repaired and it all works fine at the moment.
Well, it is true that you won't be protected from people putting HTML into your database.
First of all
$txtpost = htmlentities($txtpost, ENT_QUOTES);
Will escape quotes, making an SQL injection less likely. But I can still do OR 1 = 1, which makes every statement true. Modern code relies on prepared statements (see How to replace MySQL functions with PDO?).
If you read the above you'll see a PDO example of a prepared statement. You can also do this with MySQLi. It prevents SQL injection.
Second:
Yes, I can still put things like
XSS
Into your database. You should define which elements you allow into your database by using a sanitizing function. PHP gives you several:
filter_input: Allows you to filter and sanitize certain input.
strip_tags: allows you to strip all tags and/or use a white list of tags you do want to allow.
htmlspecialchars: converts all special characters into entities. Like " to &quot;.
The conclusion is that you need to be in control. You decide what goes onto your page. So if you want to be safe you can filter everything and put it on your page as plain text. For safety I recommend sanitizing three times: before the data is posted, when it is passed on to the database, and again when it is put onto the page. This way you minimize the danger of an injection.
I'm working on a project and one of the requirements is that users can write their own JavaScript code in a simple text area component.
This is easy, but I have to validate the syntax, something like 'error: missing ; at the end'... like most syntax checkers do.
I don't want to develop it, because it would take a lot of time.
Does anybody know if a plugin exists for that?
I found one called JavaScript Lint, but it is an .exe file and it doesn't have native integration with Java (it's a Java EE project, JSP files, etc.)
Thanks for the help!
Here is a simple top-down parser: https://github.com/douglascrockford/TDOP/blob/master/parse.js.
A more complicated parser is JSLint: https://github.com/douglascrockford/JSLint. JSLint is half parser, half C-style "lint" tool (for checking for common mistakes), but you can just make use of the parser half by not reporting 'lint' results to your users. You can also turn off all of the "lint" checks.
The major difference is that JSLint will do things like checking that variables are defined and in scope and checking for other mistakes/common bad practices.
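If only a plain "does it parse?" check is needed in the browser, the engine's own parser can be used. A minimal sketch, assuming the check may run client-side; checkSyntax is a hypothetical helper name. new Function() compiles the text as a function body without running it and throws a SyntaxError when it does not parse. Note the limits: the message wording is engine-specific, a top-level return is accepted because the text is treated as a function body, and a missing semicolon is usually not reported because automatic semicolon insertion makes it valid (that kind of warning is what a lint tool adds).

```javascript
// Hypothetical helper: reports whether source text parses as JavaScript.
// new Function() only compiles the text; the code is never executed here.
function checkSyntax(source) {
  try {
    new Function(source);
    return { valid: true, message: null };
  } catch (error) {
    if (error instanceof SyntaxError) {
      return { valid: false, message: error.message };
    }
    throw error; // e.g. an EvalError when a content security policy blocks it
  }
}

const good = checkSyntax('var total = 1 + 2;');
const bad = checkSyntax('var total = ;');
console.log(good.valid, bad.valid, bad.message);
```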
If the data is not critical, you can ask users to paste their data into http://www.jslint.com/ (Doug Crockford's site) and it displays the errors within the JSON.
We had used YUI in our JS application, so we used YAHOO.lang.JSON.parse(your textarea's JSON content) to validate the user's JSON. However, we were never able to give them an exact list of errors within the JSON; we could only tell them whether it was valid JSON.
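The same valid/invalid check can be sketched with the JSON.parse that is built into current JavaScript engines, so no YUI is needed. validateJson is a hypothetical helper name. As described above, this yields a single engine-specific error message rather than a full list of errors.

```javascript
// Hypothetical helper: tells whether a string is valid JSON.
function validateJson(text) {
  try {
    JSON.parse(text);
    return { valid: true, message: null };
  } catch (error) {
    // JSON.parse stops at the first problem, so only one message is available.
    return { valid: false, message: error.message };
  }
}

console.log(validateJson('{"name": "value"}').valid);
console.log(validateJson('{name: "value"}').valid);
```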
Thanks
I have heard so many bad things about eval that I've never even tried to use it. However today I have a situation where it seems to be the right answer.
I need a script that can do simple calculations by combining variables. For example, if value=5 and max=8, I want to evaluate value*100/max. Both the values and the formulas will be retrieved from external sources, which is why I am concerned with eval.
I have set up a jsfiddle demo with some sample code:
http://jsfiddle.net/6yzgA/
The values are converted to numbers using parseFloat, so I believe I'm pretty safe here. The characters in the formula are matched against this regular expression:
regex=/[^0-9\.+-\/*<>!=&()]/, // allows numbers (including decimal), operations, comparison
My questions:
Does my regex filter protect me from any attack?
Is there any reason to use eval vs. new Function in this case?
Is there another, safer way to evaluate formulas?
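One detail of the regular expression above is worth checking: inside a character class, +-\/ is read as a range from + to /, which also covers the comma. The sketch below compares the pattern as posted with a variant where the hyphen is escaped; the names originalPattern, explicitPattern and isAllowed are made up for this illustration.

```javascript
// The pattern as posted: "+-\/" is a range (+ , - . /), so "," is allowed too.
const originalPattern = /[^0-9\.+-\/*<>!=&()]/;
// Variant with the hyphen escaped, so only the listed characters are allowed.
const explicitPattern = /[^0-9.+\-\/*<>!=&()]/;

// A formula passes when it contains no character outside the allowed set.
function isAllowed(formula, disallowedPattern) {
  return !disallowedPattern.test(formula);
}

console.log(isAllowed('1,2', originalPattern)); // comma slips through
console.log(isAllowed('1,2', explicitPattern)); // comma is rejected
console.log(isAllowed('5*100/8', explicitPattern));
```

Neither pattern allows letters, so identifiers and function names are rejected either way; the difference is only in the punctuation that gets through.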
Since you aren't sending anything to your server, or using anything on anyone else's system, the worst that can happen is that the user crashes his own browser, nothing more. There is nothing unsafe about using eval here, since everything happens user-side.
Escaping and preventing anything on the client-side doesn't make sense at all. User can alter any piece of JS code and run it just as easy as I can change the jsfiddle you posted. Trust me, it's just that simple and you cannot rely on the client-side security.
If you remember to escape input fields on the server-side it's nothing to be worried about. There are plenty of functions for that by default, depending on which language you're using.
If a user wants to type in <script>haxx(l33t);</script>, let him do it. Just remember to escape special characters so you'll have &lt;script&gt;haxx(l33t);&lt;/script&gt;.
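Most server-side languages ship a function for this (htmlspecialchars in PHP, for example). To show what such escaping does, here is a minimal sketch in JavaScript; escapeHtml is a hypothetical helper, not a built-in.

```javascript
// Hypothetical helper: replaces the characters that are special in HTML
// with their entities so user input is displayed as text, not run as markup.
function escapeHtml(text) {
  const entities = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  };
  return String(text).replace(/[&<>"']/g, (character) => entities[character]);
}

console.log(escapeHtml('<script>haxx(l33t);</script>'));
```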
I'm building a solution for a client which allows them to create very basic code.
I've done some basic syntax validation, but I'm stuck at variable verification.
I know JSLint does this using JavaScript and I was wondering if anyone knew of a good way to do this.
So for example say the user wrote the code
moose = "barry"
base = 0
if(moose == "barry"){base += 100}
Then I'm trying to find a way to verify that the "if" expression has the correct syntax, that the variable moose has been initialized, and so on,
but I want to do this without scanning character by character.
The code is a mini language built just for this application, so it is very basic and doesn't need to manage memory or anything like that.
I had thought about splitting first by carriage return and then by space, but there is nothing to say the user won't write something like moose="barry" or if(moose=="barry"),
and there is nothing to say the user won't keep the result of a condition inline.
Obviously compilers and interpreters do this on a much more extensive scale, but I'm not sure whether they do it character by character and, if they do, how they have optimized it.
(The other option is to send it back to PHP to process, which would then relieve the browser of responsibility.)
Any suggestions?
Thanks
The use case is limited and the syntax will never be extended. The language is a simple scripting language that lets the client create a unique cost based on their users' input. The end result will be processed by PHP regardless, to ensure the calculation can't be adjusted by the end user and to ensure there is some consistency.
So for example, say there is a base cost of £1.00
and there is a field on the form called "Additional Cost", the language will allow them manipulate the base cost relative to the "additional cost" field.
So
base = 1;
if(additional > 100 && additional < 150){base += 50}
elseif(additional == 150){base *= 150}
else{base += additional;}
This is a basic example of how the language would be used.
Thank you for all your answers,
I've investigated a parser, and creating one would be far more complex than is required.
I ran several tests with thousands of lines of code and found that processing character by character takes only a few seconds, even on a single-core P4 with 512 MB of memory (which is far less than the customer uses).
I've decided to build a PHP-based syntax checker which will check the input and convert the variables etc. into valid PHP code while it's checking (so that it's ready to be called later without recompilation). Using PHP instead of JavaScript seems more appropriate and will allow for more complex code to arise without hindering the validation process.
It's only taken an hour and I have code which is able to check the validity of an if statement and isn't confused by nested ifs, spaces or odd expressions. There is very little left to be checked, whereas a parser and full-blown scripting language would have taken a lot longer.
You've all given me a lot to think about and I've rated relevant answers. Thank you.
If you really want to do this — and by that I mean if you really want your software to work properly and predictably, without a bunch of weird "don't do this" special cases — you're going to have to write a real parser for your language. Once you have that, you can transform any program in your language into a data structure. With that data structure you'll be able to conduct all sorts of analyses of the code, including procedures that at least used to be called use-definition and definition-use chain analysis.
If you concoct a "programming language" that enables some scripting in an application, then no matter how trivial you think it is, somebody will eventually write a shockingly large program with it.
I don't know of any readily-available parser generators that generate JavaScript parsers. Recursive descent parsers are not too hard to write, but they can get ugly to maintain and they make it a little difficult to extend the syntax (esp. if you're not very experienced crafting the original version).
You might want to look at JS/CC, which is a parser generator that generates a parser for a grammar, in JavaScript. You will need to figure out how to describe your language using BNF or EBNF. Also, JS/CC has its own syntax (which is somewhat close to actual BNF/EBNF) for specifying the grammar. Given the grammar, JS/CC will generate a parser for that grammar.
Your other option, as Pointy said, is to write your own lexer and recursive-descent parser from scratch. Once you have a BNF/EBNF, it's not that hard. I recently wrote a parser from an EBNF in JavaScript (the grammar was pretty simple, so it wasn't that hard to write one; YMMV).
To address your comments about it being "client specific". I will also add my own experience here. If you're providing a scripting language and a scripting environment, there is no better route than an actual parser.
Handling special cases through a bunch of if-elses is going to be horribly painful and a maintenance nightmare. When I was a freshman in college, I tried to write my own language. This was before I knew anything about recursive-descent parsers, or just parsers in general. I figured out by myself that code can be broken down into tokens. From there, I wrote an extremely unwieldy parser using a bunch of if-elses, and also splitting the tokens by spaces and other characters (exactly what you described). The end result was terrible.
Once I read about recursive-descent parsers, I wrote a grammar for my language and easily created a parser in a tenth of the time it took me to write my original parser. Seriously, if you want to save yourself a lot of pain, write an actual parser. If you go down your current route, you're going to be fixing issues forever. You're going to have to handle cases where people put the space in the wrong place, or perhaps they have one too many (or one too few) spaces. The only other alternative is to provide an extremely rigid structure (i.e., you must have exactly x number of spaces following this statement), which is liable to make your scripting environment extremely unattractive. An actual parser will automatically fix all these problems.
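To make the lexer step concrete, here is a minimal sketch of a tokenizer for the mini language in the question, plus a very rough "used before assigned" check. Everything in it (TOKEN_PATTERN, tokenize, findUninitialized, the keyword list) is an assumption made for illustration. The check only looks at token order, so it does not understand branches or a variable used on the right-hand side of its own first assignment; a real parser would do that on the tree.

```javascript
// One sticky regular expression per token kind: number, string, identifier, operator.
// Leading whitespace (including newlines) is skipped, so spacing no longer matters.
const TOKEN_PATTERN =
  /\s*(?:(\d+(?:\.\d+)?)|("(?:[^"\\]|\\.)*")|([A-Za-z_]\w*)|(==|!=|<=|>=|\+=|\*=|&&|\|\||[-+*\/<>=!(){};]))/y;

function tokenize(source) {
  const tokens = [];
  let position = 0;
  while (position < source.length) {
    TOKEN_PATTERN.lastIndex = position;
    const match = TOKEN_PATTERN.exec(source);
    if (!match) {
      if (source.slice(position).trim() === '') break; // only trailing whitespace left
      throw new SyntaxError(`Unexpected character at offset ${position}`);
    }
    const [, number, string, identifier, operator] = match;
    if (number !== undefined) tokens.push({ type: 'number', value: Number(number) });
    else if (string !== undefined) tokens.push({ type: 'string', value: string.slice(1, -1) });
    else if (identifier !== undefined) tokens.push({ type: 'identifier', value: identifier });
    else tokens.push({ type: 'operator', value: operator });
    position = TOKEN_PATTERN.lastIndex;
  }
  return tokens;
}

// Rough check: an identifier directly followed by "=" counts as a definition;
// any other identifier must already be defined (or be a predefined form field).
function findUninitialized(tokens, predefined = []) {
  const keywords = new Set(['if', 'elseif', 'else']);
  const defined = new Set(predefined);
  const problems = [];
  tokens.forEach((token, index) => {
    if (token.type !== 'identifier' || keywords.has(token.value)) return;
    const following = tokens[index + 1];
    const isAssignment =
      following !== undefined && following.type === 'operator' && following.value === '=';
    if (isAssignment) defined.add(token.value);
    else if (!defined.has(token.value)) problems.push(token.value);
  });
  return problems;
}

const program = 'moose="barry"\nbase = 0\nif(moose=="barry"){base += 100}';
console.log(findUninitialized(tokenize(program)));
```

Both moose="barry" and moose = "barry" produce the same three tokens, which is the point of tokenizing instead of splitting on spaces.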
Javascript has a function 'eval'.
var code = 'alert(1);';
eval(code);
It will show alert. You can use 'eval' to execute basic code.
I'm working on creating a basic RPG game engine prototype using JavaScript and canvas. I'm still working out some design specs on paper, and I've hit a bit of a problem I'm not quite sure how to tackle.
I will have a Character object that will have an array of Attribute objects. Attributes will look something like this:
function Attribute(name, value) {
    this.name = name;
    this.value = value;
    ...
}
A Character will also have "skills" that are calculated off attributes. A skill's value can also be determined by a formula entered by the user. A legit formula would look something like this:
((#attribute1Name + (#attribute2Name / 2)) * 5)
where any text following the # sign represents the name of an attribute belonging to that character. The formula will be entered into a text field as a string.
What I'm having a problem with is understanding the proper way to parse and evaluate this formula. Initially, my plan was to do a simple replace on the attribute names and eval the expression (if invalid, the eval would fail). However, this presents a problem as it would allow for JavaScript injection into the field. I'm assuming I'll need some kind of FSM similar to an infix calculator to solve this, but I'm a little rusty on my computation theory (thanks corporate world!). I'm really not asking for someone to just hand me the code so much as I'd like to get your input on what is the best solution to this problem?
EDIT: Thanks for the responses. Unfortunately life has kept me busy and I haven't tried a solution yet. Will update when I get a result (good or bad).
Different idea, hence a separate suggestion:
eval() works fine, and there's no need to re-invent the wheel.
Assuming that there's only a small and fixed number of variables in your formula language, it would be sufficient to scan your way through the expression and verify that everything you encounter is either a parenthesis, an operator or one of your variable names. I don't think there would be any way to assemble those pieces into a piece of code that could have malicious side effects on eval.
So:
Scan the expression to verify that it draws from just a very limited vocabulary.
Let eval() work it out.
Probably the compromise with the least amount of work and code while bringing risk down to (near?) 0. At worst, a misuser could tack parentheses on a variable name in an attempt to execute the variable.
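The two steps above can be sketched as follows. This is only an illustration under assumptions: the function name evaluateFormula, the #name syntax taken from the question, and the rule that attribute values must be finite numbers are all choices made here. Every token must be a known attribute, a number, an arithmetic operator or a parenthesis; anything else is rejected before the JavaScript engine ever sees the text.

```javascript
// Hypothetical helper: checks the formula against a tiny vocabulary, replaces
// #attributes with their numeric values, and only then lets the engine evaluate it.
function evaluateFormula(formula, attributes) {
  const tokenPattern = /\s*(#[A-Za-z_]\w*|\d+(?:\.\d+)?|[-+*\/()])/y;
  const pieces = [];
  let position = 0;
  while (position < formula.length) {
    tokenPattern.lastIndex = position;
    const match = tokenPattern.exec(formula);
    if (!match) {
      if (formula.slice(position).trim() === '') break; // trailing whitespace only
      throw new SyntaxError(`Disallowed input at offset ${position}`);
    }
    const token = match[1];
    if (token.startsWith('#')) {
      const name = token.slice(1);
      const known = Object.prototype.hasOwnProperty.call(attributes, name);
      const value = known ? attributes[name] : NaN;
      if (typeof value !== 'number' || !Number.isFinite(value)) {
        throw new ReferenceError(`Unknown or non-numeric attribute: ${name}`);
      }
      pieces.push(`(${value})`);
    } else {
      pieces.push(token);
    }
    position = tokenPattern.lastIndex;
  }
  // Joining with spaces keeps "/" "/" from turning into a "//" comment.
  const result = new Function(`return (${pieces.join(' ')});`)();
  if (typeof result !== 'number' || !Number.isFinite(result)) {
    throw new RangeError('Formula did not produce a finite number');
  }
  return result;
}

const skill = evaluateFormula('(#strength + (#agility / 2)) * 5', { strength: 10, agility: 4 });
console.log(skill); // (10 + 4 / 2) * 5 = 60
```

A malformed but harmless formula such as "1 +" still fails, but with the engine's own SyntaxError rather than a tailored message.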
I think instead of letting them put the whole formula in, you could have select tags that have operations and values, and let them choose.
i.e. a set of tags with attribute-operation-number:
<select> <select> <input type="text">
#attribute1Name1 + (check if input is number)
#attribute1Name2 -
#attribute1Name3 *
#attribute1Name4 /
etc.
There is a really simple solution: Just enter a normal JavaScript formula (i.e. as if you were writing a method for your object) and use this to reference the object you're working on.
To change this when evaluating the method use apply() or call() (see this answer).
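A small sketch of that idea, with made-up names (skillFormula, hero, villain). The formula is an ordinary function that reads attributes through this, and call() or apply() decides which character this refers to. Note that turning a user-typed string into such a function would still need eval or new Function, so the injection concern from the question applies to that step.

```javascript
// The formula is written like a method and reads attributes via `this`.
const skillFormula = function () {
  return (this.strength + this.agility / 2) * 5;
};

const hero = { strength: 10, agility: 4 };
const villain = { strength: 6, agility: 8 };

// call() and apply() both set `this` for a single invocation.
const heroSkill = skillFormula.call(hero);
const villainSkill = skillFormula.apply(villain);

console.log(heroSkill, villainSkill); // 60 50
```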
I recently wrote a similar application. I probably invested far too much work, but I went the whole 9 yards and wrote both a scanner and a parser.
The scanner converted the text into a series of tokens; tokens are simple objects consisting of token type and value. For the punctuation marks, value = character, for numbers the values would be integers corresponding to the numeric value of the number, and for variables it would be (a reference to) a variable object, where that variable would be sitting in a list of objects having a name. Same variable object = same variable, natch.
The parser was a simple brute force recursive descent parser. Here's the code.
My parser does logic expressions, with AND/OR taking the place of +/-, but I think you can see the idea. There are several levels of expressions, and each tries to assemble as much of itself as it can, and calls to lower levels for parsing nested constructs. When done, my parser has generated a single Node containing a tree structure that represents the expression.
In your program, I guess you could just store that Node, as its structure will essentially represent the formula for its evaluation.
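The code referred to above is not reproduced here. As a stand-in, here is a minimal sketch of the same approach for the arithmetic formulas in the question; it is not the author's parser. The names parseFormula and evaluateNode and the node shapes are assumptions. Each grammar level calls the level below it, and the result is a plain tree that can be stored and evaluated later against any character's attributes.

```javascript
// Hypothetical recursive descent parser for formulas such as
// "(#strength + (#agility / 2)) * 5". Returns a tree of plain objects.
function parseFormula(formula) {
  const tokens = formula.match(/#[A-Za-z_]\w*|\d+(?:\.\d+)?|[-+*\/()]|\S/g) || [];
  let position = 0;
  const peek = () => tokens[position];
  const next = () => tokens[position++];

  // expression := term (('+' | '-') term)*
  function parseExpression() {
    let node = parseTerm();
    while (peek() === '+' || peek() === '-') {
      node = { type: 'binary', operator: next(), left: node, right: parseTerm() };
    }
    return node;
  }

  // term := factor (('*' | '/') factor)*
  function parseTerm() {
    let node = parseFactor();
    while (peek() === '*' || peek() === '/') {
      node = { type: 'binary', operator: next(), left: node, right: parseFactor() };
    }
    return node;
  }

  // factor := NUMBER | #NAME | '(' expression ')' | '-' factor
  function parseFactor() {
    const token = next();
    if (token === undefined) throw new SyntaxError('Unexpected end of formula');
    if (token === '(') {
      const node = parseExpression();
      if (next() !== ')') throw new SyntaxError('Expected ")"');
      return node;
    }
    if (token === '-') return { type: 'negate', operand: parseFactor() };
    if (token[0] === '#' && token.length > 1) return { type: 'attribute', name: token.slice(1) };
    if (/^\d/.test(token)) return { type: 'number', value: Number(token) };
    throw new SyntaxError(`Unexpected token: ${token}`);
  }

  const tree = parseExpression();
  if (position < tokens.length) throw new SyntaxError(`Unexpected token: ${peek()}`);
  return tree;
}

// Walks the stored tree; attribute values come from a plain object.
function evaluateNode(node, attributes) {
  switch (node.type) {
    case 'number':
      return node.value;
    case 'attribute':
      if (!Object.prototype.hasOwnProperty.call(attributes, node.name)) {
        throw new ReferenceError(`Unknown attribute: ${node.name}`);
      }
      return attributes[node.name];
    case 'negate':
      return -evaluateNode(node.operand, attributes);
    case 'binary': {
      const left = evaluateNode(node.left, attributes);
      const right = evaluateNode(node.right, attributes);
      if (node.operator === '+') return left + right;
      if (node.operator === '-') return left - right;
      if (node.operator === '*') return left * right;
      return left / right;
    }
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

const formulaTree = parseFormula('(#strength + (#agility / 2)) * 5');
console.log(evaluateNode(formulaTree, { strength: 10, agility: 4 })); // 60
```

Because nothing is handed to eval, text that is not part of the grammar can only produce a SyntaxError; unbalanced parentheses are caught the same way.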
Given all that work, though, I'd understand just how tempting it would be to just cave in and use eval!
I'm fascinated by the task of getting this done by the simplest means possible.
Here's another approach:
Convert infix to postfix;
use a very simple stack-based calculator to evaluate the resulting expression.
The rationale here being, once you get rid of the complication of "* before +" and parentheses, the remaining calculation is very straightforward.
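A minimal sketch of those two steps, using the shunting-yard method for the infix-to-postfix conversion. The function names are made up for this illustration, and the sketch handles only numbers, the four binary operators and parentheses; attribute values would have to be substituted for their names before or during tokenizing, and unary minus is not supported.

```javascript
// Operator precedence: "*" and "/" bind tighter than "+" and "-".
const PRECEDENCE = new Map([['+', 1], ['-', 1], ['*', 2], ['/', 2]]);

// Splits a formula into numbers and single-character symbols.
function tokenizeInfix(formula) {
  const raw = formula.match(/\d+(?:\.\d+)?|[-+*\/()]|\S/g) || [];
  return raw.map((token) => (/^\d/.test(token) ? Number(token) : token));
}

// Step 1: infix to postfix (shunting-yard).
function toPostfix(tokens) {
  const output = [];
  const operators = [];
  const top = () => operators[operators.length - 1];
  for (const token of tokens) {
    if (typeof token === 'number') {
      output.push(token);
    } else if (PRECEDENCE.has(token)) {
      // Pop operators of equal or higher precedence first (left associativity).
      while (operators.length && top() !== '(' && PRECEDENCE.get(top()) >= PRECEDENCE.get(token)) {
        output.push(operators.pop());
      }
      operators.push(token);
    } else if (token === '(') {
      operators.push(token);
    } else if (token === ')') {
      while (operators.length && top() !== '(') output.push(operators.pop());
      if (operators.pop() !== '(') throw new SyntaxError('Unbalanced parentheses');
    } else {
      throw new SyntaxError(`Unexpected token: ${token}`);
    }
  }
  while (operators.length) {
    const operator = operators.pop();
    if (operator === '(') throw new SyntaxError('Unbalanced parentheses');
    output.push(operator);
  }
  return output;
}

// Step 2: evaluate the postfix sequence with a stack.
function evaluatePostfix(postfix) {
  const stack = [];
  for (const token of postfix) {
    if (typeof token === 'number') {
      stack.push(token);
      continue;
    }
    const right = stack.pop();
    const left = stack.pop();
    if (left === undefined || right === undefined) throw new SyntaxError('Missing operand');
    if (token === '+') stack.push(left + right);
    else if (token === '-') stack.push(left - right);
    else if (token === '*') stack.push(left * right);
    else stack.push(left / right);
  }
  if (stack.length !== 1) throw new SyntaxError('Malformed expression');
  return stack[0];
}

const postfix = toPostfix(tokenizeInfix('(10 + 4 / 2) * 5'));
console.log(postfix.join(' ')); // 10 4 2 / + 5 *
console.log(evaluatePostfix(postfix)); // 60
```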
You could look at running the user-defined code in a sandbox to prevent attacks:
Is It Possible to Sandbox JavaScript Running In the Browser?