When running a Postgres query using POSIX regular expression matching, the query may fail with an "invalid regular expression" error if one of the RegExp patterns is invalid. If the regex query takes its patterns from a database column, the error will occur if just one of the rows contains an invalid RegExp pattern.
The problem is that validating values to be used for this type of query does not appear to be very straightforward. None of the solutions I have come across for validating RegExp patterns in JavaScript, including libraries such as regexpp, appear to be reliable for testing whether Postgres would consider a given pattern to be valid.
Is there a way to test whether a pattern would be valid in a Postgres query, or is the only way to do this validation to actually run a Postgres query using the pattern?
I don't think there is anything built-in that does it in a user-accessible way. You can create your own function to do this by catching the error.
create function true_on_error(text) returns bool language plpgsql as $$
BEGIN
perform regexp_match('',$1);
return false;
exception when others then
return true;
end$$;
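A minimal sketch of calling this function from Node.js, assuming the node-postgres (`pg`) client, whose `query(text, values)` method resolves to an object with a `rows` array. The helper name `isValidPostgresRegex` and the column alias `invalid` are made up for illustration.

```javascript
// Hypothetical helper: asks Postgres itself whether a pattern compiles.
// `client` is assumed to be a connected node-postgres Client or Pool.
async function isValidPostgresRegex(client, pattern) {
  // The pattern is sent as a bind parameter, never concatenated into the SQL.
  const result = await client.query(
    'SELECT true_on_error($1) AS invalid',
    [pattern]
  );
  // true_on_error returns true when regexp_match raised an error.
  return result.rows[0].invalid === false;
}
```

This still costs one round trip per pattern, but the verdict comes from the same regex engine that will later run the real query.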
Related
I'm assuming that DoS is a possible issue when matching, on the backend in Node.js, arbitrary strings with arbitrary regexes with one of JS's regex functions. If the provided regex is simply invalid, the error thrown by the constructor can just be caught -- but I'm thinking it's possible that matching the string with the RegExp could become a significantly or even completely blocking operation, deliberately or accidentally by the creator of the regex and the string? If so, how exactly would this be caused, and how could it be mitigated?
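The "invalid regex" half of that question can be sketched as below. The helper name `compileOrNull` is hypothetical; it only covers patterns that fail to compile and does nothing about patterns that compile but match slowly.

```javascript
// Returns a compiled RegExp, or null if the source is not a valid JS pattern.
function compileOrNull(source, flags) {
  try {
    return new RegExp(source, flags);
  } catch (err) {
    // The RegExp constructor throws SyntaxError for bad patterns or flags.
    if (err instanceof SyntaxError) return null;
    throw err;
  }
}
```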
To start this off, I am well aware that parameterized queries are the best option, but I am asking what makes the strategy I present below vulnerable. People insist the solution below doesn't work, so I am looking for an example of why it wouldn't.
If dynamic SQL is built in code using the following escaping before being sent to a SQL Server, what kind of injection can defeat this?
string escaped = "N'" + userInput.Replace("'", "''") + "'";
A similar question was answered here, but I don't believe any of the answers are applicable here.
Escaping the single quote with a "\" isn't possible in SQL Server.
I believe SQL Smuggling with Unicode (outlined here) would be thwarted by the fact that the string being produced is marked as Unicode by the N preceding the single quote. As far as I know, there are no other character sets that SQL Server would automatically translate to a single quote. Without an unescaped single quote, I don't believe injection is possible.
I don't believe String Truncation is a viable vector either. SQL Server certainly won't be doing the truncating, since the max size for an nvarchar is 2GB according to Microsoft. A 2 GB string is infeasible in most situations, and impossible in mine.
Second Order Injection could be possible, but is it possible if:
All data going into the database is sanitized using the above method
Values from the database are never appended into dynamic SQL (why would you ever do that anyways, when you can just reference the table value in the static part of any dynamic SQL string?).
I'm not suggesting that this is better than or an alternative to using parameterized queries, but I want to know how what I outlined is vulnerable. Any ideas?
There are a few cases where this escape function will fail. The most obvious is when a single quote isn't used:
string table= "\"" + table.Replace("'", "''") + "\""
string var= "`" + var.Replace("'", "''") + "`"
string index= " " + index.Replace("'", "''") + " "
string query = "select * from `"+table+"` where name=\""+var+"\" or id="+index
In the first two cases, you can "break out" using a double quote or a back-tick. In the last case there is nothing to "break out" of, so you can just write 1 union select password from users-- or whatever SQL payload the attacker desires.
The next condition where this escape function will fail is if a sub-string is taken after the string is escaped (and yes I have found vulnerabilities like this in the wild):
string userPassword= userPassword.Replace("'", "''")
string userName= userInput.Replace("'", "''")
userName = substr(userName,0,10)
string query = "select * from users where name='"+userName+"' and password='"+userPassword+"'";
In this case a username of abcdefgji' is turned into abcdefgji'' by the escape function and then turned back into abcdefgji' by taking the sub-string. This can be exploited by setting the password value to any SQL statement: here, or 1=1-- is interpreted as SQL, while the name literal is read as abcdefgji'' and password=. The resulting query is as follows:
select * from users where name='abcdefgji'' and password=' or 1=1--'
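The same sequence can be reproduced in a few lines. This is a JavaScript port of the pseudocode above, written only to show the string that comes out; the function names are made up for illustration.

```javascript
// Double every single quote, as the escape function in the question does.
function escapeQuotes(value) {
  return value.replace(/'/g, "''");
}

function buildLoginQuery(userInput, passwordInput) {
  const userPassword = escapeQuotes(passwordInput);
  // Truncating AFTER escaping can cut a doubled quote in half.
  const userName = escapeQuotes(userInput).substring(0, 10);
  return "select * from users where name='" + userName +
    "' and password='" + userPassword + "'";
}

const query = buildLoginQuery("abcdefgji'", " or 1=1--");
console.log(query);
// select * from users where name='abcdefgji'' and password=' or 1=1--'
```

Truncating before escaping, rather than after, avoids this particular failure.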
T-SQL and other advanced SQL injection techniques were already mentioned. Advanced SQL Injection In SQL Server Applications is a great paper and you should read it if you haven't already.
The final issue is unicode attacks. This class of vulnerabilities arises because the escape function is not aware of multi-byte encoding, and this can be used by an attacker to "consume" the escape character. Prepending an "N" to the string will not help, as this doesn't affect the value of multi-byte chars later in the string. However, this type of attack is very uncommon because the database must be configured to accept GBK unicode strings (and I'm not sure that MS-SQL can do this).
Second-Order code injection is still possible. This attack pattern is created by trusting attacker-controlled data sources. Escaping is used to represent control characters as their character literal. If the developer forgets to escape a value obtained from a select and then uses this value in another query, then bam, the attacker will have a character literal single quote at their disposal.
Test everything, trust nothing.
With some additional stipulations, your approach above is not vulnerable to SQL injection. The main vector of attack to consider is SQL Smuggling. SQL Smuggling occurs when similar Unicode characters are translated in an unexpected fashion (e.g. ` changing to ' ). There are several locations where an application stack could be vulnerable to SQL Smuggling.
Does the Programming language handle unicode strings appropriately? If the language isn't unicode aware, it may mis-identify a byte in a unicode character as a single quote and escape it.
Does the client database library (e.g. ODBC, etc) handle unicode strings appropriately? System.Data.SqlClient in the .Net framework does, but how about old libraries from the windows 95 era? Third party ODBC libraries actually do exist. What happens if the ODBC driver doesn't support unicode in the query string?
Does the DB handle the input correctly? Modern versions of SQL are immune assuming you're using N'', but what about SQL 6.5? SQL 7.0? I'm not aware of any particular vulnerabilities, however this wasn't on the radar for developers in the 1990's.
Buffer overflows? Another concern is that the quoted string is longer than the original string. In which version of Sql Server was the 2GB limit for input introduced? Before that what was the limit? On older versions of SQL, what happened when a query exceeded the limit? Do any limits exist on the length of a query from the standpoint of the network library? Or on the length of the string in the programming language?
Are there any language settings that affect the comparison used in the Replace() function? .Net always does a binary comparison for the Replace() function. Will that always be the case? What happens if a future version of .NET supports overriding that behavior at the app.config level? What if we used a regexp instead of Replace() to insert a single quote? Do the computer's locale settings affect this comparison? If a change in behavior did occur, it might not be vulnerable to SQL injection; however, it may have inadvertently edited the string by changing a Unicode character that looked like a single quote into a single quote before it ever reached the DB.
So, assuming you're using the System.String.Replace() function in C# on the current version of .Net with the built-in SqlClient library against a current (2005-2012) version of SQL server, then your approach is not vulnerable. As you start changing things, then no promises can be made. The parameterized query approach is the correct approach for efficiency, for performance, and (in some cases) for security.
WARNING: The above comments are not an endorsement of this technique. There are several other very good reasons why this is the wrong approach to generating SQL. However, detailing them is outside the scope of this question.
DO NOT USE THIS TECHNIQUE FOR NEW DEVELOPMENT.
Using query parameters is better, easier, and faster than escaping quotes.
Re your comment, I see that you acknowledged parameterization, but it deserves emphasis. Why would you want to use escaping when you could parameterize?
In Advanced SQL Injection In SQL Server Applications, search for the word "replace" in the text, and from that point on read some examples where developers inadvertently allowed SQL injection attacks even after escaping user input.
There is an edge case where escaping quotes with \ results in a vulnerability, because the \ becomes half of a valid multi-byte character in some character sets. But this is not applicable to your case since \ isn't the escaping character.
As others have pointed out, you may also be adding dynamic content to your SQL for something other than a string literal or date literal. Table or column identifiers are delimited by " in SQL, or [ ] in Microsoft/Sybase. SQL keywords of course don't have any delimiters. For these cases, I recommend whitelisting the values to interpolate.
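A minimal sketch of whitelisting an identifier and a keyword before interpolating them. The column list and the function name `buildOrderBy` are hypothetical.

```javascript
// Hypothetical list of columns the application allows sorting by.
const SORTABLE_COLUMNS = new Set(['name', 'created_at', 'price']);

function buildOrderBy(column, direction) {
  // Only identifiers from the fixed list are ever interpolated.
  if (!SORTABLE_COLUMNS.has(column)) {
    throw new Error('Unsupported sort column: ' + column);
  }
  // Keywords are chosen from fixed strings, never copied from input.
  const dir = String(direction).toUpperCase() === 'DESC' ? 'DESC' : 'ASC';
  return 'ORDER BY [' + column + '] ' + dir;
}
```

For example, `buildOrderBy('price', 'desc')` gives `ORDER BY [price] DESC`, while any column outside the list throws instead of reaching the query.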
Bottom line is that escaping is an effective defense, if you can ensure that you do it consistently. That's the risk: that one of the team of developers on your application could omit a step and do some string interpolation unsafely.
Of course, the same is true of other methods, like parameterization. They're only effective if you do them consistently. But I find it's easier and quicker to use parameters, than to figure out the right type of escaping. Developers are more likely to use a method that is convenient and doesn't slow them down.
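For comparison, a parameterized lookup might look like the sketch below. It assumes the `mssql` package for Node.js, where a request's `input(name, value)` call binds a named parameter and `query(sql)` returns a promise; the function name `findUserByName` is hypothetical, and the `users` table is borrowed from the earlier example.

```javascript
// `request` is assumed to be a Request from the `mssql` package,
// for example the object returned by `pool.request()`.
function findUserByName(request, userName) {
  return request
    .input('name', userName) // the value travels separately from the SQL text
    .query('select * from users where name = @name');
}
```

No escaping step appears anywhere, which is the convenience argument made above.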
SQL injection occurs if user-supplied inputs are interpreted as commands. Here, command means anything that is not interpreted as a recognized data type literal.
Now if you’re using the user’s input only in data literals, specifically only in string literals, the user input would only be interpreted as something other than string data if it were able to leave the string literal context. For character string or Unicode string literals, it’s the single quotation mark that encloses the literal data, while embedded single quotation marks need to be represented with two single quotation marks.
So to leave a string literal context, one would need to supply a single single quotation mark (sic) as two single quotation marks are interpreted as string literal data and not as the string literal end delimiter.
So if you’re replacing any single quotation mark in the user supplied data by two single quotation marks, it will be impossible for the user to leave the string literal context.
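As an illustration of that argument, here is a JavaScript port of the escaping from the question (the original is C#; the function name `toUnicodeLiteral` is made up). It shows only the string handling, and makes no claim about how a particular server or driver treats the result.

```javascript
// Wrap a value as an N'...' literal, doubling embedded single quotes.
function toUnicodeLiteral(value) {
  return "N'" + value.replace(/'/g, "''") + "'";
}

console.log(toUnicodeLiteral("O'Brien"));     // N'O''Brien'
console.log(toUnicodeLiteral("x' or 1=1--")); // N'x'' or 1=1--'
```

In both outputs every quote inside the delimiters is doubled, so the only lone quote is the closing delimiter.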
SQL Injection can occur via unicode. If the web app has a URL like this:
http://mywebapp/widgets/?Code=ABC
which generates SQL like
select * from widgets where Code = 'ABC'
but a hacker enters this:
http://mywebapp/widgets/?Code=ABC%CA%BC;drop table widgets--
the SQL will look like
select * from widgets where Code = 'ABC’;drop table widgets--'
and SQL Server will run two SQL Statements. One to do the select and one to do the drop.
Your code probably converts the url-encoded %CA%BC into Unicode U+02BC, which is a "Modifier letter apostrophe". The Replace function in .Net will NOT treat that as a single quote. However, Microsoft SQL Server treats it like a single quote. Here is an example that will probably allow SQL Injection:
string badValue = ((char)0x02BC).ToString();
badValue = badValue + ";delete from widgets--";
string sql = "SELECT * FROM WIDGETS WHERE ID=" + badValue.Replace("'","''");
TestTheSQL(sql);
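The "Replace does not touch it" half of this claim is easy to check. The sketch below uses JavaScript rather than .NET, so it is an analogy: JavaScript's `replace` also matches only U+0027. Whether the server later treats U+02BC as a quote is the answer's claim and is not tested here.

```javascript
// U+02BC MODIFIER LETTER APOSTROPHE, which looks like ' but is a different code point.
const apostropheLike = '\u02BC';
const payload = apostropheLike + ';delete from widgets--';

// The quote-doubling escape finds nothing to replace.
const escaped = payload.replace(/'/g, "''");

console.log(escaped === payload);   // true
console.log(escaped.includes("'")); // false
```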
There is probably no 100% safe way if you are doing string concatenation. What you can do is try to check data type for each parameter and if all parameters pass such validation then go ahead with execution. For example, if your parameter should be type int and you’re getting something that can’t be converted to int then just reject it.
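A minimal sketch of that kind of type check for an integer parameter. The helper name `parseStrictInt` is hypothetical.

```javascript
// Returns the integer, or null if the raw input is not a plain integer string.
function parseStrictInt(raw) {
  // Accept only an optional minus sign followed by digits.
  if (typeof raw !== 'string' || !/^-?\d+$/.test(raw)) return null;
  const n = Number(raw);
  // Reject values too large to be represented exactly.
  return Number.isSafeInteger(n) ? n : null;
}
```

With this, an input such as `1 union select password from users--` yields `null` and can be rejected before any SQL is built.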
This doesn’t work though if you’re accepting nvarchar parameters.
As others have already pointed out, the safest way is to use a parameterized query.
I regularly receive emails from the same person, each containing one or more unique identifying codes. I need to get those codes.
The email body contains a host of inconsistent email content, but it is the strings I am interested in. They look like...
loYm9vYzE6Z-aaj5lL_Og539wFer0KfD
FuZTFvYzE68y8-t4UgBT9npHLTGmVAor
JpZDRwYzE6dgyo1legz9sqpVy_F21nx8
ZzZ3RwYzE63P3UwX2ANPI-c4PMo7bFmj
What the strings seem to have in common is that they are all 32 characters in length and all composed of a mixture of uppercase letters, lowercase letters, numbers and symbols. But a given email may contain none, one or multiple, and the strings will be in an unpredictable position, not on adjacent lines as above.
I wish to make a Zap workflow in Zapier, the linking tool for web services, to find these strings and use them in another app - ie. whenever a string is found, create a new Trello card.
I have already started the workflow with Zapier's "Gmail" integration as a "trigger", specifically a search using the "from:" field corresponding to the regular sender. That's the easy part.
But the actual parsing of the email body is foxing me. Zapier has a rudimentary email parser, but it is not suitable for this task. What is suitable is using Zapier's own "Code" integration to execute freeform code - namely, a regular expression to identify those strings.
I have never done this before and am struggling to formulate working code. Zapier Code can take either Python (documentation) or Javascript (documentation). It supports data variables "input_data" (Python) or "inputData" (Javascript) and "output" (both).
See, below, how I insert the Gmail body in to "body" for parsing...
I need to use the Code box to construct a regular expression to find each unique identifier string and output it as input to the next integration in the workflow, ie. Trello.
For info, in the above screengrab, the existing "hello world" code in the box is Zapier's own test code. The fields "id" and "hello" are made available to the next workflow app in the chain.
But I need to do my process for all of the strings found within an email body - ie. if an email contains just one code, create one Trello card; but if an email contains four codes, create a Trello card for each of the four.
That is, there could be multiple outputs. I have no idea how this could work, since I think these workflows are only supposed to accommodate one action.
I could use some help getting over the hill. Thank-you.
David here, from the Zapier Platform team.
I'm glad you're showing interest in the code step. Assuming your assumption (32 characters exactly) is always going to be true, this should be fairly straightforward.
First off, the regex. We want to look for a character that's a letter, number, or punctuation. Luckily, javascript's \w is equivalent to [A-Z0-9a-z_], which covers the bases in all of your examples besides the -, which we'll include manually. Finally, we want exactly 32 character length strings, so we'll ask for that. We also want to add the global flag, so we find all matches, not just the first. So we have the following:
/[\w-]{32}/g
You've already covered mapping the body in, so that's good. The javascript code will be as follows:
// stores an array of any length (0 or more) with the matches
var matches = inputData.body.match(/[\w-]{32}/g)
// the .map function executes the nameless inner function once for each
// element of the array and returns a new array with the results
// [{str: 'loYm9vYzE6Z-aaj5lL_Og539wFer0KfD'}, ...]
return (matches || []).map(function (m) { return {str: m} })
Here, you'll be taking advantage of an undocumented feature of code steps: when you return an array of objects, subsequent steps are executed once for each object. If you return an empty array (which is what'll happen if no keys are found), the zap halts and nothing else happens. When you're testing, there'll be no indicator that anything besides the first result does anything. Once your zap is on and runs for real though, it'll fan out as described here.
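To try the extraction outside Zapier, the same logic can be wrapped in a plain function and run in Node.js. The name `extractCodes` and the sample body are made up for illustration; inside a code step you would still read from `inputData.body` and `return` the array.

```javascript
// Same regex and mapping as the code step, as a standalone function.
function extractCodes(body) {
  const matches = String(body).match(/[\w-]{32}/g);
  return (matches || []).map(function (m) { return { str: m }; });
}

const sampleBody =
  'Hi,\nYour code: loYm9vYzE6Z-aaj5lL_Og539wFer0KfD\n' +
  'and another, FuZTFvYzE68y8-t4UgBT9npHLTGmVAor, thanks.';

console.log(extractCodes(sampleBody));
console.log(extractCodes('no codes here')); // []
```

One caveat of the pattern: any unbroken run of 32 or more letters, digits, `_` or `-` will also produce a match, so a long URL path in the email could yield false positives.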
That's all it takes! Hopefully that all makes sense. Let me know if you've got any other questions!
I'm currently using Redis, but examples in any database (that are good with NodeJS) would be good to get me going.
I'm looking to find Regex Patterns from a list, by providing potential matches.
I want to query my database of patterns and ask it - "which patterns would match this string?"
Example
Pattern Database:
(\/some\/)
(\/relative\/)
(\/other\/)
Search: "/some/relative/url/"
Return:
(\/some\/)
(\/relative\/)
Search: "/some/other/url/"
Return:
(\/some\/)
(\/other\/)
So my question is: is this possible? If so, how?
This is not possible (to my knowledge) using only a Redis call. I suggest loading all the regular expressions from the database and running them in JavaScript to figure out which ones match.
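A minimal sketch of the matching half, assuming the pattern strings have already been loaded from the datastore into an array (the loading itself is left out). The function name `findMatchingPatterns` is hypothetical.

```javascript
// Returns the stored pattern sources that match the candidate string.
function findMatchingPatterns(patternSources, candidate) {
  return patternSources.filter(function (source) {
    try {
      return new RegExp(source).test(candidate);
    } catch (err) {
      // Skip stored patterns that are not valid JavaScript regexes.
      return false;
    }
  });
}

// The patterns from the question, as they would look in JS string literals.
const patterns = ['(\\/some\\/)', '(\\/relative\\/)', '(\\/other\\/)'];

console.log(findMatchingPatterns(patterns, '/some/relative/url/'));
console.log(findMatchingPatterns(patterns, '/some/other/url/'));
```

The first call returns the `some` and `relative` patterns and the second returns `some` and `other`, as in the question. Every stored pattern is tested on every lookup, so the cost grows linearly with the size of the list.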
I think there is a way to do that. You need to treat each pattern to be stored as a string, then retrieve the patterns as you would retrieve any other string from the Redis datastore. Just for your reference, here is a Redis and Node.js tutorial I found - Using Redis With Nodejs.
I am trying to get this Regex statement to work
^([_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})+(\s?[,]\s?|$))+$
for a string of comma separated emails in a textbox using jQuery('#textbox').val(); which passes the values into the Regex statement to find errors for a string like:
"test#test.com, test1#test.com,test2#test.com"
But for some reason it is returning an error. I tried running it through http://regexpal.com/ but I'm unsure what is wrong.
NB: This is just a basic client-side test. I validate emails via the MailClass on the server-side using .NET4.0 - so don't jump down my throat re-this. The aim here is to eliminate simple errors.
Escaped Version:
^([_a-z0-9-]+(\\.[_a-z0-9-]+)*#[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,3})+(\\s?[,]\\s?|$))+$
You can greatly simplify things by first splitting on commas, as Pablo said, then repeatedly applying the regex to validate each individual email. You can also then point out the one that's bad -- but there's a big caveat to that.
Take a look at the regex in the article Comparing E-mail Address Validating Regular Expressions. There's another even better regex that I couldn't find just now, but the point is a correct regex for checking email is incredibly complicated, because the rules for a valid email address as specified in the RFC are incredibly complicated.
In yours, this part (\.[a-z]{2,3})+ jumped out at me; the two-or-three-letters group {2,3} I often see as an attempt to validate the top-level domain, but (1) your regex allows one or more of these groups and (2) you will exclude valid email addresses from domains such as .info or .museum. (Many sites reject my .us address because they thought only 3 letter domains were legal.)
My advice to reject seriously invalid addresses, while leaving the final validation to the server, is to allow basically (anything)#(anything).(anything) -- check only for an "at" and a "dot", and of course allow multiple dots.
EDIT: Example for "simple" regex
[^#]+#[^.]+(\.[^.]+)+
This matches
test#test.com
test1#test.com
test2#test.com
foo#bar.baz.co.uk
myname#modern.museum
And doesn't match foo#this....that
Note: Even this will reject some valid email addresses, because anything is allowed on the left of the # - even another # - if it's all escaped properly. But I've never seen that in 25 years of using email in Real Life.
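Putting the two suggestions together (split on commas, then apply the loose check to each entry) might look like the sketch below. The `#` separator is kept exactly as it appears in the examples above; the `^...$` anchors and the name `findInvalidEntries` are additions for illustration, not part of the original regex.

```javascript
// Deliberately loose check from the answer, anchored to the whole entry.
const SIMPLE_ADDRESS = /^[^#]+#[^.]+(\.[^.]+)+$/;

// Returns the entries that fail the check, so the bad one can be pointed out.
function findInvalidEntries(list) {
  return list
    .split(',')
    .map(function (entry) { return entry.trim(); })
    .filter(function (entry) { return !SIMPLE_ADDRESS.test(entry); });
}

console.log(findInvalidEntries('test#test.com, test1#test.com,test2#test.com'));
// []
console.log(findInvalidEntries('test#test.com, foo#this....that'));
// [ 'foo#this....that' ]
```

An empty result means every entry passed the client-side check; the server-side validation still has the final say.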