Regex works with JavaScript but not PHP - javascript

I used this regex in the JavaScript for my webpage, and it worked perfectly:
var nameRegex = /^([ \u00c0-\u01ffa-zA-Z'\-])+$/;
return nameRegex.test(name);
I then wanted to include it in my PHP script as a second barrier in case the user disables JavaScript etc. But whenever I use it it will fail every string that I pass through it, even correct names.
I tried using single quotes to stop escape characters, but then I had to escape the single quote contained within the regex, and came up with this:
$nameRegex = '/^([ \u00c0-\u01ffa-zA-Z\'\-])+$/';
if ($firstName == ""){
$valSuccess = FALSE;
$errorMsgTxt .= "Please enter your first name<br>\n";
} elseif (!preg_match($nameRegex, $firstName)){
$valSuccess = FALSE;
$errorMsgTxt .= "Please enter a valid first name<br>\n";
}
But, once again, it fails valid names.
So my question is, how can I make my regex "safe" for use in PHP?

The problem is that the \uXXXX escape works in JavaScript but is not valid syntax in PCRE, the engine behind preg_match.
In PCRE a code point is written as \x{XXXX}, and you need the u modifier so that both the pattern and the subject are treated as UTF-8. Inside a single-quoted PHP string, keep the \' escape from your attempt. The correct pattern would be:
/^[ \x{00c0}-\x{01ff}a-zA-Z'\-]+$/u

Related

Javascript How to escape \u in string literal

Strange thing...
I have a string literal that is passed to my source code as a constant token (I cannot prehandle or escape it beforehand).
Example
var username = "MYDOMAIN\tom";
username = username.replace('MYDOMAIN','');
The string somewhere contains a backslash followed by a character.
It's too late to escape the backslash at this point, so I have to escape these special characters individually like
username = username.replace(/\t/ig, 't');
However, that does not work in the following scenario:
var username = "MYDOMAIN\ulrike";
\u seems to introduce a Unicode escape sequence. \uLRIK cannot be interpreted as a Unicode character, so the JavaScript engine stops parsing at this point and my replace(/\u/ig,'u') comes too late.
Has anybody a suggestion or workaround on how to escape such a non-unicode character sequence contained in a given string literal? It seems a similar issue with \b like in "MYDOMAIN\bernd".
I have a string literal that is passed to my source code
Assuming the value contains no < or >, move it out of your script block and into an HTML element, then use JavaScript to read the value. Something like:
<div id="myServerData">
MYDOMAIN\tom
</div>
and you retrieve it so
alert(document.getElementById("myServerData").innerText);
IMPORTANT: injecting unescaped content that the user can control (say, data entered on some other page) is a security risk, whether you inject it into script or into HTML.
Writing var username = "MYDOMAIN\ulrike"; will throw a syntax error. I think you have this string coming from somewhere.
I would suggest creating an HTML element, setting its innerHTML to the received value, and then reading it back.
Have something like:
<div id="demo"></div>
Then do document.getElementById("demo").innerHTML = username;
Then read the value from there as document.getElementById("demo").innerHTML;
This should work I guess.
Important: Please make sure this does not expose the webpage to script injections. If it does, this method is bad, don't use it.
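If the delimiters around the injected token can be changed (an assumption; the question only says the token itself cannot be altered), a tagged template literal may be another option. Since ES2018, String.raw accepts escape sequences that would be invalid in an ordinary string, such as \u followed by non-hex characters, and returns the text with its backslashes intact. A minimal sketch, which would break if the token contained a backtick or ${:

```javascript
// The server-side token is placed between backticks instead of double quotes.
var username = String.raw`MYDOMAIN\ulrike`;

// The backslash survives, so the domain prefix can be stripped reliably.
var shortName = username.replace('MYDOMAIN\\', '');

console.log(shortName); // ulrike
```

The same security caveat applies: never inject content the user controls without escaping it.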

Regular expression to match at least two special characters in any order

I have to do jQuery form validation for password.
The password should contain at least two special characters in any order. I have tried with
Regular Expression for password validation but it does not address that two random special characters can come at any order.
How do I do it using a JavaScript regular expression?
You do not need look-arounds for this.
If you only need to make sure the string has at least 2 characters of a specific set, use this kind of regex (with a negated class to make it more robust):
/(?:[^`!@#$%^&*\-_=+'\/.,]*[`!@#$%^&*\-_=+'\/.,]){2}/
See demo
In JavaScript this worked for me:
/(?=(.*[`!@#$%\^&*\-_=\+'/\.,]){2})/
var goodtogo = false;
var pass = 'simp!le#'; //example
// match() returns null when nothing matches, so fall back to an empty array
var times = (pass.match(/[\\\[\]\/\(\)\+\*\?`!@#$%\^&_=-]/g) || []).length;
if(times >= 2)
    goodtogo = true;
Now I advise you to try several passwords, and if you find a bug, don't hesitate to report back.
And if you have more special chars, just add them to the character class passed to match.
Hope it helps.
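The counting approach can be wrapped in a small function. This is only a sketch: hasTwoSpecials and SPECIALS are hypothetical names, and the set of "special" characters is an assumption that should be adjusted to your own password policy.

```javascript
// Assumed set of special characters; adjust to your own policy.
var SPECIALS = /[`!@#$%^&*\-_=+'\/.,]/g;

// Returns true when the password contains at least two special characters,
// in any position and in any order.
function hasTwoSpecials(password) {
  // match() returns null when there is no match at all
  var found = password.match(SPECIALS) || [];
  return found.length >= 2;
}

console.log(hasTwoSpecials('simp!le#')); // true
console.log(hasTwoSpecials('simple!'));  // false
console.log(hasTwoSpecials('simple'));   // false
```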

Find exact match in haystack but only when text is not part of larger string of chars

My google-fu has failed me here, because I'm not sure how to search for an answer without getting generic results for finding a string needle in a haystack in javascript. If this is a duplicate question, just let me know and I'll close this one out.
What I'm trying to do
I'm currently searching through text using indexOf() in javascript to find any occurrence of a user's username that starts with the # char. indexOf is working well enough in most cases for this, but it's failing when a user has a name that is also part of another user's name.
For instance, using indexOf(), I currently get matches for a username "RandomDonkey" when there is text directed at "#RandomDonkeyKong" or "#RandomDonkeyFarmer".
What I'm looking for
I'd like to find the most efficient way to ensure that messages containing (for example) "#RandomDonkeyFarmer" don't cause alerts for the user "RandomDonkey", as only an exact match with no extra chars included with the username for "#RandomDonkey" should be cause for an alert.
What I've considered
I'm no good at writing regex, so I've considered it a possible solution but am not sure how to write it.
I've also considered looking for the match and then further checking that there are no characters other than a space after the last character (assuming that the username doesn't end the string).
Is there a better way to go about this, or would one of those two solutions be the most efficient?
The code I'm currently using and some examples that should pass / fail
var username = 'RandomDonkey';
if(message.toLowerCase().indexOf('#' + username.toLowerCase()) != -1){
alert('this is a direct message');
}//if direct message
else{
alert('this is NOT a direct message');
}
Some messages that should pass:
message = "Hey #randomdonkey what's going on?";
message = "#RandomDonkey what are you up to";
message = "These are silly examples #RandomDonkey";
Some messages that should fail:
message = "#RandomDonkeyKong is not a match for RandomDonkey";
message = "I'm messaging #RandomDonkeyFarmer";
Currently all of these examples pass because of the way that indexOf() works, which is why I'm looking for another method.
I believe regex is indeed the answer. This should work for most cases:
var username = "RandomDonkey"
var text = "hey #RandomDonkey are you something something etc."
// The backslash must be doubled: in a string literal "\b" is a backspace character, not a word boundary
var re=new RegExp("#"+username+"\\b", "i")
if (re.test(text)) {
    alert(username);
}
This works when usernames can only contain word characters (so A-Z, a-z, 0-9 and the underscore character _).
To allow usernames with dashes so that this doesn't break, use this in place of "\\b" in the regex: "([^\\w\\-]|$)"
So the regex defining line becomes: var re=new RegExp("#"+username+"([^\\w\\-]|$)", "i")
It looks for either a character that is not a word character or a dash (so anything usernames shouldn't contain) or the end of the string.
The only issue that might arise is people having special regex chars in their usernames, which is easily prevented by prohibiting usernames that don't match ^[\w\-]+$ (one or more letters, numbers, dashes and underscores and nothing more).
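If usernames cannot be restricted, the name can be escaped before it is placed in the pattern. The sketch below is one way to do that, not the only one: escapeRegExp and isDirectMessage are hypothetical helper names, and a negative lookahead is swapped in for the "([^\\w\\-]|$)" group so that the character after the username is not consumed.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// True when "#username" appears and is not immediately followed by
// a word character or a dash (case-insensitive).
function isDirectMessage(message, username) {
  var re = new RegExp('#' + escapeRegExp(username) + '(?![\\w-])', 'i');
  return re.test(message);
}

console.log(isDirectMessage("Hey #randomdonkey what's going on?", 'RandomDonkey')); // true
console.log(isDirectMessage("These are silly examples #RandomDonkey", 'RandomDonkey')); // true
console.log(isDirectMessage("I'm messaging #RandomDonkeyFarmer", 'RandomDonkey')); // false
```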

Regex validation rules

I'm writing a database backup function as part of my school project.
I need to write a regex rule so the database backup name can only contain legal characters.
By 'legal' I mean a string that doesn't contain ANY symbols or spaces. Only letters from the alphabet and numbers.
An example of a valid string would be '31Jan2012' or '63927jkdfjsdbjk623' or 'hello123backup'.
Here's my JS code so far:
// Check if the input box contains only the characters a-z, A-Z, or 0-9 with a regular expression.
function checkIfContainsNumbersOrCharacters(elem, errorMessage){
var regexRule = new RegExp("^[\w]+$");
if(regexRule.test( $(elem).val() ) ){
return true;
}else{
alert(errorMessage);
return false;
}
}
//call the function
checkIfContainsNumbersOrCharacters("#backup-name", "Input can only contain the characters a-z or 0-9.");
I've never really used regular expressions before, but after a quick bit of googling I found this tool, from which I wrote the following regex rule:
^[\w]+$
^ = start of string
[\w] = a-z/A-Z/0-9
'+' = characters after the string.
When running my function, whatever string I input seems to return false :( Is my code wrong, or am I not using regex rules correctly?
The problem here is that when you write \w inside a string literal, the backslash escapes the w, and the resulting regular expression looks like this: ^[w]+$, containing the w as a literal character. When creating a regular expression with a string argument passed to the RegExp constructor, you need to escape the backslash, like so: new RegExp("^[\\w]+$"), which will create the regex you want.
There is a way to avoid that, using the shorthand notation provided by JavaScript: var regex = /^[\w]+$/; which does not need any extra escaping.
It can be simpler. This works:
function checkValid(name) {
return /^\w+$/.test(name);
}
/^\w+$/ is regex literal notation, equivalent to new RegExp("^\\w+$"). Since the .test function returns a boolean, you can return its result directly. The literal also reads better than the constructor form, and you're less likely to goof up (thanks @x3ro for pointing out the need for two backslashes in strings).
In JavaScript, \w is shorthand for [A-Za-z0-9_]: ASCII letters, digits, and the underscore. Note that this means a name such as my_backup passes the check even though it contains a symbol. If what you really intend to match is [0-9A-Za-z], then that's what you should use.
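Both points can be checked directly in a console. A small sketch, with isValidBackupName as a hypothetical function name; it uses an explicit character class so that underscores are rejected, and it prints the pattern that the unescaped string form actually produces.

```javascript
// Accept only ASCII letters and digits, as the question requires.
function isValidBackupName(name) {
  return /^[0-9A-Za-z]+$/.test(name);
}

console.log(isValidBackupName('31Jan2012'));  // true
console.log(isValidBackupName('my_backup'));  // false
console.log(/^\w+$/.test('my_backup'));       // true, because \w includes _

// Inside a string literal the backslash is dropped unless it is doubled.
console.log(new RegExp("^[\w]+$").source);    // ^[w]+$
console.log(new RegExp("^[\\w]+$").source);   // ^[\w]+$
```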
When you declare the regex as a string parameter to the RegExp constructor, you need to escape it. Both
var regexRule = new RegExp("^[\\w]+$");
...and...
var regexRule = new RegExp(/^[\w]+$/);
will work.
Keep in mind, though, that client-side validation of database data will never be enough: it is easily bypassed by disabling JavaScript in the browser, so invalid or malicious data can still reach your DB. You need to validate the data on the server side as well. Validating on the client is still good practice, because it avoids sending requests with invalid data.
This is the official spec: http://dev.mysql.com/doc/refman/5.0/en/identifiers.html but it's not very easily converted to a regular expression. Just a regular expression won't do it as there are also reserved words.
Why not just put it in the query (don't forget to escape it properly) and let MySQL give you an error? There might for instance be a bug in the MySQL version you're using, and even though your check is correct, MySQL might still refuse.

Why is regex failing when input contains a newline?

I've inherited this javascript regex from another developer and now, even though nothing has changed, it doesn't seem to match the required text. Here is the regex:
/^.*(already (active|exists|registered)).*$/i
I need it to match any text that looks like
stuff stuff already exists more stuff etc
It looks perfectly fine to me, it only looks for those 2 words together and should in theory ignore the rest of the string. In my script I check the text like this
var cardUsedRE = /^.*(already (active|exists|registered)).*$/i;
if(cardUsedRE.test(responseText)){
mdiv.className = 'userError';
mdiv.innerHTML = 'The card # has already been registered';
document.getElementById('cardErrMsg').innerHTML = arrowGif;
}
I've stepped through this in FireBug and I've seen it fail to test this string:
> Error: <detail>Card number already registered for CLP.\n</detail>
Am I missing something? What is the likely issue with this?
Here's a simplified but functionally-equivalent regex that should handle newlines:
/(already\s+(active|exists|registered))/i
Not sure why you'd ever want to lead with ^.* or end with .*$ unless your goal is specifically to prevent newlines. Otherwise it's just superfluous.
EDIT: I replaced the space with \s+ so it will be more liberal with how it handles whitespace (e.g. one space, two spaces, a tab, etc. should all match).
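tl;dr: Use the m modifier so that ^ and $ match at line breaks instead of only at the start and end of the whole string. The dot does not match a newline (with or without m), so without m the trailing .*$ can never reach the end of the input. See the MDC regular expression documentation.
Failing (note the "\n" in the string literal):
var str = "Error: <detail>Card number already registered for CLP.\n</detail>"
str.match(/^.*(already (active|exists|registered)).*$/i)
Working (note the m flag for "multi-line" behavior of ^ and $):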
var str = "Error: <detail>Card number already registered for CLP.\n</detail>"
str.match(/^.*(already (active|exists|registered)).*$/mi)
I would use a simpler form, however: (Adjust for definition of "space".)
var str = "Error: <detail>Card number already registered for CLP.\n</detail>";
str.match(/(?:already\s+(?:active|exists|registered))/i)
Happy coding.
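The behavior of the flags can be compared side by side. This is a sketch using the string from the question; the s (dotAll) flag is an ES2018 addition that postdates the answers above, so its availability in your target browsers is an assumption.

```javascript
var str = "Error: <detail>Card number already registered for CLP.\n</detail>";

// No m flag: $ only matches at the very end, and . cannot cross the "\n".
console.log(/^.*(already (active|exists|registered)).*$/i.test(str));  // false

// m flag: ^ and $ also match at line breaks, so the first line matches alone.
console.log(/^.*(already (active|exists|registered)).*$/mi.test(str)); // true

// s flag (ES2018): . also matches "\n", so .* can reach the end of the input.
console.log(/^.*(already (active|exists|registered)).*$/si.test(str)); // true

// Unanchored form: no flags beyond i are needed.
console.log(/already\s+(?:active|exists|registered)/i.test(str));      // true

// The m flag does not change what . matches.
console.log(/./m.test("\n")); // false
console.log(/./s.test("\n")); // true
```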
