I am trying to find a regex that will work for validating URLs. I found this guy:
^(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?$
Which worked pretty well when I tested it using regexpal, but when I actually plug it into my javascript it fails to match. Fiddle here.
I am testing against this URL:
http://s3.amazonaws.com/SomeShow/Podcasts/HouroneofWhatever234.mp3
Can anyone see why it would match in regexpal, but not when I try to use it in my javascript?
Use a regex literal:
/^(http|ftp|https):\/\/[\w-]+...$/
(You also have to escape the forward slashes so that they are not interpreted as the closing delimiter of the regex literal.)
If you use a string, you have to escape every backslash, because the backslash is the escape character in strings as well.
new RegExp("^(http|ftp|https)://[\\w-]+...$")
In your current expression, "[\w-]+" turns into [w-]+ because \w is not a valid escape sequence in strings, so the backslash is silently dropped.
See: https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Regular_Expressions#Creating_a_Regular_Expression
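As a rough illustration of the difference, the sketch below builds the full pattern from the question in three ways and tests each one against the question's URL. The variable names are made up for the example.

```javascript
// The URL from the question.
const url = "http://s3.amazonaws.com/SomeShow/Podcasts/HouroneofWhatever234.mp3";

// In a string literal "\w" is not a known escape, so the backslash is dropped.
const collapsed = "[\w-]+"; // same string as "[w-]+"

// Regex literal: backslashes written once, forward slashes escaped.
const fromLiteral =
  /^(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])?$/;

// String form: every backslash doubled, forward slashes left alone.
const fromString = new RegExp(
  "^(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w.,#?^=%&:/~+#-]*[\\w#?^=%&/~+#-])?$"
);

// String form with single backslashes: \w silently becomes a plain "w".
const broken = new RegExp(
  "^(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?$"
);

console.log(collapsed);
console.log(fromLiteral.test(url), fromString.test(url), broken.test(url));
```

The broken variant fails on this URL because its host part only accepts the letter w and hyphens.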
I have created a JavaScript regular expression in order to validate comments entered by users in my app. The regex allows letters, numbers, some special symbols, and a range of emojis.
I received help here to correctly format my javascript regular expression and the final expression I am using is as follows:
Javascript Regex:
commentRegex = /^(?:[A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!\'&*()="?#+%:;\<\[\]\r\r\n]|(?:\ud83c[\udf00-\udfff])|(?:\ud83d[\udc00-\ude4f\ude80-\udeff]))*$/;
I was advised to perform the same validation on the server side (with php) and so I am trying to perform a similar process using preg_replace().
So I would like to replace all characters that are not allowed by the regex with the empty string. Here is my attempt; however, it is not working. Thanks for any help.
PHP
$commentText = preg_replace('#^(?:[A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!\'&*()="?#+%:;\<\[\]\r\r\n]|(?:\ud83c[\udf00-\udfff])|(?:\ud83d[\udc00-\ude4f\ude80-\udeff]))*$#', '', $commentText);
Edit:
After taking your advice in the comments I now have the following regex.
$postText = preg_replace('/^(?:[A-Za-z0-9\x{00C0}-\x{017F}\x{20AC}\x{2122}\x{2150}\x{00A9} \/.,\-_$!\'&*()="?\#\+%:;\<\[\]\r\n]|(?:\x{d83c}[\x{df00}-\x{dfff}])|(?:\x{d83d}[\x{dc00}-\x{de4f}\x{de80}-\x{deff}]))*$/', '', $postText);
However I am getting a warning
Warning: preg_replace(): Compilation failed: character value in \x{} or \o{} is too large at offset 30 in submit_post.php on line 37
In short: use
$re = '/[^A-Za-z0-9\x{00C0}-\x{017F}\x{20AC}\x{2122}\x{2150}\x{00A9} \/.,\-_$!\'&*()="?#+%:;<[\]\r\n\x{1F300}-\x{1F3FF}\x{1F400}-\x{1F64F}\x{1F680}-\x{1F6FF}]+/u';
$text = 'test>><<<®¥§';
echo preg_replace($re, '', $text);
See the PHP demo.
A bit of an explanation:
Escape only the special regex metacharacters inside the pattern and the regex delimiter (if you choose # as the delimiter, escape any # in the pattern; there is then no need to escape /).
\uXXXX must be replaced with the \x{XXXX} notation in PCRE.
Since the text to be processed is Unicode and the pattern contains characters outside the ASCII range, you have to use the /u (UNICODE) modifier.
As most emojis lie outside the BMP, and the string is now treated as a sequence of Unicode code points, these symbols must be written as single code points with the extended \x{XXXXX} notation, not as the surrogate pairs used in JavaScript.
Your 3 alternatives can be merged into 1 big character class, which you then negate by adding ^ at its start.
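For comparison, the same approach can be sketched on the JavaScript side, assuming a runtime that supports the u flag. This is only an analogue of the PHP pattern above, not a replacement for the server-side check, and stripDisallowed is a hypothetical helper name.

```javascript
// One negated character class. With the u flag, emoji are written as single
// code points (\u{1F300}) rather than surrogate pairs (\ud83c\udf00).
const disallowed =
  /[^A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!'&*()="?#+%:;<[\]\r\n\u{1F300}-\u{1F3FF}\u{1F400}-\u{1F64F}\u{1F680}-\u{1F6FF}]+/gu;

// Hypothetical helper: removes every run of characters outside the allowed set.
function stripDisallowed(text) {
  return text.replace(disallowed, "");
}

console.log(stripDisallowed("test>><<<\u00AE\u00A5\u00A7")); // ">" and the three symbols are dropped
console.log(stripDisallowed("ok \u{1F600}"));               // U+1F600 is inside an allowed range
```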
A regex in PHP is surrounded by a delimiter character. In your case you are using the hash (#), but that character should not occur unescaped inside the regex itself, which it does...
You have to escape this character inside the pattern, or use another delimiter. Why did you not use the same "/" as in the JS version? The benefit is that it is already escaped.
I have not checked whether the rest would work, but I think so.
$commentText = preg_replace('/^(?:[A-Za-z0-9\u00C0-\u017F\u20AC\u2122\u2150\u00A9 \/.,\-_$!\'&*()="?#+%:;\<\[\]\r\r\n]|(?:\ud83c[\udf00-\udfff])|(?:\ud83d[\udc00-\ude4f\ude80-\udeff]))*$/', '', $commentText);
should work.
Convert the \u.... sequences to \x{....}, and the result appears to be a valid PHP regular expression.
pattern: \\u(\w{4})
replace: \\x{$1}
regex101 demo
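A minimal sketch of that search/replace pair, run in JavaScript on a short fragment of the character class (the variable names are invented for the example):

```javascript
// String.raw keeps the backslashes, so this holds the pattern text itself.
const jsClass = String.raw`[\u00C0-\u017F\u20AC\u2122]`;

// Same pattern/replacement as above: \uXXXX becomes \x{XXXX}.
const pcreClass = jsClass.replace(/\\u(\w{4})/g, "\\x{$1}");

console.log(pcreClass);
```

Note that this is a purely textual conversion. Surrogate pairs such as \ud83c\udf00 would come out as two separate \x{....} escapes, so the emoji ranges would still need to be rewritten by hand as single code points.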
I am trying to find all occurrences of a special character / surrounded by either letters or numbers.
After many tries, I have come up with the following Regex that almost does what I need:
(?![a-z0-9])\/(?=[a-z0-9])
This works fine for these examples:
aa/aa
123/123
aa/123
However, it fails if there are two forward slashes together:
http://regexr.com/
In this case, it matches the second forward slash after http which I do not want.
How can I modify this Regex to meet my needs?
EDIT: I do not want a match when two forward slashes are together. I only want to match if a single forward slash is surrounded by alphanumeric characters.
You would need a positive lookbehind group, like so:
(?<=[a-z0-9])\/(?=[a-z0-9])
However, according to http://regexr.com/, lookbehind is not supported in JavaScript (it was only added to the language later, in ES2018).
It works fine in, e.g., Python: http://pythex.org/
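On a JavaScript engine that does implement ES2018 lookbehind (an assumption about your environment), the idea can be tried out roughly like this. countMatches is a hypothetical helper, not something from the question.

```javascript
// A slash with an alphanumeric character directly before and after it.
// Neither neighbour is consumed, so adjacent matches can share a character.
const single = /(?<=[a-z0-9])\/(?=[a-z0-9])/g;

// Hypothetical helper: how many slashes in the text qualify.
function countMatches(text) {
  return (text.match(single) || []).length;
}

for (const sample of ["aa/aa", "123/123", "aa/123", "http://regexr.com/", "a/b/c"]) {
  console.log(sample, countMatches(sample));
}
```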
Easy!
(?![a-z0-9])\/+(?=[a-z0-9])
You should have used + to match 1 or more occurrences of a character. So you should have written \/+ instead of just \/.
Try this
(!?[a-z0-9])\/(?=[a-z0-9])
Try this
[a-z0-9](\/)[a-z0-9]
Regex demo
Explanation:
( … ): Capturing group
\: Escapes a special character
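One thing to watch with the capture-group version is that it consumes the character after the slash, so in a string like a/b/c the second slash is skipped. The sketch below compares it with a variant that uses a lookahead for the trailing character; the lookahead variant and the slashPositions helper are my own additions for illustration.

```javascript
// Pattern from the answer: both neighbours are part of the match.
const consuming = /[a-z0-9](\/)[a-z0-9]/g;

// Variant (assumption): the trailing character is only looked at, not consumed.
const lookahead = /[a-z0-9](\/)(?=[a-z0-9])/g;

// Hypothetical helper: index of each matched slash in the text.
function slashPositions(text, re) {
  // m.index is where the whole match starts; the slash is one character later.
  return [...text.matchAll(re)].map((m) => m.index + 1);
}

console.log(slashPositions("a/b/c", consuming));
console.log(slashPositions("a/b/c", lookahead));
console.log(slashPositions("http://regexr.com/", lookahead));
```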
I am facing these two weird JSLint issues for the code below:
function hasSpecialChars(str){
return (/[~`#!#$%\^&*+=\-\[\]\\';,/{}()|\\":<>\?\s]/g).test(str);}
Unescape '/'
wrap regex patterns /regexp/ to disambiguate slash operator
I am trying to find the special characters in the string given.
You need to use the match function in order to find all the special characters.
str.match(/[~`#!#$%\^&*+=\-\[\]';,\/{}()|"\\:<>\?\s]/g)
And you must escape the forward slash.
To test for at least one special character:
/[~`#!#$%^&*+=\-\[\]';,\/{}()|"\\:<>?\s]/.test(str)
or
/\W/.test(str)
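Put together, a version of the original function with the forward slash escaped might look like the sketch below. Whether JSLint accepts it as-is depends on the JSLint version and options, so treat it as a starting point. Note that /\W/ is not an exact substitute: it also flags accented letters, which the explicit class does not.

```javascript
function hasSpecialChars(str) {
  // "/" is escaped, and there is no g flag: test() only needs a single match.
  return /[~`#!$%\^&*+=\-\[\]\\';,\/{}()|":<>?\s]/.test(str);
}

console.log(hasSpecialChars("abc123"));
console.log(hasSpecialChars("a/b"));
console.log(hasSpecialChars("caf\u00E9"), /\W/.test("caf\u00E9"));
```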
I am a Regex newbie and trying to implement Regex to replace a matching pattern in a string only when it has a ( - open parentheses using Javascript. for example if I have a string
IN(INTERM_LEVEL_IN + (int)X_ID)
I would only like to highlight the first IN( in the string. Not the INTERM_LEVEL_IN (2 ins here) and the int.
What is the Regex to accomplish this?
To match the opening bracket you just need to escape it: IN\(.
For instance, running this in Firebug console:
"IN(INTERM_LEVEL_IN + (int)X_ID)".replace(/(IN\()/, 'test');
Will result in:
>>> "IN(INTERM_LEVEL_IN + (int)X_ID)".replace(/(IN\()/, 'test');
"testINTERM_LEVEL_IN + (int)X_ID)"
Parentheses in regular expressions have a special meaning (sub-capture groups), so when you want them to be interpreted literally you have to escape them with a \ before them. The regular expression IN\( would match the string IN(.
The following should only match IN( at the beginning of a line:
/^IN\(/
The following would match IN( that is not preceded by any alphanumeric character or underscore (\b is a word boundary):
/\bIN\(/
And finally, the following would match any instance of IN( no matter what precedes it:
/IN\(/
So, take your pick. If you're interested in learning more about regex, here's a good tutorial: http://www.regular-expressions.info/tutorial.html
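To see how the variants differ, here is a small sketch that runs them against the string from the question and two made-up inputs. The "not preceded by a word character" case is written with a word boundary, \b, which is one possible spelling of that condition.

```javascript
const text = "IN(INTERM_LEVEL_IN + (int)X_ID)";

// Escaped parenthesis: only the literal "IN(" is replaced.
const replaced = text.replace(/IN\(/, "test");
console.log(replaced);

const atStart = /^IN\(/;      // only at the beginning of the input
const standalone = /\bIN\(/;  // not preceded by a letter, digit or underscore
const anywhere = /IN\(/;      // no restriction on what comes before

// "WITHIN(1)" and "x = IN(1)" are invented inputs for the comparison.
for (const sample of [text, "x = IN(1)", "WITHIN(1)"]) {
  console.log(sample, atStart.test(sample), standalone.test(sample), anywhere.test(sample));
}
```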
You can use just regular old Javascript for regex, a simple IN\( would work for the example you gave (see here), but I suspect your situation is more complicated than that. In which case, you need to define exactly what you are trying to match and what you don't want to match.
I wrote the following regex:
(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?
Its behaviour can be seen here: http://gskinner.com/RegExr/?34b8m
I wrote the following JavaScript code:
var urlexp = new RegExp(
'^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$', 'gi'
);
document.write(urlexp.test("blaaa"))
And it returns true even though the regex was supposed to not allow single words as valid.
What am I doing wrong?
Your problem is that JavaScript is viewing all your escape sequences as escapes for the string. So your regex goes to memory looking like this:
^(https?://)?([da-z.-]+).([a-z]{2,6})(/(w|-)*)*/?$
Which you may notice causes a problem in the middle, where what you thought was a literal period turns into a regular expression wildcard. You can solve this in a couple of ways. Using the forward slash regular expression syntax JavaScript provides:
var urlexp = /^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$/gi
Or by escaping your backslashes (and not your forward slashes, as you had been doing - that's exclusively for when you're using /regex/mod notation, just like you don't have to escape your single quotes in a double quoted string and vice versa):
var urlexp = new RegExp('^(https?://)?([da-z.-]+)\\.([a-z]{2,6})(/(\\w|-)*)*/?$', 'gi')
Please note the double backslash before the w - also necessary for matching word characters.
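The effect is easy to reproduce. In this sketch the g flag from the question is left out on purpose, because test() on a global regex keeps state in lastIndex between calls; the variable names are invented for the example.

```javascript
// As in the question: single backslashes inside a string are lost.
const asWritten = new RegExp('^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$', 'i');

// String form with doubled backslashes and plain forward slashes.
const doubled = new RegExp('^(https?://)?([da-z.-]+)\\.([a-z]{2,6})(/(\\w|-)*)*/?$', 'i');

// Regex literal with escaped forward slashes.
const literal = /^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$/i;

// "blaaa" matches asWritten because the unescaped "." accepts any character.
for (const sample of ["blaaa", "example.com/path"]) {
  console.log(sample, asWritten.test(sample), doubled.test(sample), literal.test(sample));
}
```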
A couple notes on your regular expression itself:
[da-z.-]
d is contained in the a-z range. Unless you meant \d? In that case, the backslash is important.
(/(\w|-)*)*/?
My own misgivings about the nested Kleene stars aside, you can whittle that alternation down into a character class, and drop the terminating /? entirely, as a trailing slash will be matched by the group as you've given it. I'd rewrite as:
(/[\w-]*)*
Though, maybe you'd just like to catch non space characters?
(/[^/\s]*)*
Anyway, modified this way your regular expression winds up looking more like:
^(https?://)?([\da-z.-]+)\.([a-z]{2,6})(/[\w-]*)*$
Remember, if you're going to use string notation: Double EVERY backslash. If you're going to use native /regex/mod notation (which I highly recommend), escape your forward slashes.
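As a quick check of the rewritten expression in literal notation, here is a sketch with a few made-up inputs. It also shows why the \d matters: without the backslash, a host that starts with digits is rejected.

```javascript
// Rewritten pattern in literal notation, forward slashes escaped.
const urlexp = /^(https?:\/\/)?([\da-z.-]+)\.([a-z]{2,6})(\/[\w-]*)*$/i;

// Same pattern with a plain "d" instead of "\d", for comparison only.
const withoutDigits = /^(https?:\/\/)?([da-z.-]+)\.([a-z]{2,6})(\/[\w-]*)*$/i;

// Invented sample inputs.
const samples = ["blaaa", "example.com", "http://example.com/some-path/", "123.com"];
for (const sample of samples) {
  console.log(sample, urlexp.test(sample), withoutDigits.test(sample));
}
```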