Using the same regex in both PHP and JavaScript

Problem:-
I am using the following regex to find the special characters in a string
"/[^a-zA-Z0-9\s]/i"
I want to get all the characters that match this pattern, and that is working fine.
The condition is that I have to use the same expression in both PHP and JavaScript.
But the g flag in the above regex is causing a problem, because preg_match and preg_match_all do not accept this flag and I have to search globally.
Question:-
So how can I get all the special characters using the same expression in both PHP and JavaScript?

You can't use the same regex in both PHP and JavaScript because their regex engines make different assumptions and support different features.
More than just the incompatibility with the g modifier, this regex will fail you if the input contains non-ASCII characters: the input encoding in PHP and JS will almost certainly be different, and PHP will not even be Unicode-aware unless you use the u flag (JavaScript strings are always Unicode; its own u flag, added in ES2015, changes how the pattern is interpreted rather than enabling Unicode input).
Just use two different regular expressions.
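For plain ASCII input, the two expressions may end up differing only in delimiters and flags. A minimal sketch of the JavaScript side, assuming the pattern body is kept as a flag-free string (the names SPECIAL_CHARS and findSpecialChars are made up for illustration). On the PHP side the same body would be wrapped in delimiters and passed to preg_match_all, which already scans the whole subject, so no g flag is needed there. The encoding caveats above still apply to non-ASCII input.

```javascript
// Pattern body only: no delimiters, no flags. The backslash is doubled
// because this is a string literal, not a regex literal.
const SPECIAL_CHARS = "[^a-zA-Z0-9\\s]";

// Hypothetical helper: returns every character matched by the pattern.
function findSpecialChars(input) {
  // The g flag is added here, on the JavaScript side only.
  const re = new RegExp(SPECIAL_CHARS, "g");
  // String.prototype.match returns null when nothing matches.
  return input.match(re) || [];
}

console.log(findSpecialChars("a+b = c!"));
```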

To match [^a-zA-Z0-9\s] in JavaScript you would have to use:
[\u0000-\u0008\u000F-\u001F\u0022-\u002F\u003B-\u0040\u005C-\u0060\u007C-\u0084\u0087-\u009F\u00A2-\u167F\u1682-\u180D\u1810-\u1FFF\u200C-\u2027\u202B-\u202E\u2031-\u205E\u2061-\u2FFF\u3002-\uFFFF]

Related

JS : Test if string contains any unicode capital

Do you know if there is a JS regular expression that would catch any possible Unicode capital letter? Of course [A-Z] works, but there are thousands of alternate capitals.
Thanks in advance for the hints.
The only Unicode support in JavaScript regex (at least ecmascript 5 and below) is matching specific code points of the form \uFFFF. You can use those in ranges in character classes. (see this question)
This of course, makes your task difficult. But I did find an online utility that says it:
Compiles character ranges suitable for use in JavaScript, using the cset library.
Selecting "uppercase letter", then, produces this regex:
[A-ZÀ-ÖØ-ÞĀĂĄĆĈĊČĎĐĒĔĖĘĚĜĞĠĢĤĦĨĪĬĮİIJĴĶĹĻĽĿŁŃŅŇŊŌŎŐŒŔŖŘŚŜŞŠŢŤŦŨŪŬŮŰŲŴŶŸ-ŹŻŽƁ-ƂƄƆ-ƇƉ-ƋƎ-ƑƓ-ƔƖ-ƘƜ-ƝƟ-ƠƢƤƦ-ƧƩƬƮ-ƯƱ-ƳƵƷ-ƸƼDŽLJNJǍǏǑǓǕǗǙǛǞǠǢǤǦǨǪǬǮDZǴǶ-ǸǺǼǾȀȂȄȆȈȊȌȎȐȒȔȖȘȚȜȞȠȢȤȦȨȪȬȮȰȲȺ-ȻȽ-ȾɁɃ-ɆɈɊɌɎͰͲͶΆΈ-ΊΌΎ-ΏΑ-ΡΣ-ΫϏϒ-ϔϘϚϜϞϠϢϤϦϨϪϬϮϴϷϹ-ϺϽ-ЯѠѢѤѦѨѪѬѮѰѲѴѶѸѺѼѾҀҊҌҎҐҒҔҖҘҚҜҞҠҢҤҦҨҪҬҮҰҲҴҶҸҺҼҾӀ-ӁӃӅӇӉӋӍӐӒӔӖӘӚӜӞӠӢӤӦӨӪӬӮӰӲӴӶӸӺӼӾԀԂԄԆԈԊԌԎԐԒԔԖԘԚԜԞԠԢԱ-ՖႠ-ჅḀḂḄḆḈḊḌḎḐḒḔḖḘḚḜḞḠḢḤḦḨḪḬḮḰḲḴḶḸḺḼḾṀṂṄṆṈṊṌṎṐṒṔṖṘṚṜṞṠṢṤṦṨṪṬṮṰṲṴṶṸṺṼṾẀẂẄẆẈẊẌẎẐẒẔẞẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼẾỀỂỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỦỨỪỬỮỰỲỴỶỸỺỼỾἈ-ἏἘ-ἝἨ-ἯἸ-ἿὈ-ὍὙὛὝὟὨ-ὯᾸ-ΆῈ-ΉῘ-ΊῨ-ῬῸ-Ώℂℇℋ-ℍℐ-ℒℕℙ-ℝℤΩℨK-ℭℰ-ℳℾ-ℿⅅↃⰀ-ⰮⱠⱢ-ⱤⱧⱩⱫⱭ-ⱯⱲⱵⲀⲂⲄⲆⲈⲊⲌⲎⲐⲒⲔⲖⲘⲚⲜⲞⲠⲢⲤⲦⲨⲪⲬⲮⲰⲲⲴⲶⲸⲺⲼⲾⳀⳂⳄⳆⳈⳊⳌⳎⳐⳒⳔⳖⳘⳚⳜⳞⳠⳢꙀꙂꙄꙆꙈꙊꙌꙎꙐꙒꙔꙖꙘꙚꙜꙞꙢꙤꙦꙨꙪꙬꚀꚂꚄꚆꚈꚊꚌꚎꚐꚒꚔꚖꜢꜤꜦꜨꜪꜬꜮꜲꜴꜶꜸꜺꜼꜾꝀꝂꝄꝆꝈꝊꝌꝎꝐꝒꝔꝖꝘꝚꝜꝞꝠꝢꝤꝦꝨꝪꝬꝮꝹꝻꝽ-ꝾꞀꞂꞄꞆꞋA-Z]|\ud801[\udc00-\udc27]|\ud835[\udc00-\udc19\udc34-\udc4d\udc68-\udc81\udc9c\udc9e-\udc9f\udca2\udca5-\udca6\udca9-\udcac\udcae-\udcb5\udcd0-\udce9\udd04-\udd05\udd07-\udd0a\udd0d-\udd14\udd16-\udd1c\udd38-\udd39\udd3b-\udd3e\udd40-\udd44\udd46\udd4a-\udd50\udd6c-\udd85\udda0-\uddb9\uddd4-\udded\ude08-\ude21\ude3c-\ude55\ude70-\ude89\udea8-\udec0\udee2-\udefa\udf1c-\udf34\udf56-\udf6e\udf90-\udfa8\udfca]
I've also read (but not used personally) that the XRegExp javascript library is good and would allow you to use \p{Lu}.
Here's a link containing all Unicode capital letters. This is based on the GREP engine of Adobe InDesign CC2015, searching for the posix expression [[:upper:]]:
http://www.id-extras.com/uploads/AllUnicodeCapitals.html
With any JavaScript environment supporting the ECMAScript2018+ standard, you can use
/\p{Lu}/u
to test if a string contains any Unicode uppercase letter.
See a JavaScript demo:
console.log(/\p{Lu}/u.test('... Yes!'));
console.log(/\p{Lu}/u.test('Łąka'));
console.log(/\p{Lu}/u.test('и Витя с ними'));
console.log(/\p{Lu}/u.test('nonono'));

How to translate a Ruby regex to JavaScript?

In Ruby I have a regex to get a string formatted like "#xxx":
(/(?<!\S)#[A-Za-z0-9\-]+/)
I also need this regex on the client side, but JavaScript can't read this.
How can I change this regex to JavaScript?
Well, JavaScript regular expressions had no lookbehind before ES2018, so you can't use (?<!\S) in engines that predate it.
You can use:
/(?:^|\s)(#[A-Za-z0-9-]+)/
And use captured group #1 for your matched text.
Alternatively you can use XRegExp library in JS and use the lookbehind feature.
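A short sketch of the capture-group approach, with the g flag added so every tag is collected. The sample text is invented for illustration. The second pattern is the original Ruby lookbehind carried over unchanged, which should only be expected to work in engines that implement ES2018 lookbehind.

```javascript
const text = "#first tag, not#this, but #second-one";

// Portable form: match start-of-string or whitespace, then capture the tag.
const captureRe = /(?:^|\s)(#[A-Za-z0-9-]+)/g;
// m[1] is captured group #1, i.e. the tag without the leading whitespace.
const tags = Array.from(text.matchAll(captureRe), (m) => m[1]);

// ES2018+ form: the lookbehind consumes nothing, so the whole match is the tag.
const lookbehindRe = /(?<!\S)#[A-Za-z0-9-]+/g;
const lookbehindTags = text.match(lookbehindRe) || [];

console.log(tags);
console.log(lookbehindTags);
```

Both forms skip "#this" because it is preceded by a non-whitespace character.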

help making a "universal" regex Javascript compatible

I found a very nice URL regex matcher on this site: http://daringfireball.net/2010/07/improved_regex_for_matching_urls . It states that it's free to use and that it's cross language compatible (including Javascript). First of all, I have to escape some of the slashes to get it to compile at all. When I do that, it works fine on Rubular.com (where I generally test regexes), with the strange side effect that each match has 5 fields: 1 is the url, and the extra 4 are empty. When I put this in JS, I get the error "Invalid Group". I am using Node.js if that makes any difference, but I wish I could understand that error. I'd like to cut back on the unnecessary empty match fields, but I don't even know where to begin diagnosing this beast. This is what I had after escaping:
(?xi)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’] ))
Actually, you don't need the first capturing group either; it's the same as the whole match in this case, and that can always be accessed via $&. You can change all the capturing groups to non-capturing by adding ?: after the opening parens:
/\b(?:(?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\((?:[^\s()<>]+|(?:\([^\s()<>]+\)))*\))+(?:\((?:[^\s()<>]+|(?:\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/i
That "invalid group" error is due to the inline modifiers (i.e., (?xi)) which, as @kirilloid observed, are not supported in JavaScript. John Gruber (the regex's author) was mistaken about that, as he was about JS supporting free-spacing mode.
Just FYI, the reason you had to escape the slashes is because you were using regex-literal notation, the most common form of which uses the forward-slash as the regex delimiter. In other words, it's the language (Ruby or JavaScript) that requires you to escape that particular character, not the regex. Some languages let you choose different regex delimiters, while others don't support regex literals at all.
But these are all language issues, not regex issues; the regex itself appears to work as advertised.
It seems that you copied it wrong.
http://www.regular-expressions.info/javascript.html
No mode modifiers to set matching options within the regular expression.
No regular expression comments
I.e. the (?xi) at the beginning has to go:
x is pointless for a regex that is already written compactly
i can be replaced with the i flag
All this results in:
/\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/i
Tested and working in Google Chrome => should work in Node.js
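The point about inline modifiers can be checked directly. The sketch below uses a shortened fragment of the pattern rather than the full URL regex, and translateInlineFlags is a hypothetical helper, not part of any library. It moves a leading (?...) modifier group into the flags argument and drops x, on the assumption that the pattern contains no free-spacing whitespace or comments that would need stripping.

```javascript
// Hypothetical helper: turn a leading inline modifier such as (?xi)
// into RegExp flags. The x modifier is dropped because JavaScript has
// no free-spacing mode; whitespace in the pattern is NOT removed here.
function translateInlineFlags(source) {
  const leading = /^\(\?([a-z]+)\)/.exec(source);
  if (!leading) return new RegExp(source);
  const flags = leading[1].replace(/x/g, "");
  return new RegExp(source.slice(leading[0].length), flags);
}

// Compiling the pattern as-is is rejected by the engine.
let rejected = false;
try {
  new RegExp("(?xi)\\bwww\\d{0,3}[.]");
} catch (err) {
  rejected = err instanceof SyntaxError;
}

const re = translateInlineFlags("(?xi)\\bwww\\d{0,3}[.]");
console.log(rejected, re.flags, re.test("WWW2.example.com"));
```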

Regular expression validator is not allowing non-english characters for \w

I have an email field in my page, which I am validating using the regular expression validator provided by ASP.NET. I am using the same validation expression that is supplied with the validator for emails, i.e.
ValidationExpression="\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"
It is working fine, but the problem comes when I try entering non-English letters, e.g.
è é ü û ă etc.
But my problem is that when I use the same expression in JavaScript it allows these characters, and the same expression also allows them on the server side.
I think '\w' allows all alphanumeric characters as well as non-English characters, but
I don't know why it is not allowing them when used in the validator.
Please suggest if I did anything wrong.
\w means word character, and the definition of a word character may differ from implementation to implementation. Some use only [A-Za-z0-9_] while others also include non-US-ASCII characters (see “ascii-only” in comparison of regular expression flavors).
If you want to make sure that the same characters are used, list them explicitly like [A-Za-z0-9_èéüûă].
This is a limitation of the ECMAScript standard; for example, in .NET \w also matches non-English characters.
The simplest solution is to turn off client-side validation as you are working with ASP.NET, so the server-side validator (which uses the .NET implementation) will validate accordingly.
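using System;
using System.Text.RegularExpressions;

// Verbatim string (@"...") so the backslash reaches the regex engine unchanged.
var r = new Regex(@"\w");
foreach (var m in r.Matches("è é ü û a"))
    Console.WriteLine(m);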
Output:
è
é
ü
û
a
It's a known issue: the ASP.NET regular expression client-side validator is buggy for non-English characters. You may either use server-side validation (if it's an option), or write your own client-side CustomValidator.
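The difference in what \w covers can be seen from the JavaScript side. This is a minimal sketch, assuming an engine with ES2018 Unicode property escapes. The helper name classify is invented for illustration, and [\p{L}\p{N}_] is only one possible approximation of a Unicode-aware word character, not the exact set .NET uses.

```javascript
// ECMAScript \w is limited to [A-Za-z0-9_].
const asciiWord = /^\w$/;
// Approximation of a Unicode-aware word character (letters, numbers, underscore).
const unicodeWord = /^[\p{L}\p{N}_]$/u;

// Hypothetical helper: reports how each pattern treats a single character.
function classify(ch) {
  return { ascii: asciiWord.test(ch), unicode: unicodeWord.test(ch) };
}

// Escapes are used so the sample characters are precomposed code points:
// è, é, ü, û, ă, followed by plain "a" for comparison.
const samples = ["\u00E8", "\u00E9", "\u00FC", "\u00FB", "\u0103", "a"];
for (const ch of samples) {
  console.log(ch, classify(ch));
}
```

A custom client-side validator could use the second pattern in place of \w if accented letters should be accepted.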

How to find a URL within full text using regular expression

What is wrong with the following regular expression, which works in many online JavaScript regular expression testers (and RegEx Buddy), yet doesn't work in my application?
It is intended to replace URLs with a Hyperlink. The Javascript is found in a javascript file.
var fixed = text.replace(/\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]/ig, "<a href='$&' target='blank'>$&</a>");
Chrome, for example, complains that & is not valid (as does IE8). Is there some way to escape the ampersand (or whatever else is wrong), without resorting to the RegEx object?
Those testers let you input the regex in its raw form, but when you use it in source code you have to write it in the form of a string literal or (as is the case here) a regex literal. JavaScript uses forward-slashes for its regex-literal delimiters, so you have to escape any slashes in the regex itself to avoid confusing the interpreter.
Once you escape the slashes it should stop complaining about the ampersand. That was most likely caused by the malformed regex literal.
I recognize that regex, having used it myself the other day; you got it from RegexBuddy's Library, didn't you? If you had used RB's "Use" feature to create a JS-compatible regex, it would have escaped the slashes for you.
This works for me in Chrome
var fixed = text.replace(/(ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/igm, "<a href='$&' target='blank'>$&</a>");
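For comparison, here is a sketch of the regex from the question with only the forward slashes escaped, which is the change the first answer describes. The function name linkify and the sample sentence are assumptions made for illustration.

```javascript
// The pattern from the question, with every "/" written as "\/" so the
// regex literal is not terminated early.
const urlPattern = /\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/ig;

// Hypothetical helper: wraps each URL found in the text in an anchor tag.
// In the replacement string, $& stands for the whole match.
function linkify(text) {
  return text.replace(urlPattern, "<a href='$&' target='blank'>$&</a>");
}

console.log(linkify("see http://example.com/a?b=1, ok"));
```

The final character class excludes punctuation such as commas and periods, so a trailing comma after the URL stays outside the link.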
