RegExp \A \z doesn't work, but that's what Rails 4 requires - javascript

I recently switched to Rails 4 and the security requirements no longer seem to allow the use of regular expressions in the style of /^..$/. The error states that regular expressions should instead be written in the style of /\A..\z/. Making this change seems to resolve all of my server side validation issues, but unfortunately it also broke all of my client side validation in javascript.
A simple example: I want to validate a username to be letters, numbers, or periods.
The old regex looked like /^[0-9a-zA-Z.]+$/ and worked both server side (Rails 3.x) and client side
new RegExp( /^[0-9a-zA-Z.]+$/ ).test('myuser.name') = true
The new regex looks like /\A[0-9a-zA-Z.]+\z/ and works server side but fails client side
new RegExp( /\A[0-9a-zA-Z.]+\z/ ).test('myuser.name') = false
So I'm clearly doing something wrong, but I can't seem to find any explanations. I checked that \A..\z are valid regex to make sure that it's not some Rails-specific hack, and it seems to be legit.
Any ideas?

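JavaScript does not support \A or \z in its RegExp. Without the u flag, an unrecognized escape such as \A or \z is simply treated as the literal letter (A or z), which is why the pattern fails silently instead of throwing an error.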
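A quick sketch makes this visible: the Rails pattern pasted unchanged into JavaScript only matches input containing a capital A followed later by a lowercase z, and adding the u flag turns it into a SyntaxError. The variable names are just for illustration.

```javascript
// Rails-style pattern copied into JavaScript unchanged.
const railsStyle = /\A[0-9a-zA-Z.]+\z/;

// Without the u flag, \A means the letter "A" and \z the letter "z".
console.log(railsStyle.test("myuser.name"));   // false
console.log(railsStyle.test("Amyuser.namez")); // true

// With the u flag, unknown escapes are rejected as a SyntaxError instead.
let unicodeError = null;
try {
  new RegExp("\\A[0-9a-zA-Z.]+\\z", "u");
} catch (e) {
  unicodeError = e;
}
console.log(unicodeError instanceof SyntaxError); // true
```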
Here's some raw data, first for JavaScript:
var a = "hello\nworld";
(/^world/).test(a); // false
(/^world/m).test(a); // true
(/hello$/).test(a); // false
(/hello$/m).test(a); // true
Next, for ruby:
a = "hello\nworld"
a.match(/^world/) # => #<MatchData "world">
a.match(/\Aworld/) # => nil
a.match(/hello$/) # => #<MatchData "hello">
a.match(/hello\z/) # => nil
From this, we see that Ruby's \A and \z are equivalent to JavaScript's ^ and $ as long as you don't use the multiline m modifier in JavaScript. If you are concerned about the input having multiple lines, you're simply going to have to translate your regular expressions between these two languages with respect to these anchors.
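Applied to the username check from the question, this means the old ^...$ form is the JavaScript counterpart of Rails' \A...\z, provided the m flag is left off. A minimal sketch (isValidUsername is a hypothetical name):

```javascript
// JavaScript counterpart of Ruby's /\A[0-9a-zA-Z.]+\z/:
// without the m flag, ^ and $ only match at the very start and end of the input.
function isValidUsername(name) {
  return /^[0-9a-zA-Z.]+$/.test(name);
}

console.log(isValidUsername("myuser.name"));       // true
console.log(isValidUsername("myuser.name\n"));     // false, like Ruby's \z
console.log(isValidUsername("evil\nmyuser.name")); // false, like Ruby's \A
```

In Ruby, by contrast, ^ and $ are always line anchors, which is why Rails 4 complains about them: Ruby's /^[0-9a-zA-Z.]+$/ would accept "bad input!\nmyuser.name" because its second line is valid.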

As a workaround for the lack of \A, \Z, and \z support, you can add a "sentinel" character (or characters) to the start and end of the input string.
Please note that:
the sentinel character(s) should be something with a very low chance of appearing in the input string.
this should not be used for anything sensitive (such as user verification), since a workaround like this can easily be exploited.
In this specific case, since only [0-9a-zA-Z.] are allowed, something like ¨ or ~ is OK.
Example:
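let inputString = 'myuser.name';
// Wrap the input in sentinels that stand in for \A and \z
inputString = '¨0' + inputString + '¨1';
let result = /¨0[0-9a-zA-Z.]+(?=¨1)/.test(inputString);
// Strip the sentinels again (strings are immutable, so reassign the result)
inputString = inputString.replace(/^¨0/, '').replace(/¨1$/, '');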
If you're worried that, for some reason, the input string might have the selected characters you're using, you can escape them.

(?<![\r\n])^ emulates \A, match absolute string start.
$(?![\r\n]) emulates \z, match absolute string end.
(source)
(?=[\r\n]?$(?![\r\n])) emulates \Z, match string end (before final newline if present).
If all of your line endings are \n, you can simplify the above to:
\A: (?<!\n)^
\z: $(?!\n)
\Z: (?=\n?$(?!\n))
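Note: these emulations only matter when the m flag is set; without it, ^ and $ already behave like \A and \z. JavaScript has always supported lookahead (used for the \z and \Z emulations above), but lookbehind (used for the \A emulation above) is newer: Safari/WebKit only added it in Safari 16.4, so see caniuse.com and bugs.webkit.org for details if you need to support older browsers. Node.js has had lookbehind support since v9.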
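A quick check of the simplified \n-only forms with the m flag turned on. This sketch assumes an engine with lookbehind support (any current Node.js or browser):

```javascript
const text = "hello\nworld\n";

// With the m flag, plain ^ and $ match at every line boundary.
console.log(/^world/m.test(text));                // true
// The lookbehind/lookahead guards restore \A and \z semantics.
console.log(/(?<!\n)^world/m.test(text));         // false: "world" is not at the string start
console.log(/(?<!\n)^hello/m.test(text));         // true
console.log(/hello$(?!\n)/m.test(text));          // false: "hello" is not at the string end
// \Z emulation: end of string, or just before a final newline.
console.log(/world(?=\n?$(?!\n))/m.test(text));   // true
console.log(/world$(?!\n)/m.test(text));          // false: a "\n" still follows "world"
```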

Related

Safari Regex error "invalid regular expression invalid group specifier name" [duplicate]

In my JavaScript code, this regex /(?<=\/)([^#]+)(?=#*)/ works fine in Chrome, but in Safari, I get:
Invalid regular expression: invalid group specifier name
Any ideas?
Looks like Safari doesn't support lookbehind yet (that is, your (?<=\/)). One alternative would be to put the / that comes before in a non-captured group, and then extract only the first group (the content after the / and before the #).
/(?:\/)([^#]+)(?=#*)/
Also, (?=#*) is odd: you probably want to look ahead for something (such as # or the end of the string), rather than use a * quantifier (zero or more occurrences of #), which always succeeds. It might be better to use something like
/(?:\/)([^#]+)(?=#|$)/
or just omit the lookahead entirely (because the ([^#]+) is greedy), depending on your circumstances.
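If you want lookbehind where it is available and the non-capturing-group fallback elsewhere, one option is to build the regex from a string inside try/catch: a regex literal containing (?<=...) is a parse-time error that stops the whole script in engines without lookbehind, whereas new RegExp(...) throws a catchable SyntaxError at run time. A sketch (extractAfterSlash is a hypothetical helper name):

```javascript
// Detect lookbehind support at run time; a literal /(?<=x)/ would be a
// parse error for the whole script in engines that lack it.
let supportsLookbehind = true;
try {
  new RegExp("(?<=a)b");
} catch (e) {
  supportsLookbehind = false;
}

// Return the text between the first "/" and the next "#" (or the end).
function extractAfterSlash(str) {
  if (supportsLookbehind) {
    const m = str.match(new RegExp("(?<=\\/)[^#]+"));
    return m ? m[0] : null;
  }
  // Fallback: consume the "/" in a non-capturing group and read group 1.
  const m = str.match(/(?:\/)([^#]+)/);
  return m ? m[1] : null;
}

console.log(extractAfterSlash("Get from Slash/to Next hashtag #GMK")); // "to Next hashtag "
```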
Support for RegExp lookbehind assertions has since landed in WebKit:
See: https://github.com/WebKit/WebKit/pull/7109
(?<=...) is not supported in older Safari versions, including on iOS; you can use (?:...) instead.
Note: put the / (or whichever character comes before the text you want) in a non-capturing group.
See details: https://caniuse.com/js-regexp-lookbehind
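let str = "Get from Slash/to Next hashtag #GMK";
// Lookbehind: works in Chrome, but older Safari versions reject the whole script
let lookbehindResult = str.match(/(?<=\/)([^#]+)(?=#*)/g);
console.log("Lookbehind:", lookbehindResult); // [ 'to Next hashtag ' ]
// Non-capturing group: with /g, match() returns whole matches (including the "/"), so read group 1
let groupResult = [...str.matchAll(/(?:\/)([^#]+)(?=#*)/g)].map((m) => m[1]);
console.log("Non-capturing group:", groupResult); // [ 'to Next hashtag ' ]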
Just wanted to put this out there for anyone who stumbles across this issue and can't find anything...
I had the same issue, and it turned out to be a regular expression in one of my dependencies, namely Discord.js.
Luckily I no longer needed that package, but if you do, consider opening an issue on the project (and maybe you shouldn't even be running discord.js in your frontend React app).

Converting Ruby regex to JavaScript one

I would like to convert this Ruby regex to JavaScript one:
/\<br \/\>\<a href\=(.*?)\>([0-9]+\:[0-9]+)\<\/a\> \<a href\=\'.*?\' target\=\_blank\>(.*?)(?=\<\/a\>\<br\>\<p.*?\<\/p\>\<br \/\>\<a href\=.*?\>([0-9]+\:[0-9]+)\<\/a\> \<a href\=\'.*?\' target\=\_blank\>.*?\<\/a\>.*?\<br \/\>)/m
It works perfectly in Ruby, but not in the Chrome JavaScript console. Then I will use it to extract some information from a webpage source HTML code (document.body.innerHTML) with a JavaScript function using this scan method described here: JavaScript equivalent of Ruby's String#scan
I think the lookahead (?= ) may be problematic in JavaScript; on top of that, it contains a capture group. Can it be converted at all?
In JavaScript you could do the following:
var re = new RegExp("<br /><a href=(.*?)>([0-9]+:[0-9]+)</a> <a href='.*?' target=_blank>(.*?)(?=</a><br><p.*?</p><br /><a href=.*?>([0-9]+:[0-9]+)</a> <a href='.*?' target=_blank>.*?</a>.*?<br />)", "m");
But this likely will not work the same because the m modifier in Ruby makes the . match all characters while in JavaScript this means multi-line mode where ^ and $ match at the start and end of each line.
So if you really think you need regex to do this and the HTML data you are matching has or could have line breaks, you will need to remove the m flag and make . match line breaks as well. In engines that support ES2018, add the s (dotAll) flag, which is the JavaScript equivalent of Ruby's m modifier; in older engines, which have no dotall mode, replace .*? with a workaround such as [\S\s]*?.
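For reference, a small sketch of the difference on a made-up HTML snippet (the html string is just an example):

```javascript
const html = "<p>first line\nsecond line</p>";

// Without s, "." stops at line breaks (this is NOT Ruby's /m).
const withoutS = /<p>(.*?)<\/p>/.test(html);
// ES2018 s (dotAll) flag: "." also matches "\n", like Ruby's /m.
const withS = /<p>(.*?)<\/p>/s.exec(html)[1];
// Pre-ES2018 workaround: [\s\S] matches any character, including "\n".
const workaround = /<p>([\s\S]*?)<\/p>/.exec(html)[1];

console.log(withoutS);                   // false
console.log(JSON.stringify(withS));      // "first line\nsecond line"
console.log(workaround === withS);       // true
```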

difference between ruby regex and javascript regex

I made this regular expression: /.net.(\w*)/
I'm trying to capture the qa in a string like this:
https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG
I'm doing .replace on it like so: location.replace(/.net.(\w*)/, data.newName);
But instead of capturing qa, it captures .net when I run the code in JavaScript.
According to this online regex tool made for ruby, it captures qa as intended
http://rubular.com/r/ItrG7BRNRn
What's the difference between JavaScript regexes and Ruby regexes, and how can I make my regex work as intended in JavaScript?
Edit:
I changed my code to this:
var str = `https://xxxxxxxxxx.cloudfront.net/qa/club`;
var re = /\.net\/([^\/]*)\//;
console.log(data2.files[i].location.replace(re,'$1'+ "test"));
And instead of
https://dm7svtk8jb00c.cloudfront.net/test/club
I get this:
https://dm7svtk8jb00c.cloudfrontqatestclub
If I remove the $1 I get https://dm7svtk8jb00c.cloudfronttestclub, which is closer, but I want to keep the slashes.
This would be a better regex:
/\.net\/([^\/]*)\//
Remember that . will match any character, not just a literal period. To match a period, you need to escape it with a leading backslash: \.
Also, \w will only match numbers, letters and underscores. You could quite legitimately have a dash in that part of the URL. Therefore you're far better off matching anything that isn't a forward slash.
I am not sure how Ruby works, but JavaScript replace will not just replace the capture group, it replaces the whole matched string. By adding another capture group, you can use $1 to add back in the string you want to keep.
...replace(/(\.net.)(\w*)/, "$1" + data.newName);
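Applied to the URL from the question, the two-group idea looks like this. A sketch: newName stands in for data.newName, and the folder segment is matched with [^\/]+ (as suggested above) rather than \w*, so a dash in the folder name would be replaced too.

```javascript
const url = "https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG";
const newName = "test"; // stands in for data.newName

// Group 1 captures ".net/" so it can be written back with $1;
// only the folder name after it is actually replaced.
const renamed = url.replace(/(\.net\/)[^\/]+/, "$1" + newName);

console.log(renamed); // "https://xxxxxx.cloudfront.net/test/club/Slide1.PNG"
```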
You have to do that like this:
location.replace(/(\.net.)(\w*)/, '$1' + data.newName)
replace replaces the whole matched substring, not a particular group. Ruby works exactly in the same way:
ruby -e "puts 'https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG'.sub(/.net.(\w*)/, '##')"
https://xxxxxx.cloudfront##/club/Slide1.PNG
ruby -e "puts 'https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG'.sub(/(.net.)(\w*)/, '\\1' + '##')"
https://xxxxxx.cloudfront.net/##/club/Slide1.PNG
There's no difference (at least with the pattern you've provided). In both cases, the expression matches ".net/qa", with qa being the first capture group within the expression. Notice that even in your linked example the entire match is highlighted.
I'd recommend something like this:
location.replace(/(.net.)\w*/, "$1" + data.newName);
Or this, to be a bit safer:
location.replace(/(.net.)\w*/, function(m, a) { return a + data.newName; });
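The function form is safer because a replacement string is scanned for special patterns ($1, $&, $$ and so on), so a new name that itself contains $ gets mangled, while a replacer function's return value is inserted literally. A sketch with a deliberately awkward, made-up name:

```javascript
const url = "https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG";
const newName = "price$&tag"; // hypothetical name that happens to contain "$&"

// String replacement: "$&" is expanded to the whole match (".net/qa").
const viaString = url.replace(/(\.net\/)\w*/, "$1" + newName);
// Replacer function: its return value is used as-is.
const viaFunction = url.replace(/(\.net\/)\w*/, (match, prefix) => prefix + newName);

console.log(viaString);   // "https://xxxxxx.cloudfront.net/price.net/qatag/club/Slide1.PNG"
console.log(viaFunction); // "https://xxxxxx.cloudfront.net/price$&tag/club/Slide1.PNG"
```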
It's not so much a difference between JavaScript and Ruby's implementations of regular expressions; it's your pattern that needs a bit of work. It's not tight enough.
You can use something like /\.net\/([^\/]+)/, which you can see in action at Rubular.
That returns the characters delimited by / following .net.
Regex patterns are very powerful, but they're also fraught with dangerous side effects that easily open up big holes, causing false positives that can ruin results unexpectedly. Until you know them well, start simple and test them every imaginable way. And once you think you know them well, keep doing that. Patterns in the code we write where I work are a particular hot button for me: I'm always finding holes in them during code reviews and requiring them to be tightened until they do exactly what the developer meant, not what they thought they meant.
While the pattern above works, I'd probably do it a bit differently in Ruby. Using the tools made for the job:
require 'uri'
URL = 'https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG'
uri = URI.parse(URL)
path = uri.path # => "/qa/club/Slide1.PNG"
path.split('/')[1] # => "qa"
Or, more succinctly:
URI.parse(URL).path.split('/')[1] # => "qa"
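The same idea works in JavaScript with the built-in URL class (available in browsers and as a global in Node.js 10+), which avoids hand-writing a URL regex at all. A sketch, where "test" is just an example replacement name:

```javascript
const url = "https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG";

// Let the URL parser split off the path instead of matching ".net" by hand.
const path = new URL(url).pathname; // "/qa/club/Slide1.PNG"
const firstSegment = path.split("/")[1];
console.log(firstSegment); // "qa"

// Renaming the first path segment without any regex at all.
const parsed = new URL(url);
const segments = parsed.pathname.split("/"); // ["", "qa", "club", "Slide1.PNG"]
segments[1] = "test";
parsed.pathname = segments.join("/");
console.log(parsed.href); // "https://xxxxxx.cloudfront.net/test/club/Slide1.PNG"
```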

Remove a long dash from a string in JavaScript?

I've come across an error in my web app that I'm not sure how to fix.
Text boxes are sending me the long dash as part of their content (you know, the special long dash that MS Word automatically inserts sometimes). However, I can't find a way to replace it; since if I try to copy that character and put it into a JavaScript str.replace statement, it doesn't render right and it breaks the script.
How can I fix this?
The specific character that's killing it is —.
Also, if it helps, I'm passing the value as a GET parameter, and then encoding it in XML and sending it to a server.
This code might help:
text = text.replace(/\u2013|\u2014/g, "-");
It replaces all – (&ndash;) and — (&mdash;) symbols with simple dashes (-).
DEMO: http://jsfiddle.net/F953H/
That character is called an em dash. You can replace it like so:
str = str.replace(/\u2014/g, '');
Here is an example Fiddle: http://jsfiddle.net/x67Ph/
The \u2014 is called a Unicode escape sequence. These allow you to specify a Unicode character by its code point; U+2014 happens to be the em dash.
There are several long-ish Unicode dashes you need to worry about (figure dash, en dash, em dash, and horizontal bar): http://en.wikipedia.org/wiki/Dash
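The g flag matters here: when the first argument of replace is a plain string, only the first occurrence is replaced. A sketch with a made-up input:

```javascript
const text = "one\u2014two\u2014three";

// String pattern: only the first em dash is removed.
const firstOnly = text.replace("\u2014", "");
// Global regex: every em dash is removed.
const viaRegex = text.replace(/\u2014/g, "");
// replaceAll (ES2021) also removes every occurrence of a string pattern.
const viaReplaceAll = text.replaceAll("\u2014", "");

console.log(viaRegex);      // "onetwothree"
console.log(viaReplaceAll); // "onetwothree"
```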
You can replace unicode characters directly by using the unicode escape:
'—my string'.replace( /[\u2012\u2013\u2014\u2015]/g, '' )
There may be more characters behaving like this, and you may want to reuse them in HTML later. A more generic way to deal with it could be to replace all 'extended characters' with their HTML-encoded equivalent. You could do that like this:
yourString.replace(/[\u0080-\uC350]/g,
  function(a) {
    return '&#' + a.charCodeAt(0) + ';';
  }
);
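A small sketch of that encoder on a made-up input, using an arrow function for the replacer:

```javascript
const input = "caf\u00e9 \u2014 done";

// Replace every character in U+0080..U+C350 with a decimal HTML entity.
const encoded = input.replace(/[\u0080-\uC350]/g, (ch) => "&#" + ch.charCodeAt(0) + ";");

console.log(encoded); // "caf&#233; &#8212; done"
```

Characters above U+C350 (including emoji, which are stored as UTF-16 surrogate pairs in the U+D800..U+DFFF range) fall outside this particular range and are left untouched.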
With the ECMAScript 2018 standard, JavaScript RegExp now supports Unicode property escapes (they require the u flag). One of them, \p{Dash}, matches any Unicode code point that has the Dash property:
/\p{Dash}/gu
In ES5, the equivalent expression is:
/[-\u058A\u05BE\u1400\u1806\u2010-\u2015\u2053\u207B\u208B\u2212\u2E17\u2E1A\u2E3A\u2E3B\u2E40\u2E5D\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]|\uD803\uDEAD/g
See the Unicode Utilities reference.
Here are some JavaScript examples:
const text = "Dashes: \uFF0D\uFE63\u058A\u1400\u1806\u2010-\u2013\uFE32\u2014\uFE58\uFE31\u2015\u2E3A\u2E3B\u2053\u2E17\u2E40\u2E5D\u301C\u30A0\u2E1A\u05BE\u2212\u207B\u208B\u3030𐺭";
const es5_dash_regex = /[-\u058A\u05BE\u1400\u1806\u2010-\u2015\u2053\u207B\u208B\u2212\u2E17\u2E1A\u2E3A\u2E3B\u2E40\u2E5D\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]|\uD803\uDEAD/g;
console.log(text.replace(es5_dash_regex, '-')); // Normalize each dash to ASCII hyphen
// => Dashes: ----------------------------
To match one or more dashes and replace with a single char (or remove in one go):
/\p{Dash}+/gu
/(?:[-\u058A\u05BE\u1400\u1806\u2010-\u2015\u2053\u207B\u208B\u2212\u2E17\u2E1A\u2E3A\u2E3B\u2E40\u2E5D\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]|\uD803\uDEAD)+/g
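A short sketch showing that both patterns collapse each run of dashes into a single ASCII hyphen (the sample text is made up):

```javascript
const text = "a\u2014\u2014b \u2013 c--d";

// ES2018+: \p{Dash} needs the u flag; + collapses each run of dashes.
const unicodeCollapsed = text.replace(/\p{Dash}+/gu, "-");

// ES5 equivalent of the same character set.
const es5DashRun = /(?:[-\u058A\u05BE\u1400\u1806\u2010-\u2015\u2053\u207B\u208B\u2212\u2E17\u2E1A\u2E3A\u2E3B\u2E40\u2E5D\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]|\uD803\uDEAD)+/g;
const es5Collapsed = text.replace(es5DashRun, "-");

console.log(unicodeCollapsed); // "a-b - c-d"
console.log(es5Collapsed);     // "a-b - c-d"
```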

Regex validation rules

I'm writing a database backup function as part of my school project.
I need to write a regex rule so the database backup name can only contain legal characters.
By 'legal' I mean a string that doesn't contain ANY symbols or spaces. Only letters from the alphabet and numbers.
An example of a valid string would be '31Jan2012' or '63927jkdfjsdbjk623' or 'hello123backup'.
Here's my JS code so far:
// Check if the input box contains only the characters a-z, A-Z, or 0-9 with a regular expression.
function checkIfContainsNumbersOrCharacters(elem, errorMessage){
var regexRule = new RegExp("^[\w]+$");
if(regexRule.test( $(elem).val() ) ){
return true;
}else{
alert(errorMessage);
return false;
}
}
//call the function
checkIfContainsNumbersOrCharacters("#backup-name", "Input can only contain the characters a-z or 0-9.");
I've never really used regular expressions before; however, after a quick bit of googling I found this tool, from which I wrote the following regex rule:
^[\w]+$
^ = start of string
[\w] = a-z/A-Z/0-9
'+' = characters after the string.
When running my function, whatever string I input seems to return false :( Is my code wrong, or am I not using regex rules correctly?
The problem here is, that when writing \w inside a string, you escape the w, and the resulting regular expression looks like this: ^[w]+$, containing the w as a literal character. When creating a regular expression with a string argument passed to the RegExp constructor, you need to escape the backslash, like so: new RegExp("^[\\w]+$"), which will create the regex you want.
There is a way to avoid that, using the shorthand notation provided by JavaScript: var regex = /^[\w]+$/; which does not need any extra escaping.
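A sketch that makes the difference visible through each regex's source property:

```javascript
// In a string literal, "\w" is just "w", so the constructor never sees the backslash.
const broken = new RegExp("^[\w]+$");
const escaped = new RegExp("^[\\w]+$");
const literal = /^[\w]+$/;

console.log(broken.source);                   // "^[w]+$"
console.log(escaped.source);                  // "^[\w]+$"
console.log(broken.test("hello123backup"));   // false
console.log(escaped.test("hello123backup"));  // true
console.log(literal.test("hello123backup"));  // true
```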
It can be simpler. This works:
function checkValid(name) {
return /^\w+$/.test(name);
}
/^\w+$/ is the literal notation for new RegExp(). Since the .test function returns a boolean, you only need to return its result. This also reads better than new RegExp("^\\w+$"), and you're less likely to goof up (thanks @x3ro for pointing out the need for two backslashes in strings).
In JavaScript, \w is shorthand for [A-Za-z0-9_], so it also accepts the underscore, which your definition of "legal" does not allow. If what you really intend to match is [0-9A-Za-z], then that's what you should use.
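Since the question asks for letters and digits only, here is a sketch of the stricter check (isValidBackupName is a hypothetical name), contrasted with \w, which in JavaScript also accepts the underscore:

```javascript
// Letters and digits only; unlike \w, this rejects "_".
function isValidBackupName(name) {
  return /^[0-9A-Za-z]+$/.test(name);
}

console.log(/^\w+$/.test("hello_123"));              // true: \w includes "_"
console.log(isValidBackupName("hello_123"));         // false
console.log(isValidBackupName("31Jan2012"));         // true
console.log(isValidBackupName("63927jkdfjsdbjk623")); // true
console.log(isValidBackupName("my backup"));         // false
```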
When you declare the regex as a string parameter to the RegExp constructor, you need to escape it. Both
var regexRule = new RegExp("^[\\w]+$");
...and...
var regexRule = new RegExp(/^[\w]+$/);
will work.
Keep in mind, though, that client-side validation for database data will never be enough: it is easily bypassed by disabling JavaScript in the browser, so invalid or malicious data can still reach your DB. You need to validate the data on the server side; validating on the client side as well, to avoid sending requests with invalid data, is still good practice.
This is the official spec: http://dev.mysql.com/doc/refman/5.0/en/identifiers.html but it's not very easily converted to a regular expression. Just a regular expression won't do it as there are also reserved words.
Why not just put it in the query (don't forget to escape it properly) and let MySQL give you an error? There might for instance be a bug in the MySQL version you're using, and even though your check is correct, MySQL might still refuse.
