Converting Ruby regex to JavaScript one - javascript

I would like to convert this Ruby regex to a JavaScript one:
/\<br \/\>\<a href\=(.*?)\>([0-9]+\:[0-9]+)\<\/a\> \<a href\=\'.*?\' target\=\_blank\>(.*?)(?=\<\/a\>\<br\>\<p.*?\<\/p\>\<br \/\>\<a href\=.*?\>([0-9]+\:[0-9]+)\<\/a\> \<a href\=\'.*?\' target\=\_blank\>.*?\<\/a\>.*?\<br \/\>)/m
It works perfectly in Ruby, but not in the Chrome JavaScript console. I then want to use it to extract some information from a webpage's HTML source (document.body.innerHTML) in a JavaScript function, using the scan method described here: JavaScript equivalent of Ruby's String#scan
I think the lookahead (?= ) may be problematic in JavaScript, and on top of that it contains a capture group. Can it be converted at all?

In JavaScript you could do the following:
var re = new RegExp("<br /><a href=(.*?)>([0-9]+:[0-9]+)</a> <a href='.*?' target=_blank>(.*?)(?=</a><br><p.*?</p><br /><a href=.*?>([0-9]+:[0-9]+)</a> <a href='.*?' target=_blank>.*?</a>.*?<br />)", "m");
But this likely will not work the same, because the m modifier in Ruby makes the . match all characters, including newlines, while in JavaScript m means multi-line mode, where ^ and $ match at the start and end of each line.
So if you really think you need regex to do this, and the HTML data you are matching has or could have line breaks, you will need to remove the m flag and replace .*? with a workaround such as [\S\s]*? to match those characters as well. JavaScript engines that predate ES2018 have no dotall modifier that works like the Ruby m modifier; ES2018 and later provide the s flag for this.
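A minimal sketch of the difference, using a made-up HTML fragment (the html string and all variable names are illustrative, not taken from the question). It assumes an engine with ES2018 support for the s flag:

```javascript
// Hypothetical input with a line break inside the anchor text.
const html = "<a href='x'>first\nsecond</a>";

// A plain dot stops at the line break, so this finds nothing.
const dotOnly = /<a href='.*?'>(.*?)<\/a>/;

// [\s\S] matches any character, including line terminators.
const anyChar = /<a href='.*?'>([\s\S]*?)<\/a>/;

// With the s (dotAll) flag, the dot also matches line terminators.
const dotAll = /<a href='.*?'>(.*?)<\/a>/s;

const dotOnlyResult = dotOnly.exec(html);    // null
const anyCharResult = anyChar.exec(html)[1]; // "first\nsecond"
const dotAllResult = dotAll.exec(html)[1];   // "first\nsecond"

console.log(dotOnlyResult, anyCharResult, dotAllResult);
```

The [\s\S] form runs on older engines as well, which may matter if the script has to work in legacy browsers.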

Related

Javascript Regex vs Java Regex

I have a regex in JavaScript that works great: /:([\w]+):/g
I am working on converting my JavaScript app to Java, and I know to escape the \ using \\, i.e. /:([\\w]+):/g, yet my tests are still returning no match for the string "hello :testsmilie: how are you?"
Pattern smiliePattern = Pattern.compile("/:([\\w]+):/g");
Matcher m = smiliePattern.matcher(message);
if(m.find()) {
System.out.println(m.group(0));
}
In JavaScript it returns ":testsmilie:" just fine, so I'm not sure what the difference is. Any help would be much appreciated!
Your regex in Java can simply be:
Pattern.compile(":[^:]+:")
This matches a colon, followed by one or more characters that are not colons, followed by a colon.
Or, if you want to use \w, you can use:
Pattern.compile(":\\w+:")
Note that you don't need the group parentheses (), so to get the result you can just use:
System.out.println(m.group());
You should learn how a JavaScript regex literal is built: the / characters are the delimiters around the actual regex, and g is a modifier meaning global.
In Java the equivalent is :([\\w]+): and there is no need for a global flag, as you just call .find() repeatedly to get all the matches.
You should take a look at regex101, which is a good website for testing regexes.
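The same mistake can be reproduced in JavaScript itself by passing the literal syntax as a string to the RegExp constructor. This is only a sketch to illustrate the point about delimiters; the variable names are made up:

```javascript
const message = "hello :testsmilie: how are you?";

// In a literal, the slashes and the trailing g are syntax, not pattern text.
const literal = /:(\w+):/g;

// Inside a string they become part of the pattern: this looks for "/:...:/g".
const wrong = new RegExp("/:(\\w+):/g");

// Pattern and flags belong in separate arguments.
const right = new RegExp(":(\\w+):", "g");

const literalMatches = literal.test(message); // true
const wrongMatches = wrong.test(message);     // false
const rightMatches = right.test(message);     // true

console.log(literalMatches, wrongMatches, rightMatches);
```

Pattern.compile in Java behaves like the constructor form here: it receives only the pattern text, so the delimiters and the g have to go.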

Translate javascript string replace function to php

I found in this site a very basic javascript function to encode text. Looking at the source code this is the string replacement code:
txtEscape = txtEscape.replace(/%/g,'#');
So the string stackoverflow becomes #73#74#61#63#6B#6F#76#65#72#66#6C#6F#77
I need a function that does the same elementary encryption in php, but I really don't understand what the /%/g does. I think in php the same function would be something like:
str_replace(/%/g,"#","stackoverflow");
But of course the /%/g doesn't work
Replace a character
Indeed, the PHP function is str_replace (PHP has many replacement functions), but it takes a plain string, not a regular expression :)
See the official documentation: http://php.net/manual/en/function.str-replace.php
In your case, you want to replace the character % with #.
g is a regex flag, and the // are delimiters that mark a regex literal :)
The "g" flag indicates that the regular expression should be tested against all possible matches in a string. Without the g flag, it'll only test for the first.
<?php
echo str_replace('%', '#', '%73%74%61%63%6B%6F%76%65%72%66%6C%6F%77');
?>
In PHP, you can use flags with the regex functions: preg_replace and its relatives.
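To see what the g flag changes on the JavaScript side, here is a small sketch using a shortened version of the encoded string from the question (the variable names are illustrative):

```javascript
const encoded = "%73%74%61";

// Without g, only the first match is replaced.
const firstOnly = encoded.replace(/%/, "#");

// With g, every match is replaced.
const allMatches = encoded.replace(/%/g, "#");

console.log(firstOnly);  // "#73%74%61"
console.log(allMatches); // "#73#74#61"
```

PHP's str_replace always replaces every occurrence, which is why it needs no equivalent of the flag.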
Escape
See this post: PHP equivalent for javascript escape/unescape
There are two functions stringToHex and hexToString to do what you want :)
Indeed, the site you provided uses the escape function to encode the message, and decodes it with:
document.write(unescape(str.replace(/#/g,'%')));

RegExp \A \z doesnt work, but thats what Rails 4 requires

I recently switched to Rails 4 and the security requirements no longer seem to allow the use of regular expressions in the style of /^..$/. The error states that regular expressions should instead be written in the style of /\A..\z/. Making this change seems to resolve all of my server side validation issues, but unfortunately it also broke all of my client side validation in javascript.
A simple example: I want to validate that a username consists of letters, numbers, or periods.
The old regex looked like /^[0-9a-zA-Z.]+$/ and worked both server side (Rails 3.x) and client side
new RegExp( /^[0-9a-zA-Z.]+$/ ).test('myuser.name') // true
The new regex looks like /\A[0-9a-zA-Z.]+\z/ and works server side but fails client side
new RegExp( /\A[0-9a-zA-Z.]+\z/ ).test('myuser.name') // false
So I'm clearly doing something wrong, but I can't seem to find any explanations. I checked that \A..\z are valid regex to make sure that it's not some Rails-specific hack, and it seems to be legit.
Any ideas?
JavaScript does not support \A or \z in its RegExp.
Here's some raw data, first for JavaScript:
var a = "hello\nworld";
(/^world/).test(a); // false
(/^world/m).test(a); // true
(/hello$/).test(a); // false
(/hello$/m).test(a); // true
Next, for ruby:
a = "hello\nworld"
a.match(/^world/) # => #<MatchData "world">
a.match(/\Aworld/) # => nil
a.match(/hello$/) # => #<MatchData "hello">
a.match(/hello\z/) # => nil
From this, we see that ruby's \A and \z are equivalent to JavaScript's ^ and $ as long as you don't use the multiline m modifier. If you are concerned about the input having multiple lines, you're simply going to have to translate your regular expressions between these two languages with respect to these matching characters.
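Applied to the username check from the question, this can be sketched as follows. The injected string is a made-up example of multi-line input, and the variable names are illustrative:

```javascript
const clean = "myuser.name";
const injected = "myuser.name\n<script>"; // hypothetical multi-line input

// Without the m flag, ^ and $ anchor to the whole string, like Ruby's \A and \z.
const wholeString = /^[0-9a-zA-Z.]+$/;

// With the m flag they anchor to each line, like Ruby's ^ and $.
const perLine = /^[0-9a-zA-Z.]+$/m;

const cleanOk = wholeString.test(clean);       // true
const injectedOk = wholeString.test(injected); // false
const injectedPerLine = perLine.test(injected); // true, the first line matches

console.log(cleanOk, injectedOk, injectedPerLine);
```

So the client-side pattern can keep ^ and $ without the m flag while the Rails side uses \A and \z.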
As a workaround for the lack of \A, \Z and \z support, you can add a "sentinel" character (or characters) to the start and end of the input string.
Please note that:
the sentinel character(s) should be something with a very low chance of appearing in the input string.
this should not be used for sensitive checks (such as user verification), since a workaround like this can be easily exploited.
In this specific case, since only [0-9a-zA-Z.] are allowed, something like ¨ or ~ is OK.
Example:
let inputString = 'myuser.name';
inputString = '¨0' + inputString + '¨1';
let result = new RegExp( /¨0[0-9a-zA-Z.]+(?=¨1)/ ).test(inputString);
inputString = inputString.replace(/^¨0/, '').replace(/¨1$/, ''); // strip the sentinels again
If you're worried that, for some reason, the input string might have the selected characters you're using, you can escape them.
(?<![\r\n])^ emulates \A, match absolute string start.
$(?![\r\n]) emulates \z, match absolute string end.
(?=[\r\n]?$(?![\r\n])) emulates \Z, match string end (before final newline if present).
If all of your line endings are \n, you can simplify the above to:
\A: (?<!\n)^
\z: $(?!\n)
\Z: (?=\n?$(?!\n))
Note: JavaScript has always supported lookahead (used for the \z and \Z emulation above), but lookbehind (used for the \A emulation above) is newer and was unavailable in Safari / WebKit for a long time; see caniuse.com and bugs.webkit.org for the current status. Node.js has had lookbehind support since v9.
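A short sketch of the \A and \z emulations in multi-line mode, reusing the "hello\nworld" string from the earlier answer. It assumes an engine with lookbehind support; the variable names are illustrative:

```javascript
const text = "hello\nworld";

// Emulated \A: a line start that is not preceded by a line break.
const startsWithWorld = /(?<![\r\n])^world/m.test(text); // false
const startsWithHello = /(?<![\r\n])^hello/m.test(text); // true

// Emulated \z: a line end that is not followed by a line break.
const endsWithHello = /hello$(?![\r\n])/m.test(text); // false
const endsWithWorld = /world$(?![\r\n])/m.test(text); // true

console.log(startsWithWorld, startsWithHello, endsWithHello, endsWithWorld);
```

These are only needed when the m flag is set for other reasons; without it, plain ^ and $ already anchor to the whole string.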

difference between ruby regex and javascript regex

I made this regular expression: /.net.(\w*)/
I'm trying to capture the qa in a string like this:
https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG
I'm doing .replace on it like so: location.replace(/.net.(\w*)/, data.newName);
But instead of capturing qa, it captures .net, when I run the code in Javascript
According to this online regex tool made for ruby, it captures qa as intended
http://rubular.com/r/ItrG7BRNRn
What's the difference between Javascript regexes and Ruby regexes, and how can I make my regex work as intended in javascript?
Edit:
I changed my code to this:
var str = `https://xxxxxxxxxx.cloudfront.net/qa/club`;
var re = /\.net\/([^\/]*)\//;
console.log(data2.files[i].location.replace(re,'$1'+ "test"));
And instead of
https://dm7svtk8jb00c.cloudfront.net/test/club
I get this:
https://dm7svtk8jb00c.cloudfrontqatestclub
If I remove the $1 I get https://dm7svtk8jb00c.cloudfronttestclub, which is closer, but I want to keep the slashes.
This would be a better regex:
/\.net\/([^\/]*)\//
Remember that . will match any character, not just the period character. To match a literal period you need to escape it with a leading backslash: \.
Also, \w will only match numbers, letters and underscores. You could quite legitimately have a dash in that part of the URL. Therefore you're far better off matching anything that isn't a forward slash.
I am not sure how Ruby works, but JavaScript replace will not just replace the capture group, it replaces the whole matched string. By adding another capture group, you can use $1 to add back in the string you want to keep.
...replace(/(.net.)(\w*)/, "$1" + data.newName);
You have to do that like this:
location.replace(/(\.net.)(\w*)/, '$1' + data.newName)
replace replaces the whole matched substring, not a particular group. Ruby works exactly in the same way:
ruby -e "puts 'https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG'.sub(/.net.(\w*)/, '##')"
https://xxxxxx.cloudfront##/club/Slide1.PNG
ruby -e "puts 'https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG'.sub(/(.net.)(\w*)/, '\\1' + '##')"
https://xxxxxx.cloudfront.net/##/club/Slide1.PNG
There's no difference (at least with the pattern you've provided). In both cases, the expression matches ".net/qa", with qa being the first capture group within the expression. Notice that even in your linked example the entire match is highlighted.
I'd recommend something like this:
location.replace(/(.net.)\w*/, "$1" + data.newName);
Or this, to be a bit safer:
location.replace(/(.net.)\w*/, function(m, a) { return a + data.newName; });
It's not so much a difference between JavaScript's and Ruby's implementations of regular expressions; it's your pattern that needs a bit of work. It's not tight enough.
You can use something like /\.net\/([^\/]+)/, which you can see in action at Rubular.
That returns the characters delimited by / following .net.
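Putting the tightened pattern together with the capture-group advice from the other answers, a sketch in JavaScript might look like this. Here fileUrl and newName are stand-ins for the location and data.newName values in the question:

```javascript
const fileUrl = "https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG";
const newName = "test"; // stand-in for data.newName

// The whole match (".net/qa") is replaced, so ".net/" is lost.
const wholeMatch = fileUrl.replace(/\.net\/([^\/]+)/, newName);

// Capturing the prefix and re-inserting it with $1 keeps ".net/".
const keepPrefix = fileUrl.replace(/(\.net\/)[^\/]+/, "$1" + newName);

// A replacer function avoids surprises if newName itself contains "$".
const viaFunction = fileUrl.replace(
  /(\.net\/)[^\/]+/,
  (match, prefix) => prefix + newName
);

console.log(wholeMatch);  // https://xxxxxx.cloudfronttest/club/Slide1.PNG
console.log(keepPrefix);  // https://xxxxxx.cloudfront.net/test/club/Slide1.PNG
console.log(viaFunction); // https://xxxxxx.cloudfront.net/test/club/Slide1.PNG
```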
Regex patterns are very powerful, but a loose pattern easily opens up big holes and produces false positives, which can ruin results unexpectedly. Until you know them well, start simply and test them every imaginable way. And once you think you know them well, keep doing that. Patterns in the code we write where I work are a particular hot-button for me; I'm always finding holes in them in our code reviews and requiring them to be tightened until they do exactly what the developer meant, not what they thought they meant.
While the pattern above works, I'd probably do it a bit differently in Ruby. Using the tools made for the job:
require 'uri'
URL = 'https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG'
uri = URI.parse(URL)
path = uri.path # => "/qa/club/Slide1.PNG"
path.split('/')[1] # => "qa"
Or, more succinctly:
URI.parse(URL).path.split('/')[1] # => "qa"
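JavaScript has a comparable tool in the built-in URL class, so the same idea can be sketched without a regex. The variable names below are illustrative:

```javascript
// URL is a global in browsers and in current Node.js versions.
const resource = new URL("https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG");

// pathname is "/qa/club/Slide1.PNG"; the leading slash yields an empty first item.
const segments = resource.pathname.split("/");
const firstSegment = segments[1];

console.log(firstSegment); // "qa"
```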

Remove a long dash from a string in JavaScript?

I've come across an error in my web app that I'm not sure how to fix.
Text boxes are sending me the long dash as part of their content (you know, the special long dash that MS Word automatically inserts sometimes). However, I can't find a way to replace it; since if I try to copy that character and put it into a JavaScript str.replace statement, it doesn't render right and it breaks the script.
How can I fix this?
The specific character that's killing it is —.
Also, if it helps, I'm passing the value as a GET parameter, and then encoding it in XML and sending it to a server.
This code might help:
text = text.replace(/\u2013|\u2014/g, "-");
It replaces all en dashes (–, U+2013) and em dashes (—, U+2014) with simple hyphens (-).
DEMO: http://jsfiddle.net/F953H/
That character is called an em dash. You can remove every occurrence of it like so:
str = str.replace(/\u2014/g, '');
Here is an example Fiddle: http://jsfiddle.net/x67Ph/
The \u2014 is called a Unicode escape sequence. These allow you to specify a Unicode character by its code point. U+2014 happens to be the em dash.
There are several Unicode long-ish dashes you need to worry about: http://en.wikipedia.org/wiki/Dash
You can replace unicode characters directly by using the unicode escape:
'—my string'.replace( /[\u2012\u2013\u2014\u2015]/g, '' )
There may be more characters behaving like this, and you may want to reuse them in HTML later. A more generic way to deal with it could be to replace all 'extended characters' with their HTML-encoded equivalent. You could do that like this:
[yourstring].replace(/[\u0080-\uC350]/g,
function(a) {
return '&#'+a.charCodeAt(0)+';';
}
);
With the ECMAScript 2018 standard, JavaScript RegExp now supports Unicode property escapes. One of them, \p{Dash}, matches any Unicode code point that has the Dash property (the u flag is required):
/\p{Dash}/gu
In ES5, the equivalent expression is:
/[-\u058A\u05BE\u1400\u1806\u2010-\u2015\u2053\u207B\u208B\u2212\u2E17\u2E1A\u2E3A\u2E3B\u2E40\u2E5D\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]|\uD803\uDEAD/g
See the Unicode Utilities reference.
Here are some JavaScript examples:
const text = "Dashes: \uFF0D\uFE63\u058A\u1400\u1806\u2010-\u2013\uFE32\u2014\uFE58\uFE31\u2015\u2E3A\u2E3B\u2053\u2E17\u2E40\u2E5D\u301C\u30A0\u2E1A\u05BE\u2212\u207B\u208B\u3030𐺭";
const es5_dash_regex = /[-\u058A\u05BE\u1400\u1806\u2010-\u2015\u2053\u207B\u208B\u2212\u2E17\u2E1A\u2E3A\u2E3B\u2E40\u2E5D\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]|\uD803\uDEAD/g;
console.log(text.replace(es5_dash_regex, '-')); // Normalize each dash to ASCII hyphen
// => Dashes: ----------------------------
To match one or more dashes and replace with a single char (or remove in one go):
/\p{Dash}+/gu
/(?:[-\u058A\u05BE\u1400\u1806\u2010-\u2015\u2053\u207B\u208B\u2212\u2E17\u2E1A\u2E3A\u2E3B\u2E40\u2E5D\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]|\uD803\uDEAD)+/g
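A brief sketch of the two \p{Dash} forms side by side, on a made-up sample string (assumes an ES2018-capable engine; the variable names are illustrative):

```javascript
// Sample text: an en dash between the numbers, then two em dashes in a row.
const raw = "10\u201320 \u2014\u2014 done";

// Replace each dash individually with an ASCII hyphen.
const normalized = raw.replace(/\p{Dash}/gu, "-");

// Collapse each run of dashes into a single ASCII hyphen.
const collapsed = raw.replace(/\p{Dash}+/gu, "-");

console.log(normalized); // "10-20 -- done"
console.log(collapsed);  // "10-20 - done"
```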
