How do I make a valid multiline JavaScript regex?

I have a block of text that can literally be anything. Somewhere in the text is something like [block] or [header]. I want to match against the following regex:
new RegExp("(.*)\\[" + config.wrapper+ "\\](.*)", "m");
If I write "hello[block]" it works perfectly.
If I write "hello
[block]" it catches nothing.
What am I missing to properly match?
Note, for complicated reasons, I can't currently use a template engine like handlebars or any of those in this special case.

Change this:
(.*)
to this:
([\\s\\S]*)
because the . doesn't match line breaks. The m flag only changes where ^ and $ match; it has no effect on the dot.
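A minimal sketch of the fix, assuming a hypothetical config object with wrapper set to "block" (the question does not show the real one). It also shows the s (dotAll) flag from ES2018 as an alternative where the runtime supports it.

```javascript
// Hypothetical stand-in for the asker's config object.
const config = { wrapper: "block" };

// [\s\S] matches any character, line breaks included.
const re = new RegExp("([\\s\\S]*)\\[" + config.wrapper + "\\]([\\s\\S]*)");

const singleLine = re.exec("hello[block]");
const multiLine = re.exec("hello\n[block] world");

console.log(singleLine[1]);                // hello
console.log(JSON.stringify(multiLine[1])); // "hello\n"
console.log(JSON.stringify(multiLine[2])); // " world"

// Alternative: with the s flag, a plain dot also matches line breaks.
const reDotAll = new RegExp("(.*)\\[" + config.wrapper + "\\](.*)", "s");
console.log(JSON.stringify(reDotAll.exec("hello\n[block]")[1])); // "hello\n"
```

Both captures are greedy, so if the wrapper appears more than once the first group runs up to the last occurrence.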

Related

Regular Expression matching extra unwanted content

I'm trying to get a parameter stored in an HTML comment using a regex. However, when I execute the expression it returns the widest possible string instead of the individual matches.
So I have some content that might include this string:
<!--url:/new--><!--title:My Title-->
I use the following simple expression to get the url I need:
/<!--url:(.*)-->/
The issue I have is that the result also matches part of the title, which is of course valid but not what I was looking for:
["<!--url:/new--><!--title:My Title-->", "/new--><!--title:My Title"]
There are workarounds I can use, like making sure there is a line break after each parameter line, but I'd prefer to have a solid regex and, of course, to understand what I'm missing.
PS: Please comment if you come up with a better title.
Make the regex non-greedy:
/<!--url:(.*?)-->/
You can test this regex by clicking here:
Regex101
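To see the difference side by side, here is a small sketch that runs the greedy and the non-greedy pattern against the sample string from the question.

```javascript
const html = "<!--url:/new--><!--title:My Title-->";

// Greedy: .* runs to the end and backtracks to the LAST "-->".
const greedy = html.match(/<!--url:(.*)-->/);

// Non-greedy: .*? stops at the FIRST "-->".
const lazy = html.match(/<!--url:(.*?)-->/);

console.log(greedy[1]); // /new--><!--title:My Title
console.log(lazy[1]);   // /new
```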

difference between ruby regex and javascript regex

I made this regular expression: /.net.(\w*)/
I'm trying to capture the qa in a string like this:
https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG
I'm doing .replace on it like so: location.replace(/.net.(\w*)/, data.newName);
But instead of capturing qa, it captures .net when I run the code in JavaScript.
According to this online regex tool made for Ruby, it captures qa as intended:
http://rubular.com/r/ItrG7BRNRn
What's the difference between Javascript regexes and Ruby regexes, and how can I make my regex work as intended in javascript?
Edit:
I changed my code to this:
var str = `https://xxxxxxxxxx.cloudfront.net/qa/club`;
var re = /\.net\/([^\/]*)\//;
console.log(data2.files[i].location.replace(re,'$1'+ "test"));
And instead of
https://dm7svtk8jb00c.cloudfront.net/test/club
I get this:
https://dm7svtk8jb00c.cloudfrontqatestclub
If I remove the $1 I get https://dm7svtk8jb00c.cloudfronttestclub, which is closer, but I want to keep the slashes.
This would be a better regex:
/\.net\/([^\/]*)\//
Remember that . will match any character, not just the period character. To match a literal period you need to escape it with a leading backslash: \.
Also, \w will only match numbers, letters and underscores. You could quite legitimately have a dash in that part of the URL. Therefore you're far better off matching anything that isn't a forward slash.
I am not sure how Ruby works, but JavaScript replace will not just replace the capture group, it replaces the whole matched string. By adding another capture group, you can use $1 to add back in the string you want to keep.
...replace(/(.net.)(\w*)/, "$1" + data.newName);
You have to do that like this:
location.replace(/(\.net.)(\w*)/, '$1' + data.newName)
replace replaces the whole matched substring, not a particular group. Ruby works exactly in the same way:
ruby -e "puts 'https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG'.sub(/.net.(\w*)/, '##')"
https://xxxxxx.cloudfront##/club/Slide1.PNG
ruby -e "puts 'https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG'.sub(/(.net.)(\w*)/, '\\1' + '##')"
https://xxxxxx.cloudfront.net/##/club/Slide1.PNG
There's no difference (at least with the pattern you've provided). In both cases, the expression matches ".net/qa", with qa being the first capture group within the expression. Notice that even in your linked example the entire match is highlighted.
I'd recommend something like this:
location.replace(/(.net.)\w*/, "$1" + data.newName);
Or this, to be a bit safer:
location.replace(/(.net.)\w*/, function(m, a) { return a + data.newName; });
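The following sketch puts the string form and the function form next to each other and shows why the function form is a bit safer. It uses a tightened pattern (escaped dot, explicit slash), and the names fileUrl, data.newName and tricky are made up for illustration.

```javascript
const fileUrl = "https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG";
const data = { newName: "test" }; // hypothetical replacement value
const pattern = /(\.net\/)[^\/]*/;

// String replacement: "$1" puts the captured ".net/" back.
const viaString = fileUrl.replace(pattern, "$1" + data.newName);

// Function replacement: the return value is inserted literally.
const viaFunction = fileUrl.replace(pattern, (match, prefix) => prefix + data.newName);

console.log(viaString);   // https://xxxxxx.cloudfront.net/test/club/Slide1.PNG
console.log(viaFunction); // https://xxxxxx.cloudfront.net/test/club/Slide1.PNG

// The two differ when the new name happens to contain "$" sequences.
const tricky = "a$&b";
const stringTricky = fileUrl.replace(pattern, "$1" + tricky);        // "$&" expands to the whole match
const functionTricky = fileUrl.replace(pattern, (m, p) => p + tricky); // "$&" stays as typed

console.log(stringTricky);   // https://xxxxxx.cloudfront.net/a.net/qab/club/Slide1.PNG
console.log(functionTricky); // https://xxxxxx.cloudfront.net/a$&b/club/Slide1.PNG
```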
It's not so much a difference between JavaScript's and Ruby's implementations of regular expressions; it's your pattern that needs a bit of work. It's not tight enough.
You can use something like /\.net\/([^\/]+)/, which you can see in action at Rubular.
That returns the characters delimited by / following .net.
Regex patterns are very powerful, but they are also fraught with side effects that easily open up holes and cause false positives, which can ruin results unexpectedly. Until you know them well, start simply and test them every way you can imagine. Once you think you know them well, keep doing that. Patterns are a particular hot-button for me in the code we write where I work: I'm always finding holes in them during code reviews and requiring them to be tightened until they do exactly what the developer meant, not what they thought they meant.
While the pattern above works, I'd probably do it a bit differently in Ruby. Using the tools made for the job:
require 'uri'
URL = 'https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG'
uri = URI.parse(URL)
path = uri.path # => "/qa/club/Slide1.PNG"
path.split('/')[1] # => "qa"
Or, more succinctly:
URI.parse(URL).path.split('/')[1] # => "qa"
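JavaScript has a comparable tool. As a rough counterpart to the Ruby snippet, this sketch uses the standard URL class (a global in browsers and in Node.js) to read and replace the first path segment; the replacement value "test" is only an illustration.

```javascript
const url = new URL("https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG");

// pathname is "/qa/club/Slide1.PNG", so index 0 is the empty string before the first slash.
const segments = url.pathname.split("/");
console.log(segments[1]); // qa

// Swap the first segment and write the path back.
segments[1] = "test";
url.pathname = segments.join("/");
console.log(url.href); // https://xxxxxx.cloudfront.net/test/club/Slide1.PNG
```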

Invalid regular expression in javascript

I'm trying to find out if a string contains css code with this expression:
var pattern = new RegExp('\s(?[a-zA-Z-]+)\s[:]{1}\s*(?[a-zA-Z0-9\s.#]+)[;]{1}');
But I get "invalid regular expression" error on the line above...
What's wrong with it?
found the regex here: http://www.catswhocode.com/blog/10-regular-expressions-for-efficient-web-development
It's for PHP but it should work in javascript too, right?
What are the ? at the start of the two [a-zA-Z-] blocks for? They look wrong to me.
The ? is unfortunately somewhat overloaded in regexp syntax. It can have three different meanings that I know of, and none of them match what I see in your example.
Also, your \s sequences need the backslash escaped because this is a string: they should look like \\s. To avoid escaping, just use the /.../ syntax instead of new RegExp("...").
That said, even that is insufficient: the regexp still produces an Invalid Group error in Chrome. The {1} sequences are redundant but valid; the error comes from the (?[ sequences, because (? has to be followed by :, =, ! or < to form a group.
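To illustrate the escaping point above, this short sketch compares what the regex engine actually receives from a string with a single backslash, a string with a doubled backslash, and a literal.

```javascript
// In a string literal "\s" is simply "s": the backslash is dropped before RegExp sees it.
const fromString = new RegExp("\s+");
const fromEscaped = new RegExp("\\s+");
const literal = /\s+/;

console.log(fromString.source);  // s+
console.log(fromEscaped.source); // \s+
console.log(fromEscaped.source === literal.source); // true

console.log(fromString.test("a b"));  // false: looks for the letter "s"
console.log(fromEscaped.test("a b")); // true: looks for whitespace
```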
The ?'s are messing it up. I'm not sure what they are for.
/\s[a-zA-Z\-]+\s*:\s*[a-zA-Z0-9\s.#]+;/
worked for me as far as compiling goes. I didn't test whether it properly detects a CSS string.
Replace the quotes with / (slashes):
var pattern = /\s([a-zA-Z-]+)\s[:]{1}\s*([a-zA-Z0-9\s.#]+)[;]{1}/;
You don't need the new RegExp() part either, which is why it's been removed. A regular expression literal is not a string: instead of quotes, JavaScript uses slashes /.../ to delimit it.
That regular expression is very bad and I would avoid its source in the future. That said, I cleaned it up a bit and got the following result:
var pattern = /\s(?:[a-zA-Z-]+)\s*:\s*(?:[^;\n\r]+);/;
this matches something that looks like css, for example:
background-color: red;
Here's the fiddle to prove it, though I'd recommend finding a different solution to your problem. This is a very simple regex and it's not safe to say that it is reliable.
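A few quick checks make both the behaviour and the reliability caveat concrete. The sample strings are made up for illustration. Note that the pattern starts with \s, so a declaration only matches when whitespace precedes the property name.

```javascript
const cssLike = /\s(?:[a-zA-Z-]+)\s*:\s*(?:[^;\n\r]+);/;

const results = {
  inRule: cssLike.test("p { background-color: red; }"),      // declaration inside a rule
  noLeadingSpace: cssLike.test("background-color: red;"),     // nothing before the property
  plainText: cssLike.test("hello world"),                     // no colon or semicolon
  falsePositive: cssLike.test("Note to self : buy milk;"),    // prose that merely looks similar
};

console.log(results.inRule);         // true
console.log(results.noLeadingSpace); // false
console.log(results.plainText);      // false
console.log(results.falsePositive);  // true
```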

javascript regex invalid quantifier error

I have the following javascript code:
if (url.match(/?rows.*?(?=\&)|.*/g)){
urlset= url.replace(/?rows.*?(?=\&)|.*/g,"rows="+document.getElementById('rowcount').value);
}else{
urlset= url+"&rows="+document.getElementById('rowcount').value;
}
I get the error invalid quantifier at the /?rows.*?.... This same regex works when testing it on http://www.pagecolumn.com/tool/regtest.htm using the test string
?srt=acc_pay&showfileCL=yes&shownotaryCL=yes&showclientCL=no&showborrowerCL=yes&shownotaryStatusCL=yes&showclientStatusCL=yes&showbillCL=yes&showfeeCL=yes&showtotalCL=yes&dir=asc&closingDate=12/01/2011&closingDate2=12/31/2011&sort=notaryname&pageno=0&rows=anything&Start=0','bodytable','xyz')
In this string, the above regex is supposed to match:
rows=anything
I actually don't even need the /? to get it to work, but if I don't put that into my javascript, it acts like it's not even regex... I'm terrible with Regex period, so this one has me pretty confused. And that error is the only one I am getting in Firefox's error console.
EDIT
Using that link I posted above, it seems that the leading / tries to match an actual forward slash instead of just marking the code as the beginning of a regex statement. So the ? is in there so that if it doesn't match the / to anything, it continues anyway.
RESOLUTION
Ok, so in the end, I had to change my regex to this:
/rows=.*(?=\&?)/g
This matched the word "rows=" followed by anything until it hit an ampersand or ran out of text.
You need to escape the first ?, since it has special meaning in a regex.
/\?rows.*?(?=\&)|.*/g
// ^---escaped
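The effect of the escape can be reproduced in isolation. This sketch uses a shortened, hypothetical URL and leaves out the |.* alternative so the match is easy to read.

```javascript
// An unescaped leading "?" is a quantifier with nothing to repeat.
let error = null;
try {
  new RegExp("?rows.*?(?=&)");
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true

// Escaped, it matches a literal question mark.
const escaped = /\?rows.*?(?=&)/;
const found = "page.php?rows=25&Start=0".match(escaped);
console.log(found[0]); // ?rows=25
```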
regtest.htm produces
new RegExp("?rows.*?(?=\&)|.*", "") returned a SyntaxError: invalid quantifier
The value you put into the web site shouldn't have the / delimiters on the regex, so put in ?rows.*?(?=\&)|.* and it shows the same problem. Your JavaScript code should look like
re = /rows.*?(?=\&)|.*/g;
or similar (but that is a pointless regex as it matches everything). If you can't fix it, please describe what you want to match and show your JavaScript.
You might consider refactoring your code to look something like this:
var url = "sort=notaryname&pageno=0&rows=anything&Start=0"
var rowCount = "foobar";
if (/[\?\&]rows=/.test(url))
{
url = url.replace(/([\?\&]rows=)[^\&]+/g,"$1"+rowCount);
}
console.log(url);
Output
sort=notaryname&pageno=0&rows=foobar&Start=0
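The same idea can be wrapped in a small helper that also covers the asker's else branch, where no rows parameter exists yet. The function name setRows and the choice between "?" and "&" when appending are assumptions for this sketch, not part of the original code.

```javascript
// Hypothetical helper: replace an existing rows value, or append one.
function setRows(url, rowCount) {
  const pattern = /([?&]rows=)[^&]*/;
  if (pattern.test(url)) {
    // A function replacement keeps "$" in rowCount from being interpreted.
    return url.replace(pattern, (match, prefix) => prefix + rowCount);
  }
  const separator = url.includes("?") ? "&" : "?";
  return url + separator + "rows=" + rowCount;
}

console.log(setRows("?sort=notaryname&rows=anything&Start=0", "50"));
// ?sort=notaryname&rows=50&Start=0
console.log(setRows("?sort=notaryname&Start=0", "50"));
// ?sort=notaryname&Start=0&rows=50
```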

URL regex does not work in javascript

I am trying to use John Gruber's URL regex in Javascript but NetBeans keeps telling me there is a syntax error and illegal errors:
var patt = "/(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])
|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]
{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|
(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|
(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:
'".,<>?«»“”‘’]))/";
Anyone know how to solve this?
As others have said, it's the double quote. But alternatively, you can just write the regexp as a literal in javascript (but then you need to escape the forward slashes in lines 1 and 3 instead).
var regexp = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/i;
I also moved the case-insensitive modifier to the end. Just because. (edit: Well, not just "because" - see Alan Moore's comment below)
Note: Whether you use a literal or a string, it has to be on 1 line.
Put the whole expression on one line and remove the quotes at the start and end, so it looks like this: var patt = /the-long-pattern/;. NetBeans will still complain, but the browsers won't, and that's what matters.
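You should write it as a string like this in NetBeans, doubling every backslash. JavaScript does not accept the inline (?i) modifier, so leave it out and pass "i" as the second argument to new RegExp instead:
"\\b((?:[a-z][\\w-]+:(?:\\/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]"
+ "+[.][a-z]{2,4}\\/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))"
+ "+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’]))";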
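As a smaller illustration of the string approach, this sketch splits a much shorter, made-up pattern across concatenated strings, doubles each backslash, and passes the case-insensitive flag as the second argument to the RegExp constructor. It is not Gruber's pattern, only the same technique on something readable.

```javascript
// Each piece is an ordinary string, so every regex backslash is written twice.
const source =
  "\\bwww\\d{0,3}[.]" +          // "www", optionally followed by up to three digits, then a dot
  "[a-z0-9.\\-]+[.][a-z]{2,4}" + // host name, a dot, and a short top-level domain
  "\\b";

// Flags go in the second argument rather than inside the pattern.
const pattern = new RegExp(source, "i");

console.log(pattern.test("Visit WWW.Example.com today")); // true
console.log(pattern.test("no address in here"));           // false
```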
