RegExp in JavaScript, when a quantifier is part of the pattern - javascript

I have been trying to use a regexp that matches any text that is between a caret, less than and a greater than, caret.
So it would look like: ^< THE TEXT I WANT SELECTED >^
I have tried something like this, but it isn't working: ^<(.*?)>^
I'm assuming this is possible, right? I think the reason I have been having such a tough time is because the caret serves as a quantifier. Thanks for any help I get!
Update
Just so everyone knows, the following, from am not i am, worked:
/\^<(.*?)>\^/
But it turned out that I was getting HTML entities, since I was getting my string from the .innerHTML property. In other words,
> ... &gt;
< ... &lt;
To solve this, my regexp actually looks like this:
\^<(.*?)((.|\n)*)>\^
This includes the fact that the string in between should be any character or new line. Thanks!

You need to escape the ^ symbol since it has special meaning in a JavaScript regex.
/\^<(.*?)>\^/
In a JavaScript regex, the ^ means beginning of the string, unless the m modifier was used, in which case it means beginning of the line.
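As a quick check of that difference, here is a minimal sketch (the sample strings are made up for illustration). It shows the unescaped caret acting as an anchor, the m flag changing what it anchors to, and the escaped caret matching a literal ^:

```javascript
// Unescaped ^ is an anchor: it matches a position, not a character.
const anchored = /^b/.test("a\nb");    // false: "b" is not at the start of the string
const multiline = /^b/m.test("a\nb");  // true: with m, ^ also matches right after a line break

// Escaped \^ matches a literal caret character.
const sample = "before ^< THE TEXT I WANT SELECTED >^ after"; // made-up input
const found = sample.match(/\^<(.*?)>\^/);
const captured = found ? found[1] : null; // text between ^< and >^, spaces included

console.log(anchored, multiline, JSON.stringify(captured));
```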

This should work:
\^<(.*?)>\^
In a regex, if you want to use a character that has a special meaning (caret, brackets, pipe, ...), you have to escape it using a backslash. For example, (\w\b)*\w\. will select a sequence of words terminated by a dot.
Careful!
If you have to pass the regex pattern as a string, i.e. there is no regex literal as in JavaScript or Perl, you may have to use a double backslash: the programming language unescapes it to a single backslash, which is then processed by the regex engine.
Same regex in multiple languages:
Python:
import re
myRegex=re.compile(r"\^<(.*?)>\^") # The r before the string prevents backslash escaping
PHP:
$result=preg_match("/\\^<(.*?)>\\^/",$subject); // Notice the double backslashes here?
JavaScript:
var myRegex=/\^<(.*?)>\^/,
subject="^<blah example>^";
subject.match(myRegex);
If you tell us what programming language you're writing in, we'll be able to give you some finished code to work with.
Edit: Whoops, didn't even notice this was tagged as javascript. Then, you don't have to worry about double backslash at all.
Edit 2: \b represents a word boundary. Though I agree yours is what I would have used myself.

Related

How to prevent regex characters from being changed after page is rendered?

I'm stuck after searching and trying several tests, and I just can't figure out how to fix the following issue.
I use the characters \x3c, \x3e and \x22 in a regex and save it in a variable in *.component.ts, but when I use the variable in the markup/HTML, they turn into <, > and ". As a result, my pattern doesn't work as expected.
Here is one of my tests on regex101.com; as you can see, it works as it should:
^(?=.*[a-zA-Z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-])[A-Za-z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-]{8,50}$
How can I prevent this and keep the characters as they are in the original when the page is rendered? Is it a behavior of TypeScript or JavaScript browser engine or what? Any hint would be great.
First of all, you need to use double backslashes to introduce literal backslashes into the regex patterns. I.e. if you write "\x22" as a string literal, it is in fact a mere ". So, to define \x22 in a string literal, write "\\x22".
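A small sketch of that point, assuming the pattern is built from an ordinary JavaScript/TypeScript string (the variable names are only for illustration):

```javascript
// In a string literal, \x22 is consumed by the string parser and becomes a plain quote.
const single = "\x22";   // one character: "
// Doubling the backslash keeps the escape sequence for the regex engine to interpret.
const doubled = "\\x22"; // four characters: \ x 2 2

console.log(single.length, doubled.length); // 1 4

// Compiled from the doubled form, the pattern keeps its \x22 spelling
// and still matches a double quote.
const fromDoubled = new RegExp(doubled);
console.log(fromDoubled.source);           // \x22
console.log(fromDoubled.test('say "hi"')); // true
```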
Then, you have
^(?=.*[a-zA-Z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-])[A-Za-z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-]{8,50}$
The lookahead here is redundant because it requires the same set of chars as is required by the consuming part. The lookahead can be removed, or better replaced with the one you need, (?=[^A-Z]*[A-Z]), requiring at least 1 uppercase ASCII letter:
^(?=[^A-Z]*[A-Z])[A-Za-z\d!\x22#$%&\'()*+,.:;\x3c=\x3e?#[\]^_`{|}~/\\-]{8,50}$
As a string literal:
"^(?=[^A-Z]*[A-Z])[A-Za-z\\d!\\x22#$%&'()*+,.:;\\x3c=\\x3e?#[\\]^_`{|}~/\\\\-]{8,50}$"
See the regex demo.
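To try the string-literal form, a sketch along these lines should do. The sample inputs are invented for illustration, and the uppercase requirement is the one suggested above, which may not be the rule you actually need:

```javascript
// Pattern passed as a string, so every regex backslash is doubled.
const pattern = new RegExp(
  "^(?=[^A-Z]*[A-Z])[A-Za-z\\d!\\x22#$%&'()*+,.:;\\x3c=\\x3e?#[\\]^_`{|}~/\\\\-]{8,50}$"
);

const withUpper = pattern.test('Abc<d>e"1'); // allowed characters, one uppercase letter
const noUpper = pattern.test("abcdefg1");    // the lookahead finds no uppercase letter
const tooShort = pattern.test("Abc1");       // fewer than 8 characters

console.log(withUpper, noUpper, tooShort); // true false false
```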

Unable to find a string matching a regex pattern

When I try to submit a form, a JavaScript regex validation always fails for my string.
Regex:- ^(([a-zA-Z]:)|(\\\\{2}\\w+)\\$?)(\\\\(\\w[\\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
I have tried following strings against it
abc.jpg,
abc:.jpg,
a:.jpg,
a:asdas.jpg,
What string could possibly match this regex?
This regex won't match against anything because of that $? in the middle of the string.
Applying the optional quantifier ? to the end-of-string anchor $ is not valid (if you paste the pattern into https://regex101.com/ it does report an error). Even if the JavaScript parser ignored the error and kept the regex as it is, you would still be trying to match the end of the string in the middle of a string that is supposed to continue.
Unescaped, it was supposed to match a \$ (dollar symbol), but as written it won't work.
If you want your string to be accepted at any cost, you can probably use Firebug or a similar developer tool to edit the string inside the JavaScript code (assuming there is no server-side check as well, and that it isn't wrong too). If you ignore the $?, a matching string would be \\\\w\\\\ww.jpg (and since the . is unescaped, even \\\\w\\\\ww%jpg is a match).
Of course, I wrote this answer assuming the escaping is indeed the one you showed in the question. If you need to find a matching pattern for the correctly escaped one ^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(\.jpeg|\.JPEG|\.jpg|\.JPG)$ then you can use this tool to find one http://fent.github.io/randexp.js/ (though it will find weird matches). A matching pattern is c:\zz.jpg
If you are just looking for a regular expression to match what you got there, go ahead and test this out:
(\w+:?\w*\.[jpe?gJPE?G]+,)
That should match exactly what you are looking for. Remove the optional comma at the end if you feel like it, of course.
If you remove escape level, the actual regex is
^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
After the ^ start anchor comes the first group, (([a-zA-Z]:)|(\\{2}\w+)\$?), which matches either a letter followed by a colon, or two backslashes followed by one or more word characters and an optional literal $. Some of the parentheses inside are unnecessary.
The second part, (\\(\w[\w].*))+, matches a backslash followed by two word characters. \w[\w] looks odd because it is equivalent to \w\w (the second \w doesn't need a character class). That is followed by any number of any characters, and the whole group repeats one or more times.
In the last part, (.jpeg|.JPEG|.jpg|.JPG), someone probably forgot to escape the dot to match it literally; \. should be used. This part can be reduced to \.(JPE?G|jpe?g).
It would match something like
A:\12anything.JPEG
\\1$\anything.jpg
Play with it at regex101. A more readable version could be
^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$
Also read the explanation on regex101 to understand any pattern, it's helpful!
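As a rough check of the more readable pattern, here is a sketch written as a JavaScript regex literal. The sample paths are the ones mentioned in this thread; note that in string literals every backslash has to be doubled:

```javascript
// The more readable pattern, as a regex literal (no extra string escaping needed).
const pathPattern = /^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$/;

const samples = [
  "A:\\12anything.JPEG",   // A:\12anything.JPEG   (drive letter form)
  "\\\\1$\\anything.jpg",  // \\1$\anything.jpg    (double-backslash form with optional $)
  "c:\\zz.jpg",            // c:\zz.jpg
  "abc.jpg",               // no drive or share prefix
  "a:asdas.jpg",           // no backslash after the colon
];

const results = samples.map((s) => pathPattern.test(s));
console.log(results); // [ true, true, true, false, false ]
```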

Regex not working as expected in JavaScript

I wrote the following regex:
(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?
Its behaviour can be seen here: http://gskinner.com/RegExr/?34b8m
I wrote the following JavaScript code:
var urlexp = new RegExp(
'^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$', 'gi'
);
document.write(urlexp.test("blaaa"))
And it returns true even though the regex was supposed to not allow single words as valid.
What am I doing wrong?
Your problem is that JavaScript is viewing all your escape sequences as escapes for the string. So your regex goes to memory looking like this:
^(https?://)?([da-z.-]+).([a-z]{2,6})(/(w|-)*)*/?$
Which you may notice causes a problem in the middle when what you thought was a literal period turns into a regular expressions wildcard. You can solve this in a couple ways. Using the forward slash regular expression syntax JavaScript provides:
var urlexp = /^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$/gi
Or by escaping your backslashes (and not your forward slashes, as you had been doing - that's exclusively for when you're using /regex/mod notation, just like you don't have to escape your single quotes in a double quoted string and vice versa):
var urlexp = new RegExp('^(https?://)?([da-z.-]+)\\.([a-z]{2,6})(/(\\w|-)*)*/?$', 'gi')
Please note the double backslash before the w - also necessary for matching word characters.
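To see the effect directly, here is a small sketch comparing the original string with the corrected one. It uses only the i flag, since the g flag makes test() stateful through lastIndex; example.com/some-path is a made-up input:

```javascript
// The string parser drops backslashes it does not recognise as escapes,
// so the regex engine never sees them.
const dotLost = '\.' === '.';   // true
const wordLost = '\w' === 'w';  // true

const broken = new RegExp('^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$', 'i');
const fixed = new RegExp('^(https?://)?([da-z.-]+)\\.([a-z]{2,6})(/(\\w|-)*)*/?$', 'i');

console.log(broken.test('blaaa'));                // true: the "." became a wildcard
console.log(fixed.test('blaaa'));                 // false: a literal dot is now required
console.log(fixed.test('example.com/some-path')); // true
```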
A couple notes on your regular expression itself:
[da-z.-]
d is contained in the a-z range. Unless you meant \d? In that case, the backslash is important.
(/(\w|-)*)*/?
My own misgivings about the nested Kleene stars aside, you can whittle that alternation down into a character class, and drop the terminating /? entirely, as a trailing slash will be matched by the group as you've given it. I'd rewrite it as:
(/[\w-]*)*
Though, maybe you'd just like to catch non space characters?
(/[^/\s]*)*
Anyway, modified this way your regular expression winds up looking more like:
^(https?://)?([\da-z.-]+)\.([a-z]{2,6})(/[\w-]*)*$
Remember, if you're going to use string notation: Double EVERY backslash. If you're going to use native /regex/mod notation (which I highly recommend), escape your forward slashes.

Replace Pipe and Comma with Regex in Javascript

I'm sitting here with "The Good Parts" in hand but I'm still none the wiser.
Can anyone knock up a regex for me that will allow me to replace any instances of "|" and "," from a string.
Also, could anyone point me in the direction of a really good resource for learning regular expressions, especially in javascript (are they a particular flavour??) It really is a weak point in my knowledge.
Cheers.
str.replace(/(\||,)/g, "replaceWith") Don't forget the g at the end so it searches the string globally; without it, the regex will only replace the first instance of the characters.
What it is saying is: replace | (you need to escape this character) OR (|) ,
Nice Cheatsheet here
The best resource I have found if you really want to understand regular expressions (and the special caveats or quirks of any of a majority of the implementations/flavors) is Regular-Expressions.info.
If you really get into regular expressions, I would recommend the product called RegexBuddy for testing and debugging regular expressions in all sorts of languages (though there are a few things it does not quite support, it is rather good overall)
Edit:
The best way (I think, especially if you consider readability) is using a character class rather than alternation (i.e.: [] instead of |)
use:
var newString = str.replace(/[|,]/g, ";");
This will replace either a | or a , with a semicolon
The character class essentially means "match anything inside these square brackets" - with only a few exceptions.
First, you can specify ranges of characters ([a-zA-Z] means any letter from a to z or from A to Z).
Second, putting a caret (^) at the beginning of the character class negates it - it means anything not in this character class ([^0-9] means any character that is not from 0 to 9).
Put the dash at the beginning and the caret at the end of the character class to match those characters literally, or escape them anywhere else in the class with a \ if you prefer.
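A short sketch of the character-class approach, with made-up sample strings. It also shows what happens without the g flag, and a class that contains a literal dash and caret:

```javascript
const input = "a|b,c|d"; // made-up sample

const firstOnly = input.replace(/[|,]/, ";"); // "a;b,c|d"  (no g flag: first match only)
const all = input.replace(/[|,]/g, ";");      // "a;b;c;d"

// Dash first and caret last, so both are taken literally inside the class.
const literalDashCaret = "1-2^3".replace(/[-^]/g, " "); // "1 2 3"

console.log(firstOnly, all, literalDashCaret);
```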

Regex replaces everything that matches any character

a[b].innerHTML=a[b].innerHTML.replace(RegExp('['+bbc[c][0]+']','ig'),bbc[c][1])
This is basically what I'm working with. It's wrapped in two loops so that should explain why it looks like what it does. Basically I want to replace something that matches '['+variable from an array+']'. I'm making a BBCode script for a free forum and no don't point me at any BBCode scripts.
The problem is that regex is replacing everything that matches any character. So, it replaces [, q, c, o, d, e, ], all with the second part of the array. (QCODE is an example BBCode being used) I don't know if it does that in a normal /regex/ with [] but it's annoying as hell. I've tried escaping the [] ('\['+v+'\]'), I've tried eval(), I've tried everything you can imagine. I need to make this thing work like it's supposed to, because everything is set up as it should be. If you know how to fix this, please answer. I'd like you to test your solution before answering though because you have no idea how many methods I've tried to make this work.
Use the right escape character:
RegExp('\\['+bbc[c][0]+'\\]','ig'),
/ is just a regular character (except in regex literals, which you're not using), the escape character is \. You also have to escape twice, once for the string literal, and once for the regex.
Your code isn't working because RegExp takes the regular expression as a string, and in that string you need to escape the backslash escape character itself. The following will work:
var str = 'Before. [qcode] After.';
alert(str.replace(RegExp('\\[qcode\\]', 'ig'), 'During.'));
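Since the tag names in the question come from an array, one possible way to generalise this is sketched below. escapeRegExp and tagPattern are hypothetical helpers, not part of the original code. The first line of the demo reproduces the character-class behaviour described in the question:

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex,
// so a tag name taken from data cannot change the pattern's meaning.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a case-insensitive, global regex for a literal [tag].
function tagPattern(tag) {
  return new RegExp('\\[' + escapeRegExp(tag) + '\\]', 'ig');
}

const text = 'Before. [qcode] After.';

// Unescaped brackets form a character class, so single letters get replaced.
const wrong = text.replace(new RegExp('[' + 'qcode' + ']', 'ig'), '#');
// Escaped brackets match the literal tag.
const right = text.replace(tagPattern('qcode'), 'During.');

console.log(wrong); // B#f#r#. [#####] Aft#r.
console.log(right); // Before. During. After.
```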
