Regular expression with asterisk quantifier - javascript

This documentation states this about the asterisk quantifier:
Matches the preceding character 0 or more times.
It works in something like this:
var regex = /<[A-Za-z][A-Za-z0-9]*>/;
var str = "<html>";
console.log(str.match(regex));
The result of the above is: <html>
But when I try the following code to get all the "r"s in the string below, it only returns the first "r". Why is this?
var regex = /r*/;
var str = "rodriguez";
console.log(str.match(regex));
Why, in the first example does it cause "the preceding" character/token to be repeated "0 or more times" but not in the second example?

var regex = /r*/;
var str = "rodriguez";
The regex engine will first try to match r in rodriguez from left to right and since there is a match, it consumes this match.
The regex engine then tries to match another r, but the next character is o, so it stops there.
Without the global flag g (used like so: var regex = /r*/g;), the regex engine will stop looking for more matches once the regex is satisfied.
Try using:
var regex = /a*/;
var str = "cabbage";
The match will be an empty string, despite there being a's in the string! This is because the regex engine first tries to find a at the start of cabbage, but the first character is c. Since this doesn't match, the engine matches a 0 times, which * allows. The regex is thus satisfied and the matching ends here.
It might be worth pointing out that * alone is greedy, which means it will first try to match as many as possible (the 'or more' part from the description) before trying to match 0 times.
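A small sketch of the behaviour described so far. The string "rrrodriguez" is a made-up variation used only to show greediness; the variable names are likewise just for illustration.

```javascript
// Without the g flag, match() returns only the first match.
// At index 0 the engine sees "c", so a* succeeds by matching zero a's.
const first = "cabbage".match(/a*/);
console.log(JSON.stringify(first[0]), first.index); // "" 0

// Greedy: where r's are present, * takes as many as it can.
const greedy = "rrrodriguez".match(/r*/);
console.log(greedy[0]); // rrr

// + needs at least one a, so the engine moves on past the "c".
const plus = "cabbage".match(/a+/);
console.log(plus[0], plus.index); // a 1
```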
To get all r from rodriguez, you will need the global flag as mentioned earlier:
var regex = /r*/g;
var str = "rodriguez";
You'll get all the r, plus all the empty strings inside, since * also matches 'nothing'.
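As a rough illustration of what the global flag returns here, the sketch below prints the full result for /r*/g and, for comparison, for /r+/g. JSON.stringify is used only to keep the output on one line.

```javascript
// With g, the engine tries every position, so each position without an r
// contributes an empty match (including the very end of the string).
const all = "rodriguez".match(/r*/g);
console.log(JSON.stringify(all)); // ["r","","","r","","","","","",""]

// Using + instead of * drops the empty matches.
const onlyRs = "rodriguez".match(/r+/g);
console.log(JSON.stringify(onlyRs)); // ["r","r"]
```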

Use the global flag together with + to match 1 or more r anywhere in the string:
var regex = /r+/g;
In your other regex:
var regex = /<[A-Za-z][A-Za-z0-9]*>/;
You're matching a literal < followed by a letter, followed by 0 or more letters or digits, followed by a literal >, so it will match <html> perfectly.
But if your input is <foo>:<bar>:<abc>, then it will match only <foo> and not the other segments. To match all segments you need to add the global flag: /<[A-Za-z][A-Za-z0-9]*>/g.
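The difference between the two forms can be checked with a short sketch on the sample input from this answer (variable names are illustrative):

```javascript
const input = "<foo>:<bar>:<abc>";

// Without g: only the first segment is returned.
const single = input.match(/<[A-Za-z][A-Za-z0-9]*>/);
console.log(single[0]); // <foo>

// With g: every non-overlapping segment is returned.
const every = input.match(/<[A-Za-z][A-Za-z0-9]*>/g);
console.log(JSON.stringify(every)); // ["<foo>","<bar>","<abc>"]
```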

Related

How to replace all instance of string '&user_story=x' using regex in javascript?

I have a string like this:
var string = 'callback&user_story=1&user_story=2&user_story=100&user_story=a&user_story=john';
&user_story=x (here x can be anything) can repeat any number of times.
How can I replace each '&user_story=x' with a blank value?
What would the regex for it be in JS?
The regex looks like this:
var string = 'callback&user_story=1&user_story=2&user_story=100&user_story=a&user_story=john';
var re = /&user_story=.*?(?=&|$)/g
console.log(string.replace(re, ""));
Broken down:
/&user_story=.*?(?=&|$)/g
&user_story= - checks whether the match starts with "&user_story="
.*? - matches any number of any characters, but the ? makes it non-greedy, so it will take as few characters as possible before the next part of the regex can match
(?=&|$) - the parentheses with ?= make this a lookahead, i.e. it checks that what follows is either another & or the end of the string (symbolised by $), without including it in the match.
g - is a flag which tells the regex to check the entire string, and not stop after just finding one match.
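A quick sketch to check the result. The second pattern, with a negated character class [^&]*, is an alternative swapped in here for comparison and is not part of the answer above; the page=2 parameter is a made-up example.

```javascript
const query = 'callback&user_story=1&user_story=2&user_story=100&user_story=a&user_story=john';

// Pattern from the answer: lazy match up to the next & or the end of the string.
const lazy = query.replace(/&user_story=.*?(?=&|$)/g, "");
console.log(lazy); // callback

// Alternative (assumption): "everything that is not an &" needs no lookahead.
const negated = query.replace(/&user_story=[^&]*/g, "");
console.log(negated); // callback

// Other parameters are left untouched.
const mixed = "callback&user_story=1&page=2&user_story=3".replace(/&user_story=[^&]*/g, "");
console.log(mixed); // callback&page=2
```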

How to find 2 matches with a regular expression

I need a regular expression for :
<<12.txt>> <<45.txt>
I have created a regular expression :
<<.+.txt>>
But this finds only one match in the whole string, whereas there should be 2 matches:
<<12.txt>>
<<45.txt>>
If anyone has a solution for this problem, please help me out.
Part of the issue is that the string you've specified wouldn't match because the second > is missing in <<45.txt>.
Also, you're using the . (dot) wildcard while also trying to find a literal period. It works, but not how you think it does: an unescaped . matches any character, not just a period.
Here's the regex you want:
var regex = /<<\d+\.txt>>/g
\d matches only numbers
\. matches an actual period
/g means global, so it won't stop at the first match
Practice Regular Expressions
https://regexr.com/43bs4
Demo
var string = "<<12.txt>> <<45.txt>>";
var regex = /<<\d+\.txt>>/g;
var matches = string.match(regex);
console.log(matches);
P.S., if you actually want to match with 1 > or 2 >>, you can with:
var regex = /<<\d+\.txt>>?/g
? optionally matches the character right before it
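/<<.+?\.txt>>/gm
.+? is lazy, so each match ends at the first .txt>> rather than at the last one in the string
\. matches a literal period
g is for global (will search through the entire source)
m is for multiline support (it only changes how ^ and $ behave, so it is optional here)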
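To see how a greedy .+ and a lazy .+? differ on this input, here is a short sketch. It assumes the corrected string, with both closing >> present.

```javascript
const files = "<<12.txt>> <<45.txt>>";

// Greedy: .+ runs to the end and backtracks to the LAST ".txt>>",
// so the whole string comes back as a single match.
const greedyMatches = files.match(/<<.+\.txt>>/g);
console.log(JSON.stringify(greedyMatches)); // ["<<12.txt>> <<45.txt>>"]

// Lazy: .+? stops at the FIRST ".txt>>", giving one match per file.
const lazyMatches = files.match(/<<.+?\.txt>>/g);
console.log(JSON.stringify(lazyMatches)); // ["<<12.txt>>","<<45.txt>>"]
```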

regex match not outputting the adjacent matches javascript

I was experimenting with regex in JavaScript and came across an issue. Consider the string str = "+d+a+". I was trying to output the characters in the string that are surrounded by +, so I used str.match(/\+[a-z]\+/ig). I expected ["+d+","+a+"], but what I got is just ["+d+"]; "+a+" is not showing in the output. Why?
.match(/.../g) returns all non-overlapping matches. Your regex requires a + sign on each side. Given your target string:
+d+a+
^^^
  ^^^
Your matches would have to overlap in the middle in order to return "+a+".
You can use look-ahead and a manual loop to find overlapping matches:
var str = "+d+a+";
var re = /(?=(\+[a-z]\+))/g;
var matches = [], m;
while (m = re.exec(str)) {
    matches.push(m[1]);
    re.lastIndex++;
}
console.log(matches);
With regex, once a character has been consumed by a match, it can't be part of the next match.
For example, a regex like /aba/g wouldn't find 2 aba's in a string like "ababa", because the second "a" was already consumed by the first match.
However, that can be overcome by using a positive lookahead (?=...), because lookaheads just check what's ahead without actually consuming it.
So a regex like /(ab)(?=(a))/g would return 2 capture groups with 'ab' and 'a' for each 'aba'.
But in this case the match just needs to be followed by 1 fixed character '+', so it can be simplified, because you don't really need capture groups for this one.
Example snippet:
var str = "+a+b+c+";
var matches = str.match(/\+[a-z]+(?=\+)/g).map(function(m){return m + '+'});
console.log(matches);
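The "ababa" case can be checked the same way. The sketch below is one possible approach, assuming an engine that provides String.prototype.matchAll (ES2020); matchAll steps past zero-width matches on its own, so no manual lastIndex++ is needed.

```javascript
// Plain global match: the second "aba" overlaps the first, so it is not found.
const plain = "ababa".match(/aba/g);
console.log(JSON.stringify(plain)); // ["aba"]

// Lookahead with a capture group: the match itself is empty,
// and the text we want is read from group 1.
const overlapping = Array.from("ababa".matchAll(/(?=(aba))/g), m => m[1]);
console.log(JSON.stringify(overlapping)); // ["aba","aba"]

// The same idea applied to the string from the question.
const plusMatches = Array.from("+d+a+".matchAll(/(?=(\+[a-z]\+))/g), m => m[1]);
console.log(JSON.stringify(plusMatches)); // ["+d+","+a+"]
```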

Javascript word boundaries

I have seen this answer proposed in this question
However, the resulting match is not the same. When the match is at the beginning of the string, only the string is returned; however, when it is matched after a whitespace, the whitespace is also returned as part of the match, even though the non-capturing colon is used.
I tested with the following code in the Firefox console:
let str1 = "un ejemplo";
let str2 = "ejemplo uno";
let reg = /(?:^|\s)un/gi;
console.log(str1.match(reg)); // ["un"]
console.log(str2.match(reg)); // [" un"]
Why is the whitespace being returned?
The colon in (?:^|\s) just means that it's a non-capturing group. In other words, when reading, back-referencing, or replacing with the captured group values, it will not be included. Without the colon, it would be reference-able as \1, but with the colon, there is no way to reference it. However, non-capturing groups are by default still included in the match. For instance My (?:dog|cat) is sick will still include the word dog or cat in the match, even though it's a non-capturing group.
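The dog/cat example can be run directly. This sketch uses a hypothetical sample sentence and compares the non-capturing and capturing forms:

```javascript
const sentence = "My dog is sick";

// Non-capturing group: "dog" is still part of the overall match,
// but the result has no group 1 (length is 1: just the full match).
const nonCapturing = sentence.match(/My (?:dog|cat) is sick/);
console.log(nonCapturing[0]);     // My dog is sick
console.log(nonCapturing.length); // 1

// Capturing group: the same overall match, plus "dog" as group 1.
const capturing = sentence.match(/My (dog|cat) is sick/);
console.log(capturing[1]); // dog
```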
To make it exclude the value, you have two options. If your regex engine supports look-behinds, you can use a positive one, such as (?<=^|\s)un. JavaScript gained look-behind support in ES2018, so this works in current engines but not in older ones. If look-behind is not available, you could put a capturing group around just the part you want and then read that group's value rather than the whole match (e.g., (?:^|\s)(un)). For instance:
let reg = /(?:^|\s)(un)/gi;
let match = reg.exec(input)
let result = match[1];
One solution would be to use a capturing group (ie. (un)) so that you can use RegExp.prototype.exec() and then use match[1] of this result to get the matched string, like this:
let str1 = "un ejemplo";
let str2 = "ejemplo uno";
let reg = /(?:^|\s)(un)/gi;
var match1 = reg.exec(str1);
reg.lastIndex = 0; // reset: a regex with the g flag remembers where its last match ended
var match2 = reg.exec(str2);
console.log(match1[1]); // "un"
console.log(match2[1]); // "un"
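For comparison, here is a minimal sketch of the look-behind variant. It assumes an engine that implements ES2018 look-behind; the \b version is added here as a further assumption, since \b treats only ASCII letters, digits and underscore as word characters.

```javascript
const str1 = "un ejemplo";
const str2 = "ejemplo uno";

// Look-behind checks for start-of-string or whitespace without consuming it,
// so the whitespace is no longer part of the match.
const lookbehind = /(?<=^|\s)un/gi;
const result1 = str1.match(lookbehind);
const result2 = str2.match(lookbehind);
console.log(JSON.stringify(result1)); // ["un"]
console.log(JSON.stringify(result2)); // ["un"]

// A word boundary gives the same result for these two ASCII strings.
const boundary = str2.match(/\bun/gi);
console.log(JSON.stringify(boundary)); // ["un"]
```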

match a string not after another string

This
var re = /[^<a]b/;
var str = "<a>b";
console.log(str.match(re)[0]);
matches >b.
However, I don't understand why this pattern /[^<a>]b/ doesn't match anything. I want to capture only the "b".
The reason why /[^<a>]b/ doesn't match anything is that the character class excludes <, a, and > as individual characters, so rewriting it as /[^><a]b/ would do the same thing. I doubt this is what you want, though. Try the following:
var re = /<a>(b)/;
var str = "<a>b";
console.log(str.match(re)[1]);
This regex looks for a string that looks like <a>b first, but it captures the b with the parentheses. To access the b, simply use [1] when you call .match instead of [0], which would return the entire string (<a>b).
What you're using here is a match for a b preceded by any character that is not listed in the character class. The syntax is [^a-z+-], where a-z+- is the set of excluded characters (in this case, the range of lowercase Latin letters, a plus sign and a minus sign). So, what your regex pattern /[^<a]b/ matches is any b preceded by a character that is NOT < or a. Since > isn't in that set, it matches it.
The character class basically works the same as a list of characters that are separated by OR pipes: [abcd] matches the same as (a|b|c|d). Character classes just have the extra functionality of also expressing that same set via [a-d], using a dash between the ends of a range. Putting a ^ at the start of a class turns it into a negated class, so it will match anything BUT the characters listed.
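The two patterns from the question can be compared side by side. In this sketch, "xb" and "c" are made-up inputs used only to show a character that is allowed.

```javascript
// > is not excluded by [^<a], so ">b" matches.
const withoutGt = "<a>b".match(/[^<a]b/);
console.log(withoutGt[0]); // >b

// Once > is excluded too, nothing in "<a>b" can precede the b.
const withGt = "<a>b".match(/[^<a>]b/);
console.log(withGt); // null

// Any other preceding character is fine.
const other = "xb".match(/[^<a>]b/);
console.log(other[0]); // xb

// [abcd], [a-d] and (a|b|c|d) accept the same single characters.
const sameSet = [/^[abcd]$/, /^[a-d]$/, /^(a|b|c|d)$/].map(re => re.test("c"));
console.log(JSON.stringify(sameSet)); // [true,true,true]
```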
What you are looking for is a negative lookahead. Those can exclude something from matching longer strings. Those work in this format: (?!do not match) where do not match uses the normal regex syntax. In this case, you want to test if the preceding string does not match <a>, so just use:
(?!<a>)(.{3}|^.{0,2})b
That will match the b when it is either preceded by three characters that are not <a>, or by fewer characters that are at the start of the line.
PS: what you are probably looking for is the "negative lookbehind", which would be written (?<!<a>)b. JavaScript regular expressions have supported it since ES2018; in an older engine without lookbehind, you'll have to use the alternative regex above.
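A minimal sketch of the negative lookbehind form, assuming an engine with ES2018 lookbehind support. The inputs "<i>b" and "ab <a>b" are hypothetical examples.

```javascript
const notAfterTag = /(?<!<a>)b/;

// The only b is directly preceded by <a>, so there is no match.
const blocked = "<a>b".match(notAfterTag);
console.log(blocked); // null

// A b preceded by anything else matches, and only the b itself is returned.
const allowed = "<i>b".match(notAfterTag);
console.log(allowed[0], allowed.index); // b 3

// The first b here is preceded by "a", not "<a>", so it is the one found.
const mixedInput = "ab <a>b".match(notAfterTag);
console.log(mixedInput.index); // 1
```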
You could write a pattern to match the anchor tags and then replace them with an empty string:
var str = "<a>b</a>";
str = str.replace(/((<a[\w\s=\[\]\'\"\-]*>)|<\/a>)/gi, '');
This will reduce each of the following strings to 'b':
<a>b</a>
<a class='link-l3'>b</a>
To get more familiar with regex patterns, you may find this website very useful: regExPal
Your code :
var re = /[^<a>]b/;
var str = "<a>b";
console.log(str.match(re));
Why is [^<a>]b not matching anything?
[^<a>]b means: any character except <, a, or >, followed by b.
Here b is preceded by >, which is one of the excluded characters, so it will not match.
If you want to match b, then you need to write it like this:
var re = /(?:[\<a\>])(b)/;
var str = "<a>b";
console.log(str.match(re)[1]);
