Javascript word boundaries - javascript

I have seen this answer proposed in this question
However, the resulting match is not the same. When the match is at the beginning of the string, only the matched string is returned, but when it follows a whitespace character, the whitespace is also returned as part of the match, even though the non-capturing colon is used.
I tested with the following code in the Firefox console:
let str1 = "un ejemplo";
let str2 = "ejemplo uno";
let reg = /(?:^|\s)un/gi;
console.log(str1.match(reg)); // ["un"]
console.log(str2.match(reg)); // [" un"]
Why is the whitespace being returned?

The colon in (?:^|\s) just means that it's a non-capturing group. In other words, when reading, back-referencing, or replacing with the captured group values, it will not be included. Without the colon, it would be reference-able as \1, but with the colon, there is no way to reference it. However, non-capturing groups are by default still included in the match. For instance My (?:dog|cat) is sick will still include the word dog or cat in the match, even though it's a non-capturing group.
To exclude the value from the match, you have two options. If your regex engine supports look-behinds, you can use a positive look-behind, such as (?<=^|\s). If it does not (JavaScript engines did not until ES2018 added look-behind assertions), you can put a capturing group around just the part you want and then read that group's value rather than the whole match (e.g., (?:^|\s)(un)). For instance:
let input = "ejemplo uno";
let reg = /(?:^|\s)(un)/gi;
let match = reg.exec(input);
let result = match[1]; // "un"
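As a side note, engines that implement ES2018 do accept look-behind assertions, so the whitespace can be checked without becoming part of the match. A minimal sketch, assuming such an engine (the variable names are only illustrative):

```javascript
// Assumes an engine with ES2018 look-behind support.
const str1 = "un ejemplo";
const str2 = "ejemplo uno";

// (?<=^|\s) requires start-of-string or whitespace before "un",
// but does not consume it, so it stays out of the match.
const lookbehind = /(?<=^|\s)un/gi;

const result1 = str1.match(lookbehind);
const result2 = str2.match(lookbehind);

console.log(result1); // ["un"]
console.log(result2); // ["un"]
```

On older engines this pattern throws a SyntaxError when the regex is created, so the capturing-group approach remains the safer choice there.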

One solution would be to use a capturing group (i.e., (un)) so that you can call RegExp.prototype.exec() and then read match[1] of the result to get the matched string, like this:
let str1 = "un ejemplo";
let str2 = "ejemplo uno";
let reg = /(?:^|\s)(un)/gi;
var match1 = reg.exec(str1);
var match2 = reg.exec(str2);
console.log(match1[1]); // "un"
console.log(match2[1]); // "un"
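If every occurrence is needed rather than just the first, String.prototype.matchAll (ES2020) exposes the capture groups of each match. A rough sketch, assuming an engine that supports it; the sample text is made up for illustration:

```javascript
// Hypothetical sample text with three words that start with "un".
const text = "un ejemplo, un dos, uno";
const re = /(?:^|\s)(un)/gi;

// matchAll yields one match object per occurrence; m[1] is the
// capturing group, so the leading whitespace is left out.
const words = Array.from(text.matchAll(re), (m) => m[1]);

console.log(words); // ["un", "un", "un"]
```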

Related

regex match not outputting the adjacent matches javascript

I was experimenting with regex in JavaScript and came across an issue. Consider the string str = "+d+a+". I was trying to output the characters in the string that are surrounded by +, so I used str.match(/\+[a-z]\+/ig). I expected ["+d+","+a+"], but I got just ["+d+"]; "+a+" does not show up in the output. Why?
.match(/.../g) returns all non-overlapping matches. Your regex requires a + sign on each side. Given your target string:
+d+a+
^^^
  ^^^
Your matches would have to overlap in the middle in order to return "+a+".
You can use look-ahead and a manual loop to find overlapping matches:
var str = "+d+a+";
var re = /(?=(\+[a-z]\+))/g;
var matches = [], m;
while (m = re.exec(str)) {
    matches.push(m[1]);
    re.lastIndex++;
}
console.log(matches);
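In engines that support String.prototype.matchAll (ES2020), the same look-ahead idea can probably be written without the manual loop, since the iterator steps past zero-length matches on its own. A sketch under that assumption:

```javascript
const source = "+d+a+";

// The look-ahead matches an empty string at each position where
// "+letter+" follows; the inner group captures that text.
const overlapping = Array.from(
  source.matchAll(/(?=(\+[a-z]\+))/g),
  (m) => m[1]
);

console.log(overlapping); // ["+d+", "+a+"]
```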
With regex, when a character gets consumed with a match, then it won't count for the next match.
For example, a regex like /aba/g wouldn't find 2 aba's in a string like "ababa".
Because the second "a" was already consumed.
However, that can be overcome by using a positive lookahead (?=...).
Because lookaheads just check what's ahead without actually consuming it.
So a regex like /(ab)(?=(a))/g would return 2 capture groups with 'ab' and 'a' for each 'aba'.
But in this case it just needs to be followed by 1 fixed character '+'.
So it can be simplified, because you don't really need capture groups for this one.
Example snippet:
var str = "+a+b+c+";
var matches = str.match(/\+[a-z]+(?=\+)/g).map(function(m){return m + '+'});
console.log(matches);
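To see the "consumed character" effect in isolation, here is a small sketch comparing a plain global match with a look-ahead version on the "ababa" string mentioned above (matchAll is assumed to be available, ES2020):

```javascript
const input = "ababa";

// Plain match: the middle "a" is consumed by the first match,
// so the second "aba" is never found.
const plain = input.match(/aba/g);

// Look-ahead: nothing is consumed, so both starting positions match.
const viaLookahead = Array.from(input.matchAll(/(?=(aba))/g), (m) => m[1]);

console.log(plain);        // ["aba"]
console.log(viaLookahead); // ["aba", "aba"]
```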

How can I access the expression that caused a match in a conditional match group Javascript regex?

I have a conditional match grouped regex like /(sun|\bmoon)/. When I access the matches in a string, I want to be able to see the expression that caused my match.
let regex = /(sun|\bmoon)/
let match = regex.exec('moon')
// return '\bmoon' ??
Is this possible?
JavaScript's RegExp does not currently have a method to show which part of the regex pattern matched. I don't believe this is something that will be implemented any time soon (or even ever), but that's my own opinion. You can, instead, use two separate patterns as I show in the snippet below.
let regexes = [/sun/, /\bmoon/]
let str = 'moon'
regexes.forEach(function(regex) {
let m = regex.exec(str)
if (m == null) return
console.log(`${regex}: ${m}`)
})
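A variation on the same idea, as a sketch: keep the patterns in an array and report the source of the first one that matches. The function name firstMatchingPattern is made up for this example.

```javascript
// Patterns are kept separate so we can tell which one matched.
const patterns = [/sun/, /\bmoon/];

// Hypothetical helper: returns the source text of the first pattern
// that matches, or null if none do.
function firstMatchingPattern(str) {
  const hit = patterns.find((re) => re.test(str));
  return hit ? hit.source : null;
}

console.log(firstMatchingPattern("moon"));  // \bmoon
console.log(firstMatchingPattern("stars")); // null
```

None of the patterns use the g flag, so test() keeps no state between calls.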
The reason why capturing groups exist is to identify the part of the input string that matches a subexpression. In your example, the subexpression that matches is sun|\bmoon (the content of the capturing group).
If you want to know which of the two sub-expression actually matches the input string, all you have to do is to put them into smaller capturing groups:
let regex = /((sun)|(\bmoon))/
let match = regex.exec('moon')
// Array [ "moon", "moon", undefined, "moon" ]
The returned array contains the string that matched the entire regex (at position 0) and the substrings that matched each capturing group at the other positions.
The capturing groups are counted starting from 1 in the order they are open.
In the example above, "moon", undefined and "moon" correspond to the capturing groups (in order) ((sun)|(\bmoon)), (sun) and (\bmoon).
By checking the values of match[2] and match[3] you can find if the input string matched sun (no, it is undefined) or \bmoon (yes).
You can use non-capturing groups for groups whose contents you don't need to capture but which cannot be removed because they are needed for grouping.
You can't see the regexp expression as written in the pattern, but you can see in the array returned by exec what has been matched.
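Named capture groups (ES2018) can make this check a little easier to read. A sketch, assuming an engine that supports them; the group names sun and moon are chosen here just for illustration:

```javascript
// Each alternative gets its own named group.
const namedRegex = /(?<sun>sun)|(?<moon>\bmoon)/;
const namedMatch = namedRegex.exec("moon");

// Groups that did not participate in the match are undefined,
// so the first defined one tells us which alternative matched.
const matchedName = Object.keys(namedMatch.groups).find(
  (name) => namedMatch.groups[name] !== undefined
);

console.log(matchedName); // "moon"
```

This still reports a label you chose, not the pattern text itself.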
Do you mean this?
console.log(match[0]);
Or do you want the part of the expression that matched, like \bmoon? If so, you can't see it.

Regular expression with asterisk quantifier

This documentation states this about the asterisk quantifier:
Matches the preceding character 0 or more times.
It works in something like this:
var regex = /<[A-Za-z][A-Za-z0-9]*>/;
var str = "<html>";
console.log(str.match(regex));
The result of the above is: <html>
But when tried on the following code to get all the "r"s in the string below, it only returns the first "r". Why is this?
var regex = /r*/;
var str = "rodriguez";
console.log(str.match(regex));
Why, in the first example does it cause "the preceding" character/token to be repeated "0 or more times" but not in the second example?
var regex = /r*/;
var str = "rodriguez";
The regex engine will first try to match r in rodriguez from left to right and since there is a match, it consumes this match.
The regex engine then tries to match another r, but the next character is o, so it stops there.
Without the global flag g (used as so var regex = /r*/g;), the regex engine will stop looking for more matches once the regex is satisfied.
Try using:
var regex = /a*/;
var str = "cabbage";
The match will be an empty string, despite the string containing the letter a! This is because the regex engine first tries to find a in cabbage from left to right, but the first character is c. Since this doesn't match, the regex tries to match 0 times. The regex is thus satisfied and the matching ends here.
It might be worth pointing out that * alone is greedy, which means it will first try to match as many as possible (the 'or more' part from the description) before trying to match 0 times.
To get all r from rodriguez, you will need the global flag as mentioned earlier:
var regex = /r*/g;
var str = "rodriguez";
You'll get all the r, plus all the empty strings inside, since * also matches 'nothing'.
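Putting the cases above side by side, as a quick sketch you can paste into a console:

```javascript
const name = "rodriguez";

// Without g: only the first match, the single "r" at index 0.
const first = name.match(/r*/);

// With g: every match, including the empty match at each position
// where there is no "r" (and at the end of the string).
const all = name.match(/r*/g);

// "cabbage" starts with "c", so a* matches zero times at index 0.
const cabbage = "cabbage".match(/a*/);

console.log(first[0]);   // "r"
console.log(all);        // two "r" entries plus eight empty strings
console.log(cabbage[0]); // ""
```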
Use global switch to match 1 or more r anywhere in the string:
var regex = /r+/g;
In your other regex:
var regex = /<[A-Za-z][A-Za-z0-9]*>/;
You're matching a literal < followed by a letter, followed by 0 or more letters or digits, followed by a literal >, and it will match <html> perfectly.
But if you have input as <foo>:<bar>:<abc> then it will just match <foo> not other segments. To match all segments you need to use /<[A-Za-z][A-Za-z0-9]*>/g with global switch.
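A short sketch of the difference the global switch makes, using the sample input from this answer:

```javascript
const segments = "<foo>:<bar>:<abc>";

// Without g, match stops after the first segment.
const firstOnly = segments.match(/<[A-Za-z][A-Za-z0-9]*>/);

// With g, match returns every segment.
const allSegments = segments.match(/<[A-Za-z][A-Za-z0-9]*>/g);

// r+ with g skips the empty matches that r* would produce.
const runs = "rodriguez".match(/r+/g);

console.log(firstOnly[0]); // "<foo>"
console.log(allSegments);  // ["<foo>", "<bar>", "<abc>"]
console.log(runs);         // ["r", "r"]
```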

Regex produces different result in javascript

Why does this regex return an entirely different result in javascript as compared to an on-line regex tester, found at http://www.gskinner.com/RegExr/
var patt = new RegExp(/\D([0-9]*)/g);
"/144444455".match(patt);
The return in the console is:
["/144444455"]
While it does return the correct group in the regexr tester.
All I'm trying to do is extract the first amount inside a piece of text. Regardless if that text starts with a "/" or has a bunch of other useless information.
The regex does exactly what you tell it to:
\D matches a non-digit (in this case /)
[0-9]* matches a string of digits (144444455)
You will need to access the content of the first capturing group:
var subject = "/144444455";
var match = patt.exec(subject);
var result = null;
if (match != null) {
    result = match[1]; // "144444455"
}
Or simply drop the \D entirely - I'm not sure why you think you need it in the first place...
Then, you should probably remove the /g modifier if you only want to match the first number, not all numbers in your text. So,
result = subject.match(/\d+/);
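should work just as well. Note that match returns an array (or null if nothing matches), so the number itself is in result[0].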
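For example, a small helper along these lines would return the first run of digits as a string, or null when the text has none. The name firstNumber is hypothetical, and this is only a sketch of the approach described above:

```javascript
// Hypothetical helper: extract the first run of digits from some text.
function firstNumber(text) {
  const m = text.match(/\d+/); // no g flag: first match only
  return m ? m[0] : null;      // match returns null when nothing matches
}

console.log(firstNumber("/144444455"));         // "144444455"
console.log(firstNumber("price: 42, qty: 7"));  // "42"
console.log(firstNumber("no digits here"));     // null
```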

Regex non capturing groups in javascript

I'm a bit rusty on my regex and javascript. I have the following string var:
var subject = "javascript:loadNewsItemWithIndex(5, null);";
I want to extract 5 using a regex. This is my regex:
/(?:loadNewsItemWithIndex\()[0-9]+/
Applied like so:
subject.match(/(?:loadNewsItemWithIndex\()[0-9]+/)
The result is:
loadNewsItemWithIndex(5
What is cleanest, most readable way to extract 5 as a one-liner? Is it possible to do this by excluding loadNewsItemWithIndex( from the match rather than matching 5 as a sub group?
The return value from String.match is an array of matches, so you can put parentheses around the number part and just retrieve that particular match index (where the first match is the entire matched result, and subsequent entries are for each capture group):
var subject = "javascript:loadNewsItemWithIndex(5, null);";
var result = subject.match(/loadNewsItemWithIndex\(([0-9]+)/);
//                                                 ^      ^ added parens
document.writeln(result[1]);
//                      ^ retrieve second match (1 in 0-based indexing)
Sample code: http://jsfiddle.net/LT62w/
Edit: Thanks @Alan for the correction on how non-capturing matches work.
Actually, it's working perfectly. Text that's matched inside a
non-capturing group is still consumed, the same as text that's matched
outside of any group. A capturing group is like a non-capturing group
with extra functionality: in addition to grouping, it allows you to
extract whatever it matches independently of the overall match.
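On the question of excluding the prefix from the match itself: in engines that implement ES2018 look-behind assertions, something like the following sketch should work as a one-liner. On older engines the pattern is a syntax error, so the capture-group version is the portable one.

```javascript
const subject = "javascript:loadNewsItemWithIndex(5, null);";

// (?<=...) asserts that the prefix precedes the digits without
// making it part of the match. Assumes ES2018 look-behind support.
const index = subject.match(/(?<=loadNewsItemWithIndex\()[0-9]+/)[0];

console.log(index); // "5"
```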
I believe the following regex should work for you:
loadNewsItemWithIndex\(([0-9]+).*$
var test = new RegExp(/loadNewsItemWithIndex\(([0-9]+).*$/);
test.exec('var subject = "javascript:loadNewsItemWithIndex(5, null);";');
The breakdown of this is:
loadNewsItemWithIndex = exactly that
\( = open parentheses
([0-9]+) = Capture the number
.* = Anything after that number
$ = end of the line
This should suffice:
<script>
var subject = "javascript:loadNewsItemWithIndex(5, null);";
number = subject.match(/[0-9]+/);
alert(number);
</script>
