Regex non-capturing groups in JavaScript

I'm a bit rusty on my regex and javascript. I have the following string var:
var subject = "javascript:loadNewsItemWithIndex(5, null);";
I want to extract 5 using a regex. This is my regex:
/(?:loadNewsItemWithIndex\()[0-9]+/
Applied like so:
subject.match(/(?:loadNewsItemWithIndex\()[0-9]+/)
The result is:
loadNewsItemWithIndex(5
What is the cleanest, most readable way to extract 5 as a one-liner? Is it possible to do this by excluding loadNewsItemWithIndex( from the match rather than matching 5 as a subgroup?

The return value from String.match is an array of matches, so you can put parentheses around the number part and just retrieve that particular match index (where the first match is the entire matched result, and subsequent entries are for each capture group):
var subject = "javascript:loadNewsItemWithIndex(5, null);";
var result = subject.match(/loadNewsItemWithIndex\(([0-9]+)/);
// parentheses added around [0-9]+ so the digits form capture group 1
document.writeln(result[1]);
// retrieve the second array entry (index 1 in 0-based indexing), i.e. the first capture group
Sample code: http://jsfiddle.net/LT62w/
Edit: Thanks @Alan for the correction on how non-capturing matches work.
Actually, it's working perfectly. Text that's matched inside a
non-capturing group is still consumed, the same as text that's matched
outside of any group. A capturing group is like a non-capturing group
with extra functionality: in addition to grouping, it allows you to
extract whatever it matches independently of the overall match.
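To make the distinction concrete, here is a small sketch comparing three patterns on the string from the question. The look-behind variant is an addition, not something the answers above rely on, and it assumes an engine with ES2018 look-behind support; the variable names are illustrative.

```javascript
const subject = "javascript:loadNewsItemWithIndex(5, null);";

// Non-capturing group: the prefix is grouped but still part of the overall match.
const whole = subject.match(/(?:loadNewsItemWithIndex\()[0-9]+/)[0];

// Capturing group: the digits are available separately at index 1.
const captured = subject.match(/loadNewsItemWithIndex\(([0-9]+)/)[1];

// Look-behind (assumes ES2018 support): the prefix is required but not consumed.
const lookbehind = subject.match(/(?<=loadNewsItemWithIndex\()[0-9]+/)[0];

console.log(whole);      // loadNewsItemWithIndex(5
console.log(captured);   // 5
console.log(lookbehind); // 5
```

If older browsers must be supported, the capturing-group form is the safer choice.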

I believe the following regex should work for you:
loadNewsItemWithIndex\(([0-9]+).*$
var test = new RegExp(/loadNewsItemWithIndex\(([0-9]+).*$/);
test.exec('var subject = "javascript:loadNewsItemWithIndex(5, null);";');
The breakdown is:
loadNewsItemWithIndex = exactly that text
\( = a literal opening parenthesis
([0-9]+) = capture the number
.* = anything after that number
$ = end of the line
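As a rough illustration of that breakdown, the sketch below runs the pattern with exec() and reads both the whole match and the captured number. Converting the capture with Number() is an assumption about how the value will be used; the variable names are illustrative.

```javascript
const pattern = /loadNewsItemWithIndex\(([0-9]+).*$/;
const result = pattern.exec("javascript:loadNewsItemWithIndex(5, null);");

// result[0] is the whole match; it runs to the end of the line because of .*$
const wholeMatch = result[0];

// result[1] is the text captured by ([0-9]+); exec() returns it as a string.
const index = Number(result[1]);

console.log(wholeMatch); // loadNewsItemWithIndex(5, null);
console.log(index);      // 5
```

Note that exec() returns null when nothing matches, so real code would check the result before indexing into it.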

This should suffice:
<script>
var subject = "javascript:loadNewsItemWithIndex(5, null);";
var number = subject.match(/[0-9]+/)[0]; // first run of digits anywhere in the string
alert(number);
</script>

Related

Javascript word boundaries

I have seen this answer proposed in this question
However the resulting match is not the same. When the match is at the beginning of the string the string is returned, however when matched after a whitespace the whitespace is also returned as part of the match; even though the non-capture colon is used.
I tested with the following code in the Firefox console:
let str1 = "un ejemplo";
let str2 = "ejemplo uno";
let reg = /(?:^|\s)un/gi;
console.log(str1.match(reg)); // ["un"]
console.log(str2.match(reg)); // [" un"]
Why is the whitespace being returned?
The colon in (?:^|\s) just means that it's a non-capturing group. In other words, when reading, back-referencing, or replacing with the captured group values, it will not be included. Without the colon, it would be reference-able as \1, but with the colon, there is no way to reference it. However, non-capturing groups are by default still included in the match. For instance My (?:dog|cat) is sick will still include the word dog or cat in the match, even though it's a non-capturing group.
To make it exclude the value, you have two options. If your regex engine supports look-behinds, you can use a positive look-behind, such as (?<=^|\s). If it does not (JavaScript engines did not support look-behinds until ES2018 added them, so older environments fall into this category), you could put a capturing group around just the part you want and then read that group's value rather than the whole match (e.g., (?:^|\s)(un)). For instance:
let reg = /(?:^|\s)(un)/gi;
let match = reg.exec(input); // input is the string being searched
let result = match[1];
One solution would be to use a capturing group (i.e. (un)) so that you can use RegExp.prototype.exec() and then read match[1] from the result to get the matched string, like this:
let str1 = "un ejemplo";
let str2 = "ejemplo uno";
let reg = /(?:^|\s)(un)/gi;
var match1 = reg.exec(str1);
var match2 = reg.exec(str2);
console.log(match1[1]); // "un"
console.log(match2[1]); // "un"
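For environments that do support look-behind assertions (an assumption; they were added in ES2018), a sketch of the look-behind alternative might look like the following. It keeps String.prototype.match() and returns only the word, without the leading whitespace.

```javascript
const str1 = "un ejemplo";
const str2 = "ejemplo uno";

// (?<=^|\s) requires the start of the string or a whitespace character
// before "un", but does not include it in the match.
const reg = /(?<=^|\s)un/gi;

const first = str1.match(reg);
const second = str2.match(reg);

console.log(first);  // ["un"]
console.log(second); // ["un"]
```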

or with regular expressions in javascript

let's say I have this code:
var pattern = /^\d{1}-\d{3}-\d{3}-\d{4}$/;
//var res = pattern.replace("-", ".");
How can I make the dash optional, so the user does not always have to type one when entering a number? I want to allow either a dash or a dot as the separator.
You may add a ? quantifier after them to make the dashes optional:
var pattern = /^\d[-.]?\d{3}[-.]?\d{3}[-.]?\d{4}$/;
Here, [-.]? matches 1 or 0 occurrences of a - or . character.
See the regex demo.
If you want to require a consistent way of entering data, use grouping and backreferences:
/^\d([-.]?)\d{3}\1\d{3}\1\d{4}$/;
See the same demo fiddle with this pattern.
Here, ([-.]?) forms a capturing group with ID 1, and \1 is a backreference to the value stored in the Group 1 memory buffer. So, if the first group matched a dot, every \1 will match only a dot; if it matched nothing, every \1 will match only the empty string.
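A quick way to see the difference between the two patterns is to run both against a few inputs. The sample phone numbers below are made up for illustration.

```javascript
// Each separator is independently optional and may be "-" or ".".
const loose = /^\d[-.]?\d{3}[-.]?\d{3}[-.]?\d{4}$/;

// The first separator is captured; \1 forces the others to be identical to it.
const consistent = /^\d([-.]?)\d{3}\1\d{3}\1\d{4}$/;

const samples = [
  "1-800-555-1234", // dashes throughout
  "1.800.555.1234", // dots throughout
  "18005551234",    // no separators at all
  "1-800.555-1234", // mixed separators
];

for (const s of samples) {
  console.log(s, loose.test(s), consistent.test(s));
}
```

Only the mixed-separator input is treated differently: the loose pattern accepts it and the backreference pattern rejects it.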

Not sure why this Regex is returning true

Trying to use this regex to verify usernames and this is what I have :
var goodUsername = /[a-zA-Z0-9_]/g;
console.log(goodUsername.test("HELO $"));
But whether or not I have $ in there, it returns true. Not sure why.
I basically only want letters, numbers and _ in usernames and that's it
It seems to work here https://regex101.com/r/nP4iG7/1
The regex you use searches for any match within the subject string. In your case, the H in HELO already satisfies the criteria. If you want to apply the criteria to the whole string, you should mark the beginning and end of the string using
var goodUsername = /^[a-zA-Z0-9_]+$/;
console.log(goodUsername.test("HELO $"));//false
You need to add anchors..
/^[a-zA-Z0-9_]+$/;
Anchors help to do exact matching: ^ is the start-of-line anchor and $ is the end-of-line anchor. You also need to repeat the character class one or more times (+); otherwise it would only match a string that contains exactly one character.
You could search for any characters not in the list (a "negated character set"):
var badUsername = /[^a-zA-Z0-9_]/;
console.log(!badUsername.test("HELO $"));
or more simply
var badUsername = /\W/;
since \W is defined as
Matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_].
If you prefer to do a positive match, using anchors as other answers have suggested, you can shorten your regexp by using \w:
var goodUsername = /^\w+$/;
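The approaches above can be compared side by side with a short sketch. The isValidUsername helper is a hypothetical name introduced here for illustration. One difference worth noting is that the negated check accepts an empty string, while the anchored pattern with + does not.

```javascript
const unanchored = /[a-zA-Z0-9_]/;  // true if ANY single character is allowed
const anchored = /^\w+$/;           // true only if EVERY character is allowed
const disallowed = /\W/;            // true if any character is NOT allowed

// Hypothetical helper: a username is valid when the anchored pattern matches.
function isValidUsername(name) {
  return anchored.test(name);
}

console.log(unanchored.test("HELO $"));  // true, because "H" alone matches
console.log(isValidUsername("HELO $"));  // false, space and $ are rejected
console.log(isValidUsername("HELO_1"));  // true
console.log(!disallowed.test("HELO $")); // false
console.log(isValidUsername(""));        // false, + requires at least one character
console.log(!disallowed.test(""));       // true, nothing disallowed was found
```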

javascript regex to return letters only

My string can be something like A01, B02, C03, possibly AA18 in the future as well. I thought I could use a regex to get just the letters and work on my regex since I haven't done much with it. I wrote this function:
function rowOffset(sequence) {
console.log(sequence);
var matches = /^[a-zA-Z]+$/.exec(sequence);
console.log(matches);
var letter = matches[0].toUpperCase();
return letter;
}
var x = "A01";
console.log(rowOffset(x));
My matches continue to be null. Am I doing this correctly? Looking at this post, I thought the regex was correct: Regular expression for only characters a-z, A-Z
You can use String#replace to remove all non letters from input string:
var r = 'AA18'.replace(/[^a-zA-Z]+/g, '');
//=> "AA"
Your main issue is the use of the ^ and $ characters in the regex pattern. ^ indicates the beginning of the string and $ indicates the end, so your pattern is looking for a string that is ONLY a group of one or more letters, from the beginning to the end of the string. "A01" also contains digits, so the pattern never matches and exec() returns null.
Additionally, if you want to get each individual instance of the letters, you want to include the "global" indicator (g) at the end of your regex pattern: /[a-zA-Z]+/g. Leaving that out means that it will only find the first instance of the pattern and then stop searching . . . adding it will match all instances.
Those two updates should get you going.
EDIT:
Also, you may want to use match() rather than exec(). If you have a string of multiple values (e.g., "A01, B02, C03, AA18"), match() will return them all in an array, whereas, exec() will only match the first one. If it is only ever one value, then exec() will be fine (and you also wouldn't need the "global" flag).
If you want to use match(), you need to change your code order just a bit to:
var matches = sequence.match(/[a-zA-Z]+/g);
To return an array of separate letters remove +:
var matches = sequence.match(/[a-zA-Z]/g);
You're confused about what the goal of the other question was: he wanted to check that there were only letters in his string.
You need to remove the anchors ^ and $, which match the beginning and end of the string respectively:
[a-zA-Z]+
This will match the first run of letters in your input string.
If there might be more (i.e. you want multiple matches in your single string), use
sequence.match(/[a-zA-Z]+/g)
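Putting those suggestions together, a minimal sketch of the corrected function might look like this. Returning null when the input contains no letters is an assumption made here; the original question does not say what should happen in that case.

```javascript
function rowOffset(sequence) {
  // No anchors: find the first run of letters anywhere in the string.
  const matches = /[a-zA-Z]+/.exec(sequence);
  // exec() returns null when there is no match, so guard before indexing.
  return matches ? matches[0].toUpperCase() : null;
}

console.log(rowOffset("A01"));  // A
console.log(rowOffset("aa18")); // AA
console.log(rowOffset("123"));  // null

// With the g flag, match() returns every run of letters in one array.
const all = "A01, B02, C03, AA18".match(/[a-zA-Z]+/g);
console.log(all); // ["A", "B", "C", "AA"]
```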
This /[^a-z]/g solves the problem. Look at the example below.
function pangram(str) {
let regExp = /[^a-z]/g;
let letters = str.toLowerCase().replace(regExp, '');
document.getElementById('letters').innerHTML = letters;
}
pangram('GHV 2## %hfr efg uor7 489(*&^% knt lhtkjj ngnm!##$%^&*()_');
<h4 id="letters"></h4>
You can do this:
var r = 'AA18'.replace(/[\W\d_]/g, ''); // AA
Also can be done by String.prototype.split(regex).
'AA12BB34'.split(/(\d+)/); // ["AA", "12", "BB", "34", ""]
'AA12BB34'.split(/(\d+)/)[0]; // "AA"
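Here the regex divides the given string at each run of digits (\d+); because the digits are inside a capturing group, they are also included in the resulting array.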

Regular Expressions: Capture multiple groups using quantifier

Consider the following code:
<!DOCTYPE html>
<html>
<body>
<script type="text/javascript">
var str = '<12> rnbqkb-r Rnbq-b-r ';
var pat1 = new RegExp('^\\<12\\> ([rnbqkpRNBQKP-]{8}) ([rnbqkpRNBQKP-]{8})');
var pat2 = new RegExp('^\\<12\\> ([rnbqkp RNBQKP-]{8}){2}');
var pat3 = new RegExp('^\\<12\\> ([rnbqkp RNBQKP-]{8}){2}?');
document.write(str.match(pat1));
document.write('<br />');
document.write(str.match(pat2));
document.write('<br />');
document.write(str.match(pat3));
</script>
</body>
</html>
which produces
<12> rnbqkb-r Rnbq-b-r,rnbqkb-r,Rnbq-b-r
<12> rnbqkb-r Rnbq-b-, Rnbq-b-
<12> rnbqkb-r Rnbq-b-, Rnbq-b-
as output.
Why does neither pattern pat2 nor pat3 capture the first group rnbqkb-r? I would like to capture all groups without having to repeat them explicitly as in pattern pat1.
Why does neither pattern pat2 nor pat3 capture the first group rnbqkb-r?
Because you have white-space at the end of each 8-character sequence that your regexes pat2 and pat3 do not allow.
I would like to capture all groups without having to repeat them explicitly as in pattern pat1.
You can't.
It is not possible (in JavaScript) to capture two groups when your regex only contains one group.
Groups are defined through parentheses. Your match result will contain as many groups as there are parentheses pairs in your regex (except modified parentheses like (?:...) which will not count towards match groups). Want two separate group matches in your match result? Define two separate groups in your regex.
If a group can match multiple times, the group's value will be whatever it matched last. All previous match occurrences for that group will be overridden by its last match.
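The "last match wins" behaviour is easy to observe directly. The sketch below uses a simplified variant of the question's pattern (the optional space is moved outside the capturing group, which is a change made here for illustration) alongside a minimal digit example.

```javascript
// Minimal case: one group, repeated three times, keeps only the last digit.
const digits = /(\d)+/.exec("123");
console.log(digits[0]); // 123
console.log(digits[1]); // 3

// Chess-row case: the group runs twice, but the result has only one group slot.
const str = '<12> rnbqkb-r Rnbq-b-r ';
const rows = str.match(/^<12> (?:([rnbqkp-]{8}) ?){2}/i);
console.log(rows.length); // 2 (whole match plus one group)
console.log(rows[1]);     // Rnbq-b-r
```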
Try
var pat1 = /^<12> ((?:[rnbqkp-]{8} ?)*)/i,
match = str.match(pat1);
if (match) {
match[1].split(/\s+/); // ["rnbqkb-r", "Rnbq-b-r", ""]
}
Notes:
Trim str beforehand if you don't want the last empty array value.
In general, prefer regex literal notation (/expression/). Use new RegExp() only for expressions you generate from dynamic values.
< and > are not special, you don't need to escape them.
Count again (8 vs 9). pat2 and pat3 are missing the space in between the two parts.
Update: Additionally, I don't think what you are trying to achieve is possible using match. See How can I match multiple occurrences with a regex in JavaScript similar to PHP's preg_match_all()? and use exec.
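A rough sketch of the exec-based approach is shown below. It simplifies the problem by checking the <12> prefix separately with startsWith() and then collecting every 8-character row with a global regex; that split into two steps is an assumption made here, not part of the answers above.

```javascript
const str = '<12> rnbqkb-r Rnbq-b-r ';
const rowPattern = /[rnbqkp-]{8}/gi; // g flag: exec() resumes after the previous match

const rows = [];
if (str.startsWith('<12> ')) {
  let m;
  // Each call to exec() returns the next match, or null when none remain.
  while ((m = rowPattern.exec(str)) !== null) {
    rows.push(m[0]);
  }
}

console.log(rows); // ["rnbqkb-r", "Rnbq-b-r"]
```

Because the loop collects whole matches rather than capture groups, it works for any number of rows without repeating the group in the pattern.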
