Regular expression only returning first group - javascript

This is probably something simple, but I am trying to return the capture groups from this regex...
const expression = /^\/api(?:\/)?([^\/]+)?\/users\/([^\/]+)$/g
The code I am using to do this is the following...
const matchExpression = (expression, pattern) => {
let match;
let matches = [];
while((match = expression.exec(pattern)) != null) {
matches.push(match[1]);
};
return matches;
};
I am expecting the following result when matched against /api/v1/users/1...
['v1', '1']
But instead I only seem to get one result, which is always the first group.
The expression itself is fine and has been tested across multiple services, but I can't figure out why this is not working as expected.
Any help would be hugely appreciated.

You must make sure you add the second capturing group contents to the resulting array:
while((match = expression.exec(pattern)) != null) {
matches.push(match[1]);
matches.push(match[2]); // <- here
};
Since you are matching an entire string, you do not need the g (global) modifier: you can use /^\/api(?:\/)?([^\/]+)?\/users\/([^\/]+)$/ and reduce the code to:
const matchExpression = (expression, pattern) => {
let matches = pattern.match(expression);
if (matches) {
matches = matches.slice(1);
}
return matches;
};
The point is that you can use String#match with a regex without global modifier to access capturing group contents.
Demo:
var expr = /^\/api(?:\/)?([^\/]+)?\/users\/([^\/]+)$/;
var matches = "/api/v1/users/1".match(expr);
if (matches) {
console.log(matches.slice(1));
}
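As a hedged sketch of why the original loop produced a single element: the pattern is anchored with ^ and $, so exec can succeed only once, and the result depends entirely on which groups get pushed during that one iteration. The helper below (collectGroups is a made-up name, not an existing API) pushes every capture group of every match and also guards against a regex that lacks the g flag:

```javascript
// Hypothetical helper: gathers all capture groups from every match.
const collectGroups = (expression, input) => {
  const groups = [];
  expression.lastIndex = 0; // start from the beginning even if the regex was used before
  let match;
  while ((match = expression.exec(input)) !== null) {
    groups.push(...match.slice(1)); // index 0 is the full match, the rest are the groups
    if (!expression.global) break; // without g, exec would return the same match forever
  }
  return groups;
};

const expression = /^\/api(?:\/)?([^\/]+)?\/users\/([^\/]+)$/g;
console.log(collectGroups(expression, "/api/v1/users/1")); // ["v1", "1"]
```

For a single anchored pattern like this one, String#match without the g flag remains the shorter option.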

Related

Ensure that regex moves to the second OR element only if the first one doesn't exist

I'm trying to match a certain word in a string, and only if it doesn't exist do I want to match another one using the OR | operator, but the match is ignoring that. How can I ensure that this behavior works:
const str = 'Soraka is an ambulance 911'
const regex = RegExp('('+'911'+'|'+'soraka'+')','i')
console.log(str.match(regex)[0]) // should get 911 instead
911 occurs late in the string, whereas Soraka occurs earlier, and the regex engine iterates character-by-character, so Soraka gets matched first, even though it's on the right-hand side of the alternation.
One option would be to match Soraka or 911 in captured lookaheads instead, and then with the regex match object, alternate between the two groups to get the one which is not undefined:
const check = (str) => {
const regex = /^(?=.*(911)|.*(Soraka))/;
const match = str.match(regex);
console.log(match[1] || match[2]);
};
check('Soraka is an ambulance 911');
check('foo 911');
check('foo Soraka');
You can use includes and find.
Pass the strings in priority order; find returns the first one that occurs in the original string:
const str = 'Soraka is an ambulance 911'
const findStr = (...arg) => {
return [...arg].find(toCheck => str.includes(toCheck))
}
console.log(findStr("911", "Soraka"))
You can extend findStr if you want the match to be case-insensitive, something like this:
const str = 'Soraka is an ambulance 911'
const findStr = (...arg) => {
return [...arg].find(toCheck => str.toLowerCase().includes(toCheck.toLowerCase()))
}
console.log(findStr("Soraka", "911"))
If you want to match whole words rather than partial words, you can build a dynamic regex and test the string against it:
const str = '911234 Soraka is an ambulance 911'
const findStr = (...arg) => {
return [...arg].find(toCheck =>{
let regex = new RegExp(`\\b${toCheck}\\b`,'i')
return regex.test(str)
})
}
console.log(findStr("911", "Soraka"))
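One caveat with building a regex from arbitrary strings is that characters such as + or . have a special meaning inside a pattern. The sketch below escapes each candidate before wrapping it in \b word boundaries. Both escapeRegExp and findFirstWholeWord are hypothetical helper names introduced for illustration, and the \b approach assumes every candidate starts and ends with a word character:

```javascript
// Escape characters that are special inside a regular expression.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Return the first candidate (in priority order) that occurs as a whole word.
const findFirstWholeWord = (text, ...candidates) =>
  candidates.find((candidate) =>
    new RegExp(`\\b${escapeRegExp(candidate)}\\b`, "i").test(text)
  );

console.log(findFirstWholeWord("911234 Soraka is an ambulance 911", "911", "Soraka")); // "911"
console.log(findFirstWholeWord("911234 Soraka", "911", "Soraka")); // "Soraka"
```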
Just use a greedy dot before a capturing group that matches 911 or Soraka:
/.*(911)|(Soraka)/
See the regex demo
The .* (or, if there are line breaks, use /.*(911)|(Soraka)/s in Chrome/Node, or /[^]*(911)|(Soraka)/ to support legacy ECMAScript versions) makes the regex engine first run to the end of the string and then backtrack to the rightmost 911; the Soraka alternative is tried only when no 911 can be found.
JS demo (borrowed from @CertainPerformance's answer):
const check = (str) => {
const regex = /.*(911)|(Soraka)/;
const match = str.match(regex) || ["","NO MATCH","NO MATCH"];
console.log(match[1] || match[2]);
};
check('Soraka is an ambulance 911');
check('Ambulance 911, Soraka');
check('foo 911');
check('foo Soraka');
check('foo oops!');

Group array with two words, rather than one

CODE BELOW: When a word has been written, it is stored as its own array, meaning every single word is its own array, which is later checked for reoccurrences.
What I want: Instead of creating an array for a word (after the spacebar has been hit), I want it to do so after 2 words have been written.
I.e., instead of me writing "Hello" + spacebar and the code creating "hello" as an array, I'd like it to wait until I've written "hello my" + spacebar and then create an array with those two words.
I am guessing this has something to do with the regular expression?
I've tried many different things (I'm a bit of a newbie) and I cannot understand how to get it to group 2 words together rather than one.
const count = (text) => {
const wordRegex = new RegExp(`([\\p{Alphabetic}\]+)`, 'gu');
let result;
const words = {};
while ((result = wordRegex.exec(text)) !== null) {
const word = result[0].toLowerCase();
if (!words[word]) {
words[word] = [];
}
words[word].push(result.index);
words[word].push(result.index + word.length);
}
return words;
};
You may use
const wordRegex = /\p{Alphabetic}+(?:\s+\p{Alphabetic}+)?/gu;
Details
\p{Alphabetic}+ - 1+ alphabetic chars
(?:\s+\p{Alphabetic}+)? - an optional sequence of:
\s+ - 1+ whitespaces
\p{Alphabetic}+ - 1+ alphabetic chars
The second word is matched optionally so that the final odd word could be matched, too.
See the JS demo below:
const count = (text) => {
const wordRegex = /\p{Alphabetic}+(?:\s+\p{Alphabetic}+)?/gu;
let result;
const words = {};
while ((result = wordRegex.exec(text)) !== null) {
const word = result[0].toLowerCase();
if (!words[word]) {
words[word] = [];
}
words[word].push(result.index);
words[word].push(result.index + word.length);
}
return words;
};
console.log(count("abc def ghi"))
A RegExp constructor way of defining this regex is
const wordRegex = new RegExp("\\p{Alphabetic}+(?:\\s+\\p{Alphabetic}+)?", "gu");
However, since the pattern is static, no variables are used to build the pattern, you can use the regex literal notation as shown at the top of the answer.
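To see which spans the two-word pattern actually produces, here is a small sketch that lists each match with its start and end offsets. The name listPairs and the sample sentence are assumptions made for illustration; the regex is the one from the answer:

```javascript
const wordPairRegex = /\p{Alphabetic}+(?:\s+\p{Alphabetic}+)?/gu;

// Hypothetical helper: one entry per matched pair (or trailing single word).
const listPairs = (text) =>
  Array.from(text.matchAll(wordPairRegex), (m) => ({
    pair: m[0].toLowerCase(),
    start: m.index,
    end: m.index + m[0].length,
  }));

console.log(listPairs("Hello my dear friend again"));
// "hello my" (0-8), "dear friend" (9-20), "again" (21-26)
```

The pairs do not overlap: each word belongs to exactly one match, and an odd final word is returned on its own.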

Return unique digits of a time format string using regex

I need to create a regex pattern that will return unique digits before or after a : symbol, using String.match. It should only return the digit, not the : symbol. PS: I know there are other (maybe easier) ways to do this, but I want to use regex for learning purposes.
let s;
let regex = /(^\d:)(:\d$)/g // I tried this, of course it didn't work
s = '12:34'
s.match(regex) // return null
s = '1:34'
s.match(regex) // return [1]
s = '12:4'
s.match(regex) // return [4]
s = '1:4'
s.match(regex) // return [1,4]
Try using this:
let regex = /(((?<=:)\d(?!\d))|((?<!\d)\d(?=:)))/g
This will match the patterns you want!
Here's a reference for Regex.
let s;
let regex = /(((?<=:)\d(?!\d))|((?<!\d)\d(?=:)))/g
s = '12:34'
console.log(s.match(regex)) // null
s = '1:34'
console.log(s.match(regex)) // ["1"]
s = '12:4'
console.log(s.match(regex)) // ["4"]
s = '1:4'
console.log(s.match(regex)) // ["1", "4"]
Example done in JavaScript. The second regex (regex2) is the simplest: it matches a digit followed by a colon, followed by a digit (you could use it with the g flag if there is more than one occurrence in your text).
The first regex (regex1) matches the entire string and may have zero or more characters before the first digit and zero or more characters after the second one. It will only capture one occurrence for the entire string.
let regex1 = /^.*(\d):(\d).*$/;
let regex2 = /(\d):(\d)/;
console.log("Make sure the entire string only contains one instance");
['12:34', '1:34', '12:4', '1:4' ].forEach( (s) => console.log(s.match(regex1) ));
console.log("Match the first instance found");
['12:34', '1:34', '12:4', '1:4' ].forEach( (s) => console.log(s.match(regex2) ));
Not sure what you mean by "unique".
But if you want to just get numbers then you can use + or * quantifiers.
/^(\d+):(\d+)$/
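If "unique" means a part that consists of exactly one digit, a variant that avoids lookbehind (which some older engines do not support) is to split on the colon and keep the single-digit parts. This is only a sketch under that interpretation; singleDigits is a made-up name, and unlike String.match it returns an empty array rather than null when nothing qualifies:

```javascript
// Keep only the parts around ":" that are exactly one digit long.
const singleDigits = (time) =>
  time.split(":").filter((part) => /^\d$/.test(part));

console.log(singleDigits("12:34")); // []
console.log(singleDigits("1:34"));  // ["1"]
console.log(singleDigits("12:4"));  // ["4"]
console.log(singleDigits("1:4"));   // ["1", "4"]
```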

Using Regex to pull out a part of a string

I can't figure out how to pull out multiple matches from the following example:
This code:
/prefix-(\w+)/g.exec('prefix-firstname prefix-lastname');
returns:
["prefix-firstname", "firstname"]
How do I get it to return:
[
["prefix-firstname", "firstname"],
["prefix-lastname", "lastname"]
]
Or
["prefix-firstname", "firstname", "prefix-lastname", "lastname"]
This will do what you want:
var str="prefix-firstname prefix-lastname";
var out =[];
str.replace(/prefix-(\w+)/g,function(match, Group) {
var row = [match, Group]
out.push(row);
});
This is probably a misuse of .replace, but I don't think you can pass a function to .match...
_Pez
Using a loop:
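var re = /prefix-(\w+)/g;
var str = 'prefix-firstname prefix-lastname';
var match = re.exec(str);
while (match != null) {
  // match[0] is the full match, match[1] is the captured group
  console.log(match[0], match[1]);
  match = re.exec(str);
}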
You get each match one at a time.
Using match:
Here, the regex will have to be a bit different, because you cannot get sub-captures (or I don't know how to do it with multiple matches)...
re = /[^\s-]+(?=\s|$)/g;
str = 'prefix-firstname prefix-lastname';
match = str.match(re);
alert(match);
[^\s-]+ matches a run of characters other than whitespace and hyphens, but only if the run is followed by whitespace or the end of the string, which is the condition imposed by (?=\s|$).
You can find the groups in two steps:
"prefix-firstname prefix-lastname".match(/prefix-\w+/g)
.map(function(s) { return s.match(/prefix-(\w+)/) })
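In runtimes that provide String.prototype.matchAll, both requested shapes can be produced directly. The following is a minimal sketch, assuming matchAll and Array.prototype.flat are available (the variable names rows and flat are just illustrative):

```javascript
const str = "prefix-firstname prefix-lastname";

// One [fullMatch, group] row per match.
const rows = Array.from(str.matchAll(/prefix-(\w+)/g), (m) => [m[0], m[1]]);
console.log(rows);
// [["prefix-firstname", "firstname"], ["prefix-lastname", "lastname"]]

// The same data as a single flat array.
const flat = rows.flat();
console.log(flat);
// ["prefix-firstname", "firstname", "prefix-lastname", "lastname"]
```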

RegEx to extract all matches from string using RegExp.exec

I'm trying to parse the following kind of string:
[key:"val" key2:"val2"]
where there are arbitrary key:"val" pairs inside. I want to grab the key name and the value.
For those curious I'm trying to parse the database format of task warrior.
Here is my test string:
[description:"aoeu" uuid:"123sth"]
which is meant to highlight that anything can be in a key or value aside from space, no spaces around the colons, and values are always in double quotes.
In node, this is my output:
[deuteronomy][gatlin][~]$ node
> var re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
> re.exec('[description:"aoeu" uuid:"123sth"]');
[ '[description:"aoeu" uuid:"123sth"]',
'uuid',
'123sth',
index: 0,
input: '[description:"aoeu" uuid:"123sth"]' ]
But description:"aoeu" also matches this pattern. How can I get all matches back?
Continue calling re.exec(s) in a loop to obtain all the matches:
var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
var m;
do {
m = re.exec(s);
if (m) {
console.log(m[1], m[2]);
}
} while (m);
Try it with this JSFiddle: https://jsfiddle.net/7yS2V/
str.match(pattern), if pattern has the global flag g, will return all the matches as an array.
For example:
const str = 'All of us except #Emran, #Raju and #Noman were there';
console.log(
str.match(/#\w*/g)
);
// Will log ["#Emran", "#Raju", "#Noman"]
To loop through all matches, you can use the replace function:
var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
s.replace(re, function(match, g1, g2) { console.log(g1, g2); });
This is a solution
var s = '[description:"aoeu" uuid:"123sth"]';
var re = /\s*([^[:]+):\"([^"]+)"/g;
var m;
while (m = re.exec(s)) {
console.log(m[1], m[2]);
}
This is based on lawnsea's answer, but shorter.
Notice that the g flag must be set to move the internal pointer (the regex's lastIndex property) forward across invocations.
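The effect of the g flag on the internal pointer can be observed through the lastIndex property. The sketch below records lastIndex after each exec call with the global regex, then shows that the same pattern without g keeps returning the first match (the variable names are illustrative only):

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';

// With g: each exec call resumes where the previous match ended.
const globalRe = /\s*([^[:]+):"([^"]+)"/g;
const seen = [];
let m;
while ((m = globalRe.exec(s)) !== null) {
  seen.push({ key: m[1], lastIndex: globalRe.lastIndex });
}
console.log(seen); // description ends at 19, uuid ends at 33

// Without g: lastIndex stays 0, so exec always finds the first match again.
const plainRe = /\s*([^[:]+):"([^"]+)"/;
const first = plainRe.exec(s);
const again = plainRe.exec(s);
console.log(first[1], again[1], plainRe.lastIndex); // description description 0
```

This is also why a while loop around exec with a non-global regex never terminates once a match exists.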
str.match(/regex/g)
returns all matches as an array.
If, for some mysterious reason, you need the additional information that comes with exec, then as an alternative to the previous answers you could use a recursive function instead of a loop, as follows (which also looks cooler :).
function findMatches(regex, str, matches = []) {
const res = regex.exec(str)
res && matches.push(res) && findMatches(regex, str, matches)
return matches
}
// Usage
const matches = findMatches(/regex/g, str)
As stated in the comments before, it's important to have g at the end of the regex definition to move the pointer forward in each execution.
We are finally beginning to see a built-in matchAll function, see here for the description and compatibility table. It looks like as of May 2020, Chrome, Edge, Firefox, and Node.js (12+) are supported but not IE, Safari, and Opera. Seems like it was drafted in December 2018 so give it some time to reach all browsers, but I trust it will get there.
The built-in matchAll function is nice because it returns an iterable. It also returns capturing groups for every match! So you can do things like
// get the letters before and after "o"
let matches = "stackoverflow".matchAll(/(\w)o(\w)/g);
for (const match of matches) {
  console.log("letter before:" + match[1]);
  console.log("letter after:" + match[2]);
}
// the iterator is consumed by the loop above, so call matchAll again
// if you also want to turn the matches into an array
const arrayOfAllMatches = [..."stackoverflow".matchAll(/(\w)o(\w)/g)];
It also seems that every match object uses the same format as match(). So each object is an array of the match and capturing groups, along with three additional properties: index, input, and groups. So it looks like:
[<match>, <group1>, <group2>, ..., index: <match offset>, input: <original string>, groups: <named capture groups>]
For more information about matchAll there is also a Google developers page. There are also polyfills/shims available.
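Applied to the string from the question, matchAll can be combined with named capture groups to build a plain object. This is a sketch rather than a full parser: the pattern and the names key, value, and parsed are assumptions chosen for this example. It also shows that matchAll throws a TypeError when the regex lacks the g flag:

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';
const pairRe = /(?<key>[^\s[:]+):"(?<value>[^"]+)"/g;

const parsed = {};
for (const match of s.matchAll(pairRe)) {
  // match.groups holds the named captures; match.index is the match offset
  parsed[match.groups.key] = match.groups.value;
}
console.log(parsed); // { description: "aoeu", uuid: "123sth" }

// matchAll requires a global regex; a non-global one is rejected.
let threw = false;
try {
  s.matchAll(/x/);
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true
```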
If you have ES2020
(Meaning your runtime (Chrome, Node.js, Firefox, etc.) supports ECMAScript 2020 or later, the edition that standardized String.prototype.matchAll.)
Use the new yourString.matchAll( /your-regex/g ).
If you don't have ES2020
If you have an older system, here's a function for easy copy and pasting
function findAll(regexPattern, sourceString) {
let output = []
let match
// auto-add global flag while keeping others as-is
let regexPatternWithGlobal = RegExp(regexPattern,[...new Set("g"+regexPattern.flags)].join(""))
while (match = regexPatternWithGlobal.exec(sourceString)) {
// get rid of the string copy
delete match.input
// store the match data
output.push(match)
}
return output
}
example usage:
console.log( findAll(/blah/g,'blah1 blah2') )
outputs:
[ [ 'blah', index: 0 ], [ 'blah', index: 6 ] ]
Based on Agus's function, but I prefer to return just the match values:
var bob = "&gt; bob &lt;";
function matchAll(str, regex) {
  var res = [];
  var m;
  if (regex.global) {
    while (m = regex.exec(str)) {
      res.push(m[1]);
    }
  } else {
    if (m = regex.exec(str)) {
      res.push(m[1]);
    }
  }
  return res;
}
var Amatch = matchAll(bob, /(&.*?;)/g);
console.log(Amatch); // yields: ["&gt;", "&lt;"]
Iterables are nicer:
const matches = (text, pattern) => ({
[Symbol.iterator]: function * () {
const clone = new RegExp(pattern.source, pattern.flags);
let match = null;
do {
match = clone.exec(text);
if (match) {
yield match;
}
} while (match);
}
});
Usage in a loop:
for (const match of matches('abcdefabcdef', /ab/g)) {
console.log(match);
}
Or if you want an array:
[ ...matches('abcdefabcdef', /ab/g) ]
Here is my function to get the matches :
function getAllMatches(regex, text) {
if (regex.constructor !== RegExp) {
throw new Error('not RegExp');
}
var res = [];
var match = null;
if (regex.global) {
while (match = regex.exec(text)) {
res.push(match);
}
}
else {
if (match = regex.exec(text)) {
res.push(match);
}
}
return res;
}
// Example:
var regex = /abc|def|ghi/g;
var res = getAllMatches(regex, 'abcdefghi');
res.forEach(function (item) {
console.log(item[0]);
});
If you're able to use matchAll, here's a trick:
Array.from has a 'selector' (map function) parameter, so instead of ending up with an array of awkward 'match' results you can project it to what you really need:
Array.from(str.matchAll(regexp), m => m[0]);
If you have named groups, e.g. (/(?<firstname>[a-z][A-Z]+)/g), you could do this (group names are case-sensitive, so the property must be spelled exactly like the group):
Array.from(str.matchAll(regexp), m => m.groups.firstname);
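A runnable sketch of this projection follows. The sample string and the first:/last: layout are invented for illustration; only Array.from, matchAll, and named groups come from the answer:

```javascript
// Made-up sample data: "first:" and "last:" labels are assumptions.
const people = "first:Ada last:Lovelace first:Alan last:Turing";
const firstNameRe = /first:(?<firstname>[A-Za-z]+)/g;

// Project each match straight to the named group.
const firstNames = Array.from(people.matchAll(firstNameRe), (m) => m.groups.firstname);
console.log(firstNames); // ["Ada", "Alan"]

// Project to the full match text instead.
const fullMatches = Array.from(people.matchAll(firstNameRe), (m) => m[0]);
console.log(fullMatches); // ["first:Ada", "first:Alan"]
```

The map function runs once per match, so no intermediate array of match objects is kept.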
Since ES2020, there's a simpler, better way of getting all the matches, together with information about the capture groups and their index:
const string = 'Mice like to dice rice';
const regex = /.ice/gu;
for(const match of string.matchAll(regex)) {
  console.log(match);
}
// ["Mice", index: 0, input: "Mice like to dice rice", groups: undefined]
// ["dice", index: 13, input: "Mice like to dice rice", groups: undefined]
// ["rice", index: 18, input: "Mice like to dice rice", groups: undefined]
It is currently supported in Chrome, Firefox, Opera. Depending on when you read this, check this link to see its current support.
Use this...
var all_matches = your_string.match(re);
console.log(all_matches)
It will return an array of all matches, provided re has the g flag.
But remember that it won't take groups into account; it will just return the full matches.
I would definitely recommend using the String.match() function and creating a relevant RegEx for it. My example uses a list of strings, which is often necessary when scanning user input for keywords and phrases.
// 1) Define keywords
var keywords = ['apple', 'orange', 'banana'];
// 2) Create regex, pass "i" for case-insensitive and "g" for global search
var regex = new RegExp("(" + keywords.join('|') + ")", "ig");
// => /(apple|orange|banana)/gi
// 3) Match it against any string to get all matches
"Test string for ORANGE's or apples were mentioned".match(regex);
// => ["ORANGE", "apple"]
Hope this helps!
This isn't really going to help with your more complex issue but I'm posting this anyway because it is a simple solution for people that aren't doing a global search like you are.
I've simplified the regex in the answer to be clearer (this is not a solution to your exact problem).
var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);
// We only want the group matches in the array
function purify_regex(reResult){
// Removes the Regex specific values and clones the array to prevent mutation
let purifiedArray = [...reResult];
// Removes the full match value at position 0
purifiedArray.shift();
// Returns a pure array without mutating the original regex result
return purifiedArray;
}
// purifiedResult= ["description", "aoeu"]
That looks more verbose than it is because of the comments. This is what it looks like without them:
var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);
function purify_regex(reResult){
let purifiedArray = [...reResult];
purifiedArray.shift();
return purifiedArray;
}
Note that any groups that do not match will be listed in the array as undefined values.
This solution uses the ES6 spread operator to purify the array of regex specific values. You will need to run your code through Babel if you want IE11 support.
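A shorter variant of the same idea, sketched here with slice(1) instead of spread plus shift, also handles the case where exec returns null. The name captureGroups is hypothetical:

```javascript
// Return only the capture groups of the first match, or [] when nothing matches.
const captureGroups = (re, text) => {
  const result = re.exec(text);
  return result === null ? [] : result.slice(1); // slice copies, so the result is untouched
};

const [key, value] = captureGroups(/^(.+?):"(.+)"$/, 'description:"aoeu"');
console.log(key, value); // description aoeu

// Groups that did not participate in the match come back as undefined.
console.log(captureGroups(/(a)|(b)/, "b")); // [undefined, "b"]
console.log(captureGroups(/^x$/, "y"));     // []
```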
Here's a one line solution without a while loop.
The order is preserved in the resulting list.
The potential downsides are
It clones the regex for every match.
The result is in a different form than expected solutions. You'll need to process them one more time.
let re = /\s*([^[:]+):\"([^"]+)"/g;
let str = '[description:"aoeu" uuid:"123sth"]';
(str.match(re) || []).map(e => RegExp(re.source, re.flags).exec(e));
[ [ 'description:"aoeu"',
'description',
'aoeu',
index: 0,
input: 'description:"aoeu"',
groups: undefined ],
[ ' uuid:"123sth"',
'uuid',
'123sth',
index: 0,
input: ' uuid:"123sth"',
groups: undefined ] ]
My guess is that if there are edge cases such as extra or missing spaces, this expression with fewer boundaries might also be an option:
^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$
If you wish to explore/simplify/modify the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
Test
const regex = /^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$/gm;
const str = `[description:"aoeu" uuid:"123sth"]
[description : "aoeu" uuid: "123sth"]
[ description : "aoeu" uuid: "123sth" ]
[ description : "aoeu" uuid : "123sth" ]
[ description : "aoeu"uuid : "123sth" ] `;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
RegEx Circuit
jex.im visualizes regular expressions:
const re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
const matches = [...re.exec('[description:"aoeu" uuid:"123sth"]').entries()]
console.log(matches)
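Note that exec returns an array (with extra index and input properties), not an iterator; it is entries() that returns an iterator of [index, value] pairs, which the ES6 spread syntax then converts to a regular array. Because the capturing groups sit inside a repeated group, only the values from the last repetition (uuid and 123sth) are captured.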
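Since a repeated group keeps only its last iteration, one possible workaround is to split the job in two: validate the overall shape with an anchored regex, then pull out the pairs with a separate global regex. The sketch below assumes keys contain no whitespace, colons, quotes, or brackets, and that values contain no double quotes; wholeList, onePair, and parsePairs are made-up names:

```javascript
// Step 1: does the whole string look like [key:"val" key2:"val2" ...]?
const wholeList = /^\[(?:[^\s:"[\]]+:"[^"]*"\s*)+\]$/;
// Step 2: extract each key/value pair individually.
const onePair = /([^\s:"[\]]+):"([^"]*)"/g;

const parsePairs = (text) => {
  if (!wholeList.test(text)) return null; // reject malformed input
  return Array.from(text.matchAll(onePair), (m) => [m[1], m[2]]);
};

console.log(parsePairs('[description:"aoeu" uuid:"123sth"]'));
// [["description", "aoeu"], ["uuid", "123sth"]]
console.log(parsePairs("not a list")); // null
```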
Here is my answer:
var str = '[me nombre es] : My name is. [Yo puedo] is the right word';
var reg = /\[(.*?)\]/g;
var a = str.match(reg);
a = a.toString().replace(/[\[\]]/g, "").split(',');
