Regular Expression to find complex markers - javascript

I want to use a JavaScript regular expression, something like this:
/marker\d+"?(\w+)"?\s/gi
In a string like this:
IDoHaveMarker1"apple" IDoAlsoHaveAMarker352pear LastPointMakingmarker3134"foo"
And I want it to return an array like this:
[ "apple", "pear", "foo" ]
The quotes are to make clear they are strings. They shouldn't be in the result.

If you are asking about how to actually use the regex:
To get all captures of multiple (global) matches you have to use a loop and exec in JavaScript:
var regex = /marker\d+"?(\w+)/gi;
var result = [];
var match;
while (match = regex.exec(input)) {
result.push(match[1]);
}
(Note that you can omit the trailing "?\s? if you are only interested in the capture, since they are optional anyway, so they don't affect the matched result.)
And no, g alone will not allow you to do all of that in one call. If you had omitted g, exec would return the same first match every time, so the loop would never terminate.
As Blender mentioned, if you want to rule out things like Marker13"something Marker14bar (unmatched ") you need to use another capturing group and a backreference. Note that this will push your desired capture to index 2:
var regex = /marker\d+("?)(\w+)\1/gi;
var result = [];
var match;
while (match = regex.exec(input)) {
result.push(match[2]);
}
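For reference, here is a minimal sketch of the backreference version run against the sample string from the question. It assumes an engine that supports String.prototype.matchAll (ES2020), which is not something the original answer relied on; the exec loop above gives the same result in older engines.

```javascript
// Sample input copied from the question.
var input = 'IDoHaveMarker1"apple" IDoAlsoHaveAMarker352pear LastPointMakingmarker3134"foo"';

// Group 1 is the optional opening quote, group 2 is the word we want,
// and \1 requires the same quote (or nothing) to follow the word.
var regex = /marker\d+("?)(\w+)\1/gi;

// matchAll needs the g flag and yields one match array per occurrence.
var result = Array.from(input.matchAll(regex), function (m) {
  return m[2];
});

console.log(result); // [ 'apple', 'pear', 'foo' ]
```

matchAll works on a copy of the regex, so it does not leave a modified lastIndex behind on regex.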

Related

Regex extracting multiple matches for string [duplicate]

I'm trying to obtain all possible matches from a string using regex with javascript. It appears that my method of doing this is not matching parts of the string that have already been matched.
Variables:
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
Code:
var match = string.match(reg);
All matched results I get:
A1B1Y:A1B2Y
A1B5Y:A1B6Y
A1B9Y:A1B10Y
Matched results I want:
A1B1Y:A1B2Y
A1B2Y:A1B3Y
A1B5Y:A1B6Y
A1B6Y:A1B7Y
A1B9Y:A1B10Y
A1B10Y:A1B11Y
In my head, I want A1B1Y:A1B2Y to be a match along with A1B2Y:A1B3Y, even though A1B2Y in the string will need to be part of two matches.
Without modifying your regex, you can set it to start matching at the beginning of the second half of the match after each match using .exec and manipulating the regex object's lastIndex property.
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var matches = [], found;
while (found = reg.exec(string)) {
matches.push(found[0]);
reg.lastIndex -= found[0].split(':')[1].length;
}
console.log(matches);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
As per Bergi's comment, you can also take the index of the last match and increment it by 1, so that instead of resuming from the second half of the match, the regex attempts to match again from the second character of each match onwards:
reg.lastIndex = found.index+1;
The final outcome is the same. Though, Bergi's update has a little less code and performs slightly faster. =]
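The lastIndex approach can be wrapped in a small reusable function. This is only a sketch: the name findOverlapping is made up for illustration, and it assumes the regex passed in has the g flag.

```javascript
// Hypothetical helper: collects every match of a global regex,
// including matches that overlap an earlier one.
function findOverlapping(re, text) {
  if (!re.global) {
    throw new Error('findOverlapping expects a regex with the g flag');
  }
  var out = [];
  var found;
  re.lastIndex = 0; // start from the beginning regardless of earlier use
  while ((found = re.exec(text)) !== null) {
    out.push(found[0]);
    // Resume one character after the START of this match, not after its end.
    re.lastIndex = found.index + 1;
  }
  return out;
}

var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var overlapping = findOverlapping(/A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g, string);
console.log(overlapping);
```

Because lastIndex always moves at least one character past the start of the previous match, the loop also terminates for patterns that can match an empty string.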
You cannot get the direct result from match, but it is possible to produce the result via RegExp.exec and with some modification to the regex:
var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var arr;
var results = [];
while ((arr = regex.exec(input)) !== null) {
results.push(arr[0] + arr[1]);
}
I used zero-width positive look-ahead (?=pattern) in order not to consume the text, so that the overlapping portion can be rematched.
Actually, it is possible to abuse the replace method to achieve the same result:
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var results = [];
input.replace(/A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g, function ($0, $1) {
results.push($0 + $1);
return '';
});
However, since it is replace, it does extra useless replacement work.
Unfortunately, it's not quite as simple as a single string.match.
The reason is that you want overlapping matches, which the /g flag doesn't give you.
You could use lookahead:
var re = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;
But now you get:
string.match(re); // ["A1B1Y", "A1B2Y", "A1B5Y", "A1B6Y", "A1B9Y", "A1B10Y"]
The reason is that lookahead is zero-width, meaning that it just says whether the pattern comes after what you're trying to match or not; it doesn't include it in the match.
You could use exec to try and grab what you want. If a regex has the /g flag, you can run exec repeatedly to get all the matches:
// using re from above to get the overlapping matches
var m;
var matches = [];
var re2 = /A\d+B\d+Y:A\d+B\d+Y/g; // make another regex to get what we need
while ((m = re.exec(string)) !== null) {
// m is a match object, which has the index of the current match
matches.push(string.substring(m.index).match(re2)[0]);
}
matches == [
"A1B1Y:A1B2Y",
"A1B2Y:A1B3Y",
"A1B5Y:A1B6Y",
"A1B6Y:A1B7Y",
"A1B9Y:A1B10Y",
"A1B10Y:A1B11Y"
];
Alternatively, you could split the original string on :, then loop through the resulting array, pulling out the pairs where array[i] and array[i+1] both match the pattern you want.
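A rough sketch of that split-based alternative follows. The anchored pattern for a single piece is my own assumption about what "match like you want" means here; note that anchoring each piece is slightly stricter than the unanchored regex used in the other answers.

```javascript
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';

// Assumed pattern for one half of a pair. It has no g flag,
// so test() keeps no state between calls.
var piece = /^A\d+B\d+Y$/;

var pieces = string.split(':');
var pairs = [];
for (var i = 0; i + 1 < pieces.length; i++) {
  // Keep the pair only when both neighbours match.
  if (piece.test(pieces[i]) && piece.test(pieces[i + 1])) {
    pairs.push(pieces[i] + ':' + pieces[i + 1]);
  }
}

console.log(pairs);
```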

Javascript Regex to get text between certain characters

I need a regex in Javascript that would allow me to match an order number in two different formats of order URL:
The URLs:
http://store.apple.com/vieworder/1003123464/test#test.com
http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A=M-104121
The first one will always be all numbers, and the second one will always start with a W, followed by just numbers.
I need to be able to use a single regex to return these matches:
1003123464
W411234368
This is what I've tried so far:
/(vieworder\/)(.*?)(?=\/)/g
That allows me to match:
vieworder/1003123464
vieworder/W411234368
but I'd like it to not include the first capture group.
I know I could then run the result through a string.replace('vieworder/', ''), but it'd be cool to be able to do this in just one command.
Use your expression without grouping vieworder
vieworder\/(.*?)(?=\/)
var string = 'http://store.apple.com/vieworder/1003123464/test#test.com http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A=M-104121';
var myRegEx = /vieworder\/(.*?)(?=\/)/g;
var index = 1;
var matches = [];
var match;
while (match = myRegEx.exec(string)) {
matches.push(match[index]);
}
console.log(matches);
Use replace instead of match if you cannot rely on lookbehind support (JavaScript only gained lookbehind assertions in ES2018). You could also use capturing groups and the exec method to get the characters inside a particular group.
> var s1 = 'http://store.apple.com/vieworder/1003123464/test#test.com'
undefined
> var s2 = 'http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A='
undefined
> s1.replace(/^.*?vieworder\/|\/.*/g, '')
'1003123464'
> s2.replace(/^.*?vieworder\/|\/.*/g, '')
'W411234368'
OR
> s1.replace(/^.*?\bvieworder\/([^\/]*)\/.*/g, '$1')
'1003123464'
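Where lookbehind assertions are available (ES2018 and later), a single match call may be enough. The following is a sketch under that assumption; the variable names are illustrative, and the W?\d+ part encodes the question's statement that an order number is digits with an optional leading W.

```javascript
var urls = 'http://store.apple.com/vieworder/1003123464/test#test.com ' +
           'http://store.apple.com/vieworder/W411234368/test#test.com/AOS-A=M-104121';

// (?<=vieworder\/) checks what precedes the match without consuming it,
// and (?=\/) checks that a slash follows, so only the order number is returned.
var orderNumbers = urls.match(/(?<=vieworder\/)W?\d+(?=\/)/g);

console.log(orderNumbers); // [ '1003123464', 'W411234368' ]
```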
I'd suggest
W?\d+
That ought to translate to "one or zero W and one or more digits".

How to extract two strings from url using regex?

I've matched a string successfully, but I need to split it and add some new segments to the URL. If it is possible with a regex, how can I match the URL and extract two strings like in the example below?
Current result:
["domain.com/collection/430000000000000"]
Desired result:
["domain.com/collection/", "430000000000000"]
Current code:
var reg = new RegExp('domain.com\/collection\/[0-9]+');
var str = 'http://localhost:3000/#/domain.com/collection/430000000000000?page=0&layout=grid';
console.log(str.match(reg));
You want Regex Capture Groups.
Put the parts you want to extract into parentheses like this, each part forming a matching group:
new RegExp('(domain.com\/collection\/)([0-9]+)')
Then after matching, you can extract each group content by index, with index 0 being the whole string match, 1 the first group, 2 the second etc. (thanks for the addendum, jcubic!).
This is done by calling exec() on the regex, for example:
/\d(\d)\d/.exec("123");
// → ["123", "2"]
First comes the whole match, then the group matches in the sequence they appear in the pattern.
You can declare an array and then fill it with the required values that you can capture with parentheses (thus, making use of capturing groups):
var reg = /(domain.com\/collection)\/([0-9]+)/g;
// each pair of parentheses above forms one capturing group
var str = 'http://localhost:3000/#/domain.com/collection/430000000000000?page=0&layout=grid';
var arr = [], m; // declare m so the loop below does not create an implicit global
while ((m = reg.exec(str)) !== null) {
arr.push(m[1]);
arr.push(m[2]);
}
console.log(arr);
Output: ["domain.com/collection", "430000000000000"]
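To get exactly the two strings from the question, including the trailing slash on the first one, the groups can be arranged as in this sketch. It uses match without the g flag, so the returned array contains the groups; escaping the dot in domain.com is my addition and not part of the original code.

```javascript
var str = 'http://localhost:3000/#/domain.com/collection/430000000000000?page=0&layout=grid';

// Group 1 keeps the trailing slash, group 2 is the numeric id.
var m = str.match(/(domain\.com\/collection\/)([0-9]+)/);

// m[0] is the whole match, so drop it and keep only the groups.
var parts = m ? m.slice(1) : [];

console.log(parts); // [ 'domain.com/collection/', '430000000000000' ]
```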

Match repeating pattern using single group in regex

Using the JS regex object, is it possible to match a repeating pattern using a single, or few groups?
For instance, taking the following as input:
#222<abc#321cba#123>#111
Would it be possible to match both "#321" and "#123" (but not #222 or #111, as they're outside the /<.*?>/ group) using a regex similar to:
/<.*?(?:(#[0-9]+).*?)>/
(which currently only matches the first instance), when the number of matches in the input is unknown?
You'd have to loop over the inner pattern.
First use /<(.*?)>/ to extract it:
var outerRegex = /<(.*?)>/;
var match = outerRegex.exec(input);
var innerPattern = match[1];
Next, iterate over the result:
var innerRegex = /#\d+/g;
while (match = innerRegex.exec(innerPattern))
{
var result = match[0];
...
}
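Put together with the input from the question, the two steps might look like the sketch below. It assumes there is only one <...> section in the input, as in the example, and uses match with the g flag for the inner step instead of an exec loop.

```javascript
var input = '#222<abc#321cba#123>#111';

// Step 1: pull out the text between < and >.
var outer = /<(.*?)>/.exec(input);

// Step 2: collect every #number inside that text.
// match with the g flag returns null when nothing is found, hence the || [].
var inner = outer ? (outer[1].match(/#\d+/g) || []) : [];

console.log(inner); // [ '#321', '#123' ]
```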

How to extract a string using JavaScript Regex?

I'm trying to extract a substring from a file with JavaScript Regex. Here is a slice from the file :
DATE:20091201T220000
SUMMARY:Dad's birthday
The field I want to extract is SUMMARY. Here is my approach:
extractSummary : function(iCalContent) {
/*
input : iCal file content
return : Event summary
*/
var arr = iCalContent.match(/^SUMMARY\:(.)*$/g);
return(arr);
}
function extractSummary(iCalContent) {
var rx = /\nSUMMARY:(.*)\n/g;
var arr = rx.exec(iCalContent);
return arr[1];
}
You need these changes:
Put the * inside the parentheses, as the other answers suggest. Otherwise your matching group will contain only one character.
Get rid of the ^ and $. Without the m (multiline) flag they match only at the start and end of the full string, rather than at the start and end of each line. Match on explicit newlines instead.
I suppose you want the matching group (what's inside the parentheses) rather than the full array? arr[0] is the full match ("\nSUMMARY:...") and the following indexes contain the group matches.
String.match(regexp) with the g flag returns an array of the full matches only, not the groups (that is what I see in Safari on Mac), but RegExp.exec(string) returns the groups as well.
You need to use the m flag:
multiline; treat beginning and end characters (^ and $) as working
over multiple lines (i.e., match the beginning or end of each line
(delimited by \n or \r), not only the very beginning or end of the
whole input string)
Also put the * in the right place:
"DATE:20091201T220000\r\nSUMMARY:Dad's birthday".match(/^SUMMARY\:(.*)$/gm);
// Two changes: the * now sits inside the group, and the m flag has been added.
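Combining the points above (the * inside the group, the m flag, and exec to reach the group), one possible version of the function is sketched here. Returning null when there is no SUMMARY line is my own choice, not something the question specifies.

```javascript
function extractSummary(iCalContent) {
  // m flag: ^ and $ match at line boundaries. No g flag is needed
  // because only the first SUMMARY line is wanted.
  var m = /^SUMMARY:(.*)$/m.exec(iCalContent);
  // m[1] is the captured text after "SUMMARY:"; null signals "not found".
  return m ? m[1] : null;
}

var sample = "DATE:20091201T220000\r\nSUMMARY:Dad's birthday\r\n";
console.log(extractSummary(sample)); // Dad's birthday
```

Since . does not match \r or \n in JavaScript, the captured text stops before the line ending for both \n and \r\n files.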
Your regular expression most likely wants to be
/\nSUMMARY:(.*)$/g
A helpful little trick I like to use is to default assign on match with an array.
var arr = iCalContent.match(/\nSUMMARY:(.*)$/g) || [""]; //could also use null for empty value
return arr[0];
This way you don't get annoying type errors when you go to use arr
This code works:
let str = "governance[string_i_want]";
let res = str.match(/[^governance\[](.*)[^\]]/g);
console.log(res);
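res will be ["string_i_want"]. It is still an array, so do not treat res like a string.
Be aware that [^governance\[] is a negated character class: it matches one single character that is not one of those letters or [, rather than the word "governance". The pattern works for this input only because "string_i_want" starts with a character outside that set, so a capturing group between literal brackets, such as /\[(.*?)\]/, is more robust.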
You can try it out here: https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_match_regexp
Good luck.
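As a sketch of a more robust way to take the text between the brackets, a capturing group can be used instead of negated character classes. The second input, governance[green], is a made-up example that shows where the character-class pattern above stops working.

```javascript
// Returns the text between the first [ and the next ], or null if there is none.
function betweenBrackets(s) {
  var m = /\[([^\]]*)\]/.exec(s);
  return m ? m[1] : null;
}

console.log(betweenBrackets('governance[string_i_want]')); // string_i_want
console.log(betweenBrackets('governance[green]'));         // green

// The character-class pattern finds nothing here, because every letter
// of "green" is also in the excluded set.
var classBased = 'governance[green]'.match(/[^governance\[](.*)[^\]]/g);
console.log(classBased); // null
```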
(.*) instead of (.)* would be a start. The latter will only capture the last character on the line.
Also, no need to escape the :.
You should use this; the m flag makes ^ and $ match at line boundaries, and arr[0] is the whole matched line:
var arr = iCalContent.match(/^SUMMARY\:(.)*$/gm);
return(arr[0]);
This is how you can parse iCal files with JavaScript:
function calParse(str) {
function parse() {
var obj = {};
while(str.length) {
var p = str.shift().split(":");
var k = p.shift();
p = p.join(":"); // rejoin with ":" so values that contain colons stay intact
switch(k) {
case "BEGIN":
obj[p] = parse();
break;
case "END":
return obj;
default:
obj[k] = p;
}
}
return obj;
}
str = str.replace(/\n /g, " ").split("\n");
return parse().VCALENDAR;
}
var example =
'BEGIN:VCALENDAR\n'+
'VERSION:2.0\n'+
'PRODID:-//hacksw/handcal//NONSGML v1.0//EN\n'+
'BEGIN:VEVENT\n'+
'DTSTART:19970714T170000Z\n'+
'DTEND:19970715T035959Z\n'+
'SUMMARY:Bastille Day Party\n'+
'END:VEVENT\n'+
'END:VCALENDAR\n';
var cal = calParse(example);
alert(cal.VEVENT.SUMMARY);
