JS: matching entries with capture groups, accounting for new lines

Given this text:
1/12/2011
I did something.
10/5/2013
I did something else.
Here is another line.
And another.
5/17/2014
Lalala.
More text on another line.
I would like to use regex (or maybe some other means?) to get this:
["1/12/2011", "I did something.", "10/5/2013", "I did something else.\n\nHere is another line.\n\nAnd another.", "5/17/2014", "Lalala.\nMore text on another line."]
The date part and content part are each separate entries, alternating.
I've tried using [^] instead of the dot, since JS's .* does not match newlines (as "Matching multiline Patterns" says), but then the match is greedy and consumes too much, so the resulting array has only one entry:
var split_pattern = /\b(\d\d?\/\d\d?\/\d\d\d\d)\n([^]+)/gm;
var array_of_mems = contents.match(split_pattern);
// => ["1/12/2011↵I did something else..."]
If I add a question mark to get [^]+?, which according to "How to make Regular expression into non-greedy?" makes the match non-greedy, then I get only the first character of the content part.
What's the best method? Thanks in advance.

(\d{1,2}\/\d{1,2}\/\d{4})\n|((?:(?!\n*\d{1,2}\/\d{1,2}\/\d{4})[\s\S])+)
You can try this and grab the captures. See the demo:
https://regex101.com/r/sJ9gM7/126
var re = /(\d{1,2}\/\d{1,2}\/\d{4})\n|((?:(?!\n*\d{1,2}\/\d{1,2}\/\d{4})[\s\S])+)/gim;
var str = '1/12/2011\nI did something.\n\n10/5/2013\nI did something else.\n\nHere is another line.\n\nAnd another.\n\n5/17/2014\nLalala.\nMore text on another line.';
var m;
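// Loop with while (not if) so that every match is visited, not just the first.
while ((m = re.exec(str)) !== null) {
    // Guard against zero-length matches, which would otherwise loop forever.
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    // View your result using the m-variable:
    // m[1] holds a date, m[2] holds a content block; only one is set per match.
}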

You can use the exec() method in a loop to get your desired results.
var re = /^([\d/]+)\s*((?:(?!\s*^[\d/]+)[\S\s])+)/gm,
    matches = [],
    m;
// Each match contributes a date (group 1) and its content block (group 2).
while ((m = re.exec(str)) !== null) {
    matches.push(m[1]);
    matches.push(m[2]);
}
Output
[ '1/12/2011',
'I did something.',
'10/5/2013',
'I did something else.\n\nHere is another line.\n\nAnd another.',
'5/17/2014',
'Lalala.\nMore text on another line.' ]
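Since the question also asks about "some other means", here is a minimal sketch that uses split() with a capturing group instead of an exec() loop. The helper name splitEntries is made up for illustration, and the sketch assumes that every date sits alone on its own line and that trimming whitespace around each content block is acceptable.

```javascript
// Hypothetical helper: split a journal-style text into alternating
// date / content entries. Assumes each date occupies a line by itself.
function splitEntries(text) {
    // With a capturing group in the separator, split() keeps the captured
    // dates in the output array, between the pieces of content.
    var parts = text.split(/^(\d{1,2}\/\d{1,2}\/\d{4})$/m);
    var entries = [];
    for (var i = 0; i < parts.length; i++) {
        // Drop the newlines left around each piece and skip empty pieces
        // (such as the empty string before the first date).
        var piece = parts[i].trim();
        if (piece !== '') {
            entries.push(piece);
        }
    }
    return entries;
}

var sample = '1/12/2011\nI did something.\n\n10/5/2013\nI did something else.\n\nHere is another line.\n\nAnd another.\n\n5/17/2014\nLalala.\nMore text on another line.';
console.log(splitEntries(sample));
```

The trade-off is that the pattern no longer has to describe the content at all, only the separator, which avoids the greedy/non-greedy problem from the question.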

Related

Find a string surrounded by square brackets and not prefaced with a specific character

I would like to have a match with
[testing]
but not
![testing]
This is my query to grab a string surrounded by square brackets:
\[([^\]]+)\]
var match = /^[^!]*\[([^\]]+)\]/.exec(issueBody);
if (match)
{
$ISSUE_BODY.selectRange(match.index, match.index+match[0].length);
}
and it works marvelously.
However, I have spent a good half hour on http://regexr.com/ trying to skip strings with a "!" in front, and couldn't.
EDIT: I'm sorry guys, I didn't realize that some operations are not supported by every regex engine. I am writing in JavaScript, where lookbehind is apparently not supported; I get this error:
Uncaught SyntaxError: Invalid regular expression:
/(?
Sorry for wasting time :\

You can use alternation:
(?:^|[^!])(\[[^\]]+\])
Here (?:^|[^!]) will match the start of a line (because of the m flag) OR any character that is NOT !
Code:
var re = /(?:^|[^!])(\[[^\]]+\])/gm;
var str = '![foobar123]\n[xyz789]';
var m; // declare m so it does not leak into the global scope
while ((m = re.exec(str)) !== null) {
    console.log(m[1]);
}
Output:
[xyz789]

In JavaScript engines where lookbehinds are not supported (they were only added in ES2018), you can use:
^[^!]*\[([^\]]+)\]
(with the multiline flag to match every start of a line)
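For engines that do support lookbehind (ES2018 or later), a negative lookbehind expresses the condition directly. The following is only a sketch under that assumption; the helper name findUnbangedBrackets and the sample string are made up for illustration.

```javascript
// Assumes an engine with lookbehind support (ES2018 or later).
// Hypothetical helper: collect the text inside every [...] that is not
// immediately preceded by "!".
function findUnbangedBrackets(text) {
    // (?<!!) is a negative lookbehind: it checks the previous character
    // without consuming it, so adjacent groups like "[a][b]" both match.
    var re = /(?<!!)\[([^\]]+)\]/g;
    var found = [];
    var m;
    while ((m = re.exec(text)) !== null) {
        found.push(m[1]);
    }
    return found;
}

console.log(findUnbangedBrackets('![skip] [keep] [a][b]'));
```

Because the lookbehind is zero-width, this variant does not swallow the character in front of the bracket, which the (?:^|[^!]) alternation does.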

You can just use capturing:
var re = /(?:^|[^!])(\[[^[\]]*])/g;
var str = '[goodtesting] ![badtesting] ';
var m;
while ((m = re.exec(str)) !== null) {
document.getElementById("r").innerHTML += m[1] + "<br/>";
}
<div id="r"/>
The (?:^|[^!])(\[[^[\]]*]) regex matches the start of string or any character other than a ! (with a non-capturing group (?:^|[^!])) and matches and captures the substring enclosed with [ and ] that has no [ and ] inside (with (\[[^[\]]*])). When we need to get multiple matches, we need to use RegExp#exec() and access the captured groups using the indices (here, index 1).
Also, in JS, when you only need a lookbehind (that is, you do not need to check what comes after the match), you can use a reversed-string technique: reverse the string and use a lookahead instead:
function revStr(s) {
return s.split('').reverse().join('');
}
var re = /][^[\]]*\[(?!!)/g; // Here, the regex pattern is reverse, too
var str = '![badtesting] [goodtesting]';
var m;
while ((m = re.exec(revStr(str))) !== null) { // We reverse a string here
document.getElementById("res").innerHTML += revStr(m[0]); // and the matched value here
}
<div id="res"/>
This is not practical with longer patterns, but this one seems simple enough to go for it.

Regex extracting multiple matches for string [duplicate]

I'm trying to obtain all possible matches from a string using regex with javascript. It appears that my method of doing this is not matching parts of the string that have already been matched.
Variables:
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
Code:
var match = string.match(reg);
All matched results I get:
A1B1Y:A1B2Y
A1B5Y:A1B6Y
A1B9Y:A1B10Y
Matched results I want:
A1B1Y:A1B2Y
A1B2Y:A1B3Y
A1B5Y:A1B6Y
A1B6Y:A1B7Y
A1B9Y:A1B10Y
A1B10Y:A1B11Y
In my head, I want A1B1Y:A1B2Y to be a match along with A1B2Y:A1B3Y, even though A1B2Y in the string will need to be part of two matches.

Without modifying your regex, you can make it resume matching at the beginning of the second half of each match, by using .exec and manipulating the regex object's lastIndex property.
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var matches = [], found;
while (found = reg.exec(string)) {
matches.push(found[0]);
reg.lastIndex -= found[0].split(':')[1].length;
}
console.log(matches);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
As per Bergi's comment, you can also take the index of the last match and increment it by 1, so that instead of resuming from the second half of the match, it attempts the next match from the second character of each match onwards:
reg.lastIndex = found.index+1;
The final outcome is the same. Though, Bergi's update has a little less code and performs slightly faster. =]
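For completeness, the index + 1 variant can be wrapped up as a small function. This is a sketch, not the original demo; the name overlappingMatches is made up for illustration, and the regex passed in must carry the g flag, because exec ignores lastIndex otherwise.

```javascript
// Hypothetical helper: collect overlapping matches by restarting the search
// one character after the start of each match.
function overlappingMatches(re, text) {
    if (!re.global) {
        // Without the g flag, exec() ignores lastIndex and the loop never ends.
        throw new Error('overlappingMatches needs a regex with the g flag');
    }
    var matches = [];
    var found;
    re.lastIndex = 0; // start from the beginning even if the regex was used before
    while ((found = re.exec(text)) !== null) {
        matches.push(found[0]);
        // Rewind: the next attempt begins just past this match's first character.
        re.lastIndex = found.index + 1;
    }
    return matches;
}

var overlapping = overlappingMatches(
    /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g,
    'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
);
console.log(overlapping);
```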

You cannot get the direct result from match, but it is possible to produce the result via RegExp.exec and with some modification to the regex:
var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var arr;
var results = [];
while ((arr = regex.exec(input)) !== null) {
results.push(arr[0] + arr[1]);
}
I used zero-width positive look-ahead (?=pattern) in order not to consume the text, so that the overlapping portion can be rematched.
Actually, it is possible to abuse the replace method to achieve the same result:
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var results = [];
input.replace(/A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g, function ($0, $1) {
results.push($0 + $1);
return '';
});
However, since it is replace, it does extra useless replacement work.

Unfortunately, it's not quite as simple as a single string.match.
The reason is that you want overlapping matches, which the /g flag doesn't give you.
You could use lookahead:
var re = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;
But now you get:
string.match(re); // ["A1B1Y", "A1B2Y", "A1B5Y", "A1B6Y", "A1B9Y", "A1B10Y"]
The reason is that lookahead is zero-width, meaning that it just says whether the pattern comes after what you're trying to match or not; it doesn't include it in the match.
You could use exec to try and grab what you want. If a regex has the /g flag, you can run exec repeatedly to get all the matches:
// using re from above to get the overlapping matches
var m;
var matches = [];
var re2 = /A\d+B\d+Y:A\d+B\d+Y/g; // make another regex to get what we need
while ((m = re.exec(string)) !== null) {
// m is a match object, which has the index of the current match
matches.push(string.substring(m.index).match(re2)[0]);
}
matches == [
"A1B1Y:A1B2Y",
"A1B2Y:A1B3Y",
"A1B5Y:A1B6Y",
"A1B6Y:A1B7Y",
"A1B9Y:A1B10Y",
"A1B10Y:A1B11Y"
];
Alternatively, you could split the original string on :, then loop through the resulting array, pulling out the pairs where array[i] and array[i+1] both match the pattern you want.
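The split-based alternative could look roughly like this. It is a sketch only: the function name adjacentPairs is made up, and it assumes each item must look like A<digits>B<digits>Y, as in the question's regex.

```javascript
// Hypothetical helper: split on ":" and keep every pair of neighbours
// in which both items match the A<digits>B<digits>Y shape.
function adjacentPairs(text) {
    var item = /^A\d+B\d+Y$/; // no g flag, so test() keeps no state between calls
    var parts = text.split(':');
    var pairs = [];
    for (var i = 0; i + 1 < parts.length; i++) {
        if (item.test(parts[i]) && item.test(parts[i + 1])) {
            pairs.push(parts[i] + ':' + parts[i + 1]);
        }
    }
    return pairs;
}

var adjacent = adjacentPairs('A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y');
console.log(adjacent);
```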

Counting all the occurrences of a substring in a string using regular expression

I've seen many examples of this, but they didn't help. I have the following string:
var str = 'asfasdfasda'
and I want to extract the following
asfa asfasdfa asdfa asdfasda asda
i.e. all substrings starting with 'a' and ending with 'a'.
Here is my regular expression:
/a+[a-z]*a+/g
but this always returns only one match:
[ 'asfasdfasda' ]
Can someone point out the mistake in my implementation?
Thanks.
Edit: corrected the number of substrings needed. Please note that overlapping and duplicate substrings are required as well.

To capture overlapping matches, you will need a lookahead regex, grabbing captured groups #1 and #2:
/(?=(a.*?a))(?=(a.*a))/gi
Explanation:
(?=...) is called a lookahead, which is a zero-width assertion like an anchor or a word boundary. It looks ahead but doesn't advance the regex pointer, which lets us grab overlapping matches in capture groups.
Code:
var re = /(?=(a.*?a))(?=(a.*a))/gi;
var str = 'asfasdfasda';
var m;
var result = {};
while ((m = re.exec(str)) !== null) {
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    result[m[1]] = 1;
    result[m[2]] = 1;
}
console.log(Object.keys(result));
//=> ["asfa", "asfasdfasda", "asdfa", "asdfasda", "asda"]
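Note that the two lookaheads yield only the shortest and the longest match for each starting position, so "asfasdfa" from the question's list is missing above. If every substring is needed, a plain nested loop (not a regex) is one way to get them. This is a sketch; the function name substringsBetween is made up, and it assumes the delimiter is a single character.

```javascript
// Hypothetical helper: list every substring that starts and ends with the
// given single character, including overlapping and duplicate ones.
function substringsBetween(text, ch) {
    // First record where the delimiter character occurs.
    var positions = [];
    for (var i = 0; i < text.length; i++) {
        if (text[i] === ch) {
            positions.push(i);
        }
    }
    // Then pair every occurrence with every later occurrence.
    var result = [];
    for (var s = 0; s < positions.length; s++) {
        for (var e = s + 1; e < positions.length; e++) {
            result.push(text.slice(positions[s], positions[e] + 1));
        }
    }
    return result;
}

var allSubstrings = substringsBetween('asfasdfasda', 'a');
console.log(allSubstrings);
console.log(allSubstrings.length); // the count the title asks about
```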

The regex engine does not go back over text it has already consumed, so the closing a of one match cannot be reused as the opening a of the next.
var str = 'asfaasdfaasda'; // you need to have extra 'a' to mark the start of next string
var substrs = str.match(/a[b-z]*a/g); // notice the regular expression is changed.
alert(substrs)

You can count it this way:
var str = "asfasdfasda";
var regex = /a+[a-z]*a+/g, result, indices = [];
while ((result = regex.exec(str))) {
console.log(result.index); // you can instead count the values here.
}

Regular Expression to find complex markers

I want to use a JavaScript regular expression, something like this:
/marker\d+"?(\w+)"?\s/gi
In a string like this:
IDoHaveMarker1"apple" IDoAlsoHaveAMarker352pear LastPointMakingmarker3134"foo"
And I want it to return an array like this:
[ "apple", "pear", "foo" ]
The quotes are to make clear they are strings. They shouldn't be in the result.

If you are asking about how to actually use the regex:
To get all captures of multiple (global) matches you have to use a loop and exec in JavaScript:
var regex = /marker\d+"?(\w+)/gi;
var result = [];
var match;
while (match = regex.exec(input)) {
result.push(match[1]);
}
(Note that you can omit the trailing "?\s? if you are only interested in the capture, since they are optional anyway, so they don't affect the matched result.)
And no, g will not allow you to do all of that in one call. If you had omitted g then exec would return the same match every time.
As Blender mentioned, if you want to rule out things like Marker13"something Marker14bar (unmatched ") you need to use another capturing group and a backreference. Note that this will push your desired capture to index 2:
var regex = /marker\d+("?)(\w+)\1/gi;
var result = [];
var match;
while (match = regex.exec(input)) {
result.push(match[2]);
}
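Here is the backreference version run against the sample string from the question, as a sketch. One caveat, as far as I can tell: on an input like Marker13"something, \d+ can backtrack to "1" and let (\w+) capture the "3", so the sketch adds a (?!\d) guard after the digits. That guard and the function name extractMarkerValues are my own additions, not part of the answer above.

```javascript
// Hypothetical helper wrapping the exec() loop from the answer.
function extractMarkerValues(input) {
    // (?!\d) is an added guard: it stops \d+ from handing digits over to
    // (\w+) when the quoted form fails and the engine backtracks.
    var regex = /marker\d+(?!\d)("?)(\w+)\1/gi;
    var result = [];
    var match;
    while ((match = regex.exec(input)) !== null) {
        result.push(match[2]); // group 1 is the optional quote, group 2 the value
    }
    return result;
}

console.log(extractMarkerValues('IDoHaveMarker1"apple" IDoAlsoHaveAMarker352pear LastPointMakingmarker3134"foo"'));
console.log(extractMarkerValues('Marker13"something Marker14bar'));
```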

How can I split this string in JavaScript?

I have strings like this:
ab
rx'
wq''
pok'''
oyu,
mi,,,,
Basically, I want to split the string into two parts. The first part should have the alphabetical characters intact, the second part should have the non-alphabetical characters.
The alphabetical part is guaranteed to be 2-3 lowercase characters between a and z; the non-alphabetical part can be any length, and is guaranteed to only be the characters , or ', but not both in the one string (e.g. eex,', will never occur).
So the result should be:
[ab][]
[rx][']
[wq]['']
[pok][''']
[oyu][,]
[mi][,,,,]
How can I do this? I'm guessing a regular expression but I'm not particularly adept at coming up with them.

Regular expressions have a nice special token called "word boundary" (\b). You can use it, well, to detect the boundary of a word, which is a sequence of alphanumeric characters.
So all you have to do is
foo.split(/\b/)
For example,
"pok'''".split(/\b/) // ["pok", "'''"]

If you can 100% guarantee that:
Letter-strings are 2 or 3 characters
There are always one or more primes/commas
There is never any empty space before, after or in-between the letters and the marks (aside from line-break)
You can use:
/^([a-zA-Z]{2,3})('+|,+)$/gm
var arr = /^([a-zA-Z]{2,3})('+|,+)$/gm.exec("pok'''");
// arr => ["pok'''", "pok", "'''"]
arr = /^([a-zA-Z]{2,3})('+|,+)$/gm.exec("baf,,,");
// arr => ["baf,,,", "baf", ",,,"]
Of course, save yourself some sanity, and save that RegEx as a var.
And as a warning, if you haven't dealt with RegEx like this:
If a match isn't found -- if you try to match foo','' by mixing marks, or you have 0-1 or 4+ letters, or 0 marks... ...then instead of getting an array back, you'll get null.
So you can do this:
var reg = /^([a-zA-Z]{2,3})('+|,+)$/gm,
string = "foobar'',,''",
result_array = reg.exec(string) || [string];
In this case, the result of the exec is null; by putting the || (or) there, we can return an array that has the original string in it, as index-0.
Why?
Because the result of a successful exec will have 3 slots; [*string*, *letters*, *marks*].
You might be tempted to just read the letters like result_array[1].
But if the match failed and result_array === null, then JavaScript will scream at you for trying null[1].
So returning the array at the end of a failed exec will allow you to get result_array[1] === undefined (ie: there was no match to the pattern, so there are no letters in index-1), rather than a JS error.
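One thing to watch: the first sample in the question (ab) has no marks at all, and the pattern above rejects it because it requires at least one. A sketch that allows zero marks and returns just the two parts might look like this. The function name splitLettersAndMarks is made up, and returning null for invalid input is my own choice here.

```javascript
// Hypothetical helper: split "pok'''" into ["pok", "'''"].
// Returns null when the input does not fit the described format.
function splitLettersAndMarks(s) {
    // ('*|,*) allows zero marks, but still never a mix of ' and ,
    var m = /^([a-z]{2,3})('*|,*)$/.exec(s);
    return m ? [m[1], m[2]] : null;
}

var samples = ["ab", "rx'", "wq''", "pok'''", "oyu,", "mi,,,,"];
for (var i = 0; i < samples.length; i++) {
    console.log(splitLettersAndMarks(samples[i]));
}
```

Checking the return value against null before indexing into it avoids the error described above.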

You could try something like this:
function splitString(string) {
    // indexOf returns -1 (not 0) when the character is absent.
    var commaIndex = string.indexOf(',');
    var primeIndex = string.indexOf("'");
    if (commaIndex !== -1) {
        // slice's end index is exclusive, so no -1 adjustments are needed.
        return [string.slice(0, commaIndex), string.slice(commaIndex)];
    } else if (primeIndex !== -1) {
        return [string.slice(0, primeIndex), string.slice(primeIndex)];
    } else {
        // No marks at all: return the string on its own.
        return [string];
    }
}

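var str = "mi,,,,";
var idx = str.search(/\W/);
// search returns -1 when nothing matches, so compare against -1 instead of testing truthiness
var list = idx !== -1 ? [str.slice(0, idx), str.slice(idx)] : [str, ""];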
You'll have the parts in list[0] and list[1].
P.S. There might be some better ways than this.

yourStr.match(/(\w{2,3})([,']*)/)

var match = string.match(/^([a-z]{2,3})(,+?$|'+?$)/);
if (match) {
    // Drop the full match and keep only the two captured parts.
    match = match.slice(1);
}
