js regex discrepancy - javascript

I have a JS regex match that seems to include the brackets incorrectly. I tested it out at Regex101 and it seems to work appropriately there but when I run it I get this alert response:
[#],[Type,' '],[Problem w/ICD],['- ',Assessment],[' : ',Comment],[LF],[LF]
var temp = "[#]. [Type,' '][Problem w/ICD]['- ',Assessment][' : ',Comment][LF][LF]";
var rep = temp.match(/\[(.*?)\]/g);
alert(rep);
Why are the brackets included when they are outside the capture group?

The brackets are included because String#match with a regex that has the /g modifier returns only the full matches; the capturing groups are lost.
If the regular expression includes the g flag, the method returns an Array containing all matched substrings rather than match objects. Captured groups are not returned.
You need to use RegExp#exec() in a loop and access the first capturing group via index 1.
var re = /\[(.*?)\]/g;
var str = '[#]. [Type,\' \'][Problem w/ICD][\'- \',Assessment][\' : \',Comment][LF][LF]';
var m;
var res = [];
while ((m = re.exec(str)) !== null) {
res.push(m[1]);
}
console.log(res);
Result:
["#", "Type,' '", "Problem w/ICD", "'- ',Assessment", "' : ',Comment", "LF", "LF"]
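To make the quoted behaviour concrete, here is a small sketch that compares String#match with and without the g flag. The shortened input string and the variable names are assumptions chosen for brevity.

```javascript
// Shortened sample input (an assumption for illustration).
var sample = "[#]. [LF]";

// With /g: an array of full matches only; capture groups are dropped.
var globalResult = sample.match(/\[(.*?)\]/g);

// Without /g: a single match array; index 0 is the full match,
// index 1 is the first capture group.
var singleResult = sample.match(/\[(.*?)\]/);

console.log(globalResult);    // ["[#]", "[LF]"]
console.log(singleResult[0]); // [#]
console.log(singleResult[1]); // #
```

Without the g flag only the first match is returned, which is why the exec loop is still needed to get group 1 of every match.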

Related

Getting each 'word' after every underscore in a string in Javascript using regex

I'm wanting to extract each block of alphanumeric characters that come after underscores in a Javascript string. I currently have it working using a combination of string methods and regex like so:
var string = "ignore_firstMatch_match2_thirdMatch";
var firstValGone = string.substr(string.indexOf('_'));
// returns "_firstMatch_match2_thirdMatch"
var noUnderscore = firstValGone.match(/[^_]+/g);
// returns ["firstMatch", "match2" , "thirdMatch"]
I'm wondering if there's a way to do it purely using regex? Best I've managed is:
var string = "ignore_firstMatch_match2_thirdMatch";
var matchTry = string.match(/_[^_]+/g);
// returns ["_firstMatch", "_match2", "_thirdMatch"]
but that returns the preceding underscore too. Given you can't use lookbehinds in JS I don't know how to match the characters after, but exclude the underscore itself. Is this possible?
You can use a capture group (_([^_]+)) and use RegExp#exec in a loop while pushing the captured values into an array:
var re = /_([^_]+)/g;
var str = 'ignore_firstMatch_match2_thirdMatch';
var res = [];
var m; // declare the match variable instead of leaking a global
while ((m = re.exec(str)) !== null) {
    res.push(m[1]);
}
document.body.innerHTML = "<pre>" + JSON.stringify(res, null, 4) + "</pre>";
Note that String#match() with a regex that has the global modifier /g discards all captured texts, which is why you cannot just use str.match(/_([^_]+)/g).
Since lookbehind is not supported in JS the only way I can think of is using a group like this.
Regex: _([^_]+) and capture group using \1 or $1.
Regex101 Demo
var myString = "ignore_firstMatch_match2_thirdMatch";
var myRegexp = /_([^_]+)/g;
var match = myRegexp.exec(myString);
while (match != null) {
    // match[1] is the captured text without the leading underscore
    document.getElementById("match").innerHTML += "<br>" + match[1];
    match = myRegexp.exec(myString);
}
<div id="match">
</div>
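An alternative using a lookahead would be something like this.
Be careful in JS, though: it hung my page three times. The most likely cause is that the pattern matches an empty string, so a plain exec loop with /g never advances lastIndex and finds the same match forever (an infinite loop rather than a ReDoS-style backtracking problem).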
Regex: (?=_([A-Za-z0-9]+)) and capture groups using \1 or $1.
Regex101 Demo
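A minimal sketch of an exec loop that stays safe with a zero-width pattern such as this one: after each empty match, lastIndex is bumped by hand so the next exec call moves on. The variable names are assumptions for illustration.

```javascript
var re = /(?=_([A-Za-z0-9]+))/g;
var str = 'ignore_firstMatch_match2_thirdMatch';
var m;
var res = [];
while ((m = re.exec(str)) !== null) {
    // A zero-width match leaves lastIndex equal to m.index; step past it
    // so the next exec call does not find the same position again.
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    res.push(m[1]);
}
console.log(res); // ["firstMatch", "match2", "thirdMatch"]
```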
Why do you assume you need regex? a simple split will do the job:
string str = "ignore_firstMatch_match2_thirdMatch";
IEnumerable<string> matches = str.Split('_').Skip(1);
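The snippet above is C#. A rough JavaScript equivalent, assuming the same input string, uses String#split and Array#slice:

```javascript
var str = "ignore_firstMatch_match2_thirdMatch";
// Split on underscores and drop the first piece, which precedes the first underscore.
var matches = str.split('_').slice(1);
console.log(matches); // ["firstMatch", "match2", "thirdMatch"]
```

Unlike the regex versions, consecutive underscores would produce empty strings in the result here.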

Find a string surrounded by square brackets and *not* prefaced with a specific character

I would like to have a match with
[testing]
but not
![testing]
This is my query to grab a string surrounded by square brackets:
\[([^\]]+)\]
var match = /^[^!]*\[([^\]]+)\]/.exec(issueBody);
if (match)
{
$ISSUE_BODY.selectRange(match.index, match.index+match[0].length);
}
and it works marvelously.
However, I have spent a good half hour on http://regexr.com/ trying to skip strings with a "!" in front, and couldn't.
EDIT: I'm sorry guys I didn't realize that there were operations that could not be supported by specific interpreters. I am writing in Javascript and apparently lookbehind is not supported, I get this error:
Uncaught SyntaxError: Invalid regular expression:
/(?
Sorry for wasting time :\
You can use alternation:
(?:^|[^!])(\[[^\]]+\])
RegEx Demo
Here (?:^|[^!]) will match start of input OR any character that is NOT !
Code:
var re = /(?:^|[^!])(\[[^\]]+\])/gm;
var str = '![foobar123]\n[xyz789]';
var m;
while ((m = re.exec(str)) !== null) {
    console.log(m[1]);
}
Output:
[xyz789]
In JavaScript engines that do not support lookbehind, you can use:
^[^!]*\[([^\]]+)\]
(with the multiline flag to match every start of a line)
See it on regexr.com.
You can just use capturing:
var re = /(?:^|[^!])(\[[^[\]]*])/g;
var str = '[goodtesting] ![badtesting] ';
var m;
while ((m = re.exec(str)) !== null) {
document.getElementById("r").innerHTML += m[1] + "<br/>";
}
<div id="r"/>
The (?:^|[^!])(\[[^[\]]*]) regex matches the start of string or any character other than a ! (with a non-capturing group (?:^|[^!])) and matches and captures the substring enclosed with [ and ] that has no [ and ] inside (with (\[[^[\]]*])). When we need to get multiple matches, we need to use RegExp#exec() and access the captured groups using the indices (here, index 1).
Also, in JS, when you only need a lookbehind (nothing has to be checked after the match), you can use a reversed-string technique: reverse the input, write the pattern in reverse with a lookahead in place of the lookbehind, and reverse each match back:
function revStr(s) {
return s.split('').reverse().join('');
}
var re = /][^[\]]*\[(?!!)/g; // Here, the regex pattern is reverse, too
var str = '![badtesting] [goodtesting]';
var m;
while ((m = re.exec(revStr(str))) !== null) { // We reverse a string here
document.getElementById("res").innerHTML += revStr(m[0]); // and the matched value here
}
<div id="res"/>
This is impractical for longer patterns, but this one is simple enough for it to work.
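Engines that implement ES2018 lookbehind assertions can express the condition directly with a negative lookbehind. This is a sketch that assumes such a runtime; older engines throw the SyntaxError quoted in the question.

```javascript
// (?<!!) asserts that the opening bracket is not preceded by "!".
var re = /(?<!!)\[[^[\]]*\]/g;
var str = '![badtesting] [goodtesting]';
var res = str.match(re);
console.log(res); // ["[goodtesting]"]
```

Because the lookbehind is zero-width, the full match is already the bracketed text, so String#match with /g is enough and no capture group is needed.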

Regex extracting multiple matches for string [duplicate]

I'm trying to obtain all possible matches from a string using regex with javascript. It appears that my method of doing this is not matching parts of the string that have already been matched.
Variables:
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
Code:
var match = string.match(reg);
All matched results I get:
A1B1Y:A1B2Y
A1B5Y:A1B6Y
A1B9Y:A1B10Y
Matched results I want:
A1B1Y:A1B2Y
A1B2Y:A1B3Y
A1B5Y:A1B6Y
A1B6Y:A1B7Y
A1B9Y:A1B10Y
A1B10Y:A1B11Y
In my head, I want A1B1Y:A1B2Y to be a match along with A1B2Y:A1B3Y, even though A1B2Y in the string will need to be part of two matches.
Without modifying your regex, you can set it to start matching at the beginning of the second half of the match after each match using .exec and manipulating the regex object's lastIndex property.
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var matches = [], found;
while (found = reg.exec(string)) {
matches.push(found[0]);
reg.lastIndex -= found[0].split(':')[1].length;
}
console.log(matches);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
Demo
As per Bergi's comment, you can also take the index of the last match and add 1 to it, so that instead of resuming from the second half of the match, the regex resumes from the second character of each match:
reg.lastIndex = found.index+1;
Demo
The final outcome is the same. Though, Bergi's update has a little less code and performs slightly faster. =]
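For completeness, a sketch of the full loop with that one-line change applied, using the same string and regex as in the question:

```javascript
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var matches = [], found;
while ((found = reg.exec(string)) !== null) {
    matches.push(found[0]);
    // Resume one character after the start of this match, so the second
    // half of the pair can begin the next match.
    reg.lastIndex = found.index + 1;
}
console.log(matches);
// ["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
```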
You cannot get the direct result from match, but it is possible to produce the result via RegExp.exec and with some modification to the regex:
var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var arr;
var results = [];
while ((arr = regex.exec(input)) !== null) {
results.push(arr[0] + arr[1]);
}
I used zero-width positive look-ahead (?=pattern) in order not to consume the text, so that the overlapping portion can be rematched.
Actually, it is possible to abuse the replace method to achieve the same result:
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var results = [];
input.replace(/A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g, function ($0, $1) {
results.push($0 + $1);
return '';
});
However, since it is replace, it does extra useless replacement work.
Unfortunately, it's not quite as simple as a single string.match.
The reason is that you want overlapping matches, which the /g flag doesn't give you.
You could use lookahead:
var re = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;
But now you get:
string.match(re); // ["A1B1Y", "A1B2Y", "A1B5Y", "A1B6Y", "A1B9Y", "A1B10Y"]
The reason is that lookahead is zero-width, meaning that it just says whether the pattern comes after what you're trying to match or not; it doesn't include it in the match.
You could use exec to try and grab what you want. If a regex has the /g flag, you can run exec repeatedly to get all the matches:
// using re from above to get the overlapping matches
var m;
var matches = [];
var re2 = /A\d+B\d+Y:A\d+B\d+Y/g; // make another regex to get what we need
while ((m = re.exec(string)) !== null) {
// m is a match object, which has the index of the current match
matches.push(string.substring(m.index).match(re2)[0]);
}
// matches now contains:
// [
//     "A1B1Y:A1B2Y",
//     "A1B2Y:A1B3Y",
//     "A1B5Y:A1B6Y",
//     "A1B6Y:A1B7Y",
//     "A1B9Y:A1B10Y",
//     "A1B10Y:A1B11Y"
// ]
Here's a fiddle of this in action. Open up the console to see the results
Alternatively, you could split the original string on :, then loop through the resulting array, pulling out the pairs where array[i] and array[i+1] both match the pattern you want.
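A minimal sketch of that split-based alternative. The per-item pattern /^A\d+B\d+Y$/ and the variable names are assumptions derived from the regex in the question.

```javascript
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var item = /^A\d+B\d+Y$/; // no /g flag, so test() keeps no state between calls
var parts = string.split(':');
var pairs = [];
for (var i = 0; i < parts.length - 1; i++) {
    // Keep a pair only when both neighbours match the item pattern.
    if (item.test(parts[i]) && item.test(parts[i + 1])) {
        pairs.push(parts[i] + ':' + parts[i + 1]);
    }
}
console.log(pairs);
// ["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
```

Because the item pattern is anchored, each colon-separated piece must match in full, which is slightly stricter than the unanchored regex in the question.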

Display characters other than alphabets using regular expression

I have tried to display characters other than alphabets in the particular string but it is displaying only the first char.
var myArray = /[^a-zA-Z]+/g.exec("cdAbb#2547dbsbz78678");
It displays only the first match because exec with the g (global) modifier returns one match per call; it is meant to be called in a loop to get all the matches.
var str = "cdAbb#2547dbsbz78678";
var re = /[^a-zA-Z]+/g;
var myArray;
while (myArray = re.exec(str)) {
console.log(myArray[0]);
}
Output
#2547
78678
If you want to combine the matches, you could use the following.
var str = "cdAbb#2547dbsbz78678",
    res = str.match(/[\W\d]+/g).join('');
// => "#254778678"
Or do a replacement
str = str.replace(/[a-z]+/gi, '');
You can do:
"cdAbb#2547dbsbz78678".match(/[^a-zA-Z]+/g).join('');
//=> #254778678
RegExp.exec with g (global) modifier needs to run in loop to give you all the matches.

how do I combine multiple matches (/g) with backreferences in javascript regex match

I'm confused about the array returned by a regex match when using both /g (to get multiple matches) and parentheses (to get backreferences). It's not clear to me how to get the backreferences because the subscript of the match array seems to refer to the multiple matches, not the back references.
for instance:
string = "#abc #bcd #cde";
re2 = '#([a-z]+)';
p = new RegExp(re2,["g"]);
m = string.match(p)
for (var i in m) { alert(m[i]); }
this is returning "#abc", "#bcd", "#cde"
but I want it to return "abc", "bcd", "cde"
how do I get the latter?
var str = "#abc #bcd #cde",
re = /#([a-z]+)/g,
match;
while (match = re.exec(str)) {
// match[1] contains text matched by first group, match[2] - second, etc.
alert(match[1]);
}
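A non-capturing group around the # is also possible, although it does not change the result: the # was never captured, and String#match with /g still returns the full matches, so the group must be read with exec as above:
(?:#)([a-z]+)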
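In runtimes that provide String.prototype.matchAll (ES2020; an assumption about the environment), the capture groups of every match can also be collected without a manual exec loop. A minimal sketch:

```javascript
var str = "#abc #bcd #cde";
// matchAll requires the g flag and yields one match object per match.
var names = Array.from(str.matchAll(/#([a-z]+)/g), function (m) {
    return m[1]; // first capture group of each match
});
console.log(names); // ["abc", "bcd", "cde"]
```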
