Find a string surrounded by square brackets and *not* prefaced with a specific character


I would like to have a match with
[testing]
but not
![testing]
This is my query to grab a string surrounded by square brackets:
\[([^\]]+)\]
var match = /^[^!]*\[([^\]]+)\]/.exec(issueBody);
if (match)
{
$ISSUE_BODY.selectRange(match.index, match.index+match[0].length);
}
and it works marvelously.
However, I have spent a good half hour on http://regexr.com/ trying to skip strings with a "!" in front, and couldn't.
EDIT: I'm sorry, guys; I didn't realize that some operations are not supported by every regex engine. I am writing in JavaScript and apparently lookbehind is not supported; I get this error:
Uncaught SyntaxError: Invalid regular expression:
/(?
Sorry for wasting time :\

You can use alternation:
(?:^|[^!])(\[[^\]]+\])
Here (?:^|[^!]) will match start of input OR any character that is NOT !
Code:
var re = /(?:^|[^!])(\[[^\]]+\])/gm;
var str = '![foobar123]\n[xyz789]';
var m;
while ((m = re.exec(str)) !== null) {
  // m[1] is the bracketed text; m[0] also includes the preceding character
  console.log(m[1]);
}
Output:
[xyz789]

In Javascript, where lookbehinds are not supported, you can use:
^[^!]*\[([^\]]+)\]
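(with the multiline flag, so ^ matches at the start of every line; note that this finds at most one bracketed string per line, and only when no ! appears anywhere earlier on that line)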
See it on regexr.com.
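Since ES2018, JavaScript regular expressions do support lookbehind, so in current engines (recent Node.js and evergreen browsers) the original idea works directly. A minimal sketch, assuming an engine with ES2018 lookbehind and ES2020 `String.prototype.matchAll`; `findBracketed` is a hypothetical helper name:

```javascript
// Capture the text inside [...] unless the opening bracket is preceded by "!".
// (?<!!) is a negative lookbehind: "the previous character is not !".
function findBracketed(text) {
  const re = /(?<!!)\[([^\]]+)\]/g;
  // matchAll requires the g flag and yields one match object per hit.
  return Array.from(text.matchAll(re), (m) => m[1]);
}

console.log(findBracketed("[testing] ![testing] and [another]"));
// logs the array ["testing", "another"]
```

Because the lookbehind consumes nothing, this finds every qualifying bracket on a line, not just the first one.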

You can just use capturing:
var re = /(?:^|[^!])(\[[^[\]]*])/g;
var str = '[goodtesting] ![badtesting] ';
var m;
while ((m = re.exec(str)) !== null) {
document.getElementById("r").innerHTML += m[1] + "<br/>";
}
<div id="r"/>
The (?:^|[^!])(\[[^[\]]*]) regex matches the start of string or any character other than a ! (with a non-capturing group (?:^|[^!])) and matches and captures the substring enclosed with [ and ] that has no [ and ] inside (with (\[[^[\]]*])). When we need to get multiple matches, we need to use RegExp#exec() and access the captured groups using the indices (here, index 1).
Also, in JS, when you only need a lookbehind (with no lookahead after the match), you can use a reverse-string technique: reverse both the string and the pattern, and use a lookahead in place of the lookbehind:
function revStr(s) {
return s.split('').reverse().join('');
}
var re = /][^[\]]*\[(?!!)/g; // Here, the regex pattern is reverse, too
var str = '![badtesting] [goodtesting]';
var m;
while ((m = re.exec(revStr(str))) !== null) { // We reverse a string here
document.getElementById("res").innerHTML += revStr(m[0]); // and the matched value here
}
<div id="res"/>
This approach does not scale well to longer patterns, but this one is simple enough to make it worthwhile.

Related

Regex optimization and best practice

I need to parse information out of a legacy interface. We do not have the ability to update the legacy message. I'm not very proficient at regular expressions, but I managed to write one that does what I want it to do. I just need a peer review and feedback to make sure it's clean.
The message from the legacy system returns values resembling the example below.
%name0=value
%name1=value
%name2=value
Expression: /\%(.*)\=(.*)/g;
var strBody = body_text.toString();
var myRegexp = /\%(.*)\=(.*)/g;
var match = myRegexp.exec(strBody);
var objPair = {};
while (match != null) {
if (match[1]) {
objPair[match[1].toLowerCase()] = match[2];
}
match = myRegexp.exec(strBody);
}
This code works, and I can add partial matches in the middle of the names/values without anything breaking. I have to assume that any combination of characters could appear in the "values" match, meaning a value could contain equals signs and percent signs.
Is this clean enough?
Is there something that could break the expression?

First of all, don't escape characters that don't need escaping: %(.*)=(.*)
The problem with your expression: an equals sign in the value would break your parser. %name0=val=ue would result in the name name0=val with the value ue, instead of the name name0 with the value val=ue.
One possible fix is to make the first repetition lazy by appending a question mark: %(.*?)=(.*)
But this is not optimal due to unneeded backtracking. You can do better by using a negated character class: %([^=]*)=(.*)
And finally, if empty names should not be allowed, replace the first asterisk with a plus: %([^=]+)=(.*)
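To see the difference, here is a small sketch applying the final pattern to input whose values contain = and %. It assumes every entry sits on its own line in the %name=value shape; `parsePairs` is a hypothetical name, and the loop mirrors the question's exec loop:

```javascript
// Parse "%name=value" entries. [^=]+ stops the name at the first "=",
// so any later "=" or "%" stays inside the value; (.*) stops at the line end.
function parsePairs(body) {
  const re = /%([^=]+)=(.*)/g;
  const pairs = {};
  let m;
  while ((m = re.exec(body)) !== null) {
    pairs[m[1].toLowerCase()] = m[2];
  }
  return pairs;
}

console.log(parsePairs("%name0=val=ue\n%Name1=50%off\n%name2=value"));
// logs { name0: 'val=ue', name1: '50%off', name2: 'value' }
```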
This is a good resource: Regex Tutorial - Repetition with Star and Plus

Your expression is fine, and wrapping it with two capturing groups is simple to get your desired variables and values.
You don't need to escape % or =; the expression works without the backslashes.
You can test and modify the expression in an online regex tester:
%(.+)=(.+)
Since your data is pretty structured, you can also do so with string split and get the same desired outputs, if you want.
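The split-based route might look like this sketch. It assumes entries separated by "\n"; `parseWithSplit` is a hypothetical name. Cutting each line at its first = (via indexOf) keeps any later = inside the value:

```javascript
// Parse "%name=value" lines without a regex.
// Lines that do not start with "%" or contain no "=" are skipped.
function parseWithSplit(body) {
  const pairs = {};
  for (const line of body.split("\n")) {
    if (!line.startsWith("%")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    // Name is between "%" and the first "="; everything after it is the value.
    pairs[line.slice(1, eq).toLowerCase()] = line.slice(eq + 1);
  }
  return pairs;
}

console.log(parseWithSplit("%name0=value\n%name1=val=ue"));
```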
JavaScript Test
const regex = /%(.+)=(.+)/gm;
const str = `%name0=value
%name1=value
%name2=value`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
Performance Test
This JavaScript snippet measures the performance of that expression by running it in a simple loop one million times.
const repeat = 1000000;
const start = Date.now();
for (var i = repeat; i > 0; i--) { // runs exactly `repeat` times
const string = '%name0=value';
const regex = /(%(.+)=(.+))/gm;
var match = string.replace(regex, "\nGroup #1: $1 \n Group #2: $2 \n Group #3: $3 \n");
}
const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");

Getting each 'word' after every underscore in a string in Javascript using regex

I'm wanting to extract each block of alphanumeric characters that come after underscores in a Javascript string. I currently have it working using a combination of string methods and regex like so:
var string = "ignore_firstMatch_match2_thirdMatch";
var firstValGone = string.substr(string.indexOf('_'));
// returns "_firstMatch_match2_thirdMatch"
var noUnderscore = firstValGone.match(/[^_]+/g);
// returns ["firstMatch", "match2" , "thirdMatch"]
I'm wondering if there's a way to do it purely using regex? Best I've managed is:
var string = "ignore_firstMatch_match2_thirdMatch";
var matchTry = string.match(/_[^_]+/g);
// returns ["_firstMatch", "_match2", "_thirdMatch"]
but that returns the preceding underscore too. Given you can't use lookbehinds in JS I don't know how to match the characters after, but exclude the underscore itself. Is this possible?

You can use a capture group (_([^_]+)) and use RegExp#exec in a loop while pushing the captured values into an array:
var re = /_([^_]+)/g;
var str = 'ignore_firstMatch_match2_thirdMatch';
var res = [];
var m;
while ((m = re.exec(str)) !== null) {
res.push(m[1]); // group 1 is the word without the underscore
}
document.body.innerHTML = "<pre>" + JSON.stringify(res, 0, 4) + "</pre>";
Note that String#match() with a global (/g) regex returns only the full matches and drops the captured groups, which is why you cannot just use str.match(/_([^_]+)/g).
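In engines that support ES2020's `String.prototype.matchAll`, the exec loop can be replaced with a one-liner, because matchAll keeps the capture groups even with /g. A sketch (the `words` name is illustrative):

```javascript
const str = "ignore_firstMatch_match2_thirdMatch";
// matchAll returns an iterator of full match objects, so group 1 is still available.
const words = Array.from(str.matchAll(/_([^_]+)/g), (m) => m[1]);
console.log(words); // logs ["firstMatch", "match2", "thirdMatch"]
```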

Since lookbehind is not supported in JS, the only way I can think of is using a capturing group like this.
Regex: _([^_]+), then read capture group 1 (match[1] from exec(), or $1 in a replacement string).
var myString = "ignore_firstMatch_match2_thirdMatch";
var myRegexp = /_([^_]+)/g;
var match = myRegexp.exec(myString);
while (match != null) {
// match[1] is the captured word; match[0] would still include the underscore
document.getElementById("match").innerHTML += "<br>" + match[1];
match = myRegexp.exec(myString);
}
<div id="match">
</div>
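An alternate way using lookahead would be something like the following.
But in JS it killed my page three times: the lookahead match is zero-width, so an exec() loop never advances lastIndex on its own and keeps finding the same position.
Regex: (?=_([A-Za-z0-9]+)), then read capture group 1 (match[1], or $1 in a replacement string).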
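A sketch of that lookahead version with the usual guard, which bumps lastIndex whenever a match is empty so the loop terminates (`wordsAfterUnderscores` is a hypothetical name):

```javascript
function wordsAfterUnderscores(str) {
  const re = /(?=_([A-Za-z0-9]+))/g;
  const words = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    words.push(m[1]);
    // The lookahead consumes nothing, so lastIndex would stay put;
    // step past this position or exec() finds it again forever.
    if (m.index === re.lastIndex) re.lastIndex++;
  }
  return words;
}

console.log(wordsAfterUnderscores("ignore_firstMatch_match2_thirdMatch"));
```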

Why do you assume you need regex? A simple split will do the job (this snippet is C#):
string str = "ignore_firstMatch_match2_thirdMatch";
IEnumerable<string> matches = str.Split('_').Skip(1);
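The snippet above is C#; the same idea in JavaScript is a one-line split, sketched here:

```javascript
const str = "ignore_firstMatch_match2_thirdMatch";
// Split on "_" and drop the first piece (the text before the first underscore).
const matches = str.split("_").slice(1);
console.log(matches); // logs ["firstMatch", "match2", "thirdMatch"]
```

Unlike the [^_]+ regexes, consecutive underscores here would produce empty strings in the result.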

Regex extracting multiple matches for string [duplicate]

I'm trying to obtain all possible matches from a string using regex with javascript. It appears that my method of doing this is not matching parts of the string that have already been matched.
Variables:
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
Code:
var match = string.match(reg);
All matched results I get:
A1B1Y:A1B2Y
A1B5Y:A1B6Y
A1B9Y:A1B10Y
Matched results I want:
A1B1Y:A1B2Y
A1B2Y:A1B3Y
A1B5Y:A1B6Y
A1B6Y:A1B7Y
A1B9Y:A1B10Y
A1B10Y:A1B11Y
In my head, I want A1B1Y:A1B2Y to be a match along with A1B2Y:A1B3Y, even though A1B2Y in the string will need to be part of two matches.

Without modifying your regex, you can set it to start matching at the beginning of the second half of the match after each match using .exec and manipulating the regex object's lastIndex property.
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var matches = [], found;
while (found = reg.exec(string)) {
matches.push(found[0]);
reg.lastIndex -= found[0].split(':')[1].length;
}
console.log(matches);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
As per Bergi's comment, you can also take the index of the last match and add 1, so that instead of resuming from the second half of the match, it starts attempting to match from the second character of each match onwards:
reg.lastIndex = found.index+1;
The final outcome is the same, though Bergi's update has a little less code and performs slightly faster. =]

You cannot get the direct result from match, but it is possible to produce the result via RegExp.exec and with some modification to the regex:
var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var arr;
var results = [];
while ((arr = regex.exec(input)) !== null) {
results.push(arr[0] + arr[1]);
}
I used zero-width positive look-ahead (?=pattern) in order not to consume the text, so that the overlapping portion can be rematched.
Actually, it is possible to abuse the replace method to achieve the same result:
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var results = [];
input.replace(/A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g, function ($0, $1) {
results.push($0 + $1);
return '';
});
However, since it is replace, it does extra useless replacement work.
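With ES2020's `String.prototype.matchAll`, the same lookahead regex gives the overlapping pairs without either an explicit exec loop or a throwaway replacement. A sketch, assuming an engine that supports matchAll:

```javascript
const input = "A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y";
// Group 1 sits inside a lookahead, so the second item is captured but not consumed,
// which lets it start the next match.
const re = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
const results = Array.from(input.matchAll(re), (m) => m[0] + m[1]);
console.log(results);
```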

Unfortunately, it's not quite as simple as a single string.match.
The reason is that you want overlapping matches, which the /g flag doesn't give you.
You could use lookahead:
var re = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;
But now you get:
string.match(re); // ["A1B1Y", "A1B2Y", "A1B5Y", "A1B6Y", "A1B9Y", "A1B10Y"]
The reason is that lookahead is zero-width, meaning that it just says whether the pattern comes after what you're trying to match or not; it doesn't include it in the match.
You could use exec to try and grab what you want. If a regex has the /g flag, you can run exec repeatedly to get all the matches:
// using re from above to get the overlapping matches
var m;
var matches = [];
var re2 = /A\d+B\d+Y:A\d+B\d+Y/g; // make another regex to get what we need
while ((m = re.exec(string)) !== null) {
// m is a match object, which has the index of the current match
matches.push(string.substring(m.index).match(re2)[0]);
}
// matches is now:
// [
//   "A1B1Y:A1B2Y",
//   "A1B2Y:A1B3Y",
//   "A1B5Y:A1B6Y",
//   "A1B6Y:A1B7Y",
//   "A1B9Y:A1B10Y",
//   "A1B10Y:A1B11Y"
// ]
Alternatively, you could split the original string on :, then loop through the resulting array, pulling out the pairs where array[i] and array[i+1] both match the pattern you want.
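A sketch of that split-based alternative, assuming the items are always separated by a single colon; `adjacentPairs` is a hypothetical name, and the item pattern is anchored with ^ and $ so each piece must match as a whole:

```javascript
// Return every "x:y" pair of neighbouring items that both match the item pattern.
function adjacentPairs(str) {
  const item = /^A\d+B\d+Y$/; // no g flag, so test() keeps no lastIndex state
  const parts = str.split(":");
  const pairs = [];
  for (let i = 0; i + 1 < parts.length; i++) {
    if (item.test(parts[i]) && item.test(parts[i + 1])) {
      pairs.push(parts[i] + ":" + parts[i + 1]);
    }
  }
  return pairs;
}

const string = "A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y";
console.log(adjacentPairs(string));
```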

Counting all the occurrences of a substring in a string using regular expression

I've seen many examples of this, but they didn't help. I have the following string:
var str = 'asfasdfasda'
and I want to extract the following
asfa asfasdfa asdfa asdfasda asda
i.e all sub-strings starting with 'a' and ending with 'a'
here is my regular expression
/a+[a-z]*a+/g
but this always returns me only one match:
[ 'asdfasdfsdfa' ]
Can someone point out the mistake in my implementation?
Thanks.
Edit: corrected the number of substrings needed. Please note that overlapping and duplicate substrings are required as well.

For capturing overlapping matches you will need a lookahead regex, and then grab captured groups #1 and #2:
/(?=(a.*?a))(?=(a.*a))/gi
Explanation:
(?=...) is called a lookahead, a zero-width assertion like anchors or word boundaries. It looks ahead but doesn't move the regex pointer forward, which lets us grab overlapping matches in groups.
See more on look arounds
Code:
var re = /(?=(a.*?a))(?=(a.*a))/gi;
var str = 'asfasdfasda';
var m;
var result = {};
while ((m = re.exec(str)) !== null) {
if (m.index === re.lastIndex)
re.lastIndex++;
result[m[1]]=1;
result[m[2]]=1;
}
console.log(Object.keys(result));
//=> ["asfa", "asfasdfasda", "asdfa", "asdfasda", "asda"]
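The two lookaheads capture only the shortest and the longest candidate from each starting a, so a middle substring such as asfasdfa does not appear in the output above. If every substring that starts and ends with a is needed, a plain nested loop is simpler than a regex. A sketch (`substringsBetween` is a hypothetical helper):

```javascript
// Collect every substring that starts and ends with `ch` (overlaps included).
function substringsBetween(str, ch) {
  const result = [];
  for (let i = 0; i < str.length; i++) {
    if (str[i] !== ch) continue;
    for (let j = i + 1; j < str.length; j++) {
      if (str[j] === ch) result.push(str.slice(i, j + 1));
    }
  }
  return result;
}

console.log(substringsBetween("asfasdfasda", "a"));
```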

The parser doesn't go back to a previous position on the tape to match the starting a again, so an a consumed by one match cannot start the next one.
var str = 'asfaasdfaasda'; // you need to have extra 'a' to mark the start of next string
var substrs = str.match(/a[b-z]*a/g); // notice the regular expression is changed.
alert(substrs)

You can count it this way:
var str = "asfasdfasda";
var regex = /a+[a-z]*a+/g, result, indices = [];
while ((result = regex.exec(str))) {
console.log(result.index); // you can instead count the values here.
}

Javascript Remove strings in beginning and end

Based on the following strings:
...here..
..there...
.their.here.
How can I remove the dots (.) at the beginning and end of each string, like trim() removes surrounding spaces, using JavaScript?
The output should be:
here
there
their.here

These are the reasons why the RegEx for this task is /(^\.+|\.+$)/mg:
Inside /()/ is where you write the pattern of the substring you want to find in the string:
/(ol)/ This will find the substring ol in the string.
var x = "colt".replace(/(ol)/, 'a'); will give you x == "cat";
The ^\.+|\.+$ in /()/ is separated into 2 parts by the symbol | [means or]
^\.+ and \.+$
^\.+ means to find as many . as possible at the start.
^ means at the start; \ escapes the following character; adding + after a character means to match one or more of that character.
\.+$ means to find as many . as possible at the end.
$ means at the end.
The m after /()/ specifies that if the string has newline or carriage return characters, the ^ and $ anchors will also match at line boundaries, not only at the start and end of the whole string.
The g after /()/ performs a global match: it finds all matches rather than stopping after the first one.
To learn more about RegEx you can check out this guide.

Try to use the following regex
var text = '...here..\n..there...\n.their.here.';
var replaced = text.replace(/(^\.+|\.+$)/mg, '');

Use Regex /(^\.+|\.+$)/mg
^ represents the start
\.+ one or more full stops
$ represents the end
so:
var text = '...here..\n..there...\n.their.here.';
alert(text.replace(/(^\.+|\.+$)/mg, ''));
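The same pattern can be built for any single character with the RegExp constructor, as long as the character is escaped first (an unescaped . would match anything). A sketch; `escapeRegExp` and `trimCharPerLine` are hypothetical helper names, and the function assumes `ch` is exactly one character, since c+ repeats only the last character of the escaped string:

```javascript
// Escape regex metacharacters so the character is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Strip runs of `ch` from the start and end of every line.
function trimCharPerLine(text, ch) {
  const c = escapeRegExp(ch);
  return text.replace(new RegExp("(^" + c + "+|" + c + "+$)", "mg"), "");
}

console.log(trimCharPerLine("...here..\n..there...\n.their.here.", "."));
```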

Here is a non-regular-expression answer that extends String.prototype:
String.prototype.strim = function(needle){
  var first_pos = 0;
  var last_pos = 0; // stays 0 (empty result) if every char is the needle
  //find first non needle char position
  for(var i = 0; i < this.length; i++){
    if(this.charAt(i) !== needle){
      first_pos = i;
      break;
    }
  }
  //find the position just after the last non needle char (index 0 included)
  for(var i = this.length-1; i >= 0; i--){
    if(this.charAt(i) !== needle){
      last_pos = i+1;
      break;
    }
  }
  return this.substring(first_pos, last_pos);
}
alert("...here..".strim('.'));
alert("..there...".strim('.'))
alert(".their.here.".strim('.'))
alert("hereagain..".strim('.'))
See it working here: http://jsfiddle.net/cettox/VQPbp/

Slightly more code-golfy, if not readable, non-regexp prototype extension:
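String.prototype.strim = function(needle) {
  var out = String(this);
  if (!needle) return out; // an empty needle would never stop matching
  while (out.startsWith(needle))
    out = out.slice(needle.length);
  // endsWith avoids the lastIndexOf() === -1 case stripping text that isn't the needle
  while (out.endsWith(needle))
    out = out.slice(0, out.length - needle.length);
  return out;
}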
var spam = "this is a string that ends with thisthis";
alert("#" + spam.strim("this") + "#");

Use a RegEx with JavaScript's replace():
var res = s.replace(/(^\.+|\.+$)/mg, '');

We can use the replace() method to remove an unwanted substring from a string.
Example:
var str = "<pre>I'm a big fan of Stack Overflow</pre>";
str = str.replace(/<pre>/g, '').replace(/<\/pre>/g, '');
console.log(str);
output:
I'm a big fan of Stack Overflow
