JavaScript regex to get string between two characters except escaped without lookbehind - javascript

I am looking for a JavaScript regex that does not rely on the lookbehind feature added in JavaScript 2018 (ES2018) and that lets me select the text between two asterisks while ignoring escaped asterisks.
In the following example, only the text "test" and the escaped characters it includes are supposed to be selected according to the rules above:
\*jdjdjdfdf*test*dfsdf\*adfasdasdasd*test**test\**sd* (Selected: "test", "test", "test\*")
During my research I found this solution, "Regex, everything between two characters except escaped characters": /(?<!\\)(%.*?(?<!\\)%)/. It uses negative lookbehinds, which are supported in JavaScript 2018, but I need to support IE11 as well, so this solution doesn't work for me.
Then I found another approach that almost gets me there: "Javascript: negative lookbehind equivalent?". I altered Kamil Szot's answer to fit my needs: ((?!([\\])).|^)(\*.*?((?!([\\])).|^)\*) Unfortunately, it doesn't work when two asterisks ** are in a row.
I have already invested a lot of hours and can't seem to get it right, any help is appreciated!
An example of what I have so far is here: https://www.regexpal.com/?fam=117350
I need to use the regexp in a string.replace call (str.replace(regexp|substr, newSubStr|function)) so that I can wrap the found strings in a span element with a specific class.

You can use this regular expression:
(?:\\.|[^*])*\*((?:\\.|[^*])*)\*
Your code should then take only the single capture group of each match.
Like this:
var str = "\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*";
var regex = /(?:\\.|[^*])*\*((?:\\.|[^*])*)\*/g;
var match;
while (match = regex.exec(str)) {
  console.log(match[1]);
}
If you need to replace the matches, for instance to wrap the matches in a span tag while also dropping the asterisks, then use two capture groups:
var str = "\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*";
var regex = /((?:\\.|[^*])*)\*((?:\\.|[^*])*)\*/g;
var result = str.replace(regex, "$1<span>$2</span>");
console.log(result);
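Since the question asks for a span with a specific class, the same two-group regex can be combined with a replacement function. The following is only a sketch: the helper name wrapStarred and the class name "hl" are made up for illustration, and the class name is assumed to be a trusted value that needs no HTML escaping.

```javascript
// Hypothetical helper: wraps each unescaped *...* run in a span with the given class.
// ES5 syntax only, so it should also run in IE11.
function wrapStarred(text, className) {
  // Group 1: text before the opening asterisk (escaped pairs are skipped as a unit).
  // Group 2: text between the two unescaped asterisks.
  var regex = /((?:\\.|[^*])*)\*((?:\\.|[^*])*)\*/g;
  return text.replace(regex, function (whole, before, inner) {
    return before + '<span class="' + className + '">' + inner + '</span>';
  });
}

var sample = "\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*";
console.log(wrapStarred(sample, "hl"));
```

The trailing "sd*" is left untouched because its asterisk has no closing partner.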
One thing to be careful with: when you write test strings as JavaScript string literals, escape each backslash with another backslash. If you don't, the backslash is consumed as an escape character and the in-memory string will not contain it.
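A small illustration of that point; the variable names are arbitrary. String.raw is an ES2015 alternative and is not available in IE11, so treat it as an option for test code only.

```javascript
var unescaped = "\*test\*";      // the backslashes are consumed by the string literal
var escaped = "\\*test\\*";      // each \\ yields one real backslash
var raw = String.raw`\*test\*`;  // ES2015: backslashes are kept as written

console.log(unescaped);                        // *test*
console.log(escaped);                          // \*test\*
console.log(escaped === raw);                  // true
console.log(unescaped.length, escaped.length); // 6 8
```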

const testStr = `\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*`;
const m = testStr.match(/\*(\\.)*t(\\.)*e(\\.)*s(\\.)*t(\\.)*\*/g).map(m => m.substr(1, m.length-2));
console.log(m);
More generic code:
const prepareRegExp = (word, delimiter = '\\*') => {
  // matches any number of backslash-escaped characters
  const escaped = '(\\\\.)*';
  return new RegExp([
    delimiter,
    escaped,
    [...word].join(escaped),
    escaped,
    delimiter
  ].join(''), 'g');
};
const testStr = `\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*`;
const m = testStr
  .match(prepareRegExp('test'))
  .map(m => m.substr(1, m.length - 2));
console.log(m);
https://instacode.dev/#Y29uc3QgcHJlcGFyZVJlZ0V4cCA9ICh3b3JkLCBkZWxpbWl0ZXIgPSAnXFwqJykgPT4gewogIGNvbnN0IGVzY2FwZWQgPSAnKFxcXFwuKSonOwogIHJldHVybiBuZXcgUmVnRXhwKFsKICAgIGRlbGltaXRlciwKICAgIGVzY2FwZWQsCiAgICBbLi4ud29yZF0uam9pbihlc2NhcGVkKSwKICAgIGVzY2FwZWQsCiAgICBkZWxpbWl0ZXIKICBdLmpvaW5gYCwgJ2cnKTsKfTsKCmNvbnN0IHRlc3RTdHIgPSBgXFwqamRqZGpkZmRmKnRlc3QqZGZzZGZcXCphZGZhc2Rhc2Rhc2QqdGVzdCoqdGVzdFxcKipzZCpgOwpjb25zdCBtID0gdGVzdFN0cgoJLm1hdGNoKHByZXBhcmVSZWdFeHAoJ3Rlc3QnKSkKCS5tYXAobSA9PiBtLnN1YnN0cigxLCBtLmxlbmd0aC0yKSk7Cgpjb25zb2xlLmxvZyhtKTs=

Related

Lookbehind alternative with both lookbehind and lookahead

I'm looking for a regex to split user supplied strings on the : character but not when the user has escaped the colon \: or it's part of a url, e.g. https://stackoverflow...
In javascript the majority of browsers don't yet support lookbehinds. Is it possible to apply some other approach for the lookbehind part?
In Clojure/ClojureScript on Chrome (which does support lookbehinds) this regex does the trick:
#"(?<!\\):(?!//)"
but not in Safari (for example).
The main problem is that browsers currently don't support the lookbehind, which is required to find and exclude the \ prefix so that \: is not treated as a separator.
One workaround (not very pretty, but it works) is to first substitute the \: with some "symbol" you know will not occur naturally in your text, do your split, and then substitute back any \:.
For example, this method will return an empty element "" if you have "::" in your string:
let regex = /:(?!\/\/)/;
// the original text \: has to be written as \\: in a string literal
let str = "http://example.com::hello:dolly:12\\:00\\:PM";
// substitute out any \:
str = str.replace(/\\:/g, "<colon>"); // http://example.com::hello:dolly:12<colon>00<colon>PM
// now split 'normally', without a lookbehind
let arr = str.split(regex); // [ 'http://example.com', '', 'hello', 'dolly', '12<colon>00<colon>PM' ]
// substitute \: back in
arr = arr.map(element => element.replace(/<colon>/g, "\\:")); // [ 'http://example.com', '', 'hello', 'dolly', '12\\:00\\:PM' ]
console.log(arr);
If you're only after non-empty elements, you can call arr.filter(Boolean) on the result, or use @Skeeve's matching solution, as it's more elegant for this purpose.
An alternative could be to not search for the separator but to search for the elements:
var str="this:is\\:a:test:https://stackoverflow:80:test::test";
var elements= str.match(/((?:[^\\:]|\\:|:\/\/)+)/g);
// elements= [ "this", "is\\:a", "test", "https://stackoverflow", "80", "test", "test" ]
Disadvantages:
The elements may not be empty (observe the "+" in the regexp); note how the empty element between the last two "test" entries is missing.
You forgot that a URL can contain multiple colons. What about http://me:password@myhost.com:8080/path?value=d:f
Besides these, I think it should work for you.
I think you can only overcome the disadvantages with a more or less sophisticated loop using regexp-exec.
P.S. I know the grouping isn't required here, but if you want to use it in regexp-exec, you'll need it.
P.P.S. Fixed the typo @chatnoir found
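The regexp-exec loop mentioned in this answer could be sketched roughly as follows. The function name splitOnColon is made up for illustration. This version keeps empty elements, but it still treats only the ":" of "://" as part of a URL, so a port or password colon inside a URL would still be split on.

```javascript
// Hypothetical helper: split on ":" but not on "\:" and not on the ":" of "://".
function splitOnColon(input) {
  // Each match is either something to skip (an escaped colon or "://")
  // or, captured in group 1, a real separator.
  var token = /\\:|:\/\/|(:)/g;
  var parts = [];
  var start = 0;
  var m;
  while ((m = token.exec(input)) !== null) {
    if (m[1] !== undefined) {
      parts.push(input.slice(start, m.index)); // may be an empty string
      start = token.lastIndex;
    }
  }
  parts.push(input.slice(start)); // remainder after the last separator
  return parts;
}

console.log(splitOnColon("this:is\\:a:test:https://stackoverflow:80:test::test"));
// [ 'this', 'is\\:a', 'test', 'https://stackoverflow', '80', 'test', '', 'test' ]
```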
You might also make use of replace and pass a function as the second parameter.
You could use a pattern that matches what you don't want to split on and captures in a group the colon you do want to split on. Then you can replace that captured colon with a marker, just as in @chatnoir's approach, and afterwards split on that marker.
:\/\/\S+|\\:|(:)
Explanation
:\/\/\S+ Match :// followed by 1+ times a non whitespace char
| Or
\\: Match \:
| Or
(:) Capture a : in group 1
let pattern = /:\/\/\S+|\\:|(:)/g;
let str = "string\\: or https://www.example.com:8000 or split:me or te\\:st or \\:test or notsplit\\:me:splitted or \\: or ftp://example.com :";
str = str.replace(pattern, function(match, group1) {
  return group1 === undefined ? match : "<split>";
});
console.log(str.split("<split>").filter(Boolean));

Using search and replace with regex in javascript

I have a regular expression that I have been using in notepad++ for search&replace to manipulate some text, and I want to incorporate it into my javascript code. This is the regular expression:
Search
(?-s)(.{150,250}\.(\[\d+\])*)\h+ and replace with \1\r\n\x20\x20\x20
In essence, this creates a new paragraph after every 150-250 characters and indents it.
This is what I have tried in JavaScript. For a text area <textarea name="textarea1" id="textarea1"></textarea> in the HTML, I have the following JavaScript:
function rep1() {
  var re1 = new RegExp('(?-s)(.{150,250}\.(\[\d+\])*)\h+');
  var re2 = new RegExp('\1\r\n\x20\x20\x20');
  var s = document.getElementById("textarea1").value;
  s = string.replace(re1, re2);
  document.getElementById("textarea1").value = s;
}
I have also tried placing the regular expressions directly as arguments for string.replace() but that doesn't work either. Any ideas what I'm doing wrong?
Several issues:
JavaScript does not support (?-s). You would need to add modifiers separately. However, this is the default setting in JavaScript, so you can just leave it out. If it was your intention to let . also match line breaks, then use [^] instead of . in JavaScript regexes.
JavaScript does not support \h -- the horizontal white space. Instead you could use [^\S\r\n].
When passing a string literal to new RegExp, be aware that backslashes are escape characters in string literal notation, so they will not end up in the regex. Either double them or use JavaScript's regex literal notation.
In JavaScript, replace will only replace the first occurrence unless you provide the g modifier to the regular expression.
The replacement (second argument to replace) should not be a regex. It should be a string, so don't apply new RegExp to it.
The backreferences in the replacement string should be of the $1 format. JavaScript does not support \1 there.
You reference string where you really want to reference s.
This should work:
function rep1() {
  var re1 = /(.{150,250}\.(\[\d+\])*)[^\S\r\n]+/g;
  var re2 = '$1\r\n\x20\x20\x20';
  var s = document.getElementById("textarea1").value;
  s = s.replace(re1, re2);
  document.getElementById("textarea1").value = s;
}
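To make the point about string literals passed to new RegExp concrete, here is a small sketch that compares a regex literal with the string forms. The variable names are arbitrary, and only a fragment of the pattern is used.

```javascript
var fromLiteral = /(\[\d+\])*[^\S\r\n]+/;
// Doubled backslashes survive the string literal and reach the regex engine.
var fromString = new RegExp('(\\[\\d+\\])*[^\\S\\r\\n]+');
// With single backslashes, '\d' is just the letter 'd' in a string literal.
var undoubled = new RegExp('\d+');

console.log(fromLiteral.source === fromString.source); // true
console.log(undoubled.source);                         // d+
console.log(undoubled.test('123'));                    // false
```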

remove last part of string following '&&&' with JavaScript Regex

I'm trying to use a regex in JS to remove the last part of a string. This substring starts with &&&, is followed by something not &&&, and ends with .pdf.
So, for example, the final regex should take a string like:
parent&&&child&&&grandchild.pdf
and match
parent&&&child
I'm not that great with regexes, so my best effort has been something like:
.*?(?:&&&.*\.pdf)
Which matches the whole string. Can anyone help me out?
You may use this greedy regex either in replace or in match:
var s = 'parent&&&child&&&grandchild.pdf';
// using replace
var r = s.replace(/(.*)&&&.*\.pdf$/, '$1');
console.log(r);
//=> parent&&&child
// using match
var m = s.match(/(.*)&&&.*\.pdf$/);
if (m) {
  console.log(m[1]);
  //=> parent&&&child
}
By using the greedy pattern .* before &&&, we make sure to match the last instance of &&& in the input.
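To see why greediness matters here, the following sketch contrasts the greedy group with a lazy one on the same input. The variable names are chosen only for this illustration.

```javascript
var fileName = 'parent&&&child&&&grandchild.pdf';
// Greedy: (.*) grabs as much as possible, so the literal &&& lands on the last occurrence.
var greedyResult = fileName.replace(/(.*)&&&.*\.pdf$/, '$1');
// Lazy: (.*?) grabs as little as possible, so the literal &&& lands on the first occurrence.
var lazyResult = fileName.replace(/(.*?)&&&.*\.pdf$/, '$1');

console.log(greedyResult); // parent&&&child
console.log(lazyResult);   // parent
```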
You want to remove the last portion, so replace it:
var str = "parent&&&child&&&grandchild.pdf";
var result = str.replace(/&&&[^&]+\.pdf$/, '');
console.log(result);

Replace all character matches that are not escaped with backslash

I am using regex to replace ( in other regexes (or regexs?) with (?: to turn them into non-capturing groups. My expression assumes that no (?X structures are used and looks like this:
(
[^\\] - Not backslash character
|^ - Or string beginning
)
(?:
[\(] - a bracket
)
Unfortunately, this doesn't work when two matches are next to each other, as in this case: how((\s+can|\s+do)(\s+i)?)?
With lookbehinds, the solution is easy:
/(?<=[^\\]|^)[\(]/g
But JavaScript doesn't support lookbehinds, so what can I do? My searches didn't turn up any easy, universal lookbehind alternative.
Use lookbehind through reversal:
function revStr(str) {
  return str.split('').reverse().join('');
}
var rx = /[(](?=[^\\]|$)/g;
var subst = ":?(";
var data = "how((\\s+can|\\s+do)(\\s+i)?)?";
var res = revStr(revStr(data).replace(rx, subst));
document.getElementById("res").value = res;
<input id="res" />
Note that the regex pattern is also reversed so that we could use a look-ahead instead of a look-behind, and the substitution string is reversed, too. It becomes too tricky with longer regexps, but in this case, it is still not that unreadable.
One option is to do a two-pass replacement, with a token (I like unicode for this, as it's unlikely to appear elsewhere):
var s = 'how((\\s+can|\\s+do)(\\s+i)?)?';
var token = "\u1234";
// Look for the character preceding the ( you want
// to replace. We'll add the token after it.
var patt1 = /([^\\])(?=\()/g;
// The second pattern looks for the token and the (.
// We'll replace both with the desired string.
var patt2 = new RegExp(token + '\\(', 'g');
s = s.replace(patt1, "$1" + token).replace(patt2, "(?:");
console.log(s);
https://jsfiddle.net/48e75wqz/1/
(EDITED)
string example:
how((\s+can|\s+do)(\s+i)?)?
one line solution:
o='how((\\s+can|\\s+do)(\\s+i)?)?';
alert(o.replace(/\\\(/g,9e9).replace(/\(/g,'(?:').replace(/90{9}/g,'\\('))
result:
how(?:(?:\s+can|\s+do)(?:\s+i)?)?
and of course it works with strings like how((\s+\(can\)|\s+do)(\s+i)?)?
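A further lookbehind-free variant, which none of the answers above use, is to match escaped pairs and bare brackets in a single pass and decide in a replacement function. This is only a sketch: the name toNonCapturing is made up, and like the question it assumes the input contains no (?X structures and no "(" inside character classes.

```javascript
// Hypothetical helper: turn every unescaped "(" into "(?:" in one pass.
function toNonCapturing(pattern) {
  // \\. consumes a backslash together with the character it escapes,
  // so an escaped "\(" is never seen as a bare "(".
  return pattern.replace(/\\.|\(/g, function (m) {
    return m === '(' ? '(?:' : m;
  });
}

console.log(toNonCapturing('how((\\s+can|\\s+do)(\\s+i)?)?'));
// how(?:(?:\s+can|\s+do)(?:\s+i)?)?
console.log(toNonCapturing('how((\\s+\\(can\\)|\\s+do)(\\s+i)?)?'));
// how(?:(?:\s+\(can\)|\s+do)(?:\s+i)?)?
```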

Regular Expression - Match String Not Preceded by Another String (JavaScript)

I am trying to find a regular expression that will match a string when it's NOT preceded by another specific string (in my case, when it is NOT preceded by "http://"). This is in JavaScript, and I'm running on Chrome (not that it should matter).
The sample code is:
var str = 'http://www.stackoverflow.com www.stackoverflow.com';
alert(str.replace(new RegExp('SOMETHING','g'),'rocks'));
And I want to replace SOMETHING with a regular expression that means "match www.stackoverflow.com unless it's preceded by http://". The alert should then say "http://www.stackoverflow.com rocks", naturally.
Can anyone help? It feels like I tried everything found in previous answers, but nothing works. Thanks!
As JavaScript regex engines don't support 'lookbehind' assertions, it's not possible to do with plain regex. Still, there's a workaround, involving replace callback function:
var str = "As http://JavaScript regex engines don't support `lookbehind`, it's not possible to do with plain regex. Still, there's a workaround";
var adjusted = str.replace(/\S+/g, function(match) {
  return match.slice(0, 7) === 'http://'
    ? match
    : 'rocks';
});
console.log(adjusted);
You can actually create a generator for these functions:
var replaceIfNotPrecededBy = function(notPrecededBy, replacement) {
  return function(match) {
    return match.slice(0, notPrecededBy.length) === notPrecededBy
      ? match
      : replacement;
  };
};
... then use it in that replace instead:
var adjusted = str.replace(/\S+/g, replaceIfNotPrecededBy('http://', 'rocks'));
raina77ow's answer reflected the situation in 2013, but it is now outdated, as the proposal for lookbehind assertions got accepted into the ECMAScript spec in 2018.
See docs for it on MDN:
Characters: (?<!y)x
Meaning: Negative lookbehind assertion: Matches "x" only if "x" is not preceded by "y". For example, /(?<!-)\d+/ matches a number only if it is not preceded by a minus sign. /(?<!-)\d+/.exec('3') matches "3". /(?<!-)\d+/.exec('-3') match is not found because the number is preceded by the minus sign.
Therefore, you can now express "match www.stackoverflow.com unless it's preceded by http://" as /(?<!http:\/\/)www.stackoverflow.com/:
const str = 'http://www.stackoverflow.com www.stackoverflow.com';
console.log(str.replace(/(?<!http:\/\/)www.stackoverflow.com/g, 'rocks'));
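This also works for the sample input, although [^...] is a character class here, so the pattern only checks that the single character before the match is none of ( h t p s : / | ), and it cannot match at the very start of the string: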
var variable = 'http://www.example.com www.example.com';
alert(variable.replace(new RegExp('([^(http:\/\/)|(https:\/\/)])(www.example.com)','g'),'$1rocks'));
The alert says "http://www.example.com rocks".
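For engines without lookbehind support, one more sketch that is not taken from the answers above: capture the unwanted prefix as an optional group and decide in the replacement function. The variable names are arbitrary, and the dots are escaped here so that they match only literal dots.

```javascript
var text = 'http://www.stackoverflow.com www.stackoverflow.com';
// Group 1 holds "http://" when it is present and is undefined otherwise.
var replaced = text.replace(/(http:\/\/)?www\.stackoverflow\.com/g, function (match, prefix) {
  return prefix ? match : 'rocks'; // keep prefixed URLs, replace bare ones
});

console.log(replaced); // http://www.stackoverflow.com rocks
```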
