Lookbehind alternative with both lookbehind and lookahead - javascript

I'm looking for a regex to split user supplied strings on the : character but not when the user has escaped the colon \: or it's part of a url, e.g. https://stackoverflow...
In javascript the majority of browsers don't yet support lookbehinds. Is it possible to apply some other approach for the lookbehind part?
In Clojure/ClojureScript on Chrome (which does support lookbehinds) this regex does the trick:
#"(?<!\\):(?!//)"
but not in Safari (for example).

The main problem is that currently browsers aren't supporting the lookbehind, which is required to find and negate the prefix \ so we don't include \:.
One workaround (not very pretty, but it works) is to first substitute the \: with some "symbol" you know will not occur naturally in your text, do your split, and then substitute back any \:.
For example, this method will return an empty element "" if you have "::" in your string:
let regex = /:(?!\/\/)/
//original string literal \: has to be expressed as \\:
let str = "http://example.com::hello:dolly:12\\:00\\:PM";
//substitute out any \:
str = str.replace(/\\:/g,"<colon>"); //http://example.com::hello:dolly:12<colon>00<colon>PM
//now we split 'normally' without lookbehind
let arr = str.split(regex); //[ 'http://example.com', '', 'hello', 'dolly', '12<colon>00<colon>PM' ]
//substitute back \:
arr = arr.map(element => element.replace(/<colon>/g, "\\:")); //[ 'http://example.com', '', 'hello', 'dolly', '12\\:00\\:PM' ]
console.log(arr);
If you're just after non-empty elements you can do an arr.filter(Boolean) on it, or use @Skeeve's matching solution, as it's more elegant for this purpose.

An alternative could be to not search for the separator but to search for the elements:
var str="this:is\\:a:test:https://stackoverflow:80:test::test";
var elements= str.match(/((?:[^\\:]|\\:|:\/\/)+)/g);
// elements= [ "this", "is\\:a", "test", "https://stackoverflow", "80", "test", "test" ]
Disadvantages:
The elements may not be empty (observe the "+" in the regexp and how the empty element between the last 2 "test" is missing).
You forgot that a URL can contain multiple colons. What about `http://me:password@myhost.com:8080/path?value=d:f`?
Besides these I think it should work for you.
I think you can only overcome the disadvantages with a more or less sophisticated loop using regexp-exec.
P.S. I know the grouping isn't required here, but if you want to use it in regexp-exec, you'll need it.
P.P.S. Fixed the typo @chatnoir found
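One way to picture the "loop using regexp-exec" mentioned above is a small tokenizer that walks the string and starts a new element whenever it meets a bare colon. This is only a sketch: the name splitOnColon and the token pattern are illustrative assumptions, not something from the answers. It keeps empty elements, but it still splits on additional colons inside a URL (a port number, for instance).

```javascript
// Hypothetical helper (not from the answers above): split on ":" unless it is
// escaped as "\:" or starts "://". Empty elements are preserved.
function splitOnColon(input) {
  // Alternatives, in priority order: escaped colon, "://", bare colon,
  // a run of ordinary characters, a lone backslash.
  const token = /\\:|:\/\/|:|[^\\:]+|\\/g;
  const parts = [];
  let current = "";
  let m;
  while ((m = token.exec(input)) !== null) {
    if (m[0] === ":") {
      parts.push(current); // a bare colon closes the current element
      current = "";
    } else {
      current += m[0]; // everything else belongs to the current element
    }
  }
  parts.push(current);
  return parts;
}

console.log(splitOnColon("this:is\\:a:test:https://stackoverflow:80:test::test"));
// → [ 'this', 'is\\:a', 'test', 'https://stackoverflow', '80', 'test', '', 'test' ]
```

Because every character of the input matches one of the alternatives, the exec loop never skips text, which is what lets the empty element between the two colons survive.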

You might also make use of replace and pass a function as the second parameter.
You could use a pattern to match what you don't want and capture in a group what you want to keep. Then you can replace the part that you want to keep with a marker, just as in the approach of @chatnoir, and afterwards split on that marker.
:\/\/\S+|\\:|(:)
Explanation
:\/\/\S+ Match :// followed by 1+ times a non whitespace char
| Or
\\: Match \:
| Or
(:) Capture a : in group 1
let pattern = /:\/\/\S+|\\:|(:)/g;
let str = "string\\: or https://www.example.com:8000 or split:me or te\\:st or \\:test or notsplit\\:me:splitted or \\: or ftp://example.com :";
str = str.replace(pattern, function(match, group1) {
  return group1 === undefined ? match : "<split>";
});
console.log(str.split("<split>").filter(Boolean));

Related

JavaScript regex to get string between two characters except escaped without lookbehind

I am looking for a specific javascript regex without the new lookahead/lookbehind features of Javascript 2018 that allows me to select text between two asterisk signs but ignores escaped characters.
In the following example only the text "test" and the included escaped characters are supposed to be selected according to the rules above:
\*jdjdjdfdf*test*dfsdf\*adfasdasdasd*test**test\**sd* (Selected: "test", "test", "test\*")
During my research I found this solution, Regex, everything between two characters except escaped characters: /(?<!\\)(%.*?(?<!\\)%)/. However, it uses negative lookbehinds, which are supported in JavaScript 2018, but I need to support IE11 as well, so this solution doesn't work for me.
Then I found another approach which almost gets me there: Javascript: negative lookbehind equivalent?. I altered the answer of Kamil Szot to fit my needs: ((?!([\\])).|^)(\*.*?((?!([\\])).|^)\*) Unfortunately, it doesn't work when two asterisks ** are in a row.
I have already invested a lot of hours and can't seem to get it right, any help is appreciated!
An example with what I have so far is here: https://www.regexpal.com/?fam=117350
I need to use the regexp in a string.replace call (str.replace(regexp|substr, newSubStr|function);) so that I can wrap the found strings with a span element of a specific class.
You can use this regular expression:
(?:\\.|[^*])*\*((?:\\.|[^*])*)\*
Your code should then only take the (only) capture group of each match.
Like this:
var str = "\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*";
var regex = /(?:\\.|[^*])*\*((?:\\.|[^*])*)\*/g
var match;
while (match = regex.exec(str)) {
  console.log(match[1]);
}
If you need to replace the matches, for instance to wrap the matches in a span tag while also dropping the asterisks, then use two capture groups:
var str = "\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*";
var regex = /((?:\\.|[^*])*)\*((?:\\.|[^*])*)\*/g
var result = str.replace(regex, "$1<span>$2</span>");
console.log(result);
One thing to be careful with: when you use string literals in JavaScript tests, escape the backslash (with another backslash). If you don't do that, the string actually will not have a backslash! To really get the backslash in the in-memory string, you need to escape the backslash.
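Since the question asks for a span with a specific class, the two-group regex can also be combined with a replacement function. The following is a minimal sketch under that assumption; wrapStarred and the class name "hl" are made-up names for illustration. A callback is used instead of a replacement string so that a "$" inside the class name is not interpreted as a replacement pattern.

```javascript
// Hypothetical helper: wrap every *starred* run in a span with the given class.
// The pattern is the two-group regex from the answer above.
function wrapStarred(text, className) {
  const pattern = /((?:\\.|[^*])*)\*((?:\\.|[^*])*)\*/g;
  return text.replace(pattern, (whole, before, inner) =>
    `${before}<span class="${className}">${inner}</span>`);
}

console.log(wrapStarred("\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*", "hl"));
```

The sketch does not HTML-escape the captured text, so it is only suitable as-is for trusted input.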
const testStr = `\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*`;
const m = testStr.match(/\*(\\.)*t(\\.)*e(\\.)*s(\\.)*t(\\.)*\*/g).map(m => m.substr(1, m.length-2));
console.log(m);
More generic code:
const prepareRegExp = (word, delimiter = '\\*') => {
  const escaped = '(\\\\.)*';
  return new RegExp([
    delimiter,
    escaped,
    [...word].join(escaped),
    escaped,
    delimiter
  ].join``, 'g');
};
const testStr = `\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*`;
const m = testStr
.match(prepareRegExp('test'))
.map(m => m.substr(1, m.length-2));
console.log(m);
https://instacode.dev/#Y29uc3QgcHJlcGFyZVJlZ0V4cCA9ICh3b3JkLCBkZWxpbWl0ZXIgPSAnXFwqJykgPT4gewogIGNvbnN0IGVzY2FwZWQgPSAnKFxcXFwuKSonOwogIHJldHVybiBuZXcgUmVnRXhwKFsKICAgIGRlbGltaXRlciwKICAgIGVzY2FwZWQsCiAgICBbLi4ud29yZF0uam9pbihlc2NhcGVkKSwKICAgIGVzY2FwZWQsCiAgICBkZWxpbWl0ZXIKICBdLmpvaW5gYCwgJ2cnKTsKfTsKCmNvbnN0IHRlc3RTdHIgPSBgXFwqamRqZGpkZmRmKnRlc3QqZGZzZGZcXCphZGZhc2Rhc2Rhc2QqdGVzdCoqdGVzdFxcKipzZCpgOwpjb25zdCBtID0gdGVzdFN0cgoJLm1hdGNoKHByZXBhcmVSZWdFeHAoJ3Rlc3QnKSkKCS5tYXAobSA9PiBtLnN1YnN0cigxLCBtLmxlbmd0aC0yKSk7Cgpjb25zb2xlLmxvZyhtKTs=

Regex to replace non A-Z characters plus specific strings

I'm trying to build a regex (for Javascript) that basically does /([\s\W])+/g, with the addition of specific strings (case-insensitive).
Right now I'm doing it like:
var a = 'Test 123 Enterprises PTY-Ltd&Llc.';
a.toLowerCase()
.replace('pty','')
.replace('ltd','')
.replace('llc','')
.replace(/([\s\W])+/g, '');
// Result: 'test123enterprises'
Of course, I'd love to be able to wrap this all into one replace() method, but I can't find any documentation online on how to achieve this via regex. Is this possible?
Try this:
a.toLowerCase().replace(/pty|ltd|llc|\W+/g,'');
It uses the pipe which is basically an OR operator for regular expressions.
You can use a logical OR :
/pty|ltd|llc|([\s\W])+/g
You can use an alternation operator | to specify alternatives:
var re = /\b(?:pty|ltd|llc)\b|\W+/gi;
var str = 'Test 123 Enterprises PTY-Ltd&Llc.';
var result = str.replace(re, '').toLowerCase();
alert(result);
To remove pty, ltd and llc as whole words, you need to use word boundary \b. Also, you need no capturing group since you are not using it. Also, \W includes \s, no need to repeat it.
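If the list of words to strip is not fixed, the same alternation can be built at run time. This is a sketch only; stripWords and escapeRegExp are hypothetical helper names, and the words are escaped first so that any regex metacharacters in them are matched literally.

```javascript
// Escape regex metacharacters so each word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: remove the given whole words and all non-word characters.
function stripWords(input, words) {
  const alternation = words.map(escapeRegExp).join("|");
  const pattern = new RegExp("\\b(?:" + alternation + ")\\b|\\W+", "gi");
  return input.replace(pattern, "").toLowerCase();
}

console.log(stripWords("Test 123 Enterprises PTY-Ltd&Llc.", ["pty", "ltd", "llc"]));
// → test123enterprises
```

Note that \W does not cover the underscore, so underscores are kept by this pattern.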

How does this regexp work?

RegExes give me headaches. I have a very simple regex but I don't understand how it works.
The code:
var str= "startBlablablablablaend";
var regex = /start(.*?)end/;
var match = str.match(regex);
console.log( match[0] ); //startBlablablablablaend
console.log( match[1] ); //Blablablablabla
What I ultimately want would be the second one, in other words the text between the two delimiters (start,end).
My questions:
How does it work? (each character explained please)
Why does it match two different things?
Is there a better way to get match[1]?
If I want to get all the texts between all the start-end instances, how would I go about it?
For the last question, what I mean:
var str = "startBla1end startBla2end startBla3end";
var regex = /start(.*?)end/gmi;
var match = str.match(regex);
console.log( match ); // [ "startBla1end" , "startBla2end" , "startBla3end" ]
What I need is:
console.log( match ); // [ "Bla1" , "Bla2" , "Bla3" ];
Thanks :)
How does it work?
start matches start in the string
(.*?) non-greedy match of any characters (as few as possible), captured as group 1
end matches the end in the string
Matching
startBlablablablablaend
|
start
startBlablablablablaend
     |
     .
startBlablablablablaend
      |
      .
# and so on, since the quantifier * matches any number of characters; ? makes the match non-greedy
startBlablablablablaend
                    |
                    end
Why does it match two different things?
It doesn't match two different things.
match[0] will contain the entire match
match[1] will contain the first capture group (the part matched in the first parentheses)
Is there a better way to get match[1]?
Short answer: no.
If you are using a language other than JavaScript, it's possible using lookarounds:
(?<=start)(.*?)(?=end)
#Blablablablabla
Note: this won't work in JavaScript engines that lack lookbehind support ((?<=...) is a positive lookbehind, which only became part of the language with ECMAScript 2018).
Last Question
The best that you can get from a single match statement would be
var str = "startBla1end startBla2end startBla3end";
var regex = /start(.*?)(?=end)/gmi;
var match = str.match(regex);
console.log( match ); // [ "startBla1" , "startBla2" , "startBla3" ]
You don't need to put much effort into it.
Try this regex:
start(.*)end
You can also look at this Stack Overflow question, which has already been answered:
Regular Expression to get a string between two strings in Javascript
Hope it helps.
To solve your last question, you can split up your string and iterate:
var str = "startBla1end startBla2end startBla3end";
var str_array = str.split(" ");
Then iterate over each element of the str_array using your existing code to extract each Bla# substring.
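Another way to collect only the captured text for every start-end pair is to call RegExp.exec in a loop and keep match[1] each time. The sketch below assumes the same delimiters as the question; extractBetween is a made-up name.

```javascript
// Collect capture group 1 from every match with a RegExp.exec loop.
function extractBetween(text) {
  const regex = /start(.*?)end/gi; // the g flag makes exec advance through the string
  const results = [];
  let match;
  while ((match = regex.exec(text)) !== null) {
    results.push(match[1]); // match[0] is the whole match, match[1] the group
  }
  return results;
}

console.log(extractBetween("startBla1end startBla2end startBla3end"));
// → [ 'Bla1', 'Bla2', 'Bla3' ]
```

Unlike splitting on spaces, this does not depend on how the start-end instances are separated.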

Regular Expression - Match String Not Preceded by Another String (JavaScript)

I am trying to find a regular expression that will match a string when it's NOT preceded by another specific string (in my case, when it is NOT preceded by "http://"). This is in JavaScript, and I'm running on Chrome (not that it should matter).
The sample code is:
var str = 'http://www.stackoverflow.com www.stackoverflow.com';
alert(str.replace(new RegExp('SOMETHING','g'),'rocks'));
And I want to replace SOMETHING with a regular expression that means "match www.stackoverflow.com unless it's preceded by http://". The alert should then say "http://www.stackoverflow.com rocks", naturally.
Can anyone help? It feels like I tried everything found in previous answers, but nothing works. Thanks!
As JavaScript regex engines don't support 'lookbehind' assertions, it's not possible to do with plain regex. Still, there's a workaround, involving replace callback function:
var str = "As http://JavaScript regex engines don't support `lookbehind`, it's not possible to do with plain regex. Still, there's a workaround";
var adjusted = str.replace(/\S+/g, function(match) {
  return match.slice(0, 7) === 'http://'
    ? match
    : 'rocks'
});
console.log(adjusted);
You can actually create a generator for these functions:
var replaceIfNotPrecededBy = function(notPrecededBy, replacement) {
  return function(match) {
    return match.slice(0, notPrecededBy.length) === notPrecededBy
      ? match
      : replacement;
  }
};
... then use it in that replace instead:
var adjusted = str.replace(/\S+/g, replaceIfNotPrecededBy('http://', 'rocks'));
raina77ow's answer reflected the situation in 2013, but it is now outdated, as the proposal for lookbehind assertions got accepted into the ECMAScript spec in 2018.
See docs for it on MDN:
Characters: (?<!y)x
Meaning: Negative lookbehind assertion: Matches "x" only if "x" is not preceded by "y". For example, /(?<!-)\d+/ matches a number only if it is not preceded by a minus sign. /(?<!-)\d+/.exec('3') matches "3". /(?<!-)\d+/.exec('-3') match is not found because the number is preceded by the minus sign.
Therefore, you can now express "match www.stackoverflow.com unless it's preceded by http://" as /(?<!http:\/\/)www\.stackoverflow\.com/ (the dots are escaped so that they match only literal dots):
const str = 'http://www.stackoverflow.com www.stackoverflow.com';
console.log(str.replace(/(?<!http:\/\/)www\.stackoverflow\.com/g, 'rocks'));
This also works:
var variable = 'http://www.example.com www.example.com';
alert(variable.replace(new RegExp('([^(http:\/\/)|(https:\/\/)])(www.example.com)','g'),'$1rocks'));
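The alert says "http://www.example.com rocks". Note, however, that [^(http:\/\/)|(https:\/\/)] is a character class rather than a group: it only checks that the single character before www.example.com is not one of ( ) h t p s : / |, so the pattern needs some character before the match and will not match at the very start of the string.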
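For engines without lookbehind, another common workaround is to capture the unwanted prefix optionally and decide inside a replacement function whether to replace. The following is a sketch under that assumption, using the question's sample string; replaceUnlessPrefixed is a hypothetical name.

```javascript
// Sketch of a lookbehind-free alternative: capture the prefix optionally and
// decide in the callback whether to replace.
function replaceUnlessPrefixed(text) {
  const pattern = /(http:\/\/)?www\.stackoverflow\.com/g;
  return text.replace(pattern, (match, prefix) =>
    // prefix is undefined when the optional group did not take part in the match
    prefix === undefined ? "rocks" : match);
}

console.log(replaceUnlessPrefixed("http://www.stackoverflow.com www.stackoverflow.com"));
// → http://www.stackoverflow.com rocks
```

Because the prefix is an optional group rather than a single preceding character, this also handles a match at the very start of the string.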

Regex to get string between curly braces

Unfortunately, despite having tried to learn regex at least one time a year for as many years as I can remember, I always forget as I use them so infrequently. This year my new year's resolution is to not try and learn regex again - So this year to save me from tears I'll give it to Stack Overflow. (Last Christmas remix).
I want to pass in a string in this format {getThis}, and be returned the string getThis. Could anyone be of assistance in helping to stick to my new year's resolution?
Related questions on Stack Overflow:
How can one turn regular quotes (i.e. ', ") into LaTeX/TeX quotes (i.e. `', ``'')
Regex: To pull out a sub-string between two tags in a string
Regex to replace all \n in a String, but no those inside [code] [/code] tag
Try
/{(.*?)}/
That means, match any character between { and }, but don't be greedy - match the shortest string which ends with } (the ? stops * being greedy). The parentheses let you extract the matched portion.
Another way would be
/{([^}]*)}/
This matches any character except a } char (another way of not being greedy)
/\{([^}]+)\}/
/ - delimiter
\{ - opening literal brace escaped because it is a special character used for quantifiers eg {2,3}
( - start capturing
[^}] - character class consisting of
^ - not
} - a closing brace (no escaping necessary because special characters in a character class are different)
+ - one or more of the character class
) - end capturing
\} - the closing literal brace
/ - delimiter
If your string will always be of that format, a regex is overkill:
>>> var g='{getThis}';
>>> g.substring(1,g.length-1)
"getThis"
substring(1 means to start one character in (just past the first {) and ,g.length-1) means to take characters until (but not including) the character at the string length minus one. This works because the position is zero-based, i.e. g.length-1 is the last position.
For readers other than the original poster: If it has to be a regex, use /{([^}]*)}/ if you want to allow empty strings, or /{([^}]+)}/ if you want to only match when there is at least one character between the curly braces. Breakdown:
/: start the regex pattern
{: a literal curly brace
(: start capturing
[: start defining a class of characters to capture
^}: "anything other than }"
]: OK, that's our whole class definition
*: any number of characters matching that class we just defined
): done capturing
}: a literal curly brace must immediately follow what we captured
/: end the regex pattern
Try this:
/[^{\}]+(?=})/g
For example
Welcome to RegExr v2.1 by #{gskinner.com}, #{ssd.sd} hosted by Media Temple!
will return gskinner.com, ssd.sd.
Try this
let path = "/{id}/{name}/{age}";
const paramsPattern = /[^{}]+(?=})/g;
let extractParams = path.match(paramsPattern);
console.log("extractParams", extractParams) // prints all the names between {} = ["id", "name", "age"]
Here's a simple solution using javascript replace
var st = '{getThis}';
st = st.replace(/\{|\}/gi,''); // "getThis"
As the accepted answer above points out the original problem is easily solved with substring, but using replace can solve the more complicated use cases
If you have a string like "randomstring999[fieldname]"
You use a slightly different pattern to get fieldname
var nameAttr = "randomstring999[fieldname]";
var justName = nameAttr.replace(/.*\[|\]/gi,''); // "fieldname"
This one works in Textmate and it matches everything in a CSS file between the curly brackets.
\{(\s*?.*?)*?\}
selector {.
.
matches here
including white space.
.
.}
If you want to further be able to return the content, then wrap it all in one more set of parentheses like so:
\{((\s*?.*?)*?)\}
and you can access the contents via $1.
This also works for functions, but I haven't tested it with nested curly brackets.
You want to use regex lookahead and lookbehind. This will give you only what is inside the curly braces:
(?<=\{)(.*?)(?=\})
I have looked at the other answers, and a vital piece of logic seems to be missing from them: select everything between two CONSECUTIVE brackets, but NOT the brackets themselves.
So, here is my answer:
\{([^{}]+)\}
This regex gets all of the strings enclosed in curly braces that occur in a string, rather than just the first occurrence:
/\{([^}]+)\}/gm
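To get only the text inside the braces for every occurrence, the capture group can be read from each match. This sketch assumes an environment that provides String.prototype.matchAll (ES2020); braceContents is a made-up name.

```javascript
// Hypothetical helper: return the text inside every {...} pair.
function braceContents(text) {
  // matchAll requires the g flag and yields one match array per occurrence.
  return Array.from(text.matchAll(/\{([^}]+)\}/g), (m) => m[1]);
}

console.log(braceContents("/{id}/{name}/{age}"));
// → [ 'id', 'name', 'age' ]
```

In older environments the same result can be obtained with a RegExp.exec loop that pushes match[1] on each iteration.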
var re = /{(.*)}/;
var m = "{helloworld}".match(re);
if (m != null)
  console.log(m[0].replace(re, '$1'));
The simpler .replace(/.*{(.*)}.*/, '$1') unfortunately returns the entire string if the regex does not match. The above code snippet can more easily detect a match.
Try this one; according to http://www.regextester.com it works normally for JS.
([^{]*?)(?=\})
This one matches everything even if it finds multiple closing curly braces in the middle:
\{([\s\S]*)\}
Example:
{
"foo": {
"bar": 1,
"baz": 1,
}
}
You can use this regex recursion to match everything in between, even another {} (like a JSON text):
\{([^()]|())*\}
This also helped me while trying to solve someone's problem:
splitting out the contents inside curly braces ({}) that follow a pattern like
{'day': 1, 'count': 100}.
For example:
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main()
{
    // string to be searched
    string s = "{'day': 1, 'count': 100}, {'day': 2, 'count': 100}";
    // regex pattern to search for
    regex e("\\{[a-z':, 0-9]+\\}");
    regex_token_iterator<string::iterator> rend;
    regex_token_iterator<string::iterator> a(s.begin(), s.end(), e);
    while (a != rend) cout << " [" << *a++ << "]";
    cout << endl;
    return 0;
}
Output:
[{'day': 1, 'count': 100}] [{'day': 2, 'count': 100}]
You can use the String.slice() method.
let str = "{something}";
str = str.slice(1,-1) // something
