Regular Expression - Match String Not Preceded by Another String (JavaScript) - javascript

I am trying to find a regular expression that will match a string when it's NOT preceded by another specific string (in my case, when it is NOT preceded by "http://"). This is in JavaScript, and I'm running on Chrome (not that it should matter).
The sample code is:
var str = 'http://www.stackoverflow.com www.stackoverflow.com';
alert(str.replace(new RegExp('SOMETHING','g'),'rocks'));
And I want to replace SOMETHING with a regular expression that means "match www.stackoverflow.com unless it's preceded by http://". The alert should then say "http://www.stackoverflow.com rocks", naturally.
Can anyone help? It feels like I tried everything found in previous answers, but nothing works. Thanks!

As JavaScript regex engines don't support 'lookbehind' assertions, it's not possible to do this with a plain regex. Still, there's a workaround involving a replace callback function:
var str = "As http://JavaScript regex engines don't support `lookbehind`, it's not possible to do with plain regex. Still, there's a workaround";
var adjusted = str.replace(/\S+/g, function(match) {
return match.slice(0, 7) === 'http://'
? match
: 'rocks'
});
console.log(adjusted);
You can actually create a generator for these functions:
var replaceIfNotPrecededBy = function(notPrecededBy, replacement) {
return function(match) {
return match.slice(0, notPrecededBy.length) === notPrecededBy
? match
: replacement;
}
};
... then use it in that replace instead:
var adjusted = str.replace(/\S+/g, replaceIfNotPrecededBy('http://', 'rocks'));

raina77ow's answer reflected the situation in 2013, but it is now outdated, as the proposal for lookbehind assertions got accepted into the ECMAScript spec in 2018.
See docs for it on MDN:
Characters
Meaning
(?<!y)x
Negative lookbehind assertion: Matches "x" only if "x" is not preceded by "y". For example, /(?<!-)\d+/ matches a number only if it is not preceded by a minus sign. /(?<!-)\d+/.exec('3') matches "3". /(?<!-)\d+/.exec('-3') match is not found because the number is preceded by the minus sign.
Therefore, you can now express "match www.stackoverflow.com unless it's preceded by http://" as /(?<!http:\/\/)www\.stackoverflow\.com/ (the dots are escaped so that they match only a literal dot):
const str = 'http://www.stackoverflow.com www.stackoverflow.com';
console.log(str.replace(/(?<!http:\/\/)www\.stackoverflow\.com/g, 'rocks'));
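If https:// links should be left alone too, the lookbehind can take an optional s. This is a minimal sketch, assuming an engine with ES2018 lookbehind support; the names sample and bareHost are made up for illustration.

```javascript
// Sketch: negative lookbehind that rejects both http:// and https:// prefixes.
// Assumes the engine supports ES2018 lookbehind assertions.
const sample = 'http://www.stackoverflow.com https://www.stackoverflow.com www.stackoverflow.com';

// The dots are escaped so that each one matches only a literal dot.
const bareHost = /(?<!https?:\/\/)www\.stackoverflow\.com/g;

const replaced = sample.replace(bareHost, 'rocks');
console.log(replaced);
// → http://www.stackoverflow.com https://www.stackoverflow.com rocks
```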

This also works:
var variable = 'http://www.example.com www.example.com';
alert(variable.replace(new RegExp('([^(http:\/\/)|(https:\/\/)])(www.example.com)','g'),'$1rocks'));
The alert says "http://www.example.com rocks".
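One caveat, as far as I can tell: the bracketed part of that pattern is a character class, so it requires some single preceding character that is not one of ( h t p s : / ) |, and it cannot match at the very start of the string. A sketch of an alternative that also avoids lookbehind is to capture the scheme optionally and decide in a callback. The function name replaceUnlessPrefixed is hypothetical.

```javascript
// Sketch: optionally capture the scheme, then keep the match untouched
// when a scheme was present and replace it otherwise.
function replaceUnlessPrefixed(text) {
  return text.replace(/(https?:\/\/)?www\.example\.com/g, (match, scheme) =>
    scheme ? match : 'rocks'
  );
}

console.log(replaceUnlessPrefixed('http://www.example.com www.example.com'));
// → http://www.example.com rocks
console.log(replaceUnlessPrefixed('www.example.com'));
// → rocks
```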

Related

Javascript regex to get string between two characters except escaped without lookbehind

I am looking for a specific javascript regex without the new lookahead/lookbehind features of Javascript 2018 that allows me to select text between two asterisk signs but ignores escaped characters.
In the following example only the text "test" and the included escaped characters are supposed to be selected according to the rules above:
\*jdjdjdfdf*test*dfsdf\*adfasdasdasd*test**test\**sd* (Selected: "test", "test", "test\*")
During my research I found this solution, Regex, everything between two characters except escaped characters: /(?<!\\)(%.*?(?<!\\)%)/ but it uses negative lookbehinds, which are supported in JavaScript 2018; I need to support IE11 as well, so this solution doesn't work for me.
Then I found another approach which almost gets there for me: Javascript: negative lookbehind equivalent?. I altered the answer of Kamil Szot to fit my needs: ((?!([\\])).|^)(\*.*?((?!([\\])).|^)\*) Unfortunately it doesn't work when two asterisks ** are in a row.
I have already invested a lot of hours and can't seem to get it right, any help is appreciated!
An example with what I have so far is here: https://www.regexpal.com/?fam=117350
I need to use the regexp in a string.replace call (str.replace(regexp|substr, newSubStr|function)) so that I can wrap the found strings with a span element of a specific class.
You can use this regular expression:
(?:\\.|[^*])*\*((?:\\.|[^*])*)\*
Your code should then only take the (only) capture group of each match.
Like this:
var str = "\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*";
var regex = /(?:\\.|[^*])*\*((?:\\.|[^*])*)\*/g
var match;
while (match = regex.exec(str)) {
console.log(match[1]);
}
If you need to replace the matches, for instance to wrap the matches in a span tag while also dropping the asterisks, then use two capture groups:
var str = "\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*";
var regex = /((?:\\.|[^*])*)\*((?:\\.|[^*])*)\*/g
var result = str.replace(regex, "$1<span>$2</span>");
console.log(result);
One thing to be careful with: when you use string literals in JavaScript tests, escape the backslash (with another backslash). If you don't do that, the string actually will not have a backslash! To really get the backslash in the in-memory string, you need to escape the backslash.
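The point about backslashes in string literals can be checked directly. This small sketch compares an unescaped literal, an escaped one, and String.raw; the variable names are only for illustration.

```javascript
// In an ordinary string literal, \* is an unrecognized escape,
// so the backslash is silently dropped.
const unescaped = '\*test\*';

// Doubling the backslash keeps it in the in-memory string.
const escaped = '\\*test\\*';

// String.raw keeps backslashes exactly as typed.
const raw = String.raw`\*test\*`;

console.log(unescaped.length); // → 6
console.log(escaped.length);   // → 8
console.log(raw === escaped);  // → true
```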
const testStr = `\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*`;
const m = testStr.match(/\*(\\.)*t(\\.)*e(\\.)*s(\\.)*t(\\.)*\*/g).map(m => m.substr(1, m.length-2));
console.log(m);
More generic code:
const prepareRegExp = (word, delimiter = '\\*') => {
const escaped = '(\\\\.)*';
return new RegExp([
delimiter,
escaped,
[...word].join(escaped),
escaped,
delimiter
].join``, 'g');
};
const testStr = `\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*`;
const m = testStr
.match(prepareRegExp('test'))
.map(m => m.substr(1, m.length-2));
console.log(m);
https://instacode.dev/#Y29uc3QgcHJlcGFyZVJlZ0V4cCA9ICh3b3JkLCBkZWxpbWl0ZXIgPSAnXFwqJykgPT4gewogIGNvbnN0IGVzY2FwZWQgPSAnKFxcXFwuKSonOwogIHJldHVybiBuZXcgUmVnRXhwKFsKICAgIGRlbGltaXRlciwKICAgIGVzY2FwZWQsCiAgICBbLi4ud29yZF0uam9pbihlc2NhcGVkKSwKICAgIGVzY2FwZWQsCiAgICBkZWxpbWl0ZXIKICBdLmpvaW5gYCwgJ2cnKTsKfTsKCmNvbnN0IHRlc3RTdHIgPSBgXFwqamRqZGpkZmRmKnRlc3QqZGZzZGZcXCphZGZhc2Rhc2Rhc2QqdGVzdCoqdGVzdFxcKipzZCpgOwpjb25zdCBtID0gdGVzdFN0cgoJLm1hdGNoKHByZXBhcmVSZWdFeHAoJ3Rlc3QnKSkKCS5tYXAobSA9PiBtLnN1YnN0cigxLCBtLmxlbmd0aC0yKSk7Cgpjb25zb2xlLmxvZyhtKTs=

Simple Regexp Pattern matching with escape characters

Hopefully a simple one!
I've been trying to get this to work for several hours now but am having no luck, as I'm fairly new to regexp I may be missing something very obvious here and was hoping someone could point me in the right direction. The pattern I want to match is as follows: -
At least 1 or more numbers + "##" + at least 1 or more numbers + "##" + at least 1 or more numbers
so a few examples of valid combinations would be: -
1##2##3
123#123#123
0##0##0
A few invalid combinations would be
a##b##c
1## ##1
I've got the following regexp like so: -
[\d+]/#/#[\d+]/#/#[\d+]
And am using it like so (note the double slashes as its inside a string): -
var patt = new RegExp("[\\d+]/#/#[\\d+]/#/#[\\d+]");
if(newFieldValue!=patt){newFieldValue=="no match"}
I also tried these but still nothing: -
if(!patt.text(newFieldValue)){newFieldValue==""}
if(patt.text(newFieldValue)){}else{newFieldValue==""}
But nothing I try is matching, where am I going wrong here?
Any pointers gratefully received, cheers!
1) I can't see any reason to use the RegExp constructor over a RegExp literal for your case. (The former is used primarily where the pattern needs to be dynamic, i.e. is contributed to by variables.)
2) You don't need a character class if there's only one type of character in it (so \d+ not [\d+])
3) You are not actually checking the pattern against the input. You don't apply RegEx by creating an instance of it and using ==; you need to use test() or match() to see if a match is made (the former if you want to check only, not capture)
4) You have == where you mean to assign (=)
if (!/\d+##\d+##\d+/.test(newFieldValue)) newFieldValue = "no match";
You put + inside the brackets, so you're matching a single character that's either a digit or +, not a sequence of digits. I also don't understand why you have / before each #, your description doesn't mention anything about this character.
Use:
var patt = /\d+##\d+##\d+/;
You should use the test method of the patt regex
if (!patt.test(newFieldValue)){ newFieldValue = "no match"; }
once you have a valid regular expression.
Try this regex :
^(?:\d+##){2}\d+$
Demo: http://regex101.com/r/mE8aG7
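The difference between the unanchored pattern and the anchored one shows up when the input has extra characters around the digits. A short sketch using the examples from the question, plus one made-up input ('x1##2##3y'):

```javascript
// Unanchored: succeeds if the pattern occurs anywhere in the input.
const unanchored = /\d+##\d+##\d+/;
// Anchored: the whole input must consist of the pattern.
const anchored = /^(?:\d+##){2}\d+$/;

const samples = ['1##2##3', '0##0##0', 'a##b##c', '1## ##1', 'x1##2##3y'];
for (const s of samples) {
  console.log(s, unanchored.test(s), anchored.test(s));
}
// 'x1##2##3y' passes the unanchored test but fails the anchored one.
```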
With the following regex
[\d+]/#/#[\d+]/#/#[\d+]
You would only match things like:
+/#/#5/#/#+
+/#/#+/#/#+
0/#/#0/#/#0
because the regex engine reads each [\d+] as a character class that matches exactly one character (a digit or a literal +) and each / as a literal slash.
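This can be confirmed by running the pattern from the question against those strings and against one of the intended inputs. A quick sketch:

```javascript
// The pattern exactly as built in the question.
const original = new RegExp("[\\d+]/#/#[\\d+]/#/#[\\d+]");

console.log(original.test('+/#/#5/#/#+')); // → true
console.log(original.test('0/#/#0/#/#0')); // → true
console.log(original.test('1##2##3'));     // → false, there are no slashes in the input
```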
Something like:
((-\s)?\d+##)+\d+

JS XRegExp Replace all non characters

My objective is to replace all characters in a string which are not a dash (-), a number, or a letter in any language. All of the #!()[], and all other signs should be replaced with an empty string. Occurrences of - should not be replaced.
I have used the XRegExp plugin for this but it seems I cannot find the magic solution :)
I have tried this:
var txt = "Ad СТИНГ (ALI) - Englishmen In New York";
var regex = new XRegExp('\\p{^N}\\p{^L}',"g");
var b = XRegExp.replace(txt, regex, "")
but the result is : AСТИН(AL EnglishmeINeYork ... which is kind of weird
If I also try to add the condition for not removing the '-' character, the RegEx becomes invalid.
\\p{^N}\\p{^L} means a non-number followed by a non-letter.
Try [^\\p{N}\\p{L}-] that means a non-number, non-letter, non-dash.
\p{^N}\p{^L}
is a non-number followed by a non-letter. You probably meant to say a character that is neither a letter nor a number:
[^\p{N}\p{L}]
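The difference between "a non-number followed by a non-letter" and "a character that is neither" can be reproduced without XRegExp. This sketch assumes an engine that supports native Unicode property escapes (the u flag, ES2018); it is not the XRegExp API, only the same idea in built-in syntax.

```javascript
const txt = 'Ad СТИНГ (ALI) - Englishmen In New York';

// Two tokens in sequence: one non-number followed by one non-letter.
const sequence = /\P{N}\P{L}/gu;
// One negated class: a single character that is not a number, letter or dash.
const charClass = /[^\p{N}\p{L}\-]/gu;

const sequenceResult = txt.replace(sequence, '');
const classResult = txt.replace(charClass, '');

console.log(sequenceResult); // → AСТИН(AL EnglishmeINeYork
console.log(classResult);    // → AdСТИНГALI-EnglishmenInNewYork
```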
// all non letters/numbers in a string => /[^a-zA-Z0-9]/g
I don't know XRegExp,
but in plain JS RegExp you can replace them with
b.replace(/[^a-zA-Z0-9]/g,'')

Remove all special characters with RegExp

I would like a RegExp that will remove all special characters from a string. I am trying something like this but it doesn’t work in IE7, though it works in Firefox.
var specialChars = "!##$^&%*()+=-[]\/{}|:<>?,.";
for (var i = 0; i < specialChars.length; i++) {
stringToReplace = stringToReplace.replace(new RegExp("\\" + specialChars[i], "gi"), "");
}
A detailed description of the RegExp would be helpful as well.
var desired = stringToReplace.replace(/[^\w\s]/gi, '')
As was mentioned in the comments it's easier to do this as a whitelist - replace the characters which aren't in your safelist.
The caret (^) character is the negation of the set [...], gi means global and case-insensitive (the latter is a bit redundant but I wanted to mention it) and the safelist in this example is digits, word characters, underscores (\w) and whitespace (\s).
Note that if you still want to exclude a set, including things like slashes and special characters you can do the following:
var outString = sourceString.replace(/[`~!##$%^&*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, '');
Take special note that in order to also include the "minus" character, you need to escape it with a backslash, as in the latter group. If you don't, it will also select 0-9, which is probably undesired.
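The effect of an unescaped dash is easy to see on a small input. In this sketch, + - = inside a class is read as the range from + (U+002B) to = (U+003D), which happens to contain the digits; the input string is made up.

```javascript
const input = 'a1+b=2-c';

// Unescaped: the dash forms a range from '+' to '=', which includes 0-9.
const unescapedDash = /[+-=]/g;
// Escaped: the class holds exactly three characters: +, - and =.
const escapedDash = /[+\-=]/g;

console.log(input.replace(unescapedDash, '')); // → abc
console.log(input.replace(escapedDash, ''));   // → a1b2c
```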
Plain JavaScript character classes such as \w do not handle Unicode letters (engines that support ES2018 additionally offer \p{...} property escapes under the u flag).
Do not use [^\w\s]: it will remove letters with accents (like àèéìòù), and letters from scripts such as Cyrillic or Chinese will be removed completely.
You really don't want to remove these letters together with all the special characters. You have two options:
Add in your regex all the special characters you don't want remove, for example: [^èéòàùì\w\s].
Have a look at xregexp.com. XRegExp adds base support for Unicode matching via the \p{...} syntax.
var str = "Їжак::: résd,$%& adùf"
var search = XRegExp('([^?<first>\\pL ]+)');
var res = XRegExp.replace(str, search, '',"all");
console.log(res); // returns "Їжак::: resd,adf"
console.log(str.replace(/[^\w\s]/gi, '') ); // returns " rsd adf"
console.log(str.replace(/[^\wèéòàùì\s]/gi, '') ); // returns " résd adùf"
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.js"></script>
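In engines that support ES2018 Unicode property escapes, a similar result can be had without a library. This is a sketch under that assumption; the input is the string from the snippet above with two Chinese characters appended for illustration, and \p{M} is included so that accents stored as combining marks survive as well.

```javascript
const str = 'Їжак::: résd,$%& adùf 漢字';

// Keep letters (\p{L}), combining marks (\p{M}), numbers (\p{N}) and
// whitespace; remove everything else. The u flag enables \p{...}.
const cleaned = str.replace(/[^\p{L}\p{M}\p{N}\s]/gu, '');

console.log(cleaned); // letters from every script are kept, punctuation is gone
```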
Using \W or [a-z0-9] won't work for non-English languages such as Chinese.
It's better to list all the special characters in the regex and strip them from the given string (note the escaped dash: an unescaped ?-_ would be read as a range that includes the uppercase letters):
str.replace(/[~`!##$%^&*()+={}\[\];:\'\"<>.,\/\\\?\-_]/g, '');
The first solution does not work for any UTF-8 alphabet. (It will cut text such as Їжак). I have managed to create a function which does not use RegExp and use good UTF-8 support in the JavaScript engine. The idea is simple if a symbol is equal in uppercase and lowercase it is a special character. The only exception is made for whitespace.
function removeSpecials(str) {
var lower = str.toLowerCase();
var upper = str.toUpperCase();
var res = "";
for(var i=0; i<lower.length; ++i) {
if(lower[i] != upper[i] || lower[i].trim() === '')
res += str[i];
}
return res;
}
Update: Please note, that this solution works only for languages where there are small and capital letters. In languages like Chinese, this won't work.
Update 2: I came up with the original solution while working on a fuzzy search. If you are also trying to remove special characters to implement search functionality, there is a better approach: use a transliteration library to produce a string of only Latin characters, and then a simple regexp will do all the magic of removing special characters. (This will work for Chinese too, and you get the side benefit of making Tromsø == Tromso.)
I use RegexBuddy for debugging my regexes; it supports almost all languages and is very useful. Then I copy/paste for the target language.
Terrific tool and not very expensive.
So I copy/pasted your regex, and your issue is that [ and ] are special characters in regex, so you need to escape them. So the regex should be: /!##$^&%*()+=-[\x5B\x5D]\/{}|:<>?,./im
str.replace(/\s|[0-9_]|\W|[#$%^&*()]/g, "") is what I ended up with.
But some people do it much more simply, like str.replace(/[\W_]/g,"");
@Seagull's answer (https://stackoverflow.com/a/26482552/4556619)
looks good, but you get the string "undefined" in the result when there are certain special (Turkish) characters. See the example below.
let str="bənövşəyi 😟пурпурный İdÖĞ";
I slightly improved it and patched it with an undefined check.
function removeSpecials(str) {
let lower = str.toLowerCase();
let upper = str.toUpperCase();
let res = "",i=0,n=lower.length,t;
for(i; i<n; ++i) {
if(lower[i] !== upper[i] || lower[i].trim() === ''){
t=str[i];
if(t!==undefined){
res +=t;
}
}
}
return res;
}
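The undefined values seem to come from case conversion changing the string length: for example, lowercasing İ (U+0130) yields two code units, so the indexes of lower and str drift apart. A sketch of a variant that avoids this by comparing one character at a time; the name removeSpecialsPerChar is hypothetical.

```javascript
// Lowercasing U+0130 produces 'i' plus a combining dot: two code units.
console.log('\u0130'.toLowerCase().length); // → 2

function removeSpecialsPerChar(str) {
  let res = '';
  // for...of walks the string by code point, so emoji are not split in half.
  for (const ch of str) {
    // Keep characters that have distinct cases, and keep whitespace.
    if (ch.toLowerCase() !== ch.toUpperCase() || ch.trim() === '') {
      res += ch;
    }
  }
  return res;
}

console.log(removeSpecialsPerChar('bənövşəyi 😟пурпурный İdÖĞ'));
```

As with the original idea, this only recognizes letters from scripts that have upper and lower case.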
text.replace(/[`~!##$%^*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, '');
Why don't you do something like this:
re = /^[a-z0-9 ]*$/i;
var isValid = re.test(yourInput);
to check whether your input contains any special characters?

Using jQuery to find a substring

Say you have a string: "The ABC cow jumped over XYZ the moon" and you want to use jQuery to get the substring between the "ABC" and "XYZ", how would you do this? The substring should be "cow jumped over". Many thanks!
This has nothing to do with jQuery, which is primarily for DOM traversal and manipulation. You want a simple regular expression:
var str = "The ABC cow jumped over XYZ the moon";
var sub = str.replace(/^.*ABC(.*)XYZ.*$/m, '$1');
The idea is you're using a String.replace with a regular expression which matches your opening and closing delimiters, and replacing the whole string with the part matched between the delimiters.
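The first argument is a regular expression. The trailing m makes ^ and $ match at the start and end of each line rather than only at the start and end of the whole string; it does not make . match newlines, so the text between ABC and XYZ must sit on a single line unless you use [\s\S] (or the s flag, where supported) in place of the dot. The rest breaks down as follows: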
^ start at the beginning of the string
.* a series of 0 or more characters
ABC your opening delimiter
(.*) match a series of 0 or more characters
XYZ your closing delimiter
.* a series of 0 or more characters
$ match to the end of the string
The second parameter, the replacement string, is '$1'. replace will substitute in parenthesized submatches from your regular expression - the (.*) portion from above. Thus the return value is the entire string replaced with the part between the delimiters.
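A variation on the same idea is to read the capture group with match instead of replacing the whole string. This is only a sketch: extractBetween and escapeRegExp are hypothetical helpers, [\s\S] is used so the captured text may span lines, the quantifier is lazy so the nearest closing delimiter wins, and the result is trimmed because the raw capture includes the surrounding spaces.

```javascript
// Escape characters that are special in a regex, so that the
// delimiters are treated as literal text.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function extractBetween(text, open, close) {
  const re = new RegExp(escapeRegExp(open) + '([\\s\\S]*?)' + escapeRegExp(close));
  const m = text.match(re);
  return m ? m[1].trim() : null; // null when either delimiter is missing
}

console.log(extractBetween('The ABC cow jumped over XYZ the moon', 'ABC', 'XYZ'));
// → cow jumped over
```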
You may not need to use jQuery on this one. I'd do something like this:
function between(str, left, right) {
if( !str || !left || !right ) return null;
var left_loc = str.indexOf(left);
var right_loc = str.indexOf(right, left_loc + left.length); // look for the closing delimiter only after the opening one
if( left_loc == -1 || right_loc == -1 ) return null;
return str.substring(left_loc + left.length, right_loc);
}
No guarantees the above code is bug-free, but the idea is to use the standard substring() function. In my experience these types of functions work the same across all browsers.
Meagar, your explanation is great, and clearly explains how it works.
Just a few minor questions:
Are the () parentheses required ONLY as a way to indicate a submatch for the second parameter of the replace function, or would this also identify the submatches: /^.*ABC.*XYZ.*$/ but not work for what we are trying to do in this case?
Does this regular expression have 7 submatches:
^
.*
ABC
.*
XYZ
.*
$
Does the $1 mean to use the first parenthesized submatch? At first I thought it might mean to use the second submatch in the series (the first being $0).
Thanks,
Steve
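These questions can be checked in a console. As far as I know, only parenthesized parts become numbered submatches, $1 refers to the first of them, and the whole match is written $& in a replacement string (JavaScript has no $0). A short sketch:

```javascript
const str = 'The ABC cow jumped over XYZ the moon';

// One pair of parentheses: index 0 is the whole match, index 1 the group.
const withGroup = str.match(/^.*ABC(.*)XYZ.*$/);
console.log(withGroup.length); // → 2
console.log(withGroup[1]);     // → ' cow jumped over ' (spaces included)

// No parentheses: only the whole match is reported.
const withoutGroup = str.match(/^.*ABC.*XYZ.*$/);
console.log(withoutGroup.length); // → 1

// $& inserts the whole match, $1 the first group.
const demo = str.replace(/^.*ABC(.*)XYZ.*$/, '[$&|$1]');
console.log(demo);
```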
Just to show you how you would use jQuery and meagar's regex. Let's say that you've got an HTML page with the following P tag:
<p id="grabthis">The ABC cow jumped over XYZ the moon</p>
To grab the string, you would use the following jQuery/JavaScript mix (sounds kind of stupid, since jQuery is JavaScript, but see jQuery as a JavaScript DOM library):
$(document).ready(function() { // Wait until the document has been fully loaded
var pStr=$("#grabthis").text(); // Grab the text from the P tag and put it into a JS variable
var subStr=pStr.replace(/^.*ABC(.*)XYZ.*$/m, '$1'); // Run the regex to grab the middle string
alert(subStr); // Output the grabbed middle string
});
Or the shorter version:
$(document).ready(function() {
alert($("#grabthis").text().replace(/^.*ABC(.*)XYZ.*$/m, '$1'));
});
The replace function is a JavaScript function. I hope this clears the confusion.
