Regex to replace non A-Z characters plus specific strings - javascript

I'm trying to build a regex (for Javascript) that basically does /([\s\W])+/g, with the addition of specific strings (case-insensitive).
Right now I'm doing it like:
var a = 'Test 123 Enterprises PTY-Ltd&Llc.';
a.toLowerCase()
.replace('pty','')
.replace('ltd','')
.replace('llc','')
.replace(/([\s\W])+/g, '');
// Result: 'test123enterprises'
Of course, I'd love to be able to wrap this all into one replace() method, but I can't find any documentation online on how to achieve this via regex. Is this possible?

Try this:
a.toLowerCase().replace(/pty|ltd|llc|\W+/g,'');
It uses the pipe which is basically an OR operator for regular expressions.

You can use a logical OR :
/pty|ltd|llc|([\s\W])+/g

You can use an alternation operator | to specify alternatives:
var re = /\b(?:pty|ltd|llc)\b|\W+/gi;
var str = 'Test 123 Enterprises PTY-Ltd&Llc.';
var result = str.replace(re, '').toLowerCase();
alert(result);
To remove pty, ltd and llc only as whole words, use the word boundary \b. The capturing group is unnecessary because it is never used, and \W already includes \s, so there is no need to repeat it.
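As a rough sketch of how the alternation could be built from a list of words instead of being typed by hand: the helper name normalizeName and the SUFFIXES list are made up for illustration, and the words are assumed to contain no regex metacharacters.

```javascript
// Hypothetical list of words to drop (taken from the question's example).
const SUFFIXES = ['pty', 'ltd', 'llc'];

// Hypothetical helper: one replace() call for both the words and the non-word characters.
function normalizeName(name, words = SUFFIXES) {
  // \b(?:pty|ltd|llc)\b removes the listed words only as whole words;
  // \W+ removes every run of non-word characters (whitespace included).
  // Assumption: the words are plain letters, so no escaping is done here.
  const re = new RegExp('\\b(?:' + words.join('|') + ')\\b|\\W+', 'gi');
  return name.replace(re, '').toLowerCase();
}

console.log(normalizeName('Test 123 Enterprises PTY-Ltd&Llc.')); // test123enterprises
```

Because of the word boundaries, a name such as 'Ptyx Ltd' keeps 'ptyx' and only loses 'Ltd'.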


Lookbehind alternative with both lookbehind and lookahead

I'm looking for a regex to split user supplied strings on the : character but not when the user has escaped the colon \: or it's part of a url, e.g. https://stackoverflow...
In javascript the majority of browsers don't yet support lookbehinds. Is it possible to apply some other approach for the lookbehind part?
In clojure/ Clojurescript on Chrome (which does support lookbehinds) this regex does the trick:
#"(?<!\\):(?!//)"
but not in Safari (for example).
The main problem is that currently browsers aren't supporting the lookbehind, which is required to find and negate the prefix \ so we don't include \:.
One workaround (not very pretty, but it works) is to first substitute the \: with some "symbol" you know will not occur naturally in your text, do your split, and then substitute back any \:.
For example, this method will return an empty element "" if you have "::" in your string:
let regex = /:(?!\/\/)/
//original string literal \: has to be expressed as \\:
let str = "http://example.com::hello:dolly:12\\:00\\:PM";
//substitute out any \:
str = str.replace(/\\:/g,"<colon>"); //http://example.com::hello:dolly:12<colon>00<colon>PM
//now we split 'normally' without lookbehind
let arr = str.split(regex); //[ 'http://example.com', '', 'hello', 'dolly', '12<colon>00<colon>PM' ]
//substitute back \:
arr = arr.map(element => element.replace(/<colon>/g, "\\:")); //[ 'http://example.com', '', 'hello', 'dolly', '12\\:00\\:PM' ]
console.log(arr);
If you're only after non-empty elements you can run arr.filter(Boolean) on the result, or use @Skeeve's matching solution, which is more elegant for this purpose.
An alternative could be to not search for the separator but to search for the elements:
var str="this:is\\:a:test:https://stackoverflow:80:test::test";
var elements= str.match(/((?:[^\\:]|\\:|:\/\/)+)/g);
// elements= [ "this", "is\\:a", "test", "https://stackoverflow", "80", "test", "test" ]
Disadvantages:
The elements may not be empty (observe the "+" in the regexp); note how the empty element between the last two "test" entries is missing.
You forgot that a URL can contain multiple colons. What about `http://me:password@myhost.com:8080/path?value=d:f`?
Besides these I think it should work for you.
I think you can only overcome the disadvantages with a more or less sophisticated loop using regexp-exec.
P.S. I know the grouping isn't required here, but if you want to use it in regexp-exec, you'll need it.
P.P.S. Fixed the typo @chatnoir found.
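The "loop using regexp-exec" mentioned in this answer could look roughly like the sketch below. The function name splitOnColon is hypothetical. It only addresses the empty-element disadvantage: colons inside a URL after the "://" (ports, passwords) are still treated as separators.

```javascript
// Hypothetical helper: split on ":" but not on "\:" or "://", keeping empty elements.
function splitOnColon(input) {
  // Alternatives are tried left to right: an escaped colon or a "://"
  // is consumed without splitting; only a bare ":" is captured in group 1.
  const token = /\\:|:\/\/|(:)/g;
  const parts = [];
  let start = 0;
  let m;
  while ((m = token.exec(input)) !== null) {
    if (m[1] !== undefined) {
      // A real separator: cut the element that ends just before it.
      parts.push(input.slice(start, m.index));
      start = m.index + 1;
    }
  }
  parts.push(input.slice(start)); // the element after the last separator
  return parts;
}

console.log(splitOnColon('this:is\\:a:test:https://stackoverflow:80:test::test'));
// [ 'this', 'is\\:a', 'test', 'https://stackoverflow', '80', 'test', '', 'test' ]
```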
You might also make use of replace and pass a function as the second parameter.
You could use a pattern that matches what you don't want to split on and captures in a group the colon you do want to split on. Then you can replace each captured colon with a marker, just as in @chatnoir's approach, and afterwards split on that marker.
:\/\/\S+|\\:|(:)
Explanation
:\/\/\S+ Match :// followed by 1+ times a non whitespace char
| Or
\\: Match \:
| Or
(:) Capture a : in group 1
let pattern = /:\/\/\S+|\\:|(:)/g;
let str = "string\\: or https://www.example.com:8000 or split:me or te\\:st or \\:test or notsplit\\:me:splitted or \\: or ftp://example.com :";
str = str.replace(pattern, function(match, group1) {
return group1 === undefined ? match : "<split>"
});
console.log(str.split("<split>").filter(Boolean));

Remove specific words except last in JavaScript?

I have a sentence that I would like to have only the last 'and' remaining, and remove the others.
"Lions, and tigers, and bears, and elephants", and I would like to turn this into:
"Lions, tigers, bears, and elephants".
I have tried using a regex pattern like str = str.replace(/and([^and]*)$/, '$1'); which obviously didn't work. Thanks.
Use this regex:
and (?=.*and)
"and " matches any and followed by a space. The space is part of the match so that it is removed as well, which prevents two consecutive spaces in the result.
(?=.*and) is a lookahead, meaning the match only succeeds if it is followed by .*and, i.e. if another and appears later in the string.
Use this code:
str = str.replace(/and (?=.*and)/g, '');
You can use a positive look-ahead (?=...), to see if there is another and ahead of the current match. You also need to make the regex global with g.
function removeAllButLastAnd(str) {
return str.replace(/and\s?(?=.*and)/g, '');
}
console.log(removeAllButLastAnd("Lions, and tigers, and bears, and elephants"));
var multipleAnd = "Lions, and tigers, and bears, and elephants";
var lastAndIndex = multipleAnd.lastIndexOf(' and');
var onlyLastAnd = multipleAnd.substring(0, lastAndIndex).replace(/ and/gi, '')+multipleAnd.substring(lastAndIndex);
console.log(onlyLastAnd);
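The patterns above match the letters and anywhere, including inside words such as "Sand". A possible variant that adds word boundaries is sketched here; the function name keepLastAnd is made up, and the \b handling is an addition that goes beyond the original answers.

```javascript
// Hypothetical variant: only whole-word "and" is considered.
function keepLastAnd(str) {
  // \band\b matches "and" only as a whole word; the lookahead requires
  // another whole-word "and" later in the string, so the last one survives.
  return str.replace(/\band\b\s*(?=.*\band\b)/g, '');
}

console.log(keepLastAnd('Lions, and tigers, and bears, and elephants'));
// Lions, tigers, bears, and elephants
console.log(keepLastAnd('Sand, and land, and sea'));
// Sand, land, and sea
```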

Replace all besides the Regex group?

I was given a task to do which requires a long time to do.
The image say it all :
This is what I have : (x100 times):
And I need to extract this value only
How can I capture the value ?
I have made it with this regex :
DbCommand.*?\("(.*?)"\);
As you can see it does work :
And after the replace function (replace to $1) I do get the pure value :
but the problem is that I need only the pure values and not the rest of the unmatched group :
Question : In other words :
How can I get the purified result like :
Eservices_Claims_Get_Pending_Claims_List
Eservices_Claims_Get_Pending_Claims_Step1
Here is my code at Online regexer
Is there any way of replacing "all besides the matched group" ?
p.s. I know there are other ways of doing it but I prefer a regex solution ( which will also help me to understand regex better)
Unfortunately, JavaScript doesn't understand lookbehind. If it did, you could change your regular expression to match .*? preceded (lookbehind) by DbCommand.*?\(" and followed (lookahead) by "\);.
With that solution denied, I believe the cleanest solution is to perform two matches:
// you probably want to build the regexps dynamically
var regexG = /DbCommand.*?\("(.*?)"\);/g;
var regex = /DbCommand.*?\("(.*?)"\);/;
var matches = str.match(regexG).map(function(item){ return item.match(regex)[1] });
console.log(matches);
// ["Eservices_Claims_Get_Pending_Claims_List", "Eservices_Claims_Get_Pending_Claims_Step1"]
DEMO: http://jsbin.com/aqaBobOP/2/edit
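The same extraction can also be done in a single pass with a RegExp exec loop, which returns an empty array instead of failing when nothing matches. This is only a sketch: the function name extractCommandNames and the sample input are assumptions, since the original source text was only shown as an image.

```javascript
// Assumed sample input; the real source from the question is not available.
const source = [
  'DbCommand cmd1 = db.GetStoredProcCommand("Eservices_Claims_Get_Pending_Claims_List");',
  'int unrelated = 0;',
  'DbCommand cmd2 = db.GetStoredProcCommand("Eservices_Claims_Get_Pending_Claims_Step1");'
].join('\n');

// Hypothetical helper: collect capture group 1 of every match.
function extractCommandNames(text) {
  const re = /DbCommand.*?\("(.*?)"\);/g;
  const names = [];
  let m;
  // With the g flag, each exec() call resumes after the previous match.
  while ((m = re.exec(text)) !== null) {
    names.push(m[1]);
  }
  return names;
}

console.log(extractCommandNames(source).join('\n'));
// Eservices_Claims_Get_Pending_Claims_List
// Eservices_Claims_Get_Pending_Claims_Step1
```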
You should be able to do a global replace of:
public static DataTable.*?{.*?DbCommand.*?\("(.*?)"\);.*?}
All I've done is changed it to match the whole block including the function definition using a bunch of .*?s.
Note: Make sure your regex settings are such that the dot (.) matches all characters, including newlines.
In fact if you want to close up all whitespace, you can slap a \s* on the front and replace with $1\n:
\s*public static DataTable.*?{.*?DbCommand.*?\("(.*?)"\);.*?}
Using your test case: http://regexr.com?37ibi
You can use this (without the ignore case and multiline option, with a global search):
pattern: (?:[^D]+|\BD|D(?!bCommand ))+|DbCommand [^"]+"([^"]+)
replace: $1\n
Try running a replace over the whole document using this expression:
^(?: |\t)*(?:(?!DbCommand).)*$
You will then be left with only the lines that contain the string DbCommand.
You can then remove the spaces in between by replacing:
\r?\n\s* with \n globally.
Here is an example of this working: http://regexr.com?37ic4

JS XRegExp Replace all non characters

My objective is to replace every character in a string that is not a dash (-), a number, or a letter in any language. All of #!()[] and all other signs should be replaced with an empty string. Occurrences of - should not be replaced.
I have used for this the XRegExp plugin but it seems I cannot find the magic solution :)
I have tried it like this:
var txt = "Ad СТИНГ (ALI) - Englishmen In New York";
var regex = new XRegExp('\\p{^N}\\p{^L}',"g");
var b = XRegExp.replace(txt, regex, "")
but the result is : AСТИН(AL EnglishmeINeYork ... which is kind of weird
If I try to add also the condition for not removing the '-' character leads to make the RegEx invalid.
\\p{^N}\\p{^L} means a non-number followed by a non-letter.
Try [^\\p{N}\\p{L}-] that means a non-number, non-letter, non-dash.
A jsfiddle where to do some tests... The third XRegExp is the one you asked.
\p{^N}\p{^L}
is a non-number followed by a non-letter. You probably meant to say a character that is neither a letter nor a number:
[^\p{N}\p{L}]
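For comparison, engines that support Unicode property escapes (ES2018, with the u flag) accept the same character class in a native regex, without XRegExp. This is an alternative sketch, not what the answers above used.

```javascript
const txt = 'Ad СТИНГ (ALI) - Englishmen In New York';

// \p{L} = any letter, \p{N} = any number; the "u" flag enables \p{...}.
// The trailing "-" inside the class is a literal dash, so dashes are kept.
const cleaned = txt.replace(/[^\p{L}\p{N}-]/gu, '');

console.log(cleaned); // AdСТИНГALI-EnglishmenInNewYork
```

Spaces are removed as well, since they are neither letters, numbers, nor dashes.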
// all non letters/numbers in a string => /[^a-zA-Z0-9]/g
I don't know XRegExp,
but with a plain JS RegExp (ASCII letters and digits only) you can remove them with
txt.replace(/[^a-zA-Z0-9]/g,'')

java script Regular Expressions patterns problem

My problem starts like this:
var str='0|31|2|03|.....|4|2007'
str=str.replace(/[^|]\d*[^|]/,'5');
so the output becomes "0|5|2|03|....|4|2007", i.e. it replaces 31 with 5.
But this doesn't work for replacing other segments when I change the code like this:
str=str.replace(/[^|]{2}\d*[^|]/,'6');
This doesn't change 2 to 6.
What am I missing here? Any help?
I think a regular expression is a bad solution for that problem. I'd rather do something like this:
var str = '0|31|2|03|4|2007';
var segments = str.split("|");
segments[1] = "35";
segments[2] = "123";
Can't think of a good way to solve this with a regexp.
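The split-based idea can be completed by joining the segments back together. A minimal sketch, with the function name replaceSegment chosen only for illustration:

```javascript
// Hypothetical helper: replace the segment at a zero-based index.
function replaceSegment(str, index, value) {
  const segments = str.split('|'); // break the string at every pipe
  segments[index] = value;         // overwrite the chosen segment
  return segments.join('|');       // rebuild the pipe-separated string
}

console.log(replaceSegment('0|31|2|03|4|2007', 1, '5')); // 0|5|2|03|4|2007
console.log(replaceSegment('0|31|2|03|4|2007', 2, '6')); // 0|31|6|03|4|2007
```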
Here is a specific regex solution which replaces the number following the first | pipe symbol with the number 5:
var re = /^((?:\d+\|){1})\d+/;
return text.replace(re, '$15');
If you want to replace the digits following the third |, simply change the {1} portion of the regex to {3}
Here is a generalized function that will replace any given number slot (zero-based index), with a specified new number:
function replaceNthNumber(text, n, newnum) {
var re = new RegExp("^((?:\\d+\\|){"+ n +'})\\d+');
return text.replace(re, '$1'+ newnum);
}
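One thing to watch with '$1' + newnum is that the result, such as '$15', looks like a two-digit group reference. A variant that passes a replacer function avoids relying on how that string is interpreted. The name replaceNthNumberSafe is hypothetical, and the sketch assumes the same input format as above (digits separated by pipes).

```javascript
// Hypothetical variant of replaceNthNumber using a replacer function.
function replaceNthNumberSafe(text, n, newnum) {
  // Group 1 captures the first n "digits|" segments; \d+ is the slot to replace.
  const re = new RegExp('^((?:\\d+\\|){' + n + '})\\d+');
  // The replacer receives the whole match and group 1; no "$1" string is needed.
  return text.replace(re, (match, prefix) => prefix + newnum);
}

console.log(replaceNthNumberSafe('0|31|2|03|4|2007', 1, '5')); // 0|5|2|03|4|2007
console.log(replaceNthNumberSafe('0|31|2|03|4|2007', 2, 6));   // 0|31|6|03|4|2007
```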
Firstly, you don't have to escape | in the character set, because it doesn't have any special meaning in character sets.
Secondly, you don't put quantifiers in character sets.
And finally, to create a global matching expression, you have to use the g flag.
[^\|] means anything but a '|', so in your case it only matches a digit. So it will only match anything with 2 or more digits.
Second you should put the {2} outside of the []-brackets
I'm not sure what you want to achieve here.
