Regex: match word (but delete commas after OR before) - javascript

I have tried to delete an item from a string divided with commas:
var str="this,is,unwanted,a,test";
If I do a simple str.replace('unwanted',''); I end up with two commas.
If I do a more complex str.replace('unwanted','').replace(',,',','); it might work.
But the problem comes when the str is like this:
var str="unwanted,this,is,a,test"; // or "...,unwanted"
However, I could check whether the character at index 0 or at str.length - 1 is a comma and then remove it.
But I really don't think this is the way to go; it seems absurd to need two replaces and two ifs to achieve what I want.
I have heard that regex can do powerful stuff, but I simply can't understand it no matter how hard I try
Important Notes:
It should match the comma after OR before (not both), or we will end up with
"this,is,,a,test"
There are no spaces between commas

How about something less flaky than a regex for this sort of replacement?
str = str
.split(',')
.filter(function(token) { return token !== 'unwanted' })
.join(',');
However if you are convinced a regex is the best way...
str = str.replace(/(^|,)?unwanted(,|$)?/g, function(all, leading, trailing) {
return leading && trailing ? ',' : '';
});
(thanks Logan F. Smyth.)
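A quick sketch that runs the callback above at the three positions the question worries about (start, middle, end). The wrapper name removeUnwanted is hypothetical; the regex and callback are copied unchanged from this answer.

```javascript
// Hypothetical wrapper around the regex replacement from the answer above.
function removeUnwanted(str) {
  return str.replace(/(^|,)?unwanted(,|$)?/g, function (all, leading, trailing) {
    // Commas on both sides: keep exactly one of them.
    // At the very start or end there is no comma to capture, so `leading` or
    // `trailing` is falsy and the whole match (word plus its one comma) is dropped.
    return leading && trailing ? ',' : '';
  });
}

console.log(removeUnwanted('this,is,unwanted,a,test')); // "this,is,a,test"
console.log(removeUnwanted('unwanted,this,is,a,test')); // "this,is,a,test"
console.log(removeUnwanted('this,is,a,test,unwanted')); // "this,is,a,test"
```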

Since Alex hasn't fixed this in his solution, I wanted to get a fully functional version up somewhere.
var unwanted = 'unwanted';
var regex = new RegExp('(^|,)' + unwanted + '(,|$)', 'g');
str = str.replace(regex, function(a, pre, suf) {
return pre && suf ? ',' : '';
});
The only thing to be careful of when dynamically building a regex is that the 'unwanted' variable can't contain anything that could be interpreted as a regex pattern.
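If the item might contain regex metacharacters, one common approach is to escape it before building the pattern. A minimal sketch; escapeRegExp and removeItem are hypothetical helper names, and the escaping character class follows the widely used idiom from MDN's regular-expressions guide.

```javascript
// Hypothetical helper: backslash-escape every character with special meaning in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: remove `item` from a comma-separated list using the pattern above.
function removeItem(str, item) {
  var regex = new RegExp('(^|,)' + escapeRegExp(item) + '(,|$)', 'g');
  return str.replace(regex, function (all, pre, suf) {
    return pre && suf ? ',' : '';
  });
}

console.log(removeItem('a,b.c,d', 'b.c')); // "a,d"
console.log(removeItem('a,bxc,d', 'b.c')); // "a,bxc,d" (the "." is literal, so "bxc" stays)
```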
There are way easier ways to parse this though, as Alex mentioned. Don't resort to regular expressions unless you have to.

Related

regex to remove certain characters at the beginning and end of a string

Let's say I have a string like this:
...hello world.bye
But I want to remove the first three dots and replace .bye with !
So the output should be
hello world!
it should only match if both conditions apply (... at the beginning and .bye at the end)
And I'm trying to use js replace method. Could you please help? Thanks
First match the dots, capture and lazy-repeat any character until you get to .bye, and match the .bye. Then, you can replace with the first captured group, plus an exclamation mark:
const str = '...hello world.bye';
console.log(str.replace(/\.\.\.(.*?)\.bye/, '$1!'));
The lazy-repeat is there to ensure you don't match too much, for example:
const str = `...hello world.bye
...Hello again! Goodbye.`;
console.log(str.replace(/\.\.\.(.*?)\.bye/g, '$1!'));
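The question asks for a match only when the string both starts with ... and ends with .bye. Anchoring the pattern with ^ and $ enforces that. A minimal sketch; the function name tidy is hypothetical.

```javascript
// Hypothetical helper: strip a leading "..." and turn a trailing ".bye" into "!",
// but only when both are present. ^ and $ anchor the pattern to the whole string.
function tidy(str) {
  return str.replace(/^\.\.\.(.*)\.bye$/, '$1!');
}

console.log(tidy('...hello world.bye')); // "hello world!"
console.log(tidy('hello world.bye'));    // unchanged: no leading "..."
console.log(tidy('...hello world'));     // unchanged: no trailing ".bye"
```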
You don't actually need a regex to do this. Although it's a bit inelegant, the following should work fine (obviously the function can be called whatever makes sense in the context of your application):
function manipulate(string) {
if (string.slice(0, 3) === "..." && string.slice(-4) === ".bye") {
return string.slice(3, -4) + "!";
}
return string;
}
(Apologies if I made any stupid errors with indexing there, but the basic idea should be obvious.)
This, to me at least, has the advantage of being easier to reason about than a regex. Of course if you need to deal with more complicated cases you may reach the point where a regex is best - but I personally wouldn't bother for a simple use-case like the one mentioned in the OP.
Your regex would be
const rx = /\.\.\.([\s\S]*?)\.bye/g
const out = '\n\nfoobar...hello world.bye\nfoobar...ok.bye\n...line\nbreak.bye\n'.replace(rx, `$1!`)
console.log(out)
In English: find three dots, lazily capture anything in a group, and end with .bye.
The replacement uses the first capture group $1 and appends ! to it.
An arguably simpler solution:
const str = '...hello world.bye'
const match = /\.\.\.(.+)\.bye/.exec(str)
const formatted = match ? match[1] + '!' : str
console.log(formatted)
If the string doesn't match the regex it will just return the string.

How to split a string by a character not directly preceded by a character of the same type?

Let's say I have a string: "We.need..to...split.asap". What I would like to do is to split the string by the delimiter ., but I only wish to split by the first . and include any recurring .s in the succeeding token.
Expected output:
["We", "need", ".to", "..split", "asap"]
In other languages, I know that this is possible with a look-behind /(?<!\.)\./ but Javascript unfortunately does not support such a feature.
I am curious to see your answers to this question. Perhaps there is a clever use of look-aheads that presently evades me?
I was considering reversing the string, then re-reversing the tokens, but that seems like too much work for what I am after... plus controversy: How do you reverse a string in place in JavaScript?
Thanks for the help!
Here's a variation of the answer by guest271314 that handles more than two consecutive delimiters:
var text = "We.need.to...split.asap";
var re = /(\.*[^.]+)\./;
var items = text.split(re).filter(function(val) { return val.length > 0; });
It relies on the fact that if the split expression includes a capture group, the captured items are included in the returned array. These capture groups are actually the only thing we are interested in; the other tokens are empty strings (apart from a trailing remainder such as asap), which we filter out.
EDIT: Unfortunately there's perhaps one slight bug with this. If the text to be split starts with a delimiter, that will be included in the first token. If that's an issue, it can be remedied with:
var re = /(?:^|(\.*[^.]+))\./;
var items = text.split(re).filter(function(val) { return !!val; });
(I think this regex is ugly and would welcome an improvement.)
You can do this without any lookaheads:
var subject = "We.need.to....split.asap";
var regex = /\.?(\.*[^.]+)/g;
var matches, output = [];
while(matches = regex.exec(subject)) {
output.push(matches[1]);
}
document.write(JSON.stringify(output));
It seemed like it would work in one line, as it did on https://regex101.com/r/cO1dP3/1, but it had to be expanded in the code above because, with the /g flag, .match returns only the full matches and not the capturing groups (the correct data was in the capturing groups, but we couldn't access them directly without the loop above).
See: JavaScript Regex Global Match Groups
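In engines that support String.prototype.matchAll (ES2020), the capture groups of a global regex can be collected without the explicit exec loop. A sketch reusing this answer's pattern:

```javascript
var subject = "We.need.to....split.asap";
var regex = /\.?(\.*[^.]+)/g; // matchAll requires the g flag

// matchAll yields one match object per hit; element [1] is the capture group.
var output = Array.from(subject.matchAll(regex), function (m) { return m[1]; });

console.log(JSON.stringify(output)); // ["We","need","to","...split","asap"]
```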
An alternative solution with the original one liner (plus one line) is:
document.write(JSON.stringify(
"We.need.to....split.asap".match(/\.?(\.*[^.]+)/g)
.map(function(s) { return s.replace(/^\./, ''); })
));
Take your pick!
Note: This answer can't handle more than 2 consecutive delimiters, since it was written according to the example in the revision 1 of the question, which was not very clear about such cases.
var text = "We.need.to..split.asap";
// split "." if followed by "."
var res = text.split(/\.(?=\.)/).map(function(val, key) {
// if the chunk does not begin with ".", split on every "."
// otherwise split only on the last "." (one not followed by another ".")
return val[0] !== "." ? val.split(/\./) : val.split(/\.(?!.*\.)/)
});
// concat arrays `res[0]` , `res[1]`
res = res[0].concat(res[1]);
document.write(JSON.stringify(res));
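For reference, lookbehind assertions were added to JavaScript in ES2018 and are available in current engines, so where you can rely on them the pattern from the question works directly. A sketch using the question's input:

```javascript
var text = "We.need..to...split.asap";

// Split on a "." that is NOT immediately preceded by another "." (negative lookbehind).
var parts = text.split(/(?<!\.)\./);

console.log(JSON.stringify(parts)); // ["We","need",".to","..split","asap"]
```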

Need a regex that finds "string" but not "[string]"

I'm trying to build a regular expression that parses a string and skips things in brackets.
Something like
string = "A bc defg hi [hi] jkl mnop.";
The .match() should return "hi" but not [hi]. I've spent 5 hours running through REs but I'm throwing in the towel.
Also this is for javascript or jquery if that matters.
Any help is appreciated. Also I'm working on getting my questions formatted correctly : )
EDIT:
OK, I just had a eureka moment and figured out that the original RegExp I was using actually did work. But when I replaced the matches with [matches], it simply replaced the first match in the string... over and over. I thought this was my regex refusing to skip the brackets, but after a long time trying almost all of the solutions below, I realized that I was derping hardcore.
When .replace was working its magic it was on the first match, so I quite simply added a space to the end of the result word as follows:
var result = string.match(regex);
var modifiedResult = '[' + result[0].toString() + ']';
string = string.replace(result[0].toString() + ' ', modifiedResult + ' ');
This got it to stop targeting the original word in the string and stop adding a new set of brackets to it with every match. Thank you all for your help. I am going to give answer credit to the post that prodded me in the right direction.
preprocess the target string by removing everything between brackets before trying to match your RE
string = "A bc defg hi [hi] jkl mnop."
tmpstring = string.replace(/\[.*\]/, "")
then apply your RE to tmpstring
Correction: made the match for brackets lazy (non-greedy) per nhahtd's comment below, and also made the RE global:
string = "A bc defg hi [hi] jkl mnop."
tmpstring = string.replace(/\[.*?\]/g, "")
You don't necessarily need regex for this. Simply use string manipulation:
var arr = string.split("[");
var final = arr[0] + arr[1].split("]")[1];
If there are multiple bracketed expressions, use a loop:
while (string.indexOf("[") != -1){
var arr = string.split("[");
string = arr[0] + arr.slice(1).join("[").split("]").slice(1).join("]");
}
Using only Regular Expressions, you can use:
hi(?!])
as an example.
Look here about negative lookahead: http://www.regular-expressions.info/lookaround.html
At the time of writing, JavaScript did not support negative lookbehind (it was added in ES2018).
I used http://regexpal.com/ to test, abcd[hi]jkhilmnop as test data, hi(?!]) as the regex to find. It matched 'hi' without matching '[hi]'. Basically it matched the 'hi' so long as there was not a following ']' character.
This can, of course, be expanded if needed. It has the benefit of not requiring any pre-processing of the string.
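With lookbehind available (ES2018 and later), both sides of the word can be checked in one pattern. A sketch; the \b word boundaries are an addition not in the answer above, included so that "hi" inside a longer word is skipped.

```javascript
var string = "A bc defg hi [hi] jkl mnop.";

// (?<!\[) : not preceded by "[";  (?!\]) : not followed by "]";  \b keeps "hi" a whole word.
var matches = string.match(/(?<!\[)\bhi\b(?!\])/g);

console.log(JSON.stringify(matches)); // ["hi"] (only the unbracketed occurrence)
```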
r"\[(.*)\]"
Just play arounds with this if you wanto to use regular expressions.
What do you want to do with it? If you want to selectively replace parts like "hi" except when it's "[hi]", then I often use a system where I first match what I want to avoid and then what I want to match; if it matches what I want to avoid, I return the match unchanged, otherwise I return the processed match.
Like this:
return string.replace(/(\[\w+\])|(\w+)/g, function(all, m1, m2) {return m1 || m2.toUpperCase()});
which, with the given string, returns:
"A BC DEFG HI [hi] JKL MNOP."
Thus: it replaces every word with uppercase (m1 is empty), except if the word is between square brackets (m1 is not empty).
This builds an array of all the strings contained in [ ]:
var regex = /\[([^\]]*)\]/g; // the g flag is required, otherwise exec() never advances and the loop never ends
var string = "A bc defg hi [hi] [jkl] mnop.";
var results=[], result;
while(result = regex.exec(string))
results.push(result[1]);
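Where String.prototype.matchAll (ES2020) is available, the same array can be built without the while loop. A sketch using the same pattern with the g flag, which matchAll requires:

```javascript
var string = "A bc defg hi [hi] [jkl] mnop.";

// Each match object m holds the bracket contents in capture group m[1].
var results = Array.from(string.matchAll(/\[([^\]]*)\]/g), function (m) { return m[1]; });

console.log(JSON.stringify(results)); // ["hi","jkl"]
```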
edit
To answer the question: this regex returns the string minus everything in [ ], and trims the surrounding whitespace:
"A bc defg [hi] mnop [jkl].".replace(/(\s{0,1})\[[^\]]*\](\s{0,1})/g,'$1')
Instead of skipping the match you can probably try something different - match everything but do not capture the string within square brackets (inclusive) with something like this:
var s = "A [bc] [defg] hi [hi] j[kl]u m[no]p."; // example input
var r = /(?:\[.*?[^\[\]]\])|(.)/g;
var result;
var str = [];
while((result = r.exec(s)) !== null){
if(result[1] !== undefined){ // undefined when a [string] matched the non-capturing branch
str.push(result[1]);
}
}
console.log(str.join(''));
The last line will print parts of the string which do not match the [string] pattern. For example, when called with the input "A [bc] [defg] hi [hi] j[kl]u m[no]p." the code prints "A hi ju mp." with whitespaces intact.
You can try different things with this code e.g. replacing etc.

remove umlauts or specialchars in javascript string

Never played before with umlauts or specialchars in javascript strings. My problem is how to remove them?
For example I have this in javascript:
var oldstr = "Bayern München";
var str = oldstr.split(' ').join('-');
The result is Bayern-München. OK, easy, but now I want to remove the umlaut or special character, as in:
Real Sporting de Gijón.
How can I realize this?
Kind regards,
Frank
replace should be able to do it for you, e.g.:
var str = str.replace(/ü/g, 'u');
...of course ü and u are not the same letter. :-)
If you're trying to replace all characters outside a given range with something (like a -), you can do that by specifying a range:
var str = str.replace(/[^A-Za-z0-9\-_]/g, '-');
That replaces all characters that aren't English letters, digits, -, or _ with -. (The character range is the [...] bit, the ^ at the beginning means "not".)
But that ("Bayern-M-nchen") may be a bit unpleasant for Mr. München to look at. :-) You could use a function passed into replace to try to just drop diacriticals:
var str = str.replace(/[^A-Za-z0-9\-_]/g, function(ch) {
// Characters that look a bit like 'a'
if ("áàâä".indexOf(ch) >= 0) { // There are a lot more than this
return 'a';
}
// Characters that look a bit like 'u'
if ("úùûü".indexOf(ch) >= 0) { // There are a lot more than this
return 'u';
}
/* ...long list of others...*/
// Default
return '-';
});
The above is optimized for long strings. If the string itself is short, you may be better off with repeated regexps:
var str = str.replace(/[áàâä]/g, 'a')
.replace(/[úùûü]/g, 'u')
.replace(/[^A-Za-z0-9\-_]/g, '-');
...but that's speculative.
Note that literal characters in JavaScript strings are totally fine, but you can run into fun with file encodings. I tend to stick to Unicode escapes. So, for instance, the above would be:
var str = str.replace(/[\u00e4\u00e2\u00e0\u00e1]/g, 'a')
.replace(/[\u00fc\u00fb\u00f9\u00fa]/g, 'u')
.replace(/[^A-Za-z0-9\-_]/g, '-');
...but again, there are a lot more to do...
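A different technique, not from the answer above: String.prototype.normalize (ES2015) can decompose accented letters into a base letter plus combining marks (Unicode NFD), after which the combining marks in the range U+0300 to U+036F can be stripped. This is a sketch; letters that do not decompose this way, such as ß or ø, are left as they are, and the name slugify is hypothetical.

```javascript
// Hypothetical helper: drop diacritics, then join words with "-".
function slugify(text) {
  return text
    .normalize('NFD')                // e.g. "ü" becomes "u" + U+0308 (combining diaeresis)
    .replace(/[\u0300-\u036f]/g, '') // remove the combining marks
    .split(' ')
    .join('-');
}

console.log(slugify('Bayern München'));         // "Bayern-Munchen"
console.log(slugify('Real Sporting de Gijón')); // "Real-Sporting-de-Gijon"
```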
There's an npm package called "remove-accents".
Install the package: npm i remove-accents
Import the remove function: import { remove } from "remove-accents";
Use the function: remove(inputString)

Regex to match all '&' before first '?'

Basically, I want to do "zipzam&&&?&&&?&&&" -> "zipzam%26%26%26?&&&?&&&". I can do that without regex in many different ways, but it'd clean things up a tad if I could do it with regex.
Thanks
Edit: "zip=zam&&&=?&&&?&&&" -> "zip=zam%26%26%26=?&&&?&&&" should make things a little clearer.
Edit: "zip=zam=&=&=&=?&&&?&&&" -> "zip=zam=%26=%26=%26=?&&&?&&&" should make things clearer.
However, these are just examples. I still want to replace all '&' before the first '?', no matter where the '&' are before the first '?' and no matter whether the '&' are consecutive or not.
This should do it:
"zip=zam=&=&=&=?&&&?&&&".replace(/^[^?]+/, function(match) { return match.replace(/&/g, "%26"); });
you need negative lookbehinds which are tricky to replicate in JS, but fortunately there are ways and means:
var x = "zipzam&&&?&&&?&&&";
x.replace(/(&+)(?=.*?\?)/,function ($1) {for(var i=$1.length, s='';i;i--){s+='%26';} return s;})
commentary: this works because it's not global. The first match is therefore a given, and the trick of replacing all of the matching "&" chars 1:1 with "%26" is achieved with the function loop
edit: a solution for unknown groupings of "&" can be achieved simply (if perhaps a little clunkily) with a little modification. The basic pattern for replacer methods is infinitely flexible.
var x = "zipzam&foo&bar&baz?&&&?&&&";
var f = function ($1,$2)
{
return $2 + ($2=='' || $2.indexOf('?')>-1 ? '&' : '%26')
}
x.replace(/(.*?)&(?=.*?\?)/g,f)
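Lookbehind no longer needs to be replicated: JavaScript has supported it natively since ES2018, and unlike many other regex flavors it allows variable-length lookbehind. Where that is available, a single global replace can encode every & that has no ? anywhere before it. A sketch, assuming the input contains no line breaks (. does not match them):

```javascript
var input = "zip=zam=&=&=&=?&&&?&&&";

// (?<!\?.*) : only match an "&" that is NOT preceded by a "?" anywhere earlier in the string.
var output = input.replace(/(?<!\?.*)&/g, '%26');

console.log(output); // "zip=zam=%26=%26=%26=?&&&?&&&"
```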
This should do it:
^[^?]*&[^?]*\?
Or this one, I think:
^[^?]*(&+?)\?
In this case regexes are really not the most appropriate things to use. A simple search for the first index of '?' and then replacing each '&' character would be best. However, if you really want a regex, then this should do the job.
(?:.*?(&))*?\?
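The non-regex route recommended in this answer (find the first ? with indexOf, then replace each & before it) might look like the sketch below. The function name is hypothetical, and it assumes that a string with no ? at all should have every & encoded.

```javascript
// Hypothetical helper: encode every "&" that appears before the first "?".
function encodeAmpsBeforeFirstQuestionMark(s) {
  var q = s.indexOf('?');
  var cut = q === -1 ? s.length : q; // assumption: no "?" means the whole string is the prefix
  // split/join replaces every "&" in the prefix; the rest is appended untouched.
  return s.slice(0, cut).split('&').join('%26') + s.slice(cut);
}

console.log(encodeAmpsBeforeFirstQuestionMark("zip=zam=&=&=&=?&&&?&&&")); // "zip=zam=%26=%26=%26=?&&&?&&&"
console.log(encodeAmpsBeforeFirstQuestionMark("zipzam&&&?&&&?&&&"));      // "zipzam%26%26%26?&&&?&&&"
```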
This is close enough to what you are after:
alert("zipzam&&&?&&&?&&&".replace(/^([^&\?]*)(&*)\?/, function(s, p, m)
{
for (var i = 0; i < m.length; i++) p += '%26';
return p +'?';
}));
Since the OP only wants to match ampersands before the first question mark, slightly modifying Michael Borgwardt's answer gives me this regex, which appears to be appropriate:
^[^?&]*(\&+)\?
Replace all matches with "%26"
This will not match zipzam&&abc?&&&?&&& because the first "?" does not have an ampersand immediately before it.
