Regex to match all '&' before first '?' - javascript

Basically, I want to do "zipzam&&&?&&&" -> "zipzam%26%26%26?&&&?&&&". I can do that without regex in many different ways, but it would clean things up a bit if I could do it with a regex.
Thanks
Edit: "zip=zam&&&=?&&&?&&&" -> "zip=zam%26%26%26=?&&&?&&&" should make things a little clearer.
Edit: "zip=zam=&=&=&=?&&&?&&&" -> "zip=zam=%26=%26=%26=?&&&?&&&" should make things clearer.
However, these are just examples. I want to replace every '&' before the first '?', wherever those '&' appear and whether or not they are consecutive.

This should do it:
"zip=zam=&=&=&=?&&&?&&&".replace(/^[^?]+/, function(match) { return match.replace(/&/g, "%26"); });

You need negative lookbehinds, which are tricky to replicate in JS, but fortunately there are ways and means:
var x = "zipzam&&&?&&&?&&&";
x.replace(/(&+)(?=.*?\?)/,function ($1) {for(var i=$1.length, s='';i;i--){s+='%26';} return s;})
Commentary: this works because the regex is not global, so only the first run of "&" is matched. The function loop then replaces each "&" in that run 1:1 with "%26".
Edit: a solution for unknown groupings of "&" can be achieved (if perhaps a little clunkily) with a small modification. The basic pattern for replacer functions is very flexible.
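var x = "zipzam&foo&bar&baz?&&&?&&&";
var seenQuestionMark = false; // becomes true once the first '?' has been passed
var f = function (match, before) {
  // 'before' is the text captured ahead of this '&'
  if (before.indexOf('?') > -1) { seenQuestionMark = true; }
  // tracking the '?' this way also handles consecutive '&' before it
  return before + (seenQuestionMark ? '&' : '%26');
};
x.replace(/(.*?)&(?=.*?\?)/g, f);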

This should do it:
^[^?]*&[^?]*\?

Or this one, I think:
^[^?]*(&+?)\?

In this case regexes are really not the most appropriate tool. A simple search for the index of the first '?', followed by replacing each '&' character before it, would be best. However, if you really want a regex then this should do the job.
(?:.*?(&))*?\?
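The non-regex approach mentioned above could look roughly like this. It is only a sketch: the function name `encodeAmpersandsBeforeQuery` is made up for illustration, and it assumes that when the string contains no '?' at all, every '&' should be encoded.

```javascript
// Hypothetical helper: encode every '&' that appears before the first '?'.
function encodeAmpersandsBeforeQuery(input) {
  var q = input.indexOf('?');
  // Assumption: with no '?' present, the whole string counts as "before".
  var end = q === -1 ? input.length : q;
  // split/join replaces every '&' in the leading part without a regex.
  var head = input.slice(0, end).split('&').join('%26');
  return head + input.slice(end);
}

console.log(encodeAmpersandsBeforeQuery("zip=zam=&=&=&=?&&&?&&&"));
// "zip=zam=%26=%26=%26=?&&&?&&&"
```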

This is close enough to what you are after:
alert("zipzam&&&?&&&?&&&".replace(/^([^&\?]*)(&*)\?/, function(s, p, m)
{
for (var i = 0; i < m.length; i++) p += '%26';
return p +'?';
}));

Since the OP only wants to match ampersands before the first question mark, slightly modifying Michael Borgwardt's answer gives me this Regex which appears to be appropriate :
^[^?&]*(\&+)\?
Replace all matches with "%26"
This will not match zipzam&&abc?&&&?&&& because the first "?" does not have an ampersand immediately before it.

Related

regex to remove certain characters at the beginning and end of a string

Let's say I have a string like this:
...hello world.bye
But I want to remove the first three dots and replace .bye with !
So the output should be
hello world!
it should only match if both conditions apply (... at the beginning and .bye at the end)
And I'm trying to use js replace method. Could you please help? Thanks
First match the dots, then capture a lazily repeated run of any characters up to the first .bye, and match the .bye itself. You can then replace with the first captured group plus an exclamation mark:
const str = '...hello world.bye';
console.log(str.replace(/\.\.\.(.*?)\.bye/, '$1!'));
The lazy repeat ensures you don't match too much when there are several candidates, for example:
const str = `...hello world.bye
...Hello again! Goodbye.`;
console.log(str.replace(/\.\.\.(.*?)\.bye/g, '$1!'));
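To see what the lazy quantifier actually changes, here is a small sketch comparing a greedy and a lazy capture. The input string is an assumption chosen for illustration: two "...x.bye" items on the same line.

```javascript
// Assumed input: two "...x.bye" items on one line.
const line = '...one.bye ...two.bye';

// Greedy: (.*) runs to the LAST ".bye", swallowing the middle of the line.
const greedy = line.replace(/\.\.\.(.*)\.bye/, '$1!');

// Lazy: (.*?) stops at the FIRST ".bye".
const lazy = line.replace(/\.\.\.(.*?)\.bye/, '$1!');

console.log(greedy); // one.bye ...two!
console.log(lazy);   // one! ...two.bye
```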
You don't actually need a regex to do this. Although it's a bit inelegant, the following should work fine (obviously the function can be called whatever makes sense in the context of your application):
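function manipulate(string) {
  // Only rewrite when the string starts with "..." and ends with ".bye".
  if (string.slice(0, 3) === "..." && string.slice(-4) === ".bye") {
    // Drop the 3 leading dots and the 4 trailing characters of ".bye".
    return string.slice(3, -4) + "!";
  }
  return string;
}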
(Apologies if I made any stupid errors with indexing there, but the basic idea should be obvious.)
This, to me at least, has the advantage of being easier to reason about than a regex. Of course if you need to deal with more complicated cases you may reach the point where a regex is best - but I personally wouldn't bother for a simple use-case like the one mentioned in the OP.
Your regex would be
const rx = /\.\.\.([\s\S]*?)\.bye/g
const out = '\n\nfoobar...hello world.bye\nfoobar...ok.bye\n...line\nbreak.bye\n'.replace(rx, `$1!`)
console.log(out)
In English: find three dots, lazily capture anything (including line breaks) in a group, and end with .bye.
The replacement uses the first captured group, $1, and appends ! (the template literal here behaves like an ordinary string).
An arguably simpler solution:
const str = '...hello world.bye'
const newStr = /\.\.\.(.+)\.bye/.exec(str)
const formatted = newStr ? newStr[1] + '!' : str
console.log(formatted)
If the string doesn't match the regex it will just return the string.

How does .split(/_(.+)?/)[i] work?

After finding this solution useful,
split string only on first instance of specified character
I'm confused at how this actually works. One top comment explains, "Just to be clear, the reason this solution works is because everything after the first _ is matched inside a capturing group, and gets added to the token list for that reason." - @Alan Moore
That doesn't make sense to me; what's a "capturing group"? Additionally, the author's positive-rated solution,
"good_luck_buddy".split(/_(.+)?/)[1]
"luck_buddy"
is being noted in the comments as having an improved method by omitting the question mark, ?,
split(/_(.+)/)
or omitting the question mark and replacing the plus sign, +, with an asterisk, *.
split(/_(.*)/)
Which is actually the best solution and why?
Thank you.
"good_luck_buddy".split(/_(.+)?/)
doesn't really make much sense. It's essentially the same as
"good_luck_buddy".split(/_(.*)/)
("match 1 or more, optionally" is the same as "match 0 or more").
The behaviour of regex.split in most languages is "take pieces of string that do not match":
"a_#b_#c".split(/_#/) => ["a", "b", "c"]
If the split expression contains capturing groups (...), these are also included in the resulting list:
"a_#b_#c".split(/_(#)/) => ["a", "#", "b", "#", "c"]
So the above code
"good_luck_buddy".split(/_(.*)/)
works as follows:
it finds the first piece in the string that doesn't match _(.*). This is good.
it finds a piece that does match _(.*). This is _luck_buddy. Since there's a capturing group, its content (luck_buddy) is also included in the output
finally, it finds the next piece that doesn't match _(.*). This is an empty string, and it's added to the output, so the output becomes ["good", "luck_buddy", ""]
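The steps above can be checked directly. This is a small sketch; the variable names are arbitrary, and the last case (the input "good_") is an added illustration of where `(.+)?` and `(.*)` differ.

```javascript
// No capturing group: only the pieces between matches are returned.
const plain = "a_#b_#c".split(/_#/);          // ["a", "b", "c"]

// Capturing group: the captured text is spliced into the result as well.
const withGroup = "a_#b_#c".split(/_(#)/);    // ["a", "#", "b", "#", "c"]

// The pattern from the question: the group swallows everything after the first "_".
const parts = "good_luck_buddy".split(/_(.*)/); // ["good", "luck_buddy", ""]

// When nothing follows the "_", an optional group that did not participate
// yields undefined, while (.*) yields an empty string.
const optionalGroup = "good_".split(/_(.+)?/);  // ["good", undefined, ""]
const starGroup = "good_".split(/_(.*)/);       // ["good", "", ""]

console.log(plain, withGroup, parts, optionalGroup, starGroup);
```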
To address the "what's the best" part, I'd use the second voted solution for a literal splitter:
result = str.slice(str.indexOf('_') + 1)
and .replace for a regex splitter:
result = str.replace(/.*?<regex>/, '')
I'm not going to explain how basic RegEx works ("what is a capture group" ...). But to answer your question "which is best and why": It's just a matter of performance. Different regexes result in different processing times in the regex processor.
See this jsperf comparison:
http://jsperf.com/regex-split-on-first-occurence-of-char
I tested IE11, FF and Chrome. There is no really noticeable difference between the three regex variants in this case.
No need for a regular expression. Just find the index of the '_' (underscores) and get the substring.
function head(str, pattern) {
var index = str.indexOf(pattern);
return index > -1 ? str.substring(0, index) : '';
}
function tail(str, pattern) {
  var index = str.indexOf(pattern);
  // Skip the whole pattern, not just one character.
  return index > -1 ? str.slice(index + pattern.length) : '';
}
function foot(str, pattern) { // Made this one up...
  var index = str.lastIndexOf(pattern);
  return index > -1 ? str.slice(index + pattern.length) : '';
}
var str = "good_luck_buddy";
var pattern = '_';
document.body.innerHTML = head(str, pattern) + '<br />';
document.body.innerHTML += tail(str, pattern) + '<br />';
document.body.innerHTML += foot(str, pattern);
If you want to find the index of a pattern (regex) in a string, this question will show you the way:
Polyfill for String.prototype.regexIndexOf(regex, startpos)

Match two string patterns at the same time with regex for removal

A quick question for string manipulation in Javascript.
I have some files named with this pattern:
content_filename
content_big_filename
The filename is in a variable, so I cannot take the last 8 chars.
I need to extract the filename.
Now I'm using
string.replace( /\/content_/, '' ) but I need also to support content_big.
How should I go about this?
If it's always the last part after the underscore, just split and pop
str.split('_').pop();
document.body.innerHTML += "content_filename1".split('_').pop() + '<br>';
document.body.innerHTML += "content_big_filename2".split('_').pop()
You can do it in sequence as long as you do it in the right order.
While content_ will match both cases, content_big_ will only match the second case.
By looking at content_big_ first, you won't break the content_ check that you do subsequently.
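A minimal sketch of that ordering, using plain string replacement. The function name `stripContentPrefix` is hypothetical, and the sketch assumes neither prefix occurs again inside the filename itself.

```javascript
// Hypothetical helper; the two prefixes are the ones named in the question.
function stripContentPrefix(name) {
  // Longer prefix first: removing "content_" first would leave "big_" behind.
  return name.replace('content_big_', '').replace('content_', '');
}

console.log(stripContentPrefix('content_filename1'));     // "filename1"
console.log(stripContentPrefix('content_big_filename2')); // "filename2"
```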
If you want to go the regex path, as demonstrated in your question, use the following.
string.replace( /content(_big)?_/, '' )
The (some_text_here)? (specifically the ?) in the regex defines a block of regex that can match, but isn't required.
Please note: if regex can be avoided, it's almost always a good idea to do so. See adeneo's answer for further advice.
str.substring(str.lastIndexOf("_") + 1)
edit for explanation :
substring: extracts the characters of a string between two specified indices. If the second index is not specified, it extracts from the first index to the end of the string.
lastIndexOf: returns the index of the last "_" character in the string.
So what you get back is everything after the last "_" up to the end of the string.
However, if the "filename" itself contains a "_", this will not work.
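If filenames may contain underscores, one option is to strip only the known prefix from the start of the string. This is a sketch under assumptions: the name `extractFilename` is made up, and it assumes the string begins with "content_" or "content_big_". A file literally named "big_something" under the plain "content_" prefix cannot be told apart from the "content_big_" case.

```javascript
// Hypothetical helper: remove "content_" or "content_big_" at the start only.
function extractFilename(name) {
  // ^ anchors to the start; (?:big_)? makes the "big_" part optional.
  return name.replace(/^content_(?:big_)?/, '');
}

console.log(extractFilename('content_my_file'));     // "my_file"
console.log(extractFilename('content_big_my_file')); // "my_file"
```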

Don't replace regex if it is enclosed by a character

I would like to replace all strings that are enclosed by - into strings enclosed by ~, but not if this string again is enclosed by *.
As an example, this string...
The -quick- *brown -f-ox* jumps.
...should become...
The ~quick~ *brown -f-ox* jumps.
We see - is only replaced if it is not within *<here>*.
My javascript-regex for now (which takes no care whether it is enclosed by * or not):
var message = source.replace(/-(.[^-]+?)-/g, "~$1~");
Edit: Note that it might be the case that there is an odd number of *s.
That's a tricky sort of thing to do with regular expressions. I think what I'd do is something like this:
var msg = source.replace(/(-[^-]+-|\*[^*]+\*)/g, function(_, grp) {
return grp[0] === '-' ? grp.replace(/^-(.*)-$/, "~$1~") : grp;
});
That looks for either - or * groups, and only performs the replacement on dashed ones. In general, "nesting" syntaxes are challenging (or impossible) with regular expressions. (And of course as a comment on the question notes, there are special cases — dangling metacharacters — that complicate this too.)
I would solve it by splitting the string on * and then replacing only in the even-indexed pieces (the odd-indexed ones sit between a pair of stars). Matching unbalanced stars is trickier; it involves checking whether the last piece has an odd or even index:
'The -quick- *brown -f-ox* jumps.'
  .split('*')
  .map(function(item, index, arr) {
    if (index % 2) {
      if (index < arr.length - 1) {
        return '*' + item + '*'; // balanced: restore the stars, leave the content alone
      }
      // not balanced: restore the lone opening star and still replace
      item = '*' + item;
    }
    return item.replace(/-([^-]+)-/g, '~$1~');
  })
  .join('');
Finding out whether a match is not enclosed by some delimiters is a very complicated task - see also this example. Lookaround could help, but JS historically supported only lookahead. So we could rewrite "not surrounded by *" as "followed by an even number of *", and match on that:
source.replace(/-([^-]+)-(?=[^*]*(\*[^*]*\*[^*]*)*$)/g, "~$1~");
But better we match on both - and *, so that we consume anything wrapped in *s as well and can then decide in a callback function not to replace it:
source.replace(/-([^-]+)-|\*([^*]+)\*/g, function(m, hyp) {
if (hyp) // the first group has matched
return "~"+hyp+"~";
// else let the match be unchanged:
return m;
});
This has the advantage of being able to better specify "enclosed", e.g. by adding word boundaries on the "inside", for better handling of invalid patterns (odd number of * characters as mentioned by @Maras for example) - the current regex just takes the next two appearances.
A terser version of Jack's very clear answer.
source.split(/(\*[^*]*\*)/g).map(function(x,i){
return i%2?x:x.replace(/-/g,'~');
}).join('');
Seems to work,
Cheers.

Regex: match word (but delete commas after OR before)

I have tried to delete an item from a string divided with commas:
var str="this,is,unwanted,a,test";
if I do a simple str.replace('unwanted',''); I end up with 2 commas
if I do a more complex str.replace('unwanted','').replace(',,','');
It might work
But the problem comes when the str is like this:
var str="unwanted,this,is,a,test"; // or "...,unwanted"
However, I could do a 'if char at [0 or str.length] == comma', then remove it
But I really think this is not the way to go, it is absurd I need to do 2 replaces and 2 ifs to achieve what I want
I have heard that regex can do powerful stuff, but I simply can't understand it no matter how hard I try
Important Notes:
It should match after OR before (not both), or we will end with
"this,is,,a,test"
There are no spaces between commas
How about something less flaky than a regex for this sort of replacement?
str = str
.split(',')
.filter(function(token) { return token !== 'unwanted' })
.join(',');
However if you are convinced a regex is the best way...
str = str.replace(/(^|,)?unwanted(,|$)?/g, function(all, leading, trailing) {
return leading && trailing ? ',' : '';
});
(thanks Logan F. Smyth.)
Since Alex hasn't fixed this in his solution, I wanted to get a fully functional version up somewhere.
var unwanted = 'unwanted';
var regex = new RegExp('(^|,)' + unwanted + '(,|$)', 'g');
str = str.replace(regex, function(a, pre, suf) {
return pre && suf ? ',' : '';
});
The only thing to be careful of when dynamically building a regex is that the 'unwanted' variable can't contain anything that could be interpreted as regex syntax.
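One common way around that is to escape the value before building the pattern. The sketch below is an illustration, not part of the answer above: `escapeRegExp` and `removeItem` are hypothetical helper names, and the character class is the usual set of regex metacharacters.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical wrapper around the replacement shown above.
function removeItem(list, unwanted) {
  var regex = new RegExp('(^|,)' + escapeRegExp(unwanted) + '(,|$)', 'g');
  return list.replace(regex, function (match, pre, suf) {
    // Keep one comma only when the item sat between two commas.
    return pre && suf ? ',' : '';
  });
}

console.log(removeItem('a,b.c,d', 'b.c')); // "a,d"
console.log(removeItem('a,bxc,d', 'b.c')); // "a,bxc,d" (the dot is literal, so no match)
```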
There are way easier ways to parse this though, as Alex mentioned. Don't resort to regular expressions unless you have to.
