JavaScript - remove repeating characters when there are more than 2 repeats

As the title says: when a letter repeats more than twice in a row in a string, the excess repeats should be removed.
I have the following code, based on this answer:
function removeRepeatingLetters (text) {
return text.replace('^(?!.*([A-Za-z0-9])\1{2})(?=.*[a-z])(?=.*\d)[A-Za-z0-9]+$', '');
}
But it does not seem to work for my test string:
"bookkeepers! are amaazing! loooooooool"
The output for the sample string should be:
"bookkeepers! are amaazing! lool"
What am I doing wrong?

Try
"bookkeepers! are amaazing! loooooooool".replace(/(.)\1{2,}/g, '$1$1')
// "bookkeepers! are amaazing! lool"
The RegExp /(.)\1{2,}/ matches any single character followed by the same character two or more times.
The flag g ensures you match all occurrences.
Then, you replace each occurrence with the repeated character duplicated.
Note that the simpler .replace(/(.)\1+/g, '$1$1') should work too, but it is a bit slower because it does unnecessary replacements (a run of exactly two characters is replaced by itself).
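As a rough sketch, the one-liner can be wrapped in a function. The names collapseRepeats and collapseRepeatedLetters are made up for illustration, and the letters-only variant assumes that only ASCII letters count as "letters":

```javascript
// Hypothetical helper: collapse any run of 3+ identical characters down to 2.
function collapseRepeats(text) {
  // (.) captures one character; \1{2,} requires at least two more copies of it.
  return text.replace(/(.)\1{2,}/g, '$1$1');
}

// Hypothetical variant: only touch ASCII letters, leaving runs such as "!!!!" alone.
function collapseRepeatedLetters(text) {
  return text.replace(/([A-Za-z])\1{2,}/g, '$1$1');
}

console.log(collapseRepeats('bookkeepers! are amaazing! loooooooool'));
// "bookkeepers! are amaazing! lool"
console.log(collapseRepeats('wow!!!!'));         // "wow!!"
console.log(collapseRepeatedLetters('wow!!!!')); // "wow!!!!"
```

Which variant fits depends on whether punctuation runs should be shortened as well; the question only mentions letters.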

Another way (Oriol's answer works just fine) to do this is with a callback function:
function removeRepeatingLetters (text) {
return text.replace(/(.)\1{2,}/g, function(match, p1) {
return p1 + p1;
});
}
This will:
match a single character followed by at least two more copies of itself, i.e. three or more in a row - (.)\1{2,}
then pass the match and the first captured group into a callback function - function(match, p1)
then return the captured character, appended to itself, as the value that replaces the overall match - return p1 + p1;
Because of the g at the end of the regex, this is done for every run of repeated characters that it finds.
The above code works with the test string that you provided (along with a couple of others that I tested with ;) ). As mentioned, Oriol's works, but figured I'd share another option, since it gives you a glimpse into how to use the callback for .replace().
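One thing the callback form makes easy is a configurable limit. The following is only a sketch; limitRepeats and its max parameter are hypothetical names, not part of either answer:

```javascript
// Hypothetical generalisation: keep at most `max` consecutive copies of a character.
function limitRepeats(text, max) {
  if (!Number.isInteger(max) || max < 1) {
    throw new RangeError('max must be a positive integer');
  }
  // A run longer than `max` is one character followed by at least `max` more copies.
  const pattern = new RegExp('(.)\\1{' + max + ',}', 'g');
  return text.replace(pattern, function (match, ch) {
    // Replace the whole run with exactly `max` copies of the captured character.
    return ch.repeat(max);
  });
}

console.log(limitRepeats('loooooooool', 2)); // "lool"
console.log(limitRepeats('loooooooool', 1)); // "lol"
```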

Related

Replace a string fails in some cases

I need to check a string and strip part of it.
I have to check whether
$mystring='abc...';
contains one of these values: px, em, %, vh, s
I use this to check the string:
function replaceUnit(input){ return input.replace(/(px|em|%|vh|s)/i ,'') }
It works, but it produces wrong results in some cases.
For example, if my string is
$mystring="its embeded"
the function replaces the "s" and the "em", which is not what I want.
The function should only match if mystring is
only a number+px
or only a number+em
or only a number+%
or only a number+vh
or only a number+s
If there is a match, the function should remove the unit; in all other cases it should do nothing.
Is such a function possible, and what should the replace code look like?
Thanks a lot.
UPDATE
Based on one of the answers I tried to change it:
var input="0s";
function replaceUnit(input)
{
console.log('check: '+input);
var test=input.replace(/(\d)(?:px|em|%|vh|s)$/i ,'');
console.log('->: '+test);
return test
}
the result in the console is
check: 0s
->:
Add a $ (end-of-string anchor) to the end of the regular expression, to ensure that it'll only match if the characters occur at the very end, and capture a number before those characters, so that you can replace with that number alone (thus stripping out the extra characters):
return input.replace(/(\d)(?:px|em|%|vh|s)$/i ,'$1')
https://regex101.com/r/IodB6z/1
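If the whole string has to be "a number plus a unit" and nothing else, a stricter sketch could anchor both ends and allow more than one digit. The name stripUnit and the optional decimal part are my assumptions, not something the question specifies:

```javascript
// Hypothetical stricter variant: the entire input must be <number><unit>.
function stripUnit(input) {
  // ^ and $ anchor both ends; (\d+(?:\.\d+)?) captures an integer or a decimal.
  return input.replace(/^(\d+(?:\.\d+)?)(?:px|em|%|vh|s)$/i, '$1');
}

console.log(stripUnit('0s'));          // "0"
console.log(stripUnit('12px'));        // "12"
console.log(stripUnit('1.5em'));       // "1.5"
console.log(stripUnit('its embeded')); // "its embeded" (unchanged)
```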

Don't replace regex if it is enclosed by a character

I would like to replace all strings that are enclosed by - into strings enclosed by ~, but not if this string again is enclosed by *.
As an example, this string...
The -quick- *brown -f-ox* jumps.
...should become...
The ~quick~ *brown -f-ox* jumps.
We see - is only replaced if it is not within *<here>*.
My javascript-regex for now (which takes no care whether it is enclosed by * or not):
var message = source.replace(/-(.[^-]+?)-/g, "~$1~");
Edit: Note that it might be the case that there is an odd number of *s.
That's a tricky sort of thing to do with regular expressions. I think what I'd do is something like this:
var msg = source.replace(/(-[^-]+-|\*[^*]+\*)/g, function(_, grp) {
return grp[0] === '-' ? grp.replace(/^-(.*)-$/, "~$1~") : grp;
});
That looks for either - or * groups, and only performs the replacement on dashed ones. In general, "nesting" syntaxes are challenging (or impossible) with regular expressions. (And of course as a comment on the question notes, there are special cases — dangling metacharacters — that complicate this too.)
I would solve it by splitting the string on * and then replacing only in the items at even indices. Handling unbalanced stars is trickier; it involves checking whether the index of the last item is odd or even:
'The -quick- *brown -f-ox* jumps.'
.split('*')
.map(function(item, index, arr) {
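if (index % 2) {
if (index < arr.length - 1) {
return '*' + item + '*'; // balanced: put the stars back, leave the content alone
}
// not balanced: put the dangling star back and fall through to the replacement
item = '*' + item;
}
// g flag so that every -...- pair in the segment is replaced, not just the first
return item.replace(/-([^-]+)-/g, '~$1~');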
})
.join('');
Finding out whether a match is not enclosed by some delimiters is a very complicated task - see also this example. Lookaround could help, but JS only supported lookahead when this was written. So we could rewrite "not surrounded by *" as "followed by an even number of *", and match on that:
source.replace(/-([^-]+)-(?=(?:[^*]*\*[^*]*\*)*[^*]*$)/g, "~$1~");
But better we match on both - and *, so that we consume anything wrapped in *s as well and can then decide in a callback function not to replace it:
source.replace(/-([^-]+)-|\*([^*]+)\*/g, function(m, hyp) {
if (hyp) // the first group has matched
return "~"+hyp+"~";
// else let the match be unchanged:
return m;
});
This has the advantage of letting us specify "enclosed" more precisely, e.g. by adding word boundaries on the "inside", for better handling of invalid patterns (such as the odd number of * characters mentioned by @Maras) - the current regex just takes the next two appearances.
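To see what the alternation approach does with the sample string and with a dangling star, here is a small sketch. The wrapper name swapDashes is hypothetical; the regex is the one from the answer above:

```javascript
// Hypothetical wrapper around the "match both, replace only one" regex.
function swapDashes(source) {
  return source.replace(/-([^-]+)-|\*([^*]+)\*/g, function (m, hyp) {
    // hyp is only defined when the dash alternative matched.
    return hyp !== undefined ? '~' + hyp + '~' : m;
  });
}

console.log(swapDashes('The -quick- *brown -f-ox* jumps.'));
// "The ~quick~ *brown -f-ox* jumps."
console.log(swapDashes('a -b- *c -d-'));
// "a ~b~ *c ~d~" (the unpaired star is treated as plain text)
```

Whether an unpaired * should protect the rest of the string or be ignored is a design decision the question leaves open; this sketch ignores it.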
A terser version of Jack's very clear answer.
source.split(/(\*[^*]*\*)/g).map(function(x,i){
return i%2?x:x.replace(/-/g,'~');
}).join('');
Seems to work,
Cheers.

Moving index in JavaScript regex matching

I have this regex to extract double words from text
/[A-Za-z]+\s[A-Za-z]+/g
And this sample text
Mary had a little lamb
My output is this
[0] - Mary had; [1] - a little;
Whereas my expected output is this:
[0] - Mary had; [1] - had a; [2] - a little; [3] - little lamb
How can I achieve this output? As I understand it, the index of the search moves to the end of the first match. How can I move it back one word?
Abusing String.replace function
I use a little trick with the replace function. Since replace loops through the matches and lets us specify a function, the possibilities are endless. The result will be in output.
var output = [];
var str = "Mary had a little lamb";
str.replace(/[A-Za-z]+(?=(\s[A-Za-z]+))/g, function ($0, $1) {
output.push($0 + $1);
return $0; // Actually we don't care. You don't even need to return
});
Since the output contains overlapping portions of the input string, it is necessary not to consume the next word when matching the current word; this is done with look-ahead [1].
The regex /[A-Za-z]+(?=(\s[A-Za-z]+))/g does exactly what I have said above: it consumes only one word at a time with the [A-Za-z]+ portion (the start of the regex), looks ahead for the next word with (?=(\s[A-Za-z]+)) [2], and also captures the matched text.
The function passed to the replace function will receive the matched string as the first argument and the captured text in subsequent arguments. (There are more - check the documentation - I don't need them here). Since the look-ahead is zero-width (the input is not consumed), the whole match is also conveniently the first word. The capture text in the look-ahead will go into the 2nd argument.
Proper solution with RegExp.exec
Note that String.replace function incurs a replacement overhead, since the replacement result is not used at all. If this is unacceptable, you can rewrite the above code with RegExp.exec function in a loop:
var output = [];
var str = "Mary had a little lamb";
var re = /[A-Za-z]+(?=(\s[A-Za-z]+))/g;
var arr;
while ((arr = re.exec(str)) != null) {
output.push(arr[0] + arr[1]);
}
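In newer engines the same loop can be written with String.prototype.matchAll, which returns the capture groups for every match. This is a sketch that assumes an ES2020-capable runtime; wordPairs is a made-up name:

```javascript
// Hypothetical helper using matchAll instead of a manual exec loop.
function wordPairs(str) {
  const re = /[A-Za-z]+(?=(\s[A-Za-z]+))/g;
  // Each match m is [currentWord, lookaheadCapture]; join them to get the pair.
  return Array.from(str.matchAll(re), function (m) {
    return m[0] + m[1];
  });
}

console.log(wordPairs('Mary had a little lamb'));
// ["Mary had", "had a", "a little", "little lamb"]
```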
Footnote
[1] In regex flavors that support variable-width look-behind, it is possible to retrieve the previous word instead. JavaScript regex did not support look-behind when this was written (it was added in ES2018).
[2] (?=pattern) is the syntax for look-ahead.
Appendix
String.match can't be used here since it ignores the capturing group when g flag is used. The capturing group is necessary in the regex, as we need look-around to avoid consuming input and match overlapping text.
It can be done without regexp
"Mary had a little lamb".split(" ")
.map(function(item, idx, arr) {
if(idx < arr.length - 1){
return item + " " + arr[idx + 1];
}
}).filter(function(item) {return item;})
Here's a non-regex solution (it's not really a regular problem).
function pairs(str) {
var parts = str.split(" "), out = [];
for (var i=0; i < parts.length - 1; i++)
out.push([parts[i], parts[i+1]].join(' '));
return out;
}
Pass your string and you get an array back.
Side note: if you're worried about non-words in your input (making a case for regular expressions!) you can run tests on parts[i] and parts[i+1] inside the for loop. If the tests fail: don't push them onto out.
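A possible shape for the test mentioned in the side note, assuming a "word" means ASCII letters only (that definition and the name wordPairsOnly are mine):

```javascript
// Hypothetical variant of pairs() that skips any pair containing a non-word token.
function wordPairsOnly(str) {
  const isWord = /^[A-Za-z]+$/; // no g flag, so test() keeps no state between calls
  const parts = str.split(/\s+/);
  const out = [];
  for (let i = 0; i < parts.length - 1; i++) {
    if (isWord.test(parts[i]) && isWord.test(parts[i + 1])) {
      out.push(parts[i] + ' ' + parts[i + 1]);
    }
  }
  return out;
}

console.log(wordPairsOnly('Mary had 2 little lambs'));
// ["Mary had", "little lambs"]
```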
Another way you might like is this one:
var s = "Mary had a little lamb";
// Break on each word and loop
s.match(/\w+/g).map(function(w) {
// Get the word, a space and another word
return s.match(new RegExp(w + '\\s\\w+'));
// At this point, there is one "null" value (the last word), so filter it out
}).filter(Boolean)
// There, we have an array of matches -- we want the matched value, i.e. the first element
.map(Array.prototype.shift.call.bind(Array.prototype.shift));
If you run this in your console, you'll see ["Mary had", "had a", "a little", "little lamb"].
This way, you keep your original regex and can do whatever else you want with it, although it needs some code around it to really work.
By the way, this code is not cross-browser. The following functions are not supported in IE8 and below:
Array.prototype.filter
Array.prototype.map
Function.prototype.bind
But they're easily shimmable. Or the same functionality is easily achievable with for.
Here we go:
You still don't know how the regular expression internal pointer really works, so I will explain it to you with a little example:
Mary had a little lamb with this regex /[A-Za-z]+\s[A-Za-z]+/g
Here, the first part of the regex: [A-Za-z]+ will match Mary so the pointer will be at the end of the y
Mary had a little lamb
    ^
In the next part (\s[A-Za-z]+) it will match an space followed by another word so...
Mary had a little lamb
        ^
The pointer will be where the word had ends. That is your problem: the regular expression's internal pointer advances past the second word even though you don't want it to. How is this solved? Lookaround is your friend. With lookarounds (lookahead and lookbehind) you can look through your text without advancing the main internal pointer of the regular expression (it uses another pointer for that).
So at the end, the regular expression that would match what you want would be: ([A-Za-z]+(?=\s[A-Za-z]+))
Explanation:
The only thing you don't know about that regular expression is the (?=\s[A-Za-z]+) part. It means that the [A-Za-z]+ must be followed by a space and another word, otherwise the regular expression won't match. This is exactly what you seem to want: the internal pointer is not advanced past the second word, and every word is matched except the last one, because the last one isn't followed by a word.
Then, once you have that, you only have to adapt whatever you are doing with the matches right now.
Here you have a working example, DEMO
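The pointer described above is exposed as lastIndex on a regex with the g flag, so the difference can be observed directly. A small sketch; traceMatches and sample are hypothetical names used only for this illustration:

```javascript
// Hypothetical tracer: record each match together with where the search resumes.
function traceMatches(re, str) {
  const rows = [];
  let m;
  re.lastIndex = 0; // start from the beginning regardless of earlier use
  while ((m = re.exec(str)) !== null) {
    rows.push({ match: m[0], start: m.index, next: re.lastIndex });
  }
  return rows;
}

const sample = 'Mary had a little lamb';

// Consuming both words: after "Mary had" the search resumes at index 8, past "had".
console.log(traceMatches(/[A-Za-z]+\s[A-Za-z]+/g, sample));

// Lookahead: after "Mary" the search resumes at index 4, so "had" can still start a match.
console.log(traceMatches(/[A-Za-z]+(?=\s[A-Za-z]+)/g, sample));
```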
In full admiration of the concept of 'look-ahead', I still propose a pairwise function (demo), since it's really Regex's task to tokenize a character stream, and the decision of what to do with the tokens is up to the business logic. At least, that's my opinion.
It's a shame that JavaScript has no built-in pairwise yet, but this could do it:
function pairwise(a, f) {
for (var i = 0; i < a.length - 1; i++) {
f(a[i], a[i + 1]);
}
}
var str = "Mary had a little lamb";
pairwise(str.match(/\w+/g), function(a, b) {
document.write("<br>"+a+" "+b);
});

Using replace and regex to capitalize first letter of each word of a string in JavaScript

The following, though redundant, works perfectly:
'leap of, faith'.replace(/([^ \t]+)/g,"$1");
and prints "leap of, faith", but in the following :
'leap of, faith'.replace(/([^ \t]+)/g,RegExp.$1);
it prints "faith faith faith"
As a result when I wish to capitalize each word's first character like:
'leap of, faith'.replace(/([^ \t]+)/g,RegExp.$1.capitalize());
it doesn't work. Neither does,
'leap of, faith'.replace(/([^ \t]+)/g,"$1".capitalize);
because it probably capitalizes "$1" before substituting the group's value.
I want to do this in a single line using Prototype's capitalize() method.
You can pass a function as the second argument of ".replace()":
"string".replace(/([^ \t]+)/g, function(_, word) { return word.capitalize(); });
The arguments to the function are, first, the whole match, and then the matched groups. In this case there's just one group ("word"). The return value of the function is used as the replacement.
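For readers without Prototype loaded, the same idea can be sketched with a plain helper. capitalizeWord and capitalizeWords are hypothetical names, and the helper leaves the rest of each word untouched, which may differ from what Prototype's capitalize() does:

```javascript
// Hypothetical stand-in for capitalize(): upper-case the first character only.
function capitalizeWord(word) {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

function capitalizeWords(text) {
  // The callback runs once per match, so each word is capitalized individually.
  return text.replace(/[^ \t]+/g, function (word) {
    return capitalizeWord(word);
  });
}

console.log(capitalizeWords('leap of, faith')); // "Leap Of, Faith"
```

This also shows why "$1".capitalize() cannot work: the string "$1" is evaluated before replace ever runs, whereas the callback receives the actual matched text.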

getting contents of string between digits

I have a regex problem :(
What I would like to do is find the contents between two or more numbers.
var string = "90+*-+80-+/*70"
I'm trying to edit the symbols in between so that only the last symbol is kept and not the ones before it, i.e. turn the above variable into 90+80*70. This is just an example and I have no idea how to do this: the length of the numbers, how many "sets" of numbers there are, and the length of the symbol runs in between could be anything.
many thanks,
Steve,
The trick is in matching '90+*-+' and '80-+/*' separately, and selecting only the number and the last symbol.
The expression for finding a number followed by 1 or more non-numbers would be
\d+[^\d]+
To select the number and the last non-number, add parens:
(\d+)[^\d]*([^\d])
Finally add a /g to repeat the procedure for each match, and replace it with the 2 matched groups for each match:
js> '90+*-+80-+/*70'.replace(/(\d+)[^\d]*([^\d])/g, '$1$2');
90+80*70
js>
Or you can use a lookahead assertion and simply remove all non-numerical characters which are not last:
"90+*-+80-+/*70".replace(/[^0-9]+(?=[^0-9])/g,'');
You can use a regular expression to match the non-digits and a callback function to process the match and decide what to replace:
var test = "90+*-+80-+/*70";
var out = test.replace(/[^\d]+/g, function(str) {
return str.slice(-1); // last character of the matched run
})
alert(out);
See it work here: http://jsfiddle.net/jfriend00/Tncya/
This works by using a regular expression to match sequences of non-digits and then replacing that sequence of non-digits with the last character in the matched sequence.
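As a quick check, the three approaches shown so far (capture groups, lookahead, callback) can be run side by side on the sample input. The variable names are only for this illustration:

```javascript
const input = '90+*-+80-+/*70';

// 1. Capture the number and the last non-digit, drop what lies between.
const byGroups = input.replace(/(\d+)[^\d]*([^\d])/g, '$1$2');

// 2. Delete every non-digit run that is still followed by a non-digit.
const byLookahead = input.replace(/[^0-9]+(?=[^0-9])/g, '');

// 3. Replace each non-digit run with its last character.
const byCallback = input.replace(/[^\d]+/g, function (run) {
  return run.slice(-1);
});

console.log(byGroups, byLookahead, byCallback);
// 90+80*70 90+80*70 90+80*70
```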
i would use this tutorial, first, then review this for javascript-specific regex questions.
This should do it -
var string = "90+*-+80-+/*70"
var result = '';
var arr = string.split(/(\d+)/)
for (var i = 0; i < arr.length; i++) {
if (!isNaN(arr[i])) result = result + arr[i];
else result = result + arr[i].slice(-1);
}
alert(result);
Working demo - http://jsfiddle.net/ipr101/SA2pR/
Similar to @Arnout Engelen
var string = "90+*-+80-+/*70";
string = string.replace(/(\d+)[^\d]*([^\d])(?=\d+)/g, '$1$2');
This was my first thinking of how the RegEx should perform, it also looks ahead to make sure the non-digit pattern is followed by another digit, which is what the question asked for (between two numbers)
Similar to @jfriend00
var string = "90+*-+80-+/*70";
string = string.replace( /(\d+?)([^\d]+?)(?=\d+)/g
, function(){
return arguments[1] + arguments[2].substr(-1);
});
Instead of only matching on non-digits, it matches on non-digits between two numbers, which is what the question asked
Why would this be any better?
If your equation was embedded in a paragraph or string of text. Like:
This is a test where I want to clean up something like 90+*-+80-+/*70 and don't want to scrap the whole paragraph.
Result (Expected) :
This is a test where I want to clean up something like 90+80*70 and don't want to scrap the whole paragraph.
Why would this not be any better?
There is more pattern matching, which makes it theoretically slower (negligible)
It would fail if your paragraph had embedded numbers. Like:
This is a paragraph where Sally bought 4 eggs from the supermarket, but only 3 of them made it back in one piece.
Result (Unexpected):
This is a paragraph where Sally bought 4 3 of them made it back in one piece.
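One way to sidestep the embedded-numbers failure, sketched under the assumption that only the four arithmetic operators + - * / can appear between the numbers (the question does not say so; collapseOperators is a made-up name):

```javascript
// Hypothetical variant: only runs made up entirely of + - * / between two digits are touched.
function collapseOperators(text) {
  // (\d) is the digit before the run, ([+\-*\/]) is its last operator, (?=\d) the digit after.
  return text.replace(/(\d)[+\-*\/]*([+\-*\/])(?=\d)/g, '$1$2');
}

console.log(collapseOperators('something like 90+*-+80-+/*70 and more'));
// "something like 90+80*70 and more"
console.log(collapseOperators('Sally bought 4 eggs, but only 3 of them made it back.'));
// unchanged, because " eggs, but only " is not a run of operators
```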
