Regex remove duplicate adjacent characters in javascript - javascript

I've been struggling to get my regex function to work as intended. My goal is to iterate over a string repeatedly (until no match is found) and remove all duplicate, adjacent characters. Besides checking whether two adjacent characters are equal, the regex should only remove the match when exactly one of the pair is uppercase.
e.g. the regex should only remove 'Xx' or 'xX'.
My current regex only removes matches where a lowercase character is followed by any uppercase character.
(.)(([a-z]{0})+[A-Z])
How can I implement looking for the same adjacent character and the pattern of looking for an uppercase character followed by an equal lowercase character?

You'd either have to list out all possible combinations, e.g.
aA|Aa|bB|Bb...
Or implement it more programmatically, without a regex:
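let str = 'fooaBbAfoo';
outer:
while (true) {
  for (let i = 0; i < str.length - 1; i++) {
    const thisChar = str[i];
    const nextChar = str[i + 1];
    // A pair qualifies when both are the same letter but differ in case
    if (/[a-z]/i.test(thisChar) && thisChar.toUpperCase() === nextChar.toUpperCase() && thisChar !== nextChar) {
      // Cut the pair out, then rescan from the start of the string
      str = str.slice(0, i) + str.slice(i + 2);
      continue outer;
    }
  }
  break; // a full pass found no pair, so we are done
}
console.log(str);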
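The loop above rescans the string from the beginning after every removal. As a hedged alternative (a different technique from the answer's, not a correction of it), a single pass with a stack should give the same result for this kind of pair cancellation. The name `collapseCasePairs` is made up for this sketch, and it assumes only ASCII letters should cancel.

```javascript
// Hypothetical helper: removes adjacent pairs such as "aA" or "Aa" in one pass.
function collapseCasePairs(input) {
  const stack = [];
  for (const ch of input) {
    const top = stack[stack.length - 1];
    // The pair cancels when both are ASCII letters, equal ignoring case,
    // but not identical (so exactly one of them is uppercase).
    const cancels =
      top !== undefined &&
      /^[a-z]$/i.test(top) &&
      /^[a-z]$/i.test(ch) &&
      top !== ch &&
      top.toUpperCase() === ch.toUpperCase();
    if (cancels) {
      stack.pop();
    } else {
      stack.push(ch);
    }
  }
  return stack.join('');
}

console.log(collapseCasePairs('fooaBbAfoo')); // "foofoo"
```

Removing "Bb" exposes "aA", which the stack handles naturally because the next character is always compared with the last character that survived.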

Looking for the same adjacent character: /(.)\1/
Looking for an uppercase character followed by an equal lowercase character isn't possible with a single pattern in JavaScript (at the time of writing), since it doesn't support inline modifiers. If they were supported, the regex would be /(.)(?!\1)(?i:\1)/, which matches both 'xX' and 'Xx'.
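Without inline modifiers, the alternation mentioned earlier (aA|Aa|bB|Bb...) does not have to be typed by hand. The following is a minimal sketch that generates it, assuming only the 26 ASCII letters matter; `removeCasePairs` is a hypothetical name.

```javascript
// Build "aA|Aa|bB|Bb|..." for a-z (assumption: ASCII letters only).
const pairs = [];
for (let code = 97; code <= 122; code++) {
  const lower = String.fromCharCode(code);
  const upper = lower.toUpperCase();
  pairs.push(lower + upper, upper + lower);
}
const pairRegex = new RegExp(pairs.join('|'), 'g');

// Hypothetical helper: repeat the replacement until nothing changes,
// because removing one pair can bring a new pair together.
function removeCasePairs(input) {
  let previous;
  let current = input;
  do {
    previous = current;
    current = current.replace(pairRegex, '');
  } while (current !== previous);
  return current;
}

console.log(removeCasePairs('fooaBbAfoo')); // "foofoo"
```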

Related

How to join the string either with a dot, exclamation mark, or a question mark?

I want to convert a string to the sentence case. That is, uppercase the first character in each sentence and lowercase the following characters. I managed to do this. However, after splitting the string and converting it to a sentence case, I need to join it again with a corresponding character.
Here is my code that splits the string into sentences:
const string = "my seNTencE . My sentence! my another sentence. yEt another senTence? Again my sentence .";
function splitString(str) {
  str = str.split(/[.!?]/);
  for (let i = 0; i < str.length; i++) {
    str[i] = str[i].trim();
  }
  for (let i = 0; i < str.length; i++) {
    str[i] = str[i].charAt(0).toUpperCase() + str[i].slice(1).toLowerCase();
  }
  return str;
}
console.log(splitString(string));
In the return statement, I want to return the joined strings. For example, the first sentence must end with a dot, the second must end with an exclamation mark, and so on. How can I implement this?
str.split eliminates the result of the regex match from the string. If you want to keep it, you can place the separator in a lookbehind like this:
str.split(/(?<=[.!?])/);
The syntax (?<= ) means the regex will find positions that are preceded by punctuation, but won't include said punctuation in the match, so the split method will leave it in.
As a side note, keep in mind that this function will ruin acronyms, proper nouns, and the word I. Forcing the first letter after a period to be a capital letter is probably fine, but lowercasing everything else will likely do more harm than good.
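To make the lookbehind suggestion concrete, here is a small sketch that plugs it into the asker's sentence-case logic and joins the pieces again. The helper name `toSentenceCase` and the choice of a single space as the joiner are assumptions for illustration.

```javascript
const text = "my seNTencE . My sentence! my another sentence. yEt another senTence? Again my sentence .";

// Hypothetical helper: split after each ., ! or ? and rebuild the text.
function toSentenceCase(str) {
  return str
    .split(/(?<=[.!?])/)               // the punctuation stays on each piece
    .map((part) => part.trim())
    .filter((part) => part.length > 0) // drop pieces that were only whitespace
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1).toLowerCase())
    .join(' ');
}

console.log(toSentenceCase(text));
// "My sentence . My sentence! My another sentence. Yet another sentence? Again my sentence ."
```

Note that whitespace in front of a punctuation mark in the input (as in "seNTencE .") is kept as-is by this sketch.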
Use a regular expression with capture groups. This regex uses the lazy ? modifier so the match will end at the first [!.?], and the global g flag to grab all matches.
const string = "my seNTencE . My sentence! my another sentence. yEt another senTence? Again my sentence .";
const rx = /(.*?)([.!?])/g;
const found = [];
let m; // declared explicitly so the loop does not create an implicit global
while ((m = rx.exec(string)) !== null) {
  let str = m[1].trim();
  str = str.charAt(0).toUpperCase() + str.slice(1).toLowerCase();
  found.push(str + m[2]);
}
console.log(found);
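The same capture-group idea can also be written without a manual exec loop. This is a sketch using String.prototype.matchAll, assuming an environment that supports it; joining with a single space is an illustrative choice.

```javascript
const text = "my seNTencE . My sentence! my another sentence. yEt another senTence? Again my sentence .";

// Group 1 is the sentence body, group 2 is the closing punctuation mark.
const sentences = Array.from(text.matchAll(/(.*?)([.!?])/g), ([, body, mark]) => {
  const trimmed = body.trim();
  return trimmed.charAt(0).toUpperCase() + trimmed.slice(1).toLowerCase() + mark;
});

console.log(sentences.join(' '));
// "My sentence. My sentence! My another sentence. Yet another sentence? Again my sentence."
```

Because the body is trimmed before the mark is appended, stray spaces before the punctuation disappear here.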

Regex match apostrophe inside, but not around words, inside a character set

I'm counting how many times different words appear in a text using Regular Expressions in JavaScript. My problem is when I have quoted words: 'word' should be counted simply as word (without the quotes, otherwise they'll behave as two different words), while it's should be counted as a whole word.
(?<=\w)(')(?=\w)
This regex can identify apostrophes inside, but not around words. Problem is, I can't use it inside a character set such as [\w]+.
(?<=\w)(')(?=\w)|[\w]+
Will count it's a 'miracle' of nature as 7 words, instead of 5 (it, ', s becoming 3 different words). Also, the third word should be selected simply as miracle, and not as 'miracle'.
To make things even more complicated, I need to capture diacritics too, so I'm using [A-Za-zÀ-ÖØ-öø-ÿ] instead of \w.
How can I accomplish that?
1) You can simply use the regex /[^\s]+/g:
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g);
console.log(result.length);
console.log(result);
2) If you only need the total number of words in a string, you can also use split:
const str = `it's a 'miracle' of nature`;
const result = str.split(/\s+/);
console.log(result.length);
console.log(result);
3) If you want each word without a quote at its start and end, you can do this:
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g).map((s) => {
  s = s[0] === "'" ? s.slice(1) : s;
  s = s[s.length - 1] === "'" ? s.slice(0, -1) : s;
  return s;
});
console.log(result.length);
console.log(result);
You might use an alternation with 2 capture groups, and then check for the values of those groups.
(?<!\S)'(\S+)'(?!\S)|(\S+)
(?<!\S)' Negative lookbehind, assert a whitespace boundary to the left and match '
(\S+) Capture group 1, match 1+ non whitespace chars
'(?!\S) Match ' and assert a whitespace boundary to the right
| Or
(\S+) Capture group 2, match 1+ non whitespace chars
const regex = /(?<!\S)'(\S+)'(?!\S)|(\S+)/g;
const s = "it's a 'miracle' of nature";
for (const m of s.matchAll(regex)) {
  // Group 1 holds a quoted word without its quotes; group 2 holds any other word
  if (m[1]) console.log(m[1]);
  if (m[2]) console.log(m[2]);
}
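The question also asks for diacritics and for a per-word count, which the answers above do not show. The following is a hedged sketch that reuses the asker's character class [A-Za-zÀ-ÖØ-öø-ÿ] and allows an apostrophe only between letters. The helper name `countWords` and the decision to lowercase the keys are assumptions.

```javascript
const letter = "[A-Za-zÀ-ÖØ-öø-ÿ]";
// One or more letters, optionally followed by apostrophe + letters (e.g. "it's").
// Quotes around a word are never part of the match.
const wordRegex = new RegExp(`${letter}+(?:'${letter}+)*`, "g");

// Hypothetical helper: returns a Map of lowercase word -> number of occurrences.
function countWords(text) {
  const counts = new Map();
  for (const [word] of text.matchAll(wordRegex)) {
    const key = word.toLowerCase();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const counts = countWords("it's a 'miracle' of nature, a déjà vu");
console.log([...counts.entries()]);
```

Here "it's" stays one word, 'miracle' is counted as miracle, and "a" is counted twice.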

javascript remove all succeeding occurrence of a character in string

I'm aiming to remove every occurrence after the first of two particular characters in a string: the dot and the negative sign. Say we have -123-456.78.9.0-12; I should get -123456.789012 afterwards. Can it be done via regex replace?
If I may add, my complete goal is to just allow numbers, negative sign, and dot, with the negative sign only being allowed either as the first character or not present at all.
Thanks so much.
You can do this in 3 replace calls:
function repl(n) {
  return n.replace(/[^\d.-]+/g, '') // remove everything except digits, - and .
    .replace(/^([^.]*\.)|\./g, '$1') // remove all dots except the first one
    .replace(/(?!^)-/g, ''); // remove all hyphens except a leading one
}
console.log(repl('-123-456.78.9.0-12'))
//=> "-123456.789012"
console.log(repl('-123-#456.78.9.0-12-abc-foo'))
//=> "-123456.789012"
console.log(repl('-1234'))
//=> "-1234"
console.log(repl('#-123-#456.78.9.0-12-abc-foo'))
//=> "-123456.789012"
Here:
The first replace call removes every character that is not a digit, - or .
The second replace call removes every dot except the first one.
The third replace call removes every hyphen except one at the very start of the string.
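To see what each of the three calls contributes, the chain can be split into intermediate steps. This is only a trace of the answer's own patterns on the question's sample input; the variable names are made up.

```javascript
const input = '-123-456.78.9.0-12';

// Step 1: strip anything that is not a digit, dot or hyphen (nothing to strip here).
const step1 = input.replace(/[^\d.-]+/g, '');
// Step 2: the first alternative re-inserts the text up to the first dot via $1;
// every later dot matches the second alternative, where $1 is empty.
const step2 = step1.replace(/^([^.]*\.)|\./g, '$1');
// Step 3: drop every hyphen that is not at position 0.
const step3 = step2.replace(/(?!^)-/g, '');

console.log(step1); // "-123-456.78.9.0-12"
console.log(step2); // "-123-456.7890-12"
console.log(step3); // "-123456.789012"
```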
If you want to avoid using RegExps, you can do something like this:
let str = '-123-456.78.9.0-12';
let output = '';
if (str[0] == '-') output += '-'; // keep a leading minus sign only
let periodIdx = str.indexOf('.'); // position of the first dot, or -1 if there is none
for (let idx = 0; idx < str.length; idx += 1) {
  let char = str.charCodeAt(idx);
  if (char > 47 && char < 58) output += str[idx]; // char codes 48-57 are the digits 0-9
  if (idx == periodIdx) output += '.';
}
console.log(output);
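For the complete goal quoted from the question ("just allow numbers, negative sign, and dot, with the negative sign only being allowed either as the first character or not present at all"), a validation pattern along these lines can be used. Note that [^.-] accepts any character other than a dot or hyphen, not just digits.
^-?[^.-]*\.?[^.-]*$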
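A quick check of that pattern against a few inputs shows where it is permissive. The stricter `digitsOnly` variant below is an assumption about the intent ("just allow numbers"), not something from the answer.

```javascript
const original = /^-?[^.-]*\.?[^.-]*$/; // pattern from the answer
const digitsOnly = /^-?\d*\.?\d*$/;     // assumed stricter variant, not from the answer

for (const sample of ['-123456.789012', '-123-456.78.9.0-12', '12a.5']) {
  console.log(sample, original.test(sample), digitsOnly.test(sample));
}
// -123456.789012 true true
// -123-456.78.9.0-12 false false
// 12a.5 true false
```

Both patterns also accept an empty string, a lone "-" and a lone ".", so a real input check would probably need an extra rule for those cases.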

JavaScript: Amend the Sentence

I am having trouble with the JavaScript problem below.
Question:
You have been given a string s, which is supposed to be a sentence. However, someone forgot to put spaces between the different words, and for some reason they capitalized the first letter of every word. Return the sentence after making the following amendments:
Put a single space between the words.
Convert the uppercase letters to lowercase.
Example
"CodefightsIsAwesome", the output should be "codefights is awesome";
"Hello", the output should be "hello".
My current code is:
Right now, my second for-loop just manually slices the parts from the string.
How can I make this dynamic and insert a space in front of each capital letter?
You can use String.prototype.match() with the RegExp /[A-Z][^A-Z]*/g to match an uppercase letter A-Z followed by zero or more characters that are not A-Z; chain Array.prototype.map() to call .toLowerCase() on each matched word, then .join(" ") to put a space between the words in the resulting string.
var str = "CodefightsIsAwesome";
var res = str.match(/[A-Z][^A-Z]*/g).map(word => word.toLowerCase()).join(" ");
console.log(res);
Alternatively, as suggested by @FissureKing, you can use String.prototype.replace() with .trim() and .toLowerCase() chained:
var str = "CodefightsIsAwesome";
var res = str.replace(/[A-Z][^A-Z]*/g, word => word + ' ').trim().toLowerCase();
console.log(res);
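One caveat with the match() version: it returns null when the string contains no capital letter, and it skips any lowercase text before the first capital. A hedged variation that avoids both is to split on a lookahead instead, which is a different technique from the two snippets above. The function name `amendWithSplit` is made up.

```javascript
// Hypothetical helper: split in front of every capital letter, then lowercase and join.
function amendWithSplit(s) {
  return s
    .split(/(?=[A-Z])/) // zero-width split: no characters are consumed
    .map((word) => word.toLowerCase())
    .join(' ');
}

console.log(amendWithSplit("CodefightsIsAwesome"));  // "codefights is awesome"
console.log(amendWithSplit("Hello"));                // "hello"
console.log(amendWithSplit("noCapitalOnFirstWord")); // "no capital on first word"
```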
Rather than coding a loop, I'd do it in one line with a (reasonably) simple string replacement:
function amendTheSentence(s) {
  return s.replace(/[A-Z]/g, function(m) { return " " + m.toLowerCase() })
    .replace(/^ /, "");
}
console.log(amendTheSentence("CodefightsIsAwesome"));
console.log(amendTheSentence("noCapitalOnFirstWord"));
console.log(amendTheSentence("ThereIsNobodyCrazierThanI"));
That is, match any uppercase letter with the regular expression /[A-Z]/, replace the matched letter with a space plus that letter in lowercase, then remove any space that was added at the start of the string.
Further reading:
String .replace() method
Regular expressions
We can loop through once.
The below assumes the very first character should always be capitalized in our return array. If that is not true, simply remove the first if block from below.
For each character after that, we check to see if it is capitalized. If so, we add it to our return array, prefaced with a space. If not, we add it as-is into our array.
Finally, we join the array back into a string and return it.
const sentence = "CodefightsIsAwesome";
const amend = function(s) {
  const ret = []; // declared locally instead of leaking an implicit global
  for (let i = 0; i < s.length; i++) {
    const char = s[i];
    if (i === 0) {
      ret.push(char.toUpperCase());
    } else if (char.toUpperCase() === char) {
      ret.push(` ${char.toLowerCase()}`);
    } else {
      ret.push(char);
    }
  }
  return ret.join('');
};
console.log(amend(sentence));

Javascript - Using Concatenated String in Regex

I'm trying to find if a given string of digits contains a sequence of three identical digits.
Using a for loop, I build a three-digit sequence from each digit in the string and check it against the string with a regex:
var str = "6854777322"
for(var i=0; i<str.length; i++)
{
    seqToCompare = str[i] + str[i] + str[i];
    var re = new RegExp(seqToCompare, "g");
    if(str.match(re).length == 1)
    {
        match = str[i];
    }
}
console.log(match)
The result should be seven (if I put 777 in seqToCompare, it would work), but it looks like the concatenation causes it to fail. Console shows "cannot read property length for null".
You can test it here - https://jsfiddle.net/kwnL7vLs/
I tried .toString, setting seqToCompare in Regex format and even parsing it as int (out of desperation for not knowing what to do anymore...)
Rather than looping over each character, you can use a simple regex to get a digit that is repeated 3 times:
/(\d)(?=\1{2})/
(\d) - Matches a digit and captures it in group #1
(?=\1{2}) - A lookahead asserting that the text captured by group #1 is repeated twice more immediately after the current position
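Applied to the string from the question, the pattern can be used like this. It is a minimal sketch; the variable names are made up.

```javascript
const digits = "6854777322";

// match() without the g flag returns the match plus its capture groups, or null.
const tripled = digits.match(/(\d)(?=\1{2})/);
console.log(tripled ? tripled[1] : null); // "7"

// No digit is repeated three times here, so the result is null.
console.log("12345".match(/(\d)(?=\1{2})/)); // null
```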
anubhava's answer is the way to go, as it's more efficient and simpler. However, if you're wondering why your code specifically is giving an error: str.match() returns null when no match is found, and your code reads the length property of that return value anyway.
Try this instead:
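var str = "6854777322";
var match = null; // stays null when no digit is repeated three times
for(var i=0; i<str.length; i++)
{
    var seqToCompare = str[i] + str[i] + str[i];
    var re = new RegExp(seqToCompare, "g");
    // str.match() returns null when nothing matches, so test the result itself
    if(str.match(re))
    {
        match = str[i];
    }
}
console.log(match);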
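If the per-character loop is kept, the regex is not strictly needed. As a hedged alternative (a swapped-in technique, not what either answer proposes), String.prototype.includes with String.prototype.repeat avoids both the null check and building a RegExp from input characters. The function name `findTripledDigit` is hypothetical.

```javascript
// Hypothetical helper: returns a character that occurs three times in a row, or null.
function findTripledDigit(str) {
  for (const digit of str) {
    // "7".repeat(3) is "777"; includes() returns a boolean, never null
    if (str.includes(digit.repeat(3))) {
      return digit;
    }
  }
  return null;
}

console.log(findTripledDigit("6854777322")); // "7"
console.log(findTripledDigit("12345"));      // null
```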
