Regex non exclusive group - javascript

Do you know how I can get this result with regex? In plain words, I want each group of successive vowels together with the maximum number of consonants surrounding it (backward or forward).
Example :
input : alliibaba
outputs :
[all]
[lliib]
[bab]
[ba]
I tried :
[bcdfghjklmnpqrstvwxyz]*[aeiou]+[bcdfghjklmnpqrstvwxyz]*
but it returns distinct, non-overlapping groups, so I don't know whether this is possible with regex...
Thank you for any help.

You can put the part of the regex that you want to be made non-exclusive in a lookaround pattern so that it leaves that portion of the search buffer for the next match. Since your rule appears to be that vowels do not overlap between matches while the surrounding consonants can, you can group the consonants after vowels in a lookahead pattern, while putting vowels and any preceding consonants in another capture group, and then concatenate the 2 matching groups into a string for output:
var re = /([bcdfghjklmnpqrstvwxyz]*[aeiou]+)(?=([bcdfghjklmnpqrstvwxyz]*))/g;
var s = 'alliibaba';
var m;
do {
    m = re.exec(s);
    if (m) {
        console.log(m[1] + m[2]);
    }
} while (m);
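The same lookahead pattern can also be wrapped in a reusable function. The following is a minimal sketch, assuming an environment that supports String.prototype.matchAll (ES2020); the function name vowelGroups is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: returns each vowel run with its surrounding consonants.
function vowelGroups(s) {
  // Group 1 consumes any leading consonants plus the vowels. Group 2 sits
  // inside a lookahead, so the trailing consonants are captured but remain
  // available as leading consonants for the next match.
  const re = /([bcdfghjklmnpqrstvwxyz]*[aeiou]+)(?=([bcdfghjklmnpqrstvwxyz]*))/g;
  return Array.from(s.matchAll(re), (m) => m[1] + m[2]);
}

console.log(vowelGroups('alliibaba')); // [ 'all', 'lliib', 'bab', 'ba' ]
```

Returning an array instead of logging inside the loop makes the result easier to reuse, for example with join or further filtering.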

Related

Javascript Regex replace repetitions with exact one less

I have a string in which a character (\n) is repeated multiple times. I want to replace each run of repetitions with a run that is one repetition shorter.
So let's suppose that we have a string like this (\n repeats 3 times)
hello\n\n\ngoodbye
And I want to convert it to this (\n repeats 2 times)
hello\n\ngoodbye
I know how to find out with regex when the occurrence repeats (e.g. /\n\n+/g), but I don't know how to capture the exact number of repetitions to use it in the replace.
Is it possible to do that with regex?
You can search using this regex:
/\n(\n*)/g
And replace using: $1
RegEx Details:
\n: Match a line break
(\n*): Match 0 or more line breaks and capture them in the 1st capture group. Note that this group holds one fewer \n than the whole match
Replacing with $1 substitutes each run of \n with the contents of the 1st capture group, thus reducing the line breaks by one.
Code:
const string = 'hello\n\n\ngoodbye';
console.log(string);
const re = /\n(\n*)/g;
var repl = string.replace(re, "$1");
console.log(repl);
replace (\n*)\n with $1?
let str = 'hello\n\n\ngoodbye'
console.log(str)
console.log(str.replace(/(\n*)\n/g,'$1'))
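Note that the patterns above also remove a newline that stands alone, because a run of length one becomes a run of length zero. If only runs of two or more should be shortened (as the /\n\n+/ in the question suggests), one possible variant requires at least one captured newline. This is a sketch, not part of either answer, and the function name shortenRuns is hypothetical.

```javascript
// Hypothetical helper: shortens every run of two or more "\n" by one.
function shortenRuns(text) {
  // "\n" followed by one or more captured "\n": a lone newline cannot match,
  // and each longer run is replaced by the captured part (one newline fewer).
  return text.replace(/\n(\n+)/g, '$1');
}

// JSON.stringify makes the remaining newlines visible as \n in the output.
console.log(JSON.stringify(shortenRuns('hello\n\n\ngoodbye\nagain')));
// "hello\n\ngoodbye\nagain"
```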

regex if capture group matches string

I need to build a simple script to hyphenate Romanian words. I've seen several and they don't implement the rules correctly.
var words = "arta codru";
Rule: if 2 consonants are between 2 vowels, then they become split between syllables unless they belong in this array in which case both consonants move to the second syllable:
var exceptions_to_regex2 = ["bl","cl","dl","fl","gl","hl","pl","tl","vl","br","cr","dr","fr","gr","hr","pr","tr","vr"];
Expected result: ar-ta co-dru
The code so far:
https://playcode.io/156923?tabs=console&script.js&output
var words = "arta codru";
var exceptions_to_regex2 = ["bl","cl","dl","fl","gl","hl","pl","tl","vl","br","cr","dr","fr","gr","hr","pr","tr","vr"];
var regex2 = /([aeiou])([bcdfghjklmnprstvwxy]{1})(?=[bcdfghjklmnprstvwxy]{1})([aeiou])/gi;
console.log(words.replace(regex2, '$1$2-'));
console.log("desired result: ar-ta co-dru");
Now I would need to do something like this:
if (exceptions_to_regex2.includes($2+$3)){
words.replace(regex2, '$1-');
}
else {
words.replace(regex2, '$1$2-');
}
Obviously it doesn't work because I can't just use the capture groups as I would a regular variable. Please help.
You may code your exceptions as a pattern to check for after a vowel, and stop matching there, or you may still consume any other consonant before another vowel, and replace with the backreference to the whole match with a hyphen right after:
.replace(/[aeiou](?:(?=[bcdfghptv][lr])|[bcdfghj-nprstvwxy](?=[bcdfghj-nprstvwxy][aeiou]))/g, '$&-')
Add i modifier after g if you need case insensitive matching.
See the regex demo.
Details
[aeiou] - a vowel
(?: - start of a non-capturing group:
(?=[bcdfghptv][lr]) - a positive lookahead that requires the exception letter clusters to appear immediately to the right of the current position
| - or
[bcdfghj-nprstvwxy] - a consonant
(?=[bcdfghj-nprstvwxy][aeiou]) - followed with any consonant and a vowel
) - end of the non-capturing group.
The $& in the replacement pattern is the placeholder for the whole match value (at regex101, only $0 can be used at the moment, since the site does not support language-specific replacement patterns).
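What the question was reaching for, deciding per match based on the captured consonants, can also be written by passing a function as the second argument to String.prototype.replace; that function receives the capture groups as arguments. This is a minimal sketch covering only the two-consonant rule from the question. The function name hyphenate is hypothetical, and the consonant class is borrowed from the pattern above.

```javascript
var exceptions = ["bl","cl","dl","fl","gl","hl","pl","tl","vl",
                  "br","cr","dr","fr","gr","hr","pr","tr","vr"];

// Hypothetical helper: applies the "two consonants between two vowels" rule.
function hyphenate(words) {
  // vowel, consonant, consonant, then a vowel that is checked but not consumed
  var re = /([aeiou])([bcdfghj-nprstvwxy])([bcdfghj-nprstvwxy])(?=[aeiou])/gi;
  return words.replace(re, function (match, vowel, c1, c2) {
    var pair = (c1 + c2).toLowerCase();
    return exceptions.includes(pair)
      ? vowel + '-' + c1 + c2   // exception cluster: both consonants move right
      : vowel + c1 + '-' + c2;  // default: split between the two consonants
  });
}

console.log(hyphenate("arta codru")); // ar-ta co-dru
```

Keeping the exceptions in an array makes the list easier to extend than a character class, at the cost of one function call per match.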

regex match not outputting the adjacent matches javascript

I was experimenting with regex in JavaScript and came across an issue. Consider the string str = "+d+a+". I was trying to output the characters in the string that are surrounded by +, so I used str.match(/\+[a-z]\+/ig). I was expecting ["+d+","+a+"], but what I got is just ["+d+"]; "+a+" does not show up in the output. Why?
.match(/.../g) returns all non-overlapping matches. Your regex requires a + sign on each side. Given your target string:
+d+a+
^^^
  ^^^
Your matches would have to overlap in the middle in order to return "+a+".
You can use look-ahead and a manual loop to find overlapping matches:
var str = "+d+a+";
var re = /(?=(\+[a-z]\+))/g;
var matches = [], m;
while (m = re.exec(str)) {
    matches.push(m[1]);
    re.lastIndex++;
}
console.log(matches);
With regex, once a character has been consumed by a match, it is no longer available to the next match.
For example, a regex like /aba/g wouldn't find 2 aba's in a string like "ababa".
Because the second "a" was already consumed.
However, that can be overcome by using a positive lookahead (?=...).
Because lookaheads just check what's ahead without actually consuming it.
So a regex like /(ab)(?=(a))/g would return 2 capture groups with 'ab' and 'a' for each 'aba'.
But in this case it just needs to be followed by 1 fixed character '+'.
So it can be simplified, because you don't really need capture groups for this one.
Example snippet:
var str = "+a+b+c+";
var matches = str.match(/\+[a-z]+(?=\+)/g).map(function(m){return m + '+'});
console.log(matches);
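The /aba/g example above can be checked directly. This sketch compares a plain global match with a lookahead that captures the same text without consuming it. It assumes String.prototype.matchAll (ES2020) is available; matchAll steps past zero-length matches on its own, so no manual lastIndex increment is needed.

```javascript
var text = "ababa";

// Plain global match: the first "aba" consumes the middle "a",
// so the second, overlapping "aba" is never found.
var consumed = text.match(/aba/g);

// Lookahead with a capture group: the match itself is empty, so nothing is
// consumed, and group 1 holds the text seen at each position.
var overlapping = Array.from(text.matchAll(/(?=(aba))/g), function (m) {
  return m[1];
});

console.log(consumed);    // [ 'aba' ]
console.log(overlapping); // [ 'aba', 'aba' ]
```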

Regular expression with asterisk quantifier

This documentation states this about the asterisk quantifier:
Matches the preceding character 0 or more times.
It works in something like this:
var regex = /<[A-Za-z][A-Za-z0-9]*>/;
var str = "<html>";
console.log(str.match(regex));
The result of the above is : <html>
But when tried on the following code to get all the "r"s in the string below, it only returns the first "r". Why is this?
var regex = /r*/;
var str = "rodriguez";
console.log(str.match(regex));
Why, in the first example does it cause "the preceding" character/token to be repeated "0 or more times" but not in the second example?
var regex = /r*/;
var str = "rodriguez";
The regex engine will first try to match r in rodriguez from left to right and since there is a match, it consumes this match.
The regex engine then tries to match another r, but the next character is o, so it stops there.
Without the global flag g (used as so var regex = /r*/g;), the regex engine will stop looking for more matches once the regex is satisfied.
Try using:
var regex = /a*/;
var str = "cabbage";
The match will be an empty string, despite there being a's in the string! This is because the regex engine first tries to find a in cabbage from left to right, but the first character is c. Since this doesn't match, the engine falls back to matching a zero times at that position. The regex is thus satisfied and the matching ends there.
It might be worth pointing out that * alone is greedy, which means it will first try to match as many as possible (the 'or more' part from the description) before trying to match 0 times.
To get all r from rodriguez, you will need the global flag as mentioned earlier:
var regex = /r*/g;
var str = "rodriguez";
You'll get every r, plus an empty string at each position where there is no r, since * also matches 'nothing'.
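The three cases described above can be verified in a few lines. This is a small sketch; the variable names are chosen for illustration only.

```javascript
// Without g: only the first match is returned, here the leading "r".
var first = "rodriguez".match(/r*/);

// Without g on "cabbage": the engine succeeds at position 0 by matching
// "a" zero times, so the result is an empty string.
var empty = "cabbage".match(/a*/);

// With g: every position yields a match, either "r" or an empty string.
var all = "rodriguez".match(/r*/g);

// Dropping the empty strings leaves just the r's.
var onlyRs = all.filter(Boolean);

console.log(first[0]); // r
console.log(JSON.stringify(empty[0])); // ""
console.log(onlyRs); // [ 'r', 'r' ]
```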
Use global switch to match 1 or more r anywhere in the string:
var regex = /r+/g;
In your other regex:
var regex = /<[A-Za-z][A-Za-z0-9]*>/;
You're matching a literal < followed by a letter, followed by 0 or more letters or digits, followed by a literal >, and it will match <html> perfectly.
But if you have input as <foo>:<bar>:<abc> then it will just match <foo> not other segments. To match all segments you need to use /<[A-Za-z][A-Za-z0-9]*>/g with global switch.

How to match start of line and white space in a lookahead expression

I am sure this is really easy, but how do I do the following?
1. match either start of line or whitespace
2. match a-z
3. match either end of line or whitespace
I only want to return item no. 2, so for the following string
"one 1.ignore two 2ignore ignore3 three"
The expression will return
["one","two","three"]
Thanks
A regex that matches only these items needs lookbehind, which JavaScript did not support when this was written (lookbehind assertions were added in ES2018). Without it, either you do a manual iteration and extract matching groups (as demonstrated by @Some1.Kill.The.DJ), or you split the string instead of matching:
str.split(/\s+(?:\S*?(?![a-z])\S+\s+)*/);
This expression does match all whitespaces combined with words that contain at least one character that is not [a-z]. However, this regex is complicated and not easy to maintain; also it does yield empty strings sometimes. Better, do something like
str.split(/\s+/).filter(RegExp.prototype.test.bind(/^[a-z]+$/));
Use this code:
var str = 'one 1.ignore two 2ignore ignore3 three';
str = str.replace(/\s(?=[a-z])/ig, function(text, p1) {
return p1 ? p1 : text;
});
var arr = str.match(/([a-z]+)(?=\s|$)/ig);
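In engines that implement ES2018 lookbehind assertions (current versions of Node.js and most modern browsers; this is an assumption about the runtime, not something the answers above relied on), the three conditions from the question can be expressed in a single pattern. A minimal sketch:

```javascript
var str = 'one 1.ignore two 2ignore ignore3 three';

// (?<=^|\s) : preceded by start of string or whitespace (not consumed)
// [a-z]+    : the word itself, lowercase letters only
// (?=\s|$)  : followed by whitespace or end of string (not consumed)
var words = str.match(/(?<=^|\s)[a-z]+(?=\s|$)/g);

console.log(words); // [ 'one', 'two', 'three' ]
```

Because neither assertion consumes the surrounding whitespace, adjacent words can both match even though they share a single space between them.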
