Regex to find character only if it occurs 4 times - javascript

I'm stuck on making this regex. I tried using look-ahead and look-behind together, but I couldn't use the capture group in the look-behind. I need to extract a character from a string ONLY if it occurs exactly 4 times in a row.
If I have these strings
3346AAAA44
3973BBBBBB44
9755BBBBBBAAAA44
The first one will match because it has 4 A's in a row.
The second one will NOT match because it has 6 B's in a row.
The third one will match because it still has 4 A's. What makes it even more frustrating is that it can be any character from A to Z occurring 4 times.
Positioning does not matter.
EDIT: My attempt at the regex, which doesn't work:
(([A-Z])\2\2\2)(?<!\2*)(?!\2*)

If lookbehind is supported, capture the character, then use a negative lookbehind for \1. (the backreference followed by any one character): if that matches, the captured character is preceded by the same character, so the run started earlier. Then repeat the backreference 3 times and add a negative lookahead for \1:
`3346AAAA44
3973BBBBBB44
9755BBBBBBAAAA44`
  .split('\n')
  .forEach((str) => {
    console.log(str.match(/([a-z])(?<!\1.)\1{3}(?!\1)/i));
  });
([a-z]) - Capture a character
(?<!\1.) - Negative lookbehind: make sure the captured character is not preceded by the same character (the . matches the captured character itself, so \1 would have to match the character before it)
\1{3} - Match the same character that was captured 3 more times
(?!\1) - After the 4th match, make sure it's not followed by the same character
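To collect the runs rather than log the raw match objects, the pattern above can be wrapped in a small helper. This is only a sketch, not part of the original answer: the name exactRuns and the n parameter are made up for illustration, and it assumes an engine with lookbehind support (ES2018 or later).

```javascript
// Hypothetical helper: list every run of exactly `n` identical letters.
function exactRuns(str, n = 4) {
  // Same shape as the pattern above, with the repeat count made configurable.
  // (?<!\1.) fails if the captured letter is preceded by the same letter;
  // (?!\1) fails if the run continues past `n` letters.
  const re = new RegExp(`([a-z])(?<!\\1.)\\1{${n - 1}}(?!\\1)`, "gi");
  return Array.from(str.matchAll(re), (m) => m[0]);
}

console.log(exactRuns("3346AAAA44"));       // [ 'AAAA' ]
console.log(exactRuns("3973BBBBBB44"));     // []
console.log(exactRuns("9755BBBBBBAAAA44")); // [ 'AAAA' ]
```

Because the i flag is kept from the original pattern, mixed-case runs such as aAaA count as four equal letters; drop the flag if that is not wanted.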

Another version without lookbehind (see demo). The sequence of 4 equal characters will be captured in Group 2.
(?:^|(?:(?=(\w)(?!\1))).)(([A-Z])\3{3})(?:(?!\3)|$)
(?:^|(?:(?=(\w)(?!\1))).) - either match at the beginning of the string, or consume one character that is not followed by a copy of itself, so the run of 4 cannot have started earlier
(([A-Z])\3{3}) - capture 4 repeated [A-Z] chars in Group 2
(?:(?!\3)|$) - ensure the first char after those 4 is different, or that it's the end of the string
As bobble-bubble suggested in a comment, the expression above can be simplified to (demo):
(?:^|(\w)(?!\1))(([A-Z])\3{3})(?!\3)
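Since this variant may consume one extra character before the run, the run itself has to be read from Group 2 rather than from the whole match. A minimal sketch of that, using the simplified pattern (the helper name exactRunsNoLookbehind is an assumption for illustration):

```javascript
// Hypothetical helper: extract Group 2 (the run of 4) from every match.
function exactRunsNoLookbehind(str) {
  const re = /(?:^|(\w)(?!\1))(([A-Z])\3{3})(?!\3)/g;
  // m[0] may include the character before the run; m[2] is the run itself.
  return Array.from(str.matchAll(re), (m) => m[2]);
}

console.log(exactRunsNoLookbehind("3346AAAA44"));       // [ 'AAAA' ]
console.log(exactRunsNoLookbehind("3973BBBBBB44"));     // []
console.log(exactRunsNoLookbehind("9755BBBBBBAAAA44")); // [ 'AAAA' ]
```

One difference from the lookbehind versions: because the preceding character is consumed, two directly adjacent runs such as AAAABBBB yield only the first run here.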

Another variant is to capture the first char in group 1.
Then assert that the 2 chars to the left of the current position (which include the char just captured) are not both group 1, and match group 1 an additional 3 times, for a total of 4 equal chars.
Finally, assert that what is on the right is not group 1.
([A-Z])(?<!\1\1)\1{3}(?!\1)
([A-Z]) Capture group 1, match a single char A-Z
(?<!\1\1) Negative lookbehind, assert what is on the left is not 2 times group 1
\1{3} Match 3 times group 1
(?!\1) Assert what is on the right is not group 1
For example
let pattern = /([A-Z])(?<!\1\1)\1{3}(?!\1)/g;
[
  "3346AAAA44",
  "3973BBBBBB44",
  "9755BBBBBBAAAA44",
  "AAAA",
  "AAAAB",
  "BAAAAB"
].forEach(s =>
  console.log(s + " --> " + s.match(pattern))
);

Related

How do you move n characters that are between 2 words to the end of the line (inside of a multi-line string)?

The multi-line string always contains XY, followed by a few characters (not always the same amount of characters), followed by :thiSWORD:
The goal is to move these few characters that are in the middle to the end of the line
So, for example, this is the original string:
XY1239:thiSWORD:a6b4ba21ba54f6bde411930b0d88432f
XY545:thiSWORD:b598944d1ba4c787e411800b8043559c
XY4239:thiSWORD:a6b4ba21ba54f6bde411930b0d8817c6
In the end it would look like this:
XY:thiSWORD:a6b4ba21ba54f6bde411930b0d88432f1239
XY:thiSWORD:b598944d1ba4c787e411800b8043559c545
XY:thiSWORD:a6b4ba21ba54f6bde411930b0d8817c64239
I have tried something along the lines of
str.replace(/(\w{4})(\w{48})/g, '$2$1');
But that only moved 4 characters, so lines that had 3 or 5 characters between XY and :thiSWORD: were messed up.
You can use 2 capture groups, and use those in the replacement:
XY(\d+)(.*)
XY Match literally
(\d+) Capture 1+ digits in group 1
(.*) Capture the rest of the line in group 2
See a regex demo.
[
"XY1239:thiSWORD:a6b4ba21ba54f6bde411930b0d88432f",
"XY545:thiSWORD:b598944d1ba4c787e411800b8043559c",
"XY4239:thiSWORD:a6b4ba21ba54f6bde411930b0d8817c6"
].forEach(s => {
console.log(s.replace(/XY(\d+)(.*)/, "XY$2$1"))
})
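The question mentions a single multi-line string rather than an array of lines. As a sketch of how the same pattern could be applied there (the helper name moveDigitsToEnd is made up for illustration), adding the g flag should be enough, because . does not match a line break:

```javascript
// Hypothetical helper: move the digits after "XY" to the end of every line.
function moveDigitsToEnd(text) {
  // "." never matches a newline, so (.*) stops at the end of each line;
  // the g flag repeats the replacement for every line in the string.
  return text.replace(/XY(\d+)(.*)/g, "XY$2$1");
}

const input = [
  "XY1239:thiSWORD:a6b4ba21ba54f6bde411930b0d88432f",
  "XY545:thiSWORD:b598944d1ba4c787e411800b8043559c",
  "XY4239:thiSWORD:a6b4ba21ba54f6bde411930b0d8817c6"
].join("\n");

console.log(moveDigitsToEnd(input));
```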
Another variant uses 1 or more word characters (\w+), in case word characters other than digits can occur after XY. It matches :thiSWORD: literally, followed by 32 word characters (the length of the hash in the examples, rather than 48), with word boundaries on the left and right:
\bXY(\w+)(:thiSWORD:\w{32})\b
Regex demo

Regex for match beginning with 2 letters and ending with 3 letters

Example input:
'Please find the ref AB45676785567XYZ. which is used to identify reference number'
Example output:
'AB45676785567XYZ'
I need a RegExp to return the match exactly matching my requirements; i.e. the substring where the first 2 and last 3 characters are letters.
The first 2 and last 3 letters are unknown.
I've tried this RegExp:
[a-zA-Z]{2}[^\s]*?[a-zA-Z]{3}
But it is not matching as intended.
Your current RegExp matches the following substrings of the input:
Pleas, AB45676785567XYZ, which, ident, refer, numbe
This is because your RegExp, [a-zA-Z]{2}[^\s]*?[a-zA-Z]{3}, is asking for:
[a-zA-Z]{2} Begins with 2 letters (either case)
[^\s]*? Contains as few non-whitespace characters as possible (possibly none)
[a-zA-Z]{3} Ends with 3 letters (either case)
In your current example, restricting the letters to uppercase only would produce only the match you seek:
[A-Z]{2}[^\s]+[A-Z]{3}
Alternatively, requiring numbers between the 2 beginning and 3 ending letters would also produce the match you want:
[a-zA-Z]{2}\d+[a-zA-Z]{3}
What really matters here is word boundaries (\b). Try: \b[a-zA-Z]{2}\w+[a-zA-Z]{3}\b
Explanation:
\b - word boundary
[a-zA-Z]{2} - match any letter, 2 times
\w+ - match one or more word characters
[a-zA-Z]{3} - match any letter, 3 times
\b - word boundary
CAUTION: your requirements are ambiguous, as any word consisting of 6 or more letters would also match this pattern.
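To make the caution concrete, here is a small sketch that runs the word-boundary pattern against the example input and compares it with a variant that requires digits in the middle. The helper name allMatches is an assumption for illustration only.

```javascript
const text =
  "Please find the ref AB45676785567XYZ. which is used to identify reference number";

// Hypothetical helper: return every substring matched by `re` (needs the g flag).
function allMatches(re, str) {
  return Array.from(str.matchAll(re), (m) => m[0]);
}

// Word-boundary pattern: ordinary longer words qualify as well.
console.log(allMatches(/\b[a-zA-Z]{2}\w+[a-zA-Z]{3}\b/g, text));
// [ 'Please', 'AB45676785567XYZ', 'identify', 'reference', 'number' ]

// Requiring digits between the letters narrows it down to the reference.
console.log(allMatches(/\b[a-zA-Z]{2}\d+[a-zA-Z]{3}\b/g, text));
// [ 'AB45676785567XYZ' ]
```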
Start with 2 letters :
[a-zA-Z]{2}
Digits in the middle :
\d+
Finish with 3 letters :
[a-zA-Z]{3}
Full Regex :
[a-zA-Z]{2}\d+[a-zA-Z]{3}
If the middle text is alphanumeric (or any other non-whitespace characters), you can use this:
[A-Z]{2}[^\s]+[A-Z]{3}

Regex for phone numbers that check if numbers are not the same and last 7 digits are not same

I am trying to create a regex which matches a 7-15 digit number, where the number cannot consist of all the same digit and the last 7 digits cannot all be the same. I have made two separate regex expressions. For the rule that the digits cannot all be the same, the regex I have made is:
/^(?!(.)\1+$)^(|[0-9]{7,15})$/.
And for the rule that the last seven digits cannot be the same, the regex I have made is:
/^(?!.*(\d)\1{6}\b)^[0-9]{0,15}$/.
But the problem is that I am not able to make a regex which fulfills both conditions, i.e. the number cannot consist of all the same digit and the last 7 digits cannot be the same.
Please suggest how this can be done.
It seems you can use an alternation operator inside the negative lookahead to check for the 2 conditions:
^(?!(\d)\1+$|\d*(\d)\2{6}$)(?:\d{7,15})?$
See the regex demo.
Details:
^ - start of string
(?!(\d)\1+$|\d*(\d)\2{6}$) - the negative lookahead failing the match if all digits are the same from start to end ((\d)\1+$ where (\d) captures a digit into Group 1 and then \1+ matches one or more values captured in Group 1 followed with end of string check with $), or if only the last 7 are the same (see \d*(\d)\2{6}$ where \d* matches 0+ digits, (\d) captures a digit into Group 2 and then \2{6} matches 6 values captured in Group 2 followed with end of string check with $)
(?:\d{7,15})? - an optional group matching 7 to 15 digits (or an empty string, as it's optional)
$ - end of string.
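As a quick way to see the two conditions at work, the pattern can be wrapped in a test function. This is just a sketch; the names PHONE and isValidPhone are assumptions and not part of the original answer.

```javascript
// Pattern from the answer above, unchanged.
const PHONE = /^(?!(\d)\1+$|\d*(\d)\2{6}$)(?:\d{7,15})?$/;

// Hypothetical helper: true if `s` passes both conditions (or is empty).
function isValidPhone(s) {
  return PHONE.test(s);
}

console.log(isValidPhone("1234567"));    // true
console.log(isValidPhone("1111111"));    // false: all digits are the same
console.log(isValidPhone("1231111111")); // false: last 7 digits are the same
console.log(isValidPhone("123456"));     // false: fewer than 7 digits
console.log(isValidPhone(""));           // true: the digit group is optional
```

If an empty string should be rejected, the optional group (?:\d{7,15})? could be replaced with \d{7,15}.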

Javascript RegExp non sequential characters

I have this rule
var reg = new RegExp('[a-z]{3}');
which means characters between a and z are allowed, with at least 3 occurrences.
So, I am wondering if there is a way to match this rule with non sequential characters.
In other words,
"abc" => valid
"aaa" => not valid
Thank you!
Here is a working regex for exactly 3 (or N) characters. If the number is not fixed, it gets more complicated:
^([a-z])(?!\1{2})[a-z]{2}$
Explanation:
^ matches the beginning of the string
([a-z]) match one of the accepted characters and save it (group 1)
(?!...) negative lookahead, what is in those brackets is not accepted
\1 reference to the first group (first character here)
{2} repeated exactly twice
[a-z] the accepted characters
{2} repeated exactly twice
$ matches the end of the string
Link here (I added the gm modifiers, so that several expressions can be tested.)
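The "(or N)" part can be sketched by building the pattern from a length parameter. The function name isNotAllSame and the parameter n are made up for illustration; note that the pattern only rejects strings where the first character is repeated for the whole length, so "aab" still passes.

```javascript
// Hypothetical helper: exactly `n` letters a-z, rejecting n copies of one letter.
function isNotAllSame(str, n = 3) {
  // For n = 3 this builds ^([a-z])(?!\1{2})[a-z]{2}$ as in the answer above.
  const re = new RegExp(`^([a-z])(?!\\1{${n - 1}})[a-z]{${n - 1}}$`);
  return re.test(str);
}

console.log(isNotAllSame("abc"));     // true
console.log(isNotAllSame("aaa"));     // false
console.log(isNotAllSame("aab"));     // true
console.log(isNotAllSame("aaaa", 4)); // false
```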
Try a negative lookahead with a backreference, (?!([a-z])\1{2}); it will not match 3 equal characters in sequence. (A plain (?![a-z]{3}) would reject any 3 letters, not just equal ones.)

Difference between (\w)* and \w?

I'm trying to study regexes, and I came upon this confusing scenario:
Suppose you have the text:
hello world
If you run the regex (\w)*, it gives:
['hello', 'o']
What I expected was:
['hello', 'h']
Doesn't \w mean any word character?
Another example:
Text:
Delicious cake
(\w)* output:
['Delicious', 's']
What I expected:
['Delicious', 'D']
* matches the preceding element zero or more times and binds tightly to the element on its left.
Example: m*o will match o, mo, mmo, mmmmo and so on.
Parentheses () are used to mark sub-expressions, also called capture groups.
So (\w)* is a repeated capturing group.
Sam, the reason why (\w)* returns "s" in Group 1 against "Delicious" is that there can only be one Group 1. Each time a new character is matched by (\w), the parentheses force the new value of the character to be captured into Group 1. "s" is the last character, so it is the final Group 1 reported to you by the engine.
If you wanted to capture the first letter into Group 1 instead, you could go with something like:
(\w)\w*
This causes the first character to be captured. There is no quantifier on the capturing parentheses, so Group 1 doesn't change. The remaining \w* optionally match any additional characters.
Also please note that when you run (\w)* against "hello world", the matches are not "hello" and "o" as you stated. The matches (if you match them all, and ignore the empty matches that * also permits) are "hello" and "world". The Group 1 captures are "o" and "d", the last letters of each word.
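The difference between the two patterns can be checked directly. The sketch below uses a made-up helper, captures, which lists each non-empty match together with its Group 1 value; empty matches (which (\w)* produces wherever no word character follows) are filtered out to keep the output readable.

```javascript
// Hypothetical helper: [whole match, Group 1] for every non-empty match.
function captures(re, text) {
  return Array.from(text.matchAll(re))
    .filter((m) => m[0] !== "") // skip the empty matches allowed by *
    .map((m) => [m[0], m[1]]);
}

// Quantifier on the group: Group 1 is overwritten on every iteration.
console.log(captures(/(\w)*/g, "hello world"));
// [ [ 'hello', 'o' ], [ 'world', 'd' ] ]

// No quantifier on the group: Group 1 keeps the first character.
console.log(captures(/(\w)\w*/g, "hello world"));
// [ [ 'hello', 'h' ], [ 'world', 'w' ] ]
```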
Reference: All about capture
Remember, a repeated capturing group keeps only its last iteration.
So:
(\w)* on hello will match one character at a time until it reaches the last one.
Thus it will get o in the capture group.
(\w)* on helloworld will match one character at a time until it reaches the last one.
Thus it will get d in the capture group.
(\w)* on hello123 will match one character at a time until it reaches the last one.
Thus it will get 3 in the capture group.
(\w)* on helloworld#3w4 will match one character at a time until it reaches the #. Thus it will get d in the capture group, since # is not a valid word character (only [_0-9a-zA-Z] are allowed).
(\w)*
Match the regular expression below and capture its match into backreference number 1 «(\w)*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «*»
Match a single character that is a “word character” (letters, digits, and underscores) «\w»
Will give you two matches:
hello
world
\w
Match a single character that is a “word character” (letters, digits, and underscores) «\w»
Will match every character (individually) on the sentence:
h
e
l
l
o
w
o
r
l
d
\w is a RegEx shortcut for [_a-zA-Z0-9] which means any letter, digit, or an underscore.
When you add an asterisk * after anything, it means it can appear from 0 to unlimited times.
If you want to match all the letters in your input, use \w
If you want to match whole words in your input, use \w+ (use + and not * since a word has at least one letter)
Also, when you surround part of your RegEx with parentheses, it becomes a capture group, which means it will appear in your results, which is why (\w)* is different from (\w*)
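A short sketch of that last point, comparing where the quantifier sits relative to the parentheses (the helper name firstMatch is an assumption for illustration):

```javascript
// Hypothetical helper: [whole match, Group 1] of the first match, or null.
function firstMatch(re, str) {
  const m = str.match(re);
  return m ? [m[0], m[1]] : null;
}

// Quantifier outside the group: the group holds one character, the last one.
console.log(firstMatch(/(\w)*/, "hello world")); // [ 'hello', 'o' ]

// Quantifier inside the group: the group holds the whole run.
console.log(firstMatch(/(\w*)/, "hello world")); // [ 'hello', 'hello' ]
```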
Useful RegEx sites:
RegexPal
Debuggex
