JavaScript Regular Expression to repeat element match in string - javascript

Yes I am a newbie, I have looked online and cannot seem to find the answer to the following, I know it must be simple.
I have a simple string and need to match spaced Capitals eg T G D ......repeater,
Secondly I need to match capitals with a dot between them and no space eg T.G.D ........repeater
I have the current string = str.match(/ [A-Z] [A-Z] | [A-Z].[A-Z]/g)
but this will only match the first two e.g T G I Need it to match wherever it finds the following pattern eg T G D E F L ...repeater as a one match
Likewise It will only match the T.G but nothing after e.g T.G I Need to match T.G.D.L.T repeater (may end with a dot and may not)
any help will be appreciated.

You might use a capturing group matching either a dot or a space in a character class ([. ])
Match the first 2 capitals and capture the space or dot in group 1. Then optionally repeat what is captured using a back reference followed by a captital A-Z.
\b[A-Z]([. ])[A-Z](?:\1[A-Z])*\b
\b Word boundary
[A-Z]([. ])[A-Z] Match A-Z, capture in group 1 matching either space or .
(?:\1[A-Z])* Repeat 0+ times matching what it captured in group 1 followed by A-Z
\b Word boundary
Regex demo

Related

REgex for non repeating alphabets comma seperated

I have a requirement where I need a regex which
should not repeat alphabet
should only contain alphabet and comma
should not start or end with comma
can contain more than 2 alphabets
example :-
A,B --- correct
A,B,C,D,E,F --- correct
D,D,A --- wrong
,B,C --- wrong
B,C, --- wrong
A,,B,C --- wrong
Can anyone help ?
Another idea with capturing and checking by use of a lookahead:
^(?:([A-Z])(?!.*?\1),?\b)+$
You can test here at regex101 if it meets your requirements.
If you don't want to match single characters, e.g. A, change the + quantifier to {2,}.
The statement of the question is incomplete in several respects. I have made the following assumptions:
Considering that D,D,A is incorrect I assume that a letter cannot be followed by a comma followed by the same letter.
The string may contain the same letter more than once as long as #1 is satisfied.
Considering that A,,B,C is incorrect I assume a comma cannot follow a comma.
Since the examples contain only capital letters I will assume that lower-case letters are not permitted (though one need only set the case-indifferent flag (i) to permit either case).
We observe that the requirements are satisfied if and only if the string begins with a capital letter and is followed by a sequence of comma-capital letter pairs, provided that no capital letter is followed by a comma followed by the same letter. We therefore can attempt to match the following regular expression.
^(?:([A-Z]),(?!\1))*[A-Z]$
Demo
The elements of the expression are as follows.
^ # match beginning of string
(?: # begin a non-capture group
([A-Z]) # match a capital letter and save to capture group 1
, # match a comma
(?!\1) # use negative lookahead to assert next character is not equal
# to the content of capture group 1
)* # end non-capture group and execute it zero or more times
[A-Z] # match a capital letter
$ # match end of string
Here is a big ugly regex solution:
var inputs = ['A,B', 'D,D,D', ',B,C', 'B,C,', 'A,,B'];
for (var i=0; i < inputs.length; ++i) {
if (/^(?!.*?([^,]+).*,\1(?:,|$))[^,]+(?:,[^,]+)*$/.test(inputs[i])) {
console.log(inputs[i] + " => VALID");
}
else {
console.log(inputs[i] + " => INVALID");
}
}
The regex has two parts to it. It uses a negative lookahead to assert that no two CSV entries ever repeat in the input. Then, it uses a straightforward pattern to match any proper CSV delimited input. Here is an explanation:
^ from the start of the input
(?!.*?([^,]+).*,\1(?:,|$)) assert that no CSV element ever repeats
[^,]+ then match a CSV element
(?:,[^,]+)* followed by comma and another element, 0 or more times
$ end of the input
This one could suit your needs:
^(?!,)(?!.*,,)(?!.*(\b[A-Z]+\b).*\1)[A-Z,]+(?<!,)$
^: the start of the string
(?!,): should not be directly followed by a comma
(?!.*,,): should not be followed by two commas
(?!.*(\b[A-Z]+\b).*\1): should not be followed by a value found twice
[A-Z,]+: should contain letters and commas only
$: the end of the string
(?<!,): should not be directly preceded by a comma
See https://regex101.com/r/1kGVSB/1

Create a regex to extract a string that contain a noral character and escaped string without DOS

I have a string like this:
///////AB?\a\b\c\d\d\e\\f\a\a\b\cd\ed\fmnopqrstuvwxy\z\a\a\a\a\a\a\a\a\a///imgy
it started with /// and ended with ///imgy (i and/or m and/or g and/or y), and between the beginning and end are the character are normal character like a or escaped character like \a.
Here is my regex:
/^\/{3}((?:\\?[\s\S])+?)\/{3}([imgy]{0,4})(?!\w)/
But the problem is that it is reported as "vulnerable to denial-of-service attacks". The main part that has the problem is
(?:\\?[\s\S])+
How can I create a right one that can figure out both a and \a? Thank you!
Regex Demo
Update:
I just found to use the following regex:
(?:\\[\s\S]+?)|(?:(?<!\\)[\s\S]+?)|(?:(?<=\\\\)[\s\S]+?)
to replace the old problematic part (?:\\?[\s\S])+?, and in this way, it can avoid requires exponential time to match certain inputs, and avoid vulnerable to denial-of-service attacks.
The details:
(?:\\[\s\S]+?) match any \a
(?:(?<!\\)[\s\S]+?) match any a, but not following \.
(?:(?<=\\\\)[\s\S]+?) match any a, but much following \\. This to make sure f is matched that following \\.
So the whole regex will look like this:
^\/{3}((?:\\[\s\S]+?)|(?:(?<!\\)[\s\S]+?)|(?:(?<=\\\\)[\s\S]+?))\/{3}([imgy]{0,4})(?!\w)
You might list the characters that are allowed to a character class, and optionally repeat an escaped character [a-z]
^\/{3,}[A-Za-z?]+(?:\\[a-z\\][A-Za-z?]*)*\/\/\/[imgy]{0,4}$
The pattern matches:
^ Start of string
\/{3,}[A-Za-z?]+ Match 3 or more / and 1 or more times any of the listed allowed chars
(?: Non capture group
\\[a-z\\] Match an escaped char a-z or \\
[A-Za-z?]* Optionally match any of the listed
)* Close an optionally repeat the group
\/\/\/[imgy]{0,4} Match /// and 0-4 times any of i m g or y If there should be at least a single char, you can use {1,4}
$ End of string
Regex demo

Why isn't this group capturing all items that appear in parentheses?

I'm trying to create a regex that will capture a string not enclosed by parentheses in the first group, followed by any amount of strings enclosed by parentheses.
e.g.
2(3)(4)(5)
Should be: 2 - first group, 3 - second group, and so on.
What I came up with is this regex: (I'm using JavaScript)
([^()]*)(?:\((([^)]*))\))*
However, when I enter a string like A(B)(C)(D), I only get the A and D captured.
https://regex101.com/r/HQC0ib/1
Can anyone help me out on this, and possibly explain where the error is?
Since you cannot use a \G anchor in JS regex (to match consecutive matches), and there is no stack for each capturing group as in a .NET / PyPi regex libraries, you need to use a 2 step approach: 1) match the strings as whole streaks of text, and then 2) post-process to get the values required.
var s = "2(3)(4)(5) A(B)(C)(D)";
var rx = /[^()\s]+(?:\([^)]*\))*/g;
var res = [], m;
while(m=rx.exec(s)) {
res.push(m[0].split(/[()]+/).filter(Boolean));
}
console.log(res);
I added \s to the negated character class [^()] since I added the examples as a single string.
Pattern details
[^()\s]+ - 1 or more chars other than (, ) and whitespace
(?:\([^)]*\))* - 0 or more sequences of:
\( - a (
[^)]* - 0+ chars other than )
\) - a )
The splitting regex is [()]+ that matches 1 or more ) or ( chars, and filter(Boolean) removes empty items.
You cannot have an undetermined number of capture groups. The number of capture groups you get is determined by the regular expression, not by the input it parses. A capture group that occurs within another repetition will indeed only retain the last of those repetitions.
If you know the maximum number of repetitions you can encounter, then just repeat the pattern that many times, and make each of them optional with a ?. For instance, this will capture up to 4 items within parentheses:
([^()]*)(?:\(([^)]*)\))?(?:\(([^)]*)\))?(?:\(([^)]*)\))?(?:\(([^)]*)\))?
It's not an error. It's just that in regex when you repeat a capture group (...)* that only the last occurence will be put in the backreference.
For example:
On a string "a,b,c,d", if you match /(,[a-z])+/ then the back reference of capture group 1 (\1) will give ",d".
If you want it to return more, then you could surround it in another capture group.
--> With /((?:,[a-z])+)/ then \1 will give ",b,c,d".
To get those numbers between the parentheses you could also just try to match the word characters.
For example:
var str = "2(3)(14)(B)";
var matches = str.match(/\w+/g);
console.log(matches);

jQuery Regex - Finding All Joined Words

jQuery Regex
/((\b([a-zA-Z]{0,15})\b)([^a-z0-9\$_]))/g
My Attempt So Far: https://regex101.com/r/d3VUpG/1
Example test string:
(options.method==="
|options.method==="
=options.method==="HEAD"
options.method.options.method==="HEAD"
What I'm Trying TO Achieve
Returned as $1 the value of any connected words such as:
options.method - Would = $1
options.method.options.method - Would also = $1
Question
How can I find all words connected with a dot (.) to then wrap in a span like the below example;
.replace(//gi,'<span class="join">$1</span>')
You can use the following expression:
/((?:\w+\.)+\w+)/g
Explanation:
( - Start of capturing group 1
(?: - Start of a non-capturing group
\w+\. - Match [a-zA-Z0-9_] characters one or more times followed by a literal . character
)+ - End of the non-capturing group; match the group one or more times
\w+ - Match [a-zA-Z0-9_] characters one or more times
) - End of capturing group 1
So in other words, the non-capturing group, (?:\w+\.)+, will match a substring like option. one or more times followed by a final \w+ which will match the final word without a literal . character following it. Since there is only one capturing group wrapping everything, you can wrap your span tag around the first group, $1.
Live Example
string.replace(/((?:\w+\.)+\w+)/g, '<span class="join">$1</span>');
As mentioned above, \w includes underscore, numbers and letters ([a-zA-Z0-9_]), so if you only want to match letter characters, then you could swap out \w with [a-z] and use the case-insensitive flag:
/((?:[a-z]+\.)+[a-z]+)/gi

Difference between (\w)* and \w?

I'm trying to study regexes, and I came upon this confusing scenario:
Suppose you have the text:
hello world
If you run the regex (\w)*, it gives:
['hello', 'o']
What I expected was:
['hello', 'h']
Doesn't \w mean any word character?
Another example:
Text:
Delicious cake
(\w)* output:
['Delicious', 's']
What I expected:
['Delicious', 'D']
'*' matches the preceding part zero or more times and bind tightly to the element on the left.
Example: m*o will match o, mo, mmo, mmmmo and so on.
Parentheses () are used to mark sub-expressions, also called capture groups.
So (\w)* is repeated capturing group.
Regex Demo
Sam, the reason why (\w)* returns "s" in Group 1 against "delicious" is that there can only be one Group 1. Each time a new character is matched by (\w), the parentheses force the new value of the character to be captured into Group 1. "s" is the last character, so it is the final Group 1 reported to you by the engine.
If you wanted to capture the first letter into Group 1 instead, you could go with something like:
(\w)\w*
This causes the first character to be captured. There is no quantifier on the capturing parentheses, so Group 1 doesn't change. The remaining \w* optionally match any additional characters.
Also please note that when you run (\w)* against "hello world", the matches are not "hello" and "o" as you stated. The matches (if you match them all) are "hello" and "world". The Group 1 captures are "o" and "d", the last letters of each word.
Reference: All about capture
Remember, a repeated capturing group always captures the last group.
So.
(\w)* on hello will check one character at a time unless it reaches the last match.
Thus will get o in the capture group.
(\w)* on helloworld will check one character at a time unless it reaches the last match.
Thus will get d in the capture group.
(\w)* on hello123 will check one character at a time unless it reaches the last match.
Thus will get 3 in the capture group.
(\w)* on helloworld#3w4 will check one character at a time unless it reaches the last match. Thus will get d in the capture group since # is not a valid \word character( only [_0-9a-zA-Z] allowed).
(\w)*
Match the regular expression below and capture its match into backreference number 1 «(\w)*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «*»
Match a single character that is a “word character” (letters, digits, and underscores) «\w»
Will give you two matches:
hello
world
\w
Match a single character that is a “word character” (letters, digits, and underscores) «\w»
Will match every character (individually) on the sentence:
h
e
l
l
o
w
o
r
l
d
\w is a RegEx shortcut for [_a-zA-Z0-9] which means any letter, digit, or an underscore.
When you add an asterisk * after anything, it means it can appear from 0 to unlimited times.
If you want to match all the letters in your input, use \w
If you want to match whole words in your input, use \w+ (use + and not * since a word has at least one letter)
Also, when you're surrounding stuff in your RegEx with brackets, they become a capture group, which means they will appear in your results, which is why (\w)* is different from (\w*)
Useful RegEx sites:
RegexPal
Debuggex

Categories

Resources