JavaScript - Capture repeated group - javascript

Using JavaScript (or VBScript), is it possible to separately capture the following groups from %n:32+5000?
Capture group 1: %n
Capture group 2: :32
Capture group 3: +5000
I tried a search through this forum about capturing repeated groups, but the examples given were either in a different language like .NET or the asker wasn't asking specifically how to do what I need.
The best attempt I've made thus far is (%n)(([:\+]\d+){0,2}) with global turned on. Also, I am using https://regex101.com/r/qBylQX/1 to help me visualize what's happening; but, so far I haven't cracked it.
Notes:
only one instance of %n is allowed per match
only one appearance of :\d+ is allowed per match
only one appearance of +\d+ is allowed per match.
the pattern can appear anywhere in a string.

Use
(%n)(?:(:\d+)(\+\d+)?)?
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
%n '%n'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
( group and capture to \3 (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
\+ '+'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)? end of \3 (NOTE: because you are using a
quantifier on this capture, only the
LAST repetition of the captured pattern
will be stored in \3)
--------------------------------------------------------------------------------
)? end of grouping
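As a rough illustration of how the three groups come out in JavaScript, here is a minimal sketch. The g flag, the helper name captureParts and the sample string are assumptions made for this example, not part of the answer.

```javascript
// Pattern from the answer above; the g flag is added here so that
// every occurrence in a longer string is found.
const tokenPattern = /(%n)(?:(:\d+)(\+\d+)?)?/g;

// Hypothetical helper: returns one object per match with the three captures.
// Groups that did not participate in the match are undefined.
function captureParts(text) {
  return [...text.matchAll(tokenPattern)].map(m => ({
    token: m[1],     // "%n"
    colonPart: m[2], // e.g. ":32", or undefined
    plusPart: m[3],  // e.g. "+5000", or undefined
  }));
}

console.log(captureParts("a %n:32+5000 b %n:7 c %n"));
```

Note that with this pattern the + part is only captured when a : part precedes it, so for an input like %n+5000 only %n is matched.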

The pattern (%n)(([:\+]\d+){0,2}) can also match :\d+ twice or +\d+ twice, because both : and + are in the character class.
A repeated capture group such as ([:\+]\d+){0,2} keeps only the value of its last iteration.
Because that repetition sits inside the outer capture group 2, group 2 contains the whole suffix, so you do not get the separate per-part groups you are after.
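A small sketch of that behaviour, using the pattern from the question on two sample inputs (the variable names are only for illustration):

```javascript
// The asker's original attempt, without the g flag so that match()
// returns the capture groups.
const attempt = /(%n)(([:\+]\d+){0,2})/;

const first = "%n:32+5000".match(attempt);
console.log(first[2]); // ":32+5000" (outer group: the whole suffix)
console.log(first[3]); // "+5000"    (inner group: last iteration only)

// The character class also lets the same separator appear twice.
const doubled = "%n:32:5000".match(attempt);
console.log(doubled[0]); // "%n:32:5000"
```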
According to the notes, each of them can occur only once per match.
If you want to match : and + in either order but not the same one twice, you can use a capture group with a backreference.
The values are in group 1, 3 and 4.
(%n)(?:([:+])(\d+)(?:(?!\2)([:+]\d+))?)?(?!\S)
The pattern matches:
(%n) Capture group 1, match %n
(?: Non capture group to match as a whole
([:+]) Capture group 2, match either : or +
(\d+) Capture group 3, match 1+ digits
(?: Non capture group to match as a whole
(?!\2) Negative lookahead, assert what is directly to the right is not the same value as captured in group 2
([:+]\d+) Capture group 4, match : or + followed by 1+ digits
)? Close group and make it optional
)? Close group and make it optional
(?!\S) Assert a whitespace boundary to the right to prevent a partial match
Regex demo
const regex = /(%n)(?:([:+])(\d+)(?:(?!\2)([:+]\d+))?)?(?!\S)/;
[
"%n:32+5000",
"%n:32",
"%n+5000",
"%n",
"%n+32:5000",
"%n+32",
"%n:5000",
"%n:32:5000",
"%n+32+5000"
].forEach(s => console.log(`${regex.test(s)} ==> ${s}`));

Related

Regex to match string with or without capturing group

I've been trying for a while and not sure what to Google but I want both of these to be matched
Someone does something
and
Someone tries to do something
What I thought would work is
/^Someone (tries to)? do(es)? something$/
But that only matches the second string.
They are separate strings, not a single string spanning multiple lines.
Use
/^Someone(?: tries to)? do(?:es)? something$/
See proof
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
Someone 'Someone'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
tries to ' tries to'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
do ' do'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
es 'es'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
something ' something'
--------------------------------------------------------------------------------
$ the end of the string
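To see why the original attempt failed, here is a short comparison of the two patterns. The sample with two spaces is an assumption added for this sketch: with the first optional group skipped, the original pattern still expects both surrounding spaces.

```javascript
const original = /^Someone (tries to)? do(es)? something$/;
const fixed = /^Someone(?: tries to)? do(?:es)? something$/;

const samples = [
  "Someone does something",
  "Someone tries to do something",
  "Someone  does something", // two spaces after "Someone"
];

// Collect the result of each pattern for every sample.
const report = samples.map(text => ({
  text,
  original: original.test(text),
  fixed: fixed.test(text),
}));

console.log(report);
```

Moving the leading space inside the optional group is what makes the first sentence match.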

Regex that allows PascalCased with one or more consecutive uppercase letters and numbers

I'm trying to check user input for PascalCased names and this works, but I would like to also allow one or more consecutive upperCase letters eg. UNOrganization and also allow numbers in between eg. A2BOrganization.
So all of the following should be allowed: ABCWord, A3BWord, OtherWord, Word3, Word3AB (last unlikely but if possible fine)
if (value.match(/^[A-Z][a-z]+(?:[A-Z][a-z]+)*$/)) {
//logic here
}
Regex is a little beyond me and the logic to parse the strings would be too long for my needs and I know this can be done in a one-liner with regex so hopefully someone more savvy can help me.
Use
^[A-Z]+[a-z]*(?:\d*(?:[A-Z]+[a-z]*)?)*$
See proof
If you require at least one lowercase letter in the input string:
^(?=.*[a-z])[A-Z]+[a-z]*(?:\d*(?:[A-Z]+[a-z]*)?)*$
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
[a-z] any character of: 'a' to 'z'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[A-Z]+ any character of: 'A' to 'Z' (1 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
[a-z]* any character of: 'a' to 'z' (0 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
\d* digits (0-9) (0 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
[A-Z]+ any character of: 'A' to 'Z' (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
[a-z]* any character of: 'a' to 'z' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
$ the end of the string
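A quick check of both variants against the names from the question might look like this. The extra inputs ABC, word and 3Word, and the helper name checkNames, are assumptions added for this sketch.

```javascript
const pascal = /^[A-Z]+[a-z]*(?:\d*(?:[A-Z]+[a-z]*)?)*$/;
// Variant that additionally requires at least one lowercase letter.
const pascalWithLower = /^(?=.*[a-z])[A-Z]+[a-z]*(?:\d*(?:[A-Z]+[a-z]*)?)*$/;

// Hypothetical helper: tests each name against both patterns.
function checkNames(names) {
  return names.map(name => ({
    name,
    plain: pascal.test(name),
    needsLower: pascalWithLower.test(name),
  }));
}

console.log(
  checkNames(["ABCWord", "A3BWord", "OtherWord", "Word3", "Word3AB", "ABC", "word", "3Word"])
);
```

The nested optional quantifiers are fine for short user input; for very long strings it may be worth measuring how the pattern behaves on non-matching input.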
/^(?=.*[a-z])[A-Z]\d*[a-z]*(([A-Z\d]*[A-Z]|[A-Z][A-Z\d]*|\d+$)[a-z]*)*$/
seems to fit your requirements. See tests below.
The idea is to start with a capital letter followed by optional digits and zero or more lowercase letters. Then enter the main group, repeated zero or more times. This group matches one or more capital letters or digits followed by zero or more lowercase letters. The alternation allows digits at the end and disallows lone digits sandwiched between two lowercase characters.
const pat = /^(?=.*[a-z])[A-Z]\d*[a-z]*(([A-Z\d]*[A-Z]|[A-Z][A-Z\d]*|\d+$)[a-z]*)*$/;
const tests = [
"UNOrganization",
"A2BOrganization",
"ABCWord",
"A3BWord",
"OtherWord",
"Word3",
"Word3AB",
"a",
"A",
"Aa",
"aA",
"AAA",
"aaa",
"9A",
"A4",
"A4a",
"AaA",
"AaAa",
"Aa1",
"AaA1",
"Aa1a",
"",
];
const len = 2 + Math.max(...tests.map(e => e.length));
tests.forEach(e =>
console.log(`${`'${e}'`.padStart(len)} => ${pat.test(e)}`)
);
Another pattern:
^[A-Z]+[A-Za-z]*(?:[0-9]+(?:[A-Z]+[A-Za-z]*)*)*$
[A-Z]+[A-Za-z]*: At least one capital letter optionally followed by a sequence of letters of any case, ...
(?: ... )*: ... optionally followed by a sequence of:
[0-9]+: one or more digits...
(?: ... )*: optionally followed by a sequence of:
[A-Z]+[A-Za-z]*: at least one capital letter optionally followed by a sequence of letters of any case
Regex101.com working examples (replaced ^ and $ by word delimiters \b).
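The \b variant can be used to pull such names out of running text. A minimal sketch, where the function name findPascalNames and the sample sentence are made up for illustration:

```javascript
// Same pattern as above with ^ and $ replaced by \b, plus the g flag
// so that all occurrences in the text are returned.
const pascalInText = /\b[A-Z]+[A-Za-z]*(?:[0-9]+(?:[A-Z]+[A-Za-z]*)*)*\b/g;

// Hypothetical helper: returns every PascalCased name found in the text.
function findPascalNames(text) {
  return text.match(pascalInText) || [];
}

console.log(findPascalNames("see A3BWord and Word3 here, not fooBar or Word3ab"));
// [ 'A3BWord', 'Word3' ]
```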

Regex to exclude an entire line match if certain characters found

I'm stuck on the cleanest way to accomplish two bits of regex. Every solution I've come up with so far seems clunky.
Example text
Match: Choose: blah blah blah 123 for 100'ish characters, this matches
NoMatch: Choose: blah blah blah 123! for 100'ish characters?, .this potential match fails for the ! ? and .
The first regex (?:^\w+?:)(((?![.!?]).)*)$ needs to:
Match a line containing any word followed by a : so long as !?. are not found in the same line (the word: will always be at the beginning of a line)
Ideally, match every part of the line from the example EXCEPT Choose:. Matching the whole line is still a win.
The second regex ^(^\w+?:)(?:(?![.!?]).)*$ needs to:
Match a line containing any word followed by a : so long as !?. are not found in the same line (the word: will always be at the beginning of a line)
Match only Choose:
The regex is in a greasemonkey/tampermonkey script.
Use
^\w+:(?:(?!.*[.!?])(.*))?
See proof.
EXPLANATION
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
[.!?] any character of: '.', '!', '?'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
)? end of grouping
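One thing to be aware of with this pattern: because the group after the colon is optional, a line containing . ! or ? still produces a match (just the label), and only group 1 tells the two cases apart. A small sketch, with the helper name restOfLine assumed for this example:

```javascript
const labelPattern = /^\w+:(?:(?!.*[.!?])(.*))?/;

// Hypothetical helper: returns the text after "word:" when the line has
// no . ! or ?, and null otherwise.
function restOfLine(line) {
  const m = line.match(labelPattern);
  // m[1] is undefined when the optional group was skipped.
  return m && m[1] !== undefined ? m[1] : null;
}

console.log(restOfLine("Choose: blah blah blah 123 for 100'ish characters, this matches"));
console.log(restOfLine("Choose: blah blah blah 123! for 100'ish characters?, .this fails"));
```

The captured text keeps the leading space after the colon; call trim() on it if that is not wanted.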
Does this do what you want?
(?:^\w+:)((?:(?![!?.]).)*)$
What makes you feel that this is clunky?
(?: ... ) non-capturing group
^ start with
\w+: a series of one or more word characters followed by a :
( ... )$ capturing group that continues to the end
(?: ... )* non-capturing group, repeated zero or more times, with
(?! ... ) negative look-ahead: no following character can be
[!?.] either ?, ! or .
. followed by any character
For the first pattern, you could first check that there is no ! ? or . present using a negative lookahead. Then capture in the first group 1+ word chars and : and the rest of the line in group 2.
^(?![^!?.\n\r]*[!?.])(\w+:)(.*)$
^ Start of string
(?! Negative lookahead, assert what is on the right is not
[^!?.\n\r]*[!?.] Match 0+ times any char except the listed ones using a negated character class, then match either ! ? or .
) Close lookahead
(\w+:) Capture group 1, match 1+ word chars and a colon
(.*) Capture group 2, match any char except a newline 0+ times
$ End of string
Regex demo
For the second part, if you want a match only for Choose:, you could use the negative lookahead only without a capturing group.
^(?![^!?.\n\r]*[!?.])\w+:
Regex demo
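In a userscript the text often spans several lines. The sketch below assumes the g and m flags are added so that ^ and $ apply per line; the flags, the sample text and the variable names are assumptions, not part of the answer.

```javascript
// Lookahead-first pattern from above, with g (all matches) and
// m (^ and $ match at line boundaries) added for multi-line input.
const linePattern = /^(?![^!?.\n\r]*[!?.])(\w+:)(.*)$/gm;

const text = [
  "Choose: blah blah blah 123 for 100'ish characters, this matches",
  "Choose: blah blah blah 123! for 100'ish characters?, .this fails",
  "Other: plain text",
].join("\n");

// One entry per accepted line: the label (group 1) and the rest (group 2).
const rows = [...text.matchAll(linePattern)].map(m => ({
  label: m[1],
  rest: m[2].trim(),
}));

console.log(rows);
```

Lines containing . ! or ? are rejected outright here, so no extra check on the groups is needed.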

Regular Expression match character OR number

I would like to get parts of a string separated by the type (number or character)
Simplified initial situation:
var content = 'foo123bar456';
Desired result:
result = ['foo', 123, 'bar', 456];
This is what I have so far to match the first "foo"
^(([a-z])|([0-9]))+
I thought this would match either characters [a-z]+ OR numbers [0-9]+ (which would match 'foo' in this case) but unfortunately it allows both (characters AND numbers) at the same time.
If it would match only characters with the same type I could just add "{1,}" to my regex in order to match all occurrences of the pattern and the world would be a bit better.
Correct regex will be:
([a-zA-Z]+|[0-9]+)
Explanation of the regex is:
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[a-zA-Z]+ any character of: 'a' to 'z', 'A' to 'Z'
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
Use the g modifier as well for a global match:
/[a-zA-Z]+|[0-9]+/g
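To get from the matches to the desired array with real numbers, the digit chunks still have to be converted. A minimal sketch, where splitByType is a hypothetical helper name:

```javascript
// Hypothetical helper: splits a string into letter runs and digit runs,
// converting the digit runs to numbers.
function splitByType(content) {
  // match() returns null when nothing matches, so fall back to [].
  const parts = content.match(/[a-zA-Z]+|[0-9]+/g) || [];
  return parts.map(part => (/^[0-9]+$/.test(part) ? Number(part) : part));
}

console.log(splitByType("foo123bar456")); // [ 'foo', 123, 'bar', 456 ]
```

Characters that are neither letters nor digits are skipped by this pattern and act as separators.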

Javascript Regex Input Validation to Prevent Duplicate Characters

I am attempting to validate text input with the following requirements:
allowed characters & length /^\w{8,15}$/
must contain /[a-z]+/
must contain /[A-Z]+/
must contain /[0-9]+/
must not contain repeated characters (ie. aba=pass and aab=fail)
Each test would return true when used with .test().
With modest familiarity, I am able to write the first 4 tests, albeit individually. The 5th test is not working out; a negative lookahead (which is what I believe I need to be using) is challenging.
Here are a few value/result examples:
re.test("Fail1");//returns false, too short
re.test("StringFailsRule1");//returns false, too long
re.test("Fail!");//returns false, invalid !
re.test("FAILRULE2");//returns false, missing [a-z]+
re.test("failrule3");//returns false, missing [A-Z]+
re.test("failRuleFour");//returns false, missing [0-9]+
re.test("failRule55");//returns false, repeat of "5"
re.test("TestValue1");//returns true
Finally, the ideal would be a single combined test used to enforce all requirements.
This uses negative and positive lookaheads (zero-length assertions) for your tests, and the \w{8,15} part validates the allowed characters and the length.
^(?!.*(.)\1)(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])\w{8,15}$
For your fifth rule I used a negative lookahead to make sure that a capture group of any character is never followed by itself.
Regexpal demo
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\1 what was matched by capture \1
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
[a-z] any character of: 'a' to 'z'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
[A-Z] any character of: 'A' to 'Z'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
[0-9] any character of: '0' to '9'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
\w{8,15} word characters (a-z, A-Z, 0-9, _)
(between 8 and 15 times (matching the most
amount possible))
--------------------------------------------------------------------------------
$ the end of the string
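The combined pattern can be checked against the examples from the question with a short script. The expected values below are the ones the asker listed; the variable names are only for illustration.

```javascript
const passwordPattern = /^(?!.*(.)\1)(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])\w{8,15}$/;

// [input, expected result] pairs taken from the question.
const cases = [
  ["Fail1", false],            // too short
  ["StringFailsRule1", false], // too long
  ["Fail!", false],            // invalid character
  ["FAILRULE2", false],        // no lowercase letter
  ["failrule3", false],        // no uppercase letter
  ["failRuleFour", false],     // no digit
  ["failRule55", false],       // "5" repeated back to back
  ["TestValue1", true],
];

const results = cases.map(([value, expected]) => ({
  value,
  expected,
  actual: passwordPattern.test(value),
}));

console.log(results);
```

Keep in mind that \w also accepts the underscore, as the first rule in the question already implies.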
