Regex to exclude an entire line match if certain characters found


I'm stuck on the cleanest way to accomplish two bits of regex. Every solution I've come up with so far seems clunky.
Example text
Match: Choose: blah blah blah 123 for 100'ish characters, this matches
NoMatch: Choose: blah blah blah 123! for 100'ish characters?, .this potential match fails for the ! ? and .
The first regex (?:^\w+?:)(((?![.!?]).)*)$ needs to:
Match a line containing any word followed by a : so long as !?. are not found in the same line (the word: will always be at the beginning of a line)
Ideally, match every part of the line from the example EXCEPT Choose:. Matching the whole line is still a win.
The second regex ^(^\w+?:)(?:(?![.!?]).)*$ needs to:
Match a line containing any word followed by a : so long as !?. are not found in the same line (the word: will always be at the beginning of a line)
Match only Choose:
The regex is in a greasemonkey/tampermonkey script.

Use
^\w+:(?:(?!.*[.!?])(.*))?
EXPLANATION
NODE        EXPLANATION
--------------------------------------------------------------------------------
^           the beginning of the string
\w+         word characters (a-z, A-Z, 0-9, _) (1 or more times, as many as possible)
:           ':'
(?:         group, but do not capture (optional, as many as possible):
  (?!       look ahead to see if there is not:
    .*      any character except \n (0 or more times, as many as possible)
    [.!?]   any character of: '.', '!', '?'
  )         end of look-ahead
  (         group and capture to \1:
    .*      any character except \n (0 or more times, as many as possible)
  )         end of \1
)?          end of grouping
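A quick way to see how this pattern behaves, as a sketch only. Because the trailing group is optional, the pattern still matches the word: prefix on a line that contains . ! or ?; in that case group 1 is simply undefined. The helper below therefore checks the group rather than the match. The name restOfLine and the sample lines are hypothetical.

```javascript
// Pattern from the answer above; the optional group only participates
// when the rest of the line contains none of . ! ?
const prefixPattern = /^\w+:(?:(?!.*[.!?])(.*))?/;

// Hypothetical helper: returns the text after "word:", or null when the
// line has no such prefix or contains one of the excluded characters.
function restOfLine(line) {
  const m = prefixPattern.exec(line);
  return m && m[1] !== undefined ? m[1] : null;
}

console.log(restOfLine("Choose: blah blah blah 123"));  // " blah blah blah 123"
console.log(restOfLine("Choose: blah blah blah 123!")); // null
console.log(restOfLine("no prefix here"));              // null
```

If the text holds several lines at once, adding the m flag makes ^ anchor at the start of each line.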

Does this do what you want?
(?:^\w+:)((?:(?![!?.]).)*)$
What makes you feel that this is clunky?
(?: ... ) non-capturing group
^ start with
\w+: a series of one or more word characters followed by a :
( ... )$ capturing group that continues to the end
(?: ... )* non-capturing group, repeated zero or more times, with
(?! ... ) negative look-ahead: the next character must not be
[!?.] one of ?, ! or .
. then consume that one character

For the first pattern, you could first check that there is no ! ? or . present using a negative lookahead. Then capture in the first group 1+ word chars and : and the rest of the line in group 2.
^(?![^!?.\n\r]*[!?.])(\w+:)(.*)$
^ Start of string
(?! Negative lookahead, assert what is on the right is not
[^!?.\n\r]*[!?.] Match 0+ times any char except the listed ones using a negated character class, then match either ! ? or .
) Close lookahead
(\w+:) Capture group 1, match 1+ word chars and a colon
(.*) Capture group 2, match any char except a newline 0+ times
$ End of string
For the second part, if you want a match only for Choose:, you could use the negative lookahead only without a capturing group.
^(?![^!?.\n\r]*[!?.])\w+:
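The lookahead-first pattern can be tried on a multi-line string with a sketch like the following. The g and m flags, the sample text, and the variable names are assumptions added for illustration; they are not part of the answer above.

```javascript
// Pattern from the answer above, with the g and m flags added (an assumption)
// so that ^ and $ anchor at every line of a multi-line string.
const linePattern = /^(?![^!?.\n\r]*[!?.])(\w+:)(.*)$/gm;

// Hypothetical sample text: the second line contains "!" and should be skipped.
const sampleText = [
  "Choose: blah 123",
  "Choose: blah 123!",
  "Pick: other text",
].join("\n");

// Each match exposes the prefix in group 1 and the rest of the line in group 2.
const linePairs = [...sampleText.matchAll(linePattern)].map(m => [m[1], m[2]]);
console.log(linePairs);
// [ [ 'Choose:', ' blah 123' ], [ 'Pick:', ' other text' ] ]
```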

Related

JavaScript - Capture repeated group

Using JavaScript (or VBScript) Is it possible to separately capture from %n:32+5000 the following groups?:
Capture group 1: %n
Capture group 2: :32
Capture group 3: +5000
I tried a search through this forum about capturing repeated groups, but the examples given were either in a different language like .NET or the asker wasn't asking specifically how to do what I need.
The best attempt I've made thus far is (%n)(([:\+]\d+){0,2}) with global turned on. Also, I am using https://regex101.com/r/qBylQX/1 to help me visualize what's happening; but, so far I haven't cracked it.
Notes:
only one instance of %n is allowed per match
only one appearance of :\d+ is allowed per match
only one appearance of +\d+ is allowed per match.
the pattern can appear anywhere in a string.

Use
(%n)(?:(:\d+)(\+\d+)?)?
EXPLANATION
--------------------------------------------------------------------------------
(             group and capture to \1:
  %n          '%n'
)             end of \1
(?:           group, but do not capture (optional, as many as possible):
  (           group and capture to \2:
    :         ':'
    \d+       digits (0-9) (1 or more times, as many as possible)
  )           end of \2
  (           group and capture to \3 (optional, as many as possible):
    \+        '+'
    \d+       digits (0-9) (1 or more times, as many as possible)
  )?          end of \3 (NOTE: because you are using a quantifier on this capture, only the LAST repetition of the captured pattern will be stored in \3)
)?            end of grouping
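As a rough illustration of how the three groups come out when the pattern appears anywhere in a string, here is a sketch using matchAll with the global flag. The helper name extractTokens, the property names, and the sample string are hypothetical.

```javascript
// Pattern from the answer above, with the g flag so every occurrence is found.
const tokenPattern = /(%n)(?:(:\d+)(\+\d+)?)?/g;

// Hypothetical helper: one object per occurrence of %n.
// Groups that did not participate in a match come back as undefined.
function extractTokens(text) {
  return [...text.matchAll(tokenPattern)].map(m => ({
    name: m[1],
    colon: m[2],
    plus: m[3],
  }));
}

const tokens = extractTokens("a %n:32+5000 b %n:7 c %n");
console.log(tokens);
// first:  name "%n", colon ":32", plus "+5000"
// second: name "%n", colon ":7",  plus undefined
// third:  name "%n", colon undefined, plus undefined
```

Note that with this pattern the +\d+ part is only captured when a :\d+ part precedes it.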

The pattern (%n)(([:\+]\d+){0,2}) can also match :\d+ twice or +\d+ twice, because both : and + are in the character class.
A repeated capture group like ([:\+]\d+){0,2} only keeps the value of its last iteration.
Because this repetition sits inside the outer capture group 2, group 2 holds the whole repeated part (for example :32+5000), so you don't get the values separated into the groups you are after.
According to the notes, each of the two parts may occur at most once per match.
If you want to allow : and + in either order but never the same one twice, you can use a capture group with a backreference.
The values are in groups 1, 3 and 4.
(%n)(?:([:+])(\d+)(?:(?!\2)([:+]\d+))?)?(?!\S)
The pattern matches:
(%n) Capture group 1, match %n
(?: Non capture group to match as a whole
([:+]) Capture group 2, match either : or +
(\d+) Capture group 3, match 1+ digits
(?: Non capture group to match as a whole
(?!\2) Negative lookahead, assert what is directly to the right is not the same value as captured in group 2
([:+]\d+) Capture group 4, match : or + followed by 1+ digits
)? Close group and make it optional
)? Close group and make it optional
(?!\S) Assert a whitespace boundary to the right to prevent a partial match
const regex = /(%n)(?:([:+])(\d+)(?:(?!\2)([:+]\d+))?)?(?!\S)/;
[
"%n:32+5000",
"%n:32",
"%n+5000",
"%n",
"%n+32:5000",
"%n+32",
"%n:5000",
"%n:32:5000",
"%n+32+5000"
].forEach(s => console.log(`${regex.test(s)} ==> ${s}`));

Javascript regex - check if string contains two or more of same letter

I'm working on some regex with JavaScript. I have now created a regex that checks if a string has two or more of the same letter following each other. I would want to create a regex that checks if a word / string contains two or more of one particular letter, no matter if they are after each other or just in the same word / string.
It would need to match: drama and anaconda, but not match: lame, kiwi or tree.
This is the regex in JS.
const str = "anaconda";
str.match(/[a]{2,}/);

Use
\w*(\w)\w*\1\w*
EXPLANATION
NODE      EXPLANATION
--------------------------------------------------------------------------------
\w*       word characters (a-z, A-Z, 0-9, _) (0 or more times, as many as possible)
(         group and capture to \1:
  \w      word characters (a-z, A-Z, 0-9, _)
)         end of \1
\w*       word characters (a-z, A-Z, 0-9, _) (0 or more times, as many as possible)
\1        what was matched by capture \1
\w*       word characters (a-z, A-Z, 0-9, _) (0 or more times, as many as possible)

My thought process was something like this:
A word can start with any letter
A word can end with any letter
The matching letters can have zero or more letters between them
If any of its letters repeat and it fits the criteria above, then accept the word
regex = /[a-z]*([a-z])[a-z]*\1+[a-z]*/
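Both backreference patterns above accept a word when any character occurs twice, so they also accept kiwi (two i's) and tree (two e's). If the goal is two or more occurrences of one particular letter, as the examples in the question suggest, a narrower check could look like the sketch below. The helper name hasAtLeastTwo is hypothetical, and it assumes the letter is a single plain letter that needs no escaping.

```javascript
// Hypothetical helper: true when `word` contains `letter` at least twice,
// adjacent or not. Assumes `letter` is one plain letter (a-z), so it can be
// placed in the pattern without escaping.
function hasAtLeastTwo(word, letter) {
  const pattern = new RegExp(`${letter}.*${letter}`);
  return pattern.test(word);
}

for (const w of ["drama", "anaconda", "lame", "kiwi", "tree"]) {
  console.log(`${w} => ${hasAtLeastTwo(w, "a")}`);
}
// drama => true, anaconda => true, lame => false, kiwi => false, tree => false

// For comparison, the backreference pattern accepts any repeated character.
const anyRepeat = /\w*(\w)\w*\1\w*/;
console.log(anyRepeat.test("kiwi")); // true
```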

Regex that allows PascalCased with one or more consecutive uppercase letters and numbers

I'm trying to check user input for PascalCased names and this works, but I would like to also allow one or more consecutive upperCase letters eg. UNOrganization and also allow numbers in between eg. A2BOrganization.
So all of the following should be allowed: ABCWord, A3BWord, OtherWord, Word3, Word3AB (last unlikely but if possible fine)
if (value.match(/^[A-Z][a-z]+(?:[A-Z][a-z]+)*$/)) {
//logic here
}
Regex is a little beyond me and the logic to parse the strings would be too long for my needs and I know this can be done in a one-liner with regex so hopefully someone more savvy can help me.

Use
^[A-Z]+[a-z]*(?:\d*(?:[A-Z]+[a-z]*)?)*$
If you require at least one lowercase letter in the input string:
^(?=.*[a-z])[A-Z]+[a-z]*(?:\d*(?:[A-Z]+[a-z]*)?)*$
Explanation
--------------------------------------------------------------------------------
^             the beginning of the string
(?=           look ahead to see if there is:
  .*          any character except \n (0 or more times, as many as possible)
  [a-z]       any character of: 'a' to 'z'
)             end of look-ahead
[A-Z]+        any character of: 'A' to 'Z' (1 or more times, as many as possible)
[a-z]*        any character of: 'a' to 'z' (0 or more times, as many as possible)
(?:           group, but do not capture (0 or more times, as many as possible):
  \d*         digits (0-9) (0 or more times, as many as possible)
  (?:         group, but do not capture (optional, as many as possible):
    [A-Z]+    any character of: 'A' to 'Z' (1 or more times, as many as possible)
    [a-z]*    any character of: 'a' to 'z' (0 or more times, as many as possible)
  )?          end of grouping
)*            end of grouping
$             the end of the string (in JavaScript, without the m flag, $ does not match before a trailing \n)
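A small sketch for checking the first pattern against the names listed in the question. The last two inputs (otherWord and Word_3) are hypothetical negative cases added for illustration, as are the variable names.

```javascript
// First pattern from the answer above (no lowercase letter required).
const pascalPattern = /^[A-Z]+[a-z]*(?:\d*(?:[A-Z]+[a-z]*)?)*$/;

// Names from the question, followed by two hypothetical negative cases.
const pascalSamples = [
  "ABCWord", "A3BWord", "OtherWord", "Word3", "Word3AB",
  "UNOrganization", "A2BOrganization",
  "otherWord", // starts with a lowercase letter
  "Word_3",    // contains an underscore
];

const pascalResults = pascalSamples.map(s => [s, pascalPattern.test(s)]);
pascalResults.forEach(([s, ok]) => console.log(`${s} => ${ok}`));
// The first seven print true, the last two print false.
```

Because the pattern nests quantifiers, it may be worth keeping it away from very long untrusted input.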

/^(?=.*[a-z])[A-Z]\d*[a-z]*(([A-Z\d]*[A-Z]|[A-Z][A-Z\d]*|\d+$)[a-z]*)*$/
seems to fit your requirements. See tests below.
The idea is to start with a capital letter followed by zero or more digits and zero or more lowercase letters. Then enter the main group, repeated zero or more times. This group matches a run of capital letters and digits that starts or ends with a capital letter (or a run of digits at the very end of the string), followed by zero or more lowercase letters. The alternation handles allowing digits at the end and disallowing lone digits sandwiched between two lowercase characters.
const pat = /^(?=.*[a-z])[A-Z]\d*[a-z]*(([A-Z\d]*[A-Z]|[A-Z][A-Z\d]*|\d+$)[a-z]*)*$/;
const tests = [
"UNOrganization",
"A2BOrganization",
"ABCWord",
"A3BWord",
"OtherWord",
"Word3",
"Word3AB",
"a",
"A",
"Aa",
"aA",
"AAA",
"aaa",
"9A",
"A4",
"A4a",
"AaA",
"AaAa",
"Aa1",
"AaA1",
"Aa1a",
"",
];
const len = 2 + Math.max(...tests.map(e => e.length));
tests.forEach(e =>
console.log(`${`'${e}'`.padStart(len)} => ${pat.test(e)}`)
);

Another pattern:
^[A-Z]+[A-Za-z]*(?:[0-9]+(?:[A-Z]+[A-Za-z]*)*)*$
[A-Z]+[A-Za-z]*: At least one capital letter, optionally followed by a sequence of letters of any case, ...
(?: ... )*: ... optionally followed by a sequence of:
[0-9]+: one or more digits...
(?: ... )*: optionally followed by a sequence of:
[A-Z]+[A-Za-z]*: at least one capital letter, optionally followed by a sequence of letters of any case
Regex101.com working examples (replaced ^ and $ by word delimiters \b).

RegEx for match whole word in sentences - javascript

Trying to work out what the right RegEx would be for finding "s***" in a series of strings, e.g:
match for "find s*** in s*** foobar"
match for "s***"
don't match for "s******"
don't match for "s****** foobar"
I'm using a match because I want to count the number of instances of matches in the sentence. I was trying "s*{3}" as a starting point, and variations on $ and \b or \B but I can't quite figure it out.
I created some tests here to try it out, if that's helpful.
https://regex101.com/r/VdLyOY/2

You may use this regex with a negative lookahead:
/\bs\*{3}(?!\*)/g
or with a positive lookahead:
/\bs\*{3}(?=\s|$)/g
RegEx Details:
\bs: Match the letter s after a word boundary
\*{3}: Match * 3 times, i.e. ***
(?!\*): Negative lookahead to assert that the next character is not a *
(?=\s|$): Positive lookahead to assert that the next position is whitespace or the end of the string
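Since the question is about counting instances, here is a minimal sketch that counts matches of the negative-lookahead pattern with String.prototype.match. The helper name countCensored is hypothetical; the sample strings are the ones from the question.

```javascript
// Negative-lookahead pattern from the answer above.
const censoredWord = /\bs\*{3}(?!\*)/g;

// Hypothetical helper: number of matches in `text`.
// With the g flag, match() returns an array of all matches, or null if none.
function countCensored(text) {
  return (text.match(censoredWord) || []).length;
}

console.log(countCensored("find s*** in s*** foobar")); // 2
console.log(countCensored("s***"));                     // 1
console.log(countCensored("s******"));                  // 0
console.log(countCensored("s****** foobar"));           // 0
```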

/\bs\*{3}(\s|$)/g might work depending on exactly what your criteria are.

Use
/\bs\*{3}\B(?!\*)/g
EXPLANATION
--------------------------------------------------------------------------------
\b        the boundary between a word char (\w) and something that is not a word char
s         's'
\*{3}     '*' (3 times)
\B        the boundary between two word chars (\w) or two non-word chars (\W)
(?!       look ahead to see if there is not:
  \*      '*'
)         end of look-ahead

Javascript Regex Input Validation to Prevent Duplicate Characters

I am attempting to validate text input with the following requirements:
allowed characters & length /^\w{8,15}$/
must contain /[a-z]+/
must contain /[A-Z]+/
must contain /[0-9]+/
must not contain repeated characters (ie. aba=pass and aab=fail)
Each test would return true when used with .test().
With modest familiarity, I am able to write the first 4 tests, albeit individually. The 5th test is not working out; a negated lookahead (which is what I believe I need to be using) is challenging.
Here are a few value/result examples:
re.test("Fail1");//returns false, too short
re.test("StringFailsRule1");//returns false, too long
re.test("Fail!");//returns false, invalid !
re.test("FAILRULE2");//returns false, missing [a-z]+
re.test("failrule3");//returns false, missing [A-Z]+
re.test("failRuleFour");//returns false, missing [0-9]+
re.test("failRule55");//returns false, repeat of "5"
re.test("TestValue1");//returns true
Finally, the ideal would be a single combined test used to enforce all requirements.

This uses negative and positive lookaheads (zero-length assertions) for your tests, and the \w{8,15} part validates the allowed characters and the length.
^(?!.*(.)\1)(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])\w{8,15}$
For your fifth rule I used a negative lookahead to make sure that a capture group of any character is never followed by itself.
NODE          EXPLANATION
--------------------------------------------------------------------------------
^             the beginning of the string
(?!           look ahead to see if there is not:
  .*          any character except \n (0 or more times, as many as possible)
  (           group and capture to \1:
    .         any character except \n
  )           end of \1
  \1          what was matched by capture \1
)             end of look-ahead
(?=           look ahead to see if there is:
  .*          any character except \n (0 or more times, as many as possible)
  [a-z]       any character of: 'a' to 'z'
)             end of look-ahead
(?=           look ahead to see if there is:
  .*          any character except \n (0 or more times, as many as possible)
  [A-Z]       any character of: 'A' to 'Z'
)             end of look-ahead
(?=           look ahead to see if there is:
  .*          any character except \n (0 or more times, as many as possible)
  [0-9]       any character of: '0' to '9'
)             end of look-ahead
\w{8,15}      word characters (a-z, A-Z, 0-9, _) (between 8 and 15 times, as many as possible)
$             the end of the string (in JavaScript, without the m flag, $ does not match before a trailing \n)
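The combined pattern can be checked against the value/result examples from the question with a sketch like this. The variable names are hypothetical; the inputs and expected results are the ones listed in the question.

```javascript
// Combined pattern from the answer above.
const passwordPattern = /^(?!.*(.)\1)(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])\w{8,15}$/;

// Inputs and expected results taken from the question.
const passwordCases = [
  ["Fail1", false],            // too short
  ["StringFailsRule1", false], // too long (16 characters)
  ["Fail!", false],            // invalid character and too short
  ["FAILRULE2", false],        // no lowercase letter
  ["failrule3", false],        // no uppercase letter
  ["failRuleFour", false],     // no digit
  ["failRule55", false],       // "5" repeated back to back
  ["TestValue1", true],
];

for (const [value, expected] of passwordCases) {
  const actual = passwordPattern.test(value);
  console.log(`${value} => ${actual} (expected ${expected})`);
}
```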
