Regex to match string with or without capturing group

Regex to match string with or without capturing group - javascript

I've been trying for a while and not sure what to Google but I want both of these to be matched
Someone does something
and
Someone tries to do something
What I thought would work is
/^Someone (tries to)? do(es)? something$/
But that only matches the second string.
They are separate strings, not a single string spanning multiple lines.

Use
/^Someone(?: tries to)? do(?:es)? something$/
See proof
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
Someone 'Someone'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
tries to ' tries to'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
do ' do'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
es 'es'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
something ' something'
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

Related

Get headers from markdown using regex

I'm trying to get only h1 and h2 headers from markdown file using regex, but unfortunately I don't know regex well and can't write the correct solution.
With this expression I'm near the solution (I think so):
/\#{1,2} (.*?)(\\r\\n|\\r|\\n)/gm
But it returns also headers with more than two hashes.
Test case:
# first \r
## second \r
### third \r## fourth \r
This should return ['first', 'second', 'fourth']

Use
/(?<!#)#{1,2} (.*?)(\\r(?:\\n)?|\\n)/gm
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
# '#'
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
#{1,2} '#' (between 1 and 2 times (matching the
most amount possible))
--------------------------------------------------------------------------------
' '
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
r 'r'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
n 'n'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
n 'n'
--------------------------------------------------------------------------------
) end of \2

Javascript regex to match JSDoc tags inside a documentation block

So right now I can isolate each JSDoc (ish) block I have in my code, for example I have this block here
/// #tag {type} name description
/// description continue
/// another description line
/// #tag {type} name description
/// description continue
/// #tag name description just one line.
/// #tag {type} name
I want to use a Regex expression to match each tag inside the block a tag needs to have a #name format then it can have a type {type} (optional) then it requires a name and finally it can have a description.. the description can be both multiple lined or single lined (and is also optional). So
the Regex I have came up with was:
^\/{3} #(?<tag>\w+)(?:[ \t]+{(?<type>.*)})?(?:[ \t]+(?<name>\w+))(?:[ \t]+(?<desc>[\s\S]*))?
my problem is with the description as soon as I hit the description it doesn't stop at the start of the next tag... I get the feeling that right now I'm using a greedy approach but I cannot find a wait to make it non greedy.
So the example above matches:
tag: tag
name: name
description:
description
/// description continue
/// another description line
/// #tag {type} name description
/// description continue
/// #tag name description just one line.
/// #tag {type} name
I wanted the description to stop just as the new tag starts or if the block ends

Use
/^\/{3} #(?<tag>\w+)(?:[ \t]+{(?<type>[^{}]*)})?[ \t]+(?<name>\w+)(?:[ \t]+(?<desc>.*(?:\n(?!\/{3} #\w).*)*))?/gm
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
\/{3} '/' (3 times)
--------------------------------------------------------------------------------
# ' #'
--------------------------------------------------------------------------------
(?<tag> group and capture to \k<tag>:
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \k<tag>
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
[ \t]+ any character of: ' ', '\t' (tab) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
{ '{'
--------------------------------------------------------------------------------
(?<type> group and capture to \k<type>:
--------------------------------------------------------------------------------
[^{}]* any character except: '{', '}' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \k<type>
--------------------------------------------------------------------------------
} '}'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
[ \t]+ any character of: ' ', '\t' (tab) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?<name> group and capture to \k<name>:
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \k<name>
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
[ \t]+ any character of: ' ', '\t' (tab) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?<desc> group and capture to \k<desc>:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\/{3} '/' (3 times)
--------------------------------------------------------------------------------
# ' #'
--------------------------------------------------------------------------------
\w word characters (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
) end of \k<desc>
--------------------------------------------------------------------------------
)? end of grouping

Yep, the problem is that the description matcher is greedy. Changing * to *? to make it non-greedy fixes it. But it still has the problem of knowing when to stop. You can do that by checking if the input is over, or if there is a /// # ahead. Note that this includes the /// at the start of each description line: I don't think it's possible to filter them out directly in the regex, so you'd have to post-process the output to remove ///s in desc.
/^\/{3} #(?<tag>\w+)(?:[ \t]+{(?<type>.*)})?(?:[ \t]+(?<name>\w+))((?:[ \t]+(?<desc>[\s\S]*?)((?=\/\/\/ #)|\s*\z)))?/gm

Get either HH:mm:ss, mm:ss regex

I have a string where I would like to extract a time stamp. The time stamp could be anywhere in the string, and the timestamp could be different lengths. Think youtube video time stamping.
Here is an example
What would you change for education from your 27:19:45 experience k-12? #change'
I would like to extract 27:19:45. However other strings could contain only MM:SS
The regex I could come up with was
const timeRegex = /(?:(?:([01]?\d|2[0-3]):)?([0-5]?\d):)?([0-5]?\d)/gm;
When I run this regex it also picks up the 12 in the k-12 part of the string which I don't want.
Sorry I removed the seconds part. If it were just seconds it would have 0:12

Use
(?<!\S)(?:(?:(\d{1,2}):)?([0-5]?\d):)?([0-5]?\d)(?!\S)
See regex proof.
In your expression, you need to make MM part compulsory.
EXPLANATION
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
\S non-whitespace (all but \n, \r, \t, \f,
and " ")
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\d{1,2} digits (0-9) (between 1 and 2 times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
[0-5]? any character of: '0' to '5' (optional
(matching the most amount possible))
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
[0-5]? any character of: '0' to '5' (optional
(matching the most amount possible))
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\S non-whitespace (all but \n, \r, \t, \f,
and " ")
--------------------------------------------------------------------------------
) end of look-ahead

This one should work:
const timeRegex = /\d\d:\d\d(:\d\d)?/gm
Or, ff the numbers between the colons can be one or two:
const timeRegex = /\d\d?:\d\d?(:\d\d?)?/gm
Test:
s = "What would you change for education from your 27:09:02 experience k-12? #change'"
-> "What would you change for education from your 27:09:02 experience k-12? #change'"
s.match(/\d\d?:\d\d?(:\d\d?)?/gm)
-> ["27:09:02"]
s = "What would you change for education from your 27:09 experience k-12? #change'"
-> "What would you change for education from your 27:09 experience k-12? #change'"
s.match(/\d\d?:\d\d?(:\d\d?)?/gm)
-> ["27:09"]
s = "What would you change for education from your experience k-12? #change'"
-> "What would you change for education from your experience k-12? #change'"
s.match(/\d\d?:\d\d?(:\d\d?)?/gm)
-> null

You can use positive lookahead and positive lookbehind and select only if there is \s on both the side of the time.
const str =
"What would you change for education from your 27:19:45 and 12:12 and 11 experience k-12? #change";
const result = str.match(/(?<=\s)(\d\d:?)+(?=\s)/g);
console.log(result);

I want to limit number of subdomain in Regular Expression

I want to limit levels of subdomain to 3 levels only. trying regex below fails
([\.]?[a-z]*){3}
My Target: abc.def.ghi
but
regex above accepts abc.def.ghi. (Notice the last .)

Use
^(?:[a-z]+(?:\.[a-z]+){0,2})?$
See proof.
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
[a-z]+ any character of: 'a' to 'z' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (between 0 and
2 times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
[a-z]+ any character of: 'a' to 'z' (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
){0,2} end of grouping
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

Javascript Regex Input Validation to Prevent Duplicate Characters

I am attempting to validate text input with the following requirements:
allowed characters & length /^\w{8,15}$/
must contain /[a-z]+/
must contain /[A-Z]+/
must contain /[0-9]+/
must not contain repeated characters (ie. aba=pass and aab=fail)
Each test would return true when used with .test().
With modest familiarity, I am able to write the first 4 tests, albeit individually. The 5th test is not working out, negated lookahead (which is what i believe i need to be using) is challenging.
Here are a few value/result examples:
re.test("Fail1");//returns false, too short
re.test("StringFailsRule1");//returns false, too long
re.test("Fail!");//returns false, invalid !
re.test("FAILRULE2");//returns false, missing [a-z]+
re.test("failrule3");//returns false, missing [A-Z]+
re.test("failRuleFour");//returns false, missing [0-9]+
re.test("failRule55");//returns false, repeat of "5"
re.test("TestValue1");//returns true
Finally, the ideal would be a single combined test used to enforce all requirements.

This uses negative and positive lookaheads zero-length assertions for your tests and the .{8,15} bit validates length.
^(?!.*(.)\1)(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])\w{8,15}$
For your fifth rule I used a negative lookahead to make sure that a capture group of any character is never followed by itself.
Regexpal demo
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\1 what was matched by capture \1
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
[a-z] any character of: 'a' to 'z'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
[A-Z] any character of: 'A' to 'Z'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
[0-9] any character of: '0' to '9'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
\w{8,15} word characters (a-z, A-Z, 0-9, _)
(between 8 and 15 times (matching the most
amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

Develop Reference

JavaScript is the programming language of the Web.

Regex to match string with or without capturing group - javascript

Related

Get headers from markdown using regex

Javascript regex to match JSDoc tags inside a documentation block

Get either HH:mm:ss, mm:ss regex

I want to limit number of subdomain in Regular Expression

Javascript Regex Input Validation to Prevent Duplicate Characters

Categories

Resources