Get headers from markdown using regex - javascript

I'm trying to get only h1 and h2 headers from markdown file using regex, but unfortunately I don't know regex well and can't write the correct solution.
With this expression I'm near the solution (I think so):
/\#{1,2} (.*?)(\\r\\n|\\r|\\n)/gm
But it returns also headers with more than two hashes.
Test case:
# first \r
## second \r
### third \r## fourth \r
This should return ['first', 'second', 'fourth']

Use
/(?<!#)#{1,2} (.*?)(\\r(?:\\n)?|\\n)/gm
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
# '#'
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
#{1,2} '#' (between 1 and 2 times (matching the
most amount possible))
--------------------------------------------------------------------------------
' '
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
r 'r'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
n 'n'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
n 'n'
--------------------------------------------------------------------------------
) end of \2

Related

Matching an entire sentence containing words even if the sentence spans multiple lines

Attempting to match the entire sentence of a document containing certain words even if the sentence spans multiple lines.
My current attempts only capture the sentence if it does not span to the next lines.
^.*\b(dog|cat|bird)\b.*\.
Using ECMAScript.
When no abbreviations in the input are expected use
/\b[^?!.]*?\b(dog|cat|bird)\b[^?!.]*[.?!]/gi
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
[^?!.]*? any character except: '?', '!', '.' (0 or
more times (matching the least amount
possible))
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
dog 'dog'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
cat 'cat'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
bird 'bird'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
[^?!.]* any character except: '?', '!', '.' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
[.?!] any character of: '.', '?', '!'

Javascript regex to match JSDoc tags inside a documentation block

So right now I can isolate each JSDoc (ish) block I have in my code, for example I have this block here
/// #tag {type} name description
/// description continue
/// another description line
/// #tag {type} name description
/// description continue
/// #tag name description just one line.
/// #tag {type} name
I want to use a Regex expression to match each tag inside the block a tag needs to have a #name format then it can have a type {type} (optional) then it requires a name and finally it can have a description.. the description can be both multiple lined or single lined (and is also optional). So
the Regex I have came up with was:
^\/{3} #(?<tag>\w+)(?:[ \t]+{(?<type>.*)})?(?:[ \t]+(?<name>\w+))(?:[ \t]+(?<desc>[\s\S]*))?
my problem is with the description as soon as I hit the description it doesn't stop at the start of the next tag... I get the feeling that right now I'm using a greedy approach but I cannot find a wait to make it non greedy.
So the example above matches:
tag: tag
name: name
description:
description
/// description continue
/// another description line
/// #tag {type} name description
/// description continue
/// #tag name description just one line.
/// #tag {type} name
I wanted the description to stop just as the new tag starts or if the block ends
Use
/^\/{3} #(?<tag>\w+)(?:[ \t]+{(?<type>[^{}]*)})?[ \t]+(?<name>\w+)(?:[ \t]+(?<desc>.*(?:\n(?!\/{3} #\w).*)*))?/gm
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
\/{3} '/' (3 times)
--------------------------------------------------------------------------------
# ' #'
--------------------------------------------------------------------------------
(?<tag> group and capture to \k<tag>:
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \k<tag>
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
[ \t]+ any character of: ' ', '\t' (tab) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
{ '{'
--------------------------------------------------------------------------------
(?<type> group and capture to \k<type>:
--------------------------------------------------------------------------------
[^{}]* any character except: '{', '}' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \k<type>
--------------------------------------------------------------------------------
} '}'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
[ \t]+ any character of: ' ', '\t' (tab) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?<name> group and capture to \k<name>:
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \k<name>
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
[ \t]+ any character of: ' ', '\t' (tab) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?<desc> group and capture to \k<desc>:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\/{3} '/' (3 times)
--------------------------------------------------------------------------------
# ' #'
--------------------------------------------------------------------------------
\w word characters (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
) end of \k<desc>
--------------------------------------------------------------------------------
)? end of grouping
Yep, the problem is that the description matcher is greedy. Changing * to *? to make it non-greedy fixes it. But it still has the problem of knowing when to stop. You can do that by checking if the input is over, or if there is a /// # ahead. Note that this includes the /// at the start of each description line: I don't think it's possible to filter them out directly in the regex, so you'd have to post-process the output to remove ///s in desc.
/^\/{3} #(?<tag>\w+)(?:[ \t]+{(?<type>.*)})?(?:[ \t]+(?<name>\w+))((?:[ \t]+(?<desc>[\s\S]*?)((?=\/\/\/ #)|\s*\z)))?/gm

Regex to match string with or without capturing group

I've been trying for a while and not sure what to Google but I want both of these to be matched
Someone does something
and
Someone tries to do something
What I thought would work is
/^Someone (tries to)? do(es)? something$/
But that only matches the second string.
They are separate strings, not a single string spanning multiple lines.
Use
/^Someone(?: tries to)? do(?:es)? something$/
See proof
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
Someone 'Someone'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
tries to ' tries to'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
do ' do'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
es 'es'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
something ' something'
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

I want to limit number of subdomain in Regular Expression

I want to limit levels of subdomain to 3 levels only. trying regex below fails
([\.]?[a-z]*){3}
My Target: abc.def.ghi
but
regex above accepts abc.def.ghi. (Notice the last .)
Use
^(?:[a-z]+(?:\.[a-z]+){0,2})?$
See proof.
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
[a-z]+ any character of: 'a' to 'z' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (between 0 and
2 times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
[a-z]+ any character of: 'a' to 'z' (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
){0,2} end of grouping
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

Javascript regex for specific name validation

I have a university assignment to write a JS regex for name validation: name can include spaces at the beginning and the end (don't ask anything, it's our teacher demand), also it can include such chars like: -, ' and space (" "). At the moment, my regex looks like that:
var Nameregex = /^( ?)*[A-Z]+((['-])?[a-z]+)*(([ ]?[a-z]*)*)*$/g;
It works almost perfect, but except for one case: one word (words separated from each other with spaces and - symbols) can contain only one ' symbol.
For example, names like John-andrew'andrew'john shouldn't work. But John-andrew'john-andrew'john should work.
Add a (?!.*'[A-Za-z]+') negative lookahead after ^:
/^(?!.*'[A-Za-z]+')\s*[A-Z]+(?:['-]?[a-z]+)*(?:\s*[a-z]*)*$/
See proof
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
[A-Za-z]+ any character of: 'A' to 'Z', 'a' to 'z'
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
[A-Z]+ any character of: 'A' to 'Z' (1 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
['-]? any character of: ''', '-' (optional
(matching the most amount possible))
--------------------------------------------------------------------------------
[a-z]+ any character of: 'a' to 'z' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
[a-z]* any character of: 'a' to 'z' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

Categories

Resources