JS Regex - Match until the end of line OR a character - javascript

Here is an example of what I'm trying to match:
Match everything after this:::
one
two
three
Match this also:::
one
two
three
___
but not this
My code:
const thing = /[^:]:::\n([\s\S]*)(_{3}|$)/gm
I want it to match everything AFTER ':::', but end either when it sees ___, or if that is not there, then the end of the text input, $.
It works for the FIRST example, but it continues to match the text after the ___ in the second example.
Any ideas how to make this right?
I'm only interested in the results in the first grouping. I had to group the (_{3}|$) otherwise it creates an infinite loop.

The pattern [^:]:::\n([\s\S]*)(_{3}|$) that you tried matches too much because [\s\S]* is greedy and first consumes everything up to the end of the input. At that point the alternation (_{3}|$) can already succeed with its second branch, $.
The engine therefore never has to backtrack to ___ and settles for matching the end of the string.
You could use a capture group, and match all following lines that do not start with ___
[^:](:::(?:\n(?!___).*)*)
[^:] Match any char except :
( Capture group 1
::: Match literally
(?:\n(?!___).*)* Match all consecutive lines that do not start with ___
) Close group 1
Or, if lookbehind is supported, use a negative lookbehind that asserts there is no : directly to the left. The whole match can then be used without a capture group:
(?<!:):::(?:\n(?!___).*)*
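A minimal JavaScript sketch of the capture-group pattern above, assuming each example from the question is a separate input string. The helper name linesAfterMarker and the sample strings are made up for illustration, and stripping the leading ::: from group 1 is an extra step that is not part of the answer's pattern.

```javascript
// Capture-group pattern from the answer, used unchanged.
const pattern = /[^:](:::(?:\n(?!___).*)*)/;

// Sample inputs modelled on the question (assumed to be separate strings).
const first = "Match everything after this:::\none\ntwo\nthree";
const second = "Match this also:::\none\ntwo\nthree\n___\nbut not this";

// Hypothetical helper: returns the lines that follow ':::' or null if no match.
function linesAfterMarker(text) {
  const m = pattern.exec(text);
  if (m === null) return null;
  // Group 1 starts with ':::'; drop it and the newline that follows it.
  return m[1].slice(3).replace(/^\n/, "").split("\n");
}

console.log(linesAfterMarker(first));  // [ 'one', 'two', 'three' ]
console.log(linesAfterMarker(second)); // [ 'one', 'two', 'three' ]
```

In the second input the repetition stops before the ___ line because the lookahead (?!___) fails there, so "but not this" is never reached.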

Related

Regex match first character once, followed by repetitive matching until end

I'm trying to match characters that shouldn't be allowed in a username string to then be replaced.
Anything outside this range should match first character [a-zA-Z] <-- restricting the first character is causing problems and I don't know how to fix it
And then match everything else outside this range [0-9a-zA-Z_.] <---- repeat until the end of the string
Matches:
/////hey/// <-- first match /////, second match ///
[][123Bc_.// <-- first match [][, second match //
(/abc <-- should match (/
a2__./) <-- should match /)
Non Matches:
a_____
b__...
Current regex
/^([^a-zA-Z])([^\w.])*/
const regex = /^([^a-zA-Z])([^0-9a-zA-Z_.])*/;
'(/abc'.replace(regex, '') // => return expected abc
'/////hey///'.replace(regex, '') // => return expected "hey"
/^([^a-zA-Z])([^\w.])*/
You cannot do it this way, with negated character classes and the pattern anchored at the start. For your a2__./) example, the pattern cannot match: the first character is not in the disallowed range, so the whole expression fails.
The characters you allow in the first position are a subset of those you allow in the rest of the string. So handle the second part first: replace everything that does not match [0-9a-zA-Z_.] with an empty string, without anchoring the pattern at the beginning or end.
And then, in the result of that operation, replace any characters not matching [a-zA-Z] from the start. (So that second pattern does get anchored at the beginning, and you’ll want to use + as quantifier - because when you remove the first invalid character, the next one becomes the new first, and that one might still be invalid.)
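The two-step approach described in this answer could be sketched as follows. The function name sanitizeUsername is an assumption for illustration; the two character classes come from the question.

```javascript
// Hypothetical helper implementing the two replacements in order.
function sanitizeUsername(input) {
  return input
    // Step 1: remove every character outside the general allowed set (unanchored, global).
    .replace(/[^0-9a-zA-Z_.]/g, "")
    // Step 2: remove leading characters that are not letters (anchored, one or more).
    .replace(/^[^a-zA-Z]+/, "");
}

console.log(sanitizeUsername("/////hey///"));  // hey
console.log(sanitizeUsername("(/abc"));        // abc
console.log(sanitizeUsername("a2__./)"));      // a2__.
console.log(sanitizeUsername("[][123Bc_.//")); // Bc_.
console.log(sanitizeUsername("a_____"));       // a_____
```

Note that for [][123Bc_.// the leading digits are removed as well, because after step 1 they sit in the first position, where only letters are allowed.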

Match everything Between two Characters except when there is a Blank line

I am trying to find a regex pattern that matches everything between one or two dollar signs, \$.*\$|\${2}.*\${2}, except when there is a blank line (it's either two or one, can't be this: \$.*\$\$). Below, I provide examples of what I want to match and what I want to skip. The match should include/exclude everything.
Examples of what I want to match:
$$ \abc + ko$$
$*-ls$
Here the single dollar sign has a escape character before it so it won't break the match.
$$
654a\$
$$
$123
a*/\
[]{}$
Examples of what I want to exclude:
$$
asd
$$
$asdasd$$
Again, I want to match everything if they are bound by one $ or two $ at each side, unless there is (are) empty line(s) in between.
So far I figured out how to match the ones occurring in a single line, but I am struggling how to include break-line and exclude them if the whole line is empty.
Here is what I have:
^\${2}.*[^\\$]\${2}$|^\$.*[^\\$]\$$
You may use
/^[^\S\r\n]{0,3}(\${1,2})(?:(?!\1|^$)[\s\S])+?\1[^\S\r\n]*$/gm
Details
^ - start of a line (since m makes ^ match line start positions)
[^\S\r\n]{0,3} - zero to three occurrences of any whitespace but CR and LF
(\${1,2}) - Group 1 holding one or two $ chars
(?:(?!\1|^$)[\s\S])+? - any char ([\s\S]), 1 or more occurrences, but as few as possible (due to the lazy +? quantifier), where no consumed char starts the sequence captured in Group 1 (\1) or sits at an empty-line position, that is, a position between two line break chars (^$)
\1 - the same value as in Group 1 ($ or $$)
[^\S\r\n]* - zero or more occurrences of any whitespace but CR and LF
$ - end of a line (since m makes $ match line end positions)
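As a rough check of the pattern explained above, here is a small JavaScript sketch. The helper name findDollarBlocks and the test strings are assumptions modelled on the question's examples; the inputs with a blank line were reconstructed by hand.

```javascript
// Pattern from the answer, used unchanged.
const dollarBlock = /^[^\S\r\n]{0,3}(\${1,2})(?:(?!\1|^$)[\s\S])+?\1[^\S\r\n]*$/gm;

// Hypothetical helper: returns all matched blocks, or an empty array.
function findDollarBlocks(text) {
  return text.match(dollarBlock) || [];
}

// Single-line block delimited by one $ on each side.
console.log(findDollarBlocks("$*-ls$"));               // [ '$*-ls$' ]
// Multi-line block delimited by $$; the inner single $ does not end it.
console.log(findDollarBlocks("$$\n654a\\$\n$$"));      // one match: the whole string
// A blank line between the delimiters prevents a match.
console.log(findDollarBlocks("$$\nasd\n\nqwe\n$$"));   // []
// Mismatched delimiters ($ ... $$) do not match either.
console.log(findDollarBlocks("$asdasd$$"));            // []
```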
For your example data, you might use
(?<!\S)(\$\$?+)[^\r\n$]*(?:\$(?!\$)[^\r\n$]*)*(?:\r?\n(?![^\S\r\n]*$)[^\r\n$]*(?:\$(?!\$)[^\r\n$]*)*)*\1(?!\S)
Explanation
(?<!\S) Assert a whitespace boundary on the left
(\$\$?+) Capture group 1, match $ or $$ where the second one is possessive (prevent backtracking)
[^\r\n$]*(?:\$(?!\$)[^\r\n$]*)* Match any char except $ or newline or a $ when not directly followed by another $
(?: Non capture group
\r?\n(?![^\S\r\n]*$) Match a newline, assert not a line consisting of only spaces
[^\r\n$]*(?:\$(?!\$)[^\r\n$]*)* Same pattern as above
)* Close the group and repeat 0+ times
\1 Backreference to what is captured in group 1
(?!\S) Assert a whitespace boundary on the right

javascript regular expressions - groups

I'm currently studying regular expression groups. I'm having trouble fully understanding the first example presented in the book under groups. The book gives the following example:
/(\S+) (\S*) ?\b(\S+)/
I understand that this will match at most three words (consisting of any character except a white space), where the second word and space is optional.
What I have trouble understanding is the function of the boundary condition to start the match of the last group at the beginning of the third word.
When there are three words, it makes no difference whether it is included or not.
When there are only two words, there is a difference between group #2 and group #3.
So, my question is as follows
When there are two words, why is the presence of \b causing group#2 to be an empty string as expected, but when not present causes group #2 to contain the second word minus the last letter and group #3 to contain the last letter of the second word?
When there are two words, why is the presence of \b causing group#2 to be an empty string as expected
Look at the first and third groups - being (\S+), they must contain characters. When there are two words, those two words must go into the first and third group - the second group, since it's repeated with *, will not consume any characters, and will be the empty string.
but when not present causes group #2 to contain the second word minus the last letter and group #3 to contain the last letter of the second word?
When the pattern is
(\S+) (\S*) ?(\S+)
once the engine has matched the first word, the engine will start trying to match the second word. So if the input is foo bar, we can consider how the pattern (\S*) ?(\S+) works on bar.
The engine first tries to consume all remaining characters in the string with the \S*. This fails, because the last group is required to contain at least one character, so the engine backs up a step, and has the \S* group match all but the last character. This results in a successful match, because at the position before the last character the rest of the pattern, an optional space followed by (\S+), does match.
You can see this process visually here:
https://regex101.com/r/RAkEOt/1/debugger
In the first pattern, the word boundary before the last group ensures that the second group does not match any characters, in case there are only two words in the string. Rather than backtracking to just before the last character, the engine must back up all the way until a word boundary is found.
The original pattern may be slightly flawed - \b matches a word boundary, but not every non-space character is a word character - it (probably undesirably) matches foo it's where the it' goes into the second group, and the s goes into the third group.
The difference comes from the second group (\S*) - it will capture any amount of non-whitespace characters. So, when you have two words but three groups where the last one is (\S+) - match at least one non-whitespace character, the regex engine will try to satisfy both group 2 and 3.
Remember that the engine is just matching a pattern, and you've not told it not to match like that. Hence it does the minimum backtracking necessary. The second group's \S* is greedy, so it initially grabs all of brown. The next part of the pattern is an optional space, which passes by matching nothing. Then the engine reaches the final group \S+, and since that requires at least one character, the second group gives back characters one by one until group 3 is satisfied.
You can see this here - I've defined the third group to have at least two mandatory characters, hence it only gets two:
// Group 3 requires at least two characters, so group 2 gives back two.
let [ , group1, group2, group3] = "the brown".match(/(\S+) (\S*) ?(\S{2,})/);
console.log("group 1:", group1); // the
console.log("group 2:", group2); // bro
console.log("group 3:", group3); // wn
When you instead add the word boundary \b to the pattern, you cannot have group 2 have any characters and satisfy the later condition - when a regex consumes a character the rest of the pattern will only continue from that character onward, hence you cannot have, for example group 2 match b and then have a word boundary followed by rown. The only way that (\S+) (\S*) ?\b(\S+) can be satisfied is the following:
group 1 matches the
the space character is matched
group 2 matches nothing, which is acceptable as it can match any amount, including zero
the optional space matches zero spaces
there is a word boundary
group 3 consumes the rest of the letters - brown
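To compare both patterns side by side, here is a short sketch. The helper name groups and the sample inputs are assumptions for illustration; the two patterns are the ones discussed above.

```javascript
const withBoundary = /(\S+) (\S*) ?\b(\S+)/;
const withoutBoundary = /(\S+) (\S*) ?(\S+)/;

// Hypothetical helper: returns the three captured groups, or null if no match.
function groups(regex, text) {
  const m = text.match(regex);
  return m ? m.slice(1) : null;
}

// Two words: \b forces group 2 to stay empty.
console.log(groups(withBoundary, "the brown"));        // [ 'the', '', 'brown' ]
// Two words, no \b: group 2 backtracks by one character only.
console.log(groups(withoutBoundary, "the brown"));     // [ 'the', 'brow', 'n' ]
// Three words: both patterns agree.
console.log(groups(withBoundary, "the quick brown"));  // [ 'the', 'quick', 'brown' ]
// The apostrophe creates a word boundary inside "it's".
console.log(groups(withBoundary, "foo it's"));         // [ 'foo', "it'", 's' ]
```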

Regex to match first character and no more than 2 identical consecutives

I'm using jQuery to add a pattern attribute to a text field based on a letter I select from an array. I'm trying to restrict the values that text field can accept with a regex, but it doesn't work properly.
What I want is that the first char of the value must be the letter I choose from the array, and that the field then doesn't accept more than 2 identical consecutive characters.
My regex is this:
^["+letter+"](?!(.)\1).{2}.*
And it seems to work when I'm testing it in regexr.com, but when I test it in my page, only the part that matches the 1st char works, and the rest doesn't. When I type something like "Aaaaron", the "invalid entry" message doesn't show.
Thanks in advance.
Description
^(.)(?!\1{2})
This regex will do the following:
capture the first character
validate the first character is then not repeated 2 more times. If two more of the same character are present after the first occurrence, then you have 3 of the same characters in a row.
Note: to make this expression treat upper- and lowercase versions of a letter as the same character, you'll need to use the case-insensitive flag.
Live Example
https://regex101.com/r/xG9mE9/2
Explanation
NODE                     EXPLANATION
----------------------------------------------------------------------
^                        the beginning of a "line"
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
.                        any character except \n
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
(?!                      look ahead to see if there is not:
----------------------------------------------------------------------
\1{2}                    what was matched by capture \1 (2 times)
----------------------------------------------------------------------
)                        end of look-ahead
----------------------------------------------------------------------
Try this regex (with letter being 'a'):
$("form").find("input[type=text]").attr("pattern", "(?=[Aa])(?!.*(.)\\1\\1).*");
It validates the starting letter, and that no character appears more than 2 times consecutively.
notes:
you can't do case insensitive matching with HTML5 pattern attribute, so 'a' and 'A' are not the same thing ('Aaaron' isn't 3 a's in a row)
if you add the pattern via a string (not a regex literal) in jQuery/JavaScript, remember that the string literal is parsed first and the regex second. The backslash is an escape character in both, so you may need to double it (\\1 for a backreference in this case)
you don't need ^ or $ to make the input value match the entire pattern; the regex is wrapped in ^(?:regex)$ for you. This means that a pattern that does not consume the entire string will not work: (?=[Aa])(?!.*(.)\\1\\1) is just a couple of lookarounds and would normally validate the input just fine, but it is a zero-width pattern, so without the .* at the end it cannot match.
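The notes above can be tried outside the browser with a small sketch that approximates how the pattern attribute is applied. The helper names buildPattern and isValid are assumptions for illustration, and the sketch assumes the chosen letter is a plain ASCII letter. Browsers also apply Unicode-related flags when compiling the attribute, which this sketch leaves out.

```javascript
// Hypothetical helper: builds the pattern string from the answer for one letter.
// Assumes `letter` is a plain ASCII letter, so no regex escaping is needed.
function buildPattern(letter) {
  const chars = letter.toUpperCase() + letter.toLowerCase();
  // Backslashes are doubled because this is a string, not a regex literal.
  return "(?=[" + chars + "])(?!.*(.)\\1\\1).*";
}

// Approximates the implicit ^(?:...)$ wrapping: the whole value must match.
function isValid(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

const pattern = buildPattern("a");
console.log(pattern);                     // (?=[Aa])(?!.*(.)\1\1).*
console.log(isValid(pattern, "Aaron"));   // true
console.log(isValid(pattern, "Aaaron"));  // true  ('A' and 'a' are different characters)
console.log(isValid(pattern, "Aaaaron")); // false (three lowercase a's in a row)
console.log(isValid(pattern, "Baron"));   // false (wrong first letter)
```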

Catching start number and final number

I'm trying to create a regex to catch the first number in the line and the last one, but I'm having some problems with the last one:
The lines look like this:
00005 SALARIO MENSAL 17030 36.397.291,92 36.397.291,92
00010 HORAS TRABALHADAS 0798 19.731,93 19.731,93
And this is my regex:
(^\d+).*(\d)
As you can see at http://regexr.com/3crbt, it is not working as expected. I can get the first number, but the second group captures only the last digit.
Thanks!
You can use
/^(\d+).*?(\d+(?:[,.]\d+)*)$/gm
See the regex demo
The regex matches:
^ - start of the line
(\d+) - captures into Group 1 one or more digits
.*? - matches any characters but a newline, as few as possible up to
(\d+(?:[,.]\d+)*) - one or more digits followed with zero or more sequences of , or . followed with one or more digits (Group 2)
$ - end of the line
The /g modifier ensures we get all matches and /m modifier makes the ^ and $ match start and end of a line respectively.
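A short sketch of how this regex could be applied to the two sample lines from the question. The variable names are assumptions for illustration, and String.prototype.matchAll requires a reasonably modern JavaScript engine.

```javascript
// Regex from the answer, used unchanged.
const lineNumbers = /^(\d+).*?(\d+(?:[,.]\d+)*)$/gm;

// The two sample lines from the question, joined into one multi-line string.
const report = [
  "00005 SALARIO MENSAL 17030 36.397.291,92 36.397.291,92",
  "00010 HORAS TRABALHADAS 0798 19.731,93 19.731,93",
].join("\n");

// Collect [first number, last number] for every line that matches.
const pairs = [...report.matchAll(lineNumbers)].map((m) => [m[1], m[2]]);

console.log(pairs);
// [ [ '00005', '36.397.291,92' ], [ '00010', '19.731,93' ] ]
```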
I tried the following one:
(^(\d+))|(\d+$)
And it seems to work on regexr.com. But pairing the two matches up might require the assumption that each line has at least two numbers.
You need to make the .* non-greedy by changing it to .*? and add + to the second digit sequence match.
^(\d+).*?(\d+)$
If you want to match the full last number, use this:
^(\d+).*?([\d\.,]+)$
