RegExp: How to find only one match (or not match pattern) - javascript

I can't figure out how to write a regexp that matches only heo. For example, if an l
character is found while scanning a candidate, that match should be cancelled.
'heo heo helo'.match(/he.*(?!l)o/gi) // should be only [heo, heo]
UPD:
I need to match as many times as possible within the string, not just the first one. Thanks
Example (wrong one):
console.log('heo heo helo'.match(/he.*(?!l)o/gi))

There are two issues:
.* - matches any zero or more chars other than line break chars as many as possible, and thus will match till the last occurrence of the subsequent patterns in the regex. You might use a non-greedy .*? here to fix the issue.
(?!l)o - always matches o: the negative lookahead (?!l) only checks that the next character is not l, and the next character is the o being matched, so the check always succeeds. What you wanted here is a negative lookbehind, (?<!l), which checks the character before o.
To match strings starting with he and then matching any chars (other than line break chars) as few as possible and then o not preceded with l, you can use
/he.*?(?<!l)o/gi
See this regex demo. The .*?(?<!l)o pattern matches any 0+ chars other than line break chars as few as possible up to the leftmost o that is not immediately preceded with l.
Now, if you just want to match words that start with he and end with o not preceded with l, you can use
/\bhe[a-z]*(?<!l)o\b/gi
/\bhe(?![a-z]*lo\b)[a-z]*o\b/gi
See this regex demo and this regex demo.
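As a quick check of the patterns above, the following sketch runs the original regex and the three suggested ones against the sample string from the question. It assumes a JavaScript engine with lookbehind support (ES2018 or later); the variable names are only illustrative.

```javascript
const text = 'heo heo helo';

// Original attempt: the greedy .* runs to the last "o", and (?!l) never blocks it.
const original = text.match(/he.*(?!l)o/gi);

// Lazy quantifier plus a negative lookbehind on the final "o".
const lazyLookbehind = text.match(/he.*?(?<!l)o/gi);

// Whole-word variants: one with a lookbehind, one with a lookahead.
const wordLookbehind = text.match(/\bhe[a-z]*(?<!l)o\b/gi);
const wordLookahead = text.match(/\bhe(?![a-z]*lo\b)[a-z]*o\b/gi);

console.log(original);       // [ 'heo heo helo' ]
console.log(lazyLookbehind); // [ 'heo', 'heo' ]
console.log(wordLookbehind); // [ 'heo', 'heo' ]
console.log(wordLookahead);  // [ 'heo', 'heo' ]
```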

console.log('heo heo helo'.match(/he.*(?!l)o/gi))
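Your regex matches any characters with .* before it checks the condition (?!l). The regex should check the condition before matching the characters.
Besides, you want to match only hexxxo (where x is not l) as a whole word, so you should use \b in your regex. Note that [^l] on its own also matches spaces, which would let a single match run across several words, so whitespace is excluded as well. I suggest the following regex.
console.log('heo heo helo aheob'.match(/\bhe[^l\s]*o\b/gi));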

Related

Create a regex to extract a string that contains normal characters and escaped characters without DOS

I have a string like this:
///////AB?\a\b\c\d\d\e\\f\a\a\b\cd\ed\fmnopqrstuvwxy\z\a\a\a\a\a\a\a\a\a///imgy
It starts with /// and ends with ///imgy (any of i, m, g and/or y), and the characters between the beginning and the end are either normal characters like a or escaped characters like \a.
Here is my regex:
/^\/{3}((?:\\?[\s\S])+?)\/{3}([imgy]{0,4})(?!\w)/
But the problem is that it is reported as "vulnerable to denial-of-service attacks". The main part that has the problem is
(?:\\?[\s\S])+
How can I create a right one that can figure out both a and \a? Thank you!
Update:
I just found to use the following regex:
(?:\\[\s\S]+?)|(?:(?<!\\)[\s\S]+?)|(?:(?<=\\\\)[\s\S]+?)
to replace the old problematic part (?:\\?[\s\S])+?. This way the regex avoids needing exponential time to match certain inputs, so it is no longer vulnerable to denial-of-service attacks.
The details:
(?:\\[\s\S]+?) matches any \a
(?:(?<!\\)[\s\S]+?) matches any a that does not follow a \.
(?:(?<=\\\\)[\s\S]+?) matches any a that follows \\. This makes sure the f that follows \\ is matched.
So the whole regex will look like this:
^\/{3}((?:\\[\s\S]+?)|(?:(?<!\\)[\s\S]+?)|(?:(?<=\\\\)[\s\S]+?))\/{3}([imgy]{0,4})(?!\w)
You might list the allowed characters in a character class, and optionally repeat an escaped character (a-z or a backslash) followed by more allowed characters:
^\/{3,}[A-Za-z?]+(?:\\[a-z\\][A-Za-z?]*)*\/\/\/[imgy]{0,4}$
The pattern matches:
^ Start of string
\/{3,}[A-Za-z?]+ Match 3 or more / and 1 or more times any of the listed allowed chars
(?: Non capture group
\\[a-z\\] Match an escaped char a-z or \\
[A-Za-z?]* Optionally match any of the listed
)* Close and optionally repeat the group
\/\/\/[imgy]{0,4} Match /// and 0-4 times any of i m g or y. If there should be at least a single char, you can use {1,4}
$ End of string
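The breakdown above can be tried out with a short sketch. In the suggested pattern each character can be consumed in only one way (a backslash always starts an escaped pair, anything else must be in the allowed class), which is what should keep backtracking from blowing up. The `hostile` string is a made-up input for illustration, not one from the question.

```javascript
// String.raw keeps every backslash literally, so the sample reads as in the question.
const sample = String.raw`///////AB?\a\b\c\d\d\e\\f\a\a\b\cd\ed\fmnopqrstuvwxy\z\a\a\a\a\a\a\a\a\a///imgy`;

const safePattern = /^\/{3,}[A-Za-z?]+(?:\\[a-z\\][A-Za-z?]*)*\/\/\/[imgy]{0,4}$/;

console.log(safePattern.test(sample)); // true

// Hypothetical hostile input: a long run of escapes that never reaches the closing ///.
const hostile = '///a' + '\\a'.repeat(5000) + '!';
console.log(safePattern.test(hostile)); // false
```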

Negative lookahead ends match before the last character I need

I am looking to identify parts of a string that are hex.
So if you consider the string
CHICKENORBEEFPIE, the match would be BEEF.
To do this I came up with this expression /[A-F0-9]{2,}(?![^A-F0-9])/g
This works perfectly - except it only matches BEE, not BEEF. Unless BEEF happened to be at the end of the string.
The negative lookahead (?![^A-F0-9]) means: do not match if the next character is anything other than A-F or 0-9. In other words, the match must be followed by a hex character (A-F, 0-9) or by the end of the string. Your regex matches 'BEE' because it is followed by F, which satisfies that condition.
If you want to identify sequences of two or more characters that are hex code, just eliminate the negative lookahead altogether.
/[A-F0-9]{2,}/g translates to: find every run of characters A-F or 0-9 that is 2 or more characters long.
It is because the last part of your regex: (?![^A-F0-9])
Because of that, you are matching only strings that aren't followed by a non-hex character... which ultimately means strings where the next character is a hex character (or the end of the string).
You could either remove the ^ or remove that whole piece altogether as it isn't necessary. The following will retrieve what you are looking for: /[A-F0-9]{2,}/g
[A-F0-9]{2,}(?![A-F0-9]) will match what is expected; however, the negative lookahead is superfluous because quantifiers are greedy by default.
[A-F0-9]{2,}(?![^A-F0-9]) doesn't work because it asserts that the following character must not be any character except A-F0-9 (a double negation).
The reason the last character F in BEEF is not matched is that, after matching BEEF, the negative lookahead fails because P is in [^A-F0-9]. The engine then backtracks to BEE, which succeeds because F is not in [^A-F0-9].
If you need the given result with pair-based values you can use /([A-F0-9]{2})+/g, if not (if it doesn't matter whether it's odd or not) you can use /[A-F0-9]{2,}/g instead.
Hope it helps.
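To see the difference side by side, here is a small sketch that runs the question's regex and the suggested replacements on the sample string. The second input, XABCDEX, is an invented example to show how the pair-based variant differs on an odd-length run.

```javascript
const input = 'CHICKENORBEEFPIE';

// Question's regex: the lookahead forces a backtrack from BEEF to BEE.
const withLookahead = input.match(/[A-F0-9]{2,}(?![^A-F0-9])/g);

// Without the lookahead the greedy quantifier takes the whole run.
const plain = input.match(/[A-F0-9]{2,}/g);

// Pair-based variant: only an even number of hex characters is matched.
const pairs = 'XABCDEX'.match(/([A-F0-9]{2})+/g);
const anyLength = 'XABCDEX'.match(/[A-F0-9]{2,}/g);

console.log(withLookahead); // [ 'BEE' ]
console.log(plain);         // [ 'BEEF' ]
console.log(pairs);         // [ 'ABCD' ]
console.log(anyLength);     // [ 'ABCDE' ]
```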
Use
/[A-F0-9]{2,}(?![^A-F0-9])*/g

Why do I have to use replace in my regex for Palindromes?

I'm doing some challenges at FreeCodeCamp and I got lost in a basic challenge that
asks to Check for Palindromes. In the solution I had to do the following:
str = str.replace(/[^a-zA-Z]/g, '').toLowerCase();
But I don't understand the reason I have to use the replace method and the regular expression.
Anybody can help me, please?
With this code:
str.replace(/[^a-zA-Z]/g, '').toLowerCase()
You are getting rid of all the characters that are not letters from A-Z and a-z, and then you are converting the resulting string to lower case. The ^ at the beginning of a character class, as in [^...], means "not these characters". So [a-z] means match a letter from a to z, while [^a-z] means match anything but a letter from a to z. Stripping the non-letter characters and ignoring case is what lets a phrase with spaces and punctuation be compared with its reverse letter by letter.
There are plenty of online regex tools explaining the patterns. From Regex101 you can see:
/[^a-zA-Z]/g
[^a-zA-Z] match a single character not present in the list below
a-z a single character in the range between a and z (case sensitive)
A-Z a single character in the range between A and Z (case sensitive)
g modifier: global. All matches (don't return on first match)
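A minimal sketch of why the cleanup step matters, assuming the usual "reverse and compare" approach to the challenge (the function names are hypothetical, and this is not necessarily the exact solution the challenge expects):

```javascript
// Compares the string with its reverse after stripping non-letters and lowercasing.
function isPalindrome(str) {
  const cleaned = str.replace(/[^a-zA-Z]/g, '').toLowerCase();
  const reversed = cleaned.split('').reverse().join('');
  return cleaned === reversed;
}

// Same comparison without the cleanup, to show what goes wrong.
function isPalindromeRaw(str) {
  return str === str.split('').reverse().join('');
}

const phrase = 'A man, a plan, a canal: Panama';
console.log(isPalindrome(phrase));    // true
console.log(isPalindromeRaw(phrase)); // false (spaces, punctuation and case differ)
```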

How to extract the last word in a string with a JavaScript regex?

What I need is the last match. In the case below, that is the word test without the $ signs or any other special character:
Test String:
$this$ $is$ $a$ $test$
Regex:
\b(\w+)\b
The $ represents the end of the string, so...
\b(\w+)$
However, your test string seems to have dollar sign delimiters, so if those are always there, then you can use that instead of \b.
\$(\w+)\$$
var s = "$this$ $is$ $a$ $test$";
document.body.textContent = /\$(\w+)\$$/.exec(s)[1];
If there could be trailing spaces, then add \s* before the end.
\$(\w+)\$\s*$
And finally, if there could be other non-word stuff at the end, then use \W* instead.
\b(\w+)\W*$
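The variants above can be compared with a short sketch. The helper `firstGroup` and the second test sentence are illustrative additions, not part of the answer.

```javascript
const s = '$this$ $is$ $a$ $test$';

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function firstGroup(regex, text) {
  const m = regex.exec(text);
  return m ? m[1] : null;
}

console.log(firstGroup(/\b(\w+)$/, s));                     // null (the string ends with "$")
console.log(firstGroup(/\$(\w+)\$$/, s));                   // test
console.log(firstGroup(/\$(\w+)\$\s*$/, s + '  '));         // test
console.log(firstGroup(/\b(\w+)\W*$/, s));                  // test
console.log(firstGroup(/\b(\w+)\W*$/, 'Is this the end?')); // end
```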
In some cases a word may be followed by non-word characters. For example, take the following sentence:
Marvelous Marvin Hagler was a very talented boxer!
If we want to match the word boxer, an expression that anchors the word directly to the end of the string (such as \b(\w+)$) will not suffice, because an exclamation mark follows the word. The following expression captures the word and also takes into account trailing whitespace, newlines and any other non-word characters.
[a-zA-Z]+?(?=\s*?[^\w]*?$)
https://regex101.com/r/D3bRHW/1
The expression breaks down as follows:
[a-zA-Z]+? matches letters only, either uppercase or lowercase,
expanding only as far as necessary (lazy).
(?= ... ) is a positive lookahead, so what follows is checked but not included in the match.
\s*? allows optional whitespace after the word.
[^\w]*? allows any further non-word characters.
$ asserts the end of the input.
The benefit here is that we do not need any flags or word boundaries, and trailing non-word characters are taken into account.
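A brief sketch of the lookahead expression in use. The second input adds trailing spaces and a newline to the question's test string as an assumed edge case.

```javascript
const lastWordPattern = /[a-zA-Z]+?(?=\s*?[^\w]*?$)/;

const sentence = 'Marvelous Marvin Hagler was a very talented boxer!';
const fromSentence = sentence.match(lastWordPattern)[0];

// Trailing "$", spaces and a newline are all skipped by the lookahead.
const fromDollars = '$this$ $is$ $a$ $test$  \n'.match(lastWordPattern)[0];

console.log(fromSentence); // boxer
console.log(fromDollars);  // test
```

Because the tail is only asserted by the lookahead, the match itself is the bare word and no capture group is needed.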
var input = "$this$ $is$ $a$ $test$";
If you use var result = input.match(/\b(\w+)\b/g), an array of all the matches will be returned. You can then get the last one by using pop() on the result or by doing: result[result.length - 1]
Your regex will find a word, and since regexes operate left to right it will find the first word.
A \w+ matches as many consecutive word characters (letters, digits or underscore) as it can, but it must match at least 1.
A \b matches the position between a word character and a non-word character, without consuming anything. In your case these are the positions next to the '$' characters.
What you need is to anchor your regex to the end of the input which is denoted in a regex by the $ character.
To support an input that may have more than just a '$' character at the end of the line, spaces or a period for instance, you can use \W+ which matches as many non-alphanumeric characters as it can:
\$(\w+)\W+$
Avoid regex - use .split and .pop the result. Use .replace to remove the special characters:
var match = str.split(' ').pop().replace(/[^\w\s]/gi, '');
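The two non-anchored approaches, splitting and collecting all matches, can be sketched as follows. The function names are hypothetical, and the trim() call and the split on /\s+/ are small additions assumed here so that trailing or repeated spaces do not produce an empty last element.

```javascript
// Split on whitespace, take the last chunk, then strip the special characters.
function lastWordBySplit(str) {
  return str.trim().split(/\s+/).pop().replace(/[^\w\s]/g, '');
}

// Collect every word with the g flag and take the last one.
function lastWordByMatch(str) {
  const words = str.match(/\w+/g);
  return words ? words[words.length - 1] : null;
}

const str = '$this$ $is$ $a$ $test$';
console.log(lastWordBySplit(str)); // test
console.log(lastWordByMatch(str)); // test
```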

Match last character, get next to last if regex is null

I'm trying to get the last character of a string, but only if it matches the following RegEx:
/\W/
If it doesn't match, I want it to move to the next last character and do the test again until it finds a match.
function getLastChar(s) {
var l = s.length - 1;
return s[l - i]; // need logic to keep checking for /\W/
}
getLastChar('hello.'); // returns '.', want it to return 'o'
I have the following idea of how to match if the character isn't a letter/number; however, I'm searching for a more elegant solution, one that would allow me to return the last matching character on a single line with a ternary if()
if(string.match(/\W/) !== null){
//keep looking for a match, going backwards.
}
/(\w)\W*$/
Capture one \w character, that is followed by zero or more \W characters, anchored to the end of the subject.
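Applied to the function from the question, this could look like the sketch below. Returning null when the string contains no word character is an assumption; the question does not say what should happen in that case.

```javascript
function getLastChar(s) {
  // (\w) captures the last word character; \W*$ skips any trailing non-word characters.
  const m = /(\w)\W*$/.exec(s);
  return m ? m[1] : null; // null fallback is an assumption
}

console.log(getLastChar('hello.'));  // o
console.log(getLastChar('hello'));   // o
console.log(getLastChar('wait...')); // t
console.log(getLastChar('...'));     // null
```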
[Edited after comments.]
Easy enough.. just do a greedy match up to the last \W
string.match(/.*(\W)/)
If you're looking for a simple answer, you might be able to accomplish it with a single regex, no looping required - something like the following:
^.*(\W)[^\W]*$
The capture group will have the last non-word character.
For example, running this regex on ~~~~99*9 puts the character * in the capture group.
Edit:
However, after re-reading your question, it seems like you really meant to use \w not \W - in other words, you want the last word character, not the last non-word character. That's easily fixed by swapping \W for \w in the regex above.
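Both readings of the question can be sketched with the greedy-prefix pattern from this answer. The function names are hypothetical, and since . does not match line breaks, the sketch assumes single-line input.

```javascript
// Last non-word character: greedy .* pushes the capture as far right as possible.
function lastNonWordChar(s) {
  const m = /^.*(\W)[^\W]*$/.exec(s);
  return m ? m[1] : null;
}

// Same pattern with \W and \w swapped: last word character.
function lastWordChar(s) {
  const m = /^.*(\w)[^\w]*$/.exec(s);
  return m ? m[1] : null;
}

console.log(lastNonWordChar('~~~~99*9')); // *
console.log(lastWordChar('hello.'));      // o
console.log(lastNonWordChar('abc'));      // null
```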
