Here is my regex code:
/fun(niest|!ny$)?/ig
How can I match the word "fun" or "funniest", but not the word "funny", with a regex? The pattern above is what I have so far. Is there any way of doing this? If so, please help!
You can use word boundaries \b and an optional group (?:niest)?:
/\bfun(?:niest)?\b/ig
The pattern matches:
\b - leading word boundary
fun - literal character sequence fun
(?:niest)? - an optional (one or zero occurrences) niest literal character sequence (not captured into any group since the group is non-capturing, i.e. used only for grouping)
\b - trailing word boundary.
Your fun(niest|!ny$)? matches fun, funniest, or the literal text fun!ny at the end of the string. Since it has no word boundaries, it also matches the fun inside funny.
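A quick way to see the difference between the two patterns is sketched below. The sample sentence is made up for illustration and is not from the question.

```javascript
// Hypothetical sample text containing "fun", "funniest" and "funny".
const funText = "Fun times: the funniest joke was not funny, just fun.";

// Suggested pattern: "fun" or "funniest" as whole words only.
const wholeWords = funText.match(/\bfun(?:niest)?\b/gi);

// Original pattern: no boundaries, so it also finds "fun" inside "funny".
const original = funText.match(/fun(niest|!ny$)?/gi);

console.log(wholeWords); // ["Fun", "funniest", "fun"]
console.log(original);   // ["Fun", "funniest", "fun", "fun"]
```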
For example, I want to match all strings that contain the word 'cat' or 'dog' such as concatenation, doghouse, underdog, catastrophe, or endogamy. But I want to exclude the words dogs or cats from being matched. I tried this task using the following regex.
\\w*(cat|dog)(s(?=\w+))*\
But this regex doesn't help me select whatever is after the s. Is there some other way to achieve this? Any help is appreciated.
If you also don't want to match dogsdogs you might write the pattern as:
\b(?!\w*(?:cats\b|dogs\b))\w*(?:cat|dog)\w*
The pattern matches:
\b a word boundary
(?! Negative lookahead, assert that to the right is not
\w*(?:cats\b|dogs\b) Match optional word characters followed by cats or dogs and then a word boundary
) Close the lookahead
\w*(?:cat|dog)\w* Match cat or dog surrounded by optional word characters
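As a rough check in JavaScript, the pattern can be run against the words from the question. The extra words dogsdogs and hotdogs are added here only to show that the \w* inside the lookahead rejects any word ending in cats or dogs.

```javascript
// Word list is illustrative; "dogsdogs" and "hotdogs" are not from the question.
const animalWords =
  "concatenation doghouse underdog catastrophe endogamy cats dogs dogsdogs hotdogs";

// Reject any word that ends in "cats" or "dogs", then require "cat" or "dog" somewhere.
const noPluralEnding = /\b(?!\w*(?:cats\b|dogs\b))\w*(?:cat|dog)\w*/g;

const keptWords = animalWords.match(noPluralEnding);
console.log(keptWords);
// ["concatenation", "doghouse", "underdog", "catastrophe", "endogamy"]
```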
If a lookbehind assertion is supported and you also want to allow other non-whitespace characters, you can use \S (any non-whitespace character) instead of \w (a word character).
(?<!\S)(?!\S*(?:cats\b|dogs\b))\S*(?:cat|dog)\S*
I understand your requirements as: match every word that has cat or dog anywhere in it, apart from the specific words 'cats' and 'dogs'.
\b(?!cats\b|dogs\b)(?=\S*cat\S*|\S*dog\S*)\S*\b
(Very) rough human translation: find a word start where the word is not exactly cats or dogs (each ending at a word boundary), check that the word contains cat or dog (at the start, middle, or end), then match everything from that point to the end of the word.
Note: flavour - PCRE2
This regex avoids a lookbehind, which is not supported by all browsers.
const regex = /\b(?!cats\b|dogs\b)[a-z]*(?:cat|dog)[a-z]*\b/gi;
const m = 'concatenation, doghouse, underdog, catastrophe, endogamy, dogshore and catstick should match, but not cats and dogs.'.match(regex);
console.log(m);
Output:
[
"concatenation",
"doghouse",
"underdog",
"catastrophe",
"endogamy",
"dogshore",
"catstick"
]
Explanation of regex:
\b -- word boundary
(?!cats\b|dogs\b) -- negative lookahead for just cats or dogs
[a-z]* -- optional alpha chars
(?:cat|dog) -- non-capture group for literal cat or dog
[a-z]* -- optional alpha chars
\b -- word boundary
I'm trying to create a custom word boundary (like \b) that also takes words starting or ending with the Unicode characters "ÆØÅæøå" into consideration.
Now the only thing I can come up with is this ugly thing
((?<![\wÆØÅæøå])(?=[\wÆØÅæøå])|(?![\wÆØÅæøå])(?<=[\wÆØÅæøå]))
Is there a more elegant solution to this, or is this the only way?
You can use:
(?<!\p{L}\p{M}*|[\p{N}_]) // leading word boundary, similar to \<, [[:<:]] or \m in other flavors
(?![\p{L}\p{N}_]) // trailing word boundary, similar to \>, [[:>:]] or \M
Compile the regex with the u modifier to enable Unicode category classes.
The (?<!\p{L}\p{M}*|[\p{N}_]) is a negative lookbehind that matches a location not immediately preceded by a letter (optionally followed by diacritic marks), a digit, or an underscore.
The (?![\p{L}\p{N}_]) is a negative lookahead that matches a location not immediately followed by a letter, a digit, or an underscore.
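In JavaScript this might look like the sketch below, assuming an engine that supports lookbehind and Unicode property escapes (ES2018 or later). The wholeWord helper and the Danish sample text are made up for illustration.

```javascript
// Boundary fragments from the answer, kept as raw strings so the backslashes survive.
const leadingBoundary = String.raw`(?<!\p{L}\p{M}*|[\p{N}_])`;
const trailingBoundary = String.raw`(?![\p{L}\p{N}_])`;

// Hypothetical helper: wraps a literal word in the two boundaries.
// The word is assumed to contain no regex metacharacters.
const wholeWord = (word) =>
  new RegExp(leadingBoundary + word + trailingBoundary, "gu");

const danishText = "øl, søløve og øl";

// Unicode-aware boundaries: only the two standalone words match.
const unicodeHits = [...danishText.matchAll(wholeWord("øl"))].map((m) => m.index);

// Plain \b treats "ø" as a non-word character, so it matches inside "søløve" instead.
const asciiHits = [...danishText.matchAll(/\bøl\b/g)].map((m) => m.index);

console.log(unicodeHits); // [0, 14]
console.log(asciiHits);   // [5]
```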
My regular expression should match only if no two consecutive letters are the same.
For example:
"ploplir" should match
"ploppir" should not match
so I use this regular expression:
/([.])\1{1,}/
But it does the exact opposite of what I want. How can I make the match work correctly?
Code
\b(?!\w*(\w)\1)\w+\b
var r = /\b(?!\w*(\w)\1)\w+\b/g
var s = "ploplir ploppir"
console.log(s.match(r))
Explanation
\b Assert position as a word boundary
(?!\w*(\w)\1) Negative lookahead ensuring what follows doesn't match
\w* Match any number of word characters
(\w) Capture a word character into capture group 1
\1 Match the same text as most recently matched by the 1st capture group
\w+ Match one or more word characters
\b Assert position as a word boundary
Maybe you could use lookarounds to check that the string contains no two identical consecutive characters:
^(?!.*(.)(?=\1)).*$
Explanation
From the beginning of the string ^
A negative look ahead (?!
Which asserts that following .* a character (.) is not followed by the same character (?=\1) using the group reference \1
Close the negative lookahead
Match zero or more characters .*
Assert the end of the string $
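A minimal check of this anchored pattern against the two words from the question could look like this:

```javascript
// Succeeds only when no character is immediately followed by an identical one.
const noDoubles = /^(?!.*(.)(?=\1)).*$/;

const ploplirOk = noDoubles.test("ploplir");
const ploppirOk = noDoubles.test("ploppir");

console.log(ploplirOk); // true
console.log(ploppirOk); // false
```

Unlike the \b...\b version, this one tests the whole string at once rather than word by word.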
I'm trying to parse the following sentences with a regex (JavaScript):
I wish a TV
I want some chocolate
I need fire
Currently I'm trying I(\b[a-zA-Z]*\b){0,5}(TV|chocolate|fire), but it doesn't work. I also ran some tests with \w, but no luck.
I want to allow any words (at most 5) between "I" and the last word, which is predefined.
To account for non-word chars in-between words, you may use
/I(?:\W+\w+){0,5}\W+(?:TV|chocolate|fire)/
The point is that you added word boundaries, but did not account for spaces, punctuation, etc. (all the other non-word chars) between "words".
Pattern details:
I - matches the left delimiter
(?:\W+\w+){0,5}\W+ - matches 0 to 5 sequences (due to the limiting quantifier {n,m}) of 1+ non-word chars (\W+) and 1+ word chars after them (\w+), and a \W+ at the end matches 1 or more non-word chars that must be present to separate the last matched word chars from the...
(?:TV|chocolate|fire) - matches the trailing delimiter
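The pattern can be tried against the three sentences from the question. The last two sentences below are made up, with five and seven words in between, to show where the {0,5} limit kicks in.

```javascript
const wishPattern = /I(?:\W+\w+){0,5}\W+(?:TV|chocolate|fire)/;

const sentences = [
  "I wish a TV",
  "I want some chocolate",
  "I need fire",
  "I would really like some nice chocolate",       // 5 words in between (hypothetical)
  "I would very much like to have some chocolate", // 7 words in between (hypothetical)
];

// The regex has no g flag, so test() keeps no state between calls.
const wishResults = sentences.map((s) => wishPattern.test(s));
console.log(wishResults); // [true, true, true, true, false]
```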
You need to allow whitespace after the I. Otherwise it wouldn't capture the whole sentence.
I(\b[a-zA-Z ]*\b){0,5}(TV|chocolate|fire)
A great site to test regular expressions is RegExr.
If you don't care about the spaces, use:
/I(\s[a-zA-Z]*\s?){0,5}(TV|chocolate|fire)/
Try
/I\s+(?:\w+\s+){0,5}(TV|chocolate|fire)/
This is based on Stefan Kert's version, but relies on the whitespace to the right of each extra word instead of word boundaries.
It also accepts words of any length made of any valid word (\w) characters, separated by any whitespace characters (no matter how many).
I am trying to match smileys followed by a word boundary \b.
Let's say I wanna match :p and :) followed by \b.
/(:p)\b/ is working fine but why is /(:\))\b/ behaving the opposite?
You cannot use a word boundary here as ) is a non-word character.
Simply put: \b allows you to perform a whole words only search using
a regular expression in the form of \bword\b. A word character is a
character that can be used to form words. All characters that are not
word characters are non-word characters.
Use (:\)) to match :) and capture it in the first capturing group.
Use /(:\))(?![a-z0-9_])/i to avoid matching any :) that has a letter, digit, or underscore right after the smiley. It is equivalent to (:\))\B.
\B is the negated version of \b. \B matches at every position where \b
does not. Effectively, \B matches at any position between two word
characters as well as at any position between two non-word characters.
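The behaviour described above can be checked with a few test strings (made up for illustration):

```javascript
const withBoundary = /(:\))\b/;
const withNonBoundary = /(:\))\B/;

const smileyChecks = [
  withBoundary.test(":) hello"),    // ")" then space: two non-word chars, no \b
  withBoundary.test(":)hello"),     // ")" then "h": a word boundary
  withNonBoundary.test(":) hello"), // \B holds between ")" and the space
  withNonBoundary.test(":)hello"),  // \B fails between ")" and "h"
  /(:p)\b/.test(":p hello"),        // "p" then space: a word boundary
];

console.log(smileyChecks); // [false, true, true, false, true]
```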
In addition to stribizhev's answer, you can use (:\))\B.
Examples for when to use what:
\b : string = That man is batman. regex = \bman\b matches only the standalone man and not the man in batman, because the position between t and m is not a word boundary (both are word characters).
\B : string = I am bat-man and he is super - man. regex = \B-\B matches the - in super - man, whereas \b-\b matches the - in bat-man, since the positions between t and - and between - and m are word boundaries, while the positions between a space and - are not.
Note: this is easier to understand if you think of \b and \B as positions between two characters, and ask whether the transition at that position is word to word, word to non-word, or non-word to non-word.
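The two example strings can be checked in JavaScript as below. The indexes helper is a hypothetical convenience that returns the start index of each match.

```javascript
const batman = "That man is batman.";
const hyphens = "I am bat-man and he is super - man.";

// Hypothetical helper: start index of every match of a global regex.
const indexes = (re, s) => [...s.matchAll(re)].map((m) => m.index);

const manHits = indexes(/\bman\b/g, batman);   // standalone "man" only
const looseHyphen = indexes(/\B-\B/g, hyphens); // the "-" in "super - man"
const tightHyphen = indexes(/\b-\b/g, hyphens); // the "-" in "bat-man"

console.log(manHits);     // [5]
console.log(looseHyphen); // [29]
console.log(tightHyphen); // [8]
```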