I want regex that will match "lead" or "leads" and will not match when it's part of another word like cheerleaders or leaders. I also don't want whitespaces matched before or after the word.
The closest I got was /(?:^|\W)Lead(?:$|\W){0,5}/g;
But this matches Leaders and whitespaces. This is in javascript if that makes a difference.
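\bleads?\b is all you need. \b is a word boundary: it matches the position between a word character and a non-word character (or the start or end of the string), so the "lead" inside leaders or cheerleaders is rejected. Add the i flag if "Lead" should match too.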
s? is an optional s
Demo
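A quick sketch of how this behaves in JavaScript. The sample sentence and the variable names are made up for illustration, and the i flag is an assumption added so that "Lead" matches as well:

```javascript
// Word-boundary match for "lead" or "leads".
// The i flag is an addition here so that "Lead"/"Leads" also match.
const leadPattern = /\bleads?\b/gi;

// Hypothetical sample text, not from the question.
const leadSample = "Lead the leads, not the cheerleaders or Leaders.";
const leadMatches = leadSample.match(leadPattern);

console.log(leadMatches); // [ 'Lead', 'leads' ]
```

Because \b is zero-width, the matches contain no surrounding whitespace or punctuation.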
I need help putting together a regex that will match word that ends with "Id" with case sensitive match.
Try this regular expression:
\w*Id\b
\w* allows word characters in front of Id and the \b ensures that Id is at the end of the word (\b is word boundary assertion).
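As a rough illustration in JavaScript (the sample words below are invented), the pattern picks up words ending in "Id" and skips ones where "Id" is in the wrong case or not at the end:

```javascript
// \w*Id\b: optional word characters, then a case-sensitive "Id" at the end of a word.
const idPattern = /\w*Id\b/g;

// Hypothetical sample words.
const idSample = "userId Id orderid IdCard paymentId";
const idMatches = idSample.match(idPattern);

console.log(idMatches); // [ 'userId', 'Id', 'paymentId' ]
```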
Gumbo gets my vote, however, the OP doesn't specify whether just "Id" is an allowable word, which means I'd make a minor modification:
\w+Id\b
One or more word characters followed by "Id" and a word boundary. The [a-zA-Z] variants don't take non-English alphabetic characters into account. I might also use \s instead of \b to require an actual whitespace character rather than a zero-width word boundary, though \s will not match at the end of the string. It would depend on whether you need to match across multiple lines.
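A small JavaScript sketch of the difference, with invented test strings. It shows that \w+ rejects a bare "Id", and that \s needs a real whitespace character after the word:

```javascript
// \w+ requires at least one character before "Id".
const plusBoundary = /\w+Id\b/;
// \s consumes a whitespace character instead of asserting a boundary.
const plusSpace = /\w+Id\s/;

console.log(plusBoundary.test("Id"));       // false
console.log(plusBoundary.test("userId"));   // true
console.log(plusSpace.test("userId"));      // false (nothing follows "Id")
console.log(plusSpace.test("userId next")); // true
```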
This may do the trick:
\b\p{L}*Id\b
Where \p{L} matches any (Unicode) letter and \b matches a word boundary.
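If you try this in JavaScript specifically, two things are worth knowing: \p{L} only works with the u flag, and \b is still based on ASCII word characters, so a leading non-ASCII letter can be cut off. The sketch below uses a made-up word and, as one possible workaround, swaps \b for letter-based lookarounds (this assumes an engine with lookbehind support):

```javascript
// \p{L} needs the u flag in JavaScript; \b still treats only [A-Za-z0-9_] as word characters.
const asciiBoundary = /\b\p{L}*Id\b/u;

// Alternative sketch: use "not preceded/followed by a letter" instead of \b.
const letterLookaround = /(?<!\p{L})\p{L}*Id(?!\p{L})/u;

// Hypothetical word starting with a non-ASCII letter.
const unicodeWord = "éclairId";

const asciiResult = unicodeWord.match(asciiBoundary)[0];
const lookaroundResult = unicodeWord.match(letterLookaround)[0];

console.log(asciiResult);      // clairId
console.log(lookaroundResult); // éclairId
```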
How about \A[a-z]*Id\z? [This makes characters before Id optional. Use \A[a-z]+Id\z if there needs to be one or more characters preceding Id.]
I would use
\b[A-Za-z]*Id\b
The \b matches at the beginning and end of a word, i.e. the position next to a space, tab, newline or other non-word character, or at the beginning or end of the string.
The [A-Za-z] will match any letter, and the * means that 0+ get matched. Finally there is the Id.
Note that this will match words that have capital letters in the middle such as 'teStId'.
I use http://www.regular-expressions.info/ for regex reference
Regex ids = new Regex(@"\w*Id\b", RegexOptions.None);
\b means "word break" and \w means any word character. So \w*Id\b means "{stuff}Id". By not including RegexOptions.IgnoreCase, it will be case sensitive.
Imagine you are trying to pattern match "stackoverflow".
You want the following:
this is stackoverflow and it rocks [MATCH]
stackoverflow is the best [MATCH]
i love stackoverflow [MATCH]
typostackoverflow rules [NO MATCH]
i love stackoverflowtypo [NO MATCH]
I know how to parse out stackoverflow if it has spaces on both sides using:
/\s(stackoverflow)\s/
The same works if it's at the start or end of a string:
/^(stackoverflow)\s/
/\s(stackoverflow)$/
But how do you specify "space or end of string" and "space or start of string" using a regular expression?
You can use any of the following:
\b #A word break and will work for both spaces and end of lines.
(^|\s) #the | means or. () is a capturing group.
/\b(stackoverflow)\b/
Also, if you don't want to include the space in your match, you can use lookbehind/aheads.
(?<=\s|^) #to look behind the match
(stackoverflow) #the string you want. () optional
(?=\s|$) #to look ahead.
(^|\s) would match space or start of string and ($|\s) for space or end of string. Together it's:
(^|\s)stackoverflow($|\s)
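Here is a small JavaScript check of that combined pattern against the five sample strings from the question (the variable names are mine). Note that the whole match includes the surrounding spaces:

```javascript
// Space-or-anchor on both sides of the word.
const spaceOrAnchor = /(^|\s)stackoverflow($|\s)/;

// The five sample strings from the question.
const soSamples = [
  "this is stackoverflow and it rocks",
  "stackoverflow is the best",
  "i love stackoverflow",
  "typostackoverflow rules",
  "i love stackoverflowtypo",
];

const soResults = soSamples.map((text) => spaceOrAnchor.test(text));
console.log(soResults); // [ true, true, true, false, false ]

// The matched text includes the spaces consumed by the groups.
const soWhole = soSamples[0].match(spaceOrAnchor)[0];
console.log(JSON.stringify(soWhole)); // " stackoverflow "
```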
Here's what I would use:
(?<!\S)stackoverflow(?!\S)
In other words, match "stackoverflow" if it's not preceded by a non-whitespace character and not followed by a non-whitespace character.
This is neater (IMO) than the "space-or-anchor" approach, and it doesn't assume the string starts and ends with word characters like the \b approach does.
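A minimal JavaScript sketch of this, run against the sample strings from the question. It assumes an engine that supports lookbehind (recent Node and browsers do); the variable names are mine:

```javascript
// Match only when not touching a non-whitespace character on either side.
const notTouching = /(?<!\S)stackoverflow(?!\S)/;

// The five sample strings from the question.
const lookaroundSamples = [
  "this is stackoverflow and it rocks",
  "stackoverflow is the best",
  "i love stackoverflow",
  "typostackoverflow rules",
  "i love stackoverflowtypo",
];

const lookaroundResults = lookaroundSamples.map((text) => notTouching.test(text));
console.log(lookaroundResults); // [ true, true, true, false, false ]

// Lookarounds are zero-width, so no spaces end up in the match.
const lookaroundWhole = lookaroundSamples[0].match(notTouching)[0];
console.log(JSON.stringify(lookaroundWhole)); // "stackoverflow"
```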
\b matches at word boundaries (without actually matching any characters), so the following should do what you want:
\bstackoverflow\b
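One thing to keep in mind is that \b also treats punctuation as a boundary, so it is looser than a strict "space or start/end of string" rule. A rough JavaScript comparison, using invented inputs and a whitespace-lookaround pattern as the stricter reference (lookbehind support assumed):

```javascript
const wordBounded = /\bstackoverflow\b/;
// Stricter reference: only whitespace or string edges may surround the word.
const whitespaceBounded = /(?<!\S)stackoverflow(?!\S)/;

// Hypothetical inputs with punctuation next to the word.
const punctuated = ["i love stackoverflow!", "see stackoverflow.com", "typostackoverflow"];

const boundaryComparison = punctuated.map((text) => ({
  text,
  wordBoundary: wordBounded.test(text),
  whitespaceOnly: whitespaceBounded.test(text),
}));

console.log(boundaryComparison);
// "i love stackoverflow!"  -> wordBoundary: true,  whitespaceOnly: false
// "see stackoverflow.com"  -> wordBoundary: true,  whitespaceOnly: false
// "typostackoverflow"      -> wordBoundary: false, whitespaceOnly: false
```

Which behaviour is right depends on whether "stackoverflow!" should count as a match for you.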
I'm trying to parse following sentences with regex (javascript) :
I wish a TV
I want some chocolate
I need fire
Currently I'm trying : I(\b[a-zA-Z]*\b){0,5}(TV|chocolate|fire) but it doesn't work. I also made some tests with \w but no luck.
I want to allow any word (max 5 words) between "I" and the last word, which is predefined.
To account for non-word chars in-between words, you may use
/I(?:\W+\w+){0,5}\W+(?:TV|chocolate|fire)/
See the regex demo
The point is that you added word boundaries, but did not account for spaces, punctuation, etc. (all the other non-word chars) between "words".
Pattern details:
I - matches the left delimiter
(?:\W+\w+){0,5}\W+ - matches 0 to 5 sequences (due to the limiting quantifier {n,m}) of 1+ non-word chars (\W+) and 1+ word chars after them (\w+), and a \W+ at the end matches 1 or more non-word chars that must be present to separate the last matched word chars from the...
(?:TV|chocolate|fire) - matches the trailing delimiter
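To try the pattern above in JavaScript, something like the following should do. The first three sentences are from the question; the fourth, with punctuation between the words, is an invented example:

```javascript
const wishPattern = /I(?:\W+\w+){0,5}\W+(?:TV|chocolate|fire)/;

const wishSentences = [
  "I wish a TV",
  "I want some chocolate",
  "I need fire",
  "I, honestly, want chocolate!", // hypothetical: punctuation between words
];

const wishMatches = wishSentences.map((s) => {
  const m = s.match(wishPattern);
  return m ? m[0] : null;
});

console.log(wishMatches);
// [ 'I wish a TV', 'I want some chocolate', 'I need fire', 'I, honestly, want chocolate' ]
```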
You need to add the whitespace after the I. Otherwise it wouldn't capture the whole sentence.
I(\b[a-zA-Z ]*\b){0,5}(TV|chocolate|fire)
A great site to test regular expressions is regexr
If you don't care about the spaces, use:
/I(\s[a-zA-Z]*\s?){0,5}(TV|chocolate|fire)/
Try
/I\s+(?:\w+\s+){0,5}(TV|chocolate|fire)/
(Test here)
Based on Stefan Kert's version, but it relies on the whitespace to the right of each extra word instead of word boundaries.
It also accepts any valid "word" (\w) character words of any length and any valid spacing character (not caring for repetitions).
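A short JavaScript sketch of how the five-word limit plays out with this pattern. The longer sentences are invented test inputs:

```javascript
const limitedPattern = /I\s+(?:\w+\s+){0,5}(TV|chocolate|fire)/;

// One word in between (from the question).
const oneWord = "I need fire";
// Five words in between: really x4 + want (hypothetical input).
const fiveWords = "I really really really really want chocolate";
// Six words in between: really x5 + want (hypothetical input).
const sixWords = "I really really really really really want chocolate";

console.log(limitedPattern.test(oneWord));   // true
console.log(limitedPattern.test(fiveWords)); // true
console.log(limitedPattern.test(sixWords));  // false

// The capturing group holds the final keyword.
const capturedKeyword = oneWord.match(limitedPattern)[1];
console.log(capturedKeyword); // fire
```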
For the address field, I need the first character of every word to be uppercase. I have been using /\b./g, which has eventually resulted in a problem where the first character after special characters such as !#*&;' and so on is also capitalised, e.g. King'S Street instead of King's Street.
Is there a way to adjust that expression to exclude that behaviour or is changing the entire expression more optimal?
replace \b with (^|[ ])
Your regex will be: /(^|[ ])./g
Explanation:
\b by definition: is used to find a match at the beginning or end of a word.
(^|[ ]) will match with the beginning of the string or any space characters
(^|[ ]). will match every space followed by a character and the first character of the string.
Side note:
Use (^|\s) to match any whitespace character, not just spaces.
Your regex will be: /(^|\s)./g
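For what it's worth, here is how the two expressions compare in JavaScript when used with replace. The helper name capitalizeWords is made up for this sketch:

```javascript
// Uppercase the character that follows the start of the string or any whitespace.
const capitalizeWords = (text) =>
  text.replace(/(^|\s)./g, (match) => match.toUpperCase());

// The original approach from the question, for comparison.
const capitalizeAtBoundaries = (text) =>
  text.replace(/\b./g, (match) => match.toUpperCase());

console.log(capitalizeWords("king's street"));        // King's Street
console.log(capitalizeAtBoundaries("king's street")); // King'S Street
```

The matched text includes the leading whitespace, but uppercasing whitespace leaves it unchanged, so the whole match can be passed to toUpperCase.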
You could use a lookahead:
\b[a-z](?=\w+)
See a demo on regex101.com.
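A quick JavaScript sketch of the lookahead version (the helper name is hypothetical). One side effect to be aware of: because the lookahead demands at least one more word character, single-letter words are left untouched:

```javascript
// Uppercase a lowercase letter at a word boundary, but only if more word characters follow.
const capitalizeWithLookahead = (text) =>
  text.replace(/\b[a-z](?=\w+)/g, (ch) => ch.toUpperCase());

console.log(capitalizeWithLookahead("king's street")); // King's Street
console.log(capitalizeWithLookahead("a main road"));   // a Main Road
```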
I am trying to match smileys followed by a word boundary \b.
Let's say I wanna match :p and :) followed by \b.
/(:p)\b/ is working fine but why is /(:\))\b/ behaving the opposite?
You cannot use a word boundary here as ) is a non-word character.
Simply put: \b allows you to perform a whole words only search using
a regular expression in the form of \bword\b. A word character is a
character that can be used to form words. All characters that are not
word characters are non-word characters.
Use (:\)) to match :) and capture it in the first capturing group.
Use /(:\))(?![a-z0-9_])/i in order to avoid matching any :)s with letters after the smiley. It is an equivalent of (:\))\B.
\B is the negated version of \b. \B matches at every position where \b
does not. Effectively, \B matches at any position between two word
characters as well as at any position between two non-word characters.
See demo 1 and demo 2.
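The behaviour is easy to confirm in JavaScript. The test strings below are invented; the point is that after ) a \b needs a word character next, while \B needs a non-word character or the end of the string:

```javascript
const tongueBoundary = /(:p)\b/;
const smileBoundary = /(:\))\b/;
const smileNonBoundary = /(:\))\B/;
const smileLookahead = /(:\))(?![a-z0-9_])/i;

const smileyChecks = {
  tongueThenSpace: tongueBoundary.test(":p hello"),      // true:  p|space is a boundary
  tongueThenLetter: tongueBoundary.test(":phello"),      // false: p|h is not
  smileThenSpace: smileBoundary.test(":) hello"),        // false: )|space is not a boundary
  smileThenLetter: smileBoundary.test(":)hello"),        // true:  )|h is a boundary
  nonBoundaryThenSpace: smileNonBoundary.test(":) hello"), // true
  nonBoundaryThenLetter: smileNonBoundary.test(":)hello"), // false
  nonBoundaryAtEnd: smileNonBoundary.test(":)"),           // true
  lookaheadThenSpace: smileLookahead.test(":) hello"),     // true
  lookaheadThenLetter: smileLookahead.test(":)hello"),     // false
};

console.log(smileyChecks);
```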
In addition to stribizhev's answer, you can use (:\))\B
Examples for when to use what:
\b : string = That man is batman. regex = \bman\b matches only the standalone man and not the man in batman, because the position between t and m is not a word boundary (both are word characters).
\B : string = I am bat-man and he is super - man. regex = \B-\B matches the - in super - man, whereas \b-\b matches the - in bat-man, since the positions between t and - and between - and m are word boundaries, while the positions between a space and - are not.
Note: It is easy to understand if you consider \b or \B as a position between two characters and if the transition from character to character is word to word or word to non word
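The two examples can be checked in JavaScript like this, using the same strings as above (the variable names are mine):

```javascript
const manSentence = "That man is batman.";
const dashSentence = "I am bat-man and he is super - man.";

// \bman\b: only the standalone "man" qualifies.
const manHits = [...manSentence.matchAll(/\bman\b/g)].map((m) => m.index);
console.log(manHits); // [ 5 ]

// \b-\b: a dash with word characters on both sides (bat-man).
const boundedDash = dashSentence.search(/\b-\b/);
// \B-\B: a dash with non-word characters on both sides (super - man).
const unboundedDash = dashSentence.search(/\B-\B/);

console.log(boundedDash === dashSentence.indexOf("-"));       // true
console.log(unboundedDash === dashSentence.lastIndexOf("-")); // true
```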