RegEx to match a whole word in sentences - javascript

Trying to work out what the right RegEx would be for finding "s***" in a series of strings, e.g:
match for "find s*** in s*** foobar"
match for "s***"
don't match for "s******"
don't match for "s****** foobar"
I'm using a match because I want to count the number of instances of matches in the sentence. I was trying "s*{3}" as a starting point, and variations on $ and \b or \B but I can't quite figure it out.
I created some tests here to try it out, if that's helpful.
https://regex101.com/r/VdLyOY/2

You may use this regex with a negative lookahead:
/\bs\*{3}(?!\*)/g
or with a positive lookahead:
/\bs\*{3}(?=\s|$)/g
RegEx Details:
\bs: Match letter s after a word boundary
\*{3}: Match * 3 times i.e. ***
(?!\*): Negative lookahead to assert that we don't have a * ahead
(?=\s|$): Positive lookahead to assert that we have a whitespace or line end at next position
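To get the count the question asks for, either pattern can be passed to String.prototype.match. The following is a minimal sketch rather than a definitive implementation: the helper name countMatches is made up for illustration, and the inputs are the four test strings from the question.

```javascript
// Both patterns come from the answer above; countMatches is a hypothetical helper.
const negativeLookahead = /\bs\*{3}(?!\*)/g;
const positiveLookahead = /\bs\*{3}(?=\s|$)/g;

// match() with the g flag returns an array of all matches, or null when there are none.
function countMatches(text, pattern) {
  const found = text.match(pattern);
  return found === null ? 0 : found.length;
}

const samples = [
  "find s*** in s*** foobar",
  "s***",
  "s******",
  "s****** foobar",
];

for (const sample of samples) {
  console.log(
    sample,
    countMatches(sample, negativeLookahead),
    countMatches(sample, positiveLookahead)
  );
}
```

The two variants agree on the question's examples but differ when s*** is followed directly by something other than * or whitespace: for "s***!" the negative lookahead matches and the positive one does not.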

/\bs\*{3}(\s|$)/g might work depending on exactly what your criteria are.

Use
/\bs\*{3}\B(?!\*)/g
EXPLANATION
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
s 's'
--------------------------------------------------------------------------------
\*{3} '*' (3 times)
--------------------------------------------------------------------------------
\B the boundary between two word chars (\w)
or two non-word chars (\W)
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\* '*'
--------------------------------------------------------------------------------
) end of look-ahead
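As a rough check of what the extra \B adds, the sketch below runs this pattern against the question's strings plus one extra input, "s***foo", which is an assumption added here for illustration. The variable and function names are hypothetical.

```javascript
// Pattern from the answer above; names below are illustrative only.
const boundaryPattern = /\bs\*{3}\B(?!\*)/g;

// Count matches, treating a null result from match() as zero.
function countWithBoundary(text) {
  return (text.match(boundaryPattern) || []).length;
}

console.log(countWithBoundary("find s*** in s*** foobar"));
console.log(countWithBoundary("s***"));
console.log(countWithBoundary("s******"));
console.log(countWithBoundary("s****** foobar"));
// "s***foo": the position between * and f is a word boundary, so \B rejects it.
console.log(countWithBoundary("s***foo"));
```

In other words, \B additionally rejects s*** when a word character follows it immediately, which the plain negative lookahead alone would accept.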

Related

Regex to exclude an entire line match if certain characters found

I'm stuck on the cleanest way to accomplish two bits of regex. Every solution I've come up with so far seems clunky.
Example text
Match: Choose: blah blah blah 123 for 100'ish characters, this matches
NoMatch: Choose: blah blah blah 123! for 100'ish characters?, .this potential match fails for the ! ? and .
The first regex (?:^\w+?:)(((?![.!?]).)*)$ needs to:
Match a line containing any word followed by a : so long as !?. are not found in the same line (the word: will always be at the beginning of a line)
Ideally, match every part of the line from the example EXCEPT Choose:. Matching the whole line is still a win.
The second regex ^(^\w+?:)(?:(?![.!?]).)*$ needs to:
Match a line containing any word followed by a : so long as !?. are not found in the same line (the word: will always be at the beginning of a line)
Match only Choose:
The regex is in a greasemonkey/tampermonkey script.
Use
^\w+:(?:(?!.*[.!?])(.*))?
EXPLANATION
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
[.!?] any character of: '.', '!', '?'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
)? end of grouping
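Because the trailing group is optional, this pattern still matches the label on a line that contains ., ! or ?, and only capture group 1 is left unset. A small sketch of how a script might tell the two cases apart follows; the sample strings are shortened versions of the question's examples and the variable names are assumptions.

```javascript
// Pattern from the answer above.
const optionalRest = /^\w+:(?:(?!.*[.!?])(.*))?/;

const cleanLine = "Choose: blah blah blah 123 for 100'ish characters, this matches";
const punctuatedLine = "Choose: blah blah blah 123! for 100'ish characters?";

const cleanResult = cleanLine.match(optionalRest);
const punctuatedResult = punctuatedLine.match(optionalRest);

// Clean line: group 1 holds everything after "Choose:".
console.log(cleanResult[1]);

// Punctuated line: the match is only "Choose:" and group 1 is undefined.
console.log(punctuatedResult[0], punctuatedResult[1]);

// A caller that wants to skip punctuated lines would test group 1 explicitly.
const accepted = [cleanLine, punctuatedLine].filter((line) => {
  const result = line.match(optionalRest);
  return result !== null && result[1] !== undefined;
});
console.log(accepted.length);
```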
Does this do what you want?
(?:^\w+:)((?:(?![!?.]).)*)$
What makes you feel that this is clunky?
(?: ... ) non-capturing group
^ start with
\w+: a series of one or more word characters followed by a :
( ... )$ capturing group that continues to the end
(?: ... )* non-capturing group, repeated zero or more times, with
(?! ... ) negative look-ahead: no following character can be
[!?.] either ?, ! or .
. followed by any character
For the first pattern, you could first check that there is no ! ? or . present using a negative lookahead. Then capture in the first group 1+ word chars and : and the rest of the line in group 2.
^(?![^!?.\n\r]*[!?.])(\w+:)(.*)$
^ Start of string
(?! Negative lookahead, assert what is on the right is not
[^!?.\n\r]*[!?.] Match 0+ times any char except the listed ones using a negated character class, then match either ! ? or .
) Close lookahead
(\w+:) Capture group 1, match 1+ word chars and a colon
(.*) Capture group 2, match any char except a newline 0+ times
$ End of string
For the second part, if you want a match only for Choose:, you could use the negative lookahead only without a capturing group.
^(?![^!?.\n\r]*[!?.])\w+:
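Both lookahead patterns can be tried out with a few lines of JavaScript. This is only a sketch: the variable names are made up, the second sample string is a shortened version of the NoMatch line from the question, and the g and m flags in the last step are an assumption about how a userscript might scan multi-line text.

```javascript
// Patterns from the answer above.
const wholeLine = /^(?![^!?.\n\r]*[!?.])(\w+:)(.*)$/;
const labelOnly = /^(?![^!?.\n\r]*[!?.])\w+:/;

const good = "Choose: blah blah blah 123 for 100'ish characters, this matches";
const bad = "Choose: blah blah blah 123! for 100'ish characters?";

// First pattern: group 1 is the label, group 2 is the rest of the line.
const parts = good.match(wholeLine);
console.log(parts[1]);
console.log(parts[2]);

// Lines containing ! ? or . do not match at all.
console.log(bad.match(wholeLine));

// Second pattern: the match itself is just the label.
console.log(good.match(labelOnly)[0]);

// With the m flag, ^ anchors at every line start, so several lines can be scanned at once.
const text = good + "\n" + bad;
const labels = text.match(/^(?![^!?.\n\r]*[!?.])\w+:/gm);
console.log(labels);
```

Excluding \n and \r in the negated class is what keeps the lookahead from scanning into the next line when the m flag is used.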

How to match any string that contains no consecutively repeating letter

My regular expression should match if there aren't any consecutive letters that are the same.
for example :
"ploplir" should match
"ploppir" should not match
so I use this regular expression:
/([.])\1{1,}/
But it does the exact opposite of what I want. How can I make the match work correctly?
Code
\b(?!\w*(\w)\1)\w+\b
var r = /\b(?!\w*(\w)\1)\w+\b/g;
var s = "ploplir ploppir";
// Only the word without a doubled character is returned.
console.log(s.match(r)); // ["ploplir"]
Explanation
\b Assert position as a word boundary
(?!\w*(\w)\1) Negative lookahead ensuring what follows doesn't match
\w* Match any number of word characters
(\w) Capture a word character into capture group 1
\1 Match the same text as most recently matched by the 1st capture group
\w+ Match one or more word characters
\b Assert position as a word boundary
Maybe you could use lookarounds to check if there are no consecutive letters in the string:
^(?!.*(.)(?=\1)).*$
Explanation
From the beginning of the string ^
A negative look ahead (?!
Which asserts that following .* a character (.) is not followed by the same character (?=\1) using the group reference \1
Close the negative lookahead
Match zero or more characters .*
The end of the string
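For a whole-string check like this one, RegExp.prototype.test is enough. A short sketch using the two words from the question follows; the variable name is an assumption.

```javascript
// Pattern from the answer above; no g flag, so test() keeps no state between calls.
const noAdjacentRepeat = /^(?!.*(.)(?=\1)).*$/;

console.log(noAdjacentRepeat.test("ploplir")); // true
console.log(noAdjacentRepeat.test("ploppir")); // false
```

Note that (.) is not limited to letters, so two adjacent spaces or digits also make the test fail; swapping it for ([a-z]) with the i flag would restrict the check to letters, if that is what is wanted.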

jQuery Regex - Finding All Joined Words

jQuery Regex
/((\b([a-zA-Z]{0,15})\b)([^a-z0-9\$_]))/g
My Attempt So Far: https://regex101.com/r/d3VUpG/1
Example test string:
(options.method==="
|options.method==="
=options.method==="HEAD"
options.method.options.method==="HEAD"
What I'm Trying To Achieve
Returned as $1 the value of any connected words such as:
options.method - Would = $1
options.method.options.method - Would also = $1
Question
How can I find all words connected with a dot (.) and then wrap them in a span like the below example:
.replace(//gi,'<span class="join">$1</span>')
You can use the following expression:
/((?:\w+\.)+\w+)/g
Explanation:
( - Start of capturing group 1
(?: - Start of a non-capturing group
\w+\. - Match [a-zA-Z0-9_] characters one or more times followed by a literal . character
)+ - End of the non-capturing group; match the group one or more times
\w+ - Match [a-zA-Z0-9_] characters one or more times
) - End of capturing group 1
So in other words, the non-capturing group, (?:\w+\.)+, will match a substring like options. one or more times, followed by a final \w+ which will match the final word without a literal . character following it. Since there is only one capturing group wrapping everything, you can wrap your span tag around the first group, $1.
Live Example
string.replace(/((?:\w+\.)+\w+)/g, '<span class="join">$1</span>');
As mentioned above, \w includes underscore, numbers and letters ([a-zA-Z0-9_]), so if you only want to match letter characters, then you could swap out \w with [a-z] and use the case-insensitive flag:
/((?:[a-z]+\.)+[a-z]+)/gi
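To see the replacement on the question's own test strings, here is a minimal sketch. The function name wrapJoinedWords is hypothetical; the pattern and the span markup are taken from the answer.

```javascript
// Wrap every run of dot-connected words in a span (hypothetical helper name).
function wrapJoinedWords(source) {
  return source.replace(/((?:\w+\.)+\w+)/g, '<span class="join">$1</span>');
}

console.log(wrapJoinedWords('=options.method==="HEAD"'));
// =<span class="join">options.method</span>==="HEAD"

console.log(wrapJoinedWords('options.method.options.method==="HEAD"'));
// <span class="join">options.method.options.method</span>==="HEAD"
```

Since \w includes digits, the \w version would also wrap a decimal number such as 3.14, which is one reason to prefer the [a-z] variant if only identifiers made of letters should be matched.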

javascript regex exceptions

Here Is My Regex Code:
/fun(niest|!ny$)?/ig
How would I match the word "fun" or "funniest" but not the word "funny" with a regex? The pattern above is what I have so far. Is there any way of doing this? If so, please help!
You can use word boundaries \b and an optional group (?:niest)?:
/\bfun(?:niest)?\b/ig
The pattern matches:
\b - leading word boundary
fun - literal character sequence fun
(?:niest)? - an optional (one or zero occurrences) niest literal character sequence (not captured into any group since the group is non-capturing, i.e. used only for grouping)
\b - trailing word boundary.
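Your fun(niest|!ny$)? matches fun, or funniest, or the literal text fun!ny at the end of the string (! has no special meaning there). Because the group is optional and there are no word boundaries, it also matches the fun inside funny.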
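A quick way to confirm the behaviour is to run the suggested pattern over a sentence containing all three words. The sample sentence and variable names below are made up for illustration.

```javascript
// Pattern from the answer above.
const funPattern = /\bfun(?:niest)?\b/ig;

// Hypothetical sample text containing fun, funniest and funny.
const sentence = "Fun is fun, the funniest thing, but not funny.";

console.log(sentence.match(funPattern)); // ["Fun", "fun", "funniest"]
```

The trailing \b is what excludes funny: after fun the next character is n, so there is no word boundary at that position.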

Match ":)" smiley followed by word boundary

I am trying to match smileys followed by a word boundary \b.
Let's say I wanna match :p and :) followed by \b.
/(:p)\b/ is working fine but why is /(:\))\b/ behaving the opposite?
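You cannot use a word boundary here as ) is a non-word character: after ), \b only matches when the next character is a word character, so /(:\))\b/ matches :)x but not :) followed by a space or the end of the string.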
Simply put: \b allows you to perform a whole words only search using
a regular expression in the form of \bword\b. A word character is a
character that can be used to form words. All characters that are not
word characters are non-word characters.
Use (:\)) to match :) and capture it in the first capturing group.
Use /(:\))(?![a-z0-9_])/i in order to avoid matching any :)s with letters after the smiley. It is an equivalent of (:\))\B.
\B is the negated version of \b. \B matches at every position where \b
does not. Effectively, \B matches at any position between two word
characters as well as at any position between two non-word characters.
In addition to stribizhev's answer, you can use (:\))\B.
Examples for when to use what:
\b : string = That man is batman. regex = \bman\b matches only man and not the man in batman, because the position between t and m is not a word boundary (both are word characters).
\B : string = I am bat-man and he is super - man. regex = \B-\B matches the - in super - man, whereas \b-\b matches the - in bat-man, since the positions between t and - and between - and m are word boundaries, while the positions between a space and - are not.
Note: It is easy to understand if you consider \b or \B as a position between two characters and ask whether the transition from one character to the next is word to word, word to non-word, or non-word to non-word.
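The behaviour described in both answers can be checked directly. The sketch below uses the patterns and example sentences from above; the smiley test strings and the variable names are assumptions added for illustration.

```javascript
// :p ends in a word character, so \b after it requires a non-word character or the end.
const tongue = /(:p)\b/;
// :) ends in a non-word character, so \B is the check that plays the same role.
const smile = /(:\))\B/;
// The pattern from the question, kept for comparison.
const smileWithB = /(:\))\b/;

console.log(tongue.test("hello :p world")); // true
console.log(tongue.test("hello :party"));   // false

console.log(smile.test("hello :) world"));  // true
console.log(smile.test("hello :)"));        // true
console.log(smile.test("hello :)world"));   // false

// With \b the results flip, which is the "opposite" behaviour from the question.
console.log(smileWithB.test("hello :) world")); // false
console.log(smileWithB.test("hello :)world"));  // true

// The \b and \B examples from the second answer.
console.log("That man is batman.".match(/\bman\b/g).length);                 // 1
console.log("I am bat-man and he is super - man".match(/\B-\B/g).length);    // 1
console.log("I am bat-man and he is super - man".match(/\b-\b/g).length);    // 1
```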
