My regular expression should match if there aren't any consecutive letters that are the same.
For example:
"ploplir" should match
"ploppir" should not match
So I use this regular expression:
/([.])\1{1,}/
But it does the exact opposite of what I want. How can I make the match work correctly?
Code
See regex in use here
\b(?!\w*(\w)\1)\w+\b
var r = /\b(?!\w*(\w)\1)\w+\b/g
var s = "ploplir ploppir"
console.log(s.match(r))
Explanation
\b Assert position as a word boundary
(?!\w*(\w)\1) Negative lookahead ensuring what follows doesn't match
\w* Match any number of word characters
(\w) Capture a word character into capture group 1
\1 Match the same text as most recently matched by the 1st capture group
\w+ Match one or more word characters
\b Assert position as a word boundary
Maybe you could use lookarounds to check that the string contains no two identical consecutive characters:
^(?!.*(.)(?=\1)).*$
Explanation
From the beginning of the string ^
A negative look ahead (?!
Which asserts that nowhere after .* is a captured character (.) directly followed by the same character, checked with (?=\1) using the group reference \1
Close the negative lookahead
Match zero or more characters .*
The end of the string
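To compare the two answers above, here is a minimal sketch that runs both patterns against the strings from the question. The names noDoubles, wordsWithoutDoubles and hasNoDoubles are made up for illustration, and the negated /(.)\1/ test is an assumed alternative that does not come from either answer. The last two lines show why the original attempt never fires on letters: [.] inside a character class is a literal dot.

```javascript
// Whole-string check from the second answer: reject any doubled character.
const noDoubles = /^(?!.*(.)(?=\1)).*$/;

// Word-by-word variant from the first answer.
const wordsWithoutDoubles = /\b(?!\w*(\w)\1)\w+\b/g;

// Assumed alternative (not from the answers): look for a doubled character and negate.
const hasNoDoubles = (s) => !/(.)\1/.test(s);

console.log(noDoubles.test("ploplir")); // true
console.log(noDoubles.test("ploppir")); // false
console.log("ploplir ploppir".match(wordsWithoutDoubles)); // [ 'ploplir' ]
console.log(hasNoDoubles("ploppir")); // false

// The original attempt: [.] is a literal dot, so only runs of dots match.
console.log(/([.])\1{1,}/.test("ploppir")); // false
console.log(/([.])\1{1,}/.test("plo..ir")); // true
```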
I'm trying to write a regex that captures all the numbers in a string BUT only if the string ends with numbers.
I worked out the pattern would require a repeating capture group:
^(\D*(\d+))+$
So
the string starts
there are 0 or more non-digit characters
then 1 or more digits (which we capture)
that pattern repeats until the end of the string
My problem is that a repeated capture group seems to return only its last match.
Can anyone show me where I'm going wrong?
You may use this regex with a lookahead:
\d+(?=(?:\w+\d)?\b)
RegEx Demo
RegEx Breakdown:
\d+: Match 1+ digits
(?=: Start Lookahead assertion
(?:\w+\d)?: Optionally match 1 or more word characters followed by a digit
\b: Word boundary
): End Lookahead assertion
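A small sketch of both behaviours, assuming the input is a single run of word characters (the lookahead pattern relies on \w and \b, so it works per word rather than per string). The helper name extractNumbers is hypothetical.

```javascript
// A repeated capture group keeps only its final iteration.
const repeated = "abc12def34".match(/^(\D*(\d+))+$/);
console.log(repeated[2]); // "34" (the earlier "12" is lost)

// Lookahead version from the answer: every digit run in a word that ends with a digit.
const digitsInWordEndingWithDigit = /\d+(?=(?:\w+\d)?\b)/g;

// Hypothetical helper: return an empty array instead of null when nothing matches.
const extractNumbers = (s) => s.match(digitsInWordEndingWithDigit) || [];

console.log(extractNumbers("abc12def34")); // [ '12', '34' ]
console.log(extractNumbers("abc12def")); // []
```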
I have a series of words I try to capture.
I have the following problem:
The string ends with a fixed set of words
It is not clearly defined how many words the string consists of. However, it should capture all words that start with an uppercase letter (German language). Therefore, the left anchor should be the last word starting with a lowercase letter before the capitalized run.
Example (bold is what I try to capture):
I like Apple Bananas And Cars.
building houses Might Be Salty + Hard said Jessica.
This is the RegEx I have tried so far. It only works if the part that should not be captured contains no uppercase words:
/(?:[a-zäöü]*)([\p{L} +().&]+[Cars|Hard])/gu
You might start the match with an uppercase character, allowing German uppercase chars as well, and then optionally repeat matching either words that start with an uppercase character or a "special" character.
Then end the match with an alternation matching either Hard or Cars.
(?<!\S)[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜẞ]*(?:\s+(?:[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜẞ]*|[+()&]))*\s+(?:Hard|Cars)\b
Explanation
(?<!\S) Assert a whitespace boundary to the left to prevent starting the match after a non whitespace char
[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜẞ]* Match a word that starts with an uppercase char
(?: Non capture group to match as a whole part
\s+ Match 1+ whitespace chars
(?: Non capture group
[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜẞ]* Match a word that starts with uppercase
| Or
[+()&] Match one of the "special" chars
) Close the non capture group
)* Close the non capture group and optionally repeat it
\s+ Match 1+ whitespace chars
(?:Hard|Cars) Match one of the alternatives
\b A word boundary to prevent a partial word match
See a regex demo.
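As a quick check, the pattern can be run against the two example sentences from the question. This is only a sketch: the name capitalizedRun is made up here, and the lookbehind (?<!\S) needs an engine with ES2018 support.

```javascript
// Pattern from the answer above: a run of capitalized words (or + ( ) &) ending in Hard or Cars.
const capitalizedRun = /(?<!\S)[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜẞ]*(?:\s+(?:[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜẞ]*|[+()&]))*\s+(?:Hard|Cars)\b/;

const sentences = [
  "I like Apple Bananas And Cars.",
  "building houses Might Be Salty + Hard said Jessica.",
];

for (const sentence of sentences) {
  const match = sentence.match(capitalizedRun);
  console.log(match ? match[0] : null);
}
// Apple Bananas And Cars
// Might Be Salty + Hard
```

Note that the leading "I" in the first sentence is skipped because the lowercase word "like" breaks the run before it reaches "Cars".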
Use \p{Lu} for uppercase letters:
(?:[\p{Lu}+()&][\p{L}+()&]* )+(?:Cars|Hard)
See live demo (showing matching umlauted letters and ß).
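A minimal sketch of the \p{Lu} variant. Unicode property escapes only work with the u flag in JavaScript. The name upperRun and the third sample sentence are made up for illustration.

```javascript
// \p{Lu} = any uppercase letter, \p{L} = any letter; both require the u flag.
const upperRun = /(?:[\p{Lu}+()&][\p{L}+()&]* )+(?:Cars|Hard)/u;

const inputs = [
  "I like Apple Bananas And Cars.",
  "building houses Might Be Salty + Hard said Jessica.",
  "wir kaufen Äpfel Und Öl + Cars heute", // hypothetical sample with umlauts
];

for (const input of inputs) {
  const match = input.match(upperRun);
  console.log(match ? match[0] : null);
}
// Apple Bananas And Cars
// Might Be Salty + Hard
// Äpfel Und Öl + Cars
```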
I have a long text in form of a string.
This text includes a lot of questions that are at the same time the headers of sections.
These headers always start with a number+dot+whitespace character combination and end with a question mark, I am trying to extract these strings.
This is what I've got so far: longString.match(/\d\.\s+[a-zA-Z]+\s\\?/g).
Sure enough this doesn't work.
In your example you use [a-zA-Z]+, but you might extend that to matching 1 or more word characters using \w+
This part at the end of the pattern \s\\? matches an expected whitespace char followed by an optional backslash.
To match multiple words, you can optionally repeat the pattern to match a word preceded by 1 or more whitespace characters.
One option is to use
\d\.\s+\w+(?:\s+\w+)*\s*\?
Explanation
\d\. Match a single digit (for 1 or more digits use \d+) and a .
\s+\w+ Match 1+ whitespace chars and 1+ word chars
(?:\s+\w+)* Optionally repeat 1+ whitespace chars and 1+ word chars
\s*\? Match 0+ whitespace chars and a question mark.
Regex demo
A broader match might be matching at least a single time any char except a question mark or whitespace char after the digit, dot and whitespace:
\d\.\s+[^\s?]+(?:\s+[^\s?]+)*\?
Regex demo
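To see the difference between the two patterns, here is a short sketch run on a made-up sample text (the text and the names wordsOnly and broader are assumptions, not from the question). The third header contains an apostrophe, which \w does not match.

```javascript
// Hypothetical sample text with three numbered question headers.
const text = "1. What is a regex? Body text. 2. How does it work? More text. 3. What's next? End.";

// First pattern: words made of \w characters only.
const wordsOnly = /\d\.\s+\w+(?:\s+\w+)*\s*\?/g;

// Broader pattern: any characters except whitespace and a question mark.
const broader = /\d\.\s+[^\s?]+(?:\s+[^\s?]+)*\?/g;

console.log(text.match(wordsOnly));
// [ '1. What is a regex?', '2. How does it work?' ]
console.log(text.match(broader));
// [ '1. What is a regex?', '2. How does it work?', "3. What's next?" ]
```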
I am writing a regex to validate a string. I wrote the following:
^[^\s]+[a-z]{0,}(?!.* {2})[ a-zA-z]{0,}$
It checks for:
No space at the beginning.
No two consecutive spaces.
The problem is that it allows a single special character. A special character should only be allowed if it is preceded or followed by an alphanumeric character.
Examples:
# -> not allowed.
#A or A# or A2 or 3A is allowed.
One option is to assert that the string does not contain a single "special" char or 2 special chars next to each other using a negative lookahead.
^(?!.*[^a-zA-Z0-9\s][^a-zA-Z0-9\s])(?!.*(?:^| )[^a-zA-Z0-9\s](?!\S))\S+(?: \S+)*$
Explanation
^ Start of string
(?! Negative lookahead, assert that what is at the right does not contain
.*[^a-zA-Z0-9\s][^a-zA-Z0-9\s] match 2 chars other than a-zA-Z0-9 or a whitespace char next to each other
) Close lookahead
(?! Negative lookahead, assert that what is at the right does not contain
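.*(?:^| )[^a-zA-Z0-9\s](?!\S) Match a single char other than a-zA-Z0-9 or a whitespace char that stands alone: preceded by the start of the string or a space, and not followed by a non whitespace char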
) Close lookahead
\S+(?: \S+)* Match 1+ non whitespace chars and optionally repeat a space and 1+ non whitespace chars
$ End of string
Regex demo
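A quick sketch that runs the pattern against the examples from the question plus a few extra inputs (the extra inputs and the name validInput are assumptions for illustration):

```javascript
const validInput = /^(?!.*[^a-zA-Z0-9\s][^a-zA-Z0-9\s])(?!.*(?:^| )[^a-zA-Z0-9\s](?!\S))\S+(?: \S+)*$/;

// "#" to "3A" come from the question; the last three are extra checks.
const candidates = ["#", "#A", "A#", "A2", "3A", "A  B", " A", "A # B"];

for (const candidate of candidates) {
  console.log(JSON.stringify(candidate), validInput.test(candidate));
}
// "#" false
// "#A" true
// "A#" true
// "A2" true
// "3A" true
// "A  B" false
// " A" false
// "A # B" false
```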
Please omit the '$' symbol from the regex, because it anchors the match to the end of the string.
^[^\s]+[a-z]{0,}(?!.* {2})[ a-zA-z]{0,}
So when applying the above regex to the following, it finds only '# '.
#A A# A2 3A
I am trying to match smileys followed by a word boundary \b.
Let's say I wanna match :p and :) followed by \b.
/(:p)\b/ is working fine but why is /(:\))\b/ behaving the opposite?
You cannot use a word boundary here as ) is a non-word character.
Simply put: \b allows you to perform a whole words only search using
a regular expression in the form of \bword\b. A word character is a
character that can be used to form words. All characters that are not
word characters are non-word characters.
Use (:\)) to match :) and capture it in the first capturing group.
Use /(:\))(?![a-z0-9_])/i in order to avoid matching any :)s with letters after the smiley. It is an equivalent of (:\))\B.
\B is the negated version of \b. \B matches at every position where \b
does not. Effectively, \B matches at any position between two word
characters as well as at any position between two non-word characters.
See demo 1 and demo 2.
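The behaviour described above can be checked with a few inputs. This is only a sketch; the constant names are made up. Each line prints the input followed by the result for \b, the negative lookahead, and \B.

```javascript
const withWordBoundary = /(:\))\b/;
const withLookahead = /(:\))(?![a-z0-9_])/i;
const withNonBoundary = /(:\))\B/;

for (const s of [":) hello", ":)", ":)x"]) {
  console.log(JSON.stringify(s), withWordBoundary.test(s), withLookahead.test(s), withNonBoundary.test(s));
}
// ":) hello" false true true
// ":)" false true true
// ":)x" true false false

// For comparison, :p ends in a word character, so \b behaves as expected there.
console.log(/(:p)\b/.test(":p hello")); // true
console.log(/(:p)\b/.test(":party")); // false
```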
In addition to stribizhev's answer, you can use (:\))\B
Examples for when to use what:
\b : string = That man is batman. regex = \bman\b matches only the standalone man and not the man in batman, because the position between t and m is not a word boundary (both are word characters).
\B : string = I am bat-man and he is super - man. regex = \B-\B matches the - in super - man, whereas \b-\b matches the - in bat-man, since the positions between t and - and between - and m are word boundaries, while those between (space) and - and between - and (space) are not.
Note: This is easier to understand if you think of \b and \B as positions between two characters: \b matches where the transition is word to non-word (or the reverse), \B where it is word to word or non-word to non-word.
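The two example strings above can be verified directly. A minimal sketch (the variable names are made up); search() returns the index of the first match, which shows which hyphen each pattern finds.

```javascript
const sentence = "That man is batman.";
console.log(sentence.match(/\bman\b/g)); // [ 'man' ] (only the standalone word)

const hyphens = "I am bat-man and he is super - man.";
console.log(hyphens.search(/\b-\b/)); // 8  (the hyphen in "bat-man")
console.log(hyphens.search(/\B-\B/)); // 29 (the hyphen in "super - man")
```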