I'm trying to validate text with JavaScript but can't find out why it's not working.
I have been using https://regex101.com/ for testing, where it works, but in my script it fails.
var check = "test"
var pattern = new RegExp('^(?!\.)[a-zA-Z0-9._-]+$(?<!\.)','gmi');
if (!pattern.test(check)) validate_check = false;else validate_check = true;
What I'm looking for: the first and last char must not be a dot, and the string may contain only [a-zA-Z0-9._-].
But the above check always fails, even on the word test.
+$(?<!\.) is invalid in your RegEx
$ will match the end of the text or line (with the m flag)
Negative lookbehind → (?<!Y)X will match X, but only if Y is not before it
What about a simpler RegEx?
var checks = ["test", "1-t.e_s.t0", ".test", "test.", ".test."];
checks.forEach(check => {
var pattern = new RegExp('^[^.][a-zA-Z0-9\._-]+[^.]$','gmi');
console.log(check, pattern.test(check))
});
Your code should look like this:
var check = "test";
var pattern = new RegExp('^[^.][a-zA-Z0-9\._-]+[^.]$','gmi');
var validate_check = pattern.test(check);
console.log(validate_check);
A few notes about the pattern:
You are using the RegExp constructor with a string, where you have to double-escape the backslash. With a single backslash, the pattern becomes ^(?!.)[a-zA-Z0-9._-]+$(?<!.), and the first negative lookahead makes the pattern fail whenever there is any character other than a newline to the right. That is why it does not match test.
If you use the /i flag for a case-insensitive match, you can shorten [A-Za-z] to just one of the ranges, like [a-z], or use \w to match a word character (letters, digits and underscore), as in your character class.
This part (?<!\.) using a negative lookbehind is not invalid in your pattern, but lookbehind is not supported in every environment.
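To make the escaping point concrete, here is a small sketch comparing the single-escaped and double-escaped constructor strings. The variable and helper names are illustrative, not from the question, and it assumes an engine that supports lookbehind.

```javascript
// In a string literal, '\.' collapses to '.', so the lookarounds become (?!.) and (?<!.).
const singleEscaped = new RegExp('^(?!\.)[a-zA-Z0-9._-]+$(?<!\.)');
// Doubling the backslash keeps the escape, so the engine sees (?!\.) and (?<!\.).
const doubleEscaped = new RegExp('^(?!\\.)[a-zA-Z0-9._-]+$(?<!\\.)');

// Hypothetical helper: run both patterns against the same input.
function compareEscaping(input) {
  return {
    single: singleEscaped.test(input),
    double: doubleEscaped.test(input),
  };
}

console.log(singleEscaped.source); // ^(?!.)[a-zA-Z0-9._-]+$(?<!.)
console.log(compareEscaping("test")); // { single: false, double: true }
console.log(compareEscaping(".test")); // { single: false, double: false }
console.log(compareEscaping("test.")); // { single: false, double: false }
```

Using a regex literal (/.../) instead of the constructor avoids the double escaping altogether.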
For your requirements, you don't have to use lookarounds. If you also want to allow a single char, you can use:
^[\w-]+(?:[\w.-]*[\w-])?$
^ Start of string
[\w-]+ Match 1+ occurrences of a word character or -
(?: Non capture group
[\w.-]*[\w-] Match optional word chars, dots or hyphens, followed by a final word char or hyphen (so the string cannot end with a dot)
)? Close non capture group and make it optional
$ End of string
Regex demo
const regex = /^[\w-]+(?:[\w.-]*[\w-])?$/;
["test", "abc....abc", "a", ".test", "test."]
.forEach((s) =>
console.log(`${s} --> ${regex.test(s)}`)
);
The following code only matches MN. How do I get it to match KDMN?
var str = ' New York Stock Exchange (NYSE) under the symbol "KDMN."';
var patt = new RegExp("symbol.+([A-Z]{2,5})");
var res = patt.exec(str);
console.log(res[1]);
You may use a lazy +? quantifier:
/symbol.+?([A-Z]{2,5})/
         ^
See the regex demo. If you keep the greedy .+, it will match as many characters as possible, and will only leave the minimum 2 chars for the next subpattern.
Or, I'd rather make this a bit more verbose:
/symbol\s+"([A-Z]{2,5})/
See another regex demo. The symbol matches a literal string symbol, \s+ will match 1 or more whitespaces, " will match a double quote, and ([A-Z]{2,5}) will capture 2 to 5 uppercase ASCII letters into Group 1.
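As a rough side-by-side of the three patterns discussed above, the sketch below runs each against the string from the question. The helper name extractSymbol is made up for illustration.

```javascript
const str = ' New York Stock Exchange (NYSE) under the symbol "KDMN."';

// Hypothetical helper: return capture group 1, or null when there is no match.
function extractSymbol(pattern, text) {
  const match = pattern.exec(text);
  return match ? match[1] : null;
}

// Greedy .+ runs to the end of the string, then backtracks only as far as needed,
// leaving the minimum of 2 uppercase letters for the group.
console.log(extractSymbol(/symbol.+([A-Z]{2,5})/, str)); // MN

// Lazy .+? expands one character at a time, so the group starts at the first uppercase run.
console.log(extractSymbol(/symbol.+?([A-Z]{2,5})/, str)); // KDMN

// The explicit version spells out the whitespace and the opening quote.
console.log(extractSymbol(/symbol\s+"([A-Z]{2,5})/, str)); // KDMN
```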
I'm a regular expression newbie and I can't quite figure out how to write a single regular expression that would "match" any duplicate consecutive words such as:
Paris in the the spring.
Not that that is related.
Why are you laughing? Are my my regular expressions THAT bad??
Is there a single regular expression that will match ALL of the bold strings above?
Try this regular expression:
\b(\w+)\s+\1\b
Here \b is a word boundary and \1 references the captured match of the first group.
Regex101 example here
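A minimal JavaScript sketch of this pattern applied to the three sentences from the question. The function name findDoubledWords is hypothetical, and the g flag is added here only so that every occurrence is returned.

```javascript
// Hypothetical helper: list every "word word" pair found in the text.
function findDoubledWords(text) {
  const pattern = /\b(\w+)\s+\1\b/g;
  // matchAll yields one match object per occurrence; m[0] is the full matched text.
  return [...text.matchAll(pattern)].map((m) => m[0]);
}

console.log(findDoubledWords("Paris in the the spring.")); // [ 'the the' ]
console.log(findDoubledWords("Not that that is related.")); // [ 'that that' ]
console.log(
  findDoubledWords("Why are you laughing? Are my my regular expressions THAT bad??")
); // [ 'my my' ]
```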
I believe this regex handles more situations:
/(\b\S+\b)\s+\b\1\b/
A good selection of test strings can be found here: http://callumacrae.github.com/regex-tuesday/challenge1.html
The below expression should work correctly to find any number of duplicated words. The matching can be case insensitive.
String regex = "\\b(\\w+)(\\s+\\1\\b)+";
Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(input);
// Check for subsequences of input that match the compiled pattern
while (m.find()) {
input = input.replaceAll(m.group(0), m.group(1));
}
Sample Input : Goodbye goodbye GooDbYe
Sample Output : Goodbye
Explanation:
The regular expression:
\b : Start of a word boundary
(\w+) : One or more word characters, captured as group 1
(\s+\1\b)+ : One or more whitespace characters followed by a word that matches the previous word and ends at a word boundary. Wrapping the whole thing in + finds one or more repetitions.
Grouping :
m.group(0) : Shall contain the matched group in above case Goodbye goodbye GooDbYe
m.group(1) : Shall contain the first word of the matched pattern in above case Goodbye
Replace method shall replace all consecutive matched words with the first instance of the word.
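Try this with the RE below:
\b word boundary at the start of a word
\w+ one or more word characters
\W+ one or more non-word characters (the separator between repeats)
\1 the same word that was already matched
\b word boundary at the end of the word
()* repeats the group zero or more times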
public static void main(String[] args) {
String regex = "\\b(\\w+)(\\b\\W+\\b\\1\\b)*";
Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Scanner in = new Scanner(System.in);
int numSentences = Integer.parseInt(in.nextLine());
while (numSentences-- > 0) {
String input = in.nextLine();
Matcher m = p.matcher(input);
// Check for subsequences of input that match the compiled pattern
while (m.find()) {
input = input.replaceAll(m.group(0),m.group(1));
}
// Prints the modified sentence.
System.out.println(input);
}
in.close();
}
Regex to Strip 2+ duplicate words (consecutive/non-consecutive words)
Try this regex that can catch 2 or more duplicate words and only leave behind one single word. And the duplicate words need not even be consecutive.
/\b(\w+)\b(?=.*?\b\1\b)/ig
Here, \b is used for Word Boundary, ?= is used for positive lookahead, and \1 is used for back-referencing.
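The sketch below shows which words this pattern actually selects: every occurrence of a word that appears again later on the same line, so the last occurrence is the one left behind. The function name is hypothetical.

```javascript
// Hypothetical helper: return each word occurrence that is repeated later in the text.
function findRepeatedWords(text) {
  const pattern = /\b(\w+)\b(?=.*?\b\1\b)/gi;
  // With the g flag, String.prototype.match returns all matches, or null if there are none.
  return text.match(pattern) ?? [];
}

console.log(findRepeatedWords("The cat saw the other cat")); // [ 'The', 'cat' ]
console.log(findRepeatedWords("no repeats here")); // []
```

Note that . does not match a newline without the s flag, so repeats on a later line are not detected by this version.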
The widely-used PCRE library can handle such situations (you won't achieve the same with POSIX-compliant regex engines, though):
(\b\w+\b)\W+\1
Here is one that catches multiple words multiple times:
(\b\w+\b)(\s+\1)+
No. That is an irregular grammar. There may be engine-/language-specific regular expressions that you can use, but there is no universal regular expression that can do that.
This is the regex I use to remove duplicate phrases in my twitch bot:
(\S+\s*)\1{2,}
(\S+\s*) looks for any run of non-whitespace characters, followed by optional whitespace.
\1{2,} then requires at least 2 more consecutive copies of that phrase, so the pattern matches only when the phrase appears 3 or more times in a row.
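A quick sketch of how this behaves in a replace call. The function name collapseRepeats and the "$1" replacement are assumptions for illustration; the original bot code is not shown in the answer.

```javascript
// Hypothetical helper: collapse a phrase repeated 3+ times in a row down to one copy.
function collapseRepeats(text) {
  return text.replace(/(\S+\s*)\1{2,}/g, "$1");
}

console.log(collapseRepeats("go go go now")); // go now
console.log(collapseRepeats("go go now")); // go go now (only two copies, so no match)
console.log(collapseRepeats("hmmm")); // hm (the group can also be a single character)
```

As the last call suggests, the pattern has no word boundaries, so it also collapses repeated characters inside a word.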
Since some developers are coming to this page in search of a solution which not only eliminates duplicate consecutive non-whitespace substrings, but triplicates and beyond, I'll show the adapted pattern.
Pattern: /(\b\S+)(?:\s+\1\b)+/ (Pattern Demo)
Replace: $1 (replaces the fullstring match with capture group #1)
This pattern greedily matches a "whole" non-whitespace substring, then requires one or more copies of the matched substring which may be delimited by one or more whitespace characters (space, tab, newline, etc).
Specifically:
\b (word boundary) characters are vital to ensure partial words are not matched.
The second parenthetical is a non-capturing group, because this variable width substring does not need to be captured -- only matched/absorbed.
The + (one or more quantifier) on the non-capturing group is more appropriate than * because * will "bother" the regex engine to capture and replace singleton occurrences -- this is wasteful pattern design.
*Note: if you are dealing with sentences or input strings with punctuation, then the pattern will need to be further refined.
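For reference, a minimal JavaScript sketch of the pattern and replacement described above. The function name is illustrative, and the g flag is added so that every run in the string is replaced.

```javascript
// Hypothetical helper: replace each run of repeated tokens with a single copy.
function collapseDuplicateWords(text) {
  // Group 1 holds the first token; the non-capturing group absorbs the repeats.
  return text.replace(/(\b\S+)(?:\s+\1\b)+/g, "$1");
}

console.log(collapseDuplicateWords("here here here is is the result"));
// here is the result
```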
The example in Javascript: The Good Parts can be adapted to do this:
var doubled_words = /([A-Za-z\u00C0-\u1FFF\u2800-\uFFFD]+)\s+\1(?:\s|$)/gi;
\b uses \w for word boundaries, where \w is equivalent to [0-9A-Z_a-z]. If you don't mind that limitation, the accepted answer is fine.
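To illustrate that limitation, the sketch below compares the \w-based pattern with the character-range pattern on an accented word. The helper names are made up, and the accented letters are written as \u escapes so the example does not depend on file encoding.

```javascript
// Hypothetical helpers: does the text contain a doubled word?
function hasDoubledAsciiWord(text) {
  return /\b(\w+)\s+\1\b/.test(text);
}

function hasDoubledUnicodeWord(text) {
  return /([A-Za-z\u00C0-\u1FFF\u2800-\uFFFD]+)\s+\1(?:\s|$)/i.test(text);
}

const plain = "the the";
const accented = "\u00e9t\u00e9 \u00e9t\u00e9"; // "été été"

console.log(hasDoubledAsciiWord(plain)); // true
console.log(hasDoubledUnicodeWord(plain)); // true
// \w does not include é, so the \b and \w based pattern misses this pair.
console.log(hasDoubledAsciiWord(accented)); // false
console.log(hasDoubledUnicodeWord(accented)); // true
```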
This expression (inspired from Mike, above) seems to catch all duplicates, triplicates, etc, including the ones at the end of the string, which most of the others don't:
s.replace(/(^|\s+)(\S+)(($|\s+)\2)+/g, "$1$2")
I know the question asked to match duplicates only, but a triplicate is just 2 duplicates next to each other :)
First, I put (^|\s+) to make sure it starts with a full word, otherwise "child's steak" would go to "child'steak" (the "s"'s would match). Then, it matches all full words ((\b\S+\b)), followed by an end of string ($) or a number of spaces (\s+), the whole repeated more than once.
I tried it like this and it worked well:
var s = "here here here here is ahi-ahi ahi-ahi ahi-ahi joe's joe's joe's joe's joe's the result result result";
print( s.replace( /(\b\S+\b)(($|\s+)\1)+/g, "$1"))
--> here is ahi-ahi joe's the result
Try this regular expression; it handles all cases of repeated words:
\b(\w+)\s+\1(?:\s+\1)*\b
I think another solution would be to use named capture groups and backreferences like this:
/.* (?<mytoken>\w+)\s+\k<mytoken> .*/
OR
/.*(?<mytoken>\w{3,}).+\k<mytoken>.*/
Kotlin:
val regex = Regex(""".* (?<myToken>\w+)\s+\k<myToken> .*""")
val input = "This is a test test data"
val result = regex.find(input)
println(result!!.groups["myToken"]!!.value)
Java:
var pattern = Pattern.compile(".* (?<myToken>\\w+)\\s+\\k<myToken> .*");
var matcher = pattern.matcher("This is a test test data");
var isFound = matcher.find();
var result = matcher.group("myToken");
System.out.println(result);
JavaScript:
const regex = /.* (?<myToken>\w+)\s+\k<myToken> .*/;
const input = "This is a test test data";
const result = regex.exec(input);
console.log(result.groups.myToken);
// OR, with the g flag and matchAll (renamed so both variants can run in one script)
const regexAll = /.* (?<myToken>\w+)\s+\k<myToken> .*/g;
const results = [...input.matchAll(regexAll)];
console.log(results[0].groups.myToken);
All the above detect the test as the duplicate word.
Tested with Kotlin 1.7.0-Beta, Java 11, Chrome and Firefox 100.
You can use this pattern:
\b(\w+)(?:\W+\1\b)+
This pattern can be used to match all duplicated word groups in sentences. :)
Here is a sample util function written in java 17, which replaces all duplications with the first occurrence:
public String removeDuplicates(String input) {
var regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
var pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
var matcher = pattern.matcher(input);
while (matcher.find()) {
input = input.replaceAll(matcher.group(), matcher.group(1));
}
return input;
}
As far as I can see, none of these would match:
London in the
the winter (with the winter on a new line )
Although matching duplicates on the same line is fairly straightforward,
I haven't been able to come up with a solution for the situation in which they
stretch over two lines. ( with Perl )
To find duplicate words that have no leading or trailing non whitespace character(s) other than a word character(s), you can use whitespace boundaries on the left and on the right making use of lookarounds.
The pattern will have a match in:
Paris in the the spring.
Not that that is related.
The pattern will not have a match in:
This is $word word
(?<!\S)(\w+)\s+\1(?!\S)
Explanation
(?<!\S) Negative lookbehind, assert not a non whitespace char to the left of the current location
(\w+) Capture group 1, match 1 or more word characters
\s+ Match 1 or more whitespace characters (note that this can also match a newline)
\1 Backreference to match the same as in group 1
(?!\S) Negative lookahead, assert not a non whitespace char to the right of the current location
See a regex101 demo.
To find 2 or more duplicate words:
(?<!\S)(\w+)(?:\s+\1)+(?!\S)
This part of the pattern (?:\s+\1)+ uses a non capture group to repeat 1 or more times matching 1 or more whitespace characters followed by the backreference to match the same as in group 1.
See a regex101 demo.
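A small JavaScript sketch of the "2 or more" pattern above, assuming an engine with lookbehind support. The function name is hypothetical. It also shows that \s+ lets the repeat sit on the next line.

```javascript
// Hypothetical helper: find runs of a repeated word bounded by whitespace or string edges.
function findBoundedDuplicates(text) {
  const pattern = /(?<!\S)(\w+)(?:\s+\1)+(?!\S)/g;
  return [...text.matchAll(pattern)].map((m) => m[0]);
}

console.log(findBoundedDuplicates("Paris in the the spring.")); // [ 'the the' ]
console.log(findBoundedDuplicates("This is $word word")); // []
// \s+ also matches a newline, so a repeat split across two lines is found.
console.log(findBoundedDuplicates("London in the\nthe winter")); // [ 'the\nthe' ]
```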
Alternatives without using lookarounds
You could also make use of a leading and trailing alternation matching either a whitespace char or assert the start/end of the string.
Then use a capture group 1 for the value that you want to get, and use a second capture group with a backreference \2 to match the repeated word.
Matching 2 duplicate words:
(?:\s|^)((\w+)\s+\2)(?:\s|$)
See a regex101 demo.
Matching 2 or more duplicate words:
(?:\s|^)((\w+)(?:\s+\2)+)(?:\s|$)
See a regex101 demo.
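For engines without lookbehind, a sketch of the alternation variant might look like this. The value of interest is read from capture group 1 rather than from the full match, since the full match can include the surrounding whitespace. The function name is illustrative.

```javascript
// Hypothetical helper: return the first run of repeated words, or null if there is none.
function firstDuplicateRun(text) {
  const match = /(?:\s|^)((\w+)(?:\s+\2)+)(?:\s|$)/.exec(text);
  // Group 1 is the whole run; group 2 is the single repeated word.
  return match ? match[1] : null;
}

console.log(firstDuplicateRun("Not that that is related.")); // that that
console.log(firstDuplicateRun("This is $word word")); // null
```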
Use this in case you want case-insensitive checking for duplicate words.
(?i)\\b(\\w+)\\s+\\1\\b
Hello I need help with validation using regular expressions in javascript
I need something like this.
The first character should be a designated character like A, B or C only. and the next 3 characters should be numbers.
example: A123, B345, C234.
D123 is not allowed.
This works for me:
var rgx = /^(?:A|B|C)\d{3}$/;
alert('A123'.match(rgx)); // A123
alert('D123'.match(rgx)); // null
alert('B986'.match(rgx)); // B986
Breakdown:
^ matches the beginning of a string
(?:A|B|C) matches A or B or C but does not capture it
\d{3} matches 3 digits in a row
$ matches the end of the string
Therefore 'A12' would not be valid because there aren't 3 digits, nor would ' A123' because of leading whitespace, nor would 'A123 hello' because the match isn't at the beginning and end of string.
To make it case insensitive, add i after the / at the end of the regex.
Try with this regex:
/^[a-c]\d{3}$/i
I have a password field in one form. Now I have to validate in such a way that the field value should be a 7 digits string along with a number. Otherwise it will return false.
Please help me.
Create the regex first:
var regex = /\w{7}\d/i;
// jQuery objects expose the field value through .val(), not .value
var yourvalue = $("#passwordid").val();
if (regex.test(yourvalue)) {
return true;
}
else {
return false;
}
I’m sure there is a better way, but something like:
if ( /.{7}/.test(str) && /\d/.test(str) ) {
//OK
}
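The two tests above can probably be folded into one regex with a lookahead. This is only a sketch under an assumed reading of the requirement (at least 7 characters, at least one digit); the function name is hypothetical.

```javascript
// Hypothetical helper, assuming "valid" means: 7+ characters and at least one digit.
function isValidPassword(str) {
  // (?=.*\d) asserts a digit somewhere ahead; .{7,} then requires 7 or more characters.
  return /^(?=.*\d).{7,}$/.test(str);
}

console.log(isValidPassword("abcdef1")); // true
console.log(isValidPassword("abcdefg")); // false (no digit)
console.log(isValidPassword("abc1")); // false (too short)
```

Unlike the unanchored /.{7}/ test, this version is anchored with ^ and $, so it rejects values that contain a newline.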
In your javascript you can use the RegExp object.
var regEx = new RegExp(pattern, modifiers);
or more simply:
var pattern = /pattern/modifiers;
E.g.
var password = "abcdefg1";
var pattern = /\w{7}\d/i;
var isMatch = pattern.test(password);
Here are some expressions:
[abc] Find any character between the brackets
[^abc] Find any character not between the brackets
[0-9] Find any digit from 0 to 9
[A-Z] Find any character from uppercase A to uppercase Z
[a-z] Find any character from lowercase a to lowercase z
[A-z] Find any character from uppercase A to lowercase z
[adgk] Find any character in the given set
[^adgk] Find any character outside the given set
(red|blue|green) Find any of the alternatives specified
Metacharacters:
. Find a single character, except newline or line terminator
\w Find a word character
\W Find a non-word character
\d Find a digit
\D Find a non-digit character
\s Find a whitespace character
\S Find a non-whitespace character
\b Find a match at the beginning/end of a word
\B Find a match not at the beginning/end of a word
\0 Find a NUL character
\n Find a new line character
\f Find a form feed character
\r Find a carriage return character
\t Find a tab character
\v Find a vertical tab character
\xxx Find the character specified by an octal number xxx
\xdd Find the character specified by a hexadecimal number dd
\uxxxx Find the Unicode character specified by a hexadecimal number xxxx
Quantifiers
n+ Matches any string that contains at least one n
n* Matches any string that contains zero or more occurrences of n
n? Matches any string that contains zero or one occurrences of n
n{X} Matches any string that contains a sequence of X n's
n{X,Y} Matches any string that contains a sequence of X to Y n's
n{X,} Matches any string that contains a sequence of at least X n's
n$ Matches any string with n at the end of it
^n Matches any string with n at the beginning of it
?=n Matches any string that is followed by a specific string n
?!n