Why is my regex not working in this particular case - javascript

I want to extract the number that follows "pw=" in a piece of text.
For example
"blablabla pw=0.5 alg=blablalbala"
would get me 0.5
The regex that I used was:
/.*pw=+(.*)\s+alg=.*/g
In JavaScript, I would then pass that regex to the match function to get the number:
str.match(/.*pw=+(.*)\s+alg=.*/g)
But on regex101.com, the match result and the highlighting do not agree at all (the result shows that the regex is correct, while the highlighted part is not).

You should remove the /g global modifier, and I suggest making your value-matching pattern more precise, e.g. [\d.]*.
The point is that when a global modifier is used with String#match, all captured group values are discarded.
Use a regex like
str.match(/\bpw=([\d.]*)\s+alg=/)
(The changes that matter are the [\d.]* value pattern and the missing g flag.)
Note that you do not need the .* at the start and end of the pattern: String#match does not require a full-string match (unlike String#matches() in Java).
var str = 'blablabla pw=0.5 alg=blablalbala';
var m = str.match(/\bpw=([\d.]*)\s+alg=/);
if (m) {
console.log(m[1]);
}
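To make the effect of the g flag concrete, here is a minimal sketch that compares the two calls. For the case where several pw= values have to be collected, it uses String#matchAll, which keeps capture groups even with g. The helper name extractAllPw and the second sample string are made up for illustration.

```javascript
const sample = 'blablabla pw=0.5 alg=blablalbala';

// With g, match() returns only the whole-match strings; group 1 is not available.
console.log(sample.match(/\bpw=([\d.]*)\s+alg=/g)); // ['pw=0.5 alg=']

// Without g, the result array carries the capture groups.
const single = sample.match(/\bpw=([\d.]*)\s+alg=/);
console.log(single ? single[1] : null); // 0.5

// Hypothetical helper: collect every pw= value; matchAll() requires the g flag.
function extractAllPw(text) {
  return Array.from(text.matchAll(/\bpw=([\d.]*)/g), (m) => m[1]);
}

console.log(extractAllPw('pw=0.5 alg=a pw=1.25 alg=b')); // ['0.5', '1.25']
```

If only one value is ever expected, the non-global match shown in the answer is the simpler choice.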

Related

JavaScript - Regex: URL has a param

I'm trying to create an analog for php's isset ($_GET[param]) but for JavaScript.
So far, I have this
[?&]param[&=]
But if param is at the end of the URL (example.com?param) this regex won't work.
You can play with it here: https://regex101.com/r/fFeWPW/1
If you want to make sure your match ends with &, = or the end of the string, you may replace the [&=] character class with a (?:[&=]|$) alternation group that matches &, = or the end of the string. Note that $ cannot be placed inside the character class, as it would lose its special meaning there and be treated as a literal $ symbol. Alternatively, you may use a negative lookahead (?![^&=]) that fails the match if the next character is anything other than & or = (so it also succeeds at the end of the string), which might be a bit more efficient than an alternation group.
So, in your case, it will look like
[?&]param(?:[&=]|$)
or
[?&]param(?![^&=])
See a regex demo
JS demo:
var strs = ['http://example.com?param', 'http://example.com?param=123', 'http://example.com?param&another','http://example.com?params'];
var rx = /[?&]param(?![^&=])/;
for (var s of strs) {
console.log(s, "=>", rx.test(s))
}
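If the parameter name is not fixed, the same lookahead pattern can be built at runtime. The following is only a sketch: hasParam and escapeRegExp are hypothetical helper names, and the name is escaped so that characters such as . or [ in it are treated literally.

```javascript
// Hypothetical helper: escape characters that are special inside a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: true if the URL carries the given query parameter,
// with or without a value.
function hasParam(url, name) {
  const rx = new RegExp('[?&]' + escapeRegExp(name) + '(?![^&=])');
  return rx.test(url);
}

console.log(hasParam('http://example.com?param', 'param'));         // true
console.log(hasParam('http://example.com?param=123', 'param'));     // true
console.log(hasParam('http://example.com?a=1&param&b=2', 'param')); // true
console.log(hasParam('http://example.com?params', 'param'));        // false
```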

Match the same start and end character of a string with Regex

I'm trying to match the start and end character of a string to be the same vowel. My regex is working in most scenarios, but failing in others:
var re = /([aeiou]).*\1/;
re.test(str);
Sample input:
abcde, output - false (Valid)
abcda, output - true (Valid)
aabcdaa, output - true (Valid)
aeqwae, output - true (Not valid)
ouqweru, output - true (Not valid)
You need to add anchors to your string.
When you have, for example:
aeqwae
You say the output is true, but it's not valid because a is not the same as e. Well, without anchors the regex only needs to match somewhere in the string, so it stops at the last a (the character before the final e). Thus, the match is valid. So, you get this:
[aeqwa]e
The string enclosed in the brackets is the actual match, which is why it returns true.
Change your regex to this:
/^([aeiou]).*\1$/
By adding ^, you tell it that the start of the match must be the start of the string and by adding $ you tell it that the end of the match must be the end of the string. This way, if there's a match, the whole string must be matched, meaning that aeqwae will no longer get matched.
A great tool for testing regex is Regex101. Give it a try!
Note: Depending on your input, you might need to set the global (g) or multi-line (m) flag. The global flag prevents regex from returning after the first match. The multi-line flag makes ^ and $ match the start and end of the line (not the string). I used both of them when testing with your input.
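As a rough illustration of the note about flags, the sketch below puts the sample inputs from the question into one multi-line string. The helper name linesWithSameVowel is made up for this example.

```javascript
const input = 'abcde\nabcda\naabcdaa\naeqwae\nouqweru';

// Hypothetical helper: return every line that starts and ends with the same vowel.
function linesWithSameVowel(text) {
  // m: ^ and $ also match at line breaks; g: keep going after the first match.
  return text.match(/^([aeiou]).*\1$/gm) || [];
}

console.log(linesWithSameVowel(input)); // ['abcda', 'aabcdaa']

// Without m, ^ and $ only match at the ends of the whole string,
// and . does not cross line breaks, so nothing matches here.
console.log(/^([aeiou]).*\1$/.test(input)); // false
```

When each string is tested on its own, as in the question, neither flag is needed.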
Just a different version of @Hristiyan Dodov's answer that I wrote for fun.
const regex = /^(a|e|i|o|u).*\1$/
const strings = ['abcde', 'abcda', 'aabcdaa', 'aeqwae', 'ouqweru']
strings.forEach((e)=>{
const result = regex.test(e)
console.log(e, result)
})
Correct answer is already mentioned above, just for some more clarification:
const regEx = /^([aeiou])(.*)\1$/
Here, \1 is a backreference that matches the same text again; you can reuse the same backreference more than once. Most regex flavors support up to 99 capturing groups and double-digit backreferences, so \99 is a valid backreference if your regex has 99 capturing groups.
/^([aeiou])[a-z]*\1$/
Just a small improvement that restricts the middle of the string to lowercase letters.

JavaScript RegExp all characters except dynamic series

So, I'm working on an open-source project as a way to expand my knowledge of JavaScript, and created a utility that processes strings dynamically and replaces specific occurrences with other strings.
An example of this would be the following:
jdhfkjhs${c1}kdfjh$%^%$S654sgdsjh${c20}SUYTDRF^%$&*#(Y
And assuming I select the character '#', the RegExp processes it to be:
########${c1}####################${c20}###############
The problem I am facing is that my RegExp /[^\$\{c\d\}]/g also leaves untouched any of the individual characters listed inside it, so a string such as _,met$$$$$1234{}cccgg. is returned as #####$$$$$1234{}ccc###
Is there a way I can catch such a dynamic group with JavaScript, or should I find an alternative way to achieve what I am doing?
For some context, the project code can be found here.
You may match the group and capture it to restore later, and just match any char (with . if no line breaks are expected or with [^] / [\s\S]):
var rx = /(\${c\d+})|./g;
var str = 'jdhfkjhs\${c1}kdfjh\$%^%\$S654sgdsjh\${c20}SUYTDRF^%\$&*#(Y';
var result = str.replace(rx, function ($0,$1) {
return $1 ? $1 : '#';
});
console.log(result);
Details:
(\${c\d+}) - Group 1: a literal ${c substring, then 1+ digits and a literal }
| - or
. - any char but a line break char (or any char if you use [^] or [\s\S]).
In the replacement, $0 stands for the whole match, $1 stands for the contents of the first capturing group. If the $1 is set, it is re-inserted to the resulting string, else, the char is replaced with #.
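The same capture-or-replace idea can be wrapped in a small function that takes the mask character as an argument. This is a sketch only: maskOutsidePlaceholders is a hypothetical name, and [\s\S] is used instead of . so that line breaks are masked as well.

```javascript
// Hypothetical helper: keep ${cN} placeholders, replace every other character.
function maskOutsidePlaceholders(text, maskChar) {
  return text.replace(/(\$\{c\d+\})|[\s\S]/g, (whole, placeholder) =>
    // placeholder is undefined when the second alternative matched.
    placeholder !== undefined ? placeholder : maskChar
  );
}

console.log(maskOutsidePlaceholders('ab${c1}cd', '#')); // ##${c1}##

// Loose characters such as $, {, } or c are masked like everything else.
console.log(maskOutsidePlaceholders('_,met$$$$$1234{}cccgg.', '#'));
```

Because the replacement is produced by a function, the mask character is inserted literally, even if it is a $.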

Finding duplicates with regular expressions, how does this actually work? [duplicate]

I'm a regular expression newbie and I can't quite figure out how to write a single regular expression that would "match" any duplicate consecutive words such as:
Paris in the the spring.
Not that that is related.
Why are you laughing? Are my my regular expressions THAT bad??
Is there a single regular expression that will match ALL of the bold strings above?
Try this regular expression:
\b(\w+)\s+\1\b
Here \b is a word boundary and \1 references the captured match of the first group.
Regex101 example here
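For readers who want to try the pattern from JavaScript, here is a small sketch that lists the doubled words in the three sample sentences. The function name findDuplicateWords is made up for illustration.

```javascript
// Hypothetical helper: return each word that is immediately repeated.
function findDuplicateWords(text) {
  // Group 1 captures a word; \1 requires the same text again after whitespace.
  return Array.from(text.matchAll(/\b(\w+)\s+\1\b/g), (m) => m[1]);
}

console.log(findDuplicateWords('Paris in the the spring.'));  // ['the']
console.log(findDuplicateWords('Not that that is related.')); // ['that']
console.log(findDuplicateWords('Why are you laughing? Are my my regular expressions THAT bad??')); // ['my']
```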
I believe this regex handles more situations:
/(\b\S+\b)\s+\b\1\b/
A good selection of test strings can be found here: http://callumacrae.github.com/regex-tuesday/challenge1.html
The below expression should work correctly to find any number of duplicated words. The matching can be case insensitive.
String regex = "\\b(\\w+)(\\s+\\1\\b)+";
Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(input);
// Check for subsequences of input that match the compiled pattern
while (m.find()) {
input = input.replaceAll(m.group(0), m.group(1));
}
Sample Input : Goodbye goodbye GooDbYe
Sample Output : Goodbye
Explanation:
The regex expression:
\b : Start of a word boundary
(\w+) : One or more word characters, captured as group 1
(\s+\1\b)+ : One or more whitespace characters followed by a word that matches the previous word and ends at a word boundary. Wrapping the whole thing in + finds one or more repetitions.
Grouping :
m.group(0) : Contains the whole match, in the above case Goodbye goodbye GooDbYe
m.group(1) : Contains the first word of the match, in the above case Goodbye
The replaceAll call replaces all consecutive matched words with the first instance of the word.
Try this with the RE below:
\b start of a word (word boundary)
\w+ one or more word characters
\W+ one or more non-word characters
\1 the same word matched already
\b end of word
()* repeated zero or more times
// Requires: import java.util.Scanner; import java.util.regex.Matcher; import java.util.regex.Pattern;
public static void main(String[] args) {
String regex = "\\b(\\w+)(\\b\\W+\\b\\1\\b)*"; // a word followed by zero or more repeats of itself
Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE); // compare repeats case-insensitively
Scanner in = new Scanner(System.in);
int numSentences = Integer.parseInt(in.nextLine());
while (numSentences-- > 0) {
String input = in.nextLine();
Matcher m = p.matcher(input);
// Check for subsequences of input that match the compiled pattern
while (m.find()) {
input = input.replaceAll(m.group(0),m.group(1));
}
// Prints the modified sentence.
System.out.println(input);
}
in.close();
}
Regex to Strip 2+ duplicate words (consecutive/non-consecutive words)
Try this regex that can catch 2 or more duplicate words and only leave behind one single word. And the duplicate words need not even be consecutive.
/\b(\w+)\b(?=.*?\b\1\b)/ig
Here, \b is used for Word Boundary, ?= is used for positive lookahead, and \1 is used for back-referencing.
Example
Source
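To see what the lookahead version actually matches, the sketch below lists every word that occurs again later in the string. Note that it is the earlier occurrence that is matched, so removing the matches keeps the last copy of each word. The function name wordsRepeatedLater is hypothetical.

```javascript
// Hypothetical helper: return each word occurrence that appears again
// somewhere to its right (case-insensitively).
function wordsRepeatedLater(text) {
  const rx = /\b(\w+)\b(?=.*?\b\1\b)/gi;
  return Array.from(text.matchAll(rx), (m) => m[1]);
}

console.log(wordsRepeatedLater('one two one'));       // ['one']
console.log(wordsRepeatedLater('red blue green'));    // []
console.log(wordsRepeatedLater('Yes no maybe yes'));  // ['Yes']
```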
The widely-used PCRE library can handle such situations (you won't achieve the same with POSIX-compliant regex engines, though):
(\b\w+\b)\W+\1
Here is one that catches multiple words multiple times:
(\b\w+\b)(\s+\1)+
No. That is an irregular grammar. There may be engine-/language-specific regular expressions that you can use, but there is no universal regular expression that can do that.
This is the regex I use to remove duplicate phrases in my twitch bot:
(\S+\s*)\1{2,}
(\S+\s*) looks for any run of non-whitespace characters, followed by optional whitespace.
\1{2,} then requires at least 2 further repetitions of that phrase. In other words, it matches once there are 3 or more identical phrases in a row.
Since some developers are coming to this page in search of a solution which not only eliminates duplicate consecutive non-whitespace substrings, but triplicates and beyond, I'll show the adapted pattern.
Pattern: /(\b\S+)(?:\s+\1\b)+/ (Pattern Demo)
Replace: $1 (replaces the fullstring match with capture group #1)
This pattern greedily matches a "whole" non-whitespace substring, then requires one or more copies of the matched substring which may be delimited by one or more whitespace characters (space, tab, newline, etc).
Specifically:
\b (word boundary) characters are vital to ensure partial words are not matched.
The second parenthetical is a non-capturing group, because this variable width substring does not need to be captured -- only matched/absorbed.
the + (one or more quantifier) on the non-capturing group is more appropriate than * because * will "bother" the regex engine to capture and replace singleton occurrences -- this is wasteful pattern design.
*note if you are dealing with sentences or input strings with punctuation, then the pattern will need to be further refined.
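As a quick way to try this pattern and its $1 replacement from JavaScript, here is a minimal sketch. The function name collapseRepeats is made up for illustration, and the g flag is added so that every run in the input is collapsed.

```javascript
// Hypothetical helper: collapse runs of identical whitespace-separated
// substrings down to a single copy.
function collapseRepeats(text) {
  return text.replace(/(\b\S+)(?:\s+\1\b)+/g, '$1');
}

console.log(collapseRepeats('here here here here is ahi-ahi ahi-ahi'));
// here is ahi-ahi
console.log(collapseRepeats('Paris in the the spring.'));
// Paris in the spring.
```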
The example in JavaScript: The Good Parts can be adapted to do this:
var doubled_words = /([A-Za-z\u00C0-\u1FFF\u2800-\uFFFD]+)\s+\1(?:\s|$)/gi;
\b uses \w for word boundaries, where \w is equivalent to [0-9A-Z_a-z]. If you don't mind that limitation, the accepted answer is fine.
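The limitation is easy to reproduce with an accented word. The sketch below compares the ASCII-only pattern with one possible alternative that uses Unicode property escapes and lookarounds (both need a reasonably modern JavaScript engine and the u flag). The names asciiDoubled, unicodeDoubled and findDoubled are assumptions made for this example, not part of any answer here.

```javascript
// ASCII-only: \w and \b do not treat é as a word character.
const asciiDoubled = /\b(\w+)\s+\1\b/g;

// Assumed alternative: letters, digits and _ from any script, with
// lookarounds standing in for \b.
const unicodeDoubled = /(?<![\p{L}\p{N}_])([\p{L}\p{N}_]+)\s+\1(?![\p{L}\p{N}_])/gu;

// Hypothetical helper: list the doubled words found by a given pattern.
function findDoubled(text, rx) {
  return Array.from(text.matchAll(rx), (m) => m[1]);
}

const text = 'caf\u00e9 caf\u00e9 noir';
console.log(findDoubled(text, asciiDoubled));   // []
console.log(findDoubled(text, unicodeDoubled)); // ['café']
```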
This expression (inspired from Mike, above) seems to catch all duplicates, triplicates, etc, including the ones at the end of the string, which most of the others don't:
s.replace(/(^|\s+)(\S+)(($|\s+)\2)+/g, "$1$2")
I know the question asked to match duplicates only, but a triplicate is just 2 duplicates next to each other :)
First, I put (^|\s+) to make sure it starts with a full word, otherwise "child's steak" would go to "child'steak" (the "s"'s would match). Then, it matches all full words ((\b\S+\b)), followed by an end of string ($) or a number of spaces (\s+), the whole repeated more than once.
I tried it like this and it worked well:
var s = "here here here here is ahi-ahi ahi-ahi ahi-ahi joe's joe's joe's joe's joe's the result result result";
console.log(s.replace(/(\b\S+\b)(($|\s+)\1)+/g, "$1"));
--> here is ahi-ahi joe's the result
Try this regular expression; it handles all repeated-word cases:
\b(\w+)\s+\1(?:\s+\1)*\b
I think another solution would be to use named capture groups and backreferences like this:
/.* (?<mytoken>\w+)\s+\k<mytoken> .*/
OR
/.*(?<mytoken>\w{3,}).+\k<mytoken>.*/
Kotlin:
val regex = Regex(""".* (?<myToken>\w+)\s+\k<myToken> .*""")
val input = "This is a test test data"
val result = regex.find(input)
println(result!!.groups["myToken"]!!.value)
Java:
var pattern = Pattern.compile(".* (?<myToken>\\w+)\\s+\\k<myToken> .*");
var matcher = pattern.matcher("This is a test test data");
var isFound = matcher.find();
var result = matcher.group("myToken");
System.out.println(result);
JavaScript:
const regex = /.* (?<myToken>\w+)\s+\k<myToken> .*/;
const input = "This is a test test data";
const result = regex.exec(input);
console.log(result.groups.myToken);
// OR, reusing input from above with a global regex and matchAll
const globalRegex = /.* (?<myToken>\w+)\s+\k<myToken> .*/g;
const allMatches = [...input.matchAll(globalRegex)];
console.log(allMatches[0].groups.myToken);
All the above detect the test as the duplicate word.
Tested with Kotlin 1.7.0-Beta, Java 11, Chrome and Firefox 100.
You can use this pattern:
\b(\w+)(?:\W+\1\b)+
This pattern can be used to match all duplicated word groups in sentences. :)
Here is a sample utility function written in Java 17, which replaces all duplications with the first occurrence:
public String removeDuplicates(String input) {
var regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
var pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
var matcher = pattern.matcher(input);
while (matcher.find()) {
input = input.replaceAll(matcher.group(), matcher.group(1));
}
return input;
}
As far as I can see, none of these would match:
London in the
the winter (with "the winter" on a new line)
Although matching duplicates on the same line is fairly straightforward, I haven't been able to come up with a solution for the situation in which they stretch over two lines (with Perl).
To find duplicate words that have no leading or trailing non whitespace character(s) other than a word character(s), you can use whitespace boundaries on the left and on the right making use of lookarounds.
The pattern will have a match in:
Paris in the the spring.
Not that that is related.
The pattern will not have a match in:
This is $word word
(?<!\S)(\w+)\s+\1(?!\S)
Explanation
(?<!\S) Negative lookbehind, assert not a non whitespace char to the left of the current location
(\w+) Capture group 1, match 1 or more word characters
\s+ Match 1 or more whitespace characters (note that this can also match a newline)
\1 Backreference to match the same as in group 1
(?!\S) Negative lookahead, assert not a non whitespace char to the right of the current location
See a regex101 demo.
To find 2 or more duplicate words:
(?<!\S)(\w+)(?:\s+\1)+(?!\S)
This part of the pattern (?:\s+\1)+ uses a non capture group to repeat 1 or more times matching 1 or more whitespace characters followed by the backreference to match the same as in group 1.
See a regex101 demo.
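The whitespace-boundary pattern can be checked from JavaScript with a few sample strings, assuming an engine that supports lookbehind. The helper name hasDoubledWord is made up for this sketch. The last call shows that \s+ also spans a line break when the whole text is held in one string.

```javascript
// Whitespace boundaries on both sides, 2 or more copies of the same word.
const doubled = /(?<!\S)(\w+)(?:\s+\1)+(?!\S)/;

// Hypothetical helper: true if the text contains a doubled word.
function hasDoubledWord(text) {
  return doubled.test(text);
}

console.log(hasDoubledWord('Paris in the the spring.'));  // true
console.log(hasDoubledWord('Not that that is related.')); // true
console.log(hasDoubledWord('This is $word word'));        // false
console.log(hasDoubledWord('London in the\nthe winter')); // true
```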
Alternatives without using lookarounds
You could also make use of a leading and trailing alternation matching either a whitespace char or assert the start/end of the string.
Then use a capture group 1 for the value that you want to get, and use a second capture group with a backreference \2 to match the repeated word.
Matching 2 duplicate words:
(?:\s|^)((\w+)\s+\2)(?:\s|$)
See a regex101 demo.
Matching 2 or more duplicate words:
(?:\s|^)((\w+)(?:\s+\2)+)(?:\s|$)
See a regex101 demo.
Use this in case you want case-insensitive checking for duplicate words.
(?i)\\b(\\w+)\\s+\\1\\b

javascript regex to return letters only

My string can be something like A01, B02, C03, possibly AA18 in the future as well. I thought I could use a regex to get just the letters and work on my regex since I haven't done much with it. I wrote this function:
function rowOffset(sequence) {
console.log(sequence);
var matches = /^[a-zA-Z]+$/.exec(sequence);
console.log(matches);
var letter = matches[0].toUpperCase();
return letter;
}
var x = "A01";
console.log(rowOffset(x));
My matches continue to be null. Am I doing this correctly? Looking at this post, I thought the regex was correct: Regular expression for only characters a-z, A-Z
You can use String#replace to remove all non letters from input string:
var r = 'AA18'.replace(/[^a-zA-Z]+/g, '');
//=> "AA"
Your main issue is the use of the ^ and $ characters in the regex pattern. ^ indicates the beginning of the string and $ indicates the end, so your pattern is looking for a string that consists ONLY of one or more letters, from the beginning to the end of the string.
Additionally, if you want to get each individual instance of the letters, you want to include the "global" indicator (g) at the end of your regex pattern: /[a-zA-Z]+/g. Leaving that out means that it will only find the first instance of the pattern and then stop searching . . . adding it will match all instances.
Those two updates should get you going.
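Applied to the function from the question, the first change could look like the sketch below. Returning null when no letters are found is an assumption added here, since the original code would throw in that case.

```javascript
function rowOffset(sequence) {
  // No ^ or $: match the first run of letters anywhere in the string.
  const matches = /[a-zA-Z]+/.exec(sequence);
  // Assumption: signal "no letters" with null instead of throwing.
  return matches ? matches[0].toUpperCase() : null;
}

console.log(rowOffset('A01'));  // A
console.log(rowOffset('aa18')); // AA
console.log(rowOffset('123'));  // null
```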
EDIT:
Also, you may want to use match() rather than exec(). If you have a string of multiple values (e.g., "A01, B02, C03, AA18"), match() will return them all in an array, whereas, exec() will only match the first one. If it is only ever one value, then exec() will be fine (and you also wouldn't need the "global" flag).
If you want to use match(), you need to change your code order just a bit to:
var matches = sequence.match(/[a-zA-Z]+/g);
To return an array of separate letters remove +:
var matches = sequence.match(/[a-zA-Z]/g);
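The difference between the two patterns is easiest to see on a string with several values. This is a small sketch; the helper names letterGroups and singleLetters are made up, and the || [] fallback covers the case where match() returns null.

```javascript
const sequence = 'A01, B02, C03, AA18';

// Hypothetical helpers: with + each run of letters is one match,
// without + every letter is its own match.
function letterGroups(text) {
  return text.match(/[a-zA-Z]+/g) || [];
}

function singleLetters(text) {
  return text.match(/[a-zA-Z]/g) || [];
}

console.log(letterGroups(sequence));  // ['A', 'B', 'C', 'AA']
console.log(singleLetters(sequence)); // ['A', 'B', 'C', 'A', 'A']
```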
You're confused about what the goal of the other question was: he wanted to check that there were only letters in his string.
You need to remove the anchors ^ and $, which match the beginning and the end of the string respectively:
[a-zA-Z]+
This will match the first run of letters in your input string.
If there might be more (ie you want multiple matches in your single string), use
sequence.match(/[a-zA-Z]+/g)
This /[^a-z]/g solves the problem. Look at the example below.
function pangram(str) {
let regExp = /[^a-z]/g;
let letters = str.toLowerCase().replace(regExp, '');
document.getElementById('letters').innerHTML = letters;
}
pangram('GHV 2## %hfr efg uor7 489(*&^% knt lhtkjj ngnm!##$%^&*()_');
<h4 id="letters"></h4>
You can do this:
var r = 'AA18'.replace(/[\W\d_]/g, ''); // AA
Also can be done by String.prototype.split(regex).
'AA12BB34'.split(/(\d+)/); // ["AA", "12", "BB", "34", ""]
'AA12BB34'.split(/(\d+)/)[0]; // "AA"
Here the regex divides the given string by digits (\d+)
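The digits show up in the result because the separator pattern contains a capturing group. If only the letter parts are wanted, a variant without the group can be used, as in this sketch (the helper name letterRuns is hypothetical):

```javascript
// Hypothetical helper: split on digit runs and drop empty pieces.
function letterRuns(text) {
  // No capturing group, so the digits themselves are not returned.
  return text.split(/\d+/).filter(Boolean);
}

console.log('AA12BB34'.split(/\d+/)); // ['AA', 'BB', '']
console.log(letterRuns('AA12BB34'));  // ['AA', 'BB']
console.log(letterRuns('A01'));       // ['A']
```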
