Match the same start and end character of a string with Regex - javascript

I'm trying to match the start and end character of a string to be the same vowel. My regex is working in most scenarios, but failing in others:
var re = /([aeiou]).*\1/;
re.test(str);
Sample input:
abcde, output - false (Valid)
abcda, output - true (Valid)
aabcdaa, output - true (Valid)
aeqwae, output - true (Not valid)
ouqweru, output - true (Not valid)

You need to add anchors to your regex.
When you have, for example:
aeqwae
You say the output is true, but it's not valid because a is not the same as e. Without anchors, the regex is free to match any substring: the first a is captured, .* consumes eqw, and \1 matches the second a (the character just before the final e). Thus, the match is valid. So, you get this:
[aeqwa]e
The string enclosed in the brackets is the actual match and why it returns true.
If you change your regex to this:
/^([aeiou]).*\1$/
By adding ^, you tell it that the start of the match must be the start of the string and by adding $ you tell it that the end of the match must be the end of the string. This way, if there's a match, the whole string must be matched, meaning that aeqwae will no longer get matched.
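To see the difference concretely, here is a small sketch that runs both patterns over the sample inputs from the question. The variable names are only for illustration.

```javascript
// Unanchored: the match may start and end anywhere in the string.
const unanchored = /([aeiou]).*\1/;
// Anchored: the captured vowel must be the first and the last character.
const anchored = /^([aeiou]).*\1$/;

const samples = ['abcde', 'abcda', 'aabcdaa', 'aeqwae', 'ouqweru'];

// Collect [input, unanchored result, anchored result] for each sample.
const results = samples.map((s) => [s, unanchored.test(s), anchored.test(s)]);
results.forEach((row) => console.log(row.join(' ')));

// The substring the unanchored pattern actually matched in 'aeqwae'.
const partial = 'aeqwae'.match(unanchored)[0];
console.log(partial); // aeqwa
```

Only the last two inputs change: they pass the unanchored test and fail the anchored one.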
A great tool for testing regex is Regex101. Give it a try!
Note: Depending on your input, you might need to set the global (g) or multi-line (m) flag. The global flag prevents regex from returning after the first match. The multi-line flag makes ^ and $ match the start and end of the line (not the string). I used both of them when testing with your input.
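One caveat worth sketching when combining the g flag with test(): a global regex object keeps its position between calls, so reusing it can give surprising results. The example below also shows the effect of the m flag on a two-line input; the input strings are made up for illustration.

```javascript
// A regex with the g flag remembers where the last match ended (lastIndex),
// so calling test() repeatedly on the same object can alternate results.
const globalRe = /^([aeiou]).*\1$/g;
const first = globalRe.test('abcda');  // matches, lastIndex moves to 5
const second = globalRe.test('abcda'); // search starts at 5, fails, lastIndex resets to 0
console.log(first, second); // true false

// Without m, ^ and $ only match at the start and end of the whole string.
const multiLineInput = 'xyz\nabcda';
const withoutM = /^([aeiou]).*\1$/.test(multiLineInput);
const withM = /^([aeiou]).*\1$/m.test(multiLineInput);
console.log(withoutM, withM); // false true
```

If you only need a yes/no answer for one string at a time, leaving out the g flag avoids the lastIndex behavior altogether.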

Just a different version of @Hristiyan Dodov's answer that I have written for fun.
const regex = /^(a|e|i|o|u).*\1$/
const strings = ['abcde', 'abcda', 'aabcdaa', 'aeqwae', 'ouqweru']
strings.forEach((e)=>{
const result = regex.test(e)
console.log(e, result)
})

Correct answer is already mentioned above, just for some more clarification:
regEx = /^([aeiou])(.*)\1$/
Here, \1 is a backreference that matches the same text captured by the first group again; you can reuse the same backreference more than once. Most regex flavors support up to 99 capturing groups and double-digit backreferences, so \99 is a valid backreference if your regex has 99 capturing groups.
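A minimal sketch of what the capture groups and the backreference hold for one of the sample inputs. It also illustrates that commas inside a character class are literal characters, so a class written as [a,e,i,o,u] accepts a comma as well. The variable names are illustrative.

```javascript
const vowelEnds = /^([aeiou])(.*)\1$/;
const m = 'abcda'.match(vowelEnds);

// m[0] is the whole match, m[1] the captured vowel, m[2] everything in between.
console.log(m[0], m[1], m[2]); // abcda a bcd

// Commas in a character class are not separators; they are matched literally.
const withCommas = /^([a,e,i,o,u])(.*)\1$/;
const commaAccepted = withCommas.test(',abc,');
const commaRejected = vowelEnds.test(',abc,');
console.log(commaAccepted, commaRejected); // true false
```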

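/^([aeiou])[a-z]*\1$/
Just a small improvement: this only allows lowercase letters between the two vowels. Note the * after [a-z]; without it, exactly one letter is allowed in the middle.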
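Assuming the intent is to allow only lowercase letters between the two vowels, this sketch compares .* with [a-z]* and shows why the quantifier matters. The test strings are made up for illustration.

```javascript
const anyMiddle = /^([aeiou]).*\1$/;         // anything between the vowels
const lettersOnly = /^([aeiou])[a-z]*\1$/;   // only lowercase letters between
const singleLetter = /^([aeiou])[a-z]\1$/;   // exactly one lowercase letter between

const checks = {
  anyMiddleDigits: anyMiddle.test('a1 2a'),     // digits and spaces are allowed
  lettersOnlyDigits: lettersOnly.test('a1 2a'), // digits and spaces are rejected
  lettersOnlyWord: lettersOnly.test('abcda'),
  singleLetterWord: singleLetter.test('abcda'), // three letters in the middle, too many
  singleLetterShort: singleLetter.test('aba'),
};
console.log(checks);
```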

Related

Finding words between special characters using Unicode regex

I have a working regular expression which matches the words below.
Input:
(T1.Test)
(AT.Test)
Match:
T1.Test
AT.Test
But when I try replacing \w with the Unicode \p{L}, the regex does not work properly anymore.
Current expression: /(?:\w+\()+|\b(\p{L}+(?:\.\p{L}+)?)\b(?!')/gu
Input:
(T1.Test)
(AT.Test)
(ワーク.Test)
Match:
Test
Test
Test
How do I make my regex work properly now that it has the unicode flag?
My expected output should be:
T1.Test
AT.Test
ワーク.Test
First of all, \p{L} does not match digits, so (T1.Test) will not be matched, whereas with \w it would be.
Your regex is divided into two big OR parts, "1 | 2":
(?:\w+\()+ this non-capturing group matches anything of the shape anyAmountOfLetters(. If it succeeds, the rest of the regex is ignored entirely; I don't know if that was intentional. For example, this will trigger your regex: aaa(333.6780), with aaa( as the full match but 0 groups, since you are not capturing it.
\b(\p{L}+(?:\.\p{L}+)?)\b(?!') this requires your expression to start at a word boundary. But \b matches between two characters (Regex Tutorial) only if one is a word character and the other is not, and in JavaScript \b treats only [A-Za-z0-9_] as word characters, even with the u flag.
In your case, neither the opening round bracket nor the character after it is a word character, so there is no word boundary between them: (ワーク.Test) will not work, but 3ワーク.Test) will.
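The two observations above can be checked with a short sketch. It assumes a runtime that supports Unicode property escapes (\p{...} with the u flag); the variable names are illustrative.

```javascript
// \b needs a word character ([A-Za-z0-9_]) on exactly one side.
const startsAtBoundary = /\bワ/u;
const afterBracket = startsAtBoundary.test('(ワーク.Test)'); // "(" and "ワ" are both non-word
const afterDigit = startsAtBoundary.test('3ワーク.Test)');   // "3" is a word character
console.log(afterBracket, afterDigit); // false true

// \p{L} matches letters in any script, but not digits.
const onlyLetters = /^\p{L}+$/u;
const digitsRejected = onlyLetters.test('T1');
const katakanaAccepted = onlyLetters.test('ワーク');
console.log(digitsRejected, katakanaAccepted); // false true
```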
To fix that, you can use only the second part (if the first is not really needed to check something other than the inputs shown in the question):
// slightly edited, allows digits: (3123.123) => 3123.123
// note: inside a character class, [\b] matches a backspace character, so [\b]* is effectively a no-op here
input.match(/[\b]*\(([\d\p{L}]+(?:\.[\d\p{L}]+)?)\)[\b]*(?!')/gu)
// slightly edited, must start with a letter: (A1.Test) works, (1A.Test) doesn't
input.match(/[\b]*\((\p{L}[\d\p{L}]*(?:\.[\d\p{L}]+)?)\)[\b]*(?!')/gu)
Also, the last part \b(?!') is optional for the input cases you gave, but I suppose it is useful for other purposes.
If you want to keep the regex simple for those inputs, this would also work:
// can use digits: (3123.123) => 3123.123
input.match(/\(([\p{L}\d]+(?:\.[\p{L}\d]+))\)/gu)
// must start with letter: (A1.Test) works, (1A.Test) doesn't
input.match(/\((\p{L}[\p{L}\d]*(?:\.[\p{L}\d]+))\)/gu)
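One thing to keep in mind: with the g flag, match() returns the full matches (including the parentheses), not the capture groups. A sketch of pulling out only the captured text with matchAll(), assuming a runtime that supports String.prototype.matchAll and Unicode property escapes; the input string is rebuilt from the question's sample lines.

```javascript
const input = '(T1.Test)\n(AT.Test)\n(ワーク.Test)';
const pattern = /\(([\p{L}\d]+(?:\.[\p{L}\d]+))\)/gu;

// match() with the g flag yields whole matches, brackets included.
const fullMatches = input.match(pattern);
console.log(fullMatches); // [ '(T1.Test)', '(AT.Test)', '(ワーク.Test)' ]

// matchAll() exposes the groups of every match; m[1] is the text inside the brackets.
const captured = Array.from(input.matchAll(pattern), (m) => m[1]);
console.log(captured); // [ 'T1.Test', 'AT.Test', 'ワーク.Test' ]
```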

How to extract a string that conforms to a regex?

Say I have a RegEx like the following:
^[a-zA-Z]\w{12}$
And I have the following string:
%7AgTy!5hG^vxWa2#AgW
I would like to "pull" out of that string something that conforms to that regex. In this example we would get the following:
AgTy5hGvxWa2A
Reason: it starts with A because the regex says the first letter must be [a-zA-Z] (so it skips the first 2 characters), and then it pulls successive \ws out until it reaches 12 characters.
Is this sort of thing possible?
Edit: My apologies for being unclear. I'm not looking for a new regular expression that will give the proper output. Rather, I'm looking for a way to use the existing RegEx to extract the proper output. In my program these regular expressions are entered by hand by the user to extract a password from a long base256 hash such that it will conform to these existing password requirement regexes.
Instead of trying to match what you want and reconstructing the string, replace everything you don't want with nothing. This looks like extraction, but it actually does the opposite: it removes everything you don't want to keep. I also dropped $ from the end of your original pattern; otherwise it would never match the string in your question.
See regex in use here
^[^a-z]+|\W+
^ Assert position at the start of the line
[^a-z]+ Matches any character that is not in the range a-z one or more times. Since the i flag is specified, this also matches A-Z
\W+ Match any non-word character one or more times
const regex = /^[^a-z]+|\W+/gi
const a = [
`%7AgTy!5hG^vxWa2#AgW`,
`%7AgTy!5hG^vxWa2#`
]
a.forEach(function(s) {
var clean = s.replace(regex, '')
var match = clean.match(/^[a-z]\w{12}/i)
console.log(match)
})

How to match string inside second set of brackets with Regex Javascript

Here is my string:
type_logistics[][delivery]
type_logistics[][random]
type_logistics[][word]
I would like to pull out the word, whatever it is, inside the second set of brackets. I thought that meant doing something like this:
Indicate that the start of the string I want to capture is [ by writing ^\[
Indicate that there will be any number 1+ of characters using [a-z]+
Indicate that the end will be ] by using \]$
The above three steps should get me to [delivery], [random], [word] in which case I'd just wrap the entire regex in a capture parenthesis ()
My finished statement would have been
string.match(/^\[([a-z]+)\]$/)
Have been playing with regex101.com and literally none of my assumptions have worked LOL. Please help?
With ^ you are asserting that the string you are checking starts at that point. Your string starts with type_logistics, not with the [ that the regex expects.
To detect the 2nd set of brackets you need to either add the type_logistics[] to the regex or just match everything before the 1st set of brackets with .*
When working with multiple lines (for example during testing on regex101), don't forget to set the modifiers gm
g modifier: global. All matches (don't return on first match)
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
These all would work for your test cases
/^.*\[\]\[([a-z]+)\]$/gm
/^type_logistics\[\]\[([a-z]+)\]$/gm
/^.*\[([a-z]+)\]$/gm
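A short sketch of the first of these patterns applied to the three sample lines, next to the pattern from the question. It uses matchAll() so that the capture group, not the whole line, is collected; this assumes a runtime with String.prototype.matchAll, and the variable names are illustrative.

```javascript
const lines = [
  'type_logistics[][delivery]',
  'type_logistics[][random]',
  'type_logistics[][word]',
].join('\n');

// The pattern from the question: ^ forces the match to begin with "[".
const original = /^\[([a-z]+)\]$/gm;
// Allow anything before the empty first pair of brackets.
const fixed = /^.*\[\]\[([a-z]+)\]$/gm;

// m[1] is the word inside the second pair of brackets.
const originalWords = Array.from(lines.matchAll(original), (m) => m[1]);
const words = Array.from(lines.matchAll(fixed), (m) => m[1]);

console.log(originalWords); // []
console.log(words);         // [ 'delivery', 'random', 'word' ]
```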
Match [ followed by a-z and followed by ], join the matches back into a string, split on the [ character, and filter out the empty strings:
var str = "type_logistics[][delivery] type_logistics[][random] type_logistics[][word]"
var res = str.match(/(\[[a-z]+)(?=\])/g).join("").split(/\[/).filter(Boolean);
console.log(res);
document.body.textContent = res;

why does ((a(-b)?)(?!Z)) match the a in "a-bZ"?

I want to write a regular Expression that matches
a
a-b
but only if these sequences are not followed by Z
((a(-b)?)(?!Z))
a matches a ok
a-b matches a-b ok
aZ empty ok
a-bZ matches a NOT OK
Why does "a-bZ" match the first a although there is a group around (a(-b)?) ?
How can I correct it?
Need this in javascript RegExp, which should not matter however. Tried it in http://regexpal.com/
The a in a-bZ is matched because the optional (-b)? can match nothing, and (?!Z) then succeeds since the next character is -, not Z.
Because (-b) is optional, every string matched by ((a)(?!Z)) is also matched.
You could match (a(?!Z))|(a-b(?!Z))
However, this will still match the a in a-bZ (because a is followed by a non-Z character).
If you want to find all instances of the strings where, for example, a-c doesn't get matched (even though - is a non-Z character), you could do this:
(a(?![-Z]))|(a-b(?!Z))
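A quick sketch comparing the pattern from the question with this alternation on the question's inputs, plus a-c. The helper firstMatch is hypothetical and only returns the matched text or null.

```javascript
const original = /((a(-b)?)(?!Z))/;
const fixed = /(a(?![-Z]))|(a-b(?!Z))/;

// Hypothetical helper: the matched substring, or null when nothing matches.
const firstMatch = (re, s) => {
  const m = s.match(re);
  return m ? m[0] : null;
};

const inputs = ['a', 'a-b', 'aZ', 'a-bZ', 'a-c'];
const originalResults = inputs.map((s) => firstMatch(original, s));
const fixedResults = inputs.map((s) => firstMatch(fixed, s));

console.log(originalResults); // [ 'a', 'a-b', null, 'a', 'a' ]
console.log(fixedResults);    // [ 'a', 'a-b', null, null, null ]
```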
You could use atomic grouping to make your regex work. Unfortunately, the JavaScript regex engine does not support this feature.
But there is a trick to mimic its effect using a look-ahead and a back-reference (explained here):
(?=(pattern to make atomic))\1
so with your a-b or just a situation, this would become:
(?=(a-b|a))\1(?!Z)
Note that the longer sub-pattern a-b needs to be mentioned first in the group, otherwise it does not work.
The key mechanism is that the look-ahead finds the earliest, longest-possible sub-match, while the back-reference prevents any backtracking in the engine and moves the position in the string, so the following test (?!Z) can be executed.
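The look-ahead trick can be tried out with a small sketch. It also shows what happens when the shorter alternative is listed first. The helper matchedText is hypothetical and only returns the matched text or null.

```javascript
// Longer alternative first: the look-ahead captures a-b when it is present.
const atomicLike = /(?=(a-b|a))\1(?!Z)/;
// Shorter alternative first: the look-ahead always settles for a.
const wrongOrder = /(?=(a|a-b))\1(?!Z)/;

// Hypothetical helper: the matched substring, or null when nothing matches.
const matchedText = (re, s) => {
  const m = s.match(re);
  return m ? m[0] : null;
};

const inputs = ['a', 'a-b', 'aZ', 'a-bZ'];
const atomicResults = inputs.map((s) => matchedText(atomicLike, s));
const wrongOrderResults = inputs.map((s) => matchedText(wrongOrder, s));

console.log(atomicResults);     // [ 'a', 'a-b', null, null ]
console.log(wrongOrderResults); // [ 'a', 'a', null, 'a' ]
```

Once the look-ahead has succeeded, the engine does not go back into it to try the other alternative, which is why a-bZ yields no match with the first pattern.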
If you specify the start and end anchors, the above regex ((a(-b)?)(?!Z)) doesn't match the string a-bZ. Because the anchors are not specified and (-b) is optional, the regex engine first tries to match a-b and then discards that match on seeing the following Z. The engine then backtracks, skipping the optional -b. Now it's on a, which is not immediately followed by Z, so the engine matches the letter a.

Filter characters in this RegEx

I have this regular expression to match a valid name: /^['"\s\-.*0-9\u00BF-\u1FFF\u2C00-\uD7FF\w]+$/.test(name)
I'm having trouble figuring out how to transform this match style regex into one designed to filter out invalid characters using replace.
Ideally I would like to be able to take an invalid name in name, run it through the replace to replace any invalid characters, and then have the original test return true no matter what (as invalid characters will be filtered out).
Just use a negated character class by adding a ^ in front:
name.replace(/[^'"\s\-.*0-9\u00BF-\u1FFF\u2C00-\uD7FF\w]/g, "")
Example:
var name = "'41%!\u2000abc";
var sanitized = name.replace(/[^'"\s\-.*0-9\u00BF-\u1FFF\u2C00-\uD7FF\w]/g, "");
console.log(/^['"\s\-.*0-9\u00BF-\u1FFF\u2C00-\uD7FF\w]+$/.test(name)); // false
console.log(/^['"\s\-.*0-9\u00BF-\u1FFF\u2C00-\uD7FF\w]+$/.test(sanitized)); // true
/^['"\s\-.*0-9\u00BF-\u1FFF\u2C00-\uD7FF\w]+$/
The + at the end means to match one or more characters of the types inside the brackets. The ^ at the beginning combined with the $ at the end means the whole input must match from its start to its end. So the given regex matches a string consisting only of characters from the set.
What you want is this:
/[^'"\s\-.*0-9\u00BF-\u1FFF\u2C00-\uD7FF\w]/g
[^...] means to match any character that is NOT inside the brackets; it is the opposite of [...].
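Since the validating and the filtering regex share the same character set, one option is to build both from a single string so they cannot drift apart. This is only a sketch: the names allowedClass, validName, invalidChars and sanitizeName are made up for illustration.

```javascript
// The allowed characters, written once. String.raw keeps the backslashes intact
// so the RegExp constructor sees the same escapes as the regex literal would.
const allowedClass = String.raw`'"\s\-.*0-9\u00BF-\u1FFF\u2C00-\uD7FF\w`;

const validName = new RegExp(`^[${allowedClass}]+$`);      // whole string must be allowed
const invalidChars = new RegExp(`[^${allowedClass}]`, 'g'); // any single disallowed character

// Hypothetical helper: strip every character outside the allowed set.
function sanitizeName(name) {
  return name.replace(invalidChars, '');
}

const cleaned = sanitizeName("'41%!\u2000abc");
console.log(validName.test(cleaned)); // true

// If every character is invalid, the result is empty and the + still rejects it.
const emptied = sanitizeName('%%%');
console.log(emptied === '', validName.test(emptied)); // true false
```

So the original test returns true after filtering only when at least one valid character remains.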
