Combining 2 regexes, one with exact match using OR operator - javascript

I am trying to combine:
^[a-zA-Z.][a-zA-Z'\\- .]*$
with
(\W|^)first\sname(\W|$)
which should check for the exact phrase "first name", if that is correct. It should match either the first regex OR the exact phrase from the second. I tried this, but it appears to be invalid:
^(([a-zA-Z.][a-zA-Z'\\- .]*$)|((\W|^)first\sname(\W|$))
This is in JavaScript, by the way.

Combining regular expressions generally can be done simply in the following way:
Regex1 + Regex2 = (Regex1|Regex2)
^[a-zA-Z.][a-zA-Z'\\- .]*$
+ (\W|^)first\sname(\W|$) =
(^[a-zA-Z.][a-zA-Z'\\- .]*$|(\W|^)first\sname(\W|$))
Because some SO users have a hard time understanding the math analogy, here's a full explanation in words.
If you have a regex with content REGEX1 and a second regex with content REGEX2, and you want to combine them the way the OP described in the question, a simple (unoptimized) way to do it is the following:
(REGEX1|REGEX2)
That is, you wrap both regular expressions in parentheses and separate them with |.
Your regex would be the following:
(^[a-zA-Z.][a-zA-Z'\\- .]*$|(\W|^)first\sname(\W|$))
Your first regex has an error in it, though, that makes it invalid. Try this instead:
(^[a-zA-Z.][a-zA-Z'\- .]*$|(\W|^)first\sname(\W|$))
You had \\ in the second character class where you wanted \.
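A quick way to sanity-check the combined pattern is to run it against a few inputs. This is a minimal sketch; the sample strings are made up for illustration and do not come from the question.

```javascript
// Combined pattern: the name-like pattern OR the phrase "first name".
// Inside the character class, '\-' (single backslash) is an escaped, literal hyphen.
const combined = /(^[a-zA-Z.][a-zA-Z'\- .]*$|(\W|^)first\sname(\W|$))/;

// Hypothetical sample inputs
console.log(combined.test("O'Brien-Smith"));  // true: first alternative
console.log(combined.test("first name: 42")); // true: second alternative (':' fails the first)
console.log(combined.test("123"));            // false: neither alternative matches
```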

The problem is that the first regex is broken: you don't need to double-escape characters. As written,
\\-
is parsed as a range from \ (ASCII 92) to the space character (ASCII 32). Because that range is out of order, the regex is invalid. Remove one of the backslashes.
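The effect is easy to reproduce with the RegExp constructor, where the pattern is written as a string and therefore needs one extra level of escaping. A minimal sketch; the helper name compiles is hypothetical:

```javascript
// Hypothetical helper: returns true if the source string compiles as a RegExp.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false; // SyntaxError
  }
}

// The string "\\\\-" becomes the regex source \\- : a literal backslash (92)
// followed by a range up to the space character (32), which is out of order.
console.log(compiles("^[a-zA-Z.][a-zA-Z'\\\\- .]*$")); // false
// The string "\\-" becomes \- : an escaped, literal hyphen.
console.log(compiles("^[a-zA-Z.][a-zA-Z'\\- .]*$")); // true
```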

Related

Unable to find a string matching a regex pattern

While I try to submit a form, a JavaScript regex validation always returns false for my string.
Regex: ^(([a-zA-Z]:)|(\\\\{2}\\w+)\\$?)(\\\\(\\w[\\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
I have tried the following strings against it:
abc.jpg,
abc:.jpg,
a:.jpg,
a:asdas.jpg,
What string could possibly match this regex?
This regex won't match against anything because of that $? in the middle of the string.
Apparently, applying the optional quantifier ? to the end-of-string anchor $ is not valid (if you paste the regex into https://regex101.com/ it will indeed report an error). Even if the JavaScript parser ignored the error and kept the regex as it is, it would still try to match the end of the string in the middle of a string that is supposed to continue.
Once one level of escaping is removed, it was presumably meant to be \$ (a literal dollar sign), but as written it won't work.
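Assuming a standard ECMAScript engine (for example Node.js or a current browser), both readings can be checked directly: the bare $? is rejected when the pattern is compiled, while the escaped \$? is simply an optional literal dollar sign. A minimal sketch; the helper name isValidRegex is hypothetical:

```javascript
// Hypothetical helper: does this source string compile as a RegExp?
function isValidRegex(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false; // SyntaxError
  }
}

// $ is an assertion, so putting a quantifier after it is a syntax error.
console.log(isValidRegex("a$?b"));   // false
// \$? means "an optional literal dollar sign".
console.log(isValidRegex("a\\$?b")); // true

const optionalDollar = /^a\$?b$/;
console.log(optionalDollar.test("a$b")); // true
console.log(optionalDollar.test("ab"));  // true
```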
If you want your string to be accepted at any cost, you can probably use Firebug or a similar developer tool and edit the string inside the JavaScript code (assuming there is no server-side check too, and that it isn't wrong as well). If you ignore the $?, then a matching string would be \\\\w\\\\ww.jpg (and since the . is unescaped, even \\\\w\\\\ww%jpg is a match).
Of course, I wrote this answer assuming the escaping really is the one you showed in the question. If you need to find a matching string for the correctly escaped version, ^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(\.jpeg|\.JPEG|\.jpg|\.JPG)$, you can use this tool to generate one: http://fent.github.io/randexp.js/ (though it will find weird matches). One matching string is c:\zz.jpg
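This can be confirmed by testing the correctly escaped pattern (written as a regex literal, so only one level of escaping) against the strings from the question plus c:\zz.jpg. A minimal sketch:

```javascript
// Correctly escaped pattern, written as a regex literal.
const pattern = /^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(\.jpeg|\.JPEG|\.jpg|\.JPG)$/;

// Strings from the question, plus c:\zz.jpg (the backslash is doubled in the JS string).
const candidates = ["abc.jpg", "abc:.jpg", "a:.jpg", "a:asdas.jpg", "c:\\zz.jpg"];
for (const s of candidates) {
  console.log(s, pattern.test(s));
}
// Only c:\zz.jpg matches: the pattern requires at least one backslash-prefixed segment.
```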
If you are just looking for a regular expression to match what you got there, go ahead and test this out:
(\w+:?\w*\.[jpe?gJPE?G]+,)
That should match exactly what you are looking for. Of course, you can remove the comma at the end if you don't need it.
If you remove one level of escaping, the actual regex is
^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
After the ^ anchor, the first group (([a-zA-Z]:)|(\\{2}\w+)\$?) matches either a letter followed by a colon, or two backslashes followed by one or more word characters and an optional literal $. Some of the parentheses inside it are unnecessary.
The second part (\\(\w[\w].*))+ matches a backslash followed by two word characters, \w[\w], which looks odd because it's equivalent to \w\w (the second \w doesn't need a character class), followed by any number of any characters. This whole group repeats one or more times.
In the last part, (.jpeg|.JPEG|.jpg|.JPG), the author probably forgot to escape the dot so that it matches a literal dot; \. should be used. This part can be reduced to \.(JPE?G|jpe?g).
It would match something like
A:\12anything.JPEG
\\1$\anything.jpg
Play with it at regex101. A more readable version could be
^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$
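A minimal sketch that checks the more readable version against the two example strings above (backslashes are doubled inside the JavaScript string literals):

```javascript
const readable = /^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$/;

console.log(readable.test("A:\\12anything.JPEG"));  // true
console.log(readable.test("\\\\1$\\anything.jpg")); // true
// The strings from the question contain no backslash, so they cannot match.
console.log(readable.test("a:asdas.jpg"));          // false
```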
Also read the explanation on regex101 to understand any pattern, it's helpful!

Javascript Pattern Validation

Can someone help me with a regular expression to validate a string with this pattern:
aaa [bbb]
I want to accept input in the format shown above. aaa and bbb can each be any combination of one or more words, which can also contain special characters. bbb must be enclosed in square brackets ([...]). The full string can have leading or trailing spaces.
I have tried this:
var re=/\w{0,} \[*\w{0,}\]/
But it returns false on a test string like:
re.test("onc>*!llklk[dd<dfd]")
Your regular expression explicitly requires a space to be present, which becomes obvious if you visualise it with a tool such as Regexper.
This returns false on "onc>*!llklk[dd<dfd]" because there is no space character.
To fix your problem, either use a test string which has a space character, or change your regular expression to not require this character:
var re = /\w{0,}\[*\w{0,}\]/;
re.test("aaa [bbb]");
> true
re.test("onc>*!llklk[dd<dfd]");
> true
You may want to rethink your regular expression though, because as it stands, a single "]" character will pass the test:
re.test("]");
> true
\S+\s*\[\S+?\]
You can try this if you want to allow special characters as well.
http://regex101.com/r/yA1jY6/3
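A quick check of this pattern against the strings discussed above shows that, unlike the original, it rejects a lone ]. A minimal sketch; note that the pattern is unanchored and that \S+ does not allow spaces inside the brackets:

```javascript
const re = /\S+\s*\[\S+?\]/;

console.log(re.test("aaa [bbb]"));           // true
console.log(re.test("onc>*!llklk[dd<dfd]")); // true
console.log(re.test("]"));                   // false
```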
First, to make it easier to read, you could replace {0,} with *, since they mean the same thing.
Next, \w does not match symbols such as > or *; you can use . to match any character (except line terminators).
Then, as another answer points out, you're expecting a space between the two groups (aaa and [bbb]), so your example won't match.
I think this regex is a good starting point (depending on your other requirements).
/.+\[.+\]/
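Because /.+\[.+\]/ is unanchored, it accepts any string that merely contains the shape somewhere. A minimal sketch, assuming the whole input must be "text [text]" with optional surrounding whitespace; the anchored variant strict is a hypothetical tightening, not part of the answer above:

```javascript
// Unanchored pattern from the answer: passes if ANY substring has the shape.
const loose = /.+\[.+\]/;
// Hypothetical anchored variant: the whole string must be "text [text]",
// optionally surrounded by whitespace.
const strict = /^\s*.+\[.+\]\s*$/;

const samples = ["aaa [bbb]", "onc>*!llklk[dd<dfd]", "  aaa [bbb]  ", "]", "aaa [bbb] extra"];
for (const s of samples) {
  console.log(JSON.stringify(s), loose.test(s), strict.test(s));
}
// Only "aaa [bbb] extra" separates the two: loose accepts it, strict rejects it.
```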

Regular expression handles multiple matches as one, how to fix?

I have a regex and a string that contains several matches for it. My regex treats all of these matches as if they were one big match (of course, I don't want this behaviour). Let me show you an example:
My test string (sorry for the gibberish, but its content doesn't matter):
sdfsd -dsf- sdfsdfssdfsfdsfsd -sdfsdf-
My regex in JS code:
view.replace(/(\-(.+)\-)/g, '<span style="background-color:yellow">$1</span>');
My result:
sdfsd <span style="background-color:yellow">-dsf- sdfsdfssdfsfdsfsd -sdfsdf-</span>
As you can see, each of the strings between "-" characters should be enclosed in its own span, but there is only one span. How can I fix this? (Honestly, I don't want to change the (.+) part of my regex, which I think might be the problem, but if there is no other way to do this, let me know.)
In other words, result must be:
sdfsd <span style="background-color:yellow">-dsf-</span> sdfsdfssdfsfdsfsd <span style="background-color:yellow">-sdfsdf-</span>
Feel free to ask me in the comments, and thanks for your help.
honestly I don't want change my (.+) regex part, which I think might be a problem
Why not? It is actually the source of the problem: .+ is greedy, so it consumes everything up to the last - in the string. You can try the following regex instead, which works:
/(\-([^-]+)\-)/g
And if you think dashes can appear between the opening and closing - themselves, you can use the less efficient:
/(\-(.+?)\-)/g
+? makes the match lazy. In other words, after the initial - is matched, .+? matches a single character and then hands control to the following -, which tries to match a dash. If that fails, .+? consumes another character from the input, and so on, until the following - is able to match.
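The difference between the greedy, negated-class and lazy versions is easiest to see side by side on the test string from the question. A minimal sketch (the escapes on - are dropped, since a dash outside a character class needs none):

```javascript
const view = "sdfsd -dsf- sdfsdfssdfsfdsfsd -sdfsdf-";
const wrap = '<span style="background-color:yellow">$1</span>';

// Greedy: .+ runs to the LAST dash, so everything becomes one match.
console.log(view.replace(/(-(.+)-)/g, wrap));
// Negated class: [^-]+ cannot cross a dash, so each -...- pair is matched separately.
console.log(view.replace(/(-([^-]+)-)/g, wrap));
// Lazy: .+? stops at the first dash that lets the rest of the pattern match.
console.log(view.replace(/(-(.+?)-)/g, wrap));
```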
You can try:
view.replace(/-([^-]+)-/g, '<span style="background-color:yellow">$1</span>');
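One detail worth noting: this version has no outer group, so $1 refers only to the text between the dashes and the dashes themselves disappear from the output. A minimal sketch comparing $1 with $&, which String.prototype.replace expands to the whole match:

```javascript
const view = "sdfsd -dsf- sdfsdfssdfsfdsfsd -sdfsdf-";

// $1 is only the captured text between the dashes, so the dashes are dropped.
const withoutDashes = view.replace(/-([^-]+)-/g, '<span style="background-color:yellow">$1</span>');
// $& inserts the entire match, so the dashes are kept.
const withDashes = view.replace(/-([^-]+)-/g, '<span style="background-color:yellow">$&</span>');

console.log(withoutDashes);
console.log(withDashes);
```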

How to replace a substring with open parentheses (

I am a regex newbie trying to use a regex in JavaScript to replace a matching pattern in a string, but only when it is followed by an open parenthesis (. For example, if I have the string
IN(INTERM_LEVEL_IN + (int)X_ID)
I would only like to highlight the first IN( in the string, not the two INs in INTERM_LEVEL_IN, nor the in of int.
What is the Regex to accomplish this?
To match the opening parenthesis, you just need to escape it: IN\(.
For instance, running this in the Firebug console:
>>> "IN(INTERM_LEVEL_IN + (int)X_ID)".replace(/(IN\()/, 'test');
will result in:
"testINTERM_LEVEL_IN + (int)X_ID)"
Parentheses in regular expressions have a special meaning (capturing groups), so when you want them to be interpreted literally, you have to escape them with a \ in front. The regular expression IN\( would match the string IN(.
The following should only match IN( at the beginning of a line:
/^IN\(/
The following would match IN( that is not preceded by any alphanumeric character or underscore:
/\bIN\(/
And finally, the following would match any instance of IN( no matter what precedes it:
/IN\(/
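To see how the three options differ, here is a minimal sketch run against the string from the question and a made-up second string, XIN(a) + IN(b), chosen so that the patterns disagree. For the middle case it uses \b, a word boundary, which succeeds only when IN is not preceded by a letter, digit or underscore (including at the start of the string):

```javascript
const source = "IN(INTERM_LEVEL_IN + (int)X_ID)";
// $& inserts the whole match, so IN( is kept and wrapped.
console.log(source.replace(/\bIN\(/g, "<b>$&</b>"));
// → <b>IN(</b>INTERM_LEVEL_IN + (int)X_ID)

const other = "XIN(a) + IN(b)"; // hypothetical input
console.log(other.match(/^IN\(/));          // null: the string does not start with IN(
console.log(other.match(/\bIN\(/g).length); // 1: XIN( is skipped
console.log(other.match(/IN\(/g).length);   // 2: both occurrences
```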
So, take your pick. If you're interested in learning more about regex, here's a good tutorial: http://www.regular-expressions.info/tutorial.html
You can just use plain JavaScript regular expressions; a simple IN\( works for the example you gave, but I suspect your situation is more complicated than that. In that case, you need to define exactly what you are trying to match and what you don't want to match.

JavaScript regular expression replace - why does one work, but this other not?

I grabbed the following JavaScript regular expression replace from another site to strip out some invalid characters:
str = str.replace(/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g,'');
However, I noticed it wasn't catching occurrences of \u00B7 (the ISO-8859-1 middle dot character).
If I did it in two steps however, it works:
str = str.replace(/\u00B7/g,'');
str = str.replace(/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g,'');
The first replace seems to be included in the second. Can somebody explain why the second line doesn't work by itself? Thanks.
The first and second patterns are completely different. Pattern one replaces \u00B7, while pattern two replaces all characters NOT listed in the pattern. Remove the caret from pattern two and that should fix your issue.
Just to be clear:
/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/
matches all characters not in the set. So to match \u00B7 (and have it replaced with ''), remove it from the pattern:
/[^\u000D\u0020-\u007E\u00A2-\u00A4]/
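A minimal sketch of the difference, using a made-up input containing a middle dot and an e-acute (U+00E9, which lies outside every kept range):

```javascript
// Hypothetical input: "a", middle dot, "b", e-acute, "c"
const input = "a\u00B7b\u00E9c";

// \u00B7 is in the kept set, so only the e-acute is removed.
const keepsDot = input.replace(/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g, "");
// \u00B7 is no longer in the kept set, so it is removed as well.
const stripsDot = input.replace(/[^\u000D\u0020-\u007E\u00A2-\u00A4]/g, "");

console.log(keepsDot === "a\u00B7bc"); // true
console.log(stripsDot === "abc");      // true
```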
The ASCII character set is listed at http://www.asciitable.com/; that is likely the set you want to keep. The range \u0020-\u007E covers most of the common characters of interest; the others are typically not wanted.
\u000D is a carriage return. I would also investigate whether you really need \u00A2, \u00A3 and \u00A4.
