If-else condition in Regex [duplicate] - javascript

This is what I have so far...
var regex_string = "s(at)?u(?(1)r|n)day"
console.log("Before: " + regex_string)
// Rewrite each (?(n)a|b) as ((?!\n)a|\nb)
regex_string = regex_string.replace(/\(\?\((\d)\)(.+?\|)(.+?)\)/g, '((?!\\$1)$2\\$1$3)')
console.log("After: " + regex_string)
var rex = new RegExp(regex_string)
var arr = "thursday tuesday thuesday tursday saturday sunday surday satunday monday".split(" ")
// for...of visits the array values; for (i in arr) visits keys and leaks a global i
for (var word of arr) {
  var m = word.match(rex)
  if (m) {
    console.log(m[0])
  }
}
I am swapping (?(n)a|b) for ((?!\n)a|\nb), where n is a number and a and b are strings. This seems to work fine; however, I am aware that it is a big fat hack.
Is there a better way to approach this problem?

In the specific case of your regex, it is much simpler and more readable to use alternation:
(?:sunday|saturday)
Or you can create alternation only between the 2 positions where the conditional regex is involved (this is more useful when there are many such conditional expressions, each referring only to a nearby capturing group). Using your case as an example, we only create the alternation for un and atur, since only those are involved in the condition:
s(?:un|atur)day
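A quick way to check that both alternation rewrites behave like the original conditional is to run them over the word list from the question. This sketch uses the two patterns exactly as given above; only the variable names are made up.

```javascript
// The word list from the question
const words = "thursday tuesday thuesday tursday saturday sunday surday satunday monday".split(" ");

// Alternation over the whole words
const fullAlternation = /(?:sunday|saturday)/;
// Alternation only around the part the conditional controlled
const partialAlternation = /s(?:un|atur)day/;

// No /g flag, so test() is stateless and safe to call repeatedly
const viaFull = words.filter(w => fullAlternation.test(w));
const viaPartial = words.filter(w => partialAlternation.test(w));

console.log(viaFull.join(","));    // saturday,sunday
console.log(viaPartial.join(",")); // saturday,sunday
```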
There are 2 common types of conditional regex. (Perl regular expressions support more exotic kinds, but those require features that JavaScript and other common regex engines don't have.)
The first type is where an explicit pattern is provided as the condition. This type can be mimicked in JavaScript regex. In languages that support conditional regex, the pattern is:
(?(conditional-pattern)yes-pattern|no-pattern)
In JavaScript, you can mimic it with look-ahead, with the (obvious) assumption that the original conditional-pattern is a look-ahead:
((?=conditional-pattern)yes-pattern|(?!conditional-pattern)no-pattern)
The negative look-ahead is necessary to prevent the case where the input string passes the conditional-pattern and fails the yes-pattern, but matches the no-pattern. This is safe because positive and negative look-around are logically exact opposites of each other.
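Here is a sketch of that emulation as a small string builder. The helper name emulateConditional is hypothetical, and it uses non-capturing groups instead of the outer capturing group shown above. The example condition "a", yes-pattern "abc" and no-pattern "\w\w" are made up to show why the negative look-ahead matters.

```javascript
// Hypothetical helper: builds ((?=cond)yes|(?!cond)no) from regex source strings.
function emulateConditional(cond, yes, no) {
  // Wrap yes/no in (?:...) so a top-level | inside them stays in its own branch
  return "(?:(?=" + cond + ")(?:" + yes + ")|(?!" + cond + ")(?:" + no + "))";
}

// If the input starts with "a", require "abc"; otherwise any two word characters.
const full = new RegExp("^" + emulateConditional("a", "abc", "\\w\\w") + "$");
// Same idea without the negative look-ahead in the no branch
const naive = /^(?:(?=a)abc|\w\w)$/;

console.log(full.test("abc")); // true: condition holds, yes-pattern matches
console.log(full.test("xy"));  // true: condition fails, no-pattern matches
console.log(full.test("ax"));  // false: condition holds but yes-pattern fails
console.log(naive.test("ax")); // true: wrongly falls through to the no-pattern
```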
The second type is where a reference to a capturing group (by name or number) is provided, and the condition evaluates to true when that capturing group has matched. In that case, there is no simple solution.
The only way I can think of is duplication, as I have done with your case as an example. This of course reduces maintainability. You can compose your regex by writing it in parts (as RegExp literals), retrieving each part's pattern string with the source property, and then concatenating them; this allows changes to propagate to the duplicated parts, but makes it harder to understand the regex or make major modifications to it.
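A sketch of that composition for the question's s(at)?u(?(1)r|n)day, assuming it is split into the illustrative parts below (the variable names are made up). Each part is written once, and the duplicated tail reuses the same source strings, so editing a part updates both branches.

```javascript
// Each part is a literal; .source returns its pattern text.
const start = /s/;
const group1 = /at/;  // the optional capturing group (at)?
const shared = /u/;   // text between the group and the conditional
const yesPart = /r/;  // used when group 1 matched
const noPart = /n/;   // used when group 1 did not match
const end = /day/;

// Duplicate the tail: one branch where the group matched, one where it did not.
// Caveat: wrap a part in (?:...) first if it contains a top-level |.
const combined = new RegExp(
  start.source +
    "(?:" + group1.source + shared.source + yesPart.source +
    "|" + shared.source + noPart.source + ")" +
    end.source
);

console.log(combined.source); // s(?:atur|un)day
```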
References
Alternation Constructs in Regular Expressions (.NET documentation, Microsoft)
re module (Python documentation): search the page for (?(
perlre, Perl regular expressions (Perl documentation): search the page for (?(

Related

What code is this? /^(\d{4}|\d{6})$/

So I am extremely new to the JavaScript world. I was practicing on Codewars with an exercise that required checking a PIN to make sure it only contained digits and was either 4 or 6 characters long. I looked at the most clever solution, and it was:
function validatePIN(pin) {
return /^(\d{4}|\d{6})$/.test(pin)
}
I've never seen the "/^(\d{4}|\d{6})$/" bit before. Could anyone tell me what this is called so I can research it on my own, or give me a breakdown of how it works?
It's a regular expression.
I tend to use http://www.regexpal.com/ when I want to try and find the expression I need; there's also http://regexr.com/ for learning about them (among other resources).
It's a regular expression literal, similar to using return new RegExp('^(\\d{4}|\\d{6})$').test(pin). The "literal" part means it's a way of writing a value of a specific data type directly in code, just as true and 'true' are different: one is a boolean literal and the other is a string literal.
Specifically, the regex ^(\d{4}|\d{6})$ breaks down to:
^ a string that starts with...
( either
\d a digit (0-9)...
{4} that repeats four times...
| or
\d a digit (0-9)...
{6} that repeats six times...
)
$ and then ends
So '1234', '123456', etc. would match; '123.00', '12345', 'abc123', ' 1234' and ' 1234 ' would not.
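Running the function from the question over the examples listed above confirms the breakdown. Because ^ and $ anchor the whole string, a single leading or trailing space is enough to cause a mismatch.

```javascript
function validatePIN(pin) {
  return /^(\d{4}|\d{6})$/.test(pin);
}

// The examples from the breakdown above
const shouldMatch = ["1234", "123456"];
const shouldNotMatch = ["123.00", "12345", "abc123", " 1234", " 1234 "];

for (const pin of shouldMatch) console.log(JSON.stringify(pin), validatePIN(pin));    // all true
for (const pin of shouldNotMatch) console.log(JSON.stringify(pin), validatePIN(pin)); // all false
```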
As noted by several others in the comments on Draco18s' answer, there are several nuances to be aware of when using regex literals in JS:
The literal syntax doesn't require you to double-escape backslashes in the pattern (only / itself must be escaped, as \/). The RegExp constructor takes the pattern as a string, and string literals have their own backslash escapes, so each \ in the pattern must be written as \\. Note the differences in the \'s between the two syntaxes.
A regex literal's pattern is fixed when the script is parsed, whereas new RegExp() builds the pattern from a string at runtime, which lets you assemble it dynamically but leaves creating and reusing the instance up to you.
The literal notation is compiled and implies a constant regex, whereas the constructor version is reparsed from the string, and so the literal is better optimized/cached. jsperf.com/regexp-literal-vs-constructor/4 Note: you can get basically the same effect by caching the new Regex in a variable, but the literal one is cached at the JIT step – user120242
In other words, using a regex literal can avoid potential performance pitfalls:
Example:
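for (var i = 0; i < 1000; i++) {
  // Builds the pattern from a string on every iteration
  var constructed = new RegExp('^(\\d{4}|\\d{6})$')
  // Also yields a new RegExp object every iteration (ES5 and later), but its
  // pattern is fixed at parse time, so engines can reuse the compiled form
  var literal = /^(\d{4}|\d{6})$/
}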
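One detail worth checking: since ES5, each evaluation of a regex literal produces a fresh RegExp object (engines may still reuse the compiled pattern internally), so hoisting the regex out of the loop is what guarantees a single instance. The helper makeLiteral and the PIN_PATTERN name below are illustrative.

```javascript
// Since ES5, every evaluation of a regex literal creates a new RegExp object.
function makeLiteral() {
  return /^(\d{4}|\d{6})$/;
}
console.log(makeLiteral() === makeLiteral()); // false

// Both syntaxes describe the same pattern; only the escaping differs.
const fromLiteral = /^(\d{4}|\d{6})$/;
const fromConstructor = new RegExp("^(\\d{4}|\\d{6})$");
console.log(fromLiteral.source === fromConstructor.source); // true

// To create the object only once, hoist it out of the loop.
const PIN_PATTERN = /^(\d{4}|\d{6})$/;
let valid = 0;
for (let i = 0; i < 1000; i++) {
  // "0000" through "0999" are all 4-digit strings
  if (PIN_PATTERN.test(String(i).padStart(4, "0"))) valid++;
}
console.log(valid); // 1000
```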
Good reference for JavaScript RegExp:
http://www.regular-expressions.info/javascript.html
^ = beginning of the string (of each line only with the m flag)
\d = any digit
{4} = repeat 4 times
| = "or"
$ = end of the string (of each line only with the m flag)
Your example tests for a 4-digit string or a 6-digit string.

RegEx false negative with .test()

I'm making a Chrome extension that searches a page for a dollar amount (a number with no more than two decimal places, immediately preceded by a "$"), then tacks on a bit showing how much that value would be in another currency. I found a commonly used regex that matches exactly those parameters.
/^\$?\-?([1-9]{1}[0-9]{0,2}(\,\d{3})*(\.\d{0,2})?|[1-9]{1}\d{0,}(\.\d{0,2})?|0(\.\d{0,2})?|(\.\d{1,2}))$|^\-?\$?([1-9]{1}\d{0,2}(\,\d{3})*(\.\d{0,2})?|[1-9]{1}\d{0,}(\.\d{0,2})?|0(\.\d{0,2})?|(\.\d{1,2}))$|^\(\$?([1-9]{1}\d{0,2}(\,\d{3})*(\.\d{0,2})?|[1-9]{1}\d{0,}(\.\d{0,2})?|0(\.\d{0,2})?|(\.\d{1,2}))\)$/g
So I'm thinking I have a nice head start. I've only been coding for a couple of months, and of all the concepts I've encountered, regexes give me the most headaches. I test out my shiny new expression with:
var regex = /^\$?\-?([1-9]{1}[0-9]{0,2}(\,\d{3})*(\.\d{0,2})?|[1-9]{1}\d{0,}(\.\d{0,2})?|0(\.\d{0,2})?|(\.\d{1,2}))$|^\-?\$?([1-9]{1}\d{0,2}(\,\d{3})*(\.\d{0,2})?|[1-9]{1}\d{0,}(\.\d{0,2})?|0(\.\d{0,2})?|(\.\d{1,2}))$|^\(\$?([1-9]{1}\d{0,2}(\,\d{3})*(\.\d{0,2})?|[1-9]{1}\d{0,}(\.\d{0,2})?|0(\.\d{0,2})?|(\.\d{1,2}))\)$/g;
var str = "The total it $2.25 Would you like paper or plastic?";
var r = regex.test(str);
console.log(r);
And of course that sucker returns false! I tried a few more strings with "2.25", "$2" or "$2.256" just to be sure, and they all returned false.
I am thoroughly stumped. The expression came recommended, and I'm using .test() correctly. All I can think of is that it's probably some small newbie detail that has nothing to do with regexes.
Thanks for your time.
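Your overly complex regular expression is checking the entire string. Remove the ^ and $, which denote the beginning and end of the string, respectively. Then remove the /g flag, which is used to search for multiple matches: with /g, test() starts searching at the regex's lastIndex and updates it after each match, so reusing the same regex object can produce false negatives.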
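Both points can be checked with a simplified dollar pattern. \$\d+(?:\.\d{1,2})? is a stand-in chosen for illustration, not the regex from the question: the anchored version fails on a whole sentence, and a /g regex reused with .test() alternates between true and false because test() advances lastIndex.

```javascript
const str = "The total it $2.25 Would you like paper or plastic?";

// Anchored: the whole string would have to be a dollar amount
const anchored = /^\$\d+(?:\.\d{1,2})?$/;
console.log(anchored.test(str)); // false

// Unanchored: finds the amount anywhere in the string
const unanchored = /\$\d+(?:\.\d{1,2})?/;
console.log(unanchored.test(str));     // true
console.log(str.match(unanchored)[0]); // $2.25

// With /g, test() searches from lastIndex and moves it past each match
const withGlobalFlag = /\$\d+(?:\.\d{1,2})?/g;
console.log(withGlobalFlag.test(str), withGlobalFlag.lastIndex); // true 18
console.log(withGlobalFlag.test(str), withGlobalFlag.lastIndex); // false 0 (nothing after index 18, so lastIndex resets)
```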
What's wrong with checking for /\$\d+\.\d\d/?
I find http://regex101.com/ to be a helpful resource.

What Regular Expression Can I Use To Find Simple Regular Expressions

If I have a string, which is the source of a regular expression:
"For example, I have (.*) string with (\.d+) special bits (but this is just an aside)."
Is there a way to extract the special parts of the regular expression?
In particular, I'm interested in the parts that will give back values when I call string.match(expr);
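Regexes can be complicated, but if you use a global regex like ([\.\\]([*a-z])\+?), it will capture your individual fields without including the parentheses. Note that it only covers tokens shaped like .*, .d+ or \w+ (a dot or backslash, then one letter or *, then an optional +). Demo code (originally posted as a fiddle) is below.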
var testString = 'For example, I have (.*) string with (.d+) special bits (but this is just an aside). (\\w+)';
var regex = /([\.\\]([*a-z])\+?)/gi;
var matches_array = testString.match(regex);
//Outputs the following: [".*", ".d+", "\w+"]
Regular expressions are not powerful enough to recognize the language of matching parentheses. (The formal proof uses the equivalence of regular expressions and finite state machines and the fact that there are infinitely many levels of nesting possible.) Thus, matching the first ) after each ( would make (\d+(\.d+)?) return (\d+(\.d+) and matching the last ) after each ( would make (\w+) (\w+) match the entire string.
One correct way to do this is with recursion, which mathematical regular expressions do not allow but actual implementations such as PCRE do (JavaScript's RegExp does not). You can also get a simple expression for non-nested parentheses. Just be careful to parse escape characters: to be fully robust, treat \( and \\\( as escaped, literal parentheses, while in \\( the backslash itself is escaped, so the ( opens a group.
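Since JavaScript's RegExp has no recursion, one alternative (a swapped-in technique, not the recursive regex mentioned above) is a short hand-written scanner. The function capturingGroups below is hypothetical: it skips escaped characters and character classes, tracks nesting with a stack, and counts only a plain ( as capturing, so (?:...), look-arounds and, as a simplification, named groups are not reported.

```javascript
// Hypothetical helper: lists the capturing groups in a regex source string,
// ordered by opening parenthesis (the order match() numbers them in).
// Simplification: every "(?" group is treated as non-capturing.
function capturingGroups(source) {
  const found = [];
  const open = []; // start index of each unclosed "(", or null if non-capturing
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") { i++; continue; }           // skip the escaped character
    if (inClass) { if (ch === "]") inClass = false; continue; }
    if (ch === "[") { inClass = true; continue; } // ( and ) are literal inside [...]
    if (ch === "(") {
      open.push(source[i + 1] === "?" ? null : i);
    } else if (ch === ")" && open.length > 0) {
      const start = open.pop();
      if (start !== null) found.push({ start, text: source.slice(start, i + 1) });
    }
  }
  return found.sort((a, b) => a.start - b.start).map(g => g.text);
}

console.log(capturingGroups(String.raw`(\d+(\.\d+)?)`).join("  "));
// (\d+(\.\d+)?)  (\.\d+)
console.log(capturingGroups(String.raw`\(x\) \\(y) [(](z) (?:w)`).join("  "));
// (y)  (z)
```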

Solving regular expression recursive strings

The Problem
I could match this string
(xx)
using this regex
\([^()]*\)
But it wouldn't match
(x(xx)x)
So, this regex would
\([^()]*\([^()]*\)[^()]*\)
However, this would fail to match
(x(x(xx)x)x)
But again, this new regex would
\([^()]*\([^()]*\([^()]*\)[^()]*\)[^()]*\)
This is where you can notice the replication: the entire pattern of the second regex after the first \( and before the last \) is copied and replaces its centermost [^()]*. Of course, this last regex wouldn't match
(x(x(x(xx)x)x)x)
But you could always replace the centermost [^()]* with [^()]*\([^()]*\)[^()]*, as we did for the last regex, and it'll capture more (xx) groups. The more you add to the regex, the more nesting it can handle, but it will always be limited by how much you add.
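The copy-and-replace step can be automated, which also makes the limit explicit. This sketch (the name parensUpToDepth is hypothetical) uses a slightly more general variant of the construction, (?:[^()]|\(inner\))*, so that sibling groups such as (a(b)c(d)e) also match; the nesting limit is still whatever depth you pass in.

```javascript
// Hypothetical helper: regex matching parentheses nested at most `depth` levels deep.
function parensUpToDepth(depth) {
  let body = "[^()]*"; // depth 1: no parentheses inside
  for (let d = 1; d < depth; d++) {
    // Each step allows plain characters or one more level of (...) groups
    body = "(?:[^()]|\\(" + body + "\\))*";
  }
  return new RegExp("\\(" + body + "\\)");
}

const sample = "(x(x(xx)x)x)";
console.log(parensUpToDepth(3).exec(sample)[0]); // (x(x(xx)x)x)
console.log(parensUpToDepth(2).exec(sample)[0]); // (x(xx)x), only the inner part fits
console.log(parensUpToDepth(2).source);          // \((?:[^()]|\([^()]*\))*\)
```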
So, how do you get around this limitation and capture a group of parenthesis (or any two characters for that matter) that can contain extra groups within it?
Falsely Assumed Solutions
I know you might think to just use
\(.*\)
But this will match all of
(xx)xx)
when it should only match the sub-string (xx).
Even this
\([^)]*\)
will not match pairs of parentheses that have pairs nested like
(xx(xx)xx)
From this, it'll only match up to (xx(xx).
Is it possible?
So is it possible to write a regex that can match groups of parentheses? Or is this something that must be handled by a routine?
Edit
The solution must work in the JavaScript implementation of Regular Expressions
If you want to match only when the round brackets are balanced, you cannot do it with a regex alone.
A better way would be to:
1> match the string using \(.*\)
2> count the number of ( and ) and check whether they are equal; if they are, you have the match
3> if they are not equal, use \([^()]*\) to match the required string
Formally speaking, this isn't possible using regular expressions! Regular expressions define regular languages, and the language of balanced parentheses is not regular.
However, it turns out that this is the sort of thing people need to do all the time, so many regex engines have been extended beyond formal regular expressions (for example, .NET with balancing groups and PCRE with recursion). This article might help get you started: http://weblogs.asp.net/whaggard/archive/2005/02/20/377025.aspx . It's for .NET, though, and JavaScript's built-in RegExp has neither balancing groups nor recursion, so the technique does not carry over to JavaScript directly.
Personally though, I think it's best to solve a complex problem like this with your own function rather than leveraging the extended features of a Regex engine.
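Following that advice, here is a sketch of such a function (the name balancedGroups is hypothetical). It scans the text once and tracks the current nesting depth. Tracking depth as you go, rather than only comparing the total counts of ( and ), also handles strings like (a))(b), where the counts are equal but one ) closes nothing. The open and close parameters let it work with any two distinct characters.

```javascript
// Hypothetical helper: returns each outermost balanced group in `text`.
function balancedGroups(text, open = "(", close = ")") {
  const groups = [];
  let depth = 0;
  let start = -1;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === open) {
      if (depth === 0) start = i; // remember where the outermost group began
      depth++;
    } else if (ch === close && depth > 0) {
      depth--;
      if (depth === 0) groups.push(text.slice(start, i + 1));
    }
    // A close at depth 0 closes nothing and is ignored;
    // a group still open at the end of the text is not reported.
  }
  return groups;
}

console.log(JSON.stringify(balancedGroups("(xx)xx)")));             // ["(xx)"]
console.log(JSON.stringify(balancedGroups("(x(x(x(xx)x)x)x)")));    // ["(x(x(x(xx)x)x)x)"]
console.log(JSON.stringify(balancedGroups("(a))(b)")));             // ["(a)","(b)"]
console.log(JSON.stringify(balancedGroups("x[a[b]c]y", "[", "]"))); // ["[a[b]c]"]
```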

Regular expression: Get the matched value of each match in a Kleene-star expression?

In particular, is this possible with JavaScript?
>> "Version 1.2.3.4".match(/\S+ (\d+)(\.\d+)*/)
["Version 1.2.3.4", "1", ".4"]
It's obvious $2 gets set to the last Kleene-"match". Is there no built-in method to retrieve the rest (".2", ".3")?
If this cannot be done easily in JS, could Perl do it?
UPDATE: Many of the answers so far have been "workarounds" which work because of the simplicity of my example. If the part that repeated that I wanted to match was more than just a number, they wouldn't work.
However, a very valid solution does exist: use /expr/g global regex matching: just filter out the parts that repeat and use that. I find this to be somewhat less flexible than the more generally applicable * operator but it will obviously get the job done in most cases.
Regex in JavaScript, like most other regex flavors, only keeps the last value of a capturing group that is matched repeatedly. The only well-known regex library (that I know of) that gives you access to all of the previously matched captures is the one in .NET.
So no, you can't do this in JS.
In Perl there are a couple of ways you can accomplish such things. One of the more elegant is probably to use \G (which works in PCRE too).
For example:
"Version 1.2.3.4" =~ /(?:\S+ |\G(?!^)\.)(\d+)/g
Returns (in list context):
(1, 2, 3, 4)
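JavaScript has no \G, but the sticky y flag (ES2015) has a similar effect: each match must start exactly where the previous one ended. Combined with g, matchAll stops at the first position where the pattern does not match, which mimics the Perl loop above. This is a sketch of that translation, not a built-in "all captures" feature.

```javascript
const version = "Version 1.2.3.4";

// y: every match must begin at lastIndex, i.e. right after the previous match.
// (?!^) keeps the "\." branch from matching at the very start, as in the Perl version.
const pieces = /(?:\S+ |(?!^)\.)(\d+)/gy;

const numbers = Array.from(version.matchAll(pieces), m => m[1]);
console.log(numbers.join(",")); // 1,2,3,4
```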
Why not match the whole version string, then split by .?
>> "Version 1.2.3.4".match(/\S+ (\d+(?:\.\d+)*)/)[1].split('.')
Just capture the whole version number string and then split on the period character.
Regex for matching the whole number: /((?:\d+)(?:\.\d+)*)/
Then simply call split on the resulting capture.
The regex \.?\d+ returns the pieces you need, but you have to run it for all matches, not just one:
var n=str.match(/\.?\d+/g);
If you want to match just numbers without leading dot, then go with regex \d+.
var n=str.match(/\d+/g);
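The question's UPDATE asks for something that still works when the repeated part is more than a number. A general two-step sketch, along the lines the question itself suggests: capture the whole repeated region once, then rescan just that region with a global regex describing a single repetition. The key=value pattern and input string below are made-up examples.

```javascript
// Step 1: capture the entire repeated region as one group.
const outer = /pairs: ((?:\w+=\d+;?)*)/;
// Step 2: describe one repetition, with g so matchAll finds every one.
const inner = /(\w+)=(\d+)/g;

const text = "pairs: a=1;b=22;c=3";
const region = text.match(outer)[1]; // "a=1;b=22;c=3"
const repetitions = Array.from(region.matchAll(inner), m => [m[1], m[2]]);

console.log(JSON.stringify(repetitions)); // [["a","1"],["b","22"],["c","3"]]
```

Rescanning only the captured region keeps the inner pattern from picking up look-alike text elsewhere in the input.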
