What is wrong with these regexes with backreferences? - javascript

I am trying to make a regex that will match a number of x's that is a power of two. I am using JavaScript. I tried this one:
^(x\1?)$
but it doesn't work. Shouldn't the \1 refer to the outer parentheses, so that it matches xx, and therefore also xxxx, etc.?
I tried a simpler one that I thought would match x and xx:
^((x)|(\2{2}))$
but this only matches x.
What am I doing wrong?

You can't do "recursive backreferences". At least, it is not so easy.
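As a rough illustration of why both attempts fail: in JavaScript, a backreference to a group that has not captured anything yet matches the empty string, so \1 inside its own group and \2 in the second alternative both consume nothing. The snippet below simply runs the two patterns from the question against a few inputs (the variable names are made up for this sketch):

```javascript
// The two patterns from the question, unchanged.
const attempt1 = /^(x\1?)$/;
const attempt2 = /^((x)|(\2{2}))$/;

// Print whether each input is accepted by each pattern.
// In both cases the backreference matches the empty string,
// so only a single "x" gets through.
for (const s of ["x", "xx", "xxxx"]) {
  console.log(JSON.stringify(s), attempt1.test(s), attempt2.test(s));
}
```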
I'm not sure that you need recursive regular expressions here. Maybe you could just count the number of characters in the string and check whether it is a power of two?
But if you really need recursive regular expressions (I'm almost sure you don't),
you can check this question:
Recursive matching with regular expressions in Javascript
and this blog
http://blog.stevenlevithan.com/archives/javascript-match-nested
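The counting approach suggested above could look roughly like this. It is a minimal sketch, not the only way to do it; the function name isPowerOfTwoXs is invented for the example, and the bit trick assumes the string length fits in a 32-bit integer.

```javascript
// Returns true when s consists only of "x" characters and its
// length is a power of two (1, 2, 4, 8, ...).
function isPowerOfTwoXs(s) {
  // A plain regex handles the "only x's, at least one" part.
  if (!/^x+$/.test(s)) return false;
  const n = s.length;
  // A positive integer is a power of two exactly when it has a single
  // bit set, i.e. when n & (n - 1) is zero.
  return (n & (n - 1)) === 0;
}
```

For example, isPowerOfTwoXs("xxxx") is true, while isPowerOfTwoXs("xxx") and isPowerOfTwoXs("") are false.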

Related

Regex for brainfuck loops

I'd like to create a regular expression that is able to fetch every loop inside a brainfuck code.
Let's say this code is given:
++++[>+[>,++.]<<-]++[>,.<-]
I want to fetch these three loops (actually it would be sufficient just to fetch the first one):
[>+[>,++.]<<-]
[>,++.]
[>,.<-]
My knowledge of regular expressions is pretty weak, so I can't do much more than basics. What I have thought of is this expression:
\[[-+><.,\[\]]*]
\[ - Match the first (opening) bracket
[-+><.,\[\]]* - followed by a number of brainfuck operators
] - followed by a closing bracket
This however matches (obviously) everything between the first opening, and the last closing bracket:
[>+[>,++.]<<-]++[>,.<-]
It might need something to test for the same number of opening and closing brackets inside the loop, before matching the last closing bracket - If that makes any sense.
Maybe a lookaround (I need to use this in javascript, so I can only use lookaheads) is the right way to do this, but I can't figure out how it's supposed to be done.
I wrote this one once when I needed to match a pair of square brackets (while handling nesting correctly).
It is a .NET regex that uses some features that aren't available in all regex engines. Here goes:
\[(?>\[(?<d>)|\](?<-d>)|.?)*(?(d)(?!))\]
Regular expressions cannot match infinitely recursing things. Look at the Chomsky hierarchy of languages.
You can write a regular expression matching finitely recursing things by expanding them. For example, this POSIX ERE (tested with egrep) will match brainfuck loops up to nesting depth 3:
(\[[^][]*\]|\[([^][]|\[[^][]*\])*\]|\[([^][]|\[([^][]|\[[^][]*\])*\])*\])
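The same expansion idea can be generated programmatically in JavaScript instead of being written out by hand. The following is only a sketch under that assumption; buildLoopPattern is a hypothetical helper name, and the POSIX class [^][] is written as [^\][] because JavaScript needs the closing bracket escaped inside a character class.

```javascript
// Builds a global regex that matches brainfuck loops nested at most
// `depth` levels deep (depth 1 = a loop with no inner loops).
function buildLoopPattern(depth) {
  const nonBracket = "[^\\][]"; // any single character except [ and ]
  let pattern = "\\[" + nonBracket + "*\\]"; // depth 1
  for (let d = 2; d <= depth; d++) {
    // A deeper loop contains non-bracket characters or shallower loops.
    pattern = "\\[(?:" + nonBracket + "|" + pattern + ")*\\]";
  }
  return new RegExp(pattern, "g");
}

const code = "++++[>+[>,++.]<<-]++[>,.<-]";
console.log(code.match(buildLoopPattern(2)));
```

With depth 2 this finds the two outermost loops of the example; with depth 1 it only finds the loops that contain no further loops.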
Use a non-greedy (or lazy) matching:
\[[-+><.,\[\]]*?\]
Note the ? after the *. However, this matches the shortest string between a [ and a ], so it does not handle nesting. Thus, one of the results would be:
[>+[>,++.]
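Since JavaScript regexes cannot track arbitrary nesting, one alternative is a small stack-based scan. This is a sketch of that swapped-in technique rather than anything proposed in the answers above; findLoops is a made-up name.

```javascript
// Returns every loop in a brainfuck program, nested ones included,
// ordered by the position of the opening bracket.
function findLoops(code) {
  const openPositions = []; // stack of indexes of unmatched "["
  const loops = [];
  for (let i = 0; i < code.length; i++) {
    if (code[i] === "[") {
      openPositions.push(i);
    } else if (code[i] === "]" && openPositions.length > 0) {
      // Pair this "]" with the most recent unmatched "[".
      const start = openPositions.pop();
      loops.push({ start, text: code.slice(start, i + 1) });
    }
    // An unmatched "]" is silently skipped in this sketch.
  }
  loops.sort((a, b) => a.start - b.start);
  return loops.map((loop) => loop.text);
}

console.log(findLoops("++++[>+[>,++.]<<-]++[>,.<-]"));
```

For the example program this yields the three loops listed in the question, outermost first.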

How to invert an existing regular expression in javascript?

I have created a regex to validate time as follows : ([01]?\d|2[0-3]):[0-5]\d.
Matches TRUE : 08:00, 09:00, 9:00, 13:00, 23:59.
Matches FALSE : 10.00, 24:00, 25:30, 23:62, afdasdasd, ten.
QUESTION
How to invert a javascript regular expression to validate if NOT time?
NOTE - I have seen several ways to do this on stack but cannot seem to make them work for my expression because I do not understand how the invert expression should work.
http://regexr.com?38ai1
ANSWER
Simplest solution was to invert the javascript statement and NOT the regex itself.
if (!(/^(([01]?\d|2[0-3]):[0-5]\d)/.test(obj.value)))
Simply adding ! to create an if NOT statement.
A regular expression is usually used for capturing some specific condition(s) - the more specific, the better the regex. What you're looking for is an extremely broad condition to match because just about everything wouldn't be considered "time" (a whitespace, a special character, an alphabet character, etc etc etc).
As suggested in the comments, for what you're trying to achieve, it makes much more sense to look for a time and then check (and negate) the result of that regular expression.
As I mentioned in the comment, the better way is to negate the test rather than create a new regexp that matches any non-time.
However, if you really need the regexp, you could use negative lookahead to match the start of something that is not a time:
/^(?!([01]?\d|2[0-3]):[0-5]\d$)/
DEMO: http://regex101.com/r/bD3aG4
Note that I anchored the regexp (^ and $), which might not work with what you need it for.
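Both approaches can be compared side by side. The sketch below assumes the whole input should be a time, so it anchors the pattern with ^ and $ (the accepted answer's regex has no $); the names TIME, NOT_TIME and isNotTime are invented for the example.

```javascript
// Anchored version of the time pattern from the question.
const TIME = /^([01]?\d|2[0-3]):[0-5]\d$/;

// Approach 1: keep the regex as it is and negate the test result.
function isNotTime(value) {
  return !TIME.test(value);
}

// Approach 2: a negative lookahead that succeeds at the start of any
// string that is not exactly a valid time.
const NOT_TIME = /^(?!([01]?\d|2[0-3]):[0-5]\d$)/;

for (const value of ["08:00", "9:00", "23:59", "10.00", "24:00", "23:62", "ten"]) {
  console.log(value, isNotTime(value), NOT_TIME.test(value));
}
```

For each of these inputs the two approaches agree; negating the test is usually easier to read and maintain.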

Solving regular expression recursive strings

The Problem
I could match this string
(xx)
using this regex
\([^()]*\)
But it wouldn't match
(x(xx)x)
So, this regex would
\([^()]*\([^()]*\)[^()]*\)
However, this would fail to match
(x(x(xx)x)x)
But again, this new regex would
\([^()]*\([^()]*\([^()]*\)[^()]*\)[^()]*\)
This is where you can notice the replication, the entire regex pattern of the second regex after the first \( and before the last \) is copied and replaces the center most [^()]*. Of course, this last regex wouldn't match
(x(x(x(xx)x)x)x)
But you could always replace the centermost [^()]* with a copy of [^()]*\([^()]*\)[^()]*, as we did for the last regex, and it will capture one more level of (xx) groups. The more you add to the regex, the more it can handle, but it will always be limited by how much you add.
So, how do you get around this limitation and capture a group of parenthesis (or any two characters for that matter) that can contain extra groups within it?
Falsely Assumed Solutions
I know you might think to just use
\(.*\)
But this will match all of
(xx)xx)
when it should only match the sub-string (xx).
Even this
\([^)]*\)
will not match pairs of parentheses that have pairs nested like
(xx(xx)xx)
From this, it'll only match up to (xx(xx).
Is it possible?
So is it possible to write a regex that can match groups of parentheses? Or is this something that must be handled by a routine?
Edit
The solution must work in the JavaScript implementation of Regular Expressions
If you want to match only when the round brackets are balanced, you cannot do it with a regex alone.
A better way would be to:
1. Match the string using \(.*\)
2. Count the number of ( and ) characters in the match and check whether they are equal. If they are, you have the match.
3. If they are not equal, use \([^()]*\) to match the required string.
Formally speaking, this isn't possible using regular expressions! Regular expressions define regular languages, and regular languages can't describe balanced parentheses.
However, it turns out that this is the sort of thing people need to do all the time, so many regex engines have been extended beyond formal regular expressions. In those engines you can match balanced brackets with a regex. This article shows how it is done in .NET: http://weblogs.asp.net/whaggard/archive/2005/02/20/377025.aspx . Note that it relies on .NET's balancing groups, which the standard JavaScript regex engine does not provide, so the technique does not carry over to JavaScript directly.
Personally though, I think it's best to solve a complex problem like this with your own function rather than leveraging the extended features of a Regex engine.
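A hand-written function of the kind suggested above might look like the following. It is a minimal sketch that only tracks nesting depth; findBalancedGroups and its parameters are hypothetical names, and it reports top-level groups only.

```javascript
// Returns the outermost balanced groups found in `text`.
// `open` and `close` must be single characters.
function findBalancedGroups(text, open = "(", close = ")") {
  const groups = [];
  let depth = 0;  // how many groups are currently open
  let start = -1; // index where the current outermost group began
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === open) {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === close && depth > 0) {
      depth--;
      if (depth === 0) groups.push(text.slice(start, i + 1));
    }
    // A closing character seen at depth 0 is unmatched and ignored.
  }
  // Limitation: if an outer group is never closed, the balanced groups
  // nested inside it are not reported by this sketch.
  return groups;
}

console.log(findBalancedGroups("(xx)xx)"));
console.log(findBalancedGroups("a(x(x(xx)x)x)b(y)"));
```

Unlike the expanded regexes in the question, this handles any nesting depth, and the same function works for other delimiter pairs such as [ and ].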

Get Part of String

I am not good at regular expressions and couldn't find an easy way to solve this problem.
I have an expression like:
TR_NN_Expression
where NN is a two-digit number and Expression can contain '_', so I can't use split for this. I would like to get the Expression. Any help would be greatly appreciated.
You can use this regular expression:
TR_[0-9]{2}_(.*)
The part you want will be in the capturing group. Example usage:
> s = 'TR_01_My##34_Expresion'
"TR_01_My##34_Expresion"
> s.match(/TR_[0-9]{2}_(.*)/)[1]
"My##34_Expresion"
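Note that s.match(...)[1] throws if the string does not have the expected shape, because match returns null. A slightly more defensive variant might look like this; the helper name extractExpression and the added ^ and $ anchors are assumptions made for the sketch.

```javascript
// Returns the part after "TR_NN_", or null when the input
// does not follow the TR_NN_Expression format.
function extractExpression(s) {
  const m = /^TR_[0-9]{2}_(.*)$/.exec(s);
  return m ? m[1] : null;
}

console.log(extractExpression("TR_01_My##34_Expresion"));
console.log(extractExpression("not a match"));
```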
I always use and recommend this tool. It makes life easier:
Interactive multi-language regular expression generator
Enjoy!
If the prefix is of fixed length and you know that the strings are of the correct format you can just use substring to accomplish this.
"TR_42_some_expression_here".substring(6) // yields "some_expression_here"
If you have a more complicated situation, regular expressions may be appropriate. The exact expression depends on what you wish to capture.

Match altered version of first match with only one expression?

I'm writing a brush for Alex Gorbatchev's Syntax Highlighter to get highlighting for Smalltalk code. Now, consider the following Smalltalk code:
aCollection do: [ :each | each shout ]
I want to find the block argument ":each" and then match "each" every time it occurs afterwards (for simplicity, let's say every occurrence and not just those inside the brackets).
Note that the argument can have any name, e.g. ":myArg".
My attempt to match ":each":
\:([\d\w]+)
This seems to work. The problem is for me to match the occurrences of "each". I thought something like this could work:
\:([\d\w]+)|\1
But the right hand side of the alternation seems to be treated as an independent expression, so backreferencing doesn't work.
Is it even possible to accomplish what I want in a single expression? Or would I have to use the backreference within a second expression (via another function call)?
You could do it in languages that support variable-length lookbehind (AFAIK only the .NET framework languages do, Perl 6 might). There you could highlight a word if it matches (?<=:(\w+)\b.*)\1. But JavaScript didn't support lookbehind at all when this was written (lookbehind assertions were only added in ES2018).
But anyway this regex would be very inefficient (I just checked a simple example in RegexBuddy, and the regex engine needs over 60 steps for nearly every character in the document to decide between match and non-match), so this is not a good idea if you want to use it for code highlighting.
I'd recommend you use the two-step approach you mentioned: First match :(\w+)\b (word boundary inserted for safety, \d is implied in \w), then do a literal search for match result \1.
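The two-step approach could be sketched as follows. The function name findArgumentUses is made up, and for simplicity the sketch only handles the first block argument in the code.

```javascript
// Step 1: find the block argument declaration (":each").
// Step 2: search for later occurrences of that name as a whole word.
// Returns the start indexes of those later occurrences.
function findArgumentUses(code) {
  const declaration = /:(\w+)\b/.exec(code);
  if (!declaration) return [];
  const name = declaration[1];
  // The name consists only of \w characters, so it needs no escaping
  // before being placed into a new pattern.
  const uses = new RegExp("\\b" + name + "\\b", "g");
  // Start searching right after the declaration itself.
  uses.lastIndex = declaration.index + declaration[0].length;
  const positions = [];
  let match;
  while ((match = uses.exec(code)) !== null) {
    positions.push(match.index);
  }
  return positions;
}

const source = "aCollection do: [ :each | each shout ]";
console.log(findArgumentUses(source));
```

In a highlighter brush, the second pattern would presumably be built once per block and handed to whatever API the highlighter offers for dynamic rules; that part is outside this sketch.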
I believe the only thing stored by the Regex engine between matches is the position of the last match. Therefore, when looking for the next match, you cannot use a backreference to the match before.
So, no, I do not think that this is possible.
