I'm in the midst of figuring out the RegExp function capabilities and trying to create this string:
/api/search/redirect/complete/127879/2013-11-27/4-12-2013/7/2/0/0/LGW/null
from this:
/api/search/redirect/complete/127879/2013-11-27/4-12-2013/7/2/0/0/undefined/undefined/undefined/undefined/undefined/undefined/undefined/undefined/undefined/^^undefined^undefined^undefined^undefined^undefined/undefined^undefined^undefined^undefined^undefined^undefined^undefined/undefined^undefined^undefined^undefined^undefined^undefined^undefined/undefined^undefined^undefined^undefined^undefined^undefined^undefined/LGW/null
I know \bundefined\b\^ removes 'undefined^' and undefined\b\/ removes 'undefined/', but how do I combine these together?
In this case, since the ^ or / follow the undefined in the same place, you can use a character class:
str = str.replace(/\bundefined[/^]+/g, '');
Note: ^ is special both inside and outside of character classes, but in different ways. Inside a character class ([...]), it's only special if it's the first character, hence my not making it the first char above.
Also note the + at the end, saying "one or more" of the ^ or /. Without that, since there are a couple of double ^^, you end up with ^ in the result.
If you want to be a bit paranoid (and I admit I probably would be), you could escape the / within the character class. For me it works if you don't, with Chrome's implementation, but trying to follow the pattern definition in the spec is...tiresome...so I honestly don't know if there's any implementation out there that would try to end the expression as of the / in the character class. So with the escape:
str = str.replace(/\bundefined[\/^]+/g, '');
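As a quick check, the replacement can be run against a shortened version of the URL from the question. This is only a sketch: the names input, cleaned and withoutPlus are mine, and the sample keeps just a few of the "undefined" segments.

```javascript
// Shortened sample of the URL from the question (fewer "undefined" segments).
const input =
  "/api/search/redirect/complete/127879/2013-11-27/4-12-2013/7/2/0/0/" +
  "undefined/undefined/^^undefined^undefined/undefined^undefined/LGW/null";

// Character class with +: removes "undefined" plus every ^ or / that follows it.
const cleaned = input.replace(/\bundefined[\/^]+/g, "");

// Same pattern without +: only one trailing ^ or / is removed per match,
// so the doubled ^^ is left behind.
const withoutPlus = input.replace(/\bundefined[\/^]/g, "");

console.log(cleaned);
console.log(withoutPlus);
```

The first result ends in /0/0/LGW/null, while the second still contains the stray ^^.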
Related
I'm sitting in on some technical interviews at the moment and we ask a question about checking that curly brackets are balanced (same number of opens/closes and that closes never precede their matching open) within a string - asking people to write a small function to verify this.
A few candidates have considered trying to use Regex to solve this, then given up pretty quickly. I decided I wanted to give it a try, to see if it was possible. I'm currently using the following test strings:
Pass
{(function(r){ return r; })()}
{}{}{}{}
{{{{}}}}
Fail
}{
{{}}}
{{{}}
I thought the following regex would work: [^{}]*({[^{}]*})*[^{}]*. The idea was to match non-curly bracket characters, then match {, then non-curly bracket characters, then }, repeating the bracketed match, and then finish with any non-curly bracket characters.
I seem to be getting an infinite error when using regexr.com though and I don't understand why:
Can anyone explain what is causing this exactly?
You are getting an infinite error because your regex can match the empty string. Since every part of your pattern is marked with *, every part is optional (* matches zero or more occurrences). This means that the engine can find zero occurrences of each part and still report a zero-length match, at any position in the text.
Consider marking at least one group with +, which means "one or more" rather than "zero or more". Try this pattern:
[^{}]*(\{[^{}]*\})+[^{}]*
This way, the engine has some kind of restriction that it must match for your pattern to be accepted.
NOTE: it is also wise to escape { and } when they are not in a character class ([]). I have done this in the pattern above. Regexr.com doesn't seem to care, but some engines might produce a parse error without the escapes.
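To see the difference between the two quantifiers, here is a small sketch in JavaScript (the variable names are mine). Note that neither pattern is anchored, so the + version still reports a match inside an unbalanced string; it only stops the zero-length matches.

```javascript
// Every part of the original pattern is optional, so it can match nothing at all.
const allOptional = /[^{}]*(\{[^{}]*\})*[^{}]*/;
const emptyMatch = allOptional.exec("}{");
console.log(emptyMatch[0].length, emptyMatch.index); // zero-length match at index 0

// With + on the group, at least one "{...}" pair is required.
const needsPair = /[^{}]*(\{[^{}]*\})+[^{}]*/;
const rejectsReversed = needsPair.test("}{");       // no "{...}" pair anywhere
const acceptsUnbalanced = needsPair.test("{{}}}");  // the inner "{}" alone is enough
console.log(rejectsReversed, acceptsUnbalanced);
```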
Update
This is the regex you need (one level):
\{{0,1}[^\{\}]*\{[^\{\}]*\}[^\{\}]*\}{0,1}
I cannot write a nested rule, but you can say how many levels you need.
Explanation:
| - means alternative (a|b - matches "a" or "b" literally)
^ - means start of line (^a - matches any string that begin with "a" literally, "a taxi", "apple" ...)
$ - means end of line (a$ - matches any string that end with "a" literally, "umbrella")
\{ - matches character: "{" literally
\( - matches character: "(" literally
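Since a single pattern here only covers a fixed number of nesting levels, the interview question is usually easier to solve without a regex. The following is a minimal sketch of a counter-based check, not something taken from the answers above; the function name isBalanced is my own.

```javascript
// Counts open braces; fails as soon as a "}" appears without a matching "{".
function isBalanced(text) {
  let depth = 0;
  for (const ch of text) {
    if (ch === "{") {
      depth += 1;
    } else if (ch === "}") {
      depth -= 1;
      if (depth < 0) return false; // a close came before its open
    }
  }
  return depth === 0; // every open must have been closed
}

const passing = ["{(function(r){ return r; })()}", "{}{}{}{}", "{{{{}}}}"];
const failing = ["}{", "{{}}}", "{{{}}"];
console.log(passing.map(isBalanced), failing.map(isBalanced));
```

The test strings are the ones listed in the question; the first group should all pass and the second group should all fail.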
I am trying to combine:
^[a-zA-Z.][a-zA-Z'\\- .]*$
with
(\W|^)first\sname(\W|$)
which should check for the exact phrase, first name, if that is correct. It should match either the first regex OR the second exact match. I tried this, but it appears to be invalid:
^(([a-zA-Z.][a-zA-Z'\\- .]*$)|((\W|^)first\sname(\W|$))
This is in javascript btw.
Combining regular expressions generally can be done simply in the following way:
Regex1 + Regex2 = (Regex1|Regex2)
^[a-zA-Z.][a-zA-Z'\\- .]*$
+ (\W|^)first\sname(\W|$) =
(^[a-zA-Z.][a-zA-Z'\\- .]*$|(\W|^)first\sname(\W|$))
Because some SO users have a hard time understanding the math analogy, here's a full word explanation.
If you have a regex with content REGEX1 and a second regex with content REGEX2 and you want to combine them in the way that was described by OP in his question, a simple way to do this without optimization is the following.
(REGEX1|REGEX2)
Where you surround both regular expressions with parentheses and divide the two with |.
Your regex would be the following:
(^[a-zA-Z.][a-zA-Z'\\- .]*$|(\W|^)first\sname(\W|$))
Your first regex has an error in it, though, that makes it invalid. Try this instead.
(^[a-zA-Z.][a-zA-Z'\- .]*$|(\W|^)first\sname(\W|$))
You had \\ in the second character class where you wanted \
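The problem is that the first regex is messed up. You don't need to double escape characters in a regex literal. Therefore
\\-
followed by the space is read as a range running from \ (92) to the space character (32). That range is out of order, which is what makes the regex invalid. Remove one of the backslashes.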
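A short sketch of both points, assuming a JavaScript engine such as Node or a browser. The helper compiles and the sample inputs are hypothetical; the patterns are the ones from the question and the answers.

```javascript
// Combined pattern with the corrected character class (\- is a literal hyphen).
const combined = /^[a-zA-Z.][a-zA-Z'\- .]*$|(\W|^)first\sname(\W|$)/;

const matchesName = combined.test("Mary-Jane O'Neil"); // first alternative
const matchesPhrase = combined.test("1) first name");  // second alternative
const matchesNeither = combined.test("12345");         // neither alternative

// Hypothetical helper: builds a regex from source text so that a syntax
// error can be caught instead of stopping the script.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    return false;
  }
}

// String.raw keeps the backslashes exactly as typed.
const doubledCompiles = compiles(String.raw`[a-zA-Z'\\- .]`); // range from \ to space
const singleCompiles = compiles(String.raw`[a-zA-Z'\- .]`);   // literal hyphen

console.log(matchesName, matchesPhrase, matchesNeither, doubledCompiles, singleCompiles);
```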
Reference
I am learning regular expressions, and they seem very confusing to me for now.
val.replace(/^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g, '');
In the above expression
1) Which part says not to include white space? I am trying to exclude all non-alphanumeric characters.
2) Since I don't want to use even '$' and '_' (underscore), can I specify '$' and '_' (underscore) in the expression, something like below?
val.replace(/^[^a-zA-Z0-9$_]*|[^a-zA-Z0-9$_]*/g, '');?
3) 'x|y' means "find any of the alternatives specified". Then why have we used something like [^a-zA-Z0-9]|[^a-zA-Z0-9], which is the same on both sides?
Please help me understand this; I find it a bit confusing and difficult.
This regular expression removes all leading and trailing non-alphanumeric characters from the string.
It doesn't specifically mention whitespace. It just negates everything other than alphanumeric characters. Whatever is inside square brackets is a character set - [Whatever]. A leading caret (^) INSIDE the character set means negation. So [^a-zA-Z0-9]* means zero or more characters other than a-z, A-Z or 0-9.
The $ sign at the end means the end of the string and has nothing to do with the $ and _ symbols. Those are already covered by the character set, as it matches all non-alphanumeric characters.
Refer to smathy's answer.
Also, just FYI: AFAIU, regular expressions can't be learned by scrolling through a tutorial. You just need to go through the basics and try out the examples.
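A small sketch of what the negated set does with $ and _, using a made-up sample string. The second pattern is the variant from the question with the closing $ anchor kept, which I assume was intended.

```javascript
const sample = "  $price_ !!";

// Original set: $ and _ are not alphanumeric, so they are stripped too.
const alnumOnly = sample.replace(/^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g, "");

// Adding $ and _ inside the negated set excludes them from the match,
// so they are left in place.
const keepsDollarUnderscore = sample.replace(/^[^a-zA-Z0-9$_]*|[^a-zA-Z0-9$_]*$/g, "");

console.log(JSON.stringify(alnumOnly), JSON.stringify(keepsDollarUnderscore));
```

Inside the square brackets, $ is an ordinary character; only the $ outside the brackets acts as the end-of-string anchor.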
Some basic info.
When you read regular expressions, you read them from left to right. That's how the engine does it.
This is important in the case of alternations, as the alternatives on the left are always tried first.
But in the case of a $ (EOL or EOS) anchor, it might be easier to read from right to left.
Built-in assertions like the line break anchors ^ and $ and the word boundary \b, along with the normal assertions look ahead (?=) (?!) and look behind (?<=) (?<!), do not consume characters.
They are like single-path inline conditionals that pass or fail, where the expression to the right is examined only if the assertion passes. So they do actually match something: they match a condition.
Format your regex so you can see what it's doing. (Use an app to help you, such as RegexFormat 5.)
^                   # BOS
[^a-zA-Z0-9]*       # Optional not any alphanum chars
|                   # or,
[^a-zA-Z0-9]*       # Optional not any alphanum chars
$                   # EOS
Your regex in global context will match at the beginning of the string and again at the end, even when there is nothing to remove, because of the line break anchors and because you don't actually require anything else to match.
So basically you should avoid trying to match (mix) all optional things with the built-in anchors ^$\b. That means your regex is better represented by ^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$ since you don't care if it's NOT there (in the case of *, zero or more quantifier).
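The effect of the two quantifiers can be checked with String.prototype.match in global mode. This is a sketch with made-up inputs and variable names.

```javascript
const optional = /^[^a-zA-Z0-9]*|[^a-zA-Z0-9]*$/g; // * version from the question
const required = /^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$/g; // + version suggested above

// Nothing to strip, yet the * version still reports zero-length matches
// at the start and at the end of the string.
const emptyMatches = "abc".match(optional);

// The + version finds nothing when there is nothing to strip.
const noMatches = "abc".match(required);

// Both versions strip the same characters when they are present.
const trimmed = "--abc--".replace(required, "");

console.log(emptyMatches, noMatches, trimmed);
```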
Good Luck, keep studying.
To answer your third question, the alternatives run all the way to the / delimiters, so both sides are not the same. In the original regex the left alternative is "all non-alphanumerics at the start of the string" and the right alternative is "all non-alphanumerics at the end of the string".
I have been trying to use a regexp that matches any text that is between a caret, less than and a greater than, caret.
So it would look like: ^< THE TEXT I WANT SELECTED >^
I have tried something like this, but it isn't working: ^<(.*?)>^
I'm assuming this is possible, right? I think the reason I have been having such a tough time is because the caret serves as a quantifier. Thanks for any help I get!
Update
Just so everyone knows, the following from am not i am worked:
/\^<(.*?)>\^/
But it turned out that I was getting HTML entities, since I was getting my string by using the .innerHTML property. In other words,
&gt; ... >
&lt; ... <
To solve this, my regexp actually looks like this:
\^&lt;(.*?)((.|\n)*)&gt;\^
This includes the fact that the string in between should be any character or new line. Thanks!
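For text that comes from .innerHTML and spans several lines, one possible sketch is shown below. It swaps the (.|\n)* group for a lazy [\s\S]*?, which is my substitution rather than the pattern from the update, and the sample string is made up.

```javascript
// Sample as .innerHTML would return it: < and > arrive as &lt; and &gt;.
const html = "before ^&lt;first line\nsecond line&gt;^ after";

// [\s\S] matches any character including newlines; *? stops at the first "&gt;^".
const entityPattern = /\^&lt;([\s\S]*?)&gt;\^/;

const found = entityPattern.exec(html);
const inner = found ? found[1] : null;
console.log(JSON.stringify(inner));
```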
You need to escape the ^ symbol since it has special meaning in a JavaScript regex.
/\^<(.*?)>\^/
In a JavaScript regex, the ^ means beginning of the string, unless the m modifier was used, in which case it means beginning of the line.
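A minimal sketch of the difference, using the sample text from the question (the variable names are mine):

```javascript
const subject = "^< THE TEXT I WANT SELECTED >^";

// Unescaped: both carets are start-of-string assertions, so this cannot match here.
const unescaped = /^<(.*?)>^/;

// Escaped: both carets are literal characters.
const escaped = /\^<(.*?)>\^/;

const unescapedMatches = unescaped.test(subject);
const captured = escaped.exec(subject)[1];
console.log(unescapedMatches, JSON.stringify(captured));
```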
This should work:
\^<(.*?)>\^
In a regex, if you want to use a character that has a special meaning (caret, brackets, pipe, ...), you have to escape it using a backslash. For example, (\w\b)*\w\. will select a sequence of words terminated by a dot.
Careful!
If you have to pass the regex pattern as a string, i.e. there's no regex literal like in JavaScript or Perl, you may have to use a double backslash, which the programming language will unescape to a single one, which will then be processed by the regex engine.
Same regex in multiple languages:
Python:
import re
myRegex=re.compile(r"\^<(.*?)>\^") # The r before the string prevents backslash escaping
PHP:
$result=preg_match("/\\^<(.*?)>\\^/",$subject); // Notice the double backslashes here?
JavaScript:
var myRegex=/\^<(.*?)>\^/,
subject="^<blah example>^";
subject.match(myRegex);
If you tell us what programming language you're writing in, we'll be able to give you some finished code to work with.
Edit: Whoops, didn't even notice this was tagged as javascript. Then, you don't have to worry about double backslash at all.
Edit 2: \b represents a word boundary. Though I agree yours is what I would have used myself.
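One caveat for JavaScript: the double backslash does come back if the pattern is passed to the RegExp constructor as a string instead of being written as a literal. A sketch, with variable names of my choosing:

```javascript
const fromLiteral = /\^<(.*?)>\^/;

// In a string, "\\" is one backslash, so the engine receives \^ as intended.
const fromString = new RegExp("\\^<(.*?)>\\^");
const sameSource = fromLiteral.source === fromString.source;

// With single backslashes the string parser drops them, leaving bare ^ anchors.
const singleBackslash = new RegExp("\^<(.*?)>\^");

const subject = "^<blah example>^";
const captured = fromString.exec(subject)[1];
const singleMatches = singleBackslash.test(subject);
console.log(sameSource, captured, singleBackslash.source, singleMatches);
```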
I'm sitting here with "The Good Parts" in hand but I'm still none the wiser.
Can anyone knock up a regex for me that will allow me to replace any instances of "|" and "," in a string?
Also, could anyone point me in the direction of a really good resource for learning regular expressions, especially in javascript (are they a particular flavour??) It really is a weak point in my knowledge.
Cheers.
str.replace(/(\||,)/g, "replaceWith")
Don't forget the g at the end, so it searches the string globally. If you leave it out, the regex will only replace the first instance of the characters.
What this says is: replace | (you need to escape this character) OR (|) a comma.
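A short sketch of the effect of the g flag, with a made-up input string:

```javascript
const text = "a|b,c|d";

// Without g: only the first | or , is replaced.
const firstOnly = text.replace(/(\||,)/, ";");

// With g: every | and , is replaced.
const everywhere = text.replace(/(\||,)/g, ";");

console.log(firstOnly, everywhere);
```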
Nice Cheatsheet here
The best resource I have found if you really want to understand regular expressions (and the special caveats or quirks of any of a majority of the implementations/flavors) is Regular-Expressions.info.
If you really get into regular expressions, I would recommend the product called RegexBuddy for testing and debugging regular expressions in all sorts of languages (though there are a few things it does not quite support, it is rather good overall)
Edit:
The best way (I think, especially if you consider readability) is using a character class rather than alternation (i.e.: [] instead of |)
use:
var newString = str.replace(/[|,]/g, ";");
This will replace either a | or a , with a semicolon
The character class essentially means "match anything inside these square brackets" - with only a few exceptions.
First, you can specify ranges of characters ([a-zA-Z] means any letter from a to z or from A to Z).
Second, putting a caret (^) at the beginning of the character class negates it - it means anything not in this character class ([^0-9] means any character that is not from 0 to 9).
Put the dash at the beginning and the caret at the end of the character class to match those characters literally, or escape them anywhere else in the class with a \ if you prefer.
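The character class rules above can be tried out with a few replacements. This is a sketch with made-up inputs and variable names.

```javascript
// Inside a class, | needs no escape and no alternation is required.
const viaClass = "a|b,c".replace(/[|,]/g, ";");

// A leading ^ negates the class: remove everything that is not a digit.
const digitsOnly = "a1b22c".replace(/[^0-9]/g, "");

// Dash first and caret last are taken literally.
const dashAndCaret = "2^10-1".replace(/[-^]/g, " ");

// Escaped, they can appear in any position.
const escapedAnywhere = "2^10-1".replace(/[\^\-]/g, " ");

console.log(viaClass, digitsOnly, dashAndCaret, escapedAnywhere);
```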