It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 9 years ago.
I want to find a . and then capture every character after it up to the next (, using a regex. How do I make that happen?
I would also like to see some good tutorials on regexes for JavaScript.
The following regex should work for you:
[.]([^(]*)[(]
The text you want to capture will be available in group 1.
JavaScript code:
const str = 'here is a sentence. and some other text ( here )';
const match = str.match(/[.]([^(]*)[(]/);
console.log(match[1]); // " and some other text " (the surrounding spaces are captured too)
Live Demo: http://www.rubular.com/r/ALqusiC9EQ
It seems like you're having trouble with the concept of escaping. The . and ( characters have special meaning in RegEx, so you need to escape them by placing a \ in front of them. For example, to match a literal dot, you might use \.
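A minimal JavaScript sketch of the difference, using made-up sample strings: escaping with a backslash and wrapping the character in a class give the same result, while an unescaped . matches any character at all.

```javascript
// Escaped form: \. is a literal dot and \( a literal open parenthesis.
const escaped = /\.([^(]*)\(/;
// Character-class form from the first answer: inside [] both are already literal.
const bracketed = /[.]([^(]*)[(]/;

const str = "here is a sentence. and some other text ( here )";
console.log(str.match(escaped)[1] === str.match(bracketed)[1]); // true

// Without the escape, "." means "any character", so it matches the "h" of "here".
const noDot = "here is text ( x )";
console.log(noDot.match(/.([^(]*)\(/)[0]); // here is text (
console.log(noDot.match(escaped));         // null: there is no literal dot
```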
For repetition, you can use * (zero or more) or + (one or more). Each one modifies the expression immediately before it: A+ means "one or more A characters", whereas A* means "zero or more A characters". You can also add the ? modifier after either of them to change their "greedy" behavior, but that's a more complicated topic.
If you need to constrain the exact number of repetitions, you can use the {n} syntax. For example, you might use A{10} to match exactly 10 A characters, or A{3,5} to match between 3 and 5 A characters.
These also work on groups and classes, e.g. [A-Z]{3} or (a*b+){3}.
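A few quick checks of these quantifiers in JavaScript. The ^ and $ anchors are an addition here, so that the whole test string has to match rather than just part of it.

```javascript
console.log(/^A*$/.test(""));                // true: * allows zero
console.log(/^A+$/.test(""));                // false: + needs at least one
console.log(/^A{10}$/.test("A".repeat(10))); // true: exactly 10
console.log(/^A{3,5}$/.test("AAAA"));        // true: between 3 and 5
console.log(/^A{3,5}$/.test("AAAAAA"));      // false: 6 is too many
console.log(/^[A-Z]{3}$/.test("ABC"));       // true: quantifier on a class
console.log(/^(a*b+){3}$/.test("abbaabb"));  // true: "ab" + "b" + "aabb"
```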
As far as RegEx tutorials go, pretty much nowhere beats Regular-Expressions.info, though the MDN article on RegEx might be useful on the JavaScript side of things too.
This question already has answers here:
Regular expression to stop at first match
(9 answers)
Closed 2 years ago.
I have this gigantic ugly string:
J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM
J0000010: Project name: E:\foo.pf
J0000011: Job name: MBiek Direct Mail Test
J0000020: Document 1 - Completed successfully
I'm trying to extract pieces from it using regex. In this case, I want to grab everything after Project Name up to the part where it says J0000011: (the 11 is going to be a different number every time).
Here's the regex I've been playing with:
Project name:\s+(.*)\s+J[0-9]{7}:
The problem is that it doesn't stop until it hits the J0000020: at the end.
How do I make the regex stop at the first occurrence of J[0-9]{7}?
Make .* non-greedy by adding '?' after it:
Project name:\s+(.*?)\s+J[0-9]{7}:
Using a non-greedy quantifier is probably the best solution here, partly because it is usually more efficient than the greedy alternative: greedy matches generally go as far as they can (here, to the end of the text!) and then backtrack character by character to try to match the part that comes afterwards.
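However, consider using a negated character class instead:
Project name:\s+(\S*)\s+J[0-9]{7}:
\S is shorthand for the negated class [^\s], meaning "any character except whitespace", and that is exactly what you want here, as long as the project path itself never contains spaces.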
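A JavaScript sketch comparing the three patterns on the sample text. One assumption to flag: in JavaScript (as in most flavors by default) . does not match line breaks unless the s (dotAll) flag is set, so with the entries on separate lines the greedy pattern would happen to stop at the first line break. To reproduce the problem from the question, the sketch joins the sample lines with "\n" and adds the s flag to the .* and .*? patterns.

```javascript
// Sample text from the question; the backslash in the path is escaped for JS.
const log = [
  "J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM",
  "J0000010: Project name: E:\\foo.pf",
  "J0000011: Job name: MBiek Direct Mail Test",
  "J0000020: Document 1 - Completed successfully",
].join("\n");

// The s (dotAll) flag lets "." match "\n", reproducing the original problem.
const greedy = /Project name:\s+(.*)\s+J[0-9]{7}:/s;
const lazy = /Project name:\s+(.*?)\s+J[0-9]{7}:/s;
// \S* cannot cross whitespace at all, so it needs no flag.
const noSpaces = /Project name:\s+(\S*)\s+J[0-9]{7}:/;

// Greedy overshoots to the LAST J-number, swallowing the Job name line.
console.log(JSON.stringify(log.match(greedy)[1]));
// "E:\\foo.pf\nJ0000011: Job name: MBiek Direct Mail Test"
console.log(log.match(lazy)[1]);     // E:\foo.pf
console.log(log.match(noSpaces)[1]); // E:\foo.pf
```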
Well, .* is a greedy quantifier. You make it non-greedy by writing .*? instead. With the latter construct, each time the engine is about to match one more character with the ".", it first tries to match whatever comes after the .*?. This means that if, for instance, nothing comes after the .*?, it matches nothing at all.
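A small JavaScript illustration of that behavior, on a made-up string:

```javascript
const text = "x=1; y=2; z=3;";
// Greedy: .* runs to the end of the string, then backs up to the LAST ";".
console.log(text.match(/=(.*);/)[1]);  // 1; y=2; z=3
// Lazy: before taking each extra character, .*? tries ";" first,
// so it stops at the FIRST ";".
console.log(text.match(/=(.*?);/)[1]); // 1
// With nothing after it, .*? is satisfied by the empty string.
console.log(JSON.stringify("abc".match(/.*?/)[0])); // ""
```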
Here's what I used, where s contains your original string. This code is .NET-specific (it needs using System.Text.RegularExpressions;), but most regex flavors have something similar.
string m = Regex.Match(s, @"Project name: (?<name>.*?) J\d+").Groups["name"].Value;
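For comparison, a rough JavaScript equivalent of that .NET line, assuming a runtime with named-group support (ES2018 or later). It swaps the literal spaces for \s+ so it also works when the entries are separated by line breaks; s here is a shortened version of the sample text.

```javascript
const s = "J0000010: Project name: E:\\foo.pf\nJ0000011: Job name: MBiek Direct Mail Test";
// (?<name>...) is a named capture group; its text is exposed on match.groups.name.
const m = s.match(/Project name:\s+(?<name>.*?)\s+J\d+/);
const name = m ? m.groups.name : null; // null when there is no match
console.log(name); // E:\foo.pf
```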
I would also recommend experimenting with regular expressions in "Expresso": it's a great (and free) utility for regex editing and testing.
One of its upsides is that its UI exposes a lot of regex functionality that people inexperienced with regex might not be familiar with, in a way that makes these new concepts easy to learn.
For example, when building your regex in the UI and choosing "*", you can tick the "As few as possible" checkbox and see the resulting regex, as well as test its behavior, even if you were unfamiliar with non-greedy expressions before.
Available for download at their site:
http://www.ultrapico.com/Expresso.htm
Express download:
http://www.ultrapico.com/ExpressoDownload.htm
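(Project name:\s+[A-Z]:(?:\\\w+)+\.[a-zA-Z]+\s+J[0-9]{7})(?=:)
This will work for you.
Using [A-Z]:(?:\\\w+)+\.[a-zA-Z]+ instead of .* is more restrictive: it only accepts a drive letter, one or more \name segments made of word characters, a literal dot and a letters-only extension. Note that group 1 holds the whole match, from Project name: through the J number.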
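Written as a JavaScript literal, the restrictive pattern needs \\\w+ (a literal backslash, then word characters) and \. (a literal dot) to match a path such as E:\dir\foo.pf. A sketch with two made-up inputs: the second has a space in a folder name, which this pattern rejects because \w does not include spaces.

```javascript
// \\ is a literal backslash and \. a literal dot; (?=:) checks for ":" without consuming it.
const restrictive = /(Project name:\s+[A-Z]:(?:\\\w+)+\.[a-zA-Z]+\s+J[0-9]{7})(?=:)/;

const ok = "J0000010: Project name: E:\\dir\\foo.pf\nJ0000011: Job name: Test";
const spaced = "J0000010: Project name: E:\\my dir\\foo.pf\nJ0000011: Job name: Test";

console.log(JSON.stringify(ok.match(restrictive)[1]));
// "Project name: E:\\dir\\foo.pf\nJ0000011"
console.log(spaced.match(restrictive)); // null: a space is not a word character
```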
Closed 10 years ago.
I have absolutely no experience with regular expressions and I need some help setting one up to match a string. This is for phone number validation. I need to make sure that a string the user inputs contains only upper-case letters A-Z, digits 0-9, open/close parentheses ( ), and hyphens (-). I also don't know whether I should use match or some other string method.
RegEx is explained poorly all over the web. I don't fault anyone for asking more general questions about it, and this is different from the other post, which is more do-it-for-me Google evasion than a specific question. The characters you asked about:
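[A-Z] - any upper-case letter
[0-9] or \d - any digit
\( - a literal open parenthesis
\) - a literal close parenthesis
- - a literal hyphen (inside [], put it first or last, or escape it as \-, so it isn't read as a range)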
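Combining the characters listed above into one class gives a sketch of the validation the question describes, assuming the whole input must consist of those characters only. For a yes/no check, RegExp's test() method is usually all you need; String's match() returns the matched text (or null) instead of a boolean.

```javascript
// ^ and $ anchor the class to the full input; + requires at least one character.
// Inside [] the parentheses are literal, and the hyphen is placed last
// so it is a literal "-", not a range.
const allowed = /^[A-Z0-9()-]+$/;

console.log(allowed.test("(555)CALL-NOW")); // true
console.log(allowed.test("555-1234"));      // true
console.log(allowed.test("555 1234"));      // false: space is not allowed
console.log(allowed.test("abc-123"));       // false: lower-case letters
```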
/matchme/ is a regular expression literal. This is preferable to using the RegExp constructor, because with the constructor you end up having to escape your escape backslashes, which gets really ugly.
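A sketch of the same pattern (a literal dot between two digit runs) written both ways:

```javascript
const literal = /\d+\.\d+/;
// In a string, every regex backslash must itself be escaped as "\\".
const constructed = new RegExp("\\d+\\.\\d+");

console.log(literal.source === constructed.source); // true
console.log(constructed.test("3.14"));              // true
// Forgetting to double them: in a string "\d" is just "d" and "\." just ".".
console.log(new RegExp("\d+\.\d+").source);         // d+.d+
```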
You can use regex literals with a lot of string methods, like match, search, replace, split, etc.
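For example, with made-up phone-style input:

```javascript
// split on a class of separators
console.log("123-456.7890".split(/[-.]/));        // [ '123', '456', '7890' ]
// replace with the g flag: strip every non-digit (\D)
console.log("(123) 456-7890".replace(/\D/g, "")); // 1234567890
// match with the g flag returns every match
console.log("a1b22c333".match(/\d+/g));           // [ '1', '22', '333' ]
```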
Without a special character following it, any ordinary character matches exactly one character at that position in the string. Stuff in [] is a character class: it can match more than one KIND of character, but still only one character, the one at the position right after the last position matched. You might find [-. ] useful for identifying the non-digit separators in telephone numbers (the hyphen goes first so it isn't read as a range; written as [.- ] it would be an invalid range from "." down to " "). You can also express ranges in character classes, e.g. [a-hA-H] or [4-9].
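A short JavaScript sketch of those points, with made-up inputs:

```javascript
// A class matches ONE character, chosen from the listed set or range.
console.log(/^[4-9]$/.test("7"));  // true
console.log(/^[4-9]$/.test("77")); // false: the class covers a single position

// Separator class with the hyphen first, so it is taken literally.
const seps = "123-456.7890 x".match(/[-. ]/g);
console.log(seps); // [ '-', '.', ' ' ]

// Written as [.- ], the hyphen forms a range from "." down to " ",
// which is out of order, so the engine rejects the pattern.
let error = null;
try {
  new RegExp("[.- ]");
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true
```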
But the one-position-at-a-time rule goes out the window when you start using these follow-up characters (quantifiers):
? - one or none
* - 0 or many
+ - 1 or more
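Quick JavaScript checks of the three, anchored with ^ and $ so the whole string must match:

```javascript
console.log(/^colou?r$/.test("color"));   // true: u? allows no "u"...
console.log(/^colou?r$/.test("colour"));  // true: ...or exactly one
console.log(/^colou?r$/.test("colouur")); // false: but not two
console.log(/^ab*c$/.test("ac"));         // true: b* allows zero "b"
console.log(/^ab+c$/.test("ac"));         // false: b+ needs at least one "b"
console.log(/^ab+c$/.test("abbbc"));      // true
```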
Avoid the . wildcard character where you can. It matches everything except line terminators (in JavaScript: \n, \r, \u2028 and \u2029), so a pattern like .* tends to run far past the text you care about and then backtrack, which can be slow. More importantly, the better-performing alternative is much more powerful and helpful: negated character classes are much faster in these cases. [^<]* represents zero or more positions of anything that is NOT a < character, so it stops at the first < instead of overshooting.
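A small JavaScript illustration, on a made-up string, of why the negated class is often the better tool: .* overshoots to the last > in the string, while [^>]* cannot cross a > at all.

```javascript
const html = "<a>text</a><b>more</b>";
// Greedy wildcard: runs to the end, then backs up to the LAST ">".
console.log(html.match(/<(.*)>/)[1]);    // a>text</a><b>more</b
// Negated class: cannot consume ">", so it stops at the FIRST one.
console.log(html.match(/<([^>]*)>/)[1]); // a
// With the g flag it picks out every tag in one pass.
console.log(html.match(/<[^>]*>/g));     // [ '<a>', '</a>', '<b>', '</b>' ]
```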
This is very handy for XML/SGML-style parsing, which, in spite of what many on Stack Overflow have said, is perfectly feasible with regex, since modern regex engines are no longer technically confined to "regular" languages. You have to be aware of what you're looking at when dealing with something that allows as much sloppiness as somebody else's HTML, but that's just a 'duh' in my book.
Crockford warns against negated character classes in JSLint. Crockford is painfully wrong on that count. They are not only much more efficient, they also make it much easier to think through how to tokenize stuff. If there is a security risk, you can set explicit limits on the number of characters matched with {} braces, e.g. p{2,5}, which matches two to five p characters, {5} for exactly five, {0,5} for up to five, or {5,} for at least five. (Unlike some other flavors, JavaScript does not treat {,5} as a quantifier, so spell out the 0.)
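A few JavaScript checks of the brace limits, including a length cap on a negated class (the inputs are made up):

```javascript
console.log(/^p{2,5}$/.test("ppp"));        // true: two to five
console.log(/^p{5}$/.test("pppp"));         // false: exactly five required
console.log(/^p{5,}$/.test("p".repeat(8))); // true: five or more
console.log(/^p{0,5}$/.test(""));           // true: up to five, zero included
// Capping a negated class, e.g. a tag name of 1 to 10 characters:
console.log(/^<[^>]{1,10}>$/.test("<div>"));                    // true
console.log(/^<[^>]{1,10}>$/.test("<" + "x".repeat(50) + ">")); // false
```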
Other random stuff you should look up:
(ph|f) - ph or f - helpful for finding phish and fish (when a class won't do, basically)
^ - represents the beginning of a string. Think of it as a condition on the next character more than a character itself. Yes, as the first character inside [] it also negates a character class.
$ - represents the end of a string. Same caveat as above, but on the previous character.
\ - used to escape special symbols. Note: many symbols that are special outside a class have no special meaning inside [] and need no \ there.
\s, \w, \d - These represent commonly used sets of characters. The first is pretty much all whitespace (JS-style escapes such as \t and \n typically have regex equivalents), followed by \w for word characters (class equivalent [a-zA-Z0-9_]) and \d for digits [0-9]. Capitalize any of these (\S, \W, \D) for the exact opposite.
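A quick JavaScript tour of the items above, using made-up strings:

```javascript
console.log("phish and fish".match(/(ph|f)ish/g)); // [ 'phish', 'fish' ]
console.log(/^fish/.test("fish and phish"));       // true: starts with "fish"
console.log(/^fish/.test("phish and fish"));       // false
console.log(/fish$/.test("phish and fish"));       // true: ends with "fish"
console.log(/[^0-9]/.test("12345"));               // false: ^ inside [] negates
console.log("1+2=3".match(/[+=]/g));               // [ '+', '=' ]: no \ needed in []
console.log("a_1 b-2".match(/\w+/g));              // [ 'a_1', 'b', '2' ]
console.log("tab\there".replace(/\s/g, "_"));      // tab_here
console.log("x1y2".replace(/\D/g, ""));            // 12
```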
There's more, like back-references and lookaheads, whose use cases are worth knowing, but this is the commonly used stuff I actually remember from regular experience (bwaahaahaa).
I assume you're looking for non-US numbers since you have that A-Z concern, and I'm sure there are plenty of US phone-number regexes out there, but I'd probably do something like this for US numbers:
/\(?\d{3}[)\-. ]?\d{3}[\-. ]?\d{4}/
to match:
123-456-7890
(123)456-7890
123.456.7890
123 456 7890
1234567890
But also perhaps messily allows:
(123456.7890
...which I'm willing to live with for the sake of avoiding complexity. Resist the temptation to do it all with one expression. Sometimes it's much cleaner to strip leading/trailing whitespace first, for instance, and then apply an expression. The split and join methods are very powerful for tokenizing.
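A quick JavaScript check of the US pattern against the samples listed above. One change is flagged as an addition: here the pattern is wrapped in ^...$ so that extra characters before or after the number make the test fail (unanchored, it would accept 123-456-78901 by matching its first ten digits). The last line trims the input first, in the spirit of the advice above.

```javascript
// The answer's US pattern, anchored with ^...$ (an addition, not in the original).
const usPhone = /^\(?\d{3}[)\-. ]?\d{3}[\-. ]?\d{4}$/;

const samples = [
  "123-456-7890",
  "(123)456-7890",
  "123.456.7890",
  "123 456 7890",
  "1234567890",
  "(123456.7890",  // the messy case the answer accepts
  "123-456-78901", // rejected only because of the anchors
];
for (const s of samples) {
  console.log(s.padEnd(15), usPhone.test(s));
}

// Trimming first keeps the pattern simple.
console.log(usPhone.test("  123-456-7890  ".trim())); // true
```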
If this goes like a usual regex conversation, somebody will shortly point out something I missed in my pattern. So yeah, test them out on real input. There are sites that let you set the expression and then just plug in characters to try to break it.
This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center.
Closed 10 years ago.
I'm trying to modify and update an old Greasemonkey script with the goal of automatically adding an affiliate ID to all Amazon links. I'm a novice when it comes to JavaScript, but I'm usually pretty good about modifying existing scripts in any language. There's just one line here that I can't wrap my head around.
The script I started with is outdated, so I don't know if there is a problem with the syntax or if the link format has changed. Can somebody please help me understand what this line is doing so I can make changes to it?
const affiliateLink = /(obidos.(ASIN.{12}([^\/]*(=|%3D)[^\/]*\/)*|redirect[^\/]*.(tag=)?))[^\/&]+/i;
Alright, you asked for it :)
Start the regular expression:
/
Start a group operation:
(
Search for the text "obidos" followed by any single character
obidos.
Open another group operator:
(
Search for the text "ASIN" followed by any 12 characters
ASIN.{12}
Another group operation:
(
Followed by 0 or more characters that are not slashes:
[^\/]*
Group operation searching for an '=' character or a URL-encoded '=' (%3D):
(=|%3D)
Followed by 0 or more characters that are not slashes:
[^\/]*
Followed by a slash, which closes the current group; the * after it lets that whole group repeat zero or more times:
\/)*
Alternation: inside the group opened right after "obidos.", match either everything to the left of the bar (the ASIN branch) or everything to the right of it (the redirect branch):
|
Matches the text "redirect" followed by 0 or more characters that are not a slash:
redirect[^\/]*
Matches any single character, followed optionally by the text "tag=":
.(tag=)?
Closes the two group operations we're currently still inside of:
))
Followed by one or more characters that are not a slash or &:
[^\/&]+
Closes the regular expression, with the trailing i flag making the whole match case-insensitive:
/i
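A sketch of the regex in action on a hypothetical old-style link (the URL and the tags someone-20 and mytag-20 are made up for illustration). The ASIN branch matches "/" + the 10-character ASIN + "/" as its 12 characters, group 1 ends just before the existing tag, and the final [^\/&]+ takes the tag itself. Replacing the match with "$1" plus a new ID is one plausible way a script like this could swap the tag; the original script's actual replacement code isn't shown in the question.

```javascript
// The regex from the question, unchanged.
const affiliateLink = /(obidos.(ASIN.{12}([^\/]*(=|%3D)[^\/]*\/)*|redirect[^\/]*.(tag=)?))[^\/&]+/i;

// Hypothetical link, made up for illustration.
const url = "http://www.amazon.com/exec/obidos/ASIN/0596517742/someone-20";
const m = url.match(affiliateLink);

console.log(m[0]); // obidos/ASIN/0596517742/someone-20  (whole match)
console.log(m[1]); // obidos/ASIN/0596517742/  (group 1: everything before the tag)

// Keep group 1, replace the old tag with a (hypothetical) new affiliate ID.
const updated = url.replace(affiliateLink, "$1mytag-20");
console.log(updated); // http://www.amazon.com/exec/obidos/ASIN/0596517742/mytag-20
```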
Download a copy of Expresso; it's a great utility for this and comes in handy for all this stuff. Then just paste the regex into it (everything between the starting slash and the ending slash).
I would describe what strings it matches, etc., but it's fairly complex, as there are lots of components to it. It's easier for you to look at it yourself: Expresso provides a more English-like explanation of each pattern.