silent group not working in javascript regex match() - javascript

I'm trying to extract (potentially hyphenated) words from a string that have been marked with a '#'.
So for example from the string
var s = '#moo, #baa and #moo-baa are writing an email to a#bc.de'
I would like to return
['#moo', '#baa', '#moo-baa']
To make sure I don't capture the email address, I check that the group is preceded by a white-space character OR the beginning of the line:
s.match(/(^|\s)#(\w+[-\w+]*)/g)
This seems to do the trick, but it also captures the spaces, which I don't want:
["#moo", " #baa", " #moo-baa"]
Silencing the grouping like this
s.match(/(?:^|\s)#(\w+[-\w+]*)/g)
doesn't seem to work, it returns the same result as before. I also tried the opposite, and checked that there's no \w or \S in front of the group, but that also excludes the beginning of the line. I know I could simply trim the spaces off, but I'd really like to get this working with just a single 'match' call.
Anybody have a suggestion what I'm doing wrong? Thanks a lot in advance!!
[edit]
I also just noticed: Why is it returning the '#' symbols as well?! I mean, it's what I want, but why is it doing that? They're outside of the group, aren't they?

As far as I know, String.match returns the whole matches when you use the "g" modifier. With the modifier, you are telling the function to find every match of the whole expression instead of returning one match followed by its numbered sub-expressions (groups). A global match does not return groups; each element of the result is a complete match, which is why the '#' and the leading space show up.
In your case, the regular expression you were looking for might be this:
'#moo, #baa and #moo-baa are writing an email to a#bc.de'.match(/(?!\b)(#[\w\-]+)/g);
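You are looking for every "#" symbol that does not sit at a word boundary: (?!\b) rejects the "#" in a#bc.de, where a word character comes right before it, and accepts a "#" at the start of the string or after a space. Nothing is consumed in front of the "#", so there is no need for silent groups.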
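The difference between the two forms of match(), and the word-boundary trick, can be checked with a short sketch. The variable names are illustrative only, and \B is used here as an equivalent spelling of the (?!\b) lookahead from the answer.

```javascript
const s = '#moo, #baa and #moo-baa are writing an email to a#bc.de';

// Without the g flag: a single match, reported as [whole match, group 1].
const first = s.match(/(?:^|\s)#(\w+[-\w+]*)/);
console.log(first[0], first[1]); // #moo moo

// With the g flag: every whole match. Capture groups are not reported,
// and the non-capturing (?:^|\s) is still part of each whole match.
const all = s.match(/(?:^|\s)#(\w+[-\w+]*)/g);
console.log(all); // [ '#moo', ' #baa', ' #moo-baa' ]

// \B matches only where there is no word boundary and consumes nothing,
// so the '#' in 'a#bc.de' is skipped and no leading space is returned.
const tags = s.match(/\B#[\w-]+/g);
console.log(tags); // [ '#moo', '#baa', '#moo-baa' ]
```

A non-capturing group changes what is stored in the groups, not what is consumed, which is why switching to (?:...) did not remove the spaces.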

If you don't want to capture the space, don't put the \s inside of the parentheses. Anything inside the parentheses will be returned as part of the capture group.

Related

regex validating if string ends with specific set of words [duplicate]

I'm creating a javascript regex to match queries in a search engine string. I am having a problem with alternation. I have the following regex:
.*baidu.com.*[/?].*wd{1}=
I want to be able to match strings that have the string 'word' or 'qw' in addition to 'wd', but everything I try is unsuccessful. I thought I would be able to do something like the following:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
but it does not seem to work.
Replace [wd|word|qw] with (wd|word|qw) or (?:wd|word|qw).
[] denotes character sets, () denotes logical groupings.
Your expression:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
does need a few changes, including [wd|word|qw] to (wd|word|qw) and getting rid of the redundant {1}, like so:
.*baidu.com.*[/?].*(wd|word|qw)=
But you also need to understand that the first part of your expression (.*baidu.com.*[/?].*) will match baidu.com hello what spelling/handle????????? or hbaidu-com/ or even something like lkas----jhdf lkja$##!3hdsfbaidugcomlaksjhdf.[($?lakshf, because the dot (.) matches any character except newlines. To match a literal dot, you have to escape it with a backslash (like \.).
There are several approaches you could take to match things in a URL, but we could help you more if you tell us what you are trying to do or accomplish - perhaps regex is not the best solution or (EDIT) only part of the best solution?
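A small sketch of both points made above: a character class versus a group, and an unescaped versus an escaped dot. The sample URL and the variable names are made up for illustration, and the leading .* is dropped from the stricter pattern because test() searches anywhere in the string.

```javascript
// Character class: matches ONE character out of w, d, |, o, r, q.
const charClass = /[wd|word|qw]=/;
// Group with alternation: matches one of the three whole words.
const group = /(?:wd|word|qw)=/;

console.log(charClass.test('|=')); // true
console.log(group.test('|='));     // false

// Unescaped dot: "baidu.com" also matches "baidu-com".
const loose = /.*baidu.com.*[\/?].*(?:wd|word|qw)=/;
// Escaped dot: only a literal "." is accepted.
const strict = /baidu\.com.*[\/?].*(?:wd|word|qw)=/;

const url = 'http://www.baidu.com/s?word=regex'; // hypothetical URL
console.log(loose.test('hbaidu-com/wd='));  // true
console.log(strict.test('hbaidu-com/wd=')); // false
console.log(strict.test(url));              // true
```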

How to extract separate parts of a string with a regex

I'm trying to build a regex that can process the following:
abc
abc-def
where the -def part is optional.
I'm wanting to get capture groups for the "abc", and optional "def" part.
I've tried this (in Javascript) but can't seem to figure out the optional part:
/^(.*)+(-(.*))?$/
It matches both examples but the optional part is contained in the first capture group. This should be simple, but I can't seem to get it right.
You're close. Drop the stray + and add a ? after the first .* to make it lazy:
/^(.*?)(-(.*))?$/
You can try /^([^-]+)(-(.*))?$/. One issue with your pattern is that the first + is outside of the capture group, so the group is repeated and only keeps the text of its last repetition. Secondly, the .* is greedy and will match a -, gobbling all the way to the end of the line.
Runnable example:
console.log("abc-def".match(/^([^-]*)(-(.*))?$/));
console.log("abc".match(/^([^-]*)(-(.*))?$/));
You may not need to capture the substring starting with -, in which case /^([^-]*)(?:-(.*))?$/ could work.
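With the non-capturing variant, the two parts can be read straight out of the match array. This is a minimal sketch; splitParts, head, and tail are hypothetical names, not something from the question.

```javascript
// Split "abc-def" into the part before the first "-" and the optional rest.
function splitParts(input) {
  const m = /^([^-]*)(?:-(.*))?$/.exec(input);
  if (m === null) {
    return null;
  }
  // m[0] is the whole match; m[1] and m[2] are the two capture groups.
  const [, head, tail] = m;
  return { head, tail };
}

console.log(splitParts('abc-def')); // { head: 'abc', tail: 'def' }
console.log(splitParts('abc'));     // { head: 'abc', tail: undefined }
```

When the optional group does not take part in the match, its slot in the array is undefined rather than an empty string.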

Unable to find a string matching a regex pattern

While trying to submit a form, a JavaScript regex validation always fails for my string.
Regex:- ^(([a-zA-Z]:)|(\\\\{2}\\w+)\\$?)(\\\\(\\w[\\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
I have tried following strings against it
abc.jpg,
abc:.jpg,
a:.jpg,
a:asdas.jpg,
What string could possibly match this regex?
This regex won't match against anything because of that $? in the middle of the string.
Apparently, applying the optional quantifier ? to the end-of-string anchor $ is not valid (if you paste it into https://regex101.com/ it does indeed report an error). Even if the JavaScript parser ignores the error and keeps the regex as it is, you would still be trying to match the end of the string in the middle of a string that is supposed to continue.
Without the extra escaping, that part was supposed to match a literal dollar sign (\$), but as it is written it won't work.
If you want your string to be accepted at any cost, you can probably use Firebug or a similar developer tool and edit the string inside the JavaScript code (assuming there is no server-side check as well, and that it is not wrong too). If you ignore the $? then a matching string will be \\\\w\\\\ww.jpg (but since the . is unescaped, even \\\\w\\\\ww%jpg is a match).
Of course, I wrote this answer assuming the escaping is indeed the one you showed in the question. If you need to find a matching string for the correctly escaped pattern ^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(\.jpeg|\.JPEG|\.jpg|\.JPG)$ then you can use this tool to generate one: http://fent.github.io/randexp.js/ (though it will find weird matches). One matching string is c:\zz.jpg
If you are just looking for a regular expression to match what you got there, go ahead and test this out:
(\w+:?\w*\.[jpe?gJPE?G]+,)
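That should match the sample strings listed in the question. Remove the trailing comma from the pattern if you don't need it. Keep in mind that [jpe?gJPE?G]+ is a character class, so it accepts any run of those characters rather than only the four extensions.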
If you remove escape level, the actual regex is
^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
After the ^ anchor comes the first group, (([a-zA-Z]:)|(\\{2}\w+)\$?), which matches either a letter followed by a colon, or two backslashes followed by one or more word characters and an optional literal $. Some of the inner parentheses are unnecessary.
The second part, (\\(\w[\w].*))+, matches a backslash followed by two word characters. The \w[\w] looks odd because it is equivalent to \w\w (the second \w does not need a character class). That is followed by any number of any characters, and the whole group repeats one or more times.
In the last part, (.jpeg|.JPEG|.jpg|.JPG), the dot was probably meant to match a literal dot and should be escaped (\.). This part can be reduced to \.(JPE?G|jpe?g).
It would match something like
A:\12anything.JPEG
\\1$\anything.jpg
Play with it at regex101. A more readable version could be
^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$
Also read the explanation on regex101 to understand any pattern, it's helpful!
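The simplified pattern can be tried against the strings from this thread with a quick sketch. The variable name imagePath is illustrative; note that every backslash in a test string has to be doubled inside a JavaScript string literal, while the regex literal itself is written exactly as shown above.

```javascript
// The more readable pattern from the answer, as a regex literal.
const imagePath = /^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$/;

// 'A:\\12anything.JPEG' is the 18-character string A:\12anything.JPEG
console.log(imagePath.test('A:\\12anything.JPEG'));  // true
// '\\\\1$\\anything.jpg' is the string \\1$\anything.jpg
console.log(imagePath.test('\\\\1$\\anything.jpg')); // true
console.log(imagePath.test('c:\\zz.jpg'));           // true

// The strings tried in the question have no backslash-separated part.
console.log(imagePath.test('abc.jpg'));     // false
console.log(imagePath.test('a:asdas.jpg')); // false
```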

Javascript Regex: how to simulate "match without capture" behavior of positive lookbehind?

I have a relatively simple regex problem - I need to match specific words in a string, if they are entire words or a prefix. With word boundaries, it would look something like this:
\b(word1|word2|prefix1|prefix2)
However, I can't use the word boundary condition because some words may start with odd characters, e.g. .999
My solution was to look for whitespace or starting token for these odd cases.
(\b|^|\s)(word1|word2|prefix1|prefix2)
Now words like .999 will still get matched correctly, BUT it also captures the whitespace preceding the matched words/prefixes. For my purposes, I can't have it capture the whitespace.
Positive lookbehinds seem to solve this, but javascript doesn't support them. Is there some other way I can get the same behavior to solve this problem?
You can use a non-capturing group using (?:):
/(?:\b|^|\s)(word1|word2|prefix1|prefix2)/
UPDATE:
Based on what you want to replace it with (and @AlanMoore's good point about the \b), you probably want to go with this:
var regex = /(^|\s)(word1|word2|prefix1|prefix2)/g;
myString.replace(regex,"$1<span>$2</span>");
Note that I changed the first group back to a capturing one since it'll be part of the match but you want to keep it in the replacement string (right?). I also added the g modifier so that this happens for all occurrences in the string (assuming that's what you wanted).
Let's get the terminology straight first. A regex normally consumes everything it matches. When you do a replace(), everything that was consumed is overwritten. You can also capture parts of the matched text separately and plug them back in using $1, $2, etc.
When you were using the word boundary you didn't have to worry about this, because \b doesn't consume anything. But now you're consuming the leading whitespace character if there is one, so you have to plug it back in. I don't know what you're replacing the match with, so I'll just replace them with nothing for this demonstration.
result = subject.replace(/(^|\s)(word1|word2|prefix1|prefix2)/g, "$1");
Note that the \b isn't needed any more. In fact, you must remove it, or it will match things like .999 in xyz.999, because \b matches between z and .. I'm pretty sure you don't want that.
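Lookbehind assertions were added to the language in ES2018, so engines that implement them can express the condition directly. The sketch below compares the two approaches under that assumption; the sample text and the word list are made up, and an engine without lookbehind support will reject the second pattern with a syntax error.

```javascript
const subject = 'word1 xyz.999 .999 prefix1abc';

// Capture the leading whitespace and put it back with $1.
const viaGroup = subject.replace(
  /(^|\s)(word1|\.999|prefix1)/g,
  '$1<span>$2</span>'
);

// Lookbehind checks the preceding character without consuming it,
// so there is nothing to put back.
const viaLookbehind = subject.replace(
  /(?<=^|\s)(word1|\.999|prefix1)/g,
  '<span>$1</span>'
);

console.log(viaGroup);
// <span>word1</span> xyz.999 <span>.999</span> <span>prefix1</span>abc
console.log(viaGroup === viaLookbehind); // true
```

In both versions the .999 inside xyz.999 is left alone, because it is preceded by a word character rather than by whitespace or the start of the string.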

How do you use non captured elements in a Javascript regex?

I want to capture thing in nothing globally and case insensitively.
For some reason this doesn't work:
"Nothing thing nothing".match(/no(thing)/gi);
The captured array is Nothing,nothing instead of thing,thing.
I thought parentheses delimit the matching pattern? What am I doing wrong?
(yes, I know this will also match in nothingness)
If you use the global flag, the match method will return all overall matches. This is the equivalent of the first element of every match array you would get without global.
To get all groups from each match, loop:
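// Keep the regex in a variable: a literal inside the loop condition is
// re-created on every pass, so lastIndex resets and the loop never ends.
var re = /no(thing)/gi;
var match;
while ((match = re.exec("Nothing thing nothing")) !== null)
{
    // match[0] is the whole match, match[1] is the captured group
    console.log(match[0], match[1]);
}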
This will give you ["Nothing", "thing"] and ["nothing", "thing"].
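In engines that provide String.prototype.matchAll (ES2020), the same result can be collected without managing lastIndex by hand. A minimal sketch with illustrative variable names:

```javascript
const text = 'Nothing thing nothing';

// matchAll requires the g flag and yields one match array per match,
// each with the whole match at index 0 and the groups after it.
const groups = Array.from(text.matchAll(/no(thing)/gi), (m) => m[1]);

console.log(groups); // [ 'thing', 'thing' ]
```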
Parentheses or no, the whole of the matched substring is always captured--think of it as the default capturing group. What the explicit capturing groups do is enable you to work with smaller chunks of text within the overall match.
The tutorial you linked to does indeed list the grouping constructs under the heading "pattern delimiters", but it's wrong, and the actual description isn't much better:
(pattern), (?:pattern) Matches entire contained pattern.
Well of course they're going to match it (or try to)! But what the parentheses do is treat the entire contained subpattern as a unit, so you can (for example) add a quantifier to it:
(?:foo){3} // "foofoofoo"
(?:...) is a pure grouping construct, while (...) also captures whatever the contained subpattern matches.
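The difference shows up in the array returned by exec(). A small sketch (the variable names are only for illustration):

```javascript
// (foo){3} groups AND captures; a repeated group keeps its last repetition.
const capturing = /(foo){3}/.exec('foofoofoo');
// (?:foo){3} only groups, so the result holds the whole match and nothing else.
const grouping = /(?:foo){3}/.exec('foofoofoo');

console.log(capturing.length, capturing[1]); // 2 foo
console.log(grouping.length, grouping[0]);   // 1 foofoofoo
```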
With just a quick look-through I spotted several more examples of inaccurate, ambiguous, or incomplete descriptions. I suggest you unbookmark that tutorial immediately and bookmark this one instead: regular-expressions.info.
Parentheses do nothing in this regex.
The regex /no(thing)/gi is the same as /nothing/gi.
Parentheses are used for grouping. If you never refer to the groups (using $1, $2) or read them from the match result, the () are useless.
So, this regex will find only the sequence n-o-t-h-i-n-g. The standalone word thing doesn't start with 'no', so it doesn't match.
EDIT:
Change it to /(no)?thing/gi and it will work.
It works because ()? marks an optional part.
