Javascript Regex Conditional - javascript

I have strings like this:
#WTK-56491650H #=> want to capture '56491650H'
#M123456 #=> want to capture 'M123456'
I want to match everything after the # unless there is a dash then I want everything after the dash. I have a feeling I'm close but maybe not. I've found a lot of stuff about javascript regex conditionals and I can never get it to do the if then else part. It only matches after the # and that's it.
This is what I have so far:
/((?=-{1})-(.+)|(?!-{0)#(.+))/
And the demo: https://regex101.com/r/bY0yC6/1

You can use this regex with an optional match to consume everything between # and -:
/#(?:[^-]*-)?([^#-]+)$/mg
Updated RegEx Demo

Here's a solution which uses non-capturing groups (?:stuff) which I prefer so I don't have to dig through the result groups to find the string I'm interested in.
(?:#)(?:[\w\d]+-)?([\w\d]+)
First it throws out the # character, then throws out the stuff up to and including the - character, if it is there, then groups the rest as your match.

With a single regular expression, your full match will always contain the hash and/or dash because you are using it to define an acceptable string, but the groupings of a match can provide you the information that you're looking for.
you want the string to start with a hash so your regex should contain the #
next, you don't want anything before and including a dash (.*-)?, and we add a question mark because this is an optional part (ie if there is no dash)
finally, we can grab everything that is left into a final group, which will be your answer (.*)
the full expression is then #(.*-)?(.*) as pointed out by Lux

Related

Replace a phrase in a string that is being broken up into 2 separate lines [duplicate]

Is there a simple way to ignore the white space in a target string when searching for matches using a regular expression pattern? For example, if my search is for "cats", I would want "c ats" or "ca ts" to match. I can't strip out the whitespace beforehand because I need to find the begin and end index of the match (including any whitespace) in order to highlight that match and any whitespace needs to be there for formatting purposes.
You can stick optional whitespace characters \s* in between every other character in your regex. Although granted, it will get a bit lengthy.
/cats/ -> /c\s*a\s*t\s*s/
While the accepted answer is technically correct, a more practical approach, if possible, is to just strip whitespace out of both the regular expression and the search string.
If you want to search for "my cats", instead of:
myString.match(/m\s*y\s*c\s*a\*st\s*s\s*/g)
Just do:
myString.replace(/\s*/g,"").match(/mycats/g)
Warning: You can't automate this on the regular expression by just replacing all spaces with empty strings because they may occur in a negation or otherwise make your regular expression invalid.
Addressing Steven's comment to Sam Dufel's answer
Thanks, sounds like that's the way to go. But I just realized that I only want the optional whitespace characters if they follow a newline. So for example, "c\n ats" or "ca\n ts" should match. But wouldn't want "c ats" to match if there is no newline. Any ideas on how that might be done?
This should do the trick:
/c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s/
See this page for all the different variations of 'cats' that this matches.
You can also solve this using conditionals, but they are not supported in the javascript flavor of regex.
You could put \s* inbetween every character in your search string so if you were looking for cat you would use c\s*a\s*t\s*s\s*s
It's long but you could build the string dynamically of course.
You can see it working here: http://www.rubular.com/r/zzWwvppSpE
If you only want to allow spaces, then
\bc *a *t *s\b
should do it. To also allow tabs, use
\bc[ \t]*a[ \t]*t[ \t]*s\b
Remove the \b anchors if you also want to find cats within words like bobcats or catsup.
This approach can be used to automate this
(the following exemplary solution is in python, although obviously it can be ported to any language):
you can strip the whitespace beforehand AND save the positions of non-whitespace characters so you can use them later to find out the matched string boundary positions in the original string like the following:
def regex_search_ignore_space(regex, string):
no_spaces = ''
char_positions = []
for pos, char in enumerate(string):
if re.match(r'\S', char): # upper \S matches non-whitespace chars
no_spaces += char
char_positions.append(pos)
match = re.search(regex, no_spaces)
if not match:
return match
# match.start() and match.end() are indices of start and end
# of the found string in the spaceless string
# (as we have searched in it).
start = char_positions[match.start()] # in the original string
end = char_positions[match.end()] # in the original string
matched_string = string[start:end] # see
# the match WITH spaces is returned.
return matched_string
with_spaces = 'a li on and a cat'
print(regex_search_ignore_space('lion', with_spaces))
# prints 'li on'
If you want to go further you can construct the match object and return it instead, so the use of this helper will be more handy.
And the performance of this function can of course also be optimized, this example is just to show the path to a solution.
The accepted answer will not work if and when you are passing a dynamic value (such as "current value" in an array loop) as the regex test value. You would not be able to input the optional white spaces without getting some really ugly regex.
Konrad Hoffner's solution is therefore better in such cases as it will strip both the regest and test string of whitespace. The test will be conducted as though both have no whitespace.

regex for ng-pattern for filepath

I have arrived at a regex for file path that has these conditions,
Must match regex ^(\\\\[^\\]+\\[^\\]+|https?://[^/]+), so either something like \server\share (optionally followed by one or more "\folder"s), or an HTTP(S) URL
Cannot contain any invalid path name chars( ",<,>, |)
How can i get a single regex to use in angular.js that meets these conditions
Your current regex doesn't seem to match what you want. But given it is correctly doing what you want, then this will add the negation :
^(?!.*[ "<>|])(\\\\[^\\]+\\[^\\]+|https?://[^/]+)
Here we added a negative lookahead to see if any characters are in the string which we will fail the match. If we find none, then the rest of the regular expression will continue.
If I understand your requirements correctly, you could probably do this :
^(?!.*[ "<>|])(\\\\|https?://).*$
This will still not match any invalid characters defined in the negative lookahead, and also meets your criteria of matching one or more path segments, as well as http(s) and is much simpler.
The caviate is that if you require 2 or more path segments, or a trailing slash on the url, than this will not work. This is what your regex seems to suggest.
So in that case this is still somewhat cleaner than the original
^(?!.*[ "<>|])(\\\\[^\\]+\\.|https?://[^/]+/).*$
One more point. You ask to match \server\share, yet your regex opens with \\\\. I have assumed that \server\share should be \\server\share and wrote the regex's accordingly. If this is not the case, then all instances of \\\\ in the examples I gave should be changed to \\
Ok, first the regex, than the explanation:
(?<folderorurl>(?<folder>(\\[^\\\s",<>|]+)+)|(?<url>https?:\/\/[^\s]+))
Your first condition is to match a folder name which must not contain any character from ",<>|" nor a whitespace. This is written as:
[^\s,<>|] # the caret negates the character class, meaning this must not be matched
Additionally, we want to match a folder name optionally followed by another
(sub)folder, so we have to add a backslash to the character class:
[^\\\s,<>|] # added backslash
Now we want to match as many characters as possible but at minimum one, this is what the plus sign is for (+). With this in mind, consider the following string:
\server\folder
At the moment, only "server" is matched, so we need to prepend a backslash, thus "\server" will be matched. Now, if you break a filepath down, it always consists of a backslash + somefoldername, so we need to match backslash + somefoldername unlimited times (but minimum one):
(\\[^\\\s",<>|]+)+
As this is getting somewhat unreadable, I have used a named capturing group ((?<folder>)):
(?<folder>(\\[^\\\s",<>|]+)+)
This will match everything like \server or \server\folder\subfolder\subfolder and store it in the group called folder.
Now comes the URL part. A URL consists of http or https followed by a colon, two forward slashes and "something afterwards":
https?:\/\/[^\s]+ # something afterwards = .+, but no whitespaces
Following the explanation above this is stored in a named group called "url":
(?<folder>(\\[^\\\s",<>|]+)+)
Bear in mind though, that this will match even non-valid url strings (e.g. https://www.google.com.256357216423727...), if this is ok for you, leave it, if not, you may want to have a look at this question here on SO.
Now, last but not least, let's combine the two elements with an or, store it in another named group (folderorurl) and we are done. Simple, right?
(?<folderorurl>(?<folder>(\\[^\\\s",<>|]+)+)|(?<url>https?:\/\/[^\s]+))
Now the folder or a URL can be found in the folderorurl group while still saving the parts in url or folder. Unfortunately, I do know nothing about angular.js but the regex will get you started. Additionally, see this regex101 demo for a working fiddle.
Must match regex ^(\\\\[^\\]+\\[^\\]+|https?://[^/]+), so either something like \\server\share (optionally followed by one or more
"\folder"s), or an HTTP(S) URL
Cannot contain any invalid path name chars( ",<,>, |)
To introduce the second condition in your regex, you mainly just have to include the invalid characters in the negated character sets, e. g. instead of [^/] use [^/"<>|].
Here's a working example with a slightly rearranged regex:
paths = [ '\\server\\share',
'\\\\server\\share',
'\\\\server\\share\\folder',
'http://www.invalid.de',
'https://example.com',
'\\\\<server\\share',
'https://"host.com',
'\\\\server"\\share',
]
for (i in paths)
{
document.body.appendChild(document.createTextNode(paths[i]+' '+
/^\\(\\[^\\"<>|]+){2,}$|^https?:\/\/[^/"<>|]+$/.test(paths[i])))
document.body.appendChild(document.createElement('br'))
}

Unable to find a string matching a regex pattern

While trying to submit a form a javascript regex validation always proves to be false for a string.
Regex:- ^(([a-zA-Z]:)|(\\\\{2}\\w+)\\$?)(\\\\(\\w[\\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
I have tried following strings against it
abc.jpg,
abc:.jpg,
a:.jpg,
a:asdas.jpg,
What string could possible match this regex ?
This regex won't match against anything because of that $? in the middle of the string.
Apparently using the optional modifier ? on the end string symbol $ is not correct (if you paste it on https://regex101.com/ it will give you an error indeed). If the javascript parser ignores the error and keeps the regex as it is this still means you are going to match an end string in the middle of a string which is supposed to continue.
Unescaped it was supposed to match a \$ (dollar symbol) but as it is written it won't work.
If you want your string to be accepted at any cost you can probably use Firebug or a similar developer tool and edit the string inside the javascript code (this, assuming there's no server side check too and assuming it's not wrong aswell). If you ignore the $? then a matching string will be \\\\w\\\\ww.jpg (but since the . is unescaped even \\\\w\\\\ww%jpg is a match)
Of course, I wrote this answer assuming the escaping is indeed the one you showed in the question. If you need to find a matching pattern for the correctly escaped one ^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(\.jpeg|\.JPEG|\.jpg|\.JPG)$ then you can use this tool to find one http://fent.github.io/randexp.js/ (though it will find weird matches). A matching pattern is c:\zz.jpg
If you are just looking for a regular expression to match what you got there, go ahead and test this out:
(\w+:?\w*\.[jpe?gJPE?G]+,)
That should match exactly what you are looking for. Remove the optional comma at the end if you feel like it, of course.
If you remove escape level, the actual regex is
^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
After ^start the first pipe (([a-zA-Z]:)|(\\{2}\w+)\$?) which matches an alpha followed by a colon or two backslashes followed by one or more word characters, followed by an optional literal $. There is some needless parenthesis used inside.
The second part (\\(\w[\w].*))+ matches a backslash, followed by two word characters \w[\w] which looks weird because it's equivalent to \w\w (don't need a character class for second \w). Followed by any amount of any character. This whole thing one or more times.
In the last part (.jpeg|.JPEG|.jpg|.JPG) one probably forgot to escape the dot for matching a literal. \. should be used. This part can be reduced to \.(JPE?G|jpe?g).
It would match something like
A:\12anything.JPEG
\\1$\anything.jpg
Play with it at regex101. A better readable could be
^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$
Also read the explanation on regex101 to understand any pattern, it's helpful!

Javascript regex: how to not capture an optional string on the right side

For example /(www\.)?(.+)(\.com)?/.exec("www.something.com") will result with 'something.com' at index 1 of the resulting array. But what if we want to capture only 'something' in a capturing group?
Clarifications:
The above string is just for example - we dont want to assume anything about the suffix string (.com above). It could as well be orange.
Just this part can be solved in C# by matching from right to left (I dont know of a way of doing that in JS though) but that will end up having www. included then!
Sure, this problem as such is easily solvable mixing regex with other string methods like replace / substring. But is there a solution with only regex?
(?:www\.)?(.+?)(?:\.com|$)
This will give only something ingroups.Just make other groups non capturing.See demo.
https://regex101.com/r/rO0yD8/4
Just removing the last character (?) from the regex does the trick:
https://regex101.com/r/uR0iD2/1
The last ? allows a valid output without the (\.com) matching anything, so the (.+) can match all the characters after the www..
Another option is to replace the greedy quantifier +, which always tries to match as much characters as possible, with the +?, which tries to match as less characters as possible:
(www\.)?(.+?)(\.com)?$
https://regex101.com/r/oY7fE0/2
Note that it is necessary to force a match with the entire string through the end of line anchor ($).
If you only want to capture "something", use non-capturing groups for the other sections:
/(?:www\.)?(.+)(?:\.com)?/.exec("www.something.com")
The ?: denotes the groups as non-capturing.

silent group not working in javascript regex match()

I'm trying to extract (potentially hyphenated) words from a string that have been marked with a '#'.
So for example from the string
var s = '#moo, #baa and #moo-baa are writing an email to a#bc.de'
I would like to return
['#moo', '#baa', '#moo-baa']
To make sure I don't capture the email address, I check that the group is preceded by a white-space character OR the beginning of the line:
s.match(/(^|\s)#(\w+[-\w+]*)/g)
This seems to do the trick, but it also captures the spaces, which I don't want:
["#moo", " #baa", " #moo-baa"]
Silencing the grouping like this
s.match(/(?:^|\s)#(\w+[-\w+]*)/g)
doesn't seem to work, it returns the same result as before. I also tried the opposite, and checked that there's no \w or \S in front of the group, but that also excludes the beginning of the line. I know I could simply trim the spaces off, but I'd really like to get this working with just a single 'match' call.
Anybody have a suggestion what I'm doing wrong? Thanks a lot in advance!!
[edit]
I also just noticed: Why is it returning the '#' symbols as well?! I mean, it's what I want, but why is it doing that? They're outside of the group, aren't they?
As far as I know, the whole match is returned from String.match when using the "g" modifier. Because, with the modifier you are telling the function to match the whole expression instead of creating numbered matches from sub-expressions (groups). A global match does not return groups, instead the groups are the matches themselves.
In your case, the regular expression you were looking for might be this:
'#moo, #baa and #moo-baa are writing an email to a#bc.de'.match(/(?!\b)(#[\w\-]+)/g);
You are looking for every "#" symbol that doesn't follow a word boundary. So there is no need for silent groups.
If you don't want to capture the space, don't put the \s inside of the parentheses. Anything inside the parentheses will be returned as part of the capture group.

Categories

Resources