regex for ng-pattern for filepath - javascript

I have arrived at a regex for file path that has these conditions,
Must match regex ^(\\\\[^\\]+\\[^\\]+|https?://[^/]+), so either something like \server\share (optionally followed by one or more "\folder"s), or an HTTP(S) URL
Cannot contain any invalid path name chars( ",<,>, |)
How can I get a single regex to use in angular.js that meets these conditions?

Your current regex doesn't seem to match what you want. But assuming it does what you need, this adds the negation:
^(?!.*[ "<>|])(\\\\[^\\]+\\[^\\]+|https?://[^/]+)
Here we added a negative lookahead that scans the whole string for any of the forbidden characters and fails the match if one is found. If none is found, the rest of the regular expression is evaluated as usual.
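To see the lookahead at work, here is a small sketch in plain JavaScript. The sample inputs are made up for illustration, and the only change to the pattern is that the forward slashes are escaped so it can be written as a regex literal.

```javascript
// Pattern from the answer above, written as a JavaScript regex literal
// (forward slashes are escaped because "/" delimits the literal).
const pathPattern = /^(?!.*[ "<>|])(\\\\[^\\]+\\[^\\]+|https?:\/\/[^\/]+)/;

// Hypothetical sample inputs; String.raw keeps the backslashes literal.
const samples = [
  String.raw`\\server\share`,
  String.raw`\\server\share\folder`,
  "https://example.com",
  String.raw`\\ser<ver\share`,   // contains "<"
  'https://exa"mple.com',        // contains a double quote
  String.raw`\\server\my share`, // contains a space
];

// The first three pass; the last three are rejected by the lookahead.
const results = samples.map((sample) => pathPattern.test(sample));
samples.forEach((sample, i) => console.log(`${sample} -> ${results[i]}`));
```

Note that the space is part of the forbidden set here, so paths with spaces are rejected as well.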
If I understand your requirements correctly, you could probably do this :
^(?!.*[ "<>|])(\\\\|https?://).*$
This will still not match any invalid characters defined in the negative lookahead, and also meets your criteria of matching one or more path segments, as well as http(s) and is much simpler.
The caveat is that if you require two or more path segments, or a trailing slash on the URL, then this will not work. This is what your regex seems to suggest.
So in that case this is still somewhat cleaner than the original:
^(?!.*[ "<>|])(\\\\[^\\]+\\.|https?://[^/]+/).*$
One more point: you ask to match \server\share, yet your regex opens with \\\\. I have assumed that \server\share should be \\server\share and wrote the regexes accordingly. If this is not the case, then all instances of \\\\ in the examples I gave should be changed to \\.
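The difference between the simpler and the stricter variant can be checked with a short sketch like the following. The variable names and sample inputs are assumptions for illustration; the patterns are the two from this answer with the forward slashes escaped.

```javascript
// Simpler variant: only checks the prefix and the forbidden characters.
const simple = /^(?!.*[ "<>|])(\\\\|https?:\/\/).*$/;
// Stricter variant: needs \\server\ plus at least one more character,
// or a URL with a slash after the host.
const strict = /^(?!.*[ "<>|])(\\\\[^\\]+\\.|https?:\/\/[^\/]+\/).*$/;

// Hypothetical inputs; String.raw keeps the backslashes literal.
const inputs = [
  String.raw`\\server`,
  String.raw`\\server\share`,
  "https://example.com",
  "https://example.com/",
];

const comparison = inputs.map((input) => ({
  input,
  simple: simple.test(input),
  strict: strict.test(input),
}));
console.table(comparison);
```

The simpler pattern accepts all four inputs, while the stricter one rejects the bare server name and the URL without a trailing slash.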

OK, first the regex, then the explanation:
(?<folderorurl>(?<folder>(\\[^\\\s",<>|]+)+)|(?<url>https?:\/\/[^\s]+))
Your first condition is to match a folder name which must not contain any character from ",<>|" nor a whitespace. This is written as:
[^\s",<>|] # the caret negates the character class, meaning this must not be matched
Additionally, we want to match a folder name optionally followed by another
(sub)folder, so we have to add a backslash to the character class:
[^\\\s",<>|] # added backslash
Now we want to match as many characters as possible but at minimum one, this is what the plus sign is for (+). With this in mind, consider the following string:
\server\folder
At the moment, only "server" is matched, so we need to prepend a backslash, thus "\server" will be matched. Now, if you break a filepath down, it always consists of a backslash + somefoldername, so we need to match backslash + somefoldername unlimited times (but minimum one):
(\\[^\\\s",<>|]+)+
As this is getting somewhat unreadable, I have used a named capturing group ((?<folder>)):
(?<folder>(\\[^\\\s",<>|]+)+)
This will match everything like \server or \server\folder\subfolder\subfolder and store it in the group called folder.
Now comes the URL part. A URL consists of http or https followed by a colon, two forward slashes and "something afterwards":
https?:\/\/[^\s]+ # something afterwards = .+, but no whitespaces
Following the explanation above this is stored in a named group called "url":
(?<url>https?:\/\/[^\s]+)
Bear in mind, though, that this will match even invalid URL strings (e.g. https://www.google.com.256357216423727...). If this is OK for you, leave it; if not, you may want to have a look at this question here on SO.
Now, last but not least, let's combine the two elements with an or, store it in another named group (folderorurl) and we are done. Simple, right?
(?<folderorurl>(?<folder>(\\[^\\\s",<>|]+)+)|(?<url>https?:\/\/[^\s]+))
Now the folder or URL can be found in the folderorurl group, while the individual parts are still saved in url or folder. Unfortunately, I know nothing about angular.js, but the regex will get you started. Additionally, see this regex101 demo for a working fiddle.
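For what it's worth, JavaScript engines that support ES2018 accept this named-group syntax too. Below is a minimal sketch of reading the groups back; the helper name classify and the sample inputs are made up for illustration.

```javascript
// The combined pattern from above, as a JavaScript regex literal.
const folderOrUrl =
  /(?<folderorurl>(?<folder>(\\[^\\\s",<>|]+)+)|(?<url>https?:\/\/[^\s]+))/;

// Hypothetical helper: reports which named group took part in the match.
function classify(input) {
  const match = folderOrUrl.exec(input);
  if (match === null) return null;
  const { folder, url } = match.groups;
  return folder !== undefined
    ? { kind: "folder", value: folder }
    : { kind: "url", value: url };
}

console.log(classify(String.raw`\server\folder\subfolder`));
console.log(classify("https://example.com/docs"));
console.log(classify("no path here")); // no backslash and no URL: null
```

The pattern is not anchored, so it finds a folder or URL anywhere in the input. For validation you would probably want to wrap it in ^ and $.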

Must match regex ^(\\\\[^\\]+\\[^\\]+|https?://[^/]+), so either something like \\server\share (optionally followed by one or more
"\folder"s), or an HTTP(S) URL
Cannot contain any invalid path name chars( ",<,>, |)
To introduce the second condition in your regex, you mainly just have to include the invalid characters in the negated character sets, e.g. instead of [^/] use [^/"<>|].
Here's a working example with a slightly rearranged regex:
// Note: inside a JavaScript string literal, '\\' is a single backslash.
const paths = [
  '\\server\\share',
  '\\\\server\\share',
  '\\\\server\\share\\folder',
  'http://www.invalid.de',
  'https://example.com',
  '\\\\<server\\share',
  'https://"host.com',
  '\\\\server"\\share',
];
// Print each path followed by true/false depending on whether it is accepted.
for (const path of paths) {
  const isValid = /^\\(\\[^\\"<>|]+){2,}$|^https?:\/\/[^/"<>|]+$/.test(path);
  document.body.appendChild(document.createTextNode(path + ' ' + isValid));
  document.body.appendChild(document.createElement('br'));
}

Related

How to extract separate parts of a string with a regex

I'm trying to build a regex that can process the following:
abc
abc-def
where the -def part is optional.
I'm wanting to get capture groups for the "abc", and optional "def" part.
I've tried this (in Javascript) but can't seem to figure out the optional part:
/^(.*)+(-(.*))?$/
It matches both examples but the optional part is contained in the first capture group. This should be simple, but I can't seem to get it right.
You're close, try a ? to make the expression lazy.
/^(.*?)(-(.*))?$/
You can try /^([^-]+)(-(.*))?$/. One issue is that the first + is outside of the capture group which means it'll only match the last character. Secondly, the .* is greedy and will match a -, gobbling all the way to the end of the line.
Runnable example:
console.log("abc-def".match(/^([^-]*)(-(.*))?$/));
console.log("abc".match(/^([^-]*)(-(.*))?$/));
You may not need to capture the substring starting with -, in which case /^([^-]*)(?:-(.*))?$/ could work.
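As a small sketch of how the non-capturing variant could be used, the helper below (splitOnDash is a made-up name) returns the two parts directly:

```javascript
// Non-capturing variant from above: group 1 is the part before the first
// dash, group 2 is everything after it (undefined when there is no dash).
const splitPattern = /^([^-]*)(?:-(.*))?$/;

// Hypothetical helper wrapping the match in a small object.
function splitOnDash(input) {
  const match = splitPattern.exec(input);
  if (match === null) return null;
  const [, head, tail] = match;
  return { head, tail };
}

console.log(splitOnDash("abc-def")); // head "abc", tail "def"
console.log(splitOnDash("abc"));     // head "abc", tail undefined
```

With the dash inside a non-capturing group, the result only contains the two pieces you actually care about.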

How can I match the last part of an email via JavaScript? [duplicate]

Using a regular expression (replaceregexp in Ant) how can I match (and then replace) everything from the start of a line, up to and including the last occurrence of a slash?
What I need is to start with any of these:
../../replace_this/keep_this
../replace_this/replace_this/Keep_this
/../../replace_this/replace_this/Keep_this
and turn them into this:
what_I_addedKeep_this
It seems like it should be simple but I'm not getting it. I've made regular expressions that will identify the last slash and match from there to the end of the line, but what I need is one that will match everything from the start of a line until the last slash, so I can replace it all.
This is for an Ant build file that's reading a bunch of .txt files and transforming any links it finds in them. I just want to use replaceregexp, not variables or properties. If possible.
You can match this:
.*\/
and replace with your text.
What you want is a greedy match, i.e. the longest possible match of the pattern, which is usually the default. That way the match runs up to the last instance of '/'.
That would be something like this:
.*\/
Explanation:
. any character
* zero or more of the preceding token, as many as possible (greedy)
\/ the slash escaped; because .* is greedy, this ends up being the **last** instance of '/'
You can see it in action here: http://regex101.com/r/pI4lR5
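The question is about Ant, but the greedy behaviour is easy to check in any regex engine. Here is a quick sketch in JavaScript (used purely for illustration) comparing the greedy pattern with its lazy counterpart on one of the sample links:

```javascript
// One of the sample links from the question.
const link = "../replace_this/replace_this/Keep_this";

// Greedy: .* grabs as much as possible, so the match ends at the LAST slash.
const greedy = link.match(/.*\//)[0];
// Lazy: .*? grabs as little as possible, so the match ends at the FIRST slash.
const lazy = link.match(/.*?\//)[0];

console.log(greedy); // "../replace_this/replace_this/"
console.log(lazy);   // "../"
```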
Option 1
Search: ^.*/
Replace: Empty string
Because the * quantifier is greedy, ^.*/ will match from the start of the line to the very last slash. So you can directly replace that with an empty string, and you are left with your desired text.
Option 2
Search: ^.*/(.*)
Replace: Group 1 (typically, the syntax would be $1 or \1, not sure about Ant)
Again, ^.*/ matches to the last slash. You then capture the end of the line to Group 1 with (.*), and replace the whole match with Group 1.
In my view, there's no reason to choose this option, but it's good to understand it.
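Both options can be tried out quickly outside of Ant. The sketch below uses JavaScript's String.prototype.replace as a stand-in for replaceregexp (an assumption made only for illustration) and inserts the asker's what_I_added text instead of an empty string:

```javascript
// Sample links from the question.
const links = [
  "../../replace_this/keep_this",
  "../replace_this/replace_this/Keep_this",
  "/../../replace_this/replace_this/Keep_this",
];
const prefix = "what_I_added";

// Option 1: replace everything up to and including the last slash.
const option1 = links.map((link) => link.replace(/^.*\//, prefix));
// Option 2: capture the rest of the line and put it back via $1.
const option2 = links.map((link) => link.replace(/^.*\/(.*)/, prefix + "$1"));

console.log(option1);
console.log(option2);
```

Both options give the same result for these inputs, which is why the simpler Option 1 is usually enough.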

Regex finding the last string that doesnt contain a number

Usually in my system I have the following string:
http://localhost/api/module
To find the last part of the string (which is my route) I've been using the following:
/[^\/]+$/g
However, there may be cases where my string looks a bit different, such as:
http://localhost/api/module/123
Using the above regex it would then return 123. When my string looks like this, I know that the last part will always be a number. So my question is: how do I make sure that I can always find the last string that does not contain a number?
This is what I came up with, which strictly matches only module for the following lines:
http://localhost/api/module
http://localhost/api/module/123
http://localhost/api/module/123a
http://localhost/api/module/a123
http://localhost/api/module/a123a
http://localhost/api/module/1a3
(?!\w*\d\w*)[^\/][a-zA-Z]+(?=\/\w*\d+\w*|$)
Explanation
I basically just extended your expression with a negative lookahead and a positive lookahead, so it matches only when all of the following conditions are true:
(?!\w*\d\w*) May contain letters, but no digits
[a-zA-Z]+ Really, truly only consists of one or more letters (was needed)
(?=\/\w*\d+\w*|$) The match is followed either by a slash and a segment containing digits, or by the end of the line
See this in action in my sample at Regex101.
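If you want to check this locally rather than on Regex101, a minimal sketch could look like this. The variable names are made up; the URLs are the ones listed above.

```javascript
// The lookahead-based pattern from this answer.
const routePattern = /(?!\w*\d\w*)[^\/][a-zA-Z]+(?=\/\w*\d+\w*|$)/;

// The sample URLs listed above.
const urls = [
  "http://localhost/api/module",
  "http://localhost/api/module/123",
  "http://localhost/api/module/123a",
  "http://localhost/api/module/a123",
  "http://localhost/api/module/a123a",
  "http://localhost/api/module/1a3",
];

// Take the first match for each URL, or null if nothing matches.
const routes = urls.map((url) => {
  const match = url.match(routePattern);
  return match ? match[0] : null;
});
console.log(routes); // every entry is "module"
```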
partYouWant = urlString.replace(/^.*\/([a-zA-Z]+)[\/0-9]*$/, '$1');
Here it is in action:
let urlString = "http://localhost/api/module/123";
urlString.replace(/^.*\/([a-zA-Z]+)[\/0-9]*$/, '$1');
// --> "module"
urlString = "http://localhost/api/module";
urlString.replace(/^.*\/([a-zA-Z]+)[\/0-9]*$/, '$1');
// --> "module"
It just uses a capture group to find the last non-numeric part.
It also does the following, which may or may not be what you want:
urlString = "http://localhost/api/module/123/456";
urlString.replace(/^.*\/([a-zA-Z]+)[\/0-9]*$/, '$1');
// --> "module"
/([0-9])\w+/g
That would select the numbers. You could use it to remove that part from the URL. What language are you using it for?

Javascript Regex Conditional

I have strings like this:
#WTK-56491650H #=> want to capture '56491650H'
#M123456 #=> want to capture 'M123456'
I want to match everything after the # unless there is a dash then I want everything after the dash. I have a feeling I'm close but maybe not. I've found a lot of stuff about javascript regex conditionals and I can never get it to do the if then else part. It only matches after the # and that's it.
This is what I have so far:
/((?=-{1})-(.+)|(?!-{0)#(.+))/
And the demo: https://regex101.com/r/bY0yC6/1
You can use this regex with an optional match to consume everything between # and -:
/#(?:[^-]*-)?([^#-]+)$/mg
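Here is a minimal sketch of how that regex could be applied to several lines at once. It assumes the inputs arrive as one newline-separated string, and the variable names are made up.

```javascript
// Regex from above; the m flag makes $ match at the end of each line,
// the g flag lets matchAll iterate over every match.
const idPattern = /#(?:[^-]*-)?([^#-]+)$/mg;

// Hypothetical input: both sample strings, one per line.
const text = "#WTK-56491650H\n#M123456";

// Collect capture group 1 from each match.
const ids = Array.from(text.matchAll(idPattern), (match) => match[1]);
console.log(ids); // ["56491650H", "M123456"]
```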
Here's a solution which uses non-capturing groups (?:stuff) which I prefer so I don't have to dig through the result groups to find the string I'm interested in.
(?:#)(?:[\w\d]+-)?([\w\d]+)
First it throws out the # character, then throws out the stuff up to and including the - character, if it is there, then groups the rest as your match.
With a single regular expression, your full match will always contain the hash and/or dash because you are using it to define an acceptable string, but the groupings of a match can provide you the information that you're looking for.
You want the string to start with a hash, so your regex should contain the #.
Next, you don't want anything before and including a dash, hence (.*-)?. We add a question mark because this part is optional (i.e. for when there is no dash).
Finally, we can grab everything that is left into a final group, (.*), which will be your answer.
The full expression is then #(.*-)?(.*), as pointed out by Lux.
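The steps above can be tried out with a short sketch like this one. The helper name afterHashOrDash is made up; the value of interest is the second capture group.

```javascript
// Full expression from above: group 1 is the optional "up to the last dash"
// part, group 2 is whatever remains.
const hashPattern = /#(.*-)?(.*)/;

// Hypothetical helper returning the part after the hash or after the dash.
function afterHashOrDash(input) {
  const match = hashPattern.exec(input);
  return match ? match[2] : null;
}

console.log(afterHashOrDash("#WTK-56491650H")); // "56491650H"
console.log(afterHashOrDash("#M123456"));       // "M123456"
console.log(afterHashOrDash("no hash"));        // null
```

Because .* is greedy, an input with several dashes such as #A-B-C yields only the part after the last dash.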

Unable to find a string matching a regex pattern

While trying to submit a form a javascript regex validation always proves to be false for a string.
Regex:- ^(([a-zA-Z]:)|(\\\\{2}\\w+)\\$?)(\\\\(\\w[\\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
I have tried following strings against it
abc.jpg,
abc:.jpg,
a:.jpg,
a:asdas.jpg,
What string could possibly match this regex?
This regex won't match against anything because of that $? in the middle of the string.
Apparently, applying the optional quantifier ? to the end-of-string anchor $ is not valid (if you paste it into https://regex101.com/ it does indeed report an error). Even if the JavaScript parser ignores the error and keeps the regex as it is, you would still be matching an end of string in the middle of a string that is supposed to continue.
Unescaped it was supposed to match a \$ (dollar symbol) but as it is written it won't work.
If you want your string to be accepted at any cost, you can probably use Firebug or a similar developer tool and edit the string inside the JavaScript code (this assumes there is no server-side check too, and that it is not wrong as well). If you ignore the $? then a matching string will be \\\\w\\\\ww.jpg (but since the . is unescaped even \\\\w\\\\ww%jpg is a match)
Of course, I wrote this answer assuming the escaping is indeed the one you showed in the question. If you need to find a matching pattern for the correctly escaped one ^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(\.jpeg|\.JPEG|\.jpg|\.JPG)$ then you can use this tool to find one http://fent.github.io/randexp.js/ (though it will find weird matches). A matching pattern is c:\zz.jpg
If you are just looking for a regular expression to match what you got there, go ahead and test this out:
(\w+:?\w*\.[jpe?gJPE?G]+,)
That should match exactly what you are looking for. Remove the optional comma at the end if you feel like it, of course.
If you remove escape level, the actual regex is
^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
After the ^ start anchor comes the first group (([a-zA-Z]:)|(\\{2}\w+)\$?), which matches a letter followed by a colon, or two backslashes followed by one or more word characters, followed by an optional literal $. There are some needless parentheses inside.
The second part (\\(\w[\w].*))+ matches a backslash, followed by two word characters \w[\w] which looks weird because it's equivalent to \w\w (don't need a character class for second \w). Followed by any amount of any character. This whole thing one or more times.
In the last part (.jpeg|.JPEG|.jpg|.JPG) one probably forgot to escape the dot for matching a literal. \. should be used. This part can be reduced to \.(JPE?G|jpe?g).
It would match something like
A:\12anything.JPEG
\\1$\anything.jpg
Play with it at regex101. A more readable version could be
^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$
Also read the explanation on regex101 to understand any pattern, it's helpful!
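As a quick sanity check of the more readable version, here is a sketch that runs it against the two example strings above plus a few of the asker's inputs. The variable names are made up for illustration.

```javascript
// The more readable pattern from above, as a JavaScript regex literal.
const imagePathPattern = /^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$/;

// String.raw keeps the backslashes literal.
const candidates = [
  String.raw`A:\12anything.JPEG`,
  String.raw`\\1$\anything.jpg`,
  String.raw`c:\zz.jpg`,
  "abc.jpg",     // no drive letter or share prefix
  "a:asdas.jpg", // no backslash after the drive letter
];

// The first three are accepted, the last two are rejected.
const accepted = candidates.map((candidate) => imagePathPattern.test(candidate));
candidates.forEach((candidate, i) => console.log(`${candidate} -> ${accepted[i]}`));
```

This also shows why none of the strings tried in the question can match: each of them lacks the backslash-separated path segment that the pattern requires.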
