Regex expression excludes links with weird URL

I have this regular expression (Java / JavaScript):
/(http|ftp|https):\/\/([\w+?\.\w+])+([a-zA-Z0-9\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)_\-\\=\\+\\\\\/\\?\\.\\:\\;\\'\\,]*\.(?:jpg|JPG|jpeg|JPEG|gif|GIF|png|PNG|bmp|BMP|tiff|TIFF))?/
But it seems to have issues with a URL like this one:
https://cdn.vox-cdn.com/thumbor/C07imD1SHmAnbObkg-nJ92N6sD8=/0x0:4799x3199/920x613/filters:focal(2017x1217:2783x1983):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/62871037/seattle.0.jpg
What do you think is missing from my expression?
I want to accept any valid image URL.

Your expression works for me in the validator I tested with (regex101.com), however, it matches as 3 separate capture groups. To capture it all as a single match, just wrap the whole statement in a set of parentheses.
Note: to be clear, there are simpler ways to do this, but to answer the specific question that the OP asked, this will make their statement match their supplied link.
((http|ftp|https):\/\/([\w+?\.\w+])+([a-zA-Z0-9\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)_\-\\=\\+\\\\\/\\?\\.\\:\\;\\'\\,]*\.(?:jpg|JPG|jpeg|JPEG|gif|GIF|png|PNG|bmp|BMP|tiff|TIFF))?)
EDIT: After helping the OP narrow down the scope of their issue, a more appropriate regex would be something like this:
/^(((http(s?))|((s?)ftp)):)([\w \D~!##$%^&*\\_/-=+/?.:;',]){1,}\.(jpg|gif|png)$/i
Let's break this down:
First, it says the string must start with either 'http' with an optional 's' or, if that isn't there, 'ftp' with an optional 's' prefix to account for secure forms of FTP. This must be followed by a colon. The next set accepts just about any commonly used character or symbol in a URL path. Finally, it ensures that the string ends with an actual image extension. Wrapping the expression in /{expression}/i makes it case insensitive, so it will match upper or lower case in any combination.
As a further note, you may also want to account for formats such as .jpeg, .tif, etc.
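As a rough sketch of the anchored approach described above, the following JavaScript checks that a string starts with a scheme and ends with an image extension. The helper name isImageUrl and the simplified pattern (any run of non-whitespace characters between the scheme and the extension) are assumptions for illustration, not the exact expression from the answer.

```javascript
// Hypothetical helper: accepts http(s)/(s)ftp URLs that end in a common image extension.
// \S+ is a deliberately loose stand-in for "any character allowed in a URL".
const IMAGE_URL = /^(?:https?|s?ftp):\/\/\S+\.(?:jpe?g|gif|png|bmp|tiff?)$/i;

function isImageUrl(url) {
  return IMAGE_URL.test(url);
}

const sample =
  "https://cdn.vox-cdn.com/thumbor/C07imD1SHmAnbObkg-nJ92N6sD8=/0x0:4799x3199/920x613/" +
  "filters:focal(2017x1217:2783x1983):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/" +
  "image/62871037/seattle.0.jpg";

console.log(isImageUrl(sample));                          // true
console.log(isImageUrl("https://example.com/page.html")); // false
console.log(isImageUrl("HTTP://EXAMPLE.COM/A.PNG"));      // true
```

Because the pattern is anchored at the end, a URL with a query string after the extension (for example ?w=100) would be rejected; whether that is desirable depends on your inputs.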

Related

regex for ng-pattern for filepath

I have arrived at a regex for file path that has these conditions,
Must match regex ^(\\\\[^\\]+\\[^\\]+|https?://[^/]+), so either something like \server\share (optionally followed by one or more "\folder"s), or an HTTP(S) URL
Cannot contain any invalid path name characters (", <, >, |)
How can I get a single regex to use in angular.js that meets these conditions?

Your current regex doesn't seem to match what you want. But assuming it does what you want, this will add the negation:
^(?!.*[ "<>|])(\\\\[^\\]+\\[^\\]+|https?://[^/]+)
Here we added a negative lookahead that checks whether the string contains any of the characters that should fail the match. If none is found, the rest of the regular expression is evaluated.
If I understand your requirements correctly, you could probably do this:
^(?!.*[ "<>|])(\\\\|https?://).*$
This will still reject any of the invalid characters defined in the negative lookahead. It also meets your criteria of matching one or more path segments as well as http(s), and it is much simpler.
The caveat is that if you require 2 or more path segments, or a trailing slash on the URL, then this will not work. This is what your regex seems to suggest.
In that case, this is still somewhat cleaner than the original:
^(?!.*[ "<>|])(\\\\[^\\]+\\.|https?://[^/]+/).*$
One more point. You ask to match \server\share, yet your regex opens with \\\\. I have assumed that \server\share should be \\server\share and wrote the regexes accordingly. If this is not the case, then all instances of \\\\ in the examples I gave should be changed to \\.
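To see the negative lookahead in action, here is a small JavaScript sketch. It uses the first pattern from this answer, with the forward slashes escaped so it can be written as a regex literal; the variable names and sample strings are made up for illustration.

```javascript
// The lookahead rejects the whole string if it contains a space, ", <, > or |.
// The rest is the question's pattern, with / escaped for a regex literal.
const pathPattern = /^(?!.*[ "<>|])(\\\\[^\\]+\\[^\\]+|https?:\/\/[^\/]+)/;

const samples = [
  "\\\\server\\share",         // \\server\share
  "\\\\server\\share\\folder", // \\server\share\folder
  "https://example.com",
  "\\\\ser<ver\\share",        // contains <
  "https://exa mple.com",      // contains a space
];

for (const s of samples) {
  console.log(s, pathPattern.test(s));
}
// The first three print true, the last two print false.
```

Note that in a JavaScript string literal each backslash has to be doubled, so "\\\\server" is the text \\server.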

OK, first the regex, then the explanation:
(?<folderorurl>(?<folder>(\\[^\\\s",<>|]+)+)|(?<url>https?:\/\/[^\s]+))
Your first condition is to match a folder name which must not contain any character from ",<>|" nor a whitespace. This is written as:
[^\s,<>|] # the caret negates the character class, meaning this must not be matched
Additionally, we want to match a folder name optionally followed by another (sub)folder, so we have to add a backslash to the character class:
[^\\\s,<>|] # added backslash
Now we want to match as many characters as possible but at minimum one, this is what the plus sign is for (+). With this in mind, consider the following string:
\server\folder
At the moment, only "server" is matched, so we need to prepend a backslash, thus "\server" will be matched. Now, if you break a filepath down, it always consists of a backslash + somefoldername, so we need to match backslash + somefoldername unlimited times (but minimum one):
(\\[^\\\s",<>|]+)+
As this is getting somewhat unreadable, I have used a named capturing group ((?<folder>)):
(?<folder>(\\[^\\\s",<>|]+)+)
This will match everything like \server or \server\folder\subfolder\subfolder and store it in the group called folder.
Now comes the URL part. A URL consists of http or https followed by a colon, two forward slashes and "something afterwards":
https?:\/\/[^\s]+ # something afterwards = .+, but no whitespaces
Following the explanation above this is stored in a named group called "url":
(?<url>https?:\/\/[^\s]+)
Bear in mind, though, that this will match even invalid URL strings (e.g. https://www.google.com.256357216423727...). If that is OK for you, leave it; if not, you may want to have a look at this question here on SO.
Now, last but not least, let's combine the two elements with an or, store it in another named group (folderorurl) and we are done. Simple, right?
(?<folderorurl>(?<folder>(\\[^\\\s",<>|]+)+)|(?<url>https?:\/\/[^\s]+))
Now the folder or URL can be found in the folderorurl group, while the individual parts are still saved in url or folder. Unfortunately, I know nothing about angular.js, but the regex will get you started. Additionally, see this regex101 demo for a working fiddle.
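For what it's worth, here is a minimal JavaScript sketch of reading those named groups. It assumes an engine with named capture group support (ES2018 or later); the classify helper is a hypothetical name, not part of the answer.

```javascript
// The combined pattern from above, written as a JavaScript regex literal.
const pattern = /(?<folderorurl>(?<folder>(\\[^\\\s",<>|]+)+)|(?<url>https?:\/\/[^\s]+))/;

// Hypothetical helper: reports which alternative matched and what it captured.
function classify(input) {
  const match = pattern.exec(input);
  if (!match) return null;
  const { folder, url } = match.groups;
  return folder !== undefined
    ? { kind: "folder", value: folder }
    : { kind: "url", value: url };
}

console.log(classify("\\server\\folder\\sub"));      // kind "folder"
console.log(classify("https://example.com/a?b=1")); // kind "url"
console.log(classify("no match here"));              // null
```

The pattern is not anchored, so it finds the first folder or URL anywhere in the input; add ^ and $ if the whole string has to match.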

Must match regex ^(\\\\[^\\]+\\[^\\]+|https?://[^/]+), so either something like \\server\share (optionally followed by one or more "\folder"s), or an HTTP(S) URL
Cannot contain any invalid path name chars( ",<,>, |)
To introduce the second condition in your regex, you mainly just have to include the invalid characters in the negated character sets, e.g. instead of [^/] use [^/"<>|].
Here's a working example with a slightly rearranged regex:
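const paths = [
  '\\server\\share',
  '\\\\server\\share',
  '\\\\server\\share\\folder',
  'http://www.invalid.de',
  'https://example.com',
  '\\\\<server\\share',
  'https://"host.com',
  '\\\\server"\\share',
];
// Either \\ followed by at least two path segments, or an HTTP(S) host;
// neither may contain ", <, > or |.
const pattern = /^\\(\\[^\\"<>|]+){2,}$|^https?:\/\/[^/"<>|]+$/;
for (const path of paths) {
  // Print each path followed by true/false on its own line.
  document.body.appendChild(document.createTextNode(path + ' ' + pattern.test(path)));
  document.body.appendChild(document.createElement('br'));
}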

Regular expression anything but letters (javascript)

I want to validate a form field by checking whether the input contains any letters. All other characters and numbers should be allowed. I'm quite bad at regular expressions, and I can't find a correct solution anywhere.
I've tried this:
/[^A-Za-z]/g
but this only returns false if the string consists of only letters (i.e. 432ad32d should return false as well).
Could anyone tell me how to do this?

Using a whitelist of allowed characters is the best approach in your case:
/^[-+\d(), ]+$/
Unicode has many things it calls a letter, so it is better not to mess with that in the first place. JavaScript regexes were also long ill-suited to handling these: before ES2018 they lacked things like \p{L} unless you used an external library, and even now \p{L} only works with the u flag.
Also, by using the whitelist approach you can be sure about the kinds of inputs which will be accepted by your form. You can't predict the kind of mess users could input otherwise. Think about things like this:
TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ
:-)
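A minimal sketch of the whitelist check in JavaScript, using the pattern from this answer. The helper name isAllowedInput is hypothetical, and the allowed character set should be adjusted to whatever your field really needs.

```javascript
// Whitelist from the answer: digits, plus, minus, parentheses, comma and space only.
const ALLOWED = /^[-+\d(), ]+$/;

// Hypothetical helper wrapping the whitelist test.
function isAllowedInput(value) {
  return ALLOWED.test(value);
}

console.log(isAllowedInput("+1 (555) 123-4567")); // true
console.log(isAllowedInput("432ad32d"));          // false
console.log(isAllowedInput(""));                  // false, because + requires at least one character
```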

/[^A-Za-z]/
This regex matches a single non-letter, which isn't very useful. Yura Yakym's answer matches the beginning of the string, any number of non-letters, and then the end of the string, which is useful when it matches: it means your string contains only those things.
Another useful regex is:
/[A-Za-z]/
This matches a single letter, which is useful when it doesn't match: it means your string does not contain any letters at all.
For your question in general, "how can I ensure a string lacks letters?", I would use that second regex: I would try to match a letter, and hopefully fail to do so. For input validation though, I'd prefer a regex that describes all possible valid inputs. If /^[^A-Za-z]*$/ does so, then use that. If you have additional requirements, add those to it. Don't run multiple separate checks ("no letters? OK. No non-dash special characters? OK.") unless you want to provide error messages precisely about such things.
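The two approaches can be sketched side by side in JavaScript. The function names are made up for illustration; both only consider the ASCII letters A-Z and a-z.

```javascript
// Approach 1: look for a letter and require that none is found.
function hasNoLettersByAbsence(s) {
  return !/[A-Za-z]/.test(s);
}

// Approach 2: require that the whole string consists of non-letters.
function hasNoLettersByFullMatch(s) {
  return /^[^A-Za-z]*$/.test(s);
}

for (const s of ["432ad32d", "4323-21", ""]) {
  console.log(JSON.stringify(s), hasNoLettersByAbsence(s), hasNoLettersByFullMatch(s));
}
// "432ad32d" gives false for both; "4323-21" and "" give true for both.
```

Neither regex uses the g flag. With g, a regex object remembers lastIndex between test() calls, which can give surprising results when the same object is reused.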

Try this regular expression: ^[^A-Za-z]*$

You need to include anchors:
/^[^A-Za-z]+$/g
This ensures that the whole string, from start to end, consists of one or more non-letter characters (numbers or special characters).

You forgot about the start and end markers. Also, you don't need the g flag.
/^[^A-Za-z]*$/
Anyway, it's strange that I can still enter Cyrillic letters.
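The Cyrillic case slips through because [A-Za-z] only covers ASCII letters. In engines that implement Unicode property escapes (ES2018 or later, with the u flag), a sketch like the following catches letters from any script. The helper name containsAnyLetter is an assumption for illustration.

```javascript
// \p{L} matches any Unicode letter; it is only available with the u flag.
const ANY_LETTER = /\p{L}/u;

// Hypothetical helper: true if the string contains a letter from any script.
function containsAnyLetter(s) {
  return ANY_LETTER.test(s);
}

console.log(/^[^A-Za-z]*$/.test("привет")); // true: the ASCII-only check lets Cyrillic through
console.log(containsAnyLetter("привет"));   // true
console.log(containsAnyLetter("abc"));      // true
console.log(containsAnyLetter("1234-56"));  // false
```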

regex pattern for URL in javascript

I'm using the following regex pattern for URL validation.
/[-a-zA-Z0-9#:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9#:%_\+.~#?&//=]*)?/gi;
But I need to make the .com part optional,
i.e. http://google/ should work.
What change needs to be made for this?

You would be better off using this lengthy expression from the jquery.validate.js extension. It is well tested and supports multilingual URLs. Don't be afraid of the Unicode and hexadecimal escapes inside the expression; they are only there to support multilingual URLs. Refer to this (Unicode Characters) to understand what the following Unicode ranges mean.
/^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i
Your expression above has several flaws. For example, its last part, \b(\/[-a-zA-Z0-9#:%_\+.~#?&//=]*)?, can by itself match the whole URL, independently of the preceding part of the expression.

Assuming you want to match everything, including URLs without the .com in them:
/[-a-zA-Z0-9#:%_\+.~#?&//=]{2,256}
(?:\.[a-z]{2,4})? // (?:...) is a non-capturing group; this is where the .com is matched
// the ? quantifier means 0 or 1 times
\b(\/[-a-zA-Z0-9#:%_\+.~#?&//=]*)?/gi
JSFIDDLE

Simply take this section: \.[a-z]{2,4} and replace it with (\.[a-z]{2,4})?.
The full regex:
[-a-zA-Z0-9#:%_\+.~#?&//=]{2,256}(\.[a-z]{2,4})?\b(\/[-a-zA-Z0-9#:%_\+.~#?&//=]*)?
And a demo.
Effectively, what we're doing here is making the .xxxx part optional by wrapping it in () and adding the ? quantifier, which means zero or one occurrence.
This will match both:
http://www.google.com/
and
http://localhost/
Caveat: this isn't the most efficient expression to accomplish what you want, but it is simply the smallest required adjustment needed to accomplish what you want.
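A quick way to try the adjusted pattern in JavaScript is sketched below. The character class is written without its duplicate entries and with the slash escaped for a regex literal, and the g flag is dropped so that test() keeps no state between calls; the variable name is made up for illustration.

```javascript
// Adjusted pattern with the optional (\.[a-z]{2,4})? part.
const urlPattern = /[-a-zA-Z0-9#:%_+.~?&\/=]{2,256}(\.[a-z]{2,4})?\b(\/[-a-zA-Z0-9#:%_+.~?&\/=]*)?/i;

console.log(urlPattern.test("http://www.google.com/")); // true
console.log(urlPattern.test("http://localhost/"));      // true
console.log("http://localhost/".match(urlPattern)[0]);  // "http://localhost/"
console.log(urlPattern.test("!"));                      // false
```

Since the pattern is not anchored, it finds URL-like substrings rather than validating that the whole input is a URL.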

Optional lookahead in javascript

I was trying to build a regex for user input. I'm building a form based on the user's input. Let's assume that the user sets the CSS class to "icon-[anything]" (a Bootstrap icon). In this case I have to ensure that hyphens are not doubled ("--"), and also that "icon-white" is the only class assigned beside the other one; this "icon-white" has to be optional as well.
/^icon-[a-z-]+(\ icon-white)?$/ - this regex works fine for the optional icon-white scenario, but I'm having trouble preventing the repetition of '--'.

If you want to match "icon-somevalue" but not "icon-white" try
icon-(?!white).*

If I understand correctly (although I'm not sure I do, sorry...) I think you're saying that the following two scenarios are allowed:
icon-white
icon-[anything] where [anything] can be any lower-case text and include a hyphen, but never two (or more) hyphens directly next to each other like --.
You've not said where this pattern might occur, although your original regex suggests this pattern will occur anchored to the start of your test string, so I'll assume that's the case. In which case, this regex should help:
^icon-white$|^icon-([a-z]+-?)+$
Breaking that down:
1. ^icon-white$ Match the literal string that contains exactly "icon-white"
2. | or
3. ^icon-([a-z]+-?)+$ Match the literal string that starts with exactly "icon-" and then immediately ends with "something" which is ([a-z]+-?)+.
Now, to be clear - I don't get the relationship between icon-white and icon-[something]. That is, as far as I see it there's no reason why the icon-[something] pattern at 3 above can't cover the "icon-white" literal too. ie 1 and 2 above are redundant. But I've included them here so you can maybe piece something more suitable together.
Breaking that "something" down from 3:
( )+ means one or more instances of whatever's inside the parenthesis, which is [a-z]+-?
Breaking that [a-z]+-? down:
[a-z]+ At least one character "a" through "z" (note hyphen is NOT allowed here to avoid additional hyphen immediately after the previous one)
-? An optional hyphen (ie exactly 0 or 1 hyphen)
This matches the following test cases:
icon-white
icon-x
icon-xx
icon-x-
icon-xx-
icon-x-x
icon-xx-x
icon-xx-xx
icon-x-x-x-
icon-x-xx-xx-x-xxxxx-
... and so on
This DOES NOT match the following test cases:
any case where a capital letter used (you've specified only lower-case)
icon- (because we need one or more characters for "something")
icon--
icon--x
icon-x--
I hope this covers your needs, but I doubt it does (because I didn't really understand your explanation "ensure that "icon-white" should be the only class assigned beside the other one"), but hopefully my breakdown will give you the pieces you need.
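The test cases listed above can be checked with a few lines of JavaScript. This is only a sketch; the array names are made up, and the samples are a subset of the cases from the answer.

```javascript
const iconPattern = /^icon-white$|^icon-([a-z]+-?)+$/;

const shouldMatch = ["icon-white", "icon-x", "icon-xx-", "icon-x-x", "icon-x-xx-xx-x-xxxxx-"];
const shouldFail = ["icon-", "icon--", "icon--x", "icon-x--", "icon-X"];

for (const s of shouldMatch) {
  console.log(s, iconPattern.test(s)); // true for each
}
for (const s of shouldFail) {
  console.log(s, iconPattern.test(s)); // false for each
}
```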
EDIT:
I think maybe you're saying the scenarios allowed are:
icon-[something]
icon-[something] icon-white
icon-white icon-[something]
where [something] is any combination of lower-case text and hyphens, so long as there's never a double hyphen, and so long as it's not "white".
So... this defines "icon-[something]" : icon-(?!white$)([a-z]+-?)+
This means our 3 above scenarios are:
^icon-(?!white$)([a-z]+-?)+$
^icon-(?!white$)([a-z]+-?)+ icon-white$
^icon-white icon-(?!white$)([a-z]+-?)+$
And hence, putting it all together:
^icon-(?!white$)([a-z]+-?)+$|^icon-(?!white$)([a-z]+-?)+ icon-white$|^icon-white icon-(?!white$)([a-z]+-?)+$
I tried doing this with the icon-white section as an optional group, but had trouble with the negative lookahead from the first section capturing it... so... this'll do ;-)
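To avoid typing the repeated part three times, the combined expression can be assembled from one fragment. The sketch below does that in JavaScript; the variable names are assumptions, and the resulting pattern is the same as the one above.

```javascript
// The "icon-[something]" fragment from the answer, reused in all three alternatives.
const something = "icon-(?!white$)([a-z]+-?)+";
const combined = new RegExp(
  `^${something}$|^${something} icon-white$|^icon-white ${something}$`
);

console.log(combined.test("icon-home"));            // true
console.log(combined.test("icon-home icon-white")); // true
console.log(combined.test("icon-white icon-home")); // true
console.log(combined.test("icon-white"));           // false
console.log(combined.test("icon--home"));           // false
console.log(combined.test("icon-home icon-blue"));  // false
```

One thing to be aware of: "icon-white icon-white" is also accepted, because the lookahead (?!white$) only rejects "white" when it sits at the very end of the string.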

JS regex - convert "any" plain text hostname/url/ip to a link

I have been looking for a JS regexp that converts plain text URLs or hostnames to clickable links, but none of the scripts I found meets my requirements. Unfortunately, I suck at regex and am unable to modify the expressions to work the way I want.
The plain text I wish to convert to links is:
Anything starting with http(s):, ftp(s):, mailto: or file:
domain.tld[:port][path][file][querystring]
any.sub.domain.tld[:port][path][file][querystring]
0/255.0/255.0/255.0/255[:port][path][file][querystring]
localhost[:port][path][file][querystring]
[*] = optional.
Any help is highly appreciated!

If you can live with false positives, such as something.notavalidtld or 999.999.999.999 getting matched, what you are looking for is probably something like this. (Otherwise, it gets more messy.)
Start matching at the beginning of the string.
^(
Match anything starting with http/https/ftp/...
((https?|ftps?|mailto|file):.*?)
OR match all of the below.
|
Optionally match http/https/ftp/... followed by : and at least one /.
((https?|ftps?|mailto|file):/+)?
Match an IP address...
(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
...or a domain (with optional username/password, which also matches email addresses)...
|([\w\d.:_%+-]+@)?([\w\d-]+\.)+[\w\d]{2,}
... or localhost.
|localhost)
Optionally followed by a port number.
(:\d+)?
Optionally followed by any path/query string.
(/.*)?
Ensuring the string ends here.
)$
All the above parts should be joined together without any whitespace in between.
I haven't tested it extensively, so I might have missed something. But at least you have a starting point.
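Joining the parts could look like the JavaScript sketch below. String.raw keeps the backslashes intact, so each fragment can be written as it appears above. The user-info separator is written as @ here, which is an assumption about the intended character, and the names parts, linkPattern and isLinkLike are made up for illustration.

```javascript
const parts = [
  String.raw`^(`,
  String.raw`((https?|ftps?|mailto|file):.*?)`,         // anything with a known scheme
  String.raw`|`,
  String.raw`((https?|ftps?|mailto|file):/+)?`,         // optional scheme and slashes
  String.raw`(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`,      // IP address
  String.raw`|([\w\d.:_%+-]+@)?([\w\d-]+\.)+[\w\d]{2,}`, // optional user info, then a domain
  String.raw`|localhost)`,
  String.raw`(:\d+)?`,                                  // optional port
  String.raw`(/.*)?`,                                   // optional path and query string
  String.raw`)$`,
];
const linkPattern = new RegExp(parts.join(""));

// Hypothetical helper: true if the whole string looks like something to turn into a link.
function isLinkLike(text) {
  return linkPattern.test(text);
}

for (const s of ["http://example.com", "example.com:8080/path?x=1", "localhost:3000",
                 "192.168.0.1/admin", "user@example.com", "hello world"]) {
  console.log(s, isLinkLike(s));
}
// Every sample except "hello world" prints true.
```

As the answer warns, false positives such as 999.999.999.999 are accepted by this pattern.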
