Exclude Email Addresses from Web Address Regex - javascript

Okay, I have two Regex patterns.
([a-zA-Z0-9]?http[s]?:\/\/)?((?:(?:\w+)\.)(?:\S+)(?:\.(?:\w+))+?)
[a-zA-Z0-9._-]+#[a-zA-Z0-9.-]+.[a-zA-Z]{2,6}
The first meets my needs at finding web addresses in a string. The second meets my needs at locating email addresses in a string. However, for some reason the first one is finding email addresses that look like this first.last#d1.d2.d3.d4 or first.last#d1.com. I need some help getting that first one so that it doesn't pick up those email addresses.

For example, you could fix it by excluding #:
([a-zA-Z0-9]?http[s]?:\/\/)?((?:(?:\w+)\.)(?:[^\s#]+)(?:\.(?:\w+))*?)
At the very end I suggest using *? instead of +?, because +? didn't match a first-level domain without www.
Yet it still finds abc#gmail.com.
Sadly, I have no idea how to check that the character immediately before the matched substring is not #.
Edit: a bad solution:
^[^#]*?([a-zA-Z0-9]?http[s]?:\/\/)?((?:(?:\w+)\.)(?:[^\s#]+)(?:\.(?:\w+))*?)
This checks that there are no # characters from the start of the line up to the matched part.

([a-zA-Z0-9]?http[s]?:\/\/)?((?:(?:\w+)\.)(?:\S+)(?:\.(?:\w+))+?)
Breaking this down, there are several problems...
( // capture protocol
[a-zA-Z0-9]? // matches one alphanumeric character, optionally (do you really want that to start the string before the protocol?)
http[s]? // square brackets delimit a character class, so they are unnecessary here, although they don't change functionality
:\/\/ // matches ://
)? // make captured protocol optional
((?:(?:\w+)\.)(?:\S+)(?:\.(?:\w+))+?) // too many non-capturing groups, not enough precise patterns. Inefficient, and \S+ also matches the # in an email address, which causes your error
I would replace the regex with something more like this...
(https?:\/\/)?(\w[-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?
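One way to check the character just before a match, which an answer above was unsure how to do, is a negative lookbehind. The following is a minimal sketch, not a drop-in replacement for either pattern above: it assumes an engine with lookbehind support (ES2018 or later), the helper name findWebAddresses is hypothetical, and the class [#@] is an assumption that covers the separator as written in the question (#) as well as the usual @. It only matches the host part, not a path.

```javascript
// Sketch only: needs an engine with lookbehind support (ES2018+).
// (?<![\w.#@-])      the match may not start mid-token or right after the separator
// (?![\w.-]*[#@])    the match may not be the local part of an address
const webAddress = /(?<![\w.#@-])(?:https?:\/\/)?(?:\w[\w-]*\.)+\w+(?![\w.-]*[#@])/g;

// Hypothetical helper: return every web address (host part) found in the text.
function findWebAddresses(text) {
  return text.match(webAddress) || [];
}

console.log(
  findWebAddresses("mail first.last#d1.d2.d3.d4 or visit www.example.com and http://foo.org")
);
// → [ 'www.example.com', 'http://foo.org' ]
```

Because the pattern rejects any # or @ next to the host, a URL whose host is directly followed by such a character would need separate handling.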

Related

Javascript Regex Conditional

I have strings like this:
#WTK-56491650H #=> want to capture '56491650H'
#M123456 #=> want to capture 'M123456'
I want to match everything after the # unless there is a dash then I want everything after the dash. I have a feeling I'm close but maybe not. I've found a lot of stuff about javascript regex conditionals and I can never get it to do the if then else part. It only matches after the # and that's it.
This is what I have so far:
/((?=-{1})-(.+)|(?!-{0)#(.+))/
And the demo: https://regex101.com/r/bY0yC6/1
You can use this regex with an optional match to consume everything between # and -:
/#(?:[^-]*-)?([^#-]+)$/mg
Updated RegEx Demo
Here's a solution which uses non-capturing groups (?:stuff) which I prefer so I don't have to dig through the result groups to find the string I'm interested in.
(?:#)(?:[\w\d]+-)?([\w\d]+)
First it throws out the # character, then throws out the stuff up to and including the - character, if it is there, then groups the rest as your match.
With a single regular expression, your full match will always contain the hash and/or dash because you are using it to define an acceptable string, but the groupings of a match can provide you the information that you're looking for.
You want the string to start with a hash, so your regex should contain the #.
Next, you don't want anything before and including a dash, (.*-)?, and we add a question mark because this part is optional (i.e. if there is no dash).
Finally, we can grab everything that is left into a final group, which will be your answer: (.*)
The full expression is then #(.*-)?(.*) as pointed out by Lux.
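As a quick check of the optional-prefix idea, here is a small sketch that applies the first suggested pattern (without the m and g flags) to the two sample IDs from the question. The helper name extractId is hypothetical.

```javascript
// The optional group (?:[^-]*-)? swallows everything up to and including a dash,
// so group 1 holds only the part the question wants to capture.
const idPattern = /#(?:[^-]*-)?([^#-]+)$/;

// Hypothetical helper: return the captured ID, or null when nothing matches.
function extractId(input) {
  const match = idPattern.exec(input);
  return match ? match[1] : null;
}

console.log(extractId("#WTK-56491650H")); // → 56491650H
console.log(extractId("#M123456"));       // → M123456
```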

Why are my optional characters not being caught?

I'm trying to create a regex to test passwords against. My current one checks for the following:
One Uppercase Letter
One Lowercase Letter
One number
Currently, the user can't enter special characters, however I'm trying to add that as an optional check (so Testing1 and Testing1! should both match). I've tried:
^(?=.*[A-Za-z])(?=.*\d)(?=.*[$#$!%*#?&])(A-Za-z\d[$#$!%*#?&]?){8,}$
But it doesn't catch it. I have a feeling my special character set is in the wrong place, but I'm not sure where to place it.
Where do I add my list of special characters as optional checks?
There are many ways to set up your regex, such as creating a whitelist or a blacklist of character types. This one creates a whitelist of the characters that can be used, which seems to be what you are looking for.
^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9$#$!%*#?&]{8,}$
Regex Breakdown:
^ // Assert position at start of the line
(?=.*[A-Z]) // First positive lookahead, makes sure a capital character exists
(?=.*[a-z]) // Make sure a lowercase character exists
(?=.*[0-9]) // Make sure a number exists
[A-Za-z0-9$#$!%*#?&] // All of the possible characters that can be typed
{8,} // 8 to infinity characters
$ // Assert position at end of line
Since you say that you want special characters as optional, they are just placed in the possible characters that can be typed, but they are not validated by any positive lookaheads.
See this regex in action on regex101. Keep in mind, the modifiers gm are there to validate across lines in this example and should probably be removed in your use case.
Of course you may have reasons for the "whitelist" approach, but a more common approach, and one you may want to try sometime, is to allow almost anything (blacklist) and then validate that certain criteria are met.
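To see the whitelist pattern behave as described, here is a short sketch that runs it against the two passwords from the question plus a few that should fail. The pattern is copied as written above; the function name isValidPassword is hypothetical.

```javascript
// Three lookaheads require an uppercase letter, a lowercase letter and a digit;
// the character class then limits which characters may appear at all.
const passwordPattern = /^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9$#$!%*#?&]{8,}$/;

// Hypothetical helper wrapping the test.
function isValidPassword(candidate) {
  return passwordPattern.test(candidate);
}

console.log(isValidPassword("Testing1"));  // → true  (special character is optional)
console.log(isValidPassword("Testing1!")); // → true
console.log(isValidPassword("testing1"));  // → false (no uppercase letter)
console.log(isValidPassword("Test1!"));    // → false (shorter than 8)
```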

regex for ng-pattern for filepath

I have arrived at a regex for file path that has these conditions,
Must match regex ^(\\\\[^\\]+\\[^\\]+|https?://[^/]+), so either something like \server\share (optionally followed by one or more "\folder"s), or an HTTP(S) URL
Cannot contain any invalid path name chars( ",<,>, |)
How can I get a single regex to use in angular.js that meets these conditions?
Your current regex doesn't seem to match what you want. But assuming it does what you want, this will add the negation:
^(?!.*[ "<>|])(\\\\[^\\]+\\[^\\]+|https?://[^/]+)
Here we added a negative lookahead that fails the match if any of the invalid characters appears anywhere in the string. If none is found, the rest of the regular expression is evaluated.
If I understand your requirements correctly, you could probably do this :
^(?!.*[ "<>|])(\\\\|https?://).*$
This will still not match any invalid characters defined in the negative lookahead, and also meets your criteria of matching one or more path segments, as well as http(s) and is much simpler.
The caveat is that if you require two or more path segments, or a trailing slash on the URL, then this will not work. This is what your regex seems to suggest.
So in that case this is still somewhat cleaner than the original
^(?!.*[ "<>|])(\\\\[^\\]+\\.|https?://[^/]+/).*$
One more point. You ask to match \server\share, yet your regex opens with \\\\. I have assumed that \server\share should be \\server\share and wrote the regexes accordingly. If this is not the case, then all instances of \\\\ in the examples I gave should be changed to \\
Ok, first the regex, then the explanation:
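A minimal sketch of the lookahead approach in plain JavaScript, assuming the double-backslash reading of \\server\share described above. The forward slashes are escaped because the pattern is written as a regex literal; the helper name isValidPath is hypothetical, and how the pattern is wired into ng-pattern is not covered here.

```javascript
// The negative lookahead rejects any string containing a space, ", <, > or |
// before the original path/URL alternatives are tried.
const pathPattern = /^(?!.*[ "<>|])(\\\\[^\\]+\\[^\\]+|https?:\/\/[^\/]+)/;

// Hypothetical helper; String.raw keeps the backslashes in the samples literal.
function isValidPath(value) {
  return pathPattern.test(value);
}

console.log(isValidPath(String.raw`\\server\share`));  // → true
console.log(isValidPath(String.raw`\\ser<ver\share`)); // → false
console.log(isValidPath("https://example.com"));       // → true
console.log(isValidPath('https://"host.com'));         // → false
```

Note that the space inside the lookahead class means paths containing spaces are rejected as well.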
(?<folderorurl>(?<folder>(\\[^\\\s",<>|]+)+)|(?<url>https?:\/\/[^\s]+))
Your first condition is to match a folder name which must not contain any character from ",<>|" nor a whitespace. This is written as:
[^\s",<>|] # the caret negates the character class, meaning these characters must not be matched
Additionally, we want to match a folder name optionally followed by another (sub)folder, so we have to add a backslash to the character class:
[^\\\s",<>|] # added backslash
Now we want to match as many characters as possible but at minimum one, this is what the plus sign is for (+). With this in mind, consider the following string:
\server\folder
At the moment, only "server" is matched, so we need to prepend a backslash, thus "\server" will be matched. Now, if you break a filepath down, it always consists of a backslash + somefoldername, so we need to match backslash + somefoldername unlimited times (but minimum one):
(\\[^\\\s",<>|]+)+
As this is getting somewhat unreadable, I have used a named capturing group ((?<folder>)):
(?<folder>(\\[^\\\s",<>|]+)+)
This will match everything like \server or \server\folder\subfolder\subfolder and store it in the group called folder.
Now comes the URL part. A URL consists of http or https followed by a colon, two forward slashes and "something afterwards":
https?:\/\/[^\s]+ # something afterwards = .+, but no whitespaces
Following the explanation above, this is stored in a named group called "url":
(?<url>https?:\/\/[^\s]+)
Bear in mind though, that this will match even non-valid url strings (e.g. https://www.google.com.256357216423727...), if this is ok for you, leave it, if not, you may want to have a look at this question here on SO.
Now, last but not least, let's combine the two elements with an or, store it in another named group (folderorurl) and we are done. Simple, right?
(?<folderorurl>(?<folder>(\\[^\\\s",<>|]+)+)|(?<url>https?:\/\/[^\s]+))
Now the folder or the URL can be found in the folderorurl group, while the individual parts are still saved in url or folder. Unfortunately, I know nothing about angular.js, but the regex will get you started. Additionally, see this regex101 demo for a working fiddle.
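For reference, JavaScript engines that implement ES2018 accept the same named-group syntax and expose the captures on match.groups. A small sketch under that assumption (the helper name classify is hypothetical):

```javascript
// Same pattern as above, written as a JavaScript regex literal.
const folderOrUrl = /(?<folderorurl>(?<folder>(\\[^\\\s",<>|]+)+)|(?<url>https?:\/\/[^\s]+))/;

// Hypothetical helper: report which named group matched, if any.
function classify(input) {
  const match = folderOrUrl.exec(input);
  if (!match) return null;
  const { folder, url } = match.groups; // the group that did not take part is undefined
  return folder !== undefined ? { kind: "folder", value: folder } : { kind: "url", value: url };
}

console.log(classify(String.raw`\server\folder\sub`));
// → { kind: 'folder', value: '\\server\\folder\\sub' }
console.log(classify("see https://example.com/docs"));
// → { kind: 'url', value: 'https://example.com/docs' }
```

The pattern is not anchored, so it finds a folder or URL anywhere in the input.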
Must match regex ^(\\\\[^\\]+\\[^\\]+|https?://[^/]+), so either something like \\server\share (optionally followed by one or more "\folder"s), or an HTTP(S) URL
Cannot contain any invalid path name chars( ",<,>, |)
To introduce the second condition in your regex, you mainly just have to include the invalid characters in the negated character sets, e.g. instead of [^/] use [^/"<>|].
Here's a working example with a slightly rearranged regex:
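// In a single-quoted JS string, '\\' is one literal backslash.
const paths = [
  '\\server\\share',            // only one leading backslash: should fail
  '\\\\server\\share',
  '\\\\server\\share\\folder',
  'http://www.invalid.de',
  'https://example.com',
  '\\\\<server\\share',         // contains <
  'https://"host.com',          // contains "
  '\\\\server"\\share',         // contains "
];
// UNC path with at least two segments, or an HTTP(S) URL; neither may contain " < > |
const pattern = /^\\(\\[^\\"<>|]+){2,}$|^https?:\/\/[^/"<>|]+$/;
for (const path of paths) {
  // Print each path followed by true or false.
  document.body.appendChild(document.createTextNode(path + ' ' + pattern.test(path)));
  document.body.appendChild(document.createElement('br'));
}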

Regex - Matching a string within an email address

So my problem is I am looking to match a certain combination of letters at the start of an email address followed by an # and then a wildcard, for example:
admin#* OR noreply#* OR spam#* OR subscribe#
Can this be done?
Try this
^(?:admin|noreply|spam|subscribe)#\S*
See it here on Regexr
You need an anchor at the start to avoid matching addresses that have other characters before the prefix. If the string contains only the email address, use ^ at the beginning, which matches the start of the string. If the email address is surrounded by other text, then use \b, which is a word boundary.
(?:admin|noreply|spam|subscribe) is a non-capturing group because of the ?: at the start; inside it is a list of alternatives separated by the | character.
\S* is any number of non-whitespace characters. This will also match addresses that are not valid, but that should not hurt too much.
You're looking for grouping with the | operator. The following will do what you want.
Edit: Since you're using this for email server rules, you won't need to match the entire string, only part of it. In that case, use ^ to specify the start of the string and drop the domain portion, since we don't care what it is.
^(admin|noreply|spam|subscribe)#
Sure thing!
[A-Za-z]+#.+
This says: one or more letters, then an at sign, then one or more of any character (other than newline). For your specific examples, use:
(admin|noreply|spam|subscribe)#.+
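A minimal sketch of the anchored prefix check in JavaScript. The class [#@] is an assumption that accepts the separator as written in the patterns above (#) as well as the usual @, and the helper name isBlockedSender is hypothetical.

```javascript
// ^ anchors the alternatives to the start of the string, so "administrator"
// or text before the prefix does not count as a match.
const blockedSender = /^(?:admin|noreply|spam|subscribe)[#@]/;

// Hypothetical helper: true when the address starts with one of the listed local parts.
function isBlockedSender(address) {
  return blockedSender.test(address);
}

console.log(isBlockedSender("admin@example.com"));         // → true
console.log(isBlockedSender("subscribe#lists.example"));   // → true
console.log(isBlockedSender("administrator@example.com")); // → false
console.log(isBlockedSender("jane@example.com"));          // → false
```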

JS regex - convert "any" plain text hostname/url/ip to a link

I have been looking for a JS regexp that converts plain-text URLs or hostnames to clickable links, but none of the scripts I found meet my requirements. Unfortunately, I suck at regex and am unable to modify the expressions to work the way I want.
The plain text I wish to convert to links are:
Anything starting with http(s):, ftp(s):, mailto: or file:
domain.tld[:port][path][file][querystring]
any.sub.domain.tld[:port][path][file][querystring]
0/255.0/255.0/255.0/255[:port][path][file][querystring]
localhost[:port][path][file][querystring]
[*] = optional.
Any help is highly appreciated!
If you can live with false positives, such as something.notavalidtld or 999.999.999.999 getting matched, what you are looking for is probably something like this. (Otherwise, it gets more messy.)
Start matching at the beginning of the string.
^(
Match anything starting with http/https/ftp/...
((https?|ftps?|mailto|file):.*?)
OR match all of the below.
|
Optionally match http/https/ftp/... followed by : and at least one /.
((https?|ftps?|mailto|file):/+)?
Match an IP address...
(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
...or a domain (with optional username/password, which also matches email addresses)...
|([\w\d.:_%+-]+#)?([\w\d-]+\.)+[\w\d]{2,}
... or localhost.
|localhost)
Optionally followed by a port number.
(:\d+)?
Optionally followed by any path/query string.
(/.*)?
Ensuring the string ends here.
)$
All the above parts should be joined together without any whitespace in between.
I haven't tested it extensively, so I might have missed something. But at least you have a starting point.
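Joined together and written as a JavaScript regex literal (forward slashes escaped), the parts above could be used roughly as follows. This is only a sketch: the linkify helper and its whitespace-based tokenising are assumptions rather than part of the answer, [#@] again accepts the separator as written above plus the usual @, and the output is not HTML-escaped.

```javascript
// The parts listed above, joined without whitespace.
const linkPattern = /^(((https?|ftps?|mailto|file):.*?)|((https?|ftps?|mailto|file):\/+)?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|([\w\d.:_%+-]+[#@])?([\w\d-]+\.)+[\w\d]{2,}|localhost)(:\d+)?(\/.*)?)$/;
const hasScheme = /^(https?|ftps?|mailto|file):/;

// Hypothetical helper: wrap every whitespace-separated token that matches in a link.
// Tokens without a scheme get "http://" prepended in the href (an assumption).
function linkify(text) {
  return text
    .split(/(\s+)/) // the capture group keeps the whitespace so spacing is preserved
    .map((token) => {
      if (!linkPattern.test(token)) return token;
      const href = hasScheme.test(token) ? token : "http://" + token;
      return `<a href="${href}">${token}</a>`;
    })
    .join("");
}

console.log(linkify("see example.com now"));
// → see <a href="http://example.com">example.com</a> now
```

As the answer warns, false positives such as 999.999.999.999 are linked too.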
