RegExp matching URLs without affecting email addresses - javascript

Suppose that I have a block of text like this:
My site: http://www.mysite.com,
drop me an email at foo@bar.com
I want to replace URLs and email addresses so that the text is converted to links.
I used this pattern for email:
text.replace(/([\w\-\+_]+(\.[\w\-\+_]+)*\@[\w\-\+_]+\.[\w\-\+_]+(\.[\w\-\+_]+)*)/gi, replacement);
and the pattern below for URLs:
text.replace(/((https?:\/\/)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\/\S*)?)/gi, replacement);
But the URL pattern screws up my email pattern, and the final result becomes something like:
bar.com>bar.com</a>
Is there any better pattern for this situation?
Thanks

I would suggest going with this regex:
/(\b(?:ht|f)tps?:\/\/.+\b)|(\b[\w.]+@(?=.*?\.)[\w.]+\b)/g
And replace it with:
$1$2

Well, I think your core problem is that your URL pattern makes the http:// prefix optional, so it also matches the domain part of your email address (bar.com).
However, I would also suggest that your patterns are way too complicated.
The email pattern should be something like:
[\w.+]+@[\w.]+
and the URL pattern something like:
https?://[\w.]+
These won't catch some of the more unusual but otherwise valid forms (the latter, for example, won't catch URLs with a path or a CGI GET query string). Neither pattern really validates the data, but you don't want a regex for that in the first place.
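One way to keep the two patterns from interfering is to run a single pass with both alternatives in one regex, so that text consumed as an email address is never re-scanned as a URL. The following is only a sketch under assumptions: the function name linkify, the mailto: markup, and the simplified patterns (the URL part requires http:// or https:// and stops at whitespace or a comma) are illustrative choices, not taken from the answers above.

```javascript
// Hypothetical helper: converts URLs and email addresses to links in one pass.
function linkify(text) {
  // Group 1: simplified email. Group 2: URL with a mandatory scheme.
  // The URL part stops at whitespace or a comma (an assumption that suits the sample text).
  const pattern = /([\w.+-]+@[\w-]+(?:\.[\w-]+)+)|(https?:\/\/[^\s,]+)/g;
  return text.replace(pattern, function (match, email, url) {
    if (email) {
      return '<a href="mailto:' + email + '">' + email + '</a>';
    }
    return '<a href="' + url + '">' + url + '</a>';
  });
}

const sampleText = 'My site: http://www.mysite.com, drop me an email at foo@bar.com';
console.log(linkify(sampleText));
```

Because each match is replaced exactly once by the callback, the generated markup is never scanned again, which is what goes wrong when two separate replace calls are chained.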

JavaScript match access_token in source page

I have this input string for a regular expression
ine",monitor_crashes:false,container_type:null,min_aspect_ratio:0.25,max_aspect_ratio:4,number_of_partitions:2,multi_partitioning_enabled:false,access_token:"EAAAAUaZA8jlABAOZC1TJwwFgfHyWt4V6b6B6cNxMXKkrjcpmzYS2vB7GWnIJFZCFQMPPEoZCInyJVigwcn8DtZA9xtYNATZBZBriOZBjAhdZCMfZCwohKOISSpC8aewclxA3U3X2PqPZBwZCdZBcKNA2Ydr2pQECR6ZBbuOaAZD",resumability_enabled:true,resumable_service_override:null,change_default_chunk_size:true,client_chunk_size:200000000,use_real_progress_percentage:false,use_progress_linearity:0,use_progress_transform_x:1,early_receive:false
I am trying to grab the access token, but the result is not a single value.
I want a single value:
"EAAAAUaZA8jlABAOZC1TJwwFgfHyWt4V6b6B6cNxMXKkrjcpmzYS2vB7GWnIJFZCFQMPPEoZCInyJVigwcn8DtZA9xtYNATZBZBriOZBjAhdZCMfZCwohKOISSpC8aewclxA3U3X2PqPZBwZCdZBcKNA2Ydr2pQECR6ZBbuOaAZD"
How can I improve my regex?
My test: https://www.debuggex.com/r/xPqpBV3e9h2yoghE
My regex: (\w)+(?="|access_token$)
Your current regex, (\w)+(?="|access_token$), matches one or more word characters followed either by a " or by access_token at the end of the input. I'm really not sure why you would want it to be followed by access_token$, because access_token comes before the text you're looking for.
Why wouldn't a simple regex like access_token:"(\w+)" work? It looks for the key access_token, and the first capturing group holds the token string.
Then again, as @desoares said in the comments, if the source is valid JSON it's probably best to parse it with a JSON parser: JSON.parse(yourJsonObjectString).access_token.
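A minimal sketch of the capturing-group approach, assuming the page source is available as a string. The function name extractAccessToken and the shortened sample token are placeholders for illustration only.

```javascript
// Hypothetical helper: returns the token captured after access_token:" or null.
function extractAccessToken(source) {
  const match = /access_token:"(\w+)"/.exec(source);
  // match[0] is the whole match; match[1] is the first capturing group.
  return match ? match[1] : null;
}

// Shortened placeholder input modelled on the string in the question.
const sampleSource = 'multi_partitioning_enabled:false,access_token:"EAAAAUaZA8jlABAO",resumability_enabled:true';
console.log(extractAccessToken(sampleSource)); // EAAAAUaZA8jlABAO
```

Reading the capturing group rather than the whole match is what yields a single value without the surrounding key or quotes.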
This regular expression would work:
access_token:"(\w+?)"
Regex = ([A-Z]\w+)

How to validate my URL efficiently using JavaScript?

My regex successfully validates many URLs except http://www.google
Here's my URL validator in JSFiddle: http://jsfiddle.net/z23nZ/2/
It correctly validates the following URLs:
http://www.google.com gives True
www.google.com gives True
http://www.rootsweb.ancestry.com/~mopoc/links.htm gives True
http:// www. gives False
...but not this one:
http://www.google gives True
It's not correct to return true in this case. How can I validate that case?
I think you need to simplify this a lot. There are plenty of URL validation regexes out there, but as an exercise, I'll go through my thought process for constructing one.
First, you need to match a protocol if there is one: /((http|ftp)s?:\/\/)?
Then match any series of non-whitespace characters: \S+
If you're trying to pick out URLs from text, you'll want to look for signs that it is a URL. Look for dots or slashes, then more non-whitespace: [\.\/]\S*/
Now put it all together:
/(((http|ftp)s?:\/\/)|(\S+[\.\/]))\S*[^\s\.]*/
I'm guessing that you're trying to handle www.google because of the new TLDs... the fact is, such URLs might just look like google, and so any word could be a URL. Coming up with a catch-all regex that matches valid URLs and nothing else isn't possible, so you're best off going with something simple like the above.
Edit: I've put a | between the protocol part and the non-whitespace-then-dot-or-slash part so that http://google matches if people choose to write new URLs like that.
Edit 2: See comments for the next improvement. It makes sure google.com matches, http://google matches, and even google/ matches, but not a..
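To see how the assembled pattern behaves, here is a small sketch that runs it against a few inputs. It uses the combined regex exactly as written in this answer (not the later improvement mentioned in the comments, which is not reproduced here); the function name looksLikeUrl is a hypothetical wrapper.

```javascript
// The combined pattern from the answer: a protocol, or non-whitespace followed by a dot or slash.
const urlLike = /(((http|ftp)s?:\/\/)|(\S+[\.\/]))\S*[^\s\.]*/;

// Hypothetical wrapper. The pattern is unanchored, so this answers
// "does the text contain something URL-like", not "is the whole text a valid URL".
function looksLikeUrl(text) {
  return urlLike.test(text);
}

console.log(looksLikeUrl('http://www.google.com')); // true
console.log(looksLikeUrl('www.google.com'));        // true
console.log(looksLikeUrl('http://google'));         // true
console.log(looksLikeUrl('hello'));                 // false
```

Since the pattern is unanchored and deliberately permissive, it is better suited to spotting URL candidates in running text than to strict validation.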

Regular expression for detecting hyperlinks

I've got this regex pattern from WMD showdown.js file.
/<((https?|ftp|dict):[^'">\s]+)>/gi
and the code is:
text = text.replace(/<((https?|ftp|dict):[^'">\s]+)>/gi,"<a href=\"$1\">$1</a>");
But when I set text to http://www.google.com, it does not anchor it, it returns the original text value as is (http://www.google.com).
P.S: I've tested it with RegexPal and it does not match.
Your code is searching for a URL wrapped in <>, like <http://www.google.com>.
Just change it to /((https?|ftp|dict):[^'">\s]+)/gi if you don't want it to require the <>.
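A short sketch of that change, assuming the goal is to wrap bare URLs in an anchor. The function name autoLink and the exact anchor markup in the replacement string are illustrative assumptions.

```javascript
// Hypothetical helper: wraps bare http/https/ftp/dict URLs in anchor tags.
function autoLink(text) {
  // Same pattern as the original, minus the surrounding < and >.
  return text.replace(/((https?|ftp|dict):[^'">\s]+)/gi, '<a href="$1">$1</a>');
}

console.log(autoLink('http://www.google.com'));
// <a href="http://www.google.com">http://www.google.com</a>
```

With the angle brackets still in the pattern, the same input comes back unchanged, which matches the behaviour described in the question.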
As long as you know your URLs start with http:// or https:// or whatever, you can use:
/(((https?|s?ftp|dict|www)(:\/\/)?)[A-Za-z0-9.\-]+)/gi
The expression matches until it encounters a character not allowed by the class, i.e. anything other than A-Za-z0-9.\-. It will not, however, detect anything of the form google.com, or anything that comes after the domain name such as parameters or subdirectory paths. If you need that, you can simply use the terminating condition from your own regex above.
I know it seems pointless, but it may be useful if you want the display name to be something abbreviated rather than the whole URL in the case of complex URLs.
You could use:
var re = /(http|https|ftp|dict)(:\/\/\S+?)(\.?\s|\.?$)/gi;
with:
el.innerHTML = el.innerHTML.replace(re, '<a href=\'$1$2\'>$1$2<\/a>$3');
to also match URLs at the end of sentences.
But you need to be very careful with this technique, make sure the content of the element is more or less plain text and not complex markup. Regular expressions are not meant for, nor are they good at, processing or parsing HTML.
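The effect of the third group is easier to see on a plain string than on innerHTML. This sketch applies the same regex and replacement to text only; the function name linkPlainText and the sample sentences are assumptions for illustration.

```javascript
const re = /(http|https|ftp|dict)(:\/\/\S+?)(\.?\s|\.?$)/gi;

// Hypothetical helper: links URLs in plain text, leaving a trailing period outside the link.
function linkPlainText(text) {
  // $1$2 is the URL; $3 is the trailing ". ", whitespace, or end-of-input period.
  return text.replace(re, "<a href='$1$2'>$1$2</a>$3");
}

console.log(linkPlainText('See http://example.com. More text'));
// See <a href='http://example.com'>http://example.com</a>. More text
console.log(linkPlainText('Go to http://example.com/a?b=1.'));
// Go to <a href='http://example.com/a?b=1'>http://example.com/a?b=1</a>.
```

The lazy \S+? keeps extending the URL until the third group can match, which is why the sentence-ending period stays outside the anchor.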

homepage regexp matching

I'm trying to write a regexp that lets me filter out all URLs of this shape:
http://foo.com/
http://foo
http://foo.*
But not the following:
http://foo.com/subfoo
http://foo.com/subfoo/
http://foo.com/subfoo/subsubfoo..
To match the second group of URLs, I've written the following regexp:
http://.*/.
However, my problem is finding a regexp that matches the first group.
So I need a way to say:
if there is nothing after http://.* or http://.*/, the pattern matches.
I've read something about lookahead, but I don't know if this is the right way.
Any ideas? Thanks for your answers.
A bit late, but this worked for me:
http://[^/]*[/]*$
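As a quick check, the pattern above can be run against the URLs from the question. This is a sketch only; the function name isHomepage is hypothetical, and the pattern is written as a JavaScript regex literal with the slashes escaped.

```javascript
// The answer's pattern: host characters (no slash), optional trailing slashes, then end of input.
const homepagePattern = /http:\/\/[^/]*[/]*$/;

// Hypothetical helper: true when nothing follows the host except optional slashes.
function isHomepage(url) {
  return homepagePattern.test(url);
}

console.log(isHomepage('http://foo.com/'));        // true
console.log(isHomepage('http://foo'));             // true
console.log(isHomepage('http://foo.com/subfoo'));  // false
console.log(isHomepage('http://foo.com/subfoo/')); // false
```

No lookahead is needed here: excluding the slash from the host part and anchoring with $ is enough to reject anything with a path.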
/^http:\/\/[^/?#]*\/./i
should match only http URLs with a path component other than /.
Don't forget query strings like this: http://foo.com/?search
/^http:\/\/[^/?#]*\/[^?]/i
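The query-string variant can be checked the same way. In this sketch the function name hasPathComponent is an assumption; the regex is the one given just above.

```javascript
// Matches http URLs whose first character after the host's slash is not "?".
const pathPattern = /^http:\/\/[^/?#]*\/[^?]/i;

// Hypothetical helper: true when the URL has a path beyond "/" (query strings on "/" are ignored).
function hasPathComponent(url) {
  return pathPattern.test(url);
}

console.log(hasPathComponent('http://foo.com/subfoo'));  // true
console.log(hasPathComponent('http://foo.com/'));        // false
console.log(hasPathComponent('http://foo.com/?search')); // false
```

Note that this pattern identifies the second group of URLs, so the homepage case is the negation of its result.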

regex for email validation Not Working with Subdomains?

I'm using the following for email validation:
var filter = /^([\w]+)(.[\w]+)*@([\w]+)(.[\w]{2,3}){1,2}$/; // For Email Validation
if (filter.test(emailInputVal))) {console.log('good')}
For some reason the above does not work with emails that have a subdomain. Any ideas why?
xxxx@xxx.xxx.com
Thanks
Because your regular expression is incorrect. Try this instead:
var filter = /^\w+(?:\.\w+)*@\w+(?:\.\w+)+$/;
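A brief sketch of how this pattern (written with @) treats subdomains. The variable name subdomainFilter and the sample addresses are illustrative assumptions.

```javascript
// Local part: word characters separated by dots. Domain: at least two dot-separated labels.
const subdomainFilter = /^\w+(?:\.\w+)*@\w+(?:\.\w+)+$/;

console.log(subdomainFilter.test('xxxx@xxx.xxx.com'));            // true
console.log(subdomainFilter.test('first.last@mail.example.com')); // true
console.log(subdomainFilter.test('xxxx@xxx'));                    // false
```

Because \w does not include the hyphen, an address such as user@my-site.com is rejected by this pattern, which is one reason to treat it as a typo check rather than real validation.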
This link may help you a lot when validating email addresses:
http://www.regular-expressions.info/email.html
The following non-trivial, simplified regular expression conforms to the official RFC 2822 standard:
var filter = /[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)\b/;
That is one weird regex. It's certainly not doing what you're expecting it to do, for example because the dot isn't escaped when you do mean a literal dot.
Since it's impossible to really validate an email address with a regex anyway - why not go for something simpler?
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$/i
This will still match some invalid addresses and will reject some valid addresses (as all readable regexes do), but in the end you have to send a confirmation mail to a user-submitted mail address and see if you get a reply if you truly want to validate it.
You can't reliably validate email addresses with regular expressions. What I'd do:
use a simple expression like /^[^@]+@([A-Za-z0-9-]+\.)*[A-Za-z0-9-]+$/ for client-side validation to catch typos
check the DNS record on the server-side
send a confirmation mail
Your last component is: any length word, then one or two instances of (dot, two-or-three-letter word). I would expect "xxxx@xxx.xxx.com" to work, but perhaps not more realistic examples like "xxxx@xxx.example.com", because your domain name is not a two-or-three-letter word.
Do yourself a favor and simply use /^[^@ ]+@[^@ ]+\.[^@ ]+$/. More about this: http://nedbatchelder.com/blog/200908/humane_email_validation.html
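For comparison, here is a sketch of the simple pattern (written with @) applied to a few inputs. The variable name simpleEmail and the sample addresses are assumptions for illustration.

```javascript
// Something without "@" or spaces, an "@", then something containing at least one dot.
const simpleEmail = /^[^@ ]+@[^@ ]+\.[^@ ]+$/;

console.log(simpleEmail.test('xxxx@xxx.example.com')); // true
console.log(simpleEmail.test('user@localhost'));       // false
console.log(simpleEmail.test('two@@example.com'));     // false
```

The pattern accepts any number of subdomains and any label length, so it avoids the problem described in the question while still catching obvious typos.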
I am not sure the code above will work in any form, because there is an extra ) in the if condition. Removing it makes it work for the subdomain too:
if (filter.test(emailInputVal)) {console.log('good')}
