How to get domain from email address in Javascript? - javascript

I want to get the domain part from an email address, in Javascript. It's easy to extract the domain from an email like via split: "joe#example.com", which is example.com.
However, emails also come in forms like "joe#subdomain1.example.com.uk", of which the domain is example.com.uk, instead of subdomain1.example.com.uk. The problem here is that subdomain1 can be mistakenly considered as part of the domain.
How do I do this reliably?

That is really not a trivial problem as it might seem at first glance.
Luckily there are libs that solves this, tld-extract is a popular choice which uses Mozilla's Public Suffix List (a volunteer based list). The usage is
var parser = require('tld-extract');
console.log( parser("www.google.com") );
console.log( parser("google.co.uk") );
/**
* >> { tld: 'com', domain: 'google.com', sub: 'www' }
* >> { tld: 'co.uk', domain: 'google.co.uk', sub: '' }
*/
To extract the server address part from email address first split by # character like this
const email = "john#sub.domain.com"
const address = email.split('#').pop()
const domain = parser(address).domain
See more in depth discussion about the problem solution check the README of a similar python library.
tldextract on the other hand knows what all gTLDs and ccTLDs look like
by looking up the currently living ones according to the Public Suffix
List (PSL). So, given a URL, it knows its subdomain from its domain,
and its domain from its country code.
Make sure to learn about the list on Public Suffix List website and understand it is based on volunteer work and might not be exhaustive at all time.
The Public Suffix List is a cross-vendor initiative to provide an
accurate list of domain name suffixes, maintained by the hard work of
Mozilla volunteers and by submissions from registries, to whom we are
very grateful.
Since there was and remains no algorithmic method of finding the
highest level at which a domain may be registered for a particular
top-level domain (the policies differ with each registry), the only
method is to create a list. This is the aim of the Public Suffix List.

I agree that the best solution for this problem would be to use a library, like what was suggested in https://stackoverflow.com/a/49893282/2735286.
Yet if you have a long enough list with top level domains and subdomains, you could write some code which extracts whatever characters are found after the '#' sign and then from the domain you try to find out whether you have a top level or subdomain. When you know if you are dealing with a top level domain you know where you can find the main domain name and so everything before it must be a subdomain. The same applies to the subdomain.
This is a naive implementation, but you could try this:
// TODO: needs to have an exhaustive list of top level domains
const topLevelDomains = ["com", "org", "int", "gov", "edu", "net", "mil"];
// TODO: Needs an exhaustive list of subdomains
const subdomains = ["co.uk", "org.uk", "me.uk", "ltd.uk", "plc.uk"];
function extract(str) {
const suffix = str.match(/.+#(.+)/);
if (suffix) {
const groups = suffix.pop().split(".");
const lastPart = groups[groups.length - 1];
if (isSubDomain(groups[groups.length - 2] + "." + lastPart)) {
console.log("Sub domain detected in: " + groups);
if (groups.length > 3) {
console.log("Possible subdomain: " + groups.splice(0, groups.length - 3));
console.log();
}
} else if (isTopLevelDomain(lastPart)) {
console.log("Top level domain detected in: " + groups);
if (groups.length > 2) {
console.log("Possible subdomain: " + groups.splice(0, groups.length - 2));
console.log();
}
}
}
}
function isTopLevelDomain(lastPart) {
return (topLevelDomains.find(s => s === lastPart));
}
function isSubDomain(lastPart) {
return (subdomains.find(s => s === lastPart));
}
extract("joe#example.com");
extract("joe#subdomain1.example.co.uk");
extract("joe#subdomain2.example.edu");
extract("joe#subdomain3.example.ltd.uk");
extract("joe#test.subdomain3.example.plc.uk");
Please challenge the logic, if I got this wrong.

// Not a proper solution because of email pattern is not fixed. Use below if it is appropriate solution according to your problem .
jQuery( document ).ready(function() {
//var input = 'joe#subdomain1.com';
var input = 'joe#subdomain1.example.com.uk';
var first_split = input.split("#")[1];
var second_split = first_split.split(".");
if(second_split.length == 2) {
console.log('domain is : '+first_split);
} else if(second_split.length > 2) {
var str = first_split.substring(first_split.indexOf(".") + 1);
console.log('domain is : '+str);
}
});

Related

Complete url validation angular [duplicate]

I want to validate a URL and display message. Below is my code:
$("#pageUrl").keydown(function(){
$(".status").show();
var url = $("#pageUrl").val();
if(isValidURL(url)){
$.ajax({
type: "POST",
url: "demo.php",
data: "pageUrl="+ url,
success: function(msg){
if(msg == 1 ){
$(".status").html('<img src="images/success.gif"/><span><strong>SiteID:</strong>12345678901234456</span>');
}else{
$(".status").html('<img src="images/failure.gif"/>');
}
}
});
}else{
$(".status").html('<img src="images/failure.gif"/>');
}
});
function isValidURL(url){
var RegExp = /(ftp|http|https):\/\/(\w+:{0,1}\w*#)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%#!\-\/]))?/;
if(RegExp.test(url)){
return true;
}else{
return false;
}
}
My problem is now it will show an error message even when entering a proper URL until it matches regular expression, and it return true even if the URL is something like "http://wwww".
I appreciate your suggestions.
Someone mentioned the Jquery Validation plugin, seems overkill if you just want to validate the url, here is the line of regex from the plugin:
return this.optional(element) || /^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i.test(value);
Here is where they got it from: http://projects.scottsplayground.com/iri/
Pointed out by #nhahtdh This has been updated to:
// Copyright (c) 2010-2013 Diego Perini, MIT licensed
// https://gist.github.com/dperini/729294
// see also https://mathiasbynens.be/demo/url-regex
// modified to allow protocol-relative URLs
return this.optional( element ) || /^(?:(?:(?:https?|ftp):)?\/\/)(?:\S+(?::\S*)?#)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})).?)(?::\d{2,5})?(?:[/?#]\S*)?$/i.test( value );
source: https://github.com/jzaefferer/jquery-validation/blob/c1db10a34c0847c28a5bd30e3ee1117e137ca834/src/core.js#L1349
It's not practical to parse URLs using regex. A full implementation of the RFC1738 rules would result in an enormously long regex (assuming it's even possible). Certainly your current expression fails many valid URLs, and passes invalid ones.
Instead:
a. use a proper URL parser that actually follows the real rules. (I don't know of one for JavaScript; it would probably be overkill. You could do it on the server side though). Or,
b. just trim away any leading or trailing spaces, then check it has one of your preferred schemes on the front (typically ‘http://’ or ‘https://’), and leave it at that. Or,
c. attempt to use the URL and see what lies at the end, for example by sending it am HTTP HEAD request from the server-side. If you get a 404 or connection error, it's probably wrong.
it return true even if url is something like "http://wwww".
Well, that is indeed a perfectly valid URL.
If you want to check whether a hostname such as ‘wwww’ actually exists, you have no choice but to look it up in the DNS. Again, this would be server-side code.
function validateURL(textval) {
var urlregex = /^(https?|ftp):\/\/([a-zA-Z0-9.-]+(:[a-zA-Z0-9.&%$-]+)*#)*((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|([a-zA-Z0-9-]+\.)*[a-zA-Z0-9-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(:[0-9]+)*(\/($|[a-zA-Z0-9.,?'\\+&%$#=~_-]+))*$/;
return urlregex.test(textval);
}
This can return true for URLs like:
http://stackoverflow.com/questions/1303872/url-validation-using-javascript
or:
http://regexlib.com/DisplayPatterns.aspx?cattabindex=1&categoryId=2
I written also a URL validation function base on rfc1738 and rfc3986 to check http and https urls. I try to hold this modular, so it can be better maintained and adapted to own requirements.
The RegExp in one line is show at end of this post.
The RegExp accept HTTP and HTTPS URLs with some international domain or IPv4 number. IPv6 is not supported yet.
window.isValidURL = (function() {// wrapped in self calling function to prevent global pollution
//URL pattern based on rfc1738 and rfc3986
var rg_pctEncoded = "%[0-9a-fA-F]{2}";
var rg_protocol = "(http|https):\\/\\/";
var rg_userinfo = "([a-zA-Z0-9$\\-_.+!*'(),;:&=]|" + rg_pctEncoded + ")+" + "#";
var rg_decOctet = "(25[0-5]|2[0-4][0-9]|[0-1][0-9][0-9]|[1-9][0-9]|[0-9])"; // 0-255
var rg_ipv4address = "(" + rg_decOctet + "(\\." + rg_decOctet + "){3}" + ")";
var rg_hostname = "([a-zA-Z0-9\\-\\u00C0-\\u017F]+\\.)+([a-zA-Z]{2,})";
var rg_port = "[0-9]+";
var rg_hostport = "(" + rg_ipv4address + "|localhost|" + rg_hostname + ")(:" + rg_port + ")?";
// chars sets
// safe = "$" | "-" | "_" | "." | "+"
// extra = "!" | "*" | "'" | "(" | ")" | ","
// hsegment = *[ alpha | digit | safe | extra | ";" | ":" | "#" | "&" | "=" | escape ]
var rg_pchar = "a-zA-Z0-9$\\-_.+!*'(),;:#&=";
var rg_segment = "([" + rg_pchar + "]|" + rg_pctEncoded + ")*";
var rg_path = rg_segment + "(\\/" + rg_segment + ")*";
var rg_query = "\\?" + "([" + rg_pchar + "/?]|" + rg_pctEncoded + ")*";
var rg_fragment = "\\#" + "([" + rg_pchar + "/?]|" + rg_pctEncoded + ")*";
var rgHttpUrl = new RegExp(
"^"
+ rg_protocol
+ "(" + rg_userinfo + ")?"
+ rg_hostport
+ "(\\/"
+ "(" + rg_path + ")?"
+ "(" + rg_query + ")?"
+ "(" + rg_fragment + ")?"
+ ")?"
+ "$"
);
// export public function
return function (url) {
if (rgHttpUrl.test(url)) {
return true;
} else {
return false;
}
};
})();
RegExp in one line:
var rg = /^(http|https):\/\/(([a-zA-Z0-9$\-_.+!*'(),;:&=]|%[0-9a-fA-F]{2})+#)?(((25[0-5]|2[0-4][0-9]|[0-1][0-9][0-9]|[1-9][0-9]|[0-9])(\.(25[0-5]|2[0-4][0-9]|[0-1][0-9][0-9]|[1-9][0-9]|[0-9])){3})|localhost|([a-zA-Z0-9\-\u00C0-\u017F]+\.)+([a-zA-Z]{2,}))(:[0-9]+)?(\/(([a-zA-Z0-9$\-_.+!*'(),;:#&=]|%[0-9a-fA-F]{2})*(\/([a-zA-Z0-9$\-_.+!*'(),;:#&=]|%[0-9a-fA-F]{2})*)*)?(\?([a-zA-Z0-9$\-_.+!*'(),;:#&=\/?]|%[0-9a-fA-F]{2})*)?(\#([a-zA-Z0-9$\-_.+!*'(),;:#&=\/?]|%[0-9a-fA-F]{2})*)?)?$/;
In a similar situation I got away with this:
someUtils.validateURL = function(url) {
var parser = document.createElement('a');
try {
parser.href = url;
return !!parser.hostname;
} catch (e) {
return false;
}
};
i.e. why invent the wheel if browsers can do it for you? But, of course, this will only work in the browser.
there are various parts of parsed URL exactly how browser would interpret it:
parser.protocol; // => "http:"
parser.hostname; // => "example.com"
parser.port; // => "8080"
parser.pathname; // => "/path/"
parser.search; // => "?search=test"
parser.hash; // => "#hash"
parser.host; // => "example.com:3000"
Using these you can improve your validating function depending on the requirements. The only drawback is that it will accept relative URLs and use current page server's host and port. But you can use it for your advantage, by re-assembling the URL from parts and always passing it in full to your AJAX service.
What validateURL won't accept is invalid URL, e.g. http:\:8883 will return false, but :1234 is valid and is interpreted as http://pagehost.example.com/:1234 i.e. as a relative path.
UPDATE
This approach is no longer working with Chrome and other WebKit browsers. Even when URL is invalid, hostname is filled with some value, e.g. taken from base. It still helps to parse parts of URL, but will not allow to validate one.
Possible better no-own-parser approach is to use var parsedURL = new URL(url) and catch exceptions. See e.g. URL API. Supported by all major browsers and NodeJS, although still marked experimental.
best regex I found from http://angularjs.org/
var urlregex = /^(ftp|http|https):\/\/(\w+:{0,1}\w*#)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%#!\-\/]))?$/;
This is what worked for me:
function validateURL(value) {
return /^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i.test(value);
}
from there is is just a matter of calling the function to get a true or false back:
validateURL(urltovalidate);
I know it's quite an old question but since it does not have any accepted answer, I suggest you to use the URI.js framework: https://github.com/medialize/URI.js
You can use it to check for malformed URI using a try/catch block:
function isValidURL(url)
{
try {
(new URI(url));
return true;
}
catch (e) {
// Malformed URI
return false;
}
}
Of course it will consider something like "%#" as a well formed relative URI... So I suggest you read the URI.js API to perform more checks, for example if you want to make sure that the user entered a well formed absolute URL you may do like this:
function isValidURL(url)
{
try {
var uri = new URI(url);
// URI has a scheme and a host
return (!!uri.scheme() && !!uri.host());
}
catch (e) {
// Malformed URI
return false;
}
}
Import in an npm package like
https://www.npmjs.com/package/valid-url
and use it to validate your url.
You can use the URL API that is recently standard. Browser support is sketchy at best, see the link. new URL(str) is guaranteed to throw TypeError for invalid URLs.
As stated above, http://wwww is a valid URL.
The URL API can be used to validate the structure of a URL string.
An error is thrown when trying to serialise an invalid URL string into a URL object. This could be abstracted into a helper function (Typescript snippet below):
function isValidURL(URL: string) : boolean {
try {
new URL(string);
return true;
} catch (err) { return false; }
}
isValidURL('https://www.google.com'); // returns true
isValidURL('localhost:3000'); // returns true
isValidURL('not-a-valid-url'); // returns false
isValidURL('google.com'); // returns false (see footnote)
If you strictly want HTTP / web links to be valid, we can simply add a condition to the return statement:
...
const url = new URL(string);
return url.protocol === 'https:' || url.protocol === 'http:';
...
Granted, this approach comes with a few caveats:
No support for the URL API in Internet Explorer (could be fixed with a polyfill)
Without additional checks, URLs without either a protocol or port are seen as invalid (e.g. google.com is invalid but google.com:3000 is OK). This may be an unintended behaviour for some usecases.
If you're looking for a more reliable regex, check out RegexLib. Here's the page you'd probably be interested in:
http://regexlib.com/Search.aspx?k=url
As for the error messages showing while the person is still typing, change the event from keydown to blur and then it will only check once the person moves to the next element.
var RegExp = (/^HTTP|HTTP|http(s)?:\/\/(www\.)?[A-Za-z0-9]+([\-\.]{1}[A-Za-z0-9]+)*\.[A-Za-z]{2,40}(:[0-9]{1,40})?(\/.*)?$/);
My solution:
function isValidUrl(t)
{
return t.match(/^(http|https|ftp):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+)(:(\d+))?\/?/i)
}
Demo : http://jsbin.com/uzimeb/1/edit
function checkURL(value) {
var urlregex = new RegExp("^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*#)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$");
if (urlregex.test(value)) {
return (true);
}
return (false);
}
I have found a great resource for comparing different solutions:
https://mathiasbynens.be/demo/url-regex
According to that page, only solution from diegoperini passes all tests. Here is that regex:
_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?#)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS
I checked a lot of url validators in google and no one works for me. For example I'd like to see valid on links like 'aa.com'. I like silly check for dot sign in string.
function isValidUri(str) {
var dotIndex = str.indexOf('.');
return (dotIndex > 0 && dotIndex < str.length - 2);
}
It should not stay on beginning and end of string (for now we don't have top level domain names with one character).
Here's a regular expression which might fit the bill (it's very long):
/^(?:\u0066\u0069\u006C\u0065\u003A\u002F{2}(?:\u002F{2}(?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*\u0040)?(?:\u005B(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){6}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){5}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){4}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A)?[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){3}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,2}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,3}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,4}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,5}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,6}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2})\u005D|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])|(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039](?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D]+)?[\u0041-\u005A\u0061-\u007A\u0030-\u0039])?|(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039](?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D]+)?[\u0041-\u005A\u0061-\u007A\u0030-\u0039])?\u002E)+[\u0041-\u005A\u0061-\u007A\u0030-\u0039](?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D]+)?[\u0041-\u005A\u0061-\u007A\u0030-\u0039])?))(?:\u003A(?:\u0030-\u0035\u0030-\u0039{0,4}|\u0036\u0030-\u0034\u0030-\u0039{3}|\u0036\u0035\u0030-\u0034\u0030-\u0039{2}|\u0036\u0035\u0035\u0030-\u0032\u0030-\u0039|\u0036\u0035\u0035\u0033\u0030-\u0035))?(?:\u002F(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)*|\u002F(?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])+(?:\u002F(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)*)?|(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])+(?:\u002F(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)*)|[\u0041-\u005A\u0061-\u007A][\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002B\u002D\u002E]*\u003A(?:\u002F{2}(?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*\u0040)?(?:\u005B(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){6}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){5}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){4}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A)?[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){3}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,2}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,3}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,4}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,5}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,6}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2})\u005D|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])|(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039](?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D]+)?[\u0041-\u005A\u0061-\u007A\u0030-\u0039])?|(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039](?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D]+)?[\u0041-\u005A\u0061-\u007A\u0030-\u0039])?\u002E)+[\u0041-\u005A\u0061-\u007A\u0030-\u0039](?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D]+)?[\u0041-\u005A\u0061-\u007A\u0030-\u0039])?))(?:\u003A(?:\u0030-\u0035\u0030-\u0039{0,4}|\u0036\u0030-\u0034\u0030-\u0039{3}|\u0036\u0035\u0030-\u0034\u0030-\u0039{2}|\u0036\u0035\u0035\u0030-\u0032\u0030-\u0039|\u0036\u0035\u0035\u0033\u0030-\u0035))?(?:\u002F(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)*|\u002F(?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])+(?:\u002F(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)*)?|(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])+(?:\u002F(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)*)(?:\u003F(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040\u002F\u003F]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)?(?:\u0023(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040\u002F\u003F]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)?)$/
There are some caveats to its usage, namely it does not validate URIs which contain additional information after the user name (e.g. "username:password"). Also, only IPv6 addresses can be contained within the IP literal syntax and the "IPvFuture" syntax is currently ignored and will not validate against this regular expression. Port numbers are also constrained to be between 0 and 65,535. Also, only the file scheme can use triple slashes (e.g. "file:///etc/sysconfig") and can ignore both the query and fragment parts of a URI. Finally, it is geared towards regular URIs and not IRIs, hence the extensive focus on the ASCII character set.
This regular expression could be expanded upon, but it's already complex and long enough as it is. I also cannot guarantee it's going to be "100% accurate" or "bug free", but it should correctly validate URIs for all schemes.
You will need to do additional verification for any scheme-specific requirements or do URI normalization as this regular expression will validate a very broad range of URIs.
Try edit your isValidURL function as follows:
function isValidURL(url) {
var encodedURL = encodeURIComponent(url);
var isValid = false;
$.ajax({
url: "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22" + encodedURL + "%22&format=json",
type: "get",
async: false,
dataType: "json",
success: function(data) {
isValid = data.query.results != null;
},
error: function(){
isValid = false;
}
});
return isValid;
}
This should do the trick.

JavaScript Regex URL extract domain only

Currently I can extract the 'domain' from any URL with the following regex:
/^(?:https?:\/\/)?(?:[^#\n]+#)?(?:www\.)?([^:\/\n\?\=]+)/im
However I'm also getting subdomain's too which I want to avoid. For example if I have sites:
www.google.com
yahoo.com/something
freds.meatmarket.co.uk?someparameter
josh.meatmarket.co.uk/asldf/asdf
I currently get:
google.com
yahoo.com
freds.meatmarket.co.uk
josh.meatmarket.co.uk
Those last two I would like to exclude the freds and josh subdomain portion and extract only the true domain which would just be meatmarket.co.uk.
I did find another SOF that tries to solve in PHP, unfortunately I don't know PHP. is this translatable to JS (I'm actually using Google Script FYI)?
function topDomainFromURL($url) {
$url_parts = parse_url($url);
$domain_parts = explode('.', $url_parts['host']);
if (strlen(end($domain_parts)) == 2 ) {
// ccTLD here, get last three parts
$top_domain_parts = array_slice($domain_parts, -3);
} else {
$top_domain_parts = array_slice($domain_parts, -2);
}
$top_domain = implode('.', $top_domain_parts);
return $top_domain;
}
So, you need firstmost hostname stripped from your result, unless there only two parts already?
Just postprocess your result from first match with regexp matching that condition:
function domain_from_url(url) {
var result
var match
if (match = url.match(/^(?:https?:\/\/)?(?:[^#\n]+#)?(?:www\.)?([^:\/\n\?\=]+)/im)) {
result = match[1]
if (match = result.match(/^[^\.]+\.(.+\..+)$/)) {
result = match[1]
}
}
return result
}
console.log(domain_from_url("www.google.com"))
console.log(domain_from_url("yahoo.com/something"))
console.log(domain_from_url("freds.meatmarket.co.uk?someparameter"))
console.log(domain_from_url("josh.meatmarket.co.uk/asldf/asdf"))
// google.com
// yahoo.com
// meatmarket.co.uk
// meatmarket.co.uk
Try this:
https?:\/\/(www\.)?[-a-zA-Z0-9#:%._\+~#=]{2,256}\.([a-z]{2,6}){1}
Try to replace www by something else:
/^(?:https?:\/\/)?(?:[^#\n]+#)?(?:[^.]+\.)?([^:\/\n\?\=]+)/im
EDIT:
If you absolutely want to preserve the www into your regex, you could try this one:
/^(?:https?:\/\/)?(?:[^#\n]+#)?(?:www\.)?(?:[^.]+\.)?([^:\/\n\?\=]+)/im
export const extractHostname = url => {
let hostname;
// find & remove protocol (http, ftp, etc.) and get hostname
if (url.indexOf("://") > -1)
{
hostname = url.split('/')[2];
}
else
{
hostname = url.split('/')[0];
}
// find & remove port number
hostname = hostname.split(':')[0];
// find & remove "?"
hostname = hostname.split('?')[0];
return hostname;
};
export const extractRootDomain = url => {
let domain = extractHostname(url),
splitArr = domain.split('.'),
arrLen = splitArr.length;
// extracting the root domain here
// if there is a subdomain
if (arrLen > 2)
{
domain = splitArr[arrLen - 2] + '.' + splitArr[arrLen - 1];
// check to see if it's using a Country Code Top Level Domain (ccTLD) (i.e. ".me.uk")
if (splitArr[arrLen - 2].length === 2 && splitArr[arrLen - 1].length === 2)
{
//this is using a ccTLD
domain = splitArr[arrLen - 3] + '.' + domain;
}
}
return domain;
};
This is what I've come up with. I don't know how to combine the two match rules into a single regexp, however. This routine won't properly process bad domains like example..com. It does, however, account for TLDs that are in the variety of .xx, .xx.xx, .xxx, or more than 4 character TLDs on the end. This routine will work on just domain names or entire URLs, and the URLs don't have to have the http or https protocol -- it could be ftp, chrome, and others.
function getRootDomain(s){
var sResult = ''
try {
sResult = s.match(/^(?:.*\:\/?\/)?(?<domain>[\w\-\.]*)/i).groups.domain
.match(/(?<root>[\w\-]*(\.\w{3,}|\.\w{2}|\.\w{2}\.\w{2}))$/).groups.root;
} catch(ignore) {}
return sResult;
}
So basically, the first routine strips out any potential stuff before the ://, if that exists, or just a :, if that exists. Next, it looks for all non-word boundary stuff except allows the dash and period like you'd potentially see in domains. It labels this into a named capture group called domain. It also prevents the domain match from including a port such as :8080 as an example. If given an empty string, it just returns an empty string back.
From there, we then do another pass on this and instead of looking from the left-to-right like you would with the preceding ^ symbol, we use the ending $ symbol, working right-to-left, and allow only 4 conditions on the end: .xx.xx, .xx, .xxx, or more than .xxx (such as 4+ character TLDs), where x is a non-word boundary item. Note the {3,} -- that means 3 or more of something, which is why we handle the TLDs that are 3 or more characters too. From there, we allow for a non-word boundary in front of that which may include dashes and periods.
EDIT: Since posting this answer, I learned how to combine the full domain and the root part into one single RegExp. However, I'll keep the above for reasons where you may want to get both values, although the function only returned the root (but with a quick edit, could have returned both full domain and root domain). So, if you just want the root alone, then you could use this solution:
function getRootDomain(s){
var sResult = ''
try {
sResult = s.match(/^(?:.*?:\/\/)?.*?(?<root>[\w\-]*(?:\.\w{2,}|\.\w{2}\.\w{2}))(?:[\/?#:]|$)/).groups.root;
} catch(ignore) {}
return sResult;
}

Get the domain name of the subdomain Javascript

How i can get the domain name example.com from the set of possible subdomains sub1.example.com sub2.example.com sub3.example.com using javascript ...?
var parts = location.hostname.split('.');
var subdomain = parts.shift();
var upperleveldomain = parts.join('.');
To get only the second-level-domain, you might use
var parts = location.hostname.split('.');
var sndleveldomain = parts.slice(-2).join('.');
The accepted answer will work to get the second level domain. However, there is something called "public suffixes" that you may want to take into account. Otherwise, you may get unexpected and erroneous results.
For example, take the domain www.amazon.co.uk.
If you just try getting the second level domain, you'll end up with co.uk, which is probably not what you want. That's because co.uk is a "public suffix", which means it's essentially a top level domain. Here's the definition of a public suffix, taken from https://publicsuffix.org:
A "public suffix" is one under which Internet users can (or historically could) directly register names.
If this is a crucial part of your application, I would look into something like psl (https://github.com/lupomontero/psl) for domain parsing. It works in nodejs and the browser, and it's tested on Mozilla's maintained public suffix list.
Here's a quick example from their README:
var psl = require('psl');
// TLD with some 2-level rules.
psl.get('uk.com'); // null);
psl.get('example.uk.com'); // 'example.uk.com');
psl.get('b.example.uk.com'); // 'example.uk.com');
This is faster
const firstDotIndex = subDomain.indexOf('.');
const domain = subDomain.substring(firstDotIndex + 1);
The generic solution is explained here http://rossscrivener.co.uk/blog/javascript-get-domain-exclude-subdomain
From above link
var domain = (function(){
var i=0,domain=document.domain,p=domain.split('.'),s='_gd'+(new Date()).getTime();
while(i<(p.length-1) && document.cookie.indexOf(s+'='+s)==-1){
domain = p.slice(-1-(++i)).join('.');
document.cookie = s+"="+s+";domain="+domain+";";
}
document.cookie = s+"=;expires=Thu, 01 Jan 1970 00:00:01 GMT;domain="+domain+";";
return domain;
})();
function getDomain() {
const hostnameArray = window.location.hostname.split('.')
const numberOfSubdomains = hostnameArray.length - 2
return hostnameArray.length === 2 ? window.location.hostname : hostnameArray.slice(numberOfSubdomains).join('.')
}
console.log(getDomain());
This will remove all subdomains, so "a.b.c.d.test.com" will become "test.com"
If you want to verify if a specific subdomain exists
var parts = location.hostname.split('.');
if(parts.includes('subdomain_to_search_here')){
//yes
}else{
//no
}
some more robust version, which is independent of the subdomain count
function getDomain() {
const hostname = window.location.hostname.split('.');
hostname.reverse();
return `${hostname[1]}.${hostname[0]}`;
}

Improving regex for parsing YouTube / Vimeo URLs

I've made a function (in JavaScript) that takes an URL from either YouTube or Vimeo. It figures out the provider and ID for that particular video (demo: http://jsfiddle.net/csjwf/).
function parseVideoURL(url) {
var provider = url.match(/http:\/\/(:?www.)?(\w*)/)[2],
id;
if(provider == "youtube") {
id = url.match(/http:\/\/(?:www.)?(\w*).com\/.*v=(\w*)/)[2];
} else if (provider == "vimeo") {
id = url.match(/http:\/\/(?:www.)?(\w*).com\/(\d*)/)[2];
} else {
throw new Error("parseVideoURL() takes a YouTube or Vimeo URL");
}
return {
provider : provider,
id : id
}
}
It works, however as a regex Novice, I'm looking for ways to improve it. The input I'm dealing with, typically looks like this:
http://vimeo.com/(id)
http://youtube.com/watch?v=(id)&blahblahblah.....
1) Right now I'm doing three separate matches, would it make sense to try and do everything in one single expression? If so, how?
2) Could the existing matches be more concise? Are they unnecessarily complex? or perhaps insufficient?
3) Are there any YouTube or Vimeo URL's that would fail being parsed? I've tried quite a few and so far it seems to work pretty well.
To summarize: I'm simply looking for ways improve the above function. Any advice is greatly appreciated.
Here's my attempt at the regex, which covers most updated cases:
function parseVideo(url) {
// - Supported YouTube URL formats:
// - http://www.youtube.com/watch?v=My2FRPA3Gf8
// - http://youtu.be/My2FRPA3Gf8
// - https://youtube.googleapis.com/v/My2FRPA3Gf8
// - Supported Vimeo URL formats:
// - http://vimeo.com/25451551
// - http://player.vimeo.com/video/25451551
// - Also supports relative URLs:
// - //player.vimeo.com/video/25451551
url.match(/(https?\/\/)(player.|www.)?(vimeo\.com|youtu(be\.com|\.be|be\.googleapis\.com))\/(video\/|embed\/|watch\?v=|v\/)?([A-Za-z0-9._%-]*)(\&\S+)?/);
var type = null;
if (RegExp.$3.indexOf('youtu') > -1) {
type = 'youtube';
} else if (RegExp.$3.indexOf('vimeo') > -1) {
type = 'vimeo';
}
return {
type: type,
id: RegExp.$6
};
}
Regex is wonderfully terse but can quickly get complicated.
http://jsfiddle.net/8nagx2sk/
function parseYouTube(str) {
// link : //youtube.com/watch?v=Bo_deCOd1HU
// share : //youtu.be/Bo_deCOd1HU
// embed : //youtube.com/embed/Bo_deCOd1HU
var re = /\/\/(?:www\.)?youtu(?:\.be|be\.com)\/(?:watch\?v=|embed\/)?([a-z0-9_\-]+)/i;
var matches = re.exec(str);
return matches && matches[1];
}
function parseVimeo(str) {
// embed & link: http://vimeo.com/86164897
var re = /\/\/(?:www\.)?vimeo.com\/([0-9a-z\-_]+)/i;
var matches = re.exec(str);
return matches && matches[1];
}
Sometimes simple code is nicer to your fellow developers.
https://jsfiddle.net/vkg02mhp/1/
// protocol and www nuetral
function getVideoId(str, prefixes) {
const cleaned = str.replace(/^(https?:)?\/\/(www\.)?/, '');
for(const prefix of prefixes) {
if (cleaned.startsWith(prefix))
return cleaned.substr(prefix.length)
}
return undefined;
}
function getYouTubeId(url) {
return getVideoId(url, [
'youtube.com/watch?v=',
'youtu.be/',
'youtube.com/embed/'
]);
}
function getVimeoId(url) {
return getVideoId(url, [
'vimeo.com/'
]);
}
Which do you prefer to update?
I am not sure about your question 3), but provided that your induction on the url forms is correct, the regexes can be combined into one as follows:
/http:\/\/(?:www.)?(?:(vimeo).com\/(.*)|(youtube).com\/watch\?v=(.*?)&)/
You will get the match under different positions (1st and 2nd matches if vimeo, 3rd and 4th matches if youtube), so you just need to handle that.
Or, if you are quite sure that vimeo's id only includes numbers, then you can do:
/http:\/\/(?:www.)?(vimeo|youtube).com\/(?:watch\?v=)?(.*?)(?:\z|&)/
and the provider and the id will apprear under 1st and 2nd match, respcetively.
Here is my regex
http://jsfiddle.net/csjwf/1/
For Vimeo, Don't rely on Regex as Vimeo tends to change/update their URL pattern every now and then. As of October 2nd, 2017, there are in total of six URL schemes Vimeo supports.
https://vimeo.com/*
https://vimeo.com/*/*/video/*
https://vimeo.com/album/*/video/*
https://vimeo.com/channels/*/*
https://vimeo.com/groups/*/videos/*
https://vimeo.com/ondemand/*/*
Instead, use their API to validate vimeo URLs. Here is this oEmbed (doc) API which takes an URL, checks its validity and return a object with bunch of video information(check out the dev page). Although not intended but we can easily use this to validate whether a given URL is from Vimeo or not.
So, with ajax it would look like this,
var VIMEO_BASE_URL = "https://vimeo.com/api/oembed.json?url=";
var yourTestUrl = "https://vimeo.com/23374724";
$.ajax({
url: VIMEO_BASE_URL + yourTestUrl,
type: 'GET',
success: function(data) {
if (data != null && data.video_id > 0)
// Valid Vimeo url
else
// not a valid Vimeo url
},
error: function(data) {
// not a valid Vimeo url
}
});
about sawa's answer :
a little update on the second regex :
/http:\/\/(?:www\.)?(vimeo|youtube)\.com\/(?:watch\?v=)?(.*?)(?:\z|$|&)/
(escaping the dots prevents from matching url of type www_vimeo_com/… and $ added…)
here is the same idea for matching the embed urls :
/http:\/\/(?:www\.|player\.)?(vimeo|youtube)\.com\/(?:embed\/|video\/)?(.*?)(?:\z|$|\?)/
FWIW, I just used the following to validate and parse both YouTube and Vimeo URLs in an app. I'm sure you could add parentheses to parse out the specific things you're looking for...
/^(?:https?:\/\/)?(?:www\.)?(?:youtu\.be\/|youtube\.com\/(?:embed\/|v\/|watch\?v=|watch\?.+&v=))((\w|-){11})(?:\S+)?$|^(https?:\/\/)?(www.)?(player.)?vimeo.com\/([a-z]*\/)*([0-9]{6,11})[?]?.*$/
^^ This is just a combination of 2 separate expressions using | (or) to join them. Here are the original 2 expressions separately:
/^(?:https?:\/\/)?(?:www\.)?(?:youtu\.be\/|youtube\.com\/(?:embed\/|v\/|watch\?v=|watch\?.+&v=))((\w|-){11})(?:\S+)?$/
/^(https?:\/\/)?(www.)?(player.)?vimeo.com\/([a-z]*\/)*([0-9]{6,11})[?]?.*$/
I'm no expert, but it seems to work according to Rubular. Hopefully this helps someone out in the future.
3) Your regex does not match https url's. I haven't tested it, but I guess the "http://" part would become "http(s)?://". Note that this would change the matching positions of the provider and id.
Just in case here is a php version
/*
* parseVideo
* #param (string) $url
* mi-ca.ch 27.05.2016
* parse vimeo & youtube id
* format url for iframe embed
* https://regex101.com/r/lA0fP4/1
*/
function parseVideo($url) {
$re = "/(http:|https:|)\\/\\/(player.|www.)?(vimeo\\.com|youtu(be\\.com|\\.be|be\\.googleapis\\.com))\\/(video\\/|embed\\/|watch\\?v=|v\\/)?([A-Za-z0-9._%-]*)(\\&\\S+)?/";
preg_match($re, $url, $matches);
if(strrpos($matches[3],'youtu')>-1){
$type='youtube';
$src='https://www.youtube.com/embed/'.$matches[6];
}else if(strrpos($matches[3],'vimeo')>-1){
$type="vimeo";
$src='https://player.vimeo.com/video/'.$matches[6];
}else{
return false;
}
return array(
'type' => $type // return youtube or vimeo
,'id' => $matches[6] // return the video id
,'src' => $src // return the src for iframe embed
);
}
I had a task to enable adding a dropbox videos. So the same input should take href, check it and transform to the playable link which I can then insert in .
const getPlayableUrl = (url) => {
// Check youtube and vimeo
let firstCheck = url.match(/(http:|https:|)\/\/(player.|www.)?(vimeo\.com|youtu(be\.com|\.be|be\.googleapis\.com))\/(video\/|embed\/|watch\?v=|v\/)?([A-Za-z0-9._%-]*)(\&\S+)?/);
if (firstCheck) {
if (RegExp.$3.indexOf('youtu') > -1) {
return "//www.youtube.com/embed/" + RegExp.$6;
} else if (RegExp.$3.indexOf('vimeo') > -1) {
return 'https://player.vimeo.com/video/' + RegExp.$6
}
} else {
// Check dropbox
let candidate = ''
if (url.indexOf('.mp4') !== -1) {
candidate = url.slice(0, url.indexOf('.mp4') + 4)
} else if (url.indexOf('.m4v') !== -1) {
candidate = url.slice(0, url.indexOf('.m4v') + 4)
} else if (url.indexOf('.webm') !== -1) {
candidate = url.slice(0, url.indexOf('.webm') + 5)
}
let secondCheck = candidate.match(/(http:|https:|)\/\/(player.|www.)?(dropbox\.com)\/(s\/|embed\/|watch\?v=|v\/)?([A-Za-z0-9._%-]*\/)?(.*)/);
if (secondCheck) {
return 'https://dropbox.com/' + RegExp.$4 + RegExp.$5 + RegExp.$6 + '?raw=1'
} else {
throw Error("Not supported video resource.");
}
}
}
I based myself the previous answers but I needed more out the regex.
Maybe it worked in 2011 but in 2019 the syntax has changed a bit. So this is a refresh.
The regex will allow us to detect weather the url is Youtube or Vimeo.
I've added Capture group to easily retrieve the videoID.
If ran with Case insensitive setting please remove the (?i).
(?:(?i)(?:https:|http:)?\/\/)?(?:(?i)(?:www\.youtube\.com\/(?:embed\/|watch\?v=)|youtu\.be\/|youtube\.googleapis\.com\/v\/)(?<YoutubeID>[a-z0-9-_]{11,12})|(?:vimeo\.com\/|player\.vimeo\.com\/video\/)(?<VimeoID>[0-9]+))
https://regex101.com/r/PVdjg0/2
Use this Regex devs:This works like Makhan(react js,Javascript)
^(http\:\/\/|https\:\/\/)?((www\.)?(vimeo\.com\/)([0-9]+)$)|((www\.youtube\.com|youtu\.be)\/.+$)

Trying to Validate URL Using JavaScript

I want to validate a URL and display message. Below is my code:
$("#pageUrl").keydown(function(){
$(".status").show();
var url = $("#pageUrl").val();
if(isValidURL(url)){
$.ajax({
type: "POST",
url: "demo.php",
data: "pageUrl="+ url,
success: function(msg){
if(msg == 1 ){
$(".status").html('<img src="images/success.gif"/><span><strong>SiteID:</strong>12345678901234456</span>');
}else{
$(".status").html('<img src="images/failure.gif"/>');
}
}
});
}else{
$(".status").html('<img src="images/failure.gif"/>');
}
});
function isValidURL(url){
var RegExp = /(ftp|http|https):\/\/(\w+:{0,1}\w*#)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%#!\-\/]))?/;
if(RegExp.test(url)){
return true;
}else{
return false;
}
}
My problem is now it will show an error message even when entering a proper URL until it matches regular expression, and it return true even if the URL is something like "http://wwww".
I appreciate your suggestions.
Someone mentioned the Jquery Validation plugin, seems overkill if you just want to validate the url, here is the line of regex from the plugin:
return this.optional(element) || /^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i.test(value);
Here is where they got it from: http://projects.scottsplayground.com/iri/
Pointed out by #nhahtdh This has been updated to:
// Copyright (c) 2010-2013 Diego Perini, MIT licensed
// https://gist.github.com/dperini/729294
// see also https://mathiasbynens.be/demo/url-regex
// modified to allow protocol-relative URLs
return this.optional( element ) || /^(?:(?:(?:https?|ftp):)?\/\/)(?:\S+(?::\S*)?#)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})).?)(?::\d{2,5})?(?:[/?#]\S*)?$/i.test( value );
source: https://github.com/jzaefferer/jquery-validation/blob/c1db10a34c0847c28a5bd30e3ee1117e137ca834/src/core.js#L1349
It's not practical to parse URLs using regex. A full implementation of the RFC1738 rules would result in an enormously long regex (assuming it's even possible). Certainly your current expression fails many valid URLs, and passes invalid ones.
Instead:
a. use a proper URL parser that actually follows the real rules. (I don't know of one for JavaScript; it would probably be overkill. You could do it on the server side though). Or,
b. just trim away any leading or trailing spaces, then check it has one of your preferred schemes on the front (typically ‘http://’ or ‘https://’), and leave it at that. Or,
c. attempt to use the URL and see what lies at the end, for example by sending it am HTTP HEAD request from the server-side. If you get a 404 or connection error, it's probably wrong.
it return true even if url is something like "http://wwww".
Well, that is indeed a perfectly valid URL.
If you want to check whether a hostname such as ‘wwww’ actually exists, you have no choice but to look it up in the DNS. Again, this would be server-side code.
function validateURL(textval) {
var urlregex = /^(https?|ftp):\/\/([a-zA-Z0-9.-]+(:[a-zA-Z0-9.&%$-]+)*#)*((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|([a-zA-Z0-9-]+\.)*[a-zA-Z0-9-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(:[0-9]+)*(\/($|[a-zA-Z0-9.,?'\\+&%$#=~_-]+))*$/;
return urlregex.test(textval);
}
This can return true for URLs like:
http://stackoverflow.com/questions/1303872/url-validation-using-javascript
or:
http://regexlib.com/DisplayPatterns.aspx?cattabindex=1&categoryId=2
I written also a URL validation function base on rfc1738 and rfc3986 to check http and https urls. I try to hold this modular, so it can be better maintained and adapted to own requirements.
The RegExp in one line is show at end of this post.
The RegExp accept HTTP and HTTPS URLs with some international domain or IPv4 number. IPv6 is not supported yet.
window.isValidURL = (function() {// wrapped in self calling function to prevent global pollution
//URL pattern based on rfc1738 and rfc3986
var rg_pctEncoded = "%[0-9a-fA-F]{2}";
var rg_protocol = "(http|https):\\/\\/";
var rg_userinfo = "([a-zA-Z0-9$\\-_.+!*'(),;:&=]|" + rg_pctEncoded + ")+" + "#";
var rg_decOctet = "(25[0-5]|2[0-4][0-9]|[0-1][0-9][0-9]|[1-9][0-9]|[0-9])"; // 0-255
var rg_ipv4address = "(" + rg_decOctet + "(\\." + rg_decOctet + "){3}" + ")";
var rg_hostname = "([a-zA-Z0-9\\-\\u00C0-\\u017F]+\\.)+([a-zA-Z]{2,})";
var rg_port = "[0-9]+";
var rg_hostport = "(" + rg_ipv4address + "|localhost|" + rg_hostname + ")(:" + rg_port + ")?";
// chars sets
// safe = "$" | "-" | "_" | "." | "+"
// extra = "!" | "*" | "'" | "(" | ")" | ","
// hsegment = *[ alpha | digit | safe | extra | ";" | ":" | "#" | "&" | "=" | escape ]
var rg_pchar = "a-zA-Z0-9$\\-_.+!*'(),;:#&=";
var rg_segment = "([" + rg_pchar + "]|" + rg_pctEncoded + ")*";
var rg_path = rg_segment + "(\\/" + rg_segment + ")*";
var rg_query = "\\?" + "([" + rg_pchar + "/?]|" + rg_pctEncoded + ")*";
var rg_fragment = "\\#" + "([" + rg_pchar + "/?]|" + rg_pctEncoded + ")*";
var rgHttpUrl = new RegExp(
"^"
+ rg_protocol
+ "(" + rg_userinfo + ")?"
+ rg_hostport
+ "(\\/"
+ "(" + rg_path + ")?"
+ "(" + rg_query + ")?"
+ "(" + rg_fragment + ")?"
+ ")?"
+ "$"
);
// export public function
return function (url) {
if (rgHttpUrl.test(url)) {
return true;
} else {
return false;
}
};
})();
RegExp in one line:
var rg = /^(http|https):\/\/(([a-zA-Z0-9$\-_.+!*'(),;:&=]|%[0-9a-fA-F]{2})+#)?(((25[0-5]|2[0-4][0-9]|[0-1][0-9][0-9]|[1-9][0-9]|[0-9])(\.(25[0-5]|2[0-4][0-9]|[0-1][0-9][0-9]|[1-9][0-9]|[0-9])){3})|localhost|([a-zA-Z0-9\-\u00C0-\u017F]+\.)+([a-zA-Z]{2,}))(:[0-9]+)?(\/(([a-zA-Z0-9$\-_.+!*'(),;:#&=]|%[0-9a-fA-F]{2})*(\/([a-zA-Z0-9$\-_.+!*'(),;:#&=]|%[0-9a-fA-F]{2})*)*)?(\?([a-zA-Z0-9$\-_.+!*'(),;:#&=\/?]|%[0-9a-fA-F]{2})*)?(\#([a-zA-Z0-9$\-_.+!*'(),;:#&=\/?]|%[0-9a-fA-F]{2})*)?)?$/;
In a similar situation I got away with this:
someUtils.validateURL = function(url) {
var parser = document.createElement('a');
try {
parser.href = url;
return !!parser.hostname;
} catch (e) {
return false;
}
};
i.e. why invent the wheel if browsers can do it for you? But, of course, this will only work in the browser.
there are various parts of parsed URL exactly how browser would interpret it:
parser.protocol; // => "http:"
parser.hostname; // => "example.com"
parser.port; // => "8080"
parser.pathname; // => "/path/"
parser.search; // => "?search=test"
parser.hash; // => "#hash"
parser.host; // => "example.com:3000"
Using these you can improve your validating function depending on the requirements. The only drawback is that it will accept relative URLs and use current page server's host and port. But you can use it for your advantage, by re-assembling the URL from parts and always passing it in full to your AJAX service.
What validateURL won't accept is invalid URL, e.g. http:\:8883 will return false, but :1234 is valid and is interpreted as http://pagehost.example.com/:1234 i.e. as a relative path.
UPDATE
This approach is no longer working with Chrome and other WebKit browsers. Even when URL is invalid, hostname is filled with some value, e.g. taken from base. It still helps to parse parts of URL, but will not allow to validate one.
Possible better no-own-parser approach is to use var parsedURL = new URL(url) and catch exceptions. See e.g. URL API. Supported by all major browsers and NodeJS, although still marked experimental.
best regex I found from http://angularjs.org/
var urlregex = /^(ftp|http|https):\/\/(\w+:{0,1}\w*#)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%#!\-\/]))?$/;
This is what worked for me:
function validateURL(value) {
return /^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i.test(value);
}
from there is is just a matter of calling the function to get a true or false back:
validateURL(urltovalidate);
I know it's quite an old question but since it does not have any accepted answer, I suggest you to use the URI.js framework: https://github.com/medialize/URI.js
You can use it to check for malformed URI using a try/catch block:
function isValidURL(url)
{
try {
(new URI(url));
return true;
}
catch (e) {
// Malformed URI
return false;
}
}
Of course it will consider something like "%#" as a well formed relative URI... So I suggest you read the URI.js API to perform more checks, for example if you want to make sure that the user entered a well formed absolute URL you may do like this:
function isValidURL(url)
{
try {
var uri = new URI(url);
// URI has a scheme and a host
return (!!uri.scheme() && !!uri.host());
}
catch (e) {
// Malformed URI
return false;
}
}
Import in an npm package like
https://www.npmjs.com/package/valid-url
and use it to validate your url.
You can use the URL API that is recently standard. Browser support is sketchy at best, see the link. new URL(str) is guaranteed to throw TypeError for invalid URLs.
As stated above, http://wwww is a valid URL.
The URL API can be used to validate the structure of a URL string.
An error is thrown when trying to serialise an invalid URL string into a URL object. This could be abstracted into a helper function (Typescript snippet below):
function isValidURL(URL: string) : boolean {
try {
new URL(string);
return true;
} catch (err) { return false; }
}
isValidURL('https://www.google.com'); // returns true
isValidURL('localhost:3000'); // returns true
isValidURL('not-a-valid-url'); // returns false
isValidURL('google.com'); // returns false (see footnote)
If you strictly want HTTP / web links to be valid, we can simply add a condition to the return statement:
...
const url = new URL(string);
return url.protocol === 'https:' || url.protocol === 'http:';
...
Granted, this approach comes with a few caveats:
No support for the URL API in Internet Explorer (could be fixed with a polyfill)
Without additional checks, URLs without either a protocol or port are seen as invalid (e.g. google.com is invalid but google.com:3000 is OK). This may be an unintended behaviour for some usecases.
If you're looking for a more reliable regex, check out RegexLib. Here's the page you'd probably be interested in:
http://regexlib.com/Search.aspx?k=url
As for the error messages showing while the person is still typing, change the event from keydown to blur and then it will only check once the person moves to the next element.
var RegExp = (/^HTTP|HTTP|http(s)?:\/\/(www\.)?[A-Za-z0-9]+([\-\.]{1}[A-Za-z0-9]+)*\.[A-Za-z]{2,40}(:[0-9]{1,40})?(\/.*)?$/);
My solution:
function isValidUrl(t)
{
return t.match(/^(http|https|ftp):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+)(:(\d+))?\/?/i)
}
Demo : http://jsbin.com/uzimeb/1/edit
function checkURL(value) {
var urlregex = new RegExp("^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*#)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$");
if (urlregex.test(value)) {
return (true);
}
return (false);
}
I have found a great resource for comparing different solutions:
https://mathiasbynens.be/demo/url-regex
According to that page, only solution from diegoperini passes all tests. Here is that regex:
_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?#)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS
I checked a lot of url validators in google and no one works for me. For example I'd like to see valid on links like 'aa.com'. I like silly check for dot sign in string.
function isValidUri(str) {
var dotIndex = str.indexOf('.');
return (dotIndex > 0 && dotIndex < str.length - 2);
}
It should not stay on beginning and end of string (for now we don't have top level domain names with one character).
Here's a regular expression which might fit the bill (it's very long):
/^(?:\u0066\u0069\u006C\u0065\u003A\u002F{2}(?:\u002F{2}(?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*\u0040)?(?:\u005B(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){6}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){5}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){4}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A)?[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){3}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,2}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,3}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,4}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,5}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,6}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2})\u005D|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])|(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039](?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D]+)?[\u0041-\u005A\u0061-\u007A\u0030-\u0039])?|(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039](?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D]+)?[\u0041-\u005A\u0061-\u007A\u0030-\u0039])?\u002E)+[\u0041-\u005A\u0061-\u007A\u0030-\u0039](?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D]+)?[\u0041-\u005A\u0061-\u007A\u0030-\u0039])?))(?:\u003A(?:\u0030-\u0035\u0030-\u0039{0,4}|\u0036\u0030-\u0034\u0030-\u0039{3}|\u0036\u0035\u0030-\u0034\u0030-\u0039{2}|\u0036\u0035\u0035\u0030-\u0032\u0030-\u0039|\u0036\u0035\u0035\u0033\u0030-\u0035))?(?:\u002F(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)*|\u002F(?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])+(?:\u002F(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)*)?|(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])+(?:\u002F(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)*)|[\u0041-\u005A\u0061-\u007A][\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002B\u002D\u002E]*\u003A(?:\u002F{2}(?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*\u0040)?(?:\u005B(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){6}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){5}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){4}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A)?[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){3}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,2}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,3}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,4}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035]))|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,5}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}|(?:(?:[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4}\u003A){0,6}[\u0030-\u0039\u0041-\u0046\u0061-\u0066]{1,4})?\u003A{2})\u005D|(?:(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])\u002E){3}(?:[\u0030-\u0039]|[\u0031-\u0039][\u0030-\u0039]|\u0031[\u0030-\u0039]{2}|\u0032[\u0030-\u0034][\u0030-\u0039]|\u0032\u0035[\u0030-\u0035])|(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039](?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D]+)?[\u0041-\u005A\u0061-\u007A\u0030-\u0039])?|(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039](?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D]+)?[\u0041-\u005A\u0061-\u007A\u0030-\u0039])?\u002E)+[\u0041-\u005A\u0061-\u007A\u0030-\u0039](?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D]+)?[\u0041-\u005A\u0061-\u007A\u0030-\u0039])?))(?:\u003A(?:\u0030-\u0035\u0030-\u0039{0,4}|\u0036\u0030-\u0034\u0030-\u0039{3}|\u0036\u0035\u0030-\u0034\u0030-\u0039{2}|\u0036\u0035\u0035\u0030-\u0032\u0030-\u0039|\u0036\u0035\u0035\u0033\u0030-\u0035))?(?:\u002F(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)*|\u002F(?:(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])+(?:\u002F(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)*)?|(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])+(?:\u002F(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)*)(?:\u003F(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040\u002F\u003F]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)?(?:\u0023(?:[\u0041-\u005A\u0061-\u007A\u0030-\u0039\u002D\u002E\u005F\u007E\u0021\u0024\u0026\u0027\u0028\u0029\u002A\u002B\u002C\u003B\u003D\u003A\u0040\u002F\u003F]|\u0025[\u0030-\u0039\u0041-\u0046\u0061-\u0066][\u0030-\u0039\u0041-\u0046\u0061-\u0066])*)?)$/
There are some caveats to its usage, namely it does not validate URIs which contain additional information after the user name (e.g. "username:password"). Also, only IPv6 addresses can be contained within the IP literal syntax and the "IPvFuture" syntax is currently ignored and will not validate against this regular expression. Port numbers are also constrained to be between 0 and 65,535. Also, only the file scheme can use triple slashes (e.g. "file:///etc/sysconfig") and can ignore both the query and fragment parts of a URI. Finally, it is geared towards regular URIs and not IRIs, hence the extensive focus on the ASCII character set.
This regular expression could be expanded upon, but it's already complex and long enough as it is. I also cannot guarantee it's going to be "100% accurate" or "bug free", but it should correctly validate URIs for all schemes.
You will need to do additional verification for any scheme-specific requirements or do URI normalization as this regular expression will validate a very broad range of URIs.
Try edit your isValidURL function as follows:
function isValidURL(url) {
var encodedURL = encodeURIComponent(url);
var isValid = false;
$.ajax({
url: "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22" + encodedURL + "%22&format=json",
type: "get",
async: false,
dataType: "json",
success: function(data) {
isValid = data.query.results != null;
},
error: function(){
isValid = false;
}
});
return isValid;
}
This should do the trick.

Categories

Resources