I have an array of strings, and I want to create a new array which contains the same strings without port numbers (the port number is a ":" followed by a number). For example if the string is "http://www.example.com:8080/hello/" Then it should be replaced with "http://www.example.com/hello/". How do I do it in JavaScript? I need it to call safari.extension.addContentScriptFromURL because the whitelist can't contain port numbers. If possible, it's better to replace the port number only between the second and third slash and leave the rest of the string unchanged.
You don't need any library or regex. Use the built-in URL API:
https://developer.mozilla.org/en-US/docs/Web/API/URL
var url = new URL('http://localhost:8080');
url.port = '';
console.log(url.toString());
Regards
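Applied to the array from the question, this could look like the sketch below. The helper name stripPorts is made up for illustration, and it assumes every entry is an absolute URL, since new URL() throws on anything it cannot parse.

```javascript
// Hypothetical helper (not part of the answer above): returns a new array
// with the port removed from every URL.
// Assumption: each entry is an absolute URL; new URL() throws otherwise.
function stripPorts(urls) {
  return urls.map(function (u) {
    var parsed = new URL(u);
    parsed.port = '';          // clearing the port leaves path, query and hash alone
    return parsed.toString();
  });
}

console.log(stripPorts(['http://www.example.com:8080/hello/']));
```

Because the URL is parsed first, a colon followed by digits in the path or query is not affected.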
One quite nifty way to do this is to create an a element and assign your URL as its href. The HTMLAnchorElement interface implements URLUtils and therefore supports accessing the individual parts of the address in the same way the location object does, and you can set them individually as well:
var foo = document.createElement("a");
foo.href = "http://www.example.com:8080/hello/";
foo.port = ""
var newURL = foo.href;
console.log(newURL); // output: http://www.example.com/hello/
http://jsfiddle.net/pdymeb5d/
This should probably do what you want:
var newUrls = urls.map(function (url) {
return url.replace(/([a-zA-Z+.\-]+):\/\/([^\/]+):([0-9]+)\//, "$1://$2/");
});
Edit: It seems the scheme part of a URI can also contain "+", "." and "-". I changed the regular expression accordingly.
See: https://en.wikipedia.org/wiki/URI_scheme
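To see what the expression above does and does not touch, here is a small sketch. The helper name removePort and the sample URLs are only illustrative.

```javascript
// The regular expression from the answer above, wrapped in a hypothetical helper.
var PORT_AFTER_HOST = /([a-zA-Z+.\-]+):\/\/([^\/]+):([0-9]+)\//;

function removePort(url) {
  return url.replace(PORT_AFTER_HOST, "$1://$2/");
}

var urls = [
  "http://www.example.com:8080/hello/",  // port directly after the host
  "http://www.example.com/hello/",       // no port at all
  "http://www.example.com/a:80/b/"       // ":80" appears only in the path
];
console.log(urls.map(removePort));
```

Only the first entry changes. Note that the pattern requires a slash after the port, so a URL such as "http://www.example.com:8080" with no path would be left as it is.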
You can use a regex with string.replace() as follows:
var text = "http://www.example.com:8080/hello/";
// Remove only the ":<digits>" that directly follows the host, so digits elsewhere in the URL are kept
var outputString = text.replace(/^([^\/]*\/\/[^\/:]+):\d+/, '$1');
alert(outputString);
function parseURL(url) {
var parser = document.createElement('a'),
searchObject = {},
queries, split, i;
parser.href = url;
queries = parser.search.replace(/^\?/, '').split('&');
for( i = 0; i < queries.length; i++ ) {
split = queries[i].split('=');
searchObject[split[0]] = split[1];
}
return {
protocol: parser.protocol,
host: parser.host,
hostname: parser.hostname,
port: parser.port,
pathname: parser.pathname,
search: parser.search,
searchObject: searchObject,
hash: parser.hash
};
}
Use this to parse any URL and arrange in a format you prefer.
OK, I used your answer and changed it a little, because the protocol may contain dashes too:
var newUrls = urls.map(function(url) {
return url.replace(/([^\/\:]+):\/\/([^\/]+):([0-9]+)\//, "$1://$2/");
})
I found the best solution here.
var url = 'http://localhost:7001/www.facebook.com';
// Create a regex to match protocol, domain, and host
var matchProtocolDomainHost = /^.*\/\/[^\/]+:?[0-9]?\//i;
// Replace protocol, domain and host from url, assign to myNewUrl
var myNewUrl = url.replace(matchProtocolDomainHost, '');
Now myNewUrl === 'www.facebook.com'.
It is better to read the full page: "remove hostname and port from url using regular expression".
var string = "https://example.com/app/something";
var string = "example.com/app/something";
new URL(string).origin
If the string has a protocol, everything is OK; if not, I get the error Failed to construct 'URL': Invalid URL(…)
How can I obtain the root domain without using regex?
The question is still a bit unclear, and I'm not entirely sure how you're getting that string, but just for the sake of argument, here's a quick solution:
function getHostname(str)
{
str = (/^\w+:\/\//.test(str) ? "" : "http://") + str
return new URL(str).hostname;
}
console.log(getHostname("https://example.com/app/something"));
console.log(getHostname("example.com/app/something"));
Yes, technically this does use a regular expression to check whether the protocol is present, but it uses the URL class to actually parse the host name.
Regex example:
var example1 = "www.example1.com/test/path";
var example2 = "https://example2.com/test/path";
var example3 = "http://subdomain.example3.com/test/path";
function getDomain(str) {
var matches = str.match(/^(?:https?:\/\/)?((?:[-A-Za-z0-9]+\.)+[A-Za-z]{2,6})/);
if (!matches || matches.length < 2) return '';
return matches[1];
}
console.log(getDomain(example1));
console.log(getDomain(example2));
console.log(getDomain(example3));
References:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match
http://regexr.com/
If I understand your question correctly, you want to check if the URL contains either the http or https protocol. This can easily be done with string functions built into JavaScript as shown below.
var string = window.location.href; // href is a string; the Location object itself has no includes()
if (string.includes('http') || string.includes('https'))
{
//Do your logic here
}
UPDATE: Alternatively, you could use substring functionality shown below.
var string = window.location.href; // href is a string; the Location object itself has no indexOf()
if (string.indexOf('http') == 0)
{
//Do your logic here
}
Note that this will also verify that the http is at the beginning of the string and not just thrown in willy nilly.
I have a whitelist where users can enter specific URLs/URL patterns (only targeting http and https).
I would like to transform and compare these URLs/URL patterns so that the wildcard selector (*) can be used like so:
user enters: example.*/test
I want to transform this to: *//*.example.*/test
so that it matches: http://www.example.com/test, https://example.co.uk/test
Another example:
user enters: http://www.*.com/*
I want to transform this to: http://www.*.com/*
so that it matches: http://www.blah.com/test, http://www.other.com/null.html
and
user enters: www.example.com/*
I want to transform this to: *//www.example.com/*
so that it matches: http://www.example.com/testtwo, https://www.example.com/arfg
The reason I want to insert a leading protocol (if it wasn't included by the user) is because I am using this to compare against the current tab URL.
I get this array of URL strings and would like to compare them with the current url, but am having trouble matching all use cases:
"isNotWhitelisted" : function(){
var whitelist = MyObject.userLists.whitelist;
var currentUrl = document.location.href;
for(var i=0; i<whitelist.length; i++){
var regexListItem = new RegExp(whitelist[i].toString().replace(".", "\\.").replace("*", ".+"));
if(currentUrl.match(regexListItem)) {
return false;
}
}
return true;
},
Firstly, the regex conversion matches end cases (e.g. example.com/*) but not patterns like example.*/about.
This is part of a Chrome extension; is there not a better/easier way to do this, maybe using built-in methods?
Thanks for any help in advance.
whitelist.forEach(function(listItem){
var rgx = new RegExp(listItem.replace(/\./g,'\\.').replace(/\*/g,'.*'));
if(rgx.test(url)) {
// current URL matches URL/URL pattern in whitelist array!
}
})
If you don't do the replacements, the pattern 'www.*.com' also matches 'wwwocom'.
If you want to handle other special characters, you can use this:
var rgx = new RegExp(listItem.replace(/(\.|\[|\]|\{|\}|\(|\)|\+|\?|\\|\$|\^)/g,'\\$1').replace(/\*/g,'.*'));
If you want greedy matching, I think you need to require the user to enter the pattern in this format: *://*/*
You can check this in this way:
var special_char_rgx = /(\.|\[|\]|\{|\}|\(|\)|\+|\?|\\|\/|\$|\^|\|)/g; // I think that all...
var asterisk_rgx = /\*/g;
var pattern_rgx = /^([^:\/]+):\/\/([^\/]+)\/(.*)$/g;
function addPatern(pattern, whitelist) {
var str_pattern = pattern.replace(asterisk_rgx,'\\*');
var isMatch = pattern_rgx.test(str_pattern);
if (isMatch) {
pattern = pattern.replace(special_char_rgx,'\\$1').replace(asterisk_rgx, '.+');
whitelist.push(new RegExp('^'+pattern + '$'));
}
pattern_rgx.lastIndex = 0; // Reset: a global RegExp remembers lastIndex between test() calls, which would break later tests
return isMatch;
}
If you want to handle the protocol, domain and path in different ways, you can do it like this:
if (isMatch) {
var protocol = RegExp.$1;
var domain= RegExp.$2;
var path_query = RegExp.$3;
// Your logic...
}
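Putting the escaping and the wildcard replacement together with the transformation the question asks for, one possible sketch is below. The function name patternToRegExp is hypothetical, and the rule that a pattern without a scheme should match http and https plus any subdomain is an assumption read from the question's examples, not an established convention.

```javascript
// Hypothetical helper: turns a user-entered whitelist pattern into an anchored RegExp.
function patternToRegExp(pattern) {
  // Assumption: a pattern that has no "scheme://" prefix should match
  // both http and https, with or without a leading subdomain.
  var hasScheme = /^[a-z*]+:\/\//i.test(pattern);
  // Escape every regex metacharacter except "*".
  var escaped = pattern.replace(/[.+?^${}()|[\]\\\/]/g, '\\$&');
  // "*" becomes "match anything".
  var body = escaped.replace(/\*/g, '.*');
  var prefix = hasScheme ? '' : 'https?:\\/\\/(?:[^\\/]*\\.)?';
  return new RegExp('^' + prefix + body + '$');
}

console.log(patternToRegExp('example.*/test').test('http://www.example.com/test'));
console.log(patternToRegExp('http://www.*.com/*').test('http://www.blah.com/test'));
```

Anchoring with ^ and $ keeps a whitelisted host from matching when it only appears somewhere inside another URL's path or query.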
Hm, maybe create a RegExp from each whitelist item? That is, if this works as you expect:
new RegExp('example.com/*').test('http://example.com/aaaa')
Just create a regexp from each item in the whitelist:
whitelist.forEach(function(item) {
new RegExp(item).test(url); // RegExp has test(), not match(); url is the string to check
});
I'm trying to come up with a regexp to get the page URL from the full URL but exclude a possible port number from it. So far I came up with the following JS:
var res = url.match(/^.*\:\/\/(?:www2?.)?([^?#]+)/i);
if(res)
{
var pageURL = res[1];
console.log(pageURL);
}
If I call it for this:
var url = "http://www.example.com/php/page.php?what=sw#print";
I get the correct answer: example.com/php/page.php
But if I do:
var url = "http://www.example.com:80/php/page.php?what=sw#print";
I need it to return example.com/php/page.php instead of example.com:80/php/page.php.
I can remove it with the second regexp, but I was curious if I could do it with just one (for speed)?
You can modify your regex to this:
/^.*\:\/\/(?:www2?.)?([^/:]+)(?:[^:]*:\d+)?([^?#]+)/i
It will return 2 matches:
1: example.com
2: /php/page.php
as match[1] and match[2] respectively, which you can concatenate, for both inputs:
http://www.example.com/php/page.php?what=sw#print
OR
http://www.example.com:80/php/page.php?what=sw#print
Update: Here are performance results on jsperf.com that show the regex method is the fastest of all.
Keep it simple:
~ node
> "http://www.example.com:3000/php/page.php?what=sw#print".replace(/:\d+/, '');
'http://www.example.com/php/page.php?what=sw#print'
> "http://www.example.com/php/page.php?what=sw#print".replace(/:\d+/, '');
'http://www.example.com/php/page.php?what=sw#print'
Why would you use a regex at all?
EDIT:
As pointed out by @c00000fd: because document might not be available, and document.createElement is very slow compared to RegExp. See:
http://jsperf.com/url-parsing/5
http://jsperf.com/hostname-from-url
Nevertheless I will leave my original answer for reference.
ORIGINAL ANSWER:
Instead you could just use the Anchor element:
Fiddle:
http://jsfiddle.net/12qjqx7n/
JS:
var url = 'http://foo:bar@www.example.com:8080/php/page.php?what=sw#print';
var a = document.createElement('a');
a.href = url;
console.log(a.hash);
console.log(a.host);
console.log(a.hostname);
console.log(a.origin);
console.log(a.password);
console.log(a.pathname);
console.log(a.port);
console.log(a.protocol);
console.log(a.search);
console.log(a.username);
Additional information:
http://www.w3schools.com/jsref/dom_obj_anchor.asp
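Where no document is available, the URL class exposes the same properties as the anchor element. The sketch below applies it to the question's goal; the function name pagePath is made up, and it assumes the input is an absolute URL.

```javascript
// Hypothetical helper: host (without "www." / "www2." and without port) plus path.
// Assumption: the input is an absolute URL; new URL() throws otherwise.
function pagePath(url) {
  var u = new URL(url);
  // hostname never includes the port, so no extra port handling is needed.
  return u.hostname.replace(/^www2?\./, '') + u.pathname;
}

console.log(pagePath("http://www.example.com:80/php/page.php?what=sw#print"));
console.log(pagePath("http://www.example.com/php/page.php?what=sw#print"));
```

Both calls give example.com/php/page.php. Whether this is fast enough compared to a single regex would have to be measured for the use case at hand.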
How about a group for matching the port, if present?
var url = "http://www.example.com:80/php/page.php?what=sw#print";
var res = url.match(/^.*\:\/\/(?:www2?.)?([^?#\/:]+)(\:\d+)?(\/[^?#]+)/i);
if(res)
{
var pageURL = res[1]+res[3];
console.log(res, pageURL);
}
Try
var url = "http://www.example.com:80/php/page.php?what=sw#print";
var res = url.split(/\w+:\/\/+\w+\.|:+\d+|\?.*/).join("");
document.body.innerText = res;
You could use the replace method to modify your original string or URL:
> var url = "http://www.example.com/php/page.php?what=sw#print";
undefined
> var url1 = "http://www.example.com:80/php/page.php?what=sw#print";
undefined
> url.replace(/^.*?:\/\/(?:www2?.)?([^/:]+)(?::\d+)?([^?#]+).*$/g, "$1$2")
'example.com/php/page.php'
> url1.replace(/^.*?:\/\/(?:www2?.)?([^/:]+)(?::\d+)?([^?#]+).*$/g, "$1$2")
'example.com/php/page.php'
I'm making a small web app in which a user enters a server URL from which it pulls a load of data with an AJAX request.
Since the user has to enter the URL manually, people generally forget the trailing slash, even though it's required (as some data is appended to the url entered). I need a way to check if the slash is present, and if not, add it.
This seems like a problem that jQuery would have a one-liner for, does anyone know how to do this or should I write a JS function for it?
var lastChar = url.substr(-1); // Selects the last character
if (lastChar != '/') { // If the last character is not a slash
url = url + '/'; // Append a slash to it.
}
The temporary variable can be omitted and the expression embedded directly in the condition:
if (url.substr(-1) != '/') url += '/';
Since the goal is changing the url with a one-liner, the following solution can also be used:
url = url.replace(/\/?$/, '/');
If the trailing slash exists, it is replaced with /.
If the trailing slash does not exist, a / is appended to the end (to be exact: The trailing anchor is replaced with /).
url += url.endsWith("/") ? "" : "/"
I added to the regex solution to accommodate query strings:
http://jsfiddle.net/hRheW/8/
url.replace(/\/?(\?|#|$)/, '/$1')
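Here is a quick sketch of how that expression behaves on a few inputs. The wrapper name addSlashBeforeQuery and the sample URLs are only for illustration.

```javascript
// Hypothetical wrapper around the expression above.
// The regex finds the first "?", "#" or end of string (with an optional "/"
// right before it) and makes sure exactly one "/" precedes it.
function addSlashBeforeQuery(url) {
  return url.replace(/\/?(\?|#|$)/, '/$1');
}

[
  'http://example.com/slug',
  'http://example.com/slug/',
  'http://example.com/slug?page=2',
  'http://example.com/slug/?page=2',
  'http://example.com/slug#top'
].forEach(function (u) {
  console.log(u + ' -> ' + addSlashBeforeQuery(u));
});
```

URLs that already have the slash come back unchanged, and the query string or fragment is kept after the inserted slash.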
This works as well:
url = url.replace(/\/$|$/, '/');
Example:
let urlWithoutSlash = 'https://www.example.com/path';
urlWithoutSlash = urlWithoutSlash.replace(/\/$|$/, '/');
console.log(urlWithoutSlash);
let urlWithSlash = 'https://www.example.com/path/';
urlWithSlash = urlWithSlash.replace(/\/$|$/, '/');
console.log(urlWithSlash);
Output:
https://www.example.com/path/
https://www.example.com/path/
It replaces either the trailing slash or no trailing slash with a trailing slash. So if the slash is present, it replaces it with one (essentially leaving it there); if one is not present, it adds the trailing slash.
You can do something like:
var url = 'http://stackoverflow.com';
if (!url.match(/\/$/)) {
url += '/';
}
Here's the proof: http://jsfiddle.net/matthewbj/FyLnH/
The URL class is pretty awesome: it helps us change the path and takes care of query parameters and fragment identifiers.
function addTrailingSlash(u) {
const url = new URL(u);
url.pathname += url.pathname.endsWith("/") ? "" : "/";
return url.toString();
}
addTrailingSlash('http://example.com/slug?page=2');
// result: "http://example.com/slug/?page=2"
You can read more about URL on MDN.
Before finding this question and its answers I created my own approach. I'm posting it here as I don't see anything similar.
function addSlashToUrl() {
// If there is no trailing slash after the path in the URL, add it
if (window.location.pathname.endsWith('/') === false) {
var url = window.location.protocol + '//' +
window.location.host +
window.location.pathname + '/' +
window.location.search;
window.history.replaceState(null, document.title, url);
}
}
Not every URL can be completed with a slash at the end. There are at least several conditions that do not allow one:
The string after the last existing slash is something like index.html.
There are parameters: /page?foo=1&bar=2.
There is a link to a fragment: /page#tomato.
I have written a function that adds a slash if none of the above cases apply. There are also two additional functions: one checks whether a slash can be added, and one breaks a URL into parts. The last one is not mine; I've given a link to the original.
const SLASH = '/';
function appendSlashToUrlIfIsPossible(url) {
var resultingUrl = url;
var slashAppendingPossible = slashAppendingIsPossible(url);
if (slashAppendingPossible) {
resultingUrl += SLASH;
}
return resultingUrl;
}
function slashAppendingIsPossible(url) {
// Slash is possible to add to the end of url in following cases:
// - There is no slash standing as last symbol of URL.
// - There is no file extension (or there is no dot inside part called file name).
// - There are no parameters (even empty ones — single ? at the end of URL).
// - There is no link to a fragment (even empty one — single # mark at the end of URL).
var slashAppendingPossible = false;
var parsedUrl = parseUrl(url);
// Checking for slash absence.
var path = parsedUrl.path;
var lastCharacterInPath = path.substr(-1);
var noSlashInPathEnd = lastCharacterInPath !== SLASH;
// Check for extension absence.
const FILE_EXTENSION_REGEXP = /\.[^.]*$/;
var noFileExtension = !FILE_EXTENSION_REGEXP.test(parsedUrl.file);
// Check for parameters absence.
var noParameters = parsedUrl.query.length === 0;
// Check for link to fragment absence.
var noLinkToFragment = parsedUrl.hash.length === 0;
// All checks above cannot guarantee that there is no '?' or '#' symbol at the end of URL.
// It is required to be checked manually.
var NO_SLASH_HASH_OR_QUESTION_MARK_AT_STRING_END_REGEXP = /[^\/#?]$/;
var noStopCharactersAtTheEndOfRelativePath = NO_SLASH_HASH_OR_QUESTION_MARK_AT_STRING_END_REGEXP.test(parsedUrl.relative);
slashAppendingPossible = noSlashInPathEnd && noFileExtension && noParameters && noLinkToFragment && noStopCharactersAtTheEndOfRelativePath;
return slashAppendingPossible;
}
// parseUrl function is based on following one:
// http://james.padolsey.com/javascript/parsing-urls-with-the-dom/.
function parseUrl(url) {
var a = document.createElement('a');
a.href = url;
const DEFAULT_STRING = '';
var getParametersAndValues = function (a) {
var parametersAndValues = {};
const QUESTION_MARK_IN_STRING_START_REGEXP = /^\?/;
const PARAMETERS_DELIMITER = '&';
const PARAMETER_VALUE_DELIMITER = '=';
var parametersAndValuesStrings = a.search.replace(QUESTION_MARK_IN_STRING_START_REGEXP, DEFAULT_STRING).split(PARAMETERS_DELIMITER);
var parametersAmount = parametersAndValuesStrings.length;
for (let index = 0; index < parametersAmount; index++) {
if (!parametersAndValuesStrings[index]) {
continue;
}
let parameterAndValue = parametersAndValuesStrings[index].split(PARAMETER_VALUE_DELIMITER);
let parameter = parameterAndValue[0];
let value = parameterAndValue[1];
parametersAndValues[parameter] = value;
}
return parametersAndValues;
};
const PROTOCOL_DELIMITER = ':';
const SYMBOLS_AFTER_LAST_SLASH_AT_STRING_END_REGEXP = /\/([^\/?#]+)$/i;
// Stub for the case when regexp match method returns null.
const REGEXP_MATCH_STUB = [null, DEFAULT_STRING];
const URL_FRAGMENT_MARK = '#';
const NOT_SLASH_AT_STRING_START_REGEXP = /^([^\/])/;
// Replace methods uses '$1' to place first capturing group.
// In NOT_SLASH_AT_STRING_START_REGEXP regular expression that is the first
// symbol in case something else, but not '/' has taken first position.
const ORIGINAL_STRING_PREPENDED_BY_SLASH = '/$1';
const URL_RELATIVE_PART_REGEXP = /tps?:\/\/[^\/]+(.+)/;
const SLASH_AT_STRING_START_REGEXP = /^\//;
const PATH_SEGMENTS_DELIMITER = '/';
return {
source: url,
protocol: a.protocol.replace(PROTOCOL_DELIMITER, DEFAULT_STRING),
host: a.hostname,
port: a.port,
query: a.search,
parameters: getParametersAndValues(a),
file: (a.pathname.match(SYMBOLS_AFTER_LAST_SLASH_AT_STRING_END_REGEXP) || REGEXP_MATCH_STUB)[1],
hash: a.hash.replace(URL_FRAGMENT_MARK, DEFAULT_STRING),
path: a.pathname.replace(NOT_SLASH_AT_STRING_START_REGEXP, ORIGINAL_STRING_PREPENDED_BY_SLASH),
relative: (a.href.match(URL_RELATIVE_PART_REGEXP) || REGEXP_MATCH_STUB)[1],
segments: a.pathname.replace(SLASH_AT_STRING_START_REGEXP, DEFAULT_STRING).split(PATH_SEGMENTS_DELIMITER)
};
}
There may be other cases where adding a slash is not possible. If you know of any, please comment on my answer.
For those who use different inputs, like http://example.com or http://example.com/eee: it should not add a trailing slash in the second case.
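The same checks can be written more compactly with the URL class instead of an anchor element. This is only a sketch under the rules listed above; the name canAppendSlash is made up, and it assumes absolute URLs.

```javascript
// Hypothetical, shorter variant of the checks described above.
// Assumption: the input is an absolute URL; new URL() throws otherwise.
function canAppendSlash(url) {
  var u = new URL(url);
  var lastSegment = u.pathname.split('/').pop();
  return !u.pathname.endsWith('/') &&   // no slash at the end of the path yet
    lastSegment.indexOf('.') === -1 &&  // last segment does not look like a file name
    u.search === '' &&                  // no parameters
    u.hash === '' &&                    // no fragment
    !/[?#]$/.test(url);                 // no bare "?" or "#" at the very end
}

console.log(canAppendSlash('http://example.com/page'));
console.log(canAppendSlash('http://example.com/index.html'));
```

The last test on the raw string is needed because search and hash are empty strings for a URL that ends in a lone "?" or "#".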
There is the serialization option using .href which will add trailing slash only after the domain (host).
In Node.js, you would use the url module like this:
const url = require ('url');
let jojo = url.parse('http://google.com')
console.log(jojo);
In pure JS, you would use:
var url = document.createElement('a');
var myURL = "http://stackoverflow.com";
url.href = myURL;
console.log(url.href); // the serialized href has a slash after the host
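The .href serialization described above can also be seen with the URL class, which works in both Node.js and browsers. The function name serialize is just for illustration.

```javascript
// Hypothetical helper: returns the normalized (serialized) form of an absolute URL.
function serialize(u) {
  return new URL(u).href;
}

console.log(serialize('http://example.com'));      // http://example.com/
console.log(serialize('http://example.com/eee'));  // http://example.com/eee
```

A slash is added only when the path is empty, so the second input keeps its path as entered.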
I have the following strings
http://example.com
https://example.com
http://www.example.com
How do I get rid of the http:// or https://?
Try with this:
var url = "https://site.com";
var urlNoProtocol = url.replace(/^https?\:\/\//i, "");
You can use the URL object like this:
const urlWithoutProtocol = new URL(url).host;
You may use the URL() constructor. It will parse your URL string, and the result has entries without the protocol, so there is less headache with regexps:
let u = new URL('https://www.facebook.com/companypage/');
URL {
hash: ""
host: "www.facebook.com"
hostname: "www.facebook.com"
href: "https://www.facebook.com/companypage/"
origin: "https://www.facebook.com"
password: ""
pathname: "/companypage/"
port: ""
protocol: "https:"
search: ""
searchParams: URLSearchParams {}
username: ""
}
u.host // www.facebook.com
u.hostname // www.facebook.com
Although URL() drops the protocol, it leaves you with the www part. In my case I wanted to get rid of that subdomain part as well, so I had to use .replace() anyway.
u.host.replace(/^www\./, '') // www.facebook.com => facebook.com
var txt="https://site.com";
txt=/^http(s)?:\/\/(.+)$/i.exec(txt);
txt=txt[2];
To also parse links without http/https, use this:
var txt="https://site.com";
txt=/^(http(s)?:\/\/)?(.+)$/i.exec(txt);
txt=txt[3];
var str = "https://site.com";
str = str.substr( str.indexOf(':') + 3 );
Instead of .substr(), you could also use .slice() or .substring(). They'll all produce the same result in this situation.
str = str.slice( str.indexOf(':') + 3 );
str = str.substring( str.indexOf(':') + 3 );
EDIT: It appears as though the requirements of the question have changed in a comment under another answer.
If there possibly isn't a http:// in the string, then do this:
var str = "site.com";
var index = str.indexOf('://');
if( index > -1 )
str = str.substr( index + 3 );
This answer extends some of the answers above to handle http://, https://, and //, which is also common.
Thanks for answers above that led me to this!
const urls = [ "http://example.com", "https://example.com", "//example.com" ]
// the regex below states: replace `//`, or replace `//` together with whatever precedes it
const resolveHostNames = urls.map(url => url.replace(/\/\/|.+\/\//, ''))
console.log(resolveHostNames);
Strip the protocol from a URL:
var url = "https://site.com";
var urlNoProto = url.split('/').slice(2).join('/');
Works with any protocol, ftp, http, gopher, nntp, telnet, wais, file, prospero ... all those specified in RFC 1738 with the exception of those without a // in them (mailto, news).
Please note that in real web pages the inherited (protocol-relative) form // is common practice: https://paulirish.com/2010/the-protocol-relative-url.
So I suggest a regexp covering this case as well:
/^\/\/|^https?:\/\//
(you can optimize it)
Another efficient solution:
url.replace(/^(\w+:)?\/\//, '')
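As a rough check of that approach, the sketch below wraps an anchored pattern with an optional scheme in a function and runs it on the inputs from the question plus a protocol-relative URL. The function name stripProtocol is hypothetical.

```javascript
// Hypothetical wrapper: removes a leading "scheme://" or a bare leading "//".
// The pattern is anchored with ^, so a "//" later in the string is not touched.
function stripProtocol(url) {
  return url.replace(/^(\w+:)?\/\//, '');
}

[
  'http://example.com',
  'https://example.com',
  'http://www.example.com',
  '//example.com',
  'example.com/a//b'
].forEach(function (u) {
  console.log(stripProtocol(u));
});
```

The last input has no protocol, so it is returned unchanged even though it contains a double slash in the path.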
Assuming there are no double slashes other than the protocol, you could do:
var url = "https://example.com";
var noProtocol = url.split('//')[1];
You may use HTMLHyperlinkElementUtils of DOM:
function removeProtocol(url) {
const a = document.createElement('a');
a.href = url;
// `url` may be relative, but `a.href` will be absolute.
return a.href.replace(a.protocol + '//', '');
}
removeProtocol('https://example.com/https://foo');
// 'example.com/https://foo'
removeProtocol('wrong://bad_example/u');
// 'bad_example/u'
From HTMLHyperlinkElementUtils on MDN:
a.hostname, example.com
a.host, example.com:3000
a.pathname, /foo/bar.html
a.search, ?a=1&b=2
a.hash, #goo
a.username, a.password, a.port, etc.
Using regex might be overkill when there's a handy native URL interface that does the job for you in 2 lines:
let url = "https://stackoverflow.com/questions/3999764/taking-off-the-http-or-https-off-a-javascript-string";
let a = new URL(url);
let withoutProtocol = a.host+a.pathname;
console.log(`Without protocol: ${withoutProtocol}`);
console.log(`With protocol: ${url}`);
URL API Support in browsers
JavaScript's split function also gives a solution.
Awesome !!!
var url = "https://example.com";
url = url.split("://")[1]; // url.split("://")[0] gives the protocol ("https") instead
console.log(url);