Comparing user-specific URL list with current URL? - javascript

I have a whitelist where users can enter specific URLs/URL patterns (only targeting http and https).
I would like to transform and compare these URLs/URL patterns so that the wildcard selector (*) can be used like so:
user enters: example.*/test
I want to transform this to: *//*.example.*/test
so that it matches: http://www.example.com/test, https://example.co.uk/test
Another example:
user enters: http://www.*.com/*
I want to transform this to: http://www.*.com/*
so that it matches: http://www.blah.com/test, http://www.other.com/null.html
And a third example:
user enters: www.example.com/*
I want to transform this to: *//www.example.com/*
so that it matches: http://www.example.com/testtwo, https://www.example.com/arfg
The reason I want to insert a leading protocol (if the user didn't include one) is that I am comparing the patterns against the current tab URL.
I get this array of URL strings and would like to compare them with the current URL, but I am having trouble matching all use cases:
"isNotWhitelisted" : function(){
    var whitelist = MyObject.userLists.whitelist;
    var currentUrl = document.location.href;
    for(var i=0; i<whitelist.length; i++){
        var regexListItem = new RegExp(whitelist[i].toString().replace(".", "\\.").replace("*", ".+"));
        if(currentUrl.match(regexListItem)) {
            return false;
        }
    }
    return true;
},
Firstly, the regex conversion handles patterns with a trailing wildcard (e.g. example.com/*) but not patterns like example.*/about.
This is part of a Chrome extension. Is there a better or easier way to do this, perhaps using built-in methods?
Thanks in advance for any help.

whitelist.forEach(function(listItem){
    // escape every "." and turn every "*" into ".*" (the g flag replaces all occurrences, not just the first)
    var rgx = new RegExp(listItem.replace(/\./g,'\\.').replace(/\*/g,'.*'));
    if(rgx.test(document.location.href)) {
        // current URL matches URL/URL pattern in whitelist array!
    }
});
If you don't do these replacements, the raw pattern 'www.*.com' also matches 'wwwocom'.
If you want to escape the other special characters too, you can use this:
var rgx = new RegExp(listItem.replace(/(\.|\[|\]|\{|\}|\(|\)|\+|\?|\\|\$|\^|\|)/g,'\\$1').replace(/\*/g,'.*'));
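A minimal sketch that combines both ideas above for the whitelist in the question. The assumptions are mine, not the answerer's: every RegExp metacharacter except * is escaped, * becomes .*, the result is anchored with ^...$, and a pattern without a scheme gets a prefix so that it matches http or https with an optional subdomain in front. wildcardToRegExp and isWhitelisted are hypothetical helper names.

```javascript
// Escape every RegExp metacharacter except "*", which is the user's wildcard.
function escapeExceptWildcard(str) {
  return str.replace(/[.+?^${}()|[\]\\\/]/g, "\\$&");
}

// Hypothetical helper: convert a user pattern into an anchored RegExp.
// Assumption: a pattern without "http://", "https://" or "*://" should match
// both http and https, with or without a subdomain before the given host.
function wildcardToRegExp(pattern) {
  var body = escapeExceptWildcard(pattern).replace(/\*/g, ".*");
  var hasScheme = /^(https?|\*):\/\//i.test(pattern);
  var prefix = hasScheme ? "" : "https?:\\/\\/(?:[^\\/]*\\.)?";
  return new RegExp("^" + prefix + body + "$");
}

// Hypothetical helper: true if url matches any pattern in the whitelist.
function isWhitelisted(url, whitelist) {
  return whitelist.some(function (pattern) {
    return wildcardToRegExp(pattern).test(url);
  });
}

var whitelist = ["example.*/test", "http://www.*.com/*", "www.example.com/*"];
console.log(isWhitelisted("https://example.co.uk/test", whitelist));
console.log(isWhitelisted("http://www.other.com/null.html", whitelist));
console.log(isWhitelisted("http://wwwocom/", whitelist));
```

In the extension this would be called as isWhitelisted(document.location.href, MyObject.userLists.whitelist). Because .* may also cross "/", example.*/test would accept http://example.com/a/test as well; replacing * with [^/]* instead would keep a wildcard inside a single segment.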

If you want to match the whole URL, I think you need to ask the user to enter the pattern in this format: *://*/*
You can check and convert it this way:
var special_char_rgx = /(\.|\[|\]|\{|\}|\(|\)|\+|\?|\\|\/|\$|\^|\|)/g; // I think that's all of them...
var asterisk_rgx = /\*/g;
var pattern_rgx = /^([^:\/]+):\/\/([^\/]+)\/(.*)$/g;
function addPatern(pattern, whitelist) {
    var str_pattern = pattern.replace(asterisk_rgx,'\\*');
    var isMatch = pattern_rgx.test(str_pattern);
    if (isMatch) {
        pattern = pattern.replace(special_char_rgx,'\\$1').replace(asterisk_rgx, '.+');
        whitelist.push(new RegExp('^'+pattern + '$'));
    }
    pattern_rgx.lastIndex = 0; // Otherwise test() keeps lastIndex (because of the g flag) and breaks the next call!
    return isMatch;
}
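If you want to handle the protocol, domain and path differently, you can do it this way:
if (isMatch) {
    // RegExp.$1..$3 hold the groups of the most recent successful match,
    // so read them before running any other regular expression
    var protocol = RegExp.$1;
    var domain = RegExp.$2;
    var path_query = RegExp.$3;
    // Your logic...
}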
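RegExp.$1 is a deprecated legacy feature. A sketch of the same split that reads the groups from the array returned by exec() instead, using a copy of pattern_rgx without the g flag so that no lastIndex reset is needed. splitPattern and partsRgx are hypothetical names.

```javascript
// Same shape as pattern_rgx above, but without the "g" flag,
// so there is no lastIndex state to reset between calls.
var partsRgx = /^([^:\/]+):\/\/([^\/]+)\/(.*)$/;

// Hypothetical helper: split a "*://*/*"-style pattern into its parts,
// or return null if the pattern does not have that shape.
function splitPattern(pattern) {
  var m = partsRgx.exec(pattern);
  if (m === null) {
    return null;
  }
  return { protocol: m[1], domain: m[2], pathQuery: m[3] };
}

console.log(splitPattern("*://*.example.com/*"));
console.log(splitPattern("example.com"));
```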

Hmm, maybe just create a RegExp from the whitelist items? If this works as you expect:
new RegExp('example.com/*').test('http://example.com/aaaa')
then just create a RegExp from each item in the whitelist:
whitelist.forEach(function(item) {
    if (new RegExp(item).test(currentUrl)) {
        // currentUrl matches this whitelist item
    }
});

Related

How to get the domain from a string?

var string = "https://example.com/app/something";
// or, without a protocol:
var string = "example.com/app/something";
new URL(string).origin
If the string has a protocol, everything is OK; if not, I get the error Failed to construct 'URL': Invalid URL(…)
How can I obtain the root domain without using regex?
The question is still a bit unclear, and I'm not entirely sure how you're getting that string, but just for the sake of argument, here's a quick solution:
function getHostname(str)
{
str = (/^\w+:\/\//.test(str) ? "" : "http://") + str
return new URL(str).hostname;
}
console.log(getHostname("https://example.com/app/something"));
console.log(getHostname("example.com/app/something"));
Yes, technically this does use a regular expression to check whether the protocol is present, but it uses the URL class to actually parse the host name.
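Since the question asks for a way without regex, here is a sketch that replaces the protocol check with a plain indexOf("://"). The assumption is that any string containing "://" already has a scheme, so a scheme-less string whose query string happens to contain "://" would still fail to parse. getHostnameNoRegex is a hypothetical name.

```javascript
// Variant of getHostname without a regular expression.
// Assumption: any string that contains "://" already has a scheme.
function getHostnameNoRegex(str) {
  var withScheme = str.indexOf("://") === -1 ? "http://" + str : str;
  return new URL(withScheme).hostname;
}

console.log(getHostnameNoRegex("https://example.com/app/something")); // example.com
console.log(getHostnameNoRegex("example.com/app/something"));         // example.com
```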
Regex example:
var example1 = "www.example1.com/test/path";
var example2 = "https://example2.com/test/path";
var example3 = "http://subdomain.example3.com/test/path";
function getDomain(str) {
var matches = str.match(/^(?:https?:\/\/)?((?:[-A-Za-z0-9]+\.)+[A-Za-z]{2,6})/);
if (!matches || matches.length < 2) return '';
return matches[1];
}
console.log(getDomain(example1));
console.log(getDomain(example2));
console.log(getDomain(example3));
References:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match
http://regexr.com/
If I understand your question correctly, you want to check whether the URL contains either the http or https protocol. This can easily be done with the string functions built into JavaScript, as shown below.
var string = window.location.href; // window.location itself is an object, not a string
if (string.includes('http') || string.includes('https'))
{
    //Do your logic here
}
UPDATE: Alternatively, you could use indexOf as shown below.
var string = window.location.href;
if (string.indexOf('http') == 0)
{
    //Do your logic here
}
Note that this also verifies that http is at the beginning of the string and not just somewhere in the middle.
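Both string checks also accept something like "httpfoo://example.com". A sketch that lets the URL class parse the string and compares its protocol property instead; isHttpUrl is a hypothetical name, and in a page you would pass it window.location.href.

```javascript
// Hypothetical helper: true only when href is an absolute http or https URL.
// new URL() exposes the scheme with a trailing colon, e.g. "https:".
function isHttpUrl(href) {
  var protocol;
  try {
    protocol = new URL(href).protocol;
  } catch (e) {
    return false; // not an absolute URL at all
  }
  return protocol === "http:" || protocol === "https:";
}

console.log(isHttpUrl("https://example.com/app")); // true
console.log(isHttpUrl("httpfoo://example.com"));   // false
```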

Extract links in a string and return an array of objects

I receive a string from a server and this string contains text and links (mainly starting with http://, https:// and www., very rarely different but if they are different they don't matter).
Example:
"simple text simple text simple text domain.ext/subdir again text text text youbank.com/transfertomealltheirmoney/witharegex text text text and again text"
I need a JS function that does the following:
- finds all the links (no matter if there are duplicates);
- returns an array of objects, each representing a link, together with keys that return where the link starts in the text and where it ends, something like:
[{link:"http://www.dom.ext/dir",startsAt:25,endsAt:47},
{link:"https://www.dom2.ext/dir/subdir",startsAt:57,endsAt:88},
{link:"www.dom.ext/dir",startsAt:176,endsAt:192}]
Is this possible? How?
EDIT: @Touffy: I tried this, but I could only get the starting index of each link, not its length. Moreover, this does not detect www:
var str = "string with many links (SO does not let me post them)";
var regex = /(\b(https?|ftp|file|www):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
var result, indices = [];
while ( (result = regex.exec(str)) ) {
    indices.push({startsAt:result.index});
}
console.log(indices[0].link); console.log(indices[1].link);
One way to approach this is with regular expressions. Assuming the text is in a variable named input, you can do something like
var expression = /(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})/gi;
var matches = input.match(expression) || []; // match() returns null when there are no links
Then you can iterate through the matches to discover their starting and ending points with indexOf:
var results = [];
var searchFrom = 0;
for (var i = 0; i < matches.length; i++)
{
    var result = {};
    result['link'] = matches[i];
    // start searching after the previous match so duplicate links get their own position
    result['startsAt'] = input.indexOf(matches[i], searchFrom);
    result['endsAt'] = result['startsAt'] + matches[i].length;
    searchFrom = result['endsAt'];
    results.push(result);
}
Of course, you may have to tinker with the regular expression itself to suit your specific needs.
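A sketch that avoids indexOf altogether by using RegExp.prototype.exec() with the g flag: each result carries its own index, so duplicate links are reported at their real positions. endsAt is computed as index + length (exclusive), which is consistent with the first two entries in the question's example. extractLinks is a hypothetical name, and the pattern is the one from the answer above, so its limits carry over (for example, trailing punctuation after a link is included).

```javascript
// Hypothetical helper: find every link and record where it starts and ends.
function extractLinks(input) {
  // Pattern from the answer above; a fresh RegExp object per call, so lastIndex starts at 0.
  var expression = /(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})/gi;
  var links = [];
  var match;
  // With the g flag, exec() resumes from lastIndex and reports match.index.
  while ((match = expression.exec(input)) !== null) {
    links.push({
      link: match[0],
      startsAt: match.index,
      endsAt: match.index + match[0].length // exclusive end
    });
  }
  return links;
}

console.log(extractLinks("see http://www.dom.ext/dir and www.dom.ext/dir and http://www.dom.ext/dir"));
```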
const getLinksPool = (links) => {
//you can replace the https with any links like http or www
const linksplit = links.replace(/https:/g, " https:");
let linksarray = linksplit.split(" ");
let linkspools = linksarray.filter((array) => {
return array !== "";
});
return linkspools;
};

How do I replace the port number in JavaScript?

I have an array of strings, and I want to create a new array which contains the same strings without port numbers (the port number is a ":" followed by a number). For example, if the string is "http://www.example.com:8080/hello/", then it should be replaced with "http://www.example.com/hello/". How do I do it in JavaScript? I need it for calling safari.extension.addContentScriptFromURL, because the whitelist can't contain port numbers. If possible, it's better to replace the port number only between the second and third slashes and leave the rest of the string unchanged.
You don't need any library or regex:
https://developer.mozilla.org/en-US/docs/Web/API/URL
var url = new URL('http://localhost:8080');
url.port = '';
console.log(url.toString());
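For the array in the question, a sketch that applies the same URL approach to every entry with map(). It assumes every string is an absolute URL (new URL() throws otherwise) and that URL's normalisation is acceptable; for example, a bare origin like http://localhost:8080 comes back with a trailing slash. Because only the port property is touched, colons later in the path are left alone. stripPorts is a hypothetical name.

```javascript
// Hypothetical helper: drop the port from each URL string in an array.
function stripPorts(urls) {
  return urls.map(function (str) {
    var url = new URL(str);
    url.port = ""; // an empty string removes the port
    return url.toString();
  });
}

console.log(stripPorts(["http://www.example.com:8080/hello/", "https://example.org/a:1"]));
```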
Regards
One quite nifty way to do this is to create an <a> element and assign your URL to its href. Because the HTMLAnchorElement interface implements URLUtils, it supports accessing the individual parts of the address in the same way the location object does, and you can set them individually as well:
var foo = document.createElement("a");
foo.href = "http://www.example.com:8080/hello/";
foo.port = "";
var newURL = foo.href;
console.log(newURL); // output: http://www.example.com/hello/
http://jsfiddle.net/pdymeb5d/
This should probably do what you want:
var newUrls = urls.map(function (url) {
return url.replace(/([a-zA-Z+.\-]+):\/\/([^\/]+):([0-9]+)\//, "$1://$2/");
});
Edit: It seems the scheme part of URIs can also contain "+", "." and "-". I changed the regular expression accordingly.
See: https://en.wikipedia.org/wiki/URI_scheme
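You can use a regex with string.replace() as follows (note that the first replace removes every digit in the string, not only the port, so this only suits URLs with no other digits):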
var text = "http://www.example.com:8080/hello/";
var withNoDigits = text.replace(/[0-9]/g, '');
var outputString = withNoDigits.replace(/:([^:]*)$/,'$1');
alert(outputString);
function parseURL(url) {
var parser = document.createElement('a'),
searchObject = {},
queries, split, i;
parser.href = url;
queries = parser.search.replace(/^\?/, '').split('&'); // "?" must be escaped, /^?/ is a syntax error
for( i = 0; i < queries.length; i++ ) {
split = queries[i].split('=');
searchObject[split[0]] = split[1];
}
return {
protocol: parser.protocol,
host: parser.host,
hostname: parser.hostname,
port: parser.port,
pathname: parser.pathname,
search: parser.search,
searchObject: searchObject,
hash: parser.hash
};
}
Use this to parse any URL and arrange in a format you prefer.
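The <a> trick needs a DOM. Outside the page (for example in Node or a worker), a sketch of the same function built on the URL class, whose searchParams already strips the "?" and decodes each pair. Unlike an <a> element, new URL() does not resolve relative URLs against the page, so the input must be absolute. parseURLWithURL is a hypothetical name.

```javascript
// Hypothetical DOM-free variant of parseURL using the URL class.
function parseURLWithURL(str) {
  var parser = new URL(str);
  var searchObject = {};
  // URLSearchParams.forEach passes (value, key); a repeated key keeps the last value.
  parser.searchParams.forEach(function (value, key) {
    searchObject[key] = value;
  });
  return {
    protocol: parser.protocol,
    host: parser.host,
    hostname: parser.hostname,
    port: parser.port,
    pathname: parser.pathname,
    search: parser.search,
    searchObject: searchObject,
    hash: parser.hash
  };
}

console.log(parseURLWithURL("http://www.example.com:8080/hello/?a=1&b=two#top"));
```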
OK, I used your answer and changed it a little, because the protocol may contain dashes too:
var newUrls = urls.map(function(url) {
return url.replace(/([^\/\:]+):\/\/([^\/]+):([0-9]+)\//, "$1://$2/");
})
I found the best solution here.
var url = 'http://localhost:7001/www.facebook.com';
// Create a regex to match the protocol, domain, and port
var matchProtocolDomainHost = /^.*\/\/[^\/]+:?[0-9]?\//i;
// Remove the protocol, domain and port from url, and assign the result to myNewUrl
var myNewUrl = url.replace(matchProtocolDomainHost, '');
Now myNewUrl === 'www.facebook.com'.
It's better to read the full page:
remove hostname and port from url using regular expression
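Without a regex, the URL class can produce the same result for the example above: take the pathname, search and hash and drop the leading slash. A sketch assuming the input is an absolute URL; stripOrigin is a hypothetical name.

```javascript
// Hypothetical helper: everything after scheme, host and port, minus the leading "/".
function stripOrigin(str) {
  var url = new URL(str);
  return (url.pathname + url.search + url.hash).slice(1);
}

console.log(stripOrigin("http://localhost:7001/www.facebook.com")); // www.facebook.com
```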

Javascript to extract *.com

I am looking for a javascript function/regex to extract *.com from a URI... (to be done on client side)
It should work for the following cases:
siphone.com = siphone.com
qwr.siphone.com = siphone.com
www.qwr.siphone.com = siphone.com
qw.rock.siphone.com = siphone.com
<http://www.qwr.siphone.com> = siphone.com
Much appreciated!
Edit: Sorry, I missed a case:
http://www.qwr.siphone.com/default.htm = siphone.com
I guess this regex should work for a few cases:
/[\w]+\.(com|ca|org|net)/
I'm not good with JavaScript, but there should be a library for splitting URIs out there, right?
According to that link, here's a "strict" regex:
/^(?:([^:\/?#]+):)?(?:\/\/((?:(([^:#]*)(?::([^:#]*))?)?#)?([^:\/?#]*)(?::(\d*))?))?((((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/
As you can see, you're better off just using the "library". :)
This should do it. I added a few cases that should not match.
var cases = [
"siphone.com",
"qwr.siphone.com",
"www.qwr.siphone.com",
"qw.rock.siphone.com",
"<http://www.qwr.siphone.com>",
"hamstar.corm",
"cheese.net",
"bro.at.me.come",
"http://www.qwr.siphone.com/default.htm"];
var grabCom = function(str) {
    // \b makes sure ".com" is not followed by more word characters (e.g. ".come")
    var result = str.match(/(\w+\.com)\b/);
    if(result !== null)
        return result[1];
    return null;
};
for(var i = 0; i < cases.length; i++) {
console.log(grabCom(cases[i]));
}
var myStrings = [
'siphone.com',
'qwr.siphone.com',
'www.qwr.siphone.com',
'qw.rock.siphone.com',
'<http://www.qwr.siphone.com>'
];
for (var i = 0; i < myStrings.length; i++) {
document.write( myStrings[i] + '=' + myStrings[i].match(/[\w]+\.(com)/gi) + '<br><br>');
}
I've placed the given demo strings in the myStrings array.
i is the index used to iterate through this array. The following line does the matching trick:
myStrings[i].match(/[\w]+\.(com)/gi)
and returns an array containing siphone.com. If you'd like to match .net etc., use (com|net|other) instead of just (com).
You may also find the following link useful: Regular expressions Cheat Sheet
Update: the missed case works too.
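You could split the string, then search the parts for "com", like so:
var url = 'music.google.com';
var parts = url.split('.');
var found = false;
for (var i = 0; i < parts.length; i++) { // iterate over the values, not for...in keys
    if (parts[i] == 'com') {
        found = true;
        break;
    }
}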
var uri = "foo.bar.baz.com";
uri.split(".").slice(-2).join("."); // returns baz.com
This assumes that you want just the hostname and TLD. It also assumes that there is no path information.
Updated, now that you also need to handle URIs with a protocol and a path, you could strip those first so that dots in the path don't interfere:
uri.replace(/^[a-z]+:\/\//i, "").split("/")[0].split(".").slice(-2).join(".")
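A sketch for the cases listed in the question that lets the URL class extract the hostname first, so paths and the <...> wrapper cannot interfere. It keeps the last-two-labels assumption of the split/slice answer, which gives the wrong result for suffixes such as co.uk; lastTwoLabels is a hypothetical name.

```javascript
// Hypothetical helper: last two labels of the hostname.
// Assumption: the wanted domain is always "name.tld" (wrong for e.g. "co.uk").
function lastTwoLabels(str) {
  var cleaned = str.replace(/^<|>$/g, ""); // drop a surrounding <...>
  if (cleaned.indexOf("://") === -1) {
    cleaned = "http://" + cleaned; // URL needs a scheme to parse
  }
  return new URL(cleaned).hostname.split(".").slice(-2).join(".");
}

console.log(lastTwoLabels("http://www.qwr.siphone.com/default.htm")); // siphone.com
```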
Use a regexp to do that. This way, modifying the detection is quite easy.
var url = 'www.siphone.com';
var domain = url.match(/[^.]+\.com/i)[0]; // siphone.com
If you use url.match(/([^.]+\.com)(?![a-z])/i)[1] instead, you can ensure that ".com" is not followed by any other letters.

Javascript substring() trickery

I have a URL that looks like http://mysite.com/#id/Blah-blah-blah, it's used for Ajax-ey bits. I want to use substring() or substr() to get the id part. ID could be any combination of any length of letters and numbers.
So far I have got:
var hash = window.location.hash;
alert(hash.substring(1)); // remove #
Which removes the front hash, but I'm not a JS coder and I'm struggling a bit. How can I remove all of it except the id part? I don't want anything after and including the final slash either (/Blah-blah-blah).
Thanks!
Jack
Now, this is a case where regular expressions make sense. Using substring here won't work well because the strings vary in length.
This code assumes that the id part won't contain any slashes.
var hash = "#asdfasdfid/Blah-blah-blah";
hash.match(/#(.+?)\//)[1]; // asdfasdfid
The . matches any character, and
together with the + it matches one or more characters;
the ? makes the match non-greedy, so that it stops at the first occurrence of a / in the string.
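To see what the ? changes, a small comparison on the second example string: the lazy .+? stops at the first slash, while the greedy .+ runs to the end and backtracks to the last one.

```javascript
var hash = "#asdf/a/sdfid/Blah-blah-blah";
// Lazy: .+? stops at the first "/" that lets the rest of the pattern match.
console.log(hash.match(/#(.+?)\//)[1]); // asdf
// Greedy: .+ consumes everything, then backtracks to the last "/".
console.log(hash.match(/#(.+)\//)[1]);  // asdf/a/sdfid
```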
If the id part can contain additional slashes and the final slash is the separator this regex will do your bidding
var hash = "#asdf/a/sdfid/Blah-blah-blah";
hash.match(/#(.+?)\/[^\/]*$/)[1]; // asdf/a/sdfid
Just for fun, here are versions that don't use regular expressions.
No slashes in the id part:
var hash = "#asdfasdfid/Blah-blah-blah",
    idpart = hash.substring(1, hash.indexOf("/")); // asdfasdfid
With slashes in the id part (the last slash is the separator):
var hash = "#asdf/a/sdfid/Blah-blah-blah",
    lastSlash = hash.lastIndexOf("/"), // finding the last slash
    idPart = hash.substring(1, lastSlash); // asdf/a/sdfid
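If you start from the full URL rather than window.location.hash, a sketch without regex: the URL class exposes the fragment as hash, and splitting on "/" keeps the first segment. getIdFromHref is a hypothetical name; it assumes the id never contains a slash and returns an empty string when there is no fragment.

```javascript
// Hypothetical helper: read the id from a URL like http://mysite.com/#id/Blah-blah-blah
function getIdFromHref(href) {
  var hash = new URL(href).hash; // "#id/Blah-blah-blah", or "" without a fragment
  return hash.slice(1).split("/")[0];
}

console.log(getIdFromHref("http://mysite.com/#id/Blah-blah-blah")); // id
```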
var hash = window.location.hash;
var matches = hash.match(/#(.+?)\//);
if (matches && matches.length > 1) { // match() returns null when there is no "#.../" part
alert(matches[1]);
}
Perhaps a regex (the id is the first element of the returned array):
window.location.hash.match(/[^#\/]+/)
Use indexOf to determine the position of the / after the id, and then use string.substr(start, length) to get the id value.
var hash = window.location.hash;
var posSlash = hash.indexOf("/", 1);
var id = hash.substr(1, posSlash - 1);
You need to include some validation code to handle the case where there is no /.
This is not a good approach, but you can use it if you want:
var relUrl = "http://mysite.com/#id/Blah-blah-blah";
var urlParts = relUrl.split("/"); // ["http:", "", "mysite.com", "#id", "Blah-blah-blah"]
var idpart = urlParts[3]; // the array is 0-indexed, so your id is in the 4th element
var id = idpart.substring(1); // we skip the # and read the rest
The most foolproof way to do it is probably the following:
function getId() {
var m = document.location.href.match(/\/#([^\/&]+)/);
return m && m[1];
}
This code does not assume anything about what comes after the id (if at all). The id it will catch is anything except for forward slashes and ampersands.
If you want it to catch only letters and numbers you can change it to the following:
function getId() {
var m = document.location.href.match(/\/#([a-z0-9]+)/i);
return m && m[1];
}
