javascript parser for a string which contains .ini data - javascript

If a string contains .ini file data, how can I parse it in JavaScript?
Is there a JavaScript parser that can help with this?
Here, the string typically holds the contents of a configuration file. (The file cannot be read through JavaScript, but I have managed to get the .ini data into a string.)

I wrote a JavaScript function inspired by node-iniparser.js:
function parseINIString(data){
var regex = {
section: /^\s*\[\s*([^\]]*)\s*\]\s*$/,
param: /^\s*([^=]+?)\s*=\s*(.*?)\s*$/,
comment: /^\s*;.*$/
};
var value = {};
var lines = data.split(/[\r\n]+/);
var section = null;
lines.forEach(function(line){
if(regex.comment.test(line)){
return;
}else if(regex.param.test(line)){
var match = line.match(regex.param);
if(section){
value[section][match[1]] = match[2];
}else{
value[match[1]] = match[2];
}
}else if(regex.section.test(line)){
var match = line.match(regex.section);
value[match[1]] = {};
section = match[1];
}else if(line.length == 0 && section){
section = null;
};
});
return value;
}
2017-05-10 update: fixed a bug with keys that contain spaces.
EDIT:
Sample of ini file read and parse

You could try config-ini-parser. It is similar to Python's ConfigParser, but without the I/O operations.
It can be installed with npm or bower. Here is an example:
var ConfigIniParser = require("config-ini-parser").ConfigIniParser;
var delimiter = "\r\n"; // or "\n" for *nix
var parser = new ConfigIniParser(delimiter); // if no delimiter is passed, the default "\n" is used
parser.parse(iniContent);
var value = parser.get("section", "option");
parser.stringify('\n'); //get all the ini file content as a string
For more detail, see the project's main page or its npm package page.

Here's a function that can parse INI data from a string into an object (on the client side):
function parseINIString(data){
var regex = {
section: /^\s*\[\s*([^\]]*)\s*\]\s*$/,
param: /^\s*([\w\.\-\_]+)\s*=\s*(.*?)\s*$/,
comment: /^\s*;.*$/
};
var value = {};
var lines = data.split(/\r\n|\r|\n/);
var section = null;
for(var x=0;x<lines.length;x++)
{
if(regex.comment.test(lines[x])){
continue; // skip comment lines instead of returning from the whole function
}else if(regex.param.test(lines[x])){
var match = lines[x].match(regex.param);
if(section){
value[section][match[1]] = match[2];
}else{
value[match[1]] = match[2];
}
}else if(regex.section.test(lines[x])){
var match = lines[x].match(regex.section);
value[match[1]] = {};
section = match[1];
}else if(lines[x].length == 0 && section){ // a blank line ends the current section
section = null;
}
}
return value;
}

Based on the other answers, I've modified it so you can have nested sections :)
function parseINI(data: string) {
let rgx = {
section: /^\s*\[\s*([^\]]*)\s*\]\s*$/,
param: /^\s*([^=]+?)\s*=\s*(.*?)\s*$/,
comment: /^\s*;.*$/
};
let result = {};
let lines = data.split(/[\r\n]+/);
let section = result;
lines.forEach(function (line) {
//comments
if (rgx.comment.test(line)) return;
//params
if (rgx.param.test(line)) {
let match = line.match(rgx.param);
section[match[1]] = match[2];
return;
}
//sections
if (rgx.section.test(line)) {
section = result
let match = line.match(rgx.section);
for (let subSection of match[1].split(".")) {
!section[subSection] && (section[subSection] = {});
section = section[subSection];
}
return;
}
});
return result;
}
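To see what such a parser produces, here is a minimal, self-contained sketch with sample input. It is not any of the functions above: the name parseIni, the sample data, and the handling of "#" comments are assumptions made for illustration, and unlike the first answer a blank line does not reset the current section.

```javascript
// Minimal INI-to-object sketch (hypothetical helper, not a library API).
// Supports [sections], key = value pairs, and ";" comments.
// Treating "#" as a comment marker is an assumption, not part of the answers above.
function parseIni(text) {
  const result = {};
  let target = result; // top-level keys land here until a [section] appears
  for (const rawLine of text.split(/\r\n|\r|\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith(';') || line.startsWith('#')) continue;
    const section = line.match(/^\[\s*([^\]]*?)\s*\]$/);
    if (section) {
      // Reuse an existing section object if the header appears twice.
      target = result[section[1]] = result[section[1]] || {};
      continue;
    }
    const eq = line.indexOf('=');
    if (eq > 0) {
      // Everything after the first "=" is the value, so values may contain "=".
      target[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
    }
  }
  return result;
}

const sample = [
  '; global settings',
  'name = demo',
  '',
  '[database]',
  'host = localhost',
  'port = 5432',
].join('\n');

const parsed = parseIni(sample);
console.log(parsed.name);          // "demo"
console.log(parsed.database.port); // "5432"
```

All values stay strings, so a port like "5432" still needs converting with Number() if a number is required.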

Related

Getting domain without subdomain from an url with javascript [duplicate]

How to get the domain name without subdomains?
e.g. if the url is "http://one.two.roothost.co.uk/page.html" how to get "roothost.co.uk"?
Following is a solution to extract a domain name without any subdomains. This solution doesn't make any assumptions about the URL format, so it should work for any URL. Since some domain names have one suffix (.com), and some have two or more (.co.uk), to get an accurate result in all cases, we need to parse the hostname using the Public Suffix List, which contains a list of all public domain name suffixes.
Solution
First, include the public suffix list js api in a script tag in your HTML, then in JavaScript to get the hostname you can call:
var parsed = psl.parse('one.two.roothost.co.uk');
console.log(parsed.domain);
...which will return "roothost.co.uk". To get the name from the current page, you can use location.hostname instead of a static string:
var parsed = psl.parse(location.hostname);
console.log(parsed.domain);
Finally, if you need to parse a domain name directly out of a full URL string, you can use the following:
var url = "http://one.two.roothost.co.uk/page.html";
url = url.split("/")[2]; // Get the hostname
var parsed = psl.parse(url); // Parse the domain
document.getElementById("output").textContent = parsed.domain;
JSFiddle Example (it includes the entire minified library in the jsFiddle, so scroll down!): https://jsfiddle.net/6aqdbL71/2/
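To illustrate the idea behind a Public Suffix List lookup without loading the library, here is a rough sketch. SAMPLE_SUFFIXES and registrableDomain are hypothetical names, and the suffix set is a tiny hand-picked assumption. The real list has thousands of entries plus wildcard and exception rules, which this sketch ignores. It also uses the built-in URL class rather than url.split("/")[2], so a port number does not end up in the hostname.

```javascript
// Hypothetical, heavily abbreviated suffix set for illustration only.
const SAMPLE_SUFFIXES = new Set(['com', 'uk', 'co.uk', 'ar', 'com.ar']);

// Returns the public suffix plus one extra label, or null if no suffix matches.
function registrableDomain(urlString) {
  const labels = new URL(urlString).hostname.toLowerCase().split('.');
  // Walk from the longest candidate suffix to the shortest.
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join('.');
    if (SAMPLE_SUFFIXES.has(candidate)) {
      // i === 0 means the hostname itself is a public suffix.
      return i === 0 ? null : labels.slice(i - 1).join('.');
    }
  }
  return null;
}

console.log(registrableDomain('http://one.two.roothost.co.uk/page.html')); // "roothost.co.uk"
console.log(registrableDomain('https://mail.google.com:8443/inbox'));      // "google.com"
```

For real use, a maintained list (as in psl or parse-domain) is still the safer choice, because the sample set above will go stale and misses most suffixes.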
What about this?
function getCanonicalHost(hostname) {
const MAX_TLD_LENGTH = 3;
function isNotTLD(_) { return _.length > MAX_TLD_LENGTH; };
hostname = hostname.split('.');
hostname = hostname.slice(Math.max(0, hostname.findLastIndex(isNotTLD)));
hostname = hostname.join('.');
return hostname;
}
console.log(getCanonicalHost('mail.google.com'));
console.log(getCanonicalHost('some.google.com.ar'));
console.log(getCanonicalHost('some.another.google.com.ar'));
console.log(getCanonicalHost('foo.bar.google.com'));
console.log(getCanonicalHost('foo.bar.google.com.ar'));
console.log(getCanonicalHost('bar.google.ar'));
It works because https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_domain_name says:
TLDs can contain special as well as latin characters. A TLD's maximum length is 63 characters, although most are around 2–3.
https://data.iana.org/TLD/tlds-alpha-by-domain.txt lists 1481 TLDs, 466 of which are 2–3 characters long, and the most-used TLDs are no longer than 3 characters.
If you need a solution that works with all TLDs, here is a more complex approach:
function getCanonicalHost(hostname) {
return getCanonicalHost.tlds.then(function(tlds) {
function isNotTLD(_) { return tlds.indexOf(_) === -1; };
hostname = hostname.toLowerCase();
hostname = hostname.split('.');
hostname = hostname.slice(Math.max(0, hostname.findLastIndex(isNotTLD)));
hostname = hostname.join('.');
return hostname;
});
}
getCanonicalHost.tlds = new Promise(function(res, rej) {
const TLD_LIST_URL= 'https://data.iana.org/TLD/tlds-alpha-by-domain.txt';
const xhr = new XMLHttpRequest();
xhr.addEventListener('error', rej);
xhr.addEventListener('load', function() {
const MAX_TLD_LENGTH = 63;
var tlds = xhr.responseText.split('\n');
tlds = tlds.map(function(_) { return _.trim().toLowerCase(); });
tlds = tlds.filter(Boolean);
tlds = tlds.filter(function(_) { return _.length <= MAX_TLD_LENGTH; });
res(tlds);
});
xhr.open('GET', TLD_LIST_URL);
xhr.send();
})
getCanonicalHost('mail.google.com').then(console.log);
getCanonicalHost('some.google.com.ar').then(console.log);
getCanonicalHost('some.another.google.com.ar').then(console.log);
getCanonicalHost('foo.bar.google.com').then(console.log);
getCanonicalHost('foo.bar.google.com.ar').then(console.log);
getCanonicalHost('bar.google.ar').then(console.log);
You can use parse-domain to do the heavy lifting for you. This package considers the public suffix list and returns an easy to work with object breaking up the domain.
Here is an example from their readme:
npm install parse-domain
import { parseDomain, ParseResultType } from 'parse-domain';
const parseResult = parseDomain(
// should be a string with basic latin characters only. more details in the readme
'www.some.example.co.uk',
);
// check if the domain is listed in the public suffix list
if (parseResult.type === ParseResultType.Listed) {
const { subDomains, domain, topLevelDomains } = parseResult;
console.log(subDomains); // ["www", "some"]
console.log(domain); // "example"
console.log(topLevelDomains); // ["co", "uk"]
} else {
// more about other parseResult types in the readme
}
This works for me:
const firstTLDs = "ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|be|bf|bg|bh|bi|bj|bm|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|cl|cm|cn|co|cr|cu|cv|cw|cx|cz|de|dj|dk|dm|do|dz|ec|ee|eg|es|et|eu|fi|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jo|jp|kg|ki|km|kn|kp|kr|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|na|nc|ne|nf|ng|nl|no|nr|nu|nz|om|pa|pe|pf|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|yt".split('|');
const secondTLDs = "com|edu|gov|net|mil|org|nom|sch|caa|res|off|gob|int|tur|ip6|uri|urn|asn|act|nsw|qld|tas|vic|pro|biz|adm|adv|agr|arq|art|ato|bio|bmd|cim|cng|cnt|ecn|eco|emp|eng|esp|etc|eti|far|fnd|fot|fst|g12|ggf|imb|ind|inf|jor|jus|leg|lel|mat|med|mus|not|ntr|odo|ppg|psc|psi|qsl|rec|slg|srv|teo|tmp|trd|vet|zlg|web|ltd|sld|pol|fin|k12|lib|pri|aip|fie|eun|sci|prd|cci|pvt|mod|idv|rel|sex|gen|nic|abr|bas|cal|cam|emr|fvg|laz|lig|lom|mar|mol|pmn|pug|sar|sic|taa|tos|umb|vao|vda|ven|mie|北海道|和歌山|神奈川|鹿児島|ass|rep|tra|per|ngo|soc|grp|plc|its|air|and|bus|can|ddr|jfk|mad|nrw|nyc|ski|spy|tcm|ulm|usa|war|fhs|vgs|dep|eid|fet|fla|flå|gol|hof|hol|sel|vik|cri|iwi|ing|abo|fam|gok|gon|gop|gos|aid|atm|gsm|sos|elk|waw|est|aca|bar|cpa|jur|law|sec|plo|www|bir|cbg|jar|khv|msk|nov|nsk|ptz|rnd|spb|stv|tom|tsk|udm|vrn|cmw|kms|nkz|snz|pub|fhv|red|ens|nat|rns|rnu|bbs|tel|bel|kep|nhs|dni|fed|isa|nsn|gub|e12|tec|орг|обр|упр|alt|nis|jpn|mex|ath|iki|nid|gda|inc".split('|');
const knownSubdomains = "www|studio|mail|remote|blog|webmail|server|ns1|ns2|smtp|secure|vpn|m|shop|ftp|mail2|test|portal|ns|ww1|host|support|dev|web|bbs|ww42|squatter|mx|email|1|mail1|2|forum|owa|www2|gw|admin|store|mx1|cdn|api|exchange|app|gov|2tty|vps|govyty|hgfgdf|news|1rer|lkjkui";
function removeSubdomain(s) {
const knownSubdomainsRegExp = new RegExp(`^(${knownSubdomains})\\.`, 'i'); // "\\." keeps the dot literal inside the template string
s = s.replace(knownSubdomainsRegExp, '');
const parts = s.split('.');
while (parts.length > 3) {
parts.shift();
}
if (parts.length === 3 && ((parts[1].length > 2 && parts[2].length > 2) || (secondTLDs.indexOf(parts[1]) === -1) && firstTLDs.indexOf(parts[2]) === -1)) {
parts.shift();
}
return parts.join('.');
};
var tests = {
'www.sidanmor.com': 'sidanmor.com',
'exemple.com': 'exemple.com',
'argos.co.uk': 'argos.co.uk',
'www.civilwar.museum': 'civilwar.museum',
'www.sub.civilwar.museum': 'civilwar.museum',
'www.xxx.sub.civilwar.museum': 'civilwar.museum',
'www.exemple.com': 'exemple.com',
'main.testsite.com': 'testsite.com',
'www.ex-emple.com.ar': 'ex-emple.com.ar',
'main.test-site.co.uk': 'test-site.co.uk',
'en.tour.mysite.nl': 'tour.mysite.nl',
'www.one.lv': 'one.lv',
'www.onfdsadfsafde.lv': 'onfdsadfsafde.lv',
'aaa.onfdsadfsafde.aa': 'onfdsadfsafde.aa',
};
for (var test in tests) {
if (tests.hasOwnProperty(test)) {
var t = test;
var e = tests[test];
var r = removeSubdomain(test);
var s = e === r;
if (s) {
console.log('OK: "' + t + '" should be "' + e + '" and it is really "' + r + '"');
} else {
console.log('Fail: "' + t + '" should be "' + e + '" but it is NOT "' + r + '"');
}
}
}
References:
psl.min.js file
Maximillian Laumeister Answer to this question
The most popular subdomains on the internet
Simplest solution:
var domain='https://'+window.location.hostname.split('.')[window.location.hostname.split('.').length-2]+'.'+window.location.hostname.split('.')[window.location.hostname.split('.').length-1];
alert(domain);
I created this function, which uses URL to parse. It cheats by assuming all hostnames have four or fewer parts.
const getDomainWithoutSubdomain = url => {
const urlParts = new URL(url).hostname.split('.')
return urlParts
.slice(0)
.slice(-(urlParts.length === 4 ? 3 : 2))
.join('.')
}
[
'https://www.google.com',
'https://www.google.co.uk',
'https://mail.google.com',
'https://www.bbc.co.uk/news',
'https://github.com',
].forEach(url => {
console.log(getDomainWithoutSubdomain(url))
})
Here is a working JSFiddle
My solution works with the assumption that the root hostname you are looking for is of the type "abc.xyz.pp".
extractDomain() returns the hostname with all the subdomains.
getRootHostName() splits the hostname by . and then based on the assumption mentioned above, it uses the shift() to remove each subdomain name.
Finally, whatever remains in parts[], it joins them by . to form the root hostname.
Javascript
var urlInput = "http://one.two.roothost.co.uk/page.html";
function extractDomain(url) {
var domain;
//find & remove protocol (http, ftp, etc.) and get domain
if (url.indexOf("://") > -1) {
domain = url.split('/')[2];
} else {
domain = url.split('/')[0];
}
//find & remove port number
domain = domain.split(':')[0];
return domain;
}
function getRootHostName(url) {
var parts = extractDomain(url).split('.');
var partsLength = parts.length - 3;
//parts.length-3 assuming root hostname is of type abc.xyz.pp
for (var i = 0; i < partsLength; i++) {
parts.shift(); //remove sub-domains one by one
}
var rootDomain = parts.join('.');
return rootDomain;
}
document.getElementById("result").innerHTML = getRootHostName(urlInput);
HTML
<div id="result"></div>
EDIT 1: Updated the JSFiddle link. It was reflecting the incorrect code.
What about...
function getDomain(){
if(document.domain.length){
var parts = document.domain.replace(/^(www\.)/,"").split('.');
//is there a subdomain?
while(parts.length > 2){
//removing it from our array
var subdomain = parts.shift();
}
//getting the remaining 2 elements
var domain = parts.join('.');
return domain.replace(/(^\.*)|(\.*$)/g, "");
}
return '';
}
My solution worked for me: Get "gocustom.com" from "shop.gocustom.com"
var site_domain_name = 'shop.gocustom.com';
alert(site_domain_name);
var strsArray = site_domain_name.split('.');
var strsArrayLen = strsArray.length;
alert(strsArray[strsArrayLen - 2]+'.'+strsArray[strsArrayLen - 1]);
You can try this in JavaScript:
alert(window.location.hostname);
It will return the hostname.

How to get specific parameter's value from the querystring in jquery/javascript?

I have the following query string:
url = "http://56.177.59.250/static/ajax.php?core[ajax]=true&core[call]=prj_name.contactform&width=400&core[security_token]=c7854c13380a26ff009a5cd9e6699840"
I want to get the value of variable core[call] i.e. prj_name.contactform
How should I get this value using jQuery/javascript?
Please help me.
Try this, which puts all variables into the vars{} object. You can then access vars.core.ajax, vars.width, etc. Also live on this fiddle:
var u = "http://localhost:8080/static/ajax.php?core[ajax]=true&core[call]=prj_name.contactform&width=400&core[security_token]=c7854c13380a26ff009a5cd9e6699840&x=1"
var re = /(\w+)\[(\w+)\]$/
var vars = {}
u.split('?')[1].split('&').forEach(function(e) {
var p = e.split('=');
var v = p[0].match(re);
if (v === null) {
vars[p[0]] = p[1];
} else {
if (!(v[1] in vars)) { vars[v[1]] = {}; }
vars[v[1]][v[2]] = p[1];
}
});
console.log(vars);
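As an alternative sketch, the built-in URL and URLSearchParams classes can do the splitting and also percent-decode the values, which the manual split above does not. The function name parseNestedQuery is hypothetical, and the sketch assumes only one level of bracket nesting, as in the question.

```javascript
// Builds { core: { ajax, call, ... }, width, ... } from a URL's query string.
// Assumes keys are either "name" or "name[sub]" (one nesting level only).
function parseNestedQuery(urlString) {
  const vars = {};
  // searchParams yields [key, value] pairs that are already percent-decoded.
  for (const [key, value] of new URL(urlString).searchParams) {
    const m = key.match(/^(\w+)\[(\w+)\]$/);
    if (m === null) {
      vars[key] = value;
    } else {
      if (!(m[1] in vars)) { vars[m[1]] = {}; }
      vars[m[1]][m[2]] = value;
    }
  }
  return vars;
}

const u = 'http://localhost:8080/static/ajax.php?core[ajax]=true&core[call]=prj_name.contactform&width=400';
const result = parseNestedQuery(u);
console.log(result.core.call); // "prj_name.contactform"
console.log(result.width);     // "400"
```

If only the one value is needed, new URL(u).searchParams.get('core[call]') returns it directly without building the object.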

More efficient way than using lots of else if statements

I'm trying to find a better way to do this in Javascript:
if ( text === 'amy' ) {
var url = 'http://www.mydomain.com/amylikescats.html';
}
else if ( text === 'dave' ) {
var url = 'http://www.mydomain.com/daveshome.html';
}
else if ( text === 'steve' ) {
var url = 'http://www.mydomain.com/steve2.html';
}
else if ( text === 'jake' ) {
var url = 'http://www.mydomain.com/jakeeatstofu.html';
}
else {
var url = 'http://www.mydomain.com/noone.html';
}
Is there a more code-efficient way of doing this?
Use an object as a map:
var map = {
"amy": 'http://www.mydomain.com/amylikescats.html',
"dave": 'http://www.mydomain.com/daveshome.html',
// etc
};
var text = "whatever";
var url = map[text] === undefined ? 'http://www.mydomain.com/noone.html' : map[text];
This will save you the maximum amount of repeated code, but if you also need to do other stuff than setting url a switch might be more appropriate.
Switch statement!
var url = 'http://www.mydomain.com/noone.html';
switch(text) {
case 'amy': url = 'http://www.mydomain.com/amylikescats.html';
break;
case 'dave': url = 'http://www.mydomain.com/daveshome.html';
break;
case 'steve': url = 'http://www.mydomain.com/steve2.html';
break;
case 'jake': url = 'http://www.mydomain.com/jakeeatstofu.html';
break;
}
Now there is no need for a default clause because you've initialized url before the switch.
Otherwise you could add this:
default: url = 'http://www.mydomain.com/noone.html';
break;
Associative array:
var data = {
amy: 'http://www.mydomain.com/amylikescats.html',
dave: 'http://www.mydomain.com/daveshome.html',
// etc...
}
To use:
var url = data[text];
The else case can be replicated by checking whether the item exists in the object, so expanding a bit:
var url = '';
if(!(text in data)){
url = 'http://www.mydomain.com/noone.html';
}
else{
url = data[text];
}
Store the unique parts in a dictionary and then take it from there:
var map = {
amy: "amylikescats",
dave: "daveshome",
steve: "steve2",
jake: "jakeeatstofu"
};
var url = map[text];
if (!url) {
url = 'http://www.mydomain.com/noone.html';
} else {
url = 'http://www.mydomain.com/' + url + '.html';
}
You could use an object to hold the URLs for different values of text, and then use the || operator when assigning a value to url to use the fallback value if necessary.
var urlsForText = {
'amy': 'http://www.mydomain.com/amylikescats.html',
'dave': 'http://www.mydomain.com/daveshome.html',
'steve': 'http://www.mydomain.com/steve2.html',
'jake': 'http://www.mydomain.com/jakeeatstofu.html'
};
var url = urlsForText[text] || 'http://www.mydomain.com/noone.html';
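One caveat with plain-object lookups: they also find inherited properties, so a text value such as "constructor" or "toString" is treated as a known key by map[text], text in data, and the || fallback alike. A possible way around this, sketched here with hypothetical names (PAGE_BY_NAME, urlFor), is a Map, which only contains the entries you put in it.

```javascript
// Only the unique part of each URL is stored; the rest is shared.
const PAGE_BY_NAME = new Map([
  ['amy', 'amylikescats'],
  ['dave', 'daveshome'],
  ['steve', 'steve2'],
  ['jake', 'jakeeatstofu'],
]);

function urlFor(text) {
  // Map#has ignores inherited Object.prototype names like "constructor".
  const page = PAGE_BY_NAME.has(text) ? PAGE_BY_NAME.get(text) : 'noone';
  return 'http://www.mydomain.com/' + page + '.html';
}

console.log(urlFor('amy'));         // "http://www.mydomain.com/amylikescats.html"
console.log(urlFor('constructor')); // "http://www.mydomain.com/noone.html"
```

If you prefer to keep the plain object, guarding the lookup with Object.prototype.hasOwnProperty.call(data, text) has the same effect.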

scrape id from url using javascript

I have the following URL:
http://www.abebooks.com/servlet/BookDetailsPL?bi=1325819827&searchurl=an%3DLofting%252C%2BHugh.%26ds%3D30%26sortby%3D13%26tn%3DDOCTOR%2BDOLITTLE%2527S%2BGARDEN.
Where bi is an identifier for the specific book.
How can I extract the book id from the link?
Thanks!
You can use this regex:
var address = "http://www.abebooks.com/servlet/BookDetailsPL?bi=1325819827&...";
var bi = /[\?&]bi=(\d+)/.exec(address)[1]
alert(bi)
function getBookId()
{
var query = document.location.search.substring(1); // query string without the leading "?"
var values = query.split("&");
for(var i = 0; i < values.length; i++)
{
var a = values[i].split("=");
if(a[0] === "bi")
return a[1];
}
// parameter not found
return null;
}
You can extract the book id (assumed to be only numbers) via a regular expression (and grouping).
var s = "http://www.abebooks.com/servlet/BookDetailsPL?\
bi=1325819827&searchurl=an%3DLofting%252C%2BHugh.\
%26ds%3D30%26sortby%3D13%26tn%3DDOCTOR%2BDOLITTLE\
%2527S%2BGARDEN."
var re = /bi=([0-9]+)&/; // or equally: /bi=(\d+)&/
var match = re.exec(s);
match[1]; // => contains 1325819827
address.split("bi=")[1].split("&")[0]
Try this
var bookId
var matcher = location.search.match(/(?:[?&]bi=([^&]+))/); // Assuming window.location
if (null !== matcher) {
bookId = matcher[1];
}
I once had the same problem.
I created a little function to help me out. Don't know where it is but I managed to recreate it:
function get(item,url) {
if (url == undefined)
url = window.location.href;
var items = url.split('?')[1].split('&');
for (var i = 0, len = items.length;i<len;i++) {
if (items[i].split('=')[0] == item) // compare the whole key, not just a prefix
return items[i].split('=')[1];
}
return null;
}
So you would use it like:
get('bi');
That works if the URL you gave is your current URL. If not, you could do:
get('bi','http://www.abebooks.com/servlet/BookDetailsPL?bi=1325819827&searchurl=an%3DLofting%252C%2BHugh.%26ds%3D30%26sortby%3D13%26tn%3DDOCTOR%2BDOLITTLE%2527S%2BGARDEN.')
Hope I didn't leave in any bugs :)
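In current browsers and Node.js, the built-in URL class can replace the hand-written splitting. Below is a small sketch: the function name getBookIdFromUrl is hypothetical, and the digits-only check reflects the assumption from the answers above that a book id is numeric.

```javascript
// Returns the "bi" parameter as a string, or null if it is missing or not numeric.
function getBookIdFromUrl(urlString) {
  const id = new URL(urlString).searchParams.get('bi');
  return id !== null && /^\d+$/.test(id) ? id : null;
}

const bookUrl = 'http://www.abebooks.com/servlet/BookDetailsPL?bi=1325819827&searchurl=an%3DLofting%252C%2BHugh.%26ds%3D30%26sortby%3D13%26tn%3DDOCTOR%2BDOLITTLE%2527S%2BGARDEN.';
console.log(getBookIdFromUrl(bookUrl)); // "1325819827"
console.log(getBookIdFromUrl('http://www.abebooks.com/servlet/BookDetailsPL')); // null
```

Because searchParams matches the exact key, a parameter such as "bibliography" is not confused with "bi".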
