Get domain without subdomain in javascript [duplicate]


How can I fetch a domain name from a URL String?
Examples:
+----------------------+------------+
| input                | output     |
+----------------------+------------+
| www.google.com       | google     |
| www.mail.yahoo.com   | mail.yahoo |
| www.mail.yahoo.co.in | mail.yahoo |
| www.abc.au.uk        | abc        |
+----------------------+------------+
Related:
Matching a web address through regex

I once had to write such a regex for a company I worked for. The solution was this:
Get a list of every ccTLD and gTLD available. Your first stop should be IANA. The list from Mozilla looks great at first sight, but lacks ac.uk for example so for this it is not really usable.
Join the list like the example below. A warning: ordering is important! If org.uk appeared after uk, then example.org.uk would match org instead of example.
Example regex:
(?:^|\.)([^\.]+)\.(com|net|org|info|coop|int|co\.uk|org\.uk|ac\.uk|uk|__and so on__)$
This worked really well and also matched weird, unofficial top-levels like de.com and friends.
The upside:
Very fast if regex is optimally ordered
The downside of this solution is of course:
Handwritten regex which has to be updated manually if ccTLDs change or get added. Tedious job!
Very large regex so not very readable.
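A minimal sketch of generating such a regex from a list instead of writing it by hand, assuming a deliberately tiny, hypothetical SUFFIXES array (a real one would be generated from the IANA data or the Public Suffix List and be far longer). The label must start at the beginning of the string or right after a dot, so it is always a whole label; registrableLabel is a hypothetical name.

```javascript
// Hypothetical, deliberately tiny suffix list; a real list would be generated
// from IANA / the Public Suffix List and contain many more entries.
const SUFFIXES = ["com", "net", "org", "info", "uk", "co.uk", "org.uk", "ac.uk", "in", "co.in"];

// Escape regex metacharacters (for these suffixes only the dots matter).
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// A whole label (at the start or after a dot), then one known suffix at the very end.
const DOMAIN_RE = new RegExp(
  `(?:^|\\.)([^.]+)\\.(?:${SUFFIXES.map(escapeRegExp).join("|")})$`,
  "i"
);

// Returns the label directly in front of the suffix, or null if no suffix matched.
function registrableLabel(host) {
  const m = DOMAIN_RE.exec(host);
  return m ? m[1] : null;
}

console.log(registrableLabel("www.google.com"));       // google
console.log(registrableLabel("www.mail.yahoo.co.in")); // yahoo
console.log(registrableLabel("example.org.uk"));       // example
```

The list still has to be maintained by hand or regenerated, which is exactly the downside described above; generating the pattern only removes the tedium of editing one huge regex.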

A little late to the party, but:
const urls = [
'www.abc.au.uk',
'https://github.com',
'http://github.ca',
'https://www.google.ru',
'http://www.google.co.uk',
'www.yandex.com',
'yandex.ru',
'yandex'
]
urls.forEach(url => console.log(url.replace(/.+\/\/|www\.|\..+/g, '')))

Extracting the domain name accurately can be quite tricky, mainly because the domain extension can contain two parts (like .com.au or .co.uk) and the subdomain (the prefix) may or may not be there. Listing all domain extensions is not an option because there are hundreds of them; EuroDNS.com, for example, lists over 800 domain name extensions.
I therefore wrote a short PHP function that uses parse_url() and some observations about domain extensions to accurately extract the URL components AND the domain name. The function is as follows:
function parse_url_all($url){
$url = substr($url,0,4)=='http'? $url: 'http://'.$url;
$d = parse_url($url);
$tmp = explode('.',$d['host']);
$n = count($tmp);
if ($n>=2){
if ($n==4 || ($n==3 && strlen($tmp[($n-2)])<=3)){
$d['domain'] = $tmp[($n-3)].".".$tmp[($n-2)].".".$tmp[($n-1)];
$d['domainX'] = $tmp[($n-3)];
} else {
$d['domain'] = $tmp[($n-2)].".".$tmp[($n-1)];
$d['domainX'] = $tmp[($n-2)];
}
}
return $d;
}
This simple function will work in almost every case. There are a few exceptions, but these are very rare.
To demonstrate / test this function you can use the following:
$urls = array('www.test.com', 'test.com', 'cp.test.com' .....);
echo "<div style='overflow-x:auto;'>";
echo "<table>";
echo "<tr><th>URL</th><th>Host</th><th>Domain</th><th>Domain X</th></tr>";
foreach ($urls as $url) {
$info = parse_url_all($url);
echo "<tr><td>".$url."</td><td>".$info['host'].
"</td><td>".$info['domain']."</td><td>".$info['domainX']."</td></tr>";
}
echo "</table></div>";
The output will be as follows for the URLs listed:
As you can see, the domain name and the domain name without the extension are consistently extracted whatever the URL that is presented to the function.
I hope that this helps.

/^(?:www\.)?(.*?)\.(?:com|au\.uk|co\.in)$/
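A quick way to check this regex against the question's examples in JavaScript; stripHost is a hypothetical wrapper name. The lazy (.*?) stops at the first position where one of the listed extensions completes the match, which is why mail.yahoo survives intact.

```javascript
// The regex from the answer above; its extension list only covers the question's examples.
const re = /^(?:www\.)?(.*?)\.(?:com|au\.uk|co\.in)$/;

// Returns the captured name, or null when the extension is not in the list.
function stripHost(host) {
  const m = re.exec(host);
  return m ? m[1] : null;
}

console.log(stripHost("www.google.com"));       // google
console.log(stripHost("www.mail.yahoo.com"));   // mail.yahoo
console.log(stripHost("www.mail.yahoo.co.in")); // mail.yahoo
console.log(stripHost("www.abc.au.uk"));        // abc
```

Any extension missing from the alternation (for example example.org) yields null, so the list has to grow with the data.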

There are two ways.
Using split
Assuming the URL is in a string variable named url, parse it like this:
var domain;
// find & remove protocol (http, ftp, etc.) and get domain
if (url.indexOf('://') > -1) {
  domain = url.split('/')[2];
} else if (url.indexOf('//') === 0) {
  domain = url.split('/')[2];
} else {
  domain = url.split('/')[0];
}
// find & remove port number
domain = domain.split(':')[0];
Using Regex
var r = /:\/\/(.[^/]+)/;
"http://stackoverflow.com/questions/5343288/get-url".match(r)[1]
=> stackoverflow.com
Hope this helps
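The split approach wrapped in a reusable function, as a sketch (getHostname is a hypothetical name). Since both prefixed forms take the third "/"-separated segment, the two branches can share one condition. The result is the full hostname, subdomains included.

```javascript
// Hypothetical helper name; same split-based logic as above.
function getHostname(url) {
  let domain;
  if (url.indexOf("://") > -1 || url.indexOf("//") === 0) {
    // "scheme://host:port/path" or "//host/path": the host is the third segment
    domain = url.split("/")[2];
  } else {
    // "host:port/path": the host is the first segment
    domain = url.split("/")[0];
  }
  // drop an optional ":port"
  return domain.split(":")[0];
}

console.log(getHostname("http://stackoverflow.com/questions/5343288/get-url")); // stackoverflow.com
console.log(getHostname("//cdn.example.com:8080/x.js"));                      // cdn.example.com
```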

I don't know of any libraries, but the string manipulation of domain names is easy enough.
The hard part is knowing if the name is at the second or third level. For this you will need a data file you maintain (e.g. for .uk it is not always the third level; some organisations (e.g. bl.uk, jet.uk) exist at the second level).
The source of Firefox from Mozilla has such a data file, check the Mozilla licensing to see if you could reuse that.

from urllib.parse import urlparse

GENERIC_TLDS = [
    'aero', 'asia', 'biz', 'com', 'coop', 'edu', 'gov', 'info', 'int', 'jobs',
    'mil', 'mobi', 'museum', 'name', 'net', 'org', 'pro', 'tel', 'travel', 'cat'
]

def get_domain(url):
    hostname = urlparse(url.lower()).netloc
    if hostname == '':
        # Force the recognition as a full URL
        hostname = urlparse('http://' + url.lower()).netloc
    # Remove the 'user:passw@' and ':port' parts
    hostname = hostname.split('@')[-1].split(':')[0]
    # Remove a leading 'www.' (str.lstrip would strip characters, not a prefix)
    if hostname.startswith('www.'):
        hostname = hostname[len('www.'):]
    hostname = hostname.split('.')
    num_parts = len(hostname)
    if (num_parts < 3) or (len(hostname[-1]) > 2):
        return '.'.join(hostname[:-1])
    if len(hostname[-2]) > 2 and hostname[-2] not in GENERIC_TLDS:
        return '.'.join(hostname[:-1])
    if num_parts >= 3:
        return '.'.join(hostname[:-2])
This code isn't guaranteed to work with all URLs and doesn't filter out those that are syntactically correct but invalid, like 'example.uk'.
However it'll do the job in most cases.

It is not possible without a TLD list to compare against, as there are many cases like http://www.db.de/ or http://bbc.co.uk/ that a regex will interpret as the domains db.de (correct) and co.uk (wrong).
But even with that you won't have success if your list does not contain SLDs, too. URLs like http://big.uk.com/ and http://www.uk.com/ would be both interpreted as uk.com (the first domain is big.uk.com).
Because of that all browsers use Mozilla's Public Suffix List:
https://en.wikipedia.org/wiki/Public_Suffix_List
You can use it in your code by importing it through this URL:
http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
Feel free to extend my function to extract the domain name, only. It won't use regex and it is fast:
http://www.programmierer-forum.de/domainnamen-ermitteln-t244185.htm#3471878
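A minimal sketch of how Public Suffix List rules are applied, assuming a hypothetical hard-coded excerpt of rules instead of the downloaded file. It follows the list's documented matching rules: the longest matching rule wins, "*" matches any single label, "!" marks an exception, and with no match the last label is the suffix. It is simplified: no IDN/punycode handling and a plain linear scan.

```javascript
// Hypothetical excerpt of Public Suffix List rules; the real file has thousands,
// including wildcard ("*.") and exception ("!") rules like the last two here.
const RULES = ["com", "de", "uk", "co.uk", "org.uk", "uk.com", "*.ck", "!www.ck"];

// True if each rule label equals the corresponding rightmost host label ("*" matches any).
function ruleMatches(ruleLabels, hostLabels) {
  if (ruleLabels.length > hostLabels.length) return false;
  const offset = hostLabels.length - ruleLabels.length;
  return ruleLabels.every((label, i) => label === "*" || label === hostLabels[offset + i]);
}

// Registrable domain = public suffix plus one more label; null if the host is itself a suffix.
function registrableDomain(host, rules = RULES) {
  const labels = host.toLowerCase().split(".");
  let suffixLength = 1; // implicit "*" rule: with no match, the last label is the suffix
  for (const rule of rules) {
    const isException = rule.startsWith("!");
    const ruleLabels = (isException ? rule.slice(1) : rule).split(".");
    if (!ruleMatches(ruleLabels, labels)) continue;
    if (isException) {
      // An exception rule wins outright; its suffix is the rule minus its leftmost label.
      suffixLength = ruleLabels.length - 1;
      break;
    }
    suffixLength = Math.max(suffixLength, ruleLabels.length); // longest matching rule wins
  }
  if (labels.length <= suffixLength) return null;
  return labels.slice(-(suffixLength + 1)).join(".");
}

console.log(registrableDomain("www.db.de"));      // db.de
console.log(registrableDomain("news.bbc.co.uk")); // bbc.co.uk
console.log(registrableDomain("big.uk.com"));     // big.uk.com
```

Stripping the returned suffix and any remaining subdomain labels then gives the bare name the question asks for.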

Basically, what you want is:
google.com -> google.com -> google
www.google.com -> google.com -> google
google.co.uk -> google.co.uk -> google
www.google.co.uk -> google.co.uk -> google
www.google.org -> google.org -> google
www.google.org.uk -> google.org.uk -> google
Optional:
www.google.com -> google.com -> www.google
images.google.com -> google.com -> images.google
mail.yahoo.co.uk -> yahoo.co.uk -> mail.yahoo
mail.yahoo.com -> yahoo.com -> mail.yahoo
www.mail.yahoo.com -> yahoo.com -> mail.yahoo
You don't need to construct an ever-changing regex as 99% of domains will be matched properly if you simply look at the 2nd last part of the name:
(co|com|gov|net|org)
If it is one of these, then you need to keep 3 parts, else 2. Simple. Now, my regex wizardry is no match for that of some other SO'ers, so the best way I've found to achieve this is with some code, assuming you've already stripped off the path:
my @d=split /\./,$domain;     # split the domain part into an array
my $c=@d;                     # count how many parts
my $dest=$d[$c-2].'.'.$d[$c-1];  # use the last 2 parts
if ($d[$c-2]=~m/^(co|com|gov|net|org)$/) {  # is the second-last part exactly one of these?
  $dest=$d[$c-3].'.'.$dest;   # if so, add a third part
};
print $dest;                  # show it
To just get the name, as per your question:
my @d=split /\./,$domain;     # split the domain part into an array
my $c=@d;                     # count how many parts
my $dest;
if ($d[$c-2]=~m/^(co|com|gov|net|org)$/) {  # is the second-last part exactly one of these?
  $dest=$d[$c-3];                      # if so, give the third last
  $dest=$d[$c-4].'.'.$dest if ($c>3);  # optional bit
} else {
  $dest=$d[$c-2];                      # else the second last
  $dest=$d[$c-3].'.'.$dest if ($c>2);  # optional bit
};
print $dest;                           # show it
I like this approach because it's maintenance-free. Unless you want to validate that it's actually a legitimate domain, but that's kind of pointless because you're most likely only using this to process log files and an invalid domain wouldn't find its way in there in the first place.
If you'd like to match "unofficial" subdomains such as bozo.za.net, or bozo.au.uk, bozo.msf.ru just add (za|au|msf) to the regex.
I'd love to see someone do all of this using just a regex, I'm sure it's possible.
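The same second-level heuristic transcribed to JavaScript, as a sketch: splitByHeuristic is a hypothetical name, the co|com|gov|net|org list comes from the answer, and the n >= 3 guard (so a bare co.uk is not read as name plus extension) is an added assumption.

```javascript
// Second-level labels that usually signal a two-part extension (the answer's list).
const SECOND_LEVEL = new Set(["co", "com", "gov", "net", "org"]);

// Returns { domain, name } for a bare hostname, or null if no name is left.
function splitByHeuristic(host) {
  const parts = host.toLowerCase().split(".");
  const n = parts.length;
  // Treat the extension as two parts only if there is room for a name in front of it.
  const extParts = n >= 3 && SECOND_LEVEL.has(parts[n - 2]) ? 2 : 1;
  if (n <= extParts) return null;
  return {
    domain: parts.slice(n - extParts - 1).join("."), // e.g. "google.co.uk"
    name: parts[n - extParts - 1],                    // e.g. "google"
  };
}

console.log(splitByHeuristic("www.google.co.uk")); // { domain: 'google.co.uk', name: 'google' }
console.log(splitByHeuristic("mail.yahoo.com"));   // { domain: 'yahoo.com', name: 'yahoo' }
```

Like the Perl version, it needs no data file, at the cost of misreading hosts such as bl.uk-style second-level registrations.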

/(?:www\.)?((?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6})/i
With this JavaScript regex, the first capture group holds the domain without a leading www and its dot, while keeping the rest of the domain intact. It also matches hosts without www and ccTLDs.

Could you just look for the word before .com (or another extension; the order of the list would be the opposite of the frequency, see here)
and take the first matching group
i.e.
window.location.host.match(/(\w|-)+(?=(\.(com|net|org|info|coop|int|co|ac|ie|co|ai|eu|ca|icu|top|xyz|tk|cn|ga|cf|nl|us|eu|de|hk|am|tv|bingo|blackfriday|gov|edu|mil|arpa|au|ru)(\.|\/|$)))/g)[0]
You can test it by copying this line into the developer console on any tab.
This example works in the following cases:

So if you just have a string and not a window.location you could use...
String.prototype.toUrl = function(){
if(this.length === 0) // empty string: nothing to parse
{
return undefined;
}
var original = this.toString();
var s = original;
if(!original.toLowerCase().startsWith('http'))
{
s = 'http://' + original;
}
s = s.split('/'); // split the (possibly http://-prefixed) string
var protocol = s[0];
var host = s[2];
var relativePath = '';
if(s.length > 3){
for(var i=3;i< s.length;i++)
{
relativePath += '/' + s[i];
}
}
s = host.split('.');
var domain = s[s.length-2] + '.' + s[s.length-1];
return {
original: original,
protocol: protocol,
domain: domain,
host: host,
relativePath: relativePath,
getParameter: function(param)
{
return this.getParameters()[param];
},
getParameters: function(){
var vars = [], hash;
var hashes = this.original.slice(this.original.indexOf('?') + 1).split('&');
for (var i = 0; i < hashes.length; i++) {
hash = hashes[i].split('=');
vars.push(hash[0]);
vars[hash[0]] = hash[1];
}
return vars;
}
};};
How to use:
var str = "http://en.wikipedia.org/wiki/Knopf?q=1&t=2";
var url = str.toUrl();
var host = url.host;
var domain = url.domain;
var original = url.original;
var relativePath = url.relativePath;
var paramQ = url.getParameter('q');
var paramT = url.getParameter('t');
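For comparison, a sketch that lets the built-in URL class do the parsing instead of extending String.prototype. It assumes a runtime with the WHATWG URL API (modern browsers, Node.js 10+); parseUrlString is a hypothetical name, and domain deliberately keeps the same naive "last two labels" rule, so google.co.uk-style hosts still come out wrong.

```javascript
// Hypothetical helper; relies on the standard URL and URLSearchParams classes.
function parseUrlString(str) {
  // Mirror toUrl: prepend http:// when no scheme is present.
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(str) ? str : "http://" + str;
  const url = new URL(withScheme);
  const labels = url.hostname.split(".");
  return {
    original: str,
    protocol: url.protocol,             // e.g. "http:"
    host: url.hostname,                 // without the port
    domain: labels.slice(-2).join("."), // same naive rule as toUrl
    relativePath: url.pathname,         // the query string is handled by searchParams
    getParameter: (name) => url.searchParams.get(name), // null if missing
  };
}

const parsed = parseUrlString("http://en.wikipedia.org/wiki/Knopf?q=1&t=2");
console.log(parsed.domain);            // wikipedia.org
console.log(parsed.getParameter("q")); // 1
```

URLSearchParams also takes care of percent-decoding and of missing parameters, which the hand-written getParameters does not.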

For a certain purpose I wrote this quick Python function yesterday. It returns the domain from a URL. It's quick and doesn't need any input file listing suffixes. I don't claim it works in all cases, but it does the job I needed for a simple text-mining script.
Output looks like this:
http://www.google.co.uk => google.co.uk
http://24.media.tumblr.com/tumblr_m04s34rqh567ij78k_250.gif => tumblr.com
import re

def getDomain(url):
    parts = url.split("/")
    match = re.match(r"([\w\-]+\.)*([\w\-]+\.\w{2,6}$)", parts[2])
    if match is not None:
        if re.search(r"\.uk", parts[2]):
            match = re.match(r"([\w\-]+\.)*([\w\-]+\.[\w\-]+\.\w{2,6}$)", parts[2])
        return match.group(2)
    else:
        return ''
Seems to work pretty well.
However, it has to be modified to remove domain extensions on output as you wished.

How about this:
=((?:(?:(?:http)s?:)?\/\/)?(?:(?:[a-zA-Z0-9]+)\.?)*(?:(?:[a-zA-Z0-9]+))\.[a-zA-Z0-9]{2,3})
(you may want to add "\/" to the end of the pattern.)
If your goal is to strip URLs passed in as a parameter, you may add the equals sign as the first char, like:
=((?:(?:(?:http)s?:)?\/\/)?(?:(?:[a-zA-Z0-9]+)\.?)*(?:(?:[a-zA-Z0-9]+))\.[a-zA-Z0-9]{2,3}\/)
and replace with "/".
The goal of this example is to get rid of any domain name regardless of the form it appears in
(i.e. to ensure URL parameters don't include domain names, to avoid XSS attacks).

All the answers here are very nice, but all of them will fail sometimes.
I know it is not common to link to something already answered elsewhere, but you'll find that you shouldn't waste your time on an impossible task.
This is because, with domains like mydomain.co.uk, there is no way to know whether an extracted domain is correct.
If you are talking about extracting from URLs that always have http or https in front (or nothing; in that case you have to remove the
filter_var($url, FILTER_VALIDATE_URL)
check below, because FILTER_VALIDATE_URL does not recognize a string that does not begin with a scheme such as http as a URL), you can do it with something simple like this, which never fails for such input:
$url = strtolower('hTTps://www.example.com/w3/forum/index.php');
if (filter_var($url, FILTER_VALIDATE_URL) && substr($url, 0, 4) == 'http') {
    // array order is important: "http://www." must be removed before "http://"
    $domain = str_replace(array("http://www.", "https://www.", "http://", "https://"), array("", "", "", ""), $url);
    $spos = strpos($domain, '/');
    if ($spos !== false) {
        $domain = substr($domain, 0, $spos);
    }
} else {
    $domain = "can't extract a domain";
}
echo $domain;
Check FILTER_VALIDATE_URL default behavior here
But if you want to check a domain for its validity, and ALWAYS be sure that the extracted value is correct, then you have to check against an array of valid top-level domains, as explained here:
https://stackoverflow.com/a/70566657/6399448
or you'll NEVER be sure that the extracted string is the correct domain. Unfortunately, all the answers here will sometimes fail.
P.S. The only answer here that makes sense to me is this one (sorry, I had not read it before; it provides the same solution, even if it does not include an example like the one I mentioned or linked above):
https://stackoverflow.com/a/569219/6399448

I know you actually asked for a regex and were not specific to a language, but in JavaScript you can do it like this. Maybe other languages can parse URLs in a similar way.
Easy JavaScript solution
const domain = (new URL(str)).hostname.replace(/^www\./, "");
I'm leaving this JavaScript solution here for completeness.
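A slightly fuller sketch of the URL approach; hostWithoutWww is a hypothetical name and assumes an absolute URL (new URL throws a TypeError for relative input). Note the ^ anchor: without it, the first "www." anywhere in the hostname would be removed, as in awww.example.com.

```javascript
// Hypothetical helper: hostname of an absolute URL with only a *leading* "www." removed.
function hostWithoutWww(str) {
  // URL lowercases the hostname and keeps the port out of it.
  return new URL(str).hostname.replace(/^www\./, "");
}

console.log(hostWithoutWww("https://www.google.co.uk/search?q=x")); // google.co.uk
console.log(hostWithoutWww("https://awww.example.com"));            // awww.example.com
```

This still keeps other subdomains and the extension; getting from google.co.uk to google needs one of the suffix-list approaches from the other answers.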

In Javascript, the best way to do this is using the tld-extract npm package. Check out an example at the following link.
Below is the code for the same:
var tldExtract = require("tld-extract")
const urls = [
'http://www.mail.yahoo.co.in/',
'https://mail.yahoo.com/',
'https://www.abc.au.uk',
'https://github.com',
'http://github.ca',
'https://www.google.ru',
'https://google.co.uk',
'https://www.yandex.com',
'https://yandex.ru',
]
const tldList = [];
urls.forEach(url => tldList.push(tldExtract(url)))
console.log({tldList})
which results in the following output:
0: Object {tld: "co.in", domain: "yahoo.co.in", sub: "www.mail"}
1: Object {tld: "com", domain: "yahoo.com", sub: "mail"}
2: Object {tld: "uk", domain: "au.uk", sub: "www.abc"}
3: Object {tld: "com", domain: "github.com", sub: ""}
4: Object {tld: "ca", domain: "github.ca", sub: ""}
5: Object {tld: "ru", domain: "google.ru", sub: "www"}
6: Object {tld: "co.uk", domain: "google.co.uk", sub: ""}
7: Object {tld: "com", domain: "yandex.com", sub: "www"}
8: Object {tld: "ru", domain: "yandex.ru", sub: ""}

Found a custom function which works in most of the cases:
function getDomainWithoutSubdomain(url) {
const urlParts = new URL(url).hostname.split('.')
return urlParts
.slice(0)
.slice(-(urlParts.length === 4 ? 3 : 2))
.join('.')
}

You need a list of what domain prefixes and suffixes can be removed. For example:
Prefixes:
www.
Suffixes:
.com
.co.in
.au.uk

#!/usr/bin/perl -w
use strict;
my $url = $ARGV[0];
if($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+)\.[^\/]+/g) {
print $3;
}

/^(?:https?:\/\/)?(?:www\.)?([^\/]+)/i

Just for knowledge:
'http://api.livreto.co/books'.replace(/^(https?:\/\/)([a-z]{3}[0-9]?\.)?(\w+)(\.[a-zA-Z]{2,3})(\.[a-zA-Z]{2,3})?.*$/, '$3$4$5');
// returns "livreto.co"

I know the question is seeking a regex solution, but no regex attempt will cover every case.
I decided to write this method in Python, which only works with URLs that have a single subdomain (i.e. www.mydomain.co.uk) and not multi-level subdomains like www.mail.yahoo.com:
def urlextract(url):
    url_split = url.split(".")
    if len(url_split) <= 2:
        raise Exception("Full url required with subdomain:", url)
    return {'subdomain': url_split[0], 'domain': url_split[1], 'suffix': ".".join(url_split[2:])}

Let's say we have this: http://google.com
and you only want the domain name
let url = "http://google.com";
let domainName = url.split("://")[1];
console.log(domainName);

Use this
(.)(.*?)(.)
then just extract the leading and end points.
Easy, right?

Related

Redirection Based on Query String

Not wanting to bloat up an .htaccess with 300 entries, what would be the javascript I could use to redirect to URLs based on a query string in the request to this single file. For example,
https://www.mywebsite.com/redirect.jhtml?Type=Cool&LinkID=57
The only part I care about is the 57 and then redirect it to wherever:
https://www.anothercoolwebsite/secretworld/
In the following case, take the 34 and redirect:
https://www.mywebsite.com/redirect.jhtml?Type=Cool&LinkID=34
https://www.anoldwebsite.com/cool/file.html
Thank you!

This should work fine. Keep in mind that a server-side solution like a PHP script will work for more clients. Since you mentioned .htaccess, I should let you know about Apache's FallbackResource directive.
Anyway, here is the JS-only solution:
function parseString(){//Parse query string
var queryString=location.search.substring(1);//Remove ? mark
var pair = queryString.split('&'); //Key value pairs
var returnVal={};
pair.forEach(function(item,i){
var currPair = item.split('=');//Give name and value
returnVal[currPair[0]]=currPair[1];
});
return returnVal;
}
var links=["index", "about"];//Sample array of links, make sure this matches up with your LinkID
location.href=links[parseString().LinkID]+".html"; //Redirect based on LinkID
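An alternative sketch using the standard URLSearchParams class and an object keyed by LinkID, so IDs like 57 do not have to line up with array positions. REDIRECTS and redirectTarget are hypothetical names; the two destinations are copied from the question.

```javascript
// Hypothetical LinkID -> destination table (URLs taken from the question as written).
const REDIRECTS = {
  "57": "https://www.anothercoolwebsite/secretworld/",
  "34": "https://www.anoldwebsite.com/cool/file.html",
};

// Returns the target for a query string such as "?Type=Cool&LinkID=57", or null.
function redirectTarget(search) {
  const id = new URLSearchParams(search).get("LinkID"); // a leading "?" is ignored
  // hasOwnProperty guards against IDs like "toString" hitting Object.prototype
  return id !== null && Object.prototype.hasOwnProperty.call(REDIRECTS, id) ? REDIRECTS[id] : null;
}

// In the browser you would then do something like:
// const target = redirectTarget(location.search);
// if (target) location.replace(target);
```

location.replace keeps the redirect page out of the browser history, which is usually what you want for a pure redirector.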

Using url parameters for a page link

What I'm about to ask may sound stupid but I've been trying to figure it out for a few days now. I want to generate a link to a site:
example.github.io/Example/Example
That has a variable or something at the end of it
example.github.io/Example/ExampleVariable
and then read that variable as the page loads. In a perfect world it would look something like this:
http://Example.github.io/Example/Example<script>function(){}</script>
I also need to make sure that the page the user actually goes to or at least ends up on is the original link: i.e.
example.github.io/Example/Example
Any help would be greatly appreciated.
Also, if anyone is wondering: yes, it is on GitHub, if that matters. I barely know PHP, so that's not the best option for me. It's for a ToDo list manager app I've made. There is a load function so users can share lists. The load string (the variable I'm trying to read) looks like this: /LoadNAME#THEME#Item A,Item B,etc.

If you're using GitHub Pages, you could use URL parameters. In that case the URL would look something like this: http://mypage.github.io/test/?myparam=value
Then you could query that with JavaScript and execute something based on the URL parameters it contains.

Alternatively, you can use the old hash (#) trick and put slashes after it:
example.github.io/Example/#/var1/var2/var3
Then a couple of split() calls on window.location.href give you an array of parameters.
/* URL in address bar:
http://localhost/test/js-url-parameters/#/str1/str2/str3/
*/
var docURL = window.location.href,
params = [];
// filter out the website origin "example.github.io" in the OP example
docURL = docURL.replace(window.location.origin, '');
// if /#/ found then we have URL parameters
// grabbing the parameters part of the URL
if (docURL.indexOf('/#/') > -1) {
docURL = docURL.split('/#/')[1];
if (docURL != '') {
// omit the last forward slash if exist
if (docURL[docURL.length - 1] == '/') {
docURL = docURL.substring(0, docURL.length - 1);
}
// split the final URL string to get an array with all params
params = docURL.split('/');
console.log(params);
}
} else {
console.log('No URL parameters found');
}
/* Output:
["str1", "str2", "str3"]
*/
UPDATE:
The above outputs all variables as strings, so to retrieve numeric values you need parseInt() or parseFloat(), depending on your case.
For example, if for this URL:
http://localhost/test/js-url-parameters/#/str1/22/str3/
The above code will output ["str1", "22", "str3"], while we want 22 as an integer. To fix this, just add this:
// for each element in params, if it is Not a Number (NaN) we return
// it as it is, else it's a number so we parseInt it and return it
for(var i in params){
params[i] = isNaN(parseInt(params[i])) ? params[i] : parseInt(params[i]);
}
The above snippet goes right after the params = docURL.split('/'); line.
Now the URL:
http://localhost/test/js-url-parameters/#/str1/22/str3/ outputs ["str1", 22, "str3"], as you see now 22 is a number rather than a string.
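A compact sketch of the same idea that reads window.location.hash directly instead of stripping the origin from href. parseHashParams is a hypothetical name; unlike parseInt (which turns "22abc" into 22), it only converts segments that are entirely numeric, and it percent-decodes each segment.

```javascript
// Hypothetical helper: turn a hash like "#/str1/22/str3/" into ["str1", 22, "str3"].
function parseHashParams(hash) {
  const body = hash.replace(/^#\/?/, ""); // drop the leading "#" or "#/"
  if (body === "") return [];
  return body
    .split("/")
    .filter((part) => part !== "") // ignore empty segments, e.g. from a trailing "/"
    .map((part) => decodeURIComponent(part)) // undo %-encoding such as %20
    .map((part) => (/^-?\d+(\.\d+)?$/.test(part) ? Number(part) : part)); // numeric strings -> numbers
}

console.log(parseHashParams("#/str1/22/str3/")); // [ 'str1', 22, 'str3' ]

// In the browser: parseHashParams(window.location.hash)
```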

How to get base url from FTP address?

For example I have a url like:
ftp://xxx:xxx@ftp.example.com/BigFile.zip
How can I get example.com from this url using javascript/jquery?

You can get the browser to parse the URL for you like this :
var a = document.createElement('a');
a.href = 'ftp://xxx:xxx@ftp.example.com/BigFile.zip';
var host = a.hostname;
That gets you the hostname, which in this case would be ftp.example.com. If for some reason you have to remove the subdomain, you can do:
var domain = host.split('.');
domain.shift();
domain = domain.join('.');
Here are the different parts of a URL: https://developer.mozilla.org/en-US/docs/Web/API/Location#wikiArticle
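Outside the DOM (for example in Node.js 10+), the standard URL class parses the same string, including the user:password part. This is a sketch; dropping the first label is the same naive step as above and would be wrong for hosts like ftp.example.co.uk.

```javascript
// Assumes the WHATWG URL API (Node.js 10+ or a modern browser).
const url = new URL("ftp://xxx:xxx@ftp.example.com/BigFile.zip");

console.log(url.username); // xxx
console.log(url.hostname); // ftp.example.com

// Drop the first label ("ftp"); naive for multi-part extensions.
const domain = url.hostname.split(".").slice(1).join(".");
console.log(domain); // example.com
```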

Here is a version using a JavaScript RegExp:
input = "ftp://xxx:xxx@ftp.example.com/BigFile.zip";
pattern = new RegExp(/ftp:\/\/\S+?@\S+?\.([^\/]+)/);
match = pattern.exec(input);
alert(match[1]);
You can also add the i flag at the end of the regex to make it case-insensitive:
pattern = new RegExp(/ftp:\/\/\S+?@\S+?\.([^\/]+)/i);

You can use jQuery like this:
var url = "ftp://xxx:xxx@ftp.example.com/BigFile.zip";
var ahref = $('<a>', { href:url } )[0]; // create an <a> element
var host = ahref.hostname.split('.').slice(1).join('.'); // example.com

You can have a regex to do this for you.
url = 'ftp://xxx:xxx@ftp.example.com/BigFile.zip'
base_address = url.match(/@.*\//)[0];
base_address = base_address.substring(1, base_address.length-1)
This would contain ftp.example.com though. You can fine tune it as per your need.

I just wanted to try something different (I can't vouch for performance or for a general solution, but it works and, hey, no DOM or RegExp involved):
var x="ftp://xxx:xxx@ftp.example.com/BigFile.zip"
console.log((x.split(".")[1]+ "." + x.split(".")[2]).split("/")[0]);
For the given case it can be even shorter, since it will always be ".com":
console.log(x.split(".")[1]+ ".com");
Another (messy) approach, which also works with .com.something:
console.log(x.substring((x.indexOf("@ftp"))+5,x.indexOf(x.split("/")[3])-1));
This one depends on having "@ftp" and the slashes "/" in the string (at least three of them, i.e. one after the .com.something); for example, it would not work with ftp://xxx:xxx@ftp.example.com
Last update: this will be my best:
without DOM/RegExp, nicer (but also more confusing) than the previous ones,
handles URLs with or without the trailing slash,
still dependent on having "@ftp." in the string,
works with .com.something.whatever
(function (splittedString){
//this is a bit nicer, no regExp, no DOM, avoid abuse of "split"
//method over and over the same string
//check if we have a "/"
if(splittedString.indexOf("/")>=0){
//split one more time only to get what we want.
return (console.log(splittedString.split("/")[0]));
}
else{
return (console.log(splittedString));//else we have what we want
}
})(x.split("@ftp.")[1]);
As always, it depends on how maintainable you want your code to be; I just wanted to honor the saying that there's more than one way to code something. My answer is surely not the best, but you could build on it to improve your solution.

gsub a string in javascript

I'm trying to get only the domain name, i.e. google.com, from JavaScript:
document.location.hostname
This code returns www.google.com.
How can I only get google.com? In this case it would be to either remove the www. or get only the domain name (if there's such a method in javascript).

var host = location.hostname.replace( /www\./g, '' );
The 'g' flag is for 'global', which is needed if you want a true "gsub" (all matches replaced, not just the first).
Better, though, would be to keep only the last two labels of the hostname:
var tld = location.hostname.replace( /^(.+\.)?(\w+\.\w+)$/, '$2' );
This will handle domains like foo.bar.jim.jam.com and give you just jam.com.
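A quick check of that regex on a few hostnames; lastTwoLabels is just a hypothetical wrapper name. The last case shows the limitation other answers point out: two-part extensions such as co.uk are not recognized.

```javascript
// The regex from the answer: keep only the last two dot-separated labels.
const lastTwoLabels = (hostname) => hostname.replace(/^(.+\.)?(\w+\.\w+)$/, "$2");

console.log(lastTwoLabels("www.google.com"));      // google.com
console.log(lastTwoLabels("foo.bar.jim.jam.com")); // jam.com
console.log(lastTwoLabels("www.bbc.co.uk"));       // co.uk (two-part extension not handled)
```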

... I'm in chrome right now, and window.location.host does the trick.
EDIT
So I'm an idiot... BUT hopefully this will redeem me:
An alternative to regex:
var host = window.location.hostname.split('.')
.filter(
function(el, i, array){
return (i >= array.length - 2)
}
)
.join('.');

How to trim a file extension from a String in JavaScript?

For example, assuming that x = filename.jpg, I want to get filename, where filename could be any file name (Let's assume the file name only contains [a-zA-Z0-9-_] to simplify.).
I saw x.substring(0, x.indexOf('.jpg')) on DZone Snippets, but wouldn't x.substring(0, x.length-4) perform better? After all, length is a property and doesn't do character checking, whereas indexOf() is a function and does character checking.

Not sure which would perform faster, but this would be more reliable when it comes to extensions like .jpeg or .html:
x.replace(/\.[^/.]+$/, "")

In node.js, the name of the file without the extension can be obtained as follows.
const path = require('path');
const filename = 'hello.html';
path.parse(filename).name; //=> "hello"
path.parse(filename).ext; //=> ".html"
path.parse(filename).base; //=> "hello.html"
Further explanation at Node.js documentation page.

If you know the length of the extension, you can use x.slice(0, -4) (where 4 is the three characters of the extension and the dot).
If you don't know the length, @John Hartsock's regex would be the right approach.
If you'd rather not use regular expressions, you can try this (less performant):
filename.split('.').slice(0, -1).join('.')
Note that it returns an empty string for files without an extension.

x.length-4 only accounts for extensions of 3 characters. What if you have filename.jpeg or filename.pl?
EDIT:
To answer... sure, if you always have an extension of .jpg, x.length-4 would work just fine.
However, if you don't know the length of your extension, any of a number of solutions are better/more robust.
x = x.replace(/\..+$/, '');
OR
x = x.substring(0, x.lastIndexOf('.'));
OR
x = x.replace(/(.*)\.(.*?)$/, "$1");
OR (with the assumption filename only has one dot)
parts = x.match(/[^\.]+/);
x = parts[0];
OR (also with only one dot)
parts = x.split(".");
x = parts[0];

I like this one because it is a one liner which isn't too hard to read:
filename.substring(0, filename.lastIndexOf('.')) || filename

You can perhaps use the assumption that the last dot will be the extension delimiter.
var x = 'filename.jpg';
var f = x.substr(0, x.lastIndexOf('.'));
If file has no extension, it will return empty string. To fix that use this function
function removeExtension(filename){
var lastDotPosition = filename.lastIndexOf(".");
if (lastDotPosition === -1) return filename;
else return filename.substr(0, lastDotPosition);
}

In Node.js versions prior to 0.12.x:
path.basename(filename, path.extname(filename))
Of course this also works in 0.12.x and later.

I don't know if it's a valid option but I use this:
name = filename.split(".");
// trimming with pop()
name.pop();
// getting the name with join()
name.join('.'); // we split by '.' and we join by '.' to restore other eventual points.
It's not just one operation, I know, but at least it works for any name that contains a dot!
UPDATE: If you want a oneliner, here you are:
(name.split('.').slice(0, -1)).join('.')

This works, even when the delimiter is not present in the string.
String.prototype.beforeLastIndex = function (delimiter) {
return this.split(delimiter).slice(0,-1).join(delimiter) || this + ""
}
"image".beforeLastIndex(".") // "image"
"image.jpeg".beforeLastIndex(".") // "image"
"image.second.jpeg".beforeLastIndex(".") // "image.second"
"image.second.third.jpeg".beforeLastIndex(".") // "image.second.third"
Can also be used as a one-liner like this:
var filename = "this.is.a.filename.txt";
console.log(filename.split(".").slice(0,-1).join(".") || filename + "");
EDIT: This is a more efficient solution:
String.prototype.beforeLastIndex = function (delimiter) {
return this.substr(0,this.lastIndexOf(delimiter)) || this + ""
}

Another one-liner:
x.split(".").slice(0, -1).join(".")

Here's another regex-based solution:
filename.replace(/\.[^.$]+$/, '');
This should only chop off the last segment.

Simple one:
var n = str.lastIndexOf(".");
return n > -1 ? str.substr(0, n) : str;

The accepted answer strips the last extension part only (.jpeg), which might be a good choice in most cases.
I once had to strip all extensions (.tar.gz), and the file names were restricted to not contain dots (so a name like 2015-01-01.backup.tar could not occur):
var name = "2015-01-01_backup.tar.gz";
name.replace(/(\.[^/.]+)+$/, "");

var fileName = "something.extension";
fileName.slice(0, -path.extname(fileName).length) // === "something"

If you have to process a variable that contains the complete path (ex.: thePath = "http://stackoverflow.com/directory/subdirectory/filename.jpg") and you want to return just "filename" you can use:
theName = thePath.split("/").slice(-1).join().split(".").shift();
the result will be theName == "filename";
To try it write the following command into the console window of your chrome debugger:
window.location.pathname.split("/").slice(-1).join().split(".").shift()
If you have to process just the file name and its extension (ex.: theNameWithExt = "filename.jpg"):
theName = theNameWithExt.split(".").shift();
the result will be theName == "filename", the same as above;
Notes:
The first one is a little bit slower because it performs more
operations, but it works in both cases; in other words, it can extract
the file name without extension from a given string that contains either a path or a file name with an extension. The second works only if the given variable contains a file name with an extension, like filename.ext, but is a little bit quicker.
Both solutions work for both local and server files.
But I can't say anything about performance compared with other answers, or about browser or OS compatibility.
working snippet 1: the complete path
var thePath = "http://stackoverflow.com/directory/subdirectory/filename.jpg";
theName = thePath.split("/").slice(-1).join().split(".").shift();
alert(theName);
working snippet 2: the file name with extension
var theNameWithExt = "filename.jpg";
theName = theNameWithExt.split("/").slice(-1).join().split(".").shift();
alert(theName);
working snippet 3: the file name with double extension
var theNameWithExt = "filename.tar.gz";
theName = theNameWithExt.split("/").slice(-1).join().split(".").shift();
alert(theName);

Node.js remove extension from full path keeping directory
https://stackoverflow.com/a/31615711/895245 for example did path/hello.html -> hello, but if you want path/hello.html -> path/hello, you can use this:
#!/usr/bin/env node
const path = require('path');
const filename = 'path/hello.html';
const filename_parsed = path.parse(filename);
console.log(path.join(filename_parsed.dir, filename_parsed.name));
outputs directory as well:
path/hello
https://stackoverflow.com/a/36099196/895245 also achieves this, but I find this approach a bit more semantically pleasing.
Tested in Node.js v10.15.2.

Though it's pretty late, I will add another approach to get the filename without extension using plain old JS-
path.replace(path.substr(path.lastIndexOf('.')), '')

A straightforward answer, if you are using Node.js, is the one in the first comment.
My task was to delete an image in Cloudinary from a Node server, and I only needed the image name.
Example:
const path = require("path")
const image = "xyz.jpg";
const img= path.parse(image).name
console.log(img) // xyz

This is where regular expressions come in handy! JavaScript's .replace() method accepts a regular expression, and you can use it to accomplish what you want:
// assuming var x = filename.jpg or some extension
x = x.replace(/(.*)\.[^.]+$/, "$1");

You can use Node's path module for this.
const path = require('path');
var MYPATH = '/User/HELLO/WORLD/FILENAME.js';
var MYEXT = '.js';
var fileName = path.basename(MYPATH, MYEXT);
var filePath = path.dirname(MYPATH) + '/' + fileName;
Output
> filePath
'/User/HELLO/WORLD/FILENAME'
> fileName
'FILENAME'
> MYPATH
'/User/HELLO/WORLD/FILENAME.js'
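If the extension is not known in advance, a common variation is to pass path.extname() as the second argument of path.basename(). Here is a sketch (baseNameWithoutExt and the sample paths are made up for illustration). It uses path.posix so the separators are the same on every platform:

```javascript
const path = require("path");

// Strip whatever extension the file has, without hard-coding it.
function baseNameWithoutExt(p) {
  return path.posix.basename(p, path.posix.extname(p));
}

console.log(baseNameWithoutExt("/User/HELLO/WORLD/FILENAME.js")); // "FILENAME"
console.log(baseNameWithoutExt("/docs/README.md"));               // "README"
console.log(baseNameWithoutExt("/home/user/.bashrc"));            // ".bashrc" (extname is "" for dotfiles)
```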

This is the code I use to remove the extension from a filename without using either regex or indexOf (Array.prototype.indexOf is missing in IE8, although String.prototype.indexOf is available there). It assumes that the extension is any text after the last '.' character.
It works for:
files without an extension: "myletter"
files with '.' in the name: "my.letter.txt"
unknown length of file extension: "my.letter.html"
Here's the code:
function removeExtension(filename) {
  var substrings = filename.split('.'); // split the string at '.'
  if (substrings.length == 1) {
    return filename; // there was no file extension, file was something like 'myfile'
  }
  substrings.pop(); // remove the last element (the extension)
  return substrings.join('.'); // rejoin the remaining elements, keeping the dots inside the name
}
console.log(removeExtension("my.letter.txt")); // "my.letter"

I like to use a regex for this. It's short and easy to understand.
for (const regexPattern of [
/\..+$/, // Find the first dot and all the content after it.
/\.[^/.]+$/ // Get the last dot and all the content after it.
]) {
console.log("myFont.ttf".replace(regexPattern, ""))
console.log("myFont.ttf.log".replace(regexPattern, ""))
}
/* output
myFont
myFont
myFont
myFont.ttf
*/
The comments above are only a rough explanation. For a more rigorous breakdown, you can check these patterns on regex101:
\..+$
\.[^/.]+$
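One practical difference between the two patterns shows up with full paths. /\..+$/ cuts at the first dot anywhere, including one in a directory name. /\.[^/.]+$/ only removes a final dot-segment that contains no /. Here is a quick sketch with a made-up path:

```javascript
const filePath = "/assets.v2/myFont.ttf";

console.log(filePath.replace(/\..+$/, ""));     // "/assets" (cut at the directory's dot)
console.log(filePath.replace(/\.[^/.]+$/, "")); // "/assets.v2/myFont"
console.log("/assets.v2/README".replace(/\.[^/.]+$/, "")); // "/assets.v2/README" (no extension, unchanged)
```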

We might come across a filename or file path with multiple extension suffixes. Consider the following to trim them all:
const text = "/dir/path/filename.tar.gz";
const output = text.replace(/(\.\w+)+$/, "");
// output: "/dir/path/filename"
It solves the file extension problem especially when the input has multiple extensions.
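Because (\.\w+)+$ removes every trailing dot-segment, it cannot tell a double extension apart from dots that are part of the name. Here is a quick sketch (trimAllExtensions and the sample paths are made up for illustration):

```javascript
// Hypothetical helper wrapping the replace() call above.
const trimAllExtensions = (text) => text.replace(/(\.\w+)+$/, "");

console.log(trimAllExtensions("/dir/path/filename.tar.gz"));  // "/dir/path/filename"
console.log(trimAllExtensions("/dir/path/report.final.pdf")); // "/dir/path/report" (".final" is removed as well)
console.log(trimAllExtensions("/dir/v1.2/notes"));            // "/dir/v1.2/notes" (unchanged)
```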

Another one-liner, assuming our file is a .jpg picture, e.g. var yourStr = 'test.jpg';
yourStr = yourStr.slice(0, -4); // 'test'
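slice(0, -4) silently assumes a four-character suffix such as .jpg; on test.jpeg it would leave "test." behind. Here is a slightly more defensive sketch that only trims when the string really ends with the expected suffix (stripSuffix is a hypothetical helper):

```javascript
// Remove `suffix` from the end of `str`, but only if it is actually there.
function stripSuffix(str, suffix) {
  // The empty-suffix guard avoids slice(0, -0), which would return "".
  return suffix && str.endsWith(suffix) ? str.slice(0, -suffix.length) : str;
}

console.log("test.jpeg".slice(0, -4));         // "test." (wrong for a longer extension)
console.log(stripSuffix("test.jpg", ".jpg"));  // "test"
console.log(stripSuffix("test.jpeg", ".jpg")); // "test.jpeg" (left unchanged)
```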

x.slice(0, -(x.split('.').pop().length + 1));

name.split('.').slice(0, -1).join('.')
That's all, enjoy your coding!

I would use something like x.substring(0, x.lastIndexOf('.')). If you're going for performance, don't go for JavaScript at all :-p No, one more statement really doesn't matter for 99.99999% of all purposes.
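The last three one-liners agree on ordinary names, but all of them return an empty string for a name with no dot. Here is a quick comparison (the function names are made up for this sketch), plus one way to guard the substring version:

```javascript
// The three one-liners above, wrapped as functions for comparison.
const viaSlice = (x) => x.slice(0, -(x.split(".").pop().length + 1));
const viaSplitJoin = (x) => x.split(".").slice(0, -1).join(".");
const viaSubstring = (x) => x.substring(0, x.lastIndexOf("."));

for (const strip of [viaSlice, viaSplitJoin, viaSubstring]) {
  // Each line prints: my.letter ""
  console.log(strip("my.letter.txt"), JSON.stringify(strip("readme")));
}

// A guarded variant of the substring approach that leaves dot-less names alone.
function viaSubstringSafe(x) {
  const i = x.lastIndexOf(".");
  return i === -1 ? x : x.substring(0, i);
}
console.log(viaSubstringSafe("readme")); // "readme"
```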

Develop Reference

JavaScript is the programming language of the Web.

Get domain without subdomain in javascript [duplicate] - javascript


/^(?:www\.)?(.*?)\.(?:com|au\.uk|co\.in)$/
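On the sample inputs from the question, this pattern returns the expected outputs. Here is a quick check. The suffix alternation only contains the three suffixes from the pattern, so any other TLD would have to be added to it:

```javascript
const re = /^(?:www\.)?(.*?)\.(?:com|au\.uk|co\.in)$/;

for (const host of ["www.google.com", "www.mail.yahoo.com", "www.mail.yahoo.co.in", "www.abc.au.uk"]) {
  const match = re.exec(host);
  console.log(host, "->", match ? match[1] : null);
}
// www.google.com -> google
// www.mail.yahoo.com -> mail.yahoo
// www.mail.yahoo.co.in -> mail.yahoo
// www.abc.au.uk -> abc
```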

/[^w{3}\.]([a-zA-Z0-9]([a-zA-Z0-9\-]{0,65}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}/gim
This JavaScript regex is intended to skip www and the following dot while keeping the domain intact. It also matches hosts without www and with ccTLDs.
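One caveat is worth testing here. [^w{3}\.] is a negated character class: it matches a single character that is not w, {, 3, } or ., rather than skipping the literal prefix www. It happens to give the expected result for www.google.com. However, a match can never start on a w, so a host such as wikipedia.org loses its first letter. Here is a quick check:

```javascript
const re = /[^w{3}\.]([a-zA-Z0-9]([a-zA-Z0-9\-]{0,65}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}/gim;

console.log("www.google.com".match(re)[0]); // "google.com"
console.log("wikipedia.org".match(re)[0]);  // "ikipedia.org" (leading "w" is lost)
```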

I know you actually asked for a regex and were not specific to a language, but in JavaScript you can do it like this (other languages may be able to parse URLs in a similar way). Easy JavaScript solution:
const domain = (new URL(str)).hostname.replace("www.", "");
I'm leaving this JS solution here for completeness.
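With a string argument, replace("www.", "") removes the first "www." anywhere in the hostname, not only at the start. Anchoring a regex avoids that. Here is a sketch (stripWww is a hypothetical helper). Like the original, it keeps the TLD, giving google.com rather than google:

```javascript
// Remove a leading "www." from the hostname only.
function stripWww(str) {
  return new URL(str).hostname.replace(/^www\./, "");
}

console.log(stripWww("https://www.google.com/search?q=js")); // "google.com"
console.log(stripWww("https://mywww.example.com/"));         // "mywww.example.com"
console.log(new URL("https://mywww.example.com/").hostname.replace("www.", "")); // "myexample.com"
```

Note also that new URL() throws a TypeError for input without a scheme, such as www.google.com.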

Found a custom function which works in most of the cases:
function getDomainWithoutSubdomain(url) {
  const urlParts = new URL(url).hostname.split('.')
  return urlParts
    .slice(0)
    .slice(-(urlParts.length === 4 ? 3 : 2))
    .join('.')
}
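The function decides only from the number of labels in the hostname, so it is worth checking which inputs it handles. Here is a quick sketch calling it on a few made-up URLs. The redundant .slice(0) is dropped, which does not change the result:

```javascript
function getDomainWithoutSubdomain(url) {
  const urlParts = new URL(url).hostname.split(".");
  // 4 labels: assume something like www.google.co.uk and keep 3; otherwise keep 2.
  return urlParts.slice(-(urlParts.length === 4 ? 3 : 2)).join(".");
}

console.log(getDomainWithoutSubdomain("https://www.google.co.uk"));     // "google.co.uk"
console.log(getDomainWithoutSubdomain("https://mail.google.com"));      // "google.com"
console.log(getDomainWithoutSubdomain("https://www.mail.yahoo.co.in")); // "co.in" (5 labels, heuristic fails)
console.log(getDomainWithoutSubdomain("https://a.b.example.com"));      // "b.example.com" (4 labels, but no two-part suffix)
```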

You need a list of which domain prefixes and suffixes can be removed. For example:
Prefixes: www.
Suffixes: .com .co.in .au.uk
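Here is a minimal sketch of that idea, using only the example prefixes and suffixes listed above. A real list, such as one derived from the Public Suffix List, would be much longer. PREFIXES, SUFFIXES and stripPrefixAndSuffix are hypothetical names. Suffixes are tried longest first, so a longer entry such as .au.uk would win over a shorter one such as .uk if both were listed:

```javascript
// Example lists only; extend as needed.
const PREFIXES = ["www."];
const SUFFIXES = [".com", ".co.in", ".au.uk"];

function stripPrefixAndSuffix(host) {
  let result = host;
  const prefix = PREFIXES.find((p) => result.startsWith(p));
  if (prefix) result = result.slice(prefix.length);
  // Longest suffix first, so ".au.uk" would beat ".uk" if both were listed.
  const suffix = [...SUFFIXES]
    .sort((a, b) => b.length - a.length)
    .find((s) => result.endsWith(s));
  if (suffix) result = result.slice(0, -suffix.length);
  return result;
}

console.log(stripPrefixAndSuffix("www.mail.yahoo.co.in")); // "mail.yahoo"
```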

#!/usr/bin/perl -w
use strict;
my $url = $ARGV[0];
if ($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+)\.[^\/]+/g) {
    print $3;
}
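For comparison with the JavaScript answers, here is a rough JavaScript port of that Perl pattern, written with the [^:]* and [^\/]* quantifiers the regex appears to need (perlStyleDomain is a hypothetical name). The greedy ([^\/]*\.)* keeps as many labels as it can, so the captured group is the second-to-last label. That works for google.com but not for two-part suffixes:

```javascript
const re = /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+)\.[^\/]+/;

function perlStyleDomain(url) {
  const m = re.exec(url);
  return m ? m[3] : null; // corresponds to $3 in the Perl version
}

console.log(perlStyleDomain("www.google.com"));          // "google"
console.log(perlStyleDomain("http://www.google.co.uk")); // "co" (two-part suffix not handled)
```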

/^(?:https?:\/\/)?(?:www\.)?([^\/]+)/i
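Here is a quick sketch of what that pattern captures. Group 1 is everything after an optional scheme and www. up to the first /, so the TLD (and any port) stays in the result:

```javascript
const re = /^(?:https?:\/\/)?(?:www\.)?([^\/]+)/i;

for (const url of ["https://www.google.co.uk/maps", "www.yandex.com", "http://mail.yahoo.com:8080/inbox"]) {
  console.log(re.exec(url)[1]);
}
// google.co.uk
// yandex.com
// mail.yahoo.com:8080
```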

Just for reference:
'http://api.livreto.co/books'.replace(/^(https?:\/\/)([a-z]{3}[0-9]?\.)?(\w+)(\.[a-zA-Z]{2,3})(\.[a-zA-Z]{2,3})?.*$/, '$3$4$5'); // returns livreto.co

Let's say we have http://google.com and you only want the domain name:
let url = "http://google.com";
let domainName = url.split("://")[1];
console.log(domainName); // google.com
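Splitting on :// keeps everything after the scheme, including any path or query string. If only the host name is wanted, the built-in URL class (available in browsers and in Node.js) is a sturdier option. Here is a quick comparison with a made-up URL:

```javascript
const url = "http://google.com/search?q=domain";

console.log(url.split("://")[1]);   // "google.com/search?q=domain"
console.log(new URL(url).hostname); // "google.com"
```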

Use this (.)(.*?)(.) then just extract the leading and end points. Easy, right?

Related

Redirection Based on Query String

Using url parameters for a page link

How to get base url from FTP address?

gsub a string in javascript

How to trim a file extension from a String in JavaScript?

