gsub a string in javascript

I am trying to get only the domain name (e.g. google.com) in JavaScript.
document.location.hostname
This code returns www.google.com.
How can I get only google.com? In this case that means either removing the www. or getting only the domain name (if there is such a method in JavaScript).

var host = location.hostname.replace( /www\./g, '' );
The 'g' flag is for 'global', which is needed if you want a true "gsub" (all matches replaced, not just the first).
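Better, though, would be to keep only the last two labels of the hostname (the registrable domain, when the TLD has a single part):
var domain = location.hostname.replace( /^(.+\.)?(\w+\.\w+)$/, '$2' );
This will handle domains like foo.bar.jim.jam.com and give you just jam.com. It does not handle two-part suffixes such as co.uk.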
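To see how the two replacements behave, here is a small sketch that applies them to plain strings instead of location.hostname. The helper names stripWww and lastTwoLabels are made up for illustration, and the character class [\w-] is my own tweak so that hyphenated labels are accepted.

```javascript
// Hypothetical helpers; in a browser you would pass location.hostname.
function stripWww(hostname) {
  // Anchored with ^ so only a leading "www." is removed.
  return hostname.replace(/^www\./, '');
}

function lastTwoLabels(hostname) {
  // Keeps the final two dot-separated labels; [\w-] also allows hyphens.
  return hostname.replace(/^(.+\.)?([\w-]+\.[\w-]+)$/, '$2');
}

console.log(stripWww('www.google.com'));           // google.com
console.log(lastTwoLabels('foo.bar.jim.jam.com')); // jam.com
console.log(lastTwoLabels('www.bbc.co.uk'));       // co.uk (the known weakness)
```

The last call shows why this approach breaks down for two-part suffixes.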

... I'm in chrome right now, and window.location.host does the trick.
EDIT
So I'm an idiot... BUT hopefully this will redeem:
An alternate to regex:
var host = window.location.hostname.split('.')
  .filter(
    function(el, i, array){
      return (i >= array.length - 2)
    }
  )
  .join('.');

Related

Get domain without subdomain in javascript [duplicate]

How can I fetch a domain name from a URL String?
Examples:
+----------------------+------------+
| input | output |
+----------------------+------------+
| www.google.com | google |
| www.mail.yahoo.com | mail.yahoo |
| www.mail.yahoo.co.in | mail.yahoo |
| www.abc.au.uk | abc |
+----------------------+------------+
Related:
Matching a web address through regex

I once had to write such a regex for a company I worked for. The solution was this:
Get a list of every ccTLD and gTLD available. Your first stop should be IANA. The list from Mozilla looks great at first sight, but lacks ac.uk for example so for this it is not really usable.
Join the list like the example below. A warning: Ordering is important! If org.uk would appear after uk then example.org.uk would match org instead of example.
Example regex:
^(?:.*?\.)?([^.]+)\.(com|net|org|info|coop|int|co\.uk|org\.uk|ac\.uk|uk|__and so on__)$
This worked really well and also matched weird, unofficial top-levels like de.com and friends.
The upside:
Very fast if regex is optimally ordered
The downside of this solution is of course:
Handwritten regex which has to be updated manually if ccTLDs change or get added. Tedious job!
Very large regex so not very readable.

A little late to the party, but:
const urls = [
'www.abc.au.uk',
'https://github.com',
'http://github.ca',
'https://www.google.ru',
'http://www.google.co.uk',
'www.yandex.com',
'yandex.ru',
'yandex'
]
urls.forEach(url => console.log(url.replace(/.+\/\/|www\.|\..+/g, '')))

Extracting the Domain name accurately can be quite tricky mainly because the domain extension can contain 2 parts (like .com.au or .co.uk) and the subdomain (the prefix) may or may not be there. Listing all domain extensions is not an option because there are hundreds of these. EuroDNS.com for example lists over 800 domain name extensions.
I therefore wrote a short php function that uses 'parse_url()' and some observations about domain extensions to accurately extract the url components AND the domain name. The function is as follows:
function parse_url_all($url){
    $url = substr($url,0,4)=='http'? $url: 'http://'.$url;
    $d = parse_url($url);
    $tmp = explode('.',$d['host']);
    $n = count($tmp);
    if ($n>=2){
        if ($n==4 || ($n==3 && strlen($tmp[($n-2)])<=3)){
            $d['domain'] = $tmp[($n-3)].".".$tmp[($n-2)].".".$tmp[($n-1)];
            $d['domainX'] = $tmp[($n-3)];
        } else {
            $d['domain'] = $tmp[($n-2)].".".$tmp[($n-1)];
            $d['domainX'] = $tmp[($n-2)];
        }
    }
    return $d;
}
This simple function will work in almost every case. There are a few exceptions, but these are very rare.
To demonstrate / test this function you can use the following:
$urls = array('www.test.com', 'test.com', 'cp.test.com' .....);
echo "<div style='overflow-x:auto;'>";
echo "<table>";
echo "<tr><th>URL</th><th>Host</th><th>Domain</th><th>Domain X</th></tr>";
foreach ($urls as $url) {
$info = parse_url_all($url);
echo "<tr><td>".$url."</td><td>".$info['host'].
"</td><td>".$info['domain']."</td><td>".$info['domainX']."</td></tr>";
}
echo "</table></div>";
The output will be as follows for the URLs listed:
As you can see, the domain name and the domain name without the extension are consistently extracted whatever the URL that is presented to the function.
I hope that this helps.

/^(?:www\.)?(.*?)\.(?:com|au\.uk|co\.in)$/

There are two ways
Using split
Then just parse that string
var domain;
//find & remove protocol (http, ftp, etc.) and get domain
if (url.indexOf('://') > -1) {
  domain = url.split('/')[2];
} else if (url.indexOf('//') === 0) {
  // protocol-relative URL such as //example.com/path
  domain = url.split('/')[2];
} else {
  domain = url.split('/')[0];
}
//find & remove port number
domain = domain.split(':')[0];
Using Regex
var r = /:\/\/(.[^/]+)/;
"http://stackoverflow.com/questions/5343288/get-url".match(r)[1]
=> stackoverflow.com
Hope this helps

I don't know of any libraries, but the string manipulation of domain names is easy enough.
The hard part is knowing if the name is at the second or third level. For this you will need a data file you maintain (e.g. for .uk it is not always the third level; some organisations, e.g. bl.uk and jet.uk, exist at the second level).
The source of Firefox from Mozilla has such a data file, check the Mozilla licensing to see if you could reuse that.

from urllib.parse import urlparse  # Python 3; the original used the Python 2 urlparse module

GENERIC_TLDS = [
    'aero', 'asia', 'biz', 'com', 'coop', 'edu', 'gov', 'info', 'int', 'jobs',
    'mil', 'mobi', 'museum', 'name', 'net', 'org', 'pro', 'tel', 'travel', 'cat'
]

def get_domain(url):
    url = url.lower()
    hostname = urlparse(url).netloc
    if hostname == '':
        # Force the recognition as a full URL
        hostname = urlparse('http://' + url).netloc
    # Remove the 'user:passw', 'www.' and ':port' parts
    hostname = hostname.split('@')[-1].split(':')[0]
    # lstrip('www.') would strip any leading 'w' or '.' characters, not the prefix
    if hostname.startswith('www.'):
        hostname = hostname[len('www.'):]
    hostname = hostname.split('.')
    num_parts = len(hostname)
    if (num_parts < 3) or (len(hostname[-1]) > 2):
        return '.'.join(hostname[:-1])
    if len(hostname[-2]) > 2 and hostname[-2] not in GENERIC_TLDS:
        return '.'.join(hostname[:-1])
    if num_parts >= 3:
        return '.'.join(hostname[:-2])
This code isn't guaranteed to work with all URLs and doesn't filter those that are grammatically correct but invalid like 'example.uk'.
However it'll do the job in most cases.

It is not possible without using a TLD list to compare against, as there exist many cases like http://www.db.de/ or http://bbc.co.uk/ that a regex will interpret as the domains db.de (correct) and co.uk (wrong).
But even with that you won't succeed if your list does not contain SLDs, too. URLs like http://big.uk.com/ and http://www.uk.com/ would both be interpreted as uk.com (although the domain in the first one is big.uk.com).
Because of that, all browsers use Mozilla's Public Suffix List:
https://en.wikipedia.org/wiki/Public_Suffix_List
You can use it in your code by importing it through this URL:
http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
Feel free to extend my function to extract only the domain name. It doesn't use regex and it is fast:
http://www.programmierer-forum.de/domainnamen-ermitteln-t244185.htm#3471878
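To make the suffix-list idea concrete, here is a minimal sketch of the lookup. It is not the linked function: the Set below is a tiny stand-in I made up for the real Public Suffix List, registrableDomain is a hypothetical name, and the wildcard and exception rules of the real list are ignored.

```javascript
// Tiny stand-in for the Public Suffix List (assumption for illustration).
const PUBLIC_SUFFIXES = new Set(['com', 'de', 'uk', 'co.uk', 'uk.com']);

function registrableDomain(hostname) {
  const labels = hostname.toLowerCase().split('.');
  // Try the longest candidate suffix first by dropping labels from the left.
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join('.');
    if (PUBLIC_SUFFIXES.has(candidate)) {
      // The hostname itself is a suffix: nothing registrable to return.
      if (i === 0) return null;
      // Registrable domain = the suffix plus one label to its left.
      return labels.slice(i - 1).join('.');
    }
  }
  return null; // no known suffix
}

console.log(registrableDomain('www.db.de'));  // db.de
console.log(registrableDomain('bbc.co.uk'));  // bbc.co.uk
console.log(registrableDomain('big.uk.com')); // big.uk.com
```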

Basically, what you want is:
google.com -> google.com -> google
www.google.com -> google.com -> google
google.co.uk -> google.co.uk -> google
www.google.co.uk -> google.co.uk -> google
www.google.org -> google.org -> google
www.google.org.uk -> google.org.uk -> google
Optional:
www.google.com -> google.com -> www.google
images.google.com -> google.com -> images.google
mail.yahoo.co.uk -> yahoo.co.uk -> mail.yahoo
mail.yahoo.com -> yahoo.com -> mail.yahoo
www.mail.yahoo.com -> yahoo.com -> mail.yahoo
You don't need to construct an ever-changing regex as 99% of domains will be matched properly if you simply look at the 2nd last part of the name:
(co|com|gov|net|org)
If it is one of these, then you need to keep 3 parts of the name, else 2. Simple. Now, my regex wizardry is no match for that of some other SO'ers, so the best way I've found to achieve this is with some code, assuming you've already stripped off the path:
my @d=split /\./,$domain;                  # split the domain part into an array
$c=@d;                                     # count how many parts
$dest=$d[$c-2].'.'.$d[$c-1];               # use the last 2 parts
if ($d[$c-2]=~m/^(co|com|gov|net|org)$/) { # is the second-last part exactly one of these?
  $dest=$d[$c-3].'.'.$dest;                # if so, add a third part
};
print $dest;                               # show it
To just get the name, as per your question:
my @d=split /\./,$domain;                  # split the domain part into an array
$c=@d;                                     # count how many parts
if ($d[$c-2]=~m/^(co|com|gov|net|org)$/) { # is the second-last part exactly one of these?
  $dest=$d[$c-3];                          # if so, give the third last
  $dest=$d[$c-4].'.'.$dest if ($c>3);      # optional bit
} else {
  $dest=$d[$c-2];                          # else the second last
  $dest=$d[$c-3].'.'.$dest if ($c>2);      # optional bit
};
print $dest;                               # show it
I like this approach because it's maintenance-free. Unless you want to validate that it's actually a legitimate domain, but that's kind of pointless because you're most likely only using this to process log files and an invalid domain wouldn't find its way in there in the first place.
If you'd like to match "unofficial" subdomains such as bozo.za.net, bozo.au.uk or bozo.msf.ru, just add za|au|msf to the alternation in the regex.
I'd love to see someone do all of this using just a regex, I'm sure it's possible.
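For anyone who wants the same heuristic in JavaScript, here is a rough port. It is a sketch of the idea above, not a translation the author supplied; splitDomain and SECOND_LEVEL are names I picked, and it shares the heuristic's blind spots (it only knows the five labels listed).

```javascript
// Labels that, when second-to-last, suggest a two-part suffix (co.uk, org.uk).
const SECOND_LEVEL = new Set(['co', 'com', 'gov', 'net', 'org']);

function splitDomain(hostname) {
  const parts = hostname.split('.');
  const n = parts.length;
  // Keep three labels when the second-to-last is in the set, else two.
  const keep = n >= 3 && SECOND_LEVEL.has(parts[n - 2]) ? 3 : 2;
  const domainParts = parts.slice(-keep);
  return {
    domain: domainParts.join('.'), // e.g. google.co.uk
    name: domainParts[0],          // e.g. google
  };
}

console.log(splitDomain('www.google.co.uk')); // { domain: 'google.co.uk', name: 'google' }
console.log(splitDomain('www.google.org'));   // { domain: 'google.org', name: 'google' }
```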

/[^w{3}\.]([a-zA-Z0-9]([a-zA-Z0-9\-]{0,65}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}/gim
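This JavaScript regex skips a leading www and the following dot while keeping the domain intact. It also matches hosts without www and with ccTLDs. Note that [^w{3}\.] is a character class (one character that is not w, {, 3, } or a dot), so it consumes one character rather than testing for the literal prefix www.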

Could you just look for the word before .com (or another suffix) and take the first match? (The other suffixes in the list would be ordered opposite to their frequency.)
i.e.
window.location.host.match(/(\w|-)+(?=(\.(com|net|org|info|coop|int|co|ac|ie|co|ai|eu|ca|icu|top|xyz|tk|cn|ga|cf|nl|us|eu|de|hk|am|tv|bingo|blackfriday|gov|edu|mil|arpa|au|ru)(\.|\/|$)))/g)[0]
You can test it by copying this line into the developer console on any tab.
This example works in the following cases:

So if you just have a string and not a window.location you could use...
String.prototype.toUrl = function(){
  // An empty string cannot be parsed
  if (this.length === 0) {
    return undefined;
  }
  var original = this.toString();
  var s = original;
  if (!original.toLowerCase().startsWith('http')) {
    s = 'http://' + original;
  }
  s = s.split('/'); // split the prefixed copy, not the original string
  var protocol = s[0];
  var host = s[2];
  var relativePath = '';
  if (s.length > 3) {
    for (var i = 3; i < s.length; i++) {
      relativePath += '/' + s[i];
    }
  }
  s = host.split('.');
  var domain = s[s.length-2] + '.' + s[s.length-1];
  return {
    original: original,
    protocol: protocol,
    domain: domain,
    host: host,
    relativePath: relativePath,
    getParameter: function(param) {
      return this.getParameters()[param];
    },
    getParameters: function(){
      var vars = [], hash;
      var hashes = this.original.slice(this.original.indexOf('?') + 1).split('&');
      for (var i = 0; i < hashes.length; i++) {
        hash = hashes[i].split('=');
        vars.push(hash[0]);
        vars[hash[0]] = hash[1];
      }
      return vars;
    }
  };
};
How to use.
var str = "http://en.wikipedia.org/wiki/Knopf?q=1&t=2";
var url = str.toUrl(); // toUrl is a method, so it has to be called
var host = url.host;
var domain = url.domain;
var original = url.original;
var relativePath = url.relativePath;
var paramQ = url.getParameter('q');
var paramT = url.getParameter('t');

For a certain purpose I did this quick Python function yesterday. It returns domain from URL. It's quick and doesn't need any input file listing stuff. However, I don't pretend it works in all cases, but it really does the job I needed for a simple text mining script.
Output looks like this :
http://www.google.co.uk => google.co.uk
http://24.media.tumblr.com/tumblr_m04s34rqh567ij78k_250.gif => tumblr.com
import re

def getDomain(url):
    parts = re.split(r"\/", url)
    match = re.match(r"([\w\-]+\.)*([\w\-]+\.\w{2,6}$)", parts[2])
    if match != None:
        if re.search(r"\.uk", parts[2]):
            match = re.match(r"([\w\-]+\.)*([\w\-]+\.[\w\-]+\.\w{2,6}$)", parts[2])
        return match.group(2)
    else: return ''
Seems to work pretty well.
However, it has to be modified to remove domain extensions on output as you wished.

How about this:
=((?:(?:(?:http)s?:)?\/\/)?(?:(?:[a-zA-Z0-9]+)\.?)*(?:(?:[a-zA-Z0-9]+))\.[a-zA-Z0-9]{2,3})
(You may want to add "\/" to the end of the pattern.)
If your goal is to strip URLs passed in as a parameter, you may add the equals sign as the first character, like:
=((?:(?:(?:http)s?:)?//)?(?:(?:[a-zA-Z0-9]+).?)*(?:(?:[a-zA-Z0-9]+)).[a-zA-Z0-9]{2,3}/)
and replace with "/".
The goal of this example is to get rid of any domain name regardless of the form it appears in
(i.e. to ensure URL parameters don't include domain names, to avoid XSS attacks).

All the answers here are very nice, but all of them will fail sometimes.
I know it is not common to link to something already answered elsewhere, but you will find that you should not waste your time on an impossible task.
That is because with domains like mydomain.co.uk there is no way to know whether an extracted domain is correct.
If you are extracting from URLs that always have http or https in front, or possibly nothing in front (in which case you have to remove the
filter_var($url, filter_var($url, FILTER_VALIDATE_URL))
check below, because FILTER_VALIDATE_URL does not recognize a string that does not begin with http as a URL), you can also get by with something simple like this, which will not fail for such input:
$url = strtolower('hTTps://www.example.com/w3/forum/index.php');
if( filter_var($url, FILTER_VALIDATE_URL) && substr($url, 0, 4) == 'http' )
{
// array order is !important
$domain = str_replace(array("http://www.","https://www.","http://","https://"), array("","","",""), $url);
$spos = strpos($domain,'/');
if($spos !== false)
{
$domain = substr($domain, 0, $spos);
} } else { $domain = "can't extract a domain"; }
echo $domain;
Check the default behavior of FILTER_VALIDATE_URL here.
But if you want to check a domain for validity and ALWAYS be sure that the extracted value is correct, then you have to check against an array of valid top-level domains, as explained here:
https://stackoverflow.com/a/70566657/6399448
or you will NEVER be sure that the extracted string is the correct domain. Unfortunately, all the answers here will fail sometimes.
P.S. The only answer here that makes sense to me is this one (I had not read it before, sorry; it proposes the same solution, even though it does not provide an example like the one I mentioned or linked above):
https://stackoverflow.com/a/569219/6399448

I know you actually asked for a regex and did not specify a language, but in JavaScript you can do it like this. Maybe other languages can parse URLs in a similar way.
Easy JavaScript solution
const domain = (new URL(str)).hostname.replace(/^www\./, "");
I'm leaving this solution in JS for completeness.

In Javascript, the best way to do this is using the tld-extract npm package. Check out an example at the following link.
Below is the code for the same:
var tldExtract = require("tld-extract")
const urls = [
'http://www.mail.yahoo.co.in/',
'https://mail.yahoo.com/',
'https://www.abc.au.uk',
'https://github.com',
'http://github.ca',
'https://www.google.ru',
'https://google.co.uk',
'https://www.yandex.com',
'https://yandex.ru',
]
const tldList = [];
urls.forEach(url => tldList.push(tldExtract(url)))
console.log({tldList})
which results in the following output:
0: Object {tld: "co.in", domain: "yahoo.co.in", sub: "www.mail"}
1: Object {tld: "com", domain: "yahoo.com", sub: "mail"}
2: Object {tld: "uk", domain: "au.uk", sub: "www.abc"}
3: Object {tld: "com", domain: "github.com", sub: ""}
4: Object {tld: "ca", domain: "github.ca", sub: ""}
5: Object {tld: "ru", domain: "google.ru", sub: "www"}
6: Object {tld: "co.uk", domain: "google.co.uk", sub: ""}
7: Object {tld: "com", domain: "yandex.com", sub: "www"}
8: Object {tld: "ru", domain: "yandex.ru", sub: ""}

Found a custom function which works in most of the cases:
function getDomainWithoutSubdomain(url) {
const urlParts = new URL(url).hostname.split('.')
return urlParts
.slice(0)
.slice(-(urlParts.length === 4 ? 3 : 2))
.join('.')
}

You need a list of what domain prefixes and suffixes can be removed. For example:
Prefixes:
www.
Suffixes:
.com
.co.in
.au.uk
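A minimal sketch of that idea, assuming exactly the lists shown above (the names REMOVABLE_PREFIXES, REMOVABLE_SUFFIXES and stripKnownAffixes are mine, and the lists would need extending for real use):

```javascript
// Lists taken from the answer above; extend them as needed.
const REMOVABLE_PREFIXES = ['www.'];
const REMOVABLE_SUFFIXES = ['.com', '.co.in', '.au.uk'];

function stripKnownAffixes(hostname) {
  let name = hostname;
  const prefix = REMOVABLE_PREFIXES.find(p => name.startsWith(p));
  if (prefix) name = name.slice(prefix.length);
  // Prefer the longest matching suffix in case several entries match.
  const suffix = REMOVABLE_SUFFIXES
    .filter(s => name.endsWith(s))
    .sort((a, b) => b.length - a.length)[0];
  if (suffix) name = name.slice(0, -suffix.length);
  return name;
}

console.log(stripKnownAffixes('www.google.com'));       // google
console.log(stripKnownAffixes('www.mail.yahoo.co.in')); // mail.yahoo
console.log(stripKnownAffixes('www.abc.au.uk'));        // abc
```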

#!/usr/bin/perl -w
use strict;
my $url = $ARGV[0];
if($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+)\.[^\/]+/g) {
print $3;
}

/^(?:https?:\/\/)?(?:www\.)?([^\/]+)/i

Just for knowledge:
'http://api.livreto.co/books'.replace(/^(https?:\/\/)([a-z]{3}[0-9]?\.)?(\w+)(\.[a-zA-Z]{2,3})(\.[a-zA-Z]{2,3})?.*$/, '$3$4$5');
// returns livreto.co

I know the question is seeking a regex solution, but no attempt at one will cover everything.
I decided to write this method in Python, which only works with URLs that have a single subdomain (e.g. www.mydomain.co.uk) and not multi-level subdomains like www.mail.yahoo.com
def urlextract(url):
    url_split = url.split(".")
    if len(url_split) <= 2:
        raise Exception("Full url required with subdomain:", url)
    return {'subdomain': url_split[0], 'domain': url_split[1], 'suffix': ".".join(url_split[2:])}

Let's say we have this: http://google.com
and you only want the domain name
let url = "http://google.com";
let domainName = url.split("://")[1];
console.log(domainName);

Use this
(.)(.*?)(.)
then just extract the leading and end points.
Easy, right?

How to get base url from FTP address?

For example I have a url like:
ftp://xxx:xxx@ftp.example.com/BigFile.zip
How can I get example.com from this url using javascript/jquery?

You can get the browser to parse the URL for you like this :
var a = document.createElement('a');
a.href = 'ftp://xxx:xxx@ftp.example.com/BigFile.zip';
var host = a.hostname;
That gets you the hostname, which in this case would be ftp.example.com. If for some reason you have to remove the subdomain, you can do
var domain = host.split('.');
domain.shift();
domain = domain.join('.');
Here's the different parts to a URL -> https://developer.mozilla.org/en-US/docs/Web/API/Location#wikiArticle
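If you would rather not create a DOM element, the URL class can do the same parsing. This is a sketch, not part of the answer above; hostParts is a hypothetical helper, and the "drop the first label" step is as naive as the shift() version, so it is only right for one-level subdomains.

```javascript
// Sketch: parse with the URL class instead of a temporary <a> element.
function hostParts(urlString) {
  const url = new URL(urlString);
  const labels = url.hostname.split('.');
  return {
    hostname: url.hostname, // credentials and port are not part of hostname
    username: url.username,
    // Naive: drops the first label when there are more than two.
    withoutFirstLabel: labels.length > 2 ? labels.slice(1).join('.') : url.hostname,
  };
}

const parts = hostParts('ftp://xxx:xxx@ftp.example.com/BigFile.zip');
console.log(parts.hostname);          // ftp.example.com
console.log(parts.withoutFirstLabel); // example.com
```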

Here is a solution using a JavaScript RegExp:
input = "ftp://xxx:xxx@ftp.example.com/BigFile.zip";
pattern = new RegExp(/ftp:\/\/\S+?@\S+?\.([^\/]+)/);
match = pattern.exec(input);
alert(match[1]);
You can also use i at the end of the regex to make it case insensitive.
pattern = new RegExp(/ftp:\/\/\S+?@\S+?\.([^\/]+)/i);

You can use jquery like this:
var url = "ftp://xxx:xxx@ftp.example.com/BigFile.zip";
var ahref = $('<a>', { href:url } )[0]; // create an <a> element
var host = ahref.hostname.split('.').slice(1).join('.'); // example.com

You can have a regex to do this for you.
url = 'ftp://xxx:xxx@ftp.example.com/BigFile.zip'
base_address = url.match(/@.*\//)[0];
base_address = base_address.substring(1, base_address.length-1)
This would contain ftp.example.com though. You can fine tune it as per your need.

I just wanted to try/add something different (I can't vouch for performance or generality, but it works, and hey, no DOM or RegExp involved):
var x="ftp://xxx:xxx@ftp.example.com/BigFile.zip"
console.log((x.split(".")[1]+ "." + x.split(".")[2]).split("/")[0]);
For the given case this can be even shorter, since it will always be ".com":
console.log(x.split(".")[1]+ ".com");
Another (messy) approach, which also works with .com.something:
console.log(x.substring((x.indexOf("@ftp"))+5,x.indexOf(x.split("/")[3])-1));
Here we depend on having "@ftp" and the slashes "/" (at least 3 of them, or one after the .com.something); for example, it would not work with: ftp://xxx:xxx@ftp.example.com
Last update: this is my best attempt.
It works without DOM/RegExp and is nicer (but also more confusing) than the previous ones,
solves the problem of whether or not the slashes are present,
is still dependent on having "@ftp." in the string,
and works with .com.something.whatever
(function (splittedString){
  //this is a bit nicer, no regExp, no DOM, avoid abuse of "split"
  //method over and over the same string
  //check if we have a "/"
  if(splittedString.indexOf("/")>=0){
    //split one more time only to get what we want.
    return (console.log(splittedString.split("/")[0]));
  }
  else{
    return (console.log(splittedString));//else we have what we want
  }
})(x.split("@ftp.")[1]);
As always, it depends on how maintainable you want your code to be. I just wanted to honor the saying that there is more than one way to code something. My answer is surely not the best, but based on it you could improve your question.

Regex to find <a> tags containing links to specific file types

I am trying to write a small jQuery / javascript function that searches through all the links on a page, identifies the type of file to which the tag links, and then adds an appropriate class. The purpose of this task is to style the links depending on the type of file at the other end of the link.
So far I have this:
$(document).ready(function(){
$('#rt-mainbody a').each(function(){
linkURL = $(this).attr('href');
var match = linkURL.match("^.*\.(pdf|PDF)$");
if(match != null){$(this).addClass('pdf');}
});
});
Fiddle me this.
And then I would continue the concept to identify, for example, spreadsheet files, Word documents, text files, jpgs, etc.
it works... but the thing is, to me this is super clunky because I have completely botched it together from odds and sods I've found around SO and the internet - I'm sure there must be a neater, more efficient, more readable way of doing this but I have no idea what it might be. Can someone give it a spit and polish for me, please?
Ideally the function should detect (a) that the extension is at the end of the href string, and (b) that the extension is preceded by a dot.
Thanks! :)
EDIT
Wow! Such a response! :) Thanks guys!
When I saw the method using simply the selector it was a bit of a facepalm moment - however the end user I am building this app for is linking to PDFs (and potentially other MIMEs) on a multitude of resource websites and has no control over the case usage of the filenames to which they'll be linking... using the selector is clearly not the way to go because the result would be so inconsistent.
EDIT
And the grand prize goes to @Dave Stein!! :D
The solution I will adopt is a "set it and leave it" script (fiddle me that) which will accommodate any extension, regardless of case, and all I need to do is tweak the CSS for each reasonable eventuality.
It's actually nice to learn that I was already fairly close to the best solution already... more through good luck than good judgement though XD

Well you don't want to use regex to search strings so I like that you narrowed it to just links. I saved off $(this) so you don't have to double call it. I also changed the regex so it's case insensitive. And lastly I made sure that the class being added is whatever the match was. Does this accomplish what you want?
$(document).ready(function(){
  $('#rt-mainbody a').each(function(){
    var $link = $(this),
        linkURL = $link.attr('href'),
        // I can't remember offhand but I think some extensions have numbers too
        match = linkURL.match( /^.*\.([a-z0-9]+)$/i );
    if( match != null ){
      $link.addClass( match[1].toLowerCase() );
    }
  });
});
Oh and I almost forgot, I made sure linkURL was no longer global. :)

"Attribute ends with" selector:
$('#rt-mainbody a[href$=".pdf"], #rt-mainbody a[href$=".PDF"]').addClass('pdf')
EDIT: Or more generally and flexibly:
var types = {
  doc: ['doc', 'docx'],
  pdf: ['pdf'],
  // ...
};
function addLinkClasses(ancestor, types) {
  var $ancestor = $(ancestor);
  $.each(types, function(type, extensions) {
    // declared with var so it does not leak into the global scope
    var selector = $.map(extensions, function(extension) {
      return 'a[href$=".' + extension + '"]';
    }).join(', ');
    $ancestor.find(selector).addClass(type);
  });
}
addLinkClasses('#rt-mainbody', types);
This is case sensitive, so I suggest you canonicalise all extensions to lowercase on your server.

Regex should be /^.*\.(pdf)$/i .

You can use this in your selector (to find all links to pdf files)
a[href$=".pdf"]

use this regex (without quotes):
/\.(pdf|doc)$/i
this regex matches (case insensitive) anything that ends with .pdf, .doc etc.
for dynamic class:
var match = linkURL.match(/\.(pdf|doc)$/i);
match = match ? match[1].toLowerCase() : null;
if (match != null) {
$(this).addClass(match);
}

Another answer, building off of @Amadan's, is:
var extensions = [
  'pdf',
  'jpg',
  'doc'
];
$.each( extensions, function( i, v) {
  $('#rt-mainbody').find( 'a[href$=".' + v + '"], a[href$=".' + v.toUpperCase() + '"]')
    .addClass( v ); // v is the current extension
});

The only suggestion I would make is that you can change your match to inspect the file extension, instead of doing a separate regex search for each possible extension:
var linkURL = $(this).attr('href'); //<--you had accidentally declared linkURL as a global BTW.
var match = linkURL.match(/\.([^.\/]+)$/); // capture everything after the last dot
if(match != null){
  //we can extract the part between the parens in our regex
  var ext = match[1].toLowerCase();
  switch(ext){
    case 'pdf': $(this).addClass('pdf'); break;
    case 'jpg': $(this).addClass('jpg'); break;
    //...
  }
}
This switch statement is mostly useful if you want the option of using class names that are different from your file extensions. If the class name is always the same as the file extension, you can consider changing the regex to something that matches only the file extensions you want
/\.(pdf|jpg|txt)$/i //i for "case insensitive"
and then just do
var ext = match[1].toLowerCase()
$(this).addClass(ext);
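The matching logic can also be pulled out into a plain function so it can be tried without a page. This is just a sketch; extensionClass is a hypothetical name, and stripping the query string and fragment is my own addition, not something the answers above do.

```javascript
// Hypothetical helper: returns a lowercase extension to use as a class name,
// or null when the link has no recognisable extension.
function extensionClass(href) {
  if (typeof href !== 'string') return null; // e.g. an <a> without href
  // Ignore any query string or fragment before looking for the extension.
  const path = href.split(/[?#]/)[0];
  const match = /\.([a-z0-9]+)$/i.exec(path);
  return match ? match[1].toLowerCase() : null;
}

console.log(extensionClass('docs/Report.PDF'));      // pdf
console.log(extensionClass('file.docx?download=1')); // docx
console.log(extensionClass('/about'));               // null
```

Inside the .each() callback the result could be passed to addClass when it is not null. Note that a bare host such as http://example.com would yield com, so you may want to check the result against a list of extensions you care about.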

Checking a specific uri in the current url - Jquery

I am trying to check if the current url
base_url/index.php?name0=value0&name1=value1&name2=value2...
contains a specific name=value. I tried this
var path = $.inArray('name=value', $(location).attr('href').split('&'));
if (path > -1){ triggers my function...}
But I guess that this wouldn't work if the url is url encoded. Is there a way to check if the url contains name=value without checking all the conditions (split('&') or split('%26')) ?

Split will always work, because the & that separates parameters is never encoded in the URL. However, the name or value itself may be encoded. To search for them, you should use encodeURI like this:
var path = $.inArray(encodeURI('name=value'), $(location).attr('href').split('&'));
if (path > -1){ triggers my function...}

You can use core javascript for this:
var parameterName = 'name0';
var parameterValue = 'value0';
var path = decodeURI(location.href).indexOf(parameterName+'='+parameterValue);
if (path > -1){
triggers my function...
}
EDIT: I've tested it more and neither solution is perfect. Mine fails when something precedes the specified name: for example, checking for name0 will also match varname0, which is not correct. Yours (and monshq's) doesn't check the first name/value pair, which follows the ? character.
How can I get query string values? is something you're looking for.
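One way to avoid both pitfalls is to let the URL parser split and decode the query string. The sketch below assumes an environment with URL and URLSearchParams (modern browsers, Node.js); hasParam is a made-up name and the example.com URLs are placeholders.

```javascript
// Sketch: exact name/value check on a query string.
function hasParam(urlString, name, value) {
  const params = new URL(urlString).searchParams;
  // getAll copes with repeated names; values come back percent-decoded.
  return params.getAll(name).includes(value);
}

const base = 'http://example.com/index.php';
console.log(hasParam(base + '?name0=value0&name1=value1', 'name0', 'value0')); // true
console.log(hasParam(base + '?varname0=value0', 'name0', 'value0'));           // false
console.log(hasParam(base + '?q=a%20b', 'q', 'a b'));                          // true
```

In a page you would pass location.href as the first argument.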

How to trim a file extension from a String in JavaScript?

For example, assuming that x = filename.jpg, I want to get filename, where filename could be any file name (Let's assume the file name only contains [a-zA-Z0-9-_] to simplify.).
I saw x.substring(0, x.indexOf('.jpg')) on DZone Snippets, but wouldn't x.substring(0, x.length-4) perform better? Because, length is a property and doesn't do character checking whereas indexOf() is a function and does character checking.

Not sure what would perform faster but this would be more reliable when it comes to extension like .jpeg or .html
x.replace(/\.[^/.]+$/, "")

In node.js, the name of the file without the extension can be obtained as follows.
const path = require('path');
const filename = 'hello.html';
path.parse(filename).name; //=> "hello"
path.parse(filename).ext; //=> ".html"
path.parse(filename).base; //=> "hello.html"
Further explanation at Node.js documentation page.

If you know the length of the extension, you can use x.slice(0, -4) (where 4 is the three characters of the extension and the dot).
If you don't know the length, @John Hartsock's regex would be the right approach.
If you'd rather not use regular expressions, you can try this (less performant):
filename.split('.').slice(0, -1).join('.')
Note that it will fail on files without extension.
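If files without an extension matter, a lastIndexOf-based variant handles them. This is a minimal sketch for bare file names (not full paths); stripExtension is a hypothetical name, and treating a leading dot as "no extension" is my own choice.

```javascript
// Sketch: strip only the last extension; leave dotfiles and
// extensionless names untouched.
function stripExtension(filename) {
  const lastDot = filename.lastIndexOf('.');
  // lastDot <= 0 covers "no dot" (-1) and a leading dot such as ".bashrc" (0).
  return lastDot > 0 ? filename.slice(0, lastDot) : filename;
}

console.log(stripExtension('filename.jpg'));   // filename
console.log(stripExtension('archive.tar.gz')); // archive.tar
console.log(stripExtension('README'));         // README
console.log(stripExtension('.bashrc'));        // .bashrc
```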

x.length-4 only accounts for extensions of 3 characters. What if you have filename.jpeg or filename.pl?
EDIT:
To answer... sure, if you always have an extension of .jpg, x.length-4 would work just fine.
However, if you don't know the length of your extension, any of a number of solutions are better/more robust.
x = x.replace(/\..+$/, '');
OR
x = x.substring(0, x.lastIndexOf('.'));
OR
x = x.replace(/(.*)\.(.*?)$/, "$1");
OR (with the assumption filename only has one dot)
parts = x.match(/[^\.]+/);
x = parts[0];
OR (also with only one dot)
parts = x.split(".");
x = parts[0];

I like this one because it is a one liner which isn't too hard to read:
filename.substring(0, filename.lastIndexOf('.')) || filename

You can perhaps use the assumption that the last dot will be the extension delimiter.
var x = 'filename.jpg';
var f = x.substr(0, x.lastIndexOf('.'));
If file has no extension, it will return empty string. To fix that use this function
function removeExtension(filename){
var lastDotPosition = filename.lastIndexOf(".");
if (lastDotPosition === -1) return filename;
else return filename.substr(0, lastDotPosition);
}

In Node.js versions prior to 0.12.x:
path.basename(filename, path.extname(filename))
Of course this also works in 0.12.x and later.

I don't know if it's a valid option but I use this:
name = filename.split(".");
// trimming with pop()
name.pop();
// getting the name with join()
name.join('.'); // we split by '.' and we join by '.' to restore other eventual points.
It's not just one operation I know, but at least it should always work!
UPDATE: If you want a oneliner, here you are:
(name.split('.').slice(0, -1)).join('.')

This works, even when the delimiter is not present in the string.
String.prototype.beforeLastIndex = function (delimiter) {
return this.split(delimiter).slice(0,-1).join(delimiter) || this + ""
}
"image".beforeLastIndex(".") // "image"
"image.jpeg".beforeLastIndex(".") // "image"
"image.second.jpeg".beforeLastIndex(".") // "image.second"
"image.second.third.jpeg".beforeLastIndex(".") // "image.second.third"
Can also be used as a one-liner like this:
var filename = "this.is.a.filename.txt";
console.log(filename.split(".").slice(0,-1).join(".") || filename + "");
EDIT: This is a more efficient solution:
String.prototype.beforeLastIndex = function (delimiter) {
return this.substr(0,this.lastIndexOf(delimiter)) || this + ""
}

Another one-liner:
x.split(".").slice(0, -1).join(".")

Here's another regex-based solution:
filename.replace(/\.[^.$]+$/, '');
This should only chop off the last segment.

Simple one:
var n = str.lastIndexOf(".");
return n > -1 ? str.substr(0, n) : str;

The accepted answer strips the last extension part only (.jpeg), which might be a good choice in most cases.
I once had to strip all extensions (.tar.gz) and the file names were restricted to not contain dots (so 2015-01-01.backup.tar would not be a problem):
var name = "2015-01-01_backup.tar.gz";
name.replace(/(\.[^/.]+)+$/, "");

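var path = require('path'); // Node.js only
var fileName = "something.extension";
// Note: returns "" when fileName has no extension, because slice(0, -0) is slice(0, 0)
fileName.slice(0, -path.extname(fileName).length) // === "something"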

If you have to process a variable that contains the complete path (ex.: thePath = "http://stackoverflow.com/directory/subdirectory/filename.jpg") and you want to return just "filename" you can use:
theName = thePath.split("/").slice(-1).join().split(".").shift();
the result will be theName == "filename";
To try it write the following command into the console window of your chrome debugger:
window.location.pathname.split("/").slice(-1).join().split(".").shift()
If you have to process just the file name and its extension (ex.: theNameWithExt = "filename.jpg"):
theName = theNameWithExt.split(".").shift();
the result will be theName == "filename", the same as above;
Notes:
The first one is a little slower because it performs more operations, but it works in both cases: it can extract the file name without extension from a string that contains either a path or a file name with extension. The second works only if the given variable contains a file name with extension, like filename.ext, but is a little quicker.
Both solutions work for both local and server files.
I can't say anything about performance compared with other answers, nor about browser or OS compatibility.
working snippet 1: the complete path
var thePath = "http://stackoverflow.com/directory/subdirectory/filename.jpg";
theName = thePath.split("/").slice(-1).join().split(".").shift();
alert(theName);
working snippet 2: the file name with extension
var theNameWithExt = "filename.jpg";
theName = theNameWithExt.split("/").slice(-1).join().split(".").shift();
alert(theName);
working snippet 3: the file name with double extension
var theNameWithExt = "filename.tar.gz";
theName = theNameWithExt.split("/").slice(-1).join().split(".").shift();
alert(theName);
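Note that split(".").shift() keeps only the text before the first dot, so a name such as my.letter.txt becomes my. If only the last extension should be dropped, something along these lines may help. This is a sketch, not a definitive implementation: baseNameNoExt is a hypothetical name, and the handling of ? and # assumes the input may be a URL.

```javascript
// baseNameNoExt is a hypothetical helper: it returns the last path segment
// without its final extension.
function baseNameNoExt(thePath) {
  // Drop a query string or fragment, if any (assumes URL-like input).
  var clean = thePath.split(/[?#]/)[0];
  // Keep only the last path segment.
  var base = clean.split("/").pop();
  var dot = base.lastIndexOf(".");
  // dot <= 0 means "no dot" or a leading dot (e.g. ".htaccess"): keep as is.
  return dot > 0 ? base.slice(0, dot) : base;
}

console.log(baseNameNoExt("http://stackoverflow.com/directory/subdirectory/filename.jpg")); // filename
console.log(baseNameNoExt("/tmp/my.letter.txt?version=2")); // my.letter
console.log(baseNameNoExt("filename.tar.gz")); // filename.tar
```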

Node.js remove extension from full path keeping directory
https://stackoverflow.com/a/31615711/895245 for example did path/hello.html -> hello, but if you want path/hello.html -> path/hello, you can use this:
#!/usr/bin/env node
const path = require('path');
const filename = 'path/hello.html';
const filename_parsed = path.parse(filename);
console.log(path.join(filename_parsed.dir, filename_parsed.name));
outputs directory as well:
path/hello
https://stackoverflow.com/a/36099196/895245 also achieves this, but I find this approach a bit more semantically pleasing.
Tested in Node.js v10.15.2.

Though it's pretty late, I will add another approach to get the filename without extension using plain old JS-
path.replace(path.substr(path.lastIndexOf('.')), '')
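Two edge cases are worth knowing with this one-liner. replace() with a string argument removes the first occurrence of the extension text, which is not necessarily the last one, and when there is no dot at all lastIndexOf returns -1, so substr(-1) selects the final character. The sketch below illustrates both next to a guarded alternative; viaReplace and removeLastExt are hypothetical names chosen for the comparison.

```javascript
// The one-liner from the answer, wrapped in a function for comparison.
function viaReplace(p) {
  return p.replace(p.substr(p.lastIndexOf(".")), "");
}

// Guarded alternative: cut at the last dot, or return the input unchanged.
function removeLastExt(p) {
  var dot = p.lastIndexOf(".");
  return dot === -1 ? p : p.slice(0, dot);
}

console.log(viaReplace("photo.jpg"));               // photo
console.log(viaReplace("notes.txt.backup.txt"));    // notes.backup.txt (first ".txt" removed)
console.log(removeLastExt("notes.txt.backup.txt")); // notes.txt.backup
console.log(viaReplace("README"));                  // RADME (first "E" removed)
console.log(removeLastExt("README"));               // README
```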

A straightforward answer, if you are using Node.js, is the one in the first comment.
My task was to delete an image in Cloudinary from the Node server, and for that I needed only the image name.
Example:
const path = require("path")
const image = "xyz.jpg";
const img= path.parse(image).name
console.log(img) // xyz

This is where regular expressions come in handy! Javascript's .replace() method will take a regular expression, and you can utilize that to accomplish what you want:
// assuming var x = filename.jpg or some extension
x = x.replace(/(.*)\.[^.]+$/, "$1");

You can use Node's path module for this.
var path = require('path'); // Node.js built-in module
var MYPATH = '/User/HELLO/WORLD/FILENAME.js';
var MYEXT = '.js';
var fileName = path.basename(MYPATH, MYEXT);
var filePath = path.dirname(MYPATH) + '/' + fileName;
Output
> filePath
'/User/HELLO/WORLD/FILENAME'
> fileName
'FILENAME'
> MYPATH
'/User/HELLO/WORLD/FILENAME.js'

This is the code I use to remove the extension from a filename, without using either regex or indexOf (indexOf is not supported in IE8). It assumes that the extension is any text after the last '.' character.
It works for:
files without an extension: "myletter"
files with '.' in the name: "my.letter.txt"
unknown length of file extension: "my.letter.html"
Here's the code:
var filename = "my.letter.txt" // some filename
var substrings = filename.split('.'); // split the string at '.'
if (substrings.length == 1)
{
return filename; // there was no file extension, file was something like 'myfile'
}
else
{
substrings.pop(); // remove the last element (the extension)
var name = substrings.join("."); // rejoin the remaining elements with '.' to keep the dots in the name
return name;
}
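The return statements above only work inside a function. Here is a minimal sketch of the same split-and-pop idea wrapped in one, assuming the goal is the name without its last extension (removeExtension is a hypothetical name), run against the three cases listed:

```javascript
// removeExtension is a hypothetical wrapper around the split-and-pop idea.
function removeExtension(filename) {
  var substrings = filename.split("."); // split the string at '.'
  if (substrings.length === 1) {
    return filename; // no '.' at all, so there is no extension to remove
  }
  substrings.pop(); // discard the last element (the extension)
  return substrings.join("."); // restore the dots that belong to the name
}

console.log(removeExtension("myletter"));       // myletter
console.log(removeExtension("my.letter.txt"));  // my.letter
console.log(removeExtension("my.letter.html")); // my.letter
```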

I like to use the regex to do that. It's short and easy to understand.
for (const regexPattern of [
/\..+$/, // Find the first dot and all the content after it.
/\.[^/.]+$/ // Get the last dot and all the content after it.
]) {
console.log("myFont.ttf".replace(regexPattern, ""))
console.log("myFont.ttf.log".replace(regexPattern, ""))
}
/* output
myFont
myFont
myFont
myFont.ttf
*/
The explanation above may not be very rigorous. For a more precise breakdown, you can check the patterns on regex101:
\..+$
\.[^/.]+$

We might come across a file name or file path with multiple extension suffixes. Consider the following to trim them.
text = "/dir/path/filename.tar.gz"
output = text.replace(/(\.\w+)+$/,"")
result of output: "/dir/path/filename"
It solves the file extension problem especially when the input has multiple extensions.
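Because \w+ also matches digits, this pattern treats dotted version numbers as extensions: archive-1.2.3.tar.gz is cut down to archive-1. One possible way around that, sketched below, is to strip only a short list of known compound suffixes and otherwise fall back to removing the last extension. The suffix list and the name stripKnownExt are assumptions made for illustration, not an established convention.

```javascript
// Hypothetical list of compound suffixes; extend it to fit your data.
var COMPOUND_SUFFIXES = [".tar.gz", ".tar.bz2", ".tar.xz"];

// stripKnownExt is a hypothetical helper name.
function stripKnownExt(text) {
  var lower = text.toLowerCase();
  for (var i = 0; i < COMPOUND_SUFFIXES.length; i++) {
    var suffix = COMPOUND_SUFFIXES[i];
    if (lower.endsWith(suffix)) {
      return text.slice(0, -suffix.length); // remove the whole compound suffix
    }
  }
  // Fallback: drop only the last extension, never crossing a "/".
  return text.replace(/\.[^/.]+$/, "");
}

console.log("archive-1.2.3.tar.gz".replace(/(\.\w+)+$/, "")); // archive-1
console.log(stripKnownExt("archive-1.2.3.tar.gz"));           // archive-1.2.3
console.log(stripKnownExt("/dir/path/filename.tar.gz"));      // /dir/path/filename
console.log(stripKnownExt("/dir/path/photo.jpeg"));           // /dir/path/photo
```

The trade-off is that unknown compound extensions lose only their last part, which is usually the safer failure mode.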

Another one liner - we presume our file is a jpg picture >> ex: var yourStr = 'test.jpg';
yourStr = yourStr.slice(0, -4); // 'test'

x.slice(0, -(x.split('.').pop().length + 1));

name.split('.').slice(0, -1).join('.')
that's all enjoy your coding...

I would use something like x.substring(0, x.lastIndexOf('.')). If you're going for performance, don't go for javascript at all :-p No, one more statement really doesn't matter for 99.99999% of all purposes.
