How to get the main domain string using regular expression? - javascript

I have just started using regular expressions and have run into a problem, so it would be really nice if someone could help me out.
Suppose I have a URL like the one below:
$url = http://www.blog.domain.com/page/category=?
and I want only the domain. How can I get it using a regular expression in JavaScript?
Thank you.

This should work too; it is more restrictive and shorter:
var url = "http://www.blog.domain.com/page/category"
var result = url.replace(/^(https?:\/\/)?(.+\.)*(([a-z0-9-]*)\.[a-z]{2,6})(\/.+)$/i,"$4")
If you want "domain.com" and not only "domain", use $3 instead of $4. Note that the pattern expects a path after the domain; a URL without a path does not match and is returned unchanged.
Explanation, step by step:
A valid domain label consists of letters, digits and "-": /([a-z0-9-]*)/i
Domain extension (2-6 letters): /(([a-z0-9-]*)\.[a-z]{2,6})/i
Subdomains: /(.+\.)*(([a-z0-9-]*)\.[a-z]{2,6})/i
A URL starts with http or https: /^https?:\/\/(.+\.)*(([a-z0-9-]*)\.[a-z]{2,6})/i
The scheme is optional when you type a URL: /^(https?:\/\/)?(.+\.)*(([a-z0-9-]*)\.[a-z]{2,6})/i
Then whatever comes after the /: /^(https?:\/\/)?(.+\.)*(([a-z0-9-]*)\.[a-z]{2,6})(\/.+)$/i
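The same pattern can be written with named groups, which makes the pieces from the walkthrough easier to tell apart. This is only a sketch: the function name extractDomain and the group names are made up for illustration, and the pattern keeps the limitations of the original (a path is required, and two-part suffixes such as co.uk are not handled).

```javascript
// Same pattern as in the walkthrough, with named groups.
// The group names "full" and "name" are illustrative, not from the answer.
const DOMAIN_RE =
  /^(?:https?:\/\/)?(?:.+\.)*(?<full>(?<name>[a-z0-9-]*)\.[a-z]{2,6})(?:\/.+)$/i;

// Returns { name, full } or null when the URL does not match the pattern.
function extractDomain(url) {
  const m = DOMAIN_RE.exec(url);
  if (!m) return null;
  return { name: m.groups.name, full: m.groups.full };
}

const parts = extractDomain("http://www.blog.domain.com/page/category");
console.log(parts.name, parts.full);
```

Using exec and checking for null avoids silently getting the whole input back, which is what replace does when nothing matches.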

Try the code below:
var url = "http://www.blog.domain.com/page/category=?";
// https? accepts both http and https; the host is captured without a leading "www."
var match = url.match(/(?:https?:\/\/)?(?:www\.)?(.*?)\//);
console.log(match[match.length-1]);

You can get it using the following regex: /.*\.(.+)\.(?:com|org|gov)/
You can add all of the supported domain extensions to the alternation in this regex.
RegEx101 Explanation
Working Code Snippet:
var url = "http://www.blog.domain.gov/page/category=?";
// (?:com|org|gov) is an alternation; square brackets would match only a single character
var regEx = /.*\.(.+)\.(?:com|org|gov)/;
alert(url.match(regEx)[1]);

Do not use a regex for this; use hostname:
The URLUtils.hostname property is a DOMString containing the domain of
the URL.
var x = new URL("http://www.blog.domain.com/page/category=?").hostname;
console.log(x);
As pointed out by vishwanath, URL has compatibility issues with IE<10, so for those cases a regex will be needed.
Use this:
var str = "http://www.blog.domain.com/page/category=?";
// the dot before the TLD list is escaped so that it matches a literal "."
var res = str.match(/[^.]*\.(com|net|org|info|coop|int|co\.uk|org\.uk|ac\.uk|uk)/g);
console.log(res);
=> ["domain.com"]
The list in the regex can be expanded depending on your needs.
A list of TLDs can be found here
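Where URL is available, the two ideas in this answer can be combined: let URL parse the hostname, then trim it against a suffix list. The sketch below assumes a tiny hand-written list of multi-label suffixes (MULTI_LABEL_SUFFIXES) and a helper name (mainDomain) that are both hypothetical; a real list would be much longer.

```javascript
// Hypothetical, incomplete list of multi-label suffixes; extend as needed.
const MULTI_LABEL_SUFFIXES = ["co.uk", "org.uk", "ac.uk"];

// Returns the registrable-looking part of the host: one label plus its suffix.
function mainDomain(url) {
  const host = new URL(url).hostname;            // e.g. "www.blog.domain.com"
  const labels = host.split(".");
  const suffix = MULTI_LABEL_SUFFIXES.find(s => host.endsWith("." + s));
  const suffixLabels = suffix ? suffix.split(".").length : 1;
  // Keep the suffix plus the single label in front of it.
  return labels.slice(-(suffixLabels + 1)).join(".");
}

console.log(mainDomain("http://www.blog.domain.com/page/category=?"));
console.log(mainDomain("https://shop.example.co.uk/cart"));
```

Any suffix missing from the list is treated as a single label, so the result is only as good as the list.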

Related

How to validate/block shorten URL in string

I need to detect and block shortened URLs in a string. The string below contains a shortened URL; how can I detect or block it?
Hi #first_name# This is Mondi from Novato Cleaners. May I ask for a favor ? Our google https://bit.ly requires reviews. Could you provide one ?Thank you
For this you need to follow these steps:
1- Extract all URLs from the string.
2- Request each URL and get its original location. This is explained very well here:
How to get domain name from shortened URL with Javascript?
3- Once you have originalUrl, check whether url != originalUrl; if so, it is a shortened URL.
Use a regex to find whether there is a URL in your string, and if there is, replace it with whatever you need in its place:
/(https?:\/\/[^\s]+)/g
var string = "Hi Vignesh This is Mondi from Novato Cleaners. May I ask for a favor ? Our google https://bit.ly requires reviews. Could you provide one ?Thank you";
var protomatch = /(https?:\/\/[^\s]+)/g;
var b = string.replace(protomatch, '');
console.log(b)
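If following redirects is not an option, a simpler (and weaker) check is to extract the URLs and compare their hosts against a list of known shortener domains. Note that this is a different technique from the redirect comparison described in the steps above. The host list and the function name below are assumptions for illustration, not an authoritative list.

```javascript
// Assumed examples of shortener hosts; not an authoritative list.
const SHORTENER_HOSTS = new Set(["bit.ly", "tinyurl.com", "t.co"]);

// Returns the URLs in `text` whose host is on the shortener list.
function findShortenedUrls(text) {
  const urls = text.match(/https?:\/\/[^\s]+/g) || [];
  return urls.filter(u => {
    try {
      return SHORTENER_HOSTS.has(new URL(u).hostname.toLowerCase());
    } catch (e) {
      return false; // not parseable as a URL
    }
  });
}

const message =
  "Our google https://bit.ly requires reviews. See https://example.com/page too.";
console.log(findShortenedUrls(message));
```

A non-empty result can be used to reject the message or to strip those URLs with replace, as in the snippet above.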

Get base url from string with Regex and Javascript

I'm trying to get the base url from a string (So no window.location).
It needs to remove the trailing slash
It needs to be regex (No New URL)
It needs to work with query parameters and anchor links
In other words all the following should return https://apple.com or https://www.apple.com for the last one.
https://apple.com?query=true&slash=false
https://apple.com#anchor=true&slash=false
http://www.apple.com/#anchor=true&slash=true&whatever=foo
These are just examples, urls can have different subdomains like https://shop.apple.co.uk/?query=foo should return https://shop.apple.co.uk - It could be any url like: https://foo.bar
The closest I got is:
const baseUrl = url.replace(/^((\w+:)?\/\/[^\/]+\/?).*$/,'$1').replace(/\/$/, ""); // Base Path & Trailing slash
But this doesn't work with anchor links and queries that start right after the host, without a / before them.
Any idea how I can get it to work in all cases?
You could add # and ? to your negated character class. You don't need .* because that will match until the end of the string.
For your example data, you could match:
^https?:\/\/[^#?\/]+
Regex demo
strings = [
"https://apple.com?query=true&slash=false",
"https://apple.com#anchor=true&slash=false",
"http://www.apple.com/#anchor=true&slash=true&whatever=foo",
"https://foo.bar/?q=true"
];
strings.forEach(s => {
console.log(s.match(/^https?:\/\/[^#?\/]+/)[0]);
})
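Since match returns null when the pattern does not apply, indexing [0] directly throws on input that is not an http(s) URL. A small wrapper, sketched here under the assumption that returning null is acceptable for such input (the name getBaseUrl is made up), avoids that:

```javascript
// Returns scheme + host, or null when the input is not an http(s) URL.
function getBaseUrl(url) {
  const m = url.match(/^https?:\/\/[^#?\/]+/);
  return m ? m[0] : null;
}

[
  "https://apple.com?query=true&slash=false",
  "https://shop.apple.co.uk/?query=foo",
  "apple.com"
].forEach(s => console.log(getBaseUrl(s)));
```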
You could use Web API's built-in URL for this. URL will also provide you with other parsed properties that are easy to get to, like the query string params, the protocol, etc.
Regex is a painful way to do something that the browser makes otherwise very simple.
I know that you asked about using regex, but in the event that you (or someone coming here in the future) really just cares about getting the information out and isn't committed to using regex, maybe this answer will help.
let one = "https://apple.com?query=true&slash=false"
let two = "https://apple.com#anchor=true&slash=false"
let three = "http://www.apple.com/#anchor=true&slash=true&whatever=foo"
let urlOne = new URL(one)
console.log(urlOne.origin)
let urlTwo = new URL(two)
console.log(urlTwo.origin)
let urlThree = new URL(three)
console.log(urlThree.origin)
const baseUrl = url.replace(/^(.*?:\/\/[^?\/#]*).*$/, '$1'); // stops at the first ?, / or # after the host
This will get you everything up to the .com part. You will have to append .com once you pull out the first part of the url.
^http.*?(?=\.com)
Or maybe you could do:
myUrl.replace(/\/?[#?].*$/, "")
to remove the query string and fragment (everything after the host name in the examples above).

how to remove .com, .org,.co.uk,.co.us using regular expression in javascript?

I am working with regular expressions and am having a problem: I cannot remove extensions such as .com, .net or .co.uk from a URL. For example, I have a URL like
url = subdomain.domain.com or domain.co.uk or domain.net
After applying the regular expression I need the output to be only domain. Can someone help me out with this?
//match domain name (with HTTP)
var domainRegex = /(.*?)[^w{3}.]([a-zA-Z0-9]([a-zA-Z0-9-]{0,65}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}/igm;
//match domain name (www. only)
var domainRegex = /[^w{3}.]([a-zA-Z0-9]([a-zA-Z0-9-]{0,65}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}/igm;
//match domain name (alternative)
var domainRegex = /(.*?)\.(com|net|org|info|coop|int|com\.au|co\.uk|org\.uk|ac\.uk)/igm;
//match sub domains: www, dev, int, stage, int.travel, stage.travel
//(slashes and dots are escaped so that the literal parses and dots match literally)
var subDomainRegex = /(http:\/\/|https:\/\/)?(www\.|dev\.)?(int\.|stage\.)?(travel\.)?(.*)/igm;
Source: SitePoint
([^.]+)\.(com|net|org|info|coop|int|co\.uk|org\.uk|ac\.uk|uk|__and so on__)$
or
/^(?:www\.)?(.*?)\.(?:com|au\.uk|co\.in)$/
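The suffix-list idea can also be sketched without a regex, which sidesteps the escaping problems. The list below is an assumption and deliberately short; entries are ordered so that multi-part suffixes such as co.uk are tried before uk. The name domainLabel is hypothetical.

```javascript
// Assumed suffix list; longer entries first so "co.uk" wins over "uk".
const KNOWN_SUFFIXES = ["co.uk", "org.uk", "ac.uk", "com", "net", "org", "uk"];

// Returns the label directly in front of the suffix, or null if no suffix matches.
function domainLabel(host) {
  const h = host.toLowerCase();
  const suffix = KNOWN_SUFFIXES.find(s => h.endsWith("." + s));
  if (!suffix) return null;
  const rest = h.slice(0, -(suffix.length + 1)); // drop ".suffix"
  return rest.split(".").pop();                  // last remaining label
}

["subdomain.domain.com", "domain.co.uk", "domain.net"].forEach(h =>
  console.log(h, "->", domainLabel(h))
);
```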

Regex expression to match the First url after a space followed

I want to match the first URL that is followed by a space, using a regular expression, while the user types in the input box.
For example :
If I type www.google.com, it should be matched only once the URL is followed by a space,
i.e. www.google.com<SPACE>
Code
$(".site").keyup(function()
{
var site=$(this).val();
var exp = /^http(s?):\/\/(\w+:{0,1}\w*)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%#!\-\/]))?/;
var find = site.match(exp);
var url = find? find[0] : null;
if (url === null){
var exp = /[-\w]+(\.[a-z]{2,})+(\S+)?(\/|\/[\w#!:.?+=&%#!\-\/])?/g;
var find = site.match(exp);
url = find? 'http://'+find[0] : null;
}
});
Fiddle
Please help. Thanks in advance.
You should be using a better regex to correctly match the query and fragment parts of your URL. Have a look here (What is the best regular expression to check if a string is a valid URL?) for a regex that follows the IRI/URI structure.
But here's a rudimentary version:
var regex = /[-\w]+(\.[a-z]{2,})+(\/?)([^\s]+)/g;
var text = 'test google.com/?q=foo basdasd www.url.com/test?q=asdasd#cheese something else';
console.log(text.match(regex));
Expected Result:
["google.com/?q=foo", "www.url.com/test?q=asdasd#cheese"]
If you really want to check for URLs, make sure you also handle the scheme, port, username and password parts, just to be safe.
In the context of what you're trying to achieve, you should add some delay so that you don't hurt browser performance. Regex tests can be expensive when the rules are complex, especially when the same rule runs every time a new character is entered. Think about what you're trying to achieve and whether there's a better way to get there.
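One common way to add such a delay is a debounce wrapper, which postpones the work until the user has stopped typing for a moment. The sketch below is a generic helper, not part of the asker's code; the name debounce and the 300 ms figure in the usage comment are assumptions.

```javascript
// Returns a wrapper that runs `fn` only after `waitMs` ms without further calls.
function debounce(fn, waitMs) {
  let timer = null;
  return function (...args) {
    clearTimeout(timer);                                 // cancel the pending run
    timer = setTimeout(() => fn.apply(this, args), waitMs);
  };
}

// Possible usage with the handler from the question (300 ms is an arbitrary choice):
// $(".site").keyup(debounce(function () { /* run the regex here */ }, 300));

let runs = 0;
const check = debounce(() => { runs += 1; console.log("regex ran", runs); }, 50);
check();
check();
check(); // the three rapid calls collapse into a single run after 50 ms
```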
With a lookahead:
var exp = /[-\w]+(\.[a-z]{2,})+(\S+)?(\/|\/[\w#!:.?+=&%#!\-\/])?(?= )/g;
I only added "(?= )" to your regex.
Fiddle
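To see the effect of the lookahead outside the page, the check can be isolated in a small function. This is a sketch with a simplified pattern (the trailing optional group is dropped because \S+ already covers it, and the g flag is omitted because only the first match is wanted); the name completedUrl is made up.

```javascript
// Simplified form of the pattern above; matches a URL only when a space follows it.
const COMPLETED_URL = /[-\w]+(\.[a-z]{2,})+(\S+)?(?= )/;

// Returns the first URL that is already followed by a space, or null.
function completedUrl(input) {
  const m = input.match(COMPLETED_URL);
  return m ? m[0] : null;
}

console.log(completedUrl("www.google.com"));   // still typing: no space yet
console.log(completedUrl("www.google.com "));  // space typed after the URL
```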

change domain portion of links with javascript or jquery

Sorry for my original question being unclear, hopefully by rewording I can better explain what I want to do.
Because of this I need a way to use JavaScript (or jQuery) to do the following:
determine domain of the current page being accessed
identify all the links on the page that use the domain www.domain1.com and replace with www.domain2.com
i.e. if the user is accessing www.domain2.com/index then:
Test 1
should be rewritten dynamically on load to
Test 1
Is it even possible to rewrite only a portion of the url in an href tag?
Your code will loop over all links on the page. Here's a version that only iterates over URLS that need to be replaced.
var linkRewriter = function(a, b) {
$('a[href*="' + a + '"]').each(function() {
$(this).attr('href', $(this).attr('href').replace(a, b));
});
};
linkRewriter('originalDomain.com', 'rewrittenDomain.com');
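A plain string replace also rewrites the text if it happens to appear in the path or query. If only the host should change, the href can be parsed first. The following is a sketch using the standard URL class; the helper name rewriteHost is hypothetical, and relative links are deliberately left untouched.

```javascript
// Swaps the host of an absolute URL; returns the input unchanged otherwise.
function rewriteHost(href, fromHost, toHost) {
  let url;
  try {
    url = new URL(href);
  } catch (e) {
    return href; // relative or invalid URL: leave as is
  }
  if (url.hostname !== fromHost) return href;
  url.hostname = toHost;
  return url.toString();
}

console.log(rewriteHost("http://www.domain1.com/index?x=1", "www.domain1.com", "www.domain2.com"));
console.log(rewriteHost("/local/path", "www.domain1.com", "www.domain2.com"));
```

Inside the jQuery loop this could be called as $(this).attr('href', rewriteHost($(this).attr('href'), a, b)).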
I figured out how to make this work.
<script type="text/javascript">
// link rewriter
$(document).ready (
function link_rewriter(){
var hostadd = location.host;
var vendor = '999.99.999.9';
var localaccess = 'somesite1.';
if (hostadd == vendor) {
$("a").each(function(){
var o = $(this);
var href = o.attr('href');
// skip anchors that have no href attribute
if (!href) return;
var newhref = href.replace(/somesite1/i, "999.99.999.99");
o.attr('href',newhref);
});
}
}
);
</script>
You'll need to involve Java or something server-side to get the IP address. See this:
http://javascript.about.com/library/blip.htm
Replace urls domains using REGEX
This example replaces every URL that uses my-domain.com with my-other-domain.com (both are variables).
You can build dynamic regexes by combining string values and other regex fragments inside a raw string template. Using String.raw prevents JavaScript from processing escape sequences in the template, so the backslashes reach the RegExp constructor unchanged.
// Strings with some data
const domainStr = 'my-domain.com'
const newDomain = 'my-other-domain.com'
// Make sure your string is regex friendly
// This replaces each dot with '\.'.
const regexUrl = /\./gm;
const substr = `\\\.`;
const domain = domainStr.replace(regexUrl, substr);
// domain is a regex friendly string: 'my-domain\.com'
console.log('Regex expression for domain', domain)
// Here you can assemble a complex regex from string pieces.
const re = new RegExp( String.raw `([\'|\"]https:\/\/)(${domain})(\S+[\'|\"])`, 'gm');
// now I'll use the regex expression groups to replace the domain
const domainSubst = `$1${newDomain}$3`;
// const page contains all the html text
const result = page.replace(re, domainSubst);
Note: regex101.com is useful for creating, testing and exporting regex code.
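The example above escapes only dots. If the domain string could contain other regex metacharacters, a general escaping helper is safer. The sketch below uses a commonly used escaping pattern; escapeRegExp and replaceDomain are illustrative names, and the quote handling is a simplification that, unlike the original, also matches a bare domain with no path.

```javascript
// Escapes every character that has a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replaces oldDomain with newDomain in quoted https URLs inside `page`.
function replaceDomain(page, oldDomain, newDomain) {
  const re = new RegExp(
    "(['\"]https:\\/\\/)" + escapeRegExp(oldDomain) + "([^'\"\\s]*['\"])",
    "g"
  );
  // A function replacement avoids surprises if newDomain contains "$".
  return page.replace(re, (match, before, after) => before + newDomain + after);
}

const html = '<a href="https://my-domain.com/a">x</a>';
console.log(replaceDomain(html, "my-domain.com", "my-other-domain.com"));
```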
