Regex to get the domain from a URL and nothing else - javascript

There are lots of posts online like this, but none of them seem to do what I'm trying to do.
Let's say I have a domain in a string:
Extract hostname name from string
And I want to extract the domain name and nothing else (not the protocol, the subdomain or the file extension).
so for
https://stackoverflow.com/questions/8498592/extract-root-domain-name-from-string
I want to get:
stackoverflow.com
Is there any way to do this?

Try this:
var url = 'http://stackoverflow.com/questions/8498592/extract-root-domain-name-from-string';
var domain = url.match(/^https?:\/\/([^\/?#]+)/)[1];
alert(domain);
This looks for a string that starts with http, optionally followed by s, then ://, and then captures as many characters as it can that are not /, ? or #. .match() returns an array here:
['http://stackoverflow.com', 'stackoverflow.com']
So, we use [1] to get the submatch.
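Note that .match() returns null when the string does not match, so indexing with [1] would throw. A minimal guarded sketch follows; the helper names getHost and naiveRootDomain are made up for illustration, and the "last two labels" rule is a naive assumption that breaks for suffixes such as co.uk:

```javascript
// Hypothetical helper: returns the host part of an http(s) URL, or null.
function getHost(url) {
  const match = /^https?:\/\/([^\/?#]+)/.exec(url);
  return match ? match[1] : null;
}

// Naive assumption: the root domain is the last two dot-separated labels.
// This is wrong for multi-part suffixes such as "co.uk".
function naiveRootDomain(host) {
  return host.split('.').slice(-2).join('.');
}

const host = getHost('https://www.stackoverflow.com/questions/8498592');
console.log(host);
console.log(naiveRootDomain(host));
console.log(getHost('not a url'));
```

Handling every public suffix correctly would need a suffix list, which is outside what a single regex can do.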

You can use a simple regex like this:
\/\/(.*?)\/
Here you have a working example:
http://regex101.com/r/iP0uX7/1
Hope this helps.

Related

Node.js how to split querystring value by index

I am trying to figure out how to split a querystring's values in Node.js. This is for a web proxy I am creating. I need to split the querystring by the third '/' for example. https://example.org/bahahhaah to https://example.org. It will also be nice if I knew how to split by the last one for example https://example.org/bahhaa/s2.html to https://example.org/bahhaa. I wanna have these two outputs save into a cookie. I am not sure what code to put for example. If I am not being clear enough, please tell me.
What you are looking for is the built-in Node.js url module. It parses the URL for you (see this question's answer), so you can get the pathname. Then you can .split the pathname to find the parts you are looking for:
const url = require('url');
const URLParts = url.parse('https://example.org/bahahhaah').pathname.split('/');
console.log(URLParts); // yields ['', 'bahahhaah']
URLParts is now an array of your path parts, split on each /. The first element is empty because the pathname starts with a /.
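For the two outputs the question asks for (everything before the third / and everything before the last /), a sketch using the global URL class instead of url.parse might look like this. The function name splitUrl and the returned property names are made up for illustration:

```javascript
// Hypothetical helper: returns the origin and the URL up to its last "/".
function splitUrl(input) {
  const parsed = new URL(input);           // throws on an invalid URL
  const origin = parsed.origin;            // scheme + host, e.g. "https://example.org"
  const lastSlash = parsed.pathname.lastIndexOf('/');
  const parent = origin + parsed.pathname.slice(0, lastSlash);
  return { origin, parent };
}

console.log(splitUrl('https://example.org/bahahhaah'));
console.log(splitUrl('https://example.org/bahhaa/s2.html'));
```

Either value is a plain string, so it can be stored in a cookie with whatever cookie mechanism the proxy already uses.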

javascript Reg Exp to match specific domain name

I have been trying to make a Reg Exp to match the URL with specific domain name.
So if i want to check if this url is from example.com
what reg exp should be the best?
This reg exp should match following type of URLs:
http://api.example.com/...
http://preview.example.com/...
http://www.example.com/...
http://purhcase.example.com/...
Just simple rule, like http://{something}.example.com/{something} then should pass.
Thank you.
I think this is what you're looking for: (https?:\/\/(.+?\.)?example\.com(\/[A-Za-z0-9\-\._~:\/\?#\[\]@!$&'\(\)\*\+,;\=]*)?).
It breaks down as follows:
https?:\/\/ to match http:// or https:// (you didn't mention https, but it seemed like a good idea);
(.+?\.)? to match anything before the first dot (I made it optional so that, for example, http://example.com/ would be found);
example\.com (example.com, of course);
(\/[A-Za-z0-9\-\._~:\/\?#\[\]@!$&'\(\)\*\+,;\=]*)?): a slash followed by every acceptable character in a URL; I made this optional so that http://example.com (without the final slash) would be found.
Example: https://regex101.com/r/kT8lP2/1
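The pattern above is not anchored, so it can also match example.com when it appears later in a longer string. As a sketch only, here is an anchored variant (my own adaptation, not the answer's regex) that requires the host itself to end in example.com. The name onExampleDotCom is made up, and this is not a full URL validator:

```javascript
// Anchored variant: the scheme must start the string and the host must be
// example.com or a subdomain of it, followed by "/", "?", "#", ":" or the end.
const onExampleDotCom = /^https?:\/\/([^\/?#]+\.)?example\.com(?=[\/?#:]|$)/;

const samples = [
  'http://api.example.com/v1',
  'https://example.com',
  'http://notexample.com/',
  'http://example.com.evil.net/',
  'https://www.google.com/q=www.example.com',
];

for (const sample of samples) {
  console.log(sample, onExampleDotCom.test(sample));
}
```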
Use indexOf javascript API. :)
var url = 'http://api.example.com/api/url';
var testUrl = 'example.com';
if (url.indexOf(testUrl) !== -1) {
    console.log('URL passed the test');
} else {
    console.log('URL failed the test');
}
EDIT:
Why use indexOf instead of Regular Expression.
You see, what you have here for matching is a simple string (example.com) not a pattern. If you have a fixed string, then no need to introduce semantic complexity by checking for patterns.
Regular expressions are best suited for deciding if patterns are matched.
For example, if your requirement was something like the domain name should start with ex end with le and between start and end, it should contain alphanumeric characters out of which 4 characters must be upper case. This is the usecase where regular expression would prove beneficial.
You have simple problem so it's unnecessary to employ army of 1000 angels to convince someone who loves you. ;)
Use this:
/^[a-zA-Z0-9_.+-]+@(?:(?:[a-zA-Z0-9-]+\.)?[a-zA-Z]+\.)?(domain|domain2)\.com$/g
To match the specific domain of your choice.
If you want to match only one domain then remove |domain2 from (domain|domain2) portion.
It will help you. https://www.regextester.com/94044
Not sure if this would work for your case, but it would probably be better to rely on the built in URL parser vs. using a regex.
var url = document.createElement('a');
url.href = "http://www.example.com/thing";
You can then read those values through the properties the API gives you:
url.protocol // (http:)
url.host // (www.example.com)
url.pathname // (/thing)
If that doesn't help you, something like this could work, but is likely too brittle:
var url = "http://www.example.com/thing";
var matches = url.match(/:\/\/(.[^\/]+)(.*)/);
// matches would contain something like
// ["://www.example.com/thing", "www.example.com", "/thing"]
These posts could also help:
https://stackoverflow.com/a/3213643/4954530
https://stackoverflow.com/a/6168370
Good luck out there!
There are cases where the domain you're looking for could actually be found in the query section but not in the domain section: https://www.google.com/q=www.example.com
This answer would treat that case better.
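One way to avoid matching the domain inside the query string is to parse the URL first and compare only the hostname. A minimal sketch, assuming an environment with the global URL class (modern browsers and Node.js); the function name isOnDomain is made up for illustration:

```javascript
// Hypothetical helper: true when the URL's hostname is `domain` or a subdomain of it.
function isOnDomain(urlString, domain) {
  let hostname;
  try {
    hostname = new URL(urlString).hostname;
  } catch {
    return false; // not a parseable absolute URL
  }
  return hostname === domain || hostname.endsWith('.' + domain);
}

console.log(isOnDomain('http://api.example.com/api/url', 'example.com'));
console.log(isOnDomain('https://www.google.com/q=www.example.com', 'example.com'));
```

Because only the hostname is compared, an occurrence of example.com in the path or query does not count.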
See this example on regex101.
As you pointed out, you only need example.com (write domain, then an escaped period, then com), so use that in the regex.
Example
UPDATED
See the answer below

How to remove URL from a string completely in Javascript?

I have a string that may contain several url links (http or https). I need a script that would remove all those URLs from the string completely and return that same string without them.
I tried so far:
var url = "and I said http://fdsadfs.com/dasfsdadf/afsdasf.html";
var protomatch = /(https?|ftp):\/\//; // NB: not '.*'
var b = url.replace(protomatch, '');
console.log(b);
but this only removes the http part and keeps the link.
How to write the right regex that it would remove everything that follows http and also detect several links in the string?
Thank you so much!
You can use this regex:
var b = url.replace(/(?:https?|ftp):\/\/[\n\S]+/g, '');
//=> and I said
This regex matches and removes any URL that starts with http:// or https:// or ftp:// and matches up to next space character OR end of input. [\n\S]+ will match across multi lines as well.
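As a sketch of the same idea wrapped in a function, the version below uses the simpler \S+ in place of [\n\S]+ (so a URL ends at any whitespace, including a newline) and also tidies the double spaces that removal leaves behind. The function name removeUrls is made up for illustration:

```javascript
// Hypothetical helper: strips http, https and ftp URLs from a string.
function removeUrls(text) {
  return text
    .replace(/(?:https?|ftp):\/\/\S+/g, '') // drop each URL up to the next whitespace
    .replace(/ {2,}/g, ' ')                 // collapse the gaps left behind
    .trim();
}

console.log(removeUrls('and I said http://fdsadfs.com/dasfsdadf/afsdasf.html'));
console.log(removeUrls('see http://a.com/x and https://b.org now'));
```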
Did you search for a URL parser regex? This question has a few comprehensive answers: Getting parts of a URL (Regex).
That said, if you want something much simpler (and maybe not as perfect), you should remember to capture the entire url string and not just the protocol.
Something like
/(https?|ftp):\/\/[\.[a-zA-Z0-9\/\-]+/
should work better. Notice that the added half parses the rest of the URL after the protocol.

javascript fetch last url without prefix

I am looking to detect the last URL in a text using JavaScript or MooTools. The URL can be without a prefix/scheme.
I am working on URL auto-detection like Facebook's, where a user may enter a URL as www.example.com or as http://www.example.com, and either of them should be detected by JavaScript. Stack Overflow, for example, detects a URL that includes the scheme, but without the scheme it cannot detect the URL. In my case I need both.
Here is some text
'http://www.example.com www.example2.com'
Now I want www.example2.com. It would be even better to get a full array containing both http://www.example.com and www.example2.com.
I searched a lot but couldn't find solution.
Most close to my requirements were Question about URL Validation with Regex and How do I extract a URL from plain text using jQuery?
Any help greatly appreciated.
By combining the info in these two links:
How do I extract a URL from plain text using jQuery?
Detect URLs in text with JavaScript
We can get this:
http://jsfiddle.net/qQwGA/1/
If I understand what you're trying to do, this should cover it.
Given your input string, I think you just want to split it using spaces as separator?
.split(' ') ?
REGEX
/([^:\/?# ]+:)?(\/\/[^\/?# ]*)?[^?# ]+(\?[^# ]*)?(#\S*)?/gi
**SAMPLE CODE**
var str = 'http://www.example.com www.example2.com scheme://username:password@domain:port/path?query_string#fragment_id';
var t = str.match(/([^:\/?# ]+:)?(\/\/[^\/?# ]*)?[^?# ]+(\?[^# ]*)?(#\S*)?/gi);
/*
t contains :
[
"http://www.example.com",
"www.example2.com",
"scheme://username:password@domain:port/path?query_string#fragment_id"
]
*/
**DEMO**
>http://jsfiddle.net/wvYTd/
**DISCUSSION**
This regex will find any substring that looks like a URL in an input string.
No validation is performed on any URL found. For instance, if the input string is 3aBadScheme://hostname, the regex will detect it as a URL. In this example, 3aBadScheme is invalid since a scheme MUST start with a letter.
Excerpt from RFC3986
(...)Scheme names consist of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus ("+"), period ("."), or hyphen ("-").(...)
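Since the regex above performs no validation, it also matches ordinary words. To get the last URL-like token the question asks for, a sketch could filter the matches first. The "must contain a dot" filter is my own assumption, not part of the answer, and the helper names are made up:

```javascript
const urlLike = /([^:\/?# ]+:)?(\/\/[^\/?# ]*)?[^?# ]+(\?[^# ]*)?(#\S*)?/gi;

// Hypothetical helper: all matches that contain a dot (a crude URL heuristic).
function findUrlLike(text) {
  const matches = text.match(urlLike) || [];
  return matches.filter((candidate) => candidate.includes('.'));
}

// Hypothetical helper: the last URL-like token, or null when there is none.
function lastUrlLike(text) {
  const found = findUrlLike(text);
  return found.length > 0 ? found[found.length - 1] : null;
}

console.log(findUrlLike('Here is some text http://www.example.com www.example2.com'));
console.log(lastUrlLike('Here is some text http://www.example.com www.example2.com'));
```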

javascript url regex

How can I allow users to enter both subdomain and domain names without the http:// prefix using a regex in javascript. I need to allow: domainname.com or www.domainname.com or www.domainname.co.uk. I have this at the moment which expects www. :
/^(?=www\.)[A-Za-z0-9_-]+\.+[A-Za-z0-9.\/%&=\?_:;-]+$/ix.test(value);
Try this:
/^(?=www\.)?[A-Za-z0-9_-]+\.+[A-Za-z0-9.\/%&=\?_:;-]+$/
Marking the www group with a ? makes it a zero-or-one match, which is what you want as I understand it.
Tested with http://www.regular-expressions.info/javascriptexample.html
This seems to work when tested on http://www.regextester.com/
/^(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/
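If only bare host names need to be accepted (domainname.com, www.domainname.com, www.domainname.co.uk) and no path or scheme, a stricter sketch is possible. The pattern and the name looksLikeDomain below are my own illustration rather than either answer's regex, and they assume ASCII-only names:

```javascript
// One or more labels of letters, digits and inner hyphens, each followed by
// a dot, then a final label of at least two letters.
const domainPattern = /^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}$/i;

// Hypothetical helper wrapping the pattern.
function looksLikeDomain(value) {
  return domainPattern.test(value);
}

['domainname.com', 'www.domainname.co.uk', 'http://domainname.com', 'domainname']
  .forEach((value) => console.log(value, looksLikeDomain(value)));
```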
