Thanks for your help!
I'm looking to convert URLs in a string to links using JavaScript.
I have the following regex:
/https?:\/\/(www\.)?[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)/gi
which seems to work pretty well for any URL starting with http://, https://, http://www. or https://www.
However, it doesn't work for URLs starting with just www., such as www.url.com
How can the regex be modified to work with the following:
http://
https://
http://www.
https://www.
www.
Thanks again for your help! Have a great day!
You can use alternation:
(?:https?:\/\/(www\.)?|www\.)[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9#:%_\+.~#?&\/=]*)
const regex = /(?:https?:\/\/(www\.)?|www\.)[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9#:%_\+.~#?&\/=]*)/i;
const strs = ['http://example.com','https://example.com','http://www.example.com','https://www.example.com','www.example.com','example.com']
strs.forEach(str=>{
console.log(str, ' | ', regex.test(str))
})
If you also want example.com to match, you can make the protocol part optional with (?:https?:\/\/)?:
(?:https?:\/\/)?(www\.)?[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9#:%_\+.~#?&\/=]*)
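Since the original goal was to turn URLs in a string into links, here is a minimal sketch of how the alternation pattern could be plugged into String.prototype.replace. The linkify name and the sample text are made up for illustration, prefixing bare www. matches with http:// is an assumption, and the helper does not HTML-escape its input.

```javascript
// Same alternation pattern as above, with the g flag so every URL is replaced.
const urlPattern = /(?:https?:\/\/(www\.)?|www\.)[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9#:%_\+.~#?&\/=]*)/gi;

// Hypothetical helper: wraps each matched URL in an anchor tag.
function linkify(text) {
  return text.replace(urlPattern, (match) => {
    // A bare "www." match needs a scheme to work as an href (assumed: http).
    const href = /^https?:\/\//i.test(match) ? match : 'http://' + match;
    return '<a href="' + href + '">' + match + '</a>';
  });
}

console.log(linkify('See www.example.com and https://example.org/docs for details'));
```

Note that the trailing character class includes a dot, so a URL that ends a sentence would take the full stop with it.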
I need to retrieve URLs from a string using a regex in JavaScript.
Here is an example string:
"Hello everyone! please join us at: https://oursite.com/. and
please visit our courses websites: http://courseone.com.eu & http://coursetwo.us.
For prod use websocket with this url: wss://localhost:4500/websocket/. and
for staging use this url: ws://localhost:4500/websocket".
Now I want to extract these URLs from the above string, like this:
https://oursite.com/
http://courseone.com.eu
http://coursetwo.us
wss://localhost:4500/websocket/
ws://localhost:4500/websocket
Now I followed this regex given in Detect URLs in text with JavaScript
/(https?:\/\/[^\s]+)/g;
But it doesn't work properly for me, since I have wss and ws URLs as well.
Can anyone help me with the regex?
The [^\s]+ part of your pattern matches too much.
Depending on the formats and the characters that you want to allow in the links, you can get the desired result from the question by matching optional non-whitespace characters and then ending the match on a character that is not whitespace, a . or a ":
\b(?:http|ws)s?:\/\/\S*[^\s."]
const regex = /\b(?:http|ws)s?:\/\/\S*[^\s."]/g;
const s = `"Hello everyone! please join us at: https://oursite.com/. and
please visit our courses websites: http://courseone.com.eu & http://coursetwo.us.
For prod use websocket with this url: wss://localhost:4500/websocket/. and
for staging use this url: ws://localhost:4500/websocket".`
console.log(s.match(regex));
Or end the match on a word character followed by an optional forward slash:
\b(?:http|ws)s?:\/\/\S*\w\/?
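To see where the two patterns differ, here is a small sketch that runs both on a made-up sentence (not the string from the question) in which a URL is followed by a closing parenthesis. The variable names are only for illustration.

```javascript
const firstPattern = /\b(?:http|ws)s?:\/\/\S*[^\s."]/g;
const secondPattern = /\b(?:http|ws)s?:\/\/\S*\w\/?/g;

// Made-up input: one URL is followed by ")", the other by "/." at the end.
const note = 'Docs (see http://example.com/page) and wss://localhost:4500/websocket/.';

const firstResult = note.match(firstPattern);
const secondResult = note.match(secondPattern);

console.log(firstResult);  // the first URL keeps the ")" after "page"
console.log(secondResult); // each URL stops at its last word character plus an optional "/"
```

Which one fits better depends on the punctuation you expect around the links.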
I need help to match a url that matches only if
the path matches exactly /gummybear/ or /gummybear (case sensitive)
the protocol is http or https
the domain/host/port/hash can be anything
the regex should not match if
the path is /gummybear/foobar
it contains search parameters such as ?query=string
So far I have this:
/^http[s]?:\/\/?[^\/\s]+\/gummybear[\/]?/
examples it should be true for
https://www.example.com:81/gummybear/
http://www.example.com/gummybear#top
https://example.com:81/gummybear#/foobar/?search=params
http://www.exa.mple.com:81/gummybear
https://example.com:81/gummybear/#/exaple/1234/
examples that it should be false for
https://www.example.com:81/foo/gummybear/
https://www.example.com:81/guMmybear/
https://www.example.com:81/gummybear.html
http://www.example.com/gummybear/apple#top
file://example.com:81/gummybear#/foobar/?search=params
http://www.exa.mple.com:81/gummybear?search=apple#lol
https://example.com:81/#/gummybear/
http://www.test.com:81/dir/dir.2/index.htm?q1=0&&test1&test2=value#top
For your specific needs, I can come up with this regex:
^https?://[^/]+/gummybear(?:/?|/?#.*)$
I haven't escaped the slashes, to keep it readable, but for JavaScript you can use:
^https?:\/\/[^\/]+\/gummybear(?:\/?|\/?#.*)$
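A quick way to check the escaped pattern is to run it over the example URLs from the question. This is only a sketch; the variable names are made up.

```javascript
const gummybear = /^https?:\/\/[^\/]+\/gummybear(?:\/?|\/?#.*)$/;

// URLs from the question that should match.
const shouldMatch = [
  'https://www.example.com:81/gummybear/',
  'http://www.example.com/gummybear#top',
  'https://example.com:81/gummybear#/foobar/?search=params',
  'http://www.exa.mple.com:81/gummybear',
  'https://example.com:81/gummybear/#/exaple/1234/',
];

// URLs from the question that should not match.
const shouldNotMatch = [
  'https://www.example.com:81/foo/gummybear/',
  'https://www.example.com:81/guMmybear/',
  'https://www.example.com:81/gummybear.html',
  'http://www.example.com/gummybear/apple#top',
  'file://example.com:81/gummybear#/foobar/?search=params',
  'http://www.exa.mple.com:81/gummybear?search=apple#lol',
  'https://example.com:81/#/gummybear/',
  'http://www.test.com:81/dir/dir.2/index.htm?q1=0&&test1&test2=value#top',
];

const allMatch = shouldMatch.every((url) => gummybear.test(url));
const noneMatch = shouldNotMatch.every((url) => !gummybear.test(url));
console.log(allMatch, noneMatch);
```

Both checks should come out true for these inputs. One caveat: [^\/]+ accepts any character except a slash, so a ? or # placed before the first slash of the path would also be treated as part of the host.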
I have been trying to write a regular expression that matches URLs with a specific domain name.
So if I want to check whether a URL is from example.com, what would be the best regular expression?
It should match the following types of URLs:
http://api.example.com/...
http://preview.example.com/...
http://www.example.com/...
http://purhcase.example.com/...
Just a simple rule: anything like http://{something}.example.com/{something} should pass.
Thank you.
I think this is what you're looking for: (https?:\/\/(.+?\.)?example\.com(\/[A-Za-z0-9\-\._~:\/\?#\[\]@!$&'\(\)\*\+,;\=]*)?).
It breaks down as follows:
https?:\/\/ to match http:// or https:// (you didn't mention https, but it seemed like a good idea);
(.+?\.)? to match anything before the first dot (I made it optional so that, for example, http://example.com/ would be found);
example\.com (example.com, of course);
(\/[A-Za-z0-9\-\._~:\/\?#\[\]@!$&'\(\)\*\+,;\=]*)?: a slash followed by every acceptable character in a URL; I made this optional so that http://example.com (without the final slash) would be found.
Example: https://regex101.com/r/kT8lP2/1
Use indexOf javascript API. :)
var url = 'http://api.example.com/api/url';
var testUrl = 'example.com';
if(url.indexOf(testUrl) !== -1) {
console.log('URL passed the test');
} else{
console.log('URL failed the test');
}
EDIT:
Why use indexOf instead of a regular expression?
What you're matching here is a simple string (example.com), not a pattern. If you have a fixed string, there's no need to add complexity by checking for patterns.
Regular expressions are best suited for deciding whether a pattern matches.
For example, suppose the requirement were that the domain name must start with ex, end with le, and contain alphanumeric characters in between, 4 of which must be uppercase. That's the kind of use case where a regular expression proves beneficial.
You have a simple problem, so it's unnecessary to employ an army of 1000 angels to convince someone who loves you. ;)
Use this:
/^[a-zA-Z0-9_.+-]+@(?:(?:[a-zA-Z0-9-]+\.)?[a-zA-Z]+\.)?(domain|domain2)\.com$/g
To match the specific domain of your choice.
If you want to match only one domain then remove |domain2 from (domain|domain2) portion.
It will help you. https://www.regextester.com/94044
Not sure if this would work for your case, but it would probably be better to rely on the built in URL parser vs. using a regex.
var url = document.createElement('a');
url.href = "http://www.example.com/thing";
You can then read those values using the properties the API gives you:
url.protocol // (http:)
url.host // (www.example.com)
url.pathname // (/thing)
If that doesn't help you, something like this could work, but is likely too brittle:
var url = "http://www.example.com/thing";
var matches = url.match(/:\/\/(.[^\/]+)(.*)/);
// matches would be something like
// ["://www.example.com/thing", "www.example.com", "/thing"]
These posts could also help:
https://stackoverflow.com/a/3213643/4954530
https://stackoverflow.com/a/6168370
Good luck out there!
There are cases where the domain you're looking for could actually be found in the query section but not in the domain section: https://www.google.com/q=www.example.com
This answer would treat that case better.
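To illustrate that false positive, here is a rough sketch that compares a plain substring check with a check on the parsed hostname, using the standard URL constructor. The isFromDomain helper is hypothetical, and it assumes the input is an absolute URL.

```javascript
// Hypothetical helper: true when the URL's host is the domain itself or one of its subdomains.
function isFromDomain(urlString, domain) {
  let hostname;
  try {
    hostname = new URL(urlString).hostname;
  } catch (e) {
    return false; // not an absolute URL, so there is no host to compare
  }
  return hostname === domain || hostname.endsWith('.' + domain);
}

const tricky = 'https://www.google.com/q=www.example.com';

// The substring check is fooled by the domain appearing later in the URL.
console.log(tricky.indexOf('example.com') !== -1);
// The hostname check only looks at the host part.
console.log(isFromDomain(tricky, 'example.com'));
console.log(isFromDomain('http://api.example.com/api/url', 'example.com'));
```

Comparing against '.' + domain also keeps hosts like notexample.com from passing.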
See this example on regex101.
As you pointed out, you only need example.com (the domain, then an escaped period, then com), so use that in the regex.
Example
UPDATED
See the answer below
I have a string that may contain several url links (http or https). I need a script that would remove all those URLs from the string completely and return that same string without them.
I tried so far:
var url = "and I said http://fdsadfs.com/dasfsdadf/afsdasf.html";
var protomatch = /(https?|ftp):\/\//; // NB: not '.*'
var b = url.replace(protomatch, '');
console.log(b);
but this only removes the http part and keeps the link.
How do I write a regex that removes everything that follows http and also detects several links in the string?
Thank you so much!
You can use this regex:
var b = url.replace(/(?:https?|ftp):\/\/[\n\S]+/g, '');
//=> and I said
This regex matches and removes any URL that starts with http://, https:// or ftp:// and runs up to the next space character or the end of the input. Because [\n\S]+ also matches newlines, a match can continue across line breaks.
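Here is a small sketch of the same idea applied to a string with several links. It swaps in \S+ instead of [\n\S]+, which is a deliberate change from the pattern above: with [\n\S]+ a URL at the end of a line also swallows the line break and the first word of the next line. The removeUrls name and the sample strings are made up for illustration.

```javascript
// Hypothetical helper: strips http, https and ftp URLs, then tidies the leftover spaces.
function removeUrls(text) {
  return text
    .replace(/(?:https?|ftp):\/\/\S+/g, '') // drop each URL up to the next whitespace
    .replace(/[ \t]{2,}/g, ' ')             // collapse the gaps the URLs leave behind
    .trim();
}

const several = 'and I said http://fdsadfs.com/dasfsdadf/afsdasf.html and then ftp://files.example.com/x too';
console.log(removeUrls(several));

// For comparison: the [\n\S]+ variant runs past the line break.
const twoLines = 'first http://a.com\nsecond line';
console.log(twoLines.replace(/(?:https?|ftp):\/\/[\n\S]+/g, ''));
```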
Did you search for a url parser regex? This question has a few comprehensive answers Getting parts of a URL (Regex)
That said, if you want something much simpler (and maybe not as perfect), you should remember to capture the entire url string and not just the protocol.
Something like
/(https?|ftp):\/\/[.a-zA-Z0-9\/\-]+/
should work better. Notice that the added part matches the rest of the URL after the protocol.
I use this js code to match a hostname from a string:
url.match(/:\/\/(www\.)?(.[^/:]+)/);
This works when the url has protocol:// at the beginning. For example:
This works fine:
var url = "http://domain.com/page";
url.match(/:\/\/(www\.)?(.[^/:]+)/);
But this doesn't:
var url = "domain.com/page";
url.match(/:\/\/(www\.)?(.[^/:]+)/);
I have tried:
url.match(/(:\/\/)?(www\.)?(.[^/:]+)/);
And that matches the hostname fine when the URL doesn't contain protocol://, but when it does, it only returns the protocol and not the hostname.
How can I match the domain when the URL doesn't contain the protocol?
I used this function from Steven Levithan, it parses urls quite decently.
Here's how you use this function
alert(parseUri("www.domain.com/foo").host)
OK, before you have a brain meltdown from @xanatos's answer, here is a simple regex for basic needs. The other answers are more complete and handle more cases than this regex:
(?:(?:(?:\bhttps?|ftp)://)|^)([-A-Z0-9.]+)/
Group 1 will have your host name. URL parsing is a fragile thing to do with regexes. You were on the right track. You had two regexes that worked partially. I simply combined them.
Edit: I was tired last night. Here is the regex for JavaScript:
if (subject.match(/(?:(?:(?:\bhttps?|ftp):\/\/)|^)([\-a-z0-9.]+)\//i)) {
// Successful match
} else {
// Match attempt failed
}
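Since URL parsing with regexes is fragile, another option is to lean on the built-in URL constructor and only use a small regex to decide whether a scheme is present. This is a sketch under assumptions: getHostname is a made-up name, and inputs without a scheme are treated as http URLs. Note also that the regex above needs a slash after the host, so it would not match "http://domain.com" on its own.

```javascript
// Hypothetical helper: returns the hostname, or null if the input can't be parsed.
function getHostname(input) {
  // Assumption: anything without "scheme://" at the start is treated as an http URL.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
  const withScheme = hasScheme ? input : 'http://' + input;
  try {
    return new URL(withScheme).hostname;
  } catch (e) {
    return null;
  }
}

console.log(getHostname('http://domain.com/page'));
console.log(getHostname('domain.com/page'));
console.log(getHostname('www.domain.com:8080/foo'));
```

Unlike the (www\.)? group in the question, this keeps a leading www. in the result; strip it afterwards if you don't want it.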
This
var rx = /^(?:(?:ht|f)tp(?:s?)\:\/\/|~\/|\/)?(?:\w+:\w+@)?(?:(?:[-\w]+\.)+(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))(?::[\d]{1,5})?(?:(?:(?:\/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|\/)+|\?|#)?(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?$/;
should be the uber-url parsing regex :-) Taken from here http://flanders.co.nz/2009/11/08/a-good-url-regular-expression-repost/
Test here: http://jsfiddle.net/Qznzx/1/
It shows the uselessness of regexes.
This might be a bit more complex than necessary but it seems to work:
^((?:.+?:\/\/)?(?:.[^/:]+)+)$
A non-capturing group for the protocol. From the start of the string
match any number of characters until a :. There may be zero or one
protocol.
A non-capturing group for the rest of the url. This part must exist.
Group it all up in single group.