I need to retrieve the URLs from a string using a regex in JavaScript.
Here is an example string:
"Hello everyone! please join us at: https://oursite.com/. and
please visit our courses websites: http://courseone.com.eu & http://coursetwo.us.
For prod use websocket with this url: wss://localhost:4500/websocket/. and
for staging use this url: ws://localhost:4500/websocket".
Now I want to extract these URLs from the above string, like this:
https://oursite.com/
http://courseone.com.eu
http://coursetwo.us
wss://localhost:4500/websocket/
ws://localhost:4500/websocket
I followed the regex given in Detect URLs in text with JavaScript:
/(https?:\/\/[^\s]+)/g;
But it does not work properly for me, since I also have wss and ws URLs.
Can anyone help me with the regex?
The [^\s]+ part of your pattern matches too much.
Depending on the formats and the characters that you want to allow in the links, you can get the result from the question by matching optional non-whitespace characters and then ending the match on a character that is not whitespace, a . or a ". The (?:http|ws)s? part covers http, https, ws and wss:
\b(?:http|ws)s?:\/\/\S*[^\s."]
const regex = /\b(?:http|ws)s?:\/\/\S*[^\s."]/g;
const s = `"Hello everyone! please join us at: https://oursite.com/. and
please visit our courses websites: http://courseone.com.eu & http://coursetwo.us.
For prod use websocket with this url: wss://localhost:4500/websocket/. and
for staging use this url: ws://localhost:4500/websocket".`
console.log(s.match(regex));
Or end the match on a word character followed by an optional forward slash:
\b(?:http|ws)s?:\/\/\S*\w\/?
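A minimal sketch of this second pattern in use, run on the same sample string. extractUrls is a hypothetical helper name, not from the answer; it returns an empty array instead of null when nothing matches.

```javascript
// Hypothetical helper: extract http(s) and ws(s) URLs using the
// "end on a word character plus an optional slash" pattern.
function extractUrls(text) {
  // (?:http|ws)s? covers http, https, ws and wss.
  // \S*\w\/? makes the match end on a word character, optionally
  // followed by "/", so a trailing "." or '"' is left out.
  const regex = /\b(?:http|ws)s?:\/\/\S*\w\/?/g;
  // String.prototype.match returns null when nothing matches.
  return text.match(regex) || [];
}

const s = `"Hello everyone! please join us at: https://oursite.com/. and
please visit our courses websites: http://courseone.com.eu & http://coursetwo.us.
For prod use websocket with this url: wss://localhost:4500/websocket/. and
for staging use this url: ws://localhost:4500/websocket".`;

console.log(extractUrls(s));
```

With the sample string this logs the same five URLs listed in the question.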
Thanks for your help!
I'm looking to convert URLs in a string into links using JavaScript.
I have the following regex:
/https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)/gi
which seems to work pretty well for any URL starting with http://, https://, http://www. or https://www..
However, it doesn't work for URLs starting with just www, such as www.url.com.
How can the regex be modified to work with the following:
http://
https://
http://www.
https://www.
www.
Thanks again for your help! Have a great day!
You can use alternation, so that the match starts with either http(s):// (optionally followed by www.) or a bare www.:
(?:https?:\/\/(www\.)?|www\.)[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&\/=]*)
const regex = /(?:https?:\/\/(www\.)?|www\.)[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&\/=]*)/i;
const strs = ['http://example.com', 'https://example.com', 'http://www.example.com', 'https://www.example.com', 'www.example.com', 'example.com'];
// The last entry has neither a protocol nor www., so it does not match.
strs.forEach(str => {
  console.log(str, ' | ', regex.test(str));
});
If you also want example.com to match, make the protocol optional with (?:https?:\/\/)? and keep (www\.)? as a separate optional group:
(?:https?:\/\/)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&\/=]*)
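A sketch of the same test loop with both prefixes optional. The names looseUrl and candidates are illustrative. The last two entries are extra, not from the answer: they show that once no prefix is required, a plain file name such as notes.txt also passes, while text without a dot still fails.

```javascript
// Variant where the protocol and www. are both optional.
const looseUrl = /(?:https?:\/\/)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&\/=]*)/i;

const candidates = [
  'http://example.com',
  'https://example.com',
  'http://www.example.com',
  'https://www.example.com',
  'www.example.com',
  'example.com',
  'notes.txt',   // not a URL, but the loose pattern still accepts it
  'no dot here', // rejected: there is no "." for \. to match
];

candidates.forEach(str => {
  console.log(str, ' | ', looseUrl.test(str));
});
```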
I need help writing a regex that matches a URL only if:
the path is exactly /gummybear/ or /gummybear (case-sensitive)
the protocol is http or https
the domain/host/port/hash can be anything
The regex should not match if:
the path is /gummybear/foobar
it contains search parameters such as ?query=string
So far I have this:
/^http[s]?:\/\/?[^\/\s]+\/gummybear[\/]?/
Examples it should match:
https://www.example.com:81/gummybear/
http://www.example.com/gummybear#top
https://example.com:81/gummybear#/foobar/?search=params
http://www.exa.mple.com:81/gummybear
https://example.com:81/gummybear/#/exaple/1234/
Examples it should not match:
https://www.example.com:81/foo/gummybear/
https://www.example.com:81/guMmybear/
https://www.example.com:81/gummybear.html
http://www.example.com/gummybear/apple#top
file://example.com:81/gummybear#/foobar/?search=params
http://www.exa.mple.com:81/gummybear?search=apple#lol
https://example.com:81/#/gummybear/
http://www.test.com:81/dir/dir.2/index.htm?q1=0&&test1&test2=value#top
For your specific needs, I can come up with this regex:
^https?://[^/]+/gummybear(?:/?|/?#.*)$
I haven't escaped the slashes, to keep it readable, but in a JavaScript regex literal you can use:
^https?:\/\/[^\/]+\/gummybear(?:\/?|\/?#.*)$
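A small harness, assuming you want to check the JavaScript version of the pattern against every example from the question. The array names are illustrative; each URL in the first list should print true and each in the second false.

```javascript
// The answer's pattern as a JavaScript literal. It has no /g flag, so
// test() keeps no lastIndex state between calls.
const gummybear = /^https?:\/\/[^\/]+\/gummybear(?:\/?|\/?#.*)$/;

// Examples copied from the question.
const shouldMatch = [
  'https://www.example.com:81/gummybear/',
  'http://www.example.com/gummybear#top',
  'https://example.com:81/gummybear#/foobar/?search=params',
  'http://www.exa.mple.com:81/gummybear',
  'https://example.com:81/gummybear/#/exaple/1234/',
];
const shouldNotMatch = [
  'https://www.example.com:81/foo/gummybear/',
  'https://www.example.com:81/guMmybear/',
  'https://www.example.com:81/gummybear.html',
  'http://www.example.com/gummybear/apple#top',
  'file://example.com:81/gummybear#/foobar/?search=params',
  'http://www.exa.mple.com:81/gummybear?search=apple#lol',
  'https://example.com:81/#/gummybear/',
  'http://www.test.com:81/dir/dir.2/index.htm?q1=0&&test1&test2=value#top',
];

for (const url of shouldMatch) {
  console.log('match expected:', url, gummybear.test(url));
}
for (const url of shouldNotMatch) {
  console.log('no match expected:', url, gummybear.test(url));
}
```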
To start off, I know this is bad practice. I know there are libraries out there that are supposed to help with this; however, this is the task I was assigned, and changing the whole thing to work with a library would be much more work than we can take on right now (we are on a tight time frame).
In our web app we have fields that people usually type URLs into. We have been assigned a task to 'linkify' anything that looks like a URL. The people who wrote our app seem to have used a regex to determine whether a string of text is a URL. I am basing my regex on that (I am no regex guru, not even a novice).
The 'search' regex looks like this:
function DoesTextContainLinks(linktText) {
//replace all urls with links!
var linkifyValue = /((ftp|https?):\/\/)?(www\.)?([a-zA-Z0-9\-]{1,}\.){1,}[a-zA-Z0-9]{1,4}(:[0-9]{1,5})?(\/[a-zA-Z0-9\-\_\.\?\&\#]{1,})*(\/)?$/.test(linktText);
return linkifyValue;
}
Using this regex and https://regex101.com/, I have come up with two regexes that work most of the time:
function WrapLinkTextInAnchorTag(linkText) {
//capture links that only have www and add http to the beginning of them (the regex ignores entries that have http, https, or ftp in them; those are handled by the next regex)
linkText = linkText.replace(/(^(?:(?!http).)*^(?:(?!ftp).)(www\.)?([a-zA-Z0-9\-]{1,}\.){1,}[a-zA-Z0-9]{1,4}(:[0-9]{1,5})?(\/[a-zA-Z0-9\-\_\.\?\&\#]{1,})*(\/)?$)/gim, "<a href='http://$1'>$1</a>");
//capture links that have https and http on them and fix those too. No need to prepend http here
linkText = linkText.replace(/(((https|http|ftp?):\/\/)?(www\.)?([a-zA-Z0-9\-]{1,}\.){1,}[a-zA-Z0-9]{1,4}(:[0-9]{1,5})?(\/[a-zA-Z0-9\-\_\.\?\&\#]{1,})*(\/)?$)/gim, "<a href='$1'>$1</a>");
return linkText;
}
The problem is that some complex URLs don't match, and I can't work out exactly why. regex101 is great in that it tells you what each part is doing; however, my trouble is combining these pieces of the regex to get them to do what I want. I have two scenarios to account for: when a user types www.something.com or ftp.something.com, and when a user actually types http://www.something.com.
I am looking for help pointing out exactly what is wrong with my two regexes that prevents them from capturing complicated URLs like the one below:
https://pw.something.com/AAPS/default.aspx?guid=a5741c35-6fe1-31a1-b555-4028e931642b
I use this one:
^(http|https|ftp)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?\/?([a-zA-Z0-9\-\._\?\,\'\/\\\+&%\$#\=~])*$
This URL regex requires an (http|https|ftp):// prefix, a plausible domain name and a reasonable file/folder string. It allows a : after the domain name, and these characters in the file/folder string: letters, numbers, - . _ ? , ' / \ + & % $ # = ~. It blocks all other special characters, which makes it useful for validating user input.
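As a quick check, a sketch with a hypothetical isValidUrl helper wrapping the regex above. The ^ and $ anchors make it a whole-string test, so it suits validating a single field rather than finding links inside longer text. Because the pattern requires an (http|https|ftp):// prefix, it rejects a bare www.something.com, which is one of the two scenarios in the question.

```javascript
// Hypothetical helper around the answer's pattern (whole-string check).
function isValidUrl(text) {
  const re = /^(http|https|ftp)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?\/?([a-zA-Z0-9\-\._\?\,\'\/\\\+&%\$#\=~])*$/;
  return re.test(text);
}

console.log(isValidUrl('https://pw.something.com/AAPS/default.aspx?guid=a5741c35-6fe1-31a1-b555-4028e931642b')); // true
console.log(isValidUrl('www.something.com')); // false: no protocol
```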
If you look closely, you will notice that nowhere in your regexps do you match an = character. That is what breaks the example you gave.
Adding \= to the characters allowed in the path of the second regexp:
linkText.replace(/(((https|http|ftp?):\/\/)?(www\.)?([a-zA-Z0-9\-]{1,}\.){1,}[a-zA-Z0-9]{1,4}(:[0-9]{1,5})?(\/[a-zA-Z0-9\-\_\.\?\&\#\=]{1,})*(\/)?$)/gim, "<a href='$1'>$1</a>");
makes your example URL match. That said, it may be worth slogging through the RFC on URLs, RFC 3986 (http://www.ietf.org/rfc/rfc3986.txt), to find other characters that are allowed in URLs (even if they have special meanings), because you are probably missing some others.
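To see the difference the \= makes, a sketch that runs the question's second regexp with and without it on the example URL. The linkify helper name is hypothetical; the two patterns and the replacement string are the ones from the question and this answer.

```javascript
// Question's second regexp: the path character class has no "=".
const withoutEquals = /(((https|http|ftp?):\/\/)?(www\.)?([a-zA-Z0-9\-]{1,}\.){1,}[a-zA-Z0-9]{1,4}(:[0-9]{1,5})?(\/[a-zA-Z0-9\-\_\.\?\&\#]{1,})*(\/)?$)/gim;
// The same regexp with \= added to the path character class.
const withEquals = /(((https|http|ftp?):\/\/)?(www\.)?([a-zA-Z0-9\-]{1,}\.){1,}[a-zA-Z0-9]{1,4}(:[0-9]{1,5})?(\/[a-zA-Z0-9\-\_\.\?\&\#\=]{1,})*(\/)?$)/gim;

// Hypothetical helper: wrap whatever group 1 captures in an anchor tag.
function linkify(text, regex) {
  return text.replace(regex, "<a href='$1'>$1</a>");
}

const url = 'https://pw.something.com/AAPS/default.aspx?guid=a5741c35-6fe1-31a1-b555-4028e931642b';

console.log(linkify(url, withoutEquals)); // unchanged: nothing matches
console.log(linkify(url, withEquals));    // the whole URL wrapped in <a href='...'>
```

Without "=" the last path segment cannot be consumed, and because the pattern must end at $, no position in the string produces a match, so replace returns the text untouched.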
I have a string that may contain several URLs (http or https). I need a script that removes all of those URLs from the string completely and returns the same string without them.
What I have tried so far:
var url = "and I said http://fdsadfs.com/dasfsdadf/afsdasf.html";
var protomatch = /(https?|ftp):\/\//; // NB: not '.*'
var b = url.replace(protomatch, '');
console.log(b);
But this only removes the http:// part and keeps the rest of the link.
How do I write a regex that removes everything that follows http and also detects several links in the string?
Thank you so much!
You can use this regex:
var b = url.replace(/(?:https?|ftp):\/\/[\n\S]+/g, '');
//=> and I said
This regex matches and removes any URL that starts with http://, https:// or ftp://, up to the next space (or other non-newline whitespace character) or the end of input. Because [\n\S]+ also includes \n, the match continues across line breaks.
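A sketch of the same removal as a reusable function; removeUrls is a hypothetical name. The whitespace cleanup is an addition, not part of the answer: removing a link leaves the spaces on both sides of it behind. That cleanup also collapses line breaks, so drop it if the layout matters. Note too that, since [\n\S] includes the newline, a URL at the end of a line also removes the first word of the next line; plain \S+ stops at the line break instead.

```javascript
// Hypothetical helper built on the answer's regex.
function removeUrls(text) {
  return text
    // Remove every http://, https:// or ftp:// link (the g flag handles several).
    .replace(/(?:https?|ftp):\/\/[\n\S]+/g, '')
    // Addition (not from the answer): collapse the whitespace left behind.
    .replace(/\s{2,}/g, ' ')
    .trim();
}

console.log(removeUrls('and I said http://fdsadfs.com/dasfsdadf/afsdasf.html'));
// "and I said"
console.log(removeUrls('see http://a.example/x and https://b.example/y too'));
// "see and too"
```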
Did you search for a URL parser regex? This question has a few comprehensive answers: Getting parts of a URL (Regex).
That said, if you want something much simpler (and maybe not as complete), remember to capture the entire URL string and not just the protocol.
Something like
/(https?|ftp):\/\/[\.a-zA-Z0-9\/\-]+/g
should work better. The added character class matches the rest of the URL after the protocol, and the g flag lets replace remove every link in the string.
I use this JS code to match a hostname in a string:
url.match(/:\/\/(www\.)?(.[^/:]+)/);
This works when the URL has protocol:// at the beginning. For example, this works fine:
var url = "http://domain.com/page";
url.match(/:\/\/(www\.)?(.[^/:]+)/);
But this doesn't:
var url = "domain.com/page";
url.match(/:\/\/(www\.)?(.[^/:]+)/);
I have tried:
url.match(/(:\/\/)?(www\.)?(.[^/:]+)/);
That matches the hostname fine when the URL doesn't contain protocol://, but when it does, it only returns the protocol and not the hostname.
How can I match the hostname whether or not the URL contains the protocol?
I used this function from Steven Levithan; it parses URLs quite decently.
Here's how you use it:
alert(parseUri("www.domain.com/foo").host)
OK, before you have a brain meltdown from @xanatos's answer, here is a simple regex for basic needs. The other answers are more complete and handle more cases than this regex:
(?:(?:(?:\bhttps?|ftp)://)|^)([-A-Z0-9.]+)/
Group 1 contains the hostname. URL parsing is a fragile thing to do with regexes. You were on the right track: you had two regexes that each worked partially, and I simply combined them.
Edit: I was tired last night. Here is the regex for JavaScript:
if (subject.match(/(?:(?:(?:\bhttps?|ftp):\/\/)|^)([\-a-z0-9.]+)\//i)) {
// Successful match
} else {
// Match attempt failed
}
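Wrapped in a small function (getHostname is a hypothetical name), the JavaScript version above returns group 1 or null. Because the pattern requires a / right after the host, it returns null for a bare domain.com with no path, and also for a host followed by a port such as :8080; a leading www. is kept as part of the hostname.

```javascript
// Hypothetical wrapper around the answer's JavaScript regex.
function getHostname(url) {
  // Either a protocol (http, https, ftp) or the start of the string,
  // then the host in group 1, then a required "/".
  const m = url.match(/(?:(?:(?:\bhttps?|ftp):\/\/)|^)([\-a-z0-9.]+)\//i);
  return m ? m[1] : null;
}

console.log(getHostname('http://domain.com/page')); // "domain.com"
console.log(getHostname('domain.com/page'));        // "domain.com"
console.log(getHostname('domain.com'));             // null: no "/" after the host
```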
This
var rx = /^(?:(?:ht|f)tp(?:s?)\:\/\/|~\/|\/)?(?:\w+:\w+@)?(?:(?:[-\w]+\.)+(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))(?::[\d]{1,5})?(?:(?:(?:\/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|\/)+|\?|#)?(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?$/;
should be the uber URL-parsing regex :-) Taken from here: http://flanders.co.nz/2009/11/08/a-good-url-regular-expression-repost/
Test here: http://jsfiddle.net/Qznzx/1/
It also shows how impractical regexes are for this job.
This might be a bit more complex than necessary but it seems to work:
^((?:.+?:\/\/)?(?:.[^/:]+)+)$
A non-capturing group for the protocol: from the start of the string, lazily match one or more characters up to ://. There may be zero or one protocol.
A non-capturing group for the rest of the URL. This part must exist.
Group it all up in a single capturing group.