How to find if a text contains url string - javascript

How can I find if text contains a url string. I mean if I have
Sometexthttp://daasddas some text
I want http://daasddas to be achored or maked as a link wit javascript

function replaceURLWithHTMLLinks(text)
{
var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/ig;
return text.replace(exp,"<a href='$1'>$1</a>");
}

While the code above works good if all given URLs are full (http://mydomain.com), I had problems parsing a URL like:
www.mydomain.com
i.e. without a protocol.
So I added some simple code to the function:
var exp = /(\b(((https?|ftp|file|):\/\/)|www[.])[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/ig;
var temp = text.replace(exp,"$1");
var result = "";
while (temp.length > 0) {
var pos = temp.indexOf("href=\"");
if (pos == -1) {
result += temp;
break;
}
result += temp.substring(0, pos + 6);
temp = temp.substring(pos + 6, temp.length);
if ((temp.indexOf("://") > 8) || (temp.indexOf("://") == -1)) {
result += "http://";
}
}
return result;
If someone should fine a more optimal solution to add a default protocol to URLs, let me know!

You have to use regex(Regular expressions) to find URL patterns in blocks of text.
Here's a link to same question and answers:
Regular Expression to find URLs in block of Text (Javascript)

I tweaked dperinis regex-url script so that a URL embedded in a string can be found. It will not find google.com, this is necessary if it's a user input field, the user might leave out the whitespace after a period/full stop. It will also find www.google.com, since hardly anyone types the protocol.
(?:((?:https?|ftp):\/\/)|ww)(?:\S+(?::\S*)?#)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?
I tested it on www.regextester.com, it worked for me, if you encounter a problem, please comment.

you can use a regular expression to find an URL and replace it by the same with a leading and a trailing tag

Many of the solutions start getting very complex and hard to work with a variety of situations. Here's a function I created to capture any URL beginning with http/https/ftp/file/www. This is working like a charm for me, the only thing it doesn't add a link to is user entered URL's without an http or www at the beginning (i.e. google.com). I hope this solution is helpful for somebody.
function convertText(txtData) {
var urlRegex =/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/ig;
txtData = txtData.replace(urlRegex, '$1');
var urlRegex =/(\b(\swww).[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/ig;
txtData = txtData.replace(urlRegex, ' $1');
var urlRegex =/(>\swww)/ig;
txtData = txtData.replace(urlRegex, '>www');
var urlRegex =/(\"\swww)/ig;
txtData = txtData.replace(urlRegex, '"http://www');
return txtData;
}

function replaceURLWithHTMLLinksHere(text)
{
var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/ig;
return text.replace(exp,"<a href='$1'>$1</a>");
}
Okay we got this regular expresion here in function.
/(\b(https?|ftp|file)://[-A-Z0-9+&##/%?=~|!:,.;]*[-A-Z0-9+&##/%=~|])/ig
Lets understand this.
/ / this is how a regex starts.
\b > is maching https or ftp or file that is unique and is in the start of string. these keywords should not have any character attatched to them in
begining like bbhttps or bbhttp it will not match these otherwise.
https? > here ? means zero or one of preceding character or group. In this case s is optional.
| > match one out of given just like OR.
() > create group to be matched
/ > means the next character is special and is not to be interpreted literally. For example, a 'b' without a preceding '\' generally matches lowercase
'b's wherever they occur. But a '\b' by itself doesn't match any character
[] > this is Character Classes or Character Sets. It is used to have a group of characters and only one character out of all will be present at a time.
[-A-Z0-9+&##/%?=~_|!:,.;]* > zero or more occurrences of the preceding element. For example, b*c matches "c", "bc", "bbc", "bbbc", and so on.
[-A-Z0-9+&##/%=~_|] > means one charactor out of these all.
i > Case-insensitive search.
g > Global search.

function replaceURLWithLinks(text){
var text = "";
text= text.replace(/\r?\n/g, '<br />');
var result = URI.withinString(text, function(url) {
return "<a href='"+url+"' target='_blank'>" + url + "</a>";
});
}

Related

How could I identify plain text website addresses and wrap them in an href tag? [duplicate]

I am using the function below to match URLs inside a given text and replace them for HTML links. The regular expression is working great, but currently I am only replacing the first match.
How I can replace all the URL? I guess I should be using the exec command, but I did not really figure how to do it.
function replaceURLWithHTMLLinks(text) {
var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/i;
return text.replace(exp,"<a href='$1'>$1</a>");
}
First off, rolling your own regexp to parse URLs is a terrible idea. You must imagine this is a common enough problem that someone has written, debugged and tested a library for it, according to the RFCs. URIs are complex - check out the code for URL parsing in Node.js and the Wikipedia page on URI schemes.
There are a ton of edge cases when it comes to parsing URLs: international domain names, actual (.museum) vs. nonexistent (.etc) TLDs, weird punctuation including parentheses, punctuation at the end of the URL, IPV6 hostnames etc.
I've looked at a ton of libraries, and there are a few worth using despite some downsides:
Soapbox's linkify has seen some serious effort put into it, and a major refactor in June 2015 removed the jQuery dependency. It still has issues with IDNs.
AnchorMe is a newcomer that claims to be faster and leaner. Some IDN issues as well.
Autolinker.js lists features very specifically (e.g. "Will properly handle HTML input. The utility will not change the href attribute inside anchor () tags"). I'll thrown some tests at it when a demo becomes available.
Libraries that I've disqualified quickly for this task:
Django's urlize didn't handle certain TLDs properly (here is the official list of valid TLDs. No demo.
autolink-js wouldn't detect "www.google.com" without http://, so it's not quite suitable for autolinking "casual URLs" (without a scheme/protocol) found in plain text.
Ben Alman's linkify hasn't been maintained since 2009.
If you insist on a regular expression, the most comprehensive is the URL regexp from Component, though it will falsely detect some non-existent two-letter TLDs by looking at it.
Replacing URLs with links (Answer to the General Problem)
The regular expression in the question misses a lot of edge cases. When detecting URLs, it's always better to use a specialized library that handles international domain names, new TLDs like .museum, parentheses and other punctuation within and at the end of the URL, and many other edge cases. See the Jeff Atwood's blog post The Problem With URLs for an explanation of some of the other issues.
The best summary of URL matching libraries is in Dan Dascalescu's Answer
(as of Feb 2014)
"Make a regular expression replace more than one match" (Answer to the specific problem)
Add a "g" to the end of the regular expression to enable global matching:
/ig;
But that only fixes the problem in the question where the regular expression was only replacing the first match. Do not use that code.
I've made some small modifications to Travis's code (just to avoid any unnecessary redeclaration - but it's working great for my needs, so nice job!):
function linkify(inputText) {
var replacedText, replacePattern1, replacePattern2, replacePattern3;
//URLs starting with http://, https://, or ftp://
replacePattern1 = /(\b(https?|ftp):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/gim;
replacedText = inputText.replace(replacePattern1, '$1');
//URLs starting with "www." (without // before it, or it'd re-link the ones done above).
replacePattern2 = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
replacedText = replacedText.replace(replacePattern2, '$1$2');
//Change email addresses to mailto:: links.
replacePattern3 = /(([a-zA-Z0-9\-\_\.])+#[a-zA-Z\_]+?(\.[a-zA-Z]{2,6})+)/gim;
replacedText = replacedText.replace(replacePattern3, '$1');
return replacedText;
}
Made some optimizations to Travis' Linkify() code above. I also fixed a bug where email addresses with subdomain type formats would not be matched (i.e. example#domain.co.uk).
In addition, I changed the implementation to prototype the String class so that items can be matched like so:
var text = 'address#example.com';
text.linkify();
'http://stackoverflow.com/'.linkify();
Anyway, here's the script:
if(!String.linkify) {
String.prototype.linkify = function() {
// http://, https://, ftp://
var urlPattern = /\b(?:https?|ftp):\/\/[a-z0-9-+&##\/%?=~_|!:,.;]*[a-z0-9-+&##\/%=~_|]/gim;
// www. sans http:// or https://
var pseudoUrlPattern = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
// Email addresses
var emailAddressPattern = /[\w.]+#[a-zA-Z_-]+?(?:\.[a-zA-Z]{2,6})+/gim;
return this
.replace(urlPattern, '$&')
.replace(pseudoUrlPattern, '$1$2')
.replace(emailAddressPattern, '$&');
};
}
Thanks, this was very helpful. I also wanted something that would link things that looked like a URL -- as a basic requirement, it'd link something like www.yahoo.com, even if the http:// protocol prefix was not present. So basically, if "www." is present, it'll link it and assume it's http://. I also wanted emails to turn into mailto: links. EXAMPLE: www.yahoo.com would be converted to www.yahoo.com
Here's the code I ended up with (combination of code from this page and other stuff I found online, and other stuff I did on my own):
function Linkify(inputText) {
//URLs starting with http://, https://, or ftp://
var replacePattern1 = /(\b(https?|ftp):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/gim;
var replacedText = inputText.replace(replacePattern1, '$1');
//URLs starting with www. (without // before it, or it'd re-link the ones done above)
var replacePattern2 = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
var replacedText = replacedText.replace(replacePattern2, '$1$2');
//Change email addresses to mailto:: links
var replacePattern3 = /(\w+#[a-zA-Z_]+?\.[a-zA-Z]{2,6})/gim;
var replacedText = replacedText.replace(replacePattern3, '$1');
return replacedText
}
In the 2nd replace, the (^|[^/]) part is only replacing www.whatever.com if it's not already prefixed by // -- to avoid double-linking if a URL was already linked in the first replace. Also, it's possible that www.whatever.com might be at the beginning of the string, which is the first "or" condition in that part of the regex.
This could be integrated as a jQuery plugin as Jesse P illustrated above -- but I specifically wanted a regular function that wasn't acting on an existing DOM element, because I'm taking text I have and then adding it to the DOM, and I want the text to be "linkified" before I add it, so I pass the text through this function. Works great.
Identifying URLs is tricky because they are often surrounded by punctuation marks and because users frequently do not use the full form of the URL. Many JavaScript functions exist for replacing URLs with hyperlinks, but I was unable to find one that works as well as the urlize filter in the Python-based web framework Django. I therefore ported Django's urlize function to JavaScript:
https://github.com/ljosa/urlize.js
An example:
urlize('Go to SO (stackoverflow.com) and ask. <grin>',
{nofollow: true, autoescape: true})
=> "Go to SO (stackoverflow.com) and ask. <grin>"
The second argument, if true, causes rel="nofollow" to be inserted. The third argument, if true, escapes characters that have special meaning in HTML. See the README file.
I searched on google for anything newer and ran across this one:
$('p').each(function(){
$(this).html( $(this).html().replace(/((http|https|ftp):\/\/[\w?=&.\/-;#~%-]+(?![\w\s?&.\/;#~%"=-]*>))/g, '$1 ') );
});
demo: http://jsfiddle.net/kachibito/hEgvc/1/
Works really well for normal links.
I made a change to Roshambo String.linkify() to the emailAddressPattern to recognize aaa.bbb.#ccc.ddd addresses
if(!String.linkify) {
String.prototype.linkify = function() {
// http://, https://, ftp://
var urlPattern = /\b(?:https?|ftp):\/\/[a-z0-9-+&##\/%?=~_|!:,.;]*[a-z0-9-+&##\/%=~_|]/gim;
// www. sans http:// or https://
var pseudoUrlPattern = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
// Email addresses *** here I've changed the expression ***
var emailAddressPattern = /(([a-zA-Z0-9_\-\.]+)#[a-zA-Z_]+?(?:\.[a-zA-Z]{2,6}))+/gim;
return this
.replace(urlPattern, '<a target="_blank" href="$&">$&</a>')
.replace(pseudoUrlPattern, '$1<a target="_blank" href="http://$2">$2</a>')
.replace(emailAddressPattern, '<a target="_blank" href="mailto:$1">$1</a>');
};
}
/**
* Convert URLs in a string to anchor buttons
* #param {!string} string
* #returns {!string}
*/
function URLify(string){
var urls = string.match(/(((ftp|https?):\/\/)[\-\w#:%_\+.~#?,&\/\/=]+)/g);
if (urls) {
urls.forEach(function (url) {
string = string.replace(url, '<a target="_blank" href="' + url + '">' + url + "</a>");
});
}
return string.replace("(", "<br/>(");
}
simple example
The best script to do this:
http://benalman.com/projects/javascript-linkify-process-lin/
This solution works like many of the others, and in fact uses the same regex as one of them, however in stead of returning a HTML String this will return a document fragment containing the A element and any applicable text nodes.
function make_link(string) {
var words = string.split(' '),
ret = document.createDocumentFragment();
for (var i = 0, l = words.length; i < l; i++) {
if (words[i].match(/[-a-zA-Z0-9#:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9#:%_\+.~#?&//=]*)?/gi)) {
var elm = document.createElement('a');
elm.href = words[i];
elm.textContent = words[i];
if (ret.childNodes.length > 0) {
ret.lastChild.textContent += ' ';
}
ret.appendChild(elm);
} else {
if (ret.lastChild && ret.lastChild.nodeType === 3) {
ret.lastChild.textContent += ' ' + words[i];
} else {
ret.appendChild(document.createTextNode(' ' + words[i]));
}
}
}
return ret;
}
There are some caveats, namely with older IE and textContent support.
here is a demo.
If you need to show shorter link (only domain), but with same long URL, you can try my modification of Sam Hasler's code version posted above
function replaceURLWithHTMLLinks(text) {
var exp = /(\b(https?|ftp|file):\/\/([-A-Z0-9+&##%?=~_|!:,.;]*)([-A-Z0-9+&##%?\/=~_|!:,.;]*)[-A-Z0-9+&##\/%=~_|])/ig;
return text.replace(exp, "<a href='$1' target='_blank'>$3</a>");
}
Reg Ex:
/(\b((https?|ftp|file):\/\/|(www))[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|]*)/ig
function UriphiMe(text) {
var exp = /(\b((https?|ftp|file):\/\/|(www))[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|]*)/ig;
return text.replace(exp,"<a href='$1'>$1</a>");
}
Below are some tested string:
Find me on to www.google.com
www
Find me on to www.http://www.com
Follow me on : http://www.nishantwork.wordpress.com
http://www.nishantwork.wordpress.com
Follow me on : http://www.nishantwork.wordpress.com
https://stackoverflow.com/users/430803/nishant
Note: If you don't want to pass www as valid one just use below reg ex:
/(\b((https?|ftp|file):\/\/|(www))[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/ig
The warnings about URI complexity should be noted, but the simple answer to your question is:
To replace every match you need to add the /g flag to the end of the RegEx:
/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/gi
Try the below function :
function anchorify(text){
var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/ig;
var text1=text.replace(exp, "<a href='$1'>$1</a>");
var exp2 =/(^|[^\/])(www\.[\S]+(\b|$))/gim;
return text1.replace(exp2, '$1<a target="_blank" href="http://$2">$2</a>');
}
alert(anchorify("Hola amigo! https://www.sharda.ac.in/academics/"));
Keep it simple! Say what you cannot have, rather than what you can have :)
As mentioned above, URLs can be quite complex, especially after the '?', and not all of them start with a 'www.' e.g. maps.bing.com/something?key=!"£$%^*()&lat=65&lon&lon=20
So, rather than have a complex regex that wont meet all edge cases, and will be hard to maintain, how about this much simpler one, which works well for me in practise.
Match
http(s):// (anything but a space)+
www. (anything but a space)+
Where 'anything' is [^'"<>\s]
... basically a greedy match, carrying on to you meet a space, quote, angle bracket, or end of line
Also:
Remember to check that it is not already in URL format, e.g. the text contains href="..." or src="..."
Add ref=nofollow (if appropriate)
This solution isn't as "good" as the libraries mentioned above, but is much simpler, and works well in practise.
if html.match( /(href)|(src)/i )) {
return html; // text already has a hyper link in it
}
html = html.replace(
/\b(https?:\/\/[^\s\(\)\'\"\<\>]+)/ig,
"<a ref='nofollow' href='$1'>$1</a>"
);
html = html.replace(
/\s(www\.[^\s\(\)\'\"\<\>]+)/ig,
"<a ref='nofollow' href='http://$1'>$1</a>"
);
html = html.replace(
/^(www\.[^\s\(\)\'\"\<\>]+)/ig,
"<a ref='nofollow' href='http://$1'>$1</a>"
);
return html;
Correct URL detection with international domains & astral characters support is not trivial thing. linkify-it library builds regex from many conditions, and final size is about 6 kilobytes :) . It's more accurate than all libs, currently referenced in accepted answer.
See linkify-it demo to check live all edge cases and test your ones.
If you need to linkify HTML source, you should parse it first, and iterate each text token separately.
I've wrote yet another JavaScript library, it might be better for you since it's very sensitive with the least possible false positives, fast and small in size. I'm currently actively maintaining it so please do test it in the demo page and see how it would work for you.
link: https://github.com/alexcorvi/anchorme.js
I had to do the opposite, and make html links into just the URL, but I modified your regex and it works like a charm, thanks :)
var exp = /<a\s.*href=['"](\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])['"].*>.*<\/a>/ig;
source = source.replace(exp,"$1");
The e-mail detection in Travitron's answer above did not work for me, so I extended/replaced it with the following (C# code).
// Change e-mail addresses to mailto: links.
const RegexOptions o = RegexOptions.Multiline | RegexOptions.IgnoreCase;
const string pat3 = #"([a-zA-Z0-9_\-\.]+)#([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,6})";
const string rep3 = #"$1#$2.$3";
text = Regex.Replace(text, pat3, rep3, o);
This allows for e-mail addresses like "firstname.secondname#one.two.three.co.uk".
After input from several sources I've now a solution that works well. It had to do with writing your own replacement code.
Answer.
Fiddle.
function replaceURLWithHTMLLinks(text) {
var re = /(\(.*?)?\b((?:https?|ftp|file):\/\/[-a-z0-9+&##\/%?=~_()|!:,.;]*[-a-z0-9+&##\/%=~_()|])/ig;
return text.replace(re, function(match, lParens, url) {
var rParens = '';
lParens = lParens || '';
// Try to strip the same number of right parens from url
// as there are left parens. Here, lParenCounter must be
// a RegExp object. You cannot use a literal
// while (/\(/g.exec(lParens)) { ... }
// because an object is needed to store the lastIndex state.
var lParenCounter = /\(/g;
while (lParenCounter.exec(lParens)) {
var m;
// We want m[1] to be greedy, unless a period precedes the
// right parenthesis. These tests cannot be simplified as
// /(.*)(\.?\).*)/.exec(url)
// because if (.*) is greedy then \.? never gets a chance.
if (m = /(.*)(\.\).*)/.exec(url) ||
/(.*)(\).*)/.exec(url)) {
url = m[1];
rParens = m[2] + rParens;
}
}
return lParens + "<a href='" + url + "'>" + url + "</a>" + rParens;
});
}
Here's my solution:
var content = "Visit https://wwww.google.com or watch this video: https://www.youtube.com/watch?v=0T4DQYgsazo and news at http://www.bbc.com";
content = replaceUrlsWithLinks(content, "http://");
content = replaceUrlsWithLinks(content, "https://");
function replaceUrlsWithLinks(content, protocol) {
var startPos = 0;
var s = 0;
while (s < content.length) {
startPos = content.indexOf(protocol, s);
if (startPos < 0)
return content;
let endPos = content.indexOf(" ", startPos + 1);
if (endPos < 0)
endPos = content.length;
let url = content.substr(startPos, endPos - startPos);
if (url.endsWith(".") || url.endsWith("?") || url.endsWith(",")) {
url = url.substr(0, url.length - 1);
endPos--;
}
if (ROOTNS.utils.stringsHelper.validUrl(url)) {
let link = "<a href='" + url + "'>" + url + "</a>";
content = content.substr(0, startPos) + link + content.substr(endPos);
s = startPos + link.length;
} else {
s = endPos + 1;
}
}
return content;
}
function validUrl(url) {
try {
new URL(url);
return true;
} catch (e) {
return false;
}
}
Try Below Solution
function replaceLinkClickableLink(url = '') {
let pattern = new RegExp('^(https?:\\/\\/)?'+
'((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.?)+[a-z]{2,}|'+
'((\\d{1,3}\\.){3}\\d{1,3}))'+
'(\\:\\d+)?(\\/[-a-z\\d%_.~+]*)*'+
'(\\?[;&a-z\\d%_.~+=-]*)?'+
'(\\#[-a-z\\d_]*)?$','i');
let isUrl = pattern.test(url);
if (isUrl) {
return `${url}`;
}
return url;
}
Replace URLs in text with HTML links, ignore the URLs within a href/pre tag.
https://github.com/JimLiu/auto-link
worked for me :
var urlRegex =/(\b((https?|ftp|file):\/\/)?((([a-z\d]([a-z\d-]*[a-z\d])*)\.)+[a-z]{2,}|((\d{1,3}\.){3}\d{1,3}))(\:\d+)?(\/[-a-z\d%_.~+]*)*(\?[;&a-z\d%_.~+=-]*)?(\#[-a-z\d_]*)?)/ig;
return text.replace(urlRegex, function(url) {
var newUrl = url.indexOf("http") === -1 ? "http://" + url : url;
return '' + url + '';
});

Regex match cookie value and remove hyphens

I'm trying to extract out a group of words from a larger string/cookie that are separated by hyphens. I would like to replace the hyphens with a space and set to a variable. Javascript or jQuery.
As an example, the larger string has a name and value like this within it:
facility=34222%7CConner-Department-Store;
(notice the leading "C")
So first, I need to match()/find facility=34222%7CConner-Department-Store; with regex. Then break it down to "Conner Department Store"
var cookie = document.cookie;
var facilityValue = cookie.match( REGEX ); ??
var test = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
var test2 = test.replace(/^(.*)facility=([^;]+)(.*)$/, function(matchedString, match1, match2, match3){
return decodeURIComponent(match2);
});
console.log( test2 );
console.log( test2.split('|')[1].replace(/[-]/g, ' ') );
If I understood it correctly, you want to make a phrase by getting all the words between hyphens and disallowing two successive Uppercase letters in a word, so I'd prefer using Regex in that case.
This is a Regex solution, that works dynamically with any cookies in the same format and extract the wanted sentence from it:
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Demo:
var str = "facility=34222%7CConner-Department-Store;";
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Explanation:
Use this Regex (/([A-Z][a-z]+)-?/g to match the words between -.
Replace any - occurence in the matched words.
Then just join these matches array with white space.
Ok,
first, you should decode this string as follows:
var str = "facility=34222%7CConner-Department-Store;"
var decoded = decodeURIComponent(str);
// decoded = "facility=34222|Conner-Department-Store;"
Then you have multiple possibilities to split up this string.
The easiest way is to use substring()
var solution1 = decoded.substring(decoded.indexOf('|') + 1, decoded.length)
// solution1 = "Conner-Department-Store;"
solution1 = solution1.replace('-', ' ');
// solution1 = "Conner Department Store;"
As you can see, substring(arg1, arg2) returns the string, starting at index arg1 and ending at index arg2. See Full Documentation here
If you want to cut the last ; just set decoded.length - 1 as arg2 in the snippet above.
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1)
//returns "Conner-Department-Store"
or all above in just one line:
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1).replace('-', ' ')
If you want still to use a regular Expression to retrieve (perhaps more) data out of the string, you could use something similar to this snippet:
var solution2 = "";
var regEx= /([A-Za-z]*)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/;
if (regEx.test(decoded)) {
solution2 = decoded.match(regEx);
/* returns
[0:"facility=34222|Conner-Department-Store",
1:"facility",
2:"34222",
3:"Conner-Department-Store",
index:0,
input:"facility=34222|Conner-Department-Store;"
length:4] */
solution2 = solution2[3].replace('-', ' ');
// "Conner Department Store"
}
I have applied some rules for the regex to work, feel free to modify them according your needs.
facility can be any Word built with alphabetical characters lower and uppercase (no other chars) at any length
= needs to be the char =
34222 can be any number but no other characters
| needs to be the char |
Conner-Department-Store can be any characters except one of the following (reserved delimiters): :/?#[]#;,'
Hope this helps :)
edit: to find only the part
facility=34222%7CConner-Department-Store; just modify the regex to
match facility= instead of ([A-z]*)=:
/(facility)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/
You can use cookies.js, a mini framework from MDN (Mozilla Developer Network).
Simply include the cookies.js file in your application, and write:
docCookies.getItem("Connor Department Store");

Javascript match URL with wildcards - Chrome Extension

I'm writing a chrome extension which allows the user to modify content on specific websites. I'd like the user to be able to specify these websites using wildcards, for example http://*.google.com or http://google.com/*
I found the following code
currentUrl = "http://google.com/";
matchUrl = "http://*.google.com/*";
match = RegExp(matchUrl.replace(/\*/g, "[^]*")).test(currentUrl);
But there are a few problems with it.
http://test.google.com/ is a match
http://google.com/ is not a match
http://test.google.com is not a match
http://.google.com/ is a match
Clarification:
http://google.com Isn't a match, and that is the real problem.
So how can I can I create a JavaScript code snippet that will check if there is a match correctly?
I suggest parsing the URL into protocol, base part and the rest, and then re-build the validation regex replacing * inside the base part with (?:[^/]*\\.)* and otherwise with (?:/[^]*)?. Also, you must escape all other special chars with .replace(/[?()[\]\\.+^$|]/g, "\\$&"). You will also need anchors (^ for start of string and $ for the end of string position) to match the entire string. A case insensitive /i modifier is just a bonus to make the pattern case insensitive.
So, for this exact matchUrl, the regex will look like
/^http:\/\/(?:[^\/]*\.)*google\.com(?:\/[^]*)?$/
See the regex demo
var rxUrlSplit = /((?:http|ftp)s?):\/\/([^\/]+)(\/.*)?/;
var strs = ['http://test.google.com/', 'http://google.com/','http://test.google.com', 'http://.google.com/','http://one.more.test.google.com'];
var matchUrl = "http://*.google.com/*";
var prepUrl = "";
if ((m=matchUrl.match(rxUrlSplit)) !== null) {
prepUrl = m[1]+"://"+m[2].replace(/[?()[\]\\.+^$|]/g, "\\$&").replace(/\*\\./g,'(?:[^/]*\\.)*').replace(/\*$/,'[^/]*');
if (m[3]) {
prepUrl+= m[3].replace(/[?()[\]\\.+^$|]/g, "\\$&").replace(/\/\*(?=$|\/)/g, '(?:/[^]*)?');
}
}
if (prepUrl) {
// console.log(prepUrl); // ^http://(?:[^/]*\.)*google\.com(?:/[^]*)?$
var rx = RegExp("^" + prepUrl + "$", "i");
for (var s of strs) {
if (s.match(rx)) {
console.log(s + " matches!<br/>");
} else {
console.log(s + " does not match!<br/>");
}
}
}
with this matchUrl
matchUrl = "http://*.google.com/*";
the RexExp is something like this
"http://.*.google.com/.*"
so try to replace the * entered by the user with .* in the regexp match
you can use this tool to test it

How to convert text within a textarea into a URL like Facebook does? [duplicate]

I am using the function below to match URLs inside a given text and replace them for HTML links. The regular expression is working great, but currently I am only replacing the first match.
How I can replace all the URL? I guess I should be using the exec command, but I did not really figure how to do it.
function replaceURLWithHTMLLinks(text) {
var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/i;
return text.replace(exp,"<a href='$1'>$1</a>");
}
First off, rolling your own regexp to parse URLs is a terrible idea. You must imagine this is a common enough problem that someone has written, debugged and tested a library for it, according to the RFCs. URIs are complex - check out the code for URL parsing in Node.js and the Wikipedia page on URI schemes.
There are a ton of edge cases when it comes to parsing URLs: international domain names, actual (.museum) vs. nonexistent (.etc) TLDs, weird punctuation including parentheses, punctuation at the end of the URL, IPV6 hostnames etc.
I've looked at a ton of libraries, and there are a few worth using despite some downsides:
Soapbox's linkify has seen some serious effort put into it, and a major refactor in June 2015 removed the jQuery dependency. It still has issues with IDNs.
AnchorMe is a newcomer that claims to be faster and leaner. Some IDN issues as well.
Autolinker.js lists features very specifically (e.g. "Will properly handle HTML input. The utility will not change the href attribute inside anchor () tags"). I'll thrown some tests at it when a demo becomes available.
Libraries that I've disqualified quickly for this task:
Django's urlize didn't handle certain TLDs properly (here is the official list of valid TLDs. No demo.
autolink-js wouldn't detect "www.google.com" without http://, so it's not quite suitable for autolinking "casual URLs" (without a scheme/protocol) found in plain text.
Ben Alman's linkify hasn't been maintained since 2009.
If you insist on a regular expression, the most comprehensive is the URL regexp from Component, though it will falsely detect some non-existent two-letter TLDs by looking at it.
Replacing URLs with links (Answer to the General Problem)
The regular expression in the question misses a lot of edge cases. When detecting URLs, it's always better to use a specialized library that handles international domain names, new TLDs like .museum, parentheses and other punctuation within and at the end of the URL, and many other edge cases. See the Jeff Atwood's blog post The Problem With URLs for an explanation of some of the other issues.
The best summary of URL matching libraries is in Dan Dascalescu's Answer
(as of Feb 2014)
"Make a regular expression replace more than one match" (Answer to the specific problem)
Add a "g" to the end of the regular expression to enable global matching:
/ig;
But that only fixes the problem in the question where the regular expression was only replacing the first match. Do not use that code.
I've made some small modifications to Travis's code (just to avoid any unnecessary redeclaration - but it's working great for my needs, so nice job!):
function linkify(inputText) {
var replacedText, replacePattern1, replacePattern2, replacePattern3;
//URLs starting with http://, https://, or ftp://
replacePattern1 = /(\b(https?|ftp):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/gim;
replacedText = inputText.replace(replacePattern1, '$1');
//URLs starting with "www." (without // before it, or it'd re-link the ones done above).
replacePattern2 = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
replacedText = replacedText.replace(replacePattern2, '$1$2');
//Change email addresses to mailto:: links.
replacePattern3 = /(([a-zA-Z0-9\-\_\.])+#[a-zA-Z\_]+?(\.[a-zA-Z]{2,6})+)/gim;
replacedText = replacedText.replace(replacePattern3, '$1');
return replacedText;
}
Made some optimizations to Travis' Linkify() code above. I also fixed a bug where email addresses with subdomain type formats would not be matched (i.e. example#domain.co.uk).
In addition, I changed the implementation to prototype the String class so that items can be matched like so:
var text = 'address#example.com';
text.linkify();
'http://stackoverflow.com/'.linkify();
Anyway, here's the script:
if(!String.linkify) {
String.prototype.linkify = function() {
// http://, https://, ftp://
var urlPattern = /\b(?:https?|ftp):\/\/[a-z0-9-+&##\/%?=~_|!:,.;]*[a-z0-9-+&##\/%=~_|]/gim;
// www. sans http:// or https://
var pseudoUrlPattern = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
// Email addresses
var emailAddressPattern = /[\w.]+#[a-zA-Z_-]+?(?:\.[a-zA-Z]{2,6})+/gim;
return this
.replace(urlPattern, '$&')
.replace(pseudoUrlPattern, '$1$2')
.replace(emailAddressPattern, '$&');
};
}
Thanks, this was very helpful. I also wanted something that would link things that looked like a URL -- as a basic requirement, it'd link something like www.yahoo.com, even if the http:// protocol prefix was not present. So basically, if "www." is present, it'll link it and assume it's http://. I also wanted emails to turn into mailto: links. EXAMPLE: www.yahoo.com would be converted to www.yahoo.com
Here's the code I ended up with (combination of code from this page and other stuff I found online, and other stuff I did on my own):
function Linkify(inputText) {
//URLs starting with http://, https://, or ftp://
var replacePattern1 = /(\b(https?|ftp):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/gim;
var replacedText = inputText.replace(replacePattern1, '$1');
//URLs starting with www. (without // before it, or it'd re-link the ones done above)
var replacePattern2 = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
var replacedText = replacedText.replace(replacePattern2, '$1$2');
//Change email addresses to mailto:: links
var replacePattern3 = /(\w+#[a-zA-Z_]+?\.[a-zA-Z]{2,6})/gim;
var replacedText = replacedText.replace(replacePattern3, '$1');
return replacedText
}
In the 2nd replace, the (^|[^/]) part is only replacing www.whatever.com if it's not already prefixed by // -- to avoid double-linking if a URL was already linked in the first replace. Also, it's possible that www.whatever.com might be at the beginning of the string, which is the first "or" condition in that part of the regex.
This could be integrated as a jQuery plugin as Jesse P illustrated above -- but I specifically wanted a regular function that wasn't acting on an existing DOM element, because I'm taking text I have and then adding it to the DOM, and I want the text to be "linkified" before I add it, so I pass the text through this function. Works great.
Identifying URLs is tricky because they are often surrounded by punctuation marks and because users frequently do not use the full form of the URL. Many JavaScript functions exist for replacing URLs with hyperlinks, but I was unable to find one that works as well as the urlize filter in the Python-based web framework Django. I therefore ported Django's urlize function to JavaScript:
https://github.com/ljosa/urlize.js
An example:
urlize('Go to SO (stackoverflow.com) and ask. <grin>',
{nofollow: true, autoescape: true})
=> "Go to SO (stackoverflow.com) and ask. <grin>"
The second argument, if true, causes rel="nofollow" to be inserted. The third argument, if true, escapes characters that have special meaning in HTML. See the README file.
I searched on google for anything newer and ran across this one:
$('p').each(function(){
$(this).html( $(this).html().replace(/((http|https|ftp):\/\/[\w?=&.\/-;#~%-]+(?![\w\s?&.\/;#~%"=-]*>))/g, '$1 ') );
});
demo: http://jsfiddle.net/kachibito/hEgvc/1/
Works really well for normal links.
I made a change to Roshambo String.linkify() to the emailAddressPattern to recognize aaa.bbb.#ccc.ddd addresses
if(!String.linkify) {
String.prototype.linkify = function() {
// http://, https://, ftp://
var urlPattern = /\b(?:https?|ftp):\/\/[a-z0-9-+&##\/%?=~_|!:,.;]*[a-z0-9-+&##\/%=~_|]/gim;
// www. sans http:// or https://
var pseudoUrlPattern = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
// Email addresses *** here I've changed the expression ***
var emailAddressPattern = /(([a-zA-Z0-9_\-\.]+)#[a-zA-Z_]+?(?:\.[a-zA-Z]{2,6}))+/gim;
return this
.replace(urlPattern, '<a target="_blank" href="$&">$&</a>')
.replace(pseudoUrlPattern, '$1<a target="_blank" href="http://$2">$2</a>')
.replace(emailAddressPattern, '<a target="_blank" href="mailto:$1">$1</a>');
};
}
/**
* Convert URLs in a string to anchor buttons
* #param {!string} string
* #returns {!string}
*/
function URLify(string){
var urls = string.match(/(((ftp|https?):\/\/)[\-\w#:%_\+.~#?,&\/\/=]+)/g);
if (urls) {
urls.forEach(function (url) {
string = string.replace(url, '<a target="_blank" href="' + url + '">' + url + "</a>");
});
}
return string.replace("(", "<br/>(");
}
simple example
The best script to do this:
http://benalman.com/projects/javascript-linkify-process-lin/
This solution works like many of the others, and in fact uses the same regex as one of them, however in stead of returning a HTML String this will return a document fragment containing the A element and any applicable text nodes.
function make_link(string) {
var words = string.split(' '),
ret = document.createDocumentFragment();
for (var i = 0, l = words.length; i < l; i++) {
if (words[i].match(/[-a-zA-Z0-9#:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9#:%_\+.~#?&//=]*)?/gi)) {
var elm = document.createElement('a');
elm.href = words[i];
elm.textContent = words[i];
if (ret.childNodes.length > 0) {
ret.lastChild.textContent += ' ';
}
ret.appendChild(elm);
} else {
if (ret.lastChild && ret.lastChild.nodeType === 3) {
ret.lastChild.textContent += ' ' + words[i];
} else {
ret.appendChild(document.createTextNode(' ' + words[i]));
}
}
}
return ret;
}
There are some caveats, namely with older IE and textContent support.
here is a demo.
If you need to show shorter link (only domain), but with same long URL, you can try my modification of Sam Hasler's code version posted above
function replaceURLWithHTMLLinks(text) {
var exp = /(\b(https?|ftp|file):\/\/([-A-Z0-9+&##%?=~_|!:,.;]*)([-A-Z0-9+&##%?\/=~_|!:,.;]*)[-A-Z0-9+&##\/%=~_|])/ig;
return text.replace(exp, "<a href='$1' target='_blank'>$3</a>");
}
Reg Ex:
/(\b((https?|ftp|file):\/\/|(www))[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|]*)/ig
function UriphiMe(text) {
var exp = /(\b((https?|ftp|file):\/\/|(www))[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|]*)/ig;
return text.replace(exp,"<a href='$1'>$1</a>");
}
Below are some tested string:
Find me on to www.google.com
www
Find me on to www.http://www.com
Follow me on : http://www.nishantwork.wordpress.com
http://www.nishantwork.wordpress.com
Follow me on : http://www.nishantwork.wordpress.com
https://stackoverflow.com/users/430803/nishant
Note: If you don't want to pass www as valid one just use below reg ex:
/(\b((https?|ftp|file):\/\/|(www))[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/ig
The warnings about URI complexity should be noted, but the simple answer to your question is:
To replace every match you need to add the /g flag to the end of the RegEx:
/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/gi
Try the below function :
function anchorify(text){
var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/ig;
var text1=text.replace(exp, "<a href='$1'>$1</a>");
var exp2 =/(^|[^\/])(www\.[\S]+(\b|$))/gim;
return text1.replace(exp2, '$1<a target="_blank" href="http://$2">$2</a>');
}
alert(anchorify("Hola amigo! https://www.sharda.ac.in/academics/"));
Keep it simple! Say what you cannot have, rather than what you can have :)
As mentioned above, URLs can be quite complex, especially after the '?', and not all of them start with a 'www.' e.g. maps.bing.com/something?key=!"£$%^*()&lat=65&lon&lon=20
So, rather than have a complex regex that wont meet all edge cases, and will be hard to maintain, how about this much simpler one, which works well for me in practise.
Match
http(s):// (anything but a space)+
www. (anything but a space)+
Where 'anything' is [^'"<>\s]
... basically a greedy match, carrying on to you meet a space, quote, angle bracket, or end of line
Also:
Remember to check that it is not already in URL format, e.g. the text contains href="..." or src="..."
Add ref=nofollow (if appropriate)
This solution isn't as "good" as the libraries mentioned above, but is much simpler, and works well in practise.
if html.match( /(href)|(src)/i )) {
return html; // text already has a hyper link in it
}
html = html.replace(
/\b(https?:\/\/[^\s\(\)\'\"\<\>]+)/ig,
"<a ref='nofollow' href='$1'>$1</a>"
);
html = html.replace(
/\s(www\.[^\s\(\)\'\"\<\>]+)/ig,
"<a ref='nofollow' href='http://$1'>$1</a>"
);
html = html.replace(
/^(www\.[^\s\(\)\'\"\<\>]+)/ig,
"<a ref='nofollow' href='http://$1'>$1</a>"
);
return html;
Correct URL detection with international domains & astral characters support is not trivial thing. linkify-it library builds regex from many conditions, and final size is about 6 kilobytes :) . It's more accurate than all libs, currently referenced in accepted answer.
See linkify-it demo to check live all edge cases and test your ones.
If you need to linkify HTML source, you should parse it first, and iterate each text token separately.
I've wrote yet another JavaScript library, it might be better for you since it's very sensitive with the least possible false positives, fast and small in size. I'm currently actively maintaining it so please do test it in the demo page and see how it would work for you.
link: https://github.com/alexcorvi/anchorme.js
I had to do the opposite, and make html links into just the URL, but I modified your regex and it works like a charm, thanks :)
var exp = /<a\s.*href=['"](\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])['"].*>.*<\/a>/ig;
source = source.replace(exp,"$1");
The e-mail detection in Travitron's answer above did not work for me, so I extended/replaced it with the following (C# code).
// Change e-mail addresses to mailto: links.
const RegexOptions o = RegexOptions.Multiline | RegexOptions.IgnoreCase;
const string pat3 = #"([a-zA-Z0-9_\-\.]+)#([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,6})";
const string rep3 = #"$1#$2.$3";
text = Regex.Replace(text, pat3, rep3, o);
This allows for e-mail addresses like "firstname.secondname#one.two.three.co.uk".
After input from several sources I've now a solution that works well. It had to do with writing your own replacement code.
Answer.
Fiddle.
function replaceURLWithHTMLLinks(text) {
var re = /(\(.*?)?\b((?:https?|ftp|file):\/\/[-a-z0-9+&##\/%?=~_()|!:,.;]*[-a-z0-9+&##\/%=~_()|])/ig;
return text.replace(re, function(match, lParens, url) {
var rParens = '';
lParens = lParens || '';
// Try to strip the same number of right parens from url
// as there are left parens. Here, lParenCounter must be
// a RegExp object. You cannot use a literal
// while (/\(/g.exec(lParens)) { ... }
// because an object is needed to store the lastIndex state.
var lParenCounter = /\(/g;
while (lParenCounter.exec(lParens)) {
var m;
// We want m[1] to be greedy, unless a period precedes the
// right parenthesis. These tests cannot be simplified as
// /(.*)(\.?\).*)/.exec(url)
// because if (.*) is greedy then \.? never gets a chance.
if (m = /(.*)(\.\).*)/.exec(url) ||
/(.*)(\).*)/.exec(url)) {
url = m[1];
rParens = m[2] + rParens;
}
}
return lParens + "<a href='" + url + "'>" + url + "</a>" + rParens;
});
}
Here's my solution:
var content = "Visit https://wwww.google.com or watch this video: https://www.youtube.com/watch?v=0T4DQYgsazo and news at http://www.bbc.com";
content = replaceUrlsWithLinks(content, "http://");
content = replaceUrlsWithLinks(content, "https://");
function replaceUrlsWithLinks(content, protocol) {
var startPos = 0;
var s = 0;
while (s < content.length) {
startPos = content.indexOf(protocol, s);
if (startPos < 0)
return content;
let endPos = content.indexOf(" ", startPos + 1);
if (endPos < 0)
endPos = content.length;
let url = content.substr(startPos, endPos - startPos);
if (url.endsWith(".") || url.endsWith("?") || url.endsWith(",")) {
url = url.substr(0, url.length - 1);
endPos--;
}
if (ROOTNS.utils.stringsHelper.validUrl(url)) {
let link = "<a href='" + url + "'>" + url + "</a>";
content = content.substr(0, startPos) + link + content.substr(endPos);
s = startPos + link.length;
} else {
s = endPos + 1;
}
}
return content;
}
function validUrl(url) {
try {
new URL(url);
return true;
} catch (e) {
return false;
}
}
Try Below Solution
function replaceLinkClickableLink(url = '') {
let pattern = new RegExp('^(https?:\\/\\/)?'+
'((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.?)+[a-z]{2,}|'+
'((\\d{1,3}\\.){3}\\d{1,3}))'+
'(\\:\\d+)?(\\/[-a-z\\d%_.~+]*)*'+
'(\\?[;&a-z\\d%_.~+=-]*)?'+
'(\\#[-a-z\\d_]*)?$','i');
let isUrl = pattern.test(url);
if (isUrl) {
return `${url}`;
}
return url;
}
Replace URLs in text with HTML links, ignore the URLs within a href/pre tag.
https://github.com/JimLiu/auto-link
worked for me :
var urlRegex =/(\b((https?|ftp|file):\/\/)?((([a-z\d]([a-z\d-]*[a-z\d])*)\.)+[a-z]{2,}|((\d{1,3}\.){3}\d{1,3}))(\:\d+)?(\/[-a-z\d%_.~+]*)*(\?[;&a-z\d%_.~+=-]*)?(\#[-a-z\d_]*)?)/ig;
return text.replace(urlRegex, function(url) {
var newUrl = url.indexOf("http") === -1 ? "http://" + url : url;
return '' + url + '';
});

Regex to validate a texarea input which must be URLs separated by new lines

I am trying to create a regex which will ultimately be used with Google Forms to validate a texarea input.
The rule is,
Input area can have one or more URLs (http or https)
Each URL must be separated either by one or more new lines
Each line which has text, must be a single valid URL
Last URL may have or may not have new line character/s after it
Till now, I have written this regex ^(https?://.+[\r\n]+)*(https?://.+[\r\n]+?)$ but the problem is that if a line has more than 1 url, it validates that too.
Here is my testing playground: http://goo.gl/YPdvBH.
Here is what you are looking for
Demo , Demo with your URLS
function validate(ele) {
str = ele.value;
str = str.replace(/\r/g, "");
while (/\s\n/.test(str)) {
str = str.replace(/\s\n/g, "\n");
}
while (/\n\n/.test(str)) {
str = str.replace(/\n\n/g, "\n");
}
ele.value = str;
str = str.replace(/\n/g, "_!_&_!_").split("_!_&_!_")
var result = [], counter = 0;
for (var i = 0; i < str.length; i++) {
str[i] = str[i].replace(/(?:(?:^|\n)\s+|\s+(?:$|\n))/g, '').replace(/\s+/g, ' ');
if(str[i].length !== 0){
if (isValidAddress(str[i])) {
result.push(str[i]);
}
counter += 1;
}
}
function isValidAddress(s) {
return /^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i.test(s)
}
return (result.length === str.length);
}
var ele = document.getElementById('urls');
validate(ele);
This is closer to the regex you are looking for:
^(https?://[\S]+[\r\n]+)*(https?://[\S]+[\r\n]+?)$
The difference between your regex and this one is that you use .+ which will match all characters except newline whereas I use [\S]+ (note it is a capital S) which will match all non-whitespace characters. So, this doesn't match more than one token on one line. Hence, on each line you can match at max one token and that must be of the form that you have defined.
For a regex to match a single URL, look at this question on StackOverflow:
What is the best regular expression to check if a string is a valid URL?
I don't know whether google-forms have a length limit. But if they have, it is sure to almost bounce into it.
If i understand right - in your regexp missing m flag for multiline, so you need something like this
/^(https?://.+this your reg exp for one url)$/m
sample with regexp from Javascript URL validation regex
/^(ht|f)tps?:\/\/[a-z0-9-\.]+\.[a-z]{2,4}\/?([^\s<>\#%"\,\{\}\\|\\\^\[\]`]+)?$/m

Categories

Resources