REGEX Str Replace- finding link text and making them links

I've written this functionality in Flash before without issue; however, I'm now attempting to do it with JavaScript and I'm running into some difficulty.
Using a regex, I'm trying to scan a string for anything that resembles a link, for example http://www.google.com or http://stacoverflow.com/question/ask, and then wrap each result in the appropriate <a> tag:
<script type="text/javascript">
var mystring = "I'm trying to make this link http://facebook.com/lenfontes active.";
// Regex that worked in Flash to grab ALL parts of the URL, including everything after the .com
var http = /\b(([\w-]+:\/\/?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|\/)))/gi;
// Perform the replace based on the regex
mystring = mystring.replace(http, function () {
  var link = arguments[0];
  // Prepend a protocol when the match starts with "www"
  if (link.indexOf("http") == -1) {
    link = "http://" + link;
  }
  return "<a href='" + link + "'>" + arguments[0] + "</a>";
});
$('#results').html(mystring);
</script>
The issue I'm having: anything after the ...com/ is ignored. Therefore the link isn't correct...
Even while I post this question, the Stack Overflow interface has taken my text and rendered it out with the appropriate link tags... I need to do that.
Any suggestions welcomed.
-- Update:
Sorry, let me expand: I'm not only looking for http://. It also needs to detect and wrap links that start with "https://" or just "www".

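Your regex doesn't work because ECMAScript (JavaScript) regular expressions don't support POSIX character classes. JavaScript reads [^[:punct:] as a negated class of the literal characters [, :, p, u, n, c and t, followed by \s and a literal ]. That alternative cannot match at the end of your URL, so the engine backtracks until the only other alternative, \/, matches, which is why the match stops at the / after .com.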
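A minimal sketch of one way to avoid the POSIX class, assuming it is enough to treat a link as a run of non-whitespace characters that does not end in common punctuation. The function name linkify and the exact set of excluded trailing characters are illustrative choices, not something from the question, and the sketch does not HTML-escape its input.

```javascript
// Hypothetical helper: wraps http://, https:// and www. links in <a> tags.
function linkify(text) {
  // Protocol or "www.", then a run of non-space characters, ending on a
  // character that is not whitespace, a bracket, or trailing punctuation.
  var urlPattern = /\b(?:https?:\/\/|www\.)[^\s()<>]*[^\s()<>.,;:!?'"]/gi;
  return text.replace(urlPattern, function (match) {
    // Links that start with "www." need a protocol to be usable as an href.
    var href = /^https?:\/\//i.test(match) ? match : "http://" + match;
    return '<a href="' + href + '">' + match + "</a>";
  });
}

console.log(linkify("I'm trying to make this link http://facebook.com/lenfontes active."));
```

The trailing character class is what keeps the final full stop of a sentence out of the link while still allowing dots and slashes inside the URL.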

Related

How to grab URLs in JavaScript without harming embedded objects and inline URL

I wrote a RegExp to grab and encode URLs in JavaScript.
This works fine, but it introduced a bug into my app.
I have a span element that is used to display emojis, like this:
<span style="background:url(http://localhost/res/emo/face/E004.png)"></span>
Now, I'm using this regular expression to find any URL and convert it into a clickable HTML link:
/((https?:\/\/)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\/\S*)?)/ig
This ended up turning the emoji URL into a clickable link as well.
Can anyone adjust the code to ignore URLs inside elements or embedded objects?
Please, I need help!
This is the code:
var urlRegex = /((https?:\/\/)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\/\S*)?)/ig;
return txt.replace(urlRegex, function (url) {
  var hyperlink = url;
  // Add a protocol when the matched text lacks one
  if (!hyperlink.match('^https?:\/\/')) {
    hyperlink = 'http://' + hyperlink;
  }
  return `<a href="${hyperlink}">${url}</a>`;
});
I don't want the URL inside
<span style="background:url(http://localhost/res/emo/face/E004.png)"></span>
to be touched.

You would need a negative lookbehind, which has limited support in JavaScript (see https://stackoverflow.com/a/50434875/6853740).
Simply adding a negative lookbehind to your existing regex still doesn't work as expected:
((?<!url\()(https?:\/\/)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\/\S*)?)
still matches "E004.png" in your example, because the protocol is optional and the match can simply start later in the string. Other URL regexes, such as those from "What is the best regular expression to check if a string is a valid URL?", also match it. Consider only matching links that start with http:// or https://, which lets you write a regex that matches full URLs only.
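A rough sketch of that suggestion, assuming the runtime supports lookbehind (ES2018) and that only links starting with http:// or https:// need to be converted. The helper name linkifyOutsideMarkup and the characters chosen to end a URL are assumptions made for illustration.

```javascript
// Hypothetical helper: links full URLs but skips ones that directly follow
// "url(", a quote, or "=", i.e. URLs that already sit inside markup or CSS.
function linkifyOutsideMarkup(txt) {
  // The protocol is mandatory here, so the match cannot start later in the URL.
  var urlRegex = /(?<!url\(|["'=])https?:\/\/[^\s<>"')]+/gi;
  return txt.replace(urlRegex, function (url) {
    return '<a href="' + url + '">' + url + "</a>";
  });
}

var sample = '<span style="background:url(http://localhost/res/emo/face/E004.png)"></span> see http://example.com/page';
console.log(linkifyOutsideMarkup(sample));
```

Because the protocol is required, the engine cannot fall back to matching a fragment such as E004.png once the lookbehind rejects the start of the emoji URL.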

Regex expression to match certain url behavior in my website

I have the following URL:
https://myurl/blogs/<blog-category>/<blog-article>
I'm trying to create a regex so I can trigger a script only when I'm on an article page.
I tried this, among other attempts, but it didn't work, and I'm not really the best at building regexes.
window.location.pathname.match(/\/blogs\/^[a-zA-Z0-9_.-]*$\/^[a-zA-Z0-9_.-]*$/)
As I understand it, the first part of this regex (\/blogs\/) just matches a fixed string.
The next parts try to match any combination of letters, digits and the characters _, . and - (which covers the strings I can have there).
However, this is not working at all.
My script looks like this:
if(window.location.pathname.match(/\/blogs\/^[a-zA-Z0-9_.-]*$\/^[a-zA-Z0-9_.-]*$/)){
// A code implementation here
}
Note: One thing I noticed while writing this is that if I remove everything else and just try
window.location.pathname.match(/\/blogs\/)
It doesn't work either.
Can someone help me solve this? I would also appreciate any guide that can help me improve my regex skills.
Thanks!
Update: to get this working I had to split my condition into two parts.
It ended up looking like this:
var path = window.location.pathname;
const regEx = /\/blogs\/[a-zA-Z0-9_.-]*\/[a-zA-Z0-9_.-]*/i;
if(path.match(regEx)){
// My code here
}

This should work:
\/blogs\/[a-zA-Z0-9_.-]*\/[a-zA-Z0-9_.-]*
The "^" symbol asserts the start of the string and "$" asserts the end, so neither can match in the middle of the path, which is where your pattern places them.
I would suggest testing your regex at https://regexr.com/ to rule out problems caused by other code.
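For illustration, here is a small sketch that places the anchors at the two ends of the pattern instead of the middle. The function name isBlogArticle is hypothetical, and using + instead of * (so that empty segments are rejected) is a variation on the regex above, not what the answer itself proposes.

```javascript
// Hypothetical helper; in a browser you would pass window.location.pathname.
function isBlogArticle(pathname) {
  // ^ and $ anchor the whole path; + requires each segment to be non-empty.
  var articlePath = /^\/blogs\/[a-zA-Z0-9_.-]+\/[a-zA-Z0-9_.-]+\/?$/;
  return articlePath.test(pathname);
}

console.log(isBlogArticle("/blogs/news/my-first-post")); // true
console.log(isBlogArticle("/blogs/news/"));              // false
```

With the unanchored * version, a category page such as /blogs/news/ would also match, because the second segment is allowed to be empty.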

var patt = /\/blogs\/[a-zA-Z0-9_.-]*\/[a-zA-Z0-9_.-]*/i;
window.location.pathname.match(patt);
You can try using this.

display current subdirectory URL of webpage

I'm having an issue with the following code: it displays the full URL of the current page (e.g. example.com/dir1), but I want to display the subdirectories only (e.g. /dir1/). It also does not display spaces properly; they show up URL-encoded as %20. I have very little programming experience, and any help would be greatly appreciated.
<p3><script>document.write(location.href);</script></p3>
EDIT: I'm working on the following script, but I'm having trouble implementing it:
<script>str.replace("%20", " ")</script>
Any thoughts?
EDIT: the answer that suited my needs, many thanks to brettc:
var url = location.href;
url = url.split("example.com").pop();
url = decodeURIComponent(url);
document.write(url);

This may not be the best way, but it is a way nonetheless.
You could split the string on "/" and take the last element.
Also, decodeURIComponent() will decode the %20 to a space.
<script>
var url = location.href; // get the URL
url = url.split("/").pop(); // keep the last element of the "/"-separated parts
url = decodeURIComponent(url); // decode %20 and other percent-encoded characters
alert(url);
</script>
Note: this will alert everything after the last "/".
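As an alternative sketch, the path can be read from the parsed URL rather than by splitting the string, which also keeps nested directories and a trailing slash. The helper name decodedPath is made up for this example; in a browser, location.pathname gives the same undecoded value directly.

```javascript
// Hypothetical helper; in a browser you would pass location.href.
function decodedPath(href) {
  // The URL constructor parses the address; pathname excludes host and query.
  var pathname = new URL(href).pathname;
  // Turn percent-encoded characters such as %20 back into plain text.
  return decodeURIComponent(pathname);
}

console.log(decodedPath("http://example.com/my%20dir/")); // prints /my dir/
```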

How to remove URL from a string completely in Javascript?

I have a string that may contain several URLs (http or https). I need a script that removes all of those URLs from the string and returns the string without them.
So far I have tried:
var url = "and I said http://fdsadfs.com/dasfsdadf/afsdasf.html";
var protomatch = /(https?|ftp):\/\//; // NB: not '.*'
var b = url.replace(protomatch, '');
console.log(b);
but this only removes the http:// part and keeps the rest of the link.
How do I write a regex that removes everything that follows http and also handles several links in the string?
Thank you so much!

You can use this regex:
var b = url.replace(/(?:https?|ftp):\/\/[\n\S]+/g, '');
//=> and I said
This regex matches and removes any URL that starts with http://, https:// or ftp://, up to the next space character or the end of input. Because the class [\n\S] also accepts newlines, a match can continue across line breaks.
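A short sketch of the same idea applied to a string with several links, with the leftover spaces tidied up afterwards. The helper name removeUrls is an assumption, and this variant uses \S+ so that a match stops at a line break, unlike the [\n\S]+ class above.

```javascript
// Hypothetical helper: strips URLs, then tidies the spaces they leave behind.
function removeUrls(text) {
  return text
    .replace(/(?:https?|ftp):\/\/\S+/g, "") // drop each URL up to the next whitespace
    .replace(/ {2,}/g, " ")                  // collapse runs of spaces left behind
    .trim();                                 // remove a leading or trailing gap
}

console.log(removeUrls("see http://a.com and https://b.org/x now")); // prints see and now
```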

Did you search for a URL parser regex? This question has a few comprehensive answers: Getting parts of a URL (Regex).
That said, if you want something much simpler (and maybe not as perfect), remember to match the entire URL string and not just the protocol.
Something like
/(https?|ftp):\/\/[.a-zA-Z0-9\/-]+/
should work better. Notice that the added half matches the rest of the URL after the protocol.

Only match regex if it doesnt start with a pattern in javascript

I have a bit of a strange one here. I have a large chunk of text which may or may not contain links to images.
Say it does. I have a pattern that extracts the image URL fine; however, once a match is found it is replaced with an <img> element with the link as the src. The problem is that there may be multiple matches within the text, and this is where it gets tricky: the URL pattern will now also match the URL in the src attribute, which results in an infinite loop.
So is there a way to match ONLY if the URL isn't preceded by a pattern like ="|='? Then it would match the URL in something like:
some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6
but not
some image <img src="http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6">
I am not sure if it is possible, but if it is, could someone point me in the right direction? A replace by itself will not suffice in this scenario, because the matched URL needs to be used elsewhere too, so it has to be available as a capture.
The main scenarios I need to account for are:
Many links in one block of varied text
A single link without any other text
A single link with other varied text
== edit ==
Here is the current regex I am using to match urls:
(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
== edit 2 ==
Just so everyone understands why I cannot use the /g flag, here is an answer which explains the issue: Javascript regex multiple captures again. If I could use /g like I originally tried, it would make things a lot simpler.

What you are looking for is a negative lookbehind, but JavaScript did not support lookbehind of any kind before ES2018. Without it, you either have to use a callback function to check what was matched and make sure it is not preceded by a ' or ", or use the following regex:
(?:^|[^"'])(\b(https?|ftp|file):\/\/[-a-zA-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
Its one drawback is that a successful match also includes one extra character, the one right before the (\b(https?|ftp|file) pattern in the input, but you can deal with this easily.
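One way to deal with that extra character, sketched under the assumption that String.prototype.matchAll is available: capture the preceding character in its own group and read the URL from the second group. The function name findBareImageUrls is hypothetical; the character class is copied from the question as-is.

```javascript
// Hypothetical helper: returns image URLs that are not directly preceded by
// a single or double quote, so URLs already inside src="..." are skipped.
function findBareImageUrls(text) {
  // Group 1 holds the character before the URL (or nothing at the start of
  // the text); group 2 holds the URL itself.
  var pattern = /(^|[^"'])(\b(?:https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/gi;
  var urls = [];
  for (var match of text.matchAll(pattern)) {
    urls.push(match[2]);
  }
  return urls;
}

console.log(findBareImageUrls('some image http://a.com/x.png and <img src="http://a.com/y.gif">'));
```

Reading group 2 leaves the preceding character out of the result, so the URL can be used elsewhere without trimming.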

Using the /ig flags at the end should work: g makes the replace global and i makes it case-insensitive, which is necessary because your character class only has A-Z instead of a-zA-Z.
The following vanilla JS appears to work for me:
var test="some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6";
var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
document.getElementById("output").innerHTML = test.replace(re,"<img src=\"$1\"/>");
It does highlight, though, that the query string part of the URL (the ?v=6) is not being picked up by your regex.
For jQuery, it would be:
$(document).ready(function(){
var test="some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6";
var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
$("#output").html(test.replace(re,"<img src=\"$1\"/>"));
});
Update
Just in case using the same image URL three times in the example doesn't convince you, it also works with different URLs:
var test="http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 http://cdn.sstatic.net/serverfault/img/sprites.png?v=7";
var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
document.getElementById("output").innerHTML = test.replace(re,"<img src=\"$1\"/>");

Couldn't you just check for whitespace in front of the URL instead of that word boundary? It seems to work, although you will have to remove the matched whitespace afterwards.
(\s(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
http://rubular.com/r/9wSc0HNWas
Edit: Damn, too slow :) I'll still leave this here as my regex is shorter ;)

As freefaller said, you could use the /g flag to find all matches in one go, if exec is not a must.
Otherwise, you can add (="|=')? to the beginning of your regex and check whether $1 is undefined. If it is undefined, the match was not preceded by a ="|=' pattern.
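A sketch of the optional-prefix idea, assuming a replace callback is acceptable. The function name wrapBareImageUrls and the generated <img> markup are illustrative; the URL character class is the one from the question.

```javascript
// Hypothetical helper: wraps bare image URLs in <img> tags and leaves URLs
// that already follow =" or =' untouched.
function wrapBareImageUrls(html) {
  // Group 1 is the optional =" or =' prefix; group 2 is the image URL.
  var pattern = /(="|=')?(\b(?:https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/gi;
  return html.replace(pattern, function (match, prefix, url) {
    // An undefined prefix means the URL was not inside an attribute value.
    return prefix === undefined ? '<img src="' + url + '">' : match;
  });
}

console.log(wrapBareImageUrls('a http://x.com/a.png b <img src="http://x.com/b.jpg">'));
```

Because attribute URLs are returned unchanged, running the function again on its own output leaves the text as it is, which avoids the repeated-match loop described in the question.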
