I have a paragraph of text which may contain some links in plain text, or some links which are actually links.
For example:
Posting a link: http://test.com, posting an image <img src="http://test.com/2.jpg" />. Posting an actual A tag: <a href="http://test.com/test.html">http://test.com/test.html</a>
I need to fish out the unformatted links from this piece of text, so I am looking for a regular expression that matches the first case but not the second or third, because those are already well-formatted links.
I've managed to fish out all the links with this regex: ((http:|https:)\/\/[a-zA-Z0-9&#=.\/\-?_]+), but I am still having trouble distinguishing between the cases.
This needs to be in JavaScript, so I don't think negative lookbehind is allowed.
Any help would be appreciated.
EDIT: I'm trying to wrap the fished out unformatted links in an a tag.
You can use this regex to get URLs outside of tags:
(?![^<]*>|[^<>]*<\/)((http:|https:)\/\/[a-zA-Z0-9&#=.\/\-?_]+)
We can shorten it a bit, too, with an i option:
(?![^<]*>|[^<>]*<\/)((https?:)\/\/[a-z0-9&#=.\/\-?_]+)
Sample code:
var re = /(?![^<]*>|[^<>]*<\/)((https?:)\/\/[a-z0-9&#=.\/\-?_]+)/gi;
var str = 'Posting a link: http://test.com, posting an image <img src="http://test.com/2.jpg" />. Posting an actual A tag: <a href="http://test.com/test.html">http://test.com/test.html</a>';
// Only the first, plain-text URL matches
var val = re.exec(str);
document.getElementById("res").innerHTML = "<b>URL Found</b>: " + val[1];
// Wrap each plain-text URL in an anchor
var subst = '<a href="$1">$1</a>';
var result = str.replace(re, subst);
document.getElementById("res").innerHTML += "<br><b>Replacement Result</b>: " + result;
<div id="res"></div>
Update:
To allow capturing inside specific tags, you can whitelist them like this:
var re = /(?![^<]*>|[^<>]*<\/(?!(?:p|pre)>))((https?:)\/\/[a-z0-9&#=.\/\-?_]+)/gi;
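As a rough, DOM-free sketch of how the whitelisted pattern behaves, the snippet below wraps plain-text URLs in anchors. The function name wrapPlainUrls and the sample string are made up for illustration; the regex is the one from the update above.

```javascript
// Hypothetical helper: wraps URLs that are not already inside a tag or an
// anchor's text. URLs whose next closing tag is </p> or </pre> are still wrapped.
function wrapPlainUrls(html) {
  var re = /(?![^<]*>|[^<>]*<\/(?!(?:p|pre)>))((https?:)\/\/[a-z0-9&#=.\/\-?_]+)/gi;
  return html.replace(re, '<a href="$1">$1</a>');
}

var sample = '<p>See http://a.com/x</p> and <a href="http://b.com">http://b.com</a>';
console.log(wrapPlainUrls(sample));
```

The URL inside the paragraph is wrapped, while the href value and the existing anchor text are left alone. This still relies on the markup being simple; deeply nested or malformed HTML may need a DOM-based approach instead.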
I have a requirement: my client sends me a string in which each link is written with the link title in square brackets followed by the URL in parentheses, like below:
[Google](https://www.google.com/)
I need to extract those values and make the link clickable (Google), by building markup like the one below and substituting it into the original text:
'<a href="' + url + '">' + url + '</a>'
Can anyone suggest a better way of doing this with a JavaScript regex?
Looks like Markdown formatting, so you could use a markdown library like Marked to parse and render it:
const s = '[Google](https://www.google.com/) ';
document.getElementById('content').innerHTML = marked.parse(s);
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
<div id="content"></div>
This can be done with the String replace function.
Regex: /\[([^\]]*)\]\s*\(([^)]*)\)/g
Replacer: <a href="$2">$1</a>
const str = `Lorem ipsum. [Google](https://www.google.com/). Sample text.`
const output = replaceWithLinks(str);
console.log(output);
function replaceWithLinks(str) {
  // $1 is the link title, $2 is the URL
  return str.replace(/\[([^\]]*)\]\s*\(([^)]*)\)/g, '<a href="$2">$1</a>')
}
In HTML file:
<h1 id="header"></h1>
In Js File:
const myString = "[Google](https://www.google.com/)";
// match() returns the full match (with brackets) at index 0 and the captured text (without brackets) at index 1; use index 1.
const show = myString.match(/\[(.*?)\]/);
const url = myString.match(/\((.*?)\)/);
document.getElementById("header").innerHTML = `<a href="${url[1]}">${show[1]}</a>`;
You have to use a regular expression here; see MDN for more about regular expressions.
Take the value at index 1 of each match and show it in the UI.
Regexp is very handy for this purpose. Just paste the code below into the F12 console for a preview:
"text before [Google](https://www.google.com/) and after".replace(/\[(.*?)\]\((.*?)\)/gm, '<a href="$2">$1</a>')
PS: the code is copied from a simple Markdown parser.
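For strings with several links, a sketch along the following lines may be safer. The helper names escapeHtml and markdownLinksToAnchors are invented here, and the escaping step is an addition that none of the answers above include; it only escapes the title and URL, not the surrounding text, and it does not check the URL scheme.

```javascript
// Hypothetical helper: escapes characters that are special in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Converts every [title](url) pair to an anchor. Negated character classes
// keep each match from running past its own closing bracket or parenthesis.
function markdownLinksToAnchors(text) {
  return text.replace(/\[([^\]]+)\]\(([^)\s]+)\)/g, function (match, title, url) {
    return '<a href="' + escapeHtml(url) + '">' + escapeHtml(title) + '</a>';
  });
}

console.log(markdownLinksToAnchors('[Google](https://www.google.com/) or [MDN](https://developer.mozilla.org/)'));
```

Using a replacer function instead of a '$1' style string makes it possible to escape each captured value before it is placed into the markup.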
I found the following function:
function addHyperlinks(str) {
  // Set the regex string
  var regex = /(https?:\/\/([-\w\.]+)+(:\d+)?(\/([\w\/_\.]*(\?\S+)?)?)?)/ig
  // Replace plain text links by hyperlinks
  var replaced_text = str.replace(regex, "<a href='$1' target='_blank'>$1</a>");
  // Echo link
  return replaced_text;
}
Which works okay, however when there is a dash in the URL it stops processing there. So for instance, the following URL:
http://website.com/some-internet-page
Will get replaced with:
<a href='http://website.com/some'>http://website.com/some</a>-internet-page
I'm not good with regex; could anyone help modify the above so that this doesn't happen?
@Tonny said it in the comments above: add a hyphen to the character class that matches the path.
/(https?:\/\/([-\w.]+)+(:\d+)?(\/([-\w\/.]*(\?\S+)?)?)?)/ig
Thank you!
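To see the fix in context, here is a minimal sketch of the question's function with the hyphen added to the path character class. The sample sentence is made up; everything else follows the code above.

```javascript
function addHyperlinks(str) {
  // Same structure as the question's regex; the hyphen is now allowed in the
  // path character class ([-\w\/.]) as well as in the host part.
  var regex = /(https?:\/\/([-\w.]+)+(:\d+)?(\/([-\w\/.]*(\?\S+)?)?)?)/ig;
  // Replace plain-text links with hyperlinks
  return str.replace(regex, "<a href='$1' target='_blank'>$1</a>");
}

console.log(addHyperlinks('See http://website.com/some-internet-page for details'));
```

The original path class [\w\/_\.] had no hyphen, which is why matching stopped at "some". Placing the hyphen first in the class keeps it from being read as a range.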
I have a string in which some HTML tags may be present, like
this is a nice day for bowling <b>bbbb</b>
How can I use a RegExp to replace every b character with, say, :blablabla:, but ONLY outside HTML tags?
So in that case the resulting string should become
this is a nice day for :blablabla:owling <b>bbbb</b>
EDIT: I would like to be more specific, based on the answers I have received. So first of all I have just a string, not DOM element, or anything else. The string may or may not contain tags (opening and closing). The main idea is to be able to replace anywhere in the text except inside tags. For example if I have a string like
not feeling well today :/ check out this link <a href="http://example.com">http://example.com</a>
the regexp should replace only the first :/ with a real smiley image, but should not replace the second and third, because they are inside (and part of) a tag. Here's an example snippet using the regexp from one of the answers.
var s = 'not feeling well today :/ check out this link <a href="http://example.com">http://example.com</a>';
var replaced = s.replace(/(?:<[^\/]*?.*?<\/.*?>)|(:\/)/g, "smiley_image_here");
document.querySelector("pre").textContent = replaced;
<pre></pre>
It is strange: the DEMO shows that it captures the correct group, but the same regexp in the replace function does not seem to work.
The regex itself to replace all bs with :blablabla: is not that hard:
.replace(/b/g, ":blablabla:")
It is a bit tricky to get the text nodes where we need to perform search and replace.
Here is a DOM-based example:
function replaceTextOutsideTags(input) {
  var doc = document.createDocumentFragment();
  var wrapper = document.createElement('myelt');
  wrapper.innerHTML = input;
  doc.appendChild(wrapper);
  return textNodesUnder(doc);
}
function textNodesUnder(el) {
  var n, walk = document.createTreeWalker(el, NodeFilter.SHOW_TEXT, null, false);
  while (n = walk.nextNode()) {
    if (n.parentNode.nodeName.toLowerCase() === 'myelt')
      n.nodeValue = n.nodeValue.replace(/:\/(?!\/)/g, "smiley_here");
  }
  return el.firstChild.innerHTML;
}
var s = 'not feeling well today :/ check out this link <a href="http://example.com">http://example.com</a>';
console.log(replaceTextOutsideTags(s));
Here, we only modify the text nodes that are direct children of the custom-created element named myelt.
Result:
not feeling well today smiley_here check out this link <a href="http://example.com">http://example.com</a>
var input = "this is a nice day for bowling <b>bbbb</b>";
var result = input.replace(/(^|>)([^<]*)(<|$)/g, function(_,a,b,c){
  return a
    + b.replace(/b/g, ':blablabla:')
    + c;
});
document.querySelector("pre").textContent = result;
<pre></pre>
You can do this:
var result = input.replace(/(^|>)([^<]*)(<|$)/g, function(_,a,b,c){
  return a
    + b.replace(/b/g, ':blablabla:') // you may do something else here
    + c;
});
Note that in most (not all, but most) real, complex use cases, it's much more convenient to manipulate a parsed DOM rather than just a string. If you're starting with an HTML page, you might use a library (some, like mine, accept regexes to do so).
I think you can use a regex like this (for simple, non-nested data only):
/<[^\/]*?b.*?<\/.*?>|(b)/ig
If you want to use a regex, I suggest using the regex below to remove tags repeatedly until all of them are gone:
/<[^\/][^<]*>[^<]*<\/.*?>/g
Then use a replace to find any b.
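The alternation idea (match a whole element first, otherwise capture the target) only works in replace when a callback decides what to return; with a plain replacement string, the matched elements are overwritten too. A minimal string-only sketch, assuming simple non-nested markup and using an invented helper name, replaceOutsideElements:

```javascript
// Hypothetical helper. Assumes simple, non-nested markup and a target
// regex whose flags can be ignored (only its source is reused).
function replaceOutsideElements(html, target, replacement) {
  var re = new RegExp(
    '<([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*>[\\s\\S]*?<\\/\\1>' + // a whole element, e.g. <a ...>...</a>
    '|<[^>]*>' +                                           // any other lone tag
    '|(' + target.source + ')',                            // the text we actually want
    'g'
  );
  return html.replace(re, function (match, tagName, hit) {
    // hit is undefined when an element or tag matched: return it unchanged
    return hit === undefined ? match : replacement;
  });
}

var s = 'not feeling well today :/ check out this link <a href="http://example.com">http://example.com</a>';
console.log(replaceOutsideElements(s, /:\//, 'smiley_here'));
console.log(replaceOutsideElements('this is a nice day for bowling <b>bbbb</b>', /b/, ':blablabla:'));
```

Only the :/ outside the anchor and the b in "bowling" are replaced. For nested or irregular markup, the DOM-based approach above is likely to be more reliable.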
Using Javascript, I'm trying to wrap span tags around certain text on the page, but I don't want to wrap tags around text already inside a set of span tags.
Currently I'm using:
html = $('#container').html();
var regex = /([\s| ]*)(apple)([\s| ]*)/g;
html = html.replace(regex, '$1<span class="highlight">$2</span>$3');
It works but if it's used on the same string twice or if the string appears in another string later, for example 'a bunch of apples' then later 'apples', I end up with this:
<span class="highlight">a bunch of <span class="highlight">apples</span></span>
I don't want it to replace 'apples' the second time because it's already inside span tags.
It should match 'apples' here:
Red apples are my <span class="highlight">favourite fruit.</span>
But not here:
<span class="highlight">Red apples are my favourite fruit.</span>
I've tried using this but it doesn't work:
([\s| ]*)(apples).*(?!</span)
Any help would be appreciated. Thank you.
First off, you should know that parsing HTML with regex is generally considered to be a bad idea; a DOM parser is usually recommended. With this disclaimer, I will show you a simple regex solution.
This problem is a classic case of the technique explained in this question to "regex-match a pattern, excluding..."
We can solve it with a beautifully-simple regex:
<span.*?<\/span>|(\bapples\b)
The left side of the alternation | matches complete <span... /span> tags. We will ignore these matches. The right side matches and captures apples to Group 1, and we know they are the right ones because they were not matched by the expression on the left.
This program shows how to use the regex (see the results in the right pane of the online demo). Please note that in the demo I replaced with [span] instead of <span> so that the result would show in the browser (which interprets the html):
var subject = 'Red apples are my <span class="highlight">favourite apples.</span>';
var regex = /<span.*?<\/span>|(\bapples\b)/g;
var replaced = subject.replace(regex, function(m, group1) {
  // Group 1 is undefined when the left side (a whole span) matched: keep it as is
  if (group1 === undefined) return m;
  else return "<span class=\"highlight\">" + group1 + "</span>";
});
document.write("<br>*** Replacements ***<br>");
document.write(replaced);
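Since the question is about running the highlighter more than once, the sketch below wraps the same technique in a function and applies it twice. The name highlightOutsideSpans and the word-escaping step are assumptions added for illustration; nested spans are not handled.

```javascript
// Hypothetical helper: highlights a word unless it already sits inside a span.
function highlightOutsideSpans(html, word) {
  // Escape regex metacharacters in the word before building the pattern
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var re = new RegExp('<span\\b[\\s\\S]*?<\\/span>|(\\b' + escaped + '\\b)', 'g');
  return html.replace(re, function (match, hit) {
    // hit is undefined when a whole span matched: leave it untouched
    return hit === undefined ? match : '<span class="highlight">' + hit + '</span>';
  });
}

var subject = 'Red apples are my <span class="highlight">favourite apples.</span>';
var once = highlightOutsideSpans(subject, 'apples');
var twice = highlightOutsideSpans(once, 'apples');
console.log(once);
console.log(once === twice);
```

The second pass finds the spans created by the first pass on the left side of the alternation, so nothing is wrapped again.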
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
Article about matching a pattern unless...
My question is related only to JavaScript regular expressions.
I'm building a simple Lightbox for Wordpress with Mootools JavaScript framework.
Wordpress stores pictures in a variety of sizes with file names like:
'image-50-50x100.jpg'
'image-50-150x100.jpg'
'image-50-1024x698.jpg'
'image-50.jpg'
When a user clicks a thumbnail image, I have to convert the source of that image into the source of full size image, and then preload that full-size image.
The question
How can I change strings like these:
'http://some-path/image-50-50x100.jpg'
'http://some-path/image-50-150x100.jpg'
'http://some-path/image-50-1024x698.jpg'
'http://some-path/image-50.jpg'
, into:
'http://some-path/image-50.jpg'
The missing piece is an accurate regular expression in the code below:
source.replace( /regular-expression/, '' );
Thanks in advance.
This should do it:
str = str.replace(/-\d+x\d+/, '');
E.g.:
var str = 'http://some-path/image-50-1024x698.jpg';
str = str.replace(/-\d+x\d+/, '');
console.log(str); // "http://some-path/image-50.jpg"
And for the case where you don't want it to change, it doesn't:
var str = 'http://some-path/image-50.jpg';
str = str.replace(/-\d+x\d+/, '');
console.log(str); // "http://some-path/image-50.jpg"
Edit: You've said in a comment elsewhere that:
In some rare cases it can happen that Wordpress user uploads image like image-1024x698.jpg, then Wordpress creates thumb image like image-1024x698-300x300.jpg
Okay, so we add \. to the end of the pattern and use . as the replacement:
var str = 'http://some-path/image-1024x698-300x300.jpg';
str = str.replace(/-\d+x\d+\./, '.');
console.log(str); // "http://some-path/image-1024x698.jpg"
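Putting the cases together, a small sketch might look like this. The function name toFullSizeUrl is made up, and the pattern is a slight variation on the one above: it anchors the size suffix to the file extension at the end of the string.

```javascript
// Hypothetical helper: strips a trailing "-<width>x<height>" that sits
// directly before the file extension, and leaves other URLs unchanged.
function toFullSizeUrl(src) {
  return src.replace(/-\d+x\d+(\.[^.\/]+)$/, '$1');
}

var samples = [
  'http://some-path/image-50-50x100.jpg',
  'http://some-path/image-50-1024x698.jpg',
  'http://some-path/image-50.jpg',
  'http://some-path/image-1024x698-300x300.jpg'
];
samples.forEach(function (src) {
  console.log(src + ' -> ' + toFullSizeUrl(src));
});
```

One ambiguity remains: a full-size upload that is itself named like image-1024x698.jpg cannot be told apart from a thumbnail by its name alone, so it would also be shortened.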
Try:
source.replace(/(.+\/[^-]+-[^-]+)(-\d+x\d+)*\.([^\.]+)$/, '$1.$3')