wrapping keywords in hyperlinks - replacement takes place infinitely many times - javascript

I'm trying to wrap specific keywords in hyperlinks, but the replacement takes place infinitely many times:
var replacements = [
{ txt: 'magicFunction', link: 'https://www.example.com/doc/api/magicFunction.htm' },
];
$(function() {
$.each(replacements,
function() {
var searchWord = this.txt;
var link = this.link;
$('body:contains("' + searchWord + '")').each(function() {
var newHtml = $(this).html().replace(searchWord,
'<a href="' + link + '">' + searchWord + '</a>');
$(this).html(newHtml);
});
}
);
});
I'd need a condition around the matching part to say that if the keyword is already wrapped in a hyperlink then nothing should be done, or some other workaround.
How can it be fixed?
https://jsfiddle.net/m4j28s13/

You can select all nodes in the body but exclude all <a> elements:
$('body *:not(a):contains("' + searchWord + '")').each(...)
See proof-of-concept example:
var replacements = [{
txt: 'magicFunction',
link: 'https://www.example.com/doc/api/magicFunction.htm'
}, ];
$.each(replacements,
function() {
var searchWord = this.txt;
var link = this.link;
$('body *:not(a):contains("' + searchWord + '")').each(function() {
var newHtml = $(this).html().replace(searchWord,
'<a href="' + link + '">' + searchWord + '</a>');
$(this).html(newHtml);
});
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<p>This sentence mentions magicFunction() in a paragraph.</p>
<p>The following code block (from API reference) mentions it too:</p>
<code class="code-block hljs lua">if a==0 then
h=magicFunction('foo')
end</code>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.6.0/highlight.min.js"></script>
Update: to handle cases where an <a> element may contain nested tags that contain the replacement word, another solution is to replace :contains with a guard clause in the callback that checks whether the element's own child text nodes contain the keyword. If they do, perform the replacement:
var replacements = [{
txt: 'magicFunction',
link: 'https://www.example.com/doc/api/magicFunction.htm'
}, ];
$.each(replacements,
function() {
var searchWord = this.txt;
var link = this.link;
$('*:not(a, script)').each(function() {
const textContent = $(this).contents().filter(function() {
return this.nodeType === Node.TEXT_NODE;
}).text();
if (textContent.match(searchWord)) {
var newHtml = $(this).html().replace(searchWord,
'<a href="' + link + '">' + searchWord + '</a>');
$(this).html(newHtml);
}
});
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<p>This sentence mentions magicFunction() in a paragraph.</p>
<p>The following code block (from API reference) mentions it too:</p>
<code class="code-block hljs lua">if a==0 then
h=magicFunction('foo')
end</code>
<p>This mention is already linked (should not be linked again): <a class="postlink" href="//www.example2.com/doc/api/magicFunction"><code style="display:inline">magicFunction</code></a></p>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/10.6.0/highlight.min.js"></script>
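One way to make the replacement safe to run more than once is to skip any text that already sits inside an <a> element and never touch the tags themselves. The following is a minimal string-level sketch of that guard, not the answer's jQuery code: linkKeyword is a hypothetical helper, and it assumes HTML simple enough to be split on tags with a regular expression (a real HTML parser or a DOM walk is more robust).

```javascript
// Hypothetical helper: wrap every occurrence of `word` in a link,
// but only in text that is not already inside an <a> element.
function linkKeyword(html, word, url) {
  var anchorDepth = 0; // number of <a> elements currently open

  // Splitting on a capturing group keeps the tags in the token list.
  return html.split(/(<[^>]+>)/).map(function (token) {
    var isTag = /^<[^>]+>$/.test(token);
    if (isTag) {
      if (/^<a[\s>]/i.test(token)) {
        anchorDepth++;
      } else if (/^<\/a\s*>/i.test(token)) {
        anchorDepth--;
      }
      return token; // tags (and their href values) are never rewritten
    }
    if (anchorDepth > 0) {
      return token; // text already inside a link stays as it is
    }
    return token.split(word).join('<a href="' + url + '">' + word + '</a>');
  }).join('');
}

var url = 'https://www.example.com/doc/api/magicFunction.htm';
var once = linkKeyword('<p>This mentions magicFunction() once.</p>', 'magicFunction', url);
var twice = linkKeyword(once, 'magicFunction', url);
console.log(once === twice); // a second pass changes nothing
```

Because tag tokens are returned untouched, a keyword that also appears in the link URL is not matched again, and a keyword nested inside an existing link (for example in a <code> element within an <a>) is left alone.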

Related

Function to remove <span></span> from string in an json object array in JavaScript

I know there are many similar questions posted, and have tried a couple solutions, but would really appreciate some guidance with my specific issue.
I would like to remove the following HTML markup from my string for each item in my array:
<SPAN CLASS="KEYWORDSEARCHTERM"> </SPAN>
I have an array of json objects (printArray) with a printArray.header that might contain the HTML markup.
The header text is not always the same.
Below are 2 examples of what the printArray.header might look like:
<SPAN CLASS="KEYWORDSEARCHTERM">MOST EMPOWERED</SPAN> COMPANIES 2016
RECORD WINE PRICES AT <SPAN CLASS="KEYWORDSEARCHTERM">NEDBANK</SPAN> AUCTION
I would like to strip the HTML markup, leaving me with the following results:
MOST EMPOWERED COMPANIES 2016
RECORD WINE PRICES AT NEDBANK AUCTION
Here is my function:
var newHeaderString;
var printArrayWithExtract;
var summaryText;
this.setPrintItems = function(printArray) {
angular.forEach(printArray, function(printItem){
if (printItem.ArticleText === null) {
summaryText = '';
}
else {
summaryText = '... ' + printItem.ArticleText.substring(50, 210) + '...';
}
// Code to replace the HTML markup in printItem.header
// and return newHeaderString
printArrayWithExtract.push(
{
ArticleText: printItem.ArticleText,
Summary: summaryText,
Circulation: printItem.Circulation,
Headline: newHeaderString,
}
);
});
return printArrayWithExtract;
};
Try this function. It will remove all markup tags...
function strip(html)
{
var tmp = document.createElement("DIV");
tmp.innerHTML = html;
return tmp.textContent || tmp.innerText || "";
}
Call this function sending the html as a string. For example,
var str = '<SPAN CLASS="KEYWORDSEARCHTERM">MOST EMPOWERED</SPAN> COMPANIES 2016';
var expectedText = strip(str);
expectedText now holds the text without the markup.
It can be done using regular expressions, see below:
var s1 = '<SPAN CLASS="KEYWORDSEARCHTERM">MOST EMPOWERED</SPAN> COMPANIES 2016';
var s2 = 'RECORD WINE PRICES AT <SPAN CLASS="KEYWORDSEARCHTERM">NEDBANK</SPAN> AUCTION';
function removeSpanInText(s) {
return s.replace(/<\/?SPAN[^>]*>/gi, "");
}
$("#x1").text(removeSpanInText(s1));
$("#x2").text(removeSpanInText(s2));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
1 ->
<span id="x1"></span>
<br/>2 ->
<span id="x2"></span>
For more info, see e.g. Javascript Regex Replace HTML Tags.
And jQuery is not needed, just used here to show the output.
I used this little replace function:
if (printItem.Headline === null) {
headlineText = '';
}
else {
var str = printItem.Headline;
var rem1 = str.replace('<SPAN CLASS="KEYWORDSEARCHTERM">', '');
var rem2 = rem1.replace('</SPAN>', '');
var newHeaderString = rem2;
}
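Note that String.prototype.replace with a string pattern, as in the last snippet, only replaces the first occurrence, so a header with two highlighted terms would keep its second <SPAN>. A minimal sketch that combines the regex answer with the null check, assuming a hypothetical helper named stripKeywordSpans and the header property from the question (the sample data is illustrative):

```javascript
// Hypothetical helper: remove every opening and closing SPAN tag,
// whatever its case or attributes, and treat null/undefined as ''.
function stripKeywordSpans(text) {
  if (text === null || text === undefined) {
    return '';
  }
  return String(text).replace(/<\/?span\b[^>]*>/gi, '');
}

// Sample data shaped like the question's printArray.
var printArray = [
  { header: '<SPAN CLASS="KEYWORDSEARCHTERM">MOST EMPOWERED</SPAN> COMPANIES 2016' },
  { header: 'RECORD WINE PRICES AT <SPAN CLASS="KEYWORDSEARCHTERM">NEDBANK</SPAN> AUCTION' },
  { header: null }
];

var headlines = printArray.map(function (printItem) {
  return stripKeywordSpans(printItem.header);
});

console.log(headlines);
```

Inside setPrintItems the same call would go where the placeholder comment is, for example newHeaderString = stripKeywordSpans(printItem.header).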

Twitter API - filter out @, # and other links

I'm using the Twitter API to get the top 5 tweets for my app. I need to highlight or link parts of the tweets differently. For example, @'s will be orange, #'s will be red and clickable, etc.
From their API, they offer user_timeline endpoint:
https://dev.twitter.com/rest/reference/get/statuses/user_timeline
But the tweet object's text is returned with those special characters embedded within it. I don't see options to pull out those @, # and href parts from the object:
Tweets object:
{
...
text: "This is some text #tagName that I'd like to #parse here https://t.co/m9Addr4IlS",
...
}
While I can write my own string parser to look for those things, is there something the Twitter API offers to handle this?
EDIT: <tweets> is an Angular directive that ng-repeats over my tweets from ModulesService. replace doesn't seem to be appending the DOM tags
scope.getTweets = function() {
ModulesService.getTweets().success(function(res) {
if (res && Array.isArray(res)) {
scope.tweets = parseTweets(res);
}
});
};
scope.getTweets();
var parseTweets = function (tweets) {
tweets.forEach(function (tweet) {
tweet.text.replace(/(@[^ ]+)/g, '<a class="user">$1</a>').
replace(/(#[^ ]+)/g, '<span class="hash">$1</span>').
replace(/(https?:\/\/[^ ]+)/g, '<a href="$1">$1</a>');
console.log('tweet!', tweet.text); //does not contain altered HTML
});
return tweets;
};
HTML:
<div ng-repeat="tweet in tweets" class="post-body clearfix">
{{tweet.text}}
</div>
recommended solution
The library twitter-text does the work for you.
As per their examples:
autolink
var twitter = require('twitter-text')
twitter.autoLink(twitter.htmlEscape('#hello < @world >'))
extract entities
var usernames = twttr.txt.extractMentions("Mentioning @twitter and @jack")
// usernames == ["twitter", "jack"]
Using that solution will save you from reinventing the wheel and give you stable, working code :)
alternative
Inside the tweet object that you receive from the user_timeline API endpoint, the entities property stores the list of urls, hashtags and mentions included inside the tweet. These contain the text content as well as the position (start / end character indices) of each entity.
Example hashtag entity:
"entities": {
"hashtags": [
{
"text": "pleaseRT",
"indices": [
6,
13
]
}
]
}
cf Entities documentation for more info.
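The entity indices can be used to wrap each hashtag without parsing the text yourself. The sketch below is an illustration under assumptions: highlightHashtags is a hypothetical helper, the sample tweet is made up rather than real API output, and the indices are treated as plain JavaScript string offsets (as far as I know the API counts Unicode code points, so tweets containing emoji may need an offset conversion first).

```javascript
// Hypothetical helper: wrap each hashtag in a span using the
// [start, end) indices listed in tweet.entities.hashtags.
function highlightHashtags(tweet) {
  var text = tweet.text;
  var hashtags = ((tweet.entities || {}).hashtags || []).slice();

  // Work from the end of the string so earlier indices stay valid
  // while markup is being inserted.
  hashtags.sort(function (a, b) {
    return b.indices[0] - a.indices[0];
  });

  hashtags.forEach(function (tag) {
    var start = tag.indices[0];
    var end = tag.indices[1];
    text = text.slice(0, start) +
      '<span class="hash">' + text.slice(start, end) + '</span>' +
      text.slice(end);
  });
  return text;
}

// Illustrative tweet object, not real API output.
var sampleTweet = {
  text: 'Hello #pleaseRT now',
  entities: { hashtags: [{ text: 'pleaseRT', indices: [6, 15] }] }
};

console.log(highlightHashtags(sampleTweet));
```

The same loop could handle the user_mentions and urls entity lists by merging them into one array before sorting.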
Try:
var text = "This is some text #tagName that I'd like to #parse here https://t.co/m9Addr4IlS";
var div = document.getElementsByTagName('div')[0];
div.innerHTML = text.replace(/(@[^ ]+)/g, '<a class="user">$1</a>').
replace(/(#[^ ]+)/g, '<span class="hash">$1</span>').
replace(/(https?:\/\/[^ ]+)/g, '<a href="$1">$1</a>');
.hash { color: orange; }
.user { color: red; }
<div></div>
Loop over the returned tweets and modify the tweet text according to some conditions:
returnValues.forEach(function (tweet) {
if (tweet.text.search(/@|#/) > -1) {
var words = tweet.text.split(' ');
var parsedTweetText = words.map(function (word) {
if (word.indexOf('#') === 0)
return '<span class="hashtag">' + word + '</span>';
else if (word.indexOf('@') === 0)
return '<span class="at-user">' + word + '</span>';
else
return word;
}).join(' ');
tweet.text = parsedTweetText;
}
});
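In the question's parseTweets the markup never shows up because replace() returns a new string and that result is discarded; strings are immutable, so tweet.text itself is never changed. A minimal sketch that stores the result, assuming hypothetical helpers linkifyTweet and parseTweets and a new html property on each tweet (the input is not HTML-escaped here, which real code should do first):

```javascript
// Hypothetical helper: turn URLs, @mentions and #hashtags into markup.
function linkifyTweet(text) {
  return text
    // URLs first, so the '#' or '@' inside a URL is already part of a tag.
    .replace(/(https?:\/\/[^ ]+)/g, '<a href="$1">$1</a>')
    // Only match @ and # at the start of the text or after whitespace.
    .replace(/(^|\s)(@\w+)/g, '$1<a class="user">$2</a>')
    .replace(/(^|\s)(#\w+)/g, '$1<span class="hash">$2</span>');
}

function parseTweets(tweets) {
  return tweets.map(function (tweet) {
    // replace() returns a new string, so the result has to be stored.
    return Object.assign({}, tweet, { html: linkifyTweet(tweet.text) });
  });
}

var parsed = parseTweets([{ text: 'Hi @jack see #news https://t.co/abc' }]);
console.log(parsed[0].html);
```

In AngularJS, {{tweet.text}} renders its value as plain text, so the generated markup would need ng-bind-html (with ngSanitize) to be displayed as HTML.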

jquery target text nodes that begin with (string1) and end with (string2)

I'd like to be able to target (and then remove) this string of text:
[UPLOAD]any-possible_FILEname.ANY[/UPLOAD]
HTML:
var filenameRegex = new RegExp("\w\d\.");
$('.posts').contents().filter(':contains([UPLOAD])').filter(':contains([/UPLOAD])').filter(function() {
return filenameRegex.test($(this).text());
console.log('yep');
}).remove();
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<span class="posts">
This is a forum post with lots of blabbing in it.
[UPLOAD]this-is-a-random-unknown-filename.jpg[/UPLOAD]
(Note the above always begins with [UPLOAD] and ends
with [/UPLOAD]. Also note that between these 'tags'
is 1 filename of some kind, such as an image or audio
or text file)
</span>
Thanks, much obliged.
var filenameRegex = /\[UPLOAD][\w\.\-]+\[\/UPLOAD]/gi;
$('.posts').filter(':contains([UPLOAD]):contains([/UPLOAD])').filter(function(i,e) {
return filenameRegex.test($(e).text());
}).each(function(i,e){
$(e).text($(e).text().replace(filenameRegex,''));
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<span class="posts">
This is a forum post with lots of blabbing in it.
[UPLOAD]this-is-a-random-unknown-filename.jpg[/UPLOAD]
(Note the above always begins with [UPLOAD] and ends
with [/UPLOAD]. Also note that between these 'tags'
is 1 filename of some kind, such as an image or audio
or text file)
</span>
The regex you need is:
/\[UPLOAD\](.*)\[\/UPLOAD\]/gi
So your code should look like:
var filenameRegex = /\[UPLOAD\](.*)\[\/UPLOAD\]/i; // regex literal; without the g flag, test() keeps no state between calls
$('.posts').contents().filter(':contains([UPLOAD])').filter(':contains([/UPLOAD])').filter(function(){
return filenameRegex.test($(this).text());
}).remove();
Not a regular expression, but this seems to do the trick:
var text = 'hello [UPLOAD]very[/UPLOAD] [UPLOAD]cruel[/UPLOAD] world';
var start = '[UPLOAD]';
var end = '[/UPLOAD]';
var index = text.indexOf(start);
while (index > -1)
{
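var end_index = text.indexOf(end, index); // search for the closing tag after the opening one
if (end_index === -1) break; // unmatched [UPLOAD]: stop instead of looping forever
var removed_text = text.substring(index, end_index + end.length);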
text = text.substring(0, index) + text.substring(end_index + end.length);
index = text.indexOf(start);
}
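For a plain string, the same removal can also be sketched with a single lazy regular expression. removeUploadTags is a hypothetical helper name; the lazy quantifier matters because a greedy .* would swallow everything between the first [UPLOAD] and the last [/UPLOAD].

```javascript
// Hypothetical helper: remove every [UPLOAD]...[/UPLOAD] pair.
function removeUploadTags(text) {
  // [\s\S]*? matches lazily (and across line breaks),
  // so each pair is removed on its own.
  return text.replace(/\[UPLOAD\][\s\S]*?\[\/UPLOAD\]/gi, '');
}

var sample = 'hello [UPLOAD]very[/UPLOAD] [UPLOAD]cruel[/UPLOAD] world';

// Lazy match: both pairs are removed, the spaces around them are kept.
console.log(JSON.stringify(removeUploadTags(sample)));

// Greedy match for comparison: one big match from the first opening tag
// to the last closing tag, so the space between the pairs is lost too.
var greedyResult = sample.replace(/\[UPLOAD\].*\[\/UPLOAD\]/gi, '');
console.log(JSON.stringify(greedyResult));
```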

How to replace text between tags only

I have multiple h3 tags, and each tag contains text and has a custom attribute called data-options. This attribute holds multiple options separated by commas, and one of these options is the h3 tag's text itself.
HTML:
<h3 id='test' data-options='happy,sad,fantastic'>sad</h3>
<h3 id='test2' data-options='1,2,3,4'>3</h3>
jQuery:
var indexArray = ['happy','1'];
$('h3').each(function(i){
var $this = $(this),
value = $this.text(),
code = $('body').html();
code = code.replace(value, indexArray[i]);
$('body').html(code);
});
This is what I expect:
<h3 id='test' data-options='happy,sad,fantastic'>happy</h3>
<h3 id='test2' data-options='1,2,3,4'>1</h3>
Instead I get this:
<h3 id="test" data-options="happy,happy,fantastic">sad</h3>
<h3 id="test2" data-options="1,2,3,4">3</h3>
As you can see the script changes the first text it encounters, not the one inside the tags.
This is a working demo for the script : http://jsfiddle.net/fs1sfztx/
It makes no sense to do a replace unless you're searching, which means you need to know both what to search for and what to replace it with. So far only the text in the h3 is given.
Assuming that each index in indexArray matches the index of each h3 element, you can compare the current indexArray element indexArray[i] with each of the words in the corresponding element's data-options attribute. When one matches, set the element's text to that word.
var indexArray = ['happy','1'];
$('h3').text( function( i, txt ) {
var that = $(this);
var ntxt = txt;
$.each( that.data('options').split(','), function( j, u ) {
indexArray[i] !== u || (ntxt = u);
});
return ntxt;
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<h3 id='test' data-options='happy,sad,fantastic'>sad</h3>
<h3 id='test2' data-options='1,2,3,4'>3</h3>
If all you wanted is to play with the replace method, then you could use something like this:
var indexArray = ['happy','1'];
var allhtml = $('body').html();
$('h3').each( function( i ) {
var txt = $(this).text();
var re = new RegExp( '>' + txt + '<', 'g' );
allhtml = allhtml.replace( re, '>' + indexArray[i] + '<' );
});
$('body').html( allhtml );
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<h3 id='test' data-options='happy,sad,fantastic'>sad</h3>
<h3 id='test2' data-options='1,2,3,4'>3</h3>
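One caveat with building a RegExp from element text, as in the last snippet: characters such as . or ( have a special meaning in a pattern, so the text should be escaped first. A minimal sketch, assuming hypothetical helpers escapeRegExp and replaceTagText and a made-up HTML string:

```javascript
// Hypothetical helper: escape characters that are special in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replace oldText only where it is the complete
// text between a '>' and a '<', i.e. between tags, not inside attributes.
function replaceTagText(html, oldText, newText) {
  var re = new RegExp('>' + escapeRegExp(oldText) + '<', 'g');
  // A function replacement avoids '$' in newText being treated specially.
  return html.replace(re, function () {
    return '>' + newText + '<';
  });
}

var html = "<h3 data-options='1.5,2'>1.5</h3><h3>125</h3>";

// Without escaping, the pattern '>1.5<' would also match '>125<'.
console.log(replaceTagText(html, '1.5', '2'));
```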

How to remove <div> and <br> using Cheerio js?

I have the following HTML that I would like to parse with Cheerio.
var $ = cheerio.load('<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>This works well.</div><div><br clear="none"/></div><div>So I have been doing this for several hours. How come the space does not split? Thinking that this could be an issue.</div><div>Testing next paragraph.</div><div><br clear="none"/></div><div>Im testing with another post. This post should work.</div><div><br clear="none"/></div><h1>This is for test server.</h1></body></html>', {
normalizeWhitespace: true,
});
// trying to parse the html
// the goals are to
// 1. remove all the 'div'
// 2. clean up <br clear="none"/> into <br>
// 3. Have all the new 'empty' element added with 'p'
var testData = $('div').map(function(i, elem) {
var test = $(elem)
if ($(elem).has('br')) {
console.log('spaceme');
var test2 = $(elem).removeAttr('br');
} else {
var test2 = $(elem).removeAttr('div').add('p');
}
console.log(i +' '+ test2.html());
return test2.html()
})
res.send(test2.html())
My end goals are to parse the HTML and:
remove all the div elements
clean up <br clear="none"/> and change it into <br>
and finally wrap the sentences that were inside the removed div elements in <p> and </p>
I tried to start with a smaller goal in the code above. I managed to remove all the 'div' elements, but I'm unable to find the 'br' elements. I have been trying for days and have made no headway.
So I'm writing here to seek some help and hints on how I can reach my end goal.
Thank you :D
It's easier than it looks. First you iterate over all the DIVs
$('div').each(function() { ...
and for each div, you check if it has a <br> tag
$(this).find('br').length
if it does, you remove the attribute
$(this).find('br').removeAttr('clear');
if not you create a P with the same content
var p = $('<p>' + $(this).html() + '</p>');
and then just replace the DIV with the P
$(this).replaceWith(p);
and output
res.send($.html());
All together it's
$('div').each(function() {
if ( $(this).find('br').length ) {
$(this).find('br').removeAttr('clear');
} else {
var p = $('<p>' + $(this).html() + '</p>');
$(this).replaceWith(p);
}
});
res.send($.html());
You don't want to remove an attribute; you want to remove the tag, so switch removeAttr to remove, like so:
var testData = $('div').map(function(i, elem) {
var test = $(elem)
if ($(elem).has('br')) {
console.log('spaceme');
var test2 = $(elem).remove('br');
} else {
var test2 = $(elem).remove('div').add('p');
}
console.log(i +' '+ test2.html());
return test2.html()
})
