Is it possible to wrap each word on an HTML page with a span element?
I'm trying something like
/(\s*(?:<\/?\w+[^>]*>)|(\b\w+\b))/g
but the results are far from what I need.
Thanks in advance!
Well, I won't ask why, but you could do it like this:
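function getChilds( nodes ) {
    var len = nodes.length;
    // Walk backwards: spans inserted before nodes[len] only shift indices we have already visited
    while( len-- ) {
        var node = nodes[len];
        // Don't wrap the contents of <script> and <style>
        if( node.nodeType === 1 && /^(SCRIPT|STYLE)$/.test( node.nodeName ) ) {
            continue;
        }
        if( node.childNodes && node.childNodes.length ) {
            getChilds( node.childNodes );
        }
        if( node.nodeType === 3 ) {
            var parent = node.parentNode;
            node.textContent.split(/\s+/).forEach(function( word ) {
                if( !word ) return; // skip the empty strings produced by leading/trailing whitespace
                var s = document.createElement('span');
                s.textContent = word + ' ';
                // Insert in place of the text node instead of appending to the end of the parent
                parent.insertBefore( s, node );
            });
            parent.removeChild( node );
        }
    }
}
getChilds( document.body.childNodes );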
I have to admit I haven't tested the code yet, though. That was just the first thing that came to mind. It might be buggy or fail completely, but in that case I know the gentle and kind Stack Overflow community will kick my ass and downvote like hell :-p
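Splitting on /\s+/ as above discards the original whitespace (every word simply gets one trailing space). Below is a DOM-free sketch of a tokenizer that keeps word runs and whitespace runs apart, so only the word runs would need a span and the original spacing survives. The name tokenizeWords is hypothetical, and like the split above it treats any run of non-whitespace (punctuation included) as a "word":

```javascript
// Hypothetical helper: split a text node's string into word runs and
// whitespace runs, in order, so that joining all tokens gives back the input.
function tokenizeWords(text) {
  var tokens = [];
  // \s+ matches a whitespace run, \S+ a run of anything else
  var re = /\s+|\S+/g;
  var m;
  while ((m = re.exec(text)) !== null) {
    // A token is a "word" unless it starts with whitespace
    tokens.push({ isWord: !/^\s/.test(m[0]), text: m[0] });
  }
  return tokens;
}
```

In the DOM you would then insert, before the original text node, a span for each word token and a plain text node for each whitespace token, and finally remove the original node.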
You're going to have to get down to the "Text" nodes to make this happen. Unless you make it specific to a tag, you really have to traverse every element on the page, wrap the words in its text, and re-append them.
With that said, try something like what the "garble" post does (minus the filter for words with 4+ characters and the letter shuffling).
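To get all words between span tags on the current page, you can use:
var spans = document.body.getElementsByTagName('span');
// getElementsByTagName always returns a collection (never null), so test its length
if (spans.length)
{
    // Use an index loop: for...in would also visit properties such as "length" and "item"
    for (var i = 0; i < spans.length; i++)
    {
        // Only report spans whose content consists solely of word characters (and *)
        if (spans[i].innerHTML && !/[^\w*]/.test(spans[i].innerHTML))
        {
            alert(spans[i].innerHTML);
        }
    }
}
else
{
    alert('span tags not found');
}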
You should probably start off by getting all the text nodes in the document, and working with their contents instead of on the HTML as a plain string. It really depends on the language you're working with, but you could usually use a simple XPath like //text() to do that.
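In JavaScript, that would be document.evaluate('.//text()', document.body, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null) (the leading "." keeps the search inside body; a bare //text() starts from the document root), then iterate over the results with snapshotLength and snapshotItem(i) and work with each text node separately.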
Here's how I did it, may need some tweaking...
var wrapWords = function(el) {
var skipTags = { style: true, script: true, iframe: true, a: true },
child, tag;
for (var i = el.childNodes.length - 1; i >= 0; i--) {
child = el.childNodes[i];
if (child.nodeType == 1) {
tag = child.nodeName.toLowerCase();
if (!(tag in skipTags)) { wrapWords(child); }
} else if (child.nodeType == 3 && /\w+/.test(child.textContent)) {
var si, spanWrap;
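while ((si = child.textContent.indexOf(' ')) >= 0) {
if (si == 0) {
// Leading space: split it off and leave it as a plain text node
child.splitText(1);
child = child.nextSibling;
} else {
// Split the word off before the space and wrap it in a span
child.splitText(si);
spanWrap = document.createElement("span");
// textContent, not innerHTML, so text such as "<b>" is not parsed as markup
spanWrap.textContent = child.textContent;
child.parentNode.replaceChild(spanWrap, child);
child = spanWrap.nextSibling;
}
}
// Wrap the last word (skipping the empty node a trailing space leaves behind)
if (child != null && child.textContent) {
spanWrap = document.createElement("span");
spanWrap.textContent = child.textContent;
child.parentNode.replaceChild(spanWrap, child);
}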
}
}
};
wrapWords(document.body);
I want to insert HTML tags within a text node with TreeWalker, but TreeWalker turns my HTML brackets into &lt; and &gt; no matter what I try. Here is the code:
var text;
var tree = document.createTreeWalker(document.body,NodeFilter.SHOW_TEXT);
while (tree.nextNode()) {
text = tree.currentNode.nodeValue;
text = text.replace(/(\W)(\w+)/g, '$1<element onmouseover="sendWord(\'$2\')">$2</element>');
text = text.replace(/^(\w+)/, '<element onmouseover="sendWord(\'$1\')">$1</element>');
tree.currentNode.nodeValue = text;
}
Using \< or " instead of ' won't help. My workaround is to copy all of the DOM tree to a string and to replace the html body with that. It works on very simple webpages and solves my first problem, but is a bad hack and won't work on anything more than a trivial page. I was wondering if I could just work straight with the text node rather than use a workaround. Here is the code for the (currently buggy) workaround:
var text;
var newHTML = "";
var tree = document.createTreeWalker(document.body);
while (tree.nextNode()) {
text = tree.currentNode.nodeValue;
if (tree.currentNode.nodeType == 3){
text = text.replace(/(\W)(\w+)/g, '$1<element onmouseover="sendWord(\'$2\')">$2</element>');
text = text.replace(/^(\w+)/, '<element onmouseover="sendWord(\'$1\')">$1</element>');
}
newHTML += text
}
document.body.innerHTML = newHTML;
Edit: I realize a better workaround would be to mark the text nodes with custom tags ((Customtag_Start_Here) etc.), copy the whole DOM to a string, and use my custom tags to identify the text nodes and modify them that way. But if I don't have to, I'd rather not.
To 'change' a text node into an element, you must replace it with an element. For example:
var text = tree.currentNode;
var el = document.createElement('foo');
el.setAttribute('bar','yes');
text.parentNode.replaceChild( el, text );
If you want to keep part of the text node and inject an element "in the middle", you need to split the text: create another text node for the remainder, then insert it and the element into the tree at the appropriate places.
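The splitting step can be sketched without the DOM: a hypothetical splitAroundMatches helper that returns the pieces of a string in order and marks which ones were regex matches. In a real page, each plain piece would become a text node and each match piece the injected element; the textNodeReplace function below does this directly on the DOM.

```javascript
// Hypothetical helper: cut a string into plain pieces and match pieces, in order.
function splitAroundMatches(text, regex) {
  // Work on a global copy so exec() walks the whole string and the caller's
  // regex (and its lastIndex) is left untouched
  var flags = regex.flags.indexOf('g') >= 0 ? regex.flags : regex.flags + 'g';
  var re = new RegExp(regex.source, flags);
  var pieces = [];
  var last = 0;
  var m;
  while ((m = re.exec(text)) !== null) {
    if (m[0] === '') { re.lastIndex++; continue; } // avoid looping forever on empty matches
    if (m.index > last) pieces.push({ match: false, text: text.slice(last, m.index) });
    pieces.push({ match: true, text: m[0] });
    last = m.index + m[0].length;
  }
  // Whatever follows the last match stays plain text
  if (last < text.length) pieces.push({ match: false, text: text.slice(last) });
  return pieces;
}
```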
Edit: Here's a function that might be super useful to you. :)
Given a text node, it runs a regex on the text values. For each hit that it finds it calls a custom function that you supply. If that function returns a string, then the match is replaced. However, if that function returns an object like:
{ name:"element", attrs:{onmouseover:"sendWord('foo')"}, content:"foo" }
then it will split the text node around the match and inject an element in that location. You can also return an array of strings or those objects (and can recursively use arrays, strings, or objects as the content property).
Demo: http://jsfiddle.net/DpqGH/8/
function textNodeReplace(node,regex,handler) {
var mom=node.parentNode, nxt=node.nextSibling,
doc=node.ownerDocument, hits;
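if (regex.global) {
while(node && (hits=regex.exec(node.nodeValue))){
regex.lastIndex = 0; // the next search runs on the new "rest" node, so start over
// Pass the whole match array so the handler can read match[0], match[1], ...
node=handleResult( node, hits, handler.call(this,hits) );
}
} else if (hits=regex.exec(node.nodeValue))
handleResult( node, hits, handler.call(this,hits) );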
function handleResult(node,hits,results){
var orig = node.nodeValue;
node.nodeValue = orig.slice(0,hits.index);
[].concat(create(mom,results)).forEach(function(n){
mom.insertBefore(n,nxt);
});
var rest = orig.slice(hits.index+hits[0].length);
return rest && mom.insertBefore(doc.createTextNode(rest),nxt);
}
function create(el,o){
if (o.map) return o.map(function(v){ return create(el,v) });
else if (typeof o==='object') {
var e = doc.createElementNS(o.namespaceURI || el.namespaceURI,o.name);
if (o.attrs) for (var a in o.attrs) e.setAttribute(a,o.attrs[a]);
if (o.content) [].concat(create(e,o.content)).forEach(e.appendChild,e);
return e;
} else return doc.createTextNode(o+"");
}
}
It's not quite perfectly generic, as it does not support namespaces on attributes. But hopefully it's enough to get you going. :)
You would use it like so:
findAllTextNodes(document.body).forEach(function(textNode){
textNodeReplace( textNode, /\b\w+/g, function(match){
return {
name:'element',
attrs:{onmouseover:"sendWord('"+match[0]+"')"},
content:match[0]
};
});
});
function findAllTextNodes(node){
var walker = node.ownerDocument.createTreeWalker(node,NodeFilter.SHOW_TEXT);
var textNodes = [];
while (walker.nextNode())
if (walker.currentNode.parentNode.tagName!='SCRIPT')
textNodes.push(walker.currentNode);
return textNodes;
}
or if you want something closer to your original regex:
textNodeReplace( textNode, /(^|\W)(\w+)/g, function(match){
return [
match[1], // might be an empty string
{
name:'element',
attrs:{onmouseover:"sendWord('"+match[2]+"')"},
content:match[2]
}
];
});
A function that returns the parent element of a text node containing the passed string (partial matches count):
function findElByText(text, mainNode) {
let textEl = null;
const traverseNodes = function (n) {
if (textEl) {
return;
}
for (var nodes = n.childNodes, i = nodes.length; i--;) {
if (textEl) {
break;
}
var child = nodes[i], nodeType = child.nodeType;
// It's a text node: check whether it contains the string
if (nodeType == 3) {
if (child.textContent.includes(text)) {
textEl = child.parentElement;
break;
}
}
// Element, document or document fragment: search its children
else if (nodeType == 1 || nodeType == 9 || nodeType == 11) {
traverseNodes(child);
}
}
}
traverseNodes(mainNode);
return textEl;
}
Usage:
findElByText('Some string in document', document.body);
As the title says, I am looking for a way to compare the text content of one HTML element with another HTML element's text content and, only if they are identical, alert a message. Any thoughts? I'd greatly appreciate it!
Update (with code): for example, I can't get remItem's content to compare equal to headElms[u]'s content.
else if (obj.type == 'checkbox' && obj.checked == false) {
var subPal = document.getElementById('submissionPanel');
var remItem = obj.parentNode.parentNode.childNodes[1].textContent;
alert("You have deselected "+remItem);
for (var j=0; j < checkSum.length; j++) {
if (remItem == checkSum[j]) {
alert("System found a match: "+checkSum[j]+" and deleted it!");
checkSum.splice(j,1);
} else {
//alert("There were no matches in the search!");
}
}
alert("Next are...");
alert("This is the checkSum: "+checkSum);
alert("Worked!!!");
var headElms = subPal.getElementsByTagName('h3');
alert("We found "+headElms.length+" elements!");
for (var u=0; u < headElms.length; u++){
alert("YES!!");
if (remItem == headElms[u].textContent) {
alert("System found a matching element "+headElms[u].textContent+" and deleted it!");
}
else {
alert("NO!!");
alert("This didn't work!");
}
}
}
var a = document.getElementById('a');
var b = document.getElementById('b');
var tc_a = a ? a.textContent || a.innerText : NaN;
var tc_b = b ? b.textContent || b.innerText : NaN;
if( tc_a === tc_b )
alert( 'equal' );
NaN is used as the fallback because NaN === NaN is false, so the result is false if one or both elements don't exist.
If you don't like the verbosity of it, or you need to do this more than once, create a function that hides away most of the work.
function equalText(id1, id2) {
var a = document.getElementById(id1);
var b = document.getElementById(id2);
return (a ? a.textContent || a.innerText : NaN) ===
(b ? b.textContent || b.innerText : NaN);
}
Then invoke it...
if( equalText('a','b') )
alert( 'equal' );
To address your updated question, there isn't enough info to be certain of the result, but here are some potential problems...
obj.parentNode.parentNode.childNodes[1] ...may give a different element in different browsers, because older IE versions don't include whitespace-only text nodes in childNodes
"System found a matching element ... and deleted it!" ...if you're deleting elements, you need to account for it in your u index: getElementsByTagName returns a live collection, so an element removed from the DOM also disappears from the collection you're iterating. Decrement u when removing an element, or just iterate in reverse.
.textContent isn't supported in IE 8 and earlier (use innerText there)
Whitespace will be taken into consideration in the comparison. So if there are different leading and trailing spaces, it won't be considered a match.
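A sketch of two of the fixes above, using plain strings and arrays so it runs without a page: compare text with whitespace normalised, and remove matches while iterating in reverse so splice() does not skip elements (the forward splice loop over checkSum in the question has that problem). sameText and removeAllMatches are hypothetical names:

```javascript
// Hypothetical helper: compare two strings ignoring leading/trailing
// whitespace and differences in internal whitespace runs.
function sameText(a, b) {
  var norm = function (s) { return String(s).replace(/\s+/g, ' ').trim(); };
  return norm(a) === norm(b);
}

// Hypothetical helper: remove every entry whose text matches value.
function removeAllMatches(list, value) {
  // Walk backwards: splice() shifts later items down, which a forward
  // loop would then step over
  for (var j = list.length - 1; j >= 0; j--) {
    if (sameText(list[j], value)) {
      list.splice(j, 1);
    }
  }
  return list;
}
```

The same reverse loop works for a live collection of elements: removing headElms[u] while counting u downwards never skips an element.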
If you're a jQuery user...
var a = $('#element1').text(),
b = $('#element2').text();
if (a === b) {
alert('equal!');
}
The triple equals is preferred.
To compare two specific elements the following should work:
<div id="e1">Element 1</div>
<div id="e2">Element 2</div>
$(document).ready(function(){
var $e1 = $('#e1'),
$e2 = $('#e2'),
e1text = $e1.text(),
e2text = $e2.text();
if(e1text == e2text) {
alert("The same!!!");
}
});
I highly recommend using jQuery for this kind of comparison. jQuery is a JavaScript library that makes it easy to read the text between an element's tags.
var x = $('tag1').text();
var y = $('tag2').text();
// continue js here
if(x===y){
//do something
}
For a quick intro to jQuery...
First, download the file from jQuery.com and save it as a .js file in your js folder.
Then link to the file. I do it this way:
Of course, I'm assuming you're not writing your JS inline, which is always recommended anyway.
A simple getText function is:
var getText = (function() {
var div = document.createElement('div');
if (typeof div.textContent == 'string') {
return function(el) {
return el.textContent;
}
} else if (typeof div.innerText == 'string') {
return function(el) {
return el.innerText;
}
}
}());
To compare the content of two elements:
if (getText(a) == getText(b)) {
// the content is the same
}
Consider this document fragment:
<div id="test">
<h1>An article about John</h1>
<p>The first paragraph is about John.</p>
<p>The second paragraph contains a link to John's CV.</p>
<div class="comments">
<h2>Comments to John's article</h2>
<ul>
<li>Some user asks John a question.</li>
<li>John responds.</li>
</ul>
</div>
</div>
I would like to replace every occurrence of the string "John" with the string "Peter". This could be done via HTML rewriting:
$('#test').html(function(i, v) {
return v.replace(/John/g, 'Peter');
});
Working demo: http://jsfiddle.net/v2yp5/
The above jQuery code looks simple and straightforward, but that is deceptive: it is a lousy solution. HTML rewriting recreates all the DOM nodes inside the #test DIV, so changes made to that DOM subtree programmatically (for instance, "onevent" handlers) or by the user (values entered in form fields) are not preserved.
So what would be an appropriate way to perform this task?
How about a jQuery plugin version for a little code reduction?
http://jsfiddle.net/v2yp5/4/
jQuery.fn.textWalk = function( fn ) {
this.contents().each( jwalk );
function jwalk() {
var nn = this.nodeName.toLowerCase();
if( nn === '#text' ) {
fn.call( this );
} else if( this.nodeType === 1 && this.childNodes && this.childNodes[0] && nn !== 'script' && nn !== 'textarea' ) {
$(this).contents().each( jwalk );
}
}
return this;
};
$('#test').textWalk(function() {
this.data = this.data.replace(/John/g,'Peter'); // a string pattern would only replace the first occurrence
});
Or do a little duck typing, and have an option to pass a couple strings for the replace:
http://jsfiddle.net/v2yp5/5/
jQuery.fn.textWalk = function( fn, str ) {
var func = jQuery.isFunction( fn );
this.contents().each( jwalk );
function jwalk() {
var nn = this.nodeName.toLowerCase();
if( nn === '#text' ) {
if( func ) {
fn.call( this );
} else {
this.data = this.data.replace( fn, str );
}
} else if( this.nodeType === 1 && this.childNodes && this.childNodes[0] && nn !== 'script' && nn !== 'textarea' ) {
$(this).contents().each( jwalk );
}
}
return this;
};
$('#test').textWalk(function() {
this.data = this.data.replace(/John/g,'Peter'); // a string pattern would only replace the first occurrence
});
$('#test').textWalk( /Peter/g, 'Bob' );
You want to loop through all child nodes and replace only within the text nodes. Otherwise you may match HTML tags, attributes or anything else that is serialised.
I think you already know that though :)
Bobince has a great piece of JavaScript for doing that.
I needed to do something similar, but I needed to insert HTML markup. I started from the answer by @user113716 and made a couple of modifications:
$.fn.textWalk = function (fn, str) {
var func = jQuery.isFunction(fn);
var remove = [];
this.contents().each(jwalk);
// remove the replaced elements
remove.length && $(remove).remove();
function jwalk() {
var nn = this.nodeName.toLowerCase();
if (nn === '#text') {
var newValue;
if (func) {
newValue = fn.call(this);
} else {
newValue = this.data.replace(fn, str);
}
$(this).before(newValue);
remove.push(this)
} else if (this.nodeType === 1 && this.childNodes && this.childNodes[0] && nn !== 'script' && nn !== 'textarea') {
$(this).contents().each(jwalk);
}
}
return this;
};
There are a few implicit assumptions:
you are always inserting HTML. If not, you'd want to add a check to avoid manipulating the DOM when not necessary.
removing the original text nodes isn't going to cause any side effects.
Slightly less intrusive, but not necessarily any more performant, is to select elements which you know only contain text nodes, and use .text(). In this case (not a general-purpose solution, obviously):
$('#test').find('h1, p, li').text(function(i, v) {
return v.replace(/John/g, 'Peter');
});
Demo: http://jsfiddle.net/mattball/jdc87/ (type something in the <input> before clicking the button)
This is how I would do it:
var textNodes = [], stack = [elementWhoseNodesToReplace], c;
while(c = stack.pop()) {
for(var i = 0; i < c.childNodes.length; i++) {
var n = c.childNodes[i];
if(n.nodeType === 1) {
stack.push(n);
} else if(n.nodeType === 3) {
textNodes.push(n);
}
}
}
for(var i = 0; i < textNodes.length; i++) textNodes[i].parentNode.replaceChild(document.createTextNode(textNodes[i].nodeValue.replace(/John/g, 'Peter')), textNodes[i]);
Pure JavaScript and no recursion.
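To check the traversal logic without a browser, here is a DOM-free sketch of the same explicit-stack idea. The node shape ({ nodeType, nodeValue, childNodes }) is a hypothetical stand-in exposing only the properties the loop reads. Unlike the version above, it pushes children in reverse so text nodes come out in document order; that doesn't matter for a global replace, but is handy otherwise.

```javascript
// Hypothetical helper: collect text nodes (nodeType 3) below root in
// document order, using an explicit stack instead of recursion.
function collectTextNodes(root) {
  var textNodes = [], stack = [root], c;
  while ((c = stack.pop())) {
    if (c.nodeType === 3) {
      textNodes.push(c);
      continue;
    }
    var kids = c.childNodes || [];
    // Push in reverse so the first child is popped first
    for (var i = kids.length - 1; i >= 0; i--) {
      // Only elements (1) and text nodes (3) are of interest; comments etc. are skipped
      if (kids[i].nodeType === 1 || kids[i].nodeType === 3) stack.push(kids[i]);
    }
  }
  return textNodes;
}
```

Assigning n.nodeValue = n.nodeValue.replace(/John/g, 'Peter') on each collected node edits the text in place, which avoids creating replacement text nodes at all.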
You could wrap every textual instance that is variable (e.g. "John") in a span with a certain CSS class, and then do a .text('..') update on all those spans. Seems less intrusive to me, as the DOM isn't really manipulated.
<div id="test">
<h1>An article about <span class="name">John</span></h1>
<p>The first paragraph is about <span class="name">John</span>.</p>
<p>The second paragraph contains a link to <span class="name">John</span>'s CV.</p>
<div class="comments">
<h2>Comments to <span class="name">John</span>'s article</h2>
<ul>
<li>Some user asks <span class="name">John</span> a question.</li>
<li><span class="name">John</span> responds.</li>
</ul>
</div>
</div>
$('#test .name').text(function(i, v) {
return v.replace(/John/g, 'Peter');
});
Another idea is to use jQuery Templates. It's definitely intrusive, as it has its way with the DOM and makes no apologies for it. But I see nothing wrong with that... I mean you're basically doing client-side data binding. So that's what the templates plugin is for.
This seems to work (demo):
$('#test :not(:has(*))').text(function(i, v) {
return v.replace(/John/g, 'Peter');
});
The POJS solution offered is ok, but I can't see why recursion is avoided. DOM nodes are usually not nested too deeply so it's fine I think. I also think it's much better to build a single regular expression than use a literal and build the expression on every call to replace.
// Replace all instances of t0 in the text descendants of
// root with t1
//
function replaceText(t0, t1, root) {
root = root || document;
var node, nodes = root.childNodes;
if (typeof t0 == 'string') {
t0 = new RegExp(t0, 'g');
}
for (var i=0, iLen=nodes.length; i<iLen; i++) {
node = nodes[i];
if (node.nodeType == 1) {
replaceText(t0, t1, node); // arguments.callee throws in strict mode, so call the function by name
} else if (node.nodeType == 3) {
node.data = node.data.replace(t0, t1);
}
}
}
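replaceText passes a string t0 straight to new RegExp, so characters such as ., ( or * are treated as regex syntax: "J.hn" would also match "John", and a lone "(" throws a SyntaxError. A sketch of escaping the string first; escapeRegExp and toGlobalRegExp are hypothetical helper names, not part of the answer:

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter in s.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: turn a literal string into a global RegExp,
// passing an existing RegExp through unchanged.
function toGlobalRegExp(t0) {
  return typeof t0 == 'string' ? new RegExp(escapeRegExp(t0), 'g') : t0;
}
```

Inside replaceText, the line t0 = new RegExp(t0, 'g'); could then become t0 = toGlobalRegExp(t0);.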
Let's say there's a string of HTML, with script tags, plain text, whatever.
What's the best way to strip out only the <a> tags?
I've been using some of the methods from "Strip HTML from Text JavaScript", but those strip all tags.
Using jQuery:
var content = $('<div>' + htmlString + '</div>');
content.find('a').replaceWith(function() { return this.childNodes; });
var newHtml = content.html();
Adding a wrapping <div> tag allows us to get the desired HTML back.
I wrote a more detailed explanation on my blog.
This approach will preserve existing DOM nodes, minimizing side-effects if you have elements within the anchors that have events attached to them.
function unwrapAnchors() {
if(!('tagName' in this) || this.tagName.toLowerCase() != 'a' || !('parentNode' in this)) {
return;
}
var childNodes = this.childNodes || [], children = [], child;
// Convert childNodes collection to array
for(var i = 0, childNodes = this.childNodes || []; i < childNodes.length; i++) {
children[i] = childNodes[i];
}
// Move children outside element
for(i = 0; i < children.length; i++) {
child = children[i];
if(('tagName' in child) && child.tagName.toLowerCase() == 'a') {
child.parentNode.removeChild(child);
} else {
this.parentNode.insertBefore(child, this);
}
}
// Remove now-empty anchor
this.parentNode.removeChild(this);
}
To use (with jQuery):
$('a').each(unwrapAnchors);
To use (without jQuery):
var a = document.getElementsByTagName('a');
while(a.length) {
unwrapAnchors.call(a[a.length - 1]);
}
An <a> tag is not supposed to contain another <a> tag, so a simple non-greedy regexp would do the trick (e.g. string.replace(/<a>(.*?)<\/a>/g, '$1')), but this example assumes the tags have no attributes.
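A sketch of that regex idea extended to anchors with attributes, for simple, well-formed snippets only (stripAnchors is a hypothetical name). It will misfire on a ">" inside an attribute value or on "<a" inside comments or scripts, which is why the DOM-based answers above are more robust:

```javascript
// Hypothetical helper: unwrap <a ...>...</a>, keeping the link text.
// \b after "a" stops it from matching tags like <abbr>; [\s\S]*? lets the
// link text span lines and stops at the first closing </a>.
function stripAnchors(html) {
  return html.replace(/<a\b[^>]*>([\s\S]*?)<\/a\s*>/gi, '$1');
}
```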
Here's a native (non-library) solution if performance is a concern.
function stripTag(str, tag) {
var a, parent, div = document.createElement('div');
div.innerHTML = str;
a = div.getElementsByTagName( tag );
while( a[0] ) {
parent = a[0].parentNode;
while (a[0].firstChild) {
parent.insertBefore(a[0].firstChild, a[0]);
}
parent.removeChild(a[0]);
}
return div.innerHTML;
}
Use it like this:
alert( stripTag( my_string, 'a' ) );
I want to highlight a specific word in my HTML page after the page is loaded. I don't want to use the dumb:
document.innerHTML = document.innerHTML.replace(.....);
I want to traverse every DOM node, find out the ones that contain text and modify the innerHTML of only those individual nodes. Here's what I came up with:
function highlightSearchTerms(sword) {
$$('body').map(Element.extend).first().descendants().each(function (el) {
if (el.nodeType == Node.ELEMENT_NODE && el.tagName != 'TD') {
//$A(el.childNodes).each(function (onlyChild) {
//if (onlyChild.nodeType == Node.TEXT_NODE) {
//console.log(onlyChild);
el.innerHTML = el.innerHTML.replace(new RegExp('('+sword+')', 'gi'), '<span class="highlight">$1</span>');
//}
//});
}
});
//document.body.innerHTML.replace(new RegExp('('+sword+')', 'gi'), '<span class="highlight">$1</span>');
}
It works as it is right now, but it's VERY inefficient and is hardly better than the single line above as it may do a replacement several times over the same text. (Hmmm..., or not?)
If you uncomment the commented stuff and change el.innerHTML.replace to onlyChild.textContent.replace it would work almost like it needs to, but modifying textContent doesn't create a new span as an element, but rather adds the HTML content as text.
My question/request: how can I highlight the words by traversing the document's elements one by one?
This works quick and clean:
function highlightSearchTerms(sword) {
$$('body').map(Element.extend).first().descendants().each(function (el) {
if (el.nodeType == Node.ELEMENT_NODE && el.tagName != 'TEXTAREA' && el.tagName != 'INPUT' && el.tagName != 'SCRIPT') {
$A(el.childNodes).each(function (onlyChild) {
var pos = onlyChild.textContent.indexOf(sword);
if (onlyChild.nodeType == Node.TEXT_NODE && pos >= 0) {
//console.log(onlyChild);
var spannode = document.createElement('span');
spannode.className = 'highlight';
var middlebit = onlyChild.splitText(pos);
var endbit = middlebit.splitText(sword.length);
var middleclone = middlebit.cloneNode(true);
spannode.appendChild(middleclone);
middlebit.parentNode.replaceChild(spannode, middlebit);
//onlyChild. = el.innerHTML.replace(new RegExp('('+sword+')', 'gi'), '<span class="highlight">$1</span>');
}
});
}
});
}
But I have trouble understanding how exactly it works. This seems to be the magic line:
middlebit.parentNode.replaceChild(spannode, middlebit);
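Here is what those lines do to one text node, sketched with plain strings (splitForHighlight is a hypothetical name; the case-insensitive search mirrors the Prototype version in the next answer). splitText(pos) leaves [0, pos) in the original node and returns a new node holding the rest; calling splitText(sword.length) on that node cuts the tail off into a third node. replaceChild(spannode, middlebit) then swaps the middle node for the span, which already holds a clone of it.

```javascript
// Hypothetical helper: the three pieces a text node is split into.
// Assumes toUpperCase() doesn't change the string length (true for ASCII).
function splitForHighlight(text, term) {
  var pos = text.toUpperCase().indexOf(term.toUpperCase());
  if (pos < 0 || term === '') return null;
  return {
    before: text.slice(0, pos),                 // stays in the original text node
    middle: text.slice(pos, pos + term.length), // "middlebit": ends up inside the span
    after: text.slice(pos + term.length)        // "endbit": a new text node after the span
  };
}
```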
I converted one from jQuery to PrototypeJS some time ago:
Element.addMethods({
highlight: function(element, term, className) {
function innerHighlight(element, term, className) {
className = className || 'highlight';
term = (term || '').toUpperCase();
var skip = 0;
if ($(element).nodeType == 3) {
var pos = element.data.toUpperCase().indexOf(term);
if (pos >= 0) {
var middlebit = element.splitText(pos),
endbit = middlebit.splitText(term.length),
middleclone = middlebit.cloneNode(true),
spannode = document.createElement('span');
spannode.className = className; // use the className argument instead of hardcoding 'highlight'
spannode.appendChild(middleclone);
middlebit.parentNode.replaceChild(spannode, middlebit);
skip = 1;
}
}
else if (element.nodeType == 1 && element.childNodes && !/(script|style)/i.test(element.tagName)) {
for (var i = 0; i < element.childNodes.length; ++i)
i += innerHighlight(element.childNodes[i], term);
}
return skip;
}
innerHighlight(element, term, className);
return element;
},
removeHighlight: function(element, term, className) {
className = className || 'highlight';
$(element).select("span."+className).each(function(e) {
e.parentNode.replaceChild(e.firstChild, e);
});
return element;
}
});
You can use it on every element like this:
$("someElementId").highlight("foo", "bar");
The second argument ("bar" here) is the className of your choice. You can also remove the highlights again with removeHighlight.
If you're using the Prototype version posted by Fabien, make sure to add className
as an argument to the recursive call of innerHighlight:
i += innerHighlight(element.childNodes[i], term)
needs to be
i += innerHighlight(element.childNodes[i], term, className)
if you care about custom classNames for your highlights.
Grab $(document.body) and do a search/replace and wrap a span around the term, then swap the entire $(document.body) in one go. Treat it as a big string, forget about the DOM. This way you only have to update the DOM once. It should be very quick.
I have found a script that will do what you want (it seems pretty fast), it is not specific to any library so you may want to modify it:
http://www.nsftools.com/misc/SearchAndHighlight.htm
The method you provided above (although commented out) will have problems replacing text that appears inside an HTML tag itself, for example in an attribute value: a search and replace might "highlight" "thing" there when that is not what you want.
Here is a jQuery-based highlight script:
http://johannburkard.de/blog/programming/javascript/highlight-javascript-text-higlighting-jquery-plugin.html
It doesn't look too hard to convert to Prototype.