How to get text from all descendants of an element, disregarding scripts? - javascript

My current project involves gathering text content from an element and all of its descendants, based on a provided selector.
For example, when supplied the selector #content and run against this HTML:
<div id="content">
<p>This is some text.</p>
<script type="text/javascript">
var test = true;
</script>
<p>This is some more text.</p>
</div>
my script would return (after a little whitespace cleanup):
This is some text. var test = true; This is some more text.
However, I need to disregard text nodes that occur within <script> elements.
This is an excerpt of my current code (technically, it matches based on one or more provided selectors):
// get text content of all matching elements
for (x = 0; x < selectors.length; x++) { // 'selectors' is an array of CSS selectors from which to gather text content
matches = Sizzle(selectors[x], document);
for (y = 0; y < matches.length; y++) {
match = matches[y];
if (match.innerText) { // IE
content += match.innerText + ' ';
} else if (match.textContent) { // other browsers
content += match.textContent + ' ';
}
}
}
It's a bit simplistic in that it just returns all text nodes within the element (and its descendants) that matches the provided selector. The solution I'm looking for would return all text nodes except for those that fall within <script> elements. It doesn't need to be especially high-performance, but I do need it to ultimately be cross-browser compatible.
I'm assuming that I'll need to somehow loop through all children of the element that matches the selector and accumulate all text nodes other than ones within <script> elements; it doesn't look like there's any way to identify JavaScript once it's already rolled into the string accumulated from all of the text nodes.
I can't use jQuery (for performance/bandwidth reasons), although you may have noticed that I do use its Sizzle selector engine, so jQuery's selector logic is available.

function getTextContentExceptScript(element) {
var text= [];
for (var i= 0, n= element.childNodes.length; i<n; i++) {
var child= element.childNodes[i];
if (child.nodeType===1 && child.tagName.toLowerCase()!=='script')
text.push(getTextContentExceptScript(child));
else if (child.nodeType===3)
text.push(child.data);
}
return text.join('');
}
Or, if you are allowed to change the DOM to remove the <script> elements (which wouldn't usually have noticeable side effects), quicker:
var scripts= element.getElementsByTagName('script');
while (scripts.length!==0)
scripts[0].parentNode.removeChild(scripts[0]);
return 'textContent' in element? element.textContent : element.innerText;
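If removing the script elements from the live DOM is not acceptable, the recursive walk can also be written with an explicit stack and combined with the whitespace cleanup the question mentions. This is only a sketch: getCleanText is a made-up name, and it assumes the nodes expose the standard nodeType, tagName, childNodes and data properties.

```javascript
// Hypothetical helper: collect text from a node tree without recursion,
// skipping <script> subtrees, then collapse runs of whitespace.
function getCleanText(root) {
  var parts = [];
  var stack = [root];
  while (stack.length) {
    var node = stack.pop();
    if (node.nodeType === 3) {
      // text node: keep its text
      parts.push(node.data);
    } else if (node.nodeType === 1 && node.tagName.toLowerCase() !== 'script') {
      // element other than <script>: push children in reverse order
      // so that they are popped in document order
      for (var i = node.childNodes.length - 1; i >= 0; i--) {
        stack.push(node.childNodes[i]);
      }
    }
  }
  // collapse whitespace runs to one space and trim both ends
  return parts.join('').replace(/\s+/g, ' ').replace(/^ | $/g, '');
}
```

Called on the element matched by #content in the question, this should give the two paragraph texts separated by a single space, without the script source.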

EDIT:
Well, first let me say I'm not too familiar with Sizzle on its own, just within libraries that use it. That said,
if I had to do this, I would do something like:
var selectors = ['#main-content', '#side-bar'];
function findText(selectors) {
var rText = '';
// accept an array of selectors, a single selector string, or a DOM node
var sNodes = selectors instanceof Array ? $(selectors.join(',')) : $(selectors);
for (var i = 0; i < sNodes.length; i++) {
var nodes = sNodes[i].childNodes;
for (var j = 0; j < nodes.length; j++) {
if (nodes[j].nodeType === 3) { // text node: collect its text
rText += nodes[j].nodeValue;
} else if (nodes[j].nodeType === 1 && nodes[j].nodeName.toLowerCase() !== 'script') {
/* recursion - this would work in jQ not sure if
* Sizzle takes a node as a selector you may need
* to tweak.
*/
rText += findText(nodes[j]);
}
}
}
return rText;
}
I didn't test any of that, but it should give you an idea. Hopefully someone else will pipe up with more direction :-)
Can't you just grab the parent node and check the nodeName in your loop? Like:
match = matches[y];
if (match.parentNode.nodeName.toLowerCase() != 'script' && match.nodeName.toLowerCase() != 'script') {
if (match.innerText) { // IE
content += match.innerText + ' ';
} else if (match.textContent) { // other browsers
content += match.textContent + ' ';
}
}
Of course, jQuery supports the :not() syntax in selectors, so could you just do $(':not(script)')?

Related

can adjacent text nodes in the DOM be merged with Javascript?

Suppose I have a sentence in the webpage DOM that when I examine it, consists of 3 text nodes followed by perhaps some element like BOLD or ITALIC. I want to merge the text nodes into one text node, since having adjacent text nodes is meaningless - there is no reason to have them. Is there a way to merge them easily?
Thanks
It seems that Node.normalize() is doing exactly what you want.
You can refer to: Node.normalize()
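In a browser, calling normalize() on the parent element is all that is needed. To illustrate what that does for the direct children of one element, here is a rough hand-written equivalent. mergeAdjacentTextNodes is a hypothetical helper, not a DOM method, and unlike normalize() it neither recurses into descendants nor removes empty text nodes.

```javascript
// Hypothetical helper: merge runs of adjacent text nodes that are
// direct children of `parent` into the first node of each run.
function mergeAdjacentTextNodes(parent) {
  var node = parent.firstChild;
  while (node) {
    var next = node.nextSibling;
    if (node.nodeType === 3 && next && next.nodeType === 3) {
      // absorb the following text node, then remove it
      node.nodeValue += next.nodeValue;
      parent.removeChild(next);
      // stay on `node`: more text nodes may follow
    } else {
      node = next;
    }
  }
}
```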
Maybe this will help you:
var parentNode = document.getElementById('pelements');
var textNode = document.createElement('p');
while (parentNode.firstChild) {
textNode.textContent += parentNode.firstChild.textContent;
parentNode.removeChild(parentNode.firstChild);
}
parentNode.appendChild(textNode);
<div id="pelements">
<p>A</p>
<p>B</p>
<p>C</p>
</div>
It is possible, but you need to specify the parent element. It should be possible to traverse the whole DOM and every node, but if you can avoid that, it would be better.
nodes = document.body.childNodes;
nodesToDelete = [];
function combineTextNodes(node, prevText) {
if (node.nextSibling && node.nextSibling.nodeType == 3) {
nodesToDelete.push(node.nextSibling);
return combineTextNodes(node.nextSibling, prevText + node.nodeValue);
} else {
return prevText + node.nodeValue;
}
}
for (i = 0; i < nodes.length; i++) {
if (nodes[i].nodeType == 3) {
nodes[i].nodeValue = combineTextNodes(nodes[i], '');
}
}
for (i = 0; i < nodesToDelete.length; i++) {
console.log(nodesToDelete[i]);
nodesToDelete[i].remove();
}

Inserting html elements while DOM is changing

My code should insert HTML content in all divs that have a predefined class name, without using jQuery and at least compatible with IE8 (so no getElementsbyClass).
The html:
<div class="target">1</div>
<div class="target">2</div>
<div class="target">3</div>
<div class="target">4</div>
The javascript:
var elems = document.getElementsByTagName('*'), i;
for (wwi in elems) {
if((' ' + elems[wwi].className + ' ').indexOf(' ' + "target" + ' ') > -1) {
elems[wwi].innerHTML = "YES";
//elems[wwi].innerHTML = "<div>YES!</div>";
}
}
You can try it here.
As you can see, the word YES is printed inside each div. However, if you comment out elems[wwi].innerHTML = "YES"; and use elems[wwi].innerHTML = "<div>YES!</div>" instead, the code fails. I suppose this is because inserting div elements modifies the DOM and in consequence the for loop fails. Am I right?
I can work around this in an ugly way by restarting the for loop each time I set innerHTML, and marking each div I have already filled (for example with data-codeAlreadyInserted=1) so the next pass skips it. But this is a very bad solution: on an average site with many tags it could even freeze the user's browser.
What do you think? Let's suppose I don't know how many tags I insert with each innerHTML call.
"I suppose this is because inserting div elements modifies the DOM and in consequence the for loop fails. Am I right?"
Pretty much. Your elems list is a live list that is updated when the DOM changes. Because you're adding a new div on every iteration, the list keeps growing and so you never get to the end.
To avoid this, you can either do a reverse iteration,
for (var i = elems.length-1; i > -1; i--) {
// your code
}
or convert the list to an Array.
var arr = [];
for (var i = 0, len = elems.length; i < len; i++) {
arr.push(elems[i]);
}
for (i = 0; i < len; i++) {
// your code, using arr[i] instead of elems[i]
}
Another way is to use replaceChild instead of innerHTML. It works better and it's way faster:
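var newEl = elems[wwi].cloneNode(false); // shallow copy: same tag and attributes, no children
newEl.innerHTML = html; // html is the markup string you want to insert
elems[wwi].parentNode.replaceChild(newEl, elems[wwi]);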
You can take a copy of the live node list:
var nodes = [];
for (var i = 0, n = elems.length; i < n; ++i) {
nodes.push(elems[i]);
}
and then use a proper for loop, not for ... in to iterate over the array:
for (var i = 0, n = nodes.length; i < n; ++i) {
...
}
for ... in should only be used on objects, not arrays.
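To illustrate the for ... in point: it walks every enumerable key, not just the numeric indexes. The sketch below uses a plain array-like object as a stand-in for the live list (an assumption made so that it runs anywhere; a real HTMLCollection exposes additional keys such as item), and keysViaForIn and snapshot are hypothetical helper names.

```javascript
// Stand-in for a collection returned by getElementsByTagName (assumption:
// a plain object with numeric keys and a length, not a real HTMLCollection).
var fakeList = { 0: 'div#a', 1: 'div#b', length: 2 };

// Collect every key that for ... in visits.
function keysViaForIn(list) {
  var keys = [];
  for (var k in list) {
    keys.push(k);
  }
  return keys;
}

// Copy the indexed entries into a real array with an indexed loop.
function snapshot(list) {
  var copy = [];
  for (var i = 0, n = list.length; i < n; ++i) {
    copy.push(list[i]);
  }
  return copy;
}

console.log(keysViaForIn(fakeList)); // [ '0', '1', 'length' ]
console.log(snapshot(fakeList));     // [ 'div#a', 'div#b' ]
```

The for ... in version also reaches the length key, so code inside the loop ends up treating a number as if it were an element.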

counting text node recursively using javascript

Let's say I have markup like this:
<html id="test">
<body>
Some text node.
<div class="cool"><span class="try">This is another text node.</span></div>
Yet another test node.
</body>
</html>
My JS code:
function countText(node){
var counter = 0;
if(node.nodeType === 3){
counter+=node.nodeValue.length;
countText(node);
}
else{}
}
Now if I want to count the text nodes:
console.log("count text : " + countText(document.getElementById("test")));
this should return the count, but it's not working. Moreover, what should I put in the else branch?
I have never used nodeType, so I'm having trouble using it. Any help will be appreciated.
There are a couple of things wrong in your code:
Your HTML is malformed.
You are adding the length of the text to your counter instead of incrementing it by one, and the function never returns the counter.
You never loop over the children of the node; you always pass the same node to the recursive call.
You don't do anything if a node is not a text node.
This will work:
function countText(node){
var counter = 0;
if(node.nodeType === 3){
counter++;
}
else if(node.nodeType === 1) { // if it is an element node,
var children = node.childNodes; // examine the children
for(var i = children.length; i--; ) {
counter += countText(children[i]);
}
}
return counter;
}
alert(countText(document.body));
The numeric value of each node type (1 for element nodes, 3 for text nodes) is listed in the documentation for Node.nodeType.
Update:
If you want to count the words, you have to split each text node into words first. In the following I assume that words are separated by white spaces:
if(node.nodeType === 3){
// match runs of non-whitespace so blank or padded text nodes do not add empty "words"
counter = (node.nodeValue.match(/\S+/g) || []).length;
}
Update 2
I know you want to use a recursive function, but if you want to count the words only, then there is a much easier and more efficient way:
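function countWords(node){
// gets the text of the node and all its descendants
var text = node.innerText || node.textContent || '';
// count runs of non-whitespace; split(/\s+/) would also count empty strings at the edges
return (text.match(/\S+/g) || []).length;
}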
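One caveat when counting words by splitting on whitespace: text with leading or trailing whitespace produces empty strings at the edges, and those get counted too. A small sketch that counts runs of non-whitespace instead (countWordsInText is a made-up helper that takes a plain string):

```javascript
// Hypothetical helper: count whitespace-separated words in a string.
function countWordsInText(text) {
  // match() returns null when there is no match at all
  var words = text.match(/\S+/g);
  return words ? words.length : 0;
}

console.log(countWordsInText('  Some text node. ')); // 3
console.log('  Some text node. '.split(/\s+/).length); // 5, two of them empty strings
```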
You want something like
function countTextNodes(node) {
var n = 0;
if(node.nodeType == 3)
n = 1;
for(var i = 0; i < node.childNodes.length; ++i)
n += countTextNodes(node.childNodes[i]);
return n;
}
This can be compressed into more compact code, but I went for legibility here.
Call this on the root in which you want to count text nodes. For example, to count text nodes throughout the entire document, you would want to call countTextNodes(document.documentElement).
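Note that markup like the one in the question also contains whitespace-only text nodes (the line breaks between tags), and the functions above count those as well. If only text nodes with visible content should count, a variation along these lines may help. countNonBlankTextNodes is a hypothetical name, and the childNodes fallback is only there so the sketch also works on nodes that lack the property.

```javascript
// Hypothetical variation: count only text nodes that contain at least
// one non-whitespace character.
function countNonBlankTextNodes(node) {
  var n = 0;
  if (node.nodeType === 3 && /\S/.test(node.nodeValue)) {
    n = 1;
  }
  var kids = node.childNodes || [];
  for (var i = 0; i < kids.length; ++i) {
    n += countNonBlankTextNodes(kids[i]);
  }
  return n;
}
```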

How to Get Element By Class in JavaScript?

I want to replace the contents within a html element so I'm using the following function for that:
function ReplaceContentInContainer(id,content) {
var container = document.getElementById(id);
container.innerHTML = content;
}
ReplaceContentInContainer('box','This is the replacement text');
<div id='box'></div>
The above works great but the problem is I have more than one html element on a page that I want to replace the contents of. So I can't use ids but classes instead. I have been told that javascript does not support any type of inbuilt get element by class function. So how can the above code be revised to make it work with classes instead of ids?
P.S. I don't want to use jQuery for this.
This code should work in all browsers.
function replaceContentInContainer(matchClass, content) {
var elems = document.getElementsByTagName('*'), i;
for (i = 0; i < elems.length; i++) {
if ((' ' + elems[i].className + ' ').indexOf(' ' + matchClass + ' ') > -1) {
elems[i].innerHTML = content;
}
}
}
The way it works is by looping through all of the elements in the document, and searching their class list for matchClass. If a match is found, the contents is replaced.
jsFiddle Example, using Vanilla JS (i.e. no framework)
Of course, all modern browsers now support the following simpler way:
var elements = document.getElementsByClassName('someClass');
but be warned it doesn't work with IE8 or before. See http://caniuse.com/getelementsbyclassname
Also, not all browsers will return a pure NodeList like they're supposed to.
You're probably still better off using your favorite cross-browser library.
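The space-padding test in replaceContentInContainer is worth a closer look: wrapping both the className string and the wanted class in spaces ensures that only whole class names match. Pulled out into a standalone function (hasClass is a name chosen for illustration), it looks roughly like this. It assumes class names are separated by plain spaces; tabs or newlines in the attribute would not match.

```javascript
// Hypothetical helper: does the space-separated class string contain
// `wanted` as a whole class name?
function hasClass(classString, wanted) {
  return (' ' + classString + ' ').indexOf(' ' + wanted + ' ') > -1;
}

console.log(hasClass('big box', 'box')); // true
console.log(hasClass('sandbox', 'box')); // false: only part of a name
```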
document.querySelectorAll(".your_class_name_here");
That will work in "modern" browsers that implement that method (IE8+).
function ReplaceContentInContainer(selector, content) {
var nodeList = document.querySelectorAll(selector);
for (var i = 0, length = nodeList.length; i < length; i++) {
nodeList[i].innerHTML = content;
}
}
ReplaceContentInContainer(".theclass", "HELLO WORLD");
If you want to provide support for older browsers, you could load a stand-alone selector engine like Sizzle (4KB mini+gzip) or Peppy (10K mini) and fall back to it if the native querySelector method is not found.
Is it overkill to load a selector engine just so you can get elements with a certain class? Probably. However, the scripts aren't all that big and you may find the selector engine useful in many other places in your script.
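A fallback of that kind could look roughly like the following. selectAll and its fallbackEngine parameter are hypothetical; the sketch only assumes that the engine can be called as engine(selector, context), which is how Sizzle is invoked elsewhere on this page.

```javascript
// Hypothetical wrapper: prefer the native querySelectorAll and fall back
// to a selector engine passed in by the caller.
function selectAll(selector, doc, fallbackEngine) {
  if (doc && typeof doc.querySelectorAll === 'function') {
    return doc.querySelectorAll(selector);
  }
  // assumption: the engine takes (selector, context) and returns a list
  return fallbackEngine(selector, doc);
}
```

In a page this would be called as selectAll('.theclass', document, Sizzle), provided Sizzle has been loaded.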
A simple and easy way:
var cusid_ele = document.getElementsByClassName('custid');
for (var i = 0; i < cusid_ele.length; ++i) {
var item = cusid_ele[i];
item.innerHTML = 'this is value';
}
I'm surprised there are no answers using Regular Expressions. This is pretty much Andrew's answer, using RegExp.test instead of String.indexOf, since it seems to perform better for multiple operations, according to jsPerf tests.
It also seems to be supported on IE6.
function replaceContentInContainer(matchClass, content) {
var re = new RegExp("(?:^|\\s)" + matchClass + "(?!\\S)"),
elems = document.getElementsByTagName('*'), i;
for (i = 0; i < elems.length; i++) {
if (re.test(elems[i].className)) {
elems[i].innerHTML = content;
}
}
}
replaceContentInContainer("box", "This is the replacement text.");
If you look for the same class(es) frequently, you can further improve it by storing the (precompiled) regular expressions elsewhere, and passing them directly to the function, instead of a string.
function replaceContentInContainer(reClass, content) {
var elems = document.getElementsByTagName('*'), i;
for (i = 0; i < elems.length; i++) {
if (reClass.test(elems[i].className)) {
elems[i].innerHTML = content;
}
}
}
var reBox = /(?:^|\s)box(?!\S)/;
replaceContentInContainer(reBox, "This is the replacement text.");
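One way to store the precompiled expressions is a small cache keyed by class name. This is a sketch, not part of the answer above: classRegExp and classRegExpCache are made-up names, and the escaping step is an addition so that class names containing regex metacharacters are matched literally.

```javascript
// Hypothetical cache of class-matching regular expressions.
var classRegExpCache = {};

function classRegExp(className) {
  if (!Object.prototype.hasOwnProperty.call(classRegExpCache, className)) {
    // escape characters that have a special meaning in a RegExp
    var escaped = className.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    classRegExpCache[className] = new RegExp('(?:^|\\s)' + escaped + '(?!\\S)');
  }
  return classRegExpCache[className];
}

console.log(classRegExp('box').test('big box')); // true
console.log(classRegExp('box').test('sandbox')); // false
```

The returned expression can then be passed straight to the second replaceContentInContainer variant, for example replaceContentInContainer(classRegExp('box'), 'This is the replacement text.').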
This should work in pretty much any browser...
function getByClass (className, parent) {
parent || (parent=document);
var descendants=parent.getElementsByTagName('*'), i=-1, e, result=[];
while (e=descendants[++i]) {
((' '+(e['class']||e.className)+' ').indexOf(' '+className+' ') > -1) && result.push(e);
}
return result;
}
You should be able to use it like this:
function replaceInClass (className, content) {
var nodes = getByClass(className), i=-1, node;
while (node=nodes[++i]) node.innerHTML = content;
}
var elems = document.querySelectorAll('.one');
for (var i = 0; i < elems.length; i++) {
elems[i].innerHTML = 'content';
};
I assume this was not a valid option when this was originally asked, but you can now use document.getElementsByClassName('');. For example:
var elements = document.getElementsByClassName(names); // or:
var elements = rootElement.getElementsByClassName(names);
See the MDN documentation for more.
There are three different ways to get elements by class in JavaScript. Since you have multiple elements with the same class name, you can use two of them:
getElementsByClassName Method - It returns all the elements with the specified class present in the document or within the parent element which called it.
function ReplaceContentInContainer(className, content) {
var containers = document.getElementsByClassName(className);
for (let i = 0; i < containers.length; i++) {
containers[i].innerHTML = content;
}
}
ReplaceContentInContainer('box', 'This is the replacement text');
<div class='box'></div>
querySelectorAll Method - It selects elements on the basis of CSS selectors. Pass your CSS class to it with a leading dot and it will return all the elements having the specified class as a static NodeList (an array-like object).
function ReplaceContentInContainer(className, content) {
var containers = document.querySelectorAll(`.${className}`);
for (let i = 0; i < containers.length; i++) {
containers[i].innerHTML = content;
}
}
ReplaceContentInContainer('box', 'This is the replacement text');
<div class='box'></div>
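I think something like:
function ReplaceContentInContainer(klass, content) {
var elems = document.getElementsByTagName('*');
// indexed loop: for ... in would also visit non-element keys such as length
for (var i = 0; i < elems.length; i++) {
if (elems[i].getAttribute('class') == klass || elems[i].getAttribute('className') == klass) {
elems[i].innerHTML = content;
}
}
}
would work, as long as each element has exactly that one class.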
jQuery handles this easily.
let element = $('.myclass');
element.html("Some string");
It changes the content of all the .myclass elements to that text.
When some elements lack ID, I use jQuery like this:
$(document).ready(function()
{
$('.myclass').attr('id', 'myid');
});
This might be a strange solution, but maybe someone will find it useful.

Javascript search for tag and get its innerHTML

It's probably something really simple, but I'm just learning.
There's a page with 3 blockquote tags on it, and I'd need to get the innerHTML of the one containing a certain string. I don't know how to search/match a string and get the innerHTML of the tag containing the matched result.
Any help would be appreciated!
var searchString = 'The stuff in innerHTML';
var elements = document.getElementsByTagName('blockquote')
for (var i = 0; i < elements.length; i++) {
if (elements[i].innerHTML.indexOf(searchString) !== -1) {
alert('Match');
break;
}
}
:)
Btw, there would be a much nicer method if you were using Prototype JS (which is much better than jQuery, btw):
var el = $$('blockquote').find(function(el) {
return el.innerHTML.indexOf('The string you are looking for.') !== -1;
});
You could of course also use regular expressions to find the string, which might be more useful (use el.innerHTML.match() for that).
If you need to search through every <blockquote> on the page, try this:
function findBlockquoteContainingHtml(matchString) {
var blockquoteElements = document.getElementsByTagName('BLOCKQUOTE');
var i;
for (i = 0; i < blockquoteElements.length; i++) {
if (blockquoteElements[i].innerHTML.indexOf(matchString) >= 0) {
return blockquoteElements[i].innerHTML;
}
}
return null;
}
Assign an id to the blockquote elements then you can get the innerHTML like this:
HTML:
<blockquote id="bq1">Foo</blockquote>
JS:
var quote1 = document.getElementById('bq1').innerHTML;
Be careful using innerHTML to search for text within a tag, as that may also search for text in attributes or tags as well.
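To make that concrete, here is a sketch that searches the visible text instead of the markup. findByText is a hypothetical helper; it accepts any array-like list of elements and falls back from textContent to innerText for older browsers.

```javascript
// Hypothetical helper: return the first element whose text (not markup)
// contains `needle`, or null if none does.
function findByText(elements, needle) {
  for (var i = 0; i < elements.length; i++) {
    var text = elements[i].textContent || elements[i].innerText || '';
    if (text.indexOf(needle) !== -1) {
      return elements[i];
    }
  }
  return null;
}
```

With a blockquote that contains only <a href="quote.html">Link</a>, searching innerHTML for quote would match the attribute value, while findByText(document.getElementsByTagName('blockquote'), 'quote') would skip it.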
You can find all blockquote elements using:
var elems = document.getElementsByTagName("blockquote")
You can then look through their innerHTML, but I would recommend instead looking through their textContent/innerText (sadly, this is not standardized across browsers, it seems):
for (var i = 0; i < elems.length; i++) {
var text = elems[i].textContent || elems[i].innerText || '';
if (text.match(/foo/)) {
alert(elems[i].innerHTML);
}
}
