Get all the descendant nodes (also the leaves) of a certain node - javascript

I have an HTML document that consists of a <div id="main">. Inside this div there may be several levels of nodes, without a precise structure, because the user creates the document content.
I want a JavaScript function that returns all nodes within div id="main", whatever their tag, taking into account that there may be different levels of children.
For example, if I have this document:
...
<div id="main">
<h1>bla bla</h1>
<p>
<b>fruits</b> apple<i>text</i>.
<img src="..">image</img>
</p>
<div>
<p></p>
<p></p>
</div>
<p>..</p>
</div>
...
The function getNodes would return an array of object nodes (I don't know how to represent it, so I list them):
[h1, #text (= bla bla), p, b, #text (= fruits), #text (= _apple), i, #text (= text), img, #text (= image), div, p, p, p, #text (= ..)]
As we see from the example, you must return all nodes, even the leaf nodes (ie #text node).
For now I have this function that returns all nodes except leaf:
function getNodes() {
var all = document.querySelectorAll("#main *");
for (var elem = 0; elem < all.length; elem++) {
//do something..
}
}
In fact, this function applied to the above example returns:
[H1, P, B, I, IMG, DIV, P, P, P]
There aren't #text nodes.
Also, if I test the elements returned by that method in this way:
all[elem].children.length
I find (I tested on <p>fruits</p>) that <p> is reported as a leaf node.
But if I build the DOM tree it is clear that it is not a leaf node, and that in this example the leaf nodes are the #text nodes...
Thank you

Classic case for recursion into the DOM.
function getDescendants(node, accum) {
var i;
accum = accum || [];
for (i = 0; i < node.childNodes.length; i++) {
accum.push(node.childNodes[i])
getDescendants(node.childNodes[i], accum);
}
return accum;
}
and
getDescendants( document.querySelector("#main") );
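If the full array is not needed up front, the same depth-first recursion can also be written as a generator. This is only a minimal sketch: the name walkDescendants and the optional nodeType filter are assumptions made for illustration, and it relies on nothing but the standard childNodes and nodeType properties.

```javascript
// Hypothetical helper: lazily yields every descendant of `node` in document order.
// `nodeType` is optional; when given, only nodes of that type are yielded
// (1 = element, 3 = text, 8 = comment).
function* walkDescendants(node, nodeType) {
  for (const child of Array.from(node.childNodes || [])) {
    if (nodeType === undefined || child.nodeType === nodeType) {
      yield child;
    }
    // Recurse regardless of the filter so nested matches are still found
    yield* walkDescendants(child, nodeType);
  }
}

// In a browser:
// const all = [...walkDescendants(document.querySelector("#main"))];
// const textOnly = [...walkDescendants(document.querySelector("#main"), 3)];
```

Spreading the generator gives the same list as the recursive function, while a for...of loop over it can stop early without visiting the rest of the tree.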

Aside from the already existing and perfectly functional answer, I find it worth mentioning that one can do away with the recursion and the many resulting function calls by simply navigating via the firstChild, nextSibling, and parentNode properties:
function getDescendants(node) {
var list = [], desc = node, checked = false, i = 0;
do {
checked || (list[i++] = desc);
desc =
(!checked && desc.firstChild) ||
(checked = false, desc.nextSibling) ||
(checked = true, desc.parentNode);
} while (desc !== node);
return list;
}
(Whenever we encounter a new node, we add it to the list, then try going to its first child node. If such does not exist, get the next sibling instead. Whenever no child node or following sibling is found, we go back up to the parent, while setting the checked flag to avoid adding that to the list again or reentering its descendant tree.)
This will, in virtually every case, improve performance greatly. Not that there is nothing left to optimize here, e.g. one could cache the nodes where we descend further into the hierarchy so as to later get rid of the parentNode when coming back up. I leave implementing this as an exercise for the reader.
Keep in mind though that iterating through the DOM like this will rarely be the bottleneck in a script. Unless you are going through a large DOM tree many tens/hundreds of times a second, that is — in which case you probably ought to think about avoiding that if at all possible, rather than simply optimizing it.
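One possible reading of the "exercise for the reader" is to keep the pending nodes on an explicit stack, so the loop never has to climb back up through parentNode. The following is a sketch under that assumption, not the original author's solution; getDescendantsWithStack is a hypothetical name. Note that, unlike the loop above, it does not put the starting node itself into the list.

```javascript
// Sketch: depth-first traversal with an explicit stack instead of
// recursion or parentNode hops. The function name is hypothetical.
function getDescendantsWithStack(root) {
  const list = [];
  const stack = [];
  // Push children in reverse so the first child is popped first
  for (let i = root.childNodes.length - 1; i >= 0; i--) {
    stack.push(root.childNodes[i]);
  }
  while (stack.length > 0) {
    const node = stack.pop();
    list.push(node);
    const kids = node.childNodes;
    for (let i = kids.length - 1; i >= 0; i--) {
      stack.push(kids[i]);
    }
  }
  return list;
}

// In a browser:
// getDescendantsWithStack(document.querySelector("#main"));
```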

The children property only returns element nodes. If you want all children, I would suggest using the childNodes property. Then you can loop through this NodeList and eliminate nodes that have a nodeType of Node.ELEMENT_NODE, or pick whichever other node types you are interested in.
So try something like:
var i, j, nodes
var result=[]
var all = document.querySelectorAll("#main *");
for (var elem = 0; elem < all.length; elem++) {
result.push(all[elem].nodeName)
nodes = all[elem].childNodes;
for (i=0, j=nodes.length; i<j; i++) {
if (nodes[i].nodeType == Node.TEXT_NODE) {
result.push(nodes[i].nodeValue)
}
}
}
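To make the difference between children and childNodes concrete, here is a small sketch that sorts the direct child nodes of one parent by type. The helper name splitChildNodes is made up for this example, and the numeric constants are written out (they equal Node.ELEMENT_NODE and Node.TEXT_NODE) so the code does not depend on a browser global.

```javascript
// Node type constants, spelled out so the sketch also runs outside a browser
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helper: splits the direct child nodes of `parent` by type.
function splitChildNodes(parent) {
  const elements = [];
  const texts = [];
  for (const child of Array.from(parent.childNodes)) {
    if (child.nodeType === ELEMENT_NODE) {
      elements.push(child); // what parent.children would contain
    } else if (child.nodeType === TEXT_NODE) {
      texts.push(child); // only reachable through childNodes
    }
  }
  return { elements, texts };
}

// In a browser:
// splitChildNodes(document.querySelector("#main p"));
```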

If you only need the HTML tags and not the #text nodes, you can simply use this: <elem>.querySelectorAll("*");

Related

How can I get all the HTML in a document or node containing shadowRoot elements

I have not seen a satisfactory answer for this question. This is basically a duplicate of this question, but it was improperly closed and the answers given are not sufficient.
I have come up with my own solution which I will post below.
This can be useful for web scraping, or in my case, running tests on a javascript library that handles custom elements. I make sure it is producing the output that I want, then I use this function to scrape the HTML for a given test output and use that copied HTML as the expected output to compare the test against in the future.
Here is a function that can do what is requested. Note that it ignores html comments and other fringe things. But it retrieves regular elements, text nodes, and custom elements with shadowRoots. It also handles slotted template content. It has not been tested exhaustively but seems to be working well for my needs.
Use it like extractHTML(document.body) or extractHTML(document.getElementById('app')).
function extractHTML(node) {
// return a blank string if not a valid node
if (!node) return ''
// if it is a text node just return the trimmed textContent
if (node.nodeType===3) return node.textContent.trim()
//beyond here, only deal with element nodes
if (node.nodeType!==1) return ''
let html = ''
// clone the node for its outer html sans inner html
let outer = node.cloneNode()
// if the node has a shadowroot, jump into it
node = node.shadowRoot || node
if (node.children.length) {
// we checked for children but now iterate over childNodes
// which includes #text nodes (and even other things)
for (let n of node.childNodes) {
// if the node is a slot
if (n.assignedNodes) {
// a slot can have several assigned nodes, so process each one
let assigned = n.assignedNodes()
if (assigned.length) {
for (let a of assigned) html += extractHTML(a)
// an unassigned slot: fall back to its default content
} else { html += n.innerHTML }
// node is not a slot, recurse
} else { html += extractHTML(n) }
}
// node has no children
} else { html = node.innerHTML }
// insert all the (children's) innerHTML
// into the (cloned) parent element
// and return the whole package
outer.innerHTML = html
return outer.outerHTML
}
Only if shadowRoots are created with the mode:"open" setting can you access shadowRoots from the outside.
You can then dive into elements and shadowRoots with something like:
const shadowDive = (
el,
selector,
match = (m, r) => console.warn('match', m, r)
) => {
let root = el.shadowRoot || el;
root.querySelector(selector) && match(root.querySelector(selector), root);
[...root.children].map(el => shadowDive(el, selector, match));
}
Note: extracting raw HTML is pointless if Web Component styling is based on shadowDOM behaviour; you will lose all correct styling.
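If you want the matches back as an array rather than through a callback, the same dive can be sketched as below. This is an illustration only: collectDeep is a hypothetical name, and it takes a predicate function instead of a selector string. Like shadowDive, it follows the shadow root when there is one and therefore skips the host's light DOM children.

```javascript
// Hypothetical helper: collects every element under `el` (including those
// inside open shadow roots) for which `predicate(element)` returns true.
function collectDeep(el, predicate, found = []) {
  // Closed shadow roots expose shadowRoot === null, so we fall back to el
  const root = el.shadowRoot || el;
  for (const child of Array.from(root.children || [])) {
    if (predicate(child)) found.push(child);
    collectDeep(child, predicate, found);
  }
  return found;
}

// In a browser:
// collectDeep(document.body, (e) => e.matches("button"));
```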

Vanilla JS: Find all the DOM elements that just contain text

I want to get all the DOM elements in an HTML document that contain no other nodes, only text.
I've got this code right now:
var elements = document.querySelectorAll("body *");
for(var i = 0; i < elements.length; i++) {
if(!elements[i].hasChildNodes()) {
console.log(elements[i])
}
}
This of course prints elements that have absolutely no content (and, curiously enough, iframes).
Text counts as a child node, so .childNodes.length equals 1, but I don't know how to distinguish the element nodes from the text. typeof the first node is always object, sadly.
How to distinguish the texts from the nodes?
Basically you are looking for leaf nodes of DOM with something inside the textContent property of the leaf node.
Let's traverse DOM and work out our little logic on leaf nodes.
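const nodeQueue = [ document.querySelector('html') ];
const textOnlyNodes = [];
// No "g" flag: a global regex remembers lastIndex between test() calls
const textRegEx = /\w+/;
function traverseDOM () {
// Keep going until every queued node has been visited
while (nodeQueue.length) {
let currentNode = nodeQueue.shift();
// Our leaf node: no element children, so only text (if anything)
if (!currentNode.childElementCount) {
if (textRegEx.test(currentNode.textContent)) textOnlyNodes.push(currentNode);
continue;
}
// Nodes with child elements
nodeQueue.push(...currentNode.children);
}
}
traverseDOM();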
The childElementCount property makes sure that the node is a leaf node, and the RegEx test on the textContent property is just my understanding of what text implies in general. You can tune the expression at any time to make it a better fit for your use case.
You can check for elements that have no .firstElementChild, which means it will only have text (or other invisible stuff).
var elements = document.querySelectorAll("body *");
for (var i = 0; i < elements.length; i++) {
if (!elements[i].firstElementChild) {
console.log(elements[i].nodeName)
}
}
<p>
text and elements <span>text only</span>
</p>
<div>text only</div>
The script that the Stack Snippet adds is included in the output because it also only has text. You can filter out scripts if needed. This will also include elements that cannot have content, like <input>.
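If empty elements such as <input> should be left out, one option is to look at the child nodes directly and require at least one non-blank text node. The sketch below assumes exactly that definition of "text only"; isTextOnly is a hypothetical name, and 1 and 3 are the numeric values of Node.ELEMENT_NODE and Node.TEXT_NODE.

```javascript
// Hypothetical helper: true when `el` has no element children
// and at least one text node with non-whitespace content.
function isTextOnly(el) {
  let hasText = false;
  for (const child of Array.from(el.childNodes)) {
    if (child.nodeType === 1) return false; // an element child rules it out
    if (child.nodeType === 3 && child.nodeValue.trim() !== "") {
      hasText = true; // found real text
    }
  }
  return hasText;
}

// In a browser:
// const hits = Array.from(document.querySelectorAll("body *")).filter(isTextOnly);
```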

Javascript childNodes

I am trying to make a childNode be invisible so that the user will not be able to see it.
function hideLetters() {
var squares = document.querySelectorAll( "#squarearea div" );
for ( var i = 0; i < squares.length; i++ ) {
squares[ i ] = hide( squares[ i ] );
}
}
function hide( squares ) {
var nodeList = squares.childNodes;
nodeList.style.display = "none";
squares.childNodes = nodeList;
return squares;
}
I have been trying to make the child nodes found within squares invisible so that they do not appear on the screen. Please note that I am only using JavaScript, HTML, and CSS for this project.
You need to apply it to every element of the node list:
squares.childNodes.forEach(node => node.style.display = "none");
Try this one
Array.prototype.slice.call(squares.childNodes).forEach(node => node.style.display = 'none')
There were a few things incorrect about your code and I took the liberty of taking out bits that didn't merit staying in given what you were trying to do.
In no way are you manipulating squares other than looping over it. In your code you said squares[i] = hide(squares[i]). Not to put too fine a point on it, but this assignment does nothing useful. The list itself holds references to the nodes, not the nodes themselves. You can think of every item in the list as a signpost that tells the code where to look. So when a node is changed it doesn't need to be updated in the list, because the list is simply saying "this is what you want to look at"; it doesn't actually contain a copy of the node.
For the reasons listed above, you don't need to return anything from your hide function.
The nodeList in your hide function needs to be iterated over and each node manipulated individually. It's worth noting that you can't say "adjust all of these" at any point in JavaScript unless you yourself create a function that allows that functionality, but under the hood you're still going through every list or array one by one.
Your nodeList is aptly named. It is a list of nodes. Most people, at least those newer to JavaScript (no shame in that, we all learn sometime), assume that tags (e.g. <div></div>, <a></a>, <span></span>) are nodes. And yes, you're right, they are! But the text within those tags is made up of completely separate and individual nodes as well. This means that when you iterate over all the nodes you probably aren't just getting Element Nodes; you might also be getting Text Nodes, Comment Nodes, etc.
While we iterate over the nodeList we need to separate out the nodes that we can hide (those with a style object that can be manipulated), and we do this by comparing the built-in nodeType property that's in every node with the Node.ELEMENT_NODE property. If the comparison is true we know absolutely that the node is an Element Node.
After we've checked that what we're manipulating is an element, we simply set its display property (whatever it was before, e.g. "block" or "inline") to the value "none", and in that way hide it in the DOM.
The code below, I think, is what you're looking for.
function hideLetters() {
let squares = document.querySelectorAll("#squarearea div");
for(let i = 0; i < squares.length; i++) {
hide( squares[i] );
}
}
function hide(squares) {
var nodeList = squares.childNodes;
for(let i = 0; i < nodeList.length; i++) {
// only element nodes have a style object
if (nodeList[i].nodeType === Node.ELEMENT_NODE) {
nodeList[i].style.display = "none";
}
}
}
It's worth noting that you could simply use .children instead of .childNodes to return only the elements of a parent node. I don't know if you had a reason for wanting all nodes to be searched through, but this would condense the iteration down to simply setting the style property:
var nodeList = squares.children;
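// .children is an HTMLCollection, which has no forEach, so convert it to an array first
Array.from(nodeList).forEach(node => node.style.display = "none");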
function hideLetters() {
let squares = document.querySelectorAll("#squarearea div");
for (let i = 0; i < squares.length; i++) {
hide(squares[i]);
};
}
function hide(squares) {
var nodeList = squares.childNodes;
for (let i = 0; i < nodeList.length; i++) {
if (nodeList[i].nodeType === Node.ELEMENT_NODE) {
nodeList[i].style.display = "none";
}
};
}
hideLetters();
#squarearea div {
border: solid 1px black;
width: 10px;
padding: 3px;
margin: 10px;
}
<div id="squarearea">
<div><span>a</span></div>
<div><span>b</span></div>
<div><span>c</span></div>
<div><span>d</span></div>
<div><span>e</span></div>
<div><span>f</span></div>
<div><span>g</span></div>
</div>
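For reference, the nodeType check can be wrapped in one small function that also reports how many nodes it actually hid. This is just a sketch: hideChildElements is a hypothetical name, and the literal 1 stands for Node.ELEMENT_NODE so that the function does not depend on the Node global.

```javascript
// Hypothetical helper: hides every element child of `parent`.
// Text and comment nodes are skipped because they have no style property.
function hideChildElements(parent) {
  let hidden = 0;
  for (const node of Array.from(parent.childNodes)) {
    if (node.nodeType === 1 && node.style) {
      node.style.display = "none";
      hidden++;
    }
  }
  return hidden; // number of elements that were hidden
}

// In a browser:
// document.querySelectorAll("#squarearea div").forEach((div) => hideChildElements(div));
```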

Get an element's nth-child number in pure JavaScript

I am making a function for my site where I set a data attribute which contains the nth-child number of that element.
My HTML markup:
<html>
<body>
<section class="hardware">some text, nth-child is one</section>
<section class="hardware">some text, nth-child is two</section>
<section class="hardware">some text, nth-child is three</section>
<section class="hardware">some text, nth-child is four</section>
<section class="hardware">some text, nth-child is five</section>
</body>
</html>
My JavaScript so far:
var selector = document.getElementsByClassName('hardware');
for(var i = 0; i <= selector.length; i++) {
var index = selector[i] //get the nth-child number here
selector[i].dataset.number = index;
}
How can I get the nth-child number of an element with pure JavaScript (not jQuery), is this even possible in JavaScript?
Check out this previous answer HERE.
It uses
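var i = 0;
// previousElementSibling skips text and comment nodes, which :nth-child() does not count
while( (child = child.previousElementSibling) != null )
i++;
//at the end i will contain the zero-based index.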
When you say "number", do you mean 1, 2, etc or "one", "two", etc?
If 1, 2, etc, then the number is simply i+1...
If "one", "two", etc, then you need to get the text inside the element, then probably use a Regexp to parse it and get the value you want.
Simply incrementing the index linearly will only work if all the elements matching that class name are the only element children of the same parent, with no other elements that could interfere with :nth-child(), as shown exactly in the given markup. See this answer for an explanation on how other elements might interfere. Also review the Selectors spec on :nth-child().
One way to achieve this that is more foolproof is to loop through the child nodes of each element's parent node, incrementing a counter for each child node that is an element node (since :nth-child() only counts element nodes):
var selector = document.getElementsByClassName('hardware');
for (var i = 0; i < selector.length; i++) {
var element = selector[i];
var child = element.parentNode.firstChild;
var index = 0;
while (true) {
if (child.nodeType === Node.ELEMENT_NODE) {
index++;
}
if (child === element || !child.nextSibling) {
break;
}
child = child.nextSibling;
}
element.dataset.number = index;
}
JSFiddle demo
Note that this will apply the correct index regardless of where the given element is in the DOM:
If a particular section.hardware element is the first and only child of a different section, it will be assigned the correct index of 1.
If a .hardware element is the second child of its parent, even if it is the only one with that class (i.e. it follows some other element without the class), it will be assigned the correct index of 2.
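The same count can also be taken by walking backwards from the element itself. The sketch below assumes the standard previousElementSibling property is available; nthChildIndex is a hypothetical name, not something from the answer above.

```javascript
// Hypothetical helper: returns the 1-based position of `el` among its
// element siblings, i.e. the n for which :nth-child(n) would match.
function nthChildIndex(el) {
  let index = 1;
  let sibling = el.previousElementSibling;
  while (sibling) {
    index++;
    sibling = sibling.previousElementSibling;
  }
  return index;
}

// In a browser:
// for (const el of document.getElementsByClassName("hardware")) {
//   el.dataset.number = nthChildIndex(el);
// }
```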
I'm going to answer the questions with the following assumptions:
Your hardware classed elements are all siblings
You are interested in nth child not nth child + 1
They can be mixed with other elements:
<body>
<section class="hardware">some text, nth-child is zero</section>
<section class="software"></section>
<section class="hardware">some text, nth-child is two</section>
</body>
(I'm making these assumptions, because this is the problem I'm facing, thought it could be useful)
So the main difference is that instead of querying the elements that belong to a given class, I'm going to get the (direct) children of the body, and filter them.
Array.from(document.body.children)
.map((element, index) => ({element, index}))
.filter(({element}) => element.classList.contains('hardware'))
The resulting array will look like this:
[
{element: section.hardware, index: 0},
{element: section.hardware, index: 2}
]
You can split the text at the spaces and get the last word from each split-array:
var hards = document.getElementsByClassName('hardware');
for (var i=0; i < hards.length; i++) {
var hardText = hards[i].innerText || hards[i].textContent;
var hardList = hardText.split(' ');
var hardLast = hardList[hardList.length - 1];
alert(hardLast);
}
I am using || here because older versions of Firefox did not support innerText, while IE 8 and below did not support textContent.
If the elements only contain text then innerHTML can be used instead of innerText/textContent.
[].slice.call(elem.parentElement.childNodes).indexOf(elem)

Loop through textNodes within selection with unknown number of descendants

I'm required to basically find and replace a list of words retrieved as an array of objects (which have comma-separated terms) from a webservice. The find and replace only occurs on particular elements in the DOM, but they can have an unknown and varying number of children (which can themselves be nested an unknown number of times).
The main part I'm struggling with is figuring out how to select all nodes down to textNode level, with an unknown amount of nested elements.
Here is a very stripped-down example:
Retrieved from the webservice:
[{
terms: 'first term, second term',
youtubeid: '123qwerty789'
},{
terms: 'match, all, of these',
youtubeid: '123qwerty789'
},{
terms: 'only one term',
youtubeid: '123qwerty789'
},
etc]
HTML could be something like:
<div id="my-wrapper">
<ol>
<li>This is some text here without a term</li>
<li>This is some text here with only one term</li>
<li>This is some text here that has <strong>the first term</strong> nested!</li>
</ol>
</div>
Javascript:
$('#my-wrapper').contents().each(function(){
// Unfortunately only provides the <ol> -
// How would I modify this to give me all nested elements in a loopable format?
});
The following function is very similar to cbayram's but should be a bit more efficient and it skips script elements. You may want to skip other elements too.
It's based on a getText function I have used for some time, your requirements are similar. The only difference is what to do with the value of the text nodes.
function processTextNodes(element) {
element = element || document.body;
var el, els = element.childNodes;
for (var i=0, iLen=els.length; i<iLen; i++) {
el = els[i];
// Exclude script element content
// May need to add other node types here
if (el.nodeType == 1 && el.tagName && el.tagName.toLowerCase() != 'script') {
// Have an element node, so process it
// (call by name; arguments.callee is not allowed in strict mode)
processTextNodes(el);
// Otherwise see if it's a text node
// If working with XML, add nodeType 4 if you want to process
// text in CDATA nodes
} else if (el.nodeType == 3) {
/* do something with el.data */
}
}
/* return a value? */
}
The function should be completely browser agnostic and should work with any conforming DOM (e.g. XML and HTML). Incidentally, it's also very similar to jQuery's text function.
One issue you may want to consider is words split over two or more nodes. It should be rare, but difficult to find when it happens.
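To make the "do something with el.data" step concrete, here is a sketch that counts how often each term appears in the text nodes below an element, using the same walk as processTextNodes. The name countTermsInText and the case-insensitive matching are assumptions for illustration. It also shows the limitation just mentioned: a term split across two nodes is not found.

```javascript
// Hypothetical helper: counts how often each term occurs in the text nodes
// below `root`. Terms split across several nodes are not detected.
function countTermsInText(root, terms, counts = {}) {
  for (const node of Array.from(root.childNodes || [])) {
    if (node.nodeType === 3) {
      const text = node.data.toLowerCase();
      for (const term of terms) {
        if (!term) continue; // an empty term would never advance the search
        const needle = term.toLowerCase();
        let from = 0;
        let at;
        while ((at = text.indexOf(needle, from)) !== -1) {
          counts[term] = (counts[term] || 0) + 1;
          from = at + needle.length;
        }
      }
    } else if (node.nodeType === 1 && String(node.tagName).toLowerCase() !== "script") {
      // Element node (scripts excluded): descend into it
      countTermsInText(node, terms, counts);
    }
  }
  return counts;
}

// In a browser, with one entry from the webservice:
// const terms = "first term, second term".split(",").map((s) => s.trim());
// countTermsInText(document.getElementById("my-wrapper"), terms);
```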
I think you want
$('#my-wrapper *').each
This should select all the descendants of #my-wrapper no matter what they are.
See this fiddle for an example
I'm not sure if you are looking strictly for a jQuery answer, but here is one solution in JavaScript:
var recurse = function(el) {
// if text node or comment node
if(el.nodeType == 3 || el.nodeType == 8) {
// do your work here
console.log("Text: " + el.nodeValue);
}else {
for(var i = 0, children = el.childNodes, len = children.length; i < len; i++) {
recurse(children[i]);
}
}
}
recurse(document.getElementById("my-wrapper"));
Try the below:
$('#my-wrapper li')
