Find elements with a similar hierarchy using JavaScript (for web scraping)

For example, when I select one of the p.item-title elements below, all p.item-title elements should be found (not by the class name). Also, when I select one of the table elements below, all similar tables should be found. I need this for web scraping.
<div>
<div>
<p class="item-title">...</p>
<table>...</table>
</div>
</div>
<div>
<div>
<p class="item-title">...</p>
<table>...</table>
</div>
</div>
jQuery's siblings() method is similar in concept, but it finds similar elements under the same parent node. Is there any method or library to find similar elements from different parent nodes?

Just do querySelectorAll by the path (hierarchy) you want:
var allElements = document.querySelectorAll("div > div > p");
allElements.forEach(p => console.log(p));
<div>
<div>
<p class="item-title">Text 1</p>
<table>...</table>
</div>
</div>
<div>
<div>
<p class="item-title">Text 2</p>
<table>...</table>
</div>
</div>

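Try this (it relies on the jQuery object's .selector property, which was removed in jQuery 3.0, and on .bind(), which is deprecated since 3.0, so it only works as written with jQuery 1.x/2.x):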
jQuery.fn.addEvent = function(type, handler) {
this.bind(type, {'selector': this.selector}, handler);
};
$(document).ready(function() {
$('.item-title').addEvent('click', function(event) {
console.log(event.data.selector);
let elements = document.querySelectorAll(event.data.selector);
elements.forEach(e => console.log(e));
});
});

Thanks to Jack, I could create a running script.
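// Tags-only selector (needs refining depending on the use case)
function getSelector(element) {
  var tagNames = [];
  // Walk up to the root; the document itself has no parentNode, so the loop stops at <html>
  while (element.parentNode) {
    tagNames.unshift(element.tagName);
    element = element.parentNode;
  }
  return tagNames.join(" > ");
}
function getSimilarElements(element) {
  // querySelectorAll expects a selector string, so build it from the element first
  return document.querySelectorAll(getSelector(element));
}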
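As a possible extension of the tag-path idea above, the sketch below groups a list of elements by their tag path, so that every "similar" element ends up in the same bucket. The names tagPath and groupByTagPath are hypothetical, and the functions only assume node-like objects that expose tagName and parentNode (real DOM elements do).

```javascript
// Build a "tag > tag > ..." path for any node-like object that exposes
// tagName and parentNode.
function tagPath(element) {
  const names = [];
  // Stop at the first node without a tagName (for example the document itself).
  for (let node = element; node && node.tagName; node = node.parentNode) {
    names.unshift(node.tagName.toLowerCase());
  }
  return names.join(" > ");
}

// Group candidate elements by their tag path.
function groupByTagPath(elements) {
  const groups = new Map();
  for (const el of elements) {
    const key = tagPath(el);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(el);
  }
  return groups;
}
```

In a browser this could be called as groupByTagPath(document.querySelectorAll("*")). Like the tags-only selector, it ignores classes, ids and sibling positions, so it may group more elements than intended.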

Related

Clearing div child elements without removing from the DOM

My goal is, using jQuery or vanilla JS, to clear only the inner text of a div and each of its child elements while keeping all the elements intact. In the example below, the div is student_profile.
Answers on SO have recommended the functions .html('') and .text('') but, as my example shows below, this completely removes the child element from the DOM (my example shows only one function but both actually remove the element). Is there a function that would remove all of the text from the current div and child divs while keeping the elements themselves intact?
Any advice here would be appreciated!
function cleardiv() {
console.log(document.getElementById("student_name"));
$('#student_profile').html('');
console.log(document.getElementById("student_name"));
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id='student_profile'>
<h1 id="student_name">Foo Bar</h1>
<p id="student_id">123</p>
<p id="studen_course">math</p>
<p id="last_reported">2021-01-01</p>
</div>
<button onclick="cleardiv()">Clear</button>
One option is to select all text node descendants and .remove() them, leaving the actual elements intact:
// Collect every text node below `parent`
const getTextDescendants = (parent) => {
  const walker = document.createTreeWalker(parent, NodeFilter.SHOW_TEXT);
  const nodes = [];
  let node;
  while ((node = walker.nextNode())) {
    nodes.push(node);
  }
  return nodes;
};
function cleardiv() {
  // Collect first, then remove, so the walker is not disturbed while iterating
  for (const child of getTextDescendants($('#student_profile')[0])) {
    child.remove();
  }
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id='student_profile'>
<h1 id="student_name">Foo Bar</h1>
<p id="student_id">123</p>
<p id="studen_course">math</p>
<p id="last_reported">2021-01-01</p>
</div>
<button onclick="cleardiv()">Clear</button>
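A variation on the text-node approach above is to blank the text nodes instead of removing them. The sketch below is only an illustration: clearText is a hypothetical name, and the function assumes node-like objects with nodeType, nodeValue and childNodes, which real DOM nodes provide.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in the DOM

// Recursively empty every text node below `node` and return how many
// text nodes were touched. Element nodes are left in place.
function clearText(node) {
  let cleared = 0;
  for (const child of Array.from(node.childNodes || [])) {
    if (child.nodeType === TEXT_NODE) {
      child.nodeValue = "";
      cleared += 1;
    } else {
      cleared += clearText(child);
    }
  }
  return cleared;
}
```

In a browser it could be called as clearText(document.getElementById("student_profile")). Keeping the emptied text nodes means new text can later be written back into them, at the cost of leaving empty nodes in the tree.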
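You can try the selector #student_profile * to match all the descendant elements and clear the text of each one. Note that .text('') empties an element completely, so this only keeps every element intact when the children do not contain further nested elements, as in this example.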
function cleardiv() {
console.log(document.getElementById("student_name"));
$('#student_profile *').text('');
console.log(document.getElementById("student_name"));
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id='student_profile'>
<h1 id="student_name">Foo Bar</h1>
<p id="student_id">123</p>
<p id="studen_course">math</p>
<p id="last_reported">2021-01-01</p>
</div>
<button onclick="cleardiv()">Clear</button>
If it's only direct children you're looking to affect, you can iterate the childNodes of the parent element. This clears both element nodes and non-element nodes such as text nodes. The example uses the forEach() method of the returned NodeList.
function cleardiv() {
document.getElementById('student_profile')
.childNodes
.forEach((node) => (node.textContent = ''));
console.log(document.getElementById('student_name'));
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id='student_profile'>
<h1 id="student_name">Foo Bar</h1>
<p id="student_id">123</p>
<p id="studen_course">math</p>
<p id="last_reported">2021-01-01</p>
</div>
<button onclick="cleardiv()">Clear</button>

How to get all specific elements that appear after a specific element

I want to get all specific elements that appear in the DOM after a specific element, using jQuery.
For example, all p elements that appear after the p element with id="pr3":
<div>
<p id='pr3'></p>
<span></span>
<p></p> //this p element
</div>
<div>
<p></p> //this p element
<table>...</table>
</div>
<p></p> //this p element and ...
For example, change the color of all p elements that are located after id='pr3'.
You can use .index() to get the position of the reference element among all p elements. Resource: https://api.jquery.com/index/
var current_p = $( "#pr3" );
var current_p_index = $( "p" ).index( current_p );
console.log( current_p_index );
Use the .slice() method to get all p elements that appear after id='pr3'. Start at current_p_index + 1 so that #pr3 itself is not included. Resource: https://api.jquery.com/slice/
$( "p" ).slice( current_p_index + 1 ).css( "background-color", "red" );
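The same index-and-slice idea can be sketched without jQuery. The helper name elementsAfter is hypothetical, and the function assumes the list it receives is already in document order, which is the case for the result of querySelectorAll.

```javascript
// Return the items that come after `reference` in `orderedItems`.
// `orderedItems` is assumed to be in document order already.
function elementsAfter(orderedItems, reference) {
  const items = Array.from(orderedItems);
  const index = items.indexOf(reference);
  // indexOf returns -1 when the reference is not in the list
  return index === -1 ? [] : items.slice(index + 1);
}
```

In a browser this could be used as elementsAfter(document.querySelectorAll("p"), document.getElementById("pr3")), followed by a loop that sets the style of each returned element.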
There's nothing built-in that will do this, but jQuery (and the DOM) give you all the tools you need to do it. I got to wondering how hard it would be, and it turns out not to be that bad, see comments:
// Add elements following the given set of elements
// that match the given selector to the target
function addFollowing(target, set, selector) {
target = target.add(set.nextAll(selector));
target = target.add(set.nextAll().find(selector));
return target;
}
// Start with #pr3
let set = $("#pr3");
let paras = $();
// Add the ones that follow it at that level
paras = addFollowing(paras, set, "p");
// Loop up through parents
set = set.parent();
while (set[0] && set[0] !== document.documentElement) {
// Add the ones that follow at this level
paras = addFollowing(paras, set, "p");
set = set.parent();
}
console.log(`Found ${paras.length} p elements`);
paras.css("color", "green");
<p>no</p>
<div>
<p>no</p>
<div>
<p>no</p>
</div>
<div>
<p>no</p>
<p id="pr3">pr3</p>
<span></span>
<p>yes 1</p>
</div>
</div>
<div>
<p>yes 2</p>
<table>
<tr>
<td>
<p>yes 3</p>
</td>
</tr>
</table>
</div>
<p>yes 4</p>
<div>
<div>
<p>yes 5</p>
</div>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

Find nearest element, not closest

I was under the impression that closest() would give me the nearest matching element, as the name suggests, but I was wrong: it only looks at ancestors, and so does parents(). So how can I get the nearest element?
e.g. I have 2 divs as below
<div id="clickme">
Click me
</div>
<div class="findme" style="display:none">
Find me
</div>
What should I do if I want to get .findme relative to #clickme?
Can I do something like $('#clickme').helpfullFunctionThatGivesNearestElement('.findme')?
Or do I need to scan the entire DOM with $('.findme')? But there can be hundreds of .findme elements, so how would I find the one nearest to a specific element?
Update
.findme can be anywhere in DOM.
This is what recursively traversing the DOM looks like.
I've implemented the function $.fn.nearest, as the OP assumes jQuery is used, so it can be called with $('.clickme').nearest('.findme');
The method can find multiple elements (if they share the same distance from the starting node) by searching in every direction (parent, children, next and prev), recursively expanding from the elements checked in the previous step. It also avoids checking an element over and over again (i.e. the parent of multiple children is only checked once).
If you don't need a particular direction to be checked, e.g. children or prev, you can just comment that part out.
Some checks are made before the recursion starts. When the selector is not found in the DOM, an empty jQuery object is returned; when only one matching element exists, that element is returned.
I haven't tested its efficiency on a large HTML document; it all depends on how far away the desired element is, which is directly related to the complexity of the HTML structure. Expect it to be expensive, though: my rough guess is something close to O(n³) or O(n⁴).
Give it a try.
$.fn.nearest = function(selector) {
var allFound = $(selector);
if (!allFound.length) // selector not found in the dom
return $([]);
if (allFound.length == 1) // found one elem only
return allFound;
else
return nearestRec($(this), selector);
function nearestRec(elems, selector) {
if (elems.length == 0)
return $([]); // nothing left to visit: return an empty jQuery set instead of `this`
var newList = [],
found = $([]);
$(elems).each(function(i, e) {
var options = e[1] || {};
e = $($(e)[0]);
// children
if (!options.ignoreChildren)
updateFound(e.children(), selector, newList, found, {
ignoreParent: true
});
// next
if (!options.ignoreNext)
updateFound(e.next(), selector, newList, found, {
ignoreParent: true,
ignorePrev: true
});
// prev
if (!options.ignorePrev)
updateFound(e.prev(), selector, newList, found, {
ignoreParent: true,
ignoreNext: true
});
// parent
if (!options.ignoreParent)
updateFound(e.parent(), selector, newList, found, {
ignoreChildren: true
});
});
return found.length && found || nearestRec(newList, selector);
function updateFound(e, selector, newList, found, options) {
e.each(function() {
var el = $(this);
if (el.is(selector)) {
found.push(el);
return;
}
newList.push([el, options]);
});
}
}
};
$(function() {
// multiple elems found, traverse dom
$(".clickme").nearest(".findme").each(function() {
$(this).addClass("found");
});
});
div {
padding: 5px 3px 5px 10px;
border: 1px solid #666;
background-color: #fff;
margin: 3px;
}
.found {
background-color: red;
}
.clickme {
background-color: #37a;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div class="findme">
findme
<div class="findme">
findme
<div>
<div>
</div>
<div>
</div>
<div class="clickme">
clickme
<div>
</div>
<div>
<div class="findme">
findme
<div class="findme">
findme
</div>
</div>
<div class="findme">
findme
</div>
</div>
</div>
<div>
<div class="findme">
findme
</div>
</div>
<div class="findme">
findme
</div>
</div>
<div class="findme">
findme
<div>
</div>
<div>
</div>
<div class="findme">
findme
<div class="findme">
findme
</div>
</div>
<div>
</div>
<div class="findme">
findme
</div>
</div>
</div>
</div>
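For comparison, here is a minimal breadth-first sketch of the same "search outwards in every direction" idea. It is not the plugin above: nearestMatches is a hypothetical name, it only follows parent and child links (so a sibling counts as two steps away, via the parent), and it assumes node-like objects that expose parentNode and children. A visited set ensures each node is queued at most once.

```javascript
// Breadth-first search outward from `start`. Returns every node at the
// smallest distance for which `matches(node)` is true, or [] if none.
function nearestMatches(start, matches) {
  const visited = new Set([start]);
  let frontier = [start];
  while (frontier.length > 0) {
    const next = [];
    for (const node of frontier) {
      // Neighbours are the children plus the parent, if there is one.
      const neighbours = [...(node.children || [])];
      if (node.parentNode) neighbours.push(node.parentNode);
      for (const n of neighbours) {
        if (!visited.has(n)) {
          visited.add(n);
          next.push(n);
        }
      }
    }
    const found = next.filter(matches);
    if (found.length > 0) return found; // all hits share the same distance
    frontier = next;
  }
  return [];
}
```

In a browser, a call might look like nearestMatches(el, n => n.nodeType === 1 && n.matches(".findme")); the nodeType check skips the document node, which has no matches() method.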
A jQuery plugin:
(function( $ ){
$.fn.nextElementInDom = function(selector, options) {
var defaults = { stopAt : 'body' };
options = $.extend(defaults, options);
var parent = $(this).parent();
var found = parent.find(selector + ":first");
switch(true){
case (found.length > 0):
return found;
case (parent.length === 0 || parent.is(options.stopAt)):
return $([]);
default:
return parent.nextElementInDom(selector, options); // pass options on so a custom stopAt survives the recursion
}
};
})( jQuery );
Usage:
$('#clickme').nextElementInDom('.findme');
Instead of traversing the entire DOM tree, you can try to locate an element with reference to its enclosing parent.
<div>
<div id="clickme" onclick="$(this).parent().find('.findme').show();">
Click me
</div>
<div class="findme" style="display:none">
Find me
</div>
</div>
However, this works only if the element you are searching for sits under the same parent as the clicked element.

Javascript onmouseover and onmouseout

The headline says what this is about. I have four divs, and each one contains a p tag. When I move the mouse over the first div, the opacity of the p tag in the first div changes. The problem is that when I move the mouse over the second or third div, only the p tag of the first div changes. Each div should change its own p tag.
It is important that I cannot use CSS :hover.
The cause is clear: all the p tags have the same id.
I need JavaScript that does not enumerate all the different elements individually.
I'm sorry for my English.
I hope you understand me.
My script:
<div onmouseout="normal();" onmouseover="hover();" >
<p id="something">LOLOL</p>
</div>
<div onmouseout="normal();" onmouseover="hover();" >
<p id="something">LOLOL</p>
</div>
<div onmouseout="normal();" onmouseover="hover();" >
<p id="something">LOLOL</p>
</div>
<div onmouseout="normal();" onmouseover="hover();" >
<p id="something">LOLOL</p>
</div>
Javascript:
function normal() {
var something = document.getElementById('something');
something.style.opacity = "0.5";
}
function hover() {
var something = document.getElementById('something');
something.style.opacity = "1";
}
CSS:
p {
opacity: 0.5;
color: red;
}
As Paul S. suggests, you need to pass this to the function so that it knows which element it has to work on.
<div onmouseout="normal(this);" onmouseover="hover(this);" >
<p>LOLOL</p>
</div>
<div onmouseout="normal(this);" onmouseover="hover(this);" >
<p>LOLOL</p>
</div>
<div onmouseout="normal(this);" onmouseover="hover(this);" >
<p>LOLOL</p>
</div>
<div onmouseout="normal(this);" onmouseover="hover(this);" >
<p>LOLOL</p>
</div>
Then select the child <p> element of the passed <div>. getElementsByTagName("p") returns a collection of all p elements inside this div, and [0] picks the first one. If each div had two paragraphs, you could use e.g. getElementsByTagName("p")[1] to select the second <p>.
function normal(mydiv) {
mydiv.getElementsByTagName("p")[0].style.opacity="0.5";
}
function hover(mydiv) {
mydiv.getElementsByTagName("p")[0].style.opacity="1";
}
See the working example here: http://jsfiddle.net/mastazi/2REe5/
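If the inline onmouseover/onmouseout attributes are not required, the same per-div behaviour could be attached from script. This is only a sketch: bindHover is a hypothetical helper, and it assumes each container exposes getElementsByTagName and addEventListener, as DOM elements do.

```javascript
// Attach hover handlers to one container so that only its own first <p>
// changes opacity. Returns false when the container has no <p>.
function bindHover(container) {
  const paragraph = container.getElementsByTagName("p")[0];
  if (!paragraph) return false;
  container.addEventListener("mouseover", () => {
    paragraph.style.opacity = "1";
  });
  container.addEventListener("mouseout", () => {
    paragraph.style.opacity = "0.5";
  });
  return true;
}
```

In a browser, document.querySelectorAll("div").forEach(bindHover) would wire up every div without listing them one by one and without needing ids on the paragraphs.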
Your html should be something like this:
<div onmouseout="normal(1);" onmouseover="hover(1);">
<p id="something-1">LOLOL</p>
</div>
<div onmouseout="normal(2);" onmouseover="hover(2);">
<p id="something-2">LOLOL</p>
</div>
<div onmouseout="normal(3);" onmouseover="hover(3);">
<p id="something-3">LOLOL</p>
</div>
<div onmouseout="normal(4);" onmouseover="hover(4);">
<p id="something-4">LOLOL</p>
</div>
As you can see, each element now has a different id, and we pass the numeric part of the id to the functions triggered by onmouseover and onmouseout.
For your javascript, your code could be something like this:
function normal(id) {
var something = document.getElementById('something-'+id);
something.style.opacity = "0.5";
}
function hover(id) {
var something = document.getElementById('something-'+id);
something.style.opacity = "1";
}
normal() and hover() each receive an id and change the style of the element that has this id.
Please, check this JSFiddle that I've built for you.

Target <div> with Javascript?

I have code similar to this:
<div>
<div> randomText 11235 </div>
<div> randomText </div>
<div> randomText </div>
<div> randomText </div>
<div> randomText </div>
</div>
As you can see, each div contains random text but the first div also has a "11235" added at the end.
This "11235" will be added at the end of any of the divs (Note that this can be multiple divs).
Here is one scenario that can happen:
<div>
<div> randomText </div>
<div> randomText 11235 </div>
<div> randomText </div>
<div> randomText 11235 </div>
<div> randomText </div>
</div>
Is it possible to target only those divs that have the 11235 added onto the end, and remove the 11235?
I would prefer a solution in javascript (jQuery is fine).
Thanks
Using jQuery you could iterate all divs containing 11235 and replace '11235' in the text of those elements using the standard js-replace function:
$("div div:contains('11235')").each(function () {
$(this).text($(this).text().replace("11235",""));
});
As noted by Rick you could also use implicit iteration, it's slightly more elegant:
$("div div:contains(11235)").text(function (i, text) {
return text.replace("11235","");
});
If there's any chance a div element might contain multiple occurrences of 11235, use the regex version of replace:
$("div div:contains(11235)").text(function (i, text) {
return text.replace(/11235/g,"");
});
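The question says the 11235 is added at the end of the text, while the replacements above remove it wherever it occurs. If only a trailing occurrence should go, a helper along these lines might be used as the callback body. The name stripTrailingMarker is hypothetical, and the sketch also drops the whitespace around the marker.

```javascript
// Remove `marker` only when it is the last non-whitespace token of `text`.
function stripTrailingMarker(text, marker) {
  // Escape regex metacharacters so the marker is matched literally.
  const escaped = marker.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Optional whitespace, the marker, optional whitespace, end of string.
  return text.replace(new RegExp("\\s*" + escaped + "\\s*$"), "");
}
```

With jQuery it could be plugged in as $("div div:contains(11235)").text((i, text) => stripTrailingMarker(text, "11235")).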
You can use the :contains() selector and the html() method of jQuery, like this:
$("div:contains(11235)").html(function( i, html ) {
return $.trim( ( html || "" ).replace(/11235/g, "") );
});
This works without the need for iterating with each(), as jQuery is built on the concept of "implied iteration", meaning all setter methods automatically iterate over each element they receive from the calling jQuery object (unless otherwise noted).
See: http://jsfiddle.net/rwaldron/DeybW/
