Python BeautifulSoup - get elements without child elements

Example HTML:
<p class="labels">
<span>Item1</span>
<span>Item2</span>
<time class="time">
<span>I dont want to get this span</span>
</time>
</p>
I am currently getting all the spans within the tag with the labels class, but I only want the two spans directly under that tag, not any span tags nested in child elements.
This is what I am doing at the moment.
First I get the labels HTML from a much bigger HTML document:
labels = html.findAll(class_="labels")
Then I extract the span tags from it:
spans = labels[0].findAll('span', {"class": None})
In my case the "class": None doesn't change anything, because no span tag has a class.
So my question again: how can I get just the first two span tags, without the spans inside child elements?

The BeautifulSoup docs briefly mention the recursive=False argument, which restricts the search to direct children.
So the answer to this problem was:
spans = labels[0].findAll('span', {"class": None}, recursive=False)
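The difference between searching direct children and searching the whole subtree can also be seen without BeautifulSoup. The sketch below is only an illustration using the standard library's xml.etree.ElementTree, and it assumes the snippet is well-formed enough to parse as XML: findall("span") looks at direct children only (comparable to recursive=False), while iter("span") walks every descendant (comparable to the default recursive search).

```python
import xml.etree.ElementTree as ET

# The question's markup; parsing it as XML assumes it is well-formed.
markup = """
<p class="labels">
<span>Item1</span>
<span>Item2</span>
<time class="time">
<span>I dont want to get this span</span>
</time>
</p>
""".strip()

labels = ET.fromstring(markup)

# findall("span") matches direct children only (like recursive=False).
direct_spans = [span.text for span in labels.findall("span")]

# iter("span") walks the whole subtree (like the default recursive search).
all_spans = [span.text for span in labels.iter("span")]

print(direct_spans)  # ['Item1', 'Item2']
print(all_spans)     # ['Item1', 'Item2', 'I dont want to get this span']
```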

Another option is to keep only the spans whose parent is the container:
for container in html.findAll(class_="labels"):
    spans = container.findAll('span', {"class": None})
    spans = [span for span in spans if span.parent is container]
Alternatively, iterate over .children:
for container in html.findAll(class_="labels"):
    # strings among the children are not tags, so check the name first
    is_plain_span = lambda c: getattr(c, 'name', None) == 'span' and not c.get('class')
    spans = [child for child in container.children if is_plain_span(child)]

To extract the first two span elements, try the following:
>>> [i.text for i in html.find('p', {"class": "labels"}).findAll('span', {"class": None})[0:2]]
[u'Item1', u'Item2']
If you want to grab all the spans inside the labels class, remove the slice:
>>> [i.text for i in html.find('p', {"class": "labels"}).findAll('span', {"class": None})]
[u'Item1', u'Item2', u'I dont want to get this span']

Related

Specify optional element in XPath path?

Consider the following HTML:
<div>
text1
</div>
<div>
<span>
text2
</span>
</div>
<div>
text3
</div>
I need to select all the nodes with text1/text2/text3. When I use
/html/body/div[position() > 0]
I obviously don't get the span around text2, but the div around <span>text2</span>. How can I say: If there is a span following the div, then return the span; if the div is already the last element in a path, return the div? So the intended nodes would be:
div[0]
div[1]/span
div[2]
Update: This one works, but is there a shorter way to do it? (For example, I am writing /html/body/div in both of them; is it possible to put the pipe symbol (or) at a later place?)
/html/body/div[position() > 0 and count(*) = 0] | /html/body/div[position() > 0]/span
In order to select a node with text content in it, you can use the text() node test.
So if you want to select all nodes with some text content from a root node, you can use this XPath selector:
//ROOT_NODE//text()
So, for your example and as you said in your comment:
/html/body/div//text()
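For readers who want to try the idea without an XPath engine, here is a rough standard-library sketch. It does not evaluate the expression above; ElementTree implements only a small XPath subset, so the helper text_bearing_elements (a hypothetical name) simply walks the tree and keeps each element whose own leading text is non-blank. The body wrapper and the compact layout of the sample are assumptions made for the demo.

```python
import xml.etree.ElementTree as ET

# Compact version of the question's markup, wrapped in a single root element.
body = ET.fromstring("""
<body>
<div>text1</div>
<div><span>text2</span></div>
<div>text3</div>
</body>
""".strip())

def text_bearing_elements(root):
    """Return (tag, text) for each element whose own leading text is non-blank."""
    return [(el.tag, el.text.strip()) for el in root.iter()
            if el.text and el.text.strip()]

print(text_bearing_elements(body))
# [('div', 'text1'), ('span', 'text2'), ('div', 'text3')]
```

Note that this only inspects the text before an element's first child; text that follows a child element (its tail) is ignored in this sketch.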

How to get the index of an element containing specific child

I have a series of div elements, some of which contain an h2 element as a child or deeper descendant. I am looking for JavaScript or jQuery code that gives me the index of such a div element. Moreover, I want to start the search from a specific div element, that is, find a div with an index greater than x that contains an h2. The following code only counts the divs that contain an h2. However, some of the divs may not contain an h2, and they should be counted as well.
$(".myDivs h2:gt(2)").eq(0)
Html example:
<div class='myDivs'></div>
<div class='myDivs'><h2>Hi</h2></div>
<div class='myDivs'></div>
<div class='myDivs'><div><h2>How are you</h2></div></div>
<div class='myDivs'></div>
If x = 2, I want the index 3, for the fourth div with class myDivs, which contains an h2.
This is what I found so far, which works:
$(".myDivs:gt("+x+"):has(h2)").eq(0).index()
Since :gt() is deprecated, the better solution uses .slice(). Because :gt(x) matches indexes greater than x, the slice has to start at x + 1:
$(".myDivs").slice(x + 1).has("h2").first().index()
Basically, I think you want some jQuery like this:
$(start).find("div").find("h2").parent("div").index();
From the start element, find your divs, then find the h2 tags in the divs, then get the parent divs of the h2s, and then get the index.
You may need a .each (...) after .find("h2");
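The search logic itself (skip everything up to index x, then return the first div that has an h2 anywhere below it) can be sketched outside the browser. The following is only an illustration in Python with the standard library; first_index_with_h2 is a hypothetical helper name, and the body wrapper is an assumption so that the divs share one parent.

```python
import xml.etree.ElementTree as ET

# The question's example divs, wrapped in one parent element.
root = ET.fromstring("""
<body>
<div class="myDivs"></div>
<div class="myDivs"><h2>Hi</h2></div>
<div class="myDivs"></div>
<div class="myDivs"><div><h2>How are you</h2></div></div>
<div class="myDivs"></div>
</body>
""".strip())

def first_index_with_h2(divs, x):
    """Index of the first div after position x with an h2 descendant, or -1."""
    for index, div in enumerate(divs):
        if index > x and div.find(".//h2") is not None:
            return index
    return -1

# Only the top-level divs carry the myDivs class, so nested divs are not counted.
my_divs = root.findall("div[@class='myDivs']")

print(first_index_with_h2(my_divs, 2))  # 3
print(first_index_with_h2(my_divs, 0))  # 1
```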

How can I select all the divs inside a section tag using javascript?

I'm making a turn-based game in JavaScript. I want to move the player from div to div. I have put all those divs in a section and then into an array using querySelectorAll. My problem is that I also have other divs that I want to use, and I can't select the two groups separately. Can anyone tell me how to select only some divs? I have seen something like section>div to differentiate them, but that doesn't work for me.
I have tried replacing div with span on rollDice, zar1, and zar2, but that breaks some of the CSS.
<div class="rollDice">Roll the dice</div>
<div class="zar1">
<img src="poze/dice-5.png" alt="Dice" class="dice" id="dice-1" style="width:150px">
</div>
<div class="zar2">
<img src="poze/dice-5.png" alt="Dice" class="dice2" id="dice-2" style="width:150px">
</div>
<section class="mutari">
<div class="nr1 mutabil"><h1>1</h1></div>
<div class="nr2 mutabil"><h1>2</h1></div>
<div class="nr3 mutabil"><h1>3</h1></div>
</section>
I want to select only the divs inside the section. After that, I want to select the first 3 of those divs.
What you are looking for is:
document.querySelectorAll('section > div:nth-child(-n+3)')
section (a type selector) finds your <section>. If you had more section elements, you could use section.mutari to be more precise (using a class selector).
> div selects all the <div> tags that are direct children of that section. > is a child combinator.
:nth-child(-n+3), a pseudo-class, restricts this to only select the first three elements, not all of them. It is not needed in your example, as you only have three divs; but if you had more, this would give you only the first three.
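To see the two steps (direct children of the section, then only the first three) in isolation, here is a small sketch outside the browser, using Python's standard library. It is not a replacement for the selector above. The fourth div (nr4) is a hypothetical addition so that the limit has a visible effect, and the body wrapper is an assumption. Taking a slice of the child divs matches :nth-child(-n+3) here only because the section contains nothing but divs.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the question's markup plus one hypothetical extra div (nr4).
page = ET.fromstring("""
<body>
<div class="rollDice">Roll the dice</div>
<section class="mutari">
<div class="nr1 mutabil"><h1>1</h1></div>
<div class="nr2 mutabil"><h1>2</h1></div>
<div class="nr3 mutabil"><h1>3</h1></div>
<div class="nr4 mutabil"><h1>4</h1></div>
</section>
</body>
""".strip())

# Step 1: find the section; step 2: take its direct child divs, first three only.
section = page.find("section[@class='mutari']")
first_three = section.findall("div")[:3]

print([div.get("class") for div in first_three])
# ['nr1 mutabil', 'nr2 mutabil', 'nr3 mutabil']
```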
You can do this with document.body.childNodes.
Just replace document.body with your HTML element.
You can then filter the list you get back and pick out the divs.
If you want to get all the divs, including nested ones, you can also use the following:
var dh = document.body.getElementsByTagName('div');
Get all div nodes:
Use document.body.getElementsByTagName('div')
Or
Get filtered div nodes:
Take the node list from document.body.childNodes.
Filter it using a for loop and an if condition.
Condition example: check node[i].nodeName and node[i].id

Javascript to change text of a span within div

My HTML looks like this. I can only identify the div's class; the spans have no ids. I need to replace one link's text and one image with some other text within those spans.
<div class ="myclass">
<span style="vertical-align:middle;">
</span>
<span style="vertical-align:middle;">
</span>
<span style="vertical-align:middle">
<span class="myspan">
<a href="http://testlink3">
<img title="test" class="imglink"></a>
</span>
</span>
<span>
Text - *This text needs to be replaced*
</span>
</div>
In the above code, I need to replace the img within the third span with clickable text (which should take us to the URL), and replace the text within the fourth span with new text (keeping the URL the same).
How can I identify these specific spans when they have no ids or classes?
We have 3 different things to do here:
How to replace the content inside a given element
This can be done very quickly:
$("selector").html("New text, same href");
Replace a given element with another
This can be done this way:
$("selector").replaceWith("<a href='somewhere.html'>I replaced an Img</a>");
Selecting the DOM elements
When you don't have an ID, nor a CSS class for your element, but you do know its position within another element plus some info about the element (like tagName), you can select the parent element and specify a relative position.
var myElement = $("parentElement").find("tagName:eq(position)");
Remember that this kind of selector ( "tagName:eq(position)") is zero indexed, so if you want to grab the third element, you need to tell jQuery tagName:eq(2).
So, let's say your parent element (not given in the question) is a div with a parent CSS class.
First thing you want to do is select this div.
var parent = $(".parent");
Then you want to find the Img within the third span.
var myImg = parent.find("span:eq(2)").find("img");
Now you can replace this element with the whatever you want
myImg.replaceWith("<a href='somewhere.html'>I replaced an Img</a>");
Note that jQuery allows you to pass HTML elements as a plain string.
Finally, you need to change the text inside the fourth span. This can be accomplished this way:
parent.find("span:eq(3)").find("a").html("New text, same href");
You could use document.querySelector to select an a based on the href:
document.querySelector("a[href='http://link4']").innerHTML = "The text you want to put in"
Since you're open to jQuery, this works too:
$("a[href='http://link4']").text("The text you want to put in")
var spans = document.querySelectorAll('.myclass > span'); // the four top-level spans
var img = spans[2].querySelector('img'); // here you find your img
var text = document.createTextNode('clickable text'); // your new text element
img.parentNode.appendChild(text); // add the text inside the existing link
img.parentNode.removeChild(img); // remove the image
spans[3].textContent = 'your new text'; // the fourth span holds the text to replace
You could use :nth-child() selector to select from the div you can identify.
More on :nth-child(): http://api.jquery.com/nth-child-selector/
Then select the img tag from the child span you found.

How do I access text inside an element while ignoring some text inside a tag adjacent to the text?

What is a good way to get the text out of a jQuery element when the text itself is adjacent to another element containing text?
In this example, I want to get at the text: 'Text I want' while ignoring the text in the adjacent child element:
<span>
<a>Text I want to ignore</a>
Text I want
</span>
My solution was to get all the text in the <span> tag and then delete all the text in the <a> tag. This feels a little awkward so I'm wondering if there is a better way:
var all_the_text = $('span').text();
var the_text_i_dont_want = $('span').find('a').text();
var text_i_want = all_the_text.replace(the_text_i_dont_want, '');
You have to go to the text nodes for this:
var text_i_want = $("span").contents().filter(function(){
    return this.nodeType === 3;
}).text();
http://jsfiddle.net/UeBZq/
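The same idea, keeping only the text that belongs directly to the element, can be tried outside the browser. This is just a sketch with Python's standard library, not part of the jQuery answer: in ElementTree an element's direct text is split between its own .text and the .tail of each child, so the hypothetical helper own_text joins those pieces and never reads the children's own text.

```python
import xml.etree.ElementTree as ET

# The question's markup, written on one line for the demo.
span = ET.fromstring("<span><a>Text I want to ignore</a> Text I want </span>")

def own_text(element):
    """Join the text that sits directly inside element, skipping child text."""
    parts = [element.text or ""]
    parts.extend(child.tail or "" for child in element)
    return "".join(parts).strip()

print(own_text(span))  # Text I want
```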
$("span")
.clone()
.children()
.remove()
.end()
.text();
That should do it.
To give proper credit :) http://viralpatel.net/blogs/jquery-get-text-element-without-child-element/
Remove the a tag and then get the span contents:
<span>
<a>Text I want to ignore</a>
Text I want
</span>
var removed_link = $('span').find('a').remove(); // note: this removes the <a> from the page itself
alert($('span').text());
