I am new to JS.
Can you tell me why I am getting empty values for sports-title and third, even though we have one div with content in it?
sports-title---->{"0":{}}
third---->{}
My code is below.
findStringInsideDiv() {
    /*
    var str = document.getElementsByClassName("sports-title").innerHTML;
    */
    var sportsTitle = document.getElementsByClassName("sports-title");
    var third = sportsTitle[0];
    var thirdHTML = third.innerHTML;
    //str = str.split(" ")[4];
    console.log("sports-title---->" + JSON.stringify(sportsTitle));
    console.log("third---->" + JSON.stringify(third));
    console.log("thirdHTML---->" + JSON.stringify(thirdHTML));
    if (thirdHTML === " basketball football swimming ") {
        console.log("matching basketball---->");
        var menu = document.querySelector('.sports');
        menu.classList.add('sports-with-basketball');
        // how to add this class name directly to the first div after body.
        // but we are not rendering that div in accordion
        // is it possible
    } else {
        console.log("not matching");
    }
}
When you look up an element in the Document Object Model (DOM) with one of the getElement... methods, you get back an object that represents that HTML element. This object holds much more than the text inside the element, and its data is exposed through properties inherited from the DOM interfaces rather than through its own enumerable properties, which is why JSON.stringify prints it as {}. To access the text of the element, use the .textContent property.
In addition, a class can be assigned to several elements, so getElementsByClassName returns an HTMLCollection (an array-like list) rather than a single element. You therefore have to index into it, for example:
console.log("sports-title---->" + JSON.stringify(sportsTitle[0].textContent));
You can find a brief introduction to the DOM on the W3Schools Website. https://www.w3schools.com/js/js_htmldom.asp If you follow along it gives an overview of different aspects of the DOM including elements.
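To make the {"0":{}} output less mysterious, here is a minimal sketch that reproduces it without a browser. FakeElement and fakeCollection are hypothetical stand-ins, not real DOM types; they only mimic the relevant trait, namely that the interesting data sits behind a getter instead of an own enumerable property.

```javascript
// Hypothetical stand-in for a DOM element: the text is reachable through a
// prototype getter, so the object has no own enumerable properties.
class FakeElement {
  #text;
  constructor(text) {
    this.#text = text;
  }
  get textContent() {
    return this.#text;
  }
}

// Hypothetical stand-in for the collection returned by getElementsByClassName:
// one indexed entry plus a non-enumerable length.
const fakeCollection = { 0: new FakeElement(" basketball football swimming ") };
Object.defineProperty(fakeCollection, "length", { value: 1 });

const wholeCollection = JSON.stringify(fakeCollection);
const firstElement = JSON.stringify(fakeCollection[0]);
const firstText = JSON.stringify(fakeCollection[0].textContent);

console.log("sports-title---->" + wholeCollection); // sports-title---->{"0":{}}
console.log("third---->" + firstElement);           // third---->{}
console.log("text---->" + firstText);               // text---->" basketball football swimming "
```

JSON.stringify only serializes own enumerable properties, so the index key survives while the element itself collapses to {}. Reading textContent first gives a plain string, which serializes as expected.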
Maybe this will be helpful.
As you can see, sportsTitle[0].textContent returns the full heading. The 0 is the index of the element in the collection, which is why "0" appears as a key when you stringify (serialize) sportsTitle. Why 0? Because you have only one <h1> element. See this fiddle: http://jsfiddle.net/cqj6g7f0/3/
I added a second h1; look at the console.log output and you will see two indexes, 0 and 1.
If you want to get a single word from the element's text, take a substring with the substr() method: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substr
One way is to change the <h1> class attribute to an id, look the element up with getElementById, and read sportsTitle.textContent;
then use substr() on that string.
Or,
the 2nd way is to keep the class attribute and read sportsTitle[0].textContent;
then use substr() on that string.
The 2nd way is better.
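As an alternative to counting character positions for substr(), the text can be split into words. The sketch below assumes the heading text from the question; wordsOf is a hypothetical helper name, not something from the original code.

```javascript
// Hypothetical helper: split text into words, ignoring leading, trailing,
// and repeated whitespace.
function wordsOf(text) {
  return text.trim().split(/\s+/).filter(Boolean);
}

// The heading text from the question, as textContent would return it.
const headingText = " basketball football swimming ";

const words = wordsOf(headingText);
const firstWord = words[0];
const hasBasketball = words.includes("basketball");

console.log(words);         // [ 'basketball', 'football', 'swimming' ]
console.log(firstWord);     // basketball
console.log(hasBasketball); // true
```

Checking words.includes("basketball") is less brittle than comparing the whole string with ===, because it does not depend on the exact spaces around the words.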
I am trying to scrape the following Javascript frontend website to practise my Javascript scraping skills:
https://www.oplaadpalen.nl/laadpaal/112618
I am trying to find two different elements by their XPath. The first one is the title, which it does find. The second one is the actual text itself, which it somehow fails to find. It's strange since I just copied the XPaths from the Chrome browser.
from selenium import webdriver
link = 'https://www.oplaadpalen.nl/laadpaal/112618'
driver = webdriver.PhantomJS()
driver.get(link)
#It could find the right element
xpath_attribute_title = '//*[@id="main-sidebar-container"]/div/div[1]/div[2]/div/div[' + str(3) + ']/label'
next_page_elem_title = driver.find_element_by_xpath(xpath_attribute_title)
print(next_page_elem_title.text)
#It fails to find the right element
xpath_attribute_value = '//*[@id="main-sidebar-container"]/div/div[1]/div[2]/div/div[' + str(3) + ']/text()'
next_page_elem_value = driver.find_element_by_xpath(xpath_attribute_value)
print(next_page_elem_value.text)
I have tried a couple of things: change "text()" into "text", "(text)", but none of them seem to work.
I have two questions:
Why doesn't it find the correct element?
What can we do to make it find the correct element?
Selenium's find_element_by_xpath() method returns the first element node matching the given XPath query, if any. However, XPath's text() node test selects text nodes, not the element node that contains them.
To extract the text using Selenium's finder methods, you'll need to find the containing element, then extract the text from the returned object.
Keeping your own logic intact, you can extract the labels and the associated values as follows:
for x in range(3, 8):
    label = driver.find_element_by_xpath("//div[@class='labels']//following::div[%s]/label" % x).get_attribute("innerHTML")
    value = driver.find_element_by_xpath("//div[@class='labels']//following::div[%s]" % x).get_attribute("innerHTML").split(">")[2]
    print("Label is %s and value is %s" % (label, value))
Console Output :
Label is Paalcode: and value is NewMotion 04001157
Label is Adres: and value is Deventerstraat 130
Label is pc/plaats: and value is 7321cd Apeldoorn
I would suggest a slightly different approach. I would grab the entire text and then split one time on :. That will get you the title and the value. The code below will get Paalcode through openingstijden labels.
for x in range(2, 8):
    s = driver.find_elements_by_css_selector("div.leftblock > div.labels > div")[x].text
    t = s.split(":", 1)
    print(t[0])  # title
    print(t[1])  # value
You don't want to split more than once because the Status value contains more colons.
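The same split-once idea can be sketched in JavaScript, which matters here because JavaScript's split(":", 1) does not behave like Python's: it truncates the result to one piece instead of keeping the remainder. splitOnce is a hypothetical helper, and the sample string is made up for illustration rather than taken from the page.

```javascript
// Hypothetical helper: split on the first occurrence of `separator` only,
// keeping everything after it (including further separators) in one piece.
function splitOnce(text, separator) {
  const index = text.indexOf(separator);
  if (index === -1) {
    return [text];
  }
  return [text.slice(0, index), text.slice(index + separator.length)];
}

// Made-up sample with more than one colon.
const [title, value] = splitOnce("Status: open: 2 of 2 free", ":");

console.log(title);        // Status
console.log(value.trim()); // open: 2 of 2 free

// For comparison: the limit argument of String.prototype.split drops the rest.
console.log("a:b:c".split(":", 1)); // [ 'a' ]
```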
Going with @JeffC's approach, if you want to first select all those elements using XPath instead of a CSS selector, you may use this code:
xpath_title_value = "//div[@class='labels']//div[label[contains(text(),':')] and not(div) and not(contains(@class,'toolbox'))]"
title_and_value_elements = driver.find_elements_by_xpath(xpath_title_value)
Notice the plural "elements" in the find_elements_by_xpath method. The XPath above selects div elements that are descendants of a div element that has a class attribute of "labels". The nested label of each selected div must contain a colon. Furthermore, the div itself may not have a class of "toolbox" (something that certain other divs on the page have), nor may it contain any additional nested divs.
Following which, you can extract the text within the individual div elements (which also contain the text from the nested label elements) and then split them using ":\n" which separates the title and value in the raw text string.
for element in title_and_value_elements:
    text = element.text
    title, value = text.split(":\n")
    print(title)
    print(value, "\n")
Since you want to practise your JS skills, you can also do this in JS. The divs actually contain more data, as you can see if you paste this into the browser console:
labels = document.querySelectorAll(".labels");
divs = labels[0].querySelectorAll("div");
for (div of divs) console.log(div.firstChild, div.textContent);
You can push the values into an array, keeping only the divs whose first child is a label, and return the resulting array into a Python variable:
labels_value_pair = driver.execute_script('''
    var scrap = [];
    var labels = document.querySelectorAll(".labels");
    var divs = labels[0].querySelectorAll("div");
    for (var div of divs) {
        // firstChild is null for an empty div, so check it before reading tagName
        if (div.firstChild && div.firstChild.tagName === "LABEL") {
            scrap.push(div.firstChild.textContent, div.textContent);
        }
    }
    return scrap;
''')
I am trying to get the first child with a given class name in plain JavaScript.
I am writing my own form validation and trying to find the error message I appended so that I can remove it, and also to avoid appending a new one if an error message is already there.
If you could help me with just the first part, getting a child by class name, that would be great.
function display_error(selector, message) {
    selector.insertAdjacentHTML('afterend', "<h1 class='js-error' >" + message + "</h1>");
}
function validateForm() {
    // Validate Name Field
    // Check if name has fewer than 3 characters
    var elem = document.getElementById("name")
    if (elem.value.length < 3) {
        display_error(elem, "Less than 3")
        return false;
    } else {
        // here is the error
        error_label = elem.querySelector('.js-error');
        error_label.textContent = "more than 3"
    }
}
here is a fiddle
https://jsfiddle.net/efh941cc/3/
The beautiful thing about document.querySelector() is that you can use CSS selectors rather than the, often clunky, DOM API.
CSS provides a very simple pseudo-class called :first-child which does what you need.
// Find the first element that has the .test class and is the first child of its parent.
var firstTest = document.querySelector(".test:first-child");
// Now that you've scanned and found the element and stored a reference to it
// in a variable, you can access any aspect of the element.
console.log(firstTest.textContent);
firstTest.innerHTML = "<span>Now, I have completely different content than before!</span>";
// Now, we can get a reference to other elements that are relative to the last
// one we found.
var firstTestError = document.querySelector(".test:first-child + .error");
firstTestError.style.backgroundColor = "aqua";
firstTestError.innerHTML = "<span>Required</span>";
<div>
<span class="test">one</span><span class="error"></span>
<div class="test">two</div>
<div class="test">three</div>
</div>
In modern JavaScript, to get the first child with a class name, you can use the following:
document.querySelector('element.class:first-child')
Here, you supply the actual element, and the actual class name.
document.querySelector is available in all modern browsers, and will take any string which matches a CSS selector. It even works in IE8, though the :first-child pseudo class is not available there.
const GetFirstChild = document.querySelector(' .PlainJavascript');
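One detail in the question's code is worth spelling out. The error message is inserted with 'afterend', so it becomes the next sibling of the input, not a child of it, and elem.querySelector('.js-error') only searches descendants. A minimal sketch of looking at the sibling instead follows; findErrorLabel, showError, and clearError are hypothetical helper names, not part of the original code.

```javascript
// Hypothetical helpers; the names are not from the original post.

// Return the error element directly after `field`, or null if there is none.
function findErrorLabel(field) {
  const next = field.nextElementSibling;
  return next && next.classList.contains("js-error") ? next : null;
}

// Show `message` next to `field`, reusing an existing error element if present
// so that repeated validation does not stack up several messages.
function showError(field, message) {
  let label = findErrorLabel(field);
  if (!label) {
    field.insertAdjacentHTML("afterend", "<h1 class='js-error'></h1>");
    label = field.nextElementSibling;
  }
  label.textContent = message; // textContent avoids injecting markup
  return label;
}

// Remove the error element next to `field`, if any.
function clearError(field) {
  const label = findErrorLabel(field);
  if (label) {
    label.remove();
  }
}
```

In the question's validateForm, this would mean calling showError(elem, "Less than 3") in the failing branch and clearError(elem) in the else branch.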
In one of my projects I just discovered that iterating over an array of HTML elements (and changing all of them) sometimes affects only the last element. When I log the elements' attributes I can see that the loop definitely visits every element, but nevertheless only the last element visibly changes.
Can anyone explain why?
I already figured out that a solution is to use createElement() and appendChild() instead of innerHTML. I just want to understand why JavaScript behaves like this.
Here is my example code:
/* creating 5 elements and storing them into an array */
var elementArray = [];
for (var n = 0; n < 5; n++) {
    document.body.innerHTML += "<div id='elmt_"+n+"'>"+n+"</div>\n";
    elementArray[n] = document.getElementById("elmt_"+n);
}
/* loop over these 5 elements */
for (var n = 0; n < 5; n++) {
    console.log(elementArray[n].id); // logs: elmt_0 elmt_1 elmt_2 elmt_3 elmt_4
    elementArray[n].innerHTML = "test"; // changes just the last element (elmt_4) to "test"
}
I created an example here: http://jsfiddle.net/qwe44m1o/1/
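1 - Using console.log(elementArray[n]); in your second loop shows that innerHTML in this loop is modifying the elements held in your array, not the ones in your document. Each document.body.innerHTML += ... in the first loop makes the browser re-parse the whole body and replace every existing node with a new one, so the references stored in earlier iterations point to elements that are no longer in the page. Only the last stored reference still belongs to the document. In other words, the array stores the div element objects themselves, not a shortcut to document.getElementById("elmt_"+n).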
See the JSFiddle
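2 - If you want to store a shortcut that looks the element up by ID each time, you can store the call as a string, elementArray[n] = "document.getElementById('elmt_"+n+"')";, and run it with eval like this: eval(elementArray[n]).innerHTML = n+"-test";. Storing just the id and calling document.getElementById again in the second loop does the same thing without eval.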
See the JSFiddle for this try
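The effect can be imitated outside a browser with a small model. FakeBody below is a hypothetical, heavily simplified stand-in for document.body. It is not how a browser is implemented, but it shares the one property that matters here: assigning innerHTML throws away the old child objects and builds new ones.

```javascript
// Hypothetical, simplified model of document.body (not a real DOM).
class FakeBody {
  constructor() {
    this.markup = "";
    this.children = [];
  }
  get innerHTML() {
    return this.markup;
  }
  set innerHTML(markup) {
    this.markup = markup;
    // "Re-parse": every assignment creates brand-new child objects.
    this.children = Array.from(
      markup.matchAll(/<div id='([^']+)'>/g),
      (m) => ({ id: m[1] })
    );
  }
  getElementById(id) {
    return this.children.find((child) => child.id === id) || null;
  }
}

const body = new FakeBody();
const elementArray = [];
for (let n = 0; n < 5; n++) {
  body.innerHTML += "<div id='elmt_" + n + "'>" + n + "</div>\n";
  elementArray[n] = body.getElementById("elmt_" + n);
}

// Which stored references are still children of the body?
const stillInPage = elementArray.map((el) => body.children.includes(el));
console.log(stillInPage); // [ false, false, false, false, true ]
```

Only the reference taken after the final assignment still points at a live child, which mirrors the behaviour in the question. With createElement() and appendChild(), existing nodes are never rebuilt, so every stored reference stays valid.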
If I have the following:
<p class="demo" id="first_p">
This is the first paragraph in the page and it says stuff in it.
</p>
I could use
document.getElementById("first_p").innerHTML
to get
This is the first paragraph in the page and it says stuff in it.
But is there something simple you can run which would return as a string
class="demo" id="first_p"
I know I can iterate through all of the element's attributes to get each one individually but is there a function which returns tagHTML or something like that?
The following code is something of a mouthful: I wrote it as a one-liner, but I've broken it out into several lines here. But this will get you a plain object where the keys are attribute names and the values are the values of the corresponding attributes:
Array.prototype.reduce.call(
    document.getElementById('first_p').attributes,
    function (attributes, currentAttribute) {
        attributes[currentAttribute.name] = currentAttribute.value;
        return attributes;
    },
    {}
);
Going through this, document.getElementById('first_p').attributes gets you a NamedNodeMap of the element's attributes. A NamedNodeMap is not an Array, but Array.prototype.reduce.call(...) calls Array.prototype.reduce on the NamedNodeMap as if it were an Array. We can do this because NamedNodeMap is written so that it can be accessed like an array.
But we can't stop here. That NamedNodeMap that I mentioned is an array-like collection of Attr objects, rather than an object of name-value pairs. We need to convert it, which is where the other arguments to Array.prototype.reduce come into play.
When it's not being called in a strange way, Array.prototype.reduce takes two arguments. The second argument (which is third for us because of the way we called it) is an object that we want to build up. In our case, that's a brand-new bare object: the {} that you see at the end.
The first argument to Array.prototype.reduce (which, again, is second for us) is a callback function. That function is called once for each item, and we use two of its arguments. The second argument is the current item, which is easy to understand, but the first argument is a little wild. The first time the function is called, its first argument is the object we want to build up (i.e. the last argument to Array.prototype.reduce). Each time after that, the first argument is whatever the function returned the last time it was called. Array.prototype.reduce returns whatever the last call to its callback returned.
So we start with an empty object. Then for every Attr in the element's attributes, we add something to the object, and return it. When the last call finishes, the object is finished, so we return that. And this is how we make the attribute list.
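The walkthrough above can be tried without a browser by handing reduce a plain array-like object. fakeAttributes is a hypothetical stand-in for element.attributes, filled with the two attributes from the question.

```javascript
// Hypothetical stand-in for element.attributes: indexed Attr-like entries
// plus a length, which is all Array.prototype.reduce needs.
const fakeAttributes = {
  0: { name: "class", value: "demo" },
  1: { name: "id", value: "first_p" },
  length: 2,
};

const attributeMap = Array.prototype.reduce.call(
  fakeAttributes,
  function (attributes, currentAttribute) {
    // `attributes` is the object being built up; add one entry and pass it on.
    attributes[currentAttribute.name] = currentAttribute.value;
    return attributes;
  },
  {} // initial value: the empty object the first call receives
);

console.log(attributeMap); // { class: 'demo', id: 'first_p' }
```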
If you wanted the exact code in the tag, like a String, then I'm afraid there is no standard function to get that exactly. But we can get a close approximation of that code, with a similar setup:
Array.prototype.map.call(
    document.getElementById('first_p').attributes,
    function (currentAttribute) {
        return currentAttribute.name + '=' + JSON.stringify(currentAttribute.value);
    }
).join(' ');
The basic principle is the same: we take that NamedNodeMap and call an Array function on it, but this time we're using map instead of reduce. You can think of map as a special case of reduce: it always builds up an Array, with one element for every element that was in the original. Because of that, you don't even need to mention the object you're building up: the callback function only has one argument, and we just return the thing to put into the new Array. Once we're done, we have an Array of 'name="value"' strings, and then we just join that with ' '.
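Here is the map variant run against a plain array-like object, next to an Array.from version that may read more naturally in current JavaScript. fakeAttrs is a hypothetical stand-in for element.attributes, so this is a sketch of the technique rather than a test against a real element.

```javascript
// Hypothetical stand-in for element.attributes.
const fakeAttrs = {
  0: { name: "class", value: "demo" },
  1: { name: "id", value: "first_p" },
  length: 2,
};

// Format one attribute as name="value".
function formatAttr(attr) {
  return attr.name + "=" + JSON.stringify(attr.value);
}

// Borrow Array.prototype.map, as in the answer above.
const viaMapCall = Array.prototype.map.call(fakeAttrs, formatAttr).join(" ");

// Array.from accepts an array-like and an optional mapping function.
const viaArrayFrom = Array.from(fakeAttrs, formatAttr).join(" ");

console.log(viaMapCall);            // class="demo" id="first_p"
console.log(viaArrayFrom);          // class="demo" id="first_p"
console.log(typeof fakeAttrs.join); // undefined
```

The last line shows why the conversion is needed at all: an array-like object has indexes and a length, but none of the Array methods such as join.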
It isn't a built-in property, but you can use the array-like object attributes to obtain what you're looking for.
Array.prototype.map.call(element.attributes, function (el) {
    return el.name + '="' + el.value + '"';
}).join(' ')
This assumes a browser that supports the map function. The Array.prototype.map.call part is needed because attributes is not really an array and does not have a join method, but because it is array-like, JavaScript's dynamism allows us to call map on it anyway.
Example from the current page with the footer div:
var element = document.getElementById('footer');
Array.prototype.map.call(element.attributes, function (el) {
    return el.name + '="' + el.value + '"';
}).join(' ');
// "id="footer" class="categories""
You can try the following:-
var attributes = '';
for (var i = 0; i < document.getElementById("first_p").attributes.length; i++) {
    var attr = document.getElementById("first_p").attributes[i];
    attributes += attr.nodeName + "='" + attr.nodeValue + "' ";
}
console.log(attributes);
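You can use document.getElementById("first_p").attributes to get an array-like NamedNodeMap of all the attributes on that DOM element.
It has no join method of its own, so to get them all in one string, convert it first, e.g. Array.from(document.getElementById("first_p").attributes).map(function (a) { return a.name + '="' + a.value + '"'; }).join(' '), to get the desired output.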
Well, while nothing currently exists to do this directly (and the approaches using the Node's attributes are more reliable), one option is to create this method yourself:
HTMLElement.prototype.tagHTML = function(){
    // we create a clone to avoid doing anything to the original:
    var clone = this.cloneNode(),
        // creating a regex, using new RegExp, in order to create it
        // dynamically, and inserting the node's tagName:
        re = new RegExp('<' + this.tagName + '\\s+','i'),
        // 'empty' variables for later:
        closure, str;
    // removing all the child-nodes of the clone (we only want the
    // contents of the Node's opening HTML tag, so remove everything else):
    while (clone.firstChild){
        clone.removeChild(clone.firstChild);
    }
    // we get the outerHTML of the Node as a string,
    // remove the opening '<' and the tagName and a following space,
    // using the above regular expression:
    str = clone.outerHTML.replace(re,'');
    // naively determining whether the element is void
    // (ends with '/>') or not (ends with '>'):
    closure = str.indexOf('/>') > -1 ? '/>' : '>';
    // we get the string of HTML from the beginning until the closing
    // string we assumed just now, and then trim any leading/trailing
    // white-space using trim(). And, of course, we return that string:
    return str.substring(0,str.indexOf(closure)).trim();
};
console.log(document.getElementById('test').tagHTML());
console.log(document.getElementById('demo').tagHTML());
JS Fiddle demo.
I am having a tough time listing all the spans that have class="ansspans". There may be one or more spans with the "ansspans" class, and I need to get all of them with their content and iterate over them. Can you tell me how to do it? Regex, jQuery, anything is OK.
The content will be in a string variable (not in the DOM). IE 9 drops the quotes from attributes, so I can't use getElementsByClassName. I followed this answer, the innerHTML workaround, to get the quotes back, and now it displays with quotes. So I need to get all the elements with class ansspans in an array, so that I can iterate over it and get the text content of each span.
<span id="sss_ctl00_ctl06_lblanswertext">
assignment
<span class="ansspans">submission </span>
date : 10:07:51 AM
</span>
In this example, the expected output is one span object, so that I can iterate over it.
Update: I can't use the DOM, as we are in quirks mode, so IE 9 ignores attribute quotes and I can't traverse with getElementsByClassName. So I need to match all the spans in a string variable. I hope everyone understands my problem ;(
(As has been said on this site before, sometimes it's OK to parse a limited, known set of XML with regex.)
//assumes no <span id="stupid>" class="ansspans">, and no embedded span.ansspans
var data = ' <span id="sss_ctl00_ctl06_lblanswertext"> assignment \n date : 10:07:51 AM \n <span class="ansspans">ONE has a new \n line in it</span><span class="ansspans">TWO</span><span class="ansspans">3 </span><span class="ansspans">4 </span><span class="ansspans">5 </span>';
var myregexp = /<span[^>]+?class="ansspans".*?>([\s\S]*?)<\/span>/g;
var match = myregexp.exec(data);
var result = "spans found:\n";
while (match != null) {
    // match[1] is the first capture group: the inner HTML of the span
    result += "match:" + match[1] + ',\n';
    match = myregexp.exec(data);
}
alert(result);
(edited to capture inner html instead of whole tag)
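Since the question asks for the matches as an array, here is a minimal sketch that collects the captured contents with String.prototype.matchAll, using the same pattern as the answer above. matchAll is not available in IE 9, so this only illustrates the idea for current engines; there, the exec loop remains the compatible route. The sample string is the markup from the question, flattened onto one line.

```javascript
// Markup from the question, held in a string rather than in the DOM.
const html =
  '<span id="sss_ctl00_ctl06_lblanswertext"> assignment ' +
  '<span class="ansspans">submission </span> date : 10:07:51 AM </span>';

// Same pattern as in the answer above; the g flag is required by matchAll.
const spanPattern = /<span[^>]+?class="ansspans".*?>([\s\S]*?)<\/span>/g;

// Collect the first capture group (the span's inner HTML) of every match.
const contents = Array.from(html.matchAll(spanPattern), (m) => m[1]);

console.log(contents); // [ 'submission ' ]
```

The same assumptions as in the answer still apply: no ">" inside attribute values and no nested span.ansspans elements.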
The jQuery selector $('span.ansspans') will get all the <span class="ansspans"> elements on the page.
If you need something for a specific element, add a prefix of the appropriate selector, i.e. $('#sss_ctl00_ctl06_lblanswertext span.ansspans')
If this needs to be done in a more dynamic way - look into functions like find(), filter(), etc.
Solution using jQuery:
var tempArray = $("span.ansspans"); // selects all the span elements having class 'ansspans' and returns them in an array-like jQuery object
var len = tempArray.length; // number of matched elements
for (var index = 0; index < len; index++) {
    var reqString = $(tempArray[index]).html();
}
These string values can be either put inside an array or can be utilised then and there only.
If you want textContent, use .text() instead of .html()