How to find elements on a JavaScript Website with Selenium? - javascript

I want to automate some searching stuff for myself, but I have a bit of a problem here.
On this website:
https://shop.orgatop.de/
The program can't find the search bar, and I don't really know why.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get('https://shop.orgatop.de/')
input_search = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="solrSearchTerm"]')))
input_search.click()
input_search.send_keys('asd')
input_search.send_keys(Keys.RETURN)

The element is inside nested iframes (innerFrame > catalog > content). You need to switch to those frames first in order to access the search input box.
Use WebDriverWait() together with frame_to_be_available_and_switch_to_it():
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get('https://shop.orgatop.de/')
# Walk down the frame hierarchy, waiting for each frame to become available
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.NAME, "innerFrame")))
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.NAME, "catalog")))
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.NAME, "content")))
input_search = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="solrSearchTerm"]')))
input_search.click()
input_search.send_keys('asd')
input_search.send_keys(Keys.RETURN)
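Note that after the search you are still inside the innermost frame; to locate anything in the top-level document again, switch back first. A minimal sketch:

driver.switch_to.default_content()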

Related

Selenium won't work unless I actually look at the Web page (perhaps anti-crawler mechanism by JavaScript?)

The following code works fine ONLY when I look at the Web page (aka the Chrome tab being manipulated by Selenium).
Is there a way to make it work even when I'm browsing another tab/window?
(I wonder how the website knows I'm actually looking at the web page or not...)
import time
import selenium.webdriver

#This is a job website in Japanese
login_url = "https://mypage.levtech.jp/"
driver = selenium.webdriver.Chrome("./chromedriver")
#Account and password are required to log in.
#I logged in and got to the following page, which displays a list of companies that I have applied for:
#https://mypage.levtech.jp/recruits/screening

#Dictionary to store company names and their job postings
jobs = {}
for i, company in enumerate(company_names):
    time.sleep(1)
    element = driver.find_elements_by_class_name("ScreeningRecruits_ListItem")[i]
    while element.text == "":
        #While loops and time.sleep() are there because the webpage seems to take a while to load
        time.sleep(0.1)
        element = driver.find_elements_by_class_name("ScreeningRecruits_ListItem")[i]
    td = element.find_element_by_tag_name("td")
    while td.text == "":
        time.sleep(0.1)
        td = element.find_element_by_tag_name("td")
    if td.text == company:
        td.click()
        time.sleep(1)
        jobs[company] = get_job_desc(driver)  #The get_job_desc function checks HTML tags and extracts info from certain elements
        time.sleep(1)
        driver.back()
        time.sleep(1)
print(jobs)
By the way, I have tried adding a user agent and scrolling down the page using the following code, in the hope that the web page would believe that I'm "looking at it." Well, I failed :(
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
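(For reference, a user agent is typically set through the Chrome options before the browser starts; a minimal sketch, with a placeholder UA string:)

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64)")  # example string only
driver = webdriver.Chrome("./chromedriver", options=options)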
I think the answer to your question comes down to window_handles. Whenever we open a new tab, Selenium changes the window's focus on us (obviously). Because the focus is on another page, we need to use the driver.switch_to.window(handle_here) method to get back to the proper tab. To demonstrate this, I found a website with similar functionality (also in Japanese) that might help you out.
MAIN PROGRAM - For Reference
from selenium import webdriver
from selenium.webdriver.chrome.webdriver import WebDriver as ChromeDriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as DriverWait
from selenium.webdriver.support import expected_conditions as DriverConditions
from selenium.common.exceptions import WebDriverException
import time

def get_chrome_driver():
    """This sets up our Chrome Driver and returns it as an object"""
    path_to_chrome = r"F:\Selenium_Drivers\Windows_Chrome85_Driver\chromedriver.exe"
    chrome_options = webdriver.ChromeOptions()
    # Browser is displayed in a custom window size
    chrome_options.add_argument("window-size=1500,1000")
    return webdriver.Chrome(executable_path=path_to_chrome,
                            options=chrome_options)

def wait_displayed(driver: ChromeDriver, xpath: str, timeout: int = 5):
    try:
        DriverWait(driver, timeout).until(
            DriverConditions.presence_of_element_located(locator=(By.XPATH, xpath))
        )
    except:
        raise WebDriverException(f'Timeout: Failed to find {xpath}')

# Gets our chrome driver and opens our site
chrome_driver = get_chrome_driver()
chrome_driver.get("https://freelance.levtech.jp/project/search/?keyword=&srchbtn=top_search")
wait_displayed(chrome_driver, "//div[@class='l-contentWrap']//ul[@class='asideCta']")
wait_displayed(chrome_driver, "//div[@class='l-main']//ul[@class='prjList']")
wait_displayed(chrome_driver, "//div[@class='l-main']//ul[@class='prjList']//li[contains(@class, 'prjList__item')][1]")

# Click on the first item title link
titleLinkXpath = "(//div[@class='l-main']//ul[@class='prjList']//li[contains(@class, 'prjList__item')][1]//a[contains(@href, '/project/detail/')])[1]"
chrome_driver.find_element(By.XPATH, titleLinkXpath).click()
time.sleep(2)

# Get the currently displayed window handles
tabs_open = chrome_driver.window_handles
if len(tabs_open) != 2:
    raise Exception("Failed to click on our Link's Header")
else:
    print(f'You have: {len(tabs_open)} tabs open')

# Switch to the 2nd tab and then close it
chrome_driver.switch_to.window(tabs_open[1])
chrome_driver.close()

# Check how many tabs we have open
tabs_open = chrome_driver.window_handles
if len(tabs_open) != 1:
    raise Exception("Failed to close our 2nd tab")
else:
    print(f'You have: {len(tabs_open)} tabs open')

# Switch back to our main tab
chrome_driver.switch_to.window(tabs_open[0])
chrome_driver.quit()
chrome_driver.service.stop()
For scrolling, you could use this method:
def scroll_to_element(driver: ChromeDriver, xpath: str, timeout: int = 5):
    try:
        webElement = DriverWait(driver, timeout).until(
            DriverConditions.presence_of_element_located(locator=(By.XPATH, xpath))
        )
        driver.execute_script("arguments[0].scrollIntoView();", webElement)
    except:
        raise WebDriverException(f'Timeout: Failed to find element using xpath {xpath}\nResult: Could not scroll')
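A hypothetical call against the page above might look like this (the XPath reuses one from the main program):

scroll_to_element(chrome_driver, "//div[@class='l-main']//ul[@class='prjList']")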

javascript + Selenium WebDriver cannot load list of followers in Instagram

I am learning JavaScript, Node.js, and Selenium WebDriver.
As part of my education process I am developing a simple bot for Instagram.
To emulate the browser I use the Chrome web driver.
I faced a problem when trying to get the list of followers and the number of followers for an account.
This code opens the Instagram page, enters credentials, goes to some account, and opens the followers for this account.
Data like the username and password I take from settings.json.
var webdriver = require('selenium-webdriver'),
    by = webdriver.By,
    Promise = require('promise'),
    settings = require('./settings.json');

var browser = new webdriver
    .Builder()
    .withCapabilities(webdriver.Capabilities.chrome())
    .build();

browser.manage().window().setSize(1024, 700);
browser.get('https://www.instagram.com/accounts/login/');
browser.sleep(settings.sleep_delay);
browser.findElement(by.name('username')).sendKeys(settings.instagram_account_username);
browser.findElement(by.name('password')).sendKeys(settings.instagram_account_password);
browser.findElement(by.xpath('//button')).click();
browser.sleep(settings.sleep_delay);
browser.get('https://www.instagram.com/SomeAccountHere/');
browser.sleep(settings.sleep_delay);
browser.findElement(by.partialLinkText('followers')).click();
This part should open all followers, but it is not working:
var FollowersAll = browser.findElement(by.className('_4zhc5 notranslate _j7lfh'));
I also tried by XPath:
var FollowersAll = browser.findElement(by.xpath('/html/body/div[2]/div/div[2]/div/div[2]/ul/li[3]/div/div[1]/div/div[1]/a'));
When I run the following in the browser's console, it works fine:
var i = document.getElementsByClassName('_4zhc5 notranslate _j7lfh');
I ran the code in debug mode (using WebStorm) and it shows in each case that the variable FollowersAll is undefined.
The same happens when I try to check the number of followers for the account.
Thanks in advance.
[Screenshot: example of the selected element]
In the DOM, the same class names may be used multiple times, in which case findElement by className won't work.
The XPath should be relative, not absolute.
Try an XPath with a unique HTML attribute, for example:
//div[@id='value']
In the Chrome browser, open Developer Tools (press F12). Once you have framed an XPath, press Ctrl+F and paste that XPath in. If it states 1 of 1, then you can surely use that XPath.
If it states 1 of many, then you need to dig deeper to get an exact XPath.
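For the compound class from the question above, a minimal sketch of the same idea using the Python bindings (the Node API is analogous; the selectors beyond the question's own class names are assumptions):

# By.className expects a single class name, so a compound value like
# "_4zhc5 notranslate _j7lfh" has to be written as a CSS selector instead:
followers = driver.find_elements_by_css_selector("a._4zhc5.notranslate._j7lfh")

# Or use a relative XPath keyed on a more stable attribute than the
# auto-generated class names (the href fragment here is an assumption):
followers = driver.find_elements_by_xpath("//a[contains(@href, '/followers')]")
print(len(followers))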

Infinite Scroll on Quora using Selenium in Python and Javascript

I am trying to handle the "infinite scrolling" on the Quora website.
I use the selenium lib with Python; after trying the send_keys method, I tried running a JavaScript command to scroll down the page.
It doesn't work when I run it from the code, but if I run it in the Firefox console it works.
How can I fix this problem? And is it possible to use PhantomJS?
def scrapying(self):
    print platform.system()
    browser = webdriver.Firefox()
    #browser = webdriver.PhantomJS(executable_path='/usr/local/bin/node_modules/phantomjs/lib/phantom/bin/phantomjs')
    browser.get("https://www.quora.com/C-programming-language")
    #browser.get("https://answers.yahoo.com/dir/index?sid=396545660")
    time.sleep(10)
    #elem = browser.find_element_by_class_name("topic_page content contents main_content fixed_header ContentWrapper")
    no_of_pagedowns = 500
    while no_of_pagedowns:
        #elem.send_keys(Keys.SPACE)
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(0.5)
        no_of_pagedowns -= 1
    browser.quit()

myClassObject = getFrom()
myClassObject.scrapying()
One of the options would be to recursively scroll into view of the last loaded post on a page:
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.quora.com/C-programming-language")

NUM_POSTS = 200
posts = driver.find_elements_by_css_selector("div.pagedlist_item")
while len(posts) < NUM_POSTS:
    driver.execute_script("arguments[0].scrollIntoView();", posts[-1])
    posts = driver.find_elements_by_css_selector("div.pagedlist_item")
    print(len(posts))
It will scroll the page down until at least NUM_POSTS posts are loaded.
I'm also not able to get the infinite scroll to trigger while using Firefox. The gist of the code works in the console, however:
for i in range(0, 5):
    self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
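As an alternative sketch (an untested assumption that Quora's handler reacts to key events), you can send END keystrokes to the body instead of calling window.scrollTo:

from selenium.webdriver.common.keys import Keys
import time

body = browser.find_element_by_tag_name("body")
for _ in range(5):
    body.send_keys(Keys.END)  # jump to the bottom; fires the key events some scroll handlers listen for
    time.sleep(3)             # give new posts time to load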

Recursively iterate over multiple web pages and scrape using selenium

This is a follow up question to the query which I had about scraping web pages.
My earlier question: Pin down exact content location in html for web scraping urllib2 Beautiful Soup
This question is about doing the same thing, but recursively over multiple pages/views.
Here is my code
from selenium.webdriver.firefox import webdriver

driver = webdriver.WebDriver()
driver.get('http://www.walmart.com/ip/29701960?page=seeAllReviews')

for review in driver.find_elements_by_class_name('BVRRReviewDisplayStyle3Main'):
    title = review.find_element_by_class_name('BVRRReviewTitle').text
    rating = review.find_element_by_xpath('.//div[@class="BVRRRatingNormalImage"]//img').get_attribute('title')
    print title, rating
From the URL, you'll see that nothing changes when we navigate to the second page, otherwise it wouldn't have been an issue. In this case, the next-page button calls JavaScript from the server. Is there a way we can still scrape this using Selenium in Python with just a slight modification of the code presented above? Please let me know if there is.
Thanks.
Just click Next after reading each page:
from selenium.webdriver.firefox import webdriver

driver = webdriver.WebDriver()
driver.get('http://www.walmart.com/ip/29701960?page=seeAllReviews')

while True:
    for review in driver.find_elements_by_class_name('BVRRReviewDisplayStyle3Main'):
        title = review.find_element_by_class_name('BVRRReviewTitle').text
        rating = review.find_element_by_xpath('.//div[@class="BVRRRatingNormalImage"]//img').get_attribute('title')
        print title, rating
    try:
        driver.find_element_by_link_text('Next').click()
    except:
        break

driver.quit()
Or if you want to limit the number of pages that you are reading:
from selenium.webdriver.firefox import webdriver

driver = webdriver.WebDriver()
driver.get('http://www.walmart.com/ip/29701960?page=seeAllReviews')

maxNumOfPages = 10  # for example
for pageId in range(2, maxNumOfPages + 2):
    for review in driver.find_elements_by_class_name('BVRRReviewDisplayStyle3Main'):
        title = review.find_element_by_class_name('BVRRReviewTitle').text
        rating = review.find_element_by_xpath('.//div[@class="BVRRRatingNormalImage"]//img').get_attribute('title')
        print title, rating
    try:
        driver.find_element_by_link_text(str(pageId)).click()
    except:
        break

driver.quit()
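Since the Next click is handled by JavaScript and replaces the reviews in place, an explicit wait for the old content to go stale can help avoid re-reading the previous page. A minimal sketch on top of the loop above:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Remember the first review, click Next, then wait until that element
# is detached from the DOM before reading the new page:
old_first = driver.find_elements_by_class_name('BVRRReviewDisplayStyle3Main')[0]
driver.find_element_by_link_text('Next').click()
WebDriverWait(driver, 10).until(EC.staleness_of(old_first))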
I think this would work. Although the python might be a little off, this should give you a starting point:
keep_going = True  # note: 'continue' is a reserved keyword in Python, so it can't be a variable name
while keep_going:
    try:
        for review in driver.find_elements_by_class_name('BVRRReviewDisplayStyle3Main'):
            title = review.find_element_by_class_name('BVRRReviewTitle').text
            rating = review.find_element_by_xpath('.//div[@class="BVRRRatingNormalImage"]//img').get_attribute('title')
            print title, rating
        driver.find_element_by_name('BV_TrackingTag_Review_Display_NextPage').click()
    except:
        print "Done!"
        keep_going = False

Get url from window.location with selenium

Does anybody know how I can get the URL set by a JavaScript window.location.href="url"; using the SeleniumHQ WebDriver in Java?
Imagine a flow like this:
Link Page > Page 2 > Page 3 > Final Page
"Link Page" has the link, and Selenium clicks the link element with something like this:
webElement.click();
Page 2 executes window.location.href="Page 3", and then Page 3
sends a redirect to the Final Page.
Is it possible to get the URL of Page 3, or even the navigation history?
The easiest way in selenium webdriver (in Java):
String browserUrl = driver.getCurrentUrl();
You can also make javascript calls in selenium:
JavascriptExecutor js = (JavascriptExecutor) driver;
String browserUrl = (String) js.executeScript("return window.top.location.href.toString()");
System.out.println("Your browser URL is " + browserUrl);
With the older Selenium RC API:
String url = selenium.getLocation();
System.out.println(url);
So far I cannot think of any other way than using a proxy that records all requests made by the browser. It is possible to set up such a proxy and control it from your code. You didn't specify the language you use for writing your tests: if it's Java, then Browsermob may be helpful; if it's C#, take a look at FiddlerCore.
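For reference, a minimal sketch of the proxy approach using the browsermob-proxy Python bindings (the Java API is analogous; the binary path and URLs are placeholders):

from browsermobproxy import Server
from selenium import webdriver

server = Server("/path/to/browsermob-proxy")  # placeholder path to the proxy binary
server.start()
proxy = server.create_proxy()

# Route the browser's traffic through the proxy
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)

proxy.new_har("redirect-chain")
driver.get("http://example.com/link-page")  # placeholder URL

# The HAR now contains every request the browser made, including the
# intermediate window.location.href redirect to Page 3:
for entry in proxy.har['log']['entries']:
    print(entry['request']['url'])

driver.quit()
server.stop()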
