Python Selenium: Iteration Error

I'm trying to download all the xml files from a webpage. The process requires locating the xml download links one after the other; once such a link is clicked, it leads to a form that must be submitted for the download to begin. The issue lies in the iteration of the loop: once the first file has downloaded, I receive an error:
"selenium.common.exceptions.StaleElementReferenceException: Message: The element reference is stale: either the element is no longer attached to the DOM or the page has been refreshed"
The "97081 data-extension xml" is the 2nd downloadable file in the iteration. I've attached the code below; any suggestions to rectify this would be much appreciated.
import os
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", r"F:\Projects\Poli_Map\DatG_Py_Dat")  # raw string so backslashes survive
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/xml")

driver = webdriver.Firefox(firefox_profile=fp)
driver.get('https://data.gov.in/catalog/variety-wise-daily-market-prices-data-cauliflower')
wait = WebDriverWait(driver, 10)

class FormPage(object):
    def fill_form(self, data):
        driver.execute_script("document.getElementById('edit-download-reasons-non-commercial').click()")
        driver.execute_script("document.getElementById('edit-reasons-d-rd').click()")
        driver.find_element_by_xpath('//input[@name="name_d"]').send_keys(data['name_d'])
        driver.find_element_by_xpath('//input[@name="mail_d"]').send_keys(data['mail_d'])
        return self

    def submit(self):
        driver.execute_script("document.getElementById('edit-submit').click()")

data = {
    'name_d': 'xyz',
    'mail_d': 'xyz@outlook.com',
}

allelements = driver.find_elements_by_xpath("//a[text()='xml']")
for element in allelements:
    element.click()
    time.sleep(5)
    FormPage().fill_form(data).submit()
    time.sleep(5)
    window_before = driver.window_handles[0]
    driver.switch_to.window(window_before)
    driver.back()
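For what it's worth, a StaleElementReferenceException like this generally means element references collected before a click are invalidated once the page navigates. A minimal sketch of the usual remedy (my sketch, not part of the original post): re-locate the links by index on every pass instead of holding references across navigations:

count = len(driver.find_elements_by_xpath("//a[text()='xml']"))
for i in range(count):
    # Re-find the list each time so no reference outlives a navigation
    links = driver.find_elements_by_xpath("//a[text()='xml']")
    links[i].click()
    # ... fill and submit the form, then driver.back() as above ...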

I found a workaround for you; no need to submit any fields.
You need to get the node ID from the class attribute of the download element (here, for instance, it's 962721).
Then use this URL, like so:
https://data.gov.in/node/962721/download
This was found just by doing a bit of reverse-engineering. When you do web scraping, always have a look at the .js files and at the networking tab to see all the requests being made.
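A minimal sketch of that approach (my sketch; the node-ID-in-class pattern is an assumption based on the page structure described above):

import re
import requests

# Hypothetical: each download element carries its node ID in a class
# attribute such as "node-962721".
node_ids = set(re.findall(r'node-(\d+)', driver.page_source))
for node_id in node_ids:
    url = 'https://data.gov.in/node/{}/download'.format(node_id)
    response = requests.get(url)
    with open('{}.xml'.format(node_id), 'wb') as f:
        f.write(response.content)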

Related

How to "hit Enter" using driver.execute_script?

I'm trying to make an auto-login Twitter bot. But when I try to send_keys to the password field, I can't (only the password field fails; the same send_keys code for the username and phone number fields works).
Error message: "selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable".
So I tried to use execute_script instead.
driver.execute_script("arguments[0].value=arguments[1];", password_input, TWITTER_PASSWORD)
The line above works too. But I don't know how to send the Enter key to arguments[0].
driver.execute_script("arguments[0].submit();", password_input)
I tried this but it doesn't work. (Forgive me if this line is completely wrong; it looks like this method takes JS code, and I don't know JS.)
Please help me with this. Any help with this problem, or with my code in general, would be appreciated.
""" If you want to test the code, remember to give values to TWITTER_EMAIL, TWITTER_PHONE_NUMBER and TWITTER_PASSWORD. """
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
TWITTER_EMAIL = ""         # fill in before running
TWITTER_PHONE_NUMBER = ""  # fill in before running
TWITTER_PASSWORD = ""      # fill in before running
URL = r"https://twitter.com/i/flow/login"
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s)
driver.get(url=URL)
def twitter_auto_login():
    time.sleep(10)
    username_input = driver.find_element(By.CSS_SELECTOR, "input")
    username_input.send_keys(TWITTER_EMAIL)
    time.sleep(2)
    username_input.send_keys(Keys.ENTER)
    time.sleep(5)
    phone_num_input = driver.find_element(By.CSS_SELECTOR, "input")
    phone_num_input.send_keys(TWITTER_PHONE_NUMBER)
    time.sleep(2)
    phone_num_input.send_keys(Keys.ENTER)
    time.sleep(5)
    password_input = driver.find_element(By.CSS_SELECTOR, "input")
    # driver.execute_script("arguments[0].click();", password_input)
    driver.execute_script("arguments[0].value=arguments[1];", password_input, TWITTER_PASSWORD)
    # https://stackoverflow.com/questions/52273298/what-is-arguments0-while-invoking-execute-script-method-through-webdriver-in
    time.sleep(2)
    driver.execute_script("arguments[0].submit();", password_input)

twitter_auto_login()
UPDATE: I found exactly what my mistake was: by the third input (the password field), the username input tag is present in the DOM again but unreachable, so find_element matches it first. Using find_elements and selecting the second element of that list gets the right password input tag. But the solution from wado below is better; just use it in case you face the same problem.
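A quick sketch of the fix described in this update (assuming the password screen exposes two input tags, with the leftover username input first):

inputs = driver.find_elements(By.CSS_SELECTOR, "input")
password_input = inputs[1]  # index 0 is the leftover username input
password_input.send_keys(TWITTER_PASSWORD)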
No need to use driver.execute_script. In your case you're just locating the elements the wrong way. You should do something like this:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
TWITTER_EMAIL = "email#email.com"
TWITTER_PASSWORD = "lalala"
URL = r"https://twitter.com/i/flow/login"
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s)
driver.get(url=URL)
def twitter_auto_login():
    username_input = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//input[@autocomplete='username']")))
    username_input.send_keys(TWITTER_EMAIL)
    username_input.send_keys(Keys.ENTER)
    password_input = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//input[@autocomplete='current-password']")))
    password_input.send_keys(TWITTER_PASSWORD)
    password_input.send_keys(Keys.ENTER)

twitter_auto_login()
Note that I used explicit waits, which are far better than hard-coded time.sleep() calls that waste time needlessly.
Via jQuery, you can use the following JavaScript to simulate the Enter keypress (note that jQuery has to finish loading before the event is triggered):
driver.execute_script("""
    var el = arguments[0];
    var script = document.createElement('script');
    script.src = 'https://code.jquery.com/jquery-3.6.0.min.js';
    script.onload = function () {
        $(el).trigger($.Event('keypress', { which: 13 }));
    };
    document.head.appendChild(script);
""", password_input)
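An alternative sketch without loading jQuery is to dispatch a native KeyboardEvent; note that synthetic events carry isTrusted=false, and whether the page's listeners react to them depends on the site:

driver.execute_script("""
    var el = arguments[0];
    el.dispatchEvent(new KeyboardEvent('keydown',
        { key: 'Enter', code: 'Enter', keyCode: 13, bubbles: true }));
""", password_input)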
It sounds like you want to submit whatever form that input is part of:
driver.execute_script("arguments[0].closest('form').submit();", password_input)

JS/AJAX Content Not loading when URL is accessed using Selenium(Python)

I'm trying to scrape this URL: https://www.wheel-size.com/size/acura/mdx/2001/
The values that I want to scrape are loaded dynamically, e.g. Center Bore.
If you open the link in a normal browser the content loads just fine, but if I use Selenium (chromedriver) it just keeps loading and the values are never displayed.
Any idea how I can scrape it?
Below is a picture of how it looks. You can also see the loading spinner for 1-2 seconds when you open the link in a normal browser.
To extract the desired texts, e.g. 64.1 mm, 5x114.3, etc., as the elements are Google Tag Manager enabled elements, you need to induce WebDriverWait for visibility_of_element_located(), and you can use the following locator strategies:
options = Options()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get('https://www.wheel-size.com/size/acura/mdx/2001/')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(., 'Center Bore')]//following::span[1]"))).text)
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(., 'PCD')]//following::span[1]"))).text)
Console Output:
64.1 mm
5x114.3
Note: You have to add the following imports:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python

How to scrape links that do not have an href and are not available in the page source

I'm trying to use the Selenium web driver to extract data, and I see that one of the links I want to click does not have an href. The HTML tags I see in Inspect Element are also not available in the page source. I badly want the link to be clicked so I can proceed to the next page.
The anchor tag I see during inspect is as below, and it seems to involve AngularJS:
<a id="docs" ng-click="changeFragment('deal.docs')">
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get('URL here');
time.sleep(5) # Let the user actually see something!
username = driver.find_element_by_name('USERID')
username.send_keys('12345')
password = driver.find_element_by_name('PASSWORD')
password.send_keys('password')
#search_box.submit()
driver.find_element_by_id("submitInput").submit()
time.sleep(5) # Let the user actually see something!
lnum = driver.find_element_by_name('Number')
lnum.send_keys('0589403823')
driver.find_element_by_name('includeInactiveCheckBox').click()
driver.find_element_by_id("searchButton").click()
time.sleep(5)
driver.execute_script("changeFragment('deal.docs')")
driver.quit()
I tried to use find element by XPath and by script, but neither worked.
The URL I'm trying to access can't be shared, as it can be accessed only through a specific network.
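Since the anchor has an id, one approach worth trying (a sketch, not a confirmed fix for this site) is to wait for the element to be clickable and click it directly, which normally fires the ng-click handler even without an href:

# Wait until the AngularJS link is clickable, then click it
docs_link = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.ID, "docs"))
)
docs_link.click()
# If a normal click is intercepted, a JavaScript click on the element
# itself (rather than calling changeFragment directly) can help:
driver.execute_script("arguments[0].click();", docs_link)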

Finding a javascript element by id

I am trying to download a csv file from a website using Selenium, but I am failing at the last step.
I fail at selecting the format of the file and then clicking on export. Does someone have any idea how to do it? There's a free registration process to be able to connect to the website; you would have to register with your email address to try. I have attached a picture with the part I struggle to automate circled in red. Below is the working code up until the last step I would like to complete.
Thank you very much for your help!
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
driver = webdriver.PhantomJS()
driver.get("http://www.sem-o.com/MarketData/pages /default.aspx?ReturnUrl=%2fMarketData%2fPages%2fDynamicReports.aspx")
#log-in
##############
elem = driver.find_element_by_name("ctl00$PlaceHolderMain$FBALoginId$username")
elem.clear()
elem.send_keys("EMAIL")
elem.send_keys(Keys.RETURN)
elem = driver.find_element_by_id("ctl00_PlaceHolderMain_FBALoginId_password")
elem.clear()
elem.send_keys("PASSWORD")
#elem.send_keys(Keys.RETURN)
elem = driver.find_element_by_id(r"ctl00_PlaceHolderMain_FBALoginId_btnLogin")
elem.click()
##############
#retrieve files of interest
##############
elem = Select(driver.find_element_by_id("ctl00_ctl00_g_f5e6fa98_faa2_4210_85e9_780934d96ab8_cmbReportGroup"))
elem.select_by_visible_text('Forecast Data')
elem = Select(driver.find_element_by_id("ctl00_ctl00_g_f5e6fa98_faa2_4210_85e9_780934d96ab8_cmbSelectReport"))
elem.select_by_visible_text("Four Day Load Forecast")
elem = driver.find_element_by_id(r"ctl00_ctl00_g_9ab92c0a_eb10_4b6c_ad1b_7277cbdab462_btnGenerateLocalReport")
elem.click()
elem = driver.find_element_by_id(r"ctl00_ctl00_g_9ab92c0a_eb10_4b6c_ad1b_7277cbdab462_prm_GetFromDate_prm_GetFromDateDate")
elem.clear()
elem.send_keys("01/01/2017")
elem = driver.find_element_by_id(r"ctl00_ctl00_g_9ab92c0a_eb10_4b6c_ad1b_7277cbdab462_prm_GetToDate_prm_GetToDateDate")
elem.clear()
elem.send_keys("15/01/2017")
elem = driver.find_element_by_id(r"ctl00_ctl00_g_9ab92c0a_eb10_4b6c_ad1b_7277cbdab46 2_btnGenerateLocalReport")
elem.click()
What is the error you are getting?
Which step is it failing at? Is it the actual click on export? If so, check the way you are identifying the element in the Chrome dev console. Press F12 to open the dev tools, then in the console type:
$$('#ctl00_ctl00_g_9ab92c0a_eb10_4b6c_ad1b_7277cbdab46.2_btnGenerateLocalReport')
This will show all elements with that selected id. If it's an empty array, then the selector is invalid.
You may have to experiment with the id selector, as that space looks a bit odd. More information is needed on the error you are getting and why it's failing in order to provide a better answer.
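The same check can also be done from the Selenium side (a sketch: find_elements returns an empty list instead of raising, so it is safe for probing selectors):

matches = driver.find_elements_by_css_selector(
    "#ctl00_ctl00_g_9ab92c0a_eb10_4b6c_ad1b_7277cbdab462_btnGenerateLocalReport")
print(len(matches))  # 0 means the selector matches nothing on the page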

Scraping elements rendered using React JS with BeautifulSoup

I want to scrape anchor links with class="_1UoZlX" from the search results from this particular page - https://www.flipkart.com/search?as=on&as-pos=1_1_ic_sam&as-show=on&otracker=start&page=6&q=samsung+mobiles&sid=tyy%2F4io
When I created a soup from the page I realised that the search results are being rendered using React JS and hence I can't find them in the page source (or in the soup).
Here's my code
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
listUrls = ['https://www.flipkart.com/search?as=on&as-pos=1_1_ic_sam&as-show=on&otracker=start&page=6&q=samsung+mobiles&sid=tyy%2F4iof']
PHANTOMJS_PATH = './phantomjs'
browser = webdriver.PhantomJS(PHANTOMJS_PATH)
urls=[]
for url in listUrls:
    browser.get(url)
    wait = WebDriverWait(browser, 20)
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "_1UoZlX")))
    soup = BeautifulSoup(browser.page_source, "html.parser")
    results = soup.findAll('a', {'class': "_1UoZlX"})
    for result in results:
        link = result["href"]
        print link
        urls.append(link)
print urls
This is the error I'm getting.
Traceback (most recent call last):
File "fetch_urls.py", line 19, in <module>
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "_1UoZlX")))
File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/support/wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Screenshot: available via screen
Someone mentioned in this answer that there is a way to use Selenium to process the JavaScript on a page. Can someone elaborate on that? I did some googling but couldn't find an approach that works for this particular case.
There is no problem with your code, but the website you are scraping keeps loading indefinitely, which prevents the page from being parsed and the rest of your code from running.
I tried with Wikipedia to confirm the same:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
listUrls = ["https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"]
# browser = webdriver.PhantomJS('/usr/local/bin/phantomjs')
browser = webdriver.Chrome("./chromedriver")
urls=[]
for url in listUrls:
    browser.get(url)
    soup = BeautifulSoup(browser.page_source, "html.parser")
    results = soup.findAll('a', {'class': "mw-redirect"})
    for result in results:
        link = result["href"]
        urls.append(link)
print urls
Outputs:
[u'/wiki/List_of_states_and_territories_of_India_by_area', u'/wiki/List_of_Indian_states_by_GDP_per_capita', u'/wiki/Constitutional_republic', u'/wiki/States_and_territories_of_India', u'/wiki/National_Capital_Territory_of_Delhi', u'/wiki/States_Reorganisation_Act', u'/wiki/High_Courts_of_India', u'/wiki/Delhi_NCT', u'/wiki/Bengaluru', u'/wiki/Madras', u'/wiki/Andhra_Pradesh_Capital_City', u'/wiki/States_and_territories_of_India', u'/wiki/Jammu_(city)']
P.S. I'm using ChromeDriver in order to run the script against the real Chrome browser for debugging purposes. Download it from https://chromedriver.storage.googleapis.com/index.html?path=2.27/
Selenium will render the page, including the JavaScript, and your code is working properly: it waits for the element to be generated. In this case Selenium didn't get that CSS element because the URL you gave does not render the result page. Instead, it generates the following error page:
http://imgur.com/a/YwFyE
This page does not have the CSS class, so your code keeps waiting for that particular CSS element. Try the Firefox web driver to see what is happening.
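One more thing worth checking (an observation of mine, not from the answers above): with By.CSS_SELECTOR the class name needs a leading dot; without it the wait looks for a <_1UoZlX> tag rather than the class, and it times out even when the results render:

wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "._1UoZlX")))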
