Scraping Javascript using Selenium via Python - javascript

I'm trying to scrape JavaScript-generated data from a site. I've given myself the challenge of scraping the number of Followers from this website. Here's my code so far:
import os
from selenium import webdriver
import time
chromedriver = "/Users/INSERT USERNAME/Desktop/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
driver.get("http://freelegalconsultancy.blogspot.co.uk/")
time.sleep(5)
title = driver.find_element_by_class_name
print title
As you can see, I've got a chromedriver file located on my desktop. When I execute the code, I get the following result:
<bound method WebDriver.find_element_by_class_name of <selenium.webdriver.chrome.webdriver.WebDriver (session="dd9e5d3f429bc2810c30ebe7067e4e22")>>
I tried iterating into this with a for loop but it returned an error. Does anyone know how I can get the Javascript data and ultimately get the number of followers?
EDIT:
So as per request, I have changed my code to this:
import os
from selenium import webdriver
import time
chromedriver = "/Users/INSERT USERNAME/Desktop/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
driver.get("http://freelegalconsultancy.blogspot.co.uk/")
time.sleep(5)
title = driver.find_element_by_class_name("member-title")
print title
But I now get this error:
Traceback (most recent call last):
File "C:\Users\INSERT USERNAME\Desktop\blogger_v.1.py", line 11, in <module>
title = driver.find_element_by_class_name("member-title")
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 413, in find_element_by_class_name
return self.find_element(by=By.CLASS_NAME, value=name)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 752, in find_element
'value': value})['value']
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 236, in execute
self.error_handler.check_response(response)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 192, in check_response
raise exception_class(message, screen, stacktrace)
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"member-title"}
(Session info: chrome=53.0.2785.143)
(Driver info: chromedriver=2.24.417431 (9aea000394714d2fbb20850021f6204f2256b9cf),platform=Windows NT 6.1.7601 SP1 x86_64)
Any ideas on how I can get around it?
EDIT:
So I've changed my code to this:
import os
from selenium import webdriver
import time
chromedriver = "/Users/INSERT USERNAME/Desktop/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
driver.get("http://freelegalconsultancy.blogspot.co.uk/")
time.sleep(5)
title = driver.find_element_by_class_name("item-title")
print title
And I get this result:
<selenium.webdriver.remote.webelement.WebElement (session="5fe8fb966edd26fdf808da07f99d4109", element="0.9924860218635834-1")>
How would I go about just printing all the javascript? Is this even possible?

You need to provide the class name you're looking for as a parameter.
title = driver.find_element_by_class_name("TheNameOfTheClass")
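Once the class name is passed in, the call returns a WebElement object, which is why the last edit in the question prints `<selenium...WebElement ...>` instead of page content. The following is a minimal sketch of reading the rendered text instead. It assumes Selenium's generic `find_elements(by, value)` call, where the string "class name" is the value behind `By.CLASS_NAME`; the helper name `texts_by_class` is made up for illustration.

```python
def texts_by_class(driver, class_name):
    """Return the visible text of every element that has the given class.

    `driver` is any Selenium WebDriver. "class name" is the locator
    strategy string that By.CLASS_NAME stands for.
    """
    elements = driver.find_elements("class name", class_name)
    # A WebElement is only a handle; .text holds what the browser rendered.
    return [element.text for element in elements]


# Usage with a real browser (assumes chromedriver can be found):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("http://freelegalconsultancy.blogspot.co.uk/")
#   print(texts_by_class(driver, "item-title"))
#   print(driver.page_source)  # the page's current HTML as the browser has it
#   driver.quit()
```

Printing `driver.page_source` is probably the closest thing to "printing all the JavaScript output": it should give the HTML of the page as it currently stands in the browser, not the script source itself.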

Related

Can't narrow down correct element in Python/Selenium

So I'm trying to craft a website manipulation script to help automate the creation of email mailboxes on our hosted provider.
I'm new to both Python and scripting web resources, so if something looks weird or mediocre, that's why :)
Here's my script:
import time
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
from selenium import webdriver
from selenium.webdriver.support.select import Select as driverselect
driver = webdriver.Firefox()
main_url = 'https://website.com:446'
opts = Options()
# noinspection PyDeprecation
# opts.set_headless()
#assert opts.headless # Operating in headless mode
browser = Firefox(options=opts)
browser.get(main_url)
search_form = browser.find_element_by_id('LoginName')
search_form.send_keys('username')
search_form = browser.find_element_by_id('Password')
search_form.send_keys('password')
search_form.submit()
time.sleep(5)
# provision = driverselect(driver.find_element_by_xpath("/html/body/div[2]/div[2]/nav/div/ul/li[4]"))
provision = driver.find_element_by_xpath('/html/body/div[2]/div[2]/nav/div/ul/li[4]/a/span[1]')
provision.submit()
# exchange = driver.find_element_by_name('Exchange')
# exchange.submit()
My error is:
Traceback (most recent call last):
File "/home/turd/PycharmProjects/Automate_NSGEmail/selenium_test.py", line 23, in <module>
provision = driver.find_element_by_xpath('/html/body/div[2]/div[2]/nav/div/ul/li[4]/a/span[1]')
File "/home/turd/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 394, in find_element_by_xpath
return self.find_element(by=By.XPATH, value=xpath)
File "/home/turd/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 976, in find_element
return self.execute(Command.FIND_ELEMENT, {
File "/home/turd/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/home/turd/.local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: /html/body/div[2]/div[2]/nav/div/ul/li[4]/a/span[1]
Now that Xpath value I copied straight from the dev tools on that page, here's what this block of code looks like from the site:
I'm trying to grab and 'click' on the one Active Dynamic-Menu item in the pic above. I think that menu is JS but I'm not 100% positive.
Anyway I'd be much obliged if anyone could help me narrow this down and grab that blasted element.
So I discovered the answer myself.. I had some wrong code at the beginning of my script:
driver = webdriver.Firefox()
main_url = 'https://website.com:446'
opts = Options()
# noinspection PyDeprecation
# opts.set_headless()
#assert opts.headless # Operating in headless mode
browser = Firefox(options=opts)
browser.get(main_url)
I changed this section to:
driver = webdriver.Firefox()
url = 'https://website.com:446'
opts = Options()
driver.maximize_window()
driver.get(url)
I was opening two instances of Firefox before, so the driver.* lines were attempting to locate the XPath on the Firefox instance that was not logged in.
Derp.
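The fix comes down to using one WebDriver object for every step, so the element lookup runs in the session that is actually logged in. Below is a minimal sketch of that idea, assuming the element ids and the XPath from the question. The function name `login_and_open_menu` and its parameters are hypothetical. It also calls `click()` on the menu item where the question used `submit()`, on the assumption that the menu entry is not part of a form.

```python
import time


def login_and_open_menu(driver, url, username, password, menu_xpath,
                        wait_seconds=5):
    """Log in and click a menu item, using one and the same driver throughout."""
    driver.get(url)
    driver.find_element("id", "LoginName").send_keys(username)
    password_field = driver.find_element("id", "Password")
    password_field.send_keys(password)
    password_field.submit()  # submits the form that contains this field
    # Crude pause for the post-login page, as in the question's time.sleep(5).
    time.sleep(wait_seconds)
    # Same driver, same logged-in session, so the menu can be located here.
    driver.find_element("xpath", menu_xpath).click()


# Usage (assumes geckodriver can be found):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   login_and_open_menu(driver, "https://website.com:446", "username", "password",
#                       "/html/body/div[2]/div[2]/nav/div/ul/li[4]/a/span[1]")
```

Passing the driver in as a parameter makes it harder to end up with a second browser window by accident, since the function never creates one itself.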

'chromedriver' executable needs to be in PATH but it's already there

I want to send a message to this website with Python.
That is, I want to do the following, but with Python:
That's why I tried the following script with Selenium:
api_location = 'http://iphoneapp.spareroom.co.uk'
api_search_endpoint = 'flatshares'
api_details_endpoint = 'flatshares'
location = 'http://www.spareroom.co.uk'
details_endpoint = 'flatshare/flatshare_detail.pl?flatshare_id='
def contact_room(self, room_id):
    url = '{location}/{endpoint}/{id}?format=json'.format(location=self.api_location, endpoint=self.api_details_endpoint, id=room_id)
    from selenium import webdriver
    driver = webdriver.Chrome()
    # Go to your page url
    driver.get(url)
    # Get button you are going to click by its id ( also you could use find_element_by_css_selector to get element by css selector)
    button_element = driver.find_element_by_id('button id')
    button_element.click()
But it returns:
C:\Users\antoi\Documents\Programming\projects\roomfinder>python test_message.py
Traceback (most recent call last):
File "C:\Python36\lib\site-packages\selenium\webdriver\common\service.py", line 76, in start
stdin=PIPE)
File "C:\Python36\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\Python36\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test_message.py", line 21, in <module>
contact_room(13829371)
File "test_message.py", line 14, in contact_room
driver = webdriver.Chrome() # Optional argument, if not specified will search path.
File "C:\Python36\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
self.service.start()
File "C:\Python36\lib\site-packages\selenium\webdriver\common\service.py", line 83, in start
os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
But I have already added it to the PATH:
I am a JavaScript learner. If you have tips and time to show how to do this in JavaScript as well, I am always happy to learn :)
The chromedriver executable needs to be in the same location as your Python script, or you need to pass its path to the driver:
driver_path = r'Path\to\your\Driver'  # raw string, so that \t is not read as a tab
driver = webdriver.Chrome(executable_path = driver_path)
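Before starting the browser, it can help to check what the Python process itself can see, since a PATH change made in the system settings is normally only picked up by terminals opened afterwards. Here is a minimal sketch using only the standard library; the helper name `locate_driver` is an assumption, not part of Selenium.

```python
import os
import shutil


def locate_driver(name="chromedriver", explicit_path=None):
    """Return a usable path to the driver executable, or None.

    An explicit path wins if the file exists; otherwise fall back to
    searching the PATH seen by this Python process.
    """
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    return shutil.which(name)


# Usage:
#   path = locate_driver()
#   if path is None:
#       print("chromedriver is not visible on this process's PATH")
#   else:
#       print("chromedriver found at", path)
```

If this prints a path, `webdriver.Chrome()` should be able to find the driver too. If it prints nothing useful, passing the location explicitly as in the answer above avoids relying on PATH at all.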
Why are you using webdriver.Firefox() if you talk about Chrome?

Unable to execute onClick javascript selenium - python

I'm trying to scrape some data from TripAdvisor and using Selenium with Python binding to get it done.
The review objects in the webpage sometimes have a 'More' button at the bottom that displays the full review content when clicked. It is actually a span element with an onclick JS function attached to it.
What I want to achieve is to load the page, find the 'More' links and click them so that the web page then has fully loaded reviews shown before scraping operations begin.
So far, I've tried the following code with no luck. I can't seem to understand the errors shown in stack trace.
import os
import time
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("https://www.tripadvisor.ca/Attraction_Review-g304138-d317476-Reviews-Temple_of_the_Tooth_Sri_Dalada_Maligawa-Kandy_Central_Province.html#REVIEWS");
more = [];
more = driver.find_elements_by_class_name('moreLink')
print(len(more));
for x in range(0,len(more)):
    if more[x].is_displayed():
        more[x].click();
        print("clicked");
These are the error logs that I'm getting in the console.
3
Traceback (most recent call last):
File "C:\Users\**\workspace\ReviewScraper\src\scraper\test3.py", line 13, in <module>
more[x].click();
File "C:\Users\**\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 75, in click
self._execute(Command.CLICK_ELEMENT)
File "C:\Users\**\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 454, in _execute
return self._parent.execute(command, params)
File "C:\Users\**\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 201, in execute
self.error_handler.check_response(response)
File "C:\Users\**\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 102, in check_response
value = json.loads(value_json)
File "C:\Users\**\AppData\Local\Programs\Python\Python35-32\lib\json\__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "C:\Users\**\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\**\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Any help is highly appreciated.
I managed to get this done by reverting to Selenium 1.48.0 and by logging into TA before scraping the reviews, every time. Once logged in, you can click the 'More' button and extract the full reviews easily.

Read a page after another from a drop down menu - Can't find the drop down on 2nd page

I've got a page.
I want to visit every page associated with an option in the drop-down menu at the top of the page, in order to get its URL.
I'm new to Selenium, so I'm trying some preliminary steps:
Open the driver
Get it to the webpage
Select the drop-down menu
Select a "name" using an arbitrary value = 2
Get on the page and get the URL from it. Print it.
Select a "name" using an arbitrary value = 3
ERROR.
The code I use:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
import time
driver = webdriver.Firefox()
driver.get("http://www.hillsproducts.com/General.aspx/en-GB/PD/a-d-canine/original/can")
select = Select(driver.find_element_by_xpath("//select[@id='productSpecifier_product']"))
value="2"
select.select_by_value(value)
print(driver.current_url)
time.sleep(10)
value="3"
select.select_by_value(value)
print(driver.current_url)
There is something I don't get.
The error I get is the following:
Traceback (most recent call last):
File "/Users/Luigi/Desktop/selenium_attempt.py", line 19, in <module>
select.select_by_value(value)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/support/select.py", line 76, in select_by_value
opts = self._el.find_elements(By.CSS_SELECTOR, css)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/webelement.py", line 485, in find_elements
{"using": by, "value": value})['value']
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/webelement.py", line 447, in _execute
return self._parent.execute(command, params)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/webdriver.py", line 193, in execute
self.error_handler.check_response(response)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/errorhandler.py", line 181, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: Element not found in the cache - perhaps the page has changed since it was looked up
Stacktrace:
at fxdriver.cache.getElementAt (resource://fxdriver/modules/web-element-cache.js:9348)
at Utils.getElementAt (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpr37ozu9l/extensions/fxdriver@googlecode.com/components/driver-component.js:8942)
at FirefoxDriver.prototype.findElementsInternal_ (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpr37ozu9l/extensions/fxdriver@googlecode.com/components/driver-component.js:10685)
at FirefoxDriver.prototype.findChildElements (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpr37ozu9l/extensions/fxdriver@googlecode.com/components/driver-component.js:10706)
at DelayedCommand.prototype.executeInternal_/h (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpr37ozu9l/extensions/fxdriver@googlecode.com/components/command-processor.js:12643)
at DelayedCommand.prototype.executeInternal_ (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpr37ozu9l/extensions/fxdriver@googlecode.com/components/command-processor.js:12648)
at DelayedCommand.prototype.execute/< (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpr37ozu9l/extensions/fxdriver@googlecode.com/components/command-processor.js:12590)
Any idea would be appreciated !
UPDATE after Alex's answer :
Traceback (most recent call last):
File "/Users/Luigi/Desktop/selenium_attempt.py", line 18, in <module>
if index >= len(select.options):
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/support/select.py", line 46, in options
return self._el.find_elements(By.TAG_NAME, 'option')
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/webelement.py", line 485, in find_elements
{"using": by, "value": value})['value']
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/webelement.py", line 447, in _execute
return self._parent.execute(command, params)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/webdriver.py", line 193, in execute
self.error_handler.check_response(response)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/selenium-2.46.1-py3.4.egg/selenium/webdriver/remote/errorhandler.py", line 181, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: Element not found in the cache - perhaps the page has changed since it was looked up
Stacktrace:
at fxdriver.cache.getElementAt (resource://fxdriver/modules/web-element-cache.js:9348)
at Utils.getElementAt (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpzrilw39c/extensions/fxdriver@googlecode.com/components/driver-component.js:8942)
at FirefoxDriver.prototype.findElementsInternal_ (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpzrilw39c/extensions/fxdriver@googlecode.com/components/driver-component.js:10685)
at FirefoxDriver.prototype.findChildElements (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpzrilw39c/extensions/fxdriver@googlecode.com/components/driver-component.js:10706)
at DelayedCommand.prototype.executeInternal_/h (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpzrilw39c/extensions/fxdriver@googlecode.com/components/command-processor.js:12643)
at DelayedCommand.prototype.executeInternal_ (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpzrilw39c/extensions/fxdriver@googlecode.com/components/command-processor.js:12648)
at DelayedCommand.prototype.execute/< (file:///var/folders/8s/hl6bx6z91yq6r81hpqg995rw0000gn/T/tmpzrilw39c/extensions/fxdriver@googlecode.com/components/command-processor.js:12590)
You have to reinstantiate the Select() every time a new page is loaded:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
driver = webdriver.Firefox()
driver.get("http://www.hillsproducts.com/General.aspx/en-GB/PD/a-d-canine/original/can")
index = 0
while True:
    select = Select(driver.find_element_by_id("productSpecifier_product"))
    # exit the loop if all the options were seen
    if index >= len(select.options):
        break
    select.select_by_index(index)
    print(driver.current_url)
    index += 1

How should I use this onclick(javascript) in Python

This is my first time asking a question here, and I'm new to Python.
I installed mechanize and BeautifulSoup to fill in some forms on a page.
I use br.submit() to send the request, but it doesn't work!
Is there any way to call the onclick function (JavaScript)?
Here is the markup for the button that sends the data:
<div class="go_btm w_a1">
<p class="gogo">search</p>
<p class="gogo">cancel</p>
<br class="CLEAR" />
</div>
UPDATE:
Thank you for suggesting Selenium.
But I have another problem. My code is below:
for i in range(len(all_options)):
    arr.append(all_options[i])
count = 0
for option in arr:
    print("Value is: %s" % option.get_attribute("value"))
    if count > 1:
        option.click()
        string = u'search'
        link2 = browser.find_element_by_link_text(string.encode('utf8'))
        response = link2.click()
        browser.back()
    count = count + 1
After I go back to the same page, I get this error:
Traceback (most recent call last):
File "C:\Users\pc2\Desktop\TEST.py", line 44, in <module>
print("Value is: %s" % option.get_attribute("value"))
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webelement.py", line 93, in get_attribute
resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webelement.py", line 385, in _execute
return self._parent.execute(command, params)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 173, in execute
self.error_handler.check_response(response)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 166, in check_response
raise exception_class(message, screen, stacktrace)
StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: chrome=40.0.2214.111)
(Driver info: chromedriver=2.9.248315,platform=Windows NT 6.1 SP1 x86_64)
I can only click the select once.
Does that mean the options in my array have disappeared?
How can I keep the option variable usable so the next loop iteration can click it?
mechanize cannot handle javascript:
How do I use Mechanize to process JavaScript?
Instead, you can automate a real browser via selenium. Example:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('myurl')
link = driver.find_element_by_link_text('search')
link.click()
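As for the StaleElementReferenceException in the update: elements found before `browser.back()` belong to the previous page load, so they cannot be reused once the page has been loaded again. A minimal sketch of one way around this is to look the options up again on every iteration and pick the next one by index. It assumes the same locator finds the options each time; the function name `submit_each_option` and its parameters are hypothetical.

```python
def submit_each_option(browser, option_css, link_text, start=0):
    """Click each option in turn, re-locating the options after every reload.

    Returns the list of option values that were visited.
    """
    index = start
    visited = []
    while True:
        # Look the options up again: references from before back() are stale.
        options = browser.find_elements("css selector", option_css)
        if index >= len(options):
            break
        option = options[index]
        visited.append(option.get_attribute("value"))
        option.click()
        browser.find_element("link text", link_text).click()
        browser.back()
        index += 1
    return visited


# Usage (the CSS selector is a placeholder for the real drop-down):
#   submit_each_option(browser, "select option", u"search", start=2)
```

Here `start=2` mirrors the `count > 1` check in the question, which skips the first two options.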
