I am trying to execute a JavaScript function (run with a button click) within a session using Python's requests_html.
I understand the regular requests library does not have JavaScript support so I am trying to use requests_html instead.
Here's what I have (using requests):
import requests
s = requests.Session()
r = s.post(url)
print(r.text)
r2 = s.post(url2)
print(r2.text)
url is the link to the page containing the button, and url2 is the endpoint that the button's JavaScript function sends its POST request to. (I found url2 in the network tab of my browser's inspector by clicking the button as a test.)
However, this does not work and I get this from r2.text:
<h2>Error(500): An error occurred.</h2>
<p>We are sorry but an unexpected error has occurred on our side while handling your request. In the meantime, please retry your request or try the following:</p>
To my understanding, an error 500 means that the issue is server-side, not client-side. However, clicking the button manually on the webpage works fine.
This brings me to attempting to execute the JavaScript function directly instead. I couldn't find anything about this in the requests_html documentation. I've also looked at Selenium, but that doesn't seem to be up to date.
It is also worth mentioning that the button's markup in the inspector looks like this: <button onclick="registerInterest(72833,959320000, '')" type="button" class="btn btn-primary"><i class="far fa-clipboard"></i> Register Interest</button>
So essentially, I would like to execute registerInterest(72833,959320000, '') after my first POST request.
Any help would be greatly appreciated.
I will gladly provide any additional information needed.
You need to use Selenium to manipulate HTML elements. You can use code like this:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# set the chromedriver.exe path
driver = webdriver.Chrome(service=Service("C:\\chromedriver.exe"))
# implicit wait (in seconds)
driver.implicitly_wait(0.5)
# maximize the browser window
driver.maximize_window()
# launch the URL
driver.get("https://www.tutorialspoint.com/index.htm")
# identify the element
button = driver.find_element(By.XPATH, "//button[text()='Check it Now']")
# perform the click
button.click()
print("Page title is: ")
print(driver.title)
# close the browser
driver.quit()
Check the Selenium docs and pick the method that fits your case best.
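If the goal is to run the page's own function rather than click the element, Selenium can also execute JavaScript in the loaded page through execute_script. A minimal sketch, assuming the page has already been opened with driver.get(url) and that registerInterest is defined globally on it; call_page_function is a hypothetical helper name, not part of Selenium:

```python
def call_page_function(driver, function_name, *args):
    """Call a global JavaScript function on the current page and return its result.

    `driver` is any Selenium WebDriver; `call_page_function` itself is a
    hypothetical helper, not a Selenium API.
    """
    # Selenium exposes the extra Python arguments to the script as `arguments`.
    script = f"return {function_name}.apply(null, arguments);"
    return driver.execute_script(script, *args)


# Usage, assuming `driver` has already loaded the page containing the button:
# call_page_function(driver, "registerInterest", 72833, 959320000, "")
```

Passing the values as script arguments instead of pasting them into the JavaScript string avoids quoting problems with the empty-string parameter.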
I am trying to create a script using python and selenium to automate the checkout process at bestbuy.ca.
I get all the way to the final stage where you click to review the final order, but get the following 403 forbidden message (as seen in the network response) when I try to click through to the final step.
Is there something server-side that has detected that I am using Selenium and is preventing me from proceeding?
How can I hide the fact that it is selenium being used?
These are the options I am using for selenium:
options = Options()
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("start-maximized")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(options=options)
I currently have 10-second delays after each action (i.e. open page, wait, click add to cart, wait, click checkout, wait).
I have implemented a random useragent to be used on each run:
from fake_useragent import UserAgent
ua = UserAgent()
userAgent = ua.random
options.add_argument(f'user-agent={userAgent}')
I have also modified my chromedriver binary as per the comments in THIS THREAD
Error seen when proceeding to order review page:
After much testing the last few days, here are the options that have allowed me to bypass the restrictions I was facing.
Modified cdc_ string in my chromedriver
Chromedriver options:
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("--disable-extensions")
options.add_experimental_option('useAutomationExtension', False)
options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_driver = webdriver.Chrome(options=options)
Change the property value of the navigator for webdriver to undefined:
chrome_driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
After all three of these were implemented I no longer faced any 403 error when navigating the site and the cart/checkout process.
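One caveat with the execute_script call above: it patches navigator.webdriver only in the document that is currently loaded, so the property may come back after a navigation. A possible variation, sketched under the assumption that the driver is Chrome-based (execute_cdp_cmd is only available on Chromium drivers), registers the same snippet to run before every new document. hide_webdriver_flag is a hypothetical helper name:

```python
# The same JavaScript used in the answer above.
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
)


def hide_webdriver_flag(driver):
    """Ask Chrome (via the DevTools protocol) to run the snippet on every new document."""
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": HIDE_WEBDRIVER_JS},
    )


# Usage, assuming `chrome_driver` was created as shown above:
# hide_webdriver_flag(chrome_driver)
# chrome_driver.get("https://www.bestbuy.ca/")
```

Whether this is enough for a given site is not something the sketch can guarantee; it only changes when the snippet runs.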
In my case, whether I control the browser from code or simply start Chrome through Python and use the browser manually, I always get the 403 error, even when just adding a product to the cart.
As you said, I think this site somehow knows that the user is using Selenium or some other automation tool, and the server is blocking API requests.
Searching Stack Overflow, I found https://stackoverflow.com/a/52108199/3228768, but editing the chromedriver still resulted in a failure.
The only way I completed the flow was by setting these options:
u = 'https://www.bestbuy.ca/en-ca/category/appliances/26517'
# relevant part starts here
options = webdriver.ChromeOptions()
options.add_argument("--disable-blink-features")
options.add_argument("--disable-blink-features=AutomationControlled")
# relevant part ends here
driver = webdriver.Chrome(executable_path=r"chromedriver.exe", options=options)
driver.maximize_window()
driver.get(u)
In this way I managed to add a product to the cart. I think you could use it to continue the flow through checkout.
Let me know.
Try this one: https://github.com/ultrafunkamsterdam/undetected-chromedriver
It avoids Selenium detection quite well; I've been having good results with it so far. Headless is not guaranteed, though.
import undetected_chromedriver as uc
driver = uc.Chrome()
driver.get('https://www.bestbuy.ca/')
I am trying to scrape the following webpage: https://steamdb.info/app/730/graphs/
(I have gained permission from the website)
The problem is that the "Monthly Breakdown" table seems to be loaded by Javascript, and BeautifulSoup does not work. When using Selenium to open the webpage, it says that to see the table "You must have Javascript enabled.", which should be enabled when using Selenium. Here is my code:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--enable-javascript")
browser = webdriver.Chrome(options=options)
browser.maximize_window()
url = "https://steamdb.info/app/730/graphs/"
browser.get(url)
Any ways to solve this problem?
How the page should look:
How it looks on Selenium:
Try this and see if you no longer get that error message:
options.add_argument("javascript.enabled", True)
You may also need to look into "waits" here to make sure the async operation on the webpage has time to load.
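On the "waits" point: content rendered by JavaScript may not exist yet when get() returns, so the script has to poll until it appears. Selenium provides WebDriverWait for this; the sketch below shows the same idea in plain Python so the logic is visible. The function name and the CSS selector in the usage comment are assumptions, not taken from the page:

```python
import time


def wait_for_elements(driver, css_selector, timeout=10.0, poll=0.5):
    """Poll until at least one element matches `css_selector`, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the locator strategy string behind By.CSS_SELECTOR.
        elements = driver.find_elements("css selector", css_selector)
        if elements:
            return elements
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing matched {css_selector!r} within {timeout}s")
        time.sleep(poll)


# Usage (the selector is a placeholder; inspect the page for the real one):
# rows = wait_for_elements(browser, "table tbody tr", timeout=15)
```

A wait only helps if the site actually serves the content to the automated browser; it does not get around server-side blocking.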
Update:
To enable or disable JavaScript:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
# a value of 1 enables it; setting it to 2 disables it
chrome_options.add_experimental_option("prefs", {'profile.managed_default_content_settings.javascript': 1})
driver = webdriver.Chrome(r".\chromedriver.exe", options=chrome_options)
driver.get("https://www.google.com")
This won't solve your issue, though.
You can check whether JavaScript is enabled by entering the following in your address bar:
chrome://settings/content/javascript?search=javascript
You can see that even when it is enabled, the website won't load properly.
It seems they have put protections in place to prevent Selenium from being used on their website.
Previous answer:
There is no command-line argument called --enable-javascript in the Chromium project; try the javascript-harmony flag instead.
Below is the full list of supported Chrome flags:
https://peter.sh/experiments/chromium-command-line-switches/#login-profile
Please add screenshots and other information if more help is needed.
I want to call an onclick function. I am writing a script for a MikroTik RouterBoard to restart my modem just by visiting a simple link, but I found that the page that reboots the modem has a button that calls an onclick function:
<input type="button" onclick="btnReset()" value="Reboot">
So is there any way I can call this onclick function directly from an HTTP URL, like:
http://admin:hunter#192.168.1.1/resetrouter.html?btnReset()=Reboot
Here is the MikroTik script I am writing, but it can't do the job. It needs a direct link that it simply visits to download a file.
If anyone familiar with MikroTik scripting can help, I would be grateful. Until then, if there is any way to do it with a direct URL, that would be great.
{
/tool fetch url="http://admin:hunter#192.168.1.1/resetrouter.html?btnReset()=Reboot" mode=http
}
It is not possible to execute JS code from a hyperlink (unless the page contains a script written specifically for this that checks for some parameter).
However, there are some tools that could help you perform the desired action another way:
Install the Custom JS for Web Sites plugin and define a JS script to be executed after your http://admin:hunter... page has loaded in the browser.
For your case it would be simple function call:
btnReset()
Analyse the body of the btnReset() function (it probably sends an HTTP request) and construct the same request with cURL; see cURL Tutorial.
Take a look at bookmarklets: a bookmarklet is a bookmark stored in a web browser that contains JavaScript commands.
Get familiar with PhantomJS or Selenium and write a simple script that performs the desired action for you.
The btnReset() function is probably just a dialog followed by a URL fetch. Open the JavaScript console, type btnReset.toString() and have a look. If you see a URL in there, try visiting that directly.
You need a very specific URL: "http://admin:hunter#192.168.1.1/rebootinfo.cgi". In my case the function btnReset() calls "rebootinfo.cgi". I followed the steps below to find "rebootinfo.cgi".
Open the "save/reboot" page or equivalent page of your router.
Inspect the button "Reboot", i.e, right click on the particular button and click "Inspect" from the menu.
In the developer tools pane that opens on the right, note the function in the onclick attribute of the highlighted line, e.g. in my case, btnReset() from the line <input type="button" onclick="btnReset()" value="Reboot">.
Go to the Console tab of the developer tools, type the function name without parentheses, e.g. btnReset (not btnReset()), and hit Enter.
In the function definition for btnReset(), look for the exact command that is responsible for rebooting the router. In my case it was "rebootinfo.cgi".
That's it, construct the URL.
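Once the endpoint is known, the reboot can be triggered without a browser. A minimal Python sketch, assuming the router accepts HTTP Basic authentication and reboots on a plain GET of the endpoint; the host, credentials, and rebootinfo.cgi path are the example values from this thread and will differ per device:

```python
import base64
import urllib.request


def build_reboot_request(host, user, password, endpoint="rebootinfo.cgi"):
    """Build (but do not send) a GET request carrying a Basic auth header."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{host}/{endpoint}",
        headers={"Authorization": f"Basic {token}"},
    )


request = build_reboot_request("192.168.1.1", "admin", "hunter")
# Sending it would actually reboot the router, so the call is left commented out:
# urllib.request.urlopen(request, timeout=10)
```

Some routers expect a POST or an extra session token instead; the body of btnReset() shows which applies.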
I am using Mechanize, Ruby, and Ruby on Rails to scrape this website.
I want to click the "Trabajo En Sala" tab so that I can scrape the information in that tab.
I know that Mechanize doesn't support JavaScript, but I read here how someone used Mechanize to handle the JavaScript response. I noticed that I have more or less the same problem and could probably use the same solution, for these reasons:
1) The tab's href uses the same __doPostBack() function:
<a id="ctl00_mainPlaceHolder_btnSala" href="javascript:__doPostBack('ctl00$mainPlaceHolder$btnSala','')">Trabajo en sala</a>
2) When I look at the source code, I can clearly see the form related to the JavaScript __doPostBack function:
So I read that post and tried to adapt its solution to my case. This is what I have so far:
require 'mechanize'
task :scraper_test => [:environment] do
  agent = Mechanize.new
  page = agent.get("https://www.camara.cl/camara/diputado_detalle.aspx?prmid=968")
  form = page.form("aspnetForm.add_field!('__EVENTTARGET','')")
  form.add_field!('ctl00$mainPlaceHolder$btnSala','')
  tab = agent.submit(form)
end
P.S. I'm using Rake within a Rails app to run this.
But when I run it, I get this error:
NoMethodError: undefined method `add_field!' for nil:NilClass
So, can you help me to figure out the right way to do this? Thanks!
I just ran this in my console and you're getting this error
NoMethodError: undefined method `add_field!' for nil:NilClass
because this line returns nil
form = page.form("aspnetForm.add_field!('__EVENTTARGET','')")
Change it to this to fix that error:
form = page.form("aspnetForm")
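Fixing the form lookup removes the nil error, but the postback itself may still need attention. In ASP.NET WebForms, __doPostBack(target, argument) copies its two arguments into the hidden fields __EVENTTARGET and __EVENTARGUMENT and then submits the form, so a scraper has to send those fields itself. The sketch below illustrates that payload in Python (not Mechanize); parse_postback and build_postback_payload are hypothetical helper names, and the hidden-field value is a placeholder:

```python
import re

POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


def parse_postback(href):
    """Extract (target, argument) from a javascript:__doPostBack(...) href."""
    match = POSTBACK_RE.search(href)
    if match is None:
        raise ValueError(f"no __doPostBack call in {href!r}")
    return match.group(1), match.group(2)


def build_postback_payload(hidden_fields, href):
    """Merge the form's hidden fields with the two fields __doPostBack would set."""
    target, argument = parse_postback(href)
    payload = dict(hidden_fields)
    payload["__EVENTTARGET"] = target
    payload["__EVENTARGUMENT"] = argument
    return payload


href = "javascript:__doPostBack('ctl00$mainPlaceHolder$btnSala','')"
# Placeholder hidden field; a real page supplies its own __VIEWSTATE value.
payload = build_postback_payload({"__VIEWSTATE": "<value from the page>"}, href)
```

In the Mechanize version, the equivalent step would be setting __EVENTTARGET to 'ctl00$mainPlaceHolder$btnSala' on the form before submitting it.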
I am using selenium for some browser automation. I need to install an extension in the browser for my work. I am doing it as follows:
import selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
executable_path = "/usr/bin/chromedriver"
options = Options()
options.add_extension('/home/TheRookie/Downloads/extensionSamples/abhcfceiempjmchhhdhbnkbimnfpckgl.crx')
browser = webdriver.Chrome(executable_path=executable_path, chrome_options=options)
The browser is starting fine but I am prompted with a pop-up to confirm that I want to add the extension as follows:
and after I get this pop-up, Python soon returns with the following exception:
selenium.common.exceptions.WebDriverException: Message: u'unknown
error: failed to wait for extension background page to load:
chrome-extension://abhcfceiempjmchhhdhbnkbimnfpckgl/toolbar.html\nfrom
unknown error: page could not be found:
chrome-extension://abhcfceiempjmchhhdhbnkbimnfpckgl/toolbar.html\n
(Driver info: chromedriver=2.12.301324
(de8ab311bc9374d0ade71f7c167bad61848c7c48),platform=Linux
3.13.0-39-generic x86_64)'
I tried handling the popup as a regular JavaScript alert using the following code:
alert = browser.switch_to_alert()
alert.accept()
However, this doesn't help. Could anyone please tell me how to install this extension without the popup, or how to accept the popup? Any help would be greatly appreciated. Thanks!
Usually, you cannot test inline installation of a Chrome extension with just Selenium, because of that installation dialog. There are a few examples in the wild that show how to use external tools outside Selenium to solve this problem, but these are not very portable (i.e. platform-specific) and rely on a state of Chrome's UI, which is not guaranteed to be consistent.
But that does not mean that you cannot test inline installation. If you replace chrome.webstore.install with a substitute that behaves like the chrome.webstore.install API (but without the dialog), then the end-result is the same for all intents and purposes.
"Behaves like chrome.webstore.install" consists of two things:
Same behavior in error reporting and callback invocation.
An extension is installed.
I have just set up such an example on GitHub, which includes the source code of the helper extension/app and a few examples using Selenium (Python, Java). I suggest reading the README and the source code to get a better understanding of what happens: https://github.com/Rob--W/testing-chrome.webstore.install.
The sample does not require the tested extension to be available in the Chrome Web store. It does not even connect to the Chrome Web store. In particular, it does not check whether the site where the test runs is listed as a verified website, which is required for inline installation to work.
I had some really big code that I would have had to rewrite if I had used Java. Luckily, Python has a library for automating GUI events called ldtp. I used it to automate clicking the "Add" button. I did something along the following lines:
from ldtp import *
import time
from threading import Thread
import selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
def thread_function():
    # Poll for about 5 seconds for the confirmation dialog, then send the keys
    for i in range(5):
        if activatewindow('Confirm New Extension'):
            generatekeyevent('<left><space>')
            break
        time.sleep(1)
def main():
    executable_path = "/usr/bin/chromedriver"
    options = Options()
    options.add_extension('/home/TheRookie/Downloads/extensionSamples/abhcfceiempjmchhhdhbnkbimnfpckgl.crx')
    # Start the dialog watcher before launching Chrome, which blocks on the dialog
    thread = Thread(target=thread_function)
    thread.start()
    browser = webdriver.Chrome(executable_path=executable_path, chrome_options=options)
Hope it helps somebody.