How to disable screenshots and javascript for PhantomJS in python selenium?

I am scraping in a Python/Selenium framework using PhantomJS on Windows. First, I tried to disable JavaScript and screenshots with Selenium:
webdriver.DesiredCapabilities.PHANTOMJS["phantomjs.page.settings.javascriptEnabled"] = False
webdriver.DesiredCapabilities.PHANTOMJS["phantomjs.takesScreenshot"] = False
webdriver.DesiredCapabilities.PHANTOMJS["phantomjs.page.clearMemoryCash"] = False
dcap = webdriver.DesiredCapabilities.PHANTOMJS
driver = webdriver.PhantomJS("phantomjs.exe", desired_capabilities=dcap)
However, when I have a look at ghostdriver.log, Session.negotiatedCapabilities includes:
browserName:phantomjs
version:2.1.1
driverName:ghostdriver
driverVersion:1.2.0
platform:windows-7-32bit
javascriptEnabled:true # Should be false
takesScreenshot:true # Should be false
Therefore, I think I need to disable both parameters during onInitialized=function(), similar to the below code snippet:
phantom_exc_uri = '/session/$sessionId/phantom/execute'
driver.command_executor._commands['executePhantomScript'] = ('POST', phantom_exc_uri)
initScript = """
this.onInitialized = function() {
    var page = this;
    // ### disable javascript and screenshots here ###
};
"""
driver.execute('executePhantomScript', {'script': initScript, 'args': []})
Q1: How come I can set some PhantomJS specs in webdriver.DesiredCapabilities, but not others? Is this my mistake or some bug?
Q2: Is it reasonable to accomplish this during onInitialized, or am I on the wrong track?
Q3: If so, how do I disable JS and screenshots during onInitialized?

You have raised quite a few queries in your question; let me try to address them all. A simple workflow with Selenium v3.8.1, GhostDriver v1.2.0 and the PhantomJS v2.1.1 browser shows that the following Session.negotiatedCapabilities are passed by default:
"browserName":"phantomjs"
"version":"2.1.1"
"driverName":"ghostdriver"
"driverVersion":"1.2.0"
"platform":"windows-8-32bit"
"javascriptEnabled":true
"takesScreenshot":true
"handlesAlerts":false
"databaseEnabled":false
"locationContextEnabled":false
"applicationCacheEnabled":false
"cssSelectorsEnabled":true
"webStorageEnabled":false
"rotatable":false
"acceptSslCerts":false
"nativeEvents":true
"proxy":{"proxyType":"direct"}}
So, by default, these capabilities are the minimum requirement for establishing a successful session through the PhantomJSDriver and Ghost browser combination.
Users then have the DesiredCapabilities class at their disposal to tweak the capabilities, but certain capabilities remain a minimum requirement for creating a successful Ghost browser session.
javascriptEnabled is one such mandatory property. Until a few releases back, Selenium did allow the javascriptEnabled attribute to be tweaked to false, but now that WebDriver is a W3C Candidate Recommendation, the mandatory capabilities can no longer be overridden through DesiredCapabilities at the user level.
Even if you try to tweak them at the user level, WebDriver will override them with the defaults while configuring the capabilities.
So, although you have tried the following:
webdriver.DesiredCapabilities.PHANTOMJS["phantomjs.page.settings.javascriptEnabled"] = False
webdriver.DesiredCapabilities.PHANTOMJS["phantomjs.takesScreenshot"] = False
The properties javascriptEnabled and takesScreenshot default to the required mandatory configuration.
Update
As you mentioned in your comment, "What about changing those AFTER the GhostDriver session is established, i.e. page.onInitialized", the straight answer is no.
Once the capabilities are frozen and negotiated to initialize a browsing session, they hold true for as long as that particular session is active. So you can't change any of the capabilities once the session is established; to change the capabilities, you have to configure the WebDriver instance again.
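For completeness, a minimal sketch of configuring the instance again: quit the old session and negotiate a fresh one with a copied, tweaked capabilities dictionary (note that, as discussed above, mandatory capabilities such as javascriptEnabled will still be reset to their defaults during negotiation):
from selenium import webdriver

# work on a copy so the class-level defaults stay untouched
dcap = dict(webdriver.DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.javascriptEnabled"] = False  # may still be overridden to the mandatory default
dcap["phantomjs.takesScreenshot"] = False

# tear down the old session and negotiate a new one
driver.quit()
driver = webdriver.PhantomJS("phantomjs.exe", desired_capabilities=dcap)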

Related

Getting 403 when using Selenium to automate checkout process

I am trying to create a script using python and selenium to automate the checkout process at bestbuy.ca.
I get all the way to the final stage where you click to review the final order, but get the following 403 forbidden message (as seen in the network response) when I try to click through to the final step.
Is there something server-side that has detected that I am using Selenium and is preventing me from proceeding?
How can I hide the fact that Selenium is being used?
These are the options I am using for selenium:
options = Options()
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("start-maximized")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(options=options)
I currently have 10-second delays after each action (i.e. open page, wait, click add to cart, wait, click checkout, wait).
I have implemented a random useragent to be used on each run:
from fake_useragent import UserAgent
ua = UserAgent()
userAgent = ua.random
options.add_argument(f'user-agent={userAgent}')
I have also modified my chromedriver binary as per the comments in THIS THREAD
Error seen when proceeding to order review page:
After much testing the last few days, here are the options that have allowed me to bypass the restrictions I was facing.
Modified cdc_ string in my chromedriver
Chromedriver options:
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("--disable-extensions")
options.add_experimental_option('useAutomationExtension', False)
options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_driver = webdriver.Chrome(options=options)
Change the property value of the navigator for webdriver to undefined:
chrome_driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
After all three of these were implemented I no longer faced any 403 error when navigating the site and the cart/checkout process.
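Putting those three pieces together (the cdc_ edit is done separately, in the chromedriver binary itself), the Python side looks roughly like this sketch, which assumes the modified chromedriver is on your PATH:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("--disable-extensions")
options.add_experimental_option('useAutomationExtension', False)
options.add_experimental_option("excludeSwitches", ["enable-automation"])

chrome_driver = webdriver.Chrome(options=options)

# hide navigator.webdriver from scripts running on the page
chrome_driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")

chrome_driver.get('https://www.bestbuy.ca/')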
In my case, either using code to control the browser or simply starting Chrome through Python and using the browser manually always leads to the 403 error, even when just adding a product to the cart.
As you said, I think this site somehow knows that the user is using Selenium or some sort of automation tool, and the server is blocking API requests.
Searching on Stack Overflow I found this https://stackoverflow.com/a/52108199/3228768 but editing the chromedriver still results in a failure.
The only way I completed the flow was by setting these options:
u = 'https://www.bestbuy.ca/en-ca/category/appliances/26517'
# relevant part start here
options = webdriver.ChromeOptions()
options.add_argument("--disable-blink-features")
options.add_argument("--disable-blink-features=AutomationControlled")
# relevant part ends here
driver = webdriver.Chrome(executable_path=r"chromedriver.exe", options=options)
driver.maximize_window()
driver.get(u)
This way I managed to add a product to the cart. I think you could use it to proceed through the flow up to checkout.
Let me know.
Try this one: https://github.com/ultrafunkamsterdam/undetected-chromedriver
It avoids Selenium detection quite well; I've been having good results with it so far. Headless mode is not guaranteed to work, though.
import undetected_chromedriver as uc
driver = uc.Chrome()
driver.get('https://www.bestbuy.ca/')

Selenium will not show javascript loaded table in webpage with Python

I am trying to scrape the following webpage: https://steamdb.info/app/730/graphs/
(I have gained permission from the website)
The problem is that the "Monthly Breakdown" table seems to be loaded by JavaScript, so BeautifulSoup does not work. When I use Selenium to open the webpage, the page says that to see the table "You must have Javascript enabled.", even though JavaScript should be enabled when using Selenium. Here is my code:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--enable-javascript")
browser = webdriver.Chrome(options=options)
browser.maximize_window()
url = "https://steamdb.info/app/730/graphs/"
browser.get(url)
Any ways to solve this problem?
How the page should look:
How it looks on Selenium:
Try this and see if you no longer get that error message:
options.add_argument("javascript.enabled", True)
You may also need to look into "waits" here to make sure the async operation on the webpage has time to load.
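If you go the explicit-wait route, here is a minimal sketch (the locator is only illustrative; use whatever actually identifies the table you are after):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 20 seconds for the JS-rendered table to appear (locator is a placeholder)
wait = WebDriverWait(browser, 20)
table = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "table")))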
Update:
To enable or disable javascript :
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
#value 1 enables it , if you set to 2 it disables it
chrome_options.add_experimental_option( "prefs",{'profile.managed_default_content_settings.javascript': 1})
driver = webdriver.Chrome(r".\chromedriver.exe", options=chrome_options)
driver.get("https://www.google.com")
This won't solve your issue, though.
You can check whether JavaScript is enabled by typing the below in your address bar:
chrome://settings/content/javascript?search=javascript
You can see that even if it's enabled, the website won't load properly.
It seems they have enabled security measures to prevent the use of Selenium on their website.
Previous answer:
There is no command-line argument called --enable-javascript in the Chromium project; try the javascript-harmony flag instead.
Below is the full list of supported Chrome flags:
https://peter.sh/experiments/chromium-command-line-switches/#login-profile
Please add screenshots and other information if more help is needed.

clearing cache on chrome using web driver

We are testing a web app using JMeter Selenium WebDriver.
As the HTTP manager doesn't work, we tried to clear the cache using the code below, but for some reason it is failing. We need a clear-cache mechanism to be implemented. Besides this, we also tried incognito mode and many other options that Google suggests, without luck.
We are also trying to send (sendKeys) Enter after launching the browser with chrome://settings/clearBrowserData, on the clear-browsing-data window. Driver.close() will not help us, as per the scenario's needs.
Please share some ideas / suggest how to execute Enter after the browser launch.
Really appreciate your time and help.
var pkg=JavaImporter(org.openqa.selenium,org.openqa.selenium.support.ui) //import java selenium packages
var Thr=JavaImporter(java.lang.Thread) //import Thread sleep packages
var wait = new pkg.WebDriverWait(WDS.browser,30) //import WebDriverWait Package
WDS.browser.get('chrome://settings/clearBrowserData')
Thr.Thread.sleep(5000)
WDS.browser.switchTo().frame("settings")
var ChkBox = WDS.browser.findElement(pkg.By.xpath('//*[@id="delete-form-data-checkbox"]'))
ChkBox.click()
////*[@id="clear-browser-data-overlay"]/div[4]
//wait.until(pkg.ExpectedConditions.presenceOfElementLocated(pkg.By.xpath('//*[@id="clear-browser-data-commit"]')))
//wait.until(pkg.ExpectedConditions.presenceOfElementLocated(pkg.By.xpath('//*[@id="clear-browser-data-overlay"]/div[4]')))
var ClearCache = WDS.browser.findElement(pkg.By.xpath('//*[@id="clear-browser-data-commit"]'))
ClearCache.click()
wait.until(pkg.ExpectedConditions.presenceOfElementLocated(pkg.By.xpath('//*[@id="privacy-section"]/h3')))
In the current Chrome browser the clear-cache button is shadowed (it's inside a shadow DOM), so we will not be able to interact with it directly. We need to identify its JS path and perform the click using the executeScript function. No Java packages need to be imported for executeScript.
Just append the below line for clearing the cache in your script.
WDS.browser.executeScript('return document.querySelector("body > settings-ui").shadowRoot.querySelector("#main").shadowRoot.querySelector("settings-basic-page").shadowRoot.querySelector("#advancedPage > settings-section:nth-child(1) > settings-privacy-page").shadowRoot.querySelector("settings-clear-browsing-data-dialog").shadowRoot.querySelector("#clearBrowsingDataConfirm").click();')
Happy testing using JMeter+WebDriver

Can't turn off Javascript using Selenium

I'm trying to turn off JavaScript via the profile when opening Firefox with Selenium. This has worked previously, but now that I've updated Selenium/Firefox I can't get it to work.
profile = webdriver.FirefoxProfile()
profile.set_preference('javascript.enabled', False)
driver = webdriver.Firefox(profile)
driver.implicitly_wait(30)
driver.get("http://www.enable-javascript.com/")
All other settings I change using profile.set_preference() seem to take effect, but javascript.enabled exists and is set to True when I look at the Firefox settings in about:config. Is it possible JavaScript is being set back to True after the profile is loaded, or something?
FF version 43.0.3
Selenium version 2.48.0
Any suggestions on why this could be happening?
UPDATE
Adding profile.add_extension("path/to/noscript_security_suite-2.9.xpi") to the above code, with the downloaded extension, as @alecxe suggested, fixed the issue.
This issue affects Selenium starting with 2.46.0; javascript.enabled is being ignored:
Firefox driver 2.46.0 regression - unable to set to non-js
As a workaround, load the noscript addon, see:
How to disable Javascript when using Selenium by JAVA?
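A rough sketch of that workaround, assuming the NoScript .xpi has been downloaded locally (the path is illustrative, taken from the update above):
from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference('javascript.enabled', False)  # ignored by the affected Selenium versions
profile.add_extension('path/to/noscript_security_suite-2.9.xpi')  # NoScript blocks JS instead

driver = webdriver.Firefox(profile)
driver.get('http://www.enable-javascript.com/')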

How to handle basic authentication with protractor?

I'm trying Protractor to write a few tests for a non-Angular application. I have to log in to a page through basic authentication in Google Chrome, but I have no idea how.
I already tried baseUrl: 'https://username:password@url' and:
capabilities: {
    'browserName': 'chrome',
    'chromeOptions': {
        args: ['--login-user=foo', '--login-password=bar']
    }
}
But none of these worked for me. Does anyone know how to do it? I'm having a hard time with this.
You can set the URL as http://username:password@yourdomain.example. Chrome will handle it!
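In plain Selenium terms (Python here, purely for illustration; the host is a placeholder), that amounts to something like:
# embed the credentials directly in the URL (hostname is illustrative)
driver.get('https://username:password@yourdomain.example/')
Note that a later answer below points out that Chrome 59+ no longer supports URLs with embedded credentials.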
The short answer is that there is no easy way of doing it on Chrome, because it does not support modifying request headers -- see https://code.google.com/p/selenium/issues/detail?id=141 (the title says response headers, but if you read it, it applies to all headers).
That being said, there are ways to do it, albeit difficult.
1) Find a Chrome extension/plugin that allows you to modify headers. A simple search brings up many of them: https://chrome.google.com/webstore/search/modify%20header. You'll need to add the plugin to WebDriver: see Is it possible to add a plugin to chromedriver under a protractor test?.
2) You can use browsermob-proxy (https://github.com/lightbody/browsermob-proxy); this way you route your traffic through the proxy, which would add the headers for you.
From the docs:
POST /proxy/[port]/auth/basic/[domain] - Sets automatic basic authentication for the specified domain
Payload data should be JSON-encoded username and password name/value pairs (ex: {"username": "myUsername", "password": "myPassword"})
There's a node project that may help you, https://github.com/zzo/browsermob-node, but you would still need to set up your proxy server yourself.
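As an illustration of the endpoint quoted above, here is a minimal Python sketch; it assumes the BrowserMob Proxy REST API is listening on localhost:8080 and that a proxy instance has already been created on port 9090 (hosts, ports and the domain are all illustrative):
import requests

# register automatic basic authentication for the target domain on the proxy instance
requests.post(
    'http://localhost:8080/proxy/9090/auth/basic/yourdomain.example',
    json={'username': 'myUsername', 'password': 'myPassword'},
)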
Both ways are complex for Chrome, but would get you what you want (or you can stick with Firefox and follow Robert's answer).
As of version 59 Chrome no longer supports URLs with embedded credentials.
To work around this I wrote the authenticator-browser-extension Node module, which might be useful if you're using Protractor, WebDriver.io or similar test runners.
To use the module install it from npm:
npm install --save-dev authenticator-browser-extension
And import it in protractor.conf.js:
const { Authenticator } = require('authenticator-browser-extension');
exports.config = {
    capabilities: {
        browserName: 'chrome',
        chromeOptions: {
            extensions: [
                Authenticator.for('username', 'password').asBase64()
            ]
        }
    },
}
Pro tip: remember not to commit your credentials with your code; consider using env variables instead.
Hope this helps!
Jan
It's because Firefox doesn't trust any site by default with sending the Windows auth info over. Even if you change it manually in the configuration, it won't affect Protractor, because it opens Firefox with an isolated configuration each time you run your end-to-end tests.
You'll need to programmatically set up a Firefox profile and set its preferences such that it trusts localhost (or some other website, depending on where the pages are loaded from).
First, check out this example. It shows how you can set up the profile and how you can set preferences.
https://github.com/juliemr/protractor-demo/tree/master/howtos/setFirefoxProfile
What it does is that it modifies the homepage for each new tab. In the same manner (with the firefoxProfile.setPreference method) you can change the preferences responsible for trusting websites. They're called "network.automatic-ntlm-auth.trusted-uris" and "network.negotiate-auth.delegation-uris". You'll need to set them both to "localhost". (Again, if they're at some other place, it's obviously that URL)
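If you would rather set the same preferences from Python-flavoured Selenium instead of the Protractor config, here is a hedged sketch using the same preference names (localhost as in the explanation above):
from selenium import webdriver

profile = webdriver.FirefoxProfile()
# trust localhost for NTLM / negotiate authentication
profile.set_preference('network.automatic-ntlm-auth.trusted-uris', 'localhost')
profile.set_preference('network.negotiate-auth.delegation-uris', 'localhost')

driver = webdriver.Firefox(profile)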
hankduan's browsermob-proxy solution worked for me on Chrome - but the latest revisions of browsermob use a thing called LittleProxy, which does not support auth headers. So I had to run browsermob-proxy -port 9090 --use-littleproxy false, which got things working.
You may use the Windows Credential Manager to avoid this pop-up being shown on every attempt to log in.
Add your credentials to the 'Generic' category there and restart the browser (including any background apps still running).
Some explanation I currently have: this pop-up is not browser specific; it sits 'in the middle', between the browser and domain credentials verification. That is why browser features (save password, autofill) do not work completely. For the same reason, Protractor / Selenium etc. do not have complete control over that pop-up - it is by design of the domain authentication.
As I'm not completely sure this is the only reason, here are some other hints:
- you may also need to add your site to the IE (IE, not Chrome) list of trusted sites (Chrome grabs this information from there);
- check "Automatic logon with current user name and password" in IE (not Chrome) - this may not work if the credentials you use for the site are different from those you use to log in to the machine.
If you're reading this in 2019, with Angular 7/8, consider this:
https://www.npmjs.com/package/authenticator-browser-extension
I find it much easier than the solutions suggested above.
