As an example, the chat site Omegle always displays on its homepage the current number of users online, which I am able to extract with this python script using the headless HTMLUnit Webdriver in Selenium:
from selenium import webdriver
driver = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNITWITHJS)
driver.get('http://www.omegle.com/')
element = driver.find_element_by_id("onlinecount")
print(element.text.split()[0])
The output is like:
22,183
This number is dynamically generated and updated periodically by a script, and I want to read just this dynamically updated content at intervals without repeatedly loading the entire page with driver.get. What Selenium Webdriver method or functionality will let me do that?
This article seems like a relevant lead, though it led me nowhere.
This is untested, but I think the following might work:
from selenium import webdriver
from time import sleep
driver = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNITWITHJS)
driver.get('http://www.omegle.com/')
interval = 10  # or whatever interval you want
while True:
    # Look the element up again on every pass so the current text is read
    element = driver.find_element_by_id("onlinecount")
    print(element.text.split()[0])
    sleep(interval)
I think if you find the element after it's been altered, it will give you the new value.
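The loop above can also be written so that the polling logic is separate from the driver. This is only a minimal sketch, assuming the page's own script keeps updating the element in the live DOM. `parse_count` and `poll_count` are hypothetical helper names, and `read_text` is any zero-argument callable that returns the element's current text:

```python
import time


def parse_count(text):
    """Turn text such as '22,183 users online' into the integer 22183."""
    return int(text.split()[0].replace(",", ""))


def poll_count(read_text, interval=10, samples=3, sleep=time.sleep):
    """Read the counter `samples` times, pausing `interval` seconds between reads.

    The page is never reloaded here; each read_text() call is expected to
    look at the DOM of the page that is already open.
    """
    values = []
    for i in range(samples):
        values.append(parse_count(read_text()))
        if i < samples - 1:  # no pause needed after the final read
            sleep(interval)
    return values
```

With a real driver, `read_text` could be something like `lambda: driver.find_element_by_id("onlinecount").text`, so `driver.get` is called only once before polling starts.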
I am trying to create a script using python and selenium to automate the checkout process at bestbuy.ca.
I get all the way to the final stage where you click to review the final order, but get the following 403 forbidden message (as seen in the network response) when I try to click through to the final step.
Has something on the server side detected that I am using Selenium, and is it preventing me from proceeding?
How can I hide the fact that it is selenium being used?
These are the options I am using for selenium:
options = Options()
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("start-maximized")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(options=options)
I currently have 10-second delays after each action (i.e. open page, wait, click add to cart, wait, click checkout, wait).
I have implemented a random useragent to be used on each run:
from fake_useragent import UserAgent
ua = UserAgent()
userAgent = ua.random
options.add_argument(f'user-agent={userAgent}')
I have also modified my chromedriver binary as per the comments in THIS THREAD
Error seen when proceeding to order review page:
After much testing the last few days, here are the options that have allowed me to bypass the restrictions I was facing.
Modified cdc_ string in my chromedriver
Chromedriver options:
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("--disable-extensions")
options.add_experimental_option('useAutomationExtension', False)
options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_driver = webdriver.Chrome(options=options)
Set the navigator.webdriver property to undefined:
chrome_driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
After all three of these were implemented I no longer faced any 403 error when navigating the site and the cart/checkout process.
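The option and script steps listed above can be bundled into two small helpers. This is a sketch rather than a guaranteed bypass: `apply_stealth_options` and `hide_webdriver_flag` are hypothetical names, and the functions only assume an object with the `add_argument` / `add_experimental_option` methods of `ChromeOptions` and a driver with `execute_script`. Modifying the cdc_ string in the chromedriver binary is not covered here.

```python
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
)


def apply_stealth_options(options):
    """Apply the Chrome options from the answer to a ChromeOptions-like object."""
    options.add_argument("--disable-blink-features=AutomationControlled")
    options.add_argument("--disable-extensions")
    options.add_experimental_option("useAutomationExtension", False)
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    return options


def hide_webdriver_flag(driver):
    """Override navigator.webdriver in the document that is currently loaded."""
    driver.execute_script(HIDE_WEBDRIVER_JS)
```

A possible usage is `driver = webdriver.Chrome(options=apply_stealth_options(Options()))` followed by `hide_webdriver_flag(driver)`. Because `execute_script` runs in the current document only, the second call most likely has to be repeated after each navigation.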
In my case, either using code to control the browser, or simply starting Chrome through python and manually using the browser always leads to the 403 error, even just adding a product to the cart.
As you said, I think this site somehow knows that the user is running Selenium or some other automation tool, and the server is blocking API requests.
Searching Stack Overflow I found https://stackoverflow.com/a/52108199/3228768, but editing the chromedriver still results in a failure.
The only way I managed to complete the flow was by setting these options:
u = 'https://www.bestbuy.ca/en-ca/category/appliances/26517'
# relevant part starts here
options = webdriver.ChromeOptions()
options.add_argument("--disable-blink-features")
options.add_argument("--disable-blink-features=AutomationControlled")
# relevant part ends here
driver = webdriver.Chrome(executable_path=r"chromedriver.exe", options=options)
driver.maximize_window()
driver.get(u)
This way I managed to add a product to the cart. I think you could use it to continue the flow through checkout.
Let me know.
Try this one: https://github.com/ultrafunkamsterdam/undetected-chromedriver
It avoids Selenium detection quite well, and I've been having good results with it so far. Headless mode is not guaranteed to work, though.
import undetected_chromedriver as uc
driver = uc.Chrome()
driver.get('https://www.bestbuy.ca/')
I am trying to scrape the following webpage: https://steamdb.info/app/730/graphs/
(I have gained permission from the website)
The problem is that the "Monthly Breakdown" table seems to be loaded by JavaScript, so BeautifulSoup alone does not work. When I open the page with Selenium, the table area says "You must have Javascript enabled.", even though JavaScript should be enabled when using Selenium. Here is my code:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--enable-javascript")
browser = webdriver.Chrome(options=options)
browser.maximize_window()
url = "https://steamdb.info/app/730/graphs/"
browser.get(url)
Any ways to solve this problem?
How the page should look:
How it looks on Selenium:
Try this and see if you no longer get that error message:
options.add_argument("javascript.enabled", True)
You may also need to look into "waits" here to make sure the async operation on the webpage has time to load.
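A wait in this sense is essentially a bounded retry loop. The following plain-Python sketch illustrates the idea only. It is not Selenium's implementation (Selenium ships its own `WebDriverWait` class), and `wait_until` is a hypothetical name:

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError if it never appears.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)  # give the page's async work time to finish
```

The `condition` callable would typically query the browser for the element that the page's script is expected to create, and return it once it exists.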
Update:
To enable or disable JavaScript:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
# A value of 1 enables JavaScript; a value of 2 disables it
chrome_options.add_experimental_option("prefs", {'profile.managed_default_content_settings.javascript': 1})
driver = webdriver.Chrome(r".\chromedriver.exe", options=chrome_options)
driver.get("https://www.google.com")
This won't solve your issue, though.
You can check whether JavaScript is enabled by entering the following in your address bar:
chrome://settings/content/javascript?search=javascript
You can see that even when it is enabled, the website won't load properly.
It seems they have added protection to keep Selenium from being used on their website.
Previous answer:
The Chromium project has no command-line argument called --enable-javascript; try the javascript-harmony flag instead.
The full list of supported Chrome flags is here:
https://peter.sh/experiments/chromium-command-line-switches/#login-profile
Please add screenshots and other information if more help is needed.
Problem Description:
I am currently trying to set up a Selenium webdriver in Java.
However, every time I try to load this specific webpage (Expected Website), I end up on this one instead (Unintentional Website). No matter which driver I use (Firefox, Chrome, Edge), I always get redirected, and I have not found any way around it. Please note that the page loads some JS while it is loading, which might be causing the redirection.
However if I use a standard browser I get the Expected Website as wished.
Goal:
Load this website with a Selenium webdriver: Expected Website
Additional Information:
The code I am using so far:
System.setProperty("webdriver.gecko.driver", "E:/Downloads/geckodriver.exe");
File pathToBinary = new File(
"C:/Program Files (x86)/Mozilla Firefox/firefox.exe");
FirefoxBinary ffBinary = new FirefoxBinary(pathToBinary);
FirefoxProfile firefoxProfile = new FirefoxProfile();
driver = new FirefoxDriver(ffBinary,firefoxProfile);
driver.get("https://www.liketoknow.it/featured");
try {
    Thread.sleep(10000);
} catch (InterruptedException e) {}
driver.quit();
The reason why this happens is the following:
To open this page you need to have been granted access, which means you first have to log in on the main webpage.
For other people having a similar issue with being redirected:
Use a different user agent when you set up your WebDriver, and switch to either a mobile or a PC/Mac one, depending on your needs.
Cheers
Have you noticed that /featured literally redirects to the root site?
I would look into that first. The problem is probably related to it, and you will end up with the same result if you connect to https://www.liketoknow.it/ in the first place.
I'm trying to write a simple script with Windmill to open a page (which has javascript) and then download the entire html. My code is:
from windmill.authoring import setup_module, WindmillTestClient
from windmill.conf import global_settings
import sys
global_settings.START_FIREFOX = True
setup_module(sys.modules[__name__])
def my_func():
    url = "a certain url"
    client = WindmillTestClient(__name__)
    # Pass the variable defined above (the original referenced an undefined cur_url)
    client.open(url=url)
    html = client.commands.getPageText()
This last line, with getPageText() just seems to hang. Nothing happens and it never returns.
Also, is it necessary for windmill to open up the whole GUI every time? And if it is, is there a function in python to close it when I'm done (a link to any actual documentation would be helpful; all I've found are a few examples)?
Edit: solved the problem by just using Selenium instead, took about 15 minutes vs 3 hours of trying to make Windmill work.
A colleague of mine came up with an alternate solution, which was to actually watch the network traffic coming into the browser and scrape the GET requests. Not totally sure how he did it though.
I'm trying to test my Backbone.js web application with Selenium IDE.
Selenium can open my test case's initial URL so long as it's in a fresh browser window -- e.g. open /#/login -- but it times out whenever it tries to open subsequent URLs.
It seems that Selenium is listening for an event that just isn't triggered when only the URL hash changes.
I would imagine this happens any time you're using hashchange + Selenium...
In Selenium IDE, simply use the 'storeEval' command, for example:
Command = storeEval
Target = window.location.hash='/search/events/birthdays/1'
storeEval runs the javascript snippet assigned to "target".
What you can then do, is have one test case that opens the start page using the open(url) command, and the rest of your cases changing the hash using the storeEval command.
Run window.location.hash='#abcde' in the console of the browser's developer tools. It should change the hash in the current browser tab.
Execute javascript through Selenium Webdriver and Java:
((JavascriptExecutor) driver).executeScript("window.location.hash='#abcde'");
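The same idea can be sketched with the Python bindings, whose driver objects expose `execute_script`. The helper names `build_hash_script` and `set_hash` are hypothetical, and the sketch assumes the application reacts to the `hashchange` event on its own:

```python
import json


def build_hash_script(fragment):
    """Return a JavaScript statement that sets window.location.hash."""
    # json.dumps yields a safely quoted string literal for the script.
    return "window.location.hash = %s;" % json.dumps(fragment)


def set_hash(driver, fragment):
    """Change only the URL hash of the current page, without a full reload."""
    driver.execute_script(build_hash_script(fragment))
```

After calling `set_hash(driver, "/search/events/birthdays/1")`, the test would still need to wait for the DOM change it expects, since no page-load event fires for a hash-only change.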
A brief update: We gave up trying to use Selenium IDE to write our integration tests, and instead used the Selenium Python bindings for Selenium WebDriver.
With this approach, we can navigate to a URL and then use WebDriverWait to detect a particular change in the DOM, e.g.
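from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
driver.get("/#/login")
# Block for up to 10 seconds until the login form is visible
WebDriverWait(driver, 10).until(
    lambda driver: driver.find_element_by_css_selector("form.login").is_displayed())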