Selenium Webdriver Automatic Redirecting - javascript

Problem Description:
I am currently trying to set up a Selenium WebDriver in Java.
However, every time I try to load this specific webpage (Expected Website), I end up on this Unintentional Website. No matter which driver I use (Firefox, Chrome, Edge), I always get redirected, and I have not found a way around it. Note that the page loads some JavaScript while loading, which might be causing the redirection.
However, if I use a standard browser, I get the Expected Website as intended.
Goal:
Load this website with a Selenium webdriver: Expected Website
Additional Information:
The code I am using so far:
System.setProperty("webdriver.gecko.driver", "E:/Downloads/geckodriver.exe");
File pathToBinary = new File("C:/Program Files (x86)/Mozilla Firefox/firefox.exe");
FirefoxBinary ffBinary = new FirefoxBinary(pathToBinary);
FirefoxProfile firefoxProfile = new FirefoxProfile();
driver = new FirefoxDriver(ffBinary, firefoxProfile);
driver.get("https://www.liketoknow.it/featured");
try {
    Thread.sleep(10000); // keep the browser open for 10 seconds
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the interrupt flag
}
driver.quit();

This happens for the following reason:
To open this page you need to have been granted access, which means you first have to log in on the main page.
For other people with a similar redirection problem:
Set a different user agent when you set up your WebDriver, switching to either a mobile or a PC/Mac one depending on your needs.
Cheers
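As a rough sketch of the user-agent suggestion, here it is in Python with Chrome rather than the Java/Firefox setup from the question. The user-agent string and the helper names user_agent_argument and make_driver are made up for illustration. It assumes the selenium package and a matching Chrome driver are installed, and it will not help if the redirect is caused by the missing login.

```python
# Hypothetical desktop user-agent string; substitute the one your real browser sends.
DESKTOP_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def user_agent_argument(user_agent):
    """Return the Chrome command-line switch that overrides the user agent."""
    return f"--user-agent={user_agent}"


def make_driver(user_agent=DESKTOP_UA):
    """Start Chrome with a custom user agent (needs selenium and Chrome)."""
    # Imported lazily so the helper above can be used without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(user_agent_argument(user_agent))
    return webdriver.Chrome(options=options)


# Usage, assuming a working Chrome setup:
#   driver = make_driver()
#   driver.get("https://www.liketoknow.it/featured")
#   print(driver.current_url)
#   driver.quit()
```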

Have you noticed that /featured literally redirects to the root site?
I would check that first. The problem is probably related to it, and you will end up with the same result if you connect to https://www.liketoknow.it/ in the first place.
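To confirm whether a redirect happened at all, one option is to compare the URL you asked for with the one the driver ends up on. A minimal sketch in Python: was_redirected is a hypothetical helper, and it only compares host and path, ignoring the scheme, the query string and a trailing slash.

```python
from urllib.parse import urlsplit


def was_redirected(requested_url, final_url):
    """Return True if the final URL has a different host or path."""

    def normalize(url):
        parts = urlsplit(url)
        # Lower-case the host and drop a trailing slash so that
        # ".../featured" and ".../featured/" count as the same page.
        return (parts.netloc.lower(), parts.path.rstrip("/"))

    return normalize(requested_url) != normalize(final_url)
```

With a live driver you would call was_redirected(url, driver.current_url) right after driver.get(url).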

Related

Getting 403 when using Selenium to automate checkout process

I am trying to create a script using python and selenium to automate the checkout process at bestbuy.ca.
I get all the way to the final stage where you click to review the final order, but get the following 403 forbidden message (as seen in the network response) when I try to click through to the final step.
Is there something server-side that has detected that I am using Selenium and is preventing me from proceeding?
How can I hide the fact that it is selenium being used?
These are the options I am using for selenium:
options = Options()
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("start-maximized")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(options=options)
I currently have 10-second delays after each action (i.e. open page, wait, click add to cart, wait, click checkout, wait).
I have implemented a random user agent to be used on each run:
from fake_useragent import UserAgent
ua = UserAgent()
userAgent = ua.random
options.add_argument(f'user-agent={userAgent}')
I have also modified my chromedriver binary as per the comments in THIS THREAD
Error seen when proceeding to order review page:
After much testing the last few days, here are the options that have allowed me to bypass the restrictions I was facing.
Modified cdc_ string in my chromedriver
Chromedriver options:
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("--disable-extensions")
options.add_experimental_option('useAutomationExtension', False)
options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_driver = webdriver.Chrome(options=options)
Change the property value of the navigator for webdriver to undefined:
chrome_driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
After all three of these were implemented I no longer faced any 403 error when navigating the site and the cart/checkout process.
In my case, whether I control the browser from code or simply start Chrome through Python and then use the browser manually, I always get the 403 error, even when just adding a product to the cart.
As you said, I think this site somehow detects that the user is running Selenium or some other automation tool, and the server blocks the API requests.
Searching Stack Overflow I found https://stackoverflow.com/a/52108199/3228768, but editing the chromedriver still resulted in a failure.
The only way I completed the flow was by setting these options:
u = 'https://www.bestbuy.ca/en-ca/category/appliances/26517'
# relevant part start here
options = webdriver.ChromeOptions()
options.add_argument("--disable-blink-features")
options.add_argument("--disable-blink-features=AutomationControlled")
# relevant part ends here
driver = webdriver.Chrome(executable_path=r"chromedriver.exe", options=options)
driver.maximize_window()
driver.get(u)
This way I managed to add a product to the cart. I think you could use it to continue the flow through to checkout.
Let me know.
Try this one: https://github.com/ultrafunkamsterdam/undetected-chromedriver
It avoids Selenium detection quite well; I've been having good results with it so far. Headless is not guaranteed, though.
import undetected_chromedriver as uc
driver = uc.Chrome()
driver.get('https://www.bestbuy.ca/')

Selenium will not show javascript loaded table in webpage with Python

I am trying to scrape the following webpage: https://steamdb.info/app/730/graphs/
(I have gained permission from the website)
The problem is that the "Monthly Breakdown" table seems to be loaded by Javascript, and BeautifulSoup does not work. When using Selenium to open the webpage, it says that to see the table "You must have Javascript enabled.", which should be enabled when using Selenium. Here is my code:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--enable-javascript")
browser = webdriver.Chrome(options=options)
browser.maximize_window()
url = "https://steamdb.info/app/730/graphs/"
browser.get(url)
Any ways to solve this problem?
How the page should look:
How it looks on Selenium:
Try this and see if you no longer get that error message:
options.add_argument("javascript.enabled", True)
You may also need to look into "waits" here to make sure the async operation on the webpage has time to load.
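For the waits part, Selenium has its own WebDriverWait class. The sketch below only shows the underlying polling idea with the standard library, so it runs without a browser. wait_until and wait_for_rows are hypothetical helper names, and the default "table tr" selector is a guess, not taken from the page. Waiting will not help if the site blocks automated browsers outright.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


def wait_for_rows(browser, css_selector="table tr", timeout=10.0, poll_interval=0.5):
    """Wait until at least one element matches css_selector, then return the matches."""
    # "css selector" is the locator strategy string behind Selenium's By.CSS_SELECTOR.
    return wait_until(
        lambda: browser.find_elements("css selector", css_selector),
        timeout=timeout,
        poll_interval=poll_interval,
    )


# Usage, after browser.get(url):
#   rows = wait_for_rows(browser)
```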
Update:
To enable or disable JavaScript:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
# A value of 1 enables JavaScript; 2 disables it
chrome_options.add_experimental_option("prefs", {"profile.managed_default_content_settings.javascript": 1})
driver = webdriver.Chrome(r".\chromedriver.exe", options=chrome_options)
driver.get("https://www.google.com")
This won't solve your issue.
You can check whether JavaScript is enabled by typing the following in your address bar:
chrome://settings/content/javascript?search=javascript
You can see that even when it is enabled, the website won't load properly.
It seems they have enabled security measures to prevent Selenium from being used on their website.
Previous answer:
There is no command-line argument called --enable-javascript in the Chromium project; try the javascript-harmony flag instead.
Below is the full list of supported Chrome flags:
https://peter.sh/experiments/chromium-command-line-switches/#login-profile
Please add screenshots and other information if more help is needed.

Unsuccessful in trying to start Selenium IDE recording from C# code

I'm working with Selenium WebDriver (3 latest, Chrome driver, 83 latest) in C# (windows10, .net fw 4.6.2). I'm trying to start the IDE recording from within the code of a running automation test, on the open web page, (my intention is to record all the actions being done within the automation test), but with no success.
I'm trying to do it using the IDE extension API (I don't want the IDE to actually be open during the test, nor do I want it to reopen the page for recording, since the page is already open in the automation test).
My final intention is to use the outcome of the IDE recording for something I need.
Here is my C# code:
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
namespace IDE
{
    class Program
    {
        static void Main(string[] args)
        {
            ChromeOptions options = new ChromeOptions();
            // Load IDE extension to chrome
            options.AddExtension(@"C:\Users\<my user name>\Downloads\extension_3_17_0_0.crx");
            IWebDriver driver = new ChromeDriver(options);
            // Start IDE recording
            ((IJavaScriptExecutor)driver).ExecuteScript("chrome.runtime.sendMessage(\"mooikfkahbdckldjjndioackbalphokd\", {uri: \"/record/session\", verb: \"post\", payload: {url: 'https://www.google.com'}});");
            driver.Navigate().GoToUrl("https://www.google.com");
            driver.Manage().Window.Maximize();
            // Do some more actions
            // ......
            // Stop the IDE recording
            ((IJavaScriptExecutor)driver).ExecuteScript("chrome.runtime.sendMessage(\"mooikfkahbdckldjjndioackbalphokd\", {uri: \"/record/session\", verb: \"delete\"});");
            driver.Quit();
        }
    }
}
I can see that the IDE extension is loaded, but the "chrome.runtime.sendMessage" is always throwing an exception (no matter what I supply as a value to its parameters) : "Cannot read property 'sendMessage' of undefined". The code I pasted here is just an example. I tried many other variations of "chrome.runtime.sendMessage". All of them threw the same exception.
I've seen a few discussions online about that exception in relation to my issue, but they all mention JavaScript files that have to be changed (manifest, content, etc.), and I'm not sure how to combine their suggestions with my code, as it is in C#.
What am I missing here?
Any help will be much appreciated.
Thanks!!!
I am still waiting for anyone who can help. My question is detailed and well explained, and so is the code, which can be pasted as is into the editor (you just need the relevant references and the CRX file on your machine). If I narrow it down to one sentence, it would be: "How do I use the Selenium IDE API from the web driver in C# code?"

Download a file using Watir Webdriver and phantomjs

I am using Watir WebDriver and a headless (PhantomJS) browser to go to a website, log in, and download a file by clicking a JavaScript submit button. When I click submit, I am redirected with a 302 to a different address, which I can see under the Network tab; this is the URL of the file to download. I am debugging with screenshots, so I can see that PhantomJS works fine up to that point, but after it clicks the submit button, nothing happens. The whole procedure works fine in Firefox.
Using Watir WebDriver, how can I get that link, save it in a database, and direct PhantomJS to download the file using that link? I have read GitHub pull requests, the official documentation, and blog posts, but I cannot find a solution. Please provide suggestions or solutions; even a one-word suggestion is appreciated, as it might help me approach the problem.
I have tried getting the HTTP request headers but didn't succeed. I have browser.cookie.to_a, but browser.headers gives me only an object like this: Watir::HTMLElementCollection:0x000000024b88c0.
Thank you
I was not able to find a solution to my question using PhantomJS, but I solved the problem using watir-webdriver (0.9.1), headless, and Firefox (44.0).
These are the settings I used.
profile = Selenium::WebDriver::Firefox::Profile.new
profile['download.prompt_for_download'] = false
profile['browser.download.folderList'] = 2 # custom location
profile['browser.download.dir'] = download_directory
profile['browser.helperApps.neverAsk.saveToDisk'] = "application/pdf"
profile['pdfjs.disabled'] = true
profile['pdfjs.firstRun'] = false
headless = Headless.new
headless.start
browser = Watir::Browser.new(:firefox, :profile => profile)
browser.goto 'www.google.com'
browser.window.resize_to(1280, 720)
puts browser.title
puts browser.url

How to parse a web page that uses JavaScript to load .html with Python?

I'm using Python to parse an auction site.
If I open the site in a browser, it goes to a loading page and then jumps to the search result page automatically.
If I use urllib2 to open the page, the read() method only returns the loading page.
Is there any Python package that can wait until all the content is loaded, so that read() returns all the results?
Thanks.
How does the search page work? If it loads anything using Ajax, you could do some basic reverse engineering and find the URLs involved using Firebug's Net panel or Wireshark and then use urllib2 to load those.
If it's more complicated than that, you could simulate the actions JS performs manually without loading and interpreting JavaScript. It all depends on how the search page works.
Lastly, I know there are ways to run scripting on pages without a browser, since that's what some functional testing suites do, but my guess is that this could be the most complicated approach.
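If the Net panel does show a separate request for the results, replaying it from Python can be as small as the sketch below (using Python 3's urllib rather than urllib2). The endpoint and parameter name are placeholders, not the auction site's real ones, and build_url and fetch are hypothetical helpers.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_url(endpoint, params):
    """Append URL-encoded query parameters to an endpoint."""
    return f"{endpoint}?{urlencode(params)}"


def fetch(url, timeout=10):
    """Download a URL and return the response body as text."""
    # Some servers reject urllib's default user agent, so send a generic one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage with a placeholder endpoint found in the Net panel:
#   url = build_url("https://example.com/ajax/search", {"k": "halo reach"})
#   html = fetch(url)
```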
After tracing through the auction site's source code, I found that it uses .php to create the loading page and redirect to the result page. Reverse engineering to find the true URLs does not work, because the result page has the same URL as the loading page.
And @Manoj Govindan, I've tried Mechanize, but even if I add
br.set_handle_refresh(True)
br.set_handle_redirect(True)
it still reads the loading page.
After hours of searching the web, I found a possible solution: using pywin32
import win32com.client
import time
url = 'http://search.ruten.com.tw/search/s000.php?searchfrom=headbar&k=halo+reach'
ie = win32com.client.Dispatch("InternetExplorer.Application")
ie.Visible = 0
ie.Navigate(url)
while 1:
    # ReadyState 4 means the document has finished loading
    state = ie.ReadyState
    if state == 4:
        break
    time.sleep(1)
print(ie.Document.body.innerHTML)
However, this only works on the win32 platform; I'm looking for a cross-platform solution.
If anyone knows how to deal with this, please tell me.
