Using selenium to download a file via window.open - javascript

I'm trying to scrape a webpage where clicking a link opens a new window that immediately downloads a CSV. I haven't been able to figure out the format of the URL, since it's generated by fairly dense JavaScript (one function is called via the onClick property while another is called as part of the href property). I have not worked with Selenium before, so I was hoping to confirm before getting started that what I want to do is possible. I had read somewhere that downloading files via new popup windows is not necessarily something I can do with Selenium.
Any advice would be greatly appreciated. A "this is possible" would be very helpful, as would a "here's how you'd do it", even sketched in broad detail. Thanks much!
To be clear, my difficulties primarily stem from the fact that I can't figure out how the URL to download the file is generated. Even looking at the Google Chrome network calls, I am not seeing where it is, and it would probably take me many hours to track this down, so I am looking for a solution that relies on clicking specific text in the browser rather than disentangling the cumbersome machinery behind the scenes.

Here's how I download files using the Firefox webdriver. It essentially creates a browser profile in which the default download location for certain file types is set. You can then verify that the file exists at that location.
import os
from selenium import webdriver
browser_profile = webdriver.FirefoxProfile()
# add the file_formats to download
file_formats = ','.join(["text/plain",
                         "application/pdf",
                         "application/x-pdf",
                         "application/force-download"])
preferences = {
    "browser.download.folderList": 2,
    "browser.download.manager.showWhenStarting": False,
    "browser.download.dir": os.getcwd(),  # will download to current directory
    "browser.download.alertOnEXEOpen": False,
    "browser.helperApps.neverAsk.saveToDisk": file_formats,
    "browser.download.manager.focusWhenStarting": False,
    "browser.helperApps.alwaysAsk.force": False,
    "browser.download.manager.showAlertOnComplete": False,
    "browser.download.manager.useWindow": False,
    "services.sync.prefs.sync.browser.download.manager.showWhenStarting": False,
    "pdfjs.disabled": True
}
for pref, val in preferences.items():
    browser_profile.set_preference(pref, val)
browser_binary = webdriver.firefox.firefox_binary.FirefoxBinary()
browser = webdriver.Firefox(firefox_binary=browser_binary,
                            firefox_profile=browser_profile)
# the name the file will be saved under once the download is complete
file_name = 'ABC.txt'
# go to the link to download the file; it will be automatically
# downloaded to the current directory
file_url = 'http://yourfiledownloadurl.com'
browser.get(file_url)
# verify that the expected file name exists in the current directory
path = os.path.join(os.getcwd(), file_name)
assert os.path.isfile(path)
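One caveat with checking for the file straight after browser.get: the download may still be in progress at that point. A minimal sketch of a polling helper follows. The name wait_for_download and its parameters are hypothetical, and the sketch assumes Firefox keeps a sibling "<name>.part" file next to the target while the transfer is running, which may not hold for every Firefox version.

```python
import os
import time


def wait_for_download(path, timeout=30.0, poll_interval=0.5):
    """Return True once `path` exists and no sibling ".part" file remains.

    Returns False if that has not happened within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: an in-progress download leaves "<path>.part" on disk.
        if os.path.isfile(path) and not os.path.exists(path + ".part"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

With this in place, the final check could be written as assert wait_for_download(path) instead of asserting on os.path.isfile(path) directly.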

Related

python : disable download popup when using firefox with selenium

I have a script that uses Selenium and Firefox to automate a download.
The problem is that whenever I run the script, Firefox pops up a dialog asking what action I would like to take, even though I set the download path in the Firefox preferences. I looked through my files and folders in order to create a master mimeTypes.rdf for all users, but I couldn't find mine (I'm using Ubuntu). I found ~/.mozilla/firefox, but it contained no directory with my profile name, nor any file with an extension like .rdf.
Here is a picture of the culprit that is driving me crazy:
[image: firefox download popup]
Below is what I've done to disable the popup.
profile = FirefoxProfile()
profile.set_preference("browser.download.panel.shown", False)
profile.set_preference("browser.helperApps.neverAsk.openFile", 'application/zip')
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", 'application/zip')
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.dir", "/home/i-06/Downloads")
driver = webdriver.Firefox(firefox_profile=profile)
I have spent many hours trying to suppress that "save or open" pop-up that appears when downloading a file using the firefox driver with selenium (python 3.x). None of the many suggestions involving various values for profile.set_preference worked for me. Maybe I missed something.
Still, I finally got it working with the other recommended method: using an existing Firefox profile.
You can tweak your default (or custom) profile to get the file-save behaviour you want. Type the following in the Firefox address bar and make your changes there:
about:preferences#applications
Then the only setup you need in order to download the file into your current working directory is:
import os
from selenium import webdriver
fp = webdriver.FirefoxProfile(<your firefox profile directory>)
fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.dir", os.getcwd())
driver = webdriver.Firefox(firefox_profile=fp)
If you have a typical Ubuntu setup, you can find your default Firefox profile directory by viewing ~/.mozilla/firefox/profiles.ini
In that .ini file, look for Path under [Profile0]
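The lookup described above can also be scripted. This is only a sketch: default_profile_dir is a hypothetical helper, and it assumes the usual profiles.ini layout in which IsRelative=1 means Path is relative to the folder containing the .ini file.

```python
import configparser
import os


def default_profile_dir(ini_path):
    """Return the directory named by Path under [Profile0] in profiles.ini."""
    # interpolation=None keeps a literal "%" in a path from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(ini_path):
        raise FileNotFoundError(ini_path)
    section = parser["Profile0"]
    path = section["Path"]
    # Assumption: IsRelative=1 means Path is relative to the .ini's folder.
    if section.get("IsRelative", "1") == "1":
        path = os.path.join(os.path.dirname(ini_path), path)
    return path
```

For example, default_profile_dir(os.path.expanduser("~/.mozilla/firefox/profiles.ini")) would give the directory to pass to webdriver.FirefoxProfile.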
I doubt you need to define both. Remove the line below from your code:
profile.set_preference("browser.helperApps.neverAsk.openFile", 'application/zip')
Also, the MIME type of a zip file can sometimes differ depending on the server. It could be any of the following:
application/octet-stream
multipart/x-zip
application/zip
application/zip-compressed
application/x-zip-compressed
So check in the Network tab which content type you are actually getting, and add that to your profile to make sure the dialog doesn't appear.
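If it is unclear which of these types a server will send, one option is to list several of them in a single preference value, since the first answer on this page already passes a comma-joined list to browser.helperApps.neverAsk.saveToDisk. A small sketch, where never_ask_value and ZIP_MIME_TYPES are hypothetical names:

```python
# Candidate zip MIME types, taken from the list above.
ZIP_MIME_TYPES = [
    "application/octet-stream",
    "multipart/x-zip",
    "application/zip",
    "application/zip-compressed",
    "application/x-zip-compressed",
]


def never_ask_value(mime_types):
    """Join MIME types into one comma-separated value, dropping repeats."""
    seen = []
    for mime in mime_types:
        mime = mime.strip().lower()
        if mime and mime not in seen:
            seen.append(mime)
    return ",".join(seen)
```

The result could then be passed as profile.set_preference("browser.helperApps.neverAsk.saveToDisk", never_ask_value(ZIP_MIME_TYPES)).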
I removed profile.set_preference("browser.helperApps.neverAsk.openFile", 'application/zip') as Tarun Lalwani suggested, and it still works. But my problem was that I had put application/mp4 instead of video/mp4. You could check MIME type here.

Download a file using Watir Webdriver and phantomjs

I am using Watir WebDriver and a headless (PhantomJS) browser to go to a website, log in, and click a JavaScript submit button to download a file. When I click submit, I am redirected with a 302 to a different address, which I can see under my Network tab. This is the URL of the file to download. I am debugging using screenshots, so I can see that PhantomJS is working fine, but after it clicks the submit button, nothing happens. The whole procedure works fine in Firefox too. Using Watir WebDriver, how can I get that link, save it in a database, and redirect PhantomJS to download the file using that link? I tried reading GitHub pull requests, official documentation, and blog posts, but I was unable to find a solution. Please provide suggestions or solutions. Even a one-word suggestion is appreciated, as it might help me approach the problem. I have tried getting the HTTP request headers but didn't succeed. I have browser.cookie.to_a, and browser.headers gives me only an object like this: Watir::HTMLElementCollection:0x000000024b88c0. Thank you.
I was not able to find a solution to my question using PhantomJS, but I have solved the problem using watir-webdriver (0.9.1), headless, and Firefox (44.0).
These are the settings I used.
profile = Selenium::WebDriver::Firefox::Profile.new
profile['download.prompt_for_download'] = false
profile['browser.download.folderList'] = 2 # custom location
profile['browser.download.dir'] = download_directory
profile['browser.helperApps.neverAsk.saveToDisk'] = "application/pdf"
profile['pdfjs.disabled'] = true
profile['pdfjs.firstRun'] = false
headless = Headless.new
headless.start
browser = Watir::Browser.new(:firefox, :profile => profile)
browser.goto 'www.google.com'
browser.window.resize_to(1280, 720)
puts browser.title
puts browser.url

Saving files in a Chrome App

Summary
Normally I could download a bunch of files, but Chrome Apps won't show the download shelf when a download occurs. What would be the best way of getting around this limitation of Chrome Apps?
Ideas
I could go about this by creating a zip file, but this would require the user to perform an extra step of unzipping the file.
I'm able to silently download the files, and so I could display a prompt to the user when the file is downloaded, but this would require the user to manually search for the file in their downloads folder.
What I've Learned
Everywhere on the internet tells me to use Chrome's download API, but this only works for Chrome extensions and not Chrome apps.
I can't bring up a "Save As" window, because 50 "Save As" windows for 50 files is unacceptable.
I can, however, bring up a prompt using chrome.fileSystem.chooseEntry({'type': "openDirectory"}) to ask the user to choose a directory, but I can't find a way of saving to that directory.
My question is basically the same as How can a Chrome extension save many files to a user-specified directory? but for a Chrome app instead of an extension.
Project and Example Code
The app I'm building will be the same as this webpage I've built, but with a few modifications to make it work as a web-app.
This is how my website solves the problem
let example_pic = ""
let a = document.createElement("a");
a.href = example_pic;
document.body.appendChild(a)
a.click();
window.URL.revokeObjectURL(a.href);
a.remove()
I can, however, bring up a prompt using chrome.fileSystem.chooseEntry({'type': "openDirectory"}) to ask the user to choose a directory, but I can't find a way of saving to that directory.
That's what you need to work on.
Suppose you declare all the sub-permissions for the fileSystem API:
"permissions": [
{"fileSystem": ["write", "retainEntries", "directory"]}
]
Then you can:
Get an entry from the user:
chrome.fileSystem.chooseEntry({'type': "openDirectory"}, function(dirEntry) {
// Check for chrome.runtime.lastError, then use dirEntry
});
Retain it, so you can reuse it later without asking the user again:
dirEntryId = chrome.fileSystem.retainEntry(dirEntry);
// Use chrome.storage to save/retrieve it
chrome.fileSystem.restoreEntry(dirEntryId, function(entry) { /* ... */ });
Using the HTML FileSystem API, create files in the directory:
dirEntry.getFile(
"test.txt",
{create: true}, // add "exclusive: true" to prevent overwrite
function(fileEntry) { /* write here */ },
function(e) { console.error(e) }
);

Where is the file sandbox for a Chrome app?

I'm developing a chrome app with the capability to handle files. I need to copy these files to the app, which I believe stores it in the app's sandbox.
But where are these files, like on my disk?
Here's where I get access to the filesystem:
fs = null
oneGig = Math.pow 2, 30 # 1GB
window.webkitRequestFileSystem window.PERSISTENT, oneGig,
  (_fs) -> # on fs init
    fs = _fs
    console.log fs.root.fullPath #=> "/" obviously not right
  (e) -> # on fs error
    console.log e
Followed by this code to actually write the files.
fs.root.getFile songObj.md5, create: true, (fileEntry) ->
  fileEntry.createWriter (fileWriter) ->
    fileWriter.onwriteend = (e) ->
      console.log 'Song file saved!', fileEntry, e
      # Where the hell on disk is my file now?
    fileWriter.onerror = (e) ->
      console.log 'fileWriter.onerror', e
    fileWriter.write songObj.blob
  , (e) -> console.log 'fileEntry.createWriter error', e
, (e) -> console.log 'fs.root.getFile error', e
I've had some bugs in my file handling and want to be able to easily inspect what is going on, as well as clean things up if necessary. And I can't seem to find anywhere in the docs where it says the files go. This is especially frustrating since I've had files just vanish after coming back to the app a few days later.
They go into a "sandbox" which isn't easy to inspect. You might want to instead use the new chrome.fileSystem chooseEntry function (http://developer.chrome.com/apps/fileSystem.html) with the "directory" option and also "retainEntry" to get access to write to a normal directory on your computer, so that you can see the files and have them not be cleared out when you clear your browser cache, etc.
So I'm not entirely sure at the lower level how Chrome Apps differ from a normal web page, in the context of the HTML5 Filesystem API.
But I can say that persistent storage on Windows 7 with Chrome is here:
C:\Users\{user}\AppData\Local\Google\Chrome\User Data\Default\File System\
If you want to find out where Chrome stores it on other operating systems, please see this post.
Note that the directory names are weird. That said, the underlying file contents and sizes are exactly the same, so if you poke around the files and open them in a text editor you'll be able to see the original contents of your files as they were before you copied them into the persistent web-app space.
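Since the directory names are not meaningful, matching files by size can be a practical way to poke around. A rough sketch that lists every file under a folder with its size; list_files_with_sizes is a hypothetical helper, and the folder to pass in would be the File System path above with {user} filled in for your machine.

```python
import os


def list_files_with_sizes(root):
    """Return sorted (relative_path, size_in_bytes) pairs for files under root."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            results.append((os.path.relpath(full, root), os.path.getsize(full)))
    return sorted(results)
```

Comparing the reported sizes against the sizes of the files you wrote from the app is one way to identify which sandbox entry is which.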

How to bypass document.domain limitations when opening local files?

I have a set of HTML files using JavaScript to generate navigation tools, indexing, TOC, etc. These files are only meant to be opened locally (e.g., file://) and not served on a web server. Since Firefox 3.x, we run into the following error when clicking a nav button that would generate a new frame for the TOC:
Error: Permission denied for <file://> to get property Location.href from <file://>.
I understand that this is due to security measures within FF 3.x that were not in 2.x, in that the document.domain does not match, so it's assuming this is cross-site scripting and is denying access.
Is there a way to get around this issue? Perhaps just a switch to turn off/on within Firefox? A bit of JavaScript code to get around it?
In Firefox:
In the address bar, type about:config,
then type network.automatic-ntlm-auth.trusted-uris in the search bar.
Enter a comma-separated list of servers (e.g., intranet,home,company).
Another way is to edit user.js.
In user.js, write:
user_pref("capability.policy.policynames", "localfilelinks");
user_pref("capability.policy.localfilelinks.sites", "http://site1.com http://site2.com");
user_pref("capability.policy.localfilelinks.checkloaduri.enabled", "allAccess");
But if you want to stop all verification, just write the following line into the user.js file:
user_pref("capability.policy.default.checkloaduri.enabled", "allAccess");
You may use this in firefox to read the file.
function readFile(arq) {
    netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
    var file = Components.classes["@mozilla.org/file/local;1"].createInstance(Components.interfaces.nsILocalFile);
    file.initWithPath(arq);
    // open an input stream from file
    var istream = Components.classes["@mozilla.org/network/file-input-stream;1"].createInstance(Components.interfaces.nsIFileInputStream);
    istream.init(file, 0x01, 0444, 0);
    istream.QueryInterface(Components.interfaces.nsILineInputStream);
    // read the file line by line until readLine reports no more data
    var line = {}, lines = [], hasmore;
    do {
        hasmore = istream.readLine(line);
        lines.push(line.value);
    } while(hasmore);
    istream.close();
    return lines;
}
Cleiton's method will work for yourself, or for any users who you expect will go through this manual process (not likely unless this is a tool for you and your coworkers or something).
I'd hope that this type of thing would not be possible, because if it is, that means that any site out there could start opening up documents on my machine and reading their contents.
You can have all files that you want to access in subfolders relative to the page that is doing the request.
You can also use JSONP to load files from anywhere.
Add "file://" to network.automatic-ntlm-auth.trusted-uris in about:config
