I want to use Mechanize to simulate browsing to a web page with active JavaScript, including DOM Events and AJAX, and so far I've found no way to do that.
I looked at some Python client browsers that are supposed to support JavaScript, like Spynner and Zope, and none of them really work for me. Spynner crashes PyQt all the time, and Zope doesn't seem to support JavaScript.
Is there a way to simulate browsing with Python only (no extra processes) like WATIR or libraries that manipulate Firefox or Internet Explorer while supporting Javascript fully as if actually browsing the page?
I've played with this new alternative to Mechanize (which I love) called PhantomJS.
It is a full WebKit browser like Safari or Chrome, but it is headless and scriptable. You script it with JavaScript, not Python (as far as I know, at least).
There are some example scripts to get you started. It's a lot like using Firebug. I've only spent a few minutes using it, but I found I was quite productive right from the start.
From http://wwwsearch.sourceforge.net/mechanize/faq.html#general
If you come across this in a page you want to automate, you have four options. Here they are, roughly in order of simplicity.
Figure out what the JavaScript is doing and emulate it in your Python code: for example, by manually adding cookies to your CookieJar instance, calling methods on HTMLForms, calling urlopen, etc. See above re forms.
Use Java’s HtmlUnit or HttpUnit from Jython, since they know some JavaScript.
Instead of using mechanize, automate a browser instead. For example use MS Internet Explorer via its COM automation interfaces, using the Python for Windows extensions, aka pywin32, aka win32all (e.g. simple function, pamie; pywin32 chapter from the O’Reilly book) or ctypes (example). This kind of thing may also come in useful on Windows for cases where the automation API is lacking. For Firefox, there is PyXPCOM.
Get ambitious and automatically delegate the work to an appropriate interpreter (Mozilla’s JavaScript interpreter, for instance). This is what HtmlUnit and httpunit do. I did a spike along these lines some years ago, but I think it would (still) be quite a lot of work to do well.
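The first option in the list above (emulating what the JavaScript does) can be sketched as follows. This is only an illustration using the standard library's http.cookiejar and urllib.request as stand-ins for mechanize's own cookie handling; the cookie name js_check, its value, and the URL are hypothetical, and no network request is actually sent.

```python
import http.cookiejar
import urllib.request


def make_cookie(name, value, domain, path="/"):
    """Build a session cookie by hand, as a page's JavaScript might have set it."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Hypothetical cookie that the site's script would normally set in the browser.
jar = http.cookiejar.CookieJar()
jar.set_cookie(make_cookie("js_check", "passed", "example.com"))

# An opener that sends (and stores) cookies from the jar on every request.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Show which Cookie header would be sent, without touching the network.
request = urllib.request.Request("http://example.com/protected")
jar.add_cookie_header(request)
cookie_header = request.get_header("Cookie")
print(cookie_header)

# To actually fetch the page: response = opener.open(request)
```

The same idea applies to form fields or extra requests that the script would have made: work out what the JavaScript changes, then reproduce that state by hand before calling open.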
Basically, if you want something that deals with JavaScript, then you need a real JavaScript engine, and that invariably means automating a real browser (I'm including headless ones in this).
Java's HtmlUnit doesn't do a very good job, as it doesn't use a JavaScript engine from an actual browser. PhantomJS sounds ideal (as newz2000 points out); however, I find that when manipulating pages with JavaScript it can be very difficult to debug your script if you can't actually see the page you're dealing with.
This leads to solutions such as Selenium WebDriver, which has a full Python API to automate various browsers. However, you must run a Java jar and it actually launches the browser, so it's not the pure Python solution you're after (but I think this is as close as you can get).
You can use Selenium with Python. You can then scrape JavaScript-generated content as well as manipulate the page with additional JavaScript (as well as Python).
# In your virtualenv: pip install selenium
from selenium import webdriver
# Launch Firefox GUI
browser = webdriver.Firefox()
# Alternatively, you can drive PhantomJS without a GUI
# With Node.js installed: `npm install -g phantomjs`
# browser = webdriver.PhantomJS()
# Fetch a webpage
browser.get('http://example.com')
# If you need the whole HTML document
# just like inspecting the rendered page with the console
html = browser.page_source
# Get an element, even if it was created with JS
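# (find_element_by_css_selector was removed in Selenium 4; use By locators)
from selenium.webdriver.common.by import By
button = browser.find_element(By.CSS_SELECTOR,
    'div.some-class > input.the-submit-button')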
# Click on something
button.click()
# Execute some JavaScript (assumes jQuery is loaded on the page)
browser.execute_script("$('html, body').animate({ scrollTop: 500 }, 50);")
You can run the code in a Python REPL and use autocomplete to discover the methods available on browser or whatever element you have selected. Or do something like print(dir(browser)) to see what is available.
An example of how to use PyV8 to run JS on a DOM with Python can be found here:
https://github.com/buffer/thug
It should be fairly easy to make this run together with mechanize.
I don't have a lot of hope for this one, but I have to ask. I am hoping, for didactic purposes, to come up with some means by which a student could load a simple javascript program into a browser, and have it interact with them in the old-fashioned command line manner, where it prints a line and then reads a line of input. This works fine if you use prompt(), but the fact that it creates popups is aesthetically annoying, and today's browsers cheerfully volunteer to stifle the scripts which overuse it. The problem is, prompt() appears to be the only way browser Javascript has of actually pausing a script to wait for input. If we avoid it, that throws us immediately into having to deal with a real-time GUI event input model.
I've been looking for a way to fake it -- to set up some kind of environment in which it is possible for a Javascript method to wait for input and then return when it's given. The best possibility I've got so far is to connect it to a Java applet, but the java applet brand is kind of poisoned now and I doubt people would want to install the plugin. Could there be another way? Worker threads? A browser add-on? Some server-side trick? Does anyone have an idea?
This question is now years old... I wonder if the addition of promises and async/await to Javascript makes this any more possible now?
...I sort of got it to work. If the read-eval-print loop is in an async function and uses await for the line that reads input, you can indeed make an old fashioned linear read-eval-print loop in an event-driven browser environment. But I don't think you can make the main thread wait on input without an async function, except with the grandfathered prompt method (which they're now getting ready to deprecate).
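The shape of that loop can be sketched as follows. Note the assumptions: this is written in Python's asyncio rather than browser JavaScript, purely to show the structure, and read_line is a hypothetical stand-in for whatever delivers a line of user input (here a pre-filled queue). The point is that the loop reads top to bottom like an old-fashioned console program, while each await suspends only the coroutine and not the event loop.

```python
import asyncio


async def repl(read_line, write_line):
    """Linear read-eval-print loop; each read suspends only this coroutine."""
    while True:
        line = await read_line()          # "blocks" here without freezing the event loop
        if line.strip() == "quit":
            break
        write_line(f"you typed: {line}")  # the "eval" and "print" steps


async def main():
    # Stand-in for user input: a queue that an event handler would normally fill.
    pending_input = asyncio.Queue()
    for text in ["hello", "world", "quit"]:
        pending_input.put_nowait(text)

    transcript = []
    await repl(pending_input.get, transcript.append)
    return transcript


transcript = asyncio.run(main())
print(transcript)
```

In a browser, the equivalent of pending_input.get would be a function returning a promise that resolves when the user submits a line in an input box.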
The technical term for what you want is a REPL: a Read-Evaluate-Print-Loop. There are many REPLs out there.
The Best Browser-Based Solution
Use Google Chrome's built-in console! It's a JavaScript REPL that also lets you interact with a web page. You can open it with Ctrl+Shift+I on Windows and Linux inside Chrome, or Cmd+Option+I on a Mac.
To load a file and play with it, all your students need to do is create a directory structure like this:
project
|--> index.html
|--> javascript.js
Make sure index.html has a script tag that points to javascript.js, and then open index.html with Chrome. Voila! You have loaded a JavaScript file, and can now play around with it in Chrome.
It's the best solution because you get the full power of the DOM, you can do REPL stuff, and it is virtually painless: everyone uses Chrome, and your students can even go home and mess around with it on their own.
A Browser-Based Alternative
Rather than reinvent the wheel, you can also use Repl.it.
It's a browser-based Javascript REPL website that supports inserting an arbitrary Javascript program, and interacting with its contents. This is the closest you'll probably get to meeting your requirements - it'll be unable to interact with the DOM (obviously), but it'll more than work.
Non-Browser-Based Alternative
If the requirement for a web-based solution can be relaxed, simply using Node.js' inbuilt REPL on a terminal can be more than sufficient.
You could install Node on your lab's computers, and have people play with Javascript in that capacity. There'll be no DOM, but you could certainly have them write functions and algorithms to solve simple problems. Plus, it'd be a good way to introduce them to the fact that Javascript is no longer client-side-bound.
The fact that you'll be interacting with Javascript in an actual terminal, rather than an emulated one in the browser, is another neat bonus.
If I've Misunderstood...
Some of the question comments make it sound like you want a way to be able to interact with terminal utilities using Javascript from a browser. If that is the case, this is impossible.
There is no way for Javascript to evaluate, parse or do anything on the command line that isn't written using Javascript. You cannot expect the equivalent of a bash ls using a browser-based solution - that's because browsers don't have access to your underlying filesystem, which is a good thing. You cannot run sed, awk, grep, etc. for the same reason - Unix utilities are inaccessible to a browser. There are ways to run Unix utilities using Node, of course, but then you will be teaching them how to use Node, rather than how to play with the Unix console.
If all you want, however, is a way to SSH from a browser into a common environment, there are certainly browser-based ways to do that. FireSSH is a Firefox plugin (now also ported to Chrome) that lets people SSH into a common server. They can then do ls, vim, etc. and have it run in the server, with the results piped back to their browser screen. You'll have to think carefully about security in this case, of course, but I think simply giving people user permissions for this server should more than suffice.
Note that FireSSH doesn't use Javascript to parse or do anything - all it is doing is relaying commands you type to a server, having the server execute those commands remotely, and then piping the results back to your screen.
Alternatives to Prompt
I added this after understanding OP's requirements in more detail.
This is a question that has been asked before. I am fond of library solutions, and in 2013, iocream.js was developed for just this sort of browser functionality. You can embed it in a page, and use the jin function to assign values.
If going with a Node.js solution, by far the best approach is to make use of the prompt library. I personally find it very useful for embedding within Node.js applications.
SpiderMonkey is Mozilla's C++-based Javascript engine, and supports a function called readline(). Unfortunately, there doesn't appear to be a mainstream browser implementation.
Is there any way to get the executed javascript contents from a webpage?
I have tried requests + BeautifulSoup and mechanize, but these only give me the "source code" of the webpage and not the result of executing its JavaScript.
For example, this website: http://listen.tidal.com/login
As you can see, the source code contains unexecuted JS, but when you inspect the element, you'll see the executed code.
Now, is there any way I could get that EXECUTED code in Python?
Hints please, because I have tried emulating a browser using mechanize and it behaves the same as requests.
Thank You
In fact, a JavaScript engine is needed to execute JavaScript. Python has its own interpreter, which only executes Python code; these are two different technologies. So if you want to execute JavaScript from Python, Python needs an API or some sort of binding that talks to an engine that executes JavaScript. Fortunately, Python can interoperate with several JS engines for web-related work (testing etc.). The options can be divided into two groups:
Browsers without a graphical user interface (GUI), aka headless browsers: e.g. PhantomJS, a headless browser based on the WebKit rendering engine, and SlimerJS, based on the Gecko rendering engine (for more, see here). You can drive PhantomJS through Selenium (a glue between Python and PhantomJS), or you can use PyQt and run JS from Python like here.
Browsers with a graphical user interface (GUI): e.g. Firefox, Chromium, Safari etc. In this case, too, you can execute JS through Selenium's Python bindings.
A simple example of executing JS with Selenium in Python:
from selenium import webdriver
#define driver- firefox, chrome or phantomjs etc.
driver = webdriver.Firefox()
#Open the url
driver.get('https://www.google.com')
#see how javascript simple alert is being executed
driver.execute_script("alert('hello world');")
#close the driver i.e. closing opened Firefox instance!
driver.close()
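For contrast, here is a small sketch of why tools such as requests + BeautifulSoup or mechanize only ever see the "source code". It uses the standard library's html.parser on a made-up page (the PAGE string and the TextCollector class are illustrative assumptions, not part of any library mentioned above): the parser reports the script as plain text and never runs it, so the content the script would have generated simply isn't there.

```python
from html.parser import HTMLParser

# Hypothetical page: the visible text is produced only when the script runs.
PAGE = (
    '<html><body><div id="app"></div>'
    "<script>document.getElementById('app').textContent = 'Hello';</script>"
    "</body></html>"
)


class TextCollector(HTMLParser):
    """Separate text a user would see from the raw source of script tags."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.visible_text = []
        self.script_source = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        target = self.script_source if self.in_script else self.visible_text
        target.append(data)


collector = TextCollector()
collector.feed(PAGE)

visible_text = "".join(collector.visible_text).strip()
script_source = "".join(collector.script_source)

print(repr(visible_text))   # what a static parser "sees" as page text
print(script_source)        # the script is present, but only as unexecuted text
```

A browser driven by Selenium would run the script first, so its page_source would contain the generated text inside the div.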
Just to highlight: Python doesn't execute your JS code; a JS runtime does.
There is a Python module that picks an available runtime and evaluates the code for you.
Look at PyExecJS, where you can find some examples, but take into account that the runtime might not provide any browser APIs such as the DOM, HTML5 APIs, etc. It mostly depends on the capabilities of the JS engine.
Another big question: what is the reason for evaluating the code in Python?
Yes, you have to pick a tool that supports JavaScript content, something other than mechanize. Mechanize only handles static content, as you have already observed. There are several such tools, which you can find by searching for "python alternative to mechanize". I would test PhantomJS, if I had to pick one.
Several others are mentioned in the other answers linked in the comments. I left those as comments to avoid the "SO is not a site for recommending your favorite tool" problem, so only the universal solution is mentioned here. So, please do have a little search ;)
I'm attempting to create a front-end to launch several programs using HTML, CSS, and Javascript. My problem, however, is that the only ways that I can find to run a file either rely on Internet Explorer (Which I am not going to use) or download a new copy of the file.
Basically, I want to click a button or an image (Not 100% sure which one yet) and then run the program at the specified location. This isn't actually being hosted on a webserver; I'm just doing it because I make crappy GUIs in other languages, and HTML is comparatively easy.
This is also on Windows 7, if that has relevance.
I did something similar last year and used NW.js for it. It's a WebKit browser with integrated Node.js functions.
It has its own executable and has access to your filesystem through node.
It was fun and easy to use, maybe give it a try.
If running your HTML files in a browser using the file:// protocol isn't an option, choose the technology you're the most comfortable with and look for a way to display webpages through it.
For instance you can have a look at:
WPF WebBrowser component (if you know a bit about .Net)
Java FX2 WebView (if you're more of a java guy)
etc...
I need to mimic the behavior of a browser. For example, I need to access the DOM properties of a webpage (e.g. document.cookie or window.onblur) without loading the webpage in an actual browser and without interacting with the browser (e.g. clicking a button, putting the mouse over a link, etc.).
In fact, I am trying to do something where I have an imaginary browser object BROWSER, so that I can do:
BROWSER g = BROWSER.load('google.com');
g.document.cookie();
g.window.onblur();
I guess this is known as 'browser instrumentation'. How can I do it? Any ideas?
Your best bet is probably Selenium (http://seleniumhq.org/). Although it does use a real browser (which is sort of necessary if you want to test things in a real browser environment), it will allow you to completely automate/control that browser to make it do whatever you want. Using it you can write code like this:
# Pseudo-code to search for Selenium on Google
browser.open('www.google.com')
browser.findElementByCSS('#search').value('Selenium')
browser.findElementByCSS('#submitButton').click()
which sounds like what you are trying to do.
Is this something you're looking for? It's called PhantomJS
From the site:
Full web stack, No browser required
PhantomJS is a headless WebKit with JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.
Run functional tests with frameworks such as Jasmine, QUnit or CasperJS.
I am writing a desktop script in JavaScript for Windows (and also in WSF, using JavaScript and VBScript). It is not for the internet and does not use any browser.
I need a tool for debugging (a free one).
Can someone recommend one?
Thanks
I'm assuming, based on your description, that you are creating WSH scripts. To debug WSH JavaScript, start your script with wscript.exe //D <path to WSH file>. After that, whenever an exception occurs, you will be offered a choice to debug the script with Visual Studio or Microsoft Script Debugger (free). If you just want to step through the code, start your script with wscript.exe //D //X <path to WSH file>; this breaks into the debugger right at the beginning of your script's execution.
More information here
Aptana Studio is a great Eclipse extension and can also debug Javascript
I've heard Firebug Lite could do this? That's probably not what you're looking for still.
From the question, it sounds like you are trying to make an AJAX app that perhaps loads from local javascript + HTML.
That said, if it is OK to use Firefox as the web client, you might try Firebug. It is an excellent javascript debugger. It lets you do usual step / breakpoint things, inspect variables, and display the current page as a DOM model to help see what your jQuery (or Prototype, in my case) queries will find.