Is there any way to get the executed javascript contents from a webpage?
I have tried requests + BeautifulSoup and mechanize, but these give me only the "source code" of the webpage, not the result of executing its JavaScript.
For example, this website :- http://listen.tidal.com/login
As you can see, the source code contains unexecuted JS, but when you inspect the element, you'll see the executed code.
Now, is there any way I could get that EXECUTED code in python?
Hints please, because I have tried emulating a browser using mechanize and it behaves the same as requests.
Thank You
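To make the difference concrete, here is a minimal sketch using only the standard library. The page in PAGE_SOURCE is a made-up example (not the Tidal page), and TagCollector and summarize are hypothetical helper names. Parsing the raw source finds the script text, but not the element the script would create once a browser runs it:

```python
from html.parser import HTMLParser

# Hypothetical page: the <p> element only exists after a browser runs the script.
PAGE_SOURCE = """
<html><body>
<div id="app"></div>
<script>
  document.getElementById("app").innerHTML = "<p class='greeting'>Hello</p>";
</script>
</body></html>
"""


class TagCollector(HTMLParser):
    """Record every start tag and the raw text found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.script_text = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies arrive as plain data; nothing in them is executed.
        if self._in_script:
            self.script_text.append(data)


def summarize(html):
    """Return (list of start tags, concatenated script text) for an HTML string."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags, "".join(collector.script_text)


if __name__ == "__main__":
    tags, script = summarize(PAGE_SOURCE)
    print("tags in the source:", tags)
    print("is there a <p> element?", "p" in tags)
```

This is what requests, BeautifulSoup and mechanize all see: the script is just text, so anything it would add to the page is missing.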
Executing JavaScript requires a JavaScript engine. Python has its own interpreter, which executes Python code only; these are two different technologies. So if you want to execute JavaScript from Python, Python needs an API or some sort of binding that talks to an engine that executes JavaScript. Fortunately, Python can interoperate with several JS engines for web-related work (testing etc.). The options fall into two groups:
Browsers without a graphical user interface (GUI), aka headless browsers: e.g. PhantomJS, a headless browser based on the WebKit rendering engine, or SlimerJS, a headless browser based on the Gecko rendering engine; for more see here. You can drive PhantomJS through selenium (a glue layer between Python and PhantomJS), or you can use PyQt and run JS from Python like here.
Browsers with a graphical user interface (GUI): e.g. Firefox, Chromium, Safari etc. In this case too, you can execute JS through the selenium Python bindings.
A simple example of executing JS with selenium in Python:
from selenium import webdriver
# Define the driver: Firefox, Chrome, PhantomJS etc.
driver = webdriver.Firefox()
# Open the URL
driver.get('https://www.google.com')
# See a simple JavaScript alert being executed
driver.execute_script("alert('hello world');")
# Close the driver, i.e. close the opened Firefox instance
driver.close()
Just to highlight: Python doesn't execute your JS code; a JavaScript runtime does.
Here is an example of a Python module that picks an available runtime and evaluates the code for you.
Look at PyExecJS, where you can find some examples, but take into account that the runtime might not provide any browser APIs such as the DOM or the HTML5 APIs. What you get depends mostly on the capabilities of the JS engine.
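The "pick a runtime and let it evaluate the code" idea can be sketched without PyExecJS, using only the standard library. This is not PyExecJS's API; find_js_runtime and eval_js are hypothetical helpers, and the sketch assumes a Node.js executable is on PATH (it relies on Node's -p flag, which evaluates an expression and prints the result):

```python
import shutil
import subprocess


def find_js_runtime(candidates=("node", "nodejs")):
    """Return the path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def eval_js(expression, runtime_path=None, timeout=10):
    """Evaluate a JavaScript expression in an external runtime; return its printed result."""
    runtime_path = runtime_path or find_js_runtime()
    if runtime_path is None:
        raise RuntimeError("no JavaScript runtime found on PATH")
    # Assumption: the runtime is Node.js, whose -p flag evaluates and prints.
    completed = subprocess.run(
        [runtime_path, "-p", expression],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    if find_js_runtime() is None:
        print("No JavaScript runtime available; nothing to evaluate.")
    else:
        print(eval_js("1 + 2"))
        # A bare engine has no browser APIs, so there is no document object.
        print(eval_js("typeof document"))
```

The second call illustrates the caveat above: the engine runs the language, but the DOM is simply not there.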
Another big question: why do you need to evaluate the code in Python?
Yes, you have to pick a tool that supports JavaScript content, something other than mechanize. Mechanize only handles static content, as you have already observed. There are several such tools, which you can find by searching for "python alternative to mechanize". If I had to pick one, I would try PhantomJS.
Several others are mentioned in the other answers linked in the comments. I left those as comments to avoid the "SO is not a site for recommending your favorite tool" problem, so only the universal solution is mentioned here. So, please do have a little search ;)
I'm learning Javascript, and I keep seeing this phrase being used when speakers are explaining NodeJS and the V8 engine in relation to Javascript. I know because of Node, we can run JS "outside of the browser," but what does that mean? Oftentimes, instructors use terms I don't know yet (like "server-side scripting") to explain, which makes it difficult to understand.
From what I know so far, it means that you can use JS like a back-end language..? Whereas "running inside a browser," means you can only code visual/front-end interactions, things only users see. Am I correct?
Yes, you are more or less on the right track. The V8 engine is what powers both Chrome and Node.js. When used with Node.js, it provides a runtime environment that allows JS code to run "outside the browser". Traditionally, the place JavaScript ran was in a browser that implemented support for ECMAScript (all major browsers do). Then along came Node.js, which took the V8 engine out of Chrome and combined it with libuv, a C library for asynchronous I/O, to create a runtime that successfully let JavaScript run without a browser. With Node.js (and Deno, if you are curious) we can implement a variety of "back-end", server-side code. Think of anything you could program in another language, now available to be programmed in JavaScript. Node.js and Express.js are great for running web servers, microservices, and many other services independent of a browser runtime environment.
Originally, the ability to process the JavaScript language was something that Netscape added to their browser so that it could process not only HTML and CSS, but also JavaScript. Microsoft followed suit and so did all the other browsers. That's what we call running JavaScript within the browser.
But (as an example) Node.js is a command line runtime that includes the core JavaScript runtime, but extracted from the Chrome browser itself. So, with Node, you have the ability to write and run JavaScript, but you aren't running it inside of a browser and therefore you don't have the Document Object Model (DOM) or the Browser Object Model (BOM) available to you because those are APIs that are native to browsers.
But you do have the core JavaScript language and can create applications that do all sorts of non-browser tasks.
You'll find that many other development environments and tools have similar set ups, where the core JavaScript runtime has been added so that you can develop with JavaScript, but outside of a browser.
I have been developing in Node.js for some time now.
Today, I came across this article
Introduction to the JavaScript shell - Mozilla | MDN
It talks about the JavaScript shell and goes on to say that it can execute JavaScript programs from a file as well.
I was able to research and understand V8 and SpiderMonkey.
I want to know the difference between Node.js and the javascript shell talked about in this article since it says that the shell can execute javascript programs on its own.
Do they differ only in that Node.js uses the V8 engine while the other uses SpiderMonkey?
If so, why is Node.js so popular for writing server-side JavaScript?
I couldn't find exactly what I was looking for on the Internet. Google showed me either the difference between SpiderMonkey and V8 or forum threads on the "difference between JavaScript and Node.js", and since I am a new developer it's really hard for me to understand.
Can SpiderMonkey be used to achieve the same?
JavaScript is a language.
node.js is not a language or a special dialect of JavaScript - it's just a thingamabob that runs normal JavaScript.
All browsers have JavaScript engines that run the JavaScript of web pages. Firefox has an engine called SpiderMonkey, Safari has JavaScriptCore, and Chrome has an engine called V8.
Node.js is simply the V8 engine bundled with some libraries to do I/O and networking, so that you can use JavaScript outside of the browser, to create shell scripts, backend services or run on hardware (https://tessel.io/).
Credits : https://www.quora.com/What-is-the-difference-between-JavaScript-and-Node-js
I hope that helps clear up the basic difference between them. The specifics you asked about are not answered here.
Node.js enables JavaScript to be used for server-side scripting, and
runs scripts server-side to produce dynamic web page content before
the page is sent to the user's web browser.
Source: https://en.wikipedia.org/wiki/Node.js
Obviously, the shell cannot serve HTML web pages by itself.
In addition, Node.js uses asynchronous, non-blocking I/O, meaning a single process can serve many requests concurrently.
I don't have a lot of hope for this one, but I have to ask. I am hoping, for didactic purposes, to come up with some means by which a student could load a simple javascript program into a browser, and have it interact with them in the old-fashioned command line manner, where it prints a line and then reads a line of input. This works fine if you use prompt(), but the fact that it creates popups is aesthetically annoying, and today's browsers cheerfully volunteer to stifle the scripts which overuse it. The problem is, prompt() appears to be the only way browser Javascript has of actually pausing a script to wait for input. If we avoid it, that throws us immediately into having to deal with a real-time GUI event input model.
I've been looking for a way to fake it -- to set up some kind of environment in which it is possible for a Javascript method to wait for input and then return when it's given. The best possibility I've got so far is to connect it to a Java applet, but the java applet brand is kind of poisoned now and I doubt people would want to install the plugin. Could there be another way? Worker threads? A browser add-on? Some server-side trick? Does anyone have an idea?
This question is now years old... I wonder if the addition of promises and async/await to Javascript makes this any more possible now?
...I sort of got it to work. If the read-eval-print loop is in an async function and uses await for the line that reads input, you can indeed make an old fashioned linear read-eval-print loop in an event-driven browser environment. But I don't think you can make the main thread wait on input without an async function, except with the grandfathered prompt method (which they're now getting ready to deprecate).
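The shape of that trick can be sketched as follows. This is an analogy in Python's asyncio, not browser JavaScript: the loop reads top to bottom like an old-fashioned console program, but each read is an await that suspends until an input event arrives. The queue stands in for the page's input events, and repl, main and the canned inputs are all hypothetical:

```python
import asyncio


async def repl(read_line, write_line):
    """Linear read-print loop; every read suspends until input is delivered."""
    while True:
        write_line("> ")
        line = await read_line()  # control returns to the event loop here
        if line == "quit":
            write_line("bye")
            return
        write_line(f"you typed: {line}")


async def main():
    inbox = asyncio.Queue()  # stands in for the browser's input events
    output = []              # stands in for the on-page console

    async def feed():
        # Simulate a user typing two lines a little later.
        for text in ("hello", "quit"):
            await asyncio.sleep(0.01)
            await inbox.put(text)

    await asyncio.gather(repl(inbox.get, output.append), feed())
    return output


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Nothing blocks while the loop waits, which is why this works in an event-driven environment where a truly blocking read is not available.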
The technical term for what you want is a REPL: a Read-Evaluate-Print-Loop. There are many REPLs out there.
The Best Browser-Based Solution
Use Google Chrome's inbuilt console! It's a Javascript REPL that can also let you interact with a web page. You can access it by using Ctrl+Shift+I on Windows and Unix inside Chrome, or the equivalent command on a Mac (Google it!).
To load a file and play with it, all your students need to do is create a directory structure like this:
project
|--> index.html
|--> javascript.js
Make sure index.html has a script tag that points to javascript.js, and then open index.html with Chrome. Voila! You have loaded a Javascript file, and can now play around with it in Chrome.
It's the best solution because you get the full power of the DOM, you can do REPL stuff, and it is virtually painless: everyone uses Chrome, and your students can even go home and mess around with it on their own.
A Browser-Based Alternative
Rather than reinvent the wheel, you can also use Repl.it.
It's a browser-based Javascript REPL website that supports inserting an arbitrary Javascript program, and interacting with its contents. This is the closest you'll probably get to meeting your requirements - it'll be unable to interact with the DOM (obviously), but it'll more than work.
Non-Browser-Based Alternative
If the requirement for a web-based solution can be relaxed, simply using Node.js' inbuilt REPL on a terminal can be more than sufficient.
You could install Node on your lab's computers, and have people play with Javascript in that capacity. There'll be no DOM, but you could certainly have them write functions and algorithms to solve simple problems. Plus, it'd be a good way to introduce them to the fact that Javascript is no longer client-side-bound.
The fact that you'll be interacting with Javascript in an actual terminal, rather than an emulated one in the browser, is another neat bonus.
If I've Misunderstood...
Some of the question comments make it sound like you want a way to be able to interact with terminal utilities using Javascript from a browser. If that is the case, this is impossible.
There is no way for Javascript to evaluate, parse or do anything on the command line that isn't written using Javascript. You cannot expect the equivalent of a bash ls using a browser-based solution - that's because browsers don't have access to your underlying filesystem, which is a good thing. You cannot run sed, awk, grep, etc. for the same reason - Unix utilities are inaccessible to a browser. There are ways to run Unix utilities using Node, of course, but then you will be teaching them how to use Node, rather than how to play with the Unix console.
If all you want, however, is a way to SSH from a browser into a common environment, there are certainly browser-based ways to do that. FireSSH is a Firefox plugin (now also ported to Chrome) that lets people SSH into a common server. They can then do ls, vim, etc. and have it run in the server, with the results piped back to their browser screen. You'll have to think carefully about security in this case, of course, but I think simply giving people user permissions for this server should more than suffice.
Note that FireSSH doesn't use Javascript to parse or do anything - all it is doing is relaying commands you type to a server, having the server execute those commands remotely, and then piping the results back to your screen.
Alternatives to Prompt
I added this after understanding OP's requirements in more detail.
This is a question that has been asked before. I am fond of library solutions, and in 2013, iocream.js was developed for just this sort of browser functionality. You can embed it in a page, and use the jin function to assign values.
If going with a Node.js solution, by far the best approach is to make use of the prompt library. I personally find it very useful for embedding within Node.js applications.
SpiderMonkey is Mozilla's C++-based Javascript engine, and supports a function called readline(). Unfortunately, there doesn't appear to be a mainstream browser implementation.
I am looking for a JavaScript debugger that is itself written in JavaScript.
The situation is as follows: I am making a JS game engine. The AI scripts as well as various other actions are implemented in JS. It is possible, from the engine's developer mode, to edit this code from the browser itself (using Ace).
Now I want to add debugging capabilities. Mostly I am looking for breakpoints with step into/step over support.
I couldn't find any such library. The best I could find is the outdated debug-js project.
Note that this debugger is intended for developers who are building games using my engine. This happens from inside the browser. The engine is in JS. The debugger should be in JS as well. I want full control over these debugging features, so I can't just use the browser's debugger.
For example, if you type the ID of a character in an AI script, I highlight this character. This is the kind of thing I can't provide if I edit the scripts in the browser's debugger, but that I can do from Ace running in the page.
Esprima looks like an interesting starting point. It makes it possible to instrument JavaScript code. This JavaScript execution visualization looks particularly promising.
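To show what "instrumenting" code through its syntax tree means, here is a rough analogy in Python using the standard ast module rather than Esprima. It is not the Esprima API and not a full debugger; StepInstrumenter, run_with_steps and the __step__ hook name are hypothetical. A call to a hook is inserted before every statement, and that hook is where a real debugger would check breakpoints or pause for stepping:

```python
import ast


class StepInstrumenter(ast.NodeTransformer):
    """Insert a call to __step__(lineno) before every statement."""

    def generic_visit(self, node):
        super().generic_visit(node)  # instrument nested blocks first
        for field in ("body", "orelse", "finalbody"):
            statements = getattr(node, field, None)
            # Expression nodes such as lambdas also have a "body", but it is not a list.
            if isinstance(statements, list) and statements:
                setattr(node, field, self._with_hooks(statements))
        return node

    @staticmethod
    def _with_hooks(statements):
        hooked = []
        for stmt in statements:
            call = ast.Call(
                func=ast.Name(id="__step__", ctx=ast.Load()),
                args=[ast.Constant(value=stmt.lineno)],
                keywords=[],
            )
            hooked.append(ast.copy_location(ast.Expr(value=call), stmt))
            hooked.append(stmt)
        return hooked


def run_with_steps(source, on_step):
    """Run source with on_step(lineno) called before each statement executes."""
    tree = StepInstrumenter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {"__step__": on_step}
    exec(compile(tree, "<script>", "exec"), namespace)
    return namespace


if __name__ == "__main__":
    script = "x = 1\nfor i in range(2):\n    x += i\n"
    run_with_steps(script, lambda lineno: print("about to run line", lineno))
```

With Esprima the idea would be the same: parse the script, rewrite the tree so each statement reports to the engine first, then regenerate and run the code.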
I want to use Mechanize to simulate browsing to a web page with active JavaScript, including DOM Events and AJAX, and so far I've found no way to do that.
I looked at some Python client browsers that support JavaScript like Spynner and Zope, and none of them really work for me. Spynner crashes PyQt all the time, and Zope doesn't support JavaScript as it seems.
Is there a way to simulate browsing with Python only (no extra processes) like WATIR or libraries that manipulate Firefox or Internet Explorer while supporting Javascript fully as if actually browsing the page?
I've played with this new alternative to Mechanize (which I love) called PhantomJS.
It is a full WebKit browser like Safari or Chrome, but it is headless and scriptable. You script it with JavaScript, not Python (as far as I know at least).
There are some example scripts to get you started. It's a lot like using Firebug. I've only spent a few minutes using it, but I found I was quite productive right from the start.
From http://wwwsearch.sourceforge.net/mechanize/faq.html#general
If you come across this in a page you want to automate, you have four options. Here they are, roughly in order of simplicity.
1. Figure out what the JavaScript is doing and emulate it in your Python code: for example, by manually adding cookies to your CookieJar instance, calling methods on HTMLForms, calling urlopen, etc. See above re forms.
2. Use Java’s HtmlUnit or HttpUnit from Jython, since they know some JavaScript.
3. Instead of using mechanize, automate a browser instead. For example use MS Internet Explorer via its COM automation interfaces, using the Python for Windows extensions, aka pywin32, aka win32all (e.g. simple function, pamie; pywin32 chapter from the O’Reilly book) or ctypes (example). This kind of thing may also come in useful on Windows for cases where the automation API is lacking. For Firefox, there is PyXPCOM.
4. Get ambitious and automatically delegate the work to an appropriate interpreter (Mozilla’s JavaScript interpreter, for instance). This is what HtmlUnit and httpunit do. I did a spike along these lines some years ago, but I think it would (still) be quite a lot of work to do well.
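As a rough illustration of the first option (emulating what the JavaScript does), here is a sketch that adds a cookie by hand. It uses the standard library's http.cookiejar and urllib rather than mechanize itself, and the cookie name, value and domain are made up. In practice you would first work out, from the page's script or the browser's network tab, which cookie the JavaScript sets:

```python
import urllib.request
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain, path="/"):
    """Build a session cookie by hand, as the page's JavaScript would have set it."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookie_header_for(url, jar):
    """Return the Cookie header the jar would attach to a request for url."""
    request = urllib.request.Request(url)
    jar.add_cookie_header(request)
    return request.get_header("Cookie")


# Hypothetical values: pretend the site's script sets session_token=abc123.
jar = CookieJar()
jar.set_cookie(make_cookie("session_token", "abc123", "example.com"))

# An opener built like this sends the jar's cookies with every request it makes.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

if __name__ == "__main__":
    print(cookie_header_for("http://example.com/page", jar))
```

No network access happens until you call opener.open(...); the helper above only shows which header would be sent.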
Basically, if you want something that deals with JavaScript, you need a real JavaScript engine, and that invariably involves automating a real browser (I'm including headless ones in this).
Java’s HtmlUnit doesn't do a very good job, as it doesn't use a JavaScript engine from an actual browser. PhantomJS sounds ideal (as newz2000 points out); however, I find that when manipulating pages with JavaScript it can be very difficult to debug your script if you can't actually see the page you're dealing with.
This leads to solutions such as Selenium WebDriver, which has a full Python API to automate various browsers. However, you must run a Java jar and it actually launches the browser, so it is not the pure-Python solution you're after (but I think this is as close as you can get).
You can use Selenium with Python. You can then scrape JavaScript-generated content as well as manipulate the page with additional JavaScript (as well as Python).
# In your virtualenv: pip install selenium
from selenium import webdriver
# Launch Firefox GUI
browser = webdriver.Firefox()
# Alternatively, you can drive PhantomJS without a GUI
# With Node.js installed: `npm install -g phantomjs`
# browser = webdriver.PhantomJS()
# Fetch a webpage
browser.get('http://example.com')
# If you need the whole HTML document
# just like inspecting the rendered page with the console
html = browser.page_source
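# Get an element, even if it was created with JS
# (the find_element_by_* helpers were removed in Selenium 4.3; use find_element with By)
from selenium.webdriver.common.by import By
button = browser.find_element(By.CSS_SELECTOR,
                              'div.some-class > input.the-submit-button')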
# Click on something
button.click()
# Execute some JavaScript (assumes jQuery is loaded on the page)
browser.execute_script("$('html, body').animate({ scrollTop: 500 }, 50);")
You can run the code in a Python REPL and use autocomplete to discover the methods available on browser or whatever element you have selected. Or do something like print(dir(browser)) to see what is available.
An example of how to use PyV8 to run JS on a DOM with Python can be found here:
https://github.com/buffer/thug
It should be fairly easy to make this run together with mechanize.