Call JavaScript (3rd party library) from Python

I've already searched quite a bit but came to no clear conclusion, as some projects (PyV8) seem to be dead and I'm not sure whether they are suitable at all. The 3rd-party lib requires a DOM, e.g. a container element in which it runs. It also uses WebAssembly and is in general pretty heavy.
I'm not sure whether libs like PyV8 would actually be suitable for that. Another approach would be to go with Selenium and headless Chrome, or a local Node.js service, but both of these sound very heavy. Also, the lib must work on Windows (that's simply company policy: Windows servers), so PyMiniRacer is out.
What are my other options?

Consider taking a look at this post: How do I call a JavaScript function from Python?
However, if your objective is to access JS code in a webpage for reasons such as web scraping, you could also consider using Selenium WebDriver + Python to do so. Take a look at this medium.com post: How to Run JavaScript in Python | Web Scraping | Web Testing
Other Resources:
https://www.quora.com/How-do-we-use-JavaScript-with-Python
Python to JS: https://pypi.org/project/javascripthon/
P.S.: I am not sure if this would help you, but there is another library, PyExecJS, which is no longer maintained; I think you have looked it up already.
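If it still covers your needs, basic PyExecJS usage looks something like this (a minimal sketch; it delegates to whatever JS runtime it finds on the machine, which on Windows can be the built-in JScript host, and it provides no DOM, so it won't run your DOM/WebAssembly-heavy library):

import execjs

# Compile a JS snippet and call into it; PyExecJS shells out to an
# installed JS runtime (Node, JScript on Windows, etc.).
ctx = execjs.compile("""
function add(a, b) { return a + b; }
""")
print(ctx.call("add", 1, 2))                        # -> 3
print(execjs.eval("'red yellow blue'.split(' ')"))  # -> ['red', 'yellow', 'blue']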

Related

JavaScript basics/instructional

I have just begun giving JavaScript a try on a programmer friend's suggestion (though I have used it to some extent before with websites using jQuery without fully understanding the intricacies of the language itself) and was wondering how certain features work.
Question:
What other uses of JavaScript are there aside from websites (by itself and/or with jQuery framework)? Is there a difference between making a site interactive and an actual "web app"? I've heard the term and don't know the difference.
To add to my main question:
I've read and done some tutorials on prototyping, but I'm not sure about its actual application (in the little I've dabbled in, making websites more interactive, I've never seen prototypes used). Can someone link a website with extensive, heavy JavaScript use so I can check it out?
Gist:
In essence, what I'm trying to understand is where heavy use of JavaScript comes into play, because so far all the JavaScript I've experienced is superficial.
Appreciate in advance any advice/help in this matter!
What other uses of JavaScript are there aside from websites (by itself and/or with the jQuery framework)?
Check out Node.js. It runs JavaScript programs outside the browser. Node ships with modules for things like file system I/O, socket I/O, and much more. I personally use it for a WebSocket server because I don't like the alternatives.
Further, JavaScript has native support in many operating systems, including every Windows version from XP onward (I'm not sure about earlier ones). Windows Script Host runs .js files as JScript, and these can be used to achieve many of the things Node can. It's a convenient way to do things batch files can't.
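If you ever need to drive such a script from another program, it is just a subprocess call. A rough Python sketch (hello.js is a hypothetical JScript file that prints with WScript.Echo; cscript.exe is assumed to be on the PATH):

import subprocess

# Run a JScript file through Windows Script Host and capture its stdout.
# //NoLogo suppresses the WSH banner; "hello.js" is just an example name.
result = subprocess.run(
    ["cscript", "//NoLogo", "hello.js", "some-argument"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)  # whatever the script wrote with WScript.Echo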

Scraping dynamically generated HTML inside an Android app

I am currently writing an Android app that, among other things, uses text information from websites which I do not own. In addition, some of the pages require authentication.
For some pages I have been able to log in and retrieve the HTML using BasicNameValuePair objects and an HttpClient with its associated objects.
Unfortunately, these methods retrieve the webpage source without running any JavaScript functions that a browser (even Android's WebView) would normally run. I need the text that some of these scripts are retrieving.
I've done my research, but everything I've found is guesswork and extremely confusing. I'm okay with ignoring pages that require login for now. Also, I am willing to post any code that may be useful for constructing a solution; it is an independent project.
Any concrete solutions for scraping the HTML result of JavaScript calls? An example would be absolutely top-notch.
Final Success:
Rhino. Used this jar file.
Other Things I Tried:
HttpClient provided by Android
Cannot run JavaScript.
HtmlUnit
4 hours, no success. Also huge; it added 12 MB to my APK.
SL4A
Finally compiled. Used THIS guide to set up. Abandoned as overkill for a simple Rhino jar.
Things That Might Work:
Selenium
Further results will be posted. Other results will be added if posted.
Note: many of the options listed above reference each other. I think Rhino is included in both SL4A and HtmlUnit. Also, I think HtmlUnit contains Selenium.
The aforementioned solutions are very slow and restrict you to one URL (well, not really, but I dare you to scrape ten URLs with Rhino while your user is impatiently waiting for results).
An alternative is to use a cloud scraping solution. You get the benefit of not wasting phone bandwidth on downloading content you won't use.
Try this solution: Bobik Java SDK
It gives you the ability to scrape up to hundreds of sites in a matter of seconds.

Emulate javascript:__doPostBack in Python, web scraping

Here LINK it is suggested that it is possible to "figure out what the JavaScript is doing and emulate it in your Python code". That is what I would like help doing, i.e. my question: how do I emulate javascript:__doPostBack?
Code from a website (full page source here LINK):
<a style="color: Black;" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$gvSearchResults','Page$2')">2</a>
Of course I basically have no idea where to go from here.
Thanks in advance for your help and ideas.
OK, there are lots of posts asking how to CLICK a JavaScript button when web scraping with Python libraries (mechanize, BeautifulSoup, and similar). I see a lot of "that is not supported" responses suggesting THIS non-Python solution. I think a Python solution to this problem would be of great benefit to many, so I am not looking for answers such as "use x, y, or z" that are not Python code or require interacting with a browser.
The mechanize page is not suggesting that you can emulate JavaScript in Python. It is saying that you can change a hidden field in a form, thus tricking the web server into believing that a human[1] has selected the field. You still need to analyse the target yourself.
There will be no Python-based solution to this problem, unless you wish to create a JavaScript interpreter in Python.
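That said, for __doPostBack specifically, the analysis is often mechanical: the function just copies its two arguments into the hidden __EVENTTARGET and __EVENTARGUMENT fields and submits the form, so you can replay that POST yourself. A rough sketch with requests and BeautifulSoup (the URL is a placeholder, and the hidden-field handling assumes a typical ASP.NET page):

import requests
from bs4 import BeautifulSoup

url = "http://example.com/SearchResults.aspx"  # placeholder for the real page
session = requests.Session()
soup = BeautifulSoup(session.get(url).text, "html.parser")

# Copy every hidden input (__VIEWSTATE, __EVENTVALIDATION, ...) as-is.
data = {inp["name"]: inp.get("value", "")
        for inp in soup.select("input[type=hidden]") if inp.get("name")}

# These two values come straight from the link in the question.
data["__EVENTTARGET"] = "ctl00$ContentPlaceHolder1$gvSearchResults"
data["__EVENTARGUMENT"] = "Page$2"

page2 = session.post(url, data=data)  # page2.text is page 2 of the results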
My thoughts on this problem have led me to three possible solutions:
create an XULRunner application
browser automation
attempt to interpret the client-side code
Of those three, I've only really seen discussion of 2. I've seen something close to 1 in a commercial scraping application, where you basically create scripts by browsing sites and selecting the things on the pages that you would like the script to extract in the future.
1 could possibly be made to work with a Python script by accepting a serialisation (JSON?) of WSGI Request objects, getting the app to fetch the URL, then sending the processed page back as a WSGI Response object. You could possibly wrap some middleware around urllib2 to achieve this. Overkill probably, but kind of fun to think about.
2 is usually achieved via Selenium RC (Remote Control), a testing-centric tool. It provides a few methods like getHtmlSource, but most people I've heard from who use it don't like its API.
3 I have no idea about. node.js is very hot right now, but I haven't touched it. I've never been able to build SpiderMonkey on my Ubuntu machine, so I haven't touched that either. My hunch is that in order to do this, you would provide the HTML source and your details to a JS interpreter, which would need to fake being your User-Agent etc. in case the JavaScript wanted to reconnect with the server.
[1] Well, more technically, a JavaScript-compliant User-Agent, which is almost always a web browser used by a human.

Scraping websites with JavaScript enabled?

I'm trying to scrape and submit information to websites that rely heavily on JavaScript for most of their actions. The websites won't even work when I disable JavaScript in my browser.
I've searched for some solutions on Google and SO, and someone suggested I should reverse engineer the JavaScript, but I have no idea how to do that.
So far I've been using Mechanize, and it works on websites that don't require JavaScript.
Is there any way to access websites that use JavaScript by using urllib2 or something similar?
I'm also willing to learn JavaScript, if that's what it takes.
I wrote a small tutorial on this subject; it might help:
http://koaning.io.s3-website.eu-west-2.amazonaws.com/dynamic-scraping-with-python.html
Basically, you have the Selenium library drive a Firefox browser; the browser waits until all JavaScript has loaded before handing you the HTML string. Once you have this string, you can parse it with BeautifulSoup.
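A minimal sketch of that flow (the URL is a placeholder; assumes Selenium and a Firefox driver are installed):

from selenium import webdriver
from bs4 import BeautifulSoup

# Let a real browser execute the page's JavaScript, then parse the result.
driver = webdriver.Firefox()
try:
    driver.get("http://example.com/dynamic-page")  # placeholder URL
    html = driver.page_source  # the DOM after scripts have run
finally:
    driver.quit()

soup = BeautifulSoup(html, "html.parser")
print(soup.title)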
I've had exactly the same problem. It is not simple at all, but I finally found a great solution, using PyQt4.QtWebKit.
You will find the explanations on this webpage: http://blog.motane.lu/2009/07/07/downloading-a-pages-content-with-python-and-webkit/
I've tested it, I currently use it, and it's great!
Its great advantage is that it can run on a server, using only X, without a graphical environment.
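The core of the trick from that post looks roughly like this (a sketch, assuming PyQt4 is installed; see the linked article for the full version):

import sys
from PyQt4.QtCore import QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebPage

# Load a URL in a headless WebKit page and keep the rendered HTML.
class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._load_finished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()  # blocks until _load_finished calls quit()

    def _load_finished(self, ok):
        self.html = self.mainFrame().toHtml()  # DOM after JS has run
        self.app.quit()

page = Render("http://example.com")  # placeholder URL
print(page.html[:200])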
You should look into using Ghost, a Python library that wraps the PyQt4 + WebKit hack.
This makes g the WebKit client:
import ghost
g = ghost.Ghost()
You can grab a page with g.open(url) and then g.content will evaluate to the document in its current state.
Ghost has other cool features, like injecting JS and some form filling methods, and you can pass the resulting document to BeautifulSoup and so on: soup = bs4.BeautifulSoup(g.content).
So far, Ghost is the only thing I've found that makes this kind of thing easy in Python. The only limitation I've come across is that you can't easily create more than one instance of the client object, ghost.Ghost, but you could work around that.
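Putting those pieces together, a minimal sketch (the URL is a placeholder; Ghost's API has changed across versions, so this follows the older interface described above):

import ghost
import bs4

g = ghost.Ghost()
page, resources = g.open("http://example.com")  # placeholder URL
# g.content is the document after WebKit has executed the page's JavaScript.
soup = bs4.BeautifulSoup(g.content)
print(soup.title)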
Check out crowbar. I haven't had any experience with it, but I was curious about the answer to your question so I started googling around. I'd like to know if this works out for you.
http://grep.codeconsult.ch/2007/02/24/crowbar-scrape-javascript-generated-pages-via-gecko-and-rest/
Maybe you could use Selenium WebDriver, which has Python bindings I believe. I think it's mainly used as a tool for testing websites, but I guess it should be usable for scraping too.
I would actually suggest using Selenium. It's mainly designed for testing web applications from a "user perspective"; however, it is basically a "Firefox" driver. I've actually used it for this purpose, although I was scraping a dynamic AJAX webpage. As long as the JavaScript form has recognizable "anchor text" that Selenium can "click", everything should sort itself out.
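For example, clicking such a link by its visible text looks roughly like this (placeholder URL and anchor text; uses the older find_element_by_* API):

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://example.com/ajax-page")  # placeholder URL
# Clicking by link text fires the link's JavaScript handler; in practice
# you may also need an explicit wait for the AJAX response to land.
driver.find_element_by_link_text("Next Page").click()
html = driver.page_source  # the DOM after the AJAX update
driver.quit()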
Hope that helps

Writing a non-GUI bot using Mozilla Framework

I'm looking for a way to write a non-GUI bot using the Mozilla framework. The bot should be able to work like a normal browser (automatically download relevant JS files, make XMLHttpRequests, run JS operations, modify the DOM), except that no GUI will be needed.
I wonder if it is possible to build XULRunner without X or GTK/KDE (without any GUI dependencies), as I will run the bot on a FreeBSD 6.4 server.
It may sound a bit weird, but I need a bot with the capacity to operate like a browser: run JS, modify the DOM, and submit forms, all in a non-GUI environment.
I've looked into other browsers and engines such as Lynx, Links, Hulahop, Chrome's V8 engine, and WebKit's JavaScriptCore, but have yet to find what I need.
It's part of a school project, a thesis. We will use it to observe price changes of budget airlines, and after a year of data collection we need to deduce their pricing strategy and customer behavior. It is a serious final-year project.
Any hint or help is greatly appreciated! Thank you in advance!
Regards.
You should be able to make progress with selenium. It's a record/test/play tool but its core is manipulating the DOM.
Update from Grundlefleck's comment: As for launching the actual tests, there is Selenium Remote Control, which allows you to write your tests in Java, Ruby, plain HTML, and other possible drivers.
Yes, it is possible (but it might very well require LOTS of code changes).
No, I do not know any of the details.
I would not recommend this approach for your purposes. From your comment, it sounds like you are trying to scrape webpages. If you really need to run JavaScript, you can use a stand-alone JavaScript engine (Mozilla's is available here). Otherwise, I would use Beautiful Soup with Python, or Twill. You might also want to read this question.
