I would like to set up a simple web browser that downloads an HTML page, parses it, generates a DOM and executes the JavaScript code. I would like to know if there is a simple project (so not Firefox, which is good but too big to just understand this piece of logic) showing whether this is the right way to handle it, or if someone could explain what I am missing. No particular language (but preferably Python, C#, C++ or C). I am stuck now at integrating the JavaScript engine; I don't know what to do.
Thx
I don't think it's easy to write a JavaScript engine on your own. You could however use an open source engine (like WebKit's JS engine, for example) and integrate it into your project.
More info:
http://www.webkit.org
Chromium, the open source project behind Google Chrome, also has a neat JavaScript engine, V8.
http://code.google.com/chromium/
http://code.google.com/p/v8/
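To make the "download, parse, build a DOM, then hand scripts to an engine" flow from the question concrete, here is a minimal sketch using only the Python standard library. It is not how a real browser does it: the Node and DomBuilder classes and the parse_html, inline_scripts and load_page helpers are names I made up for illustration, the DOM is just a tag tree with text, and the engine is left as a run_script callback you would have to supply yourself (for example by wrapping an embedded engine).

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Elements that never have a closing tag, so we must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """A very small stand-in for a DOM element (hypothetical, for illustration)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class DomBuilder(HTMLParser):
    """Builds a tree of Node objects from the parser's tag events."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag and close it;
        # stray end tags that match nothing are ignored.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        # Script bodies also arrive here, as raw text.
        self.current.text += data


def parse_html(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def iter_nodes(node):
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def inline_scripts(root):
    """Source text of every <script> element without a src attribute."""
    return [n.text for n in iter_nodes(root)
            if n.tag == "script" and "src" not in n.attrs]


def load_page(url, run_script):
    """Download, parse, then pass each inline script to the supplied engine.

    run_script(source, dom) is an assumption: it stands for whatever
    JavaScript engine binding you end up integrating.
    """
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    dom = parse_html(html)
    for source in inline_scripts(dom):
        run_script(source, dom)
    return dom
```

The hard part is what this leaves out: the engine has to see the DOM as JavaScript objects (document, window and so on), which is exactly the binding work that makes reusing an existing browser engine attractive.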
Another way could be Node.js. It's server-side JavaScript using the V8 engine, so there is no rendering, just pure JavaScript. Maybe that's enough if you do not need the rendering.
http://nodejs.org/
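If pure JavaScript without a DOM really is enough, one low-effort way to try this from Python is to shell out to Node.js. This is only a sketch: it assumes a node executable is on your PATH, and node_command and run_javascript are illustrative names, not part of any library.

```python
import shutil
import subprocess


def node_command(source, node_binary="node"):
    """Command line that asks Node.js to evaluate a string of JavaScript."""
    return [node_binary, "--eval", source]


def run_javascript(source, timeout=10):
    """Run JavaScript in a separate Node.js process and return its stdout.

    There is no document or window here: Node.js has no DOM, so scripts
    that touch the page will fail.
    """
    if shutil.which("node") is None:
        raise RuntimeError("node was not found on PATH")
    result = subprocess.run(
        node_command(source),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the script throws
    )
    return result.stdout
```

Calling run_javascript("console.log(1 + 1)") would give you whatever the script printed as a string, which you then parse on the Python side.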
You might want to use the WebBrowser class from .NET for that purpose.
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.aspx
Related
I am an Android developer and I am new to Java web development, so apologies if my thinking is wrong. In our web app project we have to compare images (GIF/PNG/JPEG) and show the result.
We got a javascript lib (Resemble.js) which will compare the images and will give the result.
Please help me integrate this lib into my web app project. As it's a JS lib, we can only use it in the front end, right? If so, is that the correct way? All our other processing is in the back end. Or else, how can we use a JS lib in the back end?
Or let me know the best way to implement this. We are using React.js for the front end.
To execute JavaScript you need a JavaScript engine. A Java virtual machine is not designed to run JavaScript, so the short answer is: run Resemble.js in the browser's engine on your front end (or on a back-end server that runs JavaScript, like Node.js).
Long answer: there are implementations of JavaScript written in Java that you could use (I don't know whether they support HTML). Take a look at: How can I run JavaScript code at server side Java code?
I am trying to develop an app for Ubuntu Touch. I am using QML integrated with JavaScript.
I know that it is a huge mess to read or write to files in JavaScript when it is embedded in a webpage, but this is not embedded in a webpage so it should be easier right? The Ubuntu documentation is pretty bad right now.
Does anyone know how I can get this done? I want to get it done without using C++, because using QML, JavaScript and C++ together seems like just a big mess. If the only way to do it is using C++ then I guess that's what I will have to do, but I would like to find another way.
You will have to write a wrapper for QFile class that will be exposed to QML code. Here is an example how to do that. I'm not sure if it's outdated but it looks like it should work just fine.
The simplest way to handle data in your Ubuntu Touch app is to actually use SQLite (No surprise there). You can find a really good tutorial on using SQLite with Ubuntu Touch here:
https://askubuntu.com/questions/352157/how-to-use-a-sqlite-database-from-qml
It seems like this is the most efficient way to handle app data in Ubuntu Touch. If you want to be able to write to an actual file, you need to handle it using C++. Check out Kamil's answer for that.
This is part of a project I am working on for work.
I want to automate a Sharepoint site, specifically to pull data out of a database that I and my coworkers only have front-end access to.
I FINALLY managed to get mechanize (in Python) to accomplish this using Python-NTLM, and by patching part of its source code to fix a recurring error.
Now, I am at what I would hope is my final roadblock: Part of the form I need to submit seems to be output of a JavaScript function :| and lo and behold... Mechanize does not support javascript. I don't want to emulate the javascript functionality myself in python because I would ideally like a reusable solution...
So, does anyone know how I could evaluate the javascript on the local html I download from sharepoint? I just want to run the javascript somehow (to complete the loading of the page), but without a browser.
I have already looked into selenium, but it's pretty slow for the amount of work I need to get done... I am currently looking into PyV8 to try and evaluate the javascript myself... but surely there must be an app or library (or anything) that can do this??
Well, in the end I came down to the following possible solutions:
Run Chrome headless and collect the html output (thanks to koenp for the link!)
Run PhantomJS, a headless browser with a javascript api
Run HTMLUnit; same thing but for Java
Use Ghost.py, a python-based headless browser (that I haven't seen suggested anyyyywhere for some reason!)
Write a DOM-based JavaScript interpreter based on PyV8 (Google V8 JavaScript engine) and add this to my current "half-solution" with mechanize.
For now, I have decided to use either Ghost.py or my own modification of the PySide/PyQt WebKit (which is how Ghost works) to evaluate the JavaScript, as apparently they can run quite fast if you optimize them to not download images and disable the GUI.
Hopefully others will find this list useful!
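For the first option in the list, a rough sketch of what "run Chrome headless and collect the HTML output" can look like from Python. It relies on Chrome's --headless and --dump-dom switches; the binary name google-chrome is an assumption (it differs per platform and install), dump_dom_command and rendered_html are illustrative names, and nothing here deals with NTLM authentication, which a SharePoint site would still need.

```python
import subprocess


def dump_dom_command(url, chrome_binary="google-chrome"):
    """Command line that makes headless Chrome print the rendered DOM."""
    return [chrome_binary, "--headless", "--dump-dom", url]


def rendered_html(url, chrome_binary="google-chrome", timeout=60):
    """Return the page's HTML after Chrome has loaded it and run its scripts.

    Assumes chrome_binary is installed and reachable; raises
    CalledProcessError if Chrome exits with an error.
    """
    result = subprocess.run(
        dump_dom_command(url, chrome_binary),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

The returned string can then go to whatever parser you already use with mechanize.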
Well, you will need something that understands both the DOM and JavaScript, so that comes down to a headless browser of some sort. Maybe you can take a look at the Selenium WebDriver, but I guess you already did that. I don't think there is an easy way of doing this without running the stuff in an actual browser engine.
I'm wondering if there is a tool out there that does any javascript code generation. I'm asking because the team I'm on are not web developers. They are VB6 developers.
We are looking at an AJAX, JavaScript/jQuery, JSON, web services model and were wondering if there are any tools that would provide the basics for JavaScript templates (i.e. jQuery AJAX calls). Obviously a tool like this might make the change from VB6 to JavaScript a little easier. It also seems like code generation is a buzzword, so I thought there might be something for JavaScript.
If not, do you think this would be a good tool to work on (for the basics, as they would have to edit and modify to fit the need of the page)? Or do you think it is a waste of time?
Personally I think this is a complete waste of time. Spend a little time teaching your developers JavaScript or go another route. Endless time will be wasted tracking down bugs caused by blindly copying and pasting template code all over the place.
If you feel comfortable in the Java world then you can use GWT as well. That way you code in Java and have the code compiled to JavaScript.
From the GWT SDK documentation:
The GWT SDK provides a core set of Java APIs and libraries that allow you to productively build user interfaces and logic for the browser client. You then compile that source code to JavaScript. All that runs in the end is plain ol' JavaScript in the browser. Oh, and you can mix in and interoperate with JavaScript in your source code as well.
I recently had a similar thought and found this https://learning.divi.space/jquery-function-generator/
It is a jQuery function generator.
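For a feel of how little such a generator involves, here is a tiny sketch that stamps out a jQuery AJAX wrapper from a template. It is written in Python only for illustration; AJAX_TEMPLATE and generate_ajax_function are made-up names, and the generated function's shape (a data argument and an onSuccess callback) is just one assumption about what a team might want.

```python
import json
from string import Template

# "$$" is how string.Template spells a literal "$", so "$$.ajax" becomes "$.ajax".
AJAX_TEMPLATE = Template("""\
function $name(data, onSuccess) {
  $$.ajax({
    url: $url,
    type: $method,
    dataType: "json",
    data: data,
    success: onSuccess
  });
}
""")


def generate_ajax_function(name, url, method="GET"):
    """Return JavaScript source for a small jQuery AJAX wrapper.

    The identifier check uses Python's rules, which is only a rough
    approximation of what JavaScript accepts.
    """
    if not name.isidentifier():
        raise ValueError(f"not a usable function name: {name!r}")
    return AJAX_TEMPLATE.substitute(
        name=name,
        url=json.dumps(url),        # json.dumps gives a safely quoted JS string
        method=json.dumps(method),
    )
```

The generated code still has to be read, edited and debugged by someone who understands it, which is the point the first answer makes.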
I'm trying to scrape and submit information to websites that rely heavily on JavaScript for most of their actions. The website won't even work when I disable JavaScript in my browser.
I've searched for some solutions on Google and SO, and someone suggested I should reverse engineer the JavaScript, but I have no idea how to do that.
So far I've been using Mechanize, and it works on websites that don't require JavaScript.
Is there any way to access websites that use Javascript by using urllib2 or something similar?
I'm also willing to learn Javascript, if that's what it takes.
I wrote a small tutorial on this subject, this might help:
http://koaning.io.s3-website.eu-west-2.amazonaws.com/dynamic-scraping-with-python.html
Basically, you have the selenium library drive a real Firefox browser. The browser loads the page and runs its JavaScript, and by default selenium waits for the page load to finish before handing you the HTML string. Once you have this string, you can then parse it with BeautifulSoup.
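A minimal sketch of that flow, in case the tutorial link goes away. render_page and scrape_title are names I made up; the driver calls used (get, execute_script, page_source, quit) are standard Selenium WebDriver ones. It assumes selenium, a Firefox install with its driver, and bs4 are available, and note that document.readyState being "complete" does not guarantee that every AJAX request has finished, so heavier pages may need an explicit wait for a specific element.

```python
import time


def render_page(driver, url, timeout=30.0, poll=0.5):
    """Load url in the given WebDriver and return the HTML after page load.

    Polls document.readyState until the browser reports "complete".
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if driver.execute_script("return document.readyState") == "complete":
            return driver.page_source
        time.sleep(poll)
    raise TimeoutError(f"{url} did not finish loading within {timeout} seconds")


def scrape_title(url):
    """Example use: render the page in Firefox, then parse it with BeautifulSoup."""
    # Imported here so render_page stays usable without these packages installed.
    from selenium import webdriver
    from bs4 import BeautifulSoup

    driver = webdriver.Firefox()
    try:
        html = render_page(driver, url)
    finally:
        driver.quit()  # always close the browser, even on errors
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.string if soup.title else None
```

Taking the driver as a parameter keeps render_page independent of the browser, so the same function should work with any other WebDriver you swap in.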
I've had exactly the same problem. It is not simple at all, but I finally found a great solution, using PyQt4.QtWebKit.
You will find the explanations on this webpage : http://blog.motane.lu/2009/07/07/downloading-a-pages-content-with-python-and-webkit/
I've tested it, I currently use it, and it works great!
Its great advantage is that it can run on a server with only an X server, without a full graphical desktop environment.
You should look into using Ghost, a Python library that wraps the PyQt4 + WebKit hack.
This makes g the WebKit client:
import ghost
g = ghost.Ghost()
You can grab a page with g.open(url) and then g.content will evaluate to the document in its current state.
Ghost has other cool features, like injecting JS and some form filling methods, and you can pass the resulting document to BeautifulSoup and so on: soup = bs4.BeautifulSoup(g.content).
So far, Ghost is the only thing I've found that makes this kind of thing easy in Python. The only limitation I've come across is that you can't easily create more than one instance of the client object, ghost.Ghost, but you could work around that.
Check out crowbar. I haven't had any experience with it, but I was curious about the answer to your question so I started googling around. I'd like to know if this works out for you.
http://grep.codeconsult.ch/2007/02/24/crowbar-scrape-javascript-generated-pages-via-gecko-and-rest/
Maybe you could use Selenium WebDriver, which has Python bindings, I believe. I think it's mainly used as a tool for testing websites, but I guess it should be usable for scraping too.
I would actually suggest using Selenium. It's mainly designed for testing web applications from a "user perspective"; however, it is basically a Firefox driver. I've actually used it for this purpose, although I was scraping a dynamic AJAX webpage. As long as the JavaScript form has a recognizable "anchor text" that Selenium can "click", everything should sort itself out.
Hope that helps