Mechanize to make POST requests on behalf of Selenium / WebDriver? - javascript

Because Selenium can traverse JavaScript websites (which Mechanize cannot), and Mechanize can make POST requests (which Selenium cannot), in some cases it would be powerful to use the two in conjunction.
The answer by +Zarkonnen to this question suggests that one would use Selenium initially, then Mechanize would step in to make the POST request and then pass control back to Selenium.
How would one integrate Mechanize post method into Selenium?
I am using the Ruby versions of these libraries, but any information would be useful.
EDIT: Here's a Venn diagram to hopefully clarify the functionality I am seeking.
"JavaScript website" in this case simply means a website whose relevant functions will not work without JavaScript enabled. Say I needed to traverse a website to get to a form on it, and along the way I ran into buttons that didn't work without JavaScript enabled; then, in order for the form to work the way I wanted, I had to make a custom POST. In that scenario, neither Selenium WebDriver nor Mechanize can handle it alone; they need help from each other.
How would you accomplish this? Would you use Selenium and then have Mechanize step in to help when you had to make the POST? Would you use some other method to make a POST from within Selenium? Would you use the Capybara gem? I understand there are limitations on WebDrivers making POSTs, but I know there must be a workaround.

The question is a bit vague, but both Selenium (WebDriver) and a good non-interactive HTTP library (like Mechanize) are crucial elements in a tester's armoury.
In general, if you need to simulate a human being in an interactive scenario, you can't beat WebDriver. However, the web is built on HTTP and everything Selenium does ultimately happens over HTTP, so the less interactive your scenario is, the less you need to simulate a real user, and the more performance matters, the more you should look to Mechanize, and possibly to even lower-level HTTP libraries.
Because of that, although the two technologies are complementary in a sense, I can't think of all that many good reasons to use them in conjunction. But perhaps the following:
WebDriver manages a user on a web site, while Mechanize is used to query REST endpoints to dump metrics, clear caches, run usage reports, or kick off simultaneous requests to simulate concurrency.
Mechanize is used to seed/prepare test data prior to a WebDriver run.
Those are both examples where WebDriver could be used for everything, but where it would be vastly easier and more efficient to use a non-interactive tool.
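For the questioner's specific "Selenium drives the browser, then a non-interactive library makes the custom POST" case, the usual trick is to copy the driver's session cookies into the HTTP client before posting, and copy any new cookies back afterwards. Below is a rough Python sketch of that handoff (the questioner is on Ruby, and the URLs and form fields here are placeholders; requests stands in for Mechanize):

    import requests
    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get("https://example.com/app")          # placeholder: the JavaScript-heavy site
    # ... click through the JavaScript-only parts with Selenium here ...

    # Hand the browser's session over to a plain HTTP client.
    session = requests.Session()
    for cookie in driver.get_cookies():
        session.cookies.set(cookie["name"], cookie["value"])

    # Make the custom POST that you could not build through the page's own form.
    response = session.post(
        "https://example.com/form/endpoint",       # placeholder endpoint
        data={"some_field": "some_value"},         # placeholder form data
    )
    print(response.status_code)

    # If the POST created new cookies, push them back so Selenium can carry on.
    for cookie in session.cookies:
        driver.add_cookie({"name": cookie.name, "value": cookie.value})
    driver.refresh()

In the Ruby bindings the same cookies are available via driver.manage.all_cookies, and Mechanize exposes its own cookie jar, so the flow carries over directly.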

Related

Python Beautiful Soup (HTML Parsing)

I am a beginner in Python 3.6 using BeautifulSoup to perform "web scraping."
Once I have run requests.get() and prettify() on the output, I notice that the webpage does not return the values; it seems to hold code that generates the values instead.
Here is the link to the webpage in specific:
http://www.tennisabstract.com/cgi-bin/wplayer.cgi?p=AngeliqueKerber&f=r1
I am trying to extract which hand the player uses in tennis (highlighted in yellow in my screenshot of the page).
I would appreciate feedback on the way the question is laid out; if it is confusing (or non-standard), such feedback will help me ask questions more appropriately in the future.
There are two options (mostly).
The first one is easier but slower: browser emulation. You simply use the site the way a normal user would, through a browser. There is a Python module for this task: selenium. It drives a real browser through a specific webdriver, and plenty of webdrivers are available (for example chromedriver for Chrome). There are also headless options (PhantomJS, for example).
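A rough sketch of this first option with Selenium and headless Chrome; note that the element id used below is only a guess, so you would need to inspect the rendered page to find where the handedness text actually lives:

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")        # run Chrome without opening a window
    driver = webdriver.Chrome(options=options)

    driver.get("http://www.tennisabstract.com/cgi-bin/wplayer.cgi?p=AngeliqueKerber&f=r1")
    # By now the page's JavaScript has run, so page_source holds the rendered values.
    soup = BeautifulSoup(driver.page_source, "html.parser")
    bio = soup.find(id="biog")                # guessed id of the bio block; inspect the page
    print(bio.get_text(" ", strip=True) if bio else "bio element not found")

    driver.quit()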
The other way is smarter and faster: XMLHttpRequests (XHRs). Basically, the site uses some hidden API to fetch its data via JavaScript, and you try to find out exactly how. In most cases you can use your browser's developer tools: switch to the Network tab, clear it, trigger the result you want, and filter so that only XHRs are shown. Such a request usually returns JSON, which is easily converted into a Python dictionary with the json() method of the Response object.
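Once you have found the request in the Network tab, the second option usually boils down to a couple of lines; the endpoint and parameters below are placeholders for whatever the Network tab actually shows:

    import requests

    # Replace the URL and parameters with the XHR you copied from the Network tab.
    resp = requests.get("https://example.com/api/player", params={"name": "AngeliqueKerber"})
    resp.raise_for_status()
    data = resp.json()        # JSON body -> Python dict
    print(data)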
Here's a really great GitHub repository that someone made for this website; it is practically an API. You can change/edit a few things (fork it) and then use it the way you want to.
HERE
It uses the Selenium webdriver, but it's high quality.

HttpClient with Selenium or HTMLUNIT?

Alright, so I'm in a small pickle. I'm running into issues with JSoup since the page needs JavaScript to finish loading parts of it. I've worked around this in the past by parsing the raw JavaScript code, but it's very tedious. More recently, I tried to write a program to log in to a website, but it requires a token from a form element. That form element is not rendered unless JavaScript is executed, so it won't show up at all for me to extract. So I decided to look into Selenium.
First question: is this the library I should be looking into? The reason I'm so bent on using HttpClient is that some of these websites are very high-traffic and don't load all the way, but I don't need the pages to load all the way. I just need them to load enough for me to retrieve the login token. I'd prefer to communicate with the web server using raw JSON/POST requests once I discover the methods required, rather than having Selenium automate a click/wait/type sequence.
Basically, I only need Selenium to load about a quarter of the page, just enough to retrieve request tokens. The rest of my program will send POST requests using HttpClient.
Or should I just let Selenium do all the work? My goal is speed. I need to log in and purchase an item fast.
Edit: Actually, I might go with HtmlUnit because it's very minimal. I only need to scrape information, and I don't want to run Selenium's standalone server. Is this the better approach?
Basically, HtmlUnit is quicker than Selenium, so if you are going for speed you should use that. Keep in mind, though, that Selenium has its own HtmlUnitDriver, so as another option you could use Selenium with HtmlUnit. The difference between them is that HtmlUnit is itself a browser without a GUI, whereas Selenium works by driving a real browser's features. You may want to take a look at this other question for further details: Selenium vs HtmlUnit?
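Whichever engine you pick, the handoff described in the question (render just enough of the page to read the token, then switch to raw HTTP) is the same two-step flow. Here is a rough sketch in Python with Selenium and requests rather than Java's HttpClient, with placeholder URLs and field names; a Java version with HtmlUnit and HttpClient would follow the same steps:

    import requests
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://example.com/login")                    # placeholder URL

    # The hidden token only exists once the page's JavaScript has run.
    token = driver.find_element(By.CSS_SELECTOR, "input[name='csrf_token']").get_attribute("value")
    cookies = {c["name"]: c["value"] for c in driver.get_cookies()}
    driver.quit()

    # From here on, talk to the server with plain POSTs.
    resp = requests.post(
        "https://example.com/login",                           # placeholder URL
        data={"username": "me", "password": "secret", "csrf_token": token},
        cookies=cookies,
    )
    print(resp.status_code)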

Headless testing of remote JavaScript web-applications

There is a web application that needs to be tested. This application uses AJAX and jQuery. Tests have to be written for all possible interactions with the browser and the client side. There are some tools for this, for example Selenium IDE, but I wonder if it is possible to use any headless browser.
So, requirements for the testing system are:
Query pages from the remote server and simulate browser behavior (basically, we give the headless browser a URL, it fetches the page, and it launches tests on it);
Inject the JavaScript under test, or test JavaScript already loaded on the remote page;
Use any testing framework that can be integrated with CI software (Jasmine, Mocha, etc.).
It is possible to use mocking techniques when dealing with AJAX requests, for example, but I'm trying to test a real-life application. I hope this question will be useful to others as well.
As far as I have investigated this topic, there is no way of doing this so far.
In my case I have a server-side PHP application that talks to the outside world through a REST interface. My JavaScript code talks to the server and manipulates the interface depending on the responses. So my goal is to test the JavaScript code, but it relies heavily on the server side. I therefore have two ways of testing the JavaScript:
Using mocks; consider looking up this article. You basically simulate your server-side API. The problem with this method is that whenever you change your API, you have to make corresponding changes in your mocks so that the test set stays current.
Calling JavaScript testing utilities directly from PHPUnit tests (or whatever server-side testing is used); there is no solution for this yet, unfortunately. But this method would save a lot of developer time (no need to rewrite mocks for 100-200 example queries), and it would also let us steer the server's behavior on the fly.
Please give feedback on the second approach. If it is really needed, I guess it makes sense to implement it.

Website Performance testing automated tools/frameworks

I have a website that uses AJAX heavily to communicate with the server. Now I want to do performance and stress testing using automated scripts. Do you have any recommendations?
The desired functionality might be: given a URL, hook into the page-ready callback; in that callback I can emulate a "click" on some button using the button's id property.
Thanks.
There are a bunch of tools you can use to automate the UX of your site to make sure that things work fine. I'll break them down arbitrarily.
The ones that come to mind are Sahi and Selenium. These allow you to automate clicking, submitting, etc., similar to what GUI testing tools do, and so test your application.
Mechanize (the original Perl version, plus the Ruby and Python versions) is used to write scripts that interact with your website to simulate a user. It is not "GUI"-based and so doesn't rely on a browser, which may affect what you can do with JavaScript. Another similar tool (although I don't have personal experience with it) is Watir.
If you want to hammer your website (i.e. performance testing), the only thing I've come across is ApacheBench (ab). It can generate reports on how much raw traffic your site can take before it comes crashing down. Assuming your callbacks are not stateful, you can use it to hammer them.
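ab is great for hammering a single URL; if you want to hit the AJAX callbacks themselves from a script, a thread pool of plain HTTP requests does much the same job. A small sketch (the URL and payload are placeholders):

    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    URL = "https://example.com/ajax/endpoint"      # placeholder AJAX callback

    def hit(_):
        start = time.time()
        r = requests.post(URL, data={"id": 42})    # placeholder payload
        return r.status_code, time.time() - start

    with ThreadPoolExecutor(max_workers=50) as pool:
        results = list(pool.map(hit, range(1000)))

    print("errors:", sum(status >= 400 for status, _ in results))
    print("average latency: %.3f s" % (sum(t for _, t in results) / len(results)))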
use Selenium...

Emulate JavaScript __doPostBack in Python, web scraping

Here LINK it is suggested that it is possible to "figure out what the JavaScript is doing and emulate it in your Python code." That is what I would like help doing, i.e. my question: how do I emulate javascript:__doPostBack?
Code from the website (full page source here: LINK):
<a style="color: Black;" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$gvSearchResults','Page$2')">2</a>
Of course I basically have no idea where to go from here.
Thanks in advance for your help and ideas.
OK, there are lots of posts asking how to CLICK a JavaScript button when web scraping with Python libraries such as mechanize, BeautifulSoup, and similar, and I see a lot of "that is not supported" responses that point to THIS non-Python solution. I think a Python solution to this problem would be of great benefit to many. In that light, I am not looking for answers like "use x, y or z", which are not Python code or which require interacting with a browser.
The mechanize page is not suggesting that you can emulate JavaScript in Python. It is saying that you can change a hidden field in a form, thus tricking the web server into believing that a human [1] has selected the field. You still need to analyse the target yourself.
There will be no Python-based solution to this problem, unless you wish to create a JavaScript interpreter in Python.
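What you can do without interpreting any JavaScript is replicate the POST that __doPostBack would have produced. On ASP.NET WebForms pages it normally just writes its two arguments into the __EVENTTARGET and __EVENTARGUMENT hidden inputs and submits the main form, so a hedged mechanize sketch looks like this (the URL is a placeholder; the control name and page argument are taken from the link in the question, and __VIEWSTATE/__EVENTVALIDATION are left exactly as the form already contains them):

    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.open("http://example.com/SearchResults.aspx")    # placeholder URL of the page with the grid

    br.select_form(nr=0)                 # WebForms pages usually have a single main form
    br.form.set_all_readonly(False)      # allow writing to the hidden inputs
    br["__EVENTTARGET"] = "ctl00$ContentPlaceHolder1$gvSearchResults"
    br["__EVENTARGUMENT"] = "Page$2"     # i.e. "give me page 2 of the results grid"

    response = br.submit()
    print(response.read()[:500])

If the server rejects the request, compare your POST body with what the browser actually sends (Network tab of the developer tools); some pages add extra hidden fields that must be echoed back.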
My thoughts on this problem have led me to three possible solutions:
1. create an XULRunner application
2. browser automation
3. attempt to interpret the client-side code
Of those three, I've only really seen discussion of 2. I've seen something close to 1 in a commercial scraping application, where you basically create scripts by browsing on sites and selecting the things on the pages that you would like the script to extract in the future.
1 could possibly be made to work with a Python script by accepting a serialisation (JSON?) of WSGI Request objects, getting the app to fetch the URL, and then sending the processed page back as a WSGI Response object. You could possibly wrap some middleware around urllib2 to achieve this. Overkill probably, but kind of fun to think about.
2 is usually achieved via Selenium RC (Remote Control), a testing-centric tool. It provides a few methods like getHtmlSource, but most people I've heard of using it don't like its API.
3 I have no idea about. node.js is very hot right now, but I haven't touched it. I've never been able to build SpiderMonkey on my Ubuntu machine, so I haven't touched that either. My hunch is that in order to do this, you would provide the HTML source and your details to a JS interpreter, which would need to fake being your user agent etc. in case the JavaScript wanted to reconnect with the server.
[1] Well, more technically, a JavaScript-compliant user agent, which is almost always a web browser used by a human.
