How to save contents of AJAX request using PhantomJS - javascript

I am trying to record constantly updating data on a webpage. In the Google Chrome developer tools, I can see that my incoming data is obtained by an AJAX request.
When I click on the received file in the developer tools, I can see the data that I want in Google Chrome. I would like to use PhantomJS to receive the AJAX responses and then save these responses to files.
So far I have a program that opens the URL of the webpage I'm interested in and can print out an overview of the network traffic that is being received, but I do not know how I can save the actual files as they come in. How would I do this?
Code so far:
var page = require('webpage').create();
var url = "http://www.site_of_interest.com"; // page.open needs the scheme

page.onResourceRequested = function(request) {
    console.log('Request ' + JSON.stringify(request, undefined, 4));
};

page.onResourceReceived = function(response) {
    console.log('Receive ' + JSON.stringify(response, undefined, 4));
};

page.open(url);

Currently, this is not possible with PhantomJS. It does not expose the request/response content in those callbacks. Possible workarounds would be:
If the AJAX requests can be replayed (multiple requests to the same URL yield the same response every time), then you can make your own AJAX request in the onResourceReceived handler and save the response into a file using the fs module.
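A minimal sketch of that replay approach; the '/data/' URL filter and the output filename are made up for this example:

var fs = require('fs');

page.onResourceReceived = function(response) {
    // Act once per resource, when the whole response has arrived
    if (response.stage !== 'end') return;
    // Hypothetical filter: pick out only the AJAX URLs you care about
    if (response.url.indexOf('/data/') === -1) return;
    // Replay the request from the PhantomJS context; this only works
    // if repeating the request yields the same response
    var xhr = new XMLHttpRequest();
    xhr.open('GET', response.url, false); // synchronous, for brevity
    xhr.send();
    fs.write('response-' + Date.now() + '.txt', xhr.responseText, 'w');
};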
Differing AJAX responses for the same URL usually mean that some content in the page changes. You could write custom code that watches the DOM for those changes and infers what the AJAX response must have been. It doesn't necessarily have to be the DOM: maybe the data is accessible in some JavaScript variable in the page context, or it is saved in localStorage.
It is also possible to write a custom XMLHttpRequest implementation that acts as a proxy and records the responses so that they can be grabbed later. It must be injected before any of the page's own JavaScript runs, so the page.onInitialized handler works best.
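A sketch of that proxy approach; the window.__xhrResponses name is invented for this example:

page.onInitialized = function() {
    page.evaluate(function() {
        var origOpen = XMLHttpRequest.prototype.open;
        XMLHttpRequest.prototype.open = function(method, url) {
            var xhr = this;
            this.addEventListener('load', function() {
                // Stash each response where the outer script can reach it
                window.__xhrResponses = window.__xhrResponses || [];
                window.__xhrResponses.push({ url: url, body: xhr.responseText });
            });
            origOpen.apply(this, arguments);
        };
    });
};

The outer script can then poll with page.evaluate(function() { return window.__xhrResponses; }) and write the collected entries to disk with the fs module.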
I have written a post about those workarounds for CasperJS, but they can easily be converted for use with plain PhantomJS: How can I catch and process the data from the XHR responses using casperjs?

Related

Execute Javascript function from HTML Request

I'm new to JS and am trying to execute a function on a site to pull all the data in a table in JSON format.
I am using Parse Cloud Code to send my http requests, and the requests themselves are working, but I can't seem to get just the data itself.
It seems I am only able to get it in HTML and even then the objects do not display the same way that they do in the webpage's elements.
Any help/advice would be greatly appreciated!
Thank you in advance.
This is the link:
http://www.capetown.gov.za/Media-and-news#k=thinkwater
Here is the code:
Parse.Cloud.define('hello', function(req, res) {
    res.success('Hi');
});

Parse.Cloud.define('htmlTest', function(req, res) {
    Parse.Cloud.httpRequest({
        method: 'POST',
        url: 'http://www.capetown.gov.za/Media-and-news#k=thinkwater',
        params: {
            action: '/Media-and-news',
            id: 'aspnetForm',
            onsubmit: 'javascript:return WebForm_OnSubmit();'
        },
        headers: {
            'Content-Type': 'application/json;charset=utf-8'
        }
    }).then(function(httpResponse) {
        // success
        res.success(httpResponse.text);
    }, function(httpResponse) {
        // error
        res.error('Request failed with response code ' + httpResponse.status);
    });
});
You can't execute a client-side JavaScript function with an HTTP request.
Here's what happens when you load that page:
1. The server (the site you're trying to fetch) receives an HTTP request (from you).
2. The server generates the initial HTML and responds to whoever made the request, be it a browser or your NodeJS code. This "initial" HTML is all you get with a simple HTTP request, and in your case it doesn't contain the results you need.
3. If the HTML was served inside a browser, additional client-side JavaScript code is executed (i.e. the "JavaScript function" you're trying to execute). This can only happen in a browser (or browser-like) environment. That JavaScript modifies the HTML (using the DOM), and only then is the final HTML rendered. You can't get these results with a simple HTTP request*, as that only gets you to step 2.
*You can find out which URL the client-side JavaScript uses to fetch those results itself. The Network tab in the browser's dev tools can help with this: when you click the button that triggers the fetch, keep an eye on which requests are made.
In your case the page seems to fetch JSON with a POST request to http://www.capetown.gov.za/_vti_bin/client.svc/ProcessQuery, although it doesn't look straightforward: at first glance it makes a series of requests, each depending on the previous one. Feel free to explore this route yourself.
So in order to get the final HTML you will either:
- Need the direct URL that serves those results. This is usually the quickest route, but it requires understanding the site's API and how it fetches results, if it does so via AJAX (i.e. via client-side JavaScript).
- Use a fetcher with a browser or browser-like environment, e.g. PhantomJS (deprecated), Puppeteer, Selenium, or Zombie; a sketch follows.
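A minimal sketch of the second option with Puppeteer; waiting for network idle is one reasonable heuristic for "the page's JavaScript has finished fetching", not a guarantee:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    // Let the page's own JavaScript run and fetch the results
    await page.goto('http://www.capetown.gov.za/Media-and-news#k=thinkwater',
                    { waitUntil: 'networkidle0' });
    const html = await page.content(); // the final, JS-rendered HTML
    console.log(html);
    await browser.close();
})();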

Gracefully Handle REST Server Error

I have an application that generates a PDF on the fly (accessed via service/generatePdf).
It returns an HTTP response with content-type="application/pdf" and the output stream set to the binary contents.
Per business specs, when the user clicks a button we need to open a new window/tab that displays that PDF.
Submitting the form below, that all works beautifully on the Happy Path, where the response is an actual PDF.
<form action="service/generatePdf" method="post" name="PdfForm" target="_blank">
However, what does not work so well is when the PDF can't be generated for whatever reason. For example, let's say that the HTTP response outputStream is empty.
What I want to be able to do is display a nice error message on the first page, and not open the new window/tab.
But there doesn't seem to be any way to do it. Your choices seem to be
Return a valid PDF, or
Live with how the browser's PDF plugin handles corrupt files
I've tried jQuery, Ajax, the jQuery Form Plugin, the jQuery Download plugin, and nothing seems to work.
The server should indicate error or success through an HTTP status code (e.g. 200 = OK, 500 = error). You can catch this in your REST client with jQuery:
$.ajax({
    url: 'service/generatePdf',
    error: function(jqXHR, textStatus, errorThrown) {
        // show error message
    }
}).done(function(data) {
    // data contains the PDF
});
It would be better to just create the PDF on the server, put it in a temporary store, and send the URL of that PDF in the response. Once the client downloads the file, or after a certain time, the PDF is removed from the store.
In that case you would just open a new window with the URL you received from the server.
If the server provides the PDF in the initial request, you can convert it to a Data URI and open that data URI in a new window.
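A sketch of that idea using the fetch API. An object URL is used instead of a literal data URI because most current browsers block top-level data: navigation; it serves the same purpose of showing the PDF only after it has arrived intact:

fetch('service/generatePdf', { method: 'POST' })
    .then(function(response) {
        if (!response.ok) throw new Error('PDF generation failed');
        return response.blob();
    })
    .then(function(blob) {
        // Open the PDF in a new tab only once we know it is valid
        window.open(URL.createObjectURL(blob));
    })
    .catch(function(err) {
        // Show a nice error message on the current page instead
    });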
This is a fairly common requirement. You need to make your REST app a little smarter. It needs to check the result of the LiveCycle PDF generation and, if it wasn't successful, return an HTML response (with a content-type of text/html).
The browser is fairly dumb. It examines the content-type of the incoming response and, based on the content-type, launches the plug-in. It's then up to the plug-in to process the response. The PDF plug-in is also not so bright, it assumes that the incoming data stream is a PDF and if it's empty, it produces an error.
The key here is to send down the right content-type (and content) to the browser, which means checking the PDF result and sending a more appropriate response if the PDF result is a failure.
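The same idea as a hypothetical sketch, shown with Express purely for illustration (generatePdf below is an assumed helper, not a LiveCycle API):

var express = require('express');
var app = express();

app.post('/service/generatePdf', function(req, res) {
    var pdf = generatePdf(req.body); // assumed PDF-generation helper
    if (pdf && pdf.length > 0) {
        res.type('application/pdf').send(pdf);
    } else {
        // Send a content-type the browser can render as an error page
        res.status(500).type('text/html')
            .send('<p>Sorry, the PDF could not be generated.</p>');
    }
});

app.listen(3000);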
We often see this in LiveCycle orchestrations too. The temptation is to generate the PDF into a com.adobe.idp.Document object and then return that object directly. This leads to the sort of problems you describe. Instead, the better approach is to check the result of the PDF generation. If it is valid, then return that response. If the PDF generation failed, then construct an HTML response in a com.adobe.idp.Document object (with the appropriate text/html content-type) and return that instead.

Forcing an HTTP request to fail in browser

Is it possible to make an http request that has been sent to a server by the browser fail without having to alter the javascript?
I have a POST request that my website is sending to the server and we are trying to test how our code reacts when the request fails (e.g. an HTTP 500 response). Unfortunately, the environment that I need to test it in has uglified and compressed javascript, so inserting a breakpoint or altering the javascript isn't an option. Is there a way for us to utilize any browser to simulate a failed request?
The request takes a long time to complete, so using the browser's console to run a javascript command is a possibility.
I have tried using window.stop(); however, this does not work since I need the failure code to execute.
I am aware of the option of setting up a proxy server, but would like to avoid this if possible.
In Chrome (just checked v63), you can actually block a specific URL (or even a whole domain) from the Network tab. You only need to right-click on the entry and select Block request URL (or Block request domain.)
One possible solution is to modify the XMLHttpRequest objects that will be used by the browser. Running this code in a javascript console will cause all future AJAX calls on the page to be redirected to a different URL (which will probably give a 404 error):
XMLHttpRequest.prototype._old_open =
    XMLHttpRequest.prototype._old_open || XMLHttpRequest.prototype.open;

XMLHttpRequest.prototype.open = function(method, url, async, user, pass) {
    // Prefix the URL so every request is sent somewhere that doesn't exist
    return XMLHttpRequest.prototype._old_open.call(
        this, method, 'TEST-' + url, async, user, pass);
};
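Because the original open is saved in _old_open, you can undo the patch from the same console when you're done testing:

XMLHttpRequest.prototype.open = XMLHttpRequest.prototype._old_open;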
Don't overlook the simplest solution: disconnect your computer from the Internet, and then trigger the AJAX call.
Chrome's dev tools have an option to "beautify" (i.e. re-indent) minified JavaScript (press the "{}" button at the bottom left). This can be combined with the "XHR breakpoint" option to break when the request is made. XHR breakpoints don't support modifying the response though AFAIK, but you should be able to find a way to do it via code.
To block a specific URL and make an API call fail, follow these steps:
1. Go to the Network tab in your browser's dev tools.
2. Find the API call that needs to fail (as per your requirement).
3. Right-click on that API call.
4. Click 'Block Request URL'. You can unblock it in the same manner, as the option will turn into 'Unblock'.
Just type a changed URL into the browser, e.g. replace the well-formed URL http://thedomain.com/welcome/ with http://thedomain.com/welcomeXX/; that will cause a 404 error (Not Found).

NodeJS servers requests

I'm working with NodeJS and I'm still familiarizing with it.
Given the structure of my system I have two NodeJS servers running in different machines.
The user/browser sends a request to the first server, which returns to the browser a JSON file that is located in this first machine.
This first server also updates this JSON file every 5-10 seconds sending a request to the second server, which returns another JSON file which data will overwrite the one in the JSON file in the first server, so the next user/browser request will be updated.
This second server also has a NodeJS server running but it only dispatches the request coming from the first server.
I have this structure since I don't want the user to know about the second server, for security reasons (anyone could see the redirection with any dev tools).
These two events happen asynchronously, since the browser requests may arrive at different times from the event that updates the JSON file.
My question is: how can I update the JSON file in the first server? I wonder if there's a NodeJS library I can use for requesting the new JSON file from the second server.
I make the browser-to-first-server request via AJAX and everything works properly, but AJAX only works on the client side, so I'm not really sure how to do this for the first-to-second server request.
Any help would be appreciated.
Something like what I'm expecting is the following:
setInterval(function() {
    // make request to server 2
    // receive JSON file
    // use 'fs' to overwrite the JSON from server 1
}, 5000);
You can either use the built-in http/https modules in NodeJS or use something like request:
var fs = require('fs');
var request = require('request');

// The URL and filename here are placeholders for your own
request('http://second-server.example.com/url/for/json', function (error, response, body) {
    if (!error && response.statusCode == 200) {
        // Overwrite the JSON file that server 1 serves to browsers
        fs.writeFile('data.json', body, function (err) {
            if (err) console.error(err);
        });
    }
});
Instead of operating both as web (HTML) servers, I strongly advise connecting to the 2nd using sockets. This way you can pass information/changes back and forth whenever an event happens, instead of polling every few seconds.
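A minimal sketch of that push model using socket.io; the hostname, port, filename, and the produceFreshJson helper are all made up for this example:

// On server 2: push fresh data to any connected peer
var io = require('socket.io')(4000);
io.on('connection', function (socket) {
    // produceFreshJson() stands in for however server 2 builds its JSON
    socket.emit('update', produceFreshJson());
});

// On server 1: receive pushes and rewrite the local file
var fs = require('fs');
var client = require('socket.io-client')('http://server2.example.com:4000');
client.on('update', function (data) {
    fs.writeFile('data.json', JSON.stringify(data), function (err) {
        if (err) console.error(err);
    });
});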

jquery.post(): how do i honor a redirect from the server?

I'm trying my hand at unobtrusive JS, using jQuery in my Ruby on Rails app.
After the user fills out a form, the client-side JQuery code calls:
$.post("/premises", ui.form)
I can see the POST hit the server, and I can see the server emit a redirect notice to http://localhost:3000/users/42 complete with the data to be displayed.
But the browser page doesn't change. This doesn't really surprise me -- the whole point of client-side javascript is to control what gets updated -- I get that. But in this case, I'd like to honor whatever the server replies with.
I tried extending the call to post() based on How to manage a redirect request after a jQuery Ajax call:
$.post("/premises",
ui.item,
function(data, textStatus) {
if (data.redirect) {
// data.redirect contains the string URL to redirect to
window.location.href = data.redirect;
} else {
// data.form contains the HTML for the replacement form
$("#myform").replaceWith(data.form);
}
});
... but (among other problems) data.redirect is undefined. I suspect the real answer is simple, right? Looking forward to it!
The post you refer to uses JSON as the return value and constructs that JSON on the server side. It means that if there is a redirect, your data object would look like:

{redirect: 'redirecturl.html'}

and if there is no redirect, the data object would look like:

{form: 'html-string-for-form'}

Now the job is to construct the JSON object accordingly on the server side; a sketch follows.
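A hypothetical sketch of that server side, shown with Express purely for illustration since the contract is the same in any framework (the asker's app is Rails); savePremise and renderFormHtml are made-up helpers:

var express = require('express');
var app = express();
app.use(express.urlencoded({ extended: false }));

app.post('/premises', function (req, res) {
    var premise = savePremise(req.body); // made-up persistence helper
    if (premise.valid) {
        // Tell the client where to go instead of sending an HTTP redirect
        res.json({ redirect: '/users/' + premise.userId });
    } else {
        res.json({ form: renderFormHtml(premise) }); // made-up renderer
    }
});

app.listen(3000);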
The server is saying that the data you want to process with JavaScript is available at a different URL, not that the browser should load a new document into the top-level frame. Sending the browser off to the URL that merely hosts the data the JS requested wouldn't be honouring the redirect.
If you want to do that, then the server should respond with data (in the body of the response) that the JavaScript interprets as a reason to assign a new value to location.
data.redirect is probably undefined because you're not specifying it on the server side. In the answer you linked to, the point was to have the server always respond with 200 regardless of the outcome, and then let the JSON body it sends back determine how the client reacts. So, on the server side you'd want to respond with {"redirect": "/where/to/go"}.
