How do I dynamically create a document for download in Javascript?

I'm writing some Javascript code that generates an XML document in the client (via Google Earth plugin). I'd like the user to be able to click a button on the page and be prompted to save that XML to a new file. If I were generating the XML server-side this would be easy, just make the button open the link. But the XML is generated client-side.
I've come up with a couple of hacks that half-work, inspired in part by this StackOverflow question. But neither completely works. Here's a demo HTML with embedded code:
<html><head><script>
function getData() { return '<?xml version="1.0" encoding="UTF-8"?><doc>Hello</doc>'; }
function dlDataURI() {
    // encodeURIComponent keeps the XML's reserved characters intact in the URL
    window.open("data:text/xml;charset=utf-8," + encodeURIComponent(getData()));
}
function dlWindow() {
    var w = window.open();
    w.document.open();
    w.document.write(getData());
    w.document.close();
}
</script></head><body>
<div onclick="dlDataURI()">Click for Data URL</div>
<div onclick="dlWindow()">Click for Window</div>
</body></html>
The dlDataURI() version works great in Firefox, poorly in Chrome (can't save), and not at all in IE. The dlWindow() version works OK in Firefox and IE, and not well in Chrome (can't save, XML embedded inside HTML). Neither version ever prompts the user to download; it always opens a new window trying to display the XML.
Is there a good way to do what I want in client side Javascript? I'd like this to work in today's browsers, ideally Firefox, MSIE 8, and Chrome.
Update with sample Downloadify code
window.onload = function() {
    Downloadify.create("dlify", {
        data: getData(),
        filename: "data.xml",
        swf: 'media/downloadify.swf',
        downloadImage: 'images/download.png',
        width: 100,
        height: 30
    });
};

The best I've seen so far is Downloadify by Doug Neiner; it requires Flash but works very well:
"A tiny JavaScript + Flash library that enables the generation and saving of files on the fly, in the browser, without server interaction."
Check this video.

If Flash is an option then the Flash Player (version 10+) offers the means for limited reading/writing of files from the local filesystem.
Check this out:
http://help.adobe.com/en_US/AS3LCR/Flash_10.0/flash/net/FileReference.html#save%28%29

Best option for crawling a website that loads content via ajax [duplicate]

Please advise how to scrape AJAX pages.
Overview:
All screen scraping first requires a manual review of the page you want to extract resources from. When dealing with AJAX, you usually just need to analyze a bit more than the HTML alone.
When dealing with AJAX, this just means that the value you want is not in the initial HTML document that you requested, but that JavaScript will be executed which asks the server for the extra information you want.
You can therefore usually just analyze the JavaScript, see which request it makes, and call that URL yourself from the start.
Example:
As an example, assume the page you want to scrape has the following script:
<script type="text/javascript">
function ajaxFunction()
{
var xmlHttp;
try
{
// Firefox, Opera 8.0+, Safari
xmlHttp=new XMLHttpRequest();
}
catch (e)
{
// Internet Explorer
try
{
xmlHttp=new ActiveXObject("Msxml2.XMLHTTP");
}
catch (e)
{
try
{
xmlHttp=new ActiveXObject("Microsoft.XMLHTTP");
}
catch (e)
{
alert("Your browser does not support AJAX!");
return false;
}
}
}
xmlHttp.onreadystatechange=function()
{
if(xmlHttp.readyState==4)
{
document.myForm.time.value=xmlHttp.responseText;
}
}
xmlHttp.open("GET","time.asp",true);
xmlHttp.send(null);
}
</script>
Then all you need to do is make an HTTP request to time.asp on the same server yourself. (Example from w3schools.)
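For instance, here is a minimal Node.js sketch of calling that endpoint directly instead of loading the page (the host name is hypothetical; time.asp is the endpoint from the script above):

var http = require('http');
http.get('http://www.example.com/time.asp', function (res) {
    res.setEncoding('utf8');
    var body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
        console.log(body); // the same value the page's own script would display
    });
});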
Advanced scraping with C++:
For complex usage, if you're using C++ you could also consider using SpiderMonkey, Mozilla's JavaScript engine, to execute the JavaScript on a page.
Advanced scraping with Java:
For complex usage, if you're using Java you could also consider using Rhino, Mozilla's JavaScript engine for Java.
Advanced scraping with .NET:
For complex usage, if you're using .NET you could also consider using the Microsoft.Vsa assembly, recently replaced by ICodeCompiler/CodeDOM.
In my opinion the simplest solution is to use CasperJS, a framework based on the WebKit headless browser PhantomJS.
The whole page is loaded, and it's very easy to scrape any AJAX-related data.
You can check this basic tutorial to learn Automating & Scraping with PhantomJS and CasperJS.
You can also take a look at this example code, which scrapes Google Suggest keywords:
/*global casper:true*/
var casper = require('casper').create();
var suggestions = [];
var word = casper.cli.get(0);

if (!word) {
    casper.echo('please provide a word').exit(1);
}

casper.start('http://www.google.com/', function() {
    this.sendKeys('input[name=q]', word);
});

casper.waitFor(function() {
    return this.fetchText('.gsq_a table span').indexOf(word) === 0;
}, function() {
    suggestions = this.evaluate(function() {
        var nodes = document.querySelectorAll('.gsq_a table span');
        return [].map.call(nodes, function(node) {
            return node.textContent;
        });
    });
});

casper.run(function() {
    this.echo(suggestions.join('\n')).exit();
});
If you can get at it, try examining the DOM tree. Selenium does this as a part of testing a page. It also has functions to click buttons and follow links, which may be useful.
The best way to scrape web pages that use AJAX, or JavaScript in general, is with a browser itself or a headless browser (a browser without a GUI). Currently PhantomJS is a well-promoted headless browser using WebKit. An alternative that I have used with success is HtmlUnit (in Java, or in .NET via IKVM), which is a simulated browser. Another known alternative is using a web automation tool like Selenium.
I have written many articles about this subject, like web scraping AJAX and JavaScript sites and automated browserless OAuth authentication for Twitter. At the end of the first article there are a lot of extra resources that I have been compiling since 2011.
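For the headless-browser route, here is a minimal PhantomJS sketch (the URL and the fixed delay are assumptions; real code would poll for the content instead):

// load the page, let its JavaScript and AJAX run, then read the DOM
var page = require('webpage').create();
page.open('http://www.example.com/ajax-page', function (status) {
    if (status !== 'success') { phantom.exit(1); }
    // crude but simple: give the page's AJAX two seconds to finish
    window.setTimeout(function () {
        var text = page.evaluate(function () {
            return document.body.innerText; // runs inside the page context
        });
        console.log(text);
        phantom.exit();
    }, 2000);
});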
I like PhearJS, but that might be partially because I built it.
That said, it's a service you run in the background that speaks HTTP(S) and renders pages as JSON for you, including any metadata you might need.
It depends on the AJAX page. The first part of screen scraping is determining how the page works. Is there some sort of variable you can iterate through to request all the data from the page? Personally, I've used Web Scraper Plus for a lot of screen-scraping tasks because it is cheap, it is not difficult to get started with, and non-programmers can get it working relatively quickly.
Side note: you'll probably want to check the site's Terms of Use before doing this. Depending on the site, iterating through everything may raise some flags.
I think Brian R. Bondy's answer is useful when the source code is easy to read. I prefer an easy way using tools like Wireshark or HttpAnalyzer to capture the packet and get the URL from the "Host" field and the "GET" field.
For example, I captured a packet like the following:
GET /hqzx/quote.aspx?type=3&market=1&sorttype=3&updown=up&page=1&count=8&time=164330 HTTP/1.1
Accept: */*
Referer: http://quote.hexun.com/stock/default.aspx
Accept-Language: zh-cn
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Host: quote.tool.hexun.com
Connection: Keep-Alive
Then the URL is:
http://quote.tool.hexun.com/hqzx/quote.aspx?type=3&market=1&sorttype=3&updown=up&page=1&count=8&time=164330
As a low cost solution you can also try SWExplorerAutomation (SWEA). The program creates an automation API for any Web application developed with HTML, DHTML or AJAX.
Selenium WebDriver is a good solution: you program a browser and you automate what needs to be done in it. Browsers (Chrome, Firefox, etc.) provide their own drivers that work with Selenium. Since it works as an automated REAL browser, the pages (including JavaScript and AJAX) get loaded just as they do for a human using that browser.
The downside is that it is slow, since you would most probably want to wait for all images and scripts to load before scraping that single page.
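As a rough sketch with the selenium-webdriver Node bindings (the URL and selector are made up; the explicit wait is what handles the AJAX delay):

var webdriver = require('selenium-webdriver');
var By = webdriver.By;
var until = webdriver.until;

var driver = new webdriver.Builder().forBrowser('firefox').build();
driver.get('http://www.example.com/ajax-page')
    .then(function () {
        // block until the AJAX-rendered element actually appears
        return driver.wait(until.elementLocated(By.css('#results .item')), 10000);
    })
    .then(function (el) {
        return el.getText();
    })
    .then(function (text) {
        console.log(text); // the dynamically loaded content
        return driver.quit();
    });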
I have previously linked to MIT's Solvent and EnvJS as my answers for scraping AJAX pages. These projects seem to be no longer accessible.
Out of sheer necessity, I invented another way to actually scrape AJAX pages, and it has worked for tough sites like findthecompany, which have ways to detect headless JavaScript engines and show no data.
The technique is to use Chrome extensions for scraping. Chrome extensions are the best place to scrape AJAX pages because they give us access to the JavaScript-modified DOM. The technique is as follows (I will certainly open-source the code at some point). Create a Chrome extension (assuming you know how to create one, and its architecture and capabilities; this is easy to learn and practice, as there are lots of samples):
Use content scripts to access the DOM via XPath. Pull the entire list, table, or dynamically rendered content into a variable as string HTML nodes. (Only content scripts can access the DOM, but they can't contact a URL via XMLHttpRequest.)
From the content script, use message passing to send the entire stripped DOM, as a string, to a background script. (Background scripts can talk to URLs but can't touch the DOM.) A sketch of both scripts appears after these steps.
You can use various events to loop through web pages and pass each stripped HTML node's content to the background script.
Now use the background script to talk to an external server (on localhost), a simple one created with Node.js or Python. Just send the entire HTML nodes, as a string, to the server, which persists the content posted to it into files, with appropriate variables to identify page numbers or URLs.
Now you have scraped the AJAX content (HTML nodes as strings), but these are partial HTML nodes. You can load them into memory with your favorite XPath library and use XPath to scrape the information into tables or text.
Please comment if you can't understand this and I can write it better (first attempt). Also, I am trying to release sample code as soon as possible.
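In the meantime, here is a hedged sketch of the two halves (the selector and the localhost endpoint are placeholders, not the code I plan to release):

// content-script.js — can read the JavaScript-modified DOM
var nodes = document.querySelectorAll('#results table tr');
var html = [].map.call(nodes, function (n) { return n.outerHTML; }).join('\n');
chrome.runtime.sendMessage({ page: location.href, html: html });

// background.js — cannot touch the DOM, but can talk to URLs
chrome.runtime.onMessage.addListener(function (msg) {
    var xhr = new XMLHttpRequest();
    xhr.open('POST', 'http://localhost:3000/save', true); // the local Node.js/Python server
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.send(JSON.stringify(msg)); // the server persists this to a file
});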

"Access is Denied" when embedding file from blob URL in IE

I have a web service that is sending the client a file as an arraybuffer which is then read into a blob object:
$scope.contentType = response.headers["content-type"];
$scope.file = new Blob([response.data], { type: $scope.contentType });
$scope.fileUrl = URL.createObjectURL($scope.file);
$scope.content = $sce.trustAsResourceUrl($scope.fileUrl);
I am using an object tag as the container:
<object id="documentContainer" ng-show="loaded" ng-attr-type="{{contentType}}" ng-attr-data="{{content}}" class="document-container"></object>
This works great in FF, chrome, mobile browsers, web browsers developed by alien species who have never had contact with humanity, etc., but not in IE.
When the data parameter of the object tag is set, IE responds in the console with
Error: Access is denied.
This seems to be some sort of security feature in IE where it doesn't want to use the file as a source because it resides on the client machine. It prohibits access even if you use javascript to create a brand new dom element with the data source set.
Microsoft provides their own blob methods like msSaveOrOpenBlob, but I need to be able to embed the file in the browser, not prompt the user to open the file in an external application.
Does anyone know of a workaround or way to embed the blob, which can be a wide variety of file types, in IE? I would hate to have to drastically refactor the web service and front end code just to accommodate IE, but it is looking like that might have to be the case.
I think the answer is "no". Our site generates PDFs on the fly, but we sniff the browser to decide what can be done with the returned PDF.
Example: http://www.cloudformatter.com/CSS2Pdf.Demos.Structures
If you are on Chrome you can select "Embed PDF" here and it works like a charm; if you are on IE, even if you select "Embed", it downloads the file, because IE cannot embed it, so we just reroute anyone on IE to the download code.
http://caniuse.com/#feat=datauri
And don't get us started on what else is wrong with IE; several of our pages just broke because IE stopped serializing end tags for some "p" tags in the document.
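A sketch of that sniff-and-fallback approach, reusing the question's $scope.file and $scope.fileUrl (the filename here is made up):

if (window.navigator && window.navigator.msSaveOrOpenBlob) {
    // IE: blob URLs can't be embedded, so hand the file to the user instead
    window.navigator.msSaveOrOpenBlob($scope.file, 'document.pdf');
} else {
    // everyone else: embed the blob URL as before
    $scope.content = $sce.trustAsResourceUrl($scope.fileUrl);
}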

How to Enable JavaScript file API in IE8

I have developed a web application in ASP.NET. There is a page in this project where the user chooses a file in a picture format (jpeg, jpg, bmp, ...), and I want to preview the image in the page, but I don't want to post the file to the server; I want to handle it in the client. I have done it with JavaScript functions via the File API, but it only works in IE9, and most of the customers use IE8. The reason is that IE8 doesn't support the File API. Is there any way to make IE8 upgrade, or some patch in the code-behind? I mean: check if the browser is IE and doesn't support the File API, and then call a function which upgrades IE8 to IE9 automatically.
I don't want to ask the user to do it in a message; I want to do it programmatically!
Even installing a special patch that provides what is required for the File API would do,
because customers think it is a bug in my application, and their computer knowledge is low.
What am I supposed to do with this?
I also used the Async File Upload AJAX control, but it posts the file to the server anyway (an AJAX solution with an HTTP handler), whereas JavaScript does it all in the client browser!
The following script checks whether the browser supports the File API:
<script>
if (window.File && window.FileReader && window.FileList && window.Blob)
    document.write("<b>File API supported.</b>");
else
    document.write('<i>File API not supported by this browser.</i>');
</script>
The following scripts do the reading and load the image:
function readfile(e1)
{
    var filename = e1.target.files[0];
    var fr = new FileReader();
    fr.onload = readerHandler;
    fr.readAsText(filename);
}
HTML code:
<input type="file" id="getimage">
<fieldset><legend>Your image here</legend>
<div id="imgstore"></div>
</fieldset>
JavaScript code:
<script>
function imageHandler(e2)
{
    var store = document.getElementById('imgstore');
    store.innerHTML = '<img src="' + e2.target.result + '">';
}

function loadimage(e1)
{
    var filename = e1.target.files[0];
    var fr = new FileReader();
    fr.onload = imageHandler;
    fr.readAsDataURL(filename);
}

window.onload = function()
{
    var x = document.getElementById("filebrowsed");
    x.addEventListener('change', readfile, false);
    var y = document.getElementById("getimage");
    y.addEventListener('change', loadimage, false);
}
</script>
You can't install anything special to add support for File API in IE8. What you can do is use a polyfill in order to implement this functionality in browsers that don't natively support it. Take a look at this list of HTML5 Cross Browser Polyfills (you should find something suitable for you in File API / Drag and Drop section).
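A minimal sketch of that approach, feature-detecting first and loading a polyfill only when needed (the script path is hypothetical; substitute whichever polyfill you pick from that list):

if (!(window.File && window.FileReader && window.Blob)) {
    var s = document.createElement('script');
    s.src = '/js/file-api-polyfill.js'; // hypothetical polyfill path
    document.getElementsByTagName('head')[0].appendChild(s);
}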
This works for me in pre-IE10; I use https://github.com/eligrey/FileSaver.js for every other browser. Note that though this works, it's not perfect, because it's IE, and well, you know what I mean.
Hope this helps.
/**
 * Saves a file; pops up a built-in JavaScript file as a download.
 * @param {String} filename    e.g. doc.csv
 * @param {String} filecontent e.g. "this","is","csv"
 * @param {String} mimetype    e.g. "text/plain"
 */
function saveAs(filename, filecontent, mimetype) {
    var w = window.open();
    var doc = w.document;
    doc.open(mimetype, 'replace');
    doc.charset = "utf-8";
    doc.write(filecontent);
    doc.close();
    doc.execCommand("SaveAs", null, filename);
}
After some research I found that there is no way to enable the File API in IE8, but I got something that I want to share with you...
Not directly: HTML5 is not supported by IE8. There are addons for Canvas, and there is also Google Chrome Frame, a plugin that adds HTML5 to versions of IE older than 9.
As I understand it, Google Chrome Frame is a perfect way to use HTML5.
It increases IE's speed up to 10 times, and it is very easy to use; I will describe it for all of you...
Google Chrome Frame
A plugin for Internet Explorer that adds full HTML5 support and the JavaScript compiler of Chrome to Microsoft's browser!
A stable version was released on September 22, 2010, and a lot of sites have added the code to their pages.
This plugin works on Internet Explorer 6, 7, and later versions. Google wants to break a barrier that prevents the Web from evolving: the most common browser and its lack of compatibility with the new standards.
When it is recognized, Internet Explorer will run under WebKit, the rendering engine of Chrome and Safari, and it will use the ultra-fast JavaScript compiler in place of the IE interpreter.
The advantage of this plugin is great for the compatibility of web applications, and it will become even more useful with WebGL integrated into WebKit, which lets us have 3D applications in the browser: a very different Web!
WebGL has also been supported by Firefox since version 3.7.
Since May 2011, the plugin can be installed without administrator rights, so it works on older versions of IE that are baked into a server and cannot be updated.
A tag in the code of a web page will display a message prompting the user to download the plugin. Once it is installed, IE runs as Chrome and supports HTML5.
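From memory of the Chrome Frame developer guide (linked at the end of this answer), the opt-in looked roughly like this:

<!-- opt the page in to Chrome Frame when the plugin is installed -->
<meta http-equiv="X-UA-Compatible" content="chrome=1">

<!-- prompt IE users without the plugin to install it -->
<script src="http://ajax.googleapis.com/ajax/libs/chrome-frame/1/CFInstall.min.js"></script>
<script>
  window.onload = function() {
    CFInstall.check({ mode: "overlay" }); // shows an install prompt inside IE
  };
</script>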
Features of Frame
The off-line mode.
The <video> and <audio> tags. Microsoft also plans to implement them in IE.
Canvas.
WebGL.
CSS 3.
JavaScript compiler.
Compatibility at Acid 3 level.
See the video.
Reaction from Microsoft
As expected, Microsoft did not really appreciate this initiative, which promotes HTML5 to the detriment of their own solution, Silverlight. The firm highlights a security problem:
Given the security issues with plug-ins in general and Google Chrome in particular, Google Chrome Frame running as a plug-in has doubled the attack area for malware and malicious scripts.
Chrome Frame automatic install: http://www.google.com/chromeframe/?quickenable=true
Google Chrome Frame code: https://developers.google.com/chrome/chrome-frame/
Chrome Frame Developer Guide: http://www.chromium.org/developers/how-tos/chrome-frame-getting-started

Embed PDF in page and print - IE9 issues

I have some code which dynamically loads a PDF document into a web page by setting a container's innerHTML to the returned string of this function:
function getPdfString(url) {
return '<object data="' + url + '" type="application/pdf" classid="clsid:ca8a9780-280d-11cf-a24d-444553540000" style="width:100%;height:600px"></object>';
}
In IE with the Adobe Reader plugin installed (as determined by the code that detects the Adobe ActiveX at PDFObject), my code inserts this HTML into a hidden container, puts a reference to the object element into el, and then runs this code (Repeater is a custom class):
log("** start repeater **");
var r = _repeater = new Repeater(function() {
try {
var delta = timeInterval();
log("iteration - " + delta + "ms");
el.gotoFirstPage(); //throws exceptions until the PDF is loaded
log("** assuming success, stop **");
r.stop();
r = undefined;
setTimeout(function() {
el.print(); //should succeed, can't tell because it doesn't throw or return anything
}, 100);
} catch(e) { }
}, 0, 100);
This is very convoluted, but necessary because there's no way to tell when the PDF is loaded, nor whether or not el.print() succeeded. It took me a long time to figure out, but it seems to work well in IE7 and IE8. IE9 has been hit and miss, usually working on my local machine (which runs IIS7.5), but sometimes not. IE9 has never worked when the site is running on my test server, which runs IIS6 out of necessity. I don't know if the version of IIS that I am running is causing my issue, but judging from the Fiddler logs, I doubt it.
I have been poring over Fiddler, making small tweaks here and there to see if anything makes a difference. So far, nothing has. The only difference that I can see is the Server header.
I found that the classid attribute is needed by IE7 and IE8; otherwise, they will make multiple requests for the PDF, and often fail to load it. It also significantly improves IE9's caching behavior.
The PDF is slightly different each time it is acquired. I'm not currently saving it to a temporary file or anything, though I could if it is absolutely necessary (so I could re-send the same PDF in a subsequent request).
The response is being gzip encoded, but I have the same problem whether it is enabled or not.
I have noticed that when the problem occurs, terminating AcroRd32.exe sometimes fixes the issue temporarily.
Side note: Firefox and Opera use the same HTML in an in-page popup which embeds the PDF. This works perfectly fine. (The Adobe Reader NPAPI plugin doesn't have a print() method on it that I have been able to find, sadly, so the popup instructs users to click the embedded view's Print button)
Nothing is stopping me from trying other methods of embedding such as an iframe, but I had some weird issues with it when I first tried it (can't remember what they were now, after all this mess).
I think that's everything I know about the problem right now...
This seems to be a problem specifically with Adobe Reader and the IE plugin. I've found a few forum threads that indicate this is a common, reproducible error (http://forums.adobe.com/thread/758489).
The solution seems to be using an iframe instead of an <object>/<embed> tag.
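A hedged variant of the question's getPdfString() using an iframe instead:

function getPdfIframeString(url) {
    return '<iframe src="' + url + '" style="width:100%;height:600px"></iframe>';
}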

Javascript: cross-browser serverless file upload and download

So I'm working on a web app where a user will need to:
provide a file full of data to work on
save their results to a file
All the manipulation is done in javascript, so I don't really have a need for server-side code yet (just static hosting), and I like it that way.
In Firefox, I can use their file manipulation api to allow a user to upload a file directly into the client-side code (using a standard <input type=file/>) and create an object URL out of a file so a user can save a file created by the client-side code.
<input type="file" id="input" onchange="handleFiles(this.files)">
<a download="doubled" id="ex">right-click and save as</a>
<script>
function handleFiles(fileList) {
    var builder = new MozBlobBuilder();
    var file = fileList[0];
    var text = file.getAsBinary();
    builder.append(text);
    builder.append(text);
    document.getElementById('ex').href = window.URL.createObjectURL(builder.getBlob());
}
</script>
So this is great. Now I want to do the same in other browsers - or, at least, modern versions of other browsers. Do similar APIs exist for Chrome and IE? If so, has anyone already built a cross-browser wrapper that I should be using?
It's mostly available on Firefox 3.6+, Chrome 10+, Opera 11.1+, and hopefully Safari 6 and IE 10.
See: http://caniuse.com/#search=FileReader.
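Where FileReader is available, here is a minimal sketch of the same "doubled file" demo using the standard File API (the Blob constructor is assumed, which is newer than MozBlobBuilder):

function handleFiles(fileList) {
    var reader = new FileReader();
    reader.onload = function (e) {
        var text = e.target.result;
        var blob = new Blob([text, text], { type: 'text/plain' }); // "doubled"
        var URL = window.URL || window.webkitURL;
        document.getElementById('ex').href = URL.createObjectURL(blob);
    };
    reader.readAsText(fileList[0]);
}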
Check out FileSaver.js and the a[download] attribute (supported in Chrome dev channel). Blob (object) URLs have somewhat limited support right now.
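A minimal FileSaver.js sketch; saveAs() is the library's global entry point:

var blob = new Blob(['Hello, world'], { type: 'text/plain;charset=utf-8' });
saveAs(blob, 'hello.txt'); // prompts the user to save the file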
