What is the best way to convert HTML into Excel - javascript

I have an HTML page which has a Flash chart (FusionCharts) and an HTML table. I need to convert the whole thing into Excel. The HTML table should be displayed in the cells of an Excel sheet. The Flash chart can be displayed as an image.
Is there any open source API that we could use for achieving this? Could you let me know what the possible options are?
Can this be done using JavaScript alone?

The HTML table is relatively easy. You can download the page, parse the HTML (there are various HTML parsing libraries available), extract the table and convert it into CSV (which Excel can load), or directly create an Excel file, e.g. using Apache POI for Java, as suggested in another answer.
The Flash part is significantly harder. There are quite a few tools available to capture Flash to an image, and you'd need to use one of them. This can be tricky, as Flash might be interactive, so you'd possibly have to remote-control the Flash part so it shows the right image before capturing. Hard to tell without more info.
That said, screen-scraping (which is what you're doing) is always labour-intensive and fragile. You should really push for a better interface to get your data from; it will save loads of hassle in the long run.
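As a rough illustration of the table-to-CSV route described above, here is a minimal Python sketch using only the standard library. The class and function names (`TableRows`, `html_table_to_csv`) and the sample markup are made up for this example, and it assumes a simple table without nested tables or `rowspan`/`colspan`.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in the document, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None outside a <tr>
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by sloppy markup


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)  # quotes cells containing commas
    return out.getvalue()


sample = (
    "<table><tr><th>Name</th><th>Qty</th></tr>"
    "<tr><td>Widget, large</td><td>3</td></tr></table>"
)
csv_text = html_table_to_csv(sample)
```

Real pages usually need more care (several tables, layout tables, entities inside cells), which is where a dedicated HTML parsing library earns its keep.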

Just set the content type of the page to "application/vnd.ms-excel". If the HTML page is just a table, it will open with Excel and look perfect. You can even add background colors and font styles.
Try some of these content types:
application/excel
application/vnd.ms-excel
application/x-excel
application/x-msexcel
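To make the content-type trick concrete, here is a minimal sketch of a Python WSGI application that serves a plain HTML table with the `application/vnd.ms-excel` type. The helper names, the sample data and the `Content-Disposition` header with its `report.xls` filename are assumptions added for illustration, not something the answer prescribes.

```python
from html import escape


def table_html(rows):
    """Render a list of rows as a bare HTML table."""
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<html><body><table>{body}</table></body></html>"


def excel_app(environ, start_response):
    """WSGI app: send an HTML table labelled as an Excel document."""
    payload = table_html([["Region", "Sales"], ["North", 1200]]).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/vnd.ms-excel"),
        # Assumption: suggest a download name so the browser hands the file to Excel.
        ("Content-Disposition", 'attachment; filename="report.xls"'),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

For a quick local try, the app can be served with `wsgiref.simple_server.make_server("", 8000, excel_app).serve_forever()`.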

Excel can convert HTML tables by default. The easiest way to force it to do this is to save the HTML file with an XLS extension. Excel will then open the XLS as if it were its native workbook.

There's a very good Java API, Apache POI, that would let you do that, but it's Java.
http://poi.apache.org/
If you're on Win32 you can also use Excel's COM API; there are quite a few tutorials on the net.

I cannot offer any advice on the Flash part, but I have done HTML table to Excel many times. Yes, Excel can open HTML tables but most HTML tables out there have extraneous crap in them that can make it fragile to consistently parse the tables.
CPAN module HTML::TableExtract is a wonderful module that allows you to focus on the non-presentation specific aspects of the table you are trying to extract. Just specify the column headings you are interested in and maybe specify the title or class of the table and you are mostly set. You might have to post process the rows returned a little, but that is considerably easier than dealing with the underlying tag soup in all its glory.
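The idea of addressing a table by its column headings rather than by position can be sketched independently of Perl. The following Python function is purely illustrative: it is not HTML::TableExtract's API, the name `select_columns` is made up, and it assumes the table has already been parsed into a list of rows with the headings in the first row.

```python
def select_columns(rows, wanted):
    """Reduce parsed table rows to the columns named in `wanted`.

    Headings are matched case-insensitively; a missing heading raises KeyError.
    Short rows are padded with empty strings.
    """
    header, *data = rows
    index = {name.strip().lower(): i for i, name in enumerate(header)}
    picks = [index[name.lower()] for name in wanted]
    return [[row[i] if i < len(row) else "" for i in picks] for row in data]


scraped = [
    ["Date", "Ad banner", "Price"],
    ["2009-06-01", "x", "9.50"],
    ["2009-06-02", "y"],            # a ragged row, as tag soup often produces
]
cleaned = select_columns(scraped, ["Price", "Date"])
```

Because the selection depends only on the headings, reordering or adding presentation columns in the page does not break the extraction.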
Further, for output to Excel format, stick with Spreadsheet::WriteExcel rather than the OLE interface. That way, you do not depend on having Excel installed for your program to work and things go a little faster.
Make sure you specify the data type of cells if you do not want content to be changed automatically by Excel upon opening the files (another reason I do not like sending around CSV files). Use a configuration file for formatting information so that you can change how the spreadsheet looks without having to change the program.
You can always use Excel's built-in charting facilities to replace the web site graphs.
This combination has enabled me to generate pretty good looking documents comprising several hundreds of megabytes of scraped data (with logos and image links etc) using just a few hundred lines of Perl and a couple of days' work.

What you're trying to do is fragile and difficult to maintain. You should attempt to create a csv feed to fetch the data. All it takes is for someone to come along and modify the HTML and your scraper will throw up on it (probably years after anyone remembers how your program works).
Try to get CSV and image data from the original source (ie, database or whatever) and build the Excel file from that.

I will add to SpliFF's answer that when you have your data as a CSV file, you can set the MIME type of the page to application/vnd.ms-excel, which will open the page in Excel.

Related

Skeleton website that will parse external file and display in HTML

I am creating a website that is meant to be a sort of user start guide with an introduction page, an overview page, and a page featuring the actual guide, which will contain step-by-step instructions for the user to follow. Since I don't have the specific content for the page, I'm "hard coding" the HTML with placeholder text. Going forward, I want to create a skeleton HTML framework that will parse the start guide files with JavaScript and display the information on the page in its expected place. I'm strongly considering using XML (or JSON) files, parsing them with JavaScript and using browser DOM methods, because that's what I know. However, I'm working on a team of people who most likely won't know how to convert the start guides (which are PDFs) into well-formed XML, so I'm wondering: are there any alternative ways to do this? I know Word has the ability to save documents as HTML and XML, but the output doesn't have the proper attributes/tags to be parsed with the JavaScript I'd write.
TLDR: Are there files or alternative methods other than JSON and XML (something that non-tech savvy people will understand) that can be parsed and processed with JavaScript to display in a skeleton website?

Dynamic Graphs to Pdf

I have some graphs on an HTML page which take their data from a database using PHP functions and JavaScript. How can I create a daily PDF with graphs of the current date's data without opening the web page and clicking a button?
The key problem is that HTML (graphics or even just text) does not translate directly into PDF. There are some libraries that will do this to a limited degree, but typically without the level of control that most people want in a PDF.
There are two very different ways to go about this, and I have used both at various times:
1 - Create a batch-mode PHP program (or other server-side language of your choice) that creates the graphics entirely server-side (many libraries available for that).
2 - Capture the page as if you were running a browser. I have used PhantomJS http://phantomjs.org/ to do that. The big advantage is that you can make use of all your existing graphics code - even libraries such as d3.
Either way, you will need to take the output and insert into a PDF together with headers, footers, explanatory text, etc. I usually use R&OS http://pdf-php.sourceforge.net/ for the PDF part, but there are other libraries that will work just as well.
Try dompdf; it might help you. Here is the link:
https://github.com/dompdf/dompdf

Generate PDF from web app

I need to generate a PDF from the current screen in my webapp. Some kind of screenshot, but I'm facing serious difficulties.
The main problem is that the view contains a grid made with jQuery Gridster; and some "widgets" contain complex elements like tables, highcharts, etc.
So plugins like jsPDF or html2canvas can't render my page into a proper PDF. They always generate a blank one.
This is what the page looks like. You can move/resize each element:
(Sorry for the CIA style, but there's business data in there)
Some ideas I came across but don't work are:
Using the browser's print-to-PDF feature programmatically. (can't)
Using PhantomJS. (but page state matters, so...)
I believe a solution to this problem may be widely adopted by anyone trying to generate a PDF or image from the current screen in a web app. Quite an unresolved problem.
It's OK if it only works on Google Chrome.
Many thanks.
EDIT:
One possible solution might be to find a way to represent the current layout state as an object and save it with an id.
Then retrieve that object via a URL param with the id and apply the stored layout to the initial page.
This way I might be able to take a screenshot with PhantomJS, but it seems quite complex to me. Any alternative?
Based on the fact that you're struggling with capturing dynamic content, I think at this point you need to take a step back and consider altering your approach. The reason these plugins are failing is that they only work with the HTML as it was before any interactions, right?
Why not convert the HTML to PDF on the server side? The key part here is to send the current HTML back. By doing that, you're sending the updated, static HTML to the server to be rendered into a PDF. I've used HTML to PDF on the server side before and it works fine, so I can't see why it wouldn't be appropriate here.
See this answer for details about HTML to PDF server side.
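A minimal Python sketch of the server half of that suggestion follows, assuming the browser posts the markup it currently shows (for example `document.documentElement.outerHTML`). The function name `render_posted_html` is hypothetical, and the actual HTML-to-PDF step is left as a pluggable `convert(source_path, target_path)` callable, since the answer does not name a specific converter.

```python
import tempfile
from pathlib import Path


def render_posted_html(html, convert):
    """Write the posted HTML to a temp file, convert it, return the PDF bytes.

    `convert` is any callable taking (source_path, target_path) that writes
    the PDF to target_path; which tool it wraps is up to the application.
    """
    with tempfile.TemporaryDirectory() as workdir:
        source = Path(workdir) / "page.html"
        target = Path(workdir) / "page.pdf"
        source.write_text(html, encoding="utf-8")
        convert(str(source), str(target))
        return target.read_bytes()
```

Stylesheets, scripts and images referenced with relative URLs will not resolve from the temp directory, so the posted HTML generally needs absolute URLs or inlined assets.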

Convert HTML Report to PDF

I want to convert an HTML report to PDF. I know that there are many libraries available for this purpose, but the HTML report contains many graphs created using jqPlot, and I want to include these graphs in the PDF as well. If you are familiar with any library which also converts graphs to PDF, please give me a reference.
You can use Bullzip PDF Printer. It creates a virtual PDF printer that can turn your HTML content into a PDF file from the browser:
http://www.bullzip.com/
you can also find a lot of useful tools right there!
I don't have experience with Jqplot, but I have successfully converted html pages which contain pie charts from Google Charts (completely javascript generated) to pdf.
I used wkhtmltopdf - it's a commandline tool but there are wrapper classes for php and some others I believe.
I'm sure you could find something applicable with a bit of googling but this looks promising: https://github.com/mikehaertl/phpwkhtmltopdf
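For readers not on PHP, the same command-line tool can be driven from any language that can start a process. This is a small Python sketch under the assumption that the `wkhtmltopdf` binary is installed and on the PATH; the function names and the one-second default delay are illustrative choices. The `--javascript-delay` option gives script-generated charts time to draw before the page is captured.

```python
import shutil
import subprocess


def wkhtmltopdf_command(source, output_pdf, js_delay_ms=1000):
    """Build the argument list: wkhtmltopdf [options] <input> <output>."""
    return [
        "wkhtmltopdf",
        "--javascript-delay", str(js_delay_ms),  # wait for JS charts to render
        source,
        output_pdf,
    ]


def html_to_pdf(source, output_pdf, js_delay_ms=1000):
    """Run wkhtmltopdf on a URL or local HTML file; raise if it is missing or fails."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf is not installed or not on PATH")
    subprocess.run(wkhtmltopdf_command(source, output_pdf, js_delay_ms), check=True)
```

A call such as `html_to_pdf("report.html", "report.pdf")` can be scheduled (cron, Task Scheduler) when the PDF has to be produced without anyone opening the page.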

How to generate graphics into photoshop using actionscript?

I've a text file with content like this:
id, pixelsize, color, text
block1, 200x60, black, Header
block2, 200x180, white, Body
block2, 200x60, black, Footer
Now using actionscript,
I want to generate a psd file which would generate a 3 vertical block graphics (like this) after parsing the given file. All the blocks are placed vertically on top of each other.
Convert this psd file into PDF automatically using the script.
Automate this whole process without opening photoshop. Is it possible?
Please help. Thanks.
You aren't going to be able to create a PSD w/o opening Photoshop. Even when you use something like Adobe Bridge to batch process files from any Adobe app it still uses the appropriate app to open a supported file and perform actions on it.
I have seen apps that allow you to output PDFs from user-defined text and variable images (PageFlex comes to mind)... but even then, saving Adobe-compatible files isn't a simple task (unlike writing a text file). There's a lot of data to manage even with PDFs, and I'd suspect even more when you look at a PSD file.
Unless you can find an open-source app that somehow allows you to mess with its code so that you can bypass opening it entirely and output a somewhat compatible PSD/PDF file, I don't think you're going to be able to automate much w/o lots of work and some potentially expensive software solutions.
Long answer short, I think you'll have to use Photoshop at some point in your solution. On the upside, you could make a recording of actions in PS so that individual files can be output to whatever format you like...and those I'm sure can be scripted into complicated solutions.
You can do this kind of thing using the ExtendScript Toolkit from Adobe.
Not sure you can do it without having Photoshop open, however.
Given that you want a PDF at the end, could you use something like AlivePDF (ActionScript 3 Open-Source PDF Library)?
If you actually need to also generate a PSD, I'm not sure how you do that from scratch, but the Photoshop SDK would be a good place to start, as well as getting a good understanding of bytearrays.
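Whichever tool ends up drawing the graphic, reading the block file and working out where each block goes is independent of Photoshop. Here is a small Python sketch of that step (Python is used only for illustration; the question asks about ActionScript). The `Block` class and function names are made up, and the input is assumed to have exactly the four comma-separated fields shown in the question.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    width: int
    height: int
    color: str
    text: str
    top: int = 0  # y offset from the top of the canvas, set by stack_blocks


def parse_blocks(text):
    """Parse 'id, pixelsize, color, text' lines, skipping the header row."""
    lines = [line for line in text.splitlines() if line.strip()]
    blocks = []
    for line in lines[1:]:
        name, size, color, label = [part.strip() for part in line.split(",")]
        width, height = (int(n) for n in size.lower().split("x"))
        blocks.append(Block(name, width, height, color, label))
    return blocks


def stack_blocks(blocks):
    """Assign each block's top offset so they sit on top of each other; return total height."""
    y = 0
    for block in blocks:
        block.top = y
        y += block.height
    return y


spec = """id, pixelsize, color, text
block1, 200x60, black, Header
block2, 200x180, white, Body
block2, 200x60, black, Footer
"""
blocks = parse_blocks(spec)
total_height = stack_blocks(blocks)
```

The resulting offsets and sizes are what a PDF library or a Photoshop script would need in order to place the three rectangles and their labels.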
