I need to generate a PDF of the current screen in my web app, some kind of screenshot, but I'm facing serious difficulties.
The main problem is that the view contains a grid made with jQuery Gridster, and some "widgets" contain complex elements like tables, Highcharts, etc.
So plugins like jsPDF or html2canvas can't render my page into a proper PDF. They always generate it blank.
This is how the page looks; you can move/resize each element:
(Sorry for the CIA-style redaction, but there's business data in there.)
Some ideas I came across that don't work:
Using the browser's print-to-PDF feature programmatically (not possible).
Using PhantomJS (but the page state matters, so that doesn't fit).
I believe a solution to this problem would be widely adopted by anyone trying to generate a PDF or image of the current screen in a web app. It seems to be quite an unresolved problem.
It's OK if it only works on Google Chrome.
Many thanks.
EDIT:
One possible solution might be to find a way to represent the current layout state as an object and save it with an id.
Then retrieve that object via a URL param with the id and apply the stored layout to the initial page.
This way I might be able to take a screenshot with PhantomJS, but it seems quite complex to me. Any alternative?
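For what it's worth, Gridster can serialize its layout, which would cover the save side of this idea. A rough sketch (the /layouts endpoint and its response shape are made up):

    // Capture the current Gridster layout as a plain array of positions.
    var gridster = $('.gridster ul').data('gridster');
    var layout = gridster.serialize(); // [{col, row, size_x, size_y}, ...]

    // Save it under an id; '/layouts' is a hypothetical endpoint.
    $.post('/layouts', { layout: JSON.stringify(layout) }, function (resp) {
        // PhantomJS could then load something like /report?layout=<resp.id>
        console.log('Saved layout id:', resp.id);
    });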
Given that you're struggling to capture dynamic content, I think at this point you need to take a step back and consider altering your approach. The reason these plugins are failing is that they only work with the HTML as it was before any interactions, right?
Why not convert the HTML to PDF on the server side? The key part here is to send the current HTML back: you send the updated, static HTML to the server to be rendered into a PDF. I've used server-side HTML-to-PDF conversion before and it works fine, so I can't see why it wouldn't be appropriate here.
See this answer for details on server-side HTML to PDF conversion.
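A minimal sketch of the client side of that idea, assuming a hypothetical /render-pdf endpoint on your server that accepts HTML and responds with a PDF:

    // Serialize the DOM as the user currently sees it, after all interactions.
    var currentHtml = document.documentElement.outerHTML;

    // Post it to the server for rendering; '/render-pdf' is a made-up endpoint.
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/render-pdf');
    xhr.setRequestHeader('Content-Type', 'text/html');
    xhr.responseType = 'blob';
    xhr.onload = function () {
        // Offer the returned PDF as a download.
        var link = document.createElement('a');
        link.href = URL.createObjectURL(xhr.response);
        link.download = 'screen.pdf';
        link.click();
    };
    xhr.send(currentHtml);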
Related
I'm going to have many pages full of text that a JS and CSS file will turn into a fully styled web page. Since the text will usually be long and there will be many pages, I want to keep file sizes down. I also don't want to hurt quality, so I've decided that my JS file will take the text and build the page out of it. Side note: I'm making tutorial pages, so I'll use JS to generate the parts that appear on every tutorial page, like the lesson list, to reduce file size.
I've noticed that metadata (the <head> content) usually takes up space that JS could generate instead, so I thought: why don't I just generate it with JS? But then the problem arose that some browsers might not parse it, or parsing might be slow. So I'm asking here on Stack Overflow:
Should JavaScript generate metadata (and maybe almost the whole page, like removing the <head> tag completely and generating it with JS)?
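For context, this is the kind of thing I mean; a minimal sketch (the title, description, and stylesheet path are just placeholders):

    // Build <head> metadata at runtime instead of shipping it in the HTML.
    document.title = 'Tutorial: Getting Started';

    var meta = document.createElement('meta');
    meta.name = 'description';
    meta.content = 'A beginner-friendly tutorial page.';
    document.head.appendChild(meta);

    var css = document.createElement('link');
    css.rel = 'stylesheet';
    css.href = '/styles/tutorial.css';
    document.head.appendChild(css);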
It depends on your desired result.
Google has improved its SEO mechanisms to render your page before indexing it; see here:
https://developers.google.com/search/docs/guides/javascript-seo-basics
However, other bots may not do the same, such as social media crawlers like Facebook's or Twitter's that read Open Graph meta tags, or other search engines like Baidu.
If a bot doesn't render your document, the JavaScript doesn't get executed and your meta tags aren't present.
Additionally, if your initial document does not contain the stylesheets or other resource references, things take a bit longer for the client. Imagine the process:
With head
fetch document
fetch resources
render content
Without head
fetch document
render content
fetch resources
re-render
That's oversimplified, but it demonstrates my point.
Alternative:
If your content is that dynamic, you might consider Server-Side Rendering (SSR) or pre-rendering.
You would build your pages programmatically and store/serve them all, or build them on the server side as they are requested.
https://developers.google.com/web/updates/2019/02/rendering-on-the-web
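A rough sketch of the server-side option, using Node's built-in http module (the page data is made up):

    // Minimal pre-rendering sketch: the server injects metadata and content
    // into the HTML before it ever reaches the client or a crawler.
    var http = require('http');

    var pages = { '/intro': { title: 'Intro Lesson', body: 'Welcome!' } };

    http.createServer(function (req, res) {
        var page = pages[req.url];
        if (!page) { res.statusCode = 404; return res.end('Not found'); }
        res.setHeader('Content-Type', 'text/html');
        res.end('<!DOCTYPE html><html><head><title>' + page.title +
                '</title></head><body>' + page.body + '</body></html>');
    }).listen(3000);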
I came across a JS library (jsMovie) and wanted to see the example files, but its usage is really badly documented, so I tried to download the author's page to look at the source code. When I did, I noticed that "view-source" wasn't showing the full code (almost 80% of the code didn't appear). (Tried in Chrome and Firefox.)
So my question is: how can this be? Firebug displays everything properly. It occurred to me that this could also be a good way to prevent script kiddies from ripping sites.
Here's the page: http://konsultaner.de/entwickler#Konsultaner
Hints are welcome.
Generate the current source code, as interpreted by the browser. This can be done using an XMLSerializer on document.
var generatedSource = new XMLSerializer().serializeToString(document);
From there, if you want to open a page just showing the source, you could do
window.open('data:text/plain,'+encodeURIComponent(generatedSource), '_blank');
They are using AngularJS, a front-end JavaScript framework. That means almost all parts of the page are generated dynamically by JavaScript. Therefore, you can't see the page without JavaScript running (which is what view-source shows), but you can see the generated HTML via the inspector.
If it is a static website (the JavaScript and templates are all there), you can still 'rip' it. But not if it is a dynamic website, since all the data and logic are 'fed' by the server.
I'm using TideSDK to get content from a website. Eventually I will need to pre-fill form data on this website from the database.
I'm able to get the page and store it in a variable.
I'm able to parse out the relative URLs with alert()s.
But I'm not able to replace the body with the corrected body.
$('html').replaceWith(html); jQuery should be in memory, so I don't have to worry about replacing the html element, right?
I cannot figure out why this doesn't work. If an image or URL is absolute it works fine, but if it's relative it doesn't. I don't have access to fix the website to use absolute URLs.
My demo code: http://jsfiddle.net/Cs5MC/13/ (changed from html to body in the demo)
Any ideas?
To begin with, if this is an implementation of Titanium, you will need to use the Ti Network API discussed in this doc.
Pulling JSON data and using it works much the same, with a callback, as it would with jQuery or any regular XHR request.
I hope that helps.
By the way, since jQuery is heavily dependent on the DOM to do its work, you will always have to be careful that the expected DOM structure is actually present, which it may not be on the Titanium platform without a web view. I'm not familiar with TideSDK, though, so I may stand corrected on that.
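On the relative-URL issue from the question, a minimal sketch of one workaround, assuming you know the scraped site's base URL (the URL below is a placeholder):

    // Rewrite relative src/href attributes to absolute ones before injecting
    // the fetched markup, so images and links resolve against the remote site.
    var base = 'http://example.com/'; // placeholder: the scraped site's base URL
    var $page = $('<div>').html(html); // 'html' holds the fetched page markup

    $page.find('[src], [href]').each(function () {
        var $el = $(this);
        $.each(['src', 'href'], function (i, attr) {
            var val = $el.attr(attr);
            // Skip absolute, protocol-relative, and fragment-only URLs.
            if (val && !/^(?:[a-z]+:|\/\/|#)/i.test(val)) {
                $el.attr(attr, base + val.replace(/^\//, ''));
            }
        });
    });

    $('body').html($page.html());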
Need a suggestion for a "treeview" (navigation) JS widget for a site that is:
Really large (up to 100,000 pages)
Static - all pages are generated from an external source, and the widget is embedded in every page.
To clarify: there are no frames and no application server. All pages are generated and placed in a file system, and each page is loaded independently. That means the treeview navigation will be loaded every time as well, so it should either use multiple files and load parts of the tree on demand, or be super-efficient.
Commercial OK.
Use a Mashable-style tree. Click here for the detailed architecture.
All serious JS tree widgets allow dynamic loading of children. The key issue is that most of them will send the server a query like getChildren?parent=23674, and that won't do in your case.
Since the site is static, you need to generate files describing the branches of the tree in JSON format and request those from the server as the user expands nodes in the tree (a sketch follows). You could also create files containing the tree children as HTML, but you will be more flexible if you send data to the client and use JavaScript to convert it into HTML (plus you will save a lot of bandwidth).
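A minimal sketch of that idea, with made-up file names and node ids: pre-generate one JSON file per node and fetch a branch when its node is expanded.

    // Each pre-generated file, e.g. /tree/23674.json, might look like:
    // [{ "id": 31001, "title": "Chapter 1", "hasChildren": true }, ...]

    function loadChildren(parentId, onLoaded) {
        var xhr = new XMLHttpRequest();
        xhr.open('GET', '/tree/' + parentId + '.json');
        xhr.onload = function () {
            onLoaded(JSON.parse(xhr.responseText));
        };
        xhr.send();
    }

    // When the user expands a node, fetch its branch and build the DOM.
    loadChildren(23674, function (children) {
        var ul = document.createElement('ul');
        children.forEach(function (child) {
            var li = document.createElement('li');
            li.textContent = child.title;
            ul.appendChild(li);
        });
        document.getElementById('node-23674').appendChild(ul);
    });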
Try Yahoo's TreeView. There is an example of how to load data dynamically.
I noticed that none of the links are working. However, there was one tree written for exactly this reason, namely efficiency on large amounts of data. You might want to check out PixoTree and see if it's the right tool for you.
PS: I know it's an old question, but this might help someone who stumbles upon it.
I have an HTML page which has a Flash chart (FusionCharts) and an HTML table. I need to convert this whole thing into Excel. The HTML table should be displayed in cells of the Excel sheet, and the Flash chart can be displayed as an image.
Is there any open-source API we could use to achieve this? Could you let me know the possible options?
Can this be done using JavaScript alone?
The HTML table is relatively easy. You can download the page, parse the HTML (various HTML parsing libraries are available), extract the table, and convert it into CSV (which Excel can load), or directly create an Excel file, e.g. using Java POI, as suggested above.
The Flash part is significantly harder. There are quite a few tools available to capture Flash to an image, and you'd need to use one of them. This can be tricky, as Flash content might be interactive, so you'd possibly have to remote-control the Flash component so it shows the right image before capturing. Hard to tell without more info.
That said, screen scraping (which is what you're doing) is always labour-intensive and fragile. You should really push for a better interface to get your data from; it will save loads of hassle in the long run.
Just set the content type of the page to "application/vnd.ms-excel". If the HTML page is just a table, it will open in Excel and look perfect. You can even add background colors and font styles.
Try some of these content types:
application/excel
application/vnd.ms-excel
application/x-excel
application/x-msexcel
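For example, a minimal Node sketch of serving an HTML table with that content type (the table data and filename are made up):

    // Serving an HTML table with an Excel content type makes the browser
    // hand it to Excel, which renders the table into cells.
    var http = require('http');

    http.createServer(function (req, res) {
        res.setHeader('Content-Type', 'application/vnd.ms-excel');
        res.setHeader('Content-Disposition', 'attachment; filename="report.xls"');
        res.end('<table><tr><th>Region</th><th>Sales</th></tr>' +
                '<tr><td>North</td><td>1200</td></tr></table>');
    }).listen(3000);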
Excel can convert HTML tables by default. The easiest way to force it to do this is to save the HTML file with an .xls extension. Excel will then open the file as if it were a native workbook.
There's a very good Java POI API that would let you do that, but it's Java.
http://poi.apache.org/
If you're on Win32, you can also use Excel's COM API; there are quite a few tutorials on the net.
I cannot offer any advice on the Flash part, but I have done HTML table to Excel many times. Yes, Excel can open HTML tables, but most HTML tables out there have extraneous cruft in them that makes parsing them consistently fragile.
The CPAN module HTML::TableExtract is a wonderful module that lets you focus on the non-presentation-specific aspects of the table you are trying to extract. Just specify the column headings you are interested in, and maybe the title or class of the table, and you are mostly set. You might have to post-process the returned rows a little, but that is considerably easier than dealing with the underlying tag soup in all its glory.
Further, for output to Excel format, stick with Spreadsheet::WriteExcel rather than the OLE interface. That way, you do not depend on having Excel installed for your program to work, and things go a little faster.
Make sure you specify the data type of cells if you do not want Excel to change their content automatically upon opening the files (another reason I do not like sending around CSV files). Use a configuration file for formatting information so that you can change how the spreadsheet looks without having to change the program.
You can always use Excel's built-in charting facilities to replace the web site graphs.
This combination has enabled me to generate good-looking documents comprising several hundred megabytes of scraped data (with logos, image links, etc.) using just a few hundred lines of Perl and a couple of days' work.
What you're trying to do is fragile and difficult to maintain. You should try to get a CSV feed for the data instead. All it takes is for someone to come along and modify the HTML, and your scraper will choke on it (probably years after anyone remembers how your program works).
Try to get the CSV and image data from the original source (i.e., a database or whatever) and build the Excel file from that.
I will add to SpliFF's answer that once you have your data as a CSV file, you can set the MIME type of the page to application/vnd.ms-excel, which will open the page in Excel.