Huge PDF takes time to render while scrolling using pdfviewer.js - javascript

I have a custom PDF viewer HTML page and I am using the PDF.js library to render the PDF. It works fine for me.
When I open a small PDF file, it downloads and renders quickly.
When I open a large PDF, it also downloads quickly, but it takes far too long to render.
I can see the content of the large PDF, but when I scroll down the whole browser hangs.
Any suggestions?

In summary on your OP: since you did not respond to questions or provide an example of the PDF you were having trouble with, no one can give a conclusive answer. This is a shame, because it would have been easy to set up a snippet to probe the issue.
At a guess, I would say it is likely that there is a mismatch between the contents of your PDF file and the capability of pdfjs. If we had your example file we might be able to raise a bug on the developers' git repository, which seems to be active and well supported.
What follows is a high-level description of the issues involved with creation of a PDF rendering engine, provided to illuminate why you may want to stick with the mainstream built-in rendering engines provided within popular browsers.
Rendering a PDF is a complicated task. If you break it down to component operations it is feasible, but there are several levels of PDF standard that introduced a large array of options. It is likely that either your PDF contains something with a faulty rendering implementation in pdfjs, or something that pdfjs is choking on when it tries to render it.
Some background: The PDF format is both brilliant and fiendish at the same time. Brilliant because of its portability, but fiendish because of the internal structure and storage mechanisms. There is no friendly 'DOM' like with HTML. If we were starting out afresh to develop a portable document format it would not be PDF that we would choose. But PDF currently has too much momentum to be thrown away, period.
To 'render' the contents of a PDF file to a display device or printer, your code would need to unpack the PDF and render the components (images, formatted text, pages) to the display device. It sounds straightforward to anyone with experience of HTML DOM manipulation, but there is no direct comparison.
PDF is a vector-based graphics definition language. The most likely equivalent most people would have experienced is SVG.
Anything that is not an embedded image in a PDF file is vector-based output. Text is the awkward case: it sits inside content streams that are typically zip (Flate) compressed, and it is laid out by x/y co-ordinates rather than stored as continuous strings.
Drawing and layout instructions live in sections (digests) that are linked via pointers like a tree - no simple top-to-bottom read & render process. A PDF can have redundant sections, replaced by some later edit but still present. And while on the subject, unless the PDF file is configured for fast web viewing, the rendering engine has to wait for the entire file to be delivered before it can understand how to display it. Fast web view puts the 'index' and page 1 sections at the top of the file stream to allow the rendering engine to get something out to the screen as fast as possible.
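As a rough illustration of the fast web view point, a linearized PDF carries a `/Linearized` key in a dictionary that has to sit within the first 1024 bytes of the file. The sketch below checks only for that marker. It is a heuristic and does not validate the rest of the structure, and the sample byte strings are made up for the example rather than taken from real files.

```javascript
// Heuristic check: does the start of the file contain the /Linearized key?
// This does NOT validate hint tables or anything else about the file.
function looksLinearized(bytes) {
  const head = bytes.subarray(0, 1024);
  let text = "";
  for (const b of head) {
    text += String.fromCharCode(b); // byte-for-byte, no charset decoding
  }
  return text.includes("/Linearized");
}

// Hypothetical sample headers, invented for this example.
const encoder = new TextEncoder();
const linearizedSample = encoder.encode(
  "%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 12345 >>\nendobj\n"
);
const plainSample = encoder.encode(
  "%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
);

console.log(looksLinearized(linearizedSample)); // true
console.log(looksLinearized(plainSample));      // false
```

In a browser you could feed it the first chunk of a fetched file wrapped in a `Uint8Array`.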
To support PDF adequately you have to be able to render anything the PDF contains and to do this perfectly in line with the PDF standards, otherwise you may find your PDF viewer crashes or is unable to render the entire PDF. You would have to cater for the various Acrobat standard levels, and the shortcuts and bloats that the editing package (Word, Illustrator, InDesign) vendors chuck into the PDF file; layers, thumbnails, etc.
In PDF, text could be stored as vector drawing instructions 'or' references to characters in a font file (like HTML text).
Regarding colors, have a read of the PDF spec and you will see that there is an array of colorspace options that the original PDF producer can decide to use. Some of these are for print devices that use alien color mechanisms. You would have to interpret these to a reasonable device color on the screen.
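To give a feel for what "interpret to a device color" means, here is the textbook naive CMYK to RGB conversion. This is only a sketch: a real rendering engine would normally go through color profiles, so treat the formula as an approximation and the function name as made up for the example.

```javascript
// Naive CMYK -> RGB. Inputs are fractions in the range 0..1.
// Real engines use color management; this is the simple approximation only.
function cmykToRgb(c, m, y, k) {
  const channel = (ink) => Math.round(255 * (1 - ink) * (1 - k));
  return { r: channel(c), g: channel(m), b: channel(y) };
}

console.log(cmykToRgb(0, 1, 1, 0)); // { r: 255, g: 0, b: 0 }
console.log(cmykToRgb(0, 0, 0, 1)); // { r: 0, g: 0, b: 0 }
```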
And then fonts. Fonts might be embedded subsets, or not. You will have to take decisions over what alternative fonts to use if a font mentioned in the PDF is not present when the rendering engine runs. To keep fidelity with the PDF you will need to realise the glyphs as vector graphics on your drawing surface at the scale defined in the PDF.
Given the layering, scaling and rotating features in PDF, you would likely be looking at an html canvas as the drawing surface. Anyone who knows will tell you that in the world of canvas you are pretty much on your own for rendering functions - both the strength and weakness of canvas, though for rendering PDF you will probably want absolute control so most libraries are not going to be of use to you. Meaning you are working with drawing primitives which takes time and can be susceptible to bugs.
Probably your biggest challenge is understanding the full range and scope of what you have to do. This is not impossible, but hard.
In summary on this lecture about challenges of writing a PDF rendering engine - rendering PDF files perfectly is a very complicated undertaking. It will not be surprising if during early release phases such products are felt to be very buggy in terms of not supporting chunks of the PDF specifications. Do not be too hard on the developers - the target they are aiming for is hard. If the developers have the backing and therefore the time to stay with the project then the full set of features in the PDF spec may be covered in their product at some point in time. Ideally they would publish a list of unsupported PDF features so that users could recognise potential issues, though you would never really know there was an issue until a PDF file looked strange when rendered or the engine crashed.

It looks like you are using an older version of PDF.js. Try a newer version.
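Whatever the library version, a common way to stop a very long document from hanging the browser in a custom viewer is to render only the pages at or near the viewport, not all of them. The sketch below shows just the page-selection part. It assumes you already know each page's rendered height, and `pagesToRender` and its parameters are hypothetical names, not part of PDF.js. Wiring the result to PDF.js (fetching each listed page and rendering it to its canvas) is left out.

```javascript
// pageHeights: rendered height of each page in CSS pixels, top to bottom.
// Returns the 1-based numbers of the pages that intersect the viewport,
// plus `buffer` extra pages on either side so scrolling stays smooth.
function pagesToRender(pageHeights, scrollTop, viewportHeight, buffer = 1) {
  const visible = [];
  let top = 0;
  pageHeights.forEach((height, index) => {
    const bottom = top + height;
    if (bottom > scrollTop && top < scrollTop + viewportHeight) {
      visible.push(index + 1);
    }
    top = bottom;
  });
  if (visible.length === 0) return [];

  const first = Math.max(1, visible[0] - buffer);
  const last = Math.min(pageHeights.length, visible[visible.length - 1] + buffer);
  const result = [];
  for (let n = first; n <= last; n++) result.push(n);
  return result;
}

// Hypothetical 500-page document, every page 1000px tall.
const heights = new Array(500).fill(1000);
console.log(pagesToRender(heights, 0, 900));      // [ 1, 2 ]
console.log(pagesToRender(heights, 250500, 900)); // [ 250, 251, 252, 253 ]
```

Calling this from a throttled scroll handler, and discarding canvases for pages that fall out of the list, keeps the amount of work per scroll roughly constant regardless of document length.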

Related

PDFObject 2.0 : determine when user reaches the end of the document

I want an embedded PDF on my HTML 5 web page. I was looking at different viewers like PDFObject 2.0. This works well for viewing, but I have a requirement where the user must read the entire document (scroll or page to the end), and then I can enable a button for them to click on. This is for a legal compliance situation.
What types of code hooks are there, when embedding a PDF document with PDFObject 2.0, for finding out when the user has reached the end of the document (scrolled to the end or reached the last page)?
I believe PDFObject is like a wrapper that helps render the PDF under different browser conditions including mobile, tablets and desktop scenarios, and this request may not be possible without using PDF.js and customizing the code and then dealing with all the browser scenarios myself.
PDFObject only embeds the PDF within the HTML page, it does not provide any additional functionality. Currently, most built-in PDF rendering engines do not even provide a JavaScript API.
As far as I know, there is no way to prove someone has read (or navigated) through an entire PDF; that's why compliance web sites typically just present the user with a checkbox, something along the lines of "Check here to indicate you have read and agree with this document".
If you absolutely must find a way to ensure someone has navigated to the final page of the document -- which, let me remind you, does not imply they read anything -- you might be able to put something together with PDF.js. The PDF.js utility is a JavaScript-based PDF rendering engine. You can force the PDF to be rendered via PDF.js (PDFObject.com has examples for this); since PDF.js is JavaScript-based and open-source, you might be able to hack something together using the PDF.js API.
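A minimal sketch of the bookkeeping side of that idea, assuming your PDF.js-based viewer can tell you which page is currently on screen (how you obtain that page number is not shown here and depends on how you embed PDF.js). `ReadTracker` is a hypothetical helper, not part of PDF.js or PDFObject. It is also stricter than "reached the last page": it requires every page to have been shown at least once.

```javascript
// Records which pages have been displayed and reports when all have been seen.
class ReadTracker {
  constructor(pageCount) {
    this.pageCount = pageCount;
    this.seen = new Set();
  }

  // Call with the 1-based page number whenever the viewer shows a page.
  // Returns true once every page has been seen.
  markSeen(pageNumber) {
    if (Number.isInteger(pageNumber) && pageNumber >= 1 && pageNumber <= this.pageCount) {
      this.seen.add(pageNumber);
    }
    return this.isComplete();
  }

  isComplete() {
    return this.seen.size === this.pageCount;
  }
}

const tracker = new ReadTracker(3);
tracker.markSeen(1);
tracker.markSeen(3);
console.log(tracker.isComplete()); // false
tracker.markSeen(2);
console.log(tracker.isComplete()); // true
```

When `markSeen` returns true you would enable the button. As noted above, this still proves navigation, not reading.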

Rendering huge, interactive SVGs in a browser

We have to display huge SVG documents (about 20 MB) inside a web application. Users should be able to zoom in and move the image.
Rendering the SVG directly as a DOM object is too slow and the performance is inconsistent. The same applies to painting it on a canvas.
Generally, handling SVG on the client side seems weak. So I thought of implementing a server-side solution that provides the data in small chunks, in a non-vector format. While the user is not interacting with the document, the buffer lazily loads more detailed pieces. My concern with this solution is that the network traffic could become critical.
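The chunked approach described above is essentially map-style tiling. A minimal sketch of the client-side calculation follows, assuming 256px square tiles and a zoom scheme where each level doubles the resolution. The function name, the tile size, and the URL pattern in the comment are all hypothetical.

```javascript
// viewport: { x, y, width, height } in document units at zoom level 0.
// Returns the tiles the server would need to deliver for that view,
// e.g. via a hypothetical URL such as /tiles/{zoom}/{col}/{row}.png
function tilesForViewport(viewport, zoom, tileSize = 256) {
  const scale = 2 ** zoom;
  const firstCol = Math.floor((viewport.x * scale) / tileSize);
  const firstRow = Math.floor((viewport.y * scale) / tileSize);
  const lastCol = Math.ceil(((viewport.x + viewport.width) * scale) / tileSize) - 1;
  const lastRow = Math.ceil(((viewport.y + viewport.height) * scale) / tileSize) - 1;

  const tiles = [];
  for (let row = firstRow; row <= lastRow; row++) {
    for (let col = firstCol; col <= lastCol; col++) {
      tiles.push({ zoom, col, row });
    }
  }
  return tiles;
}

const view = { x: 0, y: 0, width: 512, height: 256 };
console.log(tilesForViewport(view, 0).length); // 2
console.log(tilesForViewport(view, 1).length); // 8
```

The tile count per view depends on the screen size, not on the size of the drawing, which is what bounds the network traffic for any single view.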
We will be rendering 2D DWG / DXF files, which will be converted to SVG.
The AutoCAD API seems really slow. The DWG sample does not work on any of our devices. Also, the application has to run without an internet connection, so we can't use the AutoCAD REST API.
How would you solve this? Are browsers even built for handling huge vector graphics?
When it comes to SVG it depends on the number of nodes, gradients, opacity and blur effects; however, why not use the end-user's graphics accelerator to handle this?
Most modern web browsers support graphics acceleration through WebGL, with which you can build very complex (and "huge") rich graphics in 2D (or 3D) that are handled as fast as your graphics accelerator can manage, exactly like modern games.
Using a WebGL library is recommended where a lot of work has been done for you already:
PlayCanvas : https://playcanvas.com/ -- you can import other formats, build & script your scenes with a friendly interface. PlayCanvas is well documented.
Three.js : http://threejs.org/ -- an advanced WebGL library, aimed at coders. ThreeJS can also handle many different types of 3D formats and this library is also well documented.
With Three.js you can also render your graphics as SVG, however, using WebGL is recommended for the obvious advantages in speed and quality.
Both of these libraries are very powerful, have an active community and are well supported in modern web browsers; however, there are many others you can try.
For more information on the libraries mentioned above, it's best to visit the sites where extensive information and examples are available.
Instead of implementing this yourself, I would suggest you use the Autodesk Viewer, also available for developers with full REST + JavaScript APIs.
Basically this library will convert your DWG file (2D or 3D) into a JSON stream and adjust the amount of data according to the browser/device capabilities. It uses Three.js, but you don't need to handle the geometry directly (but you can).
Check the Forge Github for samples. I like the Galley better.
You may also run it locally using NodeJS to serve it to the browser. The Extract sample does the whole process.

Is it possible to get vector font representation from local fonts in JavaScript?

Is there any way to get vector representation for user-provided font? I mean, if a user has some non-standard font installed on his system, I am able to detect that font, I can draw it to off-screen canvas and get raster font representation.
I searched for a long time, but couldn't find any way to get vector representation for such font - there are always those "security restrictions". Is there some way to do it from JavaScript (maybe even with small Flash app)?
Update:
Why would I need such a thing? The reason is that the app I am building works a lot like an SVG editor, allowing a person to create various documents right in his browser. But I also provide facilities for PDF output and for saving the edited document in some intermediate format, so that the user can continue editing later (possibly on another machine).
PDF carries all its fonts right inside the file, so if the user specifies a font that is only installed on his machine, I need to get that font specification and embed it into the PDF file. If I only get a raster font representation, the quality will suffer, and since the documents are intended for printing, quality is paramount.
Same argument goes for save files - since user can transfer them to another machine with another set of fonts, I need to carry non-standard fonts inside the file.

print html page to PDF on a schedule

I have an HTML page that uses JavaScript to generate dynamic images using a graph handler on a different server. The images will contain the same data for 1 week but will change when the 1 week window expires.
I am trying to come up with a way to automatically save the contents of the page to either a local file on the server or a PDF file.
I tried to use a 'web downloader' like HTTrack, but it does not get the dynamic images...
I am running the html page off IIS.
I have no experience with IIS or ASP.
Thanks!
I'm not sure that I see any way to do this directly off the front end in an automatic manner. The challenge is that any "screen scraper" you have go out and grab the site with would need to be running javascript to get the tables, which isn't how I see many such systems operating. It's partially why you see strangeness on Archive.org when you have a site that's heavily augmented with javascript or flash.
An untested concept you might attempt was posted in this Stack Question
I could see some sort of a system that you rig together with another computer that schedules a browser load and then prints to .pdf in some fashion. I've been unable to find any specific software that would automate that process, so you'd be left cobbling such a system together on your own.
Clearly you have the data available to make your dynamic images. The most feature-rich way I could think of would be to use a system like Jasper Reports or Crystal Reports, which you could feed your data, replicate the report, and easily output via pdf, a built-in export in both systems.
Perhaps it's worth questioning your end purpose. To me, creating a "snapshot" of the relevant data in another table and using another system to render your graphs from that snapshot data seems far more valuable than just a print of the screen. You can then go back and adjust data as needed, or use it for other reporting purposes, exporting in any number of tools that are even as simple as Access. Heck, 10 years down the road you may want the data to look better than the graph system you're currently using, and you'd have the data to render it any way you want. When the VP of marketing comes looking for his numbers, a simple click would output those numbers that could be manipulated as needed from there.
I was able to accomplish what I wanted to do using wkhtmltopdf to convert my HTML page with Javascript to PDF. I ran the job via a task scheduler to supply my website url and output file name as parameters.
I then used a windows batch file to check if the file was created and then rename/email it to interested parties.
This of course requires that you have the ability to install wkhtmltopdf on your server.
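For anyone wanting to script the same idea, here is a rough sketch of the job as a small Node script, which a task scheduler could then run weekly. This swaps in Node for the batch file used above. The URL, the file name prefix, and the 5 second delay are assumptions for the example. `--javascript-delay` is the wkhtmltopdf option that waits for scripts to finish before the page is captured.

```javascript
// Builds a dated output name such as report-2024-01-08.pdf (date taken in UTC).
function snapshotFileName(date, prefix = "report") {
  const day = date.toISOString().slice(0, 10);
  return `${prefix}-${day}.pdf`;
}

// Arguments for: wkhtmltopdf --javascript-delay <ms> <url> <output>
function wkhtmltopdfArgs(url, outputFile, delayMs = 5000) {
  return ["--javascript-delay", String(delayMs), url, outputFile];
}

// Runs the conversion. Requires wkhtmltopdf to be installed and on the PATH.
function runSnapshot(url, date = new Date()) {
  const { execFile } = require("child_process"); // loaded only when actually run
  const output = snapshotFileName(date);
  execFile("wkhtmltopdf", wkhtmltopdfArgs(url, output), (error) => {
    if (error) {
      console.error("Conversion failed:", error.message);
    } else {
      console.log("Wrote", output);
    }
  });
}

// Hypothetical page URL, shown without running the conversion.
const exampleArgs = wkhtmltopdfArgs(
  "http://localhost/graphs.html",
  snapshotFileName(new Date(Date.UTC(2024, 0, 8)))
);
console.log(exampleArgs.join(" "));
// --javascript-delay 5000 http://localhost/graphs.html report-2024-01-08.pdf
```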

Saving Div Content As Image On Server

I have been learning a bit of jQuery and .NET in VB. I have created a product customization tool of sorts that basically layers up divs and adds text, images, etc. on top of a T-shirt.
I'm stuck on an important stage!
I need to be able to convert the content of the div that wraps all these divs of text and images to one flat image taking into account any CSS that has been applied to it also.
I have heard of things that I could use to screen capture the content of a browser on the server which could be possible for low res thumbs etc, but it sounds a little troublesome! and it would really be nice to create an image of high res.
I have also heard of converting the HTML to an HTML5 canvas and then writing that out... but it looks too complicated for me to fathom, and browser support is an issue.
Is this possible in .NET?
Perhaps something with javascript could be done?
Any help or guidance in the correct direction would be appreciated!
EDIT:
I'm thinking perhaps I could do with two solutions for this. Ideally I would end up with a normal res jpg/png etc for displaying on the website, But also a print ready high res file would be very desirable as well.
PostScript Printer - I have heard of it but I'm struggling to find a good resource to understand it as a beginner (especially with the wiki blackout). Perhaps I could create an HTML page from my div content and print it to an EPS file. Does anyone know any good tutorials for this?
We did this... about 10 years ago. Interestingly, the tech available really hasn't changed too much.
update - Best Answer
Spreadshirt licenses their product: http://blog.spreadshirt.net/uk/2007/11/27/everyones-a-designer-free-designers-for-premium-partners/
Just license it. Don't do this yourself, unless you have real graphics manipulating and print production experience. I'd say in today's world you're looking at somewhere around 4,000 to 5,000 hours of dev time to duplicate what they did... And that's if you have two top tier people working on it.
Short answer: you can't do it in html.
Slightly longer answer:
It doesn't work in part because you can't screen cap the client side and get the level of resolution needed for production type printing. Modern screen resolution is usually on the order of 100 ppi. For a decent print you really need something between 3 and 6 times that density. Otherwise you'll have lots of pixelation and it will generally look like crap when it comes out.
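To put numbers on that, the pixel dimensions needed are just the print size in inches multiplied by the target density. A quick sketch, where the 12 x 16 inch print area is a made-up example size:

```javascript
// Pixel dimensions required to print an area at a given density (pixels per inch).
function pixelsForPrint(widthInches, heightInches, ppi) {
  return {
    width: Math.round(widthInches * ppi),
    height: Math.round(heightInches * ppi),
  };
}

console.log(pixelsForPrint(12, 16, 100)); // { width: 1200, height: 1600 }
console.log(pixelsForPrint(12, 16, 300)); // { width: 3600, height: 4800 }
```

A capture at screen density gives you the first figure, while a print at 3 times that density needs the second, which is 9 times as many pixels.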
A different Answer:
Your best bet is to leverage something like SVG (scalable vector graphics) and provide a type of drawing surface to the browser. There are several ways of doing this using Flash (Spreadshirt.com uses this) or Silverlight (not recommended). We used flash and it was pretty good.
You might be able to get away with using HTML 5. Regardless, whatever path you pick is going to be complicated.
Once the user is happy with their drawing and wants to print it out, you create the final file and run a process to convert it to Postscript or whatever format your t-shirt provider needs. The converter (aka RIP software) is going to either take a long time to develop or cost a bunch of money... pick one. (helpful hint: buy it. Back then, we spent around $20k US and it was far cheaper than trying to develop).
Of course, this ignores issues such as color matching and calibration. This was actually our primary problem. Everyone's monitor is slightly different and what looks like red on one machine is pink on another.
And for a little background, we were doing customized wrapping paper. The user added text, selected images from our library or uploaded their own, and picked a pattern. Our prints came out on large-format HP Inkjet printers (36" and 60" wide). Ultimately we spent between $200k and $300k just on dev resources to make it happen... and it did. Unfortunately, the price point we had to sell at was too high for the market.
If you can use a server-side tool, check out PhantomJS. It is a headless WebKit browser (with no GUI) that can take a screenshot of a page and is driven through a JavaScript API. It should do the trick.
Send the whole div with user generated content back to server using ajax call.
Generate an HTML Document on server using 'HtmlTextWriter' class.
Then you can convert that HTML file using external tools like
(1) http://www.officeconvert.com/products_website_to_image.htm#easyhtmlsnapshot
(2) http://html-to-image.acasystems.com/faq-html-to-picture.htm
which are not free tools, but you can use them by creating new Process on server.
The best option I came across is wkhtmltopdf. It comes with a tool called wkhtmltoimage. It uses QtWebKit (A Qt port of the WebKit rendering engine) to render a web page, and converts the result to PDF or image format of your choice, all done at server side.
Because it uses WebKit, it renders everything (images, css and even javascript) just like a modern browser does. In my use case, the results have been very satisfying and are almost identical to what browsers would render.
To start, you may want to look at how to run external tools in .NET:
Execute an external EXE with C#.NET
