Preview generated docx before downloading [duplicate]

Preview generated docx before downloading [duplicate] - javascript

I have successfully done code to display a PDF file in the browser instead of the "Open/Save" dialog. Now, I'm stuck trying to display a Word document in the browser. I want to display a Word document in Firefox, IE7+, Chrome etc.
Can any one help? I am always getting the "Open/Save" dialog while displaying the Word doc in browser. I want to implement this functionality using JavaScript.

No browsers currently have the code necessary to render Word Documents, and as far as I know, there are no client-side libraries that currently exist for rendering them either.
However, if you only need to display the Word Document, but don't need to edit it, you can use Google Documents' Viewer via an <iframe> to display a remotely hosted .doc/.docx.
<iframe src="https://docs.google.com/gview?url=http://remote.url.tld/path/to/document.doc&embedded=true"></iframe>
Solution adapted from "How to display a word document using fancybox".
Example:
JSFiddle
However, if you'd rather have native support, in most, if not all browsers, I'd recommend resaving the .doc/.docx as a PDF file Those can also be independently rendered using PDF.js by Mozilla.
Edit:
Huge thanks to cubeguerrero for posting the Microsoft Office 365 viewer in the comments.
<iframe src='https://view.officeapps.live.com/op/embed.aspx?src=http://remote.url.tld/path/to/document.doc' width='1366px' height='623px' frameborder='0'>This is an embedded <a target='_blank' href='http://office.com'>Microsoft Office</a> document, powered by <a target='_blank' href='http://office.com/webapps'>Office Online</a>.</iframe>
One more important caveat to keep in mind, as pointed out by lightswitch05, is that this will upload your document to a third-party server. If this is unacceptable, then this method of display isn't the proper course of action.
Live Examples:
Google Docs Viewer
Microsoft Office Viewer

The answers by Brandon and fatbotdesigns are both correct, but having implemented the Google docs preview, we found multiple .docx files that couldn't be handled by Google. Switched to the MS Office Online preview and works likes a charm.
My recommendation would be to use the MS Office Preview URL over Google's.
https://view.officeapps.live.com/op/embed.aspx?src=http://remote.url.tld/path/to/document.doc'

There are a few JS libraries that can handle .docx (not .doc) to HTML conversion client-side (in no particular order):
https://github.com/VolodymyrBaydalka/docxjs — works in the browser, docx only, complex layouts might be off, live demo
https://github.com/lalalic/docx2html (npm) — docx to html, most elements are supported, works in browser or nodejs
https://github.com/mwilliamson/mammoth.js — supports headings, lists, tables, endnotes, footnotes, images and text boxes
https://github.com/artburkart/docx2html — apparently, works in the browser
Note: If you are looking for the best way to convert a doc/docx file on the client side, then probably the answer is don't do it. If you really need to do it then do it server-side, i.e. with libreoffice in headless mode, apache-poi (java), pandoc etc.

A great solution if your data is confidential
Since the documents are confidential, they should not be processed on third-party resources.
This solution is open-source:
On the server-side: use Gotenberg to convert word & excel files to PDF.Note: Gotenberg works like a charm, it is based on the LibreOffice engine.
On the frontend: It's very easy to render the PDF file with javascript. (You can use libraries like: pdf.js, react-pdf, etc.)

ViewerJS is helpful to view/embed openoffice format like odt,odp,ods and also pdf.
For embed openoffice/pdf document
<iframe src = "/ViewerJS/#../demo/ohm2013.odp" width='700' height='550' allowfullscreen webkitallowfullscreen></iframe>
/ViewerJS/ is the path of ViewerJS
#../demo/ohm2013 is the path of your file want to embed

I think I have an idea.
This has been doing my nut in too and I'm still having trouble getting it to display in Chrome.
Save document(name.docx) in word as single file webpage (name.mht)
In your html use
<iframe src= "name.mht" width="100%" height="800"> </iframe>
Alter the heights and widths as you see fit.

If you wanted to pre-process your DOCX files, rather than waiting until runtime you could convert them into HTML first by using a file conversion API such as Zamzar. You could use the API to programatically convert from DOCX to HMTL, save the output to your server and then serve that HTML up to your end users.
Conversion is pretty easy:
curl https://api.zamzar.com/v1/jobs \
-u API_KEY: \
-X POST \
-F "source_file=#my.docx" \
-F "target_format=html5"
This would remove any runtime dependencies on Google & Microsoft's services (for example if they were down, or you were rate limited by them).
It also has the benefit that you could extend to other filetypes if you wanted (PPTX, XLS, DOC etc)

Native Documents (in which I have an interest) makes a viewer (and editor) specifically for Word documents (both legacy binary .doc and modern docx formats). It does so without lossy conversion to HTML. Here's how to get started https://github.com/NativeDocuments/nd-WordFileEditor/blob/master/README.md

PDFTron WebViewer supports rendering of Word (and other Office formats) directly in any browser and without any server side dependencies.
To test, try https://www.pdftron.com/webviewer/demo

You can also use some existing API's like GroupDocs.Viewer which can convert your document into image or html and then you will be able to display it in your own application.

Use Libre Office API
Here is an example
libreoffice --headless --convert-to html docx-file-path --outdir html-dir-path

Related

How to remove Active Content from uploaded PDF documents?

I developed a kind of job application website and I only now realized that by allowing the upload of PDF files I'm at risk of receiving PDF documents containing encrypted data, active content (e.g. JavaScript, PostScript), and external references.
What could I use to sanitize or re-build the content of every PDF files uploaded by users?
I want that the companies that will later review the uploaded resumes are able to open the resumes from their browsers without putting them at risk..

The simplest method to flatten or sanitise a PDF that can be done using using GhostScript in safer mode requires just one pass:-
For a Windows user it will be as "simple" as using new 9.55 command
"c:\path to gs9.55\bin\GSwin64c.exe" -sDEVICE=pdfwrite -dNEWPDF -o "Output.pdf" "Input.pdf"
for others replace gs9.55\bin\GSwin64c with version 9.55 GS command
It is not a fast method e.g. around 40ppm is not uncommon, thus 4 pages is about 6 seconds to be reprinted, however, a 400 page document could take 10 minutes.
Advantages the file size is often smaller once any redundant content is removed. Images and font reconstruction may save storage, e.g. a 100 MB file may be reduced to 30 MB but that is a general bonus, not an aim.
JavaScript actions are usually discarded, However links such as bookmarks are usually retained, so be cautious as the result can still have rogue hyperlinks.
The next best suggestion is two passes via PostScript as discussed here https://security.stackexchange.com/questions/103323/effectiveness-of-flattening-a-pdf-to-remove-malware
GS[win64c] -sDEVICE=ps2write -o "%temp%\temp.ps" "Input.pdf"
GS[win64c] -sDEVICE=pdfwrite -o "Output.pdf" "%temp%\temp.ps"
But there is no proof that its any different or more effective than the one line approach.
Finally the strictest method of all, is burst the pdf into image only pages then stitch the images back into a single pdf and concurrently run OCR to reconstruct a searchable PDF (drops bookmarks). That can also be done using Ghostscript enabled with Tesseract.
Note:- visible external hyperlinks may then still be reactivated due to the pdf readers native ability.

How to serve blob and have good filename for all users?

I have a PDF file as a blob object. I want to serve to my users, and right now I'm doing:
html = '<iframe src="' + URL.createURL(blob) + '">';
That works fine for people that want to use their in-browser PDF tool.
But...some people have their browser set to automatically download PDFs. For those people, the name of the downloaded file is some random string based on the blob URL. That's a bad experience for them.
I know I can also do:
<a href="blobURL" download="some-filename.pdf">
But that's a bad experience for the people who want to use in-browser PDF readers, since it forces them to download the file.
Is there a way to make everybody have good file names and to allow everybody to read the PDF the way they want to (in their browser or in their OS's reader)?
Thanks

At least looking at Google Chrome, if the user disables the PDF Viewer (using the option "Download PDF files instead of automatically opening them in Chrome") then window.navigator.plugins will show neither "Chromium PDF Plugin" nor "Chromium PDF Viewer". If the option is left at the default setting, the viewer will show in the plugin list.
Using this method, one can utilize window.navigator.plugins to check if any of the elements' names are either of the aforementioned plugins. Then, depending upon that result, either display a <iframe> or a <a href="blobUrl" download="file.pdf">. For other browsers I imagine that different methods would have to be used. You can also check for a "Acrobat Reader" plugin, which some machines may have instead, or even just the word "PDF".
On a side note, it does look like it is possible to detect if the default Firefox PDF viewer is enabled by using http://www.pinlady.net/PluginDetect/PDFjs/ .

Try to append &filename=thename.pdf to the binary, metadata or http header:
Content-Disposition: attachment; filename="thename.pdf"

I have looked through the documentation of createObjectURL(blob), it will always return a unique and specific format of url. It is not possible to change the URL here.
The plugin thing is not consistent across browsers.
Now here is my radical idea
Find or create(if not available) a js library that can create and save PDF files to server from blob. (I looked through some of them like 'jsPDF','pdfkit' but none of them use blob)
Save the file to server with a valid name
use the above name in the iframe.

<!DOCTYPE html> in JS file

I am referencing two JS files in my map.HTML header. Chrome console gives
Uncaught SyntaxError: Unexpected token <
Here is why I'm confused. When I click on the Chrome Console error message, it takes me to the Sources tab. Under Sources, it puts me on the relative JS tab, and shows code starting with < !DOCTYPE html> then continues with a ton of code that is not in my map.html file or JS file. Presumably this is generated when the JS is read?
The two JS files are:
https://github.com/socib/Leaflet.TimeDimension/tree/master/dist
https://github.com/calvinmetcalf/leaflet-ajax/tree/gh-pages/dist
I am opening map.HTML locally with Chrome using a simple python server using a batch file (python.exe -m http.server).
I am sure this is very basic, but it's confusing me because I reference plenty of other JS files both online and locally and I don't get this error.
Thanks

If you try https://github.com/socib/Leaflet.TimeDimension/blob/master/dist/leaflet.timedimension.min.js in your browser, you will get an HTML page.
If you try https://raw.githubusercontent.com/socib/Leaflet.TimeDimension/master/dist/leaflet.timedimension.min.js you will get what seams a source javascript file. But your browser may also consider it text/html, because that's what github sends in content-type header.
You can use third party sites which will serve files with appropriate content-type header, (example: https://rawgit.com/socib/Leaflet.TimeDimension/master/dist/leaflet.timedimension.min.js ).

In the future, try to do more research before posting here, otherwise a lot of people are going to downvote your questions, and even insult you.
A simple Google search for the differences between html and javascript may be a good start. The first step would be to remove those doctype lines. They mean nothing in Javascript. Just like the word granola has no meaning in Japanese. Different languages.
However, looking at your code, I don't see any DOCTYPE text in your javascript. In order to really debug this, you're going to want to open your webpage (html) in a browser (I recommend Chrome) and press F12 to open the developer tools. Go to the console and trace the error back through all of the files to find the origin.
In order to check and make sure that you're trying to pull javascript files and not html, take all the src urls you're using and paste them in a browser. If you land on a webpage, that url will serve up html, not javascript like you want. If you get a wall of text, you're probably referencing it correctly.
Correct: https://api.mapbox.com/mapbox.js/v3.0.1/mapbox.js
Incorrect: https://github.com/socib/Leaflet.TimeDimension/blob/master/dist/leaflet.timedimension.min.js
Hopefully this helps before this question gets deleted or put on hold. Also notice that people are going to downvote me for actually answering and trying to help.

You can't directly reference code stored in a github repo like you're trying to.
The URLs you're listing aren't javascript files; they're github webpages. That's why they contain HTML doctypes and code you don't recognize -- it's the github website code.
You can get the URL for the actual javascript files by clicking the "raw" button at the top of any of those pages (after selecting a specific individual file -- the urls you gave were for directories, not individual files.) For example:
This is an HTML file: https://github.com/socib/Leaflet.TimeDimension/blob/master/dist/leaflet.timedimension.min.js
This is the raw javascript:
https://raw.githubusercontent.com/socib/Leaflet.TimeDimension/master/dist/leaflet.timedimension.min.js
(That said, I don't believe it's a good idea to treat github like a CDN; usually you would use that purely as a repository and host the actual files in use elsewhere.)

Hiding a PDF in html for download on link click

I am facing a situation in my application where i need to add a pdf download link on to my page, but i cannot refer to any relative or absolute path for the file, I need to host my pdf inside the html only.
Is there any way to do this, using basic HTML and JavaScript?
Details summary of situation is as below :
there is an application which owned by someone else, i am customizing if for a particular client.
we are given a provision to place some html(s) in a directory which are used in a few pages in the application.
these htmls are not used with href or include in the product application, but are picked up by the product's java code and are added in the response, thus keeping a PDF in the same folder as my HTML and providing relative URL wont work here.. and putting absolute URL is also not a solution as this needs to work across multiple environments.

You can always use data URI links:
download PDF!
Possibly together with the download attribute:
...
E.g. http://jsbin.com/gutahugoci/ (PDF is from here).
To encode in base64 use base64 -w0 my_file.pdf > my_file.pdf.b64
Disclaimer: Please notice that I said "you can use", not "you should use. This should only be a last resort thing to do for PDFs, because the HTML file will become exceedingly big and your client might ask if your are kidding them.

why don't you encode the url
/download_pdf.php?id=r4yhr4hrb4rd54hddb4
Where r4yhr4hrb4rd54hddb4 is and encoded id?

Creating OpenXML documents using JavaScript

I have an application that needs to create simple OpenXML documents (in particular PowerPoint presentations) using JavaScript.
Can anyone suggest how to get started on this please (or even if it is possible)? I've used the Microsoft OpenXML SDK for doing something similar using C#, and was wondering whether there were any JavaScript libraries with similar functionality.
Essentially the problem is how to create the individual OpenXML documents that make up an unzipped PowerPoint document, then zip them together to create the PowerPoint (.pptx) file, which someone can then save to their disk.
Any ideas welcome!

Use the OpenXML SDK for Javascript

Obviously, operations such as Zipping/unZipping a document or saving a document cannot be done client-side and with pure javascript.
However, if you want to do such things, I do believe that there are Linux packages out there that accept strings as input and give you a ready to use Office document as output.
If you're not convenient with Linux packages, assuming you want to save this as a word 2007 document:
<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body>
<w:p>
<w:pPr>
<w:pStyle w:val="MyHeading1" />
</w:pPr>
<w:r>
<w:t>This is Heading</w:t>
</w:r>
</w:p>
</w:body>
</w:document>
You can build this string, client-side. then send it to server through AJAX and let your server deal with it. specifically I have used these APIs myself multiple times. let PHP handle it. save the result somewhere, or force client's browser to download it (stream results)

USE OPEN XML SDK.
You can run it on node and in 32 seconds it creates 2000 documents. Or you can run it on the browser.

Develop Reference

JavaScript is the programming language of the Web.