Chrome extension: How to show custom UI for a PDF file?

I'm trying to write a Google Chrome extension for showing PDF files. As soon as I detect that the browser is navigating to a URL pointing to a PDF file, I want it to stop loading the default PDF viewer and show my UI instead. The UI will use PDF.js to render the PDF and jQuery UI to show some other stuff.
Question: how do I do this? It's very important to block the original PDF viewer, because I don't want to double memory consumption by showing two instances of the document. Therefore, I should somehow navigate the tab to my own view.

As the main author of the PDF.js Chrome extension, I can share some insights about the logic behind building a PDF Viewer extension for Chrome.
How to detect a PDF file?
In a perfect world, every website would serve PDF files with the standard application/pdf MIME-type. Unfortunately, the real world is not perfect, and in practice there are many websites which use an incorrect MIME-type. You will catch the majority of the cases by selecting requests that satisfy any of the following conditions:
The resource is served with the Content-Type: application/pdf response header.
The resource is served with the Content-Type: application/octet-stream response header, and its URL contains ".pdf" (case-insensitive).
Besides that, you also have to detect whether the user wants to view the PDF file or download the PDF file. If you don't care about the distinction, it's easy: Just intercept the request if it matches any of the previous conditions.
Otherwise (and this is the approach I've taken), you need to check whether the Content-Disposition response header exists and its value starts with "attachment". If it does, the server is asking for a download, so leave the request alone.
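The detection rules above can be sketched as a small helper. This is only a minimal sketch, not the actual PDF.js code: the names isPdfFile and getHeader are made up for illustration, and details is assumed to have the shape that chrome.webRequest passes to onHeadersReceived (a url string plus a responseHeaders array of {name, value} objects).

```javascript
// Hypothetical helper: look up a response header by name (case-insensitive).
function getHeader(responseHeaders, name) {
  var headers = responseHeaders || [];
  name = name.toLowerCase();
  for (var i = 0; i < headers.length; i++) {
    if (headers[i].name.toLowerCase() === name) {
      return headers[i].value || '';
    }
  }
  return '';
}

// Hypothetical helper: true if the response should be opened in the viewer.
function isPdfFile(details) {
  // Ignore parameters such as "; charset=..." after the MIME type.
  var contentType = getHeader(details.responseHeaders, 'content-type')
    .split(';')[0].trim().toLowerCase();
  var disposition = getHeader(details.responseHeaders, 'content-disposition');

  // The server asked for a download: do not intercept.
  if (/^\s*attachment/i.test(disposition)) {
    return false;
  }
  if (contentType === 'application/pdf') {
    return true;
  }
  // Misconfigured servers: generic binary type, but the URL mentions ".pdf".
  if (contentType === 'application/octet-stream') {
    return details.url.toLowerCase().indexOf('.pdf') !== -1;
  }
  return false;
}
```

If you don't care about the view/download distinction, drop the Content-Disposition check.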
If you want to support PDF downloads (e.g. via your UI), then you need to add the Content-Disposition: attachment response header. If the header already exists, then you have to replace the existing disposition type (e.g. inline) with "attachment". Don't bother with trying to parse the full header value, just strip the first part up to the first semicolon, then put "attachment" in front of it. (If you really want to parse the header, read RFC 2616 (section 19.5.1) and RFC 6266).
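The "strip up to the first semicolon" rule could look roughly like this. It is a sketch under the same assumptions as before: forceAttachment and withAttachmentHeader are illustrative names, and the headers are assumed to be the {name, value} array used by the webRequest API.

```javascript
// Replace the disposition type (e.g. "inline") with "attachment",
// keeping any parameters such as filename="..." untouched.
function forceAttachment(headerValue) {
  if (!headerValue) {
    return 'attachment';
  }
  var semicolon = headerValue.indexOf(';');
  if (semicolon === -1) {
    // No parameters: the whole value is the disposition type.
    return 'attachment';
  }
  return 'attachment' + headerValue.slice(semicolon);
}

// Return a copy of the headers with Content-Disposition forced to
// "attachment"; the header is added if it was missing.
function withAttachmentHeader(responseHeaders) {
  var found = false;
  var headers = (responseHeaders || []).map(function (header) {
    if (header.name.toLowerCase() !== 'content-disposition') {
      return header;
    }
    found = true;
    return { name: header.name, value: forceAttachment(header.value) };
  });
  if (!found) {
    headers.push({ name: 'Content-Disposition', value: 'attachment' });
  }
  return headers;
}
```

In a blocking onHeadersReceived listener, the result would be returned as { responseHeaders: withAttachmentHeader(details.responseHeaders) }.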
Which Chrome (extension) APIs should I use to intercept PDF files?
The chrome.webRequest API can be used to intercept and redirect requests. With the following logic, you can intercept and redirect PDFs to your custom viewer that requests the PDF file from the given URL.
chrome.webRequest.onHeadersReceived.addListener(function(details) {
    // isPdfFile is a placeholder for the detection logic described above;
    // it is not defined in this snippet.
    if (!isPdfFile(details))
        return; // Nope, not a PDF file. Ignore this request.
    // chrome.runtime.getURL replaces the deprecated chrome.extension.getURL.
    var viewerUrl = chrome.runtime.getURL('viewer.html') +
        '?file=' + encodeURIComponent(details.url);
    return { redirectUrl: viewerUrl };
}, {
    urls: ["<all_urls>"],
    types: ["main_frame", "sub_frame"]
}, ["responseHeaders", "blocking"]);
(see https://github.com/mozilla/pdf.js/blob/master/extensions/chromium/pdfHandler.js for the actual implementation of the PDF detection using the logic described at the top of this answer)
With the above code, you can intercept any PDF file on http and https URLs.
If you want to view PDF files from the local filesystem and/or ftp, then you need to use the chrome.webRequest.onBeforeRequest event instead of onHeadersReceived. Fortunately, you can assume that if the file ends with ".pdf", then the resource is most likely a PDF file. Users who want to use the extension to view a local PDF file have to explicitly allow this at the extension settings page though.
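For local files, a listener along these lines might do. Treat it as a sketch: buildViewerUrl is an illustrative helper, viewer.html is the same viewer page as in the snippet above, and the URL patterns are an assumption about how to express "ends with .pdf" for file and ftp URLs.

```javascript
// Illustrative helper: viewerBase would normally come from
// chrome.runtime.getURL('viewer.html').
function buildViewerUrl(viewerBase, fileUrl) {
  return viewerBase + '?file=' + encodeURIComponent(fileUrl);
}

// Only register the listener when running inside an extension.
if (typeof chrome !== 'undefined' && chrome.webRequest) {
  chrome.webRequest.onBeforeRequest.addListener(function (details) {
    // No response headers are available for local files, so the
    // decision is based on the URL filter below alone.
    return {
      redirectUrl: buildViewerUrl(chrome.runtime.getURL('viewer.html'), details.url)
    };
  }, {
    // Assumed patterns; match patterns are case-sensitive, hence both spellings.
    urls: ['file://*/*.pdf', 'file://*/*.PDF', 'ftp://*/*.pdf', 'ftp://*/*.PDF'],
    types: ['main_frame', 'sub_frame']
  }, ['blocking']);
}
```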
On Chrome OS, use the chrome.fileBrowserHandler API to register your extension as a PDF Viewer (https://github.com/mozilla/pdf.js/blob/master/extensions/chromium/pdfHandler-vcros.js).
The methods based on the webRequest API only work for PDFs in top-level documents and frames, not for PDFs embedded via <object> and <embed>. Although they are rare, I still wanted to support them, so I came up with an unconventional method to detect and load the PDF viewer in these contexts. The implementation can be viewed at https://github.com/mozilla/pdf.js/pull/4549/files. This method relies on the fact that when an element is put in the document, it eventually has to be rendered. When it is rendered, CSS styles get applied. When I declare an animation for the embed/object elements in the CSS, animation events will be triggered. These events bubble up in the document. I can then add a listener for this event, and replace the content of the object/embed element with an iframe that loads my PDF Viewer.
There are several ways to replace an element or content, but I've used Shadow DOM to change the displayed content without affecting the DOM in the page.
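The animation trick might be sketched as follows. This is not the code from the pull request: the animation name, the CSS, and the isPdfElement check are assumptions made for illustration, and the step that actually swaps in the viewer iframe is left out.

```javascript
var ANIMATION_NAME = 'pdf-embed-detected'; // hypothetical name

// An animation with empty keyframes changes nothing visually; it exists
// only so that an animation event is dispatched for the element.
var DETECTION_CSS =
  '@keyframes ' + ANIMATION_NAME + ' { from {} to {} }\n' +
  'embed, object { animation: ' + ANIMATION_NAME + ' 0.001s; }';

// Illustrative check: does this <embed>/<object> look like a PDF?
function isPdfElement(el) {
  var type = (el.type || '').toLowerCase();
  if (type === 'application/pdf') {
    return true;
  }
  var url = el.src || el.data || ''; // <embed src=...> or <object data=...>
  return !type && /\.pdf($|[?#])/i.test(url);
}

// Browser-only part, as it could run in a content script.
if (typeof document !== 'undefined') {
  var style = document.createElement('style');
  style.textContent = DETECTION_CSS;
  (document.head || document.documentElement).appendChild(style);

  document.addEventListener('animationstart', function (event) {
    if (event.animationName !== ANIMATION_NAME || !isPdfElement(event.target)) {
      return;
    }
    // This is where the displayed content would be replaced with an
    // <iframe> pointing at the viewer.
    console.log('PDF embed detected:', event.target.src || event.target.data);
  }, true);
}
```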
Limitations and notes
The method described here has a few limitations:
The PDF file is requested at least two times from the server: First a usual request to get the headers, which gets aborted when the extension redirects to the PDF Viewer. Then another request to request the actual data.
Consequently, if a file is valid only once, then the PDF cannot be displayed (the first request invalidates the URL and the second request fails).
This method only works for GET requests. There is no public API to directly get response bodies from a request in a Chrome extension (crbug.com/104058).
The method to get PDFs to work for <object> and <embed> elements requires a script to run on every page. I've profiled the code and found that the impact on performance is negligible, but you still need to be careful if you want to change the logic.
(I first tried to use Mutation Observers for detection, which slowed down the page load by 3-20% on huge documents, and caused an additional 1.5 GB peak in memory usage in a complex DOM benchmark).
The method to detect <object> / <embed> tags might still cause any NPAPI/PPAPI-based PDF plugins to load, because it only replaces the <embed>/<object> tag's content after the tag has been inserted and rendered. When a tab is inactive, animations are not scheduled, and hence the dispatch of the animation event will be significantly delayed.
Afterword
PDF.js is open source; you can view the code for the Chrome extension at https://github.com/mozilla/pdf.js/tree/master/extensions/chromium. If you browse the source, you'll notice that the code is a bit more complex than I explained here. That's because extensions could not redirect requests at the onHeadersReceived event until I implemented it a few months ago (crbug.com/280464, Chrome 35).
And there is also some logic to make the URL in the omnibox look a bit better.
The PDF.js extension continues to evolve, so unless you want to significantly change the UI of the PDF Viewer, I suggest asking users to install the official PDF.js PDF Viewer from the Chrome Web Store, and/or opening issues on the PDF.js issue tracker for reasonable feature requests.

Custom PDF Viewer
Basically, to accomplish what you want to do you'll need to:
Intercept the PDF's URL when it's loaded;
Stop the PDF from loading;
Start your own PDF viewer and load the PDF inside it.
How to
Using the chrome.webRequest API you can easily listen to the web requests made by Chrome, and, more specifically, the ones that are going to load .pdf files. Using the chrome.webRequest.onBeforeRequest event you can listen to all the requests that end with ".pdf" and get the URL of the requested resource.
Create a page, for example display_pdf.html where you will show the PDFs and do whatever you want with them.
In the chrome.webRequest.onBeforeRequest listener, prevent the resource from being loaded by returning {redirectUrl: ...} to redirect to your display_pdf.html page.
Pass the PDF's URL to your page. This can be done in several ways, but, for me, the simplest one is to add the encoded PDF URL at the end of your page's url, like an encoded query string, something like display_pdf.html?url=http%3A%2F%2Fwww.example.com%2Fexample.pdf.
Inside the page, get the URL with JavaScript and process and render the PDF with any library you want, like PDF.js.
The code
Following the above steps, your extension will look like this:
<root>/
/background.js
/display_pdf.html
/display_pdf.js
/manifest.json
So, first of all, let's look at the manifest.json file: you will need to declare the permissions for webRequest and webRequestBlocking, so it should look like this:
{
    "manifest_version": 2,
    "name": "PDF Test",
    "version": "0.0.1",
    "background": {
        "scripts": ["/background.js"]
    },
    "permissions": ["webRequest", "webRequestBlocking", "<all_urls>"]
}
Then, in your background.js you will listen to the chrome.webRequest.onBeforeRequest event and redirect the tab which is loading the PDF to the URL of your custom display_pdf.html page, like this:
chrome.webRequest.onBeforeRequest.addListener(function(details) {
    var displayURL;
    if (/\.pdf$/i.test(details.url)) { // the requested URL ends with ".pdf"
        displayURL = chrome.runtime.getURL('/display_pdf.html') + '?url=' + encodeURIComponent(details.url);
        // stop the request and proceed to your custom display page
        return {redirectUrl: displayURL};
    }
}, {urls: ['*://*/*.pdf']}, ['blocking']);
And finally, in your display_pdf.js file you will extract the PDF's url from the query string and use it to do whatever you want:
var PDF_URL = decodeURIComponent(location.href.split('?url=')[1]);
// this will be something like http://www.somesite.com/path/to/example.pdf
alert('The PDF url is: ' + PDF_URL);
// do something with the pdf... like processing it with PDF.js
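Splitting on '?url=' works for the simple case above. As an alternative sketch, the standard URL API can do the parsing and decoding; getPdfUrl is just an illustrative name, and the parameter is assumed to be called url as in the rest of this answer.

```javascript
// Extract the PDF URL from the viewer page's own address.
// searchParams.get() already percent-decodes the value and returns
// null when the parameter is missing.
function getPdfUrl(pageHref) {
  return new URL(pageHref).searchParams.get('url');
}
```

Inside display_pdf.js this would be called as getPdfUrl(location.href).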
Working Example
A working example of what I said above can be found HERE.
Documentation links
I recommend taking a look at the official documentation for the APIs mentioned above:
chrome.webRequest API
chrome.webRequest.onBeforeRequest event
chrome.runtime API
chrome.runtime.getURL method

Related

How to display a PDF without downloading

I have a PDF link like this:
"http://centraldata.s3.amazonaws.com/.....pdf?AWSAccessKeyId=...."
which I get from an API call. Then I pass it into a link so that users can click and download it.
<a href={pdfUrl} />
So, my question is: is there a way to let the user view the PDF without downloading it? Apart from passing the URL into an <a> tag, I don't know if there is any other way to use this link.

When you place a page of HTML in the Public Domain (World Wide Web) you are offering a service with Dis-positions (Download to later view this page after decoding download).
If you include images, text, audio, video or even a PDF via link, then you are offering to disposition a copy of the page content (be dispossessed of all with its content) from the server to the browser.
A web site can indicate to the browser that the download need not be viewed in the browser (many browsers do not have a PDF viewer, or the browser may be secured to a safer setting such as Download Media ONLY). In that case the HTTP response could include the attachment headers:
Content-Type: application/pdf
Content-Disposition: attachment; filename="file.pdf"
A web site can instead indicate that the file may be viewed in the browser when the user's settings allow inline viewing (many browsers do not have a PDF viewer, or it may be secured to a safer setting such as Download ONLY). In that case the HTTP response should include the inline headers:
Content-Type: application/pdf
Content-Disposition: inline; filename="file.pdf"
To avoid problems with the text parsing of the optional filename (blob: or anything will be saved to filename at the discretion of the client dispossesser) then the proffered optional filename should be "double quoted".
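On the server side, the two header sets could be produced by something like the following. This is a sketch only: pdfResponseHeaders is a made-up helper, and it assumes a server where headers are passed as a plain object (for example to Node's res.writeHead).

```javascript
// Build the headers for serving a PDF either inline or as a download.
function pdfResponseHeaders(filename, viewInline) {
  // Keep the filename inside double quotes, and drop characters that
  // would terminate or break the quoted string.
  var safeName = String(filename).replace(/["\\\r\n]/g, '');
  return {
    'Content-Type': 'application/pdf',
    'Content-Disposition':
      (viewInline ? 'inline' : 'attachment') + '; filename="' + safeName + '"'
  };
}
```

With Node's http module this could be used as res.writeHead(200, pdfResponseHeaders('file.pdf', true)). Whether the PDF is then actually shown inline remains up to the browser and the user's settings.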
User security settings should ideally be set to no-popups, like blanks, open secondary windows or target tabs, since those are unfriendly, anti-social server actions.
W3 Recommendation says
<a href="download-file.pdf" download>right click here for your options including view now</a>
If you as my client have an inline PDF viewer active, here is an iframe should you wish to use it. (Your servant)
The next two related questions are
How can I stop that content I duplicated, from being duplicated as a copy by the client?, well clearly you cannot, since you willingly gave it away and once decrypted and decoded by the receiver it belongs to them as editable dis possessed content.
How can I mask the source URL ?, generally you can not since to decode the whole file and maintain it (if required) the sending caller ID must be maintained as an open comms channel during viewing. (Much like satellite download or Netflix recordings on demand.)

just use this
MyPDF

How Can I make my browser show files (images) that a server is sending me with download prompt?

The only solution I've found is to grab the link with getElementsByClassName and then inject it into an HTML snippet on the page, but it looks so fake, and is also unnecessary (I don't want all the links).
I want to right-click a link (one at a time) and open it in a new tab. If I right-click the link, the server sends me a download prompt. How can I avoid this?

I think the browser decides to download a file or display it based on its MIME type.
If the server is under your control, you should make sure you supply the correct Content-Type HTTP header (e.g. you have to call a library function in PHP, and there should be a similar way to do that in other languages).
Otherwise, for a purely client-side solution in JavaScript, you can fetch the file with an XMLHttpRequest (most JavaScript toolkits have wrappers around it). Then, you can convert it to base64, prefix the result with data:image/png;base64, (including the trailing comma), and use it as the src attribute of an img element (thanks https://stackoverflow.com/a/21508186/324969).
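A rough sketch of that client-side approach follows. The function names are illustrative, the image is assumed to be a PNG, and the request is subject to the usual same-origin/CORS restrictions.

```javascript
// Turn raw bytes into a data: URL.
function bytesToDataUrl(bytes, mimeType) {
  var binary = '';
  for (var i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return 'data:' + mimeType + ';base64,' + btoa(binary);
}

// Browser-only: download the file and show it in an existing <img>.
function loadImageInto(img, url) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.responseType = 'arraybuffer'; // raw bytes instead of text
  xhr.onload = function () {
    img.src = bytesToDataUrl(new Uint8Array(xhr.response), 'image/png');
  };
  xhr.send();
}
```

Usage would be something like loadImageInto(document.querySelector('img'), linkUrl), where linkUrl is the address the server normally answers with a download prompt.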
Note that for security aspects, grabbing arbitrary files and stuffing them in a data: URL might not be safe. I don't know if any cross-site scripting or CORS attacks could be built upon this. You'll have to ask a separate question to know if the client-side solution is unsafe. For the server-side, be careful not to set the wrong content-type for user-uploaded data, or for endpoints of your service (e.g. letting the client-side send you in the request the Content-Type that it would like, as tempting as it looks, is a big no-no).
To open the image in a new tab, you can use window.open as usual, but download the image beforehand (using XMLHttpRequest) and put the data:image/png;base64,… as the URL of the new tab.
Since you can already see the images by placing their URL in an img tag, you can paint that img on a <canvas>, extract a PNG from the canvas, craft a data:image/png;base64,… URL from that, and then either automatically open many tabs with these URLs, or write in your page a series of links to data: URLs.
You could also have a link to a tiny web page with just the img tag that you currently use: link text.

How do I take a converted file (pdf to html), and open it locally in a new tab in google chrome?

Basically I have a django app that communicates with a chrome extension. I have a bunch of functionality that interfaces with normal HTML pages that's all done by the extension. I want to allow users to have the same functionalities for PDF files. I have a python script which translates the pdf file into an html page.
The problem I run into is when the pdfs are open locally within chrome.
Like the following file:///home/wcr5048/Downloads/sample_pdf.pdf
This is my current solution, it basically gets the html and replaces all the current html, which is just an embedded pdf, and replaces it with the converted pdf(html). But I run into an issue because the "url" isn't really a url, and therefore I can't append html to something that doesn't exist.
function convert_to_html(request) {
    console.log('converting to html...');
    document.getElementsByTagName('html')[0].innerHTML = request.data;
    chrome.runtime.sendMessage({
        detail: 'refresh'
    });
}
What I don't want to happen is to download a file just like the pdf but one that's been converted into html. I would rather have everything happen automatically.
I only see two possible options:
I create a unique link for the converted pdf file for every user, and then send the raw html string to populate the corresponding view.
I somehow tell the extension to use a popup to cover the entire width of the screen, and then populate it with the data.
Is there a better-suited approach? If not, which of the two would be the better solution?
Thanks for viewing

Starting download of an url shortcut file?

I have created a shortcut file by going to Desktop -> New shortcut and entering a link.
Now I have uploaded this shortcut file (*.url) to the root of my server as shortcut.url.
When I directly access mysite.com/shortcut.url, it does not start downloading the file but instead shows the content of the .url file.
Now on my page where i link to mysite.com/shortcut.url, I have tried the following methods:
How to start automatic download of a file in Internet Explorer?
But none of them seems to work in Chrome (I thought that if the answers work in IE, they would work in Chrome too).
How is it possible to start downloading of this type of file, on click?

Generally speaking, a browser will fetch a resource in download mode (rather than displaying directly) if the content type is one that it cannot handle, and no plugins can handle. The easiest way for that to be the case is by using the content type for generic binary files:
Content-Type: application/octet-stream
Basically, configure your web server to use this content type (often called a MIME type) for .url files. How you do that depends on what server you're using.

load external webpage and add custom header and use the data from webpage

I want to load an external webpage on my own server and add my own header. I also need to use data from the external website, such as the URL and content (I need to search for specific data, check whether I have that data in my system, and show my data in the header). The external webpage needs to keep working (for example, buttons that open other pages should work, without opening new windows).
I know I can use .NET to create desktop software, but I want to create a website that will do the trick. Can this be done? PHP + iframe is too simple, I think: it won't give me the data from the external website, and my server won't see changes in the external URL (which I need).

If it's supposed to be client-side, then you can acquire the data necessary by using an Ajax request, parsing it in JavaScript and then just inserting it into an element. However you have to take into account that if the host doesn't support cross-origin resource sharing, then you won't be able to do it like this.
Ajax page source request: get full html source code of page through ajax request through javascript
Parsing elements from the source: http://ajaxian.com/archives/html-parser-in-javascript (not sure if useful)
Changing the element body:
// data --> the content you want to display in your element
document.getElementById('yourElement').innerHTML = data;
Another approach (server-side though) is to "act" like a browser by faking your user-agent to some browser's and then using cURL, for example, to get the source. But you don't want to fake it, because that's not nice and you would feel bad..
Hope it gets you started!
