Downloading a file from a dynamically generated Blob URL - javascript

Summary:
I am trying to write a bookmarklet that automatically downloads all my receipts from a certain hotel website. I'm able to open a new tab with the PDF showing, but I'm having trouble actually getting the file data and saving it locally.
What I've done/tried:
I've been able to get the bookmarklet to traverse the page, find the right element, and even obtain a link to the PDF. I've also been able to open that link in a new window, where the PDF loads successfully. If I poll that tab with setTimeout, I can even get the blob URL as well.
The problem is that once I have the blob URL, I'm not sure how to actually save it. I'm also not sure I actually can, because if I manually open a new tab and paste the same blob URL into it, I get a 404 Not Found (which makes me think the URL is single-use only).
Code:
document.getElementsByClassName('aa-items past-activities--items')[0].getElementsByTagName('button')[0].click()
var path_to_pdf = document.getElementsByClassName('aa-items past-activities--items')[0].getElementsByClassName('hotel-rebook')[0].querySelectorAll("[data-locator='hotel-bill']")[0].href
var newwindow = window.open(path_to_pdf)
var path_to_pdf_blob
// Poll the new tab once a second until its viewer has switched to a blob: URL
function checkLoad() {
  if (newwindow.location.href.startsWith('blob:')) {
    path_to_pdf_blob = newwindow.location.href
    console.log(path_to_pdf_blob)
    newwindow.close()
  } else {
    // pass the function itself rather than a string to be evaluated
    setTimeout(checkLoad, 1000)
  }
}
checkLoad()
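A blob: URL only lives as long as the document that created it (it is revoked when that document unloads), which likely explains the 404 once the viewer tab has been closed. Rather than capturing the blob URL, the bookmarklet could fetch the PDF link itself and save the response through a temporary download link. This is a minimal sketch, assuming path_to_pdf is same-origin with the page and responds with the PDF bytes (not with an HTML page that builds the PDF in script); saveBlob, downloadPdf and the file names are hypothetical.

```javascript
// Hypothetical helper: save a Blob under `filename` via a temporary <a download>.
// `doc` defaults to the page's document; it is a parameter only so the
// function can be tried outside a browser.
function saveBlob(blob, filename, doc = globalThis.document) {
  const href = URL.createObjectURL(blob);
  const a = doc.createElement('a');
  a.href = href;
  a.download = filename; // suggested name for the saved file
  doc.body.appendChild(a);
  a.click();
  a.remove();
  // Give the browser time to start the download before releasing the URL
  setTimeout(() => URL.revokeObjectURL(href), 1000);
  return href;
}

// Hypothetical helper: fetch the receipt with the page's cookies and save it.
async function downloadPdf(pdfUrl, filename) {
  const res = await fetch(pdfUrl, { credentials: 'include' });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${pdfUrl}`);
  return saveBlob(await res.blob(), filename);
}
```

In the bookmarklet this could replace the window.open/polling part, e.g. awaiting downloadPdf(link.href, 'receipt-' + i + '.pdf') for each element matched by [data-locator='hotel-bill']. Chrome may ask once for permission before letting a page download multiple files.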

Related

How to display PDFs fetched from S3 using JavaScript?

I need to fetch a PDF file from s3.amazonaws.com. When I query it using Postman (or paste the URL directly into the browser), it loads fine. However, when I try to generate a file path for it (to pass to a viewer later), it doesn't work:
fetch(<S3URL>).then(res => res.blob()).then(blob => {
  // THIS STEP DOES NOT WORK
  let myBlob = new Blob(blob, {type: 'application/pdf'});
  // expect something like 'www.mysite.com/my-file.pdf'
  let PDFLink = window.URL.createObjectURL(myBlob);
  return PDFLink;
});
I'm using Autodesk's Forge PDF viewer and it works perfectly fine for local PDF files:
let myPDFLink = 'public/my-file.pdf';
Autodesk.Viewing.Initializer(options, () => {
  viewer = new Autodesk.Viewing.Private.GuiViewer3D(document.getElementById('forgeViewer'));
  viewer.start();
  viewer.loadExtension('Autodesk.PDF').then( () => {
    viewer.loadModel(myPDFLink, viewer); // <-- works fine here
  });
});
// from https://github.com/wallabyway/offline-pdf-markup
So, how do I go from the S3 URL (e.g. s3.amazonaws.com/com.autodesk.oss-persistent/0d/ff/c4/2dfd1860d1...) to something the PDF viewer can understand (i.e. a URL with a .pdf extension)?
I know for JSON files I need to do res.json() to extract the JSON content, but for PDFs, what should I do with the res object?
Note: I don't have control over the S3 URL. Autodesk generates a temporary S3 link whenever I want to download documents from their BIM360 portal.
I tried a lot of options and the only way I could display a PDF fetched via API calls is by using an object element:
<object data='<PDF link>' type='application/pdf'>
Converting the downloaded blob to base64 doesn't work. Putting the PDF link in an iframe doesn't work either (it still downloads instead of displaying). All the options I have read only work if the PDFs are part of the frontend application (i.e. local files, not something fetched from a remote server).
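The snippet in the question fails at new Blob(blob, ...) because the Blob constructor expects an array (sequence) of parts, and a Blob on its own is not one. A minimal sketch of the fetch-to-object-URL step, assuming the S3 response allows cross-origin reads (CORS); toPdfObjectUrl and fetchPdfObjectUrl are hypothetical names. The result is a blob: URL with no .pdf extension, so it suits the object element approach above; whether Forge's loadModel accepts such a URL is not something the post establishes.

```javascript
// Re-type a fetched Blob as a PDF and expose it through an object URL.
// The Blob constructor takes an array of parts, hence [blob].
function toPdfObjectUrl(blob) {
  const pdfBlob = new Blob([blob], { type: 'application/pdf' });
  return URL.createObjectURL(pdfBlob); // "blob:<origin>/<uuid>", no .pdf suffix
}

// Hypothetical wrapper around the whole fetch chain; resolves to the URL.
async function fetchPdfObjectUrl(s3Url) {
  const res = await fetch(s3Url);
  if (!res.ok) throw new Error(`HTTP ${res.status} fetching PDF`);
  return toPdfObjectUrl(await res.blob());
}
```

Usage: fetchPdfObjectUrl(url).then(link => { objectElement.data = link; }), then call URL.revokeObjectURL(link) once the document is no longer displayed.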

I copied the URL text and pasted it into a new browser window using Protractor, but the window shows [object Object] instead of the actual URL

My actual URL is https://wallboard.ef.com/dashboard/1
//open in a new tab
var text=browser.actions().sendKeys(protractor.Key.CONTROL, 'v');
browser.executeScript("window.open('"+text+"')");
The resulting URL pasted into the browser is https://wallboard.ef.com/[object%20Object]
How can I open the actual URL mentioned above in a new browser tab?
I have since changed the code to read from the clipboard instead:
browser.executeScript("navigator.clipboard.readText().then(text => {window.open(text)});");
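The [object Object] comes from browser.actions().sendKeys(...): it returns an action sequence object (which is not even executed until .perform() is called), not the clipboard text, so concatenating it into the script string stringifies the object. If the URL is already known to the test, it can be handed to executeScript as an extra argument, which WebDriver exposes to the script as arguments[0]. A minimal sketch; openInNewTab is a hypothetical helper.

```javascript
// Hypothetical helper for a Protractor spec: open `url` in a new tab.
// Extra executeScript arguments reach the script as arguments[0..n], so the
// URL is passed as a real string and needs no quoting or escaping.
function openInNewTab(browser, url) {
  return browser.executeScript('window.open(arguments[0], "_blank");', url);
}
```

Usage: openInNewTab(browser, 'https://wallboard.ef.com/dashboard/1'), then switch to the new tab with browser.getAllWindowHandles() and browser.switchTo().window(...).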

Blob name issue with new tab in chrome and firefox [duplicate]

In my Vue app I receive a PDF as a blob, and want to display it using the browser's PDF viewer.
I convert it to a file, and generate an object url:
const blobFile = new File([blob], `my-file-name.pdf`, { type: 'application/pdf' })
this.invoiceUrl = window.URL.createObjectURL(blobFile)
Then I display it by setting that URL as the data attribute of an object element.
<object
:data="invoiceUrl"
type="application/pdf"
width="100%"
style="height: 100vh;">
</object>
The browser then displays the PDF using the PDF viewer. However, in Chrome, the file name that I provide (here, my-file-name.pdf) is not used: I see a hash in the title bar of the PDF viewer, and when I download the file using either 'right click -> Save as...' or the viewer's controls, it saves the file with the blob's hash (cda675a6-10af-42f3-aa68-8795aa8c377d or similar).
The viewer and file name work as I'd hoped in Firefox; it's only Chrome in which the file name is not used.
Is there any way, using native Javascript (including ES6, but no 3rd party dependencies other than Vue), to set the filename for a blob / object element in Chrome?
[edit] If it helps, the response has the following relevant headers:
Content-Type: application/pdf; charset=utf-8
Transfer-Encoding: chunked
Content-Disposition: attachment; filename*=utf-8''Invoice%2016246.pdf;
Content-Description: File Transfer
Content-Encoding: gzip
Chrome's extension seems to rely on the resource name set in the URI, i.e. the file.ext in protocol://domain/path/file.ext.
So if your original URI contains that filename, the easiest option might be to simply set your <object>'s data to the URI you fetched the PDF from directly, instead of going the Blob way.
Now, there are cases where that can't be done, and for these there is a convoluted way, which requires setting up a Service Worker and which might not work in future versions of Chrome, and probably not in other browsers.
As said above, Chrome parses the URI in search of a filename, so what we need is a URI that carries this filename and serves our Blob's data.
To do so, we can use the Cache API to store our File as a Response under our URL, and then retrieve that File from the Cache in the ServiceWorker.
Or in code,
From the main page
// register our ServiceWorker
navigator.serviceWorker.register('/sw.js')
  .then(...
...
async function displayRenamedPDF(file, filename) {
  // we use a hard-coded fake path
  // to not interfere with legit requests
  const reg_path = "/name-forcer/";
  const url = reg_path + filename;
  // store our File in the Cache
  const store = await caches.open( "name-forcer" );
  await store.put( url, new Response( file ) );
  const frame = document.createElement( "iframe" );
  frame.width = 400;
  frame.height = 500;
  document.body.append( frame );
  // the cached entry is not needed once the frame has loaded
  // (attach the handler before setting src so the load event can't be missed)
  frame.onload = (evt) => store.delete( url );
  // makes the request to the File we just cached
  frame.src = url;
}
In the ServiceWorker sw.js
self.addEventListener('fetch', (event) => {
  event.respondWith( (async () => {
    const store = await caches.open("name-forcer");
    const req = event.request;
    const cached = await store.match( req );
    return cached || fetch( req );
  })() );
});
Edit: This actually doesn't work in Chrome...
While it does correctly set the filename in the dialog, Chrome seems unable to retrieve the file when saving it to disk...
It doesn't seem to perform a network request (so our SW doesn't catch anything), and I don't really know where to look now.
Still, this may be a good basis for future work on this.
Another solution, which I haven't taken the time to check myself, would be to run your own PDF viewer.
Mozilla has made its JavaScript-based viewer, pdf.js, available, so from there we should be able to set the filename (though, once again, I haven't dug into that yet).
And as a final note, Firefox is able to use the name property of the File object a blob URI points to.
So even though it's not what the OP asked for, in Firefox all it requires is
const file = new File([blob], filename);
const url = URL.createObjectURL(file);
object.data = url;
In Chrome, the filename is derived from the URL, so as long as you are using a blob URL, the short answer is "No, you cannot set the filename of a PDF object displayed in Chrome." You have no control over the UUID assigned to the blob URL and no way to override that as the name of the page using the object element. It is possible that inside the PDF a title is specified, and that will appear in the PDF viewer as the document name, but you still get the hash name when downloading.
This appears to be a security precaution, but I cannot say for sure.
Of course, if you have control over the URL, you can easily set the PDF filename by changing the URL.
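The viewer will still show the hash, but a download started through an anchor whose download attribute is set and whose href is the blob URL does use the name you give it. That name could be taken from the Content-Disposition header shown in the question. A minimal sketch, assuming the filename* form uses UTF-8; filenameFromContentDisposition is a hypothetical helper. Cross-origin, the header is only readable from script if the server lists it in Access-Control-Expose-Headers.

```javascript
// Hypothetical helper: extract the suggested file name from a
// Content-Disposition header value, or return null if there is none.
function filenameFromContentDisposition(header) {
  if (!header) return null;
  // Prefer the extended form: filename*=charset'lang'percent-encoded-name
  const ext = /filename\*\s*=\s*([^';]+)'[^']*'([^;]+)/i.exec(header);
  if (ext) {
    try {
      // decodeURIComponent assumes the declared charset is UTF-8
      return decodeURIComponent(ext[2].trim());
    } catch {
      // malformed escape: fall back to the plain parameter below
    }
  }
  // Plain form: filename="name.pdf" or filename=name.pdf
  const plain = /filename\s*=\s*(?:"([^"]*)"|([^;]+))/i.exec(header);
  return plain ? (plain[1] ?? plain[2]).trim() : null;
}
```

Usage: const name = filenameFromContentDisposition(res.headers.get('Content-Disposition')) || 'invoice.pdf'; then assign it to the download attribute of an <a> pointing at the blob URL and click it.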
I believe Kaiido's answer expresses, briefly, the best solution here:
"if your original URI contains that filename, the easiest might be to simply make your object's data to the URI you fetched the pdf from directly"
For those coming from this similar question in particular, it would have helped me to have a more detailed description of a specific implementation (working for PDFs) that gives the best user experience, especially when serving files that are generated on the fly.
The trick here is using a two-step process that perfectly mimics a normal link or button click. The client must (step 1) request that the file be generated and stored server-side long enough for the client to (step 2) request the file itself. This requires that you have some mechanism supporting unique identification of the file on disk or in a cache.
Without this process, the user will just see a blank tab while file generation is in progress, and if it fails, they'll just get the browser's ERR_TIMED_OUT page. Even if it succeeds, they'll have a hash in the title bar of the PDF viewer tab, and the save dialog will have the same hash as the suggested filename.
Here's the play-by-play to do better:
You can use an anchor tag or a button for the "download" or "view in browser" elements
Step 1 of 2 on the client: that element's click event can make a request for the file to be generated only (not transmitted).
Step 1 of 2 on the server: generate the file and hold on to it. Return only the filename to the client.
Step 2 of 2 on the client:
If viewing the file in the browser, use the filename returned from the generate request to then invoke window.open('view_file/<filename>?fileId=1'). That is the only way to indirectly control the name of the file as shown in the tab title and in any subsequent save dialog.
If downloading, just invoke window.open('download_file?fileId=1').
Step 2 of 2 on the server:
view_file(filename, fileId) handler just needs to serve the file using the fileId and ignore the filename parameter. In .NET, you can use a FileContentResult like File(bytes, contentType);
download_file(fileId) must set the filename via the Content-Disposition header. In .NET, that's return File(bytes, contentType, desiredFilename);
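The server-side steps above are described for .NET. A minimal sketch of step 2 on the server in plain Node.js instead, assuming step 1 has stored each generated PDF in an in-memory Map keyed by fileId; generatedFiles, contentDisposition, route and handler are hypothetical names, and only the view_file/download_file paths come from the answer. The filename* parameter follows RFC 6266/RFC 5987 so non-ASCII names survive.

```javascript
// Hypothetical store filled by step 1: fileId -> { bytes, name }
const generatedFiles = new Map();

// Content-Disposition with an ASCII fallback plus a UTF-8 filename* parameter
function contentDisposition(filename) {
  const ascii = filename.replace(/[^\x20-\x7e]|["\\]/g, '_');
  // encodeURIComponent leaves ' ( ) * unescaped; RFC 5987 does not allow them
  const encoded = encodeURIComponent(filename).replace(
    /['()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
  return `attachment; filename="${ascii}"; filename*=UTF-8''${encoded}`;
}

// Pure routing logic, so it can be checked without a running server
function route(rawUrl) {
  const url = new URL(rawUrl, 'http://localhost');
  const file = generatedFiles.get(url.searchParams.get('fileId'));
  if (!file) return { status: 404, headers: {}, body: 'Not found' };

  if (url.pathname.startsWith('/view_file/')) {
    // The filename segment only exists for Chrome's viewer; it is ignored here
    return { status: 200, headers: { 'Content-Type': 'application/pdf' }, body: file.bytes };
  }
  if (url.pathname === '/download_file') {
    return {
      status: 200,
      headers: {
        'Content-Type': 'application/pdf',
        'Content-Disposition': contentDisposition(file.name),
      },
      body: file.bytes,
    };
  }
  return { status: 404, headers: {}, body: 'Not found' };
}

// Adapter for Node's http module
function handler(req, res) {
  const { status, headers, body } = route(req.url);
  res.writeHead(status, headers);
  res.end(body);
}
```

Wiring it up with Node's built-in http module would look like http.createServer(handler).listen(3000).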
client-side download example:
download_link_clicked() {
  // show spinner
  ajaxGet(generate_file_url,
    {},
    (response) => {
      // success!
      // the server-side is responsible for setting the name
      // of the file when it is being downloaded
      window.open('download_file?fileId=1', "_blank");
      // hide spinner
    },
    () => { // failure
      // hide spinner
      // problem, notify pattern
    },
    null
  );
}
client-side view example:
view_link_clicked() {
  // show spinner
  ajaxGet(generate_file_url,
    {},
    (response) => {
      // success!
      let filename = response.filename;
      // simplest, reliable method I know of for controlling
      // the filename of the PDF when viewed in the browser
      window.open('view_file/' + encodeURIComponent(filename) + '?fileId=1');
      // hide spinner
    },
    () => { // failure
      // hide spinner
      // problem, notify pattern
    },
    null
  );
}
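The ajaxGet calls above stand in for whatever AJAX helper you use. A minimal sketch of the same view flow with the standard fetch API, assuming the generate endpoint answers with JSON of the form { "filename": ... }; viewFile and its fetchImpl/openImpl options are hypothetical. One caveat: a popup blocker may reject a window.open that runs after an await rather than directly inside the click handler.

```javascript
// Hypothetical fetch-based version of view_link_clicked().
// fetchImpl/openImpl default to the browser globals; they are options only
// so the flow can be exercised without a browser.
async function viewFile(generateFileUrl, fileId, {
  fetchImpl = (...args) => globalThis.fetch(...args),
  openImpl = (url) => globalThis.open(url, '_blank'),
} = {}) {
  // show spinner
  try {
    // Step 1: ask the server to generate the file and return its name
    const res = await fetchImpl(generateFileUrl);
    if (!res.ok) throw new Error(`generation failed: HTTP ${res.status}`);
    const { filename } = await res.json();
    // Step 2: put the name in the path so Chrome's viewer picks it up
    const url = `view_file/${encodeURIComponent(filename)}?fileId=${encodeURIComponent(fileId)}`;
    openImpl(url);
    return url;
  } finally {
    // hide spinner
  }
}
```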
I'm using the pdf-lib library.
I solved part of this problem with the setTitle("Some title text you want") method of its PDFDocument.
The browser displayed my title correctly, but when I click the download button, the file name is still the previous UUID. Perhaps there is another API in the library that allows you to modify the download file name.

Chrome blocking PDF views on web redirection to new tab via Top-Frame Navigations

As of Chrome 60, viewing a PDF through any top-frame navigation to a data: URL, such as
<A HREF="data:…">
window.open("data:…")
window.location = "data:…"
has been blocked by Google; the discussion can be found on Google Groups. The problem now is how to display the PDF on the web without forcing it to be downloaded. My old code used window.open to view the PDF data, as below:
dataFactory.getPDFData(id, authToken)
.then(function(res) {
window.open("data:application/pdf," + escape(res.data));
},function(error){
//Some Code
}).finally(function(){
//Some Code
});
In the code above I fetch the PDF data from the server and display it. But since Chrome now blocks window.open with a data: URL, and one of the experts here suggested using an <iframe> to open the PDF data, I tried that, but it's not working. It always says "Failed to Load PDF Data".
The updated JS code for the <iframe> looks like this:
dataFactory.getPDFData(id, authToken)
.then(function(res) {
$scope.pdfData = res.data;
},function(error){
//Some Code
}).finally(function(){
//Some Code
});
And the HTML looks as below:
<iframe src="data:application/pdf;base64,pdfData" height="100%" width="100%"></iframe>
How can I bring back the original PDF view functionality? I searched other Stack Overflow questions but had no luck resolving this. Maybe I did something wrong or missed something in the iframe code, but it's not working out.
Unable to find the desired result, I came up with the approach below to resolve the issue.
Instead of opening the PDF in a new page, the PDF file now gets downloaded automatically as soon as the user clicks the Print button. Below is the source for it.
//File Name
var fileName = "Some File Name Here";
// The Blob constructor takes an array of parts, so wrap the binary data in one.
// serverResponse.data must hold raw bytes (e.g. requested with responseType 'arraybuffer').
var binaryData = [];
binaryData.push(serverResponse.data);
//To convert the PDF binary data to a file URL so that it gets downloaded
var file = window.URL.createObjectURL(new Blob(binaryData, {type: "application/pdf"}));
// Must be an <a> element: only anchors act on href + download when clicked
var fileURL = document.createElement("a");
fileURL.href = file;
fileURL.download = serverResponse.name || fileName;
document.body.appendChild(fileURL);
fileURL.click();
//To remove the inserted element and release the object URL
document.body.removeChild(fileURL);
setTimeout(function () { window.URL.revokeObjectURL(file); }, 1000);
In your old code:
"data:application/pdf," + escape(res.data)
In the new one:
your iframe src is like "data:application/pdf;base64,pdfData"
Try removing base64 from the src; it seems to be already present in the value of pdfData.
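Chrome's block applies to top-frame navigations to data: URLs, so one option not taken in the posts above is to turn the PDF into a Blob and display it through a blob: URL instead. Note also that the iframe markup in the question puts the literal text pdfData into src; the scope variable is never interpolated there. A minimal sketch, assuming res.data arrives as base64 text (if the request already returns an ArrayBuffer, pass that to the Blob directly); base64ToPdfBlob and showPdf are hypothetical names. In AngularJS, binding the resulting URL into an iframe via ng-src would additionally require marking it trusted with $sce.trustAsResourceUrl.

```javascript
// Decode base64 PDF data into bytes and wrap them in a typed Blob.
function base64ToPdfBlob(base64) {
  const binary = atob(base64); // one character per byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return new Blob([bytes], { type: 'application/pdf' });
}

// Point an <iframe> at a blob: URL instead of a data: URL.
function showPdf(base64, iframe) {
  const url = URL.createObjectURL(base64ToPdfBlob(base64));
  iframe.src = url;
  return url;
}
```

Usage: showPdf(res.data, document.querySelector('iframe')); the same URL should also work with window.open(url) for a new tab. Call URL.revokeObjectURL(url) once the PDF is no longer shown.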

jQuery - Read data from text file with .get/.load - force reload?

I want to load data from a web server with jQuery. I upload a file to the web server, put the response (which contains a link) into an iframe, and read this link with .get(). When I then upload another file that ends up with the same filename but different contents, .get() does not read the new content on the first try, but reliably does on the second. .load() should do basically the same, but it does not reload the file no matter how often I re-upload it.
Is there a way to force a reload of the changed file?
var linkToTextFile = 'http://www.myserver.com/myTextFile.txt';
$.get(linkToTextFile, function(data){
alert("Data:" + data); //content changes on second try
});
Try appending a unique GET parameter (such as a timestamp) to your text file's URL:
var timestamp = new Date().getTime();
var linkToTextFile = 'http://www.myserver.com/myTextFile.txt?t=' + timestamp;
This way it will force the browser to reload the file.
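The timestamp trick can be wrapped in a small helper that also keeps any existing query parameters. A minimal sketch; cacheBust and the parameter name t are arbitrary choices. jQuery can also do this itself: $.ajax with cache: false appends a _=<timestamp> parameter to GET requests.

```javascript
// Return `url` with a t=<timestamp> query parameter so caches treat
// every request as new; existing parameters are preserved.
function cacheBust(url, now = Date.now()) {
  // In a browser, relative URLs resolve against the current page
  const u = new URL(url, globalThis.location?.href);
  u.searchParams.set('t', String(now));
  return u.toString();
}
```

Usage: $.get(cacheBust(linkToTextFile), function (data) { alert("Data:" + data); });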
