Using a FileStreamResult from C# in a SPA website (.NET Core 2, SPA React template), I request a file from my endpoint, which triggers this response in C#:
var file = await _docService.GetFileAsync(token.UserName,
    instCode.Trim().ToUpper(), fileSeqNo);
string contentType = MimeUtility.GetMimeMapping(file.FileName);
var result = new FileStreamResult(file.File, contentType);
var contentDisposition = new ContentDispositionHeaderValue("attachment");
Response.Headers[HeaderNames.ContentDisposition] = contentDisposition.ToString();
return result;
The returned response is handled using msSaveBlob (specifically for MS browsers, but this is a problem even when I use createObjectURL in other browsers; yes, I have tried multiple solutions, and none of them seems to work). This is the code I use to send the request and receive the PDF FileStreamResult from the server:
if (window.navigator.msSaveBlob) {
    axios.get(url).then(response => {
        window.navigator.msSaveOrOpenBlob(
            new Blob([response.data], {type: "application/pdf"}),
            filename);
    });
}
The problem is that the returned PDF file somehow has the wrong encoding, so it will not open.
I have tried adding an encoding to the end of the type, {type: "application/pdf; encoding=UTF-8"}, as suggested in various posts, but it makes no difference.
Comparing it with a PDF file fetched a different way, I can clearly see that the encoding is wrong: most of the special characters are incorrect. According to the response header the PDF file should be UTF-8, but I have no idea how to actually verify that.
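One quick way to check what actually came back (a hypothetical helper, not part of the code above): a valid PDF always begins with the ASCII bytes "%PDF-", so inspecting the first bytes of the response tells you whether you received real binary PDF data or a text-mangled version of it.

```javascript
// Hypothetical helper: a real PDF always starts with the ASCII bytes "%PDF-".
// If the downloaded data fails this check, the bytes were altered in transit.
function looksLikePdf(bytes) {
  const header = String.fromCharCode(...bytes.slice(0, 5));
  return header === "%PDF-";
}

// e.g. with an arraybuffer response:
// looksLikePdf(new Uint8Array(response.data))
```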
Without knowing axios well, it seems from its readme page that it uses json as the default responseType. This can alter the content, since the response is then treated as text (axios will probably bail out when it cannot parse the data as an actual JSON object and keep the string/text source as the response data).
A PDF should be loaded as binary data. It can be either 8-bit binary content or 7-bit ASCII, but in either case it should be treated as a byte stream. From the Adobe PDF reference, sec. 2.2.1:
PDF files are represented as sequences of 8-bit binary bytes.
A PDF file is designed to be portable across all platforms and
operating systems. The binary representation is intended to be
generated, transported, and consumed directly, without translation
between native character sets, end-of-line representations, or other
conventions used on various platforms. [...].
Any PDF file can also be represented in a form that uses only 7-bit
ASCII [...] character codes. This is useful for the purpose of
exposition, as in this book. However, this representation is not
recommended for actual use, since it is less efficient than the normal
binary representation. Regardless of which representation is
used, PDF files must be transported and stored as binary files,
not as text files. [...]
So to solve the conversion that happens I would suggest trying specifying the configuration entry responseType when doing the request:
axios.get(url, {responseType: "arraybuffer"}) ...
or in this form:
axios({
    method: 'get',
    url: url,
    responseType: 'arraybuffer'
})
.then( ... )
You can also go directly to responseType "blob" if you are sure the MIME type is preserved in the process.
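If you take the blob route, the response is already a Blob carrying the type the server sent. A minimal sketch of that variant (the browser-only calls are shown as comments, since they need a DOM; the Blob construction below mimics what the browser does with the raw response bytes):

```javascript
// Construct a Blob from raw bytes, as the browser does for a blob response.
const pdfBytes = Uint8Array.from([0x25, 0x50, 0x44, 0x46, 0x2D]); // "%PDF-"
const blob = new Blob([pdfBytes], { type: "application/pdf" });

// In the browser the equivalent would be:
// axios.get(url, { responseType: "blob" }).then(res => {
//   const fileURL = URL.createObjectURL(res.data);
//   window.open(fileURL); // or navigator.msSaveOrOpenBlob(res.data, filename)
// });
```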
Related
I'm trying to download a pdf file from an s3 bucket, however for security reasons the file must be streamed back to the client (an angular web app). My plan was to request a download link from s3 and stream the document from that link back to the client using NestJS StreamableFile.
My Controller:
@Post('download')
@ApiOkResponse({
    schema: {
        type: 'string',
        format: 'binary'
    }
})
@ApiProduces('application/pdf')
async documentDownload(@Body() body: DownloadURLRequest) {
    const result = await this.service.getS3DownloadURL(body);
    const got = require('got');
    return new StreamableFile(got.stream(result.downloadUrl));
}
My client receives a Blob that is exactly the length of the original file; however, the file fails to load. Upon closer inspection it turns out that some of the bytes (roughly 2/3) are different.
After a bit of debugging I know for sure that got.stream is generating the correct bytes, so it would appear that StreamableFile is changing or misinterpreting some of them for some reason. Does anyone with experience using StreamableFile know why this might be? Or, if there is a better way to handle this with NestJS or otherwise, I'm open to suggestions.
Edit: Further testing: printing the stream from the StreamableFile object on the API side shows the correct bytes, but the file returned to Swagger has incorrect bytes.
I am working on an application created by a colleague who is no longer with the company. I am trying to open what I believe is a PDF in blob format, but none of the tutorials gives me the expected result.
I notice that I have some data preceding what I believe to be the blob file, leading me to wonder if I am understanding what I am working with correctly. The whole blob is too large to post here, but this is how it starts:
data:application/pdf;base64,JVBERi0xLjcNJeLjz9MNCjIxIDAgb2JqDTw8L0xpbmVhcml6ZWQgMS9MIDQyMTExNC9PIDIzL0UgMzc0MzIwL04gMi9UIDQyMDU3NC9IIFsgMjYxNiA0NDhdPj4NZW5kb2JqDSAgICAgICAgICAg
In the posts and tutorials I am reading, like this one for instance, they use code similar to this:
var file = new Blob([response], {type: 'application/pdf'});
Since my blob file includes the "application/pdf" portion as part of the string, I am starting to wonder if I misunderstood and that this is another kind of file instead of a blob. I have also seen other examples of blob files, and they also do not include the "data:application/pdf;base64,".
I am having issues opening the files in a browser if they are too large. I have outlined my problem in this post, but have yet to receive any advice, so I am trying to find a different way to approach this. The results are not as expected, which leads me to wonder if I am looking for the right thing.
Here is what I have tried and what the result is:
var file = new Blob([upload.split(',')[upload.split(',').length - 1]], { type: 'application/pdf' });
var fileURL = URL.createObjectURL(file);
I remove the "data:application/pdf;base64," portion of the string using "[upload.split(',')[upload.split(',').length - 1]]", thinking that this will leave me with only the blob data, but this invariably fails to open, giving me the below result:
So my question is this: Am I working with a blob file or not? If not, what kind of data file is this so I can start looking for more relevant tutorials?
"Blob files" aren't really a thing. You can think of the term "blob" as "just some binary data", or if you're a database vendor, call them "Binary Large OBject"s.
Your data:application/pdf;base64,JVBERi0xLjcNJeLjz9MNCjIxIDAgb2JqDTw8L... string is a data URL: a way to represent arbitrary binary data in a URL string. Since the URL carries the base64 marker, you know the data is Base64-encoded, and you can use any Base64 decoder you can find (there are plenty online) to decode it.
Your (truncated) data begins
%PDF-1.7
%âãÏÓ
21 0 obj
<</Linearized 1/L 421114/O 23/E 374320/N 2/T 420574/H [ 2616 448]>>
endobj
so it is representing something like a valid PDF file.
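You can verify this yourself: strip everything up to the comma, Base64-decode the rest, and look at the leading bytes. Using the truncated sample from the question:

```javascript
const dataUrl = "data:application/pdf;base64,JVBERi0xLjcNJeLjz9MN"; // truncated sample
const base64 = dataUrl.split(",")[1];
const binary = atob(base64); // "binary string": one byte per character code
const bytes = Uint8Array.from(binary, c => c.charCodeAt(0));

console.log(String.fromCharCode(...bytes.slice(0, 8))); // "%PDF-1.7"
```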
We are implementing a client-side web application that communicates with the server exclusively via XMLHttpRequests (an AJAX engine).
The XHR responses are usually plain text with some XML in them, but in this case the server is sending compressed data as a .tgz file. We know for sure that the data the server sends is correct, because if we use a command-line HTTP client such as curl, the file sent as the response is valid and contains the expected data.
However, when making an AJAX call and "blobbing" the response into a downloadable file, the file we obtain is larger than the correct one and is not recognized by the decompressor. It gives the following error:
gzip: stdin: not in gzip format
/bin/gtar: Child returned status 1
/bin/gtar: Error is not recoverable: exiting now
The code I'm using is the following:
$.ajax(...).done(function(data){
    window.URL = window.webkitURL || window.URL;
    var contentType = 'application/x-compressed-tar';
    var file = new Blob([data], {type: contentType});
    var a = document.createElement('a'),
        ev = document.createEvent("MouseEvents");
    a.download = "browser_download2.tgz";
    a.href = window.URL.createObjectURL(file);
    ev.initMouseEvent("click", true, false, self, 0, 0, 0, 0, 0,
        false, false, false, false, 0, null);
    a.dispatchEvent(ev);
});
I omitted the parameters used to make the AJAX call, but let's assume they are not the problem, as I correctly receive a response. I used this contentType because it is the same one reported by curl, but I have tried different ones. The code may look a little weird, so I'll break it down for you: I'm basically creating a link, attaching to it the download URL and the name of the file (a dirty way to be able to name the file), and finally virtually clicking the link.
I compared the correct tgz file and the one obtained via the browser with a hex viewer, and I observed repeating patterns in the corrupted one (EF, BF and BD, all through the file) that are not present in the correct one.
Therefore I think about some possible causes:
(a) The browser is adding extra characters or maybe the response
header is still in the downloaded file.
(b) The file has been partially decompressed, because when I inspect
the request headers I can see "Accept-Encoding: gzip, deflate",
although I don't know whether the browser (Firefox in my case)
automatically decompresses data.
(c) The code that I'm using to blob the data is not correct, although
it accomplished the aim well with a plain/text file on another
occasion.
Edit
I also provide you the links to the hex inspection:
(a) Corrupted file: http://en.webhex.net/view/278aac05820c34dfbdd2217c03970dd9/0
(b) (Presumably) correct file: http://en.webhex.net/view/4a01894b814c17d2ec71ba49ac48e683
I don't know if this thread will be helpful for somebody, but just in case I figured out the cause and a possible solution for my problem.
The cause
By default, JavaScript strings store information as Unicode/ASCII text; they are not prepared to hold binary data correctly, which is why one easily sees wrongly interpreted characters (this also explains the repetitions of EF, BF and BD observed in the hex viewer: they are the bytes of the Unicode replacement character that stands in for data that is not valid text).
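This is easy to reproduce: the EF BF BD triplet is the UTF-8 encoding of the replacement character U+FFFD, which is substituted for every byte that is not valid text, for example the second byte of gzip's magic number:

```javascript
// gzip files start with the magic bytes 1F 8B; 8B is not valid UTF-8 on its own.
const gzipMagic = Uint8Array.from([0x1F, 0x8B, 0x08]);

// Decode-as-text then re-encode, which is what storing the response in an
// ordinary string does: 8B becomes U+FFFD, i.e. the three bytes EF BF BD.
const corrupted = new TextEncoder().encode(new TextDecoder("utf-8").decode(gzipMagic));

console.log([...corrupted].map(b => b.toString(16))); // 1f, ef, bf, bd, 8
```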
The solution
Recent browser versions implement so-called typed arrays: JavaScript arrays that can store data in different formats, including binary. If one specifies that the XMLHttpRequest response is in binary format, the data will be stored correctly and, when blobbed into a file, the file will not be corrupted. Check out the code I used:
var xhr = new XMLHttpRequest();
xhr.open('POST', url, true);
xhr.responseType = 'arraybuffer';
Notice that the key point is to define the responseType as "arraybuffer". It may also be worth noting that I decided not to use jQuery for the AJAX call anymore: it implements this feature poorly, and all my attempts to coax it out of jQuery were in vain (overrideMimeType, described elsewhere, didn't work in my case). Instead, plain old XMLHttpRequest worked nicely.
At work we are trying to upload files from a web page to a web service using html 5/javascript in the browser end and C# in the web service. But have some trouble with encoding of some sort.
As for the javascript we get the file's binary data with help from a FileReader.
var file = ... // gets the file from an input
var fileReader = new FileReader();
fileReader.onload = dataReceived;
fileReader.readAsBinaryString(file);

function dataReceived() {
    // Here we do a normal jQuery ajax post with the file data (fileReader.result).
}
Why we post the data manually, and not with help from XMLHttpRequest (or similar), is to make posting to our web service easier from different parts of the web page (it's wrapped in a function). But that doesn't seem to be the problem.
The code in the Web Service looks like this
[WebMethod]
public string SaveFileValueFieldValue(string value)
{
    System.Text.UnicodeEncoding encoder = new UnicodeEncoding();
    byte[] bytes = encoder.GetBytes(value);
    // Saves file from bytes here...
}
All seems to work well, and the data looks normal, but when I try to open an uploaded file (an image, for example) it cannot be opened. Very basic text files seem to turn out okay. But if I upload a "binary" file like an image and then open both the original and the uploaded version in a plain text editor such as Notepad to see what differs, only a few "invisible" characters are wrong, plus something that displays as a new line a few bytes in from the start.
So basically, the file seems to have just a few bytes encoded incorrectly somewhere in the conversions.
I've also tried creating an int array in JavaScript from the data and transforming it back to a byte[] in the web service, with the exact same problem. If I try to convert with anything other than Unicode (such as UTF-8), the data turns out completely different from the original, so I think I'm on the right track here, but something is slightly wrong.
The request itself is text, so binary data is lost if you send it with the wrong enctype.
What you can do is encode the binary data to Base64 and decode it on the other side.
To change the enctype to multipart/mixed and set boundaries (just like an e-mail or something), you'd have to assemble the request yourself.
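A minimal sketch of the client side of the Base64 approach (helper names are illustrative): the bytes must go through a "binary string" first, because btoa only accepts character codes in the 0-255 range; the server then decodes with its standard Base64 facility (in C#, Convert.FromBase64String).

```javascript
// Encode a byte array as Base64 text that survives any text-based request.
function bytesToBase64(bytes) {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

// Inverse operation, for completeness (what the server-side decode does).
function base64ToBytes(b64) {
  return Uint8Array.from(atob(b64), c => c.charCodeAt(0));
}

// PNG magic bytes survive the round trip untouched.
const payload = bytesToBase64(Uint8Array.from([0x89, 0x50, 0x4E, 0x47]));
console.log(payload); // "iVBORw=="
```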
I'm working on a Chrome app that uses the HTML5 Filesystem API, and allows users to import and sync files. One issue I'm having is that if the user tries to sync image files, the files get corrupted during the upload process to the server. I'm assuming it's because they're binary.
For uploading, I opted just to make an Ajax POST request (using MooTools) and then put the file contents as the body of the request. I told MooTools to turn off urlEncoding and set the charset to "x-user-defined" (not sure if that's necessary, I just saw it on some websites).
Given that Chrome doesn't have support for xhr.sendAsBinary, does anyone have any sample code that would allow me to send binary files via Ajax?
FF's xhr.sendAsBinary() is not standard. XHR2 supports sending files (xhr.send(file)) and blobs (xhr.send(blob)):
function upload(blobOrFile) {
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/server', true);
    xhr.onload = function(e) { ... };

    // Listen to the upload progress.
    xhr.upload.onprogress = function(e) { ... };

    xhr.send(blobOrFile);
}
You can also send an ArrayBuffer.
If you're writing the server, then you can just transform the bytes you read into pure text, send that to the server, and decode it back there.
Here's the simplest way (not very efficient, but enough to show the technique):
Translate each byte you read from the file into a string of two hexadecimal characters. If you read the byte 53 (in decimal), translate it into "35" (the hexadecimal representation of 53). Concatenate all these strings together and send the resulting string to the server.
On the server side, split the string at even positions and translate each pair of hex digits back into a byte.
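The technique above, sketched as runnable helpers (function names are illustrative):

```javascript
// Two hexadecimal characters per byte, e.g. byte 53 -> "35".
function bytesToHex(bytes) {
  return Array.from(bytes, b => b.toString(16).padStart(2, "0")).join("");
}

// Server-side step: split at even positions, parse each pair back to a byte.
function hexToBytes(hex) {
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < hex.length; i += 2)
    out[i / 2] = parseInt(hex.slice(i, i + 2), 16);
  return out;
}

console.log(bytesToHex(Uint8Array.from([53, 255, 0]))); // "35ff00"
```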