I have wired up a JSON response to a user request that fetches an Excel document. The response has the form:
{
"format": // file extension ---- only xls
"docTitle": // file name
"document": // base64 encoded data
}
I tried to handle the API response on the frontend by decoding the document key with the atob() function:
downloadTemplate() {
  this.fetchData.fetchDownloadTemplate().subscribe(
    res => {
      let docString = atob(res.document);
      let format = res.format;
      let fileName = res.docTitle;
      let file = new Blob([docString], {type: "text/xls"});
      let url = URL.createObjectURL(file);
      let dwldLink = document.getElementById('template_link');
      dwldLink.setAttribute("href", url);
      dwldLink.setAttribute("download", fileName);
      document.body.appendChild(dwldLink);
      dwldLink.click();
    },
    err => {
      console.log(err.responseJSON.message);
    })
}
But the data gets corrupted. On doing some research I learned that atob() decodes using the ASCII charset, while I did the encoding with the UTF-8 charset.
Can you suggest an alternative method for decoding the data with the UTF-8 charset in TypeScript? The Angular (v4+) project I am working on throws errors when I use plain JS validators.
While searching for a suitable module that supports Unicode-to-binary conversion, I came across
https://github.com/dankogai/js-base64#decode-vs-atob-and-encode-vs-btoa
Using JavaScript's atob to decode base64 doesn't properly decode UTF-8 strings.
Base64.decode() supports UTF-8 decoding (via its utob/btou helpers).
The module supports Angular (v4+) and also provides TypeScript (2) support.
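For binary files like Excel, though, the more common fix is to convert the decoded base64 into a byte array instead of passing atob()'s string straight into the Blob. A binary-safe variant of the handler above, for reference (a sketch only: it keeps the response shape and element id from the question, and assumes application/vnd.ms-excel as the MIME type for .xls):
downloadTemplate() {
  this.fetchData.fetchDownloadTemplate().subscribe(
    res => {
      // atob() yields a binary string; copy its char codes into a byte array
      const byteString = atob(res.document);
      const bytes = new Uint8Array(byteString.length);
      for (let i = 0; i < byteString.length; i++) {
        bytes[i] = byteString.charCodeAt(i);
      }
      // Hand the Blob raw bytes so no text re-encoding can corrupt the file
      const file = new Blob([bytes], {type: "application/vnd.ms-excel"});
      const url = URL.createObjectURL(file);
      const dwldLink = document.getElementById('template_link');
      dwldLink.setAttribute("href", url);
      dwldLink.setAttribute("download", res.docTitle);
      dwldLink.click();
    },
    err => console.log(err));
}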
Related
I'm trying to read an Excel file which has been converted into a base64 string. The file was originally a react-dropzone file before conversion. Unfortunately, when parsing the data with XLSX, the file returns unusual characters. Below is my code:
const XLSX = require("xlsx")
const file = "base64String"
const buffer = Buffer.from(file, 'base64')
const workbook = XLSX.read(buffer, { type: 'buffer' })
const sheetNamesList = workbook.SheetNames
// parse excel data to json
const excelData = XLSX.utils.sheet_to_json(workbook.Sheets[sheetNamesList[0]])
When logging the excelData, the output returns something like:
'u«ZjeÆÿ¾wh¥éñWè®f³ê\u001f~\'\u001ev.éí²ÞiÛ!yëfÈ^zÖÚ±î¸PK\u0003\u0004\u0014\u0000\u0006\u0000\b\u0000
\u0000\u0000!\u0000bîËNÃ0\u0010E÷HüCä-Jܲ#\b5íÇ\u0012*Q>ÀÄƪc[û\u0010B¡\u0015j7±\u0012ÏÜ{2ñÍh²nm¶Æ»R\fÈÀU^\
u001b7/ÅÇì%¿\u0017\u0019rZYï
Your file isn't just a base64 string. In your repl.it there are additional pieces of data before the base64 content, and those are being interpreted as junk base64.
If you strip away the data:application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;base64, part, then the buffer should be able to properly parse your content.
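A minimal sketch of that stripping step (the variable names and the truncated data URL are illustrative, not from the original code):
const file = "data:application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;base64,..."
// Keep only what follows the first comma, i.e. the actual base64 payload
const base64Part = file.substring(file.indexOf(",") + 1)
const buffer = Buffer.from(base64Part, "base64")
const workbook = XLSX.read(buffer, { type: "buffer" })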
A Python backend reads a binary file, base64 encodes it, inserts it into a JSON doc and sends it to a JavaScript frontend:
#Python
import base64

with open('some_binary_file', 'rb') as in_file:
    # value placed into the JSON response as 'b64_string'
    b64_string = base64.b64encode(in_file.read()).decode('utf-8')
The JavaScript frontend fetches the base64 encoded string from the JSON document and turns it into a binary blob:
#JavaScript
b64_string = response['b64_string'];
decoded_file = atob(b64_string);
blob = new Blob([decoded_file], {type: 'application/octet-stream'});
Unfortunately, when downloading the blob the encoding seems to be wrong, but I'm not sure where the problem is. E.g. an Excel file that goes through this can't be opened anymore. On the Python side I've tried different codecs ('ascii', 'latin1'), but that makes no difference. Is there a problem with my code?
I found the answer here. The problem was on the JavaScript side. Applying only atob to the base64 encoded string doesn't work for binary data; you have to convert the result into a typed byte array. What I ended up doing (LiveScript):
byte_chars = atob base64_str
byte_numbers = [byte_chars.charCodeAt(index) for bc, index in byte_chars]
byte_array = new Uint8Array byte_numbers
blob = new Blob [byte_array], {type: 'application/octet-stream'}
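The same idea as a plain JavaScript sketch, for anyone not using LiveScript (a direct translation of the snippet above):
const byteChars = atob(b64_string);
const byteNumbers = new Array(byteChars.length);
for (let i = 0; i < byteChars.length; i++) {
  byteNumbers[i] = byteChars.charCodeAt(i); // one byte value per character
}
const byteArray = new Uint8Array(byteNumbers);
const blob = new Blob([byteArray], { type: 'application/octet-stream' });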
I have hacked together a small tool to extract shipping data from Amazon CSV order data. It works so far. Here is a simple version as a JS Bin: http://output.jsbin.com/jarako
For printing stamps/shipping labels, I need a file to upload to Deutsche Post and other parcel services. I used a small function, saveTextAsFile, which I found on Stack Overflow. Everything is good so far: no wrongly displayed special characters (äöüß...) in the output textarea or in the downloaded files.
All these German postal/parcel service sites accept only Latin1/ISO-8859-1 encoded files for upload. But my downloaded file is always UTF-8, and if I upload it, all special characters (äöüß...) come out wrong.
How can I change this? I have already searched a lot and tried, for example:
Setting the charset of the tool to iso-8859-1:
<META http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
But the result is: now the special characters are wrong in the output textarea as well as in the downloaded file, and if I upload it to the post site I get even more wrong characters. Also, if I check the encoding in the Coda editor, it still says the downloaded file is UTF-8.
The saveTextAsFile function uses var textFileAsBlob = new Blob([textToWrite], {type:'text/plain'});. Maybe there is a way to set the charset for the download there?
function saveTextAsFile() {
  var textToWrite = $('#dataOutput').val();
  var textFileAsBlob = new Blob([textToWrite], {type: 'text/plain'});
  var fileNameToSaveAs = "Brief.txt";
  var downloadLink = document.createElement("a");
  downloadLink.download = fileNameToSaveAs;
  downloadLink.innerHTML = "Download File";
  if (window.webkitURL != null) {
    // Chrome allows the link to be clicked
    // without actually adding it to the DOM.
    downloadLink.href = window.webkitURL.createObjectURL(textFileAsBlob);
  } else {
    // Firefox requires the link to be added to the DOM
    // before it can be clicked.
    downloadLink.href = window.URL.createObjectURL(textFileAsBlob);
    downloadLink.onclick = destroyClickedElement; // cleanup helper defined elsewhere in the original tool
    downloadLink.style.display = "none";
    document.body.appendChild(downloadLink);
  }
  downloadLink.click();
}
Anyhow, there has to be a way to download files in an encoding other than the one the site itself uses. The Amazon site I download the CSV file from is UTF-8 encoded, but the downloaded CSV file from there is Latin1 (ISO-8859-1) if I check it in Coda...
SCROLL DOWN TO THE UPDATE for the real solution!
Because I got no answer, I searched more and more. It looks like there is NO SOLUTION in pure JavaScript: every test download I made that was generated in JavaScript came out UTF-8 encoded. JavaScript seems to be made only for Unicode/UTF-8; another encoding would (possibly) only apply if the data were transported again over HTTP. But for JavaScript running on the client, no additional HTTP transport happens, because the data is still on the client.
I have helped myself for now by building a small PHP script on my server, to which I send the data via a GET or POST request. It converts the encoding to Latin1/ISO-8859-1 and serves the result as a file download. That file is ISO-8859-1 with correctly encoded special characters, which I can upload to the mentioned postal and parcel service sites, and everything looks good.
latin-download.php: (It is VERY IMPORTANT to save the PHP file itself in ISO-8859-1 as well, to make it work!)
<?php
$decoded_a = urldecode($_REQUEST["a"]);
$converted_to_latin = mb_convert_encoding($decoded_a, 'ISO-8859-1', 'UTF-8');
$filename = $_REQUEST["filename"];
// send as two separate headers (merging them into one Content-Disposition line is invalid HTTP)
header('Content-Disposition: attachment; filename="'.$filename.'"');
header('Content-Type: text/plain; charset=iso-8859-1');
echo $converted_to_latin;
?>
In my JavaScript code I use:
<a id="downloadlink">Download File</a>
<script>
var mydata = "this is testdata containing äöüß";
document.getElementById("downloadlink").addEventListener("click", function() {
var mydataToSend = encodeURIComponent(mydata);
window.open("latin-download.php?a=" + mydataToSend + "&filename=letter-max.csv");
}, false);
</script>
For bigger amounts of data you have to switch from GET to POST...
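A sketch of the POST variant (my addition, not from the original post): submit a dynamically created hidden form, so the URL length limit no longer applies. Note the extra encodeURIComponent, which matches the extra urldecode() in latin-download.php above.
<script>
var form = document.createElement("form");
form.method = "POST";
form.action = "latin-download.php";
var dataField = document.createElement("input");
dataField.type = "hidden";
dataField.name = "a";
// encode once more, because latin-download.php calls urldecode() on top of PHP's automatic decoding
dataField.value = encodeURIComponent(mydata);
form.appendChild(dataField);
var nameField = document.createElement("input");
nameField.type = "hidden";
nameField.name = "filename";
nameField.value = "letter-max.csv";
form.appendChild(nameField);
document.body.appendChild(form);
form.submit();
</script>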
UPDATE 08-Feb-2016
A half year later, I have found a solution in PURE JAVASCRIPT: inexorabletash/text-encoding, a polyfill for the Encoding Living Standard. The standard includes decoding of legacy encodings like Latin1 ("windows-1252"), but it forbids encoding into these legacy types. So the browser's built-in window.TextEncoder offers only UTF-8 encoding. BUT the polyfill offers a legacy mode which DOES allow encoding into old encodings like Latin1.
I use it like this:
<!DOCTYPE html>
<script>
// 'Copy' the browser's built-in TextEncoder function to TextEncoderOrg
// (it can NOT encode windows-1252, but this way you can still use it as TextEncoderOrg())
var TextEncoderOrg = window.TextEncoder;
// ... and deactivate it, to make sure only the polyfill encoder script that follows will be used
window.TextEncoder = null;
</script>
<script src="lib/encoding-indexes.js"></script> <!-- needed to support encoding to legacy encoding types -->
<script src="lib/encoding.js"></script> <!-- encoding polyfill -->
<script>
function download(content, filename, contentType) {
  if (!contentType) contentType = 'application/octet-stream';
  var a = document.createElement('a');
  var blob = new Blob([content], {'type': contentType});
  a.href = window.URL.createObjectURL(blob);
  a.download = filename;
  a.click();
}
var text = "Es wird ein schöner Tag!";
// Do the encoding
var encoded = new TextEncoder("windows-1252", { NONSTANDARD_allowLegacyEncoding: true }).encode(text);
// Download 2 files to see the difference
download(encoded, "windows-1252-encoded-text.txt");
download(text, "utf-8-original-text.txt");
</script>
The encoding-indexes.js file is about 500 KB, because it contains all the encoding tables. Since I only need windows-1252 encoding, I deleted the other encodings from that file for my use; now only 632 bytes are left.
The problem is not the encoding but the fact that special characters are displayed wrongly in some applications, e.g. Microsoft Excel. UTF-8 is fine for displaying all German special characters. You can fix the problem by adding a byte order mark (BOM) in front of the CSV:
const BOM = "\uFEFF";
csvData = BOM + csvData; // prepend the BOM to the existing CSV string
const blob = new Blob([csvData], { type: "text/csv;charset=utf-8" });
Solution based on this GitHub post.
You cannot force a web server to send you data in a given encoding, only ask it politely. Your approach to just convert to the format you need is the right way to go.
If you wanted to avoid the PHP script, you may have luck specifying the encoding as a parameter when creating your Blob:
var textFileAsBlob = new Blob([textToWrite], {
  type: 'text/plain;charset=ISO-8859-1',
  encoding: "ISO-8859-1" // note: non-standard option, not honored by all browsers
});
See Specifying blob encoding in Google Chrome for more details.
We transform HTML to PDF in the backend (PHP) using dompdf. The generated output from dompdf is Base64 encoded with
$output = $dompdf->output();
$output = base64_encode($output);
This Base64 encoded content is saved as a file on the server. When we decode this file content like this:
cat /tmp/55acbaa9600f4 | base64 -D > test.pdf
we get a proper PDF file.
But when we transfer the Base64 content to the client as a string value inside a JSON object (the server provides a RESTful API...):
{
"file_data": "...the base64 string..."
}
And decode it with atob() and then create a Blob object to download the file later on, the PDF is always "empty"/broken.
$scope.downloadFileData = function(doc) {
  DocumentService.getFileData(doc).then(function(data) {
    var decodedFileData = atob(data.file_data);
    var file = new Blob([decodedFileData], { type: doc.file_type });
    saveAs(file, doc.title + '.' + doc.extension);
  });
};
When we log the decoded content, it seems that the content is "broken": several symbols are not the same as when we decode the content on the server using base64 -D.
When we encode/decode the content of simple text/plain documents, it works as expected. But all binary (non-ASCII) formats do not.
We have searched the web for many hours, but didn't find a solution for this that works for us. Does anyone have the same problem and can provide us with a working solution? Thanks in advance!
This is an example of the Base64 encoded content of a PDF document, as produced on the server:
JVBERi0xLjMKMSAwIG9iago8PCAvVHlwZSAvQ2F0YWxvZwovT3V0bGluZXMgMiAwIFIKL1BhZ2VzIDMgMCBSID4+CmVuZG9iagoyIDAgb2JqCjw8IC9UeXBlIC9PdXRsaW5lcyAvQ291bnQgMCA+PgplbmRvYmoKMyAwIG9iago8PCAvVHlwZSAvUGFnZXMKL0tpZHMgWzYgMCBSCl0KL0NvdW50IDEKL1Jlc291cmNlcyA8PAovUHJvY1NldCA0IDAgUgovRm9udCA8PCAKL0YxIDggMCBSCj4+Cj4+Ci9NZWRpYUJveCBbMC4wMDAgMC4wMDAgNjEyLjAwMCA3OTIuMDAwXQogPj4KZW5kb2JqCjQgMCBvYmoKWy9QREYgL1RleHQgXQplbmRvYmoKNSAwIG9iago8PAovQ3JlYXRvciAoRE9NUERGKQovQ3JlYXRpb25EYXRlIChEOjIwMTUwNzIwMTMzMzIzKzAyJzAwJykKL01vZERhdGUgKEQ6MjAxNTA3MjAxMzMzMjMrMDInMDAnKQo+PgplbmRvYmoKNiAwIG9iago8PCAvVHlwZSAvUGFnZQovUGFyZW50IDMgMCBSCi9Db250ZW50cyA3IDAgUgo+PgplbmRvYmoKNyAwIG9iago8PCAvRmlsdGVyIC9GbGF0ZURlY29kZQovTGVuZ3RoIDY2ID4+CnN0cmVhbQp4nOMy0DMwMFBAJovSuZxCFIxN9AwMzRTMDS31DCxNFUJSFPTdDBWMgKIKIWkKCtEaIanFJZqxCiFeCq4hAO4PD0MKZW5kc3RyZWFtCmVuZG9iago4IDAgb2JqCjw8IC9UeXBlIC9Gb250Ci9TdWJ0eXBlIC9UeXBlMQovTmFtZSAvRjEKL0Jhc2VGb250IC9UaW1lcy1Cb2xkCi9FbmNvZGluZyAvV2luQW5zaUVuY29kaW5nCj4+CmVuZG9iagp4cmVmCjAgOQowMDAwMDAwMDAwIDY1NTM1IGYgCjAwMDAwMDAwMDggMDAwMDAgbiAKMDAwMDAwMDA3MyAwMDAwMCBuIAowMDAwMDAwMTE5IDAwMDAwIG4gCjAwMDAwMDAyNzMgMDAwMDAgbiAKMDAwMDAwMDMwMiAwMDAwMCBuIAowMDAwMDAwNDE2IDAwMDAwIG4gCjAwMDAwMDA0NzkgMDAwMDAgbiAKMDAwMDAwMDYxNiAwMDAwMCBuIAp0cmFpbGVyCjw8Ci9TaXplIDkKL1Jvb3QgMSAwIFIKL0luZm8gNSAwIFIKPj4Kc3RhcnR4cmVmCjcyNQolJUVPRgo=
If you atob() this, you don't get the same result as on the console with base64 -D. Why?
Your issue looks identical to the one I needed to solve recently.
Here is what worked for me:
const binaryImg = atob(base64String);
const length = binaryImg.length;
const arrayBuffer = new ArrayBuffer(length);
const uintArray = new Uint8Array(arrayBuffer);
for (let i = 0; i < length; i++) {
  uintArray[i] = binaryImg.charCodeAt(i);
}
const fileBlob = new Blob([uintArray], { type: 'application/pdf' });
saveAs(fileBlob, 'filename.pdf');
It seems that doing only a base64 decode is not enough... you need to put the result into a Uint8Array. Otherwise, the PDF pages appear blank.
I found this solution here:
https://github.com/sayanee/angularjs-pdf/issues/110#issuecomment-579988190
You can use btoa() and atob(); they work in most browsers:
For example:
var enc = btoa("this is some text");
alert(enc);
alert(atob(enc));
JSON and base64 are completely independent.
Here's a JSON stringifier/parser (and direct GitHub link).
Here's a base64 Q&A. Here's another one.
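In other words, a base64 string travels through JSON as an ordinary string value; the two layers never touch each other. A minimal sketch (the values are illustrative):
const json = '{"file_data":"aGVsbG8gd29ybGQ="}'; // JSON layer: just a string
const payload = JSON.parse(json);
// base64 layer: decode the string to bytes separately
const bytes = Uint8Array.from(atob(payload.file_data), c => c.charCodeAt(0));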
At the moment I am writing a file like this:
var blob = new Blob([contents], {type: 'text/plain;charset=iso-8859-1'});
fileWriter.write(blob);
However, when I run file -i on the resulting file the charset is always UTF-8.
The variable contents is encoded in ISO-8859-1 on the server side, and then communicated over the wire in base64:
def write_csv_file
  filewriter = RMS::LabelFile.for_order(self.order)
  csv = filewriter.to_csv
  csv = csv.encode("ISO-8859-1")
  csv = Base64.encode64(csv)
  %Q{<script type="text/javascript" charset="ISO-8859-1">
    var csv_data = #{csv.inspect.gsub('\n', '')};
    csv_data = window.atob(csv_data);
    parent.phn.filewriter.writeFile("#{self.order.order_number}.csv", csv_data, 'ISO-8859-1');
  </script>
  }
end
I've checked and double-checked that the encoding is still ISO-8859-1 on the client side in the JavaScript. It seems like Blob and fileWriter are changing the encoding before they write. Examining the W3C working draft, it seems as though Blob converts DOMStrings to UTF-8 before writing them.
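If that's the cause, a possible workaround (a sketch, not from the original question) is to hand the Blob raw bytes instead of a DOMString, since only strings are re-encoded to UTF-8. After window.atob, each character of csv_data holds one byte of the ISO-8859-1 data, so:
// Convert the binary string from atob() into raw bytes
var bytes = new Uint8Array(csv_data.length);
for (var i = 0; i < csv_data.length; i++) {
  bytes[i] = csv_data.charCodeAt(i); // each char code is already a Latin-1 byte value
}
// A typed array is written as-is; no UTF-8 conversion applies
var blob = new Blob([bytes], {type: 'text/plain;charset=iso-8859-1'});
fileWriter.write(blob);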