JavaScript -> Download CSV file encoded in ISO-8859-1 / Latin1 / Windows-1252 - javascript

I have hacked together a small tool to extract shipping data from Amazon CSV order data. It works so far; here is a simple version as JS Bin: http://output.jsbin.com/jarako
For printing stamps/shipping labels, I need a file for uploading to Deutsche Post and to other parcel services. I used a small function saveTextAsFile which I found on Stack Overflow. Everything is good so far: no incorrectly displayed special characters (äöüß...) in the output textarea or in the downloaded files.
All these German postal/parcel service sites accept only Latin1 / ISO-8859-1 encoded files for upload. But my downloaded file is always UTF-8. If I upload it, all special characters (äöüß...) come out wrong.
How can I change this? I have already searched a lot. I have tried, for example:
Setting the charset of the tool to iso-8859-1:
<META http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
But the result is: now I have wrong special characters in the output textarea as well as in the downloaded file. If I upload it to the post site, I get even more wrong characters. Also, if I check the encoding in the CODA editor, it still says the downloaded file is UTF-8.
The saveTextAsFile function uses var textFileAsBlob = new Blob([textToWrite], {type:'text/plain'});. Maybe there is a way to set the charset for the download there!?
function saveTextAsFile()
{
    var textToWrite = $('#dataOutput').val();
    var textFileAsBlob = new Blob([textToWrite], {type:'text/plain'});
    var fileNameToSaveAs = "Brief.txt";

    var downloadLink = document.createElement("a");
    downloadLink.download = fileNameToSaveAs;
    downloadLink.innerHTML = "Download File";
    if (window.webkitURL != null)
    {
        // Chrome allows the link to be clicked
        // without actually adding it to the DOM.
        downloadLink.href = window.webkitURL.createObjectURL(textFileAsBlob);
    }
    else
    {
        // Firefox requires the link to be added to the DOM
        // before it can be clicked.
        downloadLink.href = window.URL.createObjectURL(textFileAsBlob);
        downloadLink.onclick = destroyClickedElement;
        downloadLink.style.display = "none";
        document.body.appendChild(downloadLink);
    }

    downloadLink.click();
}
Anyhow, there has to be a way to download files in a different encoding than the one the site itself uses. The Amazon site, where I download the CSV file from, is UTF-8 encoded, but the CSV file downloaded from there is Latin1 (ISO-8859-1) when I check it in CODA...

SCROLL DOWN TO THE UPDATE for the real solution!
Because I got no answer, I have searched more and more. It looks like there is NO SOLUTION in JavaScript: every test download I made that was generated in JavaScript was UTF-8 encoded. It looks like JavaScript only works with Unicode / UTF-8; another encoding would (possibly) only apply if the data were transported again via HTTP. But for JavaScript that runs on the client, no additional HTTP transport happens, because the data is still on the client.
I have helped myself for now by building a small PHP script on my server, to which I send the data via a GET or POST request. It converts the encoding to Latin1 / ISO-8859-1 and serves the result as a file download. This is an ISO-8859-1 file with correctly encoded special characters, which I can upload to the mentioned postal and parcel service sites, and everything looks good.
latin-download.php (it is VERY IMPORTANT to save the PHP file itself in ISO-8859-1 as well, to make it work!!):
<?php
$decoded_a = urldecode($_REQUEST["a"]);
$converted_to_latin = mb_convert_encoding($decoded_a, 'ISO-8859-1', 'UTF-8');
$filename = $_REQUEST["filename"];
header('Content-Type: text/plain; charset=ISO-8859-1');
header('Content-Disposition: attachment; filename="'.$filename.'"');
echo $converted_to_latin;
?>
In my JavaScript code I use:
<a id="downloadlink">Download File</a>
<script>
var mydata = "this is testdata containing äöüß";

document.getElementById("downloadlink").addEventListener("click", function() {
    var mydataToSend = encodeURIComponent(mydata);
    window.open("latin-download.php?a=" + mydataToSend + "&filename=letter-max.csv");
}, false);
</script>
For bigger amounts of data you have to switch from GET to POST...
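A minimal sketch of one way to do the POST, assuming the same latin-download.php reading $_REQUEST["a"] and $_REQUEST["filename"]; a dynamically created hidden form keeps the same download behaviour as window.open:
var form = document.createElement("form");
form.method = "POST";
form.action = "latin-download.php";

var dataField = document.createElement("input");
dataField.type = "hidden";
dataField.name = "a";
// latin-download.php calls urldecode() on the value, so pre-encode it here just like in the GET version
dataField.value = encodeURIComponent(mydata);

var nameField = document.createElement("input");
nameField.type = "hidden";
nameField.name = "filename";
nameField.value = "letter-max.csv";

form.appendChild(dataField);
form.appendChild(nameField);
document.body.appendChild(form);
form.submit(); // the attachment response triggers a download instead of navigation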
UPDATE 08-Feb-2016
Half a year later I have found a solution in PURE JAVASCRIPT, using inexorabletash/text-encoding. This is a polyfill for the Encoding Living Standard. The standard includes decoding of legacy encodings like latin1 ("windows-1252"), but it forbids encoding into these legacy encodings. So the browser-implemented window.TextEncoder function offers only UTF encoding. BUT the polyfill solution offers a legacy mode, which DOES also allow encoding into old encodings like latin1.
I use it like this:
<!DOCTYPE html>
<script>
// 'Copy' the browser's built-in TextEncoder function to TextEncoderOrg (it can NOT encode windows-1252, but this way you can still use it as TextEncoderOrg())
var TextEncoderOrg = window.TextEncoder;
// ... and deactivate it, to make sure only the polyfill encoder script that follows will be used
window.TextEncoder = null;
</script>
<!-- encoding-indexes.js is needed to support encoding into the old encoding types -->
<script src="lib/encoding-indexes.js"></script>
<!-- the encoding polyfill itself -->
<script src="lib/encoding.js"></script>
<script>
function download(content, filename, contentType) {
    if (!contentType) contentType = 'application/octet-stream';
    var a = document.createElement('a');
    var blob = new Blob([content], {'type': contentType});
    a.href = window.URL.createObjectURL(blob);
    a.download = filename;
    a.click();
}

var text = "Es wird ein schöner Tag!";

// Do the encoding
var encoded = new TextEncoder("windows-1252", { NONSTANDARD_allowLegacyEncoding: true }).encode(text);

// Download 2 files to see the difference
download(encoded, "windows-1252-encoded-text.txt");
download(text, "utf-8-original-text.txt");
</script>
The encoding-indexes.js file is about 500 kB big, because it contains all the encoding tables. Since I only need the windows-1252 encoding, I have deleted the other encodings in this file for my use; now only 632 bytes are left.
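A quick sanity check (a minimal sketch): in windows-1252 an "ö" must encode to the single byte 0xF6 (246), whereas UTF-8 needs two bytes for it, so logging the result shows whether the legacy encoding is really in effect:
var probe = new TextEncoder("windows-1252", { NONSTANDARD_allowLegacyEncoding: true }).encode("ö");
console.log(probe); // Uint8Array [246] -> a single byte 0xF6, so windows-1252 was used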

The problem is not the encoding but the fact that the special characters are displayed incorrectly in some applications, e.g. Microsoft Excel. UTF-8 is fine for displaying all special German characters. You can fix the problem by adding a byte order mark (BOM) in front of the CSV.
const BOM = "\uFEFF";
csvData = BOM + csvData; // prepend the BOM to the existing CSV string
const blob = new Blob([csvData], { type: "text/csv;charset=utf-8" });
Solution based on this GitHub post
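To actually save that blob, one possibility is the same object-URL approach used above (a minimal sketch; the file name is just an example):
const url = URL.createObjectURL(blob);
const a = document.createElement("a");
a.href = url;
a.download = "orders.csv"; // example file name
a.click();
URL.revokeObjectURL(url);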

You cannot force a web server to send you data in a given encoding, only ask it politely. Your approach to just convert to the format you need is the right way to go.
If you wanted to avoid the PHP script, you may have luck specifying the encoding as a parameter when creating your Blob:
var textFileAsBlob = new Blob([textToWrite], {
    type: 'text/plain;charset=ISO-8859-1',
    encoding: "ISO-8859-1"
});
See Specifying blob encoding in Google Chrome for more details.

Related

Issues downloading picture to local system using OneNote API

I'm trying to write a program that downloads OneNote pages to my PC, including the files in the pages. I'm stuck on downloading the images from the pages. I make a GET request and get the binary data for the image just fine, but when I save it and try to open it, I get a "it looks like we don't support this file format" error.
The code I'm using is
var u16 = btoa(unescape(encodeURIComponent(resp)));
var imgAsBlob = new Blob([u16], {type: 'application/octet-stream'});
var downloadLink = document.createElement("a");
downloadLink.download = "hello.png";
downloadLink.href = window.webkitURL.createObjectURL(imgAsBlob);
downloadLink.click();
resp is the responseText from the GET request with the binary data.
I've tried not using btoa and saving resp directly into the blob. I've tried changing the blob type to image/png, and I've tried converting it via new Uint16Array(resp.length) and setting each element to the corresponding byte from resp. I'm out of ideas and don't know what I'm doing wrong.
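One way to sidestep the string handling entirely is to ask XMLHttpRequest for binary data instead of text (a sketch under that assumption; any auth headers the OneNote API needs are omitted, and imageUrl stands in for the image resource URL):
var ajax = new XMLHttpRequest();
ajax.open("GET", imageUrl, true);
ajax.responseType = "blob"; // hand back raw bytes instead of a decoded string
ajax.onload = function () {
    var downloadLink = document.createElement("a");
    downloadLink.download = "hello.png";
    downloadLink.href = window.URL.createObjectURL(ajax.response);
    downloadLink.click();
};
ajax.send();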

blob with conversion to 8bit cp1251 or cp1252

I need a solution for encoding UTF text to 8-bit cp1251 or cp1252 using a Blob.
I managed to adapt
https://github.com/b4stien/js-csv-encoding to include windows-1251, but there are unsolvable problems:
Unfortunately NoScript does not allow loading external JavaScript on a page whose scripts are turned off via it.
Therefore it is impossible to use js-csv-encoding in the bookmarklet, and likewise impossible to load jQuery! Disabling NoScript, especially after Meltdown and Spectre, is simply not secure.
So only the version of a small script written in native JavaScript is left.
If you find an alternative way to run jQuery with NoScript turned off, then finding a solution will be easier, although I doubt it's possible.
A good solution would be
https://www.npmjs.com/package/windows-1251 or https://www.npmjs.com/package/windows-1252
However, I did not manage to transcode the two-byte text into single-byte text with these scripts. For example:
<script src="windows-1251.js"></script>
<script type="text/javascript">
function download(text, name, type) {
    var a = document.getElementById("a");
    var file = new Blob([text], {type: type});
    a.href = URL.createObjectURL(file);
    a.download = name;
}
</script>
There have been many attempts to use windows-1251, for example these:
<script type="text/javascript">
function exportToCsv() {
window.open(windows1251.encode('data:text/csv;charset=windows-1251,' +'текст'));
}
var button = document.getElementById('b');
button.addEventListener('click', exportToCsv);
</script>
<script type="text/javascript">
function exportToCsv() {window.open('data:text/csv;charset=windows-1251,' +windows1251.encode('текст'));}
var button = document.getElementById('b');
button.addEventListener('click', exportToCsv);
</script>
Using encode or decode from windows-1251 does not turn the output into an 8-bit format. In js-csv-encoding, csvContentEncoded is used for the transcoding.
Attempts to use something like that have failed. Perhaps some kind of hack is needed; just dropping in windows-1251 is not enough,
since JS stores strings in Unicode internally, so most likely the conversion to 1251 has to be added at the very end. Part of the code from js-csv-encoding:
var csvContent = 'текст',
    textEncoder = new CustomTextEncoder('windows-1251', {NONSTANDARD_allowLegacyEncoding: true}),
    fileName = 'some-data.csv';

var a = document.getElementById('download-csv');
a.addEventListener('click', function(e) {
    var csvContentEncoded = textEncoder.encode([csvContent]);
    var blob = new Blob([csvContentEncoded], {type: 'text/csv;charset=windows-1251;'});
    saveAs(blob, fileName);
    e.preventDefault();
});
I also tried conversions using charCode, saving not to a server but to the computer, so urlencode... is not the right solution, because in that case I would have to encode the text into a readable form again.
Of course, it's hard to find a solution of no more than 4000-5000 characters for a bookmarklet, and my knowledge is not enough.
A solution with the help of other scripts, for example recoding via a lookup table, would also be acceptable.
I spent half a day trying to save an XML file with Cyrillic symbols in windows-1251 encoding. It turned out to be pretty easy - you just need to create an appropriate byte array. See the example below (the full repo with this example):
import iconv from 'pika-iconv-lite';
import saveAs from 'save-as';
const byteArrayWin1251 = iconv.encode(
`<?xml version="1.0" encoding="windows-1251"?>
<note>
<to>Михаил</to>
<from>Андрей</from>
<heading>Reminder</heading>
<body>Вот такая вот xml! И сохранюсь я как win-1251</body>
</note>`,
'win1251'
);
const blob = new Blob([byteArrayWin1251], { type: 'application/xml;charset=windows-1251' })
saveAs(blob, 'myxml.xml');
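A sketch of the same approach applied to the CSV case from the question (assuming the same pika-iconv-lite and save-as imports as above):
const csvBytes = iconv.encode('текст', 'win1251'); // byte array in windows-1251
saveAs(new Blob([csvBytes], { type: 'text/csv;charset=windows-1251' }), 'some-data.csv');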

Decode Base64 encoded PDF content in browser

We transform HTML to PDF in the backend (PHP) using dompdf. The generated output from dompdf is Base64 encoded with
$output = $dompdf->output();
base64_encode($output);
This Base64 encoded content is saved as a file on the server. When we decode this file content like this:
cat /tmp/55acbaa9600f4 | base64 -D > test.pdf
we get a proper PDF file.
But when we transfer the Base64 content to the client as a string value inside a JSON object (the server provides a RESTful API...):
{
"file_data": "...the base64 string..."
}
And decode it with atob() and then create a Blob object to download the file later on, the PDF is always "empty"/broken.
$scope.downloadFileData = function(doc) {
    DocumentService.getFileData(doc).then(function(data) {
        var decodedFileData = atob(data.file_data);
        var file = new Blob([decodedFileData], { type: doc.file_type });
        saveAs(file, doc.title + '.' + doc.extension);
    });
};
When we log the decoded content, it seems that the content is "broken", because several symbols are not the same as when we decode the content on the server using base64 -D.
When we encode/decode the content of simple text/plain documents, it works as expected. But binary (non-ASCII) formats do not work.
We have searched the web for many hours, but didn't find a solution for this that works for us. Does anyone have the same problem and can provide us with a working solution? Thanks in advance!
This is an example of the Base64-encoded content of a PDF document, as produced on the server:
JVBERi0xLjMKMSAwIG9iago8PCAvVHlwZSAvQ2F0YWxvZwovT3V0bGluZXMgMiAwIFIKL1BhZ2VzIDMgMCBSID4+CmVuZG9iagoyIDAgb2JqCjw8IC9UeXBlIC9PdXRsaW5lcyAvQ291bnQgMCA+PgplbmRvYmoKMyAwIG9iago8PCAvVHlwZSAvUGFnZXMKL0tpZHMgWzYgMCBSCl0KL0NvdW50IDEKL1Jlc291cmNlcyA8PAovUHJvY1NldCA0IDAgUgovRm9udCA8PCAKL0YxIDggMCBSCj4+Cj4+Ci9NZWRpYUJveCBbMC4wMDAgMC4wMDAgNjEyLjAwMCA3OTIuMDAwXQogPj4KZW5kb2JqCjQgMCBvYmoKWy9QREYgL1RleHQgXQplbmRvYmoKNSAwIG9iago8PAovQ3JlYXRvciAoRE9NUERGKQovQ3JlYXRpb25EYXRlIChEOjIwMTUwNzIwMTMzMzIzKzAyJzAwJykKL01vZERhdGUgKEQ6MjAxNTA3MjAxMzMzMjMrMDInMDAnKQo+PgplbmRvYmoKNiAwIG9iago8PCAvVHlwZSAvUGFnZQovUGFyZW50IDMgMCBSCi9Db250ZW50cyA3IDAgUgo+PgplbmRvYmoKNyAwIG9iago8PCAvRmlsdGVyIC9GbGF0ZURlY29kZQovTGVuZ3RoIDY2ID4+CnN0cmVhbQp4nOMy0DMwMFBAJovSuZxCFIxN9AwMzRTMDS31DCxNFUJSFPTdDBWMgKIKIWkKCtEaIanFJZqxCiFeCq4hAO4PD0MKZW5kc3RyZWFtCmVuZG9iago4IDAgb2JqCjw8IC9UeXBlIC9Gb250Ci9TdWJ0eXBlIC9UeXBlMQovTmFtZSAvRjEKL0Jhc2VGb250IC9UaW1lcy1Cb2xkCi9FbmNvZGluZyAvV2luQW5zaUVuY29kaW5nCj4+CmVuZG9iagp4cmVmCjAgOQowMDAwMDAwMDAwIDY1NTM1IGYgCjAwMDAwMDAwMDggMDAwMDAgbiAKMDAwMDAwMDA3MyAwMDAwMCBuIAowMDAwMDAwMTE5IDAwMDAwIG4gCjAwMDAwMDAyNzMgMDAwMDAgbiAKMDAwMDAwMDMwMiAwMDAwMCBuIAowMDAwMDAwNDE2IDAwMDAwIG4gCjAwMDAwMDA0NzkgMDAwMDAgbiAKMDAwMDAwMDYxNiAwMDAwMCBuIAp0cmFpbGVyCjw8Ci9TaXplIDkKL1Jvb3QgMSAwIFIKL0luZm8gNSAwIFIKPj4Kc3RhcnR4cmVmCjcyNQolJUVPRgo=
If you atob() this, you don't get the same result as on the console with base64 -D. Why?
Your issue looks identical to the one I needed to solve recently.
Here is what worked for me:
const binaryImg = atob(base64String);
const length = binaryImg.length;
const arrayBuffer = new ArrayBuffer(length);
const uintArray = new Uint8Array(arrayBuffer);

for (let i = 0; i < length; i++) {
    uintArray[i] = binaryImg.charCodeAt(i);
}

const fileBlob = new Blob([uintArray], { type: 'application/pdf' });
saveAs(fileBlob, 'filename.pdf');
It seems that only doing a base64 decode is not enough... you need to put the result into a Uint8Array. Otherwise, the PDF pages appear blank.
I found this solution here:
https://github.com/sayanee/angularjs-pdf/issues/110#issuecomment-579988190
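A more compact variant of the same idea (a sketch, assuming a browser that supports Uint8Array.from with a map function):
const fileBlob = new Blob(
    [Uint8Array.from(atob(base64String), c => c.charCodeAt(0))],
    { type: 'application/pdf' }
);
saveAs(fileBlob, 'filename.pdf');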
You can use btoa() and atob(); they work in some browsers:
For example:
var enc = btoa("this is some text");
alert(enc);
alert(atob(enc));
JSON and base64 are completely independent.
Here's a JSON stringifier/parser (and direct GitHub link).
Here's a base64 Q&A. Here's another one.

How can I write a file in ISO-8859-1 with Chrome's File API

At the moment I am writing a file like this:
var blob = new Blob([contents], {type: 'text/plain;charset=iso-8859-1'});
fileWriter.write(blob);
However, when I run file -i on the resulting file the charset is always UTF-8.
The variable contents is encoded in ISO-8859-1 on the server side, and then communicated over the wire in base64:
def write_csv_file
  filewriter = RMS::LabelFile.for_order(self.order)
  csv = filewriter.to_csv
  csv = csv.encode("ISO-8859-1")
  csv = Base64.encode64(csv)
  %Q{<script type="text/javascript" charset="ISO-8859-1">
      var csv_data = #{csv.inspect.gsub('\n', '')};
      csv_data = window.atob(csv_data);
      parent.phn.filewriter.writeFile("#{self.order.order_number}.csv", csv_data, 'ISO-8859-1');
    </script>
  }
end
I've checked and double-checked that the encoding is still ISO-8859-1 on the client side in the JavaScript. It seems like Blob and fileWriter are changing the encoding before they write. Examining the W3C's working draft, it seems as though Blob converts DOMStrings to UTF-8 before writing them.
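Given that observation (Blob converts DOMStrings to UTF-8), one way to keep the ISO-8859-1 bytes intact is to turn the atob() output into a byte array before constructing the Blob. A minimal sketch along those lines, reusing the csv_data and fileWriter names from above:
var byteArray = new Uint8Array(csv_data.length);
for (var i = 0; i < csv_data.length; i++) {
    // atob() yields one character per original byte (code points 0-255),
    // so charCodeAt() recovers the ISO-8859-1 byte values unchanged
    byteArray[i] = csv_data.charCodeAt(i);
}
var blob = new Blob([byteArray], {type: 'text/plain;charset=iso-8859-1'});
fileWriter.write(blob);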

JPEG data obtained from FileReader doesn't match data in file

I'm trying to select a local JPEG file in the web browser via the HTML5 FileReader so I can submit it to a server without reloading the page. All the mechanics are working and I think I'm transferring and saving the exact data that JavaScript gave me, but the result is an invalid JPEG file on the server. Here's the basic code that demonstrates the problem:
<form name="add_photos">
    <input type="file" name="photo" id="photo" /><br />
    <input type="button" value="Upload" onclick="upload_photo();" />
</form>
<script type="text/javascript">
function upload_photo() {
    file = document.add_photos.photo.files[0];
    if (file) {
        fileReader = new FileReader();
        fileReader.onload = upload_photo_ready;
        fileReader.readAsBinaryString(file);
    }
}

function upload_photo_ready(event) {
    data = event.target.result;
    // alert(data);
    URL = "submit.php";
    ajax = new XMLHttpRequest();
    ajax.open("POST", URL, 1);
    ajax.setRequestHeader("Ajax-Request", "1");
    ajax.send(data);
}
</script>
Then my PHP script does this:
$data = file_get_contents("php://input");
$filename = "test.jpg";
file_put_contents($filename, $data);
$result = imagecreatefromjpeg($filename);
That last line throws a PHP error "test.jpg is not a valid JPEG file." If I download the data back to my Mac and try to open it in Preview, Preview says the file "may be damaged or use a file format that Preview doesn’t recognize."
If I open both the original file on my desktop and the uploaded file on the server in text editors to inspect their contents, they are almost but not quite the same. The original file starts like this:
ˇÿˇ‡JFIFˇ˛;CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), quality = 90
But the uploaded file starts like this:
ÿØÿàJFIFÿþ;CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), quality = 90
Interestingly, if I view the data in a JavaScript alert with the commented-out line above, it looks just like the uploaded file's data, so it seems as if the FileReader isn't giving the correct data at the very beginning, as opposed to a problem that is introduced while transferring or saving the data on the server. Can anyone explain this?
I'm using Safari 6 and I also tried Firefox 14.
UPDATE: I just figured out that if I skip the FileReader code and change ajax.send(data) to ajax.send(file), the image is transferred and saved correctly on the server. So my problem is basically solved, but I'll award the answer points to anyone who can explain why my original approach with readAsBinaryString didn't work.
Your problem lies with readAsBinaryString. This will transfer the binary data byte-for-byte into a string, so that you will send a text string to your PHP file. Now a text string always has an encoding; and when you use XmlHttpRequest to upload a string, by default it will use UTF-8.
So each character, which was originally supposed to represent one byte, will be encoded as UTF-8... which uses multiple bytes for each character with a code point above 127!
Your best bet is to use readAsArrayBuffer instead of readAsBinaryString. This will avoid all the character set conversions (that are necessary when dealing with strings).
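A minimal sketch of the question's upload handler rewritten with readAsArrayBuffer (same submit.php endpoint as above):
function upload_photo() {
    var file = document.add_photos.photo.files[0];
    if (file) {
        var fileReader = new FileReader();
        fileReader.onload = function (event) {
            var ajax = new XMLHttpRequest();
            ajax.open("POST", "submit.php", true);
            ajax.setRequestHeader("Ajax-Request", "1");
            // an ArrayBuffer is sent as raw bytes, so no UTF-8 re-encoding corrupts the JPEG
            ajax.send(event.target.result);
        };
        fileReader.readAsArrayBuffer(file);
    }
}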
