FileSaver.js can't specify the charset - javascript

I'm using FileSaver.js to save an RTF file. This works fine.
However, when I use special characters, it goes wrong: the charset automatically "changes" from ANSI to UTF-8, and the characters aren't displayed correctly in the RTF document.
I've tried "forcing" ANSI, but it seems that all browsers ignore this setting.
Here is a part of my script:
var blob = new Blob([rtf], {type: "application/rtf;charset=windows-1252"});
saveAs(blob, filename);
This can be fixed by converting my special characters to Unicode or hex escapes. However, using the correct charset seems simpler. Why isn't my script working, and how should I fix this?
Thanks!

I've come across the same problem, and it seems there's no way to state the charset in the constructor; the only option is to manually encode your data into the encoding you need.
See this issue: https://github.com/eligrey/FileSaver.js/issues/14
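As a rough illustration of encoding the data manually: a string passed to the Blob constructor is always written out as UTF-8, whatever the type says, so one option is to pass bytes instead. The sketch below converts a string to Windows-1252 bytes. The names encodeWindows1252 and CP1252_HIGH are made up for this example and are not part of FileSaver.js, and replacing unmappable characters with "?" is an arbitrary choice.

```javascript
// Code points that Windows-1252 stores in the 0x80-0x9F byte range.
const CP1252_HIGH = {
  0x20AC: 0x80, 0x201A: 0x82, 0x0192: 0x83, 0x201E: 0x84, 0x2026: 0x85,
  0x2020: 0x86, 0x2021: 0x87, 0x02C6: 0x88, 0x2030: 0x89, 0x0160: 0x8A,
  0x2039: 0x8B, 0x0152: 0x8C, 0x017D: 0x8E, 0x2018: 0x91, 0x2019: 0x92,
  0x201C: 0x93, 0x201D: 0x94, 0x2022: 0x95, 0x2013: 0x96, 0x2014: 0x97,
  0x02DC: 0x98, 0x2122: 0x99, 0x0161: 0x9A, 0x203A: 0x9B, 0x0153: 0x9C,
  0x017E: 0x9E, 0x0178: 0x9F
};

// Hypothetical helper: encode a string as Windows-1252, one byte per UTF-16 code unit.
function encodeWindows1252(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code < 0x80 || (code >= 0xA0 && code <= 0xFF)) {
      bytes[i] = code;              // ASCII and Latin-1 map to the same byte value
    } else if (CP1252_HIGH[code] !== undefined) {
      bytes[i] = CP1252_HIGH[code]; // e.g. the euro sign becomes 0x80
    } else {
      bytes[i] = 0x3F;              // "?" for anything Windows-1252 cannot represent
    }
  }
  return bytes;
}

// Intended use with the script from the question (browser only):
//   var blob = new Blob([encodeWindows1252(rtf)], {type: "application/rtf"});
//   saveAs(blob, filename);
```

Because the Blob receives a Uint8Array, the browser writes those bytes unchanged instead of re-encoding the text.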

Related

Save JavaScript files using Notepad as Encoding of Ansi or UTF-8

I'm new to web development and JavaScript. I know that HTML5 and CSS files should be saved as UTF-8 if they contain characters beyond what ANSI covers, but what about JavaScript? What is the simple choice when saving a JavaScript file? I'm using Windows 7; should I save the file as ANSI or UTF-8?
Please see this attached image when saving a JavaScript using windows7 Notepad.
Thanks for your help and answers!
Your script files inherit their character encoding declarations from the document. So if you are using <meta charset="utf-8"> or HTTP header "Content-Type: text/html; charset=utf-8" in your document, then any script file that is referenced in the document should also be saved in UTF-8 format.
Generally speaking you should always use UTF-8 for everything unless you have no choice but to use a single byte encoding such as Windows-1252 (ANSI).
If you change the top dropdown to 'All Files' and then just add .js to the end of your file name that should do it.
You can leave the character encoding as UTF-8
You can use any of them. ANSI encoding is just an extension of ASCII with an additional 128 characters. I do not think there will be any advantage to using one over another (in the context of JavaScript programming), but I may be wrong. Here is a comparison

Unicode -- What's going on here?

This code:
console.log('😀');
console.log('\uD83D\uDE00');
From HTML script tag:
😀
😀
Pasted and run in the browser console (same browser):
😀
😀
What's going on here that causes the first console.log('😀'); to fail when it's included with a script tag, but work fine when run in the browser console? The obvious problem seems to be that it isn't being converted to a surrogate pair, since the second line works as expected.
Your HTML file is not saved in the same encoding that the HTTP headers or HTML meta tags advertise. The file is interpreted in the wrong encoding, resulting in the wrong characters. That doesn't matter for the Unicode escape sequence, which is pure ASCII, but it does matter for the non-ASCII literal.
Concrete guess: the file is saved as UTF-8 but advertised as ISO-8859-1.
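To make that guess concrete, here is a small sketch that simulates the mismatch: it takes the UTF-8 bytes of a string and reads each byte as one ISO-8859-1 character. The function name misreadUtf8AsLatin1 is made up for this illustration, and a real browser may treat ISO-8859-1 as Windows-1252, so the exact garbage characters shown on screen can differ.

```javascript
// Hypothetical helper: pretend a UTF-8 file was decoded as ISO-8859-1,
// i.e. every single byte is turned into one character.
function misreadUtf8AsLatin1(text) {
  const utf8Bytes = new TextEncoder().encode(text); // TextEncoder always produces UTF-8
  return Array.from(utf8Bytes, (b) => String.fromCharCode(b)).join("");
}

const literalInFile = "😀";              // non-ASCII literal, 4 bytes in UTF-8
const escapeInFile = "\\uD83D\\uDE00";   // the escape sequence exactly as typed, pure ASCII

// The literal turns into 4 unrelated characters.
console.log(misreadUtf8AsLatin1(literalInFile).length);        // 4
// The ASCII escape sequence survives the wrong decoding unchanged.
console.log(misreadUtf8AsLatin1(escapeInFile) === escapeInFile); // true
```

The escape sequence is only turned into the emoji later by the JavaScript parser, which is why it is immune to the wrong file encoding.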

Character in the url is changed when using window.location.search

I have a URL like: file:///C:/Users/index.html?Scale:%20Service-Qualität
When I use window.location.search to read the parameter, I expect to get Scale: Service-Qualität, but what I actually receive is Scale:%20Service-Qualit%C3%A4t. I don't know why my character ä changed to %C3%A4, and when I tested in the console it displayed as Scale: Service-Qualität
Can anyone help me fix this problem?
I found the solution to my problem. I need to decode the URL using decodeURIComponent(url); then I get the exact URL string back.
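A minimal sketch of that decoding step, using the query string from the question. The helper name readSearchText is made up for this example; in a browser it would be called with window.location.search.

```javascript
// Hypothetical helper: strip the leading "?" and undo the percent-encoding.
// decodeURIComponent interprets %XX sequences as UTF-8 bytes.
function readSearchText(search) {
  const raw = search.startsWith("?") ? search.slice(1) : search;
  return decodeURIComponent(raw);
}

// In a browser: readSearchText(window.location.search)
console.log(readSearchText("?Scale:%20Service-Qualit%C3%A4t")); // Scale: Service-Qualität
```

Note that decodeURIComponent throws a URIError on malformed sequences, so input that does not come from the browser itself may need a try/catch.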
You are seeing two issues here.
The ä being converted into %C3%A4 is called URL or percent encoding.
It happens because URLs can't, technically, contain non-ASCII characters.
Browsers and servers work around this by converting non-ASCII characters in URLs to their percent encoded equivalents.
It's generally nothing to worry about.
In your case however, there seems to be an actual problem as well. The weird output in the console could be because your web page uses a single-byte encoding (like ISO-8859-1) instead of UTF-8.
Switching the web page to UTF-8 might solve the problem, using this Meta tag:
<meta charset="utf-8"/>
and, of course, saving the HTML file as UTF-8 in your editor.

Escape HTML tags. Any issue possible with charset encoding?

I have a function to escape HTML tags, to be able to insert text into HTML.
Very similar to:
Can I escape html special chars in javascript?
I know that JavaScript uses Unicode internally, but HTML pages may be encoded in different charsets such as UTF-8 or ISO-8859-1.
My question is: is there any issue with this very simple conversion, or should I take the page charset into consideration?
If yes, how do I handle that?
PS: For example, the equivalent PHP function (http://php.net/manual/en/function.htmlspecialchars.php) has a parameter to select a charset.
No, JavaScript lives in the Unicode world so encoding issues are generally invisible to it. escapeHtml in the linked question is fine.
The only place I can think of where JavaScript gets to see bytes would be data: URLs (typically hidden beneath base64). So this:
var markup = '<p>Hello, '+escapeHtml(user_supplied_data);
var url = 'data:text/html;base64,'+btoa(markup);
iframe.src = url;
is in principle a bad thing. Although I don't know of any browsers that will guess UTF-7 in this situation, a charset=... parameter should be supplied to ensure that the browser uses the appropriate encoding for the data. (btoa uses ISO-8859-1, for what it's worth.)
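One way to supply that parameter is sketched below, assuming UTF-8 is the encoding you want. The function name toHtmlDataUrl is invented for this example. It converts the markup to UTF-8 bytes first, because btoa only accepts characters up to U+00FF and throws on anything higher.

```javascript
// Hypothetical helper: build a base64 data: URL that states its own charset.
function toHtmlDataUrl(markup) {
  const utf8Bytes = new TextEncoder().encode(markup); // string -> UTF-8 bytes
  let binary = "";
  for (const b of utf8Bytes) {
    binary += String.fromCharCode(b); // one character per byte, safe for btoa
  }
  return "data:text/html;charset=utf-8;base64," + btoa(binary);
}

const exampleUrl = toHtmlDataUrl("<p>Hello, caf\u00E9</p>");
console.log(exampleUrl.startsWith("data:text/html;charset=utf-8;base64,")); // true
// In a browser: iframe.src = exampleUrl;
```

The markup passed in should already have gone through escapeHtml for any user-supplied parts, as in the snippet above.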

HTML Encode String

I am trying to HTML-Encode a string with jQuery, but I can't seem to find the right encoding format.
What I got is a String like Ütest.docx. The server doesn't handle special characters very well so that I get a FileNotFoundException from Java (I have no way of editing the server itself).
Now, I experimented and found out that the URL works when I replace Ü with %DC. I thought this was called HTML encoding and googled a bit, but I always get results about URL encoding. I checked that, and it doesn't seem to be the right encoding, because Ü is encoded to %C3%9C, which doesn't work for the server.
Now, which encoding is it that would encode Ü to %DC? And is there a function in JavaScript or jQuery that would do the encoding for me?
Thanks for any help, I've been trying to find out which encoding I need for an hour now, but no luck.
They are both URL encoding, just that the UTF-8 one is a newer standard.
If you are using Tomcat, you can just use encodeURIComponent(), which uses UTF-8
and works once you set the URIEncoding attribute on the Tomcat connector: <Connector URIEncoding="UTF-8" ...>
If that's not ok, you can use this:
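function uriEncodeLegacy( str ) {
    // Drop characters above U+00FF, then let escape() emit one %XX per
    // remaining non-ASCII character. escape() is deprecated but still available.
    return escape(str.replace( /[\u0100-\uFFFF]/g, ""));
}
uriEncodeLegacy("Ü"); // "%DC"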
However, UTF-8 is recommended; otherwise you cannot even support the € character, for example.
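For comparison, here is a sketch of the same single-byte idea without the deprecated escape(). The name percentEncodeLatin1 is made up for this example. It is not byte-for-byte identical to escape(), since it percent-encodes every character outside the unreserved set, and like uriEncodeLegacy it silently drops anything above U+00FF.

```javascript
// Hypothetical helper: percent-encode a string one byte per character (ISO-8859-1).
function percentEncodeLatin1(str) {
  let out = "";
  for (const ch of str) {
    const code = ch.codePointAt(0);
    if (/[A-Za-z0-9\-_.~]/.test(ch)) {
      out += ch; // unreserved characters stay as they are
    } else if (code <= 0xFF) {
      out += "%" + code.toString(16).toUpperCase().padStart(2, "0");
    }
    // Characters above U+00FF have no single-byte form here and are dropped.
  }
  return out;
}

console.log(percentEncodeLatin1("Ütest.docx")); // %DCtest.docx
console.log(encodeURIComponent("Ütest.docx"));  // %C3%9Ctest.docx
console.log(percentEncodeLatin1("€") === "");   // true
```

The last line shows the limitation mentioned above: the euro sign is simply lost, whereas encodeURIComponent can represent it.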
