Characters outside of ASCII are not displayed properly - javascript

I am trying to display characters outside ASCII but it doesn't work. I only get scrambled characters. The JavaScript file should also be encoded in UTF-8, at least IntelliJ says so. What is missing or causing the error?
I have this in the index.html (which also has its charset set to UTF-8).
<script src="javascript/app.js" charset="utf-8"></script>
Just trying to output
console.log("Å");
I have this in the index.html file. It is an AngularJs application.
<meta charset="utf-8"/>

Specify UTF-8 encoding in your HTML file. Here are some ways.
Check if your JavaScript file is really UTF-8-encoded (see also this question).
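One way to check the second point is to try decoding the file's raw bytes as strict UTF-8. The following is only a minimal sketch, assuming an environment that provides the standard TextDecoder (modern browsers, Node.js). The helper name isValidUtf8 is made up for illustration, and obtaining the bytes (for example by reading app.js from disk) is left out.

```javascript
// Hypothetical helper: returns true if the bytes form valid UTF-8.
function isValidUtf8(bytes) {
  try {
    // fatal: true makes decode() throw instead of inserting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

// "Å" saved as UTF-8 is the two bytes C3 85.
const savedAsUtf8 = new Uint8Array([0xc3, 0x85]);
// "Å" saved as Windows-1252 or ISO-8859-1 is the single byte C5.
const savedAsAnsi = new Uint8Array([0xc5]);

console.log(isValidUtf8(savedAsUtf8)); // true
console.log(isValidUtf8(savedAsAnsi)); // false
```

A file that contains only ASCII characters passes this check in either encoding, because the bytes are identical; the difference only shows up once a character such as Å is present.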

Related

Save JavaScript files using Notepad as Encoding of Ansi or UTF-8

I'm new to web development and JavaScript. I know that every HTML5 and CSS file should be saved as UTF-8 if it contains characters beyond ANSI, but what about JavaScript? What is the simplest thing to do when saving a JavaScript file? I'm using Windows 7; should I save the file as ANSI or UTF-8?
Please see the attached image of the save dialog in Windows 7 Notepad.
Thanks for your help and answers!
Your script files inherit their character encoding declarations from the document. So if you are using <meta charset="utf-8"> or HTTP header "Content-Type: text/html; charset=utf-8" in your document, then any script file that is referenced in the document should also be saved in UTF-8 format.
Generally speaking you should always use UTF-8 for everything unless you have no choice but to use a single byte encoding such as Windows-1252 (ANSI).
If you change the top dropdown to 'All Files' and then just add .js to the end of your file name that should do it.
You can leave the character encoding as UTF-8
You can use either of them. ANSI encoding is just an extension of ASCII with an additional 128 characters. I do not think there is any advantage to using one over the other (in the context of JavaScript programming), but I may be wrong. Here is a comparison.

Convert unicode to Chinese characters

Supposing I have a string of code like so:
\u00e5\u00b1\u00b1\u00e4\u00b8\u008a\u00e7\u009a\u0084\u00e4\u00ba\u00ba
How would I convert these back into Chinese characters using Javascript:
山上的人
This is so that I can actually display Chinese on my web page. Right now it comes out as å±±ä¸ç人.
This website manages to accomplish this, however this is with PHP they don't expose.
I am not familiar with how character encoding works well at all, so I don't even know the terminology to search for a proper solution.
The string appears to be in UTF-8.
https://github.com/mathiasbynens/utf8.js is a helpful Javascript library that saves you the headache of learning the UTF-8 standard, and will decode the UTF-8 into text.
Here's a demo: https://mothereff.in/utf-8
Paste in \u00e5\u00b1\u00b1\u00e4\u00b8\u008a\u00e7\u009a\u0084\u00e4\u00ba\u00ba into the "UTF-8-encoded" textarea to decode it.
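The same decoding can be sketched without a library. This is not necessarily how the demo site or utf8.js works internally; it is a minimal example that assumes the standard TextDecoder is available and that every character in the string holds exactly one UTF-8 byte value, as in the question.

```javascript
// The question's string: each \u00XX escape holds one UTF-8 byte value.
const garbled =
  "\u00e5\u00b1\u00b1\u00e4\u00b8\u008a\u00e7\u009a\u0084\u00e4\u00ba\u00ba";

// Turn each character code (0-255) back into a raw byte.
const bytes = Uint8Array.from(garbled, (ch) => ch.charCodeAt(0));

// Decode the byte sequence as UTF-8.
const text = new TextDecoder("utf-8").decode(bytes);

console.log(text); // 山上的人
```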
Add <meta charset="UTF-8"> inside the <head></head> tag of your HTML file so that it will display Chinese properly. Just put the Chinese characters directly in your HTML file

Outputting russian characters into the console

I am trying to output a Russian string into the console like this
console.log("Привет");
But the console outputs this: ПривеÑ. How can I solve this problem?
You need to declare your site's (or your script's, for that matter) encoding.
You can use <meta charset="UTF-8"> in the HEAD of your site to declare that the whole page is UTF-8 encoded.
OR
If you only need to declare the encoding of the script, you can do that on the script tag alone, i.e. <script type="text/javascript" charset="utf-8" src="blah.js"></script>
Either way you should always tell your site/script which character set you are using.
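To see why a missing declaration produces garbage, here is a rough sketch of what happens when UTF-8 bytes are read one byte per character. It assumes TextEncoder and TextDecoder are available, and it uses ISO-8859-1 for the misreading; browsers usually fall back to Windows-1252 instead, so the exact garbage characters on screen can differ slightly.

```javascript
const original = "Привет";

// The file on disk: the string encoded as UTF-8 (2 bytes per Cyrillic letter).
const utf8Bytes = new TextEncoder().encode(original);

// Misreading: treat every byte as one ISO-8859-1 character.
const misread = Array.from(utf8Bytes, (b) => String.fromCharCode(b)).join("");

// Correct reading: decode the same bytes as UTF-8.
const correct = new TextDecoder("utf-8").decode(utf8Bytes);

console.log(original.length, utf8Bytes.length, misread.length); // 6 12 12
console.log(correct === original); // true
```

The six letters become twelve unrelated characters when misread, which matches the doubled, accented look of the output in the question.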

Writing Javascript using UTF-16 character encoding

Here is what I am trying but am not sure how to get this working or if it is even possible -
I have an HTML page MyHTMLPage.htm and I want to src a Javascript from this HTML file. This is pretty straightforward. I plan to include a <script src = "MyJavascript.js"></script> tag in my HTML file and that should take care of it.
However, I want to create my Javascript file using UTF-16 encoding. So, I plan to use the following tag <script charset="UTF-16" src="MyJavascript.js"></script> in my HTML file to take care of that
Now the problem I am really stuck at is how do I create the Javascript using UTF-16 encoding - E.g. let's say my Javascript code is alert(1); I created my Javascript file with the contents as \u0061\u006c\u0065\u0072\u0074\u0028\u0031\u0029\u003b but that does not seem to execute as valid Javascript at runtime.
To summarize, here is what I have -
MyHTMLPage.html
...
...
...
<script charset="UTF-16" src="MyJavascript.js"></script>
...
...
...
MyJavascript.js
\u0061\u006c\u0065\u0072\u0074\u0028\u0031\u0029\u003b
When I open the HTML page in Firefox, I get the error - "Syntax error - Illegal character" right at the beginning of the MyJavascript.js file. I have also tried adding the BOM character "\ufeff" at the beginning of the above Javascript but I still get the same error.
I know I could create my Javascript file as - "alert(1);" and then save it using UTF-16 encoding using the text editor and then the browser runs it fine however is there a way I could use "\u" notation (or an alternate escape character) and still get the Javascript to execute fine?
Thanks,
You are misunderstanding character encoding. Character encoding is a scheme of how characters are represented as bits behind the scenes.
You would not write \u004a in your file to "make it utf-16" as that is literally a sequence of 6 characters:
\, u, 0, 0, 4, a
And if you saved the above as utf-16, it would be represented as the following bytes (shown in hexadecimal):
005C0075
00300030
00340061
Had you saved it as utf-8 it would be:
5C753030
3461
Which takes 50% of the space and bandwidth. It takes even less to write that character literally ("J"): just a byte (4A) in utf-8.
The "\u"-notation is a way to reference any BMP character by just using a small set of ascii characters. If you were working with a text editor with no unicode support, you could write "\u2665", instead of literally writing "♥" and the browser would show it properly.
If you for some weird reason still want to use utf-16, simply write the code normally, save the file as utf-16 and serve it with the proper charset header.
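The difference between the escape text in a file and the character it stands for can be checked directly. This is a small sketch, assuming TextEncoder is available; the UTF-16 size is computed as two bytes per character, which holds here because all six characters are ASCII.

```javascript
// The six characters \, u, 0, 0, 4, a as they would sit in a source file.
const escapeText = String.raw`\u004a`;

// Byte counts of that text under each encoding.
const utf8Size = new TextEncoder().encode(escapeText).length; // 6
const utf16Size = escapeText.length * 2; // 12 (one 16-bit unit per character)

// Once the engine parses the escape, it is simply the character "J".
const parsed = "\u004a";

console.log(escapeText.length, utf8Size, utf16Size); // 6 6 12
console.log(parsed === "J", parsed.length); // true 1
```

So the escape only has meaning inside a string literal (or identifier) that the JavaScript parser reads; it has nothing to do with how the file itself is encoded.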

international characters in Javascript

I am working on a web application, where I transfer data from the server to the browser in XML.
Since I'm Danish, I quickly ran into problems with the characters æøå.
I know that in HTML I can use "&aelig;&oslash;&aring;" for æøå.
However, as soon as the chars pass through JavaScript, I get black boxes with "?" in them when using æøå, and "&aelig;&oslash;&aring;" is printed as is.
I've made sure to set it to utf-8, but that isn't helping much.
Ideally, I want it to work with any special characters (naturally).
The example that isn't working is included below:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<script type="text/javascript" charset="utf-8">
alert("&aelig;&oslash;&aring;");
alert("æøå");
</script>
</head>
<body>
</body>
</html>
What am I doing wrong?
Ok, thanks to Grapefrukt's answer, I got it working.
I actually needed it for data coming from a MySQL server. Since saving the files in UTF-8 encoding only solves the problem for static content, I figured I'd include the solution for strings from a MySQL server, pulled out using PHP:
utf8_encode($MyStringHere)
If you ever can't set the response encoding, you can use \u escape sequences in the JavaScript string literal to display these characters.
alert("\u00e6\u00f8\u00e5")
Just specifying UTF-8 in the header is not enough. I'd bet you haven't saved your file as UTF-8. Any reasonably advanced text editor will have this option. Try that and I'm sure it'll work!
You can also use String.fromCharCode() to output a character from a numeric entity.
e.g. String.fromCharCode( 8226 ) will create a bullet character.
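Both approaches from the answers above produce the same strings, as this short sketch shows. The variable names are only for illustration.

```javascript
// \u escapes in a string literal (the source file stays pure ASCII).
const danish = "\u00e6\u00f8\u00e5";

// The same characters built from numeric code units.
const fromCodes = String.fromCharCode(0xe6, 0xf8, 0xe5);

// 8226 is decimal for U+2022, the bullet character.
const bullet = String.fromCharCode(8226);

console.log(danish === fromCodes); // true
console.log(bullet === "\u2022"); // true
console.log(bullet.charCodeAt(0).toString(16)); // 2022
```

Because the source contains only ASCII, this works regardless of the encoding the file is saved or served in.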
I get "&aelig;&oslash;&aring;" for the first one and some junk characters for the next. Could it be that JavaScript is not mangling your letters (mojibake), but that the alert dialog uses the system default font, and that font is incapable of displaying the letters?
I use the code like this with Thai language. It's fine.
$message is my PHP variable.
echo("<html><head><meta charset='utf-8'></head><body><script type='text/javascript'>alert('" . $message . "');</script></body></html>");
Hope this can help. Thank you.
(I cannot post image of what I did as the system said "I don't have enough reputation", so I leave the image, here. http://goo.gl/9P3DtI Sorry for inconvenience.)
Sorry for my weak English.
This works as expected for me:
alert("&aelig;&oslash;&aring;");
... creates an alert containing the string "&aelig;&oslash;&aring;" whereas
alert("æøå");
... creates an alert with the non-ascii characters.
Javascript is pretty utf-8 clean and doesn't tend to put obstacles in your way.
Maybe you're putting this on a web server that serves it as ISO-8859-1? If you use Apache, in your Apache config file (or in .htaccess, if you can override), you should have a line
AddCharset utf-8 .js
(Note: edited to escape the ampersands... otherwise it didn't make sense.)
