International characters in JavaScript

I am working on a web application where I transfer data from the server to the browser in XML.
Since I'm Danish, I quickly run into problems with the characters æøå.
I know that in HTML, I use "&aelig;&oslash;&aring;" for æøå.
However, as soon as the characters pass through JavaScript, I get black boxes with "?" in them when using æøå, and "&aelig;&oslash;&aring;" is printed as is.
I've made sure to set the encoding to UTF-8, but that isn't helping much.
Ideally, I want it to work with any special characters (naturally).
The example that isn't working is included below:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<script type="text/javascript" charset="utf-8">
alert("&aelig;&oslash;&aring;");
alert("æøå");
</script>
</head>
<body>
</body>
</html>
What am I doing wrong?
OK, thanks to Grapefrukt's answer, I got it working.
I actually needed it for data coming from a MySQL server. Since saving the files in UTF-8 encoding only solves the problem for static content, I figured I'd include the solution for strings from a MySQL server, pulled out using PHP (this assumes the strings come back as ISO-8859-1, the only encoding utf8_encode converts from; the function is also deprecated as of PHP 8.2):
utf8_encode($MyStringHere)

If you can't set the response encoding, you can use \u escape sequences in the JavaScript string literal to display these characters:
alert("\u00e6\u00f8\u00e5")
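If you need such escapes for longer strings, they can be generated instead of typed by hand. A minimal sketch, assuming you want every non-ASCII UTF-16 code unit rewritten as `\uXXXX`; the function name `toUnicodeEscapes` is hypothetical, and it does not escape quotes or backslashes, so it only deals with the encoding side of the problem.

```javascript
// Hypothetical helper: escape every non-ASCII UTF-16 code unit as \uXXXX,
// so the result can be pasted into a JavaScript string literal regardless
// of the file's or the response's character encoding.
function toUnicodeEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i); // one UTF-16 code unit (0..0xFFFF)
    if (unit < 0x80) {
      out += text[i]; // ASCII is the same in every ASCII-compatible encoding
    } else {
      out += "\\u" + unit.toString(16).padStart(4, "0");
    }
  }
  return out;
}

console.log(toUnicodeEscapes("æøå")); // \u00e6\u00f8\u00e5
```

Characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which JavaScript string literals accept as-is.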

Just specifying UTF-8 in the header is not enough. I'd bet you haven't saved your file as UTF-8. Any reasonably advanced text editor will have this option. Try that and I'm sure it'll work!
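To see why the way the file is saved and the declared encoding have to agree, here is a minimal Node.js sketch (it relies on Node's built-in `Buffer`, which browsers don't have). It turns text into bytes with one encoding and reads the bytes back with another, which is what a browser effectively does when the declared charset doesn't match the file.

```javascript
// Simulate an encoding mismatch with Node's Buffer (Node.js only).
const original = "æøå";

// Saved as UTF-8 but read as ISO-8859-1 (Latin-1):
// each two-byte UTF-8 sequence shows up as two Latin-1 characters.
const utf8Bytes = Buffer.from(original, "utf8");
const readAsLatin1 = utf8Bytes.toString("latin1");
console.log(readAsLatin1); // Ã¦Ã¸Ã¥

// Saved as Latin-1 (one byte per letter) but read as UTF-8:
// these bytes are not valid UTF-8, so each one becomes U+FFFD,
// the replacement character browsers draw as a black box with "?".
const latin1Bytes = Buffer.from(original, "latin1");
const readAsUtf8 = latin1Bytes.toString("utf8");
console.log(readAsUtf8); // three U+FFFD replacement characters
```

The second case matches the symptom in the question: a file saved in a single-byte encoding but treated as UTF-8.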

You can also use String.fromCharCode() to create a character from the number in a numeric character reference.
For example, String.fromCharCode(8226) creates a bullet character (•).
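Building on that, here is a minimal sketch of turning numeric character references such as `&#230;` or `&#xE6;` inside a string back into characters. The function name `decodeNumericReferences` is hypothetical, it deliberately ignores named references like `&aelig;` (those need a lookup table), and it uses `String.fromCodePoint()` instead of `String.fromCharCode()`, because `fromCharCode` only takes single UTF-16 code units and would mangle characters above U+FFFF.

```javascript
// Hypothetical helper: replace decimal (&#230;) and hexadecimal (&#xE6;)
// numeric character references with the characters they stand for.
// Named references such as &aelig; are left untouched.
function decodeNumericReferences(text) {
  return text.replace(/&#(?:x([0-9a-f]+)|([0-9]+));/gi, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave references to impossible code points (beyond U+10FFFF) as they were.
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericReferences("&#230;&#248;&#229;")); // æøå
console.log(decodeNumericReferences("&#x2022; item"));     // • item
```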

I get "&aelig;&oslash;&aring;" for the first one and some junk characters for the next. Could it be that JavaScript is not mangling your letters (producing mojibake), but that the alert dialog uses the system default font and that font can't display the letters?

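I use code like this with Thai text, and it works fine.
$message is my PHP variable; json_encode() turns it into a valid JavaScript string literal, so quotes in the message don't break the script:
echo("<html><head><meta charset='utf-8'></head><body><script type='text/javascript'>alert(" . json_encode($message, JSON_UNESCAPED_UNICODE) . ");</script></body></html>");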

This works as expected for me:
alert("&aelig;&oslash;&aring;");
... creates an alert containing the string "&aelig;&oslash;&aring;" whereas
alert("æøå");
... creates an alert with the non-ascii characters.
JavaScript strings are Unicode, so as long as the file is decoded with the right encoding, JavaScript doesn't tend to put obstacles in your way.
Maybe you're putting this on a web server that serves it as ISO-8859-1? If you use Apache, you should have the following line in your Apache config file (or in .htaccess, if overrides are allowed):
AddCharset utf-8 .js
(Note: edited to escape the ampersands... otherwise it didn't make sense.)

Related

Saving JavaScript files in Notepad: ANSI or UTF-8 encoding?

I'm new to web development and JavaScript. I know that HTML5 and CSS files should be saved as UTF-8 if they contain characters beyond ANSI, but what about JavaScript? What should I do when saving a JavaScript file? I'm using Windows 7: should I save the file as ANSI or UTF-8?
Please see the attached image of the save dialog in Windows 7 Notepad.
Thanks for your help and answers!
Unless they declare their own encoding (a byte order mark, a charset in the HTTP Content-Type header, or a charset attribute on the <script> element), your script files inherit their character encoding from the document. So if you are using <meta charset="utf-8"> or the HTTP header "Content-Type: text/html; charset=utf-8" in your document, then any script file that is referenced in the document should also be saved in UTF-8 format.
Generally speaking you should always use UTF-8 for everything unless you have no choice but to use a single byte encoding such as Windows-1252 (ANSI).
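If you are not sure whether an existing file was saved as UTF-8 or as ANSI, one heuristic is to check whether its bytes are valid UTF-8: text with non-ASCII characters saved as Windows-1252 is very unlikely to also be valid UTF-8. A minimal Node.js sketch using the standard `TextDecoder` with `fatal: true`, which throws instead of substituting U+FFFD; the function name `isValidUtf8` is hypothetical, and you would call it on a file's contents, e.g. `isValidUtf8(require("fs").readFileSync("script.js"))`, where `script.js` is a placeholder.

```javascript
// Returns true if the bytes decode as UTF-8 without errors.
// TextDecoder with { fatal: true } throws a TypeError on malformed input
// instead of silently inserting U+FFFD replacement characters.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    if (err instanceof TypeError) return false;
    throw err;
  }
}

// Pure ASCII is byte-for-byte the same in ANSI and UTF-8, so it passes.
console.log(isValidUtf8(Buffer.from("alert('hi');", "latin1"))); // true
// "æøå" saved as UTF-8 (two bytes per letter) vs. as ANSI (one byte each).
console.log(isValidUtf8(Buffer.from("æøå", "utf8")));   // true
console.log(isValidUtf8(Buffer.from("æøå", "latin1"))); // false
```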
If you change the top dropdown to 'All Files' and then just add .js to the end of your file name, that should do it.
You can leave the character encoding as UTF-8.
You can use either of them. ANSI encoding is just an extension of ASCII with an additional 128 characters. I do not think there is any advantage to using one over the other (in the context of JavaScript programming), but I may be wrong. Here is a comparison

Convert unicode to Chinese characters

Suppose I have a string like this:
\u00e5\u00b1\u00b1\u00e4\u00b8\u008a\u00e7\u009a\u0084\u00e4\u00ba\u00ba
How would I convert these back into Chinese characters using Javascript:
山上的人
This is so that I can actually display Chinese on my web page. Right now it comes out as å±±ä¸ç人.
This website manages to do this, but it uses PHP code that they don't expose.
I'm not at all familiar with how character encoding works, so I don't even know what terminology to search for.
The string appears to be UTF-8 bytes, with each byte stored as a separate \u00XX character.
https://github.com/mathiasbynens/utf8.js is a helpful JavaScript library that saves you the headache of learning the UTF-8 standard and will decode the UTF-8 into text.
Here's a demo: https://mothereff.in/utf-8
Paste in \u00e5\u00b1\u00b1\u00e4\u00b8\u008a\u00e7\u009a\u0084\u00e4\u00ba\u00ba into the "UTF-8-encoded" textarea to decode it.
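If you'd rather not add a library, modern browsers and Node.js include the standard `TextDecoder`, which can do the decoding. A minimal sketch, assuming (as in the question) that every character in the string stands for one byte of UTF-8; the function name `decodeUtf8Bytes` is hypothetical.

```javascript
// Hypothetical helper: treat each character of `str` as one byte (0x00-0xFF)
// and decode the resulting byte sequence as UTF-8.
function decodeUtf8Bytes(str) {
  const bytes = Uint8Array.from(str, (ch) => {
    const code = ch.charCodeAt(0);
    if (code > 0xff) {
      throw new RangeError("Not a byte string: found U+" + code.toString(16));
    }
    return code;
  });
  return new TextDecoder("utf-8").decode(bytes);
}

const mangled =
  "\u00e5\u00b1\u00b1\u00e4\u00b8\u008a\u00e7\u009a\u0084\u00e4\u00ba\u00ba";
console.log(decodeUtf8Bytes(mangled)); // 山上的人
```

The range check makes the function fail loudly if the input already contains real (non-byte) characters, rather than silently producing more garbage.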
Add <meta charset="UTF-8"> inside the <head> element of your HTML file so that Chinese displays properly, and then just put the Chinese characters directly in your HTML file.

Outputting Russian characters to the console

I am trying to output a Russian string to the console like this:
console.log("Привет");
But the console outputs ПривеÑ. How can I solve this problem?
You need to declare the encoding of your site (or of your scripts, for that matter).
You can use <meta charset="UTF-8"> in the <head> of your page to declare the whole page as UTF-8 encoded.
OR
If you only need to declare the encoding of the script, you can do it for just that script, i.e. <script type="text/javascript" charset="utf-8" src="blah.js"></script>
Either way, you should always tell your site or script which character set you are using.

Why does this ​ sign pop up?

While working on some code in Notepad++ that makes traffic lights run automatically, I noticed that this sign pops up next to one of the buttons: ​. I did some research and found out that to make it go away I need to set the charset to UTF-8. I did this and the sign went away. However, I am confused, because the default character encoding in HTML5 is UTF-8, and Notepad++ even shows that the file is using UTF-8.
I was wondering if someone could tell me why the sign pops up, given that the file was already encoded in UTF-8.
There are a number of things that all need to be set to UTF-8.
The original file, of course, needs to be saved as UTF-8.
However, the encoding is also specified by the HTTP Content-Type header, and it can be declared inside the HTML file. If these declarations are missing or set incorrectly, the browser may try another encoding; with no declaration at all, it falls back to a locale-dependent default (often Windows-1252), not UTF-8.
So declaring the encoding in the HTML file with <meta charset="utf-8"> can work around this issue (note that a charset given in the HTTP header takes precedence over the <meta> element).
There is a bit of discussion here: <meta charset="utf-8"> vs <meta http-equiv="Content-Type">
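The order in which a browser picks an encoding explains why the <meta> element helped here. Below is a heavily simplified sketch of that order, loosely following the HTML standard's encoding-sniffing rules: a byte order mark wins, then a charset in the HTTP Content-Type header, then a <meta> declaration near the top of the file, and only then a default. The function name `guessDocumentEncoding`, the regex-based <meta> check, and the `windows-1252` fallback are assumptions for illustration; real browsers do considerably more (label normalization, a proper prescan, and possibly guessing from the content).

```javascript
// Simplified sketch of how a browser picks the encoding of an HTML file.
// NOT the full algorithm from the HTML standard; for illustration only.
function guessDocumentEncoding(bytes, contentTypeHeader, fallback = "windows-1252") {
  // 1. A byte order mark (BOM) at the start of the file wins.
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "utf-8";
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return "utf-16be";
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return "utf-16le";

  // 2. Then the charset parameter of the HTTP Content-Type header.
  const header = /;\s*charset\s*=\s*"?([^";\s]+)/i.exec(contentTypeHeader || "");
  if (header) return header[1].toLowerCase();

  // 3. Then a charset declared by a <meta> element in the first 1024 bytes
  //    (bytes mapped 1:1 to characters so the ASCII markup can be searched).
  const start = Array.from(bytes.subarray(0, 1024), (b) => String.fromCharCode(b)).join("");
  const meta = /<meta[^>]+charset\s*=\s*["']?([\w-]+)/i.exec(start);
  if (meta) return meta[1].toLowerCase();

  // 4. Nothing declared: a locale-dependent default (assumed value here).
  return fallback;
}

const page = new TextEncoder().encode('<meta charset="utf-8"><p>æøå</p>');
console.log(guessDocumentEncoding(page, "text/html"));                    // utf-8
console.log(guessDocumentEncoding(page, "text/html; charset=ISO-8859-1")); // iso-8859-1
```

The second call shows why a <meta> element cannot fix a server that sends the wrong charset in its header: step 2 is reached first.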

How do I keep my UTF-8 characters from becoming junk?

I'm creating a simple JavaScript multiple choice game. Here is a sample question:
p ∧ q ≡ q ∧ p by which rule?
When I run it on localhost, it works fine and prints out those special characters. However, when I upload it to my school's server, it prints out garbage:
p âˆ§ q â‰¡ q âˆ§ p by which rule?
I have this at the top of my HTML:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I can't use PHP in my assignment, or I'd use header('Content-Type: text/html; charset=utf-8');
If you want, I can give a link... but I'd rather not because then everyone can see my really bad educational game...
How can I keep my UTF-8 characters?
Edit: I found out that if I upload my files to the server with FileZilla and then download them again, the characters become little squares. I don't know if that's useful information.
Yes, FileZilla is corrupting your files in transit. Make sure FileZilla transfers your files in binary mode so that the text doesn't get corrupted. If it's transferring in ASCII mode, it will try to convert newlines and "fix" characters it doesn't recognize.
If you cannot easily fix the HTTP headers, sidestep the problem by using “character escapes.” If, e.g., “∧” occurs in HTML content, use &#x2227; for it. If it occurs in a JavaScript string literal, use \u2227 for it.
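Escapes like these can also be generated instead of looked up. A minimal sketch (the function name `toHtmlReferences` is hypothetical) that rewrites every non-ASCII character as a hexadecimal numeric character reference, so the HTML stays pure ASCII and survives an encoding mix-up. It does not escape `&` or `<`, so apply it to text that is already safe to embed in HTML.

```javascript
// Hypothetical helper: replace each non-ASCII character with a hexadecimal
// numeric character reference such as &#x2227;, leaving ASCII unchanged.
function toHtmlReferences(text) {
  let out = "";
  for (const ch of text) { // for...of walks code points, not UTF-16 code units
    const cp = ch.codePointAt(0);
    out += cp < 0x80 ? ch : "&#x" + cp.toString(16).toUpperCase() + ";";
  }
  return out;
}

console.log(toHtmlReferences("p ∧ q ≡ q ∧ p")); // p &#x2227; q &#x2261; q &#x2227; p
```

Iterating by code point means a character outside the Basic Multilingual Plane becomes one reference rather than two broken halves.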
To look up the codes for other characters, consult e.g.
http://www.alanwood.net/unicode/mathematical_operators.html
Copying and pasting the questions into Notepad, or any other app that lets you save as UTF-8, might work if that is a viable option.
I think you could also use a regex to identify the hex values and replace them with the corresponding UTF-8 characters.
Also, if you're using a specialized font, that could cause the problem. Are the questions styled with a particular font, or a set of fallbacks? You may need an @font-face import, but I suspect there's another option. Given the symbols you're trying to use, LaTeX might be an option; I believe there are a few options out there for JavaScript, fonts, etc.
This article may also be useful: http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html
