Why does this ​ sign pop up?

While running some code written in Notepad++ that makes traffic lights run automatically, I noticed that this sign pops up next to one of the buttons: ​. I did some research and found out that to make it go away I need to set the charset to utf-8. I did this and the sign went away. However, I am confused, because the default character encoding in HTML5 is utf-8, and Notepad++ even shows that the file is using utf-8.
Could someone tell me why the sign pops up, given that the file was already encoded in utf-8?

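There are a number of things that all need to be set to UTF-8.
The original file, of course, needs to be saved as UTF-8.
However, the encoding is also declared separately from the file's bytes, either in the HTTP Content-Type header or in a <meta> tag in the HTML. If that declaration is missing or wrong, the browser decodes the bytes with some other encoding. With no declaration at all it does not assume UTF-8; it falls back to a locale-dependent default (windows-1252 for English and most Western-European locales), so multi-byte UTF-8 characters show up as junk.
So, declaring the encoding explicitly in the HTML file (<meta charset="utf-8">) can "work around" this issue. Note that a charset given in the HTTP header, when present, takes precedence over the <meta> tag.
There is a bit of discussion here: <meta charset="utf-8"> vs <meta http-equiv="Content-Type">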
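To make the mismatch concrete, here is a minimal sketch that mimics a browser reading a UTF-8 file as windows-1252 (the fallback described above). It assumes the invisible character in the question is a zero-width space (U+200B); `decodeAsWindows1252` and `misreadUtf8` are hypothetical helpers written for illustration, and the byte table follows the WHATWG windows-1252 index.

```javascript
// Bytes 0x80-0x9F are where windows-1252 differs from ISO-8859-1.
// Values follow the WHATWG index; the five unassigned bytes map to the
// C1 control character with the same value.
const CP1252_HIGH = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

// Decode a byte array as windows-1252: every byte becomes one character.
function decodeAsWindows1252(bytes) {
  let out = "";
  for (const b of bytes) {
    const code = b >= 0x80 && b <= 0x9f ? CP1252_HIGH[b - 0x80] : b;
    out += String.fromCharCode(code);
  }
  return out;
}

// Simulate the mismatch: the text is saved as UTF-8,
// but the browser reads the bytes as windows-1252.
function misreadUtf8(text) {
  const utf8Bytes = new TextEncoder().encode(text);
  return decodeAsWindows1252(utf8Bytes);
}

console.log(misreadUtf8("\u200B")); // "â€‹" (UTF-8 bytes E2 80 8B, one character each)
```

An invisible character turning into three visible ones is the typical signature of this kind of mismatch; plain ASCII text passes through unchanged, which is why the rest of the page looked fine.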

Related

Why does this character encoding issue only occur on select systems?

We are using a JavaScript WYSIWYG text editor called CKEditor. The editor has a source view that shows, as HTML markup, what the user has entered in the text editor. Sometimes the editor will insert non-breaking spaces (&nbsp;) into this source view, which is fine.
Everything seemed to work correctly on the dev machines, so we deployed to our production servers. At this point we started seeing a weird character (Â) being inserted into the text. After some reading I saw that this was reported in several tickets on the CKEditor bug tracker. I was able to resolve the issue by setting the charset attribute on the script tag for ckeditor.js to UTF-8.
My question is this: Why did the script tag need the charset attribute set in the first place, and why only on certain systems?
The last comment on this SO question mentions that the byte sequence for a non-breaking space in UTF-8 is actually the Â character followed by a non-breaking space in latin1 (which is ISO-8859-1, right?). This could definitely be a clue, because another Â character is inserted, one after another, every time the user switches to source view. It is as if the CKEditor framework is trying to inject a non-breaking space, but that gets turned into Â&nbsp, then ÂÂ&nbsp, and so on. The content-type on all systems (viewed from the Chrome debugger) is text/html;charset=ISO-8859-1, and I am unsure why. The -Dfile.encoding option in all Tomcat configs is set to utf-8. The meta tag is also <meta charset="utf-8">.
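The byte-level clue in the question can be checked directly: encode a non-breaking space as UTF-8, then read each byte back as an ISO-8859-1 (Latin-1) character, where byte value N is simply code point U+00NN. This is a minimal sketch; `utf8Hex` and `latin1View` are hypothetical helpers, not part of CKEditor.

```javascript
// UTF-8 bytes of a string, shown as hex for inspection.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text), (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

// Read UTF-8 bytes back as Latin-1: byte value N becomes code point U+00NN.
function latin1View(text) {
  return String.fromCharCode(...new TextEncoder().encode(text));
}

const nbsp = "\u00A0";
console.log(utf8Hex(nbsp)); // "C2 A0"
console.log(latin1View(nbsp) === "\u00C2\u00A0"); // true: "Â" followed by a non-breaking space
```

This reproduces only the first step. Why one more Â appears on every switch to source view depends on how CKEditor re-serializes its content, which this sketch does not model.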
Fire up your development tools in the web browser. When a form is rendered or submitted, stop and look at the request and response headers that are sent back and forth. Make sure you see UTF-8 everywhere. If it's missing, then one side will assume a "default encoding", whatever that might be.
Also set the charset on the forms explicitly (accept-charset), and make sure the server decodes the submitted data as UTF-8: browsers usually don't label form submissions with a charset, so the server falls back to its own default.
EDIT: This page explains in detail how to set the charset when using Tomcat, along with the necessary code for your servlets.
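The same "UTF-8 everywhere" checklist can be sketched with Node's built-in `http` module instead of Tomcat (a swapped-in server, not the setup in the question). The sketch declares the charset in the response header, keeps a `<meta charset>` as an in-document fallback, asks the form to submit UTF-8, and decodes the request body as UTF-8 explicitly. `FORM_PAGE`, `sendHtml` and `readBodyUtf8` are hypothetical names.

```javascript
// Page with a <meta charset> as an in-document fallback, and a form that
// asks the browser to submit its fields as UTF-8.
const FORM_PAGE = `<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Editor</title></head>
<body>
  <form method="post" action="/save" accept-charset="UTF-8">
    <textarea name="content"></textarea>
    <button type="submit">Save</button>
  </form>
</body>
</html>`;

// Send HTML with the charset declared in the HTTP header itself, so the
// browser does not have to fall back to a "default encoding".
// `res` is expected to be a Node http.ServerResponse (or anything with
// the same statusCode/setHeader/end members).
function sendHtml(res, html) {
  res.statusCode = 200;
  res.setHeader("Content-Type", "text/html; charset=utf-8");
  res.end(html, "utf8");
}

// Read a request body as UTF-8 text instead of relying on a server default.
// setEncoding makes the stream decode multi-byte characters correctly even
// when they are split across chunks.
function readBodyUtf8(req) {
  return new Promise((resolve, reject) => {
    let body = "";
    req.setEncoding("utf8");
    req.on("data", (chunk) => { body += chunk; });
    req.on("end", () => resolve(body));
    req.on("error", reject);
  });
}
```

Wiring it up could look like `require("http").createServer((req, res) => sendHtml(res, FORM_PAGE)).listen(8080)`. In Tomcat the corresponding settings live on the servlet request and response objects, as covered by the page mentioned in the answer.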

How do I keep my UTF-8 characters from becoming junk?

I'm creating a simple JavaScript multiple choice game. Here is a sample question:
p ∧ q ≡ q ∧ p by which rule?
When I run it on localhost, it works fine and prints out those special characters. However, when I upload it to my school's server, it prints out garbage:
p ∨ q ≡ q ∨ p by which rule?
I have this at the top of my HTML:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I can't use PHP in my assignment, or I'd use header('Content-Type: text/html; charset=utf-8');
If you want, I can give a link... but I'd rather not because then everyone can see my really bad educational game...
How can I keep my UTF-8 characters?
Edit: I found out that if I Filezilla my files up to the server and download them from the server, the characters become little squares. I don't know if that's useful information.
Yes, FileZilla is corrupting your files in transit. Make sure FileZilla transfers your files in binary mode so the text doesn't get corrupted. If it's transferring in ASCII mode, it will try to fix newlines and unrecognized characters.
If you cannot easily fix the HTTP headers, sidestep the problem by using “character escapes.” If e.g. “∧” occurs in HTML content, use the character reference &#x2227; (or the named entity &and;) for it. If it occurs in a JavaScript string literal, use \u2227 for it.
To check out the codes for other characters, consult e.g.
http://www.alanwood.net/unicode/mathematical_operators.html
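Instead of looking up each code by hand, the escapes can be generated. This is a minimal sketch, assuming you run it once over your question text (in Node or a browser console) and paste the result into the files; `toHtmlRefs` and `toJsEscapes` are hypothetical helpers. The output is pure ASCII, so it survives any page encoding and any transfer mode.

```javascript
// Replace every non-ASCII character with an HTML numeric character reference,
// e.g. "∧" (U+2227) becomes "&#x2227;". Array.from iterates by code point,
// so a character outside the BMP becomes a single reference.
function toHtmlRefs(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    return cp < 0x80 ? ch : `&#x${cp.toString(16).toUpperCase()};`;
  }).join("");
}

// Same idea for JavaScript string literals: "∧" becomes "\u2227".
// Code points above U+FFFF use the ES2015 "\u{...}" form.
function toJsEscapes(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    if (cp < 0x80) return ch;
    const hex = cp.toString(16).toUpperCase();
    return cp <= 0xffff ? "\\u" + hex.padStart(4, "0") : `\\u{${hex}}`;
  }).join("");
}

console.log(toHtmlRefs("p ∧ q ≡ q ∧ p"));  // p &#x2227; q &#x2261; q &#x2227; p
console.log(toJsEscapes("p ∧ q ≡ q ∧ p")); // p \u2227 q \u2261 q \u2227 p
```

Only non-ASCII characters are touched; characters such as `&` or `<` that need escaping in HTML for other reasons are left alone here.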
Copying and pasting the questions into Notepad, or any other app that lets you save as UTF-8, might work if that is a viable option.
I think you could also use a regex to find the hex values and replace them with the corresponding characters that would work in UTF-8.
Also, if you're using a specialized font, this could cause the problem. Are the questions styled with a particular font, or a set of fallbacks? You may need an @font-face import, but I suspect there's another option. With the symbols you're trying to use, LaTeX might be an option; I believe there are a few options out there for JavaScript, fonts, etc.
This article may also be useful: http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html

Efficient way to encode scandinavian letters in a string

My JavaScript gets a string value from the server:
var name = VALUE_FROM_SERVER;
This name will be shown on a web page. Since name contains Scandinavian letters (for example, name could be TÖyoeävä), I need to encode it somehow to display it correctly in the browser.
In JavaScript, what is the most efficient way to encode all those Scandinavian letters?
(I prefer to do it with JavaScript.)
(e.g. I would like to create a JS function which takes TÖyoeävä as a parameter and returns it with the Scandinavian letters encoded)
var encoder=function(string){
for(var s=0; s<string.length; s++){
//Check each letter in the string, if it is Scandinavian, encode it??
}
}
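If the conversion really is needed (the answers below argue that serving the page as UTF-8 makes it unnecessary), the stub can be filled in with a lookup table and a single `String.prototype.replace` call. This sketch assumes "encode" means HTML named entities and covers only å, ä, ö, æ and ø in both cases; `encodeScandinavian` and `SCANDINAVIAN_ENTITIES` are hypothetical names.

```javascript
// HTML named entities for the letters assumed here: å, ä, ö, æ, ø.
const SCANDINAVIAN_ENTITIES = {
  "å": "&aring;", "Å": "&Aring;",
  "ä": "&auml;", "Ä": "&Auml;",
  "ö": "&ouml;", "Ö": "&Ouml;",
  "æ": "&aelig;", "Æ": "&AElig;",
  "ø": "&oslash;", "Ø": "&Oslash;",
};

// Normalize to NFC first so "a" + combining diaeresis is treated like "ä",
// then swap each matched letter for its entity in a single regex pass.
function encodeScandinavian(text) {
  return text
    .normalize("NFC")
    .replace(/[åäöæøÅÄÖÆØ]/g, (ch) => SCANDINAVIAN_ENTITIES[ch]);
}

console.log(encodeScandinavian("TÖyoeävä")); // T&Ouml;yoe&auml;v&auml;
```

One regex pass with a lookup table avoids the per-character loop in the stub, but note that it only helps if the string was decoded correctly in the first place; it cannot repair text that already arrived as mojibake.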
I'd advise you not to encode in the JS. Just make sure your (html) page encoding matches what your server is returning.
Preferably, that would be a UTF-8 encoding (to be able to support other languages down the road). But if it's just Scandinavian languages you're interested in, ISO-8859-1 (Latin 1) is enough.
There's no way to tell from a random byte string whether it's one encoding or another (generally speaking, anyway). So you have to know in your JavaScript what encoding the server is sending.
You also have to set your page's encoding at some point, and that point has to be before the browser starts interpreting its content.
So all in all, getting encoding A from the server and converting it to encoding B on the client side is going to be tricky, and pretty much a waste of time (IMO). You're not gaining any flexibility that I can see, except allowing your server to change encodings, which doesn't seem like such a good idea.
UTF-8 all the way will save you headaches.
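The "no way to tell from a byte string" point can be seen with the letter Ö. This sketch decodes the same two bytes both ways; `tryDecodeUtf8` and `decodeLatin1` are hypothetical helpers. `TextDecoder` with `fatal: true` rejects bytes that are not valid UTF-8, but bytes that pass are still perfectly valid Latin-1, so a check like this is at best a heuristic, not detection.

```javascript
// Strict UTF-8 decoding: TextDecoder throws a TypeError on malformed input.
function tryDecodeUtf8(bytes) {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (err) {
    return null; // not valid UTF-8
  }
}

// Latin-1 decoding never fails: every byte is a character.
function decodeLatin1(bytes) {
  return String.fromCharCode(...bytes);
}

const bytes = Uint8Array.of(0xc3, 0x96); // "Ö" saved as UTF-8
console.log(tryDecodeUtf8(bytes)); // "Ö"
console.log(decodeLatin1(bytes));  // "Ã" followed by the invisible control U+0096

const latin1Only = Uint8Array.of(0xd6); // "Ö" saved as Latin-1
console.log(tryDecodeUtf8(latin1Only)); // null
```

Both readings of the first byte pair are "valid", which is why the page has to be told the encoding rather than guess it.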
Just use the UTF-8 charset when sending the response from the server. For example
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Javascript encoding question

We have an external .js file that we want to include in a number of different pages. The file contains code for sorting a table on the client-side, and uses the ▲ and ▼ characters in the script to indicate which column is sorted and in which direction.
The script was originally written for an ASP.Net page to offload some sorting work from the server to client (prevent sorting postbacks when javascript is enabled). In that case, the encoding is pretty much always UTF-8 and it works great in that context.
However, we also have a number of older Classic ASP pages where we want to include the script. For these pages the encoding is more of a hodgepodge depending on who wrote the page when and what tool they were using (notepad, vs6, vs2005, other html helper). Often no encoding is specified in the page so it's up to the browser to pick, but there's really no hard rule for it that I can see.
The problem is that if a different (non-UTF8) encoding is used the ▼ and ▲ characters won't show up correctly. I tried using html entities instead, but couldn't get them to work well from the javascript.
How can I make the script adjust for the various potential encodings so that the "special" characters always show up correctly? Are there different characters I could be using, or a trick I missed to make the html entities work from javascript?
Here is the snippet where the characters are used:
// get sort direction, arrow
var dir = 1;
if (self.innerHTML.indexOf(" ▲") > -1)
    dir = -1;
var arrow = (dir == 1) ? " ▲" : " ▼";
// SORT -- function that actually sorts - not relevant to the question
if (!SimpleTableSort(t.id, self.cellIndex, dir, sortType)) return;
// remove all arrows
for (var c = 0, cl = t.rows[0].cells.length; c < cl; c += 1)
{
    var cell = t.rows[0].cells[c];
    cell.innerHTML = cell.innerHTML.replace(" ▲", "").replace(" ▼", "");
}
// set new arrow
self.innerHTML += arrow;
For the curious, the code points I ended up using with the accepted answer were \u25B4 and \u25BC.
Unless a script's own encoding is given (by a byte-order mark, the HTTP Content-Type header, or a charset attribute), the browser decodes it using the encoding of the HTML page that includes it. If you have a UTF-8 JavaScript file and an ISO-8859-1 HTML page, the JavaScript is interpreted as ISO-8859-1.
If you load the JavaScript as an external file, you can specify the encoding of the JavaScript:
<script type="text/javascript" charset="UTF-8" src="externalJS.js"></script>
Either way, the best option is to save all files in a web project in one encoding, preferably UTF-8.
You want Javascript Unicode escapes i.e. "\uxxxx", where "xxxx" is the Unicode code point for the character. I believe "\u25B2" and "\u25BC" are the two you need.
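Putting the answers together, the arrow handling from the question can keep its logic while spelling the arrows as Unicode escapes, so the script file is pure ASCII and decodes the same under any page encoding. This sketch pulls that logic into a hypothetical pure function, `nextArrowState`, that works on one header's text rather than on `innerHTML` and the whole header row; it uses \u25B2 and \u25BC as suggested in this answer (the asker reports ending up with \u25B4 and \u25BC).

```javascript
// ASCII-only source: the JS engine turns these escapes into the up and
// down triangles, whatever charset the page or script is decoded with.
const ARROW_UP = " \u25B2";
const ARROW_DOWN = " \u25BC";

// Given a header's current text, return the new sort direction and the
// header text with the old arrow removed and the new one appended.
function nextArrowState(headerText) {
  const dir = headerText.indexOf(ARROW_UP) > -1 ? -1 : 1;
  const base = headerText.replace(ARROW_UP, "").replace(ARROW_DOWN, "");
  return { dir, text: base + (dir === 1 ? ARROW_UP : ARROW_DOWN) };
}

let state = nextArrowState("Name"); // { dir: 1, text: "Name \u25B2" }
state = nextArrowState(state.text); // { dir: -1, text: "Name \u25BC" }
state = nextArrowState(state.text); // { dir: 1, text: "Name \u25B2" }
```

Comparisons such as `indexOf(ARROW_UP)` keep working because the escape produces exactly the same string value as a literal ▲ would in a correctly decoded file.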
I voted for both. I think both answers put together would be your best bet.
You're probably going to have to write the script twice, putting in a part for UTF-8, and putting in a part for non UTF-8. It's more trouble, and might not work all the time, STILL.
Someone needs to come up with standards for your developers. If you all write with at least the same encoding, it'll make things a lot easier for yourselves in the future.

international characters in Javascript

I am working on a web application, where I transfer data from the server to the browser in XML.
Since I'm Danish, I quickly ran into problems with the characters æøå.
I know that in HTML, I use "&aelig;&oslash;&aring;" for æøå.
However, as soon as the characters pass through JavaScript, I get black boxes with "?" in them when using æøå, and "&aelig;&oslash;&aring;" is printed as is.
I've made sure to set the encoding to utf-8, but that isn't helping much.
Ideally, I want it to work with any special characters (naturally).
The example that isn't working is included below:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<script type="text/javascript" charset="utf-8">
alert("æøå");
alert("&aelig;&oslash;&aring;");
</script>
</head>
<body>
</body>
</html>
What am I doing wrong?
OK, thanks to Grapefrukt's answer, I got it working.
I actually needed it for data coming from a MySQL server. Since saving the files in UTF-8 encoding only solves the problem for static content, I figured I'd include the solution for strings from a MySQL server, pulled out using PHP:
utf8_encode($MyStringHere)
If you ever can't set the response encoding, you can use \u escape sequences in the JavaScript string literal to display these characters.
alert("\u00e6\u00f8\u00e5")
Just specifying UTF-8 in the header is not enough. I'd bet you haven't saved your file as UTF-8. Any reasonably advanced text editor will have this option. Try that and I'm sure it'll work!
You can also use String.fromCharCode() to get a character from the number in a numeric entity.
e.g. String.fromCharCode(8226) will create a bullet character (•).
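Taking that one step further, this sketch turns numeric character references inside a string (decimal `&#8226;` or hex `&#x2022;`) into the characters themselves. It uses `String.fromCodePoint` rather than `fromCharCode` so that code points above U+FFFF also work; `decodeNumericRefs` is a hypothetical name, and named entities such as `&aelig;` are deliberately left untouched.

```javascript
// Match &#NNNN; (decimal) or &#xHHHH; (hex) numeric character references.
const NUMERIC_REF = /&#(?:x([0-9a-f]+)|([0-9]+));/gi;

function decodeNumericRefs(text) {
  return text.replace(NUMERIC_REF, (match, hex, dec) => {
    const cp = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range values as they were instead of throwing a RangeError.
    return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
  });
}

console.log(decodeNumericRefs("&#8226; item"));          // "• item"
console.log(decodeNumericRefs("&#xE6;&#xF8;&#xE5;"));    // "æøå"
```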
I get "æøå" for the first one and some junk characters for the next. Could it be that JavaScript is not mangling your letters (producing mojibake), but that the alert dialog uses the system default font, and that font is unable to display the letters?
I use code like this with Thai text, and it works fine.
$message is my PHP variable.
echo("<html><head><meta charset='utf-8'></head><body><script type='text/javascript'>alert(" . json_encode($message, JSON_UNESCAPED_UNICODE) . ");</script></body></html>");
Hope this helps. Thank you.
(I can't post an image of what I did because I don't have enough reputation yet, so I've left it here instead: http://goo.gl/9P3DtI. Sorry for the inconvenience.)
This works as expected for me:
alert("&aelig;&oslash;&aring;");
... creates an alert containing the string "&aelig;&oslash;&aring;", whereas
alert("æøå");
... creates an alert with the non-ASCII characters.
JavaScript is pretty much UTF-8 clean and doesn't tend to put obstacles in your way.
Maybe you're putting this on a web server that serves it as ISO-8859-1? If you use Apache, your Apache config file (or .htaccess, if overrides are allowed) should have the line
AddCharset utf-8 .js
(Note: edited to escape the ampersands... otherwise it didn't make sense.)
