UTF-8 text is being garbled - JavaScript

I'm still new to webdev and dealing with character set encodings. I've read http://kunststube.net/encoding/ along with a few other pieces on the subject.
My problem is that I've got a bunch of text that I'm pulling from a server. It is encoded and served as utf-8.
However, when I display the strings, the French and Spanish accented characters are garbled. I've googled around, and it seems JavaScript engines use UCS-2 or UTF-16 internally. Is there something I have to do to get it to treat my text as UTF-8? I have <meta charset="utf-8"> in my HTML, but it doesn't seem to do anything.
Any ideas?

Without a link I can't inspect what you are doing directly, but you shouldn't need to do anything special inside JavaScript to get this to work. Just make sure all your sources are actually saved and served as UTF-8, and that the browser is interpreting them as such.
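To see what this kind of garbling looks like, here is a minimal sketch of what happens when UTF-8 bytes are read as ISO-8859-1 instead. The helper names misreadUtf8AsLatin1 and roundTripUtf8 are made up for illustration; TextEncoder and TextDecoder are standard in current browsers and Node.js.

```javascript
// Hypothetical helper: encode a string as UTF-8, then (wrongly) read each
// byte as one ISO-8859-1 character. This is how accents get garbled.
function misreadUtf8AsLatin1(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always emits UTF-8
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b); // byte value == ISO-8859-1 code point
  }
  return out;
}

// Decoding the same bytes as UTF-8 gives the original text back.
function roundTripUtf8(text) {
  const bytes = new TextEncoder().encode(text);
  return new TextDecoder("utf-8").decode(bytes);
}

console.log(misreadUtf8AsLatin1("caf\u00e9")); // "cafÃ©"
console.log(roundTripUtf8("caf\u00e9"));       // "café"
```

If your accents show up as pairs like Ã©, the bytes are probably fine and only the declared encoding is wrong somewhere along the way.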
You may need to make sure your server (Apache? IIS?) is sending the appropriate charset in the Content-Type header. For example, in PHP (use whichever line matches the content type you are serving):
header('Content-Type: text/plain; charset=utf-8');
header('Content-Type: text/html; charset=utf-8');
In .htaccess there are several ways to do it. For all .html files:
AddCharset UTF-8 .html
or for specific files:
<Files "example.js">
AddCharset UTF-8 .js
</Files>
refs:
http://us2.php.net/manual/fr/function.header.php
https://www.w3.org/International/questions/qa-htaccess-charset.en
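To check from the client side what the server actually declared, you can look at the charset parameter of the Content-Type header. This is only a sketch: charsetFromContentType is a hypothetical helper, and the header strings are example values, not output from your server.

```javascript
// Hypothetical helper: pull the charset parameter out of a Content-Type
// header value, or return null when the server did not send one.
function charsetFromContentType(headerValue) {
  if (!headerValue) return null;
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(headerValue);
  return match ? match[1].toLowerCase() : null;
}

console.log(charsetFromContentType("text/html; charset=UTF-8")); // "utf-8"
console.log(charsetFromContentType("application/javascript"));   // null
```

In a browser you could feed it the value of response.headers.get("Content-Type") from a fetch call. A null result means the browser has to guess or fall back on other hints.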

If you don't have a charset meta tag in your HTML, put one in the <head>:
<meta charset="UTF-8">
Otherwise, you have to declare the character encoding of your script file.

Related

How to fix UTF8 URL?

I have links in an html file like
href="%87%d9%84-%d9%8a%d9%86%d9%81%d8%b9-%d8%a7%d8%ae%d9%84%d9%89-%d8%a8%d8%b1%d9%86%d8%a7%d9%85%d8%ac-%d8%a7%d9%84%d9%85%d9%8a%d8%aa%d8%a7%d8%aa%d8%b1%d9%8a%d8%af-%d9%8a%d9%86%d8%a8%d9%87%d9%86%d9%89/index.html"
When the user clicks this link, I want the browser to show the link as
RealUtf8Text/index.html
Is there any way to do this using an .htaccess file?
If not, how can we do it using a JavaScript file?
I don't want to change the existing files; I just want to add an .htaccess or JavaScript file that solves the problem.
The problem appears to be that your URL does not contain valid UTF-8 data. There is no UTF-8 sequence that begins with the octet 87.
I'm guessing that your URL is missing a d9 or d8 octet. This URL:
http://localhost/%d9%87%d9%84-%d9%8a%d9%86%d9%81%d8%b9-%d8%a7%d8%ae%d9%84%d9%89-%d8%a8%d8%b1%d9%86%d8%a7%d9%85%d8%ac-%d8%a7%d9%84%d9%85%d9%8a%d8%aa%d8%a7%d8%aa%d8%b1%d9%8a%d8%af-%d9%8a%d9%86%d8%a8%d9%87%d9%86%d9%89/index.html
is shown as Arabic characters in my browser.
How the URL is displayed will of course depend on the browser's support for Arabic characters, and is not something that can be affected by JavaScript or .htaccess.
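The difference between the two URLs can be checked in JavaScript, since decodeURIComponent throws a URIError when the percent-encoded bytes are not valid UTF-8. A minimal sketch, where tryDecode is a hypothetical helper and only the first two characters of the path are used:

```javascript
// Hypothetical helper: decode a percent-encoded string, returning null
// instead of throwing when the bytes are not valid UTF-8.
function tryDecode(encoded) {
  try {
    return decodeURIComponent(encoded);
  } catch (err) {
    if (err instanceof URIError) return null;
    throw err;
  }
}

// 0x87 is a continuation byte and cannot start a UTF-8 sequence.
console.log(tryDecode("%87%d9%84"));    // null
// With the leading d9 restored, the bytes decode to U+0647 U+0644.
console.log(tryDecode("%d9%87%d9%84")); // two Arabic letters
```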
You can use urldecode:
urldecode("%87%d9%84-%d9%8a%d9%86%d9%81%d8%b9-%d8%a7%d8%ae%d9%84%d9%89-%d8%a8%d8%b1%d9%86%d8%a7%d9%85%d8%ac-%d8%a7%d9%84%d9%85%d9%8a%d8%aa%d8%a7%d8%aa%d8%b1%d9%8a%d8%af-%d9%8a%d9%86%d8%a8%d9%87%d9%86%d9%89/index.html")
See the documentation at
http://php.net/manual/en/function.urldecode.php.
Also make sure you have this in your HTML:
<meta http-equiv="Content-type" content="text/html; charset=utf-8">

Use files with different encoding on HTML

I have a website homepage encoded in iso-8859-1.
Into that page I include various CSS and JavaScript files encoded in UTF-8.
Is there a way to display the characters from the JS files correctly without changing all the encodings?
It should not be an issue. You've probably failed to identify the encoding of some of the files. To be on the safe side:
1. Configure your web server to add a correct Content-Type HTTP header with a charset attribute, e.g.:
Content-Type: application/javascript; charset=utf-8
2. When the language supports it, identify the encoding from the document itself, e.g.:
HTML 4: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
HTML 5: <meta charset="iso-8859-1">
CSS: @charset "UTF-8";
3. Declare the charset when linking the resource, e.g.:
<script type="text/javascript" src="foo.js" charset="utf-8"></script>
(This is actually deprecated.)
In practice, you can probably omit some of these steps. I'd say #1 is the most important.
If you mix encodings, you will run into difficulties later, especially if your pages contain text in different locales, so it is best to use UTF-8 everywhere.
You can also switch the page from iso-8859-1 to UTF-8 without changing the body text, since UTF-8 can represent every character that iso-8859-1 can; the files just have to be re-saved as UTF-8.
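Converting from iso-8859-1 to UTF-8 changes the bytes on disk even though the text stays the same. A minimal sketch of that conversion, assuming the input really is iso-8859-1 (latin1BytesToUtf8Bytes is a hypothetical helper; in practice your editor or a tool such as iconv does this for you):

```javascript
// Hypothetical helper: re-encode ISO-8859-1 bytes as UTF-8 bytes.
function latin1BytesToUtf8Bytes(latin1Bytes) {
  let text = "";
  for (const b of latin1Bytes) {
    text += String.fromCharCode(b); // ISO-8859-1 byte value == code point
  }
  return new TextEncoder().encode(text); // TextEncoder always emits UTF-8
}

// "é" is the single byte 0xE9 in ISO-8859-1 and the pair 0xC3 0xA9 in UTF-8.
const converted = latin1BytesToUtf8Bytes(new Uint8Array([0x63, 0xe9]));
console.log(Array.from(converted)); // [99, 195, 169]
```

Plain ASCII bytes come out unchanged, which is why pages with only unaccented text appear to work under either declaration.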

Character encoding works on one page, but not the other

I have a page http://199.193.248.80/test/test.php that contains the « character.
But when I read this page with js on http://199.193.248.80/test/test.html, the character turns into �
Both pages are using Charset Windows-1252 so I have no idea why it works on one page but not the other. What needs to be done to fix this?
This is probably because PHP sets a different character set (when serving the .php) in the headers than Apache does (when serving the .html). Browsers use the character set that's mentioned in the response headers; it overrides the <meta> tags in fact.
By default PHP chooses iso-8859-1 I believe, but you can override the character set in PHP by using:
header('Content-Type: text/html; charset=windows-1252');
Or change the php.ini for a global change.
See also:
http://httpd.apache.org/docs/2.0/mod/core.html#adddefaultcharset (for Apache)
http://www.php.net/manual/en/ini.core.php#ini.default-charset (for PHP)
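One way a � can arise from a Windows-1252 page is shown below. This is a sketch under the assumption that whatever reads the response decodes it as UTF-8; the source does not confirm that this is what happens on the pages in question. decodeAsUtf8 is a hypothetical helper.

```javascript
// Hypothetical helper: decode raw bytes as UTF-8, replacing invalid
// sequences with U+FFFD (the default, non-fatal TextDecoder behaviour).
function decodeAsUtf8(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

// In Windows-1252 the « character is the single byte 0xAB. On its own
// that byte is not valid UTF-8, so the decoder substitutes U+FFFD.
const decoded = decodeAsUtf8(new Uint8Array([0xab]));
console.log(decoded === "\uFFFD"); // true

// The UTF-8 encoding of « is 0xC2 0xAB, which decodes cleanly.
console.log(decodeAsUtf8(new Uint8Array([0xc2, 0xab])) === "\u00ab"); // true
```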
I suggest using the HTML entity form: &laquo;
This way it doesn't matter which charset you use for your file, because the entity is plain ASCII and the browser resolves it when it parses the page.
In PHP you can use $str = htmlentities( $str ); to encode a string.
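A rough JavaScript counterpart to the entity approach is sketched below. toNumericEntities is a hypothetical helper, not a port of htmlentities: it only replaces non-ASCII characters with numeric character references and leaves &, < and > alone.

```javascript
// Hypothetical helper: replace every non-ASCII character with a numeric
// character reference such as &#171; so the output is pure ASCII.
function toNumericEntities(text) {
  let out = "";
  for (const ch of text) { // iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    out += cp > 0x7f ? "&#" + cp + ";" : ch;
  }
  return out;
}

console.log(toNumericEntities("\u00ab quote \u00bb")); // "&#171; quote &#187;"
```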

Why is loadURIWithFlags not using the charset that I'm passing to it?

I have an HTML document stored in a file, with a UTF-8 encoding, and I want my extension to display this file in the browser, so I call loadURIWithFlags('file://' + file.path, flags, null, 'UTF-8', null); but it loads it as ISO-8859-1 instead of UTF-8. (I can tell because ISO-8859-1 is selected on the View>Character Encoding menu, and because non-breaking-space characters are showing up as an Â followed by a space. If I switch to UTF-8 using the Character Encoding menu, then everything looks right.)
I tried including LOAD_FLAGS_BYPASS_CACHE and LOAD_FLAGS_CHARSET_CHANGE in the flags but that didn't seem to have any effect. I also checked that auto-detect was turned off, so that wasn't the problem either. Adding <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> to the document seems to have solved the problem, but I would expect that using the 'charset' argument of loadURIWithFlags should work just as well, so I'm wondering if I did something wrong in my initial attempt.
You did the right thing. The only reliable solution is to include the encoding information inside the document itself: if you rely only on HTTP headers, the document will load with the wrong encoding once it is saved to disk, because files have no headers.
If you are the one saving the file, you could also add the UTF-8 BOM to it to ensure that Firefox and other applications load it properly.
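The UTF-8 BOM is the three bytes EF BB BF at the very start of the file. A minimal sketch of checking for it and prepending it, working on plain byte arrays (hasUtf8Bom and withUtf8Bom are hypothetical helpers; how the extension actually writes the file is not shown here):

```javascript
const UTF8_BOM = [0xef, 0xbb, 0xbf];

// Hypothetical helper: true when the byte array starts with the UTF-8 BOM.
function hasUtf8Bom(bytes) {
  return UTF8_BOM.every((b, i) => bytes[i] === b);
}

// Hypothetical helper: return a copy of the bytes with the BOM prepended,
// unless it is already there.
function withUtf8Bom(bytes) {
  if (hasUtf8Bom(bytes)) return bytes.slice();
  const out = new Uint8Array(bytes.length + UTF8_BOM.length);
  out.set(UTF8_BOM, 0);
  out.set(bytes, UTF8_BOM.length);
  return out;
}

// A small document containing a non-breaking space (U+00A0).
const html = new TextEncoder().encode("<p>\u00a0</p>");
console.log(hasUtf8Bom(html));              // false
console.log(hasUtf8Bom(withUtf8Bom(html))); // true
```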

iPhone browser/IIS/Tomcat, Japanese locale, http parameters getting messed

First the environment: the client is a mobile Safari on iPhone, the server consists of a Tomcat 5.5 fronted by IIS.
I have a piece of javascript code that sends a single parameter to the server and gets back some response:
var url = "/abc/ABCServlet";
var paramsString = "name=SomeName";
var xmlhttpobj = getXmlHttpObject(); // browser-specific object returned
xmlhttpobj.onreadystatechange = callbackFunction;
xmlhttpobj.open("GET", url + "?" + paramsString, true);
xmlhttpobj.send(null);
This works fine when the iPhone language/locale is EN/US; but when the locale/language is changed to Japanese the query parameter received by the server becomes "SomeName#" without the quotes. Somehow a # is getting appended at the end.
Any clues why?
Hopefully, all you need to do is add a meta tag to the top of your HTML page that specifies the correct character set (e.g. <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />) and matches whatever encoding your data files use.
If that doesn't work, ensure that you are using the same character encoding (preferably UTF-8) throughout your application. Your server-side scripts and any files that include text strings you will be adding directly to the response stream should be saved with that single encoding. It's a good idea to have your servers send a "Content-Type" HTTP header with the same encoding if possible (e.g. "text/html; charset=utf-8"). And you should ensure that the Mobile Safari page that's doing the displaying has the right Content-Type meta tag.
Japanese developers have a nasty habit of storing files in EUC or ISO-2022-JP, both of which can force some browsers to use different font faces and can seriously break your page if the browser is expecting a Roman charset. The good news is that if you're forced to use one of the Japanese encodings, that encoding will typically display most English text correctly. It's the extended characters you need to look out for.
Now I may be wrong, but I THOUGHT that loading these files via AJAX was not a problem (I think the browser remaps the character data according to the character set for every text file it loads), but as you start mixing document encodings in a single file (and especially in your document body), bad things can happen. Maybe Mobile Safari requires the same encoding for both HTML files and AJAX files. I hope not. That would be ugly.
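Independently of the page encoding, it is usually safer to percent-encode query parameters yourself so that the URL contains only ASCII. The sketch below is not a confirmed fix for the stray # in the question; buildQuery is a hypothetical helper and the Japanese value is just an example.

```javascript
// Hypothetical helper: build a query string with every name and value
// percent-encoded as UTF-8, so the URL itself contains only ASCII.
function buildQuery(params) {
  return Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
    .join("&");
}

const url = "/abc/ABCServlet";
console.log(url + "?" + buildQuery({ name: "SomeName" }));
// "/abc/ABCServlet?name=SomeName"
console.log(buildQuery({ name: "\u540d\u524d" }));
// "name=%E5%90%8D%E5%89%8D"
```

The resulting string can be passed to xmlhttpobj.open("GET", url + "?" + query, true) as in the question. The server still has to decode the percent-encoded bytes as UTF-8 for non-ASCII values to arrive intact.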
