External JavaScript writes in Latin-1 - javascript

I'm a bit stuck: my page includes an external JavaScript file that uses document.write. The problem is that my page is UTF-8 encoded, while the content it writes is encoded in Latin-1, which causes some display problems.
Is there any way to handle this?

I have to admit never having had to mix encodings, but in theory you should be able to specify the charset attribute (link) on the script tag — but be sure you're not conflicting with it when serving the external file. From that link:
The charset attribute gives the character encoding of the external script resource...its value must be a valid character encoding name, must be an ASCII case-insensitive match for the preferred MIME name for that encoding, and must match the encoding given in the charset parameter of the Content-Type metadata of the external file, if any.
So that will tell the browser how to interpret the script data, provided your server provides the same charset (or doesn't supply any charset) in the Content-Type header when serving up the script file.
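For example (file name hypothetical; the key point is that the attribute matches, or the server omits, the charset in the script's Content-Type header):

```html
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <!-- the page is UTF-8, but this one script is declared as Latin-1 -->
  <script src="legacy-writer.js" charset="ISO-8859-1"></script>
</head>
```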
Once the browser is reading the script with the right charset, you should be okay, because by the time JavaScript is dealing with strings, they're UTF-16 (according to Section 8.4 of the 5th edition spec).
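A quick way to see this (a minimal sketch; the characters are just examples, written as \u escapes so the snippet survives any file encoding):

```javascript
// Once decoded, JavaScript strings are sequences of UTF-16 code units,
// regardless of the encoding of the file they came from.
console.assert("\u00e9".charCodeAt(0) === 0xE9); // é is one code unit
console.assert("\uD834\uDD1E".length === 2);     // U+1D11E takes a surrogate pair
```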

Related

Is URL encoding sufficient for href attribute for protection against XSS?

According to OWASP, user input into an href attribute should "...except for alphanumeric characters, escape all characters with ASCII values less than 256 with the %HH escaping format."
I don't understand the rationale behind this. Why can't URL encoding do the job? I have spent hours trying to create an attack vector for a URL string that is dynamically generated and presented back to the user, and to me, it seems like a pretty solid protection against XSS attacks.
I've also been looking into this for a while now and most people are advising to use URL encoding alongside HTML encoding. I totally get why HTML encoding is insufficient because other vectors can still be utilised such as onclick=alert()
Can someone show me an example of an attack vector being used to manipulate an href which is being rendered with URL encoding and without HTML encoding or the encoding suggested by owasp.org in rule #5?
No, if someone injects javascript:alert(0) then it will work. No method of encoding will prevent that; you should try to block the javascript: URI scheme along with all other URI schemes that would allow for XSS there, such as data: and blob: for example.
Recommended action is not to directly reflect user input into a link.
Additionally, it is important to remember not to block these schemes with a simple exact-match filter, such as preg_replace, because line feeds would bypass it and still produce an XSS payload. For example: java%0a%0dscript:alert(0);. Here a CRLF sequence was placed in the middle of the payload to prevent PHP (or another server-side language) from recognizing the javascript: prefix you have blocked. But the browser will still treat this as javascript:alert(0);, because CR and LF are whitespace characters that HTML ignores within the value of an element's attribute, while PHP and other languages match on them literally.
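A more robust check is to parse the URL and allow-list schemes rather than pattern-match the raw string. A minimal sketch (isSafeHref is an illustrative name; it relies on the standard URL parser, which strips CR/LF from the input before extracting the scheme):

```javascript
// Validate a user-supplied link by parsing it, not by string matching.
function isSafeHref(userInput) {
  let url;
  try {
    // Resolve against a base so relative URLs parse too (base is illustrative).
    url = new URL(userInput, "https://example.com/");
  } catch (e) {
    return false; // unparseable input is rejected outright
  }
  // Allow-list schemes instead of block-listing javascript:, data:, blob:.
  return url.protocol === "https:" || url.protocol === "http:";
}

// The parser removes the embedded CR/LF, so the bypass above still
// resolves to the javascript: scheme and is rejected.
isSafeHref("java\r\nscript:alert(0)"); // false
isSafeHref("https://example.com/page"); // true
```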
The encoding is context dependent. When you have a URL inside a HTML document, then you need both URL encoding and HTML encoding, but at different times.
...except for alphanumeric characters, escape all characters with
ASCII values less than 256 with the %HH escaping format.
This is recommending URL encoding, but not for the whole URL. The context is inserting URL parameters into a URL: they need to be URL encoded, for example so that a & symbol can appear inside a value.
Do not encode complete or relative URL's with URL encoding!
This is a separate rule for the whole URL. Once the URL is built and its parameters encoded, you apply HTML encoding when inserting it into an HTML attribute.
You can't apply URL encoding to a complete URL, because it is already URL encoded, and encoding it again will result in double encoding, corrupting the URL. For example, any % symbols in the original URL would be encoded again as %25, changing the URL's meaning.
HTML encoding is needed because characters like ampersands are valid in URLs but have a different meaning in HTML due to character entities. It's possible for a URL to contain strings that look like HTML entities but aren't, so they need to be encoded when the URL is inserted into an HTML document.
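The two layers can be sketched like this (the URL and search term are made up; escapeHtml is an illustrative helper, not a standard API):

```javascript
// Layer 1: URL-encode the *parameter value* when building the URL.
const term = "fish & chips 100%";
const url = "https://example.com/search?q=" + encodeURIComponent(term);
// url is now https://example.com/search?q=fish%20%26%20chips%20100%25

// Layer 2: HTML-encode the *whole URL* when inserting it into markup.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
          .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}
const html = '<a href="' + escapeHtml(url) + '">results</a>';

// Note: layer 2 is never encodeURIComponent again; that would
// double-encode the % signs produced by layer 1.
```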

special characters in title tag

How do you enter special characters in the title attribute of html?
My JavaScript code:
var link_english = document.createElement("img");
link_english.setAttribute("src", "images/english_out.png");
link_english.setAttribute("title", "The English version of Nauma.com");
link_english.className = "classEnglish";
My question is: how do I insert a special character into the value of the title attribute?
Example:
link_english.setAttribute("title", "here I want to put a special character, e.g. the German letter ü");
If it is a normal HTML file, there is no problem: just enter them.
You only need to ensure that the HTML page is saved using a character encoding which supports those characters (UTF-8 is preferred if you want world domination) and that the HTML page is being served over HTTP with a Content-Type header which signifies the very same character encoding, so that the browser understands how to display them, e.g. text/html;charset=UTF-8.
How to save the files the right way depends on the editor used, but usually this is available by Save As function and/or in the general editor settings. How to set the right HTTP content type header depends on the programming language and/or webserver used.
Update: the above answer still applies to your update. Save/serve the file as UTF-8 and instruct the browser via the Content-Type header to interpret it as UTF-8. This applies to JS files just as it does to HTML files and other text-based files.
Yes, you can, but to enter a special character in the HTML source you have to use its HTML entity (for ü, that is &uuml; or &#252;). Here is a table of special characters and their HTML entities.
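For the JavaScript case in the question, no HTML entity is needed at all, since setAttribute takes a plain string. A minimal sketch (the string contents are just an example; a \u escape is the safest form because it works no matter how the .js file itself is encoded):

```javascript
// Both of these are the same string; the escaped form is encoding-proof.
var title = "The German version, \u00fcber Nauma.com"; // "\u00fc" is ü
// In the browser you would then write, e.g.:
//   link_english.setAttribute("title", title);
```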

iPhone browser/IIS/Tomcat, Japanese locale, http parameters getting messed

First the environment: the client is a mobile Safari on iPhone, the server consists of a Tomcat 5.5 fronted by IIS.
I have a piece of javascript code that sends a single parameter to the server and gets back some response:
var url = "/abc/ABCServlet";
var paramsString = "name=SomeName";
xmlhttpobj = getXmlHttpObject(); //Browser specific object returned
xmlhttpobj.onreadystatechange = callbackFunction;
xmlhttpobj.open("GET", url + "?" + paramsString, true);
xmlhttpobj.send(null);
This works fine when the iPhone language/locale is EN/US; but when the locale/language is changed to Japanese the query parameter received by the server becomes "SomeName#" without the quotes. Somehow a # is getting appended at the end.
Any clues why?
Hopefully, all you need to do is add a meta tag to the top of your HTML page that specifies the correct character set (e.g. <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />) and match whatever encoding your datafiles are expecting.
If that doesn't work, ensure that you are using the same character encoding (preferably UTF-8) throughout your application. Your server-side scripts and any files that include text strings you will be adding directly to the response stream should be saved with that single encoding. It's a good idea to have your servers send a "Content-Type" HTTP header of the same encoding if possible (e.g. "text/html; charset=utf-8"). And you should ensure that the mobile safari page that's doing the displaying has the right Content-Type meta tag.
Japanese developers have a nasty habit of storing files in EUC or ISO-2022-JP, both of which often force the browser to use different font faces in some browsers and can seriously break your page if the browser is expecting a Roman charset. The good news is that if you're forced to use one of the Japanese encodings, most English text will typically display correctly; it's the extended characters you need to look out for.
Now I may be wrong, but I THOUGHT that loading these files via AJAX was not a problem (I think the browser remaps the character data according to the character set for every text file it loads), but as you start mixing document encodings in a single file (and especially in your document body), bad things can happen. Maybe mobile safari requires the same encoding for both HTML files and AJAX files. I hope not. That would be ugly.
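Independent of the charset questions above, it is also worth percent-encoding the parameter value explicitly when building the query string, rather than leaving any non-ASCII handling to the browser. A sketch based on the code in the question:

```javascript
// The query string from the question, rebuilt with encodeURIComponent so
// any non-ASCII characters in the value are percent-encoded explicitly
// instead of depending on locale-sensitive browser behaviour.
var url = "/abc/ABCServlet";
var paramsString = "name=" + encodeURIComponent("SomeName");
// With a Japanese value, e.g. encodeURIComponent("\u540d\u524d"),
// the result is "%E5%90%8D%E5%89%8D" (UTF-8 percent-encoding).
```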

Convert ISO/Windows charsets to UTF-8 in Javascript

I'm developing a Firefox plugin and I fetch web pages to do some analysis for the user. The problem is that when I try to get (via XMLHttpRequest) pages that are not UTF-8 encoded, the string I see is messed up; for example, Hebrew pages in windows-1255 or Chinese pages in gb2312.
I already tried the following:
var uDecoder = Components.classes["@mozilla.org/intl/scriptableunicodeconverter"].getService(Components.interfaces.nsIScriptableUnicodeConverter);
uDecoder.charset = "windows-1255";
alert(xhr.responseText);
var decoder = Components.classes["@mozilla.org/intl/utf8converterservice;1"].getService(Components.interfaces.nsIUTF8ConverterService);
alert(decoder.convertStringToUTF8(xhr.responseText, "WINDOWS-1255", true));
I also tried escape/unescape/encodeURIComponent
Any ideas?
Once XMLHttpRequest has tried to decode a non-UTF-8 string using UTF-8, you've already lost. The byte sequences in the page that weren't valid UTF-8 sequences will have been mangled (typically converted to �, the U+FFFD replacement character). No amount of re-encoding/decoding will get them back.
Pages that specify a Content-Type: text/html;charset=something HTTP header should be OK. Pages that don't have a real HTTP header but do have a <meta> version of it won't be, because XMLHttpRequest doesn't know about parsing HTML so it won't see the meta. If you know in advance the charset you want, you can tell XMLHttpRequest and it'll use it:
xhr.open(...);
xhr.overrideMimeType('text/html;charset=gb2312');
xhr.send();
(This is a currently non-standardised Mozilla extension.)
If you don't know the charset in advance, you can request the page once, hack about with the header for a <meta> charset, parse that out and request again with the new charset.
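Roughly, that two-request approach could look like this (a sketch only: the function names are illustrative, and the regex is a deliberate simplification that only catches straightforward <meta ... charset=...> declarations):

```javascript
// Pull a charset name out of (possibly mis-decoded) HTML text.
function sniffMetaCharset(html) {
  var m = html.match(/<meta[^>]+charset=["']?([\w-]+)/i);
  return m ? m[1] : null;
}

function fetchWithSniffedCharset(url, callback) {
  var probe = new XMLHttpRequest();      // first request: default decoding
  probe.open("GET", url, true);
  probe.onload = function () {
    var charset = sniffMetaCharset(probe.responseText) || "utf-8";
    var real = new XMLHttpRequest();     // second request: forced charset
    real.open("GET", url, true);
    real.overrideMimeType("text/html;charset=" + charset);
    real.onload = function () { callback(real.responseText); };
    real.send();
  };
  probe.send();
}
```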
In theory you could get a binary response in a single request:
xhr.overrideMimeType('text/html;charset=iso-8859-1');
and then convert that from bytes-as-chars to UTF-8. However, iso-8859-1 wouldn't work for this because the browser interprets that charset as really being Windows code page 1252.
You could maybe use another codepage that maps every byte to a character, and do a load of tedious character replacements to map every character in that codepage to the character it would have been in real-ISO-8859-1, then do the conversion. Most encodings don't map every byte, but Arabic (cp1256) might be a candidate for this?
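Sticking with the windows-1252 case, the remapping idea would look something like this (a sketch: cp1252 and real ISO-8859-1 differ only in the 0x80-0x9F range, so only those characters need a lookup table; the table here is abbreviated to three of its 27 entries for illustration):

```javascript
// Map the cp1252 characters the browser produced back to the raw byte
// values they were decoded from (abbreviated table, illustration only).
var cp1252ToByte = {
  "\u20ac": 0x80,  // Euro sign
  "\u201a": 0x82,  // single low-9 quote
  "\u0192": 0x83   // f with hook ... (full table has 27 entries)
};

function charToByte(ch) {
  var code = ch.charCodeAt(0);
  // Characters below U+0100 already equal their original byte value.
  return code < 0x100 ? code : cp1252ToByte[ch];
}
```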

Special chars in Firefox and IE are being encoded by the browser differently

I've got a multilingual site that allows users to enter search text into a form field, but the text goes through JavaScript before heading off to the backend.
Special chars like "欢" are being properly handled in Firefox, but not in any version of IE.
Can someone help me understand what's going on?
Thanks!
You might find it useful to add the accept-charset attribute to your form. This specifies to the browser what character-set the server accepts. Your JS should follow this and send it in that format.
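For example (a hypothetical form; the attribute tells the browser which character set the server expects for the submitted values):

```html
<form action="/search" method="get" accept-charset="UTF-8">
  <input type="text" name="q">
</form>
```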
Some other things that can affect the way IE handles character encoding:
Specifying the correct doctype (i.e., standards vs. quirks mode).
The Content-Type header sent by the server; I believe most browsers adhere to the header over the meta-tag, so if your server is specifying ISO-8859-1 and your page specifies UTF-8 there will be some confusion.
The format of the Content-Type header; some "modern" browsers (specifically FF) accept utf8 as an alias of utf-8. IE does not, and falls back to ISO-8859-1. (This comes from painful personal experience! ;)
Character-sets are a real pain. You need to ensure that all the components are talking the same "language" front-to-back - that includes both storage and communication.
The next step to track down what is going on is to have your server code log the headers for your JS request to be sure that the encoding matches what you're expecting.
Some browsers default differently; force the encoding for your site by using the meta tag, as here:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
UTF-8 may not be what you're looking for, but it is the most likely. To test, go to View > Character Encoding in Firefox and set it manually; once you know which one works, you will know which to declare. A list of schemas here:
http://tlt.its.psu.edu/suggestions/international/web/tips/declare.html
and more here: http://tlt.its.psu.edu/suggestions/international/bylanguage/index.html
Ensure your characters and your page are both using UTF-8 encoding.
