Accents in inline JavaScript

I know that for JavaScript alerts, accented characters should be written as Unicode hex escapes (for example, š is \u0161).
But what if the JavaScript function that contains the alert is inlined in an HTML document that has a UTF-8 encoding? Should I still write the accented characters as Unicode hex escapes?
Example:
<!DOCTYPE html>
<html lang="ro">
<head>
<meta charset="UTF-8">
</head><body>
<script type="text/javascript">
window.onload=error_alert;function error_alert(){alert('A apărut o eroare!')}
</script>
</body></html>
Is it safe to write the character "ă" just like that, or should I use \u0103 ? Remember that meta charset is already defined as UTF-8.

Yes, you could still use escape sequences in JavaScript strings, even if they’re inline in the HTML.
Is it safe to write the character "ă" just like that, or should I use \u0103 ? Remember that meta charset is already defined as UTF-8.
It depends. If you’re sure that the HTML document includes <meta charset=utf-8> and will always do so, or if the server sends the Content-Type: text/html;charset=utf-8 HTTP header and you’re sure it will always do so, then by all means, use the raw symbols without escaping them.
There’s a problem, though — these things are not always under your control. Visitors might be behind a proxy that strips or mangles HTTP headers or modifies the encoding of the HTML response.
In any case, the safest thing to do is to use escape sequences for any non-ASCII characters, as that way the entire source code consists of ASCII characters only. (You could use HTML character references for the HTML parts of the document, and CSS escape sequences for CSS.)
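A minimal sketch of how that escaping could be automated, assuming you run it yourself as part of a build step. The name escapeNonAscii is hypothetical and not part of any library. It rewrites each non-ASCII UTF-16 code unit as a \uXXXX escape, so the output contains ASCII characters only:

```javascript
// Hypothetical helper: turn every non-ASCII UTF-16 code unit into a \uXXXX escape.
// It does not escape quotes or backslashes, so it only covers the
// "non-ASCII" part of producing a safe string literal.
function escapeNonAscii(text) {
  return text.replace(/[^\x00-\x7F]/g, function (ch) {
    // charCodeAt(0) is the UTF-16 code unit; pad it to four hex digits.
    return '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0');
  });
}

console.log(escapeNonAscii('A apărut o eroare!'));
// prints: A ap\u0103rut o eroare!
```

Because the regular expression has no u flag, a character outside the Basic Multilingual Plane is emitted as two \uXXXX escapes (its surrogate pair), which is what a JavaScript string literal expects.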

Related

Convert unicode to Chinese characters

Supposing I have a string of code like so:
\u00e5\u00b1\u00b1\u00e4\u00b8\u008a\u00e7\u009a\u0084\u00e4\u00ba\u00ba
How would I convert these back into Chinese characters using Javascript:
山上的人
This is so that I can actually display Chinese on my web page. Right now it comes out as å±±ä¸ç人.
This website manages to accomplish this, but it does so with PHP code that they don't expose.
I am not familiar with how character encoding works well at all, so I don't even know the terminology to search for a proper solution.
The string appears to be in UTF-8.
https://github.com/mathiasbynens/utf8.js is a helpful Javascript library that saves you the headache of learning the UTF-8 standard, and will decode the UTF-8 into text.
Here's a demo: https://mothereff.in/utf-8
Paste in \u00e5\u00b1\u00b1\u00e4\u00b8\u008a\u00e7\u009a\u0084\u00e4\u00ba\u00ba into the "UTF-8-encoded" textarea to decode it.
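As an alternative sketch that avoids a library, the standard TextDecoder API (available in current browsers and in Node.js) can do the same decoding. This assumes every character in the input stands for exactly one UTF-8 byte, as in the string from the question. The name decodeUtf8Bytes is made up for this example:

```javascript
// Hypothetical helper: each character of `mojibake` is assumed to hold one
// UTF-8 byte (char code 0-255). Larger char codes would be truncated.
function decodeUtf8Bytes(mojibake) {
  const bytes = Uint8Array.from(mojibake, function (ch) {
    return ch.charCodeAt(0);
  });
  // Reinterpret the collected bytes as UTF-8 text.
  return new TextDecoder('utf-8').decode(bytes);
}

console.log(decodeUtf8Bytes(
  '\u00e5\u00b1\u00b1\u00e4\u00b8\u008a\u00e7\u009a\u0084\u00e4\u00ba\u00ba'
));
// prints: 山上的人
```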
Add <meta charset="UTF-8"> inside the <head> element of your HTML file so that it displays Chinese properly, then put the Chinese characters directly in the file.

How do I properly escape inline Javascript in a <script> tag?

I'm writing a server-side function for a framework that will let me inline a Javascript file. It takes the filename as input, and its output would be like this:
<script>
/* contents of Javascript file */
</script>
How do I escape the contents of the Javascript file safely?
I am particularly worried if the file contains something like </script>. If the input Javascript file has syntax errors, I still want it to escape correctly. I also realise that XHTML expects some entities to be encoded, whereas HTML doesn't.
There are a lot of questions similar to this asking about how to escape string literals or JSON. But I want something that can handle the general case, so that I can write a tool for the general case.
(I realise inlining potentially untrusted Javascript isn't the best idea, so no need to spend time discussing that.)
This is a work in progress, let me know if I've missed a corner case!
The answer is different depending on whether you're using XHTML or HTML.
1. XHTML with Content-Type: application/xhtml+xml header
In this case, you can simply XML escape any entities, turning this file:
console.log("Example Javascript file");
console.log(1</script>2/);
console.log("That previous line prints false");
To this:
<script>
console.log("Example Javascript file");
console.log(1&lt;/script&gt;2/);
console.log("That previous line prints false");
</script>
Note that if you're using XHTML with a different Content-Type header, then different browsers may behave differently, and I haven't researched it, so I would recommend fixing the Content-Type header.
2. HTML
Unfortunately, I know of no way to escape it properly in this case (without at least parsing the Javascript). Replacing all instances of / with \/ would cause some Javascript to break, including the previous example.
The best that I can recommend is that you search for </script case-insensitively and throw an exception if you find it. If you're only dealing with string literals or JSON, substitute all instances of / with \/.
Some Javascript minifiers might deal with </script in a safe manner perhaps, let me know if you find one.
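A rough sketch of the HTML-case recommendation above, written in JavaScript for illustration (the real function would live in whatever language the server-side framework uses). The names inlineScript and escapeSlashes are hypothetical, and only the "</script" case described above is handled:

```javascript
// Hypothetical helper for the HTML (non-XHTML) case: refuse to inline
// any source that contains "</script" in any letter case.
function inlineScript(source) {
  if (/<\/script/i.test(source)) {
    throw new Error('Cannot inline: source contains "</script"');
  }
  return '<script>\n' + source + '\n</script>';
}

// For string literals or JSON only: "\/" is a valid escape for "/",
// so the value is unchanged while "</script" no longer appears in the text.
function escapeSlashes(literalOrJson) {
  return literalOrJson.replace(/\//g, '\\/');
}

const json = escapeSlashes(JSON.stringify({ html: '</script>' }));
console.log(json);                  // prints: {"html":"<\/script>"}
console.log(JSON.parse(json).html); // prints: </script>
```

Throwing instead of rewriting keeps the tool from silently changing the meaning of arbitrary JavaScript, which is the failure mode described above for blanket / to \/ replacement.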

Escape HTML tags. Any issue possible with charset encoding?

I have a function to escape HTML tags, to be able to insert text into HTML.
Very similar to:
Can I escape html special chars in javascript?
I know that JavaScript uses Unicode internally, but HTML pages may be encoded in different charsets such as UTF-8 or ISO-8859-1.
My question is: is there any issue with this very simple conversion, or should I take the page charset into consideration?
If yes, how should I handle that?
PS: For example, the equivalent PHP function (http://php.net/manual/en/function.htmlspecialchars.php) has a parameter to select a charset.
No, JavaScript lives in the Unicode world so encoding issues are generally invisible to it. escapeHtml in the linked question is fine.
The only place I can think of where JavaScript gets to see bytes would be data: URLs (typically hidden beneath base64). So this:
var markup = '<p>Hello, '+escapeHtml(user_supplied_data);
var url = 'data:text/html;base64,'+btoa(markup);
iframe.src = url;
is in principle a bad thing. Although I don't know of any browsers that will guess UTF-7 in this situation, a charset=... parameter should be supplied to ensure that the browser uses the appropriate encoding for the data. (btoa uses ISO-8859-1, for what it's worth.)
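One way this could look with an explicit charset, as a sketch only. The helper name toHtmlDataUrl is made up, and TextEncoder is used here to produce UTF-8 bytes before btoa sees them, so that characters outside ISO-8859-1 do not make btoa throw:

```javascript
// Hypothetical helper: build a data: URL whose payload is UTF-8 and is
// labelled as such, so the browser does not have to guess the encoding.
function toHtmlDataUrl(markup) {
  const bytes = new TextEncoder().encode(markup); // UTF-8 bytes
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte); // one character per byte, as btoa expects
  }
  return 'data:text/html;charset=utf-8;base64,' + btoa(binary);
}

console.log(toHtmlDataUrl('<p>Hello</p>'));
// prints: data:text/html;charset=utf-8;base64,PHA+SGVsbG88L3A+
```

The result would then be assigned in place of the bare URL, for example iframe.src = toHtmlDataUrl(markup), with markup still built through escapeHtml as before.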

Efficient way to encode scandinavian letters in a string

My JavaScript gets a string value from the server:
var name = VALUE_FROM_SERVER;
This name will be shown on a web page. Since the name contains Scandinavian letters (for example, it could be TÖyoeävä), I need to encode it somehow to display it correctly in the browser.
In JavaScript, what is the most efficient way to encode all those Scandinavian letters?
(I prefer to do it with Javascript.)
(e.g. I would like to create a JS function which takes TÖyoeävä as a parameter and returns T&Ouml;yoe&auml;v&auml;)
var encoder=function(string){
for(var s=0; s<string.length; s++){
//Check each letter in the string, if it is Scandinavian, encode it??
}
}
I'd advise you not to encode in the JS. Just make sure your (html) page encoding matches what your server is returning.
Preferably, that would be a UTF-8 encoding (to be able to support other languages down the road). But if it's just Scandinavian languages you're interested in, ISO-8859-1 (Latin 1) is enough.
There's no way to tell from a random byte string if it's one encoding or another (generally speaking anyway). So you have to know in your Javascript what encoding the server is sending.
You also have to set your page's encoding at some point, and that point has to be before the browser starts interpreting its content.
So all in all, getting encoding A from the server and converting to encoding B on the client side is going to be tricky, and pretty much a waste of time (IMO). You're not gaining any flexibility that I can see, except allowing your server to change encodings, which doesn't seem like such a good idea.
UTF-8 all the way will save you headaches.
Just use the UTF-8 charset when sending the response from the server. For example
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
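If an ASCII-only form of the string is still wanted despite the advice above, a sketch along these lines would produce numeric character references. The function name toNumericEntities is hypothetical, and the string is assumed to have been decoded correctly already, which is the real prerequisite:

```javascript
// Hypothetical helper: replace each non-ASCII code point with a decimal
// numeric character reference such as &#214;. It does not escape &, < or >,
// so it is not a substitute for HTML escaping.
function toNumericEntities(text) {
  return text.replace(/[^\x00-\x7F]/gu, function (ch) {
    return '&#' + ch.codePointAt(0) + ';';
  });
}

console.log(toNumericEntities('TÖyoeävä'));
// prints: T&#214;yoe&#228;v&#228;
```

Numeric references are used here instead of named ones like &Ouml; because they need no lookup table and cover every character.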

international characters in Javascript

I am working on a web application, where I transfer data from the server to the browser in XML.
Since I'm Danish, I quickly run into problems with the characters æøå.
I know that in HTML, I use "&aelig;&oslash;&aring;" for æøå.
However, as soon as the characters pass through JavaScript, I get black boxes with "?" in them when using æøå, and "&aelig;&oslash;&aring;" is printed as is.
I've made sure to set it to utf-8, but that isn't helping much.
Ideally, I want it to work with any special characters (naturally).
The example that isn't working is included below:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<script type="text/javascript" charset="utf-8">
alert("&aelig;&oslash;&aring;");
alert("æøå");
</script>
</head>
<body>
</body>
</html>
What am I doing wrong?
OK, thanks to Grapefrukt's answer, I got it working.
I actually needed it for data coming from a MySQL server. Since saving the files in UTF-8 encoding only solves the problem for static content, I figured I'd include the solution for strings from a MySQL server, pulled out using PHP:
utf8_encode($MyStringHere)
If you ever can't set the response encoding, you can use \u escape sequence in the JavaScript string literal to display these characters.
alert("\u00e6\u00f8\u00e5")
Just specifying UTF-8 in the header is not enough. I'd bet you haven't saved your file as UTF-8. Any reasonably advanced text editor will have this option. Try that and I'm sure it'll work!
You can also use String.fromCharCode() to output a character from a numeric entity.
e.g. String.fromCharCode( 8226 ) will create a bullet character.
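A short check of how these two notations relate, as a sketch using only standard String methods:

```javascript
// 8226 is decimal for hexadecimal 2022, so these are the same character.
const bullet = String.fromCharCode(8226);
console.log(bullet === '\u2022'); // prints: true

// The escaped literal for the three Danish letters, compared by code.
const danish = '\u00e6\u00f8\u00e5';
console.log(danish === String.fromCharCode(0xe6, 0xf8, 0xe5)); // prints: true

// fromCharCode works in UTF-16 code units; for code points above U+FFFF,
// String.fromCodePoint is the standard alternative.
console.log(String.fromCodePoint(0x1f600).length); // prints: 2
```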
I get "&aelig;&oslash;&aring;" for the first one and some junk characters for the next. Could it be that JavaScript is not mangling your letters (mojibake), but that the alert dialog uses the system default font and that font cannot display them?
I use the code like this with Thai language. It's fine.
$message is my PHP variable.
echo("<html><head><meta charset='utf-8'></head><body><script type='text/javascript'>alert('" . $message . "');</script></body></html>");
Hope this can help. Thank you.
(I cannot post image of what I did as the system said "I don't have enough reputation", so I leave the image, here. http://goo.gl/9P3DtI Sorry for inconvenience.)
Sorry for my weak English.
This works as expected for me:
alert("&aelig;&oslash;&aring;");
... creates an alert containing the string "&aelig;&oslash;&aring;" whereas
alert("æøå");
... creates an alert with the non-ascii characters.
Javascript is pretty utf-8 clean and doesn't tend to put obstacles in your way.
Maybe you're putting this on a web server that serves it as ISO-8859-1? If you use Apache, in your Apache config file (or in .htaccess, if you can override), you should have a line
AddCharset utf-8 .js
(Note: edited to escape the ampersands... otherwise it didn't make sense.)
