Efficient way to encode Scandinavian letters in a string

My JavaScript gets a string value from the server:
var name = VALUE_FROM_SERVER;
This name will be shown on a web page. Since the name contains Scandinavian letters (for example, the name could be TÖyoeävä), I need to encode it somehow so that the browser displays it correctly.
In JavaScript, what is the most efficient way to encode all those Scandinavian letters?
(I prefer to do it with JavaScript.)
(E.g. I would like to create a JS function that takes TÖyoeävä as a parameter and returns T&Ouml;yoe&auml;v&auml;.)
var encoder = function (string) {
    for (var s = 0; s < string.length; s++) {
        // Check each letter in the string; if it is Scandinavian, encode it??
    }
};

I'd advise you not to encode in JS. Just make sure your (HTML) page encoding matches what your server is returning.
Preferably, that would be a UTF-8 encoding (to be able to support other languages down the road). But if it's just Scandinavian languages you're interested in, ISO-8859-1 (Latin 1) is enough.
There's no way to tell from a random byte string if it's one encoding or another (generally speaking anyway). So you have to know in your JavaScript what encoding the server is sending.
You also have to set your page's encoding at some point, and that point has to be before the browser starts interpreting its content.
So all in all, getting encoding A from the server and converting it to encoding B on the client side is going to be tricky, and pretty much a waste of time (IMO). You're not gaining any flexibility that I can see, except allowing your server to change encodings, which doesn't seem like such a good idea.
UTF-8 all the way will save you headaches.
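If the page encoding truly cannot be matched to what the server sends, one fallback is to replace every non-ASCII character with a numeric character reference before the text is written into markup. The following is only a minimal sketch under that assumption; encodeNonAscii is a hypothetical helper name, not something from the question, and on a UTF-8 page it should not be needed at all.

```javascript
// Hypothetical helper: replaces each non-ASCII character with a
// decimal numeric character reference such as &#228;.
function encodeNonAscii(text) {
  // Array.from iterates by code point, so characters outside the
  // Basic Multilingual Plane are handled as one unit.
  return Array.from(text, function (ch) {
    var codePoint = ch.codePointAt(0);
    return codePoint > 127 ? '&#' + codePoint + ';' : ch;
  }).join('');
}

var encoded = encodeNonAscii('T\u00D6yoe\u00E4v\u00E4');
console.log(encoded); // T&#214;yoe&#228;v&#228;
```

The result is meant to be inserted as HTML (for example through innerHTML). Assigning it to textContent or to an input's value would show the references literally.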

Just use the UTF-8 charset when sending the response from the server. For example:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Related

javascript string internal representation

As far as I know, Java uses UTF-16 to represent chars and strings internally,
so if we load text from a file, it is automatically decoded from its original encoding to UTF-16.
The same can be said for JavaScript:
it also uses UTF-16 as its internal string representation.
Suppose we load a string x encoded in UTF-8 using AJAX;
a conversion takes place so that JavaScript can represent that string internally in UTF-16.
Please tell me whether what I stated is correct,
because the real question is yet to come...
Now suppose the browser is rendering a page using UTF-8 encoding,
and using JavaScript we want the browser to also render the AJAX string x (as you normally do).
Would a further conversion from UTF-16 to UTF-8 be needed in this case?
Thanks in advance.

According to this article, it is UCS-2 or UTF-16
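As a rough illustration of the UTF-16 point, the sketch below measures the same string in three ways. describeString is a hypothetical helper name, and TextEncoder is assumed to be available (modern browsers and Node.js).

```javascript
// Hypothetical helper: reports how long a string is in UTF-16 code
// units, in Unicode code points, and in UTF-8 bytes.
function describeString(s) {
  return {
    utf16CodeUnits: s.length,              // what .length counts
    codePoints: Array.from(s).length,      // iteration is per code point
    utf8Bytes: new TextEncoder().encode(s).length
  };
}

console.log(describeString('\u00E4'));    // a with diaeresis
console.log(describeString('\u{1F600}')); // a character outside the BMP
```

Strings handed to the DOM (textContent, innerHTML and so on) are passed as they are. The page encoding matters when the browser decodes the bytes of the document, so no manual UTF-16 to UTF-8 step should be needed for rendering.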

Escape HTML tags. Any issue possible with charset encoding?

I have a function to escape HTML tags, to be able to insert text into HTML.
Very similar to:
Can I escape html special chars in javascript?
I know that JavaScript uses Unicode internally, but HTML pages may be encoded in different charsets such as UTF-8 or ISO-8859-1.
My question is: is there any issue with this very simple conversion, or should I take the page charset into consideration?
If so, how should I handle that?
PS: For example, the equivalent PHP function (http://php.net/manual/en/function.htmlspecialchars.php) has a parameter to select a charset.

No, JavaScript lives in the Unicode world so encoding issues are generally invisible to it. escapeHtml in the linked question is fine.
The only place I can think of where JavaScript gets to see bytes would be data: URLs (typically hidden beneath base64). So this:
var markup = '<p>Hello, '+escapeHtml(user_supplied_data);
var url = 'data:text/html;base64,'+btoa(markup);
iframe.src = url;
is in principle a bad thing. Although I don't know of any browsers that will guess UTF-7 in this situation, a charset=... parameter should be supplied to ensure that the browser uses the appropriate encoding for the data. (btoa uses ISO-8859-1, for what it's worth.)
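A sketch of the safer variant might look like the following. The escapeHtml shown here is a typical implementation written for this example and may differ from the one in the linked question; toHtmlDataUrl is a hypothetical helper name. It assumes TextEncoder and btoa are available (modern browsers, recent Node.js).

```javascript
// A typical HTML-escaping function (an assumption, not necessarily
// identical to the one in the linked question).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper: builds a data: URL whose bytes are UTF-8 and
// which says so explicitly through the charset parameter.
function toHtmlDataUrl(markup) {
  var bytes = new TextEncoder().encode(markup); // UTF-8 bytes
  var binary = '';
  for (var i = 0; i < bytes.length; i++) {
    // btoa expects one character per byte (code units 0..255).
    binary += String.fromCharCode(bytes[i]);
  }
  return 'data:text/html;charset=utf-8;base64,' + btoa(binary);
}

var url = toHtmlDataUrl('<p>Hello, ' + escapeHtml('<J\u00E4rvi>'));
```

Encoding to UTF-8 first also avoids the exception that btoa throws when it is given characters above U+00FF.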

Decoding cp1251 to UTF-8 in javascript

How can I decode cp-1251 to UTF-8 in JavaScript?
The cp-1251 data comes from a data feed and has to be decoded on the client side in JS.
There is no way to change the server-side output, since it belongs to a third party, and for various reasons I would rather not use server-side code to convert the data feed into another data feed.

(Assuming that by "UTF-8" you meant the JS strings in their native encoding...)
Depending on the format your 'cp-1251' data is in and on the browsers you need to support, you can choose from:
- The TextDecoder.decode() API (decodes a sequence of octets from a typed array, such as Uint8Array). If you're using web sockets, you can get an ArrayBuffer out of them to decode.
- https://github.com/mathiasbynens/windows-1251, which operates on what it calls 'byte strings' (JS strings consisting of characters like \u00XY, where 0xXY is the encoded byte).
- Building the decoding table yourself (example).
Note that in most cases (not something as low-level as websockets though) it might be easier to read the data in the correct encoding before it ends up as a JS string (for example, you can force XMLHttpRequest to use a certain encoding even if the server misreports the encoding).
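The TextDecoder option can be sketched as follows. decodeCp1251 is a hypothetical helper name, and the sketch assumes an environment whose TextDecoder supports legacy encodings (current browsers, or Node.js built with full ICU).

```javascript
// Hypothetical helper: turns windows-1251 (cp1251) bytes into a JS string.
function decodeCp1251(bytes) {
  // 'windows-1251' is the label from the Encoding Standard;
  // 'cp1251' is accepted as an alias.
  return new TextDecoder('windows-1251').decode(bytes);
}

// The bytes of a short Russian word as they would arrive in the feed.
var sample = new Uint8Array([0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2]);
console.log(decodeCp1251(sample)); // Привет
```

With fetch, the bytes could be obtained from response.arrayBuffer() and wrapped in a Uint8Array before decoding, so the response never passes through a wrong default decoding.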

UTF-8 in HTML input added by JavaScript

I just don't get it.
My case is that my application sends all the needed GUI text as JSON at page startup from my PHP server. On my PHP server, all special characters in the text are written in UTF-8. Example: F&uuml;r
So on the client side I have exactly the same value, and it gets displayed nicely everywhere except in input fields. When I do this with JavaScript:
document.getElementById('myInputField').value = "F&uuml;r";
Then it is written exactly like that without any transformation into the special character.
Did I understand something wrong in UTF-8 concepts?
Thanks for any hints.

The notation &uuml; has nothing particular to do with UTF-8. The use of character references is a common way of avoiding the need to use UTF-8; they can be used with any encoding, but if you use UTF-8, you don’t need them.
The notation &uuml; is an HTML notation, not JavaScript. Whether it gets interpreted by HTML rules when it appears inside your JavaScript code depends on the context (like JavaScript inside an HTML document vs. separate JavaScript file). This problem is best avoided by using either characters as such or by using JavaScript notations for characters.
For example, &uuml; means the same as &#xfc;, i.e. U+00FC, ü (u with diaeresis). The JavaScript notation, for use inside string literals, for this is \u00fc (\u followed by exactly four hexadecimal digits). E.g., the following sets the value to “Für”:
document.getElementById('myInputField').value = "F\u00fcr";

You're using what are called HTML entities to encode characters, which is not the same as UTF-8, although a UTF-8 string can of course include HTML entities.
I think the problem is that assigning to the input's value from JavaScript sets the text as-is, without parsing it as HTML, so the entities are never decoded. I think you have two options:
Decode the HTML entity on the client side. A rather ugly solution is to piggyback on the decoder available in the browser (I'm using jQuery in the example, but you probably get the point):
inputElement.value = $("<p/>").html("F&uuml;r").text();
Another option, which I think is nicer, is to not send HTML entities in the server response at all and instead use proper UTF-8 encoding for all characters, which should work fine when put into text nodes or tag attributes. This assumes the HTML page uses UTF-8 encoding, of course.
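If the entities have to be decoded on the client and relying on the DOM is not wanted, a small pure function can cover the common cases. This is only a sketch: decodeEntities and the NAMED table are hypothetical, and the table lists just a handful of named entities, whereas the browser's own parser knows the full set.

```javascript
// Hypothetical, deliberately incomplete table of named entities.
var NAMED = {
  amp: '&', lt: '<', gt: '>', quot: '"', apos: "'",
  auml: '\u00E4', ouml: '\u00F6', uuml: '\u00FC', aring: '\u00E5',
  Auml: '\u00C4', Ouml: '\u00D6', Uuml: '\u00DC', Aring: '\u00C5'
};

// Decodes decimal (&#252;), hexadecimal (&#xfc;) and the named
// references listed above; anything else is left untouched.
function decodeEntities(text) {
  var pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g;
  return text.replace(pattern, function (match, body) {
    if (body.charAt(0) === '#') {
      var isHex = body.charAt(1) === 'x' || body.charAt(1) === 'X';
      var codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Guard against values String.fromCodePoint would reject.
      return codePoint > 0 && codePoint <= 0x10FFFF
        ? String.fromCodePoint(codePoint)
        : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}

var decoded = decodeEntities('F&uuml;r');
```

The decoded string can then be assigned to an input's value directly.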

What function can I use in JavaScript to convert from one character encoding to another?

What built-in or user-defined function can I use in JavaScript or jQuery to convert from one character encoding to another?
For Example,
FROM "utf-8" TO "windows-1256"
OR
FROM "windows-1256" TO "utf-8"
A practical use case: you have a PHP page with a specific character encoding such as "windows-1256" that cannot be changed for business reasons, and you use AJAX to fetch a block of data from the database as JSON, which uses "utf-8" encoding only. You then need to convert the JSON output to the page's encoding so that the characters and strings are displayed correctly.
Thanks in advance.

From the standpoint of a JavaScript runtime environment, there's really no such thing as character encodings – the messiness of encodings is abstracted away from you. By spec, all JS source text is interpreted as Unicode characters, and all Strings are Unicode.
As such, there's no way in JavaScript to represent characters in anything other than Unicode. Look at the methods available on a String instance – you'll see there's nothing related to character encoding.
Because JavaScript runs in Unicode, and all JavaScript strings are stored in Unicode, all AJAX calls will be transmitted over the wire in Unicode. From the jQuery AJAX docs:
"Data will always be transmitted to the server using UTF-8 charset; you must decode this appropriately on the server side."
Your PHP script is going to have to cope with Unicode input from AJAX calls.
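Strings themselves are always Unicode, but raw bytes held in a typed array can be in any encoding, and that is the one place where a conversion can be expressed in JavaScript. The sketch below re-encodes legacy bytes as UTF-8; transcodeToUtf8 is a hypothetical helper name, and it assumes a TextDecoder that supports the source encoding (current browsers, or Node.js built with full ICU).

```javascript
// Hypothetical helper: takes bytes in a legacy encoding and returns
// the same text as UTF-8 bytes.
function transcodeToUtf8(bytes, fromLabel) {
  var text = new TextDecoder(fromLabel).decode(bytes); // bytes -> JS string
  return new TextEncoder().encode(text);               // JS string -> UTF-8
}

// Two Arabic letters (alef, beh) encoded as windows-1256.
var legacy = new Uint8Array([0xC7, 0xC8]);
var utf8 = transcodeToUtf8(legacy, 'windows-1256');
console.log(Array.from(utf8)); // [216, 167, 216, 168]
```

The reverse direction has no built-in counterpart, because TextEncoder only produces UTF-8; going from a string to windows-1256 bytes would need a lookup table or a library.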
