When I JSON.stringify() the following code:
var exampleObject = { "name" : "Žiga Kovač", "kraj" : "Žužemberk"};
I get different results between browsers.
IE8 and Google Chrome return:
{"name":"\u017diga Kova\u010d","kraj":"\u017du\u017eemberk"}
While Firefox and Opera return:
{"name":"Žiga Kovač","kraj":"Žužemberk"}
I am using the browser's native JSON implementation in all 4 browsers. If I undefine the native JSON implementation and replace it with the one from json.org, then all browsers return:
{"name":"Žiga Kovač","kraj":"Žužemberk"}
Why is this happening, which result is correct and is it possible to make that all browsers return:
{"name":"\u017diga Kova\u010d","kraj":"\u017du\u017eemberk"}
?
These two representations are absolutely equivalent.
The one uses Unicode escape sequences (\uxxxx) to represent a Unicode character, the other uses an actual Unicode character. json.org defines a string as:
string
- ""
- "chars"
chars
- char
- char chars
char
- any Unicode character except " or \ or control characters
- one of: \" \\ \/ \b \f \n \r \t
- \u four-hex-digits
There is no difference in the strings themselves, only in their representation. This is the same thing HTML does when you use &copy;, &#169; or © to represent the copyright sign.
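To see the equivalence for yourself, here is a minimal sketch that parses both forms from the question and compares the results. The variable names are mine, not from the question.

```javascript
// The escaped form: backslashes are doubled so that the JSON parser,
// not the JavaScript parser, is the one that sees the \uXXXX escapes.
var escaped = '{"name":"\\u017diga Kova\\u010d","kraj":"\\u017du\\u017eemberk"}';

// The literal form, with the actual Unicode characters in the text.
var literal = '{"name":"Žiga Kovač","kraj":"Žužemberk"}';

var a = JSON.parse(escaped);
var b = JSON.parse(literal);

console.log(a.name === b.name); // true
console.log(a.kraj === b.kraj); // true
```

Both texts parse to the same strings, so a consumer that uses a real JSON parser cannot tell them apart.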
The 'correct' (visibly) version is a UTF-8 string, and the escaped string is an ASCII string with Unicode escape codes. While the first one can be used in an HTTP body (as long as the charset in the Content-Type header is set to UTF-8), the second one can also be used in an HTTP GET request header.
If you want to use the UTF8 version in a GET request, you need to escape it first, using encodeURIComponent.
When the content is received on the server side, the native string implementation will make sure that it contains exactly the same data (from all clients), provided that the HTTP transmission is correct.
Your browser will generally handle the encoding of it, if you send it as an HTTP POST body.
Both results are correct, as long as your first example is encoded in UTF-8.
E.g. \u017d is just another notation for Ž (017D is the character's Unicode code point).
They are all correct. Some are returning it encoded in UTF-8, and some in ASCII.
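If you do need the escaped form from every browser, one common workaround is to post-process the output of JSON.stringify. This is only a sketch: stringifyAscii is a hypothetical helper name, and it relies on the fact that in JSON.stringify output non-ASCII characters can only occur inside string keys and values.

```javascript
// Hypothetical helper: stringify, then replace every code unit outside
// printable ASCII with its \uXXXX escape (four lowercase hex digits).
function stringifyAscii(value) {
  return JSON.stringify(value).replace(/[\u007f-\uffff]/g, function (ch) {
    return "\\u" + ("0000" + ch.charCodeAt(0).toString(16)).slice(-4);
  });
}

var exampleObject = { "name": "Žiga Kovač", "kraj": "Žužemberk" };
var json = stringifyAscii(exampleObject);

console.log(json);
// {"name":"\u017diga Kova\u010d","kraj":"\u017du\u017eemberk"}

// The escaped text still parses back to the original strings.
console.log(JSON.parse(json).name === exampleObject.name); // true
```

Characters outside the Basic Multilingual Plane are stored as two code units in JavaScript, so they come out as a pair of \uXXXX escapes, which is valid JSON.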
Here's the "data:" URI, load this in browser.
data:text/html,<script>alert("#")</script>
I get alert() executed in Chrome, but not in Firefox.
Firefox removes "#" character and all the following characters.
How can I make Firefox alert("#")?
Update: I understand the "#" fragment part, but the question is more like "Why did Chrome ignore the "fragment" case and consider it as a normal character, when FF didn't?".
The data portion of a data URI must be encoded; # as a literal character is not allowed. From the Wikipedia page:
The data, separated from the preceding part by a comma (,). The data is a sequence of octets represented as characters. Permitted characters within a data URI are the ASCII characters for the lowercase and uppercase letters of the modern English alphabet, and the Arabic numerals. Octets represented by any other character must be percent-encoded, as in %26 for an ampersand (&).
...which cites RFC3986.
So your data URI should be:
data:text/html,%3Cscript%3Ealert(%22%23%22)%3C%2Fscript%3E
...which works in both Chrome and Firefox.
You can get the URI data using JavaScript's encodeURIComponent, e.g.:
var dataUri = "data:text/html," + encodeURIComponent('<script>alert("#")</script>');
A # has special meaning in a URL (it indicates the start of the fragment part). You have to encode it as %23 to include it as data.
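As a quick check, this sketch builds the data URI from the question with encodeURIComponent and confirms that the payload decodes back to the original markup. The variable names html and payload are mine.

```javascript
var html = '<script>alert("#")</script>';
var prefix = "data:text/html,";

// Percent-encode everything after the comma, including the # character.
var dataUri = prefix + encodeURIComponent(html);
console.log(dataUri);
// data:text/html,%3Cscript%3Ealert(%22%23%22)%3C%2Fscript%3E

// Decoding the payload gives back the original markup, # included.
var payload = dataUri.slice(prefix.length);
console.log(decodeURIComponent(payload) === html); // true
```

With the # encoded as %23, there is no fragment part left for the browser to split off.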
According to MDN, The 'encodeURI()' function:
replacing each instance of certain characters by one, two, three, or four escape sequences representing the UTF-8 encoding of the character
However, when invoking encodeURI('\u0082') (in Chrome) I'm getting %C2%82 as output.
I expected to get %82 or %00%82. What does the %C2 mean?
The '0082' in '\u0082' is the Unicode code point, not the UTF-8 byte representation.
UTF-8 maps the code point U+0082 to two bytes: C2 82.
Unicode to UTF-8 mapping table
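To make the mapping concrete, here is a small sketch of the two-byte UTF-8 rule, which covers code points U+0080 to U+07FF. The function name utf8TwoByte is hypothetical; the bit layout is the standard 110xxxxx 10xxxxxx pattern.

```javascript
// Hypothetical helper: encode a code point in the range U+0080..U+07FF
// as two UTF-8 bytes (110xxxxx 10xxxxxx).
function utf8TwoByte(codePoint) {
  if (codePoint < 0x80 || codePoint > 0x7ff) {
    throw new RangeError("not a two-byte code point");
  }
  var first = 0xc0 | (codePoint >> 6);    // top 5 bits of the code point
  var second = 0x80 | (codePoint & 0x3f); // low 6 bits of the code point
  return [first, second];
}

var bytes = utf8TwoByte(0x82);
var hex = bytes.map(function (b) { return b.toString(16).toUpperCase(); }).join(" ");
console.log(hex); // C2 82

// encodeURI percent-encodes exactly those two bytes.
console.log(encodeURI("\u0082")); // %C2%82
```

So the %C2 is simply the lead byte that UTF-8 adds in front of every code point between U+0080 and U+00BF.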
Decoding %C2 at http://www.albionresearch.com/misc/urlencode.php leads to Â
When dealing with German texts and ISO 8859-15 / ISO 8859-1 vs. UTF-8 I often ran into the à character. The characters are quite close to each other. May this also be an encoding problem?
Maybe HTML encoding issues - "Â" character showing up instead of "&nbsp;" helps.
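The stray Â can be reproduced with a short sketch that takes the UTF-8 bytes of a string and reads each byte as one ISO 8859-1 character. The function name misreadUtf8AsLatin1 is made up for this illustration, and the code assumes a runtime that provides TextEncoder (modern browsers and Node.js do).

```javascript
// Hypothetical helper: simulate a reader that treats UTF-8 bytes as ISO 8859-1,
// where every byte value maps directly to the code point of the same number.
function misreadUtf8AsLatin1(text) {
  var bytes = new TextEncoder().encode(text); // UTF-8 bytes
  var out = "";
  for (var i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// A no-break space (U+00A0) is C2 A0 in UTF-8, so it shows up as "Â" plus a space-like character.
console.log(misreadUtf8AsLatin1("\u00a0") === "\u00c2\u00a0"); // true

// A German ü (U+00FC) is C3 BC in UTF-8, so it shows up as "Ã¼".
console.log(misreadUtf8AsLatin1("\u00fc")); // Ã¼
```

This is why Â and Ã tend to appear together in text that was written as UTF-8 but displayed as ISO 8859-1 or ISO 8859-15.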
I have a pretty simple question, but a few simple Google and Stack Exchange queries were not able to answer it, so I guess I'm missing something here.
Here are my simplified parameters:
I'm using Javascript.
I have a text that needs to get URL-encoded, and the text has more than one line.
My question is: What is the character for newline before the text gets encoded? (I know that after the encoding the newline will be encoded into %0A)
I guess asking "What char is decoded when decoding %0A" will be the same.
Those codes consist of a percent sign, followed by a two character hexadecimal number representing a byte value.
So in this case, the byte value is 0A, representing the ASCII newline character. This is commonly written as \n inside strings in JavaScript (and others, like PHP).
But I think your question suggests you want to do some search and replace for this character. I would not do that, since there can be other characters too that need encoding. Instead, use the function encodeURIComponent, which can encode the entire string for you. There is encodeURI as well, but in your case, I think the first is more appropriate.
This example shows how special characters (newline, space, and others) are encoded to a URL-friendly format. Note that the diacritic é translates to the two bytes of its UTF-8 representation.
document.write(encodeURIComponent("Normal text\nEéy, check the specials: /, + and \t!"));
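To answer the literal question as well, here is a short sketch showing that %0A decodes to the line feed character (code 10, written \n), and what a multi-line text looks like after encoding. The variable names are only for illustration.

```javascript
// %0A decodes to a single line feed character, code 10.
var newline = decodeURIComponent("%0A");
console.log(newline === "\n");      // true
console.log(newline.charCodeAt(0)); // 10

// A two-line text, encoded in one go.
var text = "line one\nline two";
var encoded = encodeURIComponent(text);
console.log(encoded); // line%20one%0Aline%20two

// Windows-style line endings contain a carriage return too.
console.log(encodeURIComponent("a\r\nb")); // a%0D%0Ab
```

If the text comes from a textarea or a file, check whether it uses \n or \r\n, since the second form encodes to %0D%0A.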
Say I have the following Javascript instruction:
var a="hiàja, c . Non di–g t";
a contains binary data, i.e., any ASCII from 0-255.
Before what ASCII bytes should I add backslash so that a is read properly? (for example, before ").
Should I use a specific charset and content-type different than text/Javascript and UTF-8?
Thanks
The ASCII range is 0 to 127, but strings are not limited to ASCII in JavaScript. According to the ECMAScript standard, “All characters may appear literally in a string literal except for the closing quote character, backslash, carriage return, line separator, paragraph separator, and line feed.” If the encoding of your document is suitable (e.g., windows-1252 or utf-8) and properly declared, you can use your example string as it is.
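As an illustration of the quoted rule, this sketch writes the excluded characters as escape sequences and the other non-ASCII characters literally. It assumes the source file is saved and served as UTF-8; the sample text is mine.

```javascript
// The characters named in the quote are written as escapes;
// à and the en dash appear literally in the source.
var a = "quote \" backslash \\ line feed \n carriage return \r " +
        "line separator \u2028 paragraph separator \u2029 " +
        "and à or – written literally";

console.log(a.indexOf("\"") !== -1);     // true
console.log(a.indexOf("\\") !== -1);     // true
console.log(a.indexOf("\u2028") !== -1); // true
console.log(a.indexOf("à") !== -1);      // true
```

Each escape sequence ends up as exactly one character in the string, so the backslashes used for escaping are not part of the data.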
What is the difference between the JavaScript functions decodeURIComponent and decodeURI?
To explain the difference between these two let me explain the difference between encodeURI and encodeURIComponent.
The main difference is that:
The encodeURI function is intended for use on the full URI.
The encodeURIComponent function is intended to be used on .. well .. URI components, that is,
any part that lies between separators (; / ? : @ & = + $ , #).
So, in encodeURIComponent these separators are encoded also because they are regarded as text and not special characters.
Now back to the difference between the decode functions, each function decodes strings generated by its corresponding encode counterpart taking care of the semantics of the special characters and their handling.
encodeURIComponent/decodeURIComponent() is almost always the pair you want to use, for concatenating together and splitting apart text strings in URI parts.
encodeURI is less common, and misleadingly named: it should really be called fixBrokenURI. It takes something that's nearly a URI, but has invalid characters such as spaces in it, and turns it into a real URI. It has a valid use in fixing up invalid URIs from user input, and it can also be used to turn an IRI (URI with bare Unicode characters in) into a plain URI (using %-escaped UTF-8 to encode the non-ASCII).
Where encodeURI should really be named fixBrokenURI(), decodeURI() could equally be called potentiallyBreakMyPreviouslyWorkingURI(). I can think of no valid use for it anywhere; avoid.
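A minimal sketch of both uses described above: encodeURIComponent for gluing text into a query string, and encodeURI for cleaning up a nearly valid URI. The helper buildUrl and the example URLs are made up for illustration.

```javascript
// Hypothetical helper: join key/value pairs into a query string,
// encoding each piece separately with encodeURIComponent.
function buildUrl(base, params) {
  var parts = [];
  for (var key in params) {
    if (Object.prototype.hasOwnProperty.call(params, key)) {
      parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(params[key]));
    }
  }
  return base + "?" + parts.join("&");
}

var url = buildUrl("http://www.example.com/search", { q: "fish & chips", lang: "en+us" });
console.log(url);
// http://www.example.com/search?q=fish%20%26%20chips&lang=en%2Bus

// encodeURI leaves the URI structure alone and only fixes what cannot appear in a URI.
var fixed = encodeURI("http://www.example.com/ståle page");
console.log(fixed);
// http://www.example.com/st%C3%A5le%20page
```

Note that the & and + inside the values are escaped, so they cannot be mistaken for separators when the query string is split apart again.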
js> s = "http://www.example.com/string with + and ? and & and spaces";
http://www.example.com/string with + and ? and & and spaces
js> encodeURI(s)
http://www.example.com/string%20with%20+%20and%20?%20and%20&%20and%20spaces
js> encodeURIComponent(s)
http%3A%2F%2Fwww.example.com%2Fstring%20with%20%2B%20and%20%3F%20and%20%26%20and%20spaces
Looks like encodeURI produces a "safe" URI by encoding spaces and some other (e.g. nonprintable) characters, whereas encodeURIComponent additionally encodes the colon and slash and plus characters, and is meant to be used in query strings. The encoding of + and ? and & is of particular importance here, as these are special chars in query strings.
As I had the same question, but didn't find the answer here, I made some tests in order to figure out what the difference actually is.
I did this, since I need the encoding for something, which is not URL/URI related.
encodeURIComponent("A") returns "A", it does not encode "A" to "%41"
decodeURIComponent("%41") returns "A".
encodeURI("A") returns "A", it does not encode "A" to "%41"
decodeURI("%41") returns "A".
-That means both can decode alphanumeric characters, even though they did not encode them. However...
encodeURIComponent("&") returns "%26".
decodeURIComponent("%26") returns "&".
encodeURI("&") returns "&".
decodeURI("%26") returns "%26".
Even though encodeURIComponent does not encode all characters, decodeURIComponent can decode any value between %00 and %7F.
Note: It appears that if you try to decode a value above %7F (unless it is part of a valid UTF-8 byte sequence), then your script will fail with a URIError.
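The observations above can be reproduced with a short script. This is just a sketch of the same tests; the helper name throwsUriError is mine.

```javascript
// decodeURIComponent decodes reserved characters, decodeURI leaves them escaped.
console.log(decodeURIComponent("%26")); // &
console.log(decodeURI("%26"));          // %26

// Hypothetical helper: report whether decoding the input throws a URIError.
function throwsUriError(input) {
  try {
    decodeURIComponent(input);
    return false;
  } catch (e) {
    return e instanceof URIError;
  }
}

// A lone byte above %7F is not valid UTF-8 on its own.
console.log(throwsUriError("%82"));    // true

// The same byte as part of a valid two-byte UTF-8 sequence decodes fine.
console.log(throwsUriError("%C2%82")); // false
console.log(decodeURIComponent("%C2%82") === "\u0082"); // true
```

So the failure is not about the value being above %7F as such, but about the bytes not forming valid UTF-8.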
encodeURIComponent()
Converts the input into a URL-encoded string
encodeURI()
URL-encodes the input, but assumes a full URL is given, so returns a valid URL by not encoding the protocol (e.g. http://) and host name (e.g. www.stackoverflow.com).
decodeURIComponent() and decodeURI() are the opposite of the above
decodeURIComponent will decode URI special markers such as &, ?, #, etc, decodeURI will not.
encodeURIComponent()
Not Escaped:
A-Z a-z 0-9 - _ . ! ~ * ' ( )
encodeURI()
Not Escaped:
A-Z a-z 0-9 ; , / ? : @ & = + $ - _ . ! ~ * ' ( ) #
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURI
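If you would rather not trust a list, the following sketch derives both sets by running each function over the printable ASCII punctuation. The function name unescapedBy is hypothetical, and the results are printed in ASCII order rather than in the order used above.

```javascript
// Hypothetical helper: collect the printable ASCII punctuation characters
// that the given encoder returns unchanged.
function unescapedBy(encode) {
  var kept = "";
  for (var code = 0x21; code <= 0x7e; code++) {
    var ch = String.fromCharCode(code);
    if (/[A-Za-z0-9]/.test(ch)) continue; // letters and digits are never escaped
    if (encode(ch) === ch) kept += ch;
  }
  return kept;
}

console.log(unescapedBy(encodeURIComponent)); // !'()*-._~
console.log(unescapedBy(encodeURI));          // !#$&'()*+,-./:;=?@_~
```

The second set is the first one plus the eleven reserved characters ; , / ? : @ & = + $ #.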
Encode URI:
The encodeURI() method does not encode:
, / ? : @ & = + $ * #
Example
URI: https://my test.asp?name=ståle&car=saab
Encoded URI: https://my%20test.asp?name=st%C3%A5le&car=saab
Encode URI Component:
The encodeURIComponent() method also encodes:
, / ? : @ & = + $ #
Example
URI: https://my test.asp?name=ståle&car=saab
Encoded URI: https%3A%2F%2Fmy%20test.asp%3Fname%3Dst%C3%A5le%26car%3Dsaab
For more: W3Schools.com