What is the difference between the JavaScript functions decodeURIComponent and decodeURI?
To explain the difference between these two, let me first explain the difference between encodeURI and encodeURIComponent.
The main difference is that:
The encodeURI function is intended for use on the full URI.
The encodeURIComponent function is intended to be used on, well, URI components, that is,
any part that lies between separators (; / ? : @ & = + $ , #).
So encodeURIComponent also encodes these separators, because within a component they are regarded as text and not as special characters.
Now back to the difference between the decode functions, each function decodes strings generated by its corresponding encode counterpart taking care of the semantics of the special characters and their handling.
encodeURIComponent/decodeURIComponent() is almost always the pair you want to use, for concatenating together and splitting apart text strings in URI parts.
encodeURI is less common, and misleadingly named: it should really be called fixBrokenURI. It takes something that's nearly a URI, but has invalid characters such as spaces in it, and turns it into a real URI. It has a valid use in fixing up invalid URIs from user input, and it can also be used to turn an IRI (a URI with bare Unicode characters in it) into a plain URI (using %-escaped UTF-8 to encode the non-ASCII characters).
Where encodeURI should really be named fixBrokenURI(), decodeURI() could equally be called potentiallyBreakMyPreviouslyWorkingURI(). I can think of no valid use for it anywhere; avoid.
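A minimal sketch of the two nicknames above. The example URLs are made up for illustration and are not from the original answer.

```javascript
// encodeURI as "fixBrokenURI": escapes spaces and non-ASCII characters,
// but keeps the characters that carry URI syntax (: / ? =).
const nearlyUri = "http://example.com/my page?name=ståle";
const fixedUri = encodeURI(nearlyUri);
// "http://example.com/my%20page?name=st%C3%A5le"

// decodeURI as "potentiallyBreakMyPreviouslyWorkingURI": an escaped percent
// sign is decoded into a bare "%", which is no longer a valid escape sequence.
const workingUri = "http://example.com/?discount=100%25";
const brokenUri = decodeURI(workingUri);
// "http://example.com/?discount=100%"
```

The second result is fine as display text, but it can no longer be treated as a URI, which is the risk the answer warns about.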
js> s = "http://www.example.com/string with + and ? and & and spaces";
http://www.example.com/string with + and ? and & and spaces
js> encodeURI(s)
http://www.example.com/string%20with%20+%20and%20?%20and%20&%20and%20spaces
js> encodeURIComponent(s)
http%3A%2F%2Fwww.example.com%2Fstring%20with%20%2B%20and%20%3F%20and%20%26%20and%20spaces
Looks like encodeURI produces a "safe" URI by encoding spaces and some other (e.g. nonprintable) characters, whereas encodeURIComponent additionally encodes the colon and slash and plus characters, and is meant to be used in query strings. The encoding of + and ? and & is of particular importance here, as these are special chars in query strings.
As I had the same question, but didn't find the answer here, I made some tests in order to figure out what the difference actually is.
I did this, since I need the encoding for something, which is not URL/URI related.
encodeURIComponent("A") returns "A", it does not encode "A" to "%41"
decodeURIComponent("%41") returns "A".
encodeURI("A") returns "A", it does not encode "A" to "%41"
decodeURI("%41") returns "A".
-That means both can decode alphanumeric characters, even though they did not encode them. However...
encodeURIComponent("&") returns "%26".
decodeURIComponent("%26") returns "&".
encodeURI("&") returns "&".
decodeURI("%26") returns "%26".
Even though encodeURIComponent does not encode all characters, decodeURIComponent can decode any value between %00 and %7F.
Note: If you try to decode a value above %7F that is not part of a valid UTF-8 sequence, your script will fail with a URIError.
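The tests above can be condensed into a short sketch. The helper name tryDecodeComponent is hypothetical and only there to show one way of handling the error; the sample strings are my own.

```javascript
// decodeURI leaves escapes of reserved characters (& and = here) alone,
// while decodeURIComponent decodes every valid escape.
const encoded = "a%26b%3Dc%20d";
const viaDecodeURI = decodeURI(encoded);                   // "a%26b%3Dc d"
const viaDecodeURIComponent = decodeURIComponent(encoded); // "a&b=c d"

// Hypothetical helper: returns null instead of throwing on malformed input.
function tryDecodeComponent(text) {
  try {
    return decodeURIComponent(text);
  } catch (err) {
    if (err instanceof URIError) return null;
    throw err; // anything else is unexpected, so pass it on
  }
}

const valid = tryDecodeComponent("%C3%A5"); // "å" (a complete two-byte UTF-8 sequence)
const invalid = tryDecodeComponent("%E5");  // null (a lone byte above %7F)
```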
encodeURIComponent()
Converts the input into a URL-encoded string
encodeURI()
URL-encodes the input, but assumes a full URL is given, so returns a valid URL by not encoding the protocol (e.g. http://) and host name (e.g. www.stackoverflow.com).
decodeURIComponent() and decodeURI() are the opposite of the above
decodeURIComponent will decode URI special markers such as &, ?, #, etc, decodeURI will not.
encodeURIComponent
Not Escaped:
A-Z a-z 0-9 - _ . ! ~ * ' ( )
encodeURI()
Not Escaped:
A-Z a-z 0-9 ; , / ? : @ & = + $ - _ . ! ~ * ' ( ) #
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURI
Encode URI:
The encodeURI() method does not encode:
, / ? : @ & = + $ * #
Example
URI: https://my test.asp?name=ståle&car=saab
Encoded URI: https://my%20test.asp?name=st%C3%A5le&car=saab
Encode URI Component:
The encodeURIComponent() method also encodes:
, / ? : @ & = + $ #
Example
URI: https://my test.asp?name=ståle&car=saab
Encoded URI: https%3A%2F%2Fmy%20test.asp%3Fname%3Dst%C3%A5le%26car%3Dsaab
For more: W3Schools.com
I am using a URL to open an HTML page, and I am sending data in the query string with the page URL.
For example: abc.html?firstParameter=firstvalue&secondParameter=secondvalue
The problem is that if firstvalue or secondvalue contains
a special character like #, (, ), % or {, then my URL is not constructed correctly and does not validate.
I am doing all this in JavaScript.
Can anybody please help me out with this?
You have 3 options:
escape() will not encode: @*/+
encodeURI() will not encode: ~!@#$&*()=:/,;?+'
encodeURIComponent() will not encode: ~!*()'
But in your case, if you want to pass a url into a GET parameter of other page, you should use escape or encodeURIComponent, but not encodeURI.
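Applied to the question, this might look like the sketch below. The helper buildUrl and the sample values are hypothetical; only the page name and the first parameter name are taken from the question.

```javascript
// Hypothetical helper that builds "page?key=value&key=value" from an object,
// encoding every key and value separately with encodeURIComponent.
function buildUrl(page, params) {
  const query = Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
    .join("&");
  return query ? page + "?" + query : page;
}

const url = buildUrl("abc.html", {
  firstParameter: "50% off #1 (today)",
  secondParameter: "{a&b}",
});
// "abc.html?firstParameter=50%25%20off%20%231%20(today)&secondParameter=%7Ba%26b%7D"
```

The parentheses stay as they are because encodeURIComponent does not encode them, which is harmless in a query string value.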
To be safe and ensure that you've escaped all the reserved characters specified in both RFC 1738 and RFC 3986 you should use a combination of encodeURIComponent, escape and a replace for the asterisk('*') like this:
encoded = encodeURIComponent( parm ).replace(/[!'()]/g, escape).replace(/\*/g, "%2A");
[Explanation]
While RFC 1738: Uniform Resource Locators (URL) specifies that the *, !, ', ( and ) characters may be left unencoded in the URL,
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.
RFC 3986, pages 12-13, states that these special characters are reserved as sub-delimiters.
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
The escape() function has been deprecated but can be used to URL encode the exclamation mark, single quote, left parenthesis and right parenthesis. And since there is some ambiguity on whether an asterisk must be encoded in a URL, and it doesn't hurt to encode it, you can explicitly encode it using something like the replace() function call. [Note that the escape() function is being passed as the second parameter to the first replace() function call. As used here, replace calls the escape() function once for each matched special character of !, ', ( or ), and escape merely returns the 'escape sequence' for that character back to replace, which reassembles any escaped characters with the other fragments.]
Also see 'https://stackoverflow.com/questions/6533561/urlencode-the-asterisk-star-character'
Also, while some websites have even identified the asterisk (*) as being a reserved character under RFC 3986, they don't include it in their URL component encoding tool.
Unencoded URL parms:
parm1=this is a test of encoding !@#$%^&*()'
parm2=note that * is not encoded
Encoded URL parms:
parm1=this+is+a+test+of+encoding+%21%40%23%24%25%5E%26*%28%29%27
parm2=note+that+*+is+not+encoded
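The same result can be had without the deprecated escape() function. The sketch below swaps it for an explicit character-code replacement; the helper name encodeRfc3986Component is hypothetical. Note that, unlike the tool output above, it encodes spaces as %20 rather than +.

```javascript
// Hypothetical helper: percent-encodes the five characters that
// encodeURIComponent leaves alone but RFC 3986 lists as sub-delims.
function encodeRfc3986Component(text) {
  return encodeURIComponent(text).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

const encoded = encodeRfc3986Component("this is a test of encoding !@#$%^&*()'");
// "this%20is%20a%20test%20of%20encoding%20%21%40%23%24%25%5E%26%2A%28%29%27"
```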
Which of these two methods should be used for encoding URLs?
It depends on what you are actually wanting to do.
encodeURI assumes that the input is a complete URI that might have some characters which need encoding in it.
encodeURIComponent will encode everything with special meaning, so you use it for components of URIs such as
var world = "A string with symbols & characters that have special meaning?";
var uri = 'http://example.com/foo?hello=' + encodeURIComponent(world);
If you're encoding a string to put in a URL component (a querystring parameter), you should call encodeURIComponent.
If you're encoding an existing URL, call encodeURI.
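To see why the choice matters for a query string value, here is a small sketch that reads the parameter back. The helper readHello is hypothetical and deliberately naive: it only mimics how a server would split on "&" before decoding.

```javascript
const world = "A string with symbols & characters that have special meaning?";

const viaComponent = "http://example.com/foo?hello=" + encodeURIComponent(world);
const viaEncodeURI = "http://example.com/foo?hello=" + encodeURI(world);

// Take the text after "hello=", stop at the next "&", then percent-decode it.
function readHello(uri) {
  const raw = uri.split("hello=")[1].split("&")[0];
  return decodeURIComponent(raw);
}

const roundTripped = readHello(viaComponent); // identical to `world`
const truncated = readHello(viaEncodeURI);    // "A string with symbols "
```

With encodeURI the ampersand survives unescaped, so the value is cut off at that point.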
xkr.us has a great discussion, with examples. To quote their summary:
The escape() method does not encode the + character which is
interpreted as a space on the server side as well as generated by
forms with spaces in their fields. Due to this shortcoming and the
fact that this function fails to handle non-ASCII characters
correctly, you should avoid use of escape() whenever possible. The
best alternative is usually encodeURIComponent().
escape() will not encode: @*/+
Use of the encodeURI() method is a bit more specialized than escape()
in that it encodes for URIs as opposed to the querystring, which is
part of a URL. Use this method when you need to encode a string to be
used for any resource that uses URIs and needs certain characters to
remain un-encoded. Note that this method does not encode the '
character, as it is a valid character within URIs.
encodeURI() will not encode: ~!@#$&*()=:/,;?+'
Lastly, the encodeURIComponent() method should be used in most cases
when encoding a single component of a URI. This method will encode
certain chars that would normally be recognized as special chars for
URIs so that many components may be included. Note that this method
does not encode the ' character, as it is a valid character within
URIs.
encodeURIComponent() will not encode: ~!*()'
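The two shortcomings of escape() named in the quote can be checked directly. This is a sketch with a made-up input string; escape() is deprecated but still available as a global in browsers and Node.js.

```javascript
const input = "1+1=2 for ståle";

// escape() leaves "+" alone and encodes "å" as a single Latin-1 byte.
const viaEscape = escape(input);                // "1+1%3D2%20for%20st%E5le"
// encodeURIComponent() encodes "+" and uses UTF-8 for "å".
const viaComponent = encodeURIComponent(input); // "1%2B1%3D2%20for%20st%C3%A5le"

// The escape() output is not valid UTF-8, so decodeURIComponent rejects it.
let escapeOutputDecodes = true;
try {
  decodeURIComponent(viaEscape);
} catch (err) {
  escapeOutputDecodes = false;
}
```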
Here is a summary.
escape() will not encode @ * _ + - . /
Do not use it.
encodeURI() will not encode A-Z a-z 0-9 ; , / ? : @ & = + $ - _ . ! ~ * ' ( ) #
Use it when your input is a complete URL like 'https://searchexample.com/search?q=wiki'
encodeURIComponent() will not encode A-Z a-z 0-9 - _ . ! ~ * ' ( )
Use it when your input is part of a complete URL
e.g.
const queryStr = encodeURIComponent(someString)
encodeURI and encodeURIComponent are used for different purposes.
Some of the differences are:
encodeURI is used to encode a full URL whereas encodeURIComponent is used for encoding a URI component such as a query string.
There are 11 characters which are not encoded by encodeURI, but encoded by encodeURIComponent.
List:
Character    encodeURI    encodeURIComponent
#            #            %23
$            $            %24
&            &            %26
+            +            %2B
,            ,            %2C
/            /            %2F
:            :            %3A
;            ;            %3B
=            =            %3D
?            ?            %3F
@            @            %40
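The list can be reproduced rather than memorised. A minimal sketch that walks the printable ASCII range and keeps the characters encodeURI leaves alone but encodeURIComponent encodes (the variable names are my own):

```javascript
// Collect printable ASCII characters that encodeURI passes through
// unchanged but encodeURIComponent percent-encodes.
const onlyComponentEncodes = [];
for (let code = 0x20; code <= 0x7e; code++) {
  const ch = String.fromCharCode(code);
  if (encodeURI(ch) === ch && encodeURIComponent(ch) !== ch) {
    onlyComponentEncodes.push(ch);
  }
}

const summary = onlyComponentEncodes.join(""); // "#$&+,/:;=?@"
```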
Notes:
encodeURIComponent does not encode -_.!~*'(). If you want these characters encoded, you have to replace them yourself with the corresponding percent-encoded UTF-8 sequences.
If you want to learn more about encodeURI and encodeURIComponent, please check the reference link.
Reference Link
encodeURIComponent(): assumes that its argument is a portion (such as the protocol, hostname, path, or query string)
of a URI. Therefore it escapes the punctuation characters that are used to separate the portions of a URI.
encodeURI(): is used for encoding an existing URL.
Difference between encodeURI and encodeURIComponent:
encodeURIComponent(value) is mainly used to encode queryString parameter values, and it encodes every applicable character in value. encodeURI ignores protocol prefix (http://) and domain name.
In very, very rare cases, when you want to implement manual encoding to encode additional characters (though they don't need to be encoded in typical cases) like ! and *, you might use:
function fixedEncodeURIComponent(str) {
  // Percent-encode ! and *, which encodeURIComponent leaves unchanged.
  return encodeURIComponent(str).replace(/[!*]/g, function(c) {
    return '%' + c.charCodeAt(0).toString(16);
  });
}
(source)
Other answers describe the purposes. Here are the characters each function will actually convert:
control = '\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F'
        + '\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F'
        + '\x7F'
encodeURI         (control + ' "%<>[\\]^`{|}'                              )
encodeURIComponent(control + ' "%<>[\\]^`{|}' + '#$&,:;=?' + '+/@'         )
escape            (control + ' "%<>[\\]^`{|}' + '#$&,:;=?' +         "!'()~")
All characters above are converted to percent-hexadecimal codes. Space to %20, percent to %25, etc. The characters below pass through unchanged.
Here are the characters the functions will NOT convert:
pass_thru = '*-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
encodeURI         (pass_thru + '#$&,:;=?' + '+/@' + "!'()~")
encodeURIComponent(pass_thru +                      "!'()~")
escape            (pass_thru +              '+/@'          )
As a general rule, use encodeURIComponent. Don't be scared off by the long name, thinking it's more specific in its use; to me it's the more commonly used method. Also, don't be suckered into using encodeURI because you tested it and it appears to encode properly. It's probably not what you meant to use: even though your simple test using "Fred" in a first name field worked, you'll find later that it fails on more advanced text, such as input containing an ampersand or a hash sign. You can look at the other answers for the reasons why this is.
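A sketch of that failure mode, using the standard URLSearchParams parser to play the part of the receiving side. The field name "name" and the second sample value are assumptions for illustration.

```javascript
// The simple test: both functions agree, so encodeURI looks fine.
const simpleMatches = encodeURI("Fred") === encodeURIComponent("Fred"); // true

// The later, "more advanced" input with an ampersand.
const firstName = "Tom & Jerry";
const wrongQuery = "name=" + encodeURI(firstName);          // "name=Tom%20&%20Jerry"
const rightQuery = "name=" + encodeURIComponent(firstName); // "name=Tom%20%26%20Jerry"

// Parse both query strings the way a receiver would.
const wrongName = new URLSearchParams(wrongQuery).get("name"); // "Tom "
const rightName = new URLSearchParams(rightQuery).get("name"); // "Tom & Jerry"
```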
w3schools says the following about encodeURIComponent function:
This function encodes special characters. In addition,
it encodes the following characters: , / ? : @ & = + $ #.
Does that mean that it cannot encode a backslash (\)?
This function encodes special characters. In addition, it encodes the following characters: , / ? : @ & = + $ # .
This definition is vague as to what "special characters" are. It sounds like a comparison between encodeURI and encodeURIComponent. Both will correctly escape \ as %5C, so you don't have to worry about backslashes.
encodeURI will leave the listed characters as it is assumed that the entire URI is being encoded:
encodeURI('http://example.com/foo bar/baz.html');
//produces "http://example.com/foo%20bar/baz.html"
encodeURIComponent will escape everything as it is assumed that the string is to be used as part of a query-string:
'http://example.com?foo=' + encodeURIComponent('http://example.com/fizz/buzz.html');
//produces "http://example.com?foo=http%3A%2F%2Fexample.com%2Ffizz%2Fbuzz.html"
When I JSON.stringify() the following code:
var exampleObject = { "name" : "Žiga Kovač", "kraj" : "Žužemberk"};
I get different results between browsers.
IE8 and Google Chrome return:
{"name":"\u017diga Kova\u010d","kraj":"\u017du\u017eemberk"}
While Firefox and Opera return:
{"name":"Žiga Kovač","kraj":"Žužemberk"}
I am using the browser's native JSON implementation in all 4 browsers. If I undefine the native JSON implementation and replace it with the one from json.org, then all browsers return:
{"name":"Žiga Kovač","kraj":"Žužemberk"}
Why is this happening, which result is correct and is it possible to make that all browsers return:
{"name":"\u017diga Kova\u010d","kraj":"\u017du\u017eemberk"}
?
These two representations are absolutely equivalent.
The one uses Unicode escape sequences (\uxxxx) to represent a Unicode character, the other uses an actual Unicode character. json.org defines a string as:
string
- ""
- "chars"
chars
- char
- char chars
char
- any Unicode character except " or \ or control characters
- one of: \" \\ \/ \b \f \n \r \t
- \u four-hex-digits
There is no difference in the strings themselves, only in their representation. This is the same thing HTML does when you use &copy;, &#169; or &#xA9; to represent the copyright sign.
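If the escaped form is wanted from every browser, one option is to post-process the output of JSON.stringify. This is a sketch, not a built-in feature: the helper name stringifyAscii is hypothetical, and it assumes an engine that supports String.prototype.padStart.

```javascript
// Hypothetical helper: stringify normally, then replace every non-ASCII
// UTF-16 code unit with its \uXXXX escape (four lowercase hex digits).
function stringifyAscii(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const exampleObject = { name: "Žiga Kovač", kraj: "Žužemberk" };
const escaped = stringifyAscii(exampleObject);
// {"name":"\u017diga Kova\u010d","kraj":"\u017du\u017eemberk"}

// Both representations parse back to the same data.
const sameData = JSON.stringify(JSON.parse(escaped)) === JSON.stringify(exampleObject); // true
```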
The 'correct' (visibly) version is a string containing the non-ASCII characters themselves (typically transmitted as UTF-8), and the escaped string is a pure ASCII string with \uXXXX Unicode escape sequences. While the first one can be used in an HTTP body (as long as the charset in the Content-Type header is set to UTF-8), the second one can also be used in an HTTP GET request header.
If you want to use the UTF8 version in a GET request, you need to escape it first, using encodeURIComponent.
When the content is received on the server side, the native string implementation will make sure that it contains exactly the same data (from all clients), provided that the HTTP transmission is correct.
Your browser will generally handle the encoding of it, if you send it as an HTTP POST body.
Both results are correct, as long as your first example is encoded in UTF-8.
e.g. \u017d is just another notation for Ž (017d is the character's Unicode code point in hexadecimal)
They are all correct. Some return the non-ASCII characters as they are, and some return plain ASCII with escape sequences.