encodeURI doesn't escape `equals` - why? - javascript

I have an URI like that:
http://client.dev/dap/module/hdfs-web/api/v1.0/clusters/Cluster%201%20-%20CDH4?operation=copy&to=/user/hdfs/year=2016/partial.txt&overwrite=true
I use encodeURI function to escape string. I'm wondering why spaces are encoded with %20 while equals characters are not?

encodeURI encodes a full URI, and URIs can contain = characters. For instance, if a user types in a URI, a first step to resolve it would be to call encodeURI on it.
If on the other hand you are the one constructing the URI, and the input just determines one field (for instance a search query, when given E=mc² you want to resolve https://www.google.com/search?q=E%3Dmc%C2%B2), then you are not encoding a full URI, but a URI component. Use encodeURIComponent for that:
> encodeURIComponent('= ')
'%3D%20'

The encodeURI() function is used to encode a URI.
This function encodes special characters, except:, / ? : # & = + $ # (Use encodeURIComponent() to encode these characters).
Tip: Use the decodeURI() function to decode an encoded URI.
SOURCE: W3Schools

Related

Difference between : and :?

Can somebody tell me the difference between two in html. i was found this : in url but when i'm connecting with request in node js, the respond is ERR_UNESCAPED_CHARACTERS.
Which is confusing because symbols are like : (colon).
Is there a way to solve this?
You can use encodeURI to encode the special character. It assumes that the input is a complete URI that might have some characters which need encoding in it.
encodeURI(url)
This function encodes special characters, except: , / ? : # & = + $ #
(Use encodeURIComponent() to encode these characters).
Tip: Use the decodeURI() function to decode an encoded URI.

Replacement for javascript escape?

I know that the escape function has been deprecated and that you should use encodeURI or encodeURIComponent instead. However, the encodeUri and encodeUriComponent doesn't do the same thing as escape.
I want to create a mailto link in javascript with Swedish åäö. Here are a comparison between escape, encodeURIComponent and encodeURI:
var subject="åäö";
var body="bodyåäö";
console.log("mailto:?subject="+escape(subject)+"&body=" + escape(body));
console.log("mailto:?subject="+encodeURIComponent(subject)+"&body=" + encodeURIComponent(body));
console.log("mailto:?subject="+encodeURI(subject)+"&body=" + encodeURI(body));
Output:
mailto:?subject=My%20subject%20with%20%E5%E4%F6&body=My%20body%20with%20more%20characters%20and%20swedish%20%E5%E4%F6
mailto:?subject=My%20subject%20with%20%C3%A5%C3%A4%C3%B6&body=My%20body%20with%20more%20characters%20and%20swedish%20%C3%A5%C3%A4%C3%B6
mailto:?subject=My%20subject%20with%20%C3%A5%C3%A4%C3%B6&body=My%20body%20with%20more%20characters%20and%20swedish%20%C3%A5%C3%A4%C3%B6
Only the mailto link created with "escape" opens a properly formatted mail in Outlook using IE or Chrome. When using encodeURI or encodeURIComponent the subject says:
My subject with åäö
and the body is also looking messed up.
Is there some other function besides escape that I can use to get the working mailto link?
escape() is defined in section B.2.1.2 escape and the introduction text of Annex B says:
... All of the language features and behaviours specified in this annex have one or more undesirable characteristics and in the absence of legacy usage would be removed from this specification. ...
For characters, whose code unit value is 0xFF or less, escape() produces a two-digit escape sequence: %xx. This basically means, that escape() converts a string containing only characters from U+0000 to U+00FF to an percent-encoded string using the latin-1 encoding.
For characters with a greater code unit, the four-digit format %uxxxx is used. This is not allowed within the hfields section (where subject and body are stored) of an mailto:-URI (as defined in RFC6068):
mailtoURI = "mailto:" [ to ] [ hfields ]
to = addr-spec *("," addr-spec )
hfields = "?" hfield *( "&" hfield )
hfield = hfname "=" hfvalue
hfname = *qchar
hfvalue = *qchar
...
qchar = unreserved / pct-encoded / some-delims
some-delims = "!" / "$" / "'" / "(" / ")" / "*"
/ "+" / "," / ";" / ":" / "#"
unreserved and pct-encoded are defined in STD66:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
A percent sign is only allowed if it is directly followed by two hexdigits, percent followed by u is not allowed.
Using a self-implemented version, that behaves exactly like escape doesn't solve anything - instead just continue to use escape, it won't be removed anytime soon.
To summerise: Your previous usage of escape() generated latin1-percent-encoded mailto-URIs if all characters are in the range U+0000 to U+00FF, otherwise an invalid URI was generated (which might still be correctly interpreted by some applications, if they had javascript-encode/decode compatibility in mind).
It is more correct (no risk of creating invalid URIs) and future-proof, to generate UTF8-percent-encoded mailto-URIs using encodeURIComponent() (don't use encodeURI(), it does not escape ?, /, ...). RFC6068 requires usage of UTF-8 in many places (but allows other encodings for "MIME encoded words and for bodies in composed email messages").
Example:
text_latin1="Swedish åäö"
text_other="Emoji 😎"
document.getElementById('escape-latin-1-link').href="mailto:?subject="+escape(text_latin1);
document.getElementById('escape-other-chars-link').href="mailto:?subject="+escape(text_other);
document.getElementById('utf8-link').href="mailto:?subject="+encodeURIComponent(text_latin1);
document.getElementById('utf8-other-chars-link').href="mailto:?subject="+encodeURIComponent(text_other);
function mime_word(text){
q_encoded = encodeURIComponent(text) //to utf8 percent encoded
.replace(/[_!'()*]/g, function(c){return '%'+c.charCodeAt(0).toString(16).toUpperCase();})// encode some more chars as utf8
.replace(/%20/g,'_') // mime Q-encoding is using underscore as space
.replace(/%/g,'='); //mime Q-encoding uses equal instead of percent
return encodeURIComponent('=?utf-8?Q?'+q_encoded+'?=');//add mime word stuff and escape for uri
}
//don't use mime_word for body!!!
document.getElementById('mime-word-link').href="mailto:?subject="+mime_word(text_latin1);
document.getElementById('mime-word-other-chars-link').href="mailto:?subject="+mime_word(text_other);
<a id="escape-latin-1-link">escape()-latin1</a><br/>
<a id="escape-other-chars-link">escape()-emoji</a><br/>
<a id="utf8-link">utf8</a><br/>
<a id="utf8-other-chars-link">utf8-emoji</a><br/>
<a id="mime-word-link">mime-word</a><br/>
<a id="mime-word-other-chars-link">mime-word-emoji</a><br/>
For me, the UTF-8 links and the Mime-Word links work in Thunderbird. Only the plain UTF-8 links work in Windows 10 builtin Mailapp and my up-to-date version of Outlook.
To quote the MDN Documentation directly...
This function was used mostly for URL queries (the part of a URL following ?)—not for escaping ordinary String literals, which use the format "\xHH". (HH are two hexadecimal digits, and the form \xHH\xHH is used for higher-plane Unicode characters.)
The problem you are experiencing is because escape() does not support the UTF-8 while encodeURI() and encodeURIComponent() do.
But to be absolutely clear: never use encodeURI() or encodeURIComponent(). Let's just try it out:
console.log(encodeURIComponent('##*'));
Input: ##*. Output: %40%23*. Ordinarily, once user input is cleansed, I feel like I can trust that cleansed input. But if I ran rm * on my Linux system to delete a file specified by a user, that would literally delete all files on my system, even if I did the encoding 100% completely server-side. This is a massive bug in encodeURI() and encodeURIComponent(), which MDN Web docs clearly point with a solution.
Use fixedEncodeURI(), when trying to encode a complete URL (i.e., all of example.com?arg=val), as defined and further explained at the MDN encodeURI() Documentation...
function fixedEncodeURI(str) {
return encodeURI(str).replace(/%5B/g, '[').replace(/%5D/g, ']');
}
Or, you may need to use use fixedEncodeURIComponent(), when trying to encode part of a URL (i.e., the arg or the val in example.com?arg=val), as defined and further explained at the MDN encodeURIComponent() Documentation...
function fixedEncodeURIComponent(str) {
return encodeURIComponent(str).replace(/[!'()*]/g, function(c) {
return '%' + c.charCodeAt(0).toString(16);
});
}
If you are having trouble distinguishing what fixedEncodeURI(), fixedEncodeURIComponent(), and escape() do, I always like to simplify it with:
fixedEncodeURI() : will not encode +#?=:#;,$& to their http-encoded equivalents (as & and + are common URL operators)
fixedEncodeURIComponent() will encode +#?=:#;,$& to their http-encoded equivalents.
The escape() function was deprecated in JavaScript version 1.5. Use encodeURI() or encodeURIComponent() instead.
example
string: "May/June 2016, Volume 72, Issue 3"
escape: "May/June%202016%2C%20Volume%2072%2C%20Issue%203"
encodeURI: "May/June%202016,%20Volume%2072,%20Issue%203"
encodeURIComponent:"May%2FJune%202016%2C%20Volume%2072%2C%20Issue%203"
source https://www.w3schools.com/jsref/jsref_escape.asp

Encoding URL in JavaScript

I'm unable to encoding data URI:
var uri ="...";
var res = encodeURI(uri);
document.location.href = 'display.jsp?img='+res;
After encoding, I'm getting the same uri. display.jsp is landing as am empty page.
There is no encoding happening because what you have there is already a valid, completely encoded URI.
If you want to use that as a parameter in an other URI, you should use encodeURIComponent:
document.location.href = 'display.jsp?img='+encodeURIComponent(uri);
It is not correct to use encodeURI() as this function encodes special character except: , / ? : # & = + $ #
Use encodeURIComponent() to encode these characters.
For more info check below link:
http://www.w3schools.com/jsref/jsref_encodeuri.asp
Your problem is that the encodeURI function is for making a URL valid for a browser, not for formatting content into a URL (which is what you're doing). A base64 string is already formatted in such a way that it registers as valid. To encode it as part of the URL, you need to use encodeURLComponent.
Basically, just use:
var uri ="...";
var res = encodeURIComponent(uri);
document.location.href = 'display.jsp?img='+res;
For more info, check out: When are you supposed to use escape instead of encodeURI / encodeURIComponent?

What is the best way to serialize a JavaScript object into something that can be used as a fragment identifier (url#hash)?

My page state can be described by a JavaScript object that can be serialized into JSON. But I don't think a JSON string is suitable for use in a fragment ID due to, for example, the spaces and double-quotes.
Would encoding the JSON string into a base64 string be sensible, or is there a better way? My goal is to allow the user to bookmark the page and then upon returning to that bookmark, have a piece of JavaScript read window.location.hash and change state accordingly.
I think you are on a good way. Let's write down the requirements:
The encoded string must be usable as hash, i.e. only letters and numbers.
The original value must be possible to restore, i.e. hashing (md5, sha1) is not an option.
It shouldn't be too long, to remain usable.
There should be an implementation in JavaScript, so it can be generated in the browser.
Base64 would be a great solution for that. Only problem: base64 also contains characters like - and +, so you win nothing compared to simply attaching a JSON string (which also would have to be URL encoded).
BUT: Luckily, theres a variant of base64 called base64url which is exactly what you need. It is specifically designed for the type of problem you're describing.
However, I was not able to find a JS implementation; maybe you have to write one youself – or do a bit more research than my half-assed 15 seconds scanning the first 5 Google results.
EDIT: On a second thought, I think you don't need to write an own implementation. Use a normal implementation, and simply replace the “forbidden” characters with something you find appropriate for your URLs.
Base64 is an excellent way to store binary data in text. It uses just 33% more characters/bytes than the original data and mostly uses 0-9, a-z, and A-Z. It also has three other characters that would need encoded to be stored in the URL, which are /, =, and +. If you simply used URL encoding, it would take up 300% (3x) the size.
If you're only storing the characters in the fragment of the URL, base64-encoded text it doesn't need to be re-encoded and will not change. But if you want to send the data as part of the actual URL to visit, then it matters.
As referenced by lxg, there there is a base64url variant for that. This is a modified version of base64 to replace unsafe characters to store in the URL. Here is how to encode it:
function tobase64url(s) {
return btoa(x).replace(/\+/g,'-').replace(/\//g,'_').replace(/=/g,'');
}
console.log(tobase64url('\x00\xff\xff\xf1\xf1\xf1\xff\xff\xfe'));
// Returns "AP__8fHx___-" instead of "AP//8fHx///+"
And to decode a base64 string from the URL:
function frombase64url(s) {
return atob(x.replace(/-/g,'+').replace(/_/g, '/'));
}
Use encodeURIComponent and decodeURIComponent to serialize data for the fragment (aka hash) part of the URL.
This is safe because the character set output by encodeURIComponent is a subset of the character set allowed in the fragment. Specifically, encodeURIComponent escapes all characters except:
A - Z
a - z
0 - 9
- . _ ~ ! ' ( ) *
So the output includes the above characters, plus escaped characters, which are % followed by hexadecimal digits.
The set of allowed characters in the fragment is:
A - Z
a - z
0 - 9
? / : # - . _ ~ ! $ & ' ( ) * + , ; =
percent-encoded characters (a % followed by hexadecimal digits)
This set of allowed characters includes all the characters output by encodeURIComponent, plus a few other characters.

How to prepare encoded POST data on javascript?

I need to send data by POST method.
For example, I have the string "bla&bla&bla". I tried using encodeURI and got "bla&bla&bla" as the result. I need to replace "&" with something correct for this example.
What kind of method should I call to prepare correct POST data?
UPDATED:
I need to convert only charachters which may broke POST request. Only them.
>>> encodeURI("bla&bla&bla")
"bla&bla&bla"
>>> encodeURIComponent("bla&bla&bla")
"bla%26bla%26bla"
You can also use escape() function.The escape() function encodes a string.
This function makes a string portable, so it can be transmitted across any network to any computer that supports ASCII characters.This function encodes special characters, with the exception of: * # - _ + . /
var queryStr = "bla&bla&bla";
alert(queryStr); //bla&bla&bla
alert(escape(queryStr)); //bla%26bla%26bla
Use unescape() to decode a string.
var newQueryStr=escape(queryStr);
alert(unescape(newQueryStr)); //bla&bla&bla
Note:
escape() will not encode: #*/+
encodeURI() will not encode: ~!##$&*()=:/,;?+'
encodeURIComponent() will not encode: ~!*()'
After some search on internet, I got the following:
escape()
Don't use it.
encodeURI()
Use encodeURI when you want a working URL. Make this call:
encodeURI("http://www.google.com/a file with spaces.html")
to get:
http://www.google.com/a%20file%20with%20spaces.html
Don't call encodeURIComponent since it would destroy the URL and return
http%3A%2F%2Fwww.google.com%2Fa%20file%20with%20spaces.html
encodeURIComponent()
Use encodeURIComponent when you want to encode a URL parameter.
param1 = encodeURIComponent("http://xyz.com/?a=12&b=55")
Then you may create the URL you need:
url = "http://domain.com/?param1=" + param1 + "&param2=99";
And you will get this complete URL:
http://www.domain.com/?param1=http%3A%2F%2Fxyz.com%2F%Ffa%3D12%26b%3D55¶m2=99
Note that encodeURIComponent does not escape the ' character. A common bug is to use it to create html attributes such as href='MyUrl', which could suffer an injection bug. If you are constructing html from strings, either use " instead of ' for attribute quotes, or add an extra layer of encoding (' can be encoded as %27).
REF:When are you supposed to use escape instead of encodeURI / encodeURIComponent?
Also, as you are using JQuery, take a look at this built-in function.
Use encodeURIComponent() as encodeURI() will not encode: ~!##$&*()=:/,;?+'
This has been explained quite well at the following link:
http://xkr.us/articles/javascript/encode-compare/
More recent DOM APIs for URLSearchParams (and via URL, possibly others too) handle encoding in some cases. For example, create or use an existing URL object (like from an anchor tag) I map entries of an object as key value pairs for URL encoded params (to use for GET/POST/etc in application/x-www-form-urlencoded mimetype). Note how the emoji, ampersand and double quotes are encoded without any special handling (copied from the Chrome devtools console):
var url = new URL(location.pathname, location.origin);
Object.entries({a:1,b:"🍻",c:'"stuff&things"'}).forEach(url.searchParams.set, url.searchParams);
url.search;
"?a=1&b=%F0%9F%8D%BB&c=%22stuff%26things%22"
fetch(url.pathname, {
method: 'POST',
headers: new Headers({
"Content-type": "application/x-www-form-urlencoded"
}),
// same format as GET url search params a&b&c
body: url.searchParams
}).then((res)=>{ console.log(res); return res }).catch((res)=>{ console.warn(res); return res; });
I want POST the javascript-created hidden form.
So the question is if encodeURIComponent() should be used on each POST variable.
I haven't found the answer for Dmitry's (and my) question in this thread.
But I have found the answer in this thread.
In case of form/POST where you have upload field(s) you must use <form enctype="multipart/form-data">, if no upload field is used, you should choose yourself as described here.
Submitting the form should do the job completly, so there is no need to use encodeURIComponent() explicitly.
If you create a Http POST without using a form or without some library which creates a Http POST from your data, then you need choose an enctype= and join data yourselves.
This will be easy for application/x-www-form-urlencoded, where you will use encodeURIComponent() on each value and join them exactly as for GET request.
If you decide use multipart/form-data then ....? You should google more how to encode and join them in such case.

Categories

Resources