In particular, when saving JSON to a cookie, is it safe to just save the raw value?
The reason I don't want to encode it is that the JSON has small keys and values but a complex structure, so encoding all the ", : and {} characters greatly increases the string length.
If your values contain "JSON characters" (e.g. commas, quotes, [] etc.), then you should probably use encodeURIComponent so that they get escaped and don't break your code when you read the values back.
You can convert your JSON object to a string using the JSON.stringify() method then save it in a cookie.
Note that browsers limit each cookie to roughly 4 KB (about 4096 bytes, including the name).
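To see how much percent-encoding actually costs for a given payload, one can compare the lengths before settling on a format. The sketch below is only illustrative: the sample object, the helper name fitsInCookie and the 4096-byte budget are assumptions, not values taken from the answers here.

```javascript
// Hypothetical sample state: short keys and values, nested structure.
const state = { a: [1, 2, { b: "x" }], c: { d: true, e: "y z" } };

const raw = JSON.stringify(state);
const encoded = encodeURIComponent(raw);

// Assumed budget: many browsers allow roughly 4096 bytes per cookie
// (name plus value); the exact limit varies by browser.
function fitsInCookie(name, value, limitBytes = 4096) {
  const bytes = new TextEncoder().encode(`${name}=${value}`).length;
  return bytes <= limitBytes;
}

// Compare the raw and percent-encoded lengths, then check the budget.
console.log(raw.length, encoded.length);
console.log(fitsInCookie("state", encoded));
```

If the encoded form is too long for real data, shortening the JSON itself (shorter keys, fewer nested levels) is usually a safer fix than skipping the encoding.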
If your JSON string is valid, there should be no need to encode it.
e.g.
JSON.stringify({a:'foo"bar"',bar:69});
=> '{"a":"foo\"bar\"","bar":69}' (valid JSON strings are already escaped)
This is documented very well on MDN
To avoid unexpected requests to the server, you should call encodeURIComponent on any user-entered parameters that will be passed as part of a URI. For example, a user could type "Thyme &time=again" for a variable comment. Not using encodeURIComponent on this variable will give comment=Thyme%20&time=again. Note that the ampersand and the equal sign mark a new key and value pair. So instead of having a POST comment key equal to "Thyme &time=again", you have two POST keys, one equal to "Thyme " and another (time) equal to again.
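The MDN example can be reproduced directly. This is a small sketch; URLSearchParams is used here only to show how a receiver would split each query string, and the variable names are made up for illustration.

```javascript
const comment = "Thyme &time=again";

// encodeURI leaves "&" and "=" alone, so the value splits into two pairs.
const unsafeQuery = "comment=" + encodeURI(comment);
// encodeURIComponent escapes them, so the value stays in one piece.
const safeQuery = "comment=" + encodeURIComponent(comment);

console.log(unsafeQuery); // comment=Thyme%20&time=again
console.log(safeQuery);   // comment=Thyme%20%26time%3Dagain

// Parse both strings the way a server-side query parser typically would.
const unsafeParsed = new URLSearchParams(unsafeQuery);
const safeParsed = new URLSearchParams(safeQuery);

console.log([...unsafeParsed.keys()]);  // two keys: comment and time
console.log(safeParsed.get("comment")); // Thyme &time=again
```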
If you can't be certain that your JSON will not include reserved characters such as ; then you will want to perform escaping on any strings being stored as a cookie. RFC 6265 covers special characters that are not allowed in the cookie-name or cookie-value.
If you are encoding static content you control, then this escaping may be unnecessary. If you are encoding dynamic content such as encoding user generated content, you probably need escaping.
MDN recommends using encodeURIComponent to escape any disallowed characters.
You can pull in a library such as cookie to handle this for you, but if your server is written in another language you will need to ensure it uses a library or language utilities to encodeURIComponent when setting cookies and to decodeURIComponent when reading cookies.
JSON.stringify is not sufficient as illustrated by this trivial example:
const bio = JSON.stringify({ "description": "foo; bar; baz" });
document.cookie = `bio=${bio}`;
// Notice that the content after the first `;` is dropped.
// Attempting to JSON.parse this later will fail.
console.log(document.cookie) // bio={"description":"foo
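One way to avoid the truncation shown above is to percent-encode the JSON before it goes into the cookie and decode it when reading it back. The following is a minimal sketch, not a replacement for a cookie library: toCookiePair and fromCookieHeader are hypothetical helper names, and the code only builds and parses strings so it can run outside a browser.

```javascript
// Hypothetical helper: build a "name=value" pair whose value is
// percent-encoded JSON, so ";", "," and spaces cannot end the value early.
function toCookiePair(name, data) {
  return `${name}=${encodeURIComponent(JSON.stringify(data))}`;
}

// Hypothetical helper: find one cookie in a "a=1; b=2" style string
// (the shape of document.cookie and of the Cookie request header).
function fromCookieHeader(header, name) {
  for (const pair of header.split("; ")) {
    const eq = pair.indexOf("=");
    if (eq !== -1 && pair.slice(0, eq) === name) {
      return JSON.parse(decodeURIComponent(pair.slice(eq + 1)));
    }
  }
  return undefined;
}

const pair = toCookiePair("bio", { description: "foo; bar; baz" });
// In a browser, this string could be assigned to document.cookie.
console.log(pair);
console.log(fromCookieHeader(`theme=dark; ${pair}`, "bio"));
```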
Cookie: name=value; name2=value2
Spaces are part of the cookie separation in the HTTP Cookie header. Raw spaces in cookie values could thus confuse the server.
I'm working on a tool that reads arbitrary data files and creates a table out of its data which I then store in a database. I'd like to preserve the column headers. The column headers are already ASCII text (or maybe latin1), but they have characters that aren't valid variable names (e.g., spaces, %), so I need to encode them somehow. I'm looking for an encoding for the column titles that has these properties:
Legible: it would be nice if the encoded text looked as similar as possible to the unencoded text (i.e., for debugging).
Legal identifier: I'd like the encoded text to be a valid JavaScript identifier (ECMA-262 Section 7.6).
Invertible: I'd like to be able to get the exact original text back from the encoded text.
I can think of approaches that work for 2 of the 3 cases, but I don't know how to get all 3. E.g., URL encoding doesn't produce legal identifier names; I think I could transform base64 to be legal, but it isn't legible; and what I've got currently just does some substitutions, so it's not invertible.
Efficiency isn't a concern, so if necessary, I could store the encoded and unencoded texts together. The best option I can think of is to use url encoding and then swap percents for $. I thought there would be better options than this though, but I can't find anything. Is there anything better?
This pair of methods relying on Guava's PercentEscaper seems to meet my requirements. Guava doesn't provide an unescaper, but given my simple needs here, I can just use a simple URLDecoder.
private static final PercentEscaper escaper = new PercentEscaper("", false);
static String getIdentifier(String str) {
//minimal safe characters, but leaves letters alone, so it's somewhat legible
String escaped = escaper.escape(str);
//JavaScript identifiers can't start with a digit, and the escaper doesn't know the first
//character has different rules, so prepend "%3": a leading digit d becomes "%3d", its hex code
if(!escaped.isEmpty() && Character.isDigit(escaped.charAt(0))){
escaped = "%3"+escaped;
}
//a percent isn't valid in a JavaScript identifier, so we'll use _ as our special character
escaped = escaped.replace('%','_');
return escaped;
}
static String invertIdentifier(String str){
String unescaped = str.replace('_','%');
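//the Charset overload (Java 10+, needs java.nio.charset.StandardCharsets) avoids a checked exception
unescaped = URLDecoder.decode(unescaped, StandardCharsets.UTF_8);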
return unescaped;
}
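For comparison, the same scheme can be sketched in plain JavaScript without Guava. This is an illustrative port of the idea (escape every byte that is not an ASCII letter or digit as _XX, and also escape a leading digit), not the author's code; toIdentifier and fromIdentifier are hypothetical names, and an empty input still yields an empty string, which is not a valid identifier.

```javascript
// Hypothetical port of the escaping idea: keep ASCII letters and digits,
// write every other UTF-8 byte as "_XX" (uppercase hex).
function toIdentifier(text) {
  const bytes = new TextEncoder().encode(text);
  let out = "";
  bytes.forEach((byte, index) => {
    const ch = String.fromCharCode(byte);
    const isLetter = /[A-Za-z]/.test(ch);
    // A digit is only safe after the first position of an identifier.
    const isSafeDigit = /[0-9]/.test(ch) && index > 0;
    out += isLetter || isSafeDigit
      ? ch
      : "_" + byte.toString(16).toUpperCase().padStart(2, "0");
  });
  return out;
}

// Inverse: turn "_" back into "%" and let decodeURIComponent do the rest.
// This works because a literal "_" in the input was itself escaped as "_5F".
function fromIdentifier(identifier) {
  return decodeURIComponent(identifier.replace(/_/g, "%"));
}

console.log(toIdentifier("% growth (2010)")); // _25_20growth_20_282010_29
console.log(toIdentifier("2010 total"));      // _32010_20total
console.log(fromIdentifier(toIdentifier("2010 total")));
```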
I have a GET request that takes a parameter, and this parameter is itself a URL. Normally I just encode the URL and then decode it on my server. This works perfectly from Java, but now I am using jQuery and I have a problem with it.
This is the value of that parameter:
http://www.BookOntology.com/bo#ania
When I encode it like this:
encodeURI(userURI)
I get the same value back, while I thought I should have gotten this:
http%3A%2F%2Fwww.BookOntology.com%2Fbo%23ania
To show what goes wrong:
My current approach (which uses encodeURI) produces this final URL (note that I just want to encode the parameter, not the whole URL):
http://bla bla bla?userURI=http://www.BookOntology.com/bo#ania
But on the server, when I read the value of the userURI parameter, I get:
http://www.BookOntology.com/bo
It is definitely a problem with the way I encode the value of that parameter because, again, the value is the same before and after encoding, even though it contains some characters that should be changed.
Could you help me get past this, please?
Try the encodeURIComponent function, which encodes a Uniform Resource Identifier (URI) component.
Read the MDN DOCS for more info.
encodeURI only changes characters that can't appear in a URL at all.
You're looking for encodeURIComponent which encodes all characters with special meaning in a URL as well (and makes it suitable for inserting in a query string).
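The difference is easy to check with the value from the question. In this sketch the endpoint http://example.com/lookup is a made-up placeholder, and the URL class stands in for whatever parsing the server does.

```javascript
const userURI = "http://www.BookOntology.com/bo#ania";

// encodeURI treats ":", "/" and "#" as legitimate URL syntax and keeps them.
console.log(encodeURI(userURI));          // http://www.BookOntology.com/bo#ania
// encodeURIComponent escapes them so the value can sit inside a query string.
console.log(encodeURIComponent(userURI)); // http%3A%2F%2Fwww.BookOntology.com%2Fbo%23ania

// Hypothetical endpoint, used only for illustration.
const requestUrl = "http://example.com/lookup?userURI=" + encodeURIComponent(userURI);

// Parsing the URL back shows that the parameter survives, "#ania" included.
const received = new URL(requestUrl).searchParams.get("userURI");
console.log(received === userURI); // true
```

Without the encoding, everything from the # onward is treated as the fragment of the outer URL and is never sent to the server, which is why the server only saw http://www.BookOntology.com/bo.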
My page state can be described by a JavaScript object that can be serialized into JSON. But I don't think a JSON string is suitable for use in a fragment ID due to, for example, the spaces and double-quotes.
Would encoding the JSON string into a base64 string be sensible, or is there a better way? My goal is to allow the user to bookmark the page and then upon returning to that bookmark, have a piece of JavaScript read window.location.hash and change state accordingly.
I think you are on the right track. Let's write down the requirements:
The encoded string must be usable as hash, i.e. only letters and numbers.
The original value must be possible to restore, i.e. hashing (md5, sha1) is not an option.
It shouldn't be too long, to remain usable.
There should be an implementation in JavaScript, so it can be generated in the browser.
Base64 would be a great solution for that. The only problem: base64 also contains characters like / and +, so you gain nothing compared to simply attaching a JSON string (which would also have to be URL encoded).
BUT: Luckily, there's a variant of base64 called base64url which is exactly what you need. It is specifically designed for the type of problem you're describing.
However, I was not able to find a JS implementation; maybe you have to write one yourself – or do a bit more research than my half-assed 15 seconds scanning the first 5 Google results.
EDIT: On second thought, I think you don't need to write your own implementation. Use a normal implementation, and simply replace the “forbidden” characters with something you find appropriate for your URLs.
Base64 is an excellent way to store binary data in text. It uses about 33% more characters/bytes than the original data and mostly uses 0-9, a-z, and A-Z. It also has three other characters that would need to be encoded to be stored in the URL: /, =, and +. If you simply used URL encoding, arbitrary binary data could take up to 300% (3x) the size.
If you're only storing the characters in the fragment of the URL, base64-encoded text doesn't need to be re-encoded and will not change. But if you want to send the data as part of the actual URL to visit, then it matters.
As referenced by lxg, there is a base64url variant for that. This is a modified version of base64 that replaces the characters that are unsafe to store in a URL. Here is how to encode it:
function tobase64url(s) {
return btoa(s).replace(/\+/g,'-').replace(/\//g,'_').replace(/=/g,'');
}
console.log(tobase64url('\x00\xff\xff\xf1\xf1\xf1\xff\xff\xfe'));
// Returns "AP__8fHx___-" instead of "AP//8fHx///+"
And to decode a base64url string from the URL:
function frombase64url(s) {
return atob(s.replace(/-/g,'+').replace(/_/g, '/'));
}
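Applied to the original question (page state as JSON), the two functions need one extra step: btoa only accepts characters up to U+00FF, so JSON containing other characters has to be converted to UTF-8 bytes first. The sketch below assumes an environment with TextEncoder, TextDecoder, btoa and atob (modern browsers, recent Node.js); encodeState and decodeState are hypothetical names.

```javascript
// Hypothetical helper: JSON -> UTF-8 bytes -> base64url text.
function encodeState(state) {
  const bytes = new TextEncoder().encode(JSON.stringify(state));
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Hypothetical helper: base64url text -> UTF-8 bytes -> JSON.
function decodeState(text) {
  let base64 = text.replace(/-/g, "+").replace(/_/g, "/");
  // Restore the "=" padding that was stripped when encoding.
  base64 += "=".repeat((4 - (base64.length % 4)) % 4);
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

const fragment = encodeState({ tab: "prices", filter: "\u20ac > 10", page: 3 });
console.log(fragment);
console.log(decodeState(fragment));
```

In a browser, the result could be assigned to window.location.hash and read back from it (minus the leading #) on page load.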
Use encodeURIComponent and decodeURIComponent to serialize data for the fragment (aka hash) part of the URL.
This is safe because the character set output by encodeURIComponent is a subset of the character set allowed in the fragment. Specifically, encodeURIComponent escapes all characters except:
A - Z
a - z
0 - 9
- . _ ~ ! ' ( ) *
So the output includes the above characters, plus escaped characters, which are % followed by hexadecimal digits.
The set of allowed characters in the fragment is:
A - Z
a - z
0 - 9
? / : @ - . _ ~ ! $ & ' ( ) * + , ; =
percent-encoded characters (a % followed by hexadecimal digits)
This set of allowed characters includes all the characters output by encodeURIComponent, plus a few other characters.
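A round trip through the fragment might then look like the following sketch. stateToHash and hashToState are hypothetical names; in a browser the string would be written to and read from window.location.hash, which is left out here so the code also runs elsewhere.

```javascript
// Hypothetical helper: serialize page state into a "#..." fragment string.
function stateToHash(state) {
  return "#" + encodeURIComponent(JSON.stringify(state));
}

// Hypothetical helper: parse a fragment string (with or without "#") back into state.
function hashToState(hash) {
  return JSON.parse(decodeURIComponent(hash.replace(/^#/, "")));
}

const hash = stateToHash({ view: "list", query: "a b \"c\"", ids: [1, 2] });
console.log(hash);
console.log(hashToState(hash));
```

Compared with base64url, the result is longer for JSON with many quotes and braces but stays partly readable, since letters and digits pass through unchanged.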
I am having a problem with special character in javascript.
I have a form with a input text that has the following string:
10/10/2010
after a form.serialize(); I get this string as
10%2F10%2F2010
The '/' character is converted to its percent-encoded form, %2F (2F is its ASCII code in hexadecimal).
I would be able to convert that using String.fromCharCode(ascii_code), but I have many inputs in my form, so the string is something like:
var=14&var=10%2F10%2F2010&var=10%2F10%2F2010&var=10%2F10%2F2010
Just an example to state that I would have to go through this string ("manually") and find those value and convert it.
Is there any easy way to perform that conversion?
Strange thing because I did not have that problem before, I am not sure why this is happening now.
It happens that way because that's how it's meant to be:
The .serialize() method creates a text string in standard URL-encoded
notation. It operates on a jQuery object representing a set of form
elements.
As far as I know, there's no native jQuery function to unserialize but your post suggests you already got that and are only stuck in the URL-encoded strings:
decodeURIComponent(encodedURI): Decodes a Uniform Resource Identifier (URI) component previously created by encodeURIComponent or
by a similar routine.
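For the serialized string from the question, a short sketch of both options follows. decodeURIComponent handles a single value, while URLSearchParams (a standard browser and Node.js API, not part of jQuery) splits the pairs and decodes them in one go.

```javascript
const serialized = "var=14&var=10%2F10%2F2010&var=10%2F10%2F2010&var=10%2F10%2F2010";

// Decode a single percent-encoded value.
console.log(decodeURIComponent("10%2F10%2F2010")); // 10/10/2010

// Split the whole string into pairs and decode every value.
const params = new URLSearchParams(serialized);
const values = params.getAll("var");
console.log(values);

// jQuery's .serialize() writes spaces as "+"; URLSearchParams turns them
// back into spaces, whereas decodeURIComponent alone leaves the "+" in place.
console.log(new URLSearchParams("name=John+Smith").get("name")); // John Smith
console.log(decodeURIComponent("John+Smith"));                   // John+Smith
```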
I've got some data from DBpedia using Jena, and since Jena's output is based on XML, there are some circumstances in which XML characters need to be treated differently, like the following:
Guns n & amp;#39; Roses
I just want to know what kind of encoding this is.
I want to decode/encode my input based on the above encoding with the help of JavaScript and send it back to a servlet.
(Edited post: if you remove the space between & and amp you will get the correct character. Since I couldn't find a way to do that on Stack Overflow, I decided to write it like that!)
Seems to be XML entity encoding, and a numeric character reference (decimal).
A numeric character reference refers to a character by its Universal
Character Set/Unicode code point, and uses the format &#nnnn; or &#xhhhh;
You can get some info here: List of XML and HTML character entity references on Wikipedia.
Your character is number 39, being the apostrophe: ', which can also be referenced with a character entity reference: &apos;.
To decode this using Javascript, you could use for example php.js, which has an html_entity_decode() function (note that it depends on get_html_translation_table()).
UPDATE, in reply to your edit: basically that is the same; the only difference is that it was encoded twice (possibly by mistake). &amp; is the ampersand: &.
This is an SGML/HTML/XML numeric character entity reference.
In this case for an apostrophe '.
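If pulling in a library is not an option, numeric character references can be decoded with a small regular expression. The sketch below is deliberately narrow: decodeNumericRefs is a hypothetical helper that handles only the decimal and hexadecimal numeric forms, not named entities such as &quot;, and the extra &amp; step is shown only for the double-encoded case.

```javascript
// Hypothetical helper: decode "&#39;" (decimal) and "&#x27;" (hex) references.
function decodeNumericRefs(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave references outside the Unicode range untouched.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericRefs("Guns n &#39; Roses"));

// Double-encoded input: undo the outer "&amp;" first, then decode the reference.
const doubleEncoded = "Guns n &amp;#39; Roses";
console.log(decodeNumericRefs(doubleEncoded.replace(/&amp;/g, "&")));
```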