Converting JSON strings with escaped Unicode characters to JavaScript objects - javascript

I have a JSON string which contains an escaped Unicode character. The JSON includes this snippet:
I co-ordinate our Chat Literacy network \u2013 an online group for practitioners of Information Literacy
The \u2013 is a long dash.
I'm using
var theObject = eval ("(" + jsonString + ")");
to convert the JSON string to a JavaScript object. I need to use a version of SpiderMonkey that doesn't have a direct JSON to Object method in it.
After conversion, the character in question becomes the Unicode control character \0013 which is an invalid UTF-8 character.
Is there another way I can convert the JSON to an object which will preserve the correct long-dash character? Maybe some other JSON to Object method I can load?
This happens with some other characters also, like curly quotes.
Thanks,
Doug

eval() is evil. Stay away from it.
Try using JSON 3: http://bestiejs.github.io/json3/
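As a minimal sketch of what a real JSON parser does with the snippet from the question, assuming a JSON.parse function is available (natively, or from a polyfill such as the library suggested above); the shortened sample text and the variable names are illustrative only:

```javascript
// The JSON text as it would arrive, containing a literal backslash-u escape.
const jsonString = '{"text":"network \\u2013 an online group"}';

// JSON.parse decodes the escape according to the JSON grammar; no eval is involved.
const theObject = JSON.parse(jsonString);

console.log(theObject.text);
// Code unit at index 8, the position of the dash, in hexadecimal.
console.log(theObject.text.charCodeAt(8).toString(16)); // 2013
```

If the second line prints 2013, the dash survived the conversion intact.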

Related

Keeping escaped unicode characters with JSON.stringify of JSON.parse

I have an input JSON like this (which really contains the literal values "\u2013" (the encoded form of a unicode character)):
{"source":"Subject: NEED: 11/5 BNA-MSL \u2013 1200L Departure - 1 Pax"}
I read it with JSON.parse and it reads the \u2013 as –, which is fine for display in my app.
However, I need to export again the same JSON, to send it down to some other app. I want to keep the same format and have back the \u2013 into the JSON. I am doing JSON.stringify, but it keeps the – in the output.
Any idea what I could do to keep the \u syntax?
Using a replacer function in a JSON.stringify call didn't work - strings returned from the replacer with an escaped backslash produce a double backslash in output, and a single backslashed character is unescaped in output if possible.
Simply re-escaping the stringify result has potential:
const obj = {"source":"Subject: NEED: 11/5 BNA-MSL \u2013 1200L Departure - 1 Pax"}
console.log(" stringify: ", JSON.stringify( obj));
console.log("& replaceAll: ", JSON.stringify(obj).replaceAll('\u2013', '\\u2013'));
using more complex string modifications as necessary.
However, this looks very much like a workaround for an X-Y problem. It might be better to fix the downstream parser so that it handles JSON text as JSON rather than consuming it in raw form, particularly given that JSON text is encoded in UTF-8 and can carry non-ASCII characters without special treatment.
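If re-escaping really is required, one way to generalize the replaceAll idea is to escape every non-ASCII UTF-16 code unit in the stringify output. This is only a sketch; the helper name stringifyAscii is made up for illustration and is not part of any library:

```javascript
// Hypothetical helper: JSON.stringify, then turn every code unit above
// U+007F into a \uXXXX escape so the output is pure ASCII.
function stringifyAscii(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const exported = stringifyAscii({
  source: "Subject: NEED: 11/5 BNA-MSL \u2013 1200L Departure - 1 Pax",
});
console.log(exported);

// The escaped text is still valid JSON and parses back to the same string.
console.log(JSON.parse(exported).source);
```

Because the regular expression works on code units, a character outside the 16-bit range comes out as two escapes (a surrogate pair), which is what JSON expects.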

Failed to declare and display UTF-8 character properly in JSON

I have a JSON object where one attribute contains a static special character - https://www.compart.com/en/unicode/U+1F514
I have tried to store the string both as encoded UTF-8 "\xF0\x9F\x94\x94"
or tried to print it using its HEX value - String.fromCharCode(0x1F514) or decimal value String.fromCharCode(128276)
But it all results in an empty character/empty square character in Google Chrome.
How can I please store this character properly, statically in a simple JSON {header1:"____"} and then echo it?
Also not able to display it in IntelliJ - so if you have a comment regarding this side issue would be very thankful.
For historical reasons, JavaScript strings are sequences of 16-bit UTF-16 code units, because the language was designed when Unicode was expected to fit in 16 bits. JSON inherits that, and \u escapes accept exactly 4 hexadecimal digits.
You need a workaround that consists of splitting the character, which lies outside the 16-bit range, into a UTF-16 surrogate pair of two 16-bit code units, as in:
var raw = "🔔";
var doesNotWork = "\u1F514";
var works = "\uD83D\uDD14";
console.log(raw, doesNotWork, works);
... or get rid of entities and just dump the actual binary character:
var data = ["🔔"];
var json = JSON.stringify(data);
console.log(json, JSON.parse(json));
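For reference, the surrogate pair can be computed rather than looked up. The sketch below uses a helper name (toSurrogatePair) invented for illustration. It also hints at why String.fromCharCode(0x1F514) fails: that function truncates its argument to 16 bits, whereas String.fromCodePoint accepts the full code point.

```javascript
// Hypothetical helper: split a code point above U+FFFF into its
// UTF-16 high and low surrogates.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f514);
console.log(high.toString(16), low.toString(16)); // d83d dd14

const bell = String.fromCharCode(high, low);
console.log(bell === String.fromCodePoint(0x1f514)); // true
console.log(bell === "\u{1F514}"); // true (code point escape, JavaScript only, not JSON)
```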
I think the problem is that the font doesn't support this symbol, hence the square being drawn. If there is no specific reason for using this character, you could draw it with an icon or use a character from an icon font.

JSON unicode characters conversion

I came across this strange JSON which I can't seem to decode.
To simplify things, let's say it's a JSON string:
"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"
After decoding it should look as follows:
└── mystring
JS or PHP doesn't seem to convert it correctly.
js> JSON.parse('"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"')
ffe2ff94ff94ffe2ff94ff80ffe2ff94ff80 mystring
PHP behaves the same
php> json_decode('"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"')
ffe2ff94ff94ffe2ff94ff80ffe2ff94ff80 mystring
Any ideas how to properly parse this JSON string would be welcome.
This is not a valid JSON string: JSON allows exactly 4 hex digits after \u. The results from both PHP and JS are correct.
It is not possible to decode this using standard functions.
Where did you get this JSON string?
The correct JSON for the string you want would be "\u2514\u2500\u2500 mystring", or just "└── mystring" (JSON allows any Unicode character unescaped in a string except ", \ and control characters).
Also, if you need to escape a character that requires more than 16 bits, it becomes two escape codes; for example, "𩄎" would be "\ud864\udd0e" when escaped.
So, if you really need to decode the string above, you can fix it before decoding by replacing \uffffffe2 with \uffff\uffe2 via a regexp (for JS it would be something like: s.replace(/(\\u[A-Fa-f0-9]{4})([A-Fa-f0-9]{4})/gi,'$1\\u$2') ).
But in any case, the character codes in the string above do not look right.
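One possible reading of those codes, offered purely as an assumption: the low bytes e2 94 94 and e2 94 80 are the UTF-8 encodings of └ (U+2514) and ─ (U+2500), so the producer may have written each UTF-8 byte as a sign-extended 32-bit value. Under that assumption, a sketch of a repair step could look like this (the helper name is hypothetical, and TextDecoder is assumed to be available as a global, as it is in current browsers and Node):

```javascript
// Hypothetical repair: treat each \uffffffXX as one sign-extended UTF-8 byte,
// collect consecutive bytes, and decode them as UTF-8.
function decodeSignExtendedEscapes(rawJson) {
  return rawJson.replace(/(?:\\uffffff[0-9a-f]{2})+/gi, (run) => {
    const bytes = run
      .split(/\\uffffff/i)
      .filter(Boolean)
      .map((hex) => parseInt(hex, 16));
    return new TextDecoder("utf-8").decode(Uint8Array.from(bytes));
  });
}

// String.raw keeps the backslashes exactly as they appear in the broken input.
const broken = String.raw`"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"`;
const repaired = decodeSignExtendedEscapes(broken);
console.log(JSON.parse(repaired));
```

The decoded characters are inserted unescaped, so this sketch would need extra care if a decoded byte sequence could produce a double quote, a backslash, or a control character.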

JSON.parse failing on valid Json. Have escaped control characters.

I've escaped control characters and am feeding my validated JSON into JSON.parse and jQuery.parseJSON. Both are giving the same result.
Getting error message "Unexpected token $":
$(function () {
  try {
    $.parseJSON('"\\\\\"$\\\\\"#,##0"');
  } catch (exception) {
    alert(exception.message);
  }
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
Thanks for checking out this issue.
What's happening here is that there are two levels of backslash removal being applied to the string. The first is done by the browser's JavaScript engine when it parses the single-quoted string. In JavaScript, single-quoted strings and double-quoted strings are exactly equivalent (other than the fact that single-quotes must be backslash-escaped in single-quoted strings and double-quotes must be backslash-escaped in double-quoted strings); both types of strings take backslash escape codes such as \\ for backslash, \' for single-quote (redundant but accepted in double-quoted strings), and \" for double-quote (redundant but accepted in single-quoted strings).
In your JavaScript single-quoted string literal you have several instances of this kind of thing, which are meant to be valid JSON double-quoted strings:
"\\\\\"$\\\\\"#,##0"
After the browser has parsed it, the string contains exactly the following characters (including the outer double-quotes, which are unremoved because they are contained in a single-quoted string):
"\\"$\\"#,##0"
You can see that each consecutive pair of backslashes became a single literal backslash, and the two cases of an odd backslash followed by a double-quote each became a literal double-quote.
That is the text that is being passed as an argument to $.parseJSON, which is when the second level of backslash removal occurs. During JSON parsing of the above text, the leading double-quote signifies the start of a JSON string literal, then the pair of backslashes is interpreted as a single literal backslash, and then the immediately following double-quote terminates the JSON string literal. The stuff that follows (dollar, backslash, backslash, etc.) is invalid JSON syntax.
The problem is that you've embedded valid JSON in a JavaScript single-quoted string literal, which, although it happens to be valid JavaScript syntax by fluke (it wouldn't have been if the JSON contained single-quotes, or if you'd tried using double-quotes to delimit the JavaScript string literal), no longer contains valid JSON after being parsed by the browser's JavaScript engine.
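The two levels can be observed directly. The sketch below uses the built-in JSON.parse in place of $.parseJSON, on the assumption that both reject the text for the same reason; the variable names are illustrative:

```javascript
// Level 1: the JavaScript engine processes the escapes in the string literal.
const text = '"\\\\\"$\\\\\"#,##0"';
console.log(text);        // "\\"$\\"#,##0"
console.log(text.length); // 14

// Level 2: the JSON parser sees "\\" as a complete string and then chokes on $.
let parseError = null;
try {
  JSON.parse(text);
} catch (exception) {
  parseError = exception;
}
console.log(parseError instanceof SyntaxError); // true
```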
To solve the problem, you have to either manually escape the JSON content to be properly embedded in a JavaScript string literal, or load it independently of the JavaScript source, e.g. from a flat file.
Here's a demonstration of how to solve the problem using your latest example code:
$(function () {
  try {
    alert($.parseJSON('{"key":"\\\\\\\\\\"$\\\\\\\\\\"#,##0"}').key); // works
    alert($.parseJSON('{"key":"\\\\\"$\\\\\"#,##0"}').key); // doesn't work
  } catch (exception) {
    alert(exception.message);
  }
});
http://jsfiddle.net/814uw638/2/
Since JavaScript has a simple escaping scheme (e.g. see http://blogs.learnnowonline.com/2012/07/19/escape-sequences-in-string-literals-using-javascript/), it's actually pretty easy to solve this problem in the general case. You just have to decide in advance how you're going to quote the string in JavaScript (single-quotes are a good idea, because strings in JSON are always double-quoted), and then when you prepare the JavaScript source, just add a backslash before every single-quote and every backslash in the embedded JSON. That should guarantee it will be perfectly valid, regardless of the exact JSON content (provided, of course, that it is valid JSON to begin with).
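The rule just described (a backslash before every backslash and every single quote) can be sketched as a small function. The name toSingleQuotedLiteral is invented for illustration, and the handling of line breaks is an addition beyond what the answer states, included because a raw newline would otherwise end a single-quoted literal:

```javascript
// Hypothetical helper: turn JSON text into a single-quoted JavaScript string literal.
function toSingleQuotedLiteral(jsonText) {
  const body = jsonText
    .replace(/[\\']/g, "\\$&") // backslash before every backslash and single quote
    .replace(/\n/g, "\\n")     // extra step: line breaks (e.g. pretty-printed JSON)
    .replace(/\r/g, "\\r");
  return "'" + body + "'";
}

const json = JSON.stringify({ key: '\\"$\\"#,##0', note: "it's" });
const literal = toSingleQuotedLiteral(json);

console.log(json);    // the JSON text itself
console.log(literal); // the same text, ready to paste into JavaScript source
```

Pasting the printed literal into JavaScript source and passing it to JSON.parse should then yield the original object.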
In your original problem, why do you need to call $.parseJSON in the first place? You could easily have gotten the object you wanted by just doing
var o = { blah }
that is, by manually removing the single quotes around the curly braces, rather than doing
$.parseJSON('{blah}')
Is there any reason for evaluating the string first (i.e. var s = '{blah}' and then $.parseJSON(s)), which is what your original code was doing? There shouldn't be a case where this is necessary. Since you mentioned somewhere that the string was produced by JSON.stringify, there shouldn't be a scenario where you need to explicitly store it in a variable (i.e. copy and paste it and put quotes around it).
The main problem here is that the string produced by JSON.stringify, which is properly escaped, has been 'evaluated' once when you manually put quotes around it. So the key is to make sure the string doesn't get 'evaluated'.
Even if you wanted to pass the stringified variable to a database or anything else, there is no need to explicitly use quotes. One could do
var s = JSON.stringify(obj);
db.save("myobj",s)
var newObj = JSON.parse(db.load("myobj"))
The string is stored verbatim without getting evaluated, so that when you retrieve it, you would have the exact same string.
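A runnable version of the same idea, with a Map standing in for the database (fakeDb is a hypothetical stand-in for whatever storage API is actually in use, since db.save and db.load above are not a real library):

```javascript
// Hypothetical stand-in for a database: stores strings verbatim.
const fakeDb = new Map();

const obj = { format: '\\"$\\"#,##0' };

// The stringified text goes in and comes out untouched; it never passes
// through a JavaScript string literal, so nothing is unescaped on the way.
const s = JSON.stringify(obj);
fakeDb.set("myobj", s);
const newObj = JSON.parse(fakeDb.get("myobj"));

console.log(newObj.format === obj.format); // true
```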

Browser JSON vs node JSON

I'm attempting to serialize a string that contains escaped strings into JSON. I would have imagined that JSON.stringify() would correctly re-escape those strings and allow me to JSON.parse it. In a simple case, for example:
JSON.parse(JSON.stringify("\\"))
The output from node is "\". The output from the browser is "\" - it seems the browser (chrome in my case) is not correctly converting the double backslash \\ into \\\\.
Why is that?
When you write code, you have to write "\\" (because the backslash itself is the escape character), which is a string that contains only one backslash ("\\".length is 1).
But when displayed in the console or browser, it may be shown as "\".
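Checking lengths avoids any confusion caused by how a particular console chooses to display strings. A short sketch, with illustrative variable names:

```javascript
const oneBackslash = "\\";                     // source has two characters, value has one
console.log(oneBackslash.length);              // 1

const jsonText = JSON.stringify(oneBackslash); // quote, backslash, backslash, quote
console.log(jsonText.length);                  // 4

// Parsing the JSON text gives back the single backslash.
console.log(JSON.parse(jsonText) === oneBackslash); // true
```

The values are identical in Node and in the browser; only the way each console renders them differs.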
