Keeping escaped unicode characters with JSON.stringify of JSON.parse - javascript

I have an input JSON like this (which really contains the literal values "\u2013" (the encoded form of a unicode character)):
{"source":"Subject: NEED: 11/5 BNA-MSL \u2013 1200L Departure - 1 Pax"}
I read it with JSON.parse and it reads the \u2013 as –, which is fine for display in my app.
However, I need to export the same JSON again to send it to another app. I want to keep the same format and get the \u2013 back in the JSON. I am using JSON.stringify, but it keeps the – in the output.
Any idea what I could do to keep the \u syntax?

Using a replacer function in a JSON.stringify call didn't work - strings returned from the replacer with an escaped backslash produce a double backslash in output, and a single backslashed character is unescaped in output if possible.
Simply re-escaping the stringify result has potential:
const obj = {"source":"Subject: NEED: 11/5 BNA-MSL \u2013 1200L Departure - 1 Pax"}
console.log(" stringify: ", JSON.stringify( obj));
console.log("& replaceAll: ", JSON.stringify(obj).replaceAll('\u2013', '\\u2013'));
using more complex string modifications as necessary.
However, this looks very much like an X-Y problem. It might be better to fix the downstream parser so that it treats JSON text as JSON rather than consuming it in raw form, particularly since JSON text is encoded in UTF-8 and can carry non-ASCII characters without special treatment.
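If the downstream consumer really cannot be changed, the re-escaping idea can be generalized from one hard-coded dash to every non-ASCII character. The following is a minimal sketch rather than a definitive implementation; the helper name stringifyAscii is made up for this example. It relies on the fact that in JSON.stringify output non-ASCII characters can only occur inside strings (keys or values), so rewriting each UTF-16 code unit as \uXXXX still yields valid JSON.

```javascript
// Hypothetical helper: serialize to JSON, then escape every non-ASCII
// UTF-16 code unit as \uXXXX so the resulting text is pure ASCII.
// Assumes `value` is serializable (JSON.stringify does not return undefined).
function stringifyAscii(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

// The input text contains the six literal characters \u2013.
const obj = JSON.parse('{"source":"BNA-MSL \\u2013 1200L Departure"}');
const exported = stringifyAscii(obj);

console.log(obj.source); // BNA-MSL – 1200L Departure
console.log(exported);   // {"source":"BNA-MSL \u2013 1200L Departure"}
```

Note that this escapes all non-ASCII characters, including ones that were not escaped in the original input, and always emits lowercase hex digits, so the output is equivalent JSON but not necessarily byte-for-byte identical to what was received.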

Related

How to display unicode / hexadecimal emoji and octal literals in HTML using Vue.js

I'm getting a response like this from the web server:
"\ud83d\ude48\ud83d\ude02\ud83d\ude30\ud83d\ude09\ud83d\udc4f\ud83c\udffd\ud83d\udc4c\ud83c\udffd\ud83d\udd1d\u2714\ufe0f\ud83d\ude42 \344\366\374\337\u015b\u0161"
which after decoding should look like this:
🙈😂😰😉👏🏽👌🏽🔝✔️🙂 äöüßśš
äöüß are encoded as octal literals \344\366\374\337
To display this message correctly (decoded, not as escaped plain text), I've used:
{{ JSON.parse('"' + messageContent.message + '"') }}
This worked perfectly for escaped Unicode values but fails when octal literals appear: ES6 does not allow octal literals because they are deprecated, so an error occurs. As a workaround, I find the octal literals with a regex and convert each one using String.fromCharCode(parseInt(parseInt(val.replace('\\', ''), 8), 10)), so that, for example, \344 becomes ä. After replacing the octals, I search for any Unicode escapes and parse them one by one using JSON.parse(`"${val}"`) (the same case as described below applies here: if I hardcode a string and return just \ud83d\ude48, I don't have to parse it with JSON.parse; it just returns 🙈). I believe this is not an optimal solution.
The other strange thing is that when I try to display the message directly from the server response (even if it does not contain any octal literals) using
{{ response.message }} it prints as a plain string, but when I create a new variable and assign exactly the same value as I receive from the server:
message='\ud83d\ude48\ud83d\ude02\ud83d\ude30\ud83d\ude09\ud83d\udc4f\ud83c\udffd\ud83d\udc4c\ud83c\udffd\ud83d\udd1d\u2714\ufe0f\ud83d\ude42'
and then display it
{{ message }} displayed value is 🙈😂😰😉👏🏽👌🏽🔝✔️🙂.
One last thing: even with my algorithm, which just looks for text matching /\\[[a-zA-Z0-9]{1,5}\\[[a-zA-Z0-9]{1,5}/g, Unicode is sometimes not parsed correctly. For example, if the user changes the skin color, the message would be \ud83d\udc4d\ud83c\udffd, decoded: 👍🏽, but with this regex the result is 👍�\udffd
Small changes on the backend are possible if necessary, but the backend is also used by mobile apps that are already finished, so any changes should not affect them.
Thanks for any help.
Try manually decoding the unicode escape sequences (\uXXXX) and octal escape sequences (\XXX) as follows:
const response = '\\ud83d\\ude48\\ud83d\\ude02\\ud83d\\ude30\\ud83d\\ude09\\ud83d\\udc4f\\ud83c\\udffd\\ud83d\\udc4c\\ud83c\\udffd\\ud83d\\udd1d\\u2714\\ufe0f\\ud83d\\ude42 \\344\\366\\374\\337\\u015b\\u0161'
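const decoded = response
  // \uXXXX: exactly four hex digits, converted to one UTF-16 code unit
  .replace(/\\u([0-9a-fA-F]{4})/g, (match, p1) => String.fromCharCode(parseInt(p1, 16)))
  // \XXX: exactly three octal digits, treated as a Latin-1 code point
  .replace(/\\([0-7]{3})/g, (match, p1) => String.fromCharCode(parseInt(p1, 8)))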
console.log(decoded)
The server is sending you a string containing the literal characters \ud83d\ude48 (and so on), so the string must be explicitly decoded somehow by converting the escape sequences into the unicode characters they represent. On the other hand, if a string literal in JavaScript code contains the characters \ud83d\ude48 then it will be automatically decoded into 🙈.
Observe the difference between these two strings:
console.log('\ud83d\ude48')
console.log('\\ud83d\\ude48')
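The manual decoding described in this answer can also be done in a single pass, which avoids decoding the same text twice (for instance, a \u005c escape that produces a backslash followed by digits). This is only a sketch under assumptions: the function name decodeEscapes is invented here, octal escapes are assumed to be exactly three digits in the range \000 to \377, and they are assumed to denote Latin-1 code points, as in the sample response.

```javascript
// Hypothetical helper: decode literal \uXXXX and three-digit octal escapes
// in one pass over the raw text.
function decodeEscapes(raw) {
  return raw.replace(
    /\\u([0-9a-fA-F]{4})|\\([0-3][0-7]{2})/g,
    (match, hex, octal) =>
      String.fromCharCode(
        hex !== undefined ? parseInt(hex, 16) : parseInt(octal, 8)
      )
  );
}

// Doubled backslashes so the string holds the literal escape sequences,
// exactly as they would arrive from the server.
const sample = "\\ud83d\\ude48 \\344\\366\\374\\337\\u015b\\u0161";
console.log(decodeEscapes(sample)); // 🙈 äöüßśš

// Adjacent surrogate halves end up next to each other, so multi-part
// emoji such as a skin-tone modifier sequence survive intact.
console.log(decodeEscapes("\\ud83d\\udc4d\\ud83c\\udffd")); // 👍🏽
```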

JSON unicode characters conversion

I came across this strange JSON which I can't seem to decode.
To simplify things, let's say it's a JSON string:
"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"
After decoding it should look as following:
└── mystring
Neither JS nor PHP seems to convert it correctly.
js> JSON.parse('"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"')
ffe2ff94ff94ffe2ff94ff80ffe2ff94ff80 mystring
PHP behaves the same
php> json_decode('"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"')
ffe2ff94ff94ffe2ff94ff80ffe2ff94ff80 mystring
Any ideas how to properly parse this JSON string would be welcome.
This is not a valid JSON string: JSON supports only 4 hex digits after \u. The results from both PHP and JS are correct.
It is not possible to decode this using standard functions.
Where did you get this JSON string?
The correct JSON for the string you want would be "\u2514\u2500\u2500 mystring", or just "└── mystring" (JSON allows any Unicode character to appear unescaped in a string, except ", \ and control characters).
Also, if you need to escape a character outside the Basic Multilingual Plane (a code point above U+FFFF), it takes two escape codes (a surrogate pair); for example, "𩄎" would be "\ud864\udd0e" when escaped.
So, if you really need to decode the string above, you can fix it before decoding by replacing \uffffffe2 with \uffff\uffe2 via a regexp (for JS it would be something like: s.replace(/(\\u[A-Fa-f0-9]{4})([A-Fa-f0-9]{4})/gi,'$1\\u$2') ).
But in any case, the character codes in the string above do not look right.
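One possible reading of the odd codes, offered as an assumption rather than a confirmed diagnosis: the low bytes e2 94 94 and e2 94 80 are the UTF-8 encodings of └ (U+2514) and ─ (U+2500), so the producer may have written each UTF-8 byte as a sign-extended 32-bit value. Under that assumption, the text can be repaired before parsing, as in the sketch below. The function name decodeSignExtendedUtf8 is invented for this example, and TextDecoder is assumed to be available as a global (browsers and current Node.js versions).

```javascript
// Hypothetical repair step, assuming each \uffffffXX is one UTF-8 byte XX
// that was sign-extended to 32 bits by whatever produced the text.
function decodeSignExtendedUtf8(raw) {
  return raw.replace(/(?:\\uffffff[0-9a-f]{2})+/gi, (run) => {
    // Split the run into its two-digit hex byte values.
    const hexPairs = run.split(/\\uffffff/i).filter(Boolean);
    const bytes = Uint8Array.from(hexPairs, (h) => parseInt(h, 16));
    // Decode the collected bytes as UTF-8.
    return new TextDecoder("utf-8").decode(bytes);
  });
}

// Doubled backslashes so the string holds the literal text from the question.
const broken =
  '"\\uffffffe2\\uffffff94\\uffffff94\\uffffffe2\\uffffff94\\uffffff80' +
  '\\uffffffe2\\uffffff94\\uffffff80 mystring"';

const repaired = decodeSignExtendedUtf8(broken);
console.log(JSON.parse(repaired)); // └── mystring
```

Only bytes of 0x80 and above would be affected by sign extension, so plain ASCII in the text passes through untouched. Fixing the producer so that it emits valid JSON remains the better solution.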

JS - JSON.parse - preserve special characters

I'm running a NodeJS app that gets certain posts from an API.
When trying to JSON.parse with special characters in, the JSON.parse would fail.
Special characters can be just any other language, emojis etc.
Parsing works fine when posts don't have special characters.
I need to preserve all of the text, I can't just ignore those characters since I need to handle every possible language.
I'm getting the following error:
"Unexpected token �"
Example of a text I'm supposed to be able to handle:
"summary": "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT&メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦"�アイルランドと英国(他は専門外)※Togetterコメ欄と陰謀論が嫌いです。"
How can I properly parse such a text?
Thanks
You have misdiagnosed your problem, it has nothing to do with that character.
Your code contains an unescaped " immediately before the special character you think is causing the problem. The early " is prematurely terminating the string.
If you insert a backslash to escape the ", your string can be parsed as JSON just fine:
x = '{"summary": "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT&メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦\\"�アイルランドと英国(他は専門外)※Togetterコメ欄と陰謀論が嫌いです。"}';
console.log(JSON.parse(x));
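To illustrate the point, here is a small sketch using a shortened, made-up excerpt of the text. It shows that non-ASCII characters do not bother JSON.parse, that JSON.stringify escapes an embedded double quote automatically, and that JSON text assembled by hand without escaping fails.

```javascript
// Shortened sample text containing Japanese characters and a bare double quote.
const summary = '関心領域は匦"アイルランドと英国';

// JSON.stringify escapes the embedded quote as \" and leaves the rest alone.
const json = JSON.stringify({ summary });
console.log(json); // {"summary":"関心領域は匦\"アイルランドと英国"}

// The properly escaped text round-trips without loss.
console.log(JSON.parse(json).summary === summary); // true

// Building the JSON text by concatenation leaves the quote unescaped,
// which terminates the string early and makes the parser throw.
let handBuiltFails = false;
try {
  JSON.parse('{"summary": "' + summary + '"}');
} catch (err) {
  handBuiltFails = err instanceof SyntaxError;
}
console.log(handBuiltFails); // true
```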
You need to pass the JSON as a string, not as an object.
Example
JSON.parse('{"summary" : "a"}');
In your case it should be like this
JSON.parse(
'{"summary" : "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT&メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦�アイルランドと英国(他は専門外)※Togetterコメ欄と陰謀論が嫌いです。"}')

Converting JSON strings with escaped Unicode characters to JavaScript objects

I have a JSON string which contains an escaped Unicode character. The JSON includes this snippet:
I co-ordinate our Chat Literacy network \u2013 an online group for practitioners of Information Literacy
The \u2013 is a long dash.
I'm using
var theObject = eval ("(" + jsonString + ")");
to convert the JSON string to a JavaScript object. I need to use a version of SpiderMonkey that doesn't have a direct JSON to Object method in it.
After conversion, the character in question becomes the Unicode control character \0013 which is an invalid UTF-8 character.
Is there another way I can convert the JSON to an object which will preserve the correct long-dash character? Maybe some other JSON to Object method I can load?
This happens with some other characters also, like curly quotes.
Thanks,
Doug
eval() is evil. Stay away from it.
Try using JSON 3: http://bestiejs.github.io/json3/

JSON.parse failing on valid JSON. Have escaped control characters.

I've escaped control characters and am feeding my validated JSON into JSON.parse and jQuery.parseJSON. Both are giving the same result.
Getting error message "Unexpected token $":
$(function () {
  try {
    $.parseJSON('"\\\\\"$\\\\\"#,##0"');
  } catch (exception) {
    alert(exception.message);
  }
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
Thanks for checking out this issue.
What's happening here is that there are two levels of backslash removal being applied to the string. The first is done by the browser's JavaScript engine when it parses the single-quoted string. In JavaScript, single-quoted strings and double-quoted strings are exactly equivalent (other than the fact that single-quotes must be backslash-escaped in single-quoted strings and double-quotes must be backslash-escaped in double-quoted strings); both types of strings take backslash escape codes such as \\ for backslash, \' for single-quote (redundant but accepted in double-quoted strings), and \" for double-quote (redundant but accepted in single-quoted strings).
In your JavaScript single-quoted string literal you have several instances of this kind of thing, which are meant to be valid JSON double-quoted strings:
"\\\\\"$\\\\\"#,##0"
After the browser has parsed it, the string contains exactly the following characters (including the outer double-quotes, which are unremoved because they are contained in a single-quoted string):
"\\"$\\"#,##0"
You can see that each consecutive pair of backslashes became a single literal backslash, and the two cases of an odd backslash followed by a double-quote each became a literal double-quote.
That is the text that is being passed as an argument to $.parseJSON, which is when the second level of backslash removal occurs. During JSON parsing of the above text, the leading double-quote signifies the start of a JSON string literal, then the pair of backslashes is interpreted as a single literal backslash, and then the immediately following double-quote terminates the JSON string literal. The stuff that follows (dollar, backslash, backslash, etc.) is invalid JSON syntax.
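The two levels of backslash removal can be observed directly without jQuery. The sketch below uses plain JSON.parse, which is assumed to fail in the same way as $.parseJSON on this input.

```javascript
// The same single-quoted literal as in the question.
const literal = '"\\\\\"$\\\\\"#,##0"';

// Level 1: the JavaScript engine has already collapsed the escapes.
console.log(literal);        // "\\"$\\"#,##0"
console.log(literal.length); // 14

// Only four backslashes remain out of the ten that were typed.
const backslashes = [...literal].filter((ch) => ch === "\\").length;
console.log(backslashes); // 4

// Level 2: the JSON parser reads "\\" as a complete string holding one
// backslash, then encounters $ and gives up.
let threw = false;
try {
  JSON.parse(literal);
} catch (err) {
  threw = err instanceof SyntaxError;
}
console.log(threw); // true
```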
The problem is that you've embedded valid JSON in a JavaScript single-quoted string literal, which, although it happens to be valid JavaScript syntax by fluke (it wouldn't have been if the JSON contained single-quotes, or if you'd tried using double-quotes to delimit the JavaScript string literal), no longer contains valid JSON after being parsed by the browser's JavaScript engine.
To solve the problem, you have to either manually escape the JSON content to be properly embedded in a JavaScript string literal, or load it independently of the JavaScript source, e.g. from a flat file.
Here's a demonstration of how to solve the problem using your latest example code:
$(function () {
  try {
    alert($.parseJSON('{"key":"\\\\\\\\\\"$\\\\\\\\\\"#,##0"}').key); // works
    alert($.parseJSON('{"key":"\\\\\"$\\\\\"#,##0"}').key); // doesn't work
  } catch (exception) {
    alert(exception.message);
  }
});
http://jsfiddle.net/814uw638/2/
Since JavaScript has a simple escaping scheme (e.g. see http://blogs.learnnowonline.com/2012/07/19/escape-sequences-in-string-literals-using-javascript/), it's actually pretty easy to solve this problem in the general case. You just have to decide in advance how you're going to quote the string in JavaScript (single-quotes are a good idea, because strings in JSON are always double-quoted), and then when you prepare the JavaScript source, just add a backslash before every single-quote and every backslash in the embedded JSON. That should guarantee it will be perfectly valid, regardless of the exact JSON content (provided, of course, that it is valid JSON to begin with).
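The escaping rule just described (a backslash before every backslash and every single quote) can be sketched as a small function. The name toSingleQuotedLiteral is made up for this example, and the sketch assumes compact JSON with no raw line breaks; engines older than ES2019 would additionally need U+2028 and U+2029 escaped.

```javascript
// Hypothetical helper: turn JSON text into the source code of a
// single-quoted JavaScript string literal.
function toSingleQuotedLiteral(json) {
  return "'" + json.replace(/[\\']/g, "\\$&") + "'";
}

// The value holds the characters \\"$\\"#,##0 (two backslashes before each quote).
const value = '\\\\"$\\\\"#,##0';
const json = JSON.stringify({ key: value, note: "it's" });
const literal = toSingleQuotedLiteral(json);

// Evaluating the generated literal as JavaScript source yields the
// original JSON text again, which then parses to the original value.
const evaluated = new Function("return " + literal + ";")();
console.log(evaluated === json);                  // true
console.log(JSON.parse(evaluated).key === value); // true
```

new Function is used here only to demonstrate that the generated source evaluates back to the same text; in practice the literal would be written into a script file or template.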
In your original problem, why do you need to call JSON.parse in the first place? You could easily have gotten the object you wanted by just doing
var o = { blah }
that is, by manually removing the single quotes you have around the curly braces, rather than doing
$.parseJSON('{blah}')
Is there any reason for evaluating the string first (i.e. var s = '{blah}' and then doing $.parseJSON(s)), which is what your original code was doing? There shouldn't be a case where this is necessary. Since you mentioned somewhere that the string was produced by JSON.stringify, there shouldn't be a scenario where you need to explicitly store it in a variable (i.e. copy and paste it and put quotes around it).
The main problem here is that the string produced by JSON.stringify, which is properly escaped, has been 'evaluated' once when you manually put quotes around it. So the key is to make sure the string doesn't get 'evaluated'.
Even if you wanted to pass the stringified variable to a database or anything else, there is no need to explicitly use quotes. One could do
var s = JSON.stringify(obj);
db.save("myobj",s)
var newObj = JSON.parse(db.load("myobj"))
The string is stored verbatim without getting evaluated, so that when you retrieve it, you would have the exact same string.
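A runnable version of the same idea, with a Map standing in for the database (the db.save and db.load calls above belong to an unspecified store, so this stand-in is purely an assumption for illustration):

```javascript
// A Map stands in for the real storage layer in this sketch.
const db = new Map();

const obj = { format: '\\\\"$\\\\"#,##0', dash: "\u2013" };

// Store the serialized text exactly as JSON.stringify produced it.
db.set("myobj", JSON.stringify(obj));

// Load it back and parse; no manual quoting or escaping is involved.
const newObj = JSON.parse(db.get("myobj"));

console.log(newObj.format === obj.format); // true
console.log(newObj.dash === obj.dash);     // true
```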
