I have a dict that contains some Unicode strings (among other objects). I'd like to save this dict as a JSON file, and then display the content of this via AJAX.
If final_res is the dict, I usually do this:
json.dumps(final_res, ensure_ascii=True)
In the result, I see strings like:
"l\\u00a0m\\u00fcdale"
I imagine these are unicode encoded characters. But when I try to display them in Javascript, these get printed with the slashes, instead of the encoded unicode letter.
Is there something I am doing wrong in Javascript for displaying these properly? Or should I do decode these into ASCII in Python, before outputting to JSON?
UPDATE:
Based on the discussion in the comments below with #spectra, I realized that json.dumps should not be outputting double slashes. When I parse this in the browser, this prints it as a literal single slash.
I am trying to figure out a way to fix this with the json module, not sure why it's happening.
The solution for me was to save the result of json.dumps to the database with the "single-slashed" version. I did it by calling print on the result of json.dumps and then copy-pasting that to the database.
You can encode the json file in UTF8 instead of escaped charaters:
json.dumps(final_res,ensure_ascii=False).encode('utf8')
For example
print json.dumps({'name':u'l\u00a0m\u00fcdale'},ensure_ascii=False).encode('utf8')
# {"name": "l müdale"}
Then in your client side AJAX code, set the encoding to 'utf8' :
How to set encoding in .getJSON JQuery
$.ajax({ contentType: "application/json; charset=utf-8", ... })
Related
I'm parsing a Uint8 array that is an HTML document. It contains a script tag which in turn contains JSON data that I would like to parse.
I first converted the array to text:
data = Buffer.from(str).toString('utf8')
I then searched for the script tag, and extracted the string containing the JSON:
... {\"phrase\":\"Go to \"California\"\",\"color\":\"red\",\"html\":\"<div class=\"myclass\">Ok</div>\"} ...
I then did a replace to clean it up.
data = data.replace(/\\"/g, "\"").replace(/\\/g, "").
{"phrase":"Go to "California"","color":"red","html":"<div class="myclass">Ok</div>"}
I tried to parse using JSON.parse() and got an error because the attributes contain quotes. Is there a way to process this further using a regex ? Or perhaps a library? I am working with Cheerio, so can use that if helpful.
The escape characters are necessary if you want to parse the JSON. The embedded quotes would need to be double escaped, so the extracted text isn't even valid JSON.
"{\"phrase\":\"Go to \\\"California\\\"\",\"color\":\"red\",\"html\":\"<div class=\\\"myclass\\\">Ok</div>\"}"
or, using single quotes:
'{"phrase":"Go to \\"California\\"","color":"red","html":"<div class=\\"myclass\\">Ok</div>"}'
Thanks.
After some more tinkering around, I realized that I should have encoded the data to Uint8 at the source (a Lambda function) before transmitting it for further processing. So now, I have:
Text
Encoded text to Uint8
Return from Lambda function.
Decode from Uint8 to text
Process readily as no escape characters.
Before, I was skipping step 2. And so Lambda was encoded the text however it does by default.
I'm analysing the way a server of a search page works (by inspecting element) and I could conclude that the request is sent with a POST with JSON as parameters.
Then I simulated the same POST (with the same parameters) using Insomnia. It was successful, but the response JSON came as string and inside the JSON, variables that uses quotes now uses \" instead.
An example of the JSON response:
"{\"AudienceRefiner\":{\"ItemCount\":0}}"
How can I read this on Python?
It is just an escape character, Consider using the ast library in order to parse it into an object.
literal_eval:
Safely evaluate an expression node or a Unicode or Latin-1 encoded
string containing a Python literal or container display.
import ast
st = "{\"AudienceRefiner\":{\"ItemCount\":0}}"
obj = ast.literal_eval(st)
print (obj)
>>> {'AudienceRefiner': {'ItemCount': 0}}
For more information about it, read ast.literal_eval
I am not sure where you got "{\"AudienceRefiner\":{\"ItemCount\":0}}" I guess that its not is python but webbrowser. Anyhow just use json if the json is in a string object
import json
di = json.loads("{\"AudienceRefiner\":{\"ItemCount\":0}}")
I came across this strange JSON which I can't seem to decode.
To simplify things, let's say it's a JSON string:
"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"
After decoding it should look as following:
└── mystring
JS or PHP doesn't seem to convert it correctly.
js> JSON.parse('"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"')
ffe2ff94ff94ffe2ff94ff80ffe2ff94ff80 mystring
PHP behaves the same
php> json_decode('"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"')
ffe2ff94ff94ffe2ff94ff80ffe2ff94ff80 mystring
Any ideas how to properly parse this JSON string would be welcome.
It is not valid JSON string - JSON supports only 4 hex digits after \u. Results from both PHP and JS are correct.
It is not possible decode this using standard functions.
Where did you get this JSON string?
About correct json for string you want to get - it should be "\u2514\u2500\u2500 mystring", or just "└── mystring" (json supports any unicode characters in strings except " and \).
Also if you need to encode some character that require more than two bytes - it will result in two escape codes for example "𩄎" would be "\ud864\udd0e" when escaped.
So, If you really need to decode string above - you can fix it before decoding, replacing \uffffffe2 by \uffff\uffe2 via regexp (for js it would be something like: s.replace(/(\\u[A-Fa-f0-9]{4})([A-Fa-f0-9]{4})/gi,'$1\\u$2') ).
But anyway character codes in string specified above does not look right.
I am getting the following json object when I call the URL from Browser which I expect no data in it.
"{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}"
However, when I tried to call it in javascript it gives me error Parsing Json message
dspservice.callService(URL, "GET", "", function (data) {
var dataList = JSON.parse(data);
)};
This code was working before I have no idea why all of a sudden stopped working and throwing me error.
You say the server is returning the JSON (omitting the enclosing quotes):
{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}
This is invalid JSON. The quote marks in JSON surrounding strings and property names should not be preceded by a backslash. The backslash in JSON is strictly for inserting double quote marks inside a string. (It can also be used to escape other characters inside strings, but that is not relevant here.)
Correct JSON would be:
{"data":[], "SkipToken":"", "top":""}
If your server returned this, it would parse correctly.
The confusion here, and the reports by other posters that it seems like your string should work, lies in the fact that in a simple-minded test, where I type this string into the console:
var x = "{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}";
the JavaScript string literal escaping mechanism, which is entirely distinct from the use of escapes in JSON, results in a string with the value
{"data":[], "SkipToken":"", "top":""}
which of course JSON.parse can handle just fine. But Javascript string escaping applies to string literals in source code, not to things coming down from the server.
To fix the server's incorrectly-escaped JSON, you have two possibilities. One is to tell the server guys they don't need to (and must not) put backslashes before quote marks (except for quote marks inside strings). Then everything will work.
The other approach is to undo the escaping yourself before handing it off to JSON.parse. A first cut at this would be a simple regexp such as
data.replace(/\\"/g, '"')
as in
var dataList = JSON.parse(data.replace(/\\"/g, '"')
It might need additional tweaking depending on how the server guys are escaping quotes inside strings; are they sending \"\\"\", or possibly \"\\\"\"?
I cannot explain why this code that was working suddenly stopped working. My best guess is a change on the server side that started escaping the double quotes.
Since there is nothing wrong with the JSON string you gave us, the only other explanation is that the data being passed to your function is something other than what you listed.
To test this hypothesis, run the following code:
dspservice.callService(URL, "GET", "", handler(data));
function handler(data) {
var goodData = "{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}";
alert(goodData); // display the correct JSON string
var goodDataList = JSON.parse(goodData); // parse good string (should work)
alert(data); // display string in question
var dataList = JSON.parse(data); // try to parse it (should fail)
}
If the goodData JSON string can be parsed with no issues, and data appears to be incorrectly-formatted, then you have the answer to your question.
Place a breakpoint on the first line of the handler function, where goodData is defined. Then step through the code. From what you told me in your comments, it is still crashing during a JSON parse, but I'm willing to wager that it is failing on the second parse and not the first.
Did you mean that your JSON is like this?
"{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}"
Then data in your callback would be like this:
'"{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}"'
Because data is the fetched text content string.
You don't have to add extra quotes in your JSON:
{"data":[], "SkipToken":"", "top":""}
I am getting an issue with Javascript Encoding and Decoding. I have a json object which contains a string encoded in UTF-8 like 'R\xc3\xa9union'. To ensure that the Javascript file correctly displays the string, I'm adding an attribute charset to the script tag. The json object is in countries.js. I'm including countries.js as <script src="js/countries.js" charset="UTF-8"></script> and yet it is still being displayed as Réunion instead of Réunion. Any suggestion?
Use escape() combined with decodeURIComponent():
decodeURIComponent(escape('R\xc3\xa9union'));
That should do the trick:
escape('R\xc3\xa9union'); // "R%C3%A9union"
decodeURIComponent("R%C3%A9union"); // "Réunion"
Now, you said you couldn't do this manually for all the places you need strings from the JSON. I really don't know a way to automate this without re-building the JSON with JS, so I'd suggest writing a little "wrapper" function to decode on the fly:
function dc(str){
return decodeURIComponent(escape(str));
}
You can then decode the required strings with minimal effort:
var myString = dc(myJson["some"]["value"]);
Now, what else could work, but is a little more risky: JSON.stringify() the entire object, decode that using the 2 functions, then JSON.parse() it again.