Parsing malformed JSON in JavaScript - javascript

Thanks for looking!
BACKGROUND
I am writing some front-end code that consumes a JSON service which is returning malformed JSON. Specifically, the keys are not surrounded with quotes:
{foo: "bar"}
I have NO CONTROL over the service, so I am correcting this like so:
var scrubbedJson = dirtyJson.replace(/(['"])?([a-zA-Z0-9_]+)(['"])?:/g, '"$2": ');
This gives me well formed JSON:
{"foo": "bar"}
Problem
However, when I call JSON.parse(scrubbedJson), I still get an error. I suspect it may be because the entire JSON string is surrounded in double quotes but I am not sure.
UPDATE
This has been solved--the above code works fine. I had a rogue single quote in the body of the JSON that was returned. I got that out of there and everything now parses. Thanks.
Any help would be appreciated.

You can avoid using a regexp altogether and still output a JavaScript object from a malformed JSON string (keys without quotes, single quotes, etc), using this simple trick:
var jsonify = (function(div){
  return function(json){
    div.setAttribute('onclick', 'this.__json__ = ' + json);
    div.click();
    return div.__json__;
  }
})(document.createElement('div'));
// Let's say you had a string like '{ one: 1 }' (malformed, a key without quotes)
// jsonify('{ one: 1 }') will output a good ol' JS object ;)
Here's a demo: http://codepen.io/csuwldcat/pen/dfzsu (open your console)

Something like this may help to repair the JSON:
$str='{foo:"bar"}';
echo preg_replace('/({)([a-zA-Z0-9]+)(:)/','$1"$2"${3}',$str);
Output:
{"foo":"bar"}
EDIT:
var str='{foo:"bar"}';
str = str.replace(/({)([a-zA-Z0-9]+)(:)/g, '$1"$2"$3'); // assign the result; the g flag repairs every matching key

There is a project that takes care of all kinds of invalid cases in JSON https://github.com/freethenation/durable-json-lint

I was trying to solve the same problem using a regEx in Javascript. I have an app written for Node.js to parse incoming JSON, but wanted a "relaxed" version of the parser (see following comments), since it is inconvenient to put quotes around every key (name). Here is my solution:
var objKeysRegex = /({|,)(?:\s*)(?:')?([A-Za-z_$\.][A-Za-z0-9_ \-\.$]*)(?:')?(?:\s*):/g;// look for object names
var newQuotedKeysString = originalString.replace(objKeysRegex, "$1\"$2\":");// all object names should be double quoted
var newObject = JSON.parse(newQuotedKeysString);
Here's a breakdown of the regEx:
({|,) looks for the beginning of the object, a { for flat objects or , for embedded objects.
(?:\s*) finds but does not remember white space
(?:')? finds but does not remember a single quote (to be replaced by a double quote later). There will be either zero or one of these.
([A-Za-z_$\.][A-Za-z0-9_ \-\.$]*) is the name (or key). It starts with a letter, underscore, $, or dot, followed by zero or more alphanumeric characters, underscores, spaces, dashes, dots, or $.
the last character : is what delimits the name of the object from the value.
Now we can use replace() with some dressing to get our newly quoted keys:
originalString.replace(objKeysRegex, "$1\"$2\":")
where $1 is either { or , depending on whether the object was embedded in another object, \" adds a double quote, $2 is the name, \" adds another double quote, and finally : finishes it off.
Test it out with
{keyOne: "value1", $keyTwo: "value 2", key-3:{key4:18.34}}
output:
{"keyOne": "value1","$keyTwo": "value 2","key-3":{"key4":18.34}}
Some comments:
I have not tested this method for speed, but from what I gather from reading some of these entries, using a regex is faster than eval().
For my application, I'm limiting the characters that names are allowed to have with ([A-Za-z_$\.][A-Za-z0-9_ \-\.$]*) for my 'relaxed' version JSON parser. If you wanted to allow more characters in names (you can do that and still be valid), you could instead use ([^'":]+) to mean anything other than double or single quotes or a colon. You can have all sorts of stuff in here with this expression, so be careful.
One shortcoming is that this method actually changes the original incoming data (but I think that's what you wanted?). You could program around that to mitigate this issue - depends on your needs and resources available.
Hope this helps.
-John L.
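The approach above can be tried against the sample input with a small wrapper. This is only a sketch: relaxedParse is a hypothetical helper name, not something from the answer, and the second call shows a limitation worth knowing about, namely that the regex cannot tell a key from similar-looking text inside a string value.

```javascript
// Same pattern as in the answer above; relaxedParse is a hypothetical helper name.
const objKeysRegex = /({|,)(?:\s*)(?:')?([A-Za-z_$\.][A-Za-z0-9_ \-\.$]*)(?:')?(?:\s*):/g;

function relaxedParse(text) {
  // Quote every bare or single-quoted key, then hand the result to the strict parser.
  return JSON.parse(text.replace(objKeysRegex, '$1"$2":'));
}

const parsed = relaxedParse('{keyOne: "value1", $keyTwo: "value 2", key-3:{key4:18.34}}');
console.log(parsed['key-3'].key4); // 18.34

// Limitation: ", y:" inside the string value is treated as a key and gets quoted,
// which corrupts the text and makes JSON.parse throw.
let failed = false;
try {
  relaxedParse('{a: "x, y: z"}');
} catch (err) {
  failed = err instanceof SyntaxError;
}
console.log(failed); // true
```

So this is probably fine for data you know well, but values containing commas followed by a word and a colon will break it.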

How about?
function fixJson(json) {
  var tempString, tempJson, output;
  tempString = JSON.stringify(json);
  tempJson = JSON.parse(tempString);
  output = JSON.stringify(tempJson);
  return output;
}

Related

Converting double backslash into backslash used for escaping?

I have a javascript string that contains \\n. When this is displayed out to a webpage, it shows literally as \n (as expected). I used text.replace(/\\n/g, '\n') to get it to act as a newline (which is the desired format).
I'm trying to determine the best way to catch all such instances (including similar instances like tabs \\t -> \t).
Is there a way to use regex (can't determine how to copy the matched wildcard letter to use in the replacement string) or anything else?
As mentioned by dandavis in the comments in original post, JSON.parse() ended up working for me.
i.e. text = JSON.parse(text);
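If the text is not itself valid JSON, JSON.parse will throw, so a replace with a callback function is another option. It also covers the part of the question about reusing the matched letter in the replacement. A minimal sketch, assuming only \n, \t, \r and \\ need handling (ESCAPES and unescapeBackslashes are made-up names):

```javascript
// Map from the letter after the backslash to the character it stands for.
const ESCAPES = { n: '\n', t: '\t', r: '\r', '\\': '\\' };

function unescapeBackslashes(text) {
  // The captured letter is passed to the callback as its second argument.
  return text.replace(/\\([ntr\\])/g, (match, letter) => ESCAPES[letter]);
}

// 'a\\nb\\tc' in source is the 7-character string a, \, n, b, \, t, c.
console.log(unescapeBackslashes('a\\nb\\tc') === 'a\nb\tc'); // true
```

Handling the doubled backslash in the same pass keeps a literal backslash followed by n from being turned into a newline by mistake.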
Second answer, the first was wrong.
JavaScript works in a special way in this case.
In your case it should be one of these:
var JSCodeNewLine = "\u000A";
text = text.replace(/\\n/g, JSCodeNewLine);
var JSCodeCarriageReturnNewLine = "\u000D\u000A";
text = text.replace(/\\n/g, JSCodeCarriageReturnNewLine);

Dealing with the Cyrillic encoding in Node.Js / Express App

In my app a user submits text through a form's textarea and this text is passed on to the app and is then processed by jsesc library, which escapes javascript strings.
The problem is that when I type in a text in Russian, such as
нам #интересны наши #идеи
what i get is
'\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438'
I then need to pass this data through FlowDock to extract hashtags, and FlowDock just does not recognize it.
Can someone please tell me
1) What is the need for converting it into that representation;
2) If it makes sense to convert it back to cyrillic encoding for FlowDock and for the database, or shall I keep it in Unicode and try to make FlowDock work with it?
Thanks!
UPDATE
The complete script is:
result = getField(req, field);
result = S(result).trim().collapseWhitespace().s;
// at this point result = "нам #интересны наши #идеи"
result = jsesc(result, {
'quotes': 'double'
});
// now I end up with the Unicode escapes shown above (\u....)
var hashtags = FlowdockText.extractHashtags(result);
FlowDock receives the result which is
\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438
And doesn't extract hashtags from it...
These are 2 representations of the same string:
'нам #интересны наши #идеи' === '\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438'
It looks like flowdock-text doesn't work well with non-ASCII characters.
UPD: I tried it, and it actually works well:
fdt.extractHashtags('\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438');
You shouldn't have used escaping in the first place: it gives you the string literal representation (suitable for eval, etc.), not the string itself.
UPD2: I've reduced your code to the following:
var jsesc = require('jsesc');
var fdt = require('flowdock-text');
var result = 'нам #интересны наши #идеи';
result = jsesc(result, {
'quotes': 'double'
});
var hashtags = fdt.extractHashtags(result);
console.log(hashtags);
As I said, the problem is with jsesc: you don't need it. It returns a JavaScript-escaped string. You need it only when you are building code by concatenation for eval, to protect against code injection, or something like that. For example, if you add result = eval('"' + result + '"');, it will work.
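If the escaped text already exists and has to be turned back into a real string, JSON.parse can do that without eval, at least when the text contains only \uXXXX and other JSON-compatible escapes. That is an assumption: other escape forms such as \xNN would make JSON.parse throw. A small sketch with a shortened sample string:

```javascript
// What the escaped output looks like as data: literal backslashes followed by u043D, etc.
const escaped = String.raw`\u043D\u0430\u043C #\u0438\u0434\u0435\u0438`;

// Wrapping it in double quotes makes it a JSON string literal, which JSON.parse decodes.
const decoded = JSON.parse('"' + escaped + '"');

console.log(escaped === 'нам #идеи'); // false
console.log(decoded === 'нам #идеи'); // true
```

The first comparison is false because the escaped text is a different, longer string made of ASCII characters; only after decoding are the two equal.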
What is the need for converting it into that representation?
jsesc is a JavaScript library for escaping JavaScript strings while generating the shortest possible valid ASCII-only output. Here’s an online demo.
This can be used to avoid mojibake and other encoding issues, or even to avoid errors when passing JSON-formatted data (which may contain U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR, or lone surrogates) to a JavaScript parser or an UTF-8 encoder, respectively.
Sounds like in this case you don’t intend to use jsesc at all.
Try this:
decodeURIComponent("\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438");

How to process a javascript function call that returns a string with quotes in it

I'm having an issue with a javascript call to an api function in NetSuite that returns a string with quotes in it. An error is thrown each time the call is made.
var selling_point_1 = "<%=getCurrentAttribute('item','custitemsellingpoint1')%>";
when looking in the debugger, this evaluates to:
var selling_point_1 = "Product Dimensions: H:14" W:24"";
Any string function (like .length or charAt(0)) on this also throws an error. I have no control over what the function call returns, so I need to know how to handle embedded quotes.
Any help would be greatly appreciated, John
Although not the most robust method you could use:
var selling_point_1 = escape("<%=getCurrentAttribute('item','custitemsellingpoint1')%>");
This is actually for URI escaping, but will get rid of the pesky double quotes, plus you can use unescape to get the original format back. As suggested, using single quotes for the literal:
var selling_point_1 = '<%=getCurrentAttribute(\'item\',\'custitemsellingpoint1\')%>';
Should also work in your case.
See this thread for someone dealing with roughly the same issue. The short answer is that you need to run some kind of escape function in the server-side code (i.e., within the <%=...%> block) so that only escaped values get inserted into the client-side code. All of the solutions below can handle an unlimited number of single and double quotes.
My first suggestion is to try:
var selling_point_1 = decodeURI("<%=Server.URLEncode(getCurrentAttribute('item','custitemsellingpoint1'))%>");
This will produce client-side JS that looks like:
var selling_point_1 = decodeURI("Product Dimensions: H:14%22 W:24%22");
The decodeURI JavaScript function will convert the %22 back into quotes and the correct string will be stored in selling_point_1.
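One caveat, as far as I know: decodeURI leaves the escapes of reserved characters such as %3A (a colon) untouched, and classic ASP's Server.URLEncode is generally described as encoding spaces as +. If that holds for your server, decodeURIComponent plus a + replacement may be the more robust pairing. A sketch under those assumptions (decodeServerValue is a hypothetical helper, and the encoded sample is what I would expect the server to emit, not captured output):

```javascript
function decodeServerValue(encoded) {
  // Turn + back into spaces first, then decode every %XX sequence.
  return decodeURIComponent(encoded.replace(/\+/g, ' '));
}

const encoded = 'Product+Dimensions%3A+H%3A14%22+W%3A24%22';
console.log(decodeServerValue(encoded)); // Product Dimensions: H:14" W:24"

// decodeURI decodes %22 but keeps %3A, because ":" is a reserved URI character.
console.log(decodeURI('H%3A14%22')); // H%3A14"
```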
If that fails, you might also try something like:
var selling_point_1 = unescape("<%=HttpServerUtility.HtmlEncode(getCurrentAttribute('item','custitemsellingpoint1'))%>");
which operates similarly, but turns your quotes into \" sequences, which will be converted back into ordinary quotes by JavaScript's unescape.

Is there a difference between single/double quoted strings passed to javascript's eval?

I have a server message sent via web-sockets. That message is a JSON (validated) string.
When it gets to the browser, I check that it is a string with typeof(data), and it tells me that it is, in fact, a string. When I finally do var some_obj = eval( '(' + data + ')' );
it gives me an Uncaught SyntaxError: Unexpected token ILLEGAL error.
Also, before using eval(), I console.log(data) and it displays correctly, although alert(data) shows nothing in the dialog.
I can't understand what's happening.
I also tried var myJson = '{ "x": "Hello, World!", "y": [1, 2, 3] }'; and then var myObj = eval( '(' + myJson + ')' ); and it works, so I really can't understand why mine can't be evaluated (parsed).
the string received via web-sockets is this:
received 37 bytes » { "cmd": "setname", "params": "ok" }
where data = { "cmd": "setname", "params": "ok" } (with quotes i suppose, because of typeof(data) being = string).
Any tips? Thanks.
edit1 » With web-sockets, you have to prepend a null char (ASCII 0) and append an escape char (ASCII 255) to the output string from the server. I assume the client (browser), since it implements web-sockets, must deal with this and unwrap the string correctly (per the standard), as I do in my server. The thing is, there might be some escape char left over that it doesn't deal with correctly. But the problem only started when I tried to send JSON strings to be eval()ed; otherwise they work properly, like any other string.
No, there's no difference between " and ' for quoting strings other than that you can use " without escaping it inside a string quoted with ' and vice-versa. But I don't think that (the title of your question) actually has anything to do with the problem you're having.
Re your edit, if you want to ensure that there are no characters with the value 0 or 255 in the string, you can do that like this:
data = data.replace(/[\u0000\u00ff]/g, '');
...before passing it to eval. And it sounds like you might want to do that, since your thing is saying it's received 37 bytes but the string is only 36 characters long and doesn't use any characters requiring two bytes (or perhaps it just has a space at the end I can't see).
Off-topic: It's best not to use eval to deserialize JSON. Instead, use a library that handles it directly. Crockford has two different non-eval libs on his github page, one (json_parse.js) that uses a recursive-descent parser and another (json_parse_state.js) that uses a state machine. If you really, really want to use eval to parse JSON, take a look at his implementation in json2.js, which at least takes a couple of steps to weed out malicious stuff.
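Where a native JSON object is available, the stripping step and the parse can be combined without eval. A minimal sketch, assuming stray framing characters are the only problem (parseFrame is a made-up name):

```javascript
function parseFrame(data) {
  // Drop any leftover 0x00 / 0xFF framing characters before parsing.
  const cleaned = data.replace(/[\u0000\u00ff]/g, '');
  return JSON.parse(cleaned);
}

const message = parseFrame('\u0000{ "cmd": "setname", "params": "ok" }\u00ff');
console.log(message.cmd);    // setname
console.log(message.params); // ok
```

Unlike eval, JSON.parse throws a SyntaxError on anything that is not strict JSON instead of executing it.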
Off-topic 2: Re
where data = { "cmd": "setname", "params": "ok" } (with quotes i suppose, because of typeof(data) being = string).
We only use quotes to quote string literals in code; there are no quotes around actual string data itself in memory. If I do this:
var foo = "bar";
...the string that foo points to consists entirely of the characters b, a, and r. There are no quotes; the quotes are only there in the code to tell the parser that what follows is a string literal.
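A quick way to see the difference between the string data and its quoted representation (a small illustrative sketch; asJson is just a name picked for the example):

```javascript
const foo = "bar";
console.log(foo.length); // 3

// JSON.stringify produces the JSON text for the string, and that text does contain quotes.
const asJson = JSON.stringify(foo);
console.log(asJson.length); // 5
console.log(JSON.parse(asJson) === foo); // true
```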

jQuery.parseJSON throws “Invalid JSON” error due to escaped single quote in JSON

I’m making requests to my server using jQuery.post() and my server is returning JSON objects (like { "var": "value", ... }). However, if any of the values contains a single quote (properly escaped like \'), jQuery fails to parse an otherwise valid JSON string. Here’s an example of what I mean (done in Chrome’s console):
data = "{ \"status\": \"success\", \"newHtml\": \"Hello \\\'x\" }";
eval("x = " + data); // { newHtml: "Hello 'x", status: "success" }
$.parseJSON(data); // Invalid JSON: { "status": "success", "newHtml": "Hello \'x" }
Is this normal? Is there no way to properly pass a single quote via JSON?
According to the state machine diagram on the JSON website, only escaped double-quote characters are allowed, not single-quotes. Single quote characters do not need to be escaped.
Update - More information for those that are interested:
Douglas Crockford does not specifically say why the JSON specification does not allow escaped single quotes within strings. However, during his discussion of JSON in Appendix E of JavaScript: The Good Parts, he writes:
JSON's design goals were to be minimal, portable, textual, and a subset of JavaScript. The less we need to agree on in order to interoperate, the more easily we can interoperate.
So perhaps he decided to only allow strings to be defined using double-quotes since this is one less rule that all JSON implementations must agree on. As a result, it is impossible for a single quote character within a string to accidentally terminate the string, because by definition a string can only be terminated by a double-quote character. Hence there is no need to allow escaping of a single quote character in the formal specification.
Digging a little bit deeper, Crockford's org.json implementation of JSON for Java is more permissive and does allow single quote characters:
The texts produced by the toString methods strictly conform to the JSON syntax rules. The constructors are more forgiving in the texts they will accept:
...
Strings may be quoted with ' (single quote).
This is confirmed by the JSONTokener source code. The nextString method accepts escaped single quote characters and treats them just like double-quote characters:
public String nextString(char quote) throws JSONException {
    char c;
    StringBuffer sb = new StringBuffer();
    for (;;) {
        c = next();
        switch (c) {
        ...
        case '\\':
            c = this.next();
            switch (c) {
            ...
            case '"':
            case '\'':
            case '\\':
            case '/':
                sb.append(c);
                break;
            ...
At the top of the method is an informative comment:
The formal JSON format does not allow strings in single quotes, but an implementation is allowed to accept them.
So some implementations will accept single quotes - but you should not rely on this. Many popular implementations are quite restrictive in this regard and will reject JSON that contains single quoted strings and/or escaped single quotes.
Finally to tie this back to the original question, jQuery.parseJSON first attempts to use the browser's native JSON parser or a loaded library such as json2.js where applicable (which on a side note is the library the jQuery logic is based on if JSON is not defined). Thus jQuery can only be as permissive as that underlying implementation:
parseJSON: function( data ) {
    ...
    // Attempt to parse using the native JSON parser first
    if ( window.JSON && window.JSON.parse ) {
        return window.JSON.parse( data );
    }
    ...
    jQuery.error( "Invalid JSON: " + data );
},
As far as I know these implementations only adhere to the official JSON specification and do not accept single quotes, hence neither does jQuery.
If you need a single quote inside of a string, since \' is undefined by the spec, use \u0027. See http://www.utf8-chartable.de/ for all of them.
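The three variants can be compared directly against the native parser. A small sketch (tryParse is a hypothetical helper; String.raw is used so the backslashes reach JSON.parse exactly as written):

```javascript
function tryParse(text) {
  try {
    return JSON.parse(text).newHtml;
  } catch (err) {
    // Report the error type instead of throwing, to keep the comparison readable.
    return err.name;
  }
}

console.log(tryParse(String.raw`{ "newHtml": "Hello \'x" }`));     // SyntaxError
console.log(tryParse(String.raw`{ "newHtml": "Hello 'x" }`));      // Hello 'x
console.log(tryParse(String.raw`{ "newHtml": "Hello \u0027x" }`)); // Hello 'x
```

Only the backslash-escaped apostrophe is rejected; the plain apostrophe and the \u0027 escape both parse to the same value.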
edit: please excuse my misuse of the word backticks in the comments. I meant backslash. My point here is that in the event you have nested strings inside other strings, I think it can be more useful and readable to use unicode instead of lots of backslashes to escape a single quote. If you are not nested however it truly is easier to just put a plain old quote in there.
I understand where the problem lies, and when I look at the specs it's clear that unescaped single quotes should be parsed correctly.
I am using jQuery's jQuery.parseJSON function to parse the JSON string, but I still get the parse error when there is a single quote in the data that is prepared with json_encode.
Could it be a mistake in my implementation that looks like this (PHP - server side):
$data = array();
$elem = array();
$elem['name'] = 'Erik';
$elem['position'] = 'PHP Programmer';
$data[] = json_encode($elem);
$elem = array();
$elem['name'] = 'Carl';
$elem['position'] = 'C Programmer';
$data[] = json_encode($elem);
$jsonString = "[" . implode(", ", $data) . "]";
The final step is that I store the JSON encoded string into an JS variable:
<script type="text/javascript">
employees = jQuery.parseJSON('<?=$marker; ?>');
</script>
If I use "" instead of '' it still throws an error.
SOLUTION:
The only thing that worked for me was to use bitmask JSON_HEX_APOS to convert the single quotes like this:
json_encode($tmp, JSON_HEX_APOS);
Is there another way of tackle this issue? Is my code wrong or poorly written?
Thanks
When you are sending a single quote in a query:
empid = " T'via"
empid = escape(empid)
When you get the value including a single quote:
var xxx = request.QueryString("empid")
xxx = unescape(xxx)
If you want to search for or insert a value that includes a single quote in a query:
xxx = Replace(empid, "'", "''")
I struck a similar issue using CakePHP to output a JavaScript script block using PHP's native json_encode. $contractorCompanies contains values that have single quotation marks and, as explained above and as expected, json_encode($contractorCompanies) doesn't escape them because it's valid JSON.
<?php $this->Html->scriptBlock("var contractorCompanies = jQuery.parseJSON( '".json_encode($contractorCompanies)."' );"); ?>
By adding addslashes() around the JSON encoded string you then escape the quotation marks allowing Cake / PHP to echo the correct javascript to the browser. JS errors disappear.
<?php $this->Html->scriptBlock("var contractorCompanies = jQuery.parseJSON( '".addslashes(json_encode($contractorCompanies))."' );"); ?>
I was trying to save a JSON object from an XHR request into an HTML5 data-* attribute. I tried many of the above solutions with no success.
What I finally ended up doing was replacing the single quote ' with its code &#39; using a regex after the stringify() method call, in the following way:
var productToString = JSON.stringify(productObject);
var quoteReplaced = productToString.replace(/'/g, "&#39;");
var anchor = '<a data-product=\'' + quoteReplaced + '\' href=\'#\'>' + productObject.name + '</a>';
// Here you can use the "anchor" variable to update your DOM element.
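The same idea can be written as a small helper that also escapes ampersands, so that text which already looks like an entity survives the round trip through the attribute. This is a sketch under the assumption that the attribute is delimited with single quotes, as in the snippet above; toAttributeValue and the sample product are made up for the example.

```javascript
function toAttributeValue(obj) {
  // Escape & first so the entities added afterwards are not escaped again.
  return JSON.stringify(obj)
    .replace(/&/g, '&amp;')
    .replace(/'/g, '&#39;');
}

const sampleProduct = { name: "O'Neil & Sons" };
const sampleAnchor = "<a data-product='" + toAttributeValue(sampleProduct) + "' href='#'>link</a>";
console.log(sampleAnchor);
```

When the browser parses the attribute it decodes the entities again, so reading the data-product value back should yield the original JSON text.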
Interesting. How are you generating your JSON on the server end? Are you using a library function (such as json_encode in PHP), or are you building the JSON string by hand?
The only thing that grabs my attention is the escape apostrophe (\'). Seeing as you're using double quotes, as you indeed should, there is no need to escape single quotes. I can't check if that is indeed the cause for your jQuery error, as I haven't updated to version 1.4.1 myself yet.
