Weird symbol in string breaks JSON.parse, but seems to be undetectable? - javascript

The description field is a text area field, somehow a user ended up with some strange little symbol in it. (see image)
When I grab this from the server, I assemble my data from the objects I fetch, which includes the description on this object, turn it into a JSON string, and send it to my JavaScript.
From JavaScript, I JSON.parse it, but that weird little symbol causes the parse to fail. When you look at the string, there is no visible character there, yet JSON.parse throws an unexpected-token error.
My response from the server has the description like this:
"blahblahtesttext\r\nslkdjf",
There is nothing but the expected \r\n......
But it has an unexpected token where that symbol is.
{"value":"blah blah test text//Symbol should be here, but there is nothing and it forces it to the next line
\r\nslkdjf","fieldType":"TEXTAREA","field":"Description"}
Where that symbol forces the string to the next line, which causes the issue.
Because I can't see what the actual character is... I do not know how to handle this.
Is there something that can strip out invalid characters in a JSON string so the parse works? I don't want to just try/catch this as it would toss out everything, I just want that weird invalid symbol to be stripped out.
Or is there a way to see what the actual character is that JSON.parse does not like?

 <-- here is that symbol for copy pasting into a string if you want to try parsing it.
EDIT:
I found that it was doing this in Notepad++
Where you can see that where the line separator was, it is placing actual carriage return and line feed there, breaking the string. It already has \r\n\r\n for the two returns that were placed in the actual text area after that line separator character.
But still unsure of how to deal with this, as that carriage return and line feed do not appear in the string as '\n\r', there is no character representation of them, but instead it actually puts a return there and breaks the string.
NEW EDIT:
Finally found something to get this working. I couldn't do a replace on that line separator character. When I pulled it from my database, it came through as a hidden carriage return. When you manually pressed 'Enter' in the text area, the string I got from the database would actually put a '\r\n' there. But the line separator did not.
So, I added these three lines before parsing to ensure I was escaping any invalid new lines/carriage returns.
result = result.replace(/\r\n/g, '\\r\\n');
result = result.replace(/\r/g, '\\r');
result = result.replace(/\n/g, '\\n');
The '\r\n' that were actually in the string would correctly be escaped already, which tripped me up because I didn't have to worry about escaping those until someone tried introducing this line separator....

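As Xufox says, that appears to be U+2028 (LINE SEPARATOR). JSON.parse shouldn't fail on it, since U+2028 doesn't require escaping in JSON, and Chrome's doesn't. (The stage 4 proposal Xufox pointed out covers the other direction: it lets JavaScript string literals contain a raw U+2028 too, so that valid JSON is also valid JavaScript.) This round-trips without an error: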
const o = {prop: "testing\u2028one two three"};
console.log(JSON.parse(JSON.stringify(o)));
If you need to work around a JSON.parse implementation that doesn't handle it, you could do this:
str = str.replace(/\u2028/g, "\\u2028");
...before running JSON.parse on str.
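To answer the "how do I see what the character actually is" part, one option is to list every control or non-ASCII UTF-16 code unit in the raw text before parsing. This is only a sketch: findSuspectChars and escapeLineSeparators are hypothetical helper names, not part of any library, and the sample string is made up.

```javascript
// Hypothetical helper: report each control or non-ASCII UTF-16 code unit
// together with its index, so invisible characters become visible.
function findSuspectChars(text) {
  const found = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x20 || code > 0x7e) {
      const hex = code.toString(16).toUpperCase().padStart(4, "0");
      found.push({ index: i, codeUnit: "U+" + hex });
    }
  }
  return found;
}

// Hypothetical helper: escape U+2028 and U+2029 for consumers that reject them raw.
function escapeLineSeparators(text) {
  return text.replace(/[\u2028\u2029]/g, (ch) => "\\u" + ch.charCodeAt(0).toString(16));
}

// Made-up sample: a JSON text with a raw U+2028 inside the string value.
const raw = '{"value":"blah\u2028test"}';
console.log(findSuspectChars(raw)); // one entry: index 14, codeUnit "U+2028"
console.log(JSON.parse(escapeLineSeparators(raw)).value === "blah\u2028test"); // true
```

If the listing shows U+000A or U+000D inside a string value instead, the raw line breaks are the problem, and they need escaping as in the question's own fix.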

Related

Angular Markdown Parsing "\n" Newline Characters Issue

I've created a blog where markdown is returned as a string from my backend, however when I return a string with the correct newline characters it does not behave as expected.
I am using ngx-markdown to handle parsing of markdown characters, and using the ngPreserveWhitespaces attribute to ensure this should be functioning as expected.
Example:
// Example #1 Returned String
"## This is a subheader\n This is a sentence"
// Output
<h2>This is a subheader\n This is a sentence</h2>
It displays it as a single line with the newline character physically rendered as text, as above.
However within any of my Angular components I can literally write this same string as a property on the component and return it, and it renders correctly like so:
// Example #2 Hard Coded String
public correct: string = "## This is a subheader\n This is a sentence";
// Output
<h2>This is a subheader</h2>
<p>This is a sentence</p>
As mentioned, the markdown parser I am using is implemented as such:
<markdown ngPreserveWhitespaces>{{content}}</markdown>
I have also attempted to do this by setting ngPreserveWhitespaces within the main/tsconfig files. However I do not believe this is what the issue is, as I can (natively in JS) console.log both my returned string (#1) and the hard coded string (#2) and even my damn logs display differently (with the latter formatting correctly and the former just stringifying the output).
I have attempted:
JSON stringify/parsing the data in multiple ways (as well as without)
I have attempted using regex to manually replace characters
I have attempted to just manually use <br />
I have tried everything outlined here regarding this markdown parser handling whitespace (which I do not believe is the issue)
Nothing appears to appease the newline gods.
Solution
Okay so I've found my solution to this, I am now storing my markdown string as such:
"## This is a subheader\n This is a sentence"
The difference here is that the interface I'm storing this in (MongoDB) already puts speech marks around string values, so it now actually looks like this:
""## This is a subheader\n This is a sentence""
Which is more a weird thing with MongoDB Compass than with what I was attempting.
From this, I am able to JSON.parse the value correctly (previously it was attempting to do so with a string that wasn't valid JSON as it wasn't enclosed in double speechmarks).
I am now handling this in an extension of a Property class I have for each of these called HtmlProperty which when instantiated, parses the value correctly.
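The underlying difference can be shown without Angular. The sketch below assumes the backend value contains a backslash followed by the letter n (two characters), while the hard-coded literal contains one real newline. unescapeViaJson is a hypothetical helper name, and it assumes the text contains no unescaped double quotes.

```javascript
// Assumed backend value: backslash + "n" as two separate characters.
const fromBackend = "## This is a subheader\\n This is a sentence";
// Hard-coded literal: the compiler turns \n into one real newline character.
const hardCoded = "## This is a subheader\n This is a sentence";

console.log(fromBackend === hardCoded); // false
console.log(fromBackend.length - hardCoded.length); // 1

// Hypothetical helper: wrap the text in double quotes so it becomes a JSON
// string, then let JSON.parse resolve the escape sequences.
function unescapeViaJson(text) {
  return JSON.parse('"' + text + '"');
}

console.log(unescapeViaJson(fromBackend) === hardCoded); // true

// Narrower alternative that only handles newlines.
console.log(fromBackend.replace(/\\n/g, "\n") === hardCoded); // true
```

The JSON route also resolves other escapes such as \t, which may or may not be wanted; the plain replace only touches newlines.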

JS - JSON.parse - preserve special characters

I'm running a NodeJS app that gets certain posts from an API.
When I try to JSON.parse a post that contains special characters, JSON.parse fails.
Special characters can be text in any other language, emojis, etc.
Parsing works fine when posts don't have special characters.
I need to preserve all of the text, I can't just ignore those characters since I need to handle every possible language.
I'm getting the following error:
"Unexpected token �"
Example of text I'm supposed to be able to handle:
"summary": "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT&メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦"�アイルランドと英国(他は専門外)※Togetterコメ欄と陰謀論が嫌いです。"
How can I properly parse such a text?
Thanks
You have misdiagnosed your problem, it has nothing to do with that character.
Your code contains an unescaped " immediately before the special character you think is causing the problem. The early " is prematurely terminating the string.
If you insert a backslash to escape the ", your string can be parsed as JSON just fine:
x = '{"summary": "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT&メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦\\"�アイルランドと英国(他は専門外)※Togetterコメ欄と陰謀論が嫌いです。"}';
console.log(JSON.parse(x));
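A short sketch of the same point, using made-up sample text: non-ASCII characters parse fine, and only the unescaped quote breaks the parse. tryParse is a hypothetical helper name. If you control the producer, building the JSON with JSON.stringify avoids hand-escaping altogether.

```javascript
// Hypothetical helper: parse without throwing, reporting success or the error message.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

// Non-ASCII text is not a problem for JSON.parse.
console.log(tryParse('{"summary": "★ アイルランド 😀"}').ok); // true

// A raw double quote inside the string ends it early, so parsing fails.
console.log(tryParse('{"summary": "匦"アイルランド"}').ok); // false

// Escaping the quote (backslash + quote in the JSON text) fixes it.
console.log(tryParse('{"summary": "匦\\"アイルランド"}').ok); // true

// JSON.stringify produces the escaping automatically.
const post = { summary: '匦"アイルランド ★' };
const json = JSON.stringify(post);
console.log(JSON.parse(json).summary === post.summary); // true
```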
You need to pass JSON.parse a string, not an object.
Example
JSON.parse('{"summary" : "a"}');
In your case it should be like this
JSON.parse(
'{"summary" : "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT&メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦�アイルランドと英国(他は専門外)※Togetterコメ欄と陰謀論が嫌いです。"}')

Error Parsing JSON with escaped quotes

I get the following JSON when I call the URL from the browser; I expect it to contain no data.
"{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}"
However, when I call it from JavaScript, I get an "error parsing JSON" message:
dspservice.callService(URL, "GET", "", function (data) {
var dataList = JSON.parse(data);
});
This code was working before; I have no idea why it suddenly stopped working and started throwing an error.
You say the server is returning the JSON (omitting the enclosing quotes):
{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}
This is invalid JSON. The quote marks in JSON surrounding strings and property names should not be preceded by a backslash. The backslash in JSON is strictly for inserting double quote marks inside a string. (It can also be used to escape other characters inside strings, but that is not relevant here.)
Correct JSON would be:
{"data":[], "SkipToken":"", "top":""}
If your server returned this, it would parse correctly.
The confusion here, and the reports by other posters that it seems like your string should work, lies in the fact that in a simple-minded test, where I type this string into the console:
var x = "{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}";
the JavaScript string literal escaping mechanism, which is entirely distinct from the use of escapes in JSON, results in a string with the value
{"data":[], "SkipToken":"", "top":""}
which of course JSON.parse can handle just fine. But Javascript string escaping applies to string literals in source code, not to things coming down from the server.
To fix the server's incorrectly-escaped JSON, you have two possibilities. One is to tell the server guys they don't need to (and must not) put backslashes before quote marks (except for quote marks inside strings). Then everything will work.
The other approach is to undo the escaping yourself before handing it off to JSON.parse. A first cut at this would be a simple regexp such as
data.replace(/\\"/g, '"')
as in
var dataList = JSON.parse(data.replace(/\\"/g, '"'));
It might need additional tweaking depending on how the server guys are escaping quotes inside strings; are they sending \"\\"\", or possibly \"\\\"\"?
I cannot explain why this code that was working suddenly stopped working. My best guess is a change on the server side that started escaping the double quotes.
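The distinction between a string literal in source code and raw text from the server can be reproduced locally. This sketch uses String.raw to stand in for the server response, on the assumption that the response really contains backslashes before the quotes.

```javascript
// String literal in source code: the JavaScript parser removes the backslashes.
const fromLiteral = "{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}";

// Stand-in for the server body: String.raw keeps every backslash as-is.
const fromServer = String.raw`{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}`;

console.log(fromLiteral.includes("\\")); // false
console.log(fromServer.includes("\\")); // true

console.log(JSON.parse(fromLiteral).data.length); // 0

let serverParseFailed = false;
try {
  JSON.parse(fromServer);
} catch (err) {
  serverParseFailed = true; // the backslash before the first quote is not valid JSON
}
console.log(serverParseFailed); // true

// The first-cut repair from above: strip the backslash in front of each quote.
const repaired = JSON.parse(fromServer.replace(/\\"/g, '"'));
console.log(repaired.SkipToken === ""); // true
```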
Since there is nothing wrong with the JSON string you gave us, the only other explanation is that the data being passed to your function is something other than what you listed.
To test this hypothesis, run the following code:
dspservice.callService(URL, "GET", "", handler); // pass the function itself; do not call it here
function handler(data) {
var goodData = "{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}";
alert(goodData); // display the correct JSON string
var goodDataList = JSON.parse(goodData); // parse good string (should work)
alert(data); // display string in question
var dataList = JSON.parse(data); // try to parse it (should fail)
}
If the goodData JSON string can be parsed with no issues, and data appears to be incorrectly-formatted, then you have the answer to your question.
Place a breakpoint on the first line of the handler function, where goodData is defined. Then step through the code. From what you told me in your comments, it is still crashing during a JSON parse, but I'm willing to wager that it is failing on the second parse and not the first.
Did you mean that your JSON is like this?
"{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}"
Then data in your callback would be like this:
'"{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}"'
Because data is the fetched text content string.
You don't have to add extra quotes in your JSON:
{"data":[], "SkipToken":"", "top":""}
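If the server cannot be changed and the body really does arrive as a JSON string that wraps JSON, one hedged workaround is to parse a second time whenever the first parse yields a string. parsePossiblyDoubleEncoded is a hypothetical helper name, and the sample body is an assumption about what the server sends.

```javascript
// Hypothetical helper: parse once, and parse again if the result is still a string.
function parsePossiblyDoubleEncoded(body) {
  let value = JSON.parse(body);
  if (typeof value === "string") {
    value = JSON.parse(value);
  }
  return value;
}

// Assumed body: the outer quotes and the backslashes are part of the response text.
const wrappedBody = '"{\\"data\\":[], \\"SkipToken\\":\\"\\", \\"top\\":\\"\\"}"';
// A correctly encoded body for comparison.
const plainBody = '{"data":[], "SkipToken":"", "top":""}';

console.log(parsePossiblyDoubleEncoded(wrappedBody).data.length); // 0
console.log(parsePossiblyDoubleEncoded(plainBody).top === ""); // true
```

The catch is that a response that is legitimately a plain JSON string would also be parsed twice, so this only suits endpoints known to return objects.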

Parsing inconsistent data

Here's what the data's supposed to look like:
Some junk data
More junk data
1. fairly long key, all on one line
value: some other text with spaces and stuff
2. hey look! another long key. still on one line
value: a different value with some different information
There's several of these per file, usually between twenty and thirty. The total number of key-value pairs exceeds 20,000, meaning manually correcting each file is a non-option. The number prefacing each key is supposed to increment properly. There is supposed to be a newline between a value and the following key. Each value should be prefaced with the string "value: "
Right now, I go line by line and classify each line as either key, value, or junk. I then parse the number out of the key and store the number, key, and value in an object.
Issues arise when the data is improperly formatted. Here are a few issues I've encountered thus far:
no newline between the key and value.
an unexpected newline in the middle of the key or value, which results in the program viewing a portion of each key or value as junk data.
the word "value" being spelled wrong.
I handle the third scenario by computing the Levenshtein distance between the first six characters of each line and a master string "value:". How can I fix the other two issues?
If it matters, the parsing is happening on a node.js server, but I'm open to other languages if they can work with this inconsistent data more easily.
Take a look at this:
RegEx: ^(\d+)\. ?(.+?)(?:value|vlaue|balue|valie): ?(.+?)[\n\r]{2,}
Explained demo here: http://regex101.com/r/gG0wH8
If you have your 'misspelled value' issue fixed you can simplify it to:
^(\d+)\. ?(.+?)value: ?(.+?)[\n\r]{2,}
Otherwise, add as many misspellings as you need, separated by |, in that part of the RegEx.
For this to work I hooked on:
the line must start with digit(s) and a dot, followed by an optional space
key is everything after the id and before the value
value ends after at least 2 line breaks
You should also remove the correct entries and then reexamine the file to check if anything else is missing.
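A minimal Node.js sketch of applying that pattern, under a few assumptions: [\s\S] is used in place of . so a key or value may span a line break, two newlines are appended so the last entry is terminated, and the sample text is made up. parseEntries and collapse are hypothetical helper names.

```javascript
// Hypothetical helper: collapse internal line breaks and runs of spaces.
const collapse = (s) => s.replace(/\s+/g, " ").trim();

// Hypothetical helper: extract { number, key, value } records from messy text.
function parseEntries(text) {
  const entryPattern =
    /^(\d+)\. ?([\s\S]+?)(?:value|vlaue|balue|valie): ?([\s\S]+?)[\n\r]{2,}/gm;
  const entries = [];
  // Append a blank line so the final entry also ends with two line breaks.
  for (const m of (text + "\n\n").matchAll(entryPattern)) {
    entries.push({ number: Number(m[1]), key: collapse(m[2]), value: collapse(m[3]) });
  }
  return entries;
}

// Made-up sample covering the three problems from the question.
const sample = [
  "Some junk data",
  "More junk data",
  "1. fairly long key, all on one line",
  "value: some other text with spaces and stuff",
  "",
  "2. hey look! another long key. value: no newline before this value",
  "",
  "3. a key broken",
  "across two lines",
  "vlaue: misspelled marker",
].join("\n");

console.log(parseEntries(sample));
```

This still misfires when a key itself contains the word "value" or when the blank line between entries is missing, so entries whose numbers do not increment by one are worth flagging for manual review.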

Escaping raw, unescaped strings in bookmarklet

I’m trying to write a search engine bookmarklet (for Chrome), but I’m having trouble escaping the string.
For example if the search engine bookmarklet is the following:
javascript:alert("%s"); //%s is the search engine query, passed literally by chrome.
Then running it on the following string will give incorrect results:
c:\zebra
c:zebra instead of c:\zebra
If the character after the backslash happens to form an actual escape sequence, then the results will vary depending on the character.
I’ve tried escaping and unescaping the string, I’ve tried regexing it, and I’ve tried replacing the backslash with a double backslash, but I cannot figure out a way to get this to work: by the time the raw string enters the script, it has already been unescaped, and any operation after that sees it incorrectly.
How can this be handled correctly?
So far I can only make this work in Chrome:
javascript: var str = (function(){STARTOFSTRING:/*%s*/ENDOFSTRING:;}).toString().match( /STARTOFSTRING:\/\*([\s\S]*)\*\/ENDOFSTRING:/ )[1]; alert(str);
Entering c:\zebra will alert c:\zebra.
Unfortunately, Firefox doesn't preserve the comments inside the function body when the function is decompiled.
You also can't write the sequence */ in the string, but everything else should be passed literally, including quotes " ' etc
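The trick can be tried outside a bookmarklet. The sketch below hard-codes c:\zebra where Chrome would substitute %s, and assumes an engine whose Function.prototype.toString returns the original source text with comments included. rawFromComment is a hypothetical name.

```javascript
// Hypothetical helper: recover text placed inside a comment in a function body.
// Assumes fn.toString() returns the function's source text, comments included.
function rawFromComment(fn) {
  const match = fn.toString().match(/\/\*([\s\S]*)\*\//);
  return match ? match[1] : null;
}

// The comment body stands in for the %s substitution; it is never parsed as a
// string literal, so the backslash is not treated as an escape character.
const recovered = rawFromComment(function () {/*c:\zebra*/});

console.log(recovered); // c:\zebra
console.log(recovered.length); // 8

// For contrast, the same text in a string literal loses the backslash,
// because \z is not a recognised escape sequence.
console.log("c:\zebra"); // c:zebra
```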
