Weird charaters in a stringified buffer in javascript - javascript
I have a particular context in which one data are transformed a lot to get transferred across network. At the end, when I try to get this data back, I have unwanted characters at the beginning of the string.
First, I get the data from a db and it returns it to me as bytes (<Array.<byte>>), fully readable with .toString(). The result is:
{\"company\":\"xxx\",\"email\":\"xxx\",\"firstName\":\"xxx\",\"lastName\":\"xxx\",\"providerId\":\"xxx\",\"role\":\"xxx\",\"status\":\"xxx\"}
These data are passed to another "environment" with a function (not developed by me and that I cannot change) that returns the data in a "I don't really know what format it is".
I can decode it with the following piece of code:
jsonIdentity = JSON.stringify(bufferIdentity);
Buffer.from(JSON.parse(jsonIdentity).payload.buffer.data).toString('utf-8')
However, at the beginning of the string, I have the following:
"\u0008\u0006\u001a�\u0001\u0008�\u0001\u001a{{\"company\":\"xxx\",\"email\":\"xxx\",\"firstName\":\"xxx\",\"lastName\":\"xxx\",\"providerId\":\"xxx\",\"role\":\"xxx\",\"status\":\"xxx\"}
Also represented like that in my logs:
��{{"company":"xxx","email":"xxx","firstName":"xxx","lastName":"xxx","providerId":"xxx","role":"xxx","status":"xxx"
How can I remove it/prevent it to get in my result? It prevents me from using the JSON.
Update: here is the buffer I get:
{"status":200,"message":"","payload":{"buffer":{"type":"Buffer","data":[8,6,26,128,1,8,200,1,26,123,123,34,99,111,109,112,97,110,121,34,58,34,105,98,109,34,44,34,101,109,97,105,108,34,58,34,102,64,105,98,109,46,99,111,109,34,44,34,102,105,114,115,116,78,97,109,101,34,58,34,102,108,111,114,105,97,110,34,44,34,108,97,115,116,78,97,109,101,34,58,34,99,97,115,116,34,44,34,112,114,111,118,105,100,101,114,73,100,34,58,34,102,99,34,44,34,114,111,108,101,34,58,34,117,115,101,114,34,44,34,115,116,97,116,117,115,34,58,34,111,107,34,125,34,64,98,54,57,51,50,51,53,100,49,52,97,49,98,102,57,57,56,100,50,99,97,102,53,53,52,52,100,97,49,50,50,51,55,101,97,55,99,50,56,55,50,49,56,97,101,55,51,100,55,97,50,53,101,52,55,48,48,51,56,52,100,54,53,54,58,14,100,101,102,97,117,108,116,99,104,97,110,110,101,108]},"offset":10,"markedOffset":-1,"limit":133,"littleEndian":true,"noAssert":false}}
Can you try this out:
const yourString = JSON.parse(jsonIdentity).payload.buffer.data;
console.log(Buffer.from(yourString, 'base64').toString('utf-8'))
An ugly solution would be just trim or replace the characters from your result
Problem fixed. Solution is available on Jira here: https://jira.hyperledger.org/browse/FAB-14785?focusedCommentId=58680&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-58680
Related
Javascript: String comparison returns true although the strings are different (intermittent issue)
I'm processing the data in chunks using WritableStream. The decoded data is a json string and in case it starts with , I need to remove the comma. But here's the problem, after the chunk is being decoded to string I'm checking the first character const startsWithComma = chunk.at(0) === ',' and SOMETIMES it returns true although the chunk doesn't start with , and causes the JSON.parse to fail later on. See the attached image. Things I tried: used .at() alternatives like .charAt(), .startsWith(), chunk[0] The issue is intermittent meaning sometimes it can process the entire data and sometimes might fail mid through.
so, expanding on my comment: from your image, is it possible that the debug is running after the comma was already taken out? is it also possible that the chunk may begin with more than 1 comma so sometimes debugging at that exact spot would still show a comma? The solution would be to take off the commas using a while loop such as while( chunk.at(0)===',' ){ chunk = chunk.slice(1).trim(); } now I do not know the reason for doing it if isFirstChunk so I'd leave that alone, but the above loop should solve your startsWithComma issue :D
Weird symbol in string breaks JSON.parse, but seems to be undetectable?
The description field is a text area field, somehow a user ended up with some strange little symbol in it. (see image) When I grab this from the server, I assemble my data from the objects I grab, which includes the description on this object, and turn it into JSON string, and send it to my javascript. From javascript, I JSON.parse it. But that weird little symbol causes the parse to fail. But, when you look at it, there is no character there or anything, yet it throws an undefined character in JSON.parse. My response from the server has the description like this: "blahblahtesttext\r\nslkdjf", There is nothing but the expected \r\n...... But it has an unexpected token where that symbol is. {"value":"blah blah test text//Symbol should be here, but there is nothing and it forces it to the next line \r\nslkdjf","fieldType":"TEXTAREA","field":"Description"} Where that symbol forces the string to the next line, which causes the issue. Because I can't see what the actual character is... I do not know how to handle this. Is there something that can strip out invalid characters in a JSON string so the parse works? I don't want to just try/catch this as it would toss out everything, I just want that weird invalid symbol to be stripped out. Or is there a way to see what the actual character is that JSON.parse does not like? <-- here is that symbol for copy pasting into a string if you want to try parsing it. EDIT: I found that it was doing this in Notepad++ Where you can see that where the line separator was, it is placing actual carriage return and line feed there, breaking the string. It already has \r\n\r\n for the two returns that were placed in the actual text area after that line separator character. But still unsure of how to deal with this, as that carriage return and line feed do not appear in the string as '\n\r', there is no character representation of them, but instead it actually puts a return there and breaks the string. NEW EDIT: Finally found something to get this working. I couldn't do a replace on that line separator character. When I pulled it from my database, it came through as a hidden carriage return. When you manually pressed 'Enter' in the text area, the string I got from the database would actually put a '\r\n' there. But the line separator did not. So, I added these three lines before parsing to ensure I was escaping any invalid new lines/carriage returns. result = result.replace(/\r\n/g, '\\r\\n'); result = result.replace(/\r/g, '\\r'); result = result.replace(/\n/g, '\\n'); The '\r\n' that were actually in the string would correctly be escaped already, which tripped me up because I didn't have to worry about escaping those until someone tried introducing this line separator....
As Xufox says, that appears to be U+2028. JSON.parse shouldn't fail on it since U+2028 doesn't require escaping in JSON; Chrome's doesn't, but that's probably because it's implementing this stage 4 proposal Xufox pointed out: const o = {prop: "testing\u2028one two three"}; console.log(JSON.parse(JSON.stringify(o))); If you need to work around a JSON.parse implementation that doesn't handle it, you could do this: str = str.replace(/\u2028/g, "\\u2028"); ...before running JSON.parse on str.
Error Parsing JSON with escaped quotes
I am getting the following json object when I call the URL from Browser which I expect no data in it. "{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}" However, when I tried to call it in javascript it gives me error Parsing Json message dspservice.callService(URL, "GET", "", function (data) { var dataList = JSON.parse(data); )}; This code was working before I have no idea why all of a sudden stopped working and throwing me error.
You say the server is returning the JSON (omitting the enclosing quotes): {\"data\":[], \"SkipToken\":\"\", \"top\":\"\"} This is invalid JSON. The quote marks in JSON surrounding strings and property names should not be preceded by a backslash. The backslash in JSON is strictly for inserting double quote marks inside a string. (It can also be used to escape other characters inside strings, but that is not relevant here.) Correct JSON would be: {"data":[], "SkipToken":"", "top":""} If your server returned this, it would parse correctly. The confusion here, and the reports by other posters that it seems like your string should work, lies in the fact that in a simple-minded test, where I type this string into the console: var x = "{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}"; the JavaScript string literal escaping mechanism, which is entirely distinct from the use of escapes in JSON, results in a string with the value {"data":[], "SkipToken":"", "top":""} which of course JSON.parse can handle just fine. But Javascript string escaping applies to string literals in source code, not to things coming down from the server. To fix the server's incorrectly-escaped JSON, you have two possibilities. One is to tell the server guys they don't need to (and must not) put backslashes before quote marks (except for quote marks inside strings). Then everything will work. The other approach is to undo the escaping yourself before handing it off to JSON.parse. A first cut at this would be a simple regexp such as data.replace(/\\"/g, '"') as in var dataList = JSON.parse(data.replace(/\\"/g, '"') It might need additional tweaking depending on how the server guys are escaping quotes inside strings; are they sending \"\\"\", or possibly \"\\\"\"? I cannot explain why this code that was working suddenly stopped working. My best guess is a change on the server side that started escaping the double quotes.
Since there is nothing wrong with the JSON string you gave us, the only other explanation is that the data being passed to your function is something other than what you listed. To test this hypothesis, run the following code: dspservice.callService(URL, "GET", "", handler(data)); function handler(data) { var goodData = "{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}"; alert(goodData); // display the correct JSON string var goodDataList = JSON.parse(goodData); // parse good string (should work) alert(data); // display string in question var dataList = JSON.parse(data); // try to parse it (should fail) } If the goodData JSON string can be parsed with no issues, and data appears to be incorrectly-formatted, then you have the answer to your question. Place a breakpoint on the first line of the handler function, where goodData is defined. Then step through the code. From what you told me in your comments, it is still crashing during a JSON parse, but I'm willing to wager that it is failing on the second parse and not the first.
Did you mean that your JSON is like this? "{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}" Then data in your callback would be like this: '"{\"data\":[], \"SkipToken\":\"\", \"top\":\"\"}"' Because data is the fetched text content string. You don't have to add extra quotes in your JSON: {"data":[], "SkipToken":"", "top":""}
Dealing with the Cyrillic encoding in Node.Js / Express App
In my app a user submits text through a form's textarea and this text is passed on to the app and is then processed by jsesc library, which escapes javascript strings. The problem is that when I type in a text in Russian, such as нам #интересны наши #идеи what i get is '\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438' I then need to pass this data through FlowDock to extract hashtags and FlockDock just does not recognize it. Can someone please tell me 1) What is the need for converting it into that representation; 2) If it makes sense to convert it back to cyrillic encoding for FlowDock and for the database, or shall I keep it in Unicode and try to make FlowDock work with it? Thanks! UPDATE The complete script is: result = getField(req, field); result = S(result).trim().collapseWhitespace().s; // at this point result = "нам #интересны наши #идеи" result = jsesc(result, { 'quotes': 'double' }); // now i end up with Unicode as above above (\u....) var hashtags = FlowdockText.extractHashtags(result); FlowDock receives the result which is \u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438 And doesn't extract hashtags from it...
These are 2 representations of the same string: 'нам #интересны наши #идеи' === '\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438' looks like flowdock-text doesn't work well with non-ASCII characters UPD: Tried, actually works well: fdt.extractHashtags('\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438'); You shouldn't have used escaping in the first place, it gives you string literal representation (suits for eval, etc), not a string. UPD2: I've reduced you code to the following: var jsesc = require('jsesc'); var fdt = require('flowdock-text'); var result = 'нам #интересны наши #идеи'; result = jsesc(result, { 'quotes': 'double' }); var hashtags = fdt.extractHashtags(result); console.log(hashtags); As I said, the problem is with jsesc: you don't need it. It returns javascript-encoded string. You need when you are doing eval with concatenation to protect from code injection, or something like this. For example if you add result = eval('"' + result + '"');, it will work.
What is the need for converting it into that representation? jsesc is a JavaScript library for escaping JavaScript strings while generating the shortest possible valid ASCII-only output. Here’s an online demo. This can be used to avoid mojibake and other encoding issues, or even to avoid errors when passing JSON-formatted data (which may contain U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR, or lone surrogates) to a JavaScript parser or an UTF-8 encoder, respectively. Sounds like in this case you don’t intend to use jsesc at all.
Try this: decodeURIComponent("\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438");
convert a structured string into array
this question might have been asked already. But i really have no idea what to search for. If I have a string like {{aa:bb,aaa:bbb,cc:ee{{aa:cd,cdc:dd,{{ss:ee}},kk:ee}},se:ff}} I need to get output in probably in array ar[0] = aa:bb, ar[1]=aaa:bbb, ar[3] = {{...}} I tried using variable.split("}}") which is breaking the string and not getting the actual data. Is there any recursive function to do this? I am not able to search because I have no clear idea of what objects,strings.
If you used an existing format for structuring your string, such as JSON: ["aa:bb","aaa:bbb","cc:ee",["aa:cd","cdc:dd",["ss:ee"],"kk:ee"],"se:ff"] Then you could just run it through JSON.parse(). - It'd be far easier than trying to decode the meaning of that string without being told what it means.
I think what you're looking for is how to parse a JSON string into an object. I'm not certain, but at least it looks like that based on the format of your string. Can you confirm if the source is providing JSON output? If yes: Read this other SO question.