How to convert JSON escaped string into plain HTML-compatible string


I am using an API to compile code. When compilation fails, the error message in the response contains JSON escape sequences, and writing it straight into the HTML front end produces garbage characters. How can I either convert the escaped string to a plain text string using JavaScript, or output it correctly in HTML?
This is what the string looks like when output correctly (in PowerShell):
https://i.imgur.com/tv0BZFl.jpg
This is the escaped string:
\u001b[01m\u001b[K:\u001b[m\u001b[K In function '\u001b[01m\u001b[Kin...
This is what the string looks like if I directly output it in HTML:
[01m[K:[m[K In function '[01m[Kint main()[m[K':
[01m[K:9:1:[m[K [01;31m[Kerror: [m[Kexpected '[01m[K;[m[K' before '[01m[K}[m[K' token
}
[01;32m[K ^[m[K

Looks like you can use the strip-ansi package. Here's an example using your escaped string:
const stripAnsi = require('strip-ansi');
stripAnsi("\u001b[01m\u001b[K:\u001b[m\u001b[K In function '\u001b[01m\u001b[Kin...")
// result => ": In function 'in..."
If you aren't using Node.js, or cannot use that package for whatever reason, this Stack Overflow answer has a regular expression you may be able to use instead.
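For cases where a dependency is not an option, the regular-expression route can be sketched as below. The pattern is a simplified assumption: it only covers CSI sequences of the form ESC [ parameters letter (such as \u001b[01;31m and \u001b[K, the ones in the question), and it is not the full pattern that strip-ansi uses. stripAnsiCodes is a hypothetical helper name.

```javascript
// Simplified pattern (an assumption, not strip-ansi's own regex):
// ESC, '[', optional digits/semicolons, then one letter, e.g. ESC[01;31m or ESC[K.
const ANSI_CSI = /\u001b\[[0-9;]*[A-Za-z]/g;

// Hypothetical helper: removes the matched escape sequences from the text.
function stripAnsiCodes(text) {
  return text.replace(ANSI_CSI, '');
}

const escaped =
  "\u001b[01m\u001b[K:9:1:\u001b[m\u001b[K \u001b[01;31m\u001b[Kerror: \u001b[m\u001b[Kexpected ';'";

console.log(stripAnsiCodes(escaped)); // :9:1: error: expected ';'
```

The stripped text still needs normal HTML escaping before it is inserted into a page, and a pre element (or white-space: pre in CSS) to keep the compiler's line breaks.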

Just found this tool too:
https://www.npmjs.com/package/ansi-to-html
which converts ANSI to html.

Related

UINT8 Array to String without escape characters

I'm parsing a Uint8 array that is an HTML document. It contains a script tag which in turn contains JSON data that I would like to parse.
I first converted the array to text:
data = Buffer.from(str).toString('utf8')
I then searched for the script tag, and extracted the string containing the JSON:
... {\"phrase\":\"Go to \"California\"\",\"color\":\"red\",\"html\":\"<div class=\"myclass\">Ok</div>\"} ...
I then did a replace to clean it up:
data = data.replace(/\\"/g, "\"").replace(/\\/g, "");
which gave:
{"phrase":"Go to "California"","color":"red","html":"<div class="myclass">Ok</div>"}
I tried to parse this using JSON.parse() and got an error because the attributes contain quotes. Is there a way to process this further using a regex? Or perhaps a library? I am working with Cheerio, so I can use that if helpful.

The escape characters are necessary if you want to parse the JSON. The embedded quotes would need to be double escaped, so the extracted text isn't even valid JSON.
"{\"phrase\":\"Go to \\\"California\\\"\",\"color\":\"red\",\"html\":\"<div class=\\\"myclass\\\">Ok</div>\"}"
or, using single quotes:
'{"phrase":"Go to \\"California\\"","color":"red","html":"<div class=\\"myclass\\">Ok</div>"}'
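A minimal sketch of why the escape characters have to stay, using made-up data that mirrors the object in the question. It shows that JSON.stringify adds the backslashes, that the escaped text parses back cleanly, and that stripping the backslashes produces text JSON.parse rejects.

```javascript
// Hypothetical data mirroring the question's object.
const original = {
  phrase: 'Go to "California"',
  color: 'red',
  html: '<div class="myclass">Ok</div>',
};

// JSON.stringify escapes the embedded quotes as \" so the text stays parseable.
const json = JSON.stringify(original);
console.log(json);

// Parsing the escaped text restores the original values.
const parsed = JSON.parse(json);
console.log(parsed.html); // <div class="myclass">Ok</div>

// Removing the backslashes leaves bare quotes inside the strings,
// which is no longer valid JSON.
let stripError = null;
try {
  JSON.parse(json.replace(/\\/g, ''));
} catch (err) {
  stripError = err;
}
console.log(stripError instanceof SyntaxError); // true
```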

Thanks.
After some more tinkering, I realized that I should have encoded the data to Uint8 at the source (a Lambda function) before transmitting it for further processing. So now I have:
1) Text
2) Encode the text to Uint8
3) Return from the Lambda function
4) Decode from Uint8 to text
5) Process readily, as there are no escape characters
Before, I was skipping step 2, so Lambda was encoding the text however it does by default.
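The encode and decode steps described above can be sketched with the standard TextEncoder and TextDecoder classes. This is an illustration only: the payload is made up, and how the bytes actually travel out of the Lambda function is not shown.

```javascript
// Hypothetical payload; in the scenario above this would be built inside the Lambda function.
const payload = JSON.stringify({ phrase: 'Go to "California"', color: 'red' });

// Encode the text to a Uint8Array (UTF-8) before returning it.
const bytes = new TextEncoder().encode(payload);

// Decode the bytes back to text on the receiving side.
const decoded = new TextDecoder('utf-8').decode(bytes);

// The decoded text is identical to the original, so it parses directly.
const data = JSON.parse(decoded);
console.log(data.phrase); // Go to "California"
```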

Encoding in C# and Decoding in Javascript

I have encoded some text in C# like below:
var encodedCredential = Convert.ToBase64String(Encoding.Unicode.GetBytes(JsonConvert.SerializeObject("Sample text")));
The encoded string is: IgBTAGEAbQBwAGwAZQAgAHQAZQB4AHQAIgA=
I want to decode the encoded string in JavaScript.
I have tried the following:
decodeURIComponent(atob("IgBTAGEAbQBwAGwAZQAgAHQAZQB4AHQAIgA="))
decodeURIComponent(atob("IgBTAGEAbQBwAGwAZQAgAHQAZQB4AHQAIgA=").replace(' ',''))
The result is something different: there appear to be spaces between the letters, and I can't even replace them.

You need to use UTF-8 encoding in C#. Produce the Base64 string like this:
Convert.ToBase64String(Encoding.UTF8.GetBytes("Sample text"))
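If the C# side cannot be changed, the original string can still be decoded in JavaScript. Encoding.Unicode in .NET is UTF-16 little-endian, so every ASCII character is followed by a zero byte, and those zero bytes are what look like spaces after atob. A sketch, assuming an environment that provides atob and TextDecoder (modern browsers, recent Node.js); decodeUtf16Base64 is a hypothetical helper name.

```javascript
// Hypothetical helper: Base64 -> bytes -> text, treating the bytes as UTF-16LE.
function decodeUtf16Base64(base64) {
  // atob returns a "binary string" in which each character holds one byte.
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder('utf-16le').decode(bytes);
}

// The value from the question. JsonConvert.SerializeObject wrapped the text
// in quotes, so the decoded result is itself a JSON string literal.
const jsonText = decodeUtf16Base64('IgBTAGEAbQBwAGwAZQAgAHQAZQB4AHQAIgA=');
console.log(jsonText);             // "Sample text"
console.log(JSON.parse(jsonText)); // Sample text
```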

@King_Fisher, you shouldn't be getting additional spaces. Also, the replace method with a string pattern replaces only the first occurrence.
Here's what I did with your code (see attached screenshot)

JSON unicode characters conversion

I came across this strange JSON which I can't seem to decode.
To simplify things, let's say it's a JSON string:
"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"
After decoding it should look as following:
└── mystring
JS or PHP doesn't seem to convert it correctly.
js> JSON.parse('"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"')
ffe2ff94ff94ffe2ff94ff80ffe2ff94ff80 mystring
PHP behaves the same
php> json_decode('"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"')
ffe2ff94ff94ffe2ff94ff80ffe2ff94ff80 mystring
Any ideas how to properly parse this JSON string would be welcome.

It is not a valid JSON string: JSON supports exactly 4 hex digits after \u. The results from both PHP and JS are correct.
It is not possible to decode this using standard functions.
Where did you get this JSON string?
The correct JSON for the string you want would be "\u2514\u2500\u2500 mystring", or just "└── mystring" (JSON allows any Unicode character in strings except ", \, and control characters, which must be escaped).
Also, a character outside the Basic Multilingual Plane (one that needs more than 16 bits) is escaped as two codes, a surrogate pair: for example, "𩄎" becomes "\ud864\udd0e".
So, if you really need to decode the string above, you can fix it before decoding by replacing \uffffffe2 with \uffff\uffe2 via a regexp (in JS, something like: s.replace(/(\\u[A-Fa-f0-9]{4})([A-Fa-f0-9]{4})/gi,'$1\\u$2') ).
But in any case, the character codes in the string above do not look right.
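One possible reading of the odd codes, offered as an assumption rather than a confirmed diagnosis: each \uffffffXX looks like a single byte that was sign-extended to 32 bits before being printed, and the bytes e2 94 94, e2 94 80, e2 94 80 are the UTF-8 encodings of └ (U+2514) and ─ (U+2500). Under that assumption, the low byte of each code can be collected and the run decoded as UTF-8. repairSignExtendedBytes is a hypothetical helper, and the sketch does not re-escape quotes or backslashes that might appear in the decoded text.

```javascript
// Hypothetical helper, valid only under the sign-extension assumption above.
function repairSignExtendedBytes(text) {
  const decoder = new TextDecoder('utf-8');
  // Match each run of consecutive \uffffffXX codes.
  return text.replace(/(?:\\uffffff[0-9a-f]{2})+/gi, (run) => {
    // Keep only the final two hex digits of every code: the original byte.
    const bytes = run
      .split(/\\uffffff/i)
      .filter((hex) => hex !== '')
      .map((hex) => parseInt(hex, 16));
    return decoder.decode(Uint8Array.from(bytes));
  });
}

// String.raw keeps the backslashes as literal characters.
const broken = String.raw`"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"`;

const repaired = repairSignExtendedBytes(broken);
console.log(JSON.parse(repaired)); // └── mystring
```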

Dealing with the Cyrillic encoding in Node.Js / Express App

In my app a user submits text through a form's textarea and this text is passed on to the app and is then processed by jsesc library, which escapes javascript strings.
The problem is that when I type in a text in Russian, such as
нам #интересны наши #идеи
what I get is
'\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438'
I then need to pass this data through FlowDock to extract hashtags, and FlowDock just does not recognize it.
Can someone please tell me:
1) What is the need for converting it into that representation?
2) Does it make sense to convert it back to Cyrillic for FlowDock and for the database, or should I keep it as Unicode escapes and try to make FlowDock work with it?
Thanks!
UPDATE
The complete script is:
result = getField(req, field);
result = S(result).trim().collapseWhitespace().s;
// at this point result = "нам #интересны наши #идеи"
result = jsesc(result, {
'quotes': 'double'
});
// now I end up with Unicode escapes as above (\u....)
var hashtags = FlowdockText.extractHashtags(result);
FlowDock receives the result which is
\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438
And doesn't extract hashtags from it...

These are 2 representations of the same string:
'нам #интересны наши #идеи' === '\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438'
It looks like flowdock-text doesn't work well with non-ASCII characters.
UPD: I tried it, and it actually works well:
fdt.extractHashtags('\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438');
You shouldn't have used escaping in the first place: it gives you a string literal representation (suitable for eval, etc.), not the string itself.
UPD2: I've reduced your code to the following:
var jsesc = require('jsesc');
var fdt = require('flowdock-text');
var result = 'нам #интересны наши #идеи';
result = jsesc(result, {
'quotes': 'double'
});
var hashtags = fdt.extractHashtags(result);
console.log(hashtags);
As I said, the problem is with jsesc: you don't need it. It returns a JavaScript-escaped string. You need it only when you build code for eval by concatenation and want to protect against code injection, or something like that. For example, if you add result = eval('"' + result + '"');, it will work.
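The difference between a string and its escaped literal representation can be seen without any libraries. In this sketch, escapedText is a hand-written stand-in for the kind of output an escaping library returns (it is not produced by jsesc here), and JSON.parse is used in place of eval to turn the escapes back into characters.

```javascript
// Two ways of writing the same string in source code.
const original = 'нам #идеи';
const sameString = '\u043D\u0430\u043C #\u0438\u0434\u0435\u0438';
console.log(original === sameString); // true

// Hand-written stand-in for escaped output: the backslashes are real
// characters here, so this is the text of a literal, not the string itself.
const escapedText = '\\u043D\\u0430\\u043C #\\u0438\\u0434\\u0435\\u0438';
console.log(escapedText === original); // false

// Wrapping the text in quotes and parsing it as JSON resolves the escapes
// without resorting to eval.
const restored = JSON.parse('"' + escapedText + '"');
console.log(restored === original); // true
```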

What is the need for converting it into that representation?
jsesc is a JavaScript library for escaping JavaScript strings while generating the shortest possible valid ASCII-only output. Here’s an online demo.
This can be used to avoid mojibake and other encoding issues, or even to avoid errors when passing JSON-formatted data (which may contain U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR, or lone surrogates) to a JavaScript parser or a UTF-8 encoder, respectively.
Sounds like in this case you don’t intend to use jsesc at all.

Try this:
decodeURIComponent("\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438");

Why are endline characters illegal in HTML string sent over ajax?

Within HTML, it is okay to have endline characters. But when I try to send HTML strings that have endline characters over AJAX to have them operated with JavaScript/jQuery, it returns an error that says that endline characters are illegal. For example, if I have a Ruby string:
"<div>Hello</div>"
and jsonify it with Ruby by to_json, and send it over ajax, parse it within JavaScript by JSON.parse, and insert that in jQuery like:
$('body').append('<div>Hello</div>');
then it does not return an error, but if I do a similar thing with a string like
"<div>Hello\n</div>"
it returns an error. Why are they legal in HTML and illegal in AJAX? Are there any other differences between a legal HTML string loaded as a page and legal HTML string sent over ajax?

String literals can contain line breaks; they just need to be escaped with a backslash, like so:
var string = "hello\
world!";
However, this does not create a line break in the string; that requires an explicit \n escape sequence. The literal above technically yields "helloworld!". Doing
var string = "hello"
+ "world"
would be much cleaner

Specify the type of the AJAX call as 'html'. Otherwise jQuery will try to infer the type when parsing the response.
If the response is JSON, newlines should be escaped.
I'd recommend using a library to serialize JSON. You're unlikely to handle all the edge cases if you roll your own.
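A small sketch of what "newlines should be escaped" means in practice, using the built-in JSON object as the serializer (the HTML fragment is the one from the question). A serializer writes the newline as the two characters backslash and n, while a raw newline inside a JSON string is rejected by the parser.

```javascript
const html = '<div>Hello\n</div>';

// The serializer emits backslash + n instead of a raw newline.
const json = JSON.stringify(html);
console.log(json);                // "<div>Hello\n</div>"
console.log(json.includes('\n')); // false

// Parsing the serialized text restores the real newline.
console.log(JSON.parse(json) === html); // true

// A hand-built JSON string with a raw newline inside the quotes is invalid.
let rawNewlineError = null;
try {
  JSON.parse('"<div>Hello\n</div>"');
} catch (err) {
  rawNewlineError = err;
}
console.log(rawNewlineError instanceof SyntaxError); // true
```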

String literals in JavaScript written with single or double quotes MUST appear on a single line, unless the line break itself is escaped:
var str = "abc \
def";
However note that the newline is escaped and will not appear in the string itself.
The best option is \n, but note that if it is already going through something that parses \n then you will need to double-escape it as \\n.
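The three cases described above can be compared side by side. The variable names are illustrative only.

```javascript
// A backslash at the end of the line continues the literal;
// the line break itself does not become part of the string.
const continued = "abc \
def";
console.log(continued === 'abc def'); // true

// \n inside the literal produces a real newline character.
const withNewline = "abc\ndef";
console.log(withNewline.split('\n').length); // 2

// \\n produces a backslash followed by the letter n, which is what is needed
// when a later parsing step is expected to interpret the \n itself.
const doubleEscaped = "abc\\ndef";
console.log(doubleEscaped.length); // 8
```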

Seeing how you're already escaping the JSON properly by using to_json in Ruby, I do believe the bug is in jQuery; when there are newlines in the string it has trouble determining whether you meant to create a single element or a document fragment. This would work just fine:
var str = "<div>Hello\n</div>";
var wrapper = document.createElement('div');
wrapper.innerHTML = str;
$('body').append(wrapper);
