UINT8 Array to String without escape characters - javascript

I'm parsing a Uint8 array that is an HTML document. It contains a script tag which in turn contains JSON data that I would like to parse.
I first converted the array to text:
data = Buffer.from(str).toString('utf8')
I then searched for the script tag, and extracted the string containing the JSON:
... {\"phrase\":\"Go to \"California\"\",\"color\":\"red\",\"html\":\"<div class=\"myclass\">Ok</div>\"} ...
I then ran a replace to clean it up:
data = data.replace(/\\"/g, "\"").replace(/\\/g, "")
which produced:
{"phrase":"Go to "California"","color":"red","html":"<div class="myclass">Ok</div>"}
I tried to parse this using JSON.parse() and got an error because the attribute values contain quotes. Is there a way to process this further using a regex? Or perhaps a library? I am working with Cheerio, so I can use that if helpful.

The escape characters are necessary if you want to parse the JSON. The embedded quotes would need to be double escaped, so the extracted text isn't even valid JSON.
"{\"phrase\":\"Go to \\\"California\\\"\",\"color\":\"red\",\"html\":\"<div class=\\\"myclass\\\">Ok</div>\"}"
or, using single quotes:
'{"phrase":"Go to \\"California\\"","color":"red","html":"<div class=\\"myclass\\">Ok</div>"}'
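A minimal sketch of the point above. The object is reconstructed from the question's sample data, and the variable names are illustrative. Letting JSON.stringify produce the text shows which escapes valid JSON needs, and why stripping the backslashes breaks parsing:

```javascript
// Object reconstructed from the question's example data (an assumption).
const original = {
  phrase: 'Go to "California"',
  color: 'red',
  html: '<div class="myclass">Ok</div>',
};

// JSON.stringify escapes the embedded quotes as \" so the text stays parseable.
const json = JSON.stringify(original);
console.log(json);

// The round trip works because the escapes are intact.
const roundTrip = JSON.parse(json);

// Stripping the backslashes, as in the question, leaves text that is no longer JSON.
const stripped = json.replace(/\\"/g, '"').replace(/\\/g, '');
let strippedParses = true;
try {
  JSON.parse(stripped);
} catch (err) {
  strippedParses = false; // the inner quotes terminate the string early
}
console.log(strippedParses);
```

In other words, the backslashes in the extracted text are part of the JSON syntax, not noise to be cleaned up.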

Thanks.
After some more tinkering, I realized that I should have encoded the data to Uint8 at the source (a Lambda function) before transmitting it for further processing. So now the flow is:
1. Start with the text.
2. Encode the text to Uint8.
3. Return it from the Lambda function.
4. Decode from Uint8 back to text.
5. Process it directly, as there are no escape characters.
Before, I was skipping step 2, so Lambda was encoding the text however it does by default.
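The encode and decode steps described above could look roughly like this. It is a sketch only: how the bytes travel out of the Lambda function is not shown in the post, so both sides are simulated in one script, and the sample text is an assumption.

```javascript
// Sender side (for example, inside the Lambda handler): text -> UTF-8 bytes.
const text = '{"phrase":"Go to \\"California\\"","color":"red"}';
const bytes = new TextEncoder().encode(text); // Uint8Array

// Receiver side: UTF-8 bytes -> text, then parse as usual.
const decoded = new TextDecoder('utf-8').decode(bytes);
const parsed = JSON.parse(decoded);
console.log(parsed.phrase);
```

Because the bytes are decoded back into exactly the text that was encoded, the JSON escapes survive and no cleanup step is needed before JSON.parse.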

Related

Keeping escaped unicode characters with JSON.stringify of JSON.parse

I have an input JSON like this (which really contains the literal values "\u2013" (the encoded form of a unicode character)):
{"source":"Subject: NEED: 11/5 BNA-MSL \u2013 1200L Departure - 1 Pax"}
I read it with JSON.parse and it reads the \u2013 as –, which is fine for display in my app.
However, I need to export again the same JSON, to send it down to some other app. I want to keep the same format and have back the \u2013 into the JSON. I am doing JSON.stringify, but it keeps the – in the output.
Any idea what I could do to keep the \u syntax?
Using a replacer function in a JSON.stringify call didn't work - strings returned from the replacer with an escaped backslash produce a double backslash in output, and a single backslashed character is unescaped in output if possible.
Simply re-escaping the stringify result has potential:
const obj = {"source":"Subject: NEED: 11/5 BNA-MSL \u2013 1200L Departure - 1 Pax"}
console.log(" stringify: ", JSON.stringify( obj));
console.log("& replaceAll: ", JSON.stringify(obj).replaceAll('\u2013', '\\u2013'));
using more complex string modifications as necessary.
However, this looks very much like an XY problem. It would probably be better to fix the downstream parsing so that it handles JSON text as JSON text rather than using it in raw form, particularly given that JSON text is encoded in UTF-8 and can carry non-ASCII characters without special treatment.
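If re-escaping really is required, the replaceAll idea above can be generalized to every non-ASCII character. This is a sketch: stringifyAscii is a hypothetical helper name, and the sample input is shortened from the question.

```javascript
// Hypothetical helper: stringify, then re-escape every non-ASCII UTF-16 code unit.
// Without the "u" flag the regex matches code units one at a time, so characters
// outside the BMP come out as two \uXXXX escapes (a surrogate pair), as JSON expects.
function stringifyAscii(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

// The doubled backslash keeps \u2013 as literal text inside the JSON input.
const obj = JSON.parse('{"source":"BNA-MSL \\u2013 1200L"}');
const out = stringifyAscii(obj);
console.log(out);
```

The output is still valid JSON, so any conforming parser downstream reads it back to the same value.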

How to convert JSON escaped string into plain HTML-compatible string

I am using an API to compile code, and when there is an error, the response containing the error message uses JSON escape characters, but when outputting it back into the HTML front-end, it just produces garbage characters. How can I either convert the escaped string to a plain text string using Javascript, or output it in HTML correctly?
This is what the string looks like properly outputted (in Powershell):
https://i.imgur.com/tv0BZFl.jpg
This is the escaped string:
\u001b[01m\u001b[K:\u001b[m\u001b[K In function '\u001b[01m\u001b[Kin...
This is what the string looks like if I directly output it in HTML:
[01m[K:[m[K In function '[01m[Kint main()[m[K':
[01m[K:9:1:[m[K [01;31m[Kerror: [m[Kexpected '[01m[K;[m[K' before '[01m[K}[m[K' token
}
[01;32m[K ^[m[K
Looks like you can use the strip-ansi package. Here's an example using your escaped string:
const stripAnsi = require('strip-ansi');
stripAnsi("\u001b[01m\u001b[K:\u001b[m\u001b[K In function '\u001b[01m\u001b[Kin...")
// result => ": In function 'in..."
If you aren't using node.js, or cannot use that package for whatever reason, this Stack Overflow answer has a regular expression you may be able to use instead.
Just found this tool too:
https://www.npmjs.com/package/ansi-to-html
which converts ANSI to html.
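For cases where adding a package is not an option, a small regular expression can strip the sequences seen in this output. This is a sketch, not a replacement for strip-ansi: stripCsi is a hypothetical helper that only handles CSI sequences (ESC followed by "[", parameters and a final byte), which covers the color and erase codes in the compiler message above.

```javascript
// Hypothetical helper: remove ANSI CSI sequences such as ESC[01;31m and ESC[K.
// Other escape types (OSC, single-character escapes) are not handled here.
function stripCsi(text) {
  return text.replace(/\u001b\[[0-9;?]*[ -\/]*[@-~]/g, '');
}

// Sample built from the compiler output shown in the question.
const sample = "\u001b[01;31m\u001b[Kerror: \u001b[m\u001b[Kexpected ';'";
const clean = stripCsi(sample);
console.log(clean);
```

When writing the result into the page, assigning it to an element's textContent rather than innerHTML avoids having characters like < in compiler messages interpreted as markup.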

Encoding in C# and Decoding in Javascript

I have encoded some text in C# like below:
var encodedCredential = Convert.ToBase64String(Encoding.Unicode.GetBytes(JsonConvert.SerializeObject("Sample text")));
The encoded string is: IgBTAGEAbQBwAGwAZQAgAHQAZQB4AHQAIgA=
I want to decode the encoded string in JavaScript.
I have tried the below
decodeURIComponent(atob("IgBTAGEAbQBwAGwAZQAgAHQAZQB4AHQAIgA="))
decodeURIComponent(atob("IgBTAGEAbQBwAGwAZQAgAHQAZQB4AHQAIgA=").replace(' ',''))
The result is not what I expected: there seems to be a space after each letter, and I can't even remove those spaces with replace.
You need to use UTF-8 encoding in C#. Produce the Base64 string with this call:
Convert.ToBase64String(Encoding.UTF8.GetBytes("Sample text"))
@King_Fisher, you shouldn't be getting additional spaces. Also, replace() with a string pattern replaces only the first occurrence.
Here's what I did with your code (see attached screenshot)
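If the C# side cannot be changed, the original string can still be decoded. Encoding.Unicode in .NET is UTF-16 little-endian, so every ASCII character is followed by a zero byte, and those NUL characters are most likely what showed up as "spaces". A sketch using atob and TextDecoder (variable names are illustrative):

```javascript
const b64 = 'IgBTAGEAbQBwAGwAZQAgAHQAZQB4AHQAIgA=';

// atob yields a "binary string": one character per byte.
const bytes = Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));

// Decode the bytes as UTF-16LE, matching Encoding.Unicode on the C# side.
const jsonText = new TextDecoder('utf-16le').decode(bytes);

// The C# code ran JsonConvert.SerializeObject first, so the decoded text is a
// JSON string literal (with surrounding quotes) and still needs JSON.parse.
const value = JSON.parse(jsonText);
console.log(value);
```

decodeURIComponent is not needed here, since nothing in the pipeline percent-encodes the data.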

JSON unicode characters conversion

I came across this strange JSON which I can't seem to decode.
To simplify things, let's say it's a JSON string:
"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"
After decoding it should look as following:
└── mystring
JS or PHP doesn't seem to convert it correctly.
js> JSON.parse('"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"')
ffe2ff94ff94ffe2ff94ff80ffe2ff94ff80 mystring
PHP behaves the same
php> json_decode('"\uffffffe2\uffffff94\uffffff94\uffffffe2\uffffff94\uffffff80\uffffffe2\uffffff94\uffffff80 mystring"')
ffe2ff94ff94ffe2ff94ff80ffe2ff94ff80 mystring
Any ideas how to properly parse this JSON string would be welcome.
This is not a valid encoding of that text: JSON accepts exactly 4 hex digits after \u, so \uffffffe2 is read as \uffff followed by the literal characters ffe2. The results from both PHP and JS are therefore correct.
It is not possible to decode this using standard functions.
Where did you get this JSON string?
The correct JSON for the string you want would be "\u2514\u2500\u2500 mystring", or just "└── mystring" (JSON allows any Unicode character in strings except ", \ and control characters, which must be escaped).
Also, a character outside the Basic Multilingual Plane is written as two escape codes (a surrogate pair); for example, "𩄎" becomes "\ud864\udd0e" when escaped.
So, if you really need to decode the string above, you can fix it before decoding by replacing \uffffffe2 with \uffff\uffe2 via a regexp (for JS it would be something like: s.replace(/(\\u[A-Fa-f0-9]{4})([A-Fa-f0-9]{4})/gi,'$1\\u$2') ).
But in any case, the character codes in the string above do not look right.
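One possible reading, not confirmed by the source of the data: the low byte of each escape (e2 94 94, e2 94 80, e2 94 80) is exactly the UTF-8 encoding of └──, which suggests the producer printed signed bytes as sign-extended hex. Under that assumption, a sketch like the following recovers the text. decodeSignExtendedBytes is a hypothetical helper, not a standard function.

```javascript
// Hypothetical helper: treats every \uffffffXX as one sign-extended byte XX
// and decodes each run of such bytes as UTF-8.
function decodeSignExtendedBytes(raw) {
  const utf8 = new TextDecoder('utf-8');
  return raw.replace(/(?:\\uffffff[0-9a-f]{2})+/gi, (run) => {
    const bytes = run
      .toLowerCase()
      .split('\\uffffff')
      .filter(Boolean)
      .map((hex) => parseInt(hex, 16));
    return utf8.decode(Uint8Array.from(bytes));
  });
}

// The raw text as received; backslashes are doubled so they stay literal here.
const raw =
  '"\\uffffffe2\\uffffff94\\uffffff94\\uffffffe2\\uffffff94\\uffffff80' +
  '\\uffffffe2\\uffffff94\\uffffff80 mystring"';

const fixed = decodeSignExtendedBytes(raw); // now ordinary JSON text
const value = JSON.parse(fixed);
console.log(value);
```

The fix has to run on the raw text before JSON.parse, because after parsing the original byte values can no longer be told apart from ordinary characters.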

JavaScript Encoding & Decoding Error

I am having an issue with JavaScript encoding and decoding. I have a JSON object which contains a string encoded in UTF-8, like 'R\xc3\xa9union'. To ensure that the JavaScript file displays the string correctly, I'm adding a charset attribute to the script tag. The JSON object is in countries.js. I'm including countries.js as <script src="js/countries.js" charset="UTF-8"></script> and yet it is still being displayed as RÃ©union instead of Réunion. Any suggestions?
Use escape() combined with decodeURIComponent():
decodeURIComponent(escape('R\xc3\xa9union'));
That should do the trick:
escape('R\xc3\xa9union'); // "R%C3%A9union"
decodeURIComponent("R%C3%A9union"); // "Réunion"
Now, you said you couldn't do this manually for all the places you need strings from the JSON. I really don't know a way to automate this without re-building the JSON with JS, so I'd suggest writing a little "wrapper" function to decode on the fly:
function dc(str) {
  return decodeURIComponent(escape(str));
}
You can then decode the required strings with minimal effort:
var myString = dc(myJson["some"]["value"]);
Now, what else could work, but is a little more risky: JSON.stringify() the entire object, decode that using the 2 functions, then JSON.parse() it again.
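Since escape() is deprecated, the same repair can also be sketched with TextDecoder, together with a recursive walk that avoids the stringify/parse round trip. Both function names are hypothetical, and the sketch assumes every character in the affected strings has a code of 0xFF or below (that is, each character really stands for one byte).

```javascript
// Hypothetical helper: reinterpret a string whose characters are really UTF-8 bytes.
// Assumes every char code is <= 0xFF; anything larger is truncated to its low byte.
function decodeByteString(str) {
  const bytes = Uint8Array.from(str, (ch) => ch.charCodeAt(0) & 0xff);
  return new TextDecoder('utf-8').decode(bytes);
}

// Hypothetical helper: walk parsed JSON and repair every string value.
// Object keys are left untouched.
function decodeDeep(value) {
  if (typeof value === 'string') return decodeByteString(value);
  if (Array.isArray(value)) return value.map(decodeDeep);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [key, decodeDeep(inner)])
    );
  }
  return value;
}

const countries = { RE: { name: 'R\xc3\xa9union' }, list: ['R\xc3\xa9union'] };
const repaired = decodeDeep(countries);
console.log(repaired.RE.name);
```

Running the repair once on the whole object after loading avoids calling a wrapper at every place a string is read.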
