Which string encoding is this?

On one of my webhooks, I am receiving the following string
9\x09?\x09\x02\x09&\x09#
This was supposed to be text in a regional language. For English, the content seems fine, but for vernacular strings the service provider sends this. Given that I am using JavaScript, how do I decode this string?
This is how the webhook is being called:
/api/test?content=9\x09?\x09\x02\x09&\x09#&timestamp=20210120145223

It's ASCII with hexadecimal escapes. Each four-character sequence \xST stands for a single character whose code is ST (in hexadecimal), where S and T are any of 0123456789abcdefABCDEF.
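A minimal sketch of that conversion, assuming the webhook value arrives with literal backslashes exactly as shown in the question. The function name decodeHexEscapes is hypothetical. The thread does not say which multi-byte encoding the provider used, so the sketch stops at the raw character codes.

```javascript
// Replace every \xHH escape (backslash, "x", two hex digits) with the
// single character whose code is HH.
function decodeHexEscapes(input) {
  return input.replace(/\\x([0-9a-fA-F]{2})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// The value as it appears in the webhook URL (String.raw keeps the backslashes literal).
const raw = String.raw`9\x09?\x09\x02\x09&\x09#`;
const decoded = decodeHexEscapes(raw);

// Show the resulting character codes in hex.
const codes = Array.from(decoded, (ch) =>
  ch.charCodeAt(0).toString(16).padStart(2, '0')
).join(' ');
console.log(codes); // 39 09 3f 09 02 09 26 09 23
```

Interpreting those codes as text in a particular encoding is a separate step that depends on what the service provider actually sends.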

Related

Get a string between two strings in Javascript

I have the below string that I need help pulling an ID from in Presto. Presto uses the javascript regex. I've searched multiple options including:
JavaScript text between double quotes
Javascript regex to extract all characters between quotation marks following a specific word
I need to pull the GA Client ID which looks like this:
75714ae471df63202106404675dasd800097erer1849995367
Below is a snippet showing where it sits in the string.
The struggle is that the "s:38:" is not constant. The number can be anything. For example, it could be s:40: or s:1000: etc. I need it to return just the alphanumeric id.
String Snippet
"GA_ClientID__c";s:38:"75714ae471df63202106404675dasd800097erer1849995367";
Full string listed below
99524";s:9:"FirstName";s:2:"John";s:8:"LastName";s:8:"Doe";s:7:"Company";s:10:"Sample";s:5:"Email";s:20:"xxxxx#gmail.com";s:5:"Phone";s:10:"8888888888";s:7:"Country";s:13:"United States";s:5:"Title";s:8:"Creative";s:5:"State";s:2:"NC";s:13:"Last_Asset__c";s:40:"White Paper: Be a More Strategic Partner";s:16:"Last_Campaign__c";s:18:"70160000000q6TgAAI";s:16:"Referring_URL__c";s:8:"[direct]";s:19:"leadPriorityMarketo";s:2:"P2";s:18:"ProductInterest__c";s:9:"sample";s:14:"landingpageurl";s:359:"https://www.sample.com;mkt_tok=samplesamplesamplesample";s:14:"GA_ClientID__c";s:38:"75714ae471df63202106404675dasd800097erer1849995367";s:13:"Drupal_SID__c";s:36:"e1380c07-0258-47de-aaf8-82d4d8061e1a";s:4:"form";s:4:"1046";}

This works for your sample
"GA_ClientID__c";[^"]*"([^"]*)"
https://regex101.com/r/Q4Orj6/1
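For illustration, here is how that pattern behaves in JavaScript against the snippet from the question. The helper name extractClientId is hypothetical, and the sketch assumes the ID itself never contains a double quote.

```javascript
// [^"]* skips the variable-length prefix (s:38:, s:40:, s:1000:, ...),
// and the capture group takes everything up to the next double quote.
const clientIdPattern = /"GA_ClientID__c";[^"]*"([^"]*)"/;

function extractClientId(serialized) {
  const match = clientIdPattern.exec(serialized);
  return match ? match[1] : null; // null when the key is absent
}

const snippet =
  '"GA_ClientID__c";s:38:"75714ae471df63202106404675dasd800097erer1849995367";';
console.log(extractClientId(snippet));
```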

JS - JSON.parse - preserve special characters

I'm running a NodeJS app that gets certain posts from an API.
When the text contains special characters, JSON.parse fails.
Special characters can be text in any other language, emojis, etc.
Parsing works fine when posts don't have special characters.
I need to preserve all of the text, I can't just ignore those characters since I need to handle every possible language.
I'm getting the following error:
"Unexpected token �"
Example of a text I'm supposed to be able to handle:
"summary": "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT＆メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦"�アイルランドと英国（他は専門外）※Togetterコメ欄と陰謀論が嫌いです。"
How can I properly parse such a text?
Thanks

You have misdiagnosed your problem; it has nothing to do with that character.
Your code contains an unescaped " immediately before the special character you think is causing the problem. The early " is prematurely terminating the string.
If you insert a backslash to escape the ", your string can be parsed as JSON just fine:
x = '{"summary": "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT＆メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦\\"�アイルランドと英国（他は専門外）※Togetterコメ欄と陰謀論が嫌いです。"}';
console.log(JSON.parse(x));
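A shortened sketch of the same point, using a fragment of the text rather than the full post. The tryParse helper is hypothetical. It shows the unescaped quote failing, the escaped version parsing, and JSON.stringify producing the escape automatically when the JSON is generated from an object.

```javascript
// Returns the parsed value, or null if the text is not valid JSON.
function tryParse(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return null;
  }
}

// Unescaped " inside the value ends the string early, so parsing fails.
const broken = '{"summary": "関心領域は匦"アイルランド"}';
// The same value with the inner quote escaped as \" parses fine.
const fixed = '{"summary": "関心領域は匦\\"アイルランド"}';

console.log(tryParse(broken)); // null
console.log(tryParse(fixed).summary);

// When the JSON is produced with JSON.stringify, the quote is escaped for you.
const generated = JSON.stringify({ summary: '関心領域は匦"アイルランド' });
console.log(tryParse(generated).summary === tryParse(fixed).summary); // true
```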

You need to pass JSON.parse a string, not an object.
Example
JSON.parse('{"summary" : "a"}');
In your case it should be like this
JSON.parse(
'{"summary" : "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT＆メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦�アイルランドと英国（他は専門外）※Togetterコメ欄と陰謀論が嫌いです。"}')

What is the best way to serialize a JavaScript object into something that can be used as a fragment identifier (url#hash)?

My page state can be described by a JavaScript object that can be serialized into JSON. But I don't think a JSON string is suitable for use in a fragment ID due to, for example, the spaces and double-quotes.
Would encoding the JSON string into a base64 string be sensible, or is there a better way? My goal is to allow the user to bookmark the page and then upon returning to that bookmark, have a piece of JavaScript read window.location.hash and change state accordingly.

I think you are on the right track. Let's write down the requirements:
The encoded string must be usable as a hash, i.e. only letters and numbers.
The original value must be possible to restore, i.e. hashing (md5, sha1) is not an option.
It shouldn't be too long, to remain usable.
There should be an implementation in JavaScript, so it can be generated in the browser.
Base64 would be a great solution for that. The only problem: base64 also contains characters like +, / and =, so you gain nothing compared to simply attaching a JSON string (which would also have to be URL encoded).
BUT: Luckily, there's a variant of base64 called base64url which is exactly what you need. It is specifically designed for the type of problem you're describing.
However, I was not able to find a JS implementation; maybe you have to write one yourself, or do a bit more research than my half-assed 15 seconds scanning the first 5 Google results.
EDIT: On second thought, I don't think you need to write your own implementation. Use a normal implementation, and simply replace the "forbidden" characters with something you find appropriate for your URLs.

Base64 is an excellent way to store binary data in text. It uses just 33% more characters/bytes than the original data and mostly uses 0-9, a-z, and A-Z. It also has three other characters that would need to be encoded to be stored in the URL, which are /, =, and +. If you simply used URL encoding on binary data, it could take up to 300% (3x) the size.
If you're only storing the characters in the fragment of the URL, base64-encoded text doesn't need to be re-encoded and will not change. But if you want to send the data as part of the actual URL to visit, then it matters.
As referenced by lxg, there is a base64url variant for that. This is a modified version of base64 that replaces the characters that are unsafe to store in the URL. Here is how to encode it:
function tobase64url(s) {
  // btoa produces standard base64; swap the URL-unsafe characters and drop the padding.
  return btoa(s).replace(/\+/g,'-').replace(/\//g,'_').replace(/=/g,'');
}
console.log(tobase64url('\x00\xff\xff\xf1\xf1\xf1\xff\xff\xfe'));
// Prints "AP__8fHx___-" instead of "AP//8fHx///+"
And to decode a base64url string from the URL:
function frombase64url(s) {
  // Restore the standard base64 alphabet before decoding.
  return atob(s.replace(/-/g,'+').replace(/_/g, '/'));
}
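Applied to the original question, a rough sketch of a full round trip from a state object to a fragment and back might look like this. The names stateToFragment and fragmentToState are hypothetical. Since btoa only accepts characters up to 0xFF, the sketch assumes the JSON is converted to UTF-8 bytes first with TextEncoder, and it restores the padding before decoding to be safe.

```javascript
// Serialize a JSON-compatible object into a base64url string.
function stateToFragment(state) {
  const bytes = new TextEncoder().encode(JSON.stringify(state)); // UTF-8 bytes
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b); // one char per byte for btoa
  return btoa(binary).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// Reverse of stateToFragment.
function fragmentToState(fragment) {
  const base64 = fragment.replace(/-/g, '+').replace(/_/g, '/');
  const padded = base64 + '='.repeat((4 - (base64.length % 4)) % 4); // restore padding
  const bytes = Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

const state = { tab: 'settings', query: '€ café', page: 3 };
const fragment = stateToFragment(state);
console.log(fragment);
console.log(fragmentToState(fragment));
```

In a browser, the fragment would be assigned to window.location.hash and read back (minus the leading #) when the bookmarked page loads.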

Use encodeURIComponent and decodeURIComponent to serialize data for the fragment (aka hash) part of the URL.
This is safe because the character set output by encodeURIComponent is a subset of the character set allowed in the fragment. Specifically, encodeURIComponent escapes all characters except:
A - Z
a - z
0 - 9
- . _ ~ ! ' ( ) *
So the output includes the above characters, plus escaped characters, which are % followed by hexadecimal digits.
The set of allowed characters in the fragment is:
A - Z
a - z
0 - 9
? / : @ - . _ ~ ! $ & ' ( ) * + , ; =
percent-encoded characters (a % followed by hexadecimal digits)
This set of allowed characters includes all the characters output by encodeURIComponent, plus a few other characters.
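A minimal sketch of this approach, assuming the page state is a JSON-compatible object. The function names stateToHash and hashToState are hypothetical.

```javascript
// Build a fragment (including the leading #) from a state object.
function stateToHash(state) {
  return '#' + encodeURIComponent(JSON.stringify(state));
}

// Parse a fragment such as window.location.hash back into the state object.
function hashToState(hash) {
  return JSON.parse(decodeURIComponent(hash.replace(/^#/, '')));
}

const pageState = { view: 'list', filter: 'open items', page: 2 };
const hash = stateToHash(pageState);
console.log(hash);
console.log(hashToState(hash));
```

The result is longer than the base64url form for text with many escaped characters, but it stays partly readable in the address bar.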

what kind of encoding is this?

I've got some data from DBpedia using Jena. Since Jena's output is XML-based, some characters have to be escaped, as in the following:
Guns n &#039; Roses
I just want to know what kind of encoding this is.
I want to decode/encode my input the same way in JavaScript and send it back to a servlet.
(Edited: if you remove the space between & and amp you will get the actual string. I couldn't find a way to show that on Stack Overflow, so I wrote it like that.)

Seems to be XML entity encoding, and a numeric character reference (decimal).
A numeric character reference refers to a character by its Universal
Character Set/Unicode code point, and uses the format &#nnnn; or &#xhhhh;
You can get some info here: List of XML and HTML character entity references on Wikipedia.
Your character is number 39, being the apostrophe: ', which can also be referenced with a character entity reference: &apos;.
To decode this using Javascript, you could use for example php.js, which has an html_entity_decode() function (note that it depends on get_html_translation_table()).
UPDATE: in reply to your edit: Basically that is the same, the only difference is that it was encoded twice (possibly by mistake). &amp; is the ampersand: &.
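If pulling in a library is not wanted, numeric character references alone can be decoded with a small regular expression. This is only a sketch with a hypothetical function name, decodeNumericReferences. It handles the decimal and hexadecimal numeric forms and leaves named entities untouched.

```javascript
// Decode &#nnnn; (decimal) and &#xhhhh; (hexadecimal) references.
function decodeNumericReferences(text) {
  return text.replace(/&#(?:x([0-9a-fA-F]+)|(\d+));/g, (whole, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave the reference as-is if it is outside the Unicode range.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : whole;
  });
}

console.log(decodeNumericReferences('Guns n &#039; Roses')); // Guns n ' Roses
console.log(decodeNumericReferences('&#x20AC;')); // €
```

For the doubly encoded case, the ampersand entity would have to be turned back into & first, and the result then passed through the function above.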

This is an SGML/HTML/XML numeric character entity reference.
In this case for an apostrophe '.

Euro sign or other entity in Javascript alert/messagebox

Does anybody know how I can show a euro sign or other HTML entity in JavaScript alert windows?

alert('\u20AC');
HTML Entity Character Lookup

<script>alert("\u20ac");</script>
(U+20AC being the Unicode code point for the euro sign.)

An alert box can show any characters that are in the codepage for the currently logged on session. So for example if the machine is using the 1252 codepage you can display the eurosign.
It's not clear what your trouble is; your JavaScript string should not have the characters encoded as entities anyway.
Edit:
If you specify UTF-8 in the HTML or as the Response.CharSet but you haven't actually saved the ASP file in UTF-8 format you will have problems with characters outside of ASCII.
ASP assumes static parts of an ASP file are in the required codepage already and sends it verbatim byte for byte, no encoding will happen.

You can use the characters €, £, $ or ¥ directly in the string. Only $ is standard ASCII, so for the others the file must be saved and served in an encoding that contains them (for example UTF-8).

For example, U+1234 is used like this: alert('\u1234').
For a full list of entities, see:
1) http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
2) http://www.utf8-chartable.de/
3) http://rishida.net/tools/conversion/ (CONVERTER)
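As a small illustration of the escape approach from the answers above, the sketch below builds the euro sign two ways and uses it in a message. The formatPrice helper is hypothetical. The alert call is left as a comment so the snippet also runs outside a browser.

```javascript
// Two equivalent ways to get the euro sign without typing it or using an HTML entity.
const euroFromEscape = '\u20AC';
const euroFromCodePoint = String.fromCodePoint(0x20ac);

// Hypothetical helper that builds the text for a message box.
function formatPrice(amount) {
  return `${euroFromEscape} ${amount.toFixed(2)}`;
}

console.log(euroFromEscape === euroFromCodePoint); // true
console.log(formatPrice(9.5)); // € 9.50
// In a browser: alert(formatPrice(9.5));
```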
