JavaScript: Error while converting Unicode string to hex

I'm trying to convert a Unicode string to a hexadecimal representation in a JavaScript controller in the SAPUI5 Web IDE.
I am using this function to convert the Unicode data to hex. The str variable contains the Unicode data:
convertToHex: function(str) {
var hex = '';
var i = 0;
while (str.length > i) {
hex += '' + str.charCodeAt(i).toString(16);
i++;
}
console.log(hex);
return hex;
},
This is the first line of the result I am getting in the hex variable:
504b3414060800021062ee9d685e10090400130825b436f6e74656e745f54797065735d2e786d6c20a24228a002
Now, when I upload the same data to SAP NetWeaver Gateway, it converts the Unicode data to hex as follows (first line):
504B03041400060008000000210062EE9D685E01000090040000130008025B436F6E74656E745F54797065735D2E786D6C20A2040228A00002
This is the decoded unicode:
PK!bîh^[Content_Types].xml ¢( 
For my application to work, I need both hex strings to be the same, but I am not able to generate the correct hex in JavaScript, whereas SAP gives me the correct hex values.
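Comparing the two hex strings above byte by byte, the JavaScript output seems to differ only in that every byte below 0x10 has lost its leading zero (03 became 3, 00 became 0), and in letter case. A minimal sketch of a fix, assuming every character of str holds a single byte value (0 to 255); convertToHexPadded is a hypothetical name, not part of SAPUI5:

```javascript
// Sketch: pad each char code to two hex digits so that bytes below 0x10
// keep their leading zero. Assumes one byte value (0..255) per character.
function convertToHexPadded(str) {
  let hex = "";
  for (let i = 0; i < str.length; i++) {
    hex += str.charCodeAt(i).toString(16).padStart(2, "0");
  }
  // Upper case only to match the casing of the SAP output shown above.
  return hex.toUpperCase();
}

console.log(convertToHexPadded("PK\u0003\u0004\u0014\u0000")); // 504B03041400
```

If the string can contain characters above 0xFF, two digits per character are not enough and a fixed width of four digits per UTF-16 code unit would be needed instead.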

Related

Javascript hexadecimal to ASCII with latin extended symbols

I am getting a hexadecimal value of my string that looks like this:
String has letters with diacritics: č,š,ř, ...
Hexadecimal value of this string is:
0053007400720069006E006700200068006100730020006C0065007400740065007200730020007700690074006800200064006900610063007200690074006900630073003A0020010D002C00200161002C00200159002C0020002E002E002E
The problem is that when I try to convert this value back to ASCII, the č, š, ř, ... characters are not converted correctly: I get a little box with a question mark in it instead of these symbols.
My code for converting hex to ASCII:
function convertHexadecimal(hexx){
let index = hexx.indexOf("~");
let strInfo = hexx.substring(0, index+1);
let strMessage = hexx.substring(index+1);
var hex = strMessage.toString();
var str = '';
for (var i = 0; i < hex.length; i += 2){
str += String.fromCharCode(parseInt(hex.substr(i, 2), 16));
}
console.log("Zpráva: " + str);
var strFinal = strInfo + str;
return strFinal;
}
Can somebody help me with this?
First an example solution:
let demoHex = `0053007400720069006E006700200068006100730020006C0065007400740065007200730020007700690074006800200064006900610063007200690074006900630073003A0020010D002C00200161002C00200159002C0020002E002E002E`;
function hexToString(hex) {
let str="";
for( var i = 0; i < hex.length; i +=4) {
str += String.fromCharCode( Number("0x" + hex.substr(i,4)));
}
return str;
}
console.log("Decoded string: %s", hexToString(demoHex) );
What it's doing:
It treats the hex string as a sequence of groups of 4 hexadecimal digits, each giving the UTF-16 code unit of one character.
It gets each set of 4 digits in a loop using String.prototype.substr. Note that MDN marks .substr as deprecated; the ECMAScript standard still defines it, but only in Annex B as a legacy feature, so rewrite it to use substring or something else as you wish.
Hex characters are prefixed with "0x" to make them a valid number representation in JavaScript and converted to a number using Number. The number is then converted to a one-character string using the String.fromCharCode static method.
I guessed the format of the hex string by looking at it, which means a general-purpose routine to encode UTF-16 code units (not code points) into hex could look like:
const hexEncodeUTF16 =
str=>str.split('')
.map( char => char.charCodeAt(0).toString(16).padStart(4,'0'))
.join('');
console.log( hexEncodeUTF16( "String has letters with diacritics: č, š, ř, ..."));
I hope these examples show what needs doing - there are any number of ways to implement it in code.

convert Ascii into Hex in Java Script for comparison

I have an ASCII value (something like "#") that I want to convert into hex in JavaScript so that I can compare this value with some other hex values.
Are there any casting possibilities?
Best regards and thanks,
Florian
Convert an ASCII string to Base64 (note: this is Base64, not hex):
var b64 = btoa("##%##!##$%");
Convert base64 to Ascii:
atob(b64);
Don't know what you want to compare, beside equality.
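// Convert a string to hex, one code point at a time
// (padded to at least two digits, so ASCII input always gives two digits per character):
function stringToHex(string) {
return '0x'+[...string].map(char => char.codePointAt(0).toString(16).padStart(2, '0')).join('')
}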
// Convert a HEX string into a number:
function hexToNumber(hex) {
return parseInt(hex, 16)
}
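If the goal is only to check whether a character equals some hex value, it may be simpler to compare numbers rather than strings. A small sketch, where charMatchesHex is a hypothetical helper name and the hex value is assumed to arrive as a string such as "23" or "0x23":

```javascript
// Sketch: compare the numeric char code of a character with a hex value.
function charMatchesHex(ch, hexString) {
  // parseInt with radix 16 accepts both "23" and "0x23".
  return ch.charCodeAt(0) === parseInt(hexString, 16);
}

console.log("#".charCodeAt(0) === 0x23);      // true: a hex literal is just a number
console.log(charMatchesHex("#", "23"));       // true
console.log("#".charCodeAt(0).toString(16));  // 23
```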

javascript Convert string representation of hex value to hex

In JavaScript, how do I convert a string representation of a hex value into its hex representation?
What I have returning from a checksum routine is a string value "FE". What I need is its hex representation "\xFE".
I cannot simply do this, as it gives me an error:
var crc = "FE";
var hex = "\x" + crc;
This just gives me a new 4 character ASCII string:
var crc = "FE";
var hex = "0x" + "FE";
Thanks for any guidance.
like this
var hex = parseInt("FF", 16);
For the string \xFE, escape the backslash: var hex = '\\x'+'FE'
To convert 'FE' to a Number use +('0xFE')
To show +('0xFE') as hexadecimal, use (254).toString(16), or '0x'+((254).toString(16))
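If what is wanted is the single character that the literal "\xFE" denotes (rather than the four-character text \xFE), one possible sketch is to parse the hex digits and pass the number to String.fromCharCode. The helper name hexByteToChar is made up for this example:

```javascript
// Sketch: turn a two-digit hex string such as "FE" into the one-character
// string that the literal "\xFE" would produce.
function hexByteToChar(hexByte) {
  return String.fromCharCode(parseInt(hexByte, 16));
}

const crc = "FE";
const ch = hexByteToChar(crc);
console.log(ch === "\xFE"); // true
console.log(ch.length);     // 1
```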

Decode Unicode to character in javascript

I have the following unicode sequence:
d76cb9dd0020b370b2c8c758
As an experiment, I picked some non-English text (Korean, in this case) as the original of the Unicode sequence above:
희망 데니의
How can I decode the Unicode sequence above back into its original form?
As a JavaScript string literal, escape hex codes with \u:
var koreanString = "\ud76c\ub9dd\u0020\ub370\ub2c8\uc758";
Or just enter the korean characters into the string:
var koreanString = "희망 데니의";
To process a hex string representing Unicode characters, parse the hex string into numbers and then build the Unicode string using String.fromCharCode():
var hex = "d76cb9dd0020b370b2c8c758";
var koreanString = "";
for (var i = 0; i < hex.length; i += 4) {
koreanString += String.fromCharCode(parseInt(hex.substring(i, i + 4), 16)); // substring takes (start, end)
}
Edit: You can get the length of any string by accessing its length property:
var stringLength = koreanString.length;
This will return 6. There is no "English" string. You have a string representing hexadecimal numbers, and hexadecimal digits happen to use characters from the Latin character set, but they are not in any spoken language. They are just numbers. You can, of course, get the length of the hexadecimal string using the length property, but I'm not sure why you'd want to do that. It would be more straightforward to use an array of numbers instead of a string:
var charCodes = [0xd76c, 0xb9dd, 0x0020, 0xb370, 0xb2c8, 0xc758];
var koreanString = String.fromCharCode.apply(null, charCodes);
In this way, charCodes.length will be the same as koreanString.length.
How about
var str = 'd76cb9dd0020b370b2c8c758';
str = '"'+str.replace(/([0-9a-z]{4})/g, '\\u$1')+'"';
alert(JSON.parse(str));

How to convert large UTF-8 strings into ASCII?

I need to convert large UTF-8 strings into ASCII. It should be reversible, and ideally a quick/lightweight algorithm.
How can I do this? I need the source code (using loops) or the JavaScript code. (should not be dependent on any platform/framework/library)
Edit: I understand that the ASCII representation will not look correct and would be larger (in terms of bytes) than its UTF-8 counterpart, since it's an encoded form of the UTF-8 original.
You could use an ASCII-only version of Douglas Crockford's json2.js quote function. Which would look like this:
var escapable = /[\\\"\x00-\x1f\x7f-\uffff]/g,
meta = { // table of character substitutions
'\b': '\\b',
'\t': '\\t',
'\n': '\\n',
'\f': '\\f',
'\r': '\\r',
'"' : '\\"',
'\\': '\\\\'
};
function quote(string) {
// If the string contains no control characters, no quote characters, and no
// backslash characters, then we can safely slap some quotes around it.
// Otherwise we must also replace the offending characters with safe escape
// sequences.
escapable.lastIndex = 0;
return escapable.test(string) ?
'"' + string.replace(escapable, function (a) {
var c = meta[a];
return typeof c === 'string' ? c :
'\\u' + ('0000' + a.charCodeAt(0).toString(16)).slice(-4);
}) + '"' :
'"' + string + '"';
}
This will produce a valid ASCII-only, JavaScript-quoted version of the input string,
e.g. quote("Doppelgänger!") will be "Doppelg\u00e4nger!"
To revert the encoding you can parse the result with JSON.parse (eval also works, but should not be used on untrusted input):
var encoded = quote("Doppelgänger!");
var back = JSON.parse(encoded); // eval(encoded);
Any UTF-8 string that is reversibly convertible to ASCII is already ASCII.
UTF-8 can represent any unicode character - ASCII cannot.
As others have said, you can't convert UTF-8 text/plain into ASCII text/plain without dropping data.
You could convert UTF-8 text/plain into ASCII someother/format. For instance, HTML lets any character in UTF-8 be represented in an ASCII data file using character references.
If we continue with that example, in JavaScript, charCodeAt could help with converting a string to a representation of it using HTML character references.
Another approach is taken by URLs, and implemented in JS as encodeURIComponent.
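Both ideas can be sketched in a few lines. The helpers toHtmlCharRefs and fromHtmlCharRefs are hypothetical names written for this example; the round trip is only reliable if the input does not already contain text that looks like a hexadecimal character reference. encodeURIComponent and decodeURIComponent are built in:

```javascript
// Sketch: replace every non-ASCII code point with an HTML hexadecimal
// character reference, and reverse that replacement.
function toHtmlCharRefs(str) {
  return Array.from(str, ch => {
    const cp = ch.codePointAt(0);
    return cp > 0x7f ? `&#x${cp.toString(16)};` : ch;
  }).join("");
}

function fromHtmlCharRefs(str) {
  return str.replace(/&#x([0-9a-f]+);/gi,
    (match, hex) => String.fromCodePoint(parseInt(hex, 16)));
}

console.log(toHtmlCharRefs("Doppelgänger!"));      // Doppelg&#xe4;nger!
console.log(encodeURIComponent("Doppelgänger!"));  // Doppelg%C3%A4nger!
```

The URL form escapes the UTF-8 bytes of each character, while the character-reference form escapes the code point itself, so the two outputs differ in size for non-ASCII text.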
Your requirement is pretty strange.
Converting UTF-8 into ASCII would lose all information about Unicode code points > 127 (i.e. everything that's not in ASCII).
You could, however, try to encode your Unicode data (no matter what source encoding) in an ASCII-compatible encoding, such as UTF-7. This would mean that the data that is produced could legally be interpreted as ASCII, but it is really UTF-7.
If the string is encoded as UTF-8, it's not a string any more. It's binary data, and if you want to represent the binary data as ASCII, you have to format it into a string that can be represented using the limited ASCII character set.
One way is to use base-64 encoding (example in C#):
string original = "asdf";
// encode the string into UTF-8 data:
byte[] encodedUtf8 = Encoding.UTF8.GetBytes(original);
// format the data into base-64:
string base64 = Convert.ToBase64String(encodedUtf8);
If you want the string encoded as ASCII data:
// encode the base-64 string into ASCII data:
byte[] encodedAscii = Encoding.ASCII.GetBytes(base64);
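Since the question asks for JavaScript, the same idea might be sketched as follows. The names utf8ToBase64 and base64ToUtf8 are made up for this example, and the sketch assumes an environment that provides TextEncoder, TextDecoder, btoa and atob as globals (modern browsers and recent Node.js versions do):

```javascript
// Sketch: encode a string as UTF-8 bytes, then format the bytes as Base64.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b); // one character per byte, as btoa expects
  }
  return btoa(binary);
}

// Reverse: Base64 text back to bytes, then decode the bytes as UTF-8.
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, ch => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

console.log(utf8ToBase64("asdf")); // YXNkZg==
```

The Base64 output uses only ASCII characters and the conversion is reversible, at the cost of roughly a third more bytes than the UTF-8 original.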
It is impossible to convert a UTF-8 string into ASCII, but it is possible to encode Unicode as an ASCII-compatible string.
Probably you want to use Punycode - this is already a standard Unicode encoding that encodes all Unicode characters into ASCII. For JavaScript code check this question
Please edit your question title and description in order to prevent others from down-voting it - do not use the term conversion, use encoding.
function utf8ToAscii(str) {
/**
* ASCII contains 128 characters (codes 0 to 127).
*
* JavaScript strings are sequences of UTF-16 code units, so a single
* char code cannot exceed 0xFFFF; String.fromCharCode wraps larger values. E.g.:
* `String.fromCharCode(0) === String.fromCharCode(2**16)`
*
* @see https://developer.mozilla.org/en-US/docs/Web/API/DOMString/Binary
*/
const reg = /[\x7f-\uffff]/g; // charCode: [127, 65535]
const replacer = (s) => {
const charCode = s.charCodeAt(0);
const unicode = charCode.toString(16).padStart(4, '0');
return `\\u${unicode}`;
};
return str.replace(reg, replacer);
}
Better way
See Uint8Array to string in Javascript also. You can use TextEncoder and Uint8Array:
function utf8ToAscii(str) {
const enc = new TextEncoder(); // TextEncoder always encodes to UTF-8
const u8s = enc.encode(str);
return Array.from(u8s).map(v => String.fromCharCode(v)).join('');
}
// For ascii to string
// new TextDecoder().decode(new Uint8Array(str.split('').map(v=>v.charCodeAt(0))))
Do you want to strip all non ascii chars (slash replace them with '?', etc) or to store Unicode code points in a non unicode system?
The first can be done in a loop, checking for values > 127 and replacing them.
If you don't want to use "any platform/framework/library" then you will need to write your own encoder. Otherwise I'd just use jQuery's .html();
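The first option (replacing every non-ASCII character) could be sketched with a regular expression instead of an explicit loop. replaceNonAscii is a hypothetical name, and note that this direction is lossy and therefore not reversible:

```javascript
// Sketch: replace every character outside the ASCII range (0..127)
// with a placeholder. The "u" flag makes the regex work on whole code
// points, so a character outside the BMP yields one placeholder, not two.
function replaceNonAscii(str, replacement = "?") {
  return str.replace(/[^\x00-\x7f]/gu, replacement);
}

console.log(replaceNonAscii("Hägar")); // H?gar
```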
Here is a function to convert accented UTF-8 characters (àéèî etc.) to extended-ASCII codes.
If there is an accent in the string, it is replaced by a percent sign and the code from the table, for example é (233) becomes %130.
Then on the other side, I parse the string, so I know when there is an accent and what the ASCII char is.
I used it in JavaScript software that sends data to a microcontroller that works in ASCII.
var convertUtf8ToAscii = function (str) {
var asciiStr = "";
var refTable = { // Reference table Unicode vs ASCII
199: 128, 252: 129, 233: 130, 226: 131, 228: 132, 224: 133, 231: 135, 234: 136, 235: 137, 232: 138,
239: 139, 238: 140, 236: 141, 196: 142, 201: 144, 244: 147, 246: 148, 242: 149, 251: 150, 249: 151
};
for(var i = 0; i < str.length; i++){
var ascii = refTable[str.charCodeAt(i)];
if (ascii != undefined)
asciiStr += "%" +ascii;
else
asciiStr += str[i];
}
return asciiStr;
}
An implementation of the quote() function might do what you want.
My version can be found here
You can use eval() to reverse the encoding:
var foo = 'Hägar';
var quotedFoo = quote(foo);
var unquotedFoo = eval(quotedFoo);
alert(foo === unquotedFoo);
