Decode Unicode to character in javascript

I have the following Unicode sequence:
d76cb9dd0020b370b2c8c758
As an experiment I picked some non-English text (Korean, in this case); the original text behind the sequence above is:
희망 데니의
How can I decode the sequence above back into its original form?

As a JavaScript string literal, escape hex codes with \u:
var koreanString = "\ud76c\ub9dd\u0020\ub370\ub2c8\uc758";
Or just enter the korean characters into the string:
var koreanString = "희망 데니의";
To process a hex string representing Unicode characters, parse each group of 4 hex digits to a number and then build the Unicode string using String.fromCharCode():
var hex = "d76cb9dd0020b370b2c8c758";
var koreanString = "";
for (var i = 0; i < hex.length; i += 4) {
  // substring(start, end) takes an end index, so read the 4 digits from i to i + 4
  koreanString += String.fromCharCode(parseInt(hex.substring(i, i + 4), 16));
}
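The loop above can also be wrapped in a reusable function that rejects malformed input. A minimal sketch, assuming the input is a plain run of 4-digit UTF-16 code units with no separators; the name `decodeHexUtf16` is hypothetical:

```javascript
// Hypothetical helper: decode a run of 4-digit hex UTF-16 code units,
// e.g. "d76cb9dd..." -> "희망...". Assumes no separators between groups.
function decodeHexUtf16(hex) {
  if (hex.length % 4 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) {
    throw new Error("expected a multiple of 4 hex digits");
  }
  // Split into 4-digit groups; match() returns null for "", so fall back to [].
  const codeUnits = (hex.match(/.{4}/g) || []).map((group) => parseInt(group, 16));
  // fromCharCode takes any number of code units and joins them into one string.
  return String.fromCharCode(...codeUnits);
}

console.log(decodeHexUtf16("d76cb9dd0020b370b2c8c758")); // 희망 데니의
```

Spreading the array passes every code unit as a separate argument, and engines limit the number of arguments, so for very long inputs a plain loop like the one above is the safer choice.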
Edit: You can get the length of any string by accessing its length property:
var stringLength = koreanString.length;
This will return 6. There is no "English" string. You have a string representing hexadecimal numbers, and hexadecimal digits are written with characters from the Latin character set, but they are not words in any spoken language. They are just numbers. You can, of course, get the length of the hexadecimal string using the length property (24 here, 4 digits per character), but I'm not sure why you'd want to do that. It would be more straightforward to use an array of numbers instead of a string:
var charCodes = [0xd76c, 0xb9dd, 0x0020, 0xb370, 0xb2c8, 0xc758];
var koreanString = String.fromCharCode.apply(null, charCodes);
In this way, charCodes.length will be the same as koreanString.length.

How about
var str = 'd76cb9dd0020b370b2c8c758';
str = '"' + str.replace(/([0-9a-fA-F]{4})/g, '\\u$1') + '"';
alert(JSON.parse(str));

Related

Javascript hexadecimal to ASCII with latin extended symbols

I am getting a hexadecimal value of my string that looks like this:
String has letters with diacritics: č,š,ř, ...
Hexadecimal value of this string is:
0053007400720069006E006700200068006100730020006C0065007400740065007200730020007700690074006800200064006900610063007200690074006900630073003A0020010D002C00200161002C00200159002C0020002E002E002E
The problem is that when I try to convert this value back to ASCII, the č, š, ř, ... are converted incorrectly and show up as a little box with a question mark inside instead of the actual symbols.
My code for converting hex to ASCII:
function convertHexadecimal(hexx){
let index = hexx.indexOf("~");
let strInfo = hexx.substring(0, index+1);
let strMessage = hexx.substring(index+1);
var hex = strMessage.toString();
var str = '';
for (var i = 0; i < hex.length; i += 2){
str += String.fromCharCode(parseInt(hex.substr(i, 2), 16));
}
console.log("Zpráva: " + str);
var strFinal = strInfo + str;
return strFinal;
}
Can somebody help me with this?

First an example solution:
let demoHex = `0053007400720069006E006700200068006100730020006C0065007400740065007200730020007700690074006800200064006900610063007200690074006900630073003A0020010D002C00200161002C00200159002C0020002E002E002E`;
function hexToString(hex) {
let str="";
for( var i = 0; i < hex.length; i +=4) {
str += String.fromCharCode( Number("0x" + hex.substr(i,4)));
}
return str;
}
console.log("Decoded string: %s", hexToString(demoHex) );
What it's doing:
It treats the hex string as a sequence of 4-digit hexadecimal groups, each providing the UTF-16 character code of one character.
It gets each group of 4 digits in a loop using String.prototype.substr. Note that MDN marks .substr as deprecated; in the ECMAScript standard it is only specified in Annex B (additional features for web browsers), so rewrite it to use substring or something else as you wish.
The hex digits are prefixed with "0x" to form a valid hexadecimal number representation in JavaScript and converted to a number using Number. The number is then converted to a one-character string using the static method String.fromCharCode.
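Applied to the asker's `convertHexadecimal`, the fix amounts to stepping through the message 4 digits at a time instead of 2. A sketch that keeps the original `~` split (assuming, as in the question, that everything up to and including the first `~` is plain header text) and uses `substring` instead of `substr`:

```javascript
// Sketch of convertHexadecimal reading 4 hex digits (one UTF-16 code unit) per character.
// Assumes the text up to and including the first "~" is a plain-text header.
function convertHexadecimal(hexx) {
  const index = hexx.indexOf("~"); // -1 when there is no header
  const strInfo = hexx.substring(0, index + 1);
  const hex = hexx.substring(index + 1);
  let str = "";
  for (let i = 0; i < hex.length; i += 4) {
    // substring(start, end) takes an end index, not a length
    str += String.fromCharCode(parseInt(hex.substring(i, i + 4), 16));
  }
  return strInfo + str;
}

console.log(convertHexadecimal("info~0053007400720069006E0067010D")); // info~Stringč
```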
I guessed the format of the hex string by looking at it, which means a general-purpose routine to encode UTF-16 code units (not code points) into hex could look like:
const hexEncodeUTF16 =
str=>str.split('')
.map( char => char.charCodeAt(0).toString(16).padStart(4,'0'))
.join('');
console.log( hexEncodeUTF16( "String has letters with diacritics: č, š, ř, ..."));
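Because this encoder works on UTF-16 code units, a character outside the Basic Multilingual Plane (such as an emoji) becomes two 4-digit groups, one per surrogate, and a decoder that reads 4 digits at a time puts the pair back together. A quick round-trip check; `hexDecodeUTF16` is a hypothetical counterpart written for this sketch:

```javascript
// Encoder from the answer: one 4-digit group per UTF-16 code unit.
const hexEncodeUTF16 = (str) =>
  str.split("")
    .map((char) => char.charCodeAt(0).toString(16).padStart(4, "0"))
    .join("");

// Hypothetical decoder: read 4 hex digits at a time and rebuild the code units.
const hexDecodeUTF16 = (hex) =>
  String.fromCharCode(...(hex.match(/.{4}/g) || []).map((h) => parseInt(h, 16)));

const sample = "č, š, ř \u{1F600}"; // ends with an emoji outside the BMP
console.log(hexEncodeUTF16("\u{1F600}")); // d83dde00 (high and low surrogate)
console.log(hexDecodeUTF16(hexEncodeUTF16(sample)) === sample); // true
```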
I hope these examples show what needs doing - there are any number of ways to implement it in code.

JavaScript how to split a string every n characters while ignoring ANSI codes

How would you approach splitting a JavaScript string every n characters while ignoring the ANSI codes (so that each chunk holds n characters plus however many ANSI code characters it contains)?
It is important to keep the ANSI codes in the final array.
I know that with a regex you'd write something like /.{1,3}/, but how would you leave the ANSI characters out of the count?
Example:
Given \033[34mHey \033[35myou\033[0m, how would you split every 3 chars to get:
[
'\033[34mHey',
' \033[35myo',
'u\033[0m'
]

Here is a way to achieve what you need:
var s = "\033[34mHey \033[35myou\033[0mfd\033[1m";
var chunks = s.match(/(?:(?:\033\[[0-9;]*m)*.?){1,3}/g);
var arr = [];
[].forEach.call(chunks, function(a) {
  // keep only chunks that contain at least one visible character
  if (!/^(?:\033\[[0-9;]*m)*$/.test(a)) {
    arr.push(a);
  }
});
document.getElementById("r").innerHTML = JSON.stringify(arr);
<div id="r"></div>
Note that octal escapes such as \033 can be used directly in the regex. The forEach call filters out the empty elements and the elements that consist only of ANSI color codes.

You can match ANSI color escapes with \x1B\[[\d;]*m and count only the characters that are not escapes, [^\x1B]:
/(?:(?:\x1B\[[\d;]*m)*[^\x1B]){1,3}/g
Also, to include escapes at the end of the string as part of the last token:
/(?:(?:\x1B\[[\d;]*m)*[^\x1B]){1,3}(?:(?:\x1B\[[\d;]*m)+$)?/g
Code
var subject = "\033[34mHey y\033[0mou\033[0m";
var pattern = /(?:(?:\x1B\[[\d;]*m)*[^\x1B]){1,3}(?:(?:\x1B\[[\d;]*m)+$)?/g;
var result = subject.match(pattern);
document.write('<pre>' + JSON.stringify(result) + '</pre>');
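The same pattern can be built for any chunk size n. A sketch that runs in Node or a browser console, assuming only SGR color codes of the form ESC[...m need to be skipped; `splitIgnoringAnsi` is a hypothetical name. It writes ESC as `\x1B`, because legacy octal escapes such as `\033` in string literals are a syntax error in strict-mode code and in template literals:

```javascript
// Hypothetical helper: split s into chunks of n visible characters,
// keeping ANSI color codes (ESC[...m) attached without counting them.
function splitIgnoringAnsi(s, n) {
  const esc = "\\x1B\\[[\\d;]*m"; // regex source for one ANSI color code
  const pattern = new RegExp(
    `(?:(?:${esc})*[^\\x1B]){1,${n}}(?:(?:${esc})+$)?`,
    "g"
  );
  return s.match(pattern) || []; // match() returns null when nothing matches
}

const chunks = splitIgnoringAnsi("\x1B[34mHey \x1B[35myou\x1B[0m", 3);
// chunks is ['\x1B[34mHey', ' \x1B[35myo', 'u\x1B[0m']
```

The trailing group attaches escape codes at the very end of the string to the last chunk, so no chunk consists of escape codes alone and no filtering pass is needed.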

javascript Convert string representation of hex value to hex

In JavaScript, how do I convert a string representation of a hex value into its hex representation?
What I have returning from a checksum routine is a string value "FE". What I need is its hex representation "\xFE".
I cannot simply do this, as it gives me an error:
var crc = "FE";
var hex = "\x" + crc;
This just gives me a new 4-character ASCII string:
var crc = "FE";
var hex = "0x" + "FE";
Thanks for any guidance.

Like this:
var hex = parseInt("FF", 16); // 255

For the string \xFE, escape the backslash: var hex = '\\x'+'FE'
To convert 'FE' to a Number use +('0xFE')
To show +('0xFE') as a hexadecimal string, use (254).toString(16), or '0x'+((254).toString(16))
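If what the checksum code needs is the one-character string that the literal `"\xFE"` denotes (an assumption about the question's intent), it can be built from the parsed number with `String.fromCharCode`, and converted back with `toString(16)`:

```javascript
const crc = "FE";
const value = parseInt(crc, 16);            // the number 254
const hexChar = String.fromCharCode(value); // the same string as the literal "\xFE"
console.log(value, hexChar === "\xFE");     // 254 true

// And back again: number -> two-digit uppercase hex string
console.log(value.toString(16).toUpperCase().padStart(2, "0")); // FE
```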

How do I convert unicode from an HTML input to a greek character in javascript?

How do I convert unicode from an HTML input to a greek character in javascript? The first example does not work, but the second does.
var str = input.value; // \u03B1 (typed into input box)
console.log(str);      // \u03B1
var str = "\u03B1";    // assigned directly
console.log(str);      // α

To convert Unicode escape sequences in strings to actual characters, you can run them through String.prototype.replace with String.fromCharCode:
var str = '\\u03B1\\u03B2\\u03B3\\u03B4'; // "\u03B1\u03B2\u03B3\u03B4"
str.replace(/\\u([\da-fA-F]{4})/g, function (m, $1) {
return String.fromCharCode(parseInt($1, 16));
}); // "αβγδ"

The backslash acts as an escape character in the second assignment to str. In the first case, if input.value was \u03B1, the string actually contains a literal backslash; it is the same as writing var str = "\\u03B1", where the backslash is escaped and so loses its special meaning.
If you want to evaluate the escaped character in the field, you can do so like this:
var str = input.value.replace("\\u", "");
str = String.fromCharCode(parseInt(str, 16));
This works because you are parsing an integer from everything after \u and passing it into fromCharCode. Character codes are integers, so you are parsing that code out of the original \u03B1 text.
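Calling replace with a string pattern only removes the first \u, so the approach above handles input containing exactly one escape. A sketch that decodes every escape in arbitrary input and leaves the rest of the text alone; the function name is hypothetical, and accepting the ES2015 `\u{...}` code point form is an extra assumption beyond the question:

```javascript
// Hypothetical helper: turn escape text such as "\u03B1" (typed by a user)
// into the characters it names. Also accepts the "\u{1F600}" code point form.
function decodeUnicodeEscapes(text) {
  return text.replace(
    /\\u\{([0-9a-fA-F]{1,6})\}|\\u([0-9a-fA-F]{4})/g,
    (match, braced, plain) => {
      const codePoint = parseInt(braced ?? plain, 16);
      // Leave out-of-range escapes untouched instead of throwing a RangeError.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
  );
}

// In a page this would be decodeUnicodeEscapes(input.value).
console.log(decodeUnicodeEscapes("alpha = \\u03B1, beta = \\u{3B2}")); // alpha = α, beta = β
```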

How to check if any Arabic character exists in the string ( javascript )

How can I check with JavaScript whether a string contains any Arabic character?

According to Wikipedia, Arabic characters fall in the Unicode range 0600 - 06FF. So you can use a regular expression to test if the string contains any character in this range:
var arabic = /[\u0600-\u06FF]/;
var string = 'عربية‎'; // some Arabic string from Wikipedia
alert(arabic.test(string)); // displays true

function isArabic(text) {
  var pattern = /[\u0600-\u06FF\u0750-\u077F]/;
  return pattern.test(text);
}

This is how it works for me (in PHP):
$str = "عربية";
if(preg_match("/^[\x{0600}-\x{06FF}]+/u", $str))echo "invalid";
else echo "valid";
You can also check the extended ranges of Arabic characters:
0x600 - 0x6ff
0x750 - 0x77f
0xfb50 - 0xfc3f
0xfe70 - 0xfefc
So the expression will look more like "/^[\x{0600}-\x{06FF}\x{0750}-\x{077f}]+/u"
Good Luck

Ranges for Arabic characters are:
0x600 - 0x6ff
0x750 - 0x77f
0xfb50 - 0xfc3f
0xfe70 - 0xfefc
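Those ranges can be folded into a single JavaScript character class. A sketch using the ranges listed above; the second function swaps in a Unicode property escape (`\p{Script=Arabic}`, available in engines that support ES2018 regular expression features), which leaves the range bookkeeping to the engine's Unicode data. The two are not guaranteed to cover exactly the same characters, and both function names are illustrative:

```javascript
// Character class built from the ranges listed above.
const arabicRanges = /[\u0600-\u06FF\u0750-\u077F\uFB50-\uFC3F\uFE70-\uFEFC]/;

function containsArabic(text) {
  return arabicRanges.test(text);
}

// Alternative: let the regex engine decide which characters belong to the Arabic script.
function containsArabicScript(text) {
  return /\p{Script=Arabic}/u.test(text);
}

console.log(containsArabic("مرحبا Hello"), containsArabic("Hello")); // true false
console.log(containsArabicScript("عربية"), containsArabicScript("123")); // true false
```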

Check whether a string is mostly Arabic (at least as many Arabic characters as other characters):
function isArabic (string) {
let def = 0;
let ar = 0;
string.split('').forEach(i => /[\u0600-\u06FF]/.test(i) ? (ar++) : (def++))
return ar >= def
}
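This counts spaces, digits and punctuation as non-Arabic, so mixed text such as "مرحبا Hello" comes out false. A variant that only weighs letters (`\p{L}`, an ES2018 Unicode property escape) against each other, with a configurable threshold; both the name `isMostlyArabic` and the 0.5 default are illustrative assumptions:

```javascript
// Hypothetical variant: share of letters that are Arabic, ignoring spaces/digits/punctuation.
function isMostlyArabic(text, threshold = 0.5) {
  const letters = text.match(/\p{L}/gu) || [];
  if (letters.length === 0) return false; // no letters at all
  const arabic = letters.filter((ch) => /[\u0600-\u06FF]/.test(ch)).length;
  return arabic / letters.length >= threshold;
}

console.log(isMostlyArabic("مرحبا Hello")); // true (5 of 10 letters)
console.log(isMostlyArabic("Hello, مرحبا!", 0.6)); // false
```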

Check out the npm package I created.
https://www.npmjs.com/package/is-arabic
It checks both Arabic and Farsi letters and Unicode as well. It also checks for Arabic symbols, Harakat, and numbers. You can also make it check for a certain number of characters. By default it checks whether the whole string is Arabic; use the count option to check whether a string includes Arabic characters. It has full support. Check it out.
Example:
const isArabic = require("is-arabic");
const text = "سلام";
// Checks if the whole string is Arabic
if (isArabic(text)){
// Do something
}
// Check if string includes Arabic characters
// count: The number of Arabic characters occurrences for the string to be considered Arabic
const text2 = "مرحبا Hello";
const options = { count: 4 };
const includesArabic = isArabic(text2, options);
console.log(includesArabic); // true
