JavaScript OCR with Tesseract.js: error copying the number after a recognized label - javascript

I'm working on a project where you give the program an image and, using OCR in JavaScript, the program detects (recognizes) a string or word, for example 'رقم العداد' (meter number),
and copies the number (integer) that follows the string after some spaces, like this:
7038842 رقم العداد
That is all. I'm using Tesseract.js (Tesseract.recognize) to recognize the string, but at first I got this error:
Uncaught (in promise)
After some digging it turned out that Tesseract fails to recognize some Arabic letters correctly. I printed all the text detected from the image, and the string 'نقطة الخدمة' (service point) is recognized as 'ننطة الخدمة' and 'رقم العداد' as 'رم العداد'. So I used the
string.match method
with the misrecognized strings to copy the number after each word. The number returned for 'رم العداد' was correct and clear, but for some reason the code does not copy the number written after 'ننطة الخدمة'. I tried adding spaces and tabs, but the problem remains, so I decided to ask for help. What am I missing?
The code:
<script>
Tesseract.recognize(
'form.png',
'ara',
{ logger: m => console.log(m) }
).then(({ data: { text } }) => {
console.log(text);
const info = ['ننطة الخدمة','رم العداد','القراءة'];
for(k=0;k<info.length;k++){
var result = text.match(new RegExp(info[0] + '\\s+(\\w+)'))[1]; /* info[0] the index of ['نقطة الخدمة']*/
alert(result);
}
})
</script>
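One thing that may explain the Uncaught (in promise) error: String.prototype.match returns null when the pattern is not found, and reading [1] from null throws inside the then callback. Below is a minimal sketch of a null-safe lookup. The helper name numberAfterLabel and the sample text are hypothetical, and it assumes the digits follow the label in the recognized text and are ASCII digits.

```javascript
// Hypothetical helper: returns the digits that follow a label, or null if the label is absent.
function numberAfterLabel(text, label) {
  // Escape regex metacharacters so the label is matched literally.
  const escaped = label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const m = text.match(new RegExp(escaped + '\\s+(\\d+)'));
  return m ? m[1] : null; // never index into a null match
}

// Sample text assembled in logical order: label, spaces, digits.
const label = 'رم العداد';
const sample = label + '   7038842';

console.log(numberAfterLabel(sample, label));          // the digits after the label
console.log(numberAfterLabel('no label here', label)); // null instead of a thrown TypeError
```

Note also that the loop in the question always reads info[0], so every iteration searches for the same label; a loop over the labels would pass info[k] to a helper like this one.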
the project image:-

Related

Correct way of converting unicode to emoji

I'm using String.fromCodePoint to convert Unicode code points to emoji, but some emoji don't convert as expected: they display as line icons. Please check the example below; the first two emoji render correctly, but the last two don't.
for example:
const unicode = ["1f976", "1f97a", "263a-fe0f", "2639"]
unicode.forEach((val) => {
document.body.innerHTML += String.fromCodePoint(parseInt(val, 16))
});
Result:
Your code is not correct.
Old emoji are not coloured by default, so you need to append the variation selector 'fe0f'. You tried that on the third one (but not on the fourth), but the conversion to a number is wrong: parseInt stops at the hyphen, so the selector is dropped.
This code will fix it (if you have emoji fonts installed).
const unicode = ["1f976", "1f97a", "263a", "fe0f", "2639", "fe0f"]
unicode.forEach((val) => {
document.body.innerHTML += String.fromCodePoint(parseInt(val, 16))
});
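If the input has to stay in the hyphen-separated form used in the question ("263a-fe0f"), a small helper can split each entry before converting. This is only a sketch; the function name codePointsToString is made up for illustration.

```javascript
// Hypothetical helper: turns "263a-fe0f" into the two-code-point string U+263A U+FE0F.
function codePointsToString(hexSequence) {
  const codePoints = hexSequence
    .split('-')                       // one hex value per code point
    .map(hex => parseInt(hex, 16));
  return String.fromCodePoint(...codePoints);
}

const unicode = ['1f976', '1f97a', '263a-fe0f', '2639-fe0f'];
const emoji = unicode.map(codePointsToString);
console.log(emoji.join(' '));
```

String.fromCodePoint accepts any number of arguments, so a sequence with a variation selector becomes a single string in one call.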

Template literal, weird output

I'm working with some lab equipment using ASTM over TCP/IP. Getting some weird behavior. Using just Node and the net package.
socket.on('data', data => {
let str = data.toString('ascii');
console.log(`the string ---- ${str}`);
if (str === ENQ) {
socket.write(ACK);
} else {
console.log(str);
}
});
outputs (given correct input):
E1 string ---- 1H|\^&|||1^Analyzer 1^6.0|||||||P||20201216150358
E1|\^&|||1^Analyzer 1^6.0|||||||P||20201216150358
I need the stuff on the top line after the dashes, but "The" becomes E1, then E1 moves down to the next line and replaces 1H. What's going on here? I'm hoping it just has something to do with console.log so I can still get to the results I'm looking for.
So it looks like some of the control characters are making the output look weird. Towards the end of the line there is a CR and an ETX, followed by a checksum of the line. It seems the carriage return sends the cursor back to the start of the line, so the ETX and checksum are printed over "The".
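To confirm this kind of thing, it can help to log the data with its control characters made visible. A minimal sketch follows; the helper name showControlChars and the sample frame are invented for illustration and are not real analyzer output.

```javascript
// Hypothetical helper: replaces each ASCII control character with a visible <hex> tag.
function showControlChars(str) {
  return str.replace(/[\x00-\x1f\x7f]/g, ch =>
    '<' + ch.charCodeAt(0).toString(16).padStart(2, '0') + '>');
}

// Made-up sample frame: STX, payload, CR, ETX, checksum, CR, LF.
const frame = '\x021H|payload\r\x03E1\r\n';
console.log(showControlChars(frame));
```

JSON.stringify(str) is a quick alternative that also escapes control characters, although it uses \u escapes for most of them.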

JavaScript - Why does this code alert a message?

I don't know much about JavaScript, but I found this code as a part of some game engine code. I tried to inspect it, because I noticed this part of code alerts a message and I really cannot figure out how. Here is the minimal code (I reduced it and extracted from original script and I changed variable names to single letters):
var a = '͏‪͏‪‪‪‪‪͏͏‪‪‪‪͏‪͏͏‪͏͏‪‪‪͏‪͏‪‪͏‪‪͏‪‪‪‪‪‪͏͏‪͏‪‪͏‪‪͏͏‪͏‪͏͏͏͏‪‪‪͏͏͏͏͏‪‪͏‪‪͏‪͏‪‪‪͏͏͏‪͏‪‪‪͏‪‪‪͏‪‪‪͏‪͏͏͏‪‪‪‪͏‪‪͏‪‪͏‪‪‪͏͏‪‪‪‪͏‪‪͏‪‪‪‪‪͏͏͏‪‪‪‪‪͏‪͏‪‪‪‪‪͏͏͏‪‪‪‪͏‪‪͏‪‪‪͏‪͏͏͏‪‪‪‪‪͏‪͏‪‪‪‪͏͏‪͏‪‪‪͏͏͏͏͏‪‪‪‪‪͏͏͏‪‪‪‪‪͏‪͏‪‪͏‪‪͏‪͏‪‪‪͏͏͏‪͏‪‪‪͏‪‪‪͏‪‪‪‪‪͏͏͏‪‪‪‪͏‪‪͏‪͏‪‪‪͏‪͏‪͏‪‪‪͏‪͏͏‪͏‪͏͏͏͏͏‪͏‪͏͏͏͏‪‪‪͏‪͏‪͏‪‪‪͏͏͏‪͏‪‪͏‪‪‪͏͏‪‪‪͏͏‪͏͏‪‪‪‪‪͏͏͏‪‪‪‪‪͏‪͏‪‪‪‪͏͏‪͏‪‪‪͏‪‪͏͏‪‪‪‪͏‪͏͏‪‪‪͏‪‪‪͏‪͏‪‪‪͏‪͏͏‪͏‪͏‪‪͏‪‪‪͏͏͏‪͏‪‪‪͏͏‪‪͏‪‪‪͏͏‪‪͏‪‪‪‪͏‪‪͏‪‪‪͏͏‪‪͏‪‪‪‪͏‪‪͏‪‪‪‪͏‪‪͏‪‪‪͏͏‪‪͏‪‪‪‪͏‪‪͏‪‪‪‪͏‪‪͏‪‪‪‪͏‪‪͏‪‪͏‪‪͏‪͏‪‪‪‪͏‪͏͏‪‪‪͏‪‪‪͏‪͏‪‪‪͏‪͏͏‪͏‪͏͏‪͏͏‪͏‪͏͏͏͏͏‪͏‪͏͏‪͏‪‪‪‪‪‪͏͏‪͏‪‪‪͏‪͏‪͏‪‪͏‪‪͏͏‪͏‪͏͏͏͏‪‪‪͏‪͏‪͏‪͏‪‪͏‪‪͏‪‪‪‪‪͏‪͏‪‪‪‪͏͏‪͏‪͏‪‪͏‪‪͏‪‪‪‪͏͏͏͏‪͏‪‪‪͏‪͏‪‪‪‪‪͏‪͏‪͏‪‪͏‪‪͏‪‪͏‪‪‪‪͏‪͏‪‪‪͏‪͏͏‪͏‪͏͏‪͏‪‪‪͏͏͏‪͏‪‪‪͏‪‪‪͏‪‪͏‪‪‪͏͏‪‪‪‪‪͏͏͏‪‪‪‪‪͏‪͏‪‪‪͏͏‪͏͏‪͏‪‪‪͏‪͏‪͏‪‪‪͏‪͏‪‪‪‪‪͏‪͏͏‪͏‪͏‪‪͏‪‪͏‪‪‪‪͏‪‪‪‪‪͏͏͏‪‪‪‪͏͏‪͏͏‪͏͏͏͏‪͏‪‪‪͏‪͏‪͏‪‪‪͏͏͏‪͏͏‪͏‪͏‪‪͏͏‪͏͏͏͏‪͏‪‪‪‪‪͏‪͏‪‪‪‪͏‪͏͏‪͏‪‪͏‪‪͏͏‪͏͏͏͏‪͏‪‪‪‪‪͏‪͏‪‪‪‪͏‪‪͏‪‪‪‪͏͏‪͏‪͏‪‪‪͏‪͏͏‪͏‪͏‪‪͏͏‪͏‪͏͏͏͏‪‪‪‪͏͏͏͏͏‪͏͏͏͏‪͏‪‪͏‪‪‪‪͏‪‪‪͏‪‪‪͏͏‪͏‪͏‪‪͏‪͏‪‪͏‪‪͏‪‪‪‪‪͏‪͏‪͏‪‪‪͏‪͏‪‪‪͏‪‪͏͏‪‪‪‪͏͏͏͏͏‪͏‪͏‪‪͏‪͏‪‪͏‪‪͏‪‪‪‪‪‪͏͏͏‪͏‪͏͏‪͏‪‪‪͏‪͏‪͏‪‪‪‪͏͏͏͏‪‪‪‪‪͏‪͏͏‪͏‪͏͏‪͏‪‪‪͏‪͏͏͏‪‪‪‪͏͏‪͏‪‪͏‪‪‪‪͏‪͏‪‪͏‪‪͏‪‪͏‪‪‪‪͏‪͏‪‪‪͏‪͏‪‪‪‪‪͏‪͏͏‪͏‪͏͏‪͏‪͏‪‪͏‪‪͏‪‪‪͏͏‪‪͏‪͏‪‪‪͏‪͏͏‪͏‪͏‪‪͏‪‪‪‪͏‪͏͏‪‪‪͏͏‪͏͏‪‪‪‪‪͏͏͏͏‪͏͏͏͏‪͏‪‪‪‪‪͏‪͏‪‪‪‪͏‪‪͏‪‪‪‪͏‪‪͏‪‪‪‪‪͏͏͏‪‪‪‪͏‪‪͏‪‪‪͏͏͏‪͏‪͏‪‪͏‪‪͏‪‪‪‪͏‪‪͏‪‪‪͏͏͏͏͏‪͏‪‪͏‪‪͏‪‪‪‪‪‪͏͏‪‪‪‪‪͏‪͏͏‪͏‪͏͏‪͏‪‪‪‪͏‪͏͏‪‪‪‪͏‪‪͏‪‪͏‪‪‪‪͏‪͏‪‪͏‪‪͏‪͏‪‪‪͏‪͏‪‪͏‪‪‪͏͏‪‪‪‪͏͏‪͏‪‪‪͏‪͏‪͏‪‪‪‪͏‪͏͏‪‪͏‪‪͏‪͏‪‪‪‪͏͏‪͏͏‪͏͏͏͏‪͏‪‪‪͏‪‪͏͏͏‪͏͏‪‪‪͏͏‪‪‪͏‪‪͏‪‪͏͏‪‪͏͏‪‪͏‪‪‪‪͏‪‪‪͏͏‪͏͏͏‪͏‪͏͏͏͏‪‪͏‪͏͏‪͏͏‪͏͏͏͏͏͏‪‪͏‪‪‪‪͏‪‪͏͏‪‪͏͏͏‪͏͏‪‪‪͏‪‪͏‪‪͏‪͏‪‪͏‪‪‪͏͏‪‪͏‪‪‪‪͏‪‪‪͏͏͏͏͏‪‪‪͏͏͏‪͏‪‪‪͏͏‪͏͏‪‪‪͏͏‪‪͏‪‪‪͏‪͏͏͏‪‪‪͏‪͏‪͏‪‪‪͏‪‪͏͏‪‪‪͏‪‪‪͏‪‪‪‪͏͏͏͏‪‪‪‪͏͏‪͏‪‪‪‪͏‪͏͏‪‪‪‪͏‪‪͏‪‪‪‪‪͏͏͏‪‪‪‪‪͏‪͏‪‪‪‪‪‪͏͏͏‪͏͏‪‪‪͏͏‪͏‪͏͏‪͏‪‪‪͏‪‪‪͏‪‪͏‪͏͏‪͏‪‪‪͏‪͏͏͏‪‪͏‪͏͏͏͏͏‪͏‪͏͏͏͏‪͏‪‪‪‪‪͏͏‪͏‪‪‪͏͏‪‪‪͏͏‪‪͏‪‪‪͏͏͏͏͏‪‪͏‪‪͏͏͏‪‪͏‪͏͏‪͏‪‪‪͏‪͏͏͏͏‪͏‪͏͏͏͏‪‪͏‪͏͏‪͏͏‪͏‪͏͏‪͏͏‪͏‪͏͏‪͏‪͏‪‪‪‪‪͏͏‪‪‪‪͏‪͏‪‪͏‪͏‪͏͏‪‪͏‪‪‪‪͏‪‪͏‪͏͏‪͏‪‪͏‪‪‪͏͏͏‪͏‪͏͏͏͏‪‪‪͏͏͏͏͏‪‪͏‪‪‪‪͏‪‪‪͏͏͏͏͏͏‪͏‪͏͏͏͏͏‪͏‪͏͏‪͏͏‪͏‪͏͏‪͏͏‪‪‪͏‪‪͏‪‪͏͏‪͏‪͏‪‪‪͏‪‪͏͏‪‪͏͏͏͏‪͏‪‪͏‪‪͏͏͏͏‪͏‪͏͏͏͏‪͏‪‪‪‪‪͏͏‪͏‪͏͏‪';
var b = a.match(/.{8}/g);
var c = b.map(a => [...a].map(a => a == '‪' | 0));
var d = c.map(a => parseInt(a.join``, 2).toString(16));
var e = d.map(a => eval(`'\\x${a.padStart(2, 0)}'`));
var f = eval(e.join``);
I'm trying to understand how it manages to alert a message. It alerts the number 12345, but how? I see some evals here, so I suppose the code is generated on the fly, but even with the debugger I couldn't find an explanation. It somehow generates code and executes it, and I still can't see how.
I tried this code in jsFiddle and it still works. In Node.js it throws the error alert is not defined, so I'm pretty sure all this code does is alert a message.
What trick is used here? How is the code built and evaled, and how does it manage to alert a message? Is this some sort of encryption or what?
My question has absolutely nothing to do with this question.
The code is all there, hidden in the variable a. No, it's not an empty string; it's a string consisting of 1888 invisible characters, each either \u034f or \u202a to be precise. So this is in fact just a disguised binary encoding.
The code part
var b = a.match(/.{8}/g);
var c = b.map(a => [...a].map(a => a == '‪' | 0));
var d = c.map(a => parseInt(a.join``, 2).toString(16));
breaks them in chunks of 8, then converts each chunk from an array of characters to an array of booleans (or rather, the integers 0 and 1) - notice that it compares the character against the invisible \u202a, and then converts each array-of-8-booleans (oh look, an octet!) into an actual byte and gets a hex representation of it. Here's the hex string (d.join('')):
5f3d275b7e5b28706d7177747b6e7b7c7d7c7b747d79707c7d6d71777c7b5d5d282875716e727c7d79767a775d2b7173737b737b7b737b7b7b6d7a775d2928297e5d5b28755b7d795b785d7d5b6f5d2971776e7c7d725d5d7d2b6f7c792175712b217d7a5b217d7b795d2b2878216f772b5b7d5d76782b5b7e2975787d2974796f5b6f5d7d295b735d2b7a727c217d7b7b7c7b715b7b705b7e7d297a7b6f5b5d6e79757a6d792176273b666f722869206f66276d6e6f707172737475767778797a7b7c7d7e272977697468285f2e73706c6974286929295f3d6a6f696e28706f702829293b6576616c285f29
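The decoding step described above can be reproduced on a small input. The sketch below is an illustration, not the original tool: the encode function is invented so there is something to decode, and decode uses String.fromCharCode in place of the eval-based escape from the question.

```javascript
const ZERO = '\u034f'; // invisible character used as bit 0
const ONE = '\u202a';  // invisible character used as bit 1

// Hypothetical encoder (single-byte characters only): 8 invisible characters per input character.
function encode(text) {
  return [...text]
    .map(ch => ch.charCodeAt(0).toString(2).padStart(8, '0'))
    .join('')
    .replace(/0/g, ZERO)
    .replace(/1/g, ONE);
}

// Decoder following the same steps as the question's code: chunk, map to bits, parse as a byte.
function decode(hidden) {
  return hidden
    .match(/.{8}/g)
    .map(chunk => [...chunk].map(ch => (ch === ONE ? 1 : 0)).join(''))
    .map(bits => String.fromCharCode(parseInt(bits, 2)))
    .join('');
}

const hidden = encode('alert(1)');
console.log(hidden.length);  // 8 invisible characters per byte
console.log(decode(hidden)); // the original text again
```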
The part
d.map(a => eval(`'\\x${a.padStart(2, 0)}'`));
has each of them parsed into a character, using a backslash escape. String.fromCharCode would have been the simpler choice. Also the padStart is not even required here, given that none of the bytes is a control character with a byte value less than 16. Maybe this would've been more familiar:
"\x5f\x3d\x27\x5b\x7e\x5b\x28\x70\x6d\x71\x77\x74\x7b\x6e\x7b\x7c\x7d\x7c\x7b\x74\x7d\x79\x70\x7c\x7d\x6d\x71\x77\x7c\x7b\x5d\x5d\x28\x28\x75\x71\x6e\x72\x7c\x7d\x79\x76\x7a\x77\x5d\x2b\x71\x73\x73\x7b\x73\x7b\x7b\x73\x7b\x7b\x7b\x6d\x7a\x77\x5d\x29\x28\x29\x7e\x5d\x5b\x28\x75\x5b\x7d\x79\x5b\x78\x5d\x7d\x5b\x6f\x5d\x29\x71\x77\x6e\x7c\x7d\x72\x5d\x5d\x7d\x2b\x6f\x7c\x79\x21\x75\x71\x2b\x21\x7d\x7a\x5b\x21\x7d\x7b\x79\x5d\x2b\x28\x78\x21\x6f\x77\x2b\x5b\x7d\x5d\x76\x78\x2b\x5b\x7e\x29\x75\x78\x7d\x29\x74\x79\x6f\x5b\x6f\x5d\x7d\x29\x5b\x73\x5d\x2b\x7a\x72\x7c\x21\x7d\x7b\x7b\x7c\x7b\x71\x5b\x7b\x70\x5b\x7e\x7d\x29\x7a\x7b\x6f\x5b\x5d\x6e\x79\x75\x7a\x6d\x79\x21\x76\x27\x3b\x66\x6f\x72\x28\x69\x20\x6f\x66\x27\x6d\x6e\x6f\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x27\x29\x77\x69\x74\x68\x28\x5f\x2e\x73\x70\x6c\x69\x74\x28\x69\x29\x29\x5f\x3d\x6a\x6f\x69\x6e\x28\x70\x6f\x70\x28\x29\x29\x3b\x65\x76\x61\x6c\x28\x5f\x29"
This string is the one evaled in the last line. But surprise, the contents of that string are just
_='[~[(pmqwt{n{|}|{t}yp|}mqw|{]]((uqnr|}yvzw]+qss{s{{s{{{mzw])()~][(u[}y[x]}[o])qwn|}r]]}+o|y!uq+!}z[!}{y]+(x!ow+[}]vx+[~)ux})tyo[o]})[s]+zr|!}{{|{q[{p[~})z{o[]nyuzmy!v';for(i of'mnopqrstuvwxyz{|}~')with(_.split(i))_=join(pop());eval(_)
So what does - still obfuscated - code do?
var _='[~[(pmqwt{n{|}|{t}yp|}mqw|{]]((uqnr|}yvzw]+qss{s{{s{{{mzw])()~][(u[}y[x]}[o])qwn|}r]]}+o|y!uq+!}z[!}{y]+(x!ow+[}]vx+[~)ux})tyo[o]})[s]+zr|!}{{|{q[{p[~})z{o[]nyuzmy!v';
for (var i of 'mnopqrstuvwxyz{|}~')
with (_.split(i))
_=join(pop());
eval(_)
Removing the with magic, we get
for (var i of 'mnopqrstuvwxyz{|}~') {
let temp = _.split(i);
_ = temp.join(temp.pop());
}
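To see what one pass of this loop does, here is a small sketch on a made-up input. The function name expand and the sample string are hypothetical; the packed format assumed is "text with key occurrences, then the key once more, then the replacement".

```javascript
// Hypothetical unpacker: for each key, the text after the last key occurrence is the replacement.
function expand(packed, keys) {
  let s = packed;
  for (const key of keys) {
    const parts = s.split(key);      // pieces between key occurrences
    const replacement = parts.pop(); // the trailing piece is the replacement text
    s = parts.join(replacement);     // put the replacement where the key used to be
  }
  return s;
}

// 'X' stands for '--'; the replacement is stored after the final 'X'.
console.log(expand('aXbXcX--', 'X')); // "a--b--c"
```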
So for each of these characters from m to ~, it splits _ by that character, takes the last part out, and joins the rest back together with it, effectively
replacing m by y!v,
replacing n by yuz,
replacing o by [],
replacing p by [~})z{,
replacing q by [{,
replacing r by |!}{{|{,
replacing s by ]+z,
replacing t by y[][[]]})[,
replacing u by x}),
replacing v by x+[~),
replacing w by +[}],
replacing x by ![],
replacing y by ]+(,
replacing z by [!}{,
replacing { by +!},
replacing | by ]+(!![]})[,
replacing } by +[],
replacing ~ by ][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]
and after all that we get for _ to be evaled the code
[][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]][([][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+([][[]]+[])[+!+[]]+(![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[+!+[]]+([][[]]+[])[+[]]+([][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+(!![]+[])[+!+[]]]((![]+[])[+!+[]]+(![]+[])[!+[]+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]+(!![]+[])[+[]]+(![]+[][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[!+[]+!+[]+[+[]]]+[+!+[]]+[!+[]+!+[]]+[!+[]+!+[]+!+[]]+[!+[]+!+[]+!+[]+!+[]]+[!+[]+!+[]+!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+([![]]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[!+[]+!+[]+[+[]]])()
Now doesn't that look familiar? It's good old jsfuck!
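For readers who have not met this style before, the building blocks are type coercions that turn combinations of [], ! and + into numbers and strings. The sketch below shows a few of them with named variables added for readability; as far as I can tell, the first bracketed expression in the payload above spells "filter" in the same way, which gives access to the Function constructor.

```javascript
// Numbers built from coercion only.
const zero = +[];                 // 0
const one = +!+[];                // 1
const two = !+[] + !+[];          // 2
const three = !+[] + !+[] + !+[]; // 3

// Strings built from coercion only.
const falseStr = ![] + [];        // "false"
const trueStr = !![] + [];        // "true"

// Picking characters by index spells out identifiers.
const word = falseStr[one] + falseStr[two] + trueStr[three] + trueStr[one] + trueStr[zero];
console.log(word); // "alert"

// Any function's constructor is Function, which can run a string as code.
const FunctionCtor = []['filter']['constructor'];
console.log(FunctionCtor === Function); // true
```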
I found this code as a part of some game engine code
I doubt it. It looks much more like a submission to a code obfuscation contest. However, it doesn't appear to be hand-crafted; more likely someone just blindly chained multiple obfuscation tools together.

Why is the following file extension regex not matching?

This is the code:
isValid = field.uploads.forEach(upload => {
console.log(upload.file)
console.log('ext:', _.getFileExt(upload.file.name))
console.log('reg:', regex)
console.log('res:', regex.test(_.getFileExt(upload.file.name)))
})
These are the logs:
ext: jpg
reg: /^.*\.(.jpg)$/i
res: false
As you can see, even if the file is jpg the regex returns false. Why is this?
EDIT:
Here are the utility functions:
_.listToRegex = (array) => {
return new RegExp('^.*\\.(' + array.join('|') + ')$', 'i')
}
_.getFileExt = (string) => {
return string.split('.').pop()
}
You have two problems as far as I can tell. First, your regular expression has an error. Second, you have to decide whether the regular expression that you are creating is intended to match against file names or file extensions.
1. The regex that was logged to the console contains an extra period, as pointed out in the comments above. /^.*\.(.jpg)$/i could be condensed to /\.(jpg)$/i if you only intend to use the expression to test for validity. You did not seem to show the assignment of your regex variable, so it is difficult to gauge how exactly the error arose, but my best guess is that you called your listToRegex utility like:
var regex = _.listToRegex(['.jpg'])
The . in the string '.jpg' would cause the introduction of the extra period. You could replace that code with:
var regex = _.listToRegex(['jpg'])
2. Secondly, you seem to be testing your regular expression against the file extension, when I think you want to test it against a file name.
regex.test(upload.file.name) //=> true
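Putting both points together, a sketch of the corrected flow might look like this. It uses standalone functions instead of the _ utility object from the question, the file names are invented, and it assumes the extensions are plain alphanumeric strings without a leading dot.

```javascript
// Builds a case-insensitive regex that matches file names ending in one of the extensions.
function listToRegex(extensions) {
  return new RegExp('\\.(' + extensions.join('|') + ')$', 'i');
}

const regex = listToRegex(['jpg', 'png']);

// Test against the full file name, not the bare extension.
console.log(regex.test('photo.JPG')); // true
console.log(regex.test('notes.txt')); // false
console.log(regex.test('jpg'));       // false: a bare extension has no dot to match

// With a callback that returns a boolean, every() yields an overall validity flag.
const names = ['a.jpg', 'b.png'];
const isValid = names.every(name => regex.test(name));
console.log(isValid); // true
```

Array.prototype.forEach always returns undefined, so every (or some) is needed if the result should end up in isValid.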

Why multiple unicode conversions String.fromCharCode("👉".charCodeAt(0)) ruin the symbol in Chrome console and how to fix it?

I have found this today and can't make out why it fails:
Basically if you take some obscure symbol like
"👉"
then "👉".charCodeAt(0) in the Chrome console gives you the code 55357, but when you reverse the operation with String.fromCharCode(55357) it produces "�".
Even String.fromCharCode("👉".charCodeAt(0)) produces "�". However, String.fromCharCode("👉".charCodeAt(0)).charCodeAt(0) is still 55357, so no information is lost, which suggests that Chrome simply can't find the correct symbol to map 55357 to.
Why can't Chrome represent the symbol correctly? Is it because it can't map it to a font correctly? How do I make the double conversion show "👉" again?
If you log
"👉".length
you will get 2: the string actually contains two UTF-16 code units, not one. JavaScript strings are sequences of 16-bit code units, so symbols outside the Basic Multilingual Plane ("astral" symbols) are encoded as "surrogate pairs". Your symbol is \uD83D\uDC49 internally, and .charCodeAt(0) returns only \uD83D, a lone high surrogate that is not a valid character on its own.
More on https://mathiasbynens.be/notes/javascript-unicode
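In engines that support ES2015, codePointAt and String.fromCodePoint handle surrogate pairs directly, so the round trip from the question can be written as in this short sketch (the variable names are illustrative only):

```javascript
const hand = '\u{1F449}'; // the pointing-hand symbol from the question

console.log(hand.length);        // 2 code units
console.log(hand.charCodeAt(0)); // 55357, the high surrogate only
console.log(hand.codePointAt(0)); // 128073, the full code point

// Round trip using the full code point.
const viaCodePoint = String.fromCodePoint(hand.codePointAt(0));

// Round trip using both code units also works.
const viaCodeUnits = String.fromCharCode(hand.charCodeAt(0), hand.charCodeAt(1));

console.log(viaCodePoint === hand, viaCodeUnits === hand); // true true
```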
The following expression computes the 'correct' code point (128073) from the two surrogates:
(("👉".charCodeAt(0)-0xD800)*0x400) + ("👉".charCodeAt(1)-0xDC00) + 0x10000
You can then convert it to an HTML character reference like this:
"&#x"+(((("👉".charCodeAt(0)-0xD800)*0x400) + ("👉".charCodeAt(1)-0xDC00) + 0x10000)).toString(16)+";"
And string extension:
String.prototype.charCodeUTF32 = function(){
return ((((this.charCodeAt(0)-0xD800)*0x400) + (this.charCodeAt(1)-0xDC00) + 0x10000));
};
Hope this saves you some time.
TypeScript to convert a text containing emojis:
private emoji2html(text: string): string {
const regexAstralSymbols = /([\uD800-\uDBFF])([\uDC00-\uDFFF])/g;
return text.replace(regexAstralSymbols, (m, first, second) =>
`&#x${(first + second).charCodeUTF32().toString(16)};`);
}
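An alternative that avoids extending String.prototype is to match whole astral code points with the u flag and read them with codePointAt. This is a sketch in plain JavaScript; the function name emojiToHtml is made up for illustration.

```javascript
// Hypothetical helper: replaces every code point above U+FFFF with a hex character reference.
function emojiToHtml(text) {
  // With the u flag, the class matches whole code points rather than single surrogates.
  return text.replace(/[\u{10000}-\u{10FFFF}]/gu, ch =>
    '&#x' + ch.codePointAt(0).toString(16) + ';');
}

console.log(emojiToHtml('a\u{1F449}b')); // "a&#x1f449;b"
```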
