Display Unicode characters (emojis) in JavaScript

I have a database with tweets such as "\U0001f374 Lunch. Had loads of meat..." -- that is, with emojis represented as Unicode escapes (\U0001f374 is the fork-and-knife emoji). In my Web app I fetch tweets using Ajax requests and want to display them.
No big deal, and I have it up and running so far that I can display the "raw" tweet strings with the Unicode escapes. However, I'd like to render the emojis. How can I do this in JavaScript?

Since a notation like \U0001f374 is undefined in JavaScript, you need to construct the character from it with your own code (or suitable library code). You could parse the Unicode number from the string and convert it to a surrogate pair.
But if you are using JavaScript in the HTML (or XML) context, you could let the HTML (or XML) parser do the job. Just change the string (assumed to have 8 hex digits) to an HTML or XML character reference and make sure the result is parsed as markup:
var sample = document.getElementById('in').value;
sample = sample.replace(/\\U([0-9a-f]{8})/gi, "&#x$1;");
document.getElementById('demo').innerHTML = sample;
<input id="in" size="40" value="\U0001f374 Lunch. Had loads of meat...">
<div id="demo">To be replaced</div>
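If you'd rather not go through innerHTML (for example, to avoid injecting markup from untrusted tweet text), here is a minimal sketch of the pure-JavaScript route mentioned above, using String.fromCodePoint (ES2015); the element id "demo" is taken from the snippet above:

function renderEscapes(s) {
  // Replace each \UXXXXXXXX escape with the character it names.
  return s.replace(/\\U([0-9a-f]{8})/gi, function (match, hex) {
    return String.fromCodePoint(parseInt(hex, 16));
  });
}
document.getElementById('demo').textContent =
  renderEscapes('\\U0001f374 Lunch. Had loads of meat...');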

Related

Decoding Base64 String in Java

I'm using Java and I have a Base64-encoded string that I wish to decode and then transform.
The correct decoded value is obtained in JavaScript through the function atob(), but in Java, using Base64.decodeBase64(), I cannot get an equal value.
Example:
For:
String str = "AAAAAAAAAAAAAAAAAAAAAMaR+ySCU0Yzq+AV9pNCCOI="
With JavaScript atob(str) I get ->
"Æ‘û$‚SF3«àö“Bâ"
With Java new String(Base64.decodeBase64(str)) I get ->
"Æ?û$?SF3«à§ö?â"
Another way I could fix the issue is to run JavaScript in Java with a Nashorn engine, but I'm getting an error near the "$" symbol.
Current Code:
ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
String script2 = "function decoMemo(memoStr){ print(atob(memoStr).split('')" +
        ".map((aChar) => `0${aChar.charCodeAt(0).toString(16)}`" +
        ".slice(-2)).join('').toUpperCase());}";
try {
    engine.eval(script2);
    Invocable inv = (Invocable) engine;
    String returnValue = (String) inv.invokeFunction("decoMemo", memoTest);
    System.out.print("\n result: " + returnValue);
} catch (ScriptException | NoSuchMethodException e1) {
    e1.printStackTrace();
}
Any help would be appreciated. I've searched a lot of places but can't find the correct answer.
btoa is broken and shouldn't be used.
The problem is, bytes aren't characters. Base64 encoding does only one thing. It converts bytes to a stream of characters that survive just about any text-based transport mechanism. And Base64 decoding does that one thing in reverse, it converts such characters into bytes.
And the confusion is, you're printing those bytes as if they are characters. They are not.
You end up with the exact same bytes, but javascript and java disagree on how you're supposed to turn that into an ersatz string because you're trying to print it to a console. That's a mistake - bytes aren't characters. Thus, some sort of charset encoding is being used, and you don't want any of this, because these characters clearly aren't intended to be printed like that.
JavaScript sort of half-equates characters and bytes and will freely convert one to the other, picking some arbitrary encoding. Oof. JavaScript sucks in this regard; it is what it is. The MDN docs on btoa explain why you shouldn't use it. You're running into that problem.
Not entirely sure how you fix it in JavaScript - but perhaps you don't need to. Java is decoding the bytes perfectly well, as is JavaScript, but JavaScript then turns those bytes into characters in some silly fashion, and that is what's causing the problem.
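For what it's worth, a sketch of one way to sidestep the problem in JavaScript: keep the decoded output as byte values and format them yourself (here as hex, which is what the question's decoMemo function was after), never treating them as printable text:

function base64ToHex(b64) {
  // Each char code of the "binary string" atob() returns is one byte (0-255).
  var bin = atob(b64);
  var hex = '';
  for (var i = 0; i < bin.length; i++) {
    hex += ('0' + bin.charCodeAt(i).toString(16)).slice(-2);
  }
  return hex.toUpperCase();
}
base64ToHex('AAAAAAAAAAAAAAAAAAAAAMaR+ySCU0Yzq+AV9pNCCOI=');
// => the 32 decoded bytes as a hex string, starting '0000...' for the zero bytes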
What you have there is not a text string at all. The giveaway is the AA's at the beginning. Those map to a number of zero bytes. That doesn't translate to meaningful text in any standard character set.
So what you have there is most likely binary data. Converting it to a string is not going to give you meaningful text.
Now to explain the difference you are seeing between Java and JavaScript: it looks to me as if both are making a "best effort" attempt to convert the binary data as if it was encoded in ISO-8859-1 (aka ISO LATIN-1).
The problem is that some of the byte values map to unassigned codes.
In the Java case, those unassigned codes are being mapped to ?, either when the string is created or when it is being output.
In the JavaScript case, either the unassigned codes are not included in the string, or they are being removed when you attempt to display them.
For the record, this is how an online Base64 decoder rendered the above for me:
����������������Æû$SF3«àöBâ
The unassigned codes are 0x91, 0x82 and 0x93; 0x15 and 0x0B are non-printing control codes.
But the bottom line is that you should not be converting this data into a string in either Java or in Javascript. It should be treated as binary; i.e. an array of byte values.
byte[] data = Base64.getDecoder().decode(str);
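The JavaScript counterpart, if you need one, keeps the bytes in a typed array rather than a string (a sketch, assuming a browser environment where atob is available):

var bytes = Uint8Array.from(atob(str), function (c) { return c.charCodeAt(0); });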

How to feed strange characters to javaScript? [duplicate]

I need to insert an Omega (Ω) onto my HTML page. I am using its HTML-escaped code to do that, so I can write &#937; and get Ω. That's all fine and well when I put it into an HTML element; however, when I try to put it into my JS, e.g. var Omega = &#937;, it parses that code as JS and the whole thing doesn't work. Anyone know how to go about this?
I'm guessing that you actually want Omega to be a string containing an uppercase omega? In that case, you can write:
var Omega = '\u03A9';
(Because Ω is the Unicode character with codepoint U+03A9; that is, 03A9 is 937, except written as four hexadecimal digits.)
Edited to add (in 2022): There now exists an alternative form that better supports codepoints above U+FFFF:
let Omega = '\u{03A9}';
let desertIslandEmoji = '\u{1F3DD}';
Judging from https://caniuse.com/mdn-javascript_builtins_string_unicode_code_point_escapes, most or all browsers added support for it in 2015, so it should be reasonably safe to use.
Although @ruakh gave a good answer, I will add some alternatives for completeness:
You could in fact use even var Omega = '&#937;' in JavaScript, but only if your JavaScript code is:
inside an event attribute, as in onclick="var Omega = '&#937;'; alert(Omega)", or
in a script element inside an XHTML (or XHTML + XML) document served with an XML content type.
In these cases, the code will first (before getting passed to the JavaScript interpreter) be parsed by an HTML parser, so that character references like &#937; are recognized. The restrictions make this an impractical approach in most cases.
You can also enter the Ω character as such, as in var Omega = 'Ω', but then the character encoding must allow that, the encoding must be properly declared, and you need software that lets you enter such characters. This is a clean solution and quite feasible if you use UTF-8 encoding for everything and are prepared to deal with the issues it creates. Source code will be readable, and reading it you immediately see the character itself, instead of a code notation. On the other hand, it may cause surprises if other people start working with your code.
Using the \u notation, as in var Omega = '\u03A9', works independently of character encoding, and it is in practice almost universal. It can, however, only be used up to U+FFFF, i.e. up to \uffff, but most characters that most people have ever heard of fall into that area. (If you need "higher" characters, you need to use either surrogate pairs or one of the two approaches above; see the sketch after the next paragraph.)
You can also construct a character using the String.fromCharCode() method, passing as a parameter the Unicode number, in decimal as in var Omega = String.fromCharCode(937) or in hexadecimal as in var Omega = String.fromCharCode(0x3A9). This works up to U+FFFF. This approach can be used even when you have the Unicode number in a variable.
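To illustrate the surrogate-pair remark above: U+1F374, the fork-and-knife emoji from the first question, lies above U+FFFF, so in old-style \u notation it takes two escapes, one per UTF-16 code unit:

var forkAndKnife = '\uD83C\uDF74'; // surrogate pair for U+1F374
console.log(forkAndKnife === '\u{1F374}');                   // true (new-style escape)
console.log(forkAndKnife === String.fromCodePoint(0x1F374)); // true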
One option is to put the character literally in your script, e.g.:
const omega = 'Ω';
This requires that you let the browser know the correct source encoding; see Unicode in JavaScript.
However, if you can't or don't want to do this (e.g. because the character is too exotic and can't be expected to be available in the code editor font), the safest option may be to use new-style string escape or String.fromCodePoint:
const omega = '\u{3a9}';
// or:
const omega = String.fromCodePoint(0x3a9);
This is not restricted to the Basic Multilingual Plane but works for all Unicode code points. In comparison, the other approaches mentioned here have the following downsides (illustrated in the sketch below):
HTML escapes (const omega = '&#937;';): only work when rendered unescaped in an HTML element
old-style string escapes (const omega = '\u03A9';): restricted to single UTF-16 code units, i.e. code points up to U+FFFF
String.fromCharCode: likewise restricted to code points up to U+FFFF
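A quick sketch of those last two limitations, using the desert-island code point U+1F3DD from above:

console.log(String.fromCodePoint(0x1F3DD) === '\u{1F3DD}'); // true
console.log(String.fromCharCode(0x1F3DD) === '\u{1F3DD}');  // false: fromCharCode truncates the argument to 16 bits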
The answer is correct, but you don't need to declare a variable.
A string can contain your character:
"This string contains omega, that looks like this: \u03A9"
Unfortunately, these ASCII escape codes are still needed for displaying UTF-8 text; I am still waiting (after too many years...) for the day when UTF-8 will be as universal as ASCII once was, and ASCII will be just a remembrance of the past.
I found this question when trying to implement a font-awesome style icon system in html. I have an API that provides me with a hex string and I need to convert it to unicode to match with the font-family.
Say I have the string const code = 'f004'; from my API. I can't do simple string concatenation (const unicode = '\u' + code;), since the parser needs to see the hex digits at parse time for a \u escape to be recognized; this will in fact cause a syntax error if you try.
@coldfix mentioned using String.fromCodePoint, but it takes a number as an argument, not a string.
To finally cross the finish line, just add parseInt, passing 16 (since hex is base 16) as its second parameter. You'll finally get a Unicode string from a simple hex string.
This is what I did:
const code = 'f004';
const toUnicode = code => String.fromCodePoint(parseInt(code, 16));
toUnicode(code);
// => '\uf004'
Try using Function(), like this:
var code = "2710"
var char = Function("return '\\u"+code+"';")()
It works well; just make sure code does not contain any ' or " characters or spaces.
In the example, char is "✐".

javascript and html character escaping

I have this problem:
I have a javascript, saved in a database field, that is going to be used in a web page as a href target, e.g.
insert into table_with_links (id, url)
values (1, 'javascript:var url="blö blö";.....');
// run scripts that use the database values to generate web pages
// part of the generated html code:
<a href="javascript:var url='blabla';..... </a>
So far no problems. I have German letters (Umlaute, e.g. ö) in the JavaScript. I shouldn't save the German letters in the database, so I escape them:
insert into table_with_links (id, url)
values (1, 'javascript:var url="bl%F6 bl%F6";.....');
Now comes the problem: I shouldn't store the % sign in the database either, because the scripts that generate the web pages cannot handle it properly. I guess you can imagine how it is: these are third-party scripts and cannot be changed.
So, my question is - can I also escape the % sign?
Did you try this?
var str= "remove the %";
var str_n = str.replace("%","");
Here are the basics: http://www.w3schools.com/jsref/jsref_replace.asp
Then you can use an array of chars to replace; take a look here: javascript replace globally with array
I would suggest using oracle's built in internationalization, Oracle is capable of handling special german characters:
http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/u_i18n.htm
If you want to handle it on your own, I would suggest doing a string replace to some sequence you know:
var str = str.replace(/ö/g,"[german-umlaute]");
(the g at the end of /ö/g is to replace all occurrences in the string)
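A minimal sketch of that replace-and-restore idea, extended to cover the % sign from the question (the placeholder tokens are illustrative, not any standard):

// Before storing in the database: replace forbidden characters with safe tokens.
function escapeForDb(s) {
  return s.replace(/%/g, '[pct]').replace(/ö/g, '[o-umlaut]');
}
// In the generated page: restore the original characters.
function restoreOnPage(s) {
  return s.replace(/\[pct\]/g, '%').replace(/\[o-umlaut\]/g, 'ö');
}
restoreOnPage(escapeForDb('blö blö 100%')); // => 'blö blö 100%'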

what kind of encoding is this?

I've got some data from DBpedia using Jena, and since Jena's output is XML-based, there are some circumstances where XML characters need to be treated differently, like the following:
Guns n &#039; Roses
I just want to know: what kind of encoding is this?
I want to decode/encode my input based on the above encoding with the help of JavaScript and send it back to a servlet.
(Edited post: the actual string contains &amp;#039;, i.e. it is encoded twice. In the original post I wrote it with a space between & and amp, which you have to remove to get the real data, because I couldn't find a way to stop Stack Overflow from rendering it.)
Seems to be XML entity encoding, and a numeric character reference (decimal).
A numeric character reference refers to a character by its Universal Character Set/Unicode code point, and uses the format &#nnnn; or &#xhhhh;.
You can get some info here: List of XML and HTML character entity references on Wikipedia.
Your character is number 39, being the apostrophe: ', which can also be referenced with a character entity reference: &apos;.
To decode this using Javascript, you could use for example php.js, which has an html_entity_decode() function (note that it depends on get_html_translation_table()).
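If you'd rather not pull in php.js, a small sketch that lets the browser's own parser do the decoding (assuming a browser environment):

function decodeEntities(s) {
  // Parse the string as HTML and read back the decoded text.
  var doc = new DOMParser().parseFromString(s, 'text/html');
  return doc.documentElement.textContent;
}
decodeEntities('Guns n &#039; Roses');     // => "Guns n ' Roses"
decodeEntities('Guns n &amp;#039; Roses'); // => "Guns n &#039; Roses" (decode twice for double-encoded input)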
UPDATE: in reply to your edit: basically that is the same, the only difference being that it was encoded twice (possibly by mistake). &amp; is the entity for the ampersand character: &.
This is an SGML/HTML/XML numeric character entity reference.
In this case for an apostrophe '.

html entity is not rendered

If I just put this in the XUL file
<label value="&#176;C"/>
it works fine. However, I need to assign the &#176; value to that label element from code, and it doesn't show the degree symbol; it shows the literal value instead.
UPD
Sorry guys, I just missed a couple of words here - it doesn't work from within JavaScript: if I assign mylabel.value = degree + "&#176;", this shows the literal value.
It does show the degree symbol only if I put the above in the XUL file manually.
What happens when you use a JavaScript escape, like "\u00B0C", instead of "&#176;C"?
Or when using mylabel.innerHTML instead of mylabel.value? (According to MDC, this should be possible.)
EDIT: you can convert those entities to JavaScript escapes using the Unicode Code Converter.
This makes sense to me. When you express the entity in an attribute value within XML markup, the XML parser interpolates the entity reference and then sets the label value to the result. From Javascript, however, there's no XML parser to do that work for you, and in fact life would be pretty nasty if there were! Note that when you set the value attribute (from Javascript) of an <input type='text'> element, you don't have to worry about having to escape XML entities (or even angle brackets, for that matter). However, you do have to worry about XML entities when you're setting the "value" attribute within XML markup.
Another way to think about it is this: XML entity notation is XML syntax, not Javascript syntax. In Javascript, you can produce special characters using 16-bit Unicode escape sequences, which look like \u followed by a four-digit hex constant. As noted in Marcel Korpel's answer, if you know what Unicode value is produced by the XML entity, then you should be able to use that directly from Javascript. In this case, you could use "\u00B0".
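Concretely, a sketch of the fix both paragraphs point at (degree and mylabel are the names from the question):

mylabel.value = degree + "\u00B0"; // JavaScript escape sequence; no XML parser involved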
That way it will not work; can you convert it to be like this:
<label>&#176;C</label>
