Insert emoji with zero width joiner using Javascript

I have not been able to successfully insert an emoji into the DOM using Javascript when I am given the codepoints and zero width joiners are used.
Consider this emoji: 👩‍👩‍👦
I am able to create a string that looks like this:
👩‍👩‍👦
and insert it into the innerHTML of an element, but the three characters end up being displayed instead of the single combined character. If you look at the HTML on this page for this character, you can see that it is formatted in the same way as my string:
https://emojipedia.org/family-woman-woman-boy/
This is only an issue when zero width joiners are used.
So doing this:
el.innerHTML = "👩‍👩‍👦"
should result in a single character but it doesn't, so how can I get the single character to display? NOTE: the character cannot just be added by typing the text into an editor. The content is generated by JavaScript.

Not really sure what the question is here, but if you have a good UTF8/Unicode editor you can of course just paste the emoji into your text file.
If this is problematic you could build it up using HTML escaping.
Below I have done both: the first just pastes the emoji into the editor (unfortunately the SO editor is not the best here), and the second uses HTML escaping.
Hope this helps.
Update: your version also seems to work for me in Chrome. Which browsers are you using?
document.querySelector("#container").innerHTML = "👩‍👩‍👦";
document.querySelector("#container2").innerHTML =
"👩‍👩‍👦";
document.querySelector("#container3").innerHTML =
"👩‍👩‍👦";
<div id="container">
</div>
<div id="container2">
</div>
<div id="container3">
</div>
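When the code points are available as numbers, the string can also be built directly, without HTML entities. The following is a minimal sketch and not part of the answer above; the helper names emojiFromCodePoints and toHtmlEntities are made up for illustration, and the #container lookup assumes the markup shown above.

```javascript
// Code points for the family emoji: woman, ZWJ, woman, ZWJ, boy.
const FAMILY = [0x1f469, 0x200d, 0x1f469, 0x200d, 0x1f466];

// Build the actual characters; the result can be assigned to
// textContent or innerHTML alike.
function emojiFromCodePoints(codePoints) {
  return String.fromCodePoint(...codePoints);
}

// Build hexadecimal character references; these are only decoded
// when the string is parsed as HTML (for example via innerHTML).
function toHtmlEntities(codePoints) {
  return codePoints
    .map((cp) => "&#x" + cp.toString(16).toUpperCase() + ";")
    .join("");
}

// Only touch the DOM when one is available (hypothetical usage).
if (typeof document !== "undefined") {
  const el = document.querySelector("#container");
  if (el) el.textContent = emojiFromCodePoints(FAMILY);
}
```

Whether the sequence is drawn as one glyph still depends on the font and platform supporting that particular joined emoji; where they do not, the three people are shown side by side.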

Related

How should I prevent HTML from interpreting user-entered text as an entity?

I have a website where users can enter text. A user entered something like "I worked on the #3&#4 valves" into an <input>. That text gets stored in a database, and displayed on screen somewhere else. My problem is that the "&#4" is being interpreted as an HTML entity or special character, and I want it to be interpreted literally.
Do I need to use Javascript to escape & from the <input>? I was hoping that <pre> would work, but it also interprets the text as a code. Again, this is user inputted text.
For example, when I run the code below, the <input> shows different text than the <p>. I want the <p> to show exactly what the <input> shows.
<html>
<body>
<input id="box">
<p id="para"></p>
</body>
<script>
document.getElementById("box").value = "something #3&#4";
document.getElementById("para").innerHTML = "something #3&#4";
</script>
</html>
Fiddle
EDIT:
I realized that I'll need both a client-side solution and a server-side solution. In one place that user-inputted text is displayed, I'm using Javascript's .innerHTML, and on another webpage, I'm echoing it with PHP.

I think your real issue is a lack of server-side filtering. Given that you are having this problem, it seems very likely to me that you aren't doing any server-side input filtering/cleaning at all, which means that you are also going to be vulnerable to XSS.
On the server side you should be sanitizing everything that goes back out to the client, which includes both stripping HTML tags (and also returning errors on save if people try to send up HTML tags) as well as replacing HTML special characters (see htmlspecialchars). The latter will convert your & into &amp;, which will have the end result you desire: your text will not be interpreted as HTML special characters.
The problem with fixing this with javascript client side is that, not only do you have to do it everywhere, but you also have to remember to do it in a different way if there are cases where this same output is shown in the HTML document itself, i.e. not displayed by javascript.
In short, coming up with a coherent (and thorough) method for sanitizing user data before it goes back to the browser will fix your problem and also provide a first layer of protection against a number of malicious attacks.

Working fiddle.
Try to append the content as text, not as HTML, using one of the following properties (innerText or textContent), like:
document.getElementById("para").innerText = "something #3&#4";
document.getElementById("para").textContent = "something #3&#4";
NOTE : In case of server-side display you could use htmlentities($content).
Hope this helps.
document.getElementById("para").textContent = "something #3&#4";
<p id="para"></p>

Use innerText instead of innerHTML.
https://jsfiddle.net/9746ah8s/2/

You need to stop manipulating it as HTML, because text only becomes code if you do it explicitly. In a slightly modified version of your example, please compare:
var txt = "one <strong>two</strong>";
document.getElementById("box").value = txt;
document.getElementById("para1").innerHTML = txt;
document.getElementById("para2").innerText = txt;
<input id="box">
<p id="para1"></p>
<p id="para2"></p>
(In the case of <input> there's only one option because the element cannot hold HTML in the first place.)

To display &, you could replace every & with &amp;. That way you will see #3&#4, and '&#4' won't be interpreted.
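A client-side version of that replacement might look like the sketch below. The function name escapeHtml is an assumption for illustration; it covers the five characters that are usually escaped, with & handled first so that the entities produced by the later replacements are not escaped a second time.

```javascript
// Escape the characters that have special meaning in HTML so the
// result can be assigned to innerHTML and still display literally.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must come first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical usage with the question's markup.
if (typeof document !== "undefined") {
  const para = document.getElementById("para");
  if (para) para.innerHTML = escapeHtml("something #3&#4");
}
```

When the target is a single element, assigning to textContent avoids the need for escaping altogether; a helper like this is mainly useful when the text has to be embedded in a larger HTML string.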

Parsing plain text Markdown from a ContentEditable div

I know there are other questions on editable divs, but I couldn't find one specific to the Markdown-related issue I have.
The user will be typing inside a contenteditable div, and may choose to do any number of Markdown-related things like code blocks, headers, and so on.
I am having issues extracting the source properly and storing it into my database to be displayed again later by a standard Markdown parser. I have tried two ways:
$('.content').text()
In this method, the problem is that all the line breaks are stripped out and of course that is not okay.
$('.content').html()
In this method, I can get the line breaks working fine by using regex to replace <br\> with \n before inserting into database. But the browser also wraps things like ## Heading Here with divs, like this: <div>## Heading Here</div>. This is problematic for me because when I go to display this afterwards, I don't get the proper Markdown formatting.
What's the best (most simple and reliable) way to solve this problem as of 2015?
EDIT: Found a potential solution here: http://www.davidtong.me/innerhtml-innertext-textcontent-html-and-text/

If you check the documentation of jQuery's .text() method:
The result of the .text() method is a string containing the combined text of all matched elements. (Due to variations in the HTML parsers in different browsers, the text returned may vary in newlines and other white space.)
So preserving whitespace is not guaranteed in all browsers.
Try using the innerText property of the element:
document.getElementsByClassName('content')[0].innerText
This returns the text with all white space intact, but it is not cross-browser compatible. It works in IE and Chrome, but not in Firefox.
The innerText equivalent for Firefox is textContent, but that does not reproduce the line breaks created by <br> and block elements.

This is what I've been able to come up with using that link I posted above in my edit. It's in Coffeescript.
div = $('.content')[0]
if div.innerText
  text = div.innerText
else
  escapedText = div.innerHTML
    .replace(/(?:\r\<br\>|\r|\<br\>)/g, '\n')
    .replace(/(\<([^\>]+)\>)/gi, "")
  text = _.unescape(escapedText)
Basically, I'm checking whether or not innerText works, and if it doesn't then we do this other thing where we:
Take the HTML, which has escaped text.
Replace all the <br> tags with line breaks.
Strip out any tags (escaped ones won't be stripped, i.e. the stuff the user types).
Unescape the escaped text.
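The fallback steps above can be sketched in plain JavaScript without jQuery or Underscore. This is an illustrative rewrite, not the author's code: the function name htmlToPlainText is an assumption, and the small unescape step only handles the five common entities rather than everything _.unescape or a real HTML parser would.

```javascript
// Convert the innerHTML of an editable element to plain text:
// 1. turn <br> tags (and stray carriage returns) into line breaks,
// 2. strip any remaining tags added by the browser,
// 3. unescape the entities that stand for what the user typed.
function htmlToPlainText(html) {
  const withNewlines = html.replace(/\r?<br\s*\/?>|\r/gi, "\n");
  const withoutTags = withNewlines.replace(/<[^>]+>/g, "");
  return withoutTags
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;"
}

// Hypothetical usage with the question's markup.
if (typeof document !== "undefined") {
  const div = document.querySelector(".content");
  if (div) console.log(htmlToPlainText(div.innerHTML));
}
```

Like the CoffeeScript version, this simply drops wrapper tags such as <div>, so lines that the browser separates with block elements rather than <br> are joined together.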

Invalid location of <script> tag within a HTML <pre> tag

I am going through the example given in JavaScript The Complete Reference 3rd Edition.
The output, as given by the author, can be seen here.
<body>
<h1>Standard Whitespace Handling</h1>
<script>
// STRINGS AND (X)HTML
document.write("Welcome to JavaScript strings.\n");
document.write("This example illustrates nested quotes 'like this.'\n");
document.write("Note how newlines (\\n's) and ");
document.write("escape sequences are used.\n");
document.write("You might wonder, \"Will this nested quoting work?\"");
document.write(" It will.\n");
document.write("Here's an example of some formatted data:\n\n");
document.write("\tCode\tValue\n");
document.write("\t\\n\tnewline\n");
document.write("\t\\\\\tbackslash\n");
document.write("\t\\\"\tdouble quote\n\n");
</script>
<h1>Preserved Whitespace</h1>
<pre>
<script> // in Eclipse IDE, at this line invalid location of tag(script)
// STRINGS AND (X)HTML
document.write("Welcome to JavaScript strings.\n");
document.write("This example illustrates nested quotes 'like this.'\n");
document.write("Note how newlines (\\n's) and ");
document.write("escape sequences are used.\n");
document.write("You might wonder, \"Will this nested quoting work?\"");
document.write(" It will.\n");
document.write("Here's an example of some formatted data:\n\n");
document.write("\tCode\tValue\n");
document.write("\t\\n\tnewline\n");
document.write("\t\\\\\tbackslash\n");
document.write("\t\\\"\tdouble quote\n\n");
</script>
</pre>
</body>
(X)HTML automatically “collapses” multiple whitespace characters down to one whitespace. So, for example, including multiple consecutive tabs in your HTML shows up as only one space character. In this example, the pre tag is used to tell the browser that the text is preformatted and that it should not collapse the white space inside of it. Similarly, we could use the CSS white-space property to modify standard white space handling. Using pre allows the tabs in the example to be displayed correctly in the output.
So, how do I get rid of this warning, and do I really need to be concerned about it? I think I am missing something, as my intuition is that the authors are not wrong.

There is nothing wrong with having a script inside a pre tag. It is just an Eclipse IDE validation issue. If you open this HTML in the browser, everything works fine and no warnings are displayed.
Also, if you wanted to show script tag as 'text content' inside pre tag, then have a look at this question: script in pre

Replacing to HTML Character Entities and reverting back

When replacing things in my chat room, the text comes up in the box as the 'HTML Character Entities'. However, I want it to revert back and actually show the character typed in when it is then shown in the chat room. I am using the following code to stop any HTML from being entered and damaging the chat room by replacing certain HTML characters with their entities (I want to get one or two working before I look at the others; I know there are many more):
Javascript
var str1 = this.value.replace(/>/g, '<');
if (str1!=this.value) this.value=str1;
var str2 = this.value.replace(/</g, '>');
if (str2!=this.value) this.value=str2;
The following code then displays the text after it has been entered into the database etc. On updating the chat box, it uses the following to add in the updated messages:
Returned from php and then displayed through the following javascript
$('#chatroomarea').append($("<p>"+ data.text[i] +"</p>"));
I have messed around with this a few times changing it to val and using
.html(.append($("<p>"+ data.text[i] +"</p>")));
Etc. But I have had no luck. I'm not quite sure how to do this; I just need the HTML Character Entities to show up as their true characters instead of displaying something such as '&#62'.
This might be something I need to put within the replacing code, where it would include code of its own when replacing, such as (this is just an example; I'm not exactly sure how I would write it):
var str1 = this.value.replace(/>/g, '.html(<)');
Any help on this would be much appreciated, Thank you.

$('#chatroomarea').append($("<xmp>"+ data.text[i] +"</xmp>"));
HTML xmp tag
Its use is deprecated, but it is supported in most browsers.
Another option is to use a styled textarea. To my knowledge, these two are the tags that display HTML tags as-is instead of rendering them.
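An alternative that avoids deprecated tags is to create the paragraph as an element and set its text, so the message is never parsed as HTML. The sketch below is not from the answer above; appendMessage is a hypothetical helper, and the #chatroomarea id and data.text array are taken from the question. The doc parameter exists only so the function can be exercised outside a browser.

```javascript
// Append a chat message as plain text. Because textContent is used,
// characters such as < and > are displayed literally and no markup
// in the message is ever interpreted.
function appendMessage(container, text, doc = document) {
  const p = doc.createElement("p");
  p.textContent = text;
  container.appendChild(p);
  return p;
}

// Hypothetical usage in the page from the question:
// const area = document.getElementById("chatroomarea");
// data.text.forEach((message) => appendMessage(area, message));
```

With this approach the input box does not need to rewrite what the user types; the raw text can be stored and is only rendered as text on display.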

Getting unparsed (raw) HTML with JavaScript

I need to get the actual html code of an element in a web page.
For example, if the actual HTML code inside the element is "How to&nbsp;fix"
Running this JavaScript:
document.getElementById('myE').innerHTML
Gives me "How to fix" which is the parsed HTML.
How can I get the unparsed "How to&nbsp;fix" using JavaScript?

You cannot get the actual HTML source of part of your web page.
When you give a web browser an HTML page, it parses the HTML into some DOM nodes that are the definitive version of your document as far as the browser is concerned. The DOM keeps the significant information from the HTML (like that you used the Unicode character U+00A0 Non-Breaking Space before the word fix) but not the irrelevant information that you used it by means of an entity reference (&nbsp;) rather than just typing it raw.
When you ask the browser for an element node's innerHTML, it doesn't give you the original HTML source that was parsed to produce that node, because it no longer has that information. Instead, it generates new HTML from the data stored in the DOM. The browser decides on how to format that HTML serialisation; different browsers produce different HTML, and chances are it won't be the same way you formatted it originally.
In particular,
element names may be upper- or lower-cased;
attributes may not be in the same order as you stated them in the HTML;
attribute quoting may not be the same as in your source. IE often generates unquoted attributes that aren't even valid HTML; all you can be sure of is that the innerHTML generated will be safe to use in the same browser by writing it to another element's innerHTML;
it may not use entity references for anything but characters that would otherwise be impossible to include directly in text content: ampersands, less-thans and attribute-value-quotes. Instead of returning &nbsp; it may simply give you the raw character.
You may not be able to see that that's a non-breaking space, but it still is one and if you insert that HTML into another element it will act as one. You shouldn't need to rely anywhere on a non-breaking space character being entity-escaped to &nbsp;... if you do, for some reason, you can get that by doing:
x = el.innerHTML.replace(/\xA0/g, '&nbsp;')
but that's only escaping U+00A0 and not any of the other thousands of possible Unicode characters, so it's a bit questionable.
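If escaping more than U+00A0 is really wanted, one option is to turn every non-ASCII character into a numeric character reference. This is a rough sketch under that assumption; the function name escapeNonAscii is made up here, and the output uses decimal references rather than named entities such as &nbsp;.

```javascript
// Replace every character outside the ASCII range with a decimal
// numeric character reference. The "u" flag makes the regex match
// whole code points, so characters beyond U+FFFF are handled too.
function escapeNonAscii(html) {
  return html.replace(
    /[^\x00-\x7F]/gu,
    (ch) => "&#" + ch.codePointAt(0) + ";"
  );
}

// Hypothetical usage with the element from the question.
if (typeof document !== "undefined") {
  const el = document.getElementById("myE");
  if (el) console.log(escapeNonAscii(el.innerHTML));
}
```

The result is equivalent HTML, but it is still not the original source: a non-breaking space comes back as &#160; regardless of how it was written in the page.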
If you really really need to get your page's actual source HTML, you can make an XMLHttpRequest to your own URL (location.href) and get the full, unparsed HTML source in the responseText. There is almost never a good reason to do this.
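For completeness, the request described above could be sketched as follows. The wrapper name fetchPageSource is an assumption, and the XhrCtor parameter is there only so the function can be tried outside a browser; in a page it defaults to the built-in XMLHttpRequest.

```javascript
// Request a URL and resolve with the raw response text, i.e. the
// unparsed HTML source as the server sent it.
function fetchPageSource(url, XhrCtor = XMLHttpRequest) {
  return new Promise((resolve, reject) => {
    const xhr = new XhrCtor();
    xhr.open("GET", url);
    xhr.onload = () => resolve(xhr.responseText);
    xhr.onerror = () => reject(new Error("Request failed for " + url));
    xhr.send();
  });
}

// In a browser: fetch the current page's own source.
if (typeof XMLHttpRequest !== "undefined" && typeof location !== "undefined") {
  fetchPageSource(location.href).then((source) => console.log(source));
}
```

This returns the source of the whole document, so locating the markup of one particular element in it would still require searching the text yourself.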

What you have should work:
Element test:
<div id="myE">How to&nbsp;fix</div>
JavaScript test:
alert(document.getElementById("myE").innerHTML); //alerts "How to&nbsp;fix"
You can try it out here. Make sure that wherever you're using the result, it isn't being shown as a space, which is likely the case. If you want to show it somewhere that's designed for HTML, you'll need to escape it.

You can use a script tag instead, which will not parse the HTML. This is more relevant when there are angle brackets, like loading a lodash or underscore template.
document.getElementById("asDiv").value = document.getElementById("myDiv").innerHTML;
document.getElementById("asScript").value = document.getElementById("myScript").innerHTML;
<div id="myDiv">
<h1>
<%= ${var} %> %>
How to fix
</h1>
</div>
<script id="myScript" type="text/template">
<h1>
<%= ${var} %>
How to fix
</h1>
</script>
<textarea rows="10" cols="40" id="asDiv"></textarea>
<textarea rows="10" cols="40" id="asScript"></textarea>
Because the HTML in a div is parsed, the inner HTML for brackets comes back as &lt;, but as a script it does not.
