Escape characters within an HTML tag - javascript

I have a specific div that cannot have tags within it.
Whenever tags are found, I would like to escape them and display as regular text.
For example:
<div class='no-tags-div'>
<h1>Hi!</h1>
<p>Blablablabalablal</p>
</div>
Instead of displaying the Hi! as a header text followed by a paragraph of Blablablabalablal, I would like to literally display it with the tags:
<h1>Hi!</h1>
<p>Blablablabalablal</p>
I already have access to the content I just need to figure out how to escape any of these special characters.
Edit: I should probably specify, the content within the div is posted through an input. I am attempting to not allow users to post other tags through the input, so this isn't just static HTML text we are talking about here.

You can use &lt; and &gt; to escape < and >. If you're doing this on the server side, you can find and replace those characters. On the client, you can set element.innerText, as D. Pardal suggested, which replaces the contents of the element with a text node rather than interpreting the string as HTML.
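For the find-and-replace route, a minimal sketch might look like the following. The name escapeHtml is a hypothetical helper, not part of any library, and the set of characters it escapes is an assumption about what the div needs to treat as plain text.

```javascript
// Hypothetical helper: replaces HTML-significant characters with entities
// so the browser displays them literally instead of parsing them as markup.
function escapeHtml(text) {
  const entities = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  };
  // A single pass, so an ampersand produced by one replacement is never escaped twice.
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

const userInput = '<h1>Hi!</h1>';
const safe = escapeHtml(userInput);
// In a browser the escaped string could then be assigned to innerHTML,
// or the raw string could be assigned to textContent/innerText instead.
```

On the client, assigning the raw input to textContent or innerText avoids the need for this helper entirely.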


Insert emoji with zero width joiner using Javascript

I have not been able to successfully insert an emoji into the DOM using Javascript when I am given the codepoints and zero width joiners are used.
Consider this emoji: 👩‍👩‍👦
I am able to create a string that looks like this:
&#x1F469;&#x200D;&#x1F469;&#x200D;&#x1F466;
and insert it into the innerHTML of an element, but the three characters end up being displayed instead of the single combined character. If you look at the HTML on this page for this character, you can see that it is formatted in the same way as my string:
https://emojipedia.org/family-woman-woman-boy/
This is only an issue when zero width joiners are used.
So doing this:
el.innerHTML = "&#x1F469;&#x200D;&#x1F469;&#x200D;&#x1F466;"
should result in a single character, but it doesn't. How can I get the single character to display? NOTE: the character cannot just be added by typing the text into an editor. The content is generated by JavaScript.
Not really sure what the question is here, but if you have a good UTF-8/Unicode editor you can of course just paste the emoji into your text file.
If this is problematic, you could build it up using HTML escaping.
Below I have done both: the first just pastes the emoji into the editor (unfortunately the SO editor is not the best here), and the second uses HTML escaping.
Hope this helps.
Update: using your version also seems to work for me in Chrome. Which browsers are you using?
document.querySelector("#container").innerHTML = "👩‍👩‍👦";
document.querySelector("#container2").innerHTML =
"👩‍👩‍👦";
document.querySelector("#container3").innerHTML =
"👩‍👩‍👦";
<div id="container">
</div>
<div id="container2">
</div>
<div id="container3">
</div>
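Since the question says the code points are already known, another option is to build the string in JavaScript rather than through HTML entities. This is a sketch only: the code point list below is the woman, ZWJ, woman, ZWJ, boy sequence for this emoji, and whether it renders as one glyph still depends on font and browser support.

```javascript
// Code points for the family emoji: woman, zero width joiner, woman, zero width joiner, boy.
const codePoints = [0x1F469, 0x200D, 0x1F469, 0x200D, 0x1F466];

// String.fromCodePoint builds the string directly, so no HTML entity parsing is involved.
const family = String.fromCodePoint(...codePoints);

// In a browser this could be inserted as text, for example:
// document.querySelector('#container').textContent = family;
```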

match every character until a pattern occurs in the beginning of the line (javascript)

I have this text:
<a>
a lot of text here with all types of symbols ! : . %& < >
</a>
<a>
another text here with all types of symbols ! : . %& < >
</a>
I want to match the tag name and its contents, so the regex I'm using is:
<([^]*?)>(?:([^]*)<\/\1>)?
NOTE: I use the conditional group at the end because it can be omitted, for example.
<a>
<a>
another text here with all types of symbols ! : . %& < >
</a>
But my problem is that the regex tries to consume every character, so it opens and closes the tag and the contents of the tag become:
<a>
another text here with all types of symbols ! : . %& < >
when I wanted to detect two matches: one for the isolated tag and the other for the multiline tag.
NOTE2: This is NOT HTML or XML, so I don't need to parse it as such.
NOTE3: my idea was to replace the regex part:
(?:([^]*)....
with something that would match every character until '<' appears at the beginning of a line (because in the text I'm parsing there can't be tags inside tags), so I thought that would be good, but I can't seem to find a regex for that :(
I think what you want is /<([a-z0-9-]+)>([^]*?)(?:(<\/\1>)|$|(?=(?:<[a-zA-Z0-9\-]+>)))/gi
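As a quick check, here is a sketch that applies that pattern to the sample from the question using matchAll. The variable names and the shape of the result objects are illustrative assumptions.

```javascript
// The pattern from the answer, unchanged.
const pattern = /<([a-z0-9-]+)>([^]*?)(?:(<\/\1>)|$|(?=(?:<[a-zA-Z0-9\-]+>)))/gi;

// The question's second sample: an isolated tag followed by a multiline tag.
const sample = '<a>\n<a>\nanother text here with all types of symbols ! : . %& < >\n</a>';

// Group 1 is the tag name, group 2 the content, group 3 the closing tag if one was found.
const matches = Array.from(sample.matchAll(pattern), (m) => ({
  name: m[1],
  content: m[2],
  closed: m[3] !== undefined,
}));
```

Here the isolated tag and the multiline tag come out as two separate matches, with closed set to false for the first and true for the second.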
I suggest you parse it by program:
1. Match the first occurrence of any opening tag:
<([a-z0-9]+)>
With this, you can get the tag's name.
2. Get the position of the second occurrence of any opening tag and the position of the first occurrence of the closing tag with the same name as the one read before.
3. Compare these positions and decide whether it was a single-line, just-open tag or a multi-line, open-and-close tag.
4. Get the content enclosed between the first opening tag and the lowest position obtained in step 2.
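The procedure above could be sketched roughly as follows. The function name parseTags and the result shape are hypothetical, and as a simplification an isolated tag is reported with null content rather than the whitespace up to the next tag.

```javascript
// Hypothetical helper: walks the text tag by tag following the steps above.
function parseTags(text) {
  const results = [];
  const openTag = /<([a-z0-9]+)>/gi;
  let pos = 0;
  while (true) {
    // Find the next opening tag and read its name.
    openTag.lastIndex = pos;
    const open = openTag.exec(text);
    if (!open) break;
    const name = open[1];
    const contentStart = open.index + open[0].length;

    // Position of the following opening tag (or end of text if there is none).
    openTag.lastIndex = contentStart;
    const nextOpen = openTag.exec(text);
    const nextOpenPos = nextOpen ? nextOpen.index : text.length;

    // Position of the first closing tag with the same name.
    const closeTag = `</${name}>`;
    const closePos = text.indexOf(closeTag, contentStart);

    // Whichever comes first decides between a closed tag and an isolated one.
    if (closePos !== -1 && closePos < nextOpenPos) {
      results.push({ name, content: text.slice(contentStart, closePos) });
      pos = closePos + closeTag.length;
    } else {
      results.push({ name, content: null }); // isolated tag, no closing tag before the next opening tag
      pos = contentStart;
    }
  }
  return results;
}

const parsed = parseTags('<a>\n<a>\nanother text here with all types of symbols ! : . %& < >\n</a>');
```

This relies on the question's statement that tags cannot appear inside other tags.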

How to retrieve the text in html CDATA section?

I have the following script element section in HTML:
<script type="text/x-markdown"><![CDATA[
# hello, This is Markdown Script Demo]]></script>
When I try to retrieve the inner content via scripttag.innerHTML, it returns the text with the <![CDATA[...]]> parts.
Is there a more efficient way to retrieve the inner part of the CDATA section at once, instead of applying a regexp to remove it from the received innerHTML data?
I don't think you will be able to retrieve only what's inside the CDATA, as it's not a tag but plain text. When you get the innerHTML of the tag you will get everything as a string, so a regexp is the only way I see to get what's inside.
CDATA is an XML concept. It is a way of specifying a section of text inside which things that look like mark-up or special XML characters are treated as plain text. It is essentially equivalent to escaping < to &lt; etc. everywhere within the CDATA section.
If the document has an HTML doctype, then the CDATA receives no special processing and is just more characters. If the document had an XHTML doctype, then you would be able to retrieve the CDATA section as is, with no further ado.
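In the HTML case, where the markers are just characters in the string, they can also be trimmed with plain string operations instead of a regexp. This is only a sketch: stripCdata is a hypothetical helper, not a DOM API, and it assumes at most one CDATA section in the string.

```javascript
// Hypothetical helper: returns the text between <![CDATA[ and ]]>,
// or the input unchanged when no complete CDATA wrapper is present.
function stripCdata(raw) {
  const open = '<![CDATA[';
  const close = ']]>';
  const start = raw.indexOf(open);
  const end = raw.lastIndexOf(close);
  if (start === -1 || end === -1 || end < start + open.length) {
    return raw;
  }
  return raw.slice(start + open.length, end);
}

// In a browser the argument would come from something like scripttag.textContent.
const markdown = stripCdata('<![CDATA[\n# hello, This is Markdown Script Demo]]>');
```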
This question is quite old, but this might help somebody.
You can probably use textContent.
Example from parsing a rss feed node which looks like this:
<title><![CDATA[This contains the title]]></title>
Javascript:
const desc = el.querySelector('title').textContent;

How to replace all periods in a string that aren't in an html tag?

I need to replace all periods in a user-submitted paragraph of text that will most likely be copied and pasted from a Microsoft Word document, so the text will have formatting on it.
For example, text pasted in from word looks like this:
<p class="MsoNormal" style="margin-bottom: 5.75pt; text-indent: 0.5in;"><span style="font-size:12.0pt;font-family: etc...
I need to edit all of the periods not within these tags and put span tags around them, so I can't just grab the html and do .replace.
:(
Use this answer to find text nodes, then do the replace on them.
If you have it as a string, convert into document fragment first.
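If a DOM is not available, a string-level alternative is to split the markup into tags and text and only touch the text parts. Note that this is a different technique from the text-node approach above, offered as a rough sketch: wrapPeriods and the "period" class name are hypothetical, and it assumes no attribute value contains a ">" character.

```javascript
// Hypothetical helper: wraps periods in spans, skipping anything inside <...> tags.
function wrapPeriods(html) {
  return html
    // Splitting on a capturing group keeps the tags: odd indexes are tags, even indexes are text.
    .split(/(<[^>]*>)/)
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(/\./g, '<span class="period">.</span>')
    )
    .join('');
}

const wrapped = wrapPeriods('<p style="margin-bottom: 5.75pt;">One. Two.</p>');
```

The periods inside the style attribute (5.75pt) are left alone because they sit in a tag segment.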

Match HTML tag's content with a Javascript RegEx

I have the following HTML as a string in my JavaScript function:
<p>one</p> <p align='center'>two</p>
I want to extract this string:
"onetwo" (without quotes obviously)
Can you please suggest some pure JavaScript code (jQuery is also OK...) to get tags' content?
Using jQuery you don't need a complex regex, you can easily parse the HTML and use the DOM:
var s = "<p>one</p> <p align='center'>two</p>";
var wrapper = $('<div />').html(s);
var text = wrapper.text();
In this case $(s).text() would have also worked, but it will fail if you have free text on the first level (e.g. <p>1</p>2), so I usually avoid it.
Note that the result here is "one two" (not "onetwo"), because you have a space between the <p> tags.
If that's a problem, you can use wrapper.children().text() or wrapper.find('p').text(), for example, according to your exact needs.
Working example: http://jsbin.com/osidi3
I made the following Regex to grab content from XML tags.
This will only work with a tag that has content and is followed by a closing tag. Will not get contents of tags that contain other tags.
The tag name is in capture group 1 and the tag content is in capture group 2. This will work to get all content including <, >, ", ' and & inside of tag content.
<([^\s>]+)\s?[^>]*>(.*)(?:<\/\1)>
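A sketch of using this pattern from JavaScript on the string from the question. One assumption to flag: with the greedy (.*) as written, both paragraphs on a single line are captured as one match, so the version below swaps in a lazy (.*?) and adds the g flag. That change is mine, not part of the answer above.

```javascript
// Lazy variant of the pattern above, with the global flag so matchAll can be used.
const tagContent = /<([^\s>]+)\s?[^>]*>(.*?)(?:<\/\1)>/g;

const html = "<p>one</p> <p align='center'>two</p>";

// Capture group 2 holds each tag's content; joining them drops the space between the tags.
const text = Array.from(html.matchAll(tagContent), (m) => m[2]).join('');
```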
