How to render only parts of a string as HTML - javascript

I want to render a text as common HTML and parse occurrences of [code] tags that should be output unrendered - with the tags left untouched.
So input like this gets processed accordingly:
<p>render as HTML here</p>
[code]<p>keep tags visible here</p>[/code]
<p>more unescaped text</p>
I've regexed all code-tags but I have no idea how to properly set the text of the element afterwards. If I use jQuery's text() method nothing gets escaped, if I set it with the html() method everything gets rendered and I gained nothing. Can anybody give me a hint here?

Try replacing [code] with <xmp> and [/code] with </xmp> using a regex or similar, and then use the jQuery html() function.
Note that <xmp> is technically deprecated in HTML5, but it still seems to work in most browsers. For more information see How to display raw html code in PRE or something like it but without escaping it.

You could replace the [code] and [/code] tags by <pre> and </pre> tags respectively, and then replace each < within the <pre> tags by &lt;
A programmatic solution based on JavaScript is as follows:
function myfunction(){
//the string 's' probably would be passed as a parameter
var s = "<p>render as HTML here</p>\
[code]<p>keep tags visible here</p>[/code]\
<p>more unescaped text</p>";
//keep everything before [code] as it is
var pre = s.substring(0, s.indexOf('[code]'));
//replace < within code-tags by &lt;
pre += s.substring(s.indexOf('[code]'), s.indexOf('[/code]'))
.replace(new RegExp('<', 'g'),'&lt;');
//concatenate the remaining text
pre += s.substring(s.indexOf('[/code]'), s.length);
pre = pre.replace('[code]', '<pre>');
pre = pre.replace('[/code]', '</pre>');
//pre can be set as some element's innerHTML
return pre;
}
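The function above handles a single [code] block and escapes only <. A minimal sketch of the same idea for any number of blocks follows; the names escapeHtml and renderCodeBlocks are made up for illustration, and the sketch assumes [code] blocks are not nested.

```javascript
// Escape the characters that are special in HTML text content.
// The ampersand must be replaced first so later entities are not double-escaped.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Replace every [code]...[/code] block with an escaped <pre> block.
// [\s\S]*? matches lazily across line breaks, so each block is handled separately.
function renderCodeBlocks(input) {
  return input.replace(/\[code\]([\s\S]*?)\[\/code\]/g, function (match, body) {
    return '<pre>' + escapeHtml(body) + '</pre>';
  });
}
```

The returned string can then be assigned to an element's innerHTML or passed to jQuery's html(), as in the answer above.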

I would NOT recommend the accepted answer by Andreas at all, because the <xmp> tag has been deprecated and browser support is totally unreliable.
It's much better to replace the [code] and [/code] tags by <pre> and </pre> tags respectively, as raghav710 suggested.
He's also right about replacing the < character with &lt;, but that's not the only character you should replace. In fact, you should replace every character that is special in HTML with its corresponding HTML entity.
Here's how you replace a character with its corresponding HTML entity:
var chr = ['&#', chr.charCodeAt(), ';'].join('');
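As a rough sketch, the one-liner above can be applied to a whole string by running it over each special character. The function name toNumericEntities and the exact set of characters treated as special (&, <, >, " and ') are assumptions for illustration.

```javascript
// Replace each HTML-special character with its numeric character reference,
// e.g. "<" becomes "&#60;".
function toNumericEntities(text) {
  return text.replace(/[&<>"']/g, function (ch) {
    return ['&#', ch.charCodeAt(0), ';'].join('');
  });
}
```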

You can replace the [code]...[/code] with a placeholder element, then $.parseHTML() the string with the placeholders, and then insert the code into the placeholder using .text(). The entire thing can then be inserted into the document.
var str = "<div><b>parsed</b>[code]<b>not parsed</b>[/code]</div>";
var placeholder = "<div id='code-placeholder-1' style='background-color: gray'></div>";
var codepat = /\[code\](.*)\[\/code\]/;
var code = codepat.exec(str)[1];
var s = str.replace(codepat, placeholder);
s = $.parseHTML(s);
$(s).find("#code-placeholder-1").text(code);
$("#blah").html(s);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
Text
<div id="blah">place holder</div>
Around
The code above will need some modifications if you have multiple [code] blocks: you will need to generate a unique placeholder id for each code block.
If you may be inserting untrusted code, I would highly recommend using a large random number for the placeholder id to prevent a malicious user from hijacking the placeholder id.
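One possible way to handle several blocks is sketched below. It covers only the string-processing step; the function name extractCodeBlocks and the shape of its return value are hypothetical, and Math.random() is used purely for illustration (it is not cryptographically strong, so a stronger source of randomness would be preferable for untrusted input).

```javascript
// Pull every [code]...[/code] block out of the input and leave a uniquely
// identified empty placeholder <div> in its place.
function extractCodeBlocks(input) {
  var blocks = [];
  var html = input.replace(/\[code\]([\s\S]*?)\[\/code\]/g, function (match, body) {
    // The running index guarantees uniqueness; the random part makes the id hard to guess.
    var id = 'code-placeholder-' + blocks.length + '-' + Math.random().toString(36).slice(2);
    blocks.push({ id: id, code: body });
    return '<div id="' + id + '"></div>';
  });
  return { html: html, blocks: blocks };
}
```

In the browser, the html property would then be parsed with $.parseHTML(), and each placeholder filled by looking up its id and calling .text() with the matching code string, as in the single-block snippet.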

Related

Write <script> or other tags as text in html [duplicate]

I want HTML, for example <p>, to show as just that, in plain text, and not be interpreted by the browser as an actual tag.
I know JQuery has .html and .text, but how is this done in raw JS?
There are functions like encodeURIComponent that encodes <p> to %3Cp%3E but if I just put that into HTML, it interprets it literally as %3Cp%3E.
There are also entities like &gt; and &lt;, which work, but I can't find any JavaScript functions that escape and unescape them.
Is there a correct way to show HTML as text with raw JavaScript?
There's no need to escape the characters. Simply use createTextNode:
var text = document.createTextNode('<p>Stuff</p>');
document.body.appendChild(text);
See a working example here: http://jsfiddle.net/tZ3Xj/.
This is exactly how jQuery does it (line 43 of jQuery 1.5.2):
return this.empty().append( (this[0] && this[0].ownerDocument || document).createTextNode( text ) );
The function used by Prototype looks like a good start:
http://www.prototypejs.org/api/string/escapeHTML
function escapeHTML() {
return this.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;');
}
Version more suited to use outside Prototype:
function escapeHTML(html) {
return html.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;');
}
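Since the question also asks about unescaping, here is a hedged sketch of an escape/unescape pair. The name unescapeHTML is an assumption for illustration, and the pair covers only the three entities &amp;, &lt; and &gt;, not the full set of HTML named entities.

```javascript
// Escape the three characters that matter in HTML text content.
function escapeHTML(html) {
  return html.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Reverse of escapeHTML. &amp; is handled last so that text such as
// "&amp;lt;" is turned back into "&lt;" rather than "<".
function unescapeHTML(text) {
  return text.replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&');
}
```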
You can also use innerText on most DOM elements:
document.getElementById('some').innerText = "There can be <b> even HTML tags </b>";
The tags will not be formatted. You can also use \n for a new line and other escape sequences (\t, \u2623...). If you also want fixed-width characters, you can simply use a <pre> tag.
This is a job for the createTextNode method:
var targetDiv = document.getElementById('div1');
targetDiv.appendChild(document.createTextNode('<p>HelloWorld</p>'));
I suggest using the HTML pre tag.
You can convert your code using this link.
E.g. if you copy
<p>Hi </p>
it will give you the converted code:
&lt;p&gt;Hi &lt;/p&gt;
Just copy and paste the converted code into a pre tag and it will work fine.

RegEx to only look at text inside HTML tags?

I recently started learning and using RegEx.
Is there a way to avoid matching words that are HTML tag attributes or belonging to tag attributes?
For example:
<p style="position: absolute">position: </p>
I tried
/\bposition\b\W\s/g
But that matches both instances.
Can I only match the second "position: "?
Clarification:
I am trying to search the document for words that the user enters and replace them with a span element containing those words - this is similar to "Ctrl + F". Simply having the text is not enough as I would need a way to also update the document once the text was replaced with the span elements.
Disclaimer: Use stuff like document.innerText and other DOM APIs rather than Regex.
Match HTML tags:
/<.+?>/g
Match everything within HTML tags (should handle nested ones as well):
/(?<=<.+.>)(.*?)(?=<.*\/.+.?>)/g
https://regex101.com/r/2uZHli/ for example of the above.
The RegEx to match HTML / XML tags is /(<([^>]+)>)/ig. Maybe this is what you're looking for.
let str = '<p style="position: absolute">position: </p>';
const strWithoutTag = str.replace(/(<([^>]+)>)/ig, '');
console.log(strWithoutTag);
You can try this regex, which matches only the second "position: ".
/(?=\b.*(?<yourKeyword>position).*\b)(?<=<[^]*>)([^<>]+)(?=<\/([^<>]*)>)/g
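If a string-based approach is still wanted for the "Ctrl + F" use case, one rough alternative to a single large regex is to split the markup into tags and text and replace only inside the text parts. The function name wrapInText and the class name "match" are invented for this sketch, and it assumes simple markup with no > inside attribute values; walking the DOM's text nodes remains the more robust route.

```javascript
// Wrap whole-word occurrences of `word` in a span, but only in the text
// between tags, never inside the tags themselves.
function wrapInText(html, word) {
  // Escape regex metacharacters in the user-supplied word.
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var pattern = new RegExp('\\b' + escaped + '\\b', 'g');
  // Splitting on a capturing group keeps the tags in the result:
  // even indexes hold text, odd indexes hold tags.
  return html
    .split(/(<[^>]+>)/)
    .map(function (part, i) {
      return i % 2 === 1 ? part : part.replace(pattern, '<span class="match">$&</span>');
    })
    .join('');
}
```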

JQuery : Delete span from title tag

How can I get the content of a span inside a title tag? I tried to use
console.log($('title > span').text());
but it returns an empty result. For example, I want to get the word beautiful from this code:
<title>lorem epsum dolor <span class="spa">beautiful</span></title>
The <title> tag should not include any other tags inside it,
as stated here: https://www.w3.org/TR/html401/struct/global.html#h-7.4.2
Titles may contain character entities (for accented characters, special characters, etc.), but may not contain other markup (including comments).
try this:
$('title span').html()
As Nate mentioned, a span tag is not valid within a title tag. However, if you still need to process it for whatever reason (for instance, a dynamically generated title from a database), you could use a regular expression.
<script>
var titleTag = $('title').text();
var match = titleTag.match(/<span[^>]*>(.+?)<\/span>/); // [^>]* allows attributes such as class="spa"
console.log(match[1]);
</script>

innerHTML to ascii

I am attempting to write my own piece of Javascript that converts html to ascii code (for learning purposes) so that the browser will render the code as you would see it in a text editor.
After looking around on Stack I have gotten as far as below. I am trying to turn an html element into a string; at this stage I am just trying to .replace() the angular brackets into ascii. If anyone could tell me where I am going wrong as far as having my test <body> tag showing up in the console that would be much appreciated.
<code class="lang-html">
<body></body>
</code>
(function() {
var html = $('.lang-html').innerHTML;
html.replace('<', '&lt;');
html.replace('>', '&gt;');
console.log(html);
});
Just to clarify, I am expecting that the console would spit out &lt;body&gt;&lt;/body&gt;.
Any help would be much appreciated.
A few things:
$('.lang-html').innerHTML
Assuming this is jQuery, this won't work. .innerHTML only works on raw DOM elements, like what's returned from document.getElementById(...). Instead, $('.lang-html') returns a jQuery collection, which has its own accessor methods. You should do:
$('.lang-html').html() // get the HTML as text from this element
Moving on, .replace() won't modify the original string. It returns a new copy. In the simplest case you can do:
var html = $('.lang-html')
.html()
.replace('<', '&lt;')
.replace('>', '&gt;');
But you still have to re-assign it to the HTML source. Again, jQuery provides a simple API for this.
$('.lang-html').html(html);
However, there's one more problem. .replace() only replaces the first match in a string. To replace all of them, you need to construct a regex and use the /g (global) flag. Here's the complete code:
var $element = $('.lang-html');
var html = $element.html()
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;');
$element.html(html);
If you want to get the HTML code representation of a DOM element in your browser, you don't need replace() to escape the HTML special characters; you can let the browser take care of all the edge cases.
You could just use innerHTML/outerHTML and textContent.
For example, this replaces the content of the body with its HTML code representation:
var elm = document.getElementsByTagName('body')[0];
elm.textContent = elm.outerHTML;
Or, if you just want the result as a string without displaying it in the browser, you could wrap that in a function:
function escapeHTML(html) {
var div = document.createElement('div');
div.textContent = html;
return div.innerHTML;
}
console.log( escapeHTML('<div>test</div>') );
You can also do a
$('.lang-html').prop("innerText")
which will hand you back the contents of that div, as real text.
No further translation should be needed.
Actually <body> tags will not be returned in the innerHTML of the posted code because the HTML is invalid. To explain:
To cater for changes to the DOM made in Javascript, browsers dynamically create innerHTML strings from the DOM by inspecting child elements of a specified node and generating HTML code from them.
Since <body> tags are only valid immediately following the head section, browsers silently respond to the <code> tag in your post by first creating a body element in which to place it. The <body> tags which follow are then ignored because they are invalid in this position. Hence there is no body element child of the code node, and no body tags in its innerHTML.
Update (2): To pretty print the HTML without viewing page source you could try.
(function() {
var body = document.body;
var html = body.parentNode.outerHTML;
html = html.replace(/</g, '&lt;');
html = html.replace(/>/g, '&gt;');
html = html.replace(/\ /g, "&nbsp;");
html = html.replace(/\n/g, '<br>\n');
// console.log(html);
body.innerHTML = html;
body.style.fontFamily = "monospace";
});

Why JavaScript converts my > into &gt;

JavaScript converts my > into &gt;. I want to alert it but my message is with encoded marks like ##&amp;*()}{&gt;?&gt;? - how to display it normally but prevent from executing as HTML code?
<span id="ID" onClick="alertIt(this.id);">
<p>Some string with special chars: ~!##&*()}{>?>?>|{">##$#^#$</p>
<p>Why when clicked it gives something like this:</p>
<p>'<br>
Some string with special chars: ~!##&amp;*()}{&gt;?&gt;?&gt;|... and so on
<br>'</p>
</span>
<script type="text/javascript">
function alertIt(ID)
{
var ID = ID;
var content = document.getElementById(ID).innerHTML;
alert(content);
}
</script>
Use innerText instead of innerHTML. http://jsfiddle.net/WVf95/
Your problem is that you use the wrong approach to get the text to display with alert().
Some characters are illegal in HTML text (they are used for HTML tags and entities). innerHTML will make sure that text is properly escaped (i.e. you can see tags and escaped text).
If you want to see tag and text in alert(), there is no solution.
If you want only the text, then you will have to extract it yourself. There is no built-in support for that. It's also not really trivial to implement. I suggest to include jQuery in your page; then you can get the text with:
function alertIt(ID) {
alert($('#' + ID).text()); // ID is an element id, so it needs the '#' selector prefix
}
Using textContent instead of innerHTML or innerText is a solution.
