innerHTML to ascii - javascript

I am attempting to write my own piece of JavaScript that converts HTML to ASCII code (for learning purposes), so that the browser renders the code as you would see it in a text editor.
After looking around on Stack Overflow I have gotten as far as the code below. I am trying to turn an HTML element into a string; at this stage I am just trying to .replace() the angle brackets with their escaped equivalents. If anyone could tell me where I am going wrong in getting my test <body> tag to show up in the console, that would be much appreciated.
<code class="lang-html">
<body></body>
</code>
(function() {
var html = $('.lang-html').innerHTML;
html.replace('<', '&lt;');
html.replace('>', '&gt;');
console.log(html);
});
Just to clarify, I am expecting that the console would spit out &lt;body&gt;&lt;/body&gt;.
Any help would be much appreciated.

A few things:
$('.lang-html').innerHTML
Assuming this is jQuery, this won't work. .innerHTML only works on raw DOM elements, like what's returned from document.getElementById(...). Instead, $('.lang-html') returns a jQuery collection, which has its own accessor methods. You should do:
$('.lang-html').html() // get the HTML as text from this element
Moving on, .replace() won't modify the original string. It returns a new copy. In the simplest case you can do:
var html = $('.lang-html')
.html()
.replace('<', '&lt;')
.replace('>', '&gt;');
But you still have to re-assign it to the HTML source. Again, jQuery provides a simple API for this.
$('.lang-html').html(html);
However, there's one more problem. .replace() only replaces the first match in a string. To replace all of them, you need to construct a regex and use the /g (global) flag. Here's the complete code:
var $element = $('.lang-html');
var html = $element.html()
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;');
$element.html(html);
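
One gap worth noting is that the ampersand itself is not escaped, so source text that already contains an entity would be displayed differently from how it was written. Below is a minimal string-only sketch of the same idea. The name escapeHtml is made up for illustration and is not part of jQuery or the DOM. The sketch also shows the two pitfalls described above: a string pattern replaces only the first match, and the original string is never modified.

```javascript
// Hypothetical helper: escapes the characters that are special in HTML.
// The ampersand is replaced first, otherwise the "&" in "&lt;" would be escaped again.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// A string pattern only replaces the first match, and the source string stays unchanged.
var source = '<body></body>';
var firstOnly = source.replace('<', '&lt;');

console.log(firstOnly);          // &lt;body></body>
console.log(source);             // <body></body>
console.log(escapeHtml(source)); // &lt;body&gt;&lt;/body&gt;
```

The escaped string can then be assigned with .html(), or the unescaped one with .text(), which performs the escaping for you.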

If you want the HTML code representation of a DOM element in your browser, you don't need replace() to escape the HTML special characters. You can let the browser take care of all the edge cases.
You can just use innerHTML/outerHTML and textContent.
For example, this replaces the content of the body with its HTML code representation:
var elm = document.getElementsByTagName('body')[0];
elm.textContent = elm.outerHTML;
Or, if you just want the result as a string without displaying it in the browser, you could wrap that in a function:
function escapeHTML(html) {
var div = document.createElement('div');
div.textContent = html;
return div.innerHTML;
}
console.log( escapeHTML('<div>test</div>') );

You can also do a
$('.lang-html').prop("innerText")
which will hand you back the contents of that div, as real text.
No further translation should be needed.

Actually <body> tags will not be returned in the innerHTML of the posted code because the HTML is invalid. To explain:
To cater for changes to the DOM made in Javascript, browsers dynamically create innerHTML strings from the DOM by inspecting child elements of a specified node and generating HTML code from them.
Since <body> tags are only valid immediately following the head section, browsers silently respond to the <code> tag in your post by first creating a body element in which to place it. The <body> tags which follow are then ignored because they are invalid in this position. Hence there is no body element child of the code node, and no body tags in its innerHTML.
Update (2): To pretty-print the HTML without viewing the page source, you could try:
(function() {
var body = document.body;
var html = body.parentNode.outerHTML;
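html = html.replace(/&/g, '&amp;'); // escape ampersands first so existing entities stay visible
html = html.replace(/</g, '&lt;');
html = html.replace(/>/g, '&gt;');
html = html.replace(/\ /g, "&nbsp;");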
html = html.replace(/\n/g, '<br>\n');
// console.log(html);
body.innerHTML = html;
body.style.fontFamily = "monospace";
})(); // invoke the function immediately; without the trailing () it never runs

Related

Replace non-code text on webpage

I searched through a bunch of related questions that help with replacing site innerHTML using JavaScript, but most rely on targeting the ID or class of the text. However, my text can be inside either a span or a td tag, and possibly elsewhere. I finally was able to gather a few resources to make the following code work:
$("body").children().each(function() {
$(this).html($(this).html().replace(/\$/g,"%"));
});
The problem with the above code is that I randomly see code artifacts or other issues on the loaded page. I think it is because "$" also appears in the website's own code, and the above script converts those occurrences to "%" as well, hence breaking things.
Is there any way to modify the code (JavaScript/jQuery) so that it does not affect code elements and only replaces the visible text (i.e. >Here<)?
Thanks!
---Edit---
It looks like the reason I'm getting a conflict with some other code is this error: "Uncaught TypeError: Cannot read property 'innerText' of undefined". So I'm guessing there are some elements that don't have innerText (even though they don't meet the regex criteria), and that breaks other inline script code.
Is there anything I can add or modify the code with to not try the .replace if it doesn't meet the regex expression or to not replace if it's undefined?

Wholesale regex modifications to the DOM are a little dangerous; it's best to limit your work to only the DOM nodes you're certain you need to check. In this case, you want text nodes only (the visible parts of the document.)
This answer gives a convenient way to select all text nodes contained within a given element. Then you can iterate through that list and replace nodes based on your regex, without having to worry about accidentally modifying the surrounding HTML tags or attributes:
var getTextNodesIn = function(el) {
return $(el)
.find(":not(iframe, script)") // skip <script> and <iframe> tags
.addBack() // replaces .andSelf(), which is deprecated and was removed in jQuery 3.0
.contents()
.filter(function() {
return this.nodeType == 3; // text nodes only
}
);
};
getTextNodesIn($('#foo')).each(function() {
var txt = $(this).text().trim(); // trimming surrounding whitespace
txt = txt.replace(/^\$\d$/g,"%"); // your regex
$(this).replaceWith(txt);
})
console.log($('#foo').html()); // tags and attributes were not changed
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="foo"> Some sample data, including bits that a naive regex would trip up on:
foo<span data-attr="$1">bar<i>$1</i>$12</span><div>baz</div>
<p>$2</p>
$3
<div>bat</div>$0
<!-- $1 -->
<script>
// embedded script tag:
console.log("<b>$1</b>"); // won't be replaced
</script>
</div>
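
The same idea also works without jQuery. The sketch below walks the tree recursively and only touches text nodes (nodeType 3), skipping script, style and iframe elements. The function name replaceInTextNodes and its parameters are assumptions for illustration; the code relies only on the standard childNodes, nodeType, nodeName and nodeValue properties.

```javascript
// Hypothetical helper: applies a regex replacement to text nodes only.
// Tags, attributes, comments and the contents of skipped elements are left untouched.
var SKIPPED_ELEMENTS = { SCRIPT: true, STYLE: true, IFRAME: true };

function replaceInTextNodes(node, pattern, replacement) {
  if (node.nodeType === 3) { // text node
    node.nodeValue = node.nodeValue.replace(pattern, replacement);
    return;
  }
  if (node.nodeType === 1 && SKIPPED_ELEMENTS[String(node.nodeName).toUpperCase()]) {
    return; // do not descend into code-carrying elements
  }
  // Comment nodes have no children, so they fall through without being changed.
  for (var i = 0; i < node.childNodes.length; i++) {
    replaceInTextNodes(node.childNodes[i], pattern, replacement);
  }
}

// In a browser, for the sample markup above:
// replaceInTextNodes(document.getElementById('foo'), /\$/g, '%');
```

Because only nodeValue is changed, the node list is not restructured while it is being walked, and any event handlers attached to the surrounding elements stay in place.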

I solved it slightly differently, testing each value against the regex before attempting to replace it:
var regEx = new RegExp(/^\$\d$/);
var allElements = document.querySelectorAll("*");
for (var i = 0; i < allElements.length; i++){
var allElementsText = allElements[i].innerText;
var regExTest = regEx.test(allElementsText);
if (regExTest=== true) {
console.log(allElements[i]); // was el[i], but el is never defined
var newText = allElementsText.replace(regEx, '%');
allElements[i].innerText=newText;
}
}
Does anyone see any potential issues with this?
One issue I found is that it does not work if part of the page refreshes after the page has loaded. Is there any way to have it re-run the script when new content is generated on page?

How to render only parts of a string as HTML

I want to render a text as common HTML and parse occurrences of [code] tags that should be output unrendered - with the tags left untouched.
So input like this gets processed accordingly:
<p>render as HTML here</p>
[code]<p>keep tags visible here</p>[/code]
<p>more unescaped text</p>
I've regexed all code-tags but I have no idea how to properly set the text of the element afterwards. If I use jQuery's text() method nothing gets escaped, if I set it with the html() method everything gets rendered and I gained nothing. Can anybody give me a hint here?

Try replacing [code] with <xmp> and [/code] with </xmp> using regex or alike, and then use the jQuery html() function.
Note that <xmp> is technically deprecated in HTML5, but it still seems to work in most browsers. For more information see How to display raw html code in PRE or something like it but without escaping it.

You could replace the [code] and [/code] tags with <pre> and </pre> tags respectively, and then replace each < within the <pre> tags with &lt;
A programmatic solution in JavaScript is as follows:
function myfunction(){
//the string 's' probably would be passed as a parameter
var s = "<p>render as HTML here</p>\
[code]<p>keep tags visible here</p>[/code]\
<p>more unescaped text</p>";
//keep everything before [code] as it is
var pre = s.substring(0, s.indexOf('[code]'));
//replace < within code-tags by &lt;
pre += s.substring(s.indexOf('[code]'), s.indexOf('[/code]'))
.replace(new RegExp('<', 'g'),'&lt;');
//concatenate the remaining text
pre += s.substring(s.indexOf('[/code]'), s.length);
pre = pre.replace('[code]', '<pre>');
pre = pre.replace('[/code]', '</pre>');
//pre can be set as some element's innerHTML
return pre;
}
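
The function above handles a single [code] block. A possible generalization, sketched below under the assumption that [code] blocks are never nested, uses one global regex replace with a callback so that every block is escaped and wrapped in <pre>. The names escapeHtml and renderCodeBlocks are hypothetical.

```javascript
// Hypothetical helper: escapes &, < and > (ampersand first to avoid double escaping).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: turns every [code]...[/code] block into an escaped <pre> block.
function renderCodeBlocks(input) {
  // [\s\S]*? matches lazily across newlines, so each pair of tags is handled separately.
  return input.replace(/\[code\]([\s\S]*?)\[\/code\]/g, function (match, code) {
    return '<pre>' + escapeHtml(code) + '</pre>';
  });
}

var sample = '<p>render as HTML here</p>\n' +
  '[code]<p>keep tags visible here</p>[/code]\n' +
  '<p>more unescaped text</p>';

console.log(renderCodeBlocks(sample));
```

The returned string can be set as an element's innerHTML; everything outside the [code] blocks is still rendered as markup.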

I would NOT recommend the accepted answer by Andreas at all, because the <xmp> tag has been deprecated and browser support is totally unreliable.
It's much better to replace the [code] and [/code] tags by <pre> and </pre> tags respectively, as raghav710 suggested.
He's also right about replacing the < character with &lt;, but that's not the only character you should replace. In fact, you should replace every character that's special in HTML with its corresponding HTML entity.
Here's how you replace a character with its corresponding HTML entity:
var chr = ['&#', chr.charCodeAt(), ';'].join('');
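
Applied to a whole string, the one-liner above could look like the following sketch. The function name toNumericEntities and the choice of which characters to treat as special are assumptions made for illustration.

```javascript
// Hypothetical helper: replaces HTML-special characters with numeric character references.
function toNumericEntities(text) {
  // The callback runs once per matched character, so nothing is escaped twice.
  return String(text).replace(/[&<>"']/g, function (chr) {
    return ['&#', chr.charCodeAt(0), ';'].join('');
  });
}

console.log(toNumericEntities('<p class="x">Tom & Jerry</p>'));
// &#60;p class=&#34;x&#34;&#62;Tom &#38; Jerry&#60;/p&#62;
```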

You can replace the [code]...[/code] with a placeholder element. And then $.parseHTML() the string with the placeholders. Then you can insert the code into the placeholder using .text(). The entire thing can then be inserted to the document (run below or in JSFiddle).
var str = "<div><b>parsed</b>[code]<b>not parsed</b>[/code]</div>";
var placeholder = "<div id='code-placeholder-1' style='background-color: gray'></div>";
var codepat = /\[code\](.*)\[\/code\]/;
var code = codepat.exec(str)[1];
var s = str.replace(codepat, placeholder);
s = $.parseHTML(s);
$(s).find("#code-placeholder-1").text(code);
$("#blah").html(s);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
Text
<div id="blah">place holder</div>
Around
The code above will need some modifications if you have multiple [code] blocks: you will need to generate a unique placeholder id for each code block.
If you may be inserting untrusted code, I would highly recommend using a large random number for the placeholder id, to prevent a malicious user from hijacking the placeholder id.
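
A rough sketch of the multi-block variant is shown below. It covers only the string-handling step: each [code] block is swapped for a placeholder with a unique, randomized id, and the raw code is kept in a lookup table. The name extractCodeBlocks and the id format are assumptions; Math.random() is not cryptographically strong, so for hostile input a stronger source such as crypto.getRandomValues() may be preferable.

```javascript
// Hypothetical helper: replaces each [code]...[/code] block with a uniquely
// identified placeholder and returns the raw code keyed by placeholder id.
function extractCodeBlocks(str) {
  var codes = {};
  var counter = 0;
  // Random prefix per call, so ids are hard to guess from inside the input.
  var prefix = 'code-placeholder-' + Math.random().toString(36).slice(2) + '-';
  var html = str.replace(/\[code\]([\s\S]*?)\[\/code\]/g, function (match, code) {
    var id = prefix + (counter++);
    codes[id] = code;
    return '<div id="' + id + '"></div>';
  });
  return { html: html, codes: codes };
}

// With jQuery, the result could then be used roughly like this:
// var result = extractCodeBlocks(str);
// $('#blah').html(result.html);
// Object.keys(result.codes).forEach(function (id) {
//   $('#' + id).text(result.codes[id]);   // .text() keeps the code unparsed
// });
```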

Add actual HTML elements to PRE/CODE contents

I'm trying to create a quick/dirty way to add some syntax highlighting for pre/code tags in html using javascript.
The problem I'm running into is that if I edit either the text() or html(), I get escaped content. That is, the added tags render as pre/code text, or I get a bunch of escape characters.
Consider the following html:
<pre>
<code class="target">
public interface __iIFoo { }
public class __tBar : __iIFoo { }
var list = new List__/__iIFoo\__();
</code>
</pre>
The goal here is to replace occurrences of __iIFoo with:
<span class="interface">IFoo</span>
So that it can be highlighted with css. And of course, when it's rendered, I don't want to see the actual SPAN tag.
Here's what I've tried:
$(function(){
var iPatt = /__i\w+/g
$.each($(".target").text().match(iPatt), function(i,match){
var replace = '<span class="interface">'+match.substring(3)+'</span>';
$(".target").text(function(){
return $(this).text().replace(match, replace);
});
});
});
This works, BUT, the span tags I'm adding show up in the rendered content e.g. they are just like all the other pre code. I don't want to see it!

Use .html() instead of .text(). When you use .text(), the value is the literal text that you want users to see, so it replaces special HTML characters with entities so they'll show up literally.

.text() treats the value as plain text, while .html() renders it as HTML content:
$(".target").html(function () { //replace text with html
return $(this).text().replace(match, replace);
});

Try using it with html instead:
$(function(){
var iPatt = /__i\w+/g
$.each($(".target").text().match(iPatt), function(i,match){
var replace = '<span class="interface">'+match.substring(3)+'</span>';
$(".target").html(function(){
return $(this).text().replace(match, replace);
});
});
});

As I said in my comment, change the html rather than the text (fiddle).
As a side-note, it's worrisome that you're completely overwriting the contents of .target every time you encounter a match. You should take advantage of RegExp capture groups and perform only one assignment.
(function () {
var iPattern = /__i(\w+)/g,
iTemplate = "<span class='interface'>$1</span>";
$(".target").each(function () {
this.innerHTML = this.innerHTML.replace(iPattern, iTemplate);
});
})();
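
The capture-group approach can also be tried out on plain strings, without touching the DOM. The sketch below extends it to the __t marker that appears in the question's sample; the class name "type" for __t and the function name highlightMarkers are assumptions, since the question only specifies the "interface" class.

```javascript
// Hypothetical mapping from marker letter to CSS class; "type" for __t is an assumption.
var MARKER_CLASSES = { i: 'interface', t: 'type' };

function highlightMarkers(source) {
  // One pass over the string: the callback receives the marker letter and the identifier.
  return source.replace(/__([it])(\w+)/g, function (match, marker, name) {
    return '<span class="' + MARKER_CLASSES[marker] + '">' + name + '</span>';
  });
}

console.log(highlightMarkers('public class __tBar : __iIFoo { }'));
// public class <span class="type">Bar</span> : <span class="interface">IFoo</span> { }
```

The result would be assigned to innerHTML (or passed to .html()) once per element, so the contents are rewritten a single time.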

Why isn't there a document.createHTMLNode()?

I want to insert html at the current range (a W3C Range).
I guess I have to use the insertNode method, and it works great with text.
Example:
var node = document.createTextNode("some text");
range.insertNode(node);
The problem is that I want to insert HTML (it might be something like "<h1>test</h1>some more text"), and there is no createHTMLNode().
I've tried using createElement('div'), giving it an id and setting the HTML as its innerHTML, and then replacing it with its nodeValue after inserting it, but that gives me DOM errors.
Is there a way to do this without getting an extra html-element around the html i want to insert?

Because "<h1>test</h1>some more text" consists of an HTML element and two pieces of text. It isn't a node.
If you want to insert HTML then use innerHTML.
Is there a way to do this without getting an extra html-element around the html i want to insert?
Create an element (don't add it to the document). Set its innerHTML. Then move all its child nodes by looping over foo.childNodes.
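
A minimal sketch of that approach is shown below. The helper name htmlToFragment is made up, and the document is passed in as a parameter (an assumption of this sketch) so the function does not depend on a global. The child nodes are moved into a DocumentFragment, which can then be handed to insertNode without leaving a wrapper element behind.

```javascript
// Hypothetical helper: parses an HTML string into a DocumentFragment.
function htmlToFragment(html, doc) {
  var container = doc.createElement('div'); // temporary, never attached to the page
  container.innerHTML = html;
  var fragment = doc.createDocumentFragment();
  while (container.firstChild) {
    // appendChild moves the node out of the container, so firstChild advances each pass.
    fragment.appendChild(container.firstChild);
  }
  return fragment;
}

// In a browser:
// range.insertNode(htmlToFragment('<h1>test</h1>some more text', document));
```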

In some browsers (notably not any version of IE), Range objects have an originally non-standard createContextualFragment() that may help. It's likely that future versions of browsers such as IE will implement this now that it has been standardized.
Here's an example:
var frag = range.createContextualFragment("<h1>test</h1>some more text");
range.insertNode(frag);

Try
function createHTMLNode(htmlCode, tooltip) {
// create html node
var htmlNode = document.createElement('span');
htmlNode.innerHTML = htmlCode;
htmlNode.className = 'treehtml';
htmlNode.setAttribute('title', tooltip);
return htmlNode;
}
From: http://www.koders.com/javascript/fid21CDC3EB9772B0A50EA149866133F0269A1D37FA.aspx

Instead of innerHTML, just use appendChild(element); this may help you.
If you want, comment here and I will give you an example.

The Range.insertNode() method inserts a node at the start of the Range.
var range = window.getSelection().getRangeAt(0);
var node = document.createElement('b');
node.innerHTML = 'bold text';
range.insertNode(node);
Resources
https://developer.mozilla.org/en-US/docs/Web/API/range/insertNode

.html() and .append() without jQuery

Can anyone tell me how can I use these two functions without using jQuery?
I am using a pre coded application that I cannot use jQuery in, and I need to take HTML from one div, and move it to another using JS.

You can replace
var content = $("#id").html();
with
var content = document.getElementById("id").innerHTML;
and
$("#id").append(element);
with
document.getElementById("id").appendChild(element);

.html(new_html) can be replaced by .innerHTML=new_html
.html() can be replaced by .innerHTML
.append() method has 3 modes:
Appending a jQuery element, which is irrelevant here.
Appending/Moving a dom element.
.append(elem) can be replaced by .appendChild(elem)
Appending an HTML code.
.append(new_html) can be replaced by .innerHTML+=new_html
Examples
var new_html = '<span class="caps">Moshi</span>';
var new_elem = document.createElement('div');
// .html(new_html)
new_elem.innerHTML = new_html;
// .append(html)
new_elem.innerHTML += ' ' + new_html;
// .append(element)
document.querySelector('body').appendChild(new_elem);
Notes
Scripts inserted via innerHTML are not executed. To add a <script> tag that runs, create the element and use appendChild.
If your page is strict xhtml, appending a non strict xhtml will trigger a script error that will break the code. In that case you would want to wrap it with try.
jQuery offers several other, less straightforward shortcuts such as prependTo/appendTo after/before and more.
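
One caveat with innerHTML += is that the element's existing content is serialized and parsed again, which discards event listeners and other state on the existing child nodes. A possible alternative for appending an HTML string is the standard insertAdjacentHTML method, sketched below; the wrapper name appendHtml is hypothetical.

```javascript
// Hypothetical wrapper around the standard Element.insertAdjacentHTML method.
function appendHtml(element, html) {
  // 'beforeend' inserts the parsed HTML after the element's last child,
  // without re-parsing the children that are already there.
  element.insertAdjacentHTML('beforeend', html);
  return element;
}

// In a browser:
// appendHtml(document.getElementById('id'), '<span class="caps">Moshi</span>');
```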

To copy HTML from one div to another, just use the DOM.
function copyHtml(source, destination) {
var clone = source.ownerDocument === destination.ownerDocument
? source.cloneNode(true)
: destination.ownerDocument.importNode(source, true);
while (clone.firstChild) {
destination.appendChild(clone.firstChild);
}
}
For most apps, source and destination are always going to be in the same document, so you can probably drop the importNode branch and always use cloneNode. If your app has multiple frames in the same domain interacting via JavaScript, you might want to keep it in.
If you want to replace HTML, you can do it by emptying the target and then copying into it:
function replaceHtml(source, destination) {
while (destination.firstChild) {
destination.removeChild(destination.firstChild);
}
copyHtml(source, destination);
}

Few years late to the party but anyway, here's a solution:
document.getElementById('your-element').innerHTML += "your appended text";
This works just fine for appending html to a dom element.

.html() and .append() are jQuery functions, so without using jQuery you'll probably want to look at document.getElementById("yourDiv").innerHTML
Javascript InnerHTML

Code:
<div id="from">sample text</div>
<div id="to"></div>
<script type="text/javascript">
var fromContent = document.getElementById("from").innerHTML;
document.getElementById("to").innerHTML = fromContent;
</script>
