JS help with selectively escaping html into preview area

I've been having this problem for a while now and nobody can seem to answer me fully...
I am building a forum with a simple textarea editor, and would like a preview area that updates when you type, as it does here at stackoverflow.
var text = $("#edit").val();
$("#preview").html(text);
As it is, the above code works fine, until you start inputting html. I want to allow basic formatting such as bold and italics, as well as support for adding in <code> tags.
Everything within the code tags needs to be treated as plain text, but the jQuery above puts all of my preview text through as HTML, whether I type "&lt;" or "<".
I have tried .replace(/</,"&lt;") methods and RegEx methods, but nothing has worked yet.
I am after something very similar to how the preview area works here, if that's any help, and I'm hoping someone will be able to shed some light on how I might do this.
Many thanks

In this case, you do not need to allow html and be as benevolent as browsers are. Your implementation can be extremely strict!
When you are extremely strict, your formatting job is very easy. For example, you can accept only the syntax [tag=parameter] and [/tag], where tag can be u, i, b, or c (c for color) and parameter can be anything except ], and/or use special characters that should not appear often in normal text.
Then you can create some rules, assign priority to them and process the text based on that priority.
[code] is for multiline blocks (cannot be combined with others)
` is for inline code blocks (cannot be combined with others)
** is for bold text
* is for italic text
Now you simply need to find the tags in code and format the text accordingly:
function textify(text) { return $('<div/>').text(text).html(); }
function formatText(text)
{
    if (text == '') return '';
    var start = text.indexOf('[code]');
    var end = text.indexOf('[/code]', start);
    if ((end > start) && (start >= 0))
    {
        return formatText(text.substring(0, start))
            + '<pre>'
            + text.substring(start + 6, end)
            + '</pre>'
            + formatText(text.substring(end + 7));
    }
    text = text.replace(new RegExp('(^|\\s|>)\\*\\*(\\S.*?\\S)\\*\\*($|\\s|<)', 'gim') , '$1<strong>$2</strong>$3');
    text = text.replace(new RegExp('(^|\\s|>)\\*(\\S.*?\\S)\\*($|\\s|<)', 'gim') , '$1<em>$2</em>$3');
    return text;
}
And in your event handlers:
$("#preview").html(formatText(textify(text)));
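The rule list above mentions backticks for inline code, which formatText does not handle. A minimal sketch of one way to add it, assuming the same escape-first order; escapeHtml and formatInline are hypothetical names, and escapeHtml is a DOM-free stand-in for textify:

```javascript
// Hypothetical DOM-free stand-in for textify(): escape HTML special characters.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: expects text that is ALREADY escaped.
// Splitting on a capturing group puts the backtick contents at odd indexes.
function formatInline(escaped) {
  return escaped
    .split(/`([^`]+)`/)
    .map(function (part, i) {
      if (i % 2 === 1) {
        // Inline code: wrap it and skip the bold/italic rules.
        return '<code>' + part + '</code>';
      }
      return part
        .replace(/\*\*(\S(?:.*?\S)?)\*\*/g, '<strong>$1</strong>')
        .replace(/\*(\S(?:.*?\S)?)\*/g, '<em>$1</em>');
    })
    .join('');
}
```

Because the text is escaped before any tags are generated, anything typed between backticks can only ever show up as literal characters in the preview.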

Take a look at http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer for a standalone HTML sanitizer written in JavaScript.

There are many pitfalls to allowing selective html. Since, in your own words, you are still learning, I would suggest you go another way.
Instead of allowing selective html, create your own syntax, like bbcode and then convert that to the tags you allow.
This will be much easier for you to control.
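A rough sketch of that idea, assuming a small whitelist of bbcode-style tags; the names ALLOWED_TAGS and bbcodeToHtml are made up for illustration, and the sketch does not check that tags are balanced:

```javascript
// Hypothetical whitelist: bbcode tag name -> HTML element it becomes.
var ALLOWED_TAGS = { b: 'strong', i: 'em', u: 'u', code: 'code' };

function bbcodeToHtml(text) {
  // Escape everything first so raw HTML typed by the user is inert.
  var escaped = text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  // Then convert only whitelisted [tag] and [/tag] markers into real tags.
  return escaped.replace(/\[(\/?)([a-z]+)\]/gi, function (match, slash, name) {
    var key = name.toLowerCase();
    // hasOwnProperty avoids matching inherited keys such as "constructor".
    if (!Object.prototype.hasOwnProperty.call(ALLOWED_TAGS, key)) {
      return match; // Unknown tags stay as plain text.
    }
    return '<' + slash + ALLOWED_TAGS[key] + '>';
  });
}
```

The only HTML that reaches the preview is what this function generates itself, which is what makes the approach easier to control than filtering user-supplied tags.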

Have a look at the innerText (IE) and textContent (w3c) DOM properties. They set/retrieve only text, not tags and comments.
http://blog.coderlab.us/2006/04/18/the-textcontent-and-innertext-properties/
EDIT: I just noticed you want some HTML. In that case you need a much more complex solution than you can write on your own. You will need to investigate existing solutions like ckeditor. I believe the one used here on stack overflow is also open-source but I'm not sure what it's called or where you'd find the documentation. I can recommend ckeditor though, it's very powerful and does support selective filtering of allowed tags.

Related

How do I get document.getElementsByTagName('').innerHTML to make text between 2 tags?

I'm trying to use JavaScript to include a footer on several webpages, so if I want to change the footer, I only have to change it in one place. PHP is not available on this server and neither are server side includes (SSI), but Perl, Python, and Tcl are available. I have been trying with document.getElementsByTagName('footer').innerHTML = "text"; but it doesn't produce text. I copied this code from dev.mozilla, and it tells me how many tags I have:
var footer = document.getElementsByTagName('footer');
var num = footer.length;
console.log('There is ' + num + ' footer in this document');
So, I don't know what's wrong with the innerHTML script. I also tried with paragraph tags and got the same results in both cases.

I recommend using textContent instead. See why here.
To see how it works, paste the following into your browser console while you're on Stack Overflow and hit enter.
document.querySelector('.site-footer').textContent = 'Custom footer content.'
Note: use querySelector with a class instead of getElementsByTagName.
Cheers! 🍻

Before asking this question, I had searched for Python includes without any luck, so I stopped there, but after asking this question, I thought that I should search for Perl/Ruby includes. Today, I found out that I can use the Perl use function, so I could study that and try to implement it although I am completely new to Perl. Ruby also appears capable, perhaps even more. I have no experience with Ruby either, but maybe I should start there.

I just figured out that getElementsByTagName() returns an array-like collection (an HTMLCollection), not a single element, so I have to refer to the footer's index with [0]:
var footerTags = document.getElementsByTagName('footer');
footerTags[0].innerHTML = "test";
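If a page could contain more than one footer element, the same idea can be wrapped in a loop. This is only a sketch; setFooterHtml is a hypothetical helper name, and it takes the document as a parameter so it can be tried outside a browser:

```javascript
// Hypothetical helper: set the same markup on every <footer> in a document.
// Returns how many footers were updated.
function setFooterHtml(doc, html) {
  var footers = doc.getElementsByTagName('footer'); // array-like collection
  for (var i = 0; i < footers.length; i++) {
    footers[i].innerHTML = html;
  }
  return footers.length;
}

// In a browser: setFooterHtml(document, '<p>Shared footer text</p>');
```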

How to apply CSS to part of some text without breaking the text content itself

I am currently considering how to support a legacy web application in a new language (language here meaning spoken language, not code!).
I will be doing this using some form of javascript internationalisation library; however, I've stumbled upon an issue.
Currently the application can be driven solely by keyboard shortcuts. These shortcuts are indicated to the user by underlining the letter of a function label on the screen which corresponds to a shortcut.
For example:
<u>R</u>un
<u>J</u>ump
J<u>o</u>g
The problem is that when these strings are replaced with tokens for internationalisation, the strings are going to be stored as plain text, and I would like to not have to tarnish these strings files with html tags (especially a tag which is discouraged nowadays anyway).
If we decouple the logic that decides which letter to underline, which could well change along with a language change, how could I go about underlining a single character in a string? Is it even possible?
//HTML
//Strings file
action.jump=Jump
//Javascript/JQuery
$('<someHowOnlySelectAParticularLetter('J')> #jump').css({text-decoration:overline});
function someHowOnlySelectAParticularLetter(var character){
//TODO
}
Thanks in advance for any responses. If I haven't explained the issue at hand clearly, please say so and I will attempt to clarify any questions!

What letter is "active" is language-dependent, so this info has to be stored in each language specific config file (translation table file):
English:
RUN: "Run"
RUN_ACTION: "R"
French:
RUN: "Courir"
RUN_ACTION: "C"
Then use this information (and meta-information) to generate your HTML:
function buildAction(label, letter) {
    return label.replace(letter, '<u id="action-' + letter + '">' + letter + '</u>');
}
var html = '<p>menu: ' + buildAction(RUN, RUN_ACTION) + ', ...</p>';
document.write(html);
Then you can $('#action-' + RUN_ACTION).css and $('#action-' + RUN_ACTION).click.
With this you only need to switch between translation table files.
I'd generate the HTML server-side though.

If you want to keep using that design, you're gonna run into all sorts of problems.
What if the translated word doesn't have the letter shortcut you applied to your other language?
If a user gets used to a set of shortcuts and changes the language, are all the shortcuts he is used to going to change?
For example, Ctrl+S is a widely used shortcut for Save, even if some languages don't have a S in their translation of 'Save'. Change that letter to W, which is the common shortcut for Quit, and you're in for an unpleasant user experience.
I suggest you change your markup to
(R) Run
(J) Jump
(O) Jog
That way you only need to translate the word part, and leave the shortcut as it is.

It seems that you would have to use a little bit of RegEx (regular expressions) and .split to be able to grab that letter, store it in a variable and then style it with jquery's .css method.

It is an admirable goal to separate data from presentation.
I don't think pure CSS will get you all the way there, without also having some supporting HTML markup.
You actually need the hotkey information in two places:
In the UI markup
In the code that processes key presses
I would suggest that you store the information about the hotkey in a format similar to:
// Word Hotkey Offset Function Language
// Sichern S 0 Save DE
// Springe p 1 Jump DE
(example above uses German).
Use that data to drive
Rendering of the UI (e.g. when rendering to HTML markup, wrap the character at the position designated by Offset in a tag of your choice that matches your CSS rules).
Have the code that captures key clicks and executes functionality use the same data.
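A minimal sketch of how one table could drive both uses. The field names (word, offset, action) and the function names are assumptions made for illustration, not part of any library:

```javascript
// Hypothetical data mirroring the table above.
var hotkeys = [
  { word: 'Sichern', offset: 0, action: 'save' },
  { word: 'Springe', offset: 1, action: 'jump' }
];

function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Rendering: wrap the character at `offset` in <u>, escaping everything else.
function underlineHotkey(word, offset) {
  if (offset < 0 || offset >= word.length) return escapeHtml(word);
  return escapeHtml(word.slice(0, offset)) +
    '<u>' + escapeHtml(word.charAt(offset)) + '</u>' +
    escapeHtml(word.slice(offset + 1));
}

// Key handling: lower-cased hotkey letter -> action name.
function buildKeyMap(entries) {
  var map = {};
  entries.forEach(function (entry) {
    map[entry.word.charAt(entry.offset).toLowerCase()] = entry.action;
  });
  return map;
}
```

With this shape the strings file stays free of markup, and changing the language only means swapping the table.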

Better Way to Sanitize HTML for Insertion

In a recent review by the AMO editors, my Thunderbird addon's version was rejected because it "creates HTML from strings containing unsanitized data" - which "is a major security risk".
I think I understand why. Now, my problem is about how to solve that issue.
This thread gave me some clues, but it's not quite what I need.
My addon needs to paste the contents of the clipboard as a hyperlink, by using the clipboard contents as the link text, and inserting html around it like this: `" + clipboardtext + "".
Now, if I am inserting the clipboard contents as HTML, I need to "sanitize" it first. Here is what I came up with. Now, I haven't written in the regex part yet, because I don't think this is the best way to do this, although I think it will work:
function makeSafeHTML(whathtml){
    var parser = Cc["@mozilla.org/parserutils;1"].getService(Ci.nsIParserUtils);
    var sanitizedHTML = parser.sanitize(whathtml, 01);
    //now remove the extratags added by the sanitization method, perhaps via regex
    //"<html><head></head><body>"
    //"</body></html>"
    return sanitizedHTML;
}
My intent is to do this with the resulting "sanitized" string - this will paste the string as the href value of a hyperlink:
var html_editor = editor.QueryInterface(Components.interfaces.nsIHTMLEditor);
html_editor.insertHTML("<a href='"+whathref+"'>"+whattext+"</a>");
So I am looking for a better way to get sanitized HTML into a simple string variable. Would any of you do it this way?

It seems that you simply want to insert clipboard contents into HTML code as pure text - you don't need any complicated escaping approach then, it's enough to make sure all "dangerous" characters are replaced by HTML entities:
var sanitizedText = text.replace(/&/g, "&amp;").replace(/</g, "&lt;")
                        .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
It's not clear from your question what you do with the generated HTML code. If you add it to a DOM document via something like innerHTML then you can do better - add the HTML code first and manipulate the text in the document then:
document.getElementById("text-container").textContent = text;
Using Node.textContent to set text in a document is always safe, no escaping needs to be performed.
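For the hyperlink case in the question, where the result has to be an HTML string, a sketch along these lines might work. buildLink is a hypothetical name, and the http/https-only check is an added assumption (to keep out javascript: URLs), not something the question specified:

```javascript
// Escape the characters that are special in HTML text and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper: build <a href="...">text</a> from untrusted strings.
function buildLink(href, text) {
  // Assumption: only http(s) URLs are acceptable link targets.
  if (!/^https?:\/\//i.test(href)) {
    return escapeHtml(text); // Fall back to plain escaped text.
  }
  return '<a href="' + escapeHtml(href) + '">' + escapeHtml(text) + '</a>';
}
```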

How can spaces be converted to &nbsp without breaking HTML tags?

I've inherited some pretty complex code for a web forum, and one of the features I'm trying to implement is the ability for spaces to not be truncated into only one. This is mainly because our users often want to include ASCII art, tables etc in their posts.
I first did this using a simple search and replace in javascript, which had the side effect of breaking HTML tags (eg <a href=....> became <a&nbsp;href=.....>).
I then tried doing this on server side, when the strings are retrieved, by having spaces converted before links and code people insert is converted to HTML. This works to a degree but it causes some issues with other parts of the code, for example where a message is truncated to appear on the home page, it might leave some of the space code, such as
Here is a message&nb
I think there may be a way to just alter the original javascript to achieve this - it just needs to only match spaces that are not inside a HTML tag.
The script I was using originally was message = message.replace(/\s/g, "&nbsp;").
Thanks for any help you can provide with this.

You can use the pre element to include preformatted text, which renders spaces as-is. See http://www.w3.org/TR/html5-author/the-pre-element.html
Those docs specifically say one of the best uses of the pre element is "Displaying ASCII art".
Example: http://jsbin.com/owuruz/edit#preview
<pre>
/\_/\
____/ o o \
/~____ =ø= /
(______)__m_m)
</pre>
In your case, just put your message inside a pre tag.

Yes, but you need to process text content of elements, not all of the HTML document content. Moreover, you need to exclude style and script element content. As you can limit yourself to things inside the body element, you could use a recursive function like following, calling it with process(document.body) to apply it to the entire document (but you probably want to apply it to a specific element only):
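function process(element) {
    var children = element.childNodes;
    for(var i = 0; i < children.length; i++) {
        var child = children[i];
        if(child.nodeType === 3) {
            // Text node: replace ordinary spaces with no-break spaces
            if(child.data) {
                child.data = child.data.replace(/[ ]/g, "\xa0");
            }
        } else if(child.tagName != "SCRIPT" && child.tagName != "STYLE") {
            // Recurse into everything except script and style content
            process(child);
        }
    }
}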
(No reason to use the entity reference here; you can use the no-break space character U+00A0 itself, referring to it as "\xa0" in JavaScript.)
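If the message is only available as an HTML string rather than as DOM nodes, a regex that matches whole tags first and single spaces second can approximate the same thing. This is a sketch with a hypothetical function name; it assumes no attribute value contains a literal ">":

```javascript
// Hypothetical helper: replace spaces with &nbsp; only outside of tags.
// The alternation matches a whole tag first, so spaces inside it are never
// seen as separate matches.
function preserveSpaces(html) {
  return html.replace(/<[^>]*>| /g, function (match) {
    return match === ' ' ? '&nbsp;' : match; // tags are returned unchanged
  });
}
```

Note that turning every space into a no-break space also prevents normal line wrapping, so it may be better applied only to runs of two or more spaces.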

One way is to use <pre> tags to wrap your users' posts so that their ASCII art is preserved. But why not use Markdown (like Stack Overflow does)? There are a couple of different ports of Markdown to JavaScript:
Showdown
WMD
uedit

How to print pretty xml in javascript?

What's the best way to pretty-print xml in JavaScript? I obtain xml content through ajax call and before displaying this request in textarea i want to format it so it looks nice to the eye :)

This does not take care of any indenting, but helps to encode the XML for use within <pre> or <textarea> tags:
/* hack to encode HTML entities */
var d = document.createElement('div');
var t = document.createTextNode(myXml);
d.appendChild(t);
document.write('<pre>' + d.innerHTML + '</pre>');
And if, instead of a <textarea>, you'd want highlighting and the nodes to be collapsable/expandable, then see Displaying XML in Chrome Browser on Super User.

take a look at the vkBeautify.js plugin
http://www.eslinstructor.net/vkbeautify/
it is exactly what you need.
it's written in plain javascript, less than 1.5K minified and very fast: takes less than 5 msec. to process 50K XML text.

Here is a small self-contained prettifier that works for most cases, does nice indenting for long lines, and colorizes the output if needed.
function formatXml(xml,colorize,indent) {
function esc(s){return s.replace(/[-\/&<> ]/g,function(c){ // Escape special chars
return c==' '?'&nbsp;':'&#'+c.charCodeAt(0)+';';});}
var sm='<div class="xmt">',se='<div class="xel">',sd='<div class="xdt">',
sa='<div class="xat">',tb='<div class="xtb">',tc='<div class="xtc">',
ind=indent||' ',sz='</div>',tz='</div>',re='',is='',ib,ob,at,i;
if (!colorize) sm=se=sd=sa=sz='';
xml.match(/(?<=<).*(?=>)|$/s)[0].split(/>\s*</).forEach(function(nd){
ob=('<'+nd+'>').match(/^(<[!?\/]?)(.*?)([?\/]?>)$/s); // Split outer brackets
ib=ob[2].match(/^(.*?)>(.*)<\/(.*)$/s)||['',ob[2],'']; // Split inner brackets
at=ib[1].match(/^--.*--$|=|('|").*?\1|[^\t\n\f \/>"'=]+/g)||['']; // Split attributes
if (ob[1]=='</') is=is.substring(ind.length); // Decrease indent
re+=tb+tc+esc(is)+tz+tc+sm+esc(ob[1])+sz+se+esc(at[0])+sz;
for (i=1;i<at.length;i++) re+=(at[i]=="="?sm+"="+sz+sd+esc(at[++i]):sa+' '+at[i])+sz;
re+=ib[2]?sm+esc('>')+sz+sd+esc(ib[2])+sz+sm+esc('</')+sz+se+ib[3]+sz:'';
re+=sm+esc(ob[3])+sz+tz+tz;
if (ob[1]+ob[3]+ib[2]=='<>') is+=ind; // Increase indent
});
return re;
}
for demo see https://jsfiddle.net/dkb0La16/

I agree with Arjan on utilizing the <pre> tags. I was trying to decipher 'ugly' xml code in my html output before I tried this out about 2 days ago. Makes life much easier and keeps you sane.

This is not the best way to do it, but you can get the XML as text and use RegExp to find each '>' and insert line breaks and tabs according to the depth of the node. I don't know RegExp very well, though, sorry.
You can also use XSLT and transform it using javascript.
Check this link and take a look at this tutorial.
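A rough sketch of the depth-tracking idea described above. indentXml is a hypothetical name, and the sketch assumes reasonably simple XML: it does not treat comments, CDATA, or mixed text-and-element content specially:

```javascript
// Hypothetical helper: put each tag on its own line and indent by depth.
function indentXml(xml, indent) {
  indent = indent || '  ';
  var depth = 0;
  return xml
    .replace(/>\s*</g, '>\n<') // break between adjacent tags
    .split('\n')
    .map(function (rawLine) {
      var line = rawLine.trim();
      var isClosing = /^<\//.test(line);
      // Opening tag only: not a declaration/comment, not self-closing,
      // and no closing tag on the same line (e.g. <b>1</b>).
      var isOpening = /^<[^\/!?]/.test(line) &&
        !/\/>$/.test(line) &&
        line.indexOf('</') === -1;
      if (isClosing && depth > 0) depth--; // dedent before printing
      var out = indent.repeat(depth) + line;
      if (isOpening) depth++; // indent the lines that follow
      return out;
    })
    .join('\n');
}
```

The result is plain text, so it can be assigned to a textarea's value directly; it would still need HTML escaping before being written into a pre element as markup.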

Use prettydiff.com/markup_beauty.js. This is capable of supporting invalid markup, fragments, and JSTL code.
<c:out value="<strong>text</strong>"/>
You can demo that application using a web tool at prettydiff.com. Just choose the "beautify" and "markup" options.
It is important that you use a proper tool to beautify your XML and not arbitrarily rush the job. Otherwise you will add white space tokens where they were not intended and remove them where they were intended. To raw data this may be inconsequential, but to human consumable content this destroys the integrity of your code, especially with regard to recursion.
