cross-browser way to get contenteditable's content

Different browsers generate the elements inside a contenteditable differently. It's especially obvious when you have line breaks or paste multi-line content in.
In the old days with textarea, you could simply call $('textarea').val() to retrieve the content, and it was reliable and cross-browser compatible.
I wonder whether there is a similarly universal method to retrieve the content of a contenteditable, such that it's stripped of HTML tags and lines are properly separated by \n. If not, how does Facebook Messenger do it reliably? Do you need a complicated algorithm with browser detection?

One way to do it would be this:
var content = document.querySelector('[contenteditable]').textContent
The jury's still out on whether this is universally reliable, because you would probably still have to account for differences between browsers in how they represent newlines and the like.
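
Since textContent does not emit anything for br elements or block boundaries, lines can run together. Below is a minimal sketch of a walker that inserts \n itself. It assumes lines are either wrapped in div/p elements or separated by br, which is not guaranteed for every browser, and the names editableToText and BLOCK_TAGS are made up for illustration. It only reads nodeType, nodeName, nodeValue and childNodes, so it accepts real DOM nodes as well as plain objects of the same shape.

```javascript
// Hypothetical helper: flatten a contenteditable subtree to plain text,
// emitting "\n" for <br> and at block-element boundaries.
// Assumption: lines are wrapped in <div>/<p>/<li> or separated by <br>.
const BLOCK_TAGS = new Set(['DIV', 'P', 'LI']);

function editableToText(root) {
  let out = '';

  function walk(node) {
    if (node.nodeType === 3) {            // text node
      out += node.nodeValue;
      return;
    }
    if (node.nodeType !== 1) return;      // skip comments and other node types
    const tag = node.nodeName.toUpperCase();
    if (tag === 'BR') {
      out += '\n';
      return;
    }
    const isBlock = BLOCK_TAGS.has(tag);
    // A block starts a new line unless we are already at the start of one.
    if (isBlock && out !== '' && !out.endsWith('\n')) out += '\n';
    for (const child of Array.from(node.childNodes || [])) walk(child);
    // A block also ends its line.
    if (isBlock && !out.endsWith('\n')) out += '\n';
  }

  for (const child of Array.from(root.childNodes || [])) walk(child);
  return out.replace(/\n$/, '');          // drop a single trailing newline
}

// Browser usage:
//   const text = editableToText(document.querySelector('[contenteditable]'));
```

This is only a starting point; markup pasted from other sources may need more tags in BLOCK_TAGS.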

Related

Remove all javascript from page

I have a web page with a control that renders users' HTML markup.
I want to remove all JS calls (and CSS, I guess) to prevent users from injecting malicious code. Replacing all script tags and all onclick and other handlers seems like a bad idea, so the question is: what is the best solution to this XSS problem in the .NET world?

I'd strongly suggest not going down the regex route (you can't parse HTML with regex), and instead considering something like HTMLAgilityPack.
This would allow you to remove all script elements, as well as remove all event handlers from elements regardless of how they're set up.
The alternative is to escape all HTML input, and then manually parse the particular tags you're interested in.
<b>Hello</b>
Becomes
&lt;b&gt;Hello&lt;/b&gt;
And you can then match &lt;(b|i|u|p|em|othertagsgohere)&gt;(.+?)&lt;/\1&gt; so that it will only match attribute-free tags of the types you're interested in. But ultimately I think the HTMLAgilityPack route is the better one.
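
The escape-then-allow-list alternative can be sketched as follows. The sketch is written in JavaScript purely for illustration (the question targets .NET, where the same two steps apply), and the names ALLOWED_TAGS, escapeHtml and sanitize are assumptions, not part of any library.

```javascript
// Escape everything, then re-enable a small allow-list of attribute-free tags.
// ALLOWED_TAGS and both function names are illustrative assumptions.
const ALLOWED_TAGS = ['b', 'i', 'u', 'p', 'em'];

function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

function sanitize(input) {
  // Matches an escaped opening tag without attributes, its content,
  // and the matching escaped closing tag (\1 is the backreference).
  const pattern = new RegExp(
    '&lt;(' + ALLOWED_TAGS.join('|') + ')&gt;([\\s\\S]+?)&lt;/\\1&gt;', 'gi');
  let result = escapeHtml(input);
  let previous;
  // Repeat until stable so nested allowed tags are restored too.
  do {
    previous = result;
    result = result.replace(pattern, '<$1>$2</$1>');
  } while (result !== previous);
  return result;
}
```

Tags with attributes, such as a b element carrying onclick, never match the pattern, so they stay escaped and are displayed as text.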

Chrome inserts non-breaking spaces into copy and pasted content

I'm talking about content from inside a contenteditable div, and the target is the same contenteditable div. So no external programs involved.
The structure of the HTML in this div is that each individual word is inside a span with some data we need to track. Then the whitespace is left as text nodes between the spans. This works fine for the most part (screw you newlines) but I've encountered a strange problem when copy and pasting.
Chrome turns this
<span attrs="stuff">word</span> <span attrs="stuff">another</span>
into this:
<span attrs="stuff">word&nbsp;</span><span attrs="stuff">another</span>
or this:
<span attrs="stuff">word</span><span style="line-height: 16.79999">&nbsp;</span><span attrs="stuff">another</span>
This obviously means that if the user copies and pastes more than one line, the formatting is completely screwed up, and the content of the span has changed, which invalidates the data we need to track.
The core problem is that other stuff in the div may contain non-breaking spaces for real reasons, so if I start swapping them out globally, I might break that.
For my spans with my attrs, I know what should be in them, so it's easy to strip out the non-breaking spaces and restore them to how they should be. But for these strange spans with the odd line height, I've no idea how to clean them out without nuking everything.
Right now, I've stripped all the inserted spans that contain just a non-breaking space. But what I'd really like is either to stop Chrome from doing this in the first place, or an unambiguous means to identify the problematic extra spans so that I can clean them up safely without breaking any similar spans that exist for real reasons. I could use this strange line-height, I guess, but that feels pretty brittle and unsafe.
How can I prevent the spans from appearing or identify them unambiguously?

This is not only a Chrome problem. Whenever you copy HTML code from somewhere, something like this can happen.
This is why you may want to use an editor like CKEditor. Such editors have advanced filtering techniques to remove bad HTML code of this kind.
I recommend using a clipboard viewer to see what the HTML code looks like when you copy from different places: https://softwarerecs.stackexchange.com/questions/17710/see-clipboard-contents-hex-text
But implementing this on your own would be a waste of time, in my opinion.
CKEditor can be configured very precisely to prevent the bad HTML code.
Recent versions of CKEditor have a very sophisticated content filtering approach called "Advanced Content Filter".
Basically, "Advanced Content Filter" means that the whole HTML code gets parsed and checked. If no rule matches a given piece of HTML code, that piece gets filtered out.
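
To make the rule-based idea concrete, here is a toy allow-list filter. It is not CKEditor's API or implementation; the node shape ({ tag, attrs, children } or a plain string for text), the RULES map and the function name filterNodes are all assumptions made for this sketch. An element survives only if a rule exists for its tag and it carries the required attributes; otherwise the element is dropped and its contents are kept.

```javascript
// Toy allow-list filter over a simplified node tree (NOT CKEditor's API).
// A node is either a string (text) or { tag, attrs, children }.
// RULES maps an allowed tag to the attributes it must carry and may keep.
const RULES = new Map([
  ['span', { required: ['attrs'], allowed: ['attrs'] }],
]);

function filterNodes(nodes, rules = RULES) {
  const out = [];
  for (const node of nodes) {
    if (typeof node === 'string') {
      out.push(node);
      continue;
    }
    const children = filterNodes(node.children || [], rules);
    const attrs = node.attrs || {};
    const rule = rules.get(node.tag);
    const matches = rule !== undefined &&
      rule.required.every((name) => name in attrs);
    if (!matches) {
      // No matching rule: drop the element but keep its filtered contents.
      out.push(...children);
      continue;
    }
    // Keep only the attributes the rule allows.
    const kept = {};
    for (const name of rule.allowed) {
      if (name in attrs) kept[name] = attrs[name];
    }
    out.push({ tag: node.tag, attrs: kept, children });
  }
  return out;
}
```

With these rules, a pasted span that only has a style attribute is unwrapped into a bare text node, while the tracked spans with attrs pass through unchanged.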

define word boundaries for HTML5 spellcheck

I have some HTML in a contenteditable that looks like <span>hello wor</span><strong>ld</strong>. If I change it so that world is misspelt, I would like to be able to get suggestions for the complete word. However, this is what actually happens:
The text is separated into two words, and left clicking simply gives suggestions for one or the other.
Is there any recourse?

The implementation of “spelling checks” requested by using the spellcheck attribute (which the question is apparently about) is heavily browser-dependent, and the HTML5 spec intentionally leaves the issue open. Browsers may implement whatever checks they like, the way they like, and they do. You cannot change this in your code.
Although modern browsers generally have spelling checks of some kind at least for English, they differ in the treatment of cases like this, among other things. Firefox treats adjacent inline elements (with no whitespace between them) as constituting one word, but Chrome and IE do not. Moreover, browsers might not spellcheck initial content, only content as entered or edited by the user.
The only way to get consistent spell checking is to implement it yourself: instead of using the spellcheck attribute (“HTML5 spellcheck”), you would need to have a spell-checking routine and integrate it into your HTML document using JavaScript. People who have implemented such systems normally have the routine running server-side and make the HTML page communicate with it using Ajax.
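
As a rough sketch of the do-it-yourself route, the code below joins the text of adjacent inline elements before splitting it into words, so that "wor" and "ld" are checked as one word. DICTIONARY is a tiny stand-in for a real checker (which, as noted, would usually live on the server), and the function names are hypothetical. The sketch assumes the subtree contains only inline elements; block elements would need a separator inserted between them.

```javascript
// Sketch of a custom check that treats adjacent inline elements as one word.
// DICTIONARY is a stand-in for a real (likely server-side) spell checker.
const DICTIONARY = new Set(['hello', 'world']);

// Concatenate all text below a node, ignoring element boundaries.
function collectText(node) {
  if (node.nodeType === 3) return node.nodeValue;   // text node
  return Array.from(node.childNodes || []).map(collectText).join('');
}

// Return the words that are not in the dictionary, in document order.
function findMisspellings(root, dictionary = DICTIONARY) {
  const words = collectText(root).match(/[A-Za-z']+/g) || [];
  return words.filter((word) => !dictionary.has(word.toLowerCase()));
}

// Browser usage:
//   findMisspellings(document.querySelector('[contenteditable]'));
```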

What's the best method for creating a simple Rich-Text WYSIWYG editor?

I need to create a simple rich-text editor that saves its contents to an XML file using arbitrary markup to indicate special text styles (e.g., [b]...[/b] for bold and [i]...[/i] for italic). All the backend PHP stuff seems fairly straightforward, but the front-end WYSIWYG portion of the feature seems a bit more convoluted. I've been reluctant to use one of the currently available JavaScript-based WYSIWYG editors because the rich-text options I want to allow are so limited, and these applications are so fully featured that it almost seems like more work to strip them down to the functions I need.
So, in setting out to create a bare-bones rich-text editor, I've encountered three approaches:
The first two approaches use the contentEditable or designMode properties to create an editable element, and the execCommand() method to apply new text styles to a selected range.
The first option uses a standard div element and executes all styling commands on that element's contents.
The second option uses the editable body of a window enclosed in an iframe, then passes any styling commands initiated from buttons in the parent document into its contentWindow to alter selected ranges in the contained body. This seems like several extra steps to accomplish the same effect as option one, but I suppose the isolation of the editable content in its own document has its advantages.
The third option uses a textarea overlaying a div, and uses the oninput JS event to update the background div's innerHTML to match the input textarea's value whenever it changes. Obviously, this requires some string finagling to convert elements like newline characters in the textarea to <br/> in the div, but this would allow me to preserve the integrity of my [/] markup, while relegating the potentially messy DOM manipulation to front-end display only.
I can see benefits and drawbacks for each method. The contentEditable solutions seem initially the simplest, but support for this feature tends to vary across browsers, and each browser that DOES support it seems to manipulate the DOM differently when implementing execCommand(). As mentioned before, the textarea/div solution seems like the best way to preserve my arbitrary styling conventions, but the custom string-manipulation procedure to display rich text in the output div could get pretty hairy.
So, I submit to you my question: Given the development goals I've outlined, which method would you choose, and why? And of course, if there's another method I'm overlooking that might better serve my purpose, please enlighten me!
Thanks in advance!

Have you looked at http://php.net/manual/en/book.bbcode.php? This is your answer. If you are having doubts, then you are doing something wrong. :-)
Then use JS to track the keyup event and a simple AJAX call to print a preview of the input, just like on Stack Overflow.
NB: It would be far more efficient to generate the preview using a plain-JS BBCode approach. However, do not overcomplicate things unless you really need to.
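
A plain-JS preview of that kind could look roughly like this. It is a minimal sketch that only handles the [b] and [i] tags from the question plus newlines, and the function names and element variables are made up for the example.

```javascript
// Minimal [b]/[i] to HTML conversion for a live preview (illustrative only).
function escapeForPreview(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function bbcodeToHtml(source) {
  // Escape first so user-typed HTML is shown as text, then map the markup.
  return escapeForPreview(source)
    .replace(/\[b\]([\s\S]*?)\[\/b\]/gi, '<strong>$1</strong>')
    .replace(/\[i\]([\s\S]*?)\[\/i\]/gi, '<em>$1</em>')
    .replace(/\r?\n/g, '<br/>');
}

// Browser usage (textarea and preview are hypothetical element references):
//   textarea.addEventListener('input', () => {
//     preview.innerHTML = bbcodeToHtml(textarea.value);
//   });
```

The stored value stays in the original [b]...[/b] form; only the preview is converted to HTML.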

The problem with BBCode, Markdown, ... is that it's not that trivial for genpop. I suggest looking at widgEditor, it is by far the simplest WYSIWYG editor I've seen to date. It was developed some time ago, so I am not sure about compatibility, but it sure is an inspiration.
I would have included this only as a comment, since it does not directly answer your question, but I am fairly new to SA and could not find out how to do that. Sorry.

Create your own HTML Textfield with Javascript

I came across the following http://ckeditor.com/demo , and was wondering if anyone had a basic tutorial how to implement this (or perhaps what key search terms I should use)?
Is this just a heavily modified TextField, or have they somehow managed to create a completely new TextField from scratch?
I tried googling this many times, and I always get pages relating to customizing the built-in TextField with CSS etc.

A good place to start if you want to learn how richtext web editors work is to look into the contenteditable attribute and the document.execCommand method (the best editors use a lot more than this, but these are at the foundation). Over-simplified, an editor consists of a contenteditable block and ways to invoke document.execCommand on the text selection.
But, speaking as a person who has actually developed an editor of this kind, you might be better off using an existing one (CKEditor being a great one, in my opinion).
Edit: Note that contenteditable is a proprietary (Microsoft) property, but most (all?) browsers have implemented it now, and it will be in HTML5.
Edit 2: I want to try to clear up a few misconceptions.
A div or iframe isn't in itself editable; it requires the contenteditable attribute. The use of an iframe is typically a workaround for the fact that older Gecko browsers only supported an alternative editable property (designMode) that could only be applied to a whole document.
While some operations of advanced editors probably do employ innerHTML, this isn't the key to making an editor on the web.
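
The over-simplified foundation described in this answer, a contenteditable block plus a way to invoke document.execCommand on the selection, can be sketched as below. The helper name createCommandRunner and the element ids are hypothetical. Note that execCommand is marked as deprecated in current documentation, although browsers still ship it, so treat this as a learning aid rather than a recommended design.

```javascript
// Sketch: apply formatting commands to the current selection inside a
// contenteditable element via document.execCommand.
function createCommandRunner(doc) {
  // Returns a function that applies a command such as 'bold' or 'italic'.
  return function run(command, value = null) {
    // Second argument (showDefaultUI) is conventionally false.
    return doc.execCommand(command, false, value);
  };
}

// Browser usage (element ids are hypothetical):
//   <div id="editor" contenteditable="true"></div>
//   <button id="bold">B</button>
//
//   const run = createCommandRunner(document);
//   document.getElementById('bold').addEventListener('mousedown', (event) => {
//     event.preventDefault();   // keep the text selection in the editor
//     run('bold');
//   });
```

Passing the document in as a parameter also covers the iframe variant: hand the function the iframe's contentWindow.document instead.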

It is not a textbox. It is a DIV that has lots of HTML injected into it with JavaScript.
The basic idea is that JavaScript uses the innerHTML property of the div and writes HTML to it.

This is a JavaScript implementation that replaces an input. It basically hides the input and uses it for storing and passing the data via POST.

The advanced text fields I have seen have all been an iframe or a div. The code behind them is quite messy and not very accessible.
Proceed with caution!

You may want to consider WYSIWYM instead of WYSIWYG.
