Extracting text from a contentEditable div


I have a div set to contentEditable and styled with "white-space:pre" so it keeps things like linebreaks. In Safari, FF and IE, the div pretty much looks and works the same. All is well. What I want to do is extract the text from this div, but in such a way that will not lose the formatting -- specifically, the line breaks.
We are using jQuery, whose text() function basically does a pre-order DFS and glues together all the content in that branch of the DOM into a single lump. This loses the formatting.
I had a look at the html() function, but it seems that all three browsers do different things with the actual HTML that gets generated behind the scenes in my contentEditable div. Assuming I type this into my div:
1
2
3
These are the results:
Safari 4:
1
<div>2</div>
<div>3</div>
Firefox 3.6:
1
<br _moz_dirty="">
2
<br _moz_dirty="">
3
<br _moz_dirty="">
<br _moz_dirty="" type="_moz">
IE 8:
<P>1</P><P>2</P><P>3</P>
Ugh. Nothing very consistent here. The surprising thing is that MSIE looks the most sane! (Capitalized P tag and all)
The div will have dynamically set styling (font face, colour, size and alignment) which is done using CSS, so I'm not sure if I can use a pre tag (which was alluded to on some pages I found using Google).
Does anyone know of any JavaScript code and/or jQuery plugin or something that will extract text from a contentEditable div in such a way as to preserve linebreaks? I'd prefer not to reinvent a parsing wheel if I don't have to.
Update: I cribbed the getText function from jQuery 1.4.2 and modified it to extract the text with whitespace mostly intact (I only changed one line, where I add a newline):
function extractTextWithWhitespace( elems ) {
var ret = "", elem;
for ( var i = 0; elems[i]; i++ ) {
elem = elems[i];
// Get the text from text nodes and CDATA nodes
if ( elem.nodeType === 3 || elem.nodeType === 4 ) {
ret += elem.nodeValue + "\n";
// Traverse everything else, except comment nodes
} else if ( elem.nodeType !== 8 ) {
ret += extractTextWithWhitespace( elem.childNodes );
}
}
return ret;
}
I call this function and use its output to assign it to an XML node with jQuery, something like:
var extractedText = extractTextWithWhitespace($(this));
var $someXmlNode = $('<someXmlNode/>');
$someXmlNode.text(extractedText);
The resulting XML is eventually sent to a server via an AJAX call.
This works well in Safari and Firefox.
On IE, only the first '\n' seems to get retained somehow. Looking into it more, it looks like jQuery is setting the text like so (line 4004 of jQuery-1.4.2.js):
return this.empty().append( (this[0] && this[0].ownerDocument || document).createTextNode( text ) );
Reading up on createTextNode, it appears that IE's implementation may mash up the whitespace. Is this true or am I doing something wrong?

Unfortunately, you still have to handle this per browser for the pre case. (I don't condone browser detection in most cases; use feature detection instead. Here, though, it's necessary.) Luckily, you can take care of all of them pretty concisely, like this:
var ce = $("<pre />").html($("#edit").html());
if($.browser.webkit)
ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });
if($.browser.msie)
ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
if($.browser.mozilla || $.browser.opera ||$.browser.msie )
ce.find("br").replaceWith("\n");
var textWithWhiteSpaceIntact = ce.text();
You can test it out here. IE in particular is a hassle because of the way it handles &nbsp; and new lines in text conversion; that's why it gets the <br> treatment above to make it consistent, so it needs 2 passes to be handled correctly.
In the above, #edit is the ID of the contentEditable element, so just change that, or make this a function, for example:
function getContentEditableText(id) {
var ce = $("<pre />").html($("#" + id).html());
if ($.browser.webkit)
ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });
if ($.browser.msie)
ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
if ($.browser.mozilla || $.browser.opera || $.browser.msie)
ce.find("br").replaceWith("\n");
return ce.text();
}
You can test that here. Or, since this is built on jQuery methods anyway, make it a plugin, like this:
$.fn.getPreText = function () {
var ce = $("<pre />").html(this.html());
if ($.browser.webkit)
ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });
if ($.browser.msie)
ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
if ($.browser.mozilla || $.browser.opera || $.browser.msie)
ce.find("br").replaceWith("\n");
return ce.text();
};
Then you can just call it with $("#edit").getPreText(); you can test that version here.
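Note that $.browser was removed in jQuery 1.9, so the snippets above need the jQuery Migrate plugin on newer versions. As a rough alternative, the same normalization can be sketched without browser detection by walking the nodes and treating <br> as a newline and <div>/<p> as line boundaries. The name editableToText, the BLOCK_TAGS set and the plain-object stand-ins for DOM nodes are assumptions made for illustration; the sketch has only been reasoned through against the three markup shapes shown in the question.

```javascript
// Hypothetical helper (not part of jQuery): tags treated as line containers.
var BLOCK_TAGS = { DIV: true, P: true };

// Walks the children of `root` and returns their text, emitting "\n" for each
// <br> and at the boundaries of block elements. It relies only on nodeType,
// nodeName, nodeValue and childNodes, so it works on real DOM nodes and on
// plain objects that mimic them.
function editableToText(root) {
  var out = "";

  function endsWithNewline() {
    return out.charAt(out.length - 1) === "\n";
  }

  function walk(node) {
    for (var i = 0; i < node.childNodes.length; i++) {
      var child = node.childNodes[i];
      if (child.nodeType === 3) {            // text node
        out += child.nodeValue;
      } else if (child.nodeType === 1) {     // element node
        var name = child.nodeName.toUpperCase();
        if (name === "BR") {
          out += "\n";
          continue;
        }
        var isBlock = BLOCK_TAGS[name] === true;
        // A block starts on its own line unless we are already at a line start.
        if (isBlock && out !== "" && !endsWithNewline()) out += "\n";
        walk(child);
        // A block ends its line unless its content already did (e.g. <div><br></div>).
        if (isBlock && !endsWithNewline()) out += "\n";
      }
    }
  }

  walk(root);
  return out;
}

// Plain-object stand-ins for DOM nodes, used only for the demonstration below.
function text(value) { return { nodeType: 3, nodeValue: value, childNodes: [] }; }
function el(name, kids) { return { nodeType: 1, nodeName: name, childNodes: kids || [] }; }

// The three markup shapes reported in the question for the input 1, 2, 3.
var safari  = el("DIV", [text("1"), el("DIV", [text("2")]), el("DIV", [text("3")])]);
var firefox = el("DIV", [text("1"), el("BR"), text("2"), el("BR"), text("3"), el("BR"), el("BR")]);
var ie      = el("DIV", [el("P", [text("1")]), el("P", [text("2")]), el("P", [text("3")])]);

console.log(JSON.stringify(editableToText(safari)));
console.log(JSON.stringify(editableToText(firefox)));
console.log(JSON.stringify(editableToText(ie)));
```

With a real element the call would be editableToText(document.getElementById("edit")). The three shapes differ only in their trailing newlines, so a caller may want to strip those before comparing or sending the text.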

I forgot about this question until now, when Nico slapped a bounty on it.
I solved the problem by writing the function I needed myself, cribbing a function from the existing jQuery codebase and modifying it to work as I needed.
I've tested this function with Safari (WebKit), IE, Firefox and Opera. I didn't bother checking any other browsers since the whole contentEditable thing is non-standard. It is also possible that an update to any browser could break this function if they change how they implement contentEditable. So programmer beware.
function extractTextWithWhitespace(elems)
{
var lineBreakNodeName = "BR"; // Use <br> as a default
if ($.browser.webkit)
{
lineBreakNodeName = "DIV";
}
else if ($.browser.msie)
{
lineBreakNodeName = "P";
}
else if ($.browser.mozilla)
{
lineBreakNodeName = "BR";
}
else if ($.browser.opera)
{
lineBreakNodeName = "P";
}
var extractedText = extractTextWithWhitespaceWorker(elems, lineBreakNodeName);
return extractedText;
}
// Cribbed from jQuery 1.4.2 (getText) and modified to retain whitespace
function extractTextWithWhitespaceWorker(elems, lineBreakNodeName)
{
var ret = "";
var elem;
for (var i = 0; elems[i]; i++)
{
elem = elems[i];
if (elem.nodeType === 3 // text node
|| elem.nodeType === 4) // CDATA node
{
ret += elem.nodeValue;
}
if (elem.nodeName === lineBreakNodeName)
{
ret += "\n";
}
if (elem.nodeType !== 8) // traverse everything except comment nodes
{
ret += extractTextWithWhitespaceWorker(elem.childNodes, lineBreakNodeName);
}
}
return ret;
}

See this fiddle, or this post:
How to parse editable DIV's text with browser compatibility
(created after a lot of effort)

I discovered this today in Firefox:
I pass a contenteditable div whose white-space is set to "pre" to this function, and it works nicely.
I added a line to show how many nodes there are, and a button that puts the output into another PRE, just to prove that the linebreaks are intact.
It basically says this:
For each child node of the DIV {
if it contains the 'data' property,
add the data value to the output
otherwise
add an LF (or a CRLF for Windows)
}
and return the result.
There is an issue, tho. When you hit enter at the end of any line of the original text, instead of putting a LF in, it puts a "Â" in. You can hit enter again and it puts a LF in there, but not the first time. And you have to delete the "Â" (it looks like a space). Go figure - I guess that's a bug.
This doesn't occur in IE8. (change textContent to innerText) There is a different bug there, tho. When you hit enter, it splits the node into 2 nodes, as it does in Firefox, but the "data" property of each one of those nodes then becomes "undefined".
I'm sure there's much more going on here than meets the eye, so any input on the matter will be enlightening.
<!DOCTYPE html>
<html>
<HEAD>
<SCRIPT type="text/javascript">
function htmlToText(elem) {
var outText="";
for(var x=0; x<elem.childNodes.length; x++){
if(elem.childNodes[x].data){
outText+=elem.childNodes[x].data;
}else{
outText+="\n";
}
}
alert(elem.childNodes.length + " Nodes: \r\n\r\n" + outText);
return(outText);
}
</SCRIPT>
</HEAD>
<body>
<div style="white-space:pre;" contenteditable=true id=test>Text in a pre element
is displayed in a fixed-width
font, and it preserves
both spaces and
line breaks
</DIV>
<INPUT type=button value="submit" onclick="document.getElementById('test2').textContent=htmlToText(document.getElementById('test'))">
<PRE id=test2>
</PRE>
</body>
</html>

here's a solution (using underscore and jquery) that seems to work in iOS Safari (iOS 7 and 8), Safari 8, Chrome 43, and Firefox 36 in OS X, and IE6-11 on Windows:
_.reduce($editable.contents(), function(text, node) {
return text + (node.nodeValue || '\n' +
(_.isString(node.textContent) ? node.textContent : node.innerHTML));
}, '')
see test page here: http://brokendisk.com/code/contenteditable.html
although I think the real answer is that if you're not interested in the markup provided by the browser, you shouldn't be using the contenteditable attribute - a textarea would be the proper tool for the job.

this.editableVal = function(cont, opts)
{
if (!cont) return '';
var el = cont.firstChild;
var v = '';
var contTag = new RegExp('^(DIV|P|LI|OL|TR|TD|BLOCKQUOTE)$');
while (el) {
switch (el.nodeType) {
case 3:
var str = el.data.replace(/^\n|\n$/g, ' ').replace(/[\n\xa0]/g, ' ').replace(/[ ]+/g, ' ');
v += str;
break;
case 1:
var str = this.editableVal(el);
if (el.tagName && el.tagName.match(contTag) && str) {
if (str.substr(-1) != '\n') {
str += '\n';
}
var prev = el.previousSibling;
while (prev && prev.nodeType == 3 && PHP.trim(prev.nodeValue) == '') {
prev = prev.previousSibling;
}
if (prev && !(prev.tagName && (prev.tagName.match(contTag) || prev.tagName == 'BR'))) {
str = '\n' + str;
}
}else if (el.tagName == 'BR') {
str += '\n';
}
v += str;
break;
}
el = el.nextSibling;
}
return v;
}

Related

"False" text node causing trouble

I'm working on a DOM traversal type of script and I'm almost finished with it. However, there is one problem that I've encountered and for the life of me, I can't figure out what to do to fix it. Pardon my ineptitude, as I'm brand new to JS/JQuery and I'm still learning the ropes.
Basically, I'm using Javascript/JQuery to create an "outline", representing the structure of an HTML page, and appending the "outline" to the bottom of the webpage. For example, if the HTML is this...
<html>
<head>
</head>
<body>
<h1>Hello World</h1>
<script src="http://code.jquery.com/jquery-2.1.0.min.js" type="text/javascript">
</script>
<script src="outline.js" type="text/javascript"></script>
</body>
</html>
Then the output should be an unordered list like this:
html
head
body
h1
text(Hello World)
script src("http://code.jquery.com/jquery-2.1.0.min.js") type("text/javascript")
script src("outline.js") type("text/javascript")
Here's what I've got so far:
var items=[];
$(document).ready(function(){
$("<ul id = 'list'></ul>").appendTo("body");
traverse(document, function (node) {
if(node.nodeName.indexOf("#") <= -1){
items.push("<ul>"+"<li>"+node.nodeName.toLowerCase());
}
else {
var x = "text("+node.nodeValue+")";
if(node.nodeValue == null) {
items.push("<li> document");
}
else if(/[a-z0-9]/i.test(node.nodeValue) && node.nodeValue != null) {
items.push("<ul><li>"+ x +"</ul>");
}
else {
items.push("</ul>");
}
}
});
$('#list').append(items.join(''));
});
function traverse(node, func) {
func(node);
node = node.firstChild;
while (node) {
traverse(node, func);
node = node.nextSibling;
}
}
It works almost perfectly, except it seems to read a carriage return as a text node. For example, if there's
<head><title>
it reads that properly, adding head as an unordered list element, and then creating a new unordered list for title, which is nested inside head. However, if it's
<head>
<title>
It makes the new unordered list and its element, "head", but then jumps to the else statement that does items.push("</ul>"). How do I get it to ignore the carriage return? I tried testing to see if the nodeValue was equal to the carriage return, \r, but that didn't seem to do the trick.

I'm having a bit of a hard time understanding exactly which text nodes you want to skip. If you just want to skip a text node that is only whitespace, you can do that like this:
var onlyWhitespaceRegex = /^\s*$/;
traverse(document, function (node) {
if (node.nodeType === 3 && onlyWhitespaceRegex.test(node.nodeValue)) {
// skip text nodes that contain only whitespace
return;
}
else if (node.nodeName.indexOf("#") <= -1){
items.push("<ul>"+"<li>"+node.nodeName.toLowerCase());
} else ...
Or maybe you just want to collapse any leading or trailing whitespace on a text node to a single space before displaying it, since that whitespace may not be displayed in HTML anyway:
var trimWhitespaceRegex = /^\s+|\s+$/g;
traverse(document, function (node) {
if(node.nodeName.indexOf("#") <= -1){
items.push("<ul>"+"<li>"+node.nodeName.toLowerCase());
} else {
var text = node.nodeValue;
if (node.nodeType === 3) {
text = text.replace(trimWhitespaceRegex, " ");
}
var x = "text("+text+")";
if(node.nodeValue == null) {
items.push("<li> document");
} ....
A further description of exactly what you're trying to achieve in the output for various forms of different text nodes would help us better understand your requirements.
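For what it's worth, the comparison against \r in the question was unlikely to match: the text node between <head> and an indented <title> usually holds a line feed plus the indentation rather than a lone carriage return, because HTML parsers normalize CR and CRLF to LF. A small sketch of the whitespace test on sample strings, where isIgnorableText is a made-up name and the sample value is an assumption about what such a node typically contains:

```javascript
var onlyWhitespaceRegex = /^\s*$/;

// Hypothetical helper: true when a text node's value contains nothing visible.
// Intended for text nodes only (nodeType === 3).
function isIgnorableText(nodeValue) {
  return onlyWhitespaceRegex.test(nodeValue);
}

// Assumed value of the text node between <head> and an indented <title>.
var betweenTags = "\n    ";

console.log(betweenTags === "\r");            // false: a direct "\r" comparison misses it
console.log(isIgnorableText(betweenTags));    // true
console.log(isIgnorableText("Hello World"));  // false
```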

how to use jquery to insert a character at specified position into monospaced textarea

I have a monospaced textarea (not unlike the stackexchange editor). When my user clicks, I need a character to automagically appear on the previous line using jQuery. I know I need to use .click() to bind a function to that event, but the logic of the function eludes me.
Desired Behavior...user will click at position of the asterisk *
Here is some text in my editor.
When I double click at a position*
I want to insert a new line above, with a new character at the same position
The above text should become the following after the function gets run
Here is some text in my editor.
*
When I double click at a position*
I want to insert a new blank line above, at the same position
What I have tried:
I have found the caret jQuery plugin, which has a function called caret() that gives me the position of the asterisk when I click (the position is 74).
<script src='jquery.caret.js'></script>
$('textarea').click(function(e) {
if (e.altKey){
alert($("textarea").caret());
}
});
But I really need to know the position of the character within the line, not the entire textarea. So far this eludes me.

Here's something without using caret.js
$('textarea').dblclick(function(e){
var text = this.value;
var newLinePos = text.lastIndexOf('\n', this.selectionStart);
var lineLength = this.selectionStart - newLinePos;
var newString = '\n';
for(var i=1; i < lineLength; ++i){
newString += ' ';
}
newString += text.substr(this.selectionStart,this.selectionEnd-this.selectionStart);
this.value = [text.slice(0, newLinePos), newString, text.slice(newLinePos)].join('');
});
Here's a fiddle. Credit to this post for 'inserting string into a string at specified position'.
Just realised that doing that on the top line is a bit broken, I'll have a look when I get home!
Update
Fixed the top-line problem.
if(newLinePos == -1){
this.value = newString + '\n' + this.value;
} else {
this.value = [text.slice(0, newLinePos), '\n'+newString, text.slice(newLinePos)].join('');
}
http://jsfiddle.net/daveSalomon/3dr8k539/4/
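The same logic can also be pulled out into a pure function that takes the text and the caret offset, which makes the top-line case easier to check by hand. insertMarkerAbove and its parameters are illustrative names, not from the answer, and this sketch inserts a fixed marker rather than the selected text.

```javascript
// Hypothetical helper: returns `text` with a new line inserted above the line
// that contains `offset`; the new line holds `marker` at the caret's column.
function insertMarkerAbove(text, offset, marker) {
  // Clamp the offset so slicing stays inside the string.
  var caret = Math.max(0, Math.min(offset, text.length));
  // Index of the first character of the caret's line (0 on the top line,
  // because lastIndexOf returns -1 when no "\n" precedes the caret).
  var lineStart = text.slice(0, caret).lastIndexOf("\n") + 1;
  var column = caret - lineStart;
  // Build `column` spaces so the marker lines up with the caret.
  var padding = new Array(column + 1).join(" ");
  return text.slice(0, lineStart) + padding + marker + "\n" + text.slice(lineStart);
}

console.log(insertMarkerAbove("ab\ncde", 5, "*"));
console.log(insertMarkerAbove("abc", 2, "*"));
```

In a dblclick handler on the textarea, this could be used as this.value = insertMarkerAbove(this.value, this.selectionStart, "*").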

Assuming you know the position of the caret in the whole text area here's something you might do with it.
function getCaretPosition(text, totalOffset) {
var line = 0, pos = 0;
for (var i = 0; i < Math.min(totalOffset, text.length); i++) {
if (text[i] === '\n') {
line++;
pos = 0;
} else {
pos++;
}
}
return { row: line, col: pos };
}
var caretPosition = getCaretPosition($("textarea").val(), $("textarea").caret());

How to get startOffset/endOffset of selected html text?

I'm trying to wrap selected text from a "contenteditable" div in a given tag. Below seems to be working ok but startOffset/endOffset doesn't include HTML text. My question is how do I get the Range object to count the html tags if they exist in the selection?
getSelectedText: function() {
var range;
if (window.getSelection) {
range = window.getSelection().getRangeAt(0);
return [range.startOffset, range.endOffset];
}
}
toggleTagOnRange: function(range, tag, closeTag) {
var removeExp, val;
if (closeTag == null) {
closeTag = tag;
}
val = this.get("value");
removeExp = RegExp("<" + tag + ">(.+)</" + closeTag + ">");
if (removeExp.test(val)) {
this.set("value", val.replace(removeExp, function(match, $1) {
return $1;
}));
} else {
if (range.length > 1) {
val = val.splice(range[1], "</" + closeTag + ">").splice(range[0], "<" + tag + ">");
this.set("value", val);
}
}
return this.get("value");
}
// this is called from a bold button click handler.
this.toggleTagOnSelection(this.getSelectedText(), 'strong');
Interested in other solutions if you've got them.

Honestly, things can get pretty nasty when trying to write code for this type of thing yourself. There are a lot of cases you need to cover, like when you're selecting text across multiple <p> tags for example. You don't need to reinvent the wheel. Look into a library like rangy where they have already taken care of the nitty gritty details. Specifically for your situation, if you can get by with using CSS styles instead of using tag elements like <strong>, look into the CSS Class Applier Module, which allows you to do this simply by doing:
var cssApplier = rangy.createCssClassApplier("someClass", {normalize: true});
cssApplier.toggleSelection();
Where .someClass is a CSS class containing whatever styles you need to apply.
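On the original question: a Range's startOffset and endOffset are measured inside startContainer and endContainer (characters for a text node, child index for an element), not from the start of the editable div, which is why they appear to ignore the markup. If a single character offset across the whole div is needed, one rough approach is to add up the text that precedes the boundary. textOffsetWithin below is a hypothetical helper, not part of Rangy or the DOM API, and it counts characters of text only, not the characters of the HTML tags.

```javascript
// Total length of the text contained in `node` and its descendants.
function textLength(node) {
  if (node.nodeType === 3) return node.nodeValue.length;
  var sum = 0;
  for (var i = 0; i < node.childNodes.length; i++) {
    sum += textLength(node.childNodes[i]);
  }
  return sum;
}

// Hypothetical helper: maps a Range boundary (container + offset) to a
// character offset counted from the start of `root`. Returns -1 when
// `container` is not inside `root`.
function textOffsetWithin(root, container, offset) {
  var total = 0;
  var found = false;

  function walk(node) {
    if (node === container) {
      if (node.nodeType === 3) {
        total += offset;                 // in a text node, offset counts characters
      } else {
        // in an element, offset counts the child nodes before the boundary
        for (var i = 0; i < offset && i < node.childNodes.length; i++) {
          total += textLength(node.childNodes[i]);
        }
      }
      found = true;
      return;
    }
    if (node.nodeType === 3) {
      total += node.nodeValue.length;    // text that precedes the boundary
      return;
    }
    for (var j = 0; j < node.childNodes.length && !found; j++) {
      walk(node.childNodes[j]);
    }
  }

  walk(root);
  return found ? total : -1;
}
```

In a browser this might be called as textOffsetWithin(editableDiv, range.startContainer, range.startOffset), where editableDiv is assumed to be the contenteditable element and range comes from window.getSelection().getRangeAt(0).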

Javascript - Only works if I do an alert()

I am trying to handle a contenteditable body in an iframe, in order to prevent browsers from adding br, p or div on their own when pressing Enter. But something weird happens when trying to reset the focus, and it only works when I call alert() before processing the rest of the code. I think it is because JavaScript needs some time to perform the operations, but there must be a way to do it without "sleeping" the script...
Here is my code (using jQuery). It works perfectly, but only WITH the "magic alerts":
//PREVENT DEFAULT STUFF
var iframewindow=document.getElementById('rte').contentWindow;
var input = iframewindow.document.body;
$( input ).keypress( function ( e ) {
var sel, node, offset, text, textBefore, textAfter, range;
// the Selection object
sel = iframewindow.getSelection();
alert(sel); //MAGIC ALERT
// the node that contains the caret
node = sel.anchorNode;
alert(node); //MAGIC ALERT
// if ENTER was pressed while the caret was inside the input field
if ( node.parentNode === input && e.keyCode === 13 ) {
// prevent the browsers from inserting <div>, <p>, or <br> on their own
e.preventDefault();
// the caret position inside the node
offset = sel.anchorOffset;
// insert a '\n' character at that position
text = node.textContent;
textBefore = text.slice( 0, offset );
textAfter = text.slice( offset ) || ' ';
node.textContent = textBefore + '\n' + textAfter;
SEEREF=SEEREF.replace(/\n/g, "<br>");
// position the caret after that newBR character
range = iframewindow.document.createRange();
range.setStart( node, offset + 4 );
range.setEnd( node, offset + 4 );
// update the selection
sel.removeAllRanges();
sel.addRange( range );
}
});
(SEEREF stands for iframewindow.document.body.innerHTML; I abbreviated it because it was too long.)
Edit
When I remove the magic alerts it still works in Chrome, but in FF the caret jumps to the very beginning (as if offset were 0)!
UPDATE
It seems the problem is the line that replaces the newlines with br tags. If I remove this line, it works perfectly even without the alerts. I need to keep these br tags; is there any other way to do it?

This question is a narrower version of yours. You should combine the document and the window:
var idoc= iframe.contentDocument || iframe.contentWindow.document; // ie compatibility
var iwin= iframe.contentWindow || iframe.contentDocument.defaultView;
... idoc.getSelection() +(''+iwin.getSelection()) //firefox fix
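Regarding the update in the question: assigning to innerHTML discards the body's existing child nodes and builds new ones, so the node captured from sel.anchorNode is no longer in the document when the range is set on it. That may well be why the caret jumps to the start. One way to keep <br> tags without rewriting innerHTML is to insert a <br> element at the caret through the Range API. A minimal sketch, assuming a window object that supports getSelection (so not old IE); insertLineBreakAtCaret is a hypothetical name:

```javascript
// Hypothetical helper: inserts a <br> at the caret of `win` (for example the
// iframe's contentWindow) and moves the caret just after it.
// Returns false when there is no selection to work with.
function insertLineBreakAtCaret(win) {
  var doc = win.document;
  var sel = win.getSelection();
  if (!sel || sel.rangeCount === 0) return false;

  var range = sel.getRangeAt(0);
  range.deleteContents();              // drop any selected text first

  var br = doc.createElement("br");
  range.insertNode(br);                // existing nodes stay in the document

  range.setStartAfter(br);             // place the caret after the new <br>
  range.collapse(true);
  sel.removeAllRanges();
  sel.addRange(range);
  return true;
}
```

Inside the keypress handler this would replace the textContent and innerHTML manipulation: call e.preventDefault() when e.keyCode === 13, then insertLineBreakAtCaret(iframewindow). Some browsers may not show the caret on the new line until another node or a second <br> follows it.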

javascript catch paste event in textarea

I currently have a textarea in which I need control over text that has been pasted in.
Essentially, I need to be able to take whatever the user wants to paste into the textarea and place it into a variable.
I will then work out the position at which they pasted the text and the size of the string, so I can remove it from the textarea.
Then, at the end, I will deal with the text that is in the variable in my own way.
My question: how would I go about getting a copy of the text in a variable that was just pasted in by the user?
Thanks.

I answered a similar question a few days ago: Detect pasted text with ctrl+v or right click -> paste. This time I've included quite a long function that accurately gets selection boundaries in textarea in IE; the rest is relatively simple.
You can use the paste event to detect the paste in most browsers (notably not Firefox 2 though). When you handle the paste event, record the current selection, and then set a brief timer that calls a function after the paste has completed. This function can then compare lengths to know where to look for the pasted content. Something like the following:
function getSelectionBoundary(el, start) {
var property = start ? "selectionStart" : "selectionEnd";
var originalValue, textInputRange, precedingRange, pos, bookmark, isAtEnd;
if (typeof el[property] == "number") {
return el[property];
} else if (document.selection && document.selection.createRange) {
el.focus();
var range = document.selection.createRange();
if (range) {
// Collapse the selected range if the selection is not a caret
if (document.selection.type == "Text") {
range.collapse(!!start);
}
originalValue = el.value;
textInputRange = el.createTextRange();
precedingRange = el.createTextRange();
pos = 0;
bookmark = range.getBookmark();
textInputRange.moveToBookmark(bookmark);
if (/[\r\n]/.test(originalValue)) {
// Trickier case where input value contains line breaks
// Test whether the selection range is at the end of the
// text input by moving it on by one character and
// checking if it's still within the text input.
try {
range.move("character", 1);
isAtEnd = (range.parentElement() != el);
} catch (ex) {
// Moving the range failed; treat the selection as being at the end
isAtEnd = true;
}
range.moveToBookmark(bookmark);
if (isAtEnd) {
pos = originalValue.length;
} else {
// Insert a character in the text input range and use
// that as a marker
textInputRange.text = " ";
precedingRange.setEndPoint("EndToStart", textInputRange);
pos = precedingRange.text.length - 1;
// Delete the inserted character
textInputRange.moveStart("character", -1);
textInputRange.text = "";
}
} else {
// Easier case where input value contains no line breaks
precedingRange.setEndPoint("EndToStart", textInputRange);
pos = precedingRange.text.length;
}
return pos;
}
}
return 0;
}
function getTextAreaSelection(textarea) {
var start = getSelectionBoundary(textarea, true),
end = getSelectionBoundary(textarea, false);
return {
start: start,
end: end,
length: end - start,
text: textarea.value.slice(start, end)
};
}
function detectPaste(textarea, callback) {
textarea.onpaste = function() {
var sel = getTextAreaSelection(textarea);
var initialLength = textarea.value.length;
window.setTimeout(function() {
var val = textarea.value;
var pastedTextLength = val.length - (initialLength - sel.length);
var end = sel.start + pastedTextLength;
callback({
start: sel.start,
end: end,
length: pastedTextLength,
text: val.slice(sel.start, end),
replacedText: sel.text
});
}, 1);
};
}
window.onload = function() {
var textarea = document.getElementById("your_textarea");
detectPaste(textarea, function(pasteInfo) {
var val = textarea.value;
// Delete the pasted text and restore any previously selected text
textarea.value = val.slice(0, pasteInfo.start) +
pasteInfo.replacedText + val.slice(pasteInfo.end);
alert(pasteInfo.text);
});
};
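The length comparison inside detectPaste can be checked in isolation. The sketch below restates it as a pure function over the values before and after the paste; locatePaste and its parameter names are illustrative, not part of the code above.

```javascript
// Hypothetical helper: given the textarea value before the paste, the selection
// that was active at paste time, and the value after the paste, work out
// which slice of the new value was pasted.
function locatePaste(valueBefore, selStart, selEnd, valueAfter) {
  var selectedLength = selEnd - selStart;
  // The paste replaced the selection, so everything outside it is unchanged.
  var unchangedLength = valueBefore.length - selectedLength;
  var pastedLength = valueAfter.length - unchangedLength;
  var end = selStart + pastedLength;
  return {
    start: selStart,
    end: end,
    text: valueAfter.slice(selStart, end),
    replacedText: valueBefore.slice(selStart, selEnd)
  };
}

// "world" (offsets 6 to 11) is selected and "there!" is pasted over it.
var info = locatePaste("hello world", 6, 11, "hello there!");
console.log(info.text);         // "there!"
console.log(info.replacedText); // "world"
```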

You might now use FilteredPaste.js (http://willemmulder.github.com/FilteredPaste.js/) instead. It will let you control what content gets pasted into a textarea or contenteditable and you will be able to filter/change/extract content at will.

A quick search shows me that there are different methods for different browsers. I'm not sure if jQuery has a solution. Prototype.js does not appear to have one. Maybe YUI can do this for you?
You can also use TinyMCE, since it has a gazillion different event triggers. It is a full-fledged word processor, but you can use it as plain text if you want. It might be a bit too much weight to add, though. For example, upon initialization, it turns your <textarea> into an iFrame with several sub. But it will do what you ask.
--Dave
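In browsers that expose clipboard data on the paste event, the pasted text can also be read directly instead of being diffed out of the textarea afterwards. A hedged sketch: event.clipboardData.getData("text/plain") is the standard call, window.clipboardData.getData("Text") is the legacy IE equivalent, and getPastedText is a made-up wrapper name. Browsers that offer neither would still need the timer approach.

```javascript
// Hypothetical wrapper: returns the pasted plain text, or null when the
// browser exposes no clipboard data for this paste event.
function getPastedText(event, win) {
  // Standard path: the paste event carries a DataTransfer object.
  if (event && event.clipboardData) {
    return event.clipboardData.getData("text/plain");
  }
  // Legacy IE path: clipboard data hangs off the window instead.
  if (win && win.clipboardData) {
    return win.clipboardData.getData("Text");
  }
  return null;
}
```

A plain handler might look like textarea.onpaste = function (e) { var pasted = getPastedText(e || window.event, window); }. With a jQuery-bound handler, the native event is available as e.originalEvent.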
