A more efficient way to parse dom elements?

A more efficient way to parse dom elements? - javascript

I have some HTML I need to parse.
Basically I'm walking through the dom of a given element. Grabbing text nodes, and element nodes.
As I come across text nodes, I print them into a different element character by character. Each character is placed into it's own span, with its own style, which was taken from any element node found with a style attached.
So when an element node is found, it's style is applied to any text node detected until another element node is found and the old style is replaced with the new one.
The code below works. If you have a sentence or a short paragraph in the source element it reproduces the text accurately in less than a second. The longer the text gets the longer it takes (duh).
Interestingly, the more text that is already in the destination element, the longer it takes. So if I've ran this function 10 times on the same source element, with the same body of text being processed, it will run slower the 10th time through than the 1st time through, presumably because it's harder to render the text in an element that already has content.
Anyway, I really need to find a way to make this thing run faster.
Lastly, here is an example HTML snippet this thing might need to process:
<span style='blah: blah;'> Some text </span><span>Even more text </span> <p> stuff </p>
The resulting HTML would be:
<span style='blah: blah;'>S</span>
<span style='blah: blah;'>o</span>
<span style='blah: blah;'>m</span>
<span style='blah: blah;'>e</span>
<span style='blah: blah;'> </span>
<span style='blah: blah;'>t</span>
<span style='blah: blah;'>e</span>
<span style='blah: blah;'>x</span>
<span style='blah: blah;'>t</span>
.......
Nothing fancy.
Here's the code:
Code:
ed.rta_to_arr_paste = function(ele, cur_style) {
var child_arr = ele.childNodes;
if(!(is_set(cur_style))) {
cur_style = {};
}
for(var i = 0; i < child_arr.length; i++) {
if(child_arr[i].nodeType == 1) {
if(cur_style != child_arr[i].style) {
cur_style = child_arr[i].style;
}
} else if(child_arr[i].nodeType == 3) {
for(var n = 0; n < child_arr[i].nodeValue.length; n++) {
var span = ed.add_single_char(child_arr[i].nodeValue.charAt(n), cur_style);
}
}
ed.rta_to_arr_paste(child_arr[i], cur_style);
}
}
EDIT:
One example of a system like this being used is in google docs.
When a user pastes text into the document, it's first rendered off screen, then processed with a function similar (I'm assuming) to this one. It then reprints the text in the document.
It all happens extremely fast (unless the text is very long).

It seems you are directly inserting the new elements into the DOM tree, so I think you can get the best improvement by not doing that.
Avoid inserting a lot of elements one by one. Every time you insert an element, the browser has to recalculate the layout of the page and this takes time.
Instead, add the nodes to an element not in the DOM, the best would be using a DocumentFragment, which can be created via document.createDocumentFragment.
Then all you have to do is to insert this fragment and the browser only has to do one recalculation.
Update:
What you could also try is to use regular expressions to convert the text into the span elements.
var html = value.replace(/(.)/g, "<span>$1</span>")
At least in my naive test (not sure if the testcases are good this way), it performs much better than creating span elements and adding them to the document fragment:
Update 2: I adjusted the tests to also set the generated elements/string as content of an element and sadly, this takes away all the speed of using replace. But it might still be worth testing it:
http://jsperf.com/regex-vs-loop
You should also avoid repeated property access:
ed.rta_to_arr_paste = function(ele, cur_style) {
var child_arr = ele.childNodes;
if(!(is_set(cur_style))) {
cur_style = {};
}
for(var i = 0, l = child_arr; i <l; i++) {
var child = child_arr[i];
if(child.nodeType == 1) {
// this will always be true, because `el.style` returns an object
// so comparing it does not make sense. Maybe just override it always
if(cur_style != child.style) {
cur_style = child.style;
}
// doesn't need to be called for other nodes
ed.rta_to_arr_paste(child, cur_style);
}
else if(child.nodeType == 3) {
var value = child.nodeValue;
for(var n = 0, ln = value.length; n < ln; n++) {
ed.add_single_char(value.charAt(n), cur_style);
}
}
}
}

It looks like you are searching the DOM for an element on every call. I would think you would instead attach an event to the DOM elements in something like onload (or better yet use jquery document.ready). I would also (as a minor refactor) first check to make sure you have children ( child_arr.length > 0) prior to call the for loop (this may be totally insignificant but best practice)

Related

Codemirror: Retrieve character index from span tag

So I have a codemirror instance on the left of a document, and an iframe on the right. When the code is updated in the editor, it is written to the iframe.
During this rewrite, I add an index to each and every element that is created using jQuery's $.data function so that whenever the user hovers their mouse over the element it can be highlighted in the editor.
So far I have managed to pick out the required element's position in the editor in terms of where its generated <span class="cm-tag cm-bracket"> tag is and give it a class of cm-custom-highlight.
My question is - is there any way to turn an instance of a source span tag into an actual selection within the editor?
Update: Answered my own question - see below! You can check out my resulting code here.

I answered my own question! How about that?
Turns out CodeMirror has a neat little list of nodes in its display container. All I needed to do was loop through CodeMirror.display.renderedView[lineNumber].measure.map and test each text node's parentNode property to see if it was the same as the span I had highlighted.
The map array is structured like so:
[
0: 0
1: 1
2: text
3: 1
4: 5
...
]
Every text node here refers to a piece of code in the editor and the numbers before and after refer to its character index, so it was pretty easy to find the index that I needed:
var span = $('span.cm-custom-highlight', CodeMirror.display.lineDiv),
lineNumber = span.closest('.CodeMirror-line').closest('div[style]').index(),
lineView = CodeMirror.display.renderedView[lineNumber].measure.map,
char = 0;
for(var i in lineView.measure.map)
{
if(!char &&
typeof lineView.measure.map[i] == 'object' &&
lineView.measure.map[i].parentNode && span[0] == lineView.measure.map[i].parentNode)
{
char = lineView.measure.map[i - 1];
}
}
Sure it's a little messy, but it gets the job done nicely.

You will get better results when using markText rather than directly messing with the editor's DOM. The DOM and view data structure aren't part of the interface, and will change between versions. The editor can also update its DOM at any moment, overriding the changes you made.

How to get cursor HTML offset in contenteditable? [duplicate]

I have a contenteditable div as follow (| = cursor position):
<div id="mydiv" contenteditable="true">lorem ipsum <spanclass="highlight">indol|or sit</span> amet consectetur <span class='tag'>adipiscing</span> elit</div>
I would like to get the current cursor position including html tags. My code :
var offset = document.getSelection().focusOffset;
Offset is returning 5 (full text from the last tag) but i need it to handle html tags. The expected return value is 40. The code has to work with all recents browsers.
(i also checked this : window.getSelection() offset with HTML tags? but it doesn't answer my question).
Any ideas ?

Another way to do it is by adding a temporary marker in the DOM and calculating the offset from this marker. The algorithm looks for the HTML serialization of the marker (its outerHTML) within the inner serialization (the innerHTML) of the div of interest. Repeated text is not a problem with this solution.
For this to work, the marker's serialization must be unique within its div. You cannot control what users type into a field but you can control what you put into the DOM so this should not be difficult to achieve. In my example, the marker is made unique statically: by choosing a class name unlikely to cause a clash ahead of time. It would also be possible to do it dynamically, by checking the DOM and changing the class until it is unique.
I have a fiddle for it (derived from Alvaro Montoro's own fiddle). The main part is:
function getOffset() {
if ($("." + unique).length)
throw new Error("marker present in document; or the unique class is not unique");
// We could also use rangy.getSelection() but there's no reason here to do this.
var sel = document.getSelection();
if (!sel.rangeCount)
return; // No ranges.
if (!sel.isCollapsed)
return; // We work only with collapsed selections.
if (sel.rangeCount > 1)
throw new Error("can't handle multiple ranges");
var range = sel.getRangeAt(0);
var saved = rangy.serializeSelection();
// See comment below.
$mydiv[0].normalize();
range.insertNode($marker[0]);
var offset = $mydiv.html().indexOf($marker[0].outerHTML);
$marker.remove();
// Normalizing before and after ensures that the DOM is in the same shape before
// and after the insertion and removal of the marker.
$mydiv[0].normalize();
rangy.deserializeSelection(saved);
return offset;
}
As you can see, the code has to compensate for the addition and removal of the marker into the DOM because this causes the current selection to get lost:
Rangy is used to save the selection and restore it afterwards. Note that the save and restore could be done with something lighter than Rangy but I did not want to load the answer with minutia. If you decide to use Rangy for this task, please read the documentation because it is possible to optimize the serialization and deserialization.
For Rangy to work, the DOM must be in exactly the same state before and after the save. This is why normalize() is called before we add the marker and after we remove it. What this does is merge immediately adjacent text nodes into a single text node. The issue is that adding a marker to the DOM can cause a text node to be broken into two new text nodes. This causes the selection to be lost and, if not undone with a normalization, would cause Rangy to be unable to restore the selection. Again, something lighter than calling normalize could do the trick but I did not want to load the answer with minutia.

EDIT: This is an old answer that doesn't work for OP's requirement of having nodes with the same text. But it's cleaner and lighter if you don't have that requirement.
Here is one option that you can use and that works in all major browsers:
Get the offset of the caret within its node (document.getSelection().anchorOffset)
Get the text of the node in which the caret is located (document.getSelection().anchorNode.data)
Get the offset of that text within #mydiv by using indexOf()
Add the values obtained in 1 and 3, to get the offset of the caret within the div.
The code would look like this for your particular case:
var offset = document.getSelection().anchorOffset;
var text = document.getSelection().anchorNode.data;
var textOffset = $("#mydiv").html().indexOf( text );
offsetCaret = textOffset + offset;
You can see a working demo on this JSFiddle (view the console to see the results).
And a more generic version of the function (that allows to pass the div as a parameter, so it can be used with different contenteditable) on this other JSFiddle:
function getCaretHTMLOffset(obj) {
var offset = document.getSelection().anchorOffset;
var text = document.getSelection().anchorNode.data;
var textOffset = obj.innerHTML.indexOf( text );
return textOffset + offset;
}
About this answer
It will work in all recent browsers as requested (tested on Chrome 42, Firefox 37, and Explorer 11).
It is short and light, and doesn't require any external library (not even jQuery)
Issue: If you have different nodes with the same text, it may return the offset of the first occurrence instead of the real position of the caret.

NOTE: This solution works even in nodes with repeated text, but it detects html entities (e.g.: ) as only one character.
I came up with a completely different solution based on processing the nodes. It is not as clean as the old answer (see other answer), but it works fine even when there are nodes with the same text (OP's requirement).
This is a description of how it works:
Create a stack with all the parent elements of the node in which the caret is located.
While the stack is not empty, traverse the nodes of the containing element (initially the content editable div).
If the node is not the same one at the top of the stack, add its size to the offset.
If the node is the same as the one at the top of the stack: pop it from the stack, go to step 2.
The code is like this:
function getCaretOffset(contentEditableDiv) {
// read the node in which the caret is and store it in a stack
var aux = document.getSelection().anchorNode;
var stack = [ aux ];
// add the parents to the stack until we get to the content editable div
while ($(aux).parent()[0] != contentEditableDiv) { aux = $(aux).parent()[0]; stack.push(aux); }
// traverse the contents of the editable div until we reach the one with the caret
var offset = 0;
var currObj = contentEditableDiv;
var children = $(currObj).contents();
while (stack.length) {
// add the lengths of the previous "siblings" to the offset
for (var x = 0; x < children.length; x++) {
if (children[x] == stack[stack.length-1]) {
// if the node is not a text node, then add the size of the opening tag
if (children[x].nodeType != 3) { offset += $(children[x])[0].outerHTML.indexOf(">") + 1; }
break;
} else {
if (children[x].nodeType == 3) {
// if it's a text node, add it's size to the offset
offset += children[x].length;
} else {
// if it's a tag node, add it's size + the size of the tags
offset += $(children[x])[0].outerHTML.length;
}
}
}
// move to a more inner container
currObj = stack.pop();
children = $(currObj).contents();
}
// finally add the offset within the last node
offset += document.getSelection().anchorOffset;
return offset;
}
You can see a working demo on this JSFiddle.
About this answer:
It works in all major browsers.
It is light and doesn't require external libraries (apart from jQuery)
It has an issue: html entities like are counted as one character only.

javascript : focusOffset with html tags

I have a contenteditable div as follow (| = cursor position):
<div id="mydiv" contenteditable="true">lorem ipsum <spanclass="highlight">indol|or sit</span> amet consectetur <span class='tag'>adipiscing</span> elit</div>
I would like to get the current cursor position including html tags. My code :
var offset = document.getSelection().focusOffset;
Offset is returning 5 (full text from the last tag) but i need it to handle html tags. The expected return value is 40. The code has to work with all recents browsers.
(i also checked this : window.getSelection() offset with HTML tags? but it doesn't answer my question).
Any ideas ?

Another way to do it is by adding a temporary marker in the DOM and calculating the offset from this marker. The algorithm looks for the HTML serialization of the marker (its outerHTML) within the inner serialization (the innerHTML) of the div of interest. Repeated text is not a problem with this solution.
For this to work, the marker's serialization must be unique within its div. You cannot control what users type into a field but you can control what you put into the DOM so this should not be difficult to achieve. In my example, the marker is made unique statically: by choosing a class name unlikely to cause a clash ahead of time. It would also be possible to do it dynamically, by checking the DOM and changing the class until it is unique.
I have a fiddle for it (derived from Alvaro Montoro's own fiddle). The main part is:
function getOffset() {
if ($("." + unique).length)
throw new Error("marker present in document; or the unique class is not unique");
// We could also use rangy.getSelection() but there's no reason here to do this.
var sel = document.getSelection();
if (!sel.rangeCount)
return; // No ranges.
if (!sel.isCollapsed)
return; // We work only with collapsed selections.
if (sel.rangeCount > 1)
throw new Error("can't handle multiple ranges");
var range = sel.getRangeAt(0);
var saved = rangy.serializeSelection();
// See comment below.
$mydiv[0].normalize();
range.insertNode($marker[0]);
var offset = $mydiv.html().indexOf($marker[0].outerHTML);
$marker.remove();
// Normalizing before and after ensures that the DOM is in the same shape before
// and after the insertion and removal of the marker.
$mydiv[0].normalize();
rangy.deserializeSelection(saved);
return offset;
}
As you can see, the code has to compensate for the addition and removal of the marker into the DOM because this causes the current selection to get lost:
Rangy is used to save the selection and restore it afterwards. Note that the save and restore could be done with something lighter than Rangy but I did not want to load the answer with minutia. If you decide to use Rangy for this task, please read the documentation because it is possible to optimize the serialization and deserialization.
For Rangy to work, the DOM must be in exactly the same state before and after the save. This is why normalize() is called before we add the marker and after we remove it. What this does is merge immediately adjacent text nodes into a single text node. The issue is that adding a marker to the DOM can cause a text node to be broken into two new text nodes. This causes the selection to be lost and, if not undone with a normalization, would cause Rangy to be unable to restore the selection. Again, something lighter than calling normalize could do the trick but I did not want to load the answer with minutia.

EDIT: This is an old answer that doesn't work for OP's requirement of having nodes with the same text. But it's cleaner and lighter if you don't have that requirement.
Here is one option that you can use and that works in all major browsers:
Get the offset of the caret within its node (document.getSelection().anchorOffset)
Get the text of the node in which the caret is located (document.getSelection().anchorNode.data)
Get the offset of that text within #mydiv by using indexOf()
Add the values obtained in 1 and 3, to get the offset of the caret within the div.
The code would look like this for your particular case:
var offset = document.getSelection().anchorOffset;
var text = document.getSelection().anchorNode.data;
var textOffset = $("#mydiv").html().indexOf( text );
offsetCaret = textOffset + offset;
You can see a working demo on this JSFiddle (view the console to see the results).
And a more generic version of the function (that allows to pass the div as a parameter, so it can be used with different contenteditable) on this other JSFiddle:
function getCaretHTMLOffset(obj) {
var offset = document.getSelection().anchorOffset;
var text = document.getSelection().anchorNode.data;
var textOffset = obj.innerHTML.indexOf( text );
return textOffset + offset;
}
About this answer
It will work in all recent browsers as requested (tested on Chrome 42, Firefox 37, and Explorer 11).
It is short and light, and doesn't require any external library (not even jQuery)
Issue: If you have different nodes with the same text, it may return the offset of the first occurrence instead of the real position of the caret.

NOTE: This solution works even in nodes with repeated text, but it detects html entities (e.g.: ) as only one character.
I came up with a completely different solution based on processing the nodes. It is not as clean as the old answer (see other answer), but it works fine even when there are nodes with the same text (OP's requirement).
This is a description of how it works:
Create a stack with all the parent elements of the node in which the caret is located.
While the stack is not empty, traverse the nodes of the containing element (initially the content editable div).
If the node is not the same one at the top of the stack, add its size to the offset.
If the node is the same as the one at the top of the stack: pop it from the stack, go to step 2.
The code is like this:
function getCaretOffset(contentEditableDiv) {
// read the node in which the caret is and store it in a stack
var aux = document.getSelection().anchorNode;
var stack = [ aux ];
// add the parents to the stack until we get to the content editable div
while ($(aux).parent()[0] != contentEditableDiv) { aux = $(aux).parent()[0]; stack.push(aux); }
// traverse the contents of the editable div until we reach the one with the caret
var offset = 0;
var currObj = contentEditableDiv;
var children = $(currObj).contents();
while (stack.length) {
// add the lengths of the previous "siblings" to the offset
for (var x = 0; x < children.length; x++) {
if (children[x] == stack[stack.length-1]) {
// if the node is not a text node, then add the size of the opening tag
if (children[x].nodeType != 3) { offset += $(children[x])[0].outerHTML.indexOf(">") + 1; }
break;
} else {
if (children[x].nodeType == 3) {
// if it's a text node, add it's size to the offset
offset += children[x].length;
} else {
// if it's a tag node, add it's size + the size of the tags
offset += $(children[x])[0].outerHTML.length;
}
}
}
// move to a more inner container
currObj = stack.pop();
children = $(currObj).contents();
}
// finally add the offset within the last node
offset += document.getSelection().anchorOffset;
return offset;
}
You can see a working demo on this JSFiddle.
About this answer:
It works in all major browsers.
It is light and doesn't require external libraries (apart from jQuery)
It has an issue: html entities like are counted as one character only.

Appending elements to DOM with indentation/spacing

Here is an example. Check the console for the result. The first two divs (not appended; above the <script> in the console) have the proper spacing and indention. However, the second two divs do not show the same formatting or white space as the original even though they are completely the same, but appended.
For example the input
var newElem = document.createElement('div');
document.body.appendChild(newElem);
var another = document.createElement('div');
newElem.appendChild(another);
console.log(document.body.innerHTML);
Gives the output
<div><div></div></div>
When I want it to look like
<div>
<div></div>
</div>
Is there any way to generate the proper white space between appended elements and retain that spacing when obtaining it using innerHTML (or a possible similar means)? I need to be able to visually display the hierarchy and structure of the page I'm working on.
I have tried appending it within an element that is in the actual HTML but it has the same behavior
I'd be okay with doing it using text nodes and line breaks as lincolnk suggested, but it needs to affect dynamic results, meaning I cannot use the same .createTextNode(' </br>') because different elements are in different levels of the hierarchy
No jQuery please

I think you're asking to be able to append elements to the DOM, such that the string returned from document.body.innerHTML will be formatted with indentation etc. as if you'd typed it into a text editor, right?
If so, something like this might work:
function indentedAppend(parent,child) {
var indent = "",
elem = parent;
while (elem && elem !== document.body) {
indent += " ";
elem = elem.parentNode;
}
if (parent.hasChildNodes() && parent.lastChild.nodeType === 3 && /^\s*[\r\n]\s*$/.test(parent.lastChild.textContent)) {
parent.insertBefore(document.createTextNode("\n" + indent), parent.lastChild);
parent.insertBefore(child, parent.lastChild);
} else {
parent.appendChild(document.createTextNode("\n" + indent));
parent.appendChild(child);
parent.appendChild(document.createTextNode("\n" + indent.slice(0,-2)));
}
}
demo: http://jsbin.com/ilAsAki/28/edit
I've not put too much thought into it, so you might need to play with it, but it's a starting point at least.
Also, i've assumed an indentation of 2 spaces as that's what you seemed to be using.
Oh, and you'll obviously need to be careful when using this with a <pre> tag or anywhere the CSS is set to maintain the whitespace of the HTML.

You can use document.createTextNode() to add a string directly.
var ft = document.createElement('div');
document.body.appendChild(ft);
document.body.appendChild(document.createTextNode(' '));
var another = document.createElement('div');
document.body.appendChild(another);
console.log(document.body.innerHTML);

javascript get id of "commonAncestorContainer"

So I feel like I'm a bit out of my league here. But here's what I want to do.
Basically, I want to have a user select part of text within in a paragraph (which may contain many elemnts (i.e. <span> and <a>) to return the value of the id attribute of that paragraph. Here's what I thinking.
function getParaID() //function will be called using a mouseUp event
{
var selObj = window.getSelection();
var selRange = selObj.getRangeAt(0); //btw can anyone explain what this zero means
var paraElement = selRange.commonAncestorContainer;
var paraID = paraElement.getAttribute;
return paraID;
}
What do you think? Am I close?

The selection range's commonAncestorContainer property may be a reference to a text node, or a <span> or <a> element or <body> element, or whatever else you may have in your page. That being the case, you need to work up the DOM tree to find the containing <p> element, if one exists. You also need to be aware that IE < 9 does not support window.getSelection() or DOM Range, although it is possible to do what you want quite easily in IE < 9. Here's some code that will work in all major browsers, including IE 6:
jsFiddle: http://jsfiddle.net/44Juf/
Code:
function getContainingP(node) {
while (node) {
if (node.nodeType == 1 && node.tagName.toLowerCase() == "p") {
return node;
}
node = node.parentNode;
}
}
function getParaID() {
var p;
if (window.getSelection) {
var selObj = window.getSelection();
if (selObj.rangeCount > 0) {
var selRange = selObj.getRangeAt(0);
p = getContainingP(selRange.commonAncestorContainer);
}
} else if (document.selection && document.selection.type != "Control") {
p = getContainingP(document.selection.createRange().parentElement());
}
return p ? p.id : null;
}
Regarding the 0 passed to getRangeAt(), that is indicating which selected range you want. Firefox supports multiple selected ranges: if you make a selection and then hold down Ctrl and make another selection, you will see you now have two discontinous ranges selected, which can be accessed via getRangeAt(0) and getRangeAt(1). Also in Firefox, selecting a column of cells in a table creates a separate range for each selected cell. The number of selected ranges can be obtained using the rangeCount property of the selection. No other major browser supports multiple selected ranges.

You're quite close. If all you want is the id of the parent element, then you should replace your paraElement.getAttribute with paraElement.id, like:
var paraID = paraElement.id;
Regarding the parameter to getRangeAt(), it is specifying the index of the selection range to return, and it's only really relevant to controls that allow discontinuous selections. For instance, a select box in which the user can use ctrl + click to select several arbitrary groups of rows simultaneously. In such a case you could use the parameter to step from one selected region to the next. But for highlighting text within a paragraph you should never have a discontinuous selection and thus can always pass 0. In essence it means that you're asking for "the first selected region".
Also note that if your interface allows the user's selection to span multiple paragraphs then your commonAncestorContainer may not be a paragraph, it might also be whatever element it is that contains all of your paragraph tags. So you should be prepared to handle that case.
Edit:
After playing with this a bit, here is my suggestion: http://jsfiddle.net/vCsZH/
Basically, instead of relying on commonAncestorContainer this code applies a mouseDown and a mouseUp listener to each paragraph (in addition to the one already applied to the top-level container). The listeners will, in essence, record the paragraphs that the selection range starts and ends at, making it much simpler to reliably work out which paragraph(s) are selected.
If ever there was a case in favor of using dynamic event binding through a framework like jQuery, this is it.

Develop Reference

JavaScript is the programming language of the Web.