I'm working on a DOM traversal script and I'm almost finished with it. However, I've run into one problem that, for the life of me, I can't figure out how to fix. Pardon my ineptitude, as I'm brand new to JavaScript/jQuery and still learning the ropes.
Basically, I'm using JavaScript/jQuery to create an "outline" representing the structure of an HTML page and appending that outline to the bottom of the page. For example, if the HTML is this...
<html>
<head>
</head>
<body>
<h1>Hello World</h1>
<script src="http://code.jquery.com/jquery-2.1.0.min.js" type="text/javascript">
</script>
<script src="outline.js" type="text/javascript"></script>
</body>
</html>
Then the output should be an unordered list like this:
html
head
body
h1
text(Hello World)
script src("http://code.jquery.com/jquery-2.1.0.min.js") type("text/javascript")
script src("outline.js") type("text/javascript")
Here's what I've got so far:
var items=[];
$(document).ready(function(){
$("<ul id = 'list'></ul>").appendTo("body");
traverse(document, function (node) {
if(node.nodeName.indexOf("#") <= -1){
items.push("<ul>"+"<li>"+node.nodeName.toLowerCase());
}
else {
var x = "text("+node.nodeValue+")";
if(node.nodeValue == null) {
items.push("<li> document");
}
else if(/[a-z0-9]/i.test(node.nodeValue) && node.nodeValue != null) {
items.push("<ul><li>"+ x +"</ul>");
}
else {
items.push("</ul>");
}
}
});
$('#list').append(items.join(''));
});
function traverse(node, func) {
func(node);
node = node.firstChild;
while (node) {
traverse(node, func);
node = node.nextSibling;
}
}
It works almost perfectly, except it seems to read a carriage return as a text node. For example, if there's
<head><title>
it reads that properly, adding head as an unordered list element and then creating a new unordered list for title, nested inside head. HOWEVER, if it's
<head>
<title>
It makes the new unordered list and its element, "head", but then jumps to the else statement that does items.push("</ul>"). How do I get it to ignore the carriage return? I tried testing whether the nodeValue was equal to the carriage return, \r, but that didn't seem to do the trick.
I'm having a bit of a hard time understanding exactly which text nodes you want to skip. If you just want to skip a text node that is only whitespace, you can do that like this:
var onlyWhitespaceRegex = /^\s*$/;
traverse(document, function (node) {
if (node.nodeType === 3 && onlyWhitespaceRegex.test(node.nodeValue)) {
// skip text nodes that contain only whitespace
return;
}
else if (node.nodeName.indexOf("#") <= -1){
items.push("<ul>"+"<li>"+node.nodeName.toLowerCase());
} else ...
Or maybe you just want to collapse any run of leading or trailing whitespace in a text node to a single space before displaying it, since the extra whitespace won't display in HTML anyway.
var trimWhitespaceRegex = /^\s+|\s+$/g;
traverse(document, function (node) {
if(node.nodeName.indexOf("#") <= -1){
items.push("<ul>"+"<li>"+node.nodeName.toLowerCase());
} else {
var text = node.nodeValue;
if (node.nodeType === 3) {
text = text.replace(trimWhitespaceRegex, " ");
}
var x = "text("+text+")";
if(node.nodeValue == null) {
items.push("<li> document");
} ....
A further description of exactly what you're trying to achieve in the output for various forms of different text nodes would help us better understand your requirements.
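To tie this back to the outline itself: a minimal sketch, assuming you are free to restructure the code, is to build the nested list recursively so that every <ul> and <li> is closed by the same call that opened it, skipping whitespace-only text nodes along the way. The names buildOutline, describeNode and escapeHtml are mine, not from the question, and the sketch only relies on nodeType, nodeName, nodeValue, attributes and childNodes.

```javascript
// Escape text so it is safe to place inside the generated list markup.
function escapeHtml(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Label for one node: text(...) for text nodes,
// tag name plus attr("value") pairs for elements.
function describeNode(node) {
  if (node.nodeType === 3) {
    return "text(" + node.nodeValue.replace(/^\s+|\s+$/g, "") + ")";
  }
  var label = node.nodeName.toLowerCase();
  var attrs = node.attributes || [];
  for (var i = 0; i < attrs.length; i++) {
    label += " " + attrs[i].name + '("' + attrs[i].value + '")';
  }
  return label;
}

// Returns "<li>label<ul>children...</ul></li>" for elements and
// non-blank text nodes, and "" for everything else.
function buildOutline(node) {
  var isElement = node.nodeType === 1;
  var isText = node.nodeType === 3;
  if (!isElement && !isText) return "";
  if (isText && /^\s*$/.test(node.nodeValue)) return ""; // line breaks, indentation
  var inner = "";
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    inner += buildOutline(children[i]);
  }
  return "<li>" + escapeHtml(describeNode(node)) +
    (inner ? "<ul>" + inner + "</ul>" : "") + "</li>";
}

// Browser usage (build the markup first, so the new list does not outline itself):
//   var markup = buildOutline(document.documentElement);
//   $("<ul id='list'></ul>").append(markup).appendTo("body");
```

Because the whitespace check runs before anything is emitted, a line break between <head> and <title> simply produces no list item, and no stray </ul> is ever pushed.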
I am new to JavaScript but understand jQuery. I am trying to use this code to convert www. and http URLs in p tags to working links.
Here is the code I am using, the problem is that I do not fully understand how the code works, could anybody please explain?
<script>
var re = /(http:\/\/[^ ]+)/g;
function createLinks(els) {
$(els).contents().each(function () {
if (this.nodeType === 1 && this.nodeName !== 'SCRIPT') {
createLinks(this);
} else if (this.nodeType === 3 && this.data.match(re)) {
var markup = this.data.replace(re, '<a href="$1">$1</a>');
$(this).replaceWith(markup);
}
});
}
createLinks(document.body);
</script>
First, you define a regular expression that matches text starting with "http://" and running up to the next space.
Second, you create a recursive function that traverses the whole HTML document.
nodeType == 1 means the current node is an element (a, p, div, etc.)
nodeType == 2 means the node is an attribute
nodeType == 3 means the node is a text node
So when you find an element, you search inside it;
when you find a text node, you check via the regular expression whether its text contains "http://", and if so you replace the matched text with a link to the matched URL.
In the end you call the function with body as the root.
ok, here goes...
//create a regular expression to format the link
var re = /(http:\/\/[^ ]+)/g;
//this is the create links function which gets called below, "els" is the elements passed to the function (document.body)
function createLinks(els) {
//for each of the elements in the body
$(els).contents().each(function () {
//check if it's an element but not a script (nodeName is upper case in HTML documents)
if (this.nodeType === 1 && this.nodeName !== 'SCRIPT') {
//call the create links function and send in this object
createLinks(this);
//if its not an element but is a text node and the format matches the regular expression
} else if (this.nodeType === 3 && this.data.match(re)) {
//create the markup
var markup = this.data.replace(re, '<a href="$1">$1</a>');
//finally, replace this link with the marked up link
$(this).replaceWith(markup);
}
});
}
//call the create links function
createLinks(document.body);
I hope the commented code helps you understand.
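Since the question also mentions www. addresses, here is a rough sketch of how the replacement step could be pulled out into a plain string function. It assumes the goal is to wrap each URL in an anchor; linkify and escapeHtml are names I made up, and the pattern is deliberately simple (for instance, it will swallow a trailing full stop).

```javascript
// Matches http://, https:// or www. followed by anything up to whitespace or "<".
var urlPattern = /\b(https?:\/\/|www\.)[^\s<]+/g;

// Escape text for use in element content and in double-quoted attributes.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn the raw data of one text node into HTML with clickable links.
function linkify(text) {
  var out = "";
  var last = 0;
  var m;
  urlPattern.lastIndex = 0; // global regexes keep state between calls
  while ((m = urlPattern.exec(text)) !== null) {
    var url = m[0];
    // Bare www. addresses need a scheme to work as an href
    var href = m[1] === "www." ? "http://" + url : url;
    out += escapeHtml(text.slice(last, m.index)) +
      '<a href="' + escapeHtml(href) + '">' + escapeHtml(url) + "</a>";
    last = m.index + url.length;
  }
  return out + escapeHtml(text.slice(last));
}

// Inside the text-node branch of createLinks this would be used as:
//   $(this).replaceWith(linkify(this.data));
```

Escaping the surrounding text matters here because replaceWith parses the string as HTML, so a literal "<" in the original text node would otherwise be read as the start of a tag.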
I have the following problem. Once I add a blockquote in contenteditable, by pressing Enter key it moves to a new line and adds another blockquote element. It goes on forever, and I can’t escape the formatting. The desired functionality would be that of the unordered list. When you press the Enter key it adds a new empty <li> element, but if you press Enter again, it escapes the formatting, removes the previously created <li> and adds a <p>.
Check out the demo: http://jsfiddle.net/wa9pM/
One hack I found was to create an empty <p> under the blockquote, before you create a blockquote. But is there a way to break this formatting behaviour with JavaScript? No idea how I would check: if where the cursor is, it’s the end of the line and if it’s a blockquote and on Enter key press, don’t add a new blockquote.
I’m using this code to generate a blockquote in JS:
document.execCommand('formatBlock', false, 'blockquote');
While creating a rich text editor for an iOS application, I faced the same problem. Every time I inserted a <blockquote> tag in my text field and pressed Enter, it was impossible to get rid of the blockquote.
After researching a bit, I found a working solution.
Finding inner HTML tags:
function whichTag(tagName){
var sel, containerNode;
var tagFound = false;
tagName = tagName.toUpperCase();
if (window.getSelection) {
sel = window.getSelection();
if (sel.rangeCount > 0) {
containerNode = sel.getRangeAt(0).commonAncestorContainer;
}
}else if( (sel = document.selection) && sel.type != "Control" ) {
containerNode = sel.createRange().parentElement();
}
while (containerNode) {
if (containerNode.nodeType == 1 && containerNode.tagName == tagName) {
tagFound = true;
containerNode = null;
}else{
containerNode = containerNode.parentNode;
}
}
return tagFound;
}
Checking for occurrences of the block-quote tag:
function checkBlockquote(){
var input = document.getElementById('text_field_id');
input.onkeydown = function(event) {
event = event || window.event; // old IE exposes the event only as window.event
var key = event.keyCode || event.charCode;
if( key == 13){
if (whichTag("blockquote")){
document.execCommand('InsertParagraph');
document.execCommand('Outdent');
}
}
};
}
Triggering the key down events:
<body onLoad="checkBlockquote();">
<!-- stuff... -->
</body>
I believe the code above can be adjusted to fit your needs easily. If you need further help, feel free to ask.
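For what it's worth, the ancestor walk in whichTag can be separated from the selection handling, which makes it easier to reuse. This is only a sketch under my own naming (isInsideTag and escapeBlockquoteOnEnter are hypothetical), it assumes a browser that supports window.getSelection and event.key, and execCommand behaviour varies between browsers.

```javascript
// Walk up from a node and report whether any ancestor (or the node itself)
// is an element with the given tag name.
function isInsideTag(node, tagName) {
  var wanted = tagName.toUpperCase();
  for (var n = node; n; n = n.parentNode) {
    if (n.nodeType === 1 && n.tagName === wanted) {
      return true;
    }
  }
  return false;
}

// Browser-only wiring, mirroring checkBlockquote() above.
function escapeBlockquoteOnEnter(editable) {
  editable.addEventListener("keydown", function (event) {
    if (event.key !== "Enter") return;
    var sel = window.getSelection();
    if (sel.rangeCount === 0) return;
    var container = sel.getRangeAt(0).commonAncestorContainer;
    if (isInsideTag(container, "blockquote")) {
      document.execCommand("insertParagraph");
      document.execCommand("outdent");
    }
  });
}

// Usage in a page:
//   escapeBlockquoteOnEnter(document.getElementById("text_field_id"));
```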
Something like this did the work for me (at least on Chrome and Safari).
Demo at http://jsfiddle.net/XLPrw/
$("[contenteditable]").on("keypress", function(e) {
var range = window.getSelection().getRangeAt(0);
var element = range.commonAncestorContainer;
if(element.nodeName == "BLOCKQUOTE") {
element.parentElement.removeChild(element);
}
});
I didn't test this extensively, but it looks like range.commonAncestorContainer returns the current text node when the blockquote contains text, or the blockquote element itself when it contains no text node (on Chrome, a <br> is added and the caret is positioned after it). You can remove the newly created blockquote in this case. After deleting the element, the caret appears to be positioned somewhere else on the contenteditable, although typing confirms that it's right after the original blockquote.
Hope this points you to a more conclusive solution.
Super late answer, but this was a much simpler solution for me. Hopefully it helps anyone else looking. Browser compatibility may vary.
YOUR_EDITABLE_ELEMENT.addEventListener('keyup', e => {
if ((e.which || e.keyCode) === 13) {
if (document.queryCommandValue('formatBlock') === 'blockquote') {
document.execCommand('formatBlock', false, '<P>')
}
}
})
How can I (efficiently - not slowing the computer [cpu]) highlight a specific part of a page?
Let's say that my page looks like this:
<html>
<head>
</head>
<body>
"My generic words would be selected here" !.
<script>
//highlight code here
var textToHighlight = 'selected here" !';
//what should I write here?
</script>
</body>
</html>
My idea is to "clone" all the body into a variable and find via indexOf the specified text, change(insert a span with a background-color) the "cloned" string and replace the "real" body with the "cloned" one.
I just think that it isn't efficient.
Do you have any other ideas? (be creative :) )
I've adapted the following from my answers to several similar questions on SO (example). It's designed to be reusable and has proved to be so. It traverses the DOM within a container node you specify, searching each text node for the specified text and using DOM methods to split the text node and surround the relevant chunk of text in a styled <span> element.
Demo: http://jsfiddle.net/HqjZa/
Code:
// Reusable generic function
function surroundInElement(el, regex, surrounderCreateFunc) {
// script and style elements are left alone
if (!/^(script|style)$/i.test(el.tagName)) {
var child = el.lastChild;
while (child) {
if (child.nodeType == 1) {
surroundInElement(child, regex, surrounderCreateFunc);
} else if (child.nodeType == 3) {
surroundMatchingText(child, regex, surrounderCreateFunc);
}
child = child.previousSibling;
}
}
}
// Reusable generic function
function surroundMatchingText(textNode, regex, surrounderCreateFunc) {
var parent = textNode.parentNode;
var result, surroundingNode, matchedTextNode, matchLength, matchedText;
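// A global regex remembers lastIndex between calls, so reset it before searching each new text node
while ( textNode && ((regex.lastIndex = 0), (result = regex.exec(textNode.data))) ) {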
matchedTextNode = textNode.splitText(result.index);
matchedText = result[0];
matchLength = matchedText.length;
textNode = (matchedTextNode.length > matchLength) ?
matchedTextNode.splitText(matchLength) : null;
surroundingNode = surrounderCreateFunc(matchedTextNode.cloneNode(true));
parent.insertBefore(surroundingNode, matchedTextNode);
parent.removeChild(matchedTextNode);
}
}
// This function does the surrounding for every matched piece of text
// and can be customized to do what you like
function createSpan(matchedTextNode) {
var el = document.createElement("span");
el.style.backgroundColor = "yellow";
el.appendChild(matchedTextNode);
return el;
}
// The main function
function wrapText(container, text) {
surroundInElement(container, new RegExp(text, "g"), createSpan);
}
wrapText(document.body, "selected here");
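One thing to watch: wrapText passes the search text straight to new RegExp, so characters such as ?, ( or . are treated as pattern syntax. A small sketch of how the text could be escaped first, plus a string-only version of the splitting step that is easy to try outside a browser. Both escapeRegExp and splitOnMatches are illustrative names of mine, not part of the code above.

```javascript
// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split a string into matched and unmatched pieces, in order.
// This mirrors what splitText() does to a text node, but on plain strings.
function splitOnMatches(data, regex) {
  if (!regex.global) {
    // Without the g flag exec() would return the first match forever
    regex = new RegExp(regex.source, regex.flags + "g");
  }
  var parts = [];
  var last = 0;
  var m;
  regex.lastIndex = 0;
  while ((m = regex.exec(data)) !== null) {
    if (m[0] === "") {
      regex.lastIndex++; // step over empty matches to avoid looping forever
      continue;
    }
    if (m.index > last) {
      parts.push({ text: data.slice(last, m.index), match: false });
    }
    parts.push({ text: m[0], match: true });
    last = m.index + m[0].length;
  }
  if (last < data.length) {
    parts.push({ text: data.slice(last), match: false });
  }
  return parts;
}

// In wrapText the escaped form would be used like this:
//   surroundInElement(container, new RegExp(escapeRegExp(text), "g"), createSpan);
```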
<html>
<head>
</head>
<body>
<p id="myText">"My generic words would be selected here" !.</p>
<script>
//highlight code here
var textToHighlight = 'selected here" !';
var text = document.getElementById("myText").innerHTML
document.getElementById("myText").innerHTML = text.replace(textToHighlight, '<span style="color:red">'+textToHighlight+'</span>');
//what should I write here?
</script>
</body>
</html>
Use this in combination with this and you should be pretty ok. (It is almost better than trying to implement selection / selection-highlighting logic yourself.)
Container is a div I've added some basic HTML to.
The debug_log function is printing the following:
I'm in a span!
I'm in a div!
I'm in a
p
What happened to the rest of the text in the p tag ("aragraph tag!!")? I don't think I understand exactly how to walk through the document tree. I need a function that will parse the entire document tree and return all of the elements and their values. The code below is a first crack at just getting all of the values displayed.
container.innerHTML = '<span>I\'m in a span! </span><div> I\'m in a div! </div><p>I\'m in a <span>p</span>aragraph tag!!</p>';
DEMO.parse_dom(container);
DEMO.parse_dom = function(ele)
{
var child_arr = ele.childNodes;
for(var i = 0; i < child_arr.length; i++)
{
debug_log(child_arr[i].firstChild.nodeValue);
DEMO.parse_dom(child_arr[i]);
}
}
Generally when traversing the DOM, you want to specify a start point. From there, check if the start point has childNodes. If it does, loop through them and recurse the function if they too have childNodes.
Here's some code that outputs to the console using the DOM form of these nodes (I used the document/HTML element as a start point). You'll need to run an if against window.console if you're allowing non-developers to load this page/code and using console:
recurseDomChildren(document.documentElement, true);
function recurseDomChildren(start, output)
{
var nodes;
if(start.childNodes)
{
nodes = start.childNodes;
loopNodeChildren(nodes, output);
}
}
function loopNodeChildren(nodes, output)
{
var node;
for(var i=0;i<nodes.length;i++)
{
node = nodes[i];
if(output)
{
outputNode(node);
}
if(node.childNodes)
{
recurseDomChildren(node, output);
}
}
}
function outputNode(node)
{
var whitespace = /^\s+$/g;
if(node.nodeType === 1)
{
console.log("element: " + node.tagName);
}else if(node.nodeType === 3)
{
//clear whitespace text nodes
node.data = node.data.replace(whitespace, "");
if(node.data)
{
console.log("text: " + node.data);
}
}
}
Example: http://jsfiddle.net/ee5X6/
In
<p>I\'m in a <span>p</span>aragraph tag!!</p>
you request the first child, which is the text node containing "I'm in a ".
The text "aragraph tag!!" is the third child, which is not logged.
Curiously, the last line containing "p" should never occur, because the span element is not a direct child of container.
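To illustrate the point about firstChild: a minimal sketch that visits every child node in document order and collects the value of each non-blank text node, instead of looking only at the first child of each element. collectText is a name I picked, and the function only assumes nodeType, nodeValue and childNodes.

```javascript
// Depth-first walk that gathers the value of every non-blank text node.
function collectText(node, out) {
  out = out || [];
  if (node.nodeType === 3) {
    if (!/^\s*$/.test(node.nodeValue)) {
      out.push(node.nodeValue);
    }
    return out; // text nodes have no children
  }
  var kids = node.childNodes || [];
  for (var i = 0; i < kids.length; i++) {
    collectText(kids[i], out);
  }
  return out;
}

// Browser usage with the container from the question:
//   collectText(container).forEach(function (value) { debug_log(value); });
```

For the p element in the question this visits "I'm in a ", then "p" inside the span, then "aragraph tag!!", so nothing after the first child is lost.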
I'm not sure it is what you need or if it is possible in your environment but jQuery can accomplish something similar quite easily. Here is a quick jQuery example that might work.
<html>
<head>
<script src="INCLUDE JQUERY HERE">
</script>
</head>
<body>
<span>
<span>I\'m in a span! </span><div> I\'m in a div! </div><p>I\'m in a <span>p</span>aragraph tag!!</p>
</span>
<script>
function traverse(elem){
$(elem).children().each(function(i,e){
console.log($(e).text());
traverse($(e));
});
}
traverse($("body").children().first());
</script>
</body>
</html>
Which gives the following console output:
I\'m in a span!
I\'m in a div!
I\'m in a paragraph tag!!
p
I have a div set to contentEditable and styled with "white-space:pre" so it keeps things like linebreaks. In Safari, FF and IE, the div pretty much looks and works the same. All is well. What I want to do is extract the text from this div, but in such a way that will not lose the formatting -- specifically, the line breaks.
We are using jQuery, whose text() function basically does a pre-order DFS and glues together all the content in that branch of the DOM into a single lump. This loses the formatting.
I had a look at the html() function, but it seems that all three browsers do different things with the actual HTML that gets generated behind the scenes in my contentEditable div. Assuming I type this into my div:
1
2
3
These are the results:
Safari 4:
1
<div>2</div>
<div>3</div>
Firefox 3.6:
1
<br _moz_dirty="">
2
<br _moz_dirty="">
3
<br _moz_dirty="">
<br _moz_dirty="" type="_moz">
IE 8:
<P>1</P><P>2</P><P>3</P>
Ugh. Nothing very consistent here. The surprising thing is that MSIE looks the most sane! (Capitalized P tag and all)
The div will have dynamically set styling (font face, colour, size and alignment) which is done using CSS, so I'm not sure if I can use a pre tag (which was alluded to on some pages I found using Google).
Does anyone know of any JavaScript code and/or jQuery plugin or something that will extract text from a contentEditable div in such a way as to preserve linebreaks? I'd prefer not to reinvent a parsing wheel if I don't have to.
Update: I cribbed the getText function from jQuery 1.4.2 and modified it to extract the text with whitespace mostly intact (I only changed one line, where I add a newline):
function extractTextWithWhitespace( elems ) {
var ret = "", elem;
for ( var i = 0; elems[i]; i++ ) {
elem = elems[i];
// Get the text from text nodes and CDATA nodes
if ( elem.nodeType === 3 || elem.nodeType === 4 ) {
ret += elem.nodeValue + "\n";
// Traverse everything else, except comment nodes
} else if ( elem.nodeType !== 8 ) {
ret += extractTextWithWhitespace( elem.childNodes );
}
}
return ret;
}
I call this function and use its output to assign it to an XML node with jQuery, something like:
var extractedText = extractTextWithWhitespace($(this));
var $someXmlNode = $('<someXmlNode/>');
$someXmlNode.text(extractedText);
The resulting XML is eventually sent to a server via an AJAX call.
This works well in Safari and Firefox.
On IE, only the first '\n' seems to get retained somehow. Looking into it more, it looks like jQuery is setting the text like so (line 4004 of jQuery-1.4.2.js):
return this.empty().append( (this[0] && this[0].ownerDocument || document).createTextNode( text ) );
Reading up on createTextNode, it appears that IE's implementation may mash up the whitespace. Is this true or am I doing something wrong?
Unfortunately you do still have to handle this for the pre case individually per-browser (I don't condone browser detection in many cases, use feature detection...but in this case it's necessary), but luckily you can take care of them all pretty concisely, like this:
var ce = $("<pre />").html($("#edit").html());
if($.browser.webkit)
ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });
if($.browser.msie)
ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
if($.browser.mozilla || $.browser.opera ||$.browser.msie )
ce.find("br").replaceWith("\n");
var textWithWhiteSpaceIntact = ce.text();
You can test it out here. IE in particular is a hassle because of the way it handles &nbsp; and new lines in text conversion; that's why it gets the <br> treatment above to make it consistent, so it needs 2 passes to be handled correctly.
In the above #edit is the ID of the contentEditable component, so just change that out, or make this a function, for example:
function getContentEditableText(id) {
var ce = $("<pre />").html($("#" + id).html());
if ($.browser.webkit)
ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });
if ($.browser.msie)
ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
if ($.browser.mozilla || $.browser.opera || $.browser.msie)
ce.find("br").replaceWith("\n");
return ce.text();
}
You can test that here. Or, since this is built on jQuery methods anyway, make it a plugin, like this:
$.fn.getPreText = function () {
var ce = $("<pre />").html(this.html());
if ($.browser.webkit)
ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });
if ($.browser.msie)
ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
if ($.browser.mozilla || $.browser.opera || $.browser.msie)
ce.find("br").replaceWith("\n");
return ce.text();
};
Then you can just call it with $("#edit").getPreText(), you can test that version here.
I forgot about this question until now, when Nico slapped a bounty on it.
I solved the problem by writing the function I needed myself, cribbing a function from the existing jQuery codebase and modifying it to work as I needed.
I've tested this function with Safari (WebKit), IE, Firefox and Opera. I didn't bother checking any other browsers since the whole contentEditable thing is non-standard. It is also possible that an update to any browser could break this function if they change how they implement contentEditable. So programmer beware.
function extractTextWithWhitespace(elems)
{
var lineBreakNodeName = "BR"; // Use <br> as a default
if ($.browser.webkit)
{
lineBreakNodeName = "DIV";
}
else if ($.browser.msie)
{
lineBreakNodeName = "P";
}
else if ($.browser.mozilla)
{
lineBreakNodeName = "BR";
}
else if ($.browser.opera)
{
lineBreakNodeName = "P";
}
var extractedText = extractTextWithWhitespaceWorker(elems, lineBreakNodeName);
return extractedText;
}
// Cribbed from jQuery 1.4.2 (getText) and modified to retain whitespace
function extractTextWithWhitespaceWorker(elems, lineBreakNodeName)
{
var ret = "";
var elem;
for (var i = 0; elems[i]; i++)
{
elem = elems[i];
if (elem.nodeType === 3 // text node
|| elem.nodeType === 4) // CDATA node
{
ret += elem.nodeValue;
}
if (elem.nodeName === lineBreakNodeName)
{
ret += "\n";
}
if (elem.nodeType !== 8) // comment node
{
ret += extractTextWithWhitespaceWorker(elem.childNodes, lineBreakNodeName);
}
}
return ret;
}
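As a possible alternative to picking one line-break element per browser, here is a sketch that treats <br> as a newline and <div>/<p> as line boundaries all at once, so the three markup variants shown in the question come out the same. This is an assumption-laden illustration rather than a tested cross-browser solution: collectLines and editableText are my own names, the block-tag list is a guess, and stripping trailing newlines also drops any blank lines typed at the very end.

```javascript
// Elements that start and end a line in contentEditable markup (assumed list).
var BLOCK_TAGS = /^(DIV|P)$/;

// Concatenate the text below a node, inserting "\n" for <br> and around blocks.
function collectLines(node) {
  var out = "";
  var kids = node.childNodes || [];
  for (var i = 0; i < kids.length; i++) {
    var kid = kids[i];
    if (kid.nodeType === 3 || kid.nodeType === 4) { // text or CDATA
      out += kid.nodeValue;
    } else if (kid.nodeType === 1 && kid.nodeName === "BR") {
      out += "\n";
    } else if (kid.nodeType === 1) {
      var inner = collectLines(kid);
      if (BLOCK_TAGS.test(kid.nodeName)) {
        // A block begins on a fresh line...
        if (out !== "" && out.charAt(out.length - 1) !== "\n") out += "\n";
        // ...and ends its own line
        if (inner.charAt(inner.length - 1) !== "\n") inner += "\n";
      }
      out += inner;
    }
    // comments and other node types are ignored
  }
  return out;
}

// Text of an editable element with line breaks kept and trailing newlines removed.
function editableText(root) {
  return collectLines(root).replace(/\n+$/, "");
}

// Browser usage:
//   var text = editableText(document.getElementById("edit"));
```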
see this fiddle
Or this post
How to parse editable DIV's text with browser compatibility
created after lot of effort...........
I discovered this today in Firefox:
I pass a contenteditable div whose white-space is set to "pre" to this function, and it works nicely.
I added a line to show how many nodes there are, and a button that puts the output into another PRE, just to prove that the linebreaks are intact.
It basically says this:
For each child node of the DIV,
if it contains the 'data' property,
add the data value to the output
otherwise
add an LF (or a CRLF for Windows)
and return the result.
There is an issue, tho. When you hit enter at the end of any line of the original text, instead of putting a LF in, it puts a "Â" in. You can hit enter again and it puts a LF in there, but not the first time. And you have to delete the "Â" (it looks like a space). Go figure - I guess that's a bug.
This doesn't occur in IE8. (change textContent to innerText) There is a different bug there, tho. When you hit enter, it splits the node into 2 nodes, as it does in Firefox, but the "data" property of each one of those nodes then becomes "undefined".
I'm sure there's much more going on here than meets the eye, so any input on the matter will be enlightening.
<!DOCTYPE html>
<html>
<HEAD>
<SCRIPT type="text/javascript">
function htmlToText(elem) {
var outText="";
for(var x=0; x<elem.childNodes.length; x++){
if(elem.childNodes[x].data){
outText+=elem.childNodes[x].data;
}else{
outText+="\n";
}
}
alert(elem.childNodes.length + " Nodes: \r\n\r\n" + outText);
return(outText);
}
</SCRIPT>
</HEAD>
<body>
<div style="white-space:pre;" contenteditable=true id=test>Text in a pre element
is displayed in a fixed-width
font, and it preserves
both spaces and
line breaks
</DIV>
<INPUT type=button value="submit" onclick="document.getElementById('test2').textContent=htmlToText(document.getElementById('test'))">
<PRE id=test2>
</PRE>
</body>
</html>
Here's a solution (using Underscore and jQuery) that seems to work in iOS Safari (iOS 7 and 8), Safari 8, Chrome 43, and Firefox 36 in OS X, and IE6-11 on Windows:
_.reduce($editable.contents(), function(text, node) {
return text + (node.nodeValue || '\n' +
(_.isString(node.textContent) ? node.textContent : node.innerHTML));
}, '')
See the test page here: http://brokendisk.com/code/contenteditable.html
That said, I think the real answer is that if you're not interested in the markup provided by the browser, you shouldn't be using the contenteditable attribute; a textarea would be the proper tool for the job.
this.editableVal = function(cont, opts)
{
if (!cont) return '';
var el = cont.firstChild;
var v = '';
var contTag = new RegExp('^(DIV|P|LI|OL|TR|TD|BLOCKQUOTE)$');
while (el) {
switch (el.nodeType) {
case 3:
var str = el.data.replace(/^\n|\n$/g, ' ').replace(/[\n\xa0]/g, ' ').replace(/[ ]+/g, ' ');
v += str;
break;
case 1:
var str = this.editableVal(el);
if (el.tagName && el.tagName.match(contTag) && str) {
if (str.substr(-1) != '\n') {
str += '\n';
}
var prev = el.previousSibling;
while (prev && prev.nodeType == 3 && /^\s*$/.test(prev.nodeValue)) {
prev = prev.previousSibling;
}
if (prev && !(prev.tagName && (prev.tagName.match(contTag) || prev.tagName == 'BR'))) {
str = '\n' + str;
}
}else if (el.tagName == 'BR') {
str += '\n';
}
v += str;
break;
}
el = el.nextSibling;
}
return v;
}