I want to build a Chrome app that finds all the strings that look like telephone numbers and replaces them with links. I want this to happen only for text elements, so it doesn't break the JavaScript of the websites the app runs on.
This is what I have so far:
var regex = /((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}(?!([^<]*>)|(((?!<a).)*<\/a>))/g;
var text = $("body:first").html();
text = text.replace(regex, "$&");
$("body:first").html(text);
but it breaks if there is JavaScript on the page.
Yes, your code just retrieves the markup representation of the current state of the DOM, and overwrites that, losing all event bindings and a lot of other significant data.
What you'll need to do is iterate through all the text nodes. You can't reach text nodes with Sizzle selectors alone, so you'll need to rely on jQuery's contents() function.
You could do something like this to get all the text nodes:
var allTextNodes = $('*').contents().filter(function() {
    return this.nodeType == Node.TEXT_NODE;
});
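Once you are working on text node values instead of markup, the matching step gets simpler, because the lookaheads in the question's regex (which only exist to avoid tags and existing links) are no longer needed. Here is a minimal sketch of that step. The names PHONE_RE, SKIP_PARENTS, shouldProcess and findPhoneNumbers are made up for illustration, and the list of parents to skip is an assumption: $('*').contents() also returns the text inside script and style elements, which you most likely want to leave alone.

```javascript
// Same phone pattern as in the question, minus the lookaheads that were
// only there to avoid matching inside tags and existing links.
var PHONE_RE = /(?:\(\d{3}\) ?|\d{3}-)?\d{3}-\d{4}/g;

// Hypothetical list of parent elements whose text should not be touched.
var SKIP_PARENTS = { SCRIPT: true, STYLE: true, TEXTAREA: true, A: true };

// True if text directly inside an element with this tag name may be rewritten.
function shouldProcess(parentTagName) {
  return SKIP_PARENTS[String(parentTagName).toUpperCase()] !== true;
}

// Returns [{ index, text }] for every phone-like string found in `text`.
function findPhoneNumbers(text) {
  var found = [], m;
  PHONE_RE.lastIndex = 0; // the regex is global, so reset its state first
  while ((m = PHONE_RE.exec(text)) !== null) {
    found.push({ index: m.index, text: m[0] });
  }
  return found;
}
```

In the filter callback above, the check could then become this.nodeType == Node.TEXT_NODE && shouldProcess(this.parentNode.nodeName), and findPhoneNumbers(this.nodeValue) tells you where to split each node.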
Can the JavaScript command .replace replace text in any webpage? I want to create a Chrome extension that replaces specific words in any webpage to say something else (example cake instead of pie).
The .replace method is a string operation, so it's not immediately simple to run the operation on HTML documents, which are composed of DOM Node objects.
Use TreeWalker API
The best way to go through every node in a DOM and replace text in it is to use the document.createTreeWalker method to create a TreeWalker object. This is a practice that is used in a number of Chrome extensions!
// create a TreeWalker of all text nodes
var allTextNodes = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT),
    // some temp references for performance
    tmptxt,
    tmpnode,
    // compile the RE and cache the replace string, for performance
    cakeRE = /cake/g,
    replaceValue = "pie";
// iterate through all text nodes
while (allTextNodes.nextNode()) {
    tmpnode = allTextNodes.currentNode;
    tmptxt = tmpnode.nodeValue;
    tmpnode.nodeValue = tmptxt.replace(cakeRE, replaceValue);
}
To replace parts of the text with another element, or to add an element in the middle of the text, use the DOM splitText, createElement, and insertBefore methods.
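A rough sketch of how those three methods fit together is below. The helper names wrapRange and wrapMatches are hypothetical, and the document is passed in as doc purely so the helpers have no hidden globals; treat it as a starting point rather than a finished implementation.

```javascript
// Wrap `length` characters of `textNode`, starting at `start`, in a new element.
// `doc` is normally `document`.
function wrapRange(doc, textNode, start, length, tagName) {
  var match = textNode.splitText(start);    // textNode keeps everything before `start`
  match.splitText(length);                  // `match` now holds only the matched text
  var el = doc.createElement(tagName);
  match.parentNode.insertBefore(el, match); // put the new element where the match is
  el.appendChild(match);                    // then move the matched text inside it
  return el;
}

// Wrap every regex match inside one text node. Matches are collected first
// and processed last-to-first, so earlier offsets stay valid after each split.
function wrapMatches(doc, textNode, regex, tagName) {
  var re = new RegExp(regex.source, "g" + (regex.ignoreCase ? "i" : ""));
  var hits = [], m;
  while ((m = re.exec(textNode.nodeValue)) !== null) {
    if (m[0].length === 0) { re.lastIndex++; continue; } // skip empty matches
    hits.push({ start: m.index, length: m[0].length });
  }
  var made = [];
  for (var i = hits.length - 1; i >= 0; i--) {
    made.unshift(wrapRange(doc, textNode, hits[i].start, hits[i].length, tagName));
  }
  return made; // the new elements, in document order
}
```

Combined with the TreeWalker loop, you would collect the text nodes into an array first and then call wrapMatches(document, node, /cake/, "b") on each, since splitting nodes while the walker is still running changes what it visits next.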
See also how to replace multiple strings with multiple other strings.
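For the multiple-strings case, one common approach is to build a single alternation regex and look each match up in a table. This is only a sketch: replaceMany and escapeRegExp are illustrative names, and sorting the keys longest-first is my own assumption about which match should win when one key is a prefix of another.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every key of `map` found in `text` with its value, in one pass.
function replaceMany(text, map) {
  var keys = Object.keys(map).sort(function (a, b) { return b.length - a.length; });
  if (keys.length === 0) return text;
  var re = new RegExp(keys.map(escapeRegExp).join("|"), "g");
  // A single pass means a replacement is never itself replaced again.
  return text.replace(re, function (match) { return map[match]; });
}
```

Because it is one pass, swapping two words works: replaceMany("cake and pie", { cake: "pie", pie: "cake" }) gives "pie and cake", which chained .replace calls would not. Inside the TreeWalker loop you would call it on tmpnode.nodeValue.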
Don't use innerHTML or innerText or jQuery .html()
// the innerHTML property of any DOM node is a string
document.body.innerHTML = document.body.innerHTML.replace(/cake/g,'pie')
It's generally slower (especially on mobile devices).
It effectively removes and replaces the entire DOM, which is not awesome and could have some side effects: it destroys all event listeners attached in JavaScript code (via addEventListener or .onxxxx properties) thus breaking the functionality partially/completely.
This is, however, a common, quick, and very dirty way to do it.
Ok, so the createTreeWalker method is the RIGHT way of doing this and it's a good way. I unfortunately needed to do this to support IE8 which does not support document.createTreeWalker. Sad Ian is sad.
If you want to do this with a .replace on the page text using a non-standard innerHTML call like a naughty child, you need to be careful because it WILL replace text inside a tag, leading to XSS vulnerabilities and general destruction of your page.
What you need to do is only replace text OUTSIDE of tags, which I matched with:
var search_re = new RegExp("(?:>[^<]*)(" + stringToReplace + ")(?:[^>]*<)", "gi");
gross, isn't it. you may want to mitigate any slowness by replacing some results and then sticking the rest in a setTimeout call like so:
// replace some chunk of stuff, the first section of your page works nicely
// if you happen to have that organization
//
setTimeout(function() { /* replace the rest */ }, 10);
which will return immediately after replacing the first chunk, letting your page continue with its happy life. for your replace calls, you're also going to want to replace large chunks in a temp string
var tmp = element.innerHTML.replace(search_re, whatever);
/* more replace calls, maybe this is in a for loop, i don't know what you're doing */
element.innerHTML = tmp;
so as to minimize reflows (when the page recalculates positioning and re-renders everything). for large pages, this can be slow unless you're careful, hence the optimization pointers. again, don't do this unless you absolutely need to. use the createTreeWalker method zetlen has kindly posted above.
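Two things to watch with the regex above: stringToReplace goes into the pattern unescaped, and the match includes the text around the search string, so a plain replace would swallow that too. If you do have to work on an HTML string, a slightly different technique (not the one above) is to match tags and the search string with one alternation and only rewrite the non-tag matches. A sketch, with hypothetical helper names, that still does not understand comments, script or style content, or a ">" inside an attribute value:

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace `needle` with `replacement` in an HTML string, leaving tags untouched.
function replaceOutsideTags(html, needle, replacement) {
  if (!needle) return html; // an empty needle would match everywhere
  var re = new RegExp("(<[^>]*>)|" + escapeRegExp(needle), "gi");
  return html.replace(re, function (match, tag) {
    // If the first group matched, this is a tag: return it unchanged.
    return tag ? match : replacement;
  });
}
```

For example, replaceOutsideTags('<a href="pie.html">pie and pie</a>', "pie", "cake") leaves the href alone and returns '<a href="pie.html">cake and cake</a>'.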
Have you tried something like this?
$('body').html($('body').html().replace('pie','cake'));
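One detail about the line above: when the first argument to .replace is a string, only the first occurrence is replaced. A global regex is needed to replace them all. The snippet below just illustrates that difference on a plain string (the sample markup is made up); it does not change the fact that writing the result back with .html() rebuilds the DOM.

```javascript
var html = "<p>pie, pie and more pie</p>";

// String pattern: only the first occurrence is replaced.
var firstOnly = html.replace("pie", "cake");

// Global regex: every occurrence is replaced.
var every = html.replace(/pie/g, "cake");
```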
I have a webapp that must allow users to interactively manipulate strings (words, phrases and so on...)
Example:
given a foobar string, if the user clicks on b the string is split in two and a whitespace is added, resulting in foo bar.
I could put each single character inside a span element, but I fear this would be troublesome for long strings.
Any advice?
This version using jQuery (not necessary) should pretty much do what you need if I understood you correctly:
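// Given a textarea with the content
$('textarea').click(function() {
    // Read the current value on each click so earlier edits are kept
    var chars = this.value.split('');
    // Insert a space at the caret position
    chars.splice(this.selectionStart, 0, " ");
    this.value = chars.join('');
});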
It's a very simple and not cross browser enabled example, but it should get you started.
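The string manipulation itself can be pulled out into a tiny function, which makes it easy to check against the foobar example from the question. insertSpaceAt is just an illustrative name, and index stands for the caret position (for instance selectionStart).

```javascript
// Return `str` with a single space inserted before the character at `index`.
function insertSpaceAt(str, index) {
  return str.slice(0, index) + " " + str.slice(index);
}
```

Clicking on the b in foobar corresponds to index 3, so insertSpaceAt("foobar", 3) gives "foo bar".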
Yes, that will be OK, but set up your event handler on the whole container rather than on the individual spans, and then see here: http://en.wikipedia.org/wiki/Flyweight_pattern
I need to get the name of the tag "myChild" and the "content".
This is simple, but I am stuck and sleepy, and here is what I get with my tests:
XML:
...
<myParent>
<myChild>content</myChild>
</myParent>
<myParent>
<myChild>content</myChild>
</myParent>
...
JS:
var x=xmlDoc.getElementsByTagName("myParent");
alert(x[1].childNodes[0].nodeName); //returns "#text" - "myChild" needed
alert(x[1].childNodes[0].nodeValue); //returns "" - "content" needed
For elements, tagName and nodeName are the same, so nodeName is the right property; it is just being read from the wrong node.
The problem is that the first child of your myParent element isn't the myChild element, it's a text node (containing whitespace). Your structure looks like this:
Element "myParent"
    Text node with a carriage return and some spaces or tabs
    Element "myChild"
        Text node with "content"
    Text node with a carriage return and some spaces or tabs
Element "myParent"
    Text node with a carriage return and some spaces or tabs
    Element "myChild"
        Text node with "content"
    Text node with a carriage return and some spaces or tabs
You need to navigate down to the actual myChild element, which you can do with getElementsByTagName again, or just by scanning:
var x=xmlDoc.getElementsByTagName("myParent");
var c = x[1].firstChild;
while (c && c.nodeType != 1) { // 1 = ELEMENT_NODE
    c = c.nextSibling;
}
alert(c.nodeName); // "myChild"
Note that Elements don't have a meaningful nodeValue property; instead, you collect their child text nodes. (More in the DOM specs: DOM2, DOM3.)
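Collecting the child text nodes can look something like the sketch below. collectText is a made-up helper name; it only relies on nodeType, nodeValue and childNodes, so it should work on XML documents as well. Where the textContent property is available it gives you the same result without a helper.

```javascript
// Concatenate the values of all text (3) and CDATA (4) nodes under `node`.
function collectText(node) {
  if (node.nodeType === 3 || node.nodeType === 4) {
    return node.nodeValue;
  }
  var out = "";
  var kids = node.childNodes || [];
  for (var i = 0; i < kids.length; i++) {
    out += collectText(kids[i]); // recurse into nested elements
  }
  return out;
}
```

With c pointing at the myChild element found above, alert(collectText(c)) would show "content".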
Also note that when indexing into a NodeList, the indexes start at 0. You seem to have started with 1; ignore this comment if you were skipping the first one for a reason.
Off-topic: It's always best to understand the underlying mechanics of what you're working with, and I do recommend playing around with the straight DOM and referring to the DOM specs listed above. But for interacting with these trees, a good library can be really useful and save you a lot of time. jQuery works well with XML data. I haven't used any of the others like Prototype, YUI, Closure, or any of several others with XML, so can't speak to that, but I expect at least some of them support it.
Try x[1].getElementsByTagName('*')[0] instead.
(This is only reliable for index 0; other indexes may return elements that are not direct children, if the direct children contain further element nodes.)
My problem
I want to clean HTML pasted in a rich text editor (FCK 1.6 at the moment). The cleaning should be based on a whitelist of tags (and perhaps another with attributes). This is not primarily in order to prevent XSS, but to remove ugly HTML.
Currently I see no way to do it on the server, so I guess it must be done in JavaScript.
Current ideas
I found the jquery-clean plugin, but as far as I can see, it is using regexes to do the work, and we know that is not safe.
As I've not found any other JS-based solution, I've started to implement one myself using jQuery. It would work by creating a jQuery version of the pasted HTML ($(pastedHtml)) and then traversing the resulting tree, removing each element that doesn't match the whitelist by looking at its tagName property.
My questions
Is this any better?
Can I trust jQuery to represent the pasted content well (there may be unmatched ending tags and what-have-you)?
Is there a better solution already that I couldn't find?
Update
This is my current, jQuery-based, solution (verbose and not extensively tested):
function clean(element, whitelist, replacerTagName) {
    // Use div if no replace tag was specified
    replacerTagName = replacerTagName || "div";
    // Accept anything that jQuery accepts
    var jq = $(element);
    // Create a copy of the current element, but without its children
    var clone = jq.clone();
    clone.children().remove();
    // Wrap the copy in a dummy parent to be able to search with jQuery selectors
    // 1)
    var wrapper = $('<div/>').append(clone);
    // Check if the element is not on the whitelist by searching with the 'not' selector
    var invalidElement = wrapper.find(':not(' + whitelist + ')');
    // If the element wasn't on the whitelist, replace it.
    if (invalidElement.length > 0) {
        var el = $('<' + replacerTagName + '/>');
        el.text(invalidElement.text());
        invalidElement.replaceWith(el);
    }
    // Extract the (maybe replaced) element
    var cleanElement = $(wrapper.children().first());
    // Recursively clean the children of the original element and
    // append them to the cleaned element
    var children = jq.children();
    if (children.length > 0) {
        children.each(function(_index, thechild) {
            var cleaned = clean(thechild, whitelist, replacerTagName);
            cleanElement.append(cleaned);
        });
    }
    return cleanElement;
}
I am wondering about some points (see comments in the code);
Do I really need to wrap my element in a dummy parent to be able to match it with jQuery's ":not"?
Is this the recommended way to create a new node?
If you leverage the browser's HTML-correcting abilities (e.g. you copy the rich text to the innerHTML of an empty div and take the resulting DOM tree), the HTML will be guaranteed to be valid (the way it will be corrected is somewhat browser-dependent). This is probably already done by the rich text editor anyway.
jQuery's own text-to-DOM transform is probably also safe, but definitely slower, so I would avoid it.
Using a whitelist based on the jQuery selector engine might be somewhat tricky because removing an element while preserving its children might make the document invalid, so the browser would correct it by changing the DOM tree, which might confuse a script trying to iterate through invalid elements. (E.g. you allow ul and li but not ol; the script removes the list root element, naked li elements are invalid so the browser wraps them in ul again, that ul will be missed by the cleaning script.) If you throw away unwanted elements together with all their children, I don't see any problems with that.
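The "throw away unwanted elements together with all their children" variant can be sketched with plain DOM calls, as below. dropDisallowed and the allowed lookup object are hypothetical names; the object is assumed to map upper-case tag names to true, and non-element nodes (text, comments) are simply left alone in this sketch.

```javascript
// Remove every descendant element of `root` whose tag is not in `allowed`,
// together with its whole subtree. Returns the number of removed subtrees.
function dropDisallowed(root, allowed) {
  var removed = 0;
  // Take a snapshot: the live childNodes list changes while we remove nodes.
  var kids = Array.prototype.slice.call(root.childNodes);
  for (var i = 0; i < kids.length; i++) {
    var child = kids[i];
    if (child.nodeType !== 1) continue; // not an element, leave it alone
    if (allowed[child.tagName.toUpperCase()] === true) {
      removed += dropDisallowed(child, allowed); // allowed: check its children
    } else {
      root.removeChild(child); // not allowed: drop it and everything inside
      removed++;
    }
  }
  return removed;
}
```

In the browser you would run it on the div whose innerHTML you set to the pasted markup, for example dropDisallowed(div, { P: true, UL: true, LI: true }), and then read div.innerHTML back.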
I need to construct an xpath string to select all descendants of a certain table with these conditions:
The table is a descendant of a form with a specific action attribute value.
The selected descendants are text nodes.
The text node content can only contain whitespace.
It'll probably look something like:
//form[@action = "submit.html"]//table//text()[ ...? ]
Any tips would be appreciated. Thanks.
Edit: Here is my previous working compromise:
function KillTextNodes(rootpath)
{
    XPathIterate(rootpath + '//text()', function(node)
    {
        var tagname = node.parentNode.tagName;
        if (tagname != 'OPTION' && tagname != 'TH')
            Kill(node);
    });
}
Here is my function based on the accepted answer:
function KillTextNodes(rootpath)
{
    XPathIterate(rootpath + '//text()[not(normalize-space())]', function(node) { Kill(node); });
}
To explain my motivation a little - I'm iterating through the DOM with Javascript, and run into the same problem that many others do where unexpected empty text nodes throw off the results. This function helps me out a lot by simply deleting all of the empty text nodes so that my iteration logic can stay simple.
Use:
//form[@action = "submit.html"]//table//text()[not(normalize-space())]
This selects all text nodes that contain only whitespace and that are descendants of any table that is a descendant of any form having an action attribute with the value "submit.html".
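In a browser, an expression like this can be run through document.evaluate. The sketch below shows one way to do that; removeWhitespaceTextNodes is a made-up name and is not the asker's XPathIterate or Kill, whose code isn't shown. A snapshot result is used because it stays valid while nodes are being removed, whereas an iterator result is invalidated by changes to the document.

```javascript
// Remove every whitespace-only text node below the nodes selected by `rootPath`.
// `doc` is normally `document`. Returns the number of nodes removed.
function removeWhitespaceTextNodes(doc, rootPath) {
  var expr = rootPath + "//text()[not(normalize-space())]";
  // 7 is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  var snapshot = doc.evaluate(expr, doc, null, 7, null);
  for (var i = 0; i < snapshot.snapshotLength; i++) {
    var node = snapshot.snapshotItem(i);
    node.parentNode.removeChild(node);
  }
  return snapshot.snapshotLength;
}
```

Usage would be along the lines of removeWhitespaceTextNodes(document, '//form[@action = "submit.html"]//table').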
Text nodes containing whitespace only will be stripped from the document representation - i.e. there won't actually be a node. That means you can't access the text itself, but what you can do is match a parent lacking a text node using not() - something like:
//form[@action = "submit.html"]//table//*[not(text())]
Though in your case I would guess that will be far more aggressive than you actually intend. As an aside, be careful with these // matches, they're not very efficient and again very aggressive.
(I've just noticed this isn't an XSLT question! If you're in JS land have you considered using DOM methods to get your list?)