Is there an equivalent for string.find() in window objects? - javascript

I know about window.find(), a non-standard method that searches the page for a string and returns true if it is found and false if not.
Is there something similar to string.replace(), but on the window object (e.g. window.replace()), that would replace every occurrence of one string with another (for example, every "Hi" with "Hello")?

I don't think there is, but it's easier to write than you might suspect. You just walk the DOM looking for Text nodes and use replace on their nodeValue:
function replaceAll(element, regex, replacement) {
for (var child = element.firstChild;
child;
child = child.nextSibling) {
if (child.nodeType === 3) { // Text
child.nodeValue = child.nodeValue.replace(regex, replacement);
} else if (child.nodeType === 1) { // Element
replaceAll(child, regex, replacement);
}
}
}
There I used a regular expression (which needs to have the g flag) to get the "global" behavior when doing the replace, and for flexibility.
Live Example:
function replaceAll(element, regex, replacement) {
for (var child = element.firstChild;
child;
child = child.nextSibling) {
if (child.nodeType === 3) { // Text
child.nodeValue = child.nodeValue.replace(regex, replacement);
} else if (child.nodeType === 1) { // Element
replaceAll(child, regex, replacement);
}
}
}
setTimeout(function() {
replaceAll(document.body, /one/g, "two");
}, 800);
<div>
Here's one.
<p>And here's one.</p>
<p>And here's <strong>one</strong></p>
</div>
If you want to use a simple string instead of a regular expression, just use a regular expression escape function (such as the ones in the answers to this question) and build your regex like this:
var regex = new RegExp(yourEscapeFunction(simpleString), "g");
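As a minimal sketch of that step: the name escapeRegExp below is an assumption standing in for yourEscapeFunction, and the character class is the commonly used set of JavaScript regex metacharacters rather than anything taken from the answer itself.

```javascript
// Hypothetical stand-in for yourEscapeFunction: backslash-escape every
// character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var simpleString = "1.5 (approx)";
var regex = new RegExp(escapeRegExp(simpleString), "g");

// The "g" flag makes replace() act on every occurrence, not just the first.
var result = "1.5 (approx) and 1.5 (approx)".replace(regex, "two");
```

Without the escaping, the "." and the parentheses in simpleString would be read as regex syntax instead of literal characters.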
The case this doesn't handle is where the target string crosses text nodes, like this:
<span>ex<span>ample</span></span>
Using the function above looking for "example", you wouldn't find it. I leave it as an exercise for the reader to handle that case if desired... :-)
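One possible starting point for that exercise, sketched on plain strings rather than live nodes: join the text of the nodes, search the joined string, and map each hit back to a node index and offset. The function name and the shape of its return value are assumptions made for illustration; applying the result to the DOM (for example with splitText or a Range) is not shown.

```javascript
// `segments` stands in for the nodeValue of each text node, in document order.
function locateAcrossTextNodes(segments, needle) {
  var results = [];
  if (!needle) return results; // an empty needle would match everywhere

  var joined = segments.join("");
  var starts = []; // starts[i] = offset of segments[i] inside `joined`
  var total = 0;
  for (var i = 0; i < segments.length; i++) {
    starts.push(total);
    total += segments[i].length;
  }

  // Translate an offset in `joined` into { node, offset } within one segment.
  function locate(pos) {
    var n = 0;
    while (n + 1 < segments.length && starts[n + 1] <= pos) n++;
    return { node: n, offset: pos - starts[n] };
  }

  var from = 0, index;
  while ((index = joined.indexOf(needle, from)) !== -1) {
    var last = locate(index + needle.length - 1); // last matched character
    results.push({
      start: locate(index),
      end: { node: last.node, offset: last.offset + 1 } // exclusive end
    });
    from = index + needle.length;
  }
  return results;
}

// "example" starts in the first segment and ends in the second.
var hits = locateAcrossTextNodes(["ex", "ample", " text"], "example");
```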

Related

Filter by class names for changing text, using Javascript TreeWalker

Following that discussion, I'd like to understand how to use TreeWalker to change text on textNodes filtered by class names.
I'd like to replace all numbers with "x" on <p class="french"> only.
There's a partial solution here, but the final jQuery step is not practical for me. I'd like to understand why my solution below doesn't work.
myfilter = function(node){
if (node.className=="french")
return NodeFilter.FILTER_ACCEPT
else
return NodeFilter.FILTER_SKIP;
}
var walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT, myfilter, false);
while (walker.nextNode()) {
walker.currentNode.nodeValue.replace(/(\d)/gu, 'x');
}
<body>
<p>890008900089000</p>
<p class="french">890008900089000</p>
</body>
Your code
walks over all text nodes (NodeFilter.SHOW_TEXT)
where className=="french" and
replaces each digit in their value with "x"
There are several problems.
Text nodes have no class. You have to test their parent node's class.
Strings are immutable and you don't use the result of the replacement.
You actually want to replace all numbers with "x", not all digits. (?)
So change node.className to node.parentNode.className, \d to \d+, and assign the result of String#replace back to walker.currentNode.nodeValue:
myfilter = function(node){
if (node.parentNode.className=="french")
return NodeFilter.FILTER_ACCEPT
else
return NodeFilter.FILTER_SKIP;
}
var walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT, myfilter, false);
while (walker.nextNode()) {
walker.currentNode.nodeValue = walker.currentNode.nodeValue.replace(/\d+/g, 'x');
}
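The second and third problems can be seen on a plain string, independent of the TreeWalker. This is only an illustration using the sample value from the question:

```javascript
var value = "890008900089000";

// replace() returns a new string; the original is left untouched.
value.replace(/\d/g, "x");

var perDigit = value.replace(/\d/g, "x");   // one "x" per digit
var perNumber = value.replace(/\d+/g, "x"); // one "x" per run of digits
```

Here value is still "890008900089000", perDigit is fifteen "x" characters, and perNumber is a single "x".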

How to get numbers in elements' inner text by javascript's regex

I want to get the numbers in the inner text of an HTML document with a JavaScript regex so that I can replace them.
For example, in the code below I want to get 1,2,3,4,5,6,1,2,3,1,2,3, but not the 444 inside the div tag.
<body>
aaaa123aaa456
<div style="background: #444">aaaa123aaaa</div>
aaaa123aaa
</body>
What could be the regular expression?
Your best bet is to use innerText or textContent to get at the text without the tags and then just use the regex /\d/g to get the numbers.
function digitsInText(rootDomNode) {
var text = rootDomNode.textContent || rootDomNode.innerText;
return text.match(/\d/g) || [];
}
For example,
alert(digitsInText(document.body));
If your HTML is not in the DOM, you can try to strip the tags yourself (see "JavaScript: How to strip HTML tags from string?").
Since you need to do a replacement, I would still try to walk the DOM and operate on text nodes individually, but if that is out of the question, try
var HTML_TOKEN = /(?:[^<\d]|<(?!\/?[a-z]|!--))+|<!--[\s\S]*?-->|<\/?[a-z](?:[^">']|"[^"]*"|'[^']*')*>|(\d+)/gi;
function incrementAllNumbersInHtmlTextNodes(html) {
return html.replace(HTML_TOKEN, function (all, digits) {
if ("string" === typeof digits) {
return "" + (+digits + 1);
}
return all;
});
}
then
incrementAllNumbersInHtmlTextNodes(
'<b>123</b>Hello, World!<p>I <3 Ponies</p><div id=123>245</div>')
produces
'<b>124</b>Hello, World!<p>I <4 Ponies</p><div id=123>246</div>'
It will get confused around where special elements like <script> end and won't recognize digits that are entity encoded, but should work otherwise.
You don't necessarily need RegExp to get the text contents of an element excluding its descendant elements' — in fact I'd advise against it as RegExp matching for HTML is notoriously difficult — there are DOM solutions:
function getImmediateText(element){
var text = '';
// Text and elements are all DOM nodes. We can grab the lot of immediate descendants and cycle through them.
for(var i = 0, l = element.childNodes.length, node; i < l && (node = element.childNodes[i]); ++i){
// nodeType 3 is text
if(node.nodeType === 3){
text += node.nodeValue;
}
}
return text;
}
var bodyText = getImmediateText(document.getElementsByTagName('body')[0]);
So here there's a function that will return only the immediate text content as a string. Of course, you could then strip that for numbers with the RegExp using something like this:
var numberString = (bodyText.match(/\d+/g) || []).join(''); // match() returns null when nothing matches
Just to answer my old question:
It is possible to achieve it by lookahead.
/\d(?=[^<>]*(<|$))/g
to replace the numbers
html.replace(/\d(?=[^<>]*(<|$))/g, function($0) {
return map[$0]
});
the source of the answer https://www.drupal.org/node/619198#comment-5710052
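To see what the lookahead does, here it is applied to the sample markup from the question. The map used here is a hypothetical digit-to-letter table chosen only so the replacements are easy to spot; the original snippet does not say what map contains.

```javascript
var html = 'aaaa123aaa456\n' +
  '<div style="background: #444">aaaa123aaaa</div>\n' +
  'aaaa123aaa';

// A digit counts only if the next angle bracket after it is "<" (or there is
// none), which means the digit is not inside a tag.
var textDigit = /\d(?=[^<>]*(<|$))/g;

var found = html.match(textDigit);

// Hypothetical map: "0" -> "A", "1" -> "B", and so on.
var map = "ABCDEFGHIJ".split("");
var replaced = html.replace(textDigit, function ($0) {
  return map[$0];
});
```

found holds the twelve digits 1,2,3,4,5,6,1,2,3,1,2,3, and the 444 inside the style attribute is left alone in replaced.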

How can I remove all instances of a specific text-phrase?

In a situation that the body area of a webpage is the only accessible part, is there a way to remove all instances of a particular text-phrase (written in HTML) using inline JavaScript or another inline capable language?
This could be useful in many situations, such as people using a Tiny.cc/customurl and wanting to remove the portion stating "tiny.cc/"
If specifics are allowed, we're modifying a calendar plugin using Tiny.cc to create custom URLs (tiny.cc/customurl). The plugin shows the full URL by default so we'd like to strip the text "tiny.cc/" and keep the "customurl" portion in our code:
<div class="ews_cal_grid_custom_item_3">
<div class="ews_cal_grid_select_checkbox_clear" id="wGridTagChk" onclick="__doPostBack('wGridTagChk', 'tiny.cc/Baseball-JV');" > </div>
tiny.cc/Baseball-JV
</div>
The part we'd like to remove is the http://tiny.cc/ on the 3rd line by itself.
To do this without replacing all the HTML (which wrecks all event handlers) and to do it without recursion (which is generally faster), you can do this:
function removeText(top, txt) {
var node = top.firstChild, index;
while(node && node != top) {
// if text node, check for our text
if (node.nodeType == 3) {
// without using regular expressions (to avoid escaping regex chars),
// replace all copies of this text in this text node
while ((index = node.nodeValue.indexOf(txt)) != -1) {
node.nodeValue = node.nodeValue.substr(0, index) + node.nodeValue.substr(index + txt.length);
}
}
if (node.firstChild) {
// if it has a child node, traverse down into children
node = node.firstChild;
} else if (node.nextSibling) {
// if it has a sibling, go to the next sibling
node = node.nextSibling;
} else {
// go up the parent chain until we find a parent that has a nextSibling
// so we can keep going
while ((node = node.parentNode) != top) {
if (node.nextSibling) {
node = node.nextSibling;
break;
}
}
}
}
}
Working demo here: http://jsfiddle.net/jfriend00/2y9eH/
To do this on the entire document, you would just call:
removeText(document.body, "tiny.cc/");
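The per-node string handling can also be written with split and join, which likewise avoids regular expressions. This is an alternative sketch, not part of the answer's code, and removeAllCopies is a made-up name:

```javascript
// Remove every copy of `txt` from `value` without building a regex.
function removeAllCopies(value, txt) {
  if (txt === "") return value; // nothing to remove
  return value.split(txt).join("");
}

var cleaned = removeAllCopies("tiny.cc/Baseball-JV", "tiny.cc/");
var overlapping = removeAllCopies("aabb", "ab");
```

One difference from the indexOf loop: split and join make a single pass, so text that only becomes a match after a removal is left in place ("aabb" minus "ab" gives "ab" here, whereas the loop would keep going until nothing is left).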
As long as you can supply the data in string format, you can use Regular Expressions to do this for you.
You could parse the whole innerHTML of the body tag, if that is all that you can access. This is a slow and kinda-bad-practice method, but for explanation's sake:
document.body.innerHTML = document.body.innerHTML.replace(
/http:\/\/tiny\.cc\//gi, // The regular expression to search for (g = every match, i = ignore case)
""); // What to replace with (nothing).
The whole expression is contained within forward slashes, so any forward slashes inside the regexp need to be escaped with a backslash.
This goes for other characters that have special meaning in regexp, such as the period. A single period (.) denotes matching 'any' character. To match a period, it must be escaped (\.)
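A short illustration of why the period needs escaping; the look-alike URL is invented for the example:

```javascript
var unescaped = /http:\/\/tiny.cc\//; // "." matches any character
var escaped = /http:\/\/tiny\.cc\//;  // "\." matches only a literal period

var lookalike = "http://tinyXcc/";
var unescapedMatches = unescaped.test(lookalike); // matches the look-alike
var escapedMatches = escaped.test(lookalike);     // does not

// Built from a string, "/" needs no escaping, but the backslash before the
// period has to be doubled inside the string literal.
var fromString = new RegExp("http://tiny\\.cc/");
var realUrlMatches = fromString.test("http://tiny.cc/Baseball-JV");
```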
EDIT:
If you wish to keep the reference to the URL in the onclick, you can modify the regexp to not match when inside single quotes (as your example):
/([^']http:\/\/tiny\.cc\/[^'])/i
If you don't want to replace all the instances of that string in the HTML, then you'll have to recursively iterate over the node structure, for instance:
function textFilter(element, search, replacement) {
for (var i = 0; i < element.childNodes.length; i++) {
var child = element.childNodes[i];
var nodeType = child.nodeType;
if (nodeType == 1) { // element
textFilter(child, search, replacement);
} else if (nodeType == 3) { // text node
child.nodeValue = child.nodeValue.replace(search, replacement);
}
}
}
Then you just grab hold of the appropriate element, and call this function on it:
var el = document.getElementById('target');
textFilter(el, /http:\/\/tiny\.cc\//g, ""); // You could use a regex
textFilter(el, "Baseball", "Basketball"); // or just a simple string
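One thing to keep in mind with the simple-string form: String#replace with a string pattern changes only the first occurrence in each text node, while a regex with the g flag changes all of them. A small illustration with an invented sentence:

```javascript
var text = "Baseball vs. Baseball";

var firstOnly = text.replace("Baseball", "Basketball"); // string pattern
var every = text.replace(/Baseball/g, "Basketball");    // global regex
```

firstOnly is "Basketball vs. Baseball", whereas every is "Basketball vs. Basketball".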

Help write regex that will surround certain text with <strong> tags, only if the <strong> tag isn't present

I have several posts on a website; all these posts are chat conversations of this type:
AD: Hey!
BC: What's up?
AD: Nothing
BC: Okay
They're marked up as simple paragraphs surrounded by <p> tags.
Using the javascript replace function, I want all instances of "AD" in the beginning of a conversation (ie, all instances of "AD" at the starting of a line followed by a ":") to be surrounded by <strong> tags, but only if the instance isn't already surrounded by a <strong> tag.
What regex should I use to accomplish this? Am I trying to do what this advises against?
The code I'm using is like this:
var posts = document.getElementsByClassName('entry-content');
for (var i = 0; i < posts.length; i++) {
posts[i].innerHTML = posts[i].innerHTML.replace(/some regex here/,
'replaced content here');
}
If AD: is always at the start of a line then the following regex should work, using the m switch:
.replace(/^AD:/gm, "<strong>AD:</strong>");
You don't need to check for the existence of <strong> because ^ will match the start of the line and the regex will only match if the sequence of characters that follows the start of the line are AD:.
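A quick check of that claim on a plain string, assuming each chat line really does start on its own line of the string (innerHTML may not be laid out that way):

```javascript
var chat = "AD: Hey!\nBC: What's up?\nAD: Nothing\nBC: Okay";

var once = chat.replace(/^AD:/gm, "<strong>AD:</strong>");

// Running it again changes nothing: the wrapped lines now start with "<".
var twice = once.replace(/^AD:/gm, "<strong>AD:</strong>");
```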
You're not going against the "Don't use regex to parse HTML" advice because you're not parsing HTML, you're simply replacing a string with another string.
An alternative to regex would be to work with ranges, creating a range selecting the text and then using execCommand to make the text bold. However, I think this would be much more difficult and you would likely face differences in browser implementations. The regex way should be enough.
After seeing your comment, the following regex would work fine:
.replace(/<(p|br)>AD:/gm, "<$1><strong>AD:</strong>");
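For illustration, here is that expression applied to an invented fragment in which the last paragraph is already wrapped:

```javascript
var html = "<p>AD: Hey!<br>AD: Nothing</p><p><strong>AD:</strong> Done</p>";

// $1 re-inserts whichever tag (p or br) preceded "AD:".
var out = html.replace(/<(p|br)>AD:/gm, "<$1><strong>AD:</strong>");
```

The already wrapped "AD:" is skipped because it follows a <strong> tag rather than <p> or <br>. Tags with attributes (such as <p class="...">) would not be matched by this pattern.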
Wouldn't it be easier to set the class or style property of the found paragraph to font-weight: bold or a class that does roughly the same? That way you wouldn't have to worry about adding in tags, or searching for existing tags. Might perform better, too, if you don't have to do any string replaces.
If you really want to add the strong tags anyway, I'd suggest using DOM functions to find childNodes of your paragraph that are <strong>, and if you don't find one, add it and move the original (text) childNode of the paragraph into it.
Using regular expressions on the innerHTML isn't reliable and will potentially lead to problems. The correct way to do this is a tiresome process but is much more reliable.
E.g.
for (var i = 0, l = posts.length; i < l; i++) {
findAndReplaceInDOM(posts[i], /^AD:/g, function(match, node){
// Make sure current node does not have a <strong> as a parent
if (node.parentNode.nodeName.toLowerCase() === 'strong') {
return false;
}
// Create and return new <strong>
var s = document.createElement('strong');
s.appendChild(document.createTextNode(match[0]));
return s;
});
}
And the findAndReplaceInDOM function:
function findAndReplaceInDOM(node, regex, replaceFn) {
// Note: regex MUST have global flag
if (!regex || !regex.global || typeof replaceFn !== 'function') {
return;
}
var start, end, match, parent, leftNode,
rightNode, replacementNode, text,
d = document;
// Loop through all childNodes of "node"
if (node = node && node.firstChild) do {
if (node.nodeType === 1) {
// Regular element, recurse:
findAndReplaceInDOM(node, regex, replaceFn);
} else if (node.nodeType === 3) {
// Text node, introspect
parent = node.parentNode;
text = node.data;
regex.lastIndex = 0;
while (match = regex.exec(text)) {
replacementNode = replaceFn(match, node);
if (!replacementNode) {
continue;
}
end = regex.lastIndex;
start = end - match[0].length;
// Effectively split node up into three parts:
// leftSideOfReplacement + REPLACEMENT + rightSideOfReplacement
leftNode = d.createTextNode( text.substring(0, start) );
rightNode = d.createTextNode( text.substring(end) );
parent.insertBefore(leftNode, node);
parent.insertBefore(replacementNode, node);
parent.insertBefore(rightNode, node);
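// Remove original node from document, then keep scanning the remainder.
// (The removed node has no siblings, so continue from rightNode; note
// that anchors such as ^ now apply to the remaining text.)
parent.removeChild(node);
node = rightNode;
text = rightNode.data;
regex.lastIndex = 0;
}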
}
} while (node = node.nextSibling);
}
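The "split into parts" idea can be tried on a plain string without any DOM. The helper below is a hypothetical illustration (its name and return shape are assumptions); each entry marked match: true corresponds to a spot where the function above would insert a replacement node.

```javascript
// Split `text` into alternating non-matching and matching pieces.
function splitAroundMatches(text, regex) {
  if (!regex.global) throw new Error("regex must have the g flag");
  var parts = [], last = 0, match;
  regex.lastIndex = 0;
  while ((match = regex.exec(text))) {
    if (match[0] === "") { regex.lastIndex++; continue; } // skip empty matches
    if (match.index > last) {
      parts.push({ match: false, value: text.substring(last, match.index) });
    }
    parts.push({ match: true, value: match[0] });
    last = regex.lastIndex;
  }
  if (last < text.length) {
    parts.push({ match: false, value: text.substring(last) });
  }
  return parts;
}

var parts = splitAroundMatches("AD: Hey AD:", /AD:/g);
```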

How do I use Javascript to modify the content of a node?

I need to use Javascript to do three things:
Select all nodes with a class of "foo".
Find all words inside these nodes that begin with "*".
Surround those words with <span class="xyz"> ... </span>, where xyz is the word itself.
For example, the content:
<ul>
<li class="foo">
*abc def *ghi
</li>
<li class="bar">
abc *def *ghi
</li>
</ul>
would become
<ul>
<li class="foo">
<span class="abc">*abc</span> def <span class="ghi">*ghi</span>
</li>
<li class="bar">
abc *def *ghi <!-- Not part of a node with class "foo", so
</li> no changes made. -->
</ul>
How might I do this? (P.S. Solutions involving jQuery work too, but other than that I'd prefer not include any additional dependencies.)
No jQuery required:
UE_replacer = function (node) {
// just for performance, skip attribute and
// comment nodes (types 2 and 8, respectively)
if (node.nodeType == 2) return;
if (node.nodeType == 8) return;
// for text nodes (type 3), wrap words of the
// form *xyzzy with a span that has class xyzzy
if (node.nodeType == 3) {
// in the actual text, the nodeValue, change
// all strings ('g'=global) that start and end
// on a word boundary ('\b') where the first
// character is '*' and is followed by one or
// more ('+'=one or more) 'word' characters
// ('\w'=word character). save all the word
// characters (that's what parens do) so that
// they can be used in the replacement string
// ('$1'=re-use saved characters).
var text = node.nodeValue.replace(
/\b\*(\w+)\b/g,
'<span class="$1">*$1</span>' // <== Wrong! nodeValue is plain text, so this markup is shown literally
);
// set the new text back into the nodeValue
node.nodeValue = text;
return;
}
// for all other node types, call this function
// recursively on all its child nodes
for (var i=0; i<node.childNodes.length; ++i) {
UE_replacer( node.childNodes[i] );
}
}
// start the replacement on 'document', which is
// the root node
UE_replacer( document );
Updated: To contrast the direction of strager's answer, I got rid of my botched jQuery and kept the regular expression as simple as possible. This 'raw' javascript approach turns out to be much easier than I expected.
Although jQuery is clearly good for manipulating DOM structure, it's actually not easy to figure out how to manipulate text elements.
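Separately from the nodeValue problem flagged in the code, the leading \b in that expression deserves a check: a word boundary only exists between a word character and a non-word character, and "*" is a non-word character. A small test on plain strings, with an alternative pattern offered only as a sketch:

```javascript
var original = /\b\*(\w+)\b/;

var afterSpace = original.test(" *abc");  // no boundary between " " and "*"
var atStart = original.test("*abc");      // no boundary at the start either
var afterLetter = original.test("x*abc"); // boundary between "x" and "*"

// Sketch of an alternative: require start of string or a non-word character.
var alternative = /(^|\W)\*(\w+)\b/;
var altAfterSpace = alternative.test(" *abc");
var altAtStart = alternative.test("*abc");
```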
Don't try to process the innerHTML/html() of an element. This will never work because regex is not powerful enough to parse HTML. Just walk over the Text nodes looking for what you want:
// Replace words in text content, recursively walking element children.
//
function wrapWordsInDescendants(element, tagName, className) {
for (var i= element.childNodes.length; i-->0;) {
var child= element.childNodes[i];
if (child.nodeType==1) // Node.ELEMENT_NODE
wrapWordsInDescendants(child, tagName, className);
else if (child.nodeType==3) // Node.TEXT_NODE
wrapWordsInText(child, tagName, className);
}
}
// Replace words in a single text node
//
function wrapWordsInText(node, tagName, className) {
// Get list of *word indexes
//
var ixs= [];
var match;
while (match= starword.exec(node.data))
ixs.push([match.index, match.index+match[0].length]);
// Wrap each in the given element
//
for (var i= ixs.length; i-->0;) {
var element= document.createElement(tagName);
element.className= className;
node.splitText(ixs[i][1]);
element.appendChild(node.splitText(ixs[i][0]));
node.parentNode.insertBefore(element, node.nextSibling);
}
}
var starword= /(^|\W)\*\w+\b/g;
// Process all elements with class 'foo'
//
$('.foo').each(function() {
wrapWordsInDescendants(this, 'span', 'xyz');
});
// If you're not using jQuery, you'll need the below bits instead of $...
// Fix missing indexOf method on IE
//
if (![].indexOf) Array.prototype.indexOf= function(item) {
for (var i= 0; i<this.length; i++)
if (this[i]==item)
return i;
return -1;
}
// Iterating over '*' (all elements) is not fast; if possible, reduce to
// all elements called 'li', or all element inside a certain element etc.
//
var elements= document.getElementsByTagName('*');
for (var i= elements.length; i-->0;)
if (elements[i].className.split(' ').indexOf('foo')!=-1)
wrapWordsInDescendants(elements[i], 'span', 'xyz');
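The code above gives every span the same class name ('xyz'), whereas the question asked for the word itself as the class. One way to get there, sketched on plain strings, is to record the word along with each range; findStarWords is a hypothetical helper, and wiring its result into splitText and className is left out.

```javascript
// Return the position of each *word in `text`, plus the word without its "*".
function findStarWords(text) {
  var starword = /(^|\W)\*(\w+)\b/g;
  var found = [], match;
  while ((match = starword.exec(text))) {
    // Skip the leading non-word character, if the match included one.
    var start = match.index + match[1].length;
    found.push({
      start: start,
      end: start + 1 + match[2].length, // "*" plus the word, exclusive end
      word: match[2]
    });
  }
  return found;
}

var words = findStarWords("*abc def *ghi");
```

Each entry's word could then be used as the class of the span wrapped around the range from start to end.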
The regexp would look something like this (sed-ish syntax):
s/\*\(\w+\)\b\(?![^<]*>\)/<span class="\1">*\1</span>/g
Thus:
$('li.foo').each(function() {
var html = $(this).html();
html = html.replace(/\*(\w+)\b(?![^<]*>)/g, "<span class=\"$1\">*$1</span>");
$(this).html(html);
});
The \*(\w+)\b segment is the important piece. It finds an asterisk followed by one or more word characters followed by some sort of word termination (e.g. end of line, or a space). The word is captured into $1, which is then used as the text and the class of the output.
The part just after that ((?![^<]*>)) is a negative lookahead. It asserts that a closing angle bracket does not follow, unless there is an opening angle bracket before it. This prevents a match where the string is inside an HTML tag. This doesn't handle malformed HTML, but that shouldn't be the case anyway.
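The effect of the lookahead can be checked on plain strings without jQuery. The second input is an invented fragment with a starred word inside an attribute:

```javascript
var starWord = /\*(\w+)\b(?![^<]*>)/g;
var wrapper = '<span class="$1">*$1</span>';

var plain = "*abc def *ghi".replace(starWord, wrapper);

// "*x" sits inside the tag, so the lookahead rejects it; "*y" is text.
var mixed = '<a title="*x">*y</a>'.replace(starWord, wrapper);
```

plain becomes `<span class="abc">*abc</span> def <span class="ghi">*ghi</span>`, and in mixed only the "*y" in the link text is wrapped.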
