JavaScript: loop through DOM tree and replace text? - javascript

I need to use JavaScript to loop through the DOM tree of a webpage and replace every instance of the word 'hipster' with a different word UNLESS it is part of a link or image src. For example, if 'hipster' appears in a paragraph, it should be replaced.
But if it's in the src="" url for an image, that should not be replaced because if it replaces that word in a url, the url obviously breaks.
I've been having a really hard time implementing this. Here's one thing I tried:
var items = document.getElementsByTagName("*");
var i = 0;
for (i = 0; i < items.length; i++){
if(i.nodeType == 3){
i.html().replace(/hipster/gi, 'James Montour');
}
else{
//do nothing for now
}
}

You’re close, but:
document.getElementsByTagName('*') will never return text nodes; it gets elements only (and comment nodes in old versions of IE)
Text nodes don’t have an html() method
replace doesn’t modify strings in place
With that in mind, you can get every child of every element, check their types, read their nodeValues, and, if appropriate, replace the entire node:
var elements = document.getElementsByTagName('*');
for (var i = 0; i < elements.length; i++) {
    var element = elements[i];
    for (var j = 0; j < element.childNodes.length; j++) {
        var node = element.childNodes[j];
        if (node.nodeType === 3) {
            var text = node.nodeValue;
            var replacedText = text.replace(/hipster/gi, 'James Montour');
            if (replacedText !== text) {
                element.replaceChild(document.createTextNode(replacedText), node);
            }
        }
    }
}
Try it out on this page!
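The same idea can also be written as a recursive walk that updates each text node's nodeValue in place. This is only a sketch: replaceInTextNodes is a hypothetical helper name, not part of the answer above, and it assumes nothing about a node beyond the nodeType, nodeValue, and childNodes properties. Attributes such as src and href are never touched, because they are not text nodes.

```javascript
// Hypothetical helper: walks any node-like tree and rewrites text nodes in place.
// Only nodeType, nodeValue and childNodes are read, so attributes stay untouched.
function replaceInTextNodes(root, pattern, replacement) {
  var changed = 0;
  (function walk(node) {
    if (node.nodeType === 3) { // 3 is the value of Node.TEXT_NODE
      var replaced = node.nodeValue.replace(pattern, replacement);
      if (replaced !== node.nodeValue) {
        node.nodeValue = replaced;
        changed++;
      }
      return;
    }
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
      walk(children[i]);
    }
  })(root);
  return changed; // number of text nodes that were modified
}

// In a browser this could be called on the page body.
if (typeof document !== 'undefined' && document.body) {
  replaceInTextNodes(document.body, /hipster/gi, 'James Montour');
}
```

Like the loop above, this walk also visits text inside script and style elements, so a real page may need to skip those.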

To use variables instead of 'hipster' replace:
var replacedText = text.replace(/hipster/gi, 'James Montour');
with these lines:
var needle = 'hipster';
var replacement = 'James Montour';
var regex = new RegExp(needle, "gi");
var replacedText = text.replace(regex, replacement);
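One caveat with new RegExp(needle, "gi") is that any regex metacharacters in needle (such as . or $) are interpreted as pattern syntax. A minimal sketch of escaping them first follows; escapeRegExp and buildNeedleRegex are hypothetical helper names, and the escape pattern is the same one that appears in one of the related questions below.

```javascript
// Hypothetical helper: escapes characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: builds a global, case-insensitive regex for a literal needle.
function buildNeedleRegex(needle) {
  return new RegExp(escapeRegExp(needle), 'gi');
}

var needle = '$5.00';
var replacement = 'five dollars';
var replacedText = 'Price: $5.00 (approx)'.replace(buildNeedleRegex(needle), replacement);
console.log(replacedText);
```

Without the escaping step, the . in the needle would match any character and the $ would be treated as an end-of-input anchor.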

Related

How to search for a certain text or "string" in html using javascript? (chrome extension)

I'm creating a chrome extension whereby when it detects a certain string, it will display a certain html page on the extension. So I tried to use .search followed by an "if else statement". To make it clear that it is working, I've set an alert within the "if else". But it doesn't seem to be working.
Any help?
Here's the javascript file
var elements = document.getElementsByTagName('*');
for (var i = 0; i < elements.length; i++) {
var element = elements[i];
for (var j = 0; j < element.childNodes.length; j++) {
var node = element.childNodes[j];
if (node.nodeType === 3) {
var text = node.nodeValue;
var searchText = text.search(/nmd/g);
// var replacedText = text.replace(/nmd/gi, 'nmdReplaced');
if(searchText == text){
alert("connected");
}
if (replacedText !== text) {
element.replaceChild(document.createTextNode(replacedText), node);
}
}
}
}
The text.replace works, but that's not what I'm trying to get it to do.
The search method returns the index of the first match of the string or pattern, not the matched string itself; if nothing matches, it returns -1.
So your if(searchText == text) will fail and the alert never runs. You can simply check if(searchText !== -1) instead.
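To make the return value concrete, here is a small sketch. containsMatch is a hypothetical helper name, and the sample strings are made up for illustration.

```javascript
// Hypothetical helper: true when the pattern occurs anywhere in the text.
function containsMatch(text, pattern) {
  return text.search(pattern) !== -1;
}

var foundIndex = 'xx nmd'.search(/nmd/);      // index of the first match
var missingIndex = 'nothing'.search(/nmd/);   // -1 because there is no match

if (containsMatch('xx nmd', /nmd/)) {
  console.log('connected');
}
```

RegExp.prototype.test is an alternative when only a yes/no answer is needed, for example /nmd/.test(text).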

Regex function causing chrome to hang

I am trying to find specific words in an html page to replace, however when I run it the RegExp() function is causing chrome to hang. The code below is the section where this is happening
function test(findWord, replaceWord){
var searchregexp = new RegExp(findWord.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), "gi");
console.log(searchregexp);
var elements = document.getElementsByTagName('*');
for (var i = 0; i < elements.length; i++) {
var element = elements[i];
for (var j = 0; j < element.childNodes.length; j++) {
var node = element.childNodes[j];
if (node.nodeType === 3) {
var text = node.nodeValue;
var replacedText = text.replace(searchregexp, replaceWord);
if (replacedText !== text) {
element.innerHTML = replacedText;
}
}
}
}
}
Based on some other things I've read, it seems that the RegExp function is causing catastrophic backtracking, but I am not really sure why that's the case or how I would solve it.
The code works if I hardcode a specific regular expression like /replace/gi into text.replace(). However, if I make the word being replaced variable by using new RegExp, Chrome hangs.

Developing a simple Chrome Extension to replace text, but it doesn't seem to be detecting word boundaries and therefore replaces nested words

I must admit I'm not too familiar with Javascript but here's the relevant code:
var elements = document.getElementsByTagName('*');
for (var i = 0; i < elements.length; i++) {
var element = elements[i];
for (var j = 0; j < element.childNodes.length; j++) {
var node = element.childNodes[j];
if (node.nodeType === 3) {
var text = node.nodeValue;
var replacedText = text.replace(/\bhe\b/gi,'hat');
if (replacedText !== text) {
element.replaceChild(document.createTextNode(replacedText), node);
}
}
}
}
The problem is (in this example) that if I want to replace all occurrences of 'he' with 'hat', I get outputs like The -> That, Ether -> Ethater, etc. I've tried \b boundaries and \s too, to no avail. Any ideas?
Apparently the answer was to disable the old version of the extension that was loaded at the same time. Doh!
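That outcome is consistent with the regex itself: the \b version in the question only matches 'he' as a whole word. A quick check on a made-up sample sentence:

```javascript
var sample = 'He said the ether is there';

// With word boundaries only the standalone word is replaced.
var withBoundary = sample.replace(/\bhe\b/gi, 'hat');

// Without boundaries every "he" is replaced, including those inside other words.
var withoutBoundary = sample.replace(/he/gi, 'hat');

console.log(withBoundary);    // hat said the ether is there
console.log(withoutBoundary);
```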

Pure Javascript - Find/Replace a list of words in DOM

I have a list of words and I want to replace them in DOM.
The replace is working but it has an issue. Before and after the replaced text, it puts an undesired slash bar (i.e. reduced to a brigade > reduced/ OTHER WORD 1 /a brigade)
Also, I need to wrap the replaced word with an html tag, like .
(i.e. reduced to a brigade > reduced OTHER WORD 1 a brigade)
You can see the code here:
https://jsfiddle.net/realgrillo/9uL6ofge/
var elements = document.getElementsByTagName('*');
var words = [
[/(^|\s)([Tt]o)(\s|$)/ig, 'OTHER WORD 1'],
[/(^|\s)([Aa]nd)(\s|$)/ig, 'OTHER WORD 2']
];
for (var i = 0; i < elements.length; i++) {
var element = elements[i];
for (var j = 0; j < element.childNodes.length; j++) {
var node = element.childNodes[j];
if (node.nodeType === 3) {
var text = node.data;
for (var k = 0; k < words.length; k++)
{
var from = new RegExp(words[k][0]);
var to = new RegExp('$1' + words[k][1] + '$3');
var replacedText = text.replace(from, to);
//console.log("from: " + from);
//console.log("to: " + to);
//console.log("toDo: " + toDo);
if (replacedText !== text)
{
element.innerHTML = element.textContent.replace(from, to);
console.log("text: " + text);
console.log("replacedText: " + replacedText);
}
}
/*
//
//!!!!THIS CODE FOR 1 WORD WORKS!!!! I need to do this for the list of words.
//
var replacedText = text.replace(/(^|\s)([Tt]o)(\s|$)/ig, '$1OTHER WORD$3');
if (replacedText !== text) {
element.innerHTML = element.textContent.replace(/(^|\s)([Tt]o)(\s|$)/ig, '$1<b class=modified>OTHER WORD</b>$3');
}
*/
}
}
}
Masters of JS, can you help me? Pure javascript, not jQuery please.
Thanks.
I can give you a much simpler function to use. First define a function for replacing words like this:
String.prototype.replaceAll = function (search, replacement) {
try {
var target = this;
return target.split(search).join(replacement);
} catch (e) {
return "";
}
};
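This helps when the search value is a plain string, because the built-in replace then replaces only the first occurrence (a regex with the g flag already replaces every match). Because split and join insert the replacement literally, group references such as $1 are not expanded, so pass literal strings. After that you can use it in your code like this:
var replacedText = text.replaceAll(' to ', ' OTHER WORD ');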
You don't need to parse words[k][0] into a RegExp as it is already one, even though it won't break because the constructor will handle a RegExp well.
Also, you don't need to parse words[k][1] into a RegExp either, because the second parameter of String.prototype.replace is expected to be a plain string (or a function). The slashes appear because the RegExp is converted to a string, so its surrounding / delimiters end up in the output.
So this is a change that will fix it:
var to = '$1' + words[k][1] + '$3';
And this is optional but probably a good practice:
var from = words[k][0];
For getting an html tag, you can either change it directly in each words[k][1] or in the to var declaration, depending on what you want.
Extra:
In the for loop, the code assumes that there are always at least three groups (because of the '$3' reference), an alternative could be to delegate this replacement directly in words[k][1], but it is up to you to decide.
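Putting these fixes together, the list-based replacement could look roughly like the sketch below. applyReplacements is a hypothetical helper name; it works on a plain string so it can be applied to each text node's data, and it keeps the question's assumption that every pattern has three groups.

```javascript
// Each entry: [pattern with three groups, replacement for the middle group].
var words = [
  [/(^|\s)(to)(\s|$)/gi, 'OTHER WORD 1'],
  [/(^|\s)(and)(\s|$)/gi, 'OTHER WORD 2']
];

// Hypothetical helper: applies every pair in turn and returns the new string.
function applyReplacements(text, pairs) {
  var result = text;
  for (var k = 0; k < pairs.length; k++) {
    var from = pairs[k][0];                 // already a RegExp
    var to = '$1' + pairs[k][1] + '$3';     // plain string, keeps surrounding whitespace
    result = result.replace(from, to);
  }
  return result;
}

console.log(applyReplacements('reduced to a brigade and more', words));
```

To wrap the new word in a tag, the second element of a pair can contain the markup itself, for example '<b class="modified">OTHER WORD 1</b>'. In that case the result has to be written through innerHTML rather than into a text node, since a text node would show the tag as literal text.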

\b metacharacter not working correctly

I know this code has worked before, but it has now stopped working. I am working on a person highlighter tool, but if I type in a word and then type in mark, it highlights the actual mark element. Here is my code:
function Search (tagC) {
var notes = document.getElementsByClassName("NoteOp");
for (var i = 0; i < notes.length; i++) {
var n = notes[i];
var tagOut = tagC
var tagFront = tagOut.slice(0, -9);
var tagLast = tagOut.slice(-9);
n.innerHTML = n.innerHTML.replace(new RegExp("\\b(" + tagFront + ")\\b", "gim"), "<mark class=" + tagLast + ">$1</mark>");
if(window.Bold === "Yes") {
$("mark").css("font-weight", "bold");
}
}
}
tagFront is the search term while tagLast is a class that always has 9 letters. Any problems seen in the coding?
An example of tagC would be:
testYelColBox
...and the text I'm searching looks like this:
<div id="NoteHolder">
<p class="NoteOp">This is a test paragraph uses to TeSt filters.</p>
<p class="NoteOp">Random words, I need to see if it will mess up mark</p>
</div>
Main question: Why does my code mark an HTML element even though my code has a \b metacharacter selector?
Your problem seems to be this:
If you first highlight a word, it works correctly. But now your HTML has <mark> tags, so if now you search a second time with search word "mark", that tag gets a nested mark, which is undesired and makes your HTML invalid.
Why this happens
The \b escape matches any position in the search string where the character sequence switches from a word character (a letter, digit, or underscore) to a non-word character or vice versa. This means \b also matches at the position right after the < of <mark ...>, and at the position right after the k (because of the space that follows).
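This is easy to verify on a string of HTML. The snippet below is only an illustration, using a made-up fragment of the kind the first highlight pass would produce:

```javascript
var html = '<mark class="YelColBox">test</mark>';

// Both the opening and the closing tag name sit between two word boundaries.
var boundaryHits = html.match(/\bmark\b/g);

// Bracketing the matches shows exactly where the regex hits inside the tags.
var bracketed = html.replace(/\b(mark)\b/g, '[$1]');

console.log(boundaryHits.length); // 2
console.log(bracketed);           // <[mark] class="YelColBox">test</[mark]>
```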
Solution
Do a controlled replacement by only applying it to text nodes, not to HTML elements. For this you need to iterate over the nodes, check their type, and when they are text nodes, perform the replacement. As the replacement involves the insertion of an HTML element, you should actually split that text node into what comes before, the mark element, and what comes after.
Here is code that does all this:
function clear() {
    var notes = document.getElementsByClassName("NoteOp");
    for (var i = 0; i < notes.length; i++) {
        var n = notes[i];
        // Remove all opening/closing mark tags
        n.innerHTML = n.innerHTML.replace(/<\/?mark.*?>/gm, "");
    }
}
function highlight(tagC) {
    // Sanity check
    if (tagC.length <= 9) return; // ignore wrong input
    var notes = document.getElementsByClassName("NoteOp");
    // Split into parts before entering loop:
    var tagFront = tagC.slice(0, -9);
    var tagLast = tagC.slice(-9);
    // Escape tagFront characters that could conflict with regex syntax:
    tagFront = tagFront.replace(/([.*+?^${}()|\[\]\/\\])/g, "\\$1");
    var regex = new RegExp("\\b(" + tagFront + ")\\b", "gim");
    // Create a template of the highlight that can be cloned
    var mark = document.createElement('mark');
    mark.setAttribute('class', tagLast);
    // Loop over notes
    for (var i = 0; i < notes.length; i++) {
        // Create a span that will have the contents after replacements
        var span = document.createElement('span');
        // Go through all child nodes of this note
        var nodes = notes[i].childNodes;
        for (var j = 0; j < nodes.length; j++) {
            var node = nodes[j];
            if (node.nodeType === 3) {
                // Only if text node, perform replacement
                var parts = node.textContent.split(regex);
                // As regex has capture group, also the split expression is a part
                for (var k = 0; k < parts.length; k++) {
                    // Add this part
                    if (k % 2) {
                        // Add highlighted text
                        mark.textContent = parts[k];
                        span.appendChild(mark.cloneNode(true));
                    } else {
                        // Add text part that did not match as such
                        span.appendChild(document.createTextNode(parts[k]));
                    }
                }
            } else {
                // Non-text nodes are just copied as they are
                span.appendChild(node.cloneNode(true));
            }
        }
        // Replace note contents with new contents
        notes[i].innerHTML = span.innerHTML;
    }
    // Setting style for CSS class should happen outside of the loop
    $("mark").css("font-weight", window.Bold === "Yes" ? "bold": "normal");
}
// I/O
var inp = document.querySelector('#inp');
var btnMark = document.querySelector('#mark');
var btnClear = document.querySelector('#clear');
btnMark.onclick = function () {
highlight(inp.value + 'YelColBox');
}
btnClear.onclick = clear;
Type text to be highlighted and press Mark:<br>
<input id="inp" value="">
<button id="mark">Mark</button>
<button id="clear">Clear</button>
<div id="NoteHolder">
<p class="NoteOp">This is a test paragraph uses to TeSt filters.</p>
<p class="NoteOp">Random words, I need to see if it will mess up mark</p>
</div>
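The k % 2 test in highlight relies on how split behaves when the regex contains a capture group: the captured text is included in the result, so matches land on the odd indexes and the untouched text on the even ones. A minimal sketch of just that step follows; splitAroundMatches is a hypothetical helper name, and term is assumed to be already regex-safe.

```javascript
// Hypothetical helper: splits text around whole-word matches of term.
// Because of the capture group, the matched words are kept in the result.
function splitAroundMatches(text, term) {
  var regex = new RegExp('\\b(' + term + ')\\b', 'gi');
  return text.split(regex);
}

var pieces = splitAroundMatches('This is a test paragraph uses to TeSt filters.', 'test');
for (var k = 0; k < pieces.length; k++) {
  // Odd indexes hold the matched words, even indexes the text in between.
  console.log((k % 2 ? 'match: ' : 'text:  ') + JSON.stringify(pieces[k]));
}
```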
