How to get all elements that can display text on the page? - javascript

I want to write a script to replace all text meeting a condition with something else.
However, I don't want it to replace text in elements such as script, style, etc. which are not shown/rendered.
What is the best way to distinguish these elements?
//Example of idea:
var elements = document.getElementsByTagName("*");
var element;
var text;
for (var i = 0; i < elements.length; i++) {
    element = elements[i];
    //Need to detect only text that is displayed.
    text = element.textContent;
    if (checkText(text)) { element.textContent = somethingElse; } //Abstract idea
}

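You could try this. Note that :contains() also matches every ancestor of the element holding the text (up to body and html), and .text() replaces an element's entire content, so keep only the innermost matches first:
$(':contains("targetText")').filter(function () {
    return $(this).children(':contains("targetText")').length === 0;
}).text("newText");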

This is the purpose of TreeWalkers and the document.createTreeWalker method:
function getTextNodes (root) {
  var tw = document.createTreeWalker(root || document.body, NodeFilter.SHOW_TEXT, {
    acceptNode: function(node) {
      return /^(STYLE|SCRIPT)$/.test(node.parentElement.tagName) ||
          /^\s*$/.test(node.data) ? NodeFilter.FILTER_REJECT : NodeFilter.FILTER_ACCEPT
    }
  })
  var result = []
  while (tw.nextNode()) result.push(tw.currentNode)
  return result
}
var textNodes = getTextNodes()
// Text nodes before
console.log(
  textNodes.map(function (n) { return n.data })
)
// Example text data transformation
textNodes.forEach(function (n) {
  n.data = n.data.toUpperCase()
})
// Text nodes after
console.log(
  textNodes.map(function (n) { return n.data })
)
<p>Lorem ipsum dot dolor sit amet...</p>
<span>More example text!</span>
<style>
.omitted style { }
</style>
<script>
'omitted script'
</script>
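For comparison, here is a recursive sketch of the same filtering that does not depend on TreeWalker. It reads only the standard node properties nodeType, nodeName, childNodes and data, so it works on real DOM nodes and on plain objects of the same shape. The SKIPPED_TAGS list (which adds NOSCRIPT and TEMPLATE) is an assumption, and like the TreeWalker version it filters by tag only: text hidden with CSS (display: none) is still collected.

```javascript
// Tags whose text is never rendered. This list is an assumption; extend it
// (e.g. with "TEXTAREA") to suit your page.
const SKIPPED_TAGS = new Set(["SCRIPT", "STYLE", "NOSCRIPT", "TEMPLATE"]);

const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Collect non-whitespace text nodes under `root`, skipping whole subtrees
// rooted at a SKIPPED_TAGS element. Comment nodes and other types are ignored.
function collectVisibleTextNodes(root, result = []) {
  for (const child of Array.from(root.childNodes)) {
    if (child.nodeType === TEXT_NODE) {
      if (/\S/.test(child.data)) result.push(child);
    } else if (child.nodeType === ELEMENT_NODE &&
               !SKIPPED_TAGS.has(child.nodeName.toUpperCase())) {
      collectVisibleTextNodes(child, result);
    }
  }
  return result;
}
```

In a browser you would call collectVisibleTextNodes(document.body) and then change each node's data, exactly as in the forEach transformation above.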

You can also use jQuery's star selector (*) and add :not(...) for each tag you want to ignore, e.g. :not(script).
The following replaces the text of any element whose content is exactly "Test" with "Replaced", while ignoring any <script> and <style> tags.
$("body *:not(script):not(style)").each(function() {
    if ($(this).text() === "Test") {
        $(this).text("Replaced");
    }
});
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.js"></script>
<div class="content">
<h1>Hello World</h1>
<p>Lorem ipsum...</p>
<p>Test</p>
<ul>
<li>Hello</li>
<li>World</li>
<li>Test</li>
</ul>
</div>

Related

Remove all attributes from an HTML element and all its children [duplicate]

This question already has answers here: Remove all attributes (10 answers). Closed 6 years ago.
In the following code, I would like to remove all attributes, classes, etc. from all the HTML tags inside the elements that have class "card_back", so that only the bare tags (plus their content) remain.
I looked here and here, but couldn't get it to work.
Here's my code so far:
$.fn.removeAttributes = function(only, except) {
    if (only) {
        only = $.map(only, function(item) {
            return item.toString().toLowerCase();
        });
    };
    if (except) {
        except = $.map(except, function(item) {
            return item.toString().toLowerCase();
        });
        if (only) {
            only = $.grep(only, function(item, index) {
                return $.inArray(item, except) == -1;
            });
        };
    };
    return this.each(function() {
        var attributes;
        if (!only) {
            attributes = $.map(this.attributes, function(item) {
                return item.name.toString().toLowerCase();
            });
            if (except) {
                attributes = $.grep(attributes, function(item, index) {
                    return $.inArray(item, except) == -1;
                });
            };
        } else {
            attributes = only;
        }
        var handle = $(this);
        $.each(attributes, function(index, item) {
            handle.removeAttr(item);
        });
    });
};
$('.card_back').removeAttributes(null, null).filter(function() {
    var data = $(this);
    back = data.html().trim();
    alert(back);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div class="card_wrapper">
<div class="card_navigation">
zurück |
<a title="Titletext" href="/xyz">next</a> </div>
<div class="card_front">
<span class="info">Front</span>
<p>here's just some text
<br>and one more line.
</p>
<p>here's just another text
<br>and one more line.
</p>
</div>
<div class="card_back">
<span class="info">Back</span>
<p class="test"><span id="test3">Lorem Ipsum non dolor <strong>nihil est major</strong>, laudat amemus hibitet</span></p>
<p><span style="color: red">- <strong>Non solum</strong>, sed calucat ebalitant medetur</span></p>
<p> </p>
</div>
</div>
As pointed out in this response, you can extend removeAttr to take no parameters and delete all attributes.
BEWARE: this also removes the src attribute from any images inside!
Pairing it with removeClass (which already works without parameters) and looping over each element gives this:
var removeAttr = jQuery.fn.removeAttr;
jQuery.fn.removeAttr = function() {
    if (!arguments.length) {
        this.each(function() {
            // Looping attributes array in reverse direction
            // to avoid skipping items due to the changing length
            // when removing them on every iteration.
            for (var i = this.attributes.length - 1; i >= 0; i--) {
                jQuery(this).removeAttr(this.attributes[i].name);
            }
        });
        return this;
    }
    return removeAttr.apply(this, arguments);
};
$('.card_back').find('*').each(function( index, element ) {
    $(element).removeClass();
    $(element).removeAttr();
});
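A dependency-free sketch of the same idea that also addresses the warning about images: the keep parameter is a hypothetical addition listing attribute names to preserve (e.g. src). Attribute names are copied before removal because element.attributes is a live collection that shrinks as you remove from it, the same problem the reverse loop works around.

```javascript
// Remove every attribute from the descendant elements of `root`, except the
// names listed in `keep` (a hypothetical parameter, e.g. ["src", "href"]).
function stripAttributes(root, keep = []) {
  const keepSet = new Set(keep.map((name) => name.toLowerCase()));
  for (const el of Array.from(root.querySelectorAll("*"))) {
    // Snapshot the names first: el.attributes is live and shrinks on removal.
    const names = Array.from(el.attributes, (attr) => attr.name);
    for (const name of names) {
      if (!keepSet.has(name.toLowerCase())) el.removeAttribute(name);
    }
  }
}
```

For the question's markup: document.querySelectorAll(".card_back").forEach((el) => stripAttributes(el, ["src"])). Like .find('*'), querySelectorAll("*") only matches descendants, so each .card_back element keeps its own class.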

Will this loop correctly and be able to list tag names?

I want to prompt the user to enter a tag, list the matches in console.log, and ask again until they type "quit". When that happens, I want to use document.write to list, in the page text, the tags that were searched for.
var selector = prompt("Please enter a selector: ");
var selectorr = document.getElementsByTagName(selector);
var breaker = "quit";
breaker = false;
var textlogger = "elements have been found that match the selector ";
var lengthfinder = selectorr.length;
while(true) {
    console.log(lengthfinder + textlogger + selector);
    if (selector == breaker) {
        for (var i=0; i<divs.length; i++) {
            document.write.innerText(textlogger);
        }
    }
}
If you want to try jQuery and something fun, try this:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Loop with jquery deferred</title>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script type="text/javascript">
var loop = function () {
    return $.Deferred(function (deferred) {
        var selector = prompt("Please enter a selector: ");
        var quit = 'quit';
        var selectors = [];
        while (selector && selector != quit) {
            selectors.push(selector);
            var elements = $(selector);
            console.log(elements.length + " elements have been found that match the selector " + selector);
            selector = prompt("Please enter a selector: ");
        }
        if (selector)
        {
            deferred.resolve(selectors);
        }
        else
        {
            deferred.reject();
        }
    }).promise();
};
$(function () {
    loop().done(function (selectors) {
        $($.map(selectors, function (item, index) {
            return '<div>' + item + '</div>';
        }).join('')).appendTo($('body'));
    });
});
</script>
</head>
<body>
<div>
<iframe src="http://stackoverflow.com/questions/40392515/will-this-loop-correctly-and-be-able-to-list-tag-names"></iframe>
</div>
</body>
</html>
Here is the version with comments and suggestions on where to put your necessary code for it to work.
var breaker = "quit",
    textlogger = "elements have been found that match the selector ",
    textList = [];
while (true) {
    var selector = prompt("Please enter a selector: ");
    // prompt() returns null when the user presses Cancel; treat it like "quit"
    if (selector === null || selector == breaker) {
        /*
        Write your necessary output here
        */
        /*
        After output you break out
        */
        break;
    } else {
        /*
        Write it inside the list
        */
        textList.push(selector);
    }
    /*
    Write necessary output in console
    */
    console.log(selector);
}
I want to prompt user to enter a tag and it will list it in the console.log and will ask again until they type "quit"
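var answer;
// Also stop on Cancel: prompt() returns null, which would otherwise loop forever
while ((answer = prompt("Tag name selector, type `quit` to exit", "quit")) !== null && answer !== "quit") {
    console.log("in loop");
}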
console.log("exit loop");
I will use the documentwrite to list in the innertext what the previous tags been searched for.
Either use document.write("some text"), which only appends while the page is still loading (called after the page has loaded, it replaces the whole document), or set selectorr[i].innerText = "some text".
Here is my small example that might help you:
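var selector;
// Stop on "quit" or when the user presses Cancel (prompt returns null)
while ((selector = prompt("Tag name selector. Use `quit` to cancel search", "quit")) !== null && selector !== "quit") {
    var elements = document.getElementsByTagName(selector);
    var count = elements.length;
    // Walk backwards through the matches, marking each one
    while (count--) {
        elements[count].innerHTML += " [matched]";
    }
}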
<span>This is my &lt;span&gt; tag 1</span>
<p>This is my &lt;p&gt; tag 1</p>
<div>This is my &lt;div&gt; tag 2</div>
<p>This is my &lt;p&gt; tag 3</p>
<span>This is my &lt;span&gt; tag 2</span>
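The answers above all come down to the same loop. Here is a sketch with the prompt and the element lookup passed in as functions (ask and countMatches are hypothetical parameter names), so the quit/cancel logic lives in one place and can be tried outside a browser. It stops on "quit" and also on Cancel, where prompt() returns null.

```javascript
// Ask for selectors until the user types "quit" or cancels (ask returns null).
// Logs the match count for each selector and returns the list searched.
function collectSelectors(ask, countMatches) {
  const searched = [];
  while (true) {
    const selector = ask("Please enter a selector (type quit to stop): ");
    if (selector === null || selector === "quit") break;
    searched.push(selector);
    console.log(countMatches(selector) +
      " elements have been found that match the selector " + selector);
  }
  return searched;
}
```

In a browser you would pass (msg) => prompt(msg) and (tag) => document.getElementsByTagName(tag).length, then write searched.join(", ") into the page, for example with document.body.insertAdjacentText("beforeend", ...).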

Better way of extracting text from HTML in Javascript

I'm trying to scrape text from an HTML string by using container.innerText || container.textContent where container is the element from which I want to extract text.
Usually, the text I want to extract is located in <p> tags. So for the HTML below as an example:
<div id="container">
<p>This is the first sentence.</p>
<p>This is the second sentence.</p>
</div>
Using
var container = document.getElementById("container");
var text = container.innerText || container.textContent; // the text I want
will return This is the first sentence.This is the second sentence. without a space between the first period and the start of the second sentence.
My overall goal is to parse text using the Stanford CoreNLP, but its parser cannot detect that these are 2 sentences because they are not separated by a space. Is there a better way of extracting text from HTML such that the sentences are separated by a space character?
The HTML I'm parsing will have the text I want mostly in <p> tags, but it may also contain <img>, <a>, and other tags embedded between the <p> tags.
As a dirty hack, try using this:
container.innerHTML.replace(/<.*?>/g," ").replace(/ +/g," ");
This will replace all tags with a space, then collapse multiple spaces into a single one.
Note that if there is a > inside an attribute value, this will mess you up. Avoiding this problem will require more elaborate parsing, such as looping through all text nodes and putting them together.
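To make the caveat concrete, here is the hack as a function, applied to two paragraphs and to a link whose title attribute contains a >. The lazy <.*?> stops at the first >, so part of the attribute leaks into the output. (Also, . does not match newlines, so a tag split across lines is not removed either.)

```javascript
// The "dirty hack": turn every tag into a space, then collapse runs of spaces.
function stripTagsNaive(html) {
  return html.replace(/<.*?>/g, " ").replace(/ +/g, " ");
}

stripTagsNaive("<p>One.</p><p>Two.</p>");    // returns " One. Two. "
stripTagsNaive('<a title="x>y">link</a>');  // returns ' y">link ' (attribute text leaks)
```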
Longer but more robust method:
function recurse(result, node) {
    var c = node.childNodes, l = c.length, i;
    for (i = 0; i < l; i++) {
        // Text node: append its value followed by a space
        if (c[i].nodeType == 3) result += c[i].nodeValue + " ";
        // Element node: recurse into its children
        if (c[i].nodeType == 1) result = recurse(result, c[i]);
    }
    return result;
}
recurse("", container);
Assuming I haven't made a stupid mistake, this will perform a depth-first search for text nodes, appending their contents to the result as it goes.
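Joining every text node with a space, as recurse() does, also splits words that are only interrupted by inline markup (foo<strong>bar</strong> becomes "foo bar"). A variant sketch: concatenate text nodes as-is and add a space only around block-level elements. The BLOCK_TAGS set is an assumption to adjust for your HTML; the function uses only nodeType, nodeName, childNodes and nodeValue.

```javascript
// Elements treated as sentence/paragraph boundaries (an assumption).
const BLOCK_TAGS = new Set(["P", "DIV", "LI", "BR", "H1", "H2", "H3", "H4",
  "H5", "H6", "TR", "TD", "SECTION", "ARTICLE"]);

// Depth-first walk: text nodes are concatenated directly, block elements are
// surrounded by spaces, then whitespace runs are collapsed and trimmed.
function extractText(node) {
  const parts = [];
  (function walk(n) {
    for (const child of Array.from(n.childNodes)) {
      if (child.nodeType === 3) {
        parts.push(child.nodeValue);
      } else if (child.nodeType === 1) {
        const isBlock = BLOCK_TAGS.has(child.nodeName.toUpperCase());
        if (isBlock) parts.push(" ");
        walk(child);
        if (isBlock) parts.push(" ");
      }
    }
  })(node);
  return parts.join("").replace(/\s+/g, " ").trim();
}
```

For the question's markup, extractText(document.getElementById("container")) should give "This is the first sentence. This is the second sentence.", which a sentence splitter can handle.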
jQuery has the method text() that does what you want. Will this work for you?
I'm not sure if it fits everything in your container, but it works in my example. It also takes the text of <a> tags and appends it to the text.
Update 20.12.2020
If you're not using jQuery, you could implement the text method with vanilla JS like this:
const nodes = Array.from(document.querySelectorAll("#container > *"));
const text = nodes
  .filter((node) => !!node.textContent)
  .map((node) => node.textContent)
  .join(" ");
querySelectorAll("#container > *") gets every child element of the container (querySelectorAll("#container") alone would return only the container itself). Text sitting directly in the container, outside any child element, is not included. Array.from converts the NodeList so we can use Array methods like filter, map and join.
Finally, generate the text by filtering out elements without textContent, then use map to get each element's text and join to put a space between them.
$(function() {
var textToParse = $('#container').text();
$('#output').text(textToParse);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="container">
<p>This is the first sentence.</p>
<p>This is the second sentence.</p>
<img src="http://placehold.it/200x200" alt="Nice picture">
<p>Third sentence.</p>
</div>
<h2>output:</h2>
<div id="output"></div>
You can use the following function to extract and process the text. It walks all child nodes of the target element, recursing into their child nodes and so on, and adds a space before each piece of text:
function getInnerText( sel ) {
    var txt = '';
    $( sel ).contents().each(function() {
        var children = $(this).children();
        // The parentheses matter: without them ' ' + this.nodeType is evaluated first,
        // so the test is never true and no space is ever added
        txt += ' ' + ( this.nodeType === 3 ?
            this.nodeValue : children.length ?
            getInnerText( this ) : $(this).text() );
    });
    return txt;
}
alert( getInnerText( '#container' ) );
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<div id="container">
Some other sentence
<p>This is the first sentence.</p>
<p>This is the second sentence.</p>
</div>
You may use jQuery to traverse the elements.
Here is the code:
$(document).ready(function()
{
    var text = "";
    // Append the text of each direct child of #container, one per line
    $("#container").children().each(function()
    {
        text += $(this).text() + "\n";
    });
    alert(text);
});
Here is the fiddle : http://jsfiddle.net/69wezyc5/

jQuery replace all occurrences of a string in an html page

I'm working on a project where I need to replace all occurrences of a string with another string. However, I only want to replace the string where it appears in the text, not in the markup (tag names or attribute values). For example, I want to turn this...
<div id="container">
<h1>Hi</h1>
<h2 class="Hi">Test</h2>
Hi
</div>
into...
<div id="container">
<h1>Hello</h1>
<h2 class="Hi">Test</h2>
Hello
</div>
In that example, every "Hi" was turned into "Hello" except for the "Hi" in the h2's class attribute.
I have tried...
$("#container").html( $("#container").html().replace( /Hi/g, "Hello" ) )
... but that replaces every occurrence of "Hi" in the HTML, including the one in the class attribute.
This:
$("#container").contents().each(function () {
if (this.nodeType === 3) this.nodeValue = $.trim($(this).text()).replace(/Hi/g, "Hello")
if (this.nodeType === 1) $(this).html( $(this).html().replace(/Hi/g, "Hello") )
})
Produces this:
<div id="container">
<h1>Hello</h1>
<h2 class="Hi">Test</h2>
Hello
</div>
Nice results with:
function str_replace_all(string, str_find, str_replace) {
    try {
        return string.replace(new RegExp(str_find, "gi"), str_replace);
    } catch (ex) {
        return string;
    }
}
and easier to remember...
replacedstr = str.replace(/needtoreplace/gi, 'replacewith');
needtoreplace should not be surrounded by quotes ('): it is a regular expression literal, not a string.
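str_replace_all passes str_find straight to new RegExp, so characters such as . or ( are treated as regex syntax: "a.b" also matches "axb", and "(" throws (which the catch silently turns into "no replacement"). A sketch of a literal variant that escapes the search string first; escapeRegExp and replaceAllLiteral are hypothetical helper names, and the replacement is returned from a function so that $& or $1 inside it is not expanded.

```javascript
// Escape characters that have special meaning in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-insensitive, literal replace-all (same "gi" flags as str_replace_all).
function replaceAllLiteral(string, find, replacement) {
  // A replacer function inserts `replacement` verbatim, with no $-patterns.
  return string.replace(new RegExp(escapeRegExp(find), "gi"), () => replacement);
}
```

For example, replaceAllLiteral("a.b axb A.B", "a.b", "X") leaves "axb" alone and replaces both "a.b" and "A.B".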
//Get all text nodes in a given container
//Source: http://stackoverflow.com/a/4399718/560114
function getTextNodesIn(node, includeWhitespaceNodes) {
    var textNodes = [], nonWhitespaceMatcher = /\S/;
    function getTextNodes(node) {
        if (node.nodeType == 3) {
            if (includeWhitespaceNodes || nonWhitespaceMatcher.test(node.nodeValue)) {
                textNodes.push(node);
            }
        } else {
            for (var i = 0, len = node.childNodes.length; i < len; ++i) {
                getTextNodes(node.childNodes[i]);
            }
        }
    }
    getTextNodes(node);
    return textNodes;
}
var textNodes = getTextNodesIn( $("#container")[0], false );
var i = textNodes.length;
var node;
while (i--) {
    node = textNodes[i];
    node.textContent = node.textContent.replace(/Hi/g, "Hello");
}
Note that this will also match words where "Hi" is only part of the word, e.g. "Hill". To match whole words only, use /\bHi\b/g.
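The key point of this answer is that attribute values are not text nodes, so walking text nodes can never touch class="Hi". A sketch combining the walk with the whole-word pattern; escaping the word is an addition, \b assumes the word starts and ends with a letter, digit or underscore, and unlike a filtered page walk this sketch does not skip <script>/<style> text.

```javascript
// Replace whole-word matches of `word` in every text node under `root`.
function replaceWordInTextNodes(root, word, replacement) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp("\\b" + escaped + "\\b", "g");
  (function walk(node) {
    if (node.nodeType === 3) {
      node.nodeValue = node.nodeValue.replace(pattern, () => replacement);
    } else {
      // Attributes are not in childNodes, so they are never visited.
      for (const child of Array.from(node.childNodes)) walk(child);
    }
  })(root);
}
```

With the question's markup, replaceWordInTextNodes($("#container")[0], "Hi", "Hello") changes the h1 and the loose "Hi" but leaves the h2's class, and would leave a word like "Hill" intact.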
Here you go: http://jsfiddle.net/c3w6X/1/
var children='';
$('#container').children().each(function(){
$(this).html($(this).html().replace(/Hi/g,"Hello")); //change the text of the children
children=children+$(this)[0].outerHTML; //copy the changed child
});
var theText=$('#container').clone().children().remove().end().text(); //get the text outside of the child in the root of the element
$('#container').html(''); //empty the container
$('#container').append(children+theText.replace(/Hi/g,"Hello")); //add the changed text of the root and the changed children to the already emptied element

Smart text replacing with jQuery

I need to replace part of the text, e.g. a mustache variable like {{myvar}}, on an already loaded page.
Example html:
<html>
<head>
<title>{{MYTITLE}}</title>
</head>
<body>
<p><strong><ul><li>text {{TEXT}}</li></ul></strong></p>
{{ANOTHER}}
</body>
</html>
What's the problem? Why not just use $(html).html(myrenderscript($(html).html()))?
That is ugly and slow, and it breaks <script> tags.
What do you want?
I want to find the closest tag containing {{}}, then render and replace it.
What have you tried?
First, I tried $('html :contains("{{")'), but it returns <title>, <p>, <strong>, ... whereas I only need <title> and <li>.
Then I tried to filter them:
$('html :contains("{{")').filter(function (i) {
    return $(this).find(':contains("{{")').length === 0
});
...but it won't return {{ANOTHER}}. That is my dead end. Any suggestions?
Using http://benalman.com/projects/jquery-replacetext-plugin/ you could do the following:
$('html *').replaceText(/{{([^}]+)}}/, function(fullMatch, key) {
return key;
}, true);
See http://jsfiddle.net/4nvNy/
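If you would rather not add a plugin, the same callback style works with plain String.prototype.replace. A sketch for a single string; values is a hypothetical lookup object, and unknown keys are left untouched so a missing value stays visible. You would apply it to the nodeValue of each text node (including the one inside <title>), collected for example with document.createTreeWalker and NodeFilter.SHOW_TEXT.

```javascript
// Replace every {{KEY}} in `text` with values[KEY]; unknown keys are kept.
function renderPlaceholders(text, values) {
  return text.replace(/\{\{([^}]+)\}\}/g, (fullMatch, key) =>
    Object.prototype.hasOwnProperty.call(values, key)
      ? String(values[key])
      : fullMatch);
}
```

Because only text node values are rewritten, elements, event handlers and <script> tags are left alone.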
If all you want to do is replace that text, then surely the following works (or have I misunderstood?).
Usage: call it on a container, passing the search term and the replacement. The function always wraps the search term in {{ }}, so the braces are replaced as well. Define the plugin first:
$.fn.replaceText = function(search, replace, text_only) {
    return this.each(function(){
        var v1, v2, rem = [];
        var placeholder = "{{" + search + "}}";
        // addBack() replaces andSelf(), which is deprecated and was removed in jQuery 3.0
        $(this).find("*").addBack().contents().each(function(){
            if (this.nodeType === 3) {
                v1 = this.nodeValue;
                // split/join replaces every occurrence, not just the first
                v2 = v1.split(placeholder).join(replace);
                if (v1 != v2) {
                    if (!text_only && /<.*>/.test(v2)) {
                        $(this).before( v2 );
                        rem.push(this);
                    }
                    else this.nodeValue = v2;
                }
            }
        });
        if (rem.length) $(rem).remove();
    });
};
Then call it on html (not body), so that the <title> in <head> is included:
$('html').replaceText("MYTITLE", "WHATEVER YOU WANT IT REPLACING WITH");
You could avoid jQuery altogether if you wanted to with something like this:
<body>
<p><strong>
<ul>
<li>text {{TEXT}}</li>
</ul>
</strong></p>
{{ANOTHER}}
<hr/>
<div id="showResult"></div>
<script>
var body = document.getElementsByTagName('body')[0].innerHTML;
var startIdx = 0, endIdx = 0, replaceArray = [];
// Only scan the markup before this script, whose own source also contains braces
var scriptPos = body.indexOf('<script');
while (true) {
    var openIdx = body.indexOf('{{', endIdx);
    if (openIdx === -1 || openIdx > scriptPos) {
        break;
    }
    startIdx = openIdx + 2;
    endIdx = body.indexOf('}}', startIdx);
    if (endIdx === -1) {
        break;
    }
    var keyText = body.substring(startIdx, endIdx);
    replaceArray.push({"keyText": keyText, 'startIdx': startIdx, 'endIdx': endIdx});
}
document.getElementById("showResult").innerHTML = JSON.stringify(replaceArray);
</script>
</body>
You can then do what you want with the replaceArray.
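One thing you can do with replaceArray is splice values back into the HTML string. Because startIdx points just after {{ and endIdx points at }}, each placeholder spans startIdx - 2 to endIdx + 2; applying the entries from the last one backwards keeps the earlier indices valid. A sketch, with values as a hypothetical lookup object. Writing the result back through innerHTML re-creates the elements, so it has the same drawbacks the question mentions for replacing .html().

```javascript
// Splice values[keyText] into `html` for each { keyText, startIdx, endIdx }
// entry (shaped like replaceArray). Unknown keys are left as-is.
function applyReplacements(html, entries, values) {
  // Last placeholder first, so earlier offsets are not shifted by edits.
  const sorted = entries.slice().sort((a, b) => b.startIdx - a.startIdx);
  let result = html;
  for (const { keyText, startIdx, endIdx } of sorted) {
    if (!Object.prototype.hasOwnProperty.call(values, keyText)) continue;
    result = result.slice(0, startIdx - 2) + values[keyText] + result.slice(endIdx + 2);
  }
  return result;
}
```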
