How to get only the rendered text of a Text Node? - javascript

I am creating a little web extension that modifies a webpage depending on its text. As an example of my problem, here is some code that uses a tree walker to grab all text nodes on a page:
var treeWalker = document.createTreeWalker(
document.body,
NodeFilter.SHOW_TEXT,
{ acceptNode: () => {return NodeFilter.FILTER_ACCEPT;} },
false
);
while(treeWalker.nextNode()) {
let x = treeWalker.currentNode.data;
//do something with x
}
Unfortunately, x will have all of the text in the node, even if it isn't shown on the webpage.
What I want is something like treeWalker.currentNode.innerText, but that is undefined for text nodes. Does anyone know how to get only the text shown to the user for a text node?
Example: If a webpage has the node with the following HTML:
<div>
<script type="text/x-config">
{
"setObject": -1
}
</script>
<span>Quiz</span>
</div>
with associated CSS:
script {
display: none;
}
Then the text content of the respective text node (minus extra spaces and line breaks) is returned as "{ "setObject": -1 } Quiz". However, the only thing that is rendered to the user is "Quiz". Given the respective text node, how do I get only the rendered text?

I guess we have a problem.
If you are using a newer browser you should be able to use innerText; if not, you have to use textContent.
The problem with textContent is that it gets the content of all elements, including <script> and <style>, and textContent is not aware of styling, so it will return the text of hidden elements.
I guess the way to go is to replace the filter with NodeFilter.SHOW_ELEMENT and read each element's innerText.
So try it:
var treeWalker = document.createTreeWalker(
document.body,
NodeFilter.SHOW_ELEMENT,
{ acceptNode: (node) => { return NodeFilter.FILTER_ACCEPT;} },
false
);
while(treeWalker.nextNode()) {
let x = treeWalker.currentNode.innerText;
//do something with x
}
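If you would rather stay with text nodes instead of switching to elements, one alternative (a sketch, not part of the answer above) is to check each text node's ancestors with getComputedStyle and skip nodes inside anything with display: none or hidden visibility. The style lookup is passed in as a parameter (getStyle, a name made up here) so the logic can run outside a browser; in a page you would pass el => window.getComputedStyle(el). It only looks at display and visibility, so text that is clipped, scrolled out of view, or made transparent with opacity is still reported as rendered.

```javascript
// Sketch: keep only text nodes whose ancestors are not hidden by CSS.
// `getStyle` is expected to behave like window.getComputedStyle(el).
function isTextNodeRendered(textNode, getStyle) {
  let el = textNode.parentElement;
  if (!el) return false; // detached node, or not inside an element

  // `visibility` is inherited, so the parent's computed value already
  // accounts for hidden ancestors and any overrides below them.
  const visibility = getStyle(el).visibility;
  if (visibility === 'hidden' || visibility === 'collapse') return false;

  // `display: none` is not inherited, so every ancestor must be checked.
  for (; el; el = el.parentElement) {
    if (getStyle(el).display === 'none') return false;
  }
  return true;
}

// Return the data of every rendered text node in a list of text nodes.
function renderedTexts(textNodes, getStyle) {
  return textNodes
    .filter((node) => isTextNodeRendered(node, getStyle))
    .map((node) => node.data);
}

// Browser usage (not run here): collect the text nodes first with a
// TreeWalker, then filter them using the real computed styles.
function renderedTextsInBody() {
  const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  const nodes = [];
  while (walker.nextNode()) nodes.push(walker.currentNode);
  return renderedTexts(nodes, (el) => window.getComputedStyle(el));
}
```

For the example in the question, the text node inside the <script> would be dropped (its parent has display: none) and only "Quiz" would be kept.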

Related

Replace text with link with chrome extension

I am trying to replace text on a webpage with links. When I try this it just replaces the text with the tag and not a link. For example this code will replace "river" with:
asdf
This is what I have so far:
function handleText(textNode)
{
var v = textNode.nodeValue;
v = v.replace(/\briver\b/g, 'asdf');
textNode.nodeValue = v;
}
If all you wanted to do was change the text to other plain text, you could change the contents of the text nodes directly. However, you want to add an <a> element. Each <a> element you add is effectively a new child element, and text nodes cannot have children. Thus, to do this you have to replace the text node with a more complicated structure. In doing so, you will want to make as little impact on the DOM as possible, in order not to disturb other scripts which rely on the current structure of the DOM. The simplest way to keep the impact small is to replace the text node with a <span> which contains the new text nodes (the text will be split around each new <a>) and any new <a> elements.
The code below should do what you desire. It replaces the textNode with a <span> containing the new text nodes and the created <a> elements. It only makes the replacement when one or more <a> elements need to be inserted.
function handleTextNode(textNode) {
if(textNode.nodeName !== '#text'
|| textNode.parentNode.nodeName === 'SCRIPT'
|| textNode.parentNode.nodeName === 'STYLE'
) {
//Don't do anything except on text nodes, which are not children
// of <script> or <style>.
return;
}
let origText = textNode.textContent;
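//The href below is a placeholder; substitute the real link target.
let newHtml=origText.replace(/\briver\b/g,'<a href="https://example.com/">asdf</a>');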
//Only change the DOM if we actually made a replacement in the text.
//Compare the strings, as it should be faster than a second RegExp operation and
// lets us use the RegExp in only one place for maintainability.
if( newHtml !== origText) {
let newSpan = document.createElement('span');
newSpan.innerHTML = newHtml;
textNode.parentNode.replaceChild(newSpan,textNode);
}
}
//Testing: Walk the DOM of the <body> handling all non-empty text nodes
function processDocument() {
//Create the TreeWalker
let treeWalker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT,{
acceptNode: function(node) {
if(node.textContent.length === 0) {
//Alternately, could filter out the <script> and <style> text nodes here.
return NodeFilter.FILTER_SKIP; //Skip empty text nodes
} //else
return NodeFilter.FILTER_ACCEPT;
}
}, false );
//Make a list of the text nodes prior to modifying the DOM. Once the DOM is
// modified the TreeWalker will become invalid (i.e. the TreeWalker will stop
// traversing the DOM after the first modification).
let nodeList=[];
while(treeWalker.nextNode()){
nodeList.push(treeWalker.currentNode);
}
//Iterate over all text nodes, calling handleTextNode on each node in the list.
nodeList.forEach(function(el){
handleTextNode(el);
});
}
document.getElementById('clickTo').addEventListener('click',processDocument,false);
<input type="button" id="clickTo" value="Click to process"/>
<div id="testDiv">This text should change to a link -->river<--.</div>
The TreeWalker code was taken from my answer here.
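A variant of the same idea that avoids innerHTML (a sketch; splitByMatches, linkifyTextNode and the href argument are made-up names for illustration): split the text node's data around each match and insert the pieces as a DocumentFragment. No wrapper <span> is added, and any < or & already present in the page text stays text instead of being parsed as markup, which can happen when the text is fed back through innerHTML. The regex must have the g flag, because String.prototype.matchAll throws on a non-global regex.

```javascript
// Split `text` into plain and matched segments. `regex` must be global (/g).
function splitByMatches(text, regex) {
  const segments = [];
  let last = 0;
  for (const m of text.matchAll(regex)) {
    if (m[0] === '') continue; // ignore empty matches
    if (m.index > last) segments.push({ text: text.slice(last, m.index), match: false });
    segments.push({ text: m[0], match: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) segments.push({ text: text.slice(last), match: false });
  return segments;
}

// Replace matches inside one text node with <a> elements (browser DOM).
// Returns true when the node was replaced, false when nothing matched.
function linkifyTextNode(textNode, regex, href) {
  const segments = splitByMatches(textNode.data, regex);
  if (!segments.some((s) => s.match)) return false; // leave the DOM untouched

  const doc = textNode.ownerDocument;
  const fragment = doc.createDocumentFragment();
  for (const s of segments) {
    if (s.match) {
      const a = doc.createElement('a');
      a.href = href;
      a.textContent = s.text; // set as text, never parsed as HTML
      fragment.appendChild(a);
    } else {
      fragment.appendChild(doc.createTextNode(s.text));
    }
  }
  // Inserting a fragment moves its children into place; the fragment itself is not added.
  textNode.parentNode.replaceChild(fragment, textNode);
  return true;
}
```

To use it with processDocument, you would call linkifyTextNode(el, /\briver\b/g, 'https://example.com/') (placeholder URL) instead of handleTextNode(el), keeping the <script>/<style> check. As in the answer, collect the nodes first and modify the DOM afterwards.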

DIV with <br> new line text

In my project I have an XMLHttpRequest response object with some nodes, and I need to print the elements of one of them (serps) in a div, but formatted.
The node looks like this:
Now I have to create a div that stores the serps info, like response.serps1.headline+""+response.serps1.url+""+response.serps2.headline+""+response.serps2.url etc., and in my code I have tried this:
//Data
var divSerp3 = createElement('div', 'divSerp3', 'divSerp3css');
if (typeof(response.serps) === 'undefined' || response.serps === null) {
tse3 = document.createTextNode("NO DATA");
} else {
tse3 = document.createTextNode(response.serps[1].headline+" <br>"+response.serps[1].url+"<br><br>"+response.serps[2].headline+" <br>"+response.serps[2].url+"<br><br>"+response.serps[3].headline+"<br>"+response.serps[3].url+"<br><br>"+response.serps[4].headline+"<br>"+response.serps[4].url+"<br><br>"+response.serps[5].headline+" <br>"+response.serps[5].url);
}
divSerp3.appendChild(tse3);
but the result looks like this:
How can I loop over the entire serps node and insert the data into my div, formatted?
HTML won't be rendered in a TextNode: as the name itself says, its content is purely textual.
I suggest appending the <br> elements separately, and I would not use createTextNode(). I'd append as many children as you need using the appropriate HTML elements (spans, paragraphs, etc.) and fill their content with $(element).html('your content') if you are using the jQuery library, or element.innerHTML if you are working with plain JavaScript.
Hope it helps ;)
Instead of creating a text node, create an element and use innerHTML.
var divSerp3 = createElement('div', 'divSerp3', 'divSerp3css');
if (typeof(response.serps) === 'undefined' || response.serps === null) {
tse3 = document.createTextNode("NO DATA");
} else {
tse3 = document.createElement('span');
tse3.innerHTML = response.serps[1].headline+" <br>"+response.serps[1].url+"<br><br>"+response.serps[2].headline+" <br>"+response.serps[2].url+"<br><br>"+response.serps[3].headline+"<br>"+response.serps[3].url+"<br><br>"+response.serps[4].headline+"<br>"+response.serps[4].url+"<br><br>"+response.serps[5].headline+" <br>"+response.serps[5].url;
}
divSerp3.appendChild(tse3);
You're creating a TextNode which will take all of your html and parse it as text. You want to document.createElement('br') and append those instead of doing +"<br><br>"
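Following that suggestion, here is a sketch that loops over all entries instead of hard-coding indexes 1 to 5. It assumes response.serps is an array (or a plain object keyed by index) whose entries have headline and url properties, as in the question. Object.values handles both shapes, but it also includes entry 0 if one exists, so slice that off if it should be skipped. appendSerps is a made-up helper name.

```javascript
// Append each serp as "headline<br>url", separating entries with <br><br>.
// Only text nodes and <br> elements are created, so nothing is parsed as HTML.
function appendSerps(container, serps) {
  const doc = container.ownerDocument;
  if (serps == null) { // covers both undefined and null
    container.appendChild(doc.createTextNode('NO DATA'));
    return;
  }
  Object.values(serps).forEach((serp, i) => {
    if (i > 0) {
      container.appendChild(doc.createElement('br'));
      container.appendChild(doc.createElement('br'));
    }
    container.appendChild(doc.createTextNode(serp.headline));
    container.appendChild(doc.createElement('br'));
    container.appendChild(doc.createTextNode(serp.url));
  });
}
```

In the question's code this would replace the whole if/else: appendSerps(divSerp3, response.serps);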
HTML won't render when inside a text node. See Is it possible to get the createTextNode method to render html tags?
You could create <br> elements and append them.
Or you could use newlines instead of <br> and use CSS white-space: pre-wrap;
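The white-space alternative needs only one text node. A sketch, assuming the same serps shape as in the question (entries with headline and url); formatSerps and showSerps are made-up names: build the string with \n separators, assign it with textContent, and let CSS preserve the line breaks.

```javascript
// Join headline/url pairs with newlines; a blank line separates entries.
function formatSerps(serps) {
  return Object.values(serps)
    .map((serp) => serp.headline + '\n' + serp.url)
    .join('\n\n');
}

// Browser usage (not run here): the newlines become visible line breaks
// because of white-space: pre-wrap.
function showSerps(div, serps) {
  div.style.whiteSpace = 'pre-wrap';
  div.textContent = formatSerps(serps);
}
```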

jQuery .text('') on multiple nested elements

I wanted to remove all text from the HTML and print only the tags. I ended up writing this:
var html = $('html');
var elements = html.find('*');
elements.text('');
alert(html.html());
It only prints <head></head><body></body>. Wasn't that supposed to print all the tags? I have nearly 2000 tags in the HTML.
var elements = html.find('*');
elements.text('');
That says "find all elements below html, then empty them". That includes body and head. When they are emptied, there are no other elements on the page, so they are the only ones that appear in html's content.
If you really want to remove all text from the page and leave the elements, you'll have to do it with DOM methods:
html.find('*').each(function() { // loop over all elements
$(this).contents().each(function() { // loop through each element's child nodes
if (this.nodeType === 3) { // if the node is a text node
this.parentNode.removeChild(this); // remove it from the document
}
});
})
You just deleted everything from your DOM:
$('html').find('*').text('');
This will set the text of all nodes inside the <html> to the empty string, deleting descendant elements - the only two nodes that are left are the two children of the root node, <head></head> and <body></body> with their empty text node children - exactly the result you got.
If you want to remove all text nodes, you should use this:
var html = document.documentElement;
(function recurse(el) {
// Iterate backwards: removing a child shifts the later siblings down one index
for (var i = el.childNodes.length - 1; i >= 0; i--) {
var child = el.childNodes[i];
if (child.nodeType == 3)
el.removeChild(child);
else
recurse(child);
}
})(html);
alert(html.outerHTML);
Try this instead
$(function(){
var elements = $(document).find("*");
elements.each(function(index, data){
console.log(data);
});
});
This will log all the HTML elements of the page.
lonesomeday seems to have the right path, but you could also do some string rebuilding like this:
var htmlString=$('html').html();
var emptyHtmlString="";
var isTag=false;
for (var i=0;i<htmlString.length;i++)
{
if(htmlString[i]=='<')
isTag=true;
if(isTag)
{
emptyHtmlString+=htmlString[i];
}
if(htmlString[i]=='>')
isTag=false;
}
alert(emptyHtmlString);
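The same scan can be written more compactly with a regular expression that keeps only the <...> runs. Like the loop above, this is a rough sketch (tagsOnly is a made-up name): it treats every < as the start of a tag, so a > inside an attribute value, a comment, or script text will confuse it, and an unclosed < at the very end is dropped rather than copied.

```javascript
// Keep only the markup: every "<" up to the next ">" is treated as a tag.
function tagsOnly(htmlString) {
  const tags = htmlString.match(/<[^>]*>/g);
  return tags ? tags.join('') : '';
}

// Browser usage (not run here):
// alert(tagsOnly($('html').html()));
```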

How to select a part of string?

My code (or example):
<div>some text</div>
$(function(){
$('div').each(function(){
$(this).text($(this).html().replace(/text/, '<span style="color: none">$1<\/span>'));
});
});
I tried this method, but in this case all of the content gets selected too:
$(function(){
$('div:contains("text")').css('color','red');
});
I'm trying to get this:
<div><span style="color: red">text</span></div>
$('div').each(function () {
$(this).html(function (i, v) {
return v.replace(/foo/g, '<span style="color: red">$&<\/span>');
});
});
What are you actually trying to do? What you're doing at the moment is taking the HTML of each matching DIV, wrapping a span around the word "text" if it appears (literally the word "text") and then setting that as the text of the element (and so you'll see the HTML markup on the page).
If you really want to do something with the actual word "text", you probably meant to use html rather than text in your first function call:
$('div').each(function(){
$(this).html($(this).html().replace(/text/, '<span style="color: none">$&<\/span>'));
// ^-- here
});
But if you're trying to wrap a span around the text of the div, you can use wrapInner to do that:
$('div').wrapInner('<span style="color: none"/>');
Like this: http://jsbin.com/ucopo3 (in that example, I've used "color: blue" rather than "color: none", but you get the idea).
$(function(){
$('div:contains("text")').each(function() {
$(this).html($(this).html().replace(/(text)/g, '<span style="color:red;">\$1</span>'));
});
});
I've updated your fiddle: http://jsfiddle.net/nMzTw/15/
The general practice of interacting with the DOM as strings of HTML using innerHTML has many serious drawbacks:
Event handlers are removed or replaced
Opens the possibility of script injection attacks
Doesn't work in XHTML
It also encourages lazy thinking. In this particular instance, you're matching against the string "text" within the HTML with the assumption that any occurrence of the string must be within a text node. This is patently not a valid assumption: the string could appear in a title or alt attribute, for example.
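A short illustration of that failure, using a made-up snippet: a plain string replace on the serialized HTML also rewrites the alt attribute. The inner quote of the inserted style attribute now ends the alt value early, so the browser would parse leftover junk attributes instead of a highlighted word.

```javascript
// Hypothetical markup where "text" appears only inside an attribute value.
const html = '<img src="a.png" alt="some text">';

// The innerHTML-style approach: replace on the serialized HTML string.
const highlighted = html.replace(/text/g, '<span style="color:red">$&</span>');

// The span lands inside the alt attribute's quotes.
console.log(highlighted);
// prints: <img src="a.png" alt="some <span style="color:red">text</span>">
```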
Use DOM methods instead. This will get round all the problems. The following will use only DOM methods to surround every match for regex in every text node that is a descendant of a <div> element:
$(function() {
var regex = /text/;
function getTextNodes(node) {
if (node.nodeType == 3) {
return [node];
} else {
var textNodes = [];
for (var n = node.firstChild; n; n = n.nextSibling) {
textNodes = textNodes.concat(getTextNodes(n));
}
return textNodes;
}
}
$('div').each(function() {
$.each(getTextNodes(this), function() {
var textNode = this, parent = this.parentNode;
var result, span, matchedTextNode, matchLength;
while ( textNode && (result = regex.exec(textNode.data)) ) {
matchedTextNode = textNode.splitText(result.index);
matchLength = result[0].length;
textNode = (matchedTextNode.length > matchLength) ?
matchedTextNode.splitText(matchLength) : null;
span = document.createElement("span");
span.style.color = "red";
span.appendChild(matchedTextNode);
parent.insertBefore(span, textNode);
}
});
});
});

Building editor with DOM Range and content editable

I'm trying to build a text editor using DOM Range. Let's say I'm trying to bold selected text. I do it using the following code. However, I couldn't figure out how I would remove the bold if it's already bolded. I'm trying to accomplish this without using the execCommand function.
this.selection = window.getSelection();
this.range = this.selection.getRangeAt(0);
let textNode = document.createTextNode(this.range.toString());
let replaceElm = document.createElement('strong');
replaceElm.appendChild(textNode);
this.range.deleteContents();
this.range.insertNode(replaceElm);
this.selection.removeAllRanges();
Basically, if the selection range is enclosed in <strong> tags, I'd want to remove it.
Ok so I drafted this piece of code. It basically grabs the currently selected node, gets its textual content and removes the style tags.
// Grab the currently selected node
// e.g. selectedNode will equal '<strong>My bolded text</strong>'
const selectedNode = getSelectedNode();
// "Clean" the selected node. By clean I mean extracting the text
// selectedNode.textContent will return "My bolded text"
// cleanedNode will be a newly created Text node (see the MDN page on Text nodes)
const cleanedNode = document.createTextNode(selectedNode.textContent);
// Remove the strong tag
// Ok so now we have the node prepared.
// We simply replace the existing strong styled node with the "clean" one.
// a.k.a undoing the strong style.
selectedNode.parentNode.replaceChild(cleanedNode, selectedNode);
// This function simply gets the current selected node.
// If you were to select 'My bolded text' it will return
// the node '<strong> My bolded text</strong>'
function getSelectedNode() {
var node,selection;
if (window.getSelection) {
selection = getSelection();
node = selection.anchorNode;
}
if (!node && document.selection) {
selection = document.selection
var range = selection.getRangeAt ? selection.getRangeAt(0) : selection.createRange();
node = range.commonAncestorContainer ? range.commonAncestorContainer :
range.parentElement ? range.parentElement() : range.item(0);
}
if (node) {
return (node.nodeName == "#text" ? node.parentNode : node);
}
};
I don't know if this is a "production" ready solution, but I hope it helps. This should work for simple cases; I don't know how it will behave in more complex ones. With rich text editing things can get quite ugly.
Keep me posted :)
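A possible refinement, sketched under the assumption that the bold markup is a <strong> element: instead of replacing the node with its textContent (which also throws away nested markup such as an <em> inside the bold text, and unwraps whatever element happens to be selected), look upward from the selection's anchor node for the nearest <strong> and move its children out before removing it. findAncestor, unwrap and removeBoldAtSelection are made-up helper names. Like the code above, this does not handle a selection that covers only part of a bold run.

```javascript
// Walk up from `node` (often a text node) to the nearest node named `tagName`.
// nodeName is uppercase for HTML elements, e.g. "STRONG".
function findAncestor(node, tagName) {
  for (let n = node; n; n = n.parentNode) {
    if (n.nodeName === tagName) return n;
  }
  return null;
}

// Remove `el` but keep its children (text and nested elements) in its place.
function unwrap(el) {
  const parent = el.parentNode;
  while (el.firstChild) {
    parent.insertBefore(el.firstChild, el); // moves the child out of `el`
  }
  parent.removeChild(el);
}

// Browser usage (not run here): un-bold the <strong> around the selection.
function removeBoldAtSelection() {
  const strong = findAncestor(window.getSelection().anchorNode, 'STRONG');
  if (strong) unwrap(strong);
  return strong !== null;
}
```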
