Applying RegEx on all text in element

I'm trying to dynamically replace specific words with a link within a certain HTML element using JS. I figured I'd use a simple RegEx:
var regEx = new RegExp('\\b'+text+'\\b', 'gi');
The quick'n'nasty way is to apply the RegEx replace to the context div's innerHTML property:
context.innerHTML = context.innerHTML.replace(regEx, ''+text+"");
The problem with this is that it also applies to, say, image titles, thus breaking the layout of the page. I want it to apply only to the text of the page, if possible also excluding things like header tags and, of course, HTML comments and such.
So I tried something like this instead, but it doesn't seem to work at all:
function replaceText(context, regEx, replace) {
var childNodes = context.childNodes;
for (n in childNodes) {
console.log(childNodes[n].nodeName);
if (childNodes[n] instanceof Text) {
childNodes[n].textContent = childNodes[n].textContent.replace(regEx, replace);
} else if (childNodes[n] instanceof HTMLElement) {
replaceText(childNodes[n], regEx, replace);
console.log('Entering '+childNodes[n].nodeName);
} else {
console.log('Skipping '+childNodes[n].nodeName);
}
}
}
Can anyone see what I'm doing wrong, or maybe come up with a better solution? Thanks!
UPDATE:
Here's a snippet of what the contents of context may look like:
<h4>Newton's Laws of Motion</h4>
<p><span class="inline_title">Law No.1</span>: <span class="caption">An object at rest will remain at rest, and an object in motion will continue to move at constant velocity, unless a net force is applied.</span></p>
<ul>Consequences: <li>Conservation of Momentum in both elastic and inelastic collisions</li>
<li>Conservation of kinetic energy in elastic collisions but not inelastic.</li>
<li>Conservation of angular momentum.</li>
</ul>
<h5>Equations</h5>
<p class="equation">ρ = mv</p>
<p>where ρ is the momentum, and m is the mass of an object moving at constant velocity v.</p>

You can use this:
function replaceText(context, regEx, replace)
{
var childNodes = context.childNodes;
for (var i = 0; i<childNodes.length; i++) {
var childNode = childNodes[i];
if (childNode.nodeType === 3) // 3 is for text node
childNode.nodeValue = childNode.nodeValue.replace(regEx, replace);
else if (childNode.nodeType === 1 && childNode.nodeName != "HEAD")
replaceText(childNode, regEx, replace);
}
}
replaceText(context, /cons/ig, 'GROUIK!');
The idea is to find all text nodes in the "context" DOM tree, which is why I use a recursive function to search for text nodes inside child nodes.
Note: I test childNode.nodeName != "HEAD" in the function. It's only an example of how to skip a particular tag. In real life it is simpler to pass the body node to the function.
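The question also asks to leave header tags and comments alone. A minimal sketch of that variation, assuming a hypothetical SKIPPED_TAGS lookup (not part of the answer above) and an HTML document, where nodeName is uppercase. Comment nodes have nodeType 8, so they are skipped without any extra check:

```javascript
// Hypothetical list of elements whose text should be left untouched.
var SKIPPED_TAGS = {
  H1: true, H2: true, H3: true, H4: true, H5: true, H6: true,
  SCRIPT: true, STYLE: true
};

function replaceTextSkipping(node, regEx, replace) {
  var children = node.childNodes;
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (child.nodeType === 3) {
      // Text node: rewrite its value in place.
      child.nodeValue = child.nodeValue.replace(regEx, replace);
    } else if (child.nodeType === 1 && !SKIPPED_TAGS[child.nodeName]) {
      // Element node that is not on the skip list: recurse into it.
      replaceTextSkipping(child, regEx, replace);
    }
    // Anything else (comments, skipped elements) is ignored.
  }
}
```

Note that a string assigned to nodeValue is inserted as literal text, so this suits plain-text replacements; markup in the replacement would be displayed rather than parsed.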

As I understand it, you're trying to replace text in innerHTML without touching the text inside the tags themselves.
First I tried to use innerText instead of innerHTML, but it did not give the expected result. Later I found @Alan Moore's answer, which uses a negative lookahead like this:
(?![^<>]*>)
which can be used to ignore text within tags (<>). Here is my approach:
var regEx = new RegExp("(?![^<>]*>)" + title, 'gi');
context.innerHTML = context.innerHTML.replace(regEx, ''+text+"");
Here is a sample JSFiddle
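A minimal sketch of how the lookahead might be combined with the word boundaries from the question. The helper names escapeRegExp and linkifyOutsideTags and the href parameter are assumptions made for illustration, and the approach still breaks if a literal > appears inside an attribute value or in the text:

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every whole-word occurrence of `word` in a link, unless the match is
// followed by a ">" with no "<" in between (i.e. it sits inside a tag).
function linkifyOutsideTags(html, word, href) {
  var re = new RegExp('\\b' + escapeRegExp(word) + '\\b(?![^<>]*>)', 'gi');
  return html.replace(re, function (match) {
    // Reuse the matched text so the original capitalisation is kept.
    return '<a href="' + href + '">' + match + '</a>';
  });
}

console.log(linkifyOutsideTags('<img title="momentum chart"> Momentum', 'momentum', '/m'));
// <img title="momentum chart"> <a href="/m">Momentum</a>
```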

Related

Search the HTML document's text for certain strings (and replace those)

I'm writing a Firefox extension. I want to go through the entire plaintext, so not Javascript or image sources, and replace certain strings. I currently have this:
var text = document.documentElement.innerHTML;
var anyRemaining = true;
do {
var index = text.indexOf("search");
if (index != -1) {
// This does not just replace the string with something else,
// there's complicated processing going on here. I can't use
// string.replace().
} else {
anyRemaining = false;
}
} while (anyRemaining);
This works, but it will also go through non-text elements and HTML such as Javascript, and I only want it to do the visible text. How can I do this?
I'm currently thinking of detecting an open bracket and continuing at the next closing bracket, but there might be better ways to do this.

You can use xpath to get all the text nodes on the page and then do your search/replace on those nodes:
function replace(search,replacement){
var xpathResult = document.evaluate(
"//*/text()",
document,
null,
XPathResult.ORDERED_NODE_ITERATOR_TYPE,
null
);
var results = [];
// We store the result in an array because if the DOM mutates
// during iteration, the iteration becomes invalid.
var res; // declared locally so the loop does not create a global
while ((res = xpathResult.iterateNext())) {
results.push(res);
}
results.forEach(function(res){
res.textContent = res.textContent.replace(search,replacement);
})
}
replace(/Hello/g,'Goodbye');
<div class="Hello">Hello world!</div>
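Where the per-match work is too involved for a single replace() call, as the question says, the indexOf loop can be run on each text node's value instead of on innerHTML. A sketch, where processOccurrences and its transform callback are hypothetical names standing in for the real processing:

```javascript
// Walk through every occurrence of `search` in `text` and let `transform`
// decide what to put in its place. Returns the rebuilt string.
function processOccurrences(text, search, transform) {
  if (!search) {
    return text; // an empty search string would never advance
  }
  var out = '';
  var from = 0;
  var index;
  while ((index = text.indexOf(search, from)) !== -1) {
    // Copy the untouched part, then the processed match.
    out += text.slice(from, index) + transform(search, index);
    from = index + search.length;
  }
  return out + text.slice(from);
}

console.log(processOccurrences('a search b search', 'search', function (match, index) {
  return '[' + index + ']';
}));
// a [2] b [11]
```

Inside the forEach above, the result would be assigned back to res.textContent in place of the replace() call.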

You can either use a regex to strip the HTML tags or, perhaps more easily, use a JavaScript function that returns the text without the HTML. See this for more details:
How can get the text of a div tag using only javascript (no jQuery)

Make text content between specified HTML tags toUpperCase in React-Native

I want to convert the contents of specific HTML tags to uppercase with plain JavaScript in a React-Native application.
Note: This is a React-Native application. There is no JS document object available, nor jQuery. Likewise, CSS text-transform: uppercase cannot be used because the content will not be displayed in a web browser.
Let's say, there is the following HTML text:
<p>This is an <mytag>simple Example</mytag></p>
The content of the Tag <mytag> shall be transformed to uppercase:
<p>This is an <mytag>SIMPLE EXAMPLE</mytag></p>
I tried this code:
let regEx = storyText.match(/<mytag>(.*?)<\/mytag>/g)
if(regEx) storyText = regEx.map(function(val){
return val.toUpperCase();
});
But the map() function returns only the matched content instead of the whole string variable with the transformed part of <mytag>.
Also, the match() method will return null, if the tag wasn't found. So a fluent programming style like storyText.match().doSomething isn't possible.
Since there are more tags to transform, an approach where I can pass variables to the regex-pattern would be appreciated.
Any hints to solve this?
(This code is used in a React-Native-App with the react-native-html-view Plugin which doesn't support text-transform out of the box.)

Since it seems that document and DOM manipulation (i.e., through jQuery and native JS document functions) are off limits, I guess you do have to use regex.
Then why not just create a function that does a job like the above: looping through each tag and replacing it via regex?
var storyText = "your HTML in a string";
function tagsToUppercase(tags) {
for (const tag of tags) { // iterate over the tag names without leaking a global
let regex = new RegExp("(<" + tag + ">)([^<]+)(</" + tag + ">)", "g");
storyText = storyText.replace(regex, function(match, g1, g2, g3) {
return g1 + g2.toUpperCase() + g3;
});
}
}
// uppercase all <div>, <p>, <span> for example
tagsToUppercase(["div", "p", "span"]);
See it working on JSFiddle.
Also, although it probably doesn't apply to this case (@Bergi urged me to remind you), try to avoid using regular expressions to manipulate the DOM.

Edit, Updated
The content of the Tag <mytag> shall be transformed to uppercase:
<p>This is an <mytag>SIMPLE EXAMPLE</mytag></p>
You can use String.prototype.replace() with the RegExp /(<mytag>)(.*?)(<\/mytag>)/g to create three capture groups, and call .toUpperCase() on the second capture group:
let storyText = "<p>This is an <mytag>simple Example</mytag></p>";
let regEx = storyText.replace(/(<mytag>)(.*?)(<\/mytag>)/g
, function(val, p1, p2, p3) {
return p1 + p2.toUpperCase() + p3
});
console.log(regEx);

In general, you shouldn't be parsing HTML with JavaScript. With that in mind, if this is what you truly need to do, then try something like this:
let story = '<p>smallcaps</p><h1>heading</h1><div>div</div><p>stuff</p>';
console.log( story.replace(/<(p|span|div)>([^<]*)<\/(p|span|div)>/ig,
(fullmatch, startag,content,endtag) => `<${startag}>${content.toUpperCase()}</${endtag}>` )
)
Consider the cases where you might have nested values, p inside a div, or an a or strong or em inside your p. For those cases this doesn't work.
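One possible refinement, sketched here as a pure function with a backreference so that the closing tag has to match the opening one. The name uppercaseTags is hypothetical, the tag names are assumed to be plain identifiers such as "mytag", and nested tags are still not handled:

```javascript
// Uppercase the text between <tag> and </tag> for each listed tag name.
// Only matches elements without attributes and without nested tags.
function uppercaseTags(html, tagNames) {
  if (tagNames.length === 0) {
    return html;
  }
  // \1 refers back to whichever tag name the first group matched.
  var re = new RegExp('<(' + tagNames.join('|') + ')>([^<]*)</\\1>', 'gi');
  return html.replace(re, function (match, tag, content) {
    return '<' + tag + '>' + content.toUpperCase() + '</' + tag + '>';
  });
}

console.log(uppercaseTags('<p>This is an <mytag>simple Example</mytag></p>', ['mytag']));
// <p>This is an <mytag>SIMPLE EXAMPLE</mytag></p>
```

Because the function returns the new string instead of changing a shared variable, it can be chained without a null check.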

Why not this way ?
$("mytag").text($("mytag").text().toUpperCase())
https://jsfiddle.net/gub61haL/

How can I truncate the text contents of an Element while preserving HTML?

I realize that there are several similar questions here but none of the answers solve my case.
I need to be able to take the innerHTML of an element and truncate it to a given character length with the text contents of any inner HTML element taken into account and all HTML tags preserved.
I have found several answers that cover this portion of the question fine as well as several plugins which all do exactly this.
However, in all cases the solution will truncate directly in the middle of any inner elements and then close the tag.
In my case I need the contents of all inner tags to remain intact, essentially allowing any "would be" truncated inner tags to exceed the given character limit.
Any help would be greatly appreciated.
EDIT:
For example:
This is an example of a link inside another element
The above is 51 characters long including spaces. If I wanted to truncate this to 23 characters, we would have to shorten the text inside the <a> tag, which is exactly what most solutions out there do.
This would give me the following:
This is an example of a
However, for my use case I need to keep any remaining visible tags completely intact and not truncated in any way.
So given the above example, the final output I would like, when attempting to truncate to 23 characters is the following:
This is an example of a link
So essentially we are checking where the truncation takes place. If it is outside of an element we can split the HTML string to exactly that length. If on the other hand it is inside an element, we move to the closing tag of that element, repeating for any parent elements until we get back to the root string and split it there instead.
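The rule described above can be sketched on the HTML string with a depth counter. This is only an illustration under stated assumptions: the markup is well formed, the function name truncateKeepingElements and the short void-tag list are made up for the example, and character entities are counted by their source length rather than as one character:

```javascript
// Cut at exactly maxVisible text characters when the cut falls outside any
// element; otherwise keep going until the outermost enclosing element closes.
function truncateKeepingElements(html, maxVisible) {
  var tokenRe = /<[^>]+>|[^<]+/g; // either one tag or one run of text
  var voidTags = { br: true, hr: true, img: true, input: true }; // partial list
  var out = '';
  var depth = 0; // how many elements are currently open
  var seen = 0;  // text characters emitted so far
  var token;
  while ((token = tokenRe.exec(html)) !== null) {
    if (depth === 0 && seen >= maxVisible) {
      break; // quota reached and we are back at the root level
    }
    var t = token[0];
    if (t.charAt(0) !== '<') {
      // Text inside an element is kept whole; root-level text is cut.
      var piece = depth === 0 ? t.slice(0, maxVisible - seen) : t;
      seen += piece.length;
      out += piece;
      continue;
    }
    var m = /^<(\/?)([a-zA-Z][\w-]*)/.exec(t);
    if (m && !voidTags[m[2].toLowerCase()] && t.slice(-2) !== '/>') {
      depth = Math.max(0, depth + (m[1] ? -1 : 1));
    }
    out += t;
  }
  return out;
}

console.log(truncateKeepingElements(
  'This is an example of <a href="#">a link</a> inside another element', 23));
// This is an example of <a href="#">a link</a>
```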

It sounds like you'd like to be able to truncate the length of your HTML string as a text string, for example consider the following HTML:
'<b>foo</b> bar'
In this case the HTML is 14 characters in length and the text is 7. You would like to be able to truncate it to X text characters (for example 2) so that the new HTML is now:
'<b>fo</b>'
Disclosure: My answer uses a library I developed.
You could use the HTMLString library - Docs : GitHub.
The library makes this task pretty simple. To truncate the HTML as we've outlined above (e.g to 2 text characters) using HTMLString you'd use the following code:
var myString = new HTMLString.String('<b>foo</b> bar');
var truncatedString = myString.slice(0, 2);
console.log(truncatedString.html());
EDIT: After additional information from the OP.
The following truncate function truncates to the last full tag and caters for nested tags.
function truncate(str, len) {
// Convert the string to a HTMLString
var htmlStr = new HTMLString.String(str);
// Check the string needs truncating
if (htmlStr.length() <= len) {
return htmlStr; // return the HTMLString so callers can always call .html()
}
// Find the closing tag for the character we are truncating to
var tags = htmlStr.characters[len - 1].tags();
var closingTag = tags[tags.length - 1];
// Find the last character to contain this tag
for (var index = len; index < htmlStr.length(); index++) {
if (!htmlStr.characters[index].hasTags(closingTag)) {
break;
}
}
return htmlStr.slice(0, index);
}
var myString = 'This is an <b>example ' +
'of a link ' +
'inside</b> another element';
console.log(truncate(myString, 23).html());
console.log(truncate(myString, 18).html());
This will output:
This is an <b>example of a link</b>
This is an <b>example of a link inside</b>

Although HTML is notorious for being terribly formed and has edge cases which are impervious to regex, here is a super light way you could hackily handle HTML with nested tags in vanilla JS.
(function(s, approxNumChars) {
var taggish = /<[^>]+>/g;
s = s.slice(0, approxNumChars); // ignores tag lengths for solution brevity
s = s.replace(/<[^>]*$/, ''); // rm any trailing partial tags
var tags = s.match(taggish) || []; // match() returns null when there are no tags
// find out which tags are unmatched
var openTagsSeen = [];
for (var i = 0; i < tags.length; i++) {
var tag = tags[i];
if (tag.charAt(1) !== '/') { // an opening tag
openTagsSeen.push(tag);
}
else {
// quick version that assumes your HTML is correctly formatted (alas) -- else we would have to check the content inside for matches and loop through the opentags
openTagsSeen.pop();
}
}
// reverse and close unmatched tags
openTagsSeen.reverse();
for (var j = 0; j < openTagsSeen.length; j++) {
s += '</' + openTagsSeen[j].match(/\w+/)[0] + '>';
}
return s + '...';
})
In a nutshell: truncate it (ignores that some chars will be invisible), regex match the tags, push open tags onto a stack, and pop off the stack as you encounter closing tags (again, assumes well-formed); then close any still-open tags at the end.
(If you want to actually get a certain number of visible characters, you can keep a running counter of how many non-tag chars you've seen so far, and stop the truncation when you fill your quota.)
DISCLAIMER: You shouldn't use this as a production solution, but if you want a super light, personal, hacky solution, this will get basic well-formed HTML.
Since it's blind and lexical, this solution misses a lot of edge cases, including tags that should not be closed, like <img>, but you can hardcode those edge cases or, you know, include a lib for a real HTML parser if you want. Fortunately, since HTML is poorly formed, you won't see it ;)
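The running-counter idea mentioned above could look roughly like this. It is a sketch with the same caveats as the answer: well-formed HTML is assumed, truncateVisible and the short void-tag list are hypothetical, and an entity such as &amp; is counted as five characters:

```javascript
// Keep at most maxVisible characters of text, copy tags through without
// counting them, and close whatever is still open at the cut point.
function truncateVisible(html, maxVisible) {
  var tokenRe = /<[^>]+>|[^<]+/g; // either one tag or one run of text
  var voidTags = { br: true, hr: true, img: true, input: true }; // partial list
  var out = '';
  var open = []; // stack of currently open tag names
  var seen = 0;  // visible characters emitted so far
  var token;
  while (seen < maxVisible && (token = tokenRe.exec(html)) !== null) {
    var t = token[0];
    if (t.charAt(0) !== '<') {
      var piece = t.slice(0, maxVisible - seen);
      seen += piece.length;
      out += piece;
      continue;
    }
    var m = /^<(\/?)([a-zA-Z][\w-]*)/.exec(t);
    if (m && m[1]) {
      open.pop(); // closing tag
    } else if (m && !voidTags[m[2].toLowerCase()] && t.slice(-2) !== '/>') {
      open.push(m[2]); // opening tag that will need a closing tag
    }
    out += t;
  }
  while (open.length) {
    out += '</' + open.pop() + '>';
  }
  return out;
}

console.log(truncateVisible('<b>foo</b> bar', 2)); // <b>fo</b>
```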

You've tagged your question regex, but you cannot reliably do this with regular expressions. Obligatory link. So innerHTML is out.
If you're really talking characters, I don't see a way to do it other than to loop through the nodes within the element, recursing into descendant elements, totalling up the lengths of the text nodes you find as you go. When you find the point where you need to truncate, you truncate that text node and then remove all following ones — or probably better, you split that text node into two parts (using splitText) and move the second half into a display: none span (using insertBefore), and then move all subsequent text nodes into display: none spans. (This makes it much easier to undo it.)

Thanks to T.J. Crowder I soon came to the realization that the only way to do this with any kind of efficiency is to use the native DOM methods and iterate through the elements.
I've knocked up a quick, reasonably elegant function which does the trick.
function truncate(rootNode, max){
//Text method for cross browser compatibility
var text = ('innerText' in rootNode)? 'innerText' : 'textContent';
//If the total number of characters does not exceed the limit, short circuit
if(rootNode[text].length <= max){ return; }
var cloneNode = rootNode.cloneNode(true),
currentNode = cloneNode,
//Create DOM iterator to loop only through text nodes
ni = document.createNodeIterator(currentNode, NodeFilter.SHOW_TEXT),
frag = document.createDocumentFragment(),
len = 0;
//loop through text nodes
while (currentNode = ni.nextNode()) {
//if nodes parent is the rootNode, then we are okay to truncate
if (currentNode.parentNode === cloneNode) {
//if we are in the root textNode and the character length exceeds the maximum, truncate the text, add to the fragment and break out of the loop
if (len + currentNode[text].length > max){
currentNode[text] = currentNode[text].substring(0, max - len);
frag.appendChild(currentNode);
break;
}
else{
frag.appendChild(currentNode);
}
}
//If not, simply add the node to the fragment
else{
frag.appendChild(currentNode.parentNode);
}
//Track current character length
len += currentNode[text].length;
}
rootNode.innerHTML = '';
rootNode.appendChild(frag);
}
This could probably be improved, but from my initial testing it is very quick, probably due to using the native DOM methods and it appears to do the job perfectly for me. I hope this helps anyone else with similar requirements.
DISCLAIMER: The above code only deals with HTML tags one level deep; it does not deal with tags inside tags. It could easily be modified to do so by keeping track of each node's parent and appending the nodes to the correct place in the fragment. As it stands, this is fine for my requirements but may not be useful to others.

Replace text at top-level of node without breaking formatting of children

If I have the following html code:
<div id="element">
Qwerty Foo Bar
<span style="color: red;">This text should/will never be changed by the script.</span>
</div>
And I want to change "Foo" to "baz", I can do the following:
var element = document.getElementById('element');
element.innerText = element.innerText.replace('Foo', 'baz');
However this will destroy the formatting of the red text.
How can you do this?
I don't need cross-browser support, only support for chrome and I don't want to use a js framework. jsFiddle: http://jsfiddle.net/cLzJD/3/

You can iterate over the children and only modify text nodes:
var children = element.childNodes,
child;
for(var i = children.length; i--; ) {
child = children[i];
if(child.nodeType === 3) {
child.nodeValue = child.nodeValue.replace('Foo', 'baz');
}
}
DEMO
Notes:
If you want to replace all occurrences of Foo, you have to use a regular expression: replace(/Foo/g, 'baz').
The advantage of this approach is that event handlers bound through JavaScript will stay intact. If you don't need this, innerHTML will work as well.

Although @Felix Kling's solution is the best approach, in your special case you could use .innerHTML instead of .innerText.
element.innerHTML = element.innerHTML.replace('Foo', 'baz');
.replace() will only replace the first occurrence, so if you're sure there is no HTML content before your text, you can use it. Otherwise it could break your HTML.
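A small string example of the breakage mentioned above, using made-up markup in which the search word also appears in an attribute before it appears in the text:

```javascript
// The first "Foo" in the HTML source is the class name, not the visible text.
var html = '<span class="Foo">Foo</span>';
var replaced = html.replace('Foo', 'baz');

console.log(replaced); // <span class="baz">Foo</span>
```

The visible text is unchanged while the class attribute has been rewritten, which is why replacing in text nodes is the safer route.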

You are losing the formatting because you're using innerText (which will return the contents with all the HTML stripped out). Just use innerHTML instead: http://jsfiddle.net/cLzJD/4/ (I've also changed the ids to be unique).

JavaScript search for all occurrences of a character in the DOM?

I would like to find all occurrences of the $ character in the DOM. How is this done?

You can't do something semantic like wrap $4.00 in a span element?
<span class="money">$4.00</span>
Then you would find elements belonging to class 'money' and manipulate them very easily. You could take it a step further...
<span class="money">$<span class="number">4.00</span></span>
I don't like being a jQuery plugger... but if you did that, jQuery would probably be the way to go.

One way to do it, though probably not the best, is to walk the DOM to find all the text nodes. Something like this might suffice:
var elements = document.getElementsByTagName("*");
var i, j, nodes;
for (i = 0; i < elements.length; i++) {
nodes = elements[i].childNodes;
for (j = 0; j < nodes.length; j++) {
if (nodes[j].nodeType !== 3) { // Node.TEXT_NODE
continue;
}
// regexp search or similar here
}
}
although, this would only work if the $ character was always in the same text node as the amount following it.

You could just use a Regular Expression search on the innerHTML of the body tag:
For instance - on this page:
var body = document.getElementsByTagName('body')[0];
var dollars = body.innerHTML.match(/\$[0-9]+\.?[0-9]*/g)
Results (at the time of my posting):
["$4.00", "$4.00", "$4.00"]

The easiest way to do this if you just need a bunch of strings and don't need a reference to the nodes containing $ would be to use a regular expression on the body's text content. Be aware that innerText and textContent aren't exactly the same. The main difference that could affect things here is that textContent contains the contents of <script> elements whereas innerText does not. If this matters, I'd suggest traversing the DOM instead.
var b = document.body, bodyText = b.textContent || b.innerText || "";
var matches = bodyText.match(/\$[\d.]*/g);
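The pattern /\$[\d.]*/ also matches a lone $ and swallows a trailing full stop. If only well-formed amounts are wanted, a stricter pattern is one option. A sketch, with findDollarAmounts as a hypothetical helper name:

```javascript
// Return every "$" followed by digits, with an optional decimal part.
// Returns an empty array instead of null when nothing matches.
function findDollarAmounts(text) {
  return text.match(/\$\d+(?:\.\d+)?/g) || [];
}

console.log(findDollarAmounts('Costs $4.00. Or $5, not $.'));
// [ '$4.00', '$5' ]
```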

I'd like to add my 2 cents for Prototype. Prototype has some very simple DOM traversal functions that might do exactly what you are looking for.
Edit: here's a better answer.
The descendants() function collects all of the children and their children, and lets you enumerate them using the each() function:
$('body').descendants().each(function(item){
if(item.innerHTML.match(/\$/))
{
// Do Fun scripts
}
});
or if you want to start from document
Element.descendants(document).each(function(item){
if(item.innerHTML.match(/\$/))
{
// Do Fun scripts
}
});
