JavaScript regex that matches the .innerHTML attribute of any element

I am currently building a Chrome extension that has to find specific pages on a website: the Log In / Sign In page, the Sign Up / Register page, the About page and the Contact Us page.
I am trying to achieve this by first getting the list of elements in the page (which I have already done). Now I need to check each element's innerHTML so that I only match elements that are leaf nodes in the DOM and contain part of the keyword, and I am trying to do this with a regex. I managed to build a regex that successfully matches what's inside a start or end tag of an element (i.e. the tag name along with its attributes), but not the innerHTML. Below is what I have done so far (with the example for the About page):
var list = document.body.getElementsByTagName("*");
var aboutElement = /^[^<.+>].*About.*[^(<.+>]$/i;
// i < list.length: with <= the last iteration reads list[list.length], which is undefined
for (var i = 0; i < list.length; i++) {
    if ((aboutElement.test(list[i].innerHTML)) || (aboutElement.test(list[i].alt))) {
        list[i].click();
    }
}
Any idea what I should add to it such that it only matches leaf nodes (nodes which do not contain other nodes) and not what's in a start or end tag? I also think that with what I've done it's going to match everything in the innerHTML because of the .* part so I may need to change that as well. Any help would be greatly appreciated!

Thanks to two of the answers in the comments I managed to solve the problem. I used .textContent and changed the regex as shown below and it worked.
var list = document.body.getElementsByTagName("*");
var aboutElement = /^(.*?\s*(\bAbout\b)[^$]*)$/i;
// i < list.length: with <= the last iteration reads list[list.length], which is undefined
for (var i = 0; i < list.length; i++) {
    if ((aboutElement.test(list[i].textContent)) || (aboutElement.test(list[i].alt))) {
        list[i].click();
    }
}
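
The solution above switches to .textContent but does not yet restrict the match to leaf nodes, which the question also asked for. One possible way to add that check is sketched below, assuming "leaf" means an element with no child elements (children.length === 0). The helper names findLeafMatches and escapeRegExp are hypothetical and not part of the original code.

```javascript
// Sketch only: keep elements that have no child elements and whose text
// (or alt attribute) contains the keyword as a whole word.
// findLeafMatches and escapeRegExp are hypothetical helper names.
function escapeRegExp(text) {
  // Escape characters that have a special meaning inside a regex.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function findLeafMatches(elements, keyword) {
  var pattern = new RegExp("\\b" + escapeRegExp(keyword) + "\\b", "i");
  var matches = [];
  for (var i = 0; i < elements.length; i++) {
    var el = elements[i];
    var isLeaf = el.children.length === 0; // no element children
    var textMatches = pattern.test(el.textContent || "");
    var altMatches = pattern.test(el.alt || "");
    if (isLeaf && (textMatches || altMatches)) {
      matches.push(el);
    }
  }
  return matches;
}

// In the extension this might be called roughly as:
// findLeafMatches(document.body.getElementsByTagName("*"), "About");
```

Because only leaves are kept, a wrapper such as the body element is skipped even though its textContent also contains the keyword.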

Related

What is a safe alternative to .innerHTML and Jquery .append?

My Firefox add-on was removed because of an unsafe assignment to innerHTML. I replaced it with jQuery's append(), but that is also flagged as unsafe. What can I use instead?
I need to add content dynamically to the extension's DOM. Mozilla gave me this link when they took down the add-on: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Safely_inserting_external_content_into_a_page
But I cannot figure out from the link how to add nodes to an existing node in a safe way.
This is an example of what I'm doing now. I have several places where I append a node to another node. In this case I need to make a div for every category and add it to the node "folders":
var folders = $("#folders");
for (var i = 0; i < categories.length; i++) {
    var s = categories[i];
    var categoryFolder = $("<div>");
    categoryFolder.addClass("folder");
    categoryFolder.attr("id", s);
    categoryFolder.text(s);
    folders.append(categoryFolder);
}

JavaScript + Regex:: Replace "foo" with "bar" across entire document, excluding URLs

I'm trying to replace all instances of "foo" on a page with "bar", but to exclude instances occurring within image or URL links.
The current code I have is a simple replace:
document.documentElement.innerHTML = document.documentElement.innerHTML.replace(/foo/g, "bar");
But it breaks images and links containing "foo" in the address.
I'm looking for a regular expression replacement that will take the following:
foo
barfoo
foo
<img src="foo.jpg">
And give me:
bar
barbar
bar
<img src="foo.jpg">
If this can't be accomplished with regex in JavaScript, would there be a more elegant way to only run the replacement against non-URL strings?

Yeah, you're not going to want to use regex to do this. What you want to do is replace the text of every text node in your DOM tree. Try something like this.
var allElements = document.getElementsByTagName("*"); // Get every element.
for (var i = 0; i < allElements.length; i++) {
    var children = allElements.item(i).childNodes;
    for (var j = 0; j < children.length; j++) {
        if (children[j].nodeType === 3 /* is this node a text node? */) {
            // Run your replacement regex here; /foo/g and "bar" come from the question.
            children[j].nodeValue = children[j].nodeValue.replace(/foo/g, "bar");
        }
    }
}

There are 2 problems to solve.
Firstly, you need to get all the text nodes. This is a problem in and of itself.
This Stack Overflow thread discusses some techniques: "getElementsByTagName() equivalent for textNodes".
Secondly, once you have your text nodes, you can run your regex on each node and be fairly certain that you got everything.
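
The two steps could be sketched roughly as follows: a recursive walk that gathers the text nodes, then a replacement run on each node's nodeValue. This is only a sketch, not the method from the linked thread. The names collectTextNodes and replaceInTextNodes are hypothetical. It relies only on the standard childNodes, nodeType and nodeValue properties, so attributes such as src and href are never touched.

```javascript
// Sketch only: collectTextNodes and replaceInTextNodes are hypothetical names.
var TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers

// Step 1: recursively gather every text node below root.
function collectTextNodes(root, found) {
  found = found || [];
  var children = root.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (child.nodeType === TEXT_NODE) {
      found.push(child);
    } else {
      collectTextNodes(child, found);
    }
  }
  return found;
}

// Step 2: run the replacement on each text node and return how many were visited.
function replaceInTextNodes(root, pattern, replacement) {
  var nodes = collectTextNodes(root);
  for (var i = 0; i < nodes.length; i++) {
    nodes[i].nodeValue = nodes[i].nodeValue.replace(pattern, replacement);
  }
  return nodes.length;
}

// In a page this might be called as:
// replaceInTextNodes(document.body, /foo/g, "bar");
```

Note that text inside script and style elements is also stored in text nodes, so a real page may need to skip those elements.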

Delete &nbsp; from all elements of the same class

I was looking for a way to search through all elements of the same class and remove any &nbsp; entities, as these were causing unwanted gaps in my page layout.
Initially I used this code:
var el = document.querySelector('.offer');
el.innerHTML = el.innerHTML.replace('&nbsp;', '');
But this only finds the first node with the class of offer, so it isn't much use.

I'm answering my own question because I had to piece it together from a number of posts on here as well as other sites, and I hope it helps others in my position.
Firstly, I needed to use .querySelectorAll instead of .querySelector to return all elements with the offer class.
But the next line won't work, since .querySelectorAll returns a list of nodes (a NodeList) rather than just the first one it comes across, and the list itself has no innerHTML property.
el.innerHTML = el.innerHTML.replace('&nbsp;', '');
The solution is to loop through each element and replace each instance of &nbsp;
var el = document.querySelectorAll('.offer');
for (var i = 0; i < el.length; i++) {
    // A regex with the g flag replaces every occurrence; a string pattern replaces only the first.
    el[i].innerHTML = el[i].innerHTML.replace(/&nbsp;/g, '');
}
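
One detail worth checking in the loop above: String.prototype.replace with a string pattern replaces only the first occurrence, so an element containing the unwanted text several times would keep the rest. The sketch below illustrates the difference on plain strings, assuming the text being removed is the `&nbsp;` entity. The function names stripFirstNbsp and stripAllNbsp are hypothetical.

```javascript
// Sketch only: stripFirstNbsp and stripAllNbsp are hypothetical names.

// String pattern: only the first "&nbsp;" is removed.
function stripFirstNbsp(html) {
  return html.replace("&nbsp;", "");
}

// Regex with the g (global) flag: every "&nbsp;" is removed.
function stripAllNbsp(html) {
  return html.replace(/&nbsp;/g, "");
}
```

Inside the loop, the global version would be applied as el[i].innerHTML = stripAllNbsp(el[i].innerHTML).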

javascript string.search not working in chrome

I am having a peculiar issue with Chrome at the moment... here's what I'm trying to accomplish:
I have a series of table sections, which have been identified with their IDs accordingly, like this:
T = Tab
G = Group within Tab
S = Sub-Group within Group
# = Numerical index
for example:
<tr id="T1"> = Tab 1
<td id="T1G3"> = Tab 1 , Group 3
<td id="T1G3S1"> = Tab 1, Group 3, Sub-Group 1
Pretty straightforward so far... with the help of JavaScript, I also aim to enable or disable these groups on the form. Now, here's the problem I'm having... when my form loads the first time, I want to disable everything on the form that requires it. To do so, I created a dynamic function that does that for me: I specify which tags are affected and what to look for within the IDs of those tags, and if a match occurs, the element is disabled, like this:
Pseudo and Definition:
Function DisableAll(string TagNamesCSArray, string RegExpContent)
{
Split the tag names provided into an array
- loop through the array and get all tags using document.getElementsByTagName() within page
-- if tags are found
--- loop through collection of tags/elements found
---- if the ID of the element is present, and MATCHES the RegExp in any way
----- disable that item
---- end if
--- end loop
-- end if
- end loop
}
This was fairly easy to implement, and this is the final result:
function DisableAll(TagNames, RegExpStr)
{
    // declare local vars
    var tagarr = TagNames.split(",");
    var collection1;
    var IdReg = new RegExp(RegExpStr);
    var i, y; // y is declared here so it does not leak as an implicit global
    // loop through getting all the tags
    for (i = 0; i < tagarr.length; i++)
    {
        collection1 = document.getElementsByTagName(tagarr[i].toString());
        // loop through the collection of items found, if found
        if (collection1)
        {
            for (y = 0; y < collection1.length; y++)
            {
                if (collection1[y].getAttribute("id") != null)
                {
                    if (collection1[y].getAttribute("id").toString().search(IdReg) != -1)
                    {
                        collection1[y].disabled = true;
                    }
                }
            }
        }
    }
    return;
}
And then I place a call to it like this:
DisableAll("tr,td", "^T|^T[0-9]S");
Seems simple, yes? "Hannnn!" Wrong answer, Batman... This works PERFECTLY in all browsers except Chrome. Now why is that? I don't understand. Maybe there's something wrong with my RegExp?
Any help would be greatly appreciated.
Cheers!
MaxOvrdrv

In my case the regex matches all possibilities. But the line collection1[y].disabled = true; has no effect, because disabled is not a standard property of tr or td elements; it only has an effect on form controls such as input, select and button.
BTW: The second part of your regex is unnecessary, because "^T" already matches every ID that begins with T, whatever follows it.
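
The point about the redundant alternative can be checked on plain strings. The sketch below runs both patterns over a few sample IDs. The helper matchingIds and the stricter pattern are hypothetical illustrations derived from the T/G/S naming scheme in the question, not part of the original code.

```javascript
// Sketch only: matchingIds and strictPattern are hypothetical illustrations.
var originalPattern = /^T|^T[0-9]S/; // pattern from the question
var simplifiedPattern = /^T/;        // the first alternative alone
// Hypothetical stricter pattern: T<n>, optionally G<n>, optionally S<n>.
var strictPattern = /^T\d+(G\d+(S\d+)?)?$/;

function matchingIds(ids, pattern) {
  return ids.filter(function (id) {
    return id.search(pattern) !== -1;
  });
}

var sampleIds = ["T1", "T1G3", "T1G3S1", "X1", "Total"];
var withOriginal = matchingIds(sampleIds, originalPattern);
var withSimplified = matchingIds(sampleIds, simplifiedPattern);
var withStrict = matchingIds(sampleIds, strictPattern);
```

Here withOriginal and withSimplified contain the same IDs, including "Total", while withStrict keeps only the IDs that follow the naming scheme.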

Javascript search for all occurrences of a character in the DOM?

I would like to find all occurrences of the $ character in the DOM. How is this done?

You can't do something semantic like wrap $4.00 in a span element?
<span class="money">$4.00</span>
Then you would find elements belonging to class 'money' and manipulate them very easily. You could take it a step further...
<span class="money">$<span class="number">4.00</span></span>
I don't like being a jQuery plugger... but if you did that, jQuery would probably be the way to go.

One way to do it, though probably not the best, is to walk the DOM to find all the text nodes. Something like this might suffice:
var elements = document.getElementsByTagName("*");
var i, j, nodes;
for (i = 0; i < elements.length; i++) {
    nodes = elements[i].childNodes;
    for (j = 0; j < nodes.length; j++) {
        if (nodes[j].nodeType !== 3) { // Node.TEXT_NODE
            continue;
        }
        // regexp search or similar here
    }
}
although, this would only work if the $ character was always in the same text node as the amount following it.

You could just use a Regular Expression search on the innerHTML of the body tag:
For instance - on this page:
var body = document.getElementsByTagName('body')[0];
var dollars = body.innerHTML.match(/\$[0-9]+\.?[0-9]*/g)
Results (at the time of my posting):
["$4.00", "$4.00", "$4.00"]

The easiest way to do this if you just need a bunch of strings and don't need a reference to the nodes containing $ would be to use a regular expression on the body's text content. Be aware that innerText and textContent aren't exactly the same. The main difference that could affect things here is that textContent contains the contents of <script> elements whereas innerText does not. If this matters, I'd suggest traversing the DOM instead.
var b = document.body, bodyText = b.textContent || b.innerText || "";
var matches = bodyText.match(/\$[\d.]*/g);
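
Note that /\$[\d.]*/g also matches a bare "$" and swallows a trailing full stop, as in "$4.00.". If only amounts are wanted, a slightly stricter pattern could be used, as in the sketch below. The function name findDollarAmounts is hypothetical, and the pattern assumes amounts are written as digits with an optional decimal part and no thousands separators.

```javascript
// Sketch only: findDollarAmounts is a hypothetical name.
// Matches "$" followed by digits and an optional ".digits" part.
function findDollarAmounts(text) {
  // match() returns null when nothing is found, so fall back to an empty array.
  return text.match(/\$\d+(?:\.\d+)?/g) || [];
}

// In a page this might be called as:
// findDollarAmounts(document.body.textContent || "");
```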

I'd like to add my 2 cents for Prototype. Prototype has some very simple DOM traversal functions that might get exactly what you are looking for.
The descendants() function collects all of an element's children, and their children, and allows them to be enumerated using the each() function:
// $() looks elements up by id in Prototype, so pass the body element itself
$(document.body).descendants().each(function(item){
    if (item.innerHTML.match(/\$/))
    {
        // Do Fun scripts
    }
});
or, if you want to start from document:
Element.descendants(document).each(function(item){
    if (item.innerHTML.match(/\$/))
    {
        // Do Fun scripts
    }
});
