Search on page with javascript


I have an HTML page and I want to find some data on it. The main trouble is that the page is generated on a server, and I want to write JavaScript code on my local machine and run it against that page. How can I write and run JavaScript locally so that it finds text, or gets an element by id or class?
Note, this is important: pure JavaScript only, no libraries like jQuery!
Thank you.

Updated answer:
I didn't understand at first that you want to call up a web page you're not in control of, and then use JavaScript in your browser to interact with it.
The information in the original answer below is still relevant, but the question is: How do you make the code run in the right context? And the answer is: There are at least two ways:
Any decent browser these days has built-in debugging tools. Look on the menus for them, but in many browsers they're accessible via the F12 key or Ctrl+Shift+I. In those tools, you'll find a "console" where you can type JavaScript and have it run in the context of the page you're looking at.
This is great for doing things interactively, but it's a bit of a pain to retype it every time. You can also put the code in a local file (say, /home/tjc/foo.js) and then when you go to the page, use the console to append that script to the page (which will cause it to execute, within the context of the page), like this:
document.documentElement.appendChild(document.createElement('script')).src = "file:///home/tjc/foo.js";
Once you have your script doing what you want, you might want to turn it into a bookmarklet. This is a browser bookmark using the javascript: scheme rather than the usual http: and such. You'll want a tool that takes your JavaScript code and does the necessary URL-encoding for you, like the Bookmarklet Crunchinator or similar.
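To give a rough idea of what such a bookmarklet tool does, here is a minimal sketch of the encoding step. The helper name toBookmarklet is made up for illustration, and real tools usually minify the code as well, so treat this as an assumption-laden outline rather than what any particular tool does:

```javascript
// Hypothetical helper: wraps a script in a function so its variables stay
// out of the page's global scope, then percent-encodes it for a javascript: URL.
function toBookmarklet(source) {
  // The newline before the closing brace keeps a trailing // comment in
  // `source` from swallowing the end of the wrapper.
  var wrapped = "(function(){" + source + "\n})();";
  return "javascript:" + encodeURIComponent(wrapped);
}

// Paste the resulting string into the URL field of a new bookmark.
var bookmarkUrl = toBookmarklet("alert(document.title);");
```

Clicking the bookmark then runs the code in the context of whatever page is currently open.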
Original answer:
... so that it will find text, or get element by id/class...
Those are three very different questions:
To find text on the page, you have a couple of options:
If you only want to find the text but don't much care exactly what element contains it, you can just look through innerHTML on document.body. innerHTML is a string; when you access it, the browser creates an HTML string for all of the DOM elements in the element you call it on (and its descendants). Note that this is not the original content from the server; this is created on-the-fly when you access the element. For a lot of use-cases, getting this string and then looking through it could be useful. Note that the text you're searching through is markup, so for instance, if you searched for the word "table" you might find it in a sentence ("We sat down at the table.") or in markup (<table>...).
Here's an example of counting occurrences of "I'm" on the page using innerHTML (see the note about the examples at the end):
(function() {
    var pageText = document.body.innerHTML;
    // match() returns null when nothing matches, so fall back to an empty array
    var found = pageText.match(/I'm/g) || [];
    display('Count of "I\'m" on the page: ' + found.length);

    function display(msg) {
        var p = document.createElement('p');
        p.innerHTML = String(msg);
        document.body.appendChild(p);
    }
})();
If you need to find out exactly what element it's in, you'll need to write a recursive function that walks through the nodes of the page and, for Text nodes, looks at the text within. Here's a basic example (the function is the walk function; see the note about the examples at the end):
(function() {
    var matches = [], index;

    walk(matches, document.body, "");

    function walk(matches, node, path) {
        var child;

        switch (node.nodeType) {
            case 1: // Element
                for (child = node.firstChild; child; child = child.nextSibling) {
                    walk(matches, child, path + "/" + node.tagName);
                }
                break;
            case 3: // Text
                if (node.nodeValue.indexOf("I'm") !== -1) {
                    matches.push("Found it at " + path);
                }
                break;
        }
    }

    display("Matches found (" + matches.length + "):");
    for (index = 0; index < matches.length; ++index) {
        display(matches[index]);
    }

    function display(msg) {
        var p = document.createElement('p');
        p.innerHTML = String(msg);
        document.body.appendChild(p);
    }
})();
To find an element on the page by id, use document.getElementById.
To find an element on the page by class, on most modern browsers you can use either getElementsByClassName or querySelectorAll.
Note about the examples: I'm using JSBin, which puts the JavaScript you see on the left-hand side in the "source" view at the end of the HTML you see on the right (just before the closing </body> tag), by default. This is in keeping with best practices.
Reading:
DOM2 Core
DOM2 HTML
DOM3 Core
HTML5 Web Application APIs

If you are looking for an iMacros solution, it's something like this:
var reportDataTable = window.content.document.getElementById("yoursid");
if (reportDataTable == null) {
    iimPlay("mac1.iim");
} else {
    iimDisplay("stop!");
}
Here mac1.iim is the macro, which is repeated until
window.content.document.getElementById("yoursid");
finds the element.

Related

Chrome extension - The new >>Manifest_version: 3<< Problem [duplicate]

Can the JavaScript command .replace replace text in any webpage? I want to create a Chrome extension that replaces specific words in any webpage to say something else (example cake instead of pie).

The .replace method is a string operation, so it's not immediately simple to run the operation on HTML documents, which are composed of DOM Node objects.
Use TreeWalker API
The best way to go through every node in a DOM and replace text in it is to use the document.createTreeWalker method to create a TreeWalker object. This is a practice that is used in a number of Chrome extensions!
// create a TreeWalker of all text nodes
var allTextNodes = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT),
    // some temp references for performance
    tmptxt,
    tmpnode,
    // compile the RE and cache the replace string, for performance
    cakeRE = /cake/g,
    replaceValue = "pie";

// iterate through all text nodes
while (allTextNodes.nextNode()) {
    tmpnode = allTextNodes.currentNode;
    tmptxt = tmpnode.nodeValue;
    tmpnode.nodeValue = tmptxt.replace(cakeRE, replaceValue);
}
To replace parts of text with another element or to add an element in the middle of text, use the DOM splitText, createElement, and insertBefore methods.
See also how to replace multiple strings with multiple other strings.
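For the multiple-strings case, one possible approach is a single pass with a replacement callback, so that a word swapped in by one rule is never picked up again by another. This is only a sketch: replaceWords is a made-up name, and it assumes you want whole-word matches (so "cupcake" is left alone, unlike /cake/g) and that \w, which only covers ASCII letters, digits, and underscore, is good enough for your words:

```javascript
// Hypothetical helper: swaps whole words according to a lookup table.
function replaceWords(text, replacements) {
  return text.replace(/\w+/g, function (word) {
    // hasOwnProperty guards against inherited keys such as "toString"
    return Object.prototype.hasOwnProperty.call(replacements, word)
      ? replacements[word]
      : word;
  });
}

var swapped = replaceWords("cake and pie", { cake: "pie", pie: "cake" });
```

Inside a text-node loop you would apply it to each node's nodeValue instead of calling .replace with a single regular expression.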
Don't use innerHTML or innerText or jQuery .html()
// the innerHTML property of any DOM node is a string
document.body.innerHTML = document.body.innerHTML.replace(/cake/g,'pie')
It's generally slower (especially on mobile devices).
It effectively removes and replaces the entire DOM, which is not awesome and could have some side effects: it destroys all event listeners attached in JavaScript code (via addEventListener or .onxxxx properties) thus breaking the functionality partially/completely.
This is, however, a common, quick, and very dirty way to do it.

Ok, so the createTreeWalker method is the RIGHT way of doing this and it's a good way. I unfortunately needed to do this to support IE8 which does not support document.createTreeWalker. Sad Ian is sad.
If you want to do this with a .replace on the page text using a non-standard innerHTML call like a naughty child, you need to be careful because it WILL replace text inside a tag, leading to XSS vulnerabilities and general destruction of your page.
What you need to do is only replace text OUTSIDE of tag, which I matched with:
var search_re = new RegExp("(?:>[^<]*)(" + stringToReplace + ")(?:[^>]*<)", "gi");
Gross, isn't it? You may want to mitigate any slowness by replacing some results and then sticking the rest in a setTimeout call, like so:
// replace some chunk of stuff, the first section of your page works nicely
// if you happen to have that organization
//
setTimeout(function() { /* replace the rest */ }, 10);
which will return immediately after replacing the first chunk, letting your page continue with its happy life. For your replace calls, you're also going to want to replace large chunks in a temp string:
var tmp = element.innerHTML.replace(search_re, whatever);
/* more replace calls, maybe this is in a for loop, i don't know what you're doing */
element.innerHTML = tmp;
so as to minimize reflows (when the page recalculates positioning and re-renders everything). For large pages, this can be slow unless you're careful, hence the optimization pointers. Again, don't do this unless you absolutely need to. Use the createTreeWalker method zetlen has kindly posted above.
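One more thing to watch with the regular expression above: stringToReplace is dropped into the pattern as-is, so characters such as . or + in it are treated as regex syntax. A hedged sketch of escaping it first (escapeRegExp and buildSearchRe are illustrative names, not part of the original answer):

```javascript
// Escapes every character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds the same "only between tags" pattern, with the search term escaped.
function buildSearchRe(stringToReplace) {
  return new RegExp(
    "(?:>[^<]*)(" + escapeRegExp(stringToReplace) + ")(?:[^>]*<)",
    "gi"
  );
}
```

Without the escaping step, a search term like "c++" would make the RegExp constructor throw, and "a.b" would also match "axb".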

Have you tried something like this?
$('body').html($('body').html().replace('pie','cake'));

Run script after appending it to the HTML

I have a string with HTML:
var str = '<div><p>Examplee</p></div><script>alert("testing!")</script>';
and then I append it to the HTML:
document.body.innerHTML += str;
and the content is appended but the script does not execute, is there a way to force it?

First, a caveat: Naturally, only do this with scripts you trust. :-)
There are a couple of ways. Basically, you need to:
Get the text of the script, and
Run it
Getting the text
One option is just to go ahead and add it, then find it and grab its text:
document.body.innerHTML += str;
var scripts = document.querySelectorAll("script");
var text = scripts[scripts.length - 1].textContent;
On obsolete browsers, you may need to feature-detect textContent vs. innerText.
You might want to give it an identifying characteristic (id, class, etc.) so you don't have to find it by position like that.
Alternatively, you could do what the PrototypeJS lib does and try to get it from the string with a regex. See extractScripts in their source code.
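A rough sketch of the regex route follows. This is not PrototypeJS's actual code; extractInlineScripts is a made-up name, and a regex like this will trip over scripts whose own text contains a closing script tag, so it is only reasonable for markup you control:

```javascript
// Hypothetical helper: returns the text of every inline <script> in an HTML string.
function extractInlineScripts(html) {
  var scriptRe = /<script\b[^>]*>([\s\S]*?)<\/script\s*>/gi;
  var scripts = [];
  var match;
  // exec() with the g flag walks through the string one match at a time
  while ((match = scriptRe.exec(html)) !== null) {
    scripts.push(match[1]);
  }
  return scripts;
}
```

Each entry in the returned array is script text that can then be run with one of the options in the next section.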
Running it
Once you have the text of the script, you have several options:
Use indirect eval (aka "global eval") on it: (0, eval)(text). This is not the same as eval(text); it runs the code in global scope, not local scope.
Create a new script element, set its text to the text, and add it:
var newScript = document.createElement("script");
newScript.textContent = text;
document.body.appendChild(newScript);
If doing that, it might make sense to remove the original, though it doesn't really matter.
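To make the difference between direct and indirect eval concrete, here is a small sketch. The names compareEvals and onlyLocal are just for illustration, and it assumes there is no global variable called onlyLocal:

```javascript
function compareEvals() {
  var onlyLocal = "visible only inside this function";
  // Direct eval runs in the local scope, so it can see onlyLocal.
  var direct = eval("typeof onlyLocal");
  // Indirect eval runs in global scope, where onlyLocal does not exist.
  var indirect = (0, eval)("typeof onlyLocal");
  return { direct: direct, indirect: indirect };
}

var result = compareEvals();
// result.direct is "string", result.indirect is "undefined"
```

That is why indirect eval is the closer match to how a real script element behaves: the code sees globals only, not the locals of whatever function happened to call it.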
Example that grabs the text of the script element after adding it and uses indirect eval:
var str = '<div><p>Examplee</p></div><script>alert("testing!")<\/script>';
document.body.innerHTML += str;
var scripts = document.querySelectorAll("script");
(0, eval)(scripts[scripts.length - 1].textContent);
Presumably you don't really use += on document.body.innerHTML, which builds an HTML string for the whole page, appends to it, tears the whole page down, and then parses the new HTML to build a new one. I assume that was just an example in your question.

jQuery provides the $.getScript(url [,success]) function. You can then load and execute your code from a separate JavaScript file, which helps to manage and control the flow of execution.
Basically, put your alert("testing!") script inside a separate file, for instance alert.js in the same directory.
Then you can run the script when adding your employee to the HTML.
var str = '<div><p>Examplee</p></div>';
var url = 'alert.js';
document.body.innerHTML += str;
$.getScript(url);
I know this may seem like more work, but it is better practice to keep your javascript out of your HTML. You can also use a callback to gather user data after the alert or notify another process that the user has been alerted.
$.getScript(url, function(){
//do something after the alert is executed.
});
For instance maybe it would be a better design to add the employee after the alert is executed.
var str = '<div><p>Examplee</p></div>';
var url = 'alert.js';
$.getScript(url, function(){
document.body.innerHTML += str;
});
Edit: I know jQuery is not tagged, but I am also not petitioning to be the accepted answer to this question. I am only offering another alternative for someone who may run into the same issue and may be using jQuery. If that is the case, $.getScript is a very useful tool designed for this exact problem.

You should change the HTML after it has loaded.
Try this:
document.addEventListener("DOMContentLoaded", function(event) {
    document.body.innerHTML += str;
});

sports-title----> {} third----> undefined

I am new to JS.
I have spent around eight hours trying to debug why I am getting the empty object below.
document.getElementsByClassName("sports-title") works fine in the fiddle, but when I put it in my codebase it does not.
It returns the following, so I am not able to proceed.
Codebase output:
sports-title---->{}
third---->undefined
Fiddle output:
sports-title---->{"0":{}} third---->{}
I am using the same HTML structure.
Can you tell me what the problem could be so that I can proceed?
findStringInsideDiv() {
    var sportsTitle = document.getElementsByClassName("sports-title");
    var third = sportsTitle[0];
    var thirdHTML = third.innerHTML;
    //str = str.split(" ")[4];
    console.log("sports-title---->" + JSON.stringify(sportsTitle));
    console.log("third---->" + JSON.stringify(third));
    console.log("thirdHTML---->" + JSON.stringify(thirdHTML));
    if (thirdHTML === " basketball football swimming ") {
        console.log("matching basketball---->");
        var menu = document.querySelector('.sports');
        menu.classList.add('sports-with-basketball');
        // how to add this class name directly to the first div after body.
        // but we are not rendering that div in accordion
        // is it possible
    } else {
        console.log("not matching");
    }
}

If document.getElementsByClassName("sports-title") isn't returning any elements this could mean:
Your HTML source doesn't have any elements with class="sports-title", possibly because of a syntax error, spelling error, etc. Try inspecting your web page with a browser's DOM inspector and look for the elements which you think should be in the sports-title class.
Or
Your Javascript is executing before the .sports-title elements are actually added to the document, because scripts are (normally) executed synchronously during document parsing, as soon as the parser encounters them. Try moving the <script> element to the end of the <body>, after all the other elements are defined.
There may be other possible causes that I can't think of right now.
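If moving the script element is not an option, another way to handle the second cause is to defer the lookup until the document has been parsed. A minimal sketch, where whenReady is a made-up helper name and the document is passed in as a parameter purely so the function can be exercised outside a browser:

```javascript
// Hypothetical helper: runs callback once the document has finished parsing.
function whenReady(doc, callback) {
  if (doc.readyState === "loading") {
    // Still parsing: wait for the DOMContentLoaded event.
    doc.addEventListener("DOMContentLoaded", function () {
      callback(doc);
    });
  } else {
    // Already parsed ("interactive" or "complete"): run right away.
    callback(doc);
  }
}

// In a browser you might call it like this:
// whenReady(document, function (doc) {
//   console.log(doc.getElementsByClassName("sports-title").length);
// });
```

If the logged length is still 0 at that point, the first cause (no matching elements in the markup) becomes the more likely one.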

This probably isn't the answer but I can't leave a comment so here goes:
I was messing with your jsFiddle and I noticed that if you change JSON.stringify(object) to object.ToString, the values turn into undefined. So my question to you is: are you sure your code in your codebase matches the jsFiddle?
Also if you're using JSfiddle to make and test your code first, you might consider installing Brackets.io. It has a nifty live update feature that makes web development easier for beginners and it opens up a preview on the browser. I've noticed in the past that JSfiddle doesn't always operate the same as a browser.

Any JavaScript templating library/system/engine/technique that returns a DOM fragment?

I have to make a high speed web app and I need a JavaScript templating library/system/engine/technique that returns a DOM Fragment instead of a String containing HTML.
Of course it should have a syntax similar to Resig's Micro-Templating.
I am expecting something like this after compilation:
function myTemplate(dataToRender){
    var fragment = document.createDocumentFragment();
    fragment = fragment.appendChild(document.createElement('h1'));
    fragment.appendChild(document.createTextNode(dataToRender.title));
    fragment = fragment.parentNode;
    fragment = fragment.appendChild(document.createElement('h2'));
    fragment.appendChild(document.createTextNode(dataToRender.subTitle));
    fragment = fragment.parentNode;
    return fragment;
}
Is there any option?
Edit: Template function should not concatenate HTML strings. It decreases speed.
jQuery templates work with strings internally. So does Resig's Micro-Templating.
Edit2: I just did a benchmark on jsPerf. It is the first benchmark I have done in JavaScript, so please check it out (I am not sure if it's correct).

Check out jquery templates. http://api.jquery.com/category/plugins/templates/
It lets you create HTML fragments with keywords like "if", "each", etc. and undeclared variables. Then you can call "tmpl" on a fragment from JavaScript with some values, and a DOM element is returned.

I had a go at this in this jsFiddle. Replacing a chunk of content is fastest when using DOM methods, but setting innerHTML isn't cripplingly slower and probably acceptable if your templates aren't very complex and you won't lose too much time in the string manipulation. (This isn't very surprising, "dealing with broken HTML quickly" is kind of what browsers are supposed to do and innerHTML is an ancient and popular property that probably had lots of optimisation go into it.) Adding another join() step in the innerHTML method also didn't really slow it down.
Conversely, using jQuery.tmpl() /and/ the DOM fragment method was orders of magnitude slower in Chrome on Mac. Either I'm doing something wrong in the dom_tmpl function, or deep-cloning DOM nodes is inherently slow.
I commented out the append tests because they froze up the tab process when you run the whole suite - thousands through tens of thousands of nodes shoved into a document probably confuse Chrome somehow. Appending with innerHTML alone ended up glacially slow because the string ends up being really huge.
The conclusion would seem to be: unless done stupidly or on very large strings, concatenating strings in a templating library likely isn't going to be what makes it slow, while trying to be clever with cloning chunks of the DOM will. Also, jQuery.tmpl() handled 2000-ish ops/sec on my computer and 500-ish on my iPhone 4; this is likely "fast enough" if you're targeting these platforms. It was also in the same ballpark as the DOM fragment function, making the latter largely pointless.
If you mostly need to replace the content of existing nodes and your templates aren't very large, use Underscore.js's templating and innerHTML. Underscore.js seems to do ten passes through the whole string, so if your templates /are/ large this could be an issue.
If you need to append to existing nodes, you can avoid serialising and reparsing the existing content by creating a wrapper element, setting its innerHTML, and then appending the wrapper element to the target node.
If you really want speed or your templates are crazy large, you'll probably have to do something like having a server-side script precompile your templates into Javascript that creates the respective nodes.
(Disclaimer: I don't claim to be any good at constructing test cases and benchmarks and only tested this in WebKit, you should tailor this to your use case and get a more relevant set of numbers.)
Update: I updated my jsFiddle benchmark to not use any jQuery features, in case its presence (user data on nodes etc.) was the cause of DOM node cloning being slow. Didn't help much.

I don't know if this is what you're searching for, but Underscore.js has a template utility.
Also, jQuery can return the DOM of a matched element.

You could create individual objects that represent regions of your page or even go down as far as the individual element level, and do all this without resorting to DOM scripting which will be super slow. For instance:
function titleFrag(data) {
    this.data = data;
}
titleFrag.prototype = (function() {
    return {
        _html : function(h) {
            // read the values from the instance, not from an undefined global
            h.push("<h1>", this.data.title, "</h1>");
            h.push("<h2>", this.data.subTitle, "</h2>");
        },
        render : function(id) {
            var html = [];
            this._html(html);
            document.getElementById(id).innerHTML = html.join("");
        }
    };
})();
To implement this, you'd simply create a new object then invoke its render method to an id on your page:
var titleFragObj = new titleFrag({title: "My Title", subTitle: "My Subtitle"});
titleFragObj.render("someId");
Of course you could get a bit more creative about the render method and use something like jQuery to write to a selector, then use the .html or .append methods like this:
render : function(selectorString, bAppend) {
    var html = [];
    this._html(html);
    var htmlString = html.join("");
    var destContainer = $(selectorString);
    if (bAppend) {
        destContainer.append(htmlString);
    } else {
        destContainer.html(htmlString);
    }
}
In that case you'd just provide a selector string and whether or not you want to append to the end of the container, or completely replace its contents:
titleFragObj.render("#someId",true);
You could even go so far as to create a base object from which all your fragments descend from, then all you'd do is override the _html method:
function baseFragment(data) {
    this.data = data;
}
baseFragment.prototype = (function() {
    return {
        _html : function(h) {
            //stub
        },
        render : function(selectorString, bAppend) {
            var html = [];
            this._html(html);
            var htmlString = html.join("");
            var destContainer = $(selectorString);
            if (bAppend) {
                destContainer.append(htmlString);
            } else {
                destContainer.html(htmlString);
            }
        }
    };
})();
Then all descendants would look something like this:
function titleFrag(data) {
    baseFragment.call(this, data);
}
titleFrag.prototype = new baseFragment();
titleFrag.prototype._html = function(h) {
    // h is the array passed in by render; the data lives on the instance
    h.push("<h1>", this.data.title, "</h1>");
    h.push("<h2>", this.data.subTitle, "</h2>");
};
You could create an entire library of little fragment generators that descend from that common base class.

Looking for a way to search an html page with javascript

What I would like to do is search the HTML page for a specific string, read in a certain number of characters after it, and present those characters in an anchor tag.
The problem I'm having is figuring out how to search the page for a string; everything I've found relates to searching by tag or id. I'm also hoping to make it a Greasemonkey script for my personal use.
function createlinks(srchstart,srchend){
    var page = document.getElementsByTagName('html')[0].innerHTML;
    page = page.substring(srchstart,srchend);
    if (page.search("file','http:") != -1)
    {
        var begin = page.search("file','http:") + 7;
        var end = begin + 79;
        var link = page.substring(begin,end);
        document.body.innerHTML += 'LINK | ';
        createlinks(end+1,page.length);
    }
};
This is what I came up with. Unfortunately, after finding the links it loops over the document again.

Assisted Direction
Look up JavaScript regular expressions.
Apply your regex to the page's HTML (see below).
Different regex functions do different things. You could search the document for the string, as suggested, but you'd have to do it recursively, since the string you're searching for may be listed in multiple places.
To Get the Text in the Page
JavaScript: document.getElementsByTagName('html')[0].innerHTML
jQuery: $('html').html()
Note:
IE may require the element to be capitalized (eg 'HTML') - I forget
Also, the document may have newline characters \n that you might want to take out, since one could be in the middle of the string you're looking for.
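As a sketch of the "find every occurrence, then read some characters after it" idea from the question, here is a version that works on a plain string and avoids recursion altogether. extractAfter is a made-up name, and it uses indexOf rather than a regex so that the marker needs no escaping:

```javascript
// Hypothetical helper: for each occurrence of marker, returns the next
// `length` characters that follow it.
function extractAfter(text, marker, length) {
  var results = [];
  if (!marker) {
    return results; // an empty marker would match everywhere and never advance
  }
  var from = 0;
  var at;
  while ((at = text.indexOf(marker, from)) !== -1) {
    var start = at + marker.length;
    results.push(text.substring(start, start + length));
    from = start; // keep searching after this match so nothing is found twice
  }
  return results;
}

// In a browser, the page text would be read once, up front, for example:
// var links = extractAfter(document.documentElement.innerHTML, "file','", 79);
```

Reading the HTML once and searching the resulting string also means that anything appended to the page afterwards is not searched again, which is the looping problem described in the question.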

Okay, so in JavaScript you've got the whole document in the DOM tree. You can search for your string by recursively searching the DOM for the string you want. This is straightforward; I'll put it in pseudocode because you want to think about what libraries (if any) you're using.
function search(node, string):
    if node.innerHTML contains string
        -- then you found it
    else
        for each child node child of node
            search(child, string)
        rof
    fi
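One possible plain-JavaScript rendering of that pseudocode is sketched below. It differs in one respect, which is my own choice rather than the answer's: it checks the text of Text nodes instead of innerHTML, because the innerHTML of an ancestor always contains whatever its descendants contain, so the check would succeed at the top of the tree and never descend. findTextNodes is a made-up name; it relies only on the nodeType, nodeValue, and childNodes properties that DOM nodes have:

```javascript
// Hypothetical helper: collects every Text node under `node` whose text
// contains `needle`.
function findTextNodes(node, needle, matches) {
  matches = matches || [];
  if (node.nodeType === 3) { // 3 = Text node
    if (node.nodeValue.indexOf(needle) !== -1) {
      matches.push(node);
    }
  } else if (node.childNodes) {
    for (var i = 0; i < node.childNodes.length; i++) {
      findTextNodes(node.childNodes[i], needle, matches);
    }
  }
  return matches;
}

// In a browser: var hits = findTextNodes(document.body, "some text");
```

Each returned node's parentNode is the element that directly contains the matching text.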
