Get all href links in DOM - javascript

I need to write code that puts all of the href links from a webpage into an array. Here's what I have so far:
var array = [];
var links = document.links;
for(var i=0; i<links.length; i++) {
array.push(links[i].href);
}
However, this does not work on a page like Gmail's inbox, because some of the links are inside an iframe. How can I get ALL of the links, including the ones inside the iframe?
Also, this is for a google chrome extension. In the manifest, I have all_frames set to true - does this make a difference?
Thanks
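
Regarding all_frames: setting it to true does make a difference, because the content script is then injected into every frame (Gmail's iframes included), and each injected copy only sees its own frame's DOM. One rough sketch is to have every frame report its links to the rest of the extension; the "type" and "hrefs" message fields below are just made up for illustration:
var array = [];
var links = document.links;
for (var i = 0; i < links.length; i++) {
    array.push(links[i].href);
}
// Send this frame's links to the rest of the extension,
// e.g. a background page that concatenates the arrays it receives.
chrome.runtime.sendMessage({ type: "links", hrefs: array });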

One thing to remember is that
document.links
document.images
document.forms
document.forms[0].elements
document.getElementsByName()
document.getElementsByClassName()
document.getElementsByTagName()
are live collections backed by the DOM, so in for loops they can significantly slow down execution (i < links.length is re-evaluated against the live collection on every iteration) if you check the length like this:
var array = [];
var links = document.getElementsByTagName("a");
for(var i=0; i<links.length; i++) {
array.push(links[i].href);
}
Instead, you are better off caching the length once:
var array = [];
var links = document.getElementsByTagName("a");
for(var i=0, max=links.length; i<max; i++) {
array.push(links[i].href);
}
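If you would rather sidestep the live collection entirely, another option (a sketch, not part of the answer above) is to copy it into a plain array once and loop over that static copy:
var array = [];
// slice.call turns the live HTMLCollection into a plain, static array
var links = Array.prototype.slice.call(document.getElementsByTagName("a"));
for (var i = 0; i < links.length; i++) {
    array.push(links[i].href);
}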

Surely you're going to get 'arr is not defined' with your code to begin with?
var array = [];
var links = document.links;
for(var i=0; i<links.length; i++) {
arr.push(links[i].href);
}
Try:
var array = [];
var links = document.getElementsByTagName("a");
for(var i=0; i<links.length; i++) {
array.push(links[i].href);
}

I have a method I use to access data in an IFrame. How fun that the answer is never just written down to read and use :P.
Feel free to modify and abuse:
public HtmlElementCollection GetIFrameElements(String tmpTag, int Frame)
{
    HtmlElementCollection tmpCollection = mWebBrowser.Document.Window.Frames[Frame].Document.Body.GetElementsByTagName(tmpTag);
    return tmpCollection;
}
I then use it to look for whatever element I'm after:
foreach (HtmlElement el in GetIFrameElements("input", 0))
{
    if (el.GetAttribute("id").Equals("hasNoGoogleAccount"))
    {
        el.InvokeMember("click");
    }
}
You could always change the method to loop through and get all iFrames etc blah blah but that should be enough to get you moving.
Rate me! I'm new

From my Web Adjuster's bookmarklet code:
function all_frames_docs(c) {
    var f = function(w) {
        if (w.frames && w.frames.length) {
            var i;
            for (i = 0; i < w.frames.length; i++) f(w.frames[i]);
        }
        c(w.document);
    };
    f(window);
}
You can pass any function into all_frames_docs and it will be called in turn on every frame and iframe in the current window, provided your script has access to such (i.e. it's an extension or a bookmarklet). So all you have to do now is code the function to handle each document, which can go through document.getElementsByTagName("a") or whatever, and make this function the parameter of your call to all_frames_docs.
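For example, collecting every href across all frames with it might look like this (a sketch, not part of the original bookmarklet):
var allHrefs = [];
all_frames_docs(function(doc) {
    // "doc" is the document of one frame (or of the top window)
    var links = doc.getElementsByTagName("a");
    for (var i = 0; i < links.length; i++) {
        allHrefs.push(links[i].href);
    }
});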

Related

Find and list all URLs as clickable links at the bottom of the page

I'm trying to write a JavaScript snippet that searches for all of the links on the page and then adds them to the bottom, under the original content.
document.links seems to do the finding part, and it also lists them, but they are not clickable. So I tried to add some HTML (the startHref and endHref lines), which of course broke the whole thing.
My (non-working) script:
var links = document.links;
for(var i = 0; i < links.length; i++) {
var preLink = document.createTextNode("LINK: ");
var linkHref = document.createTextNode(links[i].href);
var lineBreak = document.createElement("br");
var startHref = document.createElement("a href="");
var endHref = document.createElement(""");
document.body.appendChild(preLink);
document.body.appendChild(startHref);
document.body.appendChild(linkHref);
document.body.appendChild(endHref);
document.body.appendChild(lineBreak);
}
If this works I'd also like to have them listed with a number in front of each line (starting with 1 - could be set in the preLink part), if that's not too hard to implement.
Also, is there a way to list not all of the links, but only those matching something, like only links with a specific domain? Thank you!
As you have already found out, you can get all links in a document with:
var links = document.links;
Now you have an HTMLCollection. You can iterate through it and display all links. For better layout you can put them in a paragraph (p). This would be the loop:
for (var i = 0; i < links.length; i++) {
var p = document.createElement("p");
p.appendChild(links[i]);
document.body.appendChild(p);
}
Now all links are appended at the end of the page, every link is on its own line and they are clickable. Please try this out.
EDIT: as for your comment, if I understand it right, you just have to add one additional line:
for (var i = 0; i < links.length; i++) {
var p = document.createElement("p");
// the following line is added
links[i].innerHTML = links[i].href;
p.appendChild(links[i]);
document.body.appendChild(p);
}
That line will simply replace the inner HTML of the link with its value for the attribute href.
EDIT:
The variable links just points to document.links. The existing links are therefore removed from their original position and appended to the end. If you try to create new links in the for loop, like document.createElement("a") you will create an endless loop, because you're iterating through all links in the document. You remember, the variable links is not a snapshot of document.links when created, but points to it.
You can work around this with creating an array:
var links = [];
// populate the array links
for (var j = 0; j < document.links.length; j++) {
links.push(document.links[j].cloneNode());
}
Now this is a snapshot of all links on the page. Every link is cloned and pushed to the array. Now run the for loop and the original links aren't removed.
If the original link was something like an anchor whose text is "This is an example.", it will become an anchor whose text is its own URL, e.g. http://example.com. But if you want a plain, freshly created link that shows just http://example.com and carries none of the original link's other attributes, then you have to adapt the code:
for (var i = 0; i < links.length; i++) {
var p = document.createElement("p");
var a = document.createElement("a");
a.href = links[i].href;
a.text = links[i].href; // you can use text instead of innerHTML
p.appendChild(a);
document.body.appendChild(p);
}
If you want to style the output you can add classes like this:
p.classList.add("my-css-class");
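For the numbering and the domain filtering asked about in the question, the loop over the snapshot array of cloned links could be extended along these lines (a sketch; "example.com" is only a placeholder for the domain you want to keep):
var count = 0;
for (var i = 0; i < links.length; i++) {
    // keep only links whose href contains the chosen domain
    if (links[i].href.indexOf("example.com") === -1) continue;
    count++;
    var p = document.createElement("p");
    var a = document.createElement("a");
    a.href = links[i].href;
    a.text = count + ". " + links[i].href; // numbered, clickable line
    p.appendChild(a);
    document.body.appendChild(p);
}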
I hope this helps you to achieve your goal.

How to scrape all links in a page with javascript

I am trying to write a Chrome extension that changes all hrefs in a page using this code:
var a = document.querySelector("a[href]");
a.href = "http://www.google.com";
But this code only fetches the first href, and only if it is not embedded in another attribute (if the term is wrong, I am meaning div, p, h etc.).
Could someone show me how to fetch all hrefs no matter what?
document.querySelector only returns the first matching element within the document, so in this case you will want document.querySelectorAll, which instead returns a list of all matching elements.
var elements = document.querySelectorAll('a');
for (var i = 0; i < elements.length; i++) {
elements[i].href = 'http://google.com';
}
but only if it is not embedded in another attribute (if the term is wrong, I am meaning div, p, h etc.)
I believe you are talking about tags rather than attributes. To select all tags that have the href attribute present, do this:
var list = document.querySelectorAll("*[href]");
for(var i = 0; i < list.length; i++){
list[i].href = "http://www.google.com";
}

How do I remove links from a page via JavaScript?

I want to write a user script for my browsers (Opera, Chromium) that removes links containing predefined keywords. For example, a link whose URL contains foo should simply vanish from the page when foo is part of the blacklist.
The question "How do I remove duplicate links from a page except first" shows how to get and filter a site, but I want to do this directly via a user script. Any ideas how I would apply the filter on every page load?
Get the document.links collection. If any of their .href properties match your blacklist, set their style.display property to 'none'.
e.g.
function removeLinks() {
    var blackList = /foo|bar|baz/;
    var link, links = document.links;
    var i = links.length;
    while (i--) {
        link = links[i];
        if (blackList.test(link.href)) {
            link.style.display = 'none';
        }
    }
}
Edit
To remove duplicate links is a similar exercise. First convert the links HTMLCollection to a plain array, then as you iterate over them use their hrefs to create properties on an object. If an href is already a property, hide the link using the above method or link.parentNode.removeChild(link).
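A sketch of that duplicate-removal idea, using an object keyed by href to remember which links have already been seen:
function removeDuplicateLinks() {
    var seen = {};
    // copy the live HTMLCollection into a plain array first,
    // because we may remove nodes while iterating
    var links = Array.prototype.slice.call(document.links);
    for (var i = 0; i < links.length; i++) {
        var href = links[i].href;
        if (seen[href]) {
            // a duplicate: hide it, or remove it entirely as here
            links[i].parentNode.removeChild(links[i]);
        } else {
            seen[href] = true;
        }
    }
}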
You could use XPath and its contains() function to match the links via document.evaluate.
Dive Into Greasemonkey has an example of selecting and iterating over nodes using XPath.
for (var i = 0; i < blacklist.length; i++) {
    var links = document.evaluate('//a[contains(@href, "' + blacklist[i] + '")]', document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
    for (var j = 0; j < links.snapshotLength; j++) {
        var link = links.snapshotItem(j);
        link.parentNode.removeChild(link);
    }
}
You could use something like this:
$("a[href*='" + foov + "']").css('display', 'none')

javascript - get all anchor tags and compare them to an array

I have been trying forever but it is just not working: how can I check the array of URLs I got (document.getElementsByTagName('a').href;) to see if any of the websites are in another array?
getElementsByTagName gives you an HTMLCollection (an array-like list of nodes).
var a = document.getElementsByTagName('a');
for (var idx= 0; idx < a.length; ++idx){
console.log(a[idx].href);
}
I really suggest that you use a framework for this, like jQuery. It makes your life so much easier.
Example with jquery:
$("a").each(function(){
console.log(this.href);
});
var linkcheck = (function() {
    // add Array.prototype.indexOf for old browsers (e.g. IE8) that lack it
    if (!Array.prototype.indexOf) {
        Array.prototype.indexOf = function(obj) {
            for (var i = 0; i < this.length; i++) {
                if (this[i] === obj) {
                    return i;
                }
            }
            return -1;
        };
    }
    var url_pages = [], anchor_nodes = []; // this is where you put the resulting urls
    var anchors = document.links; // your anchor collection
    var i = anchors.length;
    while (i--) {
        var a = anchors[i];
        anchor_nodes.push(a); // push the node object in case that needs to change
        url_pages.push(a.href); // push the href attribute to the array of hrefs
    }
    return {
        urlsOnPage: url_pages,
        anchorTags: anchor_nodes,
        checkDuplicateUrls: function(url_list) {
            var duplicates = []; // instantiate a blank array
            var j = url_list.length;
            while (j--) {
                var x = url_list[j];
                if (url_pages.indexOf(x) > -1) { // check the index of each item in the array
                    duplicates.push(x); // add it to the list of duplicate urls
                }
            }
            return duplicates; // return the list of duplicates
        },
        getAnchorsForUrl: function(url) {
            return anchor_nodes[url_pages.indexOf(url)];
        }
    };
})();
// to use it:
var result = linkcheck.checkDuplicateUrls(your_array_of_urls);
This is a fairly straightforward implementation of a pure JavaScript method for achieving what I believe the spec calls for. It also uses closures to give you access to the result set at any time, in case your list of urls changes over time and the new list needs to be checked. I also added the resulting anchor tags as an array, since we are iterating them anyway, so you can change their properties on the fly. And since it might be useful, there is a convenience method for getting the anchor tag by passing the url (the first one in the result set). Per the comments below, I included a snippet to create indexOf for IE8 and switched document.getElementsByTagName to document.links to get a dynamic list of objects.
Using jQuery you can do something like this:
$('a').each(function() {
    if (urls.indexOf(this.href) !== -1)
        alert('match found - ' + this.href);
});
urls is your existing array that you need to compare with.

How do I use regular javascript to look through every <a> tag and change the href?

A page has many <a> tags. How do I loop through all of them and replace their "href" with "http://example.com"?
(do not use jQuery)
var links = document.getElementsByTagName("a");
for (var i = 0; i < links.length; i++) {
links[i].href = "http://example.com";
}
You can use the document.links collection. It's defined by the W3C and supported by all common browsers.
Moreover you get access not only to <a> elements, but to <area> tags too (which are commonly used in client-side image maps).
for(var i=0; i < document.links.length; i++) {
document.links[i].href = "http://example.com";
}
var links = document.getElementsByTagName("a");
for (i=0;i<links.length;i++)
links[i].href = "http://example.com";
You must use getElementsByTagName() to fetch all the links, and then loop through them to change the href property.
var links = document.getElementsByTagName('a');
if (links.length) { // if none are found, do not continue
    for (var i = 0; i < links.length; i++) {
        links[i].href = 'http://example.com/';
    }
}
for (var i = 0, L = document.links.length; i < L; i++) {
    document.links[i].href = "http://example.com";
}
Or load a 20 kb library and write a little less code.
