I want to write a user script for my browsers (Opera, Chromium) that removes links containing predefined keywords. For example, a link containing foo should simply vanish from the page when foo is part of the blacklist.
The question "How do I remove duplicate links from a page except the first" shows how to fetch and filter a site, but I want to do this directly via a user script. Any ideas how I would apply the filter on every page load?
Get the document.links collection. If any of their .href properties match your blacklist, set their style.display property to 'none'.
e.g.
function removeLinks() {
  var blackList = /foo|bar|baz/;
  var link, links = document.links;
  var i = links.length;

  while (i--) {
    link = links[i];
    if (blackList.test(link.href)) {
      link.style.display = 'none';
    }
  }
}
Edit
Removing duplicate links is a similar exercise. First convert the links HTMLCollection to a plain array; then, as you iterate over it, use each link's href to create properties on an object. If the href is already a property, the link is a duplicate: hide it using the method above, or remove it with link.parentNode.removeChild(link).
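The steps just described can be sketched as follows; this is a minimal sketch, and the hideDuplicateLinks name and the returned array of hidden nodes are my own additions, not part of the original answer:

```javascript
// Sketch: hide every link whose href has already been seen once.
function hideDuplicateLinks(links) {
  var seen = {};    // href -> true once encountered
  var hidden = [];  // duplicates that were hidden
  // Convert the live HTMLCollection to a plain array first, so hiding
  // or removing nodes cannot disturb the iteration.
  var list = Array.prototype.slice.call(links);
  for (var i = 0; i < list.length; i++) {
    var link = list[i];
    if (seen[link.href]) {
      link.style.display = 'none'; // or link.parentNode.removeChild(link)
      hidden.push(link);
    } else {
      seen[link.href] = true;
    }
  }
  return hidden;
}
// Usage in a user script: hideDuplicateLinks(document.links);
```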
You could use XPath and its contains() function to match the links via document.evaluate.
Dive Into Greasemonkey has an example of selecting and iterating over nodes with XPath.
for (var i = 0; i < blacklist.length; i++) {
  var links = document.evaluate('//a[contains(@href, "' + blacklist[i] + '")]',
      document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
  for (var j = 0; j < links.snapshotLength; j++) {
    var link = links.snapshotItem(j);
    link.parentNode.removeChild(link);
  }
}
With jQuery, you could use something like this (here foov holds a blacklisted keyword):
$("a[href*='" + foov + "']").css('display', 'none');
Related
I'm trying to get all the img elements from some website.
I opened the Chrome dev tools console and ran the following code:
var imgs = document.getElementsByTagName("img");
var i;
for (i = 0; i < imgs.length; i++) {
  console.log(imgs[i]);
}
Yes, the code works and logs a list of images to the console, but not all of them.
I noticed that some img elements are not returned.
The parts that are missing are shown in the following image (the link is here).
I really wonder why these are not returned. Why is that?
As I said in the comment, there are no <img> elements; those are <a> elements with a CSS background image.
If you want to get those image URLs, simply select the <a> elements, access their background-image CSS property and extract the URLs:
var aElems = document.querySelectorAll(".J_Prop_Color a");
var images = [];
for (var i = 0; i < aElems.length; i++) {
  images.push(aElems[i].style.backgroundImage.replace("url(", "").replace(")", "").replace(/\"/gi, ""));
}
console.log(images);
The .replace("url(", "").replace(")", "").replace(/\"/gi, "") part is used to remove the surrounding url("...") as per this SO answer.
Note 1: The resulting URLs appear to be protocol-relative: they start with // rather than an explicit protocol like https://. You may want to prepend "https:" to them before using them.
Note 2: The resulting URLs point to thumbnails rather than the full-sized images. Remove the _(number)x(number).jpg part of those URLs with this replace: replace(/_\d+x\d+\.[^.]+$/, "") to get the full-size image URLs:
images.push("https:" + aElems[i]
  .style.backgroundImage
  .replace("url(", "").replace(")", "").replace(/\"/gi, "")
  .replace(/_\d+x\d+\.[^.]+$/, ""));
The problem, if you open the console and inspect the elements, is (as someone mentioned in the comments) that there are no image tags; if you check the console, you will see a div.
You want to do:
var img = document.getElementById(idValue); // getElementById returns a single element
console.log(img);
The id is stored in the div id, something like 'ks-component-??'.
Most likely the answer above will not give you what you want; since you want multiple images, you would want to create an array and push the corresponding elements to it.
var img1 = ....
var img2 = etc....
....
let arr = [];
arr.push(img1);
arr.push(img2);
....
for (var i = 0; i < arr.length; i++) {
  console.log(arr[i]);
}
Where the ... means the list or all the variables you need
I'm trying to write a JavaScript that searches for all of the links on the page, then adds them to the bottom, under the original content.
document.links seems to do the finding part, and it lists them too, but they are not clickable. So I tried to add some HTML (the startHref and endHref lines), which of course broke the whole thing.
My (non-working) script:
var links = document.links;
for (var i = 0; i < links.length; i++) {
  var preLink = document.createTextNode("LINK: ");
  var linkHref = document.createTextNode(links[i].href);
  var lineBreak = document.createElement("br");
  var startHref = document.createElement('a href="');
  var endHref = document.createElement('"');
  document.body.appendChild(preLink);
  document.body.appendChild(startHref);
  document.body.appendChild(linkHref);
  document.body.appendChild(endHref);
  document.body.appendChild(lineBreak);
}
If this will work I'd also like to have them listed with a number in front of each line (starting with 1 - could be set in the preLink part) - if not too hard to implement.
Also, is there a way to list not all of the links, but only those matching with something? Like only links with a specific domain. Thank you!
As you have already found out, you can get all links in a document with:
var links = document.links;
Now you have an HTMLCollection. You can iterate through it and display all links. For better layout you can put them in a paragraph (p). This would be the loop:
for (var i = 0; i < links.length; i++) {
  var p = document.createElement("p");
  p.appendChild(links[i]);
  document.body.appendChild(p);
}
Now all links are appended at the end of the page, every link is on its own line and they are clickable. Please try this out.
EDIT: regarding your comment, if I understand it right, you just have to add one line:
for (var i = 0; i < links.length; i++) {
  var p = document.createElement("p");
  // the following line is added
  links[i].innerHTML = links[i].href;
  p.appendChild(links[i]);
  document.body.appendChild(p);
}
That line simply replaces the inner HTML of the link with the value of its href attribute.
EDIT:
The variable links just points to document.links. The existing links are therefore removed from their original position and appended to the end. If you try to create new links inside the for loop with document.createElement("a"), you will create an endless loop, because you're iterating over all links in the document. Remember, the variable links is not a snapshot of document.links taken at creation time; it points to the live collection.
You can work around this with creating an array:
var links = [];
// populate the array links
for (var j = 0; j < document.links.length; j++) {
  links.push(document.links[j].cloneNode());
}
Now this is a snapshot of all links on the page. Every link is cloned and pushed to the array. Run the for loop now and the original links aren't removed.
If the original link was something like:
<a href="http://example.com">This is an example.</a>
it will become:
<a href="http://example.com">http://example.com</a>
But if you want to build the list from newly created links rather than moving the originals, you have to adapt the code:
for (var i = 0; i < links.length; i++) {
  var p = document.createElement("p");
  var a = document.createElement("a");
  a.href = links[i].href;
  a.text = links[i].href; // you can use text instead of innerHTML
  p.appendChild(a);
  document.body.appendChild(p);
}
If you want to style the output you can add classes like this:
p.classList.add("my-css-class");
I hope this helps you to achieve your goal.
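The question also asked about numbering the lines and listing only links that match a specific domain. A minimal sketch of both (the buildDomainList helper name is my own, not part of the answer above; anchor elements expose a .hostname property parsed from their href):

```javascript
// Build numbered entries ("1. ...", "2. ...") only for links whose
// hostname matches the given domain.
function buildDomainList(links, domain) {
  var lines = [];
  for (var i = 0; i < links.length; i++) {
    if (links[i].hostname === domain) {
      lines.push((lines.length + 1) + ". " + links[i].href);
    }
  }
  return lines;
}
// Usage in the page:
// buildDomainList(document.links, "example.com").forEach(function (line) {
//   var p = document.createElement("p");
//   p.textContent = line;
//   document.body.appendChild(p);
// });
```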
I am trying to write a chrome extension to change all hrefs in a page using this code
var a = document.querySelector("a[href]");
a.href = "http://www.google.com";
But this code only fetches the first href, and only if it is not embedded in another attribute (if the term is wrong, I mean div, p, h etc.).
Could someone show me how to fetch all hrefs no matter what?
document.querySelector only returns the first element within the document, and so in this case you will want to use document.querySelectorAll which instead returns a list of all matching elements.
var elements = document.querySelectorAll('a');
for (var i = 0; i < elements.length; i++) {
  elements[i].href = 'http://google.com';
}
but only if it is not embedded in another attribute(If the term is wrong I am meaning div, p, h etc.)
I believe you are talking about tags instead of attributes. To select all tags that have the href attribute present, do this:
var list = document.querySelectorAll("*[href]");
for (var i = 0; i < list.length; i++) {
  list[i].href = "http://www.google.com";
}
I need to write code that puts all of the href links from a webpage into an array. Here's what I have so far:
var array = [];
var links = document.links;
for (var i = 0; i < links.length; i++) {
  array.push(links[i].href);
}
However, this does not work on a page like Gmail's inbox, because some of the links are within an iframe. How can I get ALL of the links, including the ones inside the iframe?
Also, this is for a google chrome extension. In the manifest, I have all_frames set to true - does this make a difference?
Thanks
One thing to remember is that
document.links
document.images
document.forms
document.forms[0].elements
document.getElementsByName()
document.getElementsByClassName()
document.getElementsByTagName()
are live queries against the DOM, so in for loops they can significantly slow down your execution (since i < links.length re-queries the collection on every iteration) if you check the length like this:
var array = [];
var links = document.getElementsByTagName("a");
for (var i = 0; i < links.length; i++) {
  array.push(links[i].href);
}
Instead, you'd better do this:
var array = [];
var links = document.getElementsByTagName("a");
for (var i = 0, max = links.length; i < max; i++) {
  array.push(links[i].href);
}
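Another way to sidestep the live-collection cost entirely, assuming a browser with document.querySelectorAll, is to take a static snapshot up front; the collectHrefs helper below is an illustrative sketch, not part of the original answer:

```javascript
// Take a one-time snapshot: querySelectorAll returns a *static* NodeList,
// so its length is not re-evaluated by the DOM on every loop iteration.
function collectHrefs(nodeList) {
  var snapshot = Array.prototype.slice.call(nodeList); // plain array
  var hrefs = [];
  for (var i = 0; i < snapshot.length; i++) {
    hrefs.push(snapshot[i].href);
  }
  return hrefs;
}
// Usage: var array = collectHrefs(document.querySelectorAll("a[href]"));
```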
Surely you're going to get 'arr is not defined' with your code to begin with?
var array = [];
var links = document.links;
for (var i = 0; i < links.length; i++) {
  arr.push(links[i].href);
}
Try:
var array = [];
var links = document.getElementsByTagName("a");
for (var i = 0; i < links.length; i++) {
  array.push(links[i].href);
}
I have a method I use to access data in an IFrame. How fun that the answer is never just written down to read and use :P.
Feel free to modify and abuse:
public HtmlElementCollection GetIFrameElements(String tmpTag, int Frame)
{
    HtmlElementCollection tmpCollection = mWebBrowser.Document.Window.Frames[Frame].Document.Body.GetElementsByTagName(tmpTag);
    return tmpCollection;
}
I then use it to look for whatever element I'm after:
foreach (HtmlElement el in GetIFrameElements("input", 0))
{
    if (el.GetAttribute("id").Equals("hasNoGoogleAccount"))
    {
        el.InvokeMember("click");
    }
}
You could always change the method to loop through and get all iframes, etc., but that should be enough to get you moving.
Rate me! I'm new
From my Web Adjuster's bookmarklet code,
function all_frames_docs(c) {
  var f = function (w) {
    if (w.frames && w.frames.length) {
      var i;
      for (i = 0; i < w.frames.length; i++) f(w.frames[i]);
    }
    c(w.document);
  };
  f(window);
}
You can pass any function into all_frames_docs and it will be called in turn on every frame and iframe in the current window, provided your script has access to such (i.e. it's an extension or a bookmarklet). So all you have to do now is code the function to handle each document, which can go through document.getElementsByTagName("a") or whatever, and make this function the parameter of your call to all_frames_docs.
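For example, a collector callback passed to all_frames_docs might look like this (the collectDocLinks name and the allHrefs array are illustrative, not part of the original bookmarklet):

```javascript
// Illustrative callback for all_frames_docs: gather every href from one
// document (the top window's or a frame's) into a shared array.
var allHrefs = [];
function collectDocLinks(doc) {
  var anchors = doc.getElementsByTagName("a");
  for (var i = 0; i < anchors.length; i++) {
    allHrefs.push(anchors[i].href);
  }
}
// all_frames_docs(collectDocLinks); // walks window plus every (i)frame
```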
I have been trying forever but it is just not working: how can I check the array of URLs I got (document.getElementsByTagName('a').href;) to see if any of the websites are in another array?
getElementsByTagName gives you a nodelist (an array-like collection of nodes).
var a = document.getElementsByTagName('a');
for (var idx = 0; idx < a.length; ++idx) {
  console.log(a[idx].href);
}
I really suggest that you use a framework for this, like jQuery. It makes your life so much easier.
Example with jquery:
$("a").each(function () {
  console.log(this.href);
});
var linkcheck = (function () {
  if (!Array.indexOf) {
    Array.prototype.indexOf = function (obj) {
      for (var i = 0; i < this.length; i++) {
        if (this[i] === obj) {
          return i;
        }
      }
      return -1;
    };
  }
  var url_pages = [], anchor_nodes = []; // this is where you put the resulting urls
  var anchors = document.links; // your anchor collection
  var i = anchors.length;
  while (i--) {
    var a = anchors[i];
    anchor_nodes.push(a); // push the node object in case that needs to change
    url_pages.push(a.href); // push the href attribute to the array of hrefs
  }
  return {
    urlsOnPage: url_pages,
    anchorTags: anchor_nodes,
    checkDuplicateUrls: function (url_list) {
      var duplicates = []; // instantiate a blank array
      var j = url_list.length;
      while (j--) {
        var x = url_list[j];
        if (url_pages.indexOf(x) > -1) { // check the index of each item in the array
          duplicates.push(x); // add it to the list of duplicate urls
        }
      }
      return duplicates; // return the list of duplicates
    },
    getAnchorsForUrl: function (url) {
      return anchor_nodes[url_pages.indexOf(url)];
    }
  };
})();
// to use it:
var result = linkcheck.checkDuplicateUrls(your_array_of_urls);
This is a fairly straightforward implementation of a pure JavaScript method for achieving what I believe the spec calls for. It uses closures to give you access to the result set at any time, in case your list of URLs changes over time and the new list needs to be checked. I also exposed the resulting anchor tags as an array, since we are iterating over them anyway, so you can change their properties on the fly. And, since it might be useful, there is a convenience method for getting the anchor tag for a given URL (the first one in the result set). Per the comments below, I included a snippet to create indexOf for IE8 and switched document.getElementsByTagName to document.links to get a dynamic list of objects.
Using jQuery you can do something like this:
$('a').each(function () {
  if (urls.indexOf(this.href) !== -1)
    alert('match found - ' + this.href);
});
urls is the existing array you need to compare with.
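For comparison, here is a plain-JavaScript sketch of the same check without jQuery (the findMatches helper name is my own, and it assumes urls is your existing array):

```javascript
// Return every href from a list of anchors that also appears in `urls`.
// Taking the anchors as a parameter keeps the logic testable on its own;
// in a page you would pass document.links.
function findMatches(anchors, urls) {
  var matches = [];
  for (var i = 0; i < anchors.length; i++) {
    if (urls.indexOf(anchors[i].href) !== -1) {
      matches.push(anchors[i].href);
    }
  }
  return matches;
}
// Usage: var found = findMatches(document.links, your_urls_array);
```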