To give you some background: many (if not all) websites load their images one by one, so if there are a lot of images, and/or you have a slow computer, most of the images won't show up for a while. This is avoidable for the most part; however, if you're running a script to extract image URLs, you don't need to see the images at all, you just want their URLs. My question is as follows:
Is it possible to trick a webpage into thinking an image has finished loading so that it will start loading the next one?
Typically a browser will not wait for one image to be downloaded before requesting the next. It requests all images more or less simultaneously, as soon as it knows their src values.
Are you sure the images are actually waiting for the previous image to download, rather than waiting for some time interval to elapse?
If you are sure it depends on the download of the previous image, then what you can do is route all your requests through a proxy server / firewall and configure it to return an empty file with HTTP status 200 whenever an image is requested from that site.
That way the browser (or rather the website's code) will assume it has downloaded the image successfully.
How do I do that? – Jack Kasbrack
That's actually a very open-ended / opinion-based question. It will also depend on your OS, browser, system permissions, etc. Assuming you're on Windows and have sufficient permissions, you can try Fiddler; it has an AutoResponder feature that you can use for this.
(I've no affiliation with Fiddler / Telerik as such. I'm suggesting it only as an example and because I've used it in the past and know that it can be used for the aforementioned purpose. There will be many more products that provide similar functionality and you should use the product of your choice.)
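If you can't put a proxy in front of the browser, the same trick can be sketched in the browser itself with a service worker, though only on an origin you control and can register the worker for (this is my own illustration of the "stub response with HTTP 200" idea, not something Fiddler-specific). It answers every image request with a tiny 1x1 GIF, so the page sees a successful load:
// sw.js - a minimal sketch; register from the page with
// navigator.serviceWorker.register('/sw.js')
const GIF = Uint8Array.from(
  atob('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'), // 1x1 transparent GIF
  (c) => c.charCodeAt(0)
);

self.addEventListener('fetch', (event) => {
  // intercept anything that looks like an image request
  if (/\.(?:jpg|jpeg|gif|png|svg)(?:$|\?)/i.test(event.request.url)) {
    event.respondWith(
      new Response(GIF, { status: 200, headers: { 'Content-Type': 'image/gif' } })
    );
  }
});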
Use a plugin called Lazy Load. It loads the whole webpage first and defers the images, only loading each image when the user scrolls to it.
To extract all image URLs to a text file, you could use something like the following. If you execute this script inside any website, it will list the URLs of the images:
document.querySelectorAll('*[src]').forEach((item) => {
  const isImage = item.src.match(/(http(s?):)([/|.|\w|\s|-])*\.(?:jpg|jpeg|gif|png|svg)/g);
  if (isImage) console.log(item.src);
});
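If the goal is really a text file, one convenient route is the DevTools console's copy() helper (a console-only utility, not part of the page's JavaScript); a small variation of the snippet above:
// Run in the DevTools console: collect matching srcs and put them on the clipboard
const urls = [];
document.querySelectorAll('*[src]').forEach((item) => {
  const isImage = item.src.match(/(http(s?):)([/|.|\w|\s|-])*\.(?:jpg|jpeg|gif|png|svg)/g);
  if (isImage) urls.push(item.src);
});
copy(urls.join('\n')); // now paste the result into a text file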
You could also use the same idea to read each element's computed style and pull images out of background-image URLs and the like:
document.querySelectorAll('*').forEach((item) => {
  const computedItem = getComputedStyle(item);
  Object.keys(computedItem).forEach((attr) => {
    const style = computedItem[attr];
    const image = style.match(/(http(s?):)([/|.|\w|\s|-])*\.(?:jpg|jpeg|gif|png|svg)/g);
    if (image) console.log(image[0]);
  });
});
So, at the end of the day, you could write a function like the following, which returns an array of all image URLs on the site:
function getImageURLS() {
  let images = [];
  document.querySelectorAll('*').forEach((item) => {
    const computedItem = getComputedStyle(item);
    Object.keys(computedItem).forEach((attr) => {
      const style = computedItem[attr];
      const image = style.match(/(http(s?):)([/|.|\w|\s|-])*\.(?:jpg|jpeg|gif|png|svg)/g);
      if (image) images.push(image[0]);
    });
  });
  document.querySelectorAll('*[src]').forEach((item) => {
    const isImage = item.src.match(/(http(s?):)([/|.|\w|\s|-])*\.(?:jpg|jpeg|gif|png|svg)/g);
    if (isImage) images.push(item.src);
  });
  return images;
}
It can probably be optimized, but you get the idea.
If you just want to extract images once, you can use tools like:
1) a Chrome extension
2) desktop software
3) an online website
If you want to run it multiple times, use the code above (https://stackoverflow.com/a/53245330/4674358) wrapped in a condition that waits for the document to be ready:
if (document.readyState === "complete") {
  extractURL();
}
else {
  // Add load or DOMContentLoaded event listeners here; for example:
  window.addEventListener("load", function () {
    extractURL();
  }, false);
  // or
  /*document.addEventListener("DOMContentLoaded", function () {
    extractURL();
  }, false);*/
}

function extractURL() {
  // code mentioned above
}
You want the DOMContentLoaded event (docs). It fires as soon as the document has been fully parsed, but before everything has been loaded.
let addIfImage = (list, image) => image.src.match(/(http(s?):)([/|.|\w|\s|-])*\.(?:jpg|jpeg|gif|png|svg)/g) ?
  [image.src, ...list] :
  list;

let getSrcFromTags = (tag = 'img') => Array.from(document.getElementsByTagName(tag))
  .reduce(addIfImage, []);

if (document.readyState === "loading") {
  document.addEventListener("DOMContentLoaded", doSomething);
} else { // `DOMContentLoaded` has already fired
  doSomething();
}
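Wiring the pieces together: doSomething above is a placeholder, so for this answer it could simply log the collected srcs (a function declaration, so hoisting makes it available to the readyState check above):
function doSomething() {
  console.log(getSrcFromTags('img'));
}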
I am using this; it works as expected:
var imageLoading = function (n) {
  var image = document.images[n];
  var downloadingImage = new Image();
  downloadingImage.onload = function () {
    image.src = this.src;
    console.log('Image ' + n + ' loaded');
    if (document.images[++n]) {
      imageLoading(n);
    }
  };
  downloadingImage.src = image.getAttribute("data-src");
};

document.addEventListener("DOMContentLoaded", function (event) {
  setTimeout(function () {
    imageLoading(0);
  }, 0);
});
Then change the src attribute of every image element to data-src.
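If the markup starts out with normal src attributes, a rough sketch of that conversion follows (with the caveat that the browser may already have begun fetching some images by the time a script runs, so ideally the HTML would be authored with data-src in the first place):
// Rough sketch: move each image's src into data-src so imageLoading() above
// can drive the downloads one by one.
document.querySelectorAll('img[src]').forEach(function (img) {
  img.setAttribute('data-src', img.getAttribute('src'));
  img.removeAttribute('src');
});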
Related
I am writing a userscript that will run whenever I visit our JIRA server. It will swap out the avatar images for users with images from our company's ID badging photo store. This way, the entire organization does not need to upload their images onto the JIRA server separately for the user icons to be readily recognizable. I can use jQuery to alter the 'src' attribute of images already rendered into the DOM. However, the JIRA pages load dynamic content into divs and so on, so I am wondering if there is a way to attach an event handler that would trigger whenever the browser tries to fetch an image, and serve a different image from company server instead. The img tag contains the data attribute needed for me to map the image correctly. I just don't know what event to hook into.
The closest I could find is the "Grumpy cat" example on this page (I landed there from this Stack Overflow comment), but it seems Mobify is no longer maintained. What are the present-day options to tackle this use case?
Thanks!
You can use a deep (subtree) MutationObserver to watch for elements being added to the DOM. Whenever an element is added, look through it and its children for <img>s, and perform the required replacement on each:
const transform = img => img.src = img.src.replace(/200x100/, '100x100');

new MutationObserver((mutations) => {
  for (const { addedNodes } of mutations) {
    for (const addedNode of addedNodes) {
      if (addedNode.nodeType !== 1) continue; // elements only
      const imgs = addedNode.querySelectorAll('img');
      for (const img of imgs) {
        transform(img);
      }
      if (addedNode.tagName === 'IMG') {
        transform(addedNode);
      }
    }
  }
}).observe(document.body, { childList: true, subtree: true });

// demo: dynamically insert an image and watch it get transformed
setTimeout(() => {
  document.body.innerHTML += '<div><img src="https://via.placeholder.com/200x100"></div>';
});
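For the JIRA avatar case in the question, transform would instead read the mapping data from the img tag. A hypothetical sketch; the data-username attribute and photo-store URL below are placeholders, not JIRA's actual markup:
const transform = (img) => {
  const user = img.dataset.username; // assumption: your img tags carry data-username
  if (user) {
    img.src = 'https://badges.example.com/photos/' + user + '.jpg'; // assumed URL pattern
  }
};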
I was wondering how to load two or more video or audio files at once, and then wait for them all to be ready before playing.
The current approach, listed below, seems to work for the most part; however, it can sometimes fail to fully wait for both to be ready to play through (since each oncanplay handler is attached to a different video source). This can of course cause issues when I want multiple files to be completely synchronized.
function loadSources() {
  var videoOne = document.getElementById("first");
  videoOne.src = "first-video.mp4";
  var videoTwo = document.getElementById("second");
  videoTwo.src = "second-video.mp4";
  videoOne.oncanplay = function () {
    videoOne.play();
  };
  videoTwo.oncanplay = function () {
    videoTwo.play();
  };
}
How would I go about combining multiple oncanplay events into one?
One way to do this is by listening for a readiness event on each element (media elements don't reliably fire load, so canplaythrough is used here) and incrementing a counter as resources become ready. It would look something like the following code:
const RESOURCE_NUM = 2; // resources required to be loaded
let loaded = 0;

const onReady = () => {
  loaded++;
  if (loaded === RESOURCE_NUM) {
    // play the video / audio files
  }
};

document.getElementById("first").addEventListener("canplaythrough", onReady, { once: true });
document.getElementById("second").addEventListener("canplaythrough", onReady, { once: true });
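Alternatively, since each readiness event fires once per element, you can wrap them in Promises and wait for all of them; a sketch using the elements from the question:
const waitForReady = (el) =>
  new Promise((resolve) => el.addEventListener("canplaythrough", resolve, { once: true }));

const videoOne = document.getElementById("first");
const videoTwo = document.getElementById("second");

// start both sources together once every element can play through
Promise.all([videoOne, videoTwo].map(waitForReady)).then(() => {
  videoOne.play();
  videoTwo.play();
});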
I am loading a large number of images into a dynamic DIV and I am using a preloader to get the images.
imageObj = new Image();
imageObj.src = imgpath + imgname;
Each of these events creates a GET that I can see and monitor in Firebug.
If I know the name and path of an image, can I watch the relevant XMLHttpRequest to see if the GET has completed?
I do not want to rely on (or use) .onload events for this process.
The pseudo would look something like this...
if (imageObj.GET == 'complete')
Has anyone had any experience of this?
EDIT 1
Thanks to the help from Bart (see below) I have changed my image preloader to store an array of the image objects...
function imagePreLoader(imgname) {
  images[imgnum] = new Image();
  images[imgnum].src = imgpath + imgname; // load the image
  imgnum++;
}
And then, after all my other functions have run to build the content DIVs, I used the image.complete attribute in the following...
var interval = setInterval(function () {
  var imgcount = imgnum - 1; // because the imgnum counter ++ after src is called
  var ok = 1;
  for (var i = 0; i < imgcount; i++) {
    if (images[i].complete == false) {
      ok = 0;
    }
  }
  if (ok == 1) {
    clearInterval(interval);
    showIndexOnLoad();
  }
}, 1000);
This waits until all the images are complete and only triggers the showIndexOnLoad() function when I get the 'ok' from the interval function.
All images now appear as I wanted, all at once with no additional waits for the GETs to catch up.
Well done Bart for putting me on to the image.complete attribute.
You can watch the complete property of the image to see if the image is fully loaded or not.
Here's an example.
http://jsfiddle.net/t3esV/1/
function load(source) {
  var img = new Image();
  img.src = source;
  console.log('Loading ' + source);
  var interval = setInterval(function () {
    if (img.complete) {
      clearInterval(interval);
      complete(img);
    }
  }, 400);
}

function complete(img) {
  console.log('Loaded', img.src);
  document.body.appendChild(img);
}
Note: This example fails to clear the interval when something goes wrong and complete is never set to true.
Update
I wrote a simple jQuery.preload plugin to take advantage of the image.complete property.
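The plugin itself isn't reproduced here, but a minimal sketch of the same idea (not the actual plugin code) might look like this:
// Minimal sketch, not the actual plugin: poll image.complete for every image
// in the jQuery set and fire a callback once all of them are done.
(function ($) {
  $.fn.preload = function (callback, interval) {
    var imgs = this.filter('img').get();
    var timer = setInterval(function () {
      if (imgs.every(function (img) { return img.complete; })) {
        clearInterval(timer);
        if (callback) callback(imgs);
      }
    }, interval || 200);
    return this; // keep the jQuery chain intact
  };
}(jQuery));

// usage: $('img').preload(function (imgs) { console.log(imgs.length + ' images ready'); });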
This is a very interesting problem, and I'm afraid there is no clean solution. The load event for an image fires once it has been fetched and the browser knows its width and height.
What you would be after is a readystatechange event applicable to arbitrary tags. Alas, only IE lets you bind that to non-document elements, so this is not an option.
There are a bunch of plug-ins that allow you to go around it, as well. One pretty hot one is https://github.com/desandro/imagesloaded , which has the added advantage of dealing with all the browser differences very efficiently. It, however, still relies on the load event (and I am pretty sure this is the only way to start doing what you want to do).
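For reference, basic usage of that plugin is roughly as follows (per its documentation):
// imagesloaded's vanilla API: fire a callback once every image inside the
// container has either loaded or failed.
imagesLoaded(document.body, function () {
  console.log('all images are done loading');
});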
I'm trying to download the HTML of a website that is almost entirely generated by JavaScript. So, I need to simulate browser access and have been playing around with PhantomJS. Problem is, the site uses hashbang URLs and I can't seem to get PhantomJS to process the hashbang -- it just keeps calling up the homepage.
The site is http://www.regulations.gov. The default takes you to #!home. I've tried using the following code (from here) to try and process different hashbangs.
if (phantom.state.length === 0) {
  if (phantom.args.length === 0) {
    console.log('Usage: loadreg_1.js <some hash>');
    phantom.exit();
  }
  var address = 'http://www.regulations.gov/';
  console.log(address);
  phantom.state = Date.now().toString();
  phantom.open(address);
} else {
  var hash = phantom.args[0];
  document.location = hash;
  console.log(document.location.hash);
  var elapsed = Date.now() - new Date().setTime(phantom.state);
  if (phantom.loadStatus === 'success') {
    if (!first_time) {
      var first_time = true;
      if (!document.addEventListener) {
        console.log('Not SUPPORTED!');
      }
      phantom.render('result.png');
      var markup = document.documentElement.innerHTML;
      console.log(markup);
      phantom.exit();
    }
  } else {
    console.log('FAIL to load the address');
    phantom.exit();
  }
}
This code produces the correct hashbang (for instance, I can set the hash to '#!contactus'), but it doesn't dynamically generate any different HTML, just the default page. It does, however, correctly output that hash when I call document.location.hash.
I've also tried to set the initial address to the hashbang, but then the script just hangs and doesn't do anything. For example, if I set the url to http://www.regulations.gov/#!searchResults;rpp=10;po=0 the script just hangs after printing the address to the terminal and nothing ever happens.
The issue here is that the content of the page loads asynchronously, but you're expecting it to be available as soon as the page is loaded.
In order to scrape a page that loads content asynchronously, you need to wait to scrape until the content you're interested in has been loaded. Depending on the page, there might be different ways of checking, but the easiest is just to check at regular intervals for something you expect to see, until you find it.
The trick here is figuring out what to look for - you need something that won't be present on the page until your desired content has been loaded. In this case, the easiest option I found for top-level pages is to manually input the H1 tags you expect to see on each page, keying them to the hash:
var titleMap = {
  '#!contactUs': 'Contact Us',
  '#!aboutUs': 'About Us'
  // etc. for the other pages
};
Then in your success block, you can set a recurring timeout to look for the title you want in an h1 tag. When it shows up, you know you can render the page:
if (phantom.loadStatus === 'success') {
  // check every 300 milliseconds
  var timeoutId = window.setInterval(function () {
    // look for the title element you expect to see
    var h1s = document.querySelectorAll('h1');
    if (h1s.length) { // querySelectorAll always returns a (possibly empty) NodeList
      // h1s is a node list, not an array, hence the weird syntax here
      Array.prototype.forEach.call(h1s, function (h1) {
        if (h1.textContent.trim() === titleMap[hash]) {
          // we found it!
          console.log('Found H1: ' + h1.textContent.trim());
          phantom.render('result.png');
          console.log("Rendered image.");
          // stop the cycle
          window.clearInterval(timeoutId);
          phantom.exit();
        }
      });
      console.log('Found H1 tags, but not ' + titleMap[hash]);
    } else {
      console.log('No H1 tags found.');
    }
  }, 300);
}
The above code works for me. But it won't work if you need to scrape search results - you'll need to figure out an identifying element or bit of text that you can look for without having to know the title ahead of time.
Edit: Also, it looks like the newest version of PhantomJS now triggers an onResourceReceived event when it gets new data. I haven't looked into this, but you might be able to bind a listener to this event to achieve the same effect.
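An untested sketch of that approach with the newer WebPage API (the DOM polling above is what I actually verified):
// Untested sketch using PhantomJS's newer WebPage API: log every finished
// resource instead of polling the DOM for an H1.
var page = require('webpage').create();
page.onResourceReceived = function (response) {
  if (response.stage === 'end') {
    console.log('Received: ' + response.url);
  }
};
page.open('http://www.regulations.gov/#!contactUs', function (status) {
  console.log('Load status: ' + status);
});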
I'm searching for a JS script which will show a message (something like "Loading, please wait") until the page has loaded all images.
Important: it mustn't use any JS framework (jQuery, MooTools, etc.); it must be plain JavaScript.
The message must disappear when the page is loaded.
Yeah, an old-school question!
This goes back to those days when we used to preload images...
Anyway, here's some code. The magic is the complete property on the Image objects in the document.images collection.
// set up a timer; adjust the 200 to some other number of milliseconds if desired
var _timer = setInterval(imgloaded, 200);

function imgloaded() {
  // assume they're all loaded
  var loaded = true;
  // test all images for the "complete" property
  for (var i = 0, len = document.images.length; i < len; i++) {
    if (!document.images[i].complete) { loaded = false; break; }
  }
  // if loaded is still true, change the HTML
  if (loaded) {
    document.getElementById("msg").innerHTML = "Done.";
    // clear the timer
    clearInterval(_timer);
  }
}
Of course, this assumes you have some DIV thrown in somewhere:
<div id="msg">Loading...</div>
Just add a static <div> to the page informing the user that the page is loading. Then add a window.onload handler and remove the div.
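A minimal sketch of that approach, reusing a <div id="msg"> like the one in the answer above:
// Remove the loading message once the page, including images, has loaded.
window.onload = function () {
  var msg = document.getElementById('msg');
  if (msg) msg.parentNode.removeChild(msg);
};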
BTW, what's the reason for this? Don't users already have page-load indicators in their browsers?
You could make async Ajax requests for the images and add a callback for when they finish.
Here's some code to illustrate it:
var R = new XMLHttpRequest();
R.onreadystatechange = function () {
  if (R.readyState == 4) {
    // Do something with R.responseXML/Text ...
    stopWaiting();
  }
};
R.open("GET", imageUrl, true); // imageUrl: the image to fetch (placeholder)
R.send();
Theoretically you could attach an onload event to every image object that runs a function checking whether all images are loaded. This way you don't need a setTimeout(). It would fail, however, if an image didn't load, so you'd have to take onerror into account as well.
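A sketch of that idea, counting load and error alike so one broken image can't stall the check:
var remaining = document.images.length;

function imageDone() {
  remaining--;
  if (remaining === 0) {
    document.getElementById("msg").innerHTML = "Done.";
  }
}

for (var i = 0; i < document.images.length; i++) {
  var img = document.images[i];
  if (img.complete) {
    imageDone(); // already finished (or failed) before we attached handlers
  } else {
    img.onload = imageDone;
    img.onerror = imageDone;
  }
}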