Find images in the above fold using Puppeteer

I'm trying to find the images above the fold using Puppeteer and set the attribute loading="eager" on them.
Here is what I've tried:
const images = await page.$$("img");
images.forEach(async image => {
  if (await image.isIntersectingViewport()) {
    console.log("intersecting");
    image.setAttribute("loading", "eager");
  }
});
It finds the images above the fold correctly. However, when I try to set the attribute on one of them, it throws an error: TypeError: image.setAttribute is not a function

image is an ElementHandle: an object in Puppeteer's Node.js world that points to an element in the browser's world, so it does not expose DOM methods such as setAttribute.
To set an attribute on the DOM element, call evaluate and set it inside the browser:
await image.evaluate(i => i.setAttribute("loading", "eager"));
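Combined with the loop from the question, this could look like the sketch below. forEach does not wait for async callbacks, so a for...of loop is used instead. The helper name markAboveFoldImagesEager is hypothetical, and page is assumed to behave like a Puppeteer Page.

```javascript
// Hypothetical helper: `page` is assumed to be a Puppeteer Page
// (or any object exposing the same $$ method).
async function markAboveFoldImagesEager(page) {
  const images = await page.$$("img");
  let count = 0;
  // for...of waits for each async step before moving to the next image.
  for (const image of images) {
    if (await image.isIntersectingViewport()) {
      // This callback runs inside the browser, where `img` is a real DOM element.
      await image.evaluate(img => img.setAttribute("loading", "eager"));
      count++;
    }
  }
  // Number of images that were marked as eager.
  return count;
}
```

Usage would be along the lines of await markAboveFoldImagesEager(page) after the page has loaded.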

Related

Can you use a dynamic selector that doesn't stay consistent with puppeteer?

I'm trying to find a selector that stays fairly consistent through the entire process. There are 3 buttons that need to be clicked to get to the create account page. On the third button there's no static selector I could find. There was an a id="" tag that I used, and I kept getting errors until I realized the ID changes every time the page is refreshed. I saw that if there were some consistency within it, you could do a[id^="product"], but unfortunately the entire ID changes. Is there any way to write the selector as a[id=""] and dynamically scrape the ID as it changes? Below are two examples of the ID changing as the page is refreshed.

You may assume that to select an element, you have to use only that element. But if there are no reliably identifiable characteristics on that element, work back up the tree and check parents for identifiable characteristics.
In your case, the enclosing <div> has plenty of unique-looking static class names you can use:
const puppeteer = require("puppeteer"); // ^19.0.0

const html = `
<div class="nike-unite-component action-link loginJoinLink current-member-signin">
  <a href="#">Junte-se a</a>
</div>
<script>
// for testing
document.querySelector("a").addEventListener("click", ({target}) => {
  target.textContent = "clicked";
});
</script>`;

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);
  const sel = ".loginJoinLink.current-member-signin a";
  const el = await page.waitForSelector(sel);
  await el.click();
  console.log(await el.evaluate(el => el.textContent)); // => clicked
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
The assumption here is that there's only one element with the CSS selector .loginJoinLink.current-member-signin a, or that the one you want is the first in the document. If the assumption isn't true, you can always strengthen the selector, for example,
.loginJoinLink.current-member-signin.action-link.nike-unite-component > a
(I typed this in from your screenshot, so please test it for typos).
If that's still not enough to disambiguate, add additional parent context:
#nike-unite-loginForm .loginJoinLink.current-member-signin a
XPath with a text selector is another option:
const link = await page.waitForXPath("//a[contains(text(), 'Junte-se a')]");
The assumption here is that the substring Junte-se a is unique text inside <a> tags within the document, or that the one you want to click is the first.
If neither works, please provide a runnable, reproducible example with code and the actual site or representative markup. Oftentimes, there are iframes, shadow DOM roots, script blocking, visibility and other factors that make code like this fail on certain sites.
For example, visibility impacts how Puppeteer's trusted .click() works, so it may be necessary to use the untrusted native click:
await el.evaluate(el => el.click());
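One way to combine the two kinds of click is to try the trusted click first and fall back to the native one only if it throws. This is a sketch rather than a recommendation from the answer above: clickWithFallback is a hypothetical helper name, and el is assumed to be a Puppeteer ElementHandle.

```javascript
// Hypothetical helper: `el` is assumed to be a Puppeteer ElementHandle.
async function clickWithFallback(el) {
  try {
    // Trusted click: Puppeteer scrolls the element into view and clicks it with the mouse.
    await el.click();
    return "trusted";
  } catch (err) {
    // Untrusted click: calls the DOM element's own click() inside the browser.
    await el.evaluate(node => node.click());
    return "native";
  }
}
```

The return value only reports which path was taken, which can help when debugging why a site behaves differently from the test markup.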

Is there a way to target a specific element using Puppeteer AND preserve the CSS when converting html to pdf?

I'd like to convert some html to a pdf file. The problem is that I just need a part of a webpage and most certainly not all elements. So I was wondering if there is a way to target a single element with a specific id for example, so that only that element gets converted to a pdf?
I know I can do this for example:
const dom = await page.$eval('div.jsb', (element) => {
return element.innerHTML
}) // Get DOM HTML
await page.setContent(dom) // HTML markup to assign to the page for generate pdf
However, using the code above won't preserve the CSS...
It is also not an option to use page.addStyleTag to add the css by hand, since the element I am trying to convert to a pdf has loads and loads of CSS styles already applied to it...
So the question remains, how can I convert a single element on a page using Puppeteer (or if you know of other ways / methods / libraries, then those are welcome too of course).
Grabzit, for example, allows you to specify the targetElement in its options like so:
const options = {
'targetElement': '#rightContent',
'pagesize': 'A4',
}
Unfortunately, it does not give me consistent results.

I have had some success like this:
const myElement = await page.$('.my-el');
await page.evaluate(el => {
el = el.cloneNode(true);
document.body.innerHTML = `
<div>
${el.outerHTML}
</div>
`;
}, myElement);
const pdf = await page.pdf(...)
However, it does not work very well when the selected element contains canvas elements, because serializing to outerHTML does not carry over what has been drawn on a canvas.
(Code based on example here https://github.com/puppeteer/examples/blob/master/element-to-pdf.js)
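The approach above can be wrapped in a small function that also reports a missing element. This is a minimal sketch: elementToPdf is a hypothetical helper name, page is assumed to be a Puppeteer Page, and pdfOptions is passed straight to page.pdf.

```javascript
// Hypothetical helper: `page` is assumed to be a Puppeteer Page.
async function elementToPdf(page, selector, pdfOptions = {}) {
  const handle = await page.$(selector); // resolves to null when nothing matches
  if (!handle) {
    throw new Error(`No element matches ${selector}`);
  }
  // Runs in the browser: replace the body's content with the element's markup.
  // Stylesheets loaded in <head> stay in place, so class-based rules still apply.
  await page.evaluate(el => {
    document.body.innerHTML = `<div>${el.outerHTML}</div>`;
  }, handle);
  return page.pdf(pdfOptions);
}
```

CSS rules that depend on the element's former ancestors will stop matching after the swap, so the output may still differ from the original page.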

Web scraping using Apify

I'm trying to scrape URLs from https://en.wikipedia.org/wiki/List_of_hedge_funds
Specifically, I'm trying to use Apify to scrape that page and return a list of URLs from anchor tags present in the HTML. In my console, I expect to see the value of the href attribute of one or more anchor tags that exist on the target page in a property called myValue. I also expect to see the page title in a property called title. Instead, I just see the following URL property and its value.
My Apify actor uses the Puppeteer platform. So I'm using a pageFunction similar to the way Puppeteer uses it.
Below is a screen shot of the Apify UI just before I run it.
Page function
function pageFunction( context ) {
// called on every page the crawler visits, use it to extract data from it
var $ = context.jQuery;
var result = {
title: $('.wikitable').text,
myValue: $('a[href]').text,
};
return result;
}
What am I doing wrong?

You have a typo in your code: text is a function, so you need to call it with parentheses:
var result = {
title: $('.wikitable').text(),
myValue: $('a[href]').text(),
};
But note that this will probably not do what you expect anyway: it returns the combined text of all matched elements. You probably need to use jQuery's each() function (https://api.jquery.com/jquery.each/) to iterate over the found elements, push some values from them to an array, and return the array from your page function.
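A sketch of that iteration, assuming context.jQuery behaves like regular jQuery as it does in the question. Reading the page title from the title element is an assumption about what the asker wants in the title property.

```javascript
function pageFunction(context) {
  // called on every page the crawler visits
  var $ = context.jQuery;
  var links = [];
  // each() calls the callback once per matched anchor, with `this` set to the element
  $('a[href]').each(function () {
    links.push($(this).attr('href'));
  });
  return {
    title: $('title').text(), // assumption: the page title is wanted here
    myValue: links,           // array with one href per anchor
  };
}
```

The result then contains an array in myValue instead of one concatenated string.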

The page seems to be loaded by JavaScript so actually I have to use asynchronous code.

Protractor - getting computed style from web element using browser.executeScript works with string but fails with function

I'm trying to verify the visibility of a tooltip popup by reading the visibility property of window.getComputedStyle() using the Protractor framework.
When I pass a string to executeScript it's working fine. It's returning visible:
// elementToCheck is an ElementFinder
async getComputedStyleVisibility(elementToCheck) {
return await browser
.executeScript(`return window.getComputedStyle(document.querySelector('${elementToCheck.locator().value}')).visibility`);
}
However, this is failing when I replace the string within executeScript by a function. It's returning hidden and it looks like execution gets stuck until tooltip popup disappears.
So I guess there's some synchronisation issue, but I cannot figure out what's happening:
// elementToCheck is an ElementFinder
async getComputedStyleVisibility(elementToCheck) {
return await browser.executeScript(
webElem => (window.getComputedStyle(webElem).visibility),
await elementToCheck.getWebElement()
);
}

To make your script work as intended, you need to access your WebElement correctly.
The docs say,
Any arguments provided in addition to the script will be included as script arguments and may be referenced using the arguments object
So you need to use the arguments object in your script, like so:
async getComputedStyleVisibility(elementToCheck) {
return await browser.executeScript(
() => (window.getComputedStyle(arguments[0]).visibility),
await elementToCheck.getWebElement()
);
}
BUT
If you aren't restricted in some way to only using browser.executeScript(), then you should rethink your approach.
Protractor provides an API for checking whether a certain element is present, displayed, and so on.
Check that an element is present and visible to the user:
element(by.css("#a")).isDisplayed()
  .then(isDisplayed => console.log("element displayed?", isDisplayed))
  .catch(err => console.error("Some error happened. Element not present..", err))
In my opinion, you should use browser.executeScript() only as a last resort. Most of the general stuff like clicking, checking for presence, etc. is already provided by Protractor in a handy way.
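isDisplayed() rejects when the element is not in the DOM at all, so the two cases can be told apart by checking isPresent() first. A minimal sketch: isVisibleToUser is a hypothetical helper name, and finder is assumed to be a Protractor ElementFinder.

```javascript
// Hypothetical helper: `finder` is assumed to be a Protractor ElementFinder.
async function isVisibleToUser(finder) {
  // isPresent() resolves to false for a missing element instead of rejecting.
  if (!(await finder.isPresent())) {
    return false;
  }
  // The element exists, so isDisplayed() can be asked safely.
  return finder.isDisplayed();
}
```

It could be called as await isVisibleToUser(element(by.css("#a"))) inside a spec.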

Rendering image dynamically is so hard in React

After trying various ways for hours and checking every related link, I couldn't find a proper way to render an image dynamically in React.
Here is what I am trying to do.
I have an array of objects in which each object has an attribute called name. I am using map to loop over this array and return an array of img elements, as shown below.
<img className="img-thumbnail" src={require('../../public/images/'+item.name+'.png')}/>
where item.name is the name of the image file I want to display, for which require gives me the error "cannot find module".
Moreover, I need to implement a fallback option: rather than showing broken images in case an image file does not exist, I want to display a default image.
Here are the things I have tried:
using try and catch block over require and calling this function from img element
setImage(data){
  try{
    return require( '../../public/images/'+data+'.png' ) //actual image
  }catch(err){
    console.log(err);
    return require('../../public/images/fallback.png'); //fallback
  }
}
<img className="img-thumbnail" src={this.setImage(item)}/>
using import inside the same function as above; I got an error saying import cannot be called from inside a function
using the react-image library; it turned out that it does not support local images.
Any help?

Here is a tricky way to handle this: use React state to track whether there was an error.
If true, show the fallback; otherwise, show the actual image.
setImage = (data) => {
  const image = new Image();
  // attach the handler before assigning src so a fast failure is not missed
  image.onerror = () => {
    this.setState({
      hasError: true
    })
  }
  image.src = '../../public/images/'+data+'.png';
  return image.src;
}
// into render (hasError is assumed to start as false in the initial state)
this.state.hasError
  ? <img src="../../public/images/fallback.png" />
  : <img className="img-thumbnail" src={this.setImage(item)}/>
Update: Example
var image = new Image();
image.src = 'fake.jpg';
image.onerror = () => {
  console.log("image doesn't exist");
}
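The same error check can also be attached to the rendered element through React's onError prop, which avoids creating a separate Image object. A minimal sketch: applyFallback is a hypothetical handler name, and the data attribute is an assumed guard so that a missing fallback image cannot trigger an endless loop of error events.

```javascript
// Hypothetical handler for <img onError={...}>; not part of React itself.
function applyFallback(event, fallbackSrc) {
  const img = event.target;
  // Guard: if the fallback was already applied, do not swap the source again.
  if (img.dataset.fallbackApplied === "true") {
    return false;
  }
  img.dataset.fallbackApplied = "true";
  img.src = fallbackSrc;
  return true;
}

// Usage inside render (JSX, shown as a comment so the block runs as plain JavaScript):
// <img src={src} onError={e => applyFallback(e, fallback)} />
```

Because the state lives on each img element, one broken image does not affect the others in the list.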

I don't know why you need require; it can be done more simply. Import the fallback image like this:
import fallback from '../../public/images/fallback.png';
For the dynamic images, I would suggest building a key-value map, for example:
let data = {
  image1: '../../public/images/image1.png',
  image2: '../../public/images/image2.png'
}
and import the images as usual.
Then, in render, it could be something like this:
render(){
  return(
    // just a reference: type is the key of the image you want to add dynamically
    <img className="img-thumbnail" src={data[type] ? data[type] : fallback}/>
  )
}

Requires are checked statically at compile time, so a require path cannot be an arbitrary runtime value. Since you have static images in your bundle and each object maps to one of them, you can use a solution like the following:
const images = {
image1: require('local/path/to/image1'),
image2: require('local/path/to/image2'),
image3: require('local/path/to/image3'),
}
const defaultImage = require('local/path/to/defaultImage');
const Img = ({ name }) => {
  // here name is the name for the image you get from the API
  // it should match one of the keys listed in the images object
  return <img src={images[name] ? images[name] : defaultImage}/>
}

All of the answers above were helpful, but unfortunately none of the methods worked for me. Digging a little deeper, I found that require was giving the error "cannot find module" because, after webpack bundles my code, require loses its context, which means the given relative path is no longer valid.
What I needed to do was preserve the context, which I did by using require.context.
Here is the final code that worked:
//getting the context of image folder
const imageFolderContext = require.context('relative/path/to/image/folder')
//function to check if image exist
checkFile(data){
  try{
    let pathToImage = './path/to/image/relative/to/image/folder/context'
    imageFolderContext(pathToImage) //throws if the image does not exist
    return true //return true if exist
  }catch(err){return false}
}
//rendering image element dynamically based on item name and if exist or not
<img src={this.checkFile(itemName)?imageFolderContext('path/to/image/relative/to/context'):imageFolderContext('default/image/path/')} />
Don't forget to bind the checkFile function.
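As an alternative to try/catch, the function returned by require.context also exposes keys(), which lists every path it can resolve. This sketch assumes all images are .png files directly inside the context folder. resolveImage and the file names in the usage comment are hypothetical.

```javascript
// Hypothetical helper: `context` is assumed to be the function returned by
// webpack's require.context, whose keys look like "./name.png".
function resolveImage(context, name, fallbackKey) {
  const key = `./${name}.png`;
  // keys() lists every module path the context can resolve.
  return context.keys().includes(key) ? context(key) : context(fallbackKey);
}

// Usage with webpack (shown as comments because require.context exists only in bundled code):
// const ctx = require.context('./images', false, /\.png$/);
// <img src={resolveImage(ctx, item.name, './fallback.png')} />
```

Since the lookup is a plain function of its arguments, no binding is needed.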
