Puppeteer: dynamic page using IndexedDB not loading

I am trying to take a screenshot of a page that uses IndexedDB to generate some of its content.
My Puppeteer code is pretty simple:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({
    width: 1200,
    height: 1080
  });
  const relevantSite = 'http://example.com'; // <<-- replace this with the actual site
  await page.goto(relevantSite);
  // Wait until the content container exists in the DOM
  await page.waitForSelector('#myContentSelector');
  console.log('Content is now loaded');
  await page.screenshot({path: 'dynamic-screenshot.png'});
  await browser.close();
})();
The code above works fine for pages that generate content dynamically and DON'T use IndexedDB, but for pages that do, I can't figure out what I need to do to get the page to load correctly.
Do I have to do something special to get IndexedDB to work for headless pages loaded in Puppeteer?

IndexedDB seems to work just fine with Puppeteer.
I suspect your issue is actually that you don't give the page enough time to load and for the content to be populated.
await page.waitForSelector('#myContentSelector');
Is #myContentSelector something like a container div, where the actual content is populated later (even 25 ms later)? If so, waitForSelector resolves as soon as the container exists, before its content is there.
Try waiting a second after the page loads before taking the screenshot.
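Instead of a fixed one-second delay, one option is to wait until the container actually has content. This is only a minimal sketch: the helper name screenshotWhenPopulated is hypothetical, and "the container has at least one child element" is an assumed definition of "populated" that may not match the real page.

```javascript
// Hypothetical helper: wait for the container, then for content inside it.
async function screenshotWhenPopulated(page, selector, path) {
  // Resolves as soon as the element exists, even if it is still empty.
  await page.waitForSelector(selector);

  // Runs inside the page and polls until the container has child elements.
  // "Has at least one child" is an assumption about what "populated" means.
  await page.waitForFunction(
    (sel) => {
      const el = document.querySelector(sel);
      return el !== null && el.children.length > 0;
    },
    {},
    selector
  );

  await page.screenshot({ path });
}
```

In the question's script this would be called as screenshotWhenPopulated(page, '#myContentSelector', 'dynamic-screenshot.png') after page.goto.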

Try something like this:
await page.goto(relevantSite, { waitUntil: 'networkidle2' });
await page.waitForSelector('#myContentSelector');
console.log('Content is now loaded');
await page.screenshot({path: 'dynamic-screenshot.png'});
This uses the waitUntil option of page.goto. A separate page.waitForNavigation() is only needed if the page navigates again after the initial load; if it is started after goto has already resolved and no further navigation happens, it just times out.

Related

Puppeteer doesn't find button, doesn't recognize loaded website

This is my first time using Puppeteer, and I am trying to simply click a button after clicking the deny cookies button. This is my code:
await page.goto('https://myurl.com');
await page.click('a.cc-btn.cc-deny');
// await page.waitForNavigation();
await page.waitForSelector("#detailview_btn_order", {visible: true});
await page.click("#detailview_btn_order");
Clicking the deny cookies button works like a charm. However, it seems the second button can't be identified by Puppeteer. If I don't use waitForSelector, it just says it can't find it. If I use it, I get a timeout after 30 seconds, even though the website finishes loading after 5 seconds. If I uncomment waitForNavigation (regardless of what options I use), I get a timeout there, even though the site loads within seconds. What am I doing wrong? Thanks!
Can you try this:
await page.goto('https://myurl.com');
await Promise.all([
page.click('a.cc-btn.cc-deny'),
page.waitForNavigation(),
]);
const iframeElement = await page.waitForSelector("#my-iframe");
const frame = await iframeElement.contentFrame();
await frame.waitForSelector("#detailview_btn_order", {visible: true});
await frame.click("#detailview_btn_order");
Sometimes there is a race condition between a click and navigation.

screen shot and data trying to be taken before site fully loads using puppeteer

Hi, I am trying to take a screenshot of a website using Puppeteer, but the site loads quite slowly, so I am never able to grab any data or take screenshots. I would like to delay my screenshot until the site has finished loading. I have tried a bunch of methods and can't figure it out. Thanks in advance for any help.
This is my code:
const puppeteer = require("puppeteer-extra");
// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());

async function scrapeProduct(url) {
  // launching puppeteer
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "load" });
  await page.waitFor("*");

  function time() {
    var d = new Date();
    var n = d.getSeconds();
    return console.log(n);
  }

  time();
  await page.screenshot({ path: "testresult.png" });
  time();
  await browser.close();
}

scrapeProduct("https://www.realcanadiansuperstore.ca/search?search-bar=milk");
waitFor has been deprecated recently, so you are better off trying the other wait methods.
I can't inspect the webpage you are taking a screenshot of, so I cannot tell what might be happening after the load event.
However, have you tried the other wait methods Puppeteer offers?
waitForNavigation and waitForSelector are mentioned in https://stackoverflow.com/a/52501934/484337
If you have control of the page you are taking a screenshot of, you can have it signal when it is ready (for example, by setting a flag from a DOM event handler), and your Puppeteer code can wait for that with page.waitForFunction.
If all else fails and time is not important, you can put in a sleep(n) that is long enough to guarantee the page is loaded.
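Puppeteer's old page.waitFor is deprecated, so the sleep(n) mentioned above can be written with a plain timer. A small sketch; sleep and screenshotAfterDelay are hypothetical helper names, and the delay you need depends entirely on the site:

```javascript
// Resolves after the given number of milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical wrapper: pause for a fixed time, then take the screenshot.
async function screenshotAfterDelay(page, path, delayMs) {
  await sleep(delayMs);
  await page.screenshot({ path });
}
```

For example, screenshotAfterDelay(page, "testresult.png", 5000) would wait five seconds after page.goto has returned. A fixed delay is the bluntest option, so it is best kept as a fallback.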

How to select elements within an iframe element in Puppeteer

Since ESPN does not provide an API, I am trying to use Puppeteer to scrape data about my fantasy football league. However, I am having a hard time logging in with Puppeteer because the login form is nested within an iframe element.
I have gone to http://www.espn.com/login and selected the iframe. I can't seem to select any of the elements within the iframe except for the main section by doing
frame.$('.main')
This is the code that seems to get the iframe with the login form.
const browser = await puppeteer.launch({headless:false});
const page = await browser.newPage();
await page.goto('http://www.espn.com/login')
await page.waitForSelector("iframe");
const elementHandle = await page.$('div#disneyid-wrapper iframe');
const frame = await elementHandle.contentFrame();
await browser.close()
I want to be able to access the username field, password field, and the login button within the iframe element. Whenever I try to access these fields, I get a return of null.
You can get the iframe using contentFrame as you are doing now, and then call $.
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('http://www.espn.com/login')
const elementHandle = await page.waitForSelector('div#disneyid-wrapper iframe');
const frame = await elementHandle.contentFrame();
await frame.waitForSelector('[ng-model="vm.username"]');
const username = await frame.$('[ng-model="vm.username"]');
await username.type('foo');
await browser.close()
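If you don't know in advance which iframe holds the element, a variation on the approach above is to look through every frame of the page. This is just a sketch: findFrameWithSelector is a hypothetical helper, and it only sees frames that are already attached when it runs, so you may still need to wait for the iframe first.

```javascript
// Hypothetical helper: returns the first frame that contains `selector`,
// or null when no currently attached frame has a matching element.
async function findFrameWithSelector(page, selector) {
  // page.frames() lists the main frame and all attached child frames.
  for (const frame of page.frames()) {
    // frame.$ resolves to null when nothing matches in that frame.
    const handle = await frame.$(selector);
    if (handle !== null) {
      return frame;
    }
  }
  return null;
}
```

With the selector from the answer, this would be called as findFrameWithSelector(page, '[ng-model="vm.username"]'), and the returned frame can then be used for frame.$ or frame.type.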
I had an issue with finding Stripe elements.
The reason is the following:
You can't access an <iframe> with a different origin using JavaScript; it would be a huge security flaw if you could. Because of the same-origin policy, browsers block scripts that try to access a frame with a different origin.
Therefore, when I tried to use Puppeteer's methods Page.frames(), Page.mainFrame(), and ElementHandle.contentFrame(), they did not return any iframe to me. The problem is that this was happening silently, and I couldn't figure out why it couldn't find anything.
Adding these arguments to launch options solved the issue:
'--disable-web-security',
'--disable-features=IsolateOrigins,site-per-process'
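For reference, these flags go into the args array of the launch options. A small sketch; buildLaunchOptions is a hypothetical helper name, and since the flags switch off browser security features, this is probably only reasonable for scraping pages you trust:

```javascript
// Hypothetical helper: builds launch options containing the two flags above,
// followed by any extra Chromium flags the caller passes in.
function buildLaunchOptions(extraArgs = []) {
  return {
    args: [
      '--disable-web-security',
      '--disable-features=IsolateOrigins,site-per-process',
      ...extraArgs,
    ],
  };
}
```

The result would be passed straight to Puppeteer, as in puppeteer.launch(buildLaunchOptions()).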

puppeteer execute a js function on the chosen page

The page being analyzed is https://www.diretta.it/.
On this page, the content for the following days is loaded dynamically with JavaScript, without changing the URL of the site (you can try it at the top right of the table).
Using Puppeteer with the following code
await page.goto ('https://www.diretta.it/');
it loads the contents of today's page.
Is there a way to load the page with tomorrow's content?
I have to scrape information about the matches on the following days.
The JS function for changing the day, which can be run from the browser console, is:
> set_calendar_date ('1')
What you are looking for is the page.evaluate() function.
This function lets you run any JS function in the page context.
In simpler terms, running page.evaluate() is akin to opening Dev tools and writing set_calendar_date('1') there directly.
Here is a working snippet. Don't hesitate to pass {headless: false} to puppeteer.launch() if you want to see it working with your own eyes.
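const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.diretta.it/');
  await page.evaluate(() => {
    set_calendar_date('1');
  });
  // Wait a bit for the website to refresh its contents.
  // (page.waitFor is deprecated, so a plain timeout is used instead.)
  await new Promise((resolve) => setTimeout(resolve, 500));
  // Updated table is now available
  await browser.close();
})();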
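To scrape several days, the offset can be passed into page.evaluate as an argument instead of being hard-coded. A minimal sketch; showDay is a hypothetical helper name, and it assumes set_calendar_date takes the offset as a string, as in the question's example:

```javascript
// Hypothetical helper: switch the site's calendar to a given day offset.
async function showDay(page, dayOffset) {
  // Extra arguments to page.evaluate are serialized and handed to the
  // function that runs inside the page.
  await page.evaluate((offset) => {
    // set_calendar_date is the site's own global function (from the question).
    set_calendar_date(offset);
  }, String(dayOffset));
}
```

Calling showDay(page, 2) would then request the day after tomorrow, assuming the site accepts that offset. You would still need to wait for the table to refresh before reading it.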

How do I use jQuery with pages on puppeteer?

I am trying to use jQuery on the pages I load with Puppeteer and would like to know how to do that. My code is structured like this:
const puppeteer = require('puppeteer');
let browser = null;

async function getSelectors() {
  try {
    browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
    const page = await browser.newPage();
    await page.setViewport({width: 1024, height: 1080});
    await page.goto('https://www.google.com/');
    await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.2.1.min.js'});
    var button = $('h1').text();
    console.log(button);
  } catch (e) {
    console.log(e);
  }
}

getSelectors();
Also I will be navigating to many pages within puppeteer so is there a way I can just add jQuery once and then use it throughout? A local jquery file implementation would be helpful as well.
I tried implementing the answers from inject jquery into puppeteer page but couldn't get my code to work. I will be doing much more complex stuff than the one illustrated above so I need jQuery and not vanilla JS solutions.
I finally got a tip from "How to scrape that web page with Node.js and puppeteer", which helped me understand that the Puppeteer page.evaluate function gives you direct access to the DOM of the page you've just launched in Puppeteer.
To get the following code to work, you should know I'm running this test in Jest. Also, you need a suitable URL to a page that has a table element with an ID. Obviously, you can change the details of both the page and the jQuery function you want to try out. I was in the middle of a jQuery Datatables project, so I needed to make sure I had a table element and that jQuery could find it. The nice thing about this environment is that the browser is quite simply a real browser, so if I add a script tag to the actual HTML page instead of adding it via Puppeteer, it works just the same.
const puppeteer = require('puppeteer');

test('Check jQuery datatables', async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://localhost/jest/table.html');
  await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.3.1.slim.min.js'});
  // Runs inside the page, where the injected jQuery ($) is available
  const result = await page.evaluate(() => {
    try {
      var table = $("table").attr("id");
      return table;
    } catch (e) {
      return e.message;
    }
  });
  console.log("result", result);
  await browser.close();
});
The key discovery for me: within the page.evaluate function, your JavaScript code runs in the familiar context of the page you've just opened in the browser. I've moved on to create tests for complex objects created using jQuery plugins and within page.evaluate they behave as expected. Trying to use JSDOM was driving me crazy because it behaved a bit like a browser, but was different with regard to the key points I was using to test my application.
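On the "add jQuery once" part of the question: an injected script tag belongs to the current document, so it is gone after the next navigation. One workaround is a small helper that you call after each page.goto. This is only a sketch; ensureJQuery is a hypothetical name, and the check assumes jQuery exposes itself as window.jQuery.

```javascript
// Hypothetical helper: injects jQuery only when the page does not have it yet.
// `source` is passed to page.addScriptTag, so it can be { url: '...' } or
// { path: '...' } for a local file.
async function ensureJQuery(page, source) {
  // Runs in the page: true when a jQuery global is already defined there.
  const alreadyLoaded = await page.evaluate(
    () => typeof window.jQuery === 'function'
  );
  if (!alreadyLoaded) {
    await page.addScriptTag(source);
  }
  // true when a script tag was added by this call
  return !alreadyLoaded;
}
```

For a local copy you could call ensureJQuery(page, { path: './jquery.min.js' }), where the file path is a placeholder for wherever your jQuery file lives. After that, $ can be used inside page.evaluate as in the test above.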
