How to select elements within an iframe element in Puppeteer - javascript

Since ESPN does not provide an API, I am trying to use Puppeteer to scrape data about my fantasy football league. However, I am having a hard time logging in with Puppeteer because the login form is nested within an iframe element.
I have gone to http://www.espn.com/login and selected the iframe. I can't seem to select any of the elements within the iframe except for the main section by doing
frame.$('.main')
This is the code that seems to get the iframe with the login form.
const browser = await puppeteer.launch({headless:false});
const page = await browser.newPage();
await page.goto('http://www.espn.com/login')
await page.waitForSelector("iframe");
const elementHandle = await page.$('div#disneyid-wrapper iframe');
const frame = await elementHandle.contentFrame();
await browser.close()
I want to be able to access the username field, password field, and the login button within the iframe element. Whenever I try to access these fields, I get a return of null.

You can get the iframe using contentFrame as you are doing now, then wait for the field to appear inside the frame before calling $.
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('http://www.espn.com/login')
const elementHandle = await page.waitForSelector('div#disneyid-wrapper iframe');
const frame = await elementHandle.contentFrame();
await frame.waitForSelector('[ng-model="vm.username"]');
const username = await frame.$('[ng-model="vm.username"]');
await username.type('foo');
await browser.close()
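The steps in this answer can be wrapped in a small helper. This is a minimal sketch that only uses the Puppeteer calls already shown in this thread (page.waitForSelector, contentFrame, frame.waitForSelector, type); the function name typeIntoFrameField and its parameters are hypothetical.

```javascript
// Hypothetical helper: waits for an iframe, then for a field inside it, and types into the field.
// `page` is a Puppeteer Page; both selectors are supplied by the caller.
async function typeIntoFrameField(page, iframeSelector, fieldSelector, text) {
  // Wait until the <iframe> element itself is attached to the outer page.
  const iframeHandle = await page.waitForSelector(iframeSelector);

  // contentFrame() resolves to null when the element has no frame Puppeteer can reach.
  const frame = await iframeHandle.contentFrame();
  if (!frame) {
    throw new Error(`No content frame found for ${iframeSelector}`);
  }

  // The form inside the frame renders asynchronously, so wait for the field as well.
  const field = await frame.waitForSelector(fieldSelector);
  await field.type(text);
  return frame;
}
```

With the selectors from the answer, a call would look like: await typeIntoFrameField(page, 'div#disneyid-wrapper iframe', '[ng-model="vm.username"]', 'foo');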

I had an issue finding Stripe elements.
The reason is the following:
You can't access an <iframe> with a different origin using JavaScript; it would be a huge security flaw if you could. Under the same-origin policy, browsers block scripts that try to access a frame with a different origin.
Therefore, when I tried Puppeteer's methods Page.frames(), Page.mainFrame(), and ElementHandle.contentFrame(), they did not return any iframe to me. The problem is that this happened silently, so I couldn't figure out why nothing was found.
Adding these arguments to launch options solved the issue:
'--disable-web-security',
'--disable-features=IsolateOrigins,site-per-process'
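As a sketch of where those flags go: they belong in the args array of the launch options. The two flags are the ones quoted above; the helper name buildLaunchOptions is hypothetical, and because --disable-web-security switches off the same-origin policy for the whole browser, this assumes a throwaway automation browser rather than one used for normal browsing.

```javascript
// Hypothetical helper that builds Puppeteer launch options with the flags quoted above.
function buildLaunchOptions(extraArgs = []) {
  return {
    headless: false,
    args: [
      '--disable-web-security',
      '--disable-features=IsolateOrigins,site-per-process',
      ...extraArgs, // any additional Chromium flags the caller wants
    ],
  };
}

// Usage (assumes the `puppeteer` package is installed):
// const puppeteer = require('puppeteer');
// const browser = await puppeteer.launch(buildLaunchOptions());
```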

Related

How do I make puppeteer click on link?

I want to make a scraper with puppeteer, that opens a site, uses its search bar and opens the first link.
This is the code:
const puppeteer = require('puppeteer');
(async () => {
let browser = await puppeteer.launch();
let page = await browser.newPage();
await page.goto('https://example.com', {waitUntil: 'networkidle2'});
await page.click('[name=query]');
await page.keyboard.type("(Weapon)");
await page.keyboard.press('Enter');
await page.waitForSelector('div[class="search-results"]', {timeout: 100000});
})();
The problem is that I can't make it open the first link from the search results. I tried to use page.click(), but all of the search results are the same except for the URL.
What I want to know is how to make it open the first link in the search results.
There are several ways to solve this. I recommend experimenting a bit so you learn the different approaches.
await page.click('.search-results a');
page.click() clicks the first element that matches the selector, so if you want the first result, this is enough.
Or you can select all the links and then click on the first one:
const resultLinks = await page.$$('.search-results a');
await resultLinks[0].click();
It'd be better to include a condition here as well, so you don't end up with an error because no element was found:
const resultLinks = await page.$$('.search-results a');
if (resultLinks.length) await resultLinks[0].click();
There are more ways, so if you want to learn more, please refer to the API documentation.
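Since clicking a search result usually loads a new page, it can help to wait for that navigation too. The following is a minimal sketch, assuming the click triggers a full navigation in the same tab; the helper name openFirstResult is hypothetical, while page.$, page.waitForNavigation and click are standard Puppeteer calls.

```javascript
// Hypothetical helper: opens the first element matching `selector`.
// Returns true if a link was clicked, false if nothing matched.
async function openFirstResult(page, selector) {
  const link = await page.$(selector); // first match in document order, or null
  if (!link) {
    return false;
  }
  // Start waiting for the navigation before clicking, so the event is not missed.
  // Assumption: the click causes a full page navigation in the same tab.
  await Promise.all([page.waitForNavigation(), link.click()]);
  return true;
}
```

For the page in the question, this would be called as: await openFirstResult(page, '.search-results a');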

Screenshot and data are captured before the site fully loads using Puppeteer

Hi, I am trying to take a screenshot of a website using Puppeteer, but the site loads quite slowly, so I am never able to grab any data or take screenshots. I would like to delay my screenshot until the site has finished loading. I have tried a bunch of methods and can't figure it out. Thanks in advance for any help.
This is my code:
const puppeteer = require("puppeteer-extra");
// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());
async function scrapeProduct(url) {
//launching puppeteer
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto(url, { waitUntil: "load" });
await page.waitFor("*");
function time() {
var d = new Date();
var n = d.getSeconds();
return console.log(n);
}
time();
await page.screenshot({ path: "testresult.png" });
time();
await browser.close();
}
scrapeProduct("https://www.realcanadiansuperstore.ca/search?search-bar=milk");
waitFor has been deprecated recently, so you are better off trying the other wait methods.
I can't inspect the webpage you are taking a screenshot of, so I cannot tell what might be happening after the load event.
However, have you tried the other wait methods Puppeteer offers?
waitForNavigation and waitForSelector are mentioned in https://stackoverflow.com/a/52501934/484337
If you have control of the page you are taking a screenshot of, you can have it signal when it is ready (for example by setting a flag from a DOM event handler), which your Puppeteer code can wait for using waitForFunction.
If all else fails and time is not important, you can put in a sleep(n) that is long enough to guarantee the page is loaded.
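The suggestions above could be combined as in the sketch below: navigate with a stricter waitUntil, then wait for an element that only exists once the content has rendered, and only then take the screenshot. The helper name screenshotWhenReady and the readySelector parameter are hypothetical; which selector marks the page as ready depends on the site and is not known from the question.

```javascript
// Hypothetical helper: navigates, waits for a "ready" element, then takes a screenshot.
async function screenshotWhenReady(page, url, readySelector, path) {
  // 'networkidle2' resolves once there have been no more than 2 network connections for 500 ms.
  await page.goto(url, { waitUntil: 'networkidle2' });
  // Wait for an element that is only present and visible once the content has rendered.
  await page.waitForSelector(readySelector, { visible: true, timeout: 60000 });
  await page.screenshot({ path, fullPage: true });
}
```

Inside scrapeProduct, this would replace the goto, waitFor and screenshot calls, for example: await screenshotWhenReady(page, url, '.some-product-selector', 'testresult.png'); where the selector is a placeholder.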

Puppeteer and dynamically added iFrame (element)

We have an AngularJS application that pops up a modal form (component) when a button is pressed.
This component loads an iFrame, which I cannot seem to access with Puppeteer.
Have tried with mainFrame.
await page.waitFor(15000);
const frame = page.mainFrame().childFrames().find((iframe) => {
console.log('FRAME', iframe.name(), iframe.url());
return iframe.name() === 'iFrameName';
});
The above only has one frame (the main frame/window).
Have tried with frames
await page.waitFor(15000);
const frame = page.frames().find((iframe) => {
console.log('FRAME', iframe.name(), iframe.url());
return iframe.name() === 'iFrameName';
});
Have tried with contentFrame
await page.waitForSelector('iframe', { visible: true, timeout: 2000 });
const elementHandle = await page.$('iframe');
await page.waitFor(1000);
const frame = await elementHandle.contentFrame();
With the above, elementHandle has a value but frame is null
We have this working with Protractor and were hoping to move to Puppeteer, but if there is no solution we will have to stick with Protractor (which has its own issues).
Currently, there is no support for out-of-process iframes (OOPIFs). To be able to work with them, you need to launch Chromium with --disable-features=site-per-process:
const browser = await puppeteer.launch({
args: ['--disable-features=site-per-process']
});
You can track puppeteer's issue/support here.
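Once the frame is reachable, the fixed 15-second waitFor from the question can be replaced by polling page.frames() until the frame shows up. This is a minimal sketch; the helper name waitForFrameMatching and its parameters are hypothetical, and it does not by itself make an out-of-process iframe visible, which is what the launch flag above addresses.

```javascript
// Hypothetical helper: polls page.frames() until a frame satisfies `predicate` or time runs out.
async function waitForFrameMatching(page, predicate, timeoutMs = 15000, intervalMs = 100) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    // page.frames() returns every frame currently attached to the page.
    const frame = page.frames().find(predicate);
    if (frame) {
      return frame;
    }
    if (Date.now() >= deadline) {
      throw new Error(`No matching frame after ${timeoutMs} ms`);
    }
    // Sleep briefly before checking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

For the frame in the question, this would be called as: const frame = await waitForFrameMatching(page, (f) => f.name() === 'iFrameName');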
I have a similar problem: an iframe that is loaded dynamically, so that src=(unknown), triggered by a JavaScript link:
href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(VARİABLES,,true,,false,))
Is it possible to clone the iframe, or create one, by invoking the JavaScript that loads it from Puppeteer? If so, you could try that.

puppeteer: dynamic page using indexeddb not loading

I am trying to take a screenshot of a page that uses indexeddb to generate some of its content.
My puppeteer code is pretty simple:
const puppeteer = require('puppeteer');
(async() => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setViewport({
width: 1200,
height: 1080
});
const relevantSite = 'http://example.com'; // <<-- replace this with the actual site
await page.goto(relevantSite);
await page.waitForSelector('#myContentSelector');
console.log('Content is now loaded');
await page.screenshot({path: 'dynamic-screenshot.png'});
await browser.close();
})();
This code above works fine for pages that dynamically generate content and DON'T use indexeddb but for pages that do, I just can't seem to figure out what I need to do in order to get the page to load correctly.
Do I have to do something special to get this indexeddb to work for headless pages loaded in puppeteer?
IndexedDB seems to work just fine with Puppeteer.
I suspect your issue is actually that you don't give the page time to load and for the content to be populated.
await page.waitForSelector('#myContentSelector');
Is #myContentSelector something like a container div, where the actual content is populated later (even 25 ms later)?
Try waiting a second after the page loads before taking the screenshot.
Try the following:
await page.goto(relevantSite, { waitUntil: 'networkidle2' });
await page.waitForSelector('#myContentSelector');
console.log('Content is now loaded');
await page.screenshot({path: 'dynamic-screenshot.png'});
This uses the waitUntil option of page.goto. A separate page.waitForNavigation call is not needed once page.goto has resolved, because no further navigation happens and the wait would time out.
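If the container exists before its content does, as the first answer suspects, one option is to wait for the container to be populated rather than merely present. The sketch below uses page.waitForFunction for that; the helper name waitForPopulated and the "at least N child elements" condition are assumptions, since the question does not say what the finished content looks like.

```javascript
// Hypothetical helper: waits until the element matching `selector`
// has at least `minChildren` child elements.
async function waitForPopulated(page, selector, minChildren = 1, timeout = 30000) {
  return page.waitForFunction(
    // This function runs inside the page, so `document` is the page's document.
    (sel, min) => {
      const el = document.querySelector(sel);
      return el !== null && el.children.length >= min;
    },
    { timeout },
    selector,    // passed to the page function as `sel`
    minChildren  // passed to the page function as `min`
  );
}
```

In the question's script, this would go right before the screenshot: await waitForPopulated(page, '#myContentSelector');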

How do I use jQuery with pages on puppeteer?

I am trying to use jQuery on the pages I load with Puppeteer and would like to know how to do that. My code structure is like:
const puppeteer = require('puppeteer');
let browser = null;
async function getSelectors() {
try{
browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
const page = await browser.newPage();
await page.setViewport({width: 1024, height: 1080});
await page.goto('https://www.google.com/');
await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.2.1.min.js'});
var button = $('h1').text();
console.log(button);
} catch (e) {
console.log(e);
}
}
getSelectors();
Also I will be navigating to many pages within puppeteer so is there a way I can just add jQuery once and then use it throughout? A local jquery file implementation would be helpful as well.
I tried implementing the answers from inject jquery into puppeteer page but couldn't get my code to work. I will be doing much more complex stuff than the one illustrated above so I need jQuery and not vanilla JS solutions.
I finally got a tip from How to scrape that web page with Node.js and puppeteer
which helped me understand that the Puppeteer page.evaluate function gives you direct access to the DOM of the page you've just launched in Puppeteer. To get the following code to work, you should know I'm running this test in Jest. Also, you need a suitable URL to a page that has a table element with an ID. Obviously, you can change the details of both the page and the jQuery function you want to try out. I was in the middle of a jQuery Datatables project so I needed to make sure I had a table element and that jQuery could find it. The nice thing about this environment is that the browser is quite simply a real browser, so if I add a script tag to the actual HTML page instead of adding it via Puppeteer, it works just the same.
test('Check jQuery datatables', async () => {
const puppeteer = require('puppeteer');
let browser = await puppeteer.launch();
let page = await browser.newPage();
await page.goto('http://localhost/jest/table.html');
await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.3.1.slim.min.js'});
const result = await page.evaluate(() => {
try {
var table = $("table").attr("id");
return table;
} catch (e) {
return e.message;
}
});
console.log("result", result);
await browser.close();
});
The key discovery for me: within the page.evaluate function, your JavaScript code runs in the familiar context of the page you've just opened in the browser. I've moved on to create tests for complex objects created using jQuery plugins and within page.evaluate they behave as expected. Trying to use JSDOM was driving me crazy because it behaved a bit like a browser, but was different with regard to the key points I was using to test my application.
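For the local-file and multi-page parts of the question, a sketch along the same lines: page.addScriptTag also accepts a path option that reads a script from disk, and because injected scripts do not survive a navigation, the injection is repeated after each page.goto. The helper names gotoWithJQuery and textOf and the jQuery file location are hypothetical.

```javascript
// Hypothetical helper: navigates to `url`, then injects jQuery from a local file.
// Injected scripts are lost on navigation, so call this for every page you visit.
async function gotoWithJQuery(page, url, jqueryPath) {
  await page.goto(url);
  await page.addScriptTag({ path: jqueryPath }); // `path` loads the script from disk
}

// Hypothetical helper: returns the combined text of the elements matching `selector`.
async function textOf(page, selector) {
  // The callback runs in the browser, where `$` is the injected jQuery.
  return page.evaluate((sel) => $(sel).text(), selector);
}
```

With a local copy of jQuery (the file name here is a placeholder), the question's example becomes: await gotoWithJQuery(page, 'https://www.google.com/', './jquery.min.js'); console.log(await textOf(page, 'h1'));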
