I want to make a scraper with Puppeteer that opens a site, uses its search bar, and opens the first link.
This is my code:
const puppeteer = require('puppeteer');

(async () => {
  let browser = await puppeteer.launch();
  let page = await browser.newPage();
  await page.goto('https://example.com', {waitUntil: 'networkidle2'});
  await page.click('[name=query]');
  await page.keyboard.type("(Weapon)");
  await page.keyboard.press('Enter');
  await page.waitForSelector('div[class="search-results"]', {timeout: 100000});
})();
The problem is that I can't make it open the first link from the search results. I tried to use page.click(), but all of the search results look identical except for the URL.
What I want to know is how I can make it open the first link from the search results.
There are several ways to solve this. I recommend experimenting with it a bit, so you learn the different approaches.
await page.click('.search-results a');
It turns out Puppeteer always clicks on the first element it finds, so if you want the first result, this is enough.
Or you can select all the links and then click on the first one:
const resultLinks = await page.$$('.search-results a');
await resultLinks[0].click();
It'd be better to include a condition here as well, so you don't end up with an error because no element was found:
const resultLinks = await page.$$('.search-results a');
if (resultLinks.length) await resultLinks[0].click();
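Putting it together, here is a minimal sketch, assuming the results really live under a .search-results container and that clicking a result triggers a normal same-tab navigation:

const resultLinks = await page.$$('.search-results a');
if (resultLinks.length) {
  // Start waiting for the navigation before clicking, so it isn't missed.
  await Promise.all([
    page.waitForNavigation({waitUntil: 'networkidle2'}),
    resultLinks[0].click(),
  ]);
}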
There are more ways, so if you want to learn more, please refer to the API documentation.
I was wondering if there is a way to verify with Playwright that there are 2 browser tabs open.
In the code below I click on a button and verify that Google has opened.
I would also like to verify that there are now 2 tabs open, and wanted to know if it can be done with something like .toHaveCount(2), or whether you would recommend something better.
const { test, expect, chromium } = require('@playwright/test');

test('Click button link test using blank target', async () => {
  const browser = await chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto(<my weblink>);
  const newPagePromise = new Promise((resolve) => context.once('page', resolve));
  // Button locator used to click the button to open the link
  const buttonLocator = page.locator('.MuiButton-root');
  // Click the button
  await buttonLocator.click();
  // Verify that after clicking, the new page has the correct URL
  const newPage = await newPagePromise;
  expect(newPage.url()).toContain('https://www.google.ca');
});
You can use browserContext.pages() to get all the pages open in the context, then check the array's length and assert on it. Note that toHaveCount() only works on locators, so for a plain number use toBe():
const totPages = context.pages().length;
expect(totPages).toBe(2);
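Keep in mind that context.pages() only reflects tabs that have already opened, so it is safer to wait for the new page before counting. Instead of wiring up the promise manually, context.waitForEvent('page') does the same job. A minimal sketch, reusing the button locator from the question:

const [newPage] = await Promise.all([
  context.waitForEvent('page'), // resolves when the new tab opens
  buttonLocator.click(),        // the click that opens it
]);
await newPage.waitForLoadState();
expect(context.pages().length).toBe(2);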
First time using Puppeteer, and I'm trying to simply click a button after clicking the deny-cookies button. This is my code:
await page.goto('https://myurl.com');
await page.click('a.cc-btn.cc-deny');
// await page.waitForNavigation();
await page.waitForSelector("#detailview_btn_order", {visible: true});
await page.click("#detailview_btn_order");
Clicking the deny-cookies button works like a charm. However, it seems the second button can't be identified by Puppeteer. If I don't use waitForSelector, it just says it can't find it. If I use it, I get a timeout after 30 seconds even though the website finishes loading after 5 seconds. If I uncomment waitForNavigation (regardless of what options I use), I get a timeout there as well, even though the site loads within seconds. What am I doing wrong? Thanks!
Can you try this:
await page.goto('https://myurl.com');

// Set up the navigation wait and the click together so the navigation isn't missed.
await Promise.all([
  page.click('a.cc-btn.cc-deny'),
  page.waitForNavigation(),
]);

// If the order button is rendered inside an iframe, work with that frame directly.
const iframeElement = await page.waitForSelector("#my-iframe");
const frame = await iframeElement.contentFrame();
await frame.waitForSelector("#detailview_btn_order", {visible: true});
await frame.click("#detailview_btn_order");
Sometimes there is a race condition between the click and the navigation it triggers; wrapping both in Promise.all makes sure the navigation is already being waited for when the click happens.
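If you are not sure whether the order button really sits inside an iframe (the #my-iframe selector above is only a placeholder), you can search every frame on the page for it. A small sketch:

// Look through all frames, including the main frame, for the button.
for (const frame of page.frames()) {
  const button = await frame.$('#detailview_btn_order');
  if (button) {
    console.log('Found the button in frame:', frame.url());
    await button.click();
    break;
  }
}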
On a page I'm scraping with Puppeteer, I have a list with the same id for every li. I am trying to find and click on an element with specific text within this list. I have the following code:
await page.waitFor(5000)
const linkEx = await page.$x("//a[contains(text(), 'Shop')]")
if (linkEx.length > 0) {
await linkEx[0].click()
}
Do you have any idea how I could replace the first line with waiting for the actual text 'Shop'?
I tried await page.waitFor(linkEx) and waitForSelector(linkEx), but neither works.
Also, I would like to replace the a in the second line of code with the actual id (#activities) or something like that, but I couldn't find a proper example.
Could you please help me with this issue?
page.waitForXPath is what you need here.
Example:
const puppeteer = require('puppeteer')

async function fn() {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://example.com')

  // await page.waitForSelector('//a[contains(text(), "More information...")]') // ❌
  await page.waitForXPath('//a[contains(text(), "More information...")]') // ✅

  const linkEx = await page.$x('//a[contains(text(), "More information...")]')
  if (linkEx.length > 0) {
    await linkEx[0].click()
  }

  await browser.close()
}
fn()
Try this for an id-based XPath:
"//*[@id='activities' and contains(text(), 'Shop')]"
Did you know? If you right-click on an element in the Chrome DevTools "Elements" tab and select "Copy", you can copy the exact selector or XPath of that element. After that, you can switch to the "Console" tab and test the selector's content with the Chrome API, so you can prepare it for your Puppeteer script. E.g.: $x("//*[@id='activities' and contains(text(), 'Shop')]")[0].href should show the link you expect to click on; otherwise you need to adjust the expression, or check whether there are more elements matching the same selector, etc. This can help you find more appropriate selectors.
For Puppeteer 19 and newer, waitForXPath() is deprecated. Use the xpath prefix instead:
await page.waitForSelector('xpath/' + xpathExpression)
In your case:
const linkEx = await page.waitForSelector('xpath///a[contains(text(), "Shop")]');
await linkEx.click();
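If you also want to anchor the XPath to the #activities element mentioned in the question, both conditions can be combined under the same xpath/ prefix. A small sketch, assuming the link sits somewhere inside the element with id="activities":

const linkEx = await page.waitForSelector(
  'xpath///*[@id="activities"]//a[contains(text(), "Shop")]'
);
await linkEx.click();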
Since ESPN does not provide an API, I am trying to use Puppeteer to scrape data about my fantasy football league. However, I am having a hard time trying to log in using Puppeteer because the login form is nested within an iframe element.
I have gone to http://www.espn.com/login and selected the iframe. I can't seem to select any of the elements within the iframe except for the main section by doing
frame.$('.main')
This is the code that seems to get the iframe with the login form.
const browser = await puppeteer.launch({headless:false});
const page = await browser.newPage();
await page.goto('http://www.espn.com/login')
await page.waitForSelector("iframe");
const elementHandle = await page.$('div#disneyid-wrapper iframe');
const frame = await elementHandle.contentFrame();
await browser.close()
I want to be able to access the username field, password field, and the login button within the iframe element. Whenever I try to access these fields, I get a return of null.
You can get the iframe using contentFrame as you are doing now, and then call $.
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('http://www.espn.com/login')
const elementHandle = await page.waitForSelector('div#disneyid-wrapper iframe');
const frame = await elementHandle.contentFrame();
await frame.waitForSelector('[ng-model="vm.username"]');
const username = await frame.$('[ng-model="vm.username"]');
await username.type('foo');
await browser.close()
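If selecting the iframe element ever becomes brittle, another option is to look the frame up from page.frames(), for example by its URL. A minimal sketch; the 'disneyid' substring is only an assumption about what the login frame's URL contains:

await page.waitForSelector('div#disneyid-wrapper iframe');
const loginFrame = page.frames().find((frame) => frame.url().includes('disneyid'));
if (loginFrame) {
  await loginFrame.waitForSelector('[ng-model="vm.username"]');
  await loginFrame.type('[ng-model="vm.username"]', 'foo');
}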
I had an issue with finding Stripe elements.
The reason for that is the following:
You can't access an iframe with a different origin using JavaScript; it would be a huge security flaw if you could. Because of the same-origin policy, browsers block scripts trying to access a frame with a different origin. See a more detailed answer here.
Therefore, when I tried to use Puppeteer's methods Page.frames(), Page.mainFrame(), and ElementHandle.contentFrame(), they did not return the cross-origin iframe to me. The problem was that this happened silently, and I couldn't figure out why nothing was found.
Adding these arguments to launch options solved the issue:
'--disable-web-security',
'--disable-features=IsolateOrigins,site-per-process'
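For completeness, this is roughly how those arguments are passed to puppeteer.launch(); only the two flags come from the answer above, the rest is a generic sketch. Keep in mind these flags disable cross-origin protections, so they should only be used for scraping or testing:

const browser = await puppeteer.launch({
  args: [
    '--disable-web-security',
    '--disable-features=IsolateOrigins,site-per-process',
  ],
});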
I am trying to use jQuery on the pages I load with Puppeteer and I wanted to know how I can do that. My code structure looks like this:
const puppeteer = require('puppeteer');
let browser = null;

async function getSelectors() {
  try {
    browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
    const page = await browser.newPage();
    await page.setViewport({width: 1024, height: 1080});
    await page.goto('https://www.google.com/');
    await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.2.1.min.js'});
    var button = $('h1').text();
    console.log(button);
  } catch (e) {
    console.log(e);
  }
}
getSelectors();
Also, I will be navigating to many pages within Puppeteer, so is there a way I can add jQuery once and then use it throughout? A local jQuery file implementation would be helpful as well.
I tried implementing the answers from inject jquery into puppeteer page but couldn't get my code to work. I will be doing much more complex stuff than the one illustrated above so I need jQuery and not vanilla JS solutions.
I finally got a tip from How to scrape that web page with Node.js and puppeteer, which helped me understand that Puppeteer's page.evaluate function gives you direct access to the DOM of the page you've just launched in Puppeteer. To get the following code to work, you should know I'm running this test in Jest. You also need a suitable URL to a page that has a table element with an ID. Obviously, you can change the details of both the page and the jQuery function you want to try out. I was in the middle of a jQuery DataTables project, so I needed to make sure I had a table element and that jQuery could find it. The nice thing about this environment is that the browser is simply a real browser, so if I add a script tag to the actual HTML page instead of adding it via Puppeteer, it works just the same.
test('Check jQuery datatables', async () => {
  const puppeteer = require('puppeteer');
  let browser = await puppeteer.launch();
  let page = await browser.newPage();
  await page.goto('http://localhost/jest/table.html');
  await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.3.1.slim.min.js'});

  // Everything inside page.evaluate runs in the browser page, where $ is now defined.
  const result = await page.evaluate(() => {
    try {
      var table = $("table").attr("id");
      return table;
    } catch (e) {
      return e.message;
    }
  });

  console.log("result", result);
  await browser.close();
});
The key discovery for me: within the page.evaluate function, your JavaScript code runs in the familiar context of the page you've just opened in the browser. I've moved on to create tests for complex objects created using jQuery plugins and within page.evaluate they behave as expected. Trying to use JSDOM was driving me crazy because it behaved a bit like a browser, but was different with regard to the key points I was using to test my application.
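Since the question also asked about a local jQuery file and about reusing jQuery across many pages: page.addScriptTag() accepts a local path as well, and an injected script only lives as long as the current document, so it has to be added again after every navigation. A small sketch; the file path and URLs are just examples:

// Inject a local copy of jQuery instead of loading it from a CDN.
await page.addScriptTag({path: './vendor/jquery-3.3.1.min.js'});

// An injected script tag does not survive navigation, so re-add it after each goto().
for (const url of ['https://example.com/a', 'https://example.com/b']) {
  await page.goto(url);
  await page.addScriptTag({path: './vendor/jquery-3.3.1.min.js'});
  const heading = await page.evaluate(() => $('h1').text());
  console.log(url, heading);
}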