I'm trying to execute some JavaScript via Puppeteer, which I'd normally execute through the Dev Tools console, as below:
Dev Tools Command
Essentially I'm trying to list out all the elements in the Array.
I've been reading through StackOverflow and the Docs here:
https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#pageevaluatepagefunction-args
But I can't seem to get this working. I've tried evaluating the page with a multitude of different bits of code, but all of them have come up empty.
Any help would be appreciated!
So, as I suspected, this was extremely simple code.
I'd actually already written this code before asking this question, but I'd put it in the wrong section of the overall script, so it threw an error.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(<URL>);
  // A string passed to page.evaluate() is evaluated as an expression in the page context
  const ids = await page.evaluate('product_ids');
  console.log('ids are:', ids);
  await browser.close();
})();
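For comparison, page.evaluate also accepts a function instead of a string. Here is a minimal sketch of that form, assuming `page` is a Puppeteer Page that has already navigated and that `product_ids` is reachable as a property of `window` (a top-level `const` or `let` on the page would not be, in which case the string form is the one to use). The helper name `readGlobal` is made up for illustration.

```javascript
// Hypothetical helper: read a global variable from the page context.
// `page` is assumed to be a Puppeteer Page that has already navigated.
async function readGlobal(page, name) {
  // The arrow function runs inside the browser. `name` is passed in as an
  // argument because variables from the Node side are not visible in the page.
  return page.evaluate((globalName) => window[globalName], name);
}

// Usage, inside the async function shown in the answer:
// const ids = await readGlobal(page, 'product_ids');
// console.log('ids are:', ids);
```

The returned value has to be serializable (plain arrays, objects, strings, numbers), since it is copied from the browser back to Node.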
I want to make a scraper with puppeteer, that opens a site, uses its search bar and opens the first link.
This is the code:
const puppeteer = require('puppeteer');
(async () => {
  let browser = await puppeteer.launch();
  let page = await browser.newPage();
  await page.goto('https://example.com', {waitUntil: 'networkidle2'});
  await page.click('[name=query]');
  await page.keyboard.type("(Weapon)");
  await page.keyboard.press('Enter');
  await page.waitForSelector('div[class="search-results"]', {timeout: 100000});
})();
The problem is that I can't make it open the first link from the search results. I tried to use page.click(), but all of the search results are identical except for the URL.
What I want to know is how to make it open the first link from the search results.
There are several ways to solve this. I recommend experimenting a bit, so you learn the different ways of doing it.
await page.click('.search-results a');
It turns out page.click() always clicks the first element that matches the selector, so if you want the first one, this is enough.
Or you can select all the links and then click on the first one:
const resultLinks = await page.$$('.search-results a');
await resultLinks[0].click();
It'd be better to include a condition here as well, so you don't end up with an error because no element was found:
const resultLinks = await page.$$('.search-results a');
if (resultLinks.length) await resultLinks[0].click();
There are more ways, so if you want to learn more, please refer to the API documentation.
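The second approach can be wrapped in a small helper. This is only a sketch, assuming `page` is a Puppeteer Page; the name `clickFirst` is made up for illustration and is not part of Puppeteer.

```javascript
// Hypothetical helper: click the first element matching `selector`.
// Returns true if something was clicked, false if nothing matched.
async function clickFirst(page, selector) {
  const handles = await page.$$(selector); // all matches, in document order
  if (handles.length === 0) return false;  // nothing to click, no error thrown
  await handles[0].click();                // click() returns a promise, so await it
  return true;
}

// Usage, after the search has been submitted:
// await page.waitForSelector('.search-results a');
// const clicked = await clickFirst(page, '.search-results a');
```

If the click triggers a page load, it is common to start page.waitForNavigation() before clicking and await both together, so the navigation is not missed.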
Hi, I am trying to take a screenshot of a website using Puppeteer, but the site loads quite slowly, so I never manage to grab any data or take a screenshot. I would like to delay the screenshot until the site has finished loading. I have tried a bunch of methods and can't figure it out. Thanks in advance for any help.
This is my code:
const puppeteer = require("puppeteer-extra");
// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());
async function scrapeProduct(url) {
  // launching puppeteer
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "load" });
  await page.waitFor("*");
  function time() {
    var d = new Date();
    var n = d.getSeconds();
    return console.log(n);
  }
  time();
  await page.screenshot({ path: "testresult.png" });
  time();
  await browser.close();
}
scrapeProduct("https://www.realcanadiansuperstore.ca/search?search-bar=milk");
waitFor has been deprecated recently, so you are better off trying the other wait methods.
I can't inspect the webpage you are taking a screenshot of, so I cannot tell what might be happening after the load event.
However, have you tried the other wait methods Puppeteer offers?
waitForNavigation and waitForSelector mentioned in https://stackoverflow.com/a/52501934/484337
If you have control of the page you are taking a screenshot of, you can have it fire a DOM event or set a flag when it is ready, which your Puppeteer code can then wait for, for example with waitForFunction.
If all else fails and time is not important then you can put in a sleep(n) that is long enough to guarantee the page is loaded.
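A minimal sketch of the waitForSelector route, assuming `page` is a Puppeteer Page and that you know a selector for an element that only appears once the content has loaded. The helper name `screenshotWhenVisible`, the 60 second timeout, and the selector in the usage comment are placeholders, not something taken from the site in the question.

```javascript
// Hypothetical helper: take a screenshot once a given element is visible.
async function screenshotWhenVisible(page, url, selector, path) {
  // 'networkidle2' waits until there are at most 2 network connections for 500 ms.
  await page.goto(url, { waitUntil: 'networkidle2' });
  // Then wait for the element that signals "content is there" (timeout in ms).
  await page.waitForSelector(selector, { visible: true, timeout: 60000 });
  await page.screenshot({ path, fullPage: true });
}

// Usage (the selector is a made-up placeholder):
// await screenshotWhenVisible(page, url, '.some-product-element', 'testresult.png');
```

Waiting on a concrete element is usually more reliable than a fixed sleep, because it adapts to how long the page actually takes.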
This is the analyzed page https://www.diretta.it/.
In this page the content of the following days is loaded dynamically with the js without changing the URL of the site (you can try it at the top right of the table).
Using puppeteer, with the following code
await page.goto ('https://www.diretta.it/');
it loads the contents of today's page.
Is there a way to load the page with tomorrow's content?
I have to scrape information about the matches of the following days.
The JS function that changes the day, which can be run from the browser console, is:
> set_calendar_date ('1')
What you are looking for is the page.evaluate() function.
This function lets you run any JS function in the page context.
In simpler terms, running page.evaluate() is akin to opening Dev tools and writing set_calendar_date('1') there directly.
Here is a working snippet. Don't hesitate to pass {headless: false} to puppeteer.launch() if you want to see it working with your own eyes.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.diretta.it/');
  await page.evaluate(() => {
    set_calendar_date ('1');
  });
  // Wait a bit for the website to refresh its contents (page.waitFor is deprecated)
  await new Promise(resolve => setTimeout(resolve, 500));
  // The updated table is now available
})();
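Instead of a fixed delay, you could wait until the table actually changes. The sketch below assumes `page` is a Puppeteer Page already on the site, and that the table can be found with some selector whose innerHTML changes when the day is switched. That selector and the helper name `showDayAndWait` are assumptions on my side, so check them against the real page.

```javascript
// Hypothetical helper: switch the calendar day and wait for the table to change.
async function showDayAndWait(page, offset, tableSelector) {
  // Remember what the table looks like before switching days.
  const before = await page.$eval(tableSelector, (el) => el.innerHTML);
  await page.evaluate((day) => {
    // Runs in the browser; set_calendar_date is defined by the site itself.
    set_calendar_date(day);
  }, String(offset));
  // Poll inside the page until the table content differs from the old one.
  await page.waitForFunction(
    (sel, old) => {
      const el = document.querySelector(sel);
      return el !== null && el.innerHTML !== old;
    },
    {},
    tableSelector,
    before
  );
}

// Usage (the selector is a placeholder):
// await showDayAndWait(page, 1, '#some-table-selector');
```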
I am trying to use jQuery on the pages I load with Puppeteer and want to know how to do that. My code structure is like this:
const puppeteer = require('puppeteer');
let browser = null;
async function getSelectors() {
  try {
    browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
    const page = await browser.newPage();
    await page.setViewport({width: 1024, height: 1080});
    await page.goto('https://www.google.com/');
    await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.2.1.min.js'});
    var button = $('h1').text();
    console.log(button);
  } catch (e) {
    console.log(e);
  }
}
getSelectors();
Also I will be navigating to many pages within puppeteer so is there a way I can just add jQuery once and then use it throughout? A local jquery file implementation would be helpful as well.
I tried implementing the answers from inject jquery into puppeteer page but couldn't get my code to work. I will be doing much more complex stuff than the one illustrated above so I need jQuery and not vanilla JS solutions.
I finally got a tip from "How to scrape that web page with Node.js and puppeteer", which helped me understand that the Puppeteer page.evaluate function gives you direct access to the DOM of the page you've just launched in Puppeteer.
To get the following code to work, you should know I'm running this test in Jest. Also, you need a suitable URL to a page that has a table element with an ID. Obviously, you can change the details of both the page and the jQuery function you want to try out. I was in the middle of a jQuery Datatables project, so I needed to make sure I had a table element and that jQuery could find it. The nice thing about this environment is that the browser is quite simply a real browser, so if I add a script tag to the actual HTML page instead of adding it via Puppeteer, it works just the same.
test('Check jQuery datatables', async () => {
  const puppeteer = require('puppeteer');
  let browser = await puppeteer.launch();
  let page = await browser.newPage();
  await page.goto('http://localhost/jest/table.html');
  await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.3.1.slim.min.js'});
  // Everything inside page.evaluate runs in the browser, where $ is defined
  const result = await page.evaluate(() => {
    try {
      var table = $("table").attr("id");
      return table;
    } catch (e) {
      return e.message;
    }
  });
  console.log("result", result);
  await browser.close();
});
The key discovery for me: within the page.evaluate function, your JavaScript code runs in the familiar context of the page you've just opened in the browser. I've moved on to create tests for complex objects created using jQuery plugins and within page.evaluate they behave as expected. Trying to use JSDOM was driving me crazy because it behaved a bit like a browser, but was different with regard to the key points I was using to test my application.
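On the "add jQuery once" and "local file" parts of the question, here is a small sketch, assuming `page` is a Puppeteer Page and you have a local copy of jQuery on disk. The helper name `ensureJQuery` and the path in the usage comment are made up for illustration.

```javascript
// Hypothetical helper: inject jQuery into the current page only if it is missing.
// `jqueryPath` is an assumed path to a local jQuery file.
async function ensureJQuery(page, jqueryPath) {
  // Check inside the browser whether the page already has jQuery.
  const present = await page.evaluate(() => typeof window.jQuery === 'function');
  if (!present) {
    // addScriptTag accepts { path } for a local file as well as { url }.
    await page.addScriptTag({ path: jqueryPath });
  }
  return !present; // true if this call injected the script
}

// Usage (the path is a placeholder):
// await page.goto('https://www.google.com/');
// await ensureJQuery(page, './jquery.min.js');
// const text = await page.evaluate(() => $('h1').text());
```

Each navigation loads a fresh document, so the injected script does not survive a page.goto. Calling the helper after every navigation is the simplest way to have jQuery available throughout.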
I am writing a somewhat simple web scraper that uses Puppeteer to search Google for a list of queries.
I have it working until I hit Google's reCAPTCHA. I know bypassing reCAPTCHA is unorthodox so I won't ask for something like that here (although the code I'm using was an attempt at that -- sorry, not sorry).
What I'm hoping for is some kind of prompt to enter the reCAPTCHA when it comes up so I can continue searching.
I'm new to Node so if perhaps PhantomJS or something similar would be better for what I'm trying to accomplish I'm willing to rewrite it from the ground up. I haven't tried because Phantom seems so much messier and there's less documentation on it.
Anyway, my code follows:
const fs = require('fs');
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');
const LineByLine = require('line-by-line'),
    lr = new LineByLine('dorks.txt', {skipEmptyLines: true});
---- snip ----
lr.on('end', async function () {
  console.log(`[*] Loaded ${dorks.length+1} dorks. Let's begin.`);
  for (var i = 0; i < dorks.length; i++) {
    const browser = await puppeteer.launch({timeout: 10000, headless: false});
    const page = await browser.newPage();
    try {
      await page.goto('https://www.google.com/search?num=100&q=' + dorks[i]);
      let content = await page.content();
      var $ = cheerio.load(content);
      if (content.includes("Our systems have detected unusual traffic from your computer network.")) {
        // call captcha handler?
        console.log(`We got CAPTCHA'd. -_-`);
        await $('form').contents().find('textarea#g-recaptcha-response').html(key); // this acts like it works but doesn't
        await $('form').contents().find('input#recaptcha-token').val(key); // this one isn't even trying
        // This is probably an issue with recaptcha being in an iFrame but I'm lost
        await page.click('[name="submit"]');
        await page.waitForNavigation();
        return;
      }
    } catch (e) {
      console.log(e);
    }
  }
});
---- snip ----
Sorry for the wall of text. This is the first time I've come to a forum for help like this and I didn't want to give too little info.
Send help.
Edit: Snipped out a bunch of code so no one has to wade through it. Updated title. Hopefully it wasn't better before. First post guys, sorry.