How do I use jQuery with pages on puppeteer? - javascript

I am trying to use jQuery on the pages I load with Puppeteer. How can I do that? My code structure is like this:
const puppeteer = require('puppeteer');
let browser = null;
async function getSelectors() {
try{
browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
const page = await browser.newPage();
await page.setViewport({width: 1024, height: 1080});
await page.goto('https://www.google.com/');
await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.2.1.min.js'});
var button = $('h1').text();
console.log(button);
} catch (e) {
console.log(e);
}
}
getSelectors();
Also, I will be navigating to many pages within Puppeteer, so is there a way to add jQuery once and then use it throughout? A local jQuery file implementation would be helpful as well.
I tried implementing the answers from inject jquery into puppeteer page but couldn't get my code to work. I will be doing much more complex things than the example above, so I need jQuery rather than vanilla JS solutions.

I finally got a tip from How to scrape that web page with Node.js and puppeteer, which helped me understand that the Puppeteer page.evaluate function gives you direct access to the DOM of the page you've just opened in Puppeteer.
A few notes on the code below. I'm running this test in Jest. You need a URL to a page that has a table element with an ID; you can of course change both the page and the jQuery call you want to try out. I was in the middle of a jQuery DataTables project, so I needed to make sure I had a table element and that jQuery could find it. The nice thing about this environment is that the browser is a real browser, so if I add a script tag to the actual HTML page instead of adding it via Puppeteer, it works just the same.
test('Check jQuery datatables', async () => {
const puppeteer = require('puppeteer');
let browser = await puppeteer.launch();
let page = await browser.newPage();
await page.goto('http://localhost/jest/table.html');
await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.3.1.slim.min.js'});
const result = await page.evaluate(() => {
try {
var table = $("table").attr("id");
return table;
} catch (e) {
return e.message;
}
});
console.log("result", result);
await browser.close();
});
The key discovery for me: within the page.evaluate function, your JavaScript code runs in the familiar context of the page you've just opened in the browser. I've moved on to create tests for complex objects created using jQuery plugins and within page.evaluate they behave as expected. Trying to use JSDOM was driving me crazy because it behaved a bit like a browser, but was different with regard to the key points I was using to test my application.
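On the "add jQuery once" and local-file parts of the question: as far as I know, a script added with addScriptTag lives in the page's JavaScript context, which is replaced on every navigation, so it has to be re-added after each goto. Here is a minimal sketch of a helper that does this, assuming a local copy of jQuery at a path you supply. The name gotoWithJQuery and the jqueryPath parameter are made up for illustration; page.goto, page.evaluate and page.addScriptTag (with its path option) are the Puppeteer calls it relies on.

```javascript
// Sketch: navigate, then inject jQuery only if the page does not already have it.
// `jqueryPath` is a hypothetical path to a local jQuery file (e.g. './jquery.min.js').
async function gotoWithJQuery(page, url, jqueryPath) {
  await page.goto(url);

  // Runs inside the page: check whether jQuery is already defined there.
  const hasJQuery = await page.evaluate(
    () => typeof window.jQuery === 'function'
  );

  if (!hasJQuery) {
    // `path` loads the script from the local file system instead of a URL.
    await page.addScriptTag({ path: jqueryPath });
  }

  // Report whether an injection was needed.
  return !hasJQuery;
}
```

You would call it in place of page.goto, for example await gotoWithJQuery(page, 'http://localhost/jest/table.html', './jquery.min.js'), and then use $ inside page.evaluate as in the test above.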

Related

How do I make puppeteer click on link?

I want to make a scraper with Puppeteer that opens a site, uses its search bar, and opens the first link.
This is the code:
const puppeteer = require('puppeteer');
(async () => {
let browser = await puppeteer.launch();
let page = await browser.newPage();
await page.goto('https://example.com', {waitUntil: 'networkidle2'});
await page.click('[name=query]');
await page.keyboard.type("(Weapon)");
await page.keyboard.press('Enter');
await page.waitForSelector('div[class="search-results"]', {timeout: 100000});
})();
The problem is that I can't make it open the first link from the search results. I tried to use page.click(), but all of the search results look the same except for the URL.
How can I make it open the first link from the search results?
There are several ways to solve this. I recommend experimenting a bit, so you learn the different approaches.
await page.click('.search-results a');
It turns out that page.click() always clicks the first element matching the selector, so if you want the first result, this is enough.
Or you can select all the links and then click on the first one:
const resultLinks = await page.$$('.search-results a');
await resultLinks[0].click();
It'd be better to include a condition here as well, so you don't end up with an error because no element was found:
const resultLinks = await page.$$('.search-results a');
if (resultLinks.length) await resultLinks[0].click();
There are more ways, so if you want to learn more, please refer to the API documentation.
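If clicking the link triggers a page navigation, a common pattern is to start waiting for the navigation at the same time as the click. Below is a rough sketch of that idea. The helper name clickFirstAndWait is made up, and it assumes the click really does cause a navigation (otherwise waitForNavigation will time out).

```javascript
// Sketch: click the first element matching `selector` and wait for the
// navigation it triggers. Returns false when nothing matches.
async function clickFirstAndWait(page, selector) {
  // page.$ resolves to the first matching element handle, or null.
  const firstLink = await page.$(selector);
  if (!firstLink) return false;

  // Start waiting before the click lands, so the navigation is not missed.
  await Promise.all([
    page.waitForNavigation(),
    firstLink.click(),
  ]);
  return true;
}
```

With the selectors from the question, that would be await clickFirstAndWait(page, '.search-results a').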

screen shot and data trying to be taken before site fully loads using puppeteer

Hi, I am trying to take a screenshot of a website using Puppeteer, but the site loads quite slowly, so I am never able to grab any data or take screenshots. I would like to delay my screenshot until the site has finished loading. I have tried a number of methods and can't figure it out. Thanks in advance for any help.
This is my code:
const puppeteer = require("puppeteer-extra");
// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());
async function scrapeProduct(url) {
//launching puppeteer
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto(url, { waitUntil: "load" });
await page.waitFor("*");
function time() {
var d = new Date();
var n = d.getSeconds();
return console.log(n);
}
time();
await page.screenshot({ path: "testresult.png" });
time();
await browser.close();
}
scrapeProduct("https://www.realcanadiansuperstore.ca/search?search-bar=milk");
waitFor has been deprecated recently, so you are better off trying the other wait methods.
I can't inspect the webpage you are taking a screenshot of, so I cannot tell what might be happening after the load event.
However, have you tried the other wait methods Puppeteer offers?
waitForNavigation and waitForSelector are mentioned in https://stackoverflow.com/a/52501934/484337
If you have control of the page you are taking a screenshot of, you can have it signal when it is ready (for example, by setting a flag from a DOM event handler), which your Puppeteer code can then wait for using waitForFunction.
If all else fails and time is not important then you can put in a sleep(n) that is long enough to guarantee the page is loaded.
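As a rough illustration of the waitForSelector route, here is a sketch that waits for an element you expect to exist only once the content has rendered, and takes the screenshot after that. The helper name screenshotWhenReady and the readySelector parameter are made up; I don't know which selector is right for the site in the question, so you would have to pick one by inspecting the page.

```javascript
// Sketch: load the page, wait until a "content is ready" element is visible,
// then take the screenshot. `readySelector` is a hypothetical selector that
// you must choose for the target site.
async function screenshotWhenReady(page, url, readySelector, outPath) {
  // networkidle2: consider navigation done when network activity has mostly settled.
  await page.goto(url, { waitUntil: 'networkidle2' });

  // Throws if the element does not become visible within 60 seconds.
  await page.waitForSelector(readySelector, { visible: true, timeout: 60000 });

  await page.screenshot({ path: outPath });
}
```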

Executing JavaScript in Puppeteer

I'm trying to execute some JavaScript via Puppeteer, which I'd normally execute through the Dev Tools console, as below:
[Screenshot: Dev Tools command]
Essentially I'm trying to list out all the elements in the Array.
I've been reading through StackOverflow and the Docs here:
https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#pageevaluatepagefunction-args
But I can't seem to get this working. I've tried evaluating the page with a multitude of different bits of code, all have come up empty.
Any help would be appreciated!
So, as I suspected, this was extremely simple code.
I'd actually already written this code before asking the question, but I'd put it in the wrong section of the overall script, so it threw an error.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(<URL>);
const ids = await page.evaluate('product_ids');
console.log('ids are:', ids);
await browser.close();
})();
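page.evaluate also accepts a function plus arguments, which avoids building code strings. A small sketch of reading a page global that way follows. The helper name readPageGlobal is made up, and it assumes the variable is attached to window (a top-level const or let would not be, in which case the string form above is the one to use). The value also has to be serializable to come back to Node.

```javascript
// Sketch: read a global variable from the page by name.
// Assumes the variable is reachable as a property of `window`.
async function readPageGlobal(page, name) {
  return page.evaluate((globalName) => {
    // This function body runs inside the page, not in Node.
    const value = window[globalName];
    // Return null explicitly so "missing" is easy to check for on the Node side.
    return value === undefined ? null : value;
  }, name);
}
```

For the case above that would be const ids = await readPageGlobal(page, 'product_ids');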

Puppeteer and dynamically added iFrame (element)

We have an AngularJS application that pops up a modal form (component) when a button is pressed.
This component loads an iframe, which I cannot seem to access with Puppeteer.
I have tried with mainFrame:
await page.waitFor(15000);
const frame = page.mainFrame().childFrames().find((iframe) => {
console.log('FRAME', iframe.name(), iframe.url());
return iframe.name() === 'iFrameName';
});
The above only has one frame (the main frame/window).
Have tried with frames
await page.waitFor(15000);
const frame = page.frames().find((iframe) => {
console.log('FRAME', iframe.name(), iframe.url());
return iframe.name() === 'iFrameName';
});
Have tried with contentFrame
await page.waitForSelector('iframe', { visible: true, timeout: 2000 });
const elementHandle = await page.$('iframe');
await page.waitFor(1000);
const frame = await elementHandle.contentFrame();
With the above, elementHandle has a value but frame is null
We have this working with Protractor and were hoping to move to Puppeteer, but if there is no solution we will have to stick with Protractor (which has its own issues).
Currently, there is no support for out-of-process iframes (OOPIFs). To be able to work with them, you need to launch Chromium with --disable-features=site-per-process:
const browser = await puppeteer.launch({
args: ['--disable-features=site-per-process']
});
You can track puppeteer's issue/support here.
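Separately from the launch flag, the fixed waitFor(15000) in the question can be replaced by polling page.frames() until the frame shows up. This is only a sketch: the helper name waitForFrameByName and its timeout and interval parameters are made up, and it does not by itself work around the out-of-process iframe limitation described above.

```javascript
// Sketch: poll page.frames() until a frame with the given name appears.
// Rejects if the frame is not found within `timeoutMs`.
async function waitForFrameByName(page, name, timeoutMs = 15000, intervalMs = 250) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // page.frames() returns all frames currently attached to the page.
    const frame = page.frames().find((f) => f.name() === name);
    if (frame) return frame;

    if (Date.now() >= deadline) {
      throw new Error(`Frame "${name}" did not appear within ${timeoutMs} ms`);
    }
    // Sleep briefly before checking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Usage would be const frame = await waitForFrameByName(page, 'iFrameName'); after the button click that opens the modal.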
I have a similar problem: an iframe that is loaded dynamically, so its src=(unknown), through a JS
href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(VARİABLES,,true,,false,))
Is it possible to clone the iframe, or create one, by invoking the JS that loads it from Puppeteer? If so, that may be worth trying.

puppeteer execute a js function on the chosen page

This is the page being analyzed: https://www.diretta.it/.
On this page, the content for the following days is loaded dynamically with JS without changing the URL of the site (you can try it at the top right of the table).
Using Puppeteer, with the following code
await page.goto ('https://www.diretta.it/');
it loads the contents of today's page.
Is there a way to load the page with tomorrow's content?
I have to scrape information about the matches of the following days.
The JS function that changes the day, which can be run from the console, is:
> set_calendar_date ('1')
What you are looking for is the page.evaluate() function.
This function lets you run any JS function in the page context.
In simpler terms, running page.evaluate() is akin to opening Dev tools and writing set_calendar_date('1') there directly.
Here is a working snippet, don't hesitate to pass {headless: false} to puppeteer.launch() if you want to see it working with your own eyes.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://www.diretta.it/');
await page.evaluate(() => {
set_calendar_date('1');
});
// Wait a bit for the website to refresh its contents (page.waitFor is deprecated)
await new Promise((resolve) => setTimeout(resolve, 500));
// Updated table is now available
await browser.close();
})();
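If you need several days, you can pass the day offset into page.evaluate as an argument instead of hard-coding it. A minimal sketch, where the helper name showDay is made up and set_calendar_date is the site's own function from the question (I'm assuming it takes the offset as a string, as in the example above):

```javascript
// Sketch: call the page's own set_calendar_date() with a given day offset.
// Values passed after the function are serialized and handed to it in the page.
async function showDay(page, dayOffset) {
  await page.evaluate((offset) => {
    // Runs in the page context, where set_calendar_date is defined by the site.
    set_calendar_date(offset);
  }, String(dayOffset));
}
```

For example, await showDay(page, 2); would ask for the day after tomorrow. You still need to wait for the table to refresh before scraping it.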
