PhantomJS to capture next page content after button click event - javascript

I am trying to capture the content of the second page after the click, but it keeps returning the front page content.
const status = await page.open('https://www.dubailand.gov.ae/English/services/Eservices/Pages/Brokers.aspx');
console.log(status);

await page.evaluate(function() {
    document.querySelector('#ctl00_ctl42_g_26779dcd_6f3a_42ae_903c_59dea61690e9_dpPager > a.NextPageLink').click();
});

const content = await page.property('content');
console.log(content);
I have done a similar task using Puppeteer, but I am shifting to PhantomJS due to deployment issues with Puppeteer.
Any help is appreciated.

You get the front page because you request the page's content immediately after clicking the "next" button; you need to wait for the Ajax request to finish. This can be done by observing the palm-tree Ajax loader: when it is no longer visible, the results are in.
// Utility function to pass time: await timeout(ms)
const timeout = ms => new Promise(resolve => setTimeout(resolve, ms));

// Emulate a realistic client's screen size
await page.property('viewportSize', { width: 1280, height: 720 });

const status = await page.open('https://www.dubailand.gov.ae/English/services/Eservices/Pages/Brokers.aspx');

await page.evaluate(function() {
    document.querySelector('#ctl00_ctl42_g_26779dcd_6f3a_42ae_903c_59dea61690e9_dpPager > a.NextPageLink').click();
});

// Give it time to start the request
await timeout(1000);

// Wait until the loader is gone
while (1 == await page.evaluate(function() {
    return jQuery(".Loader_large:visible").length;
})) {
    await timeout(1000);
    console.log(".");
}

// Now for scraping
let contacts = await page.evaluate(function() {
    var contacts = [];
    jQuery("#tbBrokers tr").each(function(i, row) {
        contacts.push({
            "title": jQuery(row).find("td:nth-child(2)").text().trim(),
            "phone": jQuery(row).find("td:nth-child(4)").text().trim()
        });
    });
    return contacts;
});

console.log(contacts);
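If you need this kind of Ajax wait in more than one place, the polling loop can be pulled into a small helper built on the same timeout utility. A sketch, reusing the .Loader_large selector and the page's jQuery from the code above; the attempt limit is an assumption, not something the site requires:

// Poll the page until the palm-tree loader is no longer visible (or we give up)
async function waitForLoaderGone(page, attempts = 30) {
    while (attempts-- > 0) {
        const stillLoading = await page.evaluate(function() {
            return jQuery(".Loader_large:visible").length > 0;
        });
        if (!stillLoading) return true;
        await timeout(1000);
    }
    return false; // loader still visible after all attempts
}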

Related

How can I click on all links matching a selector with Playwright?

I'm using Playwright to scrape some data. How do I click on all links on the page matching a selector?
const { firefox } = require('playwright');

(async () => {
  const browser = await firefox.launch({ headless: false, slowMo: 50 });
  const page = await browser.newPage();
  await page.goto('https://www.google.com');
  page.pause(); // allow user to manually search for something

  const wut = await page.$$eval('a', links => {
    links.forEach(async (link) => {
      link.click(); // maybe works?
      console.log('whoopee'); // doesn't print anything
      page.goBack(); // crashes
    });
    return links;
  });

  console.log(`wut? ${wut}`); // prints 'wut? undefined'
  await browser.close();
})();
Some issues:
console.log inside the $$eval doesn't do anything.
page.goBack() and page.pause() inside the eval cause a crash.
The return value of $$eval is undefined (if I comment out page.goBack() so I get a return value at all). If I return links.length instead of links, it's correct (i.e. it's a positive integer). Huh?
I get similar results with:
const links = await page.locator('a');
await links.evaluateAll(...)
Clearly I don't know what I'm doing. What's the correct code to achieve something like this?
(X-Y problem alert: I don't actually care if I do this with $$eval, Playwright, or frankly even Javascript; all I really want to do is make this work in any language or tool).
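A note on the symptoms: the callback given to $$eval is serialized and run inside the browser page, so console.log there goes to the browser console rather than the Node terminal, page is not defined in that scope (which is why page.goBack() crashes), and the return value must be serializable, so an array of DOM elements comes back as undefined while links.length, a plain number, survives. A quick illustration, not taken from the original post:

// Values returned from $$eval must be serializable; DOM elements are not.
const count = await page.$$eval('a', links => links.length);                  // a number survives the round trip
const texts = await page.$$eval('a', links => links.map(a => a.textContent)); // strings survive too
const nodes = await page.$$eval('a', links => links);                         // resolves to undefined

With that in mind, the approach below drives the clicks from the Node side instead of from inside the page.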
const { firefox } = require('playwright');

(async () => {
  const browser = await firefox.launch({ slowMo: 250 });
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto('https://stackoverflow.com/questions/70702820/how-can-i-click-on-all-links-matching-a-selector-with-playwright');

  const links = page.locator('a:visible');
  const linksCount = await links.count();

  for (let i = 0; i < linksCount; i++) {
    await page.bringToFront();
    try {
      // Ctrl+Shift+click opens the link in a new tab, which arrives as a 'page' event on the context
      const [newPage] = await Promise.all([
        context.waitForEvent('page', { timeout: 5000 }),
        links.nth(i).click({ modifiers: ['Control', 'Shift'] })
      ]);
      await newPage.waitForLoadState();
      console.log('Title:', await newPage.title());
      console.log('URL:  ', newPage.url());
      await newPage.close();
    } catch {
      // not every matched element is a real link that opens a page; skip those
      continue;
    }
  }

  await browser.close();
})();
There are a number of ways you could do this, but I like this approach the most. Clicking a link, waiting for the page to load, and then going back to the previous page has a lot of problems; most importantly, on many pages the links can change every time the page loads. Ctrl+Shift+clicking opens the link in a new tab, which you can access using the Promise.all pattern and catching the context's 'page' event.
I only tried this on this page, so I'm sure there are other problems that may arise. But for this page in particular, using 'a:visible' was necessary to avoid getting stuck on hidden links. The whole clicking operation is wrapped in a try/catch because some of the links aren't real links and don't open a new page.
Depending on your use case, it may be easiest just to grab all the hrefs from each link:
const links = page.locator('a:visible');
const linksCount = await links.count();
const hrefs = [];
for (let i = 0; i < linksCount; i++) {
  hrefs.push(await links.nth(i).getAttribute('href'));
}
console.log(hrefs);
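If you only need the attributes, the per-element loop can likely be collapsed into a single round trip with locator.evaluateAll. A sketch, using the same 'a:visible' locator as above:

// Collect every href in one evaluateAll call instead of one getAttribute call per link
const hrefs = await page.locator('a:visible').evaluateAll(
  links => links.map(link => link.getAttribute('href'))
);
console.log(hrefs);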
Try this approach; I will use TypeScript.
await page.waitForSelector(selector, { timeout: 10000 });
const links = await page.$$(selector);
for (const link of links) {
  await link.click({ timeout: 8000 });
  // your additional code
}
See more on https://youtu.be/54OwsiRa_eE?t=488

Puppeteer waitForNavigation reliability in determining page URL

I've got a Puppeteer Node JS app that, given a starting URL, follows the URL through its redirects and scrapes the window's URL of each page it identifies. Originally I was using a setInterval to get the current URL every 250ms, but I have stumbled upon the waitForNavigation option and need to know whether what I've got is going to be reliable.
Given the starting URL, I need to identify all of the pages (and only the pages) that Puppeteer goes through, and then, using a setTimeout, assume that if Puppeteer hasn't redirected to a new page within a given period of time there are no more redirections.
Will page.waitForNavigation work for this intended behaviour?
My current JS is:
let evalTimeout;

// initiate a Puppeteer instance with options and launch
const browser = await puppeteer.launch({
  args: argOptions,
  headless: (config.puppeteer.run_in_headless === 'true') ? true : false
});

// launch a new page
const page = await browser.newPage();

// go to a URL
await page.goto(body.url);

// create a function to inject into the page to scrape data
const currentUrl = () => {
  return window.location.href;
};

// log the current page URL after each navigation
async function scrapePageUrl (runOnce = false) {
  try {
    console.log('running timeout...');
    if (!runOnce) {
      evalTimeout = setTimeout(() => {
        console.log('6s reached, running once');
        scrapePageUrl(true); // assumes no more redirections after 6s, get final URL
      }, 6000);
    }
    const url = await page.evaluate(currentUrl);
    if (!runOnce) await page.waitForNavigation();
    console.log(`url: ${url}`);
    if (!runOnce) {
      clearTimeout(evalTimeout);
      scrapePageUrl();
    }
  } catch (err) { }
}

scrapePageUrl();
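For reference, page.waitForNavigation resolves on the next navigation and rejects with a TimeoutError if none happens within its timeout, so the "no more redirects after N seconds" assumption can be expressed directly. A minimal sketch of that pattern; the 6000 ms figure is taken from the code above, and the loop itself is illustrative rather than a drop-in replacement:

const visitedUrls = [];

while (true) {
  visitedUrls.push(page.url()); // record the URL of the page we are currently on

  try {
    // resolves when the next navigation happens, throws after 6s without one
    await page.waitForNavigation({ timeout: 6000 });
  } catch (err) {
    break; // no redirect within 6s: assume the chain is finished
  }
}

console.log(visitedUrls);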

Inner async code not executing at all despite using await block

I've got a pretty simple class that I'm trying to use Puppeteer within, but no matter what I do the async code just doesn't seem to execute when I put a breakpoint on it.
The let data = await page.$$eval will execute and then literally nothing happens after that. The code doesn't even step into the inner function block.
Surely the await on that line should force the inner async block to execute before it moves onto the console log at the bottom?
let url = "https://www.ikea.com/gb/en/p/godmorgon-high-cabinet-brown-stained-ash-effect-40457851/";
let scraper = new Scraper();
scraper.launch(url);

export class Scraper {
  constructor() {}

  async launch(url: string) {
    let browser = await puppeteer.launch({});
    let page = await browser.newPage();
    await page.goto(url);

    let data = await page.$$eval(' body *', elements => {
      console.log("Elements: ", elements);
      elements.forEach(element => {
        console.log("Element: ", element.className);
      });
      return "done";
    });

    console.log("Data: ", data);
  }
}
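One thing worth noting about the snippet above: the callback passed to page.$$eval is serialized and executed inside the browser page, not in the Node process, so console.log calls there go to the browser's console and a Node debugger will never step into that block. If you want to see those logs in the terminal, Puppeteer can forward them via the page's 'console' event. A small sketch, added here for illustration only:

// Forward messages logged inside the page (e.g. from $$eval callbacks) to the Node terminal
page.on('console', msg => console.log('PAGE LOG:', msg.text()));

let data = await page.$$eval('body *', elements => {
  console.log('Elements:', elements.length); // now visible in the terminal via the listener above
  return 'done';
});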
I'm trying to follow this tutorial.
I even copied this block of code directly from the tutorial but still it doesn't work.
await page.goto(this.url);
// Wait for the required DOM to be rendered
await page.waitForSelector('.page_inner');
// Get the link to all the required books
let urls = await page.$$eval('section ol > li', links => {
  // Make sure the book to be scraped is in stock
  links = links.filter(link => link.querySelector('.instock.availability > i').textContent !== "In stock");
  // Extract the links from the data
  links = links.map(el => el.querySelector('h3 > a').href);
  return links;
});
console.log(urls);

Run puppeteer function again until completion

I've got a Puppeteer function that runs in a Node JS script. Upon launching, my initial function runs; however, after navigating to the next page of a website (in my example using btnToClick) I need it to re-evaluate the page and collect more data. Right now I'm using a setInterval that assumes the total time per page scrape is 12 seconds. I'd like to run my extract function again after it has completed, and keep running it until nextBtn returns 0.
Below is my current set up:
function extractFromArea() {
  puppeteer.launch({
    headless: true
  }).then(async browser => {
    // go to our page of choice, and wait for the body to load
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 720 });
    await page.goto('mypage');

    const extract = async function() {
      // wait before evaluating the page
      await page.evaluate(() => {
        // next button
        const nextBtn = document.querySelectorAll('a.nav.next.rndBtn.ui_button.primary.taLnk').length
        if (nextBtn < 1) {
          // if no more pages are found
        }
        // wait, then proceed to next page
        setTimeout(() => {
          const btnToClick = document.querySelector('a.nav.next.rndBtn.ui_button.primary.taLnk')
          btnToClick.click()
        }, 2000)
      });
    };

    // TODO: somehow need to make this run again based on when the current extract function is finished.
    setInterval(() => {
      extract()
    }, 12000)

    // kick off the extraction
    extract()
  });
}
Here's what a while loop might look like:
while (await page.$('a.next')) {
  await page.click('a.next')
  // do something
}
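Fleshed out against the setup in the question, the interval and timeout bookkeeping can probably be dropped entirely. A sketch, assuming the "next" click triggers a real navigation; if it only updates the DOM in place, wait for a selector or a response instead of waitForNavigation:

const nextSelector = 'a.nav.next.rndBtn.ui_button.primary.taLnk';

while (true) {
  // ... extract data from the current page here ...

  const nextBtn = await page.$(nextSelector);
  if (!nextBtn) break; // nextBtn count is 0: no more pages

  await Promise.all([
    page.waitForNavigation(), // wait for the next page before extracting again
    nextBtn.click()
  ]);
}

await browser.close();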

Puppeteer: opening page, extracting data, closing and continuing

I'm scraping a .NET site and one common operation I need to perform is:
set form, submit
open page with result data
extract values from a table
return to the form, repeat with different form parameters
I can see how to open a page, wait for it via browser.on('targetcreated'), then extract the data; however, how do I make the previous code wait for the tab to close before submitting the form with the next set of parameters? The result page must be parsed before the next operation is submitted, as it shares the same URL.
This is perhaps a more general JS question.
This is my current code which checks if a page needs to be opened then clicks the link.
async function fetchAnalysis(page, eventBandId, x, y) {
  const ANALYSIS_TIMEOUT = 90000; // 90 seconds

  const xElem = await page.$(SELECTORS.event_band_analysis_x_axis);
  await xElem.type(x[1]);
  const yElem = await page.$(SELECTORS.event_band_analysis_y_axis);
  await yElem.type(y[1]);

  await page.click(SELECTORS.event_band_analysis_calculate);
  await page.waitForSelector(SELECTORS.spinner, { timeout: ANALYSIS_TIMEOUT, hidden: true });

  // check if the grid is presented straight away
  var dataTableSelector = null;
  if (await page.$(SELECTORS.event_band_immediate_grid) !== null) {
    console.log("Got data immediately");
    await page.screenshot({ path: './screenshots/Analysis: ' + x[1] + ' VS ' + y[1] + '.png' });
    dataTableSelector = SELECTORS.event_band_immediate_grid;
  } else {
    console.log("Need to open page for data");
    // await page.waitForSelector(SELECTORS.event_band_open_data_page);
    await page.click(SELECTORS.event_band_open_data_page);
    console.log("Clicked");
    return;
  }

  const tableData = await utils.getTableDataAsJson(page, dataTableSelector);
  await db.query(
    'INSERT INTO vs_coding.event_band_result ( event_band_id, x_axis, y_axis, json_data ) VALUES (?,?,?,?)',
    [ eventBandId, x[1], y[1], JSON.stringify(tableData) ],
    function (error, results, fields) {
      if (error) throw error;
    }
  );
  console.log("Saved");
}
This is my solution to opening a page, grabbing the data and then continuing: don't use the browser.on('targetcreated') event; instead, just work with the page in front of you.
[..snip..]
await fetchAnalysis(page, eventBandId, mode.name, x, y);
resultPage = await browser.newPage();
await utils.navigate(resultPage, 'analysisViewData.aspx');
await captureTableData(eventBandId, mode, x, y, resultPage, SELECTORS.event_band_analysis_results_grid);
await resultPage.close();
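Without the project-specific utils helpers, the same pattern in plain Puppeteer looks roughly like this; the URL and table selector are placeholders, not taken from the original code:

// Open the result page in a second tab, scrape it, close it, then continue with the form page
const resultPage = await browser.newPage();
await resultPage.goto('https://example.com/analysisViewData.aspx'); // placeholder URL
const rows = await resultPage.$$eval('#resultsGrid tr', trs =>
  trs.map(tr => Array.from(tr.querySelectorAll('td'), td => td.textContent.trim()))
);
await resultPage.close();
// back on the original `page`, submit the form with the next set of parameters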
