I am new to JS. I need to parse comments from Instagram, but first I need to load them all. I am using Puppeteer in Node.js, so I wrote this code:
await page.evaluate(() => {
  while (document.querySelector('main').querySelector('ul').querySelector('button'))
    document.querySelector('main').querySelector('ul').querySelector('button').click()
})
It does nothing and just spins in an endless loop. I tried adding a timeout inside the loop and so on, without luck.
I expected the code to check whether the button exists and, while it does, click() it so that more and more comments load.
I can't figure out what I'm doing wrong.
Have a look at my answer to a question very similar to this one here:
Puppeteer / Node.js to click a button as long as it exists -- and when it no longer exists, commence action
You should be able to apply it to finding and continually clicking on your "load more" button.
Instead of using a while() loop, you can use setInterval() to slow down each iteration to a more manageable pace while you load the comments. A synchronous while loop inside evaluate() never yields control back to the page, so the next batch of comments can never actually load:
await page.evaluate(async () => {
  await new Promise((resolve, reject) => {
    const interval = setInterval(() => {
      const button = document.querySelector('main ul button');
      if (button !== null) {
        button.click();
      } else {
        clearInterval(interval);
        resolve();
      }
    }, 100);
  });
});
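If you prefer to keep the logic reusable, the same poll-while-present pattern can be factored into a small helper. This is only a sketch; the names `pollWhile`, `find`, and `act` are my own, not Puppeteer or DOM APIs:

```javascript
// Repeatedly call `find()`; while it returns a truthy value, pass that
// value to `act()` and poll again after `intervalMs`. Resolves once
// `find()` returns a falsy value (i.e. the "load more" button is gone).
function pollWhile(find, act, intervalMs = 100) {
  return new Promise((resolve) => {
    const interval = setInterval(() => {
      const target = find();
      if (target) {
        act(target);
      } else {
        clearInterval(interval);
        resolve();
      }
    }, intervalMs);
  });
}
```

Inside page.evaluate you would then call something like `await pollWhile(() => document.querySelector('main ul button'), btn => btn.click())`.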
I have used Puppeteer in one of my projects to open webpages in headless Chrome, perform some actions, and then close the page. These actions, however, are user dependent. I want to attach a lifetime to the page, so that it closes automatically after, say, 30 minutes of being open, irrespective of whether any action is performed or not.
I have tried Node.js's setTimeout(), but it didn't work (or I just couldn't figure out how to make it work).
I have tried the following:
const puppeteer = require('puppeteer-core');
const browser = await puppeteer.connect({browserURL: browser_url});
const page = await browser.newPage();
// timer starts ticking here upon creation of new page (maybe in a subroutine and not block the main thread)
/**
..
Do something
..
*/
// timer ends and closePage() is triggered.
const closePage = (page) => {
  if (!page.isClosed()) {
    page.close();
  }
}
But this gives me the following error:
Error: Protocol error: Connection closed. Most likely the page has been closed.
Your provided code should work as expected. Are you sure the page is still open after the timeout, and that it is indeed the same page?
You can try this wrapper for opening pages and closing them correctly.
// since it is async it won't block the event loop.
// using `await` will allow other functions to execute.
async function openNewPage(browser, timeoutMs) {
  const page = await browser.newPage()
  setTimeout(async () => {
    // use try/catch to avoid unhandled promise rejections.
    try {
      if (!page.isClosed()) {
        await page.close()
      }
    } catch (err) {
      console.error('unexpected error occurred when closing page.', err)
    }
  }, timeoutMs)
  return page // hand the page back so the caller can use it
}
// use it like so.
const browser = await puppeteer.connect({browserURL: browser_url});
const min30Ms = 30 * 60 * 1000
const page = await openNewPage(browser, min30Ms);
// ...
The above only closes the tabs in your browser. To close the Puppeteer instance itself you would have to call browser.close(), which may be what you want.
page.close() returns a promise, so you need to define closePage as an async function and use await page.close(). I believe @silvan's answer should address the issue; just make sure to replace the if condition
if(page.isClosed())
with
if(!page.isClosed())
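The give-a-resource-a-lifetime idea can also be written as a generic helper that hands back a cancel function, so the timer is cleared if you close the page yourself earlier. A sketch with illustrative names: `autoClose` is my own helper, while `isClosed`/`close` mirror the Puppeteer page API:

```javascript
// Schedule `resource.close()` after `ttlMs` unless it is already closed.
// Returns a function that cancels the scheduled close.
function autoClose(resource, ttlMs) {
  const timer = setTimeout(async () => {
    try {
      if (!resource.isClosed()) {
        await resource.close();
      }
    } catch (err) {
      console.error('unexpected error while closing:', err);
    }
  }, ttlMs);
  return () => clearTimeout(timer);
}
```

Usage would look like `const cancel = autoClose(page, 30 * 60 * 1000)`, calling `cancel()` if you dispose of the page manually.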
I'm running a test called create admin. The test first creates an admin, then checks whether the admin was created successfully.
In the script, I have a part where I want to wait for 3 seconds before continuing, because after the submit button is clicked the website takes 3s to refresh the admin table (the list of user info) once navigation is done. This refresh is not a navigation, so my waitForNavigation() does not work here.
The process is: fill out the form > click the submit button > wait for navigation > reload the user table (3s).
If I don't wait 3s for the table to refresh, the test throws an error because the registered user is not yet found in the table (I have other scripts to find the user).
This is what the navigation looks like when the 'Save' button is clicked:
After that, the table takes 3s to refresh and looks like this:
This is what the 'create' function looks like:
You can wrap setTimeout in a Promise and use it inside async functions:
const delay = ms => new Promise(resolve => setTimeout(resolve, ms))
where ms is the delay in milliseconds you want to wait.
Usage in your code:
...
await page.click('button :text-is("Save")');
await delay(3000); // <-- here we wait 3s
return username;
Playwright now has this capability natively:
await page.waitForTimeout(3000);
Documentation for this is here: https://playwright.dev/docs/api/class-page#page-wait-for-timeout
Use setTimeout to do that. Here's an example:
function delayedCheck() {
  const processComplete = true; // set the right boolean here
  if (processComplete) {
    // Do something
  } else {
    setTimeout(delayedCheck, 3000); // try again in 3 seconds
  }
}
delayedCheck();
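The retry idea above can also be expressed as a promise so it composes with async/await instead of relying on a recursive callback. A sketch; the helper name `waitForCondition` is my own, not a library API:

```javascript
// Poll `check()` every `intervalMs`; resolve as soon as it returns true,
// reject if `timeoutMs` elapses first.
function waitForCondition(check, intervalMs = 500, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    const timer = setInterval(() => {
      if (check()) {
        clearInterval(timer);
        resolve();
      } else if (Date.now() - started > timeoutMs) {
        clearInterval(timer);
        reject(new Error('condition not met within timeout'));
      }
    }, intervalMs);
  });
}
```

In the test this would read as `await waitForCondition(() => processComplete, 3000)` rather than a fixed 3-second sleep, so the test continues as soon as the table has refreshed.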
This question already has an answer here:
How can I make a monitoring function to wait for an html element in puppeteer
I'm looking to have Puppeteer refresh the page until a class I want to sign up for opens up. I plan on refreshing every 30 seconds or so until the selector for the class appears. I've gotten far enough to sign in and reach the final page, but I can't wrap my head around the repeating part. A normal JS function throws an error.
await page2.waitForNavigation()
await page2.waitFor(4000)
await page2.select('select[name="country"]', "USA")
await page2.click('#searchZip');

async function findClass() {
  await page2.waitForNavigation()
  await page2.waitFor(4000)
  await page2.click('#showAvailableOnly');
  await page2.evaluate(() => {
    const el = document.querySelector(".dataTables_empty")
    const result = el.innerText
    if (result === "No matching records found") {
      console.log("Could not find any open test centers. Waiting 60 seconds to refresh.");
      // I want to have puppeteer wait 30 seconds here, and then run the
      // code after findClass() again, until the result changes.
    } else {
      console.log("Found opening!");
    }
  })
}
I would appreciate any help, as I am pretty new to puppeteer, or nodejs in general. Thank you so much!
In Puppeteer there is a function you may use:
await page.waitFor(30000);
Or, as a second solution, you can use:
function delay(time) {
  return new Promise(function (resolve) {
    setTimeout(resolve, time)
  });
}
then call it
await delay(30000);
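Combining delay() with a reload loop, the repeat-until-open flow from the question might look roughly like the sketch below. The helper name `refreshUntilVisible`, the selector, and the maxAttempts cap are assumptions; `page` is expected to expose reload() and $() like a Puppeteer page:

```javascript
// Reload the page until `selector` appears, waiting `waitMs` between tries.
// Returns true if the element showed up, false if we gave up.
async function refreshUntilVisible(page, selector, waitMs, maxAttempts = 20) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (await page.$(selector)) {
      return true; // the element is there, stop refreshing
    }
    await delay(waitMs);
    await page.reload();
  }
  return false; // never appeared within maxAttempts reloads
}

function delay(time) {
  return new Promise((resolve) => setTimeout(resolve, time));
}
```

Usage would be something like `const found = await refreshUntilVisible(page2, '#classOpenSelector', 30000)`, where '#classOpenSelector' stands in for whatever selector marks an open class.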
I'm trying to test a web page of my project with the Jest & Puppeteer testing tools. When I right-click on an element in the page, a menu pops up and some style attributes are set on the element. I'm trying to test this flow with Jest and have written the following code:
describe('Test for Rest Data', () => {
  jest.setTimeout(100000);
  beforeEach(async () => {
    await page.goto("url", { waitUntil: 'networkidle2' })
    await page.waitForSelector('table');
  });
});

test("Assert for delete row !", async () => {
  await page.click('tr', { button: 'right' }); // right-click; the second argument is an options object
  const tbl = await page.evaluate(() => {
    return document.querySelector('tr').getAttribute('style');
  });
  expect(tbl).not.toBeNull();
});
When I right-click the table, the style attribute gets added, but with the above code tbl is not getting any value.
Am I doing something wrong? How should I do this right?
You should probably also wait for some time after the click; maybe the style changes, but not instantly, or maybe the element is not there yet.
Try,
await page.waitFor(1000); // wait for some time
// or this below
await page.waitFor('tr'); // wait for the element
This will wait either for some time or for the element. Check whether that is the case.
I'm using nightmare.js to scrape public records and am just trying to get the scraper to wait for the next page to load. I'm crawling search results and press a next button to (obviously) get to the next page. I can't use nightmare.wait(someConstTime) to accurately wait for the next page to load, because sometimes someConstTime is shorter than the time the next page takes to load (although it's always under 30 seconds). I also can't use nightmare.wait(selector), because the same selectors are always present on all result pages. In that case nightmare basically doesn't wait at all, because the selector is already present (on the page I already scraped), so it will proceed to scrape the same page several times unless the new page happens to load before the next loop.
How can I conditionally wait for the next page to load after I click on the next button?
If I could figure out how, I would compare the "Showing # to # of ## entries" indicator of the current page (currentPageStatus) to the last known value (lastPageStatus) and wait until they're different (hence the next page has loaded).
(ignore that the example image only has one search result page)
I'd do that using this code from https://stackoverflow.com/a/36734481/3491991 but that would require passing lastPageStatus into deferredWait (which I can't figure out).
Here's the code I've got so far:
// Load dependencies
//const { csvFormat } = require('d3-dsv');
const Nightmare = require('nightmare');
const fs = require('fs');
var vo = require('vo');

const START = 'http://propertytax.peoriacounty.org';
var parcelPrefixes = ["01","02","03","04","05","06","07","08","09","10",
                      "11","12","13","14","15","16","17","18","19"]

vo(main)(function(err, result) {
  if (err) throw err;
});

function* main() {
  var nightmare = Nightmare(),
      currentPage = 0,
      isLastPage = false; // declared up front instead of creating an implicit global
  // Go to Peoria Tax Records Search
  try {
    yield nightmare
      .goto(START)
      .wait('input[name="property_key"]')
      .insert('input[name="property_key"]', parcelPrefixes[0])
      // Click search button (#btn btn-success)
      .click('.btn.btn-success')
  } catch(e) {
    console.error(e)
  }
  // Get parcel numbers ten at a time
  try {
    yield nightmare.wait('.sorting_1')
    isLastPage = yield nightmare.visible('.paginate_button.next.disabled')
    while (!isLastPage) {
      console.log('The current page should be: ', currentPage); // Display page status
      try {
        const result = yield nightmare
          .evaluate(() => {
            return [...document.querySelectorAll('.sorting_1')]
              .map(el => el.innerText);
          })
        // Save property numbers
        // fs.appendFile('parcels.txt', result, (err) => {
        //   if (err) throw err;
        //   console.log('The "data to append" was appended to file!');
        // });
      } catch(e) {
        console.error(e);
        return undefined;
      }
      yield nightmare
        // Click next page button
        .click('.paginate_button.next')
      // ************* THIS IS WHERE I NEED HELP *************** BEGIN
      // Wait for next page to load before continuing the while loop
      try {
        const currentPageStatus = yield nightmare
          .evaluate(() => {
            return document.querySelector('.dataTables_info').innerText;
          })
        console.log(currentPageStatus);
      } catch(e) {
        console.error(e);
        return undefined;
      }
      // ************* THIS IS WHERE I NEED HELP *************** END
      currentPage++;
      isLastPage = yield nightmare.visible('.paginate_button.next.disabled')
    }
  } catch(e) {
    console.error(e)
  }
  yield nightmare.end();
}
I had a similar issue that I managed to fix. Basically I had to navigate to a search page, select the '100 per page' option and then wait for the refresh. Only problem was, it was a crapshoot as to whether a manual wait time allowed the AJAX to fire and repopulate with more than 10 results (the default).
I ended up doing this:
nightmare
  .goto(url)
  .wait('input.button.primary')
  .click('input.button.primary')
  .wait('#searchresults')
  .select('#resultsPerPage', "100")
  .click('input.button.primary')
  .wait('.searchresult:nth-child(11)')
  .evaluate(function() {
    // ...
  })
  .end()
With this, the evaluate won't fire until it detects at least 11 divs with the class of .searchresult. Given that the default is 10, it has to wait for the reload for this to complete.
You could extend this to scrape the total number of available results from the first page to ensure that there are - in my case - more than 10 available. But the foundation of the concept works.
From what I understand, you basically need the DOM change to complete before you start extracting from the page being loaded.
In your case, the element that changes is the table with the CSS selector '#search-results'.
I think MutationObserver is what you need.
I have used Mutation Summary library which provides a nice wrapper on raw functionality of MutationObservers, to achieve something similar
var observer = new MutationSummary({
  callback: updateWidgets,
  queries: [{
    element: '[data-widget]'
  }]
});
(example from the Mutation Summary tutorial)
First, register the MutationSummary observer when the search results are loaded.
Then, after clicking 'Next', use nightmare.evaluate to wait for the MutationSummary callback to return the extracted values.
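If you'd rather avoid a library, the status-text comparison the asker describes (wait until the "Showing # to # of ## entries" indicator differs from the last known value) can be done with a plain poller. This is a sketch; `waitForChange` and `getValue` are my own names, and `getValue` stands in for however you read the indicator (e.g. via nightmare.evaluate):

```javascript
// Poll `getValue()` until it returns something different from `lastValue`,
// then resolve with the new value. Rejects if `timeoutMs` elapses first.
function waitForChange(getValue, lastValue, intervalMs = 500, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    const timer = setInterval(async () => {
      const current = await getValue();
      if (current !== lastValue) {
        clearInterval(timer);
        resolve(current);
      } else if (Date.now() - started > timeoutMs) {
        clearInterval(timer);
        reject(new Error('page status did not change in time'));
      }
    }, intervalMs);
  });
}
```

In the scraping loop you would keep the last `.dataTables_info` text around and, after clicking next, do something like `lastPageStatus = await waitForChange(readStatus, lastPageStatus)` before extracting again.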