Web Scraping Loop with Puppeteer: "await is only valid in async function"

I'm trying to check which item is currently on air at qvc.com, repeating the check in a loop with the following code. However, I get "await is only valid in async function" on the line "const results = await...".
Here is my code:
const puppeteer = require('puppeteer');
(async () => {
  // Init
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.qvc.com/content/iroa.qvc.eastern.html');
  // Selectors
  const current_item_selector = '.galleryItem:first-of-type a';
  // Functions
  setInterval(function() { // Repeat every 5s
    const results = await page.$(current_item_selector);
    const item = await results.evaluate(element => element.title);
    console.log(item);
  }, 5000);
})();
UPDATE:
setTimeout was supposed to be setInterval; that was a copy/paste error on my part. I have updated the code block. Thanks to those who pointed it out.

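The function passed to setInterval needs to be async as well. await is only allowed directly inside an async function body; the outer async arrow function does not extend that to regular functions nested inside it: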
// Functions
setInterval(async function() { // Repeat every 5s
  const results = await page.$(current_item_selector);
  const item = await results.evaluate(element => element.title);
  console.log(item);
}, 5000);
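One caveat with setInterval: it does not wait for the async callback's promise, so if a check takes longer than 5 seconds the next one starts while the previous one is still awaiting, and a failure in any tick (for example page.$ returning null, so results.evaluate throws) becomes an unhandled rejection. A minimal sketch of an alternative, assuming you want the checks strictly one after another: an async loop that awaits each check and then sleeps. The names pollSequentially, checkCurrentItem and sleep are hypothetical, and checkCurrentItem only simulates the Puppeteer calls so the snippet runs without a browser.

```javascript
// Sketch: poll in an async loop instead of setInterval, so checks never overlap.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task` up to `maxRuns` times, waiting `intervalMs` between runs,
// and collects the results. Any error stops the loop and rejects the promise.
async function pollSequentially(task, intervalMs, maxRuns) {
  const results = [];
  for (let run = 0; run < maxRuns; run++) {
    // `await` is valid here because pollSequentially itself is async.
    results.push(await task(run));
    if (run < maxRuns - 1) {
      await sleep(intervalMs);
    }
  }
  return results;
}

// Hypothetical stand-in for the Puppeteer calls:
//   const results = await page.$(current_item_selector);
//   return results.evaluate(element => element.title);
async function checkCurrentItem(run) {
  await sleep(10); // simulate the time a real page lookup takes
  return `item-${run}`;
}

pollSequentially(checkCurrentItem, 50, 3).then((items) => console.log(items));
```

With Puppeteer, the body of checkCurrentItem would hold the page.$ and evaluate calls; for an endless poll you would swap the bounded loop for an unbounded one that logs each item instead of collecting it.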

Related

Discord Bot JS interaction reply does not wait until the code finishes executing

I just started coding and am trying to build a discord.js bot that scrapes data from a website. My problem is that await interaction.reply(randomFact); seems to execute immediately, before my code has finished scraping and returned the result. I have tried async/await, but it still does not work.
const { SlashCommandBuilder } = require("@discordjs/builders");
const puppeteer = require("puppeteer");
const cheerio = require("cheerio");

module.exports = {
  data: new SlashCommandBuilder()
    .setName("tips")
    .setDescription("Scrape from website"),
  async execute(interaction) {
    let randomFact;
    let healthArray = [];
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto("https://www.example.com/health-facts/");
    const pageData = await page.evaluate(() => {
      return {
        html: document.documentElement.innerHTML,
      };
    });
    const $ = cheerio.load(pageData.html);
    $(".round-number")
      .find("h3")
      .each(function (i, el) {
        // Collapse whitespace, strip leading "1. "-style numbering, then trim.
        const row = $(el)
          .text()
          .replace(/\s+/g, " ")
          .replace(/[0-9]+\. /g, "")
          .trim();
        healthArray.push(row);
      });
    await browser.close();
    randomFact =
      healthArray[Math.floor(Math.random() * healthArray.length)];
    await interaction.reply(randomFact);
  },
};
Sorry if anything is missing from my post; I just joined Stack Overflow.
The Discord API gives you 3 seconds to reply to an interaction before it expires. Launching a browser and scraping a page can easily take longer than that, so call deferReply first to extend the time limit (a deferred interaction can then be edited for up to 15 minutes) and send the result with editReply instead of reply. The attached link is for ButtonInteraction, but SelectMenuInteraction and the other interaction types have the same method, so that shouldn't be an issue:
https://discord.js.org/#/docs/main/stable/search?query=Interaction.defer
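A minimal sketch of the defer-then-edit flow, assuming discord.js's deferReply and editReply methods on the interaction. To keep it runnable without a bot token, the interaction is a hypothetical mock that only records which methods were called, and the extra scrapeRandomFact parameter is a stand-in for the Puppeteer and cheerio code from the question (a real execute only receives the interaction).

```javascript
// Sketch of the defer-then-edit flow for a slow slash command.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// In a real command, the scraping code would live inside execute(interaction);
// it is passed in here as `scrapeRandomFact` so the sketch runs standalone.
async function execute(interaction, scrapeRandomFact) {
  // Acknowledge inside Discord's 3-second window; the user sees a loading state.
  await interaction.deferReply();
  // Slow work (launching Puppeteer, scraping) may now take longer than 3 s.
  const randomFact = await scrapeRandomFact();
  // After deferring, send the result with editReply rather than reply.
  await interaction.editReply(randomFact);
}

// Hypothetical mock that records calls instead of talking to Discord.
function createMockInteraction() {
  const calls = [];
  return {
    calls,
    async deferReply() {
      calls.push('deferReply');
    },
    async editReply(content) {
      calls.push(`editReply:${content}`);
    },
  };
}

const mock = createMockInteraction();
execute(mock, async () => {
  await sleep(20); // pretend scraping takes a while
  return 'Drink more water';
}).then(() => console.log(mock.calls));
```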

function in page.evaluate() is not working/being executed - Puppeteer

My code below tries to collect all the hyperlinks with the class name ".jss2". However, I don't think the function inside my page.evaluate() is running: when I run the code, link_list is never printed.
Running document.querySelectorAll in the Chrome console works perfectly fine, so I'm really having a hard time with this.
const puppeteer = require('puppeteer');
async function testing() {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.setViewport({width: 1200, height: 800});
  await page.goto(url);
  const link_list = await this.page.evaluate(() => {
    let elements = Array.from(document.querySelectorAll(".jss2"));
    let links = elements.map(element => {
      return element.href;
    });
    return (links);
  });
  console.log(link_list);
}
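Use the local page variable, not this.page. Unless testing() is called as a method of an object that has a page property, this.page is undefined (inside a plainly called function, this is undefined in strict mode or the global object otherwise), so this.page.evaluate(...) throws a TypeError before the evaluate callback ever runs, and console.log(link_list) is never reached. page.$$eval is also a more concise way to collect the links:
const link_list = await page.$$eval('.jss2', links => links.map(link => link.href));
Found the answer here: PUPPETEER - unable to extract elements on certain websites using page.evaluate(() => document.querySelectorAll())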
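The this.page problem can be reproduced without a browser. In this sketch, page is a hypothetical stand-in object whose $$eval simply calls the callback on two fake link objects (real Puppeteer runs it inside the page on the elements matching the selector); brokenCollect mirrors the question's this.page access and fixedCollect uses the variable directly.

```javascript
// Hypothetical stand-in for a Puppeteer Page. Real $$eval runs the callback
// inside the browser on every element matching the selector; this mock just
// hands it two fake link objects.
const page = {
  async $$eval(selector, fn) {
    const fakeLinks = [{ href: 'https://example.com/a' }, { href: 'https://example.com/b' }];
    return fn(fakeLinks);
  },
};

// Mirrors the question: `this` is not the scope that holds `page`
// (it is undefined in strict mode, the global object otherwise),
// so `this.page` is undefined and the call throws a TypeError.
async function brokenCollect() {
  return this.page.$$eval('.jss2', (links) => links.map((link) => link.href));
}

// Uses the variable directly, as in the answer.
async function fixedCollect() {
  return page.$$eval('.jss2', (links) => links.map((link) => link.href));
}

brokenCollect().catch((err) => console.log('broken:', err.name)); // TypeError
fixedCollect().then((links) => console.log('fixed:', links));
```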

How do I get the return value from puppeteer? [duplicate]

This question already has answers here:
How can I pass variable into an evaluate function?
(7 answers)
Closed 1 year ago.
When I run the following script, the console shows "Error: Evaluation failed: ReferenceError: chunk is not defined". On the line just before it, console.log(chunk[a]) works, so why can't I access chunk inside frame.evaluate?
const puppeteer = require('puppeteer');
const chunk = ["google.com", "facebook.com", "gmail.com", "stackoverflow.com"];
(async () => {
  for (var a = 0; a < chunk.length; a++) {
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();
    const iframeHandle = await page.$("frame[name='main']");
    const frame = await iframeHandle.contentFrame();
    console.log(chunk[a]);
    await frame.evaluate(() => {
      if (document.querySelector("#content > table").innerHTML.indexOf(chunk[a]) > -1) {
        console.log(chunk[a]);
      }
    });
  }
})()
Result:
google.com
Evaluation failed: ReferenceError: chunk is not defined
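I moved chunk inside the async function and the "chunk is not defined" error went away, but another error occurred regarding the element you want to reach. I tried to fix that as well by using waitForSelector instead of page.$, but a third error occurred, which I believe is caused by an incorrect element path ("frame[name='main']") or by the snippet never navigating to the page that contains the frame. Be aware, though, that moving chunk does not by itself fix the original error: the function passed to frame.evaluate is serialized and run inside the browser page, so it cannot see Node variables such as chunk or a, wherever they are declared (the error most likely disappeared only because the element error now occurs first). Pass the value to evaluate as an extra argument instead, and return the result from the callback, since a console.log inside it prints to the browser console rather than your terminal. You may change the selector and try it with the code below: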
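const puppeteer = require('puppeteer');
(async () => {
  const chunk = ["google.com", "facebook.com", "gmail.com", "stackoverflow.com"];
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  // Navigate to the document that contains the frame first
  // (its URL is not given in the question).
  const iframeHandle = await page.waitForSelector("frame[name='main']");
  const frame = await iframeHandle.contentFrame();
  for (const item of chunk) {
    console.log(item);
    // The callback runs in the browser, so item is passed in as an argument
    // and the boolean result is returned back to Node.
    const found = await frame.evaluate((text) => {
      const table = document.querySelector("#content > table");
      return table !== null && table.innerHTML.includes(text);
    }, item);
    if (found) {
      console.log(item);
    }
  }
  await browser.close();
})();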
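Why the closure fails can be shown in plain Node. The sketch below assumes a simplified model of evaluate: the callback is converted to source text with toString() and rebuilt with new Function in the global scope, which drops every closure, much as happens when Puppeteer ships the function to the page. simulatedEvaluate and check are hypothetical helpers, not Puppeteer APIs.

```javascript
// Simplified model of evaluate(): the callback is turned into source text and
// rebuilt in the global scope, so it loses access to the caller's variables.
// This only mimics that one aspect of Puppeteer; it is not Puppeteer's code.
function simulatedEvaluate(fn, ...args) {
  const rebuilt = new Function(`return (${fn.toString()});`)();
  return rebuilt(...args);
}

function check(tableHtml) {
  const chunk = ['google.com', 'facebook.com'];
  const results = {};
  try {
    // Like the question: the callback reads `chunk` from the enclosing scope.
    results.closure = simulatedEvaluate(() => chunk[0]);
  } catch (err) {
    results.closure = err.name;
  }
  // Passing the value as an extra argument works, and the return value
  // comes back to the caller (frame.evaluate(fn, chunk[a]) in Puppeteer).
  results.argument = simulatedEvaluate(
    (html, item) => html.includes(item),
    tableHtml,
    chunk[0],
  );
  return results;
}

console.log(check('<td>google.com</td>'));
```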

Function in selenium to return the desired element once it appears

I'm trying to create a program that runs a Google search using Selenium, based on this answer. So far the code looks like this:
const { Builder, By, Key, until } = require('selenium-webdriver');
const driver = new Builder().forBrowser("firefox").build();
(async () => {
  await driver.get(`https://www.google.com`);
  var el = await driver.findElement(By.name('q'));
  await driver.wait(until.elementIsVisible(el), 1000);
  await el.sendKeys('selenium');
  var el = await driver.findElement(By.name(`btnK`));
  await driver.wait(until.elementIsVisible(el), 1000);
  await el.click();
  console.log('...Task Complete!')
})();
But writing
var el = await driver.findElement(By.something('...'));
await driver.wait(until.elementIsVisible(el), 1000);
await el.do_something();
every time becomes tedious, so I tried to write a helper function like this:
const { Builder, By, Key, until } = require('selenium-webdriver');
const driver = new Builder().forBrowser("firefox").build();
async function whenElement(by_identity, timeout = 1000) {
  var el = await driver.findElement(by_identity);
  await driver.wait(until.elementIsVisible(el), timeout);
  return el;
}
(async () => {
  await driver.get(`https://www.google.com`);
  await whenElement(By.name('q')).sendKeys('selenium');
  await whenElement(By.name('btnK')).click();
  console.log('...Task Complete!')
})();
But it gives this error:
UnhandledPromiseRejectionWarning: TypeError: whenElement(...).sendKeys is not a function
My aim is to reduce the number of variables and keep it as simple as possible, so what exactly am I doing wrong here?
The problem is how await combines with the method call. whenElement is async, so whenElement(...) returns a Promise, and await whenElement(...).sendKeys(...) parses as await (whenElement(...).sendKeys(...)): sendKeys is looked up on the Promise, not on the element. You can only call it on the element the Promise resolves to.
You must first wait for the Promise returned by whenElement to resolve, then call sendKeys on the resulting element and await the Promise that sendKeys returns:
const el = await whenElement(By.name('q'));
await el.sendKeys('selenium');
or
await (await whenElement(By.name('q'))).sendKeys('selenium');
or
await whenElement(By.name('q')).then(el => el.sendKeys('selenium'));
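The precedence issue can be reproduced with plain promises. In this sketch, whenElement is a hypothetical async function (same idea as the question's helper, different parameters) that returns a fake element with only a sendKeys method; it shows that the Promise itself has no sendKeys, while the awaited value does.

```javascript
// Hypothetical fake element: only sendKeys is modelled, and it records input.
function makeFakeElement(log) {
  return {
    async sendKeys(text) {
      log.push(`typed:${text}`);
    },
  };
}

// Like the question's helper: because it is async, calling it always
// returns a Promise, never the element itself.
async function whenElement(log) {
  return makeFakeElement(log);
}

async function demo() {
  const log = [];
  const pending = whenElement(log);
  // `await whenElement(x).sendKeys(y)` parses as `await (whenElement(x).sendKeys(y))`,
  // so sendKeys is looked up on the Promise, which does not have it.
  log.push(`promise has sendKeys: ${typeof pending.sendKeys === 'function'}`);
  // Awaiting the Promise first yields the element, which does have sendKeys.
  await (await pending).sendKeys('selenium');
  return log;
}

demo().then((log) => console.log(log));
```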

Puppeteer Async Await Loop in NodeJS

I am trying to make a script that:
- grabs all URLs from a sitemap
- takes a screenshot of each one with Puppeteer
I am currently trying to understand how to write asynchronous code, but I still have trouble finding the right pattern for this problem.
Here is the code I currently have:
// const spider = require('./spider');
const Promise = require('bluebird');
const puppeteer = require('puppeteer');
const SpiderConstructor = require('sitemapper');
async function crawl(url, timeout) {
  const results = await spider(url, timeout);
  await Promise.each(results, async (result, index) => {
    await screen(result, index);
  });
}
async function screen(result, index) {
  const browser = await puppeteer.launch();
  console.log('doing', index);
  const page = await browser.newPage();
  await page.goto(result);
  const path = await 'screenshots/' + index + page.title() + '.png';
  await page.screenshot({path});
  browser.close();
}
async function spider(url, timeout) {
  const spider = await new SpiderConstructor({
    url: url,
    timeout: timeout
  });
  const data = await spider.fetch();
  console.log(data.sites.length);
  return data.sites;
};
crawl('https://www.google.com/sitemap.xml', 15000)
  .catch(err => {
    console.error(err);
  });
I am having the following problems:
- The length of the results array is not constant; it varies every time I launch the script. I guess this is because the promise is not fully resolved when I print it, but I thought the whole point of await was to guarantee that the promise is resolved by the next line.
- The screenshot part of the script fails about half the time. I am pretty sure I have unresolved promises, but I have no idea of the right pattern for looping over an async function. Right now it seems to take one screenshot after another (linear and incremental), but I get a lot of duplicates.
Any help is appreciated. Thank you for your time.
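A minimal sketch of one common pattern for the screenshot loop, assuming the list of URLs has already been fetched: reuse a single browser, visit the URLs strictly one at a time with a for...of loop (a plain-JavaScript equivalent of bluebird's Promise.each, which also runs sequentially), and await every Puppeteer call. That includes page.title(): in the question, await only applies to the 'screenshots/' literal, so the file name ends up containing "[object Promise]". screenshotAll and safeFileName are hypothetical helper names; the browser is passed in so the function can also be exercised against a mock.

```javascript
// Hypothetical helper: titles can contain characters such as "/" that are
// not valid in file names, so replace anything unusual with "_".
function safeFileName(title) {
  return title.replace(/[^a-z0-9_-]+/gi, '_').slice(0, 80);
}

// Sketch: one shared browser, URLs processed strictly one after another.
// `browser` only needs Puppeteer's newPage(); pages need goto(), title(),
// screenshot({ path }) and close(), which real Puppeteer pages provide.
async function screenshotAll(browser, urls, dir = 'screenshots') {
  const written = [];
  for (const [index, url] of urls.entries()) {
    const page = await browser.newPage();
    try {
      await page.goto(url);
      // Await title() on its own so the file name gets the actual string.
      const title = await page.title();
      const path = `${dir}/${index}-${safeFileName(title)}.png`;
      await page.screenshot({ path });
      written.push(path);
    } finally {
      await page.close(); // release the tab even if goto() or screenshot() throws
    }
  }
  return written;
}
```

With real Puppeteer you might launch the browser once with puppeteer.launch(), pass it together with the sitemap results to screenshotAll, and await browser.close() in a finally block afterwards.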
