Puppeteer Async Await Loop in NodeJS - javascript

I am trying to make a script that:
Grabs all URLs from a sitemap
Takes a screenshot of each one with puppeteer
I am currently trying to understand how to code asynchronously, but I still have trouble finding the right coding pattern for this problem.
Here is the code I currently have:
// const spider = require('./spider');
const Promise = require('bluebird');
const puppeteer = require('puppeteer');
const SpiderConstructor = require('sitemapper');
async function crawl(url, timeout) {
  const results = await spider(url, timeout);
  await Promise.each(results, async (result, index) => {
    await screen(result, index);
  });
}
async function screen(result, index) {
  const browser = await puppeteer.launch();
  console.log('doing', index);
  const page = await browser.newPage();
  await page.goto(result);
  const path = await 'screenshots/' + index + page.title() + '.png';
  await page.screenshot({path});
  browser.close();
}
async function spider(url, timeout) {
  const spider = await new SpiderConstructor({
    url: url,
    timeout: timeout
  });
  const data = await spider.fetch();
  console.log(data.sites.length);
  return data.sites;
};
crawl('https://www.google.com/sitemap.xml', 15000)
  .catch(err => {
    console.error(err);
  });
I am having the following problems:
The length of the results array is not constant; it varies every time I launch the script. I guess that is because it is not fully resolved when I display it, but I thought the whole point of await was to guarantee that the promise is resolved by the next line.
The screenshotting part of the script doesn't work half the time. I am pretty sure I have unresolved promises, but I have no idea of the right pattern for looping over an async function. Right now it seems to take one screenshot after the other (linear and incremental), but I get a lot of duplicates.
Any help is appreciated. Thank you for your time.
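A minimal sketch of the looping pattern asked about above, with the browser calls replaced by stand-ins so that it runs on its own. fakeTitle and takeScreenshot are hypothetical placeholders, not puppeteer APIs. It illustrates two points: a for...of loop with await runs the async steps strictly one after another, and a promise-returning call (puppeteer's page.title() is one) has to be awaited before it is concatenated into a string, otherwise the text "[object Promise]" ends up in the file name.

```javascript
// Stand-in for an async call such as page.title(); resolves after a short delay.
function fakeTitle(url) {
  return new Promise(resolve => setTimeout(() => resolve('Title of ' + url), 10));
}

// Stand-in for the screenshot step: records the path it was asked to write.
async function takeScreenshot(url, index, written) {
  const title = await fakeTitle(url); // await first, then build the string
  const path = 'screenshots/' + index + '-' + title + '.png';
  written.push(path);
}

// Process the URLs in order: each iteration finishes before the next one starts.
async function screenshotAll(urls) {
  const written = [];
  for (const [index, url] of urls.entries()) {
    await takeScreenshot(url, index, written);
  }
  return written;
}

// What happens when the await is forgotten: the Promise itself is stringified.
const unawaitedPath = 'screenshots/' + 0 + fakeTitle('a') + '.png';

screenshotAll(['a', 'b']).then(paths => {
  console.log(paths);
  console.log(unawaitedPath); // screenshots/0[object Promise].png
});
```

In the real script, the same shape would apply to screen(): await page.title() into a variable before building path, and await browser.close() so each browser is shut down before the next one is launched.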

Related

Function does not pass object to the constant

I'm new to JavaScript, so maybe it's a dumb mistake. I'm trying to pass the values of the object that I get in this web scraping function to the constant, but I'm not succeeding. Every time I try to print the menu it prints "undefined".
const puppeteer = require("puppeteer");
async function getMenu() {
  console.log("Opening the browser...");
  const browser = await puppeteer.launch({
    headless: true
  });
  const page = await browser.newPage();
  await page.goto('https://pra.ufpr.br/ru/ru-centro-politecnico/', {waitUntil: 'domcontentloaded'});
  console.log("Content loaded...");
  // Read the day and the meal rows from the page
  const fullMenu = await page.evaluate(() => {
    return {
      day: document.querySelector('#conteudo div:nth-child(3) p strong').innerText,
      breakfastFood: document.querySelector('tbody tr:nth-child(2)').innerText,
      lunchFood: document.querySelector('tbody tr:nth-child(4)').innerText,
      dinnerFood: document.querySelector('tbody tr:nth-child(6)').innerText
    };
  });
  await browser.close();
  return {
    breakfast: fullMenu.day + "\nCafé da Manhã:\n" + fullMenu.breakfastFood,
    lunch: fullMenu.day + "\nAlmoço:\n" + fullMenu.lunchFood,
    dinner: fullMenu.day + "\nJantar:\n" + fullMenu.dinnerFood
  };
};
const menu = getMenu();
console.log(menu.breakfast);
I've tried to pass these values to a variable in several ways, but I'm not succeeding. I'm also open to other methods of passing these strings; I'm doing it this way because it's the simplest I could think of.
Your getMenu() is an async function, so calling it returns a Promise.
Change the last bit of your code to:
(async () => {
  let menu = await getMenu();
  console.log(menu.breakfast);
})();
Credit to this post.
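I don't have access to the package that you imported, but you can try changing the last part of your code to the following. Note that await at the top level only works in an ES module; in a CommonJS file (one that uses require, like yours) it has to be inside an async function.
const menu = await getMenu();
if (menu) {
  console.log(menu.breakfast);
}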
Explanation
getMenu() and await getMenu() are different things in JS. getMenu() evaluates to a Promise object, not to the object that the function eventually returns. await getMenu() pauses the surrounding async function (or ES module) until that Promise settles, and then evaluates to the resolved value.
In your original code there is no await, so menu is still a Promise when console.log(menu.breakfast) runs. A Promise has no breakfast property, so you get undefined.
The if (menu) {...} statement does not do any waiting; the await on the line before it does. The check only guards against getMenu() resolving to a falsy value, which could matter if the function were later changed to return null when scraping fails.
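The difference can be seen in a small sketch that needs no browser. getMenuStub is a hypothetical stand-in for getMenu(), and the breakfast text is made up.

```javascript
// Hypothetical stand-in for getMenu(): an async function that resolves to an object.
async function getMenuStub() {
  return { breakfast: 'coffee and bread' };
}

// Without await, the call evaluates to a Promise, which has no breakfast property.
const pending = getMenuStub();
console.log(pending instanceof Promise); // true
console.log(pending.breakfast);          // undefined

// With await (inside an async function), the resolved object is available.
async function main() {
  const menu = await getMenuStub();
  console.log(menu.breakfast); // coffee and bread
  return menu.breakfast;
}

main();
```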

Discord Bot JS interaction reply does not wait until the code finish executing

I just started to code and am trying to build a discord.js bot. I am scraping data from a website. My problem is that await interaction.reply(randomFact); seems to execute before my code has finished scraping and returned the result. I have tried async/await, but it still does not work.
module.exports = {
  data: new SlashCommandBuilder()
    .setName("tips")
    .setDescription("Scrap from website"),
  async execute(interaction) {
    let randomFact;
    let healthArray = [];
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto("https://www.example.com/health-facts/");
    const pageData = await page.evaluate(() => {
      return {
        html: document.documentElement.innerHTML,
      };
    });
    const $ = cheerio.load(pageData.html);
    $(".round-number")
      .find("h3")
      .each(function (i, el) {
        let row = $(el).text().replace(/(\s+)/g, " ");
        row = $(el)
          .text()
          .replace(/[0-9]+. /g, "")
          .trim();
        healthArray.push(row);
      });
    await browser.close();
    randomFact =
      healthArray[Math.floor(Math.random() * healthArray.length)];
    await interaction.reply(randomFact);
  },
};
the output error
Sorry if there is anything lacking in my post. I just joined stack overflow.
The Discord API gives you 3 seconds to reply to an interaction before it expires. To extend this time limit, use the deferReply method found here. The attached link is for ButtonInteraction, but SelectMenuInteraction and the other interaction types have the same method, so that shouldn't be an issue.
https://discord.js.org/#/docs/main/stable/search?query=Interaction.defer
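A rough sketch of the defer-then-edit flow, assuming discord.js's deferReply() and editReply() methods on the interaction. To keep it runnable without a bot, the interaction here is a hypothetical fake object that only records calls, and scrapeFacts is a placeholder for the slow puppeteer and cheerio work in the question.

```javascript
// Placeholder for the slow scraping step (puppeteer + cheerio in the question).
async function scrapeFacts() {
  await new Promise(resolve => setTimeout(resolve, 50));
  return ['Fact one', 'Fact two', 'Fact three'];
}

// Same shape as the command's execute(): acknowledge first, do the work, then edit.
async function execute(interaction) {
  await interaction.deferReply(); // acknowledge inside the 3 second window
  const facts = await scrapeFacts();
  const randomFact = facts[Math.floor(Math.random() * facts.length)];
  await interaction.editReply(randomFact); // fill in the deferred reply
  return randomFact;
}

// Hypothetical fake interaction that records the order of calls.
function makeFakeInteraction() {
  const calls = [];
  return {
    calls,
    async deferReply() { calls.push('deferReply'); },
    async editReply(content) { calls.push('editReply: ' + content); },
  };
}

const fake = makeFakeInteraction();
execute(fake).then(() => console.log(fake.calls));
```

After deferring, the reply is sent with editReply() rather than reply(), since the interaction has already been acknowledged.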

Function in selenium to return the desired element once it appears

I'm trying to create a program that runs a Google search using Selenium. Based on this answer, the code so far looks like this:
const { Builder, By, Key, until } = require('selenium-webdriver');
const driver = new Builder().forBrowser("firefox").build();
(async () => {
  await driver.get(`https://www.google.com`);
  var el = await driver.findElement(By.name('q'));
  await driver.wait(until.elementIsVisible(el),1000);
  await el.sendKeys('selenium');
  var el = await driver.findElement(By.name(`btnK`));
  await driver.wait(until.elementIsVisible(el),1000);
  await el.click();
  console.log('...Task Complete!')
})();
but writing
var el = await driver.findElement(By.something('...'));
await driver.wait(until.elementIsVisible(el),1000);
await el.do_something();
every time becomes tedious, so I tried to make a function like this:
const { Builder, By, Key, until } = require('selenium-webdriver');
const driver = new Builder().forBrowser("firefox").build();
async function whenElement(by_identity,timeout=1000){
  var el = await driver.findElement(by_identity);
  await driver.wait(until.elementIsVisible(el),timeout);
  return el;
}
(async () => {
  await driver.get(`https://www.google.com`);
  await whenElement(By.name('q')).sendKeys('selenium');
  await whenElement(By.name('btnK')).click();
  console.log('...Task Complete!')
})();
but it gives this error:
UnhandledPromiseRejectionWarning: TypeError: whenElement(...).sendKeys is not a function
My aim is to reduce the number of variables and make it as simple as possible.
So what exactly am I doing wrong here?
The problem is with the promises. whenElement is an async function, so calling it returns a Promise, and a Promise has no sendKeys method. You can only call sendKeys on the element that the Promise resolves to, not on the Promise itself.
You must first await the Promise returned by whenElement to get the element, and then await the Promise returned by sendKeys:
const el = await whenElement(By.name('q'));
await el.sendKeys('selenium');
or
await (await whenElement(By.name('q'))).sendKeys('selenium');
or
await whenElement(By.name('q')).then(el => el.sendKeys('selenium'));

Web Scraping Loop w/ Puppeteer: "await is only valid in async function"

I'm trying to check what the current item on air is at qvc.com in a repeating loop using the following code, however I get "await is only valid in async function" on the line "const results = await..."
Here is my code:
(async () => {
  // Init
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.qvc.com/content/iroa.qvc.eastern.html');
  // Selectors
  const current_item_selector = '.galleryItem:first-of-type a';
  // Functions
  setInterval(function() { // Repeat every 5s
    const results = await page.$(current_item_selector);
    const item = await results.evaluate(element => element.title);
    console.log(item);
  }, 5000);
})();
UPDATE:
setTimeout was supposed to be setInterval; that was my mistake, a copy/paste error. I updated that in the code block. Thanks to those who pointed it out.
The function inside setInterval needs to be async as well:
// Functions
setInterval(async function() { // Repeat every 5s
  const results = await page.$(current_item_selector);
  const item = await results.evaluate(element => element.title);
  console.log(item);
}, 5000);
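One thing to watch with an async callback in setInterval: the timer does not wait for the previous run to finish, so slow runs can overlap, and a rejected await inside the callback becomes an unhandled rejection. A possible alternative, sketched here under the assumption that overlap is unwanted, is a loop that awaits each check and then sleeps. checkItem is a hypothetical stand-in for the page.$ and evaluate calls, and the rounds limit exists only so the sketch terminates.

```javascript
// Resolve after ms milliseconds.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Run task repeatedly, waiting for each run to finish before sleeping.
// A real poller might loop forever instead of stopping after `rounds`.
async function poll(task, intervalMs, rounds) {
  const results = [];
  for (let i = 0; i < rounds; i++) {
    try {
      results.push(await task());
    } catch (err) {
      console.error('check failed:', err.message); // keep polling after a failure
    }
    await sleep(intervalMs);
  }
  return results;
}

// Hypothetical stand-in for reading the current item title from the page.
let counter = 0;
async function checkItem() {
  counter += 1;
  return 'item ' + counter;
}

poll(checkItem, 20, 3).then(items => console.log(items));
```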

Using jest to test async function that does not have callback

I'm currently looking to test an async function, a scraper that scrapes Ryan Air's website for a price on a given route to be exact. And I want to test that the scraped price is actually what the price should be. When trying to run it with jest to test, I cannot seem to make it work properly... I've looked on Google and various other sites and they all seem to have solutions for async functions that have callbacks, promises, etc. and NOT async functions that don't have those.
My function takes as a parameter the URL of a given route on Ryan Air.
Here is my async function (file is named scraperProduct.js):
const puppeteer = require('puppeteer');
async function scraperProduct(url){
  console.log('Starting scraper...');
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);
  await page.waitFor(500);
  //Price departure
  const [el1] = await page.$x('/html/body/flights-root/div/div/div/div/flights-summary-container/flights-summary/div/div[1]/journey-container/journey/div/div[2]/carousel-container/carousel/div/ul/li[3]/carousel-item/button/div[2]/ry-price/span[2]');
  const txt = await el1.getProperty('textContent');
  const Price = await txt.jsonValue();
  //Price return
  const [el6] = await page.$x('/html/body/flights-root/div/div/div/div/flights-summary-container/flights-summary/div/div[2]/journey-container/journey/div/div[2]/carousel-container/carousel/div/ul/li[3]/carousel-item/button/div[2]/ry-price/span[2]');
  const txt6 = await el6.getProperty('textContent');
  const Price2 = await txt6.jsonValue();
  return Price + Price2;
}
module.exports = scraperProduct;
And this is my test file (named scraperProduct.test.js):
const scraperProduct = require('./scraperProduct');
test("Testing that scraper retrieves correct price from Ryan Air", async () => {
  expect(
    scraperProduct('https://www.ryanair.com/dk/da/trip/flights/select?adults=1&teens=0&children=0&infants=0&dateOut=2020-07-13&dateIn=2020-07-20&originIata=CPH&destinationIata=STN&isConnectedFlight=false&isReturn=true&discount=0')
  ).toBe(698);
});
'toBe(698)' is 698 since that is what the price should be in the test.
I appreciate any help I can get with this - it's my first time using jest, so I'm a bit of a noob atm.
Since you are testing an async function, you need to wait for its result, i.e. use await. Without it, expect receives the pending Promise rather than the price, so toBe(698) cannot match.
This is one of the possible solutions when testing asynchronous code: wait for the result and then test it.
const scraperProduct = require('./scraperProduct');
test("Testing that scraper retrieves correct price from Ryan Air", async () => {
  const result = await scraperProduct('https://www.ryanair.com/dk/da/trip/flights/select?adults=1&teens=0&children=0&infants=0&dateOut=2020-07-13&dateIn=2020-07-20&originIata=CPH&destinationIata=STN&isConnectedFlight=false&isReturn=true&discount=0');
  expect(result).toBe(698);
});
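A separate point that may affect this test: textContent values read through jsonValue() are strings, so Price + Price2 in the scraper concatenates text rather than adding numbers, and toBe(698) would not match a string. A minimal sketch of converting before summing, assuming the price text holds a plain number with an optional decimal comma and no thousands separator. parsePrice is a hypothetical helper and the sample strings are made up, not taken from the site.

```javascript
// Convert scraped price text to a number.
// Assumes a decimal comma or dot and no thousands separator.
function parsePrice(text) {
  const cleaned = text.replace(/[^0-9.,]/g, '').replace(',', '.');
  return Number.parseFloat(cleaned);
}

const departure = ' 349,00 '; // made-up sample text
const ret = ' 349,00 ';       // made-up sample text

console.log(departure + ret);                         // string concatenation
console.log(parsePrice(departure) + parsePrice(ret)); // numeric sum: 698
```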
