Using jest to test async function that does not have callback - javascript

I'm currently trying to test an async function: a scraper that scrapes Ryan Air's website for the price on a given route. I want to test that the scraped price is actually what the price should be. When I run the test with Jest, I can't seem to make it work properly. I've looked on Google and various other sites, and they all seem to have solutions for async functions that use callbacks, promises, etc., and NOT for async functions that don't use those.
My function takes as a parameter the URL of a given route on Ryan Air.
Here is my async function (file is named scraperProduct.js):
const puppeteer = require('puppeteer');
async function scraperProduct(url) {
  console.log('Starting scraper...');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.waitFor(500);
    // Price departure
    const [el1] = await page.$x('/html/body/flights-root/div/div/div/div/flights-summary-container/flights-summary/div/div[1]/journey-container/journey/div/div[2]/carousel-container/carousel/div/ul/li[3]/carousel-item/button/div[2]/ry-price/span[2]');
    const txt = await el1.getProperty('textContent');
    const Price = await txt.jsonValue();
    // Price return
    const [el6] = await page.$x('/html/body/flights-root/div/div/div/div/flights-summary-container/flights-summary/div/div[2]/journey-container/journey/div/div[2]/carousel-container/carousel/div/ul/li[3]/carousel-item/button/div[2]/ry-price/span[2]');
    const txt6 = await el6.getProperty('textContent');
    const Price2 = await txt6.jsonValue();
    // textContent is a string, so "+" would concatenate ("349" + "349" -> "349349").
    // Converting assumes each span holds a plain number such as "349".
    return Number(Price) + Number(Price2);
  } finally {
    // Close the browser even if scraping throws, so Jest can exit
    await browser.close();
  }
}
module.exports = scraperProduct;
And this is my test file (named scraperProduct.test.js):
const scraperProduct = require('./scraperProduct');
test("Testing that scraper retrieves correct price from Ryan Air", async () => {
  expect(
    scraperProduct('https://www.ryanair.com/dk/da/trip/flights/select?adults=1&teens=0&children=0&infants=0&dateOut=2020-07-13&dateIn=2020-07-20&originIata=CPH&destinationIata=STN&isConnectedFlight=false&isReturn=true&discount=0')
  ).toBe(698);
});
I used toBe(698) because 698 is what the price should be in this test.
I appreciate any help I can get with this. It's my first time using Jest, so I'm a bit of a noob at the moment.

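Since you are testing an async function, you need to wait for its result, i.e., use await. An async function always returns a Promise, even when its body uses no callbacks, so without await, expect() receives the Promise object rather than the price.
This is one of the possible solutions when testing asynchronous code: wait for the result, then test it.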
const scraperProduct = require('./scraperProduct');
test("Testing that scraper retrieves correct price from Ryan Air", async () => {
  const result = await scraperProduct('https://www.ryanair.com/dk/da/trip/flights/select?adults=1&teens=0&children=0&infants=0&dateOut=2020-07-13&dateIn=2020-07-20&originIata=CPH&destinationIata=STN&isConnectedFlight=false&isReturn=true&discount=0');
  expect(result).toBe(698);
});
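Why the original test fails can be shown without Puppeteer or Jest. The sketch below uses a hypothetical fakeScraper in place of scraperProduct: calling an async function returns a Promise even when its body has no callbacks or explicit promises, so the caller only gets the number after awaiting. It also assumes, for illustration, that the scraped spans hold plain numeric strings such as "349", and shows why they must be converted before adding.

```javascript
// fakeScraper is a hypothetical stand-in for scraperProduct (no browser, no network).
async function fakeScraper() {
  // textContent values come back as strings
  const price = "349";
  const price2 = "349";
  console.log(price + price2); // 349349 (+ on two strings concatenates)
  return Number(price) + Number(price2);
}

const pending = fakeScraper();
// Without await the caller receives a Promise, not the number
console.log(pending instanceof Promise); // true
console.log(pending === 698); // false

(async () => {
  // With await the caller receives the resolved value
  const result = await fakeScraper();
  console.log(result === 698); // true
})();
```

Inside Jest you can either await the call as in the answer, or use the .resolves matcher, for example await expect(scraperProduct(url)).resolves.toBe(698). Either way the expectation itself must be awaited or returned, otherwise the test can finish before the assertion runs.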

Related

Function does not pass object to the constant

I'm new to JavaScript, so maybe it's a dumb mistake. I'm trying to pass the values of the object that I get from this web scraping function to a constant, but I'm not succeeding. Every time I try to print the menu, it prints "undefined".
const puppeteer = require("puppeteer");
async function getMenu() {
  console.log("Opening the browser...");
  const browser = await puppeteer.launch({
    headless: true
  });
  const page = await browser.newPage();
  await page.goto('https://pra.ufpr.br/ru/ru-centro-politecnico/', {waitUntil: 'domcontentloaded'});
  console.log("Content loaded...");
  // Extract the day and the menu rows from the page
  const fullMenu = await page.evaluate(() => {
    return {
      day: document.querySelector('#conteudo div:nth-child(3) p strong').innerText,
      breakfastFood: document.querySelector('tbody tr:nth-child(2)').innerText,
      lunchFood: document.querySelector('tbody tr:nth-child(4)').innerText,
      dinnerFood: document.querySelector('tbody tr:nth-child(6)').innerText
    };
  });
  await browser.close();
  return {
    breakfast: fullMenu.day + "\nCafé da Manhã:\n" + fullMenu.breakfastFood,
    lunch: fullMenu.day + "\nAlmoço:\n" + fullMenu.lunchFood,
    dinner: fullMenu.day + "\nJantar:\n" + fullMenu.dinnerFood
  };
}
const menu = getMenu();
console.log(menu.breakfast);
I've tried to pass these values to a variable in several ways, but I'm not succeeding. I'm also open to other methods of passing these strings; I'm doing it this way because it's the simplest I could think of.
Your getMenu() is an async function, so calling it returns a Promise, not the menu object.
Change the last bit of your code to:
(async () => {
  let menu = await getMenu();
  console.log(menu.breakfast);
})();
Credit to this post.
I have no access to the package that you imported. You may try changing the last part of your code to:
const menu = await getMenu();
if (menu) {
  console.log(menu.breakfast);
}
Note that top-level await like this only works in an ES module (for example a .mjs file, or "type": "module" in package.json). In a CommonJS file that uses require(), it is a SyntaxError, so wrap it in an async function as in the other answer.
Explanation
getMenu() and await getMenu() are different things in JS. getMenu() returns a Promise object, not the menu object itself. await getMenu() pauses the surrounding async function (or module) until that Promise settles, and then gives you the resolved value.
Without await, menu is still the Promise object when console.log(menu.breakfast) runs. A Promise has no breakfast property, so you get undefined.
The if (menu) {...} check is only a guard against a falsy value; it does not make JavaScript wait, because a pending Promise is an ordinary, truthy object. The waiting is done entirely by await.
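The difference between the Promise and its resolved value can be checked with a hypothetical getMenu stand-in that skips Puppeteer and returns an object of the same shape. The sketch shows that a pending Promise is truthy, that reading .breakfast from it gives undefined, and two ways of getting the real value in a CommonJS script: .then() and an async IIFE.

```javascript
// Hypothetical stand-in for getMenu(): same return shape, no browser involved.
async function getMenu() {
  return {
    breakfast: "Day 1\nCafé da Manhã:\n(items)",
    lunch: "Day 1\nAlmoço:\n(items)",
    dinner: "Day 1\nJantar:\n(items)",
  };
}

const menu = getMenu();
console.log(Boolean(menu)); // true: a pending Promise is a truthy object
console.log(menu.breakfast); // undefined: a Promise has no breakfast property

// Option 1: consume the Promise with .then()
getMenu().then((m) => console.log(m.breakfast));

// Option 2: await inside an async function (valid in CommonJS files)
(async () => {
  const m = await getMenu();
  console.log(m.breakfast);
})().catch(console.error);
```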

Discord Bot JS interaction reply does not wait until the code finishes executing

I just started coding and I'm trying to build a discord.js bot that scrapes data from a website. My problem is that await interaction.reply(randomFact); executes immediately, while my code has not finished scraping and returned the result. I have tried async/await, but it still does not work.
// Imports the snippet relies on (discord.js v14; in v13, SlashCommandBuilder comes from "@discordjs/builders")
const { SlashCommandBuilder } = require("discord.js");
const puppeteer = require("puppeteer");
const cheerio = require("cheerio");
module.exports = {
  data: new SlashCommandBuilder()
    .setName("tips")
    .setDescription("Scrape from website"),
  async execute(interaction) {
    const healthArray = [];
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto("https://www.example.com/health-facts/");
    const pageData = await page.evaluate(() => {
      return {
        html: document.documentElement.innerHTML,
      };
    });
    const $ = cheerio.load(pageData.html);
    $(".round-number")
      .find("h3")
      .each(function (i, el) {
        // Collapse whitespace, strip the leading "1. " numbering, then trim
        const row = $(el)
          .text()
          .replace(/\s+/g, " ")
          .replace(/[0-9]+\. /g, "")
          .trim();
        healthArray.push(row);
      });
    await browser.close();
    const randomFact =
      healthArray[Math.floor(Math.random() * healthArray.length)];
    await interaction.reply(randomFact);
  },
};
(The output error was attached as a screenshot, which is not included here.)
Sorry if there is anything lacking in my post. I just joined stack overflow.
The Discord API gives you 3 seconds to reply to an interaction before it expires. To get more time, use the deferReply method (linked below): defer first, then send the result with editReply() once the scraping is done, which can happen up to 15 minutes later. The attached link is for ButtonInteraction, but SelectMenuInteraction and the other interaction types have the same method, so that shouldn't be an issue.
https://discord.js.org/#/docs/main/stable/search?query=Interaction.defer
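A minimal sketch of the defer-then-edit flow that runs outside Discord. scrapeHealthFacts and makeMockInteraction are hypothetical: the first stands in for the Puppeteer/cheerio scraping in the question, the second is a fake object exposing only the two interaction methods used here. With a real discord.js interaction, deferReply() acknowledges the command inside the 3-second window, and the answer then goes out through editReply(); calling reply() after deferring throws because the interaction is already acknowledged.

```javascript
// Hypothetical stand-in for the Puppeteer/cheerio scraping in the question.
async function scrapeHealthFacts() {
  await new Promise((resolve) => setTimeout(resolve, 50)); // simulate slow work
  return ["Fact A", "Fact B"];
}

async function execute(interaction) {
  // Acknowledge right away, well inside Discord's 3-second window
  await interaction.deferReply();
  const facts = await scrapeHealthFacts();
  const randomFact = facts[Math.floor(Math.random() * facts.length)];
  // After deferring, the answer is delivered with editReply(), not reply()
  await interaction.editReply(randomFact);
}

// Hypothetical mock that records calls, so the flow can run outside Discord
function makeMockInteraction() {
  const calls = [];
  return {
    calls,
    deferReply: async () => {
      calls.push("deferReply");
    },
    editReply: async (content) => {
      calls.push(["editReply", content]);
    },
  };
}

const mock = makeMockInteraction();
execute(mock).then(() => console.log(mock.calls));
```

In the bot, execute() keeps its place inside module.exports; only the two interaction calls change.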

Function in selenium to return the desired element once it appears

I'm trying to create a program that runs a Google search using Selenium. Based on this answer, my code so far looks like this:
const { Builder, By, Key, until } = require('selenium-webdriver');
const driver = new Builder().forBrowser("firefox").build();
(async () => {
  await driver.get(`https://www.google.com`);
  var el = await driver.findElement(By.name('q'));
  await driver.wait(until.elementIsVisible(el),1000);
  await el.sendKeys('selenium');
  var el = await driver.findElement(By.name(`btnK`));
  await driver.wait(until.elementIsVisible(el),1000);
  await el.click();
  console.log('...Task Complete!')
})();
But writing
var el = await driver.findElement(By.something('...'));
await driver.wait(until.elementIsVisible(el),1000);
await el.do_something();
every time becomes tedious, so I tried to make a function like this:
const { Builder, By, Key, until } = require('selenium-webdriver');
const driver = new Builder().forBrowser("firefox").build();
async function whenElement(by_identity,timeout=1000){
  var el = await driver.findElement(by_identity);
  await driver.wait(until.elementIsVisible(el),timeout);
  return el;
}
(async () => {
  await driver.get(`https://www.google.com`);
  await whenElement(By.name('q')).sendKeys('selenium');
  await whenElement(By.name('btnK')).click();
  console.log('...Task Complete!')
})();
but it gives this error:
UnhandledPromiseRejectionWarning: TypeError: whenElement(...).sendKeys is not a function
My aim is to reduce the number of variables and keep the code as simple as possible. What exactly am I doing wrong here?
The problem is with the promises. whenElement is an async function, so whenElement(By.name('q')) returns a Promise, and .sendKeys is looked up on that Promise rather than on the element it will resolve to. You can only call element methods on the actual element the Promise resolves to, not on the Promise itself.
You must first wait for the promise returned by whenElement to resolve; then you can use the element and wait for the promise returned by sendKeys to resolve.
const el = await whenElement(By.name('q'));
await el.sendKeys('selenium');
or
await (await whenElement(By.name('q'))).sendKeys('selenium');
or
await whenElement(By.name('q')).then(el => el.sendKeys('selenium'));
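If the goal is to chain calls directly, as in whenElement(By.name('q')).sendKeys('selenium'), the helper has to return an object that already has those methods while the lookup is still pending. selenium-webdriver's own driver.findElement() returns such an object (a WebElementPromise), which is why driver.findElement(By.name('q')).sendKeys('selenium') works without an inner await. The sketch below shows the idea in plain JavaScript; FakeElement and findVisible are hypothetical stand-ins for a real element and a real find-and-wait, so it runs without a browser.

```javascript
// Hypothetical element class standing in for a Selenium WebElement.
class FakeElement {
  constructor(name) {
    this.name = name;
    this.log = [];
  }
  async sendKeys(text) {
    this.log.push(`sendKeys:${text}`);
  }
  async click() {
    this.log.push("click");
  }
}

// Hypothetical lookup standing in for findElement + wait(until.elementIsVisible(...)).
async function findVisible(name) {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return new FakeElement(name);
}

// Returns immediately with an object whose methods chain onto the pending lookup.
// It is also a thenable, so `await whenElement(...)` still yields the element.
function whenElement(name) {
  const pending = findVisible(name);
  return {
    then: (onFulfilled, onRejected) => pending.then(onFulfilled, onRejected),
    sendKeys: (text) => pending.then((el) => el.sendKeys(text).then(() => el)),
    click: () => pending.then((el) => el.click().then(() => el)),
  };
}

(async () => {
  const box = await whenElement("q").sendKeys("selenium");
  const button = await whenElement("btnK").click();
  console.log(box.log, button.log); // [ 'sendKeys:selenium' ] [ 'click' ]
})();
```

With the real library, one of the await forms in the answer is usually enough; a wrapper like this only pays off when you want fully chainable calls.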

Crawling multiple URLs in a loop using Puppeteer

I have an array of URLs to scrape data from:
urls = ['url','url','url'...]
This is what I'm doing:
urls.map(async (url) => {
  await page.goto(url);
  await page.waitForNavigation({ waitUntil: 'networkidle' });
})
This seems to not wait for page load and visits all the URLs quite rapidly (I even tried using page.waitFor).
I wanted to know whether I am doing something fundamentally wrong, or whether this type of functionality is not advised/supported.
map, forEach, reduce, etc. do not wait for the asynchronous operation inside their callback before they proceed to the next element of the array they are iterating over. map simply starts every callback right away and returns an array of promises.
There are multiple ways of going through each item sequentially while performing an asynchronous operation, but the easiest in this case, I think, is a plain for loop inside an async function, which does wait for each operation to finish.
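const urls = [...]
// Inside an async function:
for (const url of urls) {
  // goto() already waits for the load; calling page.waitForNavigation() after it
  // would wait for a *next* navigation that never happens, and time out
  await page.goto(url, { waitUntil: 'networkidle2' });
}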
This would visit one url after another, as you are expecting. If you are curious about iterating serially using await/async, you can have a peek at this answer: https://stackoverflow.com/a/24586168/791691
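The difference between map and a for...of loop can be seen without a browser. In this sketch, visit is a hypothetical stand-in for page.goto() that only waits on a timer and records when each call starts and ends.

```javascript
// visit() is a hypothetical stand-in for page.goto(): it just waits a bit.
const log = [];
async function visit(url) {
  log.push(`start ${url}`);
  await new Promise((resolve) => setTimeout(resolve, 10));
  log.push(`end ${url}`);
}

async function withMap(urls) {
  // map() calls the async callback for every URL immediately;
  // it does not wait for one promise before starting the next.
  await Promise.all(urls.map((url) => visit(url)));
}

async function withForOf(urls) {
  // for...of inside an async function pauses at each await.
  for (const url of urls) {
    await visit(url);
  }
}

const demo = (async () => {
  await withMap(["a", "b"]);
  console.log(log); // [ 'start a', 'start b', 'end a', 'end b' ]
  log.length = 0;
  await withForOf(["a", "b"]);
  console.log(log); // [ 'start a', 'end a', 'start b', 'end b' ]
})();
```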
The accepted answer shows how to serially visit each page one at a time. However, you may want to visit multiple pages simultaneously when the task is embarrassingly parallel, that is, scraping a particular page isn't dependent on data extracted from other pages.
A tool that can help achieve this is Promise.allSettled which lets us fire off a bunch of promises at once, determine which were successful and harvest results.
For a basic example, let's say we want to scrape usernames for Stack Overflow users given a series of ids.
Serial code:
const puppeteer = require("puppeteer"); // ^19.6.3
let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const baseURL = "https://stackoverflow.com/users";
  const startId = 6243352;
  const qty = 5;
  const usernames = [];
  for (let i = startId; i < startId + qty; i++) {
    await page.goto(`${baseURL}/${i}`, {
      waitUntil: "domcontentloaded"
    });
    const sel = ".flex--item.mb12.fs-headline2.lh-xs";
    const el = await page.waitForSelector(sel);
    usernames.push(await el.evaluate(el => el.textContent.trim()));
  }
  console.log(usernames);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
Parallel code:
const puppeteer = require("puppeteer"); // ^19.6.3
let browser;
(async () => {
  browser = await puppeteer.launch();
  const baseURL = "https://stackoverflow.com/users";
  const startId = 6243352;
  const qty = 5;
  const usernames = (await Promise.allSettled(
    [...Array(qty)].map(async (_, i) => {
      // Each task gets its own page so the navigations can run concurrently
      const page = await browser.newPage();
      await page.goto(`${baseURL}/${i + startId}`, {
        waitUntil: "domcontentloaded"
      });
      const sel = ".flex--item.mb12.fs-headline2.lh-xs";
      const el = await page.waitForSelector(sel);
      const text = await el.evaluate(el => el.textContent.trim());
      await page.close();
      return text;
    })))
    // Keep only the tasks that succeeded
    .filter(e => e.status === "fulfilled")
    .map(e => e.value);
  console.log(usernames);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
Remember that this is a technique, not a silver bullet that guarantees a speed increase on all workloads. It will take some experimentation to find the optimal balance between the cost of creating more pages versus the parallelization of network requests on a given particular task and system.
The example here is contrived since it's not interacting with the page dynamically, so there's not as much room for gain as in a typical Puppeteer use case that involves network requests and blocking waits per page.
Of course, beware of rate limiting and any other restrictions imposed by sites (running the code above may anger Stack Overflow's rate limiter).
For tasks where creating a page per task is prohibitively expensive, or where you'd like to cap the number of parallel requests, consider using a task queue or combining the serial and parallel code shown above to send requests in chunks. This answer shows a generic, Puppeteer-agnostic pattern for this.
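A minimal sketch of capping how many tasks run at once, independent of Puppeteer. mapWithLimit is a hypothetical helper name: it starts `limit` workers that each claim the next index from a shared counter, so at most `limit` tasks are in flight, and results keep the input order. In a Puppeteer script each task would open a page, scrape it and close it; here a timer stands in for that work.

```javascript
// Hypothetical helper: run `task` over `items` with at most `limit` in flight.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to claim (safe: JS runs one callback at a time)

  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}

// Demo: track how many tasks are active at the same time.
let active = 0;
let peak = 0;
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

mapWithLimit([1, 2, 3, 4, 5], 2, async (n) => {
  active++;
  peak = Math.max(peak, active);
  await sleep(10);
  active--;
  return n * 10;
}).then((out) => console.log(out, "peak:", peak)); // [ 10, 20, 30, 40, 50 ] peak: 2
```

A rejected task makes the whole mapWithLimit call reject through Promise.all, while the other workers keep running; to mirror the Promise.allSettled behaviour above, catch errors inside the task and return a marker value instead.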
These patterns can be extended to handle the case when certain pages depend on data from other pages, forming a dependency graph.
See also Using async/await with a forEach loop which explains why the original attempt in this thread using map fails to wait for each promise.
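If you find that you are waiting on your promise indefinitely (page.waitForNavigation() called after page.goto() waits for a navigation that has already finished), the proposed solution is to start waiting before navigating: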
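const urls = [...]
for (let i = 0; i < urls.length; i++) {
  const url = urls[i];
  // Start listening for the navigation before triggering it.
  // Current Puppeteer versions accept 'networkidle0' or 'networkidle2', not 'networkidle'.
  const promise = page.waitForNavigation({ waitUntil: 'networkidle2' });
  await page.goto(`${url}`);
  await promise;
}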
As referenced in this GitHub issue.
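The reason starting the wait first matters can be reproduced with a plain EventEmitter. FakePage is hypothetical: its goto() emits a "navigated" event before it resolves, roughly the way page.goto() completes the navigation itself, so a listener attached afterwards never fires. A timeout race is added only so the "too late" case finishes instead of hanging.

```javascript
const { EventEmitter, once } = require("events");

// Hypothetical page whose goto() emits "navigated" before resolving.
class FakePage extends EventEmitter {
  async goto(url) {
    this.emit("navigated", url);
  }
}

const timeout = (ms) =>
  new Promise((resolve) => setTimeout(() => resolve("timed out"), ms));

// Starts listening only after goto(): the event has already fired,
// so without the timeout race this would wait forever.
async function waitTooLate(page) {
  await page.goto("https://example.com/a");
  return Promise.race([once(page, "navigated"), timeout(50)]);
}

// Creates the waiting promise first, then triggers the navigation.
async function waitFirst(page, url) {
  const waiting = once(page, "navigated");
  await page.goto(url);
  const [navigatedTo] = await waiting;
  return navigatedTo;
}

(async () => {
  console.log(await waitTooLate(new FakePage())); // timed out
  console.log(await waitFirst(new FakePage(), "https://example.com/b")); // https://example.com/b
})();
```

In Puppeteer this ordering matters mostly for navigations triggered by an action such as a click, where the documented idiom is await Promise.all([page.waitForNavigation(), page.click(selector)]). For page.goto() itself, passing waitUntil to goto() is enough.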
Best way I found to achieve this.
const puppeteer = require('puppeteer');
(async () => {
  const urls = ['https://www.google.com/', 'https://www.google.com/']
  for (let i = 0; i < urls.length; i++) {
    const url = urls[i];
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto(`${url}`, { waitUntil: 'networkidle2' });
    await browser.close();
  }
})();
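Something no one else mentions: when you fetch multiple pages with the same page object, each navigation is subject to the default navigation timeout of 30 seconds, so any page that takes longer than that to load makes page.goto() throw. You can raise that limit, or set it to 0 to disable it entirely (at the risk of a stuck page hanging your script forever):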
const browser = await puppeteer.launch();
const page = await browser.newPage();
page.setDefaultNavigationTimeout(0);

Puppeteer Async Await Loop in NodeJS

I am trying to make a script that:
Grabs all URLs from a sitemap
Takes a screenshot of each one with Puppeteer
I am currently trying to understand how to write asynchronous code, but I still have trouble finding the right coding pattern for this problem.
Here is the code I currently have :
// const spider = require('./spider');
const Promise = require('bluebird');
const puppeteer = require('puppeteer');
const SpiderConstructor = require('sitemapper');
async function crawl(url, timeout) {
  const results = await spider(url, timeout);
  await Promise.each(results, async (result, index) => {
    await screen(result, index);
  });
}
async function screen(result, index) {
  const browser = await puppeteer.launch();
  console.log('doing', index);
  const page = await browser.newPage();
  await page.goto(result);
  const path = await 'screenshots/' + index + page.title() + '.png';
  await page.screenshot({path});
  browser.close();
}
async function spider(url, timeout) {
  const spider = await new SpiderConstructor({
    url: url,
    timeout: timeout
  });
  const data = await spider.fetch();
  console.log(data.sites.length);
  return data.sites;
};
crawl('https://www.google.com/sitemap.xml', 15000)
  .catch(err => {
    console.error(err);
  });
I am having the following problems :
The length of the results array is not constant; it varies every time I launch the script. I guess this is because it is not fully resolved when I display it, but I thought the whole point of await was to guarantee that the promise is resolved on the next line.
The actual screenshotting part of the script doesn't work half the time. I am pretty sure I have unresolved promises, but I have no idea of the right pattern for looping over an async function. Right now it seems to take one screenshot after the other (linear and incremental), but I get a lot of duplicates.
Any help is appreciated. Thank you for your time
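One concrete issue in screen() can be reproduced without a browser. page.title() returns a Promise, and await binds more tightly than +, so await 'screenshots/' + index + page.title() + '.png' only awaits the first string literal and concatenates the Promise object itself. The sketch uses a hypothetical fakePage whose async title() mimics page.title().

```javascript
// Hypothetical stand-in for a Puppeteer page: title() is async, like page.title().
const fakePage = {
  title: async () => "Google",
};

async function buildPaths(index) {
  // Parsed as (await "screenshots/") + index + fakePage.title() + ".png"
  const wrong = await "screenshots/" + index + fakePage.title() + ".png";
  // Await the title itself before concatenating
  const right = "screenshots/" + index + (await fakePage.title()) + ".png";
  return { wrong, right };
}

buildPaths(0).then(({ wrong, right }) => {
  console.log(wrong); // screenshots/0[object Promise].png
  console.log(right); // screenshots/0Google.png
});
```

The same applies to browser.close() at the end of screen(): it returns a Promise, so awaiting it keeps each browser from still shutting down while the next screenshot starts.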
