show long API call progress with progress bar in NextJS - javascript

When the user clicks a button on my NextJS website, a NextJS API is called. There, a Puppeteer client is started, an external API is called, and the code loops through this response and crawls through some data.
This takes a long time, and I wanted to give the user some kind of information on how the progress is going.
For instance: I get several pages and items on each page from the external API — let's say, 3 pages with 100 items each. Then I'd show the user "processing item 1 of 300". As the items go by, this number would be updated.
The problem is that right now, I'm using res.send, and it closes the connection with a 200 status. I wanted to send back this data without closing.
Some people told me to research HTTP Streaming, but I couldn't find any practical explanation on how to do it — especially using NextJS.
Pseudocode:
// api/index.ts
import type { NextApiRequest, NextApiResponse } from 'next'

export default async function handler(
  req: NextApiRequest,
  res: NextApiResponse<Data>,
) {
  // Start crawler instance
  const { page } = await crawler.up()
  const items = await getItems()
  // Close crawler before ending
  await crawler.down(page)
  res.status(200).json(items)
}

// getItems.ts
export const getItems = async () => {
  const items = await (await fetch('external-url')).json()
  const result = []
  for (let index = 0; index < items.length; index++) {
    // instead of this console.log, I wanted to send this as a message to the website, so it could update a progress bar
    console.log(`Processing ${index + 1} of ${items.length}`)
    const processed = await processResult(items[index]) // this will take a while
    result.push(processed)
  }
  return result
}
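One way to do what the comment above asks for, as a minimal sketch rather than a definitive implementation: keep the response open and write one newline-delimited JSON message per processed item with res.write(), then read the stream on the client with fetch and a ReadableStream reader. This assumes getItems is changed to accept a hypothetical onProgress callback and that nothing between the server and the browser buffers the response; Server-Sent Events would work similarly with a text/event-stream content type.
// api/index.ts (sketch)
import type { NextApiRequest, NextApiResponse } from 'next'

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  // Send headers immediately and keep the connection open
  res.writeHead(200, { 'Content-Type': 'application/x-ndjson' })

  const { page } = await crawler.up()
  const items = await getItems({
    // hypothetical callback: getItems reports progress as it processes items
    onProgress: (current, total) =>
      res.write(JSON.stringify({ type: 'progress', current, total }) + '\n'),
  })
  await crawler.down(page)

  res.write(JSON.stringify({ type: 'done', items }) + '\n')
  res.end()
}

// Client side, e.g. inside the button's click handler
async function startCrawl(setProgress) {
  const response = await fetch('/api')
  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''
  while (true) {
    const { value, done } = await reader.read()
    if (done) break
    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() // keep a possibly partial last line for the next chunk
    for (const line of lines.filter(Boolean)) {
      const message = JSON.parse(line)
      if (message.type === 'progress') {
        setProgress(message.current, message.total) // update the progress bar
      }
    }
  }
}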

Related

Puppeteer Page.$$eval() method returning empty arrays

I'm building a web scraping application with puppeteer. I'm trying to get an array of links to scrape from but it returns an empty array.
const scraperObject = {
  url: 'https://www.walgreens.com/',
  async scraper(browser){
    let page = await browser.newPage();
    console.log(`Navigating to ${this.url}...`);
    await page.goto(this.url);
    // Wait for the required DOM to be rendered
    await page.waitForSelector('.CA__Featured-categories__full-width');
    // Get the link to all the required links in the featured categories
    let urls = await page.$$eval('.list__contain > ul#at-hp-rp-featured-ul > li', links => {
      // Extract the links from the data
      links = links.map(el => el.querySelector('li > a').href)
      return links;
    });
Whenever I ran this, the console would give me the needed array of links (example below).
Navigating to https://www.walgreens.com/...
[
'https://www.walgreens.com/seasonal/holiday-gift-shop?ban=dl_dl_FeatCategory_HolidayShop_TEST'
'https://www.walgreens.com/store/c/cough-cold-and-flu/ID=20002873-tier1?ban=dl_dl_FeatCategory_CoughColdFlu_TEST'
'https://www.walgreens.com/store/c/contact-lenses/ID=359432-tier2clense?ban=dl_dl_FeatCategory_ContactLenses_TEST'
]
So, from here I had to navigate to one of those urls through the code block below and rerun the same code to go through an array of categories, to eventually navigate to the product listings page.
//Navigate to Household Essentials
let hEssentials = await browser.newPage();
await hEssentials.goto(urls[11]);
// Wait for the required DOM to be rendered
await page.waitForSelector('.content');
// Get the link to all the required links in the featured categories
let shopByNeedUrls = await page.$$eval('div.wag-row > div.wag-col-3 wag-col-md-6 wag-col-sm-6 CA__MT30', links1 => {
  // Extract the links from the data
  links1 = links1.map(el => el.querySelector('div > a').href)
  return links1;
});
console.log(shopByNeedUrls);
}
However, whenever I run this code through the console, I receive the same navigating message, but then it returns an empty array (as shown in the example below).
Navigating to https://www.walgreens.com/...
[]
If anyone is able to explain why I'm outputting an empty array, that'd be great. Thank you.
I've attempted to change the parameter of the page.waitForSelector method and the page.$$eval method. However, none of them appeared to work, and they output the same result. In fact, I sometimes receive a timeout error for the page.waitForSelector method.
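There is no accepted answer here, but a hedged guess based only on the snippet: the second block opens a new tab, hEssentials, yet keeps calling waitForSelector and $$eval on the original page (which is still on the homepage), and the tokens in the second selector are missing their leading dots, so they are parsed as tag names rather than classes. A sketch of that adjustment, assuming those tokens really are classes on one element:
// Navigate to Household Essentials and query the new tab, not the old one
let hEssentials = await browser.newPage();
await hEssentials.goto(urls[11]);
await hEssentials.waitForSelector('.content');
let shopByNeedUrls = await hEssentials.$$eval(
  'div.wag-row > div.wag-col-3.wag-col-md-6.wag-col-sm-6.CA__MT30',
  links1 => links1.map(el => el.querySelector('div > a').href)
);
console.log(shopByNeedUrls);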

How do I save a webpage state to mongodb without spamming http requests?

I have a to-do list that works like Trello, with lists and draggable cards. I have the app functioning with both saving the state of the cards to MongoDB and localStorage using JSON. What I don't understand is how I am supposed to be constantly updating MongoDB every time I move or edit a card. It seems there must be a different way. Just for reference, this is the function I am using to update the DB entry. It saves and renders, but it feels wrong to be doing this with each state change.
async function mainUpdate(data) {
  const uri = 'mongodb+srv://xxxxxxxxxxxxxxxxxx.mongodb.net/test';
  const client = new MongoClient(uri);
  try {
    await client.connect();
    await updateListingByName(client, 'user-jeff', { cachedState: data });
  } finally {
    await client.close();
  }
}
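One common way to avoid a write per drag event, sketched here with illustrative names and an arbitrary 2-second delay (not from the original question): debounce the save so only the last state within a quiet period is sent to MongoDB.
let saveTimer = null;

function scheduleSave(data) {
  // Restart the timer on every card move/edit; only the final state
  // after things settle actually triggers mainUpdate().
  clearTimeout(saveTimer);
  saveTimer = setTimeout(() => mainUpdate(data), 2000);
}
Reusing a single connected MongoClient (or an API route that holds one) instead of connecting and closing on every save would also cut the per-update cost.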

Data from Firestore is being fetched multiple times on login and logout (vanilla JS)

Well, I made this Library app where a user can log in and add books. When a user logs in, the app fetches data from a Firestore collection. The problem appears when the user logs in once, logs out, and then logs in again without refreshing the app: if the user does this twice, it fetches twice; if thrice, it fetches thrice. The code that executes multiple times is fetchBooks(); signInWithGoogle() only executes once. Here's the code involved:
function signInWithGoogle(){
  const provider = new firebase.auth.GoogleAuthProvider()
  auth.signInWithPopup(provider)
    .then(result => {
      // Create the new user document in firestore
      createNewUserDocument(result.user)
      // fetch feed data
      auth.onAuthStateChanged(user => {
        user ? fetchBooks() : null
      })
    }).catch(err => {
      console.log(err)
    })
  signUpForm.reset()
  signUpModal.hide()
  signInForm.reset()
  signInModal.hide()
}

function fetchBooks() {
  const docRef = db.collection('users').doc(auth.currentUser.uid).collection('books')
  docRef.get().then(querySnapshot => {
    console.log(querySnapshot)
    querySnapshot.forEach(doc => {
      const data = doc.data()
      console.log(doc.data());
      addCardToHTML(data.title, data.author, data.pages, data.description, data.read)
    })
  })
}
onAuthStateChanged is a subscription that triggers itself when there's a change in the user's authentication state.
So it will trigger when you log in, when you log out, etc.
So ideally you'd want to wait until the user logs in, and then call the fetchBooks() function, but if you keep doing it inside of the subscriber the function will trigger any time the subscriber emits a new value.
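To make that concrete, a sketch (not part of the original answer): register onAuthStateChanged once at startup rather than adding a new listener inside every sign-in, and let that single listener react to each auth change.
// Registered once when the script loads, not inside signInWithGoogle()
auth.onAuthStateChanged(user => {
  if (user) {
    fetchBooks()
  } else {
    clearBookCards() // defined further down: removes the rendered cards on sign-out
  }
})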
I would recommend starting with a restructure of your code to have functions that do individual things. Right now, you have a function signInWithGoogle. That function should only sign the user in with Google and return a promise with the result of that sign in. Instead, you have it signing in the user, fetching books (which itself is also fetching books AND modifying the DOM), and calling methods on your signUp elements.
Restructuring this to have some other top-level function would likely help you handle your problem more easily. Specifically, try something like this:
function handleSignIn() {
  signInWithGoogle()
    .then(fetchBooks)
    .then(books => {
      books.forEach(book => addCardToHTML(...))
    })
}
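For that chain to work, fetchBooks would have to return the data instead of writing to the DOM itself; a rough sketch of that version (an assumption about the refactor, not code from the answer):
function fetchBooks() {
  const docRef = db.collection('users').doc(auth.currentUser.uid).collection('books')
  // Return the promise so handleSignIn can chain on the resolved book data
  return docRef.get().then(querySnapshot => querySnapshot.docs.map(doc => doc.data()))
}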
This is a good start because now it's clear what each individual function is doing. So now to handle your specific issue, I'll assume that the problem you're facing is that you're seeing the books be added multiple times. In that case, I would think what you'd want to happen is that:
When a user is signed in, you want to load their books and display them on the page.
When they log out, you want the books to be unloaded from the screen.
When they log back in, the books are re-loaded and displayed.
If all of those assumptions are correct, then your problem wouldn't be with the code you have, but rather the signout functionality. When the user signs out, you need to add a function that will remove the books from the HTML. That way, when they sign back in after signing out, the handleSignIn function will kick off again and the addCardToHTML function will be running on a blank HTML page rather than a page that already has the cards.
Example:
function handleSignOut() {
  signOut()
    .then(clearBookCards)
}

function clearBookCards() {
  // Manipulate DOM to remove all of the card HTML nodes
}

Synchronize critical section in API for each user in JavaScript

I wanted to swap the profile picture of a user. For this, I have to check the database to see if a picture has already been saved; if so, it should be deleted. Then the new one should be saved and entered into the database.
Here is a simplified (pseudo) code of that:
async function changePic(user, file) {
  // remove old pic
  if (await database.hasPic(user)) {
    let oldPath = await database.getPicOfUser(user);
    filesystem.remove(oldPath);
  }
  // save new pic
  let path = "some/new/generated/path.png";
  file = await Image.modify(file);
  await Promise.all([
    filesystem.save(path, file),
    database.saveThatUserHasNewPic(user, path)
  ]);
  return "I'm done!";
}
I ran into the following problem with it:
If the user calls the API twice in a short time, serious errors occur. The database queries and the functions in between are asynchronous, so the changes from the first API call haven't been applied yet when the second call checks for a profile pic to delete. I'm left with a filesystem.remove request for a file that no longer exists and an image that is never removed from the filesystem.
I would like to safely handle that situation by synchronizing this critical section of code. I don't want to reject requests only because the server hasn't finished the previous one and I also want to synchronize it for each user, so users aren't bothered by the actions of other users.
Is there a clean way to achieve this in JavaScript? Some sort of monitor like you know it from Java would be nice.
You could use a library like p-limit to control your concurrency. Use a map to track the active/pending requests for each user. Use their ID (which I assume exists) as the key and the limit instance as the value:
const pLimit = require('p-limit');

const limits = new Map();

function changePic(user, file) {
  async function impl(user, file) {
    // your implementation from above
  }
  const { id } = user // or similar to distinguish them
  if (!limits.has(id)) {
    limits.set(id, pLimit(1)); // only one active request per user
  }
  const limit = limits.get(id);
  return limit(impl, user, file); // schedule impl for execution
}
// TODO clean up limits to prevent memory leak?
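One possible way to address that TODO, sketched under the assumption that p-limit's activeCount and pendingCount counters have already been decremented by the time the returned promise settles (worth verifying against the p-limit version in use):
// inside changePic, instead of returning limit(impl, user, file) directly:
const result = limit(impl, user, file);
return result.finally(() => {
  // if nothing else is running or queued for this user, drop the limiter
  if (limit.activeCount === 0 && limit.pendingCount === 0) {
    limits.delete(id);
  }
});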

Handling large data sets on client side

I'm trying to build an application that uses Server-Sent Events in order to fetch and show some tweets (the latest 50-100) in the UI.
Url for SSE:
https://tweet-service.herokuapp.com/stream
Problem(s):
My UI is becoming unresponsive because a huge amount of data is coming in!
How do I make sure my UI stays responsive? What strategies should I adopt to handle the data?
Current Setup: (For better clarity on what I'm trying to achieve)
Currently I have a max-heap with a custom comparator to show the latest 50 tweets.
Every time there's a change, I re-render the page with the new max-heap data.
We should not keep the EventSource open indefinitely, since messages arriving faster than the page can re-render will swamp the main thread and make the UI unresponsive. Instead, we should only keep the event source open for as long as it takes to get 50-100 tweets. For example:
function getLatestTweets(limit) {
  return new Promise((resolve, reject) => {
    let items = [];
    let source = new EventSource('https://tweet-service.herokuapp.com/stream');
    source.onmessage = ({data}) => {
      if (limit-- > 0) {
        items.push(JSON.parse(data));
      } else {
        // resolve this promise once we have reached the specified limit
        resolve(items);
        source.close();
      }
    }
  });
}
getLatestTweets(100).then(e => console.log(e))
You can then compare these tweets to previously fetched tweets to figure out which ones are new, and then update the UI accordingly. You can use setInterval to call this function periodically to fetch the latest tweets.
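A rough sketch of that polling loop; the id field, the renderTweets helper, and the 30-second interval are illustrative assumptions rather than part of the answer:
const knownIds = new Set();

async function refreshTweets() {
  const tweets = await getLatestTweets(100);
  // keep only tweets that haven't been rendered yet
  const fresh = tweets.filter(tweet => !knownIds.has(tweet.id));
  fresh.forEach(tweet => knownIds.add(tweet.id));
  renderTweets(fresh); // hypothetical: appends the new tweets to the UI
}

refreshTweets();
setInterval(refreshTweets, 30000); // poll every 30 seconds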
