I have a function that retrieves the data from a document correctly. However, one image already has the URL as its field; the other image only has a Firebase storage reference. Before I proceed to another function, I need to wait until the download URL has been fetched. I've attempted it below without much luck, and I'm not entirely sure I've put the async in the right place either.
getPhoto(user_id: string) {
  this._subscription = this._activatedRoute.params.pipe(
    switchMap(params => {
      return this.service.getPhoto(params);
    })
  ).subscribe(async (result) => {
    const imageOne = result.imageOne;
    // Need to await the download URL here
    const imageTwo = this.blah(result.imageTwoRef)
    this.otherFunction(imageOne, imageTwo)
  });
}

blah(reference) {
  var storage = firebase.storage();
  var imageTwo = reference;
  var imagePathRef = storage.ref().child(imageTwo);
  imagePathRef.getDownloadURL().then((url) => {
    return url;
  });
}
The async keyword only works on functions, and it makes the function return a promise, so your usage is correct in that respect.
You can use await only inside an async function, and only on a promise; it pauses execution of that function until the promise resolves.
I think you are almost done. Try it like this and let me know:
getPhoto(user_id: string) {
  this._subscription = this._activatedRoute.params.pipe(
    switchMap(params => {
      return this.service.getPhoto(params);
    })
  ).subscribe(async (result) => {
    const imageOne = result.imageOne;
    // Need to await the download URL here
    const imageTwo = await this.blah(result.imageTwoRef);
    this.otherFunction(imageOne, imageTwo);
  });
}

async blah(reference) {
  var storage = firebase.storage();
  var imageTwo = reference;
  var imagePathRef = storage.ref().child(imageTwo);
  const url = await imagePathRef.getDownloadURL();
  return url;
}
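As a side note, since getDownloadURL() already returns a promise, blah doesn't strictly need async/await at all; a shorter, equivalent sketch using the same Firebase calls:

blah(reference) {
  // getDownloadURL() already returns a promise, so it can be returned directly
  return firebase.storage().ref().child(reference).getDownloadURL();
}

Either way, the key point is that blah returns the promise so the await in the subscribe callback has something to wait on.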
I've created a script using axios and cheerio to fetch different shop names and their associated links from yellowpages, and then used those links to scrape the phone number and email from their inner pages. The script works fine.
What I wish to do now is use the next-page link to keep grabbing content from subsequent pages as well. I just can't figure out how to apply the logic of parsing and following next pages within the getLinks() function.
At the moment this is what I'm trying with:
const axios = require('axios');
const cheerio = require('cheerio');

const startUrl = 'https://www.yellowpages.com/search?search_terms=Pizza&geo_location_terms=San+Francisco%2C+CA';
const host = 'https://www.yellowpages.com';

const getLinks = async (url, host, callback) => {
  const { data } = await axios.get(url);
  const $ = cheerio.load(data);
  $('[class="result"] a.business-name').each(function () {
    let items = $(this).find('span').text();
    let links = host + $(this).attr("href");
    return callback(items, links);
  });
}

const fetchContent = async (shopName, shopLink, callback) => {
  const { data } = await axios.get(shopLink);
  const $ = cheerio.load(data);
  let phone = $('.contact > p.phone').eq(0).text();
  let email = $('.business-card-footer > a.email-business').eq(0).attr("href");
  return callback(shopName, shopLink, phone, email);
}

async function scrapeData() {
  getLinks(startUrl, host, function (itemName, link) {
    fetchContent(itemName, link, function (shopName, shopLink, phone, email) {
      console.log({ shopName, shopLink, phone, email });
    });
  });
}

scrapeData();
Next page links are often in [rel=next] so it's usually something like this:
async function get(url) {
  const { data } = await axios.get(url);
  const $ = cheerio.load(data);
  return $;
}

async function run() {
  let url = 'https://www.yellowpages.com/search?search_terms=Pizza&geo_location_terms=San+Francisco%2C+CA';
  let $ = await get(url);

  // doSomething($)
  let href = $('[rel=next]').attr('href');

  while (href) {
    url = new URL(href, url).href;
    $ = await get(url);
    // doSomething($)
    href = $('[rel=next]').attr('href');
  }
}
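For example, doSomething($) could be replaced with the selector logic you already have in getLinks; a rough sketch, assuming every results page uses the same markup as the first one:

async function run() {
  let url = startUrl;
  let $ = await get(url);

  while (true) {
    // same extraction as in getLinks(), inlined here for each page
    $('[class="result"] a.business-name').each(function () {
      const name = $(this).find('span').text();
      const link = host + $(this).attr("href");
      fetchContent(name, link, (shopName, shopLink, phone, email) => {
        console.log({ shopName, shopLink, phone, email });
      });
    });

    const href = $('[rel=next]').attr('href');
    if (!href) break;
    url = new URL(href, url).href;
    $ = await get(url);
  }
}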
I am scraping multiple pages with cheerio and axios in Node.js.
I am having a hard time with promises. Can someone help me return the JSON once I hit the last page? Thanks!
const getWebsiteContent = async (url) => {
  await axios.get(url).then(res => {
    const $ = cheerio.load(res.data)
    pageNum = getTotalpages($); // Get the pagination
    console.log(url);
    // Some scraping here
  })

  indexPage++; // Increment to the next page
  const nextPageLink = baseUrl + '&page=' + indexPage; // get next page

  if (indexPage > pageNum) {
    var editedText = text.slice(0, text.length - 1);
    editedText += ']}';
    editedText = JSON.parse(editedText); // I want to return this and use elsewhere
    return editedText;
  }

  setTimeout(async () => {
    getWebsiteContent(nextPageLink); // Call itself
  }, 1000);
}

var myJSON = await getWebsiteContent(baseUrl); // something like this
I would write getPages as an async generator -
async function* getPages (href, initPage = 0) {
  const res = await axios.get(setPage(href, initPage))
  const $ = cheerio.load(res.data)
  const pages = getTotalpages($)
  yield { page: initPage, dom: $ }
  for (let p = initPage + 1; p < pages; p++) { // start after the page we already fetched
    await sleep(1000)
    const r = await axios.get(setPage(href, p))
    yield { page: p, dom: cheerio.load(r.data) }
  }
}
This depends on a helper, setPage, that manipulates the href's page parameter using the url module, which is much safer than cobbling strings together by hand -
function setPage (href, page) {
  const u = new URL(href)
  u.searchParams.set("page", page)
  return u.toString()
}
And another helper, sleep, which wraps setTimeout in a promise so we don't have to mix callback-based timers with async code. This allows us to easily pause between pages -
async function sleep (ms) {
  return new Promise(r => setTimeout(r, ms))
}
Finally we write scrape, a simple wrapper around getPages. This allows us to reuse the getPages function to scrape various elements as needed. A benefit of this approach is that the caller decides what happens with each page. Below we push to a result array, but as another example we could write each page to disk using the fs module. Obviously this is for you to decide -
async function scrape (href) {
  const result = []
  for await (const {page, dom} of getPages(href)) {
    console.log("scraped page", page) // some status message
    result.push(getSomeData(dom))     // get something from each page
  }
  return result
}

scrape(myUrl).then(console.log, console.error)
You shouldn't be mixing .then with your async/await code.
Pagination should look something like this:
let response = await axios.get(url)
let $ = cheerio.load(response.data)
// do some scraping
while (url = $('[rel=next]').attr('href')) {
  response = await axios.get(url)
  $ = cheerio.load(response.data)
  // do more scraping
}
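If you also need to return everything once the last page is reached (as asked), the loop can collect results into an array and return it. A minimal sketch, again assuming the site exposes a [rel=next] link:

async function scrapeAll(startUrl) {
  const results = [];
  let url = startUrl;

  while (url) {
    const response = await axios.get(url);
    const $ = cheerio.load(response.data);
    // do some scraping here and push what you extract into results
    results.push({ url });

    const href = $('[rel=next]').attr('href');
    url = href ? new URL(href, url).href : null;
  }

  return results; // the caller can await this and serialize it as JSON
}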
I'm fetching the stylesheet and replacing all CSS variables with the actual hex values they correspond to, as the user changes the colors.
I created an event handler so that when the user clicks the download button, all of the colors they selected would be saved in the stylesheet at that moment, but it doesn't seem to work. I know it's an issue with my understanding of promises as a whole and async/await.
What I did:
const fetchStyleSheet = async () => {
  const res = await fetch("./themes/prism.css");
  const orig_css = await res.text();
  let updated_css = orig_css;
  const regexp = /(?:var\(--)[a-zA-z\-]*(?:\))/g;
  let cssVars = orig_css.matchAll(regexp);
  cssVars = Array.from(cssVars).flat();
  console.log(cssVars)
  for await (const variable of cssVars) {
    const trimmedVar = variable.slice(6, -1)
    const styles = getComputedStyle(document.documentElement)
    const value = String(styles.getPropertyValue(`--${trimmedVar}`)).trim()
    updated_css = updated_css.replace(variable, value);
  }
  console.log(updated_css)
  return updated_css
}

const main = async () => {
  const downloadBtn = document.getElementById('download-btn')
  downloadBtn.addEventListener('click', () => {
    const updated_css = fetchStyleSheet()
    downloadBtn.setAttribute('href', 'data:application/octet-stream;charset=utf-8,' + encodeURIComponent(updated_css))
    downloadBtn.setAttribute('download', 'prism-theme.css')
  })
}

main()
I can't await updated_css there because the call happens inside the click event callback, which is a separate function.
Then I did the following, thinking it would work since the await now sits at the top level of the handler.
const downloadBtn = document.getElementById('download-btn')
downloadBtn.addEventListener('click', async () => {
  const updated_css = await fetchStyleSheet()
  downloadBtn.setAttribute('href', 'data:application/octet-stream;charset=utf-8,' + encodeURIComponent(updated_css))
  downloadBtn.setAttribute('download', 'prism-theme.css')
})
That gave me the following error: TypeError: NetworkError when attempting to fetch resource.
I understand that calling fetchStyleSheet() only returns a promise object at first and to get the value (which is updated_css), I need to follow it with .then() or await it.
Using await is the correct way to deal with the fetchStyleSheet() call returning a promise; your problem is that the click on the link tries to follow the href attribute immediately, before you set it to that data URL. What you need to do instead is prevent the default action, do your asynchronous work, and then re-trigger the click when you're done. Also don't forget to deal with possible exceptions:
const downloadBtn = document.getElementById('download-btn')
downloadBtn.addEventListener('click', async (event) => {
  if (!event.isTrusted) return // do nothing on the second, programmatic run
  try {
    event.preventDefault()
    const updated_css = await fetchStyleSheet()
    downloadBtn.setAttribute('href', 'data:application/octet-stream;charset=utf-8,' + encodeURIComponent(updated_css))
    downloadBtn.setAttribute('download', 'prism-theme.css')
    downloadBtn.click() // simulate a new click
  } catch (err) {
    console.error(err) // or alert it, or put the message on the page
  }
})
I made a small webpage that takes information from the "Yahoo Weather" API and displays it in divs on the page.
This is the JS:
const url = 'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%20in%20(select%20woeid%20from%20geo.places(1)%20where%20text%3D%22nome%2C%20ak%22)&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys';

let data = 1;

const getWeather = async function() {
  const fetchWeather = await fetch(url);
  const result = await fetchWeather.json();
  return data = result;
}

getWeather();

const showData = async function(info) {
  let html = '';
  const newInfo = info.query.results.channel.item.forecast.map((item, index) => {
    html += `<div id='item${index}'><p>Date: ${item.date}</p>`;
    html += `<p>Day: ${item.day}</p>`;
    html += `<p>Highest temp: ${item.high}</p>`;
    html += `<p>Lowest temp: ${item.low}</p></div>`;
    return html;
  });
}

const display = async function() {
  const info = await showData(data);
  weatherInfo.innerHTML = data;
}

display();
My goal is that when the page loads, it displays the information gathered from the promise returned by the API.
I get this error: Uncaught (in promise) TypeError: Cannot read property 'results' of undefined
Basically as far as I understand, by the time the "display()" is invoked, the variable "data" doesn't have anything in it yet.
What I'm trying to achieve is that display() will only work after "data" is defined, but without a for loop or something like that.
Any help will be appreciated!
As far as I understand, by the time the "display()" is invoked, the variable "data" doesn't have anything in it yet.
Yes. Don't use a global data variable at all.¹ getWeather returns a promise that will fulfill with the data, so you know exactly when it becomes available:
getWeather().then(display);
with
async function getWeather() {
  const fetchWeather = await fetch(url);
  const result = await fetchWeather.json();
  return result; // drop the `data =` assignment
}

async function display(data) {
//                     ^^^^
  const info = await showData(data);
  weatherInfo.innerHTML = info;
}
Alternatively, especially when you don't want to use a then chain, just put const data = await getWeather(); in the display function.
¹ If you insist on storing the data in the global scope because you want to use it in multiple places, put the promise for the data in the variable, not the result itself.
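A minimal sketch of that footnote's suggestion, storing the promise rather than the result:

const weatherPromise = getWeather(); // a promise, usable from multiple places

async function display() {
  const data = await weatherPromise; // resolves to the fetched result
  const info = await showData(data);
  weatherInfo.innerHTML = info;
}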
Here is my solution:
const url = 'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%20in%20(select%20woeid%20from%20geo.places(1)%20where%20text%3D%22nome%2C%20ak%22)&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys';

const getWeather = async function() {
  const fetchWeather = await fetch(url);
  return await fetchWeather.json();
}

const showData = async function(info) {
  let html = '';
  info.query.results.channel.item.forecast.map((item, index) => {
    html += `<div id='item${index}'><p>Date: ${item.date}</p>`;
    html += `<p>Day: ${item.day}</p>`;
    html += `<p>Highest temp: ${item.high}</p>`;
    html += `<p>Lowest temp: ${item.low}</p></div>`;
  });
  return html;
}

const display = async function() {
  const data = await getWeather();
  const info = await showData(data);
  weatherInfo.innerHTML = info;
}

display();
https://plnkr.co/edit/1b0fpBji7y6sZPODPDjY?p=preview
In https://stackoverflow.com/a/18658613/779159 is an example of how to calculate the md5 of a file using the built-in crypto library and streams.
var fs = require('fs');
var crypto = require('crypto');

// the file you want to get the hash
var fd = fs.createReadStream('/some/file/name.txt');
var hash = crypto.createHash('sha1');
hash.setEncoding('hex');

fd.on('end', function() {
  hash.end();
  console.log(hash.read()); // the desired sha1sum
});

// read all file and pipe it (write it) to the hash object
fd.pipe(hash);
But is it possible to convert this to using ES8 async/await instead of using the callback as seen above, but while still keeping the efficiency of using streams?
The await keyword only works on promises, not on streams. There are ideas for an extra stream-like data type that would get its own syntax, but those are highly experimental, if they go anywhere at all, and I won't go into details.
Anyway, your callback is only waiting for the end of the stream, which is a perfect fit for a promise. You'd just have to wrap the stream:
var fd = fs.createReadStream('/some/file/name.txt');
var hash = crypto.createHash('sha1');
hash.setEncoding('hex');

// read all file and pipe it (write it) to the hash object
fd.pipe(hash);

var end = new Promise(function(resolve, reject) {
  hash.on('end', () => resolve(hash.read()));
  fd.on('error', reject); // or something like that. might need to close `hash`
});
There also exists a helper function to do just that in more recent versions of nodejs - pipeline from the stream/promises module:
import { pipeline } from 'node:stream/promises';

const fd = fs.createReadStream('/some/file/name.txt');
const hash = crypto.createHash('sha1');
hash.setEncoding('hex');

// read all file and pipe it (write it) to the hash object;
// pipeline() itself resolves with undefined, so resolve with the digest explicitly
const end = pipeline(fd, hash).then(() => hash.read());
Now you can await that promise:
(async function() {
  let sha1sum = await end;
  console.log(sha1sum);
}());
If you are using node version >= v10.0.0 then you can use stream.pipeline and util.promisify.
const fs = require('fs');
const crypto = require('crypto');
const util = require('util');
const stream = require('stream');

const pipeline = util.promisify(stream.pipeline);

const hash = crypto.createHash('sha1');
hash.setEncoding('hex');

async function run() {
  await pipeline(
    fs.createReadStream('/some/file/name.txt'),
    hash
  );
  console.log('Pipeline succeeded');
  // the hex digest is now available via hash.read()
}

run().catch(console.error);
Node v15 now has a promisified pipeline in stream/promises.
This is the cleanest and most official way.
const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream/promises');

async function run() {
  await pipeline(
    fs.createReadStream('archive.tar'),
    zlib.createGzip(),
    fs.createWriteStream('archive.tar.gz')
  );
  console.log('Pipeline succeeded.');
}

run().catch(console.error);
We should all appreciate how much work is done here:
It captures errors in all the streams.
It destroys unfinished streams when an error is raised.
It only returns when the last writable stream is finished.
This pipe mechanism is one of the most powerful features Node.js has. Making it fully async was not easy; now we have it.
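Applied back to the hashing question, a minimal sketch with the same promise-based pipeline (assuming Node 15+ for stream/promises) might look like this:

const fs = require('fs');
const crypto = require('crypto');
const { pipeline } = require('stream/promises');

async function sha1sum(path) {
  const hash = crypto.createHash('sha1');
  hash.setEncoding('hex');
  await pipeline(fs.createReadStream(path), hash);
  return hash.read(); // the hex digest, available once the pipeline has finished
}

sha1sum('/some/file/name.txt').then(console.log, console.error);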
Something like this works:
for (var res of fetchResponses) { // node-fetch package responses
  const dest = fs.createWriteStream(filePath, { flags: 'a' });
  totalBytes += Number(res.headers.get('content-length'));
  await new Promise((resolve, reject) => {
    res.body.pipe(dest);
    res.body.on("error", (err) => {
      reject(err);
    });
    dest.on("finish", function() {
      resolve();
    });
  });
}
2021 Update:
New example from Node documentation:
async function print(readable) {
  readable.setEncoding('utf8');
  let data = '';
  for await (const chunk of readable) {
    data += chunk;
  }
  console.log(data);
}
see https://nodejs.org/api/stream.html#stream_readable_symbol_asynciterator
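For the hashing use case, the same async-iteration idea might look like this sketch, feeding each chunk into hash.update and reading the digest at the end:

const fs = require('fs');
const crypto = require('crypto');

async function sha1OfFile(path) {
  const hash = crypto.createHash('sha1');
  // readable streams are async iterable, so each chunk can be awaited in turn
  for await (const chunk of fs.createReadStream(path)) {
    hash.update(chunk);
  }
  return hash.digest('hex');
}

sha1OfFile('/some/file/name.txt').then(console.log, console.error);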
I would comment, but don't have enough reputation.
A WORD OF CAUTION:
If you have an application that is passing streams around AND doing async/await, be VERY CAREFUL to connect ALL pipes before you await. You can end up with streams not containing what you thought they did. Here's a minimal example:
const { PassThrough } = require('stream');

async function main() {
  const initialStream = new PassThrough();

  const otherStream = new PassThrough();
  const data = [];
  otherStream.on('data', dat => data.push(dat));
  const resultOtherStreamPromise = new Promise(resolve => otherStream.on('end', () => { resolve(Buffer.concat(data)) }));

  const yetAnotherStream = new PassThrough();
  const data2 = [];
  yetAnotherStream.on('data', dat => data2.push(dat));
  const resultYetAnotherStreamPromise = new Promise(resolve => yetAnotherStream.on('end', () => { resolve(Buffer.concat(data2)) }));

  initialStream.pipe(otherStream);
  initialStream.write('some ');

  await Promise.resolve(); // Completely unrelated await

  initialStream.pipe(yetAnotherStream);
  initialStream.end('data');

  const [resultOtherStream, resultYetAnotherStream] = await Promise.all([
    resultOtherStreamPromise,
    resultYetAnotherStreamPromise,
  ]);

  console.log('other stream:', resultOtherStream.toString()); // other stream: some data
  console.log('yet another stream:', resultYetAnotherStream.toString()); // yet another stream: data
}

main();
I believe this will be helpful for someone:
const fs = require('fs');

async function readFile(filename) {
  let records = [];
  return new Promise((resolve, reject) => {
    fs.createReadStream(filename)
      .on("data", (data) => {
        records.push(data);
      })
      .on("error", reject)
      .on("end", () => {
        resolve(records);
      });
  });
}
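Usage would then look something like this (the filename is just an example):

(async () => {
  const records = await readFile('./some-file.txt'); // array of chunks, in order
  console.log(records.length);
})();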