I am using the following code to download something large in the browser (triggered by a user gesture) using fetch, with a progress indication:
const url = 'https://source.unsplash.com/random';
const response = await fetch(url);
const total = Number(response.headers.get('content-length'));
let loaded = 0;
const reader = response.body.getReader();
let result;
while (!(result = await reader.read()).done) {
  loaded += result.value.length;
  // Display loaded/total in the UI
}
I saw a snippet in a related question which led me to believe this could be simplified into:
const url = 'https://source.unsplash.com/random';
const response = await fetch(url);
const total = Number(response.headers.get('content-length'));
let loaded = 0;
for await (const result of response.body.getReader()) {
  loaded += result.value.length;
  // Display loaded/total in the UI
}
getReader returns a ReadableStreamDefaultReader, which comes from the Streams API. That API is both a web API and a Node API, which makes finding web-only information really hard.
The snippet above fails with response.body.getReader(...) is not a function or its return value is not async iterable. I checked the object's prototype, and indeed it doesn't seem to have Symbol.asyncIterator on it, so no wonder the browser failed to iterate over it.
So the code in that question must have been wrong, but now I wonder: is there a way to take a stream like this and iterate over it using for-await? I suppose you need to wrap it in an async generator and yield the chunks in a similar way to the first snippet, right?
Is there an existing or planned API which makes this more streamlined, something closer to the second snippet, but actually working?
The ReadableStream itself implements the async iterable protocol, so you can iterate over it directly:
for await (const result of response.body) {
  loaded += result.length;
  // Display loaded/total in the UI
}
The reader's read() method returns {value: ..., done: boolean}, which is exactly the shape an async iterator's next() result needs, so in browsers where ReadableStream doesn't yet implement Symbol.asyncIterator you can easily polyfill it:
if (!response.body[Symbol.asyncIterator]) {
  response.body[Symbol.asyncIterator] = () => {
    const reader = response.body.getReader();
    return {
      next: () => reader.read(),
    };
  };
}

for await (const result of response.body) {
  loaded += result.length;
  console.log(((loaded / total) * 100).toFixed(2), '%');
}
See https://jsfiddle.net/6ostwkr2/
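If you would rather not patch response.body, you can also do what the question guesses at and wrap the reader in an async generator that yields the chunks. A minimal sketch (the streamChunks helper name is mine, not part of any API):
async function* streamChunks(stream) {
  const reader = stream.getReader();
  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) return;
      yield value;
    }
  } finally {
    // Release the lock so the stream can still be cancelled or reused elsewhere
    reader.releaseLock();
  }
}

for await (const chunk of streamChunks(response.body)) {
  loaded += chunk.length;
  // Display loaded/total in the UI
}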
Related
I want to take two different fetches and store them in variables so their important data can be used for something other than just logging it.
I'm trying to do this by overriding window.Response.prototype.json with an async function, but I'm at a dead end: what I'm doing works on one strand of data, but it doesn't work on two because of the "body stream already read" error.
let RESPONSE = window.Response.prototype.json;
window.Response.prototype.json = async function () {
  if (!('https://a/')) return RESPONSE.call(this)
  let x = await RESPONSE.call(this);
  if (!('https://b/')) return RESPONSE.call(this)
  let y = await RESPONSE.call(this);
  for (let detect in x) {
    if (x[detect] !== y[detect]) {
      console.log(x[detect]);
      console.log(y[detect]);
    }
  }
  return x;
  return y;
};
How can I keep the data in a variable form that can be used for something like this:
for (let detect in x) {
  if (x[detect] !== y[detect]) {
    console.log(x[detect]);
    console.log(y[detect]);
  }
}
but while having both variables defined at the same time? That means I need to get past the body stream error while keeping that core code. How can I do that?
Does this help you?
async function doTwoRequestsAndCompareStatus() {
  const res1 = await fetch("https://fakejsonapi.com/fake-api/employee/api/v1/employees");
  const res2 = await fetch("https://fakejsonapi.com/fake-api/employee/api/v1/employees");
  const data1 = await res1.json();
  const data2 = await res2.json();
  console.log('both were equal?', data1.status === data2.status);
}

// Don't forget to actually call the function
doTwoRequestsAndCompareStatus();
Although I would recommend this version instead, both because it's cleaner and because it's faster: the fetches are executed at the same time rather than sequentially.
async function doTwoRequestsAndCompareStatus() {
  const [data1, data2] = await Promise.all([
    fetch("https://fakejsonapi.com/fake-api/employee/api/v1/employees").then(r => r.json()),
    fetch("https://fakejsonapi.com/fake-api/employee/api/v1/employees").then(r => r.json()),
  ]);
  console.log('both were equal?', data1.status === data2.status);
}

// Don't forget to actually call the function
doTwoRequestsAndCompareStatus();
If you find the first one easier to understand, though, I would recommend using it.
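If you need the field-by-field comparison from the question, the same idea applies: read each body exactly once into a variable, then compare the two plain objects. A minimal sketch, assuming both endpoints return JSON objects (https://a/ and https://b/ are the placeholders from the question):
async function fetchAndCompare() {
  // Each response body is read exactly once, so the "body stream already read" error can't occur
  const [x, y] = await Promise.all([
    fetch('https://a/').then(r => r.json()),
    fetch('https://b/').then(r => r.json()),
  ]);
  for (let detect in x) {
    if (x[detect] !== y[detect]) {
      console.log(x[detect]);
      console.log(y[detect]);
    }
  }
  return { x, y };
}

fetchAndCompare();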
I am sending chained fetch requests. First I retrieve data from the database, then I request pictures related to every title I got.
The HTML code isn't loaded into the results div until after the image requests are sent, so it takes a long time to see the articles. How can I make the text load before the image requests start being sent?
async function getArticles(params) {
  url = 'http://127.0.0.1:8000/articles/api/?'
  url2 = 'https://api.unsplash.com/search/photos?client_id=XXX&content_filter=high&orientation=landscape&per_page=1&query='
  const article = await fetch(url + params).then(response => response.json());
  const cards = await Promise.all(article.results.map(async result => {
    try {
      let image = await fetch(url2 + result.title).then(response => response.json())
      let card = // Creating HTML code by using database info and Splash images
      return card
    } catch {
      let card = // Creating HTML code by using info and fallback images from database
      return card
    }
  }))
  document.getElementById('results').innerHTML = cards.join("")
};
I have tried using them separately, but I was getting a Promise object.
If you don't want to wait for all the fetches, use an ordinary for loop and await each one sequentially.
async function getArticles(params) {
  url = 'http://127.0.0.1:8000/articles/api/?'
  url2 = 'https://api.unsplash.com/search/photos?client_id=XXX&content_filter=high&orientation=landscape&per_page=1&query='
  const article = await fetch(url + params).then(response => response.json());
  for (let i = 0; i < article.results.length; i++) {
    let result = article.results[i];
    let card;
    try {
      let image = await fetch(url2 + result.title).then(response => response.json())
      card = // Creating HTML code by using database info and Splash images
    } catch {
      card = // Creating HTML code by using info and fallback images from database
    }
    document.getElementById('results').innerHTML += card;
  }
}
However, this will be slower because it won't start each fetch until the previous one completes.
It's hard to run all the fetches concurrently but display the results in the order that they were sent, rather than the order that the responses were received. You could do it by creating a container DIV for each response before sending, then filling in the appropriate DIV when its response is received.
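A rough sketch of that container approach, reusing url and url2 from the question and a hypothetical makeCard(result, image) helper standing in for the elided card-building code:
async function getArticles(params) {
  const article = await fetch(url + params).then(response => response.json());
  const resultsEl = document.getElementById('results');
  article.results.forEach((result, i) => {
    // Create a placeholder card immediately so the text is visible right away
    const container = document.createElement('div');
    container.innerHTML = makeCard(result, null); // null = use the fallback image
    resultsEl.appendChild(container);
    // Fill the same container in whenever its image response arrives
    fetch(url2 + result.title)
      .then(response => response.json())
      .then(image => { container.innerHTML = makeCard(result, image); })
      .catch(() => { /* keep the fallback card */ });
  });
}
Because each container is appended before any image request resolves, the cards stay in the original order no matter when the responses arrive.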
I'm trying to practice some web scraping with prices from a supermarket, using Node.js and Puppeteer. I can navigate through the website at the start, accepting cookies and clicking a "load more" button. But when I then try to read the divs containing the products with querySelectorAll, I get stuck: it returns undefined even though I wait for a specific div to be present. What am I missing?
The problem is at the end of the code block.
const { product } = require("puppeteer");

const scraperObjectAll = {
  url: 'https://www.bilkatogo.dk/s/?query=',
  async scraper(browser) {
    let page = await browser.newPage();
    console.log(`Navigating to ${this.url}`);
    await page.goto(this.url);

    // accept cookies
    await page.evaluate(_ => {
      CookieInformation.submitAllCategories();
    });

    var productsRead = 0;
    var productsTotal = Number.MAX_VALUE;

    while (productsRead < 100) {
      // Wait for the required DOM to be rendered
      await page.waitForSelector('button.btn.btn-dark.border-radius.my-3');
      // Click button to read more products
      await page.evaluate(_ => {
        document.querySelector("button.btn.btn-dark.border-radius.my-3").click()
      });

      // Wait for it to load the new products
      await page.waitForSelector('div.col-10.col-sm-4.col-lg-2.text-center.mt-4.text-secondary');

      // Get number of products read and total
      const loadProducts = await page.evaluate(_ => {
        let p = document.querySelector("div.col-10.col-sm-4.col-lg-2").innerText.replace("INDLÆS FLERE", "").replace("Du har set ", "").replace(" ", "").replace(/(\r\n|\n|\r)/gm, "").split("af ");
        return p;
      });
      console.log("Products (read/total): " + loadProducts);
      productsRead = loadProducts[0];
      productsTotal = loadProducts[1];

      // Now waiting for a div element
      await page.waitForSelector('div[data-productid]');

      const getProducts = await page.evaluate(_ => {
        return document.querySelectorAll('div');
      });

      // PROBLEM HERE!
      // Cannot convert undefined or null to object
      console.log("LENGTH: " + Array.from(getProducts).length);
    }
  }
};
The callback passed to page.evaluate runs in the emulated page context, not in the standard scope of the Node script. Values can't be passed between the page and the Node script without careful consideration: most importantly, if something isn't serializable (convertible into plain JSON), it can't be transferred.
querySelectorAll returns a NodeList, and NodeLists only exist on the front-end, not the backend. Similarly, NodeLists contain HTMLElements, which also only exist on the front-end.
Put all the logic that requires using the data that exists only on the front-end inside the .evaluate callback, for example:
const numberOfDivs = await page.evaluate(_ => {
  return document.querySelectorAll('div').length;
});
or
const firstDivText = await page.evaluate(_ => {
  return document.querySelector('div').textContent;
});
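Applied to the question, one option is to map the product divs to plain objects inside the evaluate callback, so that only JSON-serializable data crosses the boundary. A minimal sketch, using the div[data-productid] selector the question already waits for:
const products = await page.evaluate(_ => {
  // NodeList -> array of plain objects, which can be serialized back to the Node script
  return Array.from(document.querySelectorAll('div[data-productid]')).map(div => ({
    id: div.getAttribute('data-productid'),
    text: div.innerText,
  }));
});
console.log("LENGTH: " + products.length);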
I have, say, 300 items with 10 shown per page. The page loads the JSON data and is limited to 10 items per request (this cannot be changed).
I want to step through the 30-odd pages, pulling each item and listing it.
url.com/api/some-name?page=1 etc
Ideally the script will use the above URL as a pattern and increment the page number by 1 until all items from every page have been collected.
Can this be done? How would I go about it? Any advice or assistance would help me greatly in learning and comparing the methods people suggest.
const getInfo = async function(pageNo) {
  const jsonUrl = "https://website.com/api/some-title";
  let actualUrl = jsonUrl + `?page=${pageNo}`;
  let jsonResults = await fetch(actualUrl).then(response => {
    return response.json();
  });
  return jsonResults;
};

const getEntireList = async function(pageNo) {
  const results = await getInfo(pageNo);
  console.log("Retrieving data from API for page: " + pageNo);
  if (results.length > 0) {
    return results.concat(await getEntireList(pageNo));
  } else {
    return results;
  }
};

(async () => {
  const entireList = await getEntireList();
  console.log(entireList);
})();
I can see some issues in your code.
The initial call to getEntireList() should be given the index of the first page, e.g. const entireList = await getEntireList(1);
The page number will need to be incremented at some point, otherwise the recursion never terminates.
results.concat(...) probably won't have the desired effect. json() returns an object, an array, or a primitive (depending on the server), and results will be one of those types. concat() only works on arrays (and strings), so unless each page returns an array the concatenation won't do what you want.
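Putting those points together, a minimal sketch of a corrected loop, assuming each page returns an array and an empty array marks the end (an assumption about this particular API):
const getEntireList = async function(pageNo = 1) {
  const results = await getInfo(pageNo);
  console.log("Retrieving data from API for page: " + pageNo);
  if (results.length > 0) {
    // Recurse with the next page number and merge the arrays
    return results.concat(await getEntireList(pageNo + 1));
  }
  return results;
};

(async () => {
  const entireList = await getEntireList(1);
  console.log(entireList);
})();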
I'm consuming a JSON stream on UWP WinRT with this code:
async function connect() {
  let stream: MSStream;
  return new CancellableContext<void>(
    async (context) => {
      // this will be called immediately
      stream = await context.queue(() => getStreamByXHR()); // returns ms-stream object
      await consumeStream(stream);
    },
    {
      revert: () => {
        // this will be called when user cancels the task
        stream.msClose();
      }
    }
  ).feed();
}

async function consumeStream(stream: MSStream) {
  return new CancellableContext<void>(async (context) => {
    const input = stream.msDetachStream() as Windows.Storage.Streams.IInputStream;
    const reader = new Windows.Storage.Streams.DataReader(input);
    reader.inputStreamOptions = Windows.Storage.Streams.InputStreamOptions.partial;
    while (!context.canceled) {
      const content = await consumeString(1000);
      // ... some more code
    }

    async function consumeString(count: number) {
      await reader.loadAsync(count); // will throw when the stream gets closed
      return reader.readString(reader.unconsumedBufferLength);
    }
  }).feed();
}
Here, the documentation for InputStreamOptions.partial says:
The asynchronous read operation completes when one or more bytes is available.
However, reader.loadAsync completes even when reader.unconsumedBufferLength is 0, and this causes high CPU load. Is this an API bug, or can I prevent this behavior so that loadAsync only completes when unconsumedBufferLength is greater than 0?
PS: Here is a repro with pure JS: https://github.com/SaschaNaz/InputStreamOptionsBugRepro
Is this an API bug or can I prevent this behavior so that loadAsync can complete only when unconsumedBufferLength is greater than 0
Most likely it also completes at the end of the stream, in which case the unconsumedBufferLength will be zero and needs to be catered for.
In fact the example at https://msdn.microsoft.com/en-us/library/windows/apps/windows.storage.streams.datareader.aspx shows something similar (admittedly not using that option):
// Once we have written the contents successfully we load the stream.
await dataReader.LoadAsync((uint)stream.Size);
var receivedStrings = "";
// Keep reading until we consume the complete stream.
while (dataReader.UnconsumedBufferLength > 0)
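Translated back to the question's loop, one way to cater for that is to treat a zero-byte load as end of stream and stop reading instead of spinning. A rough sketch only, not a verified fix for the linked repro (it relies on loadAsync resolving with the number of bytes actually loaded):
while (!context.canceled) {
  const bytesLoaded = await reader.loadAsync(1000);
  if (bytesLoaded === 0) {
    // End of stream: nothing more will arrive, so stop instead of busy-looping
    break;
  }
  const content = reader.readString(reader.unconsumedBufferLength);
  // ... some more code
}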