I am trying to figure out how to run async functions in parallel (in this case 10 at a time), based on a stream of data parsed from a website with the lapwinglabs/x-ray web scraper.
let pauser = new Rx.Subject()
let count = 0
let max = 10
// function that parse a single url to retrieve data
// return Observable
let parsing_each_link = url => {
  return Rx.Observable.create(
    observer => {
      xray(url, selector)((err, data) => {
        // stop here on failure: nothing may be emitted after onError
        if (err) return observer.onError(err)
        observer.onNext(data)
        observer.onCompleted()
      })
    })
}
// retrieve all the urls from a main page => node stream
let streamNode = xray(main_url, selector)
.paginate(some_selector)
.write()
.pipe(JSONStream.parse('*'))
// convert node stream to RxJS
let streamRx = RxNode.fromStream(streamNode)
.do(() => {
if (count === max) {
pauser.onNext(true)
count = 0
}
})
.do(() => count++)
.buffer(pauser) // take only 10 url by 10 url
streamRx.subscribe(
ten_urls => {
Rx.Observable.forkJoin(
ten_urls.map(url => parsing_each_link(url))
)
.subscribe(
x => console.log("Next : ", JSON.stringify(x, null, 4))
)
}
)
The Next handler in the last console.log is never called. Why?
It is impossible to say for sure. If you can confirm that ten_urls is emitted as expected, the next step is to make sure that the observable returned by parsing_each_link completes, because forkJoin waits for every source observable to complete and then emits the last value of each.
I could not see any call to observer.onComplete in your code.
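Independently of RxJS, the underlying idea (keep at most 10 requests in flight and collect every result) can be sketched with plain promises. This is only a minimal sketch, not the x-ray or RxJS API: mapWithLimit and fakeParse are hypothetical names, and fakeParse stands in for parsing_each_link so the example runs without a network.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each runner claims the next unprocessed index until none remain.
  async function runner() {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await worker(items[i], i);
    }
  }

  const runners = [];
  for (let r = 0; r < Math.min(limit, items.length); r++) runners.push(runner());
  // Resolves once every runner has drained the queue; rejects on the first error.
  return Promise.all(runners).then(() => results);
}

// Stand-in for parsing_each_link (assumption): resolves after a short delay.
const fakeParse = (url) =>
  new Promise((resolve) => setTimeout(() => resolve('parsed ' + url), 10));

const urls = Array.from({ length: 25 }, (_, i) => 'url' + i);
const done = mapWithLimit(urls, 10, fakeParse);
done.then((all) => console.log(all.length, 'results'));
```

Results are stored by index, so the output order matches the input order even though the requests finish in a different order.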
I am using a Chrome extension to start multiple long-running async loops that must not conflict with each other.
There is a list of values, and one item is passed to each individual loop. The loops execute in content.js and are managed by background.js, but they are initialized, started and cancelled from popup.js.
The big question is: what are the best practices for making the management of multiple async loops as easy as possible?
Is there also an easy way to cancel these loops?
example code:
content.js:
chrome.runtime.onMessage.addListener(
function(request, sender, sendResponse) {
if(request.message.active){
console.log("do something");
dispatch(request.message);
}});
async function dispatch(alternator) {
if (alternator.active) {
await new Promise(resolve => setTimeout(resolve, alternator.timeout));
console.log("do something");
}
return;
}
This background.js should have a list of async loops to manage in an array or something easy to manage. The for-loop is consuming too much time and the timeout is causing too much load.
background.js
async function dispatchBackground() {
while (1) {
for (let i = 0; i < alternator.length; i++) {
if(alternator[i].active){
chrome.tabs.sendMessage(alternator[i].tab_id, {"message": alternator[i]});
}
}
await new Promise(resolve => setTimeout(resolve, 100));
}
return;
}
You should probably use a library.
...but that would be boring!
In the following 👇 code, macrotask uses requestIdleCallback to run a callback on a JS runtime macrotask.
The user supplies the data to process, the logic to run (synchronously) at each step, and the continuation condition; they do not have to worry about explicitly yielding to an API.
Function createTask constructs a generator that returns steps until continuePredicate returns false. Generator functions let us suspend and resume synchronous code, which we need here in order to switch between tasks in a round-robin fashion. A more advanced solution could prioritise tasks according to a heuristic.
createCircularList returns a wrapper around an array that exposes add, remove, and next (get the next item in creation order or, if we are at the "end", loop around to the first item again).
createScheduler maintains the task list. While there are tasks remaining in the task list, this function will identify the next task, schedule its next step on a macrotask, and wait for that step to complete. If that was the final step in the current task, the task is then removed from the task list.
Note that the precise interleaving of the output of this code will depend on things like how busy your machine is. The intent of the demonstration is to show how the task queue can be added-to while it is being drained.
const log = console.log
const nop = () => void 0
const stepPerItem = (_, i, data) => i < data.length
const macrotask = (cb) => (...args) => new Promise((res) => (typeof requestIdleCallback === 'function' ? requestIdleCallback : setTimeout)(() => res(cb(...args))))
const createTask = (data,
step,
continuePredicate = stepPerItem,
acc = null,
onDone = nop) =>
(function*(i = 0) {
while(continuePredicate(acc, i, data)) {
acc = step(acc, i, data)
yield [acc, onDone]
i++
}
return [acc, onDone]
})()
const createCircularList = (list = []) => {
const add = list.push.bind(list)
const remove = (t) => list.splice(list.indexOf(t), 1)
const nextIndex = (curr, currIndex = list.indexOf(curr)) =>
(currIndex === list.length - 1) ? 0 : currIndex + 1
const next = (curr) =>
list.length ? list[nextIndex(curr)] : null
return { add, remove, next }
}
const createScheduler = (tasks = createCircularList()) => {
let isRunning = false
const add = (...tasksToAdd) =>
(tasksToAdd.forEach((t) => tasks.add(t)),
!isRunning && (isRunning = true, go()))
const remove = tasks.remove.bind(tasks)
const go = async (t = null) => {
while(t = tasks.next(t))
await macrotask(({ done, value: [result, onDone] } = t.next()) =>
done && (tasks.remove(t), onDone(result)))()
isRunning = false
}
return { add, remove }
}
const scheduler = createScheduler()
const task1 = createTask([...Array(5)], (_, i) => log('task1', i))
const task2 = createTask([...Array(5)], (_, i) => log('task2', i))
const task3 = createTask([...Array(5)], (_, i) => log('task3', i))
scheduler.add(task1, task2)
setTimeout(() => scheduler.add(task3), 50) // you may need to fiddle with the `setTimeout` delay here to observe meaningful interleaving
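For the cancellation part of the question, one possible approach is to give each loop its own AbortController and keep the handles in a Map. The following is a minimal sketch under the assumption that the runtime provides AbortController (current browsers and recent Node versions do); sleep, startLoop and the loop ids are hypothetical names, not part of the extension APIs.

```javascript
// Hypothetical helper: a timer that rejects early if the signal aborts.
const sleep = (ms, signal) =>
  new Promise((resolve, reject) => {
    if (signal.aborted) return reject(new Error('aborted'));
    const id = setTimeout(() => {
      signal.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    function onAbort() {
      clearTimeout(id);
      reject(new Error('aborted'));
    }
    signal.addEventListener('abort', onAbort, { once: true });
  });

// Start one independent loop; returns a handle that can cancel it.
function startLoop(name, timeout, onTick) {
  const controller = new AbortController();
  const finished = (async () => {
    let ticks = 0;
    try {
      while (true) {
        await sleep(timeout, controller.signal);
        onTick(name, ++ticks);
      }
    } catch (e) {
      if (!controller.signal.aborted) throw e; // a real error from onTick
      return ticks; // the loop ended because it was cancelled
    }
  })();
  return { cancel: () => controller.abort(), finished };
}

const loops = new Map(); // id -> loop handle, easy to look up and cancel
loops.set('a', startLoop('a', 10, () => {}));
loops.set('b', startLoop('b', 15, () => {}));
setTimeout(() => loops.forEach((loop) => loop.cancel()), 60);
```

Each loop waits on its own timer, so no central for-loop has to poll every entry, and cancelling a loop clears its pending timer immediately.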
Sorry for the very confusing question. I have this code that gets information from a website without any node modules or libraries. The site shows a list of users split across several pages, selected by appending ?page= to the end of the URL. I have managed to iterate through the pages and split up the raw HTML just right. However, my promise resolves before all the data is collected. How can I wait for everything to finish before I resolve the promise? I have tried countless solutions, but none seem to work. Please don't ask me to use a node package, as my goal is to not use one :) A friend helped with the regex and splitting it up. Here is the code I am using:
function getData() {
return new Promise((resolve, reject) => {
let final = [] //the array of users returned in the end
const https = require("https"), url = "https://buildtheearth.net/buildteams/121/members";
https.get(url + "?page=1", request => { //initial request, gets the number of user pages.
let rawList = '';
request.setEncoding("utf8"),
request.on("data", data => {rawList += data}),
request.on("end", () => {
if(request = (request = (request = rawList.substring(rawList.indexOf('<div class="pagination">'))).substring(0, request.indexOf("</div>"))).match(/<a(.+)>(.+)<\/a>/g)) {
for(let t = parseInt(request[request.length - 1].match(/(\d+)(?!.*\d)/g)), a = 1; a < t + 1; a++) { //iterates through member pages
https.get(url + "?page=" + a, request2 => { //https request for each page of members
let rawList2 = '';
request2.setEncoding('utf8'),
request2.on("data", data => {rawList2 += data}),
request2.on("end", () => {
let i = rawList2.match(/<td>(.+)<\/td>/g); //finds table in HTML
if (i)
for (var t = 1; t < i.length; t += 3) //iterates through rows in table
console.log(i[t].replace(/<td>/g, "").replace(/<\/td>/g, "")), /* logs element to the console (for testing) */
final.push(i[t].replace(/<td>/g, "").replace(/<\/td>/g, "")); //pushes element to the array that is resolved in the end
})
})
}
}
resolve(final) //resolves promise returning final array, but resolves before elements are added with code above
})
})
})
}
If this helps, here is the website I am trying to get info from.
I am still a little new to JS so if you could help, I would really appreciate it :)
I ended up turning each action into an async function with a try and catch block and then chained the functions together with .then(). For the base (getting data from a website) I took inspiration from an article on Medium. Here is the site I am pulling data from, and here is the function to get data from a website:
const http = require('http');
const https = require('https');

// Fetch a URL and resolve with the response body as a string
const getData = async (url) => {
  const lib = url.startsWith('https://') ? https : http;
  return new Promise((resolve, reject) => {
    const req = lib.get(url, res => {
      if (res.statusCode < 200 || res.statusCode >= 300) {
        return reject(new Error(`Status Code: ${res.statusCode}`));
      }
      const data = [];
      res.on('data', chunk => data.push(chunk));
      res.on('end', () => resolve(Buffer.concat(data).toString()));
    });
    req.on('error', reject);
    req.end();
  });
};
and then I got the number of pages (which can be accessed by appending ?page=<page number> to the end of the url) with this function:
const pages = async () => {
  try {
    const html = await getData('https://buildtheearth.net/buildteams/121/members');
    // Isolate the pagination block, then collect the links inside it
    const start = html.substring(html.indexOf('<div class="pagination">'));
    const pagination = start.substring(0, start.indexOf("</div>"));
    const links = pagination.match(/<a(.+)>(.+)<\/a>/g);
    // The last number in the last link is the page count
    return parseInt(links[links.length - 1].match(/(\d+)(?!.*\d)/g), 10);
  } catch (error) {
    console.error(error);
  }
}
and then I used the page count to iterate through the pages and add the HTML of each to an array with this function:
const getPages = async pageCount => {
  const returns = []
  try {
    for (let page = 1; page <= pageCount; page++) {
      const pageData = await getData('https://buildtheearth.net/buildteams/121/members?page=' + page)
      returns.push(pageData)
    }
  } catch (error) {
    // Stop at the first failed request; pages fetched so far are still returned
    console.error(error)
  }
  return returns
}
and then I iterated through the array of strings of HTML of each page, and extracted the data I needed out of each with this function which would return the list of members I need:
const iteratePages = async pages => {
if (!Array.isArray(pages)) return
try {
let returns = []
pages.forEach(page => {
let list = page.match(/<td>(.+)<\/td>/g);
if (list)
for (var element = 1; element < list.length; element += 3)
returns.push(list[element].replace(/<td>/g, "").replace(/<\/td>/g, ""));
})
return returns
} catch (error) {
return error
}
}
And then it was a matter of chaining each together to get the array I needed:
pages().then(pageCount => getPages(pageCount)).then(pages => iteratePages(pages)).then(finalList => {console.log(finalList); console.log(finalList.length)})
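The loop in getPages requests the pages one after another. Because the pages are independent, they could also be requested at the same time and awaited together with Promise.all. The sketch below illustrates that idea under stated assumptions: fetchPage is a hypothetical stand-in for getData that returns fake single-line HTML (so the regex is made non-greedy here), and getAllPages and extractMembers are illustrative names.

```javascript
// Stand-in for getData(url) (assumption, no network): later pages resolve sooner.
const fetchPage = (page) =>
  new Promise((resolve) =>
    setTimeout(() => resolve(`<td>id</td><td>user${page}</td><td>role</td>`), 20 - page));

// Start every request first, then wait for all of them.
const getAllPages = (pageCount) => {
  const requests = [];
  for (let page = 1; page <= pageCount; page++) requests.push(fetchPage(page));
  // Promise.all keeps the input order, whatever order the requests finish in.
  return Promise.all(requests);
};

// Same selection as iteratePages: every third cell, starting at index 1.
const extractMembers = (pages) =>
  pages.flatMap((html) => {
    const cells = html.match(/<td>(.+?)<\/td>/g) || [];
    return cells
      .filter((_, i) => i % 3 === 1)
      .map((cell) => cell.replace(/<\/?td>/g, ''));
  });

const members = getAllPages(3).then(extractMembers);
members.then((list) => console.log(list)); // [ 'user1', 'user2', 'user3' ]
```

Promise.all rejects as soon as any request fails, so a single catch on the returned promise covers every page.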
Here is my scenario: I have two APIs, apiOne and apiTwo. When I call apiOne it returns a response, and if that response is a success I have to send it to apiTwo as a parameter. apiTwo then returns another response, whose status may be, for example, "created" or "in_progress". The issue is:
How can I call apiTwo every 3 seconds until I get the response "in_progress"? If I do not get that response, I need to keep polling apiTwo for at most 2 minutes and then cancel the call. In other words, the interval (or subscription) should stop either when the response is in_progress or after a maximum of 2 minutes.
I already wrote the code in a nested way, but it is not efficient.
Below is my code:
initiate() {
this.showProgress = true;
const data = {
id: this.id,
info: this.Values.info,
};
// First Api call
this.userServ.start(data).subscribe(res => {
this.Ids = res['Id'];
if (this.Ids) {
// Second Api call
this.Service.getStatus(this.Ids).subscribe(resp => {
if (resp) {
this.Status = res['Status'];
// if resp is In_Progress
if (this.Status === 'In_Progress') {
this.Start();
} else {
// if resp is not In_Progress then i get the response i am calling the api
this.intervalTimer = interval(3000).subscribe(x => {
this.Service.Status(this.Ids).subscribe(ress => {
this.Status = ress['Status'];
if (this.Status === 'In_Progress') {
this.delayStart();
this.intervalTimer.unsubscribe();
}
});
});
}
}
}, err => {
console.log(err);
});
}
}, error => {
console.log(error);
});
}
You may consider the approach below (see the code on Stackblitz).
id = 1;
Values = { info: true };
get data() { return { id: this.id,info: this.Values.info}}
showProgressSubject$ = new BehaviorSubject(true);
showProgressAction$ = this.showProgressSubject$.asObservable();
currentStatusSubject$ = new Subject<string>();
currentStatus$ = this.currentStatusSubject$.asObservable()
stoppedSubject$ = new Subject();
stopped$ = this.stoppedSubject$.asObservable();
startedSubject$ = new Subject();
started$ = this.startedSubject$.asObservable();
interval = 500; // Change to 3000 for 3s
maxTrialTime = 6000;// Change to 120000 for 2min
timer$ = timer(0, this.interval).pipe(
tap((i) => {
if(this.maxTrialTime/this.interval < i) { this.stoppedSubject$.next()}
}),
takeUntil(this.stopped$),
repeatWhen(() => this.started$)
)
apiOneCall$ = this.userServ.start(this.data);
apiTwoCall$ = this.apiOneCall$.pipe(
switchMap(({Id}) => Id ? this.Service.getStatus(Id): throwError('No Id')),
tap((res) => this.currentStatusSubject$.next(res)),
tap(res => console.log({res})),
tap((res) => {if(res === 'created') {this.stoppedSubject$.next()}})
)
trialCallsToApiTwo$ = this.timer$.pipe(mergeMap(() => this.apiTwoCall$))
In your Html you can use the async pipe
Show Progress : {{ showProgressAction$ | async }} <br>
Timer: {{ timer$ | async }}<br>
Response: {{ trialCallsToApiTwo$ | async }}<br>
<button (click)="startedSubject$.next()">Start</button><br>
<button (click)="stoppedSubject$.next()">Stop</button><br>
Explanation
We begin by setting up the properties id, Values and data being a combination of the 2 values
id = 1;
Values = { info: true };
get data() { return { id: this.id,info: this.Values.info}}
We then create a Subject to help track the progress of the operations. I am using a BehaviorSubject to set the initial value of showing progress to true.
We will use currentStatus$ to store whether the current state is 'in_progress' or 'created'.
stopped$ and started$ will control our observable stream.
For background, you may have a look at the post "What is the difference between Subject and BehaviorSubject?"
showProgressSubject$ = new BehaviorSubject(true);
showProgressAction$ = this.showProgressSubject$.asObservable();
currentStatusSubject$ = new Subject<string>();
currentStatus$ = this.currentStatusSubject$.asObservable()
stoppedSubject$ = new Subject();
stopped$ = this.stoppedSubject$.asObservable();
startedSubject$ = new Subject();
started$ = this.startedSubject$.asObservable();
Next we define interval = 500; // Change to 3000 for 3s and maxTrialTime = 6000; // Change to 120000 for 2min
We then define a timer$ observable using the timer function, which generates a stream of values at a regular interval.
We set the initial delay to 0 and the period to the interval property we created earlier.
We then tap into the observable stream. The tap operator allows us to perform an operation without changing the observable stream.
In our tap operator, we check whether the maximum time has been reached and, if it has, we call next on stoppedSubject$. We pipe our stream through takeUntil(this.stopped$) to stop the stream and repeatWhen(() => this.started$) to restart it.
timer$ = timer(0, this.interval).pipe(
tap((i) => {
if(this.maxTrialTime/this.interval < i) { this.stoppedSubject$.next()}
}),
takeUntil(this.stopped$),
repeatWhen(() => this.started$)
)
The remaining part is to make the calls to the APIs.
We will use switchMap to combine the two observables. switchMap will cancel any earlier request if a new request is made. If this is not your desired behaviour, you may consider the exhaustMap or mergeMap operators.
If the result of apiOneCall$ has no Id, we use throwError to signal an error; otherwise we return a call to apiTwo.
We tap into the result of apiTwoCall$ and call next on currentStatusSubject$, passing in the response. This sets the value of currentStatus$ to the response.
The line tap((res) => {if(res === 'created') {this.stoppedSubject$.next()}}) taps into the result of apiTwoCall$ and, if it is 'created', stops the timer.
apiOneCall$ = this.userServ.start(this.data);
apiTwoCall$ = this.apiOneCall$.pipe(
switchMap(({Id}) => Id ? this.Service.getStatus(Id): throwError('No Id')),
tap((res) => this.currentStatusSubject$.next(res)),
tap(res => console.log({res})),
tap((res) => {if(res === 'created') {this.stoppedSubject$.next()}})
)
Finally, we combine timer$ and apiTwoCall$ with the mergeMap operator: trialCallsToApiTwo$ = this.timer$.pipe(mergeMap(() => this.apiTwoCall$))
In our HTML we can then use the async pipe to avoid worrying about unsubscribing.
{{ trialCallsToApiTwo$ | async }}
I'd use expand from rxjs, which passes through the result of the source observable but also lets you act on the content of that result.
Also, avoid nesting calls to subscribe whenever possible. Consider this example code for reference:
this.userServ.start(data).pipe(
// use switchMap to not have 'nested' subscribe-calls
switchMap((result) => {
if (result['Id']) {
// if there is an ID, ask for the status
return this.Service.getStatus(result['Id']).pipe(
// use the expand operator to do additional processing, if necessary
expand((response) => response['Status'] === 'In_Progress'
// if the status is 'In_Progress', don't repeat the API call
? EMPTY
// otherwise, re-run the API call
: this.Service.getStatus(result['Id']).pipe(
// don't re-run the query immediately, instead, wait for 3s
delay(3000)
)
),
// Stop processing when a condition is met, in this case, 60s pass
takeUntil(timer(60000).pipe(
tap(() => {
// handle the timeout here
})
))
);
} else {
// if there is no ID, complete the observable and do nothing
return EMPTY;
}
}),
/**
* Since expand doesn't filter anything away, we don't want results that
* don't have the status 'In_Progress' to go further down for processing
*/
filter((response) => response['Status'] === 'In_Progress')
).subscribe(
(response) => {
this.Start();
}, (error) => {
console.log(error)
}
);
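For comparison, the same stop conditions (poll at a fixed interval, stop on the wanted status, give up after a maximum time) can be sketched without RxJS using async/await. This is a swapped-in technique, not the expand-based approach above, and every name in it (wait, pollUntil, fakeGetStatus) is hypothetical; fakeGetStatus stands in for the apiTwo call.

```javascript
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call getStatus() every intervalMs until it reports
// `wanted`, or give up once timeoutMs would be exceeded.
async function pollUntil(getStatus, wanted, intervalMs, timeoutMs) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await getStatus();
    if (status === wanted) return status;
    if (Date.now() + intervalMs > deadline) {
      throw new Error('Timed out waiting for ' + wanted);
    }
    await wait(intervalMs);
  }
}

// Stand-in for apiTwo (assumption): answers 'created' twice, then 'In_Progress'.
let calls = 0;
const fakeGetStatus = async () => (++calls < 3 ? 'created' : 'In_Progress');

// Short timings for the demo; the question's values would be 3000 and 120000.
const polled = pollUntil(fakeGetStatus, 'In_Progress', 10, 500);
polled.then((status) => console.log(status, 'after', calls, 'calls'));
```

Unlike an interval-based version, the next request is only scheduled after the previous one has answered, so slow responses never overlap.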
I am doing a search in the textfield and as I type, there is a call going to the backend after say 100ms.
For example, if we search "5041" and immediately search for "50" and again make it "5041", then there are 3 calls made to the backend.
1."5041" -> Promise 1
2."50" -> Promise 2
3."5041" -> Promise 3
However, promise 3 (web call takes 200ms) resolves before promise 2 (web call takes 500ms) which makes the screen reflect results for promise 2 ("50") when all I have in the textfield is "5041".
I need some way to let user type in the textfield without blocking the user along with the ability to show results for only the last call.
This is something that can be achieved using switchMap from rxjs in an angular app. However I need a way to achieve the same in vanilla JS.
First, you can wrap your fetchData function in something like a fetchLatestSearchResults function, which records when each network call was started and always returns the result of the most recently started call that has resolved so far (irrespective of what data the server returned).
const generateLatestSearchFetch = function(fetchFunc){
  let mostRecentResult = null;
  let mostRecentResultFetchId = 0; // id of the call that produced mostRecentResult
  let fetchCounter = 0; // a counter orders calls reliably, even within the same millisecond
  return (...args) => {
    const myFetchId = ++fetchCounter;
    return fetchFunc(...args)
      .then(data => {
        if (myFetchId > mostRecentResultFetchId) {
          mostRecentResult = data;
          mostRecentResultFetchId = myFetchId;
        }
        return mostRecentResult;
      });
  }
};
Use Like:
fetchData = generateLatestSearchFetch(fetchData);
fetchData('10'); // resolves first and returns result for 10
fetchData('102'); // resolves third and returns result for 1024
fetchData('1024'); // resolves second and returns result for 1024
Last but not least, use debounce to reduce the number of network calls made on every typing event.
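A debounce helper can be sketched in a few lines. This is a minimal illustration rather than any particular library's implementation; the recorded queries stand in for real network calls, and the 20 ms delay is an arbitrary value chosen for the demo.

```javascript
// Minimal debounce: run `fn` only after `delayMs` have passed without a new call.
function debounce(fn, delayMs) {
  let timerId = null;
  return function (...args) {
    clearTimeout(timerId); // drop the call that was still waiting
    timerId = setTimeout(() => fn.apply(this, args), delayMs);
  };
}

// Stand-in for the search request (assumption): just records the query.
const requested = [];
const search = debounce((query) => requested.push(query), 20);

search('5041');
search('50');
search('5041'); // only this call goes through, 20 ms after the last keystroke
```

Debouncing reduces how many requests are sent, but it does not order the responses, so it complements the wrapper above instead of replacing it.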
You need a "last" function:
// takes a function returning a promise and only reports the last result
function last(fn) {
  let p;
  return function(...args) {
    let current = fn(...args); // call the function with the caller's arguments
    p = current; // mark it as the last call
    return current.then(result => {
      // ask am I still the last call?
      if (p === current) return result;
      else return new Promise(() => {}); // never resolve
    });
  }
}
let onlyLastSearch = last((name) => fetch('/api?name=' + name));
onlyLastSearch('a'); // will be ignored
onlyLastSearch('b'); // will be ignored
onlyLastSearch('c'); // only relevant result
You can use the observer pattern for this.
const createWrapper = (fn) => {
let counter = 0;
let lastFetchId = 0;
const listeners = [];
return {
fetch: (str) => {
let id = ++counter;
fn(str).then((data) => {
if(id > lastFetchId) {
listeners.forEach(fn => {
fn(data);
});
lastFetchId = id;
}
});
},
listen: (fn) => {
listeners.push(fn);
return () => {
const index = listeners.indexOf(fn);
listeners.splice(index, 1);
};
}
}
}
const SearchWrapper = createWrapper(fetchData);
SearchWrapper.fetch('a');
SearchWrapper.fetch('b');
SearchWrapper.fetch('c');
SearchWrapper.listen((data) => {
console.log(data);
})
I periodically have to download and parse a bunch of JSON data, about 1,000 to 1,000,000 lines.
Each request has a chunk limit of 5000. So I would like to fire off a bunch of requests at a time, stream each output through its own Transform stream to filter out the keys/values, and then write to a combined stream that writes its output to the database.
But every attempt either doesn't work or gives errors because too many event listeners are set. That seems plausible if I understand correctly that the 'last pipe' is always the reference to the next stream in the chain.
Here is some code (I changed it a lot of times, so it may make little sense).
The question is: is it bad practice to join multiple streams into one? Google also doesn't show a whole lot about it.
Thanks!
brokerApi/getCandles.js
// The 'combined output' stream
let passStream = new Stream.PassThrough();
countChunks.forEach(chunk => {
let arr = [];
let leftOver = '';
let startFound = false;
let lastPiece = false;
let firstByte = false;
let now = Date.now();
let transformStream = this._client
// Returns PassThrough stream
.getCandles(instrument, chunk.from, chunk.until, timeFrame, chunk.count)
.on('error', err => console.error(err) || passStream.emit('error', err))
.on('end', () => {
if (++finished === countChunks.length)
passStream.end();
})
.pipe(passStream);
transformStream._transform = function(data, type, done) {
/** Transform to typedArray **/
this.push(/** Transformed value **/)
}
});
Extra - Other file that 'consumes' the stream (writes to DB)
DataLayer.js
brokerApi.getCandles(instrument, timeFrame, from, until, count)
.on('data', async (buf: NodeBuffer) => {
this._dataLayer.write(instrument, timeFrame, buf);
if (from && until) {
await this._mapper.update(instrument, timeFrame, from, until, buf.length / (10 * Float64Array.BYTES_PER_ELEMENT));
} else {
if (buf.length) {
if (!from)
from = buf.readDoubleLE(0);
if (!until) {
until = buf.readDoubleLE(buf.length - (10 * Float64Array.BYTES_PER_ELEMENT));
console.log('UNTIL TUNIL', until);
}
if (from && until)
await this._mapper.update(instrument, timeFrame, from, until, buf.length / (10 * Float64Array.BYTES_PER_ELEMENT));
}
}
})
.on('end', () => {
winston.info(`Cache: Fetching ${instrument} took ${Date.now() - now} ms`);
resolve()
})
.on('error', reject)
Check out the stream helpers from highlandjs, e.g. (untested, pseudo code):
function getCandle(candle) {...}
_(chunks).map(getCandle).parallel(5000).pipe(...)
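Joining several streams into one can also be sketched with Node's built-in stream module alone. The sketch below is one possible shape, not the highland API: mergeStreams is a hypothetical helper, and Readable.from stands in for the per-chunk candle streams so the example runs without a network.

```javascript
const { PassThrough, Readable } = require('stream');

// Pipe every source into one PassThrough and end it once all sources are done.
function mergeStreams(sources) {
  const out = new PassThrough({ objectMode: true });
  // Each pipe() adds listeners to `out`; raise the limit to avoid the warning.
  out.setMaxListeners(sources.length + 10);
  let remaining = sources.length;
  if (remaining === 0) out.end();
  for (const source of sources) {
    source.on('error', (err) => out.destroy(err)); // pipe() does not forward errors
    source.on('end', () => {
      if (--remaining === 0) out.end();
    });
    source.pipe(out, { end: false }); // keep `out` open for the other sources
  }
  return out;
}

// Stand-ins for the per-chunk candle streams (assumption, no network).
const merged = mergeStreams([
  Readable.from([1, 2, 3]),
  Readable.from([10, 20]),
]);

const received = [];
merged.on('data', (value) => received.push(value));
merged.on('end', () => console.log(received.length, 'values'));
```

Passing { end: false } is what keeps the first finished source from closing the combined stream, and the explicit listener limit addresses the "too many listeners" warning that appears when many sources are piped into one destination.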