I have an api endpoint that receives a large volume of requests from various sources.
For every request received, I create a promise that invokes an internal API.
I want to batch these promises by source, where each batch contains at most 10 seconds of requests.
How can this be done?
If you have multiple requests from multiple sources, you can keep placing them into a Map object, with sources as keys and the received requests collected into arrays as values. So let myMap be something like:
{source1: [req1, req2, req3],
 source2: [req1, req2],
 ...
 sourceN: [req1, req2, ..., reqm]}
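One way to fill and reset that map (a minimal sketch; the collect helper name is illustrative, while clearMapValues is the function used in the loop below):

const myMap = new Map();

// store each incoming request under its source key
function collect(source, req) {
  if (!myMap.has(source)) myMap.set(source, []);
  myMap.get(source).push(req);
}

// called after each batch is dispatched, so the map refills over the next window
function clearMapValues() {
  myMap.clear();
}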
You may set up a pseudo-recursive setTimeout loop to invoke your internal API.
var apiInterval = 10000;

function runner() {
  setTimeout(mv => {
    // one internal API call per request, grouped per source
    Promise.all(mv.map(reqs => Promise.all(reqs.map(req => apiCall(req)))))
      .then(pss => pss.forEach(ps => ps.forEach(r => doSomethingWithEachApiCallResult(r))));
    clearMapValues(); // to be refilled over the next 10 seconds
    runner();         // schedule the next run
  }, apiInterval, [...myMap.values()]);
}
Please take the above as a sketch just to give you an idea. Note that Map.prototype.values() returns an iterator, which is why it is spread into an array ([...myMap.values()]) before array methods are used on it.
This is a little better than a setInterval loop, as you may change the interval value dynamically depending on the workload or whatnot.
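For example, the next interval could be chosen right before re-arming the timer (the sizing heuristic here is purely illustrative):

// purely illustrative: shorten the window when the backlog grows
apiInterval = myMap.size > 100 ? 5000 : 10000;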
I propose the following solution.
It uses a Map to store a string key and an array of values.
It uses a setTimeout for every map key to flush the values of that key to a callback.
Code
/**
* A stream of requests come from various sources, can be transposed into a batch indexed
* by the source of the request.
*
* The size of each batch is defined by a time interval. I.e. any request received within the
* time interval is stored in a batch.
*/
export class BatchStream<K, V> {
  cache: Map<K, V[]>
  flushRate: number
  onBatch: (k: K, v: V[]) => Promise<void>
  debug: boolean

  constructor(onBatch: (k: K, v: V[]) => Promise<void>, flushRate = 5000, debug = false) {
    this.cache = new Map<K, V[]>()
    this.onBatch = onBatch
    this.debug = debug
    this.flushRate = flushRate
    this.flush = this.flush.bind(this)
  }

  push(k: K, v: V) {
    const batch = this.cache.get(k)
    if (batch) {
      batch.push(v)
    } else {
      // first value for this key: start a new batch and schedule its flush
      this.cache.set(k, [v])
      setTimeout(this.flush, this.flushRate, k)
    }
  }

  flush(k: K) {
    this.debug && console.log("Flush", k)
    const batch = this.cache.get(k) ?? []
    this.cache.delete(k)
    this.onBatch(k, batch)
    this.debug && console.log("Size", this.cache.size)
  }
}
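For the scenario in the question, usage could look like this (a sketch; internalApi.processBatch and the Request type are hypothetical stand-ins for your internal API call and request shape):

// batch incoming requests by source; flush each source's batch after 10 seconds
const batcher = new BatchStream<string, Request>(
  async (source, requests) => {
    await internalApi.processBatch(source, requests) // hypothetical internal API
  },
  10000 // the 10-second window from the question
)

// in the endpoint handler:
// batcher.push(req.source, req)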
Test
it("BatchStream", (done) => {
let sources = []
let iterations = 10
let jobs = []
let jobsDone = 0
let debug = true
// Prepare sources
for (let i = 97; i < 123; i++) {
sources.push(String.fromCharCode(i))
}
// Prepare a stream of test data
for (let k of sources) {
for (let i = 0; i < iterations; i++) {
jobs.push({ k, v: k + i.toString() })
}
}
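// shuffle is assumed to be a Fisher-Yates style helper (not shown here)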
shuffle(jobs)
// Batch handler
let onBatch = (k: string, v: string[]) => {
return new Promise<void>((resolve, reject) => {
jobsDone += v.length
debug && console.log(" --> " + k, v.length, v.join(","), jobsDone, sources.length * iterations)
if (jobsDone == sources.length * iterations) {
done()
}
resolve()
})
}
let batchStream = new BatchStream<string, string>(onBatch, 5000, debug)
// Stream test data into batcher
let delay = 0
for (let j of jobs) {
delay += 100
setTimeout(() => {
batchStream.push(j.k, j.v)
}, delay)
}
})
Background Info
For a program I'm working on I need to track the path taken through a web. In this case, a web is defined as a series of nodes, each node having zero or more child nodes. It's a web and not a tree because any node can point to any other node, so paths can be circular.
My program will start at one "entry point" node, and traverse through the web until it has taken a path that is considered "valid". All valid paths are stored in a series of nested maps, each map containing the keys of all possible next steps.
For example:
{ 0: {1: "success"} }
This nested map defines the path:
entryNode.children[0].children[1]
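To make that concrete, traversal consults the nested map at each step (the names here are illustrative, not from my actual code):

// follow the web and the map of valid next steps in lockstep
let node = entryNode
let step = { 0: { 1: "success" } }
for (const key of [0, 1]) {
  node = node.children[key]
  step = step[key]
}
// step === "success" here marks a valid path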
I have a minimal example of the traversal algorithm for benchmarking purposes:
// you can ignore this, it just helps me get some more info on the results
function getStandardDeviation (array) {
const n = array.length
const mean = array.reduce((a, b) => a + b) / n
return Math.sqrt(array.map(x => Math.pow(x - mean, 2)).reduce((a, b) => a + b) / n)
}
//// values that can be converted to a 1-digit base-36 number
// let list = [30, 31, 32]
//// without base-36: 411ms
//// with base-36: 2009ms
//// values that can be converted to a 2-digit base-36 number
// let list = [36, 37, 38]
//// without base-36: 391ms
//// with base-36: 1211ms
//// arbitrary large numbers
let list = [10000, 10001, 10002]
//// without base-36: 4764ms
//// with base-36: 1954ms
//// I tried encoding to base 36 to reduce the key length, hence the keys like '1o' and '1p'
//// This seems to hurt the performance of short numbers, but help the performance of large ones
// list = list.map(n => n.toString(36))
let maps = {}
let currentMap = maps
list.forEach((n, i) => {
if (i === list.length - 1) {
currentMap[n] = "res1"
} else {
const tempMap = {}
currentMap[n] = tempMap
currentMap = tempMap
}
})
console.log(maps)
// store samples for stdev
let times = []
const samples = 1000
const operations = 100000
// collect samples for stdev calculation
for (let k = 0; k < samples; k++) {
const begin = process.hrtime()
// dummy variable to simulate doing something with the result
let c = ""
let current = maps
for (let i = 0; i < operations; i++) {
// simulate what the final algorithm does
for (let j = 0; j < list.length; j++) {
current = current[list[j]]
if (typeof current === 'string') {
c = current
}
}
current = maps
}
const end = process.hrtime()
// get the ms difference between start and end
times.push((end[0] * 1000 + end[1] / 1000000) - (begin[0] * 1000 + begin[1] / 1000000));
}
const stdev = getStandardDeviation(times)
let total = 0;
times.forEach(t => total += t)
console.log("Time in millisecond is: ", total.toFixed(2), `+-${stdev.toFixed(2)}ms (${(total - stdev).toFixed(2)}, ${(total + stdev).toFixed(2)})`)
The Question
While testing, I wondered if using shorter keys would be faster, since I'm guessing JS hashes them somehow before doing the lookup. I found that different object keys result in drastically different performance, varying by about an order of magnitude, with the only difference being the size/characters of the maps' keys. There's no obvious pattern that I can see, though.
I laid out the different input lists and their results at the top of the benchmark source, but here are the actual maps used and their respective times:
// raw numbers
{ '30': { '31': { '32': 'res1' } } }
Time in millisecond is: 411.00 +-0.13ms (410.86, 411.13)
// converted to base-36
{ u: { v: { w: 'res1' } } }
Time in millisecond is: 2009.91 +-0.18ms (2009.72, 2010.09)
// raw numbers
{ '36': { '37': { '38': 'res1' } } }
Time in millisecond is: 391.52 +-0.16ms (391.36, 391.69)
// converted to base-36
{ '10': { '11': { '12': 'res1' } } }
Time in millisecond is: 1211.46 +-0.19ms (1211.27, 1211.65)
// raw numbers
{ '10000': { '10001': { '10002': 'res1' } } }
Time in millisecond is: 4764.09 +-0.17ms (4763.93, 4764.26)
// converted to base-36
{ '7ps': { '7pt': { '7pu': 'res1' } } }
Time in millisecond is: 1954.07 +-0.17ms (1953.90, 1954.25)
Why do these different keys result in such wildly different timings? I've tested it a lot and the results are quite consistent.
Note:
I'm using Node V16.15.0 for benchmarking
I am trying to solve the following problem: the script enters a website, takes the first 10 links from it, then visits those 10 links, then follows the next 10 links found on each of those 10 pages, and so on until the number of visited pages reaches 1000.
I was trying to achieve this using a for loop inside a promise, plus recursion. This is my code:
const rp = require('request-promise');
const url = 'http://somewebsite.com/';
const websites = []
const promises = []
const getOnSite = (url, count = 0) => {
console.log(count, websites.length)
promises.push(new Promise((resolve, reject) => {
rp(url)
.then(async function (html) {
let links = html.match(/https?:\/\/(www\.)?[-a-zA-Z0-9#:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()#:%_\+.~#?&//=]*)/g)
if (links !== null) {
links = links.splice(0, 10)
}
websites.push({
url,
links // (email extraction omitted from this snippet)
})
if (links !== null) {
for (let i = 0; i < links.length; i++) {
if (count < 3) {
resolve(getOnSite(links[i], count + 1))
} else {
resolve()
}
}
} else {
resolve()
}
}).catch(err => {
resolve()
})
}))
}
getOnSite(url)
I think you might want a recursive function that takes three arguments:
an array of urls to extract links from
an array of the accumulated links
a limit for when to stop crawling
You'd kick it off by calling it with just the root url, and await the returned promise:
const allLinks = await crawl([rootUrl]);
On the initial call the second and third arguments could assume default values:
async function crawl (urls, accumulated = [], limit = 1000) {
...
}
The function would fetch each url, extract its links, and recurse until it hit the limit. I haven't tested any of this, but I'm thinking something along these lines:
// limit the number of links per page to 10
const perPageLimit = 10;

async function crawl (urls, accumulated = [], limit = 1000) {
  // if the limit has been depleted or there are no urls left,
  // return the accumulated result
  if (limit <= 0 || urls.length === 0) {
    return accumulated;
  }
  // fetch each url, extract its links, and flatten the
  // per-page arrays into a single list
  const links = (await Promise.all(
    urls
      .slice(0, perPageLimit)      // limit to 10
      .map(url => fetchHtml(url)   // fetch the url
        .then(extractUrls))        // and extract its links
  )).flat();
  // then recurse
  return crawl(
    links,                       // newly extracted links from this call
    [...accumulated, ...links],  // appended to the accumulated list
    limit - links.length         // reduce the limit and recurse
  );
}
async function fetchHtml (url) {
  // fetch the url and resolve with its html
}
const extractUrls = (html) => html.match( ... ) || []
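One possible fetchHtml, reusing the request-promise client from the question (request-promise resolves with the response body by default):

const rp = require('request-promise');

// resolves with the page's html as a string
const fetchHtml = (url) => rp(url);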
I was working on a project where I am required to go through an array of objects after fetching the data, manipulate them and then save those that fulfill certain conditions.
Everything else was working except storing the date in the object.
So basically, the code is like this (dateAdd taken from https://stackoverflow.com/a/1214753/12316962):
var baddata = []
fetch(fetch_url)
.then((response) => {
if (response.ok) {
response.json()
.then(data => data = data.workOpportunities)
.then(() => {
for (let i = 0; i < data.length; i++) {
data[i]['rooms'] = 0
data[i]['ontime'] = 0
data[i]['ontime'] = dateAdd(
new Date(time),
'second',
Math.round(data[i].distance.value / 50 * 60 * 60)
)
}
for (let i = 0; i < data.length; i++) {
if (/*checking ontime and other conditions*/) {
//do something
}
//do not do anything and push the data to baddata array
}
}
)
}
}).catch((err) => console.error(err))
data[i]['ontime'] is showing 'invalid date{}', and so the "if statements" are failing.
Originally, I had only one loop for the whole thing; the result was the same.
When I run the code below outside the loop, it works fine.
data[0]['ontime'] = dateAdd(
new Date(time),
'second',
Math.round(data[0].distance.value / 50 * 60 * 60)
)
UPDATE:
I populated the array with all objects I had gotten so far.
It seems like the loop runs correctly if I run it on its own.
Does this have to do with an async function/fetch?
Where is the mess-up?
I have an observable that receives data from a stream in blocks of 512 bytes per next. I have to break each block into 200-byte chunks on another observable, keeping the leftover bytes (e.g. [112]) in a buffer to concatenate with the next block. I solved it using a new Subject and a for loop, but I believe there may be a better, prettier solution.
received Observable ----------------------------------------
1st next [512] -------> [112] [200] [200] -------> [200] [200]
2nd next [512][112] --> [24][200][200] [88+112] --> [200] [200]
3rd next [512][24] --> [136] [200] [76+124] .....
nth iteration [512][194] --> [106][200][200][106+94] --> [200][200][200]
n+1th [512][6].......
const maxValueSize = 200
const bufferStreamer = new Subject<Buffer>() // the re-chunked output stream
let completationBuffer = Buffer.alloc(0)     // leftover bytes carried into the next block
this._sreamRecord$.subscribe(
{
next: (val) => {
const bufferToSend: Buffer = Buffer.concat([completationBuffer, val])
for (let i = 0; i < bufferToSend.length; i += maxValueSize) {
if (bufferToSend.length - i > maxValueSize) {
bufferStreamer.next(bufferToSend.slice(i, i + maxValueSize))
} else {
completationBuffer = bufferToSend.slice(i, i + maxValueSize)
}
}
},
complete() {
if (completationBuffer.length) {
bufferStreamer.next(completationBuffer)
}
bufferStreamer.complete()
}
})
You may want to consider a solution along these lines:
import * as _ from 'lodash';
import { from } from 'rxjs';
import { switchMap } from 'rxjs/operators';

const splitInChunksWithRemainder = (remainder: Array<any>) => {
return (streamRecord: Array<any>) => {
const streamRecordWithRemainder = remainder.concat(streamRecord);
let chunks = _.chunk(streamRecordWithRemainder, maxValueSize);
const last = chunks[chunks.length - 1];
let newRemainder = [];
if (last.length != maxValueSize) {
newRemainder = chunks[chunks.length - 1];
chunks.length = chunks.length - 1;
}
return {chunks, newRemainder};
};
}
let f = splitInChunksWithRemainder([]);
this._sreamRecord$.pipe(
switchMap(s => {
const res = f(s);
f = splitInChunksWithRemainder(res.newRemainder);
return from(res.chunks);
})
)
.subscribe(console.log);
The idea is to split each streamRecord with lodash's chunk function after concatenating the previous remainder, i.e. the tail left over from the split of the previous streamRecord.
This is done using the function splitInChunksWithRemainder, which is a higher-order function, i.e. a function that returns a function, in this case after having set the remainder coming from the previous split.
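Used on its own, the mechanism works like this (firstRecord stands for any incoming array; maxValueSize = 200 as above):

let split = splitInChunksWithRemainder([]);
const res = split(firstRecord);                       // res.chunks: the full-size chunks to emit
split = splitInChunksWithRemainder(res.newRemainder); // carry the tail into the next call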
UPDATE after comment
If you also need to emit the last newRemainder, then you can consider a slightly more complex solution such as the following:
const splitInChunksWithRemainder = (remainder: Array<any>) => {
return (streamRecord: Array<any>) => {
const streamRecordWithRemainder = remainder.concat(streamRecord);
let chunks = _.chunk(streamRecordWithRemainder, maxValueSize);
const last = chunks[chunks.length - 1];
let newRemainder = [];
if (last.length != maxValueSize) {
newRemainder = chunks[chunks.length - 1];
chunks.length = chunks.length - 1;
}
return {chunks, newRemainder};
};
}
import { Observable, from, of, defer } from 'rxjs';
import { switchMap, concat } from 'rxjs/operators';

const pipeableChain = () => (source: Observable<any>) => {
let f = splitInChunksWithRemainder([]);
let lastRemainder: any[];
return source.pipe(
switchMap(s => {
const res = f(s);
lastRemainder = res.newRemainder;
f = splitInChunksWithRemainder(lastRemainder);
return from(res.chunks);
}),
concat(defer(() => of(lastRemainder)))
)
}
this._sreamRecord$.pipe(
pipeableChain()
)
.subscribe(console.log);
We have introduced the pipeableChain function. In this function we save the remainder returned by each execution of splitInChunksWithRemainder. Once the source Observable completes, we append one last notification via the concat operator.
As you can see, we also have to use the defer operator to make sure we create the Observable only when the Observer subscribes, i.e. after the source Observable completes. Without defer, the Observable passed to concat as a parameter would be created when the source Observable is initially subscribed, i.e. when lastRemainder is still undefined.
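The difference is easy to see in isolation (a standalone sketch, unrelated to the stream above):

import { defer, of } from 'rxjs';

let value: number | undefined;
const eager = of(value);             // captures undefined right now
const lazy = defer(() => of(value)); // reads value at subscribe time
value = 42;
eager.subscribe(v => console.log(v)); // undefined
lazy.subscribe(v => console.log(v));  // 42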
So, I'm trying to model a long computation; for this purpose I'm computing Fibonacci numbers. In case the computation takes too much time I need to reject it.
The question: why doesn't the TimeoutError handler work? How can I fix the code?
const expect = require('chai').expect
const Promise = require('bluebird')
function profib(n, prev = '0', cur = '1') {
return Promise.resolve(n < 2)
.then(function(isTerm) {
if(isTerm) {
return cur
} else {
n = n - 2
return profib(n, cur, strAdd(cur, prev));
}
})
}
const TIMEOUT = 10000
const N = 20000
describe('recursion', function() {
it.only('cancelation', function() {
this.timeout(2 * TIMEOUT)
let prom = profib(N).timeout(1000)
.catch(Promise.TimeoutError, function(e) {
console.log('timeout', e)
return '-1'
})
return prom.then((num) => {
expect(num).equal('-1')
})
})
})
const strAdd = function(lnum, rnum) {
lnum = lnum.split('').reverse();
rnum = rnum.split('').reverse();
var len = Math.max(lnum.length, rnum.length),
    acc = 0,
    res = [];
for(var i = 0; i < len; i++) {
var subres = Number(lnum[i] || 0) + Number(rnum[i] || 0) + acc;
acc = ~~(subres / 10); // integer division
res.push(subres % 10);
}
if (acc !== 0) {
res.push(acc);
}
return res.reverse().join('');
};
Some info about environment:
➜ node -v
v6.3.1
➜ npm list --depth=0
├── bluebird#3.4.6
├── chai#3.5.0
└── mocha#3.2.0
If I'm reading your code correctly, profib does not yield control until it's finished.
Timeouts are not interrupts. They are just events added to the list of events for the browser/node to run. The browser/node runs the next event when the code for the current event finishes.
Example:
setTimeout(function() {
console.log("timeout");
}, 1);
for(var i = 0; i < 100000; ++i) {
console.log(i);
}
Even though the timeout is set for 1 millisecond, it doesn't fire until after the loop finishes (which takes about 5 seconds on my machine).
You can see the same problem with a simple forever loop
const TIMEOUT = 10000
describe('forever', function() {
it.only('cancelation', function() {
this.timeout(2 * TIMEOUT)
while(true) { } // loop forever
})
})
Run it in your environment and you'll see it never times out. JavaScript does not support interrupts; it only supports events.
As for fixing the code, you need to insert a call to setTimeout. For example, let's change the forever loop so it exits (and therefore allows other events):
const TIMEOUT = 100
function alongtime(n) {
return new Promise(function(resolve, reject) {
function loopTillDone() {
if (n) {
--n;
setTimeout(loopTillDone);
} else {
resolve();
}
}
loopTillDone();
});
}
describe('forever', function() {
it.only('cancelation', function(done) {
this.timeout(2 * TIMEOUT)
alongtime(100000000).then(done);
})
})
Unfortunately setTimeout is a relatively slow operation and arguably shouldn't be called in a hot function like profib. I don't really know what to suggest.
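In Node, setImmediate schedules a callback with far less overhead than setTimeout(fn, 0), so one mitigation is to yield with it instead (a drop-in replacement for loopTillDone inside alongtime above; n and resolve come from that enclosing scope):

// same yielding loop, but using Node's cheaper setImmediate
function loopTillDone() {
  if (n) {
    --n;
    setImmediate(loopTillDone);
  } else {
    resolve();
  }
}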
The problem appears because promises work in a "greedy" manner (that's my own explanation). For this reason the function profib doesn't release the event loop. To fix this issue I need to release the event loop periodically. The easiest way to do that is with Promise.delay():
function profib(n, prev = '0', cur = '1') {
  return Promise.resolve(n < 2)
    .then(function(isTerm) {
      if (isTerm) {
        return cur
      } else {
        n = n - 2
        // Promise.delay(0) yields to the event loop before recursing,
        // giving the timeout a chance to fire
        return Promise.delay(0).then(() => profib(n, cur, strAdd(cur, prev)))
      }
    })
}
gman has already explained why your idea doesn't work. The simple and efficient solution would be to add a condition to your loop that checks the time and breaks, like this:
var deadline = Date.now() + TIMEOUT
function profib(n, prev = '0', cur = '1') {
if (Date.now() >= deadline) throw new Error("timed out")
// your regular fib recursion here
}
Calling profib will either eventually return the result or throw an error. However, it will block any other JavaScript from running while doing the calculation. Asynchronous execution isn't the solution here, or at least not all of it. What you need for such CPU-intensive tasks is a WebWorker to run the computation in another JavaScript context. Then you can wrap the WebWorker's communication channel in a Promise to get the API you envisioned originally.
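A minimal sketch of that wrapping, assuming a hypothetical fib-worker.js that computes the number and posts it back:

// wrap a worker round-trip in a promise
function profibAsync(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('fib-worker.js');
    worker.onmessage = (e) => { resolve(e.data); worker.terminate(); };
    worker.onerror = (err) => { reject(err); worker.terminate(); };
    worker.postMessage(n);
  });
}

// usage: profibAsync(20000).then(num => expect(num).to.be.a('string'))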