I'm trying to write back-end functionality that handles requests to a particular API, but this API has restrictive quotas, especially for requests per second. I want to create an API abstraction layer that is capable of delaying function execution if there are too many requests per second, so it works like this:
New request arrives (to put it simply, a library method is invoked)
Check if this request could be executed right now, according to given limit (requests/s)
If it can't be executed, delay its execution till next available moment
If at this time a new request arrives, delay its execution further or put it on some execution queue
I don't have any constraints in terms of waiting queue length. Requests are function calls with node.js callbacks as the last param for responding with data.
I thought of adding a delay to each request, equal to the smallest possible slot between requests (expressed as minimum milliseconds per request), but that can be a bit inefficient (it always delays functions before sending the response).
Do you know any library or simple solution that could provide me with such functionality?
Save the last request's timestamp.
Whenever a new request comes in, check whether the minimum interval has elapsed since then. If it has not, put the function in a queue, then schedule a job (unless one is already scheduled):
setTimeout(
  processItemFromQueue,
  lastTime + minInterval - Date.now() // ms left to wait; assumes lastTime is a millisecond timestamp
)
processItemFromQueue takes a job from the front of the queue (shift) then reschedules itself unless the queue is empty.
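The steps above can be sketched as follows. This is only a minimal illustration, not a tested library: `createRateLimiter`, `limit` and `minIntervalMs` are names made up for this example, and error handling inside the queued functions is left out.

```javascript
// Minimal sketch of a queue-based rate limiter (hypothetical helper names).
function createRateLimiter(minIntervalMs) {
  var queue = [];            // pending functions, oldest first
  var lastTime = -Infinity;  // timestamp (ms) of the most recent execution
  var timer = null;          // handle of the scheduled job, if any

  function processItemFromQueue() {
    timer = null;
    var fn = queue.shift();  // take the job from the front of the queue
    lastTime = Date.now();
    fn();
    if (queue.length > 0) {
      // Reschedule until the queue is empty.
      timer = setTimeout(processItemFromQueue, minIntervalMs);
    }
  }

  return function limit(fn) {
    var wait = lastTime + minIntervalMs - Date.now();
    if (queue.length === 0 && timer === null && wait <= 0) {
      // Enough time has passed and nothing is waiting: run right away.
      lastTime = Date.now();
      fn();
      return;
    }
    queue.push(fn);
    if (timer === null) {
      timer = setTimeout(processItemFromQueue, Math.max(wait, 0));
    }
  };
}
```

A request that arrives when the interval has already elapsed runs immediately, so only bursts are delayed. Each API call would be wrapped as `limit(function () { /* call the API here */ })`.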
The definitive answer to this problem (and the best one) came from the API documentation itself. We have used it for a couple of months and it has solved my problem perfectly.
In such cases, instead of writing complicated queue code, the best way is to take advantage of JavaScript's handling of asynchronous code and either write a simple backoff yourself or use one of the many great libraries that do so.
So, if you hit any API limits (e.g. quota errors, 5xx responses), you should use backoff to run the query again, with an increasing delay between attempts (more about backoff can be found here: https://en.wikipedia.org/wiki/Exponential_backoff). If the call still fails after a given number of attempts, gracefully return an error about the unavailability of the API.
Example use below (taken from https://www.npmjs.com/package/backoff):
var call = backoff.call(get, 'https://someaddress', function(err, res) {
console.log('Num retries: ' + call.getNumRetries());
if (err) {
// Put your error handling code here.
// Called ONLY IF backoff fails to help
console.log('Error: ' + err.message);
} else {
// Put your success code here
console.log('Status: ' + res.statusCode);
}
});
/*
* When to retry. Here - 503 error code returned from the API
*/
call.retryIf(function(err) { return err.status == 503; });
/*
* This lib offers two strategies - Exponential and Fibonacci.
* I'd suggest using the first one in most of the cases
*/
call.setStrategy(new backoff.ExponentialStrategy());
/*
* Info how many times backoff should try to post request
* before failing permanently
*/
call.failAfter(10);
// Triggers backoff to execute given function
call.start();
There are many backoff libraries for Node.js, offering promise-style, callback-style, or even event-style backoff handling (the example above is callback-style). They're really easy to use if you understand the backoff algorithm itself. And since the backoff parameters can be stored in config, they can be adjusted to achieve better results if backoff fails too often.
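For the "write a simple backoff yourself" route, something like the following could work. It is a rough sketch under my own assumptions: `retryWithBackoff` and the option names (`maxRetries`, `initialDelayMs`, `maxDelayMs`, `shouldRetry`) are invented for illustration and do not come from any library.

```javascript
// Hypothetical helper: retries `operation` with exponentially growing delays.
// `operation` takes a single Node-style callback (err, result).
function retryWithBackoff(operation, options, done) {
  var attempt = 0; // number of failed attempts so far

  function tryOnce() {
    operation(function (err, result) {
      if (!err) {
        return done(null, result);
      }
      attempt += 1;
      if (attempt > options.maxRetries || !options.shouldRetry(err)) {
        return done(err); // give up and report the failure to the caller
      }
      // Delay doubles each time: initialDelayMs, 2x, 4x, ... capped at maxDelayMs.
      var delay = Math.min(
        options.initialDelayMs * Math.pow(2, attempt - 1),
        options.maxDelayMs
      );
      setTimeout(tryOnce, delay);
    });
  }

  tryOnce();
}
```

With `maxRetries: 10` the operation is attempted at most 11 times (the first call plus ten retries). The cap on the delay keeps the later waits from growing without bound.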
Related
I have an Angular service that makes a Web API call to retrieve my search results. The problem I'm having is that the Angular controller and UI are set up in a way that allows the search to be called multiple times per second, causing the service calls to queue up. I tried resolving/deferring the HTTP call when a new one comes in, but it doesn't seem like the best solution. I would rather queue up all the search calls I get within a certain time period and then only execute the last one. Any ideas on how I could do that?
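$timeout(function () {
  var length = queue.length;
  if (length === 0) { return; } // nothing was queued during the window
  var item = queue[length - 1]; // keep only the most recent request
  queue.splice(0, length);      // discard everything queued so far
  processItem(item);
}, delayMs); // delayMs: your time window in milliseconds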
Keep adding your requests to the queue, and put the processing logic in the processItem function.
This might do what you need.
*Note: please treat this as pseudocode; it might have compilation errors.
Alternatively, you can create a boolean variable that is checked every time a request is about to be made, and don't make the request until it is true. Something like this:
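var process = true;
function processItem(item) {
  if (process) {
    process = false;
    // YOUR ACTUAL PROCESSING CODE
    // Allow the next request once the time window has passed.
    $timeout(function () {
      process = true;
    }, delayMs); // delayMs: your time window in milliseconds
  }
}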
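The "only execute the last one" behaviour asked for in the question is usually called debouncing. Here is a minimal sketch of it using plain `setTimeout`; `debounce` and `waitMs` are names I picked for the example. In an AngularJS service, `$timeout` and `$timeout.cancel` could be swapped in so that a digest cycle runs after the call.

```javascript
// Hypothetical debounce helper: of all calls made within `waitMs` of each
// other, only the last one actually runs.
function debounce(fn, waitMs) {
  var timer = null;
  return function () {
    var args = arguments;
    var self = this;
    clearTimeout(timer); // drop the previously scheduled call, if any
    timer = setTimeout(function () {
      timer = null;
      fn.apply(self, args); // run with the most recent arguments
    }, waitMs);
  };
}
```

The search function would be wrapped once, e.g. `var search = debounce(doSearch, 300)` where `doSearch` stands for your own function, and the controller would call `search(query)` as often as it likes.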
I have a list of 50k entries that I am entering into my db.
var tickets = [new Ticket(), new Ticket(), ...]; // 50k of them
tickets.forEach(function (t, ind){
console.log(ind+1 + '/' + tickets.length);
Ticket.findOneAndUpdate({id: t.id}, t, {upsert: true}, function (err, doc){
if (err){
console.log(err);
} else {
console.log('inserted');
}
});
});
Instead of the expected interleaving of
1 / 50000
inserted
2 / 50000
inserted
I am getting all of the indices followed by all of the inserted confirmations
1 / 50000
2 / 50000
...
50000 / 50000
inserted
inserted
...
inserted
I think something is happening with process.nextTick. There is a significant slow down after a few thousand records.
Does anyone know how to get the efficient interleaving?
Instead of the expected interleaving
That would be the expected behavior only for synchronous I/O.
Remember that these operations are all asynchronous, which is a key idea of node.js. What the code does is this:
for each item in the list,
'start a function' // <-- this will immediately look at the next item
output a number (happens immediately)
do some long-running operation over the network with connection pooling
and batching. When done,
call a callback that says 'inserted'
Now the code will launch a ton of those functions that, in turn, send requests to the database. All that will happen long before the first request has even reached the database. It is likely that the OS doesn't even bother to actually send the first TCP packets before you're at, say ticket 5 or 10 or so.
To answer the question from your comment: No, the requests will be sent out relatively soon (that is up to the OS), but the results won't reach your single-threaded JavaScript code until your loop has finished queuing up the 50k entries. This is because the forEach is your currently running piece of code, and all events that come in while it's running will be processed only after it's finished. You'll observe the same if you use setTimeout(function() { console.log("inserted... not") }, 0) instead of the actual DB call, because setTimeout is also an async event.
To make your code fully async, your data source should be some kind of (async) iterator that provides data, instead of a huge array of items.
You are running into the wonders of Node's asynchronicity. It sends the upsert requests off into the ether, then continues on to the next record without waiting for a response. Does it matter, given that it's just an informational message that is not in sync with the upsert? If you need to make sure they are done in order, you might want to use the Async library to step through your array.
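If you do want the strict "index, inserted, index, inserted" order without pulling in a library, the idea is to start the next operation only from the previous one's callback. A minimal sketch, where `processInOrder` is a made-up helper name (the Async library's `eachSeries` does a similar job):

```javascript
// Hypothetical helper: handles items one at a time, starting the next
// operation only after the previous callback has fired.
// Assumes `worker` calls its callback asynchronously; a worker that calls
// back synchronously would recurse once per item and could overflow the stack.
function processInOrder(items, worker, done) {
  var index = 0;
  function next(err) {
    if (err || index >= items.length) {
      return done(err || null); // stop on the first error or at the end
    }
    var current = index;
    index += 1;
    worker(items[current], current, next);
  }
  next();
}
```

In the question's code, the worker would log the index, call `Ticket.findOneAndUpdate`, and invoke the callback from inside the update's own callback. Note that this trades throughput for ordering, since only one request is in flight at a time.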
I'm trying to test which database that can be used from Node.js is the fastest at certain tasks.
I'm working with a few DBs here, and every time I try to time a for loop, the timer quickly ends, but the task wasn't actually finished. Here is the code:
console.time("Postgre_write");
for(var i = 0; i < limit; i++){
var worker = {
id: "worker" + i.toString(),
ps: Math.random(),
com_id: Math.random(),
status: (i % 3).toString()
}
client.query("INSERT INTO test(worker, password, com_id, status) values($1, $2, $3, $4)", [worker.id, worker.ps, worker.com_id, worker.status]);
}
console.timeEnd("Postgre_write");
When I tried a few hundred inserts, I thought the results were accurate. But when I test higher numbers like 100,000, the console outputs 500ms, while the pgAdmin app shows that the inserts are still being processed.
Obviously I got something wrong here, but I don't know what. I rely heavily on the timer data and I need to find a way to time these operations correctly. Help?
Node.js is asynchronous. This means that your code will not necessarily execute in the order that you have written it.
Take a look at this article http://www.i-programmer.info/programming/theory/6040-what-is-asynchronous-programming.html which offers a pretty decent explanation of asynchronous programming.
In your code, your query is sent to the database and begins to execute. Since Node is asynchronous, it does not wait for that line to finish executing, but instead goes to the next line (which stops your timer).
When your query has finished running, a callback will fire and notify Node that the query has finished. If you specify a callback function, it will be run at that point. Think of event-driven programming, with the query's completion being an event.
To solve your problem, stop the timer in the callback function of the query.
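Because the loop starts many queries, "the callback" really means the last callback to fire. One way to handle that is to count completions. A rough sketch, with `runAll` and `startOperation` being names invented for this example:

```javascript
// Hypothetical helper: starts `count` async operations and calls `done`
// only after every one of them has called back.
function runAll(count, startOperation, done) {
  var remaining = count;
  if (remaining === 0) {
    return done();
  }
  for (var i = 0; i < count; i++) {
    startOperation(i, function () {
      remaining -= 1;
      if (remaining === 0) {
        done(); // the last outstanding operation has finished
      }
    });
  }
}
```

Here you would call `console.time("Postgre_write")` before `runAll`, issue the insert inside `startOperation` with the supplied callback passed as the last argument to `client.query`, and call `console.timeEnd("Postgre_write")` inside `done`.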
I have a Worker in which I want to execute my sql queries. But, and that is my problem, I want all these queries to be executed within the same transaction. This is how I have my (not working) Worker right now:
db = openDatabase("WorkerFoo", "", "", 1);
if (db) {
db.transaction(function (tx) {
self.onmessage = function(e) {
tx.executeSql(e.data, [], function(tx, rs){
self.postMessage(rs.rows.item(0)) ;
}) ;
};
}) ;
}
else {
self.postMessage('No WebSql support in Worker') ;
}
However, doing it this way, nothing happens (no errors). Any suggestions on how to fix this?
Another (related) question I have: if a query is blocking the UI thread because it is heavy and takes a couple of seconds, will executing the query within a Worker fix this problem?
Thanks a lot!
To answer your questions:
The query should not block the UI thread, even if not executed in the web-worker, because it is async (assuming that the target computer has enough multi-threading power that is). JavaScript thrives on non-blocking asynchronous IO.
You can, for example, pass the SQL code itself to the worker, and have transactionStart and transactionEnd messages, and only execute the code after receiving a transactionEnd.
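The message-buffering idea could look roughly like this. The message shapes (`{type: 'transactionStart'}`, `{type: 'sql', sql: ...}`, `{type: 'transactionEnd'}`) and the name `createTransactionBuffer` are assumptions made for the sketch, not an existing protocol.

```javascript
// Hypothetical sketch: collect SQL statements between a transactionStart and
// a transactionEnd message, then hand them over in one go.
function createTransactionBuffer(runInTransaction) {
  var pending = null; // null means no transaction is currently open

  return function handleMessage(msg) {
    if (msg.type === 'transactionStart') {
      pending = [];
    } else if (msg.type === 'sql' && pending !== null) {
      pending.push(msg.sql);
    } else if (msg.type === 'transactionEnd' && pending !== null) {
      var statements = pending;
      pending = null;
      runInTransaction(statements); // execute everything together
    }
  };
}
```

In the worker, `self.onmessage` would pass `e.data` to the returned handler, and `runInTransaction` would open a single `db.transaction` and call `tx.executeSql` for each collected statement, so all of them share one transaction.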
Note that the WebSQL specification is no longer being worked on.
You might want to consider IndexedDB; its methods also return without blocking the calling thread. (Again, no web workers are needed. It does, however, have a synchronous version that you can use with Web Workers if you'd like, but I think no one implements it yet.)
Good luck!
My situation ...
I have a set of workers that are scheduled to run periodically, each at different intervals, and would like to find a good implementation to manage their execution.
Example: Let's say I have a worker that goes to the store and buys me milk once a week. I would like to store this job and its configuration in a MySQL table. But it seems like a really bad idea to poll the table (every second?) to see which jobs are ready to be put into the execution pipeline.
All of my workers are written in javascript, so I'm using node.js for execution and beanstalkd as a pipeline.
If new jobs (ie. scheduling a worker to run at a given time) are being created asynchronously and I need to store the job result and configuration persistently, how do I avoid polling a table?
Thanks!
I agree that it seems inelegant, but given the way that computers work something *somewhere* is going to have to do polling of some kind in order to figure out which jobs to execute when. So, let's go over some of your options:
Poll the database table. This isn't a bad idea at all - it's probably the simplest option if you're storing the jobs in MySQL anyway. A rate of one query per second is nothing - give it a try and you'll notice that your system doesn't even feel it.
Some ideas to help you scale this to possibly hundreds of queries per second, or just keep system resource requirements down:
Create a second table, 'job_pending', where you put the jobs that need to be executed within the next X seconds/minutes/hours.
Run queries on your big table of all jobs only once in a longer while, then populate the small table which you query every shorter while.
Remove jobs that were executed from the small table in order to keep it small.
Use an index on your 'execute_time' (or whatever you call it) column.
If you have to scale even further, keep the main jobs table in the database and use the second, smaller table I suggest, but put that table in RAM: either as a memory table in the DB engine, or in a queue of some kind in your program. Query the queue at extremely short intervals if you have to; it'll take some extreme use cases to cause any performance issues here.
The main issue with this option is that you'll have to keep track of jobs that were in memory but didn't execute, e.g. due to a system crash - more coding for you...
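As a rough illustration of the in-program queue variant (node.js flavoured, since that's what the question uses): `createPendingQueue`, `add`, `takeDue` and the `executeTime` field are all names made up for this sketch, and crash recovery is not handled.

```javascript
// Hypothetical in-memory pending queue, kept sorted by executeTime (ms).
function createPendingQueue() {
  var jobs = []; // ascending by executeTime

  return {
    // Insert a job at its sorted position.
    add: function (job) {
      var i = jobs.length;
      while (i > 0 && jobs[i - 1].executeTime > job.executeTime) {
        i -= 1;
      }
      jobs.splice(i, 0, job);
    },
    // Remove and return every job whose executeTime is at or before `now`.
    takeDue: function (now) {
      var count = 0;
      while (count < jobs.length && jobs[count].executeTime <= now) {
        count += 1;
      }
      return jobs.splice(0, count);
    },
    size: function () {
      return jobs.length;
    }
  };
}
```

A slow timer would refill the queue from the big jobs table, and a fast timer would call `takeDue(Date.now())` and push the returned jobs into the execution pipeline. Because the queue is sorted, the fast check only looks at the front of the array.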
Create a thread for each of a bunch of jobs (say, all jobs that need to execute in the next minute), and call thread.sleep(millis_until_execution_time) (or whatever, I'm not that familiar with node.js).
This option has the same problem as the in-memory option above: you have to keep track of job execution for crash recovery. It's also the most wasteful imo, since every sleeping job thread still takes system resources.
There may be additional options of course - I hope that others answer with more ideas.
Just realize that polling the DB every second isn't a bad idea at all. It's the most straightforward way imo (remember KISS), and at this rate you shouldn't have performance issues so avoid premature optimizations.
Why not have a Job object in node.js that's saved to the database?
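// Shape of a Job record; the comments give the intended storage types.
var Job = {
  id: 0,               // long
  task: '',            // String
  configuration: {},   // JSON
  dueDate: new Date(), // Date
  finished: false      // bit
};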
I would suggest you only store the id in RAM and leave all the other Job data in the database. When your timeout function finally runs it only needs to know the .id to get the other data.
var job = createJob(...); // create from async data somewhere.
job.save(); // save the job.
var id = job.id // only store the id in RAM
// ask the job to be run in the future.
setTimeout(function() {
  // load the job when you want to run it
  db.load(id, function(job) {
    // run it.
    run(job);
    // mark as finished
    job.finished = true;
    // save your finished = true state
    job.save();
  });
}, job.dueDate - Date.now()); // delay in ms until the job is due
// remove job from RAM now.
job = null;
If the server ever crashes, all you have to do is query all jobs that have [finished=false], load them into RAM and start the setTimeouts again.
If anything goes wrong you should be able to restart cleanly like this:
db.find("job", { finished: false }, function(jobs) {
  jobs.forEach(function(job) {
    var id = job.id;
    setTimeout(function() {
      // load the job when you want to run it
      db.load(id, function(job) {
        // run it.
        run(job);
        // mark as finished
        job.finished = true;
        // save your finished = true state
        job.save();
      });
    }, job.dueDate - Date.now()); // delay in ms until the job is due
    job = null;
  });
});