Database Backed Work Queue - javascript

My situation ...
I have a set of workers that are scheduled to run periodically, each at different intervals, and would like to find a good implementation to manage their execution.
Example: Let's say I have a worker that goes to the store and buys me milk once a week. I would like to store this job and its configuration in a MySQL table. But it seems like a really bad idea to poll the table (every second?) to see which jobs are ready to be put into the execution pipeline.
All of my workers are written in javascript, so I'm using node.js for execution and beanstalkd as a pipeline.
If new jobs (i.e. scheduling a worker to run at a given time) are being created asynchronously and I need to store the job result and configuration persistently, how do I avoid polling a table?
Thanks!

I agree that it seems inelegant, but given the way that computers work something *somewhere* is going to have to do polling of some kind in order to figure out which jobs to execute when. So, let's go over some of your options:
1. Poll the database table. This isn't a bad idea at all - it's probably the simplest option if you're storing the jobs in MySQL anyway. A rate of one query per second is nothing - give it a try and you'll notice that your system doesn't even feel it.
Some ideas to help you scale this to possibly hundreds of queries per second, or just keep system resource requirements down:
Create a second table, 'job_pending', where you put the jobs that need to be executed within the next X seconds/minutes/hours.
Run queries on your big table of all jobs only once in a longer while, then populate the small table which you query every shorter while.
Remove jobs that were executed from the small table in order to keep it small.
Use an index on your 'execute_time' (or whatever you call it) column.
2. If you have to scale even further, keep the main jobs table in the database, and use the second, smaller table I suggest, but put that table in RAM: either as a memory table in the DB engine, or in a Queue of some kind in your program. Query the queue at extremely short intervals if you have to - it'll take some extreme use cases to cause any performance issues here.
The main issue with this option is that you'll have to keep track of jobs that were in memory but didn't execute, e.g. due to a system crash - more coding for you...
3. Create a thread for each of a bunch of jobs (say, all jobs that need to execute in the next minute), and call thread.sleep(millis_until_execution_time) (or whatever, I'm not that familiar with node.js).
This option has the same problem as no. 2 - you have to keep track of job execution for crash recovery. It's also the most wasteful imo - every sleeping job thread still takes system resources.
There may be additional options of course - I hope that others answer with more ideas.
Just realize that polling the DB every second isn't a bad idea at all. It's the most straightforward way imo (remember KISS), and at this rate you shouldn't have performance issues so avoid premature optimizations.
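For illustration, here is a minimal polling sketch of option 1. It assumes the mysql npm package and a 'job_pending' table with an indexed 'execute_time' column - both are assumptions, so adapt it to your actual schema and connection details:

var mysql = require('mysql');
// connection details are placeholders
var connection = mysql.createConnection({ host: 'localhost', user: 'app', database: 'jobs' });

setInterval(function() {
  connection.query(
    'SELECT * FROM job_pending WHERE execute_time <= NOW()',
    function(err, rows) {
      if (err) return console.error(err);
      rows.forEach(function(job) {
        // push the job into your execution pipeline (e.g. beanstalkd)
        // and delete it from job_pending so it isn't picked up twice
      });
    }
  );
}, 1000); // poll once per second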

Why not have a Job object in node.js that's saved to the database?
var Job = {
  id: long,
  task: String,
  configuration: JSON,
  dueDate: Date,
  finished: bit
};
I would suggest you only store the id in RAM and leave all the other Job data in the database. When your timeout function finally runs it only needs to know the .id to get the other data.
var job = createJob(...); // create from async data somewhere.
job.save(); // save the job.
var id = job.id; // only store the id in RAM
// ask the job to be run in the future.
setTimeout(function() {
  // load the job when you want to run it
  db.load(id, function(job) {
    // run it.
    run(job);
    // mark as finished
    job.finished = true;
    // save your finished = true state
    job.save();
  });
}, job.dueDate - Date.now());
// remove job from RAM now.
job = null;
If the server ever crashes, all you have to do is query all jobs that have [finished=false], load them into RAM and start the setTimeouts again.
If anything goes wrong you should be able to restart cleanly like such:
db.find("job", { finished: false }, function(jobs) {
  each(jobs, function(job) {
    var id = job.id;
    setTimeout(function() {
      // load the job when you want to run it
      db.load(id, function(job) {
        // run it.
        run(job);
        // mark as finished
        job.finished = true;
        // save your finished = true state
        job.save();
      });
    }, job.dueDate - Date.now());
    job = null;
  });
});


Non blocking Javascript and concurrency

I have code on a web worker, and because I can't post an object with methods (functions) to it, I don't know how to stop blocking the UI with this code:
if (data != 'null') {
  obj['backupData'] = obj.tbl.data().toArray();
  obj['backupAllData'] = data[0];
}
obj.tbl.clear();
obj.tbl.rows.add(obj['backupAllData']);
var ext = config.extension.substring(1);
$.fn.dataTable.ext.buttons[ext + 'Html5'].action(e, dt, button, config);
obj.tbl.clear();
obj.tbl.rows.add(obj['backupData']);
This code exports records from an HTML table. Data is an array returned from a web worker and can sometimes have 50k or more objects.
As obj and all the methods it contains are not transferable to the web worker, when the data length is 30k, 40k, 50k or even more, the UI blocks.
Which is the best way to do this?
Thanks in advance.
You could try wrapping the heavy work in an async function, like a timeout, to allow the engine to queue the whole logic and process it as soon as it has time:
setTimeout(function() {
  if (data != 'null') {
    obj['backupData'] = obj.tbl.data().toArray();
    obj['backupAllData'] = data[0];
  }
  // heavy stuff
}, 0);
Or, if the code is extremely long, you can try to figure out a strategy to split your code into chunks of operations and execute each chunk in a separate async function (timeout), as in the sketch below.
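A rough sketch of that chunking strategy - the chunk size, items and processItem are placeholders for your own data and per-row work:

// Process a large array in small chunks so the UI thread can breathe between chunks.
function processInChunks(items, processItem, chunkSize) {
  var index = 0;
  function next() {
    var end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      processItem(items[index]);
    }
    if (index < items.length) {
      setTimeout(next, 0); // yield back to the event loop, then continue
    }
  }
  next();
}

processInChunks(data[0], function(row) { /* add the row to the table */ }, 500);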
Best way to iterate over an array without blocking the UI
Update:
Sadly, ImmutableJS doesn't work at the moment across web workers. You should be able to transfer the ArrayBuffer so you don't need to parse it back into an array. Also read this article. If your workload is that heavy, it would be best to actually send back one item at a time from the worker.
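For the ArrayBuffer idea, a minimal sketch of a transferable post from the worker - the Float64Array and the worker variable are only illustrations, your data layout will differ:

// Inside the web worker: post the underlying buffer as a transferable so it is moved, not copied.
var values = new Float64Array(50000);
postMessage(values.buffer, [values.buffer]);

// On the main thread:
worker.onmessage = function(e) {
  var values = new Float64Array(e.data); // wrap the transferred buffer
};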
Previously:
The code is converting all the data into an array, which is immediately costly. Try returning an immutable data structure from the web worker if possible. This will guarantee that it doesn't change when the references change, and you can continue iterating over it slowly in batches.
The next thing you can do is to use requestIdleCallback to schedule small batches of items to be processed.
This way you should be able to make the UI breathe a bit.
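A small sketch of the requestIdleCallback batching - items and processItem stand in for your data and per-row work:

// Schedule small batches of work during the browser's idle periods.
function processWhenIdle(items, processItem) {
  var index = 0;
  function work(deadline) {
    while (index < items.length && deadline.timeRemaining() > 0) {
      processItem(items[index++]);
    }
    if (index < items.length) {
      requestIdleCallback(work); // more to do: schedule another idle batch
    }
  }
  requestIdleCallback(work);
}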

nedb method update and delete creates a new entry instead of updating the existing one

I'm using nedb and I'm trying to update an existing record by matching its ID and changing a title property.
What happens is that a new record gets created, and the old one is still there.
I've tried several combinations, and tried googling for it, but the search results are scarce.
var Datastore = require('nedb');
var db = {
  files: new Datastore({ filename: './db/files.db', autoload: true })
};
db.files.update(
  {_id: id},
  {$set: {title: title}},
  {},
  callback
);
What's even crazier is that when performing a delete, a new record gets added again, but this time the record has a weird property:
{"$$deleted":true,"_id":"WFZaMYRx51UzxBs7"}
This is the code that I'm using:
db.files.remove({_id: id}, callback);
The nedb docs say the following:
localStorage has size constraints, so it's probably a good idea to set
recurring compaction every 2-5 minutes to save on space if your client
app needs a lot of updates and deletes. See database compaction for
more details on the append-only format used by NeDB.
 
Compacting the database
Under the hood, NeDB's persistence uses an append-only format, meaning
that all updates and deletes actually result in lines added at the end
of the datafile. The reason for this is that disk space is very cheap
and appends are much faster than rewrites since they don't do a seek.
The database is automatically compacted (i.e. put back in the
one-line-per-document format) everytime your application restarts.
You can manually call the compaction function with
yourDatabase.persistence.compactDatafile which takes no argument. It
queues a compaction of the datafile in the executor, to be executed
sequentially after all pending operations.
You can also set automatic compaction at regular intervals with
yourDatabase.persistence.setAutocompactionInterval(interval), interval
in milliseconds (a minimum of 5s is enforced), and stop automatic
compaction with yourDatabase.persistence.stopAutocompaction().
Keep in mind that compaction takes a bit of time (not too much: 130ms
for 50k records on my slow machine) and no other operation can happen
when it does, so most projects actually don't need to use it.
I haven't used this, but it seems it uses localStorage and an append-only format for the update and delete methods.
When I investigated its source code, in the persistence tests they check for the $$deleted key, and they mention `If a doc contains $$deleted: true, that means we need to remove it from the data`.
So, in my opinion, you can try compacting the db manually, or the automatic compaction at regular intervals mentioned above can be useful.
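For reference, a small sketch of both compaction options from the quoted docs, applied to the db object from the question:

// One-off manual compaction (queued after all pending operations):
db.files.persistence.compactDatafile();

// Or compact automatically, e.g. every 5 minutes:
db.files.persistence.setAutocompactionInterval(5 * 60 * 1000);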

Parsing a large JSON array in Javascript

I'm supposed to parse a very large JSON array in JavaScript. It looks like:
mydata = [
{'a':5, 'b':7, ... },
{'a':2, 'b':3, ... },
.
.
.
]
Now the thing is, if I pass this entire object to my parsing function parseJSON(), then of course it works, but it blocks the tab's process for 30-40 seconds (in case of an array with 160000 objects).
During this entire process of requesting this JSON from a server and parsing it, I'm displaying a 'loading' gif to the user. Of course, after I call the parse function, the gif freezes too, leading to a bad user experience. I guess there's no way to get around this time, but is there a way to somehow (at least) keep the loading gif from freezing?
Something like calling parseJSON() on chunks of my JSON every few milliseconds? I'm unable to implement that though, being a noob in JavaScript.
Thanks a lot, I'd really appreciate if you could help me out here.
You might want to check this link. It's about multithreading.
Basically :
var url = 'http://bigcontentprovider.com/hugejsonfile';
var f = "(function() {" +
        "  send = function(e) {" +
        "    postMessage(e);" +
        "    self.close();" +
        "  };" +
        "  importScripts('" + url + "?format=json&callback=send');" +
        "})();";
var _blob = new Blob([f], { type: 'text/javascript' });
_worker = new Worker(window.URL.createObjectURL(_blob));
_worker.onmessage = function(e) {
  // Do what you want with your JSON
};
_worker.postMessage(null); // not strictly needed: the worker starts running via importScripts above
Haven't tried it myself to be honest...
EDIT about portability: Sebastien D. posted a comment with a link to mdn. I just added a ref to the compatibility section id.
I have never encountered a complete page lockdown of 30-40 seconds; I'm almost impressed! Restructuring your data to be much smaller, or splitting it into many files on the server side, is the real answer. Do you actually need every little byte of the data?
Alternatively, if you can't change the file, @Cyrill_DD's answer of a worker thread will be able to parse the data for you and send it to your primary JS. This is not a perfect fix, as you would guess, though. Passing data between the two threads requires the information to be serialised and reinterpreted, so you could see a significant slowdown when the data is passed between the threads, and be back to square one again if you try to pass it all across at once. Building a query system into your worker thread for requesting chunks of the data when you need them, and using the message callback, will prevent slowdown from parsing on the main thread and allow you complete access to the data without loading it all into your main context.
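Here is a rough sketch of what such a query system could look like. The message shape, the worker file name and renderChunk are all made up for illustration:

// main.js - ask the worker for one chunk at a time instead of the whole array
var worker = new Worker('parser-worker.js');
worker.onmessage = function(e) {
  renderChunk(e.data.items); // your own rendering code
  if (!e.data.done) {
    worker.postMessage({ cmd: 'next', size: 1000 }); // request the next chunk
  }
};
worker.postMessage({ cmd: 'next', size: 1000 });

// parser-worker.js - assumes `parsed` is filled once the JSON has been fetched and parsed here
var parsed = [];
var offset = 0;
onmessage = function(e) {
  var chunk = parsed.slice(offset, offset + e.data.size);
  offset += chunk.length;
  postMessage({ items: chunk, done: offset >= parsed.length });
};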
I should add that worker threads are relatively new; main browser support is good but mobile is terrible... just a heads up!

Node js for loop timer

I'm trying to test which database, usable from Node.js, has the fastest time for certain tasks.
I'm working with a few DBs here, and every time I try to time a for loop, the timer ends quickly, but the task isn't actually finished. Here is the code:
console.time("Postgre_write");
for (var i = 0; i < limit; i++) {
  var worker = {
    id: "worker" + i.toString(),
    ps: Math.random(),
    com_id: Math.random(),
    status: (i % 3).toString()
  };
  client.query("INSERT INTO test(worker, password, com_id, status) values($1, $2, $3, $4)", [worker.id, worker.ps, worker.com_id, worker.status]);
}
console.timeEnd("Postgre_write");
When I tried a few hundred, I thought the results were accurate. But when I test higher numbers like 100,000, the console outputs 500ms, yet when I check the pgAdmin app, the inserts are still processing.
Obviously I got something wrong here, but I don't know what. I rely heavily on the timer data and I need to find a way to time these operations correctly. Help?
Node.js is asynchronous. This means that your code will not necessarily execute in the order that you have written it.
Take a look at this article http://www.i-programmer.info/programming/theory/6040-what-is-asynchronous-programming.html which offers a pretty decent explanation of asynchronous programming.
In your code, your query is sent to the database and begins to execute. Since Node is asynchronous, it does not wait for that line to finish executing, but instead goes to the next line (which stops your timer).
When your query is finished running, a callback will occur and notify Node that the query has finished. If you specify a callback function it will be run at that point. Think of event-driven programming, with the query's completion being an event.
To solve your problem, stop the timer in the callback function of the query.
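For example, here is a sketch using the callback form of client.query, counting down until every insert has called back. It is an illustration of the idea, not a drop-in replacement:

console.time("Postgre_write");
var pending = limit;
for (var i = 0; i < limit; i++) {
  client.query(
    "INSERT INTO test(worker, password, com_id, status) values($1, $2, $3, $4)",
    ["worker" + i, Math.random(), Math.random(), (i % 3).toString()],
    function(err) {
      if (err) console.error(err);
      if (--pending === 0) {
        console.timeEnd("Postgre_write"); // all inserts have actually completed
      }
    }
  );
}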

Node.js Kue how to restart failed jobs

I am using kue for delayed jobs in my node.js application.
I have some problems figuring out how I can restart a job using the API of kue without having to move the id of a job manually from the list of failed jobs to the list of inactive jobs using redis commands.
Is this possible using kue?
I don't want to set a fixed number of retry attempts - I just want to retry specific jobs.
Suggestions for a well maintained alternative to kue are also welcome.
I don't know if this works, but you could try to reset the state of the job to inactive and save it again:
job.on('failed', function() {
  job.state('inactive').save();
});
Edit: setting state to inactive will correctly re-enqueue the task.
This can also be done using queue level events.
queue.on('job failed', function(id, result) {
  kue.Job.get(id, function(err, job) {
    if (!err && shouldRetry(job)) {
      job.state('inactive').save();
    }
  });
});
Thus you don't need to do this for every job that you wish to retry. Instead you can filter it in the queue-level event.
See Failure Attempts in the official docs:
By default jobs only have one attempt, that is when they fail, they
are marked as a failure, and remain that way until you intervene.
However, Kue allows you to specify this, which is important for jobs
such as transferring an email, which upon failure, may usually retry
without issue. To do this invoke the .attempts() method with a number.
queue.create('email', {
    title: 'welcome email for tj'
  , to: 'tj@learnboost.com'
  , template: 'welcome-email'
}).priority('high').attempts(5).save();
reference: failure attempts
