I have a Worker in which I want to execute my SQL queries. My problem is that I want all of these queries to be executed within the same transaction. This is my (non-working) Worker right now:
db = openDatabase("WorkerFoo", "", "", 1);
if (db) {
    db.transaction(function (tx) {
        self.onmessage = function(e) {
            tx.executeSql(e.data, [], function(tx, rs){
                self.postMessage(rs.rows.item(0)) ;
            }) ;
        };
    }) ;
}
else {
    self.postMessage('No WebSql support in Worker') ;
}
However, doing it this way, nothing happens (no errors). Any suggestions on how to fix this?
Another (related) question: if a query blocks the UI thread because it is heavy and takes a couple of seconds, will executing the query within a Worker fix this problem?
Thanks a lot!
To answer your questions:
The query should not block the UI thread, even if not executed in the web-worker, because it is async (assuming that the target computer has enough multi-threading power that is). JavaScript thrives on non-blocking asynchronous IO.
You can, for example, pass the SQL code itself to the worker, together with transactionStart and transactionEnd messages. The worker buffers the statements and only executes them, inside a single transaction, after receiving transactionEnd.
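The buffering idea could be sketched roughly as follows. The message shape ({ type, sql }) and the names createBatcher and runBatch are assumptions made for illustration, not part of any Worker or WebSQL API.

```javascript
// Collects SQL statements between "transactionStart" and "transactionEnd"
// messages, then hands the whole batch to `runBatch` in one go.
// The message shape ({ type, sql }) and `runBatch` are hypothetical.
function createBatcher(runBatch) {
  let pending = null; // null means no transaction is currently open

  return function handleMessage(msg) {
    switch (msg.type) {
      case "transactionStart":
        pending = [];
        break;
      case "query":
        if (pending === null) throw new Error("query outside a transaction");
        pending.push(msg.sql);
        break;
      case "transactionEnd": {
        if (pending === null) throw new Error("no transaction to end");
        const batch = pending;
        pending = null;
        runBatch(batch);
        break;
      }
      default:
        throw new Error("unknown message type: " + msg.type);
    }
  };
}

// Inside the worker this could be wired up roughly like so (WebSQL only):
//
//   const handle = createBatcher(function (batch) {
//     db.transaction(function (tx) {
//       batch.forEach(function (sql) {
//         tx.executeSql(sql, [], function (tx, rs) {
//           self.postMessage(rs.rows.item(0));
//         });
//       });
//     });
//   });
//   self.onmessage = function (e) { handle(e.data); };
```

Because every executeSql call is queued synchronously inside the db.transaction callback, the statements of one batch share a single transaction.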
Note that the WebSQL specification is no longer being maintained.
You might want to consider IndexedDB; its methods also return without blocking the calling thread. (Again, no web workers are needed. It does, however, have a synchronous version that you can use with Web Workers, but I don't think anyone implements it yet.)
Good luck!
I've mostly learned coding with OOP languages like Java.
I have a personal project where I want to import a bunch of plaintext into a MongoDB database. I thought I'd try to expand my horizons and do this using JavaScript running on Node.js.
I got the code working fine but I'm trying to figure out why it is executing the way it is.
The output from the console is:
1. done reading file
2. closing db
3. record inserted (n times)
var fs = require('fs'),
    readline = require('readline'),
    instream = fs.createReadStream(config.file),
    outstream = new (require('stream'))(),
    rl = readline.createInterface(instream, outstream);
rl.on('line', function (line) {
    var split = line.split(" ");
    _user = "#" + split[0];
    _text = "'" + split[1] + "'";
    _addedBy = config._addedBy;
    _dateAdded = new Date().toISOString();
    quoteObj = { user : _user , text : _text , addedby : _addedBy, dateadded : _dateAdded};
    db.collection("quotes").insertOne(quoteObj, function(err, res) {
        if (err) throw err;
        console.log("record inserted.");
    });
});
rl.on('close', function (line) {
    console.log('done reading file.');
    console.log('closing db.')
    db.close();
});
(full code is here: https://github.com/HansHovanitz/Import-Stuff/blob/master/importStuff.js)
When I run it I get the message 'done reading file' and 'closing db' and then all of the 'record inserted' messages. Why is that happening? Is it because of the delay in inserting a record in the db? The fact that I see 'closing db' first makes me think that the db would be getting closed and then how are the records being inserted still?
Just curious to know why the program is executing in this order for my own peace of mind. Thanks for any insight!
In short, it's because of the asynchronous nature of the I/O operations in the functions used, which is quite common in Node.js.
Here's what happens. First, the script reads all the lines of the file, and for each line initiates an insertOne() operation, supplying a callback for each of them. Note that the callback is called when the corresponding operation has finished, not in the middle of this process.
Eventually the script reaches the end of the input file, logs two messages, then invokes db.close(). Note that even though the 'insert' callbacks (which log the 'inserted' message) have not been called yet, the database interface has already received all the 'insert' commands.
Now the tricky part: whether the DB interface manages to store all the records (in other words, whether it waits until all insert operations have completed before closing the connection) depends on both the DB interface and its speed. If the write operation is fast enough (faster than reading a line of the file), you'll probably end up with all the records inserted; if not, you may lose some of them. That's why the safest bet is to close the database connection not in the file's close handler (when reading is complete) but in the insert callbacks (when writing is complete):
let linesCount = 0;
let eofReached = false;
rl.on('line', function (line) {
    ++linesCount;
    // parsing skipped for brevity
    db.collection("quotes").insertOne(quoteObj, function(err, res) {
        --linesCount;
        if (linesCount === 0 && eofReached) {
            db.close();
            console.log('database close');
        }
        // the rest skipped
    });
});
rl.on('close', function() {
    console.log('reading complete');
    eofReached = true;
});
This question describes a similar problem, along with several different approaches to solving it.
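One alternative to counting callbacks by hand is to collect a promise per insert and close the connection once all of them have settled. The sketch below uses a hypothetical stand-in collection (makeFakeCollection) so that it runs without a database; it assumes an insertOne() that returns a promise when no callback is passed, and it works on an array of lines rather than a readline stream.

```javascript
// Stand-in for a collection whose insertOne() returns a promise.
// (Hypothetical; it only imitates an asynchronous write.)
function makeFakeCollection(log) {
  return {
    insertOne: function (doc) {
      return new Promise(function (resolve) {
        setTimeout(function () {
          log.push("inserted " + doc.text);
          resolve({ acknowledged: true });
        }, 10);
      });
    }
  };
}

// Starts one insert per line and resolves only after every insert finished.
function importLines(lines, collection) {
  const inserts = lines.map(function (line) {
    return collection.insertOne({ text: line });
  });
  return Promise.all(inserts);
}

const log = [];
const done = importLines(["a", "b", "c"], makeFakeCollection(log)).then(function () {
  log.push("closing db"); // db.close() would go here
  return log;
});
```

With a readline stream, the same idea would mean pushing each insert promise onto an array in the 'line' handler and calling Promise.all on that array in the 'close' handler.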
Welcome to the world of asynchronicity. Inserting into the DB happens asynchronously. This means that the rest of your (synchronous) code will execute completely before this task is complete. Consider the simplest asynchronous JS function setTimeout. It takes two arguments, a function and a time (in ms) after which to execute the function. In the example below "hello!" will log before "set timeout executed" is logged, even though the time is set to 0. Crazy right? That's because setTimeout is asynchronous.
This is one of the fundamental concepts of JS and it's going to come up all the time, so watch out!
setTimeout(() => {
    console.log("set timeout executed")
}, 0)
console.log("hello!")
When you call db.collection("quotes").insertOne you're actually creating an asynchronous request to the database. A good hint that a function is asynchronous is that one (or more) of its parameters is a callback.
So the order you're running it is actually expected:
1. You instantiate rl
2. You bind your event handlers to rl
3. Your stream starts processing and calling your 'line' handler
4. Your 'line' handler opens asynchronous requests
5. Your stream ends and rl closes
...
4.5. Your asynchronous requests return and execute their callbacks
I labelled the callback execution as 4.5 because technically your requests can return at any time after step 4.
I hope this is a useful explanation. Most modern JavaScript relies heavily on asynchronous events, and it can be a little tricky to figure out how to work with them!
You're on the right track. The key is that the database calls are asynchronous. As the file is being read, it starts a bunch of async calls to the database. Since they are asynchronous, the program doesn't wait for them to complete at the time they are called. The file then closes. As the async calls complete, your callbacks run and the console.logs execute.
Your code reads lines and, immediately after each one, makes a call to the db; both are asynchronous processes. When the last line is read, the last request to the db is made, and it takes some time for this request to be processed and for the insertOne callback to be executed. Meanwhile, rl has done its job and triggers the close event.
I'm developing a Facebook app which searches for Facebook events near your position. The only way to do so is to search for all the place IDs in your zone and then, for each of those, check whether there is an event today. The problem I have is that the computation takes about 1-1:30 min, which is rather long. This is the code I use (it might not be the best, I know):
foreach (var item in allPlacesIds)
{
    RunOnUiThread (() =>loading.Text = string.Format ("Loading {0} possible events out of {1}",count,allPlacesIds.Count));
    string query = string.Format ("{0}?&fields=id,name,events.fields(id,name,description,start_time,attending_count,declined_count,maybe_count,noreply_count).since({1}).until({2})", item,dateNow,dateTomorrow);
    JsonObject result=(JsonObject)fb.Get (query, null);
    try
    {
        JsonArray allEvents= (JsonArray)((JsonObject) result ["events"])["data"];
        foreach (var events in allEvents)
        {
            Events theEvent= new Events(((JsonObject)events) ["id"].ToString(),
                ((JsonObject)events) ["name"].ToString(),
                ((JsonObject)events) ["description"].ToString(),
                ((JsonObject)events) ["start_time"].ToString(),
                int.Parse(((JsonObject)events) ["attending_count"].ToString()),
                int.Parse(((JsonObject)events) ["declined_count"].ToString()),
                int.Parse(((JsonObject)events) ["maybe_count"].ToString()),
                int.Parse(((JsonObject)events) ["noreply_count"].ToString()));
            todaysEvents.Add(theEvent);
        }
    }
    catch(Exception ex)
    {
    }
    count++;
}
Where the try starts I used to have an if but that made it take even longer so I replaced it with a try block, as the result comes as null.
I know this isn't exactly a technical issue, but I felt maybe you guys know a faster and better implementation of this. My only other option is to create and host a web service and use that just to query the data. The problem with that is that I would need to invest a lot of money in a server and a real IP, and then create a scheduled job to update the data daily.
Each API call takes some time; the only way to make it faster is to use Batch Requests. Here's the documentation about those: https://developers.facebook.com/docs/graph-api/making-multiple-requests
Keep in mind that a batch will not count as one API call. It's still the same number of calls, so be careful with API limits.
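As a rough illustration of what batching means here, the sketch below groups place IDs and builds one array of sub-requests per group, using the method / relative_url shape described in the linked documentation. It is written in JavaScript rather than the asker's C#; buildBatches, the field list and the group size are assumptions for illustration, and the maximum number of sub-requests per batch should be taken from the documentation.

```javascript
// Splits place IDs into groups and builds one `batch` array per group.
// Each entry describes a single GET sub-request.
function buildBatches(placeIds, fields, batchSize) {
  const batches = [];
  for (let i = 0; i < placeIds.length; i += batchSize) {
    const group = placeIds.slice(i, i + batchSize).map(function (id) {
      return { method: "GET", relative_url: id + "?fields=" + fields };
    });
    batches.push(group);
  }
  return batches;
}

// Three hypothetical place IDs, grouped two per batch.
const batches = buildBatches(["101", "102", "103"], "id,name", 2);
console.log(JSON.stringify(batches[0]));
```

Each group would then be sent as the JSON-encoded batch parameter of a single POST request, so the number of HTTP round trips drops from one per place to one per group.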
I'm trying to write back-end functionality that handles requests to a particular API, but this API has some restrictive quotas, especially for requests/sec. I want to create an API abstraction layer that is capable of delaying function execution if there are too many requests/s, so it works like this:
1. A new request arrives (to put it simply, a library method is invoked)
2. Check whether this request can be executed right now, according to the given limit (requests/s)
3. If it can't be executed, delay its execution until the next available moment
4. If a new request arrives in the meantime, delay its execution further or put it on some execution queue
I don't have any constraints in terms of waiting queue length. Requests are function calls with node.js callbacks as the last param for responding with data.
I thought of adding a delay to each request, equal to the smallest possible slot between requests (expressed as minimal milliseconds/request), but that can be a bit inefficient (functions are always delayed before a response is sent).
Do you know any library or simple solution that could provide me with such functionality?
Save the last request's timestamp.
Whenever you have a new incoming request, check whether the minimum interval has elapsed since then. If not, put the function in a queue, then schedule a job (unless one was already scheduled):
setTimeout(
    processItemFromQueue,
    lastTime + minInterval - Date.now() // ms until the next free slot; lastTime is a millisecond timestamp
)
processItemFromQueue takes a job from the front of the queue (shift) then reschedules itself unless the queue is empty.
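Put together, the approach might look like the minimal sketch below. createRateLimiter and submit are hypothetical names, and the now and schedule parameters exist only so the behaviour can be checked without real timers. For simplicity this version sends every job through the queue, including jobs that could run immediately (those get a delay of 0).

```javascript
// Runs queued jobs no more often than once every `minInterval` milliseconds.
// `now` and `schedule` are injectable only so the sketch can be tested
// without real timers; they default to Date.now and setTimeout.
function createRateLimiter(minInterval, now, schedule) {
  now = now || Date.now;
  schedule = schedule || setTimeout;
  const queue = [];
  let lastTime = -Infinity; // time the previous job was started
  let scheduled = false;    // true while a timer is pending

  function processItemFromQueue() {
    scheduled = false;
    const job = queue.shift();
    if (!job) return;
    lastTime = now();
    job();
    if (queue.length > 0) scheduleNext();
  }

  function scheduleNext() {
    if (scheduled) return;
    scheduled = true;
    const wait = Math.max(0, lastTime + minInterval - now());
    schedule(processItemFromQueue, wait);
  }

  // Call this with a zero-argument function that performs the request.
  return function submit(job) {
    queue.push(job);
    scheduleNext();
  };
}
```

A caller would wrap each API call, for example submit(function () { api.get(url, callback); }), where api.get stands for whatever request function is being throttled.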
The definitive answer to this problem (and the best one) came from the API documentation itself. We have used it for a couple of months and it solved my problem perfectly.
In such cases, instead of writing some complicated queue code, the best way is to leverage JavaScript's ability to handle asynchronous code and either write a simple backoff yourself or use one of the many great libraries to do so.
So, if you run into any API limits (e.g. quota, 5xx, etc.), you should use backoff to run the query again recursively, but with an increasing delay (more about backoff can be found here: https://en.wikipedia.org/wiki/Exponential_backoff). And if you still fail after a given number of attempts, gracefully return an error about the unavailability of the API.
Example use below (taken from https://www.npmjs.com/package/backoff):
var call = backoff.call(get, 'https://someaddress', function(err, res) {
    console.log('Num retries: ' + call.getNumRetries());
    if (err) {
        // Put your error handling code here.
        // Called ONLY IF backoff fails to help
        console.log('Error: ' + err.message);
    } else {
        // Put your success code here
        console.log('Status: ' + res.statusCode);
    }
});
/*
 * When to retry. Here - 503 error code returned from the API
 */
call.retryIf(function(err) { return err.status == 503; });
/*
 * This lib offers two strategies - Exponential and Fibonacci.
 * I'd suggest using the first one in most of the cases
 */
call.setStrategy(new backoff.ExponentialStrategy());
/*
 * Info how many times backoff should try to post request
 * before failing permanently
 */
call.failAfter(10);
// Triggers backoff to execute given function
call.start();
There are many backoff libraries for Node.js, offering Promise-style, callback-style or even event-style backoff handling (the example above is callback-style). They're really easy to use if you understand the backoff algorithm itself. And since the backoff parameters can be stored in config, they can be adjusted to achieve better results if backoff fails too often.
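For the "write a simple backoff yourself" route, a promise-based version might look like the sketch below. It is independent of the library shown above; backoffDelay, retryWithBackoff and the option names are made up for illustration, and sleep is injectable only so the logic can be exercised without waiting.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt, baseMs, maxMs) {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Calls `task` (a function returning a promise) until it succeeds or
// `maxRetries` retries have been used up. `sleep` is injectable for testing.
function retryWithBackoff(task, options) {
  const maxRetries = options.maxRetries;
  const sleep = options.sleep || function (ms) {
    return new Promise(function (resolve) { setTimeout(resolve, ms); });
  };

  function attempt(n) {
    return task().catch(function (err) {
      if (n >= maxRetries) throw err; // give up and report the failure
      const delay = backoffDelay(n, options.baseMs, options.maxMs);
      return sleep(delay).then(function () {
        return attempt(n + 1);
      });
    });
  }
  return attempt(0);
}
```

A real version would usually also check the error before retrying (for example, retry only on quota or 5xx responses), in the same spirit as retryIf in the library example.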
I'm trying to test which database that can be used from Node.js is the fastest at certain tasks.
I'm working with a few DBs here, and every time I try to time a for loop, the timer ends quickly even though the task hasn't actually finished. Here is the code:
console.time("Postgre_write");
for(var i = 0; i < limit; i++){
    var worker = {
        id: "worker" + i.toString(),
        ps: Math.random(),
        com_id: Math.random(),
        status: (i % 3).toString()
    }
    client.query("INSERT INTO test(worker, password, com_id, status) values($1, $2, $3, $4)", [worker.id, worker.ps, worker.com_id, worker.status]);
}
console.timeEnd("Postgre_write");
When I tried a few hundred inserts, I thought the results were accurate. But when I test higher numbers like 100,000, the console outputs 500ms, while the pgAdmin app shows that the inserts are still being processed.
Obviously I got something wrong here, but I don't know what. I rely heavily on the timer data and I need to find a way to time these operations correctly. Help?
Node.js is asynchronous. This means that your code will not necessarily execute in the order that you have written it.
Take a look at this article http://www.i-programmer.info/programming/theory/6040-what-is-asynchronous-programming.html which offers a pretty decent explanation of asynchronous programming.
In your code, your query is sent to the database and begins to execute. Since Node is asynchronous, it does not wait for the query to finish, but instead goes on to the next line (which, after the last iteration, stops your timer).
When your query has finished running, a callback will occur and notify Node that the query has finished. If you specify a callback function, it will be run at that point. Think of event-driven programming, with the query's completion being an event.
To solve your problem, stop the timer in the callback function of the query.
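Since the loop starts many queries, the timer has to stop in the callback of the last query to finish, not the first. A minimal sketch of that, assuming a query function that takes a Node-style callback as its last argument (fakeQuery below is a hypothetical stand-in, not the real client):

```javascript
// Stand-in for client.query(sql, params, callback); hypothetical, it just
// completes asynchronously after a short delay.
function fakeQuery(sql, params, callback) {
  setTimeout(function () { callback(null); }, 5);
}

// Starts `limit` queries and calls `onAllDone` with the elapsed milliseconds
// once the last callback has fired.
function timeInserts(query, limit, onAllDone) {
  const start = Date.now();
  let remaining = limit;
  for (let i = 0; i < limit; i++) {
    query("INSERT INTO test(worker) values($1)", ["worker" + i], function (err) {
      if (err) throw err;
      remaining -= 1;
      if (remaining === 0) onAllDone(Date.now() - start);
    });
  }
}

timeInserts(fakeQuery, 100, function (elapsedMs) {
  console.log("all inserts finished after " + elapsedMs + " ms");
});
```

The counter is needed because callbacks may fire in any order; only when it reaches zero have all inserts completed.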
My situation ...
I have a set of workers that are scheduled to run periodically, each at different intervals, and would like to find a good implementation to manage their execution.
Example: Let's say I have a worker that goes to the store and buys me milk once a week. I would like to store this job and its configuration in a MySQL table. But it seems like a really bad idea to poll the table (every second?) to see which jobs are ready to be put into the execution pipeline.
All of my workers are written in javascript, so I'm using node.js for execution and beanstalkd as a pipeline.
If new jobs (i.e. scheduling a worker to run at a given time) are being created asynchronously and I need to store the job result and configuration persistently, how do I avoid polling a table?
Thanks!
I agree that it seems inelegant, but given the way that computers work something *somewhere* is going to have to do polling of some kind in order to figure out which jobs to execute when. So, let's go over some of your options:
1. Poll the database table. This isn't a bad idea at all. It's probably the simplest option if you're storing the jobs in MySQL anyway. A rate of one query per second is nothing; give it a try and you'll notice that your system doesn't even feel it.
Some ideas to help you scale this to possibly hundreds of queries per second, or just to keep system resource requirements down:
Create a second table, 'job_pending', where you put the jobs that need to be executed within the next X seconds/minutes/hours.
Run queries on your big table of all jobs only at longer intervals, then populate the small table, which you query at shorter intervals.
Remove jobs that were executed from the small table in order to keep it small.
Use an index on your 'execute_time' (or whatever you call it) column.
2. If you have to scale even further, keep the main jobs table in the database and use the second, smaller table I suggested, but put that table in RAM: either as a memory table in the DB engine, or in a queue of some kind in your program. Query the queue at extremely short intervals if you have to; it would take some extreme use cases to cause any performance issues here.
The main issue with this option is that you'll have to keep track of jobs that were in memory but didn't execute, e.g. due to a system crash. More coding for you...
3. Create a thread for each of a bunch of jobs (say, all jobs that need to execute in the next minute), and call thread.sleep(millis_until_execution_time) (or whatever, I'm not that familiar with node.js).
This option has the same problem as no. 2: you have to keep track of job execution for crash recovery. It's also the most wasteful imo, since every sleeping job thread still takes system resources.
There may be additional options of course - I hope that others answer with more ideas.
Just realize that polling the DB every second isn't a bad idea at all. It's the most straightforward way imo (remember KISS), and at this rate you shouldn't have performance issues so avoid premature optimizations.
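A single polling step could be sketched as below. An in-memory array stands in for the small 'job_pending' table; pollOnce, runJob and the field names are assumptions for illustration, and a real version would run a query against an indexed execute_time column instead of filtering an array.

```javascript
// Picks every job whose execute_time has passed, runs it, and removes it
// from the pending list. `pending` stands in for the small 'job_pending'
// table; the field names are hypothetical.
function pollOnce(pending, now, run) {
  const due = pending.filter(function (job) {
    return job.execute_time <= now;
  });
  due.forEach(function (job) {
    run(job);
    pending.splice(pending.indexOf(job), 1);
  });
  return due.length; // number of jobs executed in this tick
}

// In a real program this would be driven by a timer, for example:
//   setInterval(function () { pollOnce(pending, Date.now(), runJob); }, 1000);
```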
Why not have a Job object in node.js that's saved to the database?
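// Shape of a job; the values are placeholders that show the intended types.
var Job = {
    id: 0,               // long
    task: "",            // String
    configuration: {},   // JSON
    dueDate: new Date(), // Date
    finished: false      // bit
};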
I would suggest you only store the id in RAM and leave all the other Job data in the database. When your timeout function finally runs it only needs to know the .id to get the other data.
var job = createJob(...); // create from async data somewhere.
job.save(); // save the job.
var id = job.id // only store the id in RAM
// ask the job to be run in the future.
// setTimeout takes the callback first, then the delay in milliseconds.
setTimeout(function() {
    // load the job when you want to run it
    db.load(id, function(job) {
        // run it.
        run(job);
        // mark as finished
        job.finished = true;
        // save your finished = true state
        job.save();
    });
}, job.dueDate - Date.now());
// remove job from RAM now.
job = null;
If the server ever crashes, all you have to do is query all jobs that have [finished=false], load them into RAM and start the setTimeouts again.
If anything goes wrong you should be able to restart cleanly like such:
db.find("job", { finished: false }, function(jobs) {
    each(jobs, function(job) {
        var id = job.id;
        // setTimeout takes the callback first, then the delay in milliseconds.
        setTimeout(function() {
            // load the job when you want to run it
            db.load(id, function(job) {
                // run it.
                run(job);
                // mark as finished
                job.finished = true;
                // save your finished = true state
                job.save();
            });
        }, job.dueDate - Date.now());
        job = null;
    });
});