I have a Node.js script which creates dynamic tables and views for the temperature recorded each day. Sometimes it does not create the tables if the temperature is not in the normal range. To handle this I decided to use try/catch and call the function recursively. I am not sure if I have done it correctly, or if there is another way to call the con.query method so that the tables get created. This is the first time I have run into this problem in Node.js.
To start with, you have to detect errors and only recurse when there are specific error conditions. If the problem you're trying to solve is one specific error, then you should probably detect that specific error and only repeat the operation when you get that precise error.
Then, some other recommendations for retrying:
Retry only a fixed number of times. It's a sysop's nightmare when some server code gets stuck in a loop banging away over and over on something and just getting the same error every time.
Retry only on certain conditions.
Log every error so you or someone running your server can troubleshoot when something is wrong.
Retry only after some delay.
If you're going to retry more than a few times, then implement a back-off delay so it gets longer and longer between retries.
Here's the general idea for some code to implement retries:
const maxRetries = 5;
const retryDelay = 500;

function execute_query(query, callback) {
    let retryCntr = 0;

    function run() {
        con.query(query, function(err, result, fields) {
            // isRetryableError() is a placeholder for your own check that
            // decides whether this particular error is worth retrying
            if (err && isRetryableError(err)) {
                ++retryCntr;
                if (retryCntr <= maxRetries) {
                    console.log('Retrying after error: ', err);
                    setTimeout(run, retryDelay);
                } else {
                    // too many retries, communicate back error
                    console.log(err);
                    callback(err);
                }
            } else if (err) {
                // non-retryable error, communicate back error
                console.log(err);
                callback(err);
            } else {
                // success, communicate back result
                callback(null, result, fields);
            }
        });
    }
    run();
}
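You would then call it like any other callback-style function, assuming con is your existing mysql connection (the table definition below is just an illustrative placeholder):

// usage sketch: the table name and columns are hypothetical
execute_query('CREATE TABLE IF NOT EXISTS temperature_today (reading FLOAT)', function(err, result) {
    if (err) {
        console.error('query ultimately failed:', err);
        return;
    }
    console.log('query succeeded:', result);
});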
The idea behind retries and back-offs, if you're going to do lots of retries, is that naive retry algorithms can lead to what are called avalanche failures. The system gets a little slow or a little too busy and starts producing a few errors. Your code then retries over and over, which creates more load, which leads to more errors, so more code starts to retry, and the whole thing eventually fails with lots of code looping and retrying in what is called an avalanche failure.

So, instead, when there's an error you have to make sure you don't inadvertently overwhelm the system and make things worse. That's why you implement a short delay, why you cap the number of retries, and why you may even implement a back-off algorithm that makes the delay between retries longer each time. All of this allows a system that has some error-causing perturbation to eventually recover on its own rather than being pushed to the point where everything fails.
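As a minimal sketch, the fixed retryDelay above could be turned into an exponential back-off like this (the 500ms base and the doubling factor are just illustrative values):

const baseDelay = 500;
const maxRetries = 5;

function backoffDelay(retryCntr) {
    // 500ms, 1000ms, 2000ms, 4000ms, ... doubled on every retry
    return baseDelay * Math.pow(2, retryCntr - 1);
}

// inside the retry branch you would then schedule the next attempt with:
//     setTimeout(run, backoffDelay(retryCntr));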
Related
I have a bullCollections setup where I save some information about messages, for example:
try {
    let bullPayload = {
        type: 'message',
        payload: {
            messsages: messagesForPreProcessingData,
            sessionID: this.data['sessionID'],
            socketID: parseInt(this.data['socketID']),
        },
    };
    await bullConnections[accumulatorQueue].add(bullPayload, {
        removeOnComplete: true,
    });
} catch (error) {
    logErrors(error);
}
The code works fine, but I was asked to change the logic here. According to some statistics, the messages take too long to be shown to some users (because of bad wifi), so I came up with a solution: the front end calculates the time taken from server to client, and if that took longer than 400ms a new request is sent so that the backend knows the messages took a long time to load.
I made a timeout like this
saveBullPayloadWithTimeout(key, timeDuration, bullPayLoad, messages, events) {
    let redis = this.data.dbRedisConfigur.dataRedis;
    return new Promise((resolve, reject) => {
        setTimeout(() => {
            // after the delay, check whether a second request already flagged this key
            redisConnections[redis].get(key, (err, result) => {
                if (err) {
                    reject(err);
                } else {
                    if (result) {
                        // a second request arrived: drop the message and clean up the key
                        redisConnections[redis].del(key, (err, result) => {
                            if (result == 1) {
                            } else {
                                logErrors({ message: 'CANNOT DELETE KEY' });
                            }
                        });
                    } else {
                        // no second request: queue the message as usual
                        bullConnections[accumulator].add(bullPayLoad, {
                            removeOnComplete: true,
                        });
                        console.log('AFTER');
                        this.updateCurrentMessages(messages, events);
                    }
                }
            });
        }, timeDuration);
    });
}
So this piece of code should wait for 5 seconds so that it knows whether to insert the message or not. During those 5 seconds the backend waits for a second request; if one has been made it saves data to Redis, and after the 5 seconds it checks that data: if it exists the message is not saved, otherwise it is.
Does timeout affect the performance, because the backend will handle millions of users?
Is there any better way to separate this as a background process?
Does timeout affect the performance, because the backend will handle millions of users?
Timeouts, themselves, probably won't affect performance much. But your specific use of them will, because all it does is delay the work by timeDuration and then run it on the main thread anyway, and you don't have anything in there (as far as I can tell) to cancel a previous timer if a subsequent request is made that supersedes it.
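A minimal sketch of that cancellation idea, assuming you key the pending timers by something like the sessionID (the Map name and the key choice are just illustrative):

// pendingTimers maps a session key to its scheduled timer (illustrative)
const pendingTimers = new Map();

function scheduleSave(key, timeDuration, doSave) {
    // cancel any earlier timer for the same key so only the latest one fires
    const existing = pendingTimers.get(key);
    if (existing) {
        clearTimeout(existing);
    }
    const timer = setTimeout(() => {
        pendingTimers.delete(key);
        doSave();
    }, timeDuration);
    pendingTimers.set(key, timer);
}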
Is there any better way to separate this as a background process?
setTimeout doesn't do its work as a background process. It doesn't even do it on a background thread. It's done on the same main thread that scheduled the timer. Using setTimeout just delays starting the work, it doesn't make the work happen on a different thread.
If you want something done in a different process, you'll need to spawn a child process.
If you want something done on a different thread, you'll need to spawn a worker thread.
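For example, here is a minimal worker-thread sketch using Node's built-in worker_threads module; the file name save-worker.js and the payload shape are assumptions, and the two snippets live in separate files:

// main.js
const { Worker } = require('worker_threads');

function saveInBackground(bullPayload) {
    const worker = new Worker('./save-worker.js', { workerData: bullPayload });
    worker.on('error', (err) => console.error('worker failed:', err));
    worker.on('exit', (code) => {
        if (code !== 0) console.error('worker exited with code', code);
    });
}

// save-worker.js
const { workerData } = require('worker_threads');
// ... do the Redis/Bull work with workerData here ...

Note, though, that for I/O-bound work like Redis and Bull calls a worker thread usually buys you little; it mainly helps when CPU-heavy work is blocking the event loop.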
I'm currently running a script with the Python SDK which programmatically bulk upserts 1.5 million documents into a collection in Azure Cosmos DB. I've been using the bulk import sproc from the samples provided in the github repo: https://github.com/Azure/azure-cosmosdb-js-server/tree/master/samples/stored-procedures, the only change being that I've swapped collection.createDocument with collection.upsertDocument. I'll include my sproc in full below.

The stored procedure does run successfully - it upserts documents consistently and relatively quickly. But that only holds until around 30% progress, when this error is thrown:
CosmosHttpResponseError: (RequestTimeout) Message: {"Errors":["The requested operation exceeded maximum alloted time. Learn more: https://aka.ms/cosmosdb-tsg-service-request-timeout"]}
ActivityId: 9f2357c6-918c-4b67-ba20-569034bfde6f, Request URI: /apps/4a997bdb-7123-485a-9808-f952db2b7e52/services/a7c137c6-96b8-4b53-a20c-b9577981b353/partitions/305a8287-11d1-43f8-be1f-983bd4c4a63d/replicas/132488328092882514p/, RequestStats:
RequestStartTime: 2020-11-03T23:43:59.9158203Z, RequestEndTime: 2020-11-03T23:44:05.3858559Z, Number of regions attempted:1
ResponseTime: 2020-11-03T23:44:05.3858559Z, StoreResult: StorePhysicalAddress: rntbd://cdb-ms-prod-centralus1-fd22.documents.azure.com:14354/apps/4a997bdb-7123-485a-9808-f952db2b7e52/services/a7c137c6-96b8-4b53-a20c-b9577981b353/partitions/305a8287-11d1-43f8-be1f-983bd4c4a63d/replicas/132488328092882514p/, LSN: -1, GlobalCommittedLsn: -1, PartitionKeyRangeId: , IsValid: False, StatusCode: 408, SubStatusCode: 0, RequestCharge: 0, ItemLSN: -1, SessionToken: , UsingLocalLSN: False, TransportException: null, ResourceType: StoredProcedure, OperationType: ExecuteJavaScript, SDK: Microsoft.Azure.Documents.Common/2.11.0
Is there a way to add some retry logic or to extend the timeout period for bulk upserts? I believe the section of code in the sproc below if (!isAccepted) getContext().getResponse().setBody(count); is supposed to help with this scenario but it doesn't seem to work in my case.
Bulk upsert stored procedure in Javascript:
function bulkUpsert(docs) {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();

    // The count of imported docs, also used as current doc index.
    var count = 0;

    // Validate input.
    if (!docs) throw new Error("The array is undefined or null.");

    var docsLength = docs.length;
    if (docsLength == 0) {
        getContext().getResponse().setBody(0);
        return;
    }

    // Call the CRUD API to create a document.
    tryCreate(docs[count], callback);

    // Note that there are 2 exit conditions:
    // 1) The upsertDocument request was not accepted.
    //    In this case the callback will not be called, we just call setBody and we are done.
    // 2) The callback was called docs.length times.
    //    In this case all documents were created and we don't need to call tryCreate anymore.
    //    Just call setBody and we are done.
    function tryCreate(doc, callback) {
        var isAccepted = collection.upsertDocument(collectionLink, doc, callback);
        // If the request was accepted, callback will be called.
        // Otherwise report current count back to the client,
        // which will call the script again with remaining set of docs.
        // This condition will happen when this stored procedure has been running too long
        // and is about to get cancelled by the server. This will allow the calling client
        // to resume this batch from the point we got to before isAccepted was set to false.
        if (!isAccepted) {
            getContext().getResponse().setBody(count);
        }
    }

    // This is called when collection.upsertDocument is done and the document has been persisted.
    function callback(err, doc, options) {
        if (err) throw err;
        // One more document has been inserted, increment the count.
        count++;
        if (count >= docsLength) {
            // If we have created all documents, we are done. Just set the response.
            getContext().getResponse().setBody(count);
        } else {
            // Create next document.
            tryCreate(docs[count], callback);
        }
    }
}
I think the problem may lie in the stored procedure rather than the Python script; if that isn't the case, though, I can provide my Python script. Any help on this would be massively appreciated, it's been a head scratcher for me for days now!
Extra Info:
Throughput = 10,000, partition upsert size ~ 1.9MB consistently.
If anyone else has this problem, the workaround I've used is to temporarily increase the throughput to 100,000 instead of 10,000 while the bulk upsert operation is underway. The error doesn't occur if you use that bulk upsert stored procedure in conjunction with a sufficiently high throughput. I think the timeout was happening once the bulk upsert operation had covered around 30% of the 1.5 million records, likely because the throughput wasn't divided sufficiently between partitions and it was causing a bottleneck. I may have to assign a greater throughput to my container again once it is used in practice, or maybe I'll be able to reduce it to save costs. Either way, the code to do this is quite simple, just the method below:
new_throughput = 10000; container.replace_throughput(new_throughput)
Stored procedures have a bounded execution time of 5 seconds. However, you can write your stored procedure to handle bounded execution by checking a boolean return value, and then use the count of items inserted in each invocation of the stored procedure to track and resume progress across batches. There is an example here.
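A rough sketch of that resume loop from the client side, written here with the JavaScript @azure/cosmos SDK rather than the Python SDK the question uses (the container object, partitionKeyValue, and the sproc id 'bulkUpsert' are assumptions):

// keep re-invoking the sproc with the documents that are still left,
// using the returned count to advance through the array
async function bulkUpsertAll(container, partitionKeyValue, docs) {
    let done = 0;
    while (done < docs.length) {
        const remaining = docs.slice(done);
        const { resource: inserted } = await container.scripts
            .storedProcedure('bulkUpsert')
            .execute(partitionKeyValue, [remaining]);
        if (!inserted) {
            // nothing was accepted this round; back off briefly before retrying
            await new Promise((r) => setTimeout(r, 500));
            continue;
        }
        done += inserted;
    }
    return done;
}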
I am trying to write a javascript file in express to talk to a postgresql database. More precisely, I want to write a function that takes SQL as an input parameter and returns the stringified json. I can assume memory is not an issue given these table sizes. This is paid work making an internal use tool for a private business.
My most recent attempt involved the query callback putting the value into a global variable, but even that still fails because the outermost function returns before the json string is defined. Here is the relevant code:
var dbjson;

function callDB(q) {
    pg.connect(connectionString, function(err, client, done) {
        if (err) {
            console.error('error fetching client from pool', err);
        } else {
            client.query(q, [], function(err, result) {
                client.query('COMMIT');
                done();
                if (err) {
                    console.error('error calling query ' + q, err);
                } else {
                    dbjson = JSON.stringify(result.rows);
                    console.log('1 ' + dbjson);
                }
                console.log('2 ' + dbjson);
            });
            console.log('3 ' + dbjson);
        }
        console.log('4 ' + dbjson);
    });
    console.log('5 ' + dbjson);
}
The SQL in my test is "select id from users".
The relevant console output is:
5 undefined
GET /db/readTable?table=users 500 405.691 ms - 1671
3 undefined
4 undefined
1 [{"id":1},{"id":2},{"id":3},{"id":4}]
2 [{"id":1},{"id":2},{"id":3},{"id":4}]
Why do the console logs occur in the order that they do?
They are consistent in the order.
I attempted to write a polling loop that waits for the global variable to be set, using setTimeout in the caller and clearing the timeout within the callback, but that failed, I think, because JavaScript is single threaded and my loop did not allow other activity to proceed. Perhaps I was doing that wrong.
While I know I could have each function handle its own database connection and error logging, I really hate repeating the same code.
What is a better way to do this?
I am relatively new to express and javascript but considerably more experienced with other languages.
Presence of the following line will break everything for you:
client.query('COMMIT');
You are trying to execute an asynchronous command in a synchronous manner, and you are calling done(), releasing the connection, before that query gets a chance to execute. The result of such invalid disconnection would be unpredictable, especially since you are not handling any error in that case.
And why are you calling COMMIT there in the first place? That in itself looks invalid: COMMIT is used to close the current transaction, but you never open one there, so there is nothing to commit.
There is a bit of misunderstanding there, both about asynchronous code usage and about the database. If you want a good start at both, I would suggest having a look at pg-promise.
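As a rough sketch of a better shape for callDB, assuming the same pg pool setup as in the question, you can hand the result back through a callback instead of a global variable; the caller then uses the data inside that callback rather than right after the call returns:

function callDB(q, callback) {
    pg.connect(connectionString, function(err, client, done) {
        if (err) {
            return callback(err);
        }
        client.query(q, [], function(err, result) {
            done(); // release the client back to the pool
            if (err) {
                return callback(err);
            }
            callback(null, JSON.stringify(result.rows));
        });
    });
}

// usage: the JSON is only available inside the callback
callDB('select id from users', function(err, dbjson) {
    if (err) return console.error('query failed', err);
    console.log(dbjson);
});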
One of my cloud functions is timing out occasionally. It seems to have trouble with counting, although there are only around 700 objects in the class. I would appreciate any tips on how to debug this issue.
The cloud function works correctly most of the time.
Example error logged:
E2015-02-03T02:21:41.410Z] v199: Ran cloud function GetPlayerWorldLevelRank for user xl8YjQElLO with:
Input: {"levelID":60}
Failed with: PlayerWorldLevelRank first count error: Request timed out
Is there anything that looks odd in the code below? The time out error is usually thrown in the second count (query3), although sometimes it times out in the first count (query2).
Parse.Cloud.define("GetPlayerWorldLevelRank", function(request, response) {
    var query = new Parse.Query("LevelRecords");
    query.equalTo("owner", request.user);
    query.equalTo("levelID", request.params.levelID);
    query.first().then(function(levelRecord) {
        if (levelRecord === undefined) {
            response.success(null);
        }
        // if player has a record, work out his ranking
        else {
            var query2 = new Parse.Query("LevelRecords");
            query2.equalTo("levelID", request.params.levelID);
            query2.lessThan("timeSeconds", levelRecord.get("timeSeconds"));
            query2.count({
                success: function(countOne) {
                    var numPlayersRankedHigher = countOne;
                    var query3 = new Parse.Query("LevelRecords");
                    query3.equalTo("levelID", request.params.levelID);
                    query3.equalTo("timeSeconds", levelRecord.get("timeSeconds"));
                    query3.lessThan("bestTimeUpdatedAt", levelRecord.get("bestTimeUpdatedAt"));
                    query3.count({
                        success: function(countTwo) {
                            numPlayersRankedHigher += countTwo;
                            var playerRanking = numPlayersRankedHigher + 1;
                            levelRecord.set("rank", playerRanking);
                            // The SDK doesn't allow an object that has been changed to be serialized into a response.
                            // This would disable the check and allow you to return the modified object.
                            levelRecord.dirty = function() { return false; };
                            response.success(levelRecord);
                        },
                        error: function(error) {
                            response.error("PlayerWorldLevelRank second count error: " + error.message);
                        }
                    });
                },
                error: function(error) {
                    response.error("PlayerWorldLevelRank first count error: " + error.message);
                }
            });
        }
    });
});
I don't think the issue is in your code. As the error message states, the request times out: either the Parse API doesn't respond within the timeout period, or the network causes the timeout. As soon as you call .count an API request is made, and that request either can't connect or times out.
Apparently more people have this issue: https://www.parse.com/questions/ios-test-connectivity-to-parse-and-timeout-question. It doesn't seem possible to increase the timeout, so the suggestion in this post states:
For that reason, I suggest setting a NSTimer prior to executing the query, and invalidating it when the query returns. If the NSTimer fires before being invalidated, ask the user if they want to keep waiting for the results to come back, or show them a message indicating that the request is taking a long time to complete. This gives the user the chance to wait more if they know their current network conditions are not ideal.
When you are dealing with networks, and especially on the mobile platform, you need to be prepared for network hiccups. So, like the post suggests: offer the user the option to try again.
I have a routine that polls a database to look for work, and if it finds work there, it should execute it. It can only execute 1 (one) work order at a time, and this work-order could take anywhere from 5 seconds to several minutes to run. During this time it should not poll the database for more work, but wait until the current work is done.
I was thinking of using setTimeout to accomplish this, by doing the work in the timeout-event, and setting a new timeout at the end of the function. But I don't know if this is the best way to do it. Is there a "best practice" for these things?
It's an OK way to do it! Here's some code that may help:
(function poll () {
    fetchJob(onJob);

    function onJob (err, job) {
        if (err) throw err;
        if (job) return execute(job, poll);
        setTimeout(poll, 1000);
    }
}());
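Here fetchJob and execute stand in for your own database lookup and work-order runner; the key point is that poll is only scheduled again once execute calls its completion callback, so no new polling happens while a work order is running. A hypothetical shape (db, the SQL text, and runWorkOrder are all assumptions):

// hypothetical stubs just to show the expected callback shapes
function fetchJob(callback) {
    // query the database for one pending work order;
    // callback(err, job) with job = undefined when there is no work
    db.query('SELECT * FROM jobs WHERE status = $1 LIMIT 1', ['pending'], function(err, result) {
        callback(err, result && result.rows[0]);
    });
}

function execute(job, doneCallback) {
    // run the work order; call doneCallback() when finished so polling resumes
    runWorkOrder(job, function(err) {
        if (err) console.error('work order failed', err);
        doneCallback();
    });
}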