How to prevent race condition in node.js? - javascript

Can someone explain to me how to prevent race conditions in node.js with Express?
If I have, for example, these two routes:
router.get('/addUser/:department', function(req, res) { ...})
router.get('/deleteUser/:department', function(req, res) { ...})
Both handlers use a non-blocking I/O operation (like writing to a file or a database).
Now someone calls 'addUser' with department 'A' while someone else tries to delete all users in department 'A'. How can I solve this (or other similar) race conditions?
How can I solve the problem if every user has its own file/database-record?
How can I solve the problem if I have a single (filesystem) file for all users that I have to read, alter, and write again?
Note: This is just an example for understanding. No optimization tips needed here.

To achieve this goal, you need to implement communication between the two handlers.
This can be done with a simple queue of operations that processes each request in order.
The downside is that a request waiting in the queue gets a delayed response (and may even time out).
A simple "meta" implementation is:
const events = require('events');

const operationQueue = []; // FIFO queue of pending operations
const eventEmitter = new events.EventEmitter();
// a counter avoids event-name collisions when two requests arrive in the same millisecond
let nextOperationId = 0;

router.get('/addUser/:department', function (req, res) {
  const customEvent = `addUser-${++nextOperationId}`;
  eventEmitter.once(customEvent, () => res.send('done'));
  operationQueue.push(() => addUser(customEvent, req));
});

router.get('/deleteUser/:department', function (req, res) {
  const customEvent = `deleteUser-${++nextOperationId}`;
  eventEmitter.once(customEvent, () => res.send('done'));
  operationQueue.push(() => deleteUser(customEvent, req));
});

function addUser(customEvent, req) {
  // do the logic
  eventEmitter.emit(customEvent, { done: true });
}

function deleteUser(customEvent, req) {
  // do the logic
  eventEmitter.emit(customEvent, { done: true });
}

// not the best performance: poll the queue and run one operation at a time
setInterval(() => {
  const operation = operationQueue.shift();
  if (operation) {
    operation();
  }
}, 1);
Of course, tools like a database or a Redis queue would fit better than this solution in terms of robustness and failover.
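As an alternative to polling with setInterval, the same ordering guarantee can be had by chaining each operation onto the tail of a single promise chain. A minimal sketch (the enqueue helper is a hypothetical name, not an Express or Node API):

```javascript
// Serialize operations by chaining each one onto the tail of a single
// promise chain; operations run strictly in submission order.
let queueTail = Promise.resolve();

function enqueue(operation) {
  const result = queueTail.then(() => operation());
  queueTail = result.catch(() => {}); // keep the chain alive after a failure
  return result;
}

// Usage inside a route handler might look like:
// router.get('/addUser/:department', (req, res) => {
//   enqueue(() => addUser(req)).then(() => res.send('done'));
// });
```

This avoids the 1 ms polling loop entirely: the next operation starts as soon as the previous one settles.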

(This is a very broad question.)
Typically, one would use a database (instead of plain text files) and make use of its built-in locking mechanisms.
Example of locking mechanisms in the Postgres database management system: https://www.postgresql.org/docs/10/static/explicit-locking.html

Related

How to handle NodeJS Express request race condition

Say I have this endpoint on an express server:
app.get('/', async (req, res) => {
  var foo = await databaseGetFoo();
  if (foo == true) {
    foo = false;
    somethingThatShouldOnlyBeDoneOnce();
    await databaseSetFoo(foo);
  }
})
I think this creates a race condition if the endpoint is called twice simultaneously?
If so how can I prevent this race condition from happening?
OK, so based on the comments, I've got a little better understanding of what you want here.
Assuming that somethingThatShouldOnlyBeDoneOnce is doing something asynchronous (like writing to a database), you are correct that a user (or users) making multiple calls to that endpoint will potentially cause that operation to happen repeatedly.
Using your comment about allowing a single comment per user, and assuming you've got middleware earlier in the middleware stack that can uniquely identify a user by session or something, you could naively implement something like this that should keep you out of trouble (usual disclosures that this is untested, etc.):
let processingMap = {};

app.get('/', async (req, res, next) => {
  if (!processingMap[req.user.userId]) {
    // add the user to the processing map
    processingMap = {
      ...processingMap,
      [req.user.userId]: true
    };
    const hasUserAlreadySubmittedComment = await queryDBForCommentByUser(req.user.userId);
    if (!hasUserAlreadySubmittedComment) {
      // we now know we're the only comment in process
      // and the user hasn't previously submitted a comment,
      // so submit it now:
      await writeCommentToDB();
      delete processingMap[req.user.userId];
      res.send('Nice, comment submitted');
    } else {
      delete processingMap[req.user.userId];
      const err = new Error('Sorry, only one comment per user');
      err.statusCode = 400;
      next(err);
    }
  } else {
    // note: do NOT delete the map entry here -- the lock belongs to the
    // request that is still in flight
    const err = new Error('Request already in process for this user');
    err.statusCode = 400;
    next(err);
  }
})
Since insertion into the processingMap is all synchronous, and Node can only be doing one thing at a time, the first request for a user to hit this route handler will essentially lock for that user until the lock is removed when we're finished handling the request.
BUT... this is a naive solution and it breaks the rules for a 12-factor app. Specifically, factor 6, which says your application should run as one or more stateless processes. We've now introduced state into your application.
If you're sure you'll only ever run this as a single process, you're fine. However, the second you go to scale horizontally by deploying multiple nodes (via whatever method--PM2, Node's cluster module, Docker, K8s, etc.), you're hosed with the above solution. Node Server 1 has no idea about the local state of Node Server 2, so multiple requests hitting different instances of your multi-node application can't co-manage the state of the processing map.
The more robust solution would be to implement some kind of queue system, likely leveraging a separate piece of infrastructure like Redis. That way all of your nodes could use the same Redis instance to share state and now you can scale up to many, many instances of your application and all of them can share info.
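A sketch of what such a shared lock could look like. The client's set(key, value, 'PX', ttl, 'NX') call follows the ioredis signature; withLock is a hypothetical helper, and a production version would also need to guard the unlock against expired locks (e.g. with a token check in a Lua script):

```javascript
// Acquire a per-user lock in a shared store before doing the work, so that
// multiple Node instances agree on who currently holds the lock.
async function withLock(client, key, ttlMs, task) {
  // NX = only set if the key does not exist; PX = expire after ttlMs so a
  // crashed process doesn't hold the lock forever
  const acquired = await client.set(key, '1', 'PX', ttlMs, 'NX');
  if (acquired !== 'OK') {
    throw new Error('Request already in process for this user');
  }
  try {
    return await task();
  } finally {
    await client.del(key);
  }
}

// Usage sketch inside the route handler:
// await withLock(redis, `comment-lock:${req.user.userId}`, 5000, writeCommentToDB);
```

Because the lock lives in Redis rather than in a local object, every instance of the app sees the same state.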
I don't really have all the details on exactly how to go about building that out and it seems out of scope for this question anyway, but hopefully I've given you at least one solution and some idea of what to think about at a broader level.

nodejs how to avoid blocking requests

hi i wrote this simple nodejs server:
const express = require('express')
const app = express()

app.use(express.json())

app.get('/otherget', (req, res) => {
  res.send('other get');
});

app.get('/get', async (req, res) => {
  let response;
  try {
    response = await func1();
  } catch (error) {
  }
  res.send('response: ' + response);
});

const func1 = () => {
  let j = 0;
  for (let i = 0; i < 10000000000; i++) {
    j++;
  }
  return j;
}

app.listen(4000, () => console.log('listen 4000'));
and when the server gets a request to the route '/get', it can't serve any other requests until it has finished with that one.
how can i prevent this situation when my server has a lot of requests doing a lot of work?
Any thread can get blocked by your "crazy loop".
Try to learn about JS's event loop to know which kinds of work can block it (CPU-intensive tasks).
The general solution for surviving this kind of CPU-intensive task is to create more threads / processes:
you can fork, then you will be able to have two loops of this kind
you can create workers
you can load balance between multiple servers
you can create a queue to prevent having too much parallel work
you can use the yield keyword (generators) to stop execution at specific checkpoints
etc.
Every solution has its pros and cons.

How to share information between Express middleware and endpoints?

Lots of middleware comes with factories that take an options object. Among the options there is usually a function that needs to provide some necessary information to the middleware. As an example, take a look at express-preconditions:
app.use(preconditions({
  stateAsync: async (req) => {
    // Fetch the date the resource was last modified.
  }
}));
This is a neat pattern, but I find it gets complicated when the same information is needed in multiple places. For instance, let's say I've got a database table that contains both the information about the resource that the response is supposed to contain, and the last modified date. In other words, the same information is needed in both the middleware and the endpoint itself. I end up with code similar to this:
//The middleware
app.use(preconditions({
  stateAsync: async (req) => {
    const data = await fetchFromDb(req.param("id"));
    return {
      lastModified: data.lastModified
    };
  }
}));

//The endpoint
app.use("path", async (req, res, next) => {
  const data = await fetchFromDb(req.param("id"));
  res.send(data);
});
I'm hitting the database twice just because I need the same info in different places. I could of course just fetch it once, or store it somewhere on the request object. But that feels a bit like a hack. Another solution would be some kind of caching mechanism in fetchFromDb, but that feels a bit overcomplicated.
In my experience, this is a quite common problem when building stuff with Express. What is the recommended way to deal with situations like this?
You can pass data between middlewares with res.locals:
app.get('/yourEndPoint', (req, res, next) => {
  const data = {}; // fetch your data here
  res.locals.lastModified = data.lastModified;
  next();
}, (req, res) => {
  const lastModified = res.locals.lastModified;
  // do whatever you need to do
});
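The "fetch it once" idea from the question can also be packaged as a small helper that caches the promise per key, so the middleware and the endpoint can both call it and only the first call hits the database. A sketch (memoizeAsync is a hypothetical name, and this per-process cache has no invalidation, so it is for illustration only):

```javascript
// Cache the promise (not just the value) per key, so concurrent callers
// share the same in-flight request instead of each hitting the database.
function memoizeAsync(fn) {
  const cache = new Map();
  return function (key) {
    if (!cache.has(key)) {
      cache.set(key, fn(key));
    }
    return cache.get(key);
  };
}

// const getFromDb = memoizeAsync(fetchFromDb);
// Both the middleware and the endpoint call getFromDb(req.param("id"))
// and only the first call actually queries the database.
```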

Dangling callbacks: return response before every callback has returned

Question: Would you consider dangling callbacks as bad node.js style or even dangerous? If so under which premise?
Case: as described below, imagine you need to make calls to a DB in an express server that updates some data. Yet the client doesn't need to be informed about the result. In this case you could return a response immediately, not waiting for the asynchronous call to complete. This would be described as dangling callback for lack of a better name.
Why is this interesting?: Because tutorials and documentation in most cases show the case of waiting, in worst cases teaching callback hell. Recall your first experiences with say express, mongodb and passport.
Example:
'use strict'
const assert = require('assert')
const express = require('express')
const app = express()

function longOperation (value, cb) {
  // might fail and: return cb(err) ...here
  setTimeout(() => {
    // after some time invokes the callback
    return cb(null, value)
  }, 4000)
}

app.get('/ping', function (req, res) {
  // do some declarations here
  //
  // do some request processing here
  // call a long op, such as a DB call here.
  // however the client does not need to be
  // informed about the result of the operation
  longOperation(1, (err, val) => {
    assert(!err)
    assert(val === 1)
    console.log('...fired callback here though')
    return
  })
  console.log('sending response here...')
  return res.send('Hello!')
})

let server = app.listen(3000, function () {
  console.log('Starting test:')
})
Yeah, this is basically what's called a "fire and forget" service in other contexts, and could also be the first step in a good design implementing command-query responsibility segregation.
I don't consider it a "dangling callback"; the response in this case acknowledges that the request was received. Your best bet here would be to make sure your response includes some kind of hypermedia that lets clients get the status of their request later, and if it's an error they can fix, have the content at the new resource URL tell them how.
Think of it in the case of a user registration workflow where the user has to be approved by an admin, or has to confirm their email before getting access.
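One caveat with fire-and-forget is that a rejection or a thrown assertion in the dangling callback has nowhere to go, so it's worth routing such calls through a helper that at least logs failures (fireAndForget is a hypothetical name):

```javascript
// Kick off an async task without awaiting it, but make sure any failure
// is logged instead of becoming an unhandled rejection.
function fireAndForget(task) {
  Promise.resolve()
    .then(() => task())
    .catch((err) => console.error('background task failed:', err));
}

// Inside the handler, sketch of usage:
// fireAndForget(() => longOperationAsync(1));
// return res.send('Hello!');
```

The response still goes out immediately; the difference is that a failed background operation leaves a trace in the logs.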

Building an object from multiple asynchronous sources in node.js/express.js

I am having a tough time finding a solution to my problem online and was hoping someone on here might be able to help me. I have an express route that does a few API requests for different JSON objects. I would like to build a JSON response for my client side view but all of my attempts so far either yield previous request data or no data at all.
So my question to you JavaScript pros using node/express js. How do you sync up multiple sources of JSON objects into one single object to be returned to the client side in one response? Is there a library or some callback magic that you use?
Thanks in advance for any and all help!
Async is one of the more popular libraries for this purpose. There are many other async and promise libraries that can help. There are different methods with different behaviors depending on what you need. I think the series method is what you need, but check the documentation carefully.
var async = require('async');
var request = require('request');

app.get('/endpoint', function(req, res){
  async.series([
    function(callback){ request.get('url1', callback); },
    function(callback){ request.get('url2', callback); },
    function(callback){ request.get('url3', callback); },
  ],
  function(err, results){
    //handle error
    //results is an array of values returned from each one
    var processedData = {
      a: results[0],
      b: results[1],
      c: results[2]
    };
    res.send(processedData);
  });
});
You could also do this yourself (and it is good practice for learning how to organize your node.js code). callbackhell.com is a good write-up on using named functions and modules to organize callbacks.
Here is one possible way to do it. (Not tested)

app.get('/endpoint', function(req, res){
  var dataSources = ['url1', 'url2', 'url3'];
  var requestData = [];
  var processRequestData = function(){
    //do your stuff here
    res.send(processedData);
  };
  dataSources.forEach(function(dataSource){
    request.get(dataSource, function(err, result){
      //handle error
      requestData.push(result);
      if(requestData.length === dataSources.length){
        processRequestData();
      }
    });
  });
});
Async.js (https://github.com/caolan/async) can help with tasks like this. A quick example:
app.get('/resource', function(req, res){
  // object to send back in the response
  var results = {};
  // helper function to make requests and add the data to the results object
  var getData = function(url, propertyName, next){
    request.get(url, function(err, response, body){
      // let async.js know if there is an error
      if(err) return next(err);
      // save data to results object
      results[propertyName] = body;
      // async.js needs us to notify it when each operation is complete,
      // so we call the callback without passing any data back
      next();
    });
  };
  // array of operations to execute in series or parallel
  var operations = [];
  operations.push(function(next){
    getData('http://url-one', 'users', next);
  });
  operations.push(function(next){
    getData('http://url-two', 'posts', next);
  });
  // async.js has a few options on how to execute your operations - here we use series
  async.series(operations, function(err){
    if(err){
      throw err;
    }
    // if we get to this point, all of the operations have executed and called
    // next() without any errors - send the response with a populated results object.
    res.send(results);
  });
});
I haven't actually tried this code yet but it should give you a good idea on how to do it.
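On modern Node both of these patterns can be expressed with built-in promises and Promise.all, with no extra library. A sketch, assuming hypothetical fetcher functions that each return a promise:

```javascript
// Fire all requests at once and wait for every result with Promise.all;
// the resolved array preserves the order of the input array.
async function buildResponse(fetchUsers, fetchPosts) {
  const [users, posts] = await Promise.all([fetchUsers(), fetchPosts()]);
  return { users, posts };
}

// Usage in an Express route might look like:
// app.get('/resource', async (req, res) => {
//   res.send(await buildResponse(fetchUsers, fetchPosts));
// });
```

Unlike async.series, Promise.all runs the requests concurrently, which is usually what you want when the sources are independent.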
