ShareDB - Run Database queries in middleware - javascript

Currently I am dealing with the following situation:
I have a ShareDB backend up and running to provide real-time collaboration (text writing).
Every time a user connects, I would like to check whether the document the user intends to work on exists in the database. If it does NOT exist, create it first; if it DOES exist, proceed normally. This should be done in the "connect" middleware:
var ShareDB = require('sharedb');
var backend = new ShareDB();
backend.use('connect', function(context, next) {
  console.log('connect');
  var connection = backend.connect();
  var doc = connection.get('collection_name', 'document_id');
  doc.fetch(function(err) {
    if (err) throw err;
    if (doc.type === null) {
      doc.create({content: ''});
      return;
    }
  });
  next();
});
But it triggers an infinite loop, because calling backend.connect() inside the connect middleware triggers another connect action.
So I have no idea how to access the database from within the middleware. Any ideas?
Thanks!
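One possible way to break the loop is sketched below. It assumes ShareDB's `backend.connect(connection, req)` passes `req` through to the 'connect' middleware as `context.req`. Under that assumption, a server-side connection can be opened once and tagged, so the middleware can recognise it and skip the check. A second change is that `next()` is only called after the fetch/create has finished. `ensureDocOnConnect` and the `isInternal` marker are hypothetical names.

Two caveats apply. First, 'connect' runs before the client has said which document it wants, so this only fits a fixed collection/id as in the question. Second, two clients connecting at the same moment could both see `type === null`, in which case one create would likely fail.

```javascript
// Sketch: make sure a document exists whenever a client connects.
// `backend` is assumed to be a ShareDB instance, e.g.
//   const ShareDB = require('sharedb'); const backend = new ShareDB();
// `ensureDocOnConnect` and the `isInternal` marker are hypothetical names.
function ensureDocOnConnect(backend, collection, id) {
  // Open ONE server-side connection up front. The second argument is the
  // `req` object ShareDB hands to the 'connect' middleware as context.req.
  const internal = backend.connect(null, { isInternal: true });

  backend.use('connect', function (context, next) {
    // Connections opened with the marker skip the check, so server-side
    // connections can never recurse into this middleware.
    if (context.req && context.req.isInternal) return next();

    const doc = internal.get(collection, id);
    doc.fetch(function (err) {
      if (err) return next(err);
      if (doc.type !== null) return next(); // document already exists
      // Create it, and let the client through only once creation finished.
      doc.create({ content: '' }, function (createErr) {
        next(createErr);
      });
    });
  });

  return internal;
}

module.exports = { ensureDocOnConnect };
```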

Related

How to handle NodeJS Express request race condition

Say I have this endpoint on an express server:
app.get('/', async (req, res) => {
var foo = await databaseGetFoo();
if (foo == true) {
foo = false;
somethingThatShouldOnlyBeDoneOnce();
await databaseSetFoo(foo);
}
})
Does this create a race condition if the endpoint is called twice simultaneously?
If so, how can I prevent this race condition from happening?
OK, so based on the comments, I've got a little better understanding of what you want here.
Assuming that somethingThatShouldOnlyBeDoneOnce is doing something asynchronous (like writing to a database), you are correct that a user (or users) making multiple calls to that endpoint will potentially cause that operation to happen repeatedly.
Using your comment about allowing a single comment per user, and assuming you've got middleware earlier in the middleware stack that can uniquely identify a user by session or something, you could naively implement something like this that should keep you out of trouble (the usual disclaimers apply: this is untested, etc.):
const processingMap = {};
app.get('/', async (req, res, next) => {
  const userId = req.user.userId;
  if (processingMap[userId]) {
    // another request for this user already holds the lock;
    // do NOT delete the entry here, or we would release *its* lock
    const err = new Error('Request already in process for this user');
    err.statusCode = 400;
    return next(err);
  }
  // add the user to the processing map (synchronous, so nothing can interleave)
  processingMap[userId] = true;
  try {
    const hasUserAlreadySubmittedComment = await queryDBForCommentByUser(userId);
    if (!hasUserAlreadySubmittedComment) {
      // we now know we're the only comment in process
      // and the user hasn't previously submitted a comment,
      // so submit it now:
      await writeCommentToDB();
      res.send('Nice, comment submitted');
    } else {
      const err = new Error('Sorry, only one comment per user');
      err.statusCode = 400;
      next(err);
    }
  } catch (err) {
    next(err);
  } finally {
    // always release the lock, even if a database call throws
    delete processingMap[userId];
  }
});
Since the check and the insertion into processingMap are synchronous, and Node runs your JavaScript on a single thread, the first request for a user to reach this route handler effectively takes a lock for that user. The lock is held until it is removed when we're finished handling the request.
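The same synchronous check-and-set can be factored into a small helper that always releases the lock, even when the task throws. This is a minimal sketch: `runExclusive` is a hypothetical name, and like the map above it only works inside a single Node process.

```javascript
// In-process lock keyed by e.g. user id. Only valid for a single Node process.
const locks = new Set();

// Runs `task` if nobody holds `key` and returns its promise;
// returns null (without running the task) if the key is already locked.
function runExclusive(key, task) {
  if (locks.has(key)) return null;
  // has() + add() run synchronously, so no other request can interleave here.
  locks.add(key);
  return Promise.resolve()
    .then(task)
    .finally(() => locks.delete(key)); // released even if task rejects
}

module.exports = { runExclusive, locks };
```

In the route, `const work = runExclusive(req.user.userId, async () => { ... })` replaces the manual map handling. If `work` is `null`, pass the "already in process" error to `next`.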
BUT... this is a naive solution, and it breaks the rules of a 12-factor app: specifically factor VI, which says your application should run as stateless processes. We've now introduced state into your application.
If you're sure you'll only ever run this as a single process, you're fine. However, the second you scale horizontally by deploying multiple nodes (via whatever method: PM2, Node's cluster module, Docker, K8s, etc.), you're hosed with the above solution. Node Server 1 has no idea about the local state of Node Server 2, so multiple requests hitting different instances of your multi-node application can't co-manage the state of the processing map.
The more robust solution would be to implement some kind of queue system, likely leveraging a separate piece of infrastructure like Redis. That way all of your nodes could use the same Redis instance to share state and now you can scale up to many, many instances of your application and all of them can share info.
I don't really have all the details on exactly how to go about building that out and it seems out of scope for this question anyway, but hopefully I've given you at least one solution and some idea of what to think about at a broader level.
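One common pattern for sharing this state through Redis is a lock rather than the queue mentioned above. You acquire it with `SET key token NX PX ttl`, which succeeds only if the key is absent. You release it with a small Lua compare-and-delete script, so an instance only deletes a lock it still owns.

This is a minimal sketch under these assumptions:
- it uses the node-redis v4 promise API;
- it needs Node 14.17+ for `crypto.randomUUID`;
- `withRedisLock` is a hypothetical helper.

```javascript
const crypto = require('crypto');

// Compare-and-delete: only release the lock if it still holds our token.
const RELEASE_SCRIPT =
  'if redis.call("get", KEYS[1]) == ARGV[1] then ' +
  'return redis.call("del", KEYS[1]) else return 0 end';

// `client` is assumed to be a connected node-redis v4 client.
// Resolves with the task's result, or rejects if another instance holds the lock.
async function withRedisLock(client, key, ttlMs, task) {
  const token = crypto.randomUUID();
  // NX: only set if absent. PX: expire so a crashed holder can't block forever.
  const acquired = await client.set(key, token, { NX: true, PX: ttlMs });
  if (acquired !== 'OK') {
    throw new Error('Request already in process for this user');
  }
  try {
    return await task();
  } finally {
    await client.eval(RELEASE_SCRIPT, { keys: [key], arguments: [token] });
  }
}

module.exports = { withRedisLock };
```

If the task can run longer than `ttlMs`, the key expires and a second request could get in. Pick a TTL comfortably above the task's worst-case duration.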

Express server stops after 5 GET requests

This code works as expected, but after the fifth GET request it still does its backend work (stores the data in the db), yet nothing is logged on the server and nothing changes on the frontend (React).
const express = require('express');
const router = express.Router();
const mongoose = require('mongoose');
const User = require('./login').User;
mongoose.connect('mongodb://localhost:27017/animationsdb');
router.get('/', async(req, res) => {
await User.findOne({ username: req.query.username }, (err, result) => {
if (result) {
// when user goes to his profile we send him the list of animations he liked
// list is stored in array at db, field likedAnimations
res.send({ animationList: result.likedAnimations });
console.log("Liked animations:", result.likedAnimations);
} else {
console.log("no result found");
res.sendStatus(404)
}
});
});
router.put('/', async(req, res) => {
console.log("username:", req.body.username);
console.log("link:", req.body.link);
// if animation is already liked, then dislike it
// if it's not liked, then store it in db
const user = await User.findOne({ username: req.body.username });
if (user.likedAnimations.indexOf(req.body.link) === -1) {
user.likedAnimations.push(req.body.link);
} else {
user.likedAnimations = arrayRemove(user.likedAnimations, user.likedAnimations[user.likedAnimations.indexOf(req.body.link)]);
}
user.save();
});
function arrayRemove(arr, value) {
return arr.filter((item) => {
return item != value;
});
}
module.exports = router;
For the first five requests I get this output:
Liked animations: ["/animations/animated-button.html"]
GET /animation-list/?username=marko 200 5.152 ms - 54
Liked animations: ["/animations/animated-button.html"]
GET /animation-list/?username=marko 304 3.915 ms - -
After that I don't get any output on the server console and no changes on the front end until I refresh the page, even though the db operations still work and the data is saved.
It appears you have a couple issues going on. First, this request handler is not properly coded to handle errors and thus it leaves requests as pending and does not send a response and the connection will stay as pending until the client eventually times it out. Second, you likely have some sort of database concurrency usage error that is the root issue here. Third, you're not using await properly with your database. You either use await or you pass a callback to your database, not both. You need to fix all three of these.
To address the first and third issues:
router.get('/', async(req, res) => {
try {
let result = await User.findOne({ username: req.query.username });
if (result) {
console.log("Liked animations:", result.likedAnimations);
res.send({ animationList: result.likedAnimations });
} else {
console.log("no database result found");
res.sendStatus(404);
}
} catch(e) {
console.log(e);
res.sendStatus(500);
}
});
For the second issue, the particular database error you mention appears to be some sort of concurrency/locking issue internal to the database and is triggered by the sequence of database operations your code executes. You can read more about that error in the discussion here. Since the code you show us only shows a single read operation, we would need to see a much larger context of relevant code including the code related to this operation that writes to the database in order to be able to offer any ideas on how to fix the root cause of this issue.
We can't see the whole flow here, but you need to use atomic update operations in your database. The PUT handler you show is an immediate race condition. In multi-client databases, you don't get a value, modify it and then write it back. That's an opportunity for a race condition, because someone else could modify the value while you're sitting there holding it. When you then write back your modified copy, you overwrite the change that the other client just made. Instead, you use an atomic operation that updates the value directly in one database call, or you use transactions to turn a multi-step operation into a safe one.
I'd suggest you read this article on atomic operations in MongoDB. You probably want something like Mongoose's .findOneAndUpdate() (MongoDB's findAndModify command), so you can find and change an item in the database in one atomic operation. If you search for "atomic operations in mongodb", there are many other articles on the topic.
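Below is a sketch of the question's PUT handler rewritten along these lines. It uses one atomic `updateOne` with an aggregation-pipeline update that toggles the link inside the database. It also sends a response on every path, so the request never stays pending.

It assumes:
- MongoDB 4.2+, for pipeline updates;
- Mongoose 6+, for `matchedCount` on the update result;
- `likedAnimations` always being an array (e.g. a schema default of `[]`).

`registerToggleLikeRoute` and `toggleLikePipeline` are hypothetical names. `$literal` keeps a link that happens to start with `$` from being read as a field path.

```javascript
// Builds an update pipeline that removes `link` if present, otherwise appends it.
function toggleLikePipeline(link) {
  const value = { $literal: link };
  return [{
    $set: {
      likedAnimations: {
        $cond: [
          { $in: [value, '$likedAnimations'] },
          { $filter: { input: '$likedAnimations', cond: { $ne: ['$$this', value] } } },
          { $concatArrays: ['$likedAnimations', [value]] }
        ]
      }
    }
  }];
}

// `router` is an express.Router, `User` the Mongoose model from the question.
function registerToggleLikeRoute(router, User) {
  router.put('/', async (req, res) => {
    const { username, link } = req.body;
    if (typeof username !== 'string' || typeof link !== 'string') {
      return res.sendStatus(400);
    }
    try {
      // One server-side update: no read-modify-write window for a race.
      const result = await User.updateOne({ username }, toggleLikePipeline(link));
      if (result.matchedCount === 0) return res.sendStatus(404);
      res.sendStatus(204); // always answer, so the request doesn't hang
    } catch (e) {
      console.error(e);
      res.sendStatus(500);
    }
  });
}

module.exports = { toggleLikePipeline, registerToggleLikeRoute };
```

If the client needs the updated list back, `findOneAndUpdate` with `{ new: true }` can return the modified document from the same atomic call.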

Weird socket.io behavior when Node server is down and then restarted

I implemented a simple chat for my website where users can talk to each other, using ExpressJS and Socket.io. I added simple protection against a DDoS-style attack that can be caused by one person spamming the chat window, like this:
if (RedisClient.get(user).lastMessageDate > currentTime - 1 second) {
return error("Only one message per second is allowed")
} else {
io.emit('message', ...)
RedisClient.set(user).lastMessageDate = new Date()
}
I am testing this with this code:
setInterval(function() {
$('input').val('message ' + Math.random());
$('form').submit();
}, 1);
It works correctly when the Node server is always up.
However, things get extremely weird if I turn off the Node server, then run the code above, and start the Node server again a few seconds later. Suddenly, hundreds of messages are inserted into the window and the browser crashes. I assume this happens because, while the Node server is down, socket.io buffers all of the client's emits, and once it detects that the server is online again, it sends all of those messages at once.
How can I protect against this? And what is exactly happening here?
edit: If I use Node's in-memory storage instead of Redis, this doesn't happen. I am guessing it's because the server gets flooded with READs, and many READs happen before RedisClient.set(user).lastMessageDate = new Date() finishes. I guess what I need is an atomic READ / SET? I am using this module: https://github.com/NodeRedis/node_redis for connecting to Redis from Node.
You are correct that this happens due to queueing up of messages on client and flooding on server.
When the server receives the queued messages, it receives them all at once, and each socket.on("message", ...) handler runs independently: one invocation does not wait for another to finish.
Even if your Redis server responds within a few ms, all of these messages arrive at once, so every GET is issued before any SET has completed. Each check therefore sees the old timestamp, and everything goes to the else branch.
You have the following few options.
Use a rate limiter library like this library. This is easy to configure and has multiple configuration options.
If you want to do everything yourself, use a queue on the server. This will take up memory on your server, but you'll achieve what you want. Instead of processing every message immediately, put it into a queue: create a queue for each new client, and delete that queue after processing its last item.
(update) Use WATCH together with MULTI for optimistic locking: the transaction is aborted if another client modifies the watched key before EXEC.
The pseudo-code for the queue approach will be something like this:
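const queues = {};      // one array of pending messages per user
const processing = {};  // users whose queue is currently being drained

async function queueHandler(user) {
  if (processing[user]) return; // already draining; new items will be picked up
  processing[user] = true;
  while (queues[user].length > 0) {
    const messageObject = queues[user].shift();
    // your (awaited) redis check-and-push logic for messageObject here
  }
  delete queues[user];
  delete processing[user];
}

function pushToQueue(messageObject) {
  const user = messageObject.user;
  if (!queues[user]) {
    queues[user] = [messageObject];
  } else {
    queues[user].push(messageObject);
  }
  queueHandler(user);
}

socket.on("message", pushToQueue);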
UPDATE
Redis supports optimistic locking with WATCH, which is used together with MULTI. You WATCH a key, and if any other client modifies that key before you call EXEC, your transaction is not executed (the other clients are not blocked).
From the node_redis README:
Using multi you can make sure your modifications run as a transaction,
but you can't be sure you got there first. What if another client
modified a key while you were working with its data?
To solve this, Redis supports the WATCH command, which is meant to be
used with MULTI:
var redis = require("redis"),
client = redis.createClient({ ... });
client.watch("foo", function( err ){
if(err) throw err;
client.get("foo", function(err, result) {
if(err) throw err;
// Process result
// Heavy and time consuming operation here
client.multi()
.set("foo", "some heavy computation")
.exec(function(err, results) {
/**
* If err is null, it means Redis successfully attempted
* the operation.
*/
if(err) throw err;
/**
* If results === null, it means that a concurrent client
* changed the key while we were processing it and thus
* the execution of the MULTI command was not performed.
*
* NOTICE: Failing an execution of MULTI is not considered
* an error. So you will have err === null and results === null
*/
});
}); });
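For this particular rule (one message per user per second), the atomic READ / SET the question asks about exists as a single Redis command. `SET key value NX PX 1000` only succeeds if the key is absent, and the key then expires on its own after one second. There is no gap between the check and the write.

This swaps WATCH/MULTI for one atomic command. It is a minimal sketch assuming node-redis v4's promise API. With the callback-based node_redis used in the question, the equivalent call should be `client.set(key, '1', 'PX', 1000, 'NX', callback)`. `allowMessage` and the `ratelimit:` key prefix are hypothetical.

```javascript
// `client` is assumed to be a connected node-redis v4 client.
// Resolves to true if the user may send now, false if they already
// sent a message within the last `windowMs` milliseconds.
async function allowMessage(client, userId, windowMs = 1000) {
  // SET ... NX PX is one atomic command: it only succeeds when the key
  // does not exist, and the key expires by itself after windowMs.
  const reply = await client.set(`ratelimit:${userId}`, '1', { NX: true, PX: windowMs });
  return reply === 'OK';
}

module.exports = { allowMessage };
```

In the message handler, call `io.emit('message', ...)` only when `allowMessage` resolves to true, and otherwise send the error back. Because the check and the write are one command, there is no retry loop to write.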
Perhaps you could extend your client-side code to prevent data from being sent while the socket is disconnected? That way, you prevent the library from queuing messages while the socket is disconnected (i.e. the server is offline).
This could be achieved by checking to see if socket.connected is true:
// Only allow data to be sent to the server when the socket is connected
function sendToServer(socket, event, data) {
  if (socket.connected) {
    socket.emit(event, data);
  }
}
More information on this can be found at the docs https://socket.io/docs/client-api/#socket-connected
This approach will prevent the built-in queuing behaviour in all scenarios where the socket is disconnected, which may not be desirable; however, it should protect against the problem you describe in your question.
Update
Alternatively, you could use a custom middleware on the server to achieve throttling behaviour via socket.io's server API:
/*
Server side code
*/
io.on("connection", function (socket) {
// Add custom throttle middleware to the socket when connected
socket.use(function (packet, next) {
var currentTime = Date.now();
// If the socket has a previous timestamp, check that enough time has
// elapsed since the last message was processed
if(socket.lastMessageTimestamp) {
var deltaTime = currentTime - socket.lastMessageTimestamp;
// If not enough time has elapsed, pass an error back to the
// client
if (deltaTime < 1000) {
next(new Error("Only one message per second is allowed"))
return
}
}
// Update the timestamp on the socket, and allow this message to
// be processed
socket.lastMessageTimestamp = currentTime
next()
});
});

Node.js flat-cache, when to clear caches

I have a Node.js server that queries a MySQL database. It serves both as an API endpoint, where it returns JSON, and as the backend of my Express application, where it passes the retrieved list to the view as an object.
I am looking into implementing flat-cache to reduce the response time. Below is the code snippet.
const flatCache = require('flat-cache');
var cache = flatCache.load('productsCache');
//get all products for the given customer id
router.get('/all/:customer_id', flatCacheMiddleware, function(req, res){
var customerId = req.params.customer_id;
//implemented custom handler for querying
queryHandler.queryRecordsWithParam('select * from products where idCustomers = ? order by CreatedDateTime DESC', customerId, function(err, rows){
if(err) {
res.status(500).send(err.message);
return;
}
res.status(200).send(rows);
});
});
//caching middleware
function flatCacheMiddleware(req, res, next) {
var key = '__express__' + (req.originalUrl || req.url);
var cacheContent = cache.getKey(key);
if(cacheContent){
res.send(cacheContent);
} else{
res.sendResponse = res.send;
res.send = (body) => {
cache.setKey(key,body);
cache.save();
res.sendResponse(body)
}
next();
}
}
I ran the node.js server locally and the caching has indeed greatly reduced the response time.
However there are two issues I am facing that I need your help with.
Before I added the flatCacheMiddleware middleware, I received the response as JSON; now when I test, it sends me HTML. I am not too well versed in JS strict mode (planning to learn it soon), but I am sure the answer lies in the flatCacheMiddleware function.
So what do I modify in the flatCacheMiddleware function so it would send me JSON?
I manually added a new row to the products table for that customer and when I called the end point, it still showed me the old rows. So at what point do I clear the cache?
In a web app it would ideally be when the user logs out, but if I am using this as an api endpoint (or even on webapp there is no guarantee that the user will log out the traditional way), how do I determine if new records have been added and the cache needs to be cleared.
Appreciate the help. If there are any other node.js caching related suggestions you all can give, it would be truly helpful.
I found a solution to the issue by parsing the cached content (a JSON string) back into an object.
Change line:
res.send(cacheContent);
To:
res.send(JSON.parse(cacheContent));
I created a 'brute force' cache invalidation method. Calling clear deletes both the cache file and the data stored in memory. You have to call it after every db change. You can also delete a specific key using cache.removeKey('key');.
function clear(req, res, next) {
  try {
    cache.destroy();
    res.status(200).json({'message' : 'cache invalidated'});
  } catch (err) {
    logger.error(`cache invalidation error ${err.message}`);
    res.status(500).json({
      'message' : 'cache invalidation error',
      'error' : err.message
    });
  }
}
Note that calling cache.save() prunes keys that were not visited since the cache was loaded, which removes the other cached API responses. Changing it to cache.save(true) will 'prevent the removal of non visited keys' (as mentioned in a comment in the flat-cache documentation).
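Here is why the response turned into HTML. In Express 4, `res.send(object)` delegates to `res.json`, which serializes the body and calls `res.send` again with the string. The overridden `send` in the question therefore runs twice, and the JSON string is the last value stored. On a cache hit, `res.send(string)` defaults the Content-Type to `text/html`, which is why `JSON.parse` fixes it.

The sketch below hooks `res.json` instead, so the cached value is the object itself. It caches only 200 responses and exposes a per-URL `invalidate` built on `cache.removeKey`. `createCacheMiddleware` and `invalidate` are hypothetical names. The `cache` argument is assumed to be a flat-cache instance (`getKey`, `setKey`, `removeKey`, `save(noPrune)`).

```javascript
// `cache` is assumed to be a flat-cache instance, e.g.
//   const cache = require('flat-cache').load('productsCache');
function createCacheMiddleware(cache) {
  const keyFor = (url) => '__express__' + url;

  function middleware(req, res, next) {
    // Parentheses matter: without them `|| req.url` is never reached.
    const key = keyFor(req.originalUrl || req.url);
    const cached = cache.getKey(key);
    if (cached !== undefined) {
      return res.json(cached); // stored as a plain object, sent back as JSON
    }
    const originalJson = res.json.bind(res);
    res.json = (body) => {
      // res.send(object) also ends up here in Express 4; cache successes only.
      if (res.statusCode === 200) {
        cache.setKey(key, body);
        cache.save(true); // true = keep keys not visited in this run
      }
      return originalJson(body);
    };
    next();
  }

  // Call after writing to the table, with the URL whose data changed
  // (the same URL the client requested, including any router mount path).
  function invalidate(url) {
    cache.removeKey(keyFor(url));
    cache.save(true);
  }

  return { middleware, invalidate };
}

module.exports = { createCacheMiddleware };
```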

Node js. Proper / Best Practice to create connection

Right now I am creating a very large application in Node.js. I am trying to keep my code clean and short (just like most developers). I've created my own js file to handle the connection to MySQL. Please see the code below.
var mysql = require('mysql');
var config = {
'default' : {
connectionLimit : process.env.DB_CONN_LIMIT,
host : process.env.DB_HOST,
user : process.env.DB_USER,
password : process.env.DB_PASS,
database : process.env.DB_NAME,
debug : false,
socketPath : process.env.DB_SOCKET
}
};
function connectionFunc(query,parameters,callback,configName) {
configName = configName || "default";
callback = callback || null;
parameters = parameters;
if(typeof parameters == 'function'){
callback = parameters;
parameters = [];
}
//console.log("Server is starting to connect to "+configName+" configuration");
var dbConnection = mysql.createConnection(config[configName]);
dbConnection.connect();
dbConnection.query(query,parameters, function(err, rows, fields) {
//if (!err)
callback(err,rows,fields);
//else
//console.log('Error while performing Query.');
});
dbConnection.end();
}
module.exports.query = connectionFunc;
I am using the above file in my models, like this:
var database = require('../../config/database.js');
module.exports.getData = function(successCallBack){
database.query('SAMPLE QUERY GOES HERE', function(err, result){
if(err) {console.log(err)}
//My statements here
});
}
Using this coding style, everything works fine until I create a function that calls my model's method in a loop. Please see the sample below:
for (i = 0; i < 10000; i++) {
myModel.getData(param, function(result){
return res.json({data : result });
});
}
It gives me ER_CON_COUNT_ERROR: Too many connections. Why do I still get this error when every connection is ended by dbConnection.end()? I'm not sure what I am missing, and I am still stuck on this.
My connection limit is 100, and I think adding more connections is a bad idea.
Because querying data from the database is asynchronous.
In your loop, myModel.getData (or, more precisely, the underlying query) does not halt/pause your code until the query is finished. It sends the query to the database server, and the callback is called as soon as the database responds.
Calling end on dbConnection does not close the connection immediately; it only marks the connection to be closed as soon as all queries that were created with that connection have finished.
mysql: Terminating connections
Terminating a connection gracefully is done by calling the end() method. This will make sure all previously enqueued queries are still executed before sending a COM_QUIT packet to the MySQL server.
An alternative way to end the connection is to call the destroy() method. This will cause an immediate termination of the underlying socket. Additionally destroy() guarantees that no more events or callbacks will be triggered for the connection.
With destroy, however, the library will not wait for the results, so they are lost; destroy is rarely useful here.
So with your given code, you try to open 10000 connections at once.
You should use only one connection per task: e.g. if a user requests data using the browser, use one connection for that request. The same applies to timed tasks that run at certain intervals.
Here is some example code:
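var database = require('./config/database.js');
function someTask(callback) {
  // assumes database.js is extended to export a getConnection() helper
  var conn = database.getConnection();
  myModel.getData(conn, paramsA, dataReceivedA);
  function dataReceivedA(err, data) {
    if (err) { conn.end(); return callback(err); }
    myModel.getData(conn, paramsB, dataReceivedB);
  }
  function dataReceivedB(err, data) {
    // one connection served both queries; close it once the task is done
    conn.end();
    callback(err);
  }
}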
If you want to hide your database connection entirely inside your model code, then you would need to do something like this:
var conn = myModel.connect();
conn.getData(params, function(err, data) {
conn.end();
})
How to actually solve this depends on many factors, so it is only possible to give you hints here.
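The config in the question already sets `connectionLimit`. The mysql package applies that option to pools created with `mysql.createPool`, not to single connections from `createConnection`. The sketch below rebuilds database.js around one shared pool.

`pool.query(sql, values, callback)` borrows a connection, runs the query and releases it. Extra queries wait in the pool's queue (`waitForConnections` defaults to true) instead of opening new connections, so the 10000-iteration loop no longer exceeds the server limit. `createDatabase` is a hypothetical name, and the `mysql` module is passed in so the sketch stays self-contained.

```javascript
// `mysql` is assumed to be the `mysql` npm module; `config` has the same
// shape as config.default in the question (connectionLimit, host, ...).
function createDatabase(mysql, config) {
  // One pool per process; connectionLimit caps how many connections it opens.
  const pool = mysql.createPool(config);

  function query(sql, parameters, callback) {
    if (typeof parameters === 'function') {
      callback = parameters;
      parameters = [];
    }
    // Borrows a pooled connection, runs the query, and releases it.
    pool.query(sql, parameters, callback);
  }

  // Call once on shutdown, not after every query.
  function close(callback) {
    pool.end(callback);
  }

  return { query, close };
}

module.exports = { createDatabase };
```

Usage: `const db = createDatabase(require('mysql'), config.default);`. The models can then keep calling `db.query(...)` exactly as they call `database.query(...)` today.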
