MongoDB operations are getting starved in a RabbitMQ consumer
rabbitConn.createChannel(function(err, channel) {
  if (err) throw err;
  channel.consume(q.queue, async function(msg) {
    // The consumer listens for messages on queue A, e.g. selected by a binding key.
    // Awaiting the query executes it; don't also pass a callback (as in the mongoose docs,
    // http://mongoosejs.com/docs/api.html#model_Model.findOneAndUpdate), since mixing
    // a callback with await can execute the update twice.
    await Conversations.findOneAndUpdate(
      {'_id': 'someID'},
      {'$push': {'messages': {'body': 'message body'}}}
    );
  });
});
The problem is that when a large number of messages are being read, the Mongo operations get starved and only execute once the queue has no more messages. So if there are 1000 messages in the queue, all 1000 messages are read first, and only then are the Mongo operations called.
Would running the workers in a separate Node.js process work?
Ans: I tried decoupling the workers from the main thread; it does not help.
I have also written a load balancer with 10 workers, but that does not seem to help either. Is the event loop not prioritizing the Mongo operations?
Ans: That does not help either: the 10 workers read from the queue and only execute findOneAndUpdate once there is nothing more to read.
Any help would be appreciated.
Thank you
Based on the description of the problem, I think your consumer has no flow control. This can happen when you have a bunch of messages sitting in the queue and then subscribe a consumer with auto-ack set to true and no prefetch count: the broker pushes the whole backlog to the client as fast as it can. This answer describes in a bit more detail what happens in this case.
If I had to guess, I'd say the JavaScript code is spending all of its cycles downloading messages from the broker rather than writing them to Mongo. Adding a prefetch count while also disabling auto-ack may solve your issue.
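A minimal sketch of that suggestion, assuming an amqplib callback-API channel (prefetch, consume with { noAck: false }, ack, nack) and a Mongoose model are passed in. The startConsumer name, the prefetch value of 10 and the update shown are illustrative, not taken from the question; injecting the dependencies means the function can be exercised without a broker:

```javascript
// Sketch only: channel is assumed to be an amqplib channel, Conversations a Mongoose model.
const PREFETCH = 10; // illustrative limit on unacknowledged messages

function startConsumer(channel, Conversations, queueName) {
  // Ask the broker to keep at most PREFETCH unacknowledged messages on this consumer,
  // instead of pushing the entire backlog to the client at once.
  channel.prefetch(PREFETCH);

  channel.consume(
    queueName,
    async (msg) => {
      if (msg === null) return; // the broker cancelled the consumer
      try {
        // Await the query only (no callback), so it executes exactly once.
        await Conversations.findOneAndUpdate(
          { _id: "someID" },
          { $push: { messages: { body: msg.content.toString() } } }
        );
        channel.ack(msg); // frees a prefetch slot, so the broker sends the next message
      } catch (err) {
        console.error("update failed:", err.message);
        // Reject without requeue (dead-lettered if the queue has a DLX configured).
        channel.nack(msg, false, false);
      }
    },
    { noAck: false } // manual acknowledgements instead of auto-ack
  );
}

module.exports = { startConsumer };
```

With this setup the broker should only hand the consumer a handful of messages at a time, so the Mongo results get event-loop turns between deliveries rather than after the whole backlog.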
Related
This question might be a duplicate, but I am still not getting the answer. I am fairly new to Node.js, so I might need some help. Many have said that Node.js is perfectly free to handle incoming requests asynchronously, but the code below suggests that if multiple requests hit the same endpoint, say /test3, the callback function will:
Print "test3"
Call setTimeout() to avoid blocking the event loop
Wait for 5 seconds and send a response of "test3" to the client
What I see is that if client 1 and client 2 call the /test3 endpoint at the same time (assuming client 1 hits the endpoint first), client 2 has to wait for client 1 to finish before its request enters the event loop.
Can anybody tell me whether it is possible for multiple clients to call a single endpoint and have their requests run concurrently, not sequentially, something like a one-thread-per-connection model?
Of course, if I were to call other endpoint /test1 or /test2 while the code is still executing on /test3, I would still get a response straight from /test2, which is "test2" immediately.
const express = require("express");
const app = express();

app.get("/test1", (req, res) => {
  console.log("test1");
  setTimeout(() => res.send("test1"), 5000);
});
app.get("/test2", async (req, res) => {
  console.log("test2");
  res.send("test2");
});
app.get("/test3", (req, res) => {
  console.log("test3");
  setTimeout(() => res.send("test3"), 5000);
});

app.listen(3000); // port number is illustrative
For those who have visited: it turned out to have nothing to do with blocking the event loop.
I found something interesting. The answer to the question can be found here.
When I was using Chrome, the requests kept getting blocked after the first request. With Safari, however, I was able to hit the endpoint concurrently. (Chrome appears to hold back a second GET for the same URL until the first one completes.) For more details, see the following link:
GET requests from Chrome browser are blocking the API to receive further requests in NODEJS
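To check that Node itself is not serializing the requests, here is a self-contained sketch that uses Node's built-in http module instead of Express (an assumption so it runs without dependencies) and shortens the delay to 300 ms. It fires two requests at /test3 at the same time from Node rather than from a browser; if they were handled one after another, the total would be at least 600 ms:

```javascript
const http = require("node:http");

const DELAY_MS = 300; // shortened from the question's 5000 ms so the demo runs quickly

// Same shape as the /test3 handler: log, then respond after a timer.
const server = http.createServer((req, res) => {
  if (req.url === "/test3") {
    console.log("test3");
    setTimeout(() => res.end("test3"), DELAY_MS);
  } else {
    res.statusCode = 404;
    res.end();
  }
});

// Send `count` GET /test3 requests at once on separate connections and
// resolve with the response bodies and the total elapsed time.
function fireConcurrently(port, count) {
  const start = Date.now();
  const requests = Array.from(
    { length: count },
    () =>
      new Promise((resolve, reject) => {
        http
          .get({ host: "127.0.0.1", port, path: "/test3", agent: false }, (res) => {
            let body = "";
            res.setEncoding("utf8");
            res.on("data", (chunk) => (body += chunk));
            res.on("end", () => resolve(body));
          })
          .on("error", reject);
      })
  );
  return Promise.all(requests).then((bodies) => ({ bodies, elapsed: Date.now() - start }));
}

const result = new Promise((resolve, reject) => {
  server.listen(0, "127.0.0.1", () => {
    const { port } = server.address();
    fireConcurrently(port, 2)
      .then(resolve, reject)
      .finally(() => server.close());
  });
});

result.then(({ bodies, elapsed }) => {
  // Sequential handling would take at least 2 * DELAY_MS.
  console.log(bodies, `${elapsed} ms`);
});
```

Both timers are pending at the same time, so both responses arrive after roughly one delay, which is consistent with the finding that the serialization happened in the browser.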
Run your application in cluster mode; look up PM2.
This question needs more details to be answered and is clearly opinion-based. Just because it is a straw-man argument, I will answer it.
First of all, we need to define "run concurrently", because it is ambiguous: if we take the literal meaning, in strict theory nothing RUNS CONCURRENTLY.
CPUs can only carry out one instruction at a time.
The speed at which the CPU carries out instructions is called the clock speed, and it is controlled by a clock. With every tick of the clock, the CPU fetches and executes one instruction. Clock speed is measured in cycles per second, and 1 cycle per second is known as 1 hertz. This means that a CPU with a clock speed of 2 gigahertz (GHz) can carry out two billion (2,000,000,000) cycles per second.
CPUs running multiple tasks "concurrently"
Yes, you're right: nowadays computers, and even cell phones, come with multiple cores, which means the number of tasks running at the same time depends on the number of cores. But if you ask any expert, such as this Associate Staff Engineer (AKA me), they will tell you that you will very, very rarely find a server with more than one core. Why would you spend 500 USD on a multi-core server when you can spawn a whole bunch of nano instances (or whatever option is available in the free trial) with Kubernetes?
Another thing: why would you configure Node to be in charge of routing? Let Apache and/or nginx worry about that.
As you mentioned, there is something called the event loop, which is a fancy name for a FIFO queue data structure.
So, in other words: no, neither Node.js nor any other programming language out there will run your requests truly concurrently,
but it definitely depends on your infrastructure.
I think we need some help here. Thanks in advance.
I have been programming desktop applications in .NET and have used Timer objects to wait for a task to complete before the task results are shown in a data grid. Recently we switched over to Node.js and find it pretty interesting. We designed a small application that executes some tasks using PowerShell scripts and returns the data to the client browser. However, I have to run a Timer in the client browser (when someone clicks a button) to check whether the file that the Timer receives from the server contains "ENDOFDATA". Once the Timer sees ENDOFDATA, it triggers another function to populate a DIV with the data received from the server.
Is this the right way to get data from a server? We really don't want to block the event loop. We run PowerShell scripts on Node.js to collect users from Active Directory and then send the data back to the client browser. The PowerShell scripts are executed as a Job, so the event loop is not blocked.
Here is an example of the code on the Node.js side:
In the code below, can we insert something that won't block the event loop but still responds to the client once the task is completed? As you can see, we would like to send the ADUsers.CSV file to the client browser once GetUsers.PS1 has finished executing. Since GetUsers.PS1 takes about five minutes to complete, the event loop is blocked and the server can no longer accept any other requests.
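// Assumes the node-powershell v4 Shell API (addCommand/invoke); the options are illustrative.
const Shell = require("node-powershell");
const ps = new Shell({ executionPolicy: "Bypass", noProfile: true });

app.post("/LoadDomUsers", (request, response) => {
  // Check whether the request is an AJAX one and accepts JSON
  if (request.xhr || request.accepts("json, html") === "json") {
    const ThisAD = request.body.ThisAD;
    console.log(ThisAD);
    // ScriptPara is not defined in this snippet; presumably it is set elsewhere in the app.
    ps.addCommand("./public/ps/GetUsers.PS1", [{
      name: 'AllParaNow',
      value: ScriptPara
    }]);
    // Read the CSV the script wrote and emit it as the command output
    ps.addCommand(`$rc = gc ` + __dirname + "/public/TestData/AD/ADUsers.CSV");
    ps.addCommand(`$rc`);
    ps.invoke()
      .then((output) => {
        response.send({ message: output });
        console.log(output);
      })
      .catch((err) => response.status(500).send({ error: String(err) }));
  }
});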
Thank you.
The way you describe your problem isn't very clear; I had to read some of the comments on your question just to be sure I understood the issue. Honestly, you could just use one of the various CSV packages on npm to read and write your Active Directory data with Node.js.
I/O is non-blocking in Node.js, so you're not actually blocking the event loop. You can handle multiple I/O requests at once, since Node.js hands each operation off (to the operating system, or to libuv's thread pool for things like file access) and continues executing on the main thread until the I/O operations complete. Their callbacks are then queued and run with the resulting data. After you get the I/O data, you just send it back to the client through the response object. There should be no timers needed.
So is the issue that, once the PowerShell script runs, you have to wait for that initial script to complete before being able to handle pending requests? I'm still a bit unclear...
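A small illustration of the point above, using Node's built-in child_process instead of node-powershell: a separate Node process (started via process.execPath) stands in for GetUsers.PS1, and an interval timer counts event-loop ticks while it runs. The script text and timings are made up for the demo; the idea is that a handler can simply await the child process and respond once, so the browser can await a single request instead of polling with a Timer:

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");
const execFileAsync = promisify(execFile);

// Stand-in for GetUsers.PS1: a separate process that takes a while, then prints its result.
const SLOW_SCRIPT = "setTimeout(() => console.log('user1,user2\\nENDOFDATA'), 500)";

async function runLongTask() {
  // The child runs in its own process; the main event loop keeps serving other work.
  const { stdout } = await execFileAsync(process.execPath, ["-e", SLOW_SCRIPT]);
  return stdout;
}

// Count event-loop ticks while the child process runs, to show the loop stays free.
let ticks = 0;
const ticker = setInterval(() => ticks++, 50);

const done = runLongTask()
  .then((output) => {
    // In an Express handler, this is where you would call response.send(...) exactly once.
    console.log(`event-loop ticks while waiting: ${ticks}`);
    return output;
  })
  .finally(() => clearInterval(ticker));
```

If the loop were blocked, the interval could not fire until the child finished; here it keeps ticking throughout.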
I have an AWS lambda function that consumes data from an AWS SQS queue. If this lambda finds a problem when processing the data of a message, then this message has to be added in a dead letter queue.
The documentation I found is not clear about how I can make the Lambda send the message to the dead-letter queue. How is that accomplished?
Should I use the sendMessage() method, like I'd do to insert in a standard queue, or is there a better approach?
AWS will automatically send messages to your dead-letter queue (DLQ) if receiveMessage returns a message too many times (configured on the queue with the maxReceiveCount property of its redrive policy). Typically this happens if you receive a message but don't delete it, for example because you hit an exception while processing it. This is the simplest way to use a DLQ: let AWS put messages there for you.
However, there's nothing wrong with manually sending a message to a DLQ. There's nothing special about it; it's just another queue. You can send and receive messages from it, or even give it its own DLQ!
Manually sending messages to a DLQ is useful in several scenarios, the simplest one being your case: when you know the message is broken (and want to save time trying to reprocess it). Another example is if you need to quickly burn through old items in your main queue but still save those messages for processing later - enabling you to catch up from backlog by processing more recent events first.
The key things to remember when manually sending a message to a DLQ are:
Send the message to the DLQ FIRST.
Then mark the message as consumed in the original queue (using deleteMessage), so AWS's automatic mechanism doesn't put it there for you later.
If you delete the message from the original queue first, there is a small chance the message is lost (i.e. if you crash or hit an error before storing it elsewhere).
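A sketch of that ordering, assuming a client object with promise-returning sendMessage and deleteMessage methods that take the SQS parameter names (QueueUrl, MessageBody, ReceiptHandle), such as the aggregated SQS client in AWS SDK for JavaScript v3. The moveToDeadLetterQueue name and the queue URLs are hypothetical:

```javascript
// Sketch: move a message you already know is broken to the DLQ.
// `sqs` is assumed to expose promise-returning sendMessage/deleteMessage methods;
// `message` is one entry from a receiveMessage response (Body, ReceiptHandle).
async function moveToDeadLetterQueue(sqs, message, sourceQueueUrl, dlqUrl) {
  // 1. Store the message in the DLQ first. If this throws, the original is still in
  //    the source queue and becomes visible again after its visibility timeout.
  await sqs.sendMessage({
    QueueUrl: dlqUrl,
    MessageBody: message.Body,
  });
  // 2. Only then delete it from the source queue, so the automatic redrive never sees it.
  await sqs.deleteMessage({
    QueueUrl: sourceQueueUrl,
    ReceiptHandle: message.ReceiptHandle,
  });
}
```

Because each step is awaited, a failed send stops the function before the delete runs.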
You are not supposed to send messages to the dead-letter queue yourself; messages that fail to process too many times will get there on their own (see here).
The point is: you receive the message, fail on it, and don't delete it; after maxReceiveCount receives, SQS redrives it to the DLQ.
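For that automatic route, the source queue needs a redrive policy. A sketch of the attribute you would pass to SetQueueAttributes (or CreateQueue); the helper name, the ARN and the count of 5 are placeholders:

```javascript
// Builds the Attributes map that attaches a DLQ to a source queue.
// SQS expects RedrivePolicy as a JSON string.
function redrivePolicyAttributes(deadLetterTargetArn, maxReceiveCount) {
  return {
    RedrivePolicy: JSON.stringify({
      deadLetterTargetArn,
      // After this many receives without a delete, SQS moves the message to the DLQ.
      maxReceiveCount: String(maxReceiveCount),
    }),
  };
}

const attributes = redrivePolicyAttributes(
  "arn:aws:sqs:us-east-1:123456789012:my-queue-dlq", // placeholder ARN
  5
);
console.log(attributes.RedrivePolicy);
```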
Note that you can simply send it to the DLQ yourself (the documentation hints at this; see the section "The NumberOfMessagesSent and NumberOfMessagesReceived for a Dead-Letter Queue Don't Match"), however that seems like an abuse, to me at least.
TLDR: You're not supposed to send it yourself, the queue needs to be configured with a DLQ and Amazon will do it for you after a set amount of failures.
Perhaps the underlying issue is how the node-kafka module I am using has implemented things, but perhaps not, so here we go...
Using the node-kafka library, I am facing an issue with subscribing to consumer.on('message') events. The library uses the standard events module, so I think this question might be generic enough.
My actual code structure is large and complicated, so here is a pseudo-example of the basic layout to highlight my problem. (Note: This code snippet is untested so I might have errors here, but the syntax is not in question here anyway)
// consumer (node-kafka) and mysql (a MySQL connection) are set up elsewhere.
var messageCount = 0;
var queryCount = 0;
// Getting messages via some event emitter
consumer.on('message', function(message) {
  messageCount++;
  console.log('Message #' + messageCount);
  // Making a database call for each message
  mysql.query('SELECT "test" AS testQuery', function(err, rows, fields) {
    queryCount++;
    console.log('Query #' + queryCount);
  });
});
What I am seeing is that when I start my server, there are 100,000 or so backlogged messages that Kafka wants to give me, and it does so through the event emitter. So I start getting messages. Getting and logging all the messages takes about 15 seconds.
This is what I would expect to see for an output assuming the mysql query is reasonably fast:
Message #1
Message #2
Message #3
...
Message #500
Query #1
Message #501
Message #502
Query #2
... and so on in some intermingled fashion
I would expect this because my first mysql result should be ready very quickly and I would expect the result(s) to take their turn in the event loop to have the response processed. What I am actually getting is:
Message #1
Message #2
...
Message #100000
Query #1
Query #2
...
Query #100000
I am getting every single message before a mysql response is able to be processed. So my question is, why? Why am I not able to get a single database result until all the message events are complete?
Another note: I set a breakpoint at .emit('message') in node-kafka and at mysql.query() in my code, and I hit them turn by turn. So it appears that the 100,000 emits are not all stacking up up front before reaching my event subscriber. So there went my first hypothesis about the problem.
Ideas and knowledge would be very appreciated :)
The node-kafka driver uses quite a liberal buffer size (1M), which means that it will get as many messages from Kafka that will fit in the buffer. If the server is backlogged, and depending on the message size, this may mean (tens of) thousands of messages coming in with one request.
Because EventEmitter is synchronous (it doesn't use the Node event loop), this means that the driver will emit (tens of) thousands of events to its listeners, and since it's synchronous, it won't yield to the Node event loop until all messages have been delivered.
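A self-contained illustration with Node's built-in events module: a plain EventEmitter stands in for the node-kafka consumer, and setImmediate stands in for the MySQL result callback (the real callback arrives through I/O, but it is likewise held back until the synchronous emit loop returns):

```javascript
const { EventEmitter } = require("node:events");

const order = [];
const consumer = new EventEmitter();

consumer.on("message", (n) => {
  order.push(`message ${n}`);
  // Simulated asynchronous DB callback: it can only run after the current
  // synchronous code (the whole emit loop below) has finished.
  setImmediate(() => order.push(`query ${n}`));
});

// Simulate the driver delivering one buffered batch in a single synchronous loop.
for (let n = 1; n <= 3; n++) consumer.emit("message", n);
order.push("batch emitted");

setTimeout(() => console.log(order.join("\n")), 20);
```

Every "message" entry and "batch emitted" are recorded before the first "query" entry, mirroring the output in the question.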
I don't think you can work around the flood of event deliveries, but I also don't think the event delivery itself is the problem. The more likely problem is starting an asynchronous operation (in this case a MySQL query) for each event, which may flood the database with queries.
A possible workaround would be to use a queue instead of performing the queries directly from the event handlers. For instance, with async.queue you can limit the number of concurrent (asynchronous) tasks. The "worker" part of the queue would perform the MySQL query, and in the event handlers you'd merely push the message onto the queue.
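Since the async library may not be installed, here is a hand-rolled stand-in for async.queue (not the library itself): a worker function plus a concurrency limit, with extra tasks waiting in a FIFO array. The createQueue name and the limit are illustrative:

```javascript
// Runs at most `concurrency` tasks at once; the rest wait in FIFO order.
function createQueue(worker, concurrency) {
  const pending = [];
  let running = 0;

  function next() {
    while (running < concurrency && pending.length > 0) {
      const { task, resolve, reject } = pending.shift();
      running++;
      Promise.resolve()
        .then(() => worker(task)) // worker returns a promise (e.g. a MySQL query)
        .then(resolve, reject)
        .finally(() => {
          running--;
          next(); // start the next waiting task, if any
        });
    }
  }

  return {
    // Returns a promise for the task's result; attach a .catch in the event handler.
    push(task) {
      return new Promise((resolve, reject) => {
        pending.push({ task, resolve, reject });
        next();
      });
    },
  };
}
```

In the event handler you would then write something like consumer.on('message', (m) => queue.push(m).catch(console.error)), with a worker that runs the query and returns a promise. This caps concurrent queries; the messages themselves still accumulate in memory while the emitter floods the handler.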
A classic problem, this time encountered in a Node.js environment, and after thinking about it for a while I'm not sure what the best way to solve it is.
I have:
4 resources: let's call these "keys".
X tasks: let's call these "locks".
Y workers: between 1 and 4, actually.
A combination of 1 task and 1 resource (it doesn't matter which) is a job to be done by a worker; in other words, a worker opens the lock with the key :)
I want to open these locks as fast as possible using my resources and workers.
Each worker should try to grab a key, grab a lock, unlock the lock, then put the key back.
When no keys are available, a worker needs to wait until one is free.
When all locks are open, the workers can go home and enjoy a beer.
How would you solve this in a JavaScript / Node.js environment?
Feel free to use Redis/Mongo/whatever tool you need to make this happen.
Please help me send my workers home early today! :)
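One way to model this inside a single Node.js process is a key pool that behaves like a counting semaphore: workers take a lock from a shared list, await a key, do the work, and release the key, which is handed directly to the next waiting worker. KeyPool, openLock and run are hypothetical names, and a multi-process setup would need shared state (Redis, a message queue, etc.) instead of in-memory arrays:

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

class KeyPool {
  constructor(keys) {
    this.free = [...keys];
    this.waiters = []; // resolve functions of workers waiting for a key
  }
  acquire() {
    if (this.free.length > 0) return Promise.resolve(this.free.pop());
    return new Promise((resolve) => this.waiters.push(resolve));
  }
  release(key) {
    const next = this.waiters.shift();
    if (next) next(key); // hand the key straight to the longest-waiting worker
    else this.free.push(key);
  }
}

// Hypothetical stand-in for the real unlocking work.
async function openLock(lock, key) {
  await sleep(10);
  return `${lock} opened with ${key}`;
}

async function worker(name, locks, pool, log) {
  while (locks.length > 0) {
    // shift() runs synchronously, so two workers can never take the same lock
    const lock = locks.shift();
    const key = await pool.acquire(); // waits here while every key is in use
    try {
      log.push(`${name}: ${await openLock(lock, key)}`);
    } finally {
      pool.release(key); // always put the key back
    }
  }
}

async function run(lockCount, workerCount, keyCount) {
  const locks = Array.from({ length: lockCount }, (_, i) => `lock${i + 1}`);
  const keys = Array.from({ length: keyCount }, (_, i) => `key${i + 1}`);
  const pool = new KeyPool(keys);
  const log = [];
  // All workers resolve once every lock is open: time to go home.
  await Promise.all(
    Array.from({ length: workerCount }, (_, i) => worker(`worker${i + 1}`, locks, pool, log))
  );
  return log;
}

run(6, 4, 4).then((log) => console.log(log.join("\n")));
```

With at most 4 workers and 4 keys, no worker ever waits for a key; the waiting path only matters when there are more workers than keys.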
You can use RabbitMQ to solve this problem. You can maintain a separate queue for each of the 4 workers, with every worker bound to its own queue.
Each worker waits while there is no message on its queue.
As soon as there is a message (in your case, a key), the worker starts
processing, completes the task, and again waits for the next key.
There will be a single publisher and multiple listeners, i.e. a single supervisor and multiple workers.
You can even implement batch processing if required.
For example:
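// Uses the node-amqp ("amqp") package; host and names are illustrative.
var amqp = require('amqp');
var connection = amqp.createConnection({ host: 'localhost' });

connection.on('ready', function () {
  connection.queue("queue_name", function (queue) {
    connection.exchange('exchange_name', { type: 'topic', confirm: true }, function (exchange) {
      // Name the routing key explicitly; '#' matches every key on a topic exchange.
      queue.bind('exchange_name', '#');
      queue.subscribe({ ack: true }, function (message, headers, deliveryInfo) {
        try {
          var data = unescape(message.data);
          processMessage(data); // whatever needs to be done with the received message
        }
        catch (e) {
          console.log('Some error occurred: ' + e);
        }
        // Acknowledge in both cases; with ack: true the next message is only
        // delivered after the current one is shifted, so skipping this would stall the worker.
        queue.shift();
      });
    });
  });
});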