I ran a test with two different consumers (call them Consumer-A and Consumer-B) on the same queue, and found that while Consumer-A is pulling a message, Consumer-B doesn't pull any messages and waits for Consumer-A to finish first, and vice versa.
The process continues like this repeatedly.
So my questions are:
Is this normal behavior for multiple SQS consumers? I found this, which seems to answer the question, but I'm not sure: https://aws.amazon.com/sqs/faqs/#:~:text=Do%20Amazon%20SQS%20FIFO%20queues%20support%20multiple%20consumers%3F
If we run the SQS consumer on ECS (for example, with a desired count of 3), do we get 30 messages at a time, or what?
Do you have any other solutions for scaling out to consume more SQS messages (not using Lambda would be great)?
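For reference, each consumer in my test runs roughly this kind of long-polling loop (a minimal sketch with the AWS SDK v3 for JavaScript; the queue URL is a placeholder):

const {
  SQSClient,
  ReceiveMessageCommand,
  DeleteMessageCommand,
} = require("@aws-sdk/client-sqs");

const sqs = new SQSClient({});
const queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"; // placeholder

async function poll() {
  while (true) {
    // A single ReceiveMessage call returns at most 10 messages, so
    // 3 ECS tasks would receive at most 30 messages per polling round.
    const {Messages = []} = await sqs.send(new ReceiveMessageCommand({
      QueueUrl: queueUrl,
      MaxNumberOfMessages: 10, // hard upper limit per call
      WaitTimeSeconds: 20,     // long polling
    }));
    for (const message of Messages) {
      // ...process the message, then delete it so it isn't redelivered
      await sqs.send(new DeleteMessageCommand({
        QueueUrl: queueUrl,
        ReceiptHandle: message.ReceiptHandle,
      }));
    }
  }
}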
Here is what I'm doing (using this: https://firebase.google.com/docs/functions/task-functions):
Launch an array of callable functions from the site (up to 500 cloud functions are launched after a button click by each user); the same cloud function is called 500 times
They get added to a queue to be processed at the desired rate
Each of the functions has the same task:
- Get a specific file from an API call (which takes some time)
- Download it to firebase storage (also not instant)
- Finally, update the Firestore database accordingly
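Roughly, each task function looks like this (a minimal sketch based on the task-functions docs linked above; the helper names are hypothetical):

const {onTaskDispatched} = require("firebase-functions/v2/tasks");

// One queued task per file; the queue dispatches them at a controlled rate.
// fetchFromApi / uploadToStorage / updateFirestore are hypothetical helpers.
exports.processFile = onTaskDispatched({
  rateLimits: {maxConcurrentDispatches: 6},
  retryConfig: {maxAttempts: 3},
}, async (req) => {
  const contents = await fetchFromApi(req.data.fileId);
  await uploadToStorage(req.data.fileId, contents);
  await updateFirestore(req.data.fileId);
});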
Here is my issue:
This works fine with one user; however, with 2 or more users at the same time it does not scale as desired.
Indeed, the second user has to wait for the 500 cloud functions from the first user to complete before their own 500 functions start running (which can take 30 minutes), since they all get added to the same queue.
Edit: As the first comment said, the second user is not actually waiting for all 500 of the first user's functions to finish before some of their own can proceed; the point, however, is that the two users' workloads conflict (increasing the first user's processing time), and they will conflict even more if yet another user starts their process as well.
So my questions are:
Is there a way to have a queue specific to each user somehow?
If not, how should I approach this using cloud functions? Is this possible?
If not possible with cloud functions, what would you advise?
Any help will be appreciated
Edit: Possible solutions I'm thinking of so far:
1- Minimize the time each function takes and increase the number of functions that can run in parallel without exceeding the API's rate limits
2- Handle all the work in one big function per user call, without hitting the 9-minute limit if possible (i.e., a loop over the 500 items inside one function instead of launching 500 cloud functions); see the sketch after this list
3- Others?
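Here is a rough sketch of option 2, assuming the call payload carries a files array and using hypothetical helper names:

const functions = require("firebase-functions");

// One callable function per user, processing all files in chunks of 10
// so a single user's batch doesn't exceed the API's rate limits.
// fetchFromApi / uploadToStorage / updateFirestore are hypothetical helpers.
exports.processAll = functions
  .runWith({timeoutSeconds: 540}) // the 9-minute ceiling
  .https.onCall(async (data, context) => {
    const files = data.files; // up to 500 items
    for (let i = 0; i < files.length; i += 10) {
      await Promise.all(files.slice(i, i + 10).map(async (file) => {
        const contents = await fetchFromApi(file);
        await uploadToStorage(file, contents);
        await updateFirestore(file);
      }));
    }
  });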
This question might be a duplicate, but I am still not getting the answer. I am fairly new to Node.js, so I might need some help. Many have said that Node.js is perfectly free to handle incoming requests asynchronously, but the code below shows that if multiple requests hit the same endpoint, say /test3, the callback function will:
Print "test3"
Call setTimeout() to avoid blocking the event loop
Wait for 5 seconds and send a response of "test3" to the client
My question: if client 1 and client 2 call the /test3 endpoint at the same time (assuming client 1 hits the endpoint first), does client 2 have to wait for client 1 to finish before its request enters the event loop?
Can anybody tell me whether it is possible for multiple clients to call a single endpoint and have their requests run concurrently rather than sequentially, something like a one-thread-per-connection analogy?
Of course, if I call another endpoint such as /test1 or /test2 while the code for /test3 is still executing, I still get a response from /test2 ("test2") immediately.
app.get("/test1", (req, res) => {
console.log("test1");
setTimeout(() => res.send("test1"), 5000);
});
app.get("/test2", async (req, res, next) => {
console.log("test2");
res.send("test2");
});
app.get("/test3", (req, res) => {
console.log("test3");
setTimeout(() => res.send("test3"), 5000);
});
For those who have visited: it has nothing to do with blocking of the event loop.
I have found something interesting. The answer to the question can be found here: when I was using Chrome, the requests kept getting blocked after the first request; with Safari, however, I was able to hit the endpoint concurrently. For more details, see the link below.
GET requests from Chrome browser are blocking the API to receive further requests in NODEJS
Run your application in cluster mode. Look up PM2.
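PM2's cluster mode (pm2 start app.js -i max) forks one process per CPU core, which is essentially what Node's built-in cluster module does; a minimal sketch:

const cluster = require("node:cluster");
const os = require("node:os");

if (cluster.isPrimary) {
  // Fork one worker per CPU core; each worker gets its own event loop.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  require("./app"); // each worker starts its own copy of the Express server
}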
This question needs more details to be answered properly and is clearly opinion-based; but since it is something of a strawman argument, I will answer it anyway.
First of all, we need to define "run concurrently"; the term is ambiguous. If we take the literal meaning, then in strict theory nothing runs concurrently on a single core:
A CPU core can only carry out one instruction at a time.
The speed at which the CPU carries out instructions is called the clock speed, and it is controlled by a clock: with every tick of the clock, the CPU fetches and executes one instruction. Clock speed is measured in cycles per second, where 1 cycle per second is 1 hertz. This means that a CPU with a clock speed of 2 gigahertz (GHz) can carry out two billion (2,000 million) cycles per second.
As for CPUs running multiple tasks "concurrently":
Yes, you're right that nowadays computers, and even cell phones, come with multiple cores, which means the number of tasks truly running at the same time depends on the number of cores. But any expert (such as this Associate Staff Engineer, a.k.a. me) will tell you that you will very, very rarely find a server with more than one core: why would you spend 500 USD on a multi-core server when you can spawn a whole bunch of ...nano (or whatever option is available in the free tier)... instances with Kubernetes?
Another thing: why would you configure Node to be in charge of the routing? Let Apache and/or Nginx worry about that.
As you mentioned, there is this thing called the event loop, which is a fancy name for a FIFO queue data structure.
So, in other words: no. Neither Node.js nor any other programming language out there will truly run your requests concurrently on a single core,
but it definitely depends on your infrastructure.
I have an AWS Lambda function that consumes data from an AWS SQS queue. If this Lambda finds a problem when processing the data of a message, then this message has to be added to a dead letter queue.
The documentation I found is not clear about how I can make the Lambda send the message to the dead letter queue. How is that accomplished?
Should I use the sendMessage() method, as I would to insert into a standard queue, or is there a better approach?
AWS will automatically send messages to your dead-letter queue (DLQ) for you if receiveMessage returns the same message too many times (configurable on the queue with the maxReceiveCount property); typically this happens if you receive a message but don't delete it (for example, because some exception occurred while processing it). This is the simplest way to use a DLQ: by letting AWS put messages there for you.
However, there's nothing wrong with manually sending a message to a DLQ. There's nothing special about it - it's just another queue - you can send and receive messages from it, or even give it its own DLQ!
Manually sending messages to a DLQ is useful in several scenarios, the simplest one being your case: when you know the message is broken (and want to save time trying to reprocess it). Another example is if you need to quickly burn through old items in your main queue but still save those messages for processing later - enabling you to catch up from backlog by processing more recent events first.
The key things to remember when manually sending a message to a DLQ are:
Send the message to the DLQ FIRST
Then mark the message as consumed in the original queue (using deleteMessage) so AWS's automatic mechanism doesn't put it there for you later.
If you delete the message from the original queue first, there is a small chance the message is lost (i.e., if you crash or hit an error before storing the message elsewhere).
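A minimal sketch of that order of operations with the AWS SDK v3 for JavaScript (the queue URLs are placeholders):

const {
  SQSClient,
  SendMessageCommand,
  DeleteMessageCommand,
} = require("@aws-sdk/client-sqs");

const sqs = new SQSClient({});

async function moveToDlq(message, sourceQueueUrl, dlqUrl) {
  // 1. Copy the message to the DLQ first...
  await sqs.send(new SendMessageCommand({
    QueueUrl: dlqUrl,
    MessageBody: message.Body,
  }));
  // 2. ...then delete it from the source queue so the automatic
  // redrive mechanism doesn't deliver it again later.
  await sqs.send(new DeleteMessageCommand({
    QueueUrl: sourceQueueUrl,
    ReceiptHandle: message.ReceiptHandle,
  }));
}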
You are not supposed to send messages to the dead letter queue yourself; messages that fail processing too many times will get there on their own (see here).
The point is: you receive the message, fail on it, and don't delete it; after maxReceiveCount receives, SQS will redrive it to the DLQ.
Note that you can simply send it to the DLQ yourself (hinted at by the documentation where it says "The NumberOfMessagesSent and NumberOfMessagesReceived for a Dead-Letter Queue Don't Match"), but that seems like an abuse, to me at least.
TL;DR: You're not supposed to send it yourself; the queue needs to be configured with a DLQ, and Amazon will move the message for you after a set number of failures.
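For completeness, that redrive behavior is a single queue attribute; a sketch with the AWS SDK v3 for JavaScript (the URL and ARN are placeholders):

const {SQSClient, SetQueueAttributesCommand} = require("@aws-sdk/client-sqs");

const sqs = new SQSClient({});

async function attachDlq() {
  // After 5 failed receives, SQS itself moves the message to the DLQ.
  await sqs.send(new SetQueueAttributesCommand({
    QueueUrl: "https://sqs.us-east-1.amazonaws.com/123456789012/main-queue",
    Attributes: {
      RedrivePolicy: JSON.stringify({
        deadLetterTargetArn: "arn:aws:sqs:us-east-1:123456789012:main-dlq",
        maxReceiveCount: "5",
      }),
    },
  }));
}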
Perhaps the underlying issue is how the node-kafka module I am using has implemented things, but perhaps not, so here we go...
Using the node-kafka library, I am facing an issue with subscribing to consumer.on('message') events. The library uses the standard events module, so I think this question might be generic enough.
My actual code structure is large and complicated, so here is a pseudo-example of the basic layout to highlight my problem. (Note: this code snippet is untested, so I might have errors here, but the syntax is not in question anyway.)
var messageCount = 0;
var queryCount = 0;

// Getting messages via some event emitter
consumer.on('message', function(message) {
  messageCount++;
  console.log('Message #' + messageCount);

  // Making a database call for each message
  mysql.query('SELECT "test" AS testQuery', function(err, rows, fields) {
    queryCount++;
    console.log('Query #' + queryCount);
  });
});
What I am seeing is that when I start my server, there are 100,000 or so backlogged messages that Kafka wants to give me, and it does so through the event emitter. So I start to get messages. Getting and logging all of them takes about 15 seconds.
This is what I would expect to see for an output assuming the mysql query is reasonably fast:
Message #1
Message #2
Message #3
...
Message #500
Query #1
Message #501
Message #502
Query #2
... and so on in some intermingled fashion
I would expect this because my first mysql result should be ready very quickly and I would expect the result(s) to take their turn in the event loop to have the response processed. What I am actually getting is:
Message #1
Message #2
...
Message #100000
Query #1
Query #2
...
Query #100000
I am getting every single message before a mysql response is able to be processed. So my question is, why? Why am I not able to get a single database result until all the message events are complete?
Another note: I set a breakpoint at .emit('message') in node-kafka and at mysql.query() in my code, and I hit them turn by turn. So it appears that all 100,000 emits are not stacking up front before reaching my event subscriber; there goes my first hypothesis about the problem.
Ideas and knowledge would be very appreciated :)
The node-kafka driver uses quite a liberal buffer size (1 MB), which means that it will fetch as many messages from Kafka as will fit in the buffer. If the server is backlogged, then depending on message size, this may mean (tens of) thousands of messages coming in with one request.
Because EventEmitter is synchronous (it doesn't use the Node event loop), this means that the driver will emit (tens of) thousands of events to its listeners, and since it's synchronous, it won't yield to the Node event loop until all messages have been delivered.
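A quick way to see that synchronous delivery in isolation:

const EventEmitter = require("events");

const emitter = new EventEmitter();
emitter.on("ping", () => console.log("listener runs"));

console.log("before emit");
emitter.emit("ping"); // the listener runs synchronously, before the next line
console.log("after emit");
// Output: before emit / listener runs / after emit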
I don't think you can work around the flood of event deliveries, but I don't think that specifically the event delivery is problematic. The more likely problem is starting an asynchronous operation (in this case a MySQL query) for each event, which may flood the database with queries.
A possible workaround would be to use a queue instead of performing the queries directly from the event handlers. For instance, with async.queue you can limit the number of concurrent (asynchronous) tasks. The "worker" part of the queue would perform the MySQL query, and in the event handlers you'd merely push the message onto the queue.
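A minimal sketch of that workaround, reusing the names from the pseudo-example above (the concurrency limit of 10 is arbitrary):

const async = require("async");

// The worker performs the MySQL query; at most 10 run at a time.
var queryQueue = async.queue(function(message, done) {
  mysql.query('SELECT "test" AS testQuery', function(err, rows, fields) {
    queryCount++;
    console.log('Query #' + queryCount);
    done(err);
  });
}, 10);

consumer.on('message', function(message) {
  messageCount++;
  console.log('Message #' + messageCount);
  // Merely push the message onto the queue; the worker does the rest.
  queryQueue.push(message);
});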
I'm using RabbitMQ as a buffer for a large number of messages that need to be saved to a database. The messages come in, then on the other end, a script pulls a number of them and writes them, as a batch, to the database. This app is written in NodeJS (using Rabbit.js).
While I haven't found a way to wait for and consume a set number of messages at once (say, 100), I am using a worker queue to receive messages in Node, and then writing them out once a time period has elapsed or the maximum number of messages has been reached.
If the app dies, however, or otherwise fails, I need the messages to be re-released onto the queue.
I can therefore use the ack() function in Rabbit.js on the queue, but that merely acknowledges the most recent message rather than letting me select a number of messages to acknowledge, and I'm reluctant to call ack() 100 times just to reach the right number of acknowledgements.
Is there a way to acknowledge receipt of X messages using Rabbit.js (or some other Node.js library that will work with RabbitMQ)?
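For context, here is the kind of thing I'm after, sketched with amqplib (a different Node.js library): its channel.ack(message, allUpTo) can acknowledge every outstanding message up to and including the one passed, so a batch of 100 could be acked in one call (writeBatchToDatabase is a hypothetical batch writer):

const amqp = require("amqplib");

(async () => {
  const connection = await amqp.connect("amqp://localhost");
  const channel = await connection.createChannel();
  await channel.assertQueue("messages");
  channel.prefetch(100); // at most 100 unacknowledged messages in flight

  let batch = [];
  await channel.consume("messages", (msg) => {
    batch.push(msg);
    if (batch.length >= 100) {
      writeBatchToDatabase(batch); // hypothetical batch writer
      // Ack the last message with allUpTo = true: all 100 in one call.
      channel.ack(batch[batch.length - 1], true);
      batch = [];
    }
  });
})();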