I'm using RabbitMQ as a buffer for a large number of messages that need to be saved to a database. The messages come in; on the other end, a script pulls a batch of them and writes them to the database in one go. The app is written in Node.js (using Rabbit.js).
Since I haven't found a way to wait and consume a set number of messages at once (say, 100), I am using a worker queue to receive messages in Node and then write them out once a time limit or a maximum message count is reached.
If the app dies, however, or otherwise fails, I need the messages to be re-released onto the queue.
I know I can use the ack() function in Rabbit.js on the queue, but that only acknowledges the most recent message rather than letting me choose how many messages to acknowledge, and I'm reluctant to call ack() 100 times just to reach the right number of acknowledgements.
Is there a way to acknowledge receipt of X messages using Rabbit.js (or some other Node.js library that will work with RabbitMQ)?
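For what it's worth, the underlying AMQP protocol supports exactly this through the "multiple" flag on basic.ack, and the amqplib package (a different Node.js client than Rabbit.js) exposes it as the second argument of channel.ack(). Here is a minimal sketch of the batching consumer under that assumption; the queue name, the constants, and writeToDatabase() are all placeholders:

const amqp = require('amqplib');

const BATCH_SIZE = 100;         // assumed batch size
const FLUSH_INTERVAL_MS = 5000; // assumed time limit

async function run() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('messages'); // placeholder queue name

  // Don't let the broker push more than one batch of unacked messages.
  ch.prefetch(BATCH_SIZE);

  let batch = [];

  async function flush() {
    if (batch.length === 0) return;
    const pending = batch;
    batch = [];
    await writeToDatabase(pending.map((m) => m.content)); // hypothetical batch-insert helper
    // Ack only the last message with allUpTo = true: one frame acknowledges
    // every unacked message up to and including it.
    ch.ack(pending[pending.length - 1], true);
  }

  ch.consume('messages', (msg) => {
    batch.push(msg);
    if (batch.length >= BATCH_SIZE) flush();
  }, { noAck: false });

  // A real implementation would also guard against overlapping flushes.
  setInterval(flush, FLUSH_INTERVAL_MS);
}

Because nothing is acked until the batch has been written, a crash makes RabbitMQ redeliver the unacked messages, which is exactly the re-release behaviour described above.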
Here is what I'm doing (using this: https://firebase.google.com/docs/functions/task-functions):
Launch an array of callable functions from the site (up to 500 cloud functions are launched when each user clicks a button) - the same cloud function is called 500 times
They get added to a queue to be processed at the desired rate
Each of the functions has the same task:
- Get a specific file from an API call (which takes some time)
- Download it to firebase storage (also not instant)
- Finally, update the Firestore database accordingly
Here is my issue:
This works fine with one user; however, with 2 or more users at the same time it does not scale as desired.
Indeed, since all the invocations get added to the same queue, the second user has to wait for the first user's 500 cloud functions to complete (which can take 30 min) before their own 500 functions can start running.
Edit: As the first comment said, the second user does not actually have to wait for all 500 of the first user's functions before some of their own run; the point is that the two users' workloads "conflict" (each increases the other's processing time), and they will conflict even more as additional users start their own processes.
So my questions are:
Is there a way to have a queue specific to each user somehow?
If not, how should I approach this using cloud functions? Is this possible?
If not possible with cloud functions, what would you advise?
Any help will be appreciated
Edit: Possible solutions I'm thinking of so far:
1- Minimize the time each function takes and increase the number of functions that can run in parallel without exceeding the API's rate limits
2- Handle all the work in one big function per user call (staying under the 9-minute limit if possible), i.e. loop over the 500-item array inside the function instead of launching 500 cloud functions (see the sketch after this list)
3- Others?
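For what option 2 could look like, here is a minimal sketch assuming the task-queue functions from the linked docs: enqueue one task per user that carries the whole array of files, then work through it inside a single dispatch with bounded concurrency. The queue name, processFile(), and the concurrency numbers are all placeholders:

const functions = require('firebase-functions');
const { getFunctions } = require('firebase-admin/functions');

// One task per user instead of 500 invocations per user.
exports.processUserFiles = functions
  .runWith({ timeoutSeconds: 540 }) // stay under the 9-minute limit
  .tasks.taskQueue({
    rateLimits: { maxConcurrentDispatches: 10 }, // how many users run at once
    retryConfig: { maxAttempts: 3 },
  })
  .onDispatch(async (data) => {
    const { userId, files } = data; // files: the ~500 items for this user
    const CONCURRENCY = 10; // assumed safe parallelism for the API
    for (let i = 0; i < files.length; i += CONCURRENCY) {
      const chunk = files.slice(i, i + CONCURRENCY);
      await Promise.all(chunk.map((file) => processFile(userId, file))); // hypothetical helper
    }
  });

// Called from the button's callable function:
async function enqueueUserWork(userId, files) {
  const queue = getFunctions().taskQueue('processUserFiles');
  await queue.enqueue({ userId, files });
}

With this shape, users compete for whole-task slots rather than for 500 individual queue positions, and maxConcurrentDispatches bounds how many users' batches are processed simultaneously.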
I'm streaming and processing tweets in Firebase Cloud Functions using the Twitter API.
In my stream, I am tracking various keywords and users of Twitter, so the influx of tweets is very high: a new tweet is often delivered before I have finished processing the previous one, which leads to lapses, as some new tweets do not get processed.
This is how my stream looks:
...
const stream = twitter.stream('statuses/filter', {
  track: [various, keywords, ..., ...],
  follow: [userId1, userId2, userId3, userId3, ..., ...],
});

stream.on('tweet', (tweet) => {
  // This takes time: there are multiple network requests involved, and
  // sometimes recursively running functions depending on the tweet's properties.
  processTweet(tweet);
});
...
processTweet(tweet) essentially compiles threads from Twitter, which takes time depending on the length of the thread, sometimes several seconds. I have optimised processTweet(tweet) as much as possible so that it compiles threads reliably.
I want to run processTweet(tweet) in parallel and queue the tweets that come in while processing is underway, so that the stream is consumed reliably, as the Twitter docs specify:
Ensure that your client is reading the stream fast enough. Typically you should not do any real processing work as you read the stream. Read the stream and hand the activity to another thread/process/data store to do your processing asynchronously.
Help would be very much appreciated.
This Twitter streaming API will not work with Cloud Functions.
Cloud Functions code can only be invoked in response to incoming events, and an invocation may run for at most 9 minutes (the default is 60 seconds); after that, the function is forced to shut down. With Cloud Functions, there is no way to continually process a stream of data coming from an API.
In order to use this API, you will need to use some other compute product that allows you to run code indefinitely on a dedicated server instance, such as App Engine or Compute Engine.
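Whatever always-on environment you pick, the hand-off pattern the Twitter docs describe can be as simple as the following sketch, which assumes the stream emitter and processTweet() from the question: the 'tweet' handler only enqueues, and a single worker loop drains the queue:

const queue = [];
let draining = false;

stream.on('tweet', (tweet) => {
  // Do no real work here; just hand off and return,
  // so the stream is always read fast enough.
  queue.push(tweet);
  drain();
});

async function drain() {
  if (draining) return; // only one worker loop at a time
  draining = true;
  while (queue.length > 0) {
    const tweet = queue.shift();
    try {
      await processTweet(tweet); // the existing slow, multi-request work
    } catch (err) {
      console.error('processTweet failed', err);
    }
  }
  draining = false;
}

An in-memory array is lost on restart, so for durability the same shape works with Pub/Sub or another hosted queue in place of the array.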
I have an AWS lambda function that consumes data from an AWS SQS queue. If this lambda finds a problem when processing the data of a message, then this message has to be added in a dead letter queue.
The documentation I found is not clear about how I can make the lambda send the message to the dead letter queue. How is that accomplished?
Should I use the sendMessage() method, as I would to insert into a standard queue, or is there a better approach?
AWS will automatically send messages to your dead-letter queue (DLQ) for you if receiveMessage returns a given message too many times (configurable on the source queue via the maxReceiveCount property of its redrive policy) - typically this happens if you receive a message but don't delete it, for example because an exception was thrown while processing it. This is the simplest way to use a DLQ: let AWS put messages there for you.
However, there's nothing wrong with manually sending a message to a DLQ. There's nothing special about it - it's just another queue - you can send and receive messages from it, or even give it its own DLQ!
Manually sending messages to a DLQ is useful in several scenarios, the simplest one being your case: when you know the message is broken (and want to save time trying to reprocess it). Another example is if you need to quickly burn through old items in your main queue but still save those messages for processing later - enabling you to catch up from backlog by processing more recent events first.
The key things to remember when manually sending a message to a DLQ are:
Send the message to the DLQ FIRST.
Then mark the message as consumed in the original queue (using deleteMessage) so AWS's automatic mechanism doesn't put it there for you later.
If you delete the message from the original queue first, there is a small chance the message is lost (i.e., if you crash or hit an error before storing the message elsewhere).
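A minimal sketch of that order of operations, assuming the AWS SDK for JavaScript v3; the queue URLs are placeholders:

const {
  SQSClient,
  SendMessageCommand,
  DeleteMessageCommand,
} = require('@aws-sdk/client-sqs');

const sqs = new SQSClient({});

async function moveToDlq(message, sourceQueueUrl, dlqUrl) {
  // 1. Copy the message to the DLQ FIRST, so it can't be lost.
  await sqs.send(new SendMessageCommand({
    QueueUrl: dlqUrl,
    MessageBody: message.Body,
  }));

  // 2. Only then delete it from the source queue, so the automatic
  //    redrive doesn't move it there again later.
  await sqs.send(new DeleteMessageCommand({
    QueueUrl: sourceQueueUrl,
    ReceiptHandle: message.ReceiptHandle,
  }));
}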
You are not supposed to send messages to the dead letter queue yourself; messages that fail to process too many times will get there on their own (see here).
The point is: you get the message, fail on it, don't delete it, and after maxReceiveCount failed receives SQS will redrive it to the DLQ.
Note that you can simply send it to the DLQ yourself (hinted at by the documentation where it says "The NumberOfMessagesSent and NumberOfMessagesReceived for a Dead-Letter Queue Don't Match"), but that seems like an abuse of the mechanism, to me at least.
TLDR: You're not supposed to send it yourself; the queue needs to be configured with a DLQ, and Amazon will move messages there for you after a set number of failures.
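For reference, the automatic redrive is just a RedrivePolicy attribute on the source queue; a sketch with the v3 SDK, using placeholder URL and ARN values:

const { SQSClient, SetQueueAttributesCommand } = require('@aws-sdk/client-sqs');

const sqs = new SQSClient({});

async function configureRedrive() {
  await sqs.send(new SetQueueAttributesCommand({
    QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue', // placeholder
    Attributes: {
      RedrivePolicy: JSON.stringify({
        deadLetterTargetArn: 'arn:aws:sqs:us-east-1:123456789012:my-dlq', // placeholder
        maxReceiveCount: '5', // redrive after 5 failed receives
      }),
    },
  }));
}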
I am using the kafka-node Node.js library. I have a problem with message order when consuming a topic with 250k messages (which were loaded into Kafka in batches of 2000 messages) from a fresh start (no offsets in ZooKeeper). The consumer often does not process messages from offset 0; rather, it starts at 4000 or 8000 or so. It also continuously processes a block of 1000 messages and then jumps to an earlier or later N*1000 offset. I tried changing maxTickMessages to 800 and it processed blocks of 800 messages, but it still jumped to N*1000 offsets. I could not find the missing 200 offsets in the debug log. Changing maxTickMessages or maxNumSegments to a very large number did not help.
I was printing the current message offset directly in the Kafka binary protocol decoder, which should eliminate some potential async effects. Please see the offset log and the code used, kafka-order-test.js. I suspected a problem in the Kafka binary protocol parsing, but I was not able to find one.
Kafka itself should not be the problem, as I dumped the topic with kafkacat, which maintained the correct offset and message order. I also monitored the Node.js-Kafka network traffic with Wireshark, and the messages were shown in the correct order.
This problem was caused by asynchronous nested MessageSet decompression, which resulted in out-of-order message consumption. Kafka returns messages in a MessageSet which contains nested compressed MessageSets of 2000 messages (in my testing). Unfortunately, the decompression was asynchronous without any synchronisation, so messages were processed out of order in batches of at most 2000 (depending on maxTickMessages). My fix applies synchronous decompression.
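The fix itself lives inside kafka-node, but the general pattern is worth illustrating: serialize the asynchronous decompression steps so batches are emitted in arrival order. In this sketch, batches, decompress(), and emitMessages() are placeholders standing in for the library's internals:

// Naive version: each batch decompresses independently, so a small batch
// can finish (and be emitted) before an earlier, larger one:
//   batches.forEach((batch) => decompress(batch).then(emitMessages));

// Order-preserving version: chain each decompression onto the previous
// one, so batch N is always emitted before batch N+1.
let chain = Promise.resolve();
for (const batch of batches) {
  chain = chain.then(() => decompress(batch)).then(emitMessages);
}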
I have developed a JavaScript chat (PHP on the backend) using:
1) long-polling to get new messages for the receiver
2) sessionStorage to store the counter of messages
3) setInterval to read new messages; if sessionStorageCounter < setIntervalCounter, the latest message is shown to the receiver.
4) JavaScript to create, update, and write the chat dialogues
The module works fine, but when users have a speedy chat, the receiver's front end gets the same message two or three times (the counter does not fail, nor does the query produce double inserts).
The code seems to be correct (which is why I don't provide it), so the interval delay might be the reason (though reducing the interval delay changes nothing).
Do you think the above scheme is bad practice, and which scheme do you think would eliminate the errors?
My approach, if solving it myself (as opposed to using an existing library that already handles this), would be:
Have the server assign a unique ID (GUID) to each message as it arrives.
On the clients, store the ID of the most recently received message.
When polling for new messages, do so with the ID of the last message successfully received. Server then responds by finding that message in its own queue and replaying all of the subsequent messages.
To guard against 'dropped' messages, each message can also carry the ID of the immediately-previous message (allowing the client to do consistency-checking)
If repolling does cause duplicates to be delivered from server to client, the presence of unique IDs on each message makes eliminating them trivial. Think of the server-side message queue as an event stream, with each client tracking their last-read position. The client makes no guesses about the appropriate order of messages, how many there are, etc - because its state consists entirely of 'what have I seen', there are few opportunities to get out of sync.
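A minimal sketch of that protocol, with in-memory structures standing in for the real PHP backend and showMessage() as a placeholder for the existing rendering code:

// --- Server side (illustrative; the real backend is PHP) ---
const crypto = require('crypto');
const log = []; // the server's ordered message log

function appendMessage(text) {
  const prevId = log.length ? log[log.length - 1].id : null;
  const msg = { id: crypto.randomUUID(), prevId, text };
  log.push(msg);
  return msg;
}

// Poll handler: replay everything after the last ID the client saw.
function messagesAfter(lastSeenId) {
  const idx = log.findIndex((m) => m.id === lastSeenId);
  return idx === -1 ? log : log.slice(idx + 1);
}

// --- Client side ---
let lastSeenId = sessionStorage.getItem('lastSeenId');
const seen = new Set(); // makes dropping duplicates trivial

function handlePollResponse(messages) {
  for (const msg of messages) {
    if (seen.has(msg.id)) continue; // duplicate from a repoll: ignore
    seen.add(msg.id);
    showMessage(msg); // placeholder for the existing rendering code
    lastSeenId = msg.id;
    sessionStorage.setItem('lastSeenId', lastSeenId);
  }
}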
Since it's real-time chat, the setInterval interval is probably small enough that the client asks the server for new messages two or three times simultaneously. Make sure the server handler is synchronized and that it ignores duplicate queries from the same user.
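For the overlapping-polls part, an in-flight guard on the client is usually enough; a sketch, with the fetch URL as a placeholder and reusing handlePollResponse() from the sketch above:

let pollInFlight = false;

setInterval(async () => {
  if (pollInFlight) return; // don't stack a second poll on top of a slow one
  pollInFlight = true;
  try {
    const res = await fetch('/chat/poll?after=' + encodeURIComponent(lastSeenId || ''));
    handlePollResponse(await res.json());
  } finally {
    pollInFlight = false;
  }
}, 1000); // the existing interval delay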