How to handle the Storage Queue using the WebJobs

I just started using Azure for both my mobile and web development.
I am using Node.js as the framework for my Azure backend, together with Azure Mobile Services and Web Apps.
Here is the situation: I am using an Azure Storage Queue, and a WebJob in my web app processes it. Each message in the queue is sent to a specific user via Notification Hubs (push notifications).
The queue will hold 50,000 or more messages, and each message is pushed to its user one by one. I tried to process the queue with a WebJob scheduled at a 2-minute interval. I know that a scheduled WebJob won't start a second instance while the previous run is still in progress.
Initially, I wanted to use a continuously running WebJob, but it goes into a pending-restart state once the script finishes. I assumed a continuous WebJob would run the script in an endless loop, over and over, until it hit an exception or some other failure. That assumption was wrong: it restarts itself once the whole script has completed successfully. I know the restart delay can be reduced to less than 60 seconds, but I am not sure whether that helps, since I have a lot of async operations as well.
My script loops over 50,000 or more user messages. For each one, it sends the push notification via the Azure Node.js package and, when that returns, deletes the message so that it no longer appears in the queue. So each iteration of the loop involves some async operations.
Everything works, but the WebJob only executes for a maximum of 5 minutes and then runs again on the next schedule. In other words, each run lasts at most 5 minutes regardless of what it is doing. With 1,000 messages in the queue everything works fine, but at 5,000 messages and above the time is not sufficient. As a result, some of the async operations do not complete, and those messages are never deleted.
Is there a way to extend the 5-minute execution time, or a better way to handle Storage Queues? I looked into the WebJobs SDK, but it is limited to C# and Visual Studio, which I cannot use since I am on Mac OS X with JavaScript.
Please advise. I have spent a lot of time figuring out how best to handle the Storage Queue with WebJobs, but it now seems unsuitable once the number of messages grows and each message involves async operations within a total of only 5 minutes of execution time. I do not have any VMs at the moment; I only use PaaS in Azure.

According to your description:
All these messages are used to push out the message to the user one by one
it will run 50,000 or more users messages in the loop
So your requirement is to send each message in the queue to a user. Currently you fetch all the messages in the queue at once, even when there are more than 50,000 of them, and loop over them for further processing?
If there is any misunderstanding, feel free to let me know.
In my opinion, you could get the top message of the queue one at a time and send it to your user. That remarkably reduces the processing time of each run, and it can be done in a continuous WebJob. You can refer to How To: Peek at the Next Message to see how to peek at the message at the front of a queue without removing it from the queue.
Update
As you mentioned, your overall architecture also includes a Web App in Node.js.
So I considered whether you could use a continuous WebJob in Web Apps that gets one message at a time and sends it to Notification Hubs.
Here is my test code snippet:
var azureStorage = require('azure-storage'),
    azure = require('azure'),
    accountName = '<accountName>',
    accountKey = '<accountKey>';
var queueSvc = azureStorage.createQueueService(accountName, accountKey);
var notificationHubService = azure.createNotificationHubService('<notificationhub-name>', '<connectionstring>');
// Fetch a single message from the front of the queue
queueSvc.getMessages('myqueue', {numOfMessages: 1}, function(error, result, response) {
    if (error) {
        console.log(error);
        return;
    }
    if (result.length === 0) {
        // The queue is currently empty
        return;
    }
    // Message text is in result[0].messagetext
    var message = result[0];
    console.log(message.messagetext);
    var payload = {
        data: {
            msg: message.messagetext
        }
    };
    notificationHubService.gcm.send(null, payload, function(error) {
        if (error) {
            console.log(error);
            return;
        }
        // Notification sent: delete the message so it is not processed again
        console.log('notification sent');
        queueSvc.deleteMessage('myqueue', message.messageid, message.popreceipt, function(error, response) {
            if (!error) {
                // Message deleted
                console.log(response);
            } else {
                console.log(error);
            }
        });
    });
});
For details, refer to How to use Notification Hubs from Node.js and https://github.com/Azure/azure-storage-node/blob/master/lib/services/queue/queueservice.js#L727
Update 2
Based on the Service Bus demo on GitHub, I modified the code above, which greatly improves efficiency.
Here is the code snippet, for your information:
// queueSvc and notificationHubService are created as in the previous snippet
var queueName = 'myqueue';
function checkForMessages(queueSvc, queueName, callback) {
    queueSvc.getMessages(queueName, function(err, messages) {
        if (err) {
            console.log(err);
            return;
        }
        if (messages.length === 0) {
            // An empty Storage Queue yields an empty array, not an error
            console.log('No messages');
            return;
        }
        console.log(messages);
        callback(null, messages[0]);
    });
}
function processMessage(queueSvc, err, lockedMsg) {
    if (err) {
        console.log('Error on Rx: ', err);
        return;
    }
    console.log('Rx: ', lockedMsg);
    var payload = {
        data: {
            msg: lockedMsg.messagetext
        }
    };
    notificationHubService.gcm.send(null, payload, function(error) {
        if (error) {
            console.log('Failed to send notification: ', error);
            return;
        }
        // Notification sent: delete the message using its pop receipt
        console.log('notification sent');
        queueSvc.deleteMessage(queueName, lockedMsg.messageid, lockedMsg.popreceipt, function(err2) {
            if (err2) {
                console.log('Failed to delete message: ', err2);
            } else {
                console.log('Deleted message.');
            }
        });
    });
}
// Poll the queue every 100 ms
var t = setInterval(checkForMessages.bind(null, queueSvc, queueName, processMessage.bind(null, queueSvc)), 100);
I set the loop interval to 100 ms in setInterval, and now it can process almost 600 messages per minute in my test.
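One possible refinement of the setInterval loop, sketched under assumptions: with a fixed 100 ms interval, a new poll can start while earlier sends and deletes are still in flight. The loop below schedules the next poll only after the current message has been handled, and backs off when the queue is empty. The names startWorker, receiveOne and handle, and the delay values, are hypothetical; in the WebJob, receiveOne could wrap queueSvc.getMessages and pass on result[0] (undefined when the queue is empty), and handle could wrap the send-then-delete logic of processMessage.

```javascript
// Hypothetical continuous-worker loop: the next poll is scheduled only after
// the current message has been handled, so polls never overlap.
function startWorker({ receiveOne, handle, schedule = setTimeout, busyDelayMs = 0, idleDelayMs = 2000 }) {
  function tick() {
    receiveOne((err, message) => {
      if (err || !message) {
        // Error or empty queue: back off before polling again
        if (err) console.error(err);
        schedule(tick, idleDelayMs);
        return;
      }
      handle(message, handleErr => {
        // If handle failed before deleting the message, Azure Storage makes it
        // visible again once its visibility timeout expires, so it gets retried
        if (handleErr) console.error(handleErr);
        schedule(tick, busyDelayMs);
      });
    });
  }
  tick();
}
```

Because each poll waits for the previous message, throughput is bounded by the round trips per message; running a few such loops side by side is one way to trade that off, which this sketch does not attempt.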

The various configuration settings for WebJobs are explained on this wiki page. In your case, you should increase the WEBJOBS_IDLE_TIMEOUT value: the time, in seconds, after which a triggered job times out if it hasn't produced any output. The WEBJOBS_IDLE_TIMEOUT setting needs to be configured in the app settings in the portal, not via the app.config file.

Related

Node Express server terminates eight hours after inactivity

I have written a small backend application with Node Express.
Its purpose is to retrieve data from a MySQL database and send the resulting rows as a JSON-formatted string back to the requesting client.
app.get(`${baseUrl}/data`, (req, res) => {
    console.log("Get data");
    getDataFromDatabase((error, data) => {
        if (error) {
            return res.json({status: CODE_ERROR, content: error});
        }
        else {
            return res.json({status: CODE_SUCCESS, content: data});
        }
    });
});
Inside the getDataFromDatabase() method a simple SELECT statement is sent to the DB and it receives a status code plus content. In case of success, the content would be a JSON of returned rows, otherwise information about the MySQL error - again in JSON format.
Basically this code works fine. There are a few other methods which were built the same way but don't cause the following problem:
After running this code on a server, I found that the process always dies exactly eight hours after the last call of the above method. The method can be called dozens of times, the problem occurs only after inactivity.
A quick and dirty workaround, due to a lack of time, was to simply create a cronjob which kills the process and re-runs the application every six hours. However, the new process also gets killed eight hours after the last request was sent to the previous process.
While writing this question, I checked again for any differences between my methods and found the following. Here is a snippet of getDataFromDatabase():
if (error) {
    callback(error, null);
}
However, another method, getOtherDataFromDatabase(), has a return keyword before its callback:
if(!error) {
    return callback(null, data);
}
So, is the return keyword making a difference here? Is there some kind of unfinished asynchronous code which terminates after a timeout? I've got no exceptions in my console output, the process dies silently.
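A minimal, self-contained illustration of what the return keyword changes (it does not attempt to explain the eight-hour shutdown): without return, execution continues past the callback call, so any code after the if block also runs. The function names below are hypothetical.

```javascript
// Without `return`, the function keeps going after calling back on error,
// so the callback can fire twice.
function withoutReturn(error, callback) {
  if (error) {
    callback(error, null);
  }
  callback(null, 'data'); // also runs when error is set
}

// With `return`, execution stops at the first callback call.
function withReturn(error, callback) {
  if (error) {
    return callback(error, null);
  }
  return callback(null, 'data');
}

// Count how many times each variant invokes its callback on the error path.
function countCalls(fn) {
  let calls = 0;
  fn(new Error('boom'), () => { calls++; });
  return calls;
}

console.log(countCalls(withoutReturn)); // 2
console.log(countCalls(withReturn));    // 1
```

If nothing follows the if block, as may be the case in getDataFromDatabase(), the two forms behave the same.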

WebSocket needs browser refresh to update list

My project works as intended, except that I have to refresh the browser every time my keyword list sends it something to display. I assume it's my inexperience with Express.js and that I'm not creating the route correctly within my WebSocket code. Any help would be appreciated.
Browser
let socket = new WebSocket("ws://localhost:3000");
socket.addEventListener('open', function (event) {
    console.log('Connected to WS server');
    socket.send('Hello Server!');
});
socket.addEventListener('message', function (e) {
    const keywordsList = JSON.parse(e.data);
    console.log("Received: '" + e.data + "'");
    document.getElementById("keywordsList").innerHTML = e.data;
});
// onclose receives a single CloseEvent carrying the code and reason
socket.onclose = function (event) {
    console.log(event.code, event.reason, 'disconnected');
};
socket.onerror = error => {
    console.error('failed to connect', error);
};
Server
const ws = require('ws');
const express = require('express');
const keywordsList = require('./app');
const app = express();
const port = 3000;
const wsServer = new ws.Server({ noServer: true });
wsServer.on('connection', function connection(socket) {
    socket.send(JSON.stringify(keywordsList));
    socket.on('message', message => console.log(message));
});
// `server` is a vanilla Node.js HTTP server, so use
// the same ws upgrade process described here:
// https://www.npmjs.com/package/ws#multiple-servers-sharing-a-single-https-server
const server = app.listen(port);
server.on('upgrade', (request, socket, head) => {
    wsServer.handleUpgrade(request, socket, head, client => {
        wsServer.emit('connection', client, request);
    });
});

This answers "How to send and/or stream array data that is being continually updated to a client", as arrived at in the comments.
A possible solution using WebSockets:
1. Create an interface on the server for array updates (if you haven't already) that isolates the array object from arbitrary outside modification and supports a callback when updates are made.
2. Determine the latency allowed for multiple updates to accumulate before being pushed. The latency should allow reasonable time for previous network traffic to complete without using bandwidth unnecessarily.
3. When an array update occurs, start a timer for the latency period if one is not already running.
4. When the timer expires, JSON.stringify the array (to take a snapshot), clear the timer-running status, and message the client with the JSON text.
A slightly more complicated method, which avoids delaying every push, would be to push single updates immediately unless they occur within a guard period after the most recent push. A timer could then push any modifications made during the guard period at the end of that period.
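The timer-based steps can be sketched as follows, assuming a hypothetical createKeywordList wrapper; publish stands in for whatever sends the JSON text to clients, and schedule defaults to setTimeout but can be swapped out so the timing is testable. This shows the simple fixed-latency variant, not the guard-period refinement.

```javascript
// Wraps the keyword array so updates go through add(), and pushes at most one
// JSON snapshot per latency period.
function createKeywordList(publish, latencyMs, schedule = setTimeout) {
  const keywords = [];
  let timerRunning = false;

  function flush() {
    timerRunning = false;
    publish(JSON.stringify(keywords)); // snapshot taken when the timer expires
  }

  return {
    add(keyword) {
      keywords.push(keyword);
      if (!timerRunning) {
        timerRunning = true;
        schedule(flush, latencyMs);
      }
    },
    // Return a copy so callers cannot modify the list behind the wrapper's back
    get() {
      return keywords.slice();
    },
  };
}
```

With a 250 ms latency, adding 'a' and 'b' in quick succession starts one timer, and a single message containing ["a","b"] is pushed when it expires.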
Broadcasting
The WebSockets API does not directly support broadcasting the same data to multiple clients. Refer to Server Broadcast in the ws documentation for an example of sending data to all connected clients using a forEach loop.
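A sketch of that forEach broadcast pattern. With the ws package, clients would be wsServer.clients and the open state would be WebSocket.OPEN (the standard readyState value 1); here both are plain values so the function can be exercised without a running server. The broadcast name is hypothetical.

```javascript
// readyState value of an open WebSocket, as defined by the WebSocket standard
const OPEN = 1;

// Send the same text to every client that is currently open; return the count.
function broadcast(clients, text) {
  let sent = 0;
  clients.forEach(client => {
    if (client.readyState === OPEN) {
      client.send(text);
      sent++;
    }
  });
  return sent;
}
```

Combined with the latency timer, publish could simply be text => broadcast(wsServer.clients, text).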
Client side listener
In the client-side message listener
document.getElementById("keywordsList").innerHTML = e.data;
would be better as
document.getElementById("keywordsList").textContent = keywordsList;
to both present keywords after decoding from JSON and prevent them ever being treated as HTML.

So I finally figured out what I wanted to accomplish. It sounds straightforward now that I have learned enough and thought about how to structure the back end of my project.
If you have two WebSockets running and one needs information from the other, you cannot run them side by side. You need one to encapsulate the other, calling the inner WebSocket INSIDE the outer one. This can easily cause problems later in other projects, since one WebSocket won't fire until the other has run. For my project, however, it makes perfect sense: it runs locally and needs all of its parts working 100 percent to be effective. It took me a long time to understand how to structure the code this way.

Weird socket.io behavior when Node server is down and then restarted

I implemented a simple chat for my website with ExpressJS and Socket.io, where users can talk to each other. I added simple protection against a DDoS-style attack caused by one person spamming the window, like this:
if (RedisClient.get(user).lastMessageDate > currentTime - 1 second) {
    return error("Only one message per second is allowed")
} else {
    io.emit('message', ...)
    RedisClient.set(user).lastMessageDate = new Date()
}
I am testing this with this code:
setInterval(function() {
    $('input').val('message ' + Math.random());
    $('form').submit();
}, 1);
It works correctly when Node server is always up.
However, things get extremely weird if I turn off the Node server, run the code above, and then start the Node server again a few seconds later. Suddenly, hundreds of messages are inserted into the window and the browser crashes. I assume this is because, while the Node server is down, socket.io buffers all the client emits, and once it detects that the server is online again, it pushes all of those messages at once, asynchronously.
How can I protect against this? And what is exactly happening here?
Edit: If I use Node's in-memory storage instead of Redis, this doesn't happen. I am guessing it's because the server gets flooded with READs, and many READs happen before RedisClient.set(user).lastMessageDate = new Date() finishes. I guess what I need is an atomic READ / SET? I am using this module to connect to Redis from Node: https://github.com/NodeRedis/node_redis

You are correct that this happens due to queueing up of messages on client and flooding on server.
When the server receives the messages, it receives them all at once, and they are not processed synchronously. Each socket.on("message", ...) handler runs independently, so one invocation knows nothing about the others.
Even if your Redis server has a latency of only a few ms, these messages all arrive at once, so every handler reads the old timestamp before any write has completed, and everything goes to the else branch.
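To make the race concrete, here is a deterministic simulation with a hypothetical in-memory store whose callbacks are deferred until flush() runs, standing in for Redis latency. The read-then-write check from the question lets every message in a burst through, because all reads complete before any write. For comparison, it includes a single atomic "set if absent" step modelled on Redis SET with the NX option; this is a different technique from the WATCH/MULTI approach below. With node_redis, the real-world equivalent would be roughly client.set(key, value, 'NX', 'PX', 1000, callback), where PX gives the key a one-second expiry, a detail the simulation leaves out. All function names here are hypothetical.

```javascript
// Simulated remote store: every operation is queued and only applied when
// flush() runs, which mimics the latency between Node and Redis.
function createFakeStore() {
  const data = new Map();
  const pending = [];
  return {
    get(key, cb) {
      pending.push(() => cb(null, data.has(key) ? data.get(key) : null));
    },
    set(key, value, cb) {
      pending.push(() => { data.set(key, value); cb(null, 'OK'); });
    },
    // Atomic "set only if the key does not exist", modelled on SET ... NX.
    // Key expiry (PX) is deliberately left out of this simulation.
    setIfAbsent(key, value, cb) {
      pending.push(() => {
        if (data.has(key)) return cb(null, null);
        data.set(key, value);
        cb(null, 'OK');
      });
    },
    flush() {
      while (pending.length > 0) pending.shift()();
    },
  };
}

// Read-then-write check, as in the question: racy under concurrent messages.
function naiveAllow(store, user, now, cb) {
  store.get('last:' + user, (err, last) => {
    if (last !== null && now - last < 1000) return cb(false);
    store.set('last:' + user, now, () => cb(true));
  });
}

// One atomic operation: only the first message of the burst wins.
function atomicAllow(store, user, now, cb) {
  store.setIfAbsent('lock:' + user, now, (err, reply) => cb(reply === 'OK'));
}

// Fire `burst` messages from one user "at once" and count how many pass.
function simulate(allowFn, burst) {
  const store = createFakeStore();
  let accepted = 0;
  for (let i = 0; i < burst; i++) {
    allowFn(store, 'alice', 1000, ok => { if (ok) accepted++; });
  }
  store.flush();
  return accepted;
}

console.log(simulate(naiveAllow, 5));  // 5: every read saw "no previous message"
console.log(simulate(atomicAllow, 5)); // 1
```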
You have the following options.
Use a rate limiter library like this library. It is easy to configure and has multiple configuration options.
If you want to do everything yourself, use a queue on the server. This takes up memory on your server, but you'll achieve what you want. Instead of handling every message directly, each one is put into a queue. A new queue is created for every new client, and the queue is deleted once its last item has been processed.
(update) Use WATCH + MULTI as an optimistic lock: if another client modifies the watched key before EXEC, your transaction is aborted.
The pseudo-code will be something like this:
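// One FIFO queue per user; messages from the same user are handled one at a time
let queues = {};
let processing = {};
let queueHandler = async user => {
    if (processing[user]) return; // a drain loop for this user is already running
    processing[user] = true;
    while (queues[user].length > 0) {
        let messageObject = queues[user].shift();
        // your redis check-and-push logic here; await it before taking the next message
    }
    delete queues[user];
    delete processing[user];
};
let pushToQueue = messageObject => {
    let user = messageObject.user;
    if (!queues[user]) {
        queues[user] = [messageObject];
    } else {
        queues[user].push(messageObject);
    }
    queueHandler(user);
};
socket.on("message", pushToQueue);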
UPDATE
Redis supports optimistic locking with WATCH, which is used together with MULTI. If a watched key is modified by another client before your EXEC runs, the transaction is not executed (EXEC returns null), so you can detect the conflict and retry.
From the redis client README:
Using multi you can make sure your modifications run as a transaction,
but you can't be sure you got there first. What if another client
modified a key while you were working with it's data?
To solve this, Redis supports the WATCH command, which is meant to be
used with MULTI:
var redis = require("redis"),
    client = redis.createClient({ ... });
client.watch("foo", function( err ){
    if(err) throw err;
    client.get("foo", function(err, result) {
        if(err) throw err;
        // Process result
        // Heavy and time consuming operation here
        client.multi()
            .set("foo", "some heavy computation")
            .exec(function(err, results) {
                /**
                 * If err is null, it means Redis successfully attempted
                 * the operation.
                 */
                if(err) throw err;
                /**
                 * If results === null, it means that a concurrent client
                 * changed the key while we were processing it and thus
                 * the execution of the MULTI command was not performed.
                 *
                 * NOTICE: Failing an execution of MULTI is not considered
                 * an error. So you will have err === null and results === null
                 */
            });
    });
});

Perhaps you could extend your client-side code to prevent data from being sent while the socket is disconnected? That way, you prevent the library from queuing messages while the socket is disconnected (i.e. while the server is offline).
This could be achieved by checking to see if socket.connected is true:
// Only allow data to be sent to server when socket is connected
function sendToServer(socket, message, data) {
    if (socket.connected) {
        socket.send(message, data);
    }
}
More information can be found in the docs: https://socket.io/docs/client-api/#socket-connected
This approach disables the built-in queuing behaviour in all scenarios where the socket is disconnected, which may not be desirable; however, it should protect against the problem you describe in your question.
Update
Alternatively, you could use a custom middleware on the server to achieve throttling behaviour via socket.io's server API:
/*
    Server side code
*/
io.on("connection", function (socket) {
    // Add custom throttle middleware to the socket when connected
    socket.use(function (packet, next) {
        var currentTime = Date.now();
        // If socket has previous timestamp, check that enough time has
        // elapsed since last message processed
        if (socket.lastMessageTimestamp) {
            var deltaTime = currentTime - socket.lastMessageTimestamp;
            // If not enough time has elapsed, pass an error back to the
            // client
            if (deltaTime < 1000) {
                next(new Error("Only one message per second is allowed"));
                return;
            }
        }
        // Update the timestamp on the socket, and allow this message to
        // be processed
        socket.lastMessageTimestamp = currentTime;
        next();
    });
});

Implementing Slack slash command delayed responses

I built a Slack slash command that communicates with a custom Node API and POSTs acronym data in some way, shape, or form. It either gets the meaning of an acronym or adds/removes an acronym in a Mongo database.
The command works pretty well so far, but Slack occasionally returns a timeout error because it expects a response within 3 seconds. As a result, I'm trying to implement delayed responses. I'm not sure that I am implementing delayed responses properly for my Slack slash command & Node API.
This resource on Slack slash commands has information on delayed responses. The idea is that I want to send a 200 response immediately to let the Slack user know that their request has been processed. Then I want to send a delayed response to slackReq.response_url that isn't constrained by the 3-second time limit.
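A minimal sketch of the acknowledge-first ordering, under stated assumptions. In the handler posted below, the acknowledgement is sent from inside the slackHelper.handleReq callback, so it can only go out after that call finishes. If handleReq can itself take a while, one way to keep the acknowledgement inside the 3-second window is to send it first, as here. makeSlashCommandHandler is a hypothetical name, and doWork and postToUrl are hypothetical stand-ins for slackHelper.handleReq and the request() POST to response_url.

```javascript
// Acknowledge immediately, then do the slow work and deliver its result
// to response_url as a delayed response.
function makeSlashCommandHandler(doWork, postToUrl) {
  return function handle(req, res) {
    const slackReq = req.body;
    // 1. Reply right away so Slack gets its response within 3 seconds
    res.json({ response_type: 'ephemeral', text: 'Got it! Processing your acronym request...' });
    // 2. The slow part only starts after the HTTP response has been sent
    doWork(slackReq, (err, slackRes) => {
      const body = err
        ? { response_type: 'ephemeral', text: 'There was an error' }
        : slackRes;
      postToUrl(slackReq.response_url, body);
    });
  };
}
```

The token check from the original handler would still go before the acknowledgement; it is omitted here to keep the ordering visible.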
The Code
let jwt = require('jsonwebtoken');
let request = require('request');
let slackHelper = require('../helpers/slack');
// ====================
// Slack Request Body
// ====================
// {
//   "token":"~",
//   "team_id":"~",
//   "team_domain":"~",
//   "channel_id":"~",
//   "channel_name":"~",
//   "user_id":"~",
//   "user_name":"~",
//   "command":"~",
//   "text":"~",
//   "response_url":"~"
// }
exports.handle = (req, res) => {
    let slackReq = req.body;
    let token = slackReq.token;
    let teamId = slackReq.team_id;
    if (!token || !teamId || !slackHelper.match(token, teamId)) {
        // Handle an improper Slack request
        res.json({
            response_type: 'ephemeral',
            text: 'Incorrect request'
        });
    } else {
        // Handle a valid Slack request
        slackHelper.handleReq(slackReq, (err, slackRes) => {
            if (err) {
                res.json({
                    response_type: 'ephemeral',
                    text: 'There was an error'
                });
            } else {
                // NOT WORKING - Immediately send a successful response
                res.json({
                    response_type: 'ephemeral',
                    text: 'Got it! Processing your acronym request...'
                });
                let options = {
                    method: 'POST',
                    uri: slackReq.response_url,
                    body: slackRes,
                    json: true
                };
                // Send a delayed response with the actual acronym data
                request(options, err => {
                    if (err) console.log(err);
                });
            }
        });
    }
};
What's Happening Right Now
Say I want to find the meaning of acronym NBA. I go on Slack and shoot out the following:
/acronym NBA
I then hit the 3-second timeout error - Darn – that slash command didn't work (error message: Timeout was reached). Manage the command at slash-command.
I send a request a few more times (2 to 4 times), and then the API finally returns, all at once:
Got it! Processing your acronym request...
NBA means "National Basketball Association".
What I Want to Happen
I go on Slack and shoot out the following:
/acronym NBA
I immediately get the following:
Got it! Processing your acronym request...
Then, outside of the 3-second window, I get the following:
NBA means "National Basketball Association".
I never hit a timeout error.
Conclusion
What am I doing wrong here? For some reason, that res.json() with the processing message isn't immediately being sent back. What can I do to fix this?
Thank you in advance!
Edit 1
I tried to replace the res.json() call with res.sendStatus(200).json(), but unfortunately, that only returned an 'OK' without actually processing the request.
I subsequently tried res.status(200).send({..stuff..}) but that resulted in the same problem I was having before.
I think res.json() sends a 200 automatically anyway, but it's just not responding fast enough for some reason.
Solution
I eventually figured this one out. I was implementing the delayed responses right all along.
Since I'm using Heroku's free plan, the dyno hosting my app would go to sleep after 30 minutes of inactivity. While the app was asleep, the first few requests would time out on Slack before the app responded properly.
The solution is either 1) upgrade to a plan that keeps the dyno active at all times, or 2) ping the app with a simple GET request every 15 or so minutes, like so:
const http = require('http'); // use require('https') if the app URL starts with https://
const intervalMins = 15;
setInterval(() => {
    // Consume the response so the socket is released
    http.get("<insert app url here>", res => res.resume());
    console.log('Ping!');
}, intervalMins * 60000);
I went with the latter option and no longer run into the issue of the dyno sleeping. I'd check this article for more details.

How to stop consuming message from selective queue - RabbitMQ - Javascript/node

I am building a REST-amqp sample in which I get messages from a given queue in rabbitmq and I send the messages back to the client via REST.
I have implemented the code following rabbitmq tutorial for node.js
var amqp = require('amqplib/callback_api');
amqp.connect('amqp://192.168.225.203:5672', function (err, conn) {
    if (err) throw err;
    conn.createChannel(function (err, ch) {
        if (err) throw err;
        var q = 'aQueue';
        ch.assertQueue(q, { durable: false });
        var messages = [];
        console.log(" [*] Waiting for messages in %s. To exit press CTRL+C", q);
        ch.consume(q, function (msg) {
            console.log(" [x] Received %s", msg.content.toString());
            messages.push(msg.content.toString());
            messages.forEach(function (element) {
                console.log(element);
            });
        }, { noAck: true });
    });
});
I can collect all messages in an array (messages), but this function runs until the application is stopped, which is a problem because the REST client will wait forever.
I would like to stop consuming and move on with the program when the queue is empty or, if that is not possible, after a period of time (2 seconds).
I found the same problem solved in Java but not in JS.
Any hint is much appreciated.
Cheers, Giovanni

You can use connection#close to stop the connection altogether and open a new one on demand.
However, you still need to detect when to terminate the connection.
May I ask what exactly the motivation behind this is? HTTP/REST and message queues are inherently different. Sending a message in response to an HTTP request is one thing; receiving messages for a specific request is definitely a place I would not go voluntarily.
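As an alternative to consume(), here is a sketch that pulls messages one at a time and stops when the queue is empty or a deadline passes. It assumes amqplib's callback API, where channel.get(queue, options, callback) calls back with false when no message is available; drainQueue and the getOne adapter around that call are hypothetical names.

```javascript
// Pull messages until the queue reports empty (msg === false) or the deadline
// passes, then hand the collected message bodies to `done`.
function drainQueue(getOne, timeoutMs, done) {
  const deadline = Date.now() + timeoutMs;
  const messages = [];
  (function next() {
    if (Date.now() >= deadline) return done(null, messages);
    getOne((err, msg) => {
      if (err) return done(err, messages);
      if (msg === false) return done(null, messages); // queue is empty
      messages.push(msg.content.toString());
      next();
    });
  })();
}
```

In an Express route, something like drainQueue(cb => ch.get(q, { noAck: true }, cb), 2000, (err, msgs) => res.json(msgs)) would then return whatever was in the queue. With noAck: true, the messages are removed from the queue as soon as they are fetched.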
