I have a Node.js system that uploads a large number of objects to MongoDB and creates a folder in Dropbox for each object. This takes around 0.5 seconds per object, so when I have many objects it can take up to around a minute. What I currently do is notify the client that the array of objects has been accepted using a 202 response code. However, how do I then notify the client of completion a minute later?
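app.post('/BulkAdd', function (req, res) {
  console.log(req.body)
  // Acknowledge receipt immediately; the work continues after the response is sent
  res.status(202).send({response:"Processing"});
  // A promise resolves with a single value, so the handler takes one result argument
  api_functions.bulkAdd(req.body).then((result) => {
    console.log('done')
  })
});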
bulkAdd: async function (req, callback) {
  let failed = []
  let issues = []
  let success = []
  // Process the audits one at a time by chaining each onto the previous promise
  await req.reduce((promise, audit) => {
    // return promise.then(_ => dropbox_functions.createFolder(audit.scanner_ui)
    let globalData;
    return promise.then(_ => this.add(audit)
      .then((data)=> {globalData = data; return dropbox_functions.createFolder(data.ui, data)}, (error)=> {failed.push({audit: audit, error: 'There was an error adding this case to the database'}); console.log(error)})
      .then((data)=>{console.log(data, globalData);return dropbox_functions.checkScannerFolderExists(audit.scanner_ui)},(error)=>{issues.push({audit: globalData, error: 'There was an error creating the case folder in dropbox'})})
      .then((data)=>{return dropbox_functions.moveFolder(audit.scanner_ui, globalData.ui)},(error)=>{issues.push({audit: globalData, error: 'No data folder was found so an empty one was created'}); return dropbox_functions.createDataFolder(globalData.ui)})
      // The rejection handler must be a function; a bare issues.push(...) call would run immediately
      .then(()=>success.push({audit:globalData}), ()=>issues.push({audit: globalData, error: 'Scanner folder found but items not moved'}))
    );
  }, Promise.resolve()).catch(error => {console.log(error)});
  // (failed, issues, success) is a comma expression that evaluates to success only, so return one object
  return {failed, issues, success}
},
The problem with making the client request wait is that it will time out after a certain period, or sometimes show an error because no response was received.
What you can do is:
- Have the client make a request to the server to initiate the task, return 200 OK immediately, and keep doing the task on the server.
- Write a status file on the server after every object is inserted.
- Read that file from the client every 5-10 seconds to check whether the server has finished creating the objects.
- While the task is not yet complete on the server, show a status with the completion percentage or some animation.
Or simply implement a webhook or WebSockets to maintain communication.
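A minimal sketch of the polling approach above, assuming an Express-style app. It keeps the status in an in-memory Map rather than in a file, which is a simplification (the state is lost on restart). The names createJobTracker, registerRoutes and processOne, and the /BulkAdd/status/:id route, are hypothetical and not part of the original code.

```javascript
const crypto = require('crypto');

// In-memory job registry (hypothetical helper, not a library API).
function createJobTracker() {
  const jobs = new Map();
  return {
    // Register a new job and return the id the client will poll with.
    start(total) {
      const id = crypto.randomBytes(8).toString('hex');
      jobs.set(id, { total, done: 0, failed: 0, finished: total === 0 });
      return id;
    },
    // Record the outcome of one processed object.
    advance(id, ok) {
      const job = jobs.get(id);
      if (!job) return;
      job.done += 1;
      if (!ok) job.failed += 1;
      job.finished = job.done >= job.total;
    },
    // Snapshot for the status endpoint; undefined for unknown ids.
    status(id) {
      const job = jobs.get(id);
      if (!job) return undefined;
      const percent = job.total === 0 ? 100 : Math.round((job.done / job.total) * 100);
      return { ...job, percent };
    },
  };
}

// Hypothetical wiring: `app` is an Express app with a JSON body parser,
// `processOne` is a function (item) => Promise that handles one object.
function registerRoutes(app, tracker, processOne) {
  app.post('/BulkAdd', (req, res) => {
    const items = req.body;
    const id = tracker.start(items.length);
    // Reply at once and tell the client where to poll.
    res.status(202).send({ jobId: id, statusUrl: `/BulkAdd/status/${id}` });
    // Keep working after the response has been sent, one item at a time.
    items.reduce(
      (p, item) => p.then(() => processOne(item)).then(
        () => tracker.advance(id, true),
        () => tracker.advance(id, false)
      ),
      Promise.resolve()
    );
  });

  app.get('/BulkAdd/status/:id', (req, res) => {
    const status = tracker.status(req.params.id);
    if (!status) return res.status(404).send({ error: 'Unknown job' });
    res.send(status);
  });
}
```

The client would then request the status URL every few seconds and stop once finished is true.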
I'm trying to pass a variable to EJS a second time in my code and am running into trouble. Here is my code:
axios.get(yearURL)
.then(function (res) {
let data = res.data.MRData.SeasonTable.Seasons;
years = data.map(d => d.season);
app.get('/', function (req, res) {
res.render('index.ejs', {
years: years
});
app.post('/down', function(req, res) {
let year = req.body;
res.redirect('/');
axios.get(`http://ergast.com/api/f1/${year.year}/drivers.json`)
.then(function (res) {
let data = res.data.MRData.DriverTable.Drivers;
drivers = data.map(d => `${d.givenName} ${d.familyName}`);
})
.catch(function (err) {
console.log(err);
})
res.render('index.ejs', {
drivers: drivers,
years: years
});
Whenever I run this however, I receive an error that I cannot set headers after they are sent to the client. I've also read elsewhere that apparently you can not call res.render twice. So my question is, how can I pass another set of data to EJS after I have already called res.render once before?
Here it is as pseudocode. It's good to start your program with this level of logical structure, and then implement it:
Define ready = false, errored = false, and data = undefined variables.
Get the data from the remote API. In the then branch, set ready = true and assign the result to data. In the error branch, set errored = true. Should we retry on error?
Define the / GET route.
If not ready, check errored. If not errored, we are still waiting for the data. In this case, do we wait for the call to resolve, or return something to the client to let them know?
If not ready, and errored, tell the client that there was an error.
If ready == true, then we have data to render a response to the client.
Define the /down route. It needs to take a year parameter, and we need to make an async call in the route handler to get the data.
Can we cache the data, so that subsequent calls for the same year return data that we fetched previously? If we can, use an object as a lookup dictionary. If the object has a key for that year, use the cached data to render the response. If not, make the call, and in the then branch, add the response to the cache object, and use the data to render the response.
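The caching step can be sketched as follows. This is only an illustration: createDriverCache and fetchDrivers are hypothetical names, and a Map is used in place of a plain object so that user-supplied keys such as "constructor" cannot collide with inherited properties. The fetcher is passed in, so the same idea works with axios or any other HTTP client.

```javascript
// Build a lookup that fetches each year at most once.
// `fetchDrivers` is a hypothetical function: (year) => Promise<string[]>.
function createDriverCache(fetchDrivers) {
  const cache = new Map(); // year -> Promise of driver names

  return function getDrivers(year) {
    if (!cache.has(year)) {
      const pending = fetchDrivers(year).catch((err) => {
        // Do not keep failures in the cache, so a later call can retry.
        cache.delete(year);
        throw err;
      });
      cache.set(year, pending);
    }
    return cache.get(year);
  };
}

// Hypothetical use inside the /down route, rendering exactly once per request:
// app.post('/down', async (req, res) => {
//   try {
//     const drivers = await getDrivers(req.body.year);
//     res.render('index.ejs', { drivers, years });
//   } catch (err) {
//     res.status(502).send('Could not load drivers');
//   }
// });
```

Storing the promise rather than the resolved value means that two requests for the same year arriving at the same time share one upstream call.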
I am using SignalR to start long computations on server side and post a message to the client when the result is available.
The input binding is an HTTP request.
I would like to be able to send multiple messages back in order to notify the client of the different steps of the process (e.g. computation starts, computation ends, etc.).
I tried pushing different messages to context.bindings.signalRMessages, but I see that everything is sent together at the end of the whole process. Is there a way to send several messages at different times?
Another related issue is that my HTTP request on the client side remains stuck until the end of the process. I would like to be able to post a quick response early, since I get the result via a SignalR message.
Here is my server code :
module.exports = async function(context, req) {
let ID = context.bindingData.invocationId;
context.bindings.signalRMessages = [];
const messageQueue = context.bindings.signalRMessages;
var postMessage = (message) => {
message.userId = req.query.userId;
message.isPrivate = true;
messageQueue.push(message);
};
let preProcessData = preProcess(req.body.input);
let startMessage = {
"target": "optimStart",
"arguments": [{ preProcessData: preProcessData }]
};
postMessage(startMessage); // <<<< I want this one to be sent immediately
try {
let optimOutput = await computeOptim(req.body.input, ID); // that's the long process
let response = {
optimId: ID,
optimOutput: optimOutput
};
let optimCompleteMessage = {
"target": "optimComplete",
"arguments": [response]
};
postMessage(optimCompleteMessage);
} catch (err) {
// ....
}
};
Am I doing anything wrong, or is it just not possible?
Thanks!
This is not possible with a simple HTTP-triggered function, since bindings resolve only once the execution of the function completes.
For your scenario, Durable Functions would be the perfect choice.
You would still have an HTTP-triggered function (the client function) to start an orchestration and return immediately. In the orchestration function, you would have separate activity functions for the processing and for sending updates to the client using the SignalR binding.
I implemented a simple chat for my website, using ExpressJS and Socket.io, where users can talk to each other. I added a simple protection against a DDoS-style attack caused by one person spamming the window, like this:
if (RedisClient.get(user).lastMessageDate > currentTime - 1 second) {
return error("Only one message per second is allowed")
} else {
io.emit('message', ...)
RedisClient.set(user).lastMessageDate = new Date()
}
I am testing this with this code:
setInterval(function() {
$('input').val('message ' + Math.random());
$('form').submit();
}, 1);
It works correctly when Node server is always up.
However, things get extremely weird if I turn off the Node server, then run the code above, and start Node server again in a few seconds. Then suddenly, hundreds of messages are inserted into the window and the browser crashes. I assume it is because when Node server is down, socket.io is saving all the client emits, and once it detects Node server is online again, it pushes all of those messages at once asynchronously.
How can I protect against this? And what is exactly happening here?
Edit: If I keep the timestamps in Node's memory instead of Redis, this doesn't happen. I am guessing that is because the server gets flooded with READs, and many READs happen before RedisClient.set(user).lastMessageDate = new Date() finishes. I guess what I need is an atomic READ / SET? I am using this module: https://github.com/NodeRedis/node_redis for connecting to Redis from Node.
You are correct that this happens because messages queue up on the client and then flood the server.
The server receives the queued messages all at once, and they are not handled synchronously. Each invocation of the socket.on("message", ...) handler runs separately and is unrelated to the others.
Even if your Redis server has a latency of only a few ms, these messages are all received at once, so each one reads the old timestamp before any write has landed and everything goes to the else branch.
You have the following few options.
Use a rate limiter library like this library. This is easy to configure and has multiple configuration options.
If you want to do everything yourself, use a queue on the server. This will take up memory on your server, but you'll achieve what you want. Instead of handling every message immediately, put it into a queue. Create a new queue for every new client, and delete that queue after processing its last item.
(Update) Use MULTI + WATCH to create a lock so that all other commands except the current one will fail.
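The pseudo-code will be something like this.
let queue = {};
let queueHandler = user => {
  // Drain this user's queue; shift() removes each item so the loop terminates
  while (queue[user] && queue[user].length > 0) {
    let messageObject = queue[user].shift();
    // your redis push logic here
  }
  delete queue[user];
}
let pushToQueue = (messageObject) => {
  let user = messageObject.user;
  // Index by the user's id (queue[user]), not by a literal "user" property
  if (!queue[user]) {
    queue[user] = [messageObject];
  } else {
    queue[user].push(messageObject);
  }
  queueHandler(user);
}
// Pass the function itself; pushToQueue(message) would call it immediately
socket.on("message", pushToQueue);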
UPDATE
Redis supports optimistic locking with WATCH, which is used together with MULTI. You watch a key, and if any other client modifies that key before your transaction executes, your transaction is aborted instead of being applied.
From the redis client README:
Using multi you can make sure your modifications run as a transaction,
but you can't be sure you got there first. What if another client
modified a key while you were working with it's data?
To solve this, Redis supports the WATCH command, which is meant to be
used with MULTI:
var redis = require("redis"),
client = redis.createClient({ ... });
client.watch("foo", function( err ){
if(err) throw err;
client.get("foo", function(err, result) {
if(err) throw err;
// Process result
// Heavy and time consuming operation here
client.multi()
.set("foo", "some heavy computation")
.exec(function(err, results) {
/**
* If err is null, it means Redis successfully attempted
* the operation.
*/
if(err) throw err;
/**
* If results === null, it means that a concurrent client
* changed the key while we were processing it and thus
* the execution of the MULTI command was not performed.
*
* NOTICE: Failing an execution of MULTI is not considered
* an error. So you will have err === null and results === null
*/
});
}); });
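As a possible alternative to WATCH/MULTI for the specific one-message-per-second check in the question, the read and the write can be collapsed into a single Redis command: SET with the NX and PX options only succeeds if the key does not exist yet, and lets it expire on its own. The sketch below is a different technique from the README example above. It assumes a callback-style client (as in node_redis v3) whose set method forwards extra arguments to Redis; tryAcquireSlot and the chat:last: key prefix are hypothetical names.

```javascript
// Resolves to true if `user` may send a message now, false if they must wait.
// Assumption: client.set(key, value, 'NX', 'PX', ms, callback) is supported
// and replies 'OK' on success or null when the key already exists.
function tryAcquireSlot(client, user, windowMs = 1000) {
  return new Promise((resolve, reject) => {
    // One atomic command: set only if absent (NX), expire after windowMs (PX).
    client.set(`chat:last:${user}`, '1', 'NX', 'PX', windowMs, (err, reply) => {
      if (err) return reject(err);
      resolve(reply === 'OK');
    });
  });
}

// Hypothetical use in the message handler:
// socket.on('message', async (text) => {
//   if (await tryAcquireSlot(RedisClient, user)) {
//     io.emit('message', text);
//   } else {
//     socket.emit('error-message', 'Only one message per second is allowed');
//   }
// });
```

Because Redis executes each command atomically, a burst of queued messages cannot all pass the check: only the first SET in each window gets the OK reply.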
Perhaps you could extend your client-side code, to prevent data being sent if the socket is disconnected? That way, you prevent the library from queuing messages while the socket is disconnected (ie the server is offline).
This could be achieved by checking to see if socket.connected is true:
// Only allow data to be sent to server when socket is connected
function sendToServer(socket, message, data) {
if(socket.connected) {
socket.send(message, data)
}
}
More information on this can be found at the docs https://socket.io/docs/client-api/#socket-connected
This approach will prevent the built-in queuing behaviour in all scenarios where a socket is disconnected, which may not be desirable; however, it should protect against the problem you describe in your question.
Update
Alternatively, you could use a custom middleware on the server to achieve throttling behaviour via socket.io's server API:
/*
Server side code
*/
io.on("connection", function (socket) {
// Add custom throttle middleware to the socket when connected
socket.use(function (packet, next) {
var currentTime = Date.now();
// If socket has previous timestamp, check that enough time has
// lapsed since last message processed
if(socket.lastMessageTimestamp) {
var deltaTime = currentTime - socket.lastMessageTimestamp;
// If not enough time has lapsed, throw an error back to the
// client
if (deltaTime < 1000) {
next(new Error("Only one message per second is allowed"))
return
}
}
// Update the timestamp on the socket, and allow this message to
// be processed
socket.lastMessageTimestamp = currentTime
next()
});
});
In short, I've run into an issue where multiple parallel GET requests to my Node.js server cause the server to get "clogged up" and hang, thus resulting in timeouts for the clients (503, service unavailable).
After a lot of performance analysis, I've realized it's a CPU issue. The specific request (we'll call it GET /foo) queries data from multiple services over HTTP, and then does a lot of computation, and returns the results to the client, like this:
1. Client requests GET /foo
2. /foo controller queries data over HTTP from multiple other services
3. /foo controller then does a bunch of iterations over the data to compile some output for the client
Step 3 takes around 2 seconds to complete. However, if I send 2 requests in parallel to /foo, each client will receive their response in about 4 seconds. When I run the app in a cluster using more cores, the requests run much faster, but not quite what I want.
Seems like I have several options here:
1. pre-compute the response (ideally would like to avoid this for now, since it will require a whole "cache invalidation" scheme), or
2. /foo sends the CPU-blocking computation asynchronously to another process (using Heroku, so that would be another dyno), and then I can use a websocket or something to push the results to the client (again, very complex for my situation), or
3. somehow yield to a child process in the request and return the results to the client
Would love to do something like option 3. Something like this:
get('/foo', function*(request) {
// I/O, so not blocking the event loop (I think)
let data = yield getData(request)
// make this happen in a different process
let response = yield doSomeHeavyProcessing(data)
return response
})
I've omitted a lot of implementation details above, but if it's necessary to know, I'm using Koa and Node.js 6.
Ideally, doSomeHeavyProcessing would do the CPU-intensive computation in some separate process, and when it's done, still send the results back in a "synchronous" fashion to the request client.
Been trying to wrap my head around child processes, web workers, fibers, etc., and have been doing some basic "hello worlds" with these to get them to do basically the above, but to no avail. Can post more details if necessary.
Here are some approaches that you can try:
1. Split the blocking computation into small chunks and use setImmediate to place the next chunk of work at the end of the event queue. The computation is then no longer blocking, and other requests can be processed.
2. Microsoft recently released napajs. As stated in their README:
As it evolves, we find it useful to complement Node.js in CPU-bound tasks, with the capability of executing JavaScript in multiple V8 isolates and communicating between them.
I haven't tried it, but it looks very promising:
var napa = require('napajs');
var zone1 = napa.zone.create('zone1', { workers: 4 });
get('/foo', function*(request) {
let data = yield getData(request)
let response = yield zone1.execute(doSomeHeavyProcessing, [data])
return response
})
3. If none of the above is enough and you need to spread the load across multiple machines, then you probably cannot avoid using some sort of message queue to distribute work to different servers. In this case, check out ZeroMQ. It is extremely easy to use from Node, and you can implement any kind of distributed messaging pattern with it.
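The first approach above (chunking with setImmediate) can be sketched like this. processInChunks, handleItem and the default chunk size are hypothetical names and values chosen for illustration; the right chunk size depends on how expensive each item is.

```javascript
// Process `items` in slices of `chunkSize`, yielding to the event loop
// between slices so that other requests can be served in the meantime.
function processInChunks(items, handleItem, chunkSize = 1000) {
  return new Promise((resolve, reject) => {
    const results = [];
    let index = 0;

    function runChunk() {
      try {
        const end = Math.min(index + chunkSize, items.length);
        for (; index < end; index++) {
          results.push(handleItem(items[index]));
        }
      } catch (err) {
        return reject(err);
      }
      if (index < items.length) {
        // Schedule the next slice after pending I/O callbacks have run.
        setImmediate(runChunk);
      } else {
        resolve(results);
      }
    }

    runChunk();
  });
}

// Hypothetical use in the generator-based handler from the question:
// let response = yield processInChunks(data, computeRow, 500)
```

This keeps the server responsive, but the total CPU time is unchanged and everything still runs on one core, so it helps with latency for other requests rather than with throughput.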
You could utilize Child process with additional wrapper for convenience.
worker.js - this module will run in a separate process and will do the heavy work
const crypto = require('crypto');
function doHeavyWork(data) {
return crypto.pbkdf2Sync(data, 'salt', 100000, 64, 'sha512');
}
process.on('message', (message) => {
const result = doHeavyWork(message.data);
process.send({ id: message.id, result });
});
client.js - a convenience (but primitive) wrapper for Child process
const cp = require('child_process');
let worker;
const resolves = new Map();
module.exports = {
init(moduleName, errorCallback) {
worker = cp.fork(moduleName);
worker.on('error', errorCallback);
worker.on('message', (message) => {
const resolve = resolves.get(message.id);
resolves.delete(message.id);
if (!resolve) {
errorCallback(new Error(`Got response from worker with unknown id: ${message.id}`));
return;
}
resolve(message.result);
});
console.log(`Service PID: ${process.pid}, Worker PID: ${worker.pid}`);
},
doHeavyWorkRemotly(data) {
const id = `${Date.now()}${Math.random()}`;
return new Promise((resolve) => {
worker.send({ id, data });
resolves.set(id, resolve);
});
}
}
I use fork() because it sets up an additional communication channel between the parent and the child, as stated in the docs.
I also keep a record of all requests submitted to the worker process (const resolves = new Map();) and resolve each Promise (resolve(message.result);) only when the worker process returns the response for that specific request (const resolve = resolves.get(message.id);).
run.js - a startup module, it utilizes co to 'execute' generators.
const co = require('co');
const client = require('./client');
function errorCallback(error) {
console.log('Got an unexpected error!');
console.log(error);
}
client.init('./worker.js', errorCallback);
function* run() {
while(true) {
yield client.doHeavyWorkRemotly('mydata');
}
}
co(run);
To test it simply run node run.js, it will print
Service PID: XXXX, Worker PID: XXXX
Then take a look at CPU utilization: the worker process will probably take around 100% of CPU, while the Service process will be quite idle.
I built a Slack slash command that communicates with a custom Node API and POSTS acronym data in some way, shape, or form. It either gets the meaning of an acronym or adds/removes a new acronym to a Mongo database.
The command works pretty well so far, but Slack occasionally returns a timeout error because it expects a response within 3 seconds. As a result, I'm trying to implement delayed responses. I'm not sure that I am implementing delayed responses properly for my Slack slash command & Node API.
This resource on Slack slash commands has information on delayed responses. The idea is that I want to send a 200 response immediately to let the Slack user know that their request has been processed. Then I want to send a delayed response to slackReq.response_url that isn't constrained by the 3-second time limit.
The Code
let jwt = require('jsonwebtoken');
let request = require('request');
let slackHelper = require('../helpers/slack');
// ====================
// Slack Request Body
// ====================
// {
// "token":"~",
// "team_id":"~",
// "team_domain":"~",
// "channel_id":"~",
// "channel_name":"~",
// "user_id":"~",
// "user_name":"~",
// "command":"~",
// "text":"~",
// "response_url":"~"
// }
exports.handle = (req, res) => {
let slackReq = req.body;
let token = slackReq.token;
let teamId = slackReq.team_id;
if (!token || !teamId || !slackHelper.match(token, teamId)) {
// Handle an improper Slack request
res.json({
response_type: 'ephemeral',
text: 'Incorrect request'
});
} else {
// Handle a valid Slack request
slackHelper.handleReq(slackReq, (err, slackRes) => {
if (err) {
res.json({
response_type: 'ephemeral',
text: 'There was an error'
});
} else {
// NOT WORKING - Immediately send a successful response
res.json({
response_type: 'ephemeral',
text: 'Got it! Processing your acronym request...'
})
let options = {
method: 'POST',
uri: slackReq.response_url,
body: slackRes,
json: true
};
// Send a delayed response with the actual acronym data
request(options, err => {
if (err) console.log(err);
});
}
});
}
};
What's Happening Right Now
Say I want to find the meaning of acronym NBA. I go on Slack and shoot out the following:
/acronym NBA
I then hit the 3-second timeout error - Darn – that slash command didn't work (error message: Timeout was reached). Manage the command at slash-command.
I send a request a few more times (2 to 4 times), and then the API finally returns, all at once:
Got it! Processing your acronym request...
NBA means "National Basketball Association".
What I Want to Happen
I go on Slack and shoot out the following:
/acronym NBA
I immediately get the following:
Got it! Processing your acronym request...
Then, outside of the 3-second window, I get the following:
NBA means "National Basketball Association".
I never hit a timeout error.
Conclusion
What am I doing wrong here? For some reason, that res.json() with the processing message isn't immediately being sent back. What can I do to fix this?
Thank you in advance!
Edit 1
I tried to replace the res.json() call with res.sendStatus(200).json(), but unfortunately, that only returned an 'OK' without actually processing the request.
I subsequently tried res.status(200).send({..stuff..}) but that resulted in the same problem I was having before.
I think res.json() sends a 200 automatically anyway, but it's just not responding fast enough for some reason.
Solution
I eventually figured this one out. I was implementing the delayed responses right all along.
Since I'm using the free plan for Heroku, the dyno that's hosting my app would go down after 30 minutes of inactivity. When the app went down, the first few requests would time out on Slack before properly responding to a request.
The solution to this is either 1) upgrade to a new plan that keeps the dyno active at all times, or 2) ping the app with a simple get request every 15 or so minutes, like so:
const http = require('http');
const intervalMins = 15;
setInterval(() => {
  http.get("<insert app url here>");
  console.log('Ping!');
}, intervalMins * 60000); // was intervalMin, which is undefined
I decided to go with the latter option. I don't run into the issue of the dyno sleeping anymore. I'd check this article for more details.