I am running a backend server with NodeJS. The backend holds a function that makes requests to an external API. As the external API provider isn't too happy about constant requests, I need to throttle my function that makes the requests to this external API. My current solution is to use the Bottleneck library.
With that library I can define a limit on how often a specific function is called in a certain amount of time (also I can limit the number of concurrent instances that execute a specific function). There is only one downside: I can neither access nor change the queue of "waiting" function calls, meaning that one client can basically make a lot of requests and block the function for other clients.
Is there a way to sort of implement a queue in NodeJS for function calls? If other clients make requests aswell, I need to take that into account and somehow mix up the execution order to be fair again (and not first in first out/first come first serve).
This is my current setup with Bottleneck, but as described above, the behaviour is FIFO and therefore other clients are getting "blocked".
const Bottleneck = require("bottleneck");
const limiter = new Bottleneck({
minTime: 1000,
});
router.post("/", async (req, res) => {
...
const result = await requestHandler(xml, 0);
async function requestHandler(xml, recursionCounter) {
...
result = await limiter.schedule(() => soapRequest(URL, xml));
...
}
}
async function soapRequest(url, xml) {...}
Related
Is there anyway how to insert data in parallel from an external data source? Meaning I have multiple APIs/Endpoint that provide similar dataset that will be inserted in a database.
For example:
My current code is looping through each API and saving it to the database. My target behavior is the image above and hopefully dynamic. Meaning I can add multiple Endpoints and can insert in parallel when calling my insert function.
Yes, you can do this.
To prepare to write the code you will be wise to tool up a version of the MySQL API in node that works with async/await (that is, a Promise-based API).
Then tool up to use a mysql connection pool. You can limit the total number of connections in a pool. That is wise because too many connections can overwhelm your MySQL server.
const mysql = require('mysql2/promise')
const pool = mysql.createPool({
host: 'host',
user: 'redacted',
database: 'redacted',
waitForConnections: true,
connectionLimit: 6,
queueLimit: 0
})
function sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms))
}
Then write each API access operation as an async function with a loop in it. Something like this gets a connection to use, even for multiple sequential queries, for each API operation.
async function apiOne(pool) {
while (true) {
const result = await (api_operation)
connection = await pool.getConnection()
const [rows, fields] = await connection.execute(whatever)
const [rows, fields] = await connection.execute(whatever_else)
connection.release()
await sleep(1000) // wait one second
}
}
Do getConnection() inside the loop, not outside it. Pool.getConnection() is very fast because it re-uses existing connections. Doing it inside the loop allows your pool to limit the number of simultaneous connections.
The sleep() function is optional of course. You can use it to control how fast the loop runs.
Write as many of these functions as you need. This is a good way to handle multiple APIs because the code for each one is isolated in its own function.
Finally, use Promise.all() to run all your async functions concurrently.
const concurrents = []
concurrents.push (apiOne(pool))
concurrents.push (apiTwo(pool))
concurrents.push (apiThree(pool))
Promise.all (concurrents).then() /* run all the ApiXxx functions */
Beware, this sample code is dangerously oversimplified. It lacks any error or exception handling, which you need in long-running code.
I’m working on an application that uses Firebase Functions as a API interface between my web application and Google Cloud SQL (MySQL 5.7).
I have a process for importing records from the client app; basically the client app reads a CSV file then executes a function for every row in the CSV file. The function executes three or four queries during processing of the record (checking to see if the main record exists, creating it and/or other needed records, updating a stats record for this process).
The function’s called sequentially for each row, so there’s never more than one request (row) processed at a time executing 3 or 4 queries before returning data to the client app which then processes the next row (async/await).
The process works great for CSV files with 1 to 100 rows. As soon as it goes above about 900 rows, the Firebase Functions starts reporting ERROR Error: ER_CON_COUNT_ERROR: Too many connections
My code, shown below, originally had a connection limit of 10, but I bumped it up to 100 connections but it still fails.
Here’s my code that executes the SQL queries:
import * as functions from "firebase-functions";
import * as mysql from 'mysql';
export async function executeQuery(cmd: string) {
const mySQLConfig = {
host: functions.config().sql.prodhost,
user: functions.config().sql.produser,
password: functions.config().sql.prodpswd,
database: functions.config().sql.proddatabase,
connectionLimit: 100,
}
var pool: any;
if (!pool) {
pool = mysql.createPool(mySQLConfig);
}
return new Promise(function (resolve, reject) {
//#ts-ignore
pool.query(cmd, function (error, results) {
if (error) {
return reject(error);
}
resolve(results);
});
});
}
As I understand it, with a pool like I think I’ve implemented above, each request will get a connection up to the max connections. Each connection will automatically return to the pool once its done processing the request. So, even if it takes a while to release the connection, with the connection limit at 100, I should be able to process quite a few rows (20 or so at least) before there’s contention for connections and then the process will queue up and wait for free connections before continuing. If that’s right, what’s happening here?
I found an article here: https://cloud.google.com/sql/docs/mysql/manage-connections that describes some additional settings I can use to tweak connection management:
// 'connectTimeout' is the maximum number of milliseconds before a timeout
// occurs during the initial connection to the database.
connectTimeout: 10000,
// 'acquireTimeout' is the maximum number of milliseconds to wait when
// checking out a connection from the pool before a timeout error occurs.
acquireTimeout: 10000,
// 'waitForConnections' determines the pool's action when no connections are
// free. If true, the request will queued and a connection will be presented
// when ready. If false, the pool will call back with an error.
waitForConnections: true, // Default: true
// 'queueLimit' is the maximum number of requests for connections the pool
// will queue at once before returning an error. If 0, there is no limit.
queueLimit: 0, // Default: 0
I’m tempted to try bumping up the timeouts, but I’m not sure whether that’s actually impacting me here.
Since I’m running this in Firebase Functions (Google Cloud Functions under the covers), do these settings even really apply? Isn’t my function’s VM resetting after every execution or at least my function terminating after every execution? Does the pool even exist in this context? If not, then how do I do this type of processing in Functions?
One option is, of course, to push all of my processing to the function, just send up a JSON object for the row array and let the function process them all at once. This, I think, should make proper use of pools, but I’m worried I’ll bump up against execution limits in Functions (5 minutes) which is why I built it like I did.
Stupid developer trick, I was paying such close attention to my pool code that I missed that I'm declaring the pool variable in the wrong place. Moving the pool declaration outside of the method fixed my problem. With the code the way it was, I was creating a pool with every SQL query which quickly used up all of my connections.
I'm struggling with a javascript/Nodejs8 Google Cloud Function to publish payloads to Google PubSub.
So I have a Cloud Function triggered by HTTP requests and the request body is then published to a pubsub topic (configured for pull mode).
Here is my code:
const {PubSub} = require('#google-cloud/pubsub');
const pubsub = new PubSub();
const topic = pubsub.topic('my-fancy-topic');
function formatPubSubMessage(reqObj){
// the body is pure text
return Buffer.from(reqObj.body);
};
exports.entryPoint = function validate(req, res) {
topic.publish(formatPubSubMessage(req)).then((messageId) => {
console.log("sent pubsub message with id :: " + messageId)
});
res.status(200).json({"res":"OK"});
};
My issue is that the cloud function finishes executing before the pubsub message being published (in logs, the log "Function execution took X ms, finished with status code: 200" shows up around 30 or 40 seconds before the my pubsub log. I also had several times a log with "Ignoring exception from a finished function" and I dont get my pubsub log)
I'm not a javascript or nodejs specialist and I don't master javascript promises neither but I was wondering if I could make the publish synchronous. I'm thinking as well that I might be doing something wrong here !
Thank you in advance for your help.
In your logic, your callback / event handler function is being called when the HTTP message arrives. You then execute a publish() function. Executing a publish is an asynchronous activity. This means that it make take some time for the publish to complete and since JavaScript (intrinsically) doesn't want to block, it returns immediately with a promise that you can then use to be notified when the asynchronous work has completed. Immediately after executing the publish() your logic executes a res.status(....) which sends a response to the HTTP request and that is indeed the end of the flow request from the HTTP client. The asynchronous publish is still cooking and when it itself completes, then the callback for the publish occurs and you log a response.
Unfortunately, this is not a good practice as documented by Google here ...
https://cloud.google.com/functions/docs/bestpractices/tips#do_not_start_background_activities
In this last story, the function you call validate will still end prior to the publish being completed. If you want to block while the publish() executes (effectively making it synchronous), you can use the JavaScript await key word. Loosely, something like:
try {
let messageId = await topic.publish(....);
console.log(...);
catch(e) {
...
}
You will also need to flag the functions as being async. For example:
exports.entryPoint = async function validate(req, res) {
...
See: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function
You can also simply return a Promise from the function and the callback function will not be considered resolved until the Promise as a whole is resolved.
The bottom line is to study Promises in depth.
Right now, this code is sending a response before the publish is complete. When the response is sent, the function is terminated, and ongoing async work might not complete.
What you should do instead is send the response only after the publish is complete, which means putting that line of code in the then callback.
exports.entryPoint = function validate(req, res) {
topic.publish(formatPubSubMessage(req)).then((messageId) => {
console.log("sent pubsub message with id :: " + messageId)
res.status(200).json({"res":"OK"});
});
};
I suggest spending some time learning about how promises work, as this is crucial to building functions that work correctly.
In short, I've run into an issue where multiple parallel GET requests to my Node.js server cause the server to get "clogged up" and hang, thus resulting in timeouts for the clients (503, service unavailable).
After a lot of performance analysis, I've realized it's a CPU issue. The specific request (we'll call it GET /foo) queries data from multiple services over HTTP, and then does a lot of computation, and returns the results to the client, like this:
Client request GET /foo
/foo controller queries data over HTTP from multiple other services`
/foo controller then does a bunch of iterations over the data to compile some output for the client
Step 3 takes around 2 seconds to complete. However, if I send 2 requests in parallel to /foo, each client will receive their response in about 4 seconds. When I run the app in a cluster using more cores, the requests run much faster, but not quite what I want.
Seems like I have several options here:
pre-compute the response (ideally would like to avoid this for now, since it will require a whole "cache invalidation" scheme), or
/foo sends the CPU-blocking computation asynchronously to another process (using Heroku, so that would be another dyno), and then I can use a websocket or something to push the results to the client (again, very complex for my situation), or
somehow yield to a child process in the request and return the results to the client
Would love to do something like option 3. Something like this:
get('/foo', function*(request) {
// I/O, so not blocking the event loop (I think)
let data = yield getData(request)
// make this happen in a different process
let response = yield doSomeHeavyProcessing(data)
return response
})
I've omitted a lot of implementation details above, but if it's necessary to know, I'm using Koa and Node.js 6.
Ideally, doSomeHeavyProcessing would do the CPU-intensive computation in some separate process, and when it's done, still send the results back in a "synchronous" fashion to the request client.
Been trying to wrap my head around child processes, web workers, fibers, etc., and have been doing some basic "hello worlds" with these to get them to do basically the above, but to no avail. Can post more details if necessary.
Here are some approaches that you can try:
1.
Split blocking computation in small chunks and use setImmediate to place the next chunk of work at the end of the event queue. So computation is no longer blocking and other requests can be processed.
2.
Microsoft recently released napajs. As stated in their README
As it evolves, we find it useful to complement Node.js in CPU-bound tasks, with the capability of executing JavaScript in multiple V8 isolates and communicating between them.
I haven't tried it, but it looks very promising:
var napa = require('napajs');
var zone1 = napa.zone.create('zone1', { workers: 4 });
get('/foo', function*(request) {
let data = yield getData(request)
let response = yield zone1.execute(doSomeHeavyProcessing, [data])
return response
})
3. If nothing of the above is enough and you need to spread the load across multiple machines, then you probably couldn't avoid using some sort of message queue to distribute work to different servers. In this case check out ZeroMQ. It is extremely easy to use from node, and you can implement any kind of distributed messaging pattern with it.
You could utilize Child process with additional wrapper for convenience.
worker.js - this module will run in a separate process and will do the heavy work
const crypto = require('crypto');
function doHeavyWork(data) {
return crypto.pbkdf2Sync(data, 'salt', 100000, 64, 'sha512');
}
process.on('message', (message) => {
const result = doHeavyWork(message.data);
process.send({ id: message.id, result });
});
client.js - a convenience (but primitive) wrapper for Child process
const cp = require('child_process');
let worker;
const resolves = new Map();
module.exports = {
init(moduleName, errorCallback) {
worker = cp.fork(moduleName);
worker.on('error', errorCallback);
worker.on('message', (message) => {
const resolve = resolves.get(message.id);
resolves.delete(message.id);
if (!resolve) {
errorCallback(new Error(`Got response from worker with unknown id: ${message.id}`));
return;
}
resolve(message.result);
});
console.log(`Service PID: ${process.pid}, Worker PID: ${worker.pid}`);
},
doHeavyWorkRemotly(data) {
const id = `${Date.now()}${Math.random()}`;
return new Promise((resolve) => {
worker.send({ id, data });
resolves.set(id, resolve);
});
}
}
I use fork() to utilize an additional communication channel as it is stated in the docs.
Also I keep a record of all submitted to worker process requests (const resolves = new Map();) and resolve Promises (resolve(message.result);) only when the worker process returns response for the specific request (const resolve = resolves.get(message.id);).
run.js - a startup module, it utilizes co to 'execute' generators.
const co = require('co');
const client = require('./client');
function errorCallback(error) {
console.log('Got an unexpected error!');
console.log(error);
}
client.init('./worker.js', errorCallback);
function* run() {
while(true) {
yield client.doHeavyWorkRemotly('mydata');
}
}
co(run);
To test it simply run node run.js, it will print
Service PID: XXXX, Worker PID: XXXX
then take a look at CPU utilization, worker process will probably take around 100% of CPU while Service will be quite idle.
How do the two compare to each other?
TL;DR
DNode
provides RMI;
remote functions can accept callbacks as arguments;
which is nice, since it is fully asynchronous;
runs stand-alone or through an existing http server;
can have browser and Node clients;
supports middleware, just like connect;
has been around longer than NowJS.
NowJS
goes beyond just RMI and implements a "shared scope" API. It's like
Dropbox, only with variables and functions instead of files;
remote functions also accept callbacks (thanks to Sridatta and Eric from NowJS
for the clarification);
depends on a listening http server to work;
can only have browser clients;
became public very recently;
is somewhat buggy right now.
Conclusion
NowJS is more of a toy right now -- but keep a watch as it matures. For
serious stuff, maybe go with DNode. For a more detailed review of these
libraries, read along.
DNode
DNode provides a Remote Method Invocation framework. Both the client and server
can expose functions to each other.
// On the server
var server = DNode(function () {
this.echo = function (message) {
console.log(message)
}
}).listen(9999)
// On the client
dnode.connect(9999, function (server) {
server.echo('Hello, world!')
})
The function that is passed to DNode() is a handler not unlike the one passed to
http.createServer. It has two parameters: client can be used to access the
functions exported by the client and connection can be used to handle
connection-related events:
// On the server
var server = DNode(function (client, connection) {
this.echo = function (message) {
console.log(message)
connection.on('end', function () {
console.log('The connection %s ended.', conn.id)
})
}
}).listen(9999)
The exported methods can be passed anything, including functions. They are properly
wrapped as proxies by DNode and can be called back at the other endpoint. This is
fundamental: DNode is fully asynchronous; it does not block while waiting
for a remote method to return:
// A contrived example, of course.
// On the server
var server = DNode(function (client) {
this.echo = function (message) {
console.log(message)
return 'Hello you too.'
}
}).listen(9999)
// On the client
dnode.connect(9999, function (server) {
var ret = server.echo('Hello, world!')
console.log(ret) // This won't work
})
Callbacks must be passed around in order to receive responses from the other
endpoint. Complicated conversations can become unreadable quite fast. This
question discusses possible solutions for this problem.
// On the server
var server = DNode(function (client, callback) {
this.echo = function (message, callback) {
console.log(message)
callback('Hello you too.')
}
this.hello = function (callback) {
callback('Hello, world!')
}
}).listen(9999)
// On the client
dnode.connect(9999, function (server) {
server.echo("I can't have enough nesting with DNode!", function (response) {
console.log(response)
server.hello(function (greeting) {
console.log(greeting)
})
})
})
The DNode client can be a script running inside a Node instance or can be
embedded inside a webpage. In this case, it will only connect to the server that
served the webpage. Connect is of great assistance in this case. This scenario was tested with all modern browsers and with Internet Explorer 5.5 and 7.
DNode was started less than a year ago, on June 2010. It's as mature as a Node
library can be. In my tests, I found no obvious issues.
NowJS
NowJS provides a kind of magic API that borders on being cute. The server has an
everyone.now scope. Everything that is put inside everyone.now becomes
visible to every client through their now scope.
This code, on the server, will share an echo function with every client that
writes a message to the server console:
// Server-side:
everyone.now.echo = function (message) {
console.log(message)
}
// So, on the client, one can write:
now.echo('This will be printed on the server console.')
When a server-side "shared" function runs, this will have a now attribute
that is specific to the client that made that call.
// Client-side
now.receiveResponse = function (response) {
console.log('The server said: %s')
}
// We just touched "now" above and it must be synchronized
// with the server. Will things happen as we expect? Since
// the code is not multithreaded and NowJS talks through TCP,
// the synchronizing message will get to the server first.
// I still feel nervous about it, though.
now.echo('This will be printed on the server console.')
// Server-side:
everyone.now.echo = function (message) {
console.log(message)
this.now.receiveResponse('Thank you for using the "echo" service.')
}
Functions in NowJS can have return values. To get them, a callback must be
passed:
// On the client
now.twice(10, function (r) { console.log(r) }
// On the server
everyone.now.twice = function(n) {
return 2 * n
}
This has an implication if you want to pass a callback as an honest argument (not
to collect a return value) -- one must always pass the return value collector, or
NowJS may get confused. According to the developers, this way of retrieving the
return value with an implicit callback will probably change in the future:
// On the client
now.crunchSomeNumbers('compute-primes',
/* This will be called when our prime numbers are ready to be used. */
function (data) { /* process the data */ },
/* This will be called when the server function returns. Even if we
didn't care about our place in the queue, we'd have to add at least
an empty function. */
function (queueLength) { alert('You are number ' + queueLength + ' on the queue.') }
)
// On the server
everyone.now.crunchSomeNumbers = function(task, dataCallback) {
superComputer.enqueueTask(task, dataCallback)
return superComputer.queueLength
}
And this is it for the NowJS API. Well, actually there are 3 more functions that
can be used to detect client connection and disconnection. I don't know why they
didn't expose these features using EventEmitter, though.
Unlike DNode, NowJS requires that the client be a script running inside a web browser.
The page containing the script must be served by the same Node that is running
the server.
On the server side, NowJS also needs an http server listening. It must be passed
when initializing NowJS:
var server = http.createServer(function (req, response) {
fs.readFile(__dirname + '/now-client.html', function (err, data) {
response.writeHead(200, {'Content-Type':'text/html'})
response.write(data)
response.end()
})
})
server.listen(8080)
var everyone = now.initialize(server)
NowJS first commit is from a couple weeks ago (Mar 2011). As such, expect it to
be buggy. I found issues myself while writing this answer. Also expect its
API to change a lot.
On the positive side, the developers are very accessible -- Eric even guided me
to making callbacks work. The source code is not documented, but is fortunately
simple and short and the user guide and examples are enough to get one started.
NowJS team member here. Correction to andref's answer:
NowJS fully supports "Remote Method Invocation". You can pass functions as arguments in remote calls and you can have functions as return values as well.
These functions are wrapped by NowJS just as they are in DNode so that they are executed on the machine on which the function was defined. This makes it easy to expose new functions to the remote end, just like in DNode.
P.S. Additionally, I don't know if andref meant to imply that remote calls are only asynchronous on DNode. Remote calls are also async on NowJS. They do not block your code.
Haven't tried Dnode so my answer is not a comparison. But I would like to put forth few experiences using nowjs.
Nowjs is based on socket.io which is quite buggy. I frequently experience session time-outs, disconnects and now.ready event firing multiple times in a short duration. Check out this issue on nowjs github page.
Also I found using websockets unviable on certain platforms, however this can be circumvented by explicitly disabling websockets.
I had planned creating a production app using nowjs but it seems its not mature enough to be relied upon. I will try dnode if it serves my purpose, else I will switch to plain-old express.
Update:
Nowjs seems to be scrapped. No commits since 8 months.