I have a use case where my HTTP requests cache an intermediate result on the server.
If the cache is not present, the request builds it by querying another server.
These requests are fired in succession (in a loop) using AJAX to a Node server, and the number of requests can range from 50 to 500.
The Problem:
Since the requests are made in a loop and the cache does not exist yet, the first few of them all try to build the cache, and sometimes subsequent requests find a half-built cache, which returns a wrong result.
I can circumvent this problem with polling:
(function next(){
if(!wait){
fs.readFile(cacheFile, function(err){
if(err) {
wait = true;
createCache(); // sets wait = false;
} else {
useCache();
}
});
} else {
setTimeout(next,waitTime);
}
})();
My Query:
Can the requests be halted without polling, and continue only after the first request has completed the cache building process?
Yes, it is possible in combination with Futures/Promise. You can take this one.
Define var cachePromise outside the scope of the request handler, then use something like this:
if (!cachePromise) {
cachePromise = require('future').create()
buildCache(function() {
cachePromise.fulfill();
});
}
cachePromise.when(next); // this one triggers next route in middleware stack
Put this code in the route stack before the route that returns the result, and you are good to go.
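With native promises the same idea can be sketched without an extra library. This is only a minimal sketch: buildCache below is a hypothetical stand-in that is assumed to return a promise, and getCache is a name I made up for illustration.

```javascript
// Hypothetical stand-in for the real cache builder; assumed to return a promise.
let buildCount = 0;
function buildCache() {
  buildCount += 1;
  return new Promise((resolve) => {
    setTimeout(() => resolve({ builtAt: Date.now() }), 50);
  });
}

let cachePromise = null;

// Every caller shares the same in-flight promise, so the build runs only once.
function getCache() {
  if (!cachePromise) {
    cachePromise = buildCache().catch((err) => {
      cachePromise = null; // allow a later request to retry after a failure
      throw err;
    });
  }
  return cachePromise;
}
```

In a middleware you would then call something like getCache().then(function () { next(); }, next); so that all requests arriving during the build wait on the same promise instead of polling.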
thanks.
My scenario is the following:
I have a Progressive Web App that uses a Service Worker where I need to catch the request and do something with it every time the user requests a resource or leaves the current URL
I'm handling that through adding a callback to the fetch event of the worker
I only care about requested resources within our domain (e.g. example.com)
If the requested resource is within our domain I return the promise result from a regular fetch, so that's already covered
But, if the requested resource is outside my domain (as shown in the below snippet) I want the original request to just continue
I'm currently just doing a simple return if the scenario in bullet 5 is true
Snippet of my current code:
function onFetch(event) {
if (!event.request.url.startsWith("example.com")) {
return;
} else {
event.respondWith(
fetch(event.request)
.then(req => {
// doing something with the request
})
.catch((error)=> {
// handle errors etc.
})
.finally(()=> {
// cleanup
})
);
}
}
self.addEventListener('fetch', onFetch);
My question: Is it OK if I just return nothing like in the snippet, or, do I need to return something specific, like a new promise by fetching the original request (like I'm doing on the else block)?
Thanks!
It is absolutely okay to do what you're doing. Not calling event.respondWith() is a signal to the browser that a given fetch handler is not going to generate a response to a given request, and you can structure your code to return early to avoid calling event.respondWith().
You might have multiple fetch handlers registered, and if the first one returns without calling event.respondWith(), the next fetch handler will then get a chance to respond. If all of the fetch handlers have executed and none of them call event.respondWith(), the browser will automatically handle the request as if there were no service worker at all, which is what you want.
In terms of observed behavior, not calling event.respondWith() at all ends up looking similar to what would happen if you called event.respondWith(fetch(event.request)). But there is overhead involved in making a fetch() request inside of a service worker and then passing the response body from the service worker thread back to the main program, and you avoid that overhead if you don't call event.respondWith(). So, I'd recommend the approach you're taking.
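As a rough sketch of the early-return pattern described above (not the asker's exact code): ALLOWED_ORIGIN and isOwnOrigin are names I'm assuming for illustration, and the origin is compared via the URL parser because event.request.url is an absolute URL that includes the scheme.

```javascript
// Assumption: ALLOWED_ORIGIN is a placeholder for the site's real origin.
const ALLOWED_ORIGIN = 'https://example.com';

function isOwnOrigin(requestUrl) {
  // request.url is an absolute URL, so compare parsed origins
  // rather than string prefixes.
  return new URL(requestUrl).origin === ALLOWED_ORIGIN;
}

function onFetch(event) {
  if (!isOwnOrigin(event.request.url)) {
    return; // no respondWith(): the browser handles the request itself
  }
  event.respondWith(fetch(event.request));
}

// Register the handler only when this file runs inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', onFetch);
}
```

Comparing origins also avoids matching look-alike hosts such as example.com.evil.net, which a plain prefix check could let through.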
In short, I've run into an issue where multiple parallel GET requests to my Node.js server cause the server to get "clogged up" and hang, thus resulting in timeouts for the clients (503, service unavailable).
After a lot of performance analysis, I've realized it's a CPU issue. The specific request (we'll call it GET /foo) queries data from multiple services over HTTP, and then does a lot of computation, and returns the results to the client, like this:
1. Client requests GET /foo
2. /foo controller queries data over HTTP from multiple other services
3. /foo controller then does a bunch of iterations over the data to compile some output for the client
Step 3 takes around 2 seconds to complete. However, if I send 2 requests in parallel to /foo, each client will receive their response in about 4 seconds. When I run the app in a cluster using more cores, the requests run much faster, but not quite what I want.
Seems like I have several options here:
1. pre-compute the response (ideally would like to avoid this for now, since it will require a whole "cache invalidation" scheme), or
2. /foo sends the CPU-blocking computation asynchronously to another process (using Heroku, so that would be another dyno), and then I can use a websocket or something to push the results to the client (again, very complex for my situation), or
3. somehow yield to a child process in the request and return the results to the client
Would love to do something like option 3. Something like this:
get('/foo', function*(request) {
// I/O, so not blocking the event loop (I think)
let data = yield getData(request)
// make this happen in a different process
let response = yield doSomeHeavyProcessing(data)
return response
})
I've omitted a lot of implementation details above, but if it's necessary to know, I'm using Koa and Node.js 6.
Ideally, doSomeHeavyProcessing would do the CPU-intensive computation in some separate process, and when it's done, still send the results back in a "synchronous" fashion to the request client.
Been trying to wrap my head around child processes, web workers, fibers, etc., and have been doing some basic "hello worlds" with these to get them to do basically the above, but to no avail. Can post more details if necessary.
Here are some approaches that you can try:
1. Split the blocking computation into small chunks and use setImmediate to place the next chunk of work at the end of the event queue. That way the computation no longer blocks and other requests can be processed in between.
2. Microsoft recently released napajs. As stated in their README:
As it evolves, we find it useful to complement Node.js in CPU-bound tasks, with the capability of executing JavaScript in multiple V8 isolates and communicating between them.
I haven't tried it, but it looks very promising:
var napa = require('napajs');
var zone1 = napa.zone.create('zone1', { workers: 4 });
get('/foo', function*(request) {
let data = yield getData(request)
let response = yield zone1.execute(doSomeHeavyProcessing, [data])
return response
})
3. If none of the above is enough and you need to spread the load across multiple machines, then you probably can't avoid using some sort of message queue to distribute work to different servers. In this case check out ZeroMQ. It is extremely easy to use from node, and you can implement any kind of distributed messaging pattern with it.
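For the first approach (chunking with setImmediate), a minimal sketch might look like the following. The function name sumInChunks and the summing workload are just illustrative assumptions; the real computation would replace the inner loop.

```javascript
// Sum a large array in slices, yielding to the event loop between slices.
function sumInChunks(values, chunkSize = 10000) {
  return new Promise((resolve) => {
    let index = 0;
    let total = 0;

    function processChunk() {
      const end = Math.min(index + chunkSize, values.length);
      for (; index < end; index++) {
        total += values[index];
      }
      if (index < values.length) {
        // Queue the next slice so pending I/O callbacks can run first.
        setImmediate(processChunk);
      } else {
        resolve(total);
      }
    }

    processChunk();
  });
}
```

Since it returns a promise, it can be yielded from a generator-based handler the same way as getData in the question. Note that the total CPU time stays the same; chunking only keeps the server responsive to other requests while the work is in progress.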
You could use a child process with an additional wrapper for convenience.
worker.js - this module will run in a separate process and will do the heavy work
const crypto = require('crypto');
function doHeavyWork(data) {
return crypto.pbkdf2Sync(data, 'salt', 100000, 64, 'sha512');
}
process.on('message', (message) => {
const result = doHeavyWork(message.data);
process.send({ id: message.id, result });
});
client.js - a convenience (but primitive) wrapper for Child process
const cp = require('child_process');
let worker;
const resolves = new Map();
module.exports = {
init(moduleName, errorCallback) {
worker = cp.fork(moduleName);
worker.on('error', errorCallback);
worker.on('message', (message) => {
const resolve = resolves.get(message.id);
resolves.delete(message.id);
if (!resolve) {
errorCallback(new Error(`Got response from worker with unknown id: ${message.id}`));
return;
}
resolve(message.result);
});
console.log(`Service PID: ${process.pid}, Worker PID: ${worker.pid}`);
},
doHeavyWorkRemotly(data) {
const id = `${Date.now()}${Math.random()}`;
return new Promise((resolve) => {
worker.send({ id, data });
resolves.set(id, resolve);
});
}
}
I use fork() to utilize an additional communication channel as it is stated in the docs.
I also keep a record of all requests submitted to the worker process (const resolves = new Map();) and resolve each Promise (resolve(message.result);) only when the worker process returns the response for that specific request (const resolve = resolves.get(message.id);).
run.js - a startup module, it utilizes co to 'execute' generators.
const co = require('co');
const client = require('./client');
function errorCallback(error) {
console.log('Got an unexpected error!');
console.log(error);
}
client.init('./worker.js', errorCallback);
function* run() {
while(true) {
yield client.doHeavyWorkRemotly('mydata');
}
}
co(run);
To test it simply run node run.js, it will print
Service PID: XXXX, Worker PID: XXXX
then take a look at CPU utilization, worker process will probably take around 100% of CPU while Service will be quite idle.
I have a script in which I start several HTTP requests inside a loop.
Let's say that I have to make 1000 HTTP requests.
The thing is that I can make only one HTTP request per IP and I have only 10 IPs.
So, after 10 parallel requests, I have to wait for a response before making another one.
How do I wait for one response from an HTTP request, without blocking the script, before starting another one?
My problem is that if I run a while loop waiting for a free IP, my whole script is blocked and I do not receive any response.
Use the async module for this.
You can use async#eachLimit to limit the concurrent requests to 10.
var async = require('async');
var urls = [
// a list of 100 urls
];
function makeRequest(url, callback) {
/* make a http request */
callback(); // when done, callback
}
async.eachLimit(urls, 10, makeRequest, function(err) {
if(err) throw err;
});
This code will loop through the list of urls and call makeRequest for each one. It will stop at 10 concurrent requests and will not proceed with the 11th request until one of the first 10 has finished.
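If you would rather not add a dependency, the same limiting idea can be sketched with promises. This is a different implementation from async#eachLimit, and mapWithLimit is a name I'm making up here; it assumes each task returns a promise.

```javascript
// Run `task` over `items` with at most `limit` tasks in flight at once.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each worker pulls the next unclaimed item until none are left.
  async function worker() {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await task(items[i], i);
    }
  }

  const workers = [];
  for (let w = 0; w < Math.min(limit, items.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

With 10 IPs you would call it with a limit of 10. Because each worker handles one request at a time, you could also bind one IP to each worker, though how to do that depends on your HTTP client.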
In my Meteor 1.0 app, I'm running a batch of server-side HTTP requests in order to retrieve fixture data in a synchronous fashion. Once a request completes and computations are run on that data, startNumber is incremented (by 5000) and the request is re-run with that new value. This loop system will continue until the API returns a blank response array, signifying all the data has been captured. This HTTP request is part of a larger, complex function that helps set the context of the request.
functionName = function(param1,param2,param3) {
// ...
// ...
var startNumber = 1;
do {
var request = Meteor.http.call("GET", "https://url-to-api-endpoint",
{ params:
{
"since": startNumber
},
timeout: 60000
}
);
if(request.statusCode === 200) {
var response = request.data;
// perform calculations on the response
startNumber+=5000;
}
} while (response.length > 0);
// ...
// ...
};
The do-while loop system is working fine, except that every few iterations the request is returning with Error: getaddrinfo ENOTFOUND. The URL is perfectly valid, and it appears these errors are resulting from a finicky/unreliable API as sometimes the same exact request will go through or error out. I want to replay failed requests in order to make sure my app is retrieving data chronologically before proceeding.
How can I replay a failed HTTP request as though it were being run for the first time? In other words, without losing the current context of all the variables, etc., in functionName?
FYI, in case someone else ends up in this predicament, I solved this problem by wrapping the HTTP request in a try-catch block. In the case of an error such as getaddrinfo ENOTFOUND or ETIMEDOUT, the error gets caught. Within the catch block, I call functionName and pass in parameters for the current state (i.e. the current startNumber), which allows me to essentially "replay" the request all over again.
// ...
// ...
try {
var request = Meteor.http.call("GET", "https://url-to-api-endpoint",
{ params:
{
"since": startNumber
},
timeout: 60000
}
);
} catch(err) {
console.log(err + '\nFailed...retrying');
functionName(param1,param2,param3);
}
// ...
// ...
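An alternative to re-calling the whole function is a small bounded retry helper, so a permanently failing API can't cause endless retries. This is a sketch of a different technique, not what I actually used above; withRetry is a hypothetical helper, and it assumes the wrapped call is synchronous and throws on failure.

```javascript
// Call `fn` until it succeeds or `maxAttempts` is reached,
// then rethrow the last error.
function withRetry(fn, maxAttempts) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return fn(attempt);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

Inside the do-while loop the request would then become something like var request = withRetry(function () { return Meteor.http.call(/* same arguments as before */); }, 5); and startNumber and the other local variables stay untouched between attempts.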
Hi, I understand that in long polling you keep the connection with the server open until you get a response back, and then you poll again and wait for the next response. However, I don't understand how to code it. The code below uses long polling, but I don't quite get it:
(function poll(){
$.ajax({ url: "server", success: function(data){
//update page based on data
}, dataType: "json", complete: poll, timeout: 30000 });
})();
But how is the connection kept open here? I understand that the poll function is fired again once the response from the server is received, but how is the connection kept open?
Edit 1: It would be great if someone could also explain what timeout actually does here.
The client cannot force the server to keep the connection open. The server is simply not closing the connection. The server will have to say at some point "that's it, there's no more content here, bye". In long polling, the server simply never does so and keeps the client waiting for more data, which it trickles out little by little as updates come in. That's long polling.
On the client side it's possible to check occasionally for the data which has already been received, while the request has not finished. That way data can occasionally be sent from the server over the same open connection. In your case this is not being done, the success callback will only fire when the request has finished. It's basically a cheap form of long polling in which the server keeps the client waiting for an event, sends data about this event and then closes the connection. The client takes that as the trigger, processes the data, then reconnects to the server to wait for the next event.
I think what is making this confusing to understand is that the discussion is focused on the client-side programming.
Long-polling is not strictly a client-side pattern, but requires the web server to keep the connection open.
Background: Client wants to be notified by web server when something occurs or is available, for example, let me know when a new email arrives without me having to go back and ask every few seconds.
1. Client opens a connection to a specific URL on the web server.
2. Server accepts connection, opens a socket and dispatches control to whatever server-side code handles this connection (say a servlet or jsp in java, or a route in RoR or node/express).
3. Server code waits until the event or information is available. For example, when an email arrives, sees if any of the "waiting connections" are for the particular inbox. If they are, then respond with the appropriate data.
4. Client receives data, does its thing, then starts another request to poll.
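The "waiting connections" part of the steps above can be sketched as a small in-memory registry. This is only an illustration under my own assumptions: createHub, wait and publish are invented names, and a real server would also need per-inbox keys and cleanup when a client disconnects.

```javascript
// Minimal in-memory registry of waiting long-poll requests.
function createHub() {
  const waiters = new Set();
  return {
    // Resolves with the next published value, or with null after timeoutMs.
    wait(timeoutMs) {
      return new Promise((resolve) => {
        const waiter = (value) => {
          clearTimeout(timer);
          waiters.delete(waiter);
          resolve(value);
        };
        const timer = setTimeout(() => waiter(null), timeoutMs);
        waiters.add(waiter);
      });
    },
    // Answers every request that is currently waiting.
    publish(value) {
      for (const waiter of Array.from(waiters)) {
        waiter(value);
      }
    },
  };
}
```

A request handler would call hub.wait(25000) and only write the response once that promise resolves, while the code that receives the new email calls hub.publish(...). A null result means nothing happened within the time limit, so the server can send an empty response and the client simply polls again.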
I was looking to do something with staggered data results where some would come back right away but the last few results might come back 10-15 seconds later. I created a quick little jQuery hack but it's kinda doing what I want (still not sure if it makes sense to use it tho):
(function($) {
if (typeof $ !== 'function') return;
$.longPull = function(args) {
var opts = $.extend({ method:'GET', onupdate:null, onerror:null, delimiter:'\n', timeout:0}, args || {});
opts.index = 0;
var req = $.ajaxSettings.xhr();
req.open(opts.method, opts.url, true);
req.timeout = opts.timeout;
req.onabort = opts.onabort || null;
req.onerror = opts.onerror || null;
req.onloadstart = opts.onloadstart || null;
req.onloadend = opts.onloadend || null;
req.ontimeout = opts.ontimeout || null;
req.onprogress = function(e) {
try {
var a = String(e.target.response).split(opts.delimiter); // e.target is the standard way to reach the XHR
for(var i=opts.index; i<a.length; i++) {
try {
var data = JSON.parse(a[i]); // may not be complete
if (typeof opts.onupdate==='function') opts.onupdate(data, i);
opts.index = i + 1;
} catch(fx){}
}
}
catch(e){}
};
req.send(opts.data || null);
};
})(jQuery);
Largely untested but it seemed to do what you had in mind. I can think of all sorts of ways it could go wrong, though ;-)
$.longPull({ url: 'http://localhost:61873/Test', onupdate: function(data) { console.log(data); }});
As requested, here is some pseudo NodeJS code:
function respond_to_client(res,session,cnt)
{
//context: res is the object we use to respond to the client
//session: just some info about the client, irrelevant here
//cnt: initially 0
//nothing to tell the client, let's long poll.
if (nothing_to_send(res,session))
{
if (cnt<MAX_LONG_POLL_TIME)
{
//call this function in 100 ms, increase the counter
setTimeout(function(){respond_to_client(res,session,cnt+1)},100);
}
else
{
close_connection(res);
//Counter too high.
//we have nothing to send and we kept the connection for too long,
//close it. The client will open another.
}
}
else
{
send_what_we_have(res);
close_connection(res);
//the client will consume the data we sent,
//then quickly send another request.
}
return;
}
You don't see how it works from that code only, because the actual difference from a regular request is done on the server.
The Javascript just makes a regular request, but the server doesn't have to respond to the request immediately. If the server doesn't have anything worth returning (i.e. the change that the browser is waiting for hasn't happened yet), the server just waits which keeps the connection open.
If nothing happens on the server for some time, either the client side will time out and make a new request, or the server can choose to return an empty result just to keep the flow going.
The connection is not kept open all the time. It is closed when the server sends its response and closes the connection. In long polling the server is not supposed to send data back immediately. On ajax complete (when the server closes the connection), a new request is sent to the server, which opens a new connection and again waits for a response.
As was mentioned, long polling is handled not only on the client side but mainly on the server side, and not only by the server script (PHP, for example) but by the server itself, which must not close the "hanging" connection on a timeout.
FWIW, WebSockets use a constantly open connection to the server, which makes it possible to send and receive data without closing the connection.
I guess no one has properly explained why we need timeout in the code. From the jQuery Ajax docs:
Set a timeout (in milliseconds) for the request. This will override any global timeout set with $.ajaxSetup(). The timeout period starts at the point the $.ajax call is made; if several other requests are in progress and the browser has no connections available, it is possible for a request to time out before it can be sent.
The timeout option does not delay the next execution by X seconds. It only sets a maximum duration for the current call. Good article about timeout stuff - https://mashupweb.wordpress.com/2013/06/26/you-should-always-add-timeout-to-you-ajax-call-in-jquery/