How do I send bulk HTTP GET requests using Axios? For example:
const maxI = 3000;
let i = 0;
do {
  i += 1;
  const response = await axios.get(`https://hellowWorld.com/${i}`);
} while (i < maxI);
How can I receive the data from all of these URLs and merge it into a single variable? And how can I make sure this executes quickly?
I know about axios.all, but I don't know how to apply it in my case.
Thank you in advance.
You can do something like this, but be careful: servers may reject your requests if you make them in bulk, to protect against DDoS, and this approach doesn't guarantee that all the requests will return successfully and that you will receive all the data. Here is the snippet for it:
import axios from "axios";

const URL = "https://randomuser.me/api/?results=";

async function getData() {
  // Kick off all requests up front so they run concurrently
  const requests = [];
  for (let i = 1; i < 6; i++) {
    requests.push(axios.get(URL + i));
  }

  // allSettled resolves even if some of the requests fail
  const responses = await Promise.allSettled(requests);
  console.log(responses);

  // Collect the data from the successful responses only
  const result = [];
  responses.forEach((item) => {
    if (item.status === "rejected") return;
    result.push(item.value.data.results);
  });
  console.log(result.flat());
}

getData();
AFAIK, it is impossible to increase the speed or reduce the time taken to complete your batch of requests unless you implement a batch-request-handling API on the server, which would reduce both the number of requests handled by the server and the number of requests made by the browser. The solution I gave you just demonstrates how it can be done from the client side; your approach is not an optimal way to do it.
Each browser also has its own limit on the number of parallel requests that can be made to a single domain, which is why we cannot reduce the time taken to execute the queries.
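If you still have to issue many requests from the browser, one mitigation (it won't get you past the per-domain cap, but it keeps failures and memory manageable) is to fire them in fixed-size batches. A rough sketch, assuming axios is imported and with an arbitrary batch size of 10:
// Issue requests in fixed-size batches so no single burst exceeds
// the browser's per-domain connection cap.
async function fetchInBatches(urls, batchSize = 10) {
  const results = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize).map((url) => axios.get(url));
    // allSettled keeps going even when some requests in a batch fail
    results.push(...(await Promise.allSettled(batch)));
  }
  return results;
}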
Please read through these resources for further information; they will be of great help:
Bulk Requests Implementation
Browser batch request ajax
Browser request limits
Limit solution
I would like to make 10,000 concurrent HTTP requests. I am currently doing it using Promise.all; however, I seem to be rate limited in some way, and it takes around 15-30 minutes to complete all 10,000 requests. Is there something in axios or in Node's HTTP stack that is limiting me? How can I raise the limit if there is one?
const axios = require('axios');

async function http_request(url) {
  const res = await axios.get(url);
  // -- DO STUFF
  return res;
}

async function many_requests(num_requests) {
  const all_promises = [];
  for (let i = 0; i < num_requests; i++) {
    const url = 'https://someurl.com/' + i;
    all_promises.push(http_request(url));
  }
  return Promise.all(all_promises);
}

async function run() {
  await many_requests(10000);
}

run();
In Node.js there are two types of threads: one Event Loop (aka the
main loop, main thread, event thread, etc.), and a pool of k Workers
in a Worker Pool (aka the threadpool).
...
The Worker Pool of Node.js is implemented in libuv (docs), which
exposes a general task submission API.
The event loop runs in a single thread and pushes tasks to the pool of k Workers, and those workers run in parallel. The default number of workers in the pool is 4; you can set more.
source
libuv
The default UV_THREADPOOL_SIZE is 4. You can raise UV_THREADPOOL_SIZE as described in the link below; its upper limit depends on the OS, so you need to check yours:
set UV_THREADPOOL_SIZE
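For example, a minimal sketch (the value 16 is an arbitrary assumption; libuv reads the variable when the threadpool is first used, so it must be set before that point, ideally via the environment):
// Raise the libuv threadpool size before any threadpool work is scheduled.
// The default is 4; the effective maximum depends on the OS/libuv build.
process.env.UV_THREADPOOL_SIZE = 16;

// Equivalent, from the shell:
//   UV_THREADPOOL_SIZE=16 node app.js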
Going through the axios docs, I'm trying to figure out whether the axios.all construct for making concurrent requests works when the requests are not all GET requests, i.e. can it work with a mix of GET and POST requests?
You can perform multiple requests with your Axios client and use Promise.all to wait on the results of all of them; Promise.all doesn't care which HTTP methods produced the promises, so mixing GET and POST is fine.
Here's an example using JavaScript:
const promiseGet = axios.get('/user?ID=12345')
const promisePost = axios.post('/user', { some: 'data' })

// Promise.all takes an array of promises, not separate arguments
const [responseGet, responsePost] = await Promise.all([promiseGet, promisePost])
Note that you should handle possible errors (error handling is not shown in the example above!).
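A sketch of one way to do that is with Promise.allSettled, which never rejects and lets each outcome be inspected individually (an alternative to the plain Promise.all above):
const outcomes = await Promise.allSettled([promiseGet, promisePost])
outcomes.forEach((outcome) => {
  if (outcome.status === 'fulfilled') {
    // outcome.value is the Axios response
    console.log(outcome.value.data)
  } else {
    // outcome.reason is the error; the other request still settles
    console.error('Request failed:', outcome.reason.message)
  }
})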
I have two API URLs to call. The first one is https://jsonplaceholder.typicode.com/todos; I call it first to retrieve the ids. After retrieving them, I call the second URL, https://jsonplaceholder.typicode.com/todos/(id), for each id. I am using a promise-based approach, but my problem is:
How can I retrieve this large amount of data quickly?
Note: I am using only plain JavaScript and the CDN build of axios.
export const getData = () => {
  const API = `https://jsonplaceholder.typicode.com/todos`;
  return axios.get(API, {
    headers: {
      "accept": "application/json;odata=verbose"
    }
  }).then(res => {
    const data = [];
    // The todo list is on res.data, not on the response object itself
    const requests = res.data.map(val => {
      const id = val.id;
      const url = `https://jsonplaceholder.typicode.com/todos/${id}`;
      return axios.get(url).then(res => {
        data.push({ Result: res.data });
      });
    });
    return Promise.all(requests).then(() => {
      return data;
    });
  });
}
This code works, but getting the data is slow, and I need some suggestions for the best approach.
The fastest way would be to not perform all the AJAX calls from a web browser.
Web browsers cap the number of simultaneous requests around 6–10 per domain (from this post), so even if you can perform a request and get a response in 200ms, you're still looking at a full minute of client-side requests.
If instead you built a server-side solution to aggregate the data, you could query your custom endpoint to retrieve larger chunks of data at a time.
If that isn't an option for you, then either way, the browser request limit will probably be your bottleneck.
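For illustration, here is a rough sketch of such an aggregation endpoint; Express is an assumption here, and /todos-batch is an invented route name:
const express = require('express');
const axios = require('axios');
const app = express();

// One browser round trip fetches many upstream records at once;
// error handling is omitted for brevity.
app.get('/todos-batch', async (req, res) => {
  const ids = String(req.query.ids || '').split(',').filter(Boolean);
  const results = await Promise.all(
    ids.map((id) =>
      axios.get(`https://jsonplaceholder.typicode.com/todos/${id}`).then((r) => r.data)
    )
  );
  res.json(results);
});

app.listen(3000);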
In short, I've run into an issue where multiple parallel GET requests to my Node.js server cause the server to get "clogged up" and hang, thus resulting in timeouts for the clients (503, service unavailable).
After a lot of performance analysis, I've realized it's a CPU issue. The specific request (we'll call it GET /foo) queries data from multiple services over HTTP, and then does a lot of computation, and returns the results to the client, like this:
Client request GET /foo
/foo controller queries data over HTTP from multiple other services
/foo controller then does a bunch of iterations over the data to compile some output for the client
Step 3 takes around 2 seconds to complete. However, if I send 2 requests in parallel to /foo, each client will receive their response in about 4 seconds. When I run the app in a cluster using more cores, the requests run much faster, but not quite what I want.
Seems like I have several options here:
pre-compute the response (ideally would like to avoid this for now, since it will require a whole "cache invalidation" scheme), or
/foo sends the CPU-blocking computation asynchronously to another process (using Heroku, so that would be another dyno), and then I can use a websocket or something to push the results to the client (again, very complex for my situation), or
somehow yield to a child process in the request and return the results to the client
Would love to do something like option 3. Something like this:
get('/foo', function* (request) {
  // I/O, so not blocking the event loop (I think)
  let data = yield getData(request)
  // make this happen in a different process
  let response = yield doSomeHeavyProcessing(data)
  return response
})
I've omitted a lot of implementation details above, but if it's necessary to know, I'm using Koa and Node.js 6.
Ideally, doSomeHeavyProcessing would do the CPU-intensive computation in some separate process, and when it's done, still send the results back in a "synchronous" fashion to the request client.
I've been trying to wrap my head around child processes, web workers, fibers, etc., and have been doing some basic "hello worlds" with these to get them to do basically the above, but to no avail. I can post more details if necessary.
Here are some approaches that you can try:
1.
Split the blocking computation into small chunks and use setImmediate to place the next chunk of work at the end of the event queue. The computation is then no longer blocking, and other requests can be processed in between chunks; for example:
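(A rough sketch of the idea; processAll, items, and the chunk size of 1000 are illustrative names and values, not from the question.)
// Process items in fixed-size chunks, yielding to the event loop between
// chunks so other requests can be served while the computation runs.
function processAll(items, done) {
  let i = 0;
  function processChunk() {
    const end = Math.min(i + 1000, items.length);
    for (; i < end; i++) {
      // ... per-item work goes here ...
    }
    if (i < items.length) {
      setImmediate(processChunk); // queue the next chunk behind pending I/O
    } else {
      done();
    }
  }
  processChunk();
}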
2.
Microsoft recently released napajs. As stated in their README
As it evolves, we find it useful to complement Node.js in CPU-bound tasks, with the capability of executing JavaScript in multiple V8 isolates and communicating between them.
I haven't tried it, but it looks very promising:
var napa = require('napajs');
var zone1 = napa.zone.create('zone1', { workers: 4 });

get('/foo', function* (request) {
  let data = yield getData(request)
  let response = yield zone1.execute(doSomeHeavyProcessing, [data])
  return response
})
3. If nothing of the above is enough and you need to spread the load across multiple machines, then you probably can't avoid using some sort of message queue to distribute work to different servers. In that case, check out ZeroMQ. It is extremely easy to use from Node, and you can implement any kind of distributed messaging pattern with it.
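A minimal push/pull sketch using the classic callback-style API of the zeromq package (the port and the message shape are arbitrary assumptions; both ends are shown in one snippet for brevity, though in practice they would run on different machines):
const zmq = require('zeromq');

// Producer side (the web server): push jobs to any connected workers
const sender = zmq.socket('push');
sender.bindSync('tcp://127.0.0.1:5555');
sender.send(JSON.stringify({ id: 1, data: 'mydata' }));

// Consumer side (a worker): pull jobs and do the heavy lifting
const receiver = zmq.socket('pull');
receiver.connect('tcp://127.0.0.1:5555');
receiver.on('message', (msg) => {
  const job = JSON.parse(msg.toString());
  // ... do the heavy work here ...
});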
You could utilize a child process, with an additional wrapper for convenience.
worker.js - this module will run in a separate process and will do the heavy work
const crypto = require('crypto');

function doHeavyWork(data) {
  return crypto.pbkdf2Sync(data, 'salt', 100000, 64, 'sha512');
}

process.on('message', (message) => {
  const result = doHeavyWork(message.data);
  process.send({ id: message.id, result });
});
client.js - a convenience (but primitive) wrapper for Child process
const cp = require('child_process');

let worker;
const resolves = new Map();

module.exports = {
  init(moduleName, errorCallback) {
    worker = cp.fork(moduleName);
    worker.on('error', errorCallback);
    worker.on('message', (message) => {
      const resolve = resolves.get(message.id);
      resolves.delete(message.id);
      if (!resolve) {
        errorCallback(new Error(`Got response from worker with unknown id: ${message.id}`));
        return;
      }
      resolve(message.result);
    });
    console.log(`Service PID: ${process.pid}, Worker PID: ${worker.pid}`);
  },
  doHeavyWorkRemotly(data) {
    const id = `${Date.now()}${Math.random()}`;
    return new Promise((resolve) => {
      worker.send({ id, data });
      resolves.set(id, resolve);
    });
  }
}
I use fork() to utilize an additional communication channel as it is stated in the docs.
I also keep a record of all requests submitted to the worker process (const resolves = new Map();) and resolve each Promise (resolve(message.result);) only when the worker process returns the response for that specific request (const resolve = resolves.get(message.id);).
run.js - a startup module, it utilizes co to 'execute' generators.
const co = require('co');
const client = require('./client');
function errorCallback(error) {
console.log('Got an unexpected error!');
console.log(error);
}
client.init('./worker.js', errorCallback);
function* run() {
  while (true) {
    yield client.doHeavyWorkRemotly('mydata');
  }
}
co(run);
To test it, simply run node run.js; it will print
Service PID: XXXX, Worker PID: XXXX
Then take a look at CPU utilization: the worker process will probably take around 100% of CPU, while the service process will be quite idle.
I have a Node.js HTTP server that is meant to receive uploads from multiple clients and process them separately.
My problem is that I've verified that the first request blocks the reception of any other request until it has been served.
This is the code I've tested:
var http = require('http');
http.globalAgent.maxSockets = 200;
var url = require('url');
var instance = require('./build/Release/ret');

http.createServer(function (req, res) {
  var path = url.parse(req.url).pathname;
  console.log("<req>" + path + "</req>");
  switch (path) {
    case '/test':
      var body = [];
      req.on('data', function (chunk) {
        body.push(chunk);
      });
      req.on('end', function () {
        body = Buffer.concat(body);
        console.log("---req received---");
        console.log(Date.now());
        console.log("------------------");
        instance.get(function (result) {
          postHTTP(result, res);
        });
      });
      break;
  }
}).listen(9999);
This is the native side (omitting obvious stuff) where getInfo is the exported method:
std::string ret2() {
    sleep(1);
    return string("{\"image\":\"1.JPG\"}");
}

Handle<Value> getInfo(const Arguments &args) {
    HandleScope scope;
    if (args.Length() == 0 || !args[0]->IsFunction())
        return ThrowException(Exception::Error(String::New("Error")));
    Persistent<Function> fn = Persistent<Function>::New(Handle<Function>::Cast(args[0]));
    Local<Value> objRet[1] = {
        String::New(ret2().c_str())
    };
    Handle<Value> ret = fn->Call(Context::GetCurrent()->Global(), 1, objRet);
    return scope.Close(Undefined());
}
I'm testing this with 3 parallel curl requests:
for i in {1..3}; do time curl --request POST --data-binary "@/home/user/Pictures/129762.jpg" http://192.160.0.1:9999/test & done
This is the output from the server:
<req>/test</req>
---req received---
1397569891165
------------------
<req>/test</req>
---req received---
1397569892175
------------------
<req>/test</req>
---req received---
1397569893181
------------------
These are the responses and timings from the client:
"1.JPG"
real 0m1.024s
user 0m0.004s
sys 0m0.009s
"1.JPG"
real 0m2.033s
user 0m0.000s
sys 0m0.012s
"1.JPG"
real 0m3.036s
user 0m0.013s
sys 0m0.001s
Apparently, each request is received only after the previous one has been served. The sleep(1) simulates a synchronous operation that requires about 1s to complete and can't be changed.
The client receives the responses with an incremental delay of ~1s.
I would like to achieve some kind of parallelism, although I'm aware I'm in a single-threaded environment such as Node.js. What I would like to achieve is receiving all 3 answers in ~1s.
Thanks in advance for your help.
This:
for(var i=0;i<1000000000;i++) var a=a+i;
is a pretty severe blocking operation: as soon as this loop starts, your whole server hangs until it is done. I'm interested in why you are trying to do this.
Perhaps you are trying to simulate a delayed response?
setTimeout(function () {
  send404(res);
}, 3000);
Right now you are switching a non-flowing stream into flowing mode by attaching a data event handler, and subsequently loading the whole stream into memory. You probably don't want to do this.
You can use the stream in non-flowing mode, as illustrated below; this is useful if you want to send the data to some place that is only accessible after some other event.
However, using the stream in flowing mode is the fastest. If you want to write your own body parser, I suppose you might want to use flowing mode; it depends on your use case.
var body = [];
req.on('readable', function () {
  var chunk;
  // read() returns null once the internal buffer is drained
  while (null !== (chunk = req.read())) {
    body.push(chunk);
  }
});
Flowing and non-flowing mode are also known as v1 and v2 streams respectively, as the older streams used in Node only supported flowing mode.