Node.js - Sending a big object to child_process is slow - javascript

My use-case is as follows:
I make plenty of REST API calls from my Node server to public APIs. Sometimes the response is big and sometimes it's small. My use-case demands that I stringify the response JSON. I know a big JSON response is going to block my event loop. After some research I decided to use child_process.fork for parsing these responses, so that the other API calls need not wait. I tried sending a big 30 MB JSON file from my main process to the forked child_process. It takes so long for the child process to pick up and parse the JSON. The response I'm expecting from the child process is not huge; I just want to stringify it, get the length, and send that back to the main process.
I'm attaching the master and child code.
var moment = require('moment');
var fs = require('fs');
var child_process = require('child_process');
var request = require('request');

var start_time = moment.utc().valueOf();

request({url: 'http://localhost:9009/bigjson'}, function (err, resp, body) {
  if (!err && resp.statusCode == 200) {
    console.log('Body Length : ' + body.length);
    var ls = child_process.fork("response_handler.js", ['0']);
    ls.on('message', function (message) {
      console.log(moment.utc().valueOf() - start_time);
      console.log(message);
    });
    ls.on('close', function (code) {
      console.log('child process exited with code ' + code);
    });
    ls.on('error', function (err) {
      console.log('Error : ' + err);
    });
    ls.on('exit', function (code, signal) {
      console.log('Exit : code : ' + code + ' signal : ' + signal);
    });
    ls.send({content: body});
  }
});
response_handler.js
console.log("Process " + process.argv[2] + " at work ");
process.on('message', function (json) {
console.log('Before Parsing');
var x = JSON.stringify(json);
console.log('After Parsing');
process.send({msg: 'Sending message from the child. total size is' + x.length});
});
Is there a better way to achieve what I'm trying to do? On one hand I need the power of Node.js to make thousands of API calls per second, but sometimes I get a big JSON back which screws things up.

Your task is both IO-bound (fetching a 30 MB JSON), where Node's asynchronicity shines, and CPU-bound (parsing a 30 MB JSON), where asynchronicity doesn't help you.
Forking too many processes soon becomes a resource hog and degrades performance. For CPU-bound tasks you need just as many processes as you have cores and no more.
I would use one separate process to do the fetching and delegate parsing to N other processes, where N is (at most) the number of your CPU cores minus 1, and use some form of IPC for inter-process communication.
One choice is to use Node's Cluster module to orchestrate all of the above: https://nodejs.org/docs/latest/api/cluster.html
Using this module, you can have a master process create your worker processes upfront, and you don't need to worry about when to fork, how many processes to create, etc. IPC works as usual with process.send and process.on. So a possible workflow is:
Application startup: the master process creates a "fetcher" and N "parser" processes.
The fetcher is sent a work list of API endpoints to process and starts fetching, sending each JSON body back to the master process.
For every JSON fetched, the master sends it to a parser process. You could use them in round-robin fashion, or use a more sophisticated way of signalling to the master process when a parser's work queue is empty or running low.
The parser processes send the resulting JSON object back to the master.
Note that IPC also has non-trivial overhead, especially when sending/receiving large objects. You could even have the fetcher parse very small responses itself instead of passing them around, to avoid this. "Small" here is probably < 32KB.
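For illustration, here is a minimal sketch of that layout with the cluster module, with all roles in one file. The ROLE environment variable, the hard-coded work list, and the plain round-robin dispatch are assumptions made up for this example, not part of the question:

var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  // One fetcher plus (cores - 1) parsers, all running this same file.
  var numParsers = Math.max(1, os.cpus().length - 1);
  var parsers = [];
  for (var i = 0; i < numParsers; i++) {
    parsers.push(cluster.fork({ROLE: 'parser'}));
  }
  var fetcher = cluster.fork({ROLE: 'fetcher'});

  var next = 0;
  fetcher.on('message', function (msg) {
    // Round-robin the raw JSON text to a parser process.
    parsers[next].send({body: msg.body});
    next = (next + 1) % parsers.length;
  });
  parsers.forEach(function (p) {
    p.on('message', function (msg) {
      console.log('parsed response, stringified length: ' + msg.length);
    });
  });

  // Hypothetical work list for the fetcher.
  fetcher.send({urls: ['http://localhost:9009/bigjson']});
} else if (process.env.ROLE === 'fetcher') {
  var request = require('request');
  process.on('message', function (msg) {
    msg.urls.forEach(function (url) {
      request({url: url}, function (err, resp, body) {
        if (!err && resp.statusCode == 200) process.send({body: body});
      });
    });
  });
} else {
  // Parser: the CPU-bound work happens here, off the master's event loop.
  process.on('message', function (msg) {
    var parsed = JSON.parse(msg.body);
    process.send({length: JSON.stringify(parsed).length});
  });
}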
See also: Is it expensive/efficient to send data between processes in Node?

Related

WebSocket needs browser refresh to update list

My project works as intended except that I have to refresh the browser every time my keyword list sends something for it to display. I assume it's down to my inexperience with Express.js and not creating the route correctly within my websocket? Any help would be appreciated.
Browser
let socket = new WebSocket("ws://localhost:3000");

socket.addEventListener('open', function (event) {
  console.log('Connected to WS server');
  socket.send('Hello Server!');
});

socket.addEventListener('message', function (e) {
  const keywordsList = JSON.parse(e.data);
  console.log("Received: '" + e.data + "'");
  document.getElementById("keywordsList").innerHTML = e.data;
});

socket.onclose = function (code, reason) {
  console.log(code, reason, 'disconnected');
};

socket.onerror = error => {
  console.error('failed to connect', error);
};
Server
const ws = require('ws');
const express = require('express');
const keywordsList = require('./app');

const app = express();
const port = 3000;

const wsServer = new ws.Server({ noServer: true });
wsServer.on('connection', function connection(socket) {
  socket.send(JSON.stringify(keywordsList));
  socket.on('message', message => console.log(message));
});

// `server` is a vanilla Node.js HTTP server, so use
// the same ws upgrade process described here:
// https://www.npmjs.com/package/ws#multiple-servers-sharing-a-single-https-server
const server = app.listen(3000);
server.on('upgrade', (request, socket, head) => {
  wsServer.handleUpgrade(request, socket, head, socket => {
    wsServer.emit('connection', socket, request);
  });
});
In answer to "How to Send and/or Stream array data that is being continually updated to a client" as arrived at in comment.
A possible solution using WebSockets may be to
Create an interface on the server for array updates (if you haven't already) that isolates the array object from arbitrary outside modification and supports a callback when updates are made.
Determine the latency allowed for multiple updates to occur without being pushed. The latency should allow reasonable time for previous network traffic to complete without overloading bandwidth unnecessarily.
When an array update occurs, start a timer for the latency period if one is not already running.
On timer expiry, JSON.stringify the array (to take a snapshot), clear the timer-running status, and message the client with the JSON text.
A slightly more complicated method to avoid delaying all push operations would be to immediately push single updates unless they occur within a guard period after the most recent push operation. A timer could then push modifications made during the guard period at the end of the guard period.
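A rough sketch of that timer-based batching, assuming the server code from the question; the 250 ms window, the onKeywordsUpdated hook, and the broadcast helper (sketched under Broadcasting below) are placeholders for this example:

const LATENCY_MS = 250;   // assumed latency window
let pushTimer = null;

// Call this from wherever the keyword array is actually modified.
function onKeywordsUpdated() {
  if (pushTimer) return;                       // a push is already scheduled
  pushTimer = setTimeout(() => {
    pushTimer = null;
    broadcast(JSON.stringify(keywordsList));   // snapshot of the current array
  }, LATENCY_MS);
}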
Broadcasting
The WebSockets API does not directly support broadcasting the same data to multiple clients. Refer to Server Broadcast in the ws documentation for an example of sending data to all connected clients using a forEach loop.
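Along those lines, a broadcast helper for the server above might look like this (wsServer and the ws require come from the question's server code):

function broadcast(data) {
  // Send the same payload to every connected, open client.
  wsServer.clients.forEach(client => {
    if (client.readyState === ws.OPEN) {
      client.send(data);
    }
  });
}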
Client side listener
In the client-side message listener
document.getElementById("keywordsList").innerHTML = e.data;
would be better as
document.getElementById("keywordsList").textContent = keywordList;
to both present keywords after decoding from JSON and prevent them ever being treated as HTML.
So I finally figured out what I wanted to accomplish. It sounds straightforward now that I have learned enough and thought about how to structure the back end of my project.
If you have two websockets running and one needs information from the other, you cannot run them side by side. You need to have one encapsulate the other and then call the websocket INSIDE of the other websocket. This can easily cause problems down the road for other projects, since now you have one websocket that won't fire until the other has run, but for my project it makes perfect sense since it is locally run and needs all the parts working 100 percent in order to be effective. It took me a long time to understand how to structure the code that way.

Running JavaScript with a web request?

BACK STORY:
Let me start from my problem: I need to update a Firebase database from an Arduino, so I used the firebase-arduino library, but for some reason it will not compile for the NodeMCU. So my next approach is a bit convoluted: I created a JavaScript page that updates Firebase. I just need to add 1 to the database, so I don't need to update a sensor value or anything; if I load the webpage it will update the value. I thought it would be triggered by an HTTP request from the Arduino, but I was wrong, it does not work like that.
QUESTION: How do I run the JavaScript in a webpage with a web request from the Arduino?
Assuming you have Node.js installed, you can have something like this (source):
const https = require('https');

https.get('your_url_here', (resp) => {
  let data = '';

  // A chunk of data has been received.
  resp.on('data', (chunk) => {
    data += chunk;
  });

  // The whole response has been received. Print out the result.
  resp.on('end', () => {
    console.log(JSON.parse(data).explanation);
  });
}).on("error", (err) => {
  console.log("Error: " + err.message);
});
But if you don't have Node.js installed, you might create the HTTP request with shell commands like curl. This can be useful since you can make it run as a daemon (running in the background every X minutes).
Let me know if you managed something, good luck.

jQuery on electron main process

How can I use jQuery in the Electron main process?
It seems every example I find is for the renderer process.
For example, I want to create a util used by the main process that will fetch data from an API using get.
But using $.get raises an error that get is not a function.
Thanks.
jQuery is a JS library for the browser, e.g. DOM manipulation, etc. You shouldn't use it in the main process, since the main process runs in Node.js.
It's hard to propose a solution without knowing more about your application. If you need the data from the AJAX request in your main process, you can use the Node.js https package. Example from the Twilio blog:
const https = require('https');

https.get('https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY', (resp) => {
  let data = '';

  // A chunk of data has been received.
  resp.on('data', (chunk) => {
    data += chunk;
  });

  // The whole response has been received. Print out the result.
  resp.on('end', () => {
    console.log(JSON.parse(data).explanation);
  });
}).on("error", (err) => {
  console.log("Error: " + err.message);
});
Edit:
As #Hans-Koch mentioned, you probably shouldn't use jQuery in the renderer process either, since one of its main purposes is to normalize the API for DOM manipulation, AJAX, etc., and in Electron you only have to support Chromium. If you want to make an AJAX request you can use XMLHttpRequest or some npm package which wraps it, e.g. xhr.
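For example, a plain XMLHttpRequest in the renderer process could replace the $.get call; the URL here is just a placeholder:

// Renderer process: plain XMLHttpRequest instead of $.get.
const xhr = new XMLHttpRequest();
xhr.open('GET', 'https://example.com/api/data');   // placeholder URL
xhr.responseType = 'json';
xhr.onload = () => {
  if (xhr.status === 200) {
    console.log(xhr.response);   // already parsed because responseType is 'json'
  }
};
xhr.onerror = () => console.error('request failed');
xhr.send();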

Can I yield to a child process and return the response in Node.js?

In short, I've run into an issue where multiple parallel GET requests to my Node.js server cause the server to get "clogged up" and hang, thus resulting in timeouts for the clients (503, service unavailable).
After a lot of performance analysis, I've realized it's a CPU issue. The specific request (we'll call it GET /foo) queries data from multiple services over HTTP, and then does a lot of computation, and returns the results to the client, like this:
Client request GET /foo
/foo controller queries data over HTTP from multiple other services
/foo controller then does a bunch of iterations over the data to compile some output for the client
Step 3 takes around 2 seconds to complete. However, if I send 2 requests in parallel to /foo, each client will receive their response in about 4 seconds. When I run the app in a cluster using more cores, the requests run much faster, but not quite what I want.
Seems like I have several options here:
pre-compute the response (ideally would like to avoid this for now, since it will require a whole "cache invalidation" scheme), or
/foo sends the CPU-blocking computation asynchronously to another process (using Heroku, so that would be another dyno), and then I can use a websocket or something to push the results to the client (again, very complex for my situation), or
somehow yield to a child process in the request and return the results to the client
Would love to do something like option 3. Something like this:
get('/foo', function* (request) {
  // I/O, so not blocking the event loop (I think)
  let data = yield getData(request)

  // make this happen in a different process
  let response = yield doSomeHeavyProcessing(data)

  return response
})
I've omitted a lot of implementation details above, but if it's necessary to know, I'm using Koa and Node.js 6.
Ideally, doSomeHeavyProcessing would do the CPU-intensive computation in some separate process, and when it's done, still send the results back in a "synchronous" fashion to the request client.
Been trying to wrap my head around child processes, web workers, fibers, etc., and have been doing some basic "hello worlds" with these to get them to do basically the above, but to no avail. Can post more details if necessary.
Here are some approaches that you can try:
1.
Split the blocking computation into small chunks and use setImmediate to place the next chunk of work at the end of the event queue. That way the computation is no longer blocking and other requests can be processed. (A minimal sketch of this chunking appears after these numbered options.)
2.
Microsoft recently released napajs. As stated in their README
As it evolves, we find it useful to complement Node.js in CPU-bound tasks, with the capability of executing JavaScript in multiple V8 isolates and communicating between them.
I haven't tried it, but it looks very promising:
var napa = require('napajs');
var zone1 = napa.zone.create('zone1', { workers: 4 });

get('/foo', function* (request) {
  let data = yield getData(request)
  let response = yield zone1.execute(doSomeHeavyProcessing, [data])
  return response
})
3. If none of the above is enough and you need to spread the load across multiple machines, then you probably can't avoid using some sort of message queue to distribute work to different servers. In this case check out ZeroMQ. It is extremely easy to use from Node, and you can implement any kind of distributed messaging pattern with it.
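As referenced in option 1, here is a minimal sketch of chunking a blocking computation with setImmediate; the array input, the chunk size, and the per-item work are made-up placeholders:

// Sketch: process a large array without blocking the event loop.
function processInChunks(items, onDone) {
  const CHUNK_SIZE = 1000;        // arbitrary for illustration
  let index = 0;
  let total = 0;

  function doChunk() {
    const end = Math.min(index + CHUNK_SIZE, items.length);
    for (; index < end; index++) {
      total += items[index];      // stand-in for the real per-item computation
    }
    if (index < items.length) {
      setImmediate(doChunk);      // yield to the event loop, then continue
    } else {
      onDone(total);
    }
  }

  doChunk();
}

Between chunks the event loop gets a chance to handle pending I/O and other requests, which is exactly what a long synchronous loop prevents.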
You could utilize a child process with an additional wrapper for convenience.
worker.js - this module will run in a separate process and will do the heavy work
const crypto = require('crypto');

function doHeavyWork(data) {
  return crypto.pbkdf2Sync(data, 'salt', 100000, 64, 'sha512');
}

process.on('message', (message) => {
  const result = doHeavyWork(message.data);
  process.send({ id: message.id, result });
});
client.js - a convenience (but primitive) wrapper for Child process
const cp = require('child_process');

let worker;
const resolves = new Map();

module.exports = {
  init(moduleName, errorCallback) {
    worker = cp.fork(moduleName);
    worker.on('error', errorCallback);
    worker.on('message', (message) => {
      const resolve = resolves.get(message.id);
      resolves.delete(message.id);
      if (!resolve) {
        errorCallback(new Error(`Got response from worker with unknown id: ${message.id}`));
        return;
      }
      resolve(message.result);
    });
    console.log(`Service PID: ${process.pid}, Worker PID: ${worker.pid}`);
  },
  doHeavyWorkRemotly(data) {
    const id = `${Date.now()}${Math.random()}`;
    return new Promise((resolve) => {
      worker.send({ id, data });
      resolves.set(id, resolve);
    });
  }
};
I use fork() to utilize an additional communication channel as it is stated in the docs.
Also, I keep a record of all requests submitted to the worker process (const resolves = new Map();) and resolve the Promises (resolve(message.result);) only when the worker process returns the response for the specific request (const resolve = resolves.get(message.id);).
run.js - a startup module, it utilizes co to 'execute' generators.
const co = require('co');
const client = require('./client');

function errorCallback(error) {
  console.log('Got an unexpected error!');
  console.log(error);
}

client.init('./worker.js', errorCallback);

function* run() {
  while (true) {
    yield client.doHeavyWorkRemotly('mydata');
  }
}

co(run);
To test it, simply run node run.js; it will print
Service PID: XXXX, Worker PID: XXXX
and then take a look at the CPU utilization: the worker process will probably take around 100% of a CPU while the service stays quite idle.

NodeJS HTTP server stalled on V8 execution

EDITED
I have a nodeJS http server that is meant for receiving uploads from multiple clients and processing them separately.
My problem is that, as I've verified, the first request blocks the reception of any other request until it has been served.
This is the code I've tested:
var http = require('http');
http.globalAgent.maxSockets = 200;
var url = require('url');
var instance = require('./build/Release/ret');

http.createServer(function (req, res) {
  var path = url.parse(req.url).pathname;
  console.log("<req>" + path + "</req>");
  switch (path) {
    case ('/test'):
      var body = [];
      req.on('data', function (chunk) {
        body.push(chunk);
      });
      req.on('end', function () {
        body = Buffer.concat(body);
        console.log("---req received---");
        console.log(Date.now());
        console.log("------------------");
        instance.get(function (result) {
          postHTTP(result, res);
        });
      });
      break;
  }
}).listen(9999);
This is the native side (omitting obvious stuff) where getInfo is the exported method:
std::string ret2() {
  sleep(1);
  return string("{\"image\":\"1.JPG\"}");
}

Handle<Value> getInfo(const Arguments &args) {
  HandleScope scope;

  if (args.Length() == 0 || !args[0]->IsFunction())
    return ThrowException(Exception::Error(String::New("Error")));

  Persistent<Function> fn = Persistent<Function>::New(Handle<Function>::Cast(args[0]));

  Local<Value> objRet[1] = {
    String::New(ret2().c_str())
  };

  Handle<Value> ret = fn->Call(Context::GetCurrent()->Global(), 1, objRet);

  return scope.Close(Undefined());
}
I'm testing this with 3 parallel curl requests:
for i in {1..3}; do time curl --request POST --data-binary "#/home/user/Pictures/129762.jpg" http://192.160.0.1:9999/test & done
This is the output from the server:
<req>/test</req>
---req received---
1397569891165
------------------
<req>/test</req>
---req received---
1397569892175
------------------
<req>/test</req>
---req received---
1397569893181
------------------
These are the responses and timings from the client:
"1.JPG"
real 0m1.024s
user 0m0.004s
sys 0m0.009s
"1.JPG"
real 0m2.033s
user 0m0.000s
sys 0m0.012s
"1.JPG"
real 0m3.036s
user 0m0.013s
sys 0m0.001s
Apparently each request is received only after the previous one has been served. The sleep(1) simulates a synchronous operation that requires about 1s to complete and can't be changed.
The client receives the responses with an incremental delay of ~1s.
I would like to achieve a kind of parallelism, although I'm aware I'm in a single-threaded environment such as Node.js. What I would like to achieve is receiving all 3 answers in ~1s.
Thanks in advance for your help.
This:
for (var i = 0; i < 1000000000; i++) var a = a + i;
is a pretty severe blocking operation. As soon as the first block ends, your whole server hangs until this for loop is done. I'm interested in why you are trying to do this.
Perhaps you are trying to simulate a delayed response?
setTimeout(function () {
  send404(res);
}, 3000);
Right now you are turning a non-flowing stream into flowing mode by attaching a data event handler, and subsequently loading the whole stream into memory. You probably don't want to do this.
You can use the stream in non-flowing mode as illustrated below; this is useful if you want to send the data to some place that is only accessible after some other event.
However, using the stream in flowing mode is the fastest. If you want to write your own body parser I suppose you might want to use flowing mode, it depends on your use case.
var body = [];
req.on('readable', function () {
  var chunk;
  while (null !== (chunk = req.read())) {
    body.push(chunk);
  }
});
Flowing and non-flowing mode are also known as v1 and v2 streams respectively, as the older streams used in Node only supported flowing mode.
