avoiding simultaneous execution of shell command with node.js & shelljs

I am using Node.js 8.12 on GNU/Linux (CentOS 7), with the built-in web server, require('https'), for a simple application.
I understand that Node.js is single-threaded (single process) and that there is no actual parallel execution of JavaScript code. As I understand it, the http/https server processes one request at a time: it runs the handler through all of its synchronous statements and sets up any asynchronous operations to be executed later before it goes back to process a subsequent request. However, with the http/https libraries, the body of the request is assembled asynchronously, so there is already one callback that is executed when the body is ready (the 'end' event). This makes me think it might be possible to be in the middle of processing two or more requests at the same time.
As part of handling the requests, I need to execute a string of shell commands, and I use the shelljs.exec library to do that. It runs synchronously, waiting until the commands complete before returning. Example code would look like this:
const shelljs_exec = require('shelljs.exec');
function process() {
  // bunch of shell commands in string
  var command_str = 'command1; command2; command3';
  var exec_results = shelljs_exec(command_str);
  console.log('just executed shelljs_exec command');
  var proc_results = process_results(exec_results);
  console.log(proc_results);
  // and return the response...
}
So node.js runs the shelljs_exec() and waits for completion. While it's waiting, can another request be worked on, such that there is a risk, slight, of two or more shelljs.exec invocations running at the same time? Since that could be a problem, I need to ensure only one shelljs.exec statement can be in progress at a given time.
If that is not a correct understanding, then I was thinking I need to do something with mutex locks. Like this:
const shelljs_exec = require('shelljs.exec');
const locks = require('locks');
// Need this in global scope - so we are all dealing with the same one.
var mutex = locks.createMutex();
function await_lock(shell_commands) {
  var commands = shell_commands;
  return new Promise(getting_lock => {
    mutex.lock(got_lock_and_execute);
  });
  function got_lock_and_execute() {
    var exec_results = shelljs_exec(commands);
    console.log('just executed shelljs_exec command');
    mutex.unlock();
    return exec_results;
  }
}
async function process() {
  // bunch of shell commands in string
  var command_str = 'command1; command2; command3';
  exec_results = await await_lock(command_str);
  var proc_results = process_results(exec_results);
  console.log(proc_results);
}

If shelljs_exec is synchronous, there is no need: it blocks the event loop until the command finishes, so no other request handler can run in the meantime.
If it is not synchronous and takes a callback, wrap it in a Promise constructor so that it can be awaited. I would suggest properly wrapping mutex.lock in a promise that resolves when the lock is acquired. The try/finally is needed to ensure that the mutex is unlocked if shelljs_exec throws an exception.
async function await_lock(shell_commands) {
  // wait until the mutex has been acquired
  await (new Promise(function(resolve, reject) {
    mutex.lock(resolve);
  }));
  try {
    let exec_results = await shelljs_exec(shell_commands);
    return exec_results;
  } finally {
    mutex.unlock();
  }
}
Untested. But it looks like it should work.
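For what it's worth, here is a minimal sketch of the same idea without the locks package, assuming the commands are run with Node's built-in asynchronous child_process.exec rather than shelljs.exec. The names runExclusive, execExclusive and tail are illustrative and not part of any library. Each task is chained onto the previous one, so only one runs at a time even if several requests arrive together.

```javascript
const util = require('util');
const exec = util.promisify(require('child_process').exec);

// Tail of the queue: settles when the most recently queued task has finished.
let tail = Promise.resolve();

// Hypothetical helper: run `task` (a function returning a promise)
// only after every previously queued task has finished.
function runExclusive(task) {
  const result = tail.then(task);
  // Swallow errors on the chain itself so a failed task does not block later ones;
  // the caller still sees the rejection through `result`.
  tail = result.catch(() => {});
  return result;
}

// Hypothetical wrapper: serialize shell commands through the queue.
function execExclusive(command) {
  return runExclusive(() => exec(command));
}
```

A request handler would then call something like `const { stdout } = await execExclusive('command1; command2; command3');`. Error handling and timeouts are left out of this sketch.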

Related

Is it impossible to create a reliable async singleton pattern in JavaScript?

This is the function that I have:
let counter = 0;
let dbConnected = false;
async function notASingleton(params) {
  if (!dbConnected) {
    await new Promise(resolve => {
      if (Math.random() > 0.75) throw new Error();
      setTimeout((params) => {
        dbConnected = true; // assume we use params to connect to DB
        resolve();
      }, 1000);
    });
    return counter++
  }
};
// in another module that imports notASingleton
Promise.all([notASingleton(params), notASingleton(params), notASingleton(params), notASingleton(params)]);
or
// in another module that imports notASingleton
notASingleton(params);
notASingleton(params);
notASingleton(params);
notASingleton(params);
The problem is that the notASingleton promises might apparently be executed concurrently, and assuming they run in parallel, the execution context for all of them will be dbConnected = false.
Note: I'm aware that we could introduce a new variable e.g. initiatingDbConnection and instead of checking for !dbConnected check for !initiatingDbConnection; however, as long as concurrently means that the context of the promises will be the same inside Promise.all, that will not change anything.
The pattern can be properly implemented in e.g. Java by utilizing the contracts of JVM for creating a class: https://stackoverflow.com/a/16106598/12144949
However, even that Java implementation cannot be used for my use case where I need to pass a variable: "The client application can’t pass any argument, so we can’t reuse it. For example, having a generic singleton class for database connection where client application supplies database server properties."
https://www.journaldev.com/171/thread-safety-in-java-singleton-classes-with-example-code
Note 2: Another possibly related issue: https://eslint.org/docs/rules/require-atomic-updates#rule-details

"On some computers, they may be executed in parallel, or in some sense concurrently, while on others they may be executed serially."
That MDN description is rubbish. I'll remove it. Promise.all is not responsible for executing anything, and promises cannot be "executed" anyway. All it does is wait for its arguments, and you're the one who is creating those promises in the array. They would run (and concurrently open multiple connections) even if you omitted Promise.all and simply called notASingleton() multiple times.
the execution context for all of them will be dbConnected = false
Yes, but only because your dbConnected = true; is in a timeout and you are calling notASingleton() again before that happens. Not because Promise.all does anything special.
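One common way around this is to cache the promise itself rather than a boolean flag, so that every caller arriving before the connection is ready shares the same in-flight attempt. The following is only a sketch: connectToDb is a hypothetical stand-in for a real connection call, and connectCount exists purely to show how many attempts were started.

```javascript
let connectionPromise = null;
let connectCount = 0; // illustration only: counts how many connection attempts were started

// Hypothetical stand-in for a real database connect call.
function connectToDb(params) {
  connectCount += 1;
  return new Promise((resolve) => {
    setTimeout(() => resolve({ params }), 50);
  });
}

function getConnection(params) {
  if (!connectionPromise) {
    // Assigned synchronously, so later calls in the same tick already see it.
    connectionPromise = connectToDb(params).catch((err) => {
      connectionPromise = null; // allow a retry after a failed attempt
      throw err;
    });
  }
  return connectionPromise;
}
```

Because the assignment happens before any await, four back-to-back calls to getConnection(params) start only one connection attempt and all receive the same result.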

Keep order in nodejs command line script

There is something I want to code in nodejs, but I don't have any idea how to implement it. I've been reading and searching a lot, and I still have no idea what the correct way to do it would be.
The problem is the following:
Read lines from stdin
For each line, launch an http request
There must be a limit to the number of simultaneous http requests
Write the line read plus some data obtained from the http request to stdout
Lines must be written in order
You can not read "all" the file and then split lines: you must process one line at a time, remember it's stdin. You don't know when the input will end.
Does anybody have some clues of how to approach this problem? I do not have any idea of how to proceed.

You could do something like this:
const http = require('http');
const Promise = require('bluebird');
let arrayOfRequests = [];
process.stdin.on('data', (data) => {
  // get data from the stdin
  // then add the request to the array
  arrayOfRequests.push(http.get({}))
})
process.stdin.on('end', () => {
  Promise.all(arrayOfRequests)
    // use Promise.all to bundle all of the requests
    // then use the spread operator so you can use all of the requests in order
    .spread( (request1,request2,request3) => {
      // do stuff
    })
})
FYI, the snippet won't work as written: http.get returns a ClientRequest rather than a promise, so each request would first have to be wrapped in a promise.
So what you are doing is using process.stdin, which is built into Node.js, and bundling all of the requests. When the input ends, the collected requests are awaited together. Since the calls are async, you keep them in an array, then run Promise.all and use the bluebird .spread operator to destructure the result of Promise.all and get the values in order.
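A sketch of one way to meet all of the constraints from the question (line-by-line input, a cap on in-flight requests, ordered output) is a sliding window of promises kept in input order. processInOrder, worker and write are hypothetical names; worker stands for whatever function performs the http request for a line and returns a promise.

```javascript
// Read items from an async iterable, keep at most `limit` workers in flight,
// and hand results to `write` in the same order the items were read.
async function processInOrder(lines, limit, worker, write) {
  const pending = []; // promises for in-flight work, oldest first
  for await (const line of lines) {
    const p = Promise.resolve(worker(line));
    p.catch(() => {}); // mark as handled; the error is still rethrown by the await below
    pending.push(p);
    if (pending.length >= limit) {
      // Window is full: wait for the oldest result before reading another line.
      write(await pending.shift());
    }
  }
  // Input has ended: flush the remaining results, still in order.
  while (pending.length > 0) {
    write(await pending.shift());
  }
}

// Usage sketch (fetchInfo is a hypothetical function returning a promise):
// const readline = require('readline');
// const rl = readline.createInterface({ input: process.stdin });
// processInOrder(rl, 5, async (line) => `${line} ${await fetchInfo(line)}`, console.log);
```

The trade-off is that a slow request at the head of the window delays the start of new ones even when later requests have already finished; that is the price of writing the output in order with a small amount of bookkeeping.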

So far, I've got this solution for the producer-consumer problem in nodejs, where the producer doesn't produce more data until there is space available in the queue.
This is queue's code, based on block-queue: https://gist.github.com/AlvaroMaceda/8a933a317fed3fd4691fd5eda3e60c5e
To use the blocking queue, you create it with 3 parameters:
Number of tasks running concurrently
"Push" function. It will be called with the queue as parameter when more data is needed. The task will be added with an identifier.
"Task" function. It will be called with the identifier created by the "Push" function.
The queue will call "push" only when more data is needed. For example, if there are five tasks running and it was created with a maximum of 5 concurrent tasks, "push" won't be called until one of these tasks ends.
This is an example of how to use it:
"use strict";
const queue = require('./block-queue');
const CONCURRENCY = 5;
const WORKS_TO_LAUNCH = 10;
const TIMEOUT = 200;
let i = 1;
let q = queue(CONCURRENCY, doSomethingAsync, putMoreData );
function putMoreData(queue) {
  if (++i <= WORKS_TO_LAUNCH) {
    console.log('Pushing to queue');
    queue.push(i);
  }
}
function doSomethingAsync(task, done) {
  setTimeout(function() {
    console.log('done ' + task);
    done();
  }, 1000 + task * TIMEOUT);
}
q.push(i);
I am not marking this as solved because I don't know whether there is a simpler approach, and I still want to work out the complete solution; I may run into issues when combining this with streams.

In node.js, how to use child_process.exec so all can happen asynchronously?

I have a server built on node.js. Below is one of the request handler functions:
var exec = require("child_process").exec
function doIt(response) {
  //some trivial and fast code - can be ignored
  exec(
    "sleep 10", //run OS' sleep command, sleep for 10 seconds
    //sleeping(10), //commented out. run a local function, defined below.
    function(error, stdout, stderr) {
      response.writeHead(200, {"Content-Type": "text/plain"});
      response.write(stdout);
      response.end();
    });
  //some trivial and fast code - can be ignored
}
Meanwhile, in the same module file there is a local function "sleeping" defined, which as its name indicates will sleep for 10 seconds.
function sleeping(sec) {
  var begin = new Date().getTime();
  while (new Date().getTime() < begin + sec*1000); //just loop till timeup.
}
Here come three questions --
1) As we know, node.js is single-process, asynchronous and event-driven. Is it true that ALL functions with a callback argument are asynchronous? For example, if I have a function my_func(callback_func), which takes another function as an argument, are there any restrictions on callback_func, or anything else required, to make my_func asynchronous?
2) At least child_process.exec is asynchronous, with an anonymous callback function as an argument. Here I pass "sleep 10" as the first argument, to call the OS's sleep command and wait for 10 seconds. It won't block the whole node process, i.e. a request sent to another request handler won't be blocked for 10 seconds by the "doIt" handler. However, if another request is sent to the server immediately and should be handled by the same "doIt" handler, will it have to wait until the previous "doIt" request ends?
3) If I use the sleeping(10) function call (commented out) to replace "sleep 10", I find that it does block other requests until 10 seconds later. Could anyone explain the difference?
Thanks a bunch!
-- update per request --
One comment says this question seems to be a duplicate of another one (How to promisify Node's child_process.exec and child_process.execFile functions with Bluebird?) that was asked one year after this one. These are quite different: this one asks about asynchronous behavior in general, with a specific buggy case, while that one asks about the Promise object per se. Both the intent and the use cases differ.
(If by any chance they are similar, shouldn't the newer one be marked as a duplicate of the older one?)

First, you can promisify child_process.exec.
const util = require('util');
const exec = util.promisify(require('child_process').exec);
async function lsExample() {
  const { stdout, stderr } = await exec('ls');
  if (stderr) {
    // handle error
    console.log('stderr:', stderr);
  }
  console.log('stdout:', stdout);
}
lsExample()
As an async function, lsExample returns a promise.
Run all promises in parallel with Promise.all([]).
Promise.all([lsExample(), otherFunctionExample()]);
If you need to wait on the promises to finish in parallel, await them.
await Promise.all([aPromise(), bPromise()]);
If you need the values from those promises:
const [a, b] = await Promise.all([aPromise(), bPromise()]);

1) No. For example .forEach is synchronous:
var lst = [1,2,3];
console.log("start")
lst.forEach(function(el) {
  console.log(el);
});
console.log("end")
Whether a function is asynchronous or not depends purely on its implementation - there are no restrictions. You can't know it a priori (you have to either test it, know how it is implemented, or read and trust the documentation). What's more, depending on its arguments, a function can be asynchronous, synchronous, or both.
2) No. Each request will spawn a separate "sleep" process.
3) That's because your sleeping function is a total mess - it does not sleep at all. What it does is spin in a busy loop, checking the date (thus using 100% of the CPU). Since node.js is single-threaded and the loop is synchronous, it blocks the entire server. This is wrong, don't do this. Use setTimeout instead.
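To make points 1) and 3) concrete, here is a small sketch that records the order in which things happen. The events array and the sleep helper are illustrative names; sleep is a non-blocking replacement for the busy-wait sleeping function, built on setTimeout.

```javascript
const events = [];

// Synchronous callback: forEach invokes it before forEach itself returns.
[1, 2].forEach((n) => events.push('forEach ' + n));
events.push('after forEach');

// Asynchronous callback: setTimeout only schedules it; it runs once the
// currently executing code has finished and the event loop is free.
setTimeout(() => events.push('timeout fired'), 0);
events.push('after setTimeout');

// Non-blocking "sleep": the event loop keeps serving other work while waiting.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Resolves with the recorded events once both timers have fired.
const finished = sleep(10).then(() => events);
```

The forEach entries are recorded before 'after forEach', while 'timeout fired' is recorded only after 'after setTimeout', even with a delay of 0.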

JQuery / Javascript inline callback

In Tornado we have the gen module, which allows us to write constructs like this:
def my_async_function(callback):
    return callback(1)
@gen.engine
def get(self):
    result = yield gen.Task(my_async_function) #Calls async function and puts result into result variable
    self.write(result) #Outputs result
Is there similar syntactic sugar in jQuery or other JavaScript libraries?
I want something like this:
function get_remote_variable(name) {
  val = $.sweetget('/remote/url/', {}); //sweetget automatically gets callback which passes data to val
  return val
}

You describe the function as "my_async_function", but the way you use it is synchronous rather than asynchronous.
Your sample code requires blocking -- if "my_async_function" were truly asynchronous (non-blocking), the following line of code self.write(result) would execute immediately after you called "my_async_function". If the function takes any time at all (and I mean any) to come back with a value, what would self.write(result) end up writing? That is, if self.write(result) is executed before result ever has a value, you don't get expected results. Thus, "my_async_function" must block, it must wait for the return value before going forward, thus it is not asynchronous.
On to your question specifically, $.sweetget('/remote/url/', {}): In order to accomplish that, you would have to be able to block until the ajax request (which is inherently asynchronous -- it puts the first A in AJAX) comes back with something.
You can hack a synchronous call by delaying the return of sweetget until the XHR state has changed, but you'd be using a while loop (or something similar) and risk blocking the browser UI thread or setting off one of those "this script is taking too long" warnings. Javascript does not offer threading control. You cannot specify that your current thread is waiting, so go ahead and do something else for a minute. You could contend with that, too, by manually testing for a timeout threshold.
By now one should be starting to wonder: why not just use a callback? No matter how you slice it, Javascript is single-threaded. No sleep, no thread.sleep. That means that any synchronous code will block the UI.
Here, I've mocked up what sweetget would, roughly, look like. As you can see, your browser thread will lock up as soon as execution enters that tight while loop. Indeed, on my computer the ajax request won't even fire until the browser displays the unresponsive script dialog.
// warning: this code WILL lock your browser!
var sweetget = function (url, time_out) {
  var completed = false;
  var result = null;
  var start_time = new Date().getTime();
  $.ajax({
    url: url,
    success: function(data) {
      result = data;
      completed = true;
    },
    error: function () {
      completed = true;
    }
  }); // <---- that AJAX request was non-blocking
  while (!completed) { // <------ but this while loop will block
    // give up once time_out milliseconds have elapsed
    if (time_out && new Date().getTime() - start_time >= time_out)
      break;
  }
  return result;
};
var t = sweetget('/echo/json/');
console.log('here is t: ', t);
Try it: http://jsfiddle.net/AjRy6/

jQuery supports synchronous ajax calls via the async: false setting (as of jQuery 1.8, combining it with the jqXHR promise methods is deprecated). It's a hack with limitations (no cross-domain or jsonp requests, and it locks up the browser), and I would avoid it if possible.
There are several available libraries that provide some syntactic sugar for async operations in Javascript. For example:
https://github.com/caolan/async
https://github.com/coolaj86/futures
...however I don't think anything provides the synchronous syntax you are looking for - there is always a callback involved, because of the way JavaScript works in the browser.
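Since these answers were written, JavaScript has gained async/await (ES2017), which comes close to the requested syntax: the code reads sequentially, but the function is suspended rather than the thread being blocked. A minimal sketch, where getRemote is a hypothetical callback-style function standing in for an ajax call:

```javascript
// Hypothetical callback-style API standing in for an ajax request.
function getRemote(name, callback) {
  setTimeout(() => callback(null, 'value of ' + name), 10);
}

// Wrap the callback API in a promise so it can be awaited.
function getRemoteAsync(name) {
  return new Promise((resolve, reject) => {
    getRemote(name, (err, value) => (err ? reject(err) : resolve(value)));
  });
}

async function getRemoteVariable(name) {
  // Execution of this function pauses here; the event loop stays free.
  const val = await getRemoteAsync(name);
  return val;
}
```

Note that getRemoteVariable still returns a promise, so its callers must await it or use .then; there is a callback somewhere, it is just hidden by the syntax.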

How to write a node.js function that waits for an event to fire before 'returning'?

I have a node application that is not a web application - it completes a series of asynchronous tasks before returning 1. Immediately before returning, the results of the program are printed to the console.
How do I make sure all the asynchronous work is completed before returning? I was able to achieve something similar to this in a web application by making sure all tasks were completed before calling res.end(), but I haven't found any equivalent for a final 'event' to call before letting a script return.
See below for my (broken) function currently, attempting to wait until callStack is empty. I just discovered that this is a kind of nonsensical approach because node waits for processHub to complete before entering any of the asynchronous functions called in processObjWithRef.
function processHub(hubFileContents){
  var callStack = [];
  var myNewObj = {};
  processObjWithRef(samplePayload, myNewObj, callStack);
  while(callStack.length>0){
    //do nothing
  }
  return 1
}
Note: I have tried many times previously to achieve this kind of behavior with libraries like async (see my related question at How can I make this call to request in nodejs synchronous?) so please take the answer and comments there into account before suggesting any answers based on 'just use asynch'.

You cannot wait for an asynchronous event before returning--that's the definition of asynchronous! Trying to force Node into this programming style will only cause you pain. A naive example would be to check periodically to see if callstack is empty.
var callstack = [...];
function processHub(contents) {
  doSomethingAsync(..., callstack);
}
// check every second to see if callstack is empty
var interval = setInterval(function() {
  if (callstack.length == 0) {
    clearInterval(interval);
    doSomething()
  }
}, 1000);
Instead, the usual way to do async stuff in Node is to implement a callback to your function.
function processHub(hubFileContents, callback){
  var callStack = [];
  var myNewObj = {};
  processObjWithRef(samplePayload, myNewObj, callStack, function() {
    if (callStack.length == 0) {
      callback(some_results);
    }
  });
}
If you really want to return something, check out promises; a promise is an object you can return right away, and its then callbacks are invoked once it is resolved, either immediately or at some point in the future.
function processHub(hubFileContents){
  var callStack = [];
  var myNewObj = {};
  // the Promise constructor takes an executor that receives resolve and reject
  return new Promise(function(resolve, reject) {
    // assuming processObjWithRef takes a callback
    processObjWithRef(samplePayload, myNewObj, callStack, function() {
      if (callStack.length == 0) {
        resolve(some_results);
      }
    });
  });
}
processHubPromise = processHub(...);
processHubPromise.then(function(result) {
  // do something with 'result' when complete
});

The problem is with your design of the function. You want to return a synchronous result from a list of tasks that are executed asynchronously.
You should implement your function with an extra parameter that will be the callback where you would put the result (in this case, 1) for some consumer to do something with it.
Also you need to have a callback parameter in your inner function, otherwise you won't know when it ends. If this last thing is not possible, then you should do some kind of polling (using setInterval perhaps) to test when the callStack array is populated.
Remember, in Javascript you should never ever do a busy wait. That will lock your program entirely as it runs on a single process.

deasync is designed to address your problem exactly. Just replace
while(callStack.length>0){
//do nothing
}
with
require('deasync').loopWhile(function(){return callStack.length>0;});

The problem is that node.js is single-threaded, which means that if one function runs, nothing else runs (event-loop) until that function has returned. So you can not block a function to make it return after async stuff is done.
You could, for example, set up a counter variable that counts started async tasks and decrement that counter using a callback function (that gets called after the task has finished) from your async code.
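A minimal sketch of that counter approach, assuming each task is a function that accepts a done callback and calls it with its result. runAll and its parameter names are illustrative, not taken from the question.

```javascript
// Start every task, then call onAllDone once all of them have reported back.
function runAll(tasks, onAllDone) {
  let pending = tasks.length;            // number of tasks still running
  const results = new Array(tasks.length);
  if (pending === 0) {
    onAllDone(results);                  // nothing to wait for
    return;
  }
  tasks.forEach((task, i) => {
    task((value) => {
      results[i] = value;                // keep results in task order
      pending -= 1;
      if (pending === 0) onAllDone(results);
    });
  });
}
```

The final printing of results would go inside onAllDone, which plays the role of the "last event" before the script ends. The sketch assumes every task calls done exactly once.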

Node.js runs on a single-threaded event loop and leverages asynchronous calls for doing various things, like I/O operations.
If you need to wait for a number of asynchronous operations to finish before executing additional code, you can try using Async (see the Node.js Async Tutorial).

You'll need to start designing and thinking asynchronously, which can take a little while to get used to at first. This is a simple example of how you would tackle something like "returning" after a function call.
function doStuff(param, cb) {
  //do something
  var newData = param;
  //"return"
  cb(newData);
}
doStuff({some:data}, function(myNewData) {
  //you're done with doStuff in here
});
There's also a lot of helpful utility functions in the async library available on npm.
