There is something I want to code in nodejs, but I don't have any idea of how to implement it. I've been reading and searching a lot, and still have not idea of what would be the correct way to do it.
The problem is the following:
Read lines from stdin
For each line, launch an http request
There must be a limit to simultaneous http
Write the line readed plus some data obtained from the http request to stdout
Lines must be written in order
You can not read "all" the file and then split lines: you must process one line at a time, remember it's stdin. You don't know when the input will end.
Does anybody have some clues of how to approach this problem? I do not have any idea of how to proceed.
You could do something like this:
const http = require('http');
const Promise = require('bluebird');
let arrayOfRequests = [];
process.stdin.on('data', (data) => {
//get data from the stdin
//then add the request to the array
arrayOfRequests.push(http.get({}))
})
process.stdin.on('end', () => {
Promise.all(arrayOfRequests)
// use Promise .all to bundle all of the reuqest
//then use the spread operator so you can use all of the reuqest In order
.spread( (request1,request2,request3) => {
// do stuff
})
})
FYI, the snippet wont work.
So what you are doing is using the process.stdin that is built into Node.js. Then you are bundling all of the requests. Whenever the user cancels out of the program, your requests will be made. Since the calls will be async, you have them in an array, then run Promsise.all and use the bluebird .spread operator to deconstruct the Promise.all and get the values.
So far, I've got this solution for the producer-consumer problem in nodejs, where the producer don't produce more data until there is space available in the queue.
This is queue's code, based on block-queue: https://gist.github.com/AlvaroMaceda/8a933a317fed3fd4691fd5eda3e60c5e
To use the blocking queue, you create it with 3 parameters:
Number of tasks running concurrently
"Push" function. It will be called with the queue as parameter when
more data is needed. The task will be added with an identifier.
"Task" function. It will be called with the identifier created by
"Push" function.
The queue will call "push" only when more data is needed. For example, if there are five tasks running and it was created with a maximum of 5 concurrent tasks, "push" won't be called until one of these tasks end.
This is an example of how to use it:
"use strict";
const queue = require('./block-queue');
const CONCURRENCY = 5;
const WORKS_TO_LAUNCH = 10;
const TIMEOUT = 200;
let i = 1;
let q = queue(CONCURRENCY, doSomethingAsync, putMoreData );
function putMoreData(queue) {
if (++i <= WORKS_TO_LAUNCH) {
console.log('Pushing to queue');
queue.push(i);
}
}
function doSomethingAsync(task, done) {
setTimeout(function() {
console.log('done ' + task);
done();
}, 1000 + task * TIMEOUT);
}
q.push(i);
I don't give this as solved because I don't know if there is a more simple approach and I want to work the complete solution, don't know if I'll find some issues when working with this and streams.
Related
I'm making requests to an API, but their server only allows a certain number of active connections, so I would like to limit the number of ongoing fetches. For my purposes, a fetch is only completed (not ongoing) when the HTTP response body arrives at the client.
I would like to create an abstraction like this:
const fetchLimiter = new FetchLimiter(maxConnections);
fetchLimiter.fetch(url, options); // Returns the same thing as fetch()
This would make things a lot simpler, but there seems to be no way of knowing when a stream being used by other code ends, because streams are locked while they are being read. It is possible to use ReadableStream.tee() to split the stream into two, use one and return the other to the caller (possibly also constructing a Response with it), but this would degrade performance, right?
Since fetch uses promises, you can take advantage of that to make a simple queue system.
This is a method I've used before for queuing promise based stuff. It enqueues items by creating a Promise and then adding its resolver to an array. Of course until that Promise resolves, the await keeps any later promises from being invoked.
And all we have to do to start the next fetch when one finishes is just grab the next resolver and invoke it. The promise resolves, and then the fetch starts!
Best part, since we don't actually consume the fetch result, there's no worries about having to clone or anything...we just pass it on intact, so that you can consume it in a later then or something.
*Edit: since the body is still streaming after the fetch promise resolves, I added a third option so that you can pass in the body type, and have FetchLimiter retrieve and parse the body for you.
These all return a promise that is eventually resolved with the actual content.
That way you can just have FetchLimiter parse the body for you. I made it so it would return an array of [response, data], that way you can still check things like the response code, headers, etc.
For that matter, you could even pass in a callback or something to use in that await if you needed to do something more complex, like different methods of parsing the body depending on response code.
Example
I added comments to indicate where the FetchLimiter code begins and ends...the rest is just demo code.
It's using a fake fetch using a setTimeout, which will resolve between 0.5-1.5 secs. It will start the first three requests immediately, and then the actives will be full, and it will wait for one to resolve.
When that happens, you'll see the comment that the promise has resolved, then the next promise in the queue will start, and then you'll see the then from in the for loop resolve. I added that then just so you could see the order of events.
(function() {
const fetch = (resource, init) => new Promise((resolve, reject) => {
console.log('starting ' + resource);
setTimeout(() => {
console.log(' - resolving ' + resource);
resolve(resource);
}, 500 + 1000 * Math.random());
});
function FetchLimiter() {
this.queue = [];
this.active = 0;
this.maxActive = 3;
this.fetchFn = fetch;
}
FetchLimiter.prototype.fetch = async function(resource, init, respType) {
// if at max active, enqueue the next request by adding a promise
// ahead of it, and putting the resolver in the "queue" array.
if (this.active >= this.maxActive) {
await new Promise(resolve => {
this.queue.push(resolve); // push, adds to end of array
});
}
this.active++; // increment active once we're about to start the fetch
const resp = await this.fetchFn(resource, init);
let data;
if (['arrayBuffer', 'blob', 'json', 'text', 'formData'].indexOf(respType) >= 0)
data = await resp[respType]();
this.active--; // decrement active once fetch is done
this.checkQueue(); // time to start the next fetch from queue
return [resp, data]; // return value from fetch
};
FetchLimiter.prototype.checkQueue = function() {
if (this.active < this.maxActive && this.queue.length) {
// shfit, pulls from start of array. This gives first in, first out.
const next = this.queue.shift();
next('resolved'); // resolve promise, value doesn't matter
}
}
const limiter = new FetchLimiter();
for (let i = 0; i < 9; i++) {
limiter.fetch('/mypage/' + i)
.then(x => console.log(' - .then ' + x));
}
})();
Caveats:
I'm not 100% sure if the body is still streaming when the promise resolves...that seems to be a concern for you. However if that's a problem you could use one of the Body mixin methods like blob or text or json, which doesn't resolve until the body content is completely parsed (see here)
I intentionally kept it very short (like 15 lines of actual code) as a very simple proof of concept. You'd want to add some error handling in production code, so that if the fetch rejects because of a connection error or something that you still decrement the active counter and start the next fetch.
Of course it's also using async/await syntax, because it's so much easier to read. If you need to support older browsers, you'd want to rewrite with Promises or transpile with babel or equivalent.
I am working with this while loop and it is not working. I decided to use the Google Chrome debugger and I saw that the code inside is not being executed.
All the time it checks the condition, starts the first line of the code inside, and goes back again to check the condition.
It is a NodeJS server and I am using the Spotify API.
app.get('/process', ensureAuthenticated, function (req, res) {
let init_array = text.split(" ");
let modtext = init_array;
while (init_array.length != 0) {
spotifyApi.searchTracks(modtext.join(" "))
.then(function (data) {
try {
console.log(data.body.tracks.items[0].name);
for (let i = 0; i < modtext.length; i++) {
init_array.shift();
}
modtext = init_array;
} catch (err) {
console.log("No song");
modtext.pop();
}
});
}
res.redirect('/');
});
This question is best understood by understanding how node.js uses an event loop. At its core, node.js runs your Javascript in a single thread and it uses an event loop in order to manage the completion of things outside that single thread such as timers, network operations, file operations, etc...
Let's first start with a very simple while() loop:
let done = false;
setTimeout(() => {
done = true;
}, 100);
while(!done) {
// do whatever you want here
}
console.log("done with while loop"); // never gets called
At first blush, you would think that the while loop would run for 100ms and then done would be set to true and the while loop would exit. That is not what happens. In fact, this is an infinite while loop. It runs and runs and runs and the variable done is never set to true. The console.log() at the end never runs.
It has this issue because setTimeout() is an asynchronous operation and it communicates its completion through the event loop. But, as we described above, node.js runs its Javascript as single threaded and only gets the next event from the event loop when that single thread finishes what it's doing. But, the while can't finish what it's doing until done gets set to true, but done can't get set to true until the while loop finishes. It's a stand-off and the while loop just runs forever.
So, in a nutshell, while any sort of loop is running, NO asynchronous operation ever gets its result processed (unless it's using await inside the loop which is something different). Asynchronous operations (or anything that uses the event loop) has to wait until the current running Javascript is done and then the interpreter can go back to the event loop.
Your while loop has the exact same issue. spotifyApi.searchTracks() is an asynchronous operation that returns a promise and all promises communicate their results via the event queue. So, you have the same standoff. Your .then() handler can't get called until the while loop finishes, but your while loop can't finish until the .then() handler gets called. Your while loop will just loop infinitely until you exhaust some system resource and your .then() handlers never get a chance to execute.
Since you haven't included code in your request handler that actually produces some result or action (all it appears to do is just modify some local variables), it's not obvious what exactly you're trying to accomplish and thus how to better write this code.
You appear to have N searches to do and you're logging something in each search. You could do them all in parallel and just use Promise.all() to track when they are all done (no while loop at all). Or, you can sequence them so you run one, get its result, then run another. Your question doesn't give us enough info to know what the best option would be.
Here's one possible solution:
Sequence the operations using async/await
Here the request handler is declared async so we can use await inside the while loop. That will suspend the while loop and allow other events to process while waiting for the promise to resolve.
app.get('/process', ensureAuthenticated, async function (req, res) {
let init_array = text.split(" ");
let modtext = init_array;
while (init_array.length != 0) {
try {
let data = await spotifyApi.searchTracks(modtext.join(" "));
console.log(data.body.tracks.items[0].name);
for (let i = 0; i < modtext.length; i++) {
init_array.shift();
}
modtext = init_array;
} catch (err) {
console.log("No song");
modtext.pop();
}
}
res.redirect('/');
});
The reason you're only seeing one line execute is because it's asynchronous. When you call an asynchronous function, it returns immediately and continues to do its work in the background. Once it's done, it calls another function (a "callback") and passes the results of the function to that. That's why your code has to go inside of a call to then() rather than just being on the next line of code.
In this case, spotifyApi.searchTracks() returns a Promise. Once the Spotify API has completed the search, the function in then() will run.
Lets use async/await to solve this problem, I have no clue what data you get in a text but I think it is good example to understand a concept of asynchronous processing.
app.get('/process', ensureAuthenticated, async function (req, res, next) {
try {
const modtext = text.split(" ");
const promises = modtext.map(item => spotify.searchTracks(item));
const response = await Promise.all(promises);
response.forEach(data => {
// if you use lodash, simply use a get function `get(data, 'body.tracks.items[0].name')` => no need to check existence of inner attributes
// or use something like this
if (data && data.body && data.body.track && data.body.tracks.items && data.body.tracks.items[0]) {
console.log(data.body.tracks.items[0].name);
} else {
console.log('No song');
}
});
res.redirect('/');
} catch(err) {
next(err)
}
});
For me much simpler and cleaner code, again, i dont know what is structure of your text attribute and logic behind it, so maybe you will have to make some changes.
Using nodejs 8.12 on Gnu/Linux CentOS 7. Using the built-in web server, require('https'), for a simple application.
I understand that nodejs is single threaded (single process) and there is no actual parallel execution of code. Based on my understanding, I think the http/https server will process one http request and run the handler through all synchronous statements and set up asynchronous statements to be executed later before it will return to process a subsequent request. However, with http/https libraries, you have an asynchronous code that is used to assemble the body of the request. So, we already have one callback which is executed when the body is ready ('end' event). This fact makes me think it might be possible to be in the middle of processing two or more requests simultaneously.
As part of handling the requests, I need to execute a string of shell commands and I use the shelljs.exec library to do that. It runs synchronously, waiting until complete before returning. So, example code would look like:
const shelljs_exec = require('shelljs.exec');
function process() {
// bunch of shell commands in string
var command_str = 'command1; command2; command3';
var exec_results = shelljs_exec(command_str);
console.log('just executed shelljs_exec command');
var proc_results = process_results(exec_results);
console.log(proc_results);
// and return the response...
}
So node.js runs the shelljs_exec() and waits for completion. While it's waiting, can another request be worked on, such that there is a risk, slight, of two or more shelljs.exec invocations running at the same time? Since that could be a problem, I need to ensure only one shelljs.exec statement can be in progress at a given time.
If that is not a correct understanding, then I was thinking I need to do something with mutex locks. Like this:
const shelljs_exec = require('shelljs.exec');
const locks = require('locks');
// Need this in global scope - so we are all dealing with the same one.
var mutex = locks.createMutex();
function await_lock(shell_commands) {
var commands = shell_commands;
return new Promise(getting_lock => {
mutex.lock(got_lock_and_execute);
});
function got_lock_and_execute() {
var exec_results = shelljs_exec(commands);
console.log('just executed shelljs_exec command');
mutex.unlock();
return exec_results;
}
}
async function process() {
// bunch of shell commands in string
var command_str = 'command1; command2; command3';
exec_results = await await_lock(command_str);
var proc_results = process_results(exec_results);
console.log(proc_results);
}
If shelljs_exec is synchronous, no need.
If it is not. If it takes a callback wrap it in a Promise constructor so that it can be awaited. I would suggest properly wrapping the mutex.lock in a promise that gets resolved when the lock is acquired. The try finally is needed to ensure that the mutex is unlocked if shelljs_exec throws an exception.
async function await_lock(shell_commands) {
await (new Promise(function(resolve, reject) {
mutex.lock(resolve);
}));
try {
let exec_results = await shelljs_exec(commands);
return exec_results;
} finally {
mutex.unlock();
}
}
Untested. But it looks like it should work.
I am attempting to make a simple text game that operates in a socket.io chat room on a node server. The program works as follows:
Currently I have three main modules
Rogue : basic home of rogue game functions
rogueParser : module responsible for extracting workable commands from command strings
Verb_library: module containing a list of commands that can be invoked from the client terminal.
The Client types a command like 'say hello world'. This triggers the following socket.io listener
socket.on('rg_command', function(command){
// execute the verb
let verb = rogueParser(command);
rogue.executeVerb(verb, command, function(result){
console.log(result);
});
});
Which then in turn invokes the executeVerb function from rogue..
executeVerb: function(verb, data, callback){
verb_library[verb](data, callback);
},
Each verb in verb_library should be responsible for manipulating the database -if required- and then returning an echo string sent to the appropriate targets representing the completion of the action.
EDIT: I chose 'say' when I posted this but it was pointed out afterward that it was a poor example. 'say' is not currently async but eventually will be as will be the vast majority of 'verbs' as they will need to make calls to the database.
...
say: function(data, callback){
var response = {};
console.log('USR:'+data.user);
var message = data.message.replace('say','');
message = ('you say '+'"'+message.trim()+'"');
response.target = data.user;
response.type = 'echo';
response.message = message;
callback(response);
},
...
My problem is that
1 ) I am having issues passing callbacks through so many modules. Should I be able to pass a callback through multiple layers of modules? Im worried that I'm blind so some scope magic that is making me lose track of what should happen when I pass a callback function into a module which then passes the same callback to another module which then calls the callback. Currently it seems I either end up without access to the callback on the end, or the first function tries to execute without waiting on the final callback returning a null value.
2 ) Im not sure if Im making this harder than it needs to be by not using promises or if this is totally achievable with callbacks, in which case I want to learn how to do it that way before I summon extra code.
Sorry if this is a vague question, I'm in a position of design pattern doubt and looking for advice on this general setup as well as specific information regarding how these callbacks should be passed around. Thanks!
1) Passing callback trough multiple layers doesn't sound like a good idea. Usually I'm thinking what will happen, If I will continiue doing this for a year? Will it be flexible enough so that when I need to change to architecture (let's say customer have new idea), my code will allow me to without rewriting whole app? What you're experiencing is called callback hell. http://callbackhell.com/
What we are trying to do, is to keep our code as shallow as possible.
2) Promise is just syntax sugar for callback. But it's much easier to think in Promise then in callback. So personally, I would advice you to take your time and grasp as much as you can of programming language features during your project. Latest way we're doing asynchronus code is by using async/await syntax which allows us to totally get rid of callback and Promise calls. But during your path, you will have to work with both for sure.
You can try to finish your code this way and when you're done, find what was the biggest pain and how could you write it again to avoid it in future. I promise you that it will be much more educative then getting explicit answear here :)
Asynchronousness and JavaScript go back a long way. How we deal with it has evolved over time and there are numerous applied patterns that attempt to make async easier. I would say that there are 3 concrete and popular patterns. However, each one is very related to the other:
Callbacks
Promises
async/await
Callbacks are probably the most backwards compatible and just involve providing a function to some asynchronous task in order to have your provided function be called whenever the task is complete.
For example:
/**
* Some dummy asynchronous task that waits 2 seconds to complete
*/
function asynchronousTask(cb) {
setTimeout(() => {
console.log("Async task is done");
cb();
}, 2000);
}
asynchronousTask(() => {
console.log("My function to be called after async task");
});
Promises are a primitive that encapsulates the callback pattern so that instead of providing a function to the task function, you call the then method on the Promise that the task returns:
/**
* Some dummy asynchronous task that waits 2 seconds to complete
* BUT the difference is that it returns a Promise
*/
function asynchronousTask() {
return new Promise(resolve => {
setTimeout(() => {
console.log("Async task is done");
resolve();
}, 2000);
});
}
asynchronousTask()
.then(() => {
console.log("My function to be called after async task");
});
The last pattern is the async/await pattern which also deals in Promises which are an encapsulation of callbacks. They are unique because they provide lexical support for using Promises so that you don't have to use .then() directly and also don't have to explicitly return a Promise from your task:
/*
* We still need some Promise oriented bootstrap
* function to demonstrate the async/await
* this will just wait a duration and resolve
*/
function $timeout(duration) {
return new Promise(resolve => setTimeout(resolve, duration));
}
/**
* Task runner that waits 2 seconds and then prints a message
*/
(async function() {
await $timeout(2000);
console.log("My function to be called after async task");
}());
Now that our vocabulary is cleared up, we need to consider one other thing: These patterns are all API dependent. The library that you are using uses callbacks. It is alright to mix these patterns, but I would say that the code that you write should be consistent. Pick one of the patterns and wrap or interface with the library that you need to.
If the library deals in callbacks, see if there is a wrapping library or a mechanism to have it deal in Promises instead. async/await consumes Promises, but not callbacks.
Callbacks are fine, but I would only use them if a function is dependent on some asynchronous result. If however the result is immediately available, then the function should be designed to return that value.
In the example you have given, say does not have to wait for any asynchronous API call to come back with a result, so I would change its signature to the following:
say: function(data){ // <--- no callback argument
var response = {};
console.log('USR:'+data.user);
var message = data.message.replace('say','');
message = ('you say '+'"'+message.trim()+'"');
response.target = data.user;
response.type = 'echo';
response.message = message;
return response; // <--- return it
}
Then going backwards, you would also change the signature of the functions that use say:
executeVerb: function(verb, data){ // <--- no callback argument
return verb_library[verb](data); // <--- no callback argument, and return the returned value
}
And further up the call stack:
socket.on('rg_command', function(command){
// execute the verb
let verb = rogueParser(command);
let result = rogue.executeVerb(verb, command); // <--- no callback, just get the returned value
console.log(result);
});
Of course, this can only work if all verb methods can return the expected result synchronously.
Promises
If say would depend on some asynchronous API, then you could use promises. Let's assume this API provides a callback system, then your say function could return a promise like this:
say: async function(data){ // <--- still no callback argument, but async!
var response = {};
console.log('USR:'+data.user);
var message = data.message.replace('say','');
response.target = data.user;
response.type = 'echo';
// Convert the API callback system to a promise, and use AWAIT
await respone.message = new Promise(resolve => someAsyncAPIWithCallBackAsLastArg(message, resolve));
return response; // <--- return it
}
Again going backwards, you would also change the signature of the functions that use say:
executeVerb: function(verb, data){ // <--- still no callback argument
return verb_library[verb](data); // <--- no callback argument, and return the returned promise(!)
}
And finally:
socket.on('rg_command', async function(command){ // Add async
// execute the verb
let verb = rogueParser(command);
let result = await rogue.executeVerb(verb, command); // <--- await the fulfillment of the returned promise
console.log(result);
});
I will appreciate if you help me with the following case:
Given function:
async function getAllProductsData() {
try {
allProductsInfo = await getDataFromUri(cpCampaignsLink);
allProductsInfo = await getCpCampaignsIdsAndNamesData(allProductsInfo);
await Promise.all(allProductsInfo.map(async (item) => {
item.images = await getProductsOfCampaign(item.id);
}));
allProductsInfo = JSON.stringify(allProductsInfo);
console.log(allProductsInfo);
return allProductsInfo;
} catch(err) {
handleErr(err);
}
}
That function is fired when server is started and it gathers campaigns information from other site: gets data(getDataFromUri()), then extracts from data name and id(getCpCampaignsIdsAndNamesData()), then gets products images for each campaign (getProductsOfCampaign());
I have also express.app with following piece of code:
app.get('/products', async (req, res) => {
if (allProductsInfo.length === undefined) {
console.log('Pending...');
allProductsInfo = await getAllProductsData();
}
res.status(200).send(allProductsInfo);
});
Problem description:
Launch server, wait few seconds until getAllProductsData() gets executed, do '/products' GET request, all works great!
Launch server and IMMEDIATELY fire '/products' GET request' (for that purpose I added IF with console.log('Pending...') expression), I get corrupted result back: it contains all campaigns names, ids, but NO images arrays.
[{"id":"1111","name":"Some name"},{"id":"2222","name":"Some other name"}],
while I was expecting
[{"id":"1111","name":"Some name","images":["URI","URI"...]}...]
I will highly appreciate your help about:
Explaining the flow of what is happening with async execution, and why the result is being sent without waiting for images arrays to be added to object?
If you know some useful articles/specs part covering my topic, I will be thankful for.
Thanks.
There's a huge issue with using a global variable allProductsInfo, and then firing multiple concurrent functions that use it asynchronously. This creates race conditions of all kinds, and you have to consider yourself lucky that you got only not images data.
You can easily solve this by making allProductsInfo a local variable, or at least not use it to store the intermediate results from getDataFromUri and getCpCampaignsIdsAndNamesData - use different (local!) variables for those.
However, even if you do that, you're potentially firing getAllProductsData multiple times, which should not lead to errors but is still inefficient. It's much easier to store a promise in the global variable, initialise this once with a single call to the info gathering procedure, and just to await it every time - which won't be noticeable when it's already fulfilled.
async function getAllProductsData() {
const data = await getDataFromUri(cpCampaignsLink);
const allProductsInfo = await getCpCampaignsIdsAndNamesData(allProductsInfo);
await Promise.all(allProductsInfo.map(async (item) => {
item.images = await getProductsOfCampaign(item.id);
}));
console.log(allProductsInfo);
return JSON.stringify(allProductsInfo);
}
const productDataPromise = getAllProductsData();
productDataPromise.catch(handleErr);
app.get('/products', async (req, res) => {
res.status(200).send(await productDataPromise);
});
Of course you might also want to start your server (or add the /products route to it) only after the data is loaded, and simply serve status 500 until then. Also you should consider what happens when the route is hit after the promise is rejected - not sure what handleErr does.