Different strings with readStream and promise - javascript

I use a function in my project:
function readStream(file) {
  console.log("start reading");
  const readStream = fs.createReadStream(file);
  readStream.setEncoding('utf8');
  return new Promise((resolve, reject) => {
    let data = "";
    readStream.on("data", chunk => data += chunk);
    readStream.on("end", () => { resolve(data); });
    readStream.on("error", error => reject(error));
  });
}
It reads an XML file with around 800 lines in it. If I add:
readStream.on("end", () => {console.log(data); resolve(data);});
then the XML data is complete. Everything is fine. But if I now call this readStream from another function:
const dpath = path.resolve(__basedir, 'tests/downloads', 'test.xml');
let xml = await readStream(dpath);
console.log(xml);
then the XML data is cut off. I think 800 lines is nothing big. So what can cause the data to be truncated at this point, but not inside the function itself?

I have tried it in the following way, and it seems to work for me.
For a complete running example, clone node-cheat xml-streamer and run node main.js.
xml-streamer.js:
const fs = require('fs');
module.exports.readStream = function (file) {
  console.log("read stream started");
  const readStream = fs.createReadStream(file);
  readStream.setEncoding('utf8');
  return new Promise((resolve, reject) => {
    let data = "";
    readStream.on("data", chunk => data += chunk);
    readStream.on("end", () => { console.log(data); resolve(data); });
    readStream.on("error", error => reject(error));
  });
}
main.js:
const path = require('path');
const _streamer = require('./xml-streamer');
async function main() {
  const xml = await _streamer.readStream(path.resolve(__dirname, 'files', 'test.xml'));
  console.log(xml);
}
main();
P.S. In the above-mentioned node-cheat, the test xml file has 1121 lines.

Sometimes sync + async code can get into a race condition when it's called in the same tick. Try using setImmediate(resolve, data) in your event handler, which will resolve the promise on a later turn of the event loop.
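For example, a minimal sketch of that change applied to the "end" handler from the question:
readStream.on("end", () => {
  // defer resolving until the next turn of the event loop
  setImmediate(resolve, data);
});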
Alternatively, if you're targeting node v12 or higher, you can use the stream async iterator interface, which will be much cleaner for your code:
async function readStream(file) {
  console.log("start reading");
  const readStream = fs.createReadStream(file);
  readStream.setEncoding('utf8');
  let data = "";
  for await (const chunk of readStream) {
    data += chunk;
  }
  return data;
}
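It is called the same way as before, since an async function still returns a promise:
const xml = await readStream(dpath);
console.log(xml);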

If you happen to use a modern version of node, there's fs.promises:
const { promises: fs } = require('fs')
;(async function main() {
  console.log(await fs.readFile('./input.txt', 'utf-8'));
})()

Related

how to access array in async context

I have this function:
const list = [];
(async () => {
  await fs.readdir(JSON_DIR, async (err, files) => {
    await files.forEach(async filename => {
      const readStream = fs.createReadStream(path.join("output/scheduled", filename));
      const parseStream = json.createParseStream();
      await parseStream.on('data', async (hostlist: HostInfo[]) => {
        hostlist.forEach(async host => {
          list.push(host);
        });
      });
      readStream.pipe(parseStream);
    })
  });
  //here list.length = 0
  console.log(list.length);
})();
The function reads from a directory of large json files; for each file it creates a stream that starts parsing the json, and the streams can all be working at the same time.
At the end of the function I need to have saved each host in the list, but when I check the list at the end, it is empty.
How can I save the content of host to a global variable, so it is accessible at the end?
I thought of a solution: check when every file has finished reading by using an end event.
But then, to access the list at the end, I would need yet another event that fires when all the other events are finished,
and that looks complicated.
I have been using the big-json library:
https://www.npmjs.com/package/big-json
You could use a counter to determine when the streams have finished processing.
You can use readdirSync to perform the directory read synchronously.
const list: HostInfo[] = [];
(() => {
  const files = fs.readdirSync(JSON_DIR);
  let streamFinished = 0;
  let streamCount = files.length;
  files.forEach((filename) => {
    const readStream = fs.createReadStream(
      path.join('output/scheduled', filename)
    );
    const parseStream = json.createParseStream();
    parseStream.on('error', (err) => {
      // Handle errors
    })
    parseStream.on('data', (hostlist: HostInfo[]) => {
      list.push(...hostlist);
    });
    parseStream.on('end', () => {
      streamFinished++;
      if (streamFinished === streamCount) {
        // End of all streams...
      }
      console.log(list.length);
    })
    readStream.pipe(parseStream);
  });
})();
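If the counter feels brittle, the same idea can be written with promises instead: wrap each stream's end event in a promise and wait for all of them with Promise.all. A minimal sketch, assuming the same big-json parse-stream API used above:
const parseFile = (filename) =>
  new Promise((resolve, reject) => {
    const readStream = fs.createReadStream(path.join('output/scheduled', filename));
    const parseStream = json.createParseStream();
    parseStream.on('data', (hostlist) => list.push(...hostlist));
    parseStream.on('error', reject);
    parseStream.on('end', resolve); // one promise per file
    readStream.pipe(parseStream);
  });
Promise.all(fs.readdirSync(JSON_DIR).map(parseFile))
  .then(() => console.log(list.length)); // all streams have finished here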

Node.js how to synchronously read lines from stream.Readable

I'm interacting with a child process through stdio, and I need to wait for a line from childProcess.stdout each time I write some command to childProcess.stdin.
It's easy to wrap an asynchronous method for writing like below:
async function write(data) {
  return new Promise(resolve => {
    childProcess.stdin.write(data, () => resolve());
  })
}
However, it turns out to be quite difficult when it comes to reading, since data from stdout must be processed using listeners. I've tried the following:
const LineReader = require("readline")
const reader = LineReader.createInterface(childProcess.stdout);
async function read() {
  return new Promise(resolve => {
    reader.once("line", line => resolve(line));
  })
}
But it always returns the first line.
I know I may achieve this using setInterval, and I've already implemented the functionality this way. But it obviously has an impact on performance, so now I'm trying to optimize it by wrapping it into an asynchronous method.
Any suggestions and solutions will be appreciated!
Well, I ended up with something pretty similar to what you were trying. It makes some assumptions that are mentioned in the code and needs more complete error handling:
const cp = require('child_process');
const readline = require('readline');
const child = cp.spawn("node", ["./echo.js"]);
child.on('error', err => {
  console.log(err);
}).on('exit', () => {
  console.log("child exited");
});
const reader = readline.createInterface({ input: child.stdout });
// this will miss line events that occurred before this is called
// so this only really works if you know the output comes one line at a time
function nextLine() {
  return new Promise(resolve => {
    reader.once('line', resolve);
  });
}
// this does not check for stdin that is full and wants us to wait
// for a drain event
function write(str) {
  return new Promise(resolve => {
    let ready = child.stdin.write(str, resolve);
    if (!ready) {
      console.log("stream isn't ready yet");
    }
  });
}
async function sendCmd(cmd) {
  // get the line reader event handler installed so there's no race condition
  // on missing the return event
  let p = nextLine();
  // send the command
  await write(cmd);
  return p;
}
// send a sequence of commands and get their results
async function run() {
  let result1 = await sendCmd("hi\n");
  console.log(`Got '${result1}'`);
  let result2 = await sendCmd("goodbye\n");
  console.log(`Got '${result2}'`);
  let result3 = await sendCmd("exit\n");
  console.log(`Got '${result3}'`);
}
run().then(() => {
  console.log("done");
}).catch(err => {
  console.log(err);
});
And, for testing purposes, I ran it with this echo app:
process.stdin.on("data", data => {
  let str = data.toString();
  let ready = process.stdout.write("return: " + str, () => {
    if (str.startsWith("exit")) {
      process.exit();
    }
  });
  if (!ready) {
    console.log("echo wasn't ready");
  }
});

Can't populate array and return

I'm trying to read all the files inside a specific folder.
But I'm having some issues returning an array with all those files' data.
My function is returning an empty array because the return is called before all values have been pushed into the array.
How can I fix this problem using asynchronous mechanisms?
app.get('/load-schemas', async function (req, res) {
  var schemas = [];
  fs.readdirSync('Schemas').forEach(file => {
    fs.readFile('Schemas/' + file, "utf8", function (err, data) {
      schemas.push(data);
    })
  });
  res.status(200).send(schemas);
})
I guess the easiest solution is to go with readFileSync
let data = fs.readFileSync('Schemas/' + file, "utf8");
schemas.push(data);
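Dropped into the route from the question, that looks like this (a sketch; the synchronous calls block the event loop, which is acceptable for small folders):
app.get('/load-schemas', function (req, res) {
  var schemas = [];
  fs.readdirSync('Schemas').forEach(file => {
    // readFileSync blocks until the file is read, so the array
    // is fully populated before the response is sent
    schemas.push(fs.readFileSync('Schemas/' + file, "utf8"));
  });
  res.status(200).send(schemas);
})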
Since you can use async/await in your code, I would use the promises version of fs and await them, like here: https://stackoverflow.com/a/58332163/732846
This way the code looks like "sync" code but has the benefits of being async.
const { promises: fs } = require("fs");
app.get('/load-schemas', async function (req, res) {
  var schemas = [];
  const dirs = await fs.readdir('Schemas');
  // use for...of instead of forEach: await is only valid inside the
  // async route handler, and forEach would not wait for each read anyway
  for (const file of dirs) {
    const data = await fs.readFile('Schemas/' + file, "utf8");
    schemas.push(data);
  }
  res.status(200).send(schemas);
})
I think that you can go for promises.
snippet from: How do I wait for multiple fs.readFile calls?
const fs = require('fs');
function readFromFile(file) {
  return new Promise((resolve, reject) => {
    fs.readFile(file, function (err, data) {
      if (err) {
        console.log(err);
        reject(err);
      }
      else {
        resolve(JSON.parse(data));
      }
    });
  });
}
const promises = [
  readFromFile('./output/result3.json'),
  readFromFile('./output/result4.json')
];
Promise.all(promises).then(result => {
  console.log(result);
  baseListOfFiles = result[0];
  currentListOfFiles = result[1];
  // do more stuff
});
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/all#:~:text=The%20Promise.,input%20iterable%20contains%20no%20promises.

Promise Resolving before Google Cloud Bucket Upload

I am writing some code that loops over a CSV and creates a JSON file based on the CSV. Included in the JSON is an array named photos, which is to contain the returned urls for the images that are being uploaded to Google Cloud Storage within the function. However, having the promise wait for the uploads to finish has me stumped, since everything is running asynchronously, and finishes off the promise and the JSON compilation prior to finishing the bucket upload and returning the url. How can I make the promise resolve after the urls have been retrieved and added to currentJSON.photos?
const csv = require('csvtojson')
const fs = require('fs');
const {Storage} = require('@google-cloud/storage');
var serviceAccount = require("./my-firebase-storage-spot.json");
const testFolder = './Images/';
var csvFilePath = './Inventory.csv';
var dirArr = ['./Images/Subdirectory-A', './Images/Subdirectory-B', './Images/Subdirectory-C'];
var allData = [];
csv()
  .fromFile(csvFilePath)
  .subscribe((json) => {
    return new Promise((resolve, reject) => {
      for (var i in dirArr) {
        if (json['Name'] == dirArr[i]) {
          var currentJSON = {
            "photos": [],
          };
          fs.readdir(testFolder + json['Name'], (err, files) => {
            files.forEach(file => {
              if (file.match(/.(jpg|jpeg|png|gif)$/i)) {
                var imgName = testFolder + json['Name'] + '/' + file;
                bucket.upload(imgName, function (err, file) {
                  if (err) throw new Error(err);
                  //returned uploaded img address is found at file.metadata.mediaLink
                  currentJSON.photos.push(file.metadata.mediaLink);
                });
              } else {
                //do nothing
              }
            });
          });
          allData.push(currentJSON);
        }
      }
      resolve();
    })
  }, onError, onComplete);
function onError() {
  // console.log(err)
}
function onComplete() {
  console.log('finito');
}
I've tried moving the resolve() around, and also tried placing the uploader section into the onComplete() function (which created new promise-based issues).
Indeed, your code is not awaiting the asynchronous invocation of the readdir callback function, nor of the bucket.upload callback function.
Asynchronous coding becomes easier when you use the promise-version of these functions.
bucket.upload will return a promise when omitting the callback function, so that is easy.
For readdir to return a promise, you need to use the fs Promises API, which gives you a promise-based readdir method so you can use promises throughout your code.
So use fs = require('fs').promises instead of fs = require('fs')
With that preparation, your code can be transformed into this:
const csv = require('csvtojson');
const fs = require('fs').promises; // the fs Promises API, as described above
const testFolder = './Images/';
var csvFilePath = './Inventory.csv';
var dirArr = ['./Images/Subdirectory-A', './Images/Subdirectory-B', './Images/Subdirectory-C'];
(async function () {
  let arr = await csv().fromFile(csvFilePath);
  arr = arr.filter(obj => dirArr.includes(obj.Name));
  let allData = await Promise.all(arr.map(async obj => {
    let files = await fs.readdir(testFolder + obj.Name);
    files = files.filter(file => file.match(/\.(jpg|jpeg|png|gif)$/i));
    let photos = await Promise.all(
      files.map(async file => {
        var imgName = testFolder + obj.Name + '/' + file;
        let result = await bucket.upload(imgName);
        return result.metadata.mediaLink;
      })
    );
    return {photos};
  }));
  console.log('finito', allData);
})().catch(err => { // <-- The above async function runs immediately and returns a promise
  console.log(err);
});
Some remarks:
There is a shortcoming in your regular expression. You intended to match a literal dot, but you did not escape it (fixed in above code).
allData will contain an array of { photos: [......] } objects, and I wonder why you would not want all photo elements to be part of one single array. However, I kept your logic, so the above will still produce them in these chunks. Possibly, you intended to have other properties (next to photos) as well, which would make it actually useful to have these separate objects.
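If a single flat array is what you are after, the chunks are easy to collapse afterwards; for example:
// collapse [{photos: [...]}, {photos: [...]}, ...] into one flat array of links
const allPhotos = allData.flatMap(entry => entry.photos);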
The problem is that your code is not waiting in your forEach. I would highly recommend looking into streams and trying to do things in parallel as much as possible. There is one library which is very powerful and does that job for you: etl.
You can read rows from csv in parallel and process them in parallel rather than one by one.
I have tried to explain the lines in the code below. Hopefully it makes sense.
const etl = require("etl");
const fs = require("fs");
const csvFilePath = `${__dirname}/Inventory.csv`;
const testFolder = "./Images/";
const dirArr = [
  "./Images/Subdirectory-A",
  "./Images/Subdirectory-B",
  "./Images/Subdirectory-C"
];
fs.createReadStream(csvFilePath)
  .pipe(etl.csv()) // parse the csv file
  .pipe(etl.collect(10)) // this could be any value depending on how many you want to do in parallel.
  .pipe(etl.map(async items => {
    return Promise.all(items.map(async item => { // Iterate through 10 items
      const finalResult = await Promise.all(dirArr.filter(i => i === item.Name).map(async () => { // filter the matching one and iterate
        const files = await fs.promises.readdir(testFolder + item.Name); // read all files
        const filteredFiles = files.filter(file => file.match(/\.(jpg|jpeg|png|gif)$/i)); // filter out only images
        const result = await Promise.all(filteredFiles.map(async file => { // map first, then Promise.all over the upload promises
          const imgName = `${testFolder}${item.Name}/${file}`;
          const bucketUploadResult = await bucket.upload(imgName); // upload image
          return bucketUploadResult.metadata.mediaLink;
        }));
        return result; // This contains all the media links for matching files
      }));
      // eslint-disable-next-line no-console
      console.log(finalResult); // Arrays of media links for the files
      return finalResult;
    }));
  }))
  .promise()
  .then(() => console.log("finished"))
  .catch(err => console.error(err));
Here's a way to do it where we extract some of the functionality into some separate helper methods, and trim down some of the code. I had to infer some of your requirements, but this seems to match up pretty closely with how I understood the intent of your original code:
const csv = require('csvtojson')
const fs = require('fs');
const {Storage} = require('@google-cloud/storage');
var serviceAccount = require("./my-firebase-storage-spot.json");
const testFolder = './Images/';
var csvFilePath = './Inventory.csv';
var dirArr = ['./Images/Subdirectory-A', './Images/Subdirectory-B', './Images/Subdirectory-C'];
var allData = [];
// Using nodejs 'path' module ensures more reliable construction of file paths than string manipulation:
const path = require('path');
// Helper function to convert bucket.upload into a Promise
// From other responses, it looks like if you just omit the callback then it will be a Promise
const bucketUpload_p = fileName => new Promise((resolve, reject) => {
  bucket.upload(fileName, function (err, file) {
    if (err) reject(err);
    resolve(file);
  });
});
// Helper function to convert readdir into a Promise
// Again, there are other APIs out there to do this, but this is a real simple solution too:
const readdir_p = dirName => new Promise((resolve, reject) => {
  fs.readdir(dirName, function (err, files) {
    if (err) reject(err);
    resolve(files);
  });
});
// Here we're expecting the string that we found in the "Name" property of our JSON from "subscribe".
// It should match one of the strings in `dirArr`, but this function's job ISN'T to check for that,
// we just trust that the code already found the right one.
// (This needs to be an async function: an `await` inside a plain Promise executor is a syntax error.)
const getImageFilesFromJson_p = async jsonName => {
  const filePath = path.join(testFolder, jsonName);
  const files = await readdir_p(filePath);
  return files.filter(fileName => fileName.match(/\.(jpg|jpeg|png|gif)$/i));
};
csv()
  .fromFile(csvFilePath)
  .subscribe(async json => {
    // Here we appear to be validating that the "Name" prop from the received JSON matches one of the paths that
    // we're expecting...? If that's the case, this is a slightly more semantic way to do it.
    const nameFromJson = dirArr.find(dirName => json['Name'] === dirName);
    // If we don't find that it matches one of our expecteds, we'll reject the promise.
    if (!nameFromJson) {
      // We can do whatever we want though in this case, I think it's maybe not necessarily an error:
      // return Promise.resolve([]);
      return Promise.reject('Did not receive a matching value in the Name property from \'.subscribe\'');
    }
    // We can use `await` here since `getImageFilesFromJson_p` returns a Promise
    const imageFiles = await getImageFilesFromJson_p(nameFromJson);
    // We're getting just the filenames; map them to build the full path
    const fullPathArray = imageFiles.map(fileName => path.join(testFolder, nameFromJson, fileName));
    // Here we Promise.all, using `.map` to convert the array of strings into an array of Promises;
    // if they all resolve, we'll get the array of file objects returned from each invocation of `bucket.upload`
    return Promise.all(fullPathArray.map(filePath => bucketUpload_p(filePath)))
      .then(fileResults => {
        // So, now we've finished our two asynchronous functions; now that that's done let's do all our data
        // manipulation and resolve this promise
        // Here we just extract the metadata property we want
        const fileResultsMediaLinks = fileResults.map(file => file.metadata.mediaLink);
        // Before we return anything, we'll add it to the global array in the format from the original code
        allData.push({ photos: fileResultsMediaLinks });
        // Returning this array, which is the `mediaLink` value from the metadata of each of the uploaded files.
        return fileResultsMediaLinks;
      })
  }, onError, onComplete);

Returning a value after writing to a file is complete

Hi, I am writing a function in Node.js from which I have to return a file path. My problem is that in that function I am writing to a file, and I want the return to happen only after writing to the file has finished. Before looking into the code: I know this can look like a duplicate, and I have really done my research on this, but I just haven't been able to get there. I have tried using a callback, but the problem is I want to return a value which is already defined. So, before making any judgement calls about duplicates or lack of research, please read the code.
Also, I tried returning the value in the fs.appendFile callback, but that still did not solve it.
My function:
const fs = require('fs');
const path = require('path');
module.exports.createDownloadFile = (request) => {
  let filePath;
  if (request) {
    const userID = xyz;
    filePath = path.join(__dirname, userID.concat('.txt'));
    fs.open(filePath, 'w', (err) => {
      if (err) throw new Error('FILE_NOT_PRESENT');
      fs.appendFile(filePath, 'content to write');
    });
  }
  return filePath;
};
I am getting the filePath where I am calling the function; it's just that at that time the file is still empty, which is why I want to return only after the file has been written completely.
Promises allow you to structure code and return values more like traditional synchronous code. util.promisify can help promisify regular node callback functions.
const fs = require('fs')
const path = require('path')
const util = require('util') // needed for util.promisify
const fsAppendFileAsync = util.promisify(fs.appendFile)
const fsOpenAsync = util.promisify(fs.open)
module.exports.createDownloadFile = async (request) => {
  if (!request) throw new Error('nope')
  const userID = xyz
  let filePath = path.join(__dirname, userID.concat('.txt'))
  let fd = await fsOpenAsync(filePath, 'w')
  await fsAppendFileAsync(fd, 'content to write')
  return filePath
};
Note that async/await are ES2017 and require Node.js 7.6+ or Babel.
Opening a file with w creates or truncates the file and promises will reject on errors that are thrown so I've left the error handler out. You can use try {} catch (e) {} blocks to handle specific errors.
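For example, a caller could single out a permissions failure like this (a sketch; the EACCES check is just illustrative):
try {
  const filePath = await createDownloadFile(request)
  console.log(filePath)
} catch (e) {
  if (e.code === 'EACCES') {
    // handle the permissions error specifically
  } else {
    throw e
  }
}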
The Bluebird promise library is helpful too, especially Promise.promisifyAll which creates the promisified Async methods for you:
const Promise = require('bluebird')
const fs = Promise.promisifyAll(require('fs'))
fs.appendFileAsync('file', 'content to write')
Use promises, like this:
const fs = require('fs');
const path = require('path');
module.exports.createDownloadFile = (request) => {
  return new Promise((resolve, reject) => {
    let filePath;
    if (request) {
      const userID = xyz;
      filePath = path.join(__dirname, userID.concat('.txt'));
      fs.open(filePath, 'w', (err) => {
        if (err) reject(err);
        else
          fs.appendFile(filePath, 'content to write', (err) => {
            if (err)
              reject(err)
            else
              resolve(filePath)
          });
      });
    } else {
      // reject here so the promise always settles, even without a request
      reject(new Error('NO_REQUEST'));
    }
  });
};
and call it like this:
createDownloadFile(request).then(filePath => {
  console.log(filePath)
})
or use sync functions without Promises:
module.exports.createDownloadFile = (request) => {
  let filePath;
  if (request) {
    const userID = xyz;
    filePath = path.join(__dirname, userID.concat('.txt'));
    // close the descriptor returned by openSync so it doesn't leak
    fs.closeSync(fs.openSync(filePath, "w"));
    fs.appendFileSync(filePath, 'content to write');
  }
  return filePath;
};
