I'm trying to save an image by spawning a Python process from a Node.js script. The image is passed as binary data from the Node.js script to the Python process.
The script index.js spawns a Python script, my_py.py, and passes it the binary image data. my_py.py captures the binary image data and saves it to the directory assets/media.
The problem is that the image isn't saved to the directory, and I get the error: spawn ENAMETOOLONG.
Could you please help me spot the problem and fix the code?
Thanks in advance!
index.js:
const spawn = require("child_process").spawn;
const fs = require('fs');
let params = {
"image":readDataset()
}
// Launch python app
const pythonProcess = spawn('py',['my_py.py', JSON.stringify(params)]);
// Print result
pythonProcess.stdout.on("data", (data) =>{
console.log(data.toString());
});
// Read errors
pythonProcess.stderr.on("data", (data) =>{
console.log(data.toString());
});
function readDataset() {
try {
return fs.readFileSync('color.png', 'binary');
}
catch (err) {
return err;
}
}
my_py.py:
import sys, json
params = json.loads(sys.argv[1]) # Load passed arguments
import numpy as np
import cv2
import os
fileName = "image.png" # file name
fileData = params['image'] # Capture file
# Convert Image to Numpy as array
img = np.array(fileData)
# Save file to local directory
cv2.imwrite(os.path.join('assets/media/', f'{fileName}'), img)
cv2.waitKey(0)
# Results must return valid JSON Object
print("File saved!")
sys.stdout.flush()
The error says it all: the command you are constructing is too long.
The thing you need to be aware of is that operating systems limit how long a command can be. On Linux a single argument is typically capped at 128 KiB and the whole argument list at around 2 MiB (this depends on the stack size limit, so you can change it). On Windows the limit is 8191 characters for cmd.exe and 32,767 characters for a process created directly. On Mac OS it is around 250k bytes or more, depending on the version.
Part of the reason is that these OSes are written in C/C++, where not enforcing a buffer size limit is an invitation to buffer overruns (the infamous buffer overflow bug). Making input size truly unlimited results in slower code, because the input buffer can no longer be a simple array and needs a more complicated data structure.
Additionally, having no limit on command length is a vector for DoS attacks. If you run an OS with no command size limit and I know for sure you have 32GB of RAM, all I need to do to crash your system is construct a 32GB command.
TLDR
The correct way to pass large data between processes is what you've been doing over the internet: upload/download the data. The simplest implementation is to pass the data via the stdin of the Python process you spawned:
const spawn = require("child_process").spawn;
const fs = require('fs');
let params = {
"image":readDataset()
}
// Launch python app
const pythonProcess = spawn('py',['my_py.py']);
// Pass image data, then close stdin so the child sees end of input
pythonProcess.stdin.write(JSON.stringify(params) + '\n');
pythonProcess.stdin.end();
// Print result
pythonProcess.stdout.on("data", (data) =>{
console.log(data.toString());
});
// Read errors
pythonProcess.stderr.on("data", (data) =>{
console.log(data.toString());
});
function readDataset() {
try {
return fs.readFileSync('color.png', 'base64');
}
catch (err) {
return err;
}
}
In the code above I end the JSON "packet" with a newline so that the Python code can read until end of line to get a single packet. You can use any convention you like. For example, I also often use the nul character (0x00) to mark end of packet, and HTTP uses an empty line ('\r\n\r\n') to mark the end of the headers.
Note that I read the image file as base64 because raw binary data cannot be embedded safely in JSON. In your Python code you can do a base64 decode to get the image back. Additionally, the base64 alphabet does not include newlines ('\n'), so converting the image to base64 ensures you don't get a newline inside your JSON data.
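To make the framing convention concrete, here is a minimal sketch of encoding one packet and splitting received text back into packets. The function names (encodePacket, splitPackets, decodeImage) are made up for this example, and it assumes one JSON object per line as described above.

```javascript
// Sketch: frame a binary payload as one newline-terminated JSON packet
// and recover it on the receiving side. All function names are hypothetical.

// Encode a Buffer into a single-line JSON packet ending with '\n'.
function encodePacket(imageBuffer) {
  const packet = { image: imageBuffer.toString('base64') };
  return JSON.stringify(packet) + '\n';
}

// Split accumulated stream text into complete packets plus the unfinished tail.
function splitPackets(text) {
  const parts = text.split('\n');
  const rest = parts.pop(); // text after the last newline (possibly empty)
  return { packets: parts.map((line) => JSON.parse(line)), rest };
}

// Decode the base64 field back into the original bytes.
function decodeImage(packet) {
  return Buffer.from(packet.image, 'base64');
}

// Round trip: the bytes contain 0x0a (newline) and 0x00 on purpose.
const original = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0a, 0x00, 0xff]);
const wire = encodePacket(original);
// Simulate receiving one full packet followed by the start of another.
const { packets, rest } = splitPackets(wire + '{"image":"AA');
const roundTripped = decodeImage(packets[0]);
console.log(roundTripped.equals(original), JSON.stringify(rest));
```

The receiver keeps `rest` around and prepends it to the next chunk it reads, so a packet split across two reads is still parsed correctly.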
To get the image just read from stdin:
import sys, json, base64
params = json.loads(sys.stdin.readline())
import numpy as np
import cv2
import os
fileName = "image.png" # file name
fileData = base64.b64decode(params['image']) # Capture file (raw PNG bytes)
# Decode the PNG bytes into a NumPy image array
img = cv2.imdecode(np.frombuffer(fileData, dtype=np.uint8), cv2.IMREAD_UNCHANGED)
# Save file to local directory (the directory must already exist)
cv2.imwrite(os.path.join('assets/media/', fileName), img)
# Results must return valid JSON Object
print("File saved!")
sys.stdout.flush()
For some days I have searched for a working solution to an error
Error: EMFILE, too many open files
It seems that many people have the same problem. The usual answer involves increasing the number of file descriptors. So, I've tried this:
sysctl -w kern.maxfiles=20480
The default value is 10240. This is a little strange in my eyes, because the number of files I'm handling in the directory is under 10240. Even stranger, I still receive the same error after I've increased the number of file descriptors.
Second question:
After a number of searches I found a work around for the "too many open files" problem:
var requestBatches = {};
function batchingReadFile(filename, callback) {
// First check to see if there is already a batch
if (requestBatches.hasOwnProperty(filename)) {
requestBatches[filename].push(callback);
return;
}
// Otherwise start a new one and make a real request
var batch = requestBatches[filename] = [callback];
FS.readFile(filename, onRealRead);
// Flush out the batch on complete
function onRealRead() {
delete requestBatches[filename];
for (var i = 0, l = batch.length; i < l; i++) {
batch[i].apply(null, arguments);
}
}
}
function printFile(file){
console.log(file);
}
dir = "/Users/xaver/Downloads/xaver/xxx/xxx/"
var files = fs.readdirSync(dir);
for (i in files){
filename = dir + files[i];
console.log(filename);
batchingReadFile(filename, printFile);
}
Unfortunately I still receive the same error.
What is wrong with this code?
For when graceful-fs doesn't work... or you just want to understand where the leak is coming from. Follow this process.
(e.g. graceful-fs isn't gonna fix your wagon if your issue is with sockets.)
From My Blog Article: http://www.blakerobertson.com/devlog/2014/1/11/how-to-determine-whats-causing-error-connect-emfile-nodejs.html
How To Isolate
This command will output the number of open handles for nodejs processes:
lsof -i -n -P | grep nodejs
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
...
nodejs 12211 root 1012u IPv4 151317015 0t0 TCP 10.101.42.209:40371->54.236.3.170:80 (ESTABLISHED)
nodejs 12211 root 1013u IPv4 151279902 0t0 TCP 10.101.42.209:43656->54.236.3.172:80 (ESTABLISHED)
nodejs 12211 root 1014u IPv4 151317016 0t0 TCP 10.101.42.209:34450->54.236.3.168:80 (ESTABLISHED)
nodejs 12211 root 1015u IPv4 151289728 0t0 TCP 10.101.42.209:52691->54.236.3.173:80 (ESTABLISHED)
nodejs 12211 root 1016u IPv4 151305607 0t0 TCP 10.101.42.209:47707->54.236.3.172:80 (ESTABLISHED)
nodejs 12211 root 1017u IPv4 151289730 0t0 TCP 10.101.42.209:45423->54.236.3.171:80 (ESTABLISHED)
nodejs 12211 root 1018u IPv4 151289731 0t0 TCP 10.101.42.209:36090->54.236.3.170:80 (ESTABLISHED)
nodejs 12211 root 1019u IPv4 151314874 0t0 TCP 10.101.42.209:49176->54.236.3.172:80 (ESTABLISHED)
nodejs 12211 root 1020u IPv4 151289768 0t0 TCP 10.101.42.209:45427->54.236.3.171:80 (ESTABLISHED)
nodejs 12211 root 1021u IPv4 151289769 0t0 TCP 10.101.42.209:36094->54.236.3.170:80 (ESTABLISHED)
nodejs 12211 root 1022u IPv4 151279903 0t0 TCP 10.101.42.209:43836->54.236.3.171:80 (ESTABLISHED)
nodejs 12211 root 1023u IPv4 151281403 0t0 TCP 10.101.42.209:43930->54.236.3.172:80 (ESTABLISHED)
....
Notice the 1023u on the last line: that's the 1024th file handle, which is the default maximum.
Now look at the last column. It indicates which resource is open. You'll probably see a number of lines all with the same resource name. Hopefully, that tells you where to look in your code for the leak.
If you run multiple node processes, first look up which process has pid 12211. That will tell you which one is leaking.
In my case above, I noticed a bunch of very similar IP addresses, all 54.236.3.###. By doing IP address lookups, I was able to determine that in my case it was PubNub related.
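If the lsof output is long, grouping it by remote address makes the pattern easier to spot. This is only a sketch: countByRemote is a hypothetical helper, and it assumes IPv4 lines in the "local:port->remote:port" format shown above.

```javascript
// Sketch: count connections per remote host from lsof text.
// Assumes the NAME column looks like "local:port->remote:port (STATE)" with IPv4 addresses.
function countByRemote(lsofOutput) {
  const counts = new Map();
  for (const line of lsofOutput.split('\n')) {
    const match = line.match(/->([^\s:]+):(\d+)/);
    if (!match) continue; // header lines, "...", non-socket entries
    const remoteHost = match[1];
    counts.set(remoteHost, (counts.get(remoteHost) || 0) + 1);
  }
  return counts;
}

// Three lines copied from the listing above.
const sample = [
  'nodejs 12211 root 1012u IPv4 151317015 0t0 TCP 10.101.42.209:40371->54.236.3.170:80 (ESTABLISHED)',
  'nodejs 12211 root 1013u IPv4 151279902 0t0 TCP 10.101.42.209:43656->54.236.3.172:80 (ESTABLISHED)',
  'nodejs 12211 root 1016u IPv4 151305607 0t0 TCP 10.101.42.209:47707->54.236.3.172:80 (ESTABLISHED)',
].join('\n');

const counts = countByRemote(sample);
for (const [host, n] of counts) console.log(host, n);
```

A host that dominates the counts is usually the client library or service whose connections are not being closed.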
Command Reference
Use this syntax to get a count of open handles for a certain pid.
I used this command to test the number of files that were opened after doing various events in my app.
lsof -i -n -P | grep "8465" | wc -l
# lsof -i -n -P | grep "nodejs.*8465" | wc -l
28
# lsof -i -n -P | grep "nodejs.*8465" | wc -l
31
# lsof -i -n -P | grep "nodejs.*8465" | wc -l
34
What is your process limit?
ulimit -a
The line you want will look like this:
open files (-n) 1024
Permanently change the limit:
tested on Ubuntu 14.04, nodejs v. 7.9
In case you are expecting to open many connections (websockets is a good example), you can permanently increase the limit:
file: /etc/pam.d/common-session (add to the end)
session required pam_limits.so
file: /etc/security/limits.conf (add to the end, or edit if already exists)
root soft nofile 40000
root hard nofile 100000
Restart your nodejs process and log out and back in over ssh.
This may not work for older NodeJS versions; in that case you'll need to restart the server.
Use the relevant username instead of root if your node process runs under a different uid.
Using the graceful-fs module by Isaac Schlueter (node.js maintainer) is probably the most appropriate solution. It does incremental back-off if EMFILE is encountered. It can be used as a drop-in replacement for the built-in fs module.
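As a rough illustration of what backing off on EMFILE means, here is a minimal sketch. It is not graceful-fs's actual implementation: readFileWithBackoff, the delay values and the retry cap are all assumptions made up for this example.

```javascript
// Sketch: retry an fs-style callback read with growing delays when EMFILE occurs.
// Not graceful-fs's real code; names, delays and maxAttempts are illustrative.
function readFileWithBackoff(readFn, filename, callback, attempt = 0, maxAttempts = 5) {
  readFn(filename, (err, data) => {
    if (err && err.code === 'EMFILE' && attempt < maxAttempts) {
      const delayMs = 10 * 2 ** attempt; // 10, 20, 40, ... (arbitrary choice)
      setTimeout(() => {
        readFileWithBackoff(readFn, filename, callback, attempt + 1, maxAttempts);
      }, delayMs);
      return;
    }
    callback(err, data); // success, a different error, or retries exhausted
  });
}

// Demo with a stub that fails twice with EMFILE, then succeeds.
let calls = 0;
function flakyRead(filename, cb) {
  calls += 1;
  if (calls <= 2) {
    const err = new Error('too many open files');
    err.code = 'EMFILE';
    return setImmediate(cb, err);
  }
  setImmediate(cb, null, 'contents of ' + filename);
}

const result = new Promise((resolve, reject) => {
  readFileWithBackoff(flakyRead, 'a.txt', (err, data) => (err ? reject(err) : resolve(data)));
});
result.then((data) => console.log(data, 'after', calls, 'calls'));
```

In real use you would pass fs.readFile as readFn. The stub is only there so the retry path can be exercised without actually exhausting descriptors.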
I am not sure whether this will help anyone, but I started working on a big project with a lot of dependencies which threw the same error. My colleague suggested that I install watchman using brew, and that fixed the problem for me.
brew update
brew install watchman
Edit on 26 June 2019:
Github link to watchman
I tried everything mentioned above for the same problem, but nothing worked. The steps below worked for me. Simple config changes.
Option 1: Set limit (It won't work most of the time)
user@ubuntu:~$ ulimit -n 65535
Check the current limit
user@ubuntu:~$ ulimit -n
1024
Option 2: Increase the available limit to e.g. 65535
user@ubuntu:~$ sudo nano /etc/sysctl.conf
Add the following line to it
fs.file-max = 65535
Run this to refresh with new config
user@ubuntu:~$ sudo sysctl -p
Edit the following file
user@ubuntu:~$ sudo vim /etc/security/limits.conf
Add the following lines to it
root soft nproc 65535
root hard nproc 65535
root soft nofile 65535
root hard nofile 65535
Edit the following file
user@ubuntu:~$ sudo vim /etc/pam.d/common-session
Add this line to it
session required pam_limits.so
Logout and login and try the following command
user@ubuntu:~$ ulimit -n
65535
Option 3: Just add this line
DefaultLimitNOFILE=65535
to /etc/systemd/system.conf and /etc/systemd/user.conf
I ran into this problem today, and finding no good solutions for it, I created a module to address it. I was inspired by @fbartho's snippet, but wanted to avoid overwriting the fs module.
The module I wrote is Filequeue, and you use it just like fs:
var Filequeue = require('filequeue');
var fq = new Filequeue(200); // max number of files to open at once
fq.readdir('/Users/xaver/Downloads/xaver/xxx/xxx/', function(err, files) {
if(err) {
throw err;
}
files.forEach(function(file) {
fq.readFile('/Users/xaver/Downloads/xaver/xxx/xxx/' + file, function(err, data) {
// do something here
});
});
});
You're opening too many files. Node reads files asynchronously, so it will try to read all of them at once. You're probably reaching the 10240 limit.
See if this works:
var fs = require('fs')
var events = require('events')
var util = require('util')
var path = require('path')
var FsPool = module.exports = function(dir) {
events.EventEmitter.call(this)
this.dir = dir;
this.files = [];
this.active = [];
this.threads = 1;
this.on('run', this.runQuta.bind(this))
};
// So will act like an event emitter
util.inherits(FsPool, events.EventEmitter);
FsPool.prototype.runQuta = function() {
if(this.files.length === 0 && this.active.length === 0) {
return this.emit('done');
}
if(this.active.length < this.threads) {
var name = this.files.shift()
this.active.push(name)
var fileName = path.join(this.dir, name);
var self = this;
fs.stat(fileName, function(err, stats) {
if(err)
throw err;
if(stats.isFile()) {
fs.readFile(fileName, function(err, data) {
if(err)
throw err;
self.active.splice(self.active.indexOf(name), 1)
self.emit('file', name, data);
self.emit('run');
});
} else {
self.active.splice(self.active.indexOf(name), 1)
self.emit('dir', name);
self.emit('run');
}
});
}
return this
};
FsPool.prototype.init = function() {
var dir = this.dir;
var self = this;
fs.readdir(dir, function(err, files) {
if(err)
throw err;
self.files = files
self.emit('run');
})
return this
};
var fsPool = new FsPool(__dirname)
fsPool.on('file', function(fileName, fileData) {
console.log('file name: ' + fileName)
console.log('file data: ', fileData.toString('utf8'))
})
fsPool.on('dir', function(dirName) {
console.log('dir name: ' + dirName)
})
fsPool.on('done', function() {
console.log('done')
});
fsPool.init()
Like all of us, you are another victim of asynchronous I/O. With asynchronous calls, if you loop over a lot of files, Node.js will open a file descriptor for each file to read and keep it open until the read completes.
A file descriptor remains open until resources are available on your server to read it. Even if your files are small and reading or updating is fast, it takes some time, and in the meantime your loop keeps opening new file descriptors. So if you have too many files, the limit will soon be reached and you get a beautiful EMFILE.
One solution is to create a queue to avoid this effect.
Thanks to the people who wrote Async, there is a very useful function for that: async.queue. You create a new queue with a concurrency limit and then add filenames to the queue.
Note: If you have to open many files, it is a good idea to keep track of which files are currently open so that you don't reopen them endlessly.
const fs = require('fs')
const async = require("async")
var q = async.queue(function(task, callback) {
console.log(task.filename);
fs.readFile(task.filename,"utf-8",function (err, data_read) {
callback(err,task.filename,data_read);
}
);
}, 4);
var files = [1,2,3,4,5,6,7,8,9,10]
for (var file in files) {
q.push({filename:file+".txt"}, function (err,filename,res) {
console.log(filename + " read");
});
}
You can see that each file is added to the queue (console.log filename), but only when the number of running tasks is under the limit you set previously.
async.queue learns that a slot is free through a callback; this callback is called only once the file has been read and any processing you need is done (see the fs.readFile callback above).
So you cannot be overwhelmed by file descriptors.
> node ./queue.js
0.txt
1.txt
2.txt
0.txt read
3.txt
3.txt read
4.txt
2.txt read
5.txt
4.txt read
6.txt
5.txt read
7.txt
1.txt read (bigger file than the others)
8.txt
6.txt read
9.txt
7.txt read
8.txt read
9.txt read
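The same queueing idea can be sketched without the async dependency, using promises. This is a minimal illustration, not a replacement for async.queue: mapWithLimit and fakeRead are hypothetical names, and the fake read uses a timer instead of touching the disk.

```javascript
// Sketch: run async tasks with at most `limit` in flight at once.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function runner() {
    while (next < items.length) {
      const index = next++; // claim the next item (JS is single-threaded, so no race)
      results[index] = await worker(items[index], index);
    }
  }
  const runners = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) runners.push(runner());
  await Promise.all(runners);
  return results;
}

// Demo with a fake "read" that tracks peak concurrency.
let inFlight = 0;
let peak = 0;
function fakeRead(name) {
  inFlight += 1;
  peak = Math.max(peak, inFlight);
  return new Promise((resolve) => {
    setTimeout(() => {
      inFlight -= 1;
      resolve(name + ' read');
    }, 5);
  });
}

const names = Array.from({ length: 10 }, (_, i) => i + '.txt');
const done = mapWithLimit(names, 4, fakeRead);
done.then((results) => console.log(results.length, 'files, peak in flight:', peak));
```

Swapping fakeRead for a function that returns fs.promises.readFile(name, 'utf-8') gives the real behaviour: no more than four descriptors are held by these reads at any time.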
I just finished writing a little snippet of code to solve this problem myself. All of the other solutions appear way too heavyweight and require you to change your program structure.
This solution just stalls any fs.readFile or fs.writeFile calls so that there are no more than a set number in flight at any given time.
// Queuing reads and writes, so your nodejs script doesn't overwhelm system limits catastrophically
global.maxFilesInFlight = 100; // Set this value to some number safeish for your system
var fs = require('fs');
var origRead = fs.readFile;
var origWrite = fs.writeFile;
var activeCount = 0;
var pending = [];
var wrapCallback = function(cb){
return function(){
activeCount--;
cb.apply(this,Array.prototype.slice.call(arguments));
if (activeCount < global.maxFilesInFlight && pending.length){
console.log("Processing Pending read/write");
pending.shift()();
}
};
};
fs.readFile = function(){
var args = Array.prototype.slice.call(arguments);
if (activeCount < global.maxFilesInFlight){
if (args[1] instanceof Function){
args[1] = wrapCallback(args[1]);
} else if (args[2] instanceof Function) {
args[2] = wrapCallback(args[2]);
}
activeCount++;
origRead.apply(fs,args);
} else {
console.log("Delaying read:",args[0]);
pending.push(function(){
fs.readFile.apply(fs,args);
});
}
};
fs.writeFile = function(){
var args = Array.prototype.slice.call(arguments);
if (activeCount < global.maxFilesInFlight){
if (args[1] instanceof Function){
args[1] = wrapCallback(args[1]);
} else if (args[2] instanceof Function) {
args[2] = wrapCallback(args[2]);
}
activeCount++;
origWrite.apply(fs,args);
} else {
console.log("Delaying write:",args[0]);
pending.push(function(){
fs.writeFile.apply(fs,args);
});
}
};
With bagpipe, you just need to change
FS.readFile(filename, onRealRead);
=>
var Bagpipe = require('bagpipe');
var bagpipe = new Bagpipe(10);
bagpipe.push(FS.readFile, filename, onRealRead);
Bagpipe helps you limit the parallelism. More details: https://github.com/JacksonTian/bagpipe
I had the same problem when running the nodemon command, so I reduced the number of files open in Sublime Text and the error disappeared.
cwait is a general solution for limiting concurrent executions of any functions that return promises.
In your case the code could be something like:
var Promise = require('bluebird');
var cwait = require('cwait');
// Allow max. 10 concurrent file reads.
var queue = new cwait.TaskQueue(Promise, 10);
var read = queue.wrap(Promise.promisify(batchingReadFile));
Promise.map(files, function(filename) {
console.log(filename);
return(read(filename));
})
Building on @blak3r's answer, here's a bit of shorthand I use in case it helps others diagnose:
If you're trying to debug a Node.js script that is running out of file descriptors, here's a line that gives you the output of lsof for the node process in question:
openFiles = child_process.execSync(`lsof -p ${process.pid}`);
This will synchronously run lsof filtered by the current running Node.js process and return the results via buffer.
Then use console.log(openFiles.toString()) to convert the buffer to a string and log the results.
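If lsof is not available, a rough count can be taken from inside the process. This is a sketch that assumes Linux, where /proc/self/fd lists the current process's descriptors; countOpenDescriptors is a hypothetical helper and returns null on systems without that directory.

```javascript
const fs = require('fs');

// Sketch: count this process's open descriptors via /proc (Linux only).
// Returns null where /proc/self/fd is unavailable (e.g. macOS, Windows).
function countOpenDescriptors() {
  try {
    return fs.readdirSync('/proc/self/fd').length;
  } catch (err) {
    return null;
  }
}

const before = countOpenDescriptors();
const fd = fs.openSync(process.execPath, 'r'); // open one extra descriptor
const during = countOpenDescriptors();
fs.closeSync(fd);
console.log({ before, during });
```

Logging this number before and after suspicious operations shows whether descriptors are being released. The listing itself briefly uses one descriptor, so treat the absolute value as approximate and compare differences instead.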
For nodemon users:
Just use the --ignore flag to solve the problem.
Example:
nodemon app.js --ignore node_modules/ --ignore data/
Use the latest fs-extra.
I had that problem on Ubuntu (16 and 18) with plenty of file/socket-descriptors space (count with lsof |wc -l). Used fs-extra version 8.1.0. After the update to 9.0.0 the "Error: EMFILE, too many open files" vanished.
I've experienced diverse problems on diverse OS' with node handling filesystems. Filesystems are obviously not trivial.
I solved this by updating watchman
brew install watchman
I tried installing watchman, changing the limit, etc., and it didn't work in Gulp.
Restarting iterm2 actually helped though.
For anyone that might still be looking for solutions, using async-await worked fine for me:
fs.readdir(<directory path>, async (err, filenames) => {
if (err) {
console.log(err);
}
try {
for (let filename of filenames) {
const fileContent = await new Promise((resolve, reject) => {
fs.readFile(<directory path + filename>, 'utf-8', (err, content) => {
if (err) {
return reject(err);
}
resolve(content);
});
});
... // do things with fileContent
}
} catch (err) {
console.log(err);
}
});
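The same sequential approach can be written with the promise-based fs API, which removes the manual Promise wrapper. This is a minimal sketch: readAllSequentially and the temporary demo files are made up for the example, and it assumes the directory contains only regular files.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Sketch: read every file in a directory one at a time using fs.promises.
// Assumes `dir` holds only regular files (a subdirectory would raise EISDIR).
async function readAllSequentially(dir) {
  const contents = {};
  const filenames = await fs.promises.readdir(dir);
  for (const filename of filenames) {
    // Each read is awaited, so only one file is open at any moment.
    contents[filename] = await fs.promises.readFile(path.join(dir, filename), 'utf-8');
  }
  return contents;
}

// Demo against a throwaway directory with hypothetical file names.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'emfile-demo-'));
fs.writeFileSync(path.join(demoDir, 'a.txt'), 'alpha');
fs.writeFileSync(path.join(demoDir, 'b.txt'), 'beta');
const allRead = readAllSequentially(demoDir);
allRead.then((contents) => console.log(contents));
```

Reading strictly one file at a time is the slowest safe option. It trades throughput for a guarantee that these reads never hold more than one descriptor.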
Here's my two cents: considering a CSV file is just lines of text, I streamed the data (strings) to avoid this problem.
This was the easiest solution that worked for my use case.
It can be used with graceful-fs or the standard fs. Just note that there won't be headers in the file when it is created.
// import graceful-fs or normal fs
const fs = require("graceful-fs"); // or use: const fs = require("fs")
// Create output file and set it up to receive streamed data
// Flag is to say "append" so that data can be recursively added to the same file
let fakeCSV = fs.createWriteStream("./output/document.csv", {
flags: "a",
});
The data that needs to be streamed to the file is written like this:
// create custom streamer that can be invoked when needed
const customStreamer = (dataToWrite) => {
fakeCSV.write(dataToWrite + "\n");
};
Note that dataToWrite is simply a string with a custom separator like ";" or ",".
i.e.
const dataToWrite = "batman" + ";" + "superman"
customStreamer(dataToWrite);
This writes "batman;superman" to the file.
Note that there's no error handling whatsoever in this example.
Docs: https://nodejs.org/api/fs.html#fs_fs_createwritestream_path_options
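For completeness, here is one way the missing error handling could look. It is a sketch under assumptions: writeRows is a hypothetical helper, the rows and output path are demo values, and it does not handle backpressure, which matters for very large outputs.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Sketch: append rows to a CSV through a single stream, with basic error handling.
// Resolves once everything has been flushed, rejects on a stream error.
function writeRows(filePath, rows, separator = ';') {
  return new Promise((resolve, reject) => {
    const stream = fs.createWriteStream(filePath, { flags: 'a' });
    stream.on('error', reject);   // e.g. missing directory or no permission
    stream.on('finish', resolve); // emitted after end() once all data is flushed
    for (const row of rows) {
      stream.write(row.join(separator) + '\n');
    }
    stream.end();
  });
}

// Demo: write two rows to a temporary file and print the result.
const outFile = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'csv-demo-')), 'document.csv');
const written = writeRows(outFile, [['batman', 'superman'], ['flash', 'aquaman']]);
written.then(() => console.log(fs.readFileSync(outFile, 'utf-8')));
```

Because all rows go through one stream, only one file descriptor is used no matter how many rows are written.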
This will probably fix your problem if you're struggling to deploy a React solution that was created with the Visual Studio template (and has a web.config). In Azure Release Pipelines, when selecting the template, use:
Azure App Service deployment
Instead of:
Deploy a Node.js app to Azure App Service
It worked for me!
There's another possibility that hasn't been considered or discussed in any of the answers so far: symbolic link cycles.
Node's recursive filesystem watcher does not appear to detect and handle cycles of symlinks. So you can easily trigger this error with an arbitrarily high nfiles ulimit by simply running:
mkdir a
mkdir a/b
cd a/b
ln -s .. c
GNU find will notice the symlink cycle and abort:
$ find a -follow
a
a/b
find: File system loop detected; ‘a/b/c’ is part of the same file system loop as ‘a’.
but node won't. If you set up a watch on the tree, it'll spew an EMFILE, too many open files error.
Amongst other things this can happen in node_modules where there's a containment relationship:
parent/
package.json
child/
package.json
which is how I encountered it in a project I was trying to build.
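To check whether a tree contains such a loop before watching it, you can walk it yourself and remember the real path of every directory you enter. This is a minimal sketch: walkDetectingCycles is a hypothetical helper, it recreates the a/b/c layout from above in a temporary directory, and it assumes there are no broken symlinks (fs.statSync would throw on those).

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Sketch: walk a tree following symlinks, skipping directories already visited.
// Returns the paths that led back to a directory seen earlier.
function walkDetectingCycles(root) {
  const seen = new Set(); // real paths of directories already entered
  const cycles = [];
  function visit(dir) {
    const real = fs.realpathSync(dir);
    if (seen.has(real)) {
      cycles.push(dir); // this path resolves to a directory visited before
      return;
    }
    seen.add(real);
    for (const name of fs.readdirSync(dir)) {
      const full = path.join(dir, name);
      if (fs.statSync(full).isDirectory()) visit(full); // statSync follows symlinks
    }
  }
  visit(root);
  return cycles;
}

// Recreate the a/b/c -> .. layout in a temporary directory.
const base = fs.mkdtempSync(path.join(os.tmpdir(), 'cycle-demo-'));
fs.mkdirSync(path.join(base, 'a', 'b'), { recursive: true });
fs.symlinkSync('..', path.join(base, 'a', 'b', 'c'));
const cycles = walkDetectingCycles(path.join(base, 'a'));
console.log(cycles);
```

Any path reported here is a candidate for an ignore rule in whatever tool is watching the tree. Note that this check also flags a directory reachable through two different symlinks even when there is no true loop.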
Note that you don't necessarily need to overcomplicate this issue, trying again works just fine.
import { promises as fs } from "fs";
const filepaths = [];
const errors = [];
function process_file(content: string) {
// logic here
}
await Promise.all(
filepaths.map(function read_each(filepath) {
return fs
.readFile(filepath, "utf8")
.then(process_file)
.catch(function (error) {
if (error.code === "EMFILE") return read_each(filepath);
else errors.push({ file: filepath, error });
});
}),
);
On Windows, there seems to be no ulimit command to increase the number of open files. graceful-fs maintains a queue to run I/O operations, e.g. reading and writing files.
However, fs.readFile and fs.writeFile are based on fs.open, so you may need to open and close files manually to solve this error.
import fs from 'fs/promises';
const fd = await fs.open('path-to-file', 'r');
await fd.readFile('utf-8'); // <== read through file handle
await fd.close(); // <== manually close it
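One caveat with closing manually is that an exception during the read would skip the close and leak the descriptor. A try/finally guards against that. This is a sketch in CommonJS form; readWithHandle and the temporary demo file are made up for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Sketch: read through a FileHandle and always release the descriptor.
async function readWithHandle(filePath) {
  const handle = await fs.promises.open(filePath, 'r');
  try {
    return await handle.readFile('utf-8');
  } finally {
    await handle.close(); // runs even if readFile throws
  }
}

// Demo with a throwaway file.
const demoFile = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'handle-demo-')), 'note.txt');
fs.writeFileSync(demoFile, 'hello');
const text = readWithHandle(demoFile);
text.then((t) => console.log(t));
```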
I had this issue, and I solved it by running npm update.
In some cases you may need to remove node_modules: rm -rf node_modules/
This may happen after changing the Node version.
ERR emfile too many open files
Restart the computer
brew install watchman
That should fix the issue.
First update your version of expo using expo update and then run yarn / npm install. This solved the issue for me!