Excessive Recursion of Node.js Project Dependencies - javascript

So after a long day at work, I sit down and see an alert for Windows SkyDrive in the system tray:
Files can't be uploaded because the path of this file or folder is too long. Move the item to a different location or shorten its name.
C:\Users\Matthew\SkyDrive\Documents\Projects\Programming\angular-app\server\node_modules\grunt-contrib-nodeunit\node_modules\nodeunit\node_modules\tap\node_modules\runforcover\node_modules\bunker\node_modules\burrito\node_modules\traverse\example\stringify.js
... and for a while, I laughed at that technological limitation.
But then, I wondered: is that amount of directory recursion within a Node project really necessary? It would appear that the paths beyond "angular-app\server\node_modules" are simply dependencies of the project as a whole and might be better expressed as:
C:\Users\Matthew\SkyDrive\Documents\Projects\Programming\angular-app\server\node_modules\grunt-contrib-nodeunit\
C:\Users\Matthew\SkyDrive\Documents\Projects\Programming\angular-app\server\node_modules\nodeunit\
C:\Users\Matthew\SkyDrive\Documents\Projects\Programming\angular-app\server\node_modules\tap\
C:\Users\Matthew\SkyDrive\Documents\Projects\Programming\angular-app\server\node_modules\runforcover\
C:\Users\Matthew\SkyDrive\Documents\Projects\Programming\angular-app\server\node_modules\bunker\
C:\Users\Matthew\SkyDrive\Documents\Projects\Programming\angular-app\server\node_modules\burrito\
C:\Users\Matthew\SkyDrive\Documents\Projects\Programming\angular-app\server\node_modules\traverse\
I hadn't really given it much thought before, as package management in Node seems like magic compared to many platforms.
I would imagine that some large-scale Node.js projects even contain many duplicate modules (of the same or similar versions) which could be consolidated into fewer copies. It could be argued that:
The increased amount of data stored and transmitted as a result of duplicate dependencies adds to the cost of developing software.
Shallower directory structures (especially in this context) are often easier to navigate and understand.
Excessively long path names can cause problems in some computing environments.
What I am proposing (if such a thing does not exist) is a Node module that:
Recursively scans a Node project, collecting a list of the nested node_modules folders and how deeply they are buried in relation to the root of the project (a rough sketch of this step appears below).
Moves the contents of each nested node_modules folder to the main node_modules folder, editing the require() calls of each .js file such that no references are broken.
Handles multiple versions of duplicate dependencies.
If nothing else, it would make for an interesting experiment. What do you guys think? What potential problems might I encounter?
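For illustration, a first pass at that first step might look roughly like this (a sketch only; it assumes it is run from the project root, ignores symlinks, and does no error handling):

const fs = require('fs');
const path = require('path');

// Walk the tree under the top-level node_modules and record every nested
// node_modules folder along with how deeply it is buried.
function findNestedNodeModules(dir, depth = 0, found = []) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const full = path.join(dir, entry.name);
    if (entry.name === 'node_modules') {
      found.push({ path: full, depth });
    }
    findNestedNodeModules(full, depth + 1, found);
  }
  return found;
}

console.log(findNestedNodeModules(path.join(process.cwd(), 'node_modules')));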

See if npm dedupe sets you right. API doc here.

See fenestrate, npm-flatten, flatten-packages, npm dedupe, and multi-stage-installs.
Quoting Sam Mikes from this StackOverflow question:
npm will add dedupe-at-install-time by default. This is significantly more feasible than Node's module system changing, but it is still not exactly trivial, and involves a lot of reworking of some long-entrenched patterns.
This is (finally) currently in the works at npm, going by the name multi-stage-install, and is targeted for npm#3. npm development lead Forrest Norvell is going to spend some time running on Windows in the new year, so please do create windows-related issues on the npm issue tracker < https://github.com/npm/npm/issues >

Related

Any ideas on getting away from lengthy relative imports in cloud functions? Written in node.js and javascript

I was wondering if anyone here had some ideas or experience getting rid of long, hard-to-maintain relative imports inside of large node.js cloud functions projects. We’ve found that the approach which uses local NPM packages is very sub-optimal because of how quickly we tend to roll out and test new packages and functionality, and refactoring from JS to TS is impossible for us at the moment. We'd love to do it in the near future but are so slammed as it is currently :(
Basically what I'm trying to do in cloud functions is go from const {helperFunction} = require('../../../../../../helpers')
to
const {helperFunction} = require('helpers')
I have been unable to get babel or anything similar working in cloud functions. Intuitively I feel like there is an obvious solution to this beyond something like Artifact Registry or local NPM packages, but I'm not seeing it! Any help would be greatly, greatly appreciated :)
For CommonJS modules in general, you have several options. I don't know the Google Cloud environment so you will have to decide which options seem appropriate to it.
The most attractive option to me is #5 as it's just a built-in search up the directory tree, designed specifically to look in the node_modules sub-directories of your parent directories. If you can install the shared modules in that way, then nodejs should be able to find them as long as your directory hierarchy is retained by Google Cloud.
Options #3 and #4 are hacking on the loader, which has its own risks, but does give you implementation flexibility since you could implement your own prefix that looks in a particular spot. But it's hacking, and may or may not work in the Google Cloud environment.
Options #1 and #2 rely on environment variables and shared directories which may or may not be relevant in the Google cloud environment.
1. You can specify the environment variable NODE_PATH as a colon-delimited (or semicolon-delimited on Windows) list of paths to search for modules. Doc here.
2. In addition, nodejs will search $HOME/.node_modules and $HOME/.node_libraries, where $HOME is the user's home directory.
3. There is a package called module-alias here that is designed specifically to help you solve this. You define module aliases in your package.json, import this one module, and then you can use the directory aliases in your require() statements (a minimal sketch appears after this list).
4. You can make your own pre-processor for resolving a module filename that is being loaded by require(). You can do this by monkey patching Module._resolveFilename to either modify the filename passed to it or to add additional search paths to the options argument. This is the general concept that the module-alias package (mentioned in point #3 above) uses.
5. If the actual location of the helper module you want to load is in a node_modules directory somewhere above your current module directory on this volume, it can be found automatically as long as the require is just a filename, as in require("helpers"). An example in the doc here describes this:
For example, if the file at /home/ry/projects/foo.js called require('bar.js'), then Node.js would look in the following locations, in this order:
/home/ry/projects/node_modules/bar.js
/home/ry/node_modules/bar.js
/home/node_modules/bar.js
/node_modules/bar.js
You can see how it automatically searches up the directory tree, looking in each parent's node_modules sub-directory all the way up to the root. If you put these common, shared modules in your main project's node_modules, or in any directory above it, they will be found automatically. This might be the simplest way to do things, as it's just a matter of directory structure and removes all the ../../ stuff you have in the paths. Clearly these common, shared modules are already located somewhere common - you just need to make sure they're in this search hierarchy so they can be found automatically.
Note this info is for CommonJS modules and may be different for ESM modules.
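To make option #3 concrete, a minimal module-alias setup might look like the following (a sketch; the alias name and helpers path are assumptions based on the question, not something the cloud functions docs prescribe). In package.json:

"_moduleAliases": {
  "helpers": "./helpers"
}

Then, in the entry point, register the aliases before anything else requires them:

require('module-alias/register');
const { helperFunction } = require('helpers');   // instead of require('../../../../../../helpers')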

What's a good way to include an npm module in git

Kind of a noob question here...
We've got a private npm module - a library - that needs to be included in other projects. So far, very simple.
Currently, we're simply "remembering" to manually do an npm run build before pushing changes to our git repo; then, when dependent projects do an npm run whatever, they're set up to pull from our repo and use the latest version already "compiled" as a module.
So, there are issues with this approach:
It relies on humans remembering to do a build before pushing to origin (inherently fragile).
VSCode constantly shows me the build artifacts as if they were source files. Git similarly shows merge conflicts relating to those files, which really aren't source at all. They're compilation artifacts, but I'm not sure I want to .gitignore them, because the point of all of this is to create those artifacts for use in other projects... so they belong in the repo, just not as source files...
So I'm not sure how to untangle this mess.
I want:
A simple way to update the source that doesn't cause git to become upset about merge conflicts for build artifacts, but only for actual source files
A simple way to ensure that the build artifacts are always updated upon push to origin (in fact, I'd prefer that it build & run our mocha tests and refuse to do a push if that fails)
I'm only about 9mos into using git on github - so there's a ton I don't know...
Ideas for better ways to manage this / automate this - are most welcome!
The key to implementation is of course simplicity. If it's easy to do, I'm sure I'll do it, and can get others to do so. But if it's a huge hurdle every time, well, we all know how well that goes over for other devs...
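For the second wish (build, run the tests, and refuse the push if they fail), a local git pre-push hook is one low-friction option; a minimal sketch, assuming npm scripts named build and test exist (tools like husky can share the hook across a team):

#!/bin/sh
# .git/hooks/pre-push — git aborts the push if this script exits non-zero
npm run build && npm test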

Is it okay to link node modules files in webpack.config.js?

In some projects, I saw developers didn't link to node_modules files in webpack.config.js (e.g. "./node_modules/bootstrap/dist/js/bootstrap.bundle.js"); instead, they copied the file to assets/js and linked it there. Some of my friends also told me that they prefer this option because they never feel safe linking to node_modules (I guess because somebody may run npm update...?)
What would you call a "good practice"? Is it totally fine to link to node_modules? If not - what wrong can happen?
I used this method in small projects, as I don't think there is a need for doubling files, but in larger ones - for peace of mind - I used the path to assets.
It can be okay to do it. Purely from the build step perspective, it doesn't make a difference.
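For example, either of these entries ends up in the bundle the same way (a sketch; the bootstrap path is the one from the question, the assets path is illustrative):

// webpack.config.js (excerpt)
module.exports = {
  entry: [
    './src/index.js',
    './node_modules/bootstrap/dist/js/bootstrap.bundle.js', // linked straight from node_modules
    // './src/assets/js/bootstrap.bundle.js',               // or a copied ("vendored") file
  ],
};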
The trade offs you are making between using the node modules as npm provides them (node_modules) and storing your own copies, in an assets or vendors folder, are about:
security
source code management & development efficiency
storage space
When the thousands of developers around the world create little pet projects and push them to GitHub, it wouldn't make sense for all of them to store their own copy of jQuery and then push it into their GitHub repo. Instead we push a package.json file that lists it as a dependency; we do this for every third-party dependency and avoid creating a repository where a lot (even most) of the code is not application code, but dependencies. That is good.
On the other hand, if a developer always downloads dependencies every time a new project is started/cloned/forked, you potentially risk, with every module download, the chance of installing a compromised package version. For this we solve with vulnerability scanners, semantic versioning and lock files (package-lock.json) to give you control on how and when you get updates.
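As a rough illustration of that control (shapes simplified): in package.json you ask for a compatible range,

"dependencies": { "lodash": "^4.17.0" }

while package-lock.json pins the exact resolved version and its integrity hash, so every install reproduces the same tree until you deliberately update:

"node_modules/lodash": { "version": "4.17.21", "integrity": "sha512-..." }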
Another problem with downloading always is the bandwidth it consumes. For this we solve with a local cache. So, even if you uninstall a module from one project, npm doesn't really delete it from your drive. It keeps a copy on a cache folder. This works really well for most developers, but not so much in an enterprise environment with massive applications.
A problem that has already impacted the world severely is that if a module author decides to delete the code, lots of apps stop working because they can't find the dependency anymore. See how left-pad broke Node, Babel... (It also broke things at my work.)
The issue with moving things out from node_modules to assets is that if your app has 100 dependencies, you are not going to want to do that 100 times. You might as well save the complete source code found in node_modules in your source control system. That comes at a price, of course: that folder can have a huge size.
A good balance can be found by using different tools and approaches. Whether you vendorize third-party dependencies (store your own copy) or not depends on what has the better cost/risk ratio in your situation.

How does Node.JS handle duplicate transitive dependencies?

I apologize if my questions are naïve. In full disclosure I'm relatively new to Node.JS and JavaScript in general. I'm hoping someone could shed some light on how Node.JS handles duplicate, possibly transitive, dependencies? Not even in terms of the global namespace or any kind of conflicts, or different versions of the same module (e.g. v0.1 vs v0.2 elsewhere in your app), but more around being smart and efficient where possible. For example:
Is there any chance that Node is smart enough in terms of footprint to not have multiple copies of the same exact version of the library in your modules folder? Something like 1 copy for each required version with symbolic links or something similar pointing to this code for each module that depends on that version of that module?
What about in terms of loading duplicate modules into memory at runtime? If v0.1 of module x is already loaded into memory, if some other depended upon module comes along that requires that same version of that module, will the code be re-loaded into memory, or is Node smart enough to see that code is already loaded and re-use it? How sandboxed is Node in this regard?
Thanks!
Node.js has no concept of versions. The require() function resolves its argument to a full path to a .js file and caches the loaded module by that resolved filename.
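You can see the caching directly (a quick check, assuming lodash happens to be installed; any module works):

// Two requires that resolve to the same file return the very same object.
const a = require('lodash');
const b = require('lodash');
console.log(a === b);                      // true — evaluated once, then cached
console.log(Object.keys(require.cache));   // the resolved filenames currently cached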
You may be asking how npm installs modules; that depends on the order in which you install them.
You can run npm dedupe to flatten out duplicated packages where their version ranges allow it.

How can I convert a multi-file node.js app to a single file?

If I have a node.js application that is filled with many require statements, how can I compile this into a single .js file? I'd have to manually resolve the require statements and ensure that the classes are loaded in the correct order. Is there some tool that does this?
Let me clarify.
The code that is being run on node.js is not node specific. The only thing I'm doing that doesn't have a direct browser equivalent is using require, which is why I'm asking. It is not using any of the node libraries.
You can use webpack with target: 'node'; it will inline all required modules and emit everything as a single, standalone, one-file Node.js module.
https://webpack.js.org/configuration/target/#root
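A minimal config along those lines might look like this (a sketch; the entry and output names are assumptions):

// webpack.config.js
const path = require('path');
module.exports = {
  target: 'node',                  // keep Node built-ins (fs, path, ...) out of the bundle
  mode: 'production',
  entry: './src/index.js',
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: 'bundle.js',
  },
};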
2021 edit: There are now other solutions you could investigate. Namely:
https://esbuild.github.io
https://github.com/huozhi/bunchee
Try the following:
npm i -g @vercel/ncc
ncc build app.ts -o dist
See details here: https://stackoverflow.com/a/65317389/1979406
If you want to send common code to the browser I would personally recommend something like brequire or requireJS which can "compile" your nodeJS source into asynchronously loading code whilst maintaining the order.
For an actual compiler into a single file you might get away with one for requireJS but I would not trust it with large projects with high complexity and edge-cases.
It shouldn't be too hard to write a file, like the package.json that npm uses, stating the order in which the files should occur in your packaging. This way it's your responsibility to make sure everything is compacted in the correct order; you can then write a simplistic node application that reads your package.json file and uses file IO to create your compiled script.
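A naive version of that could be as small as the following (a sketch; the bundleOrder field is hypothetical, not a real package.json key):

const fs = require('fs');

// Hypothetical package.json field listing source files in load order.
const { bundleOrder } = JSON.parse(fs.readFileSync('package.json', 'utf8'));
const bundle = bundleOrder.map((file) => fs.readFileSync(file, 'utf8')).join('\n;\n');
fs.writeFileSync('compiled.js', bundle);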
Automatically generating the order in which files should be packaged requires building up a dependency tree and doing lots of file parsing. It should be possible but it will probably crash on circular dependencies. I don't know of any libraries out there to do this for you.
Do NOT use requireJS if you value your sanity. I've seen it used in a largish project and it was an absolute disaster ... maybe the worst technical choice made at that company. RequireJS is designed to run in-browser and to asynchronously and recursively load JS dependencies. That is a TERRIBLE idea. Browsers suck at loading lots and lots of little files over the network; every single doc on web performance will tell you this. So you'll very very quickly end up needing a solution to smash your JS files together ... at which point, what's the point of having an in-browser dependency resolution mechanism?

And even though your production site will be smashed into a single JS file, with requireJS your code must constantly assume that any dependency might or might not be loaded yet; in a complex project, this leads to thousands of async load barriers wrapping every interaction point between modules. At my last company, we had some places where the closure stack was 12+ levels deep. All that "if loaded yet" logic makes your code more complex and harder to work with. It also bloats the code, increasing the number of bytes sent to the client. Plus, the client has to load the requireJS library itself, which burns another 14.4k. The size alone should tell you something about the level of feature creep in the requireJS project. For comparison, the entire underscore.js toolkit is only 4k.
What you want is a compile-time step for smashing JS together, not a heavyweight framework that will run in the browser....
You should check out https://github.com/substack/node-browserify
Browserify does exactly what you are asking for .... combines multiple NPM modules into a single JS file for distribution to the browser. The consolidated code is functionally identical to the original code, and the overhead is low (approx 4k + 140 bytes per additional file, including the "require('file')" line). If you are picky, you can cut out most of that 4k, which provides wrappers to emulate common node.js globals in the browser (eg "process.nextTick()").
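Typical usage is a single command over your entry file (a sketch; main.js stands in for whatever your entry point is):

npm install -g browserify
browserify main.js -o bundle.js    # bundles main.js plus everything it requires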
