Webpack - Bundling legacy files that depend on each other globals - javascript

I'm working with a legacy project and I try to convert it to a modern webpack project, with minimum changes to the original legacy files.
The problem is that many of these legacy files rely on each other's globals, e.g:
legacy1.js
console.log("Legacy1");
function globalMess() {
console.log("I am doing global mess!")
}
legacy2.js
console.log("Legacy2");
globalMess();%
The ideas solution would be one that:
Allows all the legacy files use each other globals without me searching for every global.
The globals would be only global to the legacy files, and not leak out to the real global scope.
Allow me to use Webpack's cache busting hash on the resulting legacy file. So that if I do I fix a bug in one the legacy files, the resulting file would have a new hash (e.g legacy1.abc.js).
Automatically add the resulting file to index.html usings something like HtmlWebpackPlugin.
What I've considered so far
These are solutions I've considered:
Solution 1
Simply inject them manually into index.html, in the same order they appear in the original project. This solution would have no cache busting and would leak globals.
Solution 2
Using ExposeLoader/ProvidePlugin. Those require me know specify each and every global that each library exposes. This doesn't make sense to me as I have 20 legacy files and each exposes some random function and I don't really know which exposes what.
Solution 3
Using webpack-raw-bundler. It concats the files, pretty much like I wanted. However, it doesn't support cache busting via hash or automatically adding itself to index.html.
Ideal solution
So what I imagine as an ideal solution would be a webpack plugin that globs all the legacy files, bundles them all as into a single file in its own function and produces a single file with a hash. That file will also get injected into the index.html using the logic of HtmlWebpackPlugin.
Its usage would look something like that:
new LegacyBundlePlugin({
uglify: false,
sourceMap: false,
name: "legacy",
fileName: "[name].[hash:8].js",
filesToConcat: ["./legacy1.js", "./legacy2.js"],
}),
Which would produce the following:
(function () {
console.log("Legacy1");
function globalMess() {
console.log("I am doing global mess!")
}
console.log("Legacy2");
globalMess();
})();

Currently the best solution I found is using webpack-concat-plugin.
It bundles everything to a single file, allows me to use cache busting and adds itself via HtmlWebpackPlugin to the resulting html.
The only thing it doesn't do is not-leaking to the global scope as it simply concats the provided files and doesn't put them into any scope.

Related

Encapsulating requireJS and its modules

I'd like to use something like requireJS to modularise my javascript app, however, there's one particular requirement I'd like to meet that I haven't figured out:
I'm happy enough to have require and define in the global space, however, I don't want modules that I import with require to be accessible globally (i.e. they should be accessible to my app, but not other apps running on the same page).
It seems to me that if I call define('moduleA', function() { ... }); I can then access that module - globally - via the require function. It may not be occupying a variable in the global space itself, or be attached to window, but it still feels bad, because other apps really shouldn't be able to see my internal modules (not to mention potential naming conflicts, etc, can I use contexts to circumvent this?).
This seems to be a step back from just namespacing my modules and including them all inside of one big privatising function at build time.
I could have my own private version of require but my modules (being in different files) wouldn't be able to access define.
Am I missing something or do I just have to live with this? (or alternatively, run an optimizer to bake everything into one file - feels like I could just namespace my modules and not bother with requireJS at all if I do that though).
In your r.js build config add wrap: true and you will no longer pollute the glabal scope with require or define functions.
Edit:
You can also specify a namespace for your require and define with the namespace setting in the r.js build config. You can read more about namespacing in RequireJS's FAQ. From the example.build.js comments:
// Allows namespacing requirejs, require and define calls to a new name.
// This allows stronger assurances of getting a module space that will
// not interfere with others using a define/require AMD-based module
// system. The example below will rename define() calls to foo.define().
// See http://requirejs.org/docs/faq-advanced.html#rename for a more
// complete example.
namespace: 'foo',

require.js - How can I set a version on required modules as part of the URL?

I am using require.js to require JS modules in my application.
I need a way to bust client cache on new JS modules, by way of a different requested URL.
i.e., if the file hello/there.js has already been cached on the client, I can change the file name to force the browser to get the new file.
In other words, for the module hello/there, I'd like require.js to request the url hello/there___v1234___.js (the file name can look different, it's just an example), according to a version string which is accessible on the client.
What is the best way to achieve that?
I got really frustrated with the urlArgs solution and finally gave up and implemented my own fix directly into require.js. This fix implements your ideal solution, if you are willing to modify your version of the library.
You can see the patch here:
https://github.com/jbcpollak/requirejs/commit/589ee0cdfe6f719cd761eee631ce68eee09a5a67
Once added, you can do something like this in your require config:
var require = {
baseUrl: "/scripts/",
cacheSuffix: ".buildNumber"
}
Use your build system or server environment to replace buildNumber with a revision id or software version.
Using require like this:
require(["myModule"], function() {
// no-op;
});
Will cause require to request this file:
http://yourserver.com/scripts/myModule.buildNumber.js
The patch will ignore any script that specifies a protocol, and it will not affect any non-JS files.
On our server environment, we use url rewrite rules to strip out the buildNumber, and serve the correct JS file. This way we don't actually have to worry about renaming all our JS files.
This works well for my environment, but I realize some users would prefer a prefix rather than a suffix, it should be easy to modify my commit to suit your needs.
Here are some possible duplicate questions:
RequireJS and proxy caching
Prevent RequireJS from Caching Required Scripts
OK, I googled "requirejs cache bust" for you and found this existing SO answer, which says you can configure requireJS with an urlArgs parameter, which is only a partial solution, but it might be enough to meet your immediate needs.
That said, the cachebusting problem is full of challenges and many "solutions" don't actually solve the problem in its entirety. The only maintainable way to do this (IMHO as of now) is with a full-on asset management system like the Ruby on Rails asset pipeline or connect-assets or the equivalent for your server side framework of choice. Those can correct compute a checksum (usually MD5 or SHA1) of the content of each file and give you the file names you need to put as URLs in your HTML script tags. So, don't bother with manually changing filenames based on version numbers, just use checksums since they are easily automated and foolproof.
From what I can tell, out of the box requirejs can't do the cachebusting aspect for you. You might want to read this google groups thread. Otherwise, you may need to pair requirejs with an additional tool/script to get you good cachebuster checksums.
Just do it like the creator of requirejs suggests:
var load = requirejs.load;
requirejs.load = function (context, moduleId, url) {
// modify url here
url = url.substring(0, url.lastIndexOf('.')) + '.' + VERSION + url.substring(url.lastIndexOf('.'));
return load(context, moduleId, url);
};
https://github.com/jrburke/requirejs/wiki/Fine-grained-URL-control
HTML5 Boilerplate has ant-build-script that renames your files and any reference to them for this exact reason and can do alot more. It's worth checking out if you haven't already.

Approaches to modular client-side Javascript without namespace pollution

I'm writing client-side code and would like to write multiple, modular JS files that can interact while preventing global namespace pollution.
index.html
<script src="util.js"></script>
<script src="index.js"></script>
util.js
(function() {
var helper() {
// Performs some useful utility operation
}
});
index.js
(function () {
console.log("Loaded index.js script");
helper();
console.log("Done with execution.");
})
This code nicely keeps utility functions in a separate file and does not pollute the global namespace. However, the helper utility function will not be executed because 'helper' exists inside a separate anonymous function namespace.
One alternative approach involves placing all JS code inside one file or using a single variable in the global namespace like so:
var util_ns = {
helper: function() {
// Performs some useful utility operation.
},
etc.
}
Both these approaches have cons in terms of modularity and clean namespacing.
I'm used to working (server-side) in Node.js land where I can 'require' one Javascript file inside another, effectively injecting the util.js bindings into the index.js namespace.
I'd like to do something similar here (but client-side) that would allow code to be written in separate modular files while not creating any variables in the global namespace while allowing access to other modules (i.e. like a utility module).
Is this doable in a simple way (without libraries, etc)?
If not, in the realm of making client-side JS behave more like Node and npm, I'm aware of the existence of requireJS, browserify, AMD, and commonJS standardization attempts. However, I'm not sure of the pros and cons and actual usage of each.
I would strongly recommend you to go ahead with RequireJS.
The modules support approach (without requires/dependencies):
// moduleA.js
var MyApplication = (function(app) {
app.util = app.util || {};
app.util.hypotenuse = function(a, b) {
return Math.sqrt(a * a + b * b);
};
return app;
})(MyApplication || {});
// ----------
// moduleB.js
var MyApplication = (function(app) {
app.util = app.util || {};
app.util.area = function(a, b) {
return a * b / 2;
};
return app;
})(MyApplication || {});
// ----------
// index.js - here you have to include both moduleA and moduleB manually
// or write some loader
var a = 3,
b = 4;
console.log('Hypotenuse: ', MyApplication.util.hypotenuse(a, b));
console.log('Area: ', MyApplication.util.area(a, b));
Here you're creating only one global variable (namespace) MyApplication, all other stuff is "nested" into it.
Fiddle - http://jsfiddle.net/f0t0n/hmbb7/
**One more approach that I used earlier in my projects - https://gist.github.com/4133310
But anyway I threw out all that stuff when started to use RequireJS.*
You should check out browserify, which will process a modular JavaScript project into a single file. You can use require in it as you do in node.
It even gives a bunch of the node.js libs like url, http and crypto.
ADDITION: In my opinion, the pro of browserify is that it is simply to use and requires no own code - you can even use your already written node.js code with it. There's no boilerplate code or code change necessary at all, and it's as CommonJS-compliant as node.js is. It outputs a single .js that allows you to use require in your website code, too.
There are two cons to this, IMHO: First is that two files that were compiled by browserify can override their require functions if they are included in the same website code, so you have to be careful there. Another is of course you have to run browserify every time to make change to the code. And of course, the module system code is always part of your compiled file.
I strongly suggest you try a build tool.
Build tools will allow you to have different files (even in different folders) when developing, and concatenating them at the end for debugging, testing or production. Even better, you won't need to add a library to your project, the build tool resides in different files and are not included in your release version.
I use GruntJS, and basically it works like this. Suppose you have your util.js and index.js (which needs the helper object to be defined), both inside a js directory. You can develop both separately, and then concatenate both to an app.js file in the dist directory that will be loaded by your html. In Grunt you can specify something like:
concat: {
app: {
src: ['js/util.js', 'js/index.js'],
dest: 'dist/app.js'
}
}
Which will automatically create the concatenation of the files. Additionally, you can minify them, lint them, and make any process you want to them too. You can also have them in completely different directories and still end up with one file packaged with your code in the right order. You can even trigger the process every time you save a file to save time.
At the end, from HTML, you would only have to reference one file:
<script src="dist/app.js"></script>
Adding a file that resides in a different directory is very easy:
concat: {
app: {
src: ['js/util.js', 'js/index.js', 'js/helpers/date/whatever.js'],
dest: 'dist/app.js'
}
}
And your html will still only reference one file.
Some other available tools that do the same are Brunch and Yeoman.
-------- EDIT -----------
Require JS (and some alternatives, such as Head JS) is a very popular AMD (Asynchronous Module Definition) which allows to simply specify dependencies. A build tool (e.g., Grunt) on the other hand, allows managing files and adding more functionalities without relying on an external library. On some occasions you can even use both.
I think having the file dependencies / directory issues / build process separated from your code is the way to go. With build tools you have a clear view of your code and a completely separate place where you specify what to do with the files. It also provides a very scalable architecture, because it can work through structure changes or future needs (such as including LESS or CoffeeScript files).
One last point, having a single file in production also means less HTTP overhead. Remember that minimizing the number of calls to the server is important. Having multiple files is very inefficient.
Finally, this is a great article on AMD tools s build tools, worth a read.
So called "global namespace pollution" is greatly over rated as an issue. I don't know about node.js, but in a typical DOM, there are hundreds of global variables by default. Name duplication is rarely an issue where names are chosen judiciously. Adding a few using script will not make the slightest difference. Using a pattern like:
var mySpecialIdentifier = mySpecialIdentifier || {};
means adding a single variable that can be the root of all your code. You can then add modules to your heart's content, e.g.
mySpecialIdentifier.dom = {
/* add dom methods */
}
(function(global, undefined) {
if (!global.mySpecialIdentifier) global.mySpecialIdentifier = {};
/* add methods that require feature testing */
}(this));
And so on.
You can also use an "extend" function that does the testing and adding of base objects so you don't replicate that code and can add methods to base library objects easily from different files. Your library documentation should tell you if you are replicating names or functionality before it becomes an issue (and testing should tell you too).
Your entire library can use a single global variable and can be easily extended or trimmed as you see fit. Finally, you aren't dependent on any third party code to solve a fairly trivial issue.
You can do it like this:
-- main.js --
var my_ns = {};
-- util.js --
my_ns.util = {
map: function () {}
// .. etc
}
-- index.js --
my_ns.index = {
// ..
}
This way you occupy only one variable.
One way of solving this is to have your components talk to each other using a "message bus". A Message (or event) consists of a category and a payload. Components can subscribe to messages of a certain category and can publish messages. This is quite easy to implement, but there are also some out of the box-solutions out there. While this is a neat solution, it also has a great impact on the architecture of your application.
Here is an example implementation: http://pastebin.com/2KE25Par
http://brunch.io/ should be one of the simplest ways if you want to write node-like modular code in your browser without async AMD hell. With it, you’re also able to require() your templates etc, not just JS files.
There are a lot of skeletons (base applications) which you can use with it and it’s quite mature.
Check the example application https://github.com/paulmillr/ostio to see some structure. As you may notice, it’s written in coffeescript, but if you want to write in js, you can — brunch doesn’t care about langs.
I think what you want is https://github.com/component/component.
It's synchronous CommonJS just like Node.js,
it has much less overhead,
and it's written by visionmedia who wrote connect and express.

How to manage multiple JS files server-side with Node.js

I'm working on a project with Node.js and the server-side code is becoming large enough that I would like to split it off into multiple files. It appears this has been done client-side for ages, development is done by inserting a script tag for each file and only for distribution is something like "Make" used to put everything together. I realize there's no point in concatting all the server-side code so I'm not asking how to do that. The closest thing I can find to use is require(), however it doesn't behave quite like script does in the browser in that require'd files do not share a common namespace.
Looking at some older Node.js projects, like Shooter, it appears this was once not the case, that or I'm missing something really simple in my code. My require'd files cannot access the global calling namespace at compile time nor run time. Is there any simple way around this or are we forced to make all our require'd JS files completely autonomous from the calling scope?
You do not want a common namespace because globals are evil. In node we define modules
// someThings.js
(function() {
var someThings = ...;
...
module.exports.getSomeThings = function() {
return someThings();
}
}());
// main.js
var things = require("someThings");
...
doSomething(things.getSomeThings());
You define a module and then expose a public API for your module by writing to exports.
The best way to handle this is dependency injection. Your module exposes an init function and you pass an object hash of dependencies into your module.
If you really insist on accessing global scope then you can access that through global. Every file can write and read to the global object. Again you do not want to use globals.
re #Raynos answer, if the module file is next to the file that includes it, it should be
var things = require("./someThings");
If the module is published on, and installed through, npm, or explicitly put into the ./node_modules/ folder, then the
var things = require("someThings");
is correct.

JavaScript dependency management

I am currently maintaining a large number of JS files and the dependency issue is growing over my head. Right now I have each function in a separate file and I manually maintain a database to work out the dependencies between functions.
This I would like to automate. For instance if I have the function f
Array.prototype.f = function() {};
which is referenced in another function g
MyObject.g = function() {
var a = new Array();
a.f();
};
I want to be able to detect that g is referencing f.
How do I go about this? Where do I start? Do I need to actually write a compiler or can I tweak Spidermonkey for instance? Did anyone else already do this?
Any pointers to get me started is very much appreciated
Thanks
Dok
Whilst you could theoretically write a static analysis tool that detected use of globals defined in other files, such as use of MyObject, you couldn't realistically track usage of prototype extension methods.
JavaScript is a dynamically-typed language so there's no practical way for any tool to know that a, if passed out of the g function, is an Array, and so if f() is called on it there's a dependency. It only gets determined what variables hold what types at run-time, so to find out you'd need an interpreter and you've made yourself a Turing-complete problem.
Not to mention the other dynamic aspects of JavaScript that completely defy static analysis, such as fetching properties by square bracket notation, the dreaded eval, or strings in timeouts or event handler attributes.
I think it's a bit of a non-starter really. You're probably better of tracking dependencies manually, but simplifying it by grouping related functions into modules which will be your basic unit of dependency tracking. OK, you'll pull in a few more functions that you technically need, but hopefully not too much.
It's also a good idea to namespace each module, so it's very clear where each call is going, making it easy to keep the dependencies in control manually (eg. by a // uses: ThisModule, ThatModule comment at the top).
Since extensions of the built-in prototypes are trickier to keep track of, keep them down to a bare minimum. Extending eg. Array to include the ECMAScript Fifth Edition methods (like indexOf) on browsers that don't already have them is a good thing to do as a basic fixup that all scripts will use. Adding completely new arbitrary functionality to existing prototypes is questionable.
Have you tried using a dependency manager like RequireJS or LabJS? I noticed no one's mentioned them in this thread.
From http://requirejs.org/docs/start.html:
Inside of main.js, you can use require() to load any other scripts you
need to run:
require(["helper/util"], function(util) {
//This function is called when scripts/helper/util.js is loaded.
//If util.js calls define(), then this function is not fired until
//util's dependencies have loaded, and the util argument will hold
//the module value for "helper/util".
});
You can nest those dependencies as well, so helper/util can require some other files within itself.
As #bobince already suggested, doing static analysis on a JavaScript program is a close to impossible problem to crack. Google Closure compiler does it to some extent but then it also relies on external help from JSDoc comments.
I had a similar problem of finding the order in which JS files should be concatenated in a previous project, and since there were loads of JS files, manually updating the inclusion order seemed too tedious. Instead, I stuck with certain conventions of what constitutes a dependency for my purposes, and based upon that and using simple regexp :) I was able to generated the correct inclusion order.
The solution used a topological sort algorithm to generate a dependency graph which then listed the files in the order in which they should be included to satisfy all dependencies. Since each file was basically a pseudo-class using MooTools syntax, there were only 3 ways dependencies could be created for my situation.
When a class Extended some other class.
When a class Implemented some other class.
When a class instantiated an object of some other class using the new keyword.
It was a simple, and definitely a broken solution for general purpose usage but it served me well. If you're interested in the solution, you can see the code here - it's in Ruby.
If your dependencies are more complex, then perhaps you could manually list the dependencies in each JS file itself using comments and some homegrown syntax such as:
// requires: Array
// requires: view/TabPanel
// requires: view/TabBar
Then read each JS file, parse out the requires comments, and construct a dependency graph which will give you the inclusion order you need.
It would be nice to have a tool that can automatically detect those dependencies for you and choose how they are loaded. The best solutions today are a bit cruder though. I created a dependency manager for my particular needs that I want to add to the list (Pyramid Dependency Manager). It has some key features which solve some unique use cases.
Handles other files (including inserting html for views...yes, you can separate your views during development)
Combines the files for you in javascript when you are ready for release (no need to install external tools)
Has a generic include for all html pages. You only have to update one file when a dependency gets added, removed, renamed, etc
Some sample code to show how it works during development.
File: dependencyLoader.js
//Set up file dependencies
Pyramid.newDependency({
name: 'standard',
files: [
'standardResources/jquery.1.6.1.min.js'
]
});
Pyramid.newDependency({
name:'lookAndFeel',
files: [
'styles.css',
'customStyles.css',
'applyStyles.js'
]
});
Pyramid.newDependency({
name:'main',
files: [
'createNamespace.js',
'views/buttonView.view', //contains just html code for a jquery.tmpl template
'models/person.js',
'init.js'
],
dependencies: ['standard','lookAndFeel']
});
Html Files
<head>
<script src="standardResources/pyramid-1.0.1.js"></script>
<script src="dependencyLoader.js"></script>
<script type="text/javascript">
Pyramid.load('main');
</script>
</head>
It does require you to maintain a single file to manage dependencies. I am thinking about creating a program that can automatically generate the loader file for you based on includes in the header but since it handles many different types of dependencies, maintaining them in one file might actually be better.
JSAnalyse uses static code analysis to detect dependencies between javascript files:
http://jsanalyse.codeplex.com/
It also allows you to define the allowed dependencies and to ensure it during the build, for instance. Of course, it cannot detect all dependencies because javascript is dynamic interpretet language which is not type-safe, like already mentioned. But it at least makes you aware of your javascript dependency graph and helps you to keep it under control.
I have written a tool to do something like this: http://github.com/damonsmith/js-class-loader
It's most useful if you have a java webapp and you structure your JS code in the java style. If you do that, it can detect all of your code dependencies and bundle them up, with support for both runtime and parse-time dependencies.

Categories

Resources