Save HTML As Standalone Page: Exporting Tool? - javascript

I need to regularly send HTML pages to a client as standalone .html files with no external dependencies. The original pages are built with Node.js and Express, and they include several libraries such as Highcharts.
I have done the preparation manually until now; this includes:
Transforming all images into inline blobs (data URIs) (see the sketch below)
Copying all external .js and .css files into the page
Minifying where possible (standard libraries such as jQuery or Bootstrap...)
The result is a single .html file that can be opened without an internet connection and looks just like the original.
Is there any tool to do this automatically? If not, maybe I'll code it myself in Python. Do you have any recommendations around that?
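For context, the image step I currently do by hand boils down to something like this rough Node.js sketch (the file name is a placeholder, and a real version would map extensions to proper MIME types):
const fs = require('fs');
const path = require('path');

// Read an image file and return a data URI that can replace its src attribute.
// 'logo.png' below is just a placeholder file name.
function toDataUri(filePath) {
  const ext = path.extname(filePath).slice(1) || 'png'; // e.g. 'png', 'jpg'
  const base64 = fs.readFileSync(filePath).toString('base64');
  return 'data:image/' + ext + ';base64,' + base64;
}

console.log(toDataUri('logo.png'));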
Thanks

Monolith is a CLI tool for saving complete web pages as a single HTML file
See https://github.com/Y2Z/monolith

With apologies to the OP, this answer is probably far too late for him, but I'm posting it to help anyone with a similar problem:
HTTrack is an open-source project that does almost exactly what you described, though it doesn't work perfectly on some of the more peculiar JS.
It saves the page with most of the JS, the major images, and everything that the page needs to appear complete. It can be configured to include or exclude JS, images, and CSS, either entirely or partially.
This does not import all of the JS and other content into the HTML file, but neatly organizes all of the content into one folder and corrects all of the paths to make the folder portable.
It also seems to have trouble grabbing some external sources that are protected, but if it is your local site and simply uses common scripts like jQuery, you should be fine. When I tested it, it correctly downloaded all of my local CSS and any valid external CSS library that I incorporated, the jQuery and derivative scripts that I was using, and the embedded images.
Just to save everyone a question, the program by default saves the downloaded websites to C:\My Web Sites.

Related

How to maintain the same version of a library in a website with 20+ webpages

I'm using a third-party JavaScript library in a website with 20+ HTML pages, each with its own JavaScript and CSS file. The problem is that if a new version of the library is available, I have to go through all the HTML files to edit the version number. How can I maintain the same version of the library in all the webpages?
If you're using any kind of server-side processing before serving the pages, you can have a separate file that contains just the (shared) script tags that need to be loaded and include that file with each of the HTML pages.
If not, editors like VS Code (Ctrl + Shift + H) can do Find/Replace across all files in a project.
Finally, you could omit the version number from the script file names (you'll just have to be aware of how the caching works - people may not get the new file right away).
If you are not using any kind of server-side processing, you could create your own JavaScript file loader that lives in one file of yours and loads the external JavaScript. That way you only have to change the external reference in one file that gets included in all your other HTML pages.
Something like this: JavaScript - function to load external JS files is needed
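A minimal sketch of what that single shared loader file might look like (the file name and library URL below are just illustrative placeholders):
// shared-scripts.js - every page includes only this one file.
// Bump the version number here and all 20+ pages pick it up.
(function () {
  var scripts = [
    'https://example.com/libs/somelibrary-2.4.1.min.js', // hypothetical library URL
    '/js/site-common.js'
  ];
  for (var i = 0; i < scripts.length; i++) {
    // document.write is safe here because this loader itself is loaded
    // synchronously with a normal script tag while the page is still parsing.
    document.write('<script src="' + scripts[i] + '"><\/script>');
  }
})();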

Prevent circumventing ASP.NET minification

I've got some ASP.NET that I'm deploying as an Azure cloud service. The JavaScript files have comments in them that I'd rather not be visible to anyone consuming the JS. I'm taking advantage of ASP.NET bundling and minification:
http://www.asp.net/mvc/overview/performance/bundling-and-minification
This seems to be a nice solution in that it removes all comments during the minification process. But I can't count on the fact that the user won't point his or her browser directly to the individual, original JS files. I'm trying to figure out how to prevent the user from pulling the JS files directly (forcing them to pull only a bundle), in order to prevent viewing the comments. Is there a way to implement a blacklist of files that can't be downloaded? If not, I was thinking of adding a series of random characters to the name of each JS file. Lastly, if that doesn't seem like a good idea, I would investigate injecting something into the VS build process to strip comments on publish.
Any thoughts would be welcome.
You can use BlockViewHandler in a web.config in the folder your JS is in. Explicitly whitelist any files that are OK to download and then block the rest.
There's an example in this question:
Where to put view-specific javascript files in an ASP.NET MVC application?
I think you can modify your deployment process: upload only the minified JS files to your production server, but upload everything to your test/dev server.

Web page Optimization

I'm creating a new dynamic site to test and learn about web optimization...
Site Index
For HTML, CSS, and JS files (except jquery-min, which is linked from Google's server), I've created a PHP file that concatenates multiple files, removes unused spaces, and compresses the result using gzip:
compressed css - compressed js
if (extension_loaded('zlib')) { ob_start('ob_gzhandler'); }
/* ...PHP code to read the files and remove comments/spaces... */
if (extension_loaded('zlib')) { ob_end_flush(); }
For the main images, I collapsed every image into a single one (a sprite).
For the Facebook Like button, I replace the iframe after page load using jQuery. I'd like to do the same with the AdBrite adverts, but I don't know how...
If I try to replace or inject the code into the HTML after loading, the page disappears and only the advert remains...
Could someone help?
Can you tell me if I'm on the right track (for optimization) and where I can improve?
Thanks...
It's a good start, but you shouldn't compress anything dynamically. That is just too costly and will end up being slower than delivering the content uncompressed.
Use gzip/deflate and minify your JavaScript files with a tool like YUI Compressor, Google's Closure Compiler, or UglifyJS, to name a few. Serve those files statically.
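As a rough illustration, a tiny build script using the uglify-js npm package (this assumes uglify-js 3.x; the file names are placeholders):
// build.js - run once at build time with: node build.js
const fs = require('fs');
const UglifyJS = require('uglify-js'); // npm install uglify-js

// Placeholder source files; replace with your own list.
const sources = ['js/jquery.plugin.js', 'js/app.js'];
const code = sources.map(f => fs.readFileSync(f, 'utf8')).join('\n');

const result = UglifyJS.minify(code);
if (result.error) throw result.error;

// Serve this static file (and let the web server gzip it).
fs.writeFileSync('dist/bundle.min.js', result.code);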
A nice tool to automate all of the above processes (and way more..) is Apache ANT.
A nice library to serve any content dynamically over one stream is supplyJS.
You can also try Google Granule: http://code.google.com/p/granule/ (which programmatically compresses and minifies css files and js files on the fly)
Also, the reason your AdBrite adverts are not working is probably that the ad code uses a document.write() call, which only works while the document is being parsed. Try loading them asynchronously or deferred:
http://www.sitepoint.com/non-blocking-async-defer/
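A rough sketch of injecting an ad script after the page has loaded instead of letting it run inline (the URL and element id are placeholders; note this only works if the ad snippet does not itself rely on document.write):
jQuery(function () {
  // Create the ad script tag only after the page has rendered.
  var s = document.createElement('script');
  s.async = true;
  s.src = 'https://ads.example.com/ad.js'; // placeholder ad script URL
  document.getElementById('ad-slot').appendChild(s); // placeholder container id
});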

Javascript and website loading time optimization

I know that best practice for including javascript is having all code in a separate .js file and allowing browsers to cache that file.
But when we begin to use many jQuery plugins, each with its own .js file, and our functions depend on them, wouldn't it be better to dynamically load only the JS functions and the required .js files for the current page?
Wouldn't a page be faster if, when I only need one function, I load it dynamically (embedding it in the HTML with a script tag) instead of loading the whole JS file along with the plugins?
In other words, aren't there cases in which there are better practices than keeping all of our JavaScript code in a separate .js file?
It would seem at first glance that this would be a good idea, but in fact it would actually make matters worse. For example, if one page needs plugins 1, 2 and 3, then a file would be built server-side with those plugins in it. Now, the browser goes to another page that needs plugins 2 and 4. This would cause another file to be built; this new file would be different from the first one, but it would also contain the code for plugin 2, so the same code ends up getting downloaded twice, bypassing the version that the browser already has.
You are best off leaving the caching to the browser, rather than trying to second-guess it. However, there are options to improve things.
Top of the list is using a CDN. If the plugins you are using are fairly popular ones, then the chances are that they are being hosted on a CDN. If you link to the CDN-hosted plugins, then for any visitor who hits your site for the first time but has already hit another site that uses the same plugins from the same CDN, the plugins will already be cached.
There are, of course, other things you can do to speed your JavaScript up. Best practice includes placing all your script include tags as close to the bottom of the document as possible, so as not to hold up page rendering. You should also look into lazy initialization. This involves, for any stuff that needs significant setup to work, attaching a minimalist event handler that, when triggered, removes itself and sets up the real event handler.
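A rough sketch of that lazy-initialization pattern (the element id and setup function are illustrative):
var widget = document.getElementById('heavy-widget'); // illustrative element id

// Minimalist handler: on first interaction it removes itself,
// performs the expensive setup, and installs the real handler.
function placeholderHandler() {
  widget.removeEventListener('click', placeholderHandler);
  setUpRealHandler();
}

function setUpRealHandler() {
  // ...expensive plugin initialization would go here...
  widget.addEventListener('click', function () {
    // real behaviour
  });
}

widget.addEventListener('click', placeholderHandler);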
One problem with having separate JS files is that it will cause more HTTP requests.
Yahoo have a good best practices guide on speeding up your site: http://developer.yahoo.com/performance/rules.html
I believe Google's Closure Library has something for combining JavaScript files and dependencies, but I haven't looked too much into it yet, so don't quote me on it: http://code.google.com/closure/library/docs/calcdeps.html
Also there is a tool called jingo http://code.google.com/p/jingo/ but again, I haven't used it yet.
I keep separate files for each plug-in and page during development, but during production I merge-and-minify all my JavaScript files into a single JS file loaded uniformly throughout the site. My main layout file in my web framework (Sinatra) uses the deployment mode to automatically either generate script tags for all JS files (in order, based on a manifest file) or perform the minification and include a single querystring-timestamped script inclusion.
Every page is given a body tag with a unique id, e.g. <body id="contact">.
For those scripts that need to be specific to a particular page, I either modify the selectors to be prefixed by the body:
$('body#contact form#contact').submit(...);
or (more typically) I have the onload handlers for that page bail early:
jQuery(function($){
  if (!$('body#contact').length) return;
  // Do things specific to the contact page here.
});
Yes, including code (or even a plug-in) that may only be needed by one page of the site is inefficient if the user never visits that page. On the other hand, after the initial load the entire site's JS is ready to roll from the cache.
Network latency is the main problem. You can get a very responsive page if you reduce the HTTP calls to one.
That means all the JS and CSS are bundled into the HTML page. And if you can forget about IE6/7, you can embed the images as data:image/png;base64 URIs.
When we release a new version of our web app, a shell script minifies and bundles everything into a single HTML page.
Then there is a second call for the data, and we render all the HTML client-side using a JS template library: PURE
Ensure the page is cached and gzipped. There is probably a size limit to consider; we try to stay under 400 KB unzipped, and load secondary resources later when needed.
You can also try a service like http://www.blaze.io. It automatically performs most front-end optimization tactics and also couples in a CDN.
They're currently in private beta, but it's worth submitting your website to it.
I would recommend you group common bits of functionality into individual JavaScript module files and load them only on the pages where they are used, using RequireJS / head.js or a similar dependency-management tool.
An example: if you are using lightbox popups, contact forms, tracking, and image sliders in different parts of the website, you would separate these into 4 modules and load them only where needed. That way you optimize caching and make sure your site has no unnecessary flab.
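For example, with RequireJS the contact page could pull in only what it needs (the module names here are illustrative):
// contact-form.js - one small module per feature, cached independently.
define(['jquery'], function ($) {
  return {
    init: function () {
      $('#contact-form').on('submit', function () { /* ... */ });
    }
  };
});

// On the contact page only: load just the modules that page uses.
require(['lightbox', 'contact-form'], function (lightbox, contactForm) {
  lightbox.init();
  contactForm.init();
});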
As a general rule it's always best to have fewer files rather than more. It's also important to work on the timing of each JS file, as some are needed BEFORE the page completes loading and some AFTER (i.e., when the user clicks something).
See a lot more tips in the article: 25 Techniques for Javascript Performance Optimization.
Including a section on managing Javascript file dependencies.
Cheers, hope this is useful.

How to work with JavaScript in development then live

I work on front-end development and am looking for a solution for working with JavaScript between a development environment (uncompressed, multiple files) and a live environment (compressed, combined files).
I have found a solution for CSS: I only need to include one global CSS file with imports, and we combine and compress those imports when deploying to the live environment. This means that we don't have to toggle references in the head between dev and live.
Any ideas on a similar solution for JavaScript?
Thanks
Dave
If you are using jQuery, it's really easy to include external JavaScript files from within JavaScript, which is basically what you described doing with CSS.
Read up on jQuery getScript()
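For example (a rough sketch; the path and init function are placeholders):
// Development mode: pull in the uncompressed file on demand.
$.getScript('/js/dev/plugins.js', function () {
  // Runs once the script has loaded and executed.
  initPage(); // placeholder init function defined in the loaded file
});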
You can use the Charles web debugging proxy, or something similar.
Charles allows you to serve any local file in place of any URL, so you can give your browser your local JS file instead of the live JS. That way you can test JS or CSS changes without showing them to your users.
I use ESC to merge and compress all the independent JavaScript files into a central one, and have it run as a 'post build' task.
For Visual Studio I wrote a small console application (like ESC, as someone mentioned) that is used as a post-build event. It's simple but automates the job you're describing by:
Taking a list of filenames as its arguments
Compressing each one using Crockford's JS compressor
Combining the output into one .js file
Then in the site project, the file is loaded from a resource, and a toggle is performed in a class:
List<string> files = new List<string>();
#if DEBUG
files.Add("MyNamespace.Javascript.script1.js");
files.Add("MyNamespace.Javascript.script2.js");
#else
files.Add("MyNamespace.Javascript.Live.js"); // single file
#endif
// ScriptManager.Register them
You could also enable GZIP compression on the JS files for even faster load times. If you're not using the Microsoft dev environment then I'll delete this.
Thanks for all your responses. I have come up with a solution which uses some of your ideas.
I have a global JS file which holds a list of files to include; when run during dev it just writes the script links to the page.
Then the deployment process includes a script which parses the global JS file, looks up which files it is linking together, and combines and compresses them into one global JS file.
This means that I don't need any server side code during the process which makes things easier to maintain across a team of freelance front end devs.
I'll post the final code on my blog when it's ready.
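In the meantime, the rough shape of the global file is something like this (the file names are placeholders):
// global.js - during dev this writes one script tag per file;
// the deployment script parses this same list, then combines and minifies.
var SCRIPT_FILES = [
  '/js/lib/jquery.plugin.js',
  '/js/modules/nav.js',
  '/js/modules/forms.js'
];

for (var i = 0; i < SCRIPT_FILES.length; i++) {
  document.write('<script src="' + SCRIPT_FILES[i] + '"><\/script>');
}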
I don't know what your dev environment looks like, but you could put all the script tags into one file for development and have another file for production that has the script tag for your single combined file. For example: development_js.extension and production_js.extension.
Then it's just a matter of either using server-side include or some build tool to merge the correct file into your HTML file.
