Reading a collection of external files locally in JavaScript

I'm working on an app that needs to access a collection of external files. It's basically a music player. It works as expected under a web server, but I also want it to work locally in the browser.
General overview:
index.htm (Small index file with markup, gather external js, css)
index.js (All the app code here)
dir.js (An array of file paths of all music files)
/AHX/ (location of the music files)
ahx.js (music player code)
The two main difficulties for this are:
1) JavaScript cannot list directory contents, even if it is a child directory. Instead I express file paths as an array of strings.
2) Loading external files is only possible using XMLHttpRequest, which has security restrictions when running local/offline, but works in other environments (under HTTP or as a Chrome App, perhaps other platforms, not sure).
Oddly, in the latest Firefox, 2) is not an issue anymore. XMLHttpRequest works locally without disabling security.fileuri.strict_origin_policy. I'm not sure if that is standard behavior, but Chrome doesn't appear to allow it.
In any case, my current solution is generating a list of file-paths in a .js file (previously I used a txt file that required XHR), and using XMLHttpRequest to load the music files. This of course means I need to keep the folder structure and the file-path database in sync, using a shell script to rebuild the dir.js file.
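A minimal sketch of that setup, assuming dir.js defines a global array (the name AHX_FILES and the sample paths are hypothetical) and each file is requested as an ArrayBuffer. A successful file:// request reports status 0 rather than 200, so both are accepted here; whether the browser allows the request at all is a separate matter.

```javascript
// dir.js (rebuilt by the shell script): hypothetical global listing every music file.
var AHX_FILES = [
  "AHX/Artist/song-one.ahx",
  "AHX/Artist/song-two.ahx"
];

// Load one file as binary. XhrCtor is a parameter only so the sketch can be
// exercised outside a browser; in a page it falls back to XMLHttpRequest.
function loadBinary(path, onDone, onFail, XhrCtor) {
  var Ctor = XhrCtor || XMLHttpRequest;
  var xhr = new Ctor();
  xhr.open("GET", path, true);
  xhr.responseType = "arraybuffer";
  xhr.onload = function () {
    // status is 0 for a successful file:// load and 200 over HTTP
    if ((xhr.status === 200 || xhr.status === 0) && xhr.response) {
      onDone(new Uint8Array(xhr.response));
    } else {
      onFail(new Error("Could not load " + path + " (status " + xhr.status + ")"));
    }
  };
  xhr.onerror = function () {
    onFail(new Error("Request blocked or failed for " + path));
  };
  xhr.send();
}
```

In the page this would be called as loadBinary(AHX_FILES[0], playBytes, showError), where playBytes and showError are whatever callbacks the player provides.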
XHR is only supposed to work over HTTP, so the app requires a web server. I want the app to work locally (and not just force the user to install as a Chrome App). So I am asking this question to find alternative methods of reading the data.
One alternative I tried is encoding all 1000 files in base64 strings and storing it in a JS object. This produces a rather large 8MB .js file. It doesn't appear to be slow to load, but I am assuming it isn't exactly efficient... Plus it is a pain to update/maintain.
localStorage, IndexedDB and Web SQL are all options, but there is no way to pre-populate the storage before the app runs. Perhaps the File API could be used for a one-time setup of the storage database.
So back to my question: What are some solutions to accessing a large collection of binary files (200+ files, over 6MB etc) locally (i.e. opening the .html file directly)?
Edit: The app in question on GitHub, to clear up any confusion on my use case. But in general, I'm looking for ways to automatically read these music files from the app locally, without cross-origin errors. Also, here is the 'js-database' version. It stores all 1000 files in an 8MB js file like so:
[{data:"base64-string-of-data-here",path:"original-path-here"}, ...]
In that way it bypasses the need for XHR.
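For illustration, reading a file back out of such a structure might look like the sketch below. The global name AHX_DB, the sample entry and the helper names are assumptions; only the {data, path} shape comes from the format above.

```javascript
// Hypothetical global in the shape shown above: one entry per file.
var AHX_DB = [
  { data: "QUhY", path: "AHX/demo/example.ahx" } // "QUhY" is base64 for the bytes of "AHX"
];

// Decode a base64 string into raw bytes.
function base64ToBytes(b64) {
  var binary = atob(b64);
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Build a path -> entry index once, so lookups do not scan every entry.
function indexByPath(db) {
  var index = new Map();
  db.forEach(function (entry) { index.set(entry.path, entry); });
  return index;
}

// Return the file's bytes, or null if the path is not in the database.
function readFile(index, path) {
  var entry = index.get(path);
  return entry ? base64ToBytes(entry.data) : null;
}
```

Base64 encodes every 3 bytes as 4 characters, which is consistent with roughly 6MB of music files turning into an 8MB script.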
Edit2: A solution using jszip and IndexedDB appears promising. It is not possible to load multiple files from multiple selected folders, but if the directory tree is zipped, jszip can access an array of all files in the format /FOLDER_HERE/FILE_HERE. The paths and binary data can then be imported into IndexedDB in a one-time setup. It also works fine on file:// URLs which is important.
It is also possible that jszip could be used to build/update a large JSON structure of base64 strings of the contents, which would not require any setup by the user. This still needs to be tested, though.
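A rough sketch of the one-time import described in Edit2, assuming JSZip 3.x is loaded as the global JSZip and the user picks the zip through a file input. The database name "ahx-player", the store name "files" and the function names are made up for the example.

```javascript
// Collect every non-directory entry of a JSZip instance as { path, data }.
function readZipEntries(zip) {
  var jobs = [];
  zip.forEach(function (relativePath, entry) {
    if (entry.dir) return; // skip folder entries
    jobs.push(entry.async("uint8array").then(function (data) {
      return { path: relativePath, data: data };
    }));
  });
  return Promise.all(jobs);
}

// Open (and on first use create) the database with one store keyed by path.
function openDb(name) {
  return new Promise(function (resolve, reject) {
    var req = indexedDB.open(name, 1);
    req.onupgradeneeded = function () {
      req.result.createObjectStore("files", { keyPath: "path" });
    };
    req.onsuccess = function () { resolve(req.result); };
    req.onerror = function () { reject(req.error); };
  });
}

// Write all entries in a single transaction; resolves with the entry count.
function storeEntries(db, entries) {
  return new Promise(function (resolve, reject) {
    var tx = db.transaction("files", "readwrite");
    var store = tx.objectStore("files");
    entries.forEach(function (entry) { store.put(entry); });
    tx.oncomplete = function () { resolve(entries.length); };
    tx.onerror = function () { reject(tx.error); };
  });
}

// One-time setup: zipFile is the File chosen by the user.
function importZip(zipFile) {
  return JSZip.loadAsync(zipFile)
    .then(readZipEntries)
    .then(function (entries) {
      return openDb("ahx-player").then(function (db) {
        return storeEntries(db, entries);
      });
    });
}
```

After the import, the player would read a file with a get() on the "files" store using the same path string.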

Don't take this as a definitive answer; this subject interests me too. If people around don't want to take the time to write a full answer, please comment; it will be more useful than votes.
From what I have learnt from JavaScript resources, consider that you cannot really bypass the security aspect of the question. Even if the project is open source, you should warn explicitly if you didn't take security into account. People could distribute a modified version of the resources, for example. It depends on what is done with the resources.
If this is for a player, I recommend treating it as a data resource, not as a script resource, because of security (as long as you don't eval strings or such). JSON data could do the job here, but that would require processing the 1000 files. It is not so hard to write a script that processes the files, though.
HTML5 file API
I haven't used it yet, so I can only give you one or two links. The downside is that it restricts your player to recent browsers.
https://www.html5rocks.com/en/tutorials/file/dndfiles/
HTML5 File API read as text and binary
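As a sketch of what the File API route could look like (not tested against the player): the user selects the music files once, and each one is read into memory with FileReader. The function names are made up, and the ReaderCtor parameter exists only so the sketch can be run outside a browser.

```javascript
// Read one user-selected File (from <input type="file"> or drag and drop) as bytes.
function readAsBytes(file, ReaderCtor) {
  var Ctor = ReaderCtor || FileReader;
  return new Promise(function (resolve, reject) {
    var reader = new Ctor();
    reader.onload = function () { resolve(new Uint8Array(reader.result)); };
    reader.onerror = function () { reject(reader.error); };
    reader.readAsArrayBuffer(file);
  });
}

// Read every file of a selection into a Map keyed by file name.
function readSelection(fileList, ReaderCtor) {
  var files = Array.prototype.slice.call(fileList);
  return Promise.all(files.map(function (file) {
    return readAsBytes(file, ReaderCtor).then(function (bytes) {
      return [file.name, bytes];
    });
  })).then(function (pairs) {
    return new Map(pairs);
  });
}
```

In a page, readSelection(input.files) would be called from the change handler of a file input with the multiple attribute.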
(I know, not an answer) use a library:
Except that in this case, this might be an answer, simply because there is no real universal data retrieval mechanism in JavaScript. A good library would add that, along with support for old browsers.
Among these solutions, jQuery's JSONP support, for example, allows dynamic cross-domain GET requests. With data formatting (and not script), it is much safer to inject. But keep in mind that you should know in detail what your player does with the binary, and in which ways it can be a risk.
http://api.jquery.com/jQuery.getJSON/
direct inclusion of script: not recommended
<script src="./sameFolderFile.js"></script>
As for direct script inclusion in a local folder structure, it actually works locally. IE says there is ActiveX content and asks for permission, but it works in Firefox and Chrome. The tag can be added dynamically, but there is a big security risk here: malicious JavaScript code added to the resources will be executed. This can put users at risk.

Related

Reading directory's contents using Javascript

I have a single text input box where user enters a path to a directory. I want to fetch names of all the files from that directory using File API in JavaScript.
I was reading this article: http://www.html5rocks.com/en/tutorials/file/filesystem/#toc-dir and tried executing the code under Reading a Directory's contents, but I was unable to understand the code since the directory name is not mentioned anywhere in it.
So, how can I accomplish my task?

As far as the SPECS are concerned,
It is not possible to read normal file system directories from the browser apart from the upload button / flash etc.
The browser can, however create a sandbox (in its user files) which appears as a directory structure you can manipulate. This is useful for apps that need to play with actual files and store them locally rather than on server.
Real Life story
The FileSystem API is not supported; it is a rather dead spec which may find its way to implementation if the need arises.
The nearest currently working functionality is local storage.
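To illustrate the "upload button" route mentioned above: a path typed into a text box cannot be used, but if the user picks the directory through a file input, the names can be listed. This sketch assumes a browser that supports the webkitdirectory attribute; the element id and function name are made up.

```javascript
// Markup assumed in the page:
//   <input type="file" id="picker" webkitdirectory multiple>

// Return the sorted relative paths of all files in the selection.
function listRelativePaths(fileList) {
  return Array.prototype.map.call(fileList, function (file) {
    // webkitRelativePath is empty when files were picked individually
    return file.webkitRelativePath || file.name;
  }).sort();
}

// Usage in a browser:
// document.getElementById("picker").addEventListener("change", function (e) {
//   console.log(listRelativePaths(e.target.files));
// });
```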

What are those cache.js and compilation-mappings files

Recently I received a package containing a web page. Inside (besides the normal html and js files) there are some other JS files. It looks like this:
4A3674A3247236B3C8294D2378462378.cache.js
FE728493278423748230C48234782347.cache.js
compilation-mappings.txt
Inside the .js files I see JavaScript which is obfuscated or minified. The cache.js files are referenced inside compilation-mappings.txt. Are these files generated by some kind of web IDE? Unfortunately I have no way of finding out how this web page was developed.

That is a web project coded in Java and compiled to JS using the GWT project tools.
The GWT compiler does a lot of the work you would otherwise have to do manually when coding JS by hand, plus some other tasks which are almost impossible in a normal JS project: obfuscation, compression, dead-code removal, per-browser optimization, renaming of the scripts, code splitting, etc.
What you have in your app is the result of this compilation:
First, you should have a single index.html file, because GWT is used to produce RIAs (Rich Internet Applications), also known as SPIs (Single Page Interfaces).
That html file should have a reference to a JavaScript file named application_name.nocache.js. Note the .nocache. part, meaning that the web server should set the appropriate headers so that it is cached neither by proxies nor by browsers. This file is very small because it only contains the code to identify the browser and request the next JavaScript file.
This first script knows which NNNN.cache.js each browser has to load. The NNNN prefix is a unique number which is generated when the app is compiled, and it is different for each browser. GWT supports 6 different browser platforms, so normally you would have 6 files like this. Note the .cache. part of the name, meaning that these files can be cached forever. They are large files because they contain all the code of your application.
So the normal workflow of your app is that the browser asks for the index.html file, which can be cached. This file has the script tag that fetches the small start script application.nocache.js, which should always be requested from the server. It contains only the code for loading the most recent permutation for your browser, NNNN.cache.js, which will be downloaded and cached in your browser forever.
You have more info about this stuff here
The goal of this naming convention is that the next time the user goes to the app, index.html and NNNN.cache.js will already be in the cache, and only application.nocache.js, which is really small, will be requested. This guarantees that the user always loads the most recent version of the app, that the browser downloads the code of your app just once, that proxies or cache devices do not break your app when a new version is released, etc.
That said, it is almost impossible to figure out what the code does by inspecting the JavaScript because of the heavy obfuscation. You need the original .java files to understand the code or make modifications.
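To make the caching side of this convention concrete, here is a sketch of how a server could pick a Cache-Control header from the file name. The function name and the exact header values are assumptions for illustration, not something GWT ships.

```javascript
// Choose a Cache-Control value based on the GWT naming convention.
function cacheControlFor(fileName) {
  if (/\.nocache\.js$/.test(fileName)) {
    // Bootstrap script: must be re-requested on every visit.
    return "no-cache, no-store, must-revalidate";
  }
  if (/\.cache\.js$/.test(fileName)) {
    // Compiled permutation: the name changes on every build, so cache for a year.
    return "public, max-age=31536000";
  }
  return null; // leave every other file to the server's defaults
}
```

With the files from the question, cacheControlFor("4A3674A3247236B3C8294D2378462378.cache.js") yields the long-lived value, while a name ending in .nocache.js yields the no-cache value.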

I can't say for sure, but often a string will be attached to the name of a JavaScript file so that when a new version is deployed, clients will not use a cached version of the old one.
(For example, if you have myScript.js and change it, the browser will say "I already have myScript.js, I don't need it." If it goes from being myScript1234.js to myScript1235.js, the browser will go fetch it.)
It is possible the framework in use generated those files as part of its scheme to handle client-side cache issues. Though without knowing more details of what framework they used, there's no way of knowing for sure.
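As a tiny illustration of that renaming scheme (the helper name is made up, and real build tools usually derive the token from a hash of the file contents):

```javascript
// Insert a version token before the extension, e.g. myScript.js -> myScript1235.js
function versionedName(fileName, version) {
  var dot = fileName.lastIndexOf(".");
  if (dot <= 0) {
    return fileName + version; // no extension to preserve
  }
  return fileName.slice(0, dot) + version + fileName.slice(dot);
}
```

The page then references the versioned name, so a new deployment changes the URL and the old cached copy is simply never asked for again.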

Retrieving the entire html with external js/css/images through javascript

I already have a JavaScript file (performing some functions) that will be appended to a webpage. Now I want the JavaScript to collect the entire webpage along with its html tags, images, external JavaScript files and external css files. I don't want to use jQuery or any other external library here.
My goal is to get the entire webpage, save it, and display it so that it looks as similar to the original as possible.
Is this possible with Javascript?
Any help will be greatly appreciated.

Short Answer - No
No, it's not possible with JavaScript, especially the "saving" part, as JavaScript doesn't have file access rights in browser environments (which we assume here), except when developing browser extensions or when explicitly modifying your browser's security properties to allow this.
Long Answer - If You Really Must: The Long and Winding Road...
Loading the Right Content
First you need to figure out whether you want to fetch the page in its static state (as it is sent by the server on the first page load), or in its currently rendered state (after it's been rendered in the browser and scripts have executed and possibly added content to the page).
Loading Resources
Then you'll need to iterate over all the elements of the DOM, and fetch all external resources (including the ones referenced in CSS files).
You'll probably want to have all resources fetched using HTML or plain-text MIME types in your requests, as otherwise your browser might trigger visible downloads with end-user popups instead of performing your transparent downloads.
Updating all references
Next you need to figure out how you'd want to organize your "downloaded" content, and where to put the resources and how to name them to avoid conflicts.
Once done, you need to iterate over all the DOM elements again and update the references to use the paths of your local resources instead of the original remote URLs.
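A sketch of these two steps, with made-up function names and an assumed target folder. It only handles src and href attributes; references inside CSS files would need their own pass.

```javascript
// Map each absolute remote URL to a unique local path such as "assets/logo.png".
function assignLocalNames(urls, folder) {
  var used = new Set();
  var mapping = new Map();
  urls.forEach(function (url) {
    var href = new URL(url).href; // normalized form used as the key
    if (mapping.has(href)) return; // same resource referenced twice
    var base = new URL(href).pathname.split("/").pop() || "index";
    var name = base;
    var n = 1;
    while (used.has(name)) {
      // Conflict: two different URLs end in the same file name.
      var dot = base.lastIndexOf(".");
      name = dot > 0
        ? base.slice(0, dot) + "-" + n + base.slice(dot)
        : base + "-" + n;
      n++;
    }
    used.add(name);
    mapping.set(href, folder + "/" + name);
  });
  return mapping;
}

// Point src/href attributes at the local copies. baseUrl is the page's own
// URL, used to resolve relative references before the lookup.
function rewriteReferences(elements, mapping, baseUrl) {
  Array.prototype.forEach.call(elements, function (el) {
    ["src", "href"].forEach(function (attr) {
      var value = el.getAttribute(attr);
      if (!value) return;
      var absolute = new URL(value, baseUrl).href;
      if (mapping.has(absolute)) {
        el.setAttribute(attr, mapping.get(absolute));
      }
    });
  });
}
```

In a page, elements could come from document.querySelectorAll("[src], [href]") and baseUrl from document.baseURI.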
Writing Content to Disk
Now the last bit is to save all these resources to disk, using either your browser's custom APIs or the HTML5 File System APIs.
Exploring the HTML5 FileSystem APIs
Basic Concepts about the FileSystem API
Here Be Dragons
None of this guarantees that you'll achieve what you want, as some pages could still contain code that won't behave nicely once downloaded like this. There may be code requesting content from remote URLs or assuming some directory structures and endpoints, or using resource names that you may have modified, etc... (that would be strange, but is not that uncommon).

Loading local files via jQuery (part 2)

OK, here we discussed the essence of the problem: in some browsers like Chrome and Opera, HttpRequests to local files are turned off by default.
Now the question is: how to build an HTML+JavaScript viewer of HTML documents that:
would run locally on any (or most) browser(s) without additional tuning;
would not use frames;
would be able to work with many different files (5-10k).

It can't be done in straight HTML/Javascript if you want to load files via Javascript using AJAX requests. There are good security reasons to not allow local files script access to other files on the local system (see my answer here for more details), so most browsers will not allow this without special user configuration.
So your options are:
Don't load files with Javascript, use frames or another mechanism. If, as you state in the other question, you're shipping all this on CD, you might want to consider using some sort of build system that allows you to create static files using templates and either a database or flat-file content - Jekyll is one option I know of.
Ship an executable along with the files that can either run a local webserver or run HTML files in an application context. I think Appcelerator Titanium might fit the bill.
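As a sketch of the local web server option, assuming Node.js is what gets shipped (the answer does not name a runtime; the function names, default index.html and port are also assumptions). It serves files from one folder and refuses paths that escape it. Content-Type handling is left out for brevity.

```javascript
var http = require("http");
var fs = require("fs");
var path = require("path");

// Resolve a request path inside rootDir; return null if it tries to escape it.
function resolveInsideRoot(rootDir, urlPath) {
  var root = path.resolve(rootDir);
  var decoded;
  try {
    decoded = decodeURIComponent(urlPath.split("?")[0]);
  } catch (e) {
    return null; // malformed percent-encoding
  }
  var target = path.resolve(root, "." + decoded);
  if (target !== root && target.indexOf(root + path.sep) !== 0) {
    return null;
  }
  return target;
}

// Create (but do not start) a static file server for rootDir.
function createStaticServer(rootDir) {
  return http.createServer(function (req, res) {
    var file = resolveInsideRoot(rootDir, req.url === "/" ? "/index.html" : req.url);
    if (!file) {
      res.writeHead(403);
      res.end("Forbidden");
      return;
    }
    fs.readFile(file, function (err, data) {
      if (err) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      res.writeHead(200);
      res.end(data);
    });
  });
}

// Usage: createStaticServer(__dirname).listen(8080, "127.0.0.1");
```

Binding to 127.0.0.1 keeps the server reachable only from the user's own machine.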

How to handle javascript & css files across a site?

I have had some thoughts recently on how to handle shared javascript and css files across a web application.
In a current web application that I am working on, I have quite a large number of different JavaScript and CSS files that are placed in a folder on the server. Some of the files are reused, while others are not.
On a production site, it's quite stupid to have a high number of HTTP requests and many kilobytes of unnecessary JavaScript and redundant CSS being loaded. The solution to that is of course to create one big bundled file per page that only contains what is necessary, which is then minified and sent compressed (GZIP) to the client.
It's no trouble to bundle and minify JavaScript files manually if you only have to do it once, but since the app is continuously maintained and things change and develop, it soon becomes a headache to do this by hand while pushing updates that include changes to JavaScript and/or CSS files to production.
What's a good approach to handle this? How do you handle this in your application?

I built a library, Combres, that does exactly that, i.e. minify, combine etc. It also automatically detects changes to both local and remote JS/CSS files and pushes the latest to the browser. It's free & open-source. Check this article out for an introduction to Combres.

I am dealing with the exact same issue on a site I am launching.
I recently found out about a project named SquishIt (see on GitHub). It is built for the ASP.NET framework. If you aren't using ASP.NET, you can still learn about the principles behind what it does.
SquishIt allows you to create named "bundles" of files and then to render those combined and minified file bundles throughout the site.

CSS files can be categorized and partitioned into logical parts (like common, print, etc.), and then you can use CSS's import feature to load them. Reusing these small files also makes client-side caching possible.
When it comes to JavaScript, I think you can solve this problem on the server side: either add multiple script files to the page, or generate the script file dynamically on the server. For client-side caching to work, though, these parts should have distinct, static addresses.

I wrote an ASP.NET handler some time ago that combines, compresses/minifies, gzips, and caches the raw CSS and Javascript source code files on demand. To bring in three CSS files, for example, it would look like this in the markup...
<link rel="stylesheet" type="text/css"
href="/getcss.axd?files=main;theme2;contact" />
The getcss.axd handler reads in the query string and determines which files it needs to read in and minify (in this case, it would look for files called main.css, theme2.css, and contact.css). When it's done reading in the file and compressing it, it stores the big minified string in server-side cache (RAM) for a few hours. It always looks in cache first so that on subsequent requests it does not have to re-compress.
I love this solution because...
It reduces the number of requests as much as possible
No additional steps are required for deployment
It is very easy to maintain
The only downside is that all the style/script code will eventually be stored in server memory. But RAM is so cheap nowadays that it is not as big a deal as it used to be.
Also, one thing worth mentioning: make sure that the query string is not susceptible to any harmful path manipulation (only allow A-Z and 0-9).
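The handler itself is ASP.NET, but its logic can be sketched in JavaScript for illustration. All names here are made up; readAndMinify stands in for whatever reads and compresses a single file, and the sketch accepts lowercase letters as well as A-Z and 0-9.

```javascript
// Parse "main;theme2;contact" into file names, rejecting anything but letters and digits.
function parseFileList(filesParam, extension) {
  return filesParam.split(";").filter(Boolean).map(function (name) {
    if (!/^[A-Za-z0-9]+$/.test(name)) {
      throw new Error("Invalid file name: " + name);
    }
    return name + extension;
  });
}

// Tiny in-memory cache whose entries expire after ttlMs milliseconds.
// The now parameter lets a caller supply its own clock; it defaults to Date.now.
function createCache(ttlMs, now) {
  var clock = now || Date.now;
  var entries = new Map();
  return {
    get: function (key) {
      var hit = entries.get(key);
      if (!hit || clock() > hit.expires) {
        entries.delete(key);
        return undefined;
      }
      return hit.value;
    },
    set: function (key, value) {
      entries.set(key, { value: value, expires: clock() + ttlMs });
    }
  };
}

// Look in the cache first; only read and minify on a miss.
function getBundle(filesParam, cache, readAndMinify) {
  var cached = cache.get(filesParam);
  if (cached !== undefined) return cached;
  var bundle = parseFileList(filesParam, ".css")
    .map(function (file) { return readAndMinify(file); })
    .join("");
  cache.set(filesParam, bundle);
  return bundle;
}
```

Because validation happens before anything is read or cached, a value such as "../web.config" is rejected without touching the disk.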

What you are talking about is called minification.
There are many libraries and helpers for different platforms and languages to help with this. As you did not post what you are using, I can't really point you towards something more relevant to yourself.
Here is one project on Google Code - minify.
Here is an example of a .NET Http handler that does all of this on the fly.
