Detecting Hard Links in Node.js - javascript

How can I tell whether a file-system path is a hard link with Node.js? For a hard link, fs.lstat gives a stats object that returns true for stats.isFile() (or for stats.isDirectory(), if a directory was linked), exactly as it would for the original. fs.lstat doesn't offer anything to distinguish a normal file or directory from a linked one.
If my understanding of how linking (ln) works is correct, then a linked file points to the same place on the disk as the original file. This would mean that both the original and linked version are identical, and there is no way to tell the difference between the original file and the linked.
The functionality I'm looking for is as follows:
This is hypothetical pseudo-code for demonstration & communication purposes.
fs.writeFileSync('./file.txt', 'hello world')
fs.linkSync('./file.txt', './link.txt')
fs.isLinkSync('./file.txt') // => false
fs.isLinkSync('./link.txt') // => true
fs.linkChildrenSync('./file.txt') // => ['./link.txt']
fs.linkChildrenSync('./link.txt') // => []
fs.linkParentSync('./link.txt') // => './file.txt'
fs.linkParentSync('./file.txt') // => null

Alright.. just for fun...
You may have an option for finding the files via inode in a certain directory.
Once you grab the inode ID from the stat object..
fs.stat('./okay.file', function(err, stats){
  if (err) throw err;
  var inodeID = stats.ino; // inode number of the file
});
You can then iterate over all the files in the folder (see: Get all files in a directory) and check whether the inode ID of each one matches. If none does, you can assume there is no link (in that directory, at least).
However, it doesn't look like Node.js can look a file up by its inode ID. See: nodejs open nfs files by inode (or a the fastest way to reopen a file)
fs.lstat: https://nodejs.org/api/fs.html#fs_fs_lstat_path_callback
Stats object: https://nodejs.org/api/fs.html#fs_class_fs_stats
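The directory scan described above could be sketched roughly as follows. The name findHardLinksSync is made up for illustration and is not part of Node's fs module. The sketch compares the ino and dev fields of each entry's stats object with those of the target file, and it uses the nlink field (the number of names pointing at the inode) as an early exit. It only looks in one directory and does not recurse.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: returns the other names in `dir` that share an inode with `file`.
function findHardLinksSync(file, dir) {
  const target = fs.lstatSync(file);
  // nlink counts every name for the inode; 1 means no other hard link exists anywhere.
  if (target.nlink < 2) return [];
  const self = path.resolve(file);
  return fs.readdirSync(dir)
    .map((name) => path.join(dir, name))
    .filter((candidate) => {
      if (path.resolve(candidate) === self) return false; // skip the file itself
      const s = fs.lstatSync(candidate);
      // Inode numbers are only unique per device, so compare both fields.
      return s.ino === target.ino && s.dev === target.dev;
    });
}
```

Note that this still cannot say which name is the "original": both names are equal references to the same inode, so the result is symmetric.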

Sorry, but that is not possible: you can't tell the original apart from a hard-linked file. They are the same on your Linux system, both pointing to the same inode.

Related

How can I get an input folder and its files using JavaScript for Automator?

I am writing an automator workflow to work with files and folders. I’m writing it in JavaScript as I’m more familiar with it.
I would like to receive a folder, and get the folder’s name as well as the files inside.
Here is roughly what I have tried:
1. Window receives current folders in Finder (I’m only interested in the first and only folder)
2. Get Folder Contents
3. JavaScript:
function run(input,parameters) {
  var files = [];
  for(let file of input) files.push(file.toString().replace(/.*\//,''));
  // etc
}
This works, but I don’t have the folder name. Using this, I get the full path name of each file, which is why I run it through the replace() method.
If I omit step 2 above, I get the folder, but I don’t know how to access the contents of the folder.
I can fake the folder by getting the first file and stripping off the file name, but I wonder whether there is a more direct approach to getting both the folder and its contents.
I’ve got it working. In case anybody has a similar question:
// Window receives current folders in Finder
var app = Application.currentApplication()
app.includeStandardAdditions = true
function run(input, parameters) {
  let directory = input.toString();
  var directoryItems = app.listFolder(directory, { invisibles: false })
  var files = [];
  for(let file of directoryItems) files.push(file.toString().replace(/.*\//,''));
  // etc
}
I don’t include the Get Folder Contents step, but iterate through the folder using app.listFolder() instead. The replace() method is to trim off everything up to the last slash, giving the file’s base name.
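The same trimming idea can also give the folder's own name. This is only a small sketch, assuming the folder arrives as a POSIX-style path string (as input.toString() yields above); baseName is a hypothetical helper name, not an Automator or JXA function.

```javascript
// Hypothetical helper: last component of a POSIX-style path.
function baseName(p) {
  return String(p)
    .replace(/\/+$/, '')   // drop any trailing slashes, e.g. "/a/b/" -> "/a/b"
    .replace(/.*\//, '');  // drop everything up to the last remaining slash
}
```

Inside run(), baseName(directory) would then be the folder name, and the same function could replace the inline replace() calls for the files.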

Unzip a zip file with JavaScript [duplicate]

I want to display OpenOffice files, .odt and .odp at client side using a web browser.
These files are zipped files. Using Ajax, I can get these files from server but these are zipped files. I have to unzip them using JavaScript, I have tried using inflate.js, http://www.onicos.com/staff/iz/amuse/javascript/expert/inflate.txt, but without success.
How can I do this?
I wrote an unzipper in Javascript. It works.
It relies on Andy G.P. Na's binary file reader and some RFC1951 inflate logic from notmasteryet. I added the ZipFile class.
working example:
http://cheeso.members.winisp.net/Unzip-Example.htm (dead link)
The source:
http://cheeso.members.winisp.net/srcview.aspx?dir=js-unzip (dead link)
NB: the links are dead; I'll find a new host soon.
Included in the source is a ZipFile.htm demonstration page, and 3 distinct scripts, one for the zipfile class, one for the inflate class, and one for a binary file reader class. The demo also depends on jQuery and jQuery UI. If you just download the js-zip.zip file, all of the necessary source is there.
Here's what the application code looks like in Javascript:
// In my demo, this gets attached to a click event.
// it instantiates a ZipFile, and provides a callback that is
// invoked when the zip is read. This can take a few seconds on a
// large zip file, so it's asynchronous.
var readFile = function(){
$("#status").html("<br/>");
var url= $("#urlToLoad").val();
var doneReading = function(zip){
extractEntries(zip);
};
var zipFile = new ZipFile(url, doneReading);
};
// this function extracts the entries from an instantiated zip
function extractEntries(zip){
$('#report').accordion('destroy');
// clear
$("#report").html('');
var extractCb = function(id) {
// this callback is invoked with the entry name, and entry text
// in my demo, the text is just injected into an accordion panel.
return (function(entryName, entryText){
var content = entryText.replace(new RegExp( "\\n", "g" ), "<br/>");
$("#"+id).html(content);
$("#status").append("extract cb, entry(" + entryName + ") id(" + id + ")<br/>");
$('#report').accordion('destroy');
$('#report').accordion({collapsible:true, active:false});
});
}
// for each entry in the zip, extract it.
for (var i=0; i<zip.entries.length; i++) {
var entry = zip.entries[i];
var entryInfo = "<h4><a>" + entry.name + "</a></h4>\n<div>";
// contrive an id for the entry, make it unique
var randomId = "id-"+ Math.floor((Math.random() * 1000000000));
entryInfo += "<span class='inputDiv'><h4>Content:</h4><span id='" + randomId +
"'></span></span></div>\n";
// insert the info for one entry as the last child within the report div
$("#report").append(entryInfo);
// extract asynchronously
entry.extract(extractCb(randomId));
}
}
The demo works in a couple of steps: The readFile fn is triggered by a click, and instantiates a ZipFile object, which reads the zip file. There's an asynchronous callback for when the read completes (usually happens in less than a second for reasonably sized zips) - in this demo the callback is held in the doneReading local variable, which simply calls extractEntries, which just blindly unzips all the content of the provided zip file. In a real app you would probably choose some of the entries to extract (allow the user to select, or choose one or more entries programmatically, etc).
The extractEntries fn iterates over all entries, and calls extract() on each one, passing a callback. Decompression of an entry takes time, maybe 1s or more for each entry in the zipfile, which means asynchrony is appropriate. The extract callback simply adds the extracted content to a jQuery accordion on the page. If the content is binary, then it gets formatted as such (not shown).
It works, but I think that the utility is somewhat limited.
For one thing: It's very slow. Takes ~4 seconds to unzip the 140k AppNote.txt file from PKWare. The same uncompress can be done in less than .5s in a .NET program. EDIT: The Javascript ZipFile unpacks considerably faster than this now, in IE9 and in Chrome. It is still slower than a compiled program, but it is plenty fast for normal browser usage.
For another: it does not do streaming. It basically slurps in the entire contents of the zipfile into memory. In a "real" programming environment you could read in only the metadata of a zip file (say, 64 bytes per entry) and then read and decompress the other data as desired. There's no way to do IO like that in javascript, as far as I know, therefore the only option is to read the entire zip into memory and do random access in it. This means it will place unreasonable demands on system memory for large zip files. Not so much a problem for a smaller zip file.
Also: It doesn't handle the "general case" zip file - there are lots of zip options that I didn't bother to implement in the unzipper - like ZIP encryption, WinZip encryption, zip64, UTF-8 encoded filenames, and so on. (EDIT - it handles UTF-8 encoded filenames now). The ZipFile class handles the basics, though. Some of these things would not be hard to implement. I have an AES encryption class in Javascript; that could be integrated to support encryption. Supporting Zip64 would probably be useless for most users of Javascript, as it is intended to support >4 GB zipfiles - you don't need to extract those in a browser.
I also did not test the case for unzipping binary content. Right now it unzips text. If you have a zipped binary file, you'd need to edit the ZipFile class to handle it properly. I didn't figure out how to do that cleanly. It does binary files now, too.
EDIT - I updated the JS unzip library and demo. It now does binary files, in addition to text. I've made it more resilient and more general - you can now specify the encoding to use when reading text files. Also the demo is expanded - it shows unzipping an XLSX file in the browser, among other things.
So, while I think it is of limited utility and interest, it works. I guess it would work in Node.js.
I'm using zip.js and it seems to be quite useful. It's worth a look!
Check the Unzip demo, for example.
I found jszip quite useful. So far I've used it only for reading, but it has create/edit capabilities as well.
Code-wise it looks something like this:
var new_zip = new JSZip();
new_zip.load(file);
new_zip.files["doc.xml"].asText() // this gives you the text in the file
One thing I noticed is that the file seems to have to be in binary stream format (read using the .readAsArrayBuffer method of FileReader()); otherwise I was getting errors saying I might have a corrupt zip file.
Edit: Note from the 2.x to 3.0.0 upgrade guide:
The load() method and the constructor with data (new JSZip(data)) have
been replaced by loadAsync().
Thanks user2677034
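With JSZip 3.x the snippet above becomes asynchronous. A minimal sketch under that assumption: readZipEntryText is a hypothetical helper, and the JSZip constructor is passed in as a parameter so the sketch does not depend on how the library was loaded. It relies on loadAsync(), file() and async('string') from JSZip 3.x.

```javascript
// Hypothetical helper; `JSZip` is the constructor exported by the jszip package (3.x).
async function readZipEntryText(JSZip, data, entryName) {
  const zip = await JSZip.loadAsync(data); // replaces new JSZip(data) / load()
  const entry = zip.file(entryName);       // null when the entry does not exist
  if (entry === null) {
    throw new Error('No such entry: ' + entryName);
  }
  return entry.async('string');            // replaces asText()
}
```

For the example above this would be called as readZipEntryText(JSZip, file, 'doc.xml'), where file is the ArrayBuffer obtained from FileReader.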
If you need to support other formats as well, or just need good performance, you can use this WebAssembly library (libarchive.js).
It's promise-based, it uses Web Workers for threading, and its API is a simple ES module.
How to use
Install with npm i libarchive.js and use it as an ES module.
The library consists of two parts: an ES module and a web worker bundle. The ES module is your interface to the library; use it like any other module. The web worker bundle lives in the libarchive.js/dist folder, so you need to make sure it is available in your public folder, since it will not get bundled if you're using a bundler (it's all bundled up already), and pass the correct path to the Archive.init() method.
import {Archive} from 'libarchive.js/main.js';
Archive.init({
  workerUrl: 'libarchive.js/dist/worker-bundle.js'
});
document.getElementById('file').addEventListener('change', async (e) => {
  const file = e.currentTarget.files[0];
  const archive = await Archive.open(file);
  let obj = await archive.extractFiles();
  console.log(obj);
});
// outputs
{
  ".gitignore": {File},
  "addon": {
    "addon.py": {File},
    "addon.xml": {File}
  },
  "README.md": {File}
}
I wrote "Binary Tools for JavaScript", an open source project that includes the ability to unzip, unrar and untar: https://github.com/codedread/bitjs
Used in my comic book reader: https://github.com/codedread/kthoom (also open source).
HTH!
If anyone's reading images or other binary files from a zip file hosted on a remote server, you can use the following snippet to download it and create a zip object using the jszip library.
// getStorageUrl just returns the public url of the zip file.
let url = await getStorageUrl(path)
console.log('public url is', url)
let zipObj
// download the zip file to the client
axios.get(url, { responseType: 'arraybuffer' }).then((res) => {
  console.log('zip download status ', res.status)
  // load the contents into jszip and keep the resulting object
  return jszip.loadAsync(new Blob([res.data], { type: 'application/zip' }))
}).then((zip) => {
  zipObj = zip
  $.each(zip.files, function (index, zipEntry) {
    console.log('filename', zipEntry.name)
  })
})
Now, using zipObj (once the download above has completed), you can access the files and create a src url for one of them.
var fname = 'myImage.jpg'
zipObj.file(fname).async('blob').then((blob) => {
  var blobUrl = URL.createObjectURL(blob)
  // use blobUrl, e.g. as the src of an image element
})

Virtual paths from the client to real paths on the server

The client is supposed to see just a directory and its contents on the server (FS_ROOT).
And the server is supposed to convert the paths that it receives from the client to real paths that exist, and do the file operations that the client requested on them.
I made these 2 functions to handle that and I want to ask if they are secure enough. I mean, there should be no way for the client to fool the server into doing something outside FS_ROOT.
function fromVirtualPath(virtPath){
  if(virtPath === '/' || virtPath === '.')
    return FS_ROOT;
  virtPath = virtPath.trim();
  if(virtPath[0] === '/')
    virtPath = virtPath.substr(1);
  const absPath = path.resolve(FS_ROOT, virtPath);
  if(absPath.indexOf(FS_ROOT) !== 0)
    throw new Error('Outside root dir - no permissions!');
  return absPath;
}
function toVirtualPath(absPath){
  return '/' + path.relative(FS_ROOT, absPath);
}
Example real path: /www/site.com/public_html/yo
Client should see: /yo
As for fromVirtualPath, I would simply move the line virtPath = virtPath.trim(); to be the first line of the function; then it's OK.
If the values passed to toVirtualPath are always return values of fromVirtualPath, then yes, it is secure enough; otherwise we could check whether the value is a valid absPath.
function fromVirtualPath(virtPath) {
  virtPath = virtPath.trim();
  if (virtPath === '/' || virtPath === '.')
    return FS_ROOT;
  if (virtPath[0] === '/')
    virtPath = virtPath.substr(1);
  const absPath = path.resolve(FS_ROOT, virtPath);
  if (absPath.indexOf(FS_ROOT) !== 0)
    throw new Error('Outside root dir - no permissions!');
  return absPath;
}
function toVirtualPath(absPath) {
  if (absPath.indexOf(FS_ROOT) !== 0)
    throw new Error('Bad absolute path!');
  return '/' + path.relative(FS_ROOT, absPath);
}
Your code is a bit insecure until you make use of the techniques described in the Node.js article quoted below. Try the following code:
function fromVirtualPath(virtPath) {
  virtPath = virtPath.trim();
  if (virtPath === '/' || virtPath === '.')
    return FS_ROOT;
  if (virtPath.indexOf('\0') !== -1)
    throw new Error('That was evil.');
  const absPath = path.join(FS_ROOT, virtPath);
  if (absPath.indexOf(FS_ROOT) !== 0)
    throw new Error('Outside root dir - no permissions!');
  return absPath;
}
function toVirtualPath(absPath) {
  return '/' + path.relative(FS_ROOT, absPath);
}
The following article from Node.js will be really helpful to you.
"How can I secure my code?"
Poison Null Bytes
Poison null bytes are a way to trick your code into seeing another
filename than the one that will actually be opened.
if (filename.indexOf('\0') !== -1) {
return respond('That was evil.');
}
Preventing Directory Traversal
This example assumes that you already checked the
userSuppliedFilename variable as described in the "Poison Null
Bytes" section above.
var rootDirectory = '/var/www/'; // this is your FS_ROOT
Make sure that you have a slash at the end of the allowed folders name
you don't want people to be able to access /var/www-secret/, do you?.
var path = require('path');
var filename = path.join(rootDirectory, userSuppliedFilename);
Now filename contains an absolute path and doesn't contain ..
sequences anymore - path.join takes care of that. However, it might
be something like /etc/passwd now, so you have to check whether it
starts with the rootDirectory:
if (filename.indexOf(rootDirectory) !== 0) {
return respond('trying to sneak out of the web root?');
}
Now the filename variable should contain the name of a file or
directory that's inside the allowed directory (unless it doesn't
exist).
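The trailing-slash caveat in the quoted article matters for the functions above: a plain indexOf(FS_ROOT) check accepts a sibling such as /www/site.com/public_html-secret when FS_ROOT has no trailing slash. One way around it, sketched here under stated assumptions, is to compare by path segment with path.relative instead of by string prefix. resolveInsideRoot is a hypothetical name, and the sketch does not deal with symbolic links inside the root.

```javascript
const path = require('path');

// Hypothetical helper: map a client-supplied virtual path to a real path under
// `root`, throwing if the result would fall outside it.
function resolveInsideRoot(root, virtPath) {
  if (typeof virtPath !== 'string' || virtPath.includes('\0')) {
    throw new Error('Invalid path'); // rejects poison null bytes
  }
  const absRoot = path.resolve(root);
  // path.join treats virtPath as relative even if it starts with a slash and
  // collapses ".." segments; path.resolve then drops any trailing separator.
  const absPath = path.resolve(path.join(absRoot, virtPath));
  // Compare by segment rather than by string prefix, so that "<root>-secret"
  // is not mistaken for something inside <root>.
  const rel = path.relative(absRoot, absPath);
  if (rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)) {
    throw new Error('Outside root dir - no permissions!');
  }
  return absPath;
}
```

With FS_ROOT set to /www/site.com/public_html, resolveInsideRoot(FS_ROOT, '/yo') gives the real path for /yo, while '../public_html-secret/x' throws even though the resulting string starts with FS_ROOT.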
Security is a complex matter. And you can never be sure.
Despite the fact that I couldn't find any flaws in @RahulVerma's answer, I'll add my 2 cents...
The link that @RahulVerma posted is official, but it is not documentation per se. And the documentation itself says nothing about poison null bytes... strange, isn't it?
And that makes you think: maybe, just maybe, when the fs and/or path modules were written, the authors didn't put enough effort into security considerations, or just missed that. Yes, maybe there are some good reasons for you, and not fs/path, to handle the \0. But wouldn't it be better if everyone was protected from \0 by default, and only on some rare occasions you could explicitly set an option to allow \0 in paths?
So... what I am trying to say is: security is hard even for the best of us, and without proper peer review (currently, less than 100 views on this question do not strike me as a "proper peer review") or, better yet, a history of successful time in production, you should not be satisfied with answers (mine included) saying "It's OK, if you add this or that".
Why don't you use some code that already was tested in battles instead of trying to write a secure code by yourself?
E.g., serve-static is used in Express.
(Probably it doesn't meet your needs - it's static after all, but you get the idea)
Even if you don't want another dependency in your project, you can at least study and copy from an implementation that has proved itself. (But, yes, it doesn't seem different from @RahulVerma's answer.)
That said, I'd like to point out that:
If you copy the implementation, you can make a mistake while doing so.
Even if your code is safe, consider how safely you manage your code. Will it be safe tomorrow?
Even well-tested libraries and engines can, and often do, have bugs, and fall prey to 0-day exploits.
Oh! Just found: https://github.com/autovance/ftp-srv/issues/167
It's about the library that was suggested in another question of yours.
So, if you decide (or are assured) that your code is now surely safe, don't stop there! Add an extra layer of security to it anyway:
restrict the server's access to folders outside of /www/site.com/public_html/ at the OS level.
The following principles can be applied to secure client access to paths relative to the web root:
Restrict access outside of your public web root folder to your service. Rationale: begin with ZERO trust.
Split the path provided by the user into parts. This will remove leading '/' and all '/' separators leaving only the parts of the path. Better yet, use whitelisting for path parts to restrict acceptable characters in a path part using a regular expression. Rationale: sanitize user input
Validate each part sequentially for existence assuming that the first part starts from the web root as it is intended. Disallow .. (parent dir) in part names (to prevent traversal outside the web root folder). Rationale: sanitize user input and validate user input
Avoid using symbolic links under the web root folder (to prevent traversal outside the web root folder). Rationale: reduce attack surface
Fail early with an error upon encountering the first invalid part. Rationale: reduce attack surface
To optimize system calls, you can do the check for .. and part whitelisting in one pass. If there are any .. in the path or offending parts, return an error. Otherwise, split the parts and rebuild the absolute path string by concatenating them with your web root and do one existence check instead of multiple folder existence checks along the path.
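The one-pass split-and-whitelist step described above might look like the following sketch. The regular expression is only an assumed whitelist (letters, digits, dot, underscore, hyphen) and splitVirtualPath is a hypothetical name; a real service would choose its own set of acceptable characters.

```javascript
// Assumed whitelist: letters, digits, dot, underscore and hyphen only.
const SAFE_PART = /^[A-Za-z0-9._-]+$/;

// Split a client path into parts, failing early on the first unacceptable part.
function splitVirtualPath(virtPath) {
  // Splitting on "/" and dropping empty strings removes leading and repeated slashes.
  const parts = String(virtPath).split('/').filter((part) => part !== '');
  for (const part of parts) {
    if (part === '..' || !SAFE_PART.test(part)) {
      throw new Error('Invalid path part: ' + JSON.stringify(part));
    }
  }
  return parts;
}
```

The returned parts can then be joined onto the web root (for example with path.join(FS_ROOT, ...parts)) and checked for existence once.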
Instead of trying to validate every path yourself, let the operating system do it for you! This is a good example of an application that could use a chroot.
Here is an example of an npm library which creates a chroot.
> var chroot = require("chroot")
> var fs = require("fs")
> chroot('/virtual/root/here', 'nobody')
> fs.readdir(".", function(err, files) { console.log(files); }) // Lists virtual root
> fs.readdir("..", function(err, files) { console.log(files); }) // Also lists virtual root
> fs.readdir("/", function(err, files) { console.log(files); }) // ALSO lists virtual root
Should you run this script as root, it immediately changes the user to "nobody" and sandboxes you to your virtual root. This prevents the script from accessing anything outside it, and the program can't chroot out either, as it's no longer running as root.
Now that you are chrooted into your virtual root, using "/" will give you a directory listing of your virtual root - essentially, you can use your virtual path directly in fs.readdir()!
Need to access some specific files outside the new root? Use microservices! You can run a node.js instance in the background as your file accessor, and communicate between your main server and your file accessor. Having two nodejs instances not only allows your background task to sandbox itself, but also allows you to make use of multithreading.
Yours is basic Java code. In real-world scenarios, such basic Java code should not be deployed on the server side, and we can't expect security out of it.
To add a security check to this Java code, many APIs come as part of the Spring framework, but since we are writing plain Java code we can make use of the Java NIO package only, namely the WatchService and WatchEvent APIs.
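import java.io.IOException;
import java.nio.file.*;
import static java.nio.file.StandardWatchEventKinds.*;
class DirectoryWatchTest {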
public static void main(String[] args) {
try {
WatchService watchService = FileSystems.getDefault().newWatchService();
Path path = Paths.get("C:/");
/**
* The register() method of the Path class takes a WatchService object and an event type for which the
* application needs to get notified.
*
* The supported event types are:
* ENTRY_CREATE: indicates if a directory or file is created.
* ENTRY_DELETE: indicates if a directory or file is deleted.
* ENTRY_MODIFY: indicates if a directory or file is modified.
* OVERFLOW: indicates if the event might have been lost or discarded. This event is always implicitly
* registered so we don't need to explicitly specify it in the register() method. */
path.register(watchService, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);
while (true) {
WatchKey key;
try {
key = watchService.take();
} catch (InterruptedException ex) {
return;
}
/**
* The whole work flow:
* A Watchable object is registered with a watch service by invoking its register method,
* returning a WatchKey to represent the registration.
*
* When an event for an object is detected, the key is signalled, and if not currently signalled,
* it is queued to the watch service so that it can be retrieved by consumers that invoke the poll or
* take methods to retrieve keys and process events.
*
* pollEvents List<WatchEvent<?>> pollEvents() method retrieves and removes all pending events for
* this watch key, returning a List of the events that were retrieved. Note that this method does not
* wait if there are no events pending. */
for (WatchEvent<?> event : key.pollEvents()) {
WatchEvent.Kind<?> kind = event.kind();
@SuppressWarnings("unchecked")
WatchEvent<Path> ev = (WatchEvent<Path>) event;
Path fileName = ev.context();
System.out.println(kind.name() + ": " + fileName);
if (kind == ENTRY_MODIFY && fileName.toString().equals("DirectoryWatchTest.java")) {
System.out.println("My source file has changed!!!");
System.out.println("My source file has changed!!! - Modified");
}
}
/**Once the events have been processed the consumer invokes the key's reset method to reset the
* key which allows the key to be signalled and re-queued with further events.*/
boolean valid = key.reset();
if (!valid) {
break;
}
}
} catch (IOException ex) {
System.err.println(ex);
}
}
}
This kind of basic security check can be put in Java code. The user will be able to see the URL unless we get hold of the protocol and hide it via @PutMapping or by implementing security-based APIs, but for that we need framework-based APIs.

File and Directory Entries API broken in Chrome?

I'm trying to use the File and Directory Entries API to create a file uploader tool that will allow me to drop an arbitrary combination of files and directories into a browser window, to be read and uploaded.
(I'm fully aware that similar functionality can be achieved by using a file input element with webkitdirectory enabled, but I'm testing a use case where the user isn't forced to put everything into a single folder.)
Using the Drag and Drop API, I've managed to read the DataTransfer items and convert them to FileSystemEntry objects using DataTransferItem.webkitGetAsEntry.
From there, I am able to tell whether the entry is a FileSystemFileEntry or a FileSystemDirectoryEntry. My plan, of course, is to recursively walk the directory structure, if any, which I should be able to do using the FileSystemDirectoryReader method readEntries, like this:
handleDrop(event) {
  event.preventDefault();
  event.stopPropagation();
  //assuming I dropped only one directory
  const directory = event.dataTransfer.items[0];
  const directoryEntry = directory.webkitGetAsEntry();
  const directoryReader = directoryEntry.createReader();
  directoryReader.readEntries(function(entries){
    // callback: the "entries" param is an Array
    // containing the directory entries
  });
}
However, I'm running into the following issue: in Chrome, the readEntries method only returns 100 entries. Apparently, this is the expected behavior as the way to obtain subsequent files from the directory is to call readEntries again. However, I'm finding this impossible to do. A subsequent call to the method throws the error:
DOMException: An operation that depends on state cached in an interface object was made but the state had changed since it was read from disk.
Does anyone know a way around this? Is this API hopelessly broken for directories of 100+ files in Chrome? Is this API deprecated? (not that it was ever "precated"). In Firefox, readEntries returns the whole directory content at once, which is apparently against the spec, but it is usable.
Please advise.
Of course, as soon as I had posted this question the answer hit me. What I was trying to do was akin to the following:
handleDrop(event) {
  event.preventDefault();
  event.stopPropagation();
  //assuming I dropped only one directory
  const directory = event.dataTransfer.items[0];
  const directoryEntry = directory.webkitGetAsEntry();
  const directoryReader = directoryEntry.createReader();
  directoryReader.readEntries(function(entries){
    // callback: the "entries" param is an Array
    // containing the directory entries
  });
  directoryReader.readEntries(function(entries){
    // call readEntries a second time
  });
}
The problem with this is that readEntries is asynchronous, so I'm trying to call it while it's "busy" reading the first batch (I'm sure lower-level programmers will have a better term for that). A better way of achieving what I was trying to do:
handleDrop(event) {
  event.preventDefault();
  event.stopPropagation();
  //assuming I dropped only one directory
  const directory = event.dataTransfer.items[0];
  const directoryEntry = directory.webkitGetAsEntry();
  const directoryReader = directoryEntry.createReader();
  function read(){
    directoryReader.readEntries(function(entries){
      if(entries.length > 0) {
        //do something with the entries
        read(); //read the next batch
      } else {
        //do whatever needs to be done after
        //all files are read
      }
    });
  }
  read();
}
This way we ensure the FileSystemDirectoryReader is done with one batch before starting the next one.
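The same batch-by-batch loop can be wrapped in a Promise so the caller receives the complete list in one go. This is just a sketch: readAllEntries is a hypothetical helper name, and it relies only on readEntries(successCallback, errorCallback) signalling the end of the directory with an empty array.

```javascript
// Hypothetical helper: collect every entry from a FileSystemDirectoryReader,
// calling readEntries() again only after the previous batch has arrived.
function readAllEntries(directoryReader) {
  return new Promise((resolve, reject) => {
    const all = [];
    function readBatch() {
      directoryReader.readEntries((entries) => {
        if (entries.length === 0) {
          resolve(all);        // an empty batch means the directory is exhausted
        } else {
          all.push(...entries);
          readBatch();         // request the next batch
        }
      }, reject);              // the second argument is the error callback
    }
    readBatch();
  });
}
```

In handleDrop this could be used as readAllEntries(directoryReader).then(function(entries){ ... }), and applied again to each entry that is itself a directory in order to walk the tree recursively.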

Unzipping files

I want to display OpenOffice files, .odt and .odp at client side using a web browser.
These files are zipped files. Using Ajax, I can get these files from server but these are zipped files. I have to unzip them using JavaScript, I have tried using inflate.js, http://www.onicos.com/staff/iz/amuse/javascript/expert/inflate.txt, but without success.
How can I do this?
I wrote an unzipper in Javascript. It works.
It relies on Andy G.P. Na's binary file reader and some RFC1951 inflate logic from notmasteryet. I added the ZipFile class.
working example:
http://cheeso.members.winisp.net/Unzip-Example.htm (dead link)
The source:
http://cheeso.members.winisp.net/srcview.aspx?dir=js-unzip (dead link)
NB: the links are dead; I'll find a new host soon.
Included in the source is a ZipFile.htm demonstration page, and 3 distinct scripts, one for the zipfile class, one for the inflate class, and one for a binary file reader class. The demo also depends on jQuery and jQuery UI. If you just download the js-zip.zip file, all of the necessary source is there.
Here's what the application code looks like in Javascript:
// In my demo, this gets attached to a click event.
// it instantiates a ZipFile, and provides a callback that is
// invoked when the zip is read. This can take a few seconds on a
// large zip file, so it's asynchronous.
var readFile = function(){
$("#status").html("<br/>");
var url= $("#urlToLoad").val();
var doneReading = function(zip){
extractEntries(zip);
};
var zipFile = new ZipFile(url, doneReading);
};
// this function extracts the entries from an instantiated zip
function extractEntries(zip){
$('#report').accordion('destroy');
// clear
$("#report").html('');
var extractCb = function(id) {
// this callback is invoked with the entry name, and entry text
// in my demo, the text is just injected into an accordion panel.
return (function(entryName, entryText){
var content = entryText.replace(new RegExp( "\\n", "g" ), "<br/>");
$("#"+id).html(content);
$("#status").append("extract cb, entry(" + entryName + ") id(" + id + ")<br/>");
$('#report').accordion('destroy');
$('#report').accordion({collapsible:true, active:false});
});
}
// for each entry in the zip, extract it.
for (var i=0; i<zip.entries.length; i++) {
var entry = zip.entries[i];
var entryInfo = "<h4><a>" + entry.name + "</a></h4>\n<div>";
// contrive an id for the entry, make it unique
var randomId = "id-"+ Math.floor((Math.random() * 1000000000));
entryInfo += "<span class='inputDiv'><h4>Content:</h4><span id='" + randomId +
"'></span></span></div>\n";
// insert the info for one entry as the last child within the report div
$("#report").append(entryInfo);
// extract asynchronously
entry.extract(extractCb(randomId));
}
}
The demo works in a couple of steps: The readFile fn is triggered by a click, and instantiates a ZipFile object, which reads the zip file. There's an asynchronous callback for when the read completes (usually happens in less than a second for reasonably sized zips) - in this demo the callback is held in the doneReading local variable, which simply calls extractEntries, which
just blindly unzips all the content of the provided zip file. In a real app you would probably choose some of the entries to extract (allow the user to select, or choose one or more entries programmatically, etc).
The extractEntries fn iterates over all entries, and calls extract() on each one, passing a callback. Decompression of an entry takes time, maybe 1s or more for each entry in the zipfile, which means asynchrony is appropriate. The extract callback simply adds the extracted content to an jQuery accordion on the page. If the content is binary, then it gets formatted as such (not shown).
It works, but I think that the utility is somewhat limited.
For one thing: It's very slow. Takes ~4 seconds to unzip the 140k AppNote.txt file from PKWare. The same uncompress can be done in less than .5s in a .NET program. EDIT: The Javascript ZipFile unpacks considerably faster than this now, in IE9 and in Chrome. It is still slower than a compiled program, but it is plenty fast for normal browser usage.
For another: it does not do streaming. It basically slurps in the entire contents of the zipfile into memory. In a "real" programming environment you could read in only the metadata of a zip file (say, 64 bytes per entry) and then read and decompress the other data as desired. There's no way to do IO like that in javascript, as far as I know, therefore the only option is to read the entire zip into memory and do random access in it. This means it will place unreasonable demands on system memory for large zip files. Not so much a problem for a smaller zip file.
Also: It doesn't handle the "general case" zip file. There are lots of zip options that I didn't bother to implement in the unzipper, like ZIP encryption, WinZip encryption, zip64, UTF-8 encoded filenames, and so on. (EDIT - it handles UTF-8 encoded filenames now). The ZipFile class handles the basics, though. Some of these things would not be hard to implement. I have an AES encryption class in JavaScript; that could be integrated to support encryption. Supporting Zip64 would probably be useless for most users of JavaScript, as it is intended to support >4 GB zipfiles, and you don't need to extract those in a browser.
I also did not test the case for unzipping binary content. Right now it unzips text. If you have a zipped binary file, you'd need to edit the ZipFile class to handle it properly. I didn't figure out how to do that cleanly. It does binary files now, too.
EDIT - I updated the JS unzip library and demo. It now does binary files, in addition to text. I've made it more resilient and more general - you can now specify the encoding to use when reading text files. Also the demo is expanded - it shows unzipping an XLSX file in the browser, among other things.
So, while I think it is of limited utility and interest, it works. I guess it would work in Node.js.
I'm using zip.js and it seems to be quite useful. It's worth a look!
Check the Unzip demo, for example.
I found jszip quite useful. So far I've used it only for reading, but it has create/edit capabilities as well.
Code-wise it looks something like this:
var new_zip = new JSZip();
new_zip.load(file);
new_zip.files["doc.xml"].asText() // this gives you the text in the file
One thing I noticed is that the file has to be in binary format (read using the .readAsArrayBuffer method of FileReader); otherwise I was getting errors saying I might have a corrupt zip file.
Edit: Note from the 2.x to 3.0.0 upgrade guide:
The load() method and the constructor with data (new JSZip(data)) have
been replaced by loadAsync().
Thanks user2677034
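For reference, the 3.x equivalent of the snippet above might look roughly like the sketch below. It uses JSZip.loadAsync() in place of load() and async('string') in place of asText(). The helper name readZipEntryAsText is hypothetical, and JSZip is passed in as a parameter only so the sketch does not assume how the library is loaded (script tag or module import).

```javascript
// Hypothetical helper: read one entry of a zip as text with the JSZip 3.x API.
// `JSZip` is the library object, `data` is the zip content (for example the
// ArrayBuffer produced by FileReader.readAsArrayBuffer), `entryName` is the
// path of the entry inside the archive.
async function readZipEntryAsText(JSZip, data, entryName) {
  const zip = await JSZip.loadAsync(data); // replaces new JSZip(data) / load()
  const entry = zip.file(entryName);       // null when the entry does not exist
  if (entry === null) {
    throw new Error('No such entry in zip: ' + entryName);
  }
  return entry.async('string');            // replaces asText()
}
```

Usage would be something like readZipEntryAsText(JSZip, arrayBuffer, 'doc.xml').then(text => console.log(text)).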
If you need to support other formats as well, or just need good performance, you can use this WebAssembly library (libarchive.js).
It's promise-based, it uses Web Workers for threading, and its API is a simple ES module.
How to use
Install with npm i libarchive.js and use it as an ES module.
The library consists of two parts: an ES module and a web worker bundle. The ES module is your interface to the library; use it like any other module. The web worker bundle lives in the libarchive.js/dist folder. It is already bundled, so your bundler will not pick it up: make sure it is available in your public folder and pass the correct path to the Archive.init() method.
import {Archive} from 'libarchive.js/main.js';
Archive.init({
workerUrl: 'libarchive.js/dist/worker-bundle.js'
});
document.getElementById('file').addEventListener('change', async (e) => {
const file = e.currentTarget.files[0];
const archive = await Archive.open(file);
let obj = await archive.extractFiles();
console.log(obj);
});
// outputs
{
".gitignore": {File},
"addon": {
"addon.py": {File},
"addon.xml": {File}
},
"README.md": {File}
}
I wrote "Binary Tools for JavaScript", an open source project that includes the ability to unzip, unrar and untar: https://github.com/codedread/bitjs
Used in my comic book reader: https://github.com/codedread/kthoom (also open source).
HTH!
If anyone's reading images or other binary files from a zip file hosted on a remote server, you can use the following snippet to download it and create a zip object using the jszip library.
// this function just gets the public url of the zip file.
let url = await getStorageUrl(path)
console.log('public url is', url)
// declared outside the callbacks so it can be used after loading finishes
let zipObj
// get the zip file to the client
axios.get(url, { responseType: 'arraybuffer' }).then((res) => {
  console.log('zip download status ', res.status)
  // load contents into jszip and create an object
  jszip.loadAsync(new Blob([res.data], { type: 'application/zip' })).then((zip) => {
    zipObj = zip
    // list every entry in the archive
    $.each(zip.files, function (index, zipEntry) {
      console.log('filename', zipEntry.name)
    })
  })
})
Now, using zipObj (once loading has finished), you can access the files and create a src url for each one.
var fname = 'myImage.jpg'
zipObj.file(fname).async('blob').then((blob) => {
  // blobUrl can be used as the src of an element
  var blobUrl = URL.createObjectURL(blob)
})
