Coordinating filesystem activity in Node.js

What is the best practice for coordinating access to files in Node.js?
I'm trying to write an HTTP-based file uploader for very large files (tens of GB) that is resumable. I'm trying to figure out the best approach to handle two people trying to upload the same file at the same time... I'm also trying to think ahead to the possibility that more than one copy of the Node.js HTTP server is running behind a load balancer, which means catching duplicate uploads can't rely on the code alone.
In Python, for example, you can force an atomic create by passing the correct flags to the open() call. I'm not sure whether Node.js's default open-new-file behavior is atomic.
Another option I thought of, but don't really want to pursue, is using a database with an async driver that supports atomic transactions to track this state...

In order to know whether multiple users are uploading the same file, you will have to identify the files somehow. Hashing is best for this. First, hash the entire file on the client side to identify it. Tell the server the hash of the file; if there is already a file on the server with the same hash, then the file has already been uploaded or is currently being uploaded.
Since this is an HTTP file server, you will likely want users to upload files from a browser. You can get the contents of a file in the browser using the FileReader API. Unfortunately, as of now, this isn't widely supported. You might have to use something like Flash to get it to work in other browsers.
As you stream the file into memory with the FileReader, you will want to break it into chunks and hash the chunks. Then send the server all of the file's chunk hashes. It's important that you break the file into chunks and hash those individual chunks, rather than the contents of the entire file, because otherwise the client could send one hash and upload an entirely different file.
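As a rough client-side sketch of that chunk-and-hash step (the 1 MB chunk size and SHA-256 are arbitrary choices, and Blob.arrayBuffer()/Web Crypto are used here as modern equivalents of the FileReader approach described above):

    const CHUNK_SIZE = 1024 * 1024; // 1 MB; an arbitrary choice

    async function hashChunks(file) {
      const hashes = [];
      for (let offset = 0; offset < file.size; offset += CHUNK_SIZE) {
        // File inherits slice() from Blob; this doesn't read anything yet.
        const chunk = file.slice(offset, offset + CHUNK_SIZE);
        const buffer = await chunk.arrayBuffer();
        const digest = await crypto.subtle.digest('SHA-256', buffer);
        // Hex-encode each digest so the array can be POSTed to the server.
        hashes.push([...new Uint8Array(digest)]
          .map((b) => b.toString(16).padStart(2, '0'))
          .join(''));
      }
      return hashes;
    }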
Once the hashes are received and compared with other files' hashes, if it turns out someone else is currently uploading the same file, the server decides which user gets to upload which chunks of the file. It then tells each uploading client which chunks it wants, and the clients upload their corresponding chunks.
As each chunk is finished uploading, it is rehashed on the server and compared with the original array of hashes to verify that the user is uploading the correct file.
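A minimal server-side sketch of that verification, using Node's built-in crypto module (the function name and the SHA-256 choice are assumptions):

    const crypto = require('crypto');

    // Returns true if the uploaded chunk matches the hash the client declared.
    function verifyChunk(chunkBuffer, expectedHexHash) {
      const actual = crypto.createHash('sha256')
        .update(chunkBuffer)
        .digest('hex');
      return actual === expectedHexHash;
    }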

I found this on HackerNews under a response to someone complaining about some of the same things in node.js. I'll put it here for completeness. This allows me to at least lock some file writes in node.js like I wanted to.
IsaacSchlueter:
You can open a file with O_EXCL if you pass in the open flags as a number. (You can find them on require("constants"), and they need to be binary-OR'ed together.) This isn't documented. It should be. It should probably also be exposed in a cleaner way. Most of the rest of what you describe is APIs that need to be polished and refined a bit. The boundaries are well defined at this point, though. We probably won't add another builtin module at this point, or dramatically expand what any of them can do. (I don't consider seek() dramatic, it's just tricky to get right given JavaScript's annoying Number problems.)
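A minimal sketch of the exclusive-create trick the quote describes; in current Node the O_* constants live on fs.constants, and the documented 'wx' string flag gives the same fail-if-exists behavior:

    const fs = require('fs');

    // O_CREAT | O_EXCL: create the file, failing with EEXIST if it exists.
    // Older Node exposed these via require("constants"), as the quote says.
    const flags = fs.constants.O_CREAT | fs.constants.O_EXCL | fs.constants.O_WRONLY;

    // 'upload-abc123.lock' is a hypothetical per-upload lock-file name.
    fs.open('upload-abc123.lock', flags, 0o644, (err, fd) => {
      if (err && err.code === 'EEXIST') {
        // Another process won the race: treat the upload as already claimed.
        return console.log('upload already in progress');
      }
      if (err) throw err;
      // Exclusive create succeeded; this process owns the upload.
      fs.close(fd, () => {});
    });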

Related

How can I upload large files from browser to S3 using Laravel in a load balanced environment?

I have an application that allows users to upload relatively large video files, which are stored on S3. We previously used flow-php-server to chunk uploads over multiple requests that are then assembled and stored on S3. Unfortunately, this method no longer works, as we recently upgraded our server architecture to a load balanced environment and chunks are being split across the multiple servers behind our load balancer.
What is the solution to this problem? I am under the impression that if we split file uploads over multiple requests, we can make no guarantees about which server each one hits, so uploads will fail. Does this mean I'll have to settle for single-request uploads and deal with browsers' single-request file size limits? Is there another way around this?
I'm not sure if the solution requires configuring the server/load balancer to somehow direct uploads for the same file to the same server, or if there's a different method I can implement on the front/back ends to accommodate this.

How to get files above the server directory in Node.js

I am trying to serve audio files with a Node.js server. The problem is, I want to be able to serve any audio file on my computer, but I don't know how to make audio elements in HTML work with directories above the server's root. There is a question I found, but since the files I want to serve are always changing, it didn't really help.
The project is basically a media player in the browser. It will be on a LAN, so serving everything on the computer isn't really a problem. I am already using Express's static function for images, JavaScript, and CSS. The application keeps the path, name, and other information using nedb.
First off, you have to understand that Node.js doesn't serve any files by default. You must either code each individual request manually, so that the /foo request generates content from some specific file or code, or create some set of mappings where /content/foo tells your server to read some corresponding directory on your server like /myservercontent/foo.
And there are various tools to help you create this mapping for entire directory hierarchies of files (such as express.static()). But any mapping like this has an explicit root, and all requests are resolved relative to that root. You can define where you want this root to be on your server. It can even be the root directory of your server (though that is never recommended, for a variety of reasons). Usually, this root is set to some parent directory on your hard disk that ONLY contains public web content below it. This is so that you NEVER create a situation where some random web user can get access to files on your system that you do not intend to be public (such as your HTTPS certificates, databases, server code, password files, etc...).
All that said, Node.js allows you to do pretty much anything you want. If you want to give access to some random file on your hard drive in any random location (something I would never recommend), you can easily code Node.js to do that. There are several ways to achieve it. One simple one would be to construct a route that accepts a query parameter specifying the path to the desired file, and then have that route handler go read that file and return it. This is a gaping security hole big enough to drive a truck through, so I would never, ever recommend it.
Instead, what you really should do is to gather up all the files that you wish to make available via your server and put them into one safe directory hierarchy and then allow access to files in that specific directory hierarchy, not anywhere else on your hard drive.
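For instance, a sketch of that recommended setup with express.static() (the directory and route names here are illustrative):

    const express = require('express');
    const path = require('path');
    const app = express();

    // Only files under ./public are ever reachable; express.static()
    // resolves requests relative to this root and rejects '..' escapes.
    app.use('/content', express.static(path.join(__dirname, 'public')));

    app.listen(3000);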
Now that you've explained a little more about what you're doing, here's one idea:
1. Scan the local hard drive to identify all audio files that you think are safe to share. Be very, very careful what you decide to share, as mistakes here could open big security holes. You will have to assess the security risks of what you're doing here, since we don't understand the full context.
2. As you gather this list of audio files, save the list to some sort of data store that your server is using, so you can quickly access the list at any future time. I'd suggest you create a unique ID for each audio file to make it easier to refer to in the future.
3. You can then offer your remote user a list of these audio files, and they can pick one. The audio file they have picked can then be sent to the server as part of a request to play a specific audio file. I would suggest that files be requested only by ID (for security reasons), though your user interface may choose to display the original path name if that is important or relevant.
4. When your server receives a request to play an audio file with a specific ID, it can consult its data store to find out which audio file that is. This is an important step, because forcing the client to request the file by ID (not by path) means the client can only request audio files that you've previously scanned and deemed safe to make public. There is no vulnerability where a remote client could request some other file that you did not intend to share.
5. Once your server looks in the data store and finds the audio file with that particular ID, it can get the local path from the data store, read the audio file, and send/stream it to the remote client.
As an example, in steps 3 and 4, the client may send a URL that looks like this:
http://someserver/play/5934902
That would be a request to play the audio file with an id of 5934902. Your server would then have a route handler for /play/:id that would use the id to then carry out steps 4 and 5.
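Since the question mentions nedb, a rough sketch of that route handler might look like this (the '_id' and 'path' field names are assumptions):

    const express = require('express');
    const Datastore = require('nedb');

    const app = express();
    // Assumed store of previously scanned files: { _id, path, name, ... }
    const db = new Datastore({ filename: 'tracks.db', autoload: true });

    app.get('/play/:id', (req, res) => {
      db.findOne({ _id: req.params.id }, (err, track) => {
        if (err || !track) return res.sendStatus(404);
        // Only paths already in the data store can ever be served;
        // track.path is assumed to be an absolute path from the scan.
        res.sendFile(track.path);
      });
    });

    app.listen(3000);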

Javascript / Jquery reading text file

I know there is a lot of information about reading data from a local file in JavaScript, but I'd like to confirm that there is no way to just display a window to pick a file path and read the file, entirely on the client side. Just YES/NO.
I need to write a script which generates a schedule completely on the client side. Is there any way to do this other than copy-pasting data into some text area?
P.S. I clarified my first question.
Javascript cannot create files on the client-side. It would be a massive security risk if it were allowed to do so.
The general pattern is to create the file on the server (either physically or in memory) and serve it to the user for download.
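For example, a minimal Express sketch of that pattern (the route and schedule contents are hypothetical):

    const express = require('express');
    const app = express();

    app.get('/schedule.txt', (req, res) => {
      // Placeholder content; a real app would generate this from its data.
      const schedule = 'Mon 09:00 Standup\nTue 14:00 Review\n';
      res.set('Content-Disposition', 'attachment; filename="schedule.txt"');
      res.type('text/plain').send(schedule);
    });

    app.listen(3000);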

Download one file, with pieces stored on more than one server (HTTP)

I am working on a file upload system which will store individual parts of large files on more than one server. So the distribution of a 1GB file will look something like this:
Server 1: 0-128MB
Server 2: 128MB-256MB
Server 3: 256MB-384MB
... etc
The intention of this is to allow for redundancy (each part will exist on more than one server), security (no one server has access to the entire file), and cost (bandwidth expenses are distributed).
I am curious if anyone has an opinion on how I might be able to "trick" web browsers into downloading the various parts all in one link.
What I had in mind was something like:
Browser is linked to Server 1, which advertises a Content-Length for the full file
Once 128MB is served, Server 1 will intentionally close the connection
Hopefully, the browser will try to resume the download, requesting it from Server 1 again
Server 1 provides a 3XX redirect to Server 2
Browser continues downloading from Server 2
I don't know for certain that my example works, as I haven't tested it yet. I was curious if there were other solutions someone might have?
I'd like to make the whole process as easy as possible (ideally requiring no work beyond a simple download). I don't want users to have to use another program (i.e., cat'ing the files together). I'd also like not to use a proxy server, since it would incur extra bandwidth costs.
As far as I'm aware, there is no JavaScript solution for writing a file; if there were one, that would be great.
AFAIK this is not possible using the HTTP protocol. You could probably use a custom browser extension, but it would depend on the browser. Another alternative is to create a Java applet that downloads the file from the different servers. The applet can accept the URLs of the different servers as parameters.
To save the generated file:
https://stackoverflow.com/a/4551467/329062
That solution stores the file in memory though, so it won't work with very large files.
You can download the partial files into a JS variable using JSONP. That will also let you get around the same-origin policy.
JavaScript's security model will only allow you to access data from the same origin the JavaScript came from - i.e., not multiple servers.
If you are going to have the file bits on multiple servers, you will need the user to load the web page, fetch the bits, and then finally stick the bits together in the correct order. If you can manage to get all your users to do this (correctly), you are a better man than I.
It's possible to do in modern browsers over standard HTTP.
You can use XHR2 with CORS to download the file chunks as ArrayBuffers, merge them using the Blob constructor, and then use createObjectURL to send the merged file to the user.
However, I suspect that browsers will store these objects in RAM, so it's probably a bad idea to use it for large files.
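A sketch of that approach, using fetch() as the modern stand-in for XHR2 (the chunk URLs are hypothetical, and each server must send Access-Control-Allow-Origin headers for cross-origin reads to work):

    const chunkUrls = [
      'https://server1.example.com/file.part0',
      'https://server2.example.com/file.part1',
    ];

    Promise.all(chunkUrls.map((url) =>
      fetch(url).then((res) => res.arrayBuffer())
    )).then((buffers) => {
      const merged = new Blob(buffers); // concatenated in array order
      const link = document.createElement('a');
      link.href = URL.createObjectURL(merged);
      link.download = 'file.bin';
      link.click(); // offers the merged file to the user as a download
    });

Note that, as mentioned above, every chunk and the merged Blob live in RAM, so this does not scale to very large files.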

Javascript / S3 multiple file upload

I'm looking for a way to select and upload multiple files to Amazon S3, something in the vein of Uploadify, but with the following constraints :
No Flash or HTML5 - but AJAX and iframe tricks are allowed.
Multiple selection must happen in a single dialog.
Files must be sent directly to Amazon (there is no intermediary server to handle them).
Also, Amazon S3 does not allow uploading multiple files in a single request, which means every file will have to be sent with a distinct request to a distinct URL, so I need to specify what those URLs will be.
Are there any components around that might do this, or any known techniques I could leverage to build my own? Thank you.
Plain HTML file uploads are limited to one file at a time.
Javascript is restricted from accessing the user's file system, and must depend on the HTML file upload mechanism.
Consequently, we are left only with complex options such as Flash, Java applets, or browser plugins. If those are not acceptable, you will not be able to support multiple file uploads.
