Encrypting large files in JavaScript & storing in S3 efficiently - javascript

I have a UI which uploads files to S3 using pre-signed URLs (it gets the pre-signed URLs from the server). It works quite nicely, except now there's a requirement to add one more layer of encryption on top (in case the bucket gets exposed by mistaken policies).
I understand there are ways to encrypt at the JavaScript layer using asymmetric keys, but it seems I have to read the file completely into memory, encrypt it, and then send it. So if I upload 1GB, it will crash the browser/tab.
So is there an efficient way around this? Right now I just use the Angular $http service to upload the file. It's capable of handling 1GB files on its own - it seems to internally break the file into chunks and send them across.
I wasn't sure how to emulate that chunking behaviour on my own. I can make use of File.slice() to read a part and encrypt it. However, the pre-signed URL will upload it as a single entity; the next part will only replace the first one. I'm not sure how to combine multipart upload with pre-signed URLs.
I was also wondering if there's any way to intercept the chunks that the $http service sends out, encrypt the body and then let them go again?
If there are no options, I would have to fall back to simply uploading the file to the server side, encrypting it there and pushing it to S3.

Create a Cognito Identity Pool.
Get temporary credentials using a getCredentials function.
Use the aws-sdk library and its createMultipartUpload method.
Example function:
function getCredentials() {
  return new Promise((resolve, reject) => {
    const cognitoIdentityPoolId = 'us-east-1:xxxxxxxx';
    let cognitoIdentityId = '';
    AWS.config.region = 'us-east-1';
    AWS.config.credentials = new AWS.CognitoIdentityCredentials({
      IdentityPoolId: cognitoIdentityPoolId
    });
    AWS.config.credentials.get(err => {
      if (err) {
        return reject(err);
      }
      cognitoIdentityId = AWS.config.credentials.identityId;
      let cognitoidentity = new AWS.CognitoIdentity();
      cognitoidentity.getCredentialsForIdentity({
        IdentityId: cognitoIdentityId
      }, (err, data) => {
        if (err) {
          reject(err);
        } else {
          resolve(data.Credentials);
        }
      });
    });
  });
}
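To connect the pieces, here is a rough sketch (not part of the original answer) of how those temporary credentials could drive an S3 multipart upload from the browser; the bucket name, key prefix and 5MB part size are placeholder assumptions:
async function multipartUpload(file) {
  const creds = await getCredentials();
  const s3 = new AWS.S3({
    accessKeyId: creds.AccessKeyId,
    secretAccessKey: creds.SecretKey,
    sessionToken: creds.SessionToken
  });
  const params = { Bucket: 'my-bucket', Key: 'uploads/' + file.name }; // placeholders

  const { UploadId } = await s3.createMultipartUpload(params).promise();
  const partSize = 5 * 1024 * 1024; // S3 requires at least 5MB per part (except the last)
  const parts = [];

  for (let start = 0, partNumber = 1; start < file.size; start += partSize, partNumber++) {
    const chunk = file.slice(start, start + partSize); // encrypt this chunk here if needed
    const { ETag } = await s3.uploadPart({
      ...params, UploadId, PartNumber: partNumber, Body: chunk
    }).promise();
    parts.push({ ETag, PartNumber: partNumber });
  }

  return s3.completeMultipartUpload({
    ...params, UploadId, MultipartUpload: { Parts: parts }
  }).promise();
}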

I had asked this question assuming I could encrypt large files using public-key encryption. I'd seen a few strings getting encrypted in JS using a public key, hence assumed I could replicate the logic for a large file.
Turns out, encrypting a large file using public/private keys isn't feasible. It would require a huge key and/or would be time-consuming to encrypt/decrypt (even if chunked).
So in the end I followed the industry practice - generate a symmetric key, encrypt the object with it, and then encrypt the symmetric key with my public key. Which turns out to be what AWS does all along with aws:kms data keys.
Once I resolved to using symmetric keys, I couldn't do it in the JS layer. I pushed it all to the server. Transferring the file to the server and then to S3 again is not as slow as I perceived.
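For reference, a minimal Node.js sketch of that envelope-encryption pattern, assuming an RSA public key in publicKeyPem (an illustration of the idea only, not the exact server code used here):
const crypto = require('crypto');

function envelopeEncrypt(buffer, publicKeyPem) {
  const dataKey = crypto.randomBytes(32);                 // one-off AES-256 data key
  const iv = crypto.randomBytes(12);                      // GCM nonce
  const cipher = crypto.createCipheriv('aes-256-gcm', dataKey, iv);
  const ciphertext = Buffer.concat([cipher.update(buffer), cipher.final()]);
  const authTag = cipher.getAuthTag();
  // Only the small data key is encrypted with the (slow) asymmetric key, not the whole file
  const encryptedKey = crypto.publicEncrypt(publicKeyPem, dataKey);
  return { ciphertext, iv, authTag, encryptedKey };
}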

Related

How to send multiple files to server using WebSockets?

I'm trying to send multiple files from the client to the NodeJS server using WebSockets.
To send one file, I currently do the following:
// Client
let upload = document.getElementById('upload');

button.onclick = async function () {
  let file = upload.files[0];
  let byteFile = await getAsByteArray(file);
  socket.send(byteFile);
}

async function getAsByteArray(file) {
  return new Uint8Array(await readFile(file));
}

function readFile(file) {
  return new Promise((resolve, reject) => {
    let reader = new FileReader();
    reader.addEventListener("loadend", e => resolve(e.target.result));
    reader.addEventListener("error", reject);
    reader.readAsArrayBuffer(file);
  });
}
// Server
ws.on('message', function incoming(message) {
  // This returns a buffer, which is what I'm looking for when working with a single file.
  console.log(message);
  return;
});
This works great for one file. I'm able to use the buffer and process the file as I would like. To send two files, my thought was to convert each file to a Uint8Array (as I did for the single file) and push to an array like so:
// Client
let filesArray = [];
let files = upload.files; // Grab uploaded Manifests
for (let file of files) {
  let byteFile = await getAsByteArray(file);
  filesArray.push(byteFile);
}
socket.send(filesArray);
In the same way as with one file, the server returns a buffer for the array that was sent; however, I'm not sure how to work with it. I need each file to be its own buffer in order to work with them. Am I taking the wrong approach here? Or am I just missing some conversion to be able to work with each file?
This works great for one file.
Not really - unless it is supposed to be used in some very simplistic setup, probably on a network isolated from the internet.
You literally send a sequence of bytes to the server, which reads it - and then what is it going to do with it? Save it to disk? Without validating? But how can it validate a random sequence of bytes when it has no hint about what it is? Secondly, where will it save it? Under what name? You didn't send any metadata like the filename. Is it supposed to generate a random name for it? How will the user know that this is his file? Heck, as it is you don't even know who sent that file (no authentication). Finally, what about security? Can I open a WebSocket connection to your server and spam it with arbitrary sequences of data, effectively killing it? You probably need some authentication, but even with it, can any user spam such uploads? Maybe you additionally need tokens with timeouts for that (but then you have to think about how your server will issue such tokens).
I need each file to be its own buffer in order to work with them.
No, you don't. The bare minimum you need is (1) the ability to send files with metadata from the client and (2) the ability to read files with metadata on the server side. You most likely need some authentication mechanism as well. Typically you would use classical HTTP for that, which I strongly encourage you to utilize.
If you want to stick with WebSockets, then you have to implement those already well established mechanisms by yourself. So here's how I would do that:
(1) Define a custom protocol on top of WebSocket. Each frame should have a structure, for example: the first two bytes indicate the "size of the command", and the next X bytes (the previous two bytes interpreted as a 16-bit integer) hold the command as a string. On the server side you read that command, map it to some handler, and run the appropriate action. The data that the command should process is whatever sits in the remaining bytes of the frame.
(2) Set up authentication. Not in the scope of this answer, just indicating that it is crucial. I'm putting this after (1) because you can reuse the protocol for it.
(3) Whenever you want to upload a file: send a "SEND" command to the server. In the same frame, after the "SEND" command, put the metadata (file name, size, content type, etc.); you can encode it as JSON prefixed with its length. After that, put the content of the file in the buffer.
This solution should obviously be refined with the (mentioned earlier) tokens. For proper responsiveness and concurrency, you should probably split large files into separate WebSocket frames (which complicates the design a lot).
Anyway, as you can see, the topic is far from trivial and requires lots of experience, and it is basically a reimplementation of what HTTP already does. Again: I strongly suggest you use plain old HTTP.
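For illustration only, here is a client-side sketch of the kind of framing described in (1) and (3); the exact layout (2-byte command length, 4-byte metadata length) is an assumption of this sketch, not a standard:
function buildSendFrame(file, fileBytes) {
  const encoder = new TextEncoder();
  const command = encoder.encode("SEND");
  const meta = encoder.encode(JSON.stringify({ name: file.name, size: file.size, type: file.type }));

  const frame = new Uint8Array(2 + command.length + 4 + meta.length + fileBytes.length);
  const view = new DataView(frame.buffer);
  let offset = 0;

  view.setUint16(offset, command.length); offset += 2;   // command length
  frame.set(command, offset); offset += command.length;  // command string
  view.setUint32(offset, meta.length); offset += 4;      // metadata length
  frame.set(meta, offset); offset += meta.length;        // metadata as JSON
  frame.set(fileBytes, offset);                          // file content
  return frame;
}

// Usage: socket.send(buildSendFrame(file, await getAsByteArray(file)));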
Send each buffer in a separate message:
button.onclick = async function () {
  // FileList has no forEach, and await must sit inside the async function, so iterate with for...of
  for (const file of upload.files) {
    socket.send(await getAsByteArray(file));
  }
}

JavaScript NodeJS Buffers

I am trying to make a client-server application where the client sends the server a list of filenames in an array, such as let files = ["Cat.jpeg", "Moon.gif"], and I am trying to use Buffers so the server can send a response back to the client and open these files. But I am not sure how to approach this.
I tried
let imageNames = Buffer.from(files)
but I am not sure how to extract this and read these values.
An approach would be to return an array of Blob objects (or buffers of the files that you have read on the server) back to the frontend and get the JS to convert the Blobs/Buffers to the original file types for downloading.
Here's an example with PDFs that can easily be adapted:
Node:
// Express route
const fs = require('fs');
let fileBuffer = fs.readFileSync('path/to/file.pdf');
return res.status(200).send(fileBuffer);
Frontend (React & Axios):
// Ask axios for binary data so the PDF bytes aren't mangled as text
axios.get('api/endpoint', { responseType: 'blob' })
  .then(response => {
    const pdfBlob = new Blob(
      [response.data],
      { type: 'application/pdf;charset=utf8' }
    );
    // Save file here
  })
  .catch(error => {
    // do something with the error
  });
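One possible way to fill in the "Save file here" step inside that .then callback (just a sketch; the filename is a placeholder):
const url = URL.createObjectURL(pdfBlob);
const link = document.createElement('a');
link.href = url;
link.download = 'file.pdf'; // placeholder filename
link.click();
URL.revokeObjectURL(url); // free the object URL once the download has been triggered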
Note
The Blob API is part of the File API, which is in Working Draft status and can change between browsers and versions, so be sure to do extensive browser testing.
I would specifically suggest using streams instead of buffers, and piping each file into the response individually.
Create a route that receives a filename and streams the file down in the response.
The front-end can then make an individual request for each file.
This will not overload the server and raises the number of requests it can handle, compared to a single request doing heavy CPU work on all of these files before returning a response - something Node.js is really bad at.
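A minimal sketch of that idea, assuming an Express app and an uploads directory (the route and directory names are placeholders, not from the answer):
const path = require('path');
const fs = require('fs');

app.get('/files/:name', (req, res) => {
  // path.basename guards against '../' escaping the uploads directory
  const filePath = path.join(__dirname, 'uploads', path.basename(req.params.name));
  res.setHeader('Content-Type', 'application/octet-stream');
  // Stream the file so it is never fully buffered in memory
  fs.createReadStream(filePath)
    .on('error', () => res.status(404).end())
    .pipe(res);
});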

Fetching and hashing a file from a URL in a Zapier code action in JavaScript

I was looking for a way to hash files manipulated in a Zapier code action, and since Zapier does not provide such crypto transforms by default, I went on to implement one in JavaScript.
The code below does this:
fetch the file from a public URL
hash said file using sha256
return the hash in the output
const crypto = require('crypto');

return fetch(inputData.fileUrl)
  .then((res) => {
    console.log(res.ok);
    console.log(res.status);
    console.log(res.statusText);
    return res.buffer();
  })
  .then((buffer) => {
    const hash = crypto.createHash('sha256');
    hash.update(buffer);
    callback(null, { "hashValue": hash.digest('hex') });
  })
  .catch(callback);
I am basically calling fetch on the S3 URL and returning the result of the buffer() call on the response object. I then create a sha256 hash object with the 'crypto' module, update it with said buffer, and take the hex digest.
Note:
JavaScript code actions in Zapier can only take strings as input parameters, so any file you want to hash needs to transit through a storage space (e.g. an AWS S3 bucket) that has a publicly accessible URL. If you manipulate private/sensitive data, you may want to delete the file in a subsequent action in your zap. Beware also of non-ASCII characters in the URL (e.g. fetching from AWS S3 returns a 403 Forbidden error if your URL includes characters such as '€').
I hope Zapier users will find this useful, e.g. to automate workflows where you need to ensure file data integrity (accounting, invoicing, ...).

Browser-based upload - AWS S3 Signed POST - Verify correct file is uploaded

I'm trying to understand how I can inform the backend that the user has uploaded a file to S3, tell it the key, and prevent the user from tampering with that key.
The files to be uploaded are private, meaning I'm generating signed GET URLs to download the files when the user wants them.
I'm using boto3 to create the presigned POST urls, so I'm not using a home-made implementation, but this is library-agnostic.
I have a JavaScript frontend and a backend API; the user flow for uploading a file would be more or less:
Browser -> API (GET /sign):
Request:
{
  filename: 'something.txt',
  size: 12345,
  type: 'image/jpeg'
}
Response:
The backend uses filename to calculate a random key (so that collisions are avoided), then uses ACCESS_KEY, SECRET_KEY, BUCKET_NAME and some more info to calculate a signature, and sends back the required parameters to the frontend.
The backend does not save any data in its db, as there's no actual data until the client uploads the file.
{
  url: 'https://mybucket.s3.amazonaws.com',
  fields: {
    acl: 'private',
    key: 'mykey',
    signature: 'mysignature',
    policy: 'mybase64 encoded policy'
  }
}
Browser -> s3 (POST mybucket.s3.amazonaws.com):
Request:
The browser sends the actual file and the signed parameters as multipart/form-data:
{
  acl: 'private',
  key: 'mykey',
  signature: 'mysignature',
  policy: 'mybase64 encoded policy'
}
Response (AWS documentation: RESTObjectPOST):
Among headers and body we have:
Bucket
Key
ETag (which "is an MD5 hash of the object")
Location
Many JavaScript libraries and guides (e.g. FineUploader, Heroku) now simply proxy the key from the POST response to the backend. The backend then checks that the file exists and adds the new file key to its database.
Now, the files to be uploaded are private, meaning I'm generating signed GET URLs to download the files when the user wants them.
Let's say there's UserA and UserB, UserB uploaded FileB.
What's preventing UserA from uploading a file, but sending to the backend the FileB key (which can be guessed or just be random input until they get something existing) and thus the backend saving that userA has fileB?
What I thought of:
Keys are random and a key/file can only belong to one user, so when UserA says "I've uploaded FileB" the backend responds "FileB belongs to another user"/"Error". This seems to me the easiest way to solve the problem, but I think I'm missing something (concurrency, leftover files, ...).
On GET /sign the backend stores "UserA will upload FileA", so on the callback the backend checks that UserA had asked to upload FileA and errors out on a mismatch (a rough sketch of this check follows below).
Checking the ETag returned by S3, so that the user must actually have the file to calculate its MD5 and can't claim FileB.
Setting in the policy that the key is XXXX, and on the callback recalculating the signature and checking that it's the same.
Using a temporary upload bucket, so that a malicious user would need to guess not only a random key but one belonging to a file being uploaded right now. When the callback is called, the backend moves the file with the specified key to the final bucket.
Maybe I'm overlooking something, but I can't find any solution online apart from (I think) hacky ones (setting the key to start with the username, ...).
I can't post more than two links, so some aws documentation is left out.
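A rough Node/Express-style pseudocode sketch of option 2 above (the actual backend here uses boto3, but the idea is the same; db, createPresignedPost and the route names are placeholders):
const crypto = require('crypto');

app.get('/sign', async (req, res) => {
  const key = crypto.randomUUID();                                // random key avoids collisions
  await db.pendingUploads.insert({ userId: req.user.id, key });   // remember who this key was signed for
  res.json(createPresignedPost(key));                             // however the signing is actually done
});

app.post('/uploaded', async (req, res) => {
  const pending = await db.pendingUploads.findOne({ userId: req.user.id, key: req.body.key });
  if (!pending) {
    // UserA claiming FileB: this key was never signed for this user, so reject
    return res.status(403).json({ error: 'key was not signed for this user' });
  }
  await db.files.insert({ userId: req.user.id, key: req.body.key });
  res.json({ ok: true });
});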

How can I write an array to another .js file?

So I'm making a web app that involves thousands of API queries. Since the API limits the number of queries I can send per day, I was wondering if I could simply run the query loop a single time and then write the resulting objects to an array in a new file.
Is this possible?
You want to make the calls, then create a cache, then use the cache instead of the calls.
Are you on the client side or the server side in JS?
Client side will be tricky, but server side is easy:
files can be a cache, and so can a DB or a lot of other tools (memcached, etc.).
Sure, just pass the array to JSON.stringify() and write it to a file.
If you are using Node.js it would look something like this:
const fs = require('fs');

function writeResponse(resp, cb) {
  fs.writeFile('response.json', JSON.stringify(resp, null, 2), function (err) {
    if (err) console.log(err);
    if (cb) cb();
  });
}
If you are in a browser you can use the Web Storage API, which allows storage of key/value pairs up to roughly 10MB depending on the browser. If that doesn't work, maybe write a quick Node.js server that works as a caching proxy. A quick Google search suggests that you might be able to find one ready to deploy.
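A very small sketch of such a caching proxy (the upstream URL, port and cache directory are placeholders; assumes Node 18+ for the global fetch):
const http = require('http');
const fs = require('fs');

const API_BASE = 'https://api.example.com'; // placeholder upstream API
const CACHE_DIR = './cache';
fs.mkdirSync(CACHE_DIR, { recursive: true });

http.createServer(async (req, res) => {
  const cacheFile = CACHE_DIR + '/' + encodeURIComponent(req.url) + '.json';
  res.setHeader('Content-Type', 'application/json');
  if (fs.existsSync(cacheFile)) {
    // Serve the cached copy instead of spending another API call
    return fs.createReadStream(cacheFile).pipe(res);
  }
  const upstream = await fetch(API_BASE + req.url);
  const body = await upstream.text();
  fs.writeFileSync(cacheFile, body); // cache for next time
  res.end(body);
}).listen(3000);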
You could probably use local storage, which is accessible across your domain and will remain on the user's computer indefinitely. Perhaps something like this:
function getData() {
  var data = localStorage.getItem("myData");
  if (data === null) {
    // localStorage only stores strings, so serialize the query result
    data = JSON.stringify(makeQuery());
    localStorage.setItem("myData", data);
  }
  return JSON.parse(data);
}
