I'm trying to send multiple files from the client to the NodeJS server using WebSockets.
To send one file, I currently do the following:
// Client
let upload = document.getElementById('upload');
button.onclick = async function() {
    let file = upload.files[0];
    let byteFile = await getAsByteArray(file);
    socket.send(byteFile);
}

async function getAsByteArray(file) {
    return new Uint8Array(await readFile(file));
}

function readFile(file) {
    return new Promise((resolve, reject) => {
        let reader = new FileReader();
        reader.addEventListener("loadend", e => resolve(e.target.result));
        reader.addEventListener("error", reject);
        reader.readAsArrayBuffer(file);
    });
}
// Server
ws.on('message', function incoming(message) {
    // This returns a buffer, which is what I'm looking for when working with a single file.
    console.log(message);
    return;
});
This works great for one file. I'm able to use the buffer and process the file as I would like. To send two files, my thought was to convert each file to a Uint8Array (as I did for the single file) and push to an array like so:
// Client
let filesArray = [];
let files = upload.files; // Grab uploaded Manifests
for (let file of files) {
    let byteFile = await getAsByteArray(file);
    filesArray.push(byteFile);
}
socket.send(filesArray);
In the same way as with one file, the server receives a buffer for the array that was sent; however, I'm not sure how to work with it. I need each file to be its own buffer in order to work with them. Am I taking the wrong approach here? Or am I just missing some conversion to be able to work with each file?
This works great for one file.
Not really. Unless it is supposed to be used in some very simplistic setup, probably in an isolated (from the internet) network.
You literally send a sequence of bytes to the server, which reads it, and then what is it going to do with it? Save it to disk? Without validating? But how can it validate a random sequence of bytes when it has no hint about what it is? Secondly, where will it save it? Under what name? You didn't send any metadata like a filename. Is it supposed to generate a random name for it? How will the user know that this is his file? Heck, as it is you don't even know who sent that file (no authentication). Finally, what about security? Can I open a WebSocket connection to your server and spam it with arbitrary sequences of data, effectively killing it? You probably need some authentication, but even with it, can any user spam such uploads? Maybe you additionally need tokens with timeouts for that (but then you have to think about how your server will issue such tokens).
I need each file to be its own buffer in order to work with them.
No, you don't. The bare minimum you need is (1) the ability to send files with metadata from the client and (2) the ability to read files with metadata on the server side. You most likely need some authentication mechanism as well. Typically you would use classical HTTP for that, which I strongly encourage you to utilize.
If you want to stick with WebSockets, then you have to implement those already well established mechanisms by yourself. So here's how I would do that:
(1) Define a custom protocol on top of WebSocket. Each frame should have a structure: for example, the first two bytes indicate the "size of command", and the next X bytes (where X is those two bytes interpreted as a 16-bit integer) hold the command as a string. On the server side you read that command, map it to some handler, and run the appropriate action. The data that the command should process is whatever is in the remaining bytes of the frame.
(2) Set up authentication. Not in the scope of this answer, just indicating it is crucial. I'm putting this after (1) because you can reuse the protocol for it.
(3) Whenever you want to upload a file: send a "SEND" command to the server. In the same frame, after the "SEND" command, put the metadata (file name, size, content type, etc.); you can encode it as length-prefixed JSON. After that, put the content of the file in the buffer. A minimal framing sketch follows this list.
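Here is a minimal sketch of such a frame layout, assuming the ws package on the server; the "SEND" command name, the 2-byte big-endian length prefixes, and the JSON metadata fields are illustrative choices, not a fixed standard:

// Client: build one frame = [2-byte cmd length][cmd][2-byte meta length][meta JSON][file bytes]
async function sendFile(socket, file) {
    const cmd = new TextEncoder().encode("SEND");
    const meta = new TextEncoder().encode(JSON.stringify({
        name: file.name, size: file.size, type: file.type
    }));
    const body = new Uint8Array(await file.arrayBuffer());

    const frame = new Uint8Array(2 + cmd.length + 2 + meta.length + body.length);
    const view = new DataView(frame.buffer);
    let offset = 0;
    view.setUint16(offset, cmd.length); offset += 2;
    frame.set(cmd, offset); offset += cmd.length;
    view.setUint16(offset, meta.length); offset += 2;
    frame.set(meta, offset); offset += meta.length;
    frame.set(body, offset);
    socket.send(frame);
}

// Server (ws): parse the same layout back out of the binary message (a Buffer)
ws.on('message', (message) => {
    let offset = 0;
    const cmdLen = message.readUInt16BE(offset); offset += 2;
    const cmd = message.toString('utf8', offset, offset + cmdLen); offset += cmdLen;
    const metaLen = message.readUInt16BE(offset); offset += 2;
    const meta = JSON.parse(message.toString('utf8', offset, offset + metaLen)); offset += metaLen;
    const fileBuffer = message.subarray(offset);   // the actual file bytes

    if (cmd === 'SEND') {
        // route to your upload handler, e.g. validate and then save fileBuffer under meta.name
    }
});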
This solution should obviously be refined with the tokens mentioned earlier. For proper responsiveness and concurrency, you should probably split large files into separate WebSocket frames (which complicates the design a lot).
Anyway, as you can see, the topic is far from trivial and requires lots of experience. And it is basically reimplementing what HTTP does anyway. Again: I strongly suggest you use plain old HTTP.
Send each buffer in a separate message:
button.onclick = async function() {
    for (const file of upload.files) {
        socket.send(await getAsByteArray(file));
    }
}
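On the server side, each of those send() calls then arrives as its own message, so every file is already its own buffer. A small sketch assuming the ws package (the receivedFiles array is just illustrative); note that without metadata you still won't know which buffer belongs to which filename:

// Server: each message now corresponds to exactly one uploaded file
const receivedFiles = [];

ws.on('message', function incoming(message) {
    receivedFiles.push(message);   // one Buffer per file
    console.log(`Received file #${receivedFiles.length} (${message.length} bytes)`);
});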
I am trying to make a client-server application where the client sends the server a list of filenames in an array, such as let files = ["Cat.jpeg", "Moon.gif"], and I am trying to use Buffers so the server can send a response back to the client and open these files. But I am not sure how to approach this.
I tried
let imageNames = Buffer.from(files)
but I am not sure how to extract this and read these values.
An approach would be to return an array of Blob objects (or buffers of the files that you have read on the server) back to the frontend and have the JS convert the Blobs/Buffers to the original file types for downloading.
Here's an example with PDFs that can easily be adapted:
Node:
// Express route
let fileBuffer = fs.readFileSync('path/to/file.pdf');
return res.status(200).send(fileBuffer);
Frontend (React & Axios):
axios.get('api/endpoint', { responseType: 'arraybuffer' })  // ask axios for binary data, not text
    .then(response => {
        const pdfBlob = new Blob(
            [response.data],
            { type: 'application/pdf;charset=utf8' }
        );
        // Save file here
    })
    .catch(error => {
        // do something with error
    });
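For the "// Save file here" step, one common option (an illustrative sketch, not the only way) is to turn the Blob into an object URL and trigger a download:

// Turn the Blob into a temporary URL and click a hidden link to download it
function saveBlob(blob, filename) {
    const url = URL.createObjectURL(blob);
    const link = document.createElement('a');
    link.href = url;
    link.download = filename;          // e.g. 'file.pdf'
    document.body.appendChild(link);
    link.click();
    link.remove();
    URL.revokeObjectURL(url);          // free the object URL when done
}

// usage inside the .then() above:
// saveBlob(pdfBlob, 'file.pdf');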
Note
The Blob API is part of the File API, which is in Working Draft and can change between browsers and versions. So ensure to do extensive browser testing.
Instead of buffers, I would specifically suggest using streams to pipe each file down in the response individually.
Create a route that receives a filename and streams that file down in the response.
The front end can then make an individual request for each file.
This will not overload the server, and it raises the number of requests the server can handle, compared with a single request that does heavy CPU work reading all of these files and returning one response, which is something Node.js is really bad at.
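A minimal sketch of that idea with Express (the route path, directory, and filename handling are illustrative; a real route should validate the name and check authorization):

const express = require('express');
const fs = require('fs');
const path = require('path');

const app = express();
const FILES_DIR = path.join(__dirname, 'files');   // illustrative directory

app.get('/files/:name', (req, res) => {
    // Very naive name check; a real app needs stricter validation
    const safeName = path.basename(req.params.name);
    const filePath = path.join(FILES_DIR, safeName);

    const stream = fs.createReadStream(filePath);
    stream.on('error', () => res.status(404).end());  // e.g. file not found
    stream.pipe(res);                                  // stream the file, no full buffering
});

app.listen(3000);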
I want to handle synchronizing between browser cache (indexedDB) and S3. Therefore I utilize timestamps.
The tricky part is, that my browser application needs to know the exact "last update" timestamp of the file in S3 to store it alongside the locally cached file (so I can sense differences on the one or other side by timestamps being not equal).
Currently, my best solution is:
// Upload of file
var upload = new AWS.S3.ManagedUpload({
    params: {
        // some params
    }
});
await upload.promise();

// Call listObjectsV2
var s3Objects = await s3.listObjectsV2(params).promise();
// get the "LastModified" value from the listObjectsV2 result
I really dislike this solution, as it makes an extra listObjectsV2 call that takes time and is charged for by AWS.
Off the top of my head, I expected there would be something in the return value of the upload that I could use. But I can't find anything. What am I missing?
Looking at the documentation for the AWS SDK for JavaScript, I don't think you're missing anything at all: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3/ManagedUpload.html#promise-property
It simply does not return any date-time field after a successful upload.
(I've been searching for something like this myself, only for .NET. In the end I had to start sending metadata requests after uploading.)
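Along those lines, a hedged sketch of the "metadata request after uploading" approach in JavaScript: a headObject call returns LastModified and avoids listing the whole bucket (the bucket and key names here are illustrative):

// Upload as before
const upload = new AWS.S3.ManagedUpload({
    params: { Bucket: 'my-bucket', Key: 'my-key', Body: fileBody }  // illustrative params
});
await upload.promise();

// HEAD request for just this object's metadata
const head = await s3.headObject({ Bucket: 'my-bucket', Key: 'my-key' }).promise();
console.log(head.LastModified);   // Date with S3's "last update" timestamp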
Perhaps listening to S3 events could be an alternative: https://aws.amazon.com/blogs/aws/s3-event-notification/
I have a UI which uploads files to S3 using pre-signed URLs (it gets the pre-signed URLs from the server). It works quite nicely, except now there's a requirement to add one more layer of encryption on top (in case the bucket gets exposed by mistaken policies).
I understand there are ways to encrypt at the JavaScript layer using asymmetric keys, but it seems I have to read the file completely into memory, encrypt it, and then send it. So if I upload 1 GB, it will crash the browser/tab.
So is there an efficient way around this? Right now I just use the Angular $http service to upload the file. It's capable of handling 1 GB files on its own - it seems to internally break the file into chunks and send it across.
I wasn't sure how to emulate that chunk behavior on my own. I can make use of File.slice() to read a part and encrypt it. However, the pre-signed URL will upload it as a single entity; the next part would just replace the first one. I'm not sure how to combine multipart upload with pre-signed URLs.
I was also wondering if there's any way to intercept the chunks that the $http service sends out, encrypt the body, and then let them go again?
If there are no options, I would have to fall back to simply uploading the file to the server side, encrypting it there, and pushing it to S3.
Create a Cognito Identity Pool
Get temporary credentials using the getCredentials function below
Use the aws-sdk library and the createMultipartUpload method
Example function:
function getCredentials() {
    return new Promise((resolve, reject) => {
        const cognitoIdentityPoolId = 'us-east-1:xxxxxxxx';
        AWS.config.region = 'us-east-1';
        AWS.config.credentials = new AWS.CognitoIdentityCredentials({
            IdentityPoolId: cognitoIdentityPoolId
        });
        AWS.config.credentials.get(err => {
            if (err) {
                return reject(err);
            }
            const cognitoIdentityId = AWS.config.credentials.identityId;
            const cognitoidentity = new AWS.CognitoIdentity();
            cognitoidentity.getCredentialsForIdentity({
                IdentityId: cognitoIdentityId
            }, (err, data) => {
                if (err) {
                    reject(err);
                } else {
                    resolve(data.Credentials);
                }
            });
        });
    });
}
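From there, a hedged sketch of the multipart upload itself with createMultipartUpload / uploadPart / completeMultipartUpload (the bucket, key, and the 5 MB part size are illustrative; each part could be encrypted before being passed as Body):

// Slice the File into parts and upload them with the low-level multipart API
async function multipartUpload(s3, bucket, key, file, partSize = 5 * 1024 * 1024) {
    const { UploadId } = await s3.createMultipartUpload({ Bucket: bucket, Key: key }).promise();
    const parts = [];

    try {
        let partNumber = 1;
        for (let start = 0; start < file.size; start += partSize, partNumber++) {
            const blobPart = file.slice(start, start + partSize);   // only this slice is read
            const body = new Uint8Array(await blobPart.arrayBuffer());
            // (a per-part encryption step could go here)

            const { ETag } = await s3.uploadPart({
                Bucket: bucket,
                Key: key,
                UploadId,
                PartNumber: partNumber,
                Body: body
            }).promise();
            parts.push({ ETag, PartNumber: partNumber });
        }

        return s3.completeMultipartUpload({
            Bucket: bucket,
            Key: key,
            UploadId,
            MultipartUpload: { Parts: parts }
        }).promise();
    } catch (err) {
        await s3.abortMultipartUpload({ Bucket: bucket, Key: key, UploadId }).promise();
        throw err;
    }
}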
I had asked this question assuming I could encrypt large files using public-key encryption. I've seen a few strings getting encrypted in JS using a public key, so I assumed I could replicate the logic for a large file.
Turns out, encrypting a large file directly with public/private keys isn't feasible: it would require a huge key and/or be very time-consuming to encrypt and decrypt (even if chunked).
So, in the end I followed the industry practice, which is to generate a symmetric key, encrypt the object with it, and then encrypt the symmetric key with my public key. Which turns out to be what AWS does all along with aws:kms data keys.
Once I resolved to use symmetric keys, I couldn't do it in the JS layer, so I pushed it all to the server. Transferring the file to the server and then to S3 again is not as slow as I perceived.
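For reference, a hedged sketch of that envelope-encryption pattern on the server with Node's crypto module (the RSA public key, the AES-256-GCM choice, and the output layout are illustrative):

const crypto = require('crypto');

// Encrypt a file buffer with a fresh symmetric key, then wrap that key with an RSA public key
function envelopeEncrypt(fileBuffer, publicKeyPem) {
    const dataKey = crypto.randomBytes(32);            // one-off AES-256 key
    const iv = crypto.randomBytes(12);                 // GCM nonce

    const cipher = crypto.createCipheriv('aes-256-gcm', dataKey, iv);
    const encrypted = Buffer.concat([cipher.update(fileBuffer), cipher.final()]);
    const authTag = cipher.getAuthTag();

    // Only the small data key is RSA-encrypted, never the whole file
    const encryptedKey = crypto.publicEncrypt(publicKeyPem, dataKey);

    return { encryptedKey, iv, authTag, encrypted };   // store/upload these alongside each other
}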
I have data coming in through a websocket. It sends binary data in 20ms chunks. I need to concatenate each of these chunks so that a backend process can read the data as a continuous stream as it comes in.
//Create the file and append binary data as it comes in
tmp.file({ postfix: '.raw' }, function (err, path, fd, cleanup) {
    if (err) throw err;
    newPath = path;
    fs.appendFile(newPath, Buffer.from(binary), (err) => {
        if (err) throw err;
    });
});

//Read the file as it is written
fs.createReadStream(newPath).pipe(recStream);
For now I just have a simple half second delay on createReadStream to make sure there is data in the file.
This certainly does not feel correct and is not working. What is the correct way to go about this?
The best thing to do in this situation would be to tell the server you're receiving data from to pause until you're ready to process more (drain). Assuming that's not an option for you:
Start by writing incoming data to your destination stream. If write(chunk) returns false, this means the stream's internal buffer is full; it's time to start buffering subsequent data to disk. (The chunk you just wrote resulting in a false return value is buffered; do not write it to disk -- false does not mean the write failed, it's just a signal that the buffer has more data than highWaterMark.)
In a temporary folder, create a new file (A) write stream and write the next chunk(s) of incoming data to it. Do this until your destination stream emits a drain event.
When your destination drains:
Swap out buffer files. Close the current buffer file A and create a new temporary file B to begin writing new incoming data to it.
Open a read stream on the temporary file A and start piping data from it into your destination stream. You probably can't use a plain pipe() call, since it will signal the end of data when you reach the end of the temp file, which is not what we want, since it is not the actual end of all incoming data. Either pass { end: false } to pipe() or implement the piping yourself, minus calling end().
When the temp file's stream A emits end, delete the file A. Then go back to step 1 and begin the process again with file B. (If no data was written to file B in the meantime, go back to unbuffered operation, writing incoming data directly to the destination stream.)
Once the server signals that it is done sending data and all data has been read out of your temporary files, call end() on the destination stream to signal that there is no more data. All done!
By swapping between temporary buffer files and deleting them once their data is processed, you don't have to worry about reading data as it is written to a file. Plus, you don't have to buffer the entire incoming data stream on disk.
Of course, this does make the assumption that your storage medium is guaranteed to accept writes faster than you will receive data over the network. This is probably safe, but things will likely break down if this assumption is incorrect. Test this using production systems -- what is the peak incoming data rate and how quickly can you write to disk on your prod system?
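A condensed sketch of that flow, assuming `incoming` is an event emitter that emits 'data'/'end' (e.g. a thin wrapper around the WebSocket) and `dest` is the destination writable stream; the temp-file naming is illustrative, and for brevity it assumes no data is still sitting in a buffer file when the incoming stream ends:

const fs = require('fs');
const os = require('os');
const path = require('path');

function relay(incoming, dest) {
    let bufferFile = null;   // current temp write stream; null means "write straight to dest"
    let replaying = false;   // true while a finished buffer file is being replayed into dest
    let fileIndex = 0;

    const newBufferFile = () =>
        fs.createWriteStream(path.join(os.tmpdir(), `relay-buf-${fileIndex++}.raw`));

    incoming.on('data', chunk => {
        if (bufferFile) {
            bufferFile.hasData = true;        // track ourselves (bytesWritten lags behind)
            bufferFile.write(chunk);          // already spilling to disk: keeps ordering intact
        } else if (!dest.write(chunk)) {      // dest's buffer is full (this chunk was still accepted)
            bufferFile = newBufferFile();     // start buffering subsequent chunks to disk
        }
    });

    dest.on('drain', drainBufferFile);

    function drainBufferFile() {
        if (!bufferFile || replaying) return;
        replaying = true;
        const fileA = bufferFile;
        bufferFile = newBufferFile();          // file B takes new data while A is replayed
        fileA.end(() => {
            const reader = fs.createReadStream(fileA.path);
            reader.pipe(dest, { end: false }); // don't end dest: more data may still come
            reader.on('end', () => {
                fs.unlink(fileA.path, () => {});
                replaying = false;
                if (!bufferFile.hasData) {     // nothing arrived meanwhile: back to direct writes
                    const empty = bufferFile;
                    bufferFile = null;
                    empty.end(() => fs.unlink(empty.path, () => {}));
                } else if (!dest.writableNeedDrain) {
                    drainBufferFile();         // dest has room again: replay file B now
                }                              // otherwise the next 'drain' event handles it
            });
        });
    }

    incoming.on('end', () => dest.end());      // simplification: assumes buffers already drained
}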
So I'm making a webApp that involves thousands of API queries. Since the API has a limit on the amount of queries I can send it per day, I was wondering if I could simply run the query loop a single time and then write the resulting objects to an array in a new file.
is this possible?
You want to make the calls once, build a cache from the results, and then read from the cache instead of calling the API again.
Are you on the client side or the server side in JS?
Client side will be tricky, but server side is easy:
a file can serve as a cache, and so can a DB or plenty of dedicated tools (memcached, etc.).
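A minimal server-side file cache along those lines (a sketch; the cache path, the queryApi function, and the "refresh whenever the file is missing" policy are illustrative):

const fs = require('fs/promises');

const CACHE_FILE = 'api-cache.json';   // illustrative location

// Return cached results if present, otherwise hit the API once and store the result
async function getResults(queryApi) {
    try {
        return JSON.parse(await fs.readFile(CACHE_FILE, 'utf8'));
    } catch {
        const results = await queryApi();   // the expensive, rate-limited part
        await fs.writeFile(CACHE_FILE, JSON.stringify(results, null, 2));
        return results;
    }
}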
Sure, just pass the array to JSON.stringify() and write the result to a file.
If you are using Node.js it would look something like this:
const fs = require('fs');

function writeResponse(resp, cb) {
    fs.writeFile('response.json', JSON.stringify(resp, null, 2), function (err) {
        if (err) console.log(err);
        if (cb) cb();
    });
}
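Reading the cached response back later might look like this (assuming the same response.json file name):

function readResponse(cb) {
    fs.readFile('response.json', 'utf8', function (err, data) {
        if (err) return cb(err);
        cb(null, JSON.parse(data));   // back to the original array of objects
    });
}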
If you are in a browser you can use the Web Storage API, which allows key/value storage of roughly 5-10 MB depending on the browser. If that doesn't work, maybe write a quick Node.js server that works as a caching proxy. A quick Google search suggests you might be able to find one ready to deploy.
You could probably use local storage, which is accessible across your domain and will remain on the user's computer indefinitely. Note that localStorage only stores strings, so the data needs to be serialized. Perhaps something like this:
function getData() {
    let data = localStorage.getItem("myData");
    if (data === null) {
        data = makeQuery();   // your existing API query loop
        localStorage.setItem("myData", JSON.stringify(data));
        return data;
    }
    return JSON.parse(data);
}