Multiple file stream instead of download to disk and then zip? - javascript

I have an API method that, when called with an array of file keys, downloads them from S3. I'd like to stream the files rather than downloading them to disk, then zip them and return the archive to the client.
This is what my current code looks like:
reports.get('/xxx/:filenames', async (req, res) => {
  var AWS = require('aws-sdk');
  var fs = require('fs');
  var s3 = new AWS.S3();
  var str_array = req.params.filenames.split(',');
  for (var i = 0; i < str_array.length; i++) {
    var filename = str_array[i].trim();
    var localFileName = './' + filename;
    var params = {
      Bucket: config.reportBucket,
      Key: filename
    };
    s3.getObject(params, (err, data) => {
      if (err) console.error(err);
      var file = fs.createWriteStream(localFileName);
      s3.getObject(params).createReadStream().pipe(file);
      console.log(file);
    });
  }
});
How would I stream the files rather than downloading them to disk, and how would I zip them and return the archive to the client?

The main problem is zipping multiple files.
More specifically, downloading them from AWS S3 in bulk.
I've searched through the AWS SDK and didn't find bulk S3 operations.
Which brings us to one possible solution:
Load the files one by one and store them in a folder
Zip the folder (with a package like zip-folder)
Send the zipped folder
This is a raw and untested example, but it might give you the idea:
// Always import packages at the beginning of the file.
const AWS = require('aws-sdk');
const fs = require('fs');
const zipFolder = require('zip-folder');
const s3 = new AWS.S3();
reports.get('/xxx/:filenames', async (req, res) => {
  const filesArray = req.params.filenames.split(',');
  for (const fileName of filesArray) {
    const localFileName = './' + fileName.trim();
    const params = {
      Bucket: config.reportBucket,
      Key: fileName.trim()
    };
    // You'll probably need some Promise logic here, to wait until each stream has finished.
    const fileStream = fs.createWriteStream(localFileName);
    s3.getObject(params).createReadStream().pipe(fileStream);
  }
  // After that, all required files should be in the target folder.
  // Now you need to compress the folder and send it back to the user.
  // We wrap the callback in a Promise so the code reads in a "sync" way.
  await new Promise((resolve, reject) =>
    zipFolder('/path/to/the/folder', '/path/to/archive.zip', (err) => (err ? reject(err) : resolve()))
  );
  // And now you can send the zipped archive to the user (also using streams).
  fs.createReadStream('/path/to/archive.zip').pipe(res);
});
For more information about streams, see the Node.js documentation on streams.
Attention: you could run into problems with async behaviour due to the nature of streams, so please first check that all files have been stored in the folder before zipping.
Just a note: I haven't tested this code, so if any questions appear, let's debug it together.
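If you'd rather avoid writing to disk at all, one alternative worth sketching (my assumption, not part of the answer above) is to pipe each S3 read stream straight into a zip stream that is itself piped to the response, e.g. with the archiver package. Treat the route name below as a placeholder:
// Hedged sketch: stream S3 objects directly into a zip sent to the client.
const AWS = require('aws-sdk');
const archiver = require('archiver');
const s3 = new AWS.S3();

reports.get('/zip/:filenames', (req, res) => {
  const keys = req.params.filenames.split(',').map((key) => key.trim());

  res.attachment('reports.zip');      // sets Content-Disposition on the response
  const archive = archiver('zip');
  archive.on('error', (err) => res.status(500).end(err.message));
  archive.pipe(res);                  // zip bytes go straight to the client

  for (const key of keys) {
    const objectStream = s3
      .getObject({ Bucket: config.reportBucket, Key: key })
      .createReadStream();
    archive.append(objectStream, { name: key });
  }

  archive.finalize();                 // no more entries; flush the archive
});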

Related

How to read a large csv as a stream

I am using @aws-sdk/client-s3 to read a JSON file from S3, take the contents, and dump it into DynamoDB. This all currently works fine using:
const data = await (await new S3Client(region).send(new GetObjectCommand(bucketParams)));
And then deserialising the response body etc.
However, I'm looking to migrate to the jsonlines format (effectively CSV, in the sense that it needs to be streamed in line by line, or in chunks of lines, and processed). I can't seem to find a way of doing this that doesn't load the entire file into memory (using response.text() etc.).
Ideally, I would like to pipe the response into a createReadStream, and go from there.
I found this example with createReadStream() from the fs module in Node.js:
import fs from 'fs';

function read() {
  let data = '';
  const readStream = fs.createReadStream('business_data.csv', 'utf-8');
  readStream.on('error', (error) => console.log(error.message));
  readStream.on('data', (chunk) => data += chunk);
  readStream.on('end', () => console.log('Reading complete'));
}

read();
You can modify it for your use. Hope this helps.
You can connect to your S3 bucket like this:
var s3 = new AWS.S3({apiVersion: '2006-03-01'});
var params = {Bucket: 'myBucket', Key: 'myImageFile.jpg'};
var file = require('fs').createWriteStream('/path/to/file.jpg');
s3.getObject(params).createReadStream().pipe(file);
See the AWS SDK for JavaScript documentation on s3.getObject for details.
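Since the question uses the v3 @aws-sdk/client-s3 client, here is a hedged sketch (my assumption, not from the answer above) of processing the object line by line without buffering it, relying on the fact that in Node.js the response Body is a readable stream; the bucket, key and region are placeholders:
// Hedged sketch: read an S3 object line by line with the v3 SDK and readline.
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';
import readline from 'readline';

async function processLines() {
  const s3 = new S3Client({ region: 'us-east-1' }); // placeholder region
  const { Body } = await s3.send(
    new GetObjectCommand({ Bucket: 'myBucket', Key: 'data.jsonl' }) // placeholder names
  );

  // In Node.js, Body is a readable stream, so readline can consume it chunk by chunk.
  const rl = readline.createInterface({ input: Body, crlfDelay: Infinity });
  for await (const line of rl) {
    const record = JSON.parse(line); // one JSON document per line (jsonlines)
    // ...write `record` to DynamoDB here
  }
}

processLines().catch(console.error);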

File not uploading completely to Google Bucket NodeJs

The file does get uploaded, but instead of being hundreds of KBs or a few MBs it's just a couple of bytes, and trying to open it shows a blank page or a "file not exist" error. Same issue with text files and images.
I believe the problem is that the stream doesn't wait to read the whole file before uploading it to the bucket.
The code sample is the one provided by Google in their "Google Cloud Storage: Node.js Client" documentation.
function main(
  bucketName = 'myBucket',
  destFileName = 'MyUploadedFile',
  contents = 'testFile.pdf'
) {
  // Imports the Google Cloud client library
  const {Storage} = require('@google-cloud/storage');
  // Import Node.js stream
  const stream = require('stream');

  // Creates a client
  const storage = new Storage();
  // Get a reference to the bucket
  const myBucket = storage.bucket(bucketName);
  // Create a reference to a file object
  const file = myBucket.file(destFileName);

  const passthroughStream = new stream.PassThrough();
  passthroughStream.write(contents);
  //console.log(passthroughStream.write(contents))
  passthroughStream.end();

  async function streamFileUpload() {
    passthroughStream.pipe(file.createWriteStream({resumable: true, gzip: true})).on('finish', () => {
      // The file upload is complete
    });
    console.log(`${destFileName} uploaded to ${bucketName}`);
  }

  streamFileUpload().catch(console.error);
  // [END storage_stream_file_upload]
}

main(...process.argv.slice(2));
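Note that the sample writes the literal string 'testFile.pdf' into the PassThrough stream, so only those few bytes ever reach the bucket. A hedged sketch of what uploading the actual file contents might look like, assuming the same @google-cloud/storage client and a local file path:
// Hedged sketch: stream a local file's bytes into the bucket,
// instead of uploading the file *name* as the body.
const fs = require('fs');
const {Storage} = require('@google-cloud/storage');

async function uploadLocalFile(bucketName, destFileName, localPath) {
  const storage = new Storage();
  const file = storage.bucket(bucketName).file(destFileName);

  await new Promise((resolve, reject) => {
    fs.createReadStream(localPath)                               // read the real file contents
      .pipe(file.createWriteStream({resumable: true, gzip: true}))
      .on('finish', resolve)                                     // upload complete
      .on('error', reject);
  });

  console.log(`${destFileName} uploaded to ${bucketName}`);
}

uploadLocalFile('myBucket', 'MyUploadedFile', './testFile.pdf').catch(console.error);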

Downloading Image locally from GitHub Raw link using fs.writeFileSync() JS

Currently I'm trying to download an image from GitHub locally. Everything seems to work: the fetch goes through with a 200 OK response. However, I don't understand how to store the image itself:
const rawGitLink = "https://raw.githubusercontent.com/cardano-foundation/CIPs/master/CIP-0001/CIP_Flow.png"
const folder = "/Folder"
const imageName = "/Test"
const imageResponse = await axios.get(rawGitLink)
fs.writeFileSync(___dirname + folder + imageName, imageResponse, (err) => {
  // Error handling
})
Four problems had to be fixed:
The image name must include the .png extension in this case
The response must be requested in the correct format, as a buffer, for an image
You must write the response data, not the response object itself
__dirname only needs two underscores
const rawGitLink = "https://raw.githubusercontent.com/cardano-foundation/CIPs/master/CIP-0001/CIP_Flow.png"
const folder = "/Folder"
const imageName = "/Test.png"
const imageResponse = await axios.get(rawGitLink, { responseType: 'arraybuffer' });
fs.writeFileSync(__dirname + folder + imageName, imageResponse.data)
Axios returns a special object: https://github.com/axios/axios#response-schema
let {data} = await axios.get(...)
await fs.writeFile(filename, data) // you can use fs.promises instead of sync
As @Leau said, you should include the extension in the filename.
Another suggestion is to use the path module to create the filename:
filename = path.join(__dirname, "/Folder", "Test.png")
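For larger files, another option (my assumption, not from the answers above) is to ask axios for a stream and pipe it straight to disk instead of buffering the whole image in memory:
// Hedged sketch: stream the download to disk instead of buffering it.
const fs = require('fs');
const path = require('path');
const axios = require('axios');

async function downloadImage(url, filename) {
  const response = await axios.get(url, { responseType: 'stream' });
  await new Promise((resolve, reject) => {
    response.data                              // a readable stream with responseType: 'stream'
      .pipe(fs.createWriteStream(filename))
      .on('finish', resolve)
      .on('error', reject);
  });
}

const rawGitLink = "https://raw.githubusercontent.com/cardano-foundation/CIPs/master/CIP-0001/CIP_Flow.png";
downloadImage(rawGitLink, path.join(__dirname, "Folder", "Test.png")).catch(console.error);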

Node-less way to generate a CID that matches IPFS-Desktop CID

I'd like to generate a CID (Content identifier) for a file in javascript without having access to an IPFS node or the internet. I've tried using js-multihashing-async to first hash the file and js-cid to generate a CID from the hash but I get a different CID than if I just add the file to ipfs-desktop. It looks like the problem is an IPFS node chunks data and the CID is for the DAG that links the files' chunks. I've tried this library but it doesn't produce the same CID as ipfs-desktop does for the same file. This question is essentially the same as mine but none of the answers give a CID that matches the ipfs-desktop-generated CID.
ipfs-only-hash is the right module to use to create an IPFS CID from a file or a Buffer, without needing to start an IPFS daemon. For the same input file and the same options, it should produce the same CID.
This example is from the ipfs-only-hash tests, where it verifies that it hashes the same buffer to the same CID as a js-ipfs node does.
test('should produce the same hash as IPFS', async t => {
  const data = Buffer.from('TEST' + Date.now())
  const ipfs = new Ipfs({ repo: path.join(os.tmpdir(), `${Date.now()}`) })
  await new Promise((resolve, reject) => {
    ipfs.on('ready', resolve).on('error', reject)
  })
  const files = await ipfs.add(data)
  const hash = await Hash.of(data)
  t.is(files[0].hash, hash)
})
https://github.com/alanshaw/ipfs-only-hash/blob/dbb72ccfff45ffca5fbea6a7b1704222f6aa4354/test.js#L21-L33
I'm one of the maintainers of IPFS Desktop, and under the hood that app calls ipfs.add via the HTTP API of the local IPFS daemon.
When adding or hashing a file manually via the api, there are options to alter how files are chunked into blocks, how those blocks are linked together, and how the blocks are hashed. If any option values differ then the resulting hash and the CID that contains it will be different, even if the input file is the same.
You can experiment with those options and see a visualisation of the resulting DAG (Directed Acyclic Graph) structure here: https://dag.ipfs.io/
For a deep dive into how IPFS chunks and hashes files, you can watch the author of ipfs-only-hash and maintainer of js-ipfs explain it here: https://www.youtube.com/watch?v=Z5zNPwMDYGg
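To illustrate the point about options, here is a hedged sketch (not from the answers above) of hashing the same buffer with different add options via ipfs-only-hash; it assumes the options object is forwarded to ipfs.add, where cidVersion and rawLeaves are standard add options:
// Hedged sketch: identical bytes produce different CIDs when the add options differ.
const Hash = require('ipfs-only-hash');

async function compareOptions() {
  const data = Buffer.from('hello world');

  const defaultCid = await Hash.of(data);                                // CIDv0, default chunker
  const v1Cid = await Hash.of(data, { cidVersion: 1, rawLeaves: true }); // CIDv1, raw leaves

  console.log(defaultCid); // differs from v1Cid even though the input is identical
  console.log(v1Cid);
}

compareOptions().catch(console.error);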
For the sake of posterity, here is how to match the CID of an image downloaded via fetch to the CID generated by ipfs-desktop for the same image (added as a file from the local drive). You have to remove the data:*/*;base64, prefix that is prepended to the image's base64 string and decode the string into a Buffer. Then you get the matching CID.
async testHashes() {
  const url = "https://raw.githubusercontent.com/IanPhilips/jst-cids-test/master/src/23196210.jpg";
  fetch(url)
    .then(response => response.blob())
    .then(blob => new Promise((resolve, reject) => {
      const reader = new FileReader();
      reader.onloadend = () => resolve(reader.result);
      reader.readAsDataURL(blob);
    }))
    .then(async dataUrl => {
      const strData = dataUrl as string;
      // remove "data:*/*;base64," from dataUrl
      const endOfPrefix = strData.indexOf(",");
      const cleanStrData = strData.slice(endOfPrefix + 1);
      const data = Buffer.from(cleanStrData, "base64");
      const hash = await Hash.of(data);
      console.log("fetch data CID: " + hash); // QmYHzA8euDgUpNy3fh7JRwpPwt6jCgF35YTutYkyGGyr8f
    });
  console.log("ipfs-desktop CID: QmYHzA8euDgUpNy3fh7JRwpPwt6jCgF35YTutYkyGGyr8f");
}

Upload a file stream to S3 without a file and from memory

I'm trying to create a CSV from a string and upload it to my S3 bucket. I don't want to write a file; I want it all to be in memory.
I don't want to read from a file to get my stream. I would like to make a stream without a file, something like createReadStream, but instead of a file I would like to pass a string with my stream's contents.
var AWS = require('aws-sdk'),
    zlib = require('zlib'),
    fs = require('fs'),
    s3Stream = require('s3-upload-stream')(new AWS.S3());

// Set the client to be used for the upload.
AWS.config.loadFromPath('./config.json');

// Create the streams
var read = fs.createReadStream('/path/to/a/file');
var upload = s3Stream.upload({
  "Bucket": "bucket-name",
  "Key": "key-name"
});

// Handle errors.
upload.on('error', function (error) {
  console.log(error);
});

upload.on('part', function (details) {
  console.log(details);
});

upload.on('uploaded', function (details) {
  console.log(details);
});

read.pipe(upload);
You can create a Readable stream and push your string directly to it, which can then be consumed by your s3Stream upload instance.
const Readable = require('stream').Readable

let data = 'this is your data'

let read = new Readable({ read() {} }) // no-op _read; we push the data manually
read.push(data)  // Push your data string
read.push(null)  // Signal that you're done writing

// Create the s3Stream upload instance and attach listeners here
read.pipe(upload)
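Alternatively (my assumption, not part of the original answer), the plain aws-sdk v2 client accepts an in-memory string, Buffer, or stream directly as the upload Body, which avoids the extra s3-upload-stream dependency:
// Hedged sketch: upload an in-memory CSV string without touching the disk.
const AWS = require('aws-sdk');
const { Readable } = require('stream');

const s3 = new AWS.S3();
const csv = 'id,name\n1,alice\n2,bob\n';

// Body can be a string or Buffer...
s3.upload({ Bucket: 'bucket-name', Key: 'key-name.csv', Body: csv }, (err, data) => {
  if (err) return console.error(err);
  console.log('Uploaded to', data.Location);
});

// ...or a readable stream built from the string.
const bodyStream = Readable.from([csv]);
s3.upload({ Bucket: 'bucket-name', Key: 'key-name-stream.csv', Body: bodyStream }, (err, data) => {
  if (err) return console.error(err);
  console.log('Uploaded to', data.Location);
});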
