How to get only the top layer files from AWS S3 object? - javascript

I use JavaScript to get the file names from an AWS S3 bucket. After I get the complete list of file names, I parse and manipulate it and list the names in the front-end. I want to avoid the problem that will arise when the bucket is filled with a huge amount of data, i.e. running out of memory when I try to manipulate an enormous result set. So I only need the file names of the very first layer.
Example :
The complete object in S3 bucket :
{
  new_folder: {...},
  some_file.png: {...}
}
Here I only need the names -> new_folder, some_file.png
Below is the code I use now:
const AWS = require('aws-sdk');

export default async function wasabiActions(dataObj) {
  var accessKeyId = '************';
  var secretAccessKey = '********************';
  var wasabiEndpoint = new AWS.Endpoint('s3.us-west-1.wasabisys.com');
  var s3 = new AWS.S3({
    endpoint: wasabiEndpoint,
    accessKeyId: accessKeyId,
    secretAccessKey: secretAccessKey
  });
  var params = {
    Bucket: 'bucket_name',
  };
  s3.listObjectsV2(params, function(err, data) {
    if (!err) {
      var files = [];
      data.Contents.forEach(function(element) {
        files.push(element.Key.split('/').filter((name) => name.length > 0));
      });
      console.log(files);
      var parsedData = wasabiDataParser(files);
      console.log(parsedData);
    }
  });
}
Thanks in advance! :)

You can use pagination. Some AWS operations return incomplete results and require subsequent requests to retrieve the entire result set. See the paginators guide for more details: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html (that page covers boto3, but the JavaScript SDK paginates the same way via IsTruncated and NextContinuationToken).
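For the "top layer only" part specifically, listObjectsV2 also accepts a Delimiter parameter ('/' groups keys under their first path segment into CommonPrefixes, so you never fetch the deeper keys at all). If you already have a flat list of keys, the same grouping can be done client-side; a minimal sketch (the sample keys are made up):

```javascript
// Derive the top-level entries from a flat list of S3 keys.
// (With the real API you would instead pass Delimiter: '/' to listObjectsV2
// and read CommonPrefixes; this sketch works on keys you already fetched.)
function topLevelNames(keys) {
  const names = new Set();
  for (const key of keys) {
    const slash = key.indexOf('/');
    // Keys containing '/' belong to a "folder"; keep only the first segment.
    names.add(slash === -1 ? key : key.slice(0, slash + 1));
  }
  return [...names];
}

console.log(topLevelNames([
  'new_folder/a.txt',
  'new_folder/sub/b.txt',
  'some_file.png'
]));
// → [ 'new_folder/', 'some_file.png' ]
```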

Related

Cannot seem to access S3 bucket contents with key

I am using the AWS JavaScript SDK and for some reason I can access the entire bucket's contents, but when I add a prefix I get null returned rather than a subset of those contents. For example, the following returns all bucket contents:
AWS.config.accessKeyId = this.s3.config["accessKeyId"];
AWS.config.secretAccessKey = this.s3.config["secretAccessKey"];
AWS.config.region = 'us-east-2';

var aws = new AWS.S3();
var all_params = {Bucket: 'bucket-name'};

new Promise(resolve => {
  aws.listObjectsV2(all_params, function (err, url) {
    console.log(url)
    resolve(url)
  });
})
The object returned contains 1000 records, most of them in the format Key: "clients/after_fronts/000...". However when I run the following, I get a null object:
AWS.config.accessKeyId = this.s3.config["accessKeyId"];
AWS.config.secretAccessKey = this.s3.config["secretAccessKey"];
AWS.config.region = 'us-east-2';

var key = "clients"
var aws = new AWS.S3();
var params = {Bucket: 'bucket-name', prefix: key};

return new Promise(resolve => {
  aws.listObjectsV2(params, function (err, url) {
    console.log(url)
    resolve(url)
  });
})
I thought it might be a permissions issue but I'm not sure why it returns data without a prefix and then no data with the prefix. What else could be going on?
Well, after staring at this for an hour, I realized the docs call for Prefix, not prefix, and that capitalization made all the difference.
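For reference, the corrected request object (SDK parameter names are case-sensitive):

```javascript
// SDK parameter names are case-sensitive: `Prefix`, not `prefix`.
var key = "clients";
var params = {Bucket: 'bucket-name', Prefix: key};

console.log(params.Prefix); // → clients
```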

Multiple file stream instead of download to disk and then zip?

I have an API method that, when called and passed an array of file keys, downloads them from S3. I'd like to stream the files rather than download them to disk, then zip them and return the archive to the client.
This is what my current code looks like:
reports.get('/xxx/:filenames', async (req, res) => {
  var AWS = require('aws-sdk');
  var s3 = new AWS.S3();
  var str_array = req.params.filenames.split(',');
  for (var i = 0; i < str_array.length; i++) {
    var filename = str_array[i].trim();
    localFileName = './' + filename;
    var params = {
      Bucket: config.reportBucket,
      Key: filename
    }
    s3.getObject(params, (err, data) => {
      if (err) console.error(err)
      var file = require('fs').createWriteStream(localFileName);
      s3.getObject(params).createReadStream().pipe(file);
      console.log(file);
    })
  }
});
How would I stream the files rather than downloading them to disk, and how would I zip them to return the archive to the client?
The main problem is zipping multiple files, and more specifically, downloading them from AWS S3 in bulk. I've searched through the AWS SDK and didn't find bulk S3 operations. Which brings us to one possible solution:
1. Load the files one by one and store them in a folder.
2. Zip the folder (with some package like this).
3. Send the zipped folder.
This is a raw and untested example, but it might give you the idea:
// Always import packages at the beginning of the file.
const AWS = require('aws-sdk');
const fs = require('fs');
const zipFolder = require('zip-folder');

const s3 = new AWS.S3();

reports.get('/xxx/:filenames', async (req, res) => {
  const filesArray = req.params.filenames.split(',');
  for (const fileName of filesArray) {
    const localFileName = './' + fileName.trim();
    const params = {
      Bucket: config.reportBucket,
      Key: fileName.trim()
    };
    // You'll probably need some Promise logic here, to handle the end of the stream operation.
    const fileStream = fs.createWriteStream(localFileName);
    s3.getObject(params).createReadStream().pipe(fileStream);
  }
  // After that, all required files will be in the target folder.
  // Now you need to compress the folder and send it back to the user.
  // We wrap the callback in a promise, to make the code read in a "sync" way.
  await new Promise((resolve) => zipFolder('/path/to/the/folder', '/path/to/archive.zip', (err) => resolve()));
  // And now you can send the zipped folder to the user (also using streams).
  fs.createReadStream('/path/to/archive.zip').pipe(res);
});
Attention: you could run into problems with async behaviour, given the nature of streams, so please first of all check that all files are stored in the folder before zipping.
Just a note: I have not tested this code, so if any questions appear, let's debug together.

Upload a file stream to S3 without a file and from memory

I'm trying to create a CSV from a string and upload it to my S3 bucket. I don't want to write a file; I want it all to be in memory.
I don't want to read from a file to get my stream. I would like to make a stream without a file: something like the createReadStream method, but instead of a file, I would like to pass a string with my stream's contents.
var AWS = require('aws-sdk'),
    zlib = require('zlib'),
    fs = require('fs'),
    s3Stream = require('s3-upload-stream')(new AWS.S3());

// Set the client to be used for the upload.
AWS.config.loadFromPath('./config.json');

// Create the streams
var read = fs.createReadStream('/path/to/a/file');
var upload = s3Stream.upload({
  "Bucket": "bucket-name",
  "Key": "key-name"
});

// Handle errors.
upload.on('error', function (error) {
  console.log(error);
});
upload.on('part', function (details) {
  console.log(details);
});
upload.on('uploaded', function (details) {
  console.log(details);
});

read.pipe(upload);
You can create a Readable stream and push your string directly to it, which can then be consumed by your s3Stream instance.
const Readable = require('stream').Readable
let data = 'this is your data'
let read = new Readable()
read.push(data) // Push your data string
read.push(null) // Signal that you're done writing
// Creating the s3Stream upload instance and attaching its listeners goes here
read.pipe(upload)

Can't download AWS S3 File in nodejs

I'm trying to use Amazon's S3 service. I managed to upload GZipped files to my bucket, but I can't retrieve them. I tried using the code example that I found here; everything works fine when I'm uploading the files, but I can't download them.
This is my upload code:
var s3 = new AWS.S3();

s3.headBucket({Bucket: bucketName}, function (err) {
  if (err) s3.createBucket({Bucket: bucketName}, cb);
  var body = fs.createReadStream(file).pipe(zlib.createGzip());
  s3.upload({Bucket: bucketName, Key: key, Body: body}).send(cb);
});
And this is my download code:
var s3 = new AWS.S3();
var params = {Bucket: bucketName, Key: key};
var outFile = require('fs').createWriteStream(file);

s3.getObject(params).createReadStream().pipe(zlib.createGunzip()).pipe(outFile);
But I get the error throw new Error('Cannot switch to old mode now.'); on the last line, and I can't figure out how to fix it. I'm using node 0.10.25 (and I can't change it).
So I tried using this:
var params = {Bucket: bucketName, Key: key};

s3.getObject(params, function(err, data) {
  var outFile = require('fs').createWriteStream(file);
  var read = AWS.util.buffer.toStream(data.Body);
  read.pipe(zlib.createGzip()).pipe(outFile);
  read.on('end', function(){cb();});
});
but I often get error 104 (unexpected end of input).
Does anyone have some ideas?
Unexpected end of input is perhaps due to the pipe getting closed prematurely, or some other error encountered in the middle of reading a fixed-size block or data structure.
You can also look at https://github.com/minio/minio-js as an alternative; it is fully written in the Streams2 style.
Here is an example.
$ npm install minio
$ cat >> get-object.js << EOF
var Minio = require('minio')
var fs = require('fs')
// find out your s3 end point here:
// http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
var s3Client = new Minio({
  url: 'https://<your-s3-endpoint>',
  accessKey: 'YOUR-ACCESSKEYID',
  secretKey: 'YOUR-SECRETACCESSKEY'
})

var outFile = fs.createWriteStream('test.txt');

s3Client.getObject('mybucket', 'my-key', function(e, dataStream) {
  if (e) {
    return console.log(e)
  }
  dataStream.pipe(outFile)
})
EOF

How to get contents of a text file from AWS s3 using a lambda function?

I was wondering if I could set up a Lambda function for AWS, triggered whenever a new text file is uploaded into an S3 bucket. In the function, I would like to get the contents of the text file and process it somehow. Is this possible...?
For example, if I upload foo.txt, with contents foobarbaz, I would like to somehow get foobarbaz in my lambda function so I can do stuff with it. I know I can get metadata from getObject, or a similar method.
Thanks!
The S3 object key and bucket name are passed into your Lambda function via the event parameter. You can then get the object from S3 and read its contents.
Basic code to retrieve bucket and object key from the Lambda event is as follows:
exports.handler = function(event, context, callback) {
  const bkt = event.Records[0].s3.bucket.name;
  const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
};
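The replace/decode step matters because S3 event notifications deliver the object key URL-encoded, with spaces turned into +. A quick check of that decoding logic on its own (the sample key is made up):

```javascript
// S3 event keys arrive URL-encoded, with spaces encoded as '+'.
const raw = 'uploads/my+report%282024%29.txt';
const key = decodeURIComponent(raw.replace(/\+/g, ' '));

console.log(key); // → uploads/my report(2024).txt
```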
Once you have the bucket and key, you can call getObject to retrieve the object:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = function(event, context, callback) {
  // Retrieve the bucket & key for the uploaded S3 object that
  // caused this Lambda function to be triggered
  const Bucket = event.Records[0].s3.bucket.name;
  const Key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

  // Retrieve the object
  s3.getObject({ Bucket, Key }, function(err, data) {
    if (err) {
      console.log(err, err.stack);
      callback(err);
    } else {
      console.log("Raw text:\n" + data.Body.toString('ascii'));
      callback(null, null);
    }
  });
};
Here's an updated JavaScript example using ES6-style code and promises, minus error-handling:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = async (event, context) => {
  const Bucket = event.Records[0].s3.bucket.name;
  const Key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
  const data = await s3.getObject({ Bucket, Key }).promise();
  console.log("Raw text:\n" + data.Body.toString('ascii'));
};
A number of posters have asked for the equivalent in Java, so here's an example:
package example;

import java.net.URLDecoder;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.event.S3EventNotification.S3EventNotificationRecord;

public class S3GetTextBody implements RequestHandler<S3Event, String> {
    public String handleRequest(S3Event s3event, Context context) {
        try {
            S3EventNotificationRecord record = s3event.getRecords().get(0);

            // Retrieve the bucket & key for the uploaded S3 object that
            // caused this Lambda function to be triggered
            String bkt = record.getS3().getBucket().getName();
            String key = record.getS3().getObject().getKey().replace('+', ' ');
            key = URLDecoder.decode(key, "UTF-8");

            // Read the source file as text
            AmazonS3 s3Client = new AmazonS3Client();
            String body = s3Client.getObjectAsString(bkt, key);
            System.out.println("Body: " + body);
            return "ok";
        } catch (Exception e) {
            System.err.println("Exception: " + e);
            return "error";
        }
    }
}
I am using a Lambda function with a Python 3.6 environment.
The code below reads the contents of a file main.txt inside the bucket my_s3_bucket. Make sure to replace the bucket and file names according to your needs.
def lambda_handler(event, context):
    import boto3
    s3 = boto3.client('s3')
    data = s3.get_object(Bucket='my_s3_bucket', Key='main.txt')
    contents = data['Body'].read()
    print(contents)
You can use data.Body.toString('ascii') to get the contents of the text file, assuming the text file was encoded in ASCII. You can also pass other encoding types to the function. Check out Node Buffer for further details.
The new AWS SDK v3 returns the file contents as a readable stream, so you'll need to take that into consideration from now on as well.
https://carova.io/snippets/read-data-from-aws-s3-with-nodejs
