How to detect file size / type mid-download using axios or another request library? - javascript

I have a scraper that looks for text on sites found via a Google search. However, occasionally the URLs returned by the search point to LARGE files with no file extension in the path (e.g. https://myfile.com/myfile/).
I do have a timeout mechanism in place, but by the time it fires, the file has already overloaded the memory. Is there any way to detect a file's size or type while it's being downloaded?
Here is my request function:
const getHtml = async (url, { timeout = 10000, ...opts } = {}) => {
const CancelToken = axios.CancelToken
const source = CancelToken.source()
try {
const timeoutId = setTimeout(() => source.cancel('Request cancelled due to timeout'), timeout)
let site = await axios.get(url, {
headers: {
'user-agent': userAgent().toString(),
connection: 'keep-alive', // self note: Isn't this prohibited on http/2?
},
cancelToken: source.token,
...opts,
})
clearTimeout(timeoutId)
return site.data
} catch (err) {
throw err
}
}
PS: I've seen similar questions, but none had an answer that would apply.

Ok so this isn't as easy to solve as one might expect. Ideally the Content-Length and Content-Type headers would tell you what to expect, but they aren't required headers and, even when present, they are often inaccurate.
The solution I've found for this problem, which looks to be very reliable, involves two things:
Making the request as a Stream
Reading the file signature that many file formats carry in their first bytes; these are commonly known as Magic Numbers/Bytes.
A great way to use these two things is to stream the response and read the first bytes to check for a file signature. Once you know whether the file is in a format you support/want, you can either process it as you normally would or cancel the request before reading the next chunk of the stream, which should prevent overloading your system. The same loop can also be used to measure the file size more accurately, which I show in the following snippet.
Here's how I implemented the solution mentioned above:
const getHtml = async (url, { timeout = 10000, sizeLimit = 10 * 1024 * 1024, ...opts } = {}) => { // sizeLimit in bytes (example default: 10 MB)
const CancelToken = axios.CancelToken
const source = CancelToken.source()
try {
const timeoutId = setTimeout(() => source.cancel('Request cancelled due to timeout'), timeout)
const res = await axios.get(url, {
headers: {
connection: 'keep-alive',
},
cancelToken: source.token,
// Use stream mode so we can read the first chunk before getting the rest (the default highWaterMark is 16 kB per chunk)
responseType: 'stream',
...opts,
})
const stream = res.data;
let firstChunk = true
let size = 0
// Not to be confused with ArrayBuffer (the object) ;)
const bufferArray = []
// Async iterator syntax for consuming the stream. Iterating over a stream will consume it fully, but returning or breaking the loop in any way will destroy it
for await (const chunk of stream) {
if (firstChunk) {
firstChunk = false
// Only check the first 100 relevant (whitespace-stripped) chars of the chunk for 'html'. This would only fail for a raw text file that happens to contain the word html at the very top (very unlikely, and even then it wouldn't break anything)
const stringChunk = String(chunk).replace(/\s+/g, '').slice(0, 100).toLowerCase()
if (!stringChunk.includes('html')) return { error: `Requested URL is detected as a file. URL: ${url}\nChunk's magic 100: ${stringChunk}` };
}
size += Buffer.byteLength(chunk);
if (size > sizeLimit) return { error: `Requested URL is too large.\nURL: ${url}\nSize: ${size}` };
const buff = Buffer.from(chunk)
bufferArray.push(buff)
}
// After the stream is fully consumed, we clear the timeout and create one big buffer to convert to str and return that
clearTimeout(timeoutId)
return { html: Buffer.concat(bufferArray).toString() }
} catch (err) {
throw err
}
}
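For completeness, here is a hedged usage sketch of the function above; the URL is a placeholder and the sizeLimit value is just an example:
// Hedged usage sketch for getHtml() as defined above (URL is a placeholder).
const scrape = async () => {
  const result = await getHtml('https://example.com/', { timeout: 10000, sizeLimit: 10 * 1024 * 1024 })
  if (result.error) {
    // Either a non-HTML file was detected or the size limit was exceeded
    console.warn(result.error)
    return
  }
  console.log(`Received ${result.html.length} characters of HTML`)
}
scrape().catch(console.error)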

Related

Is there a clean way to get a file's MIME type / extension from a download link?

I need to get file extension from a download link, currently I'm using the following script (excerpt from React hook)
// ...
useEffect(() => {
/// ...
void (async () => {
try {
setIsLoading(true);
const response = await fetch(downloadLink);
const blob = await response.blob();
setFileExtension(fileExtensionByMimeType[blob.type]);
} finally {
setIsLoading(false);
}
})();
}, []);
// ....
This code works, but the problem is that response.blob() takes time proportional to the file size, and I guess my solution is just bad.
Maybe there is some elegant way to solve my problem?
My bad, I just discovered that I have all the headers I need; I was confused by what looked like an empty headers prop.
It turns out it isn't empty, and I can get my MIME type with this code:
const response = await fetch(downloadLink, { method: 'HEAD' });
const mimeType = response.headers.get('content-type');
Also, I think HEAD is appropriate here, since it returns just the headers without downloading the body.
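As a small, hedged sketch of the same idea, a single HEAD request can also report the file size up front via Content-Length when the server chooses to send it (neither header is guaranteed):
// Hedged sketch: probe a download link with a HEAD request and read
// Content-Type / Content-Length without downloading the body.
async function probeDownloadLink(downloadLink) {
  const response = await fetch(downloadLink, { method: 'HEAD' });
  const mimeType = response.headers.get('content-type'); // e.g. "application/pdf", or null
  const contentLength = response.headers.get('content-length');
  const size = contentLength ? Number(contentLength) : null; // size in bytes, or null if absent
  return { mimeType, size };
}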

Async function immediately returns undefined yet output variable returns a value immediately before the return statement

I am writing a function that downloads a PDF and converts it into individual JPG files, one per page. I am using the imagemagick library to do the conversion. I am having trouble with my processPDF() function: it immediately returns undefined. I put a console.log statement immediately before the return statement and it logs exactly the value I expect, yet that value never seems to make it outside of the function.
import im from 'imagemagick'
import { promises as fs } from 'fs'
import path from 'path'
import _ from 'lodash'
import axios from 'axios'
import { v4 as uuid } from 'uuid'
async function processPDF(pdfPath) {
let basename = path.basename(pdfPath, '.pdf')
let outputPath = "./img/" + basename + ".jpg";
console.log(`Converting ${pdfPath}`)
// Take PDF file and generate individual JPG files
await im.convert(["-density", 300, pdfPath, outputPath],async (err) => {
if (err) {
console.log(err)
throw `Couldn't Process ${pdfPath}`
}
else {
// Get every file in Temporary Image Directory
let files = await fs.readdir(`./img/`)
// Append directory into filenames
files = files.map(file => {
return "./img/" + file
})
// We only want the files that match the source pdf's name
files = files.filter((file) => {
return file.includes(basename)
})
console.log(`Getting ${basename} Buffer Data`)
// For each file, read and return the buffer data along with the path
let images = await Promise.all(files.map(async file => {
const contents = await fs.readFile(file)
return { path: file, buffer: contents }
}))
// Since we read the files asynchronously, reorder the files
images = _.orderBy(images, (image) => {
let regex = /\d*.jpg/
let res = image.path.match(regex)[0]
res = path.basename(res, '.jpg')
return res
})
let output = { pdf: pdfPath, images }
// Returns a value
console.log(output)
// Returns undefined???
return output
}
})
}
export async function downloadAndProcessPDF(url) {
// Fetch PDF from server
let { data } = await axios.get(url, {
responseType: 'arraybuffer',
headers: {
'Content-Type': 'application/json',
'Accept': 'application/pdf'
}
}).catch(e=>{
console.log(e);
throw `Can't retrieve ${url}`
})
// Generate a unique ID for the pdf; since this is called asynchronously, it may be called many times simultaneously
let id = "./pdf/" + uuid() + ".pdf"
await fs.writeFile(id, data);
// tell processPDF to process the pdf in the ./pdf directory with the given filename
let pdfData = await processPDF(id);
// Returns undefined???
console.log(pdfData)
return pdfData
}
If I had to take a wild guess, I'd think that im.convert is the function giving me trouble. Throughout my source code I'm using promises to handle asynchronous tasks, yet im.convert() uses a callback. I'm not super familiar with how concurrency works between promises and callbacks, so I think that's probably the issue.
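That guess is on the right track: awaiting im.convert() does nothing, because it reports completion through its callback rather than a promise, so processPDF() resolves before the callback (and its return) ever runs. Below is a hedged sketch, assuming im.convert's callback signature is (err, stdout), of how the call could be wrapped in a Promise so that await actually waits for the conversion:
// Hedged sketch: promisify the callback-based im.convert() so it can be awaited.
// Assumes the callback signature is (err, stdout).
import im from 'imagemagick'

function convertAsync(args) {
  return new Promise((resolve, reject) => {
    im.convert(args, (err, stdout) => {
      if (err) reject(err)
      else resolve(stdout)
    })
  })
}

// Inside processPDF(), the conversion step would then become:
//   await convertAsync(['-density', 300, pdfPath, outputPath])
// followed by the directory read, with `return output` issued from processPDF
// itself rather than from inside a callback.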

How do I make multiple fetch calls without getting 429 error?

I came across a problem in a book which I can't seem to figure out. Unfortunately, I don't have a live link for it, so if anyone could help me with my approach to this theoretically, I'd really appreciate it.
The process:
From a fetch call, I get an array of string codes (["abcde", "fghij", "klmno", "pqrst"]).
I want to make a call to a link with each string code.
example:
fetch('http://my-url/abcde').then(res => res.json()).then(res => res).catch(error => new Error(`Error: ${error}`)); // result: 12345
fetch('http://my-url/fghij').then(res => res.json()).then(res => res).catch(error => new Error(`Error: ${error}`)); // result: 67891
...etc
Each of the calls is going to give me a number code, as shown.
I need to get the highest of these numbers, find its corresponding string code, and make another call with that.
"abcde" => 1234
"fghij" => 5314
"klmno" => 3465
"pqrst" => 7234 <--- winner
fetch('http://my-url/pqrst').then(res => res.json()).then(res => res).catch(error => new Error(`Error: ${error}`));
What I tried:
let codesArr = []; // array of string codes
let promiseArr = []; // array of fetch promises, one per string code in `codesArr`, meant to be used in Promise.all()
let codesObj = {}; // object mapping each string code to its corresponding number code from the Promise.all()
fetch('http://my-url/some-code')
.then(res => res.json())
.then(res => codesArr = res) // now `codesArr` is ["abcde", "fghij", "klmno", "pqrst"]
.catch(error => new Error(`Error: ${error}`));
for(let i = 0; i < codesArr.length; i++) {
promiseArr.push(
fetch(`http://my-url/${codesArr[i]}`)
.then(res => res.text())
.then(res => {
codesObj[codesArr[i]] = res;
// This is to get an object from which I can later get the highest number and its string code. Like this:
// codesObj = {
// "abcde": 12345,
// "fghij": 67891
// }
})
.catch(error => new Error(`Error: ${error}`));
// I am trying to make an array with fetch, so that I can use it later in Promise.all()
}
Promise.all(promiseArr) // I wanted this to go through all the fetches inside `promiseArr` and return all of the results at once.
.then(res => {
for(let i = 0; i < res.length; i++) {
console.log(res[i]);
// this should output the code number for each call (`12345`, `67891`...etc)
// this is where I get lost
}
})
One of the problems with my approach so far seems to be that it makes too many requests and I get 429 error. I sometimes get the number codes all right, but not too often.
As you already found out, the 429 means that you sent too many requests:
429 Too Many Requests
The user has sent too many requests in a given amount of time ("rate
limiting").
The response representations SHOULD include details explaining the
condition, and MAY include a Retry-After header indicating how long to
wait before making a new request.
For example:
HTTP/1.1 429 Too Many Requests
Content-Type: text/html
Retry-After: 3600
<html>
<head>
<title>Too Many Requests</title>
</head>
<body>
<h1>Too Many Requests</h1>
<p>I only allow 50 requests per hour to this Web site per
logged in user. Try again soon.</p>
</body>
</html>
Note that this specification does not define how the origin server
identifies the user, nor how it counts requests. For example, an
origin server that is limiting request rates can do so based upon
counts of requests on a per-resource basis, across the entire server,
or even among a set of servers. Likewise, it might identify the user
by its authentication credentials, or a stateful cookie.
Responses with the 429 status code MUST NOT be stored by a cache.
To handle this issue you should reduce the number of requests made in a given amount of time. You should iterate over your codes with a delay, spacing the requests out by a few seconds. If the delay isn't specified in the API documentation or in the 429 response, you have to use a trial-and-error approach to find one that works. In the example below I've spaced them out by 2 seconds (2000 milliseconds).
This can be done by using setTimeout() to execute a piece of code later, combined with a Promise to create a sleep function. When iterating over the initially returned array, make sure to await sleep(2000) to create a 2 second delay between iterations.
An example could be:
const fetch = createFetchMock({
"/some-code": ["abcde", "fghij", "klmno", "pqrst"],
"/abcde": 12345,
"/fghij": 67891,
"/klmno": 23456,
"/pqrst": 78912,
});
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
(async function () {
try {
const url = "https://my-url/some-code";
console.log("fetching url", url);
const response = await fetch(url);
const codes = await response.json();
console.log("got", codes);
const codesObj = {};
for (const code of codes) {
await sleep(2000);
const url = `https://my-url/${code}`;
console.log("fetching url", url);
const response = await fetch(url);
const value = await response.json();
console.log("got", value);
codesObj[code] = value;
}
console.log("codesObj =", codesObj);
} catch (error) {
console.error(error);
}
})();
// fetch mocker factory
function createFetchMock(dataByPath = {}) {
const empty = new Blob([], {type: "text/plain"});
const status = {
ok: { status: 200, statusText: "OK" },
notFound: { status: 404, statusText: "Not Found" },
};
const blobByPath = Object.create(null);
for (const path in dataByPath) {
const json = JSON.stringify(dataByPath[path]);
blobByPath[path] = new Blob([json], { type: "application/json" });
}
return function (url) {
const path = new URL(url).pathname;
const response = (path in blobByPath)
? new Response(blobByPath[path], status.ok)
: new Response(empty, status.notFound);
return Promise.resolve(response);
};
}
In this case, you should wait for each fetch to finish before starting the next one, using async/await:
const runFetch = async (codesArr) => {
for(let i = 0; i < codesArr.length; i++){
const rawResponse = await fetch(`http://my-url/${codesArr[i]}`);
const codeResponse = await rawResponse.json();
console.log(rawResponse);
codesObj[codesArr[i]] = codeResponse;
}
}
Hope that helps.
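As a hedged complement to the spacing approach above, the Retry-After header quoted earlier can also be honoured when a 429 does slip through (this sketch treats Retry-After as seconds and ignores the HTTP-date form):
// Hedged sketch: retry once after the delay advertised in Retry-After.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url) {
  const response = await fetch(url);
  if (response.status !== 429) return response;
  const retryAfterSeconds = Number(response.headers.get('retry-after')) || 2; // fall back to 2s
  await sleep(retryAfterSeconds * 1000);
  return fetch(url); // a real implementation might loop with a retry cap
}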

Cloudinary Signed Uploads with Widget

Documentation is extremely frustrating.
I'm using the upload widget to try to allow users to upload multiple pictures for their profile. I can't use unsigned uploads because of the potential for abuse.
I would much rather upload the file through the upload widget instead of through the server as it seems like it should be so simple
I've pieced together what I think should work but it is still saying: Upload preset must be whitelisted for unsigned uploads
Server:
// grab a current UNIX timestamp
const millisecondsToSeconds = 1000;
const timestamp = Math.round(Date.now() / millisecondsToSeconds);
// generate the signature using the current timestamp and any other desired Cloudinary params
const signature = cloudinaryV2.utils.api_sign_request({ timestamp }, CLOUDINARY_SECRET_KEY);
// craft a signature payload to send to the client (timestamp and signature required)
return signature;
also tried
return {
signature,
timestamp,
};
also tried
const signature = cloudinaryV2.utils.api_sign_request(
data.params_to_sign,
CLOUDINARY_SECRET_KEY,
);
Client:
const generateSignature = async (callback: Function, params_to_sign: object): Promise<void> => {
try {
const signature = await generateSignatureCF({ slug: 'xxxx' });
// also tried { slug: 'xxxx', params_to_sign }
callback(signature);
} catch (err) {
console.log(err);
}
};
cloudinary.openUploadWidget(
{
cloudName: 'xxx',
uploadPreset: 'xxxx',
sources: ['local', 'url', 'facebook', 'dropbox', 'google_photos'],
folder: 'xxxx',
apiKey: ENV.CLOUDINARY_PUBLIC_KEY,
uploadSignature: generateSignature,
},
function(error, result) {
console.log(error);
},
);
Let's all take a moment to point out how horrible Cloudinary's documentation is. It's easily the worst I've ever seen. Nightmare fuel.
Now that i've got that off my chest... I really needed to be able to do this and I spent way too long banging my head against walls for what should be extremely simple. Here it is...
Server (Node.js)
You'll need an endpoint that returns a signature-timestamp pair to the frontend:
import cloudinary from 'cloudinary'
export async function createImageUpload() {
const timestamp = Math.round(Date.now() / 1000) // Unix timestamp in seconds, as Cloudinary expects
const signature = await cloudinary.utils.api_sign_request(
{
timestamp,
},
process.env.CLOUDINARY_SECRET
)
return { timestamp, signature }
}
Client (Browser)
The client makes a request to the server for a signature-timestamp pair and then uses that to upload a file. The file used in the example should come from an <input type='file'/> change event etc.
const CLOUD_NAME = process.env.CLOUDINARY_CLOUD_NAME
const API_KEY = process.env.CLOUDINARY_API_KEY
async function uploadImage(file) {
const { signature, timestamp } = await api.post('/image-upload')
const form = new FormData()
form.append('file', file)
const res = await fetch(
`https://api.cloudinary.com/v1_1/${CLOUD_NAME}/image/upload?api_key=${API_KEY}&timestamp=${timestamp}&signature=${signature}`,
{
method: 'POST',
body: form,
}
)
const data = await res.json()
return data.secure_url
}
That's it. That's all it takes. If only Cloudinary had this in their docs.
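A hedged usage sketch for uploadImage(), assuming a plain <input type="file"> element with id "avatar" somewhere in the page:
// Hedged sketch: call uploadImage() from a file input's change event.
document.getElementById('avatar').addEventListener('change', async (event) => {
  const file = event.target.files[0]
  if (!file) return
  const secureUrl = await uploadImage(file)
  console.log('Uploaded to', secureUrl)
})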
Man. I hate my life. I finally figured it out. It literally took beautifying the upload widget's JS to understand that the value passed to the callback should be a string (the signature itself) rather than an object, even though the docs make it seem otherwise.
Here is how to implement a signed upload with a Firebase Cloud Function
import * as functions from 'firebase-functions';
import cloudinary from 'cloudinary';
const CLOUDINARY_SECRET_KEY = functions.config().cloudinary.key;
const cloudinaryV2 = cloudinary.v2;
module.exports.main = functions.https.onCall(async (data, context: CallableContext) => {
// Checking that the user is authenticated.
if (!context.auth) {
// Throwing an HttpsError so that the client gets the error details.
throw new functions.https.HttpsError(
'failed-precondition',
'The function must be called while authenticated.',
);
}
try {
return cloudinaryV2.utils.api_sign_request(data.params_to_sign, CLOUDINARY_SECRET_KEY);
} catch (error) {
throw new functions.https.HttpsError('failed-precondition', error.message);
}
});
// CLIENT
const uploadWidget = () => {
const generateSignature = async (callback: Function, params_to_sign: object): Promise<void> => {
try {
const signature = await generateImageUploadSignatureCF({ params_to_sign });
callback(signature.data);
} catch (err) {
console.log(err);
}
};
cloudinary.openUploadWidget(
{
cloudName: 'xxxxxx',
uploadSignature: generateSignature,
apiKey: ENV.CLOUDINARY_PUBLIC_KEY,
},
function(error, result) {
console.log(error);
},
);
};

How to get response from S3 getObject in Node.js?

In a Node.js project I am attempting to get data back from S3.
When I use getSignedUrl, everything works:
aws.getSignedUrl('getObject', params, function(err, url){
console.log(url);
});
My params are:
var params = {
Bucket: "test-aws-imagery",
Key: "TILES/Level4/A3_B3_C2/A5_B67_C59_Tiles.par"
If I take the URL output to the console and paste it in a web browser, it downloads the file I need.
However, if I try to use getObject I get all sorts of odd behavior. I believe I am just using it incorrectly. This is what I've tried:
aws.getObject(params, function(err, data){
console.log(data);
console.log(err);
});
Outputs:
{
AcceptRanges: 'bytes',
LastModified: 'Wed, 06 Apr 2016 20:04:02 GMT',
ContentLength: '1602862',
ETag: '9826l1e5725fbd52l88ge3f5v0c123a4"',
ContentType: 'application/octet-stream',
Metadata: {},
Body: <Buffer 01 00 00 00 ... > }
null
So it appears that this is working properly. However, when I put a breakpoint on one of the console.logs, my IDE (NetBeans) throws an error and refuses to show the value of data. While this could just be the IDE, I decided to try other ways to use getObject.
aws.getObject(params).on('httpData', function(chunk){
console.log(chunk);
}).on('httpDone', function(data){
console.log(data);
});
This does not output anything. Putting a breakpoint in shows that the code never reaches either of the console.logs. I also tried:
aws.getObject(params).on('success', function(data){
console.log(data);
});
However, this also does not output anything and placing a breakpoint shows that the console.log is never reached.
What am I doing wrong?
@aws-sdk/client-s3 (2022 Update)
Since I wrote this answer in 2016, Amazon has released a new JavaScript SDK, @aws-sdk/client-s3. This new version improves on the original getObject() by always returning a promise instead of requiring you to opt in by chaining .promise() onto getObject(). In addition, response.Body is no longer a Buffer but one of Readable | ReadableStream | Blob, which changes the handling of the response.Body a bit. This should be more performant, since we can stream the data returned instead of holding all of the contents in memory, with the trade-off that it is a bit more verbose to implement.
In the example below the response.Body data is streamed into an array and then returned as a string. This is the equivalent of my original answer. Alternatively, response.Body could be piped via stream.Readable.pipe() to an HTTP response, a file, or any other kind of stream.Writable for further usage; that would be the more performant approach when getting large objects.
If you wanted a Buffer, like the original getObject() response, this can be done by wrapping responseDataChunks in Buffer.concat() instead of using Array#join(), which is useful when interacting with binary data. Note that since Array#join() returns a string, each Buffer instance in responseDataChunks will have Buffer.toString() called on it implicitly, using the default utf8 encoding.
const { GetObjectCommand, S3Client } = require('@aws-sdk/client-s3')
const client = new S3Client() // Pass in opts to S3 if necessary
function getObject (Bucket, Key) {
return new Promise(async (resolve, reject) => {
const getObjectCommand = new GetObjectCommand({ Bucket, Key })
try {
const response = await client.send(getObjectCommand)
// Store all of data chunks returned from the response data stream
// into an array then use Array#join() to use the returned contents as a String
let responseDataChunks = []
// Handle an error while streaming the response body
response.Body.once('error', err => reject(err))
// Attach a 'data' listener to add the chunks of data to our array
// Each chunk is a Buffer instance
response.Body.on('data', chunk => responseDataChunks.push(chunk))
// Once the stream has no more data, join the chunks into a string and return the string
response.Body.once('end', () => resolve(responseDataChunks.join('')))
} catch (err) {
// Handle the error or throw
return reject(err)
}
})
}
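As a hedged illustration of the Buffer.concat() variant described above, the same stream events can resolve with a single Buffer instead of a utf8 string:
// Hedged variant of the example above: collect the chunks and resolve with
// one Buffer (useful for binary objects) rather than a joined utf8 string.
function bodyToBuffer (body) {
  return new Promise((resolve, reject) => {
    const chunks = []
    body.once('error', reject)
    body.on('data', chunk => chunks.push(chunk))
    body.once('end', () => resolve(Buffer.concat(chunks)))
  })
}
// e.g. const buf = await bodyToBuffer(response.Body)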
Comments on using Readable.toArray()
Using Readable.toArray() instead of working with the stream events directly might be more convenient, but it performs worse. It works by reading all of the response data chunks into memory before moving on. Since this removes the benefits of streaming, this approach is discouraged per the Node.js docs.
As this method reads the entire stream into memory, it negates the benefits of streams. It's intended for interoperability and convenience, not as the primary way to consume streams. Documentation Link
@aws-sdk/client-s3 Documentation Links
GetObjectCommand
GetObjectCommandInput
GetObjectCommandOutput
aws-sdk (Original Answer)
When doing a getObject() from the S3 API, per the docs the contents of your file are located in the Body property, which you can see from your sample output. You should have code that looks something like the following
const aws = require('aws-sdk');
const s3 = new aws.S3(); // Pass in opts to S3 if necessary
var getParams = {
Bucket: 'abc', // your bucket name,
Key: 'abc.txt' // path to the object you're looking for
}
s3.getObject(getParams, function(err, data) {
// Handle any error and exit
if (err)
return err;
// No error happened
// Convert Body from a Buffer to a String
let objectData = data.Body.toString('utf-8'); // Use the encoding necessary
});
You may not need to create a new Buffer from the data.Body object, but if you do, you can use the sample above to achieve that.
Based on the answer by @peteb, but using Promises and Async/Await:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
async function getObject (bucket, objectKey) {
try {
const params = {
Bucket: bucket,
Key: objectKey
}
const data = await s3.getObject(params).promise();
return data.Body.toString('utf-8');
} catch (e) {
throw new Error(`Could not retrieve file from S3: ${e.message}`)
}
}
// To retrieve you need to use `await getObject()` or `getObject().then()`
const myObject = await getObject('my-bucket', 'path/to/the/object.txt');
Updated (2022)
Node.js v17.5.0 added Readable.toArray(). If this API is available in your Node version, the code can be very short:
const buffer = Buffer.concat(
await (
await s3Client
.send(new GetObjectCommand({
Key: '<key>',
Bucket: '<bucket>',
}))
).Body.toArray()
)
If you are using TypeScript, it is safe to cast the .Body part as Readable (the other types, ReadableStream and Blob, are only returned in browser environments; moreover, in the browser, Blob is only used by the legacy fetch API when response.body is not supported):
(response.Body as Readable).toArray()
Note that Readable.toArray() is an experimental (yet handy) feature, so use it with caution.
=============
Original answer
If you are using AWS SDK v3, the SDK returns a Node.js Readable (precisely, an IncomingMessage, which extends Readable) instead of a Buffer.
Here is a TypeScript version. Note that this is for Node only; if you send the request from a browser, check the longer answer in the blog post mentioned below.
import {GetObjectCommand, S3Client} from '@aws-sdk/client-s3'
import type {Readable} from 'stream'
const s3Client = new S3Client({
apiVersion: '2006-03-01',
region: 'us-west-2',
credentials: {
accessKeyId: '<access key>',
secretAccessKey: '<access secret>',
}
})
const response = await s3Client
.send(new GetObjectCommand({
Key: '<key>',
Bucket: '<bucket>',
}))
const stream = response.Body as Readable
return new Promise<Buffer>((resolve, reject) => {
const chunks: Buffer[] = []
stream.on('data', chunk => chunks.push(chunk))
stream.once('end', () => resolve(Buffer.concat(chunks)))
stream.once('error', reject)
})
// if readable.toArray() is support
// return Buffer.concat(await stream.toArray())
Why do we have to cast response.Body as Readable? The answer is too long. Interested readers can find more information on my blog post.
For someone looking for a NestJS TypeScript version of the above:
/**
* to fetch a signed URL of a file
* @param key key of the file to be fetched
* @param bucket name of the bucket containing the file
*/
public getFileUrl(key: string, bucket?: string): Promise<string> {
var scopeBucket: string = bucket ? bucket : this.defaultBucket;
var params: any = {
Bucket: scopeBucket,
Key: key,
Expires: signatureTimeout // const value: 30
};
return this.account.getSignedUrlPromise('getObject', params);
}
/**
* to get the downloadable file buffer of the file
* @param key key of the file to be fetched
* @param bucket name of the bucket containing the file
*/
public async getFileBuffer(key: string, bucket?: string): Promise<Buffer> {
var scopeBucket: string = bucket ? bucket : this.defaultBucket;
var params: GetObjectRequest = {
Bucket: scopeBucket,
Key: key
};
var fileObject: GetObjectOutput = await this.account.getObject(params).promise();
return Buffer.from(fileObject.Body.toString());
}
/**
* to upload a file stream onto AWS S3
* @param file file buffer to be uploaded
* @param key key of the file to be uploaded
* @param bucket name of the bucket
*/
public async saveFile(file: Buffer, key: string, bucket?: string): Promise<any> {
var scopeBucket: string = bucket ? bucket : this.defaultBucket;
var params: any = {
Body: file,
Bucket: scopeBucket,
Key: key,
ACL: 'private'
};
var uploaded: any = await this.account.upload(params).promise();
if (uploaded && uploaded.Location && uploaded.Bucket === scopeBucket && uploaded.Key === key)
return uploaded;
else {
throw new HttpException("Error occurred while uploading a file stream", HttpStatus.BAD_REQUEST);
}
}
Converting GetObjectOutput.Body to Promise<string> using node-fetch
In aws-sdk-js-v3 @aws-sdk/client-s3, GetObjectOutput.Body is a subclass of Readable in nodejs (specifically an instance of http.IncomingMessage) instead of a Buffer as it was in aws-sdk v2, so resp.Body.toString('utf-8') will give you the wrong result “[object Object]”. Instead, the easiest way to turn GetObjectOutput.Body into a Promise<string> is to construct a node-fetch Response, which takes a Readable subclass (or Buffer instance, or other types from the fetch spec) and has conversion methods .json(), .text(), .arrayBuffer(), and .blob().
This should also work in the other variants of aws-sdk and platforms (@aws-sdk v3 node Buffer, v3 browser Uint8Array subclass, v2 node Readable, v2 browser ReadableStream or Blob)
npm install node-fetch
import { Response } from 'node-fetch';
import * as s3 from '@aws-sdk/client-s3';
const client = new s3.S3Client({})
const s3Response = await client.send(new s3.GetObjectCommand({Bucket: '…', Key: '…'}));
const response = new Response(s3Response.Body);
const obj = await response.json();
// or
const text = await response.text();
// or
const buffer = Buffer.from(await response.arrayBuffer());
// or
const blob = await response.blob();
Reference: GetObjectOutput.Body documentation, node-fetch Response documentation, node-fetch Body constructor source, minipass-fetch Body constructor source
Thanks to kennu's comment on the GetObjectCommand usability issue
Extremely similar answer to @ArianAcosta above, except I'm using import (for Node 12.x and up), adding the AWS config, sniffing for an image payload, and applying base64 processing to the return value.
// using v2.x of aws-sdk
import aws from 'aws-sdk'
aws.config.update({
accessKeyId: process.env.YOUR_AWS_ACCESS_KEY_ID,
secretAccessKey: process.env.YOUR_AWS_SECRET_ACCESS_KEY,
region: "us-east-1" // or whatever
})
const s3 = new aws.S3();
/**
* getS3Object()
*
* @param { string } bucket - the name of your bucket
* @param { string } objectKey - object you are trying to retrieve
* @returns { string } - data, formatted
*/
export async function getS3Object (bucket, objectKey) {
try {
const params = {
Bucket: bucket,
Key: objectKey
}
const data = await s3.getObject(params).promise();
// Check for image payload and formats appropriately
if( data.ContentType === 'image/jpeg' ) {
return data.Body.toString('base64');
} else {
return data.Body.toString('utf-8');
}
} catch (e) {
throw new Error(`Could not retrieve file from S3: ${e.message}`)
}
}
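A hedged usage sketch for getS3Object() above; the bucket, key, and import path are placeholders:
// Hedged usage sketch for getS3Object() (bucket/key/import path are placeholders).
import { getS3Object } from './getS3Object'

async function main () {
  const body = await getS3Object('my-bucket', 'photos/cat.jpg')
  // For image/jpeg objects the helper returns base64, usable e.g. as a data URL
  const dataUrl = `data:image/jpeg;base64,${body}`
  console.log(dataUrl.slice(0, 64), '...')
}

main().catch(console.error)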
At first glance it doesn't look like you are doing anything wrong but you don't show all your code. The following worked for me when I was first checking out S3 and Node:
var AWS = require('aws-sdk');
if (typeof process.env.API_KEY == 'undefined') {
var config = require('./config.json');
for (var key in config) {
if (config.hasOwnProperty(key)) process.env[key] = config[key];
}
}
var s3 = new AWS.S3({accessKeyId: process.env.AWS_ID, secretAccessKey:process.env.AWS_KEY});
var objectPath = process.env.AWS_S3_FOLDER +'/test.xml';
s3.putObject({
Bucket: process.env.AWS_S3_BUCKET,
Key: objectPath,
Body: "<rss><data>hello Fred</data></rss>",
ACL:'public-read'
}, function(err, data){
if (err) console.log(err, err.stack); // an error occurred
else {
console.log(data); // successful response
s3.getObject({
Bucket: process.env.AWS_S3_BUCKET,
Key: objectPath
}, function(err, data){
console.log(data.Body.toString());
});
}
});
Alternatively, you could use the minio-js client library (see its get-object.js example):
var Minio = require('minio')
var s3Client = new Minio({
endPoint: 's3.amazonaws.com',
accessKey: 'YOUR-ACCESSKEYID',
secretKey: 'YOUR-SECRETACCESSKEY'
})
var size = 0
// Get a full object.
s3Client.getObject('my-bucketname', 'my-objectname', function(e, dataStream) {
if (e) {
return console.log(e)
}
dataStream.on('data', function(chunk) {
size += chunk.length
})
dataStream.on('end', function() {
console.log("End. Total size = " + size)
})
dataStream.on('error', function(e) {
console.log(e)
})
})
Disclaimer: I work for Minio. It's open source, S3-compatible object storage written in Golang, with client libraries available in Java, Python, JavaScript, and Golang.
Just as an alternate solution:
As per this issue on the same subject, it seems like in October 2022, there is a way of handling the body returned from an S3 GetObject request. Assuming you are using AWS SDK V3, you can take advantage of the @aws-sdk/util-stream-node package in the official AWS SDK:
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { sdkStreamMixin } from '@aws-sdk/util-stream-node';
const s3Client = new S3Client({});
const { Body } = await s3Client.send(
new GetObjectCommand({
Bucket: 'your-bucket',
Key: 'your-key',
}),
);
// Throws error if Body is undefined
const body = await sdkStreamMixin(Body).transformToString();
You can also transform the body into a byte array or web stream using the .transformToByteArray() and .transformToWebStream() functions.
Keep in mind that the package says that you shouldn't be using it directly, but it seems to be the most straightforward way to handle the body from the request.
This was found in this reply that highlighted a PR that added this feature.
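For binary objects, here is a hedged sketch of the byte-array variant mentioned above, starting from a fresh GetObject response (a body can only be consumed once; bucket and key are placeholders):
// Hedged sketch: transformToByteArray() returns a Uint8Array,
// which can be wrapped in a Buffer for binary handling.
const { Body: binaryBody } = await s3Client.send(
  new GetObjectCommand({ Bucket: 'your-bucket', Key: 'your-binary-key' }),
);
const byteArray = await sdkStreamMixin(binaryBody).transformToByteArray();
const buffer = Buffer.from(byteArray);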
This is the async / await version
var getObjectAsync = async function(bucket,key) {
try {
const data = await s3
.getObject({ Bucket: bucket, Key: key })
.promise();
var contents = data.Body.toString('utf-8');
return contents;
} catch (err) {
console.log(err);
}
}
var getObject = async function(bucket,key) {
const contents = await getObjectAsync(bucket,key);
console.log(contents.length);
return contents;
}
getObject(bucket,key);
The Body.toString() method no longer works with the latest version of the s3 api. Use the following instead:
const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");
const streamToString = (stream) =>
new Promise((resolve, reject) => {
const chunks = [];
stream.on("data", (chunk) => chunks.push(chunk));
stream.on("error", reject);
stream.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
});
(async () => {
const region = "us-west-2";
const client = new S3Client({ region });
const command = new GetObjectCommand({
Bucket: "test-aws-sdk-js-1877",
Key: "readme.txt",
});
const { Body } = await client.send(command);
const bodyContents = await streamToString(Body);
console.log(bodyContents);
})();
Copied and pasted from here: https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-755387549
Not sure why this solution hasn't already been added as I think it is cleaner than the top answer.
Using express and AWS SDK v3:
public downloadFeedFile = (req: IFeedUrlRequest, res: Response) => {
const downloadParams: GetObjectCommandInput = parseS3Url(req.s3FileUrl.replace(/\s/g, ''));
logger.info("requesting S3 file " + JSON.stringify(downloadParams));
const run = async () => {
try {
const fileStream = await this.s3Client.send(new GetObjectCommand(downloadParams));
if (fileStream.Body instanceof Readable){
fileStream.Body.once('error', err => {
console.error("Error downloading s3 file")
console.error(err);
});
fileStream.Body.pipe(res);
}
} catch (err) {
logger.error("Error", err);
}
};
run();
};
