Lambda download and forward PDF (PDF proxy) - javascript

I'm wondering what I am doing wrong with this Lambda function.
Goal:
Send HTTP options to fetch a PDF and forward it to the consumer from the Lambda service.
Current code:
"use strict";
const http = require("http");
function getPDF(options, event) {
console.log(options);
return new Promise((resolve, reject) => {
let body = "";
let statusCode = 0;
let headers = { };
http
.request(options, (res) => {
statusCode = res.statusCode;
const headersFromReq = res.headers || {};
res.on("data", (chunk) => (body += chunk));
res.on("end", function () {
console.log( statusCode, headers, body);
resolve({
body: Buffer.from(body).toString(),
statusCode,
headers: {
...headersFromReq,
//'Content-type': 'application/pdf',
//'content-disposition': 'attachment; filename=test.pdf'
}
});
})
.on("error", reject)
.end();
});
});
}
exports.handler = async (event) => {
  try {
    const response = await getPDF(event.options, event);
    return response;
  } catch (error) {
    console.error(error);
    return {
      statusCode: 500,
      body: JSON.stringify(error),
      headers: {}
    };
  }
};
Whatever I've tried, it either times out or does not produce the response I actually need: a Base64-encoded PDF.
Params for testing would look something like this:
{
  "options": {
    "hostname": "www.africau.edu",
    "port": 80,
    "path": "images/default/sample.pdf",
    "method": "GET",
    "headers": {
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36",
      "Accept": "application/pdf",
      "Accept-encoding": "gzip, deflate, br"
    }
  }
}
Current logs:
Function Logs
START RequestId: 8d6be86c-788d-4f49-8305-8caf377cd32e Version: $LATEST
2021-09-28T09:01:21.507Z 8d6be86c-788d-4f49-8305-8caf377cd32e INFO {
  hostname: 'www.africau.edu',
  port: 80,
  path: 'images/default/sample.pdf',
  method: 'GET',
  headers: {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36',
    Accept: 'application/pdf',
    'Accept-encoding': 'gzip, deflate, br'
  }
}
END RequestId: 8d6be86c-788d-4f49-8305-8caf377cd32e
REPORT RequestId: 8d6be86c-788d-4f49-8305-8caf377cd32e Duration: 11011.54 ms Billed Duration: 11000 ms Memory Size: 128 MB Max Memory Used: 54 MB Init Duration: 152.43 ms
2021-09-28T09:01:32.494Z 8d6be86c-788d-4f49-8305-8caf377cd32e Task timed out after 11.01 seconds

Your approach has an underlying conceptual problem: it may take time to execute, and time is exactly what you don't have when you run things in Lambda. Your Lambda technically has a maximum of 15 minutes to finish execution (although you have to configure that explicitly; the default timeout is only a few seconds), but if you trigger it from AWS API Gateway, that goes down to 30 seconds, and this is not a limit you can configure. It's the total maximum. Moreover, your Lambda response cannot be larger than 6 MB and is normally supposed to be JSON, so you would have to convert your file to Base64; and since Base64 inflates the payload, serving that file via API Gateway shrinks the effective limit once again. What you're trying just cannot be done reliably with Lambda in this way. There is, however, a different approach that AWS would actually recommend:
You send a request to API Gateway that triggers a Lambda.
The Lambda checks whether the requested file already exists in S3.
If it doesn't exist:
The Lambda downloads the file and puts it into S3. Note that you can set up an S3 lifecycle rule so the file stays in S3 only for a certain amount of time. You probably don't want to keep it there forever, but it's nice to keep it cached for a while in case the user tries to re-download your PDF; this way they will get the response much faster.
The Lambda then generates a pre-signed S3 URL for the freshly downloaded file (a special URL you can request from S3 that stays valid for only a few minutes) and returns it in the response.
If it already exists:
The Lambda just generates the pre-signed S3 URL and returns it in the response.
Your client (a UI application, I presume) then makes a follow-up request to the pre-signed URL received in the response (so it talks directly to S3). This way, even if your user has a slow internet connection and needs 20 minutes to download the file, you don't get any timeouts... well, you will still get some if the file is really large and the Lambda cannot download it quickly enough, but that would require a longer discussion. Here I'm assuming your file is under 15 MB. A rough sketch of such a Lambda follows.
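For illustration only, here is a minimal sketch of that flow with the Node.js AWS SDK v2. The bucket name, and the idea that the caller supplies key and sourceUrl in the event, are my assumptions; error handling is kept to a minimum:
"use strict";
const https = require("https");
const AWS = require("aws-sdk");

const s3 = new AWS.S3();
const BUCKET = "my-pdf-cache-bucket"; // hypothetical bucket name

// Download the remote file into a Buffer (binary-safe: no string concatenation).
function fetchBuffer(url) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      if (res.statusCode !== 200) {
        return reject(new Error("Unexpected status " + res.statusCode));
      }
      const chunks = [];
      res.on("data", (chunk) => chunks.push(chunk));
      res.on("end", () => resolve(Buffer.concat(chunks)));
    }).on("error", reject);
  });
}

exports.handler = async (event) => {
  const key = event.key;             // assumption: caller supplies the S3 key
  const sourceUrl = event.sourceUrl; // assumption: caller supplies the origin URL

  // 1. Check whether the file is already cached in S3.
  const exists = await s3
    .headObject({ Bucket: BUCKET, Key: key })
    .promise()
    .then(() => true)
    .catch(() => false);

  // 2. If not, download it and store it.
  if (!exists) {
    const pdf = await fetchBuffer(sourceUrl);
    await s3
      .putObject({ Bucket: BUCKET, Key: key, Body: pdf, ContentType: "application/pdf" })
      .promise();
  }

  // 3. Either way, return a short-lived pre-signed URL instead of the file itself.
  const url = s3.getSignedUrl("getObject", { Bucket: BUCKET, Key: key, Expires: 300 });
  return { statusCode: 200, body: JSON.stringify({ url }) };
};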

Related

React Native fetch not getting the same content as Postman

I'm having a little problem with my request getting the HTML from https://readnovelfull.com/beauty-and-the-beast-wolf-hubby-xoxo/chapter-1-i-would-not-be-responsible.html as an example.
I can get the HTML from the other URLs, e.g. novel detail, latest updated, etc.,
but not when I'm getting the details for the chapters.
I tested those URLs in Postman and also on https://codebeautify.org/source-code-viewer, and there is no problem getting the chapter content, which exists under the div #chr-content.
So I am a bit lost now; what am I doing wrong?
Here is my fetch call, which is working on other novel sites.
static async getHtml(
  url: string
): Promise<HTMLDivElement> {
  console.log(`Sending html request to ${url}`);
  var container = parse('<div>test</div>') as any;
  try {
    let headers = new Headers({
      Accept: '*/*',
      'User-Agent':
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'
    });
    var data = await fetch(url, {
      method: 'GET',
      headers: headers,
    });
    if (!data.ok) {
      const message = `An error has occurred: ${data.status}`;
      console.log(message);
    } else {
      var html = await data.text();
      console.log('Data is ok. Proceed to parse it');
      container = parse('<div>' + html + '</div>') as any;
    }
  } catch (e) {
    console.log(e);
  }
  return container as HTMLDivElement;
}
I should mention that I am not getting any error whatsoever; it's just that the HTML I am getting is not the same as what Postman and other sites get.
Update
OK, so I did some research on the site and this is what I came up with.
The site needs an X-CSRF-TOKEN, and I was able to extract these values:
const csrf = 'x09Q6KGqJOJJx2iHwNQUa_mYfG4neV9EOOMsUBKTItKfNjSc0thQzwf2HvCR7SQCqfIpC2ogPj18jG4dQPgVtQ==';
const id = 774791;
which I need to send in a request to https://readnovelfull.com/ajax/increase-chapter-views, and this will send back true/false.
Now I tried to include the CSRF token in my fetch call afterwards, but it's still the same old story: no data.
Any idea if I am still doing something wrong?
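For reference, a call like the one described above might look roughly like this; note that the X-CSRF-TOKEN header name and the chapterId form field are guesses about the site's API, not confirmed details:
// Hypothetical sketch of the increase-chapter-views call described above
// (inside an async function; header name and form field are guesses).
const res = await fetch('https://readnovelfull.com/ajax/increase-chapter-views', {
  method: 'POST',
  headers: {
    'X-CSRF-TOKEN': csrf, // the token extracted above
    'Content-Type': 'application/x-www-form-urlencoded'
  },
  body: `chapterId=${id}` // field name is a guess
});
console.log(await res.text()); // said to return true/false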
Looks like you have an issue with CORS. To make sure, try sending the request through a CORS proxy. One quick way to do that is to prefix the URL:
https://cors-anywhere.herokuapp.com/https://readnovelfull.com/beauty-and-the-beast-wolf-hubby-xoxo/chapter-1-i-would-not-be-responsible.html
NOTE: Using this CORS proxy in production is not recommended, because it's not secure.
If after that you receive data, it means you are running into CORS, and you need to figure out how to solve it in your specific case.
Reproducible example:
const parse = (str) => str;

const getHtml = async (url) => {
  console.log(`Sending html request to ${url}`);
  var container = parse('<div>No content =(</div>');
  try {
    let headers = new Headers({
      Accept: '*/*',
      'User-Agent':
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'
    });
    var data = await fetch(url, {
      method: 'GET',
      headers: headers,
    });
    if (!data.ok) {
      const message = `An error has occurred: ${data.status}`;
      console.log(message);
    } else {
      var html = await data.text();
      console.log('Data is ok. Proceed to parse it');
      container = parse('<div>' + html + '</div>');
    }
  } catch (e) {
    console.log(e);
  }
  return container;
}
getHtml('https://cors-anywhere.herokuapp.com/https://readnovelfull.com/beauty-and-the-beast-wolf-hubby-xoxo/chapter-1-i-would-not-be-responsible.html').then(htmlContent => document.querySelector('div').innerHTML = htmlContent);
<div>loading...</div>
If that doesn't help, please provide a reproducible RN example, but I believe there is no difference between RN and web environments in this case.

How to send GET request without downloading response content using node-requests?

I'm currently learning Node and I'm looking for an HTTP library that would allow me to send a GET request without downloading the server's response content (body).
I need to send a very large number of HTTP requests every minute, but I do not need to read their content (and I'd like to save bandwidth). I can't use HEAD for this purpose.
Is there any way to avoid downloading the response body using node-requests, or perhaps some other library that could be used?
My sample code using node-request:
const request = require('request');

const options = {
  url: "https://google.com",
  headers: {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'
  }
}

// How to avoid downloading the whole response?
function callback(err, response, body) {
  console.log(response.request.uri.host + ' - ' + response.statusCode);
}

request(options, callback);
By the HTTP standard, GET fetches the resource content; you cannot stop the server from sending the response, but you can ignore it, which is basically what you are doing.
request(options, (err, response, body) => {
  // just return from here; no need to process anything
});
EDIT 1:
To use only some bytes of the response, you can use http.get and read the data via the 'data' event. From the docs:
const http = require('http');

http.get('http://nodejs.org/dist/index.json', (res) => {
  res.setEncoding('utf8');
  let rawData = '';
  res.on('data', (chunk) => { rawData += chunk; });
  res.on('end', () => {
    // this is when the response will end
  });
}).on('error', (e) => {
  console.error(`Got error: ${e.message}`);
});
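If you truly only need the status code, one further option (a sketch, using only the core http module) is to destroy the response stream as soon as the headers arrive; the connection is torn down and the body is never accumulated in your process:
const http = require('http');

// Sketch: read only the status line/headers, then drop the connection.
http.get('http://nodejs.org/dist/index.json', (res) => {
  console.log(`status: ${res.statusCode}`);
  res.destroy(); // close the socket; don't receive or buffer the body
}).on('error', (e) => {
  console.error(`Got error: ${e.message}`);
});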

Downloading file with node express app as response

I'm having an issue with an Express app. I'm using multer to upload a file, then using res.download to send the file back. This works with text files, but images are not working. When I send the file to the client, the file size is actually a little bit smaller than what is on the server. It seems as if the full file isn't being transferred.
I'm not doing anything fancy with the response; I'm just using res.download. I've researched basically every article I can find, and it seems like this works for everyone else.
Only text files are working. Word, Excel, and PDF files all say they're corrupted when downloaded.
EDIT: Here is the function that runs res.download. It's passed the file path, mimetype, etc.
function downloadFile(req, res) {
  let fpath = req.body.path;
  let originalName = req.body.originalName;
  let mimetype = req.body.mimetype;
  let filename = req.body.filename;
  res.download(fpath, originalName, function(err) {
    if (err) {
      console.log(err);
    }
  });
}
EDIT: Here is my Redux thunk that makes the request and triggers the file download. The download function comes from the downloadjs library.
export const downloadFile = (path, originalName, mimetype, filename) => {
  return dispatch => {
    return axios.post('/api/v1/quotes/downloadFile', { path: path, originalName: originalName, mimetype: mimetype, filename: filename })
      .then(res => {
        if (res.status !== 200) {
          ErrorHandler.logError(res);
        }
        else {
          // download(res.data, originalName);
          download(new Blob([res.data]), originalName, mimetype);
        }
      }).catch(function(error) {
        ErrorHandler.logError(error);
      });
  }
}
EDIT: Here is a small sample of what I see in the network tab. It looks like the image contents, but the size is smaller than what is on the server, and when I try to open the file I get an unsupported file type error.
PNG
IHDR{>õIÖÕsRGB®ÎégAMA±üa pHYsÃÃÇo¨d+{IDATx^íÝml\×ßq¾jº]´Mv´¤ÛÅvÛnÛEßt4/vQ[äÅ¢¯òb>-
él²æJv$Ǧ(ѦDÉR$R
¥V-Q6mÅ4kF¶®,U%ÊYS¶åDr¼5ÿ=ÿ{Ï9sîÌ!Gßp#Î}¾çÞ9÷7÷Þ¹Ó!¸o/ÛÚaï>MOJ4µ¸aíÐF{÷ég?ùó?µÚa a=öFØHa a=öFØHa
Request headers:
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Content-Length: 160
Content-Type: application/json;charset=UTF-8
Host: localhost:3000
Origin: http://localhost:3000
Referer: http://localhost:3000/Quote
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Response headers:
accept-ranges: bytes
cache-control: public, max-age=0
connection: close
content-disposition: attachment; filename="sSubLineConfigIds.PNG"
content-length: 11238
content-type: application/octet-stream
date: Wed, 17 Jul 2019 19:03:54 GMT
etag: W/"2be6-16c0151b84a"
last-modified: Wed, 17 Jul 2019 19:03:48 GMT
x-powered-by: Express
I was able to figure this out. What I ended up doing is converting the file to Base64 and setting the download link to that Base64 string.
Here is the Node function that gets hit and builds the Base64 string:
function downloadFile(req, res) {
  let fpath = req.body.path;
  let mimetype = req.body.mimetype;
  fs.readFile(fpath, function (err, data) {
    if (err) return res.status(500).send('File could not be downloaded');
    var base64 = Buffer.from(data).toString('base64');
    base64 = 'data:' + mimetype + ';base64,' + base64;
    res.send(base64);
  });
}
Here is the client-side code that builds a link, sets its source to the Base64 string, and simulates a click:
export const downloadFile = (path, originalName, mimetype, filename) => {
  return dispatch => {
    return axios.post('/api/v1/quotes/downloadFile', { path: path, originalName: originalName, mimetype: mimetype, filename: filename })
      .then(res => {
        if (res.status !== 200) {
          ErrorHandler.logError(res);
        }
        else {
          const linkSource = res.data;
          const downloadLink = document.createElement("a");
          const fileName = originalName;
          downloadLink.href = linkSource;
          downloadLink.download = fileName;
          downloadLink.click();
        }
      }).catch(function(error) {
        ErrorHandler.logError(error);
      });
  }
}
Things look fine as far as the shared code goes.
Since this request is initiated through XHR from your front end, you have to write download logic that converts the response to a blob and then creates a file for download, as described in how-to-create-a-dynamic-file-link-for-download-in-javascript.
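To illustrate that suggestion: axios parses responses as text/JSON by default, which corrupts binary data. Here is a hedged sketch of the question's thunk asking axios for a Blob instead (responseType is a standard axios request option; the rest mirrors the code above):
export const downloadFile = (path, originalName, mimetype, filename) => {
  return dispatch => {
    return axios.post(
      '/api/v1/quotes/downloadFile',
      { path, originalName, mimetype, filename },
      { responseType: 'blob' } // ask for raw binary instead of text
    )
      .then(res => {
        // res.data is now a Blob, so no lossy string conversion happens.
        download(res.data, originalName, mimetype);
      })
      .catch(error => {
        ErrorHandler.logError(error);
      });
  };
};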

Node JS and making external web calls successfully?

Hi, I am trying to start learning NodeJS and am in the middle of creating an application. The current goal is to call a website through Node, get an authentication token, then call that website again, this time with a POST payload that includes my login info and the auth token.
I have created the same program using Python, where I get a 200 response, whereas in NodeJS I am getting a 302.
I believe that's a quick fix; the main meat of the problem, I guess, is my lack of understanding in NodeJS of:
1. Whether I am supposed to nest these request calls into one another, because they are supposed to be part of the same 'session', and
2. If so, how do I fetch the last URL, which is, for example, example.com/poll, and store/modify that information (which is just JSON)? If I go to the example.com/poll URL in a browser, the browser automatically downloads a file containing JSON rather than displaying it; I need to save that data in a string etc., not download it.
In Python I do this (create a session, then make the two calls):
url = "https://example.com/"
session = requests.session()
first_req = session.get(url)
authenticity_token = re.search(XXX, first_req.text)
login_url = 'https://example.com/sessions'
payload = { 'session[username_or_email]' : 'username', 'session[password]' : 'password', 'redirect_after_login':'/', 'authenticity_token': authenticity_token }
login_req = session.post(login_url, data=payload, headers=user_agent)
print "login_req response: ", login_req.status_code  # gets me 200
Then in NodeJS:
var initLoad = {
  method: 'GET',
  url: 'https://example.com/',
  headers: {
    'User-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36'
  }
};

request(initLoad, function(error, response, body) {
  if (error) throw new Error(error);
  var $ = cheerio.load(body, {xmlMode: false});
  var authenticityToken = $("input[name=authenticity_token]").val();
  console.log(authenticityToken);

  var options = {
    method: 'POST',
    url: 'https://example.com/sessions',
    headers: response.headers,
    form: {
      'session[username_or_email]': 'someUsername',
      'session[password]': 'somePassword',
      redirect_after_login: '/',
      authenticity_token: authenticityToken
    }
  };

  request(options, function(error, response2, body2) {
    if (error) throw new Error(error);
    console.log(response2.statusCode); // gets me 302, not 200

    var analytics_url = 'https://example.com/poll';
    var tripleload = {
      method: 'GET',
      url: analytics_url,
      headers: response2.headers
    };

    request(tripleload, function(error, response3, body3) {
      if (error) throw new Error(error);
      res.end(body3);
    });
  });
});
302 means a temporary redirect, which you get because an error page is being served to you (or to your server in this case). Something about this call is wrong; maybe the URL is wrong if it's generated like this.
Your code is also messy, partly because you're a newbie in Node and partly because you use request, which is bare-bones and offers little to no comfort for writing this kind of flow.
Use something like Axios: https://github.com/mzabriskie/axios to make requests like this easier to write.
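One concrete difference from the Python version: requests.session() carries cookies between the two calls automatically, while the Node code above passes raw response headers back as request headers. A rough sketch of the same flow with Axios, forwarding the Set-Cookie values by hand (URLs and form fields are the question's placeholders):
const axios = require('axios');
const cheerio = require('cheerio');

async function login() {
  // First call: fetch the page and keep its session cookies.
  const first = await axios.get('https://example.com/');
  const cookies = (first.headers['set-cookie'] || [])
    .map((c) => c.split(';')[0]) // keep only the "name=value" part
    .join('; ');

  const $ = cheerio.load(first.data);
  const authenticityToken = $('input[name=authenticity_token]').val();

  // Second call: POST the login form, sending the cookies back,
  // which is what requests.session() does for you in Python.
  const loginRes = await axios.post(
    'https://example.com/sessions',
    new URLSearchParams({
      'session[username_or_email]': 'someUsername',
      'session[password]': 'somePassword',
      redirect_after_login: '/',
      authenticity_token: authenticityToken,
    }).toString(),
    {
      headers: {
        Cookie: cookies,
        'Content-Type': 'application/x-www-form-urlencoded',
      },
      maxRedirects: 0,                // don't silently follow the 302
      validateStatus: (s) => s < 400, // let us inspect a redirect response
    }
  );
  console.log(loginRes.status);
}

login().catch(console.error);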

Requesting font awesome file from nodeJs is sending back wrong data/file

My application: when you send a request from a browser to my Node server, my Node server requests an origin website, downloads all of its static files (including code), and serves them back to the user. The next time you visit my Node server, it serves all the content from Node instead of requesting the origin.
When I make a request from Node for a Font Awesome file,
http://example.com/modules/megamenu/fonts/fontawesome-webfont.woff?v=4.2.0
the file's content is different from when I request the same URL with curl.
This causes the following error in the browser when I return the file from Node:
Failed to decode downloaded font: http://nodeDomain.test/modules/megamenu/fonts/fontawesome-webfont.woff?v=4.2.0
If I copy and paste the content from the file requested via curl into the file stored on my Node server, the error disappears and all the Font Awesome stuff works.
Here are the headers I am sending with the request from Node to the origin server:
{
  connection: 'keep-alive',
  pragma: 'no-cache',
  'cache-control': 'no-cache',
  'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5)AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36',
  accept: '*/*',
  referer: 'http://example.com/modules/megamenu/css/font-awesome.min.css',
  'accept-language': 'en-US,en;q=0.8',
  cookie: 'PrestaShop-a30a9934ef476d11b.....'
}
I tried to see what headers were being sent by the curl request from the command line, but I cannot figure out how to do it.
______Node code used to fetch the file_______
url in options is the URL stated above;
headers are the browser's request headers.
var options = {
  url: originRequestPath,
  headers: requestHeaders
}

var originPage = rquest(options);
var responseBody = '';
var resHeads = '';

originPage.on('response', function(res)
{
  //store response headers locally
});

originPage.on('data', function(chunk)
{
  responseBody += chunk;
});

originPage.on('end', function()
{
  storeData.storePageData(storeFilePath, responseBody);
});

__________Store function below________________

exp.storePageData = function(storePath, pageContent)
{
  fs.outputFile(storePath, pageContent, function(err) {
    if (err) { console.log(err) }
  });
}
I believe the problem with your code is that you are converting your buffer output to a UTF-8 string. Since you are appending each buffer chunk to a string (responseBody += chunk;), the buffer is converted to a UTF-8 string, so you lose data with binary files. Try it this way:
var originPage = rquest(options);
var chunks = [];

originPage.on('response', function(res)
{
  //store response headers locally
});

originPage.on('data', function(chunk)
{
  chunks.push(chunk);
});

originPage.on('end', function()
{
  var data = Buffer.concat(chunks);
  //send data to browser and store content locally
});
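Alternatively, assuming the rquest wrapper returns a standard readable stream (as the request library does), you can skip manual buffering entirely and pipe the response straight to disk; streams pass Buffers through untouched, so binary files such as .woff fonts are never corrupted by string conversion:
var fs = require('fs');

var originPage = rquest(options);

// Sketch: stream the origin response directly into a file.
originPage
  .pipe(fs.createWriteStream(storeFilePath))
  .on('finish', function () {
    // file fully written; safe to serve it back to the browser
  })
  .on('error', function (err) {
    console.log(err);
  });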
