Async Await in firebase functions - javascript

I'm using Async/Await in a firebase function as follows
const functions = require("firebase-functions");
const puppeteer = require('puppeteer');
async function getHtml(url) {
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox',]
});
const page = await browser.newPage();
await page.goto(url,
{ waitUntil: ['networkidle0', 'networkidle2', 'load', 'domcontentloaded'] });
const k = await page.content()
await browser.close();
return k
};
// Create and deploy your first functions
// https://firebase.google.com/docs/functions/get-started
exports.api = functions.runWith({ memory: '512MB' })
.https.onRequest((request, response) => {
// console.log(request.query.url)
functions.logger.info("Hello logs!", { structuredData: true });
getHtml(request.query.url)
.then(function (res) {
response.send(res);
console.log(res)
})
.catch(function (err) {
response.send(err);
})
})
;
This fails the pre-deploy with the error Parsing error: Unexpected token function for the line async function getHtml(url) {
I just assumed that it's an issue with the eslint which is why I removed the predeploy "predeploy": [ "npm --prefix \"$RESOURCE_DIR\" run lint"] in the firebase json which helped me deploy successfully but returned an empty response on testing out the deployed code.
Everything on my local system works flawlessly with firebase serve
Any help would be much appriciated!
Edit:
Steps to reproduce
run firebase init in a directory
Select functions... , Javascript and (optionally)select no for eslint
install puppeteer using npm and paste the code above in the /functions/index.js file
run firebase serve and test the code with a url parameter on your local system. This will respond with a rendered html page.
Run firebase deploy
Now, on going to the hosted URL and appending a url parameter should just return {} instead of the html

Related

How to use proxy servers with aws lambda + Puppeteer?

I am trying to run puppeteer with proxy chain package on aws lambda but I am getting this error message:
"errorType": "Error",
"errorMessage": "Protocol error (Target.createTarget): Target closed.",
Code:
const chromium = require('chrome-aws-lambda');
const { addExtra } = require("puppeteer-extra");
const puppeteerExtra = addExtra(chromium.puppeteer);
const proxyChain = require('proxy-chain');
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteerExtra.use(StealthPlugin());
exports.handler = async (event, context, callback) => {
let finalResult = [];
const url = ``;
let browser;
const oldProxyUrl = ''; // --> bright data proxy
const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);
console.log("newProxyUrl", newProxyUrl)
try {
browser = await puppeteerExtra.launch({
args: ['--no-sandbox', '--disable-setuid-sandbox', `--proxy-server=${newProxyUrl}`],
defaultViewport: chromium.defaultViewport,
executablePath: await chromium.executablePath,
headless: chromium.headless
});
const page = await browser.newPage();
await page.goto(url);
finalResult = await extractElements(page);
} catch (error) {
return callback(error);
} finally {
await browser.close();
}
return callback(null, finalResult);
};
Above code works fine on aws lambda without proxy-server url. I also tested same code without proxy server url on serverless functions like vercel and netlify and it worked. Only issue is when I add proxy server url it throws protocol error.
Here are a few things you can try to troubleshoot this issue:
Make sure that the url variable has a value. This is currently an empty string, which means that the page.goto() method will not have a valid URL to navigate to.
Make sure that the oldProxyUrl variable has a value. This is currently an empty string, which means that the proxyChain.anonymizeProxy() method will not have a valid proxy to anonymize.
Make sure that the extractElements() function is defined and can be called. This function is not present in the code you provided, so you may need to include it or modify the code to remove the call to this function.
Check the logs of your AWS Lambda function to see if there are any additional error messages that might provide more information about the issue.
Check the documentation for the puppeteer-extra-plugin-stealth and proxy-chain packages to see if there are any known issues or compatibility issues with AWS Lambda.

The npm puppeter package is returning an error regarding node_package space

Here is the error that it is returning
Here is a picture of the error i'm getting when I run npm install puppeter
I found some stuff online with permissions, but this is about node_package space. It is not disk space as I've looked over my disk storage availability and there's plenty. Working on an Apify SDK, and I'm following the documentation, but the console is returning a whole bunch of error messages.
Can someone please help?
const Apify = require('apify')
Apify.main(async () => {
const requestQueue = await Apify.openRequestQueue();
await requestQueue.addRequest({ url: 'https://www.iana.org/' });
const crawler = new Apify.PuppeteerCrawler({
requestQueue,
handlePageFunction: async ({ request, page }) => {
const title = await page.title();
console.log(`Title of ${request.url}: ${title}`);
await Apify.utils.enqueueLinks({
requestQueue,
page,
pseudoUrls: ['https://www.iana.org/[.*]'],
});
},
});
await crawler.run();
});

GCloud Function with Puppeteer - 'Process exited with code 16' error

Having trouble finding any documentation or cause for this sort of issue. I'm trying to run a headless chrome browser script that pulls the the current song playing from kexp.org and returns it as a JSON object. Testing with the NPM package #Google-clound/functions-framework does return the correct response however when deployed into GCloud, I receive the following error when hitting the API trigger:
Error: could not handle the request
Error: Process exited with code 16
at process.on.code (invoker.js:396)
at process.emit (events.js:198)
at process.EventEmitter.emit (domain.js:448)
at process.exit (per_thread.js:168)
at logAndSendError (/workspace/node_modules/#google-cloud/functions framework/build/src/invoker.js:184)
at process.on.err (invoker.js:393)
at process.emit (events.js:198)
at process.EventEmitter.emit (domain.js:448)
at emitPromiseRejectionWarnings (internal/process/promises.js:140)
at process._tickCallback (next_tick.js:69)
Full Script:
const puppeteer = require('puppeteer');
let browserPromise = puppeteer.launch({
args: [
'--no-sandbox'
]
})
exports.getkexp = async (req, res) => {
const browser = await browserPromise
const context = await browser.createIncognitoBrowserContext()
const page = await context.newPage()
try {
const url = 'https://www.kexp.org/'
await page.goto(url)
await page.waitFor('.Player-meta')
let content = await page.evaluate(() => {
// finds elements by data type and maps to array note: needs map because puppeeter needs a serialized element
let player = [...document.querySelectorAll('[data-player-meta]')].map((player) =>
// cleans up and removes empty strings from array
player.innerHTML.trim());
// creates object and removes empty strings
player = {...player.filter(n => n)}
let songList = {
"show":player[0],
"artist":player[1],
"song":player[2].substring(2),
"album":player[3]
}
return songList
});
context.close()
res.set('Content-Type', 'application/json')
res.status(200).send(content)
} catch (e) {
console.log('error occurred: '+e)
context.close()
res.set('Content-Type', 'application/json')
res.status(200).send({
"error":"occurred"
})
}
}
Is there documentation for this error type? It's been deployed on GCloud via CLI shell with the following parameters:
gcloud functions deploy getkexp --trigger-http --runtime=nodejs10 --memory=1024mb

How to use Apify on Google Cloud Functions

I'm deploying some code using Apify as Google Cloud Functions. When triggered, the Cloud Function terminates silently. What am I doing wrong?
I have some working code using Apify 0.15.1. It runs fine locally. Once deployed as a Google Cloud Function, it fails silently without any clear error. The equivalent code using Puppeteer 1.18.1 works fine.
I've reproduced the issue using more simple code below. While this example doesn't strictly require Apify, I would like to be able to use the extra functionality provided by Apify.
Code using Apify:
const Apify = require("apify");
exports.screenshotApify = async (req, res) => {
let imageBuffer;
Apify.main(async () => {
const browser = await Apify.launchPuppeteer({ headless: true });
const page = await browser.newPage();
await page.goto("https://xenaccounting.com");
imageBuffer = await page.screenshot({ fullPage: true });
await browser.close();
});
if (res) {
res.set("Content-Type", "image/png");
res.send(imageBuffer);
}
return imageBuffer;
};
Code using Puppeteer:
const puppeteer = require("puppeteer");
exports.screenshotPup = async (req, res) => {
const browser = await puppeteer.launch({ args: ["--no-sandbox"] });
const page = await browser.newPage();
await page.goto("https://xenaccounting.com");
const imageBuffer = await page.screenshot({ fullpage: true });
await browser.close();
if (res) {
res.set("Content-Type", "image/png");
res.send(imageBuffer);
}
return imageBuffer;
};
Once deployed as a Google Cloud Function (with --trigger-http and --memory=2048), the Puppeteer variant works fine, while the Apify variant terminates silently without result (apart from an 'ok' / HTTP 200 return value).
Get rid of the Apify.main() function, it schedules the call to a later time, after your function already returned the result.

Puppeteer Crawler - Error: net::ERR_TUNNEL_CONNECTION_FAILED

Currently I have my Puppeteer running with a Proxy on Heroku. Locally the proxy relay works totally fine. I however get the error Error: net::ERR_TUNNEL_CONNECTION_FAILED. I've set all .env info in the Heroku config vars so they are all available.
Any idea how I can fix this error and resolve the issue?
I currently have
const browser = await puppeteer.launch({
args: [
"--proxy-server=https=myproxy:myproxyport",
"--no-sandbox",
'--disable-gpu',
"--disable-setuid-sandbox",
],
timeout: 0,
headless: true,
});
page.authentication
The correct format for proxy-server argument is,
--proxy-server=HOSTNAME:PORT
If it's HTTPS proxy, you can pass the username and password using page.authenticate before even doing a navigation,
page.authenticate({username:'user', password:'password'});
Complete code would look like this,
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless:false,
ignoreHTTPSErrors:true,
args: ['--no-sandbox','--proxy-server=HOSTNAME:PORT']
});
const page = await browser.newPage();
// Authenticate Here
await page.authenticate({username:user, password:password});
await page.goto('https://www.example.com/');
})();
Proxy Chain
If somehow the authentication does not work using above method, you might want to handle the authentication somewhere else.
There are multiple packages to do that, one is proxy-chain, with this, you can take one proxy, and use it as new proxy server.
The proxyChain.anonymizeProxy(proxyUrl) will take one proxy with username and password, create one new proxy which you can use on your script.
const puppeteer = require('puppeteer');
const proxyChain = require('proxy-chain');
(async() => {
const oldProxyUrl = 'http://username:password#hostname:8000';
const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);
// Prints something like "http://127.0.0.1:12345"
console.log(newProxyUrl);
const browser = await puppeteer.launch({
args: [`--proxy-server=${newProxyUrl}`],
});
// Do your magic here...
const page = await browser.newPage();
await page.goto('https://www.example.com');
})();

Categories

Resources