Reloading page in puppeteer if response code is not valid - javascript

So I know I can check the response code after a page.goto() call using the response.status() function, but my program is built to scrape and perform a bunch of actions on a website. Some websites, under load or seemingly at random, return a 500 or 503 error instead of serving up the webpage.
So what I want to do is: for every navigation request, if the response code comes back as a 500 or 503 error, I want to reload the page. I have been looking at setRequestInterception, but that fires before a request is made. setResponseInterception doesn't exist yet (though I see it listed as a potential feature on GitHub). It would be a piece of cake with setResponseInterception:
Check response code
If 500 or 503, reload page
I am wondering if I can do anything like this right now using setRequestInterception. Or I may have to individually monitor each navigation call and check if it returns a valid code before proceeding.

You didn't provide any code sample, so I don't know what your code structure looks like, but here is one way to do this:
const puppeteer = require('puppeteer');

async function init_puppeteer( link ) {
  const browser = await puppeteer.launch({ headless: false, args: ['--no-sandbox', '--disable-setuid-sandbox'] });
  let success = false;
  // keep retrying until the page opens successfully
  while (!success) {
    success = await open_page( browser, link );
  }
  await browser.close();
}

async function open_page( browser, link ) {
  try {
    const page = await browser.newPage();
    await page.goto( link ).catch(function (error) { throw new Error('TimeoutBrows'); });
    // you can also check the status code here and throw if it's 500 or 503
    return true;
  } catch (e) {
    return false;
  }
}
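If you want the retry to be driven by the status code itself, page.goto() resolves with the response for the main resource, so you can check response.status() and reload. A rough sketch of that idea (the helper name and retry limit are just illustrative, not part of the original answer):
async function goto_with_retry( page, link, maxRetries = 3 ) {
  // page.goto() resolves with the main resource's response (or null for some navigations)
  let response = await page.goto( link, { waitUntil: 'networkidle2' } );
  let attempts = 0;
  while (response && [500, 503].includes(response.status()) && attempts < maxRetries) {
    // page.reload() also resolves with the main resource's response
    response = await page.reload({ waitUntil: 'networkidle2' });
    attempts++;
  }
  return response;
}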

Related

Service Worker for Static HTML fallback - Refreshing page Offline just shows "No Internet"

I'm trying to get just a simple working example of this going, but I feel like I'm misunderstanding something.
My page is dynamically generated (Django), but all I want is to register a service worker to have a fallback page if the user is offline anywhere in the app. I'm testing this on http://localhost:8000, so maybe this is keeping it from working?
This is the sample I based my code on; I've copied it 99% verbatim aside from the location of the offline HTML file, which is being cached correctly, so I can verify that part works.
https://googlechrome.github.io/samples/service-worker/custom-offline-page/
The SW is registered at the bottom of my HTML's body:
<script>
  if ('serviceWorker' in navigator) {
    navigator.serviceWorker.register('/static/js/service-worker.js');
  }
</script>
For /static/js/service-worker.js:
const OFFLINE_VERSION = 1;
const CACHE_NAME = 'offline';
// Customize this with a different URL if needed.
const OFFLINE_URL = '/static/offline/offline.html';

self.addEventListener('install', (event) => {
  event.waitUntil((async () => {
    const cache = await caches.open(CACHE_NAME);
    // Setting {cache: 'reload'} in the new request will ensure that the response
    // isn't fulfilled from the HTTP cache; i.e., it will be from the network.
    await cache.add(new Request(OFFLINE_URL, {cache: 'reload'}));
  })());
});

self.addEventListener('activate', (event) => {
  event.waitUntil((async () => {
    // Enable navigation preload if it's supported.
    // See https://developers.google.com/web/updates/2017/02/navigation-preload
    if ('navigationPreload' in self.registration) {
      await self.registration.navigationPreload.enable();
    }
  })());

  // Tell the active service worker to take control of the page immediately.
  self.clients.claim();
});

self.addEventListener('fetch', (event) => {
  // We only want to call event.respondWith() if this is a navigation request
  // for an HTML page.
  if (event.request.mode === 'navigate') {
    event.respondWith((async () => {
      try {
        // First, try to use the navigation preload response if it's supported.
        const preloadResponse = await event.preloadResponse;
        if (preloadResponse) {
          return preloadResponse;
        }

        const networkResponse = await fetch(event.request);
        return networkResponse;
      } catch (error) {
        // catch is only triggered if an exception is thrown, which is likely
        // due to a network error.
        // If fetch() returns a valid HTTP response with a response code in
        // the 4xx or 5xx range, the catch() will NOT be called.
        console.log('Fetch failed; returning offline page instead.', error);

        const cache = await caches.open(CACHE_NAME);
        const cachedResponse = await cache.match(OFFLINE_URL);
        return cachedResponse;
      }
    })());
  }

  // If our if() condition is false, then this fetch handler won't intercept the
  // request. If there are any other fetch handlers registered, they will get a
  // chance to call event.respondWith(). If no fetch handlers call
  // event.respondWith(), the request will be handled by the browser as if there
  // were no service worker involvement.
});
The worker successfully installs and activates. The offline.html page is successfully cached and I can verify this in Chrome Inspector -> Application -> Service Workers. I can also verify it's the correct service-worker.js file and not an old one.
If I switch Chrome to "Offline" and refresh the page, I still get the standard "No Internet" page. It also doesn't look like the "fetch" event fires on normal page loads, since the console.log never appears.
Is the sample code I'm using outdated? Is this a limitation of trying this on Localhost? What am I doing wrong? Thank you.

API response 404 not found Handling

I am working on a project with an API. The API response is 404 Not Found, and I need to handle this status code without navigating to a new page; I want to show a window.confirm("not found"). However, I couldn't do that, because when the API returns 404 there seems to be no response, so I couldn't check it. How can I handle that without using a new page? Here is my response code:
const response = await instance.get(`?q=${q}&appid=${appid}`);
if (!response) {
  console.log("ceren");
}
It never prints "ceren". I tried response ==="", response ===null, response.data===null, and so on
The response object is never null. It's an object that, along with many other keys, includes the status. Moreover, if the request fails, it will throw an error (due to the await, though outside of this function it will be a Promise rejection), so you can just catch that:
return instance.get(`?q=${q}&appid=${appid}`).then(/*...*/).catch((error) => console.log('Request failed!'));
Or, if you must use an await:
try {
  const response = await instance.get(`?q=${q}&appid=${appid}`);
} catch (error) {
  console.log('Request failed!');
}
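If you specifically want to react to a 404 (and assuming instance is an axios instance, which the question doesn't state explicitly), the thrown error carries the server's response, so you can inspect its status code. A minimal sketch:
try {
  const response = await instance.get(`?q=${q}&appid=${appid}`);
  // use response.data here when the request succeeds
} catch (error) {
  // With axios, error.response is set when the server replied with a non-2xx status
  if (error.response && error.response.status === 404) {
    window.confirm("not found");
  } else {
    console.log('Request failed!', error);
  }
}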

Can We Explicitly Catch Puppeteer (Chrome/Chromium) Error net::ERR_ABORTED?

Can we explicitly and specifically catch the Puppeteer (Chrome/Chromium) error net::ERR_ABORTED? Or is string matching the only option currently?
page.goto(oneClickAuthPage).catch(e => {
  if (e.message.includes('net::ERR_ABORTED')) {}
})

/* "net::ERR_ABORTED" occurs for sub-resources on a page if we navigate
 * away too quickly. I'm specifically awaiting a 302 response for successful
 * login and then immediately navigating to the auth-protected page.
 */
await page.waitForResponse(res => res.url() === href && res.status() === 302)
page.goto(originalRequestPage)
Ideally, this would be similar to a potential event we could catch with page.on('requestaborted')
I'd recommend putting your API calls and the like in a try/catch block. If it fails, you catch the error, like you are currently doing, but it just looks a bit nicer:
try {
  await page.goto(PAGE)
} catch (error) {
  console.log(error) // or console.error(error)
  // do specific functionality based on error codes
  if (error.status === 300) {
    // I don't know what app you are building this in,
    // but if it's in React, here you could do
    // setState to display error messages and so forth
    setError('Action aborted')
    // if it's in an express app, you can respond with your own data
    res.send({ error: 'Action aborted' })
  }
}
If there are no specific error codes in the error responses for when Puppeteer aborts, it means Puppeteer's API simply hasn't been written to return data like that, unfortunately :')
It's not uncommon to do error-message checks like the one in your question. It's, unfortunately, the only way we can do it, since this is what we're given to work with :'P
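Putting both points together, a minimal sketch of what that message check can look like inside a try/catch (the handling inside the if is up to your app):
try {
  await page.goto(oneClickAuthPage)
} catch (error) {
  // String matching is currently the only way to single this error out
  if (error.message.includes('net::ERR_ABORTED')) {
    // handle the aborted navigation however your app needs
  } else {
    throw error // not the error we were expecting, so let it propagate
  }
}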

Puppeteer custom error messages when failure

I am trying to create a custom error messages when Puppeteer fails to do a task, in my case it cannot find the field that it has to click.
let page;

before(async () => { /* before hook for mocha testing */
  page = await browser.newPage();
  await page.goto("https://www.linkedin.com/login");
  await page.setViewport({ width: 1920, height: 1040 });
});

after(async function () { /* after hook for mocha testing */
  await page.close();
});

it('should login to home page', async () => { /* simple test case */
  const emailInput = "#username";
  const passwordInput = "#assword";
  const submitSelector = ".login__form_action_container ";
  linkEmail = await page.$(emailInput);
  linkPassword = await page.$(passwordInput);
  linkSubmit = await page.$(submitSelector);
  await linkEmail.click({ clickCount: 3 });
  await linkEmail.type('testemail#example.com'); // add the email address for linkedin //
  await linkPassword.click({ clickCount: 3 }).catch(error => {
    console.log('The following error occurred: ' + error);
  });
  await linkPassword.type('testpassword'); // add password for linkedin account
  await linkSubmit.click();
  await page.waitFor(3000);
});
}); // closes the enclosing describe() block (not shown above)
I have deliberately put a wrong passwordInput name in order to force puppeteer to fail. However, the console.log message is never printed.
This is my error output which is the default mocha error:
simple test for Linkedin Login functionality
1) should login to home page
0 passing (4s)
1 failing
1) simple test for Linkedin Login functionality
should login to home page:
TypeError: Cannot read property 'click' of null
at Context.<anonymous> (test/sample.spec.js:29:28)
Line 29 is the await linkPassword.click({ clickCount: 3 })
Anyone has an idea how I can make it print a custom error message when an error like this occurs?
The problem is that the exception is thrown not as a result of executing await linkPassword.click(), but as a result of trying to execute it at all. With .catch() you attempt to handle an eventual exception thrown during execution, but page.$() works in such a way that it returns null if the selector isn't found. So in your case you are executing null.click({ clickCount: 3 }).catch(), which doesn't make sense.
To quickly solve your problem, you should check whether linkPassword is null. However, I think you are making a bigger mistake by using page.$() to get an element to interact with. This way you lose a lot of Puppeteer's features, because instead of using Puppeteer's page.click() method you use a simple click() in the browser.
Instead, you should make sure that the element exists and is visible, and then use Puppeteer's API to interact with the element, like this:
const emailInput = "#username";
await page.waitForSelector(emailInput);
await page.click(emailInput, { clickCount: 3 });
await page.type(emailInput, 'testemail#example.com')
This way your script makes sure the element is clickable; if it is, it scrolls to the element, performs the clicks, and types the text.
Then you can handle the case when the element isn't found this way:
page.waitForSelector(emailInput).catch(() => {})
or just by using try/catch.
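For the custom error message you asked about, a minimal sketch along those lines (the timeout and message text are arbitrary):
const passwordInput = "#assword";
try {
  await page.waitForSelector(passwordInput, { timeout: 5000 });
} catch (e) {
  // Surface a readable message instead of "Cannot read property 'click' of null"
  throw new Error('Custom error: could not find the password field "' + passwordInput + '"');
}
await page.click(passwordInput, { clickCount: 3 });
await page.type(passwordInput, 'testpassword');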

login into gmail fails for unknown reason

I am trying to log into my Gmail with Puppeteer to lower the risk of reCAPTCHA.
Here is my code:
await page.goto('https://accounts.google.com/AccountChooser?service=mail&continue=https://mail.google.com/mail/', {timeout: 60000})
  .catch(function (error) {
    throw new Error('TimeoutBrows');
  });

await page.waitForSelector('#identifierId', { visible: true });
await page.type('#identifierId', 'myemail');

await Promise.all([
  page.click('#identifierNext'),
  page.waitForSelector('.whsOnd', { visible: true })
])

await page.type('#password .whsOnd', "mypassword");
await page.click('#passwordNext');
await page.waitFor(5000);
but I always end up with this message:
I even tried to just open the login window with puppeteer and fill the login form manually myself, but even that failed.
Am I missing something ?
When I look into the console, there is a failed AJAX call just after login.
Request URL: https://accounts.google.com/_/signin/challenge?hl=en&TL=APDPHBCG5lPol53JDSKUY2mO1RzSwOE3ZgC39xH0VCaq_WHrJXHS6LHyTJklSkxd&_reqid=464883&rt=j
Request Method: POST
Status Code: 401
Remote Address: 216.58.213.13:443
Referrer Policy: no-referrer-when-downgrade
)]}'
[[["er",null,null,null,null,401,null,null,null,16]
,["e",2,null,null,81]
]]
I've inspected your code and it seems to be correct apart from some selectors. Also, I had to add a couple of timeouts in order to make it work. However, I failed to reproduce your issue, so I'll just post the code that worked for me.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();

  await page.goto('https://accounts.google.com/AccountChooser?service=mail&continue=https://mail.google.com/mail/', {timeout: 60000})
    .catch(function (error) {
      throw new Error('TimeoutBrows');
    });

  await page.screenshot({path: './1.png'});
  ...
})();
Please note that I run the browser in normal, not headless, mode. If you take a look at the screenshot taken at this point, you will see that it shows the correct Google login form.
The rest of the code is responsible for entering the password:
const puppeteer = require('puppeteer');

(async () => {
  ...
  await page.waitForSelector('#identifierId', {visible: true});
  await page.type('#identifierId', 'my#email');

  await Promise.all([
    page.click('#identifierNext'),
    page.waitForSelector('.whsOnd', {visible: true})
  ]);

  await page.waitForSelector('input[name=password]', {visible: true});
  await page.type('input[name=password]', "my.password");

  await page.waitForSelector('#passwordNext', {visible: true});
  await page.waitFor(1000);
  await page.click('#passwordNext');
  await page.waitFor(5000);
})();
Please also note a few differences from your code: the selector for the password field is different, and I had to add await page.waitForSelector('#passwordNext', {visible: true}); plus a small timeout after it so the button could be clicked successfully.
I've tested all the code above and it worked successfully. Please let me know if you still need help or are facing trouble with my example.
The purpose of the question is to log in to Gmail, so I will share another method that does not involve filling the email and password fields in the Puppeteer script, and it works in headless: true mode.
Method
1. Log in to your Gmail using a normal browser (Google Chrome preferably).
2. Export all cookies for the Gmail tab.
3. Use page.setCookie to import the cookies into your Puppeteer instance.
4. Navigate to Gmail; you will already be logged in.
This should be a no-brainer.
Export all cookies
I will use an extension called Edit This Cookie; however, you can use other extensions or manual methods to extract the cookies (there is also a Puppeteer-only sketch at the end of this answer).
Click the extension's icon in the browser and then click the Export button.
Import cookies to puppeteer instance
We will save the cookies in a cookies.json file and then import them using the page.setCookie function before navigating. That way, when the Gmail page loads, it will have the login information right away.
The code might look like this:
const puppeteer = require("puppeteer");
const cookies = require('./cookies.json');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Set cookies here, right after creating the instance
  await page.setCookie(...cookies);

  // do the navigation
  await page.goto("https://mail.google.com/mail/u/0/#search/stackoverflow+survey", {
    waitUntil: "networkidle2",
    timeout: 60000
  });

  await page.screenshot({ path: "example.png" });
  await browser.close();
})();
Result:
Notes:
It was not asked, but I should mention the following for future readers.
Cookie expiration: cookies might be short-lived, expire soon after export, or behave differently on a different device. Logging out on your original device will log the Puppeteer session out as well, since it shares the cookies.
Two-factor: I am not yet sure about 2FA. It did not ask me for 2FA, probably because I logged in from the same device.
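If you prefer not to rely on a browser extension, you can also capture the cookies with Puppeteer itself: log in once in a headful session, read the cookies with page.cookies(), and write them to cookies.json. A rough sketch of that alternative (the file name and the manual-login wait are arbitrary choices, not part of the original answer):
const fs = require("fs");
const puppeteer = require("puppeteer");

(async () => {
  // Headful so you can complete the Gmail login by hand
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto("https://mail.google.com/mail/u/0/", { waitUntil: "networkidle2", timeout: 60000 });

  // Give yourself time to finish logging in manually before the cookies are read
  await page.waitFor(60000);

  // page.cookies() returns the cookies for the current URL
  const cookies = await page.cookies();
  fs.writeFileSync("./cookies.json", JSON.stringify(cookies, null, 2));

  await browser.close();
})();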
