Puppeteer invalid selector - javascript

I'm trying to fill in a form on a webpage with Puppeteer. The input has an id, and I'm using it as a selector.
The ID is:
#loginPage:SiteTemplate:formulaire:login-field
When I copy the selector from Chrome, it gives me this:
#loginPage\3a SiteTemplate\3a formulaire\3a login-field
Whether I use the first or the second option in Puppeteer, it gives me this error:
Error: Evaluation failed: DOMException: Failed to execute 'querySelector' on 'Document': '#loginPage:SiteTemplate:formulaire:login-field' is not a valid selector.
Here is the code, if needed:
(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('XXX');
  await page.click(GOTO_LOGIN_BUTTON_SELECTOR);
  await page.waitForNavigation({waitUntil: 'load'});
  await page.waitFor(EMAIL_SELECTOR); // here
  await page.focus(EMAIL_SELECTOR);
  await page.keyboard.type(CREDS.email);
  await page.focus(PASSWORD_SELECTOR);
  await page.keyboard.type(CREDS.password);
  await browser.close();
})();

One option, for an ID like that, is to use an attribute selector:
const EMAIL_SELECTOR = '[id="loginPage:SiteTemplate:formulaire:login-field"]';
Or, if that doesn't work, split the ID into its parts to work around the colons:
const EMAIL_SELECTOR = '[id*="loginPage"][id*="SiteTemplate"][id*="formulaire"][id*="login-field"]';
Hopefully one (or both) of those will help!
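As a side note, the selector copied from Chrome may be failing for a JavaScript reason rather than a CSS one: inside an ordinary (non-strict) string literal, \3 is read as a legacy octal escape, so the backslashes never reach querySelector unless they are doubled (\\3a ). Here is a minimal sketch of two ways to build a safe selector from the raw id. The names idToAttributeSelector and idToEscapedSelector are hypothetical helpers for illustration, not Puppeteer APIs.

```javascript
// Hypothetical helpers; neither is part of Puppeteer.

// Wrap the raw id in an attribute selector so the colons need no escaping.
function idToAttributeSelector(id) {
  // Escape backslashes and double quotes so the id stays inside the quoted value.
  const quoted = id.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return `[id="${quoted}"]`;
}

// Alternatively, backslash-escape each colon so the id can be used as #id.
function idToEscapedSelector(id) {
  return '#' + id.replace(/:/g, '\\:');
}

const rawId = 'loginPage:SiteTemplate:formulaire:login-field';
console.log(idToAttributeSelector(rawId));
console.log(idToEscapedSelector(rawId));
```

Either result can be passed wherever the original code uses EMAIL_SELECTOR.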

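I'm not sure, but I would drop the line await page.waitForNavigation({waitUntil: 'load'});
and replace await page.waitFor(EMAIL_SELECTOR); // here
with await page.waitForSelector(EMAIL_SELECTOR);
Then test whether a shorter selector such as #login-field works. Note that it will only match if an element has exactly that id; the colons in the original are part of a single id, not a path through nested elements, so something like #loginPage > SiteTemplate > formulaire > login-field is unlikely to match.
I could be wrong, as I'm still working this out too.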

Related

web scraping with puppeteer does not find the CSS tag

I'm starting to learn web scraping in JavaScript with Puppeteer. I found a video that I liked that showcases Puppeteer, and I'm trying to scrape the same information as the video. The page has changed a little since the video, so I used what I think are the correct tags.
The problem comes when I try to find the "h3" tag. The tag exists in the DOM, but my code refuses to acknowledge its existence, while it works "fine" when looking for the "h2" tag.
What I want to know is why my code does not retrieve it.
web page: https://marketingplatform.google.com/about/partners/find-a-partner?utm_source=marketingplatform.google.com&utm_medium=et&utm_campaign=marketingplatform.google.com%2Fabout%2Fpartners%2F
// normal things to launch it
const puppeteer = require("puppeteer");
(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const url = "https://marketingplatform.google.com/about/partners/find-a-partner?utm_source=marketingplatform.google.com&utm_medium=et&utm_campaign=marketingplatform.google.com%2Fabout%2Fpartners%2F";
  await page.goto(url);
  // here comes the problem
  // this doesn't work v
  const h3 = await page.evaluate(() => document.querySelector("h3").textContent);
  console.log(h3); // the error is because it tries to get the text content of null, meaning it didn't find "h3"
  // this DOES work v
  const h2 = await page.evaluate(() => document.querySelector("h2").textContent);
  console.log(h2);
  //await browser.close();
})();
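I know that "h3" exists. I would appreciate it if you could explain a little of what happens so I can learn more.
Thanks.
The h3 header does not exist on the page yet when page.goto resolves (presumably it is added by a script afterwards), so we need to wait for it with waitForSelector: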
// normal things to launch it
const puppeteer = require("puppeteer");
(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const url = "https://marketingplatform.google.com/about/partners/find-a-partner?utm_source=marketingplatform.google.com&utm_medium=et&utm_campaign=marketingplatform.google.com%2Fabout%2Fpartners%2F";
  await page.goto(url);
  // wait until an h3 has been added to the DOM before reading it
  await page.waitForSelector('h3');
  const h3 = await page.evaluate(() => document.querySelector("h3").textContent);
  console.log(h3);
  const h2 = await page.evaluate(() => document.querySelector("h2").textContent);
  console.log(h2);
  await browser.close(); // don't forget to close it
})();
output is:
Viden
Find your perfect match.
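The wait-then-read pattern above can be wrapped in a small function so that every lookup waits first. This is only a sketch: textWhenReady is a hypothetical helper name, the 10-second default timeout is an arbitrary assumption, and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper (not part of Puppeteer): wait for a selector to appear
// in the DOM, then return the trimmed text content of the first match.
async function textWhenReady(page, selector, timeout = 10000) {
  // Rejects with a timeout error if the element never shows up.
  await page.waitForSelector(selector, { timeout });
  // The callback runs inside the page; only the resulting string comes back.
  return page.$eval(selector, el => el.textContent.trim());
}
```

Inside the async function from the answer, it would be used as const h3 = await textWhenReady(page, 'h3');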

Puppeteer scraping; adding text to search bar returning error

I'm learning to use Puppeteer, but I'm running into trouble. I'm trying to create a program that takes in a date and finds a famous person whose birthday is on that date. I have this code:
const puppeteer = require('puppeteer');
try {
  (async () => {
    console.log('here');
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://www.famousbirthdays.com/');
    console.log('me');
    await page.type(document.querySelector('input'), '11-16-1952');
    console.log('you clicked');
    await page.click(document.querySelector('button'));
    console.log('Here');
    await page.waitForSelector(
      document.querySelector('div[class="list_page"]')
    );
    let data = await page.evaluate(() => {
      let name = document.querySelector('div[class="name"').textContent;
      return { name };
    });
    console.log(data);
    browser.close();
  })();
} catch (err) {
  console.error(err);
}
I don't understand why I'm getting errors at the page.type line. I get an error and can't reach the "you clicked" log. If I read the documentation correctly, .type can take a selector and the text to type into it, so I'm pretty sure I'm using it correctly. I checked in the browser console, and document.querySelector('input') does pull up the element I want (the search bar). Any advice is appreciated. Thanks for looking.
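One likely cause, offered as an assumption since the error text isn't shown: document exists only inside the browser page, not in the Node.js script, and page.type, page.click and page.waitForSelector expect a CSS selector string rather than an element. A minimal sketch of passing strings instead follows; submitSearch is a hypothetical helper name, and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper (not part of Puppeteer): type into a field and click a
// button, identifying both by CSS selector strings.
async function submitSearch(page, inputSelector, buttonSelector, text) {
  // page.type takes a selector string and the text to type.
  await page.type(inputSelector, text);
  // page.click also takes a selector string, not a DOM element.
  await page.click(buttonSelector);
}
```

With the selectors from the question, the call would look like await submitSearch(page, 'input', 'button', '11-16-1952'); whether those two selectors pick the intended elements on that site is not verified here.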

Puppeteer: Get DOM element which isn't in the initial DOM

I'm trying to figure out how to get the elements in e.g. a JS gallery that loads its images after they have been clicked on.
I'm using a demo of Viewer.js as an example. The element with the classes .viewer-move.viewer-transition isn't in the initial DOM. After clicking on an image the element is available but if I use $eval the string is empty. If I open the console in the Puppeteer browser and execute document.querySelector('.viewer-move.viewer-transition') I get the element but in the code the element isn't available.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://fengyuanchen.github.io/viewerjs/');
  await page.click('[data-original="images/tibet-1.jpg"]');
  let viewer = await page.$eval('.viewer-move.viewer-transition', el => el.innerHTML);
  console.log(viewer);
})();
You get the empty string because the element has no content, so its innerHTML is empty. outerHTML seems to work:
const puppeteer = require('puppeteer');
(async function main() {
  try {
    const browser = await puppeteer.launch({ headless: false });
    const [page] = await browser.pages();
    await page.goto('https://fengyuanchen.github.io/viewerjs/');
    await page.click('[data-original="images/tibet-1.jpg"]');
    await page.waitForSelector('.viewer-move.viewer-transition');
    const viewer = await page.$eval('.viewer-move.viewer-transition', el => el.outerHTML);
    console.log(viewer);
    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();
Since you have to wait until the element is available, the most convenient method is await page.waitForSelector(".viewer-move.viewer-transition"), which waits until the element is added to the DOM. The caveat is that execution continues the moment the element is added, even if it is still empty or hidden.
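If the empty-or-hidden caveat matters, waitForSelector also accepts a visible option. The sketch below assumes that waiting for visibility is enough for this gallery, which has not been verified against the Viewer.js demo; outerHtmlWhenVisible is a hypothetical helper name, and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper (not part of Puppeteer): wait until the element passes
// Puppeteer's visibility check, then return its outer HTML.
async function outerHtmlWhenVisible(page, selector) {
  // { visible: true } asks waitForSelector to wait for a visible element,
  // not merely one that has been attached to the DOM.
  await page.waitForSelector(selector, { visible: true });
  return page.$eval(selector, el => el.outerHTML);
}
```

It would replace the waitForSelector and $eval pair, for example const viewer = await outerHtmlWhenVisible(page, '.viewer-move.viewer-transition');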

Puppeteer throws error with "Node not visible..."

When I open this page with puppeteer, and try to click the element, it throws an Error when it is expected to be possible to click the element.
const puppeteer = require('puppeteer');
const url = "https://www.zapimoveis.com.br/venda/apartamentos/rj+rio-de-janeiro/";
async function run() {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto(url, {waitUntil: 'load'});
  const nextPageSelector = "#list > div.pagination > #proximaPagina";
  await page.waitForSelector(nextPageSelector, {visible: true});
  console.log("Found the button");
  await page.click(nextPageSelector);
  console.log('clicked');
}
run();
Here's a working version of your code.
const puppeteer = require('puppeteer');
const url = "https://www.zapimoveis.com.br/venda/apartamentos/rj+rio-de-janeiro/";
async function run() {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto(url);
  const nextPageSelector = "#list > div.pagination > #proximaPagina";
  // make sure the button is in the DOM before claiming it was found
  await page.waitForSelector(nextPageSelector);
  console.log("Found the button");
  // click from inside the page instead of using page.click
  await page.evaluate(selector => {
    return document.querySelector(selector).click();
  }, nextPageSelector);
  console.log('clicked');
}
run();
I personally prefer to use page.evaluate, as page.click doesn't work for me either in some cases, and with page.evaluate you can execute whatever JS you like on the page directly.
The only thing to know is the syntax:
- first param: the function to execute in the page
- following param(s): the variable(s) to be passed to this function
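The argument-passing syntax described above can be sketched as follows. clickInPage is a hypothetical helper name, page is assumed to be a Puppeteer Page, and returning a boolean instead of throwing when nothing matches is a design choice made for this example only.

```javascript
// Hypothetical helper (not part of Puppeteer): click an element from inside
// the page and report whether a matching element was found.
async function clickInPage(page, selector) {
  // First argument: the function that runs in the browser page.
  // Second argument: a value from Node.js, received by the function as `sel`.
  return page.evaluate(sel => {
    const el = document.querySelector(sel);
    if (!el) return false; // nothing matched, so nothing was clicked
    el.click();
    return true;
  }, selector);
}
```

The function body cannot see Node.js variables through closure, which is why the selector has to be passed in as an argument.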
Found the problem.
When you call page.click(selector) or ElementHandle.click(), Puppeteer scrolls to the element, finds its center coordinates, and finally clicks. It uses the function _clickablePoint at node_modules/puppeteer/lib/JSHandle.js to find the coordinates.
The problem with this website (zapimoveis) is that scrolling the element into the viewport is too slow, so Puppeteer can't find its (x,y) center.
One nice way to click this element is to use page.evaluate to click it with in-page JavaScript. But there is a hackier way (a "gambiarra") that I prefer. :) Replace the line await page.click(nextPageSelector) with these lines:
try { await page.click(nextPageSelector) } catch (error) {} // sacrifice click :)
await page.waitFor(3000); // time for scrolling
await page.click(nextPageSelector); // this will work
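The sacrifice-click idea above can be generalised into a small retry loop. This is a sketch under assumptions: clickWithRetry is a hypothetical helper name, the defaults of 3 attempts and a 1000 ms pause are arbitrary, and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper (not part of Puppeteer): retry page.click a few times,
// pausing between attempts to give the page time to finish scrolling.
async function clickWithRetry(page, selector, attempts = 3, delayMs = 1000) {
  let lastError = new Error('attempts must be at least 1');
  for (let i = 0; i < attempts; i++) {
    try {
      await page.click(selector);
      return i + 1; // number of attempts that were needed
    } catch (error) {
      lastError = error;
      // Pause only if another attempt is still to come.
      if (i < attempts - 1) {
        await new Promise(resolve => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // every attempt failed
}
```

Unlike the empty catch block, this rethrows the last error when no attempt succeeds, so a genuinely missing button is not silently ignored.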

cannot find context with specified id undefined

I am trying to create an app that does a google image search of a random word and selects/clicks the first result image.
I am successful until the code is attempting to select the result image and it throws the following error in my terminal:
UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection
id: 1): Error: Protocol error (Runtime.callFunctionOn): Cannot find
context with specified id undefined
Here is my code:
const pup = require('puppeteer');
const random = require('random-words');
const url = 'http://images.google.com';
(async() => {
  const browser = await pup.launch({headless: false});
  const page = await browser.newPage();
  await page.goto(url);
  const searchBar = await page.$('#lst-ib');
  await searchBar.click();
  await page.keyboard.type(`${random()}`);
  const submit = await page.$('#mKlEF');
  await submit.click();
  await page.keyboard.type(random());
  await page.keyboard.press('Enter');
  const pic = await page.evaluate(() => {
    return document.querySelectorAll('img');
  });
  pic.click();
})();
The NodeList returned by document.querySelectorAll('img') is not serialisable, so page.evaluate returns undefined (see this issue as reference).
Please use something like the following instead (depending on which element you want to click):
await page.$$eval('img', elements => elements[0].click());
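To get data back into Node.js rather than just clicking, the callback can return plain values such as strings, which do survive serialisation. A minimal sketch follows; imageSources is a hypothetical helper name, and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper (not part of Puppeteer): collect the src of every <img>
// on the page as an array of strings.
async function imageSources(page) {
  // $$eval passes the matched elements to the callback as an array inside the
  // page; the strings returned by map are sent back to Node.js.
  return page.$$eval('img', imgs => imgs.map(img => img.src));
}
```

The elements themselves stay in the page; only the array of URLs crosses over.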
This is a long-dead thread, but if anyone runs into this issue and the above answer does not apply to you, try adding a simple await page.waitForTimeout(2000). My test was completing, but I was getting this error when attempting to await browser.close(). Adding the wait after searching for my final selector seems to have resolved the issue.
