Puppeteer can't go into iframe

I have been struggling for hours to get into the iframe, but I just can't type in this box for some reason. The HTML does not show an input on the page or in the iframe. The code below is the closest I have come, but it still does not let me type in the box. This is the part of the HTML I am trying to get into:
[Screenshot: inspect from Chrome]
And here is the code I am using:
const iframeHandle = await page.$$('iframe');
const contentFrame = await iframeHandle[2].contentFrame();
const tester = await contentFrame.$$('#rte');
and when I run
console.log(tester.length);
I get 1, so I am getting into the iframe, but I don't know how to type within it. So far I can see that there is only an empty tag in it.
Maybe I am just missing something small. Any help will be most appreciated.

You can utilize the frame call.
So from your code
const iframeHandle = await page.$$('iframe');
await this.browser.frame(iframeHandle);
Or something of the sort; going by your code, that should get you into that iframe.

Try focusing on the input and then typing:
const cardElement = await paymentFrame.$('#cardNumber');
// Focus the input so that keyboard events go to it.
await cardElement.focus();
Or this should work:
const frames = page.frames(); // page.frames() is synchronous
const iframe = frames.find(f => f.name() === 'any_iframe');
const textInput = await iframe.$('#textInput');
await textInput.click(); // clicking focuses the element
await textInput.type('description text');
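Putting the question's code and this answer together, a minimal sketch could look like the following. The helper name typeIntoFrame is hypothetical, and it assumes that the element matched by the selector (for example #rte) accepts keyboard input once it has been clicked, which the thread does not confirm.

```javascript
// Hypothetical helper: types `text` into the element matching `selector`
// inside the iframe at position `frameIndex` on the page.
async function typeIntoFrame(page, frameIndex, selector, text) {
  const iframeHandles = await page.$$('iframe');
  const iframeHandle = iframeHandles[frameIndex];
  if (!iframeHandle) {
    throw new Error(`No iframe at index ${frameIndex}`);
  }
  // contentFrame() resolves to null when the handle is not a frame element.
  const frame = await iframeHandle.contentFrame();
  if (!frame) {
    throw new Error('Element has no content frame');
  }
  // Wait for the target, click it to give it focus, then send key events.
  const target = await frame.waitForSelector(selector);
  await target.click();
  await target.type(text);
  return target;
}
```

With the question's page this would be called as await typeIntoFrame(page, 2, '#rte', 'Hello'). Picking the iframe by index is fragile, so a more specific selector for the iframe is preferable when one exists.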

Related

Puppeteer renders ALL HTML Content but only gets some tags

I am web scraping using NodeJS/typescript.
I have a problem using puppeteer where I get the fully rendered page (which I verify by running await page.content()). I printed the content and found that it had 26 'a' tags (links). However, when I search with puppeteer, I only get 20.
What is stranger is that sometimes I get all the 'a' tags on the page and sometimes I get fewer than are on the page, all without changing the code! It seems to be random.
I've seen some suggestions online saying to use a waitForElement method or something along those lines. Basically, before searching for tags, it ensures an element is on the page. I don't think this would help in my case because clearly puppeteer is getting everything it needs as shown by the await page.content() method.
Does anyone know why this may be happening? Thanks! A simplified snippet of my code is below.
const getLinksFromPage = async (
  browser: puppeteer.Browser,
  url: string
) => {
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });
  const html = await page.content(); // this code gets the content and prints it
  console.log(html); // so I can verify number of 'a' tags
  const rawLinks = await page.$$eval('a', (elements: Element[]) => {
    return elements
      .map((element: Element) => element.getAttribute('href')!)
  });
  await page.close();
  return rawLinks
};
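The thread does not establish the cause of the varying count. If it comes from scripts adding or removing links after navigation has finished (an assumption, not a diagnosis), one option is to poll the count until it stops changing before extracting the links. The helper name waitForStableCount and its option names are hypothetical.

```javascript
// Hypothetical helper: polls `getCount` until it returns the same value
// `stableChecks` times in a row, or throws once `timeoutMs` has elapsed.
async function waitForStableCount(
  getCount,
  { intervalMs = 250, stableChecks = 3, timeoutMs = 10000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  let last = await getCount();
  let streak = 1;
  while (streak < stableChecks) {
    if (Date.now() > deadline) {
      throw new Error(`Count did not stabilize within ${timeoutMs} ms`);
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
    const current = await getCount();
    // Extend the streak on an unchanged count, otherwise start over.
    streak = current === last ? streak + 1 : 1;
    last = current;
  }
  return last;
}
```

With Puppeteer it could be called before the $$eval that collects the links, for example await waitForStableCount(() => page.$$eval('a', els => els.length)).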

Error on goto function in Playwright when navigating to a PDF file

I have this link https://nfse.blumenau.sc.gov.br/contrib/app/nfse/rel/rp_nfse_v23.aspx?s=61154301&e=00165960000101&f=2BED3D1E8 (if you try to access it, it will ask you to solve a captcha, but since I already have the session, Playwright does not need to worry about that).
OUT page.goto: net::ERR_ABORTED at https://nfse.blumenau.sc.gov.br/contrib/app/nfse/rel/rp_nfse_v23.aspx?s=61154301&e=00165960000101&f=2BED3D1E8
Does anybody know why Playwright cannot access it? I need to download the PDF buffer from this link.

You can use the fetch API. Something like this:
const fetchResponse = await browserContext.request.get('https://nfse.blumenau.sc.gov.br/contrib/app/nfse/rel/rp_nfse_v23.aspx?s=61154301&e=00165960000101&f=2BED3D1E8');
const pdfBuffer = await fetchResponse.body();

I found a solution.
First, I had not been concatenating the cookie name and value when calling the function:
const cook = 'ASP.NET_SessionId=' + cookie;
await setCookie(cook, urlFinal);
Then I used the got module to send the cookie and get the buffer of the PDF:
response = await got(urlFinal, {cookieJar}).buffer();
Plus: sometimes it returned a blank PDF (I think because of a timeout while loading it). So I added a loop that checks the size of the buffer and retries up to 20 times until its length exceeds 'X':
for (let j = 0; j < 20; j++) {
  console.log('Entered the loop ==> ' + j);
  response = await got(urlFinal, {cookieJar}).buffer();
  // Stop retrying as soon as the response is large enough.
  if (response.toString().length >= 10000) {
    break;
  }
}
console.log('buffer size ==> ' + response.toString().length);
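The retry loop above can also be written as a small reusable helper. This is only a sketch: the name fetchUntilLargeEnough, its option names, and the pause between attempts are assumptions that are not part of the original answer. It compares the byte length of the buffer instead of converting it to a string first.

```javascript
// Hypothetical helper: calls `fetchBuffer` until the returned Buffer has
// at least `minBytes` bytes, pausing `delayMs` between attempts.
async function fetchUntilLargeEnough(
  fetchBuffer,
  { minBytes = 10000, maxAttempts = 20, delayMs = 500 } = {}
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const buffer = await fetchBuffer();
    if (buffer.length >= minBytes) {
      return buffer;
    }
    // Wait before the next attempt, but not after the last one.
    if (attempt < maxAttempts) {
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(
    `Response stayed below ${minBytes} bytes after ${maxAttempts} attempts`
  );
}
```

With the code from this answer it would be called as await fetchUntilLargeEnough(() => got(urlFinal, {cookieJar}).buffer()). Throwing after the last attempt makes a persistently blank PDF visible to the caller instead of silently returning it.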

I tried that already, but it didn't work either:
const { promisify } = require('util');
const { CookieJar } = require('tough-cookie');
const got = require('got'); // assumes got v11 or earlier, which can be loaded with require()
const cookie = (await page.context().cookies())
  .filter(cookie => cookie.name === 'ASP.NET_SessionId')
  .map(cookie => cookie.value)[0];
console.log(cookie);
const cookieJar = new CookieJar();
const setCookie = promisify(cookieJar.setCookie.bind(cookieJar));
await setCookie('ASP.NET_SessionId=' + cookie, urlFinal);
const response = await got(urlFinal, {cookieJar}).buffer();
It's really a challenge, because if I don't go through page.goto I lose the session. The code below would solve the problem:
await page.goto(urlFinal);

Get complete web page source html with puppeteer - but some part always missing

I am trying to scrape a specific string from the webpage below:
https://www.booking.com/hotel/nl/scandic-sanadome-nijmegen.en-gb.html?checkin=2020-09-19;checkout=2020-09-20;i_am_from=nl;
The info I want to get from this page's source is the serial number in the strings below (something I can find with right-click -> "View Page source"):
name="nr_rooms_4377601_232287150_0_1_0"/ name="nr_rooms_4377601_232287150_1_1_0"
I am using Puppeteer, and below is my code:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  //await page.goto('https://example.com');
  const response = await page.goto("My-url-above");
  let bodyHTML = await page.evaluate(() => document.body.innerHTML);
  let outbodyHTML = await page.evaluate(() => document.body.outerHTML);
  console.log(await response.text());
  console.log(await page.content());
  await browser.close();
})()
But I cannot find the strings I am looking for in response.text() or page.content().
Am I using the wrong methods on page?
How can I dump the actual page source, exactly the same as what I get when I right-click and choose "View Page source"?

If you investigate where these strings appear, you can see that they are in <select> elements with a specific class (.hprt-nos-select):
<select
  class="hprt-nos-select"
  name="nr_rooms_4377601_232287150_0_1_0"
  data-component="hotel/new-rooms-table/select-rooms"
  data-room-id="4377601"
  data-block-id="4377601_232287150_0_1_0"
  data-is-fflex-selected="0"
  id="hprt_nos_select_4377601_232287150_0_1_0"
  aria-describedby="room_type_id_4377601 rate_price_id_4377601_232287150_0_1_0 rate_policies_id_4377601_232287150_0_1_0"
>
You can wait until this element is loaded into the DOM; then it will be visible in the page source as well:
await page.waitForSelector('.hprt-nos-select', { timeout: 0 });
BUT your issue actually lies in the fact that the URL you are visiting has some extra URL parameters, ?checkin=2020-09-19;checkout=2020-09-20;i_am_from=nl;, which are not taken into account by Puppeteer (if you take a full-page screenshot, you will see that the page still shows the default hotel search form without the specific hotel offers you are expecting).
You should interact with the search form with Puppeteer (page.click() etc.) to set the dates and the origin country yourself to get the expected page content.
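Once the .hprt-nos-select elements are present, their name attributes can be collected and split into parts. The sketch below is hedged: both helper names are hypothetical, and the split into room id and block id is inferred only from the single <select> element shown above, so it may not hold for every listing.

```javascript
// Hypothetical helper: splits a name such as
// "nr_rooms_4377601_232287150_0_1_0" into a room id and a block id,
// mirroring the data-room-id and data-block-id attributes shown above.
function parseRoomSelectName(name) {
  const match = /^nr_rooms_(\d+)_(.+)$/.exec(name);
  if (!match) {
    return null;
  }
  const [, roomId, rest] = match;
  return { roomId, blockId: `${roomId}_${rest}` };
}

// Hypothetical helper: waits for the room selects, then reads their names.
async function getRoomSelectNames(page) {
  await page.waitForSelector('.hprt-nos-select');
  return page.$$eval('.hprt-nos-select', selects =>
    selects.map(select => select.getAttribute('name'))
  );
}
```

For example, (await getRoomSelectNames(page)).map(parseRoomSelectName) would give one parsed entry per room row, with null for any name that does not follow the pattern.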

How to use a selector to find a frame (iframe) using Playwright

I have a trivial question that I can't find an answer to, using the Microsoft Playwright framework. According to the documentation, you can fetch an iframe with the following code:
const frame = page.frame('frame-login');
But how do I use a selector to find and interact with an iframe? I need to use a CSS selector to find my iframe since it does not have an id.
Any help appreciated

You can use elementHandle.contentFrame()
await page.waitForSelector('.class-name')
const elementHandle = await page.$('.class-name')
const frame = await elementHandle.contentFrame()
From that moment you can interact with the content of the <iframe> like: await frame.<method_name>.

You can get the ElementHandle by calling $ and then call its contentFrame function:
const handle = await page.$('.frame');
const contentFrame = await handle.contentFrame();
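Both answers can be combined into one small helper that also guards against the selector matching something other than an iframe. This is a sketch; the helper name getFrameBySelector is hypothetical.

```javascript
// Hypothetical helper: resolves the Frame that belongs to the <iframe>
// element matched by a CSS selector.
async function getFrameBySelector(page, selector) {
  // waitForSelector resolves with the element handle once it is attached.
  const handle = await page.waitForSelector(selector);
  // contentFrame() resolves to null for elements that are not frames.
  const frame = await handle.contentFrame();
  if (!frame) {
    throw new Error(`Element matching "${selector}" is not an iframe`);
  }
  return frame;
}
```

The returned frame can then be used like a page, for example await frame.fill('#username', 'me'), where '#username' is a made-up selector. Newer Playwright versions also provide page.frameLocator(selector), which avoids holding an element handle at all.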

Finding words from a list on a page

I'm currently working on a script that detects bad words and sends out an alert when the word occurs.
I'm using Puppeteer; it has access to the Chrome browser and is able to run commands in the terminal. I have tried a lot of things in the console, for example "includes", but this gives an undefined error. https://love2dev.com/blog/javascript-includes/
I also tried adding the code from an answer here on Stack Overflow: find words in html page with javascript
But this does not work within Puppeteer; it only works when you paste it in the terminal. It can also only search for one word. My idea was to make an array that contains all the words that must be filtered.
So far, I have written the following. As far as I understand, to run code I need to put it into the {} of evaluate().
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const id = new Date();
  console.log(id)
  await page.goto('https://www.mediamarkt.nl/nl/search.html?query=iphone&searchProfile=onlineshop&channel=mmnlnl', {waitUntil: 'networkidle2'});
  const html = await page.evaluate(() => {
    return page.includes("mediamarkt");
  });
  console.log(html)
  console.log("it worked, i guess");
})();
This generates errors like:
19-07-23T23:38:23.763Z
(node:24944) UnhandledPromiseRejectionWarning: Error: Evaluation failed: ReferenceError: page is not defined
My question is: how do I create a bad word filter using these tools, or where can I learn more about the skills I need to build this?
Thank you

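The error occurs because the callback passed to page.evaluate() runs inside the browser page, where the Node.js variable page does not exist. If you just want to find all bad words (for example, you have an array of bad words) and check whether the page content contains any of them, you can do something like this: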
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const id = new Date();
  console.log(id)
  await page.goto('https://www.mediamarkt.nl/nl/search.html?query=iphone&searchProfile=onlineshop&channel=mmnlnl', {waitUntil: 'networkidle2'});
  const blackList = ['home', 'mediamarkt', 'sorteren', 'maakt'];
  const pageContent = await page.$eval('body', el => el.textContent);
  const result = pageContent.split(/\s+/).filter(text => blackList.includes(text.toLowerCase()));
  await browser.close();
  console.log("Here is the array of the found words", result);
})();
It will return an array of all the bad words found on the page. I hope I understood your question correctly.

If you want to find bad phrases (including spaces), you can try:
const found = await page.evaluate(text => window.find(text), elementsToSearchFor); // pass the phrase in as an argument
or
const found = (await page.content()).match(REGEX);
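For a whole list of phrases, the matching can be done in Node.js on the extracted page text. The following is a sketch only: the helper names are hypothetical, and it assumes that a phrase should match case-insensitively and only at word boundaries (so that 'markt' does not match inside 'mediamarkt').

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: returns the phrases from `phrases` that occur in
// `text` as whole words or word sequences, ignoring case and line breaks.
function findBadPhrases(text, phrases) {
  // Collapse runs of whitespace so phrases can span line breaks.
  const normalized = text.replace(/\s+/g, ' ');
  return phrases.filter(phrase => {
    const cleaned = escapeRegExp(phrase.trim().replace(/\s+/g, ' '));
    // Require a non-letter, non-digit character (or the text edge) around it.
    const pattern = new RegExp(
      `(^|[^\\p{L}\\p{N}])${cleaned}($|[^\\p{L}\\p{N}])`,
      'iu'
    );
    return pattern.test(normalized);
  });
}
```

The page text could come from await page.$eval('body', el => el.innerText), after which findBadPhrases(text, ['mediamarkt', 'home page']) returns the phrases that were found.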
