I have a trivial question about the Microsoft Playwright framework that I can't find an answer to. According to the documentation, you can fetch an iframe with the following code:
const frame = page.frame('frame-login');
But how do I use a selector to find and interact with an iframe? I need to use a CSS selector to find my iframe since it does not have an id.
Any help is appreciated.
You can use elementHandle.contentFrame():
await page.waitForSelector('.class-name')
const elementHandle = await page.$('.class-name')
const frame = await elementHandle.contentFrame()
From then on, you can interact with the content of the <iframe> through the frame, e.g. await frame.<method_name>(...).
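Wrapped as a reusable helper, the same idea might look like the sketch below. It assumes a Playwright Page object; the function name frameFromSelector and the 'iframe.class-name' selector are hypothetical. The explicit null check is there because contentFrame() resolves to null when the matched element is not a frame or iframe.

```javascript
// Sketch: resolve an <iframe> located by a CSS selector to its Playwright Frame.
// `page` is assumed to be a Playwright Page; the helper name is hypothetical.
async function frameFromSelector(page, selector) {
  // waitForSelector resolves to the ElementHandle once the element is visible
  const handle = await page.waitForSelector(selector);
  // contentFrame() resolves to null when the element is not a frame/iframe
  const frame = await handle.contentFrame();
  if (frame === null) {
    throw new Error(`Element matching "${selector}" is not an iframe`);
  }
  return frame;
}

// Usage (inside an async function, hypothetical selectors):
// const frame = await frameFromSelector(page, 'iframe.class-name');
// await frame.click('#submit');
```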
You can get the ElementHandle by calling $ and then call its contentFrame method:
const handle = await page.$('.frame');
const contentFrame = await handle.contentFrame();
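Recent Playwright versions also offer page.frameLocator(selector), which avoids holding on to an ElementHandle at all. A minimal sketch, assuming a Playwright version that includes frameLocator; the helper name fillInsideFrame and the selectors are hypothetical:

```javascript
// Sketch assuming a Playwright version that provides page.frameLocator().
// A locator is re-resolved on every action, so there is no handle to go stale
// and Playwright's auto-waiting applies to elements inside the frame.
async function fillInsideFrame(page, frameSelector, fieldSelector, text) {
  const field = page.frameLocator(frameSelector).locator(fieldSelector);
  await field.fill(text); // waits until the field is editable, then fills it
}

// Usage (hypothetical selectors):
// await fillInsideFrame(page, 'iframe.login', '#username', 'alice');
```

Because the iframe is looked up again on each action, this tends to hold up better than a stored handle if the iframe reloads.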
When I use Cheerio to scrape https://www.bankofamerica.com/home-equity/assumptions-home-equity/?loanType=homeEquity&state=CA, I get the template variable name instead of its value.
Code:
const BankofAmericaScraper = async (browser) => {
const date = new Date().toLocaleDateString();
const page = await browser.newPage();
await page.goto(URL, {
waitUntil: ["load"],
timeout: 0,
});
const MortgagesPage = await page.content();
const $ = cheerio.load(MortgagesPage);
const step1 = Object.values($(".col-num-2")[2])[5];
}
I get {{ percentage rates.product.currentRate }} and not 6.650.
How do I access the variable? I'm using a headless browser to evaluate it.
Short answer: with Cheerio, you can't.
Right off the bat, the Cheerio documentation states:
Cheerio parses markup and provides an API for traversing/manipulating
the resulting data structure. It does not interpret the result as a
web browser does. Specifically, it does not produce a visual
rendering, apply CSS, load external resources, or execute JavaScript
The trade-off is speed: Cheerio returns data faster than libraries that emulate a browser page.
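Cheerio itself can't run the page's scripts, but the question already drives a headless browser, and page.content() only returns the DOM as it is at that moment. If the HTML still contains {{ ... }} placeholders, the template most likely hadn't been rendered yet when content() was called. A sketch, assuming Puppeteer (the waitUntil: ["load"] array in the question suggests it) and assuming the placeholder is replaced client-side, is to wait in the browser until the text is rendered and read it there. The helper name readRenderedText and its index parameter are hypothetical; index 2 mirrors the question's third .col-num-2 element.

```javascript
// Sketch assuming Puppeteer's page API: waitForFunction(fn, options, ...args).
// Waits until the index-th element matching `selector` exists and its text no
// longer contains an unrendered "{{ ... }}" placeholder, then returns the text.
async function readRenderedText(page, selector, index = 0, timeout = 30000) {
  // Runs in the browser; it only uses its own parameters and `document`.
  const isRendered = (sel, i) => {
    const el = document.querySelectorAll(sel)[i];
    return Boolean(el) && !el.textContent.includes('{{');
  };
  await page.waitForFunction(isRendered, { timeout }, selector, index);
  // $$eval runs in the page and sends back a plain, serializable string
  return page.$$eval(selector, (els, i) => els[i].textContent.trim(), index);
}

// Usage (hypothetical, based on the question's selector):
// const rate = await readRenderedText(page, '.col-num-2', 2);
```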
I have a Node.js TypeScript project, and I am trying to get all the 'p' tags from a dynamically rendered website (not static HTML; it makes multiple requests to a backend to fetch data and render the page). I have ["es6", "dom"] in my tsconfig lib, and this is all the code in the project so far:
import puppeteer from 'puppeteer';
const getLinks = async () => {
const browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto('https://webscraper.io/test-sites', { waitUntil: 'networkidle0' });
const links = await page.evaluate(() => document.querySelectorAll('p'));
console.log(links);
await browser.close();
}
However, I keep getting undefined when I print links. I assume this is because the program can't find any 'p' tags. Why is this?
Note: the URL provided is just an example. I have tried it on multiple different sites and I still get undefined.
Any help is appreciated! Thanks!
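Don't use page.evaluate to get elements. page.evaluate can only return serializable values, and when the function returns a non-serializable value (such as the NodeList from querySelectorAll), it resolves to undefined. Use waitForSelector/waitForXPath/$x/$$ instead (see the Puppeteer docs for the differences between them: https://devdocs.io/puppeteer/index#pageselector-1):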
// page is the Page from the question; ElementHandle is a type exported by 'puppeteer'
const links: ElementHandle[] = await page.$$("p");
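If what you need is the paragraph text rather than element handles, a sketch assuming Puppeteer is to map the elements to strings inside the page with $$eval, so that only serializable data crosses back into Node.js. The helper name getParagraphTexts is hypothetical:

```javascript
// Sketch assuming Puppeteer: return the trimmed text of every <p> on the page.
// The callback runs in the browser; only the resulting array of strings is
// serialized back to Node.js, which sidesteps the serialization problem.
async function getParagraphTexts(page) {
  await page.waitForSelector('p'); // make sure at least one <p> has rendered
  return page.$$eval('p', (paragraphs) =>
    paragraphs.map((p) => p.textContent.trim())
  );
}

// Usage (inside getLinks, after page.goto):
// const texts = await getParagraphTexts(page);
// console.log(texts);
```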
I am using Puppeteer (a headless browser) to scrape a website. When I access that URL, how can I load jQuery so that I can use it inside my page.evaluate() function?
All I have now is a .js file, and I'm running the code below. It goes to my URL as intended, but then I get an error in page.evaluate(), because jQuery doesn't seem to be loaded by the addScriptTag call I expected to load it: await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.2.1.min.js'})
Any ideas how I can load jQuery correctly here, so that I can use jQuery inside my page.evaluate() function?
(async() => {
let url = "[website url I'm scraping]"
let browser = await puppeteer.launch({headless:false});
let page = await browser.newPage();
await page.goto(url, {waitUntil: 'networkidle2'});
// code below doesn't seem to load jQuery, since I get an error in page.evaluate()
await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.2.1.min.js'})
await page.evaluate( () => {
// want to use jQuery here to do access DOM
var classes = $( "td:contains('Lec')")
classes = classes.not('.Comments')
classes = classes.not('.Pct100')
classes = Array.from(classes)
});
})();
You are on the right path.
Also I don't see any jQuery code being used in your evaluate function.
There is no document.getElement function.
The best way would be to add a local copy of jQuery (with addScriptTag's path option) to avoid any cross-origin errors.
More details can be found in the already answered question here.
UPDATE: I tried a small snippet to test jQuery. The Puppeteer version is 10.4.0.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({headless:false});
const page = await browser.newPage();
await page.goto('https://google.com',{waitUntil: 'networkidle2'});
// inject a local copy of jQuery (jquery.js in the working directory)
await page.addScriptTag({path: "jquery.js"});
await page.evaluate( () => {
// this callback runs in the page, where jQuery's $ is now defined
let wrapper = $(".L3eUgb");
wrapper.css("background-color","red");
});
await page.screenshot({path:"hello.png"});
await browser.close();
})();
The resulting screenshot confirmed that the jQuery code is definitely working.
Also check whether the host website already has its own jQuery instance. In that case, you would need to use jQuery's noConflict:
$.noConflict();
Fixed it!
I realized I had left out the code where I did some extra navigation clicks after going to my initial URL. The problem was that I added the script tag on the initial page instead of after navigating to the final destination page; a script injected with addScriptTag is lost when the browser navigates to a new document.
I also needed to use
await page.waitForNavigation({waitUntil: 'networkidle2'})
before adding the script tag so that the page was fully loaded before adding the script.
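A common way to avoid racing the navigation is to start waitForNavigation together with the click that triggers it, and only then inject the script into the final document. This is a sketch assuming Puppeteer; the helper name, the link selector and the local jquery.js path are hypothetical:

```javascript
// Sketch assuming Puppeteer. waitForNavigation is started before the click
// settles, so a fast navigation is not missed; jQuery is injected afterwards
// into the final document, since earlier script tags don't survive navigation.
async function navigateAndInjectJQuery(page, linkSelector, jqueryPath) {
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click(linkSelector),
  ]);
  await page.addScriptTag({ path: jqueryPath }); // local copy, e.g. './jquery.js'
}

// Usage (hypothetical selector):
// await navigateAndInjectJQuery(page, 'a#next-step', './jquery.js');
// const count = await page.evaluate(() => $("td:contains('Lec')").length);
```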
I'm trying to scrape the next-page link from a web page using XPath in Puppeteer. When I execute the script, it returns a gibberish result even though the XPath is correct. How can I fix it?
const puppeteer = require("puppeteer");
const base = "https://www.timesbusinessdirectory.com";
let url = "https://www.timesbusinessdirectory.com/company-listings";
(async () => {
const browser = await puppeteer.launch({headless:false});
const [page] = await browser.pages();
await page.goto(url,{waitUntil: 'networkidle2'});
page.waitForSelector(".company-listing");
const nextPageLink = await page.$x("//a[@aria-label='Next'][./span[@aria-hidden='true'][contains(.,'Next')]]", item => item.getAttribute("href"));
url = base.concat(nextPageLink);
console.log("========================>",url)
await browser.close();
})();
Current output:
https://www.timesbusinessdirectory.comJSHandle#node
Expected output:
https://www.timesbusinessdirectory.com/company-listings?page=2
First of all, page.waitForSelector(".company-listing") is missing an await. Not awaiting it defeats the point of the call entirely, but it may incidentally work because the very strict waitUntil: "networkidle2" already covers the selector you're interested in, or because the XPath target is statically present (I didn't bother to check).
Generally speaking, if you're using waitForSelector right after a page.goto, waitUntil: "networkidle2" only slows you down. Only keep it if there's some content you need on the page other than the waitForSelector target, otherwise you're waiting for irrelevant requests that are pulling down images, scripts and data potentially unrelated to your primary target. If it's a slow-loading page, then increasing the timeout on your waitFor... is the typical next step.
Another note: it's sort of odd to call waitForSelector on some CSS target and then select an XPath immediately afterwards. It seems more precise to call waitForXPath and then $x with the exact same XPath expression.
Next, let's look at the docs for page.$x:
page.$x(expression)
expression <string> Expression to evaluate.
returns: <Promise<Array<ElementHandle>>>
The method evaluates the XPath expression relative to the page document as its context node. If there are no such elements, the method resolves to an empty array.
Shortcut for page.mainFrame().$x(expression)
So, unlike evaluate, $eval and $$eval, $x takes one parameter and resolves to an array of ElementHandles. Your second-argument callback doesn't get you the href as you expected; that pattern only works with the eval-family functions.
In addition to consulting the docs, you can console.log the returned value to confirm the behavior. The JSHandle#node you're seeing in the URL isn't gibberish; it's the stringified form of the JSHandle object, and it provides information you can cross-check against the docs.
The solution is to grab the first elementHandle from the array returned by the function and then evaluate on that handle using your original callback:
const puppeteer = require("puppeteer");
const url = "https://www.timesbusinessdirectory.com/company-listings";
let browser;
(async () => {
browser = await puppeteer.launch({headless: true});
const [page] = await browser.pages();
await page.goto(url);
const xp = `//a[@aria-label='Next']
[./span[@aria-hidden='true'][contains(.,'Next')]]`;
await page.waitForXPath(xp);
const [nextPageLink] = await page.$x(xp);
const href = await nextPageLink.evaluate(el => el.getAttribute("href"));
console.log(href); // => /company-listings?page=2
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
As an aside, there's also el => el.href, which reads the href property rather than the attribute. The property is resolved against the page's URL, so you won't need to concatenate. In general, the two behave differently beyond returning an absolute vs. relative path, so it's good to know about both options.
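If you keep getAttribute("href"), resolving the relative path with the standard URL constructor is safer than string concatenation, since it handles absolute, root-relative and query-only hrefs the way a browser would. A minimal sketch; the helper name toAbsoluteUrl is hypothetical:

```javascript
// Sketch: resolve a relative href against the page URL, the same resolution
// the browser performs when you read the el.href property.
function toAbsoluteUrl(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

console.log(
  toAbsoluteUrl('/company-listings?page=2', 'https://www.timesbusinessdirectory.com/company-listings')
);
// prints https://www.timesbusinessdirectory.com/company-listings?page=2
```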
I have been struggling for hours to get into an iframe, but for some reason I just can't type in its text box. The HTML doesn't show an input element, either on the page or in the iframe. Below is the code I tried; it came closest, but it still doesn't get me into the box to type. This is the part of the HTML I'm trying to get into:
(Screenshot: the element inspected in Chrome DevTools.)
And here is the code I am using:
const iframeHandle = await page.$$('iframe');
const contentFrame = await iframeHandle[2].contentFrame();
const tester = await contentFrame.$$('#rte');
When I run
console.log(tester.length);
I get 1, so I am getting into the iframe, but I don't know how to type within it. As far as I can see, #rte is only an empty tag.
Maybe I am just missing something small. Any help would be much appreciated.
You can work with the iframe's Frame object directly.
So, from your code:
const iframeHandle = await page.$$('iframe');
// Puppeteer has no browser.frame(); ask the iframe's handle for its Frame instead
const frame = await iframeHandle[2].contentFrame();
Something along those lines should get you into that iframe.
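Try focusing on the input and then typing:
// paymentFrame is the Frame returned by contentFrame()
const cardElement = await paymentFrame.$('#cardNumber');
// Input is focused.
await cardElement.focus();
await cardElement.type('text to type');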
Or this should work:
const frames = page.frames();
const iframe = frames.find(f => f.name() === 'any_iframe');
const textInput = await iframe.$('#textInput');
await textInput.click(); // this focuses the element
await textInput.type('description text');
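Putting the pieces together for the original question, a sketch assuming Puppeteer might look like this. It assumes #rte is a contenteditable rich-text editor (which would explain why it appears as an empty tag in DevTools) rather than an <input>; the helper name and the iframe selector are hypothetical:

```javascript
// Sketch assuming Puppeteer. Assumption: #rte is a contenteditable editor, not
// an <input>; ElementHandle.type() focuses the element and sends key events
// for each character, which also works for contenteditable elements.
async function typeIntoFrameElement(page, iframeSelector, targetSelector, text) {
  const iframeHandle = await page.waitForSelector(iframeSelector);
  const frame = await iframeHandle.contentFrame();
  if (!frame) throw new Error(`${iframeSelector} is not an iframe`);
  const target = await frame.waitForSelector(targetSelector);
  await target.click(); // place the caret inside the editor
  await target.type(text);
}

// Usage (hypothetical iframe selector):
// await typeIntoFrameElement(page, 'iframe.editor', '#rte', 'description text');
```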