I'm trying to get a text from a modal on Chrome. Using the console, I can get the inner text as follows:
document.querySelector('.my-form > a').innerText
// returns http://a-url.com
Now, on my test, I can evaluate the element using
const myText = Selector('.my-form > a').innerText;
await t
.expect(myText).contains('url');
and I can even click on that URL
await t.click(myText);
but I cannot put that inner text to a variable, for instance. I tried using a ClientFunction from this post
const getUrl = ClientFunction(() => document.querySelector('.my-form > a').innerText);
test('My Test', async t => {
const text = await getUrl();
console.log(text);
});
// this results in
// TypeError: Cannot read property 'innerText' of null
and tried using a plain Selector as this post suggests
const text = Selector('.my-form > a').innerText;
const inner = await text.textContent;
console.log(inner);
// prints: undefined
How to extract a text from an element? I understand that t.selectText is limited in this scenario, right?
From the documentation you want:
const text = await Selector('.my-form > a').innerText;
That will do the trick:
const paragraph = Selector("p").withText("possible entries");
const extractEntries = await paragraph.textContent;
Related
Is there a way to traverse through html elements in playwright like cy.get("abc").find("div") in cypress?
In other words, any find() equivalent method in playwright?
page.locator("abc").find() is not a valid method in playwright though :(
If you assign the parent element handle to a variable, any of the findBy* functions (or locator) will search only in the parent element. An example below where the parent is a div, the child is a button, and we use .locator() to find the elements.
test('foo', async ({ page }) => {
await page.goto('/foo');
const parent = await page.locator('div');
const child = await parent.locator('button');
});
You can just combine the selectors, this will resolve to div below abc
page.locator("abc div")
Let's consider the website https://www.example.com with the following HTML
<body style="">
<div>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples in documents. You may use this
domain in literature without prior coordination or asking for permission.</p>
<p>
More information...
</p>
</div>
</body>
As mentioned by #agoff, you can use nested locator page.locator('p').locator('a') or you can specify the nested selector directly in the locator page.locator('p >> a')
// #ts-check
const playwright = require('playwright');
(async () => {
const browser = await playwright.webkit.launch();
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('https://www.example.com/');
// Here res1 and res2 produces same output
const res1 = await page.locator('p').locator('a'); // nested locator
const res2 = await page.locator('p >> a'); // directly specifying the selector to check the nested elements
const out1 = await res1.innerText();
const out2 = await res2.innerText();
console.log(out1 == out2); // output: true
console.log(out1); // output: More information...
console.log(out2); // output: More information...
// Traversal
const result = await page.locator('p');
const elementList = await result.elementHandles(); // elementList will contain list of <p> tags
for (const element of elementList)
{
const out = await element.innerText()
console.log(out);
}
// The above will output both the <p> tags' innerText i.e
// This domain is for use in illustrative examples in...
// More information...
await browser.close();
})();
Since you mentioned that you need to traverse through the HTML elements, elementHandles can be used to traverse through the elements specified by the locator as mentioned in the above code.
I'm trying to get an element's attribute in a test. My test looks like this:
test(`Should be at least 5 characters long`, async({ page }) => {
await page.goto('http://localhost:8080');
const field = await page.locator('id=emailAddress');
const _attribute = await field.getAttribute('minlength');
const minlength = Number(_attribute);
await expect(minlength).toBeGreaterThanOrEqual(5);
});
When I run this, I can see that the minlength value is 0. This is because _attribute is null. However, I don't understand why. field is a Locator. But, I can't seem to get the attribute or it's value. What am I doing wrong?
This is the same but uses a native PlayWright function for getting attribute values
const inputElement = page.locator('#emailAddress');
const minLength = await inputElement.getAttribute('minLength');
The following worked for me
const inputElement = page.locator('#emailAddress');
minLength = await inputElement.evaluate(e => (e as HTMLInputElement).minLength);
I'm trying to scrape this element: on this website.
My JS code:
const puppeteer = require("puppeteer");
const url = 'https://magicseaweed.com/Bore-Surf-Report/1886/'
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
const title = await page.$x('/html/body/div[1]/div[2]/div[2]/div/div[2]/div[2]/div[2]/div/div/div[1]/div/header/h3/div[1]/span[1]')
let text = await page.evaluate(res => res.textContext, title[0])
console.log(text) // UNDEFINED
text is undefined. What is the problem here? Thanks.
I think you need to fix 1 or 2 issues on your code.
textContent vs textContext
xpath
For the content you want the xpath should be:
const title = await page.$x('/html/body/div[1]/div[2]/div[2]/div/div[2]/div[2]/div[2]/div/div/div[1]/div/div[1]/div[1]/div/div[2]/ul[1]/li[1]/text()')
And to get the content of this:
const text = await page.evaluate(el => {
return el.textContent.trim()
}, title[0])
Notice you need send title[0] as an argument to the page function.
OR
if you don't need to use xpath, it seems you could get directly using class name to find the element:
const rating = await page.evaluate(() => {
return $('.rating.rating-large.clearfix > li.rating-text')[0].textContent.trim()
})
I'm running some Node.js code to scrape a website and return some text from this part of the html:
And here's the code I'm using to get it
const fs = require('mz/fs');
const xpath = require('xpath');
const parse5 = require('parse5');
const xmlser = require('xmlserializer');
const dom = require('xmldom').DOMParser;
const axios = require('axios');
(async () => {
const response = await axios.get('https://www.aritzia.com/en/product/sculpt-knit-tank-%28arjun-knit-top%29/66139.html?dwvar_66139_color=17388');
const html = response.data;
const document = parse5.parse(html.toString());
const xhtml = xmlser.serializeToString(document);
const doc = new dom().parseFromString(xhtml);
const select = xpath.useNamespaces({"x": "http://www.w3.org/1999/xhtml"});
const nodes = select("//x:div[contains(#class, 'pdp-product-brand')]/*/text()", doc);
console.log(nodes.length ? nodes[0].nodeValue : nodes.length)
})();
The code above works as expected -- it prints Babaton.
But when I swap out the xpath above for one that includes a instead of * (i.e. //x:div[contains(#class, 'pdp-product-brand')]/a/text()) it instead tells me that nodes.length === 0.
I would expect it to give the same result because the div that it's pointing to does in fact have a child anchor tag (see screenshot above). I'm just confused why it doesn't work with a and was wondering if anybody else knew the answer. Thanks!
I am trying to scrape this page.
https://www.psacard.com/Pop/GetItemTable?headingID=172510&categoryID=20019&isPSADNA=false&pf=0&_=1583525404214
I want to be able to find the grade count for PSA 9 and 10. If we look at the HTML of the page, you will notice that PSA does a very bad job (IMO) at displaying the data. Every TR is a player. And the first TD is a card number. Let's just say I want to get Card Number 1 which in this case is Kevin Garnett.
There are a total of four cards, so those are the only four cards I want to display.
Here is the code I have.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://www.psacard.com/Pop/GetItemTable?headingID=172510&categoryID=20019&isPSADNA=false&pf=0&_=1583525404214");
const tr = await page.evaluate(() => {
const tds = Array.from(document.querySelectorAll('table tr'))
return tds.map(td => td.innerHTML)
});
const getName = tr.map(name => {
//const thename = Array.from(name.querySelectorAll('td.card-num'))
console.log("\n\n"+name+"\n\n");
})
await browser.close();
})();
I will get each TR printed, but I can't seem to dive into those TRs. You can see I have a line commented out, I tried to do this but get an error. As of right now, I am not getting it by the player dynamically... The easiest way I would think is to create a function that would think about getting the specific card would be doing something where the select the TR -> TD.card-num == 1 for Kevin.
Any help with this would be amazing.
Thanks
Short answer: You can just copy and paste that into Excel and it pastes perfectly.
Long answer: If I'm understanding this correctly, you'll need to map over all of the td elements and then, within each td, map each tr. I use cheerio as a helper. To complete it with puppeteer just do: html = await page.content() and then pass html into the cleaner I've written below:
const cheerio = require("cheerio")
const fs = require("fs");
const test = (html) => {
// const data = fs.readFileSync("./test.html");
// const html = data.toString();
const $ = cheerio.load(html);
const array = $("tr").map((index, element)=> {
const card_num = $(element).find(".card-num").text().trim()
const player = $(element).find("strong").text()
const mini_array = $(element).find("td").map((ind, elem)=> {
const hello = $(elem).find("span").text().trim()
return hello
})
return {
card_num,
player,
column_nine: mini_array[13],
column_ten: mini_array[14],
total:mini_array[15]
}
})
console.log(array[2])
}
test()
The code above will output the following:
{
card_num: '1',
player: 'Kevin Garnett',
column_nine: '1-0',
column_ten: '0--',
total: '100'
}