Sequentially pressing each element of a certain class within an SVG - javascript

I've been looking for a good example of click events on every element of a certain class, but I can't seem to find one. In my app, I generate multiple bars in an svg with class .bar.
Is there a nice way to iterate through each bar in the selection and click it?
Here is my code so far (with the link to dev area removed):
const puppeteer = require('puppeteer');
(async () => {
// open browser; headless: false keeps the browser window visible
const browser = await puppeteer.launch({headless: false, slowMo: 80});
const page = await browser.newPage();
//goto link
await page.goto('/link_to_test/');
//scraping automation goes here
await page.waitFor(5000);
let bars = await page.$$(".bar");
for (const bar of bars) {
await bar.click({delay: 250});
}
// close browser
await browser.close();
})();
I've been looking for a way to select each bar from the $$(".bar") selection and click it but I cannot seem to find any documentation around it.
Update
I increased the page.waitFor to 5000 and removed the ElementHandle from the for loop. Code no longer throws any errors but it doesn't want to click anything.
Looks like this doesn't work for SVG elements yet https://github.com/GoogleChrome/puppeteer/issues/1769
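If the issue linked above is the cause, one possible workaround is to skip ElementHandle.click() and dispatch the click events from inside the page. This is only a sketch under assumptions, not a confirmed fix: clickEachBySelector is a hypothetical helper name, page is assumed to be a Puppeteer Page, and the bars are assumed to listen for ordinary bubbling click events.

```javascript
// Hypothetical helper: dispatches a synthetic click on each element matching
// `selector`, one after another, pausing `delayMs` between clicks.
// Assumes `page` is a Puppeteer Page and the elements react to "click" events.
async function clickEachBySelector(page, selector, delayMs = 250) {
  // The callback runs in the browser context, so MouseEvent is the DOM class.
  return page.$$eval(
    selector,
    async (elements, delay) => {
      for (const element of elements) {
        element.dispatchEvent(
          new MouseEvent('click', { bubbles: true, cancelable: true })
        );
        // Wait before moving on so the clicks happen sequentially.
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
      return elements.length; // number of elements that received a click
    },
    delayMs
  );
}
```

Inside the async function from the question it could be called as const clicked = await clickEachBySelector(page, '.bar'); in place of the for loop. Because the events are synthetic, handlers that depend on real mouse coordinates may behave differently.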

Without seeing more code I am not sure whether this is the answer you need. This code selects all a.bar elements inside a given ul and returns an array of their hrefs. We then loop through the links and open each one in turn.
I think the missing piece of the jigsaw is that I am mapping the links to an array (see below: links => links.map((a) => { return a.href })).
const puppeteer = require('puppeteer');
(async () => {
const html = `
<html>
<body>
<ul>
<li><a class="bar" href="https://www.google.com">Google</a></li>
<li><a class="bar" href="https://www.bing.com">Bing</a></li>
<li><a class="bar" href="https://duckduckgo.com">DuckDuckGo</a></li>
</ul>
</body>
</html>`;
const browser = await puppeteer.launch({ headless:false});
const page = await browser.newPage();
await page.goto(`data:text/html,${html}`);
const data = await page.$$eval('ul li a.bar', links =>
links.map((a) => { return a.href }));
//You will now have an array of hrefs
for (const i in data) {
console.log("Opening", data[i]);
await page.goto(data[i]);
}
await browser.close();
})();

Related

Puppeteer- How to .click() a single button out of a grid of buttons with same classname?

I'm developing a Nike SNKRS BOT to buy shoes with Puppeteer and Node.js.
I'm having trouble distinguishing and clicking the size button (see the screenshot of the HTML in DevTools and the front-end buttons).
This is my code. I'm not experienced, so I have tried everything:
const xpathButton = '//*[@id="root"]/div/div/div[1]/div/div[1]/div[2]/div/section[1]/div[2]/aside/div/div[2]/div/div[2]/ul/li[1]/button'
const puppeteer = require('puppeteer')
const productUrl = 'https://www.nike.com/it/launch/t/air-max-97-coconut-milk-black'
const idAcceptCookies = "button[class='ncss-btn-primary-dark btn-lg']"
async function givePage(){
const browser = await puppeteer.launch({headless: false})
const page = await browser.newPage();
return page;
}
async function addToCart(page){
await page.goto(productUrl);
await page.waitForSelector(idAcceptCookies);
await page.click(idAcceptCookies,elem => elem.click());
//this is where the issues begin
//attempt 1
await page.evaluate(() => document.getElementsByClassName('size-grid-dropdown size-grid-button"')[1].click());
//attempt 2
const sizeButton = "button[class='size-grid-dropdown size-grid-button'] button[name='42']";
await page.waitForSelector(sizeButton);
await page.click(sizeButton,elem => elem.click());
}
//attempt 3
await page.click(xpathButton)
//attempt 4
document.evaluate("//button[contains ( ., '36')]", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue
async function checkout(){
var page = await givePage();
await addToCart(page)
}
checkout()
Attempt number 2 looks like the best approach, except your selector is wrong. The button does not have a name attribute, according to your screenshot, so you will need another approach, closer to attempt 3.
You can use Puppeteer to select an element with XPath, and XPath allows you to select by an element's text content.
Try this:
await page.waitForXPath('//button[contains(text(), "EU 36")]')
const [button] = await page.$x('//button[contains(text(), "EU 36")]')
await button.click()
Because the XPath selector returns an array of element handles, I destructure the first element in the array (which should be the only match) and assign it to the variable button. That element handle can now be clicked.
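As an alternative to XPath (a different technique from the one in this answer), the same text match can be done with plain DOM calls inside page.evaluate. This is a minimal sketch: clickButtonByText is a hypothetical helper name, page is assumed to be a Puppeteer Page, and the "EU 36" label is taken from the XPath above.

```javascript
// Hypothetical helper: clicks the first <button> whose text contains `text`.
// Assumes `page` is a Puppeteer Page. Resolves to true if a button was
// found and clicked, false otherwise.
async function clickButtonByText(page, text) {
  return page.evaluate((wanted) => {
    // This callback runs in the browser, where `document` is available.
    const button = Array.from(document.querySelectorAll('button')).find(
      (candidate) => candidate.textContent.includes(wanted)
    );
    if (!button) return false;
    button.click();
    return true;
  }, text);
}
```

Usage inside an async function would look like const clicked = await clickButtonByText(page, 'EU 36');. The boolean result makes it easy to detect that the size is not listed, instead of failing on an undefined handle.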

puppeteer not scraping full information from website

I have a Puppeteer script that scrapes YouTube for the thumbnail image URLs of videos, but it only prints 4 URLs and prints empty strings for the rest. To check whether the problem is specific to the image source, I also added code that scrapes the video titles, and that code prints every title with no empty strings. What causes this, and how can I fix it so that all image URLs are printed? One explanation I thought of is that YouTube shows 4 thumbnails per row and Puppeteer is somehow only reading the first row, but the title scraping returns all the titles, which seems to disprove that hypothesis. Any help is appreciated. Thanks in advance.
const puppeteer = require('puppeteer');
async function scrape(url) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url, {timeout: 0});
const selector1 = 'ytd-thumbnail > a > yt-img-shadow > #img'
const src1 = await page.$$eval(selector1, elems => elems.map(el => el.src))
const selector2 = 'h3 > a > #video-title'
const src2 = await page.$$eval(selector2, elems => elems.map(el => el.textContent))
browser.close();
console.log({src1, src2})
}
scrape("http://www.youtube.com")
This is caused by YouTube's infinite scrolling, which makes the browser fetch items only once the user has scrolled them into view. You can open the DevTools Elements tab and inspect the last (nth) ytd-rich-item-renderer:nth-child(n). You will see the yt-img-shadow inside:
<yt-img-shadow
ftl-eligible=""
class="style-scope ytd-thumbnail no-transition empty"
style="background-color: transparent;">
<!--css-build:shady-->
<img id="img" class="style-scope yt-img-shadow" alt="" width="9999">
</yt-img-shadow>
Once you scroll down far enough for the element to be in view, the inner <img> changes:
<yt-img-shadow
ftl-eligible=""
class="style-scope ytd-thumbnail no-transition"
style="background-color: transparent;"
loaded="">
<!--css-build:shady-->
<img id="img" class="style-scope yt-img-shadow" alt="" width="9999" src="https://i.ytimg.com/vi/_{id}/hqdefault.jpg?sqp={parameter}">
</yt-img-shadow>
There are many answers on Stack Overflow on how to deal with infinite scrolling in Puppeteer.
Most probably you will need to use vanilla JS (e.g. scrollTo) inside a page.evaluate to scroll as far as you want.
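A minimal sketch of that idea, assuming page is a Puppeteer Page: keep scrolling to the bottom with scrollTo until the document height stops growing. The helper name scrollToBottom and the maxRounds and pauseMs options are hypothetical, and the pause may need tuning for how quickly the site loads new items.

```javascript
// Hypothetical helper: scrolls to the bottom repeatedly until the page height
// stops growing or `maxRounds` is reached. Assumes `page` is a Puppeteer Page.
// Resolves to the number of rounds that loaded new content.
async function scrollToBottom(page, { maxRounds = 20, pauseMs = 1000 } = {}) {
  let previousHeight = 0;
  for (let round = 0; round < maxRounds; round++) {
    // Runs in the browser: scroll down, then report the current page height.
    const height = await page.evaluate(() => {
      window.scrollTo(0, document.documentElement.scrollHeight);
      return document.documentElement.scrollHeight;
    });
    // No growth since the last round: assume nothing more is being loaded.
    if (height === previousHeight) return round;
    previousHeight = height;
    // Give the page time to fetch and render the next batch of items.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  return maxRounds;
}
```

Calling await scrollToBottom(page, { maxRounds: 5 }); before the $$eval calls from the question should give lazily loaded thumbnails a chance to receive their src, although a feed that never ends will only stop at the maxRounds cap.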
You can get video thumbnails from YouTube with code like the following:
const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());
const mainPageUrl = "https://www.youtube.com";
async function scrollPage(page, scrollElements) {
let currentElement = 0;
while (true) {
let elementsLength = await page.evaluate((scrollElements) => {
return document.querySelectorAll(scrollElements).length;
}, scrollElements);
for (; currentElement < elementsLength; currentElement++) {
await page.waitForTimeout(200);
await page.evaluate(
(currentElement, scrollElements) => {
document.querySelectorAll(scrollElements)[currentElement].scrollIntoView();
},
currentElement,
scrollElements
);
}
await page.waitForTimeout(5000);
let newElementsLength = await page.evaluate((scrollElements) => {
return document.querySelectorAll(scrollElements).length;
}, scrollElements);
if (newElementsLength === elementsLength || currentElement > 100) break; // if you want to get all elements (or some other number of elements) change number to 'Infinity' (or some other number)
}
}
async function getThumbnails() {
const browser = await puppeteer.launch({
headless: false,
args: ["--no-sandbox", "--disable-setuid-sandbox"],
});
const page = await browser.newPage();
await page.setDefaultNavigationTimeout(60000);
await page.goto(mainPageUrl);
await page.waitForSelector("#contents");
const scrollElements = "a#thumbnail";
await scrollPage(page, scrollElements);
await page.waitForTimeout(10000);
const urls = await page.$$eval("a#thumbnail #img", (els) => els.map(el => el.getAttribute('src')).filter(el => el));
await browser.close();
return urls;
}
getThumbnails().then(console.log);
Output
[
"https://i.ytimg.com/vi/02oeySm1CJA/hq720.jpg?sqp=-oaymwEcCNAFEJQDSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLBmrYMHESpY_f1oTNx00iuR3tNeCQ",
"https://i.ytimg.com/vi/RMo2haIPYBM/hq720_live.jpg?sqp=CNifxJcG-oaymwEcCNAFEJQDSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLBw4ogzR0709SqbttRdEzfL-aTdgQ",
"https://i.ytimg.com/vi/qJFFp_ta1Zk/hqdefault.jpg?sqp=-oaymwEcCOADEI4CSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLBJ-44OFgBUuVUYWBVh3Yi3hQgwIg",
"https://i.ytimg.com/vi/OZoTjoN-Sn0/hqdefault.jpg?sqp=-oaymwEcCOADEI4CSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLCOeGTCnlT4U0wV1SNclkmFUEHLaA",
"https://i.ytimg.com/vi/L8cH2gI67uk/hqdefault.jpg?sqp=-oaymwEcCOADEI4CSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLAuvZ3khIjpvAVTGjmR9FDxQrPIgQ",
"https://i.ytimg.com/vi/6rUyVKyJnGY/hq720.jpg?sqp=-oaymwEcCNAFEJQDSFXyq4qpAw4IARUAAIhCGAFwAcABBg==&rs=AOn4CLCifsTG4MlA3mf8CcJDkfKdWaZkaA",
"https://i.ytimg.com/vi/xpaURivPZFk/hq720_2.jpg?sqp=-oaymwEdCJYDENAFSFXyq4qpAw8IARUAAIhCcAHAAQbQAQE=&rs=AOn4CLA5oFDDsVzbV3tUqyfogfuf3LPahQ",
"https://i.ytimg.com/vi/MsR76PyVdUs/hq720_2.jpg?sqp=-oaymwEdCJYDENAFSFXyq4qpAw8IARUAAIhCcAHAAQbQAQE=&rs=AOn4CLAEBYGNvif-7LWx2mqW4G9o-OUhEQ",
"https://i.ytimg.com/vi/liasQRRVt5w/hq720_2.jpg?sqp=-oaymwEdCJYDENAFSFXyq4qpAw8IARUAAIhCcAHAAQbQAQE=&rs=AOn4CLAUcMpyKY0GhmNAHHtP_cDkAp18DQ",
"https://i.ytimg.com/vi/Dr5IqlTLMDM/hq720_2.jpg?sqp=-oaymwEdCJYDENAFSFXyq4qpAw8IARUAAIhCcAHAAQbQAQE=&rs=AOn4CLBOSUi6mgjdD5a-Jx8Ns24SlexB1g",
"https://i.ytimg.com/vi/E8kit8xJKdI/hq720_2.jpg?sqp=-oaymwEdCJYDENAFSFXyq4qpAw8IARUAAIhCcAHAAQbQAQE=&rs=AOn4CLDDStn95G7ei5DTusGXE4RimzdLUw",
"https://i.ytimg.com/vi/SqEaahOmLHU/hq720_2.jpg?sqp=-oaymwEdCM0CENAFSFXyq4qpAw8IARUAAIhCcAHAAQbQAQE=&rs=AOn4CLBDcWLCklNxEAuT1ZvSTKrIplGOag",
...and other results
]
You can read more about scraping YouTube from my blog posts:
Web scraping YouTube search video results with Nodejs
Web scraping YouTube secondary search results with Nodejs
Web scraping YouTube video page with Nodejs

Can't scrape from a page I navigate to by using Puppeteer

I'm fairly new to Puppeteer and I'm trying to practice by keeping track of a selected item on Amazon. However, I'm facing a problem when I try to retrieve some results from the page.
The way I intended this automation to work is by following these steps:
New tab.
Go to the home page of Amazon.
Enter the given product name in the search element.
Press the enter key.
Return the product title and price.
Check this example below:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: false,
});
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', (req) => { // don't load any fonts or images on my requests. To Boost the performance
if (req.resourceType() == 'font' /* || req.resourceType() == 'image' || req.resourceType() == 'stylesheet'*/) {
req.abort();
}
else {
req.continue();
}
});
const baseDomain = 'https://www.amazon.com';
await page.goto(`${baseDomain}/`, { waitUntil: "networkidle0" });
await page.click("#twotabsearchtextbox" ,{delay: 50})
await page.type("#twotabsearchtextbox", "Bose QuietComfort 35 II",{delay: 50});
await page.keyboard.press("Enter");
await page.waitForNavigation({
waitUntil: 'networkidle2',
});
let productTitle = await page.$$(".a-size-medium, .a-color-base, .a-text-normal")[43]; //varible that holds the title of the product
console.log(productTitle );
debugger;
})();
When I execute this code, console.log prints undefined for the variable productTitle. I have had a lot of trouble scraping information from a page I navigate to. I used to use page.evaluate(), and it only worked when I was scraping the page that I had told the browser to go to directly.
The first problem is on this line:
let productTitle = await page.$$(".a-size-medium, .a-color-base, .a-text-normal")[43];
// is equivalent to:
let productTitle = await (somePromise[43]);
// As you guessed it, a Promise does not have a property `43`,
// so I think you meant to do this instead:
let productTitle = (await page.$$(".a-size-medium, .a-color-base, .a-text-normal"))[43];
Once this is fixed, you don't get the title text, but a handle to the DOM element. So you can do:
let titleElem = (await page.$$(".a-size-medium, .a-color-base, .a-text-normal"))[43];
let productTitle = await titleElem.evaluate(node => node.innerText);
console.log(productTitle); // "Microphone"
However, I'm not sure that simply selecting the element at index 43 will always get you the one you want. If it doesn't, that would be a topic for another question.
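The operator precedence problem described in this answer can be reproduced without a browser. The sketch below uses a plain resolved Promise as a stand-in for page.$$(...); the function name demoAwaitPrecedence and the sample array are made up for illustration.

```javascript
// Stand-in for page.$$(...): a Promise that resolves to an array.
// Shows why `await promise[1]` and `(await promise)[1]` differ.
async function demoAwaitPrecedence() {
  const promise = Promise.resolve(['a', 'b', 'c']);

  // Member access binds tighter than await, so this reads promise[1]
  // (a property the Promise does not have) and then awaits undefined.
  const wrong = await promise[1];

  // The parentheses force the await to happen first, then the index.
  const right = (await promise)[1];

  return { wrong, right };
}

demoAwaitPrecedence().then(({ wrong, right }) => {
  console.log(wrong); // undefined
  console.log(right); // "b"
});
```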

FabricJS Inspecting the Canvas from UnitTests or Automated Tools

We are using FabricJS and items can be rendered to the canvas when actions are taken in the web app.
i.e. Click a button and an object would appear in the fabric canvas.
What are the best methods to check the canvas has been updated correctly using an Automated e2e test?
We are using Puppeteer and/or Protractor for end to end testing.
Is there a good way to check that the canvas contains all of the expected elements?
Are details of the objects in the canvas available to DevTools in any way?
Regards,
Other sources for this problem with no solution.....
https://www.reddit.com/r/javascript/comments/4ltvfe/are_unit_tests_with_a_fabricjs_app_possible/
Here is an example, where Puppeteer can open a FabricJs example page. It is possible to get a screenshot of the whole page, and the dataurl for the canvas. However, it would be better to be able to inspect/follow the canvas state and the objects that it contains together with their locations.
http://fabricjs.com/static_canvas
How would you get the location of any of the planets from the example?
The planets are fabric.Circle objects, but rendered to the canvas. Would there be a way to track or follow them?
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({headless:false, args:['--no-sandbox', '--disable-setuid-sandbox']});
const page = await browser.newPage();
await page.goto('http://fabricjs.com/solar-system');
await page.waitFor(4000)
await page.screenshot({path: 'example.png'});
const dataUrl = await page.evaluate(() => {
const canvas = document.getElementsByClassName("lower-canvas")[0];
console.log(canvas);
return canvas.toDataURL();
});
console.log(dataUrl);
await page.goto(dataUrl);
await page.waitFor(4000)
await browser.close();
})();
Kind regards,
The solution that we went for was to have a hidden button in the page.
This button would be linked to a function that would query the fabric canvas and add a list of object details to a global variable.
This global variable can be queried allowing the test to locate objects in the fabric canvas. It's clunky, but works.
Would love to hear any other suggestions.
async renderAllObjectsToApplicationState() {
console.log('Adding objects to application info');
const objs = this._mainCanvas.getObjects().map(ob => ({
id: ob.id,
left: ob.oCoords.tl.x,
top: ob.oCoords.tl.y,
width: ob.oCoords.br.x - ob.oCoords.tl.x,
height: ob.oCoords.br.y - ob.oCoords.tl.y,
visible: ob.visible,
center: ob.path
? this.getCoordCenter(ob.path, this._mainCanvas.getZoom())
: null,
}));
applicationInfo['fabricObjects'] = objs;
}
<button
id="getFabricObjBtn"
class="float-right"
style="display: none;"
(click)="renderAllObjectsToApplicationState()"
></button>
async function getImageViewerFabricObjects(){
await page.waitFor(100);
await page.evaluate(() => document.querySelector('#getFabricObjBtn').click());
await page.waitFor(250);
const fabricObjects = await page.evaluate(() => window.applicationInfo.fabricObjects);
return fabricObjects;
}
const fabricObjects = await getImageViewerFabricObjects();
const roi = fabricObjects.filter(obj => obj.id==="roi");
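Another option, offered only as a sketch, is to skip the hidden button and read the objects directly through page.evaluate. This assumes the application exposes its fabric canvas on a global variable in test builds; the name __fabricCanvas and the helper readFabricObjects are hypothetical, and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper: reads a summary of every object on a fabric canvas
// that the app has exposed as window[canvasGlobal] (an assumption).
// Resolves to null when no canvas is exposed under that name.
async function readFabricObjects(page, canvasGlobal = '__fabricCanvas') {
  return page.evaluate((globalName) => {
    const canvas = window[globalName];
    if (!canvas) return null;
    // Return plain data only, since page.evaluate serializes its result.
    return canvas.getObjects().map((obj) => {
      const rect = obj.getBoundingRect();
      return {
        type: obj.type,
        left: rect.left,
        top: rect.top,
        width: rect.width,
        height: rect.height,
        visible: obj.visible,
      };
    });
  }, canvasGlobal);
}
```

A test could then filter the result, for example (await readFabricObjects(page)).filter(o => o.type === 'circle'), to locate the planets from the question. The trade-off is that the application has to expose the canvas, which is similar in spirit to the hidden button but avoids the click and the fixed waits.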

How do you get all the links from a page with node puppeteer?

I'm trying to build a web crawler with node and came across the puppeteer package which looks perfect for what I want. My end result is to gather all the links from a page, all of its text content, and then a screenshot of the page itself.
I ran the following and it appears to gather a large number of links, however on actual inspection of the site there are links that it is not gathering.
const puppeteer = require('puppeteer');
module.exports = () => {
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://pixabay.com/en/columbine-columbines-aquilegia-3379045/');
await page.screenshot({ path: 'myscreenshot.png', fullPage: true });
let text = await page.$eval('*', el => el.innerText.split(' '));
text = text.map(string => {
return string.replace(/[^\w\s]/gi, '');
});
let hrefs = await page.evaluate(() => {
const links = Array.from(document.querySelectorAll('a'))
return links.map(link => link.href);
});
console.log('done');
await browser.close();
})();
};
For example, this link: /go/?t=image-details-shutterstock&id=699165328 is nowhere in the array of hrefs. What's worse, these are the links that lead out of the site, which is exactly the type of link I want to follow; otherwise I'm stuck crawling only the one site.
Is there a reason my script is only showing some of the links? Is the querySelector too narrow, or is it rejecting certain links?
Those links are generated by an onclick event, and the target is saved in the data-go attribute, for example:
<a data-go="image-details-shutterstock&id=458320033">
You only need to prepend /go/?t= to that value to get the link. To pick up the raw data-go value when href is empty:
return links.map(link => link.href || link.getAttribute('data-go'));
There are also empty links used for the menu, such as:
<a><i class="icon icon_menu_user"></i></a>
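Putting the points above together, a sketch might read both href and data-go for every anchor, prepend /go/?t= to the data-go values, and drop the empty menu links. The helper names resolveLinks and collectLinks are hypothetical, page is assumed to be a Puppeteer Page, and the /go/?t= prefix is taken from the example link in the question.

```javascript
// Hypothetical helper: turns raw { href, dataGo } pairs into absolute URLs.
// Anchors with neither an href nor a data-go value are dropped.
function resolveLinks(rawLinks, baseUrl) {
  return rawLinks
    .map(({ href, dataGo }) => {
      if (href) return href; // normal link, already absolute
      if (dataGo) return new URL(`/go/?t=${dataGo}`, baseUrl).href;
      return null; // empty anchor, e.g. a menu icon
    })
    .filter(Boolean);
}

// Hypothetical helper: collects the raw values in the browser and resolves
// them in Node. Assumes `page` is a Puppeteer Page.
async function collectLinks(page) {
  const rawLinks = await page.$$eval('a', (anchors) =>
    anchors.map((a) => ({ href: a.href, dataGo: a.getAttribute('data-go') }))
  );
  return resolveLinks(rawLinks, page.url());
}
```

In the script from the question, const hrefs = await collectLinks(page); could replace the page.evaluate call that builds hrefs. Keeping resolveLinks separate from the browser call makes the URL handling easy to check on its own.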
