I am trying to write an application that goes to Amazon and gets the list of books on a page. I am using Playwright as the tool. I can get to the right section, but I can't get the list of books. The examples I found online use page.$$(selector), but when I try that, I get an empty array back. Reading the docs on $$, this seems like the right call, as all the list elements have the same class name. I have no idea what I am doing wrong. Any advice on this?
Here is my code so far:
const AMAZON_KINDLE_EBOOK_STORE_URL = 'https://www.amazon.com/Best-Sellers-Kindle-Store-eBooks/zgbs/digital-text/154606011/ref=zg_bs_nav_kstore_1_kstore/';
(async () => {
const browser = await chromium.launch();
try {
const amazonPage = await browser.newPage();
await amazonPage.goto(AMAZON_KINDLE_EBOOK_STORE_URL);
await amazonPage.waitForSelector('"Best Sellers in"');
await amazonPage.click('"Self-Help"');
await amazonPage.click('"Creativity"')
const books = await amazonPage.$$('li[class="zg-item-immersion"]');
console.log(books);
} finally {
await browser.close();
}
})();
For the selector, I have tried several variants as well:
li[class="zg-item-immersion"] - This one actually worked when I checked it in the dev console
'zg-item-immersion'
#zg-item-immersion
It seems the only problem is that Playwright is too fast: the script does not wait for the li[class="zg-item-immersion"] elements to appear after the last click.
I debugged the script and the selector is fine. If you wait for the navigation triggered by the last click, as below, the $$ line returns 50 element handles:
const { chromium } = require('playwright');
const AMAZON_KINDLE_EBOOK_STORE_URL = 'https://www.amazon.com/Best-Sellers-Kindle-Store-eBooks/zgbs/digital-text/154606011/ref=zg_bs_nav_kstore_1_kstore/';
(async () => {
const browser = await chromium.launch({ headless: false});
try {
const amazonPage = await browser.newPage();
await amazonPage.goto(AMAZON_KINDLE_EBOOK_STORE_URL);
await amazonPage.waitForSelector('"Best Sellers in"');
await amazonPage.click('"Self-Help"');
await Promise.all([
amazonPage.waitForNavigation(),
amazonPage.click('"Creativity"')
]);
const books = await amazonPage.$$('li[class="zg-item-immersion"]');
console.log(books);
} finally {
await browser.close();
}
})();
Alternatively, you can do what you did a few lines above and wait for the selector:
const { chromium } = require('playwright');
const AMAZON_KINDLE_EBOOK_STORE_URL = 'https://www.amazon.com/Best-Sellers-Kindle-Store-eBooks/zgbs/digital-text/154606011/ref=zg_bs_nav_kstore_1_kstore/';
(async () => {
const browser = await chromium.launch({ headless: false});
try {
const amazonPage = await browser.newPage();
await amazonPage.goto(AMAZON_KINDLE_EBOOK_STORE_URL);
await amazonPage.waitForSelector('"Best Sellers in"');
await amazonPage.click('"Self-Help"');
await amazonPage.click('"Creativity"')
await amazonPage.waitForSelector('li[class="zg-item-immersion"]');
const books = await amazonPage.$$('li[class="zg-item-immersion"]');
console.log(books);
} finally {
await browser.close();
}
})();
It does work like this as well.
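Note that console.log(books) prints element handles rather than book data. A minimal sketch of reading the text instead, assuming the zg-item-immersion class from the question is still what the page renders (it may have changed) and using a hypothetical helper name getItemTexts, could look like this:

```javascript
// Hypothetical helper: waits for the list items, then returns their visible text.
// `page` is a Playwright Page; the default selector is the one from the question.
async function getItemTexts(page, selector = 'li[class="zg-item-immersion"]') {
  // Block until at least one matching element is present and visible.
  await page.waitForSelector(selector);
  // Runs in the browser: map every matched element to its trimmed text.
  return page.$$eval(selector, (items) =>
    items.map((item) => item.innerText.trim())
  );
}
```

In the scripts above it would replace the $$ call, for example: const books = await getItemTexts(amazonPage);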
I've been attempting to use Playwright to interact with the map component of sites like Google Maps or OpenStreetMap. I've tried using a combination of page.mouse.move(), page.mouse.down(), and page.mouse.up() with literal coordinates as the parameters. When I run it, nothing seems to happen to the map at all.
Is there a way to move the map around with Playwright?
I've created a GitHub repo so that it can be reproduced easily. I will also have the code down below.
https://github.com/vincent-woodward/Playwright-Map-Interaction
const { chromium } = require("playwright");
(async () => {
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
//await page.goto("https://www.google.com/maps");
await page.goto("https://www.openstreetmap.org/#map=4/38.01/-95.84");
await page.mouse.move(600, 300);
await page.mouse.down();
await page.mouse.move(1200, 450);
await page.mouse.up();
browser.close();
})();
Great news! It looks like this was freshly added about a day ago!
View source/test implementation
After looking at the PR, your code should work:
await page.mouse.move(600, 300);
await page.mouse.down();
await page.mouse.move(1200, 450); // NOTE: make sure your viewport is big enough for this
await page.mouse.up();
I found that if you add {steps: 5} to the move command, it works as expected. I think the draggable map interfaces of OpenStreetMap and Leaflet expect a sequence of mouse move events.
const { chromium } = require("playwright");
(async () => {
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
//await page.goto("https://www.google.com/maps");
await page.goto("https://www.openstreetmap.org/#map=4/38.01/-95.84");
await page.mouse.move(600, 300);
await page.mouse.down();
await page.mouse.move(1200, 450, {steps: 5}); // <-- Change here
await page.mouse.up();
await browser.close();
})();
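The same drag can be wrapped in a small function so the step count is easy to tune. This is only a sketch: dragMap is a hypothetical helper name, and the default of 5 steps is simply the value reported to work above.

```javascript
// Hypothetical helper: drags from one viewport point to another.
// `page` is a Playwright Page; `steps` is how many mousemove events the drag emits.
async function dragMap(page, from, to, steps = 5) {
  await page.mouse.move(from.x, from.y); // position the pointer over the map
  await page.mouse.down();               // press the left button
  // With steps > 1 Playwright sends intermediate mousemove events,
  // which the map's drag handling appears to need.
  await page.mouse.move(to.x, to.y, { steps });
  await page.mouse.up();                 // release to finish the drag
}
```

Usage would be: await dragMap(page, { x: 600, y: 300 }, { x: 1200, y: 450 });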
This works for me on a laptop. You can also remove the empty loops, which are only there as a delay:
const { chromium } = require("playwright");
(async () => {
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
//await page.goto("https://www.google.com/maps");
await page.goto("https://www.openstreetmap.org/#map=4/38.01/-95.84");
await page.click('#map',{force:true});//here the trick
await page.mouse.down();
await page.mouse.move(890, 80);
for(var i = 0;i<1000000000;i++){}
await page.mouse.move(400, 180);
for(var i = 0;i<1000000000;i++){}
await page.mouse.move(700, 300);
await page.mouse.up();
//browser.close();
})();
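The empty for loops block Node's event loop while they spin. If a pause between moves is wanted, a non-blocking one can be built on setTimeout. This is a minimal sketch; sleep and moveThrough are hypothetical helper names, and the 200 ms default is an arbitrary assumption.

```javascript
// Hypothetical helper: resolves after roughly `ms` milliseconds without blocking the event loop.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: moves the mouse through a list of points, pausing after each move.
// `page` is a Playwright Page; the caller is expected to call mouse.down()/up() around it.
async function moveThrough(page, points, pauseMs = 200) {
  for (const { x, y } of points) {
    await page.mouse.move(x, y);
    await sleep(pauseMs);
  }
}
```

With the coordinates from the script above: await moveThrough(page, [{ x: 890, y: 80 }, { x: 400, y: 180 }, { x: 700, y: 300 }]);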
I'm in the process of making an auto-checkout bot. I'm working on the part that checks whether the item is in stock, and I want to split everything into separate functions in separate code blocks. The problem is that I can't get it to run.
When I wrap the functions in (), only the first function runs, while the second one does nothing.
Here is the code without the () around the functions. Does anyone know what I'm doing wrong?
const puppeteer = require ('puppeteer');
const puppeteerExtra = require('puppeteer-extra');
const pluginStealth = require('puppeteer-extra-plugin-stealth');
const rand_url = "https://www.walmart.com/ip/Cyberpunk-2077-Warner-Bros-PlayStation-4/786104378";
async function initBrowser(){
const browser = await puppeteer.launch({args: ["--incognito"],headless:false}); //Launches browser in incognito
const context = await browser.createIncognitoBrowserContext();
const page = await context.newPage(); //Ensures the new page is also incognito
await page.evaluateOnNewDocument(() => {delete navigator.__proto__.webdriver;});
await page.goto(rand_url); //goes to given link
return page;
};
async function checkstock(page){
await page.reload();
let content = await page.evaluate(() => document.body.innerHTML)
$("link[itemprop ='availability']", content).each(function(){
let out_of_stock = $(this).attr('href').toLowerCase().includes("outofstock");
if(out_of_stock){
console.log("Out of Stock");
} else{
await browser.close();
console.log("In Stock")
//await page.waitForSelector("button[class='button spin-button prod-ProductCTA--primary button--primary']", {visible: true,}); //Waits for Add to Cart Button
//await page.$eval("button[class='button spin-button prod-ProductCTA--primary button--primary']", elem => elem.click()); //Clicks Add to cart button
}
});
};
To execute the code, call the functions as follows, but you will then get ReferenceError: $ is not defined.
const puppeteer = require ('puppeteer');
const puppeteerExtra = require('puppeteer-extra');
const pluginStealth = require('puppeteer-extra-plugin-stealth');
const rand_url = "https://www.walmart.com/ip/Cyberpunk-2077-Warner-Bros-PlayStation-4/786104378";
async function initBrowser(){
const browser = await puppeteer.launch({args: ["--incognito"],headless:false}); //Launches browser in incognito
const context = await browser.createIncognitoBrowserContext();
const page = await context.newPage(); //Ensures the new page is also incognito
await page.evaluateOnNewDocument(() => {delete navigator.__proto__.webdriver;});
await page.goto(rand_url); //goes to given link
return page;
};
async function checkstock(page){
await page.reload();
let content = await page.evaluate(() => document.body.innerHTML)
console.error(content);
$("link[itemprop ='availability']", content).each(async function(){
let out_of_stock = $(this).attr('href').toLowerCase().includes("outofstock");
if(out_of_stock){
console.log("Out of Stock");
} else{
await browser.close();
}
});
};
(async () => {
const page = await initBrowser()
await checkstock(page)
})()
I debugged your code, and after adding this to launch.json:
"outputCapture": "std"
I noticed that there is an error in the following line:
await browser.close();
^^^^^
SyntaxError: await is only valid in async function
You need to add async to the callback:
$("link[itemprop ='availability']", content).each(async function(){
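Adding async removes the syntax error, but $ and browser are still not defined inside checkstock. A sketch that avoids both, assuming the page still exposes link[itemprop='availability'] and using a hypothetical helper name isOutOfStock, reads the href values with Puppeteer's page.$$eval and leaves closing the browser to the caller:

```javascript
// Hypothetical helper: true when any availability link on the page says "OutOfStock".
// `page` is a Puppeteer Page that has already navigated to the product.
async function isOutOfStock(page) {
  // Runs in the browser and returns plain strings, so no jQuery or cheerio is needed.
  const hrefs = await page.$$eval("link[itemprop='availability']", (links) =>
    links.map((link) => link.getAttribute('href') || '')
  );
  return hrefs.some((href) => href.toLowerCase().includes('outofstock'));
}
```

If no such link exists, the helper returns false, so a caller may want to handle that case separately before treating the item as in stock.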
I am trying to automate my application, which runs on the Azure portal, using Puppeteer. After entering the password, it does not click the submit button, and I get the following error:
(node:55768) UnhandledPromiseRejectionWarning: ReferenceError: browser is not defined
Here is my sample code:
(async () => {
try {
const launchOptions = { headless: false, args: ['--start-maximized'] };
const browser = await puppeteer.launch(launchOptions);
const page = await browser.newPage();
await page.emulate(iPhonex);
await page.goto('https://apps.testpowerapps.com/play/72ff5b93-2327-404d-9423-92eedb44a287?tenantId=n082027');
//Enter User Name
const [userName] = await page.$x('//*[@id="i0116"]');
await userName.type("jyoti.m@azure.com");
const [loginButton] = await page.$x('//*[@id="idSIButton9"]');
await loginButton.press('Enter');
//Enter Password
const [passWord] = await page.$x('//*[@id="i0118"]');
await passWord.type("Pass123");
const [submitButton] = await page.$x('//*[@id="idSIButton9"]');
await submitButton.press('Enter');
//await page.keyboard.press('Enter');
}
catch(error){
console.error(error);
}
finally {
await browser.close();
}
})();
I tried both ways, but neither works. The only catch is that the XPath of the button is the same on both pages.
const [submitButton] = await page.$x('//*[@id="idSIButton9"]');
await submitButton.press('Enter');
//await page.keyboard.press('Enter');
Any clue how to resolve this?
You define the browser value in the try, but you also use it in the finally. consts are block-scoped, so they are tied to the block they are declared in, and a different block (the finally) cannot see them.
Here is the problem:
try {
const browser = ...;
}
finally {
// different block!
await browser.close();
}
To solve this, move the browser out of the try-catch:
const browser = ...
try {
}
finally {
await browser.close();
}
This way it's available in the finally block.
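The pattern can be packaged once and reused. The sketch below uses a hypothetical helper name withBrowser; the launch argument is any function that returns a promise of a browser, for example () => puppeteer.launch(launchOptions).

```javascript
// Hypothetical helper: runs `task(browser)` and always closes the browser afterwards.
async function withBrowser(launch, task) {
  // Declared outside the try block so the finally block can see it.
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    // Runs whether `task` resolved or threw.
    await browser.close();
  }
}
```

Because the browser is launched before the try, a failed launch simply rejects and there is nothing to close.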
I finally figured out how to use Node.js and installed all the libraries/extensions. Puppeteer is working, but, as previously with Xmlhttp..., it gets only the template/body of the page, without the information I need. The scripts on the page only kick in a few seconds after it has been opened in a browser (a web app?). I need to get the information inside certain tags after the whole page has loaded. Also, I would like to ask whether it is possible to use pure JavaScript, because I do not use jQuery-like code, so it doubles the difficulty for me.
Here is what I have so far.
const puppeteer = require('puppeteer');
const $ = require('cheerio');
let browser;
let page;
const url = "really long link with latitude and attitude";
(async () => puppeteer
.launch()
.then(await function(browser) {
return browser.newPage();
})
.then(await function(page) {
return page.goto(url).then(function() {
return page.content();
});
})
.then(await function(html) {
$('strong', html).each(function() {
console.log($(this).text());
});
})
.catch(function(err) {
//handle error
}))();
I only get the default template body elements inside the strong tags, but there should be a lot more data than just 10 items.
If you want the full HTML, the same as what you see in the inspector, here it is:
const puppeteer = require('puppeteer');
(async function main() {
try {
const browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto('https://example.org/', { waitUntil: 'networkidle0' });
const data = await page.evaluate(() => document.querySelector('*').outerHTML);
console.log(data);
await browser.close();
} catch (err) {
console.error(err);
}
})();
let bodyHTML = await page.evaluate(() => document.documentElement.outerHTML);
Some notes:
You do not need cheerio with Puppeteer, and you do not need to reparse page.content(): you already have the full DOM with all scripts run, and you can evaluate any code in the window context, as in a browser, using page.evaluate(), transferring serializable data between the web API context and the Node.js API context.
Try to use async/await only; this will simplify your code and its flow.
If you need to wait until all the scripts and other dependencies are loaded, use waitUntil: 'networkidle0' in page.goto().
If you suspect that the document's scripts need some time to reach the needed state, use a waiting function such as page.waitForSelector(), or fall back to page.waitFor(milliseconds).
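Applied to the strong tags from the question, the notes above might be combined as follows. This is a sketch with a hypothetical helper name collectTexts; the minCount threshold is an assumption about how many elements the fully rendered page should contain.

```javascript
// Hypothetical helper: waits until at least `minCount` elements match, then returns their text.
// `page` is a Puppeteer Page.
async function collectTexts(page, selector, minCount = 1) {
  // Polls in the browser until the page's scripts have rendered enough elements.
  await page.waitForFunction(
    (sel, min) => document.querySelectorAll(sel).length >= min,
    {},
    selector,
    minCount
  );
  // Plain DOM code evaluated in the page: no cheerio or jQuery involved.
  return page.$$eval(selector, (els) => els.map((el) => el.textContent.trim()));
}
```

For example, const texts = await collectTexts(page, 'strong', 11); would wait until more than the 10 template items are present.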
Here is a simple script that outputs all tag names in a page.
'use strict';
const puppeteer = require('puppeteer');
(async function main() {
try {
const browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto('https://example.org/', { waitUntil: 'networkidle0' });
const data = await page.evaluate(
() => Array.from(document.querySelectorAll('*'))
.map(elem => elem.tagName)
);
console.log(data);
await browser.close();
} catch (err) {
console.error(err);
}
})();
You can describe your task in more detail and we can try to write something more appropriate.
Script for www.bezrealitky.cz (task from a comment below):
'use strict';
const fs = require('fs');
const puppeteer = require('puppeteer');
(async function main() {
try {
const browser = await puppeteer.launch();
const [page] = await browser.pages();
page.setDefaultTimeout(0);
await page.goto('https://www.bezrealitky.cz/vyhledat?offerType=pronajem&estateType=byt&disposition=&ownership=&construction=&equipped=&balcony=&order=timeOrder_desc&boundary=%5B%5B%7B%22lat%22%3A50.171436864513%2C%22lng%22%3A14.506905276796942%7D%2C%7B%22lat%22%3A50.154133576294%2C%22lng%22%3A14.599004629591036%7D%2C%7B%22lat%22%3A50.14524430128%2C%22lng%22%3A14.58773054712799%7D%2C%7B%22lat%22%3A50.129307131988%2C%22lng%22%3A14.60087568578706%7D%2C%7B%22lat%22%3A50.122604734575%2C%22lng%22%3A14.659116306376973%7D%2C%7B%22lat%22%3A50.106512499343%2C%22lng%22%3A14.657434650206028%7D%2C%7B%22lat%22%3A50.090685542974%2C%22lng%22%3A14.705099547441932%7D%2C%7B%22lat%22%3A50.072175921973%2C%22lng%22%3A14.700004206235008%7D%2C%7B%22lat%22%3A50.056898491904%2C%22lng%22%3A14.640206899053055%7D%2C%7B%22lat%22%3A50.038528576841%2C%22lng%22%3A14.666852728301023%7D%2C%7B%22lat%22%3A50.030955909657%2C%22lng%22%3A14.656128752460972%7D%2C%7B%22lat%22%3A50.013435368522%2C%22lng%22%3A14.66854956530301%7D%2C%7B%22lat%22%3A49.99444182116%2C%22lng%22%3A14.640153080292066%7D%2C%7B%22lat%22%3A50.010839032542%2C%22lng%22%3A14.527474219359988%7D%2C%7B%22lat%22%3A49.970771602447%2C%22lng%22%3A14.46224174052395%7D%2C%7B%22lat%22%3A49.970669964027%2C%22lng%22%3A14.400648545303966%7D%2C%7B%22lat%22%3A49.941901176098%2C%22lng%22%3A14.395563234671044%7D%2C%7B%22lat%22%3A49.948384148423%2C%22lng%22%3A14.337635637038034%7D%2C%7B%22lat%22%3A49.958376114735%2C%22lng%22%3A14.324977842107955%7D%2C%7B%22lat%22%3A49.9676286223%2C%22lng%22%3A14.34491711110104%7D%2C%7B%22lat%22%3A49.971859099005%2C%22lng%22%3A14.326815050839059%7D%2C%7B%22lat%22%3A49.990608728081%2C%22lng%22%3A14.342731259186962%7D%2C%7B%22lat%22%3A50.002211140429%2C%22lng%22%3A14.29483886971002%7D%2C%7B%22lat%22%3A50.023596577558%2C%22lng%22%3A14.315872285282012%7D%2C%7B%22lat%22%3A50.058309376419%2C%22lng%22%3A14.248086830069042%7D%2C%7B%22lat%22%3A50.073179111%2C%22lng%22%3A14.290193274400963%7D%2C%7B%22lat%22%3A50.102973823639%2C%22lng%22%3A14.224439442359994%7D%2C%7B%22lat%22%3A50.130060800171%2C%22lng%22%3A14.302396419107936%7D%2C%7B%22lat%22%3A50.116019827009%2C%22lng%22%3A14.360785349547996%7D%2C%7B%22lat%22%3A50.148005694843%2C%22lng%22%3A14.365662825877052%7D%2C%7B%22lat%22%3A50.14142969454%2C%22lng%22%3A14.394903042943952%7D%2C%7B%22lat%22%3A50.171436864513%2C%22lng%22%3A14.506905276796942%7D%2C%7B%22lat%22%3A50.171436864513%2C%22lng%22%3A14.506905276796942%7D%5D%5D&hasDrawnBoundary=1&mapBounds=%5B%5B%7B%22lat%22%3A50.289447077141126%2C%22lng%22%3A14.68724263943227%7D%2C%7B%22lat%22%3A50.289447077141126%2C%22lng%22%3A14.087801111111958%7D%2C%7B%22lat%22%3A50.039169221047985%2C%22lng%22%3A14.087801111111958%7D%2C%7B%22lat%22%3A50.039169221047985%2C%22lng%22%3A14.68724263943227%7D%2C%7B%22lat%22%3A50.289447077141126%2C%22lng%22%3A14.68724263943227%7D%5D%5D&center=%7B%22lat%22%3A50.16447196305031%2C%22lng%22%3A14.387521875272125%7D&zoom=11&locationInput=praha&limit=15');
await page.waitForSelector('#search-content button.btn-icon');
while (await page.$('#search-content button.btn-icon') !== null) {
const articlesForNow = (await page.$$('#search-content article')).length;
console.log(`Articles for now: ${articlesForNow}. Getting more...`);
await Promise.all([
page.evaluate(
() => { document.querySelector('#search-content button.btn-icon').click(); }
),
page.waitForFunction(
old => document.querySelectorAll('#search-content article').length > old,
{},
articlesForNow
),
]);
}
const articlesAll = (await page.$$('#search-content article')).length;
console.log(`All articles: ${articlesAll}.`);
fs.writeFileSync('full.html', await page.content());
fs.writeFileSync('articles.html', await page.evaluate(
() => document.querySelector('#search-content div.b-filter__inner').outerHTML
));
fs.writeFileSync('articles.txt', await page.evaluate(
() => [...document.querySelectorAll('#search-content article')]
.map(({ innerText }) => innerText)
.join(`\n${'-'.repeat(50)}\n`)
));
console.log('Saved.');
await browser.close();
} catch (err) {
console.error(err);
}
})();
Just one line:
const html = await page.content();
Details:
import puppeteer from 'puppeteer'
const test = async (url) => {
const browser = await puppeteer.launch({ headless: false })
const page = await browser.newPage()
await page.goto(url, { waitUntil: 'networkidle0' })
const html = await page.content()
console.log(html)
}
await test('https://stackoverflow.com/')
Hi, I would like to be able to click on System Tools and then on the Firmware Upgrade button, but when I use the ID or the selector (via right click -> Copy selector), it just says that it can't find it.
It's my first time using Puppeteer, can someone help please :) ?
Thanks
const puppeteer = require('puppeteer');
let scrape = async () => {
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
await page.setViewport({width: 1000, height: 500})
await page.goto('http://192.168.2.107:8080/', {waitUntil: 'networkidle2'});
await page.waitFor('input[id=pcPassword]');
await page.$eval('input[id=pcPassword]', el => el.value = 'admin');
page.keyboard.press('Enter')
await page.waitFor(3000);
await page.click(
'[id="the Id im talking about "]'
);
//await page.waitFor(5000);
await browser.close();
};
OK, I just had to select the frame; I didn't know that.
Here is the code:
const puppeteer = require('puppeteer');
let scrape = async () => {
const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox'], headless:false});
const page = await browser.newPage();
await page.setViewport({width: 1900, height: 700})
await page.goto('http://192.168.2.105:8080', {waitUntil: 'networkidle2'});
await page.waitFor('input[id=pcPassword]');
await page.$eval('input[id=pcPassword]', el => el.value = 'admin');
await page.keyboard.press('Enter');
await page.waitFor(3000);
// The menu lives inside a named frame, so look the elements up there.
const frame = page.frames().find(f => f.name() === 'bottomLeftFrame');
const button = await frame.$('#menu_tools');
await button.click();
await page.waitFor(1000);
const button2 = await frame.$('#menu_softup');
await button2.click();
}
scrape().then((value) => {
console.log(value); // Success!
});
From your comments to this answer I think that we are dealing with frames in the page. That's the reason why puppeteer is unable to find the element #menu_tools despite it being visible to you in the page. To access the frames in the page look more at Puppeteer pageFrames.
Here's a demo of how your code may look.
let scrape = async () => {
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
await page.setViewport({width: 1000, height: 500})
await page.goto('http://192.168.2.107:8080/', {waitUntil: 'networkidle2'});
// Find out which frame holds your desired selector then edit the pageFrame below.
const pageFrame = page.mainFrame().childFrames()[0];
await pageFrame.waitFor('input[id=pcPassword]');
await pageFrame.$eval('input[id=pcPassword]', el => el.value = 'admin');
// Keyboard input goes through the page; frames have no keyboard of their own.
await page.keyboard.press('Enter');
await pageFrame.waitFor(3000);
await pageFrame.waitFor('#menu_tools')
await pageFrame.click('#menu_tools');
await browser.close();
};
You can find all the frames available in the page with page.frames() and then pick the one you need. You can then perform operations as you usually would with a page, using the handle for that frame.
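When it is not obvious which frame holds the element, the frames can be searched in turn. This is a sketch with a hypothetical helper name findFrameWithSelector; it relies only on page.frames() and frame.$().

```javascript
// Hypothetical helper: returns the first frame that contains `selector`, or null.
// `page` is a Puppeteer Page; page.frames() includes the main frame.
async function findFrameWithSelector(page, selector) {
  for (const frame of page.frames()) {
    // frame.$ resolves to an element handle, or null when nothing matches.
    const handle = await frame.$(selector);
    if (handle) return frame;
  }
  return null;
}
```

A caller could then write: const frame = await findFrameWithSelector(page, '#menu_tools'); if (frame) await frame.click('#menu_tools');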