I'm working with puppeteer at the moment to create a web-crawler and face the following problem:
The site I'm trying to scrape information off of uses Tabs. It renders all of them at once and sets the display-property of all but one tab to 'none' so only one tab is visible.
The following code always gets me the first flight row, which can be hidden depending on the date that the crawler is asking for.
const flightData = await page.$eval('.available-flights .available-flight.row', (elements) => {
  // code to handle rows
});
There doesn't seem to be an additional parameter you can pass with .$eval() like you can in
.waitForSelector('.selector', {hidden: false})
Am I following the wrong idea?
Is there a way to only select the shown element and work with that data?
const flightData = await page.$eval('.available-flights .available-flight.row:not([style*="display:none"]):not([style*="display: none"])', (elements) => {
  // code to handle rows
});
Does the trick :)
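The attribute selector above only matches when display: none is set through an inline style attribute. If the tab is hidden by a class or a stylesheet rule instead, one alternative is to fetch all rows with $$eval and filter inside the page. This is only a sketch: the function name getVisibleFlightRows is made up for illustration, and it relies on offsetParent being null for elements that are not rendered (it is also null for position: fixed elements, so it is not a universal visibility test).

```javascript
// Sketch: return the text of every flight row that is actually rendered.
// `page` is a Puppeteer Page; the selector is the one from the question.
async function getVisibleFlightRows(page) {
  return page.$$eval(".available-flights .available-flight.row", (rows) =>
    rows
      // offsetParent is null when the row or one of its ancestors is display: none
      .filter((row) => row.offsetParent !== null)
      .map((row) => row.textContent.trim())
  );
}
```

Unlike $eval, which hands the callback only the first match, $$eval passes an array of all matching elements, so the filtering can happen before any data is read.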
I am doing one of my first projects using the Ball Don't Lie API, trying to build my version of an ESPN landing page. I am using https://www.balldontlie.io/api/v1/players. I am using JavaScript, and I have been stuck for days trying to understand how to display the first and last name of all of the players on the landing page in HTML. I only know how to display one name if I use data.data[0]. I've tried .map and loops, but it's just not clicking. I want to be able to display other stats in the array as well. Can anyone help?
This is my JavaScript code:
async function getPlayers() {
  const response = await fetch('https://www.balldontlie.io/api/v1/players');
  const data = await response.json();
  const players = data.data;
  console.log(players);
  displayPlayer(players);
}
function displayPlayer(players) {
  const scores = document.getElementById('scores');
  scores.innerHTML = `
    ${players.first_name} ${players.last_name}`;
}
getPlayers();
I have tried .map and I have tried loops, but I am just not understanding which function is going to show the players. Maybe my original code doesn't make sense. I've tried watching YouTube and can't find anyone doing it in simple JavaScript.
You can try this in your script and edit points 2. and 4. for better display of what you need to show
// 1. GET request using fetch()
fetch("https://www.balldontlie.io/api/v1/players")
  // Converting received data to JSON
  .then((response) => response.json())
  .then((json) => {
    // 2. Create a variable to store HTML table headers
    let li = `<tr><th>ID</th><th>first_name</th><th>height_feet</th><th>height_inches</th> <th>last_name</th><th>position</th><th>im lazy...</th></tr>`;
    // Log the raw data for inspection
    console.log(json.data);
    // 3. Loop through each data and add a table row
    json.data.forEach((user) => {
      li += `<tr>
        <td>${user.id}</td>
        <td>${user.first_name} </td>
        <td>${user.height_feet}</td>
        <td>${user.height_inches}</td>
        <td>${user.last_name}</td>
        <td>${user.position}</td>
        <td>${user.team.id}</td>
        <td>${user.team.abbreviation}</td>
        <td>${user.team.city}</td>
        <td>${user.team.conference}</td>
        <td>${user.team.division}</td>
        <td>${user.team.full_name}</td>
        <td>${user.team.name}</td>
      </tr>`;
    });
    // 4. DOM Display result
    document.getElementById("users").innerHTML = li;
  });
And the body of your HTML would look like this:
<div>
<!-- Table to display fetched user data -->
<table id="users"></table>
</div>
Your constant players is an array. In order to access a player's information within that array, you would need to index each player to then access their object of key:value pairs.
That is why you can get the first player's name to show when you save players as data.data[0]. This is indicating that you want to access the object in position 0 in the array. If you wanted the second player's information you would reference data.data[1], and so forth.
Keeping as much of your original code as possible (and adding some comments), I believe this is what you were trying to achieve.
async function getPlayers() {
  // Fetch the API and convert it to json.
  const response = await fetch('https://www.balldontlie.io/api/v1/players');
  const data = await response.json();
  // Save the returned data as an array.
  const players = data.data;
  console.log(players);
  // Create an element to display each individual player's information.
  players.forEach(player => {
    displayPlayer(player);
  });
}
function displayPlayer(player) {
  // Grab the element encasing all players.
  const scores = document.getElementById('scores');
  // Create a new element for the individual player.
  const playerContent = document.createElement('div');
  // Add the player's name.
  playerContent.innerHTML = `
    ${player.first_name} ${player.last_name}`;
  // Add the player content into the encasing division.
  scores.appendChild(playerContent);
}
getPlayers();
We will use the forEach() function to index each player's object in the array of data for us, grab your "scores" element you created on your HTML page, then we will "append" (add to the end) each player's information into your "scores" element.
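Since .map was mentioned in the question, here is a minimal sketch of the same idea with map() and join(): build one HTML string for the whole array and assign it once. The function name renderPlayers is made up for illustration, and the names are inserted without escaping, which is fine for a practice project but worth revisiting for untrusted data.

```javascript
// Sketch: turn the array of player objects into a single HTML string.
function renderPlayers(players) {
  return players
    // map() produces one <div> string per player object
    .map((player) => `<div>${player.first_name} ${player.last_name}</div>`)
    // join() glues the strings together without separators
    .join("");
}

// Usage in the browser (assumes an element with id "scores" exists):
// document.getElementById("scores").innerHTML = renderPlayers(data.data);
```

Other fields from the player object can be added inside the template string in the same way as the names.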
The website link below has some useful information to read that can help you build on your existing code when you want to start adding styling.
https://www.thesitewizard.com/javascripts/insert-div-block-javascript.shtml
This site has some useful information on using "promises" when dealing with async functions that will come in handy as you progress in coding.
https://www.geeksforgeeks.org/why-we-use-then-method-in-javascript/
These website links were added as of 02/04/2023 (just to add as a disclaimer to the links because who knows what they will do in 2030 O.o).
Hope this helps!
I've been playing around with puppeteer for some time now and can't seem to work out the best approach.
My goal is to be able to select one of the days which once automatically clicked loads different data which I will then extract.
I'm having a hard time trying to work out the best way to navigate between the days with puppeteer.
I want to have the ability to input the day I want to select and when launching the page it will navigate to that selected day.
I did have more code written in, however stripped it back.
async function scrapeData(page) {
  await page.goto(newPageURL);
  // Select the day you want to go to.
  let spanDay = await page.$(".MuiTab-wrapper");
  await page.$eval(".MuiTab-wrapper", (el) =>
    el.forEach((element) => {
      console.log(element);
    })
  );
}
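One thing to note about the snippet above: $eval passes a single element (the first match) to the callback, so calling forEach on it fails, whereas page.$$ and page.$$eval work with all matches. A minimal sketch of selecting a day by its label could look like the following. The function name clickTabByText and the label values are assumptions for illustration, and it assumes the tab's visible text is the day name.

```javascript
// Sketch: click the tab whose visible text equals `label`.
// `page` is a Puppeteer Page; ".MuiTab-wrapper" is the selector from the question.
async function clickTabByText(page, label) {
  const tabs = await page.$$(".MuiTab-wrapper"); // all matching element handles
  for (const tab of tabs) {
    // Read the tab's text inside the browser context.
    const text = await tab.evaluate((el) => el.textContent.trim());
    if (text === label) {
      await tab.click();
      return true;
    }
  }
  return false; // no tab with that label was found
}
```

After the click, the page still needs time to load the new data, so a page.waitForSelector call on an element that belongs to the selected day is probably needed before extracting anything.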
What I try to do is to loop through an array of Facebook page IDs and to return the code from each event page. Unfortunately, I only get the code of the last page ID in the array but as many times as elements are in the array. E.g. when I have 3 ID's in the array I get 3 times the code of the last page ID.
I already experimented with async await but I had no success.
The expected outcome would be the code of each page.
Thank you for any help and examples.
//Looping through pages
pages.forEach(
  function(page) {
    //Creating URL
    let url = "https://mbasic.facebook.com/"+page+"?v=events";
    //Getting URL
    driver.get(url).then(
      function() {
        //Page loaded
        driver.getPageSource().then(function(result) {
          console.log(result);
        });
      }
    );
  }
);
You are facing the same issue I did when I created a scraper using Python and Selenium. Facebook has countermeasures against manual URL changes: I received the same data again and again, even though the process was automated. To get a good result you need access to Facebook's Graph API, which provides a complete object for a Facebook page along with its pagination URL.
The second way I got it right was to use Selenium's click action in the automated browser to move on to the next page. It won't work if you simply change the URL as if you were typing it. I prefer using the Graph API.
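Separately from anything Facebook does on its side, it may be worth ruling out a timing issue in the original loop: forEach does not wait for the promise returned by driver.get, so depending on the selenium-webdriver version every navigation can be started before the first getPageSource call runs. A minimal sketch of a strictly sequential loop, assuming driver is a selenium-webdriver WebDriver and pages is the array of page IDs (the function name collectPageSources is made up for illustration):

```javascript
// Sketch: visit each page in order and collect its HTML source.
async function collectPageSources(driver, pages) {
  const sources = [];
  for (const page of pages) {
    const url = "https://mbasic.facebook.com/" + page + "?v=events";
    // Wait for this navigation to finish before reading the source.
    await driver.get(url);
    sources.push(await driver.getPageSource());
  }
  return sources;
}
```

A for...of loop with await keeps each get/getPageSource pair together, which forEach with a callback cannot guarantee.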
I am trying to create an apify crawler, which has multiple clickable element. First click is to paginate, second click to visit each result, third is to visit a section of each result to extract more information.
function pageFunction(context) {
  var $ = context.jQuery;
  if (context.request.label === 'category' || context.request.label === 'detail') {
    context.skipLinks();
    var result = {
      item_name: $('name').text(),
      categories: $('.categories').text(),
      email: $('email').text(),
      kvk: $('kvk').text()
    };
    return result;
  } else {
    context.skipOutput();
  }
}
The first 2 clicks are happening: it paginates, visits the results, and extracts the first 3 values (item_name, categories and email).
The fourth value, kvk, is not returned. I think either the third click is not happening or the code I used has some errors. Can anyone please help me fix this?
One of the problems may be context.skipLinks(), a function that prevents any new pages from being enqueued. Also, did you check all the selectors in the developer console? For debugging, I would advise you to log the content of the page so you know it loaded. First, you need to find the source of the problem.
One side note: I would advise you to start developing on our modern web-scraper. The Crawler platform is no longer maintained and may perform worse in some cases.
I'm using select2 and fetching available options from the server with this query function
var recipientsTimeout;
this.loadRecipients = function (query) {
  clearTimeout(recipientsTimeout);
  recipientsTimeout = setTimeout(function(){
    data.transmitRequest('lookupcontacts', { search: query.term }, function(resp){
      query.callback({ results: resp.items });
    });
  }, 500);
};
It uses our own ajax layer, and delays searching until the user stops typing, and it works fine. The only small issue is that if the user types some text, then immediately backspaces over it, my last timeout will still fire, and an ajax request will be fired, and ignored. This doesn't cause any problems, but it's less than optimal.
Is there any (preferably non-brittle) way to fetch whatever the current text is? It seems as though the query object sent in has an element property, which is a jQuery wrapper of the original hidden input I set select2 up with, but there's no direct way I can see to get the actual textbox that the user is typing in. Obviously I could inspect it and easily figure out the dom pattern and build up a selector from the hidden element back to what the user is typing in, but I really hate doing that, since the specific layout could easily change in a future version.
So what is the best way to get the currently entered text, so I can do a quick check on it when the setTimeout expires and I'm about to run my ajax request?
I'm using 4.0.3 and the way I did it is this:
$("#mySelect2").data("select2").dropdown.$search.val()
The hidden input element (where you initialize select2 on) gets a data property select2, which contains references to all elements, that are used by select2. So you could do something like this:
var hiddenInputSelector = '#e1',
    select2 = $(hiddenInputSelector).data('select2'),
    searchInput = select2.search;
if ($(searchInput).val() === '')
  clearTimeout(recipientsTimeout);
This is not in the docs though, so it might change in the future.
In select2 version 4.0.3 this works for me, whereas the others did not:
$('.select2-search__field')[0].value;
I do something like this to get the current search term:
list.data("select2").search[0].value;
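To tie these back to the original query function, here is a minimal sketch of a debounced loader that re-checks the search box when the timer fires and skips the request if the text has changed in the meantime. The names makeDebouncedLoader, getCurrentTerm and sendRequest are hypothetical; getCurrentTerm would wrap whichever of the (undocumented) lookups above works for your select2 version.

```javascript
// Sketch: debounce requests and drop them if the text changed while waiting.
// getCurrentTerm: () => string, returns what is in the search box right now.
// sendRequest: (query) => void, performs the actual ajax call.
function makeDebouncedLoader(getCurrentTerm, sendRequest, delayMs) {
  let timer;
  return function load(query) {
    clearTimeout(timer);
    timer = setTimeout(() => {
      // Fire only if the box still contains the term this query was created for.
      if (getCurrentTerm() === query.term) {
        sendRequest(query);
      }
    }, delayMs);
  };
}

// Possible wiring (relies on select2 internals mentioned above, not on documented API):
// this.loadRecipients = makeDebouncedLoader(
//   () => $('.select2-search__field').val(),
//   (query) => data.transmitRequest('lookupcontacts', { search: query.term },
//     (resp) => query.callback({ results: resp.items })),
//   500
// );
```

Keeping the lookup behind a single callback means that if a future select2 release changes its internals, only that one line needs to be updated.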