Puppeteer block for element to appear without sleeping/waiting - javascript

I want to loop through an array containing URLs and push an element into another array.
This is the code I used:
for (var i = 0; i < links.length; i++) {
  await page.goto(links[i], { timeout: 0, waitUntil: ['domcontentloaded'] });
  await page.waitFor(20000);
  var values = await page.evaluate(
    () => [...document.querySelectorAll('.XYZ')]
      .map(element => element.getAttribute('src'))
  ); // get the elements' src attributes
  media.push(values); // push to array
  console.log(media);
}
This code works. However, notice that the third line is await page.waitFor(20000);.
I am using it to wait until the page has loaded.
If I omit this line, the variable called values is sometimes undefined.
I experimented with other delay values: the lower the delay, the more likely values is to be undefined.
What is the proper way to loop through the array without wasting unnecessary time on big delays?
Shouldn't this happen automatically, since I am using waitUntil: ['domcontentloaded'] in the page.goto() method?

Since you are using the evaluate method to retrieve all elements with the class XYZ, why not use page.waitForSelector() to make Puppeteer wait until an element with that class is present in the DOM? Note that waitForSelector resolves as soon as the first match appears, so if the elements are added one by one, wait for a selector that only matches once the last of them has loaded.
That way you know the elements you are interested in have loaded before your evaluate call runs.
This is much more efficient than a hardcoded 20-second wait on every iteration. As a rule, avoid hardcoded waits in automation.
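As a minimal sketch of that suggestion, the loop could wait for the selector rather than for a fixed delay. The helper name collectSources and its default selector are illustrative assumptions, not part of the question's code. The reason the delay seemed necessary is that domcontentloaded only signals that the initial HTML has been parsed, so elements added later by scripts may still be missing at that point:

```javascript
// Sketch: visit each URL and collect the `src` of every element matching `selector`.
// `page` is assumed to be a Puppeteer Page; `collectSources` is a hypothetical helper name.
async function collectSources(page, links, selector = '.XYZ') {
  const media = [];
  for (const link of links) {
    await page.goto(link, { timeout: 0, waitUntil: 'domcontentloaded' });
    // Resolves once at least one element matching `selector` is in the DOM
    // (rejects if none appears before Puppeteer's timeout expires).
    await page.waitForSelector(selector);
    // $$eval runs the callback inside the page with an array of the matching elements.
    const values = await page.$$eval(selector, elements =>
      elements.map(element => element.getAttribute('src'))
    );
    media.push(values);
  }
  return media;
}
```

Called as const media = await collectSources(page, links);, this returns one array of src values per URL, and each iteration only waits as long as the page actually needs.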

Related

'onElementReady' detecting simultaneous elements

I saw this code: https://gist.github.com/sidneys/ee7a6b80315148ad1fb6847e72a22313
This is pretty nice, a way to detect changes inside a page. I have the following code in addition to the above one:
(async () => {
const elems = await Promise.all([
'elem1',
'elem2',
'elem3',
].map(e => onElementReady(e, true)));
console.log(elems);
})();
The reason I need to check for multiple elements is that elements load via AJAX in an unpredictable order, and waiting for a single element isn't enough because it may appear before the page is fully loaded.
But this code, which was provided by #wOxxOm, doesn't work well when I need to call it every time a new page has loaded through AJAX. It works the first time it runs; after that, calling the function that contains the code above again no longer works.
Can you please help me?

Protractor - how to check that all instances of a class are not present in the DOM?

I need to check that no elements of a class are present in the DOM. Say, I want all the elements with the class .loading to not be present in the DOM. I know I can do this:
browser.wait(EC.stalenessOf($$('.loading')), 5000);
My question is whether this code will wait for all the loading class to go away or just the first one? If it waits for only the first one, how will I make it work for all of them? Thanks in advance :)
Yes, this should wait until ALL elements matching the locator are no longer present.
But for the future: when in doubt, you can write your own function instead of using the ExpectedConditions library. In this case, you could do
let loading = $$('.loading');
await browser.wait(
async () => (await loading.count()) === 0,
5000,
`message on failure`
);
In fact, this is what I'm using to handle multiple loading animations ;-)

How to scrape Instagram post URLs using Puppeteer (Node.js application)

With all the changes to the current Instagram API I was trying to build a scraper. After some looking around I found puppeteer. Although it seems really straightforward I am running into a problem I can't seem to wrap my head around.
The problem is the following:
I know what the div tag of a post is (.v1Nh3.kIKUG._bz0w) and how to query for it (elements = await page.$$('.v1Nh3.kIKUG._bz0w');).
If I understand the $$ function correctly, this should return a promise that resolves to an array of all the posts on 'page'.
My first question is whether this assumption is correct, and my second is how I can get the array out of it. (And, if that all works, how to get the redirect URL contained in the child href.)
First things first: since Instagram is a heavy javascript-powered React application, the selectors you are after may not be available right after the page is loaded. So we should wait for them to appear in the DOM:
await page.waitForSelector('.v1Nh3.kIKUG._bz0w');
Now with page.evaluate we get the posts, but since you only want the links inside of those posts, let's grab them right away in the query:
const result = await page.evaluate(() => {
  // Get elements into a NodeList
  const elements = document.querySelectorAll('.v1Nh3.kIKUG._bz0w a');
  ...
});
But we can't just convert the elements from a NodeList to an Array and return them, because they're still DOM nodes: complex, unserializable objects. Values returned from page.evaluate need to be serializable. So instead of returning the complete nodes, we'll return just what we need, the URLs from the href attribute:
const result = await page.evaluate(() => {
// Get elements into a NodeList
const elements = document.querySelectorAll('.v1Nh3.kIKUG._bz0w a');
// Convert elements to an array,
// then for each item of that array only return the href attribute
const linksArr = Array.from(elements).map(link => link.href);
return linksArr;
});
Other ways to do it
In your question you mentioned the page.$$ method. It is indeed applicable here to get handles to the elements we seek. But the code to iterate over them is not pretty:
const results = await page.$$('.v1Nh3.kIKUG._bz0w a');
for (const handle of results) {
  // getProperty returns a JSHandle; jsonValue() extracts its serializable value
  console.log(await (await handle.getProperty('href')).jsonValue());
}
My favourite way to get those links though would be to use page.$$eval method:
const results = await page.$$eval('.v1Nh3.kIKUG._bz0w a', links => links.map(link => link.href));
It does exactly what we did in the page.evaluate solution, but much more concisely.
To get elements with a certain class and return data from them, you can use the page.evaluate method. This is an asynchronous call which returns a promise.
So, in your use case, it should look like this:
const result = await page.evaluate(() => {
  let elements = document.querySelectorAll('.v1Nh3.kIKUG._bz0w');
  let elementsArr = [];
  // Loop over the elements and create a plain object from each one
  // with the data relevant to your logic
  for (let element of elements) {
    elementsArr.push({
      //your logic
    });
  }
  return elementsArr;
});

Run several small tests within one 'it' in an E2E test using Protractor

I am working on an E2E test for a single-page web application in Angular2.
There are lots of clickable tags on the page (clicking them does not redirect to another page but triggers some CSS effect), with some logic between them. What I am trying to do is:
randomly click a tag,
check whether the response from the page is correct (this requires grabbing many components from the page),
then unclick it.
I set two constants, totalRound and ITER: I load the web page totalRound times, and within each page load I randomly choose and click a button ITER times.
My code structure is like:
let totalRound: number = 10;
let ITER: number = 100;
describe('XX Test', () => {
let page: AppPage;
beforeEach(() => {
page = new AppPage();
});
describe('Simulate User\'s Click & Unclick',() => {
for(let round = 0; round < totalRound; round++){
it('Click Simulation Round ' + round, () =>{
page.navigateTo('');
let allTagFinder = element.all(by.css('someCSS'));
allTagFinder.getText().then(function(tags){
let isMatched: boolean = true;
let innerTurn = 0;
for(let i = 0; i < ITER; i++){
/* Randomly select a button from allTagFinder,
using async func. eg. getText() to get more info
about the page, then check if the logic is correct or not.
If not correct, set the local variable isMatchTemp to false */
isMatched = isMatched && isMatchTemp;
innerTurn += 1;
if(innerTurn == ITER - 1){
expect(isMatched).toEqual(true);
}
}
});
});
}
});
});
I want to get a result after every ITER button checks within a page load. Inside the for loop, the code contains nested async calls such as getText().
Most of the time the code performs correctly (the button checks appear to run sequentially). But sometimes the information from two iterations seems to get mixed up. I guess there is a problem with how my code is structured around the async calls.
I thought JS was single-threaded (I haven't taken an OS course, so correct me if I'm wrong). So in the for loop, after all the async functions have been started, do the nested async functions (one per iteration) still have to run one by one, as I wish? Is that why the code behaves as I hope most of the time?
I tried to add a lock in the for loop,
like:
while(i > innerTurn){
;
}
I hoped this would force the loop to run sequentially: the async functions for indices 1 to ITER-1 would have to wait until the first one finished its work and incremented innerTurn by 1. But with this lock, not even the first async function (i=0) ever returns...
Finally, I used promises to solve the problem.
Basically, I put every small sync/async function into a separate promise, then used chaining to make sure each later function is only called after the previous one has resolved.
For the ITER for-loop problem, I used recursion plus promises:
var clickTest = function(prefix, numLeft, ITER, tagList, tagGsLen){
  if(numLeft == 0){
    return Promise.resolve();
  }
  return singleClickTest(prefix, numLeft, ITER, tagList, tagGsLen).then(function(){
    // return the recursive call so the chain waits for the remaining rounds
    return clickTest(prefix, numLeft - 1, ITER, tagList, tagGsLen);
  }).catch((hasError) => { expect(hasError).toEqual(false); });
}
So each single click test resolves its promise when it has finished. Only then is the next round run, with numLeft decreased by 1. The whole test ends when numLeft reaches 0.
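The same sequencing can also be sketched with async/await, which tends to read more simply than recursion. The busy-wait lock from the question cannot work because a while loop that never yields blocks the single JavaScript thread, so no promise callback ever gets a chance to increment innerTurn. In the sketch below, runSequentially and fakeClickTest are hypothetical names; fakeClickTest stands in for a real click/check/unclick step and deliberately finishes faster on later iterations, so any overlap between iterations would show up as an out-of-order log:

```javascript
// Hypothetical stand-in for one click/check/unclick step.
// Later iterations use shorter delays, so overlapping runs would finish out of order.
function fakeClickTest(i, log) {
  const delayMs = Math.max(0, 30 - 10 * i);
  return new Promise(resolve => {
    setTimeout(() => {
      log.push(i);
      resolve();
    }, delayMs);
  });
}

// Runs `step` for i = 0..iterations-1, starting each step only after the previous one resolved.
async function runSequentially(iterations, step = fakeClickTest) {
  const log = [];
  for (let i = 0; i < iterations; i++) {
    await step(i, log); // suspends this function without blocking the event loop
  }
  return log;
}
```

Here runSequentially(4) resolves with the indices in ascending order even though the later steps are faster. In a spec, assuming the test setup allows async functions as it callbacks, the loop would be awaited before the final expect.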
I also tried rewriting the whole program in Python, where the code runs sequentially without extra effort. I didn't run into the problems I had with Protractor, and everything worked on my first try. The application I need to test has relatively simple logic, so plain Selenium seemed a better choice for me: it does not need to run alongside the front-end code (it just visits the web app URL, grabs data, and processes it), and I am more confident with Python.

Is it possible to show an element just before entering a long running sync process?

This is a very simple use case. Show an element (a loader), run some heavy calculations that eat up the thread and hide the loader when done. I am unable to get the loader to actually show up prior to starting the long running process. It ends up showing and hiding after the long running process. Is adding css classes an async process?
See my jsbin here:
http://jsbin.com/voreximapewo/12/edit?html,css,js,output
To explain what a few others have pointed out: This is due to how the browser queues the things that it needs to do (i.e. run JS, respond to UI events, update/repaint how the page looks etc.). When a JS function runs, it prevents all those other things from happening until the function returns.
Take for example:
function work() {
var arr = [];
for (var i = 0; i < 10000; i++) {
arr.push(i);
arr.join(',');
}
document.getElementsByTagName('div')[0].innerHTML = "done";
}
document.getElementsByTagName('button')[0].onclick = function() {
document.getElementsByTagName('div')[0].innerHTML = "thinking...";
work();
};
(http://jsfiddle.net/7bpzuLmp/)
Clicking the button here will change the innerHTML of the div, and then call work, which should take a second or two. And although the div's innerHTML has changed, the browser doesn't have a chance to update how the actual page looks until the event handler has returned, which means waiting for work to finish. But by that time, the div's innerHTML has changed again, so that when the browser does get a chance to repaint the page, it simply displays 'done' without displaying 'thinking...' at all.
We can, however, do this:
document.getElementsByTagName('button')[0].onclick = function() {
document.getElementsByTagName('div')[0].innerHTML = "thinking...";
setTimeout(work, 1);
};
(http://jsfiddle.net/7bpzuLmp/1/)
setTimeout works by putting a call to a given function at the back of the browser's queue after the given time has elapsed. The fact that it's placed at the back of the queue means that it'll be called after the browser has repainted the page (since the previous HTML-changing statement would've queued up a repaint before setTimeout added work to the queue), and therefore the browser has had a chance to display 'thinking...' before starting the time-consuming work.
So, basically, use setTimeout.
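As a rough sketch of the same idea in reusable form, the deferral can be wrapped in a promise so the caller knows when the heavy work has finished. The name runAfterYield is a hypothetical helper, not a browser API:

```javascript
// Hypothetical helper: defer `work` to a later task so the current handler can return first.
function runAfterYield(work, delayMs = 0) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      try {
        resolve(work()); // the synchronous heavy work runs here, in its own task
      } catch (err) {
        reject(err);
      }
    }, delayMs);
  });
}
```

In the click handler you would first set the div to "thinking...", then call runAfterYield(work) and hide the loader in its .then() callback. Because the handler returns before work starts, the browser usually gets the opportunity to paint the loader in between, although a timer on its own does not strictly guarantee that a paint has happened.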
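Let the current frame render and start the process after setTimeout(1).
Alternatively, you could read a layout property such as element.clientWidth to force a synchronous reflow. Note that a reflow recalculates layout but does not by itself guarantee a paint, so this may not be enough to make the loader visible.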
As more of a "what is possible" answer: you can run your calculations on a separate thread using HTML5 Web Workers.
This will not only make your loading icon appear but also keep it animating while the work runs.
More info about web workers : http://www.html5rocks.com/en/tutorials/workers/basics/
