WebDriverIO iterate through URL and scroll through each - javascript

I'm using WebDriverIO (v5.18.7) and I'm trying to write something that can go to each URL and scroll down in increments until it reaches the bottom, then move on to the next URL. The issue I'm having is with the scrolling portion of the script: it might scroll once before it moves on to the next URL.
From what I understand, the WebDriverIO documentation says the commands are sent asynchronously and handled by the framework behind the scenes. So I tried sticking with the framework and tried browser.execute / browser.executeAsync, but wasn't able to get it working.
Here's what I have that seems close to what I want. Any guidance would be appreciated!
const { doesNotMatch } = require('assert');
const assert = require('assert');
const { Driver } = require('selenium-webdriver/chrome');

// variable for list of URLs
const arr = browser.config.urls;

describe('Getting URL and scrolling', () => {
    it('Get URL and scroll', async () => {
        // let i = 0;
        for (const value of arr) {
            await browser.url(value);
            await browser.execute(() => {
                const elem = document.querySelector('.info-section');
                // scroll until this reaches the end.
                // add another for loop with a counter?
                elem.scrollIntoView(); // Using this for now as a placeholder
            });
            // i += 1;
        }
    });
});

Short answer: $('.info-section').scrollIntoView()
See https://webdriver.io/docs/api/element/scrollIntoView.html
WebdriverIO supports sync and async modes, see https://webdriver.io/docs/sync-vs-async.html
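For the incremental scrolling part specifically, you can loop inside the test and let browser.execute report whether the bottom has been reached. A minimal sketch, assuming the async mode used in the question; the 500px step, the 250ms pause and the selector-free scroll are placeholders to tune:

describe('Getting URL and scrolling', () => {
    it('Get URL and scroll', async () => {
        for (const value of browser.config.urls) {
            await browser.url(value);
            let atBottom = false;
            while (!atBottom) {
                // Scroll one step and report whether the bottom of the page has been reached.
                atBottom = await browser.execute(() => {
                    window.scrollBy(0, 500);
                    return window.innerHeight + window.scrollY >= document.body.scrollHeight;
                });
                await browser.pause(250); // give lazy-loaded content a moment to render
            }
        }
    });
});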


Vscode move to line X after openTextDocument

I'm developing a VS Code extension that jumps to a specific file:line, but I'm stuck at the step of moving the cursor to a specific line after opening a file.
How can I achieve this?
export const openAndMoveToLine = async (file_line: string) => {
    // /home/user/some/path.php:10
    let [filename, line_number] = file_line.split(":")
    // opening the file => OK
    let setting: vscode.Uri = vscode.Uri.parse(filename)
    let doc = await vscode.workspace.openTextDocument(setting)
    vscode.window.showTextDocument(doc, 1, false);
    // FIXME: After being opened, now move to line X => NOK
    await vscode.commands.executeCommand("cursorMove", {
        to: "down", by: 'wrappedLine',
        value: parseInt(line_number)
    });
}
Thank you
It can be done with the TextDocumentShowOptions easily:
const showDocOptions = {
    preserveFocus: false,
    preview: false,
    viewColumn: 1,
    // replace with your line_number
    selection: new vscode.Range(314, 0, 314, 0)
};
let doc = await vscode.window.showTextDocument(setting, showDocOptions);
You will first need to get access to the active editor. This is done by adding a .then to the showTextDocument call (which returns a Thenable) that resolves to a text editor object. You can then use the textEditor variable (as in the example) to set the cursor position through its selection property, as follows:
vscode.window.showTextDocument(doc, 1, false).then((textEditor: TextEditor) => {
    const lineNumber = 1;
    const characterNumberOnLine = 1;
    const position = new vscode.Position(lineNumber, characterNumberOnLine);
    const newSelection = new vscode.Selection(position, position);
    textEditor.selection = newSelection;
});
A reference to the selection API can be found here.
The use case you are exploring has been discussed in a GitHub issue that can be found here.
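If setting the selection does not scroll the editor far enough on its own, a revealRange call on the returned editor makes the target line visible. A minimal sketch combining both ideas (plain JavaScript; the zero-based line conversion is an assumption about the file:line input from the question):

const vscode = require('vscode');

// Sketch: open a file, move the cursor to the given line, and scroll it into view.
const openAndMoveToLine = async (file_line) => {
    const [filename, lineNumber] = file_line.split(':');
    const doc = await vscode.workspace.openTextDocument(vscode.Uri.file(filename));
    const editor = await vscode.window.showTextDocument(doc, { viewColumn: 1, preview: false });
    // vscode.Position is zero-based, so subtract 1 from the 1-based line number.
    const position = new vscode.Position(parseInt(lineNumber, 10) - 1, 0);
    editor.selection = new vscode.Selection(position, position);
    editor.revealRange(new vscode.Range(position, position), vscode.TextEditorRevealType.InCenter);
};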

How do I continuously listen for a new item while scraping a website

I am using puppeteer to scrape a website that is being live updated, in order to report the latest item elsewhere.
Currently the way I was thinking of accomplishing this is to run a setInterval call on my async scrape and compare whether the last item has changed, checking every 30 seconds. I assume there has to be a better way of doing this than that.
Here is my current code:
const puppeteer = require('puppeteer');

const playtracker = async () => {
    console.log('loading');
    const browser = await puppeteer.launch({});
    const page = await browser.newPage();
    await page.goto('URL-Being-Scraped');
    await page.waitForSelector('.playlist-tracklist-view');
    let html = await page.$$eval('.playlist-tracklist-view > .playlist-track', tracks => {
        tracks = tracks.filter(track => track.querySelector('.playlist-trackname').textContent);
        tracks = tracks.map(el => el.querySelector('.playlist-trackname').textContent);
        return tracks;
    });
    console.log('logging', html[html.length - 1]);
};

setInterval(playtracker, 30000);
There is an API called MutationObserver. You can check it out on MDN: https://developer.mozilla.org/en-US/docs/Web/API/MutationObserver
What it does, basically, is run whatever you want whenever a specific element changes. Let's say you have a list you want to listen to. What you would do is:
const listElement = document.querySelector(/* your list element selector */);

const callbackFunc = function foo() {
    // do something
};

const yourMutationObserver = new MutationObserver(callbackFunc);
// observe() needs an options object describing what to watch, e.g. added/removed children
yourMutationObserver.observe(listElement, { childList: true, subtree: true });
You can disconnect your MutationObserver with the yourMutationObserver.disconnect() method whenever you want.
This could help too if you're confused about how to implement it: https://stackoverflow.com/a/48145840/14138428
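To avoid polling from puppeteer entirely, one option is to expose a Node callback to the page and let a MutationObserver inside the page call it whenever the list changes. A minimal sketch, reusing the .playlist-tracklist-view / .playlist-trackname selectors from the question (the exposed function name is made up for the example):

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({});
    const page = await browser.newPage();
    await page.goto('URL-Being-Scraped');
    await page.waitForSelector('.playlist-tracklist-view');

    // Called from the page context every time the observer fires.
    await page.exposeFunction('onTracklistChanged', latestTrack => {
        console.log('latest track:', latestTrack);
    });

    // Attach a MutationObserver in the page that reports the last track name on every change.
    await page.evaluate(() => {
        const list = document.querySelector('.playlist-tracklist-view');
        const report = () => {
            const names = [...list.querySelectorAll('.playlist-trackname')].map(el => el.textContent);
            window.onTracklistChanged(names[names.length - 1]);
        };
        new MutationObserver(report).observe(list, { childList: true, subtree: true });
        report(); // report the current state once on startup
    });
})();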

StaleElementReferenceError on iterations

My application gets a list of IDs from the db. I iterate over these with a cursor and, for every ID, I plug it into a URL with Selenium to get specific items on a page. This does a search on a keyword and gets the most relevant item for that search. There are around 1000 results from the db. At random iterations, one of the driver actions throws a StaleElementReferenceError with the full message of:
stale element reference: element is not attached to the page document\n (Session info: chrome=77.0.3865.75)
Looking at the official docs I can see that the 2 common causes for this are:
The element has been deleted entirely.
The element is no longer attached to the DOM.
With the former being the most frequent cause.
index.js
const { MongoClient, ObjectID } = require('mongodb')
const fs = require('fs')
const path = require('path')
const { Builder, Capabilities, until, By } = require('selenium-webdriver')
const chrome = require('selenium-webdriver/chrome')
require('dotenv').config()

async function init() {
    try {
        const chromeOpts = new chrome.Options()
        const ids = fs.readFileSync(path.resolve(__dirname, '..', 'data', 'primary_ids.json'), 'utf8')
        const client = await MongoClient.connect(process.env.DB_URL || 'mongodb://localhost:27017/test', {
            useNewUrlParser: true
        })
        const db = client.db(process.env.DB_NAME || 'test')
        const productCursor = db.collection('product').find(
            {
                accountId: ObjectID(process.env.ACCOUNT_ID),
                primaryId: {
                    $in: JSON.parse(ids)
                }
            },
            {
                _id: 1,
                primaryId: 1
            }
        )
        const resultsSelector = 'body #wrapper div.src-routes-search-style__container--2g429 div.src-routes-search-style__products--3rsz9'
        const mostRelevantSelector = `${resultsSelector}
            > div:nth-child(2)
            > div.src-routes-search-product-item-raw-style__product--3vH_O:nth-child(1)`
        const titleContainerSelector = `${mostRelevantSelector}
            > div.src-routes-search-product-item-raw-style__mainPart--1HEWx
            > div.src-routes-search-product-item-raw-style__containerText--3NefD
            > div.src-routes-search-product-item-raw-style__description--3swql
            > div.src-routes-search-product-item-raw-style__titleContainer--tazkH`
        const productImageSelector = `${mostRelevantSelector}
            > div.src-routes-search-product-item-raw-style__mainPart--1HEWx
            > div.src-routes-search-product-item-raw-style__containerImages--1PfdF
            > a.src-routes-search-product-item-raw-style__productImage--1Y42Y
            > img`
        const linkSelector = `${titleContainerSelector} > a`
        const primaryIdSelector = `${titleContainerSelector} > p`
        chromeOpts.setChromeBinaryPath('/usr/local/bin')
        const driver = await new Builder()
            .withCapabilities(Capabilities.chrome())
            .forBrowser('chrome')
            .build()
        let newProds = {}
        let product
        let i = 0
        while (await productCursor.hasNext()) {
            i += 1
            product = await productCursor.next()
            let searchablePrimaryId = product.primaryId
            let link
            let primaryId
            let pId
            let href
            let img
            let imgSrc
            if (product.primaryId.includes('#')) {
                searchablePrimaryId = product.primaryId.substr(0, product.primaryId.indexOf('#'))
            }
            if (searchablePrimaryId.includes('-')) {
                searchablePrimaryId = searchablePrimaryId.substr(0, searchablePrimaryId.indexOf('-'))
            }
            await driver.get(`https://icecat.biz/en/search?keyword=${encodeURIComponent(searchablePrimaryId.toLowerCase())}`)
            link = await driver.wait(until.elementLocated(By.css(linkSelector)), 10000) // wait 10 seconds
            img = await driver.wait(until.elementLocated(By.css(productImageSelector)), 10000)
            imgSrc = await img.getAttribute('src')
            primaryId = await driver.wait(until.elementLocated(By.css(primaryIdSelector)), 10000)
            pId = await primaryId.getText()
            href = await link.getAttribute('href')
            const iceCatId = href.substr(href.lastIndexOf('-') + 1, href.length)
            const _iceCatId = iceCatId.substr(0, iceCatId.indexOf('.html'))
            const idFound = (searchablePrimaryId.toUpperCase() === pId.toUpperCase()) && !imgSrc.includes('logo-fullicecat')
            newProds[product._id.toString()] = {
                primaryId: product.primaryId,
                iceCatId: idFound ? _iceCatId : 'N/A'
            }
        }
        const foundProducts = Object.values(newProds).filter(prod => prod.iceCatId !== 'N/A')
        console.log(`\nFound ${foundProducts.length}/${JSON.parse(ids).length}`)
        fs.writeFileSync(path.resolve(__dirname, '..', 'data', 'new_products.json'), JSON.stringify(newProds, null, 4), 'utf8')
        driver.quit()
    } catch (err) {
        throw err
    }
}

init()
    .then(res => {
        console.log(res)
    })
    .catch(err => {
        console.error(err)
    })
To debug, I put a try...catch around each of the driver actions to see which specific action was failing, but that didn't help because it was never a consistent action that failed. For example, sometimes it would be one of the elementLocated lines, and other times it would just be the getAttribute action.
If it is the latter, that is what confuses me about this error: surely Selenium has already found the element on the page (i.e. link), so how can it then fail to do getAttribute('src') on it? I imagine I must be doing something wrong with how I am setting up Selenium to handle iterations. The iterations never get higher than 110.
In your case the second cause applies: the element is no longer attached to the DOM. If a WebElement is located and the DOM is refreshed afterwards, that element becomes stale even if the DOM hasn't changed; the same locator will return a new WebElement.
Normally, driver.get() blocks until the page is fully loaded, but this site runs JavaScript to load the search results. You can test this by running document.readyState in the developer tools console: you will see "complete" while the search results are still loading.
The page shows a spinner before the results are rendered, so hopefully it is enough to wait for the spinner to appear and then become stale before scraping the page:
await driver.get(`https://icecat.biz/en/search?keyword=${encodeURIComponent(searchablePrimaryId.toLowerCase())}`)
// Wait for the loading spinner to appear, then for it to go stale (i.e. the results have rendered).
let spinner = await driver.wait(until.elementLocated(By.className('src-routes-search-style__loader---acti')), 10000)
await driver.wait(until.stalenessOf(spinner), 10000)
link = await driver.wait(until.elementLocated(By.css(linkSelector)), 10000)
You aren't waiting for the Ajax requests to finish. The website retrieves and refreshes the DOM once you go to the end, and it also keeps calling index every few seconds, so the DOM probably keeps updating. You could block the AJAX requests, get your results, process them, and enable AJAX again.
Could you try removing await from imgSrc = await img.getAttribute('src')? The wait for img is already handled on the previous line.
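Another generic workaround, when the DOM can refresh at any moment, is to re-locate the element and retry the read whenever a StaleElementReferenceError is thrown. A minimal sketch of such a helper for selenium-webdriver (not from the answers above, just a common pattern; the retry count is arbitrary):

const { error } = require('selenium-webdriver');

// Retry an action a few times if the element goes stale between locating and reading it.
async function withStaleRetry(action, retries = 3) {
    for (let attempt = 1; attempt <= retries; attempt++) {
        try {
            return await action();
        } catch (err) {
            if (!(err instanceof error.StaleElementReferenceError) || attempt === retries) throw err;
            // The DOM refreshed underneath us; loop and locate the element again.
        }
    }
}

// Usage inside the loop: locate and read in one retried unit.
imgSrc = await withStaleRetry(async () => {
    const img = await driver.wait(until.elementLocated(By.css(productImageSelector)), 10000);
    return img.getAttribute('src');
});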

Cypress request wait by default?

I need Cypress to wait for any XHR requests to complete by default before performing any operations. Is there any way to make this the default, or are there other alternatives? The application I am testing is slow and makes a lot of API calls.
Edit: Writing a single statement for every API request is getting messy and is unnecessary work. I need a way to make this easier.
If what you want is to wait for a specific XHR, you can do it by making use of cy.route(). I use this in some scenarios and it is really useful. The general steps to use it are:
cy.server()
cy.route('GET','**/api/my-call/**').as('myXHR');
Do things in the UI such as clicking on a button that will trigger such api calls
cy.wait('@myXHR')
This way, if such a call isn't triggered, your test will fail. You can find extensive documentation about this here.
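Put together, a minimal test sketch could look like the following (the visited URL, the button selector and the results assertion are placeholders; cy.server/cy.route matches the Cypress versions current when this question was asked):

describe('slow API calls', () => {
    it('waits for the aliased XHR before asserting', () => {
        cy.server();
        cy.route('GET', '**/api/my-call/**').as('myXHR');

        cy.visit('/some-page');                  // placeholder URL
        cy.get('[data-cy=load-button]').click(); // placeholder trigger for the API call

        cy.wait('@myXHR');                       // fails the test if the call never happens
        cy.get('.results').should('be.visible'); // placeholder assertion
    });
});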
Found something that works for me here https://github.com/PinkyJie/cypress-auto-stub-example
Look for cy.waitUntilAllAPIFinished
I partially solved the problem by adding a waitAll command and overwriting the route command in the support folder:
const routeCallArr = [];

Cypress.Commands.overwrite('route', (route, ...params) => {
    const localRoute = route(...params);
    if (localRoute.alias === undefined) return;
    localRoute.onRequest = function () {
        routeCallArr.push({ alias: `@${localRoute.alias}`, startTime: Date.now() });
    };
    localRoute.onResponse = function () {
        clearCall(`@${localRoute.alias}`); // clearCall is not shown in this snippet
    };
});

const waitAll = (timeOut = 50000, options = { verbose: false, waitNested: false }) => {
    const filterRouteCallArr = [];
    const date = Date.now();
    for (const routeCall of routeCallArr) {
        if ((date - routeCall.startTime) > timeOut) continue;
        filterRouteCallArr.push(routeCall.alias);
    }
    if (options.verbose) {
        console.table(routeCallArr.map(routeCall => ({
            deltaTime: date - routeCall.startTime,
            alias: routeCall.alias,
            startTime: routeCall.startTime,
        })));
        console.log(routeCallArr, filterRouteCallArr);
    }
    routeCallArr.length = 0;
    if (filterRouteCallArr.length > 0) {
        const waiter = cy.wait(filterRouteCallArr, { timeout: timeOut });
        options.waitNested && waiter.then(() => {
            if (routeCallArr.length > 0) {
                waitAll(timeOut, options);
            }
        });
    }
};

Cypress.Commands.add('waitAll', waitAll);
And in the test, instead of using cy.wait(['@call01', ..., '@callN']); I use cy.waitAll();
The problem with this implementation comes when there are nested calls in a relatively separate time interval from the original calls. In that case you can use a recursive wait: cy.waitAll(50000, {waitNested: true});

Scraping the same page forever using puppeteer

I'm doing some scraping. How can I stay on a page and read its content to search for data every xx seconds without refreshing the page? I'm doing it the way shown below, but the PC crashes after some time. Any ideas on how to make it efficient? I would like to achieve it without using while (true). The readOdds function does not always take the same amount of time.
//...
while (true) {
    const html = await page.content();
    cant = await readOdds(html); // some code with the html
    console.info('Waiting 5 seconds to read again...');
    await page.waitFor(5000);
}
This is a section of readOdds:
async function readOdds(htmlPage) {
    try {
        var savedat = functions.mysqlDateTime(new Date());
        var pageHtml = htmlPage.replace(/(\r\n|\n|\r)/gm, "");
        var exp_text_all = /<coupon-section(.*?)<\/coupon-section>/g;
        var leagueLinksMatches = pageHtml.match(exp_text_all);
        var cmarkets = 0;
        let reset = await mysqlfunctions.promise_updateMarketsCount(cmarkets, table_markets_count, site);
        console.log(reset);
        if (leagueLinksMatches == null) {
            return cmarkets;
        }
        for (let i = 0; i < leagueLinksMatches.length; i++) {
            const html = leagueLinksMatches[i];
            var expc = /class="title ellipsis-text">(.*?)<\/span/g;
            var nameChampionship = functions.getDataInHtmlCode(String(html).match(expc)[0]);
            var idChampionship = await mysqlfunctions.promise_db_insert_Championship(nameChampionship, gsport, table_championship);
            var exp_text = /<ui-event-line(.*?)<\/ui-event-line>/g;
            var text = html.match(exp_text);
            // console.info(text.length);
            for (let index = 0; index < text.length; index++) {
                const element = text[index];
                ....
Simple Solution with recursive callback
However, before we go into that, you can try running the function itself recursively instead of using while, which loops forever without any proper control.
const readLoop = async () => {
    const html = await page.content();
    cant = await readOdds(html);
    return readLoop(); // run the loop again
};

// invoke it for infinite callbacks without any delays at all
await readLoop();
This will run the same function block continuously, without any delay, for as long as your readOdds function keeps returning. You won't have to use page.waitFor and while.
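If you still want the "every xx seconds" behaviour rather than a tight loop, the same recursive shape works with a delay between iterations. A minimal sketch (the 5-second pause mirrors the original page.waitFor(5000)):

const readLoop = async () => {
    const html = await page.content();
    cant = await readOdds(html);
    await new Promise(resolve => setTimeout(resolve, 5000)); // wait 5 seconds before the next read
    return readLoop(); // run the loop again
};

await readLoop();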
Memory leak prevention
For advanced cases where you need to respawn over a period of time, a queue like bull and a process manager like PM2 come into play. However, a queue will void the "without refreshing the page" part of your question.
You should definitely use pm2, though.
The usage is as follows:
npm i -g pm2
pm2 start index.js --name=myawesomeapp  # or your app file
There are a few useful arguments:
--max-memory-restart 100M limits memory usage to 100M and restarts the app when it is exceeded.
--max-restarts 50 stops the app once it has restarted 50 times due to errors (or a memory leak).
You can check the logs using pm2 logs myawesomeapp as you set the name above.
