Trying to simulate user inputs in CefSharp3 (OffScreen) using JavaScript

I am trying to simulate user operations in CefSharp (OffScreen) using JavaScript.
Once I load the page (https://www.w3.org), I want to:
type a search term into the search bar,
click the search button,
open the first result.
So I have used:
await browser.EvaluateScriptAsync("document.getElementsByName('q')[0].value = 'CSS';");
await browser.EvaluateScriptAsync("document.getElementById('search-submit').click();");
await browser.EvaluateScriptAsync("document.getElementById('r1-0').click();");
The issue I am facing is that, to take a screenshot, I have to use Thread.Sleep(x) between the steps and at the end, so the pages can load before the next operation or the screenshot.
Is there any way to avoid the sleep and detect when the loading is done, so I can perform the next operation?
I tried ExecuteScriptAsync as well; same issue with that too.

You can modify and use the LoadPageAsync() function from the OffScreenExample for this purpose.
Just alter the parameters by removing the url, as you won't be using it, and remove the if statement that uses the url.
Then call LoadPageAsync after each EvaluateScriptAsync that triggers a navigation, passing the corresponding browser object.
The function will look like this after the modification:
public static Task LoadPageAsync(IWebBrowser browser)
{
    var tcs = new TaskCompletionSource<bool>();

    EventHandler<LoadingStateChangedEventArgs> handler = null;
    handler = (sender, args) =>
    {
        // Wait until the browser has finished loading, then unsubscribe.
        if (!args.IsLoading)
        {
            browser.LoadingStateChanged -= handler;
            // TrySetResultAsync is CefSharp's helper extension; it completes
            // the task on a ThreadPool thread to avoid blocking the CEF thread.
            tcs.TrySetResultAsync(true);
        }
    };

    browser.LoadingStateChanged += handler;

    return tcs.Task;
}
Usage will look like this:
await browser.EvaluateScriptAsync("document.getElementsByName('q')[0].value = 'CSS';");
await browser.EvaluateScriptAsync("document.getElementById('search-submit').click();");
await LoadPageAsync(browser);
await browser.EvaluateScriptAsync("document.getElementById('r1-0').click();");
await LoadPageAsync(browser);

Related

Close the page after a certain interval [Puppeteer]

I have used puppeteer in one of my projects to open webpages in headless Chrome, perform some actions, and then close the page. These actions, however, are user dependent. I want to attach a lifetime to the page, so that it closes automatically after, say, 30 minutes of being opened, irrespective of whether any action is performed or not.
I have tried the setTimeout() functionality of Node.js, but it didn't work (or I just couldn't figure out how to make it work).
I have tried the following:
const puppeteer = require('puppeteer-core');

const browser = await puppeteer.connect({ browserURL: browser_url });
const page = await browser.newPage();
// timer starts ticking here upon creation of the new page
// (maybe in a subroutine, so it doesn't block the main thread)
/**
..
Do something
..
*/
// timer ends and closePage() is triggered.
const closePage = (page) => {
    if (!page.isClosed()) {
        page.close();
    }
}
But this gives me the following error:
Error: Protocol error: Connection closed. Most likely the page has been closed.
Your provided code should work as expected. Are you sure the page is still open after the timeout, and that it is indeed the same page?
You can try this wrapper for opening pages and closing them correctly.
// since it is async it won't block the event loop;
// using `await` will allow other functions to execute.
async function openNewPage(browser, timeoutMs) {
    const page = await browser.newPage()

    setTimeout(async () => {
        // use try/catch to avoid unhandled promise rejections.
        try {
            if (!page.isClosed()) {
                await page.close()
            }
        } catch (err) {
            console.error('unexpected error occurred when closing page.', err)
        }
    }, timeoutMs)

    // return the page so callers can use it (missing in the original snippet)
    return page
}
// use it like so.
const browser = await puppeteer.connect({browserURL: browser_url});
const min30Ms = 30 * 60 * 1000
const page = await openNewPage(browser, min30Ms);
// ...
The above only closes tabs in your browser. To shut down the puppeteer instance itself you would have to call browser.close(), which may be what you want.
page.close returns a promise, so you need to define closePage as an async function and use await page.close(). I believe @silvan's answer should address the issue; just make sure to replace the if condition
if(page.isClosed())
with
if(!page.isClosed())
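For reference, a minimal sketch of the corrected closePage (assuming page is the puppeteer Page from above):
// await the promise returned by page.close(), closing only if still open
const closePage = async (page) => {
    if (!page.isClosed()) {
        await page.close();
    }
}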

How to deep copy a page of puppeteer in javascript?

I'm using puppeteer to navigate my website. I want to wait for an API that sometimes gets called and sometimes doesn't. I'm using
await page.waitForResponse((response => response.url().includes(myurl)), { timeout: 1000 });
to wait for that API. This works fine when the API gets called, but whenever it isn't called, the wait crashes and the page isn't usable anymore. So I want to deep copy the page, so that I can check for the API via its copy; even if that copy gets damaged, I will still have another page I can use.
I think you don't need to copy your page. That's probably not easy to do and seems like overkill. Instead, preventing the page from crashing would be a simpler approach.
Try something like this:
// assumes `page` is a puppeteer Page available in the enclosing scope
async function waitForApi(url, timeoutMs) {
    try {
        console.log('waiting', timeoutMs + 'ms for special API. url:', url);
        const opts = { timeout: timeoutMs || 1000 };
        await page.waitForResponse(response => response.url().includes(url), opts);
        console.log('Special API was called!');
        return true;
    } catch (err) {
        console.log('Special API was apparently not called (or maybe it failed). Error:', err);
        return false;
    }
}
// example call of waitForApi:
const myUrl = '...'
const apiCalled = await waitForApi(myUrl, 1000)
if (apiCalled) {
    // do stuff if you want to..
} else {
    // do stuff if you want to..
}
This now logs whether the API was called or not, and you can handle the two cases differently as needed.

Cypress with react and google API services - how to stub autocomplete

I am trying to test a React webapp (created in a separate project) that contains a popup with an input backed by a Google autocomplete service for cities.
(I changed the text because of the language.)
In "search city" I have a text input; when data is entered, Google searches for cities and returns results (e.g. I search Rome, Italy).
When I press "save data", a function checks the Google results and then closes the popup:
in a file:
export const useGoogleApiDesktop = () => {
    let autocompleteService

    if (window.google && window.google.maps) {
        autocompleteService = new window.google.maps.places.AutocompleteService()
    }
}
in another file (the one that uses it):
const googleApi = useGoogleApiDesktop()

const onSubmitClick = useCallback(async () => {
    [...]
    const res: GoogleApiPlacesResponse = await googleApi.autocompleteService.getPlacePredictions({
        input: addressComputed,
        types: ['(cities)'],
        componentRestrictions: { country: 'it' }
    })
}, [])
When I use it in a plain browser, everything works fine; but if I try to launch it with Cypress to test it, it returns an error.
I am trying to avoid this error and simply go on and close the popup, since during my tests I do not need to type anything into that field; I only need to fill the other textareas and close the popup.
Since I couldn't manage that, I tried to stub the call, but I am totally new to cy.stub() and it does not work:
function selectAddress(bookingConfig) {
    // opens the popup
    cy.get('.reservationsWhereAdd').click()

    // trying to add the google library
    const win = cy.state('window')
    const document = win.document
    const script = document.createElement('script')
    script.src = `https://maps.googleapis.com/maps/api/js?key=[myApiKey]&libraries=places&language=it`
    script.async = true

    // this is commented out since I don't think I need it
    // window.initMap = function () {
    //     // JS API is loaded and available
    //     console.log('lanciato')
    // }

    // Append the 'script' element to 'head'
    document.head.appendChild(script)

    // type something in some fields
    cy.get('#street').type(bookingConfig.street)
    cy.get('#streetNumber').type(bookingConfig.streetNum)
    cy.get('#nameOnTheDoorbell').type(bookingConfig.nameOnTheDoorbell)
    cy.get('#addressAlias').type(bookingConfig.addressAlias)

    // this correctly finds and prints the object
    console.log('--->', win.google.maps.places)

    cy.stub(googleApi.autocompleteService, 'getPlacePredictions')

    // this closes the popup
    cy.get('.flex-1 > .btn').click()
}
This cy.stub, however, does not work, and I don't get why: it says
googleApi is not defined
Any idea how to solve this? Thanks!
UPDATE:
After the error, working in the Cypress window, I manually closed the popup, reopened it, filled the fields, and clicked on save data. It worked, so I added a cy.wait(1000) just after opening the popup, and it works about 95% of the time (9 times out of 10). Any idea how to "wait for the Google API to load, then fill the fields"?
As the update block said, I discovered that the problem was that loading the Google API took a really long time, because it is not local and needs time to be retrieved.
So at first I just put a cy.wait(2000) before executing my code; but that couldn't be the answer: what happens if I run the code on a slow network, or if my application takes longer to load?
So I created a command that first waits for the Google API to load; if it fails to load after 5 attempts, the test fails.
Only after that is my code executed. This way my test won't fail so easily.
Here's the code:
in cypress/support/command.js
Cypress.Commands.add('waitForGoogleApi', () => {
    let mapWaitCount = 0
    const mapWaitMax = 5

    cyMapLoad()

    function cyMapLoad() {
        mapWaitCount++

        cy.window().then(win => {
            if (typeof win.google != 'undefined') {
                console.log(`Done at attempt #${mapWaitCount}:`, win)
                return true
            } else if (mapWaitCount <= mapWaitMax) {
                console.log('Waiting attempt #' + mapWaitCount) // just log
                cy.wait(2000)
                cyMapLoad()
            } else if (mapWaitCount > mapWaitMax) {
                console.log('Failed to load google api')
                return false
            }
        })
    }
})
In the file where you want to use it:
cy.waitForGoogleApi().then(() => {
    // here comes the code to execute after loading the google Apis
})
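As an alternative sketch (my own suggestion, not part of the original answer): Cypress automatically retries should assertions, so you can wait for window.google without manual recursion or fixed cy.wait calls.
// retries until window.google exists, or fails the test after the timeout
cy.window({ timeout: 10000 }).should('have.property', 'google')
If the API never loads, the test fails with a clear assertion error instead of a retry counter running out.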

Retry failed pages with new proxyUrl

I have developed an Actor+PuppeteerCrawler+Proxy based crawler and want to rescrape failed pages. To increase the chance of the rescrape succeeding, I want to switch to another proxyUrl. The idea is to create a new crawler with a modified launchPuppeteer function and a different proxyUrl, and re-enqueue the failed pages. Please check the sample code below.
But unfortunately it doesn't work, although I reset the request queue by using drop and reopening it. Is it possible to rescrape failed pages using PuppeteerCrawler with a different proxyUrl, and if so, how?
Best regards,
Wolfgang
for (let retryCount = 0; retryCount <= MAX_RETRY_COUNT; retryCount++) {
    if (retryCount) {
        // try to reset the request queue, so that failed requests will be rescraped
        await requestQueue.drop();
        requestQueue = await Apify.openRequestQueue(); // this is necessary to avoid exceptions

        // re-enqueue failed urls from the failedUrls array
        // >>> ignored although using drop() and reopening the request queue!!!
        for (let failedUrl of failedUrls) {
            await requestQueue.addRequest({ url: failedUrl });
        }
    }

    crawlerOptions.launchPuppeteerFunction = () => {
        return Apify.launchPuppeteer({
            // generates a new proxy url and adds it to a new launchPuppeteer function
            proxyUrl: createProxyUrl()
        });
    };

    let crawler = new Apify.PuppeteerCrawler(crawlerOptions);
    await crawler.run();
}
I think your approach should work, but on the other hand it should not be necessary. I'm not sure what createProxyUrl does.
You can supply a generic proxy URL with the auto username, which will use all your datacenter proxies at Apify. Or you can provide proxyUrls directly to PuppeteerCrawler.
Just don't forget that you have to switch browsers to get a new IP from the proxy. More in this article: https://help.apify.com/en/articles/2190650-how-to-handle-blocked-requests-in-puppeteercrawler
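For illustration, a minimal sketch of the generic-proxy-URL suggestion, assuming the same SDK version as the question (with launchPuppeteerFunction) and that the APIFY_PROXY_PASSWORD environment variable is set:
const crawler = new Apify.PuppeteerCrawler({
    requestQueue,
    // every newly launched browser gets the generic proxy URL; the 'auto'
    // username lets Apify rotate across all of your datacenter proxies
    launchPuppeteerFunction: () => Apify.launchPuppeteer({
        proxyUrl: `http://auto:${process.env.APIFY_PROXY_PASSWORD}@proxy.apify.com:8000`,
    }),
    handlePageFunction: async ({ request, page }) => {
        // ... scrape the page ...
    },
});
await crawler.run();
Since the crawler retries failed requests on its own (up to maxRequestRetries) and each new browser can get a new IP, the manual drop-and-reopen loop may become unnecessary.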

C# WebBrowser control - document does not contain html input control [duplicate]

Most of the answers I have read concerning this subject point to either the System.Windows.Forms.WebBrowser class or the COM interface mshtml.HTMLDocument from the Microsoft HTML Object Library assembly.
The WebBrowser class did not lead me anywhere. The following code fails to retrieve the HTML code as rendered by my web browser:
[STAThread]
public static void Main()
{
    WebBrowser wb = new WebBrowser();
    wb.Navigate("https://www.google.com/#q=where+am+i");

    wb.DocumentCompleted += delegate(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        mshtml.IHTMLDocument2 doc = (mshtml.IHTMLDocument2)wb.Document.DomDocument;
        foreach (IHTMLElement element in doc.all)
        {
            System.Diagnostics.Debug.WriteLine(element.outerHTML);
        }
    };

    Form f = new Form();
    f.Controls.Add(wb);
    Application.Run(f);
}
The above is just an example. I'm not really interested in finding a workaround for figuring out the name of the town where I am located. I simply need to understand how to retrieve that kind of dynamically generated data programmatically.
(Call new System.Net.WebClient().DownloadString("https://www.google.com/#q=where+am+i"), save the resulting text somewhere, search for the name of the town where you are currently located, and let me know if you were able to find it.)
And yet when I access "https://www.google.com/#q=where+am+i" from my web browser (IE or Firefox) I see the name of my town written on the web page. In Firefox, if I right-click on the name of the town and select "Inspect Element (Q)", I clearly see the name of the town in the HTML code, which happens to look quite different from the raw HTML returned by WebClient.
After I got tired of playing with System.Windows.Forms.WebBrowser, I decided to give mshtml.HTMLDocument a shot, just to end up with the same useless raw HTML:
public static void Main()
{
    mshtml.IHTMLDocument2 doc = (mshtml.IHTMLDocument2)new mshtml.HTMLDocument();
    doc.write(new System.Net.WebClient().DownloadString("https://www.google.com/#q=where+am+i"));

    foreach (IHTMLElement e in doc.all)
    {
        System.Diagnostics.Debug.WriteLine(e.outerHTML);
    }
}
I suppose there must be an elegant way to obtain this kind of information. Right now all I can think of is to add a WebBrowser control to a form, have it navigate to the URL in question, send the keys CTRL+A, copy whatever happens to be displayed on the page to the clipboard, and attempt to parse it. That's a horrible solution, though.
I'd like to contribute some code to Alexei's answer. A few points:
Strictly speaking, it may not always be possible to determine with 100% certainty when the page has finished rendering. Some pages are quite complex and use continuous AJAX updates. But we can get quite close by polling the page's current HTML snapshot for changes and checking the WebBrowser.IsBusy property. That's what LoadDynamicPage does below.
Some timeout logic has to be present on top of that, in case the page rendering is never-ending (note the CancellationTokenSource).
Async/await is a great tool for coding this, as it gives a linear code flow to our asynchronous polling logic, which greatly simplifies it.
It's important to enable HTML5 rendering using Browser Feature Control, as WebBrowser runs in IE7 emulation mode by default. That's what SetFeatureBrowserEmulation does below.
This is a WinForms app, but the concept can be easily converted into a console app.
This logic works well on the URL you've specifically mentioned: https://www.google.com/#q=where+am+i.
using Microsoft.Win32;
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace WbFetchPage
{
    public partial class MainForm : Form
    {
        public MainForm()
        {
            SetFeatureBrowserEmulation();
            InitializeComponent();
            this.Load += MainForm_Load;
        }

        // start the task
        async void MainForm_Load(object sender, EventArgs e)
        {
            try
            {
                var cts = new CancellationTokenSource(10000); // cancel in 10s
                var html = await LoadDynamicPage("https://www.google.com/#q=where+am+i", cts.Token);
                MessageBox.Show(html.Substring(0, 1024) + "..."); // it's too long!
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message);
            }
        }

        // navigate and download
        async Task<string> LoadDynamicPage(string url, CancellationToken token)
        {
            // navigate and await DocumentCompleted
            var tcs = new TaskCompletionSource<bool>();
            WebBrowserDocumentCompletedEventHandler handler = (s, arg) =>
                tcs.TrySetResult(true);

            using (token.Register(() => tcs.TrySetCanceled(), useSynchronizationContext: true))
            {
                this.webBrowser.DocumentCompleted += handler;
                try
                {
                    this.webBrowser.Navigate(url);
                    await tcs.Task; // wait for DocumentCompleted
                }
                finally
                {
                    this.webBrowser.DocumentCompleted -= handler;
                }
            }

            // get the root element
            var documentElement = this.webBrowser.Document.GetElementsByTagName("html")[0];

            // poll the current HTML for changes asynchronously
            var html = documentElement.OuterHtml;
            while (true)
            {
                // wait asynchronously; this will throw if cancellation is requested
                await Task.Delay(500, token);

                // continue polling if the WebBrowser is still busy
                if (this.webBrowser.IsBusy)
                    continue;

                var htmlNow = documentElement.OuterHtml;
                if (html == htmlNow)
                    break; // no changes detected, end the poll loop

                html = htmlNow;
            }

            // consider the page fully rendered
            token.ThrowIfCancellationRequested();
            return html;
        }

        // enable HTML5 (assuming we're running IE10+)
        // more info: https://stackoverflow.com/a/18333982/1768303
        static void SetFeatureBrowserEmulation()
        {
            if (LicenseManager.UsageMode != LicenseUsageMode.Runtime)
                return;

            var appName = System.IO.Path.GetFileName(System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName);
            Registry.SetValue(@"HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION",
                appName, 10000, RegistryValueKind.DWord);
        }
    }
}
Your web-browser code looks reasonable: wait for something, then grab the current content. Unfortunately there is no official "I'm done executing JavaScript, feel free to steal content" notification from the browser or from JavaScript.
Some sort of active wait (not Sleep but a Timer) may be necessary, and it is page-specific. Even if you use a headless browser (e.g. PhantomJS) you'll have the same issue.
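To make the active-wait idea concrete, here is a rough JavaScript sketch of the same polling approach using puppeteer (the function name and thresholds are illustrative, not from the original answer): snapshot the rendered HTML, wait a bit, and stop once two consecutive snapshots match.
// poll page.content() until two consecutive snapshots are identical,
// i.e. the page has (probably) stopped mutating its DOM
async function waitForStableHtml(page, intervalMs = 500, timeoutMs = 10000) {
    const deadline = Date.now() + timeoutMs;
    let html = await page.content();
    while (Date.now() < deadline) {
        await new Promise(resolve => setTimeout(resolve, intervalMs));
        const htmlNow = await page.content();
        if (htmlNow === html) return html; // no changes detected
        html = htmlNow;
    }
    throw new Error('page did not stabilize within the timeout');
}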
