I am trying to scrape a web page that requires clicking a paging button, which calls a __doPostBack function. I tried the following code in the Chrome console:
javascript:__doPostBack('ctl00$siteContent$widgetLayout$rptWidgets$ctl03$widgetContainer$ctl00$pgrTable$pagingLinksRepeater$ctl02$pageSelector','')
This works and I am able to move to the next page. However, I am having some difficulty passing this command to Puppeteer. I have tried the following with no success:
await page.evaluate(() => { javascript:__doPostBack('ctl00$siteContent$widgetLayout$rptWidgets$ctl03$widgetContainer$ctl00$pgrTable$pagingLinksRepeater$ctl02$pageSelector','');})
I have also tried to modify the aspnet form by resetting the __EVENTTARGET value to
'ctl00$siteContent$widgetLayout$rptWidgets$ctl03$widgetContainer$ctl00$pgrTable$pagingLinksRepeater$ctl02$pageSelector'
but it does not seem to be sufficient. Grateful for any suggestions.
The problem is that ASP.NET is registering two __doPostBack functions.
One in the page:
//<![CDATA[
var theForm = document.forms['aspnetForm'];
if (!theForm) {
theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
//]]>
And another one in the source script:
Sys.Extended.UI.ControlBase.__doPostBack = function(n, t) {
if (!Sys.WebForms.PageRequestManager.getInstance().get_isInAsyncPostBack())
for (var i = 0; i < Sys.Extended.UI.ControlBase.onsubmitCollection.length; i++)
Sys.Extended.UI.ControlBase.onsubmitCollection[i]();
Function.createDelegate(window, Sys.Extended.UI.ControlBase.__doPostBackSaved)(n, t)
};
Because they extend window with ControlBase, the __doPostBack function you get is the one from the resource file rather than the one from the page.
You can click the button instead.
await page.click('#ctl00_siteContent_widgetLayout_rptWidgets_ctl03_widgetContainer_ctl00_pgrTable_pagingLinksRepeater_ctl01_pageSelector');
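The selector above differs from the __doPostBack argument only in its separators. As a minimal sketch, assuming the page uses the default Web Forms naming where the control name (with `$`) maps to the element id (with `_`), a hypothetical helper `toClientIdSelector` could derive one from the other:

```javascript
// Hypothetical helper: turn an ASP.NET control name such as
// 'ctl00$siteContent$pageSelector' into a CSS id selector.
// Assumes the default id scheme; a page can be configured to
// generate ids differently, so verify the result in DevTools.
function toClientIdSelector(uniqueId) {
  return '#' + uniqueId.split('$').join('_');
}

// Usage with Puppeteer (not executed here):
// await page.click(toClientIdSelector('ctl00$siteContent$pageSelector'));
```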
You may be calling the function before it is defined in the page. Try waiting for it to be defined:
await page.waitForFunction(() => typeof __doPostBack !== 'undefined');
await page.evaluate(() => {
__doPostBack('ctl00$siteContent$widgetLayout$rptWidgets$ctl03$widgetContainer$ctl00$pgrTable$pagingLinksRepeater$ctl02$pageSelector','');
});
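A full postback reloads the page, so it usually helps to wait for that navigation as well. The following is a rough sketch rather than a definitive recipe: `triggerPostBack` is a hypothetical helper name, `page` is assumed to be a Puppeteer Page, and the code assumes the postback causes a real navigation. If the site performs an asynchronous (partial) postback instead, `waitForNavigation` would time out and you would need to wait for a changed element instead.

```javascript
// Hypothetical helper: wait for __doPostBack, call it, and wait for the reload.
async function triggerPostBack(page, eventTarget, eventArgument = '') {
  // Wait until the page has defined __doPostBack.
  await page.waitForFunction(() => typeof window.__doPostBack === 'function');

  // Start waiting for the navigation before triggering it,
  // so that a fast reload is not missed.
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle0' }),
    page.evaluate((target, arg) => {
      window.__doPostBack(target, arg);
    }, eventTarget, eventArgument),
  ]);
  return response;
}

// Usage (inside an async function, with an open Puppeteer page):
// await triggerPostBack(page, 'ctl00$siteContent$...$pageSelector');
```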
Came across this post after I tried doing this same thing with the website
https://members.acacamps.org/rentals
I tried to use page.click on the "Next" button with the navigator at the bottom, but I get a Node not found or not an HTML element error.
Not sure why Puppeteer doesn't pick it up, but imagine it has to do with the doPostBack.
The solution was to call click() through document.querySelector inside page.evaluate, which worked:
page.evaluate(()=>document.querySelector('linkid').click())
Hope this might help anyone that had this issue as well. I think we need to use querySelector here because this case does not have two separate doPostBack calls like the OP's problem?
I am trying to store the last page and the last form field that the user was focused on prior to his unexpected exit from the page (did not click continue), but my solution is not working.
I am using the onbeforeunload event on the pages themselves. I fully realise that this event does not work consistently across all browsers, but I could not figure out another way to do this.
window.onbeforeunload = function () {
if (fieldName != null && fieldName.length > 0) {
var formName = location.pathname.substring(1);
if (typeof (TrackFormField) == 'function') {
try {
TrackFormField(formName, fieldName);
}
catch (err) {
}
}
}
};
TrackFormField is function in a separate file that just assigns the value to the property
function TrackFormField(formName, fieldName) {
if (formName) {
s.prop23 = formName + ":" + fieldName;
}
sendOmniture();
}
And sendOmniture does the following:
function sendOmniture() {
var s_code = s.t(); if (s_code) document.write(s_code)
}
The weird thing is that at times it works, but usually I don't see prop23 in either the analytics debugger or Fiddler.
After some debugging I found out that s_code is, for some reason, undefined in the sendOmniture function.
What can I do to fix this issue ?
My guess is the window.onbeforeunload event will kill the (s) object in most cases before it has a chance to execute.
What about using a Direct Call rule with Adobe DTM that will fire your function on focus change? That way you don't need to tie any action to the onbeforeunload event.
Hope this helps.
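Whatever tool fires the rule, the underlying idea is to report the field as focus changes instead of at unload time. A minimal sketch of that idea in plain JavaScript follows; `createFocusTracker` is a hypothetical name, and the `track` callback stands in for the question's TrackFormField (or for whatever call your tag manager expects).

```javascript
// Hypothetical sketch: report each newly focused field once, as it happens.
function createFocusTracker(formName, track) {
  let lastField = null;
  return function onFocusIn(event) {
    const name = event.target && event.target.name;
    // Ignore elements without a name and repeated focus on the same field.
    if (!name || name === lastField) return;
    lastField = name;
    track(formName, name);
  };
}

// Usage in the browser (not executed here):
// document.addEventListener(
//   'focusin',
//   createFocusTracker(location.pathname.substring(1), TrackFormField)
// );
```

Because the last value has already been sent before the user leaves, nothing depends on onbeforeunload completing.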
I have a script that redirects the user to another page. I want to load some content into a div on the new page after the new page has fully loaded. How can I do this. The following doesn't work.
function goToPage() {
window.location.href = 'http://www.mypage.com/info';
$('.my_class').load('my/url/path/with/content/to/load');
}
The newly loaded page http://www.mypage.com/info contains the following div:
<div class="my_class"></div>
What am I doing wrong?
Redirect to the new page, but append a hash signal to the URL.
function goToPage() {
window.location.href = 'http://www.mypage.com/info#load-stuff';
}
Then on load of the target page, evaluate the URL checking for that hash signal.
function pageLoad() {
if (window.location.hash === "#load-stuff") {
$('.my_class').load('my/url/path/with/content/to/load');
}
}
If your application is using jQuery it'd look something like:
$(function () {
if (window.location.hash === "#load-stuff") {
$('.my_class').load('my/url/path/with/content/to/load');
}
});
That's the rough idea at least.
As pointed out in the other answers, you won't be able to perform any script instructions from your original site. Instead of using PHP to create the content statically, you could also pass the path to load in the URL fragment (the part after the #), e.g. like this:
// in the original page:
function goToPage() {
window.location.href = 'http://www.mypage.com/info#my/url/path/with/content/to/load';
}
// in http://www.mypage.com/info:
$( document ).ready(function () {
if(window.location.hash)
$('.my_class').load(window.location.hash.substring(1));
});
An easy way to pass data to your page you are redirecting to would be to set some url parameters.
For example:
window.location.href = "http://youpage.com/?key=value"
When that page loads you could have a:
$(document).ready(function(){
var my_param = getUrlParameter('key');
if(my_param == "value"){
//do your stuff here
}
});
var getUrlParameter = function getUrlParameter(sParam) {
var sPageURL = decodeURIComponent(window.location.search.substring(1)),
sURLVariables = sPageURL.split('&'),
sParameterName,
i;
for (i = 0; i < sURLVariables.length; i++) {
sParameterName = sURLVariables[i].split('=');
if (sParameterName[0] === sParam) {
return sParameterName[1] === undefined ? true : sParameterName[1];
}
}
};
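In current browsers the same lookup can be done with the built-in URLSearchParams class. This is a small sketch, with `getQueryParam` as a hypothetical name; note that it returns null for a missing key and an empty string for a key without a value, which differs slightly from the helper above.

```javascript
// Hypothetical helper: read one query-string parameter.
// `search` defaults to the current page's query string in a browser.
function getQueryParam(name, search = window.location.search) {
  return new URLSearchParams(search).get(name);
}

// Usage on the target page (not executed here):
// if (getQueryParam('key') === 'value') { /* do your stuff here */ }
```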
You should just run
$('.my_class').load('my/url/path/with/content/to/load');
on this page: http://www.mypage.com/info.
When you do window.location.href = 'http://www.mypage.com/info'; you're redirecting to another page. Nothing after that line will happen. You have to instead run the code after that line on the page that's loaded.
You can do this a few different ways. Try leveraging the localStorage API: pass info or content as a name and value pair (or a few of them) and unpack it on the receiving end.
On the page you're redirecting to, check for the localStorage key, and then load its contents (the aforementioned name and value pairs) into a div.
As an alternative, you can write one script file that you can deploy to several pages; do a check on window.location.href and conditionally load script accordingly. If you're on the redirected page, you can run whatever script you like. The nice part about doing it this way is that you're still working with one JS file - no need to fragment your code (assuming, of course, that the pages you're working with are all on the same site).
You don't need to do anything with php if you don't want to, or hashes... there's a few nifty tools that will do the trick if you can leverage HTML5 and its associated APIs.
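The localStorage idea could be sketched roughly as follows. The key name `pendingLoad` and both function names are hypothetical, and the approach only works when both pages share the same origin, since localStorage is scoped per origin.

```javascript
const PENDING_KEY = 'pendingLoad'; // hypothetical key name

// On the original page: remember what to load before redirecting.
function rememberPendingLoad(storage, url) {
  storage.setItem(PENDING_KEY, url);
}

// On the target page: read the request once and clear it,
// so a later visit does not trigger the load again.
function takePendingLoad(storage) {
  const url = storage.getItem(PENDING_KEY);
  if (url !== null) storage.removeItem(PENDING_KEY);
  return url;
}

// Usage in the browser (not executed here):
// rememberPendingLoad(localStorage, 'my/url/path/with/content/to/load');
// window.location.href = 'http://www.mypage.com/info';
//
// ...and on http://www.mypage.com/info:
// var url = takePendingLoad(localStorage);
// if (url) $('.my_class').load(url);
```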
window.location = '#/MyPage';
setTimeout(function() {
//MyCode To Run After PageLoad
});
You are redirecting the browser with window.location.href, and because you are only changing the browser's location, you can't have any effect on the page you are moving to (unless you use query string parameters and then create content with something like PHP, e.g. myurl.php?newcontent=whatever).
Once you redirect you can no longer execute scripts on that page, as the page is unloaded.
Try this,
redirect page:
function goToPage() {
window.location.href = 'http://www.mypage.com/info';
}
mypage.com/info:
js:
$('.my_class').load('my/url/path/with/content/to/load');
html:
<div class="my_class"></div>
I hope this helped you out, and let me know if you need further assistance!
I have been trying to write a script that fetches results from my university website. Someone suggested that I use Mechanize and it does look really promising.
In order to get the result, one has to first enter the roll number and then select the session.
Simulating the first part has been easy with Mechanize, but with the second part I'm having problems as it is actually a JavaScript onchange event.
I read the function definition in the JavaScript and this is what I have come up with so far. Mechanize can't handle the onchange event and also when I pass the values that are actually changed by the JavaScript function manually, the same page is returned.
Here's the JavaScript code:
function __doPostBack(eventTarget, eventArgument) {
var theform;
if (window.navigator.appName.toLowerCase().indexOf("microsoft") > -1) {
theform = document.Form1;
}
else {
theform = document.forms["Form1"];
}
theform.__EVENTTARGET.value = eventTarget.split("$").join(":");
theform.__EVENTARGUMENT.value = eventArgument;
theform.submit();
}
I set a breakpoint in firebug and found the value of __EVENTTARGET to be 'Dt1', whereas __EVENTARGUMENT stays ''.
The ruby script that I have written to do this is
require 'mechanize'
#set up the agent to mimic firefox on windows
agent = Mechanize.new
agent.keep_alive = true
agent.user_agent = 'Windows Mozilla'
page = agent.get('http://www.nitt.edu/prm/nitreg/ShowRes.aspx')
#using mechanize to get us past the first form presented
result_form = page.form('Form1')
result_form.TextBox1 = '205110018'
page = agent.submit( result_form, result_form.buttons.first )
#the second hurdle that we encounter,
#here i'm trying to get past the JavaScript by doing what it does manually
result_form = page.form('Form1')
result_form.field_with('Dt1').options.find { |opt| opt.value == '66' }.select
result_form.field_with( :name => '__EVENTTARGET' ).value = 'Dt1'
#here i should have got the page with the results
page = agent.submit(result_form)
pp page
Can anyone tell me what I'm doing wrong?
It looks like you have it working already! Try using puts page.body instead of pp page and you'll see the contents of the page. You can use Mechanize search functions to scrape the data from the page.
Also, you could simplify that code to:
result_form['__EVENTTARGET'] = 'Dt1'
result_form['Dt1'] = '66'
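For readers wondering why setting two fields is enough: a postback is an ordinary form POST in which __EVENTTARGET and __EVENTARGUMENT have been filled in. The sketch below illustrates that in JavaScript; `buildPostBackBody` is a hypothetical name, the `__VIEWSTATE` value is a placeholder, and the `$` to `:` conversion simply mirrors the page's own __doPostBack shown in the question.

```javascript
// Hypothetical illustration of the request body a postback submits.
// `formFields` holds the form's existing values (hidden fields included).
function buildPostBackBody(formFields, eventTarget, eventArgument = '') {
  const body = new URLSearchParams(formFields);
  // Same transformation as this page's __doPostBack: '$' becomes ':'.
  body.set('__EVENTTARGET', eventTarget.split('$').join(':'));
  body.set('__EVENTARGUMENT', eventArgument);
  return body.toString();
}

// Placeholder values for illustration only.
const example = buildPostBackBody({ __VIEWSTATE: 'abc', Dt1: '66' }, 'Dt1');
// example === '__VIEWSTATE=abc&Dt1=66&__EVENTTARGET=Dt1&__EVENTARGUMENT='
```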
Is there a one-liner I could execute in a javascript console to download and execute a javascript script from a remote source?
I was looking to see if there was a nice way to download this script and use it for experimenting interactively on random pages which may not have say, jQuery loaded.
[edit: I'm aware I could dynamically create a script element but is there a nicer way to do this?]
I've written a little script for that.
var loadjQuery = function(cb){
if(typeof(jQuery) == 'undefined'){
var scr = document.createElement('script');
scr.setAttribute('type', 'text/javascript');
scr.setAttribute('src', 'http://code.jquery.com/jquery-latest.js');
if(scr.readyState){
scr.onreadystatechange = function(){
if(scr.readyState === 'complete' || scr.readyState === 'loaded'){
scr.onreadystatechange = null;
if(typeof cb === 'function'){
var args = [].slice.call(arguments, 1);
cb.apply(this, args);
}
}
};
}
else {
scr.onload = function(){
if(typeof cb === 'function'){
var args = [].slice.call(arguments, 1);
cb.apply(this, args);
}
};
}
var head = document.getElementsByTagName('head')[0];
head.insertBefore(scr, head.firstChild);
}
}
This works cross-browser.
edit
I've updated that script as a function with a callback. Synopsis should be:
loadjQuery(function(something){
// execute code after library was loaded & executed
});
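In modern browsers the same idea can be written more compactly with a Promise. This is a minimal sketch that drops the legacy readyState handling; `loadScript` is a hypothetical name and the URL in the usage comment is a placeholder.

```javascript
// Hypothetical helper: inject a <script> tag and resolve once it has loaded.
// `doc` defaults to the page's document; it is a parameter mainly for testing.
function loadScript(src, doc = document) {
  return new Promise((resolve, reject) => {
    const script = doc.createElement('script');
    script.src = src;
    script.onload = () => resolve(script);
    script.onerror = () => reject(new Error('Failed to load ' + src));
    doc.head.appendChild(script);
  });
}

// Usage from the console (placeholder URL, not executed here):
// loadScript('https://example.com/lib.js').then(() => console.log('loaded'));
```

Pages that enforce a Content Security Policy may refuse to load scripts from other origins, in which case this approach will reject.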
Well, it is quite simple to take a long javascript snippet and put it all together into one line :)
This approach takes a few lines that you could mix together into a one-liner (but I guess you are looking for a shorter solution).
You will have to eval the contents of the two script tags to load Google AJAX libraries - that is all. You might need to do a call to get the first one though.
Go to the remote source (e.g.: https://ajax.googleapis.com/ajax/libs/d3js/5.7.0/d3.min.js)
Select all the js source (ctrl + a) and copy to the clipboard (ctrl + c)
Go to the target website where you want to inject the js
Open the console, paste the copied source and hit enter
All the functions of the library are available to you on the target website's console now.
I load this JS code from a bookmarklet:
function in_array(a, b)
{
for (i in b)
if (b[i] == a)
return true;
return false;
}
function include_dom(script_filename) {
var html_doc = document.getElementsByTagName('head').item(0);
var js = document.createElement('script');
js.setAttribute('language', 'javascript');
js.setAttribute('type', 'text/javascript');
js.setAttribute('src', script_filename);
html_doc.appendChild(js);
return false;
}
var itemname = '';
var currency = '';
var price = '';
var supported = new Array('www.amazon.com');
var domain = document.domain;
if (in_array(domain, supported))
{
include_dom('http://localhost/bklts/parse/'+domain+'.js');
alert(getName());
}
[...]
Note that the 'getName()' function is in http://localhost/bklts/parse/www.amazon.com.js. This code works only the -second- time I click the bookmarklet (the function doesn't seem to get loaded until after the alert()).
Oddly enough, if I change the code to:
if (in_array(domain, supported))
{
include_dom('http://localhost/bklts/parse/'+domain+'.js');
alert('hello there');
alert(getName());
}
I get both alerts on the first click, and the rest of the script functions. How can I make the script work on the first click of the bookmarklet without spurious alerts?
Thanks!
-Mala
Adding a <script> tag through DHTML makes the script load asynchronously, which means that the browser will start loading it, but won't wait for it to finish before running the rest of your script.
You can handle events on the tag object to find out when the script is loaded. Here is a piece of sample code I use that seems to work fine in all browsers. I'm sure there's a better way of achieving this, but I hope it points you in the right direction:
Don't forget to change tag to your object holding the <script> element, fnLoader to a function to call when the script is loaded, and fnError to a function to call if loading the script fails.
Bear in mind that those functions will be called at a later time, so they (like tag) must be available then (a closure would normally take care of that).
tag.onload = fnLoader;
tag.onerror = fnError;
tag.onreadystatechange = function() {
  if (!window.opera && typeof tag.readyState == "string") {
    /* Disgusting IE fix: old IE reports load progress as readyState strings */
    if (tag.readyState == "complete" || tag.readyState == "loaded") {
      fnLoader();
    } else if (tag.readyState != "loading") {
      fnError();
    }
  } else if (tag.readyState == 4) {
    /* Numeric readyState 4 means finished; treat status 200 as success */
    if (tag.status == 200) {
      fnLoader();
    }
    else {
      fnError();
    }
  }
};
It sounds like the loading of the external script (http://localhost/bklts/parse/www.amazon.com.js) isn't blocking execution until it is loaded. A simple timeout might be enough to give the browser a chance to update the DOM and then immediately queue up the execution of your next block of logic:
//...
if (in_array(domain, supported))
{
include_dom('http://localhost/bklts/parse/'+domain+'.js');
setTimeout(function() {
alert(getName());
}, 0);
}
//...
In my experience, if zero doesn't work for the timeout amount, then you have a real race condition. Making the timeout longer (e.g. 10-100) may fix it for some situations but you get into a risky situation if you need this to always work. If zero works for you, then it should be pretty solid. If not, then you may need to push more (all?) of your remaining code to be executed into the external script.
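If a fixed delay proves unreliable, one alternative (not part of the answer above) is to poll until the function actually exists and give up after a bounded number of attempts. A minimal sketch, with `whenAvailable` as a hypothetical name and the interval and attempt count chosen arbitrarily:

```javascript
// Hypothetical helper: poll `isReady` until it returns true, then call `onReady`.
// Calls `onTimeout` (if given) after `maxTries` unsuccessful checks.
function whenAvailable(isReady, onReady, onTimeout, intervalMs = 50, maxTries = 100) {
  let tries = 0;
  (function poll() {
    if (isReady()) {
      onReady();
      return;
    }
    tries += 1;
    if (tries >= maxTries) {
      if (onTimeout) onTimeout();
      return;
    }
    setTimeout(poll, intervalMs);
  })();
}

// Usage in the bookmarklet (not executed here):
// include_dom('http://localhost/bklts/parse/' + domain + '.js');
// whenAvailable(
//   function () { return typeof getName === 'function'; },
//   function () { alert(getName()); }
// );
```

Listening for the script's load event is still the more direct option where it is available; polling is only a fallback.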
The best way I could get working: Don't.
Since I was calling the JS from a small loader bookmarklet anyway (which just tacks the script on to the page you're looking at) I modified the bookmarklet to point the src to a php script which outputs the JS code, taking the document.domain as a parameter. As such, I just used php to include the external code.
Hope that helps someone. Since it's not really an answer to my question, I won't mark this as the accepted answer. If someone has a better way, I'd love to know it, but I'll be leaving my code as is:
bookmarklet:
javascript:(function(){document.body.appendChild(document.createElement('script')).src='http://localhost/bklts/div.php?d='+escape(document.domain);})();
localhost/bklts/div.php:
<?php
print("
// JS code
");
$supported = array("www.amazon.com", "www.amazon.co.uk");
$domain = isset($_GET['d']) ? $_GET['d'] : '';
if (in_array($domain, $supported))
include("parse/$domain.js");
print("
// more JS code
");
?>