I have been trying to write a script that fetches results from my university website. Someone suggested that I use Mechanize and it does look really promising.
In order to get the result, one has to first enter the roll number and then select the session.
Simulating the first part with Mechanize has been easy, but I'm having problems with the second part, as it is actually handled by a JavaScript onchange event.
I read the function definition in the JavaScript, and this is what I have come up with so far: Mechanize can't handle the onchange event, and even when I manually set the values that the JavaScript function would change, the same page is returned.
Here's the JavaScript code:
function __doPostBack(eventTarget, eventArgument) {
    var theform;
    if (window.navigator.appName.toLowerCase().indexOf("microsoft") > -1) {
        theform = document.Form1;
    }
    else {
        theform = document.forms["Form1"];
    }
    theform.__EVENTTARGET.value = eventTarget.split("$").join(":");
    theform.__EVENTARGUMENT.value = eventArgument;
    theform.submit();
}
I set a breakpoint in Firebug and found the value of __EVENTTARGET to be 'Dt1', whereas __EVENTARGUMENT stays ''.
The Ruby script that I have written to do this is:
require 'mechanize'
# set up the agent to mimic Firefox on Windows
agent = Mechanize.new
agent.keep_alive = true
agent.user_agent = 'Windows Mozilla'
page = agent.get('http://www.nitt.edu/prm/nitreg/ShowRes.aspx')
# use Mechanize to get past the first form presented
result_form = page.form('Form1')
result_form.TextBox1 = '205110018'
page = agent.submit( result_form, result_form.buttons.first )
# the second hurdle that we encounter:
# here I'm trying to get past the JavaScript by doing manually what it does
result_form = page.form('Form1')
result_form.field_with('Dt1').options.find { |opt| opt.value == '66' }.select
result_form.field_with( :name => '__EVENTTARGET' ).value = 'Dt1'
# here I should get the page with the results
page = agent.submit(result_form)
pp page
Can anyone tell me what I'm doing wrong?
It looks like you have it working already! Try puts page.body instead of pp page and you'll see the contents of the page. You can then use Mechanize's search functions to scrape the data from the page.
Also, you could simplify that code to:
result_form['__EVENTTARGET'] = 'Dt1'
result_form['Dt1'] = '66'
I'm trying to set an HTML input to read-only using ExecuteScriptAsync. I can make it work, but it's not an ideal scenario, so I'm wondering if anyone knows why it doesn't work the way I would expect it to.
I'm using Cef3, version 63.
I checked whether it's a timing issue, and it doesn't appear to be.
I tried invalidating the view of the browser, but that doesn't seem to help.
The code I currently have, which works:
public void SetReadOnly()
{
    var script = @"
        (function(){
            var labelTags = document.getElementsByTagName('label');
            var searchingText = 'Notification Initiator';
            var found;
            for (var i = 0; i < labelTags.length; i++)
            {
                if (labelTags[i].textContent == searchingText)
                {
                    found = labelTags[i];
                    break;
                }
            }
            if (found)
            {
                found.innerHTML = 'Notification Initiator (Automatic)';
                var input = found.nextElementSibling;
                if (input)
                {
                    input.setAttribute('readonly', 'readonly');
                }
            }
        })()";
    // executed twice on purpose; running it only once doesn't work (see below)
    _viewer.Browser.ExecuteScriptAsync(script);
    _viewer.Browser.ExecuteScriptAsync(script);
}
Now, if I remove
found.innerHTML='Notification Initiator (Automatic)';
the input is no longer shown as read-only. The HTML source of the loaded page does show it as read-only, but it seems the frame doesn't get re-rendered once the attribute is set.
Another issue is that I'm executing the script twice. If I run it only once, I don't get the desired result. I'm thinking this could be a problem with the V8 context that is required for the script to run; apparently running a script creates the context, so that could be why running it twice works.
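For what it's worth, one workaround I'm experimenting with (an untested sketch; the element lookup is simplified compared to my real code above) is to force a layout flush after setting the attribute instead of rewriting the label's innerHTML:
(function () {
    // simplified lookup; my real code walks from the label, as above
    var input = document.querySelector('input');
    if (input) {
        input.setAttribute('readonly', 'readonly');
        // nudge the renderer: touch a style, flush layout by reading
        // offsetHeight, then restore the style
        input.style.opacity = '0.99';
        void input.offsetHeight;
        input.style.opacity = '';
    }
})();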
I have been trying to figure this out for hours and haven't found anything that would explain this weird behaviour. Does anyone have a clue?
Thanks!
I am currently working on Pega. I am basically new to it; the Pega version I am working with is 8.1.
In the flow, whenever there are validation errors in required fields on submitting a form, an alert pops up. For UI and UX reasons, though, the window needs to scroll to the error fields with a subtle animation. I can do this perfectly well for any HTML page using jQuery, but as I am new to Pega, I am having trouble getting hold of the error fields.
From what I can see, the error markers are appended to the relevant fields whenever validation fails, but when I try to catch those elements I get undefined.
I am sharing the simple code which I have written:
var validateScroll = function () {
    var getErrors = function () {
        this.parent = "";
    };
    getErrors.prototype.getErrorfn = function (node) {
        this.parent = $(node);
        var errorLabels = this.parent.find('.dynamic-icon-error');
        console.log(errorLabels.attr('class'));
    };
    var a = new getErrors();
    var scrollfn = function () {
        var forms = $(document).find('form')[0];
        var _FORM = $(forms);
        _FORM.on('submit', a.getErrorfn(_FORM));
    };
    return {
        init: function () {
            scrollfn();
        }
    };
}();
$(document).ready(function () {
    validateScroll.init();
});
The result I am getting is "undefined", since the error elements are appended dynamically and are not yet in the DOM when my code runs.
Can anyone suggest an approach? Is there any JavaScript API to get the error fields? I have searched Pega's JavaScript API but couldn't find one.
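In the meantime, a generic DOM workaround I am experimenting with (a plain MutationObserver, not a Pega API; the .dynamic-icon-error class is the one from my code above) looks like this:
var observer = new MutationObserver(function () {
    var firstError = document.querySelector('.dynamic-icon-error');
    if (firstError) {
        // scroll smoothly to the first error marker once it is appended
        $('html, body').animate({
            scrollTop: $(firstError).offset().top - 50
        }, 400);
        observer.disconnect();
    }
});
observer.observe(document.body, { childList: true, subtree: true });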
I have also asked this on PDN but got no reply.
Thanks in advance.
I am trying to scrape a web page that requires clicking a page button through a __doPostBack function. I have tried the following code in the Chrome console.
javascript:__doPostBack('ctl00$siteContent$widgetLayout$rptWidgets$ctl03$widgetContainer$ctl00$pgrTable$pagingLinksRepeater$ctl02$pageSelector','')
This works and I am able to move to the next page. However, I am having some difficulty passing this command to Puppeteer. I have tried the following with no success.
await page.evaluate(() => { javascript:__doPostBack('ctl00$siteContent$widgetLayout$rptWidgets$ctl03$widgetContainer$ctl00$pgrTable$pagingLinksRepeater$ctl02$pageSelector','');})
I have also tried to modify the ASP.NET form by setting the __EVENTTARGET value to
'ctl00$siteContent$widgetLayout$rptWidgets$ctl03$widgetContainer$ctl00$pgrTable$pagingLinksRepeater$ctl02$pageSelector'
but it does not seem to be sufficient. Grateful for any suggestions.
The problem is that ASP.NET is registering two __doPostBack functions.
One in the page:
//<![CDATA[
var theForm = document.forms['aspnetForm'];
if (!theForm) {
    theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
//]]>
And another one in a script resource:
Sys.Extended.UI.ControlBase.__doPostBack = function (n, t) {
    if (!Sys.WebForms.PageRequestManager.getInstance().get_isInAsyncPostBack())
        for (var i = 0; i < Sys.Extended.UI.ControlBase.onsubmitCollection.length; i++)
            Sys.Extended.UI.ControlBase.onsubmitCollection[i]();
    Function.createDelegate(window, Sys.Extended.UI.ControlBase.__doPostBackSaved)(n, t);
};
As they are extending window with ControlBase, the __doPostBack function you are getting is the one from the resource file instead of the one from the page.
You can click the button instead.
await page.click('#ctl00_siteContent_widgetLayout_rptWidgets_ctl03_widgetContainer_ctl00_pgrTable_pagingLinksRepeater_ctl01_pageSelector');
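Since __doPostBack submits the form, it is usually worth waiting for the resulting navigation as well; a sketch of the same click wrapped that way:
await Promise.all([
    page.waitForNavigation(),
    page.click('#ctl00_siteContent_widgetLayout_rptWidgets_ctl03_widgetContainer_ctl00_pgrTable_pagingLinksRepeater_ctl01_pageSelector'),
]);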
You may be calling the function before it is defined in the page. Try waiting for it to be defined:
await page.waitForFunction(() => typeof __doPostBack !== 'undefined');
await page.evaluate(() => {
    __doPostBack('ctl00$siteContent$widgetLayout$rptWidgets$ctl03$widgetContainer$ctl00$pgrTable$pagingLinksRepeater$ctl02$pageSelector', '');
});
Came across this post after I tried doing the same thing with the website
https://members.acacamps.org/rentals
I tried to use page.click on the "Next" button in the navigator at the bottom, but I get a "Node not found or not an HTML element" error.
Not sure why Puppeteer doesn't pick it up, but I imagine it has to do with __doPostBack.
The solution was to use document.querySelector:
page.evaluate(() => document.querySelector('linkid').click()) worked.
Hope this helps anyone who hits this issue as well. I think we need querySelector here because this case does not have two separate __doPostBack registrations like the OP's problem.
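For completeness, here is the pattern I ended up with ('linkid' is a placeholder for the real link's selector, as above):
await page.waitForSelector('linkid');
await Promise.all([
    page.waitForNavigation(),
    // clicking inside the page context lets the __doPostBack handler fire
    page.evaluate(() => document.querySelector('linkid').click()),
]);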
I am using an authentication cookie passed between websites on the same domain. The cookie contains some user info and page info (the accession number). The design goal is for the user to click a button on the referring website, and it will launch a second website, authenticate based on the cookie, and do some useful stuff with the accession number. I got most of this built, including getting the authentication passed and properly parsed out on the receiving system.
The problem I am having is that I can't get the data from the cookie into the JavaScript on the page. It seems that when I launch website2 from website1, $(document).ready() is not fired after the Page_Load event (which handles the cookie parsing). I also tried using a Literal to emit the JavaScript code, but it is never executed (seemingly it is placed after the client-side code has already run).
What I really want to do is call a JavaScript function getResults(accnum) with this data.
I have this code in the Page_Load event:
if (userdata != null)
{
    accnum = userdata[4];
}
if (accnum != String.Empty)
{
    //HttpCookie accnumcookie = new HttpCookie("accnum", accnum);
    //this.Context.Response.Cookies.Set(accnumcookie);
}
I'm not really sure of the inner workings of the .Set call, but long story short, the cookie is set and yet it does nothing.
This is the document.ready:
$(document).ready(function () {
    var accnum = GetCookie('accnum');
    if (accnum != null) {
        document.cookie = 'test=testz';
        var srch = document.getElementById('crit');
        srch.style.display = 'none';
        getResults('', 'accnum', accnum);
    }
});
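For reference, GetCookie isn't shown above; it is just a plain document.cookie parser along these lines (reconstructed here, since the original helper isn't in the snippet):
function GetCookie(name) {
    // read a cookie value by name from document.cookie
    var match = document.cookie.match(new RegExp('(?:^|; )' + name + '=([^;]*)'));
    return match ? decodeURIComponent(match[1]) : null;
}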
I'm wondering if anyone else has experienced the following issue.
On a single .aspx page (not linked to a master page), I'm performing simple JS validations:
function validateMaxTrans(sender, args) {
    // requires at least one digit, numeric-only characters
    var regexp = new RegExp("^[0-9]{1,40}(\\.[0-9]{1,2})?$");
    var txtAmount = document.getElementById('TxtMaxTransAmount');
    if (txtAmount.value.match(regexp) && parseInt(txtAmount.value, 10) >= 30) {
        document.getElementById('maxTransValMsg').innerHTML = "";
        args.IsValid = true;
    }
    else {
        document.getElementById('maxTransValMsg').innerHTML = "*";
        args.IsValid = false;
    }
}
Then, as soon as I move this into a master page's content page, I get "txtAmount is null".
Is there a different way to access the DOM when attempting to perform client-side JS validation with master/content pages?
Look at the source of your rendered page within the master page. Many elements will have an ID like ctl00_ControlX_SubControlY_txtMaxTransAmount, so you'll need to adjust your validation accordingly. I will often just inject the IDs into the client doc:
<script type="text/javascript">
    var controls = {
        'txtAmount': '<%=TxtMaxTransAmount.ClientID%>',
        ...
    }
</script>
I'd put this right before the end of your content area, to make sure the controls are already rendered. This way you can simply use window.controls.txtAmount to reference the server-side control's rendered tag ID. You could even make the right-hand value a document.getElementById('...') call directly.
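With that map in place, the validator can resolve the rendered ID through it; for example (a sketch based on the validator above):
function validateMaxTrans(sender, args) {
    // look the textbox up via the injected ClientID map
    var txtAmount = document.getElementById(window.controls.txtAmount);
    var regexp = /^[0-9]{1,40}(\.[0-9]{1,2})?$/;
    args.IsValid = regexp.test(txtAmount.value) && parseInt(txtAmount.value, 10) >= 30;
}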
Are you using ASP.NET textboxes? If so, I believe you need to do something like document.getElementById('<%= TxtMaxTransAmount.ClientID %>').
Hope this helps
Tom