I want to automatically log in to a webpage without already being logged in to a session, search for something through its search bar, and load the HTML of the last search result from the result list. Does anyone have any direction on how to do this?
Unlike websites like Google, the final URL does not change after the search. It appears that the search bar sends a form with the search request and changes the inner HTML of the webpage without changing the URL in the address bar, which makes things more difficult.
If you want to know what this is for, I am doing some web scraping.
I tried making an HTML page that automatically submits a form mimicking the forms of the login page and search bar:
<body onload="document.frm1.submit()">
<form action="http://XXXXX.com/login" name="frm1">
<input type="text" name="username" value="Smith" />
<input type="password" name="password" value="12345" />
</form>
</body>
It works. However, it gets very complicated when the HTML of the webpage is very complicated. Right now I am just trying to mimic it based on what I see in the F12 HTML inspector. Also, this method does not allow chaining actions: after I log in, I am not able to search for items. And it seems like the page does not store sessions, because when I open a new tab I have to log in again, but I am not sure.
I am trying to view the HTTP headers directly through the Network tab of the browser's developer tools, but I still haven't figured out exactly how to use the information there.
I also tried Selenium (Python), but its send_keys function is very slow, especially since I can only use Internet Explorer (a few seconds per character!). And even if I use Selenium, some buttons on the webpage cannot be clicked, and I am not sure why.
For the second part (getting the HTML), I haven't done it yet either, but I guess I will just use a standard library like BeautifulSoup?
There are a few websites I need to work on. One of them is https://www.mdsystem.com
Update:
I tried copying the form data from Chrome >> Inspect Element >> Network and sending the form using the Python library requests. However, it returns the following:
<html lang="en-US">
<head>
<script language="javascript" type="text/javascript">
if (window['AdfPage'] && AdfPage.PAGE && AdfPage.PAGE.__getSessionTimeoutHelper())
AdfPage.PAGE.__getSessionTimeoutHelper().__alertTimeout();
else {
alert('Because of inactivity, your session has timed out and is no longer active. Click OK to reload the page.');
window.location.replace(window.location.href);}
</script>
</head></html>
Which is weird. If I submit the form manually from a browser it will show the search results.
Related
I am using a template site from WebMatrix 3.0 that includes a sign-in page (among other pages), working primarily with C#, Razor, HTML, and jQuery. I have created a new page where users can register that includes a password field, as well as other text fields. My issue is that even when I simply have two fields on the cshtml page and nothing else, these fields are filled with data during an initial page request. This data is old "remember me" data from a previous sign-in.
I can either do a get request from another page's form to land on this page, or navigate here from a href link, and the result is the same. To try to find the issue, I even cut almost all the code from the page leaving simply this:
<input type="text" />
<input type="password" />
And both fields are populated when I make an initial page request. I have tried to use things like
value = " something like blank or a single space in here "
etc. in the HTML, as well as trying to clear out the fields with JavaScript, to no avail (I prefer not to use JavaScript, since I have had very spotty success with things like onload). Is there a fairly surefire way to clear text and password fields for initial page requests? Edit - I have tried autocomplete=off to no avail as well.
I use an iframe on my page, which consists of a form with input elements.
Every input element has an onblur() event, which validates the input.
When I open the page in IE 8 with a freshly cleared cache it produces a javascript error like this.
'document.getElementById(...)' is null or not an object
However, when I inspect the form, it is loaded completely and the element I'm trying to access is rendered.
Furthermore, when I reload the whole page I don't get any errors anymore.
Also when I load the content of the iframe on its own I also don't get errors.
Firefox and Chrome don't throw errors at all.
In short, the Javascript errors I get only occur in IE and only when I use an iframe to display the form (which is mandatory) and only when the page is loaded for the first time.
Any ideas on how I can fix this?
I hope it's not too confusing to read.
Edit:
document.getElementById("vHint_"+fieldName).innerHTML=data;
fieldName is the id of the input field. data is the return value of the validation.
In this case data is an image tag.
After every input field there is a span tag with the id "vHint_"+fieldName.
The event is attached like this:
<input id="Jahr" class="input" type="text" onblur="validDate(this,'Jahr','_beginn')" maxlength="4" style="width:32px" value="" name="Jahr">
First of all thank you for your effort.
The example user13500 provided worked like a charm.
And it made me dig deeper.
And I found the solution.
All input fields are created with a self-made ASP framework, which puts them all in the Session.
The onblur() event of the input field within the iframe triggers an AJAX Request to an ASP file passing the name of the input field as a request parameter. The ASP file now tries to find the field in the Session and retrieve its value to validate the input.
After that the result is posted back to the javascript file, which then uses document.getElementById("vHint_"+fieldName).innerHTML=data; to post the result back in the page.
This normally works without errors.
But, since the application is run in an iframe and the domains of the surrounding page and the application in the iframe are different, IE rejects the Session of the iframe. Thus the result of the ASP validation is empty, because it couldn't find the field in the Session.
Having figured that out the only thing that has to be done is to add this line of code in the application:
Response.AddHeader "P3P", "CP=""CAO PSA OUR"""
This way IE doesn't reject the Session of the application anymore.
Maybe this can be useful for others too.
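Independent of the P3P fix, the callback that writes the validation result could guard against a missing element or an empty response, so that a failed session lookup degrades quietly instead of raising a script error. A minimal sketch, assuming the "vHint_" id convention described above; showValidationHint and its doc parameter are hypothetical names introduced here so the function can be tried outside a browser:

```javascript
// Hypothetical helper: writes the validation result into the hint span,
// but only if the span exists and the server actually returned something.
// `doc` is anything with getElementById (pass `document` in a browser).
function showValidationHint(doc, fieldName, data) {
  var hint = doc.getElementById("vHint_" + fieldName);
  if (!hint) {
    return false; // span not found: skip instead of throwing
  }
  if (data === null || data === undefined || data === "") {
    return false; // empty response, e.g. the session lookup failed
  }
  hint.innerHTML = data;
  return true;
}
```

In the AJAX callback this would be called as showValidationHint(document, fieldName, data). It only hides the symptom; the session problem itself still needs the header above.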
So I have a somewhat unique issue I believe and I'm not sure what's the best way around it. I have some legacy code that has worked fine in the past in all browsers and suddenly in IE10 it is not working. I'll try to explain as best I can how it works and what I think is the issue.
I am working on an online banking page which has an option for the user to download their account history as a QIF, CSV, etc. The page is written with Classic ASP and VB server code. The way the feature works is the user clicks the download button which reloads the page with a series of clickable images, one for each download file type. Based on the one they click, a javascript function is then called which submits a hidden form on the page and then submits a second hidden form in order to reload the original view with the account history and filters again. The first form action calls an asp page which builds the file and returns it as a response attachment which usually prompts the browser to download the file, and then the second submit action is just the original asp page with the history details. In IE10, the file doesn't download ever and instead some processing occurs and the second submit which reloads the history goes through fine.
What I've found in my looking is that if I comment out the JavaScript line that submits the second form, then the download works, so I think what's happening is that the submits are occurring asynchronously and the redirect one returns before the download one. Or something like that. I'm not sure. I'm trying to figure out a workaround without having to completely rewrite the feature. Any thoughts?
EDIT:
The page this all occurs on is accountDetails.asp
The javascript --
function SetOFX(type){
// There is some code that does conditional handling of the #type parameter
document.forms.DownloadForm.submit();
document.forms.Finished.submit();
return false
}
The DownloadForm --
<form name="DownloadForm" id="DownloadForm" action="downloadofx.asp" method="post">
<!-- a bunch of input type="hidden" elements -->
</form>
The Finished Form --
<form name="Finished " id="Finished " action="accountDetails.asp" method="post">
<!-- a bunch of input type="hidden" elements -->
</form>
So the DownloadForm calls a separate asp page to get the download file and then the Finished form posts to the page the user is already on to reload the account history details instead of showing the download image buttons. I realize this is a really bad way of doing this in the first place; this is legacy code written by people who were learning and is already being used in production by hundreds of clients so I can't just rewrite it without a major project approval from my boss and all of our clients.
I haven't tested any of these ideas, but if you want to keep the current architecture, you could try to detect when the file has been completely downloaded and then navigate away.
Have a look at this question to know how to detect when the file has been downloaded by the browser.
Another idea would be to drop the first form submission in favor of a simple <a> link with an href attribute that points to your file download URL, using query string parameters to pass additional data. You might also want to put target="_blank" on the link if you still experience the same issue without it.
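One widely used way to detect that a download has started is a token cookie: the page sends a random token with the download form, the server echoes it back in a cookie on the file response, and the page polls the cookie string until the token appears. Whether the linked question describes exactly this is an assumption, and it requires a server-side change so that the download page sets the cookie. The cookie name downloadToken and the functions hasDownloadToken and waitForDownload are hypothetical:

```javascript
// Returns true when cookieString (the format of document.cookie)
// contains the cookie `name` with exactly the value `token`.
function hasDownloadToken(cookieString, name, token) {
  var parts = cookieString.split(";");
  for (var i = 0; i < parts.length; i++) {
    var pair = parts[i].replace(/^\s+|\s+$/g, ""); // trim without String.trim
    var eq = pair.indexOf("=");
    if (eq === -1) continue;
    if (pair.substring(0, eq) === name && pair.substring(eq + 1) === token) {
      return true;
    }
  }
  return false;
}

// Polls until the token cookie shows up, then calls onDone once.
// getCookies is a function returning the current cookie string,
// e.g. function () { return document.cookie; } in a browser.
function waitForDownload(getCookies, token, onDone, intervalMs) {
  var timer = setInterval(function () {
    if (hasDownloadToken(getCookies(), "downloadToken", token)) {
      clearInterval(timer);
      onDone();
    }
  }, intervalMs || 500);
  return timer;
}
```

With something like this, SetOFX would submit DownloadForm with the token in a hidden field and submit the Finished form only from the onDone callback, instead of calling both submits back to back.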
Here's the answer we came up with in the end. The above javascript shouldn't have ever worked in the first place and in fact we found out after testing that it wasn't working in many places but the part we cared about (the file download) was always working. It turns out up until IE10, all browsers have been smart enough to know that you shouldn't submit two forms that way and they ended up ignoring the second submit. IE10 however was processing them both and the redirect was returning before the file download. Since we didn't care about an auto-redirect we just took that submit out and instead added a submit button to the finished form so the user could manually return to the previous view.
The fixed Javascript --
function SetOFX(type){
// There is some code that does conditional handling of the #type parameter
document.forms.DownloadForm.submit();
return false
}
The fixed Finished Form
<form name="Finished" id="Finished" action="accountDetails.asp" method="post">
<!-- a bunch of input type="hidden" elements -->
<input type="submit" value="Return to Account Details" />
</form>
I've built a notification system for an ASP.NET MVC3 site I've been working on that lets the user know that various actions they performed happened successfully (a "pat on the back" message). The solution works pretty well, but I have one issue that I would love to solve, but I can't seem to wrap my head around how to do so.
In a controller, I have the following example action methods:
<HttpGet()>
Public Function Edit(id As Guid) As ActionResult
Return View(GetMyViewModel(id))
End Function
<HttpPost()>
Public Function Edit(...) As ActionResult
' Save updated ... information
Me.TempData("UserMessage") = "Data Saved! You are truly an awesome user!"
' PRG back to Edit
Return RedirectToAction("Edit")
End Function
Then in my view (razor layout) I have code that looks for the existence of the "UserMessage" key in the TempData collection, and if it exists I build out some JavaScript to present a growl-like notification for the user:
/* This only exists when we have something to show */
$(function () {
showNotification([the message from TempData]);
});
The growl-like message then either goes away over time or the user can click on the message to dismiss it.
So far so good, everything is working as expected. User POSTs to Edit, they are PRGed back to Edit, the growl-like "Data Saved! You are truly an awesome user!" message is shown and dismissed.
If the user then navigates to another page and then hits the browser's back button, the browser then digs into its cache, resulting in the browser executing the same javascript, showing the user the same "Data Saved! You are truly an awesome user!" message again. This confuses the heck out of the user thinking that, by clicking the back button, they just did something that caused yet another save (or whatever the message was).
I'm looking for a way, that once the notification is shown once, I can somehow prevent the notification from ever showing back up -- basically making it a "one time shot" message. Things I have thought about are:
Including a Guid with every message, and using localStorage to store a list of shown message ids, and only if the message being requested doesn't already exist in the list of shown messages, show it.
I've thought of using a Cookie in the same way, but cringed at the idea that the cookie is needlessly blasted back to the server for future requests, plus the content of the cookie would need to be carefully considered and probably per-message anyhow.
Instead of returning a message from the action method, return a Guid instead that points to a message in a database. Then on page load, AJAX back to the server to get that message. Once a message is got, it is deleted from the database, subsequent requests for that same message are handled by returning no message.
Include a Guid with every message, and before showing the message, AJAX back to the server to see if the message has already been shown. Once the message is shown, AJAX back to the server to log that the message has been shown.
These all seem pretty untenable to me, but I could be convinced otherwise if someone wants to argue support for one of these.
Things I have tried:
After the message is dismissed removing all traces of the DOM element.
When the message is shown, set a jQuery .data() property on the message's DOM element to indicate that the message was shown, then before showing the message, make sure the .data() field doesn't exist.
These don't work because the browser caches pages at a point in time where both of these DOM changes happen afterwards.
Basically, I need a mechanism that my javascript can check to see if it really needs to show this given message, and if it does, show it, but then mark the message as shown so that if it is requested to be shown again, it doesn't. Any suggestions?
You may also take advantage of the fact that browsers remember form values. A simple example:
<input id="notification" style="display:none" value="Data Saved!">
<script type="text/javascript">
$(function () {
if($('#notification').val())
{
alert($('#notification').val());
$('#notification').val('');
}
});
</script>
You could set a cookie to store the state of the message being shown, and check for the presence of the cookie when you attempt to show the message. If the cookie is there, you don't, and if it isn't, you do.
Seems like the logic for that would be pretty simple, as opposed to tracking GUIDs :)
// pseudocode
FUNCTION ShowMessage(args)
// if the cookie is here, don't show the message
IF StatusCookieIsPresent THEN RETURN
// if the cookie isn't here, this is the first time showing the message
ShowStatusMessage()
// we showed the message, so set the cookie to make sure we don't
// do it twice
SetStatusCookie()
END FUNCTION
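The pseudocode above can be made concrete roughly as follows. This is a sketch, not the answerer's code: it keys the flag on a per-message id, so that one shown message does not suppress later ones, and it takes a storage object with getItem/setItem, which could be window.localStorage in a browser or a small cookie wrapper. showOnce, messageId and the "shown-message-" key prefix are hypothetical names:

```javascript
// Shows a message at most once per messageId.
// storage: object with getItem(key) and setItem(key, value),
//          e.g. window.localStorage in a browser.
// show:    function that actually displays the message.
function showOnce(storage, messageId, message, show) {
  var key = "shown-message-" + messageId;
  if (storage.getItem(key) !== null) {
    return false; // already shown, e.g. the script re-ran from a cached page
  }
  show(message);
  storage.setItem(key, "1"); // mark as shown so a second run is skipped
  return true;
}
```

In the layout this might be called as showOnce(window.localStorage, messageId, message, showNotification), where the server renders a fresh id alongside each message. Because the check runs every time the cached script executes, pressing the back button finds the flag and skips the notification.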
I'm not an expert in MVC, but I'll try helping you out. I made a Hello World program 3 years ago and read a lot about it. :-) Are you using AJAX history in your page (using the # notation in the query string--the functionality Yahoo broke with their new YUI mail framework or forgot about when you click the back button--sends you to the login page haha)? When you click the back button, are you coming from a different page or the same page to display your message? I don't know enough about the Razor framework to help you out completely.
At first I thought you could easily disable cache on the page by using this code, but then I saw that you're using AJAX, so that's probably not a good enough solution.
<META HTTP-EQUIV="CACHE-CONTROL" CONTENT="NO-CACHE">
<META HTTP-EQUIV="EXPIRES" CONTENT="0">
<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
This would turn it off completely, but probably not a good solution for performance reasons.
<caching>
<outputCache enableOutputCache="false" />
</caching>
You could block the cache for the page displaying the user message from your collection, but I don't know what is happening in your AJAX to be sure.
Response.CacheControl = "No-Cache"
Not sure if MVC would respect this, but worth a try:
Response.Cache.SetExpires(DateTime.Now.AddSeconds(60));
Response.Cache.SetCacheability(HttpCacheability.Public);
Response.Cache.SetValidUntilExpires(false);
Response.Cache.VaryByParams["Category"] = true;
if (Response.Cache.VaryByParams["Category"])
{
//...
}
Or, maybe that page is the donut with filling in the middle, and it's too filling (for the browser back button)--no pun intended. Make it a donut-hole, and it might solve your issue. Essentially you could implement a view control that contains the message you're displaying, and remove the cache policy from this view control. I suppose it would work in theory. Hopefully it works well in real life for you.
http://haacked.com/archive/2009/05/12/donut-hole-caching.aspx
The OutputCache attribute (mentioned in that article) just controls cache for a given user control.
http://msdn.microsoft.com/en-us/library/hdxfb6cy.aspx
I have built a quite complex widget which contains "some kind of
form". It has a form tag, but I'm loading a lot of stuff in there via
Ajax etc. Cannot explain it in detail, and the code is too long to
paste in here.
Now, in a "live('click', function()" I use for one of the form fields,
I'm writing a couple of values into hidden fields of another form.
That works fine, as I can see them in the generated code. But if I
leave the page and then hit the back button, the values are gone.
If I write some values into those fields outside the live click
function though, they are still there when I leave the page and come
back using the back button.
But I need to write the values into the hidden fields out of the live
click function (I'm inserting values from fields of my form into
them).
I don't know what causes this and wasn't able to find a workaround yet
(even though I tried a lot).
Any ideas?
Thanks!
Have a look at the jquery history plugin (http://plugins.jquery.com/project/history)
Usually what happens is that the browser remembers what you have entered into a form (even if you don't submit it), so that when you hit the back button, it populates all the visible fields for you.
It seems it's not the case with hidden fields. There's a workaround though.
Every time one of your hidden fields is changed, you can add #part to your url (eg. www.mysite.com/users#userId,groupId,...).
When the page is loaded again (via back button for example), it will contain the #part. Parse it as a string to determine how to populate hidden fields and populate them.
Review the history plugin for jQuery to see how to read the #part: http://plugins.jquery.com/files/jquery.history.js_0.txt
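As a rough sketch of the #part idea without the plugin, the hidden field values can be serialized into the fragment and parsed back when the page loads. The function names and the name=value&name=value layout are assumptions made for illustration, not the plugin's format:

```javascript
// Serializes an object of field values into a fragment string,
// e.g. { userId: "7", groupId: "a b" } -> "#userId=7&groupId=a%20b"
function encodeFieldsToHash(fields) {
  var parts = [];
  for (var name in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, name)) {
      parts.push(encodeURIComponent(name) + "=" + encodeURIComponent(fields[name]));
    }
  }
  return "#" + parts.join("&");
}

// Parses a fragment produced by encodeFieldsToHash back into an object.
function decodeFieldsFromHash(hash) {
  var fields = {};
  var body = hash.charAt(0) === "#" ? hash.substring(1) : hash;
  if (body === "") return fields;
  var parts = body.split("&");
  for (var i = 0; i < parts.length; i++) {
    var eq = parts[i].indexOf("=");
    if (eq === -1) continue; // ignore pieces that are not name=value
    fields[decodeURIComponent(parts[i].substring(0, eq))] =
      decodeURIComponent(parts[i].substring(eq + 1));
  }
  return fields;
}
```

In the live click handler one would set window.location.hash = encodeFieldsToHash({...}) after writing the hidden fields, and on page load read decodeFieldsFromHash(window.location.hash) and copy each value back into the hidden field with the matching name.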
Use CSS to hide the input instead of the input type.
<input type="text" id="foo" name="foo" style="display: none;" />
instead of
<input type="hidden" id="foo" name="foo" />
I tripped over the same issue and this seems to resolve it.