I'm trying to automate the process of getting my current student records from my college. In a browser, the process involves typing in my college's URL, then clicking the login link, which brings me to an https:// page where I type in my username and password. From there it is one or two more links and reading some text on the page. Now, my question is: how might I go about doing this in an automated way, so that my records are displayed on the command line? The https:// in the URL signifies, I think, that the site uses SSL; are there libraries that can handle this? Also, I'm pretty sure the 'Submit' button on the login page uses JavaScript; again, are there libraries to handle that?
I'm sure I missed something or other in my question's description, so please ask if you do not understand my question or need more information.
PS. I am not well versed in Internet protocols and I am also new to Python. In fact I started studying it for this project. But, I am fluent in C and I am pretty good with C++.
Thanks in advance.
Michael,
You don't have to mimic all the actions you do in the browser.
First: there is no problem with HTTPS/SSL as long as you don't need to verify the certificate (it seems you don't); urllib2.urlopen will handle it.
Second: when you click 'Submit', the browser sends a request to the server with your username, password, and probably some other data. That request is most likely a POST. In response, the server will probably send you a cookie containing a session ID. So all you need to do is investigate the exact format of the request to the server (e.g. using Firebug) and get the cookie from the server's response.
Third: just use that cookie to navigate the pages on the site. This might help.
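Putting the three steps together, a rough sketch in Python 2 follows. The URLs and the form field names 'username' and 'password' are assumptions; inspect the real login form (e.g. with Firebug) to find the actual ones. Note that urllib2 negotiates SSL for https:// URLs on its own.

    import urllib
    import urllib2
    import cookielib

    # A cookie jar collects the session cookie from the login response.
    cookie_jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))

    # Step two: POST the credentials over HTTPS.
    login_data = urllib.urlencode({'username': 'me', 'password': 'secret'})
    opener.open('https://www.example.edu/login', login_data)

    # Step three: the same opener sends the session cookie automatically.
    records_page = opener.open('https://www.example.edu/records')
    print records_page.read()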
P.S. As you can see, there are a lot of 'probably's in this answer: the exact authentication process may differ from the one described above, and you will have to investigate it yourself.
Roman's answer is good advice: you generally don't need to act like a real user when your script can call HTTP methods directly.
However, if you are not comfortable with reverse engineering the HTTP operations that the site requires, then an alternative would be to use Selenium, a tool for simulating interaction with web pages. Selenium is usually used by web application developers to test their applications, but it can also be used as an automatable client for an existing website.
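For instance, a sketch using Selenium's Python bindings; the URLs and element names are assumptions to be adjusted to the real pages:

    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get('https://www.example.edu/login')

    # Selenium drives a real browser, so the page's JavaScript just runs,
    # including whatever the 'Submit' button does.
    driver.find_element_by_name('username').send_keys('me')
    driver.find_element_by_name('password').send_keys('secret')
    driver.find_element_by_name('submit').click()

    driver.get('https://www.example.edu/records')
    print driver.find_element_by_id('records').text
    driver.quit()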
I'm currently making an open source browser extension that will send requests to my site. This can easily be done with Ajax: a request is sent to the page action.php.
My site will use PHP. Now the question is: how can I make sure action.php receives the request from the original extension? Griefers could easily send false information to the server, or a fork could be used to send incorrect data. I thought of generating a token of some sort, but anyone could recreate it, I guess.
How can I prevent this situation?
I have some experience with this myself. I've been building an extension with a login and eventually concluded that security in an extension is inherently difficult.
The issue is that an extension is just a bundle of JS and HTML that anyone can inspect the values of. This means that anyone determined enough to dig through your code can potentially find out how to bypass anything you have built in.
The solution I eventually came to is that the extension itself cannot hold any long-lasting secrets. A session with a timeout is the only safe thing to store. The actual login for my extension is done via a website over HTTPS.
If you are trying to do this without any such login, your only recourse is to make it as difficult as possible to determine what needs to be sent: use an algorithm that generates server-verifiable tokens, and only publish minified code to the web store.
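For illustration, the server-side half of such a token scheme might look roughly like this; it is a sketch in Python (your server is PHP, but the idea carries over), and every name in it is hypothetical. As noted below, this only raises the bar; it cannot stop a determined attacker once the code is public:

    import hashlib
    import hmac

    # Shipped inside the extension, so not actually secret once published.
    SHARED_SECRET = 'raises-the-bar-only'

    def sign(payload):
        # The extension computes the same HMAC over the request body.
        return hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()

    def is_valid(payload, token):
        # Constant-time comparison avoids leaking the token via timing.
        return hmac.compare_digest(sign(payload), token)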
EDIT: Reread the question and noticed that you said you are doing this open source. Without some sort of authentication on the web server via HTTPS, there is little you can do to stop those determined to bypass your protections, because they will be on display in your public repository.
For sensitive endpoints like this, it would make sense to do the data processing server-side. The client would only have to query the server to process the data.
I have a SignalR chat site that's meant for a school project (also uses C#). Theoretically, it is for trusted users, but as everyone will attest - never trust your users. This was proven to me as I sent out the link to a couple of my friends and they immediately tried to break it, ha ha.
I've sanitized all inputs properly now, but one thing they were still able to do was use the browser console tools to manually call the functions needed to send messages, etc.
Example: $.connection.chatHub.server.sendMessageToAll('FakeUser','FakeMsg',0);
I would like to prevent these types of actions. I recall a while back Facebook actually disabled the console window for "security" purposes. I even found several resources which detail how this was done, and attempts to further prevent console use once Chrome had fixed this.
However, none of these options work anymore and because browsers are constantly in flux, I'd rather not attempt to block at this level.
I was wondering if anyone on Stack knows of a better way to prevent these types of attacks? Is there a good way to check where the call is coming from? Does SignalR have a good method to prevent this? Ideas/Discussion would be surely welcome.
Trying to lock down the client like that might work reasonably well to prevent non-technical users from messing with your app, but it will do next to nothing against a knowledgeable and resourceful opponent. The circumstances under which such security measures make sense are rather limited, and certainly do not include any application that is accessible to everyone from the internet.
The only safe approach is well-known and very simple: the server does not trust the client for anything. It doesn't then matter what the client attempts to do as the server will refuse all actions it does not deem valid.
In your example, the server would assign a randomized opaque connection id to each session. The client would only be able to convince the server to do anything if they sent a valid id as part of their request; then, the server would not need to trust the client for a username because it would already know what connection each user has logged in from and could produce the username when given the id.
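As a language-agnostic illustration of that principle (sketched in Python; this is not SignalR's actual API):

    import uuid

    connections = {}  # opaque connection id -> username, filled in at login

    def login(username):
        connection_id = uuid.uuid4().hex  # randomized and opaque
        connections[connection_id] = username
        return connection_id

    def broadcast(username, message):
        print '%s: %s' % (username, message)  # stand-in for real delivery

    def send_message_to_all(connection_id, message):
        username = connections.get(connection_id)
        if username is None:
            raise ValueError('unknown connection; request rejected')
        # The username comes from the server's own map, never from the client.
        broadcast(username, message)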
I've been attempting to do some research on this topic for a while, and can even cite the following Stack Overflow threads:
Javascript Hijacking - When and How Much Should I Worry
JSON Security Best Practices
But my basic problem is this.
When I am building my web applications, I use tools like Fiddler, Chrome Developer Tools, Firebug, etc. I change things on the fly to test them. I can even use Fiddler to change the data that gets sent to the server.
What stops someone else from just opening up my webpage and doing this too? All of the jQuery validation in the world is useless if a user can just hit F12 and open up Chrome Developer tools, and change the data being sent over the wire, right?
I'm still relatively new to this field, and this has me very concerned as I see "open" protocols become more and more ubiquitous. I don't understand SSL yet (which is on my list of things to research next), so perhaps that is the answer and I just haven't dug deep enough. But the level of flexibility I have over manipulating my pages seems very extreme, which has me very concerned about what someone malicious could do.
Your concerns are indeed justified. This is why you should always validate everything on the server. Client-side validation should only be used for UX.
JavaScript's security is, in a nutshell, based around a trusted server. If you always trust what code the server sends you, it should be safe. It's impossible for a third party (like an ad supplier) to fetch data from the domain it's included on.
If the server also sends you user generated content, and in particular user generated code, then you have a potential security problem. This is what XSS attacks focus on (running a malicious script in a trusted environment).
Client-side validation should focus on ease of use: make it easy to correct mistakes, or guide the user so that no mistakes are made in the first place. The server should always validate as well, and with stricter rules.
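As a minimal illustration of the principle (framework-neutral Python; the field names are invented for the example):

    def save_profile(form):
        pass  # stand-in for the real persistence code

    def handle_update_profile(form):
        # Re-check everything server-side, regardless of what the
        # client-side validation already did.
        errors = []
        age = form.get('age', '')
        if not age.isdigit() or not (0 < int(age) < 130):
            errors.append('age must be a number between 1 and 129')
        if '@' not in form.get('email', ''):
            errors.append('email does not look valid')
        if errors:
            return 400, errors  # reject; never assume the client checked
        return 200, save_profile(form)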
Validation should always happen server-side; client-side validation is only valuable for making the experience more convenient for the user. You can never trust a user not to manipulate the data on their end. (JavaScript is client-side.)
Next, if you want to secure your service so that only user1 can edit user1's profile, you'll need to sign your JSON requests with OAuth (or a similar protocol).
Yeah, nothing can stop anybody from tampering with the data that is sent from the browser to your server, and that's the reason you shouldn't trust it.
Always check data coming from the user for authenticity and validity.
You can also use those same tools to inspect and tamper with the data that big sites like Google and Microsoft send back, and you might get an idea of how they handle it.
You have to assume that the client is malicious; using SSL does not prevent this at all. All data validation and authorization checks need to be done server-side.
JavaScript isn't going to be your only line of defense against hackers; in fact, it shouldn't be used for security at all. Client-side code can be used to verify form input so that legitimate users get faster response times and the page runs nicely. Anyone trying to hack your page isn't going to care whether your page works or not. No matter what, everything coming into your server should be verified and never assumed to be safe.
We have a heavily Ajax-dependent application. What are good ways of making sure that requests to the server-side scripts are coming from an actual user sitting at a browser, and not from standalone programs?
There aren't any, really.
Any request sent through a browser can be faked by a standalone program.
At the end of the day, does it really matter? If you're worried, make sure requests are authenticated and authorized, and that your authentication process is good (remember, Ajax sends browser cookies, so your "normal" authentication will work just fine). Just remember that, of course, standalone programs can authenticate too.
What are good ways of making sure that requests to the server-side scripts are coming from an actual user sitting at a browser, and not from standalone programs?
There are no ways. A browser is indistinguishable from a standalone program; a browser can be automated.
You can't trust any input from the client side. If you are relying on client-side co-operation for any security purpose, you're doomed.
There isn't a way to automatically block "non-browser" requests from hitting your server-side scripts, but there are ways to identify which requests have been triggered by your application and which haven't.
This is usually done using something called "crumbs". The basic idea is that the page making the AJAX request should generate (server-side) a unique token, typically a hash of a Unix timestamp plus a salt and a secret. This token and the timestamp are passed as parameters with the AJAX request. The AJAX handler script first checks the token (and the validity of the timestamp, e.g. whether the current time falls within 5 minutes of the token's timestamp). If the token checks out, you can proceed to fulfill the request. This token generation and checking can be coded up as an Apache module so that it is triggered automatically and kept separate from the application logic.
Fraudulent scripts won't be able to generate valid tokens (unless they figure out your algorithm), so you can safely ignore them.
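A sketch of the crumb scheme in Python; it uses an HMAC of the timestamp rather than a bare hash of timestamp + salt + secret, which serves the same purpose, and all names are illustrative:

    import hashlib
    import hmac
    import time

    SECRET = 'server-side secret, never sent to the client'

    def make_crumb(timestamp):
        # timestamp is an integer Unix time, e.g. int(time.time())
        return hmac.new(SECRET, str(timestamp), hashlib.sha256).hexdigest()

    def check_crumb(timestamp, crumb):
        if abs(time.time() - timestamp) > 300:  # the 5-minute window
            return False
        return hmac.compare_digest(make_crumb(timestamp), crumb)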
Keep in mind that storing a token in the session is another option, but that won't buy you any more security than your site's authentication system already provides.
I'm not sure what you are worried about. From where I sit I can see three things your question can be related to:
First, you may want to prevent unauthorized users from making valid requests. This is solved by using a browser cookie to store a session ID. The session ID needs to be tied to the user, be regenerated every time the user goes through the login process, and have an inactivity timeout. Any request coming in without a valid session ID is simply rejected (see the sketch after these three points).
Second, you may want to prevent a third party from running replay attacks against your site (i.e. sniffing an innocent user's traffic and then sending the same calls over again). The easy solution is to go over HTTPS: the SSL layer prevents somebody from replaying any part of the traffic. This comes at a cost on the server side, so you want to be sure the risk is one you really cannot accept.
Third, you may want to prevent somebody from using your API (that's what AJAX calls are, in the end) to implement his own client for your site. For this there is very little you can do. You can always check for the appropriate User-Agent, but that's easy to fake and will probably be the first thing somebody trying to use your API thinks of. You can also gather some statistics, for example the average number of AJAX requests per minute on a per-user basis, and see whether some users are way above the average. That's hard to implement, and it's only useful if you are trying to stop automated clients that react faster than a human can.
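To illustrate the first point, a sketch in Python (the in-memory store and the timeout value are illustrative, not prescriptive):

    import time
    import uuid

    SESSION_TIMEOUT = 15 * 60  # 15 minutes of inactivity
    sessions = {}  # session id -> (username, last activity time)

    def create_session(username):
        session_id = uuid.uuid4().hex  # regenerated on every login
        sessions[session_id] = (username, time.time())
        return session_id

    def require_session(session_id):
        entry = sessions.get(session_id)
        if entry is None or time.time() - entry[1] > SESSION_TIMEOUT:
            sessions.pop(session_id, None)
            raise ValueError('invalid or expired session; reject the request')
        sessions[session_id] = (entry[0], time.time())  # refresh activity
        return entry[0]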
Is Safari a web browser to you?
If it is, note that the same engine is embedded in many applications, for example those using Qt's QtWebKit libraries. So I would say there is no way to recognize it.
A user can forge any request he wants, faking headers like User-Agent however he likes...
One question: why would you want to do what you ask for? What difference does it make to you whether they request from a browser or from anything else?
I can't think of one reason you'd call this "security".
If you still want to do this, for whatever reason, think about making your own application with an embedded browser. It could authenticate to the server in every request; then you would only send valid responses to your own application's browser.
Users would still be able to reverse engineer the application, though.
Interesting question.
What about browsers embedded in applications? Would you mind those?
You can probably think of a way of "proving" that a request comes from a browser, but it will ultimately be heuristic. The line between browser and application is blurry (e.g. embedded browsers), and you would always run the risk of rejecting users with unexpected browsers (or unexpected versions thereof).
As has been mentioned before, there is no way of accomplishing this... but there is one thing worth noting, useful for preventing the CSRF attacks that target this specific AJAX functionality: set a custom header with the help of the AJAX object, and verify that header on the server side.
And if, in the value of that header, you set a random (one-time-use) token, you can prevent automated attacks.
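Server-side, that check might look like this (sketched in Python; the header name X-CSRF-Token and the in-memory token store are assumptions for illustration):

    import uuid

    issued_tokens = set()

    def issue_token():
        token = uuid.uuid4().hex  # embed this in the page for the AJAX call
        issued_tokens.add(token)
        return token

    def check_request(headers):
        # headers is assumed to be a dict-like view of the request headers.
        token = headers.get('X-CSRF-Token')
        if token not in issued_tokens:
            return False
        issued_tokens.discard(token)  # one-time use: valid exactly once
        return True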
I have a problem where I cannot identify visitors to my intranet page because their browser is configured to use a proxy, even for the local intranet. I always see the proxy IP and no other details about the client. The SOE that my company uses has the proxy set up already for Firefox and Internet Explorer, and I cannot ask them to reconfigure their browser because that is fairly complicated. I have tried using the PHP $_SERVER['REMOTE_ADDR'] and also one called $HTTP_SERVER_VARS['HTTP_X_FORWARD_FOR']. In fact, I wrote a page that lists both the $_SERVER and $HTTP_SERVER_VARS arrays and there was nothing informative of the actual client connecting. This is why I think it needs to be done on the client's side.
I'm not looking for a secure solution because it is only a simple page, so I was hoping that I could use Javascript or something similar to find something revealing about the client and send it to my intranet page as a GET variable. It's basically for collating statistics. It is no use telling me most of the visitors are a proxy! :)
I also want to avoid having users log in if possible.
You could use a cookie with a random, unique ID that is set upon the first visit and then used for identification. This could be done either in JavaScript or in PHP.
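For illustration, here is the idea sketched in Python (the answer's suggestion of JavaScript or PHP works the same way), using a WSGI-style environ dict:

    import uuid
    import Cookie

    def identify_visitor(environ, response_headers):
        cookies = Cookie.SimpleCookie(environ.get('HTTP_COOKIE', ''))
        if 'visitor_id' in cookies:
            return cookies['visitor_id'].value  # returning visitor
        visitor_id = uuid.uuid4().hex  # first visit: mint a unique ID
        response_headers.append(
            ('Set-Cookie', 'visitor_id=%s; Path=/' % visitor_id))
        return visitor_id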
I am pretty sure there's no universal way to do this; otherwise the whole concept of anonymous proxies would go down the drain :)
My advice would be to ask your IT department to configure the proxy to populate X-Forwarded-For, Remote-Addr, or some other identifying header.