I will do my best to try to explain this.
I am scraping a website for its elements so that I can output them in a different format. The problem I am running into is that the site moves the user from page to page through a JavaScript redirect.
When checking the 'a href' tag, this is the JavaScript that shows up:
javascript:doParamSubmit(2100, document.forms['studentFilteredListForm'], 'SSC000001MU9lI')
The SSC000001MU9lI changes for each element that it redirects to.
Is it possible to find a URL using this Javascript, so that I can reach the HTML page externally?
EDIT: Here are the doParamSubmit and doSubmit functions:
function doParamSubmit(event, form, parameter) {
form.userParam.value = parameter;
doSubmit(event, form);
}
function doSubmit(event, form)
{
// Make sure if something fails that the form can be resubmitted
try
{
// If this form has not been submitted yet... (except for IE)
if (allowSubmit == true && form != null && (submitted == false || isInternetExplorer6() || isInternetExplorer7()))
{
submitted = true;
form.userEvent.value = event;
// Fix for IE bug in which userEvent becomes a property array.
if (form.userEvent.length)
{
form.userEvent[0].value = event;
}
// Disable the form so the user can't accidentally resubmit the page
// (NOTE: this doesn't disable links (e.g. <a href="javascript:...">)
disableForm(form);
// If there is a populate form function, call it. If there are spell check fields on the
// page, populateForm is used to set hidden field values.
if (this.populateForm)
{
populateForm();
}
saveScrollCoordinates();
// resetSessionTimeout();
try
{
form.submit();
}
catch(e)
{
// Exceptions thrown here are only caused by canceling the submit in onbeforeunload, so ignore.
submitted = false;
}
}
if (allowSubmit == false)
{
alert(grabResource("message.pageLoading"));
}
}
catch(e)
{
submitted = false;
throw e;
}
}
I see 2 approaches.
Use a JavaScript-enabled browser such as http://nrabinowitz.github.io/pjscrape/. I am not sure whether you intend to just follow the links or to grab the URL for some other use, so your mileage may vary.
Find the doParamSubmit() function in their page/scripts and analyze it to understand how it gets the URL. The one you have as an example looks like it grabs the action from a form, perhaps? Once you know how the function works, you might be able to harness that in your scraping by using a regex to find links that match the doParamSubmit pattern and going from there. It's hard to say without seeing the function itself as well as the other links like it, though.
Regardless of which method you choose, I would begin by understanding the function. Look for it in the page source or the loaded JS files (the JavaScript debuggers in most browsers can help you find it) and see what happens; it might be super obvious.
Also keep in mind that this might be a POST for a form, in which case following that link may not work if the server expects valid form data.
Edit: I see that you posted the function. It simply submits the form passed as the second parameter, i.e. 'studentFilteredListForm', after setting its userParam and userEvent fields. While I don't think your scraping will go too far chasing forms, you can still get the URL, either with JavaScript if your scraper lets you (something like $('form[name=studentFilteredListForm]').attr('action')) or using whatever language you are using for the scraper, i.e. find the form and extract the action URL (remembering that if there is no action it is probably posting back to the current URL).
But again, you might first manually get the URL of the form and see where that gets you. You might just get a page with form errors :)
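As a rough sketch of the second approach: the link's arguments can be pulled out with a regular expression and turned into the two form fields that doParamSubmit() and doSubmit() set before submitting (userParam and userEvent). The helper names parseDoParamSubmit and buildFormBody are made up for illustration. The form's action URL, its method, and any other hidden fields the server expects are not known from the question, so they are left to the caller.

```javascript
// Hypothetical helper: extract the three arguments from an href such as
// javascript:doParamSubmit(2100, document.forms['studentFilteredListForm'], 'SSC000001MU9lI')
function parseDoParamSubmit(href) {
  var pattern = /doParamSubmit\(\s*(\d+)\s*,\s*document\.forms\[['"]([^'"]+)['"]\]\s*,\s*['"]([^'"]+)['"]\s*\)/;
  var match = pattern.exec(href);
  if (!match) {
    return null; // not a doParamSubmit link
  }
  return { event: match[1], formName: match[2], param: match[3] };
}

// Hypothetical helper: build a URL-encoded body containing the two fields
// the page's own script fills in, plus any other fields scraped from the form.
function buildFormBody(link, otherFields) {
  var params = new URLSearchParams(otherFields || {});
  params.set("userEvent", link.event);
  params.set("userParam", link.param);
  return params.toString();
}

var link = parseDoParamSubmit(
  "javascript:doParamSubmit(2100, document.forms['studentFilteredListForm'], 'SSC000001MU9lI')"
);
var body = buildFormBody(link);
```

The body would then be sent to the form's action URL; whether the server accepts it without a valid session and the form's remaining fields is something only a manual test will show.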
The Environment
PHP: Symfony -> Twig, Bootstrap
Doctrine -> Mongodb
Jquery
The Project
The timer is working, counting the interval, and I can send the form. This is good, but once I get to the part where the form is sent I seem to be running into two problems:
interval counter working
form submitting automatically, then
The null fields do not get accepted
The form does not get validated
The form itself is setup through a custom built PHP graphical interface that allows you to set the simple things very easily: fields, fieldtypes, names and labels, etc.
This is cool, but it means that you can't go in and tinker with the form very much, or at all really. The most you can change is aesthetics in the template. You could wrap each form element in another element or some kind of HTML identifier; that is an option, but a tedious one.
On a valid check the form gets upserted and all is well; otherwise the else branch catches the non-valid form and lets the user know that some fields are not filled in. I would post that code here, but I think it would be frowned upon. Hopefully you understand the PHP concept.
The part that I can share is the JS:
autoSaveForm = function() {
var form = $('form'),
inputs = form.find('input');
form.on('submit', function() {
inputs.each( function( input ) {
if (input.attr("disabled") === true){
input.removeAttr("disabled");
}
})
});
function counter(){
var present = $('.minutes').attr('id'),
past = present;
$('.minutes').text(past);
setInterval(function(){
past--;
if(past>=0){
$('.minutes').text(past);
}
if(past==0){
$('.minutes').text(present);
}
},1000);
}
function disableInputs(){
inputs.each( function( input ) {
if (input.val() === ""){
input.attr("disabled", true);
}
});
}
// Start
counter();
// Loop
setInterval(function(){
counter();
form.submit()
},10000);
};
This sets a counter on the page and counts through an interval, it's set to seconds right now for testing, but eventually will be 10 minutes.
Possible Solutions
disable the null fields before submit and remove the attribute after submit is accomplished
this needs to be done so that the user can do a final completed save at the end of the process
this doesn't actually seem to be disabling the fields
There were a few things I found about adding novalidate to the HTML, but it wasn't supported in all browsers and it might get tricky with the static nature of the forms.
Post the form with ajax and append a save true variable to the end of the url, then redirect it in PHP Controller with a check on the URI var although:
then I still have the null fields
to solve this: set a random value or default value to the fields as they are being passed through. haven't tried this yet
Something kind of like what this question is addressing, although I have some questions still about the null fields - Autosaving Form with Jquery and PHP
Sending the form through ajax
Any input would be greatly appreciated. :) Thanks
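One thing worth checking in the code above: jQuery's .each() calls its callback with (index, element), so the input parameter there is a number rather than a jQuery object, and disableInputs() is never called. A minimal sketch of the "leave out the null fields" idea follows. It is written against the name/value pairs that jQuery's .serializeArray() produces so that it can run without a page; the helper name dropEmptyFields and the sample field names are hypothetical.

```javascript
// Hypothetical helper: keep only the fields that have a non-empty value.
// `fields` has the shape returned by jQuery's $(form).serializeArray():
// [{ name: "title", value: "Draft" }, ...]
function dropEmptyFields(fields) {
  return fields.filter(function (field) {
    return String(field.value).trim() !== "";
  });
}

var sample = [
  { name: "title", value: "Draft" },
  { name: "notes", value: "" },
  { name: "owner", value: "   " }
];
var toSend = dropEmptyFields(sample);

// In the page this could be posted with something like
//   $.post(form.attr("action"), $.param(toSend));
// instead of calling form.submit(), so the inputs never need to be disabled.
```

Because the empty fields are dropped from the request rather than disabled in the DOM, the user's final manual save still sees the full form.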
I know that on the client side (JavaScript) you can use window.location.hash, but I could not find any way to access it from the server side. I'm using ASP.NET.
We had a situation where we needed to persist the URL hash across ASP.Net post backs. As the browser does not send the hash to the server by default, the only way to do it is to use some Javascript:
When the form submits, grab the hash (window.location.hash) and store it in a server-side hidden input field. Put this in a DIV with an id of "urlhash" so we can find it easily later.
On the server you can use this value if you need to do something with it. You can even change it if you need to.
On page load on the client, check the value of this hidden field. You will want to find it by the DIV it is contained in, as the auto-generated ID won't be known. Yes, you could do some trickery here with .ClientID, but we found it simpler to just use the wrapper DIV, as it allows all this JavaScript to live in an external file and be used in a generic fashion.
If the hidden input field has a valid value, set that as the URL hash (window.location.hash again) and/or perform other actions.
We used jQuery to simplify the selecting of the field, etc ... all in all it ends up being a few jQuery calls, one to save the value, and another to restore it.
Before submit:
$("form").submit(function() {
$("input", "#urlhash").val(window.location.hash);
});
On page load:
var hashVal = $("input", "#urlhash").val();
if (IsHashValid(hashVal)) {
window.location.hash = hashVal;
}
IsHashValid() can check for "undefined" or other things you don't want to handle.
Also, make sure you use $(document).ready() appropriately, of course.
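The answer leaves IsHashValid() undefined. One possible version is sketched below; the rule it enforces (a "#" followed by letters, digits, hyphens or underscores) is an assumption and should be adjusted to whatever hashes your pages actually use.

```javascript
// Hypothetical implementation of the IsHashValid() check mentioned above.
// Assumption: a usable hash is a string that starts with "#" followed by
// one or more letters, digits, hyphens or underscores.
function IsHashValid(hashVal) {
  if (typeof hashVal !== "string") {
    return false; // covers undefined and null
  }
  // The pattern also rejects "", a bare "#", and the literal text "undefined".
  return /^#[A-Za-z0-9_-]+$/.test(hashVal);
}
```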
[RFC 2396][1] section 4.1:
When a URI reference is used to perform a retrieval action on the
identified resource, the optional fragment identifier, separated from
the URI by a crosshatch ("#") character, consists of additional
reference information to be interpreted by the user agent after the
retrieval action has been successfully completed. As such, it is not
part of a URI, but is often used in conjunction with a URI.
(emphasis added)
[1]: https://www.rfc-editor.org/rfc/rfc2396#section-4
That's because the browser doesn't transmit that part to the server, sorry.
Probably the only choice is to read it on the client side and transfer it manually to the server (GET/POST/AJAX).
Regards
Artur
You may see also how to play with back button and browser history
at Malcan
Just to rule out the possibility you aren't actually trying to see the fragment on a GET/POST and actually want to know how to access that part of a URI object you have within your server-side code, it is under Uri.Fragment (MSDN docs).
Possible solution for GET requests:
New Link format: http://example.com/yourDirectory?hash=video01
Call this function toward top of controller or http://example.com/yourDirectory/index.php:
function redirect()
{
if (!empty($_GET['hash'])) {
/** Sanitize & Validate $_GET['hash']
If valid return string
If invalid: return empty or false
******************************************************/
$validHash = sanitizeAndValidateHashFunction($_GET['hash']);
if (!empty($validHash)) {
$url = './#' . $validHash;
} else {
$url = '/your404page.php';
}
header("Location: $url");
exit; // stop here so the rest of the page is not rendered after the redirect
}
}
I have two subsequent forms on my website with POST method.
The first page of my website first.php contains this code:
<form action="a.php" method="POST" target="_blank">
<input name="value" type="hidden" value="foo"/>
<div class="button"><label><span class="icon"></span>
<input type="submit" class="button-graphic ajax" value="Click Here"></label></div></form>
a.php can be accessed only via this POST request (otherwise user will get method not allowed 405 error)
Once submitted, this form opens a.php with an AJAX modal window.
a.php contains another form:
<form action="b.php" method="POST" target="_blank">
<input name="bar" type="hidden" value="none"/>
<div class="border"><label><input type="submit" class="button-graphic2 tracking" value="Continue"></label></div></form>
When a user clicks Submit in the second form, it will open b.php,
which can also be accessed only via POST request (otherwise - 405 error).
The only difference I can think of between these forms is that the second one has a tracking JS class (which opens an iframe). This is the JS code:
$(document).ready(function() {
$(".tracking").click(function(){
var iframe = document.createElement('iframe');
iframe.style.width = '0px';
iframe.style.height = '0px';
iframe.style.display = 'block';
document.body.appendChild(iframe);
iframe.src = '/track.htm';
});
});
This is done in order to track a conversion using a third-party script which is executed from track.htm.
I noticed that I am having a problem with about 5% of my iPad visitors.
They open a.php properly with a POST request, but when they continue on to b.php, about 5% send a GET request instead of the desired POST request, which causes them to get a 405 error and leave the website.
I know that these are real human users, as I can see some of them trying several times to open b.php and getting these 405 errors every time.
Could this be caused by their device simultaneously using a GET request to obtain track.htm? Is this some glitch?
How can this be solved?
EDIT 4.4.2015:
Since there's a chance that firing the tracking script is causing this, I would like to know if there's another way to fire it (or to track that AdWords conversion) without causing these iPad users to send GET requests for the form as well.
EDIT 10.4.2015:
This is the jQuery code of the ajax class, which affects both first.php and perhaps a.php, as first.php is the parent frame:
$(document).ready(function() {
$(".ajax").click(function(t) {
t.preventDefault();
var e = $(this).closest("form");
return $.colorbox({
href: e.attr("action"),
transition: "elastic",
overlayClose: !1,
maxWidth: $("html").hasClass("ie7") ? "45%" : "false",
opacity: .7,
data: {
value: e.find('input[name="value"]').val(),
}
}), !1
})
});
Technically, it shouldn't happen. The iframe created by your tracking script points to /track.htm, so there shouldn't be any GET request to your b.php page.
On the other hand, just thinking out loud here, there are a few scenarios that could happen with "real world" users.
The users happen to have bookmarked the b.php page, which causes them to open it using GET when they try to re-open the page from their bookmark.
The users tried to refresh b.php and got warned about "Form re-submission". Being clueless, as most real users are, they canceled the form re-submission, then clicked on the address bar and hit GO in their browser with the sole intention of reloading the page. This would also send a GET request to b.php.
As a best practice when designing the page flow for form submission, it might be better to only "process" your form data in b.php and then return a 302 Redirect to another page that shows the result using a GET request. This allows users to "refresh" the page without double-submitting the form, and also lets them bookmark the result page.
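The process-then-redirect flow from the last paragraph could look roughly like this. It is written as a plain function rather than for any particular server framework; handleB, the /result.php target and the shape of the request and response objects are all made up for illustration.

```javascript
// Hypothetical handler for b.php: process the POST once, then redirect to a
// page that can safely be reloaded or bookmarked because it is fetched with GET.
// `request` is { method: "POST" | "GET", fields: {...} }.
function handleB(request, processForm) {
  if (request.method !== "POST") {
    // Same behaviour as today for anything that is not a form submission.
    return { status: 405, headers: { Allow: "POST" } };
  }
  processForm(request.fields); // do the work exactly once
  return { status: 302, headers: { Location: "/result.php" } };
}

var processed = [];
var response = handleB(
  { method: "POST", fields: { bar: "none" } },
  function (fields) { processed.push(fields); }
);
```

After the redirect, a refresh or a bookmark hits /result.php with GET, so b.php itself is only ever reached by a real form submission.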
This doesn't answer your question as it relates to the GET glitch, but as things stand, ~5% of your iPad visitors can't sign up because the code only accepts POST, and so far no one can figure out why. So I propose a change of strategy, at least in the meantime.
Preventing CSRF by only accepting POST requests is already known not to work. Your choice of accepting only this request method as a means of security is what ultimately results in the 405. There are better ways.
One example is using a CSRF token, specifically the Synchronizer Token Pattern.
The idea behind a CSRF token is that when you generate the form, you also generate a "key" which you tie to the form. When that form is submitted, if it doesn't have the key or the key isn't the right one, you don't bother processing the form. The Synchronizer Token Pattern gets fancy in that it changes the expected key each time (in the form field implementation, giving the <input type="hidden"> field a new name attribute each time) in addition to the value.
Have your code in a.php generate a random token and store it as a session variable on the server. Output the token in the form as a hidden field.
Before processing the request in b.php, ensure the token value is in the request data and ensure it has the expected value.
You can first check for $_POST data and if it is missing, check for $_GET data. Regardless of which array contains the data, if the data does not have a valid CSRF token, respond with a 4xx error.
If the token is good, consume the token and process the request.
If the token is missing or is invalid, return a 4xx response code.
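The steps above can be sketched in a few lines. This version is in JavaScript (Node.js) purely to show the flow, not as a drop-in for the PHP pages; the session object is a plain stand-in for a real server-side session, and issueToken and consumeToken are hypothetical names.

```javascript
const nodeCrypto = require("crypto");

// Called while rendering the form (a.php in the question): create a random
// token, remember it in the visitor's session, and return it so it can be
// written into a hidden field.
function issueToken(session) {
  session.csrfToken = nodeCrypto.randomBytes(16).toString("hex");
  return session.csrfToken;
}

// Called before processing the submission (b.php in the question): accept
// the request only if the submitted token matches, then consume the token.
function consumeToken(session, submitted) {
  const expected = session.csrfToken;
  if (typeof expected !== "string" || typeof submitted !== "string") {
    return false; // no token issued, or none sent
  }
  const a = Buffer.from(expected);
  const b = Buffer.from(submitted);
  // timingSafeEqual throws on buffers of different length, so check first.
  if (a.length !== b.length || !nodeCrypto.timingSafeEqual(a, b)) {
    return false;
  }
  delete session.csrfToken; // a token is good for one submission only
  return true;
}
```

A caller would respond with a 4xx status whenever consumeToken() returns false, whichever request method delivered the data.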
Another way would be to set your field names to random values each time the form is generated, so that instead of <input name="value" type="hidden" value="foo"/> or <input name="bar" type="hidden" value="none"/> the name attributes are random tokens that you map back to the real field names on the server:
// ... in an importable file somewhere ...
// Generate our tokens
function token($len = 13) {
$chrs = 'abcdefghijklmnopqrstuvwxyz0123456789_';
$str = '';
$upper_lim = strlen($chrs) - 1;
for ($i = 0; $i < $len; $i++) {
$idx = rand(0, $upper_lim);
$str .= rand(0, 1) ? strtoupper($chrs[$idx]) : $chrs[$idx];
}
return $str;
}
function magic_set_function($key, $value) {
$_SESSION[$key] = $value;
}
function magic_get_function($key) {
return (array_key_exists($key, $_SESSION) ? $_SESSION[$key] : NULL);
}
function validate_request() {
$data = !empty($_POST) ? $_POST : $_GET;
if ( empty($data) ) { return false; }
// Ensure the tokens exist (hopefully not too costly)
$field_tokens = magic_get_function('field_tokens');
if ( $field_tokens === NULL ) { return false; }
$csrf_token_name = $field_tokens['token'];
$given_csrf_token = $data[$csrf_token_name];
// Get our CSRF token
$expected_csrf_token = magic_get_function('csrf_token');
// ensure we're expecting a request / that we have generated a CSRF
if ( $expected_csrf_token === NULL ||
$expected_csrf_token !== $given_csrf_token) {
return FALSE;
}
// After whatever other checks you want...
return TRUE;
}
function fetch_data() {
$data = empty($_POST) == FALSE ? $_POST : $_GET;
if ( empty($data) ) { throw new DataLoadException(); }
// Ensure the tokens exist (hopefully not too costly)
$field_tokens = magic_get_function('field_tokens');
if ( $field_tokens === NULL ) { throw new TokenLoadException(); }
foreach ($field_tokens as $field_name => $token_name) {
if ( isset($data[$token_name]) ) {
$data[$field_name] = $data[$token_name];
unset($data[$token_name]);
}
}
return $data;
}
// first.php/a.php/b.php (wherever necessary)
// ...
$tokens = array();
// our csrf token
$csrf_token = token();
$field_names = array('value', 'bar', 'token');
$field_values = array('value'=>'foo', 'bar' => 'none', 'token' => $csrf_token);
// Tokenize errthing...
foreach ($field_names as $k => $field_name) {
// and generate random strings
$tokens[$field_name] = token();
}
// You NEED TO STORE THESE TOKENS otherwise submissions lose context
magic_set_function('field_tokens', $tokens);
magic_set_function('csrf_token', $csrf_token); // dup, but j.i.c.
// first.php
printf('<input type="hidden" name="%s" value="%s"/>', $tokens['value'], $field_values['value']);
// ...
// a.php
// Get the data... (POST/GET)
if (validate_request() !== TRUE) { handle_invalid_request(); }
$data = fetch_data();
// ...
// Tokenize errthing, generate a csrf, store the values, etc.
// ...
printf('<input type="hidden" name="%s" value="%s"/>', $tokens['bar'], $field_values['bar']);
// ...
// b.php
// ... You get the idea ...
It doesn't answer your question of why 5% are sending GET Requests but it does solve your overall problem on both a security and user level.
EDIT:
To specifically answer OPs questions in comments:
"(1) does this require using cookies? (a session means cookies right?)"
Read up on PHP Sessions and look for a session library. Plenty out there, one heavyweight being Zend(http://framework.zend.com/manual/1.12/en/zend.session.html). You can save to a database instead for protected server-side sessions. I made one similar to Kohana's.
(2) I didn't understand the "another way" part - how does it differ from the method you described at first?
First method is to just add a token to your form and look for the token to have the expected value upon submission. If the form doesn't have it, you throw an error complaining.
Second method dynamically sets the field names upon form generation AND adds a token field. Submitting the proper form data from a program, bot, or outside source now first requires fetching the form, since they won't know what field names to use (instead of just posting data with set field names).
"(3) most important, I am less worried about CSRF attacks, I just don't want bots/crawler to crawl into my forms, would this method prevent it from them, as opposed to humans? why? and is there an easier method to achieve that?"
If you mean bots like Google/SEO/respectful web crawlers, robots.txt exists for this purpose. robots.txt is a very simple text file that is placed in your site's root directory. You'll see requests in your webserver's access logs for /robots.txt. This file tells search engines and other robots which areas of your site they are allowed to visit and index. You can read more about the Robots Exclusion Standard on many websites.
Note that you shouldn't use robots.txt to hide information. It is a public file and visible to anyone. Also, malicious bots won't respect the file.
I'm not sure whether by bots you mean just crawlers or also spambots (bots trying to submit data) and such. If it's crawlers, robots.txt takes care of them. If it's spambots, you can add a hidden field (hidden with CSS, not HTML) with a common name, so that when it is filled out you know the submission is invalid; you can add a captcha; and so on.
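The CSS-hidden field trick could be checked on the server along these lines. The field name "website" and the helper name looksLikeSpambot are assumptions made for the example; the only requirement is that the field is one a human visitor never sees and therefore never fills in.

```javascript
// Hypothetical server-side check for a honeypot field. Assumption: the form
// contains an extra text input named "website" that is hidden with CSS, so
// human visitors leave it empty while form-filling bots tend to complete it.
function looksLikeSpambot(fields, honeypotName) {
  var value = fields[honeypotName];
  return typeof value === "string" && value.trim() !== "";
}

var humanSubmission = { bar: "none", website: "" };
var botSubmission = { bar: "none", website: "http://example.com" };
```

This only filters naive bots; anything that renders the page or targets the form specifically will leave the field empty.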
Try doing the tracking in the callback of the original request, to ensure it's loaded?
Also you could look into something like ajaxFormPlugin by malsup
I would suggest checking the permissions of your "b.php" page. Please make sure the page has "w" permission for all users; a missing permission could be a reason the "POST" request is not made.
I know it's a workaround but if, as I suppose, you have a bunch of checks for the $_POST variables, if you receive a GET request you could try replace the POST with the GET:
if (empty($_POST) && !empty($_GET)) $_POST = $_GET;
//here the check of $_POST
//...
since we don't know why these iPads (...Apple -.-) have the issue, and there isn't much difference between GET and POST, at least if you don't need to upload files...
The only way a POST form can be sent as GET is through script (changing the method attribute directly, or replacing the form behavior, for example with an ajax request or by binding another function to the "submit" event), so I suggest you check every script that runs in the parent and the child pages.
Your ajax call doesn't contain method: "POST". This could be the cause.
I have come across a situation that doesn't make much sense to me. Just as some background information, I'm using the Laravel framework. The page in question calls a query when the page is requested using Laravel's '->with('var', $array)' syntax. This query (which I will post later) works perfectly fine on page load, and successfully inserts dummy data I fed it.
I call this same query via an Ajax $.post using jQuery, on click of a button. However, when I do this $.post and call this query, I get an Internal Server Error every time. Everything is exactly the same, information passed included; the only difference seems to be whether or not it is called on page load or via the $.post.
Here is the error:
Below is the code that performs the query on page load:
routes.php sends the HTTP get request to a file called AppController.php
routes.php
AppController.php
The page is then made with the following array acquired from DeviceCheckoutController.php
Which then goes to DeviceCheckout.php
I am able to echo $test on the page, and it returns the ID of a new row every time the page is reloaded (which obviously means the 'insertGetId' query worked). However, I hooked this query up to the page load just to test. What I really want is for it to happen on click of a button. Here is the code for that:
$("#checkoutFormbox").on('click', '#checkoutButton', function() {
var checkoutInformation = Object();
var accessories = [];
var counter = 0;
var deviceName = checkoutDeviceTable.cell(0, 0).data();
$(".accessoryCheckbox").each(function() {
//add accessory ID's to this list of only accessories selected to be checked out
if($(this).val() == "1")
{
accessories[counter] = $(this).data('id') + " ";
}
counter++;
});
checkoutInformation['deviceID'] = $(".removeButton").val(); //deviceID was previously stored in the remove button's value when the add button was clicked
checkoutInformation['outBy'] = '';
checkoutInformation['outNotes'] = $("#checkOutDeviceNotes").val();
checkoutInformation['idOfAccessories'] = 2;
checkoutInformation['dueDate'] = $("#dueDate").val();
if($("#studentIdButton").hasClass('active'))
{
checkoutInformation['renterID'] = 0;
checkoutInformation['emplid'] = 1778884;
console.log(checkoutInformation);
$.post("http://xxx.xxx.xxx.xxx/testing/public/apps/devicecheckout-checkoutdevices", {type: "checkoutDeviceForStudent", checkoutInformation: checkoutInformation}, function(returnedData) {
alert(returnedData);
});
}
});
Which is also then routed to AppController.php, specifically to the 'checkoutDeviceForStudent' part of the switch statement:
And then back to that query that is shown previously in DeviceCheckout.php
Finally, here is my DB structure for reference:
Any explanation as for why this would be happening? Also, any Laravel or other general best practice tips would be greatly appreciated as I'm inexperienced in usage of this framework and programming overall.
Sorry for such a long post, I hope there is enough information to diagnose this problem. Let me know if I need to include anything else.
Edit: Included picture of error at the top of the page.
Everything is exactly the same, information passed included
No, it isn't. If it was exactly the same you wouldn't be getting the error you're getting.
These sorts of issues are too difficult to solve by taking guesses at what the problem might be. You need to
Set up your system so Laravel's logging errors to the laravel.log file
Set up your PHP system so errors Laravel can't handle are logged to your webserver's error log (and/or PHP's error log)
Put Laravel in debug mode so errors are output to the screen, and then view the output of your ajax request via Firebug or Chrome
Once you have the actual PHP error it's usually pretty easy to see what's different about the request you think is the same, and address the issue.
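For the last step, the failing response can also be logged from the page itself instead of being dug out of the network panel. A small sketch, relying on the status and responseText properties of jQuery's jqXHR object; formatAjaxError is a made-up helper name and the sample response text is a placeholder.

```javascript
// Hypothetical helper: turn a failed jQuery request into a readable message.
function formatAjaxError(jqXHR) {
  var body = jqXHR.responseText ? jqXHR.responseText : "(empty response body)";
  return "HTTP " + jqXHR.status + ": " + body;
}

// In the page (requires jQuery, so shown as a comment):
// $.post(url, data).fail(function (jqXHR) {
//   console.log(formatAjaxError(jqXHR));
// });

var message = formatAjaxError({ status: 500, responseText: "example error text" });
```

With Laravel in debug mode, the response text is where the underlying PHP error would show up.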
I found a resolution to my problem after some advice from a friend; much easier than I anticipated and much easier than any solution that has been offered to me here or other places.
Essentially, what I needed to do was place a try, catch clause in my model function, and then if an exception is encountered I store that in a variable, return it, and use console.log() to view the exception. Here is an example to emulate my point:
public function getUserFullname($userID)
{
try
{
$myResult = DB::connection('myDatabase')->table('TheCoolestTable')->select('fullName')->where('userID', '=', $userID)->get();
return $myResult;
}
catch(Exception $e)
{
$errorMessage = 'Caught exception: ' . $e->getMessage();
return $errorMessage;
}
}
And then on the View (or wherever your model function returns to), simply console.log() the output of your POST. This will display the results of the successful query, or the results of the Exception if it encountered one as opposed to an unhelpful Internal Server Error 500 message.
I have tried and tried with this and don't seem to be getting anywhere, so thought I would put it out there. I have a form full of data provided by the user that I have thus far been able to validate with js/jQuery without problem before sending it to php for further processing. I am achieving this like so:
form.submit(function(){
if(validateUserName() & validateEmail1() & validateEmail2() & validatePass1() & validatePass2() & acceptTerms()){
return true;
} else {
return false;
}
});
The form itself uses the following attributes:
<form id="signup" action="registration.php" method="post">
The problem function is acceptTerms(). It simply has to check that a checkbox is selected, but it does not seem to work as it should (or as the rest of the validation functions do).
var termscons = $('input:checkbox[name=termscons]');
termscons.change(acceptTerms);
function acceptTerms(){
if($(termscons).is(':checked')) {
termsconsInfo.text("");
return true;
} else {
termsconsInfo.text("In order to join, you must accept these terms and conditions.");
return false;
}
}
I have integrated the termscons.change listener and termsconsInfo.text(""); to ensure that my selectors are pointing at the right thing, and that the function is being fired correctly (which it appears to, by changing the termsconsInfo.text when its state changes). When I try to submit the form however it appears to return false, since it does not submit. It should in fact be returning true, and posting my form over to my php script.
Am I going about this the right way? Is there something I have been missing when dealing with a checkbox as opposed to a textinput?
Please excuse my ignorance, this is all still very new to me.
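First problem is that you are using & in your if statement. That is the bitwise AND operator; the logical operator is &&. (With boolean return values & still yields 1 or 0, so the test can pass, but && is what you mean here. Be aware that && stops at the first validator that returns false.)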
Second problem is with acceptTerms (as you guessed):
var termscons = $('input:checkbox[name=termscons]');
termscons.change(acceptTerms);
function acceptTerms(){
if(termscons.is(':checked')) { // <-- termscons is already a jQuery object
termsconsInfo.text("");
return true;
} else {
termsconsInfo.text("In order to join, you must accept these terms and conditions.");
return false;
}
}
There might be more, but that is what I see for now.
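If every validator should still run on submit so that each field shows its own message, a short-circuiting && chain is not enough on its own. One way to keep both behaviours is sketched below; runAllValidators and the two stand-in validators are hypothetical names, not part of the question's code.

```javascript
// Hypothetical helper: run every validator (so each can show its own
// message) and report whether all of them passed.
function runAllValidators(validators) {
  var results = validators.map(function (validate) {
    return validate() === true;
  });
  return results.every(Boolean);
}

// Two stand-in validators that record when they are called.
var calls = [];
function passes() { calls.push("passes"); return true; }
function fails() { calls.push("fails"); return false; }

var shortCircuited = fails() && passes();          // passes() is never called
var callsAfterAnd = calls.slice();                 // copy of the call log so far
var allValid = runAllValidators([fails, passes]);  // both validators are called
```

In the submit handler this would become return runAllValidators([validateUserName, validateEmail1, validateEmail2, validatePass1, validatePass2, acceptTerms]);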
I say forget reinventing the wheel and use something already tested, for instance jQuery Validate.