Verify URL with JavaScript? - javascript

I have an application where users have somewhat of a settings page, and some of those settings allow URLs to be entered into them. I want to be able to run a check using javascript to make sure the URL they entered is valid and a real URL.
When they click submit, it will either do what it's supposed to do, or if there's an error in the URL, pop up with an alert and say "Invalid URL entered!".
Is there a way to do this with JS? I also want this script to work with http:// and https:// as well as www. in the URL, and every domain extension (.com, .tv, etc.). This also needs to be done with JavaScript, not jQuery.
Can anyone show me how to do this? Thanks.

It seems you're wanting to do two things:
1) Determine that the URL is valid (i.e. correctly formatted)
For this, a regular expression will work well; this approach also lets you retrieve the various parts of the URL if that's something you'd like to do.
This has been discussed here: What is the best regular expression to check if a string is a valid URL?
2) Determine that the URL is real (i.e. if someone were to follow it they'd find something)
This is trickier, but you could attempt an AJAX request to the URL and, if it fails or times out, assume it's down. There are limitations to this approach: the browser's same-origin policy prevents scripts from reading cross-origin responses unless the target site allows it (via CORS), so the request will appear to fail for most external sites.
If that's a problem you could create a service of your own design that runs on a server that your JavaScript makes a request to, passing it the URL, and it responds with a failure or success.
Here's an example:
verify.js
function verifyURL (url) {
// with jQuery
$.getJSON('check-url.cgi', { url : url }, function (res) {
console.log(res); // display server response
if ( res.status == 'success' ) {
// URL is real
} else {
// URL is not real
}
});
}
check-url.cgi
#!/usr/bin/env perl
use v5.10;
use strict;
use warnings;
use CGI qw(:standard);
use JSON::XS;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->agent("URL Checker/0.1");
my $url = param('url');
my $req = HTTP::Request->new(GET => $url);
my $res = $ua->request($req);
my $status = $res->is_success ? 'success' : 'failure';
print header('application/json'), encode_json { status => $status };
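Since the question asks for plain JavaScript, here is a minimal sketch of both steps without jQuery. It assumes the browser provides the built-in URL constructor and fetch. isValidHttpUrl is a hypothetical helper name, treating a bare www. prefix as http:// is an assumption about what the settings page should accept, and check-url.cgi is the script from the answer above.

```javascript
// Step 1: format check only. Does the input parse as an http(s) URL
// whose host name contains at least one dot?
function isValidHttpUrl(input) {
  let text = String(input).trim();
  // Assumption: a bare "www.example.com" is treated as "http://www.example.com"
  if (/^www\./i.test(text)) {
    text = "http://" + text;
  }
  let parsed;
  try {
    parsed = new URL(text); // throws a TypeError when the string cannot be parsed
  } catch (err) {
    return false;
  }
  const isHttp = parsed.protocol === "http:" || parsed.protocol === "https:";
  // Assumption: names without a dot (e.g. "localhost") are rejected
  const hasDot = parsed.hostname.includes(".");
  return isHttp && hasDot;
}

// Step 2: ask the server-side checker whether the URL is reachable.
// Returns a promise that resolves to true or false.
function verifyURL(url) {
  const query = "check-url.cgi?url=" + encodeURIComponent(url);
  return fetch(query)
    .then((response) => response.json())
    .then((res) => res.status === "success");
}
```

In a submit handler you could call isValidHttpUrl first and show alert("Invalid URL entered!") when it returns false, and only then call verifyURL.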

Related

Fetch Current URL not getting ID , class based filters

I am trying to fetch my current page URL, for an if-else statement.
My URL is like this : http://something.com/fetch_url.php#filter=.apples
And my code is :
<?php
echo $_SERVER['REQUEST_URI'].'<br/>';
?>
It gives me:
/fetch_url.php only, but I need to check for #filter=.apples in the if-else condition.
Please guide.
Try using a proper query-string URL like http://something.com/fetch_url.php?filter=apples
The # part of a URL (the fragment) is never sent to the web server, so you cannot access it there.
Then use an if statement like this:
if($_REQUEST['filter']=='apples'){
//then perform your action here according to requirements
}else{
// otherwise provide instruction for the else condition here
}
The # part of a URL never reaches the web server, hence you cannot access it there. One workaround that we have been using is to read it in JavaScript via window.location.hash, and then make a separate request to the server to pass this hash along, if necessary.
A similar question has been asked before: Get fragment (value after hash '#') from a URL in php
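As a rough sketch of that client-side workaround, the fragment can be copied into the query string before the page is requested again, so PHP sees it in $_GET. fragmentToQueryUrl is a hypothetical helper, and it assumes the fragment is written as key=value pairs like filter=.apples.

```javascript
// Move key=value pairs from the fragment into the query string.
function fragmentToQueryUrl(href) {
  const url = new URL(href);
  const fragment = url.hash.replace(/^#/, "");
  const params = new URLSearchParams(fragment);
  for (const [key, value] of params) {
    url.searchParams.set(key, value);
  }
  url.hash = ""; // drop the fragment so the redirect happens only once
  return url.toString();
}

// In the browser: reload the page with the fragment turned into a query string.
if (typeof window !== "undefined" && window.location.hash) {
  window.location.replace(fragmentToQueryUrl(window.location.href));
}
```

On the PHP side the value would then be available as $_GET['filter'].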

Node js setup an Anchor [duplicate]

I know that on the client side (JavaScript) you can use window.location.hash, but I could not find any way to access it from the server side. I'm using ASP.NET.
We had a situation where we needed to persist the URL hash across ASP.Net post backs. As the browser does not send the hash to the server by default, the only way to do it is to use some Javascript:
When the form submits, grab the hash (window.location.hash) and store it in a server-side hidden input field. Put this in a DIV with an id of "urlhash" so we can find it easily later.
On the server you can use this value if you need to do something with it. You can even change it if you need to.
On page load on the client, check the value of this hidden field. You will want to find it by the DIV it is contained in, as the auto-generated ID won't be known. Yes, you could do some trickery here with .ClientID, but we found it simpler to just use the wrapper DIV, as it allows all this JavaScript to live in an external file and be used in a generic fashion.
If the hidden input field has a valid value, set that as the URL hash (window.location.hash again) and/or perform other actions.
We used jQuery to simplify the selecting of the field, etc ... all in all it ends up being a few jQuery calls, one to save the value, and another to restore it.
Before submit:
$("form").submit(function() {
$("input", "#urlhash").val(window.location.hash);
});
On page load:
var hashVal = $("input", "#urlhash").val();
if (IsHashValid(hashVal)) {
window.location.hash = hashVal;
}
IsHashValid() can check for "undefined" or other things you don't want to handle.
Also, make sure you use $(document).ready() appropriately, of course.
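IsHashValid() is not shown in the answer. One possible version, purely as an assumption about which hashes are acceptable (a leading # followed by letters, digits and a few punctuation characters), might look like this:

```javascript
// Hypothetical implementation of the IsHashValid() helper used above.
// Rejects missing values, the literal string "undefined", a bare "#",
// and anything containing characters outside a small whitelist.
function IsHashValid(hashVal) {
  if (typeof hashVal !== "string") {
    return false;
  }
  // Assumption: only word characters, "-", ".", "=" and "&" are allowed
  return /^#[\w\-.=&]+$/.test(hashVal);
}
```

The whitelist is deliberately strict, because the value travels through a form field and is written back into the URL.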
[RFC 2396][1] section 4.1:
When a URI reference is used to perform a retrieval action on the
identified resource, the optional fragment identifier, separated from
the URI by a crosshatch ("#") character, consists of additional
reference information to be interpreted by the user agent after the
retrieval action has been successfully completed. As such, it is not
part of a URI, but is often used in conjunction with a URI.
(emphasis added)
[1]: https://www.rfc-editor.org/rfc/rfc2396#section-4
That's because the browser doesn't transmit that part to the server, sorry.
Probably the only choice is to read it on the client side and transfer it manually to the server (GET/POST/AJAX).
Regards
Artur
You may see also how to play with back button and browser history
at Malcan
Just to rule out the possibility you aren't actually trying to see the fragment on a GET/POST and actually want to know how to access that part of a URI object you have within your server-side code, it is under Uri.Fragment (MSDN docs).
Possible solution for GET requests:
New Link format: http://example.com/yourDirectory?hash=video01
Call this function toward top of controller or http://example.com/yourDirectory/index.php:
function redirect()
{
if (!empty($_GET['hash'])) {
/** Sanitize & Validate $_GET['hash']
If valid return string
If invalid: return empty or false
******************************************************/
$validHash = sanitizeAndValidateHashFunction($_GET['hash']);
if (!empty($validHash)) {
$url = './#' . $validHash;
} else {
$url = '/your404page.php';
}
header("Location: $url");
exit; // stop here so nothing else is sent after the redirect
}
}

Clients using `GET` requests for a form, even though `POST` is defined. is javascript iframe the cause?

I have two subsequent forms on my website with POST method.
The first page of my website first.php contains this code:
<form action="a.php" method="POST" target="_blank">
<input name="value" type="hidden" value="foo"/>
<div class="button"><label><span class="icon"></span>
<input type="submit" class="button-graphic ajax" value="Click Here"></label></div></form>
a.php can be accessed only via this POST request (otherwise user will get method not allowed 405 error)
Once submitted, this form opens a.php with an AJAX modal window.
a.php contains another form:
<form action="b.php" method="POST" target="_blank">
<input name="bar" type="hidden" value="none"/>
<div class="border"><label><input type="submit" class="button-graphic2 tracking" value="Continue"></label></div></form>
When a user clicks Submit in the second form, it will open b.php,
which can also be accessed only via POST request (otherwise - 405 error).
The only difference I can think about between these forms is that the second one contains a tracking js class (opening an iframe). this is the js code:
$(document).ready(function() {
$(".tracking").click(function(){
var iframe = document.createElement('iframe');
iframe.style.width = '0px';
iframe.style.height = '0px';
iframe.style.display = 'block';
document.body.appendChild(iframe);
iframe.src = '/track.htm';
});
});
This is done in order to track a conversion using a third-party script which is executed from track.htm.
I noticed that I am having a problem with about 5% of my iPad visitors.
They open a.php properly with a POST request, but when they continue on to b.php, about 5% of them send a GET request instead of the desired POST request, causing them to get a 405 error and leave the website.
I know that these are real human users as I can see some of them trying several times to open b.php and keep getting these 405 errors.
Could this happen because their device is simultaneously using a GET request to obtain track.htm, as some kind of glitch?
How can this be solved?
EDIT 4.4.2015:
Since there's a chance that firing the tracking script is causing this, I would like to know if there's another way to fire it (or track that AdWords conversion) without causing these iPad users to use GET requests for the form as well.
EDIT 10.4.2015:
This is the jQuery code of the ajax class, which affects both first.php and perhaps a.php, as first.php is the parent frame:
$(document).ready(function() {
$(".ajax").click(function(t) {
t.preventDefault();
var e = $(this).closest("form");
return $.colorbox({
href: e.attr("action"),
transition: "elastic",
overlayClose: !1,
maxWidth: $("html").hasClass("ie7") ? "45%" : "false",
opacity: .7,
data: {
value: e.find('input[name="value"]').val(),
}
}), !1
})
});
Technically, it shouldn't happen. The iframe created by your tracking script pointed to /track.htm, so there shouldn't be any GET request to your b.php page.
On the other hand, just thinking out loud here, there are a few scenarios that could happen with real-world users:
The users happen to have bookmarked the b.php page, so opening the bookmark later requests it with GET.
The users tried to refresh b.php and were warned about "Form re-submission". Being clueless, as most real users are, they canceled the re-submission, then clicked on the address bar and pressed Go with the sole intention of reloading the page. This also sends a GET request to b.php.
As a best practice when designing the page flow for form submission, it might be better to only "process" your form data in b.php and then return a 302 redirect to another page that shows the result using a GET request. This allows users to refresh the page without double-submitting the form, and also lets them bookmark the result page.
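The flow described here (process the POST, then redirect to a page that is fetched with GET) can be sketched as a small routing function. This is JavaScript for illustration rather than the site's PHP, and /result.php is a hypothetical page name.

```javascript
// Decide the response for a request, following the "process, then redirect" flow.
// Returns a plain object describing the status code and headers to send.
function routeRequest(method, path) {
  if (path === "/b.php") {
    if (method === "POST") {
      // Form data would be processed here, then the browser is sent elsewhere
      return { status: 302, headers: { Location: "/result.php" } };
    }
    // Any other method on the processing URL is rejected
    return { status: 405, headers: { Allow: "POST" } };
  }
  if (path === "/result.php" && method === "GET") {
    // Safe to refresh or bookmark: no form data is re-submitted
    return { status: 200, headers: {} };
  }
  return { status: 404, headers: {} };
}
```

With this layout, a refresh or a bookmark always lands on the result page, which accepts GET.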
This doesn't explain the GET glitch, but as things stand, ~5% of your iPad visitors can't sign up because the code only accepts POST, and so far no one has figured out why. So I propose a change of strategy, at least in the meantime.
Preventing CSRF by only accepting POST requests is already known to not work. Your choice of accepting only this request method as a means of security is what ultimately results in the 405. There are better ways.
One example is using a CSRF token, specifically the Synchronizer Token Pattern.
The idea behind a CSRF token is that when you generate the form, you also generate a "key" which you tie to the form. When that form is submitted, if it doesn't have the key or the key isn't the right one, you don't bother processing the form. The Synchronizer Token Pattern gets fancy in that it changes the expected key each time (in the form field implementation, giving the <input type="hidden"> field a new name attribute each time) in addition to the value.
Have your code in a.php generate a random token and store it as a session variable on the server. Output the token in the form as a hidden field.
Before processing the request in b.php, ensure the token value is in the request data and ensure it has the expected value.
You can first check for $_POST data and if it is missing, check for $_GET data. Regardless of which array contains the data, if the data does not have a valid CSRF token, respond with a 4xx error.
If the token is good, consume the token and process the request.
If the token is missing or is invalid, return a 4xx response code.
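The steps above can be sketched as follows, in Node.js rather than PHP for illustration. The session object stands in for whatever server-side session store is used, and issueToken and consumeToken are hypothetical names.

```javascript
const crypto = require("node:crypto");

// Generate a random token, remember it in the session, and return it
// so it can be written into the form as a hidden field.
function issueToken(session) {
  const token = crypto.randomBytes(32).toString("hex");
  session.csrfToken = token;
  return token;
}

// Check the submitted token against the stored one. The stored token is
// removed either way, so each token can be used at most once.
function consumeToken(session, submitted) {
  const expected = session.csrfToken;
  delete session.csrfToken;
  if (typeof expected !== "string" || typeof submitted !== "string") {
    return false;
  }
  const a = Buffer.from(expected);
  const b = Buffer.from(submitted);
  // timingSafeEqual requires equal lengths, so compare lengths first
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

When consumeToken returns false, the handler would respond with a 4xx status instead of processing the request.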
Another way would be to set your field names to random values each time the form is generated, so the form no longer contains fixed names such as <input name="value" type="hidden" value="foo"/> or <input name="bar" type="hidden" value="none"/>.
// ... in an importable file somewhere ...
// Generate our tokens
function token($len = 13) {
$chrs = 'abcdefghijklmnopqrstuvwxyz0123456789_';
$str = '';
$upper_lim = strlen($chrs) - 1;
for ($i = 0; $i < $len; $i++) {
$idx = rand(0, $upper_lim);
$str .= rand(0, 1) ? strtoupper($chrs[$idx]) : $chrs[$idx];
}
return $str;
}
function magic_set_function($key, $value) {
$_SESSION[$key] = $value;
}
function magic_get_function($key) {
return (array_key_exists($key, $_SESSION) ? $_SESSION[$key] : NULL);
}
function validate_request() {
$data = !empty($_POST) ? $_POST : $_GET;
if ( empty($data) ) { return false; }
// Ensure the tokens exist (hopefully not too costly)
$field_tokens = magic_get_function('field_tokens');
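if ( $field_tokens === NULL ) { return false; }
$csrf_token_name = $field_tokens['token'];
// The submitted token may be absent, so avoid an undefined-index notice
$given_csrf_token = isset($data[$csrf_token_name]) ? $data[$csrf_token_name] : NULL;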
// Get our CSRF token
$expected_csrf_token = magic_get_function('csrf_token');
// ensure we're expecting a request / that we have generated a CSRF
if ( $expected_csrf_token === NULL ||
$expected_csrf_token !== $given_csrf_token) {
return FALSE;
}
// After whatever other checks you want...
return TRUE;
}
function fetch_data() {
$data = empty($_POST) == FALSE ? $_POST : $_GET;
if (empty($data)) { throw new DataLoadException(); }
// Ensure the tokens exist (hopefully not too costly)
$field_tokens = magic_get_function('field_tokens');
if ( $field_tokens === NULL ) { throw new TokenLoadException(); }
foreach ($field_tokens as $field_name => $token_name) {
if ( isset($data[$token_name]) ) {
$data[$field_name] = $data[$token_name];
unset($data[$token_name]);
}
}
return $data;
}
// first.php/a.php/b.php (wherever necessary)
// ...
$tokens = array();
// our csrf token
$csrf_token = token();
$field_names = array('value', 'bar', 'token');
$field_values = array('value'=>'foo', 'bar' => 'none', 'token' => $csrf_token);
// Tokenize errthing...
foreach ($field_names as $k => $field_name) {
// and generate random strings
$tokens[$field_name] = token();
}
// You NEED TO STORE THESE TOKENS otherwise submissions lose context
magic_set_function('field_tokens', $tokens);
magic_set_function('csrf_token', $csrf_token); // dup, but j.i.c.
// first.php
printf('<input type="hidden" name="%s" value="%s"/>', $tokens['value'], $field_values['value']);
// ...
// a.php
// Get the data... (POST/GET)
if (validate_request() !== TRUE) { handle_invalid_request(); }
$data = fetch_data();
// ...
// Tokenize errthing, generate a csrf, store the values, etc.
// ...
printf('<input type="hidden" name="%s" value="%s"/>', $tokens['bar'], $field_values['bar']);
// ...
// b.php
// ... You get the idea ...
It doesn't answer your question of why 5% are sending GET Requests but it does solve your overall problem on both a security and user level.
EDIT:
To specifically answer the OP's questions in the comments:
"(1) does this require using cookies? (a session means cookies right?)"
Read up on PHP Sessions and look for a session library. Plenty out there, one heavyweight being Zend(http://framework.zend.com/manual/1.12/en/zend.session.html). You can save to a database instead for protected server-side sessions. I made one similar to Kohana's.
(2) I didn't understand the "another way" part - how does it differ from the method you described at first?
First method is to just add a token to your form and look for the token to have the expected value upon submission. If the form doesn't have it, you throw an error complaining.
The second method dynamically sets the field names upon form generation AND adds a token field. Submitting the proper form data from a program, bot, or outside source now first requires fetching the form, since they won't know what field names to use (instead of just posting data with set field names).
"(3) most important, I am less worried about CSRF attacks, I just don't want bots/crawler to crawl into my forms, would this method prevent it from them, as opposed to humans? why? and is there an easier method to achieve that?"
If you mean bots like Google/SEO/respectful web crawlers, robots.txt exists for this purpose.
robots.txt is a very simple text file that is placed in your site's root directory. You'll see requests in your web server's access logs for /robots.txt. This file tells search engines and other robots which areas of your site they are allowed to visit and index. You can read more about the Robot Exclusion Standard on many websites.
Don't use robots.txt to hide information, though. It is a public file and visible to anyone. Also, malicious bots won't respect the file.
I'm not sure whether by bots you mean just crawlers or spambots (bots trying to submit data) and such. If it's crawlers, robots.txt takes care of them. If it's spambots, you can add a hidden field (hidden with CSS, not HTML) with a common name; if it arrives filled out, you know the submission is invalid. You can also add a CAPTCHA, and so on.
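The hidden-field idea can be sketched as a server-side check. The field name website and the function name are assumptions; any plausible-looking field name that real users never see would do.

```javascript
// Returns true when the CSS-hidden "honeypot" field was filled in,
// which a human using the page normally would not do.
function looksLikeSpamBot(formData, honeypotField = "website") {
  const value = formData[honeypotField];
  return typeof value === "string" && value.trim() !== "";
}
```

A submission flagged this way can simply be dropped or answered with a 4xx status.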
Try doing the tracking in the callback of the original request to ensure it's loaded?
Also, you could look into something like ajaxFormPlugin by malsup.
I would suggest checking the permissions of your "b.php" page. Please make sure the page has "w" permission for all users; a permissions problem could be what prevents the POST request.
I know it's a workaround, but if, as I suppose, you have a bunch of checks on the $_POST variables, then when you receive a GET request you could try replacing the POST data with the GET data:
if (empty($_POST) && !empty($_GET)) $_POST = $_GET;
//here the check of $_POST
//...
since we don't know why these iPads (...Apple -.-) have the issue, and there isn't much difference between GET and POST, at least if you don't need to upload files...
The only way a POST form can be sent as GET is through script (changing the method attribute directly, or replacing the form's behavior, for example with an ajax request or by binding another function to the "submit" event), so I suggest you check every script that runs in the parent and child pages.
Your ajax call doesn't contain method: "POST". This could be the cause.

How do I modify the accept headers for a GET request (outside of ajax) on the client side?

I have a download to Excel button on a page whose intent is to call the exact same URL, but with the Request Header set to "application/ms-excel".
Currently, I am faking it, by calling another URL, then adjusting the headers and then forwarding to the same function.
Server-side (Django):
HTTP_HEADER_EXCEL = "application/ms-excel"
# fake testing url
# http://localhost:8000/myfunction/<CLASSID>/xls/
def myfunction_xls(request, CLASSID):
    # intercept request, add the appropriate accepts
    # and forward it
    request.META["HTTP_ACCEPT"] = HTTP_HEADER_EXCEL
    request.META["dbr"] = dbr
    return myfunction(request, CLASSID)
# standard url
# http://localhost:8000/myfunction/<CLASSID>/
def myfunction(request, CLASSID, f_callback=None):
    if request.META["HTTP_ACCEPT"] == HTTP_HEADER_EXCEL:
        f_callback = provider.generateExcel
    # ....do lots of work...
    di_context = dict(inst=inst,
                      parent=inst,
                      custom=custom,
                      url_excel=url_excel,
                      )
    if f_callback:
        # use xlsxwriter to process di_context data
        # wrap up the appropriate response headers
        # and it appears as a download (it works)
        return f_callback(request, di_context)
    # non-Excel branch, i.e. standard Django behavior
    t = get_template('pssecurity/security_single.html')
    c = RequestContext(
        request,
        di_context,
    )
    html = t.render(c)
    return HttpResponse(html)
My problem is that I don't want to maintain a custom URL just for Excel (or add an optional /xls/ to the regex for the URL). I'm perfectly OK using the existing URL and having the server adjust on the basis of the Accept header. And, yes, I could add a query parameter to indicate xls, but... isn't my particular requirement what Accept headers are for?
I found a discussion about how to do this in Ajax, but that's not necessary here. Perfectly happy with a regular GET (not POST) request that happens to specify application/ms-excel.
I know I can't specify the accepts using the href attribute. And, while window.open() in javascript would do the trick just fine, I don't see any way to change the accept headers there either.
Hmmm, yes, may be a web noob question, but I can't find much about easily modifying accept headers outside of $http or $ajax trickery.

Cut string obtained with Javascript inside hyperlink

I made a bookmark that users can add and it sends them to my site capturing the referrer.
Bookmark
My problem is that for some reason the location.href part instead of printing http:// it prints: "http%3A//". I want to remove it and get just the domain.com
I have a similar code that maybe could be useful but I'm having a hard time figuring out how to implement it inside HTML.
// Function to clean url
function cleanURL(url)
{
if(url.match(/^http:\/\//))
{
url = url.substring(7);
}
if(url.match(/^www\./))
{
url = url.substring(4);
}
url = "www.chusmix.com/tests/?ref=www." + url;
return url;
}
</script>
Thanks
In most browsers, the referrer is sent as a standard field of the HTTP protocol. This technically isn't the answer to your question, but it would be a cleaner and less conspicuous solution to grab that information server-side.
In PHP, for example, you could write:
$ref = $_SERVER['HTTP_REFERER'];
...and then store that in a text file or a database or what-have-you. I can't really tell what your end purpose is, because clicking a bookmark lacks the continuity of browsing that necessitates referrer information (like the way that moving from a search engine or a competitor's website would). They could be coming from a history of zero, from another page on your site or something unrelated altogether.
Like already stated in my comment:
Be aware that this kind of bookmarking may harm users privacy, so please inform them accordingly.
That being said:
First, please use encodeURIComponent() instead of escape(), since escape() has been deprecated since ECMAScript v3 (ECMA-262, 3rd edition).
Second, to get rid of the "http%3A//" do not use location.href, but assemble the location properties host, pathname, search and hash instead:
encodeURIComponent(location.host + location.pathname + location.search + location.hash);
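Put together, the bookmark's target could be built like this. buildRefUrl is a hypothetical helper, the base address is taken from the question's cleanURL function, and the http:// scheme in front of it is an assumption.

```javascript
// Build the tracking link from the parts of a location-like object.
// Leaving out location.protocol means no "http%3A//" ends up in the result.
function buildRefUrl(loc) {
  // Drop a leading "www." so only the bare domain is sent
  const host = loc.host.replace(/^www\./, "");
  const ref = host + loc.pathname + loc.search + loc.hash;
  return "http://www.chusmix.com/tests/?ref=" + encodeURIComponent(ref);
}

// In the browser the bookmark would navigate like this:
// window.location.href = buildRefUrl(window.location);
```

encodeURIComponent also escapes /, ?, = and #, so the whole referring address arrives as a single ref value.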
