Differentiate Between User Requests and AJAX/Resource Requests - javascript

I'm attempting to create an app with Node.js (using http.createServer()) which will be a single-page application with requests for data via XMLHttpRequest. To do this I need to be able to differentiate between a user navigating to my domain, and AJAX requests or requests generated by the browser for linked resources.
If the request is from the user, I always want to return the index.html page, which will handle requesting content; but if the request is browser-generated or AJAX and is for CSS, JavaScript, or other linked files, I want to serve those files. Is there any way to detect this?
Looking at the request headers for the different file types, I saw the Referer header appeared when the request for content was generated by the page. I figured that was the solution I was looking for, but that header is also set when a user clicks a link to the page, making it useless.
The only other thing which seems to change is the Accept header, which could sort of work but might not be a catch-all solution. User requests always seem to have text/html as the preferred return type regardless of which URL was entered. I could detect that, but I'm pretty sure AJAX requests for HTML files would also have that Accept header, which would cause problems.
Is there anything I'm missing here (any headers or properties I can look for)?
Edit: I do not need the solution to protect files and I don't care about users bypassing it with their own requests. My intention is not to hide files or make them secure, but rather to keep any data that is requested within the scope of the app.
For example, if a user navigates to http://example.com/images/someimage.jpg they are instead shown the index.html file which can then show the image in a richer context and include all of the links and functionality to go with it.
TL/DR: I need to detect when someone is trying to access the app to then serve them the index page and have that send them the content they want. I also need to detect when the browser has requested resources (JS, CSS, HTML, images, etc) needed by the app to be able to actually return the resource not the index file.

In terms of the HTTP protocol there is NO difference between a user-generated query and a browser-generated query.
Every query is just... a query.
You can make a query from the command line, from a browser, by clicking a link, by sending some ASCII text via telnet, or via a proxy which makes the query for you; the server's goal is never to identify how the query was initiated by the user.
Take, for example, a request made by a user through a reverse proxy cache: that query will never reach your server (the response comes from the cache), and the first query made to build that cached response could have been made by a real user or by a browser.
In terms of security, trying to ensure that the user never requests data by himself cannot be done by detecting that the query is a real human click (and search Google for clickjacking if you want to be scared). Every query that a browser can make can also be replayed by the user, every single one; you have no way to prevent that.
Some browser plugins even do pre-fetching, detecting links on the page and making the request before you do it yourself (if it's a GET query).
For AJAX, some libraries like jQuery add an X-Requested-With: XMLHttpRequest header, and this is what most frameworks use to detect AJAX mode.
But it is more robust to rely on a location policy for that (like making your AJAX queries against a /format/ajax path), which could also be used in other ways (like /format/json, /format/html, or /format/csv).
Spending time on location-policy-based routing is certainly more useful.
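For illustration, here is a minimal sketch (not from the answer above) of both ideas in a plain Node.js http server: checking the X-Requested-With header that libraries like jQuery add, and routing on an assumed /ajax/ path prefix.

    const http = require('http');

    http.createServer((req, res) => {
      // Header set by jQuery and similar libraries; plain fetch/XHR won't add it
      const isAjaxHeader = req.headers['x-requested-with'] === 'XMLHttpRequest';
      // Location policy: an assumed path prefix reserved for data requests
      const isAjaxPath = req.url.startsWith('/ajax/');

      if (isAjaxHeader || isAjaxPath) {
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ hello: 'data' }));
      } else {
        res.writeHead(200, { 'Content-Type': 'text/html' });
        res.end('<p>index.html would be served here</p>');
      }
    }).listen(8080);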
But one thing can make a difference: POST queries are not idempotent, which means the browser cannot make a POST query without a real user interaction, because a POST query may alter the state of the session or the state of the server data (JS can still make POST queries; this is just the default behavior of browsers). The browser will never automatically replay a POST query, so you could make a website where all user interactions are POST queries (via forms, or via some JS altering link clicks to send POST AJAX queries instead). But I'm not sure that's your real goal.

Not technically an answer to the question, but I found a simple solution which does what I want: prefix all app-based requests with a subdomain, e.g. http://data.example.com/. It's then really simple to check the Host header for that subdomain: if present, send the resource; otherwise, send the index page.
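For reference, a rough sketch of that check in a plain Node.js server; serveResource() and serveIndex() are hypothetical helpers, and data. is just the subdomain from the example above.

    const http = require('http');

    http.createServer((req, res) => {
      const host = req.headers.host || '';

      if (host.startsWith('data.')) {
        // Request came in via http://data.example.com/... so serve the real file
        serveResource(req, res); // hypothetical helper
      } else {
        // Anything on the bare domain always gets the index page
        serveIndex(req, res); // hypothetical helper
      }
    }).listen(8080);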

Related

Server side rendering issue over a CDN

I have recently launched a site that uses server-side rendering (with Next.js). The site has login functionality where, if an authentication cookie is present in a user's request, it will render a logged-in view for that user on the server and return the rendered logged-in view to the user's browser. If the user does not have an authentication cookie present, it renders a logged-out view on the server and returns that to the user's browser.
Currently it works great, but I have hit a snag when trying to serve the site over a CDN. My issue is that the CDN caches the server's response to speed things up, so the first user to hit the website through the CDN gets their logged-in view cached and returned to the browser. Because it is cached, other users who hit the site then see that user's logged-in view instead of their own, as that is what the CDN has stored. Not ideal.
I'm trying to think of the best way to solve this problem. I would love to hear any suggestions for the best-practice way to get around this.
One way I have thought of would be to always return a logged-out view on the first page visit and do the authentication/logging-in client side, and from then on always do the authentication on the server. This method would only work, however, if Next.js only does server-side rendering on the first request and lets subsequent requests do all rendering on the client, and I'm not sure if that's the case.
Thanks and would love all the help/ suggestions I could get!
UPDATE
From what I can gather so far from the answers, it seems that the best way to get around this will be to serve a CDN-cached logged-out view to every user when they first visit the site. I can then log them in manually from the frontend if an authentication token is present in their cookies. All pages after the first page they land on will have to return a logged-in view - is this possible with Next.js? Would this be a good way to go about it? Here is a summary of these steps:
The user lands on any webpage
A request is made to the server for that page along with the user's cookies.
Because this is the first page they are visiting, the cookies are ignored and a "logged out" view is returned to the user's browser (that will have been cached in the CDN)
The frontend then loads the logged-out view. Once loaded, it checks for an authentication token and makes a call to the API to log the user in if one is present
Any other page navigation after that is returned from the server as a "logged in" view (i.e. the authentication cookie is not ignored this time). This avoids having to do step 4 again, which would be annoying for the user on every page.
For well-behaved caching proxies (which your CDN should be), there are two response headers you should use:
Cache-Control: private
Setting this response header means that intermediary proxies are not allowed to cache the response. (The browser can still cache it, if it's appropriate to do so. If you want to prevent any caching, you'd use no-store instead.)
See also: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control
Vary: Cookie
This response header indicates that the data in the response is dependent on the Cookie request header. That is, if my request has the header Cookie: asdf and your request has the header Cookie: zxcv, then the requests are considered different, and will be cached independently. Note that using this response header may drastically impact your caching if cookies are used for anything on your domain... and I'd bet that they are.
See also: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Vary
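As a hedged sketch (route handling simplified, not your actual app), both headers could be set from a Next.js custom server built on Express like this:

    const express = require('express');
    const next = require('next');

    const app = next({ dev: process.env.NODE_ENV !== 'production' });
    const handle = app.getRequestHandler();

    app.prepare().then(() => {
      const server = express();

      server.get('*', (req, res) => {
        // Shared caches (the CDN) must not store per-user HTML...
        res.set('Cache-Control', 'private');
        // ...and responses differ depending on the Cookie header
        res.set('Vary', 'Cookie');
        return handle(req, res);
      });

      server.listen(3000);
    });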
An alternative...
A common alternative approach these days is to handle all the user facing dynamic data client-side. That way, you can make a request to some API server which has no caching CDN at all. The page is then filled client-side with the data needed. The static parts of the site are served directly from the CDN.
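A tiny sketch of that alternative, assuming a separate, uncached API endpoint (the URL and renderHeader() function are placeholders):

    // The CDN serves only the static shell; the page asks the API who the user is.
    fetch('https://api.example.com/me', { credentials: 'include' })
      .then((res) => (res.ok ? res.json() : null))
      .then((user) => {
        // Render the logged-in or logged-out UI entirely on the client
        renderHeader(user); // hypothetical UI function
      });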
CDNs cache and distribute data based on the cache headers in the HTTP response. Consider these two simple notes to get the best performance without losing the power of the CDN.
1. No-cache header for dynamic content (HTML responses, APIs, ...):
Make sure all dynamic content (HTML responses, APIs, ...) is served with the response header Cache-Control: no-cache.
If you're using Next.js, you can use a custom server (Express.js) to serve your app and take full control of the response headers, or you can change the Next.js config.
2. Cache header for static content (JS, CSS, images, ...):
Make sure all static content (JS, CSS, images, ...) is served with the response header Cache-Control: max-age=31536000.
If you're using Next.js, every build gives the assets unique names, so you can set a long-term cache for static assets.
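Roughly, those two rules could look like this in a Next.js custom server on Express (the paths and options here are illustrative assumptions):

    const express = require('express');
    const next = require('next');

    const app = next({ dev: false });
    const handle = app.getRequestHandler();

    app.prepare().then(() => {
      const server = express();

      // 2. Long-lived cache for hashed build assets
      server.use('/_next/static', express.static('.next/static', {
        maxAge: '1y',
        immutable: true,
      }));

      // 1. No caching for dynamic HTML and API responses
      server.all('*', (req, res) => {
        res.set('Cache-Control', 'no-cache');
        return handle(req, res);
      });

      server.listen(3000);
    });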
Try adding a cache-control header to your auth-required pages.
Cache-Control: private
The private response directive indicates that a resource is user specific—it can still be cached, but only on a client device. For example, a web page response marked as private can be cached by a desktop browser, but not a content delivery network (CDN).
What I understand from your question is that when a user logs in, the logged-in view gets cached on the CDN, and even when the user is logged out the site is still shown in the logged-in view from the CDN cache.
Some solutions to this issue are as follows:
Set a TTL (Time To Live) for the CDN so that it automatically invalidates cached data after a specific time.
As you want to deliver the site fast, you want to achieve low latency. For this you can cache only the big files from the website (images, videos, documents, etc.) on the CDN, and not cache the entire website there. Every time a user request comes in, the page is served from the regular server and the media files are taken from the CDN. Because the media files come from the CDN cache, the site still loads quickly, and the authentication is done on the server side.
Another solution would be to invalidate the cookie and the authentication after a certain period of inactivity, and after that, when the user comes back, the site should render a logged-out view.

Loading a web page for a fake query string

I don't even know how to phrase the title of this question, but hopefully the following description will explain my issue.
I have a web application that is made up of a single, bare search page with a search field. The search is actually performed by the client browser and results are loaded via ajax. In other words, the server does nothing but serve up the bare search page at http://server/index.html
Once the query is performed, I use history.pushState() to change the URI in the browser address bar to something more sensible like http://server/index.html?q=searchterm&page=1&size=10. Pagination is performed by prev and next links that are also called via AJAX, along with the appropriately incremented or decremented page and size values. All is good.
But I want my application to be a good web citizen and be bookmark-able. In other words, if someone enters http://server/index.html?q=searchterm&page=1&size=10 directly in the browser address bar, I want to load the results correctly. Except, if I send that URI to the server, the server will croak unless I implement some server-side processing. And that is something I don't want to do, as it would change the complexity of my application completely, unless I can do it with plain, vanilla nginx (my web server). In other words, I don't want to implement any server-side scripting other than what can be done with the web server itself, such as SSI.
So, how do I solve this problem?
Hi, the exact term for what you are trying to do is "client-side routing". It involves a combination of manipulating the browser's history using history.pushState() [which you are already doing] and a server-side config setting:
.htaccess if you are using Apache
the config file if you are using nginx.
The server-side setting will make your web server serve your base index.html for whatever request the browser makes (http://server/index.html?q=searchterm&page=1&size=10). Once loaded in the client, you have to read the query string from the window's address bar and handle it accordingly (make an AJAX request).
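For the nginx case, the usual shape of that config is a try_files fallback; this is a minimal sketch and the root path is a placeholder:

    server {
        listen 80;
        root /var/www/app;   # placeholder path

        location / {
            # Serve the file if it exists, otherwise fall back to index.html;
            # the query string is left intact for the client-side JS to read.
            try_files $uri /index.html;
        }
    }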
This implementation has implications when search engines crawl your site using the URL but that is not within the scope of this question.
this SO question will give you a start
Actually, I think this is a lot easier than I thought. When I send the browser to http://server/index.html?q=searchterm&page=1&size=10, the server doesn't complain; it simply sends back http://server/index.html. Then it is just a matter of using JS to extract the query string and doing my AJAX bit. This should work.
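The client-side bit could be as small as this; runSearch() stands in for the existing AJAX search function:

    // Read the query string that arrived along with index.html
    const params = new URLSearchParams(window.location.search);
    const q = params.get('q');

    if (q) {
      const page = params.get('page') || '1';
      const size = params.get('size') || '10';
      runSearch(q, page, size); // hypothetical: the app's existing AJAX search
    }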

Prevent local PHP/HTML files preview from executing javascript on server

I have some HTML/PHP pages that include JavaScript calls.
Those calls point to JS/PHP methods included in a library (PIWIK) stored on a remote server.
They are triggered using an http://www.domainname.com/ prefix to point to the correct files.
I cannot modify the source code of the library.
When my own HTML/PHP pages are previewed locally in a browser, I mean using a c:\xxxx kind of path, not a localhost://xxxx one, the remote scripts are still called and do their processing.
I don't want this to happen; I only want those scripts to execute if they are called from a www.domainname.com page.
Can you help me secure this?
One can, for sure, directly bypass this protection by modifying the web pages on the fly with some browser add-on while browsing the real website, but that is a little bit harder to achieve.
I've opened an issue on the PIWIK issue tracker, but I would like to secure my website and the corresponding statistics against this issue as soon as possible, while waiting for a future Piwik update.
EDIT
The process I'd like to put in place would be:
Someone opens a page from anywhere other than www.domainname.com
> this page calls a JS method on a remote server (or not, it may be copied locally),
> this script calls a PHP script on the remote server,
> the PHP script says "hey, where the hell are you calling me from? Go to hell!". Or the PHP script just does not execute...
I've tried to play with .htaccess for that, but since any JS script must live on a client, it also blocks the legitimate calls from www.domainname.com.
Untested, but I think you can use php_sapi_name() or the PHP_SAPI constant to detect the interface PHP is using, and do logic accordingly.
Not wanting to sound cheeky, but your situation sounds rather scary and I would advise searching for some PHP configuration best practices regarding security ;)
Edit after the question has been amended twice:
Now the problem is more clear. But you will struggle to secure this if the JavaScript and PHP are not on the same server.
If they are not on the same server, you will be reliant on HTTP headers (like the Referer or Origin header) which are fakeable.
But PIWIK already tracks the referrer ("Piwik uses first-party cookies to keep track of some information (number of visits, original referrer, and unique visitor ID)"), so you can discount hits from invalid referrers.
If that is not enough, the standard way of being sure that the request to a web service comes from a verified source is to use a standard Cross-Site Request Forgery prevention technique -- a CSRF "token", sometimes also called "crumb" or "nonce", and as this is analytics software I would be surprised if PIWIK does not do this already, if it is possible with their architecture. I would ask them.
Most web frameworks these days have CSRF token generators and APIs you should be able to make use of; it's not hard to make your own, but if you cannot amend the JS you will have problems passing the token around. Again, the PIWIK JS API may have methods for passing session IDs and similar data around.
Original answer
This can be accomplished with a Content Security Policy to restrict the domains that scripts can be called from:
CSP defines the Content-Security-Policy HTTP header that allows you to create a whitelist of sources of trusted content, and instructs the browser to only execute or render resources from those sources.
Therefore, you can set the script policy to 'self' to only allow scripts from your current domain (the file system) to be executed. Any remote ones will not be allowed.
Normally this would only be available from a source where you can set HTTP headers, but as you are running from the local file system this is not possible. However, you may be able to get around this with the http-equiv <meta> tag:
Authors who are unable to support signaling via HTTP headers can use <meta> tags with http-equiv="X-Content-Security-Policy" to define their policies. HTTP header-based policy will take precedence over <meta> tag-based policy if both are present.
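For example, a minimal meta-tag policy restricting scripts to the page's own origin might look like the following (note that current browsers use the unprefixed Content-Security-Policy name rather than the older X-Content-Security-Policy):

    <!-- Only scripts from the page's own origin may execute -->
    <meta http-equiv="Content-Security-Policy" content="script-src 'self'">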
Answer after question edit
Look into the Referer or Origin HTTP headers. Referer is available for most requests, however it is not sent from HTTPS resources in the browser and if the user has a proxy or privacy plugin installed it may block this header.
Origin is available for XHR requests only made cross domain, or even same domain for some browsers.
You will be able to check that these headers contain your domain where you will want the scripts to be called from. See here for how to do this with htaccess.
At the end of the day this doesn't make it secure, but, in your own words, it will make it "a little bit harder to achieve".

Chrome extension for blocking websites based on database blacklist

We have a database with millions of domain categorizations (storing it client side is not an option) and we want to make a chrome extension to blacklist sites based on how they are categorized in the Mysql database.
The server side stuff is easy, we post the domain, and return the category.
The tricky part is blocking requests based on the categorization. Here are a few potential implementations and why they won't (quite) work.
Idea 1:
Redirect all traffic using chrome.webRequest to mysite.com/script.php?url=www.theoriginalurl (a rough sketch of this listener is included below)
This script checks the database's category and either redirects them to theoriginalurl.com or denies the request, redirecting them to www.youGotBlocked...
Have the Chrome extension check the HTTP referrer header to make sure that they came from mysite.com (unless the URL is mysite.com, in which case do nothing).
Problems:
It doesn't seem like we can set the referrer header in PHP, so we have no way of knowing that they came from mysite.com. It seems like maybe we should be passing info via a cookie, but I haven't thought of an elegant solution involving cookies.
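For what it's worth, the redirect step in Idea 1 would look roughly like this in a Manifest V2 extension with webRequest, webRequestBlocking and host permissions (this is a sketch of the mechanism, not a fix for the referrer problem above):

    chrome.webRequest.onBeforeRequest.addListener(
      (details) => {
        const url = new URL(details.url);
        if (url.hostname === 'mysite.com') {
          return {}; // don't redirect requests to the categorization script itself
        }
        return {
          redirectUrl: 'https://mysite.com/script.php?url=' +
            encodeURIComponent(details.url),
        };
      },
      { urls: ['<all_urls>'], types: ['main_frame'] },
      ['blocking']
    );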
Idea 2:
Every time chrome.webRequest fires, make an AJAX POST request to mysite.com/categorizeURL.php with the URL to get the category. Block or allow based on the server's response.
Problems:
Either we make the request asynchronous and we can't get the response in time (there is no way that we have found to delay the callback until the server responds -- more on that here). Or we make the request synchronous, and IT WORKS!!! Except for the fact that if they can't reach our server, their entire browser locks up and they essentially need to refresh the extension to be able to access the internet again.
Other ideas?
Does anyone have other ideas for creating a blacklist via a Chrome extension? I simply refuse to believe that it is not possible.

Can POST request be an alternative to GET request in most of the scenarios?

I went through http://www.w3schools.com/tags/ref_httpmethods.asp and wondered why I should not always prefer a POST request over a GET request. I can think of two scenarios where I have to use a GET request instead of a POST request:
1) Where I have a requirement to bookmark the URL.
2) Where my requirement is to cache the web page (as a POST request does not cache the web page), so that the next time the same URL is hit it can be served from the cache to optimize performance.
I agree a POST request is designed to create/update a resource whereas a GET request is designed to retrieve a resource, though technically they can be used vice versa too.
So I was wondering: is it not always beneficial to use a POST request over a GET request (except for the two requirements I mentioned above), since POST is more secure? Is my understanding correct?
There are many reasons to use HTTP the way it was meant to be used. Here's a couple:
The value of the web is built on URLs. Every time you provide a page which is obtainable only via POST, you are denying the option to link to it, as well as to bookmark it. (Obviously a form button can still be made, but that's not as convenient.) Even if the page is some kind of “service”, there is still often value in linking — that you won't have thought of beforehand.
If the user reloads a page obtained via POST, most web browsers will warn that they are “resubmitting a form” and confirm the action. This is because in poorly designed applications this can result in things like placing a duplicate order or posting a duplicate message. Therefore, using GET for requests which do not have side effects eliminates this unnecessary warning. In fact, a useful practice for POSTs which have effects is to make the response to them be a redirect to a URL (which the browser will GET) for a page describing the results of the action (for example, if the POST posted a comment, it would then redirect to a link to the comment); this way the page can be reloaded (which could be implicit e.g. if the browser were restarted) without any ambiguity about whether it's re-executing the action.
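A hedged sketch of that POST-then-redirect pattern in Express; the route names and the saveComment() helper are made up for illustration:

    const express = require('express');
    const app = express();

    app.use(express.urlencoded({ extended: false }));

    app.post('/comments', (req, res) => {
      const id = saveComment(req.body); // hypothetical: persist the comment, return its id
      // Redirect so that reloading the result page is a harmless GET,
      // not a re-submission of the form.
      res.redirect(303, '/comments/' + id);
    });

    app.get('/comments/:id', (req, res) => {
      res.send('Comment ' + req.params.id + ' was posted.');
    });

    app.listen(3000);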
