When I fetch a page with a GET request in JavaScript, does the browser cache it the same way it does when I click that link or type it into the address bar?
If not, since I have already fetched the page, is there a way to add it (programmatically) to the browser cache?
When the browser fetches web pages, it also uses a GET request. Chances are that all GET requests go through the same caching mechanism in the browser, though no specification formalizes how that works.
There is no programmatic way to add something to the browser's own cache other than requesting the resource and letting the cache do its normal thing with it. If you want all common browsers to cache it this way, make sure the server-side headers are set appropriately (to allow caching) and then test each browser to confirm the resource is cached the way you want.
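As a rough sketch of what "set appropriately" might look like in Node.js (the port, payload, max-age value, and ETag are all placeholder assumptions, not a recipe):

var http = require('http');

http.createServer(function (req, res) {
  res.writeHead(200, {
    'Content-Type': 'application/json',
    // Let the browser's normal HTTP cache keep this for an hour.
    'Cache-Control': 'public, max-age=3600',
    // A validator so the browser can revalidate cheaply later.
    'ETag': '"v1"'
  });
  res.end(JSON.stringify({ hello: 'world' }));
}).listen(8080);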
If you are staying within the same page and want to make sure something is not requested more than once from that page, you can implement your own cache within the page's JavaScript code. You just store the result in a JavaScript variable the first time it is requested, and the function you use to fetch the resource first checks that local cache object to see if the resource is already there. If not, it requests it via GET and then saves the result. You could hardcode a simple version for one particular resource, or build a more general version that saves the URL, the result, and a timestamp and implements more typical caching behaviors.
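Something like this minimal sketch (fetchCached and pageCache are names I made up for illustration):

var pageCache = {};

function fetchCached(url, callback) {
  if (url in pageCache) {
    callback(pageCache[url]); // already fetched on this page
    return;
  }
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.onload = function () {
    pageCache[url] = xhr.responseText; // remember it for next time
    callback(xhr.responseText);
  };
  xhr.send();
}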
If you want it to be cached across pages and your testing finds that the built-in browser caches are not adequate, then you can use Local Storage to store the data (probably with a timestamp) and then just check the local storage before requesting it with a GET request.
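A sketch of that cross-page variant; the one-hour freshness window and the cache key prefix are arbitrary choices:

var TTL_MS = 60 * 60 * 1000; // assumed one-hour freshness window

function fetchAcrossPages(url, callback) {
  var raw = localStorage.getItem('cache:' + url);
  if (raw) {
    var entry = JSON.parse(raw);
    if (Date.now() - entry.time < TTL_MS) {
      callback(entry.data); // still fresh, skip the network
      return;
    }
  }
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.onload = function () {
    localStorage.setItem('cache:' + url,
      JSON.stringify({ data: xhr.responseText, time: Date.now() }));
    callback(xhr.responseText);
  };
  xhr.send();
}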
I'm attempting to create an app with Node.js (using http.createServer()) that will be a single-page application requesting data via XMLHttpRequest. To do this I need to be able to differentiate between a user navigating to my domain, AJAX requests, and requests generated by the browser for linked resources.
If the request is from the user, I always want to return the index.html page, which will handle requesting content; but if the request is browser-generated or AJAX and is for CSS, JavaScript, or other linked files, I want to serve those files. Is there any way to detect this?
Looking at the request headers for the different file types, I saw that the Referer header appeared when the request was generated by the page. I figured that was the solution I was looking for, but that header is also set when a user clicks a link to the page, which makes it useless.
The only other thing that seems to change is the Accept header, which could sort of work but might not be a catch-all solution. User requests always seem to have text/html as the preferred return type regardless of which URL was entered. I could detect that, but I'm pretty sure AJAX requests for HTML files would also carry that Accept header, which would cause problems.
Is there anything I'm missing here (any headers or properties I can look for)?
Edit: I do not need the solution to protect files, and I don't care about users bypassing it with their own requests. My intention is not to hide files or make them secure, but to keep any data that is requested within the scope of the app.
For example, if a user navigates to http://example.com/images/someimage.jpg, they are instead shown the index.html file, which can then show the image in a richer context with all of the links and functionality to go with it.
TL;DR: I need to detect when someone is navigating to the app, so I can serve them the index page and let it deliver the content they want. I also need to detect when the browser has requested resources (JS, CSS, HTML, images, etc.) needed by the app, so I can return the actual resource rather than the index file.
In terms of the HTTP protocol there is NO difference between a user-generated query and a browser-generated query.
Every query is just... a query.
You can make a query from the command line or a browser, click a link, send some ASCII text via telnet, or go through a proxy that makes the query for you; the server has no reliable way to tell how a query was initiated.
Take, for example, a request answered by a reverse proxy cache: that query never reaches your server (the response comes from the cache), and the first query that built the cached response could have been made by a real user or by a browser.
In terms of security, trying to ensure that the user never requests data by himself cannot be done by detecting that the query is a real human click (search Google for clickjacking if you want to be scared). Every query a browser can make can also be replayed by the user, every single one; you have no way to prevent that.
Some browser plugins even do pre-fetching: they detect links on the page and make the request before you click them yourself (if it's a GET query).
For AJAX, some libraries like jQuery add an X-Requested-With: XMLHttpRequest header, and most frameworks use this to detect AJAX mode.
But it is more robust to rely on a location policy (such as making your AJAX queries under a /format/ajax path, which could be extended to other variants like /format/json, /format/html, or /format/csv).
Spending time on location-policy-based routing is certainly more useful.
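A minimal routing sketch for http.createServer along those lines; note the X-Requested-With check only works when the client actually sets that header, and the path prefix and file handling are illustration-only assumptions:

var http = require('http');
var fs = require('fs');

http.createServer(function (req, res) {
  // Treat a request as AJAX if the client set the header
  // or if it follows the location policy (path prefix).
  var isAjax = req.headers['x-requested-with'] === 'XMLHttpRequest' ||
               req.url.indexOf('/format/ajax') === 0;
  if (isAjax) {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ data: 'app payload' }));
  } else {
    // Everything else gets the single-page shell.
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end(fs.readFileSync('index.html'));
  }
}).listen(8080);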
But one thing can make a difference: POST queries are not idempotent, which means the browser will not make a POST query without a real user interaction, because a POST query may alter the state of the session or of the server's data (JavaScript can still make POST queries; this is just the default behavior of browsers). The browser will never automatically replay a POST query, so you could build a website where all user interactions are POST queries (via forms, or via some JS that turns link clicks into POST AJAX queries instead). But I'm not sure that's your real goal.
Not technically an answer to the question, but I found a simple solution that does what I want: prefix all app-based requests with a subdomain, e.g. http://data.example.com/. It's then really simple to check the Host header for that subdomain: if present, send the resource, else send the index page.
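A sketch of that Host check, with data.example.com standing in for whatever subdomain you pick:

var http = require('http');
var fs = require('fs');
var path = require('path');

http.createServer(function (req, res) {
  var host = req.headers.host || '';
  if (host.indexOf('data.') === 0) {
    // App-originated request: serve the actual file.
    // (Real code needs safer path handling than this.)
    res.writeHead(200);
    res.end(fs.readFileSync(path.join(__dirname, req.url)));
  } else {
    // User navigation: always answer with the index page.
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end(fs.readFileSync('index.html'));
  }
}).listen(8080);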
I have a Flash application that uses a large data set (~1.5 MB). This data is likely to stay the same for a long time, so I would like to cache it. The data should stay cached even if the user closes the browser (and restarts the computer).
At the moment I'm using JavaScript files that are created dynamically and contain the data, which is transferred to Flash later on. The server checks the If-Modified-Since header and returns a 304 Not Modified when possible.
The drawback of this method is that I still have to wait for the request to finish; I would rather rely on the old data while everything is set up and check for a new version later on.
TL;DR:
Is there a way to store data in a local cache (in the browser or in my Flash application) so that it isn't deleted when the browser is closed and is available without another request to the server?
You can use Web Storage.
I have stored more than 300 records for a single domain in localStorage without problems.
Here is a good document about Web Storage: http://diveintohtml5.info/storage.html
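As a sketch of the "use the old data immediately, check for a new version later" pattern the question describes — the dataset key and /data.json URL are placeholders I invented:

function loadDataset(onData) {
  var cached = localStorage.getItem('dataset');
  if (cached) {
    onData(JSON.parse(cached)); // start with the old data right away
  }
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/data.json'); // placeholder URL
  xhr.onload = function () {
    if (xhr.responseText !== cached) {
      localStorage.setItem('dataset', xhr.responseText);
      onData(JSON.parse(xhr.responseText)); // swap in the fresh version
    }
  };
  xhr.send();
}

localStorage survives browser restarts, which covers the persistence requirement.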
I have never used it from Flash, but I found this on GitHub: https://github.com/shoito/as3webstorage
Is it possible for the client side to store a local copy of web pages after a server-side request, without manually saving the page (right click + Save As...)?
I have multiple clients that display a loop of web pages coming from the server. Each page has different media files such as images and SWFs. As an alternative to the default cache mechanism, I would like the client side, during the first load of the pages from the server, to store a copy of them locally. That way I can reduce the requests coming from the clients every time the loop loads a page.
Whenever the content changes, the server would tell the client side to request the pages again and overwrite its local copy.
Well, you could do something like this:
localStorage['this_page'] = document.querySelector('html').innerHTML;
This will, of course, only work in modern browsers that support localStorage. There's no other browser API that offers a way to store large amounts of data; cookies are too small. You could use window.name as an alternative, but that's more of a hack than anything else.
document.querySelector can, of course, be replaced with document.getElementsByTagName('html')[0], but just use querySelector, since it's supported everywhere localStorage is.
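The matching restore step would look something like this (a sketch; 'this_page' is just the key chosen above):

var saved = localStorage['this_page'];
if (saved) {
  // Reuse the stored copy instead of asking the server again.
  document.querySelector('html').innerHTML = saved;
}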
As far as I know it's impossible to achieve the following, but only an expert can confirm it:
I've got page number 1, which requests some user and application data as soon as it loads; page number 2 uses the same script, and it would be wasteful to request the same info again.
I know that the browser caches the script; my question is whether it caches the execution results (the data) as well.
The pages don't share the same layout, so it is not possible to load page number 2 via AJAX.
The browser doesn't automatically cache the result of the script (that would be seriously weird), but you can, by setting (and checking for) cookies, by using local storage on modern browsers, etc. Note, though, that cookies are sent to the server on every request and so increase its size; if you can use local storage, do.
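A sketch of the cookie variant, with userData as a made-up key (for anything sizable, prefer local storage as noted):

function getCookie(name) {
  var match = document.cookie.match(new RegExp('(?:^|; )' + name + '=([^;]*)'));
  return match ? decodeURIComponent(match[1]) : null;
}

var userData = getCookie('userData');
if (userData === null) {
  // ...fetch the data here, then remember it for the next page:
  // document.cookie = 'userData=' + encodeURIComponent(data) + '; path=/';
}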
You can "cache" your data, if you use some kind of client side storage like localStorage (see MDN docu for more details).
The Browser itself may also cache your request internally as the ajax request is no different from any other request made by the browser (html docs, images, etc.). So depending on your exact request (including all parameters) the Browser may actually use a cached version of your request to avoid unnecessary calls. Here, however, the usual restrictions and properties of caching apply, so you can not rely on that behaviour!
The browser will not cache your data automatically if your "page" is a new URL.
But it is certainly possible for you to implement this yourself, in several ways.
One is to use local storage in newer browsers that support HTML5.
Another is to write your app as a single page with multiple views and transitions, using AJAX to replace portions of your page (the views); see the sketch below. This technique is becoming increasingly popular.
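A bare-bones sketch of the view-swap idea; the #view container and the /views/about.html URL are invented for the example:

function showView(url) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url); // e.g. '/views/about.html'
  xhr.onload = function () {
    // Swap only the view container; script state (and any
    // data already fetched) survives across "pages".
    document.getElementById('view').innerHTML = xhr.responseText;
  };
  xhr.send();
}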
I highly recommend reading "JavaScript Web Applications" by Alex MacCaw to understand JavaScript MVC and how to use JavaScript to create a client-side (browser-based) controller and views and to manage caching, state, etc. in the browser. Also look at frameworks like Backbone.js.
http://www.amazon.com/JavaScript-Web-Applications-Alex-MacCaw/dp/144930351X/ref=sr_1_1?s=books&ie=UTF8&qid=1332771002&sr=1-1
I would avoid caching the data unless there are serious performance problems (and even then, rather eliminate the performance problems than cache around them). It's premature optimization.
Once data is cached, all kinds of scenarios (stale data, deleted data) must be considered (unless the data is static, but then the question is moot anyway).
Imagine that your web application maintains a hit counter for one or more pages and also aggressively caches those pages for anonymous visitors. This poses a problem: the hit count will be out of date for those visitors, because although the counter is accurately maintained on the server even for them, they will see the old cached page for a while.
What if the server continued to serve them the cached page, but passed the updated counter in a non-persistent HTTP cookie, to be read by a piece of JavaScript in the page that injects the updated count into the DOM?
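Concretely, I picture something like this on the page, where hits is a made-up cookie name and hitcounter a placeholder element id:

var match = document.cookie.match(/(?:^|; )hits=(\d+)/);
if (match) {
  // Patch the fresh count into the otherwise stale cached page.
  document.getElementById('hitcounter').textContent = match[1];
}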
Opinions?
You are never going to keep track of visitors this way. If you are aggressively caching pages, intermediate proxies and browsers will also cache them, so a request may never even reach your server for you to count it.
The best way to do this is an approach similar to Google Analytics: when the page loads, send an AJAX request to the server. That request increments the current counter value on the server and returns the latest value, which the client-side code can then display using JavaScript.
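A sketch of that round trip; the /hit endpoint and the hitcounter element are assumptions for illustration:

// On every page load, bump the counter server-side and show the result.
// This works even when the page itself came out of a cache.
var xhr = new XMLHttpRequest();
xhr.open('POST', '/hit?page=' + encodeURIComponent(location.pathname));
xhr.onload = function () {
  // Assumes the server returns the incremented count as plain text.
  document.getElementById('hitcounter').textContent = xhr.responseText;
};
xhr.send();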
This approach allows you to cache as aggressively as you want without losing the ability to keep track of your visitors.
You could also fetch the page out of the cache programmatically (via ASP or PHP), replace the hit counter server-side, and serve the result.