Browser-based caching for remote resources - javascript

I have two REST-ful resources on my server:
/someEntry/{id}
Response:
{
    someInfoAboutEntry: ...,
    entryTypeUrl: "/entryType/12345"
}
and
/entryType/{id}
Response:
{
    someInfoAboutEntryType: ...
}
The entryTypeUrl is used to fetch additional data about the type of this entry from a different URL. It will be bound to a "Detailed information" button near each entry. There can be many (let's say 100) entries, while there are only 5 types (so most entries point to the same entryTypeUrl).
I'm building a JavaScript client to access those resources. Should I cache entryType results in my JavaScript code, or should I rely on the browser to cache the data for me and dispatch XHR requests every time the user clicks the "Detailed information" button?
As far as I see it, both approaches should work just fine. The second one (always dispatching requests) will result in clearer code though. Should I stick to it, or are there some points I'm not aware of?
Thanks in advance.

I would definitely let the browser manage the caching, rather than writing a custom caching layer yourself.
This way you have less code to write and maintain, and you allow the server to dictate (via its HTTP headers) whether the response should be cached or not. If you write your own caching code you lose the automatic revalidation of stale data, which you get for free from the browser.
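For illustration, here is a minimal sketch of the "always dispatch the request" approach (assuming jQuery for brevity; the caching header values in the comments and the renderDetails function are placeholders, not something from the question):
// Client side: always request, and let the browser decide whether to answer
// from its cache. This relies on the server sending caching headers on
// /entryType/{id}, for example (assumed values):
//   Cache-Control: max-age=3600
//   ETag: "abc123"   (lets the browser revalidate cheaply once the entry is stale)
function showDetails(entryTypeUrl) {
    $.getJSON(entryTypeUrl, function (entryType) {
        // Repeated clicks for the same entryTypeUrl will typically be served
        // from the browser cache without another round trip to the server.
        renderDetails(entryType);   // renderDetails is a placeholder
    });
}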

Related

Race conditions during simultaneous link click and asynchronous AJAX request?

I'm currently facing a situation similar to the relatively simple example shown below. When a user clicks on a link to a third-party domain, I need to capture certain characteristics present in the user's DOM and store that data on my server. It's critical that I capture this data for all JS-enabled users, with zero data loss.
I'm slightly concerned that my current implementation (shown below) may be problematic. What would happen if the external destination server was extremely fast (or my internal /save-outbound-link-data endpoint was extremely slow), and the user's request to visit the external link was processed before the internal AJAX request had enough time to complete? I don't think this would be a problem (because in this situation, the browser doesn't care about receiving a response from the AJAX request), but getting some confirmation from fellow developers would be much appreciated.
Also, would the answer to the question above vary if the <a> link pointed to an internal URL rather than an external one?
<script type="text/javascript">
    $(document).ready(function() {
        $('.record-outbound-click').on('click', function(event) {
            var link = $(this);
            $.post(
                '/save-outbound-link-data',
                {
                    destination: link.attr('href'),
                    category: link.data('cat')
                },
                function() {
                    // Link tracked successfully.
                }
            );
        });
    });
</script>
<a href="http://www.stackoverflow.com" class="record-outbound-click" data-cat="programming">
Visit Stack Overflow
</a>
Please note that using event.preventDefault(), along with window.location.href = link.attr('href') inside $.post's success callback, isn't a viable solution for me. Neither is sending the user to a preliminary script on my server (for instance, /outbound?cat=programming&dest=http://www.stackoverflow.com), capturing their data, and then redirecting them to their destination.
Edit 2
Also consider the handshake step (Google's docs):
Time it took to establish a connection, including TCP handshakes/retries and negotiating an SSL.
I don't think your client and the server you're sending the AJAX request to can complete the handshake if the client is no longer holding a connection open to the server (i.e., you're already at Stack Overflow or whatever website your link navigates to).
Edit 1
More broadly, though, I was hoping to understand from a theoretical point of view whether or not the risk I'm concerned about is a legitimate one.
That's an interesting question, and the answer may not be as obvious as it seems.
That's just a sample request/response from my network tab; it definitely shouldn't be taken as representative of requests/responses in general.
I think the gap we might be most concerned with is the 1.933ms stall time. There are also other steps that need to happen before the actual request is sent (which itself took about 0.061ms).
I'd be worried if there's an interruption in any of the 3 steps leading up to the actual request (which took about 35ms give or take).
I think the question is, if you go somewhere else before the "stalled", "DNS Lookup", and "Initial connection" steps happen, is the request still going to be sent? That part, I don't know. But what about any general computer or browser lag beforehand?
Like you mentioned, the idea that the request/response cycle to/from Stack Overflow would somehow be faster than what's happening on your client (i.e., just the initiation, not even the complete cycle, of a network request to your server) is probably a bit far-fetched. But theoretically (and, as you mentioned, this is what you're interested in), it's probably a bad idea in general to depend on these kinds of race conditions.
Original answer
What about making the AJAX request synchronous?
$.ajax({
    type: "POST",
    url: url,
    async: false
});
This is generally a terrible idea, but if, in your case, the legacy code is so limiting that you have no way to modify it and this is your last option (think, zombie apocalypse), then consider it.
See jQuery: Performing synchronous AJAX requests.
The reason it's a bad idea is that it's completely blocking (in normal circumstances, you don't want potentially un-completeable requests blocking your main thread). But in your case, it looks like that's actually exactly what you want.
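For example, here is a rough sketch of how the synchronous variant could be wired into the click handler from the question (endpoint and field names taken from the question; error handling omitted):
$('.record-outbound-click').on('click', function () {
    var link = $(this);
    // async: false blocks the main thread until the tracking request has
    // completed, so the browser cannot navigate away before it is sent.
    $.ajax({
        type: "POST",
        url: "/save-outbound-link-data",
        async: false,
        data: {
            destination: link.attr('href'),
            category: link.data('cat')
        }
    });
    // The default navigation to the link's href proceeds once this returns.
});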

REST API, tracking changes in multiple resources, front-end synchronization

I have a system with quite complex business logic; so far I have around 10-15 database tables (resources), and this number is growing. The front-end for the user is an AngularJS single-page application. The problem is communication with the back-end and keeping the angular front-end synchronized with back-end data.
The back-end keeps all resources and the relationships between them, this is obvious. The front-end fetches those resources and keeps a copy of them locally to make the interface much more responsive for the user and to avoid fetching data on every request. And this is awesome.
The server side has many operations which affect many resources at once. What this means is that adding/removing/editing one resource (via the REST API) can modify a lot of other resources.
I want the front-end app data to always be fully synchronized with the back-end data. This lets me maintain data integrity and keep my application bug-free. Any kind of desynchronization is a big "no no"; it introduces hundreds of places where undefined behaviour could possibly occur in my front-end app.
The question is: what is the best way to achieve that? My ideas/insights:
Business logic (modifying/editing/deleting resources, managing relationships, keeping data integrity) must be implemented only once. Duplicating the business logic implementation (one copy in the front-end and one in the back-end) introduces a lot of potential bugs and involves code duplication, which is obviously a bad thing. If the business logic were implemented in the front-end, the back-end would still have to validate data and keep its integrity - a duplication of business logic. So, the business logic MUST be in the back-end, period.
I use a REST API. When my front-end updates one resource (or many resources via the PATCH method), a lot of side effects happen on the server side and other resources get modified too. I want my front-end angular app to know WHICH resources got modified and to update them (to keep full synchronization). REST returns only the resource which was originally requested for update, without the other affected resources.
I know that I could use some form of resource linking, and send my original updated resource with links to the other affected resources. But what if there are 100 of them? Making 100 requests to the server is a total performance killer.
I am not very attached to REST; because my API is not public, it could be anything. I think that the best solution would be the back-end sending back ALL modified resources. This would allow my front-end to always be in sync with the back-end, would be fast, and would be atomic (no invalid intermediate state between multiple requests to the server). I think that this architecture would be awesome. The question is: is this a common approach? Are there any protocols / standards / libs allowing me to do this? We could write it from scratch, but we don't want to reinvent the wheel.
Actually, I think that having business logic in both the front-end and the back-end would be good, but ONLY if it were implemented once. This means a JavaScript back-end application. Unfortunately, for the time being, this is not a possible solution for me.
Any insight will be welcome!
Added the backbone.js tag, because the question is much more about architecture than any specific technology.
You're on the right track and it is a common problem you're facing right now. As you said, in a REST world your API returns the requested / changed resource. A simple example of your problem:
You - as user X - want to follow another user Y. The front end displays your own following counter (X) and the follower counter of the other user (Y). The HTTP call would be something like:
PUT /users/X/subscribe/Y
The API would return the user Y resource but X is missing, or the other way around.
To handle these cases I use an extended version of my standard API response structure. My standard structure is:
- meta object - includes the HTTP status code and an explanation of why this code was used, which app server processed the response, and more
- notification object - includes information about errors during processing (if any), special messages for developers, and more
- resource - the resource which was requested / modified; the name of this attribute is the resource type in singular for single resources (e.g. user) or in plural for resource collections (e.g. users)
{
    meta: {
        status: 200,
        message: 'OK',
        appServer: 'app3'
    },
    notification: {
        errors: []
    },
    user: {
        id: 3123212,
        subscribers: 123,
        subscriptions: 3234
    }
}
In order to also return the other affected resources while keeping the REST way + my static, standard response structure, I attach one more attribute to the response called 'affectedResources', which is an array of all the other affected resources. In this very simple example the array would include just the user X resource object. The front end iterates over the array and takes care of all necessary changes on its side.
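For example, a sketch of what the extended response and the front-end handling could look like (the numbers are illustrative and the local store API is hypothetical):
{
    meta: { status: 200, message: 'OK', appServer: 'app3' },
    notification: { errors: [] },
    user: { id: 4501234, subscribers: 124, subscriptions: 87 },              // user Y
    affectedResources: [
        { type: 'user', id: 3123212, subscribers: 123, subscriptions: 3235 } // user X
    ]
}
// Front end: apply the primary resource, then walk affectedResources and
// update each entry in the local copy (store.update is hypothetical).
function applyResponse(response) {
    store.update('user', response.user);
    (response.affectedResources || []).forEach(function (resource) {
        store.update(resource.type, resource);
    });
}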

How to clean chrome in-memory cache?

I'm developing an extension in Chrome and I'm trying to perform an action each time a user searches in Google. Currently I'm using the chrome.webRequest onBeforeRequest listener. It works perfectly in most cases, but some of the requests are served from the cache and don't trigger any call. I've found this in the API documentation about caching:
Chrome employs two caches — an on-disk cache and a very fast in-memory cache. The lifetime of an in-memory cache is attached to the lifetime of a render process, which roughly corresponds to a tab. Requests that are answered from the in-memory cache are invisible to the web request API. If a request handler changes its behavior (for example, the behavior according to which requests are blocked), a simple page refresh might not respect this changed behavior. To make sure the behavior change goes through, call handlerBehaviorChanged() to flush the in-memory cache. But don't do it often; flushing the cache is a very expensive operation. You don't need to call handlerBehaviorChanged() after registering or unregistering an event listener.
I've tried using the handlerBehaviorChanged() method to empty the in-memory cache, but there was no difference. Although it's not recommended I've even tried to call it after every request.
This is my code:
chrome.webRequest.MAX_HANDLER_BEHAVIOR_CHANGED_CALLS_PER_10_MINUTES = 1000;

chrome.webRequest.onBeforeRequest.addListener(function (details) {
    // perform action
    chrome.webRequest.handlerBehaviorChanged();
}, {
    urls: ["*://*.google.com/*"]
});
Is there any way to empty/disable this in-memory cache from the extension?
I asume the "Caching" is performed by the Google-Website with some crazy JavaScript in Objects, Arrays,... so emptying the browser in Memory-Cache won't help.
My first thought was that the data was Stored in the sessionStorage (due to the fact that the Values had the search-term in them [here I searched for test] and are updated/created on every request/change of the selected "search-word"
)
I tried clearing the Sessionstorage (even periodicaly), but it didn't really change the "not"-loading, further more the storage was recreated and even without the storage, the different results were displayed.
Due to this Information and the fact that I can't check several 1000 lines of minfied JavaScript Code, I just can asume that the website does the caching of the requests. I hope this Information can point you in the right direction.
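A rough sketch of the sessionStorage-clearing experiment described above, as it might look in a content script declared for *://*.google.com/* in the manifest (the 5-second interval is arbitrary):
// Periodically clear the page's sessionStorage; as noted above, the results
// still updated without new network requests, so this did not help.
setInterval(function () {
    try {
        window.sessionStorage.clear();
    } catch (e) {
        // Access can be blocked in sandboxed frames; ignore.
    }
}, 5000);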

POST manipulation, Tamper Data and AJAX security issues

Frequently when I work on AJAX applications, I'll pass around parameters via POST. Certain parts of the application might send the same number of parameters or the same set of data, but depending on a custom parameter I pass, it may do something completely different (such as delete instead of insert or update). When sending data, I'll usually do something like this:
$.post("somepage.php", {action: "complete", somedata: data, moredata: anotherdata}, function(data, status) {
if(status == "success") {
//do something
}
});
On another part of the application, I might have similar code but instead setting the action property to deny or something application specific that will instead trigger code to delete or move data on the server side.
I've heard about tools that let you modify POST requests and the data associated with them, but I've only used one such tool, called Tamper Data for Firefox. I know the chances of someone modifying the data of a POST request are slim, and even slimmer that they would change a key property to make the application do something different on the backend (such as changing action: "complete" to action: "deny"), but I'm sure it happens in day-to-day attacks on web applications. Can anyone suggest some good ways to avoid this kind of tampering? I've thought of a few ways that consist of checking if the action is wrong for the event being triggered and validating that along with everything else, but I can see that being an extra 100 lines of code for each part of the application that needs these kinds of requests protected.
You need to authorize clients making the AJAX call just like you would with normal requests. As long as the user has the rights to do what he is trying to do, there should be no problem.
You should also pass along an authentication token that you store in the user's session, to protect against CSRF.
Your server can't trust anything it receives from the client. You can start establishing trust using sessions and authentication (make sure the user is who she says she is), SSL/TLS (prevent tampering from the network) and XSRF protection (make sure the action was carried out from html that you generated) as well as care to prevent XSS injection (make sure you control the way your html is generated). All these things can be handled by a server-side framework of good quality, but there are still many ways to mess up. So you should probably take steps to make sure the user can't do anything overly destructive for either party.
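As a rough sketch of the token idea (the meta tag name and the csrf_token field are assumptions, data/anotherdata are the variables from the question, and the server must still check both the token and the user's permissions):
// Assumes the server rendered something like:
//   <meta name="csrf-token" content="...random per-session value...">
var csrfToken = $('meta[name="csrf-token"]').attr('content');

$.post("somepage.php", {
    action: "complete",
    somedata: data,
    moredata: anotherdata,
    csrf_token: csrfToken    // the server compares this to the token stored in the session
}, function (response, status) {
    // Even with a valid token, the server must verify that the logged-in user
    // is actually allowed to perform "complete" before touching anything.
});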

Disable browser cache

I implemented a REST service and I'm using a web page as the client.
My page has some JavaScript functions that perform the same HTTP GET request to the REST server several times and process the replies.
My problem is that the browser caches the first reply and doesn't actually send the following requests.
Is there some way to force the browser to execute all the requests without caching?
I'm using Internet Explorer 8.0.
Thanks
Not sure if it can help you, but sometimes I add a random parameter to the URL of my request in order to avoid it being cached.
So instead of having:
http://my-server:8080/myApp/foo?bar=baz
I will use:
http://my-server:8080/myApp/foo?bar=baz&random=123456789
Of course, the value of the random parameter is different for every request. You can use the current time in milliseconds for that.
Not really. This is a known issue with IE, the classic solution is to append a random parameter at the end of the query string for every request. Most JS libraries do this natively if you ask them to (jQuery's cache:false AJAX option, for instance)
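For example, with jQuery the cache-busting parameter can be appended automatically (URL taken from the answer above):
// Per request: jQuery appends a "_=<timestamp>" parameter to the GET URL.
$.ajax({
    type: "GET",
    url: "http://my-server:8080/myApp/foo?bar=baz",
    cache: false,
    success: function (reply) {
        // process the reply
    }
});

// Or globally, for every subsequent jQuery AJAX request:
$.ajaxSetup({ cache: false });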
Well, of course you don't actually want to disable the browser cache entirely; correct caching is a key part of REST and the fact that it can (if properly followed by both client and server) allow for a high degree of caching while also giving fine control over the cache expiry and revalidation is one of the key advantages.
There is, though, an issue, as you have spotted, with subsequent GETs to the same URI from the same document (as in DOM document lifetime; reload the page and you'll get another go at that XMLHttpRequest). IE pretty much seems to treat it as it would a request for more than one copy of the same image or other related resource in a web page: it uses the cached version even if the entity isn't cacheable.
Firefox has the opposite problem, and will send a subsequent request even when caching information says that it shouldn't!
We could add a random or time-stamped bogus parameter at the end of a query string for each request. However, this is a bit like screaming "THIS IS SPARTA!" and kicking our hard-won download into a deep pit that no Health & Safety inspector considered putting a safety rail around. We obviously don't want to repeat a full unconditional request when we don't need to.
However, this behaviour has a time component. If we delay the subsequent request by a second, then IE will re-request when appropriate while Firefox will honour the max-age and expires headers and not re-request when needless.
Hence, if two requests could be within a second of each other (either we know they are called from the same function, or there's the chance of two events triggering it in close succession), using setTimeout to delay the second request by a second after the first has completed will make it use the cache correctly, rather than falling into either of the two sorts of incorrect behaviour. A sketch of this follows below.
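A rough sketch of that delay, where fetchStatus and its callbacks are placeholders for however the request is actually made:
// Fire the first request immediately; delay the follow-up by a second so IE
// consults its cache properly and Firefox honours max-age / expires.
fetchStatus(function onFirstDone(result) {
    setTimeout(function () {
        fetchStatus(onSecondDone);   // onSecondDone is a placeholder callback
    }, 1000);
});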
Of course, a second's delay is a second's delay. This could be a big deal or not, depending primarily on the size of the downloaded entity.
Another possibility is that something that changes so rapidly shouldn't be modelled as GETting the state of a resource at all, but as POSTing a request for a current status to a resource. This does smell heavily of abusing REST and POSTing what should really be a GET though.
Which can mean that on balance the THIS IS SPARTA approach of appending random stuff to query strings is the way to go. It depends, really.
