How to cache results for autosuggest component? - javascript

I have a UI autosuggest component that performs an AJAX request as the user types. For example, if the user types mel, the response could be:
{
  suggestions: [{
    id: 18,
    suggestion: 'Melbourne'
  }, {
    id: 7,
    suggestion: 'East Melbourne'
  }, {
    id: 123,
    suggestion: 'North Melbourne'
  }]
}
The UI component implements client-side caching. So, if the user now types b (results for melb are retrieved) and then presses Backspace, the browser already has the results for mel in memory, so they are immediately available. In other words, every client makes at most one AJAX call for any given input.
Now, I'd like to add server side caching on top of this. So, if one client performs an AJAX call for mel, and let's say there is some heavy computation going on to prepare the response, other clients would be getting the results without executing this heavy computation again.
I could simply have a hash of queries and results, but I'm not sure that this is the most optimal way to achieve this (memory concerns). There are ~20000 suggestions in the data set.
What would be the best way to implement the server side caching?

You could implement a simple cache with an LRU (least recently used) discard algorithm. Basically, set a few thresholds (for example: 100,000 items, 1 GB) and then discard the least recently used item (i.e., the item that is in cache but was last accessed longer ago than any of the other ones). This actually works pretty well, and I'm sure you can use an existing Node.js package out there.
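For illustration, here is a minimal Map-based LRU sketch (the class name, limit, and key scheme are made up for the example); in practice an existing package such as lru-cache gives you the same behaviour with size and age limits built in.
// Minimal LRU cache sketch relying on Map's insertion order.
class LruCache {
  constructor(maxItems) {
    this.maxItems = maxItems;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    // Discard the least recently used entry (the first key in insertion order).
    if (this.map.size > this.maxItems) {
      this.map.delete(this.map.keys().next().value);
    }
  }
}
// Usage: cache suggestion responses keyed by the query string.
const suggestionCache = new LruCache(100000);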
If you're going to be building a service that has multiple frontend servers, it might be easier and simpler to just set up memcached on a server (or even put it on a frontend server if the load on your servers is relatively low). It has an extremely simple TCP/IP protocol, and there are memcached clients available for Node.js.
Memcached is easy to set up and will scale for a very long time. Keeping the cache on separate servers also has the potential benefit of speeding up requests for all frontend instances, even the ones that have not received a particular request before.
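A rough read-through sketch with the third-party memcached client for Node.js (the client's API is shown from memory, and computeSuggestions stands in for your heavy computation, so treat this as an outline rather than a drop-in implementation):
const Memcached = require('memcached');
const memcached = new Memcached('localhost:11211');

function getSuggestions(query, computeSuggestions, callback) {
  const key = 'suggest:' + query.toLowerCase();
  memcached.get(key, (err, cached) => {
    if (!err && cached) return callback(null, cached); // cache hit, skip the heavy work
    const result = computeSuggestions(query);          // the expensive part
    memcached.set(key, result, 300, () => callback(null, result)); // cache for 5 minutes
  });
}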
No matter what you choose to do, I would recommend keeping the caching out of the process that serves the requests. That makes it easy to just kill the cache if you have caching issues or need to free up memory for some reason.

(memory concerns). There are ~20000 suggestions in the data set.
20,000 results? Have you thought about how much memory that will actually take? My response assumes you're talking about 20,000 short strings, as presented in the example. I feel like you're optimizing for a problem you don't have yet.
If you're talking about a reasonably static piece of data, just keep it in memory. Even if you want to store it in a database, just keep it in memory. Refresh it periodically if you must.
If it's not static, just try and read it from the database on every request first. Databases have query caches and will chew through a 100KB table for breakfast.
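As a sketch of the "just keep it in memory" approach (loadSuggestionsFromDb is a placeholder for however you actually query your database):
let suggestions = [];

async function refreshSuggestions() {
  // e.g. [{ id: 18, suggestion: 'Melbourne' }, ...]
  suggestions = await loadSuggestionsFromDb();
}

refreshSuggestions();
setInterval(refreshSuggestions, 10 * 60 * 1000); // refresh every 10 minutes

function suggest(query) {
  const q = query.toLowerCase();
  // A linear scan over ~20,000 short strings is cheap.
  return suggestions
    .filter(s => s.suggestion.toLowerCase().includes(q))
    .slice(0, 10);
}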
Once you're actually getting enough hits for this to become a real issue, don't cache it yourself. I have found that if you actually have a real need for a cache, other people have written it better than you would. But if you really need one, go for an external one like Memcached, or even something like Redis. Keeping that stuff external can make testing and scalability a heap easier.
But you'll know when you actually need a cache.

Related

REST API design with relational data

So this question is less of a problem I have and more of a question about how I should go about implementing something.
Let's imagine, for example, that I have a User and a Resource, where a User can have multiple Resources but a Resource can have only one User. How should you go about creating API endpoints for interacting with this data?
Should it be something like
// POST /api/users/resource (to create a resource)
or something like
// POST /api/resource
That's just one example, but there are a lot of questions like that that come to mind when I'm thinking about this.
It would be nice if someone who knows the right approach (or just a good approach) could give an example of how you would structure API endpoints for relational data like this.
Any and all help is appreciated, thanks!
I would go with the latter. The reason is that the endpoint /api/resource does not bind resource creation to the user. Down the line, we could create resources for a Supplier (a hypothetical example), giving us better flexibility without needing to change the endpoint.
Part of the point of REST is that the server's implementation of a resource is hidden behind the uniform interface. In a sense, you aren't supposed to be able to tell from the resource identifiers whether or not you are dealing with "relational data".
Which is freeing (because you get to design the best possible resource model for your needs); but also leads to analysis-paralysis, because the lack of constraints means that you have many options to choose from.
POST /api/users/resource
POST /api/resource
Both of these are fine. The machines are perfectly happy to carry either message. If you wanted to implement an API that supported both options, that would also be OK.
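To make the two shapes concrete, here is a rough Express sketch (paths lightly pluralized, and createResource is a made-up helper):
const express = require('express');
const app = express();
app.use(express.json());

// Option 1: the resource is nested under its owning user.
app.post('/api/users/:userId/resources', (req, res) => {
  const resource = createResource({ ...req.body, userId: req.params.userId });
  res.status(201).location('/api/resources/' + resource.id).json(resource);
});

// Option 2: a flat collection; the owner is named in the request body.
app.post('/api/resources', (req, res) => {
  const resource = createResource(req.body); // body includes userId
  res.status(201).location('/api/resources/' + resource.id).json(resource);
});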
So how do we choose?
The answer to this really has two parts. The first relates to understanding resources, which are really just generalizations of documents. When we ask for a document on the web, one of the things that can happen is that the document gets cached. If we are sending a message that we expect to modify a document, then we probably want caches to invalidate previously cached versions of that document.
And the primary key used to identified cached documents? The URI.
In the case where we are sending a message to a server to save a new document, and we expect the server to choose its own identifier for its copy of the new document, then one logical choice of request target is the resource that is the index of documents on the server.
This is why you will normally see CreateItem operations implemented as POST handlers on a Collection resource - if the item is successfully added, we want to invalidate previously cached responses to GET /collection.
Do you have to do it that way? No, you do not - it's a "trade off"; you weigh the costs and benefits of the options, and choose one. If you wanted to instead have a separate resource for the CreateItem operation, that's OK too.
The second part of the answer relates to the URI - having decided which document should be handling the requests, what spelling should we use for the identifier of that document.
And, once again, the machines don't care very much. It needs to be RFC 3986 compliant, and you'll save yourself a lot of bother if you choose a spelling that works well with URI Templates, but that still leaves you with a lot of freedom.
The usual answer? Think about the people, who they are, and what they are doing when they are looking at a URI. You've got visitors looking at a browser history, and writers trying to document the API, and operators reading through access logs trying to understand the underlying traffic patterns. Pick a spelling that's going to be helpful to the people you care about.

How to avoid dog-pile effect at Node.js & MongoDB & Redis stack?

When some cached value has expired, or the cache needs to be regenerated for any reason, and there is heavy traffic while no cache exists, there will be a heavy load on MongoDB and response times increase significantly. This is typically called the "dog-pile effect". Everything works well after the cache is created.
I know that it's a very common problem which applies to all web applications using a database & cache system.
What should one do to avoid the dog-pile effect in a Node.js & MongoDB & Redis stack? What are best practices and common mistakes?
One fairly proven way to keep the dogs from piling up is to keep a "lock" (e.g. in Redis) that prevents the cache-populating logic from firing more than once. The first time the fetcher is called (for a given piece of content), the lock is acquired for it and set to expire (e.g. with SET ... NX EX 60). Any subsequent invocation of the fetcher for that content will fail to acquire the lock, so only one dog gets to the pile.
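A hedged sketch of that lock with ioredis (fetchFromMongo is a placeholder for the expensive query, and the TTLs are arbitrary examples):
const Redis = require('ioredis');
const redis = new Redis();

async function getWithLock(key, fetchFromMongo) {
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  // SET ... NX EX 60 is atomic, so only one caller gets the lock.
  const gotLock = await redis.set('lock:' + key, '1', 'EX', 60, 'NX');
  if (gotLock === 'OK') {
    const fresh = await fetchFromMongo();                    // the expensive MongoDB query
    await redis.set(key, JSON.stringify(fresh), 'EX', 300);  // repopulate the cache
    await redis.del('lock:' + key);
    return fresh;
  }

  // Another caller is repopulating the cache: wait briefly and retry.
  await new Promise(resolve => setTimeout(resolve, 100));
  return getWithLock(key, fetchFromMongo);
}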
The other thing you may want to put into place is some kind of rate limiting on the fetcher, regardless of the content. That's also quite easily doable with Redis - feel free to look it up or ask another question :)
I'd just serve the expired content until the new content is done caching, so that the database won't get stampeded.
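A sketch of that "serve stale while refreshing" idea, again assuming an ioredis client and a placeholder fetchFromMongo; the value carries its own soft expiry inside a key whose hard TTL is much longer:
const Redis = require('ioredis');
const redis = new Redis();

async function getWithSoftTtl(key, fetchFromMongo, softTtlMs = 60 * 1000) {
  const raw = await redis.get(key);
  const entry = raw && JSON.parse(raw); // { value, storedAt }

  if (entry && Date.now() - entry.storedAt < softTtlMs) return entry.value;

  if (entry) {
    // Stale: return it immediately and refresh in the background.
    refresh(key, fetchFromMongo).catch(() => {});
    return entry.value;
  }

  // Nothing cached at all: this request has to wait for the database.
  return refresh(key, fetchFromMongo);
}

async function refresh(key, fetchFromMongo) {
  const value = await fetchFromMongo();
  await redis.set(key, JSON.stringify({ value, storedAt: Date.now() }), 'EX', 3600);
  return value;
}
In practice you would combine this with the lock above so that only one background refresh runs at a time.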

Meteor server-side update/insert vs client side update/insert

I have a question about the advantages vs disadvantages of an update/insert of a collection on the client vs server. For example say I have the method which takes a current player, sets him/her no longer as the current player and then creates a new current player.
Meteor.methods({
  currentPlayer: function () {
    var id = Player.findOne({current: true})._id;
    Player.update(id, {$set: {current: false}});
    Player.insert({current: true});
    ...
What would be the advantages to doing this on the server vs doing the exact same thing on the client side:
'click #add': function () {
  var id = Player.findOne({current: true})._id;
  Player.update(id, {$set: {current: false}});
  Player.insert({current: true});
  ...
Maybe there aren't any inherently important differences or advantages to either technique. However if there is I would like to be aware of them. Thanks for your input!
I think Akshat has some great points. Basically there isn't a lot of difference in terms of latency compensation if you define the method on both the client and the server. In my opinion, there are a couple of reasons to use a method:
The operation can only be completed on the server or it results in some side effect that only makes sense on the server (e.g. sending an email).
You are doing an update and the permissions for doing the update are complex. For example maybe only the leader of a game can update certain properties of the players. Cases like that are extremely hard to express in allow/deny rules, but are easy to write using methods.
Personally, I prefer using methods in large projects because I find it's easier to reason about state mutations when all of the changes are forced to funnel through a small set of functions.
On the other hand, if you are working on a smaller project that doesn't have a lot of complex update rules, doing direct collection mutations may be a bit faster to write.
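As a hedged sketch of the "complex permissions" point above (Games, Players, leaderId, and the method name are all invented for the example):
Meteor.methods({
  setPlayerScore: function (gameId, playerId, score) {
    check(gameId, String);
    check(playerId, String);
    check(score, Number);

    // A rule like "only the game's leader can change scores" is awkward to
    // express with allow/deny, but trivial inside a method.
    var game = Games.findOne(gameId);
    if (!game || game.leaderId !== this.userId) {
      throw new Meteor.Error('not-authorized', 'Only the game leader can do this');
    }

    Players.update(playerId, { $set: { score: score } });
  }
});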
The main difference is latency compensation.
Under the hood, Player.update/insert/remove uses a Meteor.call anyway. The difference is that it simulates the result of a successful operation in the browser before it has actually happened.
So say your server is somewhere on the other side of the world with 2-3 seconds of latency. If you update your player using Player.insert/update, the change is reflected instantly, as if it had already been inserted or updated. This can make the UI feel responsive.
Using a Meteor.method defined only on the server waits for the server to send back an updated record, meaning that when you update something it takes the 2-3 seconds to show up in your UI.
Using methods, you can be sure the data has been inserted on the server, at the cost of UI responsiveness. (You could also use the Player.insert & Player.update callbacks for this.)
With Meteor.methods you can also get the same latency compensation effect by defining the same method on the client, where it runs as a simulation while the real call goes to the server.
There's a bit more details on the specifics on how to do this at the docs: http://docs.meteor.com/#meteor_methods
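For example, a rough sketch (method name invented, collection taken from the question): defining the method in a file loaded on both client and server gives you the authoritative server call plus a client-side stub, so you get latency compensation and the server-checked write.
// Shared code (runs on both client and server).
Meteor.methods({
  setCurrentPlayer: function () {
    var current = Player.findOne({ current: true });
    if (current) {
      Player.update(current._id, { $set: { current: false } });
    }
    Player.insert({ current: true });
  }
});

// On the client, the stub runs immediately (latency compensation) and the
// server's real result wins if it differs.
Meteor.call('setCurrentPlayer', function (err) {
  if (err) console.log('setCurrentPlayer failed:', err);
});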

Sending large number of requests to server on node.js

I have the following:
var tempServer=require("./myHttp");
tempServer.startServer(8008, {'/fun1': fun1, '/fun2': fun2, '/fun3': fun3, '/fun4': fun4}, "www");
which creates a server on localhost:8008.
and if I type the following into my browser's URL bar:
http://localhost:8008/fun2
it will call the fun2 function and perform what is needed.
Having said that,
how can I write a script function (fun2) that simulates sending a large number of requests (say 10,000) to tempServer?
Any help will be much appreciated!
You could try this on your laptop, but you would find that your computer will not really be able to create 10,000 requests to a temp server (or anyplace) at the same time. Most modern machines top out around 1000 or so simultaneous connections.
You can create something in code that will certainly do this one connection after another, but this is not the same as a true load test where you ramp up and ramp down the number of requests per second. You will quickly find that this is IO and connection-bound and not a true test for your application.
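If you still want to hammer it from a script, here is a rough sketch using only Node's built-in http module, keeping a fixed number of requests in flight (the total, concurrency, and URL are just example values):
const http = require('http');

function fireRequests(total, concurrency, done) {
  let started = 0;
  let finished = 0;

  function next() {
    if (started >= total) return;
    started++;
    http.get('http://localhost:8008/fun2', res => {
      res.resume(); // drain the body so the socket can be reused
      res.on('end', onDone);
    }).on('error', onDone);
  }

  function onDone() {
    finished++;
    if (finished === total) return done();
    next();
  }

  // Prime the pump with `concurrency` requests; each completion starts the next.
  for (let i = 0; i < concurrency; i++) next();
}

fireRequests(10000, 100, () => console.log('all requests completed'));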
Your best bet, if you are trying to benchmark/stress your server, is to put your server on a box or service that is internet accessible like nodejitsu or nodester (both free services) and then use another service (free or otherwise) to hit your server's URL. You don't say if this is what you're doing, but this is the usual way to do load and stress testing.
Two services that I have used in the past to generate simultaneous user requests against a server are Load Impact and Blitz.io. For 10,000 users you will need to pay a bit, but it's not too large a fee.
You may also want to look into the libraries from New Relic to help you monitor the server/service itself and how it behaves under stress. They have some great capabilities in helping find bottlenecks in applications.
This may be a bit more than what you were looking for, but I hope that it's helpful. If I am off the mark, let me know and I'll be happy to amend this to be closer to what it is you are looking for.
Are you interested in scripting 10,000 HTTP calls to fun2, or in issuing 10,000 requests from fun2 within Node?
If it is within Node: as you probably know, the code is written sequentially, but the events are executed asynchronously.
Check the EventEmitter for dispatching function calls on events:
http://nodejs.org/docs/v0.4.7/api/all.html#events.EventEmitter
and see this example as an inspiration for issuing calls in parallel:
http://ricochen.wordpress.com/2011/10/15/node-js-example-2-parallel-processing/
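A tiny illustration of that event-driven dispatch idea (doAsyncWork is a stand-in for whatever asynchronous call fun2 needs to make in parallel):
const EventEmitter = require('events');
const emitter = new EventEmitter();

const total = 10;
let remaining = total;

emitter.on('result', result => {
  remaining--;
  if (remaining === 0) console.log('all parallel calls finished');
});

for (let i = 0; i < total; i++) {
  // Each call runs asynchronously; results come back via the event.
  doAsyncWork(i, result => emitter.emit('result', result));
}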
Hope this helps.
Edmon

Save or destroy data/DOM elements? Which takes more resources?

I've been getting more and more into high-level application development with JavaScript/jQuery. I've been trying to learn more about the JavaScript language and dive into some of its more advanced features. I was just reading an article on memory leaks when I came across this section:
JavaScript is a garbage collected language, meaning that memory is allocated to objects upon their creation and reclaimed by the browser when there are no more references to them. While there is nothing wrong with JavaScript's garbage collection mechanism, it is at odds with the way some browsers handle the allocation and recovery of memory for DOM objects.
This got me thinking about some of my coding habits. For some time now I have been very focused on minimizing the number of requests I send to the server, which I feel is just good practice. But I'm wondering if sometimes I go too far. I'm not very aware of the efficiency issues/bottlenecks that come with the JavaScript language.
Example
I recently built an impound management application for a towing company. I used the jQuery UI dialog widget and populated a datagrid with specific ticket data. Now, this sounds very simple on the surface... but there is a LOT of data being passed around here.
(and now for the question... drumroll please...)
I'm wondering what the pros/cons are for each of the following options.
1) Make only one request for a given ticket and store it permanently in the DOM, simply showing/hiding the modal window. This means only one request is sent out per ticket.
2) Make a request every time a ticket is opened and destroy it when it's closed.
My natural inclination was to store the tickets in the DOM - but I'm concerned that this will eventually start to hog a ton of memory if the application goes a long time without being reset (which it will).
I'm really just looking for the pros/cons of those two options (or something neat I haven't even heard of =P).
The solution here depends on the specifics of your problem, as the 'right' answer will vary based on length of time the page is left open, size of DOM elements, and request latency. Here are a few more things to consider:
Keep only the newest n items in the cache. This works well if you are only likely to redisplay items in a short period of time.
Store the data for each element instead of the DOM element, and reconstruct the DOM on each display.
Use HTML5 Storage to store the data instead of DOM or variable storage. This has the added advantage that data can be stored across page requests.
Any caching strategy will need to consider when to invalidate the cache and re-request updated data. Depending on your strategy, you will need to handle conflicts that result from multiple editors.
The best way is to get started using the simplest method, and add complexity to improve speed only where necessary.
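A rough jQuery sketch of the "cache the data, rebuild the DOM" idea from the list above (the fetch URL, renderTicket, and showDialog are placeholders for your own markup and dialog code):
var ticketCache = {};

function openTicket(ticketId) {
  if (ticketCache[ticketId]) {
    showDialog(renderTicket(ticketCache[ticketId])); // no request, just rebuild the DOM
    return;
  }
  $.getJSON('/tickets/' + ticketId, function (data) {
    ticketCache[ticketId] = data; // cache the data, not the DOM nodes
    showDialog(renderTicket(data));
  });
}

function closeTicket($dialog) {
  $dialog.dialog('close');
  $dialog.remove(); // destroy the DOM, keep only the cached data
}
Swapping ticketCache for localStorage gets you the HTML5 Storage variant, which also survives page reloads.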
The third path would be to store the data associated with a ticket in JS, and create and destroy DOM nodes as the modal window is summoned/dismissed (jQuery templates might be a natural solution here).
That said, the primary reason you avoid network traffic seems to be user experience (the network is slower than RAM, always). But that experience might not actually be degraded by making a request every time, if it's something the user intuits involves loading data.
I would say number 2 would be best, because that way, if the ticket changes after you open it, that change will appear the next time the ticket is opened.
One important factor is the number of redraws/reflows that are triggered by DOM manipulation. It's much more efficient to build up your content changes and insert them in one go than to do it incrementally, since each increment causes a redraw/reflow.
See: http://www.youtube.com/watch?v=AKZ2fj8155I to better understand this.
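For instance, a small sketch of that batching idea (the field names are invented): build the rows as a string or document fragment and insert them in a single operation, rather than appending row by row.
function renderRows(tickets) {
  var html = tickets.map(function (t) {
    return '<tr><td>' + t.id + '</td><td>' + t.status + '</td></tr>';
  }).join('');
  $('#ticket-grid tbody').html(html); // one DOM update, one reflow
}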
