How to ensure data consistency in REST API pagination - javascript

I'm building an instant messenger on a mobile client that interacts with a RESTful API through HTTP requests. The pagination endpoint is quite standard - it has a starting location (offset) and a number of items per page (limit). I'm having trouble figuring out how to ensure 100% data consistency with pagination when the database can change rapidly.
For example, with a dozen or so participants, there could be a dozen new messages in a conversation within a second. I don't think it's far-fetched to assume that some of those messages will alter the database while the HTTP request for the next page is still in flight. Fortunately, since this is a messenger I don't have to consider the possibility of data deletion, only data addition.
During my research, the following two links were quite helpful but didn't provide a clear solution:
How to ensure data integrity in paginated REST API?
How to implement robust pagination with a RESTful API when the resultset can change?
The only potential solution I can come up with is using the timestamp of the last object in the previously fetched page. The HTTP query would then have the timestamp as a parameter, and the server would return a page of objects created after that timestamp.
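For illustration, such a request might look like the following (the parameter names are only my guess at what I'd use):

GET /conversations/42/messages?created_after=1457023612000&limit=20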
Is there any potential problem I'm not seeing, or even better, a much better solution to this issue?

It seems that the method I've thought of has a name - cursor-based pagination.
The link below has a great graphical description and explanation, plus an example in PHP.
http://www.sitepoint.com/paginating-real-time-data-cursor-based-pagination/
There's also a helpful guide from Django REST Framework that compares two different pagination techniques (LimitOffsetPagination and CursorPagination).
http://www.django-rest-framework.org/api-guide/pagination/
Cursor-based pagination requires a unique, unchanging ordering of items. Facebook and Twitter use generated IDs for this. As for me, I've decided to simply use the object-creation timestamp, which supports millisecond precision. That should be good enough for now.
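As a rough sketch of what this could look like on the server - Express and the native MongoDB driver are assumptions here, and the collection/field names are made up:

// Hypothetical cursor-based pagination endpoint, ordered by creation timestamp.
app.get('/conversations/:id/messages', function (req, res) {
  var limit = Math.min(parseInt(req.query.limit, 10) || 20, 100);
  var after = req.query.after ? new Date(Number(req.query.after)) : new Date(0);

  db.collection('messages')
    .find({ conversationId: req.params.id, createdAt: { $gt: after } })
    .sort({ createdAt: 1 })
    .limit(limit)
    .toArray(function (err, page) {
      if (err) return res.status(500).end();
      // The client uses the last item's timestamp as the cursor for its next request.
      var nextCursor = page.length ? page[page.length - 1].createdAt.getTime() : null;
      res.json({ items: page, nextCursor: nextCursor });
    });
});

The client then passes nextCursor back as the after parameter on its next request; because the cursor is a value of the item itself rather than a position, newly inserted messages can't shift the page boundaries.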

Related

Caching query results, to do or not to do, overkill or performance energizer?

Good evening,
my project uses the MEAN Stack and has a few collections and a single database from which the data is retrieved.
Thinking about how the user would interact with the web app I'm going to build, I figured that my current idea of the application is quite wasteful.
The application is hosted on a private server on the LAN, which makes requests very fast, and it runs an Express server.
The application is built around employee management, services, and the places where those services can take place - just describing it so you have an idea.
The "ring to rule them all" is pretty much the first collection, services, which starts the core of the application. There's a page that let's you add rows, one for each service that you intend to do and within that row you choose an employee to "run the service", based on characteristics that this employee has, meaning that if the service is about teaching Piano, the employee must know how to play Piano. The same logic works for the rest of the "columns" that will build up my row into a full service recognized by the app as such.
Now, what I said above is pretty much information retrieval from a database and logic to make the application model the information retrieved and build something with it.
My question or rather my doubt comes from how I imagined the querying to work for each field that is part of the service row. Right now I'm thinking about querying the database (mongodb) each time I have to pick a value for a field, but if you consider that I might want to add a 100 rows, each of which would have 10 fields, that would make up for a lot of requests to the database. Now, that doesn't seem elegant, nor intelligent to me, but I can't come up with a better solution or idea.
Any suggestions or rule of thumbs for a MEAN newb?
Thanks in advance!
EDIT: Answering a question from the comments, which was needed.
No, the database is pretty static (unless the user willingly inserts a new value, say a new employee that can do a service), and that wouldn't happen very often. Considering the query that would return all the employees for a given service, those employees would (ideally) be inside an associative array, with the possibility of being "popped" from it if chosen for a service, making them unavailable for further services (because one person can't do two services at the same time). Hope I was clear - I'm surely not the best person at explaining myself.
It would query the database for who is available when a user looks at that page, and run another query if the user assigns an employee to a service.
In general, one query on page load and another when data is submitted is standard.
You would only want to use an in-memory cache (see the sketch after this list) for:
- frequent queries, though most databases will do this for you automatically
- values that change frequently, like:
  - how many users are connected
  - the last query sent
  - something that happens on almost every query (>95%)
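To make that concrete, here is a minimal sketch of such an in-memory cache in an Express/MongoDB setup - the route, TTL and collection name are all just assumptions:

// Tiny time-based in-memory cache for query results that are requested on
// almost every page load but don't need to be perfectly fresh.
var cache = {};

function cached(key, ttlMs, fetchFn, done) {
  var hit = cache[key];
  if (hit && Date.now() - hit.time < ttlMs) return done(null, hit.value);
  fetchFn(function (err, value) {
    if (!err) cache[key] = { value: value, time: Date.now() };
    done(err, value);
  });
}

// Usage: serve the employee list from memory for 30 seconds instead of
// hitting MongoDB on every single request.
app.get('/employees', function (req, res) {
  cached('employees', 30000, function (cb) {
    db.collection('employees').find({}).toArray(cb);
  }, function (err, list) {
    if (err) return res.status(500).end();
    res.json(list);
  });
});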

"fragmenting" HTTP requests

I have an Angular app pulling data from a REST server. Each item we pull has some "core" data - what's needed to display its basic representation - and then what I call "secondary" data: comments and other things that the user might want to see, or might not.
I'm trying to optimize our request pattern to minimize the overall amount of time the user spends looking at a loading spinner: Pulling all (core/secondary) data at once causes the initial request to return far too slowly, but pulling only the bare essentials until the user asks for something we haven't requested yet also creates unnecessary load time, at least inasmuch as I could've anticipated them wanting to see it and loaded it while they were busy reading the core content.
So, right now I'm doing a "core content" pull first and then initiating a "secondary" pull at the end of the success callback from the first. This is going to be an experimental process, but I'm wondering what (if any) best practices have been established in this situation. (I'm sure a good answer to that is a google away, but in this instance I'm not quite sure what to google - thus the quotation marks in this question's title)
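For reference, the chained core/secondary pull described above might look roughly like this with Angular's $http (the endpoint paths are invented):

// Core pull first; the secondary pull is kicked off from the success callback.
$http.get('/api/items/' + itemId)
  .then(function (coreResponse) {
    $scope.item = coreResponse.data;          // render the basic representation immediately
    return $http.get('/api/items/' + itemId + '/secondary');
  })
  .then(function (secondaryResponse) {
    // Merge comments and other secondary data once they arrive.
    angular.extend($scope.item, secondaryResponse.data);
  });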
A more concrete question: Am I better off initiating many small HTTP transactions or a few large ones? My instinct is to do many small ones, particularly if I can anticipate a few things the user is likeliest to want to see first and get those loaded as soon as possible. But surely there's an asymptote here? Or am I off-base in this line of thinking entirely?
I use the same approach as you, and it works pretty well for a many-keyed collection of 10,000+ items.
The collection is paginated with ui.bootstrap.pagination, and a maximum of 10 items is displayed at once. It can be searched by title.
So my approach is to retrieve only the id and title for the whole collection, so the search can be used straight away.
Then, as the items displayed on screen are in an array, I place a $watch on that array. The job of the $watch is to fetch the full details of the items in the array (the secondary pull), but of course only when the array changes.
So, in the worst case scenario, you are pulling the full details of only 10 items.
Results are cached for more efficiency. It displays instant results, as the $watch acts as a pre-loader.
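In rough outline, that $watch looks something like the following - assuming the visible page lives in $scope.visibleItems and that a per-item details endpoint exists:

// Whenever the array of on-screen items changes, fetch full details for any
// item not seen before and cache the result so it is never fetched twice.
var detailsCache = {};

$scope.$watchCollection('visibleItems', function (items) {
  (items || []).forEach(function (item) {
    if (detailsCache[item.id]) {
      item.details = detailsCache[item.id];
      return;
    }
    $http.get('/api/items/' + item.id).then(function (response) {
      detailsCache[item.id] = response.data;
      item.details = response.data;
    });
  });
});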
Am I better off initiating many small HTTP transactions or a few large ones?
I believe large transactions, for just a few items (the ones which are clickable on the screen) are very efficient.
Regarding the best practice bit: I suppose there are many ways to achieve your goals; however, the technique you are using works extremely well, as it retrieves only what is needed, and only just before it is needed.
Besides, it is simple enough to implement.
Also, like you, I would have thought many smaller pulls were surely better than a few large ones. However, I was advised to go for a large pull in a comment on this question: Fetching subdocuments with angular $http
To answer your question about which keywords to search for, I suggest:
progressive loading
An alternative could be using WebSockets and streamed loading; Oboe.js does this quite well:
http://oboejs.com/examples
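A small example of the Oboe.js style, for flavour - the URL and node pattern here are made up:

// Render each item as soon as it arrives in the streamed JSON response,
// instead of waiting for the whole payload to finish downloading.
oboe('/api/items')
  .node('items.*', function (item) {
    renderItem(item);   // hypothetical render function
  })
  .done(function (fullResponse) {
    console.log('Loaded', fullResponse.items.length, 'items');
  });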

Saving and updating the state of an AJAX based graph in Rails

I have a Google Charts ColumnChart in a Rails project. This is generated and populated in JavaScript, by calling a Rails controller action which renders JSON.
The chart displays a month's worth of information for a customer.
Above the chart I have next and previous arrows which allow a customer to change the month displayed on the chart. These don't have any functionality as it stands.
My question is: what is the best way to save the state of the chart, in terms of its current month, for a customer viewing the chart?
Here is how I was thinking of doing the workflow:
1. One of the arrows is selected.
2. This event is captured in JavaScript.
3. Another request to the Rails action rendering JSON is performed, with an additional GET parameter passed based on a data attribute of the arrow button (either + or -).
4. The chart is re-rendered using the new JSON response.
Would the logic around incrementing or decrementing the graph's current date be performed on the server side, with the chart's date being stored in a session array defaulting to the current date on first load?
On the other hand, would it make sense to save the chart state on the client side, within the JavaScript code or in a cookie, and then manipulate the date before it's sent to the Rails controller?
I've been developing with Rails for about 6 months and feel comfortable with it, but have only just recently started developing with JavaScript and AJAX. My experience tying JS code together with Rails is somewhat limited at this point, so I'm looking for some advice/best practices on how to approach this.
Any advice is much appreciated.
I'm going to go through a couple of options, some good, some bad.
First, what you definitely don't want to do is maintain any notion of what month you are in in cookies or any other form of persistent server-side storage. Certainly sometimes server state is necessary, but it shouldn't be used when there are easy alternatives. Part of REST (which Rails is largely built around) is trying to represent data in pure attributes rather than letting its state be spread around like that.
From here, most solutions are probably acceptable, and opinion plays a greater role. One thing you could do is calculate a month from the +/- sign using the current month and send that to the server, which will return the information for the month requested.
I'm not a huge fan of this, though, as you have to write JavaScript that's capable of creating valid date ranges, and most of this functionality will probably be on the server already. Just passing a +/- and the current month to the server will work as well; you'll just have to do a bit of additional routing and logic to resolve the sign on the server to a different month.
While either of these would work, my preferred solution would instead have the initial request for a month also generate valid representations of the neighbouring months and return them to the client. Then, when you update the graph with the requested data, you also replace the forward/backward links on the graph with the ones provided by the server. This is a nice fusion of the benefits of the prior two solutions - no additional routing on the server, and no substantive addition to the client-side code. You also get the added benefit of being able to grey out transitions to months where no data was collected for the client (i.e. before they were a customer, and in the future). Without this, you'd have to create separate logic to handle client requests for information that doesn't exist, which is extra work for you and more confusion for the customer.
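To illustrate that preferred approach, the JSON for one month could carry its neighbours along with the data, and the client-side handler just follows whichever link it is given. The field names, route, and jQuery wiring below are purely illustrative:

// Example response shape for GET /chart_data?month=2015-06 (hypothetical):
// { "month": "2015-06", "data": [...], "prevMonth": "2015-05", "nextMonth": null }

function loadMonth(month) {
  $.getJSON('/chart_data', { month: month }, function (response) {
    drawChart(response.data);   // hypothetical function that redraws the Google Chart
    $('#prev-arrow').data('month', response.prevMonth)
                    .toggleClass('disabled', !response.prevMonth);
    $('#next-arrow').data('month', response.nextMonth)
                    .toggleClass('disabled', !response.nextMonth);
  });
}

$('#prev-arrow, #next-arrow').on('click', function () {
  var month = $(this).data('month');
  if (month) loadMonth(month);   // greyed-out arrows carry no month and do nothing
});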

Dynamic filtering, am I doing it wrong?

So I have an Umbraco site with a number of content-managed products in it. I need to search/filter this dataset on the front end based on 5 criteria.
I'd estimate I will have 300 products. I need to filter this data very fast and hide/show options that are no longer relevant based on the previous selections.
I'm currently building a web service and jQuery implementation using AJAX.
Is the best way to do this to load it into a JavaScript data structure and operate on it there, or will AJAX calls be fast enough? Obviously the former will mean duplicating the functionality on the server side for non-JavaScript users.
If you need to filter the data "very fast" then I imagine the best way is to preload all the data then manipulate it client side. If you're waiting for an Ajax response every time the user needs to filter the data then it's not going to be as fast as filtering it on the client (assuming they haven't got an ancient computer running IE6).
It would depend on the complexity of your filtering. If all you're doing is showing results where, for example, the product's price is greater than $10, then that will definitely be much faster. If you're going to be doing complex searches then it's possible that it could be faster to process server-side. The other question is how much data is saved for each product - preloading a few hundred products with a lot of data may take some time.
As always, the only way you'll truly be able to answer this question is by profiling the two solutions.
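If you do preload everything, the client-side filtering itself is straightforward; here is a sketch with invented field names:

// products is the preloaded array of ~300 items; filters holds the current selections.
function applyFilters(products, filters) {
  return products.filter(function (p) {
    return (!filters.category || p.category === filters.category) &&
           (!filters.maxPrice || p.price <= filters.maxPrice) &&
           (!filters.colour   || p.colour === filters.colour);
  });
}

// Re-run on every filter change and re-render the visible list (helpers are hypothetical).
$('.filter-control').on('change', function () {
  renderProducts(applyFilters(allProducts, readFiltersFromControls()));
});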

How to improve performance of Jquery autocomplete

I was planning to use jQuery autocomplete for a site and have implemented a test version. I'm now using an AJAX call to retrieve a new list of strings for every character input. The problem is that it gets rather slow: 1.5 s before the new list is populated. What is the best way to make autocomplete fast? I'm using CakePHP and just doing a find with a limit of 10 items.
This article - about how Flickr does autocomplete - is a very good read. I had a few "wow" experiences reading it.
"This widget downloads a list of all
of your contacts, in JavaScript, in
under 200ms (this is true even for
members with 10,000+ contacts). In
order to get this level of
performance, we had to completely
rethink how we send data from the
server to the client."
Try preloading your list object instead of doing the query on the fly.
Also, the autocomplete has a 300 ms delay by default. Perhaps remove the delay:
$( ".selector" ).autocomplete({ delay: 0 });
A 1.5-second response time is a very wide gap for an autocomplete service.
- First, optimize your query and DB connections. Try keeping your DB connection alive, with memory caching.
- Use result-caching methods if your service is heavily used, to avoid re-fetching.
- Use a client cache (a JS list) to keep old requests on the client - see the sketch after this list. If the user types and then erases, it comes in handy: results are served from the front-end cache instead of hitting the backend again.
- Regex filtering on the client side won't be costly; you may give it a chance.
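A small sketch of that client-side cache, plugged into the jQuery UI autocomplete source callback - the /search URL and parameter names are assumptions:

// Cache previous responses by term, so backspacing or retyping reuses old results
// instead of hitting the server again.
var termCache = {};

$('#search').autocomplete({
  delay: 0,
  minLength: 2,
  source: function (request, response) {
    var term = request.term;
    if (termCache[term]) {
      response(termCache[term]);   // served from the front-end cache
      return;
    }
    $.getJSON('/search', { q: term, limit: 10 }, function (data) {
      termCache[term] = data;
      response(data);
    });
  }
});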
Before doing any optimization you should first analyze where the bottleneck is. Try to find out how long each step (input → request → db query → response → display) takes. Maybe the CakePHP implementation has a delay so that it doesn't send a request for every character entered.
The server side in PHP/SQL is slow. Don't use PHP/SQL. My autocomplete is written in C++ and uses hash tables for lookups. See the performance here.
That is a Celeron-300 computer running FreeBSD and Apache/FastCGI.
And, as you can see, it runs quickly on huge dictionaries; 10,000,000 records isn't a problem.
It also supports priorities, dynamic translations, and other features.
I believe the real issue for speed in this case is the time it takes to run the query on the database. If there is no way to improve the speed of your query, then perhaps by extending your search to include more items (with some highly ranked results among them) you could perform one search every other character and filter through the 20-30 results on the client side.
This may improve the appearance of performance, but at 1.5 seconds, I would first try to improve the query speed.
Other than that, if you can give us some more information I may be able to give you a more specific answer.
Good luck!
Autocomplete itself is not slow, although your implementation certainly could be. The first thing I would check is the value of your delay option (see jQuery docs). Next, I would check your query: you might only be bringing back 10 records but are you doing a huge table scan to get those 10 records? Are you bringing back a ton of records from the database into a collection and then taking 10 items from the collection instead of doing server-side paging on the database? A simple index might help, but you are going to have to do some testing to be sure.
