Improve client-server data sync functionality with deltas - javascript

The app
I have a web app that currently uses AppCache for offline functionality, since users of the system need to create documents offline. The document is first created offline, and when internet access is available, the user can click "sync", which will send the document to the server and save it as a revision. To be more specific, the app does not save the change delta (the exact fields modified) as a revision, but rather the whole document in its entirety. In other words, a "snapshot" of the document is saved.
The problem
Users can log in from different browsers and devices and work on their documents. When they click "sync", if the server's document is newer, the client's entire version will be overwritten by the server's. This leads to one main issue, depicted in the image below.
The scenario above occurs because the current implementation does not rely on deltas (small changes) but rather on snapshot revisions.
Some questions
1) My research indicates that I should upgrade the "sync" mechanism to be expressed in deltas (small changes that can be applied independently). Is this a sound approach?
2) Should each delta be applied independently?
3) According to my research, revision deltas have a numeric value and not a timestamp. What exactly should this value be? How would I ensure that both the server and the client agree on what the revision number should be?
Stack information
Angular on the frontend
IndexedDB to save documents locally (offline mode)
Postgres DB with JSONB in the backend

What you're describing is a version control issue, like in this question. How to resolve it is your choice. Here are a few examples of other products with this problem:
Google docs: A makes edit offline, B makes edit online, A goes online, Sync, Google Docs combines A and B's edits
Apple notes: Same as Google Docs
Git/Subversion: Throw an error, ask user to resolve conflicts
Wunderlist: Last edit overwrites previous
For your case, the simplest solution is to use Wunderlist's approach, but it seems that may cause a usability issue. What do your users expect to happen?
Answering your questions directly:
1) A custom sync implementation is necessary if you don't want overwrites.
2) This is a usability decision: what does the user expect?
3) True, revisions are numeric (e.g. r1, r2). To get server agreement, alter the return value of the last sync request. You can return the entire model to the client each time (or just a 200 OK if a normal sync happened). If a model is returned to the client, update the client with the latest model (see the sketch below).
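As a rough sketch of that handshake (Express-style; `applyDelta`, the `db` helpers, the route shape, and the 409 response are illustrative assumptions, not part of the original advice):

```js
const express = require('express');
const app = express();
app.use(express.json());

// Hypothetical sync endpoint: the client sends the revision its delta is
// based on; the server only applies the delta if that revision is current.
app.post('/documents/:id/sync', async (req, res) => {
  const { baseRevision, delta } = req.body;
  const doc = await db.getDocument(req.params.id);   // your storage layer

  if (baseRevision === doc.revision) {
    const updated = applyDelta(doc, delta);          // your delta-apply logic
    updated.revision = doc.revision + 1;             // server assigns the next number
    await db.saveDocument(updated);
    return res.sendStatus(200);                      // normal sync: just OK
  }

  // Client was behind: return the full latest model so it can catch up,
  // re-base its delta, and retry.
  res.status(409).json(doc);
});
```

Because the server alone increments the revision number, both sides agree on it by construction: the client only ever uses a number the server handed back.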
In any case, the server should always be the source of truth. This post provides some good advice on server/mobile referential integrity:
To track inserts you need a Created timestamp ... To track updates you need to track a LastUpdate timestamp on your rows ... To track deletes you need a tombstone table.
Note that when you do a sync, you need to check the time offset between the server and the mobile device, and you need to have a method for resolving conflicts. Inserts are no big deal (they shouldn't conflict), but updates could conflict, and a delete could conflict with an update.
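A minimal sketch of those merge rules in JavaScript (field names and the "deletes win" policy are assumptions; per the note above, client timestamps should already be corrected for the measured client/server clock offset before comparing):

```js
// Decide which version of a row survives a sync, per the rules quoted above.
function resolveRow(serverRow, clientRow, tombstoneIds) {
  if (tombstoneIds.has(clientRow.id)) {
    return null;                   // a delete conflicts with this update: delete wins (a policy choice)
  }
  if (!serverRow) {
    return clientRow;              // insert: no conflict possible
  }
  return clientRow.lastUpdate > serverRow.lastUpdate
    ? clientRow                    // client edit is newer
    : serverRow;                   // server edit is newer (or a tie)
}
```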

Related

Caching Contentful content in Node environment

I am a young developer, and I work on the development of a site whose content is stored on Contentful. Currently, on each page reload, the JavaScript retrieves the content from Contentful via the API.
The content of the site is not likely to change often, so I would like to cache it.
The site is hosted on Netlify.
So I thought I could fetch the content from Contentful during the Node build and store it in a "cache" that the JavaScript could use when loading the page. Then, when content is modified on Contentful, a webhook would trigger a rebuild on Netlify.
I do not know if my thinking is the right one; thank you for your help and your answers.
Contentful actually has caching built into its service so you shouldn't need to do anything to get the benefits of caching on your website. Quoting from the Contentful Docs:
There are no limits enforced on requests that hit our CDN cache, i.e. the request doesn't count towards your rate limit and you can make an unlimited amount of cache hits. For requests that do hit the Contentful Delivery API, rate limits of 78 requests per second and 280800 requests per hour are enforced by default. Higher rate limits may apply depending on your current plan.
See https://www.contentful.com/developers/docs/references/content-delivery-api/#/introduction/api-rate-limits for full details
If you want to do additional caching on top of the Contentful API, you could utilize a Node library that'll do it for you. Something like apicache would work pretty well in this use case.
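For example, as Express middleware (the route, TTL, and Contentful credentials below are placeholders):

```js
const express = require('express');
const apicache = require('apicache');
const contentful = require('contentful');

const app = express();
const cache = apicache.middleware;
const client = contentful.createClient({
  space: 'YOUR_SPACE_ID',     // placeholder
  accessToken: 'YOUR_TOKEN'   // placeholder
});

// Cache the Contentful response in memory for an hour, so repeat page
// loads don't hit the API at all.
app.get('/api/content', cache('1 hour'), async (req, res) => {
  const entries = await client.getEntries();
  res.json(entries.items);
});

app.listen(3000);
```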
If rebuilding the site when new content is published, rather than rendering it on page view, is important to you, I'd encourage you to take a look at static sites. Contentful has some great webhook support that you can use together with Netlify to rebuild your site any time an author pushes new content. Check out this tutorial about using Gatsby for more details - https://www.contentful.com/blog/2018/02/28/contentful-gatsby-video-tutorials/
It seems to be better to cache the pages separately (instead of caching the whole site) and use a cron job to compare the cache of each page (maybe weekly) against the current version. If it is different, regenerate the cache for that page. Also, you might want to trigger that manually, possibly on deploys or in the rare event that there is a change on a given page.
Anyway, before you start doing all this caching work, you should check whether your site is anywhere near being overwhelmed by requests. If not, caching can be postponed, which would be wise: if your site's nature changes over time and changes start occurring often, you might need a different cache, or even no cache at all.

Updating a single page application built with AngularJS

I am creating a complex social networking website that is all one single page that never refreshes unless a user presses the refresh button on the browser.
The issue here is that when I edit files and upload them to the server they don't take effect unless the user refreshes the browser.
How would I go about fixing this problem? Should I refresh the browser on a timed interval? Or should I poll the server every 10 minutes to check whether the browser should refresh?
Any suggestions?
Server
I would communicate the version number through whatever means you're already using for data transfer. Presumably that's some kind of API, but it may be sockets or whatever else.
Whatever the case, I would recommend that with each response - a tidy way is in the header, as suggested in comments by Kevin B - you transmit the current application version.
Client
It is then up to the client to handle changes to the version number supplied. It will know from initial load and more recent requests what the version number has been up until this point. You might want to consider different behaviour depending on what the change in version is.
For example, if it is a patch number change, you might want to present to the user the option of reloading, like Outlook.com does. A feature change might do the same with a different message advertising the fact that new functionality is available, and a major version change may just disable the site and tell the user to reload to regain access.
You'll notice that I've skated around automatic reloading. This is definitely not a technical issue so much as a UX one. Having a SPA reload with no warning (which may well result in data loss) is not the best and I'd advise against it, especially for patch version changes.
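As an illustrative sketch for AngularJS (the `X-App-Version` header name and the confirm-based prompt are assumptions, chosen to match the no-silent-reload advice above):

```js
var app = angular.module('myApp', []);

// Track the app version reported by the server on every response and
// prompt the user (rather than force-reloading) when it changes.
app.factory('versionInterceptor', ['$window', function ($window) {
  var knownVersion = null;
  return {
    response: function (response) {
      var serverVersion = response.headers('X-App-Version');
      if (serverVersion) {
        if (knownVersion && knownVersion !== serverVersion) {
          if ($window.confirm('A new version is available. Reload now?')) {
            $window.location.reload();
          }
        }
        knownVersion = serverVersion;
      }
      return response;
    }
  };
}]);

app.config(['$httpProvider', function ($httpProvider) {
  $httpProvider.interceptors.push('versionInterceptor');
}]);
```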
Edit
Of course, if you're not using any kind of API or other means of dynamically communicating data with the server, you will have to resort to polling an endpoint that will give you a version, and then handle it on the client in the same way. Polling isn't super tidy, but it's certainly better - in my strong opinion - than reloading on a timer on the off chance that the application has updated in the interim.
Are you talking about changing the client-side code of the app, or the content? You can have the client call the server for updated content using AJAX requests; one possibility would be to do so whenever the user changes states in the app or opens a page that loads a particular controller. If you are talking about changing the HTML or JavaScript, I believe the user would need to reload to get those updates.

What is a good way to store big JSON objects in CouchDB?

I work on a web app which stores projects data. Data are saved in a CouchDB database A. The app pulls and pushes data with a local PouchDB database B, which is synced with A.
So the app can also work offline. When the user gets a connection back, changes made on the local DB B during the offline time are sent to A using a classic replication.
I store one document per project in CouchDB; it is a big JSON object with lots of data (project todos, collaborators, advancements, risks, problems, etc...).
It is working like a charm, but I have some problems, and it seems I'm using PouchDB the wrong way. Example situation:
User A is offline and he adds a todo on project 1.
User B is online and he adds a new collaborator on project 1.
User B's changes are pushed to CouchDB by the automatic sync.
Project 1's _rev has been incremented.
User B pulls his own changes from CouchDB, because the app downloads all documents whenever any CouchDB change is detected. Weird... I don't know how to prevent that, but the app still works fine, so it's not a big problem.
User A gets his connection back.
User A's changes are ignored because of the older _rev. But the user made a modification to a different project property; can CouchDB detect that by itself and merge it with the newer _rev?
I clearly see that my problem is that I'm using one document per project. I could use thousands of documents to store each property of each project, and my problem wouldn't happen, but that seems quite weird: to retrieve all the data of a project, I would have to fully scan my database, check each document's type (collaborator, todo, ...?), and check whether the document is linked to the project by adding a new _projectId property to every document.
Currently I just have to request one document, which contains all the project data, and then I can manipulate my JSON easily. It's quite convenient to handle.
How should I manage this? A project may contain on average 10 to 10,000 properties that multiple users can edit while online or offline.
But the user made a modification to a different project property; can CouchDB detect that by itself and merge it with the newer _rev?
PouchDB/CouchDB conflict handling is described in the PouchDB guide: http://pouchdb.com/guides/conflicts.html
the app downloads all documents whenever any CouchDB change is detected. Weird... I don't know how to prevent that.
This is standard PouchDB/CouchDB behavior - you asked it to sync the whole database, so it synced the whole database. :) You can prevent it by using filtered-replication: http://pouchdb.com/api.html#filtered-replication.
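For example (the design-document filter name, URL, and project id below are placeholders):

```js
var localDb = new PouchDB('projects');
var remoteDb = new PouchDB('https://example.com/db/projects');

// Only pull documents that the 'app/by_project' filter function (defined
// in a design document on the server) lets through for this project.
localDb.replicate.from(remoteDb, {
  filter: 'app/by_project',
  query_params: { projectId: 'project-1' }
});
```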
How should I manage this? A project may contain on average 10 to 10,000 properties that multiple users can edit while online or offline.
It really, really depends on your data: how frequently it may change, what the unique identifier of a single "property" is... Storing 10,000 separate documents in PouchDB/CouchDB is not a crazy idea, though, and may help you out when it comes to conflicts, since only those individual documents can ever be in conflict.
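A sketch of what per-property documents could look like (the `_id` scheme and fields are assumptions): encoding the project id into the `_id` means one project's documents form a contiguous key range, so no full scan is needed.

```js
var localDb = new PouchDB('projects');

// One small document per item; the _id encodes project, type, and item id.
localDb.put({
  _id: 'project-1:todo:0042',
  type: 'todo',
  projectId: 'project-1',
  text: 'Ship the release',
  done: false
});

// Fetch everything belonging to project-1 with a single range query.
localDb.allDocs({
  startkey: 'project-1:',
  endkey: 'project-1:\ufff0',
  include_docs: true
});
```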
In general, I'd recommend you read the guide to conflict resolution as described above and review your options. There's also a plugin that may help you with conflict resolution: https://github.com/jo/pouch-resolve-conflicts

Meteor reactive publish based on client session variable

I'm making a Meteor JS web app that presents the client with an HTML range slider tied to a session variable.
I want the server to publish only data with values less than the current value of the slider, with the data sorted from newest to oldest. I have a lot of database entries (2000+). If I publish everything within the slider's maximum, my browser is too slow. If I limit the publish to 100 entries or so, I miss out on a lot of data with small values (which happen to be older) when I bring the slider down.
What are the best practices for staying scalable (not sending too much data to the client)? Is a reactive publish function the key (using onchange with the slider value)? That sounds like a lot of server round trips. Help!
Would pagination be acceptable from a UX standpoint? If so, there are packages that may help, for instance alethes:pages.
Otherwise, Adam is on the right track by suggesting to use Tracker.autorun (Tracker has replaced Deps).
As with any other publication, make sure your publish function only returns the fields that you need on the client, in order to minimize the data transferred and the memory consumption.
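Putting that together, a minimal sketch (the collection, field names, session key, and limit are assumptions):

```js
// server: publish only entries below the slider value, newest first,
// capped and trimmed to the fields the client actually renders.
Meteor.publish('entriesBelow', function (maxValue) {
  check(maxValue, Number);
  return Entries.find(
    { value: { $lt: maxValue } },
    { sort: { createdAt: -1 }, limit: 100, fields: { value: 1, createdAt: 1 } }
  );
});

// client: re-subscribe reactively whenever the slider's session variable
// changes; Meteor stops the old subscription for you.
Tracker.autorun(function () {
  Meteor.subscribe('entriesBelow', Session.get('sliderValue'));
});
```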

How do I give users immediate feedback in a CQRS web application

I have a CQRS application with eventual consistency between the event store and the read model. In it, I have a list of items and, under the list, a "Create new" button. When a user successfully creates a new item, he is directed back to the list, but since the read model has not been updated yet (eventual consistency), the item is missing from the list.
I want to fake the entry in the list until the read model has been updated.
How do I best do that and how do I remove it when the new item is present in the actual list? I expect delays of about 60 seconds for the read model to catch up.
I do realize that there are simpler ways to achieve this behavior without CQRS but the rest of the application really benefits from CQRS.
If it matters, the application is a C# MVC4 application. I've been thinking of solutions involving HTML5 Web Storage, but want to know what the best practice is for solving this kind of problem.
In this situation, you can present the result in the UI with total confidence. There is no difference between presenting this information directly and reading it from the read model.
Your domain objects are up to date with the UI, and that's what really matters here. Moreover, if you correctly validate your aggregate root (AR) state in every operation, and you keep track of concurrency with the AR's version, then you're safe and your model will be protected against invalid operations.
In the end, what is the probability of your UI going out of sync? This can happen if there are many users modifying the information you're displaying at the same time. It can be avoided by creating a task-based UI and following the rule of 'one command/operation on the AR per request'.
The read model can be unsynced until the denormalizers do their job.
On the other hand, if the command will generate a conversation (a long-running operation) between a saga and ARs, then you cannot do this, and you must warn the user about it.
It doesn't matter that it's an ASP.NET MVC app. The only solution I see, besides just telling the user to wait a bit, is to have another, but this time synchronous, event handler that generates the same model (of course, the actual model generation should be encapsulated in a service) and sends it to a memory cache.
Having everything in memory makes it very fast, and being synchronous means it's automatically executed before the request ends. I'm assuming the command is executed synchronously too.
Then, in your query repository, you also consider results from the cache, removing an entry once that result is already returned by the db.
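A Node-flavoured sketch of that idea (the original answer assumes .NET; `buildReadModelEntry` and `queryItemList` are hypothetical names):

```js
// Synchronous "event handler": build the same read-model entry the
// denormalizer will eventually write, and park it in a memory cache.
const pendingItems = new Map();            // itemId -> read-model entry

function onItemCreated(event) {
  pendingItems.set(event.itemId, buildReadModelEntry(event)); // shared service
}

// Query repository: merge cached entries with the real read model and
// evict anything the database has caught up on.
async function listItems(db) {
  const fromDb = await db.queryItemList(); // the eventually consistent read model
  for (const row of fromDb) {
    pendingItems.delete(row.itemId);       // denormalizer caught up: drop the copy
  }
  return fromDb.concat([...pendingItems.values()]);
}
```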
Personally, for things that I know I want to be available to the user, and where the read model generation is trivial, I would use only synchronous event handlers. The user doesn't mind waiting a few seconds when submitting something, and if updating a read model takes a few seconds, you know you have a backend problem.
As I see it, eventual consistency is applicable to an application only if the application environment has multiple front-end servers hosting the application, each with its own copy of the read model. All servers use the same copy of the event store.
When something is written to the event store, the read model used to show the result to the user must be updated in sync with the event store. The rest of the servers, and the read models they manage, can be updated with eventual consistency.
This way, the result shown to the user (the list of items) can be read from the local read model copy, because it has already been updated in sync. There is no need for special, complex fake updates/rollbacks.
The only case where the user can see an incomplete list is when the user hits F5 to refresh the list right after the change, and load balancing directs the request to a front-end server whose read model is not yet updated (the 60-second delay). But this can be avoided by making the load balancer sticky, so it does not switch the user's server in the middle of a session.
So, if the application has only one front-end server, eventual consistency is not very useful, or it does not give any benefits, without some special fake updates/rollbacks in the read model...
