Firebase offline cache & original firebase.js source code - javascript

My question is a follow-up to this topic. I love the simplicity and performance of Firebase from what I have seen so far.
As I understand it, firebase.js syncs data snapshots from the server into an object in JavaScript memory. However, there is currently no functionality to cache this data to disk.
As a result:
Applications are required to have a connection when they start up, thus there is no true offline access.
Bandwidth is wasted every time an app starts up by re-transmitting all previous data.
Since the snapshot data is sitting in memory as a JavaScript object, it should be quite trivial to serialize it as JSON and save it to localStorage, so the exact application state can be loaded the next time the app is started, online or not. But as the firebase.js code is minified and cryptic, I have no idea where to look.
PouchDB handles this very well on a CouchDB backend. (But it lacks the quick response time and simplicity of Firebase.)
So my questions are:
1. What data would I need to serialize to save a snapshot to localStorage? How can I then load this back into Firebase when the app starts?
2. Where can I download the original non-minified dev source code for firebase.js?
(By the way, two features that would help Firebase blow the competition out of the water: offline caching and map-reduce.)

Offline caching and map-reduce-like functionality are both in development. The firebase.js source is available here for dev and debugging.
You can serialize a snapshot locally using exportVal to preserve all priority data. If you aren't using priorities, a simple value will do:
var fb = new Firebase(URL);
fb.once('value', function(snapshot) {
  // exportVal() includes priority data; val() returns plain values only
  console.log('values with priorities', snapshot.exportVal());
  console.log('values without priorities', snapshot.val());
});
Later, if Firebase is offline (use .info/connected to help determine this) when your app is loaded, you can call .set() to put that data back into the local Firebase. When/if Firebase comes online, it will be synced.
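As a rough sketch, assuming the legacy Firebase API used above (the localStorage key name is my own choice, and the connectivity check here is deliberately simplistic):

var fb = new Firebase(URL);

// Cache the export format (values plus priorities) on every change.
fb.on('value', function(snapshot) {
  localStorage.setItem('fb-cache', JSON.stringify(snapshot.exportVal()));
});

// On startup, seed the local Firebase from the cache if we appear offline;
// set() accepts the export format, and the write syncs once we reconnect.
new Firebase(URL + '/.info/connected').once('value', function(snap) {
  var cached = localStorage.getItem('fb-cache');
  if (!snap.val() && cached) {
    fb.set(JSON.parse(cached));
  }
});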
However, this is truly only suitable for static data that only one person will access and change. Consider, for example, the fallout if I download the data, keep it locally for a week, and it's modified by several other users during that time, then I load my app offline, make one minor change, and then come online. My stale changes would blow away all the work done in between.
There are lots of ways to deal with this (conflict resolution, security rules with update counters/timestamps to detect stale data and prevent regressions), but this isn't a simple affair and needs deep consideration before you head down this route.
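To illustrate the rules idea only: a hedged sketch of a Realtime Database rule that rejects stale writes. The updatedAt field and docs path are my own choices, not a Firebase convention:

{
  "rules": {
    "docs": {
      "$doc": {
        // hypothetical: each doc carries a client-set updatedAt counter/timestamp
        ".write": "!data.exists() || newData.child('updatedAt').val() > data.child('updatedAt').val()"
      }
    }
  }
}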

Related

Concurrency, offline persistence and get()

I have a question about a theoretical situation and how the Firestore JS SDK handles it.
The setup is:
We have offline persistence enabled.
We're offline
We need to get() from collection A immediately after coming online.
I'm going to exaggerate the numbers to make the situation easier to grasp.
Steps
While offline, we add 1000000 documents to collection A.
We come back online, and the assumption is that Firestore starts syncing the local data from collection A to the server, which will take a while.
We do a get() from collection A while Firestore might not yet have finished syncing.
What happens? The assumption here is that, since Firestore has detected we're online again, it tries to get the documents of collection A that are found in the online DB, and thus might miss some of the documents that are still being synchronized in step 2.
Can a Firebase engineer clarify what would happen in this scenario?
A local client will always see its own changes. So even while you're offline, it will see the changes you've made locally in the collection. When you're back online, it will see the changes it's made locally too, regardless of whether those have been synchronized to the server yet.
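A minimal sketch of this behavior, assuming the v8-style namespaced Firestore web SDK:

// Enable offline persistence once, before any other Firestore call.
firebase.firestore().enablePersistence().then(function() {
  var colA = firebase.firestore().collection('A');

  // The result includes documents written offline, even before the server
  // has acknowledged them; metadata marks those as pending.
  colA.get().then(function(snapshot) {
    snapshot.forEach(function(doc) {
      console.log(doc.id, doc.metadata.hasPendingWrites);
    });
  });
});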

Offline IndexedDB vs Browser Eviction

I'm puzzled by the juxtaposition of pitching IndexedDB for offline single-page HTML apps (for example) with the fact that the documentation seems to indicate that the browser can trash your local data at any time: "In addition, be aware that browsers can wipe out the database, such as in the following conditions"...
It seems like the options are
a) only design read-only offline apps or
b) just accept that once in a while some users of your offline app are going to get unlucky and lose all their work when the browser gets in a mood to delete your IndexedDB data.
My question is: is there any serious discussion of this issue anywhere, or (better, but too much to hope for) a serious read/write offline app that deals with the issue? My searches on the topic have been fruitless. For example, this complete offline todo app example manages to never mention the problem -- who wants to store even simple todo data in a storage that the browser could wipe out at any moment and that can't trivially be backed up?
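For what it's worth, browsers now expose a way to ask that an origin's data be exempted from automatic eviction. A minimal sketch using the standard StorageManager API (support varies, and the browser may still deny the request):

// Ask the browser to treat this origin's storage (including IndexedDB)
// as persistent rather than "best effort" (evictable under pressure).
if (navigator.storage && navigator.storage.persist) {
  navigator.storage.persist().then(function(granted) {
    console.log(granted
      ? 'Storage will not be cleared without explicit user action.'
      : 'Storage may still be evicted under storage pressure.');
  });
}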

Improve client-server data sync functionality with deltas

The app
I have a web app that currently uses AppCache for offline functionality since users of the system need to create documents offline. The document is first created offline and, when internet access is available, the user can click "sync", which will send the document to the server and save it as a revision. To be more specific, the app does not save the change delta as a revision (the exact field modified) but rather the document in its entirety. In other words, a "snapshot" document is saved.
The problem
Users can login from different browsers and devices and work on their documents. When they click "sync", if the server's document is newer, the entire client's version will be overridden by the server's. This leads to one main issue that is depicted in the image below.
The scenario above occurs because of the current implementation which does not rely on deltas (small changes) and rather relies on snapshot revisions.
Some questions
1) My research indicates that I should be upgrading the "sync" mechanism to be expressed in deltas (small changes that can be applied independently). Is this a sound approach?
2) Should each delta be applied independently?
3) According to my research, revision deltas have a numeric value and not a timestamp. What should the value for this be, exactly? How would I ensure both the server and the client agree on what the revision number should be?
Stack information
Angular on the frontend
IndexedDB to save documents locally (offline mode)
Postgres DB with JSONB in the backend
What you're describing is a version control issue, like in this question. The choice of how to resolve it is yours. Here are a few examples of other products with this problem:
Google docs: A makes edit offline, B makes edit online, A goes online, Sync, Google Docs combines A and B's edits
Apple notes: Same as Google Docs
Git/Subversion: Throw an error, ask user to resolve conflicts
Wunderlist: Last edit overwrites previous
For your case, the simplest solution is to use Wunderlist's approach, but it seems that may cause a usability issue. What do your users expect to happen?
Answering your questions directly:
1) A custom sync implementation is necessary if you don't want overwrites.
2) This is a usability decision: what does the user expect?
3) True, revisions are numeric (e.g. r1, r2). To get server agreement, alter the return value of the last sync request. You can return the entire model to the client each time (or just a 200 OK if a normal sync happened). If a model is returned to the client, update the client with the latest model, as sketched below.
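A hedged sketch of that exchange (the endpoint shape and field names are illustrative, not from the post):

// The client sends its delta along with the last revision it knows about.
function sync(docId, delta, lastRev) {
  return fetch('/api/docs/' + docId + '/sync', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ baseRev: lastRev, delta: delta })
  }).then(function(res) {
    if (res.status === 200) {
      // Normal sync: the server applied the delta and bumped the revision,
      // e.g. { rev: 42 }.
      return res.json();
    }
    if (res.status === 409) {
      // Stale baseRev: the server returns the entire latest model instead,
      // e.g. { rev: 43, model: {...} }; replace local state and re-apply.
      return res.json();
    }
    throw new Error('sync failed: ' + res.status);
  });
}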
In any case, the server should always be the source of truth. This post provides some good advice on server/mobile referential integrity:
To track inserts you need a Created timestamp ... To track updates you need to track a LastUpdate timestamp on your rows ... To track deletes you need a tombstone table.
Note that when you do a sync, you need to check the time offset between the server and the mobile device, and you need to have a method for resolving conflicts. Inserts are no big deal (they shouldn't conflict), but updates could conflict, and a delete could conflict with an update.
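A small sketch of a sync payload covering all three cases (every field name here is illustrative):

// The server's clock value from the previous sync, persisted locally so we
// don't depend on the (possibly skewed) device clock.
var lastSyncTime = localStorage.getItem('lastSyncTime');

// Inserts carry a created timestamp, updates a last-update timestamp, and
// deletes travel as tombstones (id plus deletion time) rather than vanishing.
var syncPayload = {
  since: lastSyncTime,
  inserts: [{ id: 'doc-17', createdAt: '2016-01-05T10:00:00Z', data: {} }],
  updates: [{ id: 'doc-04', updatedAt: '2016-01-05T10:02:00Z', delta: {} }],
  tombstones: [{ id: 'doc-09', deletedAt: '2016-01-05T10:03:00Z' }]
};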

What is a good way to store big JSON objects in CouchDB?

I work on a web app which stores project data. Data is saved in a CouchDB database A. The app pulls and pushes data with a local PouchDB database B, which is kept in sync with A.
So the app can also work offline. When the user gets a connection back, changes made to local DB B while offline are sent to A using classic replication.
I store one document per project in CouchDB; it is a big JSON object with lots of data (project todos, collaborators, advancements, risks, problems, etc.).
It is working like a charm, but I have some problems, and it seems I am using PouchDB in the wrong way. Example situation:
User A is offline and he adds a todo on project 1.
User B is online and he adds a new collaborator on project 1.
User B's changes are pushed to CouchDB by the automatic sync.
Project 1's _rev has been incremented.
User B pulls his own changes from CouchDB, because the app downloads all documents whenever any CouchDB change is detected. Weird... I don't know how to prevent that. But the app still works fine, so it's not a big problem.
User A gets his connection back.
User A's changes are ignored because of the older _rev. But the user modified a different project property; can CouchDB detect that itself and merge with the newer _rev?
I clearly see that my problem is I'm using one document per project. I could use thousands of documents to store each property of each project, and my problem wouldn't happen, but it seems quite weird: to retrieve all the data of a project I would have to scan my whole database, check each document's type (collaborator, todo, ...), and check whether the document is linked to the project via a new _projectId property on every document.
Currently I just have to request one document, which contains all project data, then I manipulate my JSON easily. It's quite convenient to handle.
How should I manage this? A project may contain on average 10 to 10,000 properties that multiple users can edit while online or offline.
But the user modified a different project property; can CouchDB detect that itself and merge with the newer _rev?
PouchDB/CouchDB conflict handling is described in the PouchDB guide: http://pouchdb.com/guides/conflicts.html
the app downloads all documents whenever any CouchDB change is detected. Weird... I don't know how to prevent that.
This is standard PouchDB/CouchDB behavior - you asked it to sync the whole database, so it synced the whole database. :) You can prevent it by using filtered-replication: http://pouchdb.com/api.html#filtered-replication.
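A hedged sketch of what that could look like (the design-document and filter names are illustrative; the filter function itself must exist on the server):

var localDB = new PouchDB('projects');
var remoteDB = new PouchDB('https://example.com/db/projects');

// Replicate only one project's documents instead of the whole database.
localDB.replicate.from(remoteDB, {
  filter: 'app/by_project', // a filter function stored in the _design/app document
  query_params: { projectId: 'project-1' }
});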
How should I manage this? A project may contain on average 10 to 10,000 properties that multiple users can edit while online or offline.
It really really depends on your data, how frequently it may change, what the unique identifier of a single "property" is... Storing 10,000 separate documents in PouchDB/CouchDB is not a crazy idea, though, and may help you out when it comes to conflicts, since only those individual documents can ever be in conflict.
In general, I'd recommend you read the guide to conflict resolution as described above and review your options. There's also a plugin that may help you with conflict resolution: https://github.com/jo/pouch-resolve-conflicts
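For completeness, a minimal sketch of the fetch-modify-retry pattern the conflicts guide recommends (the helper name is my own):

// Apply a change to a document, retrying if another writer bumped _rev first.
function applyChange(db, docId, mutate) {
  return db.get(docId).then(function(doc) {
    mutate(doc); // modify the freshly fetched revision in place
    return db.put(doc);
  }).catch(function(err) {
    if (err.status === 409) {
      // Conflict: someone saved a newer _rev; fetch the latest and retry.
      return applyChange(db, docId, mutate);
    }
    throw err;
  });
}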

Which (if any) Javascript storage API (Google Drive, Dropbox, OneDrive) provides automatic syncing?

I have an application that was developed using HTML and JavaScript. What I need now is to make use of a cloud storage system to access a user's files, which could be Google Drive, OneDrive or Dropbox.
One of the requirements is that the application should sync so that new files are added automatically and deleted files removed etc. The sync should be automatic, and there should be no need to poll for changes in the code "manually".
I have determined (as far as I can tell) that with the Dropbox JavaScript API you have to poll for changes and then pull the changes. It seems that with the Google Drive JavaScript API you also need to watch for changes and then get those changes. I was leaning towards using OneDrive, but my big problem with that API is that you can (well, so it seems) only access files through a file picker, and I need to get the files without involving the user.
Can anyone confirm the above?
If not, if you need to poll for changes, which would be the best API to use?
And just if anyone has an idea, how often should this be done, and where in the code? Is there some sort of guideline for this?
You can get properties for files and folders without the need for the file picker.
File and folder properties (Windows Runtime apps using JavaScript and HTML)
The user will need to authenticate with the service as well as grant consent for your application to access their data. Other than that, there would be no user interaction required.
You can also use the REST APIs directly once authenticated and granted access. The REST APIs are documented here.
Using the REST API
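As a hedged illustration, listing the root folder's children over REST might look like this (the endpoint shown is the current Microsoft Graph one, and accessToken is assumed to come from your auth flow):

// List the files/folders in the user's OneDrive root via the REST API.
fetch('https://graph.microsoft.com/v1.0/me/drive/root/children', {
  headers: { Authorization: 'Bearer ' + accessToken }
})
  .then(function(res) { return res.json(); })
  .then(function(body) {
    body.value.forEach(function(item) {
      console.log(item.name, item.lastModifiedDateTime);
    });
  });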
As for the polling interval, I might consider using an "observer" design pattern. Your cloud storage system component would register with the "provider" (the parent HTML application) for notifications. You could have the "sync" logic execute when a predefined operation occurs, such as login. You could persist the modified date/time of your application's root data folder, then only look for changes in the event of a mismatch.
Polling at a given frequency will only ensure that the data is in sync at that specific time. The user's sync state may or may not be valid when they access your application, regardless of the frequency of the polling method.
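A minimal sketch of that observer arrangement, with every name hypothetical:

// Sync runs on app events instead of a fixed timer.
var syncObservers = [];

function onSyncNeeded(fn) { syncObservers.push(fn); }
function notifySyncNeeded(reason) {
  syncObservers.forEach(function(fn) { fn(reason); });
}

// The storage component registers once...
onSyncNeeded(function(reason) {
  console.log('syncing because:', reason);
  // compare the stored root-folder modified time with the provider's,
  // and pull changes only on a mismatch
});

// ...and the app raises events instead of polling on an interval.
notifySyncNeeded('login');
window.addEventListener('online', function() { notifySyncNeeded('reconnect'); });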
Regarding the Dropbox API at least, this is correct. Using the Dropbox JavaScript SDK, you need to poll for changes and then pull those changes into your app's local state.
