Firestore slow performance issue on getting data


I'm seeing slow performance with Firestore when retrieving basic data stored in a document: reads take roughly ten times as long as with the Realtime Database.
Using Firestore, the first call takes an average of 3000 ms:
this.db.collection('testCol')
.doc('testDoc')
.valueChanges().forEach((data) => {
console.log(data);//3000 ms later
});
Using the Realtime Database, the first call takes an average of 300 ms:
this.db.database.ref('/test').once('value').then(data => {
console.log(data); //300ms later
});
This is a screenshot of the network console:
I'm running the JavaScript SDK v4.50 with AngularFire2 v5.0 rc.2.
Has anyone else experienced this issue?

UPDATE: 12th Feb 2018 - iOS Firestore SDK v0.10.0
Similar to some other commenters, I've also noticed a slower response on the first get request (with subsequent requests taking ~100ms). For me it's not as bad as 30s, but maybe around 2-3s when I have good connectivity, which is enough to provide a bad user experience when my app starts up.
Firebase have advised that they're aware of this "cold start" issue and they're working on a long term fix for it - no ETA unfortunately. I think it's a separate issue that when I have poor connectivity, it can take ages (over 30s) before get requests decide to read from cache.
Whilst Firebase fix all these issues, I've started using the new disableNetwork() and enableNetwork() methods (available in Firestore v0.10.0) to manually control the online/offline state of Firebase. Though I've had to be very careful where I use it in my code, as there's a Firestore bug that can cause a crash under certain scenarios.
UPDATE: 15th Nov 2017 - iOS Firestore SDK v0.9.2
It seems the slow performance issue has now been fixed. I've re-run the tests described below and the time it takes for Firestore to return the 100 documents now seems to be consistently around 100ms.
Not sure if this was a fix in the latest SDK v0.9.2 or if it was a backend fix (or both), but I suggest everyone updates their Firebase pods. My app is noticeably more responsive - similar to the way it was on the Realtime DB.
I've also discovered Firestore to be much slower than Realtime DB, especially when reading from lots of documents.
Updated tests (with latest iOS Firestore SDK v0.9.0):
I set up a test project in iOS Swift using both RTDB and Firestore and ran 100 sequential read operations on each. For the RTDB, I tested the observeSingleEvent and observe methods on each of the 100 top level nodes. For Firestore, I used the getDocument and addSnapshotListener methods at each of the 100 documents in the TestCol collection. I ran the tests with disk persistence on and off. Please refer to the attached image, which shows the data structure for each database.
I ran the test 10 times for each database on the same device and a stable wifi network. Existing observers and listeners were destroyed before each new run.
Realtime DB observeSingleEvent method:
func rtdbObserveSingle() {
let start = UInt64(floor(Date().timeIntervalSince1970 * 1000))
print("Started reading from RTDB at: \(start)")
for i in 1...100 {
Database.database().reference().child(String(i)).observeSingleEvent(of: .value) { snapshot in
let time = UInt64(floor(Date().timeIntervalSince1970 * 1000))
let data = snapshot.value as? [String: String] ?? [:]
print("Data: \(data). Returned at: \(time)")
}
}
}
Realtime DB observe method:
func rtdbObserve() {
let start = UInt64(floor(Date().timeIntervalSince1970 * 1000))
print("Started reading from RTDB at: \(start)")
for i in 1...100 {
Database.database().reference().child(String(i)).observe(.value) { snapshot in
let time = UInt64(floor(Date().timeIntervalSince1970 * 1000))
let data = snapshot.value as? [String: String] ?? [:]
print("Data: \(data). Returned at: \(time)")
}
}
}
Firestore getDocument method:
func fsGetDocument() {
let start = UInt64(floor(Date().timeIntervalSince1970 * 1000))
print("Started reading from FS at: \(start)")
for i in 1...100 {
Firestore.firestore().collection("TestCol").document(String(i)).getDocument() { document, error in
let time = UInt64(floor(Date().timeIntervalSince1970 * 1000))
guard let document = document, document.exists && error == nil else {
print("Error: \(error?.localizedDescription ?? "nil"). Returned at: \(time)")
return
}
let data = document.data() as? [String: String] ?? [:]
print("Data: \(data). Returned at: \(time)")
}
}
}
Firestore addSnapshotListener method:
func fsAddSnapshotListener() {
let start = UInt64(floor(Date().timeIntervalSince1970 * 1000))
print("Started reading from FS at: \(start)")
for i in 1...100 {
Firestore.firestore().collection("TestCol").document(String(i)).addSnapshotListener() { document, error in
let time = UInt64(floor(Date().timeIntervalSince1970 * 1000))
guard let document = document, document.exists && error == nil else {
print("Error: \(error?.localizedDescription ?? "nil"). Returned at: \(time)")
return
}
let data = document.data() as? [String: String] ?? [:]
print("Data: \(data). Returned at: \(time)")
}
}
}
Each method essentially prints the unix timestamp in milliseconds when the method starts executing and then prints another unix timestamp when each read operation returns. I took the difference between the initial timestamp and the last timestamp to return.
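For readers who want to reproduce the measurement, the arithmetic described above can be sketched like this. It is only an illustration in JavaScript: the helper name `elapsedToLastReturn` and the sample timestamps are made up, and the original tests were written in Swift.

```javascript
// Hypothetical helper: total elapsed time for a batch of reads, measured
// from the shared start timestamp to the last read that returned.
function elapsedToLastReturn(startMs, returnTimesMs) {
  if (returnTimesMs.length === 0) {
    return null; // nothing returned, so there is nothing to measure
  }
  return Math.max(...returnTimesMs) - startMs;
}

// Made-up timestamps in milliseconds, purely for illustration.
const example = elapsedToLastReturn(1000, [1040, 1100, 1075]);
console.log(example); // 100
```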
[Images not included: results with disk persistence disabled, results with disk persistence enabled, and the data structure of each database.]
When the Firestore getDocument / addSnapshotListener methods get stuck, it seems to get stuck for durations that are roughly multiples of 30 seconds. Perhaps this could help the Firebase team isolate where in the SDK it's getting stuck?

UPDATE: 2nd Mar 2018
It looks like this is a known issue and the engineers at Firestore are working on a fix. After a few email exchanges and code sharing with a Firestore engineer on this issue, this was his response as of today.
"You are actually correct. Upon further checking, this slowness on getDocuments() API is a known behavior in Cloud Firestore beta. Our engineers are aware of this performance issue tagged as "cold starts", but don't worry as we are doing our best to improve Firestore query performance.
We are already working on a long-term fix but I can't share any timelines or specifics at the moment. While Firestore is still on beta, expect that there will be more improvements to come."
So hopefully this will get knocked out soon.
Using Swift / iOS
After dealing with this for about 3 days, it seems the issue is definitely get(), i.e. .getDocuments and .getDocument. Things I thought were causing the extreme yet intermittent delays, but which turned out not to be the cause:
Not so great network connectivity
Repeated calls via looping over .getDocument()
Chaining get() calls
Firestore Cold starting
Fetching multiple documents (Fetching 1 small doc caused 20sec delays)
Caching (I disabled offline persistence but this did nothing.)
I was able to rule all of these out as I noticed this issue didn't happen with every Firestore database call I was making. Only retrievals using get(). For kicks I replaced .getDocument with .addSnapshotListener to retrieve my data and voila. Instant retrieval each time including the first call. No cold starts. So far no issues with the .addSnapshotListener, only getDocument(s).
For now, I'm simply dropping the .getDocument() where time is of the essence and replacing it with .addSnapshotListener then using
for document in querySnapshot!.documents{
// do some magical unicorn stuff here with my document.data()
}
... in order to keep moving until this gets worked out by Firestore.

Almost 3 years later, with Firestore well out of beta, I can confirm that this horrible problem still persists ;-(
On our mobile app we use the JavaScript / Node.js Firebase client. After a lot of testing to find out why our app's startup time is around 10 sec, we identified what accounts for 70% of that time: Firebase's and Firestore's performance and cold start issues:
firebase.auth().onAuthStateChanged() fires approx. after 1.5 - 2sec, already quite bad.
If it returns a user, we use its ID to get the user document from firestore. This is the first call to firestore and the corresponding get() takes 4 - 5sec. Subsequent get() of the same or other documents take approx. 500ms.
So in total the user initialization takes 6 - 7 sec, which is completely unacceptable. And we can't do anything about it. We can't test disabling persistence, since the JavaScript client has no such option: persistence is disabled by default, so not calling enablePersistence() won't change anything.

I had this issue until this morning. My Firestore query via iOS/Swift would take around 20 seconds to complete a simple, fully indexed query, and the query time was not proportional to the number of items returned, whether that was 1 item or as many as 3,000.
My solution was to disable offline data persistence. In my case, it didn't suit the needs of our Firestore database - which has large portions of its data updated every day.
iOS & Android users have this option enabled by default, whilst web users have it disabled by default. It makes Firestore seem insanely slow if you're querying a huge collection of documents. Basically it caches a copy of whichever data you're querying (and whichever collection you're querying - I believe it caches all documents within) which can lead to high Memory usage.
In my case, it caused a huge wait for every query until the device had cached the data required - hence the non-proportional query times for the increasing numbers of items to return from the exact same collection. This is because it took the same amount of time to cache the collection in each query.
Offline Data - from the Cloud Firestore Docs
I performed some benchmarking to display this effect (with offline persistence enabled) from the same queried collection, but with different amounts of items returned using the .limit parameter:
Now at 100 items returned (with offline persistence disabled), my query takes less than 1 second to complete.
My Firestore query code is below:
let db = Firestore.firestore()
self.date = Date()
let ref = db.collection("collection").whereField("Int", isEqualTo: SomeInt).order(by: "AnotherInt", descending: true).limit(to: 100)
ref.getDocuments() { (querySnapshot, err) in
if let err = err {
print("Error getting documents: \(err)")
} else {
for document in querySnapshot!.documents {
let data = document.data()
//Do things
}
print("QUERY DONE")
let currentTime = Date()
let components = Calendar.current.dateComponents([.second], from: self.date, to: currentTime)
let seconds = components.second!
print("Elapsed time for Firestore query -> \(seconds)s")
// Benchmark result
}
}

From what I've seen so far, testing on a Nexus 5X in the emulator and on a real Android phone (Huawei P8), Firestore and Cloud Storage both give me a headache with slow responses on the first document.get() and the first storage.getDownloadURL().
Each of these first requests takes more than 60 seconds. Strangely, the slow response only happens on the real Android phone, not in the emulator.
After the first request, the remaining requests are smooth.
Here is the simple code where I see the slow response:
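var dbuserref = dbFireStore.collection('user').where('email','==',email);
const querySnapshot = await dbuserref.get(); // first get(): slow
const document = querySnapshot.docs[0]; // assumption: use the first matching user document
var url = await defaultStorage.ref(document.data().image_path).getDownloadURL(); // first getDownloadURL(): slow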
I also found a link that looks into the same issue:
https://reformatcode.com/code/android/firestore-document-get-performance

Related

Angular: Increase Query Loading Time in Firebase Database

I have an Angular app where I am querying my Firebase database as below:
constructor() {
this.getData();
}
getData() {
this.projectSubscription$ = this.dataService.getAllProjects()
.pipe(
map((projects: any) =>
projects.map(sc=> ({ key: sc.key, ...sc.payload.val() }))
),
switchMap(appUsers => this.dataService.getAllAppUsers()
.pipe(
map((admins: any) =>
appUsers.map(proj =>{
const match: any = admins.find(admin => admin.key === proj.admin);
return {...proj, imgArr: this.mapObjectToArray(proj.images), adminUser: match.payload.val()}
})
)
)
)
).subscribe(res => {
this.loadingState = false;
this.projects = res.reverse();
});
}
mapObjectToArray = (obj: any) => {
const mappedDatas = [];
for (const key in obj) {
if (Object.prototype.hasOwnProperty.call(obj, key)) {
mappedDatas.push({ ...obj[key], id: key });
}
}
return mappedDatas;
};
And here is what I am querying inside dataService:
getAllProjects() {
return this.afDatabase.list('/projects/', ref=>ref.orderByChild('createdAt')).snapshotChanges();
}
getAllAppUsers() {
return this.afDatabase.list('/appUsers/', ref=>ref.orderByChild('name')).snapshotChanges();
}
The problem I am facing is that I have 400 rows of data to load, and loading them takes around 30 seconds, which is insanely high. Any idea how I can run this query faster?

We have no way to know whether the 30s is reasonable, as that depends on the amount of data loaded, the connection latency and bandwidth of the client, and more factors we can't know/control.
But one thing to keep in mind is that you're performing 400 queries to get the users of each individual app, which is likely not great for performance.
Things you could consider:
Pre-load all the users once, and then use that list for each project.
Duplicate the name of each user into each project, so that you don't need to join any data at all.
If you come from a background in relational databases the latter may be counterintuitive, but it is actually very common in NoSQL data modeling and is one of the reasons NoSQL databases scale so well.
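As a rough illustration of the first suggestion (pre-loading the users once and reusing that list), here is a minimal sketch in plain JavaScript. The function name `joinProjectsWithAdmins` and the sample objects are hypothetical; it assumes each project stores its admin's key in an `admin` field, as in the question's code.

```javascript
// Hypothetical helper: join projects to their admin users with a single
// lookup table, instead of searching the user list once per project.
function joinProjectsWithAdmins(projects, users) {
  // Build the lookup once: user key -> user object.
  const usersByKey = new Map(users.map((user) => [user.key, user]));
  return projects.map((project) => ({
    ...project,
    // null when no matching user exists, so a missing admin does not throw.
    adminUser: usersByKey.get(project.admin) ?? null,
  }));
}

// Made-up sample data, purely for illustration.
const sampleUsers = [{ key: 'u1', name: 'Ada' }, { key: 'u2', name: 'Bob' }];
const sampleProjects = [
  { key: 'p1', admin: 'u2' },
  { key: 'p2', admin: 'missing' },
];
console.log(joinProjectsWithAdmins(sampleProjects, sampleUsers));
```

The Map makes each lookup constant time, and the null fallback avoids the crash that match.payload.val() would cause in the question's code when no admin matches.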

I propose 3 solutions.
1. Pagination
Instead of returning all those documents on app load, limit them to just 10 and keep record of the last one. Then display the 10 (or any arbitrary base number)
Then make the UI in such a way that the user has to click next or when the user scrolls, you fetch the next set based on the previous last document's field's info.
I'm supposing you need to display all the fetched data in some table or list so having the UI paginate the data should make sense.
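To make the cursor idea concrete, here is a small sketch that pages through an in-memory array. It is only a stand-in for a real query: the name `fetchPage` is hypothetical, it assumes every row has a unique numeric `createdAt`, and a real implementation would push the ordering and the limit to the database (for example with limitToFirst) rather than sorting on the client.

```javascript
// Hypothetical in-memory stand-in for a query ordered by `createdAt`.
// `lastCreatedAt` is the cursor: the createdAt of the last row already shown.
function fetchPage(rows, pageSize, lastCreatedAt = null) {
  const sorted = [...rows].sort((a, b) => a.createdAt - b.createdAt);
  // First page starts at 0; later pages start after the cursor.
  const start = lastCreatedAt === null
    ? 0
    : sorted.findIndex((row) => row.createdAt > lastCreatedAt);
  if (start === -1) {
    return { items: [], cursor: lastCreatedAt }; // nothing left to fetch
  }
  const items = sorted.slice(start, start + pageSize);
  const cursor = items[items.length - 1].createdAt;
  return { items, cursor };
}

// Made-up rows, purely for illustration.
const rows = [5, 3, 1, 4, 2].map((n) => ({ createdAt: n }));
const first = fetchPage(rows, 2);
const second = fetchPage(rows, 2, first.cursor);
console.log(first.items, second.items);
```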
2. Loader
Show some loader UI on website load. Then, when all the documents have been fetched, hide the loader and show the data as you want. You can build a custom loader, choose from any of the abundant libraries out there, or use mat-progress-spinner from Angular Material.
3. onCall Cloud Function
What if you try getting them through an onCall Cloud Function? It might be faster because the app makes just one request, and Firebase's Cloud Functions are very fast within Google's data centers.
The user's network might be slow when iterating over the documents, whereas the Cloud Function returns them all at once, which might give you what you want.
I guess you should go for this option only if you really need to display all that data at once on website load.
... Note on cost
Fetching 400 or more documents every time the website loads might be expensive, especially if the website is visited very frequently by many users, since Firebase also charges per document read.
Check whether you could optimise the data structure to avoid fetching this much.
This doesn't apply to you if this is some admin dashboard, or if fetching all users like this is done rarely, in which case the cost won't be high.

firebase timestamps not serverside created

I have a chat app in React Native using Firebase, and it is crucial to me that the timestamps are all synchronised and created server-side, not on the client. Clients can have different times depending on the settings on the mobile phone.
I tried out following function to prove my point:
const comparetimes = async () => {
const timezone1 = firebase.firestore.Timestamp.now();
const timezone2 = (firebase.firestore.Timestamp.now()).toMillis();
const timezone3 = Date.now();
console.log(timezone1);
console.log(timezone2);
console.log(timezone3)
}
What shocked me is the result of this function:
{"nanoseconds": 79000000, "seconds": 1641054839}
1641054839080
1641054839080
Apparently the Firebase timestamp is exactly the same as Date.now(), which is a timestamp taken directly from the mobile phone and is therefore inaccurate.
What can I do to have timestamps created not by the client but by the server, in this example Firebase? Do I need to look at some other API, or am I missing something here?

When you call Timestamp.now(), it really is just taking the timestamp on the client device. That's expected.
Read the documentation on how to use server timestamps. You must use the token value returned by firestore.FieldValue.serverTimestamp() in order to tell Firestore to use the current moment in time as the request reaches the server.
When storing timestamps, it is recommended you use the serverTimestamp
static method on the FieldValue class. When written to the database,
the Firebase servers will write a new timestamp based on their time,
rather than the clients. This helps resolve any data consistency
issues with different client timezones:
firestore().doc('users/ABC').update({
createdAt: firestore.FieldValue.serverTimestamp(),
});
Also read:
https://medium.com/firebase-developers/the-secrets-of-firestore-fieldvalue-servertimestamp-revealed-29dd7a38a82b
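The sentinel idea can be pictured with a toy model. The sketch below is not Firestore's implementation and uses no Firebase API; `SERVER_TIMESTAMP`, `clientBuildWrite` and `serverApplyWrite` are hypothetical names. It only shows why the client's clock never ends up in the stored value: the client sends a placeholder, and the server fills in its own time when it applies the write.

```javascript
// Hypothetical placeholder value, standing in for the token that
// FieldValue.serverTimestamp() returns.
const SERVER_TIMESTAMP = Symbol('serverTimestamp');

// Client side: no clock is read here, only the placeholder is attached.
function clientBuildWrite(text) {
  return { text, createdAt: SERVER_TIMESTAMP };
}

// Server side (toy model): replace every placeholder with the server's time.
function serverApplyWrite(write, serverNowMs) {
  const stored = {};
  for (const [field, value] of Object.entries(write)) {
    stored[field] = value === SERVER_TIMESTAMP ? serverNowMs : value;
  }
  return stored;
}

// The stored timestamp comes from the server argument, whatever the
// client's clock says. The number is the value from the question's output.
const stored = serverApplyWrite(clientBuildWrite('hello'), 1641054839080);
console.log(stored.createdAt); // 1641054839080
```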

matrix-js-sdk setup and configuration

I am having some issues trying to connect to a matrix server using the matrix-js-sdk in a react app.
I have provided a simple code example below, and made sure that the credentials are valid (login works) and that the environment variable containing the URL for the matrix client is set. I have signed into Element in a browser and created two rooms for testing purposes, and was expecting these two rooms to be returned from matrixClient.getRooms(). However, this simply returns an empty array. With some further testing, it seems that only the asynchronous functions for fetching room, member and group IDs work as expected.
According to https://matrix.org/docs/guides/usage-of-the-matrix-js-sd these should be valid steps for setting up the matrix-js-sdk, however the sync is never executed either.
const matrixClient = sdk.createClient(
process.env.REACT_APP_MATRIX_CLIENT_URL!
);
await matrixClient.login("m.login.password", credentials);
matrixClient.once('sync', () => {
debugger; // Never hit
});
for (const room of matrixClient.getRooms()) {
debugger; // Never hit
}
I did manage to use the room IDs returned from await matrixClient.roomInitialSync(roomId, limit, callback); however, this led me to another issue where I can't figure out how to decrypt messages, as the events containing the messages sent in the room seem to be of type 'm.room.encrypted' instead of 'm.room.message'.
Does anyone have any good examples of working implementations for the matrix-js-sdk, or any other good resources for properly understanding how to put this all together? I need to be able to load rooms, persons, messages etc. and display these respectively in a ReactJS application.

It turns out I simply forgot to run startClient on the matrix client, resulting in it not fetching any data.

Efficient DB design with PouchDB/CouchDB

So I was reading a lot about how to actually store and fetch data in an efficient way. Basically my application is about time management/capturing for projects. I am very happy for any opinion on which strategy I should use or even suggestions for other strategies. The main concern is about the limited resources for local storage on the different Browsers.
This is the main data I have to store:
db_projects: This is a database where the projects itself are stored.
db_timestamps: Here go the timestamps per project whenever a project is running.
I came up with the following strategies:
1: Storing the status of the project in the timestamps
When a project is started, a timestamp is added to db_timestamps like so:
db_timestamps.put({
_id: String(Date.now()),
title: projectID,
status: status //could be: 1=active/2=inactive/3=paused
})...
This follows the strategy of only adding data to the DB and never modifying any entries. The problem I see here is that if I want to get all active projects, for example, I would need to query the whole of db_timestamps, which can contain thousands of entries. Since I cannot use the ID to search for all active projects, this could result in quite a heavy DB query.
2: Storing the status of the project in db_projects
Each time a project changes its status, the project itself is updated. So the "get all active projects" query would be much more resource-friendly, since there are far fewer projects than timestamps. But this would also mean that each time a status change happens, the project entry gets a new revision and therefore produces "a lot" of overhead. I'm also not sure whether the compaction feature would do a good job, since not all revision data is deleted (the documents are, but the leaf revisions are not). This means that for a state change we have at least the _rev information, which is still a string of 34 chars, for changing only the status (1 char). Or can I delete the leaf revisions after conflict resolution?
3: Storing the status in a separate DB like db_status
This leads to the same problem as in #2 since status changes lead to revisions on this DB. Or if the states would be added in "only add data"-mode (like in #1), it would just quickly fill with entries.

The general problem is that you have a limited amount of space that you can use in IndexedDB. On the other hand, the principle of CouchDB is that storage space is cheap (which is indeed true when you store on the server side only). Here is an interesting discussion about that.
So this is the solution that I use for now. I am using a mix between solution 1 and solution 2 from above with the following additions:
Storing only the timestamps in a synced database (db_timestamps) with the "only add data" principle.
Storing the projects and their states in a separate local (not synced) database (db_projects). For this I still use PouchDB, since it has a much simpler API than IndexedDB.
Storing the new/changed project status in each timestamp as well (so you could rebuild db_projects from db_timestamps if needed).
Deleting db_projects every so often and repopulating it, so the revision data (overhead for this DB in my case) is eliminated and the size stays acceptable.
I use the following code to rebuild my DB:
//--------------------------------------------------------------------
function rebuild_db_project(){
db_project.allDocs({
include_docs: true,
//attachments: true
}).then(function (result) {
// do stuff
console.log('I have read the DB and delete it now...');
deleteDB('db_project', '_pouch_DB_Projekte');
return result;
}).then(function (result) {
console.log('Creating the new DB...'+result);
db_project = new PouchDB('DB_Projekte');
var dbContentArray = [];
for (var row in result.rows) {
delete result.rows[row].doc._rev; //delete the revision of the doc. else it would raise an error on the bulkDocs() operation
dbContentArray.push(result.rows[row].doc);
}
return db_project.bulkDocs(dbContentArray);
}).then(function(response){
console.log('I have successfully populated the DB with: '+JSON.stringify(response));
}).catch(function (err) {
console.log(err);
});
}
//--------------------------------------------------------------------
function deleteDB(PouchDB_Name, IndexedDB_Name){
console.log('DELETE');
new PouchDB(PouchDB_Name).destroy().then(function () {
// database destroyed
console.log("pouchDB destroyed.");
}).catch(function (err) {
// error occurred
});
var DBDeleteRequest = window.indexedDB.deleteDatabase(IndexedDB_Name);
DBDeleteRequest.onerror = function(event) {
console.log("Error deleting database.");
};
DBDeleteRequest.onsuccess = function(event) {
console.log("IndexedDB deleted successfully");
console.log(DBDeleteRequest.result); // should be undefined
};
}
So I not only use the pouchDB.destroy() command but also the indexedDB.deleteDatabase() command to get the storage freed nearly completely (there is still some 4kB that are not freed, but this is insignificant to me.)
The timing is not quite right (the new database is created without waiting for the deletion to finish), but it works for me. I'd be happy if someone has an idea for making the timing work properly (the problem for me is that IndexedDB does not support promises).
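One possible way to get the timing under control is to wrap the IndexedDB request in a Promise by hand, so the rebuild can wait for the deletion before creating the new database. The helper name `requestToPromise` is hypothetical, and this is only a sketch: it relies on nothing more than the onsuccess / onerror callbacks that the code above already uses, and it ignores the blocked event that a delete request can also fire.

```javascript
// Hypothetical helper: turn a callback-style request object (anything with
// onsuccess / onerror, `result` and `error`) into a Promise.
function requestToPromise(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Possible usage in the browser (not executed here):
//   await requestToPromise(window.indexedDB.deleteDatabase(IndexedDB_Name));
//   db_project = new PouchDB('DB_Projekte');
```

Since PouchDB's destroy() already returns a promise, both deletions could then be awaited (or returned from a then() callback) before the new database is created and repopulated.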

Nodejs Mongoose - Serve clients a single query result

I'm looking to implement a solution where I can query the MongoDB database (via Mongoose) at a regular interval and then store the results to serve to my clients.
I'm assuming this will reduce my response time when my users pull the collection.
I attempted to implement this plan by creating an empty global object and then writing a function that queries the db and then stores the results as the global object mentioned previously. At the end of the function I setTimeout for 60 seconds and then ran the function again. I call this function the first time the server controller gets called when the app is first run.
I then set my clients up so that when they requested the collection, it would first look to see if the global object exists, and if so return that as the response. I figured this would cut my 7-10 second queries down to < 1 sec.
In my novice thinking I assumed that Nodejs being 'single-threaded' something like this could work quite well - but it just seemed to eat up all my RAM and cause fatal errors.
Am I on the right track with my thinking or is it better to query the db every time people pull the collection?
Here is the code in question:
var allLeads = {};
var getAllLeads = function(){
allLeads = {};
console.log('Getting All Leads...');
Lead.find().sort('-lastCalled').exec(function(err, leads) {
if (err) {
console.log('Error getting leads');
} else {
allLeads = leads;
}
});
setTimeout(function(){
getAllLeads();
}, 60000);
};
getAllLeads();
Thanks in advance for your assistance.
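For comparison, the polling cache described above can be sketched in a slightly different shape. This is only one possible variant, not a diagnosis of the RAM problem: `createPollingCache` is a hypothetical name, `fetchAll` stands for any Node-style callback query (such as the Lead.find() call above), and the `schedule` parameter exists only so that the timer can be swapped out. The differences from the code above are that the old result is kept until a new one has arrived, and the next refresh is scheduled only after the current query has finished.

```javascript
// Hypothetical helper: keeps the last successful result and refreshes it
// periodically. `fetchAll(callback)` must call callback(err, rows).
function createPollingCache(fetchAll, intervalMs, schedule = setTimeout) {
  let cached = null; // null until the first fetch has completed

  function refresh() {
    fetchAll((err, rows) => {
      if (!err) {
        cached = rows; // replace the old result only on success
      }
      // Schedule the next refresh only after this one has finished,
      // so slow queries cannot pile up on top of each other.
      schedule(refresh, intervalMs);
    });
  }

  refresh();
  return { get: () => cached };
}

// Possible usage with the query from the question (not executed here):
//   const leadsCache = createPollingCache(
//     (cb) => Lead.find().sort('-lastCalled').exec(cb), 60000);
//   // in a request handler: leadsCache.get()
```

A request handler would fall back to a direct query while get() still returns null, that is, before the first refresh has completed.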
