Reading Multiple Firebase Documents One By One - javascript

In my web application I am fetching data from Firebase; the collection will hold about 5000 documents by the time it's ready. To reduce the number of reads, I keep an array in Firebase listing the id of every document, which I read and then compare with the list I have saved locally. If an id is missing locally, I read that particular doc from Firebase (the Firebase website has definitely slowed down with an array of 3000 ids so far).
However, consider the situation where 2000+ docs are missing and I have to fetch them one by one according to the missingList I built by comparing the Firebase and local arrays. It is a heck of a lot slower: it takes very long to fetch even a hundred documents because I await each request. If I don't await, Firebase is overloaded with requests and, for some reason, it shows a warning that my connection is down.
What is the best way to fetch documents whose ids are given in an array, with no loss and reasonably fast? Is there a way to batch read documents, like there is for batch write, update and set?
Say Firebase has [1234, 1235, 1236, 1237] and I only have [1234], so I need to read [1235, 1236, 1237] from it.
I am using a for loop to iterate through each id and get the corresponding document, like so:
for (let i = 0; i < missingList.length; i++) {
  var item = await db.collection('myCollection').doc(missingList[i]).get().then((snapshot) => snapshot.data())
  await saveToDB(item) // I use PouchDB for local storage
}

The closest thing to a 'batched read' in Firestore is a transaction, but using a transaction to read hundreds of documents is not really efficient. Instead, I suggest you add one field to every document, such as token. The first time you fetch the data, generate a random token client-side and write it to every document that you read. After that, whenever you detect a change in the list of ids you mention above, run a query on that token field, like:
db.collection('myCollection').where('token', '!=', locallySavedToken).get()
Remember to write your local token back to the documents you just read. This way, your query time will be much faster. The downside is that you need one extra write request for every document read. If you need to re-read your data a lot, then maybe look into transactions, since write requests are priced higher than read requests.
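Independent of the token approach, the problem described in the question (too slow when every read is awaited, too many requests when none is) can also be eased by fetching the missing ids in fixed-size chunks. The following is only a minimal sketch: fetchInChunks, fetchOne and the default chunk size of 50 are illustrative choices, not part of any Firebase API.

```javascript
// Hypothetical helper: fetch ids in parallel chunks so that at most
// `chunkSize` requests are in flight at any one time.
async function fetchInChunks(ids, fetchOne, chunkSize = 50) {
  const results = [];
  for (let start = 0; start < ids.length; start += chunkSize) {
    const chunk = ids.slice(start, start + chunkSize);
    // All requests of one chunk run concurrently; the next chunk starts
    // only after every request of this one has resolved.
    const items = await Promise.all(chunk.map((id) => fetchOne(id)));
    results.push(...items);
  }
  return results;
}

// Possible usage with the names from the question (db, missingList and
// saveToDB are the asker's own):
// const docs = await fetchInChunks(missingList, (id) =>
//   db.collection('myCollection').doc(id).get().then((snap) => snap.data()));
// for (const item of docs) await saveToDB(item);
```

The chunk size is a tuning knob: larger chunks finish sooner but put more simultaneous load on the connection, which is what triggered the warning in the first place.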

Related

Firebase: best practice to display items

I would like a suggestion.
I'm developing a web app using react-js as frontend and firebase as backend.
For the sake of simplicity, say the aim of the web app is to let users upload items, which are then displayed on the homepage.
An item consists of two strings: 1) the name of the item 2) a URL pointing to an image stored on IPFS
Once the user submits a form, the frontend creates a new doc in a collection in Firestore:
await addDoc(collectionRef, newDoc)
Then, in the homepage, I display all the items in the collection
const querySnapshot = await getDocs(collectionRef);
querySnapshot.forEach((doc) => {
  ...
});
This works; however, if I've understood properly, every time the homepage is refreshed the frontend makes N read calls (where N is the number of docs in the collection). Therefore, I'm wondering whether this is the right approach or whether there is a better way.
I'd really appreciate any suggestions (also regarding potential major safety flaws in this setting).
Every time the homepage is refreshed the frontend makes N read calls
(where N is the number of docs in the collection).
With const querySnapshot = await getDocs(collectionRef); this is indeed the case: all the docs in the collection are fetched.
If your collection contains a lot of documents and is fetched frequently, this will cost you money and, at some point, the performance of your app may degrade (longer Firestore query execution and many docs transferred from the back-end to the front-end).
One approach to avoid that is to use pagination and fetch only the first X documents and display a link to fetch the X next ones, etc.
There is a page in the Firestore documentation dedicated to pagination implementation.
You could also implement an infinite scroll mechanism instead of a link, to load the next set of X documents continuously as the user scrolls down the page. That's just a UI variation and is based on the exact same approach detailed in the Firestore documentation.
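The pagination approach could look roughly like the sketch below. It assumes the modular Firestore SDK, whose query, orderBy, limit, startAfter and getDocs functions are passed in as `fs` (for example via import * as fs from 'firebase/firestore'). The helper name fetchPage and the choice of 'name' as sort field are made up for illustration.

```javascript
// Sketch of cursor-based pagination. `fs` is assumed to be the namespace of
// the modular Firestore SDK; it is injected so the helper stays testable.
async function fetchPage(fs, collectionRef, pageSize, lastVisible = null) {
  const constraints = [fs.orderBy('name')]; // 'name' is an assumed sort field
  // Resume right after the last document of the previous page, if any.
  if (lastVisible) constraints.push(fs.startAfter(lastVisible));
  constraints.push(fs.limit(pageSize));
  const snapshot = await fs.getDocs(fs.query(collectionRef, ...constraints));
  const docs = snapshot.docs;
  return {
    items: docs.map((doc) => doc.data()),
    // Cursor for the next call; null when this page came back empty.
    lastVisible: docs.length > 0 ? docs[docs.length - 1] : null,
    // A short page means there is nothing left to load.
    done: docs.length < pageSize,
  };
}
```

A "load more" link or an infinite-scroll handler would call fetchPage again with the lastVisible value returned by the previous call, until done is true.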
Also regarding potential major safety flaws in this setting
Fetching all the docs of a collection or using pagination does not make any difference in terms of security. In any case you should implement security according to your requirements by using security rules (and potentially Firebase Auth as well). And don't forget that security rules are not filters.

Best way to have a Node.JS server keep updated with a FireBase database in real time?

I currently have a Node.JS server set up that is able to read and write data from a FireBase database when a request is made from a user.
I would like to implement time based events that result in an action being performed at a certain date or time. The key thing here though, is that I want to have the freedom to do this in seconds (for example, write a message to console after 30 seconds have passed, or on Friday the 13th at 11:30am).
A way to do this would be to store the date/time at which an action needs to be performed in the database, read from the database every second, and compare the current date/time with the stored events so we know whether an action needs to be performed at this moment. As you can imagine though, this would be a lot of unnecessary calls to the database and really feels like a poor way to implement this system.
Is there a way I can stay synced with the database without having to call every second? Perhaps I could then store a version of the events table locally and update this when a change is made to the database? Would that be a better idea? Is there another solution I am missing?
Any advice would be greatly appreciated, thanks!
EDIT:
How I currently initialise the database:
firebase.initializeApp(firebaseConfig);
var database = firebase.database();
How I then get data from the database:
await database.ref('/').once('value', function(snapshot){
  snapshot.forEach(function(childSnapshot){
    if(childSnapshot.key === userName){
      userPreferences = childSnapshot.val().UserPreferences;
    }
  })
});
The Firebase once() API reads the data from the database once, and then stops observing it.
If you instead use the on() API, it will continue observing the database after getting the initial value, and will call your code whenever the database changes.
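As a rough sketch of the on() approach: the helper below keeps listening to one user's preferences instead of re-reading the whole tree. It assumes `database` is the object returned by firebase.database() and that the data lives under /<userName>/UserPreferences, which is inferred from the code in the question; watchUserPreferences is a hypothetical name.

```javascript
// Sketch: observe one user's preferences and get notified on every change.
// `database` is assumed to be the result of firebase.database().
function watchUserPreferences(database, userName, onChange) {
  // Assumed path layout, inferred from the question's code.
  const ref = database.ref('/' + userName + '/UserPreferences');
  // Called once with the current value, then again whenever it changes.
  const handler = (snapshot) => onChange(snapshot.val());
  ref.on('value', handler);
  // Return a function that detaches the listener again.
  return () => ref.off('value', handler);
}

// Possible usage:
// const stop = watchUserPreferences(database, userName, (prefs) => {
//   userPreferences = prefs; // local copy stays in sync
// });
```

Listening on the narrowest path that contains the data avoids downloading the entire database on every change, which ref('/') would do.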
It sounds like you're looking to develop an application for scheduling. If that's the case you should check out node-schedule.
Node Schedule is a flexible cron-like and not-cron-like job scheduler
for Node.js. It allows you to schedule jobs (arbitrary functions) for
execution at specific dates, with optional recurrence rules. It only
uses a single timer at any given time (rather than reevaluating
upcoming jobs every second/minute).
You then can use the database to keep a "state" of the application so on start-up of the application you read all the upcoming jobs that will be expected and load them into node-schedule and let node-schedule do the rest.
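The start-up step described above might be sketched like this. `scheduler` is assumed to be the node-schedule module (const scheduler = require('node-schedule')), whose scheduleJob(date, fn) call runs fn once at the given date. The event shape { id, runAt, message } and the helper name loadUpcomingJobs are invented for the example.

```javascript
// Sketch: on start-up, register every stored future event with a scheduler.
// `scheduler` is assumed to be the node-schedule module; it is passed in as
// a parameter so the helper can be exercised without the real library.
function loadUpcomingJobs(scheduler, events, now = new Date()) {
  const jobs = new Map();
  for (const event of events) {
    const runAt = new Date(event.runAt);
    if (runAt <= now) continue; // skip events whose time has already passed
    // One job per event; the callback is where the real action would go.
    const job = scheduler.scheduleJob(runAt, () => console.log(event.message));
    jobs.set(event.id, job);
  }
  return jobs; // keep the handles so jobs can be cancelled later
}
```

The events array would come from a single database read at start-up; combined with a change listener on the events table, no per-second polling is needed.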
The Google Cloud solution for scheduling a single item of future work is Cloud Tasks. Firebase is part of Google Cloud, so this is the most natural product to use. You can use this to avoid polling the database by simply specifying exactly when some Cloud Function should run to do the work you want.
I've written a blog post that demonstrates how to set up a Cloud Task to call a Cloud Function to delete a document in Firestore with an exact TTL.

Deleting old documents based on timestamp through Firebase Functions

I'm trying to delete specific documents in a collection based on each document's timestamp. Posts that are past the expiry date I set for them should be deleted from the cloud when the script is called.
The problem is that I couldn't find a way to iterate over all the documents in the collection so that I can access their fields and compare Date.now() to post['expireDate'].
I'm not using the Realtime Database but Cloud Firestore for my project. I've found a way to do it in the Realtime Database, but not in Firestore, and have tried different ways to do it.
exports.removeOldPosts = functions.https.onRequest((req, res) => {
  const timeNow = Date.now();
  let postsRef = admin.firestore().collection('accesories/').listDocuments();
  postsRef.forEach(post => {
    if (post['expiredDate'] < timeNow) {
      post.delete();
    }
  })
  return res.status(200).end();
});
You're using the listDocuments() API, which returns (asynchronously) a list of document references. But your code:
- does not deal with the fact that the call is asynchronous, i.e. you don't have a then() callback, as in the example in the documentation.
- assumes that the data of the document is retrieved, while listDocuments in reality only returns document references.
To fix this, I recommend using the get() API that is shown in the Firebase documentation on getting all documents from a collection. You'll still have to deal with the asynchronous nature yourself though, even with that API, so I recommend studying up on the asynchronous nature of Cloud Functions and modern web APIs. One place to get started could be Doug's excellent blog post Why are the Firebase API asynchronous? and video series Learn JavaScript Promises.
As a separate note: instead of reading all documents from the database, and then determining in your code which ones to delete, I'd recommend using a query to only retrieve the documents that you want to delete. This leads to reading fewer documents, which saves you some cost in both document reads and bandwidth.
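Combining both recommendations (await the asynchronous call, and query only for the documents to delete) might look like the sketch below. It assumes `firestore` is admin.firestore() and that expiredDate is stored as a millisecond timestamp comparable to Date.now(). The collection and field names come from the question's code; deleteExpiredPosts and the batchSize default are illustrative.

```javascript
// Sketch: delete only the expired posts, found with a query.
// `firestore` is assumed to be admin.firestore(); it is injected so the
// helper can be tried out without the Admin SDK.
async function deleteExpiredPosts(firestore, now = Date.now(), batchSize = 500) {
  // Only documents whose expiredDate lies in the past are read.
  const snapshot = await firestore
    .collection('accesories')
    .where('expiredDate', '<', now)
    .get();
  // Delete in batched writes of at most `batchSize` documents each.
  for (let i = 0; i < snapshot.docs.length; i += batchSize) {
    const batch = firestore.batch();
    snapshot.docs.slice(i, i + batchSize).forEach((doc) => batch.delete(doc.ref));
    await batch.commit();
  }
  return snapshot.docs.length; // number of deleted posts
}

// Possible usage inside the HTTPS function: wait for the deletes to finish
// before ending the response.
// exports.removeOldPosts = functions.https.onRequest(async (req, res) => {
//   await deleteExpiredPosts(admin.firestore());
//   res.status(200).end();
// });
```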

Right way of passing consistent data from DB to user without repeatedly querying

The database stores some data about each user which almost never changes. Occasionally it does change, for example when a user edits his name.
The data consists of each user's name, username and company data.
The first two are shown in his navigation bar at all times using ejs (for example, "User_1 is logged in"); the company profile data is needed when he creates an invoice.
My current way is to fetch the user data through middleware using router.use, so the extracted information is always available in all routes/views, for example:
router.use(function(req, res ,next) { // this block of code is called as middleware in every route
  req.getConnection(function(err,conn){
    uid = req.user.id;
    if(err){
      console.log(err);
      return next("Mysql error, check your query");
    }
    var query = conn.query('SELECT * FROM user_profile WHERE uid = ? ', uid, function(err,rows){
      if(err){
        console.log(err);
        return next(err, uid, "Mysql error, check your query");
      }
      var userData = rows;
      return next();
    });
  });
})
I understand that this is not an optimal way of passing user profile data to every route/view since it makes new DB queries every time the user navigates through the application.
What would be a better way of having this data available without repeating the same query in each route, yet having it re-fetched once the user changes a portion of it, like his full name?
You've just stumbled into the world of "caching", welcome! Caching is a very popular choice for use cases like this, as well as many others. A cache is essentially somewhere to store data that you can get back much quicker than making a full DB query, or a file read, etc.
Before we go any further, it's worth considering your use case. If you're serving only a few users and have a low load on your service, caching might be over-engineering and in fact making a DB request might be the simplest idea. Adding caching can add a lot of complexity to your code as things move forward, not enough to scare you, but enough to cause hard to trace bugs. So consider for a moment your service load, if it's not very high (say an internal application for somewhere you work with only maybe a few requests every few minutes) then just reading from the DB is probably not going to slow down a request too much. In this case, reading from the DB is the simplest and probably best solution. However, if you're noticing that this DB request is slowing down your application for requests or making it harder to scale up, then caching might be for you.
A really popular approach for this would be to get something like "redis" which is a key-value database that holds everything in memory (RAM). Redis can sit as a service like MySQL and has a very basic query language. It is blindingly fast and can scale to enormous loads. If you're using Express, there are a number of NPM modules that help you access a redis instance. Simply push in your credentials and you can then make GET and SET requests (to get data or to set data).
In your example, you may wish to store a user's profile in JSON format against their user id or username in redis. Then, create a function called getUserProfile which takes in the ID or username. It first looks the profile up in redis; if it finds the record, it returns it to your main controller logic. If it does not, it looks it up in your MySQL database, saves it in redis, and then returns it to the controller logic (so it'll be able to get it from the cache next time).
Your next problem is known for being a very pesky problem in computer science. It's "Cache Invalidation", in this case if your user profile updates you want to "invalidate" your cache. A way of doing this would be to update your cached version when the user updates their profile (or any other data saved). Alternatively, you could also just remove the cached version from redis and then next time it's requested from getUserProfile, it will be fetched from the DB fresh, and then put into redis for next time.
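The lookup and invalidation steps described above can be sketched as follows. `cache` is assumed to expose promise-based get, set and del methods (as node-redis v4 clients do), and `loadFromDb` stands in for the MySQL query. The key prefix user_profile: and the name invalidateUserProfile are made up for the example.

```javascript
// Sketch of the cache-aside pattern. `cache` is assumed to offer async
// get/set/del; `loadFromDb(uid)` is a placeholder for the MySQL lookup.
async function getUserProfile(cache, loadFromDb, uid) {
  const key = 'user_profile:' + uid; // illustrative key scheme
  const cached = await cache.get(key);
  if (cached !== null && cached !== undefined) {
    return JSON.parse(cached); // cache hit: no DB query needed
  }
  const profile = await loadFromDb(uid); // cache miss: fall back to the DB
  await cache.set(key, JSON.stringify(profile)); // store for next time
  return profile;
}

// Call this after a successful profile update so the next read is fresh.
async function invalidateUserProfile(cache, uid) {
  await cache.del('user_profile:' + uid);
}
```

In the middleware from the question, the conn.query call would move into loadFromDb, and the route that saves profile edits would call invalidateUserProfile once the update succeeds.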
There are many other ways to approach this, but this will most likely solve your problem in the simplest way without too much overhead. It will also be easy to expand in the future!

Dropping a Mongo Database Collection in Meteor

Is there any way to drop a Mongo Database Collection from within the server side JavaScript code with Meteor? (really drop the whole thing, not just remove its contents with Meteor.Collection.remove({});)
In addition, is there also a way to drop a Meteor.Collection from within the server side JavaScript code without dropping the corresponding database collection?
Why do that?
Searching in the subdocuments (subdocuments of the user-document, e.g. userdoc.mailbox[12345]) with underscore or similar turns out to be quite slow (e.g. for large mailboxes).
On the other hand, putting all messages (in context of the mailbox-example) of all users in one big DB and then searching* all messages for one or more particular messages turns out to be very, very slow (for many users with large mailboxes), too.
There is also the size limit for Mongo documents, so if I store all messages of a user in his/her user-document, the mailbox's maximum size is < 16 MB together with all other user-data.
So I want to have a database for each of my users to use as a mailbox; then the maximum size for one message is 16 MB (very acceptable) and I can search a mailbox using mongo queries.
Furthermore, since I'm using Meteor, it would be nice to then have this mongo db collection loaded as a Meteor.Collection whenever a user logs in. When a user deactivates his/her account, the db should of course be dropped; if the user just logs out, only the Meteor.Collection should be dropped (and restored when he/she logs in again).
To some extent, I got this working already: each user has their own db for the mailbox, but if anybody cancels his/her account, I have to delete this particular Mongo Collection manually. Also, I have to keep all mongo db collections alive as Meteor.Collections at all times because I cannot drop them.
This is a working server-side code snippet for one-collection-per-user mailboxes:
var mailboxes = {};
Meteor.users.find({}, {fields: {_id: 1}}).forEach(function(user) {
  mailboxes[user._id] = new Meteor.Collection("Mailbox_" + user._id);
});
Meteor.publish("myMailbox", function(_query,_options) {
  if (this.userId) {
    return mailboxes[this.userId].find(_query, _options);
  };
});
while a client just subscribes with a certain query with this piece of client-code:
myMailbox = new Meteor.Collection("Mailbox_"+Meteor.userId());
Deps.autorun(function(){
  var filter=Session.get("mailboxFilter");
  if(_.isObject(filter) && filter.query && filter.options)
    Meteor.subscribe("myMailbox",filter.query,filter.options);
});
So if a client manipulates the session variable "mailboxFilter", the subscription is updated and the user gets a new bunch of messages in the minimongo.
It works very nicely; the only thing missing is dropping the db collection.
Thanks for any hint already!
*I previously wrote "dropping" here, which was a total mistake. I meant searching.
A solution that doesn't use a private method is:
myMailbox.rawCollection().drop();
This is better in my opinion because Meteor could randomly drop or rename the private method without any warning.
You can completely drop the collection myMailbox with myMailbox._dropCollection(), directly from meteor.
I know the question is old, but it was the first hit when I searched for how to do this
Searching in the subdocuments...
Why use subdocuments? A document per user I suppose?
each message must be it's own document
That's a better way, a collection of messages, each is id'ed to the user. That way, you can filter what a user sees when doing publish subscribe.
dropping all messages in one db turns out to be very slow for many users with large mailboxes
That's because most NoSQL DBs (if not all) are geared towards read-intensive operations and not so much towards write-intensive ones. So writing (updating, inserting, removing, wiping) will take more time.
Also, some online services (I think it was Twitter or Yahoo) will tell you when deactivating the account: "Your data will be deleted within the next N days." or something that resembles that. One reason is that your data takes time to delete.
The user is leaving anyway, so you can just tell them that their account has been deactivated and that their data will be deleted from your databases in the following days. To add to that, so you can respond to the user immediately, do the remove operation asynchronously by passing it a blank callback.
