Deleting old documents based on timestamp through Firebase Functions - javascript

I'm trying to delete specific documents in a collection based on each document's timestamp. Posts that are past the expiration date I set for them should be deleted from the cloud when the script is called.
The problem is that I couldn't manage to find a way to iterate over all the documents in the collection, so that I can access the fields and compare Date.now() to post['expireDate'].
I'm not using the Realtime Database but Cloud Firestore for my project. I've found a way to do it in the Realtime Database, but not in Firestore, and have tried different approaches.
exports.removeOldPosts = functions.https.onRequest((req, res) => {
  const timeNow = Date.now();
  let postsRef = admin.firestore().collection('accesories/').listDocuments();
  postsRef.forEach(post => {
    if (post['expiredDate'] < timeNow) {
      post.delete();
    }
  })
  return res.status(200).end();
});

You're using the listDocuments() API, which returns (asynchronously) a list of document references. But your code:
- does not deal with the fact that the call is asynchronous, i.e. you don't have a then() callback, as in the example in the documentation.
- assumes that the data of each document is retrieved, while listDocuments() in reality only returns document references.
To fix this, I recommend using the get() API that is shown in the Firebase documentation on getting all documents from a collection. You'll still have to deal with the asynchronous nature yourself though, even with that API, so I recommend studying up on the asynchronous nature of Cloud Functions and modern web APIs. One place to get started could be Doug's excellent blog post Why are the Firebase API asynchronous? and video series Learn JavaScript Promises.
As a separate note: instead of reading all documents from the database, and then determining in your code which ones to delete, I'd recommend using a query to only retrieve the documents that you want to delete. This leads to reading fewer documents, which saves you some cost in both document reads and bandwidth.
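Putting both fixes together with the query suggestion, a minimal sketch could look like this (assuming expiredDate is stored as a numeric timestamp in milliseconds; the deleteExpiredPosts helper name is mine):

```javascript
// In a deployed Function you'd also have:
//   const functions = require('firebase-functions');
//   const admin = require('firebase-admin');
//   admin.initializeApp();

// Query only the documents that are already expired, then delete them in
// batches (batches have historically been limited to 500 operations each).
async function deleteExpiredPosts(db, now) {
  const snapshot = await db.collection('accesories')
    .where('expiredDate', '<', now)
    .get();

  const refs = snapshot.docs.map((doc) => doc.ref);
  for (let i = 0; i < refs.length; i += 500) {
    const batch = db.batch();
    refs.slice(i, i + 500).forEach((ref) => batch.delete(ref));
    await batch.commit();
  }
  return snapshot.size;
}

// exports.removeOldPosts = functions.https.onRequest(async (req, res) => {
//   const deleted = await deleteExpiredPosts(admin.firestore(), Date.now());
//   res.status(200).send(`Deleted ${deleted} expired posts`);
// });
```

Because the query only matches expired documents, this also reads far fewer documents than iterating over the whole collection.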

Related

Batch Update Firebase Collection

I am attempting to update the "likes" field for every document in a collection in Firebase. This operation must be done once every week, so we are using Firebase Cloud Functions to achieve this. As of now, the following code works to achieve this goal (in Web version 8 namespaced):
await db.collection("events").get().then((querySnapshot) => {
  querySnapshot.forEach((doc) => {
    doc.ref.update({
      likes: 0
    });
  });
});
However, I would like to implement batch writes to make this operation scalable. I would also like to keep the bill generated from using Firebase to a minimum. How can this be implemented in this particular case? I have been looking through the Firebase Documentation in the Transactions and Batched Writes section, but have not found much help. The issue is that the document names must be known ahead of time according to the docs.
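A hedged sketch of what a batched version might look like, in the Web v8 namespaced style the question uses (the resetLikes wrapper name is mine; 500 has long been the documented per-batch operation limit, so chunking at that size is a safe default):

```javascript
// Read the document references once, then reset 'likes' in chunks of up to
// 500 update operations per batch.
async function resetLikes(db) {
  const querySnapshot = await db.collection("events").get();
  const refs = querySnapshot.docs.map((doc) => doc.ref);

  for (let i = 0; i < refs.length; i += 500) {
    const batch = db.batch();
    refs.slice(i, i + 500).forEach((ref) => batch.update(ref, { likes: 0 }));
    await batch.commit();
  }
}
```

The document ids don't need to be known ahead of time here, because the refs come straight from the query snapshot.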

Firebase: best practice to display items

I would like a suggestion.
I'm developing a web app using React as the frontend and Firebase as the backend.
For the sake of simplicity, say the aim of the web app is to allow the user to upload some items, which are then displayed on the homepage.
An item consists of two strings: 1) the name of the item, 2) a URL pointing to an image stored on IPFS.
Once the user submit a form, the frontend creates a new doc in a collection in firestore:
await addDoc(collectionRef, newDoc)
Then, in the homepage, I display all the items in the collection
const querySnapshot = await getDocs(collectionRef);
querySnapshot.forEach((doc) => {
...
});
This works; however, if I've understood properly, every time the homepage is refreshed the frontend makes N read calls (where N is the number of docs in the collection). Therefore, I'm wondering if it's the right approach; is there a better way?
I really appreciate any suggestion (also regarding potential major safety flaws in this setting).
Every time the homepage is refreshed the frontend makes N read calls
(where N is the number of docs in the collection).
With const querySnapshot = await getDocs(collectionRef); this is indeed the case: all the docs in the collection are fetched.
If your collection contains a lot of documents and is fetched frequently, this will cost you money and, at some point, the performance of your app may degrade (longer Firestore query execution and a lot of docs transferred from the back end to the front end).
One approach to avoid that is to use pagination and fetch only the first X documents and display a link to fetch the X next ones, etc.
There is a page in the Firestore documentation dedicated to pagination implementation.
You could also implement an infinite scroll mechanism instead of a link, to load the next set of X document continuously as the user scrolls down the page. That's just a UI variation and is based on the exact same approach detailed in the Firestore documentation.
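As an illustration, one page-fetching helper with the modular SDK the question already uses might look like this (fetchPage, the "items" collection name, the "name" ordering field, and the page size of 10 are all assumptions):

```javascript
// Assumes the modular imports the question already uses, plus:
// import { collection, query, orderBy, startAfter, limit, getDocs } from "firebase/firestore";

const PAGE_SIZE = 10;

// Fetch one page of items; pass the last snapshot of the previous page
// (or null for the first page) to get the next one.
async function fetchPage(db, lastVisible) {
  const q = lastVisible
    ? query(collection(db, "items"), orderBy("name"), startAfter(lastVisible), limit(PAGE_SIZE))
    : query(collection(db, "items"), orderBy("name"), limit(PAGE_SIZE));
  const snap = await getDocs(q);
  return {
    items: snap.docs.map((d) => ({ id: d.id, ...d.data() })),
    lastVisible: snap.docs[snap.docs.length - 1] || null, // cursor for the next page
  };
}
```

Each refresh then costs at most PAGE_SIZE reads instead of N, and the same helper serves both a "load more" link and an infinite scroll handler.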
Also regarding potential major safety flaws in this setting
Fetching all the docs of a collection or using pagination does not make any difference in terms of security. In any case you should implement security according to your requirements by using security rules (and potentially Firebase Auth as well). And don't forget that security rules are not filters.

Best way to have a Node.JS server keep updated with a Firebase database in real time?

I currently have a Node.JS server set up that is able to read and write data from a Firebase database when a request is made from a user.
I would like to implement time based events that result in an action being performed at a certain date or time. The key thing here though, is that I want to have the freedom to do this in seconds (for example, write a message to console after 30 seconds have passed, or on Friday the 13th at 11:30am).
A way to do this would be to store the date/time an action needs to be performed in the database, read from the database every second, and compare the current date/time with the stored events so we know if an action needs to be performed at this moment. As you can imagine though, this would be a lot of unnecessary calls to the database and really feels like a poor way to implement this system.
Is there a way I can stay synced with the database without having to call every second? Perhaps I could then store a version of the events table locally and update this when a change is made to the database? Would that be a better idea? Is there another solution I am missing?
Any advice would be greatly appreciated, thanks!
EDIT:
How I currently initialise the database:
firebase.initializeApp(firebaseConfig);
var database = firebase.database();
How I then get data from the database:
await database.ref('/').once('value', function(snapshot){
  snapshot.forEach(function(childSnapshot){
    if(childSnapshot.key === userName){
      userPreferences = childSnapshot.val().UserPreferences;
    }
  })
});
The Firebase once() API reads the data from the database once, and then stops observing it.
If you instead use the on() API, it will continue observing the database after getting the initial value, and call your code whenever the database changes.
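Mirroring the once() snippet above, a sketch of the on() version could look like this (watchUserPreferences is a made-up wrapper name; it also returns a function to detach the listener when it's no longer needed):

```javascript
// Observe the data with on() instead of reading it once; the callback runs
// again on every change. Returns a function that detaches the listener.
function watchUserPreferences(database, userName, onChange) {
  const ref = database.ref('/');
  const listener = ref.on('value', (snapshot) => {
    snapshot.forEach((childSnapshot) => {
      if (childSnapshot.key === userName) {
        onChange(childSnapshot.val().UserPreferences);
      }
    });
  });
  return () => ref.off('value', listener);
}
```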
It sounds like you're looking to develop an application for scheduling. If that's the case you should check out node-schedule.
Node Schedule is a flexible cron-like and not-cron-like job scheduler
for Node.js. It allows you to schedule jobs (arbitrary functions) for
execution at specific dates, with optional recurrence rules. It only
uses a single timer at any given time (rather than reevaluating
upcoming jobs every second/minute).
You then can use the database to keep a "state" of the application so on start-up of the application you read all the upcoming jobs that will be expected and load them into node-schedule and let node-schedule do the rest.
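A sketch of that startup wiring (the { id, runAt } job shape is an assumption for illustration; the node-schedule module is passed in as a parameter only to keep the sketch self-contained):

```javascript
// On startup, load the pending jobs from the database and register each one;
// node-schedule then fires performAction(job.id) at the stored time.
function scheduleJobs(schedule, jobs, performAction) {
  return jobs.map((job) =>
    schedule.scheduleJob(new Date(job.runAt), () => performAction(job.id))
  );
}

// e.g. a job 30 seconds from now:
// const schedule = require('node-schedule');
// scheduleJobs(schedule, [{ id: 'reminder-1', runAt: Date.now() + 30 * 1000 }],
//              (id) => console.log('running job', id));
```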
The Google Cloud solution for scheduling a single item of future work is Cloud Tasks. Firebase is part of Google Cloud, so this is the most natural product to use. You can use this to avoid polling the database by simply specifying exactly when some Cloud Function should run to do the work you want.
I've written a blog post that demonstrates how to set up a Cloud Task to call a Cloud Functions to delete a document in Firestore with an exact TTL.

Reading Multiple Firebase Documents One By One

So in my web application, I am fetching data from Firebase; it will have about 5000 documents by the time it's ready. To reduce the number of reads, I keep an array in Firebase with a list of the ids of all the documents, which I read and then compare with the list I have saved locally. If an id is missing, I read that particular doc from Firebase (the Firebase website has definitely slowed down with an array of 3000 docs so far).
However, consider the situation where I have 2000+ docs missing and I have to fetch them one by one according to the missingList I created by comparing the Firebase and local arrays. It is extremely slow; it takes forever to fetch even a hundred documents because I am using await. If I don't, then Firebase is overloaded with requests and for some reason it shows a warning that my internet is disconnected.
What is the best way to fetch documents whose ids are given in an array, with no loss and reasonably fast? Is there a way for me to batch-read documents like there is batch write, update and set?
Say Firebase has [1234, 1235, 1236, 1237] and I only have [1234] so I need to read [1235, 1236, 1237] from it.
I am using a for loop to iterate through each id and get the corresponding document like so
for (let i = 0; i < missingList.length; i++) {
  var item = await db.collection('myCollection').doc(missingList[i]).get().then((snapshot) => snapshot.data())
  await saveToDB(item) //I use PouchDB for local storage
}
The closest thing to a 'batched read' in Firestore is a transaction, but using a transaction to read hundreds of documents is not really efficient. Instead, I suggest you add one field to every document, like token. Then, every time you get the data for the first time, generate a random token client-side and write it to every document that you read. After that, if you detect a change in the list of ids you mention above, run a query on that token field, like:
db.collection('myCollection').where('token', '!=', locallySavedToken).get()
Remember to write your local token back to the read documents. This way, your query time will be much faster. The downside is that you need 1 extra write request for every document read. If you need to re-read your data a lot, then maybe look into transactions, since write requests are priced higher than read requests.
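A sketch of that flow (syncMissingDocs is a hypothetical helper; the collection and field names match the answer above):

```javascript
// Query the docs whose token differs from the locally saved one, store them
// locally, and stamp them with the local token so they are skipped next time.
async function syncMissingDocs(db, localToken, saveToDB) {
  const snapshot = await db.collection('myCollection')
    .where('token', '!=', localToken)
    .get();

  for (const doc of snapshot.docs) {
    await saveToDB({ id: doc.id, ...doc.data() });
    await doc.ref.update({ token: localToken }); // the extra write per read doc
  }
  return snapshot.size;
}
```

Note that a '!=' query only matches documents where the token field exists, so the field has to be written when each document is first created.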

Firestore Offline Cache & Promises

This question is a follow-up on Firestore offline cache. I've read the offline cache documentation but am confused on one point.
One commenter answered in the prior question (~ year ago):
Your Android code that interacts with the database will be the same
whether you're connected or not, since the SDK simply works the same.
In the API documentation for DocumentReference's set method, I just noticed that it says:
Returns
non-null Promise containing void A promise that resolves once
the data has been successfully written to the backend. (Note that it
won't resolve while you're offline).
Emphasis mine. Wouldn't this bit in the documentation suggest that the code won't behave the same, or am I missing something? If I'm waiting on the .set() to resolve before allowing some user interaction, it sounds from this bit like I need to adjust the code for an offline case differently than I would normally.
The CollectionReference's add method worries me a bit more. It doesn't have exactly the same note but says (emphasis mine):
A Promise that resolves with a DocumentReference pointing to the newly created document after it has been written to the backend.
That is a little more vague, as it's not clear whether "backend" in this case is a superset of "cache" and "server" or whether it denotes only the server. If this one doesn't resolve, that would mean the following wouldn't work, correct?
return new Promise((resolve, reject) => {
  let ref = firestore.collection(path)
  ref.add(data)
    .then(doc => {
      resolve({ id: doc.id, data: data })
    })
    ...
})
Meaning, the .add() would not resolve, .then() would not run, and I wouldn't have access to the id of the document that was just added. I hope I'm just misunderstanding something and that my code can continue to function as-is both online and offline.
You have two concerns here, which are not really related. I'll explain both of them separately.
For the most part, developers don't typically care if the promise from a document update actually resolves or not. It's almost always "fire and forget". What would an app gain to know that the update hit the server, as long as the app behaves the same way regardless? The local cache has been updated, and all future queries will show that the document has been updated, even if the update hasn't been synchronized with the server yet.
The primary exception to this is transactions. Transactions require that the server be online, because round trips need to be made between the client and server in order to ensure that the update was atomic. Transactions simply don't work offline. If you need to know if a transaction worked, you need to be online. Unlike normal document writes, transactions don't persist in local cache. If the app is killed before the transaction completes on the server, the transaction is lost.
Your second concern is about newly added documents where the id of the document isn't defined at the time of the update. It's true that add() returns a promise that only resolves when the new document exists on the server. You can't know the id of the document until the promise delivers you the DocumentReference of the new document.
If this behavior doesn't work for you, you can generate a new id for a document by simply calling doc() with no arguments instead of add(). doc() immediately returns the DocumentReference of the new (future) document that hasn't been written (until you choose to write it). In both the case of doc() and add(), these DocumentReference objects contain unique ids generated on the client. The difference is that with doc(), you can use the id immediately, because you get a DocumentReference immediately. With add(), you can't, because the DocumentReference isn't provided until the promise resolves. If you need that new document id right now, even while offline, use doc() instead of add(). You can then use the returned DocumentReference to create the document offline, stored in the local cache, and synchronized later. The update will then return a promise that resolves when the document is actually written.
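A sketch of that doc()-based pattern (createWithKnownId is a made-up helper name):

```javascript
// doc() with no arguments generates the id client-side, so it is usable
// immediately, even while offline; set() still returns a promise that
// resolves once the write reaches the backend.
function createWithKnownId(firestore, path, data) {
  const ref = firestore.collection(path).doc();
  const pending = ref.set(data); // resolves when synchronized with the server
  return { id: ref.id, data, pending };
}
```

The caller can use the id right away and, if it ever matters, still await pending to know when the server has the document.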
