We use the ndb Datastore library in our current Python 2.7 standard environment. We are migrating this application to the Python 3.7 standard environment with Firestore (Native mode).
We paginate our ndb Datastore queries and construct our query with fetch_page:
query_results, next_curs, more_flag = query_structure.fetch_page(10)
The next_curs and more_flag are very useful for indicating whether there is more data to fetch after the current query (which fetches 10 elements). We use them to tell the front end whether to show "Next Page" / "Previous Page".
We can't find an equivalent of this in Firestore. Can someone help us achieve this?
There is no direct equivalent in Firestore pagination. What you can do instead is fetch one more document than the N documents that the page requires, then use the presence of the N+1 document to determine if there is "more". You would omit the N+1 document from the displayed page, then start the next page at that N+1 document.
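A minimal sketch of that N+1 approach with the google-cloud-firestore Python client (the collection name and the 'id' ordering field are placeholders; cursors need a deterministic order, hence the order_by()):

from google.cloud import firestore

db = firestore.Client()
PAGE_SIZE = 10

def fetch_page(query, cursor_snapshot=None):
    """Mimic ndb's fetch_page(): return (results, next_cursor, more_flag)."""
    q = query.limit(PAGE_SIZE + 1)        # ask for one extra document
    if cursor_snapshot is not None:
        q = q.start_at(cursor_snapshot)   # the cursor doc begins this page
    docs = list(q.stream())
    more = len(docs) > PAGE_SIZE          # an N+1th doc means there is more
    next_cursor = docs[PAGE_SIZE] if more else None
    return docs[:PAGE_SIZE], next_cursor, more

query = db.collection('my-collection').order_by('id')
results, next_curs, more_flag = fetch_page(query)
if more_flag:
    results, next_curs, more_flag = fetch_page(query, next_curs)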
I built a custom Firestore API not long ago to fetch records with pagination. You can take a look at the repository. This is the story of the learning cycle I went through:
My first attempt was to use limit and offset. This seemed to work like a charm, but I then ran into the issue that fetching something like 200,000 records ended up being very costly: when you use offset, Google also charges you for reads on all the records skipped before it. The Google Firestore pricing page clearly states this:
There are no additional costs for using cursors, page tokens, and limits. In fact, these features can help you save money by reading only the documents that you actually need.

However, when you send a query that includes an offset, you are charged a read for each skipped document. For example, if your query uses an offset of 10, and the query returns 1 document, you are charged for 11 reads. Because of this additional cost, you should use cursors instead of offsets whenever possible.
My second attempt was using a cursor to minimize those reads. I ended up fetching N+1 documents and placing the cursor like so:
from google.cloud import firestore

db = firestore.Client()

collection = 'my-collection'
cursor = 'we3adoipjcjweoijfec93r04'  # document id of the N+1th doc

# Fetch the cursor document itself, then start the next page at it.
q = db.collection(collection)
snapshot = db.collection(collection).document(cursor).get()
q = q.start_at(snapshot)  # place the cursor at this document
docs = q.stream()
Google wrote a whole page on pagination in Firestore. Some useful query methods when implementing pagination (combined in the sketch after this list):
limit() limits the query to a fixed number of documents.
start_at() includes the cursor document.
start_after() starts right after the cursor document.
order_by() ensures all documents are ordered by a specific field.
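For example, a first page and the page right after it could be built like this (a sketch; 'my-collection' and the 'id' field are again placeholders):

from google.cloud import firestore

db = firestore.Client()

# Page 1: a fixed order plus a limit.
first_page = db.collection('my-collection').order_by('id').limit(10)
page_one = list(first_page.stream())

# Page 2: start right after the last document of page 1.
last_doc = page_one[-1]
second_page = (db.collection('my-collection')
               .order_by('id')
               .start_after(last_doc)   # excludes the cursor document itself
               .limit(10))
page_two = list(second_page.stream())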
Related
I'm using Firestore with the web JavaScript SDK.
Assume the following schema:
User Doc -> Friends collection
I want to know when someone changes/removes/adds data in it.
So what I wrote is something like this:
friendsCollectionRef.onSnapshot(snapshot => {
  snapshot.docChanges().forEach(change => {
    onChange(change);
  });
});
The problem is that whenever I refresh the page, it keeps calling onChange with data that was updated in my last session...
Is there a way to get only NEW data, rather than retroactive data?
I would like to avoid storing a "LastUpdate" field on everything.
Filtering, of course, should not be done client-side, because then I pay for network traffic I'm never going to use...
So storing a boolean isFirstCall is out of the question.
As explained in the doc, when you listen to multiple documents in a collection with the onSnapshot() method:
The first query snapshot contains added events for all existing documents that match the query. This is because you're getting a set of changes that bring your query snapshot current with the initial state of the query.
So each time you refresh your page, you call the onSnapshot() method again "from scratch", and therefore you get a first query snapshot containing all the collection's docs.
In other words, I think you will have to implement your own "home-made" mechanism to get only the documents you want (probably a "LastUpdate" field...).
You may be interested in this SO answer, which shows how to add a createdAt timestamp to a Firestore document via a Cloud Function. You could easily adapt it to record the last update. It would be more complicated if you want to detect the documents that were deleted since the last fetch.
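If you do go the "LastUpdate" route, the listener only needs a where() clause so the server sends just the documents touched after your previous session. A sketch, written with the Python server client for consistency with the rest of this compilation (the web SDK's where() works the same way; the lastUpdate field and how last_session is persisted are assumptions):

from datetime import datetime, timezone
from google.cloud import firestore

db = firestore.Client()

# Assumption: every friend doc carries a 'lastUpdate' timestamp, and the
# client persisted the time of its previous session somewhere local.
last_session = datetime(2020, 1, 1, tzinfo=timezone.utc)

friends_ref = db.collection('users').document('some-user').collection('friends')
query = friends_ref.where('lastUpdate', '>', last_session)

def on_change(col_snapshot, changes, read_time):
    for change in changes:
        # Only documents updated after last_session arrive here, so the
        # first snapshot no longer replays the whole collection.
        print(change.type.name, change.document.id)

watch = query.on_snapshot(on_change)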
Is there any way to get all document IDs in one shot (as a single read) from a collection (or subcollection) with the Cloud Firestore JavaScript web library? I don't want to read their content; I only need the IDs. As far as I could monitor in Chrome, var docs = (await firebase.firestore().collection("collection-path").get()).docs.map(x => x.id) retrieves the content as well. As I am talking about a large collection, it takes quite a while, and this is not an option even if it would count as a single read operation.
There is no such API in the provided client libraries. You have to query for all the documents and read them, even if you just want their IDs.
To import my CRM data into Google Analytics (GA), I linked my users' User ID with the Client ID in GA.
For this, I used the following code from the GA documentation:
ga('set', 'userId', '432432');
Over time, the format of the User IDs on my website has changed: hashes are now used instead of numbers.
Can I now use the same code above, but with the new identifiers of my users, to send User IDs to GA without damaging the current analytics?
In short, can I override the current User IDs in GA so that one user is not identified by the GA system as two different people?
You can't overwrite the historical data already processed by Google Analytics, only that of the current day.
You could apply the new algorithm only to new users of the CRM, from a given ID on, leaving the same encoding (numbers) for the previous ones (users already processed by Analytics).
If you have a mapping table between the new and old IDs, there is a solution.
You need to take the historical data out of GA. (You can use the paid service from Scitylana for this; you get the data in BigQuery or in S3.)
Then you need a copy of the new_id/old_id mapping table in the database where you put the data exported from Scitylana.
Since you can no longer rely on the ga:userType variable (new/returning), you need to create a query that recalculates it using the consolidated new IDs.
This can all be set up in a flow that updates nightly.
But you need to analyze via SQL or use a dashboard tool like Power BI, Data Studio, Tableau etc.
Since the data from Scitylana is hit-level, you can calculate everything correctly; there is no need to worry about double aggregation, etc.
(I work at Scitylana)
Here's my setup:
I'm running a Node.js Web App in Azure, which is using Azure Table Storage (Non-SQL). To work with table storage I'm using the azure-storage npm module.
What I'm trying to do:
So I have a system that's tracking events for devices. In storage I'm setting my PartitionKey to be the deviceId and I'm setting the RowKey to be the eventId.
Adding events is straight forward; add them one at a time as they occur.
Retrieving them is easy using the query structure.
However, deleting large quantities of entries seems to be a pain. It appears you can only delete one entity at a time; there doesn't seem to be a query-based implementation.
There is the option to use batches to create a large batch of delete operations; but I've just found that there is a cap of 100 operations per batch.
So I'm trying to delete all events for a single device; in my current case I have about 5000 events. To achieve this I first have to query all my events with a GET request (paging through them using continuation tokens), then separate them into batches of 100, and then send 50 large requests in order to delete all the entries...
The same thing in SQL would be DELETE FROM events WHERE deviceId = 'xxxxxxxx'.
Surely there must be a better way than this!
Sadly, there isn't :). You must fetch the entities based on your requirement and then delete them (either in batches or individually).
You can, however, optimize the fetching process by fetching only PartitionKey and RowKey from your table instead of all attributes, since these two are all you need to delete an entity.
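To make the flow concrete, here is a sketch of "delete everything in one partition" using Python's azure-data-tables package for illustration (connection string and table name are placeholders; the Node azure-storage module's query/batch calls follow roughly the same shape):

from azure.data.tables import TableClient

table = TableClient.from_connection_string("<connection-string>",
                                           table_name="events")
device_id = "xxxxxxxx"

# Fetch only the keys; the iterator follows continuation tokens for you.
entities = table.query_entities(
    query_filter="PartitionKey eq @device",
    parameters={"device": device_id},
    select=["PartitionKey", "RowKey"],
)

# Batches are capped at 100 operations and must stay within one partition,
# which holds here because PartitionKey is the device id.
batch = []
for entity in entities:
    batch.append(("delete", entity))
    if len(batch) == 100:
        table.submit_transaction(batch)
        batch = []
if batch:
    table.submit_transaction(batch)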
How can one make one global database query (with PHP) and then use all the output (multiple rows) in various places on a webpage?
I read that JavaScript calls can be made for each specific field on a webpage that needs data, but this is inefficient performance-wise.
The webpage would contain a sort of table of contents with version numbers next to each entry. Those version numbers are stored in the database, and calling the database 20 times for the 20 different fields would be inefficient.
Any suggestions on how to run, say, a PHP query once when the page loads and then use the different outputs at different locations later in the page?
QUESTION UPDATE WITH EXAMPLE OF DESIRED OUTPUT:
The webpage should show the following output:
Document Name        Document Version
DEPT A DOCS:
  Doc ABC            1.2
    - Description of doc
  Doc another doc    2.3
    - Description of doc
DEPT B DOCS:
  Yet another doc    0.9
    - Description of doc
  Doc XYZ            3.0
    - Description of doc
Each document has its own version associated with it. Each document has its own table inside the database with its associated version, and this can be queried via a Postgres function or view. I wish to query this function or view only once and then display the results in a sort of table-of-contents style (or table-like view) on the webpage.
Make the query in a separate PHP page that is included in all the pages you want to use the information on.
At the beginning of your page, make one database query to get the data for all versions.
Using PHP, split the result into an associative array with the version number (or document name) as the key.
Then, in the different sections of your page, just index into that array and output the data the way you need; it will be the data for that version.
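A minimal sketch of that query-once-then-index pattern, in Python with psycopg2 for illustration since the PHP associative array is built the same way (table and column names are assumptions):

import psycopg2

conn = psycopg2.connect("dbname=docs")  # placeholder connection string

# One query, run once at the top of the page, for every document's
# name, version, and description.
with conn.cursor() as cur:
    cur.execute("SELECT name, version, description FROM document_versions")
    rows = cur.fetchall()

# Index the rows once, keyed by document name.
versions = {name: (version, description) for name, version, description in rows}

# Later, anywhere on the page, no further queries are needed:
print(versions["Doc ABC"][0])   # -> 1.2
print(versions["Doc XYZ"][0])   # -> 3.0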