Progress event for large Firebase queries?

Progress event for large Firebase queries? - javascript

I have the following query:
fire = new Firebase 'ME.firebaseio.com'
users = fire.child 'venues/ID/users'
users.once 'value', (snapshot) ->
# do things with snapshot.val()
...
I am loading 10+ mb of data, and the request takes around 1sec/mb. Is it possible to give the user a progress indicator as content streams in? Ideally I'd like to process the data as it comes in as well (not just notify).
I tried using the on "child_added" event instead, but it doesn't work as expected - instead of children streaming in at a consistent rate, they all come at once after the entire dataset is loaded (which takes 10-15 sec), so in practice it seems to be a less performant version of on "value".

You should be able to optimize your download time from 10-20secs to a few milliseconds by starting with some denormalization.
For example, we could move the images and any other peripherals comprising the majority of the payload to their own path, keep only the meta data (name, email, etc) in the user records, and grab the extras separately:
/users/user_id/name, email, etc...
/images/user_id/...
The number of event listeners you attach or paths you connect to does not have any significant overhead locally or for networking bandwidth (just the payload) so you can do something like this to "normalize" after grabbing the meta data:
var firebaseRef = new Firebase(URL);
firebaseRef.child('users').on('child_added', function(snap) {
console.log('got user ', snap.name());
// I chose once() here to snag the image, assuming they don't change much
// but on() would work just as well
firebaseRef.child('images/'+snap.name()).once('value', function(imageSnap) {
console.log('got image for user ', imageSnap.name());
});
});
You'll notice right away that when you move the bulk of the data out and keep only the meta info for users locally, they will be lightning-fast to grab (all of the "got user" logs will appear right away). Then the images will trickle in one at a time after this, allowing you to create progress bars or process them as they show up.
If you aren't willing to denormalize the data, there are a couple ways you could break up the loading process. Here's a simple pagination approach to grab the users in segments:
var firebaseRef = new Firebase(URL);
grabNextTen(firebaseRef, null);
function grabNextTen(ref, startAt) {
ref.limit(startAt? 11 : 10).startAt(startAt).once('value', function(snap) {
var lastEntry;
snap.forEach(function(userSnap) {
// skip the startAt() entry, which we've already processed
if( userSnap.name() === lastEntry ) { return; }
processUser(userSnap);
lastEntry = userSnap.name();
});
// setTimeout closes the call stack, allowing us to recurse
// infinitely without a maximum call stack error
setTimeout(grabNextTen.bind(null, ref, lastEntry);
});
}
function processUser(snap) {
console.log('got user', snap.name());
}
function didTenUsers(lastEntry) {
console.log('finished up to ', lastEntry);
}
A third popular approach would be to store the images in a static cloud asset like Amazon S3 and simply store the URLs in Firebase. For large data sets in the hundreds of thousands this is very economical, since those solutions are a bit cheaper than Firebase storage.
But I'd highly suggest you both read the article on denormalization and invest in that approach first.

Related

Angular: Increase Query Loading Time in Firebase Database

I have an angular app where i am querying my firebase database as below:
constructor() {
this.getData();
}
getData() {
this.projectSubscription$ = this.dataService.getAllProjects()
.pipe(
map((projects: any) =>
projects.map(sc=> ({ key: sc.key, ...sc.payload.val() }))
),
switchMap(appUsers => this.dataService.getAllAppUsers()
.pipe(
map((admins: any) =>
appUsers.map(proj =>{
const match: any = admins.find(admin => admin.key === proj.admin);
return {...proj, imgArr: this.mapObjectToArray(proj.images), adminUser: match.payload.val()}
})
)
)
)
).subscribe(res => {
this.loadingState = false;
this.projects = res.reverse();
});
}
mapObjectToArray = (obj: any) => {
const mappedDatas = [];
for (const key in obj) {
if (Object.prototype.hasOwnProperty.call(obj, key)) {
mappedDatas.push({ ...obj[key], id: key });
}
}
return mappedDatas;
};
And here is what I am querying inside dataService:
getAllProjects() {
return this.afDatabase.list('/projects/', ref=>ref.orderByChild('createdAt')).snapshotChanges();
}
getAllAppUsers() {
return this.afDatabase.list('/appUsers/', ref=>ref.orderByChild('name')).snapshotChanges();
}
The problem I am facing with this is I have 400 rows of data which I am trying to load and it is taking around 30seconds to load which is insanely high. Any idea how can I query this in a faster time?

We have no way to know whether the 30s is reasonable, as that depends on the amount of data loaded, the connection latency and bandwidth of the client, and more factors we can't know/control.
But one thing to keep in mind is that you're performing 400 queries to get the users of each individual app, which is likely not great for performance.
Things you could consider:
Pre-load all the users once, and then use that list for each project.
Duplicate the name of each user into each project, so that you don't need to join any data at all.
If you come from a background in relational databases the latter may be counterintuitive, but it is actually very common in NoSQL data modeling and is one of the reasons NoSQL databases scale so well.

I propose 3 solutions.
1. Pagination
Instead of returning all those documents on app load, limit them to just 10 and keep record of the last one. Then display the 10 (or any arbitrary base number)
Then make the UI in such a way that the user has to click next or when the user scrolls, you fetch the next set based on the previous last document's field's info.
I'm supposing you need to display all the fetched data in some table or list so having the UI paginate the data should make sense.
2. Loader
Show some loader UI on website load. Then when all the documents have fetched, you hide the loader and show the data as you want. You can use some custom stuff for loader, or choose from any of the abundant libraries out there, or use mat-progress-spinner from Angular Material
3. onCall Cloud Function
What if you try getting them through an onCall cloud function? It night be faster because it's just one request that the app will make and Firebase's Cloud Functions are very fast within Google's data centers.
Given that the user's network might be slow to iterate the documents but the cloud function will return all at once and that might give you what you want.
I guess you could go for this option only if you really really need to display all that data at once on website load.
... Note on cost
Fetching 400 or more documents every time a given website loads might be expensive. It'll be expensive if the website is visited very frequently by very many users. Firebase cost will increase as you are charged per document read too.
Check to see if you could optimise the data structure to avoid fetching this much.
This doesn't apply to you if this some admin dashboard or if fetching all users like this is done rarely making cost to not be high in that case.

Synchronize critical section in API for each user in JavaScript

I wanted to swap a profile picture of a user. For this, I have to check the database to see if a picture has already been saved, if so, it should be deleted. Then the new one should be saved and entered into the database.
Here is a simplified (pseudo) code of that:
async function changePic(user, file) {
// remove old pic
if (await database.hasPic(user)) {
let oldPath = await database.getPicOfUser(user);
filesystem.remove(oldPath);
}
// save new pic
let path = "some/new/generated/path.png";
file = await Image.modify(file);
await Promise.all([
filesystem.save(path, file),
database.saveThatUserHasNewPic(user, path)
]);
return "I'm done!";
}
I ran into the following problem with it:
If the user calls the API twice in a short time, serious errors occur. The database queries and the functions in between are asynchronous, causing that the changes of the first API call weren't applied when the second API checks for a profile pic to delete. So I'm left with a filesystem.remove request for an already unexisting file and an unremoved image in the filesystem.
I would like to safely handle that situation by synchronizing this critical section of code. I don't want to reject requests only because the server hasn't finished the previous one and I also want to synchronize it for each user, so users aren't bothered by the actions of other users.
Is there a clean way to achieve this in JavaScript? Some sort of monitor like you know it from Java would be nice.

You could use a library like p-limit to control your concurrency. Use a map to track the active/pending requests for each user. Use their ID (which I assume exists) as the key and the limit instance as the value:
const pLimit = require('p-limit');
const limits = new Map();
function changePic(user, file) {
async function impl(user, file) {
// your implementation from above
}
const { id } = user // or similar to distinguish them
if (!limits.has(id)) {
limits.set(id, pLimit(1)); // only one active request per user
}
const limit = limits.get(id);
return limit(impl, user, file); // schedule impl for execution
}
// TODO clean up limits to prevent memory leak?

Handling large data sets on client side

I'm trying to build an application that uses Server Sent Events in order to fetch and show some tweets (latest 50- 100 tweets) on UI.
Url for SSE:
https://tweet-service.herokuapp.com/stream
Problem(s):
My UI is becoming unresponsive because there is a huge data that's coming in!
How do I make sure My UI is responsive? What strategies should I usually adopt in making sure I'm handling the data?
Current Setup: (For better clarity on what I'm trying to achieve)
Currently I have a Max-Heap that has a custom comparator to show latest 50 tweets.
Everytime there's a change, I am re-rendering the page with new max-heap data.

We should not keep the EventSource open, since this will block the main thread if too many messages are sent in a short amount of time. Instead, we only should keep the event source open for as long as it takes to get 50-100 tweets. For example:
function getLatestTweets(limit) {
return new Promise((resolve, reject) => {
let items = [];
let source = new EventSource('https://tweet-service.herokuapp.com/stream');
source.onmessage = ({data}) => {
if (limit-- > 0) {
items.push(JSON.parse(data));
} else {
// resolve this promise once we have reached the specified limit
resolve(items);
source.close();
}
}
});
}
getLatestTweets(100).then(e => console.log(e))
You can then compare these tweets to previously fetched tweets to figure out which ones are new, and then update the UI accordingly. You can use setInterval to call this function periodically to fetch the latest tweets.

Efficient DB design with PouchDB/CouchDB

So I was reading a lot about how to actually store and fetch data in an efficient way. Basically my application is about time management/capturing for projects. I am very happy for any opinion on which strategy I should use or even suggestions for other strategies. The main concern is about the limited resources for local storage on the different Browsers.
This is the main data I have to store:
db_projects: This is a database where the projects itself are stored.
db_timestamps: Here go the timestamps per project whenever a project is running.
I came up with the following strategies:
1: Storing the status of the project in the timestamps
When a project is started, there is addad a timestamp to db_timestamps like so:
db_timestamps.put({
_id: String(Date.now()),
title: projectID,
status: status //could be: 1=active/2=inactive/3=paused
})...
This follows the strategy to only add data to the db and not modify any entries. The problem I see here is that if I want to get all active projects for example, I would need to query the whole db_timestamp which can contain thousands of entries. Since I can not use the ID to search all active projects, this could result in a quite heavy DB query.
2: Storing the status of the project in db_projects
Each time a project changes it's status, there is a update to the project itself. So the "get all active projects"-query would be much resource friendly, since there are a lot less projects than timestamps. But this would also mean that each time a status change happens, the project entry would be revisioned and therefor would produce "a lot" of overhead. I'm also not sure if the compaction feature would do a good job, since not all revision data is deleted (the documents are, but the leaf revisions not). This means for a state change we have at least the _rev information which is still a string of 34 chars for changing only the status (1 char). Or can I delete the leaf revisions after conflict resolution?
3: Storing the status in a separate DB like db_status
This leads to the same problem as in #2 since status changes lead to revisions on this DB. Or if the states would be added in "only add data"-mode (like in #1), it would just quickly fill with entries.

The general problem is that you have a limited amount of space that you could put into indexedDB. On the other hand the principle of ChouchDB is that storage space is cheap (which it is indeed true when you store on the server side only). Here an interesting discussion about that.
So this is the solution that I use for now. I am using a mix between solution 1 and solution 2 from above with the following additions:
Storing only the timesamps in a synced Database (db_timestamps) with the "only add data" principle.
Storing the projects and their states in a separate local (not
synced) database (db_projects). Therefor I still use pouchDB since
it has a lot simpler API than indexedDB.
Storing the new/changed
project status in each timestamp aswell (so you could rebuild db_projects
out of db_timestams if needed)
Deleting db_projects every so often and repopulate it, so the
revision data (overhead for this db in my case) is eliminated and the size is acceptable.
I use the following code to rebuild my DB:
//--------------------------------------------------------------------
function rebuild_db_project(){
db_project.allDocs({
include_docs: true,
//attachments: true
}).then(function (result) {
// do stuff
console.log('I have read the DB and delete it now...');
deleteDB('db_project', '_pouch_DB_Projekte');
return result;
}).then(function (result) {
console.log('Creating the new DB...'+result);
db_project = new PouchDB('DB_Projekte');
var dbContentArray = [];
for (var row in result.rows) {
delete result.rows[row].doc._rev; //delete the revision of the doc. else it would raise an error on the bulkDocs() operation
dbContentArray.push(result.rows[row].doc);
}
return db_project.bulkDocs(dbContentArray);
}).then(function(response){
console.log('I have successfully populated the DB with: '+JSON.stringify(response));
}).catch(function (err) {
console.log(err);
});
}
//--------------------------------------------------------------------
function deleteDB(PouchDB_Name, IndexedDB_Name){
console.log('DELETE');
new PouchDB(PouchDB_Name).destroy().then(function () {
// database destroyed
console.log("pouchDB destroyed.");
}).catch(function (err) {
// error occurred
});
var DBDeleteRequest = window.indexedDB.deleteDatabase(IndexedDB_Name);
DBDeleteRequest.onerror = function(event) {
console.log("Error deleting database.");
};
DBDeleteRequest.onsuccess = function(event) {
console.log("IndexedDB deleted successfully");
console.log(request.result); // should be null
};
}
So I not only use the pouchDB.destroy() command but also the indexedDB.deleteDatabase() command to get the storage freed nearly completely (there is still some 4kB that are not freed, but this is insignificant to me.)
The timings are not really proper but it works for me. I'm happy if somone has an idea to make the timing work properly (The problem for me is that indexedDB does not support promises).

Self-triggered perpetually running Firebase process using NodeJS

I have a set of records that I would like to update sequentially in perpetuity. Basically:
Get least recently updated record
Update record
Set date of record to now (aka. send it to the back of the list)
Back to step 1
Here is what I was thinking using Firebase:
// update record function
var updateRecord = function() {
// get least recently updated record
firebaseOOO.limit(1).once('value', function(snapshot) {
key = _.keys(snapshot.val())[0];
/*
* do 1-5 seconds of non-Firebase processing here
*/
snapshot.ref().child(key).transaction(
// update record
function(data) {
return updatedData;
},
// update priority after commit (would like to do it in transaction)
function(error, committed, snap2) {
snap2.ref().setPriority(snap2.dateUpdated);
}
);
});
};
// listen whenever priority changes (aka. new item needs processing)
firebaseOOO.on('child_moved', function(snapshot) {
updateRecord();
});
// kick off the whole thing
updateRecord();
Is this a reasonable thing to do?

In general, this type of daemon is precisely what was envisioned for use with the Firebase NodeJS client. So, the approach looks good.
However, in the on() call it looks like you're dropping the snapshot that's being passed in on the floor. This might be application specific to what you're doing, but it would be more efficient to consume that snapshot in relation to the once() that happens in the updateRecord().

Develop Reference

JavaScript is the programming language of the Web.