meteorjs subscribe usage when collection is huge - javascript

I don't know the best way to handle huge mongo databases with meteorjs.
In my example I have a database collection with addresses in it with the geo location. (the whole code snippets are just examples)
Example:
{
address : 'Some Street',
geoData : [lat, long]
}
Now I have a form where the user can enter an address to get the geo-data. Very simple. But the problem is, that the collection with the geo data has millions of documents in it.
In Meteor you have to publish a collection on Server side and to subscribe on Client and Server side. So my code is like this:
// Client / Server
Geodata = new Meteor.Collection('geodata');
// Server side
Meteor.publish('geodata', function(){
return Geodata.find();
});
// Client / Server
Meteor.subscribe('geodata');
Once a user has submitted the form, I look up the matching document and return it. My method is this:
// Server / Client
Meteor.methods({
getGeoData : function (address) {
return Geodata.find({address : address});
}
});
The result is correct and this works. But my question is:
What is the best way to handle a huge database like the one in my example? The problem is that Meteor stores the whole collection in the user's client-side cache once I subscribe to it. Is there a way to subscribe only to the results I need, and then replace that subscription when the user submits the form again? Or is there another good way to keep performance acceptable with a huge database used the way my example uses it?
Any ideas?

Yes, you can do something like this:
// client
Deps.autorun(function () {
// will resubscribe every time the 'center' session variable changes
Meteor.subscribe("locations", Session.get('center'));
});
// server
Meteor.publish('locations', function (centerPoint) {
// sanitize the input
check(centerPoint, { lat: Number, lng: Number });
// return a limited number of documents, relevant to our app
return Locations.find({ $near: centerPoint, $maxDistance: 500 }, { limit: 50 });
});
Your clients would ask only for a subset of the data at a time. You rarely need the entire collection; usually you need a specific subset, and you can ask the server to keep you up to date on that subset only. Bear in mind that the more distinct "publish requests" your clients make, the more work there is for your server to do, but that's how it is usually done (this is a simplified version).
Notice how we subscribe in a Deps.autorun block, which resubscribes whenever the center Session variable (which is reactive) changes. So your client can switch to a different subset of the data just by changing this variable.

When it doesn't make sense to ship your entire collection to the client, you can use methods to retrieve data from the server.
In your case, you can call the getGeoData function when the form is filled out and then display the results after the method returns. Try taking the following steps:
Clearly divide your client and server code into their respective client and server directories if you haven't already.
Remove the geodata subscription on the server (only clients can activate subscriptions).
Remove the geodata publication on the server (assuming this isn't needed anymore).
Define the getGeoData method only on the server. It should return an object, not a cursor, so use findOne instead of find.
In your form's submit event, do something like:
Meteor.call('getGeoData', address, function(err, geoData){Session.set('geoDataResult', geoData)});
You can then display the geoDataResult data in your template.
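As a rough sketch of the server-only method described in the steps above: the collection is passed in as a parameter purely so the snippet runs outside Meteor. In a real app you would pass the Geodata collection from the question and register the returned function with Meteor.methods. The helper name makeGetGeoData and the string check are illustrative assumptions, not part of the original code.

```javascript
// Hypothetical helper: builds the method body around any object that has findOne().
function makeGetGeoData(collection) {
  return function getGeoData(address) {
    // Reject anything that is not a non-empty string before querying.
    if (typeof address !== 'string' || address.trim() === '') {
      throw new Error('address must be a non-empty string');
    }
    // findOne returns a plain document (or undefined), which a method can send to the client.
    const doc = collection.findOne({ address: address });
    return doc ? doc.geoData : null;
  };
}

// In Meteor (server only), assuming the Geodata collection from the question:
// Meteor.methods({ getGeoData: makeGetGeoData(Geodata) });
```

Because only the matching geoData is returned, the client never holds more than the one result it asked for.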

Related

Would giving a response to the client while letting an asynchronous operation continue to run be a good idea?

So I need to implement an "expensive" API endpoint. Basically, the user/client would need to be able to create a "group" of existing users.
So this "create group" API would need to check that each user fulfills the criteria, i.e. all users in the same group would need to be from the same region, the same gender, within an age group etc. This operation can be quite expensive, especially since there is no limit on how many users can be in one group, so it's possible that the client requests a group of 1000 users, for example.
My idea is that the endpoint will just create an entry in the database and mark the "group" as pending while the checking process is still running. After the check completes, it will update the group status to "completed" or "error" with an error message. The client would need to periodically fetch the status while it is still pending.
My implementation idea is something along this line
const createGroup = async (req, res) => {
  const { ownerUserId, userIds } = req.body;
  // Create the group's database entry with "pending" status and get its primary key
  const groupId = await insertGroup(ownerUserId, 'pending');
  // Expensive check over the network (for example 0.5s per user id).
  // Deliberately not awaited, so it keeps running after the response is sent.
  checkUser(userIds)
    .then((isUserIdsValid) => updateGroup(groupId, isUserIdsValid ? 'success' : 'error'))
    .catch((err) => {
      console.error(err);
      // Return the promise so a failing status update is not silently dropped
      return updateGroup(groupId, 'error');
    })
    .catch((err) => console.error('Could not store the error status', err));
  // The client receives a groupId and polls a separate API to see whether the group is ready
  res.status(200).json({ groupId });
};
My question is, is it a good idea to do this? Am I missing something important that I should consider?
Yes, this is the standard approach to long-running operations. Instead of offering a createGroup API that creates and returns a group, think of it as having an addGroupCreationJob API that creates and returns a job.
Instead of polling (periodically fetching the status to check whether it's still pending), you can use a notification API (events via websocket, SSE, webhooks etc) and even subscribe to the progress of processing. But sure, a check-status API (via GET request on the job identifier) is the lowest common denominator that all kinds of clients will be able to use.
Did I not consider something important?
Failure handling is getting much more complicated. Since you no longer create the group in a single transaction, you might find your application left in some intermediate state, e.g. when the service crashed (due to unrelated things) during the checkUser() call. You'll need something to ensure that there are no pending groups in your database for which no actual creation process is running. You'll need to give users the ability to retry a job - will insertGroup work if there already is a group with the same identifier in the error state? If you separate the group and the jobs into independent entities, do you need to ensure that no two pending jobs are trying to create the same group? Last but not least you might want to allow users to cancel a currently running job.
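One way to make sure no group stays pending forever after a crash is a periodic sweep that fails jobs whose status has not changed for too long. Here is a minimal in-memory sketch, assuming each group record carries a last-update timestamp. The function name and the fields id, status, updatedAtMs and errorMessage are hypothetical, and a real service would run the equivalent as a database update.

```javascript
// Hypothetical sweep: marks groups that have been "pending" for too long as failed.
// groups   - array of { id, status, updatedAtMs } records (assumed shape)
// nowMs    - current time in milliseconds, injected so the function is easy to test
// maxAgeMs - how long a group may stay pending before it is considered abandoned
function failStalePendingGroups(groups, nowMs, maxAgeMs) {
  const failedIds = [];
  for (const group of groups) {
    const isStale = group.status === 'pending' && nowMs - group.updatedAtMs > maxAgeMs;
    if (isStale) {
      group.status = 'error';
      group.errorMessage = 'Validation did not finish; please retry';
      group.updatedAtMs = nowMs;
      failedIds.push(group.id);
    }
  }
  // Return the ids so the caller can log them or notify the owners.
  return failedIds;
}
```

The timeout has to be longer than the slowest legitimate check, otherwise the sweep would fail jobs that are still running.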

matrix-js-sdk setup and configuration

I am having some issues trying to connect to a matrix server using the matrix-js-sdk in a react app.
I have provided a simple code example below, and made sure that the credentials are valid (login works) and that the environment variable containing the URL for the matrix client is set. I have signed into Element in a browser and created two rooms for testing purposes, and was expecting these two rooms to be returned from matrixClient.getRooms(). However, this simply returns an empty array. With some further testing, it seems that only the asynchronous functions for fetching room, member and group IDs work as expected.
According to https://matrix.org/docs/guides/usage-of-the-matrix-js-sd these should be valid steps for setting up the matrix-js-sdk, however the sync is never executed either.
const matrixClient = sdk.createClient(
  process.env.REACT_APP_MATRIX_CLIENT_URL!
);
await matrixClient.login("m.login.password", credentials);
matrixClient.once('sync', () => {
  debugger; // Never hit
});
for (const room of matrixClient.getRooms()) {
  debugger; // Never hit
}
I did manage to use the room IDs returned from await matrixClient.roomInitialSync(roomId, limit, callback). However, this led me to another issue: I can't figure out how to decrypt messages, as the events containing the messages sent in the room seem to be of type 'm.room.encrypted' instead of 'm.room.message'.
Does anyone have any good examples of working implementations for the matrix-js-sdk, or any other good resources for properly understanding how to put this all together? I need to be able to load rooms, persons, messages etc. and display these respectively in a ReactJS application.
It turns out I simply forgot to run startClient on the matrix client, resulting in it not fetching any data.
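For anyone hitting the same thing, here is a small sketch of the missing step. It relies only on the client's on, removeListener, startClient and getRooms calls; the helper name startAndWaitForSync and the initialSyncLimit value are my own illustrative choices.

```javascript
// Hypothetical helper: starts the client and calls onReady once the first sync has completed.
function startAndWaitForSync(client, onReady) {
  function handleSync(state) {
    // 'PREPARED' is emitted when the initial sync has finished and rooms are available.
    if (state === 'PREPARED') {
      client.removeListener('sync', handleSync);
      onReady(client.getRooms());
    }
  }
  client.on('sync', handleSync);
  // Without this call the SDK never starts syncing, so getRooms() stays empty.
  client.startClient({ initialSyncLimit: 10 });
}

// Usage, assuming matrixClient has already logged in as in the question:
// startAndWaitForSync(matrixClient, (rooms) => console.log(rooms.length));
```

Registering the listener before calling startClient avoids missing the first sync event.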

How do I publish a piece of data and then stop reacting to it?

I want to make a homepage where several pieces of data are published, but only when the user first visits the page : one would get the latest 10 articles published but that's it - it won't keep changing.
Is there a way to make the inbuilt pub/sub mechanism turn itself off after a set amount of time or number of records, or another mechanism?
Right now I'm using a very simple setup that doesn't "turn off":
latestNews = new Mongo.Collection('latestNews');
if (Meteor.isClient) {
Meteor.subscribe("latestNews");
}
if (Meteor.isServer) {
Meteor.publish('latestNews', function() {
return latestNews.find({}, {sort: { createdAt: -1 }, limit : 10});
});
}
The pub/sub pattern as it is implemented in Meteor is all about reactive data updates. In your case that would mean if the author or last update date of an article changes then users would see this change immediately reflected on their home page.
However you want to send data once and not update it ever again.
Meteor has a built-in functionality to handle this scenario : Methods. A method is a way for the client to tell the server to execute computations and/or send pure non-reactive data.
//Server code
var lastTenArticlesOptions = {
sort : {
createdAt : -1
},
limit : 10
}
Meteor.methods({
'retrieve last ten articles' : function() {
return latestNews.find({}, lastTenArticlesOptions).fetch()
}
})
Note that contrary to publications we do not send a Mongo.Cursor! Cursors are used in publications as a handy (aka magic) way to tell the server which data to send.
Here, we are sending the data directly by fetching the cursor to get an array of articles, which is then serialized automatically with EJSON.stringify and sent to the client.
If you need to send reactive data to the client and at a later point in time to stop pushing updates, then your best bet is relying on a pub/sub temporarily, and then to manually stop the publication (server-side) or the subscription (client-side) :
Meteor.publish('last ten articles', function() {
return latestNews.find({}, lastTenArticlesOptions)
})
var subscription = Meteor.subscribe('last ten articles')
//Later...
subscription.stop()
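On the server side you would store the publication handle (this) and then call stop() on it.
Be aware that stopping a subscription typically makes the server tell the client to remove that subscription's documents from its local cache. If the articles must stay on screen afterwards, use the method approach above or copy the documents (for example with fetch()) before stopping.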

Meteor collection not updating subscription on client

I'm quite new on Meteor and Mongo and even if I don't want it, I need some relations.
I have a Collection called Feeds and another called UserFeeds where I have a feedid and a userid, and I publish the user feeds on the server like this:
Meteor.publish('feeds', function(){
return Feeds.find({_id:{$in:_.pluck(UserFeeds.find({user:this.userId}).fetch(),'feedid')}});
});
I find the user on UserFeeds, fetch it (returns an array) and pluck it to have only the feedid field, and then find those feeds on the Feeds collection.
And subscribe on the client like this:
Deps.autorun(function(){
Meteor.subscribe("feeds");
});
The problem is that when I add a new feed and a new userfeed the client doesn't receive the change, but when I refresh the page the new feed does appear.
Any idea of what I'm missing here?
Thanks.
I've run into this, too. It turns out publish functions on the server don't re-run reactively: if they return a Collection cursor, as you're doing (and as most publish functions do), then the publish function will run once and Meteor will store the cursor and send down updates only when the contents of the cursor change. The important thing here is that Meteor will not re-run the publish function, nor, therefore, the Collection.find(query), when query changes. If you want the publish function to re-run, then the way I've done it so far is to set up the publish function to receive an argument. That way the client, whose collections do update reactively, can re-subscribe reactively. The code would look something like:
// client
Meteor.subscribe('user_feeds');
Deps.autorun(function(){
var allFeeds = UserFeeds.find({user: Meteor.userId()}).fetch();
var feedIds = _.pluck(allFeeds,'feedid');
Meteor.subscribe('feeds', feedIds);
});
// server
Meteor.publish('feeds',function(feedids) {
return Feeds.find({_id: {$in: feedids}});
});
I believe the Meteorite package publish-with-relations is designed to solve this problem, although I haven't used it.
EDIT: I believe the publish function will re-run when the userId changes, which means that you can have a server-side check to make sure the user is logged in before publishing sensitive data.
I think your problem is that .fetch() which you use here…
UserFeeds.find({user:this.userId}).fetch()
…removes the reactivity.
.fetch() returns an array instead of a cursor, and that array won't be reactive.
http://docs.meteor.com/#fetch
try this ...
Meteor.autosubscribe(function(){
Meteor.subscribe("feeds");
});
and in the Template JS ...
Template.templateName.feeds = function() {
return Feeds.find(); // or any specific call
};
in the HTML ...
{{#each feeds}}
do some stuff
{{else}}
no feed
{{/each}}
You can use the reactive-publish package (I am one of the authors). It allows you to create publish endpoints which depend on the result of another query, in your case the query on UserFeeds.
Meteor.publish('feeds', function () {
this.autorun(function (computation) {
var feeds = _.pluck(UserFeeds.find({user: this.userId}, {fields: {feedid: 1}}).fetch(), 'feedid');
return Feeds.find({_id: {$in: feeds}});
});
});
The important part is that you limit the UserFeeds fields only to feedid to make sure autorun does not rerun when some other field changes in UserFeeds, a field you do not care about.

Sending extra, non-model data in a save request with backbone.js?

I'm looking for a solution for dealing with an issue of state between models using backbone.js.
I have a time tracking app where a user can start/stops jobs and it will record the time the job was worked on. I have a job model which holds the job's data and whether it is currently 'on'.
Only 1 job can be worked on at a time. So if a user starts a job the currently running job must be stopped. I'm wondering what the best solution to do this is. I mean I could simply toggle each job's 'on' parameter accordingly and then call save on each but that results in 2 requests to the server each with a complete representation of each job.
Ideally it would be great if I could piggyback additional data in the save request similarly to how it's possible to send extra data in a fetch request. I only need to send the id of the currently running job and since this really is unrelated to the model it needs to be sent alongside the model, not part of it.
Is there a good way to do this? I guess I could find a way to maintain a reference to the current job server side if need be :\
When you call the save function, the first parameter is an object of the data that's going to be saved. Instead of just calling model.save(), create an object that has the model data and your extra stuff.
inside of your method that fires off the save:
...
var data = this.model.toJSON();
data.extras = { myParam : someData };
this.model.save(data, {success: function( model, response ) {
console.log('hooray it saved: ', model, response);
}});
...
