Mongoose get sum of fields - javascript

I'm trying to track the bandwidth usage of a user based upon two mongoose schemas. I have a user and an image schema, where a user has many images. My image schema looks like this:
image = {
creator: 'ObjectId of user',
size: '12345', //kb
uploadedTo:[{}]
}
Essentially I want to create a query that will get all images that belong to a user via the image.creator property. I would then multiply the image.size property by image.uploadedTo.length value to get the total bandwidth used.
For example: If a user has 5 images, each image is 5,000kb and is uploaded to 3 services each, the total bandwidth for the user would be 75,000kb (5*5,000*3).
Is this query possible strictly through mongoose, or would I have to just get the user's images and then use regular javascript to get the total bandwidth?

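You'll want to use the aggregation pipeline. A field computed in a $project stage cannot be referenced by another expression in that same stage, so the multiplication repeats the $size expression. The basic projection might look like this:
{
  $project: {
    size: 1,
    number_of_uploads: { $size: "$uploadedTo" },
    // number_of_uploads is not visible inside this stage,
    // so $size is applied to the array again here
    total_bandwidth: {
      $multiply: [ "$size", { $size: "$uploadedTo" } ]
    }
  }
}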
For an image whose size is stored as a number ($multiply only accepts numeric values, not strings), you'd get a new document that looks like:
{
size: 1234,
number_of_uploads: 2,
total_bandwidth: 2468
}
You'll need to integrate that with Mongoose's aggregate helper.
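A minimal sketch of how that projection could be turned into a per-user total and handed to Mongoose. It assumes `size` is stored as a number and `creator` as an ObjectId; `Image` and `buildBandwidthPipeline` are hypothetical names that do not appear in the question.

```javascript
// Builds an aggregation pipeline that sums size * uploadedTo.length
// over all images belonging to one creator.
function buildBandwidthPipeline(creatorId) {
  return [
    { $match: { creator: creatorId } },
    {
      $group: {
        _id: '$creator',
        total_bandwidth: {
          $sum: { $multiply: ['$size', { $size: '$uploadedTo' }] }
        }
      }
    }
  ];
}

// Hypothetical usage with a Mongoose model named Image:
//   Image.aggregate(buildBandwidthPipeline(user._id)).exec()
//     .then(function (rows) { /* rows[0].total_bandwidth */ });
console.log(JSON.stringify(buildBandwidthPipeline('someCreatorId'), null, 2));
```

As far as I know, Mongoose does not cast values inside aggregation pipelines, so `creatorId` should already be an ObjectId rather than its string form.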
If you're using MongoDB 3.2 or later, you can also use $lookup (which is basically a join operation) as part of your pipeline to look up the creator._id, and then run a $sum operation on all of the images (you'll probably $group by that creator ID). The benefit of this is that your application server doesn't do any work; the lookups and operations happen inside MongoDB itself.
If you're not using v3.2, you can leverage Mongoose's population to look up (on your own server) the creator ID for you, and then use JavaScript on your own server to calculate the sum.
It's a bit difficult for me to come up with what exactly your pipeline will look like since I don't have a sample dataset to play with, but the above tools should be all that you need.
Additional operation resources
$size
$multiply
(P.S. you're probably looking at this like "WTF?". Sometimes it's easier to just do the calculations yourself and "use regular javascript to get the total bandwidth", as you mentioned. Both solutions will work, it just depends on where you want to put the load - whether on the MongoDB server or on your server - and how many round-trips you want to make.)
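For comparison, the "regular javascript" route can be sketched as below. It assumes the user's images have already been loaded into an array (for example via a Mongoose `find` on `creator`); `totalBandwidth` is a hypothetical helper name.

```javascript
// Sums size * uploadedTo.length over an array of image documents.
function totalBandwidth(images) {
  return images.reduce(function (total, image) {
    // Number() tolerates sizes stored as strings such as '12345'.
    return total + Number(image.size) * image.uploadedTo.length;
  }, 0);
}

// The example from the question: 5 images of 5,000 kb, 3 services each.
var images = [];
for (var i = 0; i < 5; i++) {
  images.push({ size: '5000', uploadedTo: [{}, {}, {}] });
}
console.log(totalBandwidth(images)); // 75000
```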

Related

In sails/waterline get maximum value of a column in a database agnostic way

While using Sails (version 1.0) with its Waterline ORM, I noticed that there are functions called Model.avg and Model.sum. However, there is no maximum or minimum function to get the maximum or minimum of a column in a model. Is that because it is already covered by other functions?
Now in my database I need to get the "maximum id" in a list; and I have it working for postgresql by using a native query:
const maxnum = await Order.getDatastore().sendNativeQuery('SELECT MAX(\"orderNr\") FROM \"order\"')
While this isn't the most difficult thing, it is not what I truly want: it is limited to only sql-based datastores (so we wouldn't be able to move easily to mongodb); and the syntax might actually be even different for another sql database type.
So I wonder - can this be transformed in such a way it doesn't rely on sendNativeQuery?
You can try .query() to execute a raw SQL query using the specified model's datastore. If you want, you can also try pg, an NPM package for communicating with PostgreSQL databases:
Pet.query('SELECT pet.name FROM pet WHERE pet.name = $1', [ 'dog' ], function(err, rawResult) {
  if (err) { return res.serverError(err); }
  sails.log(rawResult);
  // (result format depends on the SQL query that was passed in, and
  // the adapter you're using)
  // Then parse the raw result and do whatever you like with it.
  return res.ok();
});
You can use the limit and sort options Waterline provides to get a single record with the maximal value (then just extract that value).
const orders = await Order.find({
  where: {},
  select: ['orderNr'],
  limit: 1,
  sort: 'orderNr DESC'
});
// find() resolves to an array, even with limit: 1
const maxOrderNr = orders.length > 0 ? orders[0].orderNr : undefined;
console.log(maxOrderNr);
Like most things in Waterline, it's probably not as efficient as an SQL SELECT MAX query (or some equivalent in mongo, etc), but it should allow swapping out the database with no maintenance. Last note, don't forget to handle the case of no models found.
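The "no models found" case can be sketched as a small helper. `firstValueOrNull` is a hypothetical name, and the sketch assumes only that `find()` resolves to an array of records.

```javascript
// Returns the value of `field` from the first row, or null when the
// query matched nothing. `rows` is the array that find() resolves to.
function firstValueOrNull(rows, field) {
  if (!Array.isArray(rows) || rows.length === 0) {
    return null;
  }
  return rows[0][field];
}

// Hypothetical usage inside an async function:
//   const rows = await Order.find({ where: {}, select: ['orderNr'], sort: 'orderNr DESC', limit: 1 });
//   const maxOrderNr = firstValueOrNull(rows, 'orderNr');
console.log(firstValueOrNull([{ orderNr: 42 }], 'orderNr')); // 42
console.log(firstValueOrNull([], 'orderNr'));                // null
```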

Using Meteor publish-with-relations package where each join cannot use the _id field

I am working to solve a problem not dissimilar to the one discussed in the following blog post: I wish to publish two related data sets in Meteor, with a 'reactive join' on the server side.
https://www.discovermeteor.com/blog/reactive-joins-in-meteor/
Unfortunately for me, however, the related collection I wish to join to will not be joined using the "_id" field, but using another field. Normally in mongo and meteor I would create a 'filter' block where I could specify this query. However, as far as I can tell, the PWR package implicitly assumes a join on '_id'.
If you review the example given on the 'publish-with-relations' github page (see below) you can see that both posts and comments are being joined to the Meteor.users '_id' field. But what if we needed to join to the Meteor.users 'address' field ?
https://github.com/svasva/meteor-publish-with-relations
In the short term I have specified my query 'upside down' (as luckily I'm able to use the _id field when doing a reverse join), but I suspect this will result in an inefficient query as the datasets grow, so I would rather be able to do a join in the direction planned.
The two collections we are joining can be thought of as like a conversation topic/header record, and a conversation message collection (i.e. one entry in the collection for each message in the conversation).
The conversation topic in my solution is using the _id field to join, the conversation messages have a "conversationKey" field to join with.
The following call works, but this is querying from the messages to the conversation, instead of vice versa, which would be more natural.
Meteor.publishWithRelations({
handle: this,
collection: conversationMessages,
filter: { "conversationKey" : requestedKey },
options : {sort: {msgTime: -1}},
mappings: [{
//reverse: true,
key: 'conversationKey',
collection: conversationTopics,
filter: { startTime: { $gt : (new Date().getTime() - aLongTimeAgo ) } },
options: {
sort: { createdAt: -1 }
},
}]
});
Can you do a join without an _id?
No, not with PWR. Joining with a foreign key which is the id in another table/collection is nearly always how relational data is queried. PWR is making that assumption to reduce the complexity of an already tricky implementation.
How can this publish be improved?
You don't actually need a reactive join here because one query does not depend on the result of another. It would if each conversation topic held an array of conversation message ids. Because both collections can be queried independently, you can return an array of cursors instead:
Meteor.publish('conversations', function(requestedKey) {
  check(requestedKey, String);
  var aLongTimeAgo = 864000000;
  // find() takes (selector, options), so the startTime condition
  // belongs in the selector next to the _id
  var topicSelector = {
    _id: requestedKey,
    startTime: {$gt: new Date().getTime() - aLongTimeAgo}
  };
  return [
    conversationMessages.find({conversationKey: requestedKey}),
    conversationTopics.find(topicSelector)
  ];
});
Notes
Sorting in your publish function isn't useful unless you are using a limit.
Be sure to use a forked version of PWR like this one which includes Tom's memory leak fix.
Instead of conversationKey I would call it conversationTopicId to be more clear.
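To make the two independent queries concrete, here is a sketch of the selectors each cursor would use. It assumes topics are keyed by `_id` and that `startTime` is stored as a millisecond timestamp, as in the question; `buildConversationSelectors` is a hypothetical helper.

```javascript
// Builds the selectors for the two independent cursors.
// `now` and `maxAgeMs` are parameters only to keep the sketch testable.
function buildConversationSelectors(requestedKey, now, maxAgeMs) {
  return {
    messages: { conversationKey: requestedKey },
    topics: { _id: requestedKey, startTime: { $gt: now - maxAgeMs } }
  };
}

// Hypothetical use inside Meteor.publish:
//   var s = buildConversationSelectors(requestedKey, Date.now(), 864000000);
//   return [conversationMessages.find(s.messages), conversationTopics.find(s.topics)];
var selectors = buildConversationSelectors('topic1', 1000000000000, 864000000);
console.log(selectors.topics.startTime.$gt); // 999136000000
```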
I think this could now be solved much more easily with the reactive-publish package (I am one of the authors). You can make any query inside an autorun and then use its results to publish the query you want to push to the client. I would write you some example code, but I do not really understand what exactly you need. For example, you mention you would like to limit topics, but you do not explain why they would be limited if you are providing requestedKey, which is the ID of a document anyway. So only one result is available?

Angular.js accessing and displaying nested models efficiently

I'm building a site at the moment where there are many relational links between data. As an example, users can make bookings, which will have booker and bookee, along with an array of messages which can be attached to a booking.
An example json would be...
booking = {
id: 1,
location: 'POST CDE',
desc: "Awesome stackoverflow description.",
booker: {
id: 1, fname: 'Lawrence', lname: 'Jones',
},
bookee: {
id: 2, fname: 'Stack', lname: 'Overflow',
},
messages: [
{ id: 1, mssg: 'For illustration only' }
]
}
Now my question is, how would you model this data in your angular app? And, while very much related, how would you pull it from the server?
As I can see it I have a few options.
Pull everything from the server at once
Here I would rely on the server to serialize the nested data and just use the given json object. Downsides are that I don't know what users will be involved when requesting a booking or similar object, so I can't cache them and I'll therefore be pulling a large chunk of data every time I request.
Pull the booking with booker/bookee as user ids
For this I would use promises for my data models, and have the server return an object such as...
booking = {
id: 1,
location: 'POST CDE',
desc: "Awesome stackoverflow description.",
booker: 1, bookee: 2,
messages: [1]
}
Which I would then pass to a Booking constructor, which would resolve the relevant (booker,bookee and message) ids into data objects via their respective factories.
The disadvantages here are that many ajax requests are used for a single booking request, though it gives me the ability to cache user/message information.
In summary, is it better practise to rely on a single ajax request to collect all the nested information at once, or rely on various requests to 'flesh out' the initial response after the fact.
I'm using Rails 4 if that helps (maybe Rails would be more suited to a single request?)
I'm going to use a system where I can hopefully have the best of both worlds, by creating a base class for all my resources that will be given a custom resolve function, that will know what fields in that particular class may require resolving. A sample resource function would look like this...
class Booking
  # other methods...
  resolve: ->
    booking = this
    User
      .query(booking.booker, booking.bookee)
      .then (users) ->
        [booking.booker, booking.bookee] = users
Where it will pass the value of the booker and bookee fields to the User factory, which will have a constructor like so...
class User
  # other methods
  constructor: (data) ->
    user = this
    if not isNaN(id = parseInt data, 10)
      User.get(data).then (data) ->
        angular.extend user, data
    else angular.extend this, data
If I have passed the User constructor a value that can be parsed into a number (so this will happily take string ids as well as numerical ones), then it will use the User factory's get function to retrieve the data from the server (or through a caching system; the implementation is inside the get function itself). If, however, the parsed value is NaN, then I'll assume that the User has already been serialized and just extend this with the value.
So the caching is invisible and independent of how the server returns the nested objects. This allows for modular ajax requests and avoids redownloading unnecessary data.
Once everything is up and running I'll write some tests to see whether the application would be better served with larger, chunked ajax requests or smaller modular ones like above. Either way this lets you pass all model data through your angular factories, so you can rely on every record having inherited any prototype methods you may want to use.
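The id-or-object resolution described above can be sketched in plain JavaScript (rather than the CoffeeScript used in the answer). `UserStore` and `fetchUser` are hypothetical names; in an Angular app `fetchUser` would wrap whatever `$http` or resource call loads a user.

```javascript
// Caches one promise per user id so repeated lookups share a request.
// `fetchUser` is a hypothetical function (id) => Promise<user data>.
function UserStore(fetchUser) {
  this.fetchUser = fetchUser;
  this.cache = {};
}

// Accepts either an id (number or numeric string) or an already
// serialized user object, and always returns a promise for the user.
UserStore.prototype.resolve = function (value) {
  if (value !== null && typeof value === 'object') {
    // Already serialized by the server: remember it and use it as-is.
    this.cache[value.id] = Promise.resolve(value);
    return this.cache[value.id];
  }
  var id = parseInt(value, 10);
  if (isNaN(id)) {
    return Promise.reject(new Error('Not a user id: ' + value));
  }
  if (!this.cache[id]) {
    this.cache[id] = this.fetchUser(id);
  }
  return this.cache[id];
};

var requests = 0;
var store = new UserStore(function (id) {
  requests += 1;
  return Promise.resolve({ id: id, fname: 'User' + id });
});
store.resolve(1);
store.resolve('1');                       // same id, served from the cache
store.resolve({ id: 2, fname: 'Stack' }); // already serialized, no request
store.resolve(2);                         // served from the cache
console.log(requests); // 1
```

Caching the promise rather than the resolved data means that two bookings requesting the same user at the same time still trigger only one request.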

Get last documents with a distinct criteria

Situation
I'm having trouble coming up with a good way to do a certain MongoDb query. First, here's what kind of query I want to do. Assume a simple database which logs entry and exit events (and possibly other actions, doesn't matter) by electronic card swipe. So there's a collection called swipelog with simple documents which look like this:
{
_id: ObjectId("524ab4790a4c0e402200052c"),
name: "John Doe",
action: "entry",
timestamp: ISODate("2013-10-01T01:32:12.112Z")
}
Now I want to list names and their last entry times (and any other fields I may want, but example below uses just these two fields).
Current solution
Here is what I have now, as a "one-liner" for MongoDb JavaScript console:
db.swipelog.distinct('name')
.forEach( function(name) {
db.swipelog.find( { name: name, action:"entry" } )
.sort( { $natural:-1 } )
.limit(1)
.forEach( function(entry) {
printjson( [ entry.name, entry.timestamp ] )
})
})
Which prints something like:
[ "John Doe", ISODate("2013-10-01T01:32:12.112Z")]
[ "Jane Deo", ISODate("2013-10-01T01:36:12.112Z")]
...
Question
I think above has the obvious scaling problem. If there are a hundred names, then 1+100 queries will be made to the database. So what is a good/correct way to get "last timestamp of every distinct name" ? Changing database structure or adding some collections is ok, if it makes this easier.
You can use the aggregation framework to achieve this:
db.swipelog.aggregate([
  {$match: {action: 'entry'}},
  {$group: {
    _id: '$name',
    // the greatest timestamp per name is the last entry
    lastEntry: {$max: '$timestamp'}
  }}
])
If you'd like to include other fields in the results, you can sort first and then use the $first operator:
db.swipelog.aggregate([
  {$match: {action: 'entry'}},
  // newest document first within each name
  {$sort: {name: 1, timestamp: -1}},
  {$group: {
    _id: '$name',
    timestamp: {$first: '$timestamp'},
    otherField: {$first: '$otherField'}
  }}
])
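To see what the grouping computes, here is an in-memory JavaScript equivalent run against made-up sample documents. It is an illustration only, not a replacement for the pipeline; `lastEntryByName` is a hypothetical helper.

```javascript
// In-memory equivalent of: $match action 'entry', then $group by name
// keeping the document with the greatest timestamp.
function lastEntryByName(docs) {
  var latest = {};
  docs.forEach(function (doc) {
    if (doc.action !== 'entry') { return; }
    var seen = latest[doc.name];
    if (!seen || doc.timestamp > seen.timestamp) {
      latest[doc.name] = doc;
    }
  });
  return latest;
}

// Sample data invented for this sketch.
var log = [
  { name: 'John Doe', action: 'entry', timestamp: new Date('2013-10-01T01:32:12.112Z') },
  { name: 'John Doe', action: 'exit',  timestamp: new Date('2013-10-01T05:00:00.000Z') },
  { name: 'John Doe', action: 'entry', timestamp: new Date('2013-10-01T03:10:00.000Z') },
  { name: 'Jane Deo', action: 'entry', timestamp: new Date('2013-10-01T01:36:12.112Z') }
];
var result = lastEntryByName(log);
console.log(result['John Doe'].timestamp.toISOString()); // 2013-10-01T03:10:00.000Z
```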
This answer should be a comment on attish's answer above, but I don't have sufficient rep here to comment
Keep in mind that the aggregation framework cannot return more than 16MB of data. If you have a very large number of users, you may run into this limitation on your production system.
MongoDB 2.6 adds new features to the aggregation framework to deal with this:
db.collection.aggregateCursor() (temporary name) is identical to db.collection.aggregate() except that it returns a cursor instead of a document. This avoids the 16MB limitation
$out is a new pipeline phase that directs the pipeline's output to a collection. This allows you to run aggregation jobs
$sort has been improved to remove its RAM limitations and increase speed
If query performance is more important than data age, you could schedule a regular aggregate command that stores its results in a collection like db.last_swipe, then have your application simply query db.last_swipe for the relevant user.
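A sketch of what such a scheduled job could run, assuming MongoDB 2.6 or later and the collection names used in this thread. How the job is scheduled (cron, an application timer, and so on) is left open.

```javascript
// Pipeline that materializes the latest entry per name into a
// collection; $out must be the final stage of the pipeline.
var lastSwipePipeline = [
  { $match: { action: 'entry' } },
  { $group: { _id: '$name', timestamp: { $max: '$timestamp' } } },
  { $out: 'last_swipe' }
];

// In the mongo shell:
//   db.swipelog.aggregate(lastSwipePipeline)
// The application then reads one small document per user:
//   db.last_swipe.findOne({ _id: 'John Doe' })
console.log(JSON.stringify(lastSwipePipeline[lastSwipePipeline.length - 1]));
```

Each run replaces the contents of last_swipe, so readers see either the previous result set or the new one.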
Conclusion: I agree that attish has the right approach. However, you may run into trouble scaling it on the current MongoDB release and should look into Mongo 2.6.

syntax for linking documents in mongodb

If I have two objects in a user collection:
{_id: 1, name: 'foo', workItems: []}
{_id: 2, name: 'bar', workItems: []}
how would I add links to objects in a workItem collection into the workItems array for each user?
I understand direct embedding but some workItems will be assigned to multiple users so I don't want to duplicate data. I have looked on mongodb.org but I can't find any examples of linking.
Sometimes it is just better to duplicate the data. MongoDB is a non-relational database. Some practices that are considered bad in relational databases are intended in non-relational ones. It really is a different way of thinking, even though there are obvious common points.
At my work, we use it in production and found it both easier and faster for read operations to duplicate the data. This is precisely where the power of MongoDB lies.
Of course, when a workItem is modified, this requires your application to update all the places where it appears. This may not be a good solution for systems that are write intensive.
Another point is that joins are not handled by the engine, so you will have to issue at least a second request and then do the join manually on the application side. Either way, you will have to move logic from the database to the client application.
You can do a DBRef like this:
{ $ref : <name of collection where reference is>, $id : <_id of document>, $db : <optional argument for specifying the database the document is in> }
So your document would look like this, with one DBRef per linked work item in the array:
{_id: 1, name: 'foo', workItems: [{$ref: "blarg", $id: "1"}]}
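The application-side join mentioned above can also be sketched with plain _id references instead of DBRefs. This is a different, simpler linking style than the DBRef shown; the collection name `workItems`, the `title` field, and the sample data are assumptions made for illustration.

```javascript
// Each user stores only the _ids of its work items, so a shared
// work item is stored once and referenced from several users.
var users = [
  { _id: 1, name: 'foo', workItems: [10, 11] },
  { _id: 2, name: 'bar', workItems: [11] }
];
var workItems = [
  { _id: 10, title: 'Write report' },
  { _id: 11, title: 'Review report' }
];

// Stands in for the second request. In the mongo shell this would be
// something like: db.workItems.find({ _id: { $in: user.workItems } })
function findWorkItems(user, allItems) {
  return allItems.filter(function (item) {
    return user.workItems.indexOf(item._id) !== -1;
  });
}

// Linking a work item to a user in the shell would be something like:
//   db.users.update({ _id: 2 }, { $push: { workItems: 10 } })
console.log(findWorkItems(users[0], workItems).length); // 2
console.log(findWorkItems(users[1], workItems)[0].title); // Review report
```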
