Consider this example collection:
{
    "_id": 0,
    "firstname": "Tom",
    "children": {
        "childA": {
            "toys": {
                'toy 1': 'batman',
                'toy 2': 'car',
                'toy 3': 'train'
            },
            "movies": {
                'movie 1': "Ironman",
                'movie 2': "Deathwish"
            }
        },
        "childB": {
            "toys": {
                'toy 1': 'doll',
                'toy 2': 'bike',
                'toy 3': 'xbox'
            },
            "movies": {
                'movie 1': "Frozen",
                'movie 2': "Barbie"
            }
        }
    }
}
Now I would like to retrieve ONLY the movies from a particular document.
I have tried something like this:
movies = users.find_one({'_id': 0}, {'_id': 0, 'children.ChildA.movies': 1})
However, I get the whole field structure from 'children' down to 'movies' and its content. How do I run a query and retrieve only the content of 'movies'?
To be specific I want to end up with this:
{
'movie 1': "Frozen",
'movie 2': "Barbie"
}
The problem here is that your current data structure is not well suited to querying. This is mostly because you are using "keys" to represent "data points", and while that might initially seem logical, it is actually a very bad practice.
So rather than assigning "childA" and "childB" as keys of an object or "sub-document", you are better off assigning these as "values" to a generic key name, in a structure like this:
{
    "_id": 0,
    "firstname": "Tom",
    "children": [
        {
            "name": "childA",
            "toys": [
                "batman",
                "car",
                "train"
            ],
            "movies": [
                "Ironman",
                "Deathwish"
            ]
        },
        {
            "name": "childB",
            "toys": [
                "doll",
                "bike",
                "xbox"
            ],
            "movies": [
                "Frozen",
                "Barbie"
            ]
        }
    ]
}
This is not perfect, as there are nested arrays, which can be a potential problem (there are workarounds for that as well, covered later). The main point is that this is a lot better than defining the data in "keys". The main problem with "keys" that are not consistently named is that MongoDB does not generally provide any way to "wildcard" these names, so you are stuck with naming an "absolute path" in order to access elements, as in:
children -> childA -> toys
children -> childB -> toys
And that, in a nutshell, is bad. Compare it to this:
"children.toys"
With the sample prepared above, that is a much better approach to organizing your data.
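If existing documents already use the keyed layout, the move to the array layout can be done in application code. The following is only a minimal sketch in plain JavaScript: the helper name toArrayForm is made up for illustration, and it assumes every child holds "toys" and "movies" sub-documents like the sample.

```javascript
// Hypothetical helper (not part of MongoDB): converts the keyed "children"
// sub-document into the array form, moving each key name into a "name" value.
function toArrayForm(doc) {
  const children = Object.keys(doc.children).map(function (name) {
    const child = doc.children[name];
    return {
      name: name,
      // Object.values drops the 'toy 1', 'toy 2', ... keys and keeps the data
      toys: Object.values(child.toys || {}),
      movies: Object.values(child.movies || {})
    };
  });
  return { _id: doc._id, firstname: doc.firstname, children: children };
}

// The sample document from the question
const original = {
  _id: 0,
  firstname: "Tom",
  children: {
    childA: {
      toys: { "toy 1": "batman", "toy 2": "car", "toy 3": "train" },
      movies: { "movie 1": "Ironman", "movie 2": "Deathwish" }
    },
    childB: {
      toys: { "toy 1": "doll", "toy 2": "bike", "toy 3": "xbox" },
      movies: { "movie 1": "Frozen", "movie 2": "Barbie" }
    }
  }
};

const converted = toArrayForm(original);
console.log(JSON.stringify(converted, null, 2));
```

The converted document can then be written back in place of the original one.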
Even so, just getting back something such as a "unique list of movies" is out of scope for standard .find() type queries in MongoDB. This actually requires something more of "document manipulation" and is well supported in the aggregation framework for MongoDB. This has extensive capabilities for manipulation that is not present in the query methods, and as a per document response with the above structure then you can do this:
db.collection.aggregate([
# De-normalize the array content first
{ "$unwind": "$children" },
# De-normalize the content from the inner array as well
{ "$unwind": "$children.movies" },
# Group back, well optionally, but just the "movies" per document
{ "$group": {
"_id": "$_id",
"movies": { "$addToSet": "$children.movies" }
}}
])
So now the "list" response in the document only contains the "unique" movies, which corresponds more to what you are asking. Alternately you could just $push instead and make a "non-unique" list. But stupidly that is actually the same as this:
db.collection.find({},{ "_id": False, "children.movies": True })
As a "collection wide" concept, you could simplify this a lot by using the .distinct() method, which basically forms a list of "distinct" values for the key you provide. This plays with arrays really well:
db.collection.distinct("children.toys")
And that is essentially a collection wide analysis of all the "distinct" occurrences for each "toys" value in the collection, returned as a simple "array".
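To make the idea concrete without a server, here is a rough in-memory sketch of what a distinct list over "children.toys" amounts to. It is plain JavaScript, not the MongoDB implementation; the function name distinctToys and the second sample document are made up for illustration, and the order of values a real server returns may differ.

```javascript
// In-memory stand-in for collecting distinct "children.toys" values:
// walk every document, every child, every toy, and keep each value once.
function distinctToys(docs) {
  const seen = new Set();
  docs.forEach(function (doc) {
    (doc.children || []).forEach(function (child) {
      (child.toys || []).forEach(function (toy) {
        seen.add(toy);
      });
    });
  });
  return Array.from(seen);
}

const docs = [
  {
    _id: 0,
    children: [
      { name: "childA", toys: ["batman", "car", "train"] },
      { name: "childB", toys: ["doll", "bike", "xbox"] }
    ]
  },
  // Made-up second document, only here to show duplicates being dropped
  { _id: 1, children: [{ name: "childC", toys: ["car", "doll"] }] }
];

const toys = distinctToys(docs);
console.log(toys);
```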
As for your existing structure, it deserves a solution too, but you really must understand that the solution is horrible. The problem here is that the "native" and optimized methods available to general queries and aggregation are not available at all, and the only option left is JavaScript based processing. Even though that is a little better through "v8" engine integration, it is still a complete slouch when compared side by side with native code methods.
So from the "original" form that you have (shown in JavaScript form; the functions should be easy enough to translate):
db.collection.mapReduce(
    // Mapper: walk each child key, then each toy key under it
    function() {
        var id = this._id,
            children = this.children;
        Object.keys(children).forEach(function(child) {
            var toys = children[child].toys || {};
            Object.keys(toys).forEach(function(toy) {
                emit(
                    id, { "toys": [toys[toy]] }
                );
            });
        });
    },
    // Reducer: merge the emitted arrays, keeping each toy once
    function(key,values) {
        var output = { "toys": [] };
        values.forEach(function(value) {
            value.toys.forEach(function(toy) {
                if ( output.toys.indexOf( toy ) == -1 )
                    output.toys.push( toy );
            });
        });
        return output;
    },
    {
        "out": { "inline": 1 }
    }
)
So JavaScript evaluation is the "horrible" approach as this is much slower in execution, and you see the "traversing" code that needs to be implemented. Bad news for performance, so don't do it. Change the structure instead.
As a final part, you could model this differently to avoid the "nested array" concept. And understand that the only real problem with a "nested array" is that "updating" a nested element is really impossible without reading in the whole document and modifying it.
So $push and $pull methods work fine. But using a "positional" $ operator just does not work as the "outer" array index is always the "first" matched element. So if this really was a problem for you then you could do something like this, for example:
{
"_id": 0,
"firstname":"Tom",
"childtoys" : [
{
"name": "childA",
"toy": "batman"
},
{
"name": "childA",
"toy": "car"
},
{
"name": "childA",
"toy": "train"
},
{
"name": "childB",
"toy": "doll"
},
{
"name": "childB",
"toy": "bike"
},
{
"name": "childB",
"toy": "xbox"
}
],
"childMovies": [
{
"name": "childA",
"movie": "Ironman"
},
{
"name": "childA",
"movie": "Deathwish"
},
{
"name": "childB",
"movie": "Frozen"
},
{
"name": "childB",
"movie": "Barbie"
}
]
}
That would be one way to avoid the problem with nested updates if you did indeed need to "update" items on a regular basis rather than just $push and $pull items to the "toys" and "movies" arrays.
But the overall message here is to design your data around the access patterns you actually use. MongoDB does generally not like things with a "strict path" in the terms of being able to query or otherwise flexibly issue updates.
Projections in MongoDB are conventionally written with 1 and 0, although boolean values are accepted as well.
More importantly, make sure the field names are specified in the right case (uppercase/lowercase): the field is 'childA', not 'ChildA'.
The query should be as below:
db.users.findOne({'_id': 0}, {'_id': 0, 'children.childA.movies': 1})
Which will result in :
{
"children" : {
"childA" : {
"movies" : {
"movie 1" : "Ironman",
"movie 2" : "Deathwish"
}
}
}
}
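Since the projection keeps the enclosing structure, one option is to unwrap the result in application code. The following is a minimal sketch in plain JavaScript; getPath is a hypothetical helper written for this example, not a MongoDB or driver function.

```javascript
// Hypothetical helper: follows a dotted path through a nested result and
// returns undefined as soon as a step along the path is missing.
function getPath(doc, path) {
  return path.split(".").reduce(function (value, key) {
    return value == null ? undefined : value[key];
  }, doc);
}

// The projected result shown above
const result = {
  children: {
    childA: {
      movies: { "movie 1": "Ironman", "movie 2": "Deathwish" }
    }
  }
};

const movies = getPath(result, "children.childA.movies");
console.log(movies);

// A path with the wrong case finds nothing, just like the projection
console.log(getPath(result, "children.ChildA.movies"));
```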
I am working on a MERN project. I have created a collection in MongoDB that holds different types of documents. Is it an accepted practice to have documents with different structures in a single collection? Secondly, I need to fetch only a single document from the collection using the key name. My documents are:
[{
"_id": {
"$oid": "6333f72822dc0acc4bea17bd"
},
"designation": [
{
"name": "Chairman",
"level": 17
},
{
"name": "Director",
"level": 13
},
{
"name": "Secretary ",
"level": 13
},
{
"name": "Account Officer",
"level": 9
},
{
"name": "Data Entry Operator-GR B",
"level": 5
}
]
},
{
"_id": {
"$oid": "6334313b22dc0acc4bea17c2"
},
"storeRole": ["manager", "approver", "accepter", "firstsignatory"]
},
{
"_id": {
"$oid": "63369d2083a7cc2e818990dd"
},
"designationSuffix": ["I","II", "III"]
}]
How do I get any of the three documents if I only know the key name, i.e. designation, storeRole or designationSuffix? I don't want to use the ID value.
Welcome to SO.
First, yes it is an accepted practice and indeed, a powerful feature of MongoDB to have different shapes of data in a single collection.
There are two important things to remember when querying for data:
Matching on fields that don't even exist in a document is OK; the document will simply be skipped. This permits you, for example, to query for storeRole and ignore the other documents with designation, etc. -- unless of course you wish to look for those too using an $or expression.
Matching (using $match) for elements in an array will return the whole array, not just the elements that match.
To illustrate this point, let's expand your input data slightly:
{"designation": [
{"name": "Chairman","level": 17},
{"name": "Director", "level": 13}
]
},
{"designation": [
{"name": "Secretary","level": 13}
]
},
We will use dot notation to reach into the structures in the designation array to find those docs where at least one of the name fields is Chairman:
db.foo.aggregate([
{$match: {"designation.name": "Chairman"}}
]);
{
"_id" : 0,
"designation" : [
{
"name" : "Chairman",
"level" : 17
},
{
"name" : "Director",
"level" : 13
}
]
}
The query eliminated the document with name = Secretary as expected but properly returned the whole document (and the whole array) where name = Chairman. Very often the goal is to fetch only the matching items in the array; this is accomplished with the $filter operator:
db.foo.aggregate([
{$match: {"designation.name": "Chairman"}},
{$project: {
// Assigning the output of $filter to the same name as input:
designation: {$filter: {
input: "$designation",
as: "zz",
cond: {$eq: ['$$zz.name','Chairman']}
}}
}}
]);
{
"_id" : 0,
"designation" : [
{
"name" : "Chairman",
"level" : 17
}
]
}
An alternative approach, useful when query conditions yield null or empty arrays instead of eliminating the document altogether, is to $filter first and then match only on results where the array has a length > 0. We must use the $ifNull function to protect $size from being passed a null by turning it into an empty (but not null) array:
db.foo.aggregate([
{$project: {
// Assigning the output of $filter to the same name as input:
designation: {$filter: {
input: "$designation",
as: "zz",
cond: {$eq: ['$$zz.name','Chairman']}
}}
}},
{$match: {$expr: {$gt:[{$size: {$ifNull:["$designation",[] ]}}, 0]}} }
]);
Try commenting out the $match to see what $filter returns when a document has the target array field but no matches vs. when the document does not have the field.
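Coming back to fetching a document when only the key name is known: in MongoDB this kind of match is expressed with the $exists query operator on that field. The sketch below is a plain JavaScript, in-memory stand-in for the same idea; findByKey is a made-up helper and the short _id values are placeholders, not the ObjectIds from the question.

```javascript
// In-memory stand-in for matching documents on the presence of a field:
// keep only the documents that actually carry the given key.
function findByKey(docs, key) {
  return docs.filter(function (doc) {
    return Object.prototype.hasOwnProperty.call(doc, key);
  });
}

const docs = [
  { _id: "a", designation: [{ name: "Chairman", level: 17 }] },
  { _id: "b", storeRole: ["manager", "approver", "accepter", "firstsignatory"] },
  { _id: "c", designationSuffix: ["I", "II", "III"] }
];

const found = findByKey(docs, "storeRole");
console.log(found);
```

Documents without the key are simply skipped, which matches the first point made above.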
Apologies for the noob question. I'm just starting out using MongoDB and MongoDB Shell.
I've got a DB called Dealers that looks a little like this (very simplified):
[
{
"Id": 1,
"Vehicles": [
{
"Manufacturer": "Ford"
},
{
"Manufacturer": "MG"
},
{
"Manufacturer": "Citroen"
}
]
},
{
"Id": 2,
"Vehicles": [
{
"Manufacturer": "Ford"
},
{
"Manufacturer": "Nissan"
},
{
"Manufacturer": "Ford"
}
]
}
]
I'm trying to get my head round how you filter collections within collections, e.g. say I wanted to select all the Fords from Id 2.
I get as far as:
const dealers = database.collection('Dealers');
const result = await dealers.find({Id: 2})
and I tried:
const result = await dealers.find({
Id: 2,
Vehicles: [
{
Manufacturer: "Ford"
}
]
})
But I know that won't work because it's not iterating through the Vehicles collection. Is this the sort of instance that you would use an aggregation? Like I say, I'm very new to this sort of environment, and would really appreciate any pointers please.
I have just tried this. You can use the aggregate function to match the items inside the array. For example, the following query selects the vehicles whose Manufacturer is Ford from the document with Id equal to 1:
db.MyCollection.aggregate([{$match:{Id:1}},{$unwind:"$Vehicles"}, {$match: {"Vehicles.Manufacturer":"Ford"}}]);
This returns one result document per matching vehicle. I used my own Id, i.e. 1; you can change it to the one you need.
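To see what those three stages do to the sample data, here is a rough in-memory sketch in plain JavaScript. It only imitates the effect of the pipeline; vehiclesBy is a made-up helper and this is not how the server executes it.

```javascript
// The sample data from the question
const dealers = [
  { Id: 1, Vehicles: [{ Manufacturer: "Ford" }, { Manufacturer: "MG" }, { Manufacturer: "Citroen" }] },
  { Id: 2, Vehicles: [{ Manufacturer: "Ford" }, { Manufacturer: "Nissan" }, { Manufacturer: "Ford" }] }
];

// Hypothetical helper imitating: $match on Id, $unwind Vehicles, $match on Manufacturer
function vehiclesBy(dealers, id, manufacturer) {
  return dealers
    // first $match: keep the dealer with the given Id
    .filter(function (dealer) { return dealer.Id === id; })
    // $unwind: one row per entry of the Vehicles array
    .flatMap(function (dealer) {
      return dealer.Vehicles.map(function (vehicle) {
        return { Id: dealer.Id, Vehicles: vehicle };
      });
    })
    // second $match: keep only rows for the wanted manufacturer
    .filter(function (row) { return row.Vehicles.Manufacturer === manufacturer; });
}

const fordsAtDealer2 = vehiclesBy(dealers, 2, "Ford");
console.log(fordsAtDealer2);
```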
I have an array of users who each have an array property 'rights', and I want to filter out the users who have specific rights. I would like to filter by an array, so that I could get all the users with full rights ['full'], or users with full or edit rights ['full','edit']. I am fairly new to lodash and I think I can chain some functions together, but I am not sure whether there are more efficient ways of doing it.
Here is my plunker: http://plnkr.co/edit/5PCvaDJaXF4uxRowVBlK?p=preview
Result ['full'] :
[{
"name": "Company1 Admin",
"rights": [
"full"
]
},
{
"name": "FullRights Company1",
"rights": [
"full","review"
]
}]
Result ['full','edit']:
[{
"name": "Company1 Admin",
"rights": [
"full"
]
},
{
"name": "FullRights Company1",
"rights": [
"full","review"
]
},
{
"name": "EditRights Company1",
"rights": [
"edit"
]
}]
Code:
var users = [
{
"name": "Company1 Admin",
"rights": [
"full"
]
},
{
"name": "FullRights Company1",
"rights": [
"full","review"
]
},
{
"name": "ApproveRights Company1",
"rights": [
"approve","review"
]
},
{
"name": "EditRights Company1",
"rights": [
"edit"
]
},
{
"name": "ReviewRights Company1",
"rights": [
"review"
]
},
{
"name": "NoRights Company1",
"rights": [
"none"
]
}
];
var tUsers = [];
var filterRights = ['full','edit'];
_.forEach(users, function(user) {
if (_.intersection(user.rights, filterRights).length > 0) {
tUsers.push(user);
}
}) ;
//console.log('users', JSON.stringify(users, null, 2));
console.log('tUsers', JSON.stringify(tUsers, null, 2));
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/3.10.1/lodash.min.js"></script>
From the docs
_.filter(collection, predicate, thisArg);
Arguments
collection (Array|Object|string): The collection to iterate over.
[predicate=_.identity] (Function|Object|string): The function invoked per iteration.
[thisArg] (*): The this binding of predicate.
Chaining is great when you want to connect different processing steps.
If your problem statement was to
filter by rights
sort by oldest person
take 10
Then chaining would make a lot of sense.
This problem seems to be mostly custom logic on filtering.
var users = [/* Your user data here */];
function filterByRights (users, rights) {
return _.filter(users, function (user) {
return _.any(user.rights, function (right) {
return _.contains(rights, right);
});
});
}
filterByRights(users, ['full', 'edit']); // [/*Users with full or edit rights*/]
I think my example is good because it doesn't depend on conditional logic. It uses lodash defined methods like any and contains.
Performance concerns
I want to expand on what performance concerns you have. Here are a couple of points.
Your question code is maintaining its own mechanism for filtering out users. While it is a perfectly good solution you should opt into letting the guys who maintain lodash handle this logic. They have probably spent a lot of time optimizing how to create another array from an original one.
_.any is more efficient than _.intersection. _.intersection needs to process every element to know what the intersection is. _.any stops when it hits the first element which passes the predicate otherwise it checks each of them. This point is minor since there are a small number of "rights"
The example I've given is probably more "lodash standard". You typically can do data transformations completely with lodash defined methods and trivial predicates.
Here is an update to @t3dodson's answer. You should now use the following snippet if using the current (4.17.4) Lodash version:
function filterByRights (users, rights) {
return _.filter(users, function (user) {
return _.some(user.rights, function (right) {
return _.includes(rights, right);
});
});
}
From the Changelog:
Removed _.contains in favor of _.includes
Removed _.any in favor of _.some
I think you were on the right path with intersection() (I've never seen any performance issues with this function). Here's how I would compose an iteratee using flow():
_.filter(users, _.flow(
_.property('rights'),
_.partial(_.intersection, filterRights),
_.size
));
The property() function gets the rights property, and passes it to intersection(). We've already partially-applied the filterRights array. Lastly, the size() function is necessary to pass a truthy/falsy value to filter().
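For comparison, the same predicate can also be written without lodash, using only built-in array methods. This is a minimal sketch assuming a runtime that supports Array.prototype.includes (ES2016 or later); it is an alternative, not what any of the answers above proposed.

```javascript
// Sample data copied from the question
const users = [
  { name: "Company1 Admin", rights: ["full"] },
  { name: "FullRights Company1", rights: ["full", "review"] },
  { name: "ApproveRights Company1", rights: ["approve", "review"] },
  { name: "EditRights Company1", rights: ["edit"] },
  { name: "ReviewRights Company1", rights: ["review"] },
  { name: "NoRights Company1", rights: ["none"] }
];

// Keep users holding at least one of the requested rights.
// some() stops at the first right that matches.
function filterByRights(users, rights) {
  return users.filter(function (user) {
    return user.rights.some(function (right) {
      return rights.includes(right);
    });
  });
}

const names = filterByRights(users, ["full", "edit"]).map(function (user) {
  return user.name;
});
console.log(names);
```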
I have a dataset of records stored in MongoDB and I have been trying to extract a complex set of data from the records.
Sample records are as follows :-
{
bookId : '135wfkjdbv',
type : 'a',
store : 'crossword',
shelf : 'A1'
}
{
bookId : '13erjfn',
type : 'b',
store : 'crossword',
shelf : 'A2'
}
I have been trying to extract data such that, for each bookId, I get a count (of records) for each shelf per store name that holds the book identified by bookId, where the type of the book is 'a'.
I understand that the aggregation query allows a pipeline that allows grouping, matching etc, but I have not been able to reach a solution.
The desired output is of the form :-
{
bookId : '135wfkjdbv',
stores : [
{
name : 'crossword'
shelves : [
{
name : 'A1',
count : 12
},
]
},
{
name : 'granth'
shelves : [
{
name : 'C2',
count : 12
},
{
name : 'C4',
count : 12
},
]
}
]
}
The process isn't really that difficult when you look at it. The aggregation "pipeline" is exactly that, where each "stage" feeds a result into the next for processing. Just like a unix "pipe":
ps -ef | grep mongo | tee out.txt
So it's just adding stages, and in fact three $group stages where the first does the basic aggregation and the remaining two simply "roll up" the arrays required in the output.
db.collection.aggregate([
{ "$group": {
"_id": {
"bookId": "$bookId",
"store": "$store",
"shelf": "$shelf"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": {
"bookId": "$_id.bookId",
"store": "$_id.store"
},
"shelves": {
"$push": {
"name": "$_id.shelf",
"count": "$count"
}
}
}},
{ "$group": {
"_id": "$_id.bookId",
"stores": {
"$push": {
"name": "$_id.store",
"shelves": "$shelves"
}
}
}}
])
You could possibly $project at the end to change the _id to bookId, but you should already know that is what it is, and get used to treating _id as a primary key. There is a cost to such operations, so it is a habit you should not get into; learn to do things correctly from the start.
So all that really happens here is all the fields that would make up the grouping detail are made the primary key of $group with the other field being produced as count, to count the shelves within that grouping. Think the SQL equivalent:
GROUP BY bookId, store, shelf
All each other stage does is transpose each grouping level into array entries, first by shelf within the store and then the store within the bookId. Each time the fields in the primary grouping key are reduced down by the content going into the produced array.
When you start thinking in terms of "pipeline" processing, it becomes clear: you construct one form, then take that output and move it to the next form, and so on. This is basically how you fold the results within two arrays.
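The same count-then-roll-up shape can be tried out in memory. The sketch below is plain JavaScript, not the aggregation framework itself; rollUp is a made-up name and the extra sample records are invented for illustration. It also applies the type 'a' condition from the question as a plain filter before counting, which the pipeline above leaves out.

```javascript
// Counts records per bookId/store/shelf, then folds the counts into
// nested arrays: shelves inside stores, stores inside each bookId.
function rollUp(records) {
  const byBook = new Map();
  records
    // The question asks for type 'a' only
    .filter(function (record) { return record.type === "a"; })
    .forEach(function (record) {
      if (!byBook.has(record.bookId)) byBook.set(record.bookId, new Map());
      const stores = byBook.get(record.bookId);
      if (!stores.has(record.store)) stores.set(record.store, new Map());
      const shelves = stores.get(record.store);
      shelves.set(record.shelf, (shelves.get(record.shelf) || 0) + 1);
    });
  // Transpose each grouping level into array entries
  return Array.from(byBook, function ([bookId, stores]) {
    return {
      bookId: bookId,
      stores: Array.from(stores, function ([name, shelves]) {
        return {
          name: name,
          shelves: Array.from(shelves, function ([shelf, count]) {
            return { name: shelf, count: count };
          })
        };
      })
    };
  });
}

const records = [
  { bookId: "135wfkjdbv", type: "a", store: "crossword", shelf: "A1" },
  { bookId: "135wfkjdbv", type: "a", store: "crossword", shelf: "A1" }, // invented
  { bookId: "135wfkjdbv", type: "a", store: "granth", shelf: "C2" },    // invented
  { bookId: "13erjfn", type: "b", store: "crossword", shelf: "A2" }
];

const summary = rollUp(records);
console.log(JSON.stringify(summary, null, 2));
```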
I have documents in a collection, looking like this:
[
{
userId: 1,
itemsIds: [399957190, 366369952],
hash: '85e765840b1cd3c413404cdf6b8fb2a4'
},
{
userId: 2,
itemsIds: [349551151, 366369952],
hash: 'a28fa334515749b1b13fcd2183edb8de'
},
{
userId: 3,
itemsIds: [399957190, 366369952],
hash: '85e765840b1cd3c413404cdf6b8fb2a4'
}
]
These are users who have favorite items in their lists. I want to compare one user's list to the others and find out whether they are equal. If they are, I want to mark them as a kind of pair in my code and perform some actions.
In the example above users 1 and 3 have the same favorites lists.
How do I find users with an array which contains exactly the values I list?
There are several "very useful cases" here where in fact trying to create a "unique hash" over the array content is actually "getting in the way" of the myriad of problems that can be easily addressed.
Finding Common to "Me"
If you for example take "user 1" from the sample provided, and consider that you have that data loaded already and want to find "those in common with me" by the matched "itemsIds" from what the current user object has, then there are two simple query approaches:
Find "exactly" the same: This is where you want to inspect other user data to see those users that have the same "exact" interests. This is a simple and "unordered" usage of the $all query operator:
db.collection.find({
"itemsIds": { "$all": [399957190, 366369952] },
"userId": { "$ne": 1 }
})
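Which is going to return "user 3" since they are the one with "both" common "itemsIds" entries. Order is not important here as it is always a match in any order, as long as they are both there. Note that $all also matches arrays that contain additional entries, so add a $size condition if the list must hold exactly these values and nothing more. This is another form of $and as query arguments.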
Find "similar" in common to "me": This is basically asking "do you have something that is the same?". For that you can use the $in query operator. It will match if "either" of the specified conditions is met:
db.collection.find({
"itemsIds": { "$in": [399957190, 366369952] },
"userId": { "$ne": 1 }
})
In this case "both" the "user 2" and "user 3" are going to match, as they "at least" share "one" of the conditions specified, and that means they have "something in common" with the source data of the query.
This is in fact another form of the $or query operator, and just like before it is a lot simpler and more concise to write this way given the conditions to be applied.
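The difference between the two matches is easy to check in memory. The following plain JavaScript sketch only imitates the "all of these" and "any of these" semantics on the sample data; hasAll and hasAny are made-up helpers, not MongoDB operators.

```javascript
// Sample data from the question (hash field left out)
const users = [
  { userId: 1, itemsIds: [399957190, 366369952] },
  { userId: 2, itemsIds: [349551151, 366369952] },
  { userId: 3, itemsIds: [399957190, 366369952] }
];

// "All of these": every wanted value is present, in any order
function hasAll(list, wanted) {
  return wanted.every(function (value) { return list.includes(value); });
}

// "Any of these": at least one wanted value is present
function hasAny(list, wanted) {
  return wanted.some(function (value) { return list.includes(value); });
}

const mine = users[0].itemsIds;
const others = users.filter(function (user) { return user.userId !== 1; });

const sameAsMe = others
  .filter(function (user) { return hasAll(user.itemsIds, mine); })
  .map(function (user) { return user.userId; });
const similarToMe = others
  .filter(function (user) { return hasAny(user.itemsIds, mine); })
  .map(function (user) { return user.userId; });

console.log(sameAsMe, similarToMe);
```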
Finding Common "Things"
There are also cases where you might want to find things "in common" without having a base "user" to start from. So how do you tell that "user 1" and "user 3" share the same "itemsIds", or in fact that various users might share the same "itemsIds" value individually, and who they are?
Get the exact matches: This is of course where you look at the "itemsIds" values and $group them together. Generally the "order is important" here, so optimally you keep them "pre-ordered" consistently, which makes this as simple as:
db.collection.aggregate([
{ "$group": {
"_id": "$itemsIds",
"common": { "$push": "$userId" }
}}
])
And that is all there really is to it, as long as the order is already there. If not, then you can do a slightly longer winded form to do the "ordering", but the same could be said of generating a "hash":
db.collection.aggregate([
{ "$unwind": "$itemsIds" },
{ "$sort": { "_id": 1, "itemsIds": 1 } },
{ "$group": {
"_id": "$_id",
"userId": { "$first": "$userId" },
"itemsIds": { "$push": "$itemsIds" }
}},
{ "$group": {
"_id": "$itemsIds",
"common": { "$push": "$userId" }
}}
])
Not "super" performant, but it makes the point of why you should always keep the array ordered as entries are added. Which is a very simple process.
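The effect of sorting before grouping can be seen in a small in-memory sketch. This is plain JavaScript rather than the aggregation pipeline; groupByItems is a made-up name, and user 3's list is deliberately reversed here (unlike the sample in the question) to show that order stops mattering once the list is sorted.

```javascript
// Groups userIds by their item list, sorting a copy of each list first
// so that the same items in a different order land in the same group.
function groupByItems(users) {
  const groups = new Map();
  users.forEach(function (user) {
    const key = user.itemsIds
      .slice()                                      // do not mutate the input
      .sort(function (a, b) { return a - b; })      // numeric sort
      .join(",");
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(user.userId);
  });
  return groups;
}

const users = [
  { userId: 1, itemsIds: [399957190, 366369952] },
  { userId: 2, itemsIds: [349551151, 366369952] },
  { userId: 3, itemsIds: [366369952, 399957190] } // reversed on purpose
];

const groups = groupByItems(users);
groups.forEach(function (userIds, key) {
  console.log(key, userIds);
});
```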
Common "user" to "items": Which is another simple process abstracting on above with "breaking down" the array under $unwind, and then basically grouping back:
db.collection.aggregate([
{ "$unwind": "$itemsIds" },
{ "$group": {
"_id": "$itemsIds",
"users": { "$addToSet": "$userId" }
}}
])
And again, just a simple grouping aggregator of $addToSet does the job and collects the "distinct userId" values for each "itemsIds" value.
These are all basic solutions, and I could go on with "set intersections" and what not, but this is the "primer".
Don't try to compute a "hash", MongoDB has a good "arsenal" for matching the entries anyway. Use it and "abuse it" as well, until it breaks. Then try harder.