How to retrieve documents with conditioning an array of nested objects? - javascript

The structure of the objects stored in mongodb is the following:
obj = {_id: "55c898787c2ab821e23e4661", ingredients: [{name: "ingredient1", value: "70.2"}, {name: "ingredient2", value: "34"}, {name: "ingredient3", value: "15.2"}, ...]}
What I would like to do is retrieve all documents where the value of a specific ingredient is greater than an arbitrary number.
To be more specific, suppose we want to retrieve all the documents which contain ingredient with name "ingredient1" and its value is greater than 50.
Trying the following I couldn't retrieve desired results:
var collection = db.get('docs');
var queryTest = collection.find({$where: 'this.ingredients.name == "ingredient1" && parseFloat(this.ingredients.value) > 50'}, function(e, docs) {
console.log(docs);
});
Does anyone know what is the correct query to condition upon specific array element names and values?
Thanks!

You really don't need the JavaScript evaluation of $where here; just use basic query operators with an $elemMatch query for the array. While it's true that the "value" elements here are in fact strings, that is not really the point (as I explain at the end). The main point is to get the query right the first time:
collection.find(
    {
        "ingredients": {
            "$elemMatch": {
                "name": "ingredient1",
                "value": { "$gt": 50 }
            }
        }
    },
    { "ingredients.$": 1 }
)
The $ in the second argument is the positional operator, which projects only the first matched element of the array from the query conditions.
This is also considerably faster than the JavaScript evaluation: no evaluation code needs to be compiled, native coded operators are used, and an "index" on the "name" and even "value" elements of the array can aid in filtering the matches.
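To see why $elemMatch matters, here is a plain-JavaScript sketch of the per-document test it performs, using hypothetical sample data with numeric values. Note the second document has an "ingredient1" and a value over 50, but in different elements, so $elemMatch correctly rejects it, whereas separate dot-notation conditions would match it:

```javascript
// Hypothetical sample data (values already numeric):
const docs = [
  { _id: 1, ingredients: [{ name: "ingredient1", value: 70.2 }, { name: "ingredient2", value: 34 }] },
  { _id: 2, ingredients: [{ name: "ingredient1", value: 15.2 }, { name: "ingredient2", value: 90 }] }
];

// $elemMatch requires a SINGLE array element to satisfy both conditions at once:
const elemMatch = doc =>
  doc.ingredients.some(i => i.name === "ingredient1" && i.value > 50);

console.log(docs.filter(elemMatch).map(d => d._id)); // [ 1 ]
```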
If you expect more than one match in the array, then the .aggregate() command is the best option. With modern MongoDB versions this is quite simple:
collection.aggregate([
    { "$match": {
        "ingredients": {
            "$elemMatch": {
                "name": "ingredient1",
                "value": { "$gt": 50 }
            }
        }
    }},
    { "$redact": {
        "$cond": {
            "if": {
                "$and": [
                    // $ifNull supplies passing defaults at the document level,
                    // where $name and $value do not exist, so $redact descends
                    // into the array before testing each element
                    { "$eq": [ { "$ifNull": [ "$name", "ingredient1" ] }, "ingredient1" ] },
                    { "$gt": [ { "$ifNull": [ "$value", 60 ] }, 50 ] }
                ]
            },
            "then": "$$DESCEND",
            "else": "$$PRUNE"
        }
    }}
])
And even simpler with the $filter operator, introduced in MongoDB 3.2:
collection.aggregate([
    { "$match": {
        "ingredients": {
            "$elemMatch": {
                "name": "ingredient1",
                "value": { "$gt": 50 }
            }
        }
    }},
    { "$project": {
        "ingredients": {
            "$filter": {
                "input": "$ingredients",
                "as": "ingredient",
                "cond": {
                    "$and": [
                        { "$eq": [ "$$ingredient.name", "ingredient1" ] },
                        { "$gt": [ "$$ingredient.value", 50 ] }
                    ]
                }
            }
        }
    }}
])
Where in both cases you are effectively "filtering" the array elements that do not match the conditions after the initial document match.
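The $filter stage behaves like an ordinary array filter run inside the projection. A plain-JavaScript equivalent, on a hypothetical sample document, would be:

```javascript
// Hypothetical sample document with numeric values:
const doc = {
  _id: 1,
  ingredients: [
    { name: "ingredient1", value: 70.2 },
    { name: "ingredient2", value: 34 },
    { name: "ingredient1", value: 55.1 }
  ]
};

// Equivalent of the $project/$filter stage: keep only matching elements
const projected = {
  _id: doc._id,
  ingredients: doc.ingredients.filter(
    i => i.name === "ingredient1" && i.value > 50
  )
};

console.log(projected.ingredients.length); // 2
```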
Also, since your "values" are actually "strings" right now, you really should change them to be numeric. Here is a basic process:
var bulk = collection.initializeOrderedBulkOp(),
    count = 0;

collection.find().forEach(function(doc) {
    doc.ingredients.forEach(function(ingredient, idx) {
        var update = { "$set": {} };
        update["$set"]["ingredients." + idx + ".value"] = parseFloat(ingredient.value);
        bulk.find({ "_id": doc._id }).updateOne(update);
        count++;

        // Execute in batches of 1000 operations
        if ( count % 1000 == 0 ) {
            bulk.execute();
            bulk = collection.initializeOrderedBulkOp();
        }
    });
});

// Flush any remaining queued operations
if ( count % 1000 != 0 )
    bulk.execute();
And that will fix the data so the query forms here work.
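For illustration, this is the kind of $set document that conversion produces for one (hypothetical) document, here collapsed into a single update rather than one per ingredient as the bulk loop queues them:

```javascript
// Hypothetical document with string values, as stored today:
const doc = {
  _id: 1,
  ingredients: [
    { name: "ingredient1", value: "70.2" },
    { name: "ingredient2", value: "34" }
  ]
};

// Build positional $set paths converting each value with parseFloat
const update = { "$set": {} };
doc.ingredients.forEach(function(ingredient, idx) {
  update["$set"]["ingredients." + idx + ".value"] = parseFloat(ingredient.value);
});

console.log(update);
// { '$set': { 'ingredients.0.value': 70.2, 'ingredients.1.value': 34 } }
```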
This is much better than processing with a JavaScript $where clause, which needs to evaluate every document in the collection without the benefit of an index to filter. The correct $where form would be:
collection.find(function() {
    return this.ingredients.some(function(ingredient) {
        return (
            ( ingredient.name === "ingredient1" ) &&
            ( parseFloat(ingredient.value) > 50 )
        );
    });
})
But that form also cannot "project" only the matched element(s) in the results as the other forms can.

Try using $elemMatch:
var queryTest = collection.find(
{ ingredients: { $elemMatch: { name: "ingredient1", value: { $gte: 50 } } } }
);

Related

MongoDB Conditional Projection based on existence of query of subdocument in Array

I have a schema in which properties can have respective "override" documents stored in an Array("overrides")
E.g.
{
    foo: 'original foo',
    overrides: [
        { property: 'foo', value: 'foo override' },
        { property: 'bar', value: 'bar override' }
    ]
}
I want to project a field for the override value if it exists, otherwise, the original property.
So something like this
project: { overrideOrOriginal: { $cond: fooOverrideExists ? fooOverrideValue : originalFooValue } }
So in this example, I would expect overrideOrOriginal to equal 'foo override' . If - {property:'foo', value:'foo override'} subDoc didn't exist in the overrides array (or if overrides array itself didn't even exist)...then I'd expect overrideOrOriginal = 'original foo'
How can I do this?
I was thinking I'd need $exists in tandem with $cond. But the complication here is that I'm searching for a subDoc in an Array based on a query
Thanks!
$ifNull to check if field is null then return empty array
$in to check "foo" is in overrides.property array
$indexOfArray to get index of array element in overrides.property array
$arrayElemAt to get element by specific index return from above operator
let fooOverrideExists = "foo";
db.collection.find({},
{
overrideOrOriginal: {
$cond: [
{
$in: [
fooOverrideExists,
{ $ifNull: ["$overrides.property", []] }
]
},
{
$arrayElemAt: [
"$overrides.value",
{ $indexOfArray: ["$overrides.property", fooOverrideExists] }
]
},
"$foo"
]
}
})
Query
finds the property key-value pair (kv) (it works for all property names)
(assumes a schema where the value of that property is the only string value)
checks whether it exists in the overrides array
if it exists, takes the value from the array
else keeps the original
*also checks cases where overrides doesn't exist, is an empty array, or the property doesn't exist
*in case you want to do it only for a specific "foo", ignore the big first $set and use this code
db.collection.aggregate([
{
"$set": {
"kv": {
"$arrayElemAt": [
{
"$filter": {
"input": {
"$objectToArray": "$$ROOT"
},
"cond": {
"$eq": [
{
"$type": "$$this.v"
},
"string"
]
}
}
},
0
]
}
}
},
{
"$set": {
"index": {
"$indexOfArray": [
"$overrides.property",
"$kv.k"
]
}
}
},
{
"$project": {
"_id": 0,
"overrideOrOriginal": {
"$cond": [
{
"$or": [
{
"$eq": [
"$index",
-1
]
},
{
"$not": [
"$overrides"
]
}
]
},
"$kv.v",
{
"$arrayElemAt": [
"$overrides.value",
"$index"
]
}
]
}
}
}
])

MongoDB - mapReduce

I've got mongoDB collection, where each doc looks like:
{
"_id": 1,
"name": "Aurelia Menendez",
"scores": [{
"score": 60.06045071030959,
"type": "exam"
}, {
"score": 52.79790691903873,
"type": "quiz"
}, {
"score": 71.76133439165544,
"type": "homework"
}]
}
I try to run:
db.students.mapReduce(
function() {
emit(this._id, this.scores.map(a => a.score));
},
function(_id, values) {
//here i try:
1) return values.reduce((a, b) => a + b);
2) return values.reduce((a, b) => a + b, 0);
3) return Array.sum(values);
},
{ out: "total_scores" }
)
What do I get? A collection where each doc looks like one of these:
"value" is an array:
{
"_id": 20,
"value": [42.17439799514388, 71.99314840599558, 81.23972632069464]
}
"value" is value
{
"_id": 188,
"value": "060.314725741828,41.12327471818652,74.8699176311771"
}
"value" is array
{
"_id": 193,
"value": [47.67196715489599, 41.55743490493954, 70.4612811769744]
}
Why don't I get the sum of the elements? When I try this.scores or this.scores.score instead of this.scores.map(a => a.score), I get all the attributes, or null values.
Maybe someone has an idea what I did wrong?
You should use aggregation instead of mapReduce. (As to why you see arrays: mapReduce only calls the reduce function for keys emitted more than once, and since each _id here is emitted exactly once, the mapped array passes through unchanged.) This is a note from the official MongoDB documentation:
Aggregation pipeline provides better performance and a more coherent
interface than map-reduce, and various map-reduce operations can be
rewritten using aggregation pipeline operators, such as $group,
$merge, $accumulator, etc..
The steps I used to get the aggregation stages:
Use MongoDB Compass and open Aggregations Tab to test aggregation.
Add stages: $match to filter the student, $unwind to flatten the array of scores, $group to total the scores with $sum.
Convert to code
The result is
[
{
'$match': {
'name': 'Aurelia Menendez'
}
}, {
'$unwind': {
'path': '$scores'
}
}, {
'$group': {
'_id': '$_id',
'total': {
'$sum': '$scores.score'
}
}
}
]
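The $unwind/$group/$sum combination is equivalent to reducing over the score field in plain JavaScript, which you can check against the sample document:

```javascript
const student = {
  _id: 1,
  name: "Aurelia Menendez",
  scores: [
    { score: 60.06045071030959, type: "exam" },
    { score: 52.79790691903873, type: "quiz" },
    { score: 71.76133439165544, type: "homework" }
  ]
};

// $unwind + $group with $sum boils down to a reduce over scores.score:
const total = student.scores.reduce((sum, s) => sum + s.score, 0);
console.log(total); // ≈ 184.6197
```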

Remove all Field Beginning with name "XX"

An example document from a collection:
{ "teamAlpha": { }, "teamBeta": { }, "leader_name": "leader" }
For such document, I would like to remove all fields that starts with "team". So the expected result is
{leader_name: "leader"}
I am currently using a function:
db.teamList.find().forEach(
function(document) {
for(var k in document) {
if (k.startsWith('team')) {
delete document[k];
}
}
db.teamList.save(document);
}
);
I am wondering if there is a better approach for this problem.
It would be "better" to instead determine all the possible keys beforehand and then issue a single "multi" update to remove all the keys. Depending on the available MongoDB version there would be different approaches.
MongoDB 3.4: $objectToArray
let fields = db.teamList.aggregate([
{ "$project": {
"_id": 0,
"fields": {
"$map": {
"input": {
"$filter": {
"input": { "$objectToArray": "$$ROOT" },
"as": "d",
"cond": { "$eq": [{ "$substrCP": [ "$$d.k", 0, 4 ] }, "team" ] }
}
},
"as": "f",
"in": "$$f.k"
}
}
}},
{ "$unwind": "$fields" },
{ "$group": { "_id": "$fields" } }
])
.map( d => ({ [d._id]: "" }))
.reduce((acc,curr) => Object.assign(acc,curr),{})
db.teamList.updateMany({},{ "$unset": fields });
The .aggregate() statement turns the fields in the document into an array via $objectToArray and then applies $filter to only return those where the first four letters of the "key" matched the string "team". This is then processed with $unwind and $group to make a "unique list" of the matching fields.
The subsequent instructions merely process that list returned in the cursor into a single object like:
{
"teamBeta" : "",
"teamAlpha" : ""
}
Which is then passed to $unset to remove those fields from all documents.
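The .map()/.reduce() tail of that statement is plain JavaScript and can be tried standalone; with sample cursor results it produces exactly the $unset argument shown:

```javascript
// Hypothetical cursor output from the $group stage:
const cursorResults = [{ _id: "teamBeta" }, { _id: "teamAlpha" }];

// Turn each _id into a { name: "" } pair and merge into one object for $unset
const fields = cursorResults
  .map(d => ({ [d._id]: "" }))
  .reduce((acc, curr) => Object.assign(acc, curr), {});

console.log(fields); // { teamBeta: '', teamAlpha: '' }
```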
Earlier Versions: mapReduce
var fields = db.teamList.mapReduce(
function() {
Object.keys(this).filter( k => /^team/.test(k) )
.forEach( k => emit(k,1) );
},
function() {},
{ "out": { "inline": 1 } }
)
.results.map( d => ({ [d._id]: "" }))
.reduce((acc,curr) => Object.assign(acc,curr),{})
db.teamList.update({},{ "$unset": fields },{ "multi": true });
Essentially the same thing; the only difference is that where .updateMany() does not exist as a method, we simply call .update() with the "multi" parameter to apply to all matched documents, which is all the new API call actually does.
Beyond those Options
It certainly is not wise to iterate all documents simply to remove fields, so either of the above would be the "preferred" approach. The only possible failure is that constructing the "distinct list" of keys actually exceeds the 16MB BSON limit. That is pretty extreme, but depending on the actual data it is possible.
Therefore there are essentially "two extensions" that naturally apply to the techniques:
Use the "cursor" with .aggregate()
var fields = [];
db.teamList.aggregate([
{ "$project": {
"_id": 0,
"fields": {
"$map": {
"input": {
"$filter": {
"input": { "$objectToArray": "$$ROOT" },
"as": "d",
"cond": { "$eq": [{ "$substrCP": [ "$$d.k", 0, 4 ] }, "team" ] }
}
},
"as": "f",
"in": "$$f.k"
}
}
}},
{ "$unwind": "$fields" },
{ "$group": { "_id": "$fields" } }
]).forEach( d => {
    fields.push(d._id);
    if ( fields.length >= 2000 ) {
        db.teamList.updateMany({},
            { "$unset":
                fields.reduce((acc,curr) => Object.assign(acc,{ [curr]: "" }),{})
            }
        );
        fields = [];   // reset for the next batch
    }
});
if ( fields.length > 0 ) {
db.teamList.updateMany({},
{ "$unset":
fields.reduce((acc,curr) => Object.assign(acc,{ [curr]: "" }),{})
}
);
}
Where this would essentially "batch" the number of fields as processed on the "cursor" into lots of 2000, which "should" stay well under the 16MB BSON limit as a request.
Use a temporary collection with mapReduce()
db.teamList.mapReduce(
function() {
Object.keys(this).filter( k => /^team/.test(k) )
.forEach( k => emit(k,1) );
},
function() {},
{ "out": { "replace": "tempoutput" } }
);
var fields = [];
db.tempoutput.find({},{ "_id": 1 }).forEach(d => {
    fields.push(d._id);
    if ( fields.length >= 2000 ) {
        db.teamList.update({},
            { "$unset":
                fields.reduce((acc,curr) => Object.assign(acc,{ [curr]: "" }),{})
            },
            { "multi": true }
        );
        fields = [];   // reset for the next batch
    }
});
if ( fields.length > 0 ) {
db.teamList.update({},
{ "$unset":
fields.reduce((acc,curr) => Object.assign(acc,{ [curr]: "" }),{})
},
{ "multi": true }
);
}
Where it is again essentially the same process, except as mapReduce cannot output to a "cursor", you need to output to a temporary collection consisting of only the "distinct field names" and then iterate the cursor from that collection in order to process in the same "batch" manner.
Just like the initial approaches, these are much more performant options than iterating the whole collection and adjusting each document individually. It generally should not be necessary, since the likelihood of any "distinct list" causing a single update request to exceed 16MB is indeed extreme. But this would again be the "preferred" way to handle such an extreme case.
General
Of course if you simply know all the field names and do not need to work them out by examining the collection, then simply write the statement with the known names:
db.teamList.update({},{ "$unset": { "teamBeta": "", "teamAlpha": "" } },{ "multi": true })
Which is perfectly valid because all the other statements are doing is working out what those names should be for you.

Find maximum length of data in keys for the collection

{
"_id" : ObjectId("59786a62a96166007d7e364dsadasfafsdfsdgdfgfd"),
"someotherdata" : {
"place1" : "lwekjfrhweriufesdfwergfwr",
"place2" : "sgfertgryrctshyctrhysdthc ",
"place3" : "sdfsdgfrdgfvk",
"place4" : "asdfkjaseeeeeeeeeeeeeeeeefjnhwklegvds."
}
}
I have thousands of these in my collection. I need to look through all the someotherdata and do the following
Check to see if it is present (in some records i have place1 and not place4)
Find the longest record (in terms of string length)
The output must look something like this (showing the count of characters for the longest)
{
place1: 123,
place2: 12,
place3: 17
place4: 445
}
I'm using MongoDB 3.2.9, so I don't have access to the new aggregation functions. But I do have the MongoDB shell.
EDIT: To be clear I want the longest throughout the whole collection. So there might be 1000 documents but only one result with the longest length for each field throughout the whole collection.
Use .mapReduce() for this to reduce down to the largest values for each key:
db.collection.mapReduce(
function() {
emit(null,
Object.keys(this.someotherdata).map(k => ({ [k]: this.someotherdata[k].length }))
.reduce((acc,curr) => Object.assign(acc,curr),{})
);
},
function(key,values) {
var result = {};
values.forEach(value => {
Object.keys(value).forEach(k => {
if (!result.hasOwnProperty(k))
result[k] = 0;
if ( value[k] > result[k] )
result[k] = value[k];
});
});
return result;
},
{
"out": { "inline": 1 },
"query": { "someotherdata": { "$exists": true } }
}
)
Which basically emits the "length" of each key present in the sub-document path for each document, and then in "reduction", only the largest "length" for each key is actually returned.
Note that in mapReduce you need to output the same structure you put in, since the way it deals with a large number of documents is by "reducing" in gradual batches. That is why we emit in numeric form, just as the "reduce" function does.
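You can see the re-entrancy requirement by running the reduce function standalone on two partial results (sample length values assumed): the output has the same shape as each input, holding the per-key maximum:

```javascript
// The reduce function from the mapReduce call above, as plain JS
function reduceMax(key, values) {
  var result = {};
  values.forEach(value => {
    Object.keys(value).forEach(k => {
      if (!result.hasOwnProperty(k))
        result[k] = 0;
      if (value[k] > result[k])
        result[k] = value[k];  // keep only the largest length per key
    });
  });
  return result;
}

const merged = reduceMax(null, [{ place1: 25, place2: 10 }, { place1: 12, place2: 26 }]);
console.log(merged); // { place1: 25, place2: 26 }
```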
Gives this output on your document shown in the question. Of course it's the "max" on all documents in the collection when you have more.
{
"_id" : null,
"value" : {
"place1" : 25.0,
"place2" : 26.0,
"place3" : 13.0,
"place4" : 38.0
}
}
For the interested, the context of the question is in fact that features of MongoDB 3.4 were not available to them. But to do the same thing using .aggregate() where the features are available:
db.collection.aggregate([
{ "$match": { "someotherdata": { "$exists": true } } },
{ "$project": {
"_id": 0,
"someotherdata": {
"$map": {
"input": { "$objectToArray": "$someotherdata" },
"as": "s",
"in": { "k": "$$s.k", "v": { "$strLenCP": "$$s.v" } }
}
}
}},
{ "$unwind": "$someotherdata" },
{ "$group": {
"_id": "$someotherdata.k",
"v": { "$max": "$someotherdata.v" }
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": null,
"data": {
"$push": { "k": "$_id", "v": "$v" }
}
}},
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": "$data"
}
}}
])
With the same output:
{
"place1" : 25,
"place2" : 26,
"place3" : 13,
"place4" : 38
}
Use cursor.forEach() to iterate through the collection.
Keep track of the longest place-n values (starting from -1, updating whenever a greater length is found), then print the values with print() or printjson().

How to Sort by Weighted Values

I have a problem where I want to sort the result of a query based on field values from another collection.
Specifically: I want to first get user 123's friends, then get their posts, and then sort the posts by the friends' strength values.
I have this:
I have this :
POST COLLECTION:
{
user_id: 8976,
post_text: 'example working',
}
{
user_id: 673,
post_text: 'something',
}
USER COLLECTION:
{
user_id: 123,
friends: {
{user_id: 673,strength:4}
{user_id: 8976,strength:1}
}
}
Based on the information you have retrieved from your user you essentially want to come out to an aggregation framework query that looks like this:
db.posts.aggregate([
{ "$match": { "user_id": { "$in": [ 673, 8976 ] } } },
{ "$project": {
"user_id": 1,
"post_text": 1,
"weight": {
"$cond": [
{ "$eq": [ "$user_id", 8976 ] },
1,
{ "$cond": [
{ "$eq": [ "$user_id", 673 ] },
4,
0
]}
]
}
}},
{ "$sort": { "weight": -1 } }
])
So why aggregation when this does not aggregate? As you can see, the aggregation framework does more than just aggregate. Here it is being used to "project" a new field into the document and populate it with a "weight" to sort on. This allows you to get the results back ordered by the value you want them sorted on.
Of course, you need to get from your initial data to this form in a "generated" way that you can do for any data. This takes a few steps, but here I'll present the JavaScript way to do it, which should be easy to convert to most languages.
Also presuming your actual "user" looks more like this, which would be valid:
{
"user_id": 123,
"friends": [
{ "user_id": 673, "strength": 4 },
{ "user_id": 8976, "strength": 1 }
]
}
From an object like this you then construct the aggregation pipeline:
// user is the structure shown above
var stack = [];
args = [];
user.friends.forEach(function(friend) {
args.push( friend.user_id );
var rec = {
"$cond": [
{ "$eq": [ "user_id", friend.user_id ] },
friend.strength
]
};
if ( stack.length == 0 ) {
rec["$cond"].push(0);
} else {
var last = stack.pop();
rec["$cond"].push( last );
}
stack.push( rec );
});
var pipeline = [
{ "$match": { "user_id": { "$in": args } } },
{ "$project": {
"user_id": 1,
"post_text": 1,
"weight": stack[0]
}},
{ "$sort": { "weight": -1 } }
];
db.posts.aggregate(pipeline);
And that is all there is to it. Now you have some code to go through the list of "friends" for a user and construct another query to get all posts from those friends weighted by the "strength" value for each.
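Running that builder loop on the sample user (plain JavaScript, no database needed) shows the nested $cond expression it produces, with the last friend processed forming the outermost condition:

```javascript
const user = {
  user_id: 123,
  friends: [
    { user_id: 673, strength: 4 },
    { user_id: 8976, strength: 1 }
  ]
};

let stack = [];
const args = [];
user.friends.forEach(function(friend) {
  args.push(friend.user_id);
  const rec = {
    "$cond": [{ "$eq": ["$user_id", friend.user_id] }, friend.strength]
  };
  // first record falls through to 0; later ones nest the previous $cond
  rec["$cond"].push(stack.length === 0 ? 0 : stack.pop());
  stack.push(rec);
});

console.log(JSON.stringify(stack[0]));
// {"$cond":[{"$eq":["$user_id",8976]},1,{"$cond":[{"$eq":["$user_id",673]},4,0]}]}
```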
Of course you could do much the same things with a query for all posts by just removing or changing the $match, but keeping the "weight" projection you can "float" all of the "friends" posts to the top.
