MongoDB - mapReduce - javascript

I have a MongoDB collection where each document looks like this:
{
"_id": 1,
"name": "Aurelia Menendez",
"scores": [{
"score": 60.06045071030959,
"type": "exam"
}, {
"score": 52.79790691903873,
"type": "quiz"
}, {
"score": 71.76133439165544,
"type": "homework"
}]
}
I tried to run:
db.students.mapReduce(
function() {
emit(this._id, this.scores.map(a => a.score));
},
function(_id, values) {
// here I tried each of these in turn:
// 1) return values.reduce((a, b) => a + b);
// 2) return values.reduce((a, b) => a + b, 0);
// 3) return Array.sum(values);
},
{ out: "total_scores" }
)
What do I get? A collection where each document looks like one of these:
"value" is an array:
{
"_id": 20,
"value": [42.17439799514388, 71.99314840599558, 81.23972632069464]
}
"value" is a string:
{
"_id": 188,
"value": "060.314725741828,41.12327471818652,74.8699176311771"
}
"value" is an array:
{
"_id": 193,
"value": [47.67196715489599, 41.55743490493954, 70.4612811769744]
}
Why don't I get the sum of the elements? When I try this.scores or this.scores.score instead of this.scores.map(a => a.score), I get either all the attributes or null values.
Does anyone have an idea what I did wrong?

You should use the aggregation pipeline instead of map-reduce. This note is from the official MongoDB documentation:
Aggregation pipeline provides better performance and a more coherent
interface than map-reduce, and various map-reduce operations can be
rewritten using aggregation pipeline operators, such as $group,
$merge, $accumulator, etc..
The steps I used to build the aggregation stages:
1. Open the Aggregations tab in MongoDB Compass to test the aggregation.
2. Add the stages: $match to filter the student, $unwind to flatten the scores array, and $group to total the scores with $sum.
3. Export the pipeline to code.
The result is:
[
{
'$match': {
'name': 'Aurelia Menendez'
}
}, {
'$unwind': {
'path': '$scores'
}
}, {
'$group': {
'_id': '$_id',
'total': {
'$sum': '$scores.score'
}
}
}
]
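As for why the map-reduce attempts did not produce sums: the map function emits an array per student, so values inside reduce is an array of arrays, and MongoDB does not call reduce at all for a key that has only one emitted value. The plain JavaScript below, runnable in Node.js without a database, is a minimal sketch of what the attempts do with such input; sampleValues, doc and totalScore are hypothetical names and data used only for illustration.

```javascript
// What reduce receives when map emits an array: an array of arrays.
const sampleValues = [[60, 52, 71]];

// Attempt 2 from the question: 0 + [60, 52, 71] coerces the array to a string.
const concatenated = sampleValues.reduce((a, b) => a + b, 0);
console.log(concatenated); // "060,52,71"

// Attempt 1 has no initial value, so a single element is returned unchanged.
const unchanged = sampleValues.reduce((a, b) => a + b);
console.log(unchanged); // [ 60, 52, 71 ]

// Summing inside map avoids the problem: emit one number per document.
function totalScore(doc) {
  return doc.scores.reduce((sum, s) => sum + s.score, 0);
}

const doc = {
  _id: 1,
  scores: [
    { score: 60, type: "exam" },
    { score: 52, type: "quiz" },
    { score: 71, type: "homework" }
  ]
};
console.log(totalScore(doc)); // 183
```

If you do stay with mapReduce, the same idea would mean emitting this.scores.reduce((sum, s) => sum + s.score, 0) from the map function, so that each emitted value is already a number.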

Related

MongoDB conditional update array elements

I'm using Mongoose in a Node.js backend and I need to update a subset of elements of an array within a document based on a condition. I used to perform the operations using save(), like this:
const channel = await Channel.findById(id);
channel.messages.forEach((i) =>
i._id.toString() === messageId && i.views < channel.counter
? i.views++
: null
);
await channel.save();
I'd like to change this code to use findByIdAndUpdate, since it is only an increment and, for my use case, there is no need to retrieve the document. Any suggestions on how I can perform the operation?
Of course, channel.messages is the array under discussion. views and counter are both of type Number.
EDIT - Example document:
{
"_id": {
"$oid": "61546b9c86a9fc19ac643924"
},
"counter": 0,
"name": "#TEST",
"messages": [{
"views": 0,
"_id": {
"$oid": "61546bc386a9fc19ac64392e"
},
"body": "test",
"sentDate": {
"$date": "2021-09-29T13:36:03.092Z"
}
}, {
"views": 0,
"_id": {
"$oid": "61546dc086a9fc19ac643934"
},
"body": "test",
"sentDate": {
"$date": "2021-09-29T13:44:32.382Z"
}
}],
"date": {
"$date": "2021-09-29T13:35:33.011Z"
},
"__v": 2
}
You can use the updateOne method if you don't need the document returned:
- Match on both the document _id and the message _id ("messages._id").
- In $expr, use $filter to iterate over the messages array and keep the elements whose _id equals messageId and whose views is less than counter. The $ne comparison against [] then checks that at least one such element exists.
- Use $inc with the $ positional operator to increment views by 1 when the query matches.
messageId = mongoose.Types.ObjectId(messageId);
await Channel.updateOne(
{
_id: id,
"messages._id": messageId,
$expr: {
$ne: [
{
$filter: {
input: "$messages",
cond: {
$and: [
{ $eq: ["$$this._id", messageId] },
{ $lt: ["$$this.views", "$counter"] }
]
}
}
},
[]
]
}
},
{ $inc: { "messages.$.views": 1 } }
)
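A different way to express the same conditional increment, assuming MongoDB 4.2 or later (which accepts an aggregation pipeline as the update document), is to rebuild the array with $map and $cond. This is a sketch rather than the method used above; buildViewIncrementPipeline is a hypothetical helper, and the snippet only builds and prints the pipeline, so it runs without a database.

```javascript
// Hypothetical helper: builds an update pipeline that increments views
// only for the message whose _id matches and whose views is below counter.
function buildViewIncrementPipeline(messageId) {
  return [
    {
      $set: {
        messages: {
          $map: {
            input: "$messages",
            as: "m",
            in: {
              $cond: [
                {
                  $and: [
                    { $eq: ["$$m._id", messageId] },
                    { $lt: ["$$m.views", "$counter"] }
                  ]
                },
                // matching element: copy it with views + 1
                { $mergeObjects: ["$$m", { views: { $add: ["$$m.views", 1] } }] },
                // every other element is kept unchanged
                "$$m"
              ]
            }
          }
        }
      }
    }
  ];
}

// A string stands in for the ObjectId here; real code would pass an ObjectId.
const pipeline = buildViewIncrementPipeline("61546bc386a9fc19ac64392e");
console.log(JSON.stringify(pipeline, null, 2));
```

The pipeline would be passed as the second argument, for example Channel.updateOne({ _id: id }, pipeline). Unlike the $inc form it rewrites the whole messages array, and whether your Mongoose version passes update pipelines through unchanged is worth checking before relying on it.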

mongodb to return object from facet

Is it possible to have $facet return an object instead of an array? It seems a bit counterintuitive to need to access result[0].total instead of just result.total.
code (using mongoose):
Model
.aggregate()
.match({
"name": { "$regex": name },
"user_id": ObjectId(req.session.user.id),
"_id": { "$nin": except }
})
.facet({
"results": [
{ "$skip": start },
{ "$limit": finish },
{
"$project": {
"map_levels": 0,
"template": 0
}
}
],
"total": [
{ "$count": "total" },
]
})
.exec()
Each field you get from $facet represents a separate aggregation pipeline, and that's why you always get an array. You can use $addFields to overwrite the existing total with a single element. To get that first item, use $arrayElemAt:
Model
.aggregate()
.match({
"name": { "$regex": name },
"user_id": ObjectId(req.session.user.id),
"_id": { "$nin": except }
})
.facet({
"results": [
{ "$skip": start },
{ "$limit": finish },
{
"$project": {
"map_levels": 0,
"template": 0
}
}
],
"total": [
{ "$count": "total" },
]
})
.addFields({
"total": {
$arrayElemAt: [ "$total", 0 ]
}
})
.exec()
You can also try this, which returns the count itself (or 0 when nothing matched) instead of a sub-document:
Model
.aggregate()
.match({
"name": { "$regex": name },
"user_id": ObjectId(req.session.user.id),
"_id": { "$nin": except }
})
.facet({
"results": [
{ "$skip": start },
{ "$limit": finish },
{
"$project": {
"map_levels": 0,
"template": 0
}
}
],
"total": [
{ "$count": "total" },
]
})
.addFields({
"total": {
"$ifNull": [{ "$arrayElemAt": [ "$total.total", 0 ] }, 0]
}
})
.exec()
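On the client side, the single-element array can also be unpacked with destructuring. The sketch below uses a hard-coded array, aggregateResult, as a stand-in for what .exec() resolves to when total is left as an array (an assumption about the shape, based on the original $facet stage without $addFields), so it runs without a database.

```javascript
// Stand-in for the resolved value of .exec(): $facet yields one document.
const aggregateResult = [
  { results: [{ name: "a" }, { name: "b" }], total: [{ total: 2 }] }
];

// Take the single facet document, then the first (and only) count document.
const [{ results, total: [countDoc] }] = aggregateResult;

// $count emits no document when nothing matched, so fall back to 0.
const total = countDoc ? countDoc.total : 0;

console.log(results.length, total); // 2 2
```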
Imagine that you want to pass the result of $facet to the next stage, say $match. $match takes a stream of documents as input and returns the documents that match an expression. If the output of $facet were a single element rather than an array, it could not be passed to $match, because the output type of $facet would differ from the input type of $match ($match is just an example). In my opinion it's better to keep the output of $facet as an array to avoid having to handle those situations.
PS: nothing I said here is official.

Find maximum length of data in keys for the collection

{
"_id" : ObjectId("59786a62a96166007d7e364dsadasfafsdfsdgdfgfd"),
"someotherdata" : {
"place1" : "lwekjfrhweriufesdfwergfwr",
"place2" : "sgfertgryrctshyctrhysdthc ",
"place3" : "sdfsdgfrdgfvk",
"place4" : "asdfkjaseeeeeeeeeeeeeeeeefjnhwklegvds."
}
}
I have thousands of these in my collection. I need to look through all the someotherdata fields and do the following:
1. Check whether each key is present (some records have place1 but not place4).
2. Find the longest value for each key (in terms of string length).
The output must look something like this (showing the character count of the longest value):
{
place1: 123,
place2: 12,
place3: 17,
place4: 445
}
I'm using MongoDB 3.2.9, so I don't have access to the new aggregate functions. But I do have the MongoDB shell.
EDIT: To be clear I want the longest throughout the whole collection. So there might be 1000 documents but only one result with the longest length for each field throughout the whole collection.
Use .mapReduce() for this to reduce down to the largest values for each key:
db.collection.mapReduce(
function() {
emit(null,
Object.keys(this.someotherdata).map(k => ({ [k]: this.someotherdata[k].length }))
.reduce((acc,curr) => Object.assign(acc,curr),{})
);
},
function(key,values) {
var result = {};
values.forEach(value => {
Object.keys(value).forEach(k => {
if (!result.hasOwnProperty(k))
result[k] = 0;
if ( value[k] > result[k] )
result[k] = value[k];
});
});
return result;
},
{
"out": { "inline": 1 },
"query": { "someotherdata": { "$exists": true } }
}
)
This emits the "length" of each key present in the sub-document path for each document, and then in the "reduction" only the largest "length" for each key is returned.
Note that in mapReduce the reduce function must return the same structure it receives, since large numbers of documents are handled by "reducing" in gradual batches, and the output of one batch is fed back in as input to the next. That is why the map function emits an object of numeric lengths, exactly the form the "reduce" function returns.
This gives the following output for the document shown in the question. With more documents in the collection, it is the "max" across all of them.
{
"_id" : null,
"value" : {
"place1" : 25.0,
"place2" : 26.0,
"place3" : 13.0,
"place4" : 38.0
}
}
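The requirement that reduce returns the same shape it receives can be checked outside MongoDB. The following is a plain JavaScript sketch with hypothetical sample documents; mapDoc and reduceMax mirror the map and reduce functions above but take ordinary arguments instead of using this and emit.

```javascript
// Mirrors the map function: one object of key -> string length per document.
function mapDoc(doc) {
  const lengths = {};
  for (const k of Object.keys(doc.someotherdata)) {
    lengths[k] = doc.someotherdata[k].length;
  }
  return lengths;
}

// Mirrors the reduce function: keep the largest length seen for each key.
function reduceMax(values) {
  const result = {};
  for (const value of values) {
    for (const k of Object.keys(value)) {
      if (!(k in result) || value[k] > result[k]) result[k] = value[k];
    }
  }
  return result;
}

// Hypothetical sample data; the second document has no place2.
const docs = [
  { someotherdata: { place1: "abc", place2: "abcdefg" } },
  { someotherdata: { place1: "abcde" } },
  { someotherdata: { place1: "a", place2: "ab", place3: "abcd" } }
];
const mapped = docs.map(mapDoc);

// Reducing everything at once and reducing in batches give the same answer,
// because the output of reduceMax is valid input to reduceMax.
const allAtOnce = reduceMax(mapped);
const inBatches = reduceMax([reduceMax(mapped.slice(0, 2)), mapped[2]]);
console.log(allAtOnce); // { place1: 5, place2: 7, place3: 4 }
console.log(inBatches); // { place1: 5, place2: 7, place3: 4 }
```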
For the interested, the context of the question is in fact that features of MongoDB 3.4 were not available to them. But to do the same thing using .aggregate() where the features are available:
db.collection.aggregate([
{ "$match": { "someotherdata": { "$exists": true } } },
{ "$project": {
"_id": 0,
"someotherdata": {
"$map": {
"input": { "$objectToArray": "$someotherdata" },
"as": "s",
"in": { "k": "$$s.k", "v": { "$strLenCP": "$$s.v" } }
}
}
}},
{ "$unwind": "$someotherdata" },
{ "$group": {
"_id": "$someotherdata.k",
"v": { "$max": "$someotherdata.v" }
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": null,
"data": {
"$push": { "k": "$_id", "v": "$v" }
}
}},
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": "$data"
}
}}
])
With the same output:
{
"place1" : 25,
"place2" : 26,
"place3" : 13,
"place4" : 38
}
Use cursor.forEach to iterate through the collection.
Keep track of the longest value seen for each placeN key (starting from -1 and updating whenever a greater length is found). Print the values with print() or printjson().
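A minimal sketch of that approach follows. longestPerKey is a hypothetical helper that accepts anything with a forEach method, so a plain array stands in here for the cursor returned by db.collection.find(); documents without someotherdata are simply skipped.

```javascript
// Hypothetical helper: tracks the longest string length seen for each key.
function longestPerKey(cursor) {
  const longest = {};
  cursor.forEach(doc => {
    const data = doc.someotherdata || {};
    Object.keys(data).forEach(k => {
      const len = String(data[k]).length;
      if (longest[k] === undefined || len > longest[k]) longest[k] = len;
    });
  });
  return longest;
}

// An array stands in for the cursor; the sample values are made up.
const sample = [
  { someotherdata: { place1: "abcd", place4: "ab" } },
  { someotherdata: { place1: "ab", place2: "abc" } },
  {}
];
console.log(longestPerKey(sample)); // { place1: 4, place4: 2, place2: 3 }
```

In the shell this would be called along the lines of printjson(longestPerKey(db.collection.find())), which pulls every document to the client, so it trades server-side work for simplicity.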

Aggregate data from array of objects

I have the following schema:
{
"_id": {
"$oid": "58c0204d9f10810115f13e5d"
},
"OrgName": "A",
"modules": [
{
"name": "test",
"fullName": "john smith",
"_id": {
"$oid": "58c0204d9f10810115f13e5e"
},
"TimeSavedPlanning": 520,
"TimeSavedWorking": 1000,
"costSaved": 0
},
{
"name": "test1",
"fullName": "john smith",
"_id": {
"$oid": "58c020f85437c22215be92cc"
},
"TimeSavedPlanning": 0,
"TimeSavedWorking": 1000,
"costSaved": 500
}
]
}
I want to aggregate the data within the "modules" array for all documents where OrgName = A and output the following totals:
TimeSavedPlanning = 520 (because 520 + 0 = 520)
TimeSavedWorking = 2000 (because 1000 + 1000 = 2000)
costSaved = 500 (because 0 + 500 = 500)
Just supply each field for the $group accumulators. And use the "double barreled" $sum to "sum" both from arrays, and from documents:
Model.aggregate([
{ "$match": { "OrgName": "A" } },
{ "$group": {
"_id": null,
"TimeSavedPlanning": { "$sum": { "$sum": "$modules.TimeSavedPlanning" } },
"TimeSavedWorking": { "$sum": { "$sum": "$modules.TimeSavedWorking" } },
"costSaved": { "$sum": { "$sum": "$modules.costSaved" } }
}}
])
You have been able to use $sum like that since MongoDB 3.2. Since that release it has two roles:
It takes an "array" of values and "sums" them together.
It acts as an "accumulator" within $group to "sum" values provided from documents.
So here you use "both" roles by "reducing" the arrays down to numeric values per document, and then "accumulating" via the $group.
Of course the $match does the "selection" right at the beginning of the pipeline. That is where it belongs, both because it determines which documents are processed and because a "first" stage $match can use an "index".
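To make the two roles concrete, here is the same computation in plain JavaScript, as an illustration only: the inner reduce plays the part of $sum over an array within one document, and the outer reduce plays the part of the $group accumulator across documents. The second document is hypothetical, added so the outer sum has more than one input.

```javascript
const docs = [
  { OrgName: "A", modules: [
    { TimeSavedPlanning: 520, TimeSavedWorking: 1000, costSaved: 0 },
    { TimeSavedPlanning: 0, TimeSavedWorking: 1000, costSaved: 500 }
  ] },
  // hypothetical second document for the same organisation
  { OrgName: "A", modules: [
    { TimeSavedPlanning: 10, TimeSavedWorking: 20, costSaved: 30 }
  ] }
];

// Inner sum: collapse one document's array to a single number.
const sumArray = (modules, field) =>
  modules.reduce((acc, m) => acc + m[field], 0);

// Outer sum: accumulate the per-document numbers across matched documents.
const total = field =>
  docs.filter(d => d.OrgName === "A")
      .reduce((acc, d) => acc + sumArray(d.modules, field), 0);

console.log(total("TimeSavedPlanning"), total("TimeSavedWorking"), total("costSaved"));
// 530 2020 530
```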

How to retrieve documents with conditioning an array of nested objects?

The structure of the objects stored in mongodb is the following:
obj = {_id: "55c898787c2ab821e23e4661", ingredients: [{name: "ingredient1", value: "70.2"}, {name: "ingredient2", value: "34"}, {name: "ingredient3", value: "15.2"}, ...]}
What I would like to do is retrieve all documents in which the value of a specific ingredient is greater than an arbitrary number.
To be more specific, suppose we want to retrieve all the documents that contain an ingredient with the name "ingredient1" and a value greater than 50.
Trying the following I couldn't retrieve desired results:
var collection = db.get('docs');
var queryTest = collection.find({$where: 'this.ingredients.name == "ingredient1" && parseFloat(this.ingredients.value) > 50'}, function(e, docs) {
console.log(docs);
});
Does anyone know what is the correct query to condition upon specific array element names and values?
Thanks!
You really don't need the JavaScript evaluation of $where here, just use basic query operators with an $elemMatch query for the array. While true that the "value" elements here are in fact strings, this is not really the point ( as I explain at the end of this ). The main point is to get it right the first time:
collection.find(
{
"ingredients": {
"$elemMatch": {
"name": "ingredient1",
"value": { "$gt": 50 }
}
}
},
{ "ingredients.$": 1 }
)
The $ in the second part is the positional operator, which projects only the matched element of the array from the query conditions.
This is also considerably faster than the JavaScript evaluation, in both that the evaluation code does not need to be compiled and uses native coded operators, as well as that an "index" can be used on the "name" and even "value" elements of the array to aid in filtering the matches.
If you expect more than one match in the array, then the .aggregate() command is the best option. With modern MongoDB versions this is quite simple:
collection.aggregate([
{ "$match": {
"ingredients": {
"$elemMatch": {
"name": "ingredient1",
"value": { "$gt": 50 }
}
}
}},
{ "$redact": {
"$cond": {
"if": {
"$and": [
{ "$eq": [ { "$ifNull": [ "$name", "ingredient1" ] }, "ingredient1" ] },
{ "$gt": [ { "$ifNull": [ "$value", 60 ] }, 50 ] }
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
])
And even simpler in releases that include the $filter operator (MongoDB 3.2 and later):
collection.aggregate([
{ "$match": {
"ingredients": {
"$elemMatch": {
"name": "ingredient1",
"value": { "$gt": 50 }
}
}
}},
{ "$project": {
"ingredients": {
"$filter": {
"input": "$ingredients",
"as": "ingredient",
"cond": {
"$and": [
{ "$eq": [ "$$ingredient.name", "ingredient1" ] },
{ "$gt": [ "$$ingredient.value", 50 ] }
]
}
}
}
}}
])
Where in both cases you are effectively "filtering" the array elements that do not match the conditions after the initial document match.
Also, since your "values" are actually "strings" right now, you really should change them to be numeric. Here is a basic process:
var bulk = collection.initializeOrderedBulkOp(),
count = 0;
collection.find().forEach(function(doc) {
doc.ingredients.forEach(function(ingredient,idx) {
var update = { "$set": {} };
// convert the string value of this array element to a number
update["$set"]["ingredients." + idx + ".value"] = parseFloat(ingredient.value);
bulk.find({ "_id": doc._id }).updateOne(update);
count++;
// send the queued updates to the server once per 1000 operations
if ( count % 1000 == 0 ) {
bulk.execute();
bulk = collection.initializeOrderedBulkOp();
}
});
});
// flush whatever remains in the last, partial batch
if ( count % 1000 != 0 )
bulk.execute();
And that will fix the data so the query forms here work.
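The per-element $set documents can also be built as plain objects first, which makes them easy to inspect before anything is written. This is a sketch: buildValueFixOps is a hypothetical helper, and the { updateOne: { filter, update } } shape is the one accepted by the Node.js driver's bulkWrite method, which is assumed to be available on your collection object.

```javascript
// Hypothetical helper: one updateOne operation per document, converting
// every ingredient value from a string to a number.
function buildValueFixOps(docs) {
  return docs.map(doc => {
    const set = {};
    doc.ingredients.forEach((ingredient, idx) => {
      set["ingredients." + idx + ".value"] = parseFloat(ingredient.value);
    });
    return { updateOne: { filter: { _id: doc._id }, update: { $set: set } } };
  });
}

const ops = buildValueFixOps([
  { _id: "55c898787c2ab821e23e4661", ingredients: [
    { name: "ingredient1", value: "70.2" },
    { name: "ingredient2", value: "34" }
  ] }
]);
console.log(JSON.stringify(ops, null, 2));
```

Printing the operations for a handful of documents first is a cheap way to confirm that the dotted paths and parsed numbers look right before running the update against the whole collection.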
This is much better than processing with JavaScript $where, which needs to evaluate every document in the collection without the benefit of an index to filter. For reference, the correct form of that approach is:
collection.find(function() {
return this.ingredients.some(function(ingredient) {
return (
( ingredient.name === "ingredient1" ) &&
( parseFloat(ingredient.value) > 50 )
);
});
})
That form also cannot "project" only the matched value(s) in the results, as the other forms can.
Try using $elemMatch:
var queryTest = collection.find(
{ ingredients: { $elemMatch: { name: "ingredient1", value: { $gte: 50 } } } }
);
