How to complete (update) a MongoDB document - JavaScript

I am facing the following situation:
An API request to one service creates multiple MongoDB documents in one collection, for example:
[
  { _id: 1, test1: 2, test: 3 },
  { _id: 2, test1: 3, test: 4 }
]
After this, a second service reads these documents and looks up another part of the information, so I end up building an object like:
[
  { _id: 1, newValue: 4 },
  { _id: 2, newValue: 4 }
]
My question is: can I use the "_id" value to update all the documents at once? The problem is that I don't want to update the documents one by one because there are too many, so I am guessing it is better to delete all these documents and re-insert them with the full information. What do you think?

A mass deletion followed by "re-insertion" is a bit heavy-handed and will also take its toll on index management. Doing multiple updates in bulk using bulkWrite is the better way:
var bulkOps = [];
// `updates` is the [{ _id, newValue }, ...] array built by the second service
updates.forEach(function (item) {
  bulkOps.push({
    "updateOne": {
      "filter": { "_id": item._id },
      "update": { "$set": { "newValue": item.newValue } }
    }
  });
});
// Send a single big bundle of updates. This could also be wrapped in a transaction:
printjson( db.foo.bulkWrite(bulkOps) );
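If your services use the Node.js driver rather than the shell, a minimal sketch of the same pattern might look like this (collection and variable names are illustrative, not from the question):
// inside an async function; `updates` is the [{ _id, newValue }, ...] array
const ops = updates.map(({ _id, newValue }) => ({
  updateOne: {
    filter: { _id },
    update: { $set: { newValue } }
  }
}));
// One round trip for all updates; the result reports matched/modified counts.
const result = await db.collection('foo').bulkWrite(ops);
console.log(result.modifiedCount);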

Related

How to create "products filter" efficiently in Node.js and Angular?

I'm creating an Angular application (a computer online store) with a Node/Express backend. I have a products page that displays all the products in my DB. A product has this model (TypeScript):
interface Product {
  name: string
  properties: { name: string, value: string | number }[]
}
I have a section on the page where you can filter products by properties. For instance, a user can filter all the CPUs that have 4 cores or 8 cores. Right now this is implemented like this:
In the Angular application I query ALL THE PRODUCTS of the requested category,
then loop through all of them, collect their properties and all the possible values, and filter like this...
const products = [
  {
    name: 'intel cpu 1',
    properties: [
      { name: 'cores', value: 8 },
      { name: 'clock speed', value: 2.6 }
    ]
  },
  {
    name: 'intel cpu 2',
    properties: [
      { name: 'cores', value: 4 },
      { name: 'clock speed', value: 1.2 }
    ]
  }
]
collectPropertiesFromProducts(products)
// RESULT:
[
  { property: 'cores', possibleValues: [4, 8] },
  { property: 'clock speed', possibleValues: [1.2, 2.6] }
]
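For reference, a minimal sketch of what such a collectPropertiesFromProducts function might look like (the original implementation is not shown in the question):
function collectPropertiesFromProducts(products) {
  // Gather every value seen for each property name across all products.
  const byName = new Map();
  for (const product of products) {
    for (const { name, value } of product.properties) {
      if (!byName.has(name)) byName.set(name, new Set());
      byName.get(name).add(value);
    }
  }
  return [...byName].map(([property, values]) => ({
    property,
    possibleValues: [...values]
  }));
}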
For now it works great: I can filter products easily by the result and it is all dynamic (I can just add a property to a product and that's it).
The problem is that it scales VERY BADLY, because:
I have to query all of the products just to learn their properties.
More products/properties = more CPU time = blocks the main thread.
My question is: how can I do better? I have a Node server, so just moving all the logic there is pretty useless; I could also move the "property collecting" function to a worker thread, but again, I'll still have to query all the products...
Instead of dealing with this in the client or in the service itself, you can let MongoDB do the calculations for you. E.g. you could write the following aggregation:
db.getCollection('products').aggregate([
  { $unwind: "$properties" },
  {
    $project: {
      name: "$properties.name",
      value: "$properties.value"
    }
  },
  {
    $group: {
      _id: "$name",
      possibleValues: { $addToSet: "$value" }
    }
  }
])
You could then expose this query through a custom endpoint (e.g. GET /product-properties) on your Node server and consume the response on the client.
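A minimal sketch of such an endpoint, assuming Express and the official Node.js MongoDB driver (database, collection, and connection details are illustrative):
const express = require('express');
const { MongoClient } = require('mongodb');

const app = express();
const client = new MongoClient('mongodb://localhost:27017'); // hypothetical connection string

app.get('/product-properties', async (req, res) => {
  // Same aggregation as above, run server-side.
  const properties = await client.db('shop').collection('products').aggregate([
    { $unwind: '$properties' },
    { $project: { name: '$properties.name', value: '$properties.value' } },
    { $group: { _id: '$name', possibleValues: { $addToSet: '$value' } } }
  ]).toArray();
  res.json(properties);
});

async function main() {
  await client.connect();
  app.listen(3000);
}
main();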
You should consider making multiple requests to the backend:
First: getQueryParams, a new endpoint which returns your RESULT.
Second: an unfiltered request to receive the initial set of products.
Third: when a filter is selected (based on the first request), make a new request with the selected filter, as sketched below.
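As a sketch, that third request could translate the selected filters into an $elemMatch query on the server (route and parameter handling are hypothetical; app and client are as in the earlier sketch):
// e.g. GET /products?cores=4 → products whose properties array
// contains an entry { name: 'cores', value: 4 }
app.get('/products', async (req, res) => {
  const conditions = Object.entries(req.query).map(([name, value]) => ({
    properties: { $elemMatch: { name, value: isNaN(value) ? value : Number(value) } }
  }));
  const query = conditions.length ? { $and: conditions } : {};
  const products = await client.db('shop').collection('products').find(query).toArray();
  res.json(products);
});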

Mongoose: Sorting

What's the best way to sort the following documents in a collection:
{"topic":"11.Topic","text":"a.Text"}
{"topic":"2.Topic","text":"a.Text"}
{"topic":"1.Topic","text":"a.Text"}
I am using the following:
find({ topic: req.body.topic }).sort({ topic: 1 })
but it is not working (because the fields are strings, not numbers, so I get):
{"topic":"1.Topic","text":"a.Text"},
{"topic":"11.Topic","text":"a.Text"},
{"topic":"2.Topic","text":"a.Text"}
but i'd like to get:
{"topic":"1.Topic","text":"a.Text"},
{"topic":"2.Topic","text":"a.Text"},
{"topic":"11.Topic","text":"a.Text"}
I read in another post here that this would require complex sorting, which Mongoose doesn't have. So perhaps there is no real solution with this architecture?
Your help is greatly appreciated.
I suggest you make your topic field of type Number and create another field topic_text.
Your schema would look like:
var documentSchema = new mongoose.Schema({
  topic: Number,
  topic_text: String,
  text: String
});
Documents would then look something like this:
{ "topic": 11, "topic_text": "Topic", "text": "a.Text" }
{ "topic": 2, "topic_text": "Topic", "text": "a.Text" }
{ "topic": 1, "topic_text": "Topic", "text": "a.Text" }
Thus, you will be able to use .sort({ topic: 1 }) and get the result you want.
When using the topic value, append topic_text to it:
find({ topic: req.body.topic }).sort({ topic: 1 }).exec(function (err, result) {
  // use index i to extract the value from the result array
  var topic = result[0].topic + result[0].topic_text;
});
If you do not want to (or cannot) change the shape of your documents to include a numeric field for the topic number, you can achieve your desired sorting with the aggregation framework.
The following pipeline essentially splits topic strings like '11.Topic' on the dot '.' and then pads the first part of the resulting array with a fixed number of leading zeros, so that sorting those padded strings 'emulates' numeric sorting. For example, '11.Topic' yields the number string '11' (length 2); prefixing gives '_000011', and taking 5 characters starting at offset 2 leaves '00011', which correctly sorts after the '00002' produced for '2.Topic'.
Note however that this pipeline uses the $split and $strLenBytes operators, which are pretty new, so you may have to update your MongoDB instance - I used version 3.3.10.
db.getCollection('yourCollection').aggregate([
  {
    $project: {
      topic: 1,
      text: 1,
      tmp: {
        $let: {
          vars: {
            numStr: { $arrayElemAt: [{ $split: ["$topic", "."] }, 0] }
          },
          in: {
            topicNumStr: "$$numStr",
            topicNumStrLen: { $strLenBytes: "$$numStr" }
          }
        }
      }
    }
  },
  {
    $project: {
      topic: 1,
      text: 1,
      topicNumber: { $substr: [{ $concat: ["_0000", "$tmp.topicNumStr"] }, "$tmp.topicNumStrLen", 5] }
    }
  },
  { $sort: { topicNumber: 1 } },
  {
    $project: {
      topic: 1,
      text: 1
    }
  }
])

Insert array of objects into MongoDB

I wonder how I can insert an array of objects into a Mongo collection as root-level documents, with their own pre-defined _id values.
I have tried db.MyCollection.insert(array); but it creates nested documents under one single generated _id in MongoDB.
var array = [
  { _id: 'rg8nsoqsxhpNYho2N',
    goals: 0,
    assists: 1,
    total: 1 },
  { _id: 'yKMx6sHQboL5m8Lqx',
    goals: 0,
    assists: 1,
    total: 1 }];
db.MyCollection.insert(array);
What I want is each object inserted as its own root-level document, keeping its pre-defined _id.
db.collection.insertMany() is what you need (supported from 3.2):
db.users.insertMany(
  [
    { name: "bob", age: 42, status: "A" },
    { name: "ahn", age: 22, status: "A" },
    { name: "xi", age: 34, status: "D" }
  ]
)
output:
{
  "acknowledged" : true,
  "insertedIds" : [
    ObjectId("57d6c1d02e9af409e0553dff"),
    ObjectId("57d6c1d02323d119e0b3c0e8"),
    ObjectId("57d6c1d22323d119e0b3c16c")
  ]
}
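Note that insertMany preserves pre-defined _id values, so with the array from the question the custom string ids are kept and echoed back:
db.MyCollection.insertMany(array);
// => { "acknowledged": true,
//      "insertedIds": ["rg8nsoqsxhpNYho2N", "yKMx6sHQboL5m8Lqx"] }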
Why not iterate over the array objects, and insert them one at a time?
array.forEach((item) => db.MyCollection.insert(item));
Go through this link to get the exact outcome you want:
https://docs.mongodb.org/manual/tutorial/insert-documents/#insert-a-document
You can use MongoDB Bulk operations to insert multiple documents in one single call to the database.
First iterate over your array and call the bulk insert method for each item:
bulk.insert(item)
After the loop, call execute:
bulk.execute()
Take a look at the referenced documentation to learn more.
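A minimal shell sketch of that flow, assuming the array from the question:
var bulk = db.MyCollection.initializeUnorderedBulkOp();
array.forEach(function (item) {
  bulk.insert(item); // queue each document
});
bulk.execute();      // one round trip to the server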

MongoDB aggregate merge two different fields as one and get count

I have the following data in MongoDB:
[
  { id: 3132, home: 'NSH', away: 'BOS' },
  { id: 3112, home: 'ANA', away: 'CGY' },
  { id: 3232, home: 'MIN', away: 'NSH' }
]
Is it possible to get the total game count for each team with an aggregation pipeline?
Desired result:
[{ team: 'NSH', totalGames: 2 }, { team: 'MIN', totalGames: 1 }, ...]
I can get each one separately into its own array with two aggregate calls:
[{ $group: { _id: "$home", totalGames: { $sum: 1 } } }]
and
[{ $group: { _id: "$away", totalGames: { $sum: 1 } } }]
resulting in
var homeGames = [ { _id: 'NSH', totalGames: 1 }, { _id: 'SJS', totalGames: 2 }, ...]
var awayGames = [ { _id: 'NSH', totalGames: 1 }, { _id: 'SJS', totalGames: 4 }, ...]
But I really want to get it working with just one query. If that's not possible, what would be the best way to combine these two results into one using JavaScript?
After some puzzling, I found a way to get it done using an aggregate pipeline. Here is the result:
db.games.aggregate([
  {
    $project: {
      isHome: { $literal: [true, false] },
      home: true,
      away: true
    }
  },
  { $unwind: '$isHome' },
  {
    $group: {
      _id: { $cond: { if: '$isHome', then: '$home', else: '$away' } },
      totalGames: { $sum: 1 }
    }
  }
]);
As you can see, it consists of three stages. The first two duplicate each document into one for the home team and one for the away team. To do this, the $project stage first creates a new isHome field on each document containing both a true and a false value, which the $unwind stage then splits into separate documents containing either the true or the false value.
Then, in the $group stage, we let the isHome field decide whether to group on the home or the away field.
It would be nicer if we could create a team field in the $project step containing the array [$home, $away], but mongo only supports adding array literals here, hence the workaround.
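Run against the three sample documents from the question, this pipeline yields one document per team (in no guaranteed order):
[
  { _id: 'NSH', totalGames: 2 },
  { _id: 'BOS', totalGames: 1 },
  { _id: 'ANA', totalGames: 1 },
  { _id: 'CGY', totalGames: 1 },
  { _id: 'MIN', totalGames: 1 }
]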

MongoDB mapReduce method unexpected results

I have 100 documents in my MongoDB, where each of them is a possible duplicate of other document(s) under different conditions, such as firstName & lastName, email, and mobile phone.
I am trying to mapReduce these 100 documents to get key-value pairs, like grouping.
Everything works fine until I have the 101st duplicate record in the DB.
The output of the mapReduce result for the documents which are duplicates of the 101st record is corrupted.
For example:
I am working on firstName & lastName now.
When the DB contains 100 documents, I can have the result containing
{
  _id: {
    firstName: "foo",
    lastName: "bar"
  },
  value: {
    count: 20,
    duplicate: [{
      id: ObjectId("/*an object id*/"),
      fullName: "foo bar",
      DOB: ISODate("2000-01-01T00:00:00.000Z")
    }, {
      id: ObjectId("/*another object id*/"),
      fullName: "foo bar",
      DOB: ISODate("2000-01-02T00:00:00.000Z")
    }, ...]
  }
}
It is what exactly I want, but...
when the DB contains more than 100 possible duplicate documents, the result becomes corrupted like this.
Let's say the 101st document is:
{
  firstName: "foo",
  lastName: "bar",
  email: "foo@bar.com",
  mobile: "019894793"
}
When the DB contains 101 documents:
{
  _id: {
    firstName: "foo",
    lastName: "bar"
  },
  value: {
    count: 21,
    duplicate: [{
      id: undefined,
      fullName: undefined,
      DOB: undefined
    }, {
      id: ObjectId("/*another object id*/"),
      fullName: "foo bar",
      DOB: ISODate("2000-01-02T00:00:00.000Z")
    }]
  }
}
When the DB contains 102 documents:
{
  _id: {
    firstName: "foo",
    lastName: "bar"
  },
  value: {
    count: 22,
    duplicate: [{
      id: undefined,
      fullName: undefined,
      DOB: undefined
    }, {
      id: undefined,
      fullName: undefined,
      DOB: undefined
    }]
  }
}
I found another topic on Stack Overflow with a similar issue, but the answer does not work for me:
MapReduce results seem limited to 100?
Any ideas?
Edit:
Original source code:
var map = function () {
  var value = {
    count: 1,
    userId: this._id
  };
  emit({ lastName: this.lastName, firstName: this.firstName }, value);
};
var reduce = function (key, values) {
  var reducedObj = {
    count: 0,
    userIds: []
  };
  values.forEach(function (value) {
    reducedObj.count += value.count;
    reducedObj.userIds.push(value.userId);
  });
  return reducedObj;
};
Source code now:
var map = function () {
  var value = {
    count: 1,
    users: [this]
  };
  emit({ lastName: this.lastName, firstName: this.firstName }, value);
};
var reduce = function (key, values) {
  var reducedObj = {
    count: 0,
    users: []
  };
  values.forEach(function (value) {
    reducedObj.count += value.count;
    reducedObj.users = reducedObj.users.concat(values.users); // or using the forEach method
    // value.users.forEach(function (user) {
    //   reducedObj.users.push(user);
    // });
  });
  return reducedObj;
};
I don't understand why it would fail, as I was also pushing a value (userId) to reducedObj.userIds.
Are there problems with the value that I emitted in the map function?
Explaining the problem
This is a common mapReduce trap, but clearly part of the problem you have here is that the questions you are finding don't have answers that explain this clearly or even properly. So an answer is justified here.
The point in the documentation that is often missed, or at least misunderstood, is this:
MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
And adding to that just a little later down the page:
the type of the return object must be identical to the type of the value emitted by the map function.
What this means in the context of your question is that at a certain point there are "too many" duplicate key values being passed in for a reduce stage to act on them in one single pass, as it is able to do for a lower number of documents. By design, the reduce method is called multiple times, often taking the "output" from data that is already reduced as part of its "input" for yet another pass.
This is how mapReduce is designed to handle very large datasets: by processing everything in "chunks" until it finally "reduces" down to a single grouped result per key. This is why the next statement is important: what comes out of the emit and what the reduce returns need to be structured exactly the same in order for the reduce code to handle it correctly.
Solving the problem
You correct this by fixing both how you emit the data in the map and how you return and process it in the reduce function:
db.collection.mapReduce(
  function () {
    emit(
      { "firstName": this.firstName, "lastName": this.lastName },
      { "count": 1, "duplicate": [this] } // Note [this]
    );
  },
  function (key, values) {
    var reduced = { "count": 0, "duplicate": [] };
    values.forEach(function (value) {
      reduced.count += value.count;
      value.duplicate.forEach(function (duplicate) {
        reduced.duplicate.push(duplicate);
      });
    });
    return reduced;
  },
  { "out": { "inline": 1 } }
)
The key points can be seen in both the content of the emit and the first line of the reduce function. Essentially these present the same structure. In the case of the emit it does not matter that the array being produced only has a single element; you send it that way anyhow. Side by side:
{ "count": 1, "duplicate": [this] } // Note [this]
// Same as
var reduced = { "count": 0, "duplicate": [] };
That also means that the remainder of the reduce function will always assume that the "duplicate" content is in fact an array, because that is how it came as original input and is also how it will be returned:
values.forEach(function (value) {
  reduced.count += value.count;
  value.duplicate.forEach(function (duplicate) {
    reduced.duplicate.push(duplicate);
  });
});
return reduced;
Alternate Solution
The other reason for an answer is that, considering the output you are expecting, this would in fact be much better suited to the aggregation framework. It will do this a lot faster than mapReduce can, and is far simpler to code:
db.collection.aggregate([
{ "$group": {
"_id": { "firstName": "$firstName", "lastName": "$lastName" },
"duplicate": { "$push": "$$ROOT" },
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gt": 1 } }}
])
That's all there is to it. You can write out to a collection by adding an $out stage where required, as shown below. But basically, with either mapReduce or aggregate, you are still subject to the same 16MB document size restriction when adding your "duplicate" items into an array.
Also note that you can simply do something here that mapReduce cannot, and just "omit" from the results any items that are not in fact a "duplicate". The mapReduce method cannot do this without first producing output to a collection and then "filtering" the results in a separate query.
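For example, a sketch persisting only the real duplicates to a collection (the output collection name here is hypothetical):
db.collection.aggregate([
  { "$group": {
    "_id": { "firstName": "$firstName", "lastName": "$lastName" },
    "duplicate": { "$push": "$$ROOT" },
    "count": { "$sum": 1 }
  }},
  { "$match": { "count": { "$gt": 1 } }},
  { "$out": "duplicates" } // write the filtered groups to a collection
])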
That core documentation itself quotes:
NOTE
For most aggregation operations, the Aggregation Pipeline provides better performance and more coherent interface. However, map-reduce operations provide some flexibility that is not presently available in the aggregation pipeline.
So it's really a case of weighing up which is better suited to the problem at hand.
