I'm starting off with Meteor and need some help with Mongo. I have a collection of names that I'm displaying on a list and want to be able to update one variable of certain entries in the database based on other criteria. Basically what I want to do is:
For every entry where characteristic A = true and B = true, change characteristic C to be false.
So far, I've been trying to figure out how Mongo can handle a "for each" loop over the elements of the collection, and for each element check if conditions A and B hold, and then collection.update(element, {C: false}). This is proving to be a lot more problematic than I thought. I want to do something like this (using dummy variable names):
for (var i = 0; i < collection.find().count(); i++) {
    if (collection[i].A === true && collection[i].B === true)
        collection.update(collection[i], {$set: {C: false}});
}
I've been changing this base code around, but am starting to sense that I'm missing something basic about indexing/how collections work in Mongo. Can you index a collection like this, and if so, is this even the most convenient way to do what I'm trying to do?
Of course I figured out how to do this right after posting, and of course it's suggested in the Meteor documentation!
And, of course, it's a simple solution:
collection.update({A: true, B: true}, {$set: {C: false}});
As already suggested in comments, the correct answer is:
collection.update({A: true, B: true}, {$set: {C: false}}, {multi: true});
(At least in pure MongoDB; see the MongoDB documentation on update.)
Without multi: true it will change only one document matching the criteria.
In Meteor it is a bit more tricky: client-side updates are only allowed by a document's _id, so there is no possibility for arbitrary selectors and no multi option; see http://docs.meteor.com/#update.
You can iterate over the results of a find and update each document by _id, but it is better to run such code server-side, as sketched below.
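A minimal sketch of such a server-side update, assuming a Meteor method (the method name resetC and the collection MyCollection are placeholders, not from the question):

// Runs on the server, where arbitrary selectors and {multi: true} are allowed.
if (Meteor.isServer) {
    Meteor.methods({
        resetC: function () {
            return MyCollection.update(
                {A: true, B: true},   // selector: both conditions must hold
                {$set: {C: false}},   // modifier
                {multi: true}         // update every matching document
            );
        }
    });
}

A client would then trigger it with Meteor.call('resetC').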
I'm looking into ways to improve the upsert performance of my MongoDB application. In my test program I have a 'user' collection which has an 'id' (type: Number) and a 'name' (type: string) property. There is a unique index on 'id'.
The Problem:
When performing a bulk write (ordered: false), updateOne or replaceOne with upsert enabled is about 6 to 8 times slower than insertOne.
My Index:
await getDb().collection('user').createIndex({
    id: 1
}, {
    unique: true,
    name: "id_index"
});
Example replaceOne (takes 8.8 seconds for 100,000 users):
operations.push({
    replaceOne: {
        filter: {id: 1},
        replacement: {id: 1, name: "user 1"},
        upsert: true
    }
})
Example updateOne (takes 8.4 seconds for 100,000 users):
operations.push({
    updateOne: {
        filter: {id: 1},
        update: {$set: {name: "user 1"}},
        upsert: true
    }
})
Example insertOne (takes 1.3 seconds for 100,000 users):
operations.push({
    insertOne: {
        document: {id: 1, name: "user 1"}
    }
})
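For reference, a sketch of how an operations array like this is submitted as a single unordered bulk write (getDb() as in the index snippet above; the exact test harness is assumed):

// Submit all queued operations in one unordered bulk write.
const result = await getDb().collection('user')
    .bulkWrite(operations, {ordered: false});
// result.insertedCount / upsertedCount / modifiedCount show what happened.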
NOTE: each time I performed these tests, the collection was emptied and the index was recreated.
Is that to be expected?
Is there anything else I can do to improve upsert performance? I have modified writeConcern on bulkWrite with little to no impact.
I was 'following' this question to see what might come of it. Seeing no answers over the week, I'll offer my own.
Without further information, I think the evidence you provided yourself is reasonably strong support for the answer 'yes, this is expected'. While we don't know details such as how many updates versus inserts were performed by your test, or which version of the database you are using, there doesn't seem to be anything blatantly wrong with the comparison. The upsert documentation suggests that the database first checks for the existence of documents that would be updated by the command before performing the insert. This is further suggested by the following text a little lower on the same page (emphasis added):
If all update() operations finish the query phase before any client successfully inserts data, and there is no unique index on the name field, each update() operation may result in an insert, creating multiple documents with name: Andy.
Based on all of this, I think it is perfectly reasonable to surmise that the update portion of an upsert operation has a noticeable overhead on the operation that is absent for direct insert operations.
Given this information, I think it raises a few additional questions which are important to consider:
What was your goal in knowing this information? Just to make sure you had configured things optimally, or were you not currently achieving some performance targets?
Depending on a variety of factors, perhaps an alternative approach would be to just attempt the insert (or update) and deal with the exceptions separately afterwards? (A sketch follows this list.)
Perhaps out of curiosity: what's the purpose of having a separate unique index on id when there is already one on the _id field? Each new index (unique or not) adds some overhead, so it might be best to repurpose the required _id field and its index for your particular needs.
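To illustrate the last two points together, a rough sketch (untested against your workload; the error-object shape can vary between driver versions) that repurposes _id as the natural key, attempts plain inserts first, and updates only the duplicates afterwards:

// Use the mandatory _id field as the natural key (no extra unique index),
// try cheap inserts first, then update only the duplicate-key failures.
async function upsertUsers(db, users) {
    const inserts = users.map(u => ({
        insertOne: {document: {_id: u.id, name: u.name}}
    }));
    let duplicates = [];
    try {
        await db.collection('user').bulkWrite(inserts, {ordered: false});
    } catch (err) {
        // With ordered: false the successful inserts are kept; collect the
        // duplicate-key errors (code 11000). Property names may differ
        // between driver versions.
        duplicates = (err.writeErrors || [])
            .filter(e => e.code === 11000)
            .map(e => users[e.index]);
    }
    if (duplicates.length > 0) {
        const updates = duplicates.map(u => ({
            updateOne: {
                filter: {_id: u.id},
                update: {$set: {name: u.name}}
            }
        }));
        await db.collection('user').bulkWrite(updates, {ordered: false});
    }
}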
How would I go about filtering a set of records based on their child records?
Let's say I have a collection Item that has a field to another collection Bag called bagId. I'd like to find all Items where a field on Bags matches some clause.
I.e. something like db.Items.find({ "where bag.type = 'Paper'" }) (pseudocode). How would I go about doing this in MongoDB? I understand I'd have to join on Bags and then match where Item.bagId == Bag._id.
I used Studio3T to convert a SQL GROUP BY to a Mongo aggregate. I'm just wondering if there's any de facto way to do this.
Should I perform a data migration to simply include Bag.type on every Item document? (I don't want to get into the habit of continuously making schema changes every time I want to sort/filter Items by Bag fields.)
Use something like https://github.com/meteorhacks/meteor-aggregate (No luck with that syntax yet)
Grapher https://github.com/cult-of-coders/grapher I played around with this briefly, and while it's cool, I'm not sure it will actually solve my problem. I can use it to add Bag.type to every Item returned, but I don't see how that would help me filter every Item by Bag.type.
Is this just one of the tradeoffs of using a NoSQL dbms? What option above is recommended or are there any other ideas?
Thanks
You could use the $in functionality of MongoDB. It would look something like this:
const bagsIds = Bags.find({type: 'paper'}, {fields: {"_id": 1}}).map(function(bag) { return bag._id; });
const items = Items.find( { bagId: { $in: bagsIds } } ).fetch();
It would take some testing to see whether the reactivity of this solution still behaves as you expect, and whether it remains suitable for larger collections, before choosing it over your first option of performing the migration.
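If you can run the query server-side (for example through an aggregation package like the meteorhacks one mentioned in the question), MongoDB's $lookup stage (available since 3.2) expresses the join directly. A sketch, assuming raw collections named Items and Bags:

// Join Items to Bags and keep only the items whose bag is paper.
const paperItems = await db.collection('Items').aggregate([
    {
        $lookup: {
            from: 'Bags',        // collection to join against
            localField: 'bagId', // field on Items
            foreignField: '_id', // field on Bags
            as: 'bag'
        }
    },
    {$unwind: '$bag'},               // one joined bag per item
    {$match: {'bag.type': 'Paper'}}  // filter on the child field
]).toArray();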
Is there a way to do this? Or with orderByValue? Any other means?
fb.orderByChild('upvotes - downvotes').startAt(_start).endAt(_end).limitToLast(_n)
    .on("child_added", function(dataSnapshot) {
        data.push(dataSnapshot.val());
    });
P.S.: Yes, I already thought about creating a third entry in the database that keeps track of (upvotes - downvotes), but that is what I would like to avoid here.
No, orderByChild does exactly what its name says. Currently there is no way to define such "computed" values, either in the data model or in query criteria.
Therefore, you should store the difference directly in the database, add an index on it using security rules, and query on that field. Updating the vote counts inside a transaction keeps the stored difference consistent in a clean way.
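A sketch using the legacy Firebase API from the question (the score field, the posts path in the rules, and the upvote helper are illustrative):

// Keep a precomputed score = upvotes - downvotes, updated atomically.
function upvote(postRef) {
    postRef.transaction(function (post) {
        if (post) {
            post.upvotes = (post.upvotes || 0) + 1;
            post.score = post.upvotes - (post.downvotes || 0);
        }
        return post;
    });
}

// Query on the stored field instead of a computed expression.
fb.orderByChild('score').startAt(_start).endAt(_end).limitToLast(_n)
    .on("child_added", function (dataSnapshot) {
        data.push(dataSnapshot.val());
    });

// Security rules, so the query is served from an index:
// { "rules": { "posts": { ".indexOn": ["score"] } } }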
Should I store objects in an array or inside an object, when top priority is write speed?
I'm trying to decide whether data should be stored as an array of objects or as nested objects inside a MongoDB document.
In this particular case, I'm keeping track of a set of continually updated files that I add and update; the file name acts as a key, and I store the number of lines processed within each file.
The document looks something like this:
{
    t_id: 1220,
    "some-other-info": {}, // there's other info here, not updated frequently
    files: {
        "log1-txt": {filename: "log1.txt", numlines: 233, filesize: 19928},
        "log2-txt": {filename: "log2.txt", numlines: 2, filesize: 843}
    }
}
or this
{
    t_id: 1220,
    "some-other-info": {},
    files: [
        {filename: "log1.txt", numlines: 233, filesize: 19928},
        {filename: "log2.txt", numlines: 2, filesize: 843}
    ]
}
My assumption is that when handling a document, especially for updates, it is easier to deal with objects, because the location of an entry can be determined by its name; unlike an array, where I would have to look through each element until I find a match.
Because the object key will have periods, I will need to convert (or drop) the periods to create a valid key (fi.le.log to filelog or fi-le-log).
I'm not worried about possible duplicate file names emerging (such as fi.le.log and fi-le.log), so I would prefer to use objects: the number of files is relatively small, but the updates are frequent.
Or would it be better to handle this data in a separate collection for best write performance?
{
    "_id": ObjectId('56d9f1202d777d9806000003'),
    "t_id": "1220",
    "filename": "log1.txt",
    "filesize": 1843,
    "numlines": 554
},
{
    "_id": ObjectId('56d9f1392d777d9806000004'),
    "t_id": "1220",
    "filename": "log2.txt",
    "filesize": 5231,
    "numlines": 3027
}
From what I understand, you are talking about write speed without any read considerations, so we have to think about how you will insert/update your documents.
We have to compare (assuming you know the _id of the document you are updating; replace {key} with the key name, in your example log1-txt or log2-txt):
db.Col.update({ _id: '' }, { $set: { 'files.{key}': object }})
vs
db.Col.update({ _id: '', 'files.filename': '{key}'}, { $set: { 'files.$': object }})
The second one means that MongoDB has to scan the array, find the matching index, and update it. The first one means MongoDB just updates the specified field.
The worst part: the second command will not work if the matching filename is not present in the array! So you have to execute it, check whether nMatched is 0, and push the element if so (sketched below). That's really bad for write speed (see MongoDB: upsert sub-document).
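A sketch of that two-step fallback in the shell (id and fileObj are placeholders; nMatched comes from the shell's WriteResult):

// Try to update the existing array element in place.
var res = db.Col.update(
    {_id: id, 'files.filename': 'log1.txt'},
    {$set: {'files.$': fileObj}}
);
// Nothing matched: a second round trip to push the new element.
if (res.nMatched === 0) {
    db.Col.update({_id: id}, {$push: {files: fileObj}});
}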
If you will never (or almost never) run read queries or the aggregation framework on this collection, go for the first one; it will be faster. If you want to aggregate, unwind, and run some analytics on the parsed files to get statistics about file sizes and line counts, consider the second one; you will avoid some headaches.
Pure write speed will be better with the first solution.
Let's say I want to maintain a list of items per user (in MongoDB with the Mongoose ODM in a Node.js environment) and later query to see if an item is owned by a user. For example, I want to store all of the favorite colors of each user, and later check whether a specific color belongs to a specific user. It seems to me that it would be better to store the colors as an embedded object within the user document rather than as an array, because checking whether a color exists is just a property lookup:
if (user.colors.yellow) {
    //true case
} else {
    //false case
}
Versus an array, where I have to iterate through the elements to see if the color exists somewhere in the array:
var found = false;
for (var i = 0; i < user.colors.length; i++) {
    if (user.colors[i] === "yellow") {
        found = true; //true case
        break;
    }
}
if (!found) { /* false case */ }
However, from many of the examples I've seen online, it seems like using arrays for this type of thing is fairly prevalent. Am I missing something? What are the pros/cons, and best way to do this?
You don't have to use a for-loop if the colors are stored as an array. You can just use indexOf, as in:
if (user.colors.indexOf("yellow") !== -1) {
    //true case
}
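On modern JavaScript runtimes (ES2016+) the same check reads even more directly with includes:

if (user.colors.includes("yellow")) {
    //true case
}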
That said, it really shouldn't matter either way whether you use an embedded document or array; it's not like one or the other is going to give a huge performance advantage.
You frequently see embedded arrays used to store this type of information in MongoDB collections because, if you add an index to the array property, they're efficient to query, and the query syntax is natural. In this case it's something like the following (assuming a users collection that contains a colors array property):
db.users.find({id: 1, colors: 'yellow'})
If you did this with individual color properties, you'd have a more awkward-looking query like this, and you'd have to index each color separately (instead of just colors as you would above):
db.users.find({id: 1, 'colors.yellow': true})
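For reference, the single index mentioned above would be created like this; MongoDB automatically builds a multikey index over the array's elements, so one index covers membership queries for every color value:

// One multikey index serves queries like {colors: 'yellow'}.
db.users.createIndex({colors: 1})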