MongoDB find in hash by value - JavaScript

I have documents in this MongoDB format:
{
    "_id": ObjectId("5406e4c49b324869198b456a"),
    "phones": {
        "12035508684": 1,
        "13399874497": 0,
        "15148399728": 1,
        "18721839971": 1,
        "98311321109": -1
    }
}
The phones field is a hash of phone numbers and the frequency of their use. I need to select all documents which have at least one phone with a frequency of zero or less.
Trying this:
db.my_collection.find({ "phones": { $lte: 0 } })
but no luck.
Thanks in advance for your advice.

You can't do that sort of query in MongoDB, at least not in a simple way, because what you are doing here is generally an "anti-pattern": part of your data is actually being specified as "keys". A better way to model this is to make that "data" the value of a key, and not the other way around:
{
    "_id": ObjectId("5406e4c49b324869198b456a"),
    "phones": [
        { "number": "12035508684", "value": 1 },
        { "number": "13399874497", "value": 0 },
        { "number": "15148399728", "value": 1 },
        { "number": "18721839971", "value": 1 },
        { "number": "98311321109", "value": -1 }
    ]
}
Then your query is quite simple:
db.collection.find({ "phones.value": { "$lte": 0 } })
But otherwise MongoDB cannot "natively" traverse the "keys" of an object/hash; to do that you need JavaScript evaluation, which is not a great idea for performance. Basically a $where query in short form:
db.collection.find(function() {
    var phones = this.phones;
    // True if any phone number has a frequency of zero or less
    return Object.keys(phones).some(function(phone) {
        return phones[phone] <= 0;
    });
})
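For reference, the short form above is equivalent to the explicit $where syntax (a sketch of the same logic):

db.collection.find({
    "$where": function() {
        var phones = this.phones;
        return Object.keys(phones).some(function(phone) {
            return phones[phone] <= 0;
        });
    }
})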
So the better option is to change the way you are modelling this and take advantage of the native operators. Otherwise most queries require an "explicit" path to any "key" inside the object/hash.

Related

How to update date string in array to date format in mongoDB?

My mongoDB collection looks like this:
[
    {
        "id": "myid",
        "field": {
            "total": 1,
            "subfield": [
                { "time": "2020-08-06T08:33:57.977+0530" },
                { "time": "2020-05-08T04:13:27.977+0530" }
            ]
        }
    },
    {
        "id": "myid2",
        "field": {
            "total": 1,
            "subfield": [
                { "time": "2020-07-31T10:15:50.184+0530" }
            ]
        }
    }
]
I need to update all the documents, converting the date strings in the time field of the subfield array to MongoDB's ISO date format.
I have thousands of documents, with hundreds of objects in each subfield array.
I'm aware of the aggregation operators $toDate and $convert, but I don't want to use aggregation because, to use $toDate or $convert, I would need to unwind the field.subfield array, which is again an expensive operation.
I want to update my documents and save them with the date format.
My MongoDB server version: 4.0.3
I tried the following, but it doesn't seem to work and also doesn't return any errors:
db.collection.find().forEach(function(doc) {
    doc.field.subfield.time = new ISODate(doc.field.subfield.time);
    db.collection.save(doc);
})
You missed a loop over subfield, because it's an array:
db.collection.find().forEach(function(doc) {
    // Convert each element's time string in place
    doc.field.subfield.forEach(function(r) {
        r.time = new ISODate(r.time);
    });
    db.collection.save(doc);
})
If this is a one-time operation then the time it takes doesn't matter much; I think aggregation and forEach will take about the same time.
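That said, with thousands of documents the per-document save() round trips add up. A hedged sketch that batches the same updates with bulkWrite (available from MongoDB 3.2, so fine on 4.0.3):

var ops = [];
db.collection.find().forEach(function(doc) {
    doc.field.subfield.forEach(function(r) {
        r.time = new ISODate(r.time);
    });
    ops.push({ replaceOne: { filter: { "_id": doc._id }, replacement: doc } });
    if (ops.length === 1000) {  // flush in batches of 1000
        db.collection.bulkWrite(ops);
        ops = [];
    }
});
if (ops.length > 0) db.collection.bulkWrite(ops);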
If you are planning to upgrade your MongoDB version, then from 4.2 an option is to update with updateMany() using an aggregation pipeline as the update:
db.collection.updateMany({},
    [{
        $set: {
            "field.subfield": {
                $map: {
                    input: "$field.subfield",
                    as: "r",
                    in: {
                        $mergeObjects: [
                            "$$r",
                            { time: { $toDate: "$$r.time" } }
                        ]
                    }
                }
            }
        }
    }]
)
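Whichever route you take, a quick way to check whether any string dates were missed (a sketch; the "string" alias for $type is available from MongoDB 3.2):

db.collection.find({ "field.subfield.time": { "$type": "string" } })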

RethinkDB: Javascript - How to delete nested objects

I'm having a rather large amount of difficulty trying to remove nested objects from my table without accidentally deleting all my data in the process (it has happened three times now; thank god I made copies).
My Object:
{
    "value1": thing,
    "value2": thing,
    "value3": thing,
    "roles": {
        "1": { "name": "Dave", "id": "1" },
        "2": { "name": "Jeff", "id": "2" },
        "3": { "name": "Rick", "id": "3" },
        "4": { "name": "Red", "id": "4" }
    }
}
I've tried a number of RethinkDB queries, but none have worked so far. It should be noted that the keys 1, 2, 3, & 4 are variables that can be any numbers, so my query must reflect that.
Some attempted queries:
function removeRole(id, roleName) {
    let role = `${roleName}`
    return this.r.table('guilds').get(id).replace(function(s) {
        return s.without({ roles: { [role]: { "name": role } } })
    })
}

function removeRole(id, roleName) {
    return this.r.table('guilds').getAll(id).filter(this.r.replace(this.r.row.without(roleName))).run()
}

function removeRole(id, roleName) {
    return this.r.table('guilds').get(id)('roles')(roleName).delete()
}
Any assistance is greatly appreciated, and if the question has issues, please let me know. Still rather new to this so feedback is appreciated.
I'm not sure if I understood your intention, but the following query seems to do what you're trying to accomplish:
r.db('test')
    .table('test')
    .get(id)
    .replace((doc) => {
        // This expression makes sure that we delete the specified keys only
        const roleKeys = doc
            .getField('roles')
            .values()
            // Make sure the role name is in the names array
            .filter(role => r.expr(names).contains(role.getField('name')))
            // This is a bit tricky, and I believe I implemented this in an
            // inefficient way, probably missing a first-class RethinkDB
            // expression that supports such a case out of the box. Since we
            // are going to delete by nested dynamic ids, RethinkDB requires
            // special syntax to denote nested ids:
            //   {roles: {ID_1: true, ID_2: true}}
            // Well, this is just a JavaScript syntax workaround, so we're
            // building such an object dynamically using fold.
            .fold({}, (acc, role) => acc.merge(r.object(role.getField('id'), true)));
        return doc.without({ roles: roleKeys });
    })
For example, if names is an array, say ['Jeff', 'Rick'], the nested roleKeys expression will be dynamically evaluated into:
{2: true, 3: true}
that is merged into the roles selector, and the above query will transform the document as follows:
{
    "value1": ...,
    "value2": ...,
    "value3": ...,
    "roles": {
        "1": { "name": "Dave", "id": "1" },
        "4": { "name": "Red", "id": "4" }
    }
}
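To fit the removeRole(id, roleName) signature from the question, the same query can be wrapped with names as a one-element array. A minimal sketch, assuming this.r is the connected driver handle and the table is 'guilds':

function removeRole(id, roleName) {
    const names = [roleName];
    return this.r.table('guilds')
        .get(id)
        .replace(doc => {
            // Build {roles: {ID: true, ...}} for every role whose name matches
            const roleKeys = doc
                .getField('roles')
                .values()
                .filter(role => this.r.expr(names).contains(role.getField('name')))
                .fold({}, (acc, role) => acc.merge(this.r.object(role.getField('id'), true)));
            return doc.without({ roles: roleKeys });
        })
        .run();
}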

Elasticsearch: can I avoid enabling fielddata on text fields?

I'm trying to get the latest records, grouped by the field groupId, which is a String like "group_a".
I followed the accepted answer of this question, but I've got the following error message:
Fielddata is disabled on text fields by default. Set fielddata=true on [your_field_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.
The Elasticsearch docs say:
Before you enable fielddata, consider why you are using a text field for aggregations, sorting, or in a script. It usually doesn’t make sense to do so.
I'm using a text field, because groupId is a String. Does it make sense to set fielddata: true if I want to group by it?
Or are there alternatives?
Using "field": "groupId.keyword" (suggested here) didn't work for me.
Thanks in advance!
The suggested answer with .keyword is the correct one; the field just needs a keyword sub-field in the mapping. In the example below that sub-field is named raw, so the aggregation uses groupId.raw:
{
    "aggs": {
        "group": {
            "terms": {
                "field": "groupId.raw"
            },
            "aggs": {
                "group_docs": {
                    "top_hits": {
                        "size": 1,
                        "sort": [
                            {
                                "timestamp (or whatever you want to sort by)": {
                                    "order": "desc"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}
with a mapping like this:
"groupId": {
    "type": "text",
    "fields": {
        "raw": {
            "type": "keyword"
        }
    }
}
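Note that with Elasticsearch's default dynamic mapping, string fields already get a keyword sub-field named keyword, so "groupId.keyword" should work out of the box. If it didn't, the existing mapping probably lacks a keyword sub-field; adding one only applies to newly indexed documents, so existing data would have to be reindexed (for example with the _reindex API) before the aggregation returns results.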

Ordering by count of filtered subdocument array elements

I currently have a MongoDB collection that looks like so:
{
    "_id": ObjectId,
    "user_id": Number,
    "updates": [
        { "_id": ObjectId, "mode": Number, "score": Number },
        { "_id": ObjectId, "mode": Number, "score": Number },
        { "_id": ObjectId, "mode": Number, "score": Number }
    ]
}
I am looking for a way to find the users with the largest number of updates per mode. For instance, if I specify mode 0, I want it to load the users in order of greatest number of updates with mode: 0.
Is this possible in MongoDB? It does not need to be a fast algorithm, as it will be cached for quite a while, and it will run asynchronously.
The fastest way would be to store a count for each "mode" within the document as another field, then you could just sort on that:
var update = {
    "$push": { "updates": updateDoc }
};

// Increment the counter matching this update's mode, e.g. "counts.0"
var countDoc = {};
countDoc["counts." + updateDoc.mode] = 1;
update["$inc"] = countDoc;

Model.update(
    { "_id": id },
    update,
    function(err, numAffected) {
    }
);
Which would use $inc to increment a "counts" field for each "mode" value as a key for each "mode" pushed to the "updates" array. All the calculation happens on update, so it's fast and so is the query that can be applied with a sort on that value:
Model.find({ "updates.mode": 0 }).sort({ "counts.0": -1 }).exec(function(err, users) {
    // users arrive sorted by their count of mode 0 updates
});
If you don't want to or cannot store such a field then the other option is to calculate at query time with .aggregate():
Model.aggregate(
    [
        { "$match": { "updates.mode": 0 } },
        { "$project": {
            "user_id": 1,
            "updates": 1,
            "count": {
                "$size": {
                    "$setDifference": [
                        { "$map": {
                            "input": "$updates",
                            "as": "el",
                            "in": {
                                "$cond": [
                                    { "$eq": [ "$$el.mode", 0 ] },
                                    "$$el",
                                    false
                                ]
                            }
                        }},
                        [false]
                    ]
                }
            }
        }},
        { "$sort": { "count": -1 } }
    ],
    function(err, results) {
    }
);
Which isn't bad, since filtering the array and getting the $size is fairly efficient, but it's not as fast as just using a stored value.
The $map operator allows inline processing of the array elements, which are tested by $cond to see whether each returns a match or false. Then $setDifference removes any false values. This is a much better way to filter array content than using $unwind, which can slow things down significantly and should not be used unless your intent is to aggregate array content across documents.
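As an aside, from MongoDB 3.2 the same projection can be written more directly with $filter; a sketch of the equivalent count expression:

"count": {
    "$size": {
        "$filter": {
            "input": "$updates",
            "as": "el",
            "cond": { "$eq": [ "$$el.mode", 0 ] }
        }
    }
}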
But the better approach is to store the value of the count instead, since this does not require runtime calculation and can even use an index.
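A sketch of what such an index could look like for mode 0 queries (hypothetical field names matching the code above):

db.collection.createIndex({ "updates.mode": 1, "counts.0": -1 })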
I think this is a duplicate of this question:
Mongo find query for longest arrays inside object
The accepted answer seems to be doing exactly what you ask for.
db.collection.aggregate([
    { $unwind: "$l" },
    { $group: { _id: "$_id", len: { $sum: 1 } } },
    { $sort: { len: -1 } },
    { $limit: 25 }
])
Just replace "$l" with "$updates".
[edit:] You probably do not want the result limited to 25, so you should also drop the { $limit: 25 } stage.

Group by and delete documents based on a field array size

I have documents like this:
{
    "_id": ObjectId("53bcedc39c837bba3e1bf1c2"),
    id: "abc1",
    someArray: [ 1, 10, 11 ]
}
{
    "_id": ObjectId("53bcedc39c837bba3e1bf1c4"),
    id: "abc1",
    someArray: [ 1, 10 ]
}
... other similar documents with different ids
I would like to go through the whole collection and delete the document where someArray is smallest, grouped by id. So in this example, I group by abc1 (and get 2 documents), and the 2nd document would be the one to delete because it has the lowest count in someArray.
There isn't a $count accumulator, so I don't see how I can use $group.
Additionally, there will be thousands of ids that have duplicates like this, so if there is such a thing as a bulk check/delete, that would be good (possibly a stupid question, sorry; Mongo is all new to me!)
Removing "duplicates" is a process here and there is no simple way to both "identify" the dupliciates and "remove" them as a single statement. Another particular here is that query forms cannot "typically" determine the size of an array, and certainly cannot sort by that where it is not already present in the document.
All cases basically come down to:
1. Identifying the list of documents that are "duplicates", and then ideally fingering the particular document you want to delete, or more to the point the document you "don't" want to delete from the possible duplicates.
2. Processing that list to actually perform the deletes.
With that in mind, you hopefully have a modern MongoDB, version 2.6 or greater, where you can obtain a cursor from the aggregate method. You also want the Bulk Operations API available in these versions for optimal speed:
var bulk = db.collection.initializeOrderedBulkOp();
var counter = 0;

db.collection.aggregate([
    { "$project": {
        "id": 1,
        "size": { "$size": "$someArray" }
    }},
    { "$sort": { "id": 1, "size": -1 } },
    { "$group": {
        "_id": "$id",
        "docId": { "$first": "$_id" }
    }}
]).forEach(function(doc) {
    // Remove every document sharing this id except the one we keep
    bulk.find({ "id": doc._id, "_id": { "$ne": doc.docId } }).remove();
    counter++;

    // Send to server once every 1000 statements only
    if ( counter % 1000 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp(); // need to reset
    }
});

// Clean up statements that did not round to 1000
if ( counter % 1000 != 0 )
    bulk.execute();
You can still do much the same thing with older versions of MongoDB, but the result from .aggregate() must be under the 16MB BSON limit. That should still be a lot, but with older versions you could also output to a collection with mapReduce.
But for the general aggregation response, you get an array of results, and you also don't have the other convenience methods for finding the size of the array. So a little more work:
var result = db.collection.aggregate([
    { "$unwind": "$someArray" },
    // Count the unwound array elements back up, per document
    { "$group": {
        "_id": "$_id",
        "id": { "$first": "$id" },
        "size": { "$sum": 1 }
    }},
    { "$sort": { "id": 1, "size": -1 } },
    { "$group": {
        "_id": "$id",
        "docId": { "$first": "$_id" }
    }}
]);

result.result.forEach(function(doc) {
    db.collection.remove({ "id": doc._id, "_id": { "$ne": doc.docId } });
});
So there is no cursor for large results and no bulk operations, so every single "remove" needs to be sent to the server individually.
So in MongoDB there are no "sub-queries", nor, even when there are more than two duplicates, a way to single out the document you don't want to remove from the others. But this is the general way to do it.
Just as a note, if the "size" of arrays is something important to you for a purpose such as "sorting", then your best approach is to maintain that "size" as another property of your document, making those operations easier without needing to "calculate" it as is done here.
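A minimal sketch of keeping such a size field in step with the array (the names someArraySize, docId, and newValue are hypothetical):

db.collection.update(
    { "_id": docId },
    {
        "$push": { "someArray": newValue },  // add the element
        "$inc": { "someArraySize": 1 }       // keep the stored size in step
    }
)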
