My document in Cosmos DB looks like this:
{
"todayDate": "2017-12-08",
"data": [
{
"group": {"priority": 1, "total": 10},
"severity": 1
},
{
"group": {"priority": 2, "total": 13},
"priority": 2
}
]
}
The following query works fine and returns results almost immediately, whether I issue it from the Mongo shell for Cosmos DB in the Azure portal or from my Spring Data MongoDB project:
db.myCollection.find({ "$or" : [ { "data" : { "$elemMatch" : { "priority" : 1}} , "$or" : [ { "data" : { "$elemMatch" : { "group.priority" : 1}}}] }]})
However, the following query, which is essentially two of the above queries joined with an OR operator, hangs indefinitely:
db.myCollection.find({ "$or": [ { "data" : { "$elemMatch" : { "priority" : 1}} , "$or" : [ { "data" : { "$elemMatch" : { "group.priority" : 1}}}] }, { "data" : { "$elemMatch" : { "severity" : 2}} , "$or" : [ { "data" : { "$elemMatch" : { "group.severity" : 2}}}] } ] })
Is there anything wrong with the last query that makes it hang indefinitely? Even if I replace the outer OR with AND, the result is the same: it hangs indefinitely.
I created 3 documents in my Cosmos DB collection based on the document template you provided.
[
{
"id": "1",
"todayDate": "2017-12-08",
"data": [
{
"group": {
"severity": 1,
"total": 10
},
"severity": 1
},
{
"group": {
"priority": 1,
"total": 13
},
"priority": 1
}
]
},
{
"id": "2",
"todayDate": "2017-12-09",
"data": [
{
"group": {
"priority": 3,
"total": 10
},
"severity": 1
},
{
"group": {
"priority": 3,
"total": 13
},
"priority": 1
}
]
},
{
"id": "3",
"todayDate": "2017-12-10",
"data": [
{
"group": {
"priority": 1,
"total": 10
},
"severity": 1
},
{
"group": {
"priority": 2,
"total": 13
},
"priority": 2
}
]
}
]
Then I used Robo 3T to execute your query.
db.coll.find({
"$or": [
{ "data" : { "$elemMatch" : { "priority" : 1}} ,
"$or" : [
{ "data" : { "$elemMatch" : { "group.priority" : 1}}}
] },
{ "data" : { "$elemMatch" : { "severity" : 2}} ,
"$or" : [
{ "data" : { "$elemMatch" : { "group.severity" : 2}}}
] }
]
})
result:
The syntax of $or that I found in the official documentation is:
{ $or: [ { <expression1> }, { <expression2> }, ... , { <expressionN> } ] }
Your query executes normally even though it differs from the syntax above. In my experience, $or is generally nested with $and (see MongoDB Nested OR/AND Where?), so I do not quite understand the purpose of the nested $or here.
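One way to see why the nested form still runs: sibling keys in a single filter document are combined with an implicit AND, so a field condition placed next to a one-branch $or behaves like an $and of the two. The sketch below is a minimal in-memory matcher in plain JavaScript, not the Cosmos DB or MongoDB engine. The helper names (getPath, matches, elem, idsMatching) are hypothetical, and the documents are trimmed copies of the three samples above.

```javascript
// Minimal in-memory matcher covering only what the question's filters use:
// $or, $and, implicit AND between sibling keys, $elemMatch, and equality on
// (possibly dotted) paths. It is an illustration, not MongoDB's matcher.
function getPath(obj, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), obj);
}

function matches(doc, filter) {
  // Every key of a filter document must hold: that is the implicit AND.
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$or") return cond.some((f) => matches(doc, f));
    if (key === "$and") return cond.every((f) => matches(doc, f));
    const value = getPath(doc, key);
    if (cond !== null && typeof cond === "object" && "$elemMatch" in cond) {
      return Array.isArray(value) && value.some((el) => matches(el, cond.$elemMatch));
    }
    return value === cond;
  });
}

// Builds { data: { $elemMatch: { <field>: <v> } } }.
const elem = (field, v) => ({ data: { $elemMatch: { [field]: v } } });

// Shape used in the question: a field condition next to a one-branch $or.
const nested = {
  $or: [
    { ...elem("priority", 1), $or: [elem("group.priority", 1)] },
    { ...elem("severity", 2), $or: [elem("group.severity", 2)] },
  ],
};

// The same logic with the implicit ANDs written out.
const explicit = {
  $or: [
    { $and: [elem("priority", 1), elem("group.priority", 1)] },
    { $and: [elem("severity", 2), elem("group.severity", 2)] },
  ],
};

// Trimmed versions of the three sample documents.
const docs = [
  { id: "1", data: [{ group: { severity: 1 }, severity: 1 }, { group: { priority: 1 }, priority: 1 }] },
  { id: "2", data: [{ group: { priority: 3 }, severity: 1 }, { group: { priority: 3 }, priority: 1 }] },
  { id: "3", data: [{ group: { priority: 1 }, severity: 1 }, { group: { priority: 2 }, priority: 2 }] },
];

const idsMatching = (filter) => docs.filter((d) => matches(d, filter)).map((d) => d.id);
console.log(idsMatching(nested), idsMatching(explicit));
```

In this sketch both filters select only the document with id "1", which suggests the explicit $and form is a clearer way to write the same condition.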
An indefinite hang is probably caused by a data set so large that the query runs for a very long time, in which case you need to optimize the query.
Hope it helps. If you have any concerns, please let me know.
Updated answer:
I modified my 3 sample documents accordingly, and the query you provided returned the 2 eligible documents.
Query:
db.coll.find(
{
"$and": [
{
"$or": [
{
"data": {
"$elemMatch": {
"priority": 2
}
}
},
{
"data": {
"$elemMatch": {
"group.priority": 2
}
}
}
]
},
{
"$or": [
{
"data": {
"$elemMatch": {
"severity": 1
}
}
},
{
"data": {
"$elemMatch": {
"group.severity": 1
}
}
}
]
}
]
}
)
Results:
So I think your query is correct. Is the data in the database very large? If the query hangs for a long time, have you seen any timeout error messages? You could also check whether the RU (request unit) setting is the issue.
I don't know if this is possible, but I need some help with MongoDB. I have the following document, and I want to use $addToSet to add a value to one of the arrays in votes while removing that value from all the other arrays in votes, but I have no idea how:
{
_id: '872952643117518909',
questions: [
{ question: 'a', number: 1, dropDownInfo: [Object] },
{ question: 'b', number: 2, dropDownInfo: [Object] },
{ question: 'c', number: 3, dropDownInfo: [Object] }
],
votes: {
'1': [ '619284841187246090', '662697094104219678' ],
'2': [ '619284841187246090', '662697094104219678' ],
'3': [ '662697094104219678', '619284841187246090' ]
},
question: 'abc',
timestamp: 1628198528903,
finished: false,
channel: '812038854302892064'
}
The pipeline below adds a vote ('619284841187246090') to a specific field (here "2", chosen arbitrarily) and removes that vote from the "1" and "3" arrays.
The solution is general: it works with any vote fields, not just "1", "2" and "3".
You can use this pipeline in an aggregation or in an update with a pipeline (MongoDB >= 4.2).
$addToSet in the aggregation framework is an accumulator and cannot be applied to an existing array here; it works when grouping and in some other places in MongoDB 5.
I think your schema has a problem, because you are storing data in field names (the keys of votes), which makes querying and indexing harder.
But we can still do it by converting the object to an array and back to an object.
I think it is best to keep data in arrays and to keep field names as the known schema.
You can run the code below here
Query
db.collection.aggregate( [ {
"$addFields" : {
"votes" : {
"$arrayToObject" : {
"$map" : {
"input" : {
"$map" : {
"input" : {
"$objectToArray" : "$votes"
},
"as" : "m",
"in" : [ "$$m.k", "$$m.v" ]
}
},
"as" : "vote",
"in" : {
"$cond" : [ {
"$eq" : [ {
"$arrayElemAt" : [ "$$vote", 0 ]
}, "2" ]
}, [ {
"$arrayElemAt" : [ "$$vote", 0 ]
}, {
"$cond" : [ {
"$in" : [ "619284841187246090", {
"$arrayElemAt" : [ "$$vote", 1 ]
} ]
}, {
"$arrayElemAt" : [ "$$vote", 1 ]
}, {
"$concatArrays" : [ {
"$arrayElemAt" : [ "$$vote", 1 ]
}, [ "619284841187246090" ] ]
} ]
} ], [ {
"$arrayElemAt" : [ "$$vote", 0 ]
}, {
"$filter" : {
"input" : {
"$arrayElemAt" : [ "$$vote", 1 ]
},
"as" : "v",
"cond" : {
"$not" : [ {
"$eq" : [ "$$v", "619284841187246090" ]
} ]
}
}
} ] ]
}
}
}
}
}
} ])
Results
[
{
"_id": "872952643117518909",
"channel": "812038854302892064",
"finished": false,
"question": "abc",
"questions": [
{
"dropDownInfo": "",
"number": 1,
"question": "a"
},
{
"dropDownInfo": "",
"number": 2,
"question": "b"
},
{
"dropDownInfo": "",
"number": 3,
"question": "c"
}
],
"timestamp": 1.628198528903e+12,
"votes": {
"1": [
"662697094104219678"
],
"2": [
"619284841187246090",
"662697094104219678"
],
"3": [
"662697094104219678"
]
}
}
]
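The object-to-array-and-back idea behind the pipeline can be sketched in plain JavaScript, which may make the nested $map / $cond easier to follow. This is only an illustration under the assumption that votes is an object of string arrays as in the question; moveVote is a hypothetical helper name, and Object.entries / Object.fromEntries stand in for $objectToArray / $arrayToObject.

```javascript
// Adds `voter` to votes[targetKey] (if not already present) and removes it
// from every other key, mirroring the pipeline's $cond branches.
function moveVote(votes, targetKey, voter) {
  return Object.fromEntries(
    Object.entries(votes).map(([key, voters]) => {
      if (key === targetKey) {
        // Same as the $in / $concatArrays branch: append only when missing.
        return [key, voters.includes(voter) ? voters : [...voters, voter]];
      }
      // Same as the $filter branch: drop the voter everywhere else.
      return [key, voters.filter((v) => v !== voter)];
    })
  );
}

const votes = {
  "1": ["619284841187246090", "662697094104219678"],
  "2": ["619284841187246090", "662697094104219678"],
  "3": ["662697094104219678", "619284841187246090"],
};

const updatedVotes = moveVote(votes, "2", "619284841187246090");
console.log(JSON.stringify(updatedVotes, null, 2));
```

The function returns a new object and leaves the input untouched, which matches how the aggregation stage computes a new votes value instead of mutating arrays in place.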
I am trying to modify a query to get the expected output. I am able to write the query, but the output is not in the shape I need to bind it in the front end.
Actual output:
{
"_id" : null,
"first" : 3571.0,
"second" : 24.0
}
Expected output:
{ "_id" : null,
"operation" : "edit",
"count" : 3571.0
}
{ "_id" : null,
"operation" : "read",
"count" : 24
}
{ "_id" : null,
"operation" : "update",
"count" : 9000
}
My query:
db.getCollection('blog').aggregate([
{ "$group": {
"_id": null,
"first": {
"$sum": {
"$cond": [{ "$in": ["$Operation", ["edit1", "edit2"]] }, 1, 0]
}
},
"second": {
"$sum": {
"$cond": [{ "$in": ["$Operation", ["read1", "read2"]] }, 1, 0]
}
}
},
},
])
If you have a collection like the one below:
[
{
"_id" : 1,
"operation" : "edit1" // plus some extra fields
},
{
"_id" : 2,
"operation" : "read1"
},
{
"_id" : 3,
"operation" : "update1"
}
]
Using $project and $cond, you can map "read1" and "read2" to read, the edit values to edit, and everything else to update; grouping on the new operation field then gives you the count of each operation.
You can use this query:
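db.getCollection('blog').aggregate([
// Map each raw operation value to its category.
// The field is "operation" as in the sample collection above; use "$Operation" if your documents capitalize it.
{
"$project": {
"new_operation":
{
"$cond": [
{"$in":
["$operation", ["edit1", "edit2"]]
}, "edit", {
"$cond": [
{"$in":
["$operation", ["read1", "read2"]]
}, "read", "update"]
}
]
}
}
},
// Count the documents in each category.
{
"$group": {
"_id": "$new_operation",
"count": {"$sum": 1}
}
}
])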
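The rename-then-group logic can be checked outside the database with a small plain JavaScript sketch. It assumes the lowercase operation field from the sample collection and treats anything that is not an edit or read value as update, as the nested $cond does. CATEGORY and countByOperation are hypothetical names, and the sample documents are made up for illustration.

```javascript
// Lookup table playing the role of the nested $cond / $in checks.
const CATEGORY = { edit1: "edit", edit2: "edit", read1: "read", read2: "read" };

function countByOperation(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // $project step: derive the category, defaulting to "update".
    const category = CATEGORY[doc.operation] || "update";
    // $group step: { _id: category, count: { $sum: 1 } }.
    counts.set(category, (counts.get(category) || 0) + 1);
  }
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

const sampleOps = [
  { _id: 1, operation: "edit1" },
  { _id: 2, operation: "read1" },
  { _id: 3, operation: "update1" },
  { _id: 4, operation: "edit2" },
];

const operationCounts = countByOperation(sampleOps);
console.log(operationCounts);
```

Each element of the returned array has the one-document-per-operation shape the question asks for, with the category in _id instead of a separate operation field.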
I have stored documents that include a status property. I would like to sort the documents by status priority (not alphabetically by status). I followed previous answers and composed the following function, which still doesn't work as expected; the documents are sorted by status name (alphabetically):
function getESSortingByStatusQuery(query, order) {
let statusOrder = ['BLUE', 'RED', 'BLACK', 'YELLOW', 'GREEN'];
if(order == 'desc'){
statusOrder.reverse();
}
const functions = statusOrder.map((item) => {
const idx = statusOrder.indexOf(item);
return {filter: {match: {statusColor: item}},
weight: (idx + 1) * 50}
});
const queryModified = {
"function_score": {
"query": {"match_all": {}}, // this is for testing purposes and should be replaced with original query
"boost": "5",
"functions": functions,
"score_mode": "multiply",
"boost_mode": "replace"
}
}
return queryModified;
}
I would be thankful if anyone could suggest a way to sort items according to a predefined priority of a property (in this case, status).
Below is a sample custom sort script, which I think is what you are looking for. I've added a sample mapping, documents, the query, and the response as it appears.
Mapping:
PUT color_index
{
"mappings": {
"properties": {
"color":{
"type": "keyword"
},
"product":{
"type": "text"
}
}
}
}
Sample Documents:
POST color_index/_doc/1
{
"color": "BLUE",
"product": "adidas and nike"
}
POST color_index/_doc/2
{
"color": "GREEN",
"product": "adidas and nike and puma"
}
POST color_index/_doc/3
{
"color": "GREEN",
"product": "adidas and nike"
}
POST color_index/_doc/4
{
"color": "RED",
"product": "nike"
}
POST color_index/_doc/5
{
"color": "RED",
"product": "adidas and nike"
}
Query:
POST color_index/_search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "*",
"query": "adidas OR nike"
}
}
]
}
},
"sort": [
{ "_score": { "order": "desc"} }, <---- First sort by score
{ "_script": { <---- Second sort by Colors
"type": "number",
"script": {
"lang": "painless",
"source": "if(params.scores.containsKey(doc['color'].value)) { return params.scores[doc['color'].value];} return 100000;",
"params": {
"scores": {
"BLUE": 0,
"RED": 1,
"BLACK": 2,
"YELLOW": 3,
"GREEN": 4
}
}
},
"order": "asc"
}
}
]
}
First it returns documents sorted by score, and then it applies the second sorting logic to break ties within that result.
For the second sort, i.e. the script sort, notice how I have assigned numeric values to the colors in the scores section. You would need to construct your query accordingly.
The logic is in the source section, which I believe is self-explanatory; I used doc['color'].value because color is the field I'm applying the custom sort logic to.
Response:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "color_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.5159407,
"_source" : {
"color" : "BLUE",
"product" : "adidas and nike"
},
"sort" : [
0.5159407, <--- This value is score(desc by nature)
0.0 <--- This value comes from script sort as its BLUE and I've used value 0 in the script which is in 'asc' order
]
},
{
"_index" : "color_index",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.5159407,
"_source" : {
"color" : "RED",
"product" : "adidas and nike"
},
"sort" : [
0.5159407,
1.0
]
},
{
"_index" : "color_index",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.5159407,
"_source" : {
"color" : "GREEN",
"product" : "adidas and nike"
},
"sort" : [
0.5159407,
4.0
]
},
{
"_index" : "color_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.40538198,
"_source" : {
"color" : "GREEN",
"product" : "adidas and nike and puma"
},
"sort" : [
0.40538198,
4.0
]
},
{
"_index" : "color_index",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.10189847,
"_source" : {
"color" : "RED",
"product" : "nike"
},
"sort" : [
0.10189847,
1.0
]
}
]
}
}
Notice the first three documents: they have exactly the same product value but different colors, and you can see that they are grouped together because we sort first by _score and then by color.
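Constructing the scores section from a priority list, as the question's function tries to do with weights, might look like the sketch below. It only builds the request body fragment shown in the query above and sends nothing to Elasticsearch. buildStatusSort is a hypothetical helper, and the field name statusColor is taken from the question on the assumption that it is mapped as keyword.

```javascript
// Builds a script-based sort clause that ranks documents by the position of
// their status in `statusOrder`; statuses not in the list get rank 100000.
function buildStatusSort(field, statusOrder, order = "asc") {
  const scores = {};
  statusOrder.forEach((status, idx) => {
    scores[status] = idx; // lower index means higher priority
  });
  return {
    _script: {
      type: "number",
      script: {
        lang: "painless",
        // Same script as in the answer, with the field name filled in.
        source:
          `if(params.scores.containsKey(doc['${field}'].value)) ` +
          `{ return params.scores[doc['${field}'].value];} return 100000;`,
        params: { scores },
      },
      order,
    },
  };
}

const sortClause = buildStatusSort("statusColor", ["BLUE", "RED", "BLACK", "YELLOW", "GREEN"]);
console.log(JSON.stringify({ sort: [sortClause] }, null, 2));
```

Passing "desc" as the third argument reverses the ranking, but it also moves statuses missing from the list to the front, so reversing the array, as the question's function does, may be the safer choice.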
Let me know if this helps!
Here's a code sample for sorting the results. I think this will help you. If you don't want to get entire documents in the result, you can filter the returned fields using includes.
GET testindex/_search
{
"_source": {
"includes": [
"filed1"
]
},
"aggs": {
"emp_figures": {
"terms": {
"field": "status"
}
}
}
}
This is the sample result you should retrieve
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"hits": {
"total": 84968,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "type",
"_id": "0001",
"_score": 1,
"_source": {
"filed1": "color1"
}
},
{
"_index": "test",
"_type": "type",
"_id": "0002",
"_score": 1,
"_source": {
"filed1": "color2"
}
}
]
}
}
I have a dataset that looks something like this:
{
"id": "02741544",
"items": [{
"item": "A"
}]
}, {
"id": "02472691",
"items": [{
"item": "A"
}, {
"item": "B"
}, {
"item": "C"
}]
}, {
"id": "01316523",
"items": [{
"item": "A"
}, {
"item": "B"
}]
}, {
"id": "01316526",
"items": [{
"item": "A"
}, {
"item": "B"
}]
}, {
"id": "01316529",
"items": [{
"item": "A"
}, {
"item": "D"
}]
},
I'm trying to craft a query that will give me an output that looks like this:
{
"item": "A",
"ids": [{
"id": "02741544"
}, {
"id": "02472691"
}, {
"id": "01316523"
}, {
"id": "01316526"
}, {
"id": "01316529"
}]
}, {
"item": "B",
"ids": [{
"id": "02472691"
}, {
"id": "01316523"
}, {
"id": "01316526"
}]
}, {
"item": "C",
"ids": [{
"id": "02472691"
}]
}, {
"item": "D",
"ids": [{
"id": "01316529"
}]
},
Basically, I'm trying to get the distinct items from the items array in each object, and then return an array of ids for each object that has that item in its items array.
It is better to use the aggregation framework, running an operation that consists of the following pipeline steps (in the given order):
$unwind - This initial step will flatten the items array i.e. it produces a copy of each document per array entry. This is necessary for processing the documents further down the pipeline as "denormalised" documents which you can aggregate as groups.
$group - This will group the flattened documents by the item subdocument key and create the ids list by using the $push accumulator operator.
-- UPDATE --
As @AminJ pointed out in the comments, if items can have duplicate item values and you don't want duplicate ids in the result, you can use $addToSet instead of $push.
The following example demonstrates this:
db.collection.aggregate([
{ "$unwind": "$items" },
{
"$group": {
"_id": "$items.item",
"ids": {
"$push": { "id": "$id" } /* or use
"$addToSet": { "id": "$id" } if you don't want duplicate ids */
}
}
}
])
Sample Output
/* 1 */
{
"_id" : "A",
"ids" : [
{ "id" : "02741544" },
{ "id" : "02472691" },
{ "id" : "01316523" },
{ "id" : "01316526" },
{ "id" : "01316529" }
]
}
/* 2 */
{
"_id" : "B",
"ids" : [
{ "id" : "02472691" },
{ "id" : "01316523" },
{ "id" : "01316526" }
]
}
/* 3 */
{
"_id" : "C",
"ids" : [
{ "id" : "02472691" }
]
}
/* 4 */
{
"_id" : "D",
"ids" : [
{ "id" : "01316529" }
]
}
The result from an aggregate() function is a cursor to the documents produced by the final stage of the aggregation pipeline operation. So if you want the results in an array you can use the cursor's toArray() method which returns an array that contains all the documents from it.
For example:
var pipeline = [
{ "$unwind": "$items" },
{
"$group": {
"_id": "$items.item",
"ids": {
"$push": { "id": "$id" } /* or use
"$addToSet": { "id": "$id" } if you don't want duplicate ids */
}
}
}
],
results = db.collection.aggregate(pipeline).toArray();
printjson(results);
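What $unwind and $group do to the sample data can be mimicked in plain JavaScript, which may help when checking the expected output by hand. This is a sketch of the logic only, not of how the server executes the pipeline; groupIdsByItem and its dedupe flag (standing in for the $push versus $addToSet choice) are hypothetical names.

```javascript
// Groups document ids by item value. dedupe = true mimics $addToSet,
// dedupe = false mimics $push.
function groupIdsByItem(docs, dedupe = false) {
  const groups = new Map();
  for (const doc of docs) {
    for (const entry of doc.items) { // $unwind: one step per array element
      if (!groups.has(entry.item)) groups.set(entry.item, []);
      const ids = groups.get(entry.item);
      if (dedupe && ids.some((x) => x.id === doc.id)) continue;
      ids.push({ id: doc.id }); // $group with _id: "$items.item"
    }
  }
  return [...groups].map(([item, ids]) => ({ _id: item, ids }));
}

const dataset = [
  { id: "02741544", items: [{ item: "A" }] },
  { id: "02472691", items: [{ item: "A" }, { item: "B" }, { item: "C" }] },
  { id: "01316523", items: [{ item: "A" }, { item: "B" }] },
  { id: "01316526", items: [{ item: "A" }, { item: "B" }] },
  { id: "01316529", items: [{ item: "A" }, { item: "D" }] },
];

const grouped = groupIdsByItem(dataset);
console.log(JSON.stringify(grouped, null, 2));
```

Unlike this sketch, the order of groups coming out of $group is not guaranteed, so a $sort stage would be needed if the order matters.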
Here's a solution using an aggregation pipeline:
db.col.aggregate([
{
$unwind: "$items"
},
{
$project: {
id: 1,
item: "$items.item"
}
},
{
$group: {
_id: "$item",
ids: {
$push: "$id"
}
}
}
])
I have the following document in my collection.
{
"_id" : ObjectId("55961a28bffebcb8058b4570"),
"title" : "BackOffice 2",
"cts" : NumberLong(1435900456),
"todo_items" : [
{
"id" : "55961a42bffebcb7058b4570",
"task_desc" : "test 1",
"completed_by" : "557fccb5bffebcf7048b457c",
"completed_date" : NumberLong(1436161096)
},
{
"id" : "559639afbffebcc7098b45a6",
"task_desc" : "test 2",
"completed_by" : "557fccb5bffebcf7048b457c",
"completed_date" : NumberLong(1435911809)
},
{
"id" : "559a22f5bffebcb0048b476c",
"task_desc" : "test 3",
}
],
"uts" : NumberLong(1436164853)
}
I need an aggregation query that does the following: if an item has the fields "completed_by" and "completed_date" and their values are not null, push it into the "completed" array field; otherwise push it into the "incomplete" field.
The following is a sample of the result I want.
{
"_id" : ObjectId("55961a28bffebcb8058b4570"),
"completed" : [
{
"id":"557fccb5bffebcf7048b457c",
"title":"test 1",
"completed_by" : "557fccb5bffebcf7048b457c",
"completed_date" : NumberLong(1436161096)
},
{
"id":"557fccb5bffebcf7048b457c",
"title":"test 1",
"completed_by" : "557fccb5bffebcf7048b457c",
"completed_date" : NumberLong(1436161096)
}
],
"incomplete":[
{
"id" : "559a22f5bffebcb0048b476c",
"title" : "test 3"
}
]
}
As long as your "array" items have "distinct" identifiers (which they have), there are a couple of approaches to this.
Firstly, without actually "aggregating across documents":
db.collection.aggregate([
{ "$project": {
"title": 1,
"cts": 1,
"completed": { "$setDifference": [
{ "$map": {
"input": "$todo_items",
"as": "i",
"in": {
"$cond": [
"$$i.completed_date",
"$$i",
false
]
}
}},
[false]
]},
"incomplete": { "$setDifference": [
{ "$map": {
"input": "$todo_items",
"as": "i",
"in": {
"$cond": [
"$$i.completed_date",
false,
"$$i"
]
}
}},
[false]
]}
}}
])
That requires that you at least have MongoDB 2.6 available on the server in order to use the required $map and $setDifference operators. It's pretty fast considering that all the work is done in a single $project stage.
The alternative, which you should only use when "aggregating across documents", is available to all versions supporting the aggregation framework post MongoDB 2.2:
db.collection.aggregate([
{ "$unwind": "$todo_items" },
{ "$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"cts": { "$first": "$cts" },
"completed": {
"$addToSet": {
"$cond": [
"$todo_items.completed_date",
"$todo_items",
null
]
}
},
"incomplete": {
"$addToSet": {
"$cond": [
"$todo_items.completed_date",
null,
"$todo_items",
]
}
}
}},
{ "$unwind": "$completed" },
{ "$match": { "completed": { "$ne": null } } },
{ "$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"cts": { "$first": "$cts" },
"completed": { "$push": "$completed" },
"incomplete": { "$first": "$incomplete" }
}},
{ "$unwind": "$incomplete" },
{ "$match": { "incomplete": { "$ne": null } } },
{ "$group": {
"_id": "$_id",
"title": { "$first": "$title" },
"cts": { "$first": "$cts" },
"completed": { "$first": "$completed" },
"incomplete": { "$push": "$incomplete" }
}}
])
This isn't entirely complete, since you need to cater for conditions where an array may end up empty. But that is not the real lesson here, since MongoDB 2.6 has already been in circulation for a couple of years.
In aggregation, you cannot really exclude the "null/false" results, but you can "filter" them.
Also, unless you are actually "aggregating across documents" as mentioned already, the second form with $unwind to process the arrays comes with a "lot" of overhead. So you really should be altering the array contents in your client code as each document is read.
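Splitting the array in client code, as suggested above, could look like the following plain JavaScript sketch. It assumes each document has already been read into a plain object (NumberLong values are shown as ordinary numbers and _id as a string), and splitTodoItems is a hypothetical helper name, not part of any driver API.

```javascript
// Partitions todo_items into completed and incomplete lists. An item counts
// as completed only when both completed_by and completed_date are present
// and not null, as the question requires.
function splitTodoItems(doc) {
  const isDone = (t) => t.completed_by != null && t.completed_date != null;
  return {
    _id: doc._id,
    completed: doc.todo_items.filter(isDone),
    incomplete: doc.todo_items.filter((t) => !isDone(t)),
  };
}

const todoDoc = {
  _id: "55961a28bffebcb8058b4570",
  title: "BackOffice 2",
  todo_items: [
    { id: "55961a42bffebcb7058b4570", task_desc: "test 1", completed_by: "557fccb5bffebcf7048b457c", completed_date: 1436161096 },
    { id: "559639afbffebcc7098b45a6", task_desc: "test 2", completed_by: "557fccb5bffebcf7048b457c", completed_date: 1435911809 },
    { id: "559a22f5bffebcb0048b476c", task_desc: "test 3" },
  ],
};

const split = splitTodoItems(todoDoc);
console.log(split.completed.length, split.incomplete.length);
```

Because the filter runs per document, empty completed or incomplete arrays come out naturally as [], which is the case the $unwind-based pipeline has to handle separately.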
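Can you please check the below:
db.collection.aggregate([
    { $unwind: "$todo_items" },
    // An accumulator such as $push has to be the top-level operator of a $group
    // field, so the $cond goes inside $push rather than around it.
    { $group: {
        _id: "$_id",
        completed: { $push: { $cond: [
            { $and: [ { $ifNull: ["$todo_items.completed_by", false] },
                      { $ifNull: ["$todo_items.completed_date", false] } ] },
            "$todo_items", null ] } },
        incompleted: { $push: { $cond: [
            { $and: [ { $ifNull: ["$todo_items.completed_by", false] },
                      { $ifNull: ["$todo_items.completed_date", false] } ] },
            null, "$todo_items" ] } }
    } },
    // Drop the null placeholders pushed for items that belong to the other list.
    { $project: {
        completed: { $setDifference: ["$completed", [null]] },
        incompleted: { $setDifference: ["$incompleted", [null]] }
    } }
]);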