I am trying to write a query that finds the MAX value across all documents. The scenario: I have 100 student documents, each containing the student name, roll number, and an array of tests; inside each test is an array of subjects with their respective marks. I can get the highest Physics mark across all documents, but I cannot get it together with the student's roll number. That is what I am trying to find out.
TestDoc is:
[
  {
    "id": "1",
    "StudenName": "A",
    "StudentRollNo": 1,
    "StudentAdd": "---",
    "Test1": [
      {
        "SubName": "S1",
        "Marks": 20
      },
      {
        "SubName": "S2",
        "Marks": 30
      },
      ...
    ],
    "Test2": [
      same as above
    ]
  },
  {
    STUDENT2
  },
  and so on
]
The query I am using is:
select MAX(s.Marks) from c join test in c.Test1 join s in test.marks
According to your description, you want to implement GROUP BY-like functionality in Azure Cosmos DB queries.
Per my experience, Azure Cosmos DB's aggregation capability in SQL is limited to the COUNT, SUM, MIN, MAX, and AVG functions. GROUP BY and other aggregation functionality are not supported in Azure Cosmos DB at this time.
However, stored procedures or UDFs can be used to implement your aggregation requirement.
You could refer to documentdb-lumenize, a great package based on DocumentDB stored procedures.
For the first scenario in your post, I created two student documents in my Azure Cosmos DB account.
[
{
"id": "1",
"StudenName": "A",
"StudentRollNo": 1,
"Test": [
{
"SubName": "S1",
"Marks": 20
},
{
"SubName": "S2",
"Marks": 30
}
]
},
{
"id": "2",
"StudenName": "B",
"StudentRollNo": 2,
"Test": [
{
"SubName": "S1",
"Marks": 10
},
{
"SubName": "S2",
"Marks": 40
}
]
}
]
Then I fed the result set returned by the SQL below into the documentdb-lumenize package mentioned above to get the max S2 mark.
SELECT c.StudentRollNo,test1.Marks as mark FROM c
join test1 in c.Test
where test1.SubName='S2'
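If you'd rather skip the stored procedure for something this small, here is a minimal client-side sketch (my own addition, not part of documentdb-lumenize) that reduces those rows to the top mark, assuming each row comes back shaped like { StudentRollNo, mark } as the SQL aliases suggest:
// Sketch: pick the row with the highest mark, keeping the roll number
// alongside it. Assumes a non-empty result set of { StudentRollNo, mark }.
function maxMark(rows) {
    return rows.reduce(function (best, row) {
        return row.mark > best.mark ? row : best;
    });
}
// maxMark([{ StudentRollNo: 1, mark: 30 }, { StudentRollNo: 2, mark: 40 }])
// -> { StudentRollNo: 2, mark: 40 }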
For the second scenario in your comment, I removed the WHERE clause from the SQL above.
SELECT c.StudentRollNo,test1.Marks as mark FROM c
join test1 in c.Test
The result set then contains one { StudentRollNo, mark } row per subject for every student, which can again be fed into documentdb-lumenize.
This applies only to a single test. If you want to query multiple tests, you could use a stored procedure.
You could also refer to the SO threads below:
1. Azure DocumentDB - Group By Aggregates
2. Grouping by a field in DocumentDB
I want to display these fields: name, age, addresses_id, addresses_city, addresses_primary for each person in Data Studio.
My JSON data
{
"data": [
{
"name": "Lio",
"age": 30,
"addresses": [
{
"id": 7834,
"city": "ML",
"primary": 1
},
{
"id": 5034,
"city": "MM",
"primary": 1
}
]
},
{
"name": "Kali",
"age": 41,
"addresses": [
{
"id": 3334,
"city": "WK",
"primary": 1
},
{
"id": 1730,
"city": "DC",
"primary": 1
}
]
},
...
]
}
There is no problem if I don't render the addresses field:
return {
schema: requestedFields.build(),
rows: rows
};
//rows:
/*
"rows": [
{
"values": ["Lio", 30]
},
{
"values": ["Kali", 41]
},
...
]
*/
The problem is that I'm not able to model the nested JSON data in Google Data Studio; the trouble is exactly in the "addresses" field.
Could anyone tell me what format the rows should have in this case?
As you already know, each name in your dataset clearly has more than one row (one person has multiple addresses). Data Studio only accepts a single value for each field, since arrays are not supported at all, so you need to work around this.
There are some ways to solve this, but always keep in mind that:
getSchema() should return all available fields for your connector (the order doesn't really matter, since Data Studio always sorts the available fields alphabetically)
getData() should return a list of values, and here the order is relevant: it should match the parameter passed to getData() (which means the results should be dynamic: sometimes you'll return all values, sometimes not, and the order may change).
Solution 1: Return multiple rows per record
Since you can produce multiple rows for each name, just do it.
To achieve this, your field definition (=getSchema()) should include fields address_id, address_city and address_primary (you can also add address_order if you need to know the position of the address in the list).
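For reference, a rough sketch of such a field definition, assuming the Community Connector service (DataStudioApp) and its fields builder; the ids are just the names suggested above:
var cc = DataStudioApp.createCommunityConnector();

// Sketch of a getSchema()-style field list for Solution 1.
function getFields() {
    var fields = cc.getFields();
    var types = cc.FieldType;

    fields.newDimension().setId('name').setType(types.TEXT);
    fields.newMetric().setId('age').setType(types.NUMBER);
    fields.newDimension().setId('address_id').setType(types.TEXT);
    fields.newDimension().setId('address_city').setType(types.TEXT);
    fields.newMetric().setId('address_primary').setType(types.NUMBER);

    return fields;
}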
Supposing getData() is called with all fields in the same order they were described, the rows array should look like this:
"rows": [
{
"values": ["Lio", 30, "7834", "ML", 1]
},
{
"values": ["Lio", 30, "5034", "MM", 1]
},
{
"values": ["Kali", 41, "3334", "WK", 1]
},
{
"values": ["Kali", 41, "1730", "DC", 1]
},
...
]
IMO, this is the best solution for your data.
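A minimal sketch of building those rows, assuming the parsed data array from your JSON and the field order above:
// Sketch: one output row per (person, address) pair.
function buildRows(people) {
    var rows = [];
    people.forEach(function (person) {
        person.addresses.forEach(function (address) {
            rows.push({
                values: [person.name, person.age, String(address.id), address.city, address.primary]
            });
        });
    });
    return rows;
}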
Solution 2: Return one address only, ignoring others
If you prefer one row per person, you can pick one of the addresses and display only that one (usually the main/primary address, or simply the first one).
To achieve this, your field definition (=getSchema()) should include fields address_id, address_city and address_primary.
Supposing getData() is called with all fields in the same order they were described, the rows array should look like this:
"rows": [
{
"values": ["Lio", 30, "7834", "ML", 1]
},
{
"values": ["Kali", 41, "3334", "WK", 1]
},
...
]
Solution 3: Return all addresses, serialized in a field
This is helpful if you really need all the information but do not want a complex schema.
Just create a field called addresses in your field definition (=getSchema()) and write the JSON there as a string (or any other format you want).
Supposing getData() is called with all fields in the same order they were described, the rows array should look like this:
"rows": [
{
"values": ["Lio", 30, "[{\"id\": 7834, \"city\": "ML", \"primary\": 1}, {\"id\": 5034, \"city\": \"MM\", \"primary\": 1}]"]
},
{
"values": ["Kali", 41, "[{\"id\": 3334, \"city\": \"WK\", \"primary\": 1}, {\"id\": 1730, \"city\": \"DC\", \"primary\": 1}]"]
},
...
]
This solution may appear senseless, but it is possible to interact with this data later in Data Studio using REGEX if really needed.
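A sketch of this variant; JSON.stringify produces exactly the escaped string shown above:
// Sketch: serialize the whole addresses array into a single text field.
function buildSerializedRows(people) {
    return people.map(function (person) {
        return {
            values: [person.name, person.age, JSON.stringify(person.addresses)]
        };
    });
}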
Solution 4: Create a different field for each address
If you're sure all records have a maximum number of addresses (in your example, both names have exactly 2), you can create multiple fields.
Your field definition (=getSchema()) should include fields address_id1, address_city1, address_primary1, address_id2, ... address_primaryN.
I won't spell out how the rows should look in this situation, but it is not hard to guess from the other examples.
I have an array containing some ids. I need to connect to Firestore and fetch only the records that have each specified id.
My array:
var ids = { 101, 201, 303}
and my firestore documents:
{
  "users": [
    {
      "id": 1,
      "name": "name1",
      "otherData": "otherData"
    },
    {
      "id": 2,
      "name": "name1",
      "otherData": "otherData"
    },
    {
      "id": 3,
      "name": "name1",
      "otherData": "otherData"
    },
    ...
    {
      "id": 1000,
      "name": "name1",
      "otherData": "otherData"
    }
  ]
}
How can I do that efficiently using db.collection('coll1').where() statements?
I have tried to fetch the data using forEach like this:
ids.forEach(id => {
let result = db.collection('coll1').where('id', '==', id).get();
...
});
But each time I try it this way, it does not work.
I am new to the Firestore environment and not sure how to perform such an operation. Please help.
You can use Firestore compound queries for this. Link to the official docs here.
Use the in operator to combine up to 10 equality (==) clauses on the same field with a logical OR. An in query returns documents where the given field matches any of the comparison values.
Similarly, use the array-contains-any operator to combine up to 10 array-contains clauses on the same field with a logical OR. An array-contains-any query returns documents where the given field is an array that contains one or more of the comparison values.
You need to change ids to an array:
var ids = [ 101, 201, 303 ]
Query
db.collection("coll1").where("id","in", ids).get();
Let's say I have the following document:
{
"Id": "1",
"Properties": [
{
"Name": "Name1",
"PropertyTypes": [
"Type1"
]
},
{
"Name": "Name2",
"PropertyTypes": [
"Type1",
"Type2",
"Type3"
]
}
]
}
When I use the following SQL:
SELECT c.Id FROM c
JOIN p in c.Properties
WHERE ARRAY_CONTAINS(p.PropertyTypes,"Type1")
I get the following in return:
[
{
"Id": "1"
},
{
"Id": "1"
}
]
How do I change my query so that it only returns distinct documents?
As far as I know, DISTINCT isn't supported by Azure Cosmos DB yet, so there seems to be no way to remove the duplicates at the query (SQL) level.
You could deduplicate the query result set in a loop locally.
However, if your result data is large, I suggest using a stored procedure to process the result data inside Azure Cosmos DB and relieve the pressure on your local server.
You could refer to the official tutorial about stored procedures.
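For the local approach, a minimal sketch of deduplicating the result set, assuming rows shaped like { Id } as in the output above:
// Sketch: drop repeated Ids from the query result on the client.
function distinctIds(results) {
    var seen = {};
    return results.filter(function (row) {
        if (seen[row.Id]) return false;
        seen[row.Id] = true;
        return true;
    });
}
// distinctIds([{ Id: "1" }, { Id: "1" }]) -> [{ Id: "1" }]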
I have entries in the Mongo database that hold daily entries, and I'm trying to find a specific entry within an array. For example, I'm trying to find the Coca-Cola entry for this user on the given date using Mongoose.
"_id": ObjectId("ID"),
"user_id": ObjectId("ID"),
"date": today,
"snacks":
[
{
"nutrients": [{...}],
"servings": 1,
"name": "Coca-Cola"
}
]
I tried something like this, which is not valid syntax:
user_food.find({user_id : req.session.user_id, date: today}, {snacks:{[name:{"Coca-Cola"}]}}
I can query and retrieve the full entry by date with the following query:
user_food.findOne({user_id : req.session.user_id, date: today}, function (err, diary) {...});
My only problem is obtaining just the specific snack object by its name.
You could change your query to look like this:
user_food.find({user_id : req.session.user_id, date: today, 'snacks.name': 'Coca-Cola' })
This will only find documents that have a snack named "Coca-Cola". You can then project the result if you wish to send back only the relevant information.
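For the projection part, a sketch using the positional $ operator, which returns just the first array element that matched the query condition:
// Sketch: filter on the snack name and project only the matching entry.
user_food.findOne(
    { user_id: req.session.user_id, date: today, 'snacks.name': 'Coca-Cola' },
    { 'snacks.$': 1 },
    function (err, diary) {
        if (err) return console.error(err);
        console.log(diary.snacks[0]); // just the Coca-Cola entry
    }
);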
I have a dataset of records stored in MongoDB, and I have been trying to extract a complex set of data from the records.
Sample records are as follows:
{
bookId : '135wfkjdbv',
type : 'a',
store : 'crossword',
shelf : 'A1'
}
{
bookId : '13erjfn',
type : 'b',
store : 'crossword',
shelf : 'A2'
}
I have been trying to extract data such that, for each bookId, I get a count (of records) for each shelf, per store name, that holds the book identified by bookId, where the type of the book is 'a'.
I understand that the aggregation query allows a pipeline that allows grouping, matching etc, but I have not been able to reach a solution.
The desired output is of the form:
{
bookId : '135wfkjdbv',
stores : [
{
name : 'crossword',
shelves : [
{
name : 'A1',
count : 12
},
]
},
{
name : 'granth',
shelves : [
{
name : 'C2',
count : 12
},
{
name : 'C4',
count : 12
},
]
}
]
}
The process isn't really that difficult when you look at it. The aggregation "pipeline" is exactly that, where each "stage" feeds its result into the next for processing. Just like a unix "pipe":
ps -ef | grep mongo | tee out.txt
So it's just adding stages: an initial $match to keep only type "a" books (as the question requires), and then three $group stages, where the first does the basic aggregation and the remaining two simply "roll up" the arrays required in the output.
db.collection.aggregate([
{ "$match": { "type": "a" } },
{ "$group": {
"_id": {
"bookId": "$bookId",
"store": "$store",
"shelf": "$shelf"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": {
"bookId": "$_id.bookId",
"store": "$_id.store"
},
"shelves": {
"$push": {
"name": "$_id.shelf",
"count": "$count"
}
}
}},
{ "$group": {
"_id": "$_id.bookId",
"stores": {
"$push": {
"name": "$_id.store",
"shelves": "$shelves"
}
}
}}
])
You could possibly $project at the end to rename _id to bookId, but you should already know what it is, and it's better to get used to treating _id as the primary key. Such cosmetic stages have a cost, so it's a habit best avoided; learn to do things correctly from the start.
So all that really happens here is that all the fields making up the grouping detail become the primary key of $group, with the other field produced as count to count the shelves within that grouping. Think of the SQL equivalent:
GROUP BY bookId, store, shelf
All each subsequent stage does is transpose each grouping level into array entries: first the shelf within the store, then the store within the bookId. Each time, the fields in the primary grouping key are reduced as their content moves into the produced array.
When you start thinking in terms of "pipeline" processing, it becomes clear: you construct one form, then take that output and move it into the next form, and so on. This is basically how you fold the results into the two nested arrays.