In sails/waterline, get the maximum value of a column in a database-agnostic way - javascript

While using Sails as an ORM (version 1.0), I notice that there is a Model.avg function (as well as sum). However, there is no max or min function to get the maximum or minimum of a column in a model; does that mean this is considered unnecessary because it is already covered by other functions?
Now in my database I need to get the "maximum id" in a list, and I have it working for PostgreSQL by using a native query:
const maxnum = await Order.getDatastore().sendNativeQuery('SELECT MAX(\"orderNr\") FROM \"order\"')
While this isn't the most difficult thing, it is not what I truly want: it is limited to SQL-based datastores (so we couldn't easily move to MongoDB), and the syntax might even differ between SQL database types.
So I wonder: can this be rewritten in such a way that it doesn't rely on sendNativeQuery?

You can use .query() to execute a raw SQL query using the specified model's datastore, and if you want you can try pg, an NPM package used for communicating with PostgreSQL databases:
Pet.query('SELECT pet.name FROM pet WHERE pet.name = $1', [ 'dog' ],
  function(err, rawResult) {
    if (err) { return res.serverError(err); }
    sails.log(rawResult);
    // (The result format depends on the SQL query that was passed in
    // and the adapter you're using.)
    // Then parse the raw result and do whatever you like with it.
    return res.ok();
  });

You can use the limit and sort options Waterline provides to get a single record with the maximal value (then just extract that value).
const [orderWithMax] = await Order.find({
  where: {},
  select: ['orderNr'],
  limit: 1,
  sort: 'orderNr DESC'
});
console.log(orderWithMax.orderNr);
Like most things in Waterline, it's probably not as efficient as an SQL SELECT MAX query (or some equivalent in mongo, etc), but it should allow swapping out the database with no maintenance. Last note, don't forget to handle the case of no models found.
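For instance, a minimal sketch of that guard (the zero fallback is just an assumption about what makes sense for order numbers):
// .find() resolves to an array, which is empty when there are no records yet,
// so orderWithMax is undefined in that case.
const maxOrderNr = orderWithMax ? orderWithMax.orderNr : 0; // assumed fallback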

Related

DynamoDB: Query only every 10th value

I am querying data between two specific unixtime values, for example:
all data between 1516338730 (today, 6:12) and 1516358930 (today, 11:48).
My database receives a new record every minute. Now, when I want to query the data of the last 24h, it's way too dense; every 10th minute would be perfect.
My question now is: how can I read only every 10th database record using DynamoDB?
As far as I know, there's no possibility to use modulo or something similar that fits my needs.
This is my AWS Lambda Code so far:
var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient();

exports.handler = function (event, context, callback) {
  var read = {
    TableName: "user",
    ProjectionExpression: "#time, #val",
    // TIME is a DynamoDB reserved word, so the #time placeholder is used here too.
    KeyConditionExpression: "Id = :id and #time between :time_1 and :time_2",
    ExpressionAttributeNames: {
      "#time": "TIME",
      "#val": "user_data"
    },
    ExpressionAttributeValues: {
      ":id": event, // primary key
      ":time_1": 1516338730,
      ":time_2": 1516358930
    },
    ScanIndexForward: true
  };
  docClient.query(read, function (err, data) {
    if (err) {
      callback(err, null);
    } else {
      callback(null, data.Items);
    }
  });
};
You say that you insert 1 record every minute?
The following might be an option:
At the time of insertion, set another field on the record, let's call it MinuteBucket, which is calculated as the timestamp's minute value mod 10.
If you do this via a stream function, you can handle new records, and then write something to touch old records to force a calculation.
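As a rough illustration only (not tested; the table and attribute names follow the question, everything else is an assumption), such a stream handler might look like:
var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient();

exports.handler = function (event, context, callback) {
  var writes = event.Records
    .filter(function (r) { return r.eventName === 'INSERT'; })
    .map(function (r) {
      var item = AWS.DynamoDB.Converter.unmarshall(r.dynamodb.NewImage);
      // Minutes-since-epoch mod 10, which equals the minute-of-hour mod 10.
      var bucket = Math.floor(item.TIME / 60) % 10;
      return docClient.update({
        TableName: 'user',
        Key: { Id: item.Id, TIME: item.TIME },
        UpdateExpression: 'SET MinuteBucket = :b',
        ExpressionAttributeValues: { ':b': bucket }
      }).promise();
    });
  Promise.all(writes)
    .then(function () { callback(null); })
    .catch(callback);
};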
Your query would change to something like this (MinuteBucket isn't part of the table's key, so it has to go in a FilterExpression rather than the KeyConditionExpression):
/*...snip...*/
KeyConditionExpression: "Id = :id and #time between :time_1 and :time_2",
FilterExpression: "MinuteBucket = :bucket_id",
/*...snip...*/
ExpressionAttributeValues: {
  ":id": event, // primary key
  ":time_1": 1516338730,
  ":time_2": 1516358930,
  ":bucket_id": 0 // can be 0-9; if you want the first record to be closer to time_1, set this to time_1's minute value mod 10
},
/*...snip...*/
Just as a follow-up thought: if you want to speed up your queries, perhaps investigate using the MinuteBucket in an index, though that might come at a higher price.
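If you do go that route, here is a hedged sketch of adding such a global secondary index (the index name, key types and throughput figures are all assumptions):
var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB();

dynamodb.updateTable({
  TableName: "user",
  AttributeDefinitions: [
    { AttributeName: "Id", AttributeType: "S" },           // assumed key type
    { AttributeName: "MinuteBucket", AttributeType: "N" }
  ],
  GlobalSecondaryIndexUpdates: [{
    Create: {
      IndexName: "Id-MinuteBucket-index",
      KeySchema: [
        { AttributeName: "Id", KeyType: "HASH" },
        { AttributeName: "MinuteBucket", KeyType: "RANGE" }
      ],
      Projection: { ProjectionType: "ALL" },
      ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
    }
  }]
}, function (err, data) {
  if (err) console.log(err);
  else console.log(data);
});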
I don't think that this is possible with the DynamoDB API.
There is FilterExpression, which contains conditions that DynamoDB applies after the Query operation but before the data is returned to you.
But AFAIK it isn't possible to use a custom function there, and the built-in functions are limited.
As a workaround, you could mark every 10th item on the client side, and then query with an attribute_exists check (or a check on the attribute's value) to filter them.
BTW, it would be nice to create an index with the 'Id' attribute as the partition key and 'TIME' as the sort key to improve query performance.
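For instance, a minimal sketch of that workaround (the marker attribute name TenthMark is hypothetical, and it assumes the marker has already been written onto every 10th record):
var read = {
  TableName: "user",
  ProjectionExpression: "#time, #val",
  KeyConditionExpression: "Id = :id and #time between :time_1 and :time_2",
  // Applied after the key condition, before results are returned to you.
  FilterExpression: "attribute_exists(TenthMark)",
  ExpressionAttributeNames: { "#time": "TIME", "#val": "user_data" },
  ExpressionAttributeValues: {
    ":id": event,
    ":time_1": 1516338730,
    ":time_2": 1516358930
  }
};
docClient.query(read, function (err, data) {
  if (err) callback(err, null);
  else callback(null, data.Items);
});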

Mongoose get sum of fields

I'm trying to track the bandwidth usage of a user based upon two Mongoose schemas. I have a user and an image schema, where a user has many images. My image schema looks like this:
image = {
  creator: 'ObjectId of user',
  size: '12345', // kb
  uploadedTo: [{}]
}
Essentially I want to create a query that will get all images that belong to a user via the image.creator property. I would then multiply the image.size property by image.uploadedTo.length value to get the total bandwidth used.
For example: If a user has 5 images, each image is 5,000kb and is uploaded to 3 services each, the total bandwidth for the user would be 75,000kb (5*5,000*3).
Is this query possible strictly through mongoose, or would I have to just get the user's images and then use regular javascript to get the total bandwidth?
You'll want to use the aggregation pipeline. The basic projection might look like this:
{
  $project: {
    size: 1,
    number_of_uploads: {
      $size: "$uploadedTo"
    },
    total_bandwidth: {
      // Fields defined earlier in the same $project stage can't be referenced,
      // so the array length is computed again here.
      $multiply: [ "$size", { $size: "$uploadedTo" } ]
    }
  }
}
You'd get a new document that looks like this (note that $multiply only works on numeric values, so size should be stored as a number rather than a string):
{
  size: 1234,
  number_of_uploads: 2,
  total_bandwidth: 2468
}
You'll need to integrate that with Mongoose's aggregate helper.
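For instance, a minimal sketch of that integration (the Image model name, the userId variable and the summing $group stage are assumptions, not part of the original question):
// Hypothetical model name; adjust to your own schema.
Image.aggregate([
  { $match: { creator: userId } }, // userId: the user's ObjectId
  { $project: {
      creator: 1,
      size: 1,
      number_of_uploads: { $size: "$uploadedTo" },
      total_bandwidth: { $multiply: [ "$size", { $size: "$uploadedTo" } ] }
  } },
  // Sum the per-image bandwidth into a single figure for the user.
  { $group: { _id: "$creator", bandwidth: { $sum: "$total_bandwidth" } } }
]).exec(function (err, result) {
  console.log(result); // e.g. [ { _id: <creator id>, bandwidth: 75000 } ]
});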
If you're using MongoDB 3.2, you can also use $lookup (which is basically a join operation) as part of your pipeline to look up the creator._id, and then run a $sum operation on all of the images (you'll probably $group by that creator ID). The benefit of this is that your server doesn't do any work; the lookups and operations happen inside MongoDB itself.
If you're not using v3.2, you can leverage Mongoose's population to look up (on your own server) the creator ID for you, and then use JavaScript on your own server to calculate the sum.
It's a bit difficult for me to come up with what exactly your pipeline will look like since I don't have a sample dataset to play with, but the above tools should be all that you need.
Additional operation resources
$size
$multiply
(P.S. you're probably looking at this like "WTF?". Sometimes it's easier to just do the calculations yourself and "use regular javascript to get the total bandwidth", as you mentioned. Both solutions will work, it just depends on where you want to put the load - whether on the MongoDB server or on your server - and how many round-trips you want to make.)
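For what it's worth, the plain-JavaScript version is short once you have the user's images in hand (a sketch, assuming images is that array and that size may be stored as a string):
// Sum size * number-of-upload-targets across all of the user's images.
var totalBandwidth = images.reduce(function (sum, img) {
  return sum + Number(img.size) * img.uploadedTo.length;
}, 0);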

Using Meteor publish-with-relations package where each join cannot use the _id field

I am working to solve a problem not dissimilar to the discussion in the following blog post: publishing two related data sets in Meteor, with a 'reactive join' on the server side.
https://www.discovermeteor.com/blog/reactive-joins-in-meteor/
Unfortunately for me, however, the related collection I wish to join to will not be joined using the "_id" field, but using another field. Normally in Mongo and Meteor I would create a 'filter' block where I could specify this query. However, as far as I can tell, the PWR package makes an implicit assumption that the join is on '_id'.
If you review the example given on the 'publish-with-relations' github page (see below) you can see that both posts and comments are being joined to the Meteor.users '_id' field. But what if we needed to join to the Meteor.users 'address' field ?
https://github.com/svasva/meteor-publish-with-relations
In the short term I have specified my query 'upside down' (luckily I'm able to use the _id field when doing a reverse join), but I suspect this will result in an inefficient query as the datasets grow, so I would rather be able to do the join in the direction planned.
The two collections we are joining can be thought of as like a conversation topic/header record, and a conversation message collection (i.e. one entry in the collection for each message in the conversation).
The conversation topic in my solution is using the _id field to join, the conversation messages have a "conversationKey" field to join with.
The following call works, but this is querying from the messages to the conversation, instead of vice versa, which would be more natural.
Meteor.publishWithRelations({
  handle: this,
  collection: conversationMessages,
  filter: { "conversationKey": requestedKey },
  options: { sort: { msgTime: -1 } },
  mappings: [{
    //reverse: true,
    key: 'conversationKey',
    collection: conversationTopics,
    filter: { startTime: { $gt: (new Date().getTime() - aLongTimeAgo) } },
    options: {
      sort: { createdAt: -1 }
    },
  }]
});
Can you do a join without an _id?
No, not with PWR. Joining with a foreign key which is the id in another table/collection is nearly always how relational data is queried. PWR is making that assumption to reduce the complexity of an already tricky implementation.
How can this publish be improved?
You don't actually need a reactive join here because one query does not depend on the result of another. It would if each conversation topic held an array of conversation message ids. Because both collections can be queried independently, you can return an array of cursors instead:
Meteor.publish('conversations', function(requestedKey) {
  check(requestedKey, String);
  var aLongTimeAgo = 864000000;
  return [
    conversationMessages.find({conversationKey: requestedKey}),
    // find() has no 'filter' option, so the startTime condition
    // goes into the selector alongside the _id.
    conversationTopics.find({
      _id: requestedKey,
      startTime: {$gt: new Date().getTime() - aLongTimeAgo}
    })
  ];
});
Notes
Sorting in your publish function isn't useful unless you are using a limit.
Be sure to use a forked version of PWR like this one which includes Tom's memory leak fix.
Instead of conversationKey I would call it conversationTopicId to be more clear.
I think this could now be solved much more easily with the reactive-publish package (I am one of the authors). You can make any query inside an autorun and then use its results to publish the query you want to push to the client. I would write you example code, but I do not really understand what exactly you need. For example, you mention you would like to limit topics, but why would they be limited if you are providing requestedKey, which is the ID of a document anyway? So only one result is available?
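As a rough sketch only (the collection names follow the question; whether this matches the exact requirement is an assumption), a publish function using the package's autorun pattern might look like:
Meteor.publish('conversation-topics', function (requestedKey) {
  check(requestedKey, String);
  var self = this;
  self.autorun(function () {
    // Re-runs whenever the matching messages change; the returned cursor
    // is published reactively by the reactive-publish package.
    var keys = conversationMessages.find(
      {conversationKey: requestedKey},
      {fields: {conversationKey: 1}}
    ).map(function (m) { return m.conversationKey; });
    return conversationTopics.find({_id: {$in: keys}});
  });
});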

MongoDB Bulk Save Equivalent?

I am a mongodb noob and am running into some difficulty trying to create an equivalent to bulk save (as I can't find a bulk save operation) using the MongoDB bulk operations. Briefly, given an array of documents:
[{ _id:1, name:"a" ... }, { _id:1, name:"b" ... } ... ]
I want to bulk upsert the documents in the array, using the _id attribute as the comparison field to determine which incoming records are equivalent to records already in mongodb. In pseudo-code I want mongodb to bulk upsert as follows:
if (incomingDocument._id == existingDocument._id) {
  update(incoming) // overwrite existing document with entire incoming document
} else {
  insert(incoming)
}
Ideally, I would like to pass Mongo an array and a comparator, versus queuing up an individual bulk operation for each document.
How/can I do this with Bulk.find().upsert().update(<update>); or similar?
(Alternately, is there an undocumented bulk save() operation?)
Thank you!
Bulk.find.upsert
With the upsert option set to true, if no matching documents exist for the Bulk.find() condition, then the update or the replacement operation performs an insert. If a matching document does exist, then the update or replacement operation performs the specified update or replacement.
But you will need to loop over your collection:
var bulk = db.items.initializeUnorderedBulkOp();
myDocuments.forEach(function(doc) {
  bulk.find({_id: doc._id}).upsert().replaceOne(doc);
});
bulk.execute({w: 1, j: true}, function (err, result) {
  if (result.isOk()) {
    // ...
  }
});
More or less; I'm sorry, I'm not able to test it at the moment. I'm also not able to say how it will behave with large numbers of documents.
UPDATE: I modified the code as suggested by Colin.

MongoDb bulk insert limit issue

I'm new to Mongo and Node. I was trying to upload a CSV into MongoDB.
Steps include:
Reading the csv.
Converting it into JSON.
Pushing it to the mongodb.
I used the 'csvtojson' module to convert the CSV to JSON and pushed it using this code:
MongoClient.connect('mongodb://127.0.0.1/test', function (err, db) { // connect to mongodb
  var collection = db.collection('qr');
  collection.insert(jsonObj.csvRows, function (err, result) {
    console.log(JSON.stringify(result));
    console.log(JSON.stringify(err));
  });
  console.log("successfully connected to the database");
  // db.close();
});
This code works fine with a CSV up to about 4 MB in size; beyond that it doesn't work.
I tried to log the error with
console.log(JSON.stringify(err));
and it returned {}
Note: mine is a 32-bit system.
Is it because there is a document limit of 4 MB on 32-bit systems?
I'm in a scenario where I can't restrict the size and number of attributes in the CSV file (i.e., the code will be handling various kinds of CSV files). So how do I handle that? Are there any modules available?
If you are not having a problem parsing the CSV into JSON, which presumably you are not, then perhaps just restrict the size of the list being passed to insert.
As I can see, the .csvRows element is an array, so rather than sending all of the elements at once, slice it up and batch the elements in the call to insert. It seems likely that the number of elements is the cause of the problem rather than the size. Splitting the array up into a few inserts rather than one should help.
Experiment with 500, then 1000 and so on until you find a happy medium.
Sort of coding it:
var batchSize = 500;
for (var i = 0; i < jsonObj.csvRows.length; i += batchSize) {
  // slice() excludes the end index, so i + batchSize yields exactly batchSize rows
  var docs = jsonObj.csvRows.slice(i, i + batchSize);
  db.collection('qr').insert(docs, function (err, result) { // same collection as in the question
    // Also, don't JSON.stringify the error; just log it
    console.log(err);
    // Whatever
  });
}
And do the inserts in chunks like this.
You can put the data into an array of elements and then simply use the MongoDB insert function, passing that array to it.
