Best way to get rid of old messages/posts in a collection? - javascript

I know this site prefers answers over discussions, but I am quite lost on this.
What would be an efficient way to get rid of old messages stored in a collection? Since they are messages, there will be a large number of them.
So far I have tried deleting the oldest message whenever there are more than 100, using:
if (Messages.find().count() > 100) {
Messages.remove({
_id: Messages.findOne({}, { sort: { createdAt: 1 } })._id
});
}
and I have also tried setting an expiry.
Is there any other/more efficient way to do this?

Depending on how you define when a document expires, there are two ways you can go about this.
The first is to use "TTL indexes", which automatically prune a collection based on time. For instance, you might have a logs collection that records all application events, and you only want to keep the logs for the last hour. To implement this, add a date field to each log document. This field indicates the document's age, and MongoDB uses it to determine whether the document has expired and needs to be removed:
db.log_events.insert({
"name": "another log entry",
"createdAt": new Date()
})
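Now add a TTL index to your collection on this field. The example below uses an expireAfterSeconds value of 3600, so each log document is deleted about an hour after its createdAt time (the background TTL task runs roughly once a minute, so removal is not instantaneous):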
db.log_events.createIndex({ "createdAt": 1 }, { expireAfterSeconds: 3600 })
So for your case you would need to define an appropriate expiry time in seconds. For more details refer to the MongoDB documentation on expiration of data using TTL indexes.
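To make the TTL rule concrete, here is a small in-memory model of when a document becomes eligible for deletion: once its indexed date plus expireAfterSeconds has passed. This is only a sketch of the rule, not MongoDB's implementation (the server applies it in a background task), and it ignores the case where the indexed field holds an array of dates. The function name isEligibleForTtlDelete is hypothetical.

```javascript
// In-memory model of the TTL rule: a document becomes eligible for deletion
// once (indexed date + expireAfterSeconds) is no longer in the future.
// Hypothetical helper for illustration only; MongoDB does this server-side.
function isEligibleForTtlDelete(doc, field, expireAfterSeconds, now = new Date()) {
  const value = doc[field];
  // Documents whose indexed field is missing or not a Date never expire.
  if (!(value instanceof Date)) return false;
  return value.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}

const now = new Date("2024-01-01T12:00:00Z");
const logs = [
  { name: "old entry", createdAt: new Date("2024-01-01T10:30:00Z") },
  { name: "recent entry", createdAt: new Date("2024-01-01T11:45:00Z") },
  { name: "no date", createdAt: "2024-01-01" }, // a string, not a Date
];

// With expireAfterSeconds: 3600, only "old entry" is past its expiry.
const expired = logs.filter((d) => isEligibleForTtlDelete(d, "createdAt", 3600, now));
```

The "no date" entry shows why the answer insists on a real date field: a string that merely looks like a date is never expired.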
The second approach is to remove documents manually with a date range query. Using the same collection as above, to remove documents older than an hour, create a date representing one hour before the current time and use it in the query passed to the collection's remove method:
var now = new Date(),
hourAgo = new Date(now.getTime() - (60 * 60 * 1000));
db.log_events.remove({"createdAt": { "$lte": hourAgo }})
The above will delete log documents older than an hour.
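A small sketch that factors the cutoff calculation into a reusable function, so the same code works for an hour, three days, or 30 seconds. olderThanFilter is a hypothetical name; the returned object is an ordinary query document that you could pass to remove() in the shell, or to deleteMany() in the Node.js driver.

```javascript
// Hypothetical helper: build a filter matching documents whose date field
// is at or before (now - maxAgeMs).
function olderThanFilter(field, maxAgeMs, now = new Date()) {
  const cutoff = new Date(now.getTime() - maxAgeMs);
  return { [field]: { $lte: cutoff } };
}

const HOUR_MS = 60 * 60 * 1000;
const filter = olderThanFilter("createdAt", HOUR_MS, new Date("2024-01-01T12:00:00Z"));
// filter.createdAt.$lte is 2024-01-01T11:00:00.000Z
// In the shell you would then run: db.log_events.remove(filter)
```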

Related

Creating a mongo view that depends on the current time

I have a collection with a date field, and I want to create a mongo view that filters all the documents relative to the current date. For example, I want my view to contain all the documents from the last 7 days.
I have a JavaScript script that creates the view with an aggregation pipeline. I used JavaScript's new Date() to write the condition for the last 7 days:
{
"$lt": [
{"$subtract": [new Date(), "$DateOfDocument"]}, // difference in milliseconds
1000 * 60 * 60 * 24 * 7 // 7 days in milliseconds
]
}
But when I execute the script that creates the view, mongo evaluates 'new Date()' first and then creates the view with that result stored as an ISODate. So the aggregation pipeline filters relative to the last time I executed the script, not the actual current date:
{
"$lt": [
{"$subtract": [ISODate("2018-02-05T06:52:32.10+0000"), "$DateOfDocument"]},
604800000
]
}
Is there any way to get a view filtered by the current date? Is there an aggregation operator for the current date, like Oracle's 'sysdate'? I don't want to rerun the script that recreates the view every time I want to read it.
Looks like this feature is in the works for MongoDB 3.7.
https://jira.mongodb.org/browse/SERVER-23656
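For later readers: MongoDB 4.2 added the $$NOW aggregation variable, which is evaluated when the pipeline runs rather than when the script defining the view runs. A sketch of the 7-day filter using it, assuming MongoDB 4.2 or later and the question's DateOfDocument field; the view and collection names in the final comment are hypothetical.

```javascript
// 7 days in milliseconds: subtracting one date from another in an
// aggregation expression yields the difference in milliseconds.
const SEVEN_DAYS_MS = 1000 * 60 * 60 * 24 * 7;

// $expr lets $match use aggregation expressions. "$$NOW" is stored in the
// view definition as-is and resolved each time the view is read (4.2+).
const lastSevenDaysPipeline = [
  {
    $match: {
      $expr: {
        $lt: [{ $subtract: ["$$NOW", "$DateOfDocument"] }, SEVEN_DAYS_MS],
      },
    },
  },
];

// In the mongo shell (hypothetical names):
// db.createView("lastSevenDays", "documents", lastSevenDaysPipeline)
```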

Compare two +new Date strings

I'm trying to compare two different +new Date values, but sometimes the comparison gives the wrong result.
So this is how I'm doing it:
The first user sends a request to Firebase (the server) and sets the value time to +new Date, like this:
firebase.userRef.set({
time: +new Date
})
Then the second user sends a similar request to Firebase and sets the value the same way. The second user then finds user 1 in a query and checks who sent the request first, like so:
firebase.usersRef.child(user_uid).once("value",function(snapshot){
firebase.usersRef.child(current_uid).once("value",function(childSnapshot) {
if(snapshot.val().time > childSnapshot.val().time){
//current user requested first
} else {
//user found requested first
}
});
});
But sometimes both users get the wrong answer: each of them thinks that they (the current user) requested first.
A better way to set the timestamp is to use ServerValue:
userRef.set({
time: firebase.database.ServerValue.TIMESTAMP
})
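Using server values is better because Firebase's server fills in the timestamp when the write reaches it, so both values come from the same clock instead of each user's device clock, which can be skewed. That makes your comparisons reliable.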
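Even with server timestamps, two writes can receive the same millisecond value, and with the original if/else both clients would then conclude that the other user was first. A minimal sketch of a comparison that breaks ties by uid so both clients always agree; whoRequestedFirst and the { uid, time } shape are hypothetical, standing in for the two snapshot values.

```javascript
// Returns the uid of whichever request came first. Equal timestamps are
// broken by comparing uids, so both clients reach the same answer.
function whoRequestedFirst(a, b) {
  if (a.time !== b.time) {
    return a.time < b.time ? a.uid : b.uid;
  }
  return a.uid < b.uid ? a.uid : b.uid;
}

const alice = { uid: "uid-alice", time: 1700000000000 };
const bob = { uid: "uid-bob", time: 1700000000000 };

// Same result regardless of which client runs the comparison.
const fromAlice = whoRequestedFirst(alice, bob);
const fromBob = whoRequestedFirst(bob, alice);
```

Which user "wins" a tie is arbitrary; what matters is that both sides compute the same winner.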

Time sensitive data in Node.js

I'm building an application with Node.js and MongoDB, and the application has time-limited data: when a piece of data is inserted into the database, I'd like to remove it (via code) after three days (or any other period).
Currently, my solution is to have some sort of member in my schema that records when the data was actually posted, and then remove it once the current time is more than 3 days past the insertion, but I'm having trouble figuring out a good way to write this in code.
Are there any standard ways to accomplish something like this?
There are two basic ways to accomplish this with a TTL index. A TTL index will let you define a special type of index on a BSON Date field that will automatically delete documents based on age. First, you will need to have a BSON Date field in your documents. If you don't have one, this won't work. http://docs.mongodb.org/manual/reference/bson-types/#document-bson-type-date
Then you can either delete all documents after they reach a certain age, or set expiration dates for each document as you insert them.
For the first case, assuming you wanted to delete documents after 1 hour you would create this index:
db.mycollection.createIndex( { "createdAt": 1 }, { expireAfterSeconds: 3600 } )
assuming you had a createdAt field that was a date type. MongoDB will take care of deleting all documents in the collection once they reach 3600 seconds (or 1 hour) old.
For the second case, you will create an index with expireAfterSeconds set to 0 on a different field:
db.mycollection.createIndex( { "expireAt": 1 }, { expireAfterSeconds: 0 } )
If you then insert a document with an expireAt field set to a date mongoDB will delete that document at that date and time:
db.mycollection.insert( {
"expireAt": new Date('June 6, 2014 13:52:00'),
"mydata": "data"
} )
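For the question's three-day requirement with the second approach, the expireAt value is simply the insertion time plus three days. A sketch of building such a document; withExpiry is a hypothetical helper, and its result would be passed to insert() (or insertOne() in the Node.js driver) on a collection that has the expireAfterSeconds: 0 index on expireAt.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: copy `data` and add an expireAt date `days` after `now`.
// With a TTL index on { expireAt: 1 } and expireAfterSeconds: 0, MongoDB
// removes the document once expireAt has passed.
function withExpiry(data, days, now = new Date()) {
  return { ...data, expireAt: new Date(now.getTime() + days * DAY_MS) };
}

const doc = withExpiry({ mydata: "data" }, 3, new Date("2024-01-01T00:00:00Z"));
// doc.expireAt is 2024-01-04T00:00:00.000Z
```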
You can read more detail about how to use TTL indexes here:
http://docs.mongodb.org/manual/tutorial/expire-data/

MongoDB query for document older than 30 seconds

Does anyone have a good approach for querying a collection for documents that are older than 30 seconds? I'm creating a cleanup worker that marks items as failed after they have been in a specific state for more than 30 seconds.
Not that it matters, but I'm using mongojs for this one.
Every document has a created time associated with it.
If you want to do this using mongo shell:
db.requests.find({created: {$lt: new Date((new Date())-1000*60*60*72)}}).count()
...will find the documents that are older than 72 hours ("now" minus "72*60*60*1000" msecs). 30 seconds would be 1000*30.
We are assuming you have a created_at or similar field in your document that has the time it was inserted or otherwise modified depending on which is important to you.
Rather than iterating over the results, you might want to look at the multi option of update, which applies your change to all documents that match your query. Setting the time you want to look past should be fairly straightforward.
In shell syntax, which should be pretty much the same as in the driver:
db.collection.update({
created_at: {$lt: time },
state: oldstate
},
{$set: { state: newstate } }, false, true )
The first argument, false, is for upsert, which makes no sense in this usage; the second, true, enables a multi-document update.
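A sketch of the cleanup worker's single multi-document update. The filter and update objects match the shell call above and could equally be passed to updateMany(filter, update) in the Node.js driver; staleStateUpdate, the state names, and the in-memory applyUpdate used to check the logic are all hypothetical. One filter selects every document that has been in oldState for longer than the threshold, and one $set changes them all.

```javascript
// Build the filter and update for "mark items stuck in oldState for more
// than thresholdMs as newState".
function staleStateUpdate(oldState, newState, thresholdMs, now = new Date()) {
  const cutoff = new Date(now.getTime() - thresholdMs);
  return {
    filter: { created_at: { $lt: cutoff }, state: oldState },
    update: { $set: { state: newState } },
  };
}

// In-memory stand-in for the server applying the update to every match.
function applyUpdate(docs, { filter, update }) {
  let modified = 0;
  for (const doc of docs) {
    if (doc.state === filter.state && doc.created_at < filter.created_at.$lt) {
      Object.assign(doc, update.$set);
      modified++;
    }
  }
  return modified;
}

const now = new Date("2024-01-01T12:00:30Z");
const jobs = [
  { id: 1, state: "processing", created_at: new Date("2024-01-01T11:59:00Z") },
  { id: 2, state: "processing", created_at: new Date("2024-01-01T12:00:10Z") },
  { id: 3, state: "done", created_at: new Date("2024-01-01T11:00:00Z") },
];

// Cutoff is 12:00:00, so only job 1 is both old enough and in "processing".
const modified = applyUpdate(jobs, staleStateUpdate("processing", "failed", 30 * 1000, now));
```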
If the documents are indeed going to be short-lived and you have no other need for them afterwards, you might consider capped collections. These are limited by a total size (and optionally a maximum number of documents), and their natural insertion order favours processing of queued entries.
You could use something like this:
var d = new Date();
d.setSeconds(d.getSeconds() - 30);
db.mycollection.find({ created_at: { $lt: d } }).forEach(function(err, doc) {} );
The TTL option is also an elegant solution. It's an index that deletes documents automatically after x seconds, see here: https://docs.mongodb.org/manual/core/index-ttl/
Example code would be:
db.yourCollection.createIndex({ created:1 }, { expireAfterSeconds: 30 } )

Range query for MongoDB pagination

I want to implement pagination on top of MongoDB. For my range query, I thought about using ObjectIDs:
db.tweets.find({ _id: { $lt: maxID } }, { limit: 50 })
However, according to the docs, the structure of the ObjectID means that "ObjectId values do not represent a strict insertion order":
The relationship between the order of ObjectId values and generation time is not strict within a single second. If multiple systems, or multiple processes or threads on a single system generate values, within a single second; ObjectId values do not represent a strict insertion order. Clock skew between clients can also result in non-strict ordering even for values, because client drivers generate ObjectId values, not the mongod process.
I then thought about querying with a timestamp:
db.tweets.find({ created: { $lt: maxDate } }, { limit: 50 })
However, there is no guarantee the date will be unique — it's quite likely that two documents could be created within the same second. This means documents could be missed when paging.
Is there any sort of ranged query that would provide me with more stability?
It is perfectly fine to use ObjectId(), though your syntax for pagination is wrong. You want:
db.tweets.find().limit(50).sort({"_id":-1});
This says you want tweets sorted by _id in descending order, and you want the most recent 50. Your problem is that pagination is tricky when the current result set is changing, so rather than using skip for the next page, you want to note the smallest _id in the result set (the 50th most recent _id value) and then get the next page with:
db.tweets.find( {_id : { "$lt" : <50th _id> } } ).limit(50).sort({"_id":-1});
This will give you the next "most recent" tweets, without new incoming tweets messing up your pagination back through time.
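To illustrate why the $lt approach is stable while skip is not, here is an in-memory simulation. Tweets are plain objects with numeric _id values standing in for ObjectIds (an assumption for brevity); pageBySkip and pageAfter are hypothetical functions mimicking find().sort({_id: -1}).skip(n).limit(k) and find({_id: {$lt: lastId}}).sort({_id: -1}).limit(k). New tweets arrive between fetching page 1 and page 2.

```javascript
// Mimics db.tweets.find().sort({_id: -1}).skip(skip).limit(limit)
function pageBySkip(tweets, skip, limit) {
  return [...tweets].sort((a, b) => b._id - a._id).slice(skip, skip + limit);
}

// Mimics db.tweets.find({_id: {$lt: lastId}}).sort({_id: -1}).limit(limit)
function pageAfter(tweets, lastId, limit) {
  return tweets
    .filter((t) => lastId === undefined || t._id < lastId)
    .sort((a, b) => b._id - a._id)
    .slice(0, limit);
}

const tweets = [1, 2, 3, 4, 5, 6].map((_id) => ({ _id }));
const page1 = pageAfter(tweets, undefined, 3); // _ids 6, 5, 4

// Two new tweets arrive before the user asks for page 2.
tweets.push({ _id: 7 }, { _id: 8 });

const page2Skip = pageBySkip(tweets, 3, 3); // _ids 5, 4, 3: 5 and 4 repeat from page 1
const page2Range = pageAfter(tweets, page1[page1.length - 1]._id, 3); // _ids 3, 2, 1
```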
There is absolutely no need to worry about whether the _id value strictly corresponds to insertion order: it will be 99.999% close enough, and no one actually cares at the sub-second level which tweet came first. You might even notice that Twitter frequently displays tweets out of order; it's just not that critical.
If it is critical, then you would have to use the same technique but with "tweet date" where that date would have to be a timestamp, rather than just a date.
Wouldn't a tweet's "actual" timestamp (i.e. the time it was tweeted, which is what you want to sort by) be different from its "insertion" timestamp (i.e. the time it was added to the local collection)? This depends on your application, of course, but it's likely that tweet inserts could be batched or otherwise end up being inserted in the "wrong" order. So, unless you work at Twitter (and have access to collections inserted in the correct order), you wouldn't be able to rely solely on $natural or ObjectID for sorting logic.
Mongo docs suggest skip and limit for paging:
db.tweets.find({created: {$lt: maxID}}).
sort({created: -1, username: 1}).
skip(50).limit(50); //second page
There is, however, a performance concern when using skip:
The cursor.skip() method is often expensive because it requires the server to walk from the beginning of the collection or index to get the offset or skip position before beginning to return result. As offset increases, cursor.skip() will become slower and more CPU intensive.
This happens because skip does not fit into the MapReduce model and does not scale well: you have to wait for a sorted collection to become available before it can be "sliced". Now, limit(n) sounds like an equally poor method, as it applies a similar constraint "from the other end"; however, with sorting applied, the engine can somewhat optimize the process by keeping only n elements per shard in memory as it traverses the collection.
An alternative is to use range based paging. After retrieving the first page of tweets, you know what the created value is for the last tweet, so all you have to do is substitute the original maxID with this new value:
db.tweets.find({created: {$lt: lastTweetOnCurrentPageCreated}}).
sort({created: -1, username: 1}).
limit(50); //next page
Performing a find condition like this can be easily parallelized. But how do you deal with pages other than the next one? You don't know the begin date for page 5, 10, or 20, or even for the previous page! @SergioTulentsev suggests creative chaining of methods, but I would advocate pre-calculating first-last ranges of the aggregate field in a separate pages collection; these could be recalculated on update. Furthermore, if you're not happy with DateTime (note the performance remarks) or are concerned about duplicate values, you should consider compound indexes on timestamp + account (the account acts as a tie-breaker, since a user can't tweet twice at the same time), or even an artificial aggregate of the two:
db.pages.
find({pagenum: 3})
> {pagenum:3, begin:"01-01-2014#BillGates", end:"03-01-2014#big_ben_clock"}
db.tweets.
find({_sortdate: {$lt: "03-01-2014#big_ben_clock", $gt: "01-01-2014#BillGates"}}).
sort({_sortdate: -1}).
limit(50) //third page
Sorting on an aggregate field like this works, although there may be more idiomatic ways to handle the condition. The field could be backed by a unique index, with its value computed at insert time, so a single tweet document would look like:
{
_id: ...,
created: ..., //to be used in markup
user: ..., //also to be used in markup
_sortdate: "01-01-2014#BillGates" //sorting only, use date AND time
}
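One caveat: the aggregate key is compared as a string, so a format with the year last, such as "01-01-2014", does not sort chronologically ("01-02-2014" sorts before "02-01-2013"). A sketch of building the key from an ISO 8601 timestamp instead, whose fixed-width, most-significant-first layout makes string order match time order; makeSortKey is a hypothetical name, and the "#" separator follows the answer.

```javascript
// Hypothetical helper: "<ISO timestamp>#<username>". toISOString() returns
// YYYY-MM-DDTHH:mm:ss.sssZ in UTC (for years 0-9999), so plain string
// comparison orders keys by time, then by username on ties.
function makeSortKey(created, username) {
  return `${created.toISOString()}#${username}`;
}

const a = makeSortKey(new Date("2013-01-02T09:00:00Z"), "big_ben_clock");
const b = makeSortKey(new Date("2014-02-01T09:00:00Z"), "BillGates");
const c = makeSortKey(new Date("2014-02-01T09:00:00Z"), "alice");
// a < b: the earlier time sorts first. b vs c: same time, so the usernames decide.
```

Note that the username comparison is a plain code-unit comparison, so uppercase names sort before lowercase ones; that is fine for tie-breaking, since it only needs to be consistent.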
The following approach will work even if multiple documents are inserted or updated in the same millisecond, even when their ObjectIds are generated by multiple clients. For simplicity, the queries below project only _id and lastModifiedDate.
For the first page, fetch the results sorted by lastModifiedDate (descending), then _id (ascending):
db.product.find({},{"_id":1,"lastModifiedDate":1}).sort({"lastModifiedDate":-1, "_id":1}).limit(2)
Note the _id (loid) and lastModifiedDate (lmd) of the last record fetched on this page.
For the second page, add a query condition matching (lastModifiedDate = lmd AND _id > loid) OR (lastModifiedDate < lmd):
db.product.find({$or:[{"lastModifiedDate":{$lt:lmd}},{$and:[{"lastModifiedDate":lmd},{"_id":{$gt:loid}}]}]},{"_id":1,"lastModifiedDate":1}).sort({"lastModifiedDate":-1, "_id":1}).limit(2)
Repeat the same for subsequent pages.
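The (lastModifiedDate, _id) scheme can be checked with an in-memory simulation. Documents here use numeric lastModifiedDate values and string _id values as stand-ins for dates and ObjectIds (an assumption for brevity); compareKeys and nextPage are hypothetical, with nextPage mirroring the $or query, and the loop pages through a collection in which several documents share the same timestamp.

```javascript
// Sort order used by the queries: lastModifiedDate descending, then _id ascending.
function compareKeys(a, b) {
  if (a.lastModifiedDate !== b.lastModifiedDate) {
    return b.lastModifiedDate - a.lastModifiedDate;
  }
  return a._id < b._id ? -1 : a._id > b._id ? 1 : 0;
}

// Mirrors find({$or: [{lastModifiedDate: {$lt: lmd}},
//   {$and: [{lastModifiedDate: lmd}, {_id: {$gt: loid}}]}]}).sort(...).limit(limit)
function nextPage(docs, last, limit) {
  return docs
    .filter((d) =>
      last === null ||
      d.lastModifiedDate < last.lastModifiedDate ||
      (d.lastModifiedDate === last.lastModifiedDate && d._id > last._id))
    .sort(compareKeys)
    .slice(0, limit);
}

const docs = [
  { _id: "a", lastModifiedDate: 300 },
  { _id: "b", lastModifiedDate: 200 },
  { _id: "c", lastModifiedDate: 200 },
  { _id: "d", lastModifiedDate: 200 },
  { _id: "e", lastModifiedDate: 100 },
];

const seen = [];
let last = null;
for (;;) {
  const page = nextPage(docs, last, 2);
  if (page.length === 0) break;
  seen.push(...page.map((d) => d._id));
  last = page[page.length - 1]; // (loid, lmd) for the next query
}
// seen is ["a", "b", "c", "d", "e"]: every document exactly once,
// even though three of them share lastModifiedDate 200.
```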
ObjectIds should be good enough for pagination if you limit your queries to the previous second (or don't care about the subsecond possibility of weirdness). If that is not good enough for your needs then you will need to implement an ID generation system that works like an auto-increment.
Update:
To query the previous second of ObjectIds you will need to construct an ObjectID manually.
See the specification of ObjectId http://docs.mongodb.org/manual/reference/object-id/
Try using this expression to do it from a mongos.
{ _id :
{
$lt : ObjectId(Math.floor((new Date).getTime()/1000 - 1).toString(16)+"ffffffffffffffff")
}
}
The f's at the end max out the 8 bytes that follow the 4-byte timestamp (the part of the ObjectId not associated with time), since you are doing a less-than query.
I recommend doing the actual ObjectId construction on your application server rather than on the mongos, since this kind of calculation can slow things down if you have many users.
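On the application server, the string arithmetic can be factored into a plain function, which also makes it easy to test. This sketch builds the same 24-character hex string as the expression above; upperBoundObjectIdHex is a hypothetical name, and padStart is added as a safeguard so the timestamp part is always 8 hex digits. You would pass the result to ObjectId(...) in the shell or to new ObjectId(...) in the Node.js driver.

```javascript
// Builds "<seconds since epoch one second before `date`, 8 hex digits>"
// followed by 16 'f's. Apart from that exact all-'f' value, every ObjectId
// whose timestamp is at most one second before `date` compares lower.
function upperBoundObjectIdHex(date = new Date()) {
  const seconds = Math.floor(date.getTime() / 1000 - 1);
  return seconds.toString(16).padStart(8, "0") + "ffffffffffffffff";
}

const hex = upperBoundObjectIdHex(new Date(1500000001000));
// hex is "59682f00ffffffffffffffff" (0x59682f00 === 1500000000)
```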
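I have built pagination using MongoDB's _id this way:
import { ObjectId } from "mongodb";
// prev, next and the '_idValue' placeholders come from the caller
let sortOrder = -1;
const query = { title: 'findTitle' };
if (prev) {
  // previous page: ids greater than the first _id on the current page
  sortOrder = 1;
  query._id = { $gt: new ObjectId('_idValue') };
}
if (next) {
  // next page: ids smaller than the last _id on the current page
  sortOrder = -1;
  query._id = { $lt: new ObjectId('_idValue') };
}
// find() takes a single filter object, not an array;
// when paging back (sortOrder 1), reverse the results before displaying them
db.collection.find(query).limit(10).sort({ _id: sortOrder });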
