I have a lastTracks collection which I intend to keep updated with new track data.
I try to do that using this piece of code with the MongoDB driver for Node.js:
db.collection('lastTracks').updateOne(
  {
    'track.deviceID': lastPacket.deviceID,
    timestampZero: { $lte: lastPacket.timestamp }
  },
  {
    $set: { timestampZero: lastPacket.timestamp },
    $setOnInsert: { 'track.deviceID': lastPacket.deviceID }
  },
  { upsert: true }
);
What I want to do is update timestampZero if there is a document for this deviceID in the collection and its timestampZero is lower than or equal to the packet's timestamp; and if there is no such document, insert it.
The problem is that whenever this piece of code runs, a new document gets inserted and no update happens on the existing document with the given deviceID. And even if I create an index on 'track.deviceID', a duplicate key error happens.
UPDATE:
I found that the reason a new document gets inserted every time is that there is a document with the same deviceID but a greater timestamp in the collection, so my update filter matches nothing and Mongo inserts a new document (as I asked for with upsert: true).
But what I want is that if the stored timestamp is greater than lastPacket.timestamp, neither an update nor an insert happens.
With a little searching I found that my solution could simply be to create a unique index on 'track.deviceID' and then ignore the duplicate key error,
or I could do a two-phase commit: https://docs.mongodb.com/manual/tutorial/perform-two-phase-commits/
The first solution is simple and works well but just doesn't feel right, and the second solution is a little complicated to both understand and implement, so my question is: is there any better way to do this?
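For reference, a rough sketch of what that first approach could look like with the Node.js driver (11000 is MongoDB's duplicate key error code; everything else is taken from the code above):

// Create the unique index once (e.g. at startup), then ignore duplicate key errors on upsert.
await db.collection('lastTracks').createIndex({ 'track.deviceID': 1 }, { unique: true });
try {
  await db.collection('lastTracks').updateOne(
    { 'track.deviceID': lastPacket.deviceID, timestampZero: { $lte: lastPacket.timestamp } },
    {
      $set: { timestampZero: lastPacket.timestamp },
      $setOnInsert: { 'track.deviceID': lastPacket.deviceID }
    },
    { upsert: true }
  );
} catch (err) {
  if (err.code !== 11000) throw err; // 11000: a document with this deviceID and a newer timestamp already exists
}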
Related
I am working on a project where we create a bunch of entries in Firestore, via a Firestore Cloud Function, based on results from an API endpoint we do not control. The API endpoint returns IDs, which we use for the document IDs, but it does not include any timestamp information. Since we want to include a createdDate in our documents, we are using admin.firestore.Timestamp.now() to set the timestamp of the document.
On subsequent runs of the function, some of the documents will already exist, so if we use batch.commit with create, it will fail because some of the documents exist. However, if we use batch.commit with update, we will either not be able to include a timestamp, or the existing timestamp will be overwritten. As a final requirement, we also update these documents from a web application and set some properties like a state, so we can't limit the permissions on the documents to disallow update completely.
What would be the best way to achieve this?
I am currently using .create and have removed the batch, but I feel like this is less performant, and I occasionally do get the error Error: 4 DEADLINE_EXCEEDED on the firestore function.
First prize would be a batch that can create or update the documents, but does not edit the createdDate field. I'm also hoping to avoid reading the documents first to save a read, but I'd be happy to add it in if it's the best solution.
Thanks!
Current code is something like this:
const createDocPromise = docRef
  .create(newDoc)
  .then(() => {
    // success, do nothing
  })
  .catch(err => {
    if (
      err.details &&
      err.details.includes('Document already exists')
    ) {
      // doc already exists, ignore error
    } else {
      console.error(`Error creating doc`, err);
    }
  });
This might not be possible with batched writes, as set() will overwrite the existing document, update() will overwrite the timestamp, and create() will throw an error, as you've mentioned. One workaround would be to use create() for each document with Promise.allSettled(), which, unlike Promise.all(), won't reject when one of the promises fails.
const results = [] // results from the API
const promises = results.map((r) => db.doc(`col/${r.id}`).create(r));
const newDocs = await Promise.allSettled(promises)
// either "fulfilled" or "rejected"
newDocs.forEach((result) => console.log(result.status))
If any document already exists, create() will throw an error and the status for that promise will be "rejected". This way you won't have to read the documents in the first place.
Alternatively, you could store all the IDs in a single document or RTDB and filter out duplicates (this should only cost 1 read per invocation) and then add the data.
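A rough sketch of that alternative, under the assumption that a single tracker document (here meta/processedIds, a made-up path) keeps a knownIds array of everything created so far:

// One read for the tracker doc, then create only the IDs we haven't seen yet.
const trackerRef = db.doc('meta/processedIds');
const trackerSnap = await trackerRef.get();
const knownIds = trackerSnap.exists ? trackerSnap.data().knownIds : [];

const fresh = results.filter((r) => !knownIds.includes(r.id));
if (fresh.length > 0) {
  const batch = db.batch();
  fresh.forEach((r) =>
    batch.create(db.doc(`col/${r.id}`), { ...r, createdDate: admin.firestore.Timestamp.now() })
  );
  batch.set(trackerRef,
    { knownIds: admin.firestore.FieldValue.arrayUnion(...fresh.map((r) => r.id)) },
    { merge: true }
  );
  await batch.commit();
}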
Since you prefer to keep the batch and want to avoid reading the documents, a possible solution would be to store the timestamps in a field of type Array. That way you don't overwrite the createdDate field but instead save all the values corresponding to the different writes.
When you read one of the documents, you sort this array and take the oldest value: that is the very first timestamp that was saved and corresponds to the document's creation.
With this approach you don't need any extra writes or extra reads.
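A sketch of how that could look in the batch, reusing the results array from above (createdDates is an assumed field name):

// Each run appends its own timestamp; the oldest element is the real creation date.
const batch = db.batch();
results.forEach((r) => {
  batch.set(db.doc(`col/${r.id}`), {
    ...r,
    createdDates: admin.firestore.FieldValue.arrayUnion(admin.firestore.Timestamp.now()),
  }, { merge: true });
});
await batch.commit();

// When reading, recover the creation date as the smallest timestamp in the array:
const snap = await db.doc(`col/${someId}`).get();
const createdDate = snap.data().createdDates
  .slice()
  .sort((a, b) => a.toMillis() - b.toMillis())[0];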
I have a collection
/userFeed
where I create/delete docs (representing users) when the current user starts following/unfollowing them.
...
  /userFeed (C)
    /some-followed-user (D)
      - date <timestamp>
      - interactions <number>
When the user likes a post, the interactions field will be updated. But... what if the user doesn't follow the post owner? Then I just need to skip the document update, without producing failures/errors.
const currentUserFeedRef = firestore
  .collection("feeds")
  .doc(currentUserId)
  .collection("userFeed")
  .doc(otherUserId);

const data = {
  totalInteractions: admin.firestore.FieldValue.increment(value),
};

const precondition = {
  exists: false, // I am trying weird things
};

if (batchOrTransaction) {
  return batchOrTransaction.update(
    currentUserFeedRef,
    data,
    precondition
  );
}
Is it possible to just "skip the update if the doc doesn't exist"?
Is it possible to just "skip the update if the doc doesn't exist"?
No, not in the way that you're explaining it. Firestore updates don't silently fail.
If you need to know if a document exists before updating it, you should simply read it first and check that it exists. You can do this very easily in a transaction, and you can be sure that the update won't fail due to the document being missing if you check it this way first using the transaction object.
In fact, what you are trying to do is illustrated as the very first example in the documentation.
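A minimal sketch of that read-then-update pattern in a transaction, reusing the refs and values from the question:

// Read inside the transaction; only update if the document exists.
await firestore.runTransaction(async (t) => {
  const snap = await t.get(currentUserFeedRef);
  if (!snap.exists) {
    return; // the user isn't followed: skip silently, no error raised
  }
  t.update(currentUserFeedRef, {
    totalInteractions: admin.firestore.FieldValue.increment(value),
  });
});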
My application tracks the movements of data throughout the system. When a movement is recorded, it is placed in a separate collection that determines whether the document is enroute, available, or out of service. I used $addToSet to place the _id, and $pullAll to make sure that when a doc is moved from enroute to available it is not duplicated. But when the _id is moved to a new location entirely, I need to remove the old data from the old location and insert it into the new location. The insertion works, but I cannot figure out how to properly remove the data from the old location. This is all done within Meteor method calls and MongoDB.
if last.status is "Enroute"
  LastLocation.update locationId: last.locationId, partId: last.partId,
    $addToSet:
      enroutePurchaseIds: lastPurchaseId
    $pullAll:
      availiblePurchaseIds: lastPurchaseId
      outOfServicePurchaseIds: lastPurchaseId
Update
You can use the $merge stage from the upcoming 4.4 version, which allows updating the same collection the aggregation is running on. Pass in the old location and the new location:
db.collection.aggregate([
  {"$match":{"location":{"$in":[oldLocation,newLocation]}}},
  {"$addFields":{"sortOrder":{"$indexOfArray":[[oldLocation,newLocation],"$location"]}}},
  {"$sort":{"sortOrder":1}},
  {"$group":{
    "_id":null,
    "oldLocationDoc":{"$first":"$$ROOT"},
    "newLocationDoc":{"$last":"$$ROOT"}
  }},
  {"$addFields":{
    "oldLocationDoc.old":{
      "$filter":{
        "input":"$oldLocationDoc.old",
        "cond":{"$ne":["$$this",oldLocation]}
      }
    },
    "newLocationDoc.new":{"$concatArrays":["$newLocationDoc.new",[newLocation]]}
  }},
  {"$project":{"locations":["$oldLocationDoc","$newLocationDoc"]}},
  {"$unwind":"$locations"},
  {"$replaceRoot":{"newRoot":"$locations"}},
  {"$merge":{
    "into":{"db":"db","coll":"collection"},
    "on":"_id",
    "whenMatched":"merge",
    "whenNotMatched":"fail"
  }}
])
Original
It is not possible to move an array/field value from one document to another in a single update operation.
You would want to use transactions to perform multi-document updates in an atomic way. This requires a replica set.
var session = db.getMongo().startSession();
var collection = session.getDatabase('test').getCollection('collection');
session.startTransaction({readConcern: {level:'snapshot'},writeConcern: {w:'majority'}});
collection.update({location:oldLocation},{$pull:{availiblePurchaseIds:lastPurchaseId}});
collection.update({location:newLocation},{$push:{enroutePurchaseIds:lastPurchaseId}});
session.commitTransaction()
session.endSession()
Another option would be to perform bulk updates, in the case of a standalone mongod instance.
var bulk = db.getCollection('collection').initializeUnorderedBulkOp();
bulk.find({location:oldLocation}).updateOne({$pull:{availiblePurchaseIds:lastPurchaseId}});
bulk.find({location:newLocation}).updateOne({$push:{enroutePurchaseIds:lastPurchaseId}});
bulk.execute();
Are you moving the entire document from one collection to another or just moving the document's id? I can't help much with CoffeeScript, but if you're looking to move entire documents you might find the following thread helpful.
mongodb move documents from one collection to another collection
I'm using nedb and I'm trying to update an existing record by matching its ID and changing a title property.
What happens is that a new record gets created, and the old one is still there.
I've tried several combinations, and tried googling for it, but the search results are scarce.
var Datastore = require('nedb');
var db = {
files: new Datastore({ filename: './db/files.db', autoload: true })
};
db.files.update(
{_id: id},
{$set: {title: title}},
{},
callback
);
What's even crazier: when performing a delete, a new record gets added again, but this time the record has a weird property:
{"$$deleted":true,"_id":"WFZaMYRx51UzxBs7"}
This is the code that I'm using:
db.files.remove({_id: id}, callback);
The nedb docs say the following:
localStorage has size constraints, so it's probably a good idea to set
recurring compaction every 2-5 minutes to save on space if your client
app needs a lot of updates and deletes. See database compaction for
more details on the append-only format used by NeDB.
Compacting the database
Under the hood, NeDB's persistence uses an append-only format, meaning
that all updates and deletes actually result in lines added at the end
of the datafile. The reason for this is that disk space is very cheap
and appends are much faster than rewrites since they don't do a seek.
The database is automatically compacted (i.e. put back in the
one-line-per-document format) everytime your application restarts.
You can manually call the compaction function with
yourDatabase.persistence.compactDatafile which takes no argument. It
queues a compaction of the datafile in the executor, to be executed
sequentially after all pending operations.
You can also set automatic compaction at regular intervals with
yourDatabase.persistence.setAutocompactionInterval(interval), interval
in milliseconds (a minimum of 5s is enforced), and stop automatic
compaction with yourDatabase.persistence.stopAutocompaction().
Keep in mind that compaction takes a bit of time (not too much: 130ms
for 50k records on my slow machine) and no other operation can happen
when it does, so most projects actually don't need to use it.
I haven't used this myself, but it seems it uses localStorage in the browser and an append-only format for the update and delete methods.
Looking into its source code, the persistence tests check for the $$deleted key, and they mention: "If a doc contains $$deleted: true, that means we need to remove it from the data".
So, in my opinion, you can try compacting the database manually, or the second approach from your question may be useful.
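For example, using the compaction API quoted above on the files datastore (the interval value is arbitrary):

// Compact the datafile once now, then automatically every 5 minutes.
db.files.persistence.compactDatafile();
db.files.persistence.setAutocompactionInterval(5 * 60 * 1000);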
I have a couple of ideas for stopping duplicate handling of messages from Amazon's SQS queues. The app will also have a MongoDB server, which I think can be an effective part of either strategy:
1. Store queue items in Mongo, with a 'status' field defaulting to Pending. Then use SQS to queue the ID of the new message. One of the worker processes will get the ID, then do a findAndModify on the actual item in Mongo to set the status to Processing, unless it's already being processed, in which case it will flag that up (see the sketch after this list).
2. Store queue items in the queue itself. Workers pick up items from the queue, then attempt an insert into Mongo with the item ID and some other info. If the item already existed, don't do the insert and don't continue, since it's a dupe.
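A minimal sketch of the findAndModify claim step in idea 1, assuming a hypothetical queueItems collection and a Node driver version whose findOneAndUpdate result wraps the matched document in { value }:

// Atomically flip Pending -> Processing; only one worker can win the claim.
const res = await db.collection('queueItems').findOneAndUpdate(
  { _id: messageId, status: 'Pending' },   // match only if nobody has claimed it yet
  { $set: { status: 'Processing' } }
);
if (!res.value) {
  // no match: the item is already being processed (or unknown), so flag it up / skip
}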
The problems and questions I have:
Solution 1 seems counter-intuitive: why use SQS at all? I think it's because polling SQS is more correct than a whole load of worker processes polling Mongo for work.
Solution 2 I don't know how to implement. Is there an atomic find-and-insert-if-doesn't-exist? A simple get-or-insert-but-tell-me-which-occurred operation would do the trick.
Will any of these work in a large scale scenario, and/or is there a proven method that I haven't grasped?
...Hmm, I just wrote the question above, then had a thought for a get-or-insert-but-tell-me-which-occurred operation (in JS pseudocode):
var thingy = getrandomnumber();
findAndModify({
  new: false,
  upsert: true,
  query: { id: item_id },
  update: { thingy: thingy },
  fields: { thingy: 1 }
});
If the item exists (and this is a conflict), then since new is false, the old document will be returned.
If the item didn't exist, new is false, so an empty document {} would be returned.
So either we got {}, indicating it resulted in an insert, or an actual document, indicating it was a get, and that ID already exists... all atomic. The thingy is in there because I don't know if MongoDB actually needs data there, I guess it would? If I used $inc on a duplicates field instead, would that work with an upsert? Then we could get stats on dupes later.
Is that right, maybe that would work?
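A hedged sketch of that idea with the Node driver's findOneAndUpdate, using the $inc-on-a-duplicates-field variant mentioned above (the collection name is made up, and the result shape assumes a driver version that returns { value, lastErrorObject }):

// Upsert on the item ID; lastErrorObject tells us whether it was an insert or a match.
const result = await db.collection('processedItems').findOneAndUpdate(
  { _id: item_id },
  { $inc: { duplicates: 1 } },   // 1 on first sight, incremented on every later sighting
  { upsert: true }
);
if (result.lastErrorObject.updatedExisting) {
  // the ID was already there: it's a dupe, skip it (duplicates now counts the repeats)
} else {
  // a brand-new document was inserted: first time seeing this ID, safe to process
}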