Performing Data Cleanup in MongoDB - javascript

My application tracks the movements of data throughout the system. When a movement is recorded, it is placed in a separate collection that determines whether the document is enroute, available, or out of service. I used $addToSet to place the _id, and $pullAll to make sure that when a doc is moved from enroute to available, it is not duplicated. But when the _id is moved to a new location entirely, I need to remove the old data from the old location and insert it into the new location. The insertion works, but I cannot figure out how to properly remove the data from the old location. This is all done within Meteor method calls and MongoDB.
if last.status is "Enroute"
  LastLocation.update { locationId: last.locationId, partId: last.partId },
    $addToSet:
      enroutePurchaseIds: lastPurchaseId
    $pullAll:
      # $pullAll expects arrays of values to remove
      availiblePurchaseIds: [lastPurchaseId]
      outOfServicePurchaseIds: [lastPurchaseId]

Update
You can run a $merge pipeline in the upcoming 4.4 version, which allows writing the aggregation output back into the same collection the aggregation is running on. Pass the old location and the new location in as an array:
db.collection.aggregate([
  {"$match": {"location": {"$in": [oldLocation, newLocation]}}},
  {"$addFields": {"sortOrder": {"$indexOfArray": [[oldLocation, newLocation], "$location"]}}},
  {"$sort": {"sortOrder": 1}},
  {"$group": {
    "_id": null,
    "oldLocationDoc": {"$first": "$$ROOT"},
    "newLocationDoc": {"$last": "$$ROOT"}
  }},
  {"$addFields": {
    "oldLocationDoc.old": {
      "$filter": {
        "input": "$oldLocationDoc.old",
        "cond": {"$ne": ["$$this", oldLocation]}
      }
    },
    "newLocationDoc.new": {"$concatArrays": ["$newLocationDoc.new", [newLocation]]}
  }},
  {"$project": {"locations": ["$oldLocationDoc", "$newLocationDoc"]}},
  {"$unwind": "$locations"},
  {"$replaceRoot": {"newRoot": "$locations"}},
  {"$merge": {
    "into": {"db": "db", "coll": "collection"},
    "on": "_id",
    "whenMatched": "merge",
    "whenNotMatched": "fail"
  }}
])
Original
It is not possible to move an array/field value from one document to another in a single update operation.
You would want to use transactions to perform multi-document updates in an atomic way. This requires a replica set.
var session = db.getMongo().startSession();
var collection = session.getDatabase('test').getCollection('collection');
session.startTransaction({readConcern: {level: 'snapshot'}, writeConcern: {w: 'majority'}});
// pull the id from the old location's array and push it onto the new location's array
collection.update({location: oldLocation}, {$pull: {availiblePurchaseIds: lastPurchaseId}});
collection.update({location: newLocation}, {$push: {enroutePurchaseIds: lastPurchaseId}});
session.commitTransaction();
session.endSession();
Another option would be to perform bulk updates in the case of a standalone mongod instance (note that these are not atomic across the two documents).
var bulk = db.getCollection('collection').initializeUnorderedBulkOp();
bulk.find({location:oldLocation}).updateOne({$pull:{availiblePurchaseIds:lastPurchaseId}});
bulk.find({location:newLocation}).updateOne({$push:{enroutePurchaseIds:lastPurchaseId}});
bulk.execute();

Are you moving the entire document from one collection to another, or just moving the document's _id? I can't help much with CoffeeScript, but if you're looking to move entire documents you might find the following thread helpful.
mongodb move documents from one collection to another collection
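If it is the whole document being moved, the usual shape of that operation (a rough sketch, not taken from the linked thread; the collection names and purchaseId are placeholders) is to read the document, insert it into the target collection, and only then remove it from the source:
// Mongo shell sketch: move one document between two collections.
var doc = db.oldLocationColl.findOne({ _id: purchaseId });
if (doc) {
  db.newLocationColl.insertOne(doc);              // copy into the new collection first
  db.oldLocationColl.deleteOne({ _id: doc._id }); // then remove the original
}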

Related

Eliminating documents from Firebase Realtime database (Not the entire collection)

I have deployed an app with React and I am using the Firebase Realtime Database to store some info about attention tickets in a call center. The database will store approximately 80 tickets' worth of info per day, but this is cumulative. I want to avoid that so I don't reach the Firebase storage limit.
My idea so far is to delete the tickets from the database every day at noon, so it will only hold the data from the current date.
I am using the remove() function from Firebase, but when I tried referencing the collection, it was entirely deleted. I just want to delete the documents, not the entire collection.
Is there a way to tell Firebase to delete docs only, maybe to delete every doc except one?
This is the code that I intend to use for deleting (based on JS):
function deletedata() {
  const dbRef = query(ref(db, '/mycollectionpath'));
  onValue(dbRef, (snapshot) => {
    snapshot.forEach((childSnapshot) => {
      let keyName = childSnapshot.key;
      // Note: this removes the whole '/mycollectionpath' ref on every iteration,
      // not the individual child identified by keyName.
      remove(dbRef);
    });
  });
}
setInterval(deletedata, 1000);
The Firebase Realtime Database automatically creates parent nodes when you write a value under them, and it automatically deletes parent nodes when there are no longer any values under them. There is no way to have a node without a value.
If you want to keep the node, consider writing a dummy value. For example, this would remove all child nodes from mycollectionpath and replace them with a single true value:
// With the modular API used in the question, `set` is imported from 'firebase/database'
const dbRef = ref(db, '/mycollectionpath');
set(dbRef, true);
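If the goal is instead to delete each ticket individually (say, everything except one), a per-child remove based on the snapshot keys might look roughly like this. This is only a sketch using the modular API; keepKey is a hypothetical key to preserve:
import { ref, get, child, remove } from 'firebase/database';

async function deleteChildrenExcept(keepKey) {
  const collectionRef = ref(db, '/mycollectionpath');
  const snapshot = await get(collectionRef);
  const deletions = [];
  snapshot.forEach((childSnapshot) => {
    if (childSnapshot.key !== keepKey) {
      // Remove this child only, not the parent path.
      deletions.push(remove(child(collectionRef, childSnapshot.key)));
    }
  });
  await Promise.all(deletions);
}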

Difference between jsforce bulk api options

I'm using jsforce to access salesforce using the bulk api. It has two ways of updating and deleting records. One is using the normal bulk api which means creating a job and batches:
var job = conn.bulk.createJob("Account", "delete");
var batch = job.createBatch();
var accounts = getAccountsByDate(jsforce.Date.TODAY);
batch.execute(accounts);
batch.on('response', function(rets) {
  // do things
});
The other way is to use the "query" interface, like this:
conn.sobject('Account')
  .find({ CreatedDate: jsforce.Date.TODAY })
  .destroy(function(err, rets) {
    // do things
  });
The second way certainly seems easier, but I can't get it to update or delete more than 10,000 records at a time, which appears to be a Salesforce API limit on batch size. Note that using the maxFetch property from jsforce appears to have no effect in this case.
So is it safe to assume that the query-style interface only creates a single batch? The jsforce documentation is not clear on this point.
Currently the bulk.load() method in the JSforce bulk API generates a job with one batch, so the limit of 10,000 records per batch applies. This is also true when using the find-and-destroy interface, which uses bulk.load() internally.
To avoid this limit you can create a job with bulk.createJob() and create several batches with job.createBatch(), then dispatch the records to delete across these batches so that no single batch exceeds the limit, as sketched below.
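A rough sketch of that chunking (not from the original answer; it assumes conn is an authenticated jsforce Connection and accounts is the full array of records to delete):
var job = conn.bulk.createJob("Account", "delete");
var BATCH_LIMIT = 10000; // Salesforce bulk API limit per batch

for (var i = 0; i < accounts.length; i += BATCH_LIMIT) {
  var chunk = accounts.slice(i, i + BATCH_LIMIT);
  var batch = job.createBatch();
  batch.execute(chunk);
  batch.on('response', function(rets) {
    // inspect per-record results here
  });
  batch.on('error', function(err) {
    console.error(err);
  });
}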

MongoDB updateOne with upsert:true keeps inserting new document

I have a lastTracks collection which I intend to keep updated with new track data.
I am trying to do that using this piece of code with the MongoDB driver for Node.js:
db.collection('lastTracks').updateOne(
  {
    'track.deviceID': lastPacket.deviceID,
    timestampZero: { $lte: lastPacket.timestamp }
  },
  {
    $set: {
      timestampZero: lastPacket.timestamp,
    },
    $setOnInsert: { 'track.deviceID': lastPacket.deviceID }
  },
  { upsert: true }
);
What I want to do is update timestampZero if there is a document for this deviceID in the collection and its timestampZero is lower than or equal to the packet's timestamp; and if there is no such document, insert it.
The problem is that whenever this piece of code runs, a new document gets inserted and no update happens on the existing document with the given deviceID. And even if I create a unique index on 'track.deviceID', a duplicate key error happens.
UPDATE:
I found that the reason a new document gets inserted every time is that there is a document with the same deviceID but a greater timestamp in the collection, so my update condition matches nothing and Mongo then tries to insert a new document (as I asked for with upsert: true).
But what I want is that if the stored timestamp is greater than lastPacket.timestamp, neither an update nor an insert should happen.
With a little searching I found that my solution could simply be to create a unique index on 'track.deviceID' and then ignore the duplicate key error: https://github.com/winstonjs/winston#logging-levels
or I can do a two-phase commit: https://docs.mongodb.com/manual/tutorial/perform-two-phase-commits/
The first solution is simple and works well but just doesn't feel right, and the second solution is rather complicated both to understand and to implement, so my question is: is there any better way to do this?
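For reference, swallowing the duplicate key error with the Node.js driver could look roughly like this (only a sketch, and it assumes the unique index on 'track.deviceID' already exists):
db.collection('lastTracks').updateOne(
  {
    'track.deviceID': lastPacket.deviceID,
    timestampZero: { $lte: lastPacket.timestamp }
  },
  {
    $set: { timestampZero: lastPacket.timestamp },
    $setOnInsert: { 'track.deviceID': lastPacket.deviceID }
  },
  { upsert: true }
).catch((err) => {
  // 11000 = duplicate key: a document for this deviceID already exists with a
  // greater timestamp, so neither the update nor the insert should happen.
  if (err.code !== 11000) throw err;
});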

PouchDB delete document upon successful replication to CouchDB

I am trying to implement a process as described below:
Create a sale_transaction document in the device.
Put the sale_transaction document in Pouch.
Since there's a live replication between Pouch & Couch, let the sale_transaction document flow to Couch.
Upon successful replication of the sale_transaction document to Couch, delete the document in Pouch.
Don't let the deleted sale_transaction document in Pouch flow back to Couch.
Currently, I have implemented a two-way sync between both databases, where I'm filtering each document that is coming from Couch to Pouch, and vice versa.
For the replication from Couch to Pouch, I didn't want to let sale_transaction documents go through, since I can just get these documents from Couch.
PouchDb.replicate(remoteDb, localDb, {
  // Replicate from Couch to Pouch
  live: true,
  retry: true,
  filter: (doc) => {
    return doc.doc_type !== "sale_transaction";
  }
})
While for the replication from Pouch to Couch, I put in a filter so as not to let deleted sale_transaction documents go through.
PouchDb.replicate(localDb, remoteDb, {
  // Replicate from Pouch to Couch
  live: true,
  retry: true,
  filter: (doc) => {
    if (doc.doc_type === "sale_transaction" && doc._deleted) {
      // These are deleted transactions which I don't want to replicate to Couch
      return false;
    }
    return true;
  }
}).on("change", (change) => {
  // Handle change
  replicateOutChangeHandler(change);
});
I also implemented a change handler to delete the sale_transaction documents in Pouch after they have been written to Couch.
function replicateOutChangeHandler(change) {
  for (let doc of change.docs) {
    if (doc.doc_type === "sale_transaction" && !doc._deleted) {
      localDb.upsert(doc._id, function(prevDoc) {
        if (!prevDoc._deleted) {
          prevDoc._deleted = true;
        }
        return prevDoc;
      }).then((res) => {
        console.log("Deleted Document After Replication", res);
      }).catch((err) => {
        console.error("Deleted Document After Replication (ERROR): ", err);
      });
    }
  }
}
The flow of the data seems to work at first, but when I get the sale_transaction document from Couch and do some editing, I then have to repeat the process of writing the document in Pouch, letting it flow to Couch, and deleting it in Pouch. However, after some editing of the same document, the document in Couch also ends up deleted.
I am fairly new to Pouch & Couch, and to NoSQL in general, and was wondering if I'm doing something wrong in the process.
For a situation like the one you've described above, I'd suggest tweaking your approach as follows:
Create a PouchDB database as a replication target from CouchDB, but treat this database as a read-only mirror of the CouchDB database, applying whatever transforms you need in order to strip certain document types from the local store. For the sake of this example, let's call this database mirror. The mirror database only gets updated one-way, from the canonical CouchDB database via transform replication.
Create a separate PouchDB database to store all your sales transactions. For the sake of this example, let's call this database user-data.
When the user creates a new sale transaction, this document is written to user-data. Listen for changes on user-data, and when a document is created, use the change handler to create and write the document directly to CouchDB.
At this point, CouchDB is receiving sales transactions from user-data, but your transform replication is preventing them from polluting mirror. You could leave it at that, in which case user-data will have local copies of all sales transactions. On logout, you can just delete the user-data database. Alternatively, you could add some more complex logic in the change handler to delete the document once CouchDB has received it.
If you really wanted to get fancy, you could do something even more elaborate. Leave the sales transactions in user-data after they are written to CouchDB, and in your transform replication from CouchDB to mirror, look for these newly created sales transaction documents. Instead of removing them, just strip them of everything but their _id and _rev fields, and use these as 'receipts'. When one of these IDs matches an ID in user-data, that document can be safely deleted.
Whichever method you choose, I suggest you think about your local PouchDB's _changes feed as a worker queue, instead of putting all of this elaborate logic in replication filters. The methods above should all survive offline cases without introducing conflicts, and recover nicely when connectivity is restored. I'd recommend the last solution, though it might be a bit more work than the others. Hope this helps.
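To make the worker-queue idea above concrete, a minimal sketch (my own, not from the answer) of watching user-data's _changes feed, pushing each new sale_transaction to CouchDB, and deleting the local copy once the remote write succeeds might look like this, where userData is the local PouchDB instance and remoteDb points at CouchDB:
userData.changes({ live: true, since: 'now', include_docs: true })
  .on('change', (change) => {
    const doc = change.doc;
    if (doc._deleted || doc.doc_type !== 'sale_transaction') return;
    // Write the doc to CouchDB keeping its local revision (as replication would).
    remoteDb.bulkDocs([doc], { new_edits: false })
      .then(() => {
        // Remote write confirmed; remove the local copy.
        return userData.remove(doc);
      })
      .catch((err) => {
        console.error('Failed to push sale_transaction', err);
      });
  });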
Maybe add an additional field to mark the record for deletion. Then have a periodic routine running on both Pouch and Couch that scans for records marked for deletion and deletes them.

What do remove() and save() mean in MongoDB Node.js when initializing a database

I am new to Node.js, and I am reading the code of one app. The code below initializes the db by loading some questions into the survey system. I can't understand what remove() and save() mean here, because I can't find any explanation of these two methods. It seems mongoose isn't used after being connected. Could anyone explain the usage of these methods?
Well, this is my understanding of this code; I'm not sure it is correct. My TA tells me it should be run before server.js.
/**
 * This is a utility script for dropping the questions table, and then
 * re-populating it with new questions.
 */
// connect to the database
var mongoose = require('mongoose');
var configDB = require('./config/database.js');
mongoose.connect(configDB.url);
// load the schema for entries in the 'questions' table
var Question = require('./app/models/questions');
// here are the questions we'll load into the database. Field names don't
// quite match with the schema, but we'll be OK.
var questionlist = [
  /*some question*/
];
// drop all data, and if the drop succeeds, run the insertion callback
Question.remove({}, function(err) {
  // count successful inserts, and exit the process once the last insertion
  // finishes (or any insertion fails)
  var received = 0;
  for (var i = 0; i < questionlist.length; ++i) {
    var q = new Question();
    /*some detail about defining q neglected*/
    q.save(function(err, q) {
      if (err) {
        console.error(err);
        process.exit();
      }
      received++;
      if (received == questionlist.length)
        process.exit();
    });
  }
});
To add some additional detail, mongoose is all based on using schemas and working with those to manipulate your data. In a mongodb database, you have collections, and each collection holds different kinds of data. When you're using mongoose, what's happening behind the scenes is that every different Schema you work with maps to a mongodb collection. So when you're working with the Question Schema in mongoose land, there's really some Question collection behind the scenes in the actual db that you're working with. You might also have a Users Schema, which would act as an abstraction for some Users collection in the db, or maybe you could have a Products Schema, which again would map to some collection of products behind the scenes in the actual db.
As mentioned previously, when calling remove({}, callback) on the Questions Schema, you're telling mongoose to go find the Questions collection in the db and remove all entries, or documents as they're called in mongodb, that match a certain criteria. You specify that criteria in the object literal that is passed in as the first argument. So if the Questions Schema has some boolean field called correct and you wanted to delete all of the incorrect questions, you could say Question.remove({ correct: false }, callback). Also as mentioned previously, when passing an empty object to remove, you're telling mongoose to remove ALL documents in the Schema, or collection rather. If you're not familiar with callback functions, pretty much the callback function says, "hey after you finish this async operation, go ahead and do this."
The save() function that is used here is a little different than how save() is used in the official mongodb driver, which is one reason why I don't prefer mongoose. But to explain, pretty much all save is doing here is you're creating this new question, referred to by the q variable, and when you call save() on that question object, you're telling mongoose to take that object and insert it as a new document into your Questions collection behind the scenes. So save here just means insert into the db. If you were using the official mongo driver, it would be db.getCollection('collectionName').insert({/* Object representing new document to insert */}).
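As a concrete sketch of the contrast (the field names here are made up, not from the original schema):
// Mongoose: build a document from the schema-backed model, then persist it.
var q = new Question({ text: 'What is 2 + 2?', correct: true });
q.save(function(err, saved) {
  if (err) return console.error(err);
  console.log('Inserted question with _id', saved._id);
});

// Official driver equivalent: insert a plain object straight into the collection.
// db.collection('questions').insertOne({ text: 'What is 2 + 2?', correct: true });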
And yes your TA is correct. This code will need to run before your server.js file. Whatever your server code does, I assume it's going to connect to your database.
I would encourage you to look at the mongoose API documentation. Long term though, the official mongodb driver might be your best bet.
Mongoose basically maps your MongoDB queries to JavaScript objects using a schema.
remove() receives a selector and a callback function. An empty selector means that all Questions will be affected.
After that, a new Question object is created. I guess that you omitted some data being set on it. After that it's saved back into MongoDB.
You can read more about that in the official documentation:
http://mongoosejs.com/docs/api.html#types-subdocument-js
The remove query is used to remove all documents from the collection, and save is used to create a new document.
As per your code, it seems that every time the script runs it removes all the records from the Question collection and then saves new records for the questions from the question list.
