UPDATE:
This is the console.error(err) message I get:
Error: 6 ALREADY_EXISTS: Document already exists: projects/projectOne-e3999/databases/(default)/documents/storage/AAAAAAAAAA1111111111
I understand why I get the error: a transaction running in parallel was faster and executed the else part of the transaction first. So the transaction that returns the error should not create the document but update it. It would be enough if the transaction would re-run. But it does not, although the documentation states: Transactions are committed once 'updateFunction' resolves and attempted up to five times on failure.
And that is the whole point of the question: how can I prevent the transaction from failing when the document already exists because it was created in parallel, or how can I re-run the transaction?
Original Question:
I have a cloud function that updates a document whenever a document in another collection gets created. As multiple documents can be created in parallel, all of which access the same document, this runs as a transaction, and idempotence is ensured.
The structure of the function looks like the following. The problem is the very first write (the creation of the storage doc). As I've said, two documents can get created in parallel. The transaction reads a document which is not there and tries to create it, but because the other transaction (the one running in parallel) has already created the document, the transaction fails. (I do not understand this in the first place, as I thought a transaction blocks all access while executing.)
This would not be a problem in itself, but the transaction does not retry (although the documentation states it will retry up to 5 times automatically). I think this has something to do with my try/catch block for the async function.
If it retried, it would detect that the document exists and automatically update the existing one.
Let's put it simply: this transaction must never fail. A failure would leave the database inconsistent, with no automatic way to recover.
.onCreate(async (snapshot, context) => {
  // Handle idempotence
  const eventId = context.eventId;
  try {
    const process = await shouldProcess(eventId)
    if (!process) {
      return null
    }
    const storageDoc = admin.firestore().doc(`storage/${snapshot.id}`)
    await admin.firestore().runTransaction(async t => {
      const storageDbDoc = await storageDoc.get()
      const dataDb = storageDbDoc.data()
      if (dataDb) {
        // For sake of testing just rewrite it
        t.update(storageDoc, dataDb)
      } else {
        // Create new
        const storage = createStorageDoc()
        t.create(storageDoc, storage)
      }
    })
    return markProcessed(eventId)
  } catch (err) {
    console.error(`Execution of ${eventId} failed: ${err}`)
    throw new Error(`Transaction failed.`);
  }
})
Your transaction insists on being able to create a new document:
t.create(storageDoc, storage)
Since create() fails on the condition of the document already existing, the entire transaction simply fails, without the ability to recover.
If your transaction must be able to recover, you should check if that document exists prior to trying to write it, and decide what you want to do in that case.
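One detail worth noting in the question's code is that the document is read with storageDoc.get() rather than t.get(storageDoc), so the read does not take part in the transaction. A minimal sketch of a create-or-update that reads through the transaction is shown here. The function name upsertStorageDoc and the db parameter are assumptions for illustration (db would be something like admin.firestore()), and whether this removes the ALREADY_EXISTS failure in your setup is something to verify:

```javascript
// Sketch only: `db` is assumed to be a Firestore instance (e.g. admin.firestore()),
// `storageDoc` a DocumentReference, and `createStorageDoc` the question's helper
// that builds the initial document data.
async function upsertStorageDoc(db, storageDoc, createStorageDoc) {
  return db.runTransaction(async (t) => {
    // Read through the transaction so the document takes part in it.
    const snapshot = await t.get(storageDoc);
    if (snapshot.exists) {
      // Document is already there: update it (rewritten as-is, like the question does).
      t.update(storageDoc, snapshot.data());
      return 'updated';
    }
    // Document is missing: create it.
    t.create(storageDoc, createStorageDoc());
    return 'created';
  });
}
```

Inside the cloud function this would be called as await upsertStorageDoc(admin.firestore(), storageDoc, createStorageDoc). The returned string is only there to make the chosen branch visible in logs or tests.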
Related
Below is a summary of my code. My question is quite simple: I have already loaded the order object before all this code, but I am afraid the update to it will not be handled inside the transaction if I don't get it again inside it.
So my question is: do I have to retrieve my order object again inside the transaction?
try {
  await runTransaction(db, async (transaction) => {
    // First object to update
    const dailyCountSnap = await transaction.get(dailyCountRef);
    let dailyCount;
    if (dailyCountSnap.exists) {
      dailyCount = dailyCountSnap.data();
    } else {
      return Promise.reject(`ERROR: No daily count found`);
    }
    /* ... Code to update dailyCount ... */
    transaction.set(dailyCountRef, {...});
    // Second object to update
    const orderSnap = await transaction.get(orderRef);
    let order;
    if (orderSnap.exists) {
      order = orderSnap.data();
    } else {
      return Promise.reject(`ERROR: No order found`);
    }
    /* ... Code to update order ... */
    transaction.update(orderRef, {...});
  });
} catch (e) {
  functions.logger.error(e);
}
A transaction on Firestore only has context on data that was read within that transaction.
From the documentation on optimistic concurrency controls:
In the Mobile/Web SDKs, a transaction keeps track of all the documents you read inside the transaction.
and
In the server client libraries, transactions place locks on the documents they read.
So if you write a document that you didn't read before, the write operation could be overwriting data in the document and not meet the isolation guarantee of the transaction. The SDK might even raise an error when you try this, although I didn't check that.
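A minimal sketch of that advice follows: both documents are read through the transaction object, and only then are the writes queued (Firestore also requires all reads in a transaction to come before any write). The function name updateCountAndOrder, the injected runTransaction and db parameters, and the fields count and status are assumptions made for illustration, not part of the question:

```javascript
// Sketch only: `runTransaction` and `db` are passed in so the snippet does not
// depend on a particular SDK; `count` and `status` are hypothetical fields.
async function updateCountAndOrder(runTransaction, db, dailyCountRef, orderRef) {
  return runTransaction(db, async (transaction) => {
    // 1. Do every read first, through the transaction object.
    const dailyCountSnap = await transaction.get(dailyCountRef);
    const orderSnap = await transaction.get(orderRef);

    // data() returns undefined when the document does not exist.
    const dailyCount = dailyCountSnap.data();
    const order = orderSnap.data();
    if (dailyCount === undefined) throw new Error('No daily count found');
    if (order === undefined) throw new Error('No order found');

    // 2. Only then queue the writes, based on the data read above.
    transaction.set(dailyCountRef, { ...dailyCount, count: dailyCount.count + 1 });
    transaction.update(orderRef, { status: 'counted' });
  });
}
```

An order object loaded before the transaction can still be used for display or validation, but the values written back should come from the snapshot read inside the transaction.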
New to MongoDB, very new to Atlas. I'm trying to set up a trigger such that it reads all the data from a collection named Config. This is my attempt:
exports = function(changeEvent) {
  const mongodb = context.services.get("Cluster0");
  const db = mongodb.db("TestDB");
  var collection = db.collection("Config");
  config_docs = collection.find().toArray();
  console.log(JSON.stringify(config_docs));
}
The function is part of an automatically created Realm application called Triggers_RealmApp, which has Cluster0 as a named linked data source. When I go into Collections in Cluster0, TestDB.Config is one of the collections.
Some notes:
It's not throwing an error, but simply returning {}.
When I change context.services.get("Cluster0"); to something else, it throws an error.
When I change "TestDB" to a db that doesn't exist, or "Config" to a collection which doesn't exist, I get the same output: {}
I've tried creating new Realm apps, manually creating services, creating new databases and new collections, etc. I keep bumping into the same issue.
The Mongo docs reference promises and await, which I haven't seen in any examples. I tried experimenting with that a bit and got nowhere. From what I can tell, what I've already done is the typical way of doing it.
[Images: screenshots of the collection and of the linked data source]
I ended up taking it up with MongoDB directly. .find() is asynchronous and I was handling it incorrectly. Here is the reply straight from the horse's mouth:
As I understand it, you are not getting your expected results from the query you posted above. I know it can be confusing when you are just starting out with a new technology and can't get something to work!
The issue is that the collection.find() function is an asynchronous function. That means it sends out the request but does not wait for the reply before continuing. Instead, it returns a Promise, which is an object that describes the current status of the operation. Since a Promise really isn't an array, your statement collection.find().toArray() is returning an empty object. You write this empty object to the console.log and end your function, probably before the asynchronous call even returns with your data.
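The {} in the output can be reproduced without a database: a Promise has no enumerable own properties, so JSON.stringify serialises it as an empty object. A small sketch, where fakeToArray is a hypothetical stand-in for collection.find().toArray():

```javascript
// Hypothetical stand-in for collection.find().toArray():
// it returns a Promise that resolves to an array of documents.
function fakeToArray() {
  return Promise.resolve([{ name: 'a' }, { name: 'b' }]);
}

// Without await: the Promise object itself is serialised, giving "{}".
const withoutAwait = JSON.stringify(fakeToArray());

// With await: the resolved array is serialised.
async function withAwait() {
  return JSON.stringify(await fakeToArray());
}
```

This also explains why a non-existent database or collection name gave the same output: the string being logged never depended on the query result.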
There are a couple of ways to deal with this. The first is to make your function an async function and use the await operator to tell your function to wait for the collection.find() function to return before continuing.
exports = async function(changeEvent) {
  const mongodb = context.services.get("Cluster0");
  const db = mongodb.db("TestDB");
  var collection = db.collection("Config");
  // Wait for the query to finish before using the result
  const config_docs = await collection.find().toArray();
  console.log(JSON.stringify(config_docs));
};
Notice the async keyword on the first line, and the await keyword on the line that calls collection.find().
The second method is to use the .then function to process the results when they return:
exports = function(changeEvent) {
  const mongodb = context.services.get("Cluster0");
  const db = mongodb.db("TestDB");
  var collection = db.collection("Config");
  collection.find().toArray().then(config_docs => {
    console.log(JSON.stringify(config_docs));
  });
};
The connection has to be to the primary of the replica set, and the login credentials must belong to an admin-level user (one that has the cluster admin permission).
I am working on versioning changes for an application. I use a mongoose pre-hook to alter queries before they are processed, according to the versioning requirements. I came across a situation where I need to run a separate query to check whether another document exists, and if it does, I don't want to execute the current query, as shown below:
schema.pre('find', { document: false, query: true }, async function (next) {
  const query = this.getQuery();
  const doc = await model.find(query).exec();
  if (!doc) {
    const liveVersion = { ...query, version: "default" };
    this.setQuery(liveVersion);
  } else {
    return doc;
  }
});
In the above find pre-hook, I am trying to:
check whether the required doc exists in the DB using the find query, and return it if it does exist, and
if the document does not exist, execute the query with the default version set.
The problem is that mongoose executes the query set via this.setQuery no matter what, and the result it returns is also the one for that query, not the result of the other DB query (doc).
Is there a way to stop the default query execution in mongoose pre-hook?
Any help will be appreciated.
The only way to stop the execution of the subsequent action would be to throw an error, so you can throw a specific error in else, with your data in the property of the error object, something like:
else {
  let err = new Error();
  err.message = "not_an_error";
  err.data = doc;
  // Throwing is what actually stops the query from executing
  throw err;
}
But that would mean wrapping all your find calls in a try/catch and, in the catch, handling this specific error by extracting your data, or rethrowing if it's an actual error. In the end you'll have very ugly code and logic.
This is specific to the way you asked it, but normally you can just define another method, like findWithCheck(), and do the checks from the pre-hook above in this custom method.
Of course you could also try overriding the actual find(), but that would be overkill, and in this case it would pretty much break the whole thing, more for test purposes than for development.
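A minimal sketch of the findWithCheck() idea, written as a plain helper that receives the model. The signature is an assumption for illustration (it could just as well be attached as a static on the schema), and version: "default" is taken from the question. Note that find() resolves to an array, so the check is on its length rather than on truthiness:

```javascript
// Sketch only: `model` is assumed to be a Mongoose model (anything whose
// find(query).exec() resolves to an array of documents works here).
async function findWithCheck(model, query) {
  // First try the query exactly as requested.
  const docs = await model.find(query).exec();
  if (docs.length > 0) {
    return docs;
  }
  // Nothing found: fall back to the default version of the same query.
  return model.find({ ...query, version: 'default' }).exec();
}
```

Callers then use await findWithCheck(MyModel, { ... }) instead of MyModel.find(...), and no pre-hook or sentinel error is needed.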
What I would like to do is add custom properties to telemetry data as it leaves my application. Currently I am achieving this using a Telemetry Processor, however ideally I would like to read the value to be sent with the event from a database.
Is it possible to perform async operations inside a telemetry processor?
var TraceProcessor = function (app) {
  return function (envelope) {
    var i;
    var objTelemetryController = app.telemetryController;
    objTelemetryController.__proto__.getActiveTraces('GLOBAL', function (err, objTraces) {
      if (err) {
        // Error controller log error
        return;
      }
      if (objTraces) {
        for (i = 0; i < objTraces.length; i++) {
          envelope.data.baseData.properties['TraceProperty'] = objTraces[i];
        }
        return true;
      }
    });
  };
};
module.exports = TraceProcessor;
Using this code, the telemetry data is not sent, because Application Insights requires true to be returned from any telemetry processors that are in use. The callback does return true eventually, but not in time for the properties to be added.
I think it's better to use a TelemetryInitializer to enrich the telemetry data with extra information; the purpose of the TelemetryProcessor is skewed more towards filtering than data enrichment.
However, I think that if you try to call a SQL or HTTP dependency from within the Telemetry Initializer, it might go into an endless cycle:
Telemetry item is processed in the Initializer
Initializer starts a SQL query
AI detects the SQL query and starts processing a telemetry item about it
Telemetry Initializer calls into SQL...
I doubt that async is really supported here at the moment. It could have helped (e.g. return a task and wait for the value to fill in), but it would require an extensive investigation to consider all the cases.
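One workaround, which is a different technique from anything proposed above and is offered only as a sketch, is to keep the processor synchronous and have it read from a cache that is refreshed asynchronously elsewhere. The names createCachedTraceProcessor and loadTraces are hypothetical; loadTraces stands for whatever async call fetches the traces from the database:

```javascript
// Sketch only: `loadTraces` is a hypothetical async function that resolves to
// an array of trace values read from the database.
function createCachedTraceProcessor(loadTraces) {
  let cachedTraces = [];

  // Call this on startup and then periodically (e.g. from a timer).
  async function refresh() {
    try {
      cachedTraces = await loadTraces();
    } catch (err) {
      // Keep the previous cache if the refresh fails.
    }
  }

  // Synchronous processor: only reads the cache, always returns true
  // so the telemetry item is still sent.
  function processor(envelope) {
    const baseData = envelope.data.baseData;
    baseData.properties = baseData.properties || {};
    if (cachedTraces.length > 0) {
      // Like the question's loop, the last trace wins.
      baseData.properties['TraceProperty'] = cachedTraces[cachedTraces.length - 1];
    }
    return true;
  }

  return { processor, refresh };
}
```

The trade-off is that the property reflects the last refresh rather than the database state at the moment the event is sent. If the refresh itself generates dependency telemetry, that is harmless here because the processor never triggers a database call.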
I need to create several deployment scripts like data migration and fixtures for a MongoDB database and I couldn't find enough information about how to drop indexes using Mongoose API. This is pretty straight-forward when using the official MongoDB API:
To delete all indexes on the specified collection:
db.collection.dropIndexes();
However, I would like to use Mongoose for this and I tried to use executeDbCommand adapted from this post, but with no success:
mongoose.connection.db.executeDbCommand({ dropIndexes: collectionName, index: '*' },
function(err, result) { /* ... */ });
Should I use the official MongoDB API for Node.js, or did I just miss something in this approach?
To do this via the Mongoose model for the collection, you can call dropAllIndexes of the native collection:
MyModel.collection.dropAllIndexes(function (err, results) {
// Handle errors
});
Update
dropAllIndexes is deprecated in the 2.x version of the native driver, so dropIndexes should be used instead:
MyModel.collection.dropIndexes(function (err, results) {
// Handle errors
});
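More recent versions of the driver and of Mongoose favour promises over callbacks (check the version you are on), so the same call can be awaited. A minimal sketch for a deployment script, where dropIndexesFor is a hypothetical helper name and models is an array of Mongoose models:

```javascript
// Sketch only: `models` is assumed to be an array of Mongoose models.
// Drops all non-_id indexes of each model's collection, one after another.
async function dropIndexesFor(models) {
  const results = {};
  for (const model of models) {
    const name = model.collection.collectionName;
    // dropIndexes() on the native collection returns a promise here.
    results[name] = await model.collection.dropIndexes();
  }
  return results;
}
```

Running the collections sequentially keeps the script's log output easy to follow; errors propagate to the caller, which can decide whether to abort the migration.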
If you want to maintain your indexes in your schema definitions with mongoose (you probably do if you're using mongoose), you can easily drop ones not in use anymore and create indexes that don't exist yet. You can just run a one off await YourModel.syncIndexes() on any models that you need to sync. It will create ones in the background with .ensureIndexes and drop any that no longer exist in your schema definition. You can look at the full docs here:
https://mongoosejs.com/docs/api.html#model_Model.syncIndexes
It looks like you're attempting to drop all of the indexes on a given collection.
According to the MongoDB Docs, this is the correct command.
... I tried to use executeDbCommand adapted from this post, but with no success:
To really help here, we need more details:
What failed? How did you measure "no success"?
Can you confirm 100% that the command ran? Did you output to the logs in the callback? Did you check the err variable?
Where are you creating indexes? Can you confirm that you're not re-creating them after dropping?
Have you tried the command while listing specific index names? Honestly, you should not be using "*". You should be deleting and creating very specific indexes.
This might not be the best place to post this, but I think it's worth posting anyway.
I call model.syncIndexes() every time a model is defined/created against the db connection. This ensures the indexes are current and up-to-date with the schema. However, as has been highlighted online (example), this can create issues in distributed architectures, where multiple servers attempt the same operation at the same time. This is particularly relevant when using something like the cluster library to spawn master/slave instances on multiple cores of the same machine, since they often boot up within a short time of each other when the whole server is started.
In reference to the above 'codebarbarian' article, the issue is highlighted clearly when they state:
Mongoose does not call syncIndexes() for you, you're responsible for
calling syncIndexes() on your own. There are several reasons for this,
most notably that syncIndexes() doesn't do any sort of distributed
locking. If you have multiple servers that call syncIndexes() when
they start, you might get errors due to trying to drop an index that
no longer exists.
So what I do is create a function which uses redis and redis redlock to gain a lease for some nominal period of time, to prevent multiple workers (and indeed multiple workers on multiple servers) from attempting the same sync operation at the same time.
It also bypasses the whole thing unless it is the 'master' that is trying to perform the operation. I don't see any real point in delegating this job to any of the workers.
const cluster = require('cluster');
const {logger} = require("$/src/logger");
const {
  redlock,
  LockError
} = require("$/src/services/redis");
const mongoose = require('mongoose');
// Check is mongoose model,
// ref: https://stackoverflow.com/a/56815793/1834057
const isMongoModel = (obj) => {
  return obj.hasOwnProperty('schema') && obj.schema instanceof mongoose.Schema;
}
const syncIndexesWithRedlock = (model, duration = 60000) => new Promise(resolve => {
  // Ensure the cluster is master
  if (!cluster.isMaster)
    return resolve(false)
  // Now attempt to gain redlock and sync indexes
  try {
    // Typecheck
    if (!model || !isMongoModel(model))
      throw new Error('model argument is required and must be a mongoose model');
    if (isNaN(duration) || duration <= 0)
      throw new Error('duration argument is required, and must be positive numeric')
    // Extract name
    let name = model.collection.collectionName;
    // Define the redlock resource
    let resource = `syncIndexes/${name}`;
    // Coerce Duration to Integer
    // Not sure if this is strictly required, but wtf.
    // Will ensure the duration is at least 1ms, given that duration <= 0 throws error above
    let redlockLeaseDuration = Math.ceil(duration);
    // Attempt to gain lock and sync indexes
    redlock.lock(resource, redlockLeaseDuration)
      // Sync Indexes; returning the promise makes the chain wait for the sync
      // to finish and routes any syncIndexes() error to the catch below
      .then(() => model.syncIndexes())
      // Success
      .then(() => resolve(true))
      .catch(err => {
        // Report Lock Error
        if (err instanceof LockError) {
          logger.error(`Redlock LockError -- ${err.message}`);
        // Report Other Errors
        } else {
          logger.error(err.message);
        }
        // Fail, Either LockError error or some other error
        return resolve(false);
      })
  // General Fail for whatever reason
  } catch (err) {
    logger.error(err.message);
    return resolve(false);
  }
});
I won't go into setting up the Redis connection, as that is the subject of some other thread. The point of the above code is to show how you can use syncIndexes() reliably and prevent issues with one thread dropping an index while another tries to drop the same index, or other distributed issues with attempting to modify indexes concurrently.
To drop a particular index, you could use:
db.users.dropIndex("your_index_name_here")