In my Parse backend I have an array of unique number codes, and users must never receive the same code twice. For that reason I keep an index into this array in a column of one of my tables.
The operation itself is very simple: a user asks for a unique code, and a cloud function increments the current value of the index and returns the array element at the new index. The problem is that, at first glance, the Parse JS API only performs the increment atomically, not the read that follows, since increment doesn't return a promise resolving to the value that was set during THAT particular increment.
Now imagine the following scenario (pseudocode):
Field index has value 76, two users try to get the next code at the same time:
User1 -> increment('index') -> save -> then(obj1) -> return array[obj1.index]
User2 -> increment('index') -> save -> then(obj2) -> return array[obj2.index]
Now atomic increment will guarantee that after these 2 calls the index column has the value 78. But what about obj1 and obj2? If their values were not read atomically as part of the increment, but fetched after the increment was performed, then they might both be 78, and the whole uniqueness logic breaks.
Is there a way to get the atomic write operation result in Parse?
Increment does return the final value that was atomically incremented:
First, a unit test to show how it is used:
fit('increment', (done) => {
  new Parse.Object('Counter')
    .set('value', 1)
    .save()
    .then(result => {
      console.log('just saved', JSON.stringify(result));
      return result
        .increment('value')
        .save();
    })
    .then(result => {
      console.log('incremented', JSON.stringify(result));
      expect(result.get('value')).toBe(2);
      done();
    })
    .catch(done.fail);
});
Behind the scenes, here's what's happening (this is for the MongoDB adapter; there's a similar path for Postgres):
In the MongoAdapter, the increment turns into a Mongo $inc operation, and the updated document is returned.
The Mongo documentation explaining $inc includes: "$inc is an atomic operation within a single document."
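To see the same guarantee at the database layer, here is a minimal sketch straight against MongoDB (the collection and field names are my own assumptions, and the call shape is the 4.x/5.x Node driver API, not Parse's internal code): the $inc and the read of the new value happen in one atomic findOneAndUpdate call.

// Hedged sketch: an atomic increment-and-read against MongoDB itself.
const { MongoClient } = require('mongodb');

async function incrementAndRead(uri) {
  const client = await MongoClient.connect(uri);
  try {
    const counters = client.db('test').collection('Counter');
    const { value: doc } = await counters.findOneAndUpdate(
      { _id: 'index' },
      { $inc: { value: 1 } },
      { returnDocument: 'after', upsert: true } // hand back the document as it is after this $inc
    );
    return doc.value; // the value set by THIS increment, not by a later concurrent one
  } finally {
    await client.close();
  }
}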
Related
I'm trying to write a query to get the unique values of an attribute from the final merged collection (sm-Survey-merged). Something like:
select distinct(participantID) from sm-Survey-merged;
I get a tree-cache error with the below equivalent JS query. Can someone help me with a better query?
[...new Set (fn.collection("sm-Survey-merged").toArray().map(doc => doc.root.participantID.valueOf()).sort(), "unfiltered")]
If there are a lot of documents, and you attempt to read them all in a single query, then you run the risk of blowing out the Expanded Tree Cache. You can try bumping up that limit, but with a large database with a lot of documents you are still likely to hit that limit.
The fastest and most efficient way to produce a list of the unique values is to create a range index, and select the values from that lexicon with cts.values().
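For example, assuming a string range index (a JSON property range index) has been configured on participantID, a lexicon lookup scoped to the collection might look like this sketch:

// Sketch, assuming a string range index exists on the "participantID" JSON property.
cts.values(
  cts.jsonPropertyReference("participantID"),
  null,
  [],
  cts.collectionQuery("sm-Survey-merged")
).toArray();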
Without an index, you could attempt iterative queries that search and retrieve a set of random values, then perform additional searches excluding the values already seen. This still runs the risk of blowing out the Expanded Tree Cache, hitting timeouts, etc., so it may not be ideal, but it would let you get some information now without reindexing the data.
You could experiment with the number of iterations and the search page size to see whether that stays within limits and produces consistent results. Maybe add some logging or a flag so you know whether you hit the iteration limit while more values remained, i.e. whether the list is complete or not. You could also try running without an iteration limit, but then you risk Out of Memory or Expanded Tree Cache errors.
function distinctParticipantIDs(iterations, values) {
  const participantIDs = new Set([]);
  const docs = fn.subsequence(
    cts.search(
      cts.andNotQuery(
        cts.collectionQuery("sm-Survey-merged"),
        cts.jsonPropertyValueQuery("participantID", Array.from(values))
      ),
      ["unfiltered", "score-random"]),
    1, 1000);

  for (const doc of docs) {
    const participantID = doc.root.participantID.valueOf();
    participantIDs.add(participantID);
  }

  const uniqueParticipantIDs = new Set([...values, ...participantIDs]);

  if (iterations > 0 && participantIDs.size > 0) {
    // there are still new values, and we haven't hit our iteration limit, so keep searching
    return distinctParticipantIDs(iterations - 1, uniqueParticipantIDs);
  } else {
    return uniqueParticipantIDs;
  }
}

[...distinctParticipantIDs(100, new Set())];
Another option would be to run a CoRB job against the database, and apply the EXPORT-FILE-SORT option with ascending|distinct or descending|distinct, to dedup the values produced in an output file.
I'm trying to understand why, when I assign the results from an axios call to a variable, console logging that variable shows the complete object, yet logging its length returns zero.
As such, when I try to run a forEach on the results, there is no love to be had.
getNumberOfCollections() {
  let results = queries.getTable("Quality"); // imported function to grab an Airtable table.
  console.log(results); // full array, i.e. ['bing', 'bong', 'boom']
  console.log(results.length); // 0
  results.forEach((result) => {
    // no love
  });
}
It is quite likely that when you console.log the array, the array is still empty.
console.log(results); // full array, i.e. ['bing', 'bong', 'boom']
console.log(results.length); // 0
When console.log(results.length) is run, the length is evaluated immediately, so it is effectively console.log(0), and that's why 0 is printed out.
When console.log(results) is run, the console keeps a reference to the array and renders its contents lazily. By the time you expand or inspect it, the array has been populated, so it looks full (in other words, what you see is not a snapshot taken at the moment of the call).
You can try
console.log(JSON.stringify(results));
and you are likely to see an empty array, because JSON.stringify(results) evaluates the array immediately and turns it into a string at that moment, not later.
It looks like you are fetching some data. The correct way is usually to use a callback or a promise's fulfillment handler:
fetch(" some url here ")
.then(response => response.json())
.then(data => console.log(data));
so you won't have the data until the callback or the "fulfillment handler" is invoked. If you console.log(results.length) at that time, you should get the correct length. (and the data is there).
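Applied to the code in the question, a hedged sketch (it assumes queries.getTable returns, or can be made to return, a Promise of the rows, which I can't verify from the snippet) would be:

// Sketch assuming queries.getTable returns a Promise of the rows.
async function getNumberOfCollections() {
  const results = await queries.getTable("Quality"); // wait until the data has actually arrived
  console.log(results.length);                       // now reflects the real number of items
  results.forEach((result) => {
    // work with each result here
  });
  return results.length;
}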
I am currently parsing a list of js objects that are upserted to the db one by one, roughly like this with Node.js:
return promise.map(list, item =>
  parseItem(item)
    .then(upsertSingleItemToDB)
).then(() => console.log('all finished!'));
The problem is that when the list size grows very big (~3000 items), parsing all the items in parallel becomes too memory heavy. It was really easy to add a concurrency limit with the promise library and not run out of memory that way (when/guard).
But I'd like to optimize the db upserts as well, since MongoDB offers a bulkWrite function. Since parsing and bulk writing all the items at once is not possible, I would need to split the original object list into smaller sets that are parsed with promises in parallel, and then the result array of each set would be passed to the promisified bulkWrite. And this would be repeated for the remaining sets of list items.
I'm having a hard time wrapping my head around how I can structure the smaller sets of promises so that I only do one set of parseSomeItems-BulkUpsertThem at a time (something like Promise.all([set1Bulk, set2Bulk]), where set1Bulk is another array of parallel parser promises?). Any pseudocode help would be appreciated (but I'm using when if that makes a difference).
It can look something like this, if using mongoose and the underlying nodejs-mongodb-driver:

const saveParsedItems = items => ItemCollection.collection.bulkWrite( // accessing underlying driver
  items.map(item => ({
    updateOne: {
      filter: { id: item.id }, // or any compound key that makes your items unique for upsertion
      upsert: true,
      update: { $set: item } // should be a key:value formatted object
    }
  }))
);

const parseAndSaveItems = (items, offset = 0, limit = 3000) => { // the algorithm for retrieving items in batches can be anything you want, basically
  const itemSet = items.slice(offset, offset + limit); // take the next batch of `limit` items
  return Promise.all(
    itemSet.map(parseItem) // parsing all items of this batch in parallel
  )
    .then(saveParsedItems)
    .then(() => {
      const newOffset = offset + limit;
      if (items.length > newOffset) {
        return parseAndSaveItems(items, newOffset, limit); // recurse into the next batch
      }
      return true;
    });
};

return parseAndSaveItems(yourItems);
The first answer looks complete. However here are some other thoughts that came to mind.
As a hack-around, you could call a timeout function in the callback of your write operation before the next write operation runs. This gives your CPU and memory a break in between calls. Even if you add one millisecond between calls, that only adds 3 seconds in total for 3000 write objects.
Or you can segment your array of insertObjects and send each segment to its own bulk writer, roughly as sketched below.
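A minimal sketch of that segment-and-pause idea, assuming you already have a promisified bulk write; chunkSize, delayMs and saveChunk are placeholder names, not anything from the question:

// Hedged sketch: write the items in segments, pausing briefly between bulk writes.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

async function writeInSegments(items, saveChunk, chunkSize = 500, delayMs = 1) {
  for (let offset = 0; offset < items.length; offset += chunkSize) {
    const chunk = items.slice(offset, offset + chunkSize);
    await saveChunk(chunk); // e.g. your promisified bulkWrite for this segment
    await delay(delayMs);   // give CPU and memory a short break between writes
  }
}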
I'm using node.js for a project, and I have this certain structure in my code which is causing problems. I have an array dateArr of sequential dates that contains 106 items. I have an array resultArr to hold resulting data. My code structure is like this:
function grabData(value, index, dateArr) {
  cassandra client execute query with value from dateArr {
    if (!err) {
      if (result has more than 0 rows) {
        process the query data
        push to resultArr
      }
      if (result is empty) {
        push empty set to resultArr
      }
    }
  }
}

dateArr.forEach(grabData);
I logged the size of resultArr after each iteration and it appears that on some iterations nothing is being pushed to resultArr. The code completes with only 66 items stored in resultArr when 106 items should be stored because the I/O structure between dateArr and resultArr is 1 to 1.
I logged the size of resultArr after each iteration
When the grabData method is called, you start a query against Cassandra. As Felix Kling wrote, your notation suggests an asynchronous function that starts the request and returns.
As the function is asynchronous, you don't know when the query will be ready. It might even take very long, for example when the database is locked for a dump.
When you return from a grabData "iteration" and check your resultArr, it is filled with exactly the values that have been returned so far. It might even be that the fifth iteration's query returns before the third, the fourth, or any earlier iteration. So in your resultArr you sometimes have the value of iteration n at some position m < n or o > n.
As long as you (or we) don't know anything about how Cassandra operates, you cannot say when a query gets answered.
So when you check your result array, its length reflects the number of completed queries, not the number of iterations.
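One hedged way to make the completion point explicit is to wrap each execute call in a Promise and wait for all of them; client, query and processRows below are assumed names based on the question, not actual code from it:

// Sketch: collect every query's result and only read resultArr once all queries have completed.
const promises = dateArr.map(value =>
  new Promise((resolve, reject) => {
    client.execute(query, [value], { prepare: true }, (err, result) => {
      if (err) return reject(err);
      resolve(result.rows.length > 0 ? processRows(result.rows) : []); // empty set when no rows
    });
  })
);

Promise.all(promises).then(resultArr => {
  console.log(resultArr.length); // always matches dateArr.length
});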
Found the root cause: There is a hard limit when querying Cassandra using node.js. The query that I am trying to completely execute is too large. Breaking dateArr up into smaller chunks and querying using those smaller pieces solved the problem.
I am looking for a way to generate a unique ID for a NoSQL database. Unlike a relational database, there is no notion of rows, which means there is no last row to increment from.
The most common way to handle this is to use UUIDs. But my problem is that I need to add another ID (other than the UUID) which needs to be:
Unique
Unsigned Int32
Total data could reach around 50,000,000 items. So how would you generate somewhat unique uint32 IDs?
The UInt32 value type represents unsigned integers with values ranging from 0 to 4,294,967,295.
Only generated when a new user registers.
3 Id's are given to each new user.
Currently using Couchbase Server.
This problem has already been solved - I would suggest using the atomic Increment (or Decrement) functions in Couchbase - these are a common pattern to generate unique IDs.
Whenever the incr() method is called, it atomically increments the counter by the specified value, and returns the old value, therefore it's safe if two clients try to increment at the same time.
Pseudocode example (I'm no Node.JS expert!):
// Once, at the beginning of time we init the counter:
client.set("user::count", 0);
...
// Then, whenever a new user is needed:
nextID = client.incr("user::count", 1); // increments counter and returns 'old' value.
newKey = "user_" + nextID;
client.add(newKey, value);
See the Node.JS SDK for reference, and see the Using Reference Documents for Lookups section in the Couchbase Developer Guide for a complete usage example.
Here's a function that returns a unique identifier each time it's called. It should be fine as long as the number of items does not exceed the range of 32-bit integers, which seems to be the case given the described requirements. (warning: once the array of UIDs fills up, this enters an infinite loop. You may also want to create some sort of a reset function that can empty the array and thus reset the UID when necessary.)
var getUID = (function() {
  var UIDs = [];
  return function() {
    var uid;
    do {
      uid = (Math.random() * Math.pow(2, 32)) >>> 0; // >>> 0 keeps the value an unsigned 32-bit integer
    } while (UIDs[uid] !== undefined);
    return UIDs[uid] = uid;
  };
}());
If you call this insert method passing "user" as the key, then your docId will auto-increment as
user_0
user_1
user_2
etc...
Please note that Couchbase will show one extra row in your bucket whose key is the counter's meta id and whose value is the next counter value. Don't be surprised if a query like select count(*) total from table shows one more than the real count; add a where clause so that this counter row isn't counted.
public insert(data: any, key: string) {
  return new Promise((resolve, reject) => {
    let bucket = CouchbaseConnectionManager.getBucket(`${process.env.COUCHBASE_BUCKET}`);
    bucket.counter(key, 1, { initial: 0 }, (err: any, res: any) => {
      if (err) {
        this.responseHandler(err, res, reject, resolve);
        return; // don't attempt the insert if the counter update failed
      }
      const docId = key + "_" + res.value;
      bucket.insert(docId, data, (error: any, result: any) => {
        this.responseHandler(error, result, reject, resolve);
      });
    });
  });
}
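For illustration, a hypothetical call site; repo is an assumed instance of the class containing the insert method above, not something from the snippet:

// Hypothetical usage; `repo` is an assumed instance name.
repo.insert({ name: "Alice", email: "alice@example.com" }, "user")
  .then(result => console.log("stored new user document", result))
  .catch(err => console.error("insert failed", err));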