ConcurrentModificationException in Amazon Neptune using the Gremlin JavaScript language variant

I am trying to check and insert 1000 vertices in chunks using Promise.all(). The code is as follows:
public async createManyByKey(label: string, key: string, properties: object[]): Promise<T[]> {
  const promises = [];
  const allVertices = __.addV(label);
  const propKeys: Array<string> = Object.keys(properties[0]);
  for (const propKey of propKeys) {
    allVertices.property(propKey, __.select(propKey));
  }
  const chunkedProperties = chunk(properties, 5); // [["demo-1", "demo-2", "demo-3", "demo-4", "demo-5"], [...], ...]
  for (const property of chunkedProperties) {
    const singleQuery = this.g.withSideEffect('User', property)
      .inject(property)
      .unfold().as('data')
      .coalesce(__.V().hasLabel(label).where(eq('data')).by(key).by(__.select(key)), allVertices)
      .iterate();
    promises.push(singleQuery);
  }
  const result = await Promise.all(promises);
  return result;
}
This code throws a ConcurrentModificationException. I need help fixing or improving this.

I'm not quite sure about the data and parameters you are using, but I needed to modify your query a bit to get it to work with a data set I have handy (air routes), as shown below. I did this to help me think through what your query is doing. I had to change the second by step; I'm not sure how it was working otherwise.
gremlin> g.inject(['AUS','ATL','XXX']).unfold().as('d').
......1> coalesce(__.V().hasLabel('airport').limit(10).
......2> where(eq('d')).
......3> by('code').
......4> by(),
......5> constant('X'))
==>v['3']
==>v['1']
==>X
While a query like this runs fine in isolation, once you start running several asynchronous promises that contain mutating steps (as in your query), one promise may try to access a part of the graph that is locked by another. Even though the execution is, I believe, more "concurrent" than truly "parallel", if one promise yields due to an IO wait and allows another to run, the next one may fail if the prior promise already holds locks in the database that the next promise also needs. In your case, as you have a coalesce that references all vertices with a given label and properties, that can potentially cause conflicting locks to be taken. Perhaps it will work better if you await after each for loop iteration rather than doing it all at the end in one big Promise.all.
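That per-iteration awaiting can be sketched like this. `runChunkQuery` is a hypothetical stand-in for the Gremlin traversal you build per chunk; the point is simply that only one mutating query is in flight at a time, so two traversals can never hold conflicting locks simultaneously:

```javascript
// Sketch: run the chunk queries one at a time instead of Promise.all.
// `runChunkQuery` is a placeholder for the per-chunk traversal.
async function insertChunksSequentially(chunks, runChunkQuery) {
  const results = [];
  for (const chunk of chunks) {
    // Wait for this chunk's traversal to finish before starting the next.
    results.push(await runChunkQuery(chunk));
  }
  return results;
}
```

This trades throughput for safety; if the CMEs disappear, you can experiment with small degrees of parallelism from there.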
Something else to keep in mind is that this query is going to be somewhat expensive regardless, as the mid-traversal V is going to happen five times (in the case of your example) for each for loop iteration. This is because the unfold of the injected data comes from chunks of size 5 and therefore spawns five traversers, each of which starts by looking at V.
EDITED 2021-11-17
As discussed a little in the comments, I suspect the best path is actually to use multiple queries. The first query simply does a g.V(id1,id2,...) on all the IDs you are potentially going to add, and returns the list of IDs found. Remove those from the set to add. Next, break the adding part up into batches and do it without coalesce, as you now know those elements do not exist. This is most likely the best way to reduce locking and avoid the CMEs (exceptions). Unless someone else may also be trying to add them in parallel, this is the approach I would take.
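The bookkeeping for that approach can be sketched as a pure helper (names are illustrative): feed it the IDs you intend to add and the IDs that the single upfront g.V(id1,id2,...) read found, and it plans the coalesce-free insert batches:

```javascript
// Keep only the IDs that the upfront read did NOT find, and split
// them into batches for plain addV() inserts (no coalesce needed,
// since we already know these elements do not exist).
function planInsertBatches(candidateIds, foundIds, batchSize) {
  const found = new Set(foundIds);
  const missing = candidateIds.filter(id => !found.has(id));
  const batches = [];
  for (let i = 0; i < missing.length; i += batchSize) {
    batches.push(missing.slice(i, i + batchSize));
  }
  return batches;
}
```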

Related

Split array into chunks/bulks and make operations with them one by one

I'm a bit sorry about the tags; I may have misunderstood my problem and used them incorrectly, but..
The problem I'm facing in my project is new to me, and I have never experienced it before. In my case I have a huge dataset response from the DB (Mongo, 100,000+ docs), and I need to make an HTTP request for a specific field of every doc.
An example array from the dataset will look like:
{
  _id: 1,
  http: http.request.me
},
{
  // each doc of 99k docs more
}
So you have probably already guessed that I cannot use a simple for loop, because:
- if it's async, I'll make a huge number of requests to the API and will be banned/restricted/whatever;
- if I make them one by one, the loop will take about 12-23 hours to complete (this is actually the way it's done right now).
There is also another way, and that's why I'm here: I could split my huge array into chunks of, say, 5/10/100...N elements each and request them one by one:
│→await[request_map 0,1,2,3,4]→filled
│→await[request_map 5..10]→filled
│→await[request_map n..n+5]→filled
↓
According to Split array into chunks, I could easily do that. But then I'd need two loops: the first to split the original array, and the second to asynchronously request each new chunk (of length 5/10/100...N).
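For comparison, the two-loop idea can be written with plain async/await, no library required. `request` here is a hypothetical stand-in for the per-doc HTTP call:

```javascript
// Split docs into chunks of `size`; requests inside a chunk run in
// parallel, but each chunk is awaited before the next one starts.
async function requestInChunks(docs, request, size = 5) {
  const results = [];
  for (let i = 0; i < docs.length; i += size) {
    const chunk = docs.slice(i, i + size);
    results.push(...(await Promise.all(chunk.map(request))));
  }
  return results;
}
```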
But I have recently heard about the reactive paradigm and RxJS, which (probably) could solve this. Is that right? Which operator should I use? What keywords should I search for to find related problems? (If I google "reactive programming" I get a lot of useless results about React.js, not what I want.)
So should I stop worrying and just write unoptimized code, or is there an npm module for this, or another, better pattern/solution?
I probably found an answer here: RxJS 1 array item into sequence of single items - operator. I'm checking it now, but I'd also appreciate any relevant contribution to this question.
RxJS has truly been helpful in this case and is worth looking into; it's an elegant solution for this kind of problem.
Make use of bufferCount and concatMap:
range(0, 100).pipe(
  // map each value to an http call (as an observable), but don't execute it yet
  map(res => http(...)),
  // collect 5 at a time
  bufferCount(5),
  // execute each batch of 5 calls concurrently, one batch at a time
  concatMap(res => forkJoin(res))
).subscribe(console.log)
There's actually an even easier way to do what you want, with the mergeMap operator and its second optional argument, which sets the number of concurrent inner Observables:
from([obj1, obj2, obj3, ...]).pipe(
  mergeMap(obj => /* make a request out of `obj` */, 5) // keep only 5 concurrent requests
).subscribe(result => ...)

MongoDB query performance with Promise

Currently I'm working on a personal project in which I'm struggling to choose between two ways of doing a query in MongoDB:
CustomerSchema.methods.GetOrders = function(){
  return Promise.all(
    this.orders.map(orderId => Order.findOne(orderId))
  );
};

// This will find all of a user's orders by their ObjectId
const orders = await Order.find({ customerId: req.params });
My question is: which way is better, and why? What are their pros and cons?
I tested both, and the first method took about twice as long.
The find() approach should be faster for numerous reasons. First, it sends one query as opposed to one for every order, so there are far fewer round trips to the database. A much smaller performance cost also comes from the fact that you're creating a new promise for every findOne() call.
Also, I don't think GetOrders() as written actually works, since findOne() expects a filter object. I think you meant to use Order.findById(orderId) or Order.findOne({ _id: orderId }).
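A sketch of the corrected method, written here as a plain function over a model object so the fix is easy to see (the stand-alone shape is mine; in the real code it stays a schema method):

```javascript
// findOne(orderId) builds an invalid query because findOne() expects
// a filter object. Use findById(), or findOne({ _id: orderId }).
function getOrders(customer, Order) {
  return Promise.all(
    customer.orders.map(orderId => Order.findById(orderId))
    // equivalently: Order.findOne({ _id: orderId })
  );
}
```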

Synchronous iteration not working in JavaScript

I'm trying to iterate through an array and create a record for each element. This is what I am doing, as mentioned in another question:
async.each(data, (datum, callback) => {
  console.log('Iterated')
  Datum.create({
    row: datum,
  }).exec((error) => {
    if (error) return res.serverError(error)
    console.log('Created')
    callback()
  })
})
Unfortunately, it results in this:
Iterated
Iterated
Iterated
Created
Created
Created
Not this as wanted:
Iterated
Created
Iterated
Created
Iterated
Created
What am I doing wrong?
async.eachSeries() will run one iteration at a time and wait for each iteration to terminate before pursuing the next one.
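To see why that changes the ordering, here is a minimal sketch of what eachSeries does internally (not the library's actual source): the next item is only started once the current iteratee invokes its callback.

```javascript
// Minimal sketch of async.eachSeries: items are processed strictly
// one at a time; `next` doubles as the per-item callback.
function eachSeries(items, iteratee, done) {
  let i = 0;
  (function next(err) {
    if (err || i === items.length) return done && done(err);
    iteratee(items[i++], next);
  })();
}
```

With this, each "Created" log fires before the next "Iterated" log, which is the interleaving you wanted.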
I create a unique, user-friendly identifier before each creation (like 1, 2, 3 and so on). For that, I have to query the database to find the latest identifier and increment it, which doesn't work here because the records are created at nearly the same time.
This sounds like the bottleneck is here. I don't like running async code in series, because that usually slows processes down. How about this approach:
From data you already know how many identifiers you'll need. Implement a function in the backend that creates not one but n such identifiers at a time (including the necessary incrementing, etc.) and returns that array to the frontend. Now you can run your regular requests in parallel, mapping that array of precomputed IDs onto the data array.
This should reduce the runtime from (createAnId + request) * data.length pretty much down to the runtime of a single iteration, since all these requests can run in parallel and therefore mostly overlap.
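A sketch of that pre-allocation idea; `reserveIds` and `createRecord` are hypothetical backend calls, not real APIs:

```javascript
// Reserve all n identifiers in one backend call, then create the
// records in parallel with the IDs already assigned.
async function createAllWithReservedIds(data, reserveIds, createRecord) {
  const ids = await reserveIds(data.length); // one incrementing call, not n
  return Promise.all(
    data.map((datum, i) => createRecord({ id: ids[i], row: datum }))
  );
}
```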
It looks like Datum.create is an asynchronous function.
The forEach whips through each of the three elements of the array, logging each in turn, and since JavaScript doesn't block while the asynchronous calls are pending, you get each of the console.logs in turn.
Then, after some amount of time, the results come in and "Created" is logged to the console.
You seem to be using an asynchronous data processing library. For the result you intend to get, you need to process the data synchronously. Here's how you could do it:
data.forEach(function(datum) {
  console.log('Iterated')
  Datum.create({
    row: datum,
  }).exec((error) => {
    if (error) return res.serverError(error)
    console.log('Created')
    callback()
  })
})
You may also want to remove the callback call entirely now, since the data is processed synchronously.

Riak MapReduce in single node using javascript and python

I want to perform a MapReduce job on data in Riak DB using JavaScript, but I'm stuck at the very beginning: I can't understand how it returns values.
client = riak.RiakClient()
query = client.add('user')
query.map("""
function(v){
  var i = 0;
  i++;
  return [i];
}
""")
for result in query.run():
    print "%s" % (result)
For simplicity, I have checked with the above example. Here 'user' is the bucket, and it contains five records in RiakDB.
I thought map() would return a single value, but it returns an array with 5 values, which I think corresponds to the five records in RiakDB:
1
1
1
1
1
And why can I only return an array here? It treats each record independently and returns a result for each one, so I think that's why I get five 1's. For this reason, when I process the fetched data inside map(), the return value gives unexpected results.
So please give me some suggestions. I think this is a basic thing, but I couldn't figure it out. I highly appreciate your help.
When you run a MapReduce job, the map phase code is sent out to the vnodes where the data is stored and executed for each value in the data. The resulting arrays are collected and passed to a single reduce phase, which also returns an array. If there are sufficiently many results, the reduce phase may be run multiple times, with the previous reduce result and a batch of map results as input.
The fact that you are getting 5 results implies that 5 keys were seen in your bucket. There is no global state shared between instances of the map phase function, so each will have an independent i, which is why each result is 1.
You might try returning [v.key] so that you have something unique for each one, or if the values are expected to be small, you could return [JSON.stringify(v)] so you can see the entire structure that is passed to the map.
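For instance, a map phase along these lines returns each object's key instead of the local counter, so every result is distinct (a sketch, with `v` being the object Riak passes to the map function and given a name here only so it can be referenced):

```javascript
// Each vnode invokes this once per stored object; returning the key
// yields one distinct value per object instead of five 1's.
function mapKey(v) {
  return [v.key];
}
```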
You should note that, according to the docs site, JavaScript MapReduce has been officially deprecated, so you may want to use Erlang functions for new development.

How to find all entries of more than one type of record in Ember.js?

I have two types of records in the Ember DS.Store: user and location. (In fact I have more, but for the sake of simplicity.)
Now to get all entries of, say, 'user', I would simply do
this.store.find('user')
Now say I have a variable allResults and I assign it the result of this command:
allResults = this.store.find('user')
This gives me what Ember calls a PromiseArray, and to do anything after this promise array loads I simply do:
allResults.then(successFunction, failureFunction)
Now this works great when I need only one type of record: say I need only users, I can easily pass my successFunction as the first argument.
However, my need goes beyond that: I am basically building a search bar that searches through these records, so if someone types in "mary", it needs to show both the user "Mary" and the location "mary Ave", for example.
So I need the combined results of
this.store.find('user')
&
this.store.find('location')
Therefore, here are my questions (I feel either approach would work):
Is there a way I can fetch all data pertaining to both 'user' and 'location' and have it returned as one glorious PromiseArray? This seems likely, and also the best way of approaching this issue.
Can you concatenate two PromiseArrays to make one larger one, then use the .then function on the larger one? If so, how?
You can combine promises.
http://emberjs.com/api/classes/Ember.RSVP.Promise.html
Something like this should work for you:
var bothPromise = Promise.all([
  store.find('user'),
  store.find('location')
]).then(function(values){
  // merge the values arrays
  var all = Em.A();
  all.addObjects(values[0]); // users
  all.addObjects(values[1]); // locations
  return all;
});
bothPromise.then(function(allObjects){...
See the "Combine" section of the promise library Q for another explanation of combining promises.
