I'm trying to iterate through an array and create a record for each item. This is what I'm doing, following the approach mentioned in another question:
async.each(data, (datum, callback) => {
  console.log('Iterated')
  Datum.create({
    row: datum,
  }).exec((error) => {
    if (error) return res.serverError(error)
    console.log('Created')
    callback()
  })
})
Unfortunately, it results in this:
Iterated
Iterated
Iterated
Created
Created
Created
Not this, as I wanted:
Iterated
Created
Iterated
Created
Iterated
Created
What am I doing wrong?
async.eachSeries() will run one iteration at a time and wait for each iteration to finish before moving on to the next.
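For example, a sketch based on your snippet (errors are forwarded to eachSeries' final callback, which is the usual async convention):
async.eachSeries(data, (datum, callback) => {
  console.log('Iterated')
  Datum.create({
    row: datum,
  }).exec((error) => {
    if (error) return callback(error)
    console.log('Created')
    callback()
  })
}, (error) => {
  // runs once all iterations have finished (or the first error occurred)
  if (error) return res.serverError(error)
})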
I create a unique, user-friendly identifier before each creation (like 1, 2, 3 and so on). For that, I have to query the database to find the latest identifier and increment it, which doesn't work here because the records are all created at nearly the same time.
This sounds like that's the real bottleneck. I don't like running async code in series, because it usually slows things down. How about this approach:
From data you already know how many identifiers you'll need.
Implement a function in the backend that creates not a single identifier but n of them at a time (including the necessary incrementing, etc.) and returns that array to the frontend. Now you can run your regular requests in parallel, mapping that array of precomputed IDs onto the data array.
This should reduce the runtime from (createAnId + request) * data.length pretty much down to the runtime of a single iteration, since all of these requests can run in parallel and therefore mostly overlap.
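A rough sketch of that idea (reserveIdentifiers is a hypothetical backend call that atomically reserves n sequential identifiers, and this assumes Datum.create returns a promise when no callback is given):
// Hypothetical: ask the backend for data.length identifiers in one go, e.g. [17, 18, 19]
const ids = await reserveIdentifiers(data.length);

// Every create already knows its identifier, so the requests can run in parallel.
await Promise.all(data.map((datum, i) =>
  Datum.create({
    row: datum,
    identifier: ids[i],
  })
));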
It looks like Datum.create is an asynchronous function.
async.each whips through each of the three elements of the array, logging them in turn, and since JavaScript doesn't block while the asynchronous calls are pending, you get each of the console.logs in turn.
Then, after some amount of time, the results come in and "Created" is logged to the console.
You seem to be using an asynchronous data processing library. For the result you intend to get, you need to process the data synchronously. Here's how you could do it:
data.forEach(function(datum) {
  console.log('Iterated')
  Datum.create({
    row: datum,
  }).exec((error) => {
    if (error) return res.serverError(error)
    console.log('Created')
    callback()
  })
})
You may also want to remove the callback function entirely now since the data is processed synchronously.
I am trying to check and insert 1000 vertices in chunks using Promise.all(). The code is as follows:
public async createManyByKey(label: string, key: string, properties: object[]): Promise<T[]> {
  const promises = [];
  const allVertices = __.addV(label);
  const propKeys: Array<string> = Object.keys(properties[0]);

  for (const propKey of propKeys) {
    allVertices.property(propKey, __.select(propKey));
  }

  const chunkedProperties = chunk(properties, 5); // [["demo-1", "demo-2", "demo-3", "demo-4", "demo-5"], [...], ...]

  for (const property of chunkedProperties) {
    const singleQuery = this.g.withSideEffect('User', property)
      .inject(property)
      .unfold().as('data')
      .coalesce(__.V().hasLabel(label).where(eq('data')).by(key).by(__.select(key)), allVertices)
      .iterate();
    promises.push(singleQuery);
  }

  const result = await Promise.all(promises);
  return result;
}
This code throws a ConcurrentModificationException. I need help to fix or improve this.
I'm not quite sure about the data and parameters you are using, but I needed to modify your query a bit to get it to work with a data set I have handy (air routes), as shown below. I did this to help me think through what your query is doing. I had to change the second by step; I'm not sure how that was working otherwise.
gremlin> g.inject(['AUS','ATL','XXX']).unfold().as('d').
......1> coalesce(__.V().hasLabel('airport').limit(10).
......2> where(eq('d')).
......3> by('code').
......4> by(),
......5> constant('X'))
==>v['3']
==>v['1']
==>X
While a query like this runs fine in isolation, once you start running several asynchronous promises (that contain mutating steps, as in your query), what can happen is that one promise tries to access a part of the graph that is locked by another one. Even though the execution is, I believe, more "concurrent" than truly "parallel", if one promise yields due to an IO wait and allows another to run, the next one may fail if the prior promise already holds locks in the database that the next promise also needs. In your case, as you have a coalesce that references all vertices with a given label and properties, that can potentially cause conflicting locks to be taken. Perhaps it will work better if you await after each for loop iteration rather than doing it all at the end in one big Promise.all.
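In code, that means awaiting each chunk's traversal before submitting the next one (same traversal as in the question, just no longer collected into one Promise.all):
for (const property of chunkedProperties) {
  // each chunk completes before the next one starts, so their locks don't overlap
  await this.g.withSideEffect('User', property)
    .inject(property)
    .unfold().as('data')
    .coalesce(__.V().hasLabel(label).where(eq('data')).by(key).by(__.select(key)), allVertices)
    .iterate();
}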
Something else to keep in mind is that this query is going to be somewhat expensive regardless, as the mid traversal V is going to happen five times (in the case of your example) for each for loop iteration. This is because the unfold of the injected data is taken from chunks of size 5 and therefore spawns five traversers, each of which starts by looking at V.
EDITED 2021-11-17
As discussed a little in the comments, I suspect the most optimal path is actually to use multiple queries. The first query simply does a g.V(id1,id2,...) on all the IDs you are potentially going to add. Have it return a list of IDs found. Remove those from the set to add. Next break the adding part up into batches and do it without coalesce as you now know that those elements do not exist. This is most likely the best way to reduce locking and avoid the CMEs (exceptions). Unless someone else may be also trying to add them in parallel, this is the approach I think I would take.
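A sketch of that two-query approach, reusing the names from the question (the P.within predicate and the batch size of 5 are assumptions; adjust to your driver version and data):
// 1. One read-only query to find which keys already exist.
const keys = properties.map(p => p[key]);
const existing = await this.g.V().hasLabel(label)
  .has(key, P.within(...keys))   // P comes from gremlin's process module
  .values(key)
  .toList();
const existingSet = new Set(existing);

// 2. Add only the missing ones, batch by batch, awaited sequentially (no coalesce needed).
const toAdd = properties.filter(p => !existingSet.has(p[key]));
for (const batch of chunk(toAdd, 5)) {
  let traversal = this.g;
  for (const props of batch) {
    traversal = traversal.addV(label);
    for (const [k, v] of Object.entries(props)) {
      traversal = traversal.property(k, v);
    }
  }
  await traversal.iterate();
}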
I have a class in JavaScript with the following structure:
class TableManager {
  /** an array containing Table objects **/
  protected Tables = [];

  protected getTable(tableId) {
    // iterates over this.Tables and searches for a table with a specific id:
    // if found, it returns the table object, otherwise it returns null
  }

  protected async createTable(tableId) {
    const Table = await fetchTable(tableId); /** performs an asynchronous operation that creates a Table object by performing a select operation on the database **/
    this.Tables.push(Table);
    return Table;
  }

  protected async joinTable(user, tableId) {
    const Table = this.getTable(tableId) ?? await this.createTable(tableId);
    Table.addUser(user);
  }
}
The idea behind this class is that it will receive commands via a socket. For example, it may receive the joinTable command, in which case it should first check whether the table being joined already exists in memory: if it does, it will add the user to that table; otherwise, it will create the table, store it in memory, and add the user to the table.
I am a bit concerned that this could result in a race condition if two joinTable() calls are made in a short amount of time, in which case the table would be created twice and stored in memory as two separate table instances. Am I right to be worried about this? If yes, would checking whether the table exists before adding it to the array in the createTable function solve this race condition?
Your concern is right. The idea is to use transactions and make sure that there is only one transaction running at a given time. In Node.js, you can use a mutex to implement that. Read more: https://www.nodejsdesignpatterns.com/blog/node-js-race-conditions/.
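For example, a minimal sketch using the async-mutex package (the package choice is an assumption; any mutex/lock implementation works the same way):
const { Mutex } = require('async-mutex');

class TableManager {
  Tables = [];
  tableLock = new Mutex();

  async joinTable(user, tableId) {
    // runExclusive queues concurrent callers, so only one check-then-create
    // sequence runs at a time and a table can never be created twice
    await this.tableLock.runExclusive(async () => {
      let table = this.getTable(tableId);
      if (!table) table = await this.createTable(tableId);
      table.addUser(user);
    });
  }
}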
I am a bit concerned that this could result in a race condition if two joinTable() calls are made in a short amount of time, in which case the table would be created twice and stored in memory as two separate table instances. Am I right to be worried about this?
This shouldn't be a problem as long as you await each call (or chain it properly). That is to say, it won't be a problem as long as the operations are sequential. If you allow the promises to resolve at the same time (like with Promise.all) then yes, as it is right now, there would be a race condition.
If yes, would checking whether the table exists before adding it to the array in the createTable function solve this race condition?
As I understand it, no, it would still create a race condition. The first function call would do the check, see the table does not exist and proceed to send the query to your server in order to create the new entry. The second function call would also do the check but since it's not waiting for the previous request, it's possible the check happens before the first request finishes (this is your race condition). That means another request can be sent to create another table.
What you can do is store your entry as a promise. I would use a Map for this:
protected Tables = new Map();

protected getTable(tableId) {
  let Table = this.Tables.get(tableId);
  if (!Table) {
    Table = fetchTable(tableId);
    this.Tables.set(tableId, Table);
  }
  return Table;
}
This way, joinTable can call getTable, which also creates the Table if it doesn't exist. If the Table is still being created, subsequent calls pick up the same pending promise, so no duplicates are ever made.
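joinTable then only needs to await that (possibly shared) promise, for example:
protected async joinTable(user, tableId) {
  // every concurrent caller gets the same pending promise from getTable
  const Table = await this.getTable(tableId);
  Table.addUser(user);
}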
Ultimately, the creation or not of any entity on a server, needs to be managed there... on the server. Otherwise, you risk multiple clients (or even a client restart) creating these duplicates.
I want to perform a MapReduce job on data in Riak DB using JavaScript. But I'm stuck at the very beginning: I couldn't understand how it returns values.
client = riak.RiakClient()
query = client.add('user')
query.map("""
function(v) {
  var i = 0;
  i++;
  return [i];
}
""")
for result in query.run():
    print "%s" % (result);
For simplicity I have checked the above example.
Here the query runs over the 'user' bucket, which contains five sets of data in RiakDB.
I thought map() returns a single value, but it returns an array with 5 values, which I think corresponds to the five sets of data in RiakDB.
1
1
1
1
1
And here, why can I return only an array? It treats each dataset independently and returns a result for each one, so I think that's why I get five 1's. For this reason, when I process the fetched data inside map(), the return gives unexpected results for me.
So please give me some suggestions. I think it is a basic thing, but I couldn't get it. I highly appreciate your help.
When you run a MapReduce job, the map phase code is sent out to the vnodes where the data is stored and executed for each value in the data. The resulting arrays are collected and passed to a single reduce phase, which also returns an array. If there are sufficiently many results, the reduce phase may be run multiple times, with the previous reduce result and a batch of map results as input.
The fact that you are getting 5 results implies that 5 keys were seen in your bucket. There is no global state shared between instances of the map phase function, so each will have an independent i, which is why each result is 1.
You might try returning [v.key] so that you have something unique for each one, or if the values are expected to be small, you could return [JSON.stringify(v)] so you can see the entire structure that is passed to the map.
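For example, a map phase along these lines (a sketch; v is the object Riak hands to the map function):
function(v) {
  // v.key is the object's key, unique per entry in the bucket
  return [v.key];
}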
You should note that, according to the docs site, JavaScript MapReduce has been officially deprecated, so you may want to use Erlang functions for new development.
I have a JavaScript application that calls an API, and the API returns JSON. From the JSON, I select a specific object and loop through it.
My code flow is something like this:
Service call -> GetResults
Loop through Results and build Page
The problem, though, is that sometimes the API returns only one result, so it returns an object instead of an array, which means I can't loop through the results. What would be the best way to get around this?
Should I convert my object, or single result, to an array? Put/push it inside an array? Or should I do a typeof check to see whether the element is an array, and then do the looping?
Thanks for the help.
// this is what is returned when there is more than one result
var results = {
  pages: [
    {"pageNumber": 204},
    {"pageNumber": 1024},
    {"pageNumber": 3012}
  ]
}

// this is what is returned when there is only one result
var results = {
  pages: {"pageNumber": 105}
}
My code loops through results using a for loop, but it throws errors, since sometimes pages is not an array. So again, do I check whether it's an array? Push the result into a new array? Which would be better? Thanks.
If you have no control over the server side, you could do a simple check to make sure it's an array:
if (!(results.pages instanceof Array)) {
  results.pages = [results.pages];
}
// Do your loop here.
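An equivalent and slightly more robust check uses the built-in Array.isArray, which also works for arrays coming from another frame/realm:
if (!Array.isArray(results.pages)) {
  results.pages = [results.pages];
}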
Otherwise, this should ideally happen on the server; it should be part of the contract that the results can always be accessed in a similar fashion.
Arrange whatever you do to your objects inside the loop into a separate procedure, and if you discover that the object is not an array, apply the procedure to it directly; otherwise, apply it to each element of that object:
function processPage(page) { /* do something to your page */ }
if (pages instanceof Array) pages.forEach(processPage);
else processPage(pages);
The obvious benefit of this approach, compared to the one where you create a redundant array, is that, well, you don't create a redundant array and you don't modify the data you received. While at this stage it may not be important that the data stays intact, in general modifying it might cause you more trouble, for example when running integration and regression tests.
I've been sitting here for a while now wondering why I'm losing an array parameter on a function call when calling it a second time.
The script I'm working on is modeled after CouchDB/PouchDB and stores items as JSON strings in multiple storages (including local storage). The parameters are:
_id id of the item
_rev revision string (version), counter and hash
_content whatever content
_revisions array of all prior hashes and current counter
_revs_info all previous revisions of this item with status
I'm currently trying a PUT operation, which by default updates an existing document. As I'm working with multiple storages, I also have a PUT SYNC, which "copy&pastes" versions of a document from one storage to another (with the goal of having every version available on every storage). I'm also keeping a separate file with a document tree, which stores all the version hashes. This tree file is updated on SYNCs using the _revs_info supplied with the PUT.
My problem is sequential SYNC PUTs. The first one works; on the second I'm losing the _revs_info parameter. And I don't know why...
Here is my first call (from my QUnit module), which works fine:
o.jio.put({
  "content": 'a_new_version',
  "_id": 'myDoc',
  "_rev": "4-b5bb2f1657ac5ac270c14b2335e51ef1ffccc0a7259e14bce46380d6c446eb89",
  "_revs_info": [
    {"rev": "4-b5bb2f1657ac5ac270c14b2335e51ef1ffccc0a7259e14bce46380d6c446eb89", "status": "available"},
    {"rev": "3-a9dac9ff5c8e1b2fce58e5397e9b6a8de729d5c6eff8f26a7b71df6348986123", "status": "deleted"},
    {"rev": fake_rev_1, "status": "deleted"},
    {"rev": fake_rev_0, "status": "deleted"}
  ],
  "_revisions": {
    "start": 4,
    "ids": [
      "b5bb2f1657ac5ac270c14b2335e51ef1ffccc0a7259e14bce46380d6c446eb89",
      "a9dac9ff5c8e1b2fce58e5397e9b6a8de729d5c6eff8f26a7b71df6348986123",
      fake_id_1,
      fake_id_0
    ]
  }
},
function(err, response) {
  // run tests
});
However, when I call the same function a second time:
o.jio.put({
  "content": 'a_deleted_version',
  "_id": 'myDoc',
  "_rev": "3-05210795b6aa8cb5e1e7f021960d233cf963f1052b1a41777ca1a2aff8fd4b61",
  "_revs_info": [
    {"rev": "3-05210795b6aa8cb5e1e7f021960d233cf963f1052b1a41777ca1a2aff8fd4b61", "status": "deleted"},
    {"rev": "2-67ac10df5b7e2582f2ea2344b01c68d461f44b98fef2c5cba5073cc3bdb5a844", "status": "deleted"},
    {"rev": fake_rev_2, "status": "deleted"}
  ],
  "_revisions": {
    "start": 3,
    "ids": [
      "05210795b6aa8cb5e1e7f021960d233cf963f1052b1a41777ca1a2aff8fd4b61",
      "67ac10df5b7e2582f2ea2344b01c68d461f44b98fef2c5cba5073cc3bdb5a844",
      fake_id_2
    ]
  }
},
function(err, response) {
  // run tests
});
My script fails, because the _revs_info array does not include anything. All other parameters, and any random parameters I add, are transferred. If I pass a string or an object instead of an array, they also safely make it into my script alive.
An array, however... does not pass...
Question:
I have been sitting on this for a few hours trying to nail down the cause, but I'm pretty clueless. So does anyone know of reasons why arrays might lose their content when they are passed as parameters in JavaScript?
Thanks!
EDIT:
I added a regular PUT after my first SYNC-PUT, which passed fine (without _revs_info being defined).
It's completely possible for a JavaScript function to mutate an array passed in. Consider this example:
function removeAll(a) { a.splice(0); }
var arr = [1, 2, 3];
removeAll(arr);
console.log(arr); // empty array
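If the library you are calling mutates the _revs_info array internally in a similar way (an assumption; I haven't checked the jio source), passing a shallow copy protects the caller's data:
function removeAll(a) { a.splice(0); }

var revsInfo = [{"rev": "1-abc", "status": "deleted"}];
removeAll(revsInfo.slice());   // the callee gets (and empties) a copy
console.log(revsInfo.length);  // still 1: the original is untouched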