I'm using Node.js for a project, and a certain structure in my code is causing problems. I have an array dateArr of 106 sequential dates and an array resultArr to hold the resulting data. My code structure is like this:
function grabData(value, index, dateArr) {
    cassandra client execute query with value from dateArr {
        if (!err) {
            if (result has more than 0 rows) {
                process the query data
                push to resultArr
            }
            if (result is empty) {
                push empty set to resultArr
            }
        }
    }
}
dateArr.forEach(grabData);
I logged the size of resultArr after each iteration, and it appears that on some iterations nothing is pushed to resultArr. The code completes with only 66 items stored in resultArr, when 106 should be stored because the mapping between dateArr and resultArr is one to one.
When the grabData method gets called, you start a query against something (or someone) named cassandra. As Felix Kling wrote, your notation suggests an asynchronous function that starts the request and returns immediately.
Because the function is asynchronous, you don't know when the query will be ready. It might even take very long, for example when the database is locked for a dump.
When you return from a grabData "iteration" and check resultArr, the array contains exactly the values that have been returned so far. The fifth iteration's query might even come back before the third or fourth, so the value from iteration n can end up in resultArr at some position m < n or o > n.
As long as you (or we) don't know anything about how cassandra operates, you cannot say when a query gets answered.
So when you check your result array, its length reflects the number of completed queries, not the number of iterations.
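To make the timing concrete, here is a minimal sketch (not the asker's code) that uses setTimeout with a random delay as a stand-in for the asynchronous Cassandra call; it shows why resultArr lags behind the loop and why callbacks can finish out of order:
const resultArr = [];
const dateArr = [1, 2, 3, 4, 5]; // placeholder data

dateArr.forEach((value) => {
    // stands in for client.execute(query, [value], callback)
    setTimeout(() => {
        resultArr.push(value);
        console.log('completed so far:', resultArr.length);
    }, Math.random() * 100);
});

console.log('right after forEach:', resultArr.length); // almost certainly 0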
Found the root cause: There is a hard limit when querying Cassandra using node.js. The query that I am trying to completely execute is too large. Breaking dateArr up into smaller chunks and querying using those smaller pieces solved the problem.
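For reference, a hedged sketch of that idea: split dateArr into smaller pieces and issue a query per piece. Here chunkSize is an arbitrary placeholder (not a documented Cassandra limit) and queryChunk is a hypothetical helper standing in for the asker's actual query code:
const chunkSize = 10; // placeholder value; tune until the query stays under the limit
for (let i = 0; i < dateArr.length; i += chunkSize) {
    const chunk = dateArr.slice(i, i + chunkSize);
    queryChunk(chunk); // hypothetical helper that runs the Cassandra query for one chunk
}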
I'm trying to write a query to get the unique values of an attribute from the final merged collection (sm-Survey-merged). Something like:
select distinct(participantID) from sm-Survey-merged;
I get a tree-cache error with the below equivalent JS query. Can someone help me with a better query?
[...new Set (fn.collection("sm-Survey-merged").toArray().map(doc => doc.root.participantID.valueOf()).sort(), "unfiltered")]
If there are a lot of documents, and you attempt to read them all in a single query, then you run the risk of blowing out the Expanded Tree Cache. You can try bumping up that limit, but with a large database with a lot of documents you are still likely to hit that limit.
The fastest and most efficient way to produce a list of the unique values is to create a range index, and select the values from that lexicon with cts.values().
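For illustration, a sketch of that lexicon lookup, assuming a range index has been configured on the participantID JSON property (the property and collection names mirror the question; the index setup itself is a prerequisite):
const participantIDs = cts.values(
  cts.jsonPropertyReference("participantID"),
  null,
  [],
  cts.collectionQuery("sm-Survey-merged")
);
[...participantIDs];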
Without an index, you could attempt iterative queries that search and retrieve a set of random values, then perform additional searches excluding the values already seen. This still runs the risk of blowing out the Expanded Tree Cache, hitting timeouts, etc., so it may not be ideal, but it would let you get some information now without reindexing the data.
You could experiment with the number of iterations and the search page size to see whether it stays within limits and produces consistent results. Maybe add some logging or flags so you know whether you hit the iteration limit while more values remained, i.e. whether the list is complete or not. You could also try running without an iteration limit, but you then run the risk of OOM or Expanded Tree Cache errors.
function distinctParticipantIDs(iterations, values) {
  const participantIDs = new Set([]);
  const docs = fn.subsequence(
    cts.search(
      cts.andNotQuery(
        cts.collectionQuery("sm-Survey-merged"),
        cts.jsonPropertyValueQuery("participantID", Array.from(values))
      ),
      ["unfiltered", "score-random"]),
    1, 1000);

  for (const doc of docs) {
    const participantID = doc.root.participantID.valueOf();
    participantIDs.add(participantID);
  }

  const uniqueParticipantIDs = new Set([...values, ...participantIDs]);

  if (iterations > 0 && participantIDs.size > 0) {
    // there are still new values, and we haven't hit our iterations limit, so keep searching
    return distinctParticipantIDs(iterations - 1, uniqueParticipantIDs);
  } else {
    return uniqueParticipantIDs;
  }
}
[...distinctParticipantIDs(100, new Set()) ];
Another option would be to run a CoRB job against the database, and apply the EXPORT-FILE-SORT option with ascending|distinct or descending|distinct, to dedup the values produced in an output file.
I'm trying to understand why, when I assign the results from an axios call to a variable, console logging that variable shows the complete object, yet logging its length returns zero.
As such, when I try to run a forEach on the results, there is no love to be had.
getNumberOfCollections() {
  let results = queries.getTable("Quality"); // imported function to grab an Airtable table.
  console.log(results);        // full array, i.e. ['bing', 'bong', 'boom']
  console.log(results.length); // 0
  results.forEach((result) => { /* no love */ });
}
It is quite likely that when you console.log the array, the array is still empty.
console.log(results); // full array, i.e. ['bing', 'bong', 'boom']
console.log(results.length); // 0
When console.log(results.length) runs, the length is evaluated right away, so it is effectively console.log(0), and that's why 0 is printed out.
When console.log(results) runs, the console keeps a reference to the array and renders its contents later, when you actually inspect it in the developer tools. By that time the array has been populated, which is why it looks full. (So the display is not synchronous -- what you see is the state of the array a little bit later on.)
You can try
console.log(JSON.stringify(results));
and you are likely to see an empty array, because JSON.stringify(results) evaluates the value immediately and turns it into a string at that moment, not later.
It looks like you are fetching some data. The correct way usually is by a callback or a promise's fulfillment handler:
fetch(" some url here ")
.then(response => response.json())
.then(data => console.log(data));
so you won't have the data until the callback or the "fulfillment handler" is invoked. If you console.log(results.length) at that point, you should get the correct length (and the data is there).
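Applied to the code in the question, a hedged sketch, assuming the imported queries.getTable() returns a promise (for example because it wraps an axios request):
async function getNumberOfCollections() {
  const results = await queries.getTable("Quality"); // asker's imported helper, assumed to return a promise
  console.log(results);        // populated array
  console.log(results.length); // real length now
  results.forEach((result) => {
    // ... work with each result here ...
  });
  return results.length;
}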
I have an array of objects that I iterate with a for/in loop to turn into key/value pairs; as it runs, each pair is written to a database.
My app is designed to send a push notification when the database is updated, which works as intended, but when there are multiple items in the loop it sends one notification per object (expected, but not so intended).
My simple loop code is:
for(var idx in multiSelectSaveArr) {
    calendarPage.saveCalendarMultiItem(time = multiSelectSaveArr[idx].creationDate, multiSelectSaveArr[idx])
}
The purpose of this is that time will become the unique key, whilst the value is a JSON object.
My calendarPage.saveCalendarMultiItem is a signal to my dataModel, which is where the function that stores the data lives, so each iteration calls a function that writes that object to the database individually.
The nature of my loop is that sometimes it could contain 3 objects, other times 30 - it is a different number each time.
My Question Is
Is there a way I can signal the end of the loop, meaning once the final object has been iterated through to then only send a single notification?
Thanks for any help!
You can use Array.prototype.forEach to check the index and get the array value during iteration; once idx is at the last index, you call your notification function to signal the iteration is over:
multiSelectSaveArr.forEach(function(val, idx){
    calendarPage.saveCalendarMultiItem(time = val.creationDate, val);
    if(idx === multiSelectSaveArr.length - 1) notificationCallback();
});
You can also do it using Array.prototype.entries and the for..of loop which is designed for iteration of an iterable like an array:
for(const [idx, val] of multiSelectSaveArr.entries()){
    calendarPage.saveCalendarMultiItem(time = val.creationDate, val);
    if(idx === multiSelectSaveArr.length - 1) notificationCallback();
}
The reason I am calling notificationCallback() inside the loop, guarded by the if condition, is that I assume you won't want to send the notification if the array is empty. If I placed the call after the loop it would have worked regardless and wouldn't have needed an if, but then notificationCallback() would fire even when the loop never ran.
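If sending it after the loop is acceptable, an equivalent hedged sketch is to guard on the array length instead (notificationCallback remains the hypothetical notification function used above):
for(const val of multiSelectSaveArr){
    calendarPage.saveCalendarMultiItem(time = val.creationDate, val);
}
if(multiSelectSaveArr.length > 0) notificationCallback();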
Do not use a for..in loop to iterate over the array, as it iterates over the enumerable property keys of an object (in this case an array object) in an unspecified order.
See this answer for more details.
[{"creationDate":"2011-03-13T00:17:25.000Z","fileName":"IMG_0001.JPG"},
{"creationDate":"2009-10-09T21:09:20.000Z","fileName":"IMG_0002.JPG"}]
[{"creationDate":"2012-10-08T21:29:49.800Z","fileName":"IMG_0004.JPG",
{"creationDate":"2010-08-08T18:52:11.900Z","fileName":"IMG_0003.JPG"}]
I use an HTTP GET method to receive data. Unfortunately, while I do receive this data in chunks, it is not sorted by creationDate DESCENDING.
I need to sort these objects by creationDate; my expected result would be:
[{"creationDate":"2012-10-08T21:29:49.800Z","fileName":"IMG_0004.JPG"},
{"creationDate":"2011-03-13T00:17:25.000Z","fileName":"IMG_0001.JPG"}]
[{"creationDate":"2010-08-08T18:52:11.900Z","fileName":"IMG_0003.JPG"},
{"creationDate":"2009-10-09T21:09:20.000Z","fileName":"IMG_0002.JPG"}]
Here's what I tried:
dataInChunks.map(data => {
    return data.sort((a, b) => {
        return new Date(b.creationDate).getTime() - new Date(a.creationDate).getTime();
    });
})
.subscribe(data => {
    console.log(data);
})
This works, but only on one chunk at a time, which leaves me with just the very top result. I need some way to join these chunks together, sort them, and then break the whole thing back into chunks of two.
Are there any RxJS operators I can use for this?
If you know the call definitely completes (which it should) then you can just use toArray, which, as the name suggests, collects everything into a single array that you can then sort. The point of toArray is that it won't produce a stream of data but will wait until the source completes and emit all values at once:
dataInChunks
    .toArray()
    .subscribe(allData => {
        allData.sort(/* sorting logic */);
    });
However, if you are required to show the data in the browser as it arrives (if the toArray() approach makes the UI feel unresponsive), then you will have to re-sort the increasing dataset as it arrives:
var allData = [];
dataInChunks
    .bufferWithCount(4)
    .subscribe(vals => {
        allData = allData.concat(vals);
        allData.sort(/* sort logic */);
    });
This is slightly hacky as it relies on a variable outside the stream, but you get the idea. It uses bufferWithCount, which lets you limit the number of re-sorts you do.
TBH, I would just go with the toArray approach, which begs the question why it's an observable in the first place! Good luck.
I am somewhat new to JavaScript, but I am reasonably experienced with programming in general. I suspect that my problem may have something to do with scoping, or the specifics of passing arrays as parameters, but I am uncertain.
The high-level goal is to have live plotting with several 'nodes', each of which generates 50 points/sec. I got this working by streaming straight into an array rendered by dygraphs and C3.js, and quickly realized that this is too much data to continually live-render. Dygraphs seems to start impacting the user experience after about 30s and C3.js seems to choke at around 10s.
The next attempt is to decimate the plotted data based on zoom level.
I have the data saved in an 'object' which I am using somewhat like a dictionary in other languages. The idea is to build a large data buffer from AJAX requests, keyed by the serial number of the unit that generated the data. This is going well and the object is being populated as expected. I feel it is informative to know the 'structure' of this object before I get to my question. It is as follows:
{
1: [[x10,y10], [x11,y11], [...], [x1n, y1n]],
2: [[x20,y20], [x21,y21], [...], [x2n, y2n]],
... : [ ... ]
a: [[xa0,ya0], [xa1,ya1], [...], [xan, yan]]
}
Periodically, a subset of that data will be used to generate a dygraphs plot. I am decimating the stored data and creating a 'plot buffer' to hold a subset of the actual data.
The dygraphs library takes data in several ways, but I would like to structure it 'natively', which is just an array of arrays. Each inner array is a 'row' of data, and all rows must have the same number of elements so that they line up into columns. The data points may or may not be generated at the same time. If the x values match perfectly, then the resulting data for just two nodes would look like the following, since x10 = x20 = xn0:
[
[x10, y10, y20],
[x11, y11, y21],
[ ... ],
[xan, yan, yan]
]
Note that this is just x and y in rows. In reality, the times for each serial number may not line up, so it may be much closer to:
[
[x10, y10, null],
[x20, null, y20],
[x11, y11, y21],
[ ... ],
[xan, yan, yan]
]
Sorry for all of the background. Now we can get to the code that I'm having trouble with. I'm periodically attempting to create the plot buffer using the following code:
window.intervalId = setInterval(
    function(){
        var plotData = formatData(nodeData, 45000, 49000, 200);
        /* dygraphs stuff here */
    },
    500
);

function formatData(dataObject, start, end, stride){
    var smallBuffer = [];
    var keys = Object.keys(dataObject);
    keys.forEach(
        function(key){
            console.log('key: ', key);
            mergeArrays(dataObject[key], smallBuffer, start, end, stride);
        }
    );
    return smallBuffer;
}
function mergeArrays(sourceData2D, destDataXD, startInMs, endInMs, strideInMs){
    /* ensure that the source data isn't undefined */
    if(sourceData2D){
        /* if the destDataXD is empty, then basically copy the
         * sourceData2D into it as-is taking the stride into account */
        if(destDataXD.length == 0){
            /* does sourceData2D have a starting point in the time range? */
            var startIndexSource = indexNear2D(sourceData2D, startInMs);
            var lastTimeInMs = sourceData2D[startIndexSource][0];
            for(var i = startIndexSource; i < sourceData2D.length; i++){
                /* start to populate the destDataXD based on the stride */
                if(sourceData2D[i][0] >= (lastTimeInMs + strideInMs)){
                    destDataXD.push(sourceData2D[i]);
                    lastTimeInMs = sourceData2D[i][0];
                }
                /* when the source data is beyond the time, then break the loop */
                if(sourceData2D[i][0] > endInMs){
                    break;
                }
            }
        }else{
            /* the destDataXD already has data in it, so this needs to use that data
             * as a starting point to merge the new data into the destination array */
            var finalColumnCount = destDataXD[0].length + 1;
            console.log('final column count: ', finalColumnCount);
            /* add the next column to each existing row as 'null' */
            destDataXD.forEach(
                function(element){
                    element.push(null);
                }
            );
            /* TODO: move data into destDataXD from sourceData2D */
        }
    }
}
To add some information, since it probably isn't self-explanatory without some effort: I created two functions, 'formatData' and 'mergeArrays'. These could have been a single function, but it was easier for me to separate the 'object' domain and the 'array' domain conceptually. The 'formatData' function simply iterates through all of the data stored under each key, calling the 'mergeArrays' routine each time through. The 'mergeArrays' routine is not yet complete and is where I'm having my issue.
The first time through, formatData should create an empty array - smallBuffer - into which data is merged using mergeArrays. The first time 'mergeArrays' executes, I see that smallBuffer is indeed being created and is an empty array. This empty array is supplied as a parameter to 'mergeArrays' and - the first time through - this works perfectly. The next time through, 'smallBuffer' is no longer empty, so the second case in 'mergeArrays' gets executed. The first step for me was to calculate the number of columns so that I could pad each row appropriately. This worked fine, but helped point out the problem. The next step was to simply append an empty column of 'null' values to each row. This is where things got weird. After the first time through 'mergeArrays', destDataXD still contained 'null' data from the previous executions. In essence, it appears that 'var smallBuffer = [];' doesn't actually clear things out and retains something. That something is not apparent until near the end. I can't explain exactly what is going on because I don't fully understand it, but destDataXD continually grows 'nulls' at the end without ever being properly reset as expected.
Thank you for the time and I look forward to hearing your thoughts, j
Quickly reading through the code, the danger point I see is where you first add an element to destDataXD.
destDataXD.push(sourceData2D[i]);
Note that you are not pushing a copy of the array. You are adding a reference to that array. destDataXD and sourceData2D are now sharing the same data.
So, of course, when you push any null values onto an array in destDataXD, you are also modifying sourceData2D.
You should use the JavaScript array-copying method slice:
destDataXD.push(sourceData2D[i].slice());
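A quick illustration of the difference, using hypothetical data rather than the asker's arrays:
const source = [[1, 10]];
const byRef = [];
const byCopy = [];

byRef.push(source[0]);          // pushes a reference to the same inner array
byCopy.push(source[0].slice()); // pushes an independent copy

byRef[0].push(null);            // mimics the null-padding step

console.log(source[0]); // [1, 10, null] -- mutated through the shared reference
console.log(byCopy[0]); // [1, 10]       -- the copy is unaffected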