JavaScript - Can't seem to 'reset' array passed to a function

I am somewhat new to JavaScript, but I am reasonably experienced with programming in general. I suspect that my problem may have something to do with scoping, or the specifics of passing arrays as parameters, but I am uncertain.
The high-level goal is to have live plotting with several 'nodes', each of which generates 50 points/sec. I got this working by streaming straight into an array rendered by dygraphs and C3.js, and quickly realized that this is too much data to continuously live-render. Dygraphs seems to start impacting the user experience after about 30s and C3.js seems to choke at around 10s.
The next attempt is to decimate the plotted data based on zoom level.
I have data saved into an 'object' which I am using somewhat like a dictionary in other languages, populated via AJAX requests. The idea is to build up a large data buffer, using each unit's serial number as the key for the data that unit generates. This is working well and the object is being populated as expected. Before I get to my question, it helps to know the 'structure' of this object. It is as follows:
{
    1: [[x10, y10], [x11, y11], [...], [x1n, y1n]],
    2: [[x20, y20], [x21, y21], [...], [x2n, y2n]],
    ...: [...],
    a: [[xa0, ya0], [xa1, ya1], [...], [xan, yan]]
}
Periodically, a subset of that data will be used to generate a dygraphs plot. I am decimating the stored data and creating a 'plot buffer' to hold a subset of the actual data.
The dygraphs library takes data in several ways, but I would like to structure it 'natively', which is just an array of arrays. Each array within the outer array is a 'row' of data, and all rows must have the same number of elements in order to line up into columns. The data points may or may not be generated at the same times. If the x values match perfectly, then the resulting data for just two nodes would look like the following, since x10 = x20, x11 = x21, and so on:
[
    [x10, y10, y20],
    [x11, y11, y21],
    [...],
    [x1n, y1n, y2n]
]
Note that this is just x and y in rows. In reality, the times for each serial number may not line up, so it may be much closer to:
[
    [x10, y10, null],
    [x20, null, y20],
    [x11, y11, y21],
    [...],
    [x1n, y1n, y2n]
]
Sorry for all of the background. Now we can get to the code that I'm having trouble with. I'm periodically attempting to create the plot buffer using the following code:
window.intervalId = setInterval(
    function() {
        var plotData = formatData(nodeData, 45000, 49000, 200);
        /* dygraphs stuff here */
    },
    500
);
function formatData(dataObject, start, end, stride) {
    var smallBuffer = [];
    var keys = Object.keys(dataObject);
    keys.forEach(
        function(key) {
            console.log('key: ', key);
            mergeArrays(dataObject[key], smallBuffer, start, end, stride);
        }
    );
    return smallBuffer;
}
function mergeArrays(sourceData2D, destDataXD, startInMs, endInMs, strideInMs) {
    /* ensure that the source data isn't undefined */
    if (sourceData2D) {
        /* if the destDataXD is empty, then basically copy the
         * sourceData2D into it as-is, taking the stride into account */
        if (destDataXD.length == 0) {
            /* does sourceData2D have a starting point in the time range? */
            var startIndexSource = indexNear2D(sourceData2D, startInMs);
            var lastTimeInMs = sourceData2D[startIndexSource][0];
            for (var i = startIndexSource; i < sourceData2D.length; i++) {
                /* start to populate the destDataXD based on the stride */
                if (sourceData2D[i][0] >= (lastTimeInMs + strideInMs)) {
                    destDataXD.push(sourceData2D[i]);
                    lastTimeInMs = sourceData2D[i][0];
                }
                /* when the source data is beyond the time, then break the loop */
                if (sourceData2D[i][0] > endInMs) {
                    break;
                }
            }
        } else {
            /* the destDataXD already has data in it, so this needs to use that data
             * as a starting point to merge the new data into the destination array */
            var finalColumnCount = destDataXD[0].length + 1;
            console.log('final column count: ', finalColumnCount);
            /* add the next column to each existing row as 'null' */
            destDataXD.forEach(
                function(element) {
                    element.push(null);
                }
            );
            /* TODO: move data into destDataXD from sourceData2D */
        }
    }
}
To add some information, since it probably isn't self-explanatory without some effort: I created two functions, 'formatData' and 'mergeArrays'. These could have been a single function, but it was easier for me to conceptually separate the 'object' domain from the 'array' domain. The 'formatData' function simply iterates through all of the data stored under each key, calling the 'mergeArrays' routine each time through. The 'mergeArrays' routine is not yet complete and is where I'm having my issue.
The first time through, formatData creates an empty array - smallBuffer - into which data is merged using mergeArrays. The first time executing 'mergeArrays' I see that smallBuffer is indeed being created and is an empty array. This empty array is supplied as a parameter to 'mergeArrays' and - the first time through - this works perfectly.
The next time through, smallBuffer is no longer empty, so the second case in 'mergeArrays' gets executed. The first step for me was to calculate the number of columns so that I could pad each row appropriately. This worked fine, but it helped point out the problem. The next step was to simply append a column of 'null' values to each row, and this is where things got weird: after the first pass through 'mergeArrays', destDataXD still contained 'null' data from the previous executions. In essence, it appears that 'var smallBuffer = [];' doesn't actually clear anything and retains something. That something is not apparent until near the end. I can't explain exactly what is going on because I don't fully understand it, but destDataXD continually grows 'nulls' at the end without ever being reset as expected.
Thank you for the time and I look forward to hearing your thoughts, j

Quickly reading through the code, the danger point I see is where you first add an element to destDataXD:
destDataXD.push(sourceData2D[i]);
Note that you are not pushing a copy of the array. You are adding a reference to that array. destDataXD and sourceData2D are now sharing the same data.
So, of course, when you push any null values onto an array in destDataXD, you are also modifying sourceData2D.
You should use the JavaScript array-copying method slice:
destDataXD.push(sourceData2D[i].slice());
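To make the sharing concrete, here is a minimal sketch; the variable names are illustrative, not from the code above:
var row = [100, 0.5];
var buffer = [];
buffer.push(row);           // buffer[0] IS row: the very same array object
buffer[0].push(null);       // pad the "copy"...
console.log(row);           // [100, 0.5, null] - the original grew too

var row2 = [200, 0.7];
var buffer2 = [];
buffer2.push(row2.slice()); // shallow copy: a fresh array object
buffer2[0].push(null);
console.log(row2);          // [200, 0.7] - the original is untouched
This also explains why 'var smallBuffer = [];' appeared not to reset anything: the new buffer was empty, but the rows pushed into it were the same arrays stored in nodeData, which still carried the nulls from earlier passes.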

Related

_colorIndex and _symbolIndex when creating object

Good day,
I'm currently on a project that uses JavaScript for its front end, and I'm having a hard time figuring out why, every time I create an object, there are always _symbolIndex and _colorIndex properties in it.
For example, my code looks like this:
test_data = {
    name: data[i].number,
    data: reply_stats * 100
};
series_value.push(test_data);
The test_data object has:
Object
    _colorIndex: 0
    _symbolIndex: 0
    data: 25
    name: "09356152280"
but I only added name and data.
I used this for Highcharts.
Thanks in advance.
I encountered the same issue and haven't found it in the docs (maybe it is there, I just haven't picked it up), so I went investigating a bit.
It turns out Highcharts auto-inserts the metadata properties _colorIndex and _symbolIndex into every series object that is passed, alone or as an array element, into its series property.
If the object being passed already has _colorIndex and _symbolIndex, for example from a previous loading into the chart, then these properties will not be modified.
Otherwise, if these properties do not exist, new properties will be auto-inserted into each series object (in the array passed to the series property) in the following way: the auto-insert starts a counter at 0 and increments it on each insert.
Not knowing this can lead to the problem that I had: _colorIndex dictates the auto-coloring of series on the chart, and I ended up with the same color for two series that I expected to be auto-colored differently. It happened because I added one series, which I referred to in my code as
var serieA = {
    name: "Temperature",
    data: [34, 32, ...]
}
Then I added another series, serieB, in the same way, and passed them as
var series = [
    serieA,
    serieB
]
It turned out that serieA already had _colorIndex === 0 from its first loading into a chart, and serieB got the same _colorIndex === 0 on auto-insert.
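One workaround, sketched here under the assumption that all of the injected metadata keys start with an underscore (freshSerie is a hypothetical helper, not a Highcharts API):
function freshSerie(serie) {
    var copy = {};
    for (var key in serie) {
        if (serie.hasOwnProperty(key) && key.charAt(0) !== '_') {
            copy[key] = serie[key]; // keep only the properties you set yourself
        }
    }
    return copy;
}

var series = [freshSerie(serieA), freshSerie(serieB)]; // both get fresh indexes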
Hope it helps someone

How to organise/nest data for d3.js chart output

I'm looking for some advice on how to effectively use large amounts of data with d3.js. Let's say, for instance, I have this data set taken from a raw .csv file (converted from Excel):
EA
,Jan_2016,Feb_2016,Mar_2016
Netherlands,11.7999,15.0526,13.2411
Belgium,25.7713,24.1374
France,27.6033,23.6186,20.2142
EB
,Jan_2016,Feb_2016,Mar_2016
Netherlands,1.9024,2.9456,4.0728
Belgium,-,6.5699,7.8894
France,5.3284,4.8213,1.471
EC
,Jan_2016,Feb_2016,Mar_2016
Netherlands,3.1499,3.1139,3.3284
Belgium,3.0781,4.8349,5.1596
France,16.3458,12.6975,11.6196
Using CSV, I guess the best way to represent this data would be something like:
Org,Country,Month,Score
EA,Netherlands,Jan,11.7999
EA,Belgium,Jan,27.6033
EA,France,Jan,20.2142
EA,Netherlands,Feb,15.0526
EA,Belgium,Feb,25.9374
EA,France,Feb,23.6186
EA,Netherlands,Mar,13.2411
EA,Belgium,Mar,24.1374
EA,France,Mar,20.2142
This seems very long-winded to me and would use up a lot of time. I was wondering if there was an easier way to do this?
From what I can think of, I assume JSON may be the more logical choice?
And for context on what kind of chart this data would go into: I would be looking to create a pie chart which can update the data depending on the country/month selected, comparing the three organisations' scores each time.
(plnk to visualise)
http://plnkr.co/edit/P3loEGu4jMRpsvTOgCMM?p=preview
Thanks for any advice, I'm a bit lost here.
I would say the intermediary step you propose is a good one for keeping everything organized in memory. You don't have to go through an intermediate CSV file though: you can load your original CSV file and turn it into an array of objects directly. Here is a parser:
d3.text("data.csv", function(error, dataTxt) {   // import data file as text first
    var dataCsv = d3.csv.parseRows(dataTxt);     // parseRows gives a 2D array
    var group = "";                              // the current group header ("organization")
    var times = [];                              // the current month headers
    var data = [];                               // the final data array, filled up progressively
    for (var i = 0; i < dataCsv.length; i++) {
        if (dataCsv[i].length == 1) {            // group name row
            if (dataCsv[i][0] == "")
                i++;                             // skip empty line
            group = dataCsv[i][0];               // get group name
            i++;
            times = dataCsv[i];                  // get list of time headings for this group
            times.shift();                       // (shift out first empty element)
        } else {
            var country = dataCsv[i].shift();    // regular row: get country name
            dataCsv[i].forEach(function(x, j) {  // enumerate values
                data.push({                      // create new data item
                    Org: group,
                    Country: country,
                    Month: times[j],
                    Score: x
                });
            });
        }
    }
});
This gives the following data array:
data= [{"Org":"EA","Country":"Netherlands","Month":"Jan_2016","Score":"11.7999"},
{"Org":"EA","Country":"Netherlands","Month":"Feb_2016","Score":"15.0526"}, ...]
This is IMO the most versatile structure you can have. Not the best for memory usage though.
A simple way to nest this is the following:
d3.nest()
    .key(function(d) { return d.Month + "-" + d.Country; })
    .map(data);
It will give a map with key-values such as:
"Jan_2016-Netherlands":[{"Org":"EA","Country":"Netherlands","Month":"Jan_2016","Score":"11.7999"},{"Org":"EB","Country":"Netherlands","Month":"Jan_2016","Score":"1.9024"},{"Org":"EC","Country":"Netherlands","Month":"Jan_2016","Score":"3.1499"}]
Use entries instead of map to get an array instead of a map, and use a rollup function if you want to simplify the data by keeping only the array of scores. At this point it is rather straightforward to plug it into any d3 drawing tool.
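For instance, a sketch combining the two (same d3 v3 nest API as above; the variable name scoresByKey is illustrative):
var scoresByKey = d3.nest()
    .key(function(d) { return d.Month + "-" + d.Country; })
    .rollup(function(leaves) {   // leaves = all rows sharing this key
        return leaves.map(function(d) { return +d.Score; }); // keep the scores only
    })
    .entries(data);
// e.g. [{key: "Jan_2016-Netherlands", values: [11.7999, 1.9024, 3.1499]}, ...]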
PS: a Plunker with the running code of this script. Everything is shown in the console.

Filter/Search JavaScript array of objects based on other array in Node JS

I have one array of ids and one array of JavaScript objects. I need to filter/search the objects array by the values in the ids array, in Node.js.
For example:
var id = [1, 2, 3];
var fullData = [
    {id: 1, name: "test1"},
    {id: 2, name: "test2"},
    {id: 3, name: "test3"},
    {id: 4, name: "test4"},
    {id: 5, name: "test5"}
];
Using the above data, as a result I need to have:
var result = [
    {id: 1, name: "test1"},
    {id: 2, name: "test2"},
    {id: 3, name: "test3"}
];
I know I can loop through both and check for matching ids, but is this the only way to do it, or is there a simpler, more resource-friendly solution?
The amount of data to be compared is about 30-40k rows.
This will do the trick, using Array.prototype.filter:
var result = fullData.filter(function(item) { // filter fullData on...
    return id.indexOf(item.id) !== -1;        // whether the current item's `id`
});                                           // is found in the `id` array
Please note that this filter function is not available in IE 8 or lower, but MDN has a polyfill available.
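If the id list is long, one possible optimization is to build a lookup table first, so each filter test is a constant-time property check instead of an indexOf scan over the whole id array (idSet is an illustrative name):
var idSet = {};
id.forEach(function(value) { idSet[value] = true; });

var result = fullData.filter(function(item) {
    return idSet[item.id] === true; // O(1) membership test per item
});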
As long as you're starting with an unsorted Array of all possible Objects, there's no way around iterating through it. Cerbrus' answer is one good way of doing this, with Array.prototype.filter, but you could also use loops.
But do you really need to start with an unsorted Array of all possible Objects?
For example, is it possible to filter these objects out before they ever get into the Array? Maybe you could apply your test when you're first building the Array, so that objects which fail the test never even become part of it. That would be more resource-friendly, and if it makes sense for your particular app, then it might even be simpler.
function insertItemIfPass(theArray, theItem, theTest) {
    if (theTest(theItem)) {
        theArray.push(theItem);
    }
}

// Insert your items by using insertItemIfPass
var i;
for (i = 0; i < theArray.length; i += 1) {
    doSomething(theArray[i]);
}
Alternatively, could you use a data structure that keeps track of whether an object passes the test? The simplest way to do this, if you absolutely must use an Array, would be to also keep an index to it. When you add your objects to the Array, you apply the test: if an object passes, then its position in the Array gets put into the index. Then, when you need to get objects out of the Array, you can consult the index: that way, you don't waste time going through the Array when you don't need to touch most of the objects in the first place. If you have several different tests, then you could keep several different indexes, one for each test. This takes a little more memory, but it can save a lot of time.
function insertItem(theArray, theItem, theTest, theIndex) {
    theArray.push(theItem);
    if (theTest(theItem)) {
        theIndex.push(theArray.length - 1);
    }
}

// Insert your items using insertItem, which also builds the index
var i;
for (i = 0; i < theIndex.length; i += 1) {
    doSomething(theArray[theIndex[i]]);
}
Could you sort the Array so that the test can short-circuit? Imagine a setup where you've got your array set up so that everything which passes the test comes first. That way, as soon as you hit your first item that fails, you know that all of the remaining items will fail. Then you can stop your loop right away, since you know there aren't any more "good" items.
// Insert your items, keeping items which pass theTest before items which don't
var i = 0;
while (i < theArray.length) {
    if (!theTest(theArray[i])) {
        break;
    }
    doSomething(theArray[i]);
    i += 1;
}
The bottom line is that this isn't so much a language question as an algorithms question. It doesn't sound like your current data structure (an unsorted Array of all possible items) is well-suited for your particular problem. Depending on what else the application needs to do, it might make more sense to use another data structure entirely, or to augment the existing structure with indexes. Either way, planned carefully, it will save you some time.

Store a data table as array of row objects, or as an object of column arrays?

Main question: whether to store a data table as array of row objects, or as an object of column arrays.
Proximate question: How to measure the memory footprint of an object.
Practical question: How do I read the memory profiler in Chrome?
Background
Working with rectangular data tables in JavaScript, both in the browser and/or Node.js. Many leading libraries like D3 and Crossfilter store data as arrays of objects, e.g.
var rows = [
    {name: 'apple',  price: 1.79, ...},
    {name: 'berry',  price: 3.49, ...},
    {name: 'cherry', price: 4.29, ...},
    ...
];
However, it seems with many columns (my use case) and potentially many rows, the overhead of storing keys can become very heavy, and it would be more efficient to store the data (and iterate over it) storing each column as an array, as in:
var cols = {
    name:  ['apple', 'berry', 'cherry', ...],
    price: [1.79, 3.49, 4.29, ...],
    ...
};
Profiling question
One answer to this post describes using the Chrome memory profile: JavaScript object size
I set up the simplistic benchmark below. The code can be copied/pasted into the Chrome console and executed. I then looked at the Chrome profiler, but I'm not sure how to read it.
At first glance, the retained size seems clearly in favor of columns:
window.rowData: 294,170,760 bytes
window.colData: 44,575,896 bytes
But if I click on each, they give me the same (huge) retained size:
window.rowData: 338,926,668 bytes
window.colData: 338,926,668 bytes
Benchmark code
The following code can be copy/pasted to Chrome console:
function makeid(len) {
    var text = "";
    var possible = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    for (var i = 0; i < len; i++)
        text += possible.charAt(Math.floor(Math.random() * possible.length));
    return text;
}

/* create set of 400 string keys with 8 random characters */
var keys = window.keys = [], i, c;
for (i = 0; i < 400; i++) {
    keys.push(makeid(8));
}

/* METHOD 1: Create array of objects with {colName: cellValue} pairs */
var rows = window.rowData = [];
for (i = 0; i < 10000; i++) {
    var row = {};
    for (c = 0; c < 400; c++) {
        row[keys[c]] = Math.random();
    }
    rows.push(row);
}

/* METHOD 2: Create set of columns {colName: [values]} */
var cols = window.colData = {};
for (c = 0; c < 400; c++) {
    var col = cols[keys[c]] = [];
    for (i = 0; i < 10000; i++) {
        col[i] = rows[i][keys[c]];
    }
}
I would be very careful about storing data in this fashion.
The main thing that worries me is usability. In my opinion, the biggest drawback of storing data in columns like this is that you become responsible for managing insertion and removal of data in an atomic fashion. You will need to be very careful to ensure that if you remove or insert a value in one column, you also remove or insert a value at the same location in all of the other columns. You'll also have to make sure that whatever is using the data does not read values in the middle of the removal/insertion: if something tries to read a "row" before the update finishes, it will see an inconsistent view, which would be a bad thing. This all sounds very complicated and generally unpleasant to handle in JavaScript to me.
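To make that concrete, here is a rough sketch of what even a minimal pair of mutation helpers has to do (the names are illustrative):
// every mutation must touch all columns together to keep rows aligned
function insertRow(cols, rowObj) {
    Object.keys(cols).forEach(function(key) {
        // push a placeholder when the row lacks a value for this column
        cols[key].push(rowObj.hasOwnProperty(key) ? rowObj[key] : null);
    });
}

function removeRow(cols, index) {
    Object.keys(cols).forEach(function(key) {
        cols[key].splice(index, 1); // forget one column and the table is corrupt
    });
}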
When data is stored as objects in an array, you can handle insertion/deletion very simply: just remove or add an entire object to the array and you're done. The whole operation is atomic, so you don't have to worry about timing, and you'll never have to worry about forgetting to remove an item from a column.
As far as memory usage is concerned, it really depends on the actual data you are storing. If you have data like that shown in your test example, where every "row" has a value in every "column", you will likely save some memory, because the interpreter does not need to store the names of the keys for each value in an object. How this is done is implementation-specific, however, and after a little research I couldn't really determine whether this is the case. I could easily imagine a clever interpreter using a lookup table to store shared key names, in which case you would have almost negligible overhead storing objects in an array compared to the column solution. Also, if your data happens to be sparse, i.e. not every row has a value for every column, you could actually use more memory storing data in columns: in the column scheme you need to store a value in every single column for every row, even if it's null or some other indicator of empty space, to maintain alignment. If you store objects in an array, you can leave off key/value pairs where necessary, and if there are a lot of them you can save a ton of memory. A sparse example is sketched below.
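To illustrate the sparse case (with made-up values):
var sparseRows = [
    {name: 'apple', price: 1.79},
    {name: 'berry'}              // no price: the key is simply absent
];
var sparseCols = {
    name:  ['apple', 'berry'],
    price: [1.79, null]          // a placeholder is required to stay aligned
};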
As Donald Knuth said, "Premature optimization is the root of all evil." By storing your data in columns like that, you will be taking on a lot of extra work to make sure that your data is consistent (which may lead to fragile code), and you will be making your code much harder to read, because people won't be expecting data to be stored like that. You should only inflict these things upon yourself if you really, really need to. My recommendation is to stick with the objects-in-an-array solution, since it makes your code much easier to read and write, and it's pretty unlikely that you actually need the memory saved by the column solution. If, down the line, you have performance issues, you can revisit the idea of storing data this way. Even then, I'd be willing to bet that there are other, easier ways of making things run faster.

How to change result position based off parameter in a mongodb / mongoose query?

So I am using Mongoose and Node.js to access a MongoDB database. I want to bump up each result based on a number (they are ordered by date created if none are bumped up). For example:
{ name: 'A', bump: 0 },
{ name: 'B', bump: 0 },
{ name: 'C', bump: 2 },
{ name: 'D', bump: 1 }
would be retrieved in the order: C, A, D, B. How can this be accomplished (without iterating through every entry in the database)?
Try something like this. Store a counter tracking the total number of threads; let's call it thread_count, initially set to 0, so you have a document somewhere that looks like {thread_count: 0}.
Every time a new thread is created, first call findAndModify() using {$inc : {thread_count:1}} as the modifier - i.e., increment the counter by 1 and return its new value.
Then when you insert the new thread, use the new value for the counter as the value for a field in its document, let's call it post_order.
So each document you insert has a value 1 greater each time. For example, the first 3 documents you insert would look like this:
{name:'foo', post_order:1, created_at:... } // value of thread_count is at 1
{name:'bar', post_order:2, created_at:... } // value of thread_count is at 2
{name:'baz', post_order:3, created_at:... } // value of thread_count is at 3
etc.
So effectively, you can query and order by post_order as ASCENDING, and it will return them in the order of oldest to newest (or DESCENDING for newest to oldest).
Then to "bump" a thread in its sorting order when it gets upvoted, you can call update() on the document with {$inc:{post_order:1}}. This will advance it by 1 in the order of result sorting. If two threads have the same value for post_order, created_at will differentiate which one comes first. So you will sort by post_order, created_at.
You will want to have an index on post_order and created_at.
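A rough sketch of this counter pattern with Mongoose; the Counter and Thread models and their field names are illustrative, not from the question:
var mongoose = require('mongoose');

var counterSchema = new mongoose.Schema({ thread_count: Number });
var Counter = mongoose.model('Counter', counterSchema);

// assumes a Thread model with post_order and created_at is defined elsewhere
function createThread(name, callback) {
    Counter.findOneAndUpdate(
        {},                            // the single counter document
        { $inc: { thread_count: 1 } }, // atomically bump the counter
        { new: true, upsert: true },   // create it if missing, return the new value
        function(err, counter) {
            if (err) return callback(err);
            Thread.create({
                name: name,
                post_order: counter.thread_count,
                created_at: new Date()
            }, callback);
        }
    );
}

// sorting by post_order (with created_at as tie-breaker) then gives the bumped order:
// Thread.find().sort({ post_order: -1, created_at: -1 }).exec(callback);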
Let's say your result set is in the variable response (which is an array); then I would do:
response.sort(function(obj1, obj2) {
    return obj2.bump - obj1.bump;
});
or, if you also want to take name order into account:
response.sort(function(obj1, obj2) {
    var diff = obj2.bump - obj1.bump;
    var nameDiff = (obj2.name > obj1.name) ? -1 : ((obj2.name < obj1.name) ? 1 : 0);
    return (diff == 0) ? nameDiff : diff;
});
Not a pleasant answer, but the solution you request is unrealistic. Here's my suggestion:
Add an OrderPosition property to your object instead of Bump.
Think of "bumping" as an event. It is best represented as an event-handler function. When an item gets "bumped" by whatever trigger in your business logic, the collection of items needs to be adjusted.
var currentOrder = this.OrderPosition;
this.OrderPosition = currentOrder - bump; // moves your object up the list
// write a foreach loop here, iterating every item AFTER the item's unadjusted
// order, +1 to move them all down the list one notch.
This does require iterating through many items, and I know you are trying to prevent that, but I do not think there is any other way to safely ensure the integrity of your item ordering, especially relative to other pulled collections later down the road.
I don't think a purely query-based solution is possible with your document schema (I assume you have createdDate and bump fields). Instead, I suggest a single field called sortorder to keep track of your desired retrieval order:
sortorder is initially the creation timestamp. If there are no "bumps", sorting by this field gives the correct order.
If there is a "bump," the sortorder is invalidated. So simply correct the sortorder values: each time a "bump" occurs swap the sortorder fields of the bumped document and the document directly ahead of it. This literally "bumps" the document up in the sort order.
When querying, sort by sortorder.
You can remove fields bump and createdDate if they are not used elsewhere.
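A rough sketch of the swap with Mongoose (Post is an illustrative model name, and note that the two updates are not atomic together):
function bumpPost(Post, doc, callback) {
    // assuming a descending sort (newest first), the document directly
    // ahead of `doc` is the one with the next larger sortorder
    Post.findOne({ sortorder: { $gt: doc.sortorder } })
        .sort({ sortorder: 1 })
        .exec(function(err, ahead) {
            if (err || !ahead) return callback(err); // already first, or error
            var aheadOrder = ahead.sortorder;
            Post.update({ _id: ahead._id }, { sortorder: doc.sortorder }, function(err) {
                if (err) return callback(err);
                Post.update({ _id: doc._id }, { sortorder: aheadOrder }, callback);
            });
        });
}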
As an aside, most social sites don't directly manipulate a post's display position based on its number of votes (or "bumps"). Instead, the number of votes is used to calculate a score, and the posts are sorted and displayed by that score. In your case, you could combine createdDate and bumps into a single score that can be sorted in a query.
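For illustration only, a gravity-style score in the spirit of the formulas those sites publish (the constants are arbitrary and would need tuning):
function score(doc) {
    var ageInHours = (Date.now() - doc.createdDate.getTime()) / 36e5;
    // more bumps push the score up; age steadily drags it down
    return (doc.bump + 1) / Math.pow(ageInHours + 2, 1.5);
}
Note that a score involving the current time can't be indexed directly; sites typically recompute and store it periodically so queries can sort on the stored field.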
This site (StackOverflow.com) had a related meta discussion about how to determine "hot" questions. I think there was even a competition to come up with a new formula. The meta question also shared the formulas used by two other popular social news sites: Y Combinator Hacker News and Reddit.
