dc.js lineChart performance issue with 8k+ items


This is my second question on the dc.js/d3.js/crossfilter.js topic. I am trying to build a basic personal dashboard, and I started by creating a very simple lineChart (with an associated rangeChart) that plots metrics over time.
The data is saved as JSON (it will be stored in a MongoDB instance at a later stage, so for now I used JSON that also keeps the datetime format) and looks like this:
[
{"date":1374451200000,"prodPow":0.0,"consPow":0.52,"toGridPow":0.0,"fromGridPow":0.52,"prodEn":0.0,"consEn":0.0,"toGridEn":0.0,"fromGridEn":0.0},
{"date":1374451500000,"prodPow":0.0,"consPow":0.34,"toGridPow":0.0,"fromGridPow":0.34,"prodEn":0.0,"consEn":0.0,"toGridEn":0.0,"fromGridEn":0.0},
{"date":1374451800000,"prodPow":0.0,"consPow":0.42,"toGridPow":0.0,"fromGridPow":0.42,"prodEn":0.0,"consEn":0.0,"toGridEn":0.0,"fromGridEn":0.0},
...
]
I have around 22000 entries like this and I am experiencing a lot of performance issues when opening the dashboard. Even if I slice the data down to 8000 records, the performance is still pretty bad (though at least the rendering finishes after some time) and interacting with the data is awful.
I am guessing that my code has some pitfall that makes it underperform, since I'd expect dc.js and crossfilter.js to struggle only with 100k+ entries and more than one dimension!
However, profiling with Chrome and reading online didn't help much (more details on what I tried to change later).
Here is my graph.js code:
queue()
.defer(d3.json, "/data")
.await(makeGraphs);
function makeGraphs(error, recordsJson) {
// Clean data
var records = recordsJson;
// Slice data to avoid browser deadlock
records = records.slice(0, 8000);
// Crossfilter instance
var ndx = crossfilter(records);
// Define Dimensions
var dateDim = ndx.dimension(function(d) { return d.date; });
// Define Groups
var consPowByDate = dateDim.group().reduceSum(function (d) { return d.consPow; });
var prodPowByDate = dateDim.group().reduceSum(function (d) { return d.prodPow; });
// Min and max dates to be used in the charts
var minDate = dateDim.bottom(1)[0]["date"];
var maxDate = dateDim.top(1)[0]["date"];
// Charts instance
var chart = dc.lineChart("#chart");
var volumeChart = dc.barChart('#volume-chart');
chart
.renderArea(true)
/* Make the chart as big as the bootstrap grid by not setting ".width(x)" */
.height(350)
.transitionDuration(1000)
.margins({top: 30, right: 50, bottom: 25, left: 40})
.dimension(dateDim)
/* Grouped data to represent and label to use in the legend */
.group(consPowByDate, "Consumed")
/* Function to access grouped-data values in the chart */
.valueAccessor(function (d) {
return d.value;
})
/* x-axis range */
.x(d3.time.scale().domain([minDate, maxDate]))
/* Auto-adjust y-axis */
.elasticY(true)
.renderHorizontalGridLines(true)
.legend(dc.legend().x(80).y(10).itemHeight(13).gap(5))
/* When on, you can't visualize values, when off you can filter data */
.brushOn(false)
/* Add another line to the chart; pass (i) group, (ii) legend label and (iii) value accessor */
.stack(prodPowByDate, "Produced", function(d) { return d.value; })
/* Range chart to link the brush extent of the range with the zoom focus of the current chart. */
.rangeChart(volumeChart)
;
volumeChart
.height(60)
.margins({top: 0, right: 50, bottom: 20, left: 40})
.dimension(dateDim)
.group(consPowByDate)
.centerBar(true)
.gap(1)
.x(d3.time.scale().domain([minDate, maxDate]))
.alwaysUseRounding(true)
;
// Render all graphs
dc.renderAll();
};
I used Chrome DevTools to do some CPU profiling. In summary, these are the results:
d3_json parsing at the top takes around 70ms (independent of the number of records)
with 2000 records:
makeGraphs takes slightly under 1s;
dimensions aggregated take around 11ms;
groups aggregated take around 8ms;
dc.lineChart takes around 16ms;
dc.barChart takes around 8ms;
rendering takes around 700ms (450ms for lineChart);
data interaction is not super smooth but it is still good enough.
with 8000 records:
makeGraphs takes around 6s;
dimensions aggregated take around 80ms;
groups aggregated take around 55ms;
dc.lineChart takes around 25ms;
dc.barChart takes around 15ms;
rendering takes around 5.3s (3s for lineChart);
data interaction is awful and filtering takes a lot of time.
with all records the browser stalls and I need to stop the script.
After reading this thread I thought it could be an issue with dates, so I tried modifying the code to use numbers instead of dates. Here is what I modified (I will write down only the changes):
// Added before creating the crossfilter to coerce a number date
records.forEach(function(d) {
d.date = +d.date;
});
// In both the lineChart and barChart I used a numeric range
.x(d3.scale.linear().domain([minDate, maxDate]))
Unfortunately nothing noticeable changed performance-wise.
I have no clue how to fix this, and I would actually like to add more groups, dimensions and charts to the dashboard...
Edit:
Here is a github link if you want to test my code by yourself.
I used python3 and flask for the server side, so you just have to install flask:
pip3 install flask
run the dashboard:
python3 dashboard.py
and then go with your browser to:
localhost:5000

It's hard to tell without trying it out but probably what is happening is that there are too many unique dates, so you end up with a huge number of DOM objects. Remember that JavaScript is fast, but the DOM is slow - so dealing with up to half a gigabyte of data should be fine, but you can only have a few thousand DOM objects before the browser chokes up.
This is exactly what crossfilter was designed to deal with, however! All you need to do is aggregate. You're not going to be able to see 1000s of points; they will only get lost, since your chart is (probably) only a few hundred pixels wide.
So depending on the time scale, you could aggregate by hour:
var consPowByHour = dateDim.group(function(d) {
return d3.time.hour(d);
}).reduceSum(function (d) { return d.consPow; });
chart.group(consPowByHour)
.xUnits(d3.time.hours)
or similarly for minutes, days, years, whatever. It may be more complicated than you need, but this example shows how to switch between time intervals.
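To make the effect of aggregation concrete, here is a minimal plain-JavaScript sketch that does not use crossfilter or d3. It assumes one reading every 5 minutes, as in the sample data; `makeRecords` and `sumByHour` are hypothetical helper names, and the bucketing floors to UTC hours, whereas `d3.time.hour` works in local time.

```javascript
// Plain-JavaScript illustration only; no d3, dc.js or crossfilter involved.
var HOUR_MS = 60 * 60 * 1000;
var FIVE_MIN_MS = 5 * 60 * 1000;

// Synthetic records shaped like the question's data: one reading every 5 minutes.
function makeRecords(count, startMs) {
  var records = [];
  for (var i = 0; i < count; i++) {
    records.push({ date: startMs + i * FIVE_MIN_MS, consPow: 0.5 });
  }
  return records;
}

// Sum consPow per hour, keyed by the start of the hour (UTC).
function sumByHour(records) {
  var buckets = {};
  records.forEach(function (d) {
    var hourStart = Math.floor(d.date / HOUR_MS) * HOUR_MS;
    buckets[hourStart] = (buckets[hourStart] || 0) + d.consPow;
  });
  // Same {key, value} shape that a crossfilter group returns from all().
  return Object.keys(buckets).map(function (k) {
    return { key: new Date(+k), value: buckets[k] };
  });
}

var records = makeRecords(8000, 1374451200000);
var hourly = sumByHour(records);
console.log(records.length + " records -> " + hourly.length + " hourly points");
```

With twelve readings per hour, the chart has to draw roughly one twelfth as many points, which is what keeps the DOM small.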
(I'm not going to install a whole stack to try this - most examples are JS only so it's easy to try them out in jsfiddle or whatever. If this doesn't explain it, then adding a screenshot might also be helpful.)
EDIT: I also notice that your data is integers but your scale is time-based. Maybe this causes objects to be built all the time. Please try:
records.forEach(function(d) {
d.date = new Date(+d.date);
});

Related

Issues sorting Row Chart in Y axis. Dc-js RowChart

This is my first question here, I'm going mad with a problem.
I'm using dc.js (a library on top of D3) and I'm trying to use my own data in one of its examples. It sorts the data just fine with about 10 rows, but beyond that the order is all over the place.
I want to group the data by price (Kurs) and add the volume together. Then sort it from low to high price.
This code runs just fine on "dc.barChart" but on rowChart it doesn't scale right.
I have been using the example code, but with my own CSV.
https://dc-js.github.io/dc.js/examples/row.html
var chart = dc.rowChart("#test");
d3.csv("tickdata.csv").then(function(experiments) {
experiments.forEach(function(x) {
x.Volym = +x.Volym;
x.Kurs = +(roundNumber(x.Kurs,0));
});
var ndx = crossfilter(experiments),
runDimension = ndx.dimension(function(d) {return +d.Kurs;}),
speedSumGroup = runDimension.group().reduceSum(function(d) {return +d.Volym;});
chart
.width(1024)
.height(600)
.margins({top: 20, right: 20, bottom: 20, left: 20})
.ordering(function(y){return -y.value.Kurs})
.elasticX(true)
.dimension(runDimension)
.group(speedSumGroup)
.renderLabel(true);
chart.on('pretransition', function() {
chart.select('y.axis').attr('transform', 'translate(0,10000)');
chart.selectAll('line.grid-line').attr('y2', chart.effectiveHeight());
});
chart
.render();
});
And the csv looks like this:
Tid,Volym,Volym_fiat,Kurs
2018-06-27 09:46:00,5320,6372,1515.408825
2018-06-27 09:47:00,3206,4421,1515.742652
2018-06-27 09:48:00,2699,4149,1515.013167
2018-06-27 09:49:00,3563,4198,1515.175342
I want to sort the Y axis by the "Kurs" value. I can make this work in a bar chart but it does not work in a row chart. Please help!

It would be easier to test this with a fiddle, but it looks like in your ordering function, you assume that the reduced value will have a Kurs field (y.value.Kurs).
However, when you use group.reduceSum(), just a simple numeric value will be produced.
So this should work
.ordering(function(y){return -y.value})
This is the default behavior in dc.js 2.1+ so you might not need the line at all.
Incidentally, if you ever have problems with guessing the right shape of the reduced data, the way to troubleshoot any accessor is to put a breakpoint or console.log inside. You should see what is going wrong pretty quick with a case like this.
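As a rough illustration of the shape problem, the sketch below uses a hand-written array in place of `group.all()` (the numbers are made up) and a hypothetical `orderBy` helper that stands in for how an ordering accessor gets applied. It also shows that, if the goal is to order rows by price rather than by summed volume, the price is in `key`, not `value`.

```javascript
// Shape produced by group().reduceSum(...): plain {key, value} pairs.
// These values are invented for the example.
var groups = [
  { key: 1516, value: 9541 },
  { key: 1515, value: 5905 },
  { key: 1517, value: 1200 }
];

// Hypothetical stand-in for the chart's sorting step: ascending by accessor result.
function orderBy(rows, accessor) {
  return rows.slice().sort(function (a, b) {
    return accessor(a) - accessor(b);
  });
}

var broken = function (d) { return -d.value.Kurs; }; // d.value is a number, so this is NaN
var byVolume = function (d) { return -d.value; };     // largest summed Volym first
var byPrice = function (d) { return d.key; };         // lowest Kurs first

console.log(isNaN(broken(groups[0])));
console.log(orderBy(groups, byVolume).map(function (d) { return d.key; }));
console.log(orderBy(groups, byPrice).map(function (d) { return d.key; }));
```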

dc.js line chart starting from zero after each point

I have a line chart as shown in the fiddle http://jsfiddle.net/djmartin_umich/qBr7y/
The graph works fine and plots as expected. But I need one change to be made so that the plots become triangular shaped and I could see a series of irregular triangles. I mean after every point in Y, it should drop to 0 and start afresh. I know we could achieve this by explicitly adding data points to point to 0. But, just wondering if we could do that without creating additional data points.
HTML:
<div id="line-chart"></div>
<div id="log">Incoming Data:</div>
JS:
var startDate = new Date("2011-11-14T16:17:54Z");
var currDate = moment(startDate);
var cf = crossfilter([{date: startDate, quantity: 1}]);
AddData();
var timeDimension = cf.dimension(function(d){ return d.date; });
var totalGroup = timeDimension.group().reduceSum(function(d){ return d.quantity; });
var lineChart = dc.lineChart("#line-chart")
.brushOn(false)
.width(800)
.height(200)
.elasticY(true)
.x(d3.time.scale().domain([startDate, currDate]))
.dimension(timeDimension)
.group(totalGroup);
dc.renderAll();
window.setInterval(function(){
AddData();
lineChart.x(d3.time.scale().domain([startDate, currDate]));
dc.renderAll();
}, 800);
function AddData(){
var q = Math.floor(Math.random() * 6) + 1;
currDate = currDate.add('day', 5);
cf.add( [{date: currDate.clone().toDate(), quantity: q}]);
$("#log").append(q + ", ");
}
CSS:
#log{
clear:both;
}
Thanks,
Vicky

You can use a "fake group" to achieve this effect.
This is a general-purpose technique for preprocessing data that allows you to change what the chart sees without modifying the data in the crossfilter. In this case, we want to add a data point immediately after each point returned by the crossfilter group.
The fake group wraps the crossfilter group in an object that works like a group. Since in most cases dc.js only needs to call group.all(), this is pretty easy:
function drop_to_zero_group(key_incrementor, group) {
return {
all: function() {
var _all = group.all(), result = [];
_all.forEach(function(kv) {
result.push(kv);
result.push({key: key_incrementor(kv.key), value: 0});
})
return result;
}
}
}
The fake group here produces two data points for each one it reads. The first is just a duplicate (or reference) of the original, and the second has its key incremented by a user-specified function.
It might make sense to parameterize this function by the zero value as well, but I mostly wanted to pull out the date incrementor, since that involves another trick. Here is a date incrementor:
function increment_date(date) {
return new Date(date.getTime()+1);
}
This uses date.getTime() to get the integer value (in milliseconds since the beginning of 1970), adds one, and converts back to a date.
Actually, the first time I tried this, I forgot to include +1, and it still worked! But I don't recommend that, since dc.js is likely to get confused if there is more than one point with the same x value.
Apply the fake group by wrapping the group before passing it to the chart
lineChart
.group(drop_to_zero_group(increment_date, totalGroup));
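To see what the chart receives, the wrapper can be tried without dc.js or crossfilter. The sketch below is a variant, not the code from the fiddle: `stubGroup` is a hypothetical stand-in for a real group, and `drop_to_value_group` is an assumed name for a version that also takes the inserted value as a parameter, as suggested above.

```javascript
// Stand-in for a crossfilter group; the wrapper only needs all().
var stubGroup = {
  all: function () {
    return [
      { key: new Date("2011-11-19T16:17:54Z"), value: 4 },
      { key: new Date("2011-11-24T16:17:54Z"), value: 2 }
    ];
  }
};

// Variant of drop_to_zero_group that takes the inserted value as a parameter.
function drop_to_value_group(key_incrementor, group, zero) {
  return {
    all: function () {
      var result = [];
      group.all().forEach(function (kv) {
        result.push(kv);                                            // original point
        result.push({ key: key_incrementor(kv.key), value: zero }); // drop right after it
      });
      return result;
    }
  };
}

// One millisecond later, so the two points never share an x value.
function increment_date(date) {
  return new Date(date.getTime() + 1);
}

var wrapped = drop_to_value_group(increment_date, stubGroup, 0);
wrapped.all().forEach(function (kv) {
  console.log(kv.key.toISOString(), kv.value);
});
```

Because the wrapper rebuilds its result on every call to `all()`, it picks up whatever the underlying group returns after new data is added.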
Here's a fork of DJ's fiddle: http://jsfiddle.net/gordonwoodhull/dwfgma8j/4/
FWIW I also changed dc.renderAll() to dc.redrawAll() in order to enable animated transitions instead of blinking white and rendering from scratch each time. The transitions are not perfect but I think it's still better than the blink. I have a fix but it's a breaking change so it will go into dc.js 2.1.

D3 - Dimple basic example improve rapidity

Let me explain my situation.
First, I have chosen to use Dimple because I am new to d3, and I see Dimple as a way to progressively get more familiar with d3 (but still produce interesting plots).
I want to plot a multiple line graph.
Each line represents the power demand at a location during the day.
The data is coming from a Python algorithm under the following shape:
{ time:[00:00:00...23:59:59], locationName1:[power values], ..., locationNameN:[]}
In order to plot it, I transformed it into a flat format, so I wrote a piece of code to create a csv file with 3 columns:
"Time,Location,Power_Demand"
"00:00,Home,1000"
"...,...,..."
My csv file is approximately 0.14MB
I use the following script to plot my result:
var svg = dimple.newSvg("#chartContainer", 1500, 800);
d3.csv("data.csv", function (data) {
var myChart = new dimple.chart(svg, data);
myChart.setBounds(100, 100, 1000, 620)
var x = myChart.addTimeAxis("x", "Time", "%H:%M:%S", "%H:%M");
x.addOrderRule("Time");
var y = myChart.addMeasureAxis("y", "Power_Demand");
y.overrideMax = 300000;
y.overrideMin = 0;
var s = myChart.addSeries(["Location"], dimple.plot.line);
myChart.addLegend(130, 10, 400, 35, "right");
myChart.draw();
});
It takes approximately 1 minute to draw.
My main question is: why is it that slow? Is it my JavaScript code?
In the end it's just 5 curves with 1439 points each... it should be quick.
(ps: I have also been a bit disappointed that working with a non-flat JSON object is not easier)

Alright, turned out that trying to follow this dimple example http://dimplejs.org/examples_viewer.html?id=lines_horizontal_stacked
made me format my data in a weird way without questioning it.
I have decided to use http://bl.ocks.org/mbostock/3884955 instead and realized that I could also write my data under this flat format:
Time,Location1,Location2,...,LocationN
00:00,power value1.1,power value2.1,...,power valueN.1
The result is instantaneous.
Not using Dimple was a little bit harder at first, but worth it in the end.
I am sure that my JavaScript code using Dimple wasn't the right way to proceed (probably because I am new to it). Still, it's a bit disappointing that there are no examples using a simpler dataset on the Dimple page. As a result, it turns out to be confusing to use a very simple dataset (in my opinion).
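For anyone starting from the three-column file, the reshaping into one column per location can be sketched in a few lines of plain JavaScript. This is only an illustration under assumptions: `toWide` is a hypothetical helper, the rows are hard-coded instead of being loaded with d3.csv, and "Office" is an invented second location.

```javascript
// Long format: one row per (Time, Location) pair, values still strings as in a CSV.
var longRows = [
  { Time: "00:00", Location: "Home", Power_Demand: "1000" },
  { Time: "00:00", Location: "Office", Power_Demand: "2500" },
  { Time: "00:01", Location: "Home", Power_Demand: "1100" },
  { Time: "00:01", Location: "Office", Power_Demand: "2400" }
];

// Wide format: one row per Time, one numeric property per location.
function toWide(rows) {
  var byTime = {};
  var order = []; // remembers first-seen order of the time stamps
  rows.forEach(function (r) {
    if (!Object.prototype.hasOwnProperty.call(byTime, r.Time)) {
      byTime[r.Time] = { Time: r.Time };
      order.push(r.Time);
    }
    byTime[r.Time][r.Location] = +r.Power_Demand; // coerce "1000" to 1000
  });
  return order.map(function (t) { return byTime[t]; });
}

var wide = toWide(longRows);
console.log(JSON.stringify(wide));
```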

vis.js graph not stabilizing even after hours

I have a network of around 1000 nodes. I have set stabilize: true and zoomExtentOnStabilize: true. The nodes are being added from JSON using the vis.network.gephiParser.parseGephi() function. When I try to plot this graph, it never stabilizes, even after hours of letting it idle. Graphs with fewer nodes, however, stabilize in a reasonable time. What am I missing here? Is there any way to stabilize big graphs? I even tried setting the number of stabilization iterations to 1000 and higher. Thanks in advance for the help.
P.S.: The coordinates of the nodes are not available from JSON. The graph is redrawn based on the user input.
EDIT 1:
The JSON data being plotted is available at http://pastebin.com/raw.php?i=Mzy4ncxw. I couldn't make a reproducible example at jsbin because of CORS error.
The JavaScript code is:
message = JSON.parse(json_data); // json_data is sent from R server.
var nodes = new vis.DataSet();
var edges = new vis.DataSet();
var container = document.getElementById("div_graph");
var data = {
nodes: nodes,
edges: edges
};
var options = {
tooltip: {
delay: 50,
fontColor: "black",
fontSize: 14,
fontFace: "verdana",
color: {
border: "#666",
background: "#FFFFC6"
}
},
clustering: {
enabled: clusteringOn,
clusterEdgeThreshold: 50
},
hideEdgesOnDrag: true,
stabilize: true,
zoomExtentOnStabilize: true,
navigation: true,
keyboard: true,
edges: {
inheritColor: "to"
}
};
var network = new vis.Network(container, data, options);
nodes.clear();
edges.clear();
var parsed = vis.network.gephiParser.parseGephi(message);
nodes.add(parsed.nodes);
edges.add(parsed.edges);
network.redraw();

I'm the developer of the network module of vis.js. We have used it to stabilize much larger sets than 1000 nodes. I can't really say what's going wrong here based on the information you supplied. I'd like to invite you to open an issue on our GitHub page; we try to collect all questions there. Can you share the code you use or your data (with labels scrambled for anonymity, of course)?
If I were to guess, a 1000 node system would stabilize with about 3000 iterations. If you are using dynamic smooth curves this increases greatly as support nodes are added to position the curves. I have used 15000 iterations for a 3000 node and 25000 edge system and even then it is not finished but I stop the simulation at that point regardless.
When you say redrawn on user input, is the data reloaded or redrawn in the sense that you see the dragging or zooming (similar to the redraw function)?
~ Alex
EDIT:
Based on your data I encountered a few problems.
First, it seems you do not allow the nodes to move but also do not supply their positions, leading to an infinite recursion in the quadtree building process. I'll make the gephiParser more robust for this in the future.
See here for settings of the gephi parser: http://visjs.org/docs/network.html#Gephi_import
Second, you use dynamic smooth curves and a lot of interconnected nodes. Each smooth curve has an invisible support node that helps the positioning. This makes your system unstable (look at it with stabilize off to see the behaviour). In the v4 version you can set your own timestep to rectify this, but alternatively you can change your physics settings. Try the configurePhysics option and see if that helps. You can still use static smooth curves for aesthetic purposes.
To wrap up, I could get your system to stabilize with static smooth curves in about 3000 iterations, taking about a minute. I disabled clustering in your options. I'd recommend you wait for the 4.0 release to use clustering as it will be much much more powerful.
EDIT 2:
Here is a JSBin showing a working stabilization with your code and data (although modified)
http://jsbin.com/tiwijixoha/5/edit?html,output
So if you meant that it does not stabilize in the sense that it does not hide itself and only show when it is ready, rather than never reaching a stabilized state, then the problem is that stabilization is only done with a setData(), not with a dataset update.
In this jsbin I have also changed your edges and altered the physics to make it stable. You can play around with it a bit more if you're unhappy with it.

d3.v3 scatterplot with all circles the same radius

Every example I have found shows all of the scatter plot points to be of random radii. Is it possible to have them all the same size? If I try to statically set the radius all of the circles will be very small (I'm assuming the default radius). However, if I use Math.random() as in most examples there are circles large and small. I want all the circles to be large. Is there a way to do that? Here's the code snippet forming the graph data using Math.random() (this works fine for some reason):
function scatterData(xData, yData)
{
var data = [];
for (var i = 0; i < seismoNames.length; i++)
{
data.push({
key: seismoNames[i],
values: []
});
var xVals=""+xData[i];
xVals=xVals.split(",");
var yVals=""+yData[i];
yVals=yVals.split(",");
for (var j = 0; j < xVals.length; j++)
{
data[i].values.push({
x: xVals[j],
y: yVals[j],
size: Math.random()
});
}
}
return data;
}
Math.random() spits out values between 0 and 1 such as 0.164259538891095 and 0.9842195005008699. I have tried putting these as static values in the 'size' attribute, but no matter what the circles are always really small. Is there something I'm missing?

Update: The NVD3 API has changed, and now uses pointSize, pointSizeDomain, etc. instead of just size. The rest of the logic for exploring the current API without complete documentation still applies.
For NVD3 charts, the idea is that all adjustments you make can be done by calling methods on the chart function itself (or its public components) before calling that function to draw the chart in a specific container element.
For example, in the example you linked to, the chart function was initialized like this:
var chart = nv.models.scatterChart()
.showDistX(true)
.showDistY(true)
.color(d3.scale.category10().range());
chart.xAxis.tickFormat(d3.format('.02f'));
chart.yAxis.tickFormat(d3.format('.02f'));
The .showDistX() and .showDistY() turn on the tick-mark distribution in the axes; .color() sets the series of colours you want to use for the different categories. The next two lines access the default axis objects within the chart and set the number format to be a two-digit decimal. You can play around with these options by clicking on the scatterplot option from the "Live Code" page.
Unfortunately, the makers of the NVD3 charts don't have complete documentation available yet describing all the other options you can set for each chart. However, you can use the JavaScript itself to find out what methods are available.
Inspecting a NVD3.js chart object to determine options
Open up a web page that loads the d3 and nvd3 libraries. The live code page on their website works fine. Then open up your developer's console command line (this will depend on your browser; search your help pages if you don't know how yet). Now, create a new nvd3 scatter chart function in memory:
var testChart = nv.models.scatterChart();
On my (Chrome) console, the console will then print out the entire contents of the function you just created. It is interesting, but very long and difficult to interpret at a glance. And most of the code is encapsulated so you can't change it easily. You want to know which properties you can change. So run this code in the next line of your console:
for (keyname in testChart){console.log(keyname + " (" + typeof(testChart[keyname]) + ")");}
The console should now print out neatly the names of all the methods and objects that you can access from that chart function. Some of these will have their own methods and objects you can access; discover what they are by running the same routine, but replacing the testChart with testChart.propertyName, like this:
for (keyname in testChart.xAxis){console.log(keyname + " (" + typeof(testChart.xAxis[keyname]) + ")");}
Back to your problem. The little routine I suggested above doesn't sort the property names in any order, but skimming through the list you should see three options that relate to size (which was the data variable that the examples were using to set radius)
size (function)
sizeDomain (function)
sizeRange (function)
Domain and range are terms used by D3 scales, so that gives me a hint about what they do. Since you don't want to scale the dots, let's start by looking at just the size property. If you type the following in the console:
testChart.size
It should print back the code for that function. It's not terribly informative for what we're interested in, but it does show me that NVD3 follows D3's getter/setter format: if you call .property(value) you set the property to that value, but if you call .property() without any parameters, it will return back the current value of that property.
So to find out what the size property is by default, call the size() method with no parameters:
testChart.size()
It should print out function (d) { return d.size || 1}, which tells us that the default value is a function that looks for a size property in the data, and if it doesn't exist returns the constant 1. More generally, it tells us that the value set by the size method determines how the chart gets the size value from the data. The default should give a constant size if your data has no d.size property, but for good measure you should call chart.size(1); in your initialization code to tell the chart function not to bother trying to determine size from the data and just use a constant value.
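The getter/setter convention described above can be mimicked in a few lines. This is a sketch of the pattern only, not NVD3's source: `makeChart` is a hypothetical factory, and it assumes the setter wraps a constant in a function, the way `d3.functor` does in D3 v3.

```javascript
// Hypothetical chart factory; it only mimics the getter/setter convention.
function makeChart() {
  // Default accessor, matching what testChart.size() printed above.
  var getSize = function (d) { return d.size || 1; };

  function chart() { /* drawing would happen here */ }

  chart.size = function (value) {
    if (!arguments.length) return getSize;   // no argument: act as a getter
    getSize = typeof value === "function"
      ? value
      : function () { return value; };       // wrap constants in an accessor
    return chart;                             // return the chart so calls can be chained
  };

  return chart;
}

var demoChart = makeChart();
console.log(demoChart.size()({ size: 0.25 })); // default accessor reads d.size
console.log(demoChart.size()({}));             // falls back to 1 when d.size is missing
demoChart.size(1);
console.log(demoChart.size()({ size: 0.25 })); // the constant now wins over the data
```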
Going back to the live code scatterplot, you can test that out. Edit the code to add in the size call, like this:
var chart = nv.models.scatterChart()
.showDistX(true)
.showDistY(true)
.color(d3.scale.category10().range())
.size(1);
chart.xAxis.tickFormat(d3.format('.02f'));
chart.yAxis.tickFormat(d3.format('.02f'));
Adding that extra call successfully sets all the dots to the same size -- but that size is definitely not 1 pixel, so clearly there is some scaling going on.
A first guess for getting bigger dots would be to change chart.size(1) to chart.size(100). Nothing changes, however. The default scale is clearly calculating its domain based on the data and then outputting to a standard range of sizes. This is why you couldn't get big circles by setting the size value of every data element to 0.99, even though that would create a big circle when some of the data was 0.01 and some was 0.99. Clearly, if you want to change the output size, you're going to have to set the .sizeRange() property on the chart, too.
Calling testChart.sizeRange() in the console to find out the default isn't very informative: the default value is null (nonexistent). So I just made a guess that, same as the D3 linear scale .range() function, the expected input is a two-element array consisting of the max and min values. Since we want a constant, the max and min will be the same. So in the live code I change:
.size(1);
to
.size(1).sizeRange([50,50]);
Now something's happening! But the dots are still pretty small: definitely not 50 pixels in radius, it looks closer to 50 square pixels in area. Having size computed based on the area makes sense when sizing from the data, but that means that to set a constant size you'll need to figure out the approximate area you want: values up to 200 look alright on the example, but the value you choose will depend on the size of your graph and how close your data points are to each other.
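If the size value is indeed an area in square pixels (which is how `d3.svg.symbol().size()` treats it for circles; that NVD3 passes the value straight through is an assumption here), the matching radius is the square root of the area divided by pi. A small sketch with hypothetical helper names:

```javascript
// Convert between a circle's area in square pixels and its radius in pixels.
function radiusForArea(area) {
  return Math.sqrt(area / Math.PI);
}

function areaForRadius(radius) {
  return Math.PI * radius * radius;
}

[50, 200].forEach(function (area) {
  console.log("area " + area + " -> radius " + radiusForArea(area).toFixed(2) + " px");
});

// Going the other way: the area to request for a dot with a 10 px radius.
console.log("radius 10 -> area " + Math.round(areaForRadius(10)));
```

So a value of 50 gives a radius of about 4 pixels and 200 gives about 8 pixels, which fits the observation that the dots looked much smaller than 50 pixels across.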
--ABR
P.S. I added the NVD3.js tag to your question; be sure to use it as your main tag in the future when asking questions about the NVD3 chart functions.

The radius is measured in pixels. If you set it to a value less than one, yes, you will have a very small circle. Most of the examples that use random numbers also use a scaling factor.
If you want all the circles to have a constant radius you don't need to set the value in the data, just set it when you add the radius attribute.
Not sure which tutorials you were looking at, but start here: https://github.com/mbostock/d3/wiki/Tutorials
The example "Three little circles" does a good step-by-step of the different things you can do with circles:
http://mbostock.github.io/d3/tutorial/circle.html
