More modular D3.js coding - javascript

Consider the following code snippet:
let circles = svg.selectAll("circle")
.data(data)
.attr("cx", d => d.x)
.attr("cy", d => d.y)
.attr("r", 2);
The three lines attr-cx, attr-cy, and attr-r operate internally using the following pseudo-code:
foreach d in update-selection:
    d.cx = (expression)
foreach d in update-selection:
    d.cy = (expression)
foreach d in update-selection:
    d.r = (constant)
Now suppose that we want to do it differently. We'd like to instead run:
foreach d in update-selection:
    d.cx = (expression)
    d.cy = (expression)
    d.r = (constant)
by writing either
let circles = svg.selectAll("circle")
.data(data)
.myfunction(d => d);
or
let circles = svg.selectAll("circle")
.data(data)
.myfunction(d);
We might want to do this because:
1. No matter how fast the iteration control, it's still faster if we iterate once rather than three times.
2. The sequence of attr-cx, attr-cy, and attr-r is not just three statements, but a sequence of many dozens or hundreds of statements (that manipulate attributes, among other changes), and we'd like to isolate them into a separate block for readability and testability.
3. As an exercise to better understand the options available when coding in D3.
How might you isolate the triple of attr statements through a single function call?
Update
Towards Reusable Charts is a rare post from Mike Bostock suggesting a way to organize a visualization by separating the bulk of the code into a separate module. You know the rest: modularity facilitates reuse, enhances teamwork by programming against APIs, enables testing, etc. Other D3.js examples for the most part suffer from a reliance on monolithic programming that is better suited to disposable one-shot visualizations. Are you aware of other efforts to modularize D3.js code?

TL;DR: there is no performance gain in replacing the chained attr methods with a single function that sets all attributes at once.
We can agree that typical D3 code is quite repetitive, sometimes with a dozen attr methods chained. As a D3 programmer I'm used to it now, but I understand why a lot of programmers cite that as their main complaint regarding D3.
In this answer I'll not discuss if that is good or bad, ugly or beautiful, nice or unpleasant. That would be just an opinion, and a worthless one. In this answer I'll focus on performance only.
First, let's consider a few hypothetical solutions:
Using d3-selection-multi: that may seem like the perfect solution, but actually it changes nothing: in its source code, d3-selection-multi simply takes the passed object and calls selection.attr several times, just like your first snippet.
However, if performance (your #1) is not an issue and your only concern is readability and testability (as in your #2), I'd go with d3-selection-multi.
Using selection.each: I believe that most D3 programmers will immediately think about encapsulating the chained attr in an each method. But in fact this changes nothing:
selection.each((d, i, n)=>{
d3.select(n[i])
.attr("foo", foo)
.attr("bar", bar)
//etc...
});
As you can see, the chained attr methods are still there. It's even worse, now that we have an additional each (attr uses selection.each internally).
Using selection.call or any other alternative and passing the same chained attr methods to the selection.
These are not adequate alternatives when it comes to performance. So, let's try other ways of improving performance.
Examining the source code of attr, we can see that internally it uses Element.setAttribute or Element.setAttributeNS. With that information, let's try to recreate your pseudocode with a method that loops over the selection only once. For that, we'll use selection.each, like this:
selection.each((d, i, n) => {
n[i].setAttribute("cx", d.x);
n[i].setAttribute("cy", d.y);
n[i].setAttribute("r", 2);
})
Now, let's test it. For this benchmark I wrote some very simple code, setting the cx, cy and r attributes of some circles. This is the default approach:
const data = d3.range(100).map(() => ({
x: Math.random() * 300,
y: Math.random() * 150
}));
const svg = d3.select("body")
.append("svg");
const circles = svg.selectAll(null)
.data(data)
.enter()
.append("circle")
.attr("cx", d=>d.x)
.attr("cy", d=>d.y)
.attr("r", 2)
.style("fill", "teal");
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.7.0/d3.min.js"></script>
And this is the approach using setAttribute in a single loop:
const data = d3.range(100).map(() => ({
x: Math.random() * 300,
y: Math.random() * 150
}));
const svg = d3.select("body")
.append("svg");
const circles = svg.selectAll(null)
.data(data)
.enter()
.append("circle")
.each((d, i, n) => {
n[i].setAttribute("cx", d.x);
n[i].setAttribute("cy", d.y);
n[i].setAttribute("r", 2);
})
.style("fill", "teal")
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.7.0/d3.min.js"></script>
Finally, the most important moment: let's benchmark it. I normally use jsPerf, but it's down for me, so I'm using another online tool. Here it is:
https://measurethat.net/Benchmarks/Show/6750/0/multiple-attributes
The results were disappointing: there is virtually no difference.
There is some fluctuation, sometimes one code is faster, but most of the times they are pretty equivalent.
However, it gets even worse: as another user correctly pointed out in their comment, the correct and dynamic approach would involve looping again in your second pseudocode (over the attributes, inside the loop over the elements). That would make the performance even worse.
Therefore, the problem is that your claim ("No matter how fast the iteration control, it's still faster if we iterate once rather than three times") is not necessarily true. Think of it like this: if you had a selection of 15 elements and 4 attributes, the question would be "is it faster to do 15 external loops with 4 internal loops each, or 4 external loops with 15 internal loops each?". Either way the code makes the same 60 (15 × 4) setAttribute calls, so nothing allows us to say that one is faster than the other.
Conclusion: there is no performance gain in replacing the chained attr methods with a single function that sets all attributes at once.
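The 15-elements-by-4-attributes comparison can be checked with a small sketch. It replaces DOM elements with a hypothetical recording object (`makeFakeElement` is not a DOM or D3 API) and counts setAttribute calls for both loop orders. It only shows that the amount of work is identical; it says nothing about real browser timings.

```javascript
// Hypothetical stand-in for a DOM element: it only records setAttribute calls,
// so the two loop orders can be compared without a browser or D3.
function makeFakeElement() {
  return {
    attrs: {},
    calls: 0,
    setAttribute(name, value) {
      this.calls += 1;
      this.attrs[name] = String(value);
    }
  };
}

// 15 data points and 4 attributes, as in the example.
const data = Array.from({ length: 15 }, (_, i) => ({ x: i * 10, y: i * 5 }));
const setters = [
  ["cx", d => d.x],
  ["cy", d => d.y],
  ["r", () => 2],
  ["fill", () => "teal"]
];

// Order A: one pass over the elements per attribute,
// which is what chained .attr() calls amount to.
const elementsA = data.map(() => makeFakeElement());
for (const [name, value] of setters) {
  elementsA.forEach((el, i) => el.setAttribute(name, value(data[i])));
}

// Order B: one pass over the elements, setting every attribute inside it,
// which is what the single .each() + setAttribute version does.
const elementsB = data.map(() => makeFakeElement());
elementsB.forEach((el, i) => {
  for (const [name, value] of setters) el.setAttribute(name, value(data[i]));
});

const totalCalls = els => els.reduce((sum, el) => sum + el.calls, 0);
console.log(totalCalls(elementsA), totalCalls(elementsB)); // 60 60
```

Both orders end with identical attributes on every element; only the nesting of the loops differs.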

Does the .call() method of the d3 selection do what you're after? Documentation at https://github.com/d3/d3-selection/blob/v1.4.1/README.md#selection_call
I have sometimes used this method to define a more 'modular' feeling update function, and even pass different functions into the call() to do different things as required.
In your example I think we can do:
function updateFunction(selection){
selection
.attr("cx", d => d.x)
.attr("cy", d => d.y)
.attr("r", 2);
}
let circles = svg.selectAll("circle")
.data(data)
.call(updateFunction);
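Taking this one step further, towards the "Towards Reusable Charts" style mentioned in the question, the update function can be built by a closure that holds its settings and exposes chainable getter/setters. The sketch below assumes that convention; `circleUpdater` is a hypothetical name, and `fakeSelection` is a minimal stand-in for a D3 selection (not D3 itself) so the code runs without a DOM.

```javascript
// Hypothetical reusable update function, following the closure + getter/setter
// convention from "Towards Reusable Charts". The returned function can be
// passed to selection.call(), like updateFunction.
function circleUpdater() {
  let radius = 2; // default, configurable via update.radius(value)

  function update(selection) {
    selection
      .attr("cx", d => d.x)
      .attr("cy", d => d.y)
      .attr("r", radius);
  }

  // Getter when called with no argument, chainable setter otherwise.
  update.radius = function (value) {
    if (!arguments.length) return radius;
    radius = value;
    return update;
  };

  return update;
}

// Hypothetical minimal stand-in for a D3 selection so the sketch runs without
// a DOM. Like D3, attr() accepts a constant or a function of the bound datum,
// and call() invokes the given function with the selection and returns it.
function fakeSelection(data) {
  const nodes = data.map(d => ({ __data__: d, attrs: {} }));
  const selection = {
    nodes,
    attr(name, value) {
      nodes.forEach((node, i) => {
        node.attrs[name] =
          typeof value === "function" ? value(node.__data__, i) : value;
      });
      return selection;
    },
    call(fn) {
      fn(selection);
      return selection;
    }
  };
  return selection;
}

// With real D3 this would read:
// svg.selectAll("circle").data(data).call(circleUpdater().radius(5));
const update = circleUpdater().radius(5);
const circles = fakeSelection([{ x: 1, y: 2 }, { x: 3, y: 4 }]).call(update);
console.log(circles.nodes.map(n => n.attrs));
```

The long list of attr statements lives in one named, configurable function that can be tested against any object exposing attr and call.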

Related

d3 Force: Making sense of data binding

I can recreate the following 1000 times and have enough of an understanding to do so. But I'm trying to get my head around a few specific bits that I just 'do', rather than understand:
var w = 900,
h = 500;
var svg = d3.select("body").append("svg")
.attr("width", w)
.attr("height", h)
.attr("style", "border: 1px solid grey;")
.on("mousemove", fn)
var force = d3.layout.force()
.size([w, h])
.on("tick", tick)
.gravity(0)
.charge(0)
.start()
function fn() {
var m = d3.mouse(this);
var point = {x: m[0], y: m[1]};
d3.select("#output").text(force.nodes().length)
var node = svg
.append("circle")
.data([point])
.attr("cx", function(d) {return d.x})
.attr("cy", function(d) {return d.y})
.attr("r", 0.1)
.transition().ease(Math.sqrt)
.attr("r", 5)
.transition().delay(1000)
.each("end", function() {
force.nodes().shift()
})
.remove()
force.nodes().push(point)
force.start()
}
function tick() {
svg.selectAll("circle")
.attr("cx", function(d) {return d.x})
.attr("cy", function(d) {return d.y})
}
In particular it's the data binding part I'm not sure about.
In function fn() (on mousemove of the svg space) we define a new point, and we need to do two things with it: push it into force.nodes() so that the x and y coordinates of the point can be manipulated by the forces configured in the force layout, and use the coordinates of the point to create and manipulate the visualisation.
So we create the point first off. We then build a circle to represent this point. We push the point into force.nodes() and after a short delay, we remove both the visualisation and the point from the force.nodes() array.
The bit I don't understand is how the visualisation and the point in the array stay "connected"?
Conjecture: The data point is an object which the force layout is constantly updating the x and y properties of. There is a "link" to this object bound to the circle element. The object is therefore easily accessed and used by the circle object, but not without us controlling that process. The circle is defined as having a cx and cy at point of its creation, but we need to keep accessing the underlying data to update its cx and cy?
If that's the case, how is the object "shared" by both force.nodes() and the circle element?
Or am I miles off the mark?
Also, I have read a lot of documentation on this, but I feel this is something more intrinsic to JavaScript than to D3, so it's not elaborated on in any literature I've read so far.
The link between the data structures that the force layout updates and the visualization (i.e. the DOM elements) is the tick event handler function. The tick event is generated by the force layout to signify that the force simulation has progressed another step (i.e. tick) and its internal state has changed. This signals that the visualization needs to be updated.
There are two parts to making this link happen. First, the data operated on by the force layout (i.e. the links and nodes) needs to be bound to DOM elements. This is done using the usual .selectAll().data().enter().append() pattern, usually in the initialisation code, sometimes in the tick event handler function. This establishes the link between data and DOM elements.
The second part to this is the code that updates the DOM elements when the force layout changes their positions. This is what happens in the tick event handler function. If you're not adding or removing elements, there's usually no need to rebind data and often you won't see the .selectAll().data() pattern, but only the code that actually updates the positions based on the data already bound to the elements (in your case this works even though you're changing the elements because the data binding happens in the function that updates the data for the force layout as well).
As an experiment, take an arbitrary force layout example and delete the tick event handler function -- you'll see that nothing happens at all even though the force layout is running.
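The "sharing" the question asks about is plain JavaScript reference semantics. As a sketch, assuming simplified stand-ins (`nodes` plays the role of force.nodes(), and `circle` plays the role of a DOM element), the code below shows that pushing an object into an array and binding it to an element store the same object, not copies. D3 keeps bound data on an element's `__data__` property; the plain object here only imitates that.

```javascript
// Hypothetical stand-ins: `nodes` plays the role of force.nodes(), and
// `circle` plays the role of a DOM element. D3 stores bound data on an
// element's __data__ property, so binding keeps a reference, not a copy.
const point = { x: 10, y: 20 };

const nodes = [];
nodes.push(point); // what force.nodes().push(point) does

const circle = { __data__: point }; // roughly what .data([point]) leaves on the element

// The layout mutates the object in place on each tick...
nodes[0].x += 5;
nodes[0].y -= 3;

// ...and the element's datum sees the change, because both hold the same object.
console.log(circle.__data__ === nodes[0]);         // true
console.log(circle.__data__.x, circle.__data__.y); // 15 17

// A tick handler then copies the current values into the element's attributes
// (plain properties here, standing in for the cx/cy attributes).
function tick() {
  circle.cx = circle.__data__.x;
  circle.cy = circle.__data__.y;
}
tick();
console.log(circle.cx, circle.cy); // 15 17
```

Without the tick step, the datum changes but the element's position never does, which is the experiment described above.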

Drawing multiple sets of multiple lines on a d3.js chart

I have a d3 chart that displays two lines showing a country's imports and exports over time. It works fine, and uses the modular style described in 'Developing a D3.js Edge' so that I could quite easily draw multiple charts on the same page.
However, I now want to pass in data for two countries and draw imports and exports lines for both of them. After a day of experimentation, and getting closer to making it work, I can't figure out how to do this with what I have. I've successfully drawn multi-line charts with d3 before, but can't see how to get there from here.
You can view what I have here: http://bl.ocks.org/philgyford/af4933f298301df47854 (or the gist)
I realise there's a lot of code. I've marked with "Hello" the point in script.js where the lines are drawn. I can't work out how to draw those lines once for each country, as opposed to just for the first one, which is what it's doing now.
I'm guessing that where I'm applying data() isn't correct for this usage, but I'm stumped.
UPDATE: I've put a simpler version on jsfiddle: http://jsfiddle.net/philgyford/RCgaL/
The key to achieving what you want are nested selections. You first bind the entire data to the SVG element, then add a group for each group in the data (each country), and finally get the values for each line from the data bound to the group. In code, it looks like this (I've simplified the real code here):
var svg = d3.select(this)
.selectAll('svg')
.data([data]);
var g = svg.enter().append('svg').append('g');
var inner = g.selectAll("g.lines").data(function(d) { return d; });
inner.enter().append("g").attr("class", "lines");
inner.selectAll("path.line.imports").data(function(d) { return [d.values]; })
.enter().append("path").attr('class', 'line imports')
.attr("d", function(d) { return imports_line(d); });
The structure generated by this looks like svg > g > g.lines > path.line.imports. I've omitted the code for the export line here -- that would be below g.lines as well. Your data consists of a list of key-value pairs with a list as value. This is mirrored by the SVG structure -- each g.lines corresponds to a key-value pair and each path to the value list.
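As a rough illustration of how the nested joins distribute the data, the sketch below mirrors the three levels with plain arrays instead of DOM elements. The sample data and `importsLine` (a stand-in for the original `imports_line` line generator) are hypothetical; no D3 is involved.

```javascript
// Hypothetical data in the shape the answer describes: a list of key/value
// pairs, one per country, where each value is a list of points.
const data = [
  { key: "UK", values: [{ year: 2000, imports: 5 }, { year: 2001, imports: 7 }] },
  { key: "FR", values: [{ year: 2000, imports: 4 }, { year: 2001, imports: 6 }] }
];

// Hypothetical stand-in for imports_line: builds an SVG path string
// "Mx,yLx,y..." from the points.
function importsLine(values) {
  return values
    .map((v, i) => (i === 0 ? "M" : "L") + v.year + "," + v.imports)
    .join("");
}

// Mirror the three levels of the join with plain arrays:
// the svg gets [data]; each g.lines gets one element of data (d => d);
// each path gets [d.values], so there is exactly one imports path per group.
const svgData = [data];
const tree = svgData.map(all =>
  all.map(group => ({
    className: "lines",
    key: group.key,
    paths: [group.values].map(values => ({
      className: "line imports",
      d: importsLine(values)
    }))
  }))
);

console.log(JSON.stringify(tree[0], null, 2));
```

One g.lines per country and one path inside each is what makes the lines appear once per country rather than only for the first one.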
Complete demo here.
The point is that you're thinking too imperatively. That's why you have so much code. I really can't put it better than Mike Bostock; you have to start Thinking with Joins:
svg.append("circle")
.attr("cx", d.x)
.attr("cy", d.y)
.attr("r", 2.5);
But that’s just a single circle, and you want many circles: one for each data point. Before you bust out a for loop and brute-force it, consider this mystifying sequence from one of D3’s examples.
Here data is an array of JSON objects with x and y properties, such as: [{"x": 1.0, "y": 1.1}, {"x": 2.0, "y": 2.5}, …].
svg.selectAll("circle")
.data(data)
.enter().append("circle")
.attr("cx", function(d) { return d.x; })
.attr("cy", function(d) { return d.y; })
.attr("r", 2.5);
I'll leave translating this example to the "from one line to many lines" case as an exercise.

Where is the notion of .data(xx.map(function(d))...) defined in JavaScript?

Sorry for the vague title (I will update it later, and I'll update tags too). In some code I am trying to understand, I found the following:
var map = svg.append("svg")
.attr({x: 10, y: 10})
.selectAll("path")
.data(cl.map(function(d) { // ????
return d3.range(d.x.length).map(function(i) {
return {x: d.x[i], y: d.y[i]};});}))
.enter().append("svg:path")
.attr("d", lineMap)
.style("fill", "none")
.style("stroke", "darkgreen")
.style("stroke-width", 1);
cl is a set of contour line data to be connected up as a path. My question is about the part I marked ????. I understand what this does in general terms, and even the details. Since I'm new to this stuff, I'm wondering where the .data and .map ideas are documented, and to what language they belong (JavaScript? JSON?). I've looked around and it's a bit tough to find answers when googling 'map data'! Also, is the name d, which seems to appear in most of these JavaScript functions, required? Or is it just a convention?
Here are a couple of pointers:
.map is a JavaScript array function (note that d3.arrays also have a map function) that returns a new array with the result of calling the callback function for each element in the original array.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map
.data is a d3 concept and is the way to bind a data set to a selection
http://alignedleft.com/tutorials/d3/binding-data
http://bost.ocks.org/mike/join/
Hope this helps.
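To see the two concepts separately, here is the marked expression rewritten in plain JavaScript with no D3 involved. The sample `cl` data and the `range` helper (standing in for `d3.range`) are hypothetical. The callback parameter is renamed from `d` to `contour` to show that `d` is only the conventional name D3 code gives to a datum, not a required one.

```javascript
// Hypothetical contour data in the shape the question implies:
// each contour stores its x and y coordinates in two parallel arrays.
const cl = [
  { x: [0, 1, 2], y: [5, 6, 7] },
  { x: [10, 11], y: [20, 21] }
];

// Plain-JavaScript stand-in for d3.range(n): [0, 1, ..., n - 1].
const range = n => Array.from({ length: n }, (_, i) => i);

// Same transformation as the marked line, with the callback parameter
// renamed from d to contour: the name is only a convention.
const lines = cl.map(contour =>
  range(contour.x.length).map(i => ({ x: contour.x[i], y: contour.y[i] }))
);

// One array of {x, y} points per contour; .data(lines) would then bind
// one such array to each path.
console.log(JSON.stringify(lines));
```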

What is the shorthand in d3 for the identity function ("function(d) { return d; }")?

Looking through the d3 docs, I see this code (the identity function) repeated everywhere:
function(d) { return d; }
Is there a built-in way in d3 to do this? I know I could create my own no-op identity function and use it everywhere, but it seems like d3 should provide this.
I was wondering why there wasn't a d3.identity function as part of the library, and couldn't find a reason not to have one.
From a performance point of view, defining an identity function gives better performance than reusing the Object constructor. It makes little difference if you reuse the same identity function across different types. Some performance tests are here.
So in my case I abused D3 and added the function myself:
d3.identity = function(d) { return d; }
If you're using underscore then you can also use the _.identity function.
Regarding using the Object constructor: when called as a function, Object(d) returns d itself if d is already an object (an array, for example), but wraps a primitive such as a number in a new wrapper object each time it's called, which wastes memory and CPU time, both for creation and garbage collection. This may be optimised away in some runtimes.
EDIT: Phrogz has a brief article showing some useful shorthand for reducing the number of lambdas when working with D3, that includes an identity function.
I used to see Mike do .data(Object) which seems to work
http://tributary.io/inlet/5842519
but I'm not sure why I don't see it around anymore
var svg = d3.select("svg")
var data = [[10,20],[30,40]];
svg.selectAll("g")
.data(data)
.enter()
.append("g")
.attr("transform", function(d,i) { return "translate(" + [i * 100, 0] + ")"})
.selectAll("circle")
//.data(function(d) { console.log(d); return d })
.data(Object)
.enter()
.append("circle")
.attr({
cx: function(d,i) { return 100 + i * 40 },
cy: 100,
r: function(d,i) { return d }
})
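A short sketch of why `.data(Object)` works in the example: the nested data are arrays, and calling `Object` as a function on something that is already an object returns it unchanged, so it behaves like an identity function there. For primitives the two differ. `identity` below is a local helper, not part of D3.

```javascript
const identity = d => d;

const pair = [10, 20];

// Called as a function, Object(value) returns value itself when it is
// already an object (arrays included), so .data(Object) passes each
// nested array through unchanged, just like an identity function.
console.log(Object(pair) === pair);   // true
console.log(identity(pair) === pair); // true

// For primitives the two differ: Object() wraps the value in a new
// wrapper object, while identity returns the primitive itself.
console.log(typeof Object(5));        // object
console.log(typeof identity(5));      // number
console.log(Object(5) === Object(5)); // false: a new wrapper each call
```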

Chained animations/transitions over each graph node - D3.js

I want to be able to change the radius of each node in my graph that I am creating using d3.js. However, I want to change the radius of each node one at a time, and I want to be able to control the delay between each change along with the sequence of the nodes.
For now this is what i have in terms of code:
var nodes = svg.selectAll(".node");
nodes.each(function() {
d3.select(this).
transition().
delay(100).
attr("r", "5")
});
You can replicate this simply by using the code at this link: http://bl.ocks.org/mbostock/4062045. The code that I have pasted above is simply an addition to the code at the aforementioned link.
When I run this, all the nodes in my graph transition simultaneously, i.e. grow in size (radius) simultaneously. However, I want them to transition, i.e. grow in size (radius), one at a time. I repeat that I want to be able to control:
the delay between the transition of each node and
the order of nodes that undergo the transitions.
Any pointers, tutorials, or even other stackoverflow answers would be great. I would ideally want some code examples.
The closest I have come to in terms of online references is this subsection of a tutorial on d3.js transitions: http://bost.ocks.org/mike/transition/#per-element. However, it lacks a concrete code example. Being new to d3.js and JavaScript in general, I am not able to pick it up without concrete code examples.
You can do this quite easily by calculating a delay based on each node's index. Mike Bostock has an example of such an implementation here. This is the relevant code:
var transition = svg.transition().duration(750),
delay = function(d, i) { return i * 50; };
transition.selectAll(".bar")
.delay(delay)
.attr("x", function(d) { return x0(d.letter); }); // this would be .attr('r', ... ) in your case
To control the order of the transition, all you would then have to do is sort the array so that the elements' indices reflect the animation flow you want. To see how to sort an array, refer to the documentation on JavaScript's array.sort method and also see the Arrays > Ordering section of the D3 API reference.
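The question's snippet gives every node the same constant .delay(100), which is why they all start growing together; the delay has to differ per node. As a sketch of the ordering part, the code below computes each node's rank under a chosen comparator and turns that rank into a delay. This is an alternative to re-sorting the bound array, not the answer's method. `staggeredDelays` and the sample nodes are hypothetical, and the commented D3 line assumes the selection is bound to these same node objects.

```javascript
// Hypothetical node data: grow nodes in order of their group, one at a time.
const nodes = [
  { name: "a", group: 3 },
  { name: "b", group: 1 },
  { name: "c", group: 2 }
];

// Compute each node's rank in the desired order without reordering the
// bound data, then turn the rank into a delay of rank * step milliseconds.
function staggeredDelays(data, compare, step) {
  const order = data.slice().sort(compare);
  const rank = new Map(order.map((d, i) => [d, i]));
  return d => rank.get(d) * step;
}

const delay = staggeredDelays(nodes, (a, b) => a.group - b.group, 100);

// With D3 the function would be passed as a per-element delay, e.g.:
// svg.selectAll(".node").transition().delay(delay).attr("r", 5);
console.log(nodes.map(d => [d.name, delay(d)]));
```

Changing the comparator changes the sequence, and changing `step` changes the gap between consecutive nodes.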
