Highcharts Boxplots How to get five point summary? - javascript

I want to use Highcharts to create boxplots. As far as I can see in the docs, I need to provide Highcharts with the five-number summary up front, i.e., the min, max, q1, q3, and median values for each box.
Given an arbitrary-length array consisting of numbers, how can I calculate these five numbers efficiently? Is there a quick way to do so in JS?

Although you already have a solution for doing it server-side, I took a few minutes to convert my PHP solution to JavaScript, to address the initial question.
step 1) function to calculate percentiles:
//get any percentile from an array
function getPercentile(data, percentile) {
    //sort a copy so the caller's array is not reordered
    var sorted = data.slice().sort(numSort);
    var index = (percentile/100) * sorted.length;
    var result;
    if (Math.floor(index) === index) {
        result = (sorted[(index-1)] + sorted[index])/2;
    }
    else {
        result = sorted[Math.floor(index)];
    }
    return result;
}
//because .sort() compares values as strings by default
function numSort(a,b) {
    return a - b;
}
step 2) wrapper to grab min, max, and each of the required percentiles
//wrap the percentile calls in one method
function getBoxValues(data) {
var boxValues = {};
boxValues.low = Math.min.apply(Math,data);
boxValues.q1 = getPercentile(data, 25);
boxValues.median = getPercentile(data, 50);
boxValues.q3 = getPercentile(data, 75);
boxValues.high = Math.max.apply(Math,data);
return boxValues;
}
step 3) build a chart with it
example:
http://jsfiddle.net/jlbriggs/pvq03hr8/
Edit:
A quick update that contemplates outliers:
http://jsfiddle.net/jlbriggs/db11fots/
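For reference, here is a minimal self-contained sketch of how the summary could be handed to a boxplot series. It restates the percentile rule from step 1 on a sorted copy of the input. The helper names percentileOfSorted and toBoxplotPoint are made up for illustration, and the sketch relies on Highcharts accepting a boxplot point as an array in the order [low, q1, median, q3, high].

```javascript
// Percentile using the same rule as getPercentile above, on a pre-sorted array.
function percentileOfSorted(sorted, percentile) {
  const index = (percentile / 100) * sorted.length;
  if (Math.floor(index) === index) {
    // index falls exactly between two values: average them
    return (sorted[index - 1] + sorted[index]) / 2;
  }
  return sorted[Math.floor(index)];
}

// Hypothetical helper: returns [low, q1, median, q3, high] for one box.
function toBoxplotPoint(data) {
  const sorted = data.slice().sort((a, b) => a - b); // copy, then numeric sort
  return [
    sorted[0],
    percentileOfSorted(sorted, 25),
    percentileOfSorted(sorted, 50),
    percentileOfSorted(sorted, 75),
    sorted[sorted.length - 1]
  ];
}

const point = toBoxplotPoint([8, 1, 7, 2, 6, 3, 5, 4]);
console.log(point); // [1, 2.5, 4.5, 6.5, 8]
```

The resulting array could then be used as one entry of a series such as { type: 'boxplot', data: [point] }. Sorting once and reading min and max from the ends of the sorted copy avoids the repeated sorts in the wrapper above.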

Related

Crossfilter - Cannot get filtered records from other groups (NOT from associate groups)

I'm working with "airplane" data set from this reference http://square.github.io/crossfilter/
date,delay,distance,origin,destination
01010001,14,405,MCI,MDW
01010530,-11,370,LAX,PHX
...
// Create the crossfilter for the relevant dimensions and groups.
var flight = crossfilter(flights),
all = flight.groupAll(),
date = flight.dimension(function(d) { return d.date; }),
dates = date.group(d3.time.day),
hour = flight.dimension(function(d) { return d.date.getHours() + d.date.getMinutes() / 60; }),
hours = hour.group(Math.floor),
delay = flight.dimension(function(d) { return Math.max(-60, Math.min(149, d.delay)); }),
delays = delay.group(function(d) { return Math.floor(d / 10) * 10; }),
distance = flight.dimension(function(d) { return Math.min(1999, d.distance); }),
distances = distance.group(function(d) { return Math.floor(d / 50) * 50; });
According to the Crossfilter documentation, "groups don't observe the filters on their own dimension". So we should be able to get the filtered records from groups whose dimension is not currently filtered, shouldn't we?
I ran a test, but the result is not what I expected:
console.dir(date.group().all()); // 50895 records
console.dir(distance.group().all()); // 297 records
date.filter([new Date(2001, 1, 1), new Date(2001, 2, 1)]);
console.dir(date.group().all()); // 50895 records => unchanged, as expected, because the filter is on this group's own dimension
console.dir(distance.group().all()); // 297 records => but this number is unchanged too, and I don't know why
Could you please explain why the number of records in "distance.group().all()" is the same as before the filter was applied? Am I missing something here?
If we really cannot get the filtered records from the distance dimension this way, how can I achieve this?
Thanks.
So, yes, this is the expected behavior.
Crossfilter will create a "bin" in the group for every value it finds by applying the dimension key and group key functions. Then when a filter is applied, it will apply the reduce-remove function, which by default subtracts the count of rows removed.
The result is that empty bins still exist, but they have a value of 0.
EDIT: here is the Crossfilter Gotchas entry with further explanation.
If you want to remove the zeros, you can use a "fake group" to do that.
function remove_empty_bins(source_group) {
return {
all:function () {
return source_group.all().filter(function(d) {
//return Math.abs(d.value) > 0.00001; // if using floating-point numbers
return d.value !== 0; // if integers only
});
}
};
}
https://github.com/dc-js/dc.js/wiki/FAQ#remove-empty-bins
This function wraps the group in an object which implements .all() by calling source_group.all() and then filters the result. So if you're using dc.js you could supply this fake group to your chart like so:
chart.group(remove_empty_bins(yourGroup));
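To see the effect without loading Crossfilter, here is a small sketch that runs the wrapper against a stand-in object. stubGroup is hypothetical: it only imitates the { key, value } array that a real group's .all() returns after a filter has emptied some bins.

```javascript
// Same wrapper as above, repeated so the snippet runs on its own.
function remove_empty_bins(source_group) {
  return {
    all: function () {
      return source_group.all().filter(function (d) {
        return d.value !== 0; // integer counts only
      });
    }
  };
}

// Hypothetical stand-in for a crossfilter group after a filter was applied:
// every bin is still present, but filtered-out bins now count 0.
const stubGroup = {
  all: function () {
    return [
      { key: 0, value: 12 },
      { key: 50, value: 0 },
      { key: 100, value: 3 },
      { key: 150, value: 0 }
    ];
  }
};

const nonEmpty = remove_empty_bins(stubGroup).all();
console.log(stubGroup.all().length);     // 4
console.log(nonEmpty.map(d => d.key));   // [0, 100]
```

The bin count of the underlying group stays at 4 here, which mirrors the unchanged 297 in the question. Only the wrapped view drops the zero bins.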

Dc.js no update between charts

I am starting to use dc.js and trying to understand it.
Unfortunately, I cannot get my graphs to update when I select a value in one of them, the way they do in all the tutorials/examples.
I have made a jsfiddle here: http://jsfiddle.net/hqwzs3ko/12/
var ndx = crossfilter(dataSet);
dims = groups = {};
dims.countries = ndx.dimension(function(d) {
return d.countryCode;
});
dims.gender = ndx.dimension(function(d) {
return d.Gender;
});
dims.emailFlag = ndx.dimension(function(d) {
return d.emailFlag;
});
//dims.countries.filter("DEU");
groups.all = ndx.groupAll();
groups.countries = dims.countries.group();
groups.gender = dims.gender.group();
groups.emailFlag = dims.emailFlag.group();
The 3 graphs display 3 different dimensions, so shouldn't a filter applied to one also apply to the others?
Thanks in advance for your help.
Okay spotted.
All is working perfectly here: http://jsfiddle.net/hqwzs3ko/22/
The idea is to define a reduce function based on the value that you want to be counted on, in my case the number of records (clients).
Therefore the reduce function must be based on a different value than the one used for the dimension creation:
groups.clientsPerCountries = dims.countries.group().reduceCount(function (d) { return +d.key});
groups.clientsPerGender = dims.gender.group().reduceCount(function (d) { return +d.key});
groups.clientsPerEmailFlag = dims.emailFlag.group().reduceCount(function (d) { return +d.key});
With this everything is fine!
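A side note on the setup in the question, offered as a possible contributing factor and not a confirmed diagnosis (the fiddles are not reproduced here): the line dims = groups = {}; makes both names point at the same object, so assigning groups.countries replaces dims.countries. The sketch below shows this in plain JavaScript; fakeDimension is a hypothetical stand-in for a Crossfilter dimension.

```javascript
// Chained assignment creates ONE object with two names.
var dims, groups;
dims = groups = {};
console.log(dims === groups); // true

// Hypothetical stand-in for a crossfilter dimension and the group it produces.
var fakeDimension = {
  kind: 'dimension',
  group: function () { return { kind: 'group' }; }
};

dims.countries = fakeDimension;
groups.countries = dims.countries.group(); // also overwrites dims.countries
console.log(dims.countries.kind); // 'group'

// Separate objects keep dimensions and groups apart.
var dims2 = {}, groups2 = {};
dims2.countries = fakeDimension;
groups2.countries = dims2.countries.group();
console.log(dims2.countries.kind); // 'dimension'
```

If that is what happened, the renamed keys in the fix (clientsPerCountries and so on) would help simply because they no longer collide with the dimension keys. Also, as far as I can tell from the Crossfilter API reference, reduceCount() takes no arguments, so the accessor passed to it above is probably ignored.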

Row Chart grouping on two text dimensions [duplicate]

I need to create a row chart in dc.js with inputs from multiple columns in a CSV. So I need to map a column to each row, and each column's total to the row value.
There may be an obvious solution to this, but I can't seem to find any examples.
many thanks
S
update:
Here's a quick sketch. Apologies for the standard
Row chart;
column1 ----------------- 64 (total of column 1)
column2 ------- 35 (total of column 2)
column3 ------------ 45 (total of column 3)
Interesting problem! It sounds somewhat similar to a pivot, requested for crossfilter here. A solution comes to mind using "fake groups" and "fake dimensions", however there are a couple of caveats:
it will reflect filters on other dimensions
but, you will not be able to click on the rows in the chart in order to filter anything else (because what records would it select?)
The fake group constructor looks like this:
function regroup(dim, cols) {
var _groupAll = dim.groupAll().reduce(
function(p, v) { // add
cols.forEach(function(c) {
p[c] += v[c];
});
return p;
},
function(p, v) { // remove
cols.forEach(function(c) {
p[c] -= v[c];
});
return p;
},
function() { // init
var p = {};
cols.forEach(function(c) {
p[c] = 0;
});
return p;
});
return {
all: function() {
// or _.pairs, anything to turn the object into an array
return d3.map(_groupAll.value()).entries();
}
};
}
What it is doing is reducing all the requested rows to an object, and then turning the object into the array format dc.js expects group.all to return.
You can pass any arbitrary dimension to this constructor - it doesn't matter what it's indexed on because you can't filter on these rows... but you probably want it to have its own dimension so it's affected by all other dimension filters. Also give this constructor an array of columns you want turned into groups, and use the result as your "group".
E.g.
var dim = ndx.dimension(function(r) { return r.a; });
var sidewaysGroup = regroup(dim, ['a', 'b', 'c', 'd']);
Full example here: https://jsfiddle.net/gordonwoodhull/j4nLt5xf/5/
(Notice how clicking on the rows in the chart results in badness, because, what is it supposed to filter?)
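To make the expected output shape concrete, here is a plain-JavaScript sketch of the same init/add logic without Crossfilter. It only shows what the fake group's all() should hand to the row chart; it does not react to filters the way the groupAll version does. The function name sumColumns and the sample records are hypothetical, with values chosen to reproduce the totals in the question's sketch.

```javascript
// Sum the requested columns over all rows, then emit [{key, value}, ...],
// the array shape that dc.js expects from group.all().
function sumColumns(rows, cols) {
  const totals = {};
  cols.forEach(function (c) { totals[c] = 0; });           // init
  rows.forEach(function (row) {
    cols.forEach(function (c) { totals[c] += row[c]; });   // add
  });
  return cols.map(function (c) {
    return { key: c, value: totals[c] };
  });
}

// Hypothetical records, chosen to match the totals sketched in the question.
const rows = [
  { column1: 30, column2: 20, column3: 15 },
  { column1: 34, column2: 15, column3: 30 }
];

const bars = sumColumns(rows, ['column1', 'column2', 'column3']);
console.log(bars);
// [ { key: 'column1', value: 64 },
//   { key: 'column2', value: 35 },
//   { key: 'column3', value: 45 } ]
```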
Are you looking for stacked row charts? For example, this chart has each row represent a category and each color represents a sub-category:
Unfortunately, this feature is not yet supported in dc.js. The feature request is at https://github.com/dc-js/dc.js/issues/397. If you are willing to wade into some non-library code, you could check out the examples referenced in that issue.
Alternatively, you could use a stackable bar chart. This link seems to have a good description of how this works: http://www.solinea.com/blog/coloring-dcjs-stacked-bar-charts

Efficiently detect missing dates in array and inject a null (highcharts and jquery)

I'm using highcharts.js to visualize data series from a database. There are lots of data series, and they can potentially change in the database they are collected from via ajax. I can't guarantee that they are flawless, and sometimes they will have gaps in the dates, which is a problem. Highcharts simply draws a line across the entire gap to the next available date, and that's bad in my case.
The series exist in different resolutions: hours, days, and weeks. This means that a couple of hours, days, or weeks can be missing. A chart shows only one resolution at a time, and redraws if the resolution is changed.
The actual question is how to get Highcharts to not draw across those gaps, in an efficient way that works for hours, days, and weeks.
I know a Highcharts line series does not draw a line across a gap if the gap begins with a null.
What I tried to do is use the resolution (noted as 0, 1, 2 for hour, day, or week) to loop through the array that contains the values and detect whether "this date + 1" != (what "this date + 1" should be).
The code where I need to work this out is below, filled in with pseudocode.
for (var k in data.values) {
//help start, pseudocode.
if(object-after-k != k + resolution){ //The date after "this date" is not as expected
data.values.push(null after k)
}
//help end
HC_datamap.push({ //this is what I use to fill the highchart later, so not important
x: Date.parse(k),
y: data.values[k]
});
}
the k objects in data.values look like this
2015-05-19T00:00:00
2015-05-20T00:00:00
2015-05-21T00:00:00
...and more dates
as strings. They can number in thousands, and I don't want the user to have to wait forever. So performance is an issue and I'm not an expert here either
Please ask away for clarifications.
I wrote this loop.
In my case the data is always keyed to a date (12am), and it moves in intervals of 1 day, 1 week, or 1 month. It's designed to work on an already prepared array of points ({x,y}). That's what dataPoints is; these are copied to finalDataPoints, which also gets the nulls. finalDataPoints is what is ultimately used as the series data. This uses momentjs; forwardUnit is the interval (d, w, or M).
It assumes that the data points are already ordered from earliest x to latest x.
dataPoints.forEach(function (point, index) {
var plotDate = moment(point.x);
finalDataPoints.push(point);
var nextPoint = dataPoints[index+1];
if (!nextPoint) {
return;
}
var nextDate = moment(nextPoint.x);
while (plotDate.add(1, forwardUnit).isBefore(nextDate)) {
finalDataPoints.push({x: plotDate.toDate(), y: null});
}
});
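For the hour, day, and week resolutions from the question, the same loop can be sketched without a date library, assuming x is a millisecond timestamp and every step has a fixed length. That assumption holds for UTC timestamps but not for months or for local time across daylight-saving changes. STEP_MS and fillGaps are names made up for this sketch.

```javascript
// Fixed step lengths in milliseconds (valid for UTC-based timestamps).
const STEP_MS = {
  hour: 60 * 60 * 1000,
  day: 24 * 60 * 60 * 1000,
  week: 7 * 24 * 60 * 60 * 1000
};

// points: [{x: msTimestamp, y: number}, ...] sorted by ascending x.
// Returns a new array with {x, y: null} inserted for every missing step.
function fillGaps(points, resolution) {
  const step = STEP_MS[resolution];
  const filled = [];
  points.forEach(function (point, i) {
    filled.push(point);
    const next = points[i + 1];
    if (!next) {
      return; // last point, nothing to fill after it
    }
    for (let t = point.x + step; t < next.x; t += step) {
      filled.push({ x: t, y: null });
    }
  });
  return filled;
}

const sample = [
  { x: Date.UTC(2015, 4, 19), y: 20 },
  { x: Date.UTC(2015, 4, 20), y: 30 },
  { x: Date.UTC(2015, 4, 23), y: 50 } // the 21st and 22nd are missing
];
const filledSample = fillGaps(sample, 'day');
console.log(filledSample.length); // 5
```

Each input point is visited once and each missing step adds one entry, so the work grows linearly with the length of the filled series.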
Personally, I think an object with dates as property names may be a bit problematic. Instead, I would create an array of data. A simple loop to fill the gaps then shouldn't be very slow. Example: http://jsfiddle.net/4mxtvotv/ (note: I'm changing the format to an array, as suggested).
var origData = {
"2015-05-19T00:00:00": 20,
"2015-05-20T00:00:00": 30,
"2015-05-21T00:00:00": 50,
"2015-06-21T00:00:00": 50,
"2015-06-22T00:00:00": 50
};
// let's change to array format
var data = (function () {
var d = [];
for (var k in origData) {
d.push([k, origData[k]]);
}
return d;
})();
var interval = 'Date'; // suffix of a Date getter/setter pair: 'Date', 'Hours', 'Month', 'FullYear', ...
function fillData(data, interval) {
    var d = [],
        now = new Date(data[0][0]), // first x-point
        len = data.length,
        last = new Date(data[len - 1][0]), // last x-point
        iterator = 0,
        y;
    while (now <= last) { // loop over all periods
        y = null;
        if (now.getTime() == new Date(data[iterator][0]).getTime()) { // compare times
            y = data[iterator][1]; // get y-value
            iterator++; // jump to next date in the data
        }
        d.push([now.getTime(), y]); // set point
        // advance by one period, reading with the getter that matches the setter
        now["set" + interval](now["get" + interval]() + 1);
    }
    return d;
}
var chart = new Highcharts.StockChart({
chart: {
renderTo: 'container'
},
series: [{
data: fillData(data, interval)
}]
});
Second note: the loop steps with local-time setters such as Date.setDate() or Date.setMonth(). If your data is UTC-based, use the UTC variants instead, e.g. now["setUTC" + interval] together with the matching getUTC getter.

Calculate Max of Sum Product of D3 array

I'm reading a csv file, and need to compute two figures from this data using D3.js or normal JavaScript:
This might be able to be done in one step, but I've broken it down for the purposes of explanation:
Once my data is read in, I need to iterate through each of the columns, labelled "one" to "ten"
(the length of this data is an unknown length, so it might go up to twelve or twenty),
...each time multiplying each column which comes after "multiplier" by variable called "multiplier"
(in the data, I gave it arbitrary values of 1.5, 1, and 0.5 to make reading visually clearer).
This gives a new grid of figures from which a maximum score and minimum score of each of these new figures must be calculated for each ID from 1 to n. So each ID will have a max and minimum. I need to know the maximum and minimum of these new scores across the entire data returned as variables.
The data is read in:
d3.csv("data.csv", function(csv) {
var mydata = bars
.selectAll("rect")
.data(csv)
.enter();
});
The example data appears as:
ID,total,multiplier,one,two,three,four,five,six,seven,eight,nine,ten
1,16500,1.5,0.362,0.37,0.1,0.101,0.035,0.362,0.37,0.1,0.101,0.035
2,61000,1,0.426,0.382,0.115,0.084,0.053,0.426,0.382,0.115,0.084,0.053
3,48700,1.5,0.156,0.531,0.195,0.399,0.14,0.156,0.149,0.106,0.399,0.14
4,33000,0.5,0.462,0.409,0.149,0.106,0.149,0.106,0.085,0.1,0.106,0.051
5,8000,0.5,0.327,0.316,0.085,0.1,0.085,0.1,0.057,0.245,0.1,0.057
6,12760,1,0.149,0.195,0.057,0.245,0.057,0.245,0.119,0.114,0.245,0.08
This original data cannot be replaced as I reference it later.
So from this data, after iterating through all columns, and taking the max and min from each over the whole data --- the minimum is 0.003535 and the maximum is 3.8875575
...and I need the function to return a var min and var max for next calculation.
Hope someone out there can help!
One way is to load your data as text and later use d3.csv.parseRows to parse the CSV into an array of arrays. Then take a slice of each row, ignoring the first 3 columns.
d3.text('data.csv', function(text)
{
var rows = d3.csv.parseRows(text, function(row, index)
{
// skip header, coerce to Number values
if(index > 0)
{
return row.map(Number);
}
});
var extent = rows.reduce(function(result, row)
{
return d3.extent(result.concat(row.slice(3).map(function(value)
{
return value * row[2];
})));
}, [NaN, NaN]);
var min = extent[0];
var max = extent[1];
});
Alternatively, if an array of objects is a more convenient structure for later plotting, you can do the following.
var nonMeasureColumns = {'ID': 0, 'total': 0, 'multiplier': 0};
d3.csv('data.csv')
.row(function(row)
{
for(var key in row)
{
row[key] = Number(row[key]);
}
return row;
})
.get(function(error, rows)
{
var extent = rows.reduce(function(result, row)
{
return d3.extent(result.concat(d3.map(row).entries()
.filter(function(entry)
{
return !(entry.key in nonMeasureColumns);
})
.map(function(entry)
{
return entry.value * row['multiplier'];
})
));
}, [NaN, NaN]);
var min = extent[0];
var max = extent[1];
});
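The same computation can also be sketched without D3, which makes it easy to check on a small input. The snippet below uses the first five measure columns of two rows from the question's sample, with the header spelled "multiplier", and treats every column after "multiplier" as a measure, so the number of measure columns does not need to be known in advance. The function name scoreExtent is made up for this sketch.

```javascript
// First five measure columns of two rows from the question's sample data.
const csvText = [
  'ID,total,multiplier,one,two,three,four,five',
  '1,16500,1.5,0.362,0.37,0.1,0.101,0.035',
  '2,61000,1,0.426,0.382,0.115,0.084,0.053'
].join('\n');

// Returns the min and max of (value * multiplier) over all measure cells.
function scoreExtent(text) {
  const lines = text.trim().split('\n');
  const header = lines[0].split(',');
  const multiplierIndex = header.indexOf('multiplier');
  let min = Infinity;
  let max = -Infinity;
  lines.slice(1).forEach(function (line) {
    const cells = line.split(',').map(Number);
    const multiplier = cells[multiplierIndex];
    // every column after "multiplier" is treated as a measure
    cells.slice(multiplierIndex + 1).forEach(function (value) {
      const score = value * multiplier;
      if (score < min) { min = score; }
      if (score > max) { max = score; }
    });
  });
  return { min: min, max: max };
}

const extent = scoreExtent(csvText);
console.log(extent.min, extent.max); // roughly 0.0525 and 0.555
```

Because the products are computed on the fly, the parsed rows themselves are never modified, which matches the requirement that the original data stay available for later use.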
