I'm reading a CSV file and need to compute two figures from the data, using D3.js or plain JavaScript.
This could probably be done in one step, but I've broken it down for the purposes of explanation:
Once my data is read in, I need to iterate through each of the columns labelled "one" to "ten"
(the number of columns is unknown, so it might go up to twelve or twenty),
...each time multiplying each column that comes after "multiplier" by the value in the "multiplier" column
(in the data, I gave it arbitrary values of 1.5, 1 and 0.5 to make it easier to read).
This gives a new grid of figures. For each ID from 1 to n, a maximum and a minimum score must be calculated from these new figures, so each ID will have a max and a min. I also need the maximum and minimum of these new scores across the entire data set, returned as variables.
The data is read in:
d3.csv("data.csv", function(csv) {
var mydata = bars
.selectAll("rect")
.data(csv)
.enter();
});
The example data appears as:
ID,total,multiplier,one,two,three,four,five,six,seven,eight,nine,ten
1,16500,1.5,0.362,0.37,0.1,0.101,0.035,0.362,0.37,0.1,0.101,0.035
2,61000,1,0.426,0.382,0.115,0.084,0.053,0.426,0.382,0.115,0.084,0.053
3,48700,1.5,0.156,0.531,0.195,0.399,0.14,0.156,0.149,0.106,0.399,0.14
4,33000,0.5,0.462,0.409,0.149,0.106,0.149,0.106,0.085,0.1,0.106,0.051
5,8000,0.5,0.327,0.316,0.085,0.1,0.085,0.1,0.057,0.245,0.1,0.057
6,12760,1,0.149,0.195,0.057,0.245,0.057,0.245,0.119,0.114,0.245,0.08
This original data cannot be replaced as I reference it later.
So from this data, after iterating through all columns and taking the max and min over the whole data, the minimum is 0.003535 and the maximum is 3.8875575.
I need the function to return a var min and a var max for the next calculation.
Hope someone out there can help!
One way is to load your data as text and later use d3.csv.parseRows to parse the CSV as an array of arrays. Then take a slice of each row, ignoring the first 3 columns.
d3.text('data.csv', function(text)
{
var rows = d3.csv.parseRows(text, function(row, index)
{
// skip header, coerce to Number values
if(index > 0)
{
return row.map(Number);
}
});
var extent = rows.reduce(function(result, row)
{
return d3.extent(result.concat(row.slice(3).map(function(value)
{
return value * row[2];
})));
}, [NaN, NaN]);
var min = extent[0];
var max = extent[1];
});
Alternatively, if an array of objects is a more convenient structure for later plotting, you can do the following.
var nonMeasureColumns = {'ID': 0, 'total': 0, 'multiplier': 0};
d3.csv('data.csv')
.row(function(row)
{
for(var key in row)
{
row[key] = Number(row[key]);
}
return row;
})
.get(function(error, rows)
{
var extent = rows.reduce(function(result, row)
{
return d3.extent(result.concat(d3.map(row).entries()
.filter(function(entry)
{
return !(entry.key in nonMeasureColumns);
})
.map(function(entry)
{
return entry.value * row['multiplier'];
})
));
}, [NaN, NaN]);
var min = extent[0];
var max = extent[1];
});
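Since the question also allows plain JavaScript, here is a minimal sketch of the same idea without D3. It assumes the CSV text is already available as a string, that the first three columns are ID, total and multiplier, and that no field contains quoted commas. The name scoreExtents is made up for illustration. It also keeps a per-ID min and max, and it does not modify the original data.

```javascript
// Two rows copied from the sample data in the question.
const csvText = [
  "ID,total,multiplier,one,two,three,four,five,six,seven,eight,nine,ten",
  "3,48700,1.5,0.156,0.531,0.195,0.399,0.14,0.156,0.149,0.106,0.399,0.14",
  "4,33000,0.5,0.462,0.409,0.149,0.106,0.149,0.106,0.085,0.1,0.106,0.051"
].join("\n");

// Hypothetical helper: per-ID and overall min/max of (value * multiplier).
function scoreExtents(text) {
  const lines = text.trim().split("\n").slice(1); // drop the header row
  let min = Infinity;
  let max = -Infinity;
  const perId = {};
  lines.forEach(function (line) {
    const cells = line.split(",").map(Number);
    const id = cells[0];
    const multiplier = cells[2];
    // Every column after "multiplier", however many there are
    const scores = cells.slice(3).map(function (v) { return v * multiplier; });
    const rowMin = Math.min.apply(null, scores);
    const rowMax = Math.max.apply(null, scores);
    perId[id] = { min: rowMin, max: rowMax };
    min = Math.min(min, rowMin);
    max = Math.max(max, rowMax);
  });
  return { min: min, max: max, perId: perId };
}

const extents = scoreExtents(csvText);
console.log(extents.min, extents.max, extents.perId);
```

The overall min and max are available as extents.min and extents.max for the next calculation.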
Related
I'm learning crossfilter and want to filter some data.
So I have this big json file (It's actually csv) with almost 4 million lines of data.
The file looks like this:
timestamp,speed,power,distance,temperature,heart_rate,cadence,altitude,lat,long
1514806362,6569,172,6.63,14,90,87,2548,500870453,33664825
And all I'm trying to do is filter the distance.
d3.csv('data2.json').then(function(data) {
data.forEach(function(d, i) {
d.date = parseDate(d.timestamp);
});
// Create instance of crossfilter with dataset
var cf = crossfilter(data);
// Ask crossfilter how many rows it has / Size of data
dataSize = cf.size();
console.log("Data size: " + dataSize);
function parseDate(d) {
return new Date(d*1000);
}
var distanceDimension = cf.dimension(function(d) { return d.distance; });
console.log("Creating delay dimension");
// List top 3 distance
distanceDimensionTop3 = distanceDimension.top(3);
console.log("Top 3 distance");
console.table(distanceDimensionTop3);
// List bottom 3 distance
distanceDimensionBottom3 = distanceDimension.bottom(3);
console.log("Bottom 3 distance");
console.table(distanceDimensionBottom3);
// Apply filter to get only distance above 5000 meters
distanceDimension.filterFunction(function(d) {return d > 5000;});
console.log("Applying distance filter for only distance above 5000 meters");
// List bottom 3 distance
console.log("Bottom 3 distance with filter applied");
console.table(distanceDimension.bottom(3));
});
But somehow my code fails right at the beginning listing the top 3 distance.
I get a value of 99999.88 but in my data file, I have bigger values.
Also when I apply the filter to my dimension it doesn't filter right.
Thanks in advance.
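A likely cause, offered as an assumption rather than a confirmed diagnosis: d3.csv leaves every field as a string unless you convert it, so a dimension on d.distance is ordered as text, and "99999.88" sorts above longer numbers such as "123456.7". The sketch below shows the effect in plain JavaScript with made-up distance values, plus a hypothetical coerceRow helper that could be called inside the existing data.forEach before the crossfilter is created.

```javascript
// Made-up rows, shaped as d3.csv would deliver them (strings, not numbers).
const rawRows = [
  { timestamp: "1514806362", distance: "6.63" },
  { timestamp: "1514806363", distance: "99999.88" },
  { timestamp: "1514806364", distance: "123456.7" }
];

// Largest value according to the > operator (text order for strings).
function largest(values) {
  return values.reduce(function (a, b) { return b > a ? b : a; });
}

const topAsText = largest(rawRows.map(function (d) { return d.distance; }));

// Hypothetical helper: coerce numeric fields before building the crossfilter.
function coerceRow(d) {
  d.distance = +d.distance;
  d.timestamp = +d.timestamp;
  return d;
}

const topAsNumber = largest(
  rawRows.map(coerceRow).map(function (d) { return d.distance; })
);

console.log(topAsText);   // "99999.88"
console.log(topAsNumber); // 123456.7
```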
I'm working with "airplane" data set from this reference http://square.github.io/crossfilter/
date,delay,distance,origin,destination
01010001,14,405,MCI,MDW
01010530,-11,370,LAX,PHX
...
// Create the crossfilter for the relevant dimensions and groups.
var flight = crossfilter(flights),
all = flight.groupAll(),
date = flight.dimension(function(d) { return d.date; }),
dates = date.group(d3.time.day),
hour = flight.dimension(function(d) { return d.date.getHours() + d.date.getMinutes() / 60; }),
hours = hour.group(Math.floor),
delay = flight.dimension(function(d) { return Math.max(-60, Math.min(149, d.delay)); }),
delays = delay.group(function(d) { return Math.floor(d / 10) * 10; }),
distance = flight.dimension(function(d) { return Math.min(1999, d.distance); }),
distances = distance.group(function(d) { return Math.floor(d / 50) * 50; });
According to the Crossfilter documentation, "groups don't observe the filters on their own dimension", so we should be able to get filtered records from groups whose dimension is not currently filtered, shouldn't we?
I ran a test, but the result is not what I expected:
console.dir(date.group().all()); // 50895 records
console.dir(distance.group().all()); // 297 records
date.filter([new Date(2001, 1, 1), new Date(2001, 2, 1)]);
console.dir(date.group().all()); // 50895 records => unchanged, because we are filtering on its own dimension
console.dir(distance.group().all()); // 297 records => but this is unchanged too, and I don't know why
Could you please explain why the number of records from "distance.group().all()" is still the same as before the filter was applied? Am I missing something here?
If we really cannot get the filtered records from the distance dimension this way, how can I achieve this?
Thanks.
So, yes, this is the expected behavior.
Crossfilter will create a "bin" in the group for every value it finds by applying the dimension key and group key functions. Then when a filter is applied, it will apply the reduce-remove function, which by default subtracts the count of rows removed.
The result is that empty bins still exist, but they have a value of 0.
EDIT: here is the Crossfilter Gotchas entry with further explanation.
If you want to remove the zeros, you can use a "fake group" to do that.
function remove_empty_bins(source_group) {
return {
all:function () {
return source_group.all().filter(function(d) {
//return Math.abs(d.value) > 0.00001; // if using floating-point numbers
return d.value !== 0; // if integers only
});
}
};
}
https://github.com/dc-js/dc.js/wiki/FAQ#remove-empty-bins
This function wraps the group in an object which implements .all() by calling source_group.all() and then filters the result. So if you're using dc.js you could supply this fake group to your chart like so:
chart.group(remove_empty_bins(yourGroup));
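To see what the wrapper does without dc.js, here is a small sketch with a hand-made stand-in for a Crossfilter group. The stubGroup object and its values are invented for illustration; a real group would come from dimension.group().

```javascript
function remove_empty_bins(source_group) {
  return {
    all: function () {
      return source_group.all().filter(function (d) {
        return d.value !== 0; // integer counts only
      });
    }
  };
}

// Stand-in for a group after a filter has emptied the 50 and 150 bins.
const stubGroup = {
  all: function () {
    return [
      { key: 0, value: 12 },
      { key: 50, value: 0 },
      { key: 100, value: 3 },
      { key: 150, value: 0 }
    ];
  }
};

const nonEmpty = remove_empty_bins(stubGroup).all();
console.log(nonEmpty.map(function (d) { return d.key; })); // [0, 100]
```

The underlying group is untouched; only the view returned by .all() drops the zero-valued bins.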
The docs for d3's stacking function d3.stack show an example with an array of objects (each json object representing the ensemble of points for whatever the x-axis is measuring). Eg:
var data = [
{month: new Date(2015, 0, 1), apples: 3840, bananas: 1920, cherries: 960},
{month: new Date(2015, 1, 1), apples: 1600, bananas: 1440, cherries: 720}
]
I'm trying to produce a stacked histogram with a matrix of data series ([ [], [], [], etc ]). It's easy enough to iterate through the rows and get a series of histogram bins (having pre-defined the x scale and domain elsewhere):
for(let i=0; i<data.length; i++){
bins[i] = d3.histogram()
.domain(x.domain())
.thresholds(x.ticks(10))
(data[i]);
}
And create groups for each data series inside another loop:
let bars = this.svg.selectAll(".series" + i)
.data(this.bins[i])
.enter().append("g")
.classed("series" + i, true)
But of course doing it like that I get stuck here. How am I supposed to bars.append("rect") at the correct x,y coords for that particular series? Stated differently, I have a really useful array of bins at the moment, looking something like:
[
[[1,2,3,3], [5,8,9], [10], ... etc], //series0 grouping by bins of 5
[[1,3], [7,7,9,9], [11], ... etc], //series1
[[2,3,3], [8,9], [10,12], ... etc], //series2
...etc
]
Is there a way to invoke stack without munging all the data into json key,value pairs?
I took a glance at the source and no comments + single char variables = me understanding that it's not going to happen without munging. I present therefore my shoddy attempt at saving someone else some time:
/*
* Static helper method to transform an array of histogram bins into an array of objects
* suitable for feeding into the d3.stack() function.
* Args:
* bins (array): an array of d3 histogram bins
*/
static processBins(bins){
let temp = {}; // the keys for temp will be the bin name (i.e. the bin delimiter value)
// now create an object with a key for each bin, and an empty object as a placeholder for the data
bins[0].map( (bin) => { temp[bin.x0] = {}});
for(let i=0; i<bins.length; i++){
//traverse each series
bins[i].map( bin => {
temp[bin.x0]["series"+i] = bin.length; //push the frequency counts for each series
});
}
/* now we have an object whose top-level keys are the bins:
{
binName0: { series0: freqCount0, series1: freqCount1, ...},
binName1: {...},
...
}
now, finally, we're going to make an array of objects containing all the series' frequencies for each bin
*/
let result = [];
for(let binName in temp){ // iterate through the bin objects
let resultRow = {};
if(temp.hasOwnProperty(binName)){
resultRow["bin"] = binName; //put the bin name key/value pair into the result row
for(let seriesName in temp[binName]){ //iterate through the series keys
if(temp[binName].hasOwnProperty(seriesName)){
resultRow[seriesName] = temp[binName][seriesName];
}
}
}
result.push(resultRow);
}
return result;
}
Call like:
let stack = d3.stack().keys( bins.map( (d,i)=>{return "series"+i})); //stack based on series name keys
let layers = stack(MyCoolHistogram.processBins(bins));
//and now your layers are ready to enter() into a d3 selection.
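For reference, here is a compact sketch of the row shape that processBins produces, using the first three bins of series0 and series1 from the question. It assumes every series was binned with the same thresholds, so bin b of each series covers the same range. makeBin is a hypothetical stand-in for d3.histogram() output (an array of values carrying x0 and x1), and the x0 values 0, 5 and 10 are assumed from "bins of 5".

```javascript
// Hypothetical stand-in for a d3 histogram bin: an array with x0/x1 bounds.
function makeBin(x0, x1, values) {
  const bin = values.slice();
  bin.x0 = x0;
  bin.x1 = x1;
  return bin;
}

const bins = [
  [makeBin(0, 5, [1, 2, 3, 3]), makeBin(5, 10, [5, 8, 9]), makeBin(10, 15, [10])], // series0
  [makeBin(0, 5, [1, 3]), makeBin(5, 10, [7, 7, 9, 9]), makeBin(10, 15, [11])]     // series1
];

// One row per bin, with one "seriesN" frequency count per series.
const rows = bins[0].map(function (bin, b) {
  const row = { bin: bin.x0 };
  bins.forEach(function (series, i) {
    row["series" + i] = series[b].length;
  });
  return row;
});

// Keys to hand to d3.stack().keys(...)
const seriesKeys = bins.map(function (d, i) { return "series" + i; });

console.log(rows);
console.log(seriesKeys);
```

Each row then has the {bin, series0, series1, ...} shape that a stack generator configured with those keys can consume.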
Edit:
I note that in v4 the third argument passed to anonymous accessor functions seems to be the array of elements in the group, i.e. it's no longer the stack layer index. The old index-based approach is used, e.g., when grouping bars side-by-side: http://bl.ocks.org/mbostock/3943967
This breaks grouping functions that rely on this index number to calculate the x position:
rect.attr("x", (d,i,j) => { return x(d.data.bin) + j*barWidth/numberOfSeries});
I guess it's telling that Mike's gist still uses v3, despite being updated long after v4 came out.
To get the layer index you have to use the layer.index attribute directly. So when grouping you would translate the entire layer (which screws up bar-by-bar animations, of course... sigh).
let layers = stack(yourData); // "stack" is the generator configured with keys, as above
let layer = this.svg.selectAll(".layer")
.data(layers)
layer.transition()
.attr("transform", d => { return "translate(" + d.index*barWidth/numberOfSeries + ",0)"; });
I want to use Highcharts to create boxplots. As far as I can see in the docs, I need to provide Highcharts with the five-number summary up front, i.e., the min, max, q1, q3 and median values for each boxplot.
Given an arbitrary-length array consisting of numbers, how can I calculate these five numbers efficiently? Is there a quick way to do so in JS?
Although you have a solution for doing it server side, I took a few minutes to convert my PHP solution to a Javascript solution, to address the initial question.
step 1) function to calculate percentiles:
//get any percentile from an array
function getPercentile(data, percentile) {
data.sort(numSort);
var index = (percentile/100) * data.length;
var result;
if (Math.floor(index) == index) {
result = (data[(index-1)] + data[index])/2;
}
else {
result = data[Math.floor(index)];
}
return result;
}
//because .sort() compares elements as strings by default, which misorders numbers
function numSort(a,b) {
return a - b;
}
step 2) wrapper to grab min, max, and each of the required percentiles
//wrap the percentile calls in one method
function getBoxValues(data) {
var boxValues = {};
boxValues.low = Math.min.apply(Math,data);
boxValues.q1 = getPercentile(data, 25);
boxValues.median = getPercentile(data, 50);
boxValues.q3 = getPercentile(data, 75);
boxValues.high = Math.max.apply(Math,data);
return boxValues;
}
step 3) build a chart with it
example:
http://jsfiddle.net/jlbriggs/pvq03hr8/
[[edit]]
A quick update that contemplates outliers:
http://jsfiddle.net/jlbriggs/db11fots/
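As a quick check of the two steps above, here is a self-contained sketch run on made-up numbers. It uses the same percentile rule, but sorts a copy so the caller's array is not reordered; that change is mine, not part of the original answer.

```javascript
function numSort(a, b) {
  return a - b;
}

// Same percentile rule as above, applied to a sorted copy of the input.
function getPercentile(data, percentile) {
  const sorted = data.slice().sort(numSort);
  const index = (percentile / 100) * sorted.length;
  if (Math.floor(index) === index) {
    return (sorted[index - 1] + sorted[index]) / 2;
  }
  return sorted[Math.floor(index)];
}

function getBoxValues(data) {
  return {
    low: Math.min.apply(Math, data),
    q1: getPercentile(data, 25),
    median: getPercentile(data, 50),
    q3: getPercentile(data, 75),
    high: Math.max.apply(Math, data)
  };
}

const sample = [8, 3, 5, 1, 7, 2, 6, 4]; // made-up values
const box = getBoxValues(sample);

// A Highcharts boxplot point takes the order [low, q1, median, q3, high].
const point = [box.low, box.q1, box.median, box.q3, box.high];
console.log(point); // [1, 2.5, 4.5, 6.5, 8]
```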
I'm using highcharts.js to visualize data series from a database. There are lots of data series, and they can potentially change, since they are collected from the database with ajax. I can't guarantee that they are flawless, and sometimes they will have blank gaps in the dates, which is a problem. Highcharts simply draws a line through the entire gap to the next available date, and that's bad in my case.
The series exist in different resolutions: hours, days and weeks. This means that a couple of hours, days or weeks can be missing. A chart will only show one resolution at a time, and is redrawn if the resolution is changed.
The actual question is how to get Highcharts to not draw across those gaps, in an efficient way that works for hours, days and weeks.
I know a Highcharts line series will not draw a line over a gap if the gap begins with a null.
What I tried to do is use the resolution (noted as 0, 1 or 2 for hour, day or week) to loop through the array that contains the values and detect whether "this date + 1" != "what this date + 1 should be".
The code where I need to work this out is below, filled in with pseudocode:
for (var k in data.values) {
//help start, pseudocode.
if(object-after-k != k + resolution){ //The date after "this date" is not as expected
data.values.push(null after k)
}
//help end
HC_datamap.push({ //this is what I use to fill the highchart later, so not important
x: Date.parse(k),
y: data.values[k]
});
}
the k objects in data.values look like this
2015-05-19T00:00:00
2015-05-20T00:00:00
2015-05-21T00:00:00
...and more dates
as strings. They can number in thousands, and I don't want the user to have to wait forever. So performance is an issue and I'm not an expert here either
Please ask away for clarifications.
I wrote this loop.
In my case my data is always keyed to a date (12am) and it moves in intervals of 1 day, 1 week or 1 month. The loop is designed to work on an already prepared array of points ({x, y}). That's what dataPoints is; these are copied into finalDataPoints, which also gets the nulls. finalDataPoints is what is ultimately used as the series data. This uses momentjs, and forwardUnit is the interval (d, w, or M).
It assumes that the data points are already ordered from earliest x to latest x.
dataPoints.forEach(function (point, index) {
var plotDate = moment(point.x);
finalDataPoints.push(point);
var nextPoint = dataPoints[index+1];
if (!nextPoint) {
return;
}
var nextDate = moment(nextPoint.x);
while (plotDate.add(1, forwardUnit).isBefore(nextDate)) {
finalDataPoints.push({x: plotDate.toDate(), y: null});
}
});
Personally, I think an object with dates as property names may be a bit problematic. Instead I would create an array of data. Then a simple loop to fill the gaps shouldn't be very slow. Example: http://jsfiddle.net/4mxtvotv/ (note: I'm changing the format to an array, as suggested).
var origData = {
"2015-05-19T00:00:00": 20,
"2015-05-20T00:00:00": 30,
"2015-05-21T00:00:00": 50,
"2015-06-21T00:00:00": 50,
"2015-06-22T00:00:00": 50
};
// let's change to array format
var data = (function () {
var d = [];
for (var k in origData) {
d.push([k, origData[k]]);
}
return d;
})();
var interval = 'Date'; // or Hours, Month, FullYear etc. (must match a Date getter/setter name)
function fillData(data, interval) {
var d = [],
now = new Date(data[0][0]), // first x-point
len = data.length,
last = new Date(data[len - 1][0]), // last x-point
iterator = 0,
y;
while (now <= last) { // loop over all items
y = null;
if (now.getTime() == new Date(data[iterator][0]).getTime()) { //compare times
y = data[iterator][1]; // get y-value
iterator++; // jump to next date in the data
}
d.push([now.getTime(), y]); // set point
now["set" + interval](now["get" + interval]() + 1); // jump to the next period
}
return d;
}
var chart = new Highcharts.StockChart({
chart: {
renderTo: 'container'
},
series: [{
data: fillData(data, interval)
}]
});
Second note: I'm using Date.setDate() or Date.setMonth(); of course, if your data is UTC-based, it should be now["setUTC" + interval] (with the matching UTC getter).
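For the hour/day/week resolutions from the question, a fixed-step variant can be sketched in a single pass. This is only a sketch under stated assumptions: the keys are treated as UTC timestamps, the resolution codes 0, 1 and 2 map to hour, day and week as in the question, and fillGaps and STEP_MS are names invented here. It inserts one null point per gap, relying on the behaviour described in the question that a line series breaks where it meets a null.

```javascript
// Step length in milliseconds per resolution code (0 = hour, 1 = day, 2 = week).
const STEP_MS = {
  0: 60 * 60 * 1000,
  1: 24 * 60 * 60 * 1000,
  2: 7 * 24 * 60 * 60 * 1000
};

// Hypothetical helper: turns {dateString: value} into [{x, y}], adding a
// single null point after any point whose successor is more than one step away.
function fillGaps(values, resolution) {
  const step = STEP_MS[resolution];
  const entries = Object.keys(values)
    .map(function (k) { return { key: k, t: Date.parse(k + "Z") }; }) // assume UTC
    .sort(function (a, b) { return a.t - b.t; });
  const out = [];
  entries.forEach(function (cur, i) {
    out.push({ x: cur.t, y: values[cur.key] });
    const next = entries[i + 1];
    if (next && next.t - cur.t > step) {
      out.push({ x: cur.t + step, y: null }); // marks the start of the gap
    }
  });
  return out;
}

// Made-up daily series with 21 and 22 May missing.
const gapped = fillGaps({
  "2015-05-19T00:00:00": 20,
  "2015-05-20T00:00:00": 30,
  "2015-05-23T00:00:00": 50
}, 1);
console.log(gapped);
```

Because only one null is added per gap, the cost stays linear in the number of points even when a gap spans a long period.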