How to organise/nest data for d3.js chart output - javascript

I'm looking for some advice on how to effectively use large amounts of data with d3.js. Let's say, for instance, I have this data set taken from a raw .csv file (converted from Excel):
EA
,Jan_2016,Feb_2016,Mar_2016
Netherlands,11.7999,15.0526,13.2411
Belgium,25.7713,24.1374
France,27.6033,23.6186,20.2142
EB
,Jan_2016,Feb_2016,Mar_2016
Netherlands,1.9024,2.9456,4.0728
Belgium,-,6.5699,7.8894
France,5.3284,4.8213,1.471
EC
,Jan_2016,Feb_2016,Mar_2016
Netherlands,3.1499,3.1139,3.3284
Belgium,3.0781,4.8349,5.1596
France,16.3458,12.6975,11.6196
Using CSV, I guess the best way to represent this data would be something like:
Org,Country,Month,Score
EA,Netherlands,Jan,11.7999
EA,Belgium,Jan,27.6033
EA,France,Jan,20.2142
EA,Netherlands,Feb,15.0526
EA,Belgium,Feb,25.9374
EA,France,Feb,23.6186
EA,Netherlands,Mar,13.2411
EA,Belgium,Mar,24.1374
EA,France,Mar,20.2142
This seems very long-winded to me and would take a lot of time to produce. Is there an easier way to do this?
From what I can think of, I assume that JSON may be the more logical choice?
For context, the chart this data would go into is a pie chart that updates depending on the country/month selected, comparing the three organisations' scores each time.
(Plunker to visualise)
http://plnkr.co/edit/P3loEGu4jMRpsvTOgCMM?p=preview
Thanks for any advice, I'm a bit lost here.

I would say the intermediary step you propose is a good one for keeping everything organized in memory. You don't have to go through an intermediate CSV file though: you can load your original CSV file as text and turn it into an array of objects. Here is a parser:
d3.text("data.csv", function(error, dataTxt) { // import data file as text first
  if (error) throw error;
  var dataCsv = d3.csv.parseRows(dataTxt); // parseRows gives a 2D array
  var group = ""; // the current group header ("organization")
  var times = []; // the current month headers
  var data = []; // the final data array, filled up progressively
  for (var i = 0; i < dataCsv.length; i++) {
    if (dataCsv[i].length == 1) { // group name
      if (dataCsv[i][0] == "")
        i++; // skip empty line
      group = dataCsv[i][0]; // get group name
      i++;
      times = dataCsv[i]; // get list of time headings for this group
      times.shift(); // (shift out first empty element)
    } else {
      var country = dataCsv[i].shift(); // regular row: get country name
      dataCsv[i].forEach(function(x, j) { // enumerate values
        data.push({ // create new data item
          Org: group,
          Country: country,
          Month: times[j],
          Score: x
        });
      });
    }
  }
  console.log(data); // inspect the result in the console
}); // close the d3.text callback
This gives the following data array:
data= [{"Org":"EA","Country":"Netherlands","Month":"Jan_2016","Score":"11.7999"},
{"Org":"EA","Country":"Netherlands","Month":"Feb_2016","Score":"15.0526"}, ...]
This is IMO the most versatile structure you can have. Not the best for memory usage though.
A simple way to nest this is the following:
d3.nest()
.key(function(d) { return d.Month+"-"+d.Country; })
.map(data);
It will give a map with key-values such as:
"Jan_2016-Netherlands":[{"Org":"EA","Country":"Netherlands","Month":"Jan_2016","Score":"11.7999"},{"Org":"EB","Country":"Netherlands","Month":"Jan_2016","Score":"1.9024"},{"Org":"EC","Country":"Netherlands","Month":"Jan_2016","Score":"3.1499"}]
Use entries instead of map to get an array instead of a map, and use a rollup function if you want to simplify the data by keeping only the array of scores. At this point it is rather straightforward to plug it into any d3 drawing tool.
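As a rough illustration of the rollup idea, here is a plain JavaScript sketch that groups the flat array by month and country and keeps only the scores per organisation. It does not use d3.nest; the helper name groupScores and the shortened sample rows are assumptions made for this example.

```javascript
// Sample rows in the flat {Org, Country, Month, Score} shape produced by the parser.
var data = [
  { Org: "EA", Country: "Netherlands", Month: "Jan_2016", Score: "11.7999" },
  { Org: "EB", Country: "Netherlands", Month: "Jan_2016", Score: "1.9024" },
  { Org: "EC", Country: "Netherlands", Month: "Jan_2016", Score: "3.1499" },
  { Org: "EA", Country: "France", Month: "Jan_2016", Score: "27.6033" }
];

// Hypothetical helper: group rows by "Month-Country" and keep one
// {org, score} entry per organisation, with the score converted to a number.
function groupScores(rows) {
  var groups = {};
  rows.forEach(function (d) {
    var key = d.Month + "-" + d.Country;
    if (!groups[key]) groups[key] = [];
    groups[key].push({ org: d.Org, score: parseFloat(d.Score) });
  });
  return groups;
}

var grouped = groupScores(data);
console.log(grouped["Jan_2016-Netherlands"]);
```

Each value in grouped is an array that could serve as the input for one pie chart. Cells such as "-" or missing values would come out as NaN with parseFloat, so they probably need to be filtered out before drawing.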
PS: a Plunker with the running code of this script. Everything is shown in the console.

Related

Parsing CSV file with non normalized data and overloaded delimiter in NodeJS

My aim is to parse a CSV dataset in which some rows contain non-normalized data enclosed in "". I cannot simply split by ";" because this character is also used inside the data.
Is there an easy way to solve this?
Some rows contain non-normalized data in the "goes_to" column, others in the "comes_from" column (see the rows 'John' and 'David'). This data uses the ";" delimiter, which creates problems.
Name;goes_to;comes_from
Peter;;London
Ruth;Boston;
Brandon;;
John;;"Bern;Madrid;Tel Aviv"
David;"New York;Paris;Berlin";
Eventually the aim is to normalize the data and put it into two separate multimap structures, so I am able to access that data individually.
comes_from_multimap.get('John'); >>> ['Bern', 'Madrid', 'Tel Aviv']
goes_to_multimap.get('David'); >>> ['New York', 'Paris', 'Berlin']
I use a line reader, read the CSV line by line, and manage to extract the string between the quotation marks with the following code, to decide whether the line needs normalisation. If the row contains non-normalized data, I would use a loop. With my approach, though, I lose the information about whether the data came from the "goes_to" or the "comes_from" column, because my code just gets the text between the two quotation marks without the context of where it came from.
nonNormalizedSubString = line.substring(line.indexOf("\"") + 1, line.lastIndexOf("\""));
Try csv-parse.
Set ; as the delimiter and you'll get an array of records, so you can use each element's position to determine which column it belongs to and create the dataset you need:
const csvParse = require('csv-parse')
const data = `Name;goes_to;comes_from
Peter;;London
Ruth;Boston;
Brandon;;
John;;"Bern;Madrid;Tel Aviv"
David;"New York;Paris;Berlin";`
// The parsed rows are delivered to the callback, not returned by parse()
csvParse.parse(data, {
  delimiter: ';',
  trim: true
}, (err, records) => {
  if (err) throw err;
  console.log(records);
});
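Once the rows are available as arrays, the two multimaps from the question could be built by splitting the second and third columns on ";". The sketch below hard-codes the records array as an assumption about what the parser delivers for the sample data, and the names splitCell, goesTo and comesFrom are made up for illustration.

```javascript
// Assumed parser output: one array per row, header row included.
const records = [
  ['Name', 'goes_to', 'comes_from'],
  ['Peter', '', 'London'],
  ['Ruth', 'Boston', ''],
  ['Brandon', '', ''],
  ['John', '', 'Bern;Madrid;Tel Aviv'],
  ['David', 'New York;Paris;Berlin', '']
];

// Split a cell on ';' and drop empty entries.
const splitCell = (cell) =>
  cell.split(';').map((s) => s.trim()).filter((s) => s !== '');

const goesTo = new Map();
const comesFrom = new Map();

// Skip the header row, then use the column position to pick the target map.
records.slice(1).forEach(([name, goes, comes]) => {
  goesTo.set(name, splitCell(goes));
  comesFrom.set(name, splitCell(comes));
});

console.log(comesFrom.get('John'));
console.log(goesTo.get('David'));
```

Names without any entry in a column, such as Brandon, end up with an empty array in both maps.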

D3 making new, smaller CSV file

I'm stuck on a fairly simple problem and need help.
I have a big CSV file with 50 columns which I absolutely can't modify.
Now I want to make a chart where I only need 5-6 columns out of it.
My idea was to make a new "data2" which contains only these 5-6 columns (with keys and everything) and work with this data2.
But I'm not able to create this data2.
To filter which columns I need, I wanted to work with a regex. Something like this:
d3.keys(data[0]).filter(function(d) { return d.match(/.../); })
But how do I create the new data2 then? I'm sure I need to work with d3.map, but even with the API docs I'm not able to understand how it works correctly.
Can someone help me out?
Firstly, your question's title is misleading: you're not asking about making a smaller CSV file, since the file itself is not changed. You're asking about changing the data array created by D3 when that CSV was parsed.
That brings us to the second point: you don't need to do that. Since you already lost some time/resources loading the CSV and parsing that CSV, the best idea is just keeping it the way it is, and using only those 5 columns you want. If you try to filter some columns out (which means deleting some properties from each object in the array) you will only add more unnecessary tasks for the browser to execute. A way better idea is changing the CSV itself.
However, if you really want to do this, you can use the array property that d3.csv creates when it loads a CSV, called columns, and a for...in loop to delete some properties from each object.
For instance, here...
var myColumns = data.columns.slice(0, 4); // slice, unlike splice, leaves data.columns intact
... I'm getting the first 4 columns in the CSV. Then, I use this array to delete, in each object, the properties regarding all other columns:
var filteredData = data.map(function(d) {
  for (var key in d) {
    if (myColumns.indexOf(key) === -1) delete d[key];
  }
  return d;
});
Here is a demo. I'm using a <pre> element because I cannot use a real CSV in the Stack snippet. My "CSV" has 12 columns, but my filtered array keeps only the first 4:
var data = d3.csvParse(d3.select("#csv").text());
var myColumns = data.columns.slice(0, 4); // slice, unlike splice, leaves data.columns intact
var filteredData = data.map(function(d) {
  for (var key in d) {
    if (myColumns.indexOf(key) === -1) delete d[key];
  }
  return d;
});
console.log(filteredData);
pre {
display: none;
}
<script src="https://d3js.org/d3.v4.min.js"></script>
<pre id="csv">foo,bar,baz,foofoo,foobar,foobaz,barfoo,barbar,barbaz,bazfoo,bazbar,bazbaz
1,2,5,4,3,5,6,5,7,3,4,3
3,4,2,8,7,6,5,6,4,3,5,4
8,7,9,6,5,6,4,3,4,2,9,8</pre>
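If the original rows should stay untouched, another option is to build new objects that contain only the wanted keys, which is closer to the "data2" idea from the question. This is a minimal sketch in plain JavaScript; the sample rows and the regex /^foo/ are assumptions chosen for illustration.

```javascript
// Small stand-in for the parsed CSV: an array of row objects.
var data = [
  { foo: "1", bar: "2", baz: "5", foofoo: "4", foobar: "3" },
  { foo: "3", bar: "4", baz: "2", foofoo: "8", foobar: "7" }
];

// Keep only the columns whose name matches the pattern
// (here: names starting with "foo").
var wanted = Object.keys(data[0]).filter(function (key) {
  return /^foo/.test(key);
});

// Build new objects instead of deleting properties from the originals.
var data2 = data.map(function (d) {
  var row = {};
  wanted.forEach(function (key) {
    row[key] = d[key];
  });
  return row;
});

console.log(data2);
```

Here data keeps all of its columns, while data2 holds only the selected ones.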

JavaScript - Can't seem to 'reset' array passed to a function

I am somewhat new to JavaScript, but I am reasonably experienced with programming in general. I suspect that my problem may have something to do with scoping, or the specifics of passing arrays as parameters, but I am uncertain.
The high-level goal is to have live plotting with several 'nodes', each of which generates 50 points/sec. I have gotten this working running straight into an array and rendered by dygraphs and C3.js and quickly realized that this is too much data to continually live render. Dygraphs seems to start impacting the user experience after about 30s and C3.js seems to choke at around 10s.
The next attempt is to decimate the plotted data based on zoom level.
I have data saved into an 'object' which I am using somewhat like a dictionary in other languages. This is going well using AJAX requests. The idea is to create a large data buffer using AJAX requests and use the keys to store the data generated by units according to the serial number as the keys. This is working well and the object is being populated as expected. I feel that it is informative to know the 'structure' of this object before I get to my question. It is as follows:
{
1: [[x10,y10], [x11,y11], [...], [x1n, y1n]],
2: [[x20,y20], [x21,y21], [...], [x2n, y2n]],
... : [ ... ]
a: [[xa0,ya0], [xa1,ya1], [...], [xan, yan]]
}
Periodically, a subset of that data will be used to generate a dygraphs plot. I am decimating the stored data and creating a 'plot buffer' to hold a subset of the actual data.
The dygraphs library takes data in several ways, but I would like to structure it 'natively', which is just an array of arrays. Each array within the array is a 'row' of data. All rows must have the same number of elements in order to line up into columns. The data generated may or may not be at the same time. If the data x values perfectly match, then the resulting data would look like the following for only two nodes since x10 = x20 = xn0:
[
[x10, y10, y20],
[x11, y11, y21],
[ ... ],
[x1n, y1n, y2n]
]
Note that this is just x and y in rows. In reality, the times for each serial number may not line up, so it may be much closer to:
[
[x10, y10, null],
[x20, null, y20],
[x11, y11, y21],
[ ... ],
[x1n, y1n, y2n]
]
Sorry for all of the background. We can now get to the code that I'm having trouble with. I periodically attempt to create the plot buffer using the following code:
window.intervalId = setInterval(
  function(){
    var plotData = formatData(nodeData, 45000, 49000, 200);
    /* dygraphs stuff here */
  },
  500
);
function formatData(dataObject, start, end, stride){
  var smallBuffer = [];
  var keys = Object.keys(dataObject);
  keys.forEach(
    function(key){
      console.log('key: ', key);
      mergeArrays(dataObject[key], smallBuffer, start, end, stride);
    }
  );
  return smallBuffer;
}
function mergeArrays(sourceData2D, destDataXD, startInMs, endInMs, strideInMs){
  /* ensure that the source data isn't undefined */
  if(sourceData2D){
    /* if the destDataXD is empty, then basically copy the
     * sourceData2D into it as-is taking the stride into account */
    if(destDataXD.length == 0){
      /* does sourceData2D have a starting point in the time range? */
      var startIndexSource = indexNear2D(sourceData2D, startInMs);
      var lastTimeInMs = sourceData2D[startIndexSource][0];
      for(var i=startIndexSource; i < sourceData2D.length; i++){
        /* start to populate the destDataXD based on the stride */
        if(sourceData2D[i][0] >= (lastTimeInMs + strideInMs)){
          destDataXD.push(sourceData2D[i]);
          lastTimeInMs = sourceData2D[i][0];
        }
        /* when the source data is beyond the time, then break the loop */
        if(sourceData2D[i][0] > endInMs){
          break;
        }
      }
    }else{
      /* the destDataXD already has data in it, so this needs to use that data
       * as a starting point to merge the new data into the destination array */
      var finalColumnCount = destDataXD[0].length + 1;
      console.log('final column count: ', finalColumnCount);
      /* add the next column to each existing row as 'null' */
      destDataXD.forEach(
        function(element){
          element.push(null);
        }
      );
      /* TODO: move data into destDataXD from sourceData2D */
    }
  }
}
To add some information, since the code probably isn't self-explanatory without some effort: I created two functions, 'formatData' and 'mergeArrays'. These could have been done in a single function, but it was easier for me to separate the 'object' domain and the 'array' domain conceptually. The 'formatData' function simply iterates through all of the data stored in each key, calling the 'mergeArrays' routine each time through. The 'mergeArrays' routine is not yet complete and is where I'm having my issue.
The first time through, formatData should create an empty array, smallBuffer, into which data is merged using mergeArrays. The first time 'mergeArrays' executes, I see that smallBuffer is indeed created and is an empty array. This empty array is supplied as a parameter to 'mergeArrays' and, the first time through, this works perfectly. The next time through, the 'smallBuffer' array is no longer empty, so the second case in 'mergeArrays' gets executed. The first step for me was to calculate the number of columns so that I could pad each row appropriately. This worked fine, but helped point out the problem. The next step was to simply append an empty column of 'null' values to each row. This is where things got weird. After the first time through 'mergeArrays', destDataXD still contained 'null' data from the previous executions. In essence, it appears that 'var smallBuffer = [];' doesn't actually clear the array and something is retained. That something is not apparent until near the end. I can't explain exactly what is going on because I don't fully understand it, but destDataXD continually grows 'nulls' at the end without ever being reset as expected.
Thank you for the time and I look forward to hearing your thoughts, j
Quickly reading through the code, the danger point I see is where you first add an element to destDataXD.
destDataXD.push(sourceData2D[i]);
Note that you are not pushing a copy of the array. You are adding a reference to that array. destDataXD and sourceData2D are now sharing the same data.
So, of course, when you push any null values onto an array in destDataXD, you are also modifying sourceData2D.
You should use the JavaScript array-copying method slice, which returns a shallow copy of the row:
destDataXD.push(sourceData2D[i].slice());
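The difference between pushing a reference and pushing a copy can be seen in a small standalone sketch. The arrays source, shared and copied are made up for this example and are not part of the code in the question.

```javascript
var source = [[1, 10], [2, 20]];

// Pushing the row itself stores a reference to the same inner array.
var shared = [];
shared.push(source[0]);
shared[0].push(null);
console.log(source[0].length); // 3: the source row was modified too

// Pushing a shallow copy keeps the source row untouched.
var copied = [];
copied.push(source[1].slice());
copied[0].push(null);
console.log(source[1].length); // 2
```

This would explain why the stored data seems to keep its 'null' columns between calls: smallBuffer is new each time, but its rows are the same arrays that live in the data object.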

JSON keep one line out of n

I want to display curves of data in JSON format using flot.js, but the data file contains too many lines (more than 50,000) and it takes too much time to render the graph. So I would like to keep only one line out of n.
For example, for n=3, I want to keep the lines with indexes 1, 4, 7, 10, etc.
The JSON data file has the following format :
[{"t":"22.40",
"lumi":"738.00",
"h":"31.20",
"f":"72.32",
"hi":"76.43",
"date":"2015-02-28T13:38:41.025Z",
"_id":"54f1c4e17cb06e5e09015b63"},
... 50,000 other lines
What is the simplest way to achieve this ?
You may want to use Array.prototype.filter(); see https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Array/filter
To thin out your array by keeping only one row out of every n, create this:
Array.prototype.oneOutOf = function(n) {
  return this.filter(function(element, index) {
    return (index % n) === 0;
  });
};
and then try:
JSON.parse(jsondata).oneOutOf(10);
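If extending Array.prototype is not desirable, the same filter can be wrapped in a standalone function. This is a small sketch; the name keepOneOutOf and the sample array are assumptions for illustration.

```javascript
// Hypothetical helper: keep every n-th element, starting with the first one.
function keepOneOutOf(rows, n) {
  return rows.filter(function (row, index) {
    return index % n === 0;
  });
}

var sample = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19];
console.log(keepOneOutOf(sample, 3)); // [10, 13, 16, 19]
```

With zero-based indexes this keeps elements 0, 3, 6, 9, which corresponds to lines 1, 4, 7, 10 when counting from one.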

Crossfilter.js: Creating Dimensions/Groups With Nested Attributes

I'm working on a visualization utilizing the crossfilter.js library, and I'm a bit confused on how to create some dimensions from nested attributes within my dataset. For instance, each instance of the dataset has multiple dates associated with it, resulting in a data structure that looks like this:
[ { name: 'instance_name',
dates: ['2014-11-11', '2013-07-06', '2011-02-04'],
category: 'category 1' },
{ name: 'instance_name2',
dates: ['2012-01-01', '2013-03-07'],
category: 'category 2' } ]
I'd like to be able to create dimensions that allow filtering based on, say, the dates and the category, since dimensions are a straightforward way to do this with crossfilter. However, I'm not sure how to parse the dates. I've tried first creating a date dimension using something like:
var cf = crossfilter(data);
var dateDim = cf.dimension(function(d) { return d.dates; });
and then tried to store just the dates as a variable using the .map() method like so:
var date = dateDim.top(Infinity).map(function(d) { return d.dates; });
The above does retrieve just the dates and stores them as a variable, however (a) this is just an array of dates each of which is a string, and (b) this doesn't get me any closer to linking the dateDim to other crossfilter dimensions I'd like to create for the visualization. Any help would be greatly appreciated. Thanks for reading.
My recommendation would be to flatten your structure before loading it into Crossfilter. So your 1st record will become 3 records (1 for each date) and your 2nd record will become 2 records. You can then parse the dates and treat them as a dimension in Crossfilter without too much trouble. The downside is that counting becomes a problem, because each instance now appears once per date, but that is manageable with custom grouping functions.
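The flattening step could look roughly like the following plain JavaScript sketch. It uses the sample data from the question; the variable name flat and the use of the Date constructor for parsing are assumptions, not something prescribed by Crossfilter.

```javascript
var data = [
  { name: 'instance_name',
    dates: ['2014-11-11', '2013-07-06', '2011-02-04'],
    category: 'category 1' },
  { name: 'instance_name2',
    dates: ['2012-01-01', '2013-03-07'],
    category: 'category 2' }
];

// One output record per (instance, date) pair, with the date string
// parsed into a Date object.
var flat = [];
data.forEach(function (d) {
  d.dates.forEach(function (dateString) {
    flat.push({
      name: d.name,
      category: d.category,
      date: new Date(dateString)
    });
  });
});

console.log(flat.length); // 5
```

The flat array can then be passed to crossfilter(flat), with one dimension returning d.date and another returning d.category.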
