D3: Nest and excluding certain keys - javascript

I am new to d3 and trying to plot some data in one box for each of four specific states, similar to this page but with states not continents, and a lot more data points. I have a json dataset of more than 42,000 entries supposedly from just those 4 states.
To key by state, I used this:
d3.json("data/business.json",function(json) {
var data=d3.nest()
.key(function(d) {return d.state;})
.sortKeys(d3.ascending)
.entries(json);
Then later make one box for each state:
// One cell for each state
var g=svg.selectAll("g").data(data).enter()
.append("g")
(attributes, etc)
Fine, but I soon found that the dataset includes data from several states I don't want to consider, so it was plotting more boxes than I wanted.
I would like a way to exclude the data that isn't from the four states without altering the original data file. What is the best way to go about this?

Filter your json:
var keep = ["state1", "state2", "state3", "state4"];
json = json.filter(function(d) { return keep.indexOf(d.state) > -1; });
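A small self-contained sketch of the same idea, runnable without D3 so the effect of the filter is easy to check. The rows and the state codes in `keep` are invented placeholders, not taken from the question's file; in the real code the filtered array would then go to d3.nest() as before.

```javascript
// Hypothetical rows standing in for data/business.json
var json = [
  { name: "Cafe A", state: "AZ" },
  { name: "Bar B", state: "NV" },
  { name: "Shop C", state: "ON" }, // a state we do not want
  { name: "Deli D", state: "AZ" }
];

// Placeholder list of the four wanted states
var keep = ["AZ", "NV", "PA", "WI"];

var filtered = json.filter(function (d) {
  return keep.indexOf(d.state) > -1;
});

// Collect the distinct states that survive the filter
var remainingStates = filtered
  .map(function (d) { return d.state; })
  .filter(function (s, i, all) { return all.indexOf(s) === i; });

console.log(filtered.length, remainingStates);
```

Only states that are both wanted and present in the data remain, so the nest produces at most four keys.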

It's possible to filter the output from d3.nest rather than the original array:
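function trim(nested, f) {
  // A nest entry looks like {key: ..., values: [...]}; anything else is a leaf row
  function isEntry(e) {
    return e && (typeof e == 'object') && Array.isArray(e.values) && ('key' in e)
  }
  return nested
    .filter(function (e) {
      // leaf rows have no key to test, so keep them as they are
      return !isEntry(e) || f(e.key)
    })
    .map(function (e) {
      // f is applied to the keys at every nesting level
      return isEntry(e) ? { key: e.key, values: trim(e.values, f) } : e
    })
}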
For instance:
function isNewEngland(st) {
return ["ME","VT","NH","MA", "CT", "RI"].indexOf(st)>=0
}
data = trim(data, isNewEngland)
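For a single-level nest, a plain filter on the top-level keys is enough. The sketch below assumes hand-written entries in the `{key, values}` shape that d3.nest().entries() returns; the rows themselves are invented.

```javascript
// Hand-written entries in the shape d3.nest().key(...).entries(...) returns
var nested = [
  { key: "CA", values: [{ state: "CA", name: "x" }] },
  { key: "MA", values: [{ state: "MA", name: "y" }, { state: "MA", name: "z" }] },
  { key: "VT", values: [{ state: "VT", name: "w" }] }
];

function isNewEngland(st) {
  return ["ME", "VT", "NH", "MA", "CT", "RI"].indexOf(st) >= 0;
}

// Keep only the entries whose key passes the test; the rows are untouched
var newEngland = nested.filter(function (e) {
  return isNewEngland(e.key);
});

console.log(newEngland.map(function (e) {
  return e.key + ":" + e.values.length;
}));
```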

Related

How to skip lines based on a criteria while loading a CSV file?

I'm trying to load a CSV file with D3.js, but I want to skip some lines based on some criteria. When I apply filter() it says there is no such function. Any help would be appreciated.
var dataSet = d3.csv("mydata.csv", function(d)
{
// trying to skip the line based on the below condition
if (d.year < 1900)
{
//skip loading, not sure what should I return
return false;
}
else
return {
name: d.name,
year: d.year,
average_rating: d.average_rating,
user_rated: d.user_rated
};
})
data.csv:
name,year,average_rating,users_rated
King of Tokyo,2011,7.23048,48611
Love Letter,2012,7.25253,47014
There is no way to skip lines when loading the CSV. So, if your CSV is 1 MB but only a couple of lines fit the criteria, you'll still have to load the whole 1 MB. However, despite you having "skip loading" in your question, it seems to me that you just want to filter the parsed CSV. If that's correct, you can either use a regular filter or, as you're trying to do in your question, you can use a row conversion function.
In that case just check the year: you return the whole object if it fits the criterion or, otherwise, return null. Check this simple demo:
const csv = `name,year
foo,1300
bar,1800
baz,2200`;
const data = d3.csvParse(csv, d => +d.year >= 1900 ? d : null);
console.log(data)
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.7.0/d3.min.js"></script>
Pay attention to the fact that numbers are parsed as strings, hence the +d.year.
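The other option mentioned above, a regular filter on the already parsed rows, could look like the sketch below. The `rows` array is written by hand in the shape d3.csvParse produces (every field a string), so the snippet runs without D3; the cut-off of 1900 follows the question.

```javascript
// Rows in the shape d3.csvParse produces: every field is a string
const rows = [
  { name: "foo", year: "1300" },
  { name: "bar", year: "1800" },
  { name: "baz", year: "2200" }
];

// Keep rows from 1900 onwards and convert the year to a number
const recent = rows
  .filter(d => +d.year >= 1900)
  .map(d => ({ name: d.name, year: +d.year }));

console.log(recent);
```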
Worked just fine as:
const dataSet = d3.csv("mydata.csv", function(d)
{
// keep only the rows where d.year >= 1900; returning nothing (undefined) skips the row
if (+d.year >= 1900) {
return {
name: d.name,
year: d.year,
average_rating: d.average_rating,
// the CSV column is named users_rated
user_rated: d.users_rated
};
}
}).then(function(d){
//some more data manipulation
// d.field = +d.field;
})
.catch(function(error)
{
console.log(error);
});

amcharts4 proper way to handle zero values on logarithmic value axis

I want to display some data with a wide range of values in amcharts4, so I'd like to use the logarithmic scale on the value axis. Unfortunately the data contains some zero values, which of course can't be displayed on a logarithmic scale.
I tried changing the zero values to 1 before rendering the chart, which works, but then the values are no longer correct.
data.forEach(item => {
for (const key in item) {
if (item[key] === 0) {
item[key] = 1;
}
}
});
Is there a better way to handle zero values with a logarithmic value axis, so that I can display the correct data?
Here is a code pen, which shows my current solution.
Edit
As of version 4.9.34, treatZeroAs is officially supported. Just set it on the value axis to the desired value to remap your zero values to:
valueAxis.treatZeroAs = 0.1;
Updated codepen.
The below workaround isn't needed anymore, but you may find the value axis adapter snippet helpful in changing the first label when using treatZeroAs.
Old method - pre 4.9.34
There doesn't appear to be a direct equivalent to v3's treatZeroAs property, which automatically handled this sort of thing. Pre-processing the data is one step, but you can also copy the original value into a separate object property and use a series tooltip adapter to dynamically show your actual value:
data.forEach(item => {
for (const key in item) {
if (item[key] <= 0) {
item[key+"_actual"] = item[key]; //copy the original value into a different property
item[key] = 1;
}
}
});
// ...
//display actual data that was re-mapped if it exists
chart.series.each((series) => {
series.adapter.add("tooltipText", (text, target) => {
if (target.dataFields) {
let valueField = target.dataFields.valueY;
let tooltipData = target.tooltipDataItem;
if (tooltipData.dataContext[valueField + "_actual"] !== undefined) {
return '{' + valueField + '_actual}';
}
else {
return text;
}
}
else {
return text;
}
})
});
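The pre-processing step on its own can be sketched without amCharts. The helper name `remapNonPositive`, the `floor` parameter, and the sample rows are assumptions made for illustration; unlike the loop above, this version copies each row and only touches numeric fields, which is a design choice rather than anything amCharts requires.

```javascript
// Hypothetical chart data; "value" is the field plotted on the log axis
const data = [
  { category: "a", value: 0 },
  { category: "b", value: 250 },
  { category: "c", value: -3 }
];

// Replace non-positive numbers with `floor`, keeping the original under <key>_actual
function remapNonPositive(rows, floor) {
  return rows.map(row => {
    const out = { ...row };
    Object.keys(row).forEach(key => {
      if (typeof row[key] === "number" && row[key] <= 0) {
        out[key + "_actual"] = row[key];
        out[key] = floor;
      }
    });
    return out;
  });
}

const remapped = remapNonPositive(data, 1);
console.log(remapped);
```

Rows with positive values pass through unchanged, so the tooltip adapter only finds an `_actual` field where a value was remapped.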
If you want to fake a zero label, you can use an adapter for that as well since your smallest value in this case will be 1:
//fake the zero axis label
valueAxis.renderer.labels.template.adapter.add("text", (text) => {
if (text === "1") {
return "0"
}
else {
return text;
}
})
Codepen

dc.js Using two reducers without a simple dimension and second grouping stage

Quick question following up my response from this post:
dc.js Box plot reducer using two groups
Just trying to fully get my head around reducers and how to filter and collect data so I'll step through my understanding first.
Data Format:
{
"SSID": "eduroam",
"identifier": "Client",
"latitude": 52.4505,
"longitude": -1.9361,
"mac": "dc:d9:16:##:##:##",
"packet": "PR-REQ",
"timestamp": "2018-07-10 12:25:26",
"vendor": "Huawei Technologies Co.Ltd"
}
(1) Using the following should result in an output array of key value pairs (Key MAC Address & Value Count of networks connected to):
var MacCountsGroup = mac.group().reduce(
function (p, v) {
p[v.mac] = (p[v.mac] || 0) + v.counter;
return p;
},
function (p, v) {
p[v.mac] -= v.counter;
return p;
},
function () {
return {}; // KV Pair of MAC -> Count
}
);
(2) Then, in order to use the object, it must be flattened so it can be passed to a chart as follows:
function flatten_object_group(group) {
return {
all: function () {
return group.all().map(function (kv) {
return {
key: kv.key,
value: Object.values(kv.value).filter(function (v) {
return v > 0;
})
};
});
}
};
}
var connectionsGroup = flatten_object_group(MacCountsGroup);
(3) Then I pass mac as the pie chart dimension and connectionsGroup as the group. This gives back a chart with roughly 50,000 slices based on my dataset.
var packetPie = dc.pieChart("#packetPie");
packetPie
.height(495)
.width(350)
.radius(180)
.renderLabel(true)
.transitionDuration(1000)
.dimension(mac)
.ordinalColors(['#07453E', '#145C54', '#36847B'])
.group(connectionsGroup);
This works fine and I follow up to this point.
(4) Now I want to group by the values given out by the first reducer, i.e. I want to combine all of the mac addresses with 1 network connection, 2 network connections and so on as slices.
How would this be done as a dimension of "Network connections"? How can I produce this summarized data which doesn't exist in my source data and is generated from mac?
Or would this require an intermediate function between the first reducer and flattening to combine all of the values from the first reducer?
You don't need to do all of that to get a pie chart of mac addresses.
There are a few faulty understandings in points 1-3, which I guess I'll address first. It looks like you copied and pasted code from the previous question, so I'm not really sure if this helps.
(1) If you have a dimension of mac addresses, reducing it like this won't have any further effect. The original idea was to dimension/group by vendor and then reduce counts for each mac address. This reduction will group by mac address and then further count instances of each mac address within each bin, so it's just an object with one key. It will produce a map of key value pairs like
{key: 'MAC-123', value: {'MAC-123': 12}}
(2) This will flatten the object within the values, dropping the keys and producing just an array of counts
{key: 'MAC-123', value: [12]}
(3) Since the pie chart is expecting simple key/value pairs with the value being a number, it is probably unhappy with getting values like the array [12]. The values are probably coerced to NaN.
(4) Okay, here's the real question, and it's actually not as easy as your previous question. We got off easy with the box plot because the "dimension" (in crossfilter terms, the keys you filter and group on) existed in your data.
Let's forget the false lead in points 1-3 above, and start from first principles.
There is no way to look at an individual row of your data and determine, without looking at anything else, if it belongs to the category "has 1 connection", "has 2 connections", etc. Assuming you want to be able to click on slices in the pie chart and filter all the data, we'll have to find another way to implement that.
But first let's look at how to produce a pie chart of "number of network connections". That's a little bit easier, but as far as I know, it does require a true "double reduce".
If we use the default reduction on the mac dimension, we'll get an array of key/value pairs, where the key is a mac address, and the value is the number of connections for that address:
[
{
"key": "1c:b7:2c:48",
"value": 8
},
{
"key": "1c:b7:be:ef",
"value": 3
},
{
"key": "6c:17:79:03",
"value": 2
},
...
How do we now produce a key/value array where the key is number of connections, and the value is the array of mac addresses for that number of connections?
Sounds like a job for the lesser-known Array.reduce. This function is the likely inspiration for crossfilter's group.reduce(), but it's a bit simpler: it just walks through an array, combining each value with the result of the last. It's great for producing an object from an array:
var value_keys = macPacketGroup.all().reduce(function(p, kv) {
if(!p[kv.value])
p[kv.value] = [];
p[kv.value].push(kv.key);
return p;
}, {});
Great:
{
"1": [
"b8:1d:ab:d1",
"dc:d9:16:3a",
"dc:d9:16:3b"
],
"2": [
"6c:17:79:03",
"6c:27:79:04",
"b8:1d:aa:d1",
"b8:1d:aa:d2",
"dc:da:16:3d"
],
But we wanted an array of key/value pairs, not an object!
var key_count_value_macs = Object.keys(value_keys)
.map(k => ({key: k, value: value_keys[k]}));
Great, that looks just like what a "real group" would produce:
[
{
"key": "1",
"value": [
"b8:1d:ab:d1",
"dc:d9:16:3a",
"dc:d9:16:3b"
]
},
{
"key": "2",
"value": [
"6c:17:79:03",
"6c:27:79:04",
"b8:1d:aa:d1",
"b8:1d:aa:d2",
"dc:da:16:3d"
]
},
...
Wrapping all that in a "fake group", which when asked to produce .all(), queries the original group and does the above transformations:
function value_keys_group(group) {
return {
all: function() {
var value_keys = group.all().reduce(function(p, kv) {
if(!p[kv.value])
p[kv.value] = [];
p[kv.value].push(kv.key);
return p;
}, {});
return Object.keys(value_keys)
.map(k => ({key: k, value: value_keys[k]}));
}
}
}
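To see the fake group at work without crossfilter, the sketch below wraps a stub object that only provides `.all()`. The stub, its sample counts, and the names `stubGroup` and `fake` are assumptions for illustration. Because `.all()` queries the source every time, a change in the underlying counts shows up on the next call.

```javascript
// Stub standing in for a crossfilter group: only .all() is needed here
const counts = [
  { key: "1c:b7:2c:48", value: 8 },
  { key: "1c:b7:be:ef", value: 3 },
  { key: "6c:17:79:03", value: 3 }
];
const stubGroup = { all: () => counts };

function value_keys_group(group) {
  return {
    all: function () {
      // invert key -> count into count -> [keys]
      const value_keys = group.all().reduce(function (p, kv) {
        (p[kv.value] = p[kv.value] || []).push(kv.key);
        return p;
      }, {});
      return Object.keys(value_keys)
        .map(k => ({ key: k, value: value_keys[k] }));
    }
  };
}

const fake = value_keys_group(stubGroup);
const before = fake.all();

// Simulate a filter changing the underlying counts
counts[1].value = 8;
const after = fake.all();

console.log(JSON.stringify(before));
console.log(JSON.stringify(after));
```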
Now we can plot the pie chart! The only fancy thing here is that the value accessor should look at the length of the array for each value (instead of assuming the value is just a number):
packetPie
// ...
.group(value_keys_group(macPacketGroup))
.valueAccessor(kv => kv.value.length);
Demo fiddle.
However, clicking on slices won't work. I'll return to that in a minute - just want to hit "save" first!
Part 2: Filtering based on counts
As I remarked at the start, it's not possible to create a crossfilter dimension which will filter based on the count of connections. This is because crossfilter always needs to look at each row and determine, based only on the information in that row, whether it belongs in a group or filter.
If you add another chart at this point and try clicking on a slice, everything in the other charts will disappear. This is because the keys are now counts, and counts are invalid mac addresses, so we're telling it to filter to a key which doesn't exist.
However, we can obviously filter by mac address, and we also know the mac addresses for each count! So this isn't so bad. It just requires a filterHandler.
Although, hmmm, in producing the fake group, we seem to have forgotten value_keys. It's hidden away inside the function, and then let go.
It's a little ugly, but we can fix that:
function value_keys_group(group) {
var saved_value_keys;
return {
all: function() {
var value_keys = group.all().reduce(function(p, kv) {
if(!p[kv.value])
p[kv.value] = [];
p[kv.value].push(kv.key);
return p;
}, {});
saved_value_keys = value_keys;
return Object.keys(value_keys)
.map(k => ({key: k, value: value_keys[k]}));
},
value_keys: function() {
return saved_value_keys;
}
}
}
Now, every time .all() is called (every time the pie chart is drawn), the fake group will stash away the value_keys object. Not a great practice (.value_keys() would return undefined if you called it before .all()), but safe based on the way dc.js works.
With that out of the way, the filterHandler for the pie chart is relatively simple:
packetPie.filterHandler(function(dimension, filters) {
if(filters.length === 0)
dimension.filter(null);
else {
var value_keys = packetPie.group().value_keys();
var all_macs = filters.reduce(
(p, v) => p.concat(value_keys[v]), []);
dimension.filterFunction(k => all_macs.indexOf(k) !== -1);
}
return filters;
});
The interesting line here is another call to Array.reduce. This function is also useful for producing an array from another array, and here we use it just to concatenate all of the values (mac addresses) from all of the selected slices (connection counts).
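The concatenation step can be tried in isolation. In this sketch `value_keys` and `filters` are hand-written stand-ins for what the fake group and the chart would supply, and `keep` is a hypothetical name for the predicate that would be handed to dimension.filterFunction.

```javascript
// Hypothetical result of value_keys(): connection count -> mac addresses
const value_keys = {
  "1": ["b8:1d:ab:d1", "dc:d9:16:3a"],
  "2": ["6c:17:79:03"],
  "3": ["1c:b7:be:ef"]
};

// Slices currently selected in the pie chart (keys are strings)
const filters = ["1", "3"];

// Concatenate the mac addresses behind every selected slice
const all_macs = filters.reduce((p, v) => p.concat(value_keys[v]), []);

// The predicate that would be passed to dimension.filterFunction
const keep = k => all_macs.indexOf(k) !== -1;

console.log(all_macs, keep("6c:17:79:03"), keep("1c:b7:be:ef"));
```

With tens of thousands of mac addresses, building a Set from `all_macs` and testing with `has` would avoid the linear `indexOf` scan per row; that is an optimization idea, not something the answer's code does.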
Now we have a working filter. It doesn't make too much sense to combine it with the box plot from the last question, but the new fiddle demonstrates that filtering based on number of connections does work.
Part 3: what about zeroes?
As commonly comes up, crossfilter considers a bin with value zero to still exist, so we need to "remove the empty bins". However, in this case, we've added a non-standard method to the first fake group, in order to allow filtering. (We could have just used a global there, but globals are messy.)
So, we need to "pass through" the value_keys method:
function remove_empty_bins_pt(source_group) {
return {
all:function () {
return source_group.all().filter(function(d) {
return d.key !== '0';
});
},
value_keys: function() {
return source_group.value_keys();
}
};
}
packetPie
.group(remove_empty_bins_pt(value_keys_group(macPacketGroup)))
Another oddity here is we are filtering out the key zero, and that's a string here!
Demo fiddle!
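The reason the zero key is a string: the counts become property names of a plain object, and Object.keys always returns strings. A tiny sketch with invented bins shows this.

```javascript
// Hypothetical counts per mac address, including one that dropped to zero
const bins = [
  { key: "aa:aa", value: 0 },
  { key: "bb:bb", value: 2 }
];

// Same inversion as in the fake group: count -> [mac addresses]
const value_keys = bins.reduce((p, kv) => {
  (p[kv.value] = p[kv.value] || []).push(kv.key);
  return p;
}, {});

const keys = Object.keys(value_keys);
console.log(keys, typeof keys[0]);
console.log(keys[0] === 0, keys[0] === "0");
```

So a strict comparison against the number 0 would never match, which is why the filter above compares with '0'.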
Alternately, here's a better solution! Do the bin filtering before passing to value_keys_group, and then we can use the ordinary remove_empty_bins!
function remove_empty_bins(source_group) {
return {
all:function () {
return source_group.all().filter(function(d) {
//return Math.abs(d.value) > 0.00001; // if using floating-point numbers
return d.value !== 0; // if integers only
});
}
};
}
packetPie
.group(value_keys_group(remove_empty_bins(macPacketGroup)))
Yet another demo fiddle!!

if statement in a mapping

Currently I'm getting a
continue must be inside a loop
which I recognize as a syntax error on my part that needs to be fixed.
If I fix it, will this logic still work as an if statement inside the mapping?
sales = data.map(function(d) {
if (isNaN(+d.BookingID) == false && isNaN(+d["Total Paid"]) == false) {
return [+d.BookingID, +d["Total Paid"]];
} else {
continue;
}
});
map is meant to be 1:1.
If you also want filtering, you should filter and then map:
sales = data
.filter(d => !isNaN(+d.BookingID) && !isNaN(+d["Total Paid"]))
.map(d => [+d.BookingID, +d["Total Paid"]]);
As others have mentioned, you cannot "continue" from within a map callback to skip elements. You need to use filter. To avoid referencing the fields twice, once in the filter, and once in the map, I'd filter afterwards:
sales = data
.map(d => [+d["BookingID"], +d["Total Paid"]])
.filter(([id, total]) => !isNaN(id) && !isNaN(total));
or, to make it easier in case you later want to include additional values in the array:
sales = data
.map(d => [+d["BookingID"], +d["Total Paid"]])
.filter(results => results.every(not(isNaN)));
where
function not(fn) { return x => !fn(x); }
or
function allNotNaN(a) { return a.every(not(isNaN)); }
and then, using parameter destructuring:
sales = data
.map(({BookingID, "Total Paid": total}) => [+BookingID, +total])
.filter(allNotNaN);
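A runnable sketch of the map-then-filter variant, using invented rows with string fields as a CSV parser would deliver them. One thing it makes visible: unary plus turns an empty string into 0, not NaN, so a blank "Total Paid" survives the filter. Whether that is acceptable depends on the data.

```javascript
// Hypothetical parsed rows; every field is a string, as from a CSV parser
const data = [
  { BookingID: "101", "Total Paid": "250.50" },
  { BookingID: "n/a", "Total Paid": "80" },      // BookingID is not a number
  { BookingID: "103", "Total Paid": "" },        // +"" is 0, so this row is kept
  { BookingID: "104", "Total Paid": "pending" }  // Total Paid is not a number
];

// Negate a one-argument predicate
const not = fn => x => !fn(x);

const sales = data
  .map(d => [+d.BookingID, +d["Total Paid"]])
  .filter(pair => pair.every(not(isNaN)));

console.log(sales);
```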

Creating a binding list projection from a list of search terms

I am trying to create a filtered list projection from a collection of search terms. For instance, if I have one search term, I can do something like this:
if (options.groupKey == "filtered") {
this._items = Data.getItemsFromGroup(this._group);
var query = Windows.Storage.ApplicationData.current.localSettings.values["filters"];
this._items = this._items.createFiltered(function (item) {
if (item.content.search(query) > -1) {
return true
} else {
return false
}
})
}
But what if the 'filters' local setting is a CRLF delimited list, like this:
Cisco
Microsoft
Dell
Currently, the search will compare each item against 'Cisco\r\nMicrosoft\r\nDell', which obviously won't work. content.search doesn't accept an array. Should I just do a loop in the createFiltered function somehow? That doesn't seem to be in the spirit of the projection. What is the generally accepted way to do this?
What about storing an object in the "filters" setting where every filter is a property? Will that work for you?
if (options.groupKey == "filtered") {
this._items = Data.getItemsFromGroup(this._group);
var query = Windows.Storage.ApplicationData.current.localSettings.values["filters"];
this._items = this._items.createFiltered(function (item) {
return Object.keys(query).indexOf(item) > -1;
})
}
The query object would be something as follows:
{
Cisco: "",
Microsoft: "",
Dell: ""
}
Does that make sense?
edit: made a little change in the code since I believe if (query[item]) would always return false because of javascript type-casting
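If the setting stays a CRLF-delimited string, another option is to split it into terms and test each item against all of them. This is a plain JavaScript sketch, not WinJS-specific: `setting`, `terms`, `matchesAnyTerm`, and the sample items are hypothetical names and data, and the match is done on item.content as in the question. It uses indexOf for a literal match, whereas String.prototype.search would treat each term as a regular expression.

```javascript
// Hypothetical setting value: search terms separated by CRLF
const setting = "Cisco\r\nMicrosoft\r\nDell";

// Split on CRLF (or bare LF) and drop empty entries
const terms = setting.split(/\r?\n/).filter(t => t.length > 0);

// Predicate of the kind passed to createFiltered: true if any term occurs
function matchesAnyTerm(item) {
  return terms.some(term => item.content.indexOf(term) > -1);
}

// Hypothetical items with a `content` string, as in the question
const items = [
  { content: "Cisco router review" },
  { content: "Apple laptop review" },
  { content: "Dell and Microsoft partnership" }
];

const matched = items.filter(matchesAnyTerm);
console.log(matched.length);
```

The predicate still returns a single boolean per item, so it fits the projection callback without an explicit loop in the calling code.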
