How do I flatten a table based on a series of nested values, using D3?
For the following cars.json, I wish to use D3 to flatten the hierarchy, providing a new row for each model year of each model. So there should be a total of nine line items, three for each make and model.
I'm sure I'm approaching this wrong, but I'm a bit new at D3, and I don't know how to think about it. I've seen other questions using d3.nest, but as I'm not trying to group anything, it doesn't seem applicable. Thanks!
cars.json
[
{
"make": "Ford",
"model": "Escape",
"years": [
{
"year": 2013,
"price": 16525
},
{
"year": 2014
},
{
"year": 2015
}
]
},
{
"make": "Kia",
"model": "Sportage",
"years": [
{
"year": 2012
},
{
"year": 2013,
"price": 16225
},
{
"year": 2014
}
]
},
{
"make": "Honda",
"model": "CR-V",
"years": [
{
"year": 2008
},
{
"year": 2009
},
{
"year": 2010,
"price": 12875
}
]
}
]
desired output
<table>
<thead>
<tr><th>Make</th><th>Model</th><th>Year</th><th>Price</th></tr>
</thead>
<tbody>
<tr><td>Ford</td><td>Escape</td><td>2013</td><td>16525</td></tr>
<tr><td>Ford</td><td>Escape</td><td>2014</td><td></td></tr>
<tr><td>Ford</td><td>Escape</td><td>2015</td><td></td></tr>
<tr><td>Kia</td><td>Sportage</td><td>2012</td><td></td></tr>
<tr><td>Kia</td><td>Sportage</td><td>2013</td><td>16225</td></tr>
<tr><td>Kia</td><td>Sportage</td><td>2014</td><td></td></tr>
<tr><td>Honda</td><td>CR-V</td><td>2008</td><td></td></tr>
<tr><td>Honda</td><td>CR-V</td><td>2009</td><td></td></tr>
<tr><td>Honda</td><td>CR-V</td><td>2010</td><td>12875</td></tr>
</tbody>
</table>
current attempt
<table id="cars_table">
<thead>
<th>Make</th><th>Model</th><th>Year</th><th>Price</th>
</thead>
<tbody></tbody>
<tfoot></tfoot>
</table>
<script>
(function(){
d3.json('/static/cars.json', function(error, cars) {
var tbody = d3.select('tbody')
rows = tbody.selectAll('tr').data(cars).enter().append('tr')
rows.append('td').html(function(d) {
return d.make
})
rows.append('td').html(function(d) {
return d.model
})
var years = rows.append('td').html(function(d) {
return d.years
// don't want this nested; probably should be peeled out into another `selectAll`, but I don't know where?
})
})
})()
</script>
You have to flatten the data before you render it, so that there is one datum per row (and since the rows are not nested the data shouldn't be nested). That way the table-rendering code you showed should just work.
Ideally, you'd transfer the data in flat form to begin with. CSV lends itself well to transferring flat data, which is often how it comes out of relational databases. In your case the columns would be "make", "model", "year" and "price", where each make/model appears 3 times, once per year.
If you can't modify the data then flatten it in JS as soon as it's loaded. I'm nearly sure that there isn't a d3 utility for this (d3.nest() does the opposite of what you're asking to do), but it's simple enough to do this with a loop:
var flatCars = []
cars.forEach(function(car) {
car.years.forEach(function(carYear) {
flatCars.push({
make: car.make,
model: car.model,
year: carYear.year,
price: carYear.price
});
});
});
or
var flatCars = cars.reduce(function(memo, car) {
return memo.concat(
car.years.map(function(carYear) {
return {
make: car.make,
model: car.model,
year: carYear.year,
price: carYear.price
}
});
);
}, [])
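On newer runtimes (ES2019+), the same flattening can also be written with Array.prototype.flatMap; here is a sketch using a small two-year sample shaped like cars.json:

```javascript
// Sample data shaped like cars.json (abbreviated to two year entries)
var cars = [
  {
    make: "Ford",
    model: "Escape",
    years: [{ year: 2013, price: 16525 }, { year: 2014 }]
  }
];

// flatMap maps each car to an array of flat rows, then concatenates them
var flatCars = cars.flatMap(function (car) {
  return car.years.map(function (carYear) {
    return {
      make: car.make,
      model: car.model,
      year: carYear.year,
      price: carYear.price // undefined when the year has no price
    };
  });
});
```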
You have to flatten your data before passing it to D3's data() method. D3 should be responsible only for transforming the data structure into a DOM tree. In other words: use a nested data structure only if you want a nested DOM structure.
So, flatten data like this (using lodash here):
data = _.flatten(data.map(function (model) {
return model.years.map(function (year) {
return _.assign(year, _.pick(model, 'make', 'model'));
});
}));
and then pass it to data() method. Working codepen here: http://codepen.io/anon/pen/grPzPJ?editors=1111
Related
In my React Native application, I am accessing data from my store in the following form:
Array [
Checkout {
"date": 2020-12-27T13:24:08.734Z,
"id": "Sun Dec 27 2020 08:24:08 GMT-0500 (EST)",
"items": Array [
Object {
"productBrand": "Microsoft",
"productCategory": "Gaming",
"productId": "p1",
"productTitle": "Xbox",
"quantity": 2,
"x": 1.815,
},
Object {
"productBrand": "Apple",
"productCategory": "Computers",
"productId": "p2",
"productTitle": "MacBook Pro",
"quantity": 1,
"x": 1.905,
},
],
"total": 3.720,
},
Checkout {
"date": 2020-12-27T13:24:47.790Z,
"id": "Sun Dec 27 2020 08:24:47 GMT-0500 (EST)",
"items": Array [
Object {
"productBrand": "Apple",
"productCategory": "Computers",
"productId": "p2",
"productTitle": "MacBook Pro",
"quantity": 1,
"x": 1.905,
},
],
"total": 1.905,
},
]
I am trying to use VictoryPie to create a pie chart that shows productBrand weighted by the sum of x over all the objects. In this example, I would need a pie chart showing Microsoft and Apple, weighted by 1.815 and 2*1.905 = 3.81, respectively. Is there any way to do this without writing a separate function to calculate these sums? I would like the pie chart to update automatically every time new data is added to the store.
I tried this, where history is a variable containing the above array, but no pie chart is produced.
<VictoryPie data={history} x={(data) => data.items.productBrand} y={(data) => data.items.x} />
See my working sample: https://codesandbox.io/s/react-victory-pie-chart-forked-kpe39?file=/src/index.js
Like this:
x="productBrand"
y={(data) => data.x * data.quantity}
For anyone trying to do something similar, I ended up extracting the data I needed by using a nested for loop within the useSelector hook:
const allBrands = useSelector(state => {
  let allData = {};
  for (const key1 in state.history.history) {
    for (const key2 in state.history.history[key1].items) {
      const item = state.history.history[key1].items[key2];
      if (allData.hasOwnProperty(item.productBrand)) {
        allData[item.productBrand] += item.x;
      } else {
        allData[item.productBrand] = item.x;
      }
    }
  }
  let dataArray = [];
  for (const prop in allData) {
    dataArray.push({ brand: prop, total: allData[prop] });
  }
  return dataArray;
});
Passing allBrands to the VictoryPie data prop produced the correct pie chart.
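The same per-brand totals can also be computed more compactly with reduce; this is a sketch assuming a history array shaped like the store data above:

```javascript
// Hypothetical history shaped like the checkout data above (abbreviated)
var history = [
  { items: [{ productBrand: "Microsoft", x: 1.815 }, { productBrand: "Apple", x: 1.905 }] },
  { items: [{ productBrand: "Apple", x: 1.905 }] }
];

// Accumulate the sum of x per brand into a plain object
var totals = history.reduce(function (acc, checkout) {
  checkout.items.forEach(function (item) {
    acc[item.productBrand] = (acc[item.productBrand] || 0) + item.x;
  });
  return acc;
}, {});

// Convert to the array-of-objects shape used for the chart data
var dataArray = Object.keys(totals).map(function (brand) {
  return { brand: brand, total: totals[brand] };
});
```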
My mongoDB collection looks like this:
[
{
"id": "myid",
"field": {
"total": 1,
"subfield": [
{
"time": "2020-08-06T08:33:57.977+0530"
},
{
"time": "2020-05-08T04:13:27.977+0530"
}
]
}
},
{
"id": "myid2",
"field": {
"total": 1,
"subfield": [
{
"time": "2020-07-31T10:15:50.184+0530"
}
]
}
}
]
I need to update all the documents, converting the date string in the time field of the subfield array to MongoDB's ISO date format.
I have thousands of documents, with hundreds of objects in each subfield array.
I'm aware of the aggregation operators $toDate and $convert, but I don't want to use aggregation because using $toDate or $convert would require unwinding the field.subfield array, which is again an expensive operation.
I want to update my documents in place and save them with the proper date format.
My MongoDB server version: 4.0.3
I tried the following but it doesn't seem to work and also doesn't return any errors.
db.collection.find().forEach(function(doc) {
doc.field.subfield.time=new ISODate(doc.field.subfield.time);
db.collection.save(doc);
})
You missed a loop over subfield, because it's an array:
db.collection.find().forEach(function(doc) {
doc.field.subfield.forEach(function(r) {
r.time = new ISODate(r.time);
})
db.collection.save(doc);
})
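As a plain-Node sketch of what that per-subdocument conversion does (the shell's ISODate corresponds to Date here; the document shape is the one from the question):

```javascript
// One document shaped like the question's collection
var doc = {
  id: "myid",
  field: {
    total: 1,
    subfield: [{ time: "2020-08-06T08:33:57.977+0530" }]
  }
};

// Convert each subdocument's time string into a Date object in place
doc.field.subfield.forEach(function (r) {
  r.time = new Date(r.time);
});
```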
If this is a one-time migration then speed hardly matters; I think aggregation and forEach will take about the same time.
If you are planning to upgrade your MongoDB version: from 4.2 onward, you can use updateMany() with an aggregation pipeline update,
db.collection.updateMany({},
[{
$set: {
"field.subfield": {
$map: {
input: "$field.subfield",
as: "r",
in: {
$mergeObjects: [
"$$r",
{ time: { $toDate: "$$r.time" } }
]
}
}
}
}
}]
)
I have this data:
"PensionPlanSummary": [
{
"Type": "DefinedContributionPension",
"Participants": [
{
"Year": 2018,
"Value": 425.0
}
],
"TotalAssets": [
{
"Year": 2018,
"Value": 629282.0
}
],
"NetAssets": [
{
"Year": 2018,
"Value": 629282.0
}
],
},
{
"Type": "Welfare",
"Participants": [
{
"Year": 2018,
"Value": 252.0
},
{
"Year": 2017,
"Value": 389.0
}
],
"TotalAssets": [
{
"Year": 2018,
"Value": 0.0
},
{
"Year": 2017,
"Value": 0.0
}
],
"NetAssets": [
{
"Year": 2018,
"Value": 0.0
},
{
"Year": 2017,
"Value": 0.0
}
]
}
]
I want to render data in this table:
Focus only on Participants. As you can see, the data is not populated correctly because it fills by row; for example, it should skip 2016 and 2017 for DefinedContributionPension and fill only 2018.
This table is result of this code:
{element.Participants.reverse().map((el, ind) => {
return uniqueYears.map((e, i) => {
// console.log(el.Year);
if (el.Year == e) {
console.log(el.Year);
return (
<td key={ind}>
${numeral(el.Value).format("0,0")}
</td>
);
} else {
return <td key={ind}> - </td>;
}
});
})}
uniqueYears =[2016,2017,2018]
element is the single object (I have another map above). So I am mapping once over the participants and once over the unique years, returning the one record where the element's year matches. As you can see, it is not inserting the dash - and it is not populating the table correctly. I also tried looping the other way around (first over the uniqueYears array, then over element.Participants), but that didn't work as expected either. Any ideas how to fix it?
P.S.: Table should look like this way:
But lets focus only on participants as in example.
I have tried to come up with a very dirty solution in as little time as possible, but it seems to be working for me.
pensionPlans and uniqueYears are the arrays you mentioned. My code is below:
pensionPlans.map(e => {
if(e.Type === "DefinedContributionPension"){
uniqueYears.map(year => {
e.Participants.map(item => {
if(item.Year === year){
console.log('found')
} else {
console.log("not found")
}
})
})
}
})
Also, I noticed that you used == instead of === in the check if (el.Year == e). Although it works here, this might have implications later, as == doesn't check the type.
You can see my answer running in console here https://jsfiddle.net/z65mp0s8/
Okay, I made a solution for your problem!
So, to get the years from your table, I made a simple function which just reads the years from your statistics table header, instead of using the pre-defined uniqueYears = [2016, 2017, 2018], but feel free to use that array if you need it.
Note: the advantage of using this function is that you don't need to update both your year headings and your uniqueYears array; update only your HTML table headings and everything picks up the new data.
function getYears() {
let years = [];
for(const cell of document.querySelectorAll("[data-statistics='year'] th")) {
if(cell.innerText.trim() !== "") years.push(parseInt(cell.innerText));
}
return years;
}
If you choose to use the function above, make sure your HTML table contains this; the data-statistics="year" attribute is required by the function.
<thead>
<tr data-statistics="year">
<th></th>
<th>2016</th>
<th>2017</th>
<th>2018</th>
</tr>
</thead>
Right after, to get the entries of each of your data objects, you can use Object.entries(), which gives you the key and value of each property of the object.
The Array.prototype.splice() call removes the Type property, so we focus only on the statistics data.
for(const Plan of PensionPlanSummary) {
let PlanEntries = Object.entries(Plan);
PlanEntries = PlanEntries.splice(1, PlanEntries.length - 1);
for(const pe of PlanEntries) {
// pe[0] -> Title
// pe[1] -> Data
getRow(pe[0], pe[1]);
}
}
Then, with those entries and a simple for loop, you can append each value into a <td>Data</td> cell and return an HTML row:
function getRow(index = "", data = null) {
let years = getYears(); // Use your uniqueYears array here.
let html = "<tr>"
// Sort by year
data = data.slice(0);
data.sort((a, b) => {
return a.Year - b.Year;
})
html += `<td>${index}</td>`;
for (const d of data) {
while (years.length !== 0) {
if (d.Year === years[0]) {
html += `<td>${d.Value}</td>`;
years.shift();
break;
} else {
html += `<td>-</td>`;
}
years.shift();
}
}
// Pad any remaining year columns with dashes when the data runs out early
while (years.length !== 0) {
html += `<td>-</td>`;
years.shift();
}
html += "</tr>";
// console.log(html);
return html;
}
The final result will be this html:
<tr>
<td>Participants</td>
<td>-</td>
<td>-</td>
<td>425</td>
</tr>
Or for the participants data with 2 years:
<tr>
<td>Participants</td>
<td>-</td>
<td>389</td>
<td>252</td>
</tr>
Now you only need to append it to your HTML however you want; take a look at the JsFiddle if you need.
It is a little bit dirty code, but I hope it helps!
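An alternative sketch of the same per-year matching, using Array.prototype.find for the lookup instead of the while/shift bookkeeping (rowFor is a made-up helper name):

```javascript
// For each year column, find the matching entry, or emit a dash
function rowFor(label, data, years) {
  var cells = years.map(function (y) {
    var match = data.find(function (d) { return d.Year === y; });
    return "<td>" + (match ? match.Value : "-") + "</td>";
  });
  return "<tr><td>" + label + "</td>" + cells.join("") + "</tr>";
}
```

Because every year column is visited exactly once, missing years come out as dashes on both ends of the range.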
I'm using Cloudant's map reduce functionality and I want to find how many events (count of events object) the specific user with name (input from user) has attended for a date range (input from user).
I have docs that look like below.
{
user: {
name: 'peter pan'
},
startEventDateTime: <timestamp>,
endDateDateTime: <timestamp>,
events: [
{
name: 'La la land',
text: 'more info'
},
{
name: 'La la land',
text: 'more info'
}
]
}
The above means the user attended 2 events between that start and end time. There are also many documents for the same user covering different date ranges, each with its own list of events attended.
How can I achieve this in Cloudant map reduce?
My Attempt:
I'm unable to get the map function right. I can filter by name by doing:
map:
function (doc) {
emit([doc.user, doc.events, doc.startEventDateTime, doc.endDateDateTime], doc)
}
reduce:
function (keys, values, rereduce) {
if (rereduce) {
return sum(values);
} else {
return values.length;
}
}
I would suggest considering a different format for your documents. Instead of having a user document with a list of events, make a separate document for each event, timestamped for the time at which it happened, such as:
{
"_id": "c48ee0881ce7c5d39243d2243d2e63cb",
"_rev": "1-c2f71fba5f09b129f1db20785f2429b2",
"user": "bob",
"datetime": "Thu 30 Nov 2017 09:46:02 GMT",
"event": {
"name": "lalaland",
"text": "more info"
}
}
Then you can rely on MapReduce to pick out date ranges per user. Here's a map function that does just that:
function (doc) {
if (doc && doc.user && doc.datetime) {
var when = new Date(Date.parse(doc.datetime));
emit([doc.user, when.getFullYear(), when.getMonth(), when.getDate()], 1);
}
}
and using the built-in reduce _sum. You can now use key ranges to slice the data. Say you want the events attended by user bob in Aug 2017 (month 7, since getMonth() is zero-based):
curl 'https://ACCT.cloudant.com/DBNAME/_design/DDOC/_view/VIEWNAME?startkey=["bob",2017,7]&endkey=["bob",2017,8]&group=true&inclusive_end=false&reduce=true'
{
"rows": [
{
"key": [
"bob",
2017,
7,
4
],
"value": 1
}
]
}
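To see why the key ranges work, here is a plain-JS sketch of building a composite [user, year, month, day] key from the sample document (using getDate() for the day of month; note that the environment's timezone determines how the GMT timestamp maps to local date parts):

```javascript
// The sample document from above
var doc = { user: "bob", datetime: "Thu 30 Nov 2017 09:46:02 GMT" };

// Same key construction idea as the map function: [user, year, month, day]
var when = new Date(Date.parse(doc.datetime));
var key = [doc.user, when.getFullYear(), when.getMonth(), when.getDate()];
// In a UTC environment this would be ["bob", 2017, 10, 30]; getMonth() is zero-based
```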
Documents in my mongoDB data collection follow this format,
[
{
"_id": xxxxxxxxx,
"crime_type": "illegal_trade",
"crime_year": "2013",
"location": "Kurunegala"
},
{
"_id": xxxxxxxxx,
"crime_type": "illegal_trade",
"crime_year": "2013",
"location": "Colombo"
},
{
"_id": xxxxxxxxx,
"crime_type": "illegal_trade",
"crime_year": "2014",
"location": "Kandy"
},
{
"_id": xxxxxxxxx,
"crime_type": "murder",
"crime_year": "2013",
"location": "Kadawatha"
}
]
When I run this aggregate operation,
db.collection.aggregate(
[
{ $group : { _id : {type: "$crime_type", year: "$crime_year"}, count: { $sum: 1 } } }
]
)
The result contains only the items that have count > 0; for example, a result for _id : {type: "murder", year: "2014"}, where count = 0, is not included.
My question is,
How should I alter my query so that those count = 0 items are also in the results?
In other words how to do something like this with mongodb...?
Basically what you are asking for is results that are not present in the data, so where that key combination does not exist then you return a count of 0. In truth, no database system "truly" does this, but there are ways to make it look like that is happening. But it also means understanding what is really happening to make that so.
It is true that the SQL approach to this would be to make a sub-query of the expected keys from distinct values and then "join" that to the existing dataset in order to create "false positives" for grouping accumulation. That is the general method there, but of course the basic "joining" concept is not supported by MongoDB for reasons of scalability. That is a whole different argument; just accept that MongoDB does not do joins on its own server architecture, and likely never will.
As such, the task of creating that "false set" when working with MongoDB is relegated to a client-side operation ( where "client" just means a process separate from the database server ). So you essentially get both the "result set" and the "blank set" and "merge" the results.
Different language approaches vary, but here is an efficient listing for node.js:
var async = require('async'),
mongo = require('mongodb'),
MongoClient = mongo.MongoClient,
DataStore = require('nedb'),
combined = new DataStore();
var data = [
{
"crime_type": "illegal_trade",
"crime_year": "2013",
"location": "Kurunegala"
},
{
"crime_type": "illegal_trade",
"crime_year": "2013",
"location": "Colombo"
},
{
"crime_type": "illegal_trade",
"crime_year": "2014",
"location": "Kandy"
},
{
"crime_type": "murder",
"crime_year": "2013",
"location": "Kadawatha"
}
];
MongoClient.connect('mongodb://localhost/test',function(err,db) {
if (err) throw err;
db.collection('mytest',function(err,collection) {
if (err) throw err;
async.series(
[
// Clear collection
function(callback) {
console.log("Dropping..\n");
collection.remove({},callback);
},
// Insert data
function(callback) {
console.log("Inserting..\n");
collection.insert(data,callback);
},
// Run parallel merge
function(callback) {
console.log("Merging..\n");
async.parallel(
[
// Blank Distincts
function(callback) {
collection.distinct("crime_year",function(err,years) {
if (err) callback(err);
async.each( years, function(year,callback) {
collection.distinct("crime_type",function(err,types) {
if (err) callback(err);
async.each( types, function(type,callback) {
combined.update(
{ "type": type, "year": year },
{ "$inc": { "count": 0 } },
{ "upsert": true },
callback
);
},callback);
});
},callback);
});
},
// Result distincts
function(callback) {
collection.aggregate(
[
{ "$group": {
"_id": {
"type": "$crime_type",
"year": "$crime_year"
},
"count": { "$sum": 1 }
}}
],
function(err,results) {
async.each( results, function(result, callback) {
combined.update(
{ "type": result._id.type, "year": result._id.year },
{ "$inc": { "count": result.count } },
{ "upsert": true },
callback
);
},callback);
}
);
}
],
function(err) {
callback(err);
}
)
},
// Retrieve result
function(callback) {
console.log("Fetching:\n");
combined.find({},{ "_id": 0 }).sort(
{ "year": 1, "type": 1 }).exec(function(err,results) {
if (err) callback(err);
console.log( JSON.stringify( results, undefined, 4 ) );
callback();
});
}
],
function(err) {
if (err) throw err;
db.close();
}
)
});
});
And this will return a result that not only "combines" results for grouped keys, but also contains a 0 entry for "murder" in year "2014":
[
{
"type": "illegal_trade",
"year": "2013",
"count": 2
},
{
"type": "murder",
"year": "2013",
"count": 1
},
{
"type": "illegal_trade",
"year": "2014",
"count": 1
},
{
"type": "murder",
"year": "2014",
"count": 0
}
]
So consider what the meat of the operations is here, mostly within the "parallel" section of the code under "Merging", as this is an efficient way for node to issue all of the queries (and potentially quite a few) all at the same time.
The first part, in order to get the "blank" results with no count, is essentially a double loop operation, where the point is to get the distinct values for each of "year" and "type". Whether you use the .distinct() method as shown here, or the .aggregate() method with a "cursor" for output and iteration, is a matter of how much data you have and what you personally prefer. For a small set like this, .distinct() is fine with the results in memory. But we want to create "blank" or 0-count entries for each possible pairing, most importantly including those that are "non-existent" as a pairing in the dataset.
Secondly, and in parallel where possible, the aggregation is run for the standard results. Of course these results will not return a count for "murder" in "2014" because there is none. But this is what it basically comes down to: merging the results.
The "merge" is basically working with a "hash/map/dict" ( whatever your term is ) of the combined keys for "year" and "type". So you just use that structure, adding the key where it does not exist or incrementing the "count" value on that key where it does. That's an age-old operation, and essentially the basis of all aggregation techniques.
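A minimal sketch of that hash-merge, using made-up distinct values and aggregation results mirroring the example data:

```javascript
// The "blank set": every possible (year, type) pairing seeded with 0
var years = ["2013", "2014"];
var types = ["illegal_trade", "murder"];
var merged = {};
years.forEach(function (year) {
  types.forEach(function (type) {
    merged[year + "|" + type] = 0;
  });
});

// The "result set": what the aggregation actually returned
var results = [
  { _id: { type: "illegal_trade", year: "2013" }, count: 2 },
  { _id: { type: "illegal_trade", year: "2014" }, count: 1 },
  { _id: { type: "murder", year: "2013" }, count: 1 }
];

// Merge: increment the existing key, so absent pairings keep their 0
results.forEach(function (r) {
  merged[r._id.year + "|" + r._id.type] += r.count;
});
```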
The neat little thing being done here ( not that you need to use it ), is the use of nedb, which is a nice little module that allows the use of MongoDB "like" operations on in-memory or other self contained data files. Think of it like SQLite to SQL RDBMS operations. Only a little lighter on the complete functionality.
Part of the point here is that the "hash merge" functions now look like regular MongoDB "upsert" operations to the code. In fact, the same code essentially applies if you have a large result that needs to end up in a "result collection" on the server instead.
The overall point is that this is effectively a "join" operation or otherwise a "fill in the blanks" operation depending on the overall size and expectancy of "keys" in your operation. The MongoDB server is not going to do this, but there is nothing stopping you from writing what is effectively your own "data layer" as a middle tier between your end application and the database. Such a distributed server model can be scaled out so that this service level performs these sorts of "joining" operations.
All of the queries used to get the data to merge can effectively be run in parallel under the right coding environment, so while this may not seem as straightforward as the SQL approach to doing this, it can however still be very effective and efficient at actually processing the results.
The approach is different, but then again that is part of the philosophy here. MongoDB relegates "joining" activities to different parts of your application architecture in order to keep its own server-specific operations more efficient, mostly with regard to sharded clusters. "Joining", or this "hash merge", is a "code" function that can be handled by other infrastructure than the database server.