Dynamically render information from two arrays - javascript

I have this data:
"PensionPlanSummary": [
{
"Type": "DefinedContributionPension",
"Participants": [
{
"Year": 2018,
"Value": 425.0
}
],
"TotalAssets": [
{
"Year": 2018,
"Value": 629282.0
}
],
"NetAssets": [
{
"Year": 2018,
"Value": 629282.0
}
],
},
{
"Type": "Welfare",
"Participants": [
{
"Year": 2018,
"Value": 252.0
},
{
"Year": 2017,
"Value": 389.0
}
],
"TotalAssets": [
{
"Year": 2018,
"Value": 0.0
},
{
"Year": 2017,
"Value": 0.0
}
],
"NetAssets": [
{
"Year": 2018,
"Value": 0.0
},
{
"Year": 2017,
"Value": 0.0
}
]
}
]
I want to render data in this table:
Focus only on Participants. As you can see, the data is not populated correctly because each row is filled from the left. For example, the DefinedContributionPension row should skip 2016 and 2017 and fill only 2018.
This table is result of this code:
{element.Participants.reverse().map((el, ind) => {
return uniqueYears.map((e, i) => {
// console.log(el.Year);
if (el.Year == e) {
console.log(el.Year);
return (
<td key={ind}>
${numeral(el.Value).format("0,0")}
</td>
);
} else {
return <td key={ind}> - </td>;
}
});
})}
uniqueYears =[2016,2017,2018]
element is a single object from PensionPlanSummary (I have another map above this one). So I map over the participants once and over the unique years once, looking for the one record whose year equals the current year. As you can see, it is not putting the dash - where it should and it is not populating the table correctly. I also tried looping the other way around, first over the uniqueYears array and then over element.Participants, but that did not work as expected either. Any ideas how to do this?
P.S.: The table should look like this:
But let's focus only on participants, as in the example.

I came up with a quick and dirty solution in as little time as possible, but it seems to work for me.
pensionPlans and uniqueYears are the arrays that you mentioned. My code is below:
pensionPlans.map(e => {
if(e.Type === "DefinedContributionPension"){
uniqueYears.map(year => {
e.Participants.map(item => {
if(item.Year === year){
console.log('found')
} else {
console.log("not found")
}
})
})
}
})
Also, I noticed that you used == instead of === in if (el.Year == e). It works here, but it may cause problems later because == does not compare types.
You can see my answer running in console here https://jsfiddle.net/z65mp0s8/
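The nested loop above can be turned into one cell per year by looping over the years and looking each year up, instead of looping over the entries. This is a minimal sketch, assuming each year appears at most once per Participants array; cellsForYears is a hypothetical helper name, not part of the question's code.

```javascript
// Hypothetical helper: returns one cell value per year,
// with "-" for years that have no entry.
function cellsForYears(entries, years) {
  // Index the entries by year so each lookup is a single Map access.
  const byYear = new Map(entries.map(entry => [entry.Year, entry.Value]));
  return years.map(year => (byYear.has(year) ? byYear.get(year) : "-"));
}

const uniqueYears = [2016, 2017, 2018];
const definedContribution = [{ Year: 2018, Value: 425.0 }];
const welfare = [{ Year: 2018, Value: 252.0 }, { Year: 2017, Value: 389.0 }];

console.log(cellsForYears(definedContribution, uniqueYears)); // [ '-', '-', 425 ]
console.log(cellsForYears(welfare, uniqueYears)); // [ '-', 389, 252 ]
```

In the JSX from the question, each returned value would then be wrapped in a td, with the year as the key.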

Okay, I made a solution for your problem!
So, to get the years, I wrote a simple function that reads them from the header of your statistics table instead of using the predefined uniqueYears = [2016, 2017, 2018], but feel free to use that array if you prefer.
Note: the advantage of this function is that you don't need to keep both the year headings and your uniqueYears array up to date. Update only your HTML table headings and the rest follows.
function getYears() {
let years = [];
for(const cell of document.querySelectorAll("[data-statistics='year'] th")) {
if(cell.innerText.trim() !== "") years.push(parseInt(cell.innerText));
}
return years;
}
So if you choose to use the function above, make sure your HTML table contains the following. The function requires data-statistics="year".
<thead>
<tr data-statistics="year">
<th></th>
<th>2016</th>
<th>2017</th>
<th>2018</th>
</tr>
</thead>
Next, to get the entries of each of your objects, you can use Object.entries(), which gives you the key and the value of each property.
Array.prototype.splice() is used to drop the Type property, so that only the statistics data remains.
for(const Plan of PensionPlanSummary) {
let PlanEntries = Object.entries(Plan);
PlanEntries = PlanEntries.splice(1, PlanEntries.length - 1);
for(const pe of PlanEntries) {
// pe[0] -> Title
// pe[1] -> Data
getRow(pe[0], pe[1]);
}
}
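Note that splice(1, ...) relies on Type being the first property of each plan. A small sketch of an alternative, assuming the property is always named Type: filter the entries by key name, so the result does not depend on property order. The plan object below is copied from the question's data.

```javascript
const plan = {
  Type: "Welfare",
  Participants: [{ Year: 2018, Value: 252.0 }, { Year: 2017, Value: 389.0 }],
  TotalAssets: [{ Year: 2018, Value: 0.0 }, { Year: 2017, Value: 0.0 }],
  NetAssets: [{ Year: 2018, Value: 0.0 }, { Year: 2017, Value: 0.0 }]
};

// Keep every [key, value] pair except the "Type" label.
const statistics = Object.entries(plan).filter(([key]) => key !== "Type");

for (const [title, data] of statistics) {
  // title is the row heading, data is the array of { Year, Value } objects
  console.log(title, data.length);
}
```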
Then, with a couple of simple loops over your entries, you can wrap each value in <td>Data</td> and return an HTML row:
function getRow(index = "", data = null) {
let years = getYears(); // Use your uniqueYears array here.
let html = "<tr>"
// Sort by year
data = data.slice(0);
data.sort((a, b) => {
return a.Year - b.Year;
})
html += `<td>${index}</td>`;
for (const d of data) {
while (years.length !== 0) {
if (d.Year === years[0]) {
html += `<td>${d.Value}</td>`;
years.shift();
break;
} else {
html += `<td>-</td>`;
}
years.shift();
}
}
// Pad the years that remain after the last data point
html += "<td>-</td>".repeat(years.length);
html += "</tr>";
// console.log(html);
return html;
}
The final result will be this html:
<tr>
<td>Participants</td>
<td>-</td>
<td>-</td>
<td>425</td>
</tr>
Or for your 2 year data participants:
<tr>
<td>Participants</td>
<td>-</td>
<td>389</td>
<td>252</td>
</tr>
Now you only need to append it to your HTML however you want. Take a look at the JsFiddle if you need to.
The code is a little bit dirty, but I hope it helps!

Related

Javascript: Removing Semi-Duplicate Objects within an Array with Conditions

I am trying to remove the "Duplicate" objects within an array while retaining the object that has the lowest value associated with it.
Original:
var array = [
{
"time": "2021-11-12T20:37:11.112233Z",
"value": 3.2
},
{
"time": "2021-11-12T20:37:56.115222Z",
"value": 3.8
},
{
"time": "2021-11-13T20:37:55.112255Z",
"value": 4.2
},
{
"time": "2021-11-13T20:37:41.112252Z",
"value": 2
},
{
"time": "2021-11-14T20:37:22.112233Z",
"value": 3.2
}
]
Expected output:
var array = [
{
"time": "2021-11-12T20:37:11.112233Z",
"value": 3.2
},
{
"time": "2021-11-13T20:37:41.112252Z",
"value": 2
},
{
"time": "2021-11-14T20:37:22.112233Z",
"value": 3.2
}
]
What I have so far:
var result = array.reduce((aa, tt) => {
if (!aa[tt.time]) {
aa[tt.time] = tt;
} else if (Number(aa[tt.time].value) < Number(tt.value)) {
aa[tt.time] = tt;
}
return aa;
}, {});
console.log(result);
I realize the issue with what I am trying to do is that the "time" attribute is not identical to the other time values I am considering as duplicates.
For this use case, though, I do not need the time down to the millisecond. YYYY-MM-DDTHH:MM (to the minute) is fine. I am not sure how to implement a reduction for this case when the times aren't exactly the same. Maybe if only the first 16 characters of the string were checked?
Let me know if any additional information is needed.
So a few issues:
If you want to only check the first 16 characters to detect a duplicate, you should use that substring of tt.time as key for aa instead of the whole string.
Since you want the minimum, your comparison operator is wrong.
The code produces an object, while you want an array, so you still need to extract the values from the object.
Here is your code with those adaptations:
var array = [{"time": "2021-11-12T20:37:11.112233Z","value": 3.2},{"time": "2021-11-12T20:37:56.115222Z","value": 3.8},{"time": "2021-11-13T20:37:55.112255Z","value": 4.2},{"time": "2021-11-13T20:37:41.112252Z","value": 2},{"time": "2021-11-14T20:37:22.112233Z","value": 3.2}];
var result = Object.values(array.reduce((aa, tt) => {
var key = tt.time.slice(0, 16);
if (!aa[key]) {
aa[key] = tt;
} else if (Number(aa[key].value) > Number(tt.value)) {
aa[key] = tt;
}
return aa;
}, {}));
console.log(result);
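The same idea can be written as a reusable function. This is only a sketch, assuming every time string has the same ISO layout and every value is already a number; minPerMinute is a hypothetical name. It uses a Map keyed by the first 16 characters, so the results come out in the order the minutes were first seen.

```javascript
// Hypothetical helper: keeps the lowest-valued reading for each minute.
function minPerMinute(readings) {
  const best = new Map();
  for (const reading of readings) {
    const minute = reading.time.slice(0, 16); // "YYYY-MM-DDTHH:MM"
    const current = best.get(minute);
    if (current === undefined || reading.value < current.value) {
      best.set(minute, reading);
    }
  }
  return [...best.values()];
}

const readings = [
  { time: "2021-11-12T20:37:11.112233Z", value: 3.2 },
  { time: "2021-11-12T20:37:56.115222Z", value: 3.8 },
  { time: "2021-11-13T20:37:55.112255Z", value: 4.2 },
  { time: "2021-11-13T20:37:41.112252Z", value: 2 },
  { time: "2021-11-14T20:37:22.112233Z", value: 3.2 }
];

console.log(minPerMinute(readings).map(r => r.value)); // [ 3.2, 2, 3.2 ]
```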

Getting undefined instead of the return value of a function [duplicate]

This question already has answers here:
Why does this forEach return undefined when using a return statement
(5 answers)
Closed 1 year ago.
I have this simple array of two objects where I need to get the price of the products of 1 month.
The array can have 2 or 3 items, so I'm checking whether it has 2 or 3. Then I'm storing the objects in their own variables and calling the function getOneMonthPrice with each of those variables as the argument.
The function getOneMonthPrice loops over the products and checks whether the months property has a value of 01 month, and if it does, it returns the price.
But right now I'm only getting undefined, whereas if I use console.log inside the functions, I can see the values showing in the console.
What is going wrong here?
Please note that I have removed most of the code from the actual project I'm working on. This is just the shortest version. If you can, please give me a solution based on this format.
Here's the snippet:
const arr = [{
"name": "Boom Boom Ltd.",
"products": [{
"price": "3.33",
"months": "24 months"
},
{
"price": "10.95",
"months": "01 month"
}
]
},
{
"name": "Bam Bam Ltd.",
"products": [{
"price": "5.93",
"months": "24 months"
},
{
"price": "12.95",
"months": "01 month"
}
]
}
];
function getOneMonthPrice(company) {
company.products.forEach(item => {
if (item.months === "01 month") {
return item.price;
}
});
};
if (arr.length === 2) {
const company1 = arr[0];
const company2 = arr[1];
console.log(getOneMonthPrice(company1));
}
Returning from the forEach callback function doesn't return from the containing function.
Instead of forEach, you can use find() to find the array element that matches your criteria, then return the price from that.
function getOneMonthPrice(company) {
let product = company.products.find(item => item.months === "01 month");
return product && product.price;
}
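To make the difference concrete, here is a small sketch using a products array copied from the question. forEach discards whatever its callback returns and always evaluates to undefined, while find hands back the first matching element.

```javascript
const products = [
  { price: "3.33", months: "24 months" },
  { price: "10.95", months: "01 month" }
];

// forEach ignores the callback's return value and itself returns undefined.
const viaForEach = products.forEach(item => item.price);

// find returns the first element for which the callback is truthy,
// or undefined when nothing matches.
const viaFind = products.find(item => item.months === "01 month");

console.log(viaForEach); // undefined
console.log(viaFind.price); // 10.95
```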

replace multiple values in json/jsObject/string

I have a response from a web service and want to replace some values in the response with my custom values.
One way is to write a tree traverser and then check for the value and replace with my custom value
so the response is some what like this:
[
{
"name": "n1",
"value": "v1",
"children": [
{
"name": "n2",
"value": "v2"
}
]
},
{
"name": "n3",
"value": "v3"
}
]
now my custom map is like this
const map = {
"v1": "v11",
"v2": "v22",
"v3": "v33"
};
All I want is
[
{
"name": "n1",
"value": "v11",
"children": [
{
"name": "n2",
"value": "v22"
}
]
},
{
"name": "n3",
"value": "v33"
}
]
I was thinking if I could stringify my response and then replace values using a custom build regex from my map of values.
Will it be faster as compared to tree traverser?
If yes, how should I do that?
somewhat like this
originalString.replace(regexp, function (replacement))
The tree traversal is faster
Note that some things could be done more efficiently in the regex implementation but I still think there are some more bottlenecks to explain.
Why the regex is slow:
There are probably many more reasons why the regex is slower but I'll explain at least one significant reason:
When you use a regex to find and replace, you create new strings every time and perform the matches every time. Regular expressions can be very expensive, and my implementation isn't particularly cheap.
Why is the tree traversal faster:
In the tree traversal, I'm mutating the object directly. This doesn't require creating new string objects or any new objects at all. We're also not performing a full search on the whole string every time.
RESULTS
Run the performance test below. The test uses console.time to record how long each implementation takes. You should see that the tree traversal is much faster.
function usingRegex(obj, map) {
return JSON.parse(Object.keys(map).map(oldValue => ({
oldValue,
newValue: map[oldValue]
})).reduce((json, {
oldValue,
newValue
}) => {
return json.replace(
new RegExp(`"value":"(${oldValue})"`, 'g'), // 'g' replaces every occurrence, not just the first
() => `"value":"${newValue}"`
);
}, JSON.stringify(obj)));
}
function usingTree(obj, map) {
function traverse(children) {
for (let item of children) {
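if (item && Object.prototype.hasOwnProperty.call(map, item.value)) {
// getting a value from a JS object is O(1)!
// values without an entry in the map are left untouched
item.value = map[item.value];
}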
if (item && item.children) {
traverse(item.children)
}
}
}
traverse(obj);
return obj; // mutates
}
const obj = JSON.parse(`[
{
"name": "n1",
"value": "v1",
"children": [
{
"name": "n2",
"value": "v2"
}
]
},
{
"name": "n3",
"value": "v3"
}
]`);
const map = {
"v1": "v11",
"v2": "v22",
"v3": "v33"
};
// show that each function is working first
console.log('== TEST THE FUNCTIONS ==');
console.log('usingRegex', usingRegex(obj, map));
console.log('usingTree', usingTree(obj, map));
const iterations = 10000; // ten thousand
console.log('== DO 10000 ITERATIONS ==');
console.time('regex implementation');
for (let i = 0; i < iterations; i += 1) {
usingRegex(obj, map);
}
console.timeEnd('regex implementation');
console.time('tree implementation');
for (let i = 0; i < iterations; i += 1) {
usingTree(obj, map);
}
console.timeEnd('tree implementation');
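One caveat about the benchmark: usingTree mutates its input, so later iterations work on values that have already been replaced. A minimal sketch of a non-mutating traversal is shown here as an assumption about what a fairer comparison could use. replaceValues is a hypothetical name, and because it copies every node it will probably be somewhat slower than the in-place version.

```javascript
// Hypothetical non-mutating variant: returns a new tree and leaves the input as-is.
function replaceValues(nodes, map) {
  return nodes.map(node => {
    const copy = { ...node }; // shallow copy of this node
    if (Object.prototype.hasOwnProperty.call(map, copy.value)) {
      copy.value = map[copy.value];
    }
    if (Array.isArray(copy.children)) {
      copy.children = replaceValues(copy.children, map); // recurse into children
    }
    return copy;
  });
}

const tree = [
  { name: "n1", value: "v1", children: [{ name: "n2", value: "v2" }] },
  { name: "n3", value: "v3" }
];
const replacements = { v1: "v11", v2: "v22", v3: "v33" };

const replaced = replaceValues(tree, replacements);
console.log(JSON.stringify(replaced));
console.log(tree[0].value); // v1
```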
Will it be faster as compared to tree traverser?
I don't know. I think it would depend on the size of the input, and the size of the replacement map. You could run some tests at JSPerf.com.
If yes, how should I do that?
It's fairly easy to do with a regex-based string replacement if the values you are replacing don't need any special escaping or whatever. Something like this:
const input = [
{
"name": "n1",
"value": "v1",
"children": [
{
"name": "n2",
"value": "v2"
}
]
},
{
"name": "n3",
"value": "v3"
}
];
const map = {
"v1": "v11",
"v2": "v22",
"v3": "v33"
};
// create a regex that matches any of the map keys, adding ':' and quotes
// to be sure to match whole property values and not property names
const regex = new RegExp(':\\s*"(' + Object.keys(map).join('|') + ')"', 'g');
// NOTE: if you've received this data as JSON then do the replacement
// *before* parsing it, don't parse it then restringify it then reparse it.
const json = JSON.stringify(input);
const result = JSON.parse(
json.replace(regex, function(m, key) { return ': "' + map[key] + '"'; })
);
console.log(result);
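If the data arrives as a JSON string anyway, another option (a different technique from the regex above, not something the question asked for) is the reviver argument of JSON.parse, which is called for every key/value pair during parsing. This avoids regex escaping entirely. The sketch assumes the values to replace always sit under a key named value. rawJson, valueMap and parsed are illustrative names.

```javascript
const rawJson = '[{"name":"n1","value":"v1","children":[{"name":"n2","value":"v2"}]},{"name":"n3","value":"v3"}]';
const valueMap = { v1: "v11", v2: "v22", v3: "v33" };

// The reviver runs once per property; returning a different value replaces it.
const parsed = JSON.parse(rawJson, (key, value) => {
  if (
    key === "value" &&
    typeof value === "string" &&
    Object.prototype.hasOwnProperty.call(valueMap, value)
  ) {
    return valueMap[value];
  }
  return value; // everything else is kept unchanged
});

console.log(parsed[0].value); // v11
console.log(parsed[0].children[0].value); // v22
```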
The traverser is definitely faster: a string replace has to scan every character of the final string, whereas an iterator can skip items that do not need to be visited.

d3: flatten nested data?

How do I flatten a table based on a series of nested values, using D3?
For the following cars.json, I wish to use D3 to flatten the hierarchy, providing a new row for each model year of each model. So there should be a total of nine line items, three for each make and model.
I'm sure I'm approaching this wrong, but I'm a bit new at D3, and I don't know how to think about it. I've seen other questions using d3.nest, but as I'm not trying to group anything, it doesn't seem applicable. Thanks!
cars.json
[
{
"make": "Ford",
"model": "Escape",
"years": [
{
"year": 2013,
"price": 16525
},
{
"year": 2014
},
{
"year": 2015
}
]
},
{
"make": "Kia",
"model": "Sportage",
"years": [
{
"year": 2012
},
{
"year": 2013,
"price": 16225
},
{
"year": 2014
}
]
},
{
"make": "Honda",
"model": "CR-V",
"years": [
{
"year": 2008
},
{
"year": 2009
},
{
"year": 2010,
"price": 12875
}
]
}
]
desired output
<table>
<thead>
<tr><th>Make</th><th>Model</th><th>Year</th><th>Price</th></tr>
</thead>
<tbody>
<tr><td>Ford</td><td>Escape</td><td>2013</td><td>16525</td></tr>
<tr><td>Ford</td><td>Escape</td><td>2014</td><td></td></tr>
<tr><td>Ford</td><td>Escape</td><td>2015</td><td></td></tr>
<tr><td>Kia</td><td>Sportage</td><td>2012</td><td></td></tr>
<tr><td>Kia</td><td>Sportage</td><td>2013</td><td>16225</td></tr>
<tr><td>Kia</td><td>Sportage</td><td>2014</td><td></td></tr>
<tr><td>Honda</td><td>CR-V</td><td>2008</td><td></td></tr>
<tr><td>Honda</td><td>CR-V</td><td>2009</td><td></td></tr>
<tr><td>Honda</td><td>CR-V</td><td>2010</td><td>12875</td></tr>
</tbody>
</table>
current attempt
<table id="cars_table">
<thead>
<th>Make</th><th>Model</th><th>Year</th><th>Price</th>
</thead>
<tbody></tbody>
<tfoot></tfoot>
</table>
<script>
(function(){
d3.json('/static/cars.json', function(error, cars) {
var tbody = d3.select('tbody')
rows = tbody.selectAll('tr').data(cars).enter().append('tr')
rows.append('td').html(function(d) {
return d.make
})
rows.append('td').html(function(d) {
return d.model
})
var years = rows.append('td').html(function(d) {
return d.years
// don't want this nested; probably should be peeled out into another `selectAll`, but I don't know where?
})
})
})()
</script>
You have to flatten the data before you render it, so that there is one datum per row (and since the rows are not nested the data shouldn't be nested). That way the table-rendering code you showed should just work.
Ideally, you'd transfer the data flat already. CSV lends itself well to transferring flat data, which is often how it comes out of relational databases. In your case the columns would be "make", "model", "year" and "price", where each make/model appears 3 times — once per year.
If you can't modify the data then flatten it in JS as soon as it's loaded. I'm nearly sure that there isn't a d3 utility for this (d3.nest() does the opposite of what you're asking to do), but it's simple enough to do this with a loop:
var flatCars = []
cars.forEach(function(car) {
car.years.forEach(function(carYear) {
flatCars.push({
make: car.make,
model: car.model,
year: carYear.year,
price: carYear.price
});
});
});
or
var flatCars = cars.reduce(function(memo, car) {
return memo.concat(
car.years.map(function(carYear) {
return {
make: car.make,
model: car.model,
year: carYear.year,
price: carYear.price
};
})
);
}, []);
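In environments that support ES2019, the same flattening can also be sketched with Array.prototype.flatMap, which maps each car to an array of rows and concatenates the results. The sample data below is copied from cars.json in the question. Availability of flatMap in your target browsers is an assumption.

```javascript
const cars = [
  { make: "Ford", model: "Escape", years: [{ year: 2013, price: 16525 }, { year: 2014 }, { year: 2015 }] },
  { make: "Kia", model: "Sportage", years: [{ year: 2012 }, { year: 2013, price: 16225 }, { year: 2014 }] },
  { make: "Honda", model: "CR-V", years: [{ year: 2008 }, { year: 2009 }, { year: 2010, price: 12875 }] }
];

// One flat row per model year; price stays undefined when it is missing.
const flatCars = cars.flatMap(car =>
  car.years.map(carYear => ({
    make: car.make,
    model: car.model,
    year: carYear.year,
    price: carYear.price
  }))
);

console.log(flatCars.length); // 9
console.log(flatCars[0].make, flatCars[0].year, flatCars[0].price); // Ford 2013 16525
```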
You have to flatten your data before passing it to D3's data() method. D3 should be responsible only for transforming a data structure into a DOM tree. In other words, use a nested data structure only if you want a nested DOM structure.
So, flatten data like this (using lodash here):
data = _.flatten(data.map(function (model) {
return model.years.map(function (year) {
return _.assign(year, _.pick(model, 'make', 'model'));
});
}));
and then pass it to data() method. Working codepen here: http://codepen.io/anon/pen/grPzPJ?editors=1111

mapreduce with sort on inner document mongodb

I have a quick question on map-reduce with mongodb. I have this following document structure
{
"_id": "ffc74819-c844-4d61-8657-b6ab09617271",
"value": {
"mid_tag": {
"0": {
"0": "Prakash Javadekar",
"1": "Shastri Bhawan",
"2": "Prime Minister's Office (PMO)",
"3": "Narendra Modi"
},
"1": {
"0": "explosion",
"1": "GAIL",
"2": "Andhra Pradesh",
"3": "N Chandrababu Naidu"
},
"2": {
"0": "Prime Minister",
"1": "Narendra Modi",
"2": "Bharatiya Janata Party (BJP)",
"3": "Government"
}
},
"total": 3
}
}
When I run my map-reduce code on this collection of documents, I want to specify total as the sort field in this command:
db.ana_mid_big.mapReduce(map, reduce,
{
out: "analysis_result",
sort: {"value.total": -1}
}
);
But this does not seem to work. How can I specify a key which is nested for sorting? Please help.
----------------------- EDIT ---------------------------------
As per the comments, I am posting my whole problem here. I started with a collection of a little more than 3.5M documents (this is just an old snapshot of the live one, which has already crossed 5.5M) that looks like this:
{
"_id": ObjectId("53b394d6f9c747e33d19234d"),
"autoUid": "ffc74819-c844-4d61-8657-b6ab09617271",
"createDate": ISODate("2014-07-02T05:12:54.171Z"),
"account_details": {
"tag_cloud": {
"0": "FIFA World Cup 2014",
"1": "Brazil",
"2": "Football",
"3": "Argentina",
"4": "Belgium"
}
}
}
So, there can be many documents with the same autoUid but with different (or partially same or even same) tag_cloud.
I have written the following map-reduce to generate an intermediate collection that looks like the one at the start of the question. So that is a collection of all the tag_clouds belonging to one person in a single document. To achieve this, the MR code I used looks like the following:
var map = function(){
final_val = {
tag_cloud: this.account_details.tag_cloud,
total: 1
};
emit(this.autoUid, final_val)
}
var reduce = function(key, values){
var fv = {
mid_tags: [],
total: 0
}
try{
for (i in values){
fv.mid_tags.push(values[i].tag_cloud);
fv.total = fv.total + 1;
}
}catch(e){
fv.mid_tags.push(values)
fv.total = fv.total + 1;
}
return fv;
}
db.my_orig_collection.mapReduce(map, reduce,
{
out: "analysis_mid",
sort: {createDate: -1}
}
);
Here comes problem number 1: when somebody has more than one record, the output follows the reduce function. But when somebody has only one record, it keeps the name "tag_cloud" instead of "mid_tag". I understand that there is some problem with the reduce code, but I cannot find it.
Now I want to reach a final result that looks like this:
{"_id": "ffc74819-c844-4d61-8657-b6ab09617271",
"value": {
"tags": {
"Prakash Javadekar": 1,
"Shastri Bhawan": 1,
"Prime Minister's Office (PMO)": 1,
"Narendra Modi": 2,
"explosion": 1,
"GAIL": 1,
"Andhra Pradesh": 1,
"N Chandrababu Naidu": 1,
"Prime Minister": 1,
"Bharatiya Janata Party (BJP)": 1,
"Government": 1
}
}
}
This is finally one document per person, representing the density of the tags they have used. The MR code I am trying to use (not tested yet) looks like this:
var map = function(){
var val = {};
if ("mid_tags" in this.value){
for (i in this.value.mid_tags){
for (j in this.value.mid_tags[i]){
k = this.value.mid_tags[i][j].trim();
if (!(k in val)){
val[k] = 1;
}else{
val[k] = val[k] + 1;
}
}
}
var final_val = {
tag: val,
total: this.value.total
}
emit(this._id, final_val);
}else if("tag_cloud" in this.value){
for (i in this.value.tag_cloud){
k = this.value.tag_cloud[i].trim();
if (!(k in val)){
val[k] = 1;
}else{
val[k] = val[k] + 1;
}
}
var final_val = {
tag: val,
total: this.value.total
}
emit(this._id, final_val);
}
}
var reduce = function(key, values){
return values;
}
db.analysis_mid.mapReduce(map, reduce,
{
out: "analysis_result"
}
);
This last piece of code is not tested yet. That is all I want to do. Please help
Your PHP background appears to be showing. The data structures you present do not show arrays in typical JSON notation. However, the calls to "push" in your mapReduce code indicate that, at least in your "interim document", the values are actually arrays. You seem to have "notated" the original documents the same way, so it seems reasonable to presume they are arrays too.
Actual arrays are your best option for storage here, especially considering your desired outcome. So even if they are not stored that way now, your original documents should look like this, as they would be represented in the shell:
{
"_id": ObjectId("53b394d6f9c747e33d19234d"),
"autoUid": "ffc74819-c844-4d61-8657-b6ab09617271",
"createDate": ISODate("2014-07-02T05:12:54.171Z"),
"account_details": {
"tag_cloud": [
"FIFA World Cup 2014",
"Brazil",
"Football",
"Argentina",
"Belgium"
]
}
}
With documents like that, or if you change them to be like that, the right tool for this is the aggregation framework. It runs in native code and does not require JavaScript interpretation, hence it is much faster.
An aggregation statement to get to your final result is like this:
db.collection.aggregate([
// Unwind the array to "de-normalize"
{ "$unwind": "$account_details.tag_cloud" },
// Group by "autoUid" and "tag", summing totals
{ "$group": {
"_id": {
"autoUid": "$autoUid",
"tag": "$account_details.tag_cloud"
},
"total": { "$sum": 1 }
}},
// Sort the results to largest count per user
{ "$sort": { "_id.autoUid": 1, "total": -1 } },
// Group to a single user with an array of "tags" if you must
{ "$group": {
"_id": "$_id.autoUid",
"tags": {
"$push": {
"tag": "$_id.tag",
"total": "$total"
}
}
}}
])
Slightly different output, but much simpler to process and much faster:
{
"_id": "ffc74819-c844-4d61-8657-b6ab09617271",
"tags": [
{ "tag": "Narendra Modi", "total": 2 },
{ "tag": "Prakash Javadekar", "total": 1 },
{ "tag": "Shastri Bhawan", "total": 1 },
{ "tag": "Prime Minister's Office (PMO)", "total": 1 },
{ "tag": "explosion", "total": 1 },
{ "tag": "GAIL", "total": 1 },
{ "tag": "Andhra Pradesh", "total": 1 },
{ "tag": "N Chandrababu Naidu", "total": 1 },
{ "tag": "Prime Minister", "total": 1 },
{ "tag": "Bharatiya Janata Party (BJP)", "total": 1 },
{ "tag": "Government", "total": 1 }
]
}
Also sorted by "tag relevance score" for the user for good measure, but you can look at dropping that or even both of the last stages as is appropriate to your actual case.
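For readers who want to see what the pipeline computes without a MongoDB instance, here is a plain JavaScript illustration of the same unwind, group and sort steps. It is only a sketch of the logic, not of how MongoDB executes it. tagTotals, the short autoUid values and the sample documents are made up for the example.

```javascript
// Made-up sample documents with array-valued tag_cloud fields.
const docs = [
  { autoUid: "u1", account_details: { tag_cloud: ["Prakash Javadekar", "Narendra Modi"] } },
  { autoUid: "u1", account_details: { tag_cloud: ["Prime Minister", "Narendra Modi"] } },
  { autoUid: "u2", account_details: { tag_cloud: ["Football"] } }
];

function tagTotals(documents) {
  const perUser = new Map();
  for (const doc of documents) {
    const counts = perUser.get(doc.autoUid) || new Map();
    for (const tag of doc.account_details.tag_cloud) { // like $unwind
      counts.set(tag, (counts.get(tag) || 0) + 1);     // like $group with $sum: 1
    }
    perUser.set(doc.autoUid, counts);
  }
  return [...perUser].map(([autoUid, counts]) => ({
    _id: autoUid,
    tags: [...counts]
      .map(([tag, total]) => ({ tag, total }))
      .sort((a, b) => b.total - a.total) // like $sort on total, descending
  }));
}

const totals = tagTotals(docs);
console.log(JSON.stringify(totals[0].tags[0])); // {"tag":"Narendra Modi","total":2}
```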
Still, this is by far the best option. Learn how to use the aggregation framework. If your "output" will still be "big" ( over 16MB ) then try to look at moving to MongoDB 2.6 or greater. Aggregate statements can produce a "cursor" which can be iterated rather than pulling all results at once. Also there is the $out operator, which can create a collection just like mapReduce does.
If your data is actually in the "hash"-like format of sub-documents, as you indicate in your notation ( which follows a PHP "dump" convention for arrays ), then you need to use mapReduce, as the aggregation framework cannot traverse "hash-keys" the way these are represented. It is not the best structure, and you should change it if this is the case.
Still, there are several corrections to make to your approach, and this does in fact become a single-step operation to the final result. Again though, the final output will contain an "array" of "tags", since it really is not good practice to use your "data" as "key" names:
db.collection.mapReduce(
function() {
var tag_cloud = this.account_details.tag_cloud;
var obj = {};
for ( var k in tag_cloud ) {
obj[tag_cloud[k]] = 1;
}
emit( this.autoUid, obj );
},
function(key,values) {
var reduced = {};
// Combine keys and totals
values.forEach(function(value) {
for ( var k in value ) {
if (!reduced.hasOwnProperty(k))
reduced[k] = 0;
reduced[k] += value[k];
}
});
return reduced;
},
{
"out": { "inline": 1 },
"finalize": function(key,value) {
var output = [];
// Mapped to array for output
for ( var k in value ) {
output.push({
"tag": k,
"total": value[k]
});
}
// Even sorted just the same
return output.sort(function(a,b) {
return b.total - a.total; // largest total first, as in the $sort stage above
});
}
}
)
Or if it actually is an "array" of "tags" in your original document but your end output will be too big and you cannot move up to a recent release, then the initial array processing is just a little different:
db.collection.mapReduce(
function() {
var tag_cloud = this.account_details.tag_cloud;
var obj = {};
tag_cloud.forEach(function(tag) {
obj[tag] = 1;
});
emit( this.autoUid, obj );
},
function(key,values) {
var reduced = {};
// Combine keys and totals
values.forEach(function(value) {
for ( var k in value ) {
if (!reduced.hasOwnProperty(k))
reduced[k] = 0;
reduced[k] += value[k];
}
});
return reduced;
},
{
"out": { "replace": "newcollection" },
"finalize": function(key,value) {
var output = [];
// Mapped to array for output
for ( var k in value ) {
output.push({
"tag": k,
"total": value[k]
});
}
// Even sorted just the same
return output.sort(function(a,b) {
return b.total - a.total; // largest total first, as in the $sort stage above
});
}
}
)
Everything essentially follows the same principles to get to the end result:
De-normalize to a "user" and "tag" combination with "user" as the grouping key
Combine the results per user with a total on "tag" values.
In the mapReduce approach here, apart from being cleaner than what you seemed to be trying, the main point to consider is that the reducer needs to "output" exactly the same sort of structure as the "input" that comes from the mapper. The reason is actually well documented: the "reducer" can in fact get called several times, basically "reducing again" output that has already been through reduce processing. It also explains the "tag_cloud" documents in your interim collection: the reduce function is not called for a key that has only a single value, so the mapper's output is stored as it was emitted.
This is generally how mapReduce deals with "large inputs", where there are lots of values for a given "key" and the "reducer" only processes so many of them at one time. For example a reducer may actually only take 30 or so documents emitted with the same key, reduce two sets of those 30 down to 2 documents and then finally reduce to a single output for a single key.
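That requirement can be checked outside MongoDB. The sketch below runs a reducer shaped like the one above in plain JavaScript and compares reducing three emitted values at once with reducing two first and then feeding that partial result back in. reduceCounts and the sample values are illustrative.

```javascript
// Same merging logic as the reducer above: sums the counts per tag.
function reduceCounts(key, values) {
  const reduced = {};
  values.forEach(value => {
    for (const k in value) {
      if (!Object.prototype.hasOwnProperty.call(reduced, k)) reduced[k] = 0;
      reduced[k] += value[k];
    }
  });
  return reduced;
}

// Values as the mapper would emit them for one user.
const emitted = [
  { "Narendra Modi": 1, "Shastri Bhawan": 1 },
  { "explosion": 1, "GAIL": 1 },
  { "Narendra Modi": 1, "Government": 1 }
];

const allAtOnce = reduceCounts("user", emitted);

// Re-reduce: a partial result goes back in next to a fresh mapper value.
const partial = reduceCounts("user", emitted.slice(0, 2));
const inTwoSteps = reduceCounts("user", [partial, emitted[2]]);

console.log(allAtOnce["Narendra Modi"]); // 2
console.log(JSON.stringify(allAtOnce) === JSON.stringify(inTwoSteps)); // true
```

Because the reducer's output has the same shape as the mapper's output, both paths give the same totals. A reducer that returns a different shape, such as the mid_tags wrapper in the question, breaks this property.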
The end result here is the same as the other output shown above, with the mapReduce difference that everything is under a "value" key as that is just how it works.
So a couple of ways to do it depending on your data. Do try to stick with the aggregation framework where possible as it is much faster and modern versions can consume and output just as much data as you can throw at mapReduce.
