replace multiple values in json/jsObject/string - javascript

I have a response from a web service and want to replace some values in the response with my custom values.
One way is to write a tree traverser, then check for the value and replace it with my custom value.
So the response is somewhat like this:
[
{
"name": "n1",
"value": "v1",
"children": [
{
"name": "n2",
"value": "v2"
}
]
},
{
"name": "n3",
"value": "v3"
}
]
Now my custom map is like this:
const map = {
"v1": "v11",
"v2": "v22",
"v3": "v33"
};
All I want is
[
{
"name": "n1",
"value": "v11",
"children": [
{
"name": "n2",
"value": "v22"
}
]
},
{
"name": "n3",
"value": "v33"
}
]
I was thinking I could stringify my response and then replace values using a custom-built regex from my map of values.
Will it be faster as compared to tree traverser?
If yes, how should I do that?
somewhat like this
originalString.replace(regexp, function (replacement))

The tree traversal is faster
Note that some things could be done more efficiently in the regex implementation, but I still think there are more fundamental bottlenecks worth explaining.
Why the regex is slow:
There are probably many more reasons why the regex is slower but I'll explain at least one significant reason:
When you use a regex to find and replace, you create new strings and re-run the match every time. Regular expressions can be very expensive, and my implementation isn't particularly cheap.
Why is the tree traversal faster:
In the tree traversal, I'm mutating the object directly. This doesn't require creating new strings, or any new objects at all. We're also not scanning the whole string on every replacement.
RESULTS
Run the performance test below. The test uses console.time to record how long each approach takes. You'll see that the tree traversal is much faster.
function usingRegex(obj, map) {
return JSON.parse(Object.keys(map).map(oldValue => ({
oldValue,
newValue: map[oldValue]
})).reduce((json, {
oldValue,
newValue
}) => {
return json.replace(
new RegExp(`"value":"(${oldValue})"`),
() => `"value":"${newValue}"`
);
}, JSON.stringify(obj)));
}
function usingTree(obj, map) {
function traverse(children) {
for (let item of children) {
if (item && map[item.value] !== undefined) {
// getting a value from a JS object is O(1)
item.value = map[item.value];
}
if (item && item.children) {
traverse(item.children)
}
}
}
traverse(obj);
return obj; // mutates
}
const obj = JSON.parse(`[
{
"name": "n1",
"value": "v1",
"children": [
{
"name": "n2",
"value": "v2"
}
]
},
{
"name": "n3",
"value": "v3"
}
]`);
const map = {
"v1": "v11",
"v2": "v22",
"v3": "v33"
};
// show that each function is working first
console.log('== TEST THE FUNCTIONS ==');
console.log('usingRegex', usingRegex(obj, map));
console.log('usingTree', usingTree(obj, map));
const iterations = 10000; // ten thousand
console.log('== DO 10000 ITERATIONS ==');
console.time('regex implementation');
for (let i = 0; i < iterations; i += 1) {
usingRegex(obj, map);
}
console.timeEnd('regex implementation');
console.time('tree implementation');
for (let i = 0; i < iterations; i += 1) {
usingTree(obj, map);
}
console.timeEnd('tree implementation');

Will it be faster as compared to tree traverser?
I don't know. I think it would depend on the size of the input, and the size of the replacement map. You could run some tests at JSPerf.com.
If yes, how should I do that?
It's fairly easy to do with a regex-based string replacement if the values you are replacing don't need any special escaping or whatever. Something like this:
const input = [
{
"name": "n1",
"value": "v1",
"children": [
{
"name": "n2",
"value": "v2"
}
]
},
{
"name": "n3",
"value": "v3"
}
];
const map = {
"v1": "v11",
"v2": "v22",
"v3": "v33"
};
// create a regex that matches any of the map keys, adding ':' and quotes
// to be sure to match whole property values and not property names
const regex = new RegExp(':\\s*"(' + Object.keys(map).join('|') + ')"', 'g');
// NOTE: if you've received this data as JSON then do the replacement
// *before* parsing it, don't parse it then restringify it then reparse it.
const json = JSON.stringify(input);
const result = JSON.parse(
json.replace(regex, function(m, key) { return ': "' + map[key] + '"'; })
);
console.log(result);
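If the values you are replacing could contain regex metacharacters, you could escape the map keys before building the regex. A minimal sketch, assuming the same map as above and a small escapeRegExp helper (my own illustrative name, using the usual escaping idiom):
// escape regex metacharacters so the map keys are matched literally
function escapeRegExp(str) {
    return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
const safeKeys = Object.keys(map).map(escapeRegExp);
const safeRegex = new RegExp(':\\s*"(' + safeKeys.join('|') + ')"', 'g');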

The traverser will definitely be faster: a string replace has to walk every character of the final string, whereas a traverser can skip items it does not need to touch.
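A minimal sketch of such a traverser, assuming the response and map shapes from the question (the function name and the handling of arbitrary nesting are illustrative, not taken from the answers above):
function replaceValues(node, map) {
    if (Array.isArray(node)) {
        node.forEach(child => replaceValues(child, map));
    } else if (node && typeof node === "object") {
        if (map[node.value] !== undefined) {
            node.value = map[node.value]; // direct O(1) map lookup, no string scanning
        }
        Object.values(node).forEach(child => replaceValues(child, map));
    }
    return node;
}
// usage: replaceValues(response, map) mutates and returns the response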

Related

Javascript: Removing Semi-Duplicate Objects within an Array with Conditions

I am trying to remove the "Duplicate" objects within an array while retaining the object that has the lowest value associated with it.
~~Original
var array = [
{
"time": "2021-11-12T20:37:11.112233Z",
"value": 3.2
},
{
"time": "2021-11-12T20:37:56.115222Z",
"value": 3.8
},
{
"time": "2021-11-13T20:37:55.112255Z",
"value": 4.2
},
{
"time": "2021-11-13T20:37:41.112252Z",
"value": 2
},
{
"time": "2021-11-14T20:37:22.112233Z",
"value": 3.2
}
]
~~Expected Output
var array = [
{
"time": "2021-11-12T20:37:11.112233Z",
"value": 3.2
},
{
"time": "2021-11-13T20:37:41.112252Z",
"value": 2
},
{
"time": "2021-11-14T20:37:22.112233Z",
"value": 3.2
}
]
What I have so far:
var result = array.reduce((aa, tt) => {
if (!aa[tt.time]) {
aa[tt.time] = tt;
} else if (Number(aa[tt.time].value) < Number(tt.value)) {
aa[tt.time] = tt;
}
return aa;
}, {});
console.log(result);
I realize the issue with what I am trying to do is that the "time" attribute is not identical to the other time values I am considering as duplicates.
Though for this use case I do not need the time out to milliseconds; YYYY-MM-DDTHH:MM (to the minute) is fine. I am not sure how to implement a reduction for this case when the time isn't exactly the same. Maybe only the first 16 characters of the string should be checked?
Let me know if any additional information is needed.
So a few issues:
If you want to only check the first 16 characters to detect a duplicate, you should use that substring of tt.time as key for aa instead of the whole string.
Since you want the minimum, your comparison operator is wrong.
The code produces an object, while you want an array, so you still need to extract the values from the object.
Here is your code with those adaptations:
var array = [{"time": "2021-11-12T20:37:11.112233Z","value": 3.2},{"time": "2021-11-12T20:37:56.115222Z","value": 3.8},{"time": "2021-11-13T20:37:55.112255Z","value": 4.2},{"time": "2021-11-13T20:37:41.112252Z","value": 2},{"time": "2021-11-14T20:37:22.112233Z","value": 3.2}];
var result = Object.values(array.reduce((aa, tt) => {
var key = tt.time.slice(0, 16);
if (!aa[key]) {
aa[key] = tt;
} else if (Number(aa[key].value) > Number(tt.value)) {
aa[key] = tt;
}
return aa;
}, {}));
console.log(result);

How to get all values of given specific keys (for e.g: name) without loop from json?

I want to fetch all the names and labels from the JSON without a loop. Is there a way to fetch them with a filter method?
"sections": [
{
"id": "62ee1779",
"name": "Drinks",
"items": [
{
"id": "1902b625",
"name": "Cold Brew",
"optionSets": [
{
"id": "45f2a845-c83b-49c2-90ae-a227dfb7c513",
"label": "Choose a size",
},
{
"id": "af171c34-4ca8-4374-82bf-a418396e375c",
"label": "Additional Toppings",
},
],
},
]
}
When you say "without loops" I take it to mean without for loops, because any kind of traversal of arrays, let alone nested traversal, involves iterating.
You can use the reduce method to have it done for you internally and give you the format you need.
Try this :
const data = {
sections: [
{
id: "62ee1779",
name: "Drinks",
items: [
{
id: "1902b625",
name: "Cold Brew",
optionSets: [
{
id: "45f2a845-c83b-49c2-90ae-a227dfb7c513",
label: "Choose a size"
},
{
id: "af171c34-4ca8-4374-82bf-a418396e375c",
label: "Additional Toppings"
}
]
}
]
}
]
};
const x = data.sections.reduce((acc, ele) => {
acc.push(ele.name);
const otherName = ele.items.reduce((acc2, elem2) => {
acc2.push(elem2.name);
const label = elem2.optionSets.reduce((acc3, elem3) => {
acc3.push(elem3.label);
return acc3;
}, []);
return acc2.concat(label);
}, []);
return acc.concat(otherName);
}, []);
console.log(x);
Go ahead and press run snippet to see if this matches your desired output.
For more info, see the documentation on the reduce method.
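As an alternative sketch (assuming the same data shape as above and an ES2019+ runtime for flatMap), the same extraction can be written without nested reduce calls:
const names = data.sections.flatMap(section => [
    section.name,
    ...section.items.flatMap(item => [
        item.name,
        ...item.optionSets.map(optionSet => optionSet.label)
    ])
]);
console.log(names); // ["Drinks", "Cold Brew", "Choose a size", "Additional Toppings"]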
In the context of cJSON:
Yes, we can fetch the key's value from any of the objects.
Each key's value is held by one of the objects; we simply fetch that object and read the value from it.
In the above case:
Prerequisite: root must hold the JSON and must be a cJSON pointer; if not, define it and use cJSON_Parse() to parse the JSON.
The 1st "name" sits under "sections" (an array), so fetch the array, take its first element, and read "name":
cJSON *sections = cJSON_GetObjectItem(root, "sections");
cJSON *section0 = cJSON_GetArrayItem(sections, 0);
char *name1 = cJSON_GetObjectItem(section0, "name")->valuestring;
For the 2nd "name" key value (inside the "items" array):
cJSON *items = cJSON_GetObjectItem(section0, "items");
cJSON *item0 = cJSON_GetArrayItem(items, 0);
char *name2 = cJSON_GetObjectItem(item0, "name")->valuestring;
Likewise, we can do the same for the other keys to fetch their values.

How to extract multiple hashtags from a JSON object?

I am trying to extract the "animal" and "fish" hashtags from the JSON object below. I know how to extract the first instance, named "animal", but I have no idea how to extract both instances. I was thinking of using a loop, but I am unsure where to start with it. Please advise.
data = '{"hashtags":[{"text":"animal","indices":[5110,1521]},
{"text":"Fish","indices":[122,142]}],"symbols":[],"user_mentions":
[{"screen_name":"test241","name":"Test
Dude","id":4999095,"id_str":"489996095","indices":[30,1111]},
{"screen_name":"test","name":"test","id":11999991,
"id_str":"1999990", "indices":[11,11]}],"urls":[]}';
function showHashtag(data){
i = 0;
obj = JSON.parse(data);
console.log(obj.hashtags[i].text);
}
showHashtag(data);
Use Array.prototype.filter():
let data = '{"hashtags":[{"text":"animal","indices":[5110,1521]},{"text":"Fish","indices":[122,142]}],"symbols":[],"user_mentions":[{"screen_name":"test241","name":"Test Dude","id":4999095,"id_str":"489996095","indices":[30,1111]}, {"screen_name":"test","name":"test","id":11999991, "id_str":"1999990", "indices":[11,11]}],"urls":[]}';
function showHashtag(data){
return JSON.parse(data).hashtags.filter(e => /animal|fish/i.test(e.text))
}
console.log(showHashtag(data));
To make the function reusable, in case you want to find other "hashtags", you could pass an array like so:
function showHashtag(data, tags){
let r = new RegExp(tags.join("|"), "i");
return JSON.parse(data).hashtags.filter(e => r.test(e.text))
}
console.log(showHashtag(data, ['animal', 'fish']));
To get only the text property, just chain map()
console.log(showHashtag(data, ['animal', 'fish']).map(e => e.text));
or in the function
return JSON.parse(data).hashtags
.filter(e => /animal|fish/i.test(e.text))
.map(e => e.text);
EDIT:
I don't really get why you would filter by animal and fish if all you want is an array with ['animal', 'fish']. To only get the objects that have a text property, again, use filter, but like this
let data = '{"hashtags":[{"text":"animal","indices":[5110,1521]},{"text":"Fish","indices":[122,142]}],"symbols":[],"user_mentions":[{"screen_name":"test241","name":"Test Dude","id":4999095,"id_str":"489996095","indices":[30,1111]}, {"screen_name":"test","name":"test","id":11999991, "id_str":"1999990", "indices":[11,11]}],"urls":[]}';
function showHashtag(data){
return JSON.parse(data).hashtags
.filter(e => e.text)
.map(e => e.text);
}
console.log(showHashtag(data));
For me, Lodash can be of great use here, since it has many functions for working with collections. For your case I'd use the _.find function to check the array and get any of the tags matching the criteria passed in as the second argument, like so:
_.find(collection, [predicate=_.identity], [fromIndex=0])
source npm package
Iterates over elements of collection, returning the first element
predicate returns truthy for. The predicate is invoked with three
arguments: (value, index|key, collection).
with your case this should work
var data = '{ "hashtags": [ { "text": "animal", "indices": [ 5110, 1521 ] }, { "text": "Fish", "indices": [ 122, 142 ] } ], "symbols": [], "user_mentions": [ { "screen_name": "test241", "name": "Test \n Dude", "id": 4999095, "id_str": "489996095", "indices": [ 30, 1111 ] }, { "screen_name": "test", "name": "test", "id": 11999991, "id_str": "1999990", "indices": [ 11, 11 ] } ], "urls": [] }';
var obj = JSON.parse(data);
_.find(obj.hashtags, { 'text': 'animal' });
// => { "text": "animal", "indices": [ 5110, 1521 ] }
For simple parsing like this one, I would use the plain old obj.forEach() method; it is more readable and easy to understand, especially for JavaScript beginners.
obj = JSON.parse(data).hashtags;
obj.forEach(function(element) {
console.log(element['text']);
});

Is it possible to access a json array element without using index number?

I have the following JSON:
{
"responseObject": {
"name": "ObjectName",
"fields": [
{
"fieldName": "refId",
"value": "2170gga35511"
},
{
"fieldName": "telNum",
"value": "4541885881"
}]}
}
I want to access the "value" of the array element with "fieldName": "telNum" without using index numbers, because I don't know at which position this telNum element will appear each time.
What I dream of is something like this:
jsonVarName.responseObject.fields['fieldname'='telNum'].value
Is this even possible in JavaScript?
You can do it like this
var k={
"responseObject": {
"name": "ObjectName",
"fields": [
{
"fieldName": "refId",
"value": "2170gga35511"
},
{
"fieldName": "telNum",
"value": "4541885881"
}]
}};
value1=k.responseObject.fields.find(
function(i)
{return (i.fieldName=="telNum")}).value;
console.log(value1);
There is JSONPath, which lets you write queries just like XPath does for XML.
$.store.book[*].author the authors of all books in the store
$..author all authors
$.store.* all things in the store, which are some books and a red bicycle
$.store..price the price of everything in the store
$..book[2] the third book
$..book[(@.length-1)]
$..book[-1:] the last book in order
$..book[0,1]
$..book[:2] the first two books
$..book[?(@.isbn)] filter all books with an ISBN number
$..book[?(@.price<10)] filter all books cheaper than 10
$..* all members of the JSON structure
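For the structure in this question, the lookup would be a filter expression such as $.responseObject.fields[?(@.fieldName=='telNum')].value. A short sketch, assuming the jsonpath npm package and the k object defined in the first answer above (the original answer does not name a specific library):
const jp = require('jsonpath'); // npm install jsonpath
const telNums = jp.query(k, "$.responseObject.fields[?(@.fieldName=='telNum')].value");
console.log(telNums[0]); // "4541885881"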
You will have to loop through and find it.
var json = {
"responseObject": {
"name": "ObjectName",
"fields": [
{
"fieldName": "refId",
"value": "2170gga35511"
},
{
"fieldName": "telNum",
"value": "4541885881"
}]
}
};
function getValueForFieldName(fieldName){
for(var i = 0; i < json.responseObject.fields.length; i++){
if(json.responseObject.fields[i].fieldName == fieldName){
return json.responseObject.fields[i].value;
}
}
return false;
}
console.log(getValueForFieldName("telNum"));
It might be a better option to transform the array once into an object keyed by fieldName, to avoid calling .find over and over again.
fields = Object.assign({}, ...fields.map(field => {
const newField = {};
newField[field.fieldName] = field.value;
return newField;
}));
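After that one-time transformation, looking up a field is a plain property access. A usage sketch, assuming fields started out as responseObject.fields from the question:
console.log(fields.telNum); // "4541885881"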
It's not possible. Native JavaScript has nothing similar to XPath for XML that lets you query JSON like that. You have to loop or use Array.prototype.find(), as stated in the comments and shown in the answer above. Note that find() was still experimental at the time, supported only in Chrome 45+, Safari 7.1+ and FF 25+, with no IE support.
A clean and easy way is to just loop through the array (using jQuery's $.each here):
var json = {
"responseObject": {
"name": "ObjectName",
"fields": [
{
"fieldName": "refId",
"value": "2170gga35511"
},
{
"fieldName": "telNum",
"value": "4541885881"
}]
}
};
var telNum;
$(json.responseObject.fields).each(function (i, field) {
if (field.fieldName === "telNum") {
telNum = field.value;
return false; // returning false breaks out of $.each
}
});
console.log(telNum);

mapreduce with sort on inner document mongodb

I have a quick question on map-reduce with MongoDB. I have the following document structure:
{
"_id": "ffc74819-c844-4d61-8657-b6ab09617271",
"value": {
"mid_tag": {
"0": {
"0": "Prakash Javadekar",
"1": "Shastri Bhawan",
"2": "Prime Minister's Office (PMO)",
"3": "Narendra Modi"
},
"1": {
"0": "explosion",
"1": "GAIL",
"2": "Andhra Pradesh",
"3": "N Chandrababu Naidu"
},
"2": {
"0": "Prime Minister",
"1": "Narendra Modi",
"2": "Bharatiya Janata Party (BJP)",
"3": "Government"
}
},
"total": 3
}
}
When I run my map-reduce on this collection of documents, I want to specify total as the sort field in this command:
db.ana_mid_big.mapReduce(map, reduce,
{
out: "analysis_result",
sort: {"value.total": -1}
}
);
But this does not seem to work. How can I specify a nested key for sorting? Please help.
----------------------- EDIT ---------------------------------
As per the comments, I am posting my whole problem here. I started with a collection of a little more than 3.5M documents (this is just an old snapshot of the live one, which has already crossed 5.5M) which looks like this:
{
"_id": ObjectId("53b394d6f9c747e33d19234d"),
"autoUid": "ffc74819-c844-4d61-8657-b6ab09617271"
"createDate": ISODate("2014-07-02T05:12:54.171Z"),
"account_details": {
"tag_cloud": {
"0": "FIFA World Cup 2014",
"1": "Brazil",
"2": "Football",
"3": "Argentina",
"4": "Belgium"
}
}
}
So, there can be many documents with the same autoUid but with different (or partially or even fully identical) tag_cloud.
I have written the following map-reduce to generate an intermediate collection that looks like the one at the start of the question, i.e. all the tag_clouds belonging to one person collected in a single document. To achieve this, the MR code I used looks like the following:
var map = function(){
final_val = {
tag_cloud: this.account_details.tag_cloud,
total: 1
};
emit(this.autoUid, final_val)
}
var reduce = function(key, values){
var fv = {
mid_tags: [],
total: 0
}
try{
for (i in values){
fv.mid_tags.push(values[i].tag_cloud);
fv.total = fv.total + 1;
}
}catch(e){
fv.mid_tags.push(values)
fv.total = fv.total + 1;
}
return fv;
}
db.my_orig_collection.mapReduce(map, reduce,
{
out: "analysis_mid",
sort: {createDate: -1}
}
);
Here comes problem number 1: when somebody has more than one record, the reduce function is applied as expected. But when somebody has only one record, instead of naming the field "mid_tag" it retains the name "tag_cloud". I understand that there is some problem with the reduce code but cannot find what it is.
Now I want to reach a final result that looks like this:
{"_id": "ffc74819-c844-4d61-8657-b6ab09617271",
"value": {
"tags": {
"Prakash Javadekar": 1,
"Shastri Bhawan": 1,
"Prime Minister's Office (PMO)": 1,
"Narendra Modi": 2,
"explosion": 1,
"GAIL": 1,
"Andhra Pradesh": 1,
"N Chandrababu Naidu": 1,
"Prime Minister": 1,
"Bharatiya Janata Party (BJP)": 1,
"Government": 1
}
}
}
That is, finally one document for each person, representing the density of the tags they have used. The MR code I am trying to use (not tested yet) looks like this:
var map = function(){
var val = {};
if ("mid_tags" in this.value){
for (i in this.value.mid_tags){
for (j in this.value.mid_tags[i]){
k = this.value.mid_tags[i][j].trim();
if (!(k in val)){
val[k] = 1;
}else{
val[k] = val[k] + 1;
}
}
}
var final_val = {
tag: val,
total: this.value.total
}
emit(this._id, final_val);
}else if("tag_cloud" in this.value){
for (i in this.value.tag_cloud){
k = this.value.tag_cloud[i].trim();
if (!(k in val)){
val[k] = 1;
}else{
val[k] = val[k] + 1;
}
}
var final_val = {
tag: val,
total: this.value.total
}
emit(this._id, final_val);
}
}
var reduce = function(key, values){
return values;
}
db.analysis_mid.mapReduce(map, reduce,
{
out: "analysis_result"
}
);
This last piece of code is not tested yet. That is all I want to do. Please help
Your PHP background appears to be showing. The data structures you are representing do not show arrays in typical JSON notation; however, the calls to "push" in your mapReduce code indicate that, at least in your "interim document", the values actually are arrays. You seem to have "notated" the originals the same way, so it seems reasonable to presume they are arrays too.
Actual arrays are your best option for storage here, especially considering your desired outcome. So even if they do not, your original documents should look like this, as they would be represented in the shell:
{
"_id": ObjectId("53b394d6f9c747e33d19234d"),
"autoUid": "ffc74819-c844-4d61-8657-b6ab09617271"
"createDate": ISODate("2014-07-02T05:12:54.171Z"),
"account_details": {
"tag_cloud": [
"FIFA World Cup 2014",
"Brazil",
"Football",
"Argentina",
"Belgium"
]
}
}
With documents like that, or if you change them to be like that, the right tool for this is the aggregation framework. It runs in native code and does not require JavaScript interpretation, hence it is much faster.
An aggregation statement to get to your final result is like this:
db.collection.aggregate([
// Unwind the array to "de-normalize"
{ "$unwind": "$account_details.tag_cloud" },
// Group by "autoUid" and "tag", summing totals
{ "$group": {
"_id": {
"autoUid": "$autoUid",
"tag": "$account_details.tag_cloud"
},
"total": { "$sum": 1 }
}},
// Sort the results to largest count per user
{ "$sort": { "_id.autoUid": 1, "total": -1 } },
// Group to a single user with an array of "tags" if you must
{ "$group": {
"_id": "$_id.autoUid",
"tags": {
"$push": {
"tag": "$_id.tag",
"total": "$total"
}
}
}}
])
Slightly different output, but much simpler to process and much faster:
{
"_id": "ffc74819-c844-4d61-8657-b6ab09617271",
"tags": [
{ "tag": "Narendra Modi", "total": 2 },
{ "tag": "Prakash Javadekar", "total": 1 },
{ "tag": "Shastri Bhawan", "total": 1 },
{ "tag": "Prime Minister's Office (PMO)", "total": 1 },
{ "tag": "explosion", "total": 1 },
{ "tag": "GAIL", "total": 1 },
{ "tag": "Andhra Pradesh", "total": 1 },
{ "tag": "N Chandrababu Naidu", "total": 1 },
{ "tag": "Prime Minister", "total": 1 },
{ "tag": "Bharatiya Janata Party (BJP)", "total": 1 },
{ "tag": "Government", "total": 1 }
]
}
Also sorted by "tag relevance score" for the user for good measure, but you can look at dropping that or even both of the last stages as is appropriate to your actual case.
Still, this is by far the best option. Get to learn how to use the aggregation framework. If your "output" will still be "big" (over 16MB), then look at moving to MongoDB 2.6 or greater. Aggregate statements can produce a "cursor" which can be iterated rather than pulling all results at once. There is also the $out operator, which can create a collection just like mapReduce does; a sketch follows below.
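A minimal sketch, assuming MongoDB 2.6+, the same pipeline as above, and a hypothetical output collection name:
db.collection.aggregate([
    { "$unwind": "$account_details.tag_cloud" },
    { "$group": {
        "_id": {
            "autoUid": "$autoUid",
            "tag": "$account_details.tag_cloud"
        },
        "total": { "$sum": 1 }
    }},
    { "$sort": { "_id.autoUid": 1, "total": -1 } },
    { "$group": {
        "_id": "$_id.autoUid",
        "tags": { "$push": { "tag": "$_id.tag", "total": "$total" } }
    }},
    // write the result to a collection instead of returning a cursor
    { "$out": "analysis_result" }
])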
If your data is actually in the "hash"-like format of sub-documents that your notation indicates (which follows a PHP "dump" convention for arrays), then you need to use mapReduce, as the aggregation framework cannot traverse "hash keys" represented this way. That is not the best structure, and you should change it if this is the case.
Still, there are several corrections to your approach, and this does in fact become a single-step operation to the final result. Again though, the final output will contain an "array" of "tags", since it really is not good practice to use your "data" as "key" names:
db.collection.mapReduce(
function() {
var tag_cloud = this.account_details.tag_cloud;
var obj = {};
for ( var k in tag_cloud ) {
obj[tag_cloud[k]] = 1;
}
emit( this.autoUid, obj );
},
function(key,values) {
var reduced = {};
// Combine keys and totals
values.forEach(function(value) {
for ( var k in value ) {
if (!reduced.hasOwnProperty(k))
reduced[k] = 0;
reduced[k] += value[k];
}
});
return reduced;
},
{
"out": { "inline": 1 },
"finalize": function(key,value) {
var output = [];
// Mapped to array for output
for ( var k in value ) {
output.push({
"tag": k,
"total": value[k]
});
}
// Even sorted just the same
return output.sort(function(a,b) {
return b.total - a.total; // descending by total, matching the aggregation output
});
}
}
)
Or if it actually is an "array" of "tags" in your original document but your end output will be too big and you cannot move up to a recent release, then the initial array processing is just a little different:
db.collection.mapReduce(
function() {
var tag_cloud = this.account_details.tag_cloud;
var obj = {};
tag_cloud.forEach(function(tag) {
obj[tag] = 1;
});
emit( this.autoUid, obj );
},
function(key,values) {
var reduced = {};
// Combine keys and totals
values.forEach(function(value) {
for ( var k in value ) {
if (!reduced.hasOwnProperty(k))
reduced[k] = 0;
reduced[k] += value[k];
}
});
return reduced;
},
{
"out": { "replace": "newcollection" },
"finalize": function(key,value) {
var output = [];
// Mapped to array for output
for ( var k in value ) {
output.push({
"tag": k,
"total": value[k]
});
}
// Even sorted just the same
return output.sort(function(a,b) {
return b.total - a.total; // descending by total, matching the aggregation output
});
}
}
)
Everything essentially follows the same principles to get to the end result:
De-normalize to a "user" and "tag" combination, with "user" as the grouping key
Combine the results per user with a total on "tag" values.
In the mapReduce approach here, apart from being cleaner than what you seemed to be trying, the other main point to consider is that the reducer needs to "output" exactly the same sort of structure as the "input" that comes from the mapper. The reason is actually well documented, as the "reducer" can in fact get called several times, basically "reducing again" output that has already been through reduce processing.
This is generally how mapReduce deals with "large inputs", where there are lots of values for a given "key" and the "reducer" only processes so many of them at one time. For example a reducer may actually only take 30 or so documents emitted with the same key, reduce two sets of those 30 down to 2 documents and then finally reduce to a single output for a single key.
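A small standalone illustration of that requirement (hypothetical, run outside MongoDB, with the reducer from the mapReduce call above pulled out as a named function):
function reduce(key, values) {
    var reduced = {};
    values.forEach(function(value) {
        for (var k in value) {
            if (!reduced.hasOwnProperty(k)) reduced[k] = 0;
            reduced[k] += value[k];
        }
    });
    return reduced;
}
// first pass: partial reductions over subsets of the values emitted for one key
var partialA = reduce("user1", [{ "Narendra Modi": 1 }, { "GAIL": 1 }]); // { "Narendra Modi": 1, "GAIL": 1 }
var partialB = reduce("user1", [{ "Narendra Modi": 1 }]);                // { "Narendra Modi": 1 }
// second pass: the partial outputs are reduced again, so they must have the
// same shape as the values the mapper emits
var combined = reduce("user1", [partialA, partialB]);                    // { "Narendra Modi": 2, "GAIL": 1 }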
The end result here is the same as the other output shown above, with the mapReduce difference that everything is under a "value" key as that is just how it works.
So a couple of ways to do it depending on your data. Do try to stick with the aggregation framework where possible as it is much faster and modern versions can consume and output just as much data as you can throw at mapReduce.
