What is the equivalent of a reduce in JavaScript

I'm a backend dev who recently moved to the JS side. I was going through a tutorial and came across the piece of code below.
clickCreate: function(component, event, helper) {
var validExpense = component.find('expenseform').reduce(function (validSoFar, inputCmp) {
// Displays error messages for invalid fields
inputCmp.showHelpMessageIfInvalid();
return validSoFar && inputCmp.get('v.validity').valid;
}, true);
// If we pass error checking, do some real work
if(validExpense){
// Create the new expense
var newExpense = component.get("v.newExpense");
console.log("Create expense: " + JSON.stringify(newExpense));
helper.createExpense(component, newExpense);
}
}
I tried hard to understand what's happening here: there is something called reduce and another thing named validSoFar, and I'm unable to understand what's happening under the hood. :-(
I do understand regular loops as done in Java.
Can someone please shed some light on what's happening here? I will be using this a lot in my regular work.
Thanks

The reduce function here iterates over each input component of the expense form and folds their validity into a single boolean. If you have, say, three inputs that are all valid, the successive calls evaluate:
true && true, where the first true is the initial value passed into reduce.
true && true, where the first true is the result of the previous call.
true && true, again starting from the previous result.
At the end of the reduction you are left with a single boolean representing the validity of the entire form: if even one input component's validity is false, the whole reduction evaluates to false. This is because validSoFar carries the overall validity from one call to the next; each call returns the conjunction of whether the form is valid so far and the validity of the current input, and that return value becomes validSoFar for the next call.
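To make the step-by-step evaluation concrete, here is a minimal sketch that runs the same reducer over three stand-in components. The makeInput helper and the field names are hypothetical stubs, not part of the framework; they only imitate the two calls the reducer makes.

```javascript
// Hypothetical stand-in for a form input component (not the real framework object).
function makeInput(name, valid, log) {
  return {
    // Records that the help message was requested for this input.
    showHelpMessageIfInvalid: function () { log.push(name); },
    // Only the 'v.validity' attribute is imitated here.
    get: function (key) { return key === 'v.validity' ? { valid: valid } : undefined; }
  };
}

var visited = [];
var inputs = [
  makeInput('amount', true, visited),
  makeInput('client', false, visited),
  makeInput('date', true, visited)
];

var steps = [];
var validExpense = inputs.reduce(function (validSoFar, inputCmp) {
  inputCmp.showHelpMessageIfInvalid();
  var current = inputCmp.get('v.validity').valid;
  var next = validSoFar && current;
  steps.push(validSoFar + ' && ' + current + ' -> ' + next);
  return next; // becomes validSoFar in the next call
}, true);

console.log(steps);        // one entry per input
console.log(validExpense); // false, because 'client' is invalid
console.log(visited);      // every input was visited, even after the first failure
```

Note that the side effect runs for every input: reduce does not stop early once the accumulated value is false.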

This is a reasonable equivalent:
var validExpense = true;
var inputCmps = component.find('expenseform');
for (var i = 0; i < inputCmps.length; i++) {
var inputCmp = inputCmps[i];
// Displays error messages for invalid fields
inputCmp.showHelpMessageIfInvalid();
if (!inputCmp.get('v.validity').valid) {
validExpense = false;
}
}
// Now we can use validExpense
This is a somewhat strange use of reduce, to be honest, because it does more than simply reducing a list to a single value. It also produces side effects (presumably) in the call to showHelpMessageIfInvalid().
The idea of reduce is simple. Given a list of values that you want to fold down one at a time into a single value (of the same or any other type), you supply a function that takes the current folded value and the next list value and returns a new folded value, and you supply an initial folded value, and reduce combines them by calling the function with each successive list value and the current folded value.
So, for instance,
var items = [
{name: 'foo', price: 7, quantity: 3},
{name: 'bar', price: 5, quantity: 5},
{name: 'baz', price: 19, quantity: 1}
]
const totalPrice = items.reduce(
(total, item) => total + item.price * item.quantity, // folding function
0 // initial value
); //=> 65
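If it helps to see what happens under the hood, the following is a simplified sketch of the loop that reduce performs. The name myReduce is made up for illustration, and the sketch ignores details of the real Array.prototype.reduce such as the optional initial value and the extra index and array arguments.

```javascript
// Simplified stand-in for Array.prototype.reduce (hypothetical helper name).
function myReduce(list, foldingFn, initialValue) {
  var folded = initialValue;
  for (var i = 0; i < list.length; i++) {
    // Combine the value folded so far with the next list element.
    folded = foldingFn(folded, list[i]);
  }
  return folded;
}

var cart = [
  { name: 'foo', price: 7, quantity: 3 },
  { name: 'bar', price: 5, quantity: 5 },
  { name: 'baz', price: 19, quantity: 1 }
];

var cartTotal = myReduce(cart, function (total, item) {
  return total + item.price * item.quantity;
}, 0);

console.log(cartTotal); // 65
```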

It does not make sense to use reduce there and have side effects inside the reducer. It is better to use Array.prototype.filter to get all invalid expense items.
Then use Array.prototype.forEach to produce the side effect(s) for each invalid item. You can then check the length of the invalid expense items array to see if your input was valid:
function(component, event, helper) {
var invalidExpenses = component.find('expenseform').filter(
function(ex){
//return not valid (!valid)
return !ex.get('v.validity').valid
}
);
invalidExpenses.forEach(
//use forEach if you need a side effect for each thing
function(ex){
ex.showHelpMessageIfInvalid();
}
);
// If we pass error checking, do some real work
if(invalidExpenses.length===0){//no invalid expense items
// Create the new expense
var newExpense = component.get("v.newExpense");
console.log("Create expense: " + JSON.stringify(newExpense));
helper.createExpense(component, newExpense);
}
}
The MDN documentation for Array.prototype.reduce has a good description and examples of how to use it.
It should take an array of things and return one other thing (which can be a different type of thing). But you won't find any examples there where side effects are initiated in the reducer function.

Related

Use one object to reduce another using lodash

I have a need to use the information in one object to determine which information in another to include.
The idea being that display controls what should be included, while alarms contains all the alarms in all their states.
display={
0: false,
1: true,
2: true,
3: true,
ACK: false,
MASKED: false,
SHELVED: false,
}
alarms={inAlarm:{ 0:["A"],
1:["B"],
2:["C"],
3:["D"],
},
latched:{ 1:[],
2:["C"],
3:[],
},
connectionError:["E"],
ACK:["F"],
SHELVED:["G"],
MASKED:[],
}
Everything in latched should always be included, but the others can be turned on/off.
So in this case I want to yield a single array which should be: ["B","C","D"]. I'm looking for a neat lodash way to do it, or even an efficient and elegant standard js version - but preferably something that doesn't need helper functions.
Extending @Fonty's answer, which will not work for keys 0, 1, 2, 3 in the display object because for them the value needs to be picked from the inAlarm key:
var result = [];
for (var key in display) {
if (!display[key]) {
continue;
}
if (isNaN(key)) { // for non-numeric keys
result = _.union(result, alarms[key]);
} else { // for numeric keys
result = _.union(result, alarms.inAlarm[key]);
}
}
result = _.union(result, _.union(_.flatten(_.values(alarms.latched))));
console.log(result);
So, while it may not be the most elegant solution and it's not straight up lodash, it's not too bad.
The best solution was to not include the inAlarm key, but rather pull those values out to the top level. Then I can simply do:
let tempA=[]
for(var key in display){
if(display[key]){
tempA=_.union(tempA, alarms[key])
}
}
tempA=_.union(tempA, _.flatten(_.union(_.values(alarms.LATCHED))))
This seems quick and non intensive, even if I had a few thousand entries.
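For the original nested structure, a version in standard JavaScript without lodash could look roughly like the sketch below. It assumes, as in the question, that the numeric display keys refer to entries under inAlarm and every other key refers to a top-level entry of alarms; the variable names enabled and latched are only illustrative.

```javascript
const display = { 0: false, 1: true, 2: true, 3: true, ACK: false, MASKED: false, SHELVED: false };
const alarms = {
  inAlarm: { 0: ['A'], 1: ['B'], 2: ['C'], 3: ['D'] },
  latched: { 1: [], 2: ['C'], 3: [] },
  connectionError: ['E'],
  ACK: ['F'],
  SHELVED: ['G'],
  MASKED: []
};

// Collect the alarms for every key that is switched on in display.
const enabled = Object.keys(display)
  .filter(key => display[key])
  .reduce((acc, key) => {
    // Assumption: keys present under inAlarm are looked up there, the rest at the top level.
    const list = key in alarms.inAlarm ? alarms.inAlarm[key] : (alarms[key] || []);
    return acc.concat(list);
  }, []);

// Latched alarms are always included.
const latched = Object.values(alarms.latched)
  .reduce((acc, list) => acc.concat(list), []);

// A Set drops duplicates while keeping first-seen order.
const result = Array.from(new Set(enabled.concat(latched)));
console.log(result); // [ 'B', 'C', 'D' ]
```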

Extracting specific values from an array within an object

I'm setting up a test to ensure that a faceted Solr query 'contents' are correctly displayed within a page element, using javascript.
The Solr query result, which I've named "ryanlinkstransmissionpage", is;
{ Transmission: [ 'Manual', 12104, 'Automatic', 9858 ] }
What I would like to do is extract the 'Manual' and 'Automatic' only, so I can then test that these values are displayed on a page.
However, it is more the functionality involved in this that I cannot get my head around, as I will be using this method on other Solr query results.
To possibly complicate things, this Solr query result "ryanlinkstransmissionpage" is from a dynamic 'live' Solr, so the values may change each time it's run (so there may be more or less values within this array when it's tested on the following day for example).
I've tried a few javascript commands, but to no avail.
JSON.parse(ryanlinkstransmissionpage)
JSON.stringify(ryanlinkstransmissionpage)
Object.values(ryanlinkstransmissionpage)
Any help would be greatly appreciated. Thanks.
If possible, I highly recommend changing the Transmission field to be an object rather than an array. That will give you far greater ability to read the data within.
Ignoring that, are you looking to extract the string values and the number values that follow them, i.e. "Manual" and 12104? Or are you simply trying to assert that the string values are present on the page?
Either way, here are two possible approaches.
const ryanlinkstransmissionpage = { Transmission: [ 'Manual', 12104, 'Automatic', 9858 ] };
// Pull out the string values
const strngVals = ryanlinkstransmissionpage.Transmission.filter(val => typeof val === 'string');
// Pull out the string values and the numbers that follow
const strngNumVals = ryanlinkstransmissionpage.Transmission.reduce((keyVals, val, idx, srcArr) => {
if (typeof val === 'string') keyVals[val] = srcArr[idx + 1];
return keyVals;
}, {});
The reduce approach is not stable or robust to changes in data provided from this Solr query result you refer to, nor is it tested. #shrug
JavaScript has a built-in method called Array.prototype.find(). If you just want to check whether a value exists to ensure it's on the page, you can simply do:
const ryanlinkstransmissionpage = { Transmission: [ 'Manual', 12104, 'Automatic', 9858 ] };
const manual = ryanlinkstransmissionpage.Transmission.find((ele) => ele === 'Manual'); // returns 'Manual'
const automatic = ryanlinkstransmissionpage.Transmission.find((ele) => ele === 'Automatic'); // returns 'Automatic'
console.log(automatic);
console.log(manual);
// or
const findInArray = (arr, toFind) => {
const result = arr.find((ele) => ele === toFind);
return !!result;
}
console.log(findInArray(ryanlinkstransmissionpage.Transmission, 'Automatic')); // true
console.log(findInArray(ryanlinkstransmissionpage.Transmission, 'HelloWorld')); // false
console.log(findInArray(ryanlinkstransmissionpage.Transmission, 'Manual')); // true
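As a possible variation, the sketch below pulls out the labels by position rather than by type and checks membership with Array.prototype.includes. It assumes the facet array always alternates label, count, label, count, as in the sample; the names facets, labels and hasLabel are illustrative only. Using includes also avoids a corner case of the !!result check above, which reports false when the matched element is itself falsy (for example an empty string).

```javascript
const facets = { Transmission: ['Manual', 12104, 'Automatic', 9858] };

// Assumption: labels sit at even indexes and their counts at the following odd indexes.
const labels = facets.Transmission.filter((value, idx) => idx % 2 === 0);

// Membership test that works regardless of how many facet values come back.
const hasLabel = (arr, label) => arr.includes(label);

console.log(labels);                         // [ 'Manual', 'Automatic' ]
console.log(hasLabel(labels, 'Automatic'));  // true
console.log(hasLabel(labels, 'HelloWorld')); // false
```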

Search through a big collection of objects

I have a really big collection of objects that I want to search through.
The array has more than 60,000 items and the search performance can be really slow from time to time.
One object in that array looks like this:
{
"title": "title",
"company": "abc company",
"rating": 13, // internal rating based on comments and interaction
...
}
I want to search the title and the company info and order the results by the rating of the items.
This is what my search currently looks like:
onSearchInput(searchTerm) {
(<any>window).clearTimeout(this.searchInputTimeout);
this.searchInputTimeout = window.setTimeout(() => {
this.searchForFood(searchTerm);
}, 500);
}
searchForFood(searchTerm) {
if (searchTerm.length > 1) {
this.searchResults = [];
this.foodList.map(item => {
searchTerm.split(' ').map(searchTermPart => {
if (item.title.toLowerCase().includes(searchTermPart.toLowerCase())
|| item.company.toLowerCase().includes(searchTermPart.toLowerCase())) {
this.searchResults.push(item);
}
});
});
this.searchResults = this.searchResults.sort(function(a, b) {
return a.rating - b.rating;
}).reverse();
} else {
this.searchResults = [];
}
}
Question: Is there any way to improve the search, both in its logic and in its performance?
A bunch of hints:
It's a bit excessive to search through 60,000 items on the front-end. Is there any way you can perform part of the search on the back-end? If you really must do it on the front-end, consider searching in chunks of e.g. 10,000 and then using setImmediate() (or setTimeout(fn, 0), since setImmediate() is not available in most browsers) to schedule the next part of the search so the user's browser won't completely freeze during processing.
Do the splitting and lowercasing of the search term outside of the loop.
map() the way you're using it is odd, as you don't use the return value. Better to use forEach(). Better still, use filter() to get the items that match.
When iterating over the search terms, use some() (as pointed out in the comments), as it's an opportunity to return early.
sort() mutates the original array so you don't need to re-assign it.
sort() with reverse() is usually a smell. Instead, swap the sides of your condition to be b - a.
At this scale, it may make sense to do performance tests with includes(), indexOf(), roll-your-own-for-loop, match() (can almost guarantee it will be slower though)
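Putting several of these hints together, a reworked search might look something like the sketch below. It is written as a standalone function that takes the list as a parameter, which is an assumption made to keep it self-contained; in the original it would be a method using this.foodList. One behavioural difference to be aware of: because it uses filter() with some(), an item that matches several search words appears once, whereas the original pushed it once per matching word.

```javascript
// Standalone variant of searchForFood (the foodList parameter is an assumption for the sketch).
function searchForFood(foodList, searchTerm) {
  if (searchTerm.length <= 1) {
    return [];
  }
  // Lowercase and split once, outside the loop over items.
  const parts = searchTerm.toLowerCase().split(' ').filter(part => part.length > 0);

  return foodList
    .filter(item => {
      const title = item.title.toLowerCase();
      const company = item.company.toLowerCase();
      // some() stops at the first search word that matches.
      return parts.some(part => title.includes(part) || company.includes(part));
    })
    // filter() returned a new array, so sorting does not touch foodList.
    .sort((a, b) => b.rating - a.rating); // highest rating first, no reverse() needed
}

const sampleFood = [
  { title: 'Apple Pie', company: 'abc company', rating: 13 },
  { title: 'Banana Bread', company: 'xyz bakery', rating: 20 },
  { title: 'Cherry Tart', company: 'ABC Company', rating: 7 }
];

console.log(searchForFood(sampleFood, 'abc pie').map(item => item.title));
// [ 'Apple Pie', 'Cherry Tart' ]
```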
Alex's suggestions are good. My only suggestion would be, if you can afford to pre-process the data during idle time (preferably without holding up first render or interaction), you could process the data into a modified prefix trie. That would let you search for the items in O(k) time, where k is the length of the search term (right now you are searching in O(kn) time because you look at every item and then do an includes, which takes k time; it's actually a little worse because of the toLowerCase calls, but I don't want to get into the weeds of it).
If you aren't familiar with what a trie is, hopefully the code below gives you the idea or you can search for information with your search engine of choice. It's basically a mapping of characters in a string in nested hash maps.
Here's some sample code of how you might construct the trie:
function makeTries(data){
let companyTrie = {};
let titleTrie = {};
data.forEach(item => {
addToTrie(companyTrie, item.company, item, 0);
addToTrie(titleTrie, item.title, item, 0);
});
return {
companyTrie,
titleTrie
}
}
function addToTrie(trie, str, item, i){
trie.data = trie.data || [];
trie.data.push(item);
if(i >= str.length)
return;
if(! trie[str[i]]){
trie[str[i]] = {};
}
addToTrie(trie[str[i]], str, item, ++i);
}
function searchTrie(trie, term){
if(trie == undefined)
return [];
if(term == "")
return trie.data;
return searchTrie(trie[term[0]], term.substring(1));
}
var testData = [
{
company: "abc",
title: "def",
rank: 5
},{
company: "abd",
title: "deg",
rank: 5
},{
company: "afg",
title: "efg",
rank: 5
},{
company: "afgh",
title: "efh",
rank: 5
},
];
const tries = makeTries(testData);
console.log(searchTrie(tries.companyTrie, "afg"));
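One thing worth checking before adopting this: a trie built this way indexes each string from its first character, so it answers "starts with" queries, whereas includes() matches anywhere in the string. The compact sketch below restates the same idea iteratively to show the difference; buildTrie and lookup are hypothetical names, and the lowercasing is an addition that the code above does not do.

```javascript
// Iterative restatement of the trie idea (hypothetical helper names).
function buildTrie(items, field) {
  const root = { data: [] };
  items.forEach(item => {
    let node = root;
    node.data.push(item);
    // Lowercasing here is an addition for case-insensitive lookups.
    for (const ch of item[field].toLowerCase()) {
      if (!node[ch]) node[ch] = { data: [] };
      node = node[ch];
      node.data.push(item); // every prefix node remembers the items below it
    }
  });
  return root;
}

function lookup(root, term) {
  let node = root;
  for (const ch of term.toLowerCase()) {
    node = node[ch];
    if (!node) return []; // no item starts with this prefix
  }
  return node.data;
}

const companies = [
  { company: 'abc' }, { company: 'abd' }, { company: 'afg' }, { company: 'afgh' }
];
const companyTrie = buildTrie(companies, 'company');

console.log(lookup(companyTrie, 'af').map(c => c.company)); // [ 'afg', 'afgh' ]
console.log(lookup(companyTrie, 'fg'));                     // [] because nothing starts with "fg"
console.log(companies.filter(c => c.company.includes('fg')).length); // 2
```

If substring matches are required, the trie alone is not a drop-in replacement for includes().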

dc.js Using two reducers without a simple dimension and second grouping stage

Quick question following up my response from this post:
dc.js Box plot reducer using two groups
Just trying to fully get my head around reducers and how to filter and collect data so I'll step through my understanding first.
Data Format:
{
"SSID": "eduroam",
"identifier": "Client",
"latitude": 52.4505,
"longitude": -1.9361,
"mac": "dc:d9:16:##:##:##",
"packet": "PR-REQ",
"timestamp": "2018-07-10 12:25:26",
"vendor": "Huawei Technologies Co.Ltd"
}
(1) Using the following should result in an output array of key value pairs (Key MAC Address & Value Count of networks connected to):
var MacCountsGroup = mac.group().reduce(
function (p, v) {
p[v.mac] = (p[v.mac] || 0) + v.counter;
return p;
},
function (p, v) {
p[v.mac] -= v.counter;
return p;
},
function () {
return {}; // KV Pair of MAC -> Count
}
);
(2) Then, in order to use the object, it must be flattened so it can be passed to a chart as follows:
function flatten_object_group(group) {
return {
all: function () {
return group.all().map(function (kv) {
return {
key: kv.key,
value: Object.values(kv.value).filter(function (v) {
return v > 0;
})
};
});
}
};
}
var connectionsGroup = flatten_object_group(MacCountsGroup);
(3) Then I pass mac as the pie chart dimension and connectionsGroup as the group. This gives back a chart with roughly 50,000 slices based on my dataset.
var packetPie = dc.pieChart("#packetPie");
packetPie
.height(495)
.width(350)
.radius(180)
.renderLabel(true)
.transitionDuration(1000)
.dimension(mac)
.ordinalColors(['#07453E', '#145C54', '#36847B'])
.group(connectionsGroup);
This works A'OK and I follow up to this point.
(4) Now I want to group by the values given out by the first reducer, i.e I want to combine all of the mac addresses with 1 network connection, 2 network connections and so on as slices.
How would this be done as a dimension of "Network connections"? How can I produce this summarized data which doesn't exist in my source data and is generated from mac?
Or would this require an intermediate function between the first reducer and flattening to combine all of the values from the first reducer?
You don't need to do all of that to get a pie chart of mac addresses.
There are a few faulty understandings in points 1-3, which I guess I'll address first. It looks like you copied and pasted code from the previous question, so I'm not really sure if this helps.
(1) If you have a dimension of mac addresses, reducing it like this won't have any further effect. The original idea was to dimension/group by vendor and then reduce counts for each mac address. This reduction will group by mac address and then further count instances of each mac address within each bin, so it's just an object with one key. It will produce a map of key value pairs like
{key: 'MAC-123', value: {'MAC-123': 12}}
(2) This will flatten the object within the values, dropping the keys and producing just an array of counts
{key: 'MAC-123', value: [12]}
(3) Since the pie chart is expecting simple key/value pairs with the value being a number, it is probably unhappy with getting values like the array [12]. The values are probably coerced to NaN.
(4) Okay, here's the real question, and it's actually not as easy as your previous question. We got off easy with the box plot because the "dimension" (in crossfilter terms, the keys you filter and group on) existed in your data.
Let's forget the false lead in points 1-3 above, and start from first principles.
There is no way to look at an individual row of your data and determine, without looking at anything else, if it belongs to the category "has 1 connection", "has 2 connections", etc. Assuming you want to be able to click on slices in the pie chart and filter all the data, we'll have to find another way to implement that.
But first let's look at how to produce a pie chart of "number of network connections". That's a little bit easier, but as far as I know, it does require a true "double reduce".
If we use the default reduction on the mac dimension, we'll get an array of key/value pairs, where the key is a mac address, and the value is the number of connections for that address:
[
{
"key": "1c:b7:2c:48",
"value": 8
},
{
"key": "1c:b7:be:ef",
"value": 3
},
{
"key": "6c:17:79:03",
"value": 2
},
...
How do we now produce a key/value array where the key is number of connections, and the value is the array of mac addresses for that number of connections?
Sounds like a job for the lesser-known Array.reduce. This function is the likely inspiration for crossfilter's group.reduce(), but it's a bit simpler: it just walks through an array, combining each value with the result of the last. It's great for producing an object from an array:
var value_keys = macPacketGroup.all().reduce(function(p, kv) {
if(!p[kv.value])
p[kv.value] = [];
p[kv.value].push(kv.key);
return p;
}, {});
Great:
{
"1": [
"b8:1d:ab:d1",
"dc:d9:16:3a",
"dc:d9:16:3b"
],
"2": [
"6c:17:79:03",
"6c:27:79:04",
"b8:1d:aa:d1",
"b8:1d:aa:d2",
"dc:da:16:3d"
],
But we wanted an array of key/value pairs, not an object!
var key_count_value_macs = Object.keys(value_keys)
.map(k => ({key: k, value: value_keys[k]}));
Great, that looks just like what a "real group" would produce:
[
{
"key": "1",
"value": [
"b8:1d:ab:d1",
"dc:d9:16:3a",
"dc:d9:16:3b"
]
},
{
"key": "2",
"value": [
"6c:17:79:03",
"6c:27:79:04",
"b8:1d:aa:d1",
"b8:1d:aa:d2",
"dc:da:16:3d"
]
},
...
Wrapping all that in a "fake group", which when asked to produce .all(), queries the original group and does the above transformations:
function value_keys_group(group) {
return {
all: function() {
var value_keys = group.all().reduce(function(p, kv) {
if(!p[kv.value])
p[kv.value] = [];
p[kv.value].push(kv.key);
return p;
}, {});
return Object.keys(value_keys)
.map(k => ({key: k, value: value_keys[k]}));
}
}
}
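To try the fake group without crossfilter, it can be fed a stub object that only implements .all(). The stubGroup below and its sample counts are made up for illustration; the value_keys_group function is repeated from above so the snippet runs on its own.

```javascript
// Stub standing in for a crossfilter group (sample data is invented).
const stubGroup = {
  all: () => [
    { key: '1c:b7:2c:48', value: 8 },
    { key: '1c:b7:be:ef', value: 3 },
    { key: '6c:17:79:03', value: 2 },
    { key: 'b8:1d:aa:d1', value: 2 }
  ]
};

function value_keys_group(group) {
  return {
    all: function () {
      // Invert the group: connection count -> list of mac addresses.
      var value_keys = group.all().reduce(function (p, kv) {
        if (!p[kv.value])
          p[kv.value] = [];
        p[kv.value].push(kv.key);
        return p;
      }, {});
      // Convert the object back into the key/value array a chart expects.
      return Object.keys(value_keys)
        .map(k => ({ key: k, value: value_keys[k] }));
    }
  };
}

const byCount = value_keys_group(stubGroup).all();
console.log(JSON.stringify(byCount));
// [{"key":"2","value":["6c:17:79:03","b8:1d:aa:d1"]},{"key":"3","value":["1c:b7:be:ef"]},{"key":"8","value":["1c:b7:2c:48"]}]
```

The keys come back as strings because they have been used as object property names.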
Now we can plot the pie chart! The only fancy thing here is that the value accessor should look at the length of the array for each value (instead of assuming the value is just a number):
packetPie
// ...
.group(value_keys_group(macPacketGroup))
.valueAccessor(kv => kv.value.length);
Demo fiddle.
However, clicking on slices won't work. I'll return to that in a minute - just want to hit "save" first!
Part 2: Filtering based on counts
As I remarked at the start, it's not possible to create a crossfilter dimension which will filter based on the count of connections. This is because crossfilter always needs to look at each row and determine, based only on the information in that row, whether it belongs in a group or filter.
If you add another chart at this point and try clicking on a slice, everything in the other charts will disappear. This is because the keys are now counts, and counts are invalid mac addresses, so we're telling it to filter to a key which doesn't exist.
However, we can obviously filter by mac address, and we also know the mac addresses for each count! So this isn't so bad. It just requires a filterHandler.
Although, hmmm, in producing the fake group, we seem to have forgotten value_keys. It's hidden away inside the function, and then let go.
It's a little ugly, but we can fix that:
function value_keys_group(group) {
var saved_value_keys;
return {
all: function() {
var value_keys = group.all().reduce(function(p, kv) {
if(!p[kv.value])
p[kv.value] = [];
p[kv.value].push(kv.key);
return p;
}, {});
saved_value_keys = value_keys;
return Object.keys(value_keys)
.map(k => ({key: k, value: value_keys[k]}));
},
value_keys: function() {
return saved_value_keys;
}
}
}
Now, every time .all() is called (every time the pie chart is drawn), the fake group will stash away the value_keys object. Not a great practice (.value_keys() would return undefined if you called it before .all()), but safe based on the way dc.js works.
With that out of the way, the filterHandler for the pie chart is relatively simple:
packetPie.filterHandler(function(dimension, filters) {
if(filters.length === 0)
dimension.filter(null);
else {
var value_keys = packetPie.group().value_keys();
var all_macs = filters.reduce(
(p, v) => p.concat(value_keys[v]), []);
dimension.filterFunction(k => all_macs.indexOf(k) !== -1);
}
return filters;
});
The interesting line here is another call to Array.reduce. This function is also useful for producing an array from another array, and here we use it just to concatenate all of the values (mac addresses) from all of the selected slices (connection counts).
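In isolation, that concatenating reduce behaves as in the small sketch below. The sample valueKeys object and the selected filters are invented for illustration.

```javascript
// Invented sample: connection count -> mac addresses with that count.
const valueKeys = {
  '1': ['b8:1d:ab:d1', 'dc:d9:16:3a'],
  '2': ['6c:17:79:03'],
  '3': ['1c:b7:be:ef']
};

// Suppose the slices for 1 and 3 connections are selected.
const selected = ['1', '3'];

// Flatten the mac lists of all selected slices into one array.
const allMacs = selected.reduce((p, v) => p.concat(valueKeys[v]), []);

// The predicate handed to dimension.filterFunction in the handler above.
const keep = k => allMacs.indexOf(k) !== -1;

console.log(allMacs);             // [ 'b8:1d:ab:d1', 'dc:d9:16:3a', '1c:b7:be:ef' ]
console.log(keep('6c:17:79:03')); // false
```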
Now we have a working filter. It doesn't make too much sense to combine it with the box plot from the last question, but the new fiddle demonstrates that filtering based on number of connections does work.
Part 3: what about zeroes?
As commonly comes up, crossfilter considers a bin with value zero to still exist, so we need to "remove the empty bins". However, in this case, we've added a non-standard method to the first fake group, in order to allow filtering. (We could have just used a global there, but globals are messy.)
So, we need to "pass through" the value_keys method:
function remove_empty_bins_pt(source_group) {
return {
all:function () {
return source_group.all().filter(function(d) {
return d.key !== '0';
});
},
value_keys: function() {
return source_group.value_keys();
}
};
}
packetPie
.group(remove_empty_bins_pt(value_keys_group(macPacketGroup)))
Another oddity here is we are filtering out the key zero, and that's a string here!
Demo fiddle!
Alternately, here's a better solution! Do the bin filtering before passing to value_keys_group, and then we can use the ordinary remove_empty_bins!
function remove_empty_bins(source_group) {
return {
all:function () {
return source_group.all().filter(function(d) {
//return Math.abs(d.value) > 0.00001; // if using floating-point numbers
return d.value !== 0; // if integers only
});
}
};
}
packetPie
.group(value_keys_group(remove_empty_bins(macPacketGroup)))
Yet another demo fiddle!!

mongo/mongoid MapReduce on batch inserted documents

I'm creating my batch and inserting it into the collection using the commands below:
batch = []
time = 1.day.ago
(1..2000).each{ |i| a = {:name => 'invbatch2k'+i.to_s, :user_id => BSON::ObjectId.from_string('533956cd4d616323cf000000'), :out_id => 'out', :created_at => time, :updated_at => time, :random => '0.5' }; batch.push a; }
Invitation.collection.insert batch
As stated above, every single invitation record has its user_id field set to '533956cd4d616323cf000000'.
After inserting my batch with created_at: 1.day.ago I get:
2.1.1 :102 > Invitation.lte(created_at: 1.week.ago).count
=> 48
2.1.1 :103 > Invitation.lte(created_at: Date.today).count
=> 2048
also:
2.1.1 :104 > Invitation.lte(created_at: 1.week.ago).where(user_id: '533956cd4d616323cf000000').count
=> 14
2.1.1 :105 > Invitation.where(user_id: '533956cd4d616323cf000000').count
=> 2014
Also, I've got a map reduce which counts invitations sent by each unique User (both total and sent to unique out_id)
class Invitation
[...]
def self.get_user_invites_count
map = %q{
function() {
var user_id = this.user_id;
emit(user_id, {user_id : this.user_id, out_id: this.out_id, count: 1, countUnique: 1})
}
}
reduce = %q{
function(key, values) {
var result = {
user_id: key,
count: 0,
countUnique : 0
};
var values_arr = [];
values.forEach(function(value) {
values_arr.push(value.out_id);
result.count += 1
});
var unique = values_arr.filter(function(item, i, ar){ return ar.indexOf(item) === i; });
result.countUnique = unique.length;
return result;
}
}
map_reduce(map,reduce).out(inline: true).to_a.map{|d| d['value']} rescue []
end
end
The issue is:
Invitation.lte(created_at: Date.today.end_of_day).get_user_invites_count
returns
[{"user_id"=>BSON::ObjectId('533956cd4d616323cf000000'), "count"=>49.0, "countUnique"=>2.0} ...]
instead of "count" => 2014, "countUnique" => 6.0 while:
Invitation.lte(created_at: 1.week.ago).get_user_invites_count returns:
[{"user_id"=>BSON::ObjectId('533956cd4d616323cf000000'), "count"=>14.0, "countUnique"=>6.0} ...]
The data returned by the query is accurate before inserting the batch.
I can't wrap my head around what's going on here. Am I missing something?
The part of the documentation that you seem to have missed is the problem here:
MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
And also later:
the type of the return object must be identical to the type of the value emitted by the map function to ensure that the following operations is true:
So what you see is your reduce function is returning a signature different to the input it receives from the mapper. This is important since the reducer may not get all of the values for a given key in a single pass. Instead it gets some of them, "reduces" the result and that reduced output may be combined with other values for the key ( possibly also reduced ) in a further pass through the reduce function.
As a result of your fields not matching, subsequent reduce passes do not see those values and do not count towards your totals. So you need to align the signatures of the values:
def self.get_user_invites_count
map = %q{
function() {
var user_id = this.user_id;
emit(user_id, {out_id: this.out_id, count: 1, countUnique: 0})
}
}
reduce = %q{
function(key, values) {
var result = {
out_id: null,
count: 0,
countUnique : 0
};
var values_arr = [];
values.forEach(function(value) {
if (value.out_id != null)
values_arr.push(value.out_id);
result.count += value.count;
result.countUnique += value.countUnique;
});
var unique = values_arr.filter(function(item, i, ar){ return ar.indexOf(item) === i; });
result.countUnique += unique.length;
return result;
}
}
map_reduce(map,reduce).out(inline: true).to_a.map{|d| d['value']} rescue []
end
You also do not need user_id in the values emitted or kept, as it is already the "key" value for the mapReduce. The remaining alterations consider that both "count" and "countUnique" can contain an existing value that needs to be considered, where you were simply resetting the value to 0 on each pass.
Then of course if the "input" has already been through a "reduce" pass, then you do not need the "out_id" values to be filtered for "uniqueness" as you already have the count and that is now included. So any null values are not added to the array of things to count, which is also "added" to the total rather than replacing it.
So the reducer does get called several times. For 20 key values the input will likely not be split, which is why your sample with less input works. For pretty much anything more than that, then the "groups" of the same key values will be split up, which is how mapReduce optimizes for large data processing. As the "reduced" output will be sent back to the reducer again, you need to be mindful that you are considering the values you already sent to output in the previous pass.
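The effect of re-reducing can be imitated in plain JavaScript, outside MongoDB. The sketch below is only a simulation: reduceInChunks is a hypothetical driver, and the chunk size of 4 is arbitrary and says nothing about how MongoDB actually splits its input. It only tracks the count field to keep the example short.

```javascript
// Like the original reducer: counts how many values it was handed.
function naiveReduce(key, values) {
  var result = { user_id: key, count: 0 };
  values.forEach(function () { result.count += 1; });
  return result;
}

// Like the corrected reducer: adds up the count each value already carries.
function safeReduce(key, values) {
  var result = { count: 0 };
  values.forEach(function (value) { result.count += value.count; });
  return result;
}

// Hypothetical driver: reduces chunk by chunk, feeding the previous
// output back in as one of the inputs of the next call.
function reduceInChunks(reduceFn, key, values, chunkSize) {
  var carried = null;
  for (var i = 0; i < values.length; i += chunkSize) {
    var chunk = values.slice(i, i + chunkSize);
    if (carried !== null) chunk.unshift(carried);
    carried = reduceFn(key, chunk);
  }
  return carried;
}

// Ten values as the map function would emit them.
var emitted = [];
for (var n = 0; n < 10; n++) emitted.push({ count: 1 });

console.log(reduceInChunks(naiveReduce, 'u1', emitted, 10).count); // 10 (single pass)
console.log(reduceInChunks(naiveReduce, 'u1', emitted, 4).count);  // 3  (wrong after re-reduce)
console.log(reduceInChunks(safeReduce, 'u1', emitted, 4).count);   // 10 (same in every split)
```

The naive reducer is only correct when everything arrives in one call, which matches the observation that the smaller data set produced the right numbers.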
