Intersect multiple arrays of objects - javascript

So first of all, I am not expecting a specific solution to my problem, but rather some insights from more experienced developers that could enlighten me and put me on the right track. I am not yet very experienced with algorithms and data structures, and I am taking this as a challenge for myself.
I have n number of arrays, where n >= 2.
They all contain objects and, in the end, I want an array that contains only the elements common to all of these arrays.
array1 = [{ id: 1 }, { id: 2 }, { id: 6 }, { id: 10 }]
array2 = [{ id: 2 }, { id: 4 }, { id: 10 }]
array3 = [{ id: 2 }, { id: 3 }, { id: 10 }]
arrayOfArrays = [array1, array2, array3]
intersect = [{ id: 2 }, { id: 10 }]
How would one approach this problem? I have read solutions using divide and conquer or hash tables, and even some using the lodash library, but I would like to implement my own solution for once rather than rely on anything external, and at the same time practice algorithms.

For efficiency, I would start by locating the shortest array. This should be the one you work with. You can run a reduce on arrayOfArrays to iterate through it and return the index of the shortest array.
const shortestIndex = arrayOfArrays.reduce((accumulator, currentArray, currentIndex) => currentArray.length < arrayOfArrays[accumulator].length ? currentIndex : accumulator, 0);
Take the shortest array and call the reduce function again, this will iterate through the array and allow you to accumulate a final value. The second parameter is the starting value, which is a new array.
shortestArray.reduce((accumulator, currentObject) => /*TODO*/, [])
For the body of the reduce, we basically need to loop through the remaining arrays and make sure the current object exists in all of them. You can use the every function, since it fails fast: the first array the object doesn't exist in will make it return false.
Inside the every you can call some to check if there is at least one match.
const isMatch = remainingArrays.every(array => array.some(object => object.id === currentObject.id));
If it's a match, add it to the accumulator which will be your final result. Otherwise, just return the accumulator.
return isMatch ? [...accumulator, currentObject] : accumulator;
Putting all that together should get you a decent solution. I'm sure there are more optimizations that could be made, but that's where I would start.
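For reference, here is one way the pieces above might fit together (the intersectAll wrapper name is just illustrative, and the sketch assumes objects are matched by id):
const intersectAll = (arrayOfArrays) => {
  // Locate the index of the shortest array
  const shortestIndex = arrayOfArrays.reduce((accumulator, currentArray, currentIndex) =>
    currentArray.length < arrayOfArrays[accumulator].length ? currentIndex : accumulator, 0);
  const shortestArray = arrayOfArrays[shortestIndex];
  const remainingArrays = arrayOfArrays.filter((_, index) => index !== shortestIndex);
  // Accumulate the objects that exist in every remaining array
  return shortestArray.reduce((accumulator, currentObject) => {
    const isMatch = remainingArrays.every(array =>
      array.some(object => object.id === currentObject.id));
    return isMatch ? [...accumulator, currentObject] : accumulator;
  }, []);
};
intersectAll([array1, array2, array3]); // [{ id: 2 }, { id: 10 }]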

The general solution is to iterate through one input and, for each value, check whether it exists in all of the other inputs. (Time complexity: O(l * n * l) = O(n * l^2), where n is the number of arrays and l is the average length of an array.)
Following the ideas of the other two answers, we can improve this brute-force approach a bit by
iterating through the smallest input
using a Set for efficient lookup of ids instead of iteration
so it becomes O(l * n + min_l * n) = O(n * l): O(l * n) to build the Sets, plus O(min_l * n) for the filter over the smallest array
const arrayOfIdSets = arrayOfArrays.map(arr =>
  new Set(arr.map(val => val.id))
);
const smallestArray = arrayOfArrays.reduce((smallest, arr) =>
  smallest.length < arr.length ? smallest : arr
);
const intersection = smallestArray.filter(val =>
  arrayOfIdSets.every(set => set.has(val.id))
);
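Applied to the example arrays above, smallestArray ends up being array2, and intersection comes out as [{ id: 2 }, { id: 10 }].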

A good way to approach these kinds of problems, both in interviews and in regular life, is to think of the most obvious approach you can come up with, no matter how inefficient, and then think about how you can improve it. This is usually called a "brute force" approach.
So for this problem, perhaps an obvious but inefficient approach would be to iterate through every item in array1 and check if it is in both array2 and array3, noting it down (in another array) if it is. Then repeat for each item in array2 and in array3, making sure to only note down items you haven't noted down before.
We can see that this will be inefficient because we'll be looking up single items in an array many times, which is quite slow for an array. But it'll work!
Now we can get to work improving our solution. One thing to notice is that finding the intersection of 3 arrays is the same as finding the intersection of the third array with the intersection of the first and second array. So we can look for a solution to the simpler problem of the intersection of 2 arrays, to build one of an intersection for 3 arrays.
This is where it's handy to know your data structures. You want to be able to ask the question "does this structure contain a particular element?" as quickly as possible. Think about what structures are good for that kind of lookup (known as search). More experienced engineers have this memorized/learned, but you can reference something like https://www.bigocheatsheet.com/ to see that sets are good at this.
I'll stop there to not give the full solution, but once you've seen that sets are fast at both insertion and search, think about how you can use that to solve your problem.
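Following that hint, a sketch of just the two-array subproblem, using a Set for the fast lookup (the intersectPair name is illustrative):
const intersectPair = (a, b) => {
  const idsInB = new Set(b.map(obj => obj.id)); // built once: O(|b|)
  return a.filter(obj => idsInB.has(obj.id));   // each lookup: O(1) on average
};
Extending this pairwise to n arrays (for example with reduce) is the remaining exercise.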

Related

Time complexity between two iterate methods of immutable collection

There is Immutable.js, with structured data as collections. Let's take a Map. There are also these methods to work with:
includes
filter
Let's consider these data:
const { Map, List } = require('immutable');

const data = Map({
  id1: Map({id: 'id1'}),
  id2: Map({id: 'id2'}),
  id100: Map({id: 'id100'}),
});
const ids = List(['id1', 'id100']);
And two approaches to iterate this Map:
function selectData() {
  return data.filter((item) => ids.includes(item.get("id")));
}

function selectData() {
  let selected = Map();
  ids.forEach((id) => {
    selected = selected.set(id, data.get(id));
  });
  return selected;
}
So, the question is: are these two approaches equivalent, and do they have the same time complexity, in
general
this special case, with the data in the Map above
From my POV they are not equivalent, but the time complexity should be the same.
Update: by "equivalent" I mean they do the same thing and provide the same result.
As you pointed out, the semantics are slightly different. In the example case they both produce an intersection on the ids (calling the two versions s1 and s2):
> JSON.stringify(s1())
'{"id1":{"id":"id1"},"id100":{"id":"id100"}}'
> JSON.stringify(s2())
'{"id1":{"id":"id1"},"id100":{"id":"id100"}}'
However, there are edge cases with the data structure which do not produce like-for-like results, such as the example you gave in the comment:
const data = Map({
  id1: Map({id: 'id1'}),
  id2: Map({id: 'id1'}),
  id100: Map({id: 'id100'}),
});
const ids = List(['id1', 'id100']);
...
> JSON.stringify(s1())
'{"id1":{"id":"id1"},"id2":{"id":"id1"},"id100":{"id":"id100"}}'
> JSON.stringify(s2())
'{"id1":{"id":"id1"},"id100":{"id":"id100"}}'
Note: the above case looks like a bug, as the id in the (value) map doesn't match the id of the key; but who knows?
In general
approach 1 produces one item for each entry in the top-level map (data) whose value has an id that is contained in the list.
approach 2 produces one item for each value in the list that has an entry in the top-level map (data).
As the two approaches differ in how they look things up -- one goes by the key of the map (data), the other by the value of id in the value map -- if there is an inconsistency between these two values, as per the second example, you will get a difference.
In general you may be better off with the second approach, as a lookup into the Map will likely be cheaper than a lookup into the List if both collections are of similar size. If the collections are of largely different sizes, you would need to take that into account.
Lookup into the List is O(N), whereas lookup into the Map is documented as O(log32 N) (so some kind of wide-branching tree implementation). So for a map of size M and a list of size L, the cost of approach 1 (iterate the map, look up in the list) is O(M * L), whereas the cost of approach 2 (iterate the list, look up in the map) is O(L * log32 M). If M == L (or is close) -- say M = L = 1000, that's roughly 1,000,000 steps versus about 2,000 -- then of course approach 2 wins, on paper.
It's almost always best to profile these things, rather than worry about the theoretical time complexity.
Practically, there may be another nice approach that relies on the already-sorted nature of the map. If you sort the list first (O(L log L)), then you can just use a sliding window over the elements of both to find the intersection...
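For example, once both key sequences are in sorted order, a two-pointer walk (sketched here over plain sorted arrays of keys, as an assumption about one way to implement it) finds the common ids in a single O(M + L) pass:
const sortedIntersection = (mapKeys, listKeys) => { // both assumed sorted
  const common = [];
  let i = 0, j = 0;
  while (i < mapKeys.length && j < listKeys.length) {
    if (mapKeys[i] === listKeys[j]) {
      common.push(mapKeys[i]);
      i += 1;
      j += 1;
    } else if (mapKeys[i] < listKeys[j]) {
      i += 1; // advance whichever pointer is behind
    } else {
      j += 1;
    }
  }
  return common;
};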

Understanding indexOf and lastIndexOf

I'm doing some JavaScript exercises and I stumbled upon this one: "Write a JavaScript program to filter out the non-unique values in an array."
I tried and found a solution which worked, but it was too cumbersome. A better answer, according to the site, is the following:
const filter_Non_Unique = arr =>
  arr.filter(l => arr.indexOf(l) === arr.lastIndexOf(l));
console.log(filter_Non_Unique([1,2,3,4,4,5,6,6])) // 1,2,3,5
Now I racked my brain trying to understand why this solution works, but I still don't get it.
Can somebody explain to me?
Thanks in advance.
If the element occurs only once in the array, the first index will be the same as the last index; the index will not change between the two calls.
eg:
const arr = [1,2,3,4,4,5,6,6]
console.log(arr.indexOf(5))
console.log(arr.lastIndexOf(5))
Since both of these functions return the same index, filter will keep the element.
On the other hand, if a value occurs multiple times, its first and last occurrences will have different indexes, so the filter callback will return false and remove it from the array:
const arr = [1,2,3,4,4,5,6,6]
console.log(arr.indexOf(4))
console.log(arr.lastIndexOf(4))
I've answered a question similar to the one you solved here, you could try that logic too.
Besides the indexOf/lastIndexOf approach, which needs to iterate the array a lot, you could take a two-loop approach.
Get the single items by using a hash table with three states:
undefined, the default value of undeclared properties of an object,
true for the first occurrence of a value,
false for all values that are repeated in the array.
Then filter by the value in the hash table.
const filterNonUnique = array => {
  const hash = {};
  // first visit: hash[v] is undefined, so it becomes true;
  // any later visit flips it to false
  for (const v of array) hash[v] = hash[v] === undefined;
  return array.filter(v => hash[v]);
};
console.log(filterNonUnique([1, 2, 3, 4, 4, 5, 6, 6, 7, 7, 7, 7]))

What is the runtime complexity of this function?

I believe it's quadratic, O(n^2), but I'm not 100% sure due to uncertainty about how the .filter() and .map() operations work in JavaScript.
The big question I have is whether the entire filter() operation completes before a single map() operation starts, or whether it's smart enough to perform the map() while it's already iterating within the filter().
The method
function subscribedListsFromSubscriptions(subscriptions: Subscription[]) {
  return new Set(subscriptions.filter((list) => {
    return list.subscribed;
  }).map((list) => {
    return list.list_id;
  }));
}
Example input data
let subscriptions = [{
  list_id: 'abc',
  subscribed: false
}, {
  list_id: 'ghi',
  subscribed: false
}];
From what I see, it appears to be:
filter() for each element of subscriptions - time n
map() for each remaining element - time n (at maximum)
new Set() for each remaining element - time n (at maximum)
For the new Set() operation, I'm guessing it creates a new object and adds each element to the created instance.
If there were many duplicates in the data, one might expect the efficiency to increase. But we don't expect many duplicates in the data, and from my understanding of Big O, the upper bound is what's used.
From this analysis, I'm expecting the time complexity to be either O(n^2) or O(n^3). But as stated, I'm unsure of how to interpret it for certain.
Any help in this would be greatly appreciated. Thanks in advance!
I think your interpretation of the order of operations is correct: filter, then map, then create a Set.
However, in order for this algorithm to reach O(n^2), you would have to create a nested loop, for example:
create the Set for each element of the array
compare each element with each other element in the array.
This is not the case here. In the worst case scenario (no duplicates), the algorithm will iterate the input array three times, meaning the O(3*n) complexity which is still linear, not quadratic.
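To make the two shapes concrete, here is a sketch (variable names are illustrative): the chained version walks the input in consecutive passes, while a quadratic version would nest one walk inside another:
// Consecutive passes: O(n) + O(n) + O(n) = O(n)
const ids = new Set(subscriptions.filter(s => s.subscribed).map(s => s.list_id));

// A genuinely O(n^2) shape, for contrast: every element against every other
for (const a of subscriptions) {
  for (const b of subscriptions) {
    // any pairwise comparison here runs n * n times
  }
}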

Filter/Search JavaScript array of objects based on other array in Node JS

I have one array of ids and one JavaScript objects array. I need to filter/search the JavaScript objects array with the values in the id array, in Node.js.
For example
var id = [1, 2, 3];
var fullData = [
  {id: 1, name: "test1"},
  {id: 2, name: "test2"},
  {id: 3, name: "test3"},
  {id: 4, name: "test4"},
  {id: 5, name: "test5"}
];
Using the above data, as a result I need to have:
var result = [
  {id: 1, name: "test1"},
  {id: 2, name: "test2"},
  {id: 3, name: "test3"}
];
I know I can loop through both and check for matching ids, but is this the only way to do it, or is there a simpler and more resource-friendly solution? The amount of data to compare is about 30-40k rows.
This will do the trick, using Array.prototype.filter:
var result = fullData.filter(function(item) { // Filter fullData on...
  return id.indexOf(item.id) !== -1;          // whether the current item's `id`
});                                           // is found in the `id` array.
Please note that this filter function is not available on IE 8 or lower, but the MDN has a polyfill available.
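One possible refinement for 30-40k rows: id.indexOf inside the filter makes this O(n * m), since every item scans the id array. Building a Set of the ids first (available in any modern Node.js) keeps each lookup constant-time on average; a sketch:
var idSet = new Set(id);
var result = fullData.filter(function(item) {
  return idSet.has(item.id); // O(1) average lookup instead of an O(m) indexOf
});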
As long as you're starting with an unsorted Array of all possible Objects, there's no way around iterating through it. @Cerbrus' answer is one good way of doing this, with Array.prototype.filter, but you could also use loops.
But do you really need to start with an unsorted Array of all possible Objects?
For example, is it possible to filter these objects out before they ever get into the Array? Maybe you could apply your test when you're first building the Array, so that objects which fail the test never even become part of it. That would be more resource-friendly, and if it makes sense for your particular app, then it might even be simpler.
function insertItemIfPass(theArray, theItem, theTest) {
  if (theTest(theItem)) {
    theArray.push(theItem);
  }
}

// Insert your items by using insertItemIfPass, then process them:
var i;
for (i = 0; i < theArray.length; i += 1) {
  doSomething(theArray[i]);
}
Alternatively, could you use a data structure that keeps track of whether an object passes the test? The simplest way to do this, if you absolutely must use an Array, would be to also keep an index to it. When you add your objects to the Array, you apply the test: if an object passes, then its position in the Array gets put into the index. Then, when you need to get objects out of the Array, you can consult the index: that way, you don't waste time going through the Array when you don't need to touch most of the objects in the first place. If you have several different tests, then you could keep several different indexes, one for each test. This takes a little more memory, but it can save a lot of time.
function insertItem(theArray, theItem, theTest, theIndex) {
  theArray.push(theItem);
  if (theTest(theItem)) {
    theIndex.push(theArray.length - 1);
  }
}

// Insert your items using insertItem, which also builds the index,
// then process only the items that passed the test:
var i;
for (i = 0; i < theIndex.length; i += 1) {
  doSomething(theArray[theIndex[i]]);
}
Could you sort the Array so that the test can short-circuit? Imagine a setup where you've got your array set up so that everything which passes the test comes first. That way, as soon as you hit your first item that fails, you know that all of the remaining items will fail. Then you can stop your loop right away, since you know there aren't any more "good" items.
// Insert your items, keeping items which pass theTest before items which don't
var i = 0;
while (i < theArray.length) {
  if (!theTest(theArray[i])) {
    break; // everything from here on fails the test
  }
  doSomething(theArray[i]);
  i += 1;
}
The bottom line is that this isn't so much a language question as an algorithms question. It doesn't sound like your current data structure (an unsorted Array of all possible items) is well-suited for your particular problem. Depending on what else the application needs to do, it might make more sense to use another data structure entirely, or to augment the existing structure with indexes. Either way, if planned carefully, it will save you some time.

Find maximum occurrence of string element in an array in javascript

I have an array of string elements and I need to find how many times each element occurs in the array.
My array is the following:
var x = ["water", "water", "water", "land", "land", "land", "land", "forest"];
I need to know which element is most prominent in the array. I have tried to use the example from this discussion: "Counting the occurrences of JavaScript array elements".
But I did not get the expected result.
Please help me find a possible solution. :-)
There may not be an unambiguous answer to which item occurs the most times. Here is how you can get the item counts in a functional style:
x.reduce(function(counts, key) {
  if (!counts.hasOwnProperty(key))
    counts[key] = 0;
  counts[key] = counts[key] + 1;
  return counts;
}, {})
Returns {"water": 3, "land": 4, "forest": 1}
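If you also need the single most frequent element, one more pass over those counts will pick it out (ties resolved arbitrarily here); a sketch:
var counts = {"water": 3, "land": 4, "forest": 1};
var prominent = Object.keys(counts).reduce(function(a, b) {
  return counts[a] >= counts[b] ? a : b;
});
// prominent === "land"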
