Complex challenge about complexity and intersection - JavaScript

Preface
Notice: This question is about complexity. I use a complex design pattern here, which you don't need to understand in order to understand the question. I could have simplified it further, but I chose to keep it relatively untouched to avoid introducing mistakes. The code is written in TypeScript, which is a superset of JavaScript.
The code
Consider the following class:
export class ConcreteFilter implements Filter {
    interpret() {
        // rows is a very large array
        return (rows: ReportRow[], filterColumn: string) => {
            return rows.filter(row => {
                // I've hidden the implementation for simplicity,
                // but it usually returns either an empty array or a very short one.
            }).map(row => <string>row[filterColumn]);
        }
    }
}
It receives an array of report rows, then filters the array by some logic that I've hidden. Finally, it does not return the whole rows, but only the single string column named by filterColumn.
Now, take a look at the following function:
function interpretAnd(filters: Filter[]) {
    return (rows: ReportRow[], filterColumn: string) => {
        var runFilter = filters[0].interpret();
        var intersectionResults = runFilter(rows, filterColumn);
        for (var i = 1; i < filters.length; i++) {
            runFilter = filters[i].interpret();
            var results = runFilter(rows, filterColumn);
            intersectionResults = _.intersection(intersectionResults, results);
        }
        return intersectionResults;
    }
}
It receives an array of filters and returns a distinct array of all the "filterColumn" values that the filters returned.
In the for loop, I get the results (a string array) from every filter and then perform an intersection operation.
The problem
The report row array is large, so every runFilter operation is expensive (while, on the other hand, the filter array is pretty short). I want to iterate over the report row array as few times as possible. Additionally, the runFilter operation is very likely to return zero results or very few.
Explanation
Let's say that I have 3 filters and 1 billion report rows. The internal iteration, i.e. the iteration in ConcreteFilter, will happen 3 billion times, even if the first execution of runFilter returned 0 results, so I have 2 billion redundant iterations.
So I could, for example, check whether intersectionResults is empty at the beginning of every iteration and, if so, break out of the loop. But I'm sure that there are better solutions mathematically.
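For illustration, a minimal sketch of that early exit against the interpretAnd loop above (only the loop changes):
for (var i = 1; i < filters.length; i++) {
    // Nothing survives an intersection with an empty set, so the
    // remaining (expensive) runFilter calls can be skipped entirely.
    if (intersectionResults.length === 0) {
        break;
    }
    runFilter = filters[i].interpret();
    var results = runFilter(rows, filterColumn);
    intersectionResults = _.intersection(intersectionResults, results);
}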
Also, if the first runFilter execution returned, say, 15 results, I would expect the next execution to receive an array of only 15 report rows, meaning I want the intersection operation to influence the input of the next call to runFilter.
I can modify the report row array after each iteration, but I don't see how to do it in an efficient way that won't be even more expensive than now.
A good solution would be to remove the map operation and then pass the already filtered array to each operation instead of the entire array, but I'm not allowed to do that because I must not change the result format of the Filter interface.
My question
I'd like to get the best solution you could think of as well as an explanation.
Thanks a lot in advance to everyone who spends their time trying to help me.

Not sure how effective this will be, but here's one possible approach you can take. If you preprocess the rows by the filter column, you'll have a way to retrieve the matched rows. If you typically have more than 2 filters, then this approach may be more beneficial; however, it will be more memory intensive. You could branch the approach depending on the number of filters. There may be some TS constructs that are more useful; I'm not very familiar with it. There are some comments in the code below:
var map = {};
// Loop over every row, keep a map of rows with a particular filter value.
allRows.forEach(row => {
    const v = row[filterColumn];
    let items;
    items = map[v] = map[v] || [];
    items.push(row);
});

let rows = allRows;
filters.forEach(f => {
    // Run the filter and return the unique set of matched strings.
    // (f.execute stands in for calling the function returned by f.interpret().)
    const matches = unique(f.execute(rows, filterColumn));
    // For each of the matched strings, look up the remaining rows and concat them for the next filter.
    rows = [].concat(...matches.map(m => map[m]));
});

// Loop over the rows that made it all the way through, extract the value and then unique() the collection.
return unique(rows.map(row => row[filterColumn]));
Thinking about it some more, you could use a similar approach but just do it on a per-filter basis:
let rows = allRows;
filters.forEach(f => {
    // Run the filter, then keep only the rows whose filter-column value matched.
    const matches = f.execute(rows, filterColumn);
    let map = {};
    matches.forEach(m => {
        map[m] = true;
    });
    rows = rows.filter(row => !!map[row[filterColumn]]);
});
return unique(rows.map(row => row[filterColumn]));

Related

Is this O(N) approach the only way of avoiding a while loop when walking this linked list in Javascript?

I have a data structure that is essentially a linked list stored in state. It represents a stream of changes (patches) to a base object. It is linked by key, rather than by object reference, to allow me to trivially serialise and deserialise the state.
It looks like this:
const latest = 'id4' // They're actually UUIDs, so I can't sort on them (text here for clarity)
const changes = {
    id4: {patch: {}, previous: 'id3'},
    id3: {patch: {}, previous: 'id2'},
    id2: {patch: {}, previous: 'id1'},
    id1: {patch: {}, previous: undefined},
}
At times, a user chooses to run an expensive calculation and the results get returned into state. We do not have results corresponding to every change, but only to some. So results might look like:
const results = {
    id3: {performance: 83.6},
    id1: {performance: 49.6},
}
Given the changes structure, I need to get the result closest to the tip of the changes list, in this case results.id3.
I've written a while loop to do this, and it's perfectly robust at present:
let id = latest
let referenceId = undefined
while (!!id) {
    if (!!results[id]) {
        referenceId = id
        id = undefined
    } else {
        id = changes[id].previous
    }
}
The approach is O(N) but that's the pathological case: I expect a long changelist but with fairly frequent results updates, such that you'd only have to walk back a few steps to find a matching result.
While loops can be vulnerable
Following the great work of Gene Kranz (read his book "Failure is not an option" to understand why NASA never uses recursion!) I try to avoid using while loops in code bases: they tend to be susceptible to inadvertent mistakes.
For example, all that would be required to make an infinite loop here is to do delete changes.id1.
So, I'd like to avoid that vulnerability and instead fail to retrieve any result, because not returning a performance value can be handled; but the user's app hanging is REALLY bad!
Other approaches I tried
Sorted array O(N)
To avoid the while loop, I thought about sorting the changes object into an array ordered per the linked list, then simply looping through it.
The problem is that I have to traverse the whole changes list first to get the array in a sorted order, because I don't store an ordering key (it would violate the point of a linked list, because you could no longer do O(1) insert).
It's not a heavy operation to push an id onto an array, but it is still O(N); a sketch of this approach is below.
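For reference, the conversion I have in mind looks roughly like this (a sketch; note that building the array still requires walking the whole list, which is exactly the O(N) cost described above):
// Walk the list once from the tip, collecting ids in order.
const ordered = []
let cursor = latest
while (cursor !== undefined) {
    ordered.push(cursor)
    cursor = changes[cursor].previous
}
// A plain loop over `ordered` can then look up results[id].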
The question
Is there a way of traversing this linked list without using a while loop, and without an O(N) approach to convert the linked list into a normal array?
Since you only need to append at the end and possibly remove from the end, the required structure is a stack. In JavaScript the best data structure to implement a stack is an array, using its push and pop methods.
So then you could do things like this:
const changes = [];

function addChange(id, patch) {
    changes.push({id, patch});
}

function findRecentMatch(changes, constraints) {
    for (let i = changes.length - 1; i >= 0; i--) {
        const {id} = changes[i];
        if (constraints[id]) return id;
    }
}

// Demo
addChange("id1", { data: 10 });
addChange("id2", { data: 20 });
addChange("id3", { data: 30 });
addChange("id4", { data: 40 });

const results = {
    id3: {performance: 83.6},
    id1: {performance: 49.6},
}

const referenceId = findRecentMatch(changes, results);
console.log(referenceId); // id3
Depending on what you want to do with that referenceId, you might want findRecentMatch to return the index in changes instead of the change-id itself. This gives you the possibility to still retrieve the id, but also to clip the changes list to end at that "version" (i.e. as if you popped all the entries up to that point, but then in one operation). A sketch of that variant follows.
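For instance (a sketch; findRecentMatchIndex is a hypothetical variant of findRecentMatch above that returns i instead of id):
const index = findRecentMatchIndex(changes, results);
if (index !== undefined) {
    const referenceId = changes[index].id;
    changes.length = index + 1; // clip everything after the matched "version" in one operation
}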
While writing out the question, I realised that rather than avoiding a while-loop entirely, I can add an execution count and an escape hatch which should be sufficient for the purpose.
This solution uses Object.keys(), which is strictly O(N), so it is not technically a correct answer to the question - but it is very fast.
If I needed it faster, I could restructure changes as a Map instead of a general object and access changes.size as per this answer; a sketch of that variant follows the snippet below.
let id = latest
let referenceId = undefined
const maxLoops = Object.keys(changes).length
let loop = 0
while (!!id && loop < maxLoops) {
    loop++
    if (!!results[id]) {
        referenceId = id
        id = undefined
    } else {
        id = changes[id].previous
    }
}
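For completeness, this is roughly what the Map-based variant could look like (a sketch; it assumes changes is rebuilt as a Map of id -> {patch, previous}, making the size lookup O(1)):
const changes = new Map([
    ['id4', {patch: {}, previous: 'id3'}],
    ['id3', {patch: {}, previous: 'id2'}],
    ['id2', {patch: {}, previous: 'id1'}],
    ['id1', {patch: {}, previous: undefined}],
])

let id = latest
let referenceId = undefined
const maxLoops = changes.size // O(1), unlike Object.keys(changes).length
let loop = 0
while (!!id && loop < maxLoops) {
    loop++
    if (!!results[id]) {
        referenceId = id
        id = undefined
    } else {
        id = changes.get(id).previous
    }
}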

How to take three different words from an array?

I want to create a function taking three random different words from an array
I followed "React native filter array not working on string" to filter the array. However, the filter is not working.
takeThreeWords = () => {
    for (i = 0; i < 3; i++) {
        rand = (max, min = 0) => Math.floor(Math.random() * max - min)
        randomWord = () => this.state.wordsBank[rand(this.state.wordsBank.length)]
        let aRandomWord = randomWord()
        this.setState(prevState => ({
            wordsToUse: [...prevState.wordsToUse, aRandomWord],
            wordsBank: prevState.wordsBank.filter(word => word !== aRandomWord)
        }))
    }
}
The last line is to make sure that no words from wordsBank are taken twice. However, the function works just as if the last line did not exist. wordsToUse takes three words, but sometimes they are the same...
Can you please point out what I am missing?
You are updating wordsBank via this.setState, but your loop keeps operating on the initial copy of wordsBank, which has not been filtered.
The cleanest fix is to not call this.setState multiple times in a loop.
let wordsBank = [...this.state.wordsBank]; // copy, so the splice below doesn't mutate state directly
let extractedWords = [];
while (extractedWords.length < 3) {
    let i = ~~(Math.random() * wordsBank.length);
    extractedWords.push(wordsBank.splice(i, 1)[0]);
}
this.setState(prevState => ({
    wordsToUse: [...prevState.wordsToUse, ...extractedWords],
    wordsBank
}));

How to add in a count monitor to array differencing?

I'm trying to create a simple array comparer that will tell the differences between two different arrays. I've managed to get this much, and it mostly works. However, my problem is that it compares the two arrays position by position, while the two arrays are really the same kind of list, one just being an updated version of the other. If the updated version has an extra element inserted somewhere, every element after the insertion point sits one position higher than its counterpart in the old array. Because of this, after the first difference, every single value compares as different, even if it was already present in the old version of the array.
This accurately describes my problem. The first difference comes at 1 and 2,
however, because of this difference, every other value is bumped up:
0|0
1|2
2|3
3|4
4|5
5|6
The actual arrays would kinda be like this:
['Auto','Sniper','Citadel','Tank']
['Auto','Gunner','Sniper','Citadel','Tank']
As you can see, because of the addition of Gunner, all the other values coming after Gunner are moved up. But because of this, every single value after Gunner is now different from its original counterpart too, meaning that with what I originally had, it would log everything afterwards.
async function c() {
    const fetchO = await fetch('https://a/data');
    const fetchN = await fetch('https://a/data');
    const O = await fetchO.json();
    const N = await fetchN.json();
    for (let count in N || O) {
        if (N[count] !== O[count]) {
            console.table('Old: ', O[count], 'New: ', N[count]);
        }
    }
}
c();
I've been trying to use a count-offset variable as a control so that this error doesn't occur. Like this:
let countControl = 0;
async function c() {
    const fetchO = await fetch('https://a/data');
    const fetchN = await fetch('https://a/data');
    const O = await fetchO.json();
    const N = await fetchN.json();
    for (let count in N || O) {
        if (N[count] !== O[count + countControl]) {
            console.table('Old: ', O[count + countControl], 'New: ', N[count]);
            if (O.length > N.length) { countControl = countControl - 1; }
            if (O.length < N.length) { countControl = countControl + 1; }
        }
    }
}
c();
The problem with this is that count is a string key rather than a number, so [count + countControl] concatenates instead of adding and the lookup comes out as undefined; but if I turn count into a defined numeric variable, I'll have to make an updating function for it, which won't work in this situation. How can I add in a working count monitor? Or is there some different way to do this?
If I follow your question correctly, you want to count the differences between the two arrays. If we can assume that the elements in each array are unique (i.e. there are no duplicates) and that order does not matter, it becomes a very simple problem to solve. This is the equivalent of the set operation "symmetric difference", which gives the set of elements present in only one of two given sets. From there, it's just a matter of counting the elements which make up the symmetric difference. This can be implemented as follows.
const a = ['Auto','Sniper','Citadel','Tank'];
const b = ['Auto','Gunner','Sniper','Citadel','Tank'];

const symmetricDifference = (a, b) => {
    const difference = new Set(a);
    for (const element of new Set(b)) {
        if (difference.has(element)) {
            difference.delete(element);
        } else {
            difference.add(element);
        }
    }
    return Array.from(difference);
};

const difference = symmetricDifference(a, b);
const differenceCount = difference.length;
console.log({ difference, differenceCount });
(This implementation still deals with Arrays instead of Sets outside of symmetricDifference(). Depending on other constraints and the number of objects involved, it might be significantly better for performance to use Sets outside of symmetricDifference() too.)
If the assumption of uniqueness does not hold, the general algorithm above should be adaptable to instead deal with counts of each unique element, as sketched below. If, however, the assumption that order does not matter fails to hold, the problem becomes much harder in the general case. Depending on your specific case, though, there might be shortcuts to be had.
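One possible shape of that count-based adaptation (my own sketch, under the stated assumption that order still does not matter; it tallies occurrences per element and reports each unmatched copy):
const countedDifference = (a, b) => {
    const counts = new Map();
    for (const el of a) counts.set(el, (counts.get(el) ?? 0) + 1);
    for (const el of b) counts.set(el, (counts.get(el) ?? 0) - 1);
    const difference = [];
    for (const [el, n] of counts) {
        // |n| copies of el are unmatched between the two arrays
        for (let i = 0; i < Math.abs(n); i++) difference.push(el);
    }
    return difference;
};

console.log(countedDifference(
    ['Auto', 'Tank', 'Tank'],
    ['Auto', 'Gunner', 'Tank']
)); // ['Tank', 'Gunner'] -- one extra 'Tank' on the left, 'Gunner' only on the right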

Filter an array from another array of elements and return specific positions

I'm finally learning better methods for my JS. I'm trying to find a way to go faster than what I have so far:
In my code, I have two arrays:
one with unique keys in first position
one with those keys in first position but not unique. There are multiple entries with a certain value I want to filter.
The thing is, I don't want to filter everything that is in the second array. I want to select some positions, like item[1]+item[5]+item[6]. What I do works, but I wonder if there isn't a faster way to do it?
for (let i = 0; i < firstArrayOfUniques.length; i++) {
    const findRef = secondArrayOfMultiple
        .filter(item => item[0] == firstArrayOfUniques[i][0]);
    // Afterwards, I redo a map and select only the second element,
    // then I join the multiple answers.
    // Is there a way to do all that in the first filter?
    const refSliced = findRef.map(x => x[1]);
    const refJoin = refSliced.join(" - ");
    canevasSheet.getRange(1 + i, 1).setValue(refJoin);
}
The script snippet you quote will spend almost all of its running time calling the Range.setValue() method. It gets called separately for every data row. Use Range.setValues() instead, and call it just once, like this:
function moot(firstArrayOfUniques, secondArrayOfMultiple) {
    // Build a 2D array (one single-cell row per unique key), as setValues() expects.
    const result = firstArrayOfUniques.map(uniqueRow => [
        secondArrayOfMultiple
            .filter(row => row[0] === uniqueRow[0])
            .map(row => row[1])
            .join(' - ')
    ]);
    canevasSheet.getRange(1, 1, result.length, result[0].length).setValues(result);
}
See Apps Script best practices.

Puzzling behaviour of two seemingly identical MapReduce functions

Our MongoDB database contains a list of all user accounts, where each new registration has a 'created_at' field in the account document with the current date and time when it was created.
We wanted to find out how many new registrations there were for each day, so we put together a MapReduce query to find this out for us.
db.accounts.mapReduce(
    function() {
        var date = this.created_at.toLocaleDateString();
        emit(date, 1);
    },
    function(key, values) {
        return values.length;
    },
    { out: "output" })
Our first attempt was above. For each registration, it emits a value of 1 for that date. The length of each array is then used to determine how many registrations there were on that day.
However, while the results were mostly correct, there were notable inaccuracies. For example, the first day gave us a value in double figures when we know the actual figure was far higher. Some values changed after running the map reduce function a second time, despite operating on the same data.
We changed the function to instead sum up the values of the array (which, remember, should only consist of 1's and should therefore be identical to array.length).
db.accounts.mapReduce(
    function() {
        var date = this.created_at.toLocaleDateString();
        emit(date, 1);
    },
    function(key, values) {
        var sum = 0;
        for (var i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    },
    { out: "output" })
To our surprise, this gave the correct result for every date that was wrong before.
Does anyone know why the first map reduce did not operate as intended?
Reduce may be called multiple times for emitted values, with later calls being passed the output of earlier calls to reduce. When you only look at the length of the array, you miss the fact that you may be looking at partially aggregated data. Summing the values makes the earlier aggregations accumulate, which is what you want.
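A small illustration of that re-reduce behaviour (plain JavaScript standing in for the MongoDB engine; the two-batch split is hypothetical):
// Five documents emit 1 for the same date, reduced in two batches.
const reduceByLength = (key, values) => values.length;
const reduceBySum = (key, values) => values.reduce((a, b) => a + b, 0);

// The second pass re-reduces the first partial result with the remaining emits.
const partialLen = reduceByLength('2017-01-01', [1, 1, 1]); // 3
reduceByLength('2017-01-01', [partialLen, 1, 1]);           // 3 -- wrong, should be 5

const partialSum = reduceBySum('2017-01-01', [1, 1, 1]);    // 3
reduceBySum('2017-01-01', [partialSum, 1, 1]);              // 5 -- correct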
