Efficient duplicates search algorithm

Efficient duplicates search algorithm - javascript

I need a script to search efficiently all the duplicates in a one-dimensional array.
I tried a naive method :
for (var i = 0, ii = arr.length - 1; i < ii; i++) {
  for (var j = i + 1, jj = arr.length; j < jj; j++) {
    if (arr[i] == arr[j]) {
      // remove the duplicate
    }
  }
}
It's very simple, but it takes too long if the array contains a large set of values. The arrays I use often contain hundreds of thousands of values, so the number of iterations this requires (about n²/2 comparisons) is huge!
Does anyone have an idea?

Use a LinkedHashSet or OrderedHashSet implementation (in JavaScript, the built-in Set also keeps insertion order). It does not allow duplicates and provides expected O(1) insertion, lookup, and deletion. Since you say you want to remove the duplicates, there is no faster way to do this than O(n). On an array of 1,000,000 items, the maximum time was 16 ms.
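var hs = new Set(); // a JavaScript Set keeps insertion order, like a LinkedHashSet
arr.forEach(function (obj) {
  hs.add(obj); // adding a value that is already present has no effect
});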
Complexity is expected O(n) with a good hash function.
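The question asks for all the duplicates, not only for their removal. A minimal sketch of the same hashing idea in JavaScript, assuming "duplicate" means equal under the built-in Set's comparison (SameValueZero); findDuplicates is a hypothetical name:

```javascript
// Hypothetical helper: returns each value that occurs more than once,
// in the order its second occurrence is found. Expected O(n) time.
function findDuplicates(arr) {
  const seen = new Set();  // values encountered so far
  const dupes = new Set(); // values already known to repeat
  for (const value of arr) {
    if (seen.has(value)) {
      dupes.add(value);
    } else {
      seen.add(value);
    }
  }
  return Array.from(dupes);
}

console.log(findDuplicates([3, 1, 3, 2, 1, 3])); // [ 3, 1 ]
```

Each element costs one expected O(1) lookup and at most one insertion, so the whole pass stays linear instead of the nested loop's quadratic comparisons.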

This code may be the most efficient way to do it: it is essentially a direct implementation of a set, using an object's keys.
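function eliminateDuplicates(arr) {
  var i,
      len = arr.length,
      out = [],
      obj = {};
  // Use each value as an object key; repeated values overwrite the same key.
  // Note: keys are strings, so 1 and "1" collapse into one entry.
  for (i = 0; i < len; i++) {
    obj[arr[i]] = arr[i]; // store the value itself so its original type is kept
  }
  // One value per distinct key (integer-like keys come out in ascending order).
  for (i in obj) {
    out.push(obj[i]);
  }
  return out;
}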
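Because object keys are always strings, the object-based approach cannot tell 1 apart from "1". A minimal alternative sketch using the built-in Set (ES2015 and later), which keeps them distinct and returns values in first-occurrence order; eliminateDuplicatesWithSet is a hypothetical name:

```javascript
// Hypothetical Set-based variant of eliminateDuplicates.
// Set compares with SameValueZero, so 1 and "1" stay distinct,
// and iteration follows insertion (first-occurrence) order.
function eliminateDuplicatesWithSet(arr) {
  return Array.from(new Set(arr));
}

console.log(eliminateDuplicatesWithSet([2, 1, "1", 2, 1])); // [ 2, 1, '1' ]
```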

Related

Javascript loop time complexity

When looping an array, people often use a simple method like below.
const array = [1,2,3,4,5];
for (let i = 0; i < array.length; i++) {
console.log(array[i]);
}
My question is whether array[i] is an O(1) operation.
For example, when i is 3, does JavaScript get the element immediately, or does it count from 0 to 3 again?

Yes, array[i] is O(1). However, you do it n times, which makes the entire loop O(n).

Yes, it is O(1).
Because it takes a single step to access an item of an array via its index, or add/remove an item at the end of an array, the complexity for accessing, pushing, or popping a value in an array is O(1).

It should be O(1); you can read about it under "javascript array complexity".

What is the time complexity of this recursive solution for removing duplicates?

After I complete a Leetcode question, I always try to also determine the asymptotic time complexity, for practice.
I am now looking at problem 26. Remove Duplicates from Sorted Array:
Given a sorted array nums, remove the duplicates in-place such that
each element appears only once and returns the new length.
Do not allocate extra space for another array, you must do this by
modifying the input array in-place with O(1) extra memory.
Clarification:
Confused why the returned value is an integer but your answer is an
array?
Note that the input array is passed in by reference, which means a
modification to the input array will be known to the caller as well.
Internally you can think of this:
// nums is passed in by reference (i.e., without making a copy)
int len = removeDuplicates(nums);
// any modification to nums in your function would be known by the caller.
// using the length returned by your function, it prints the first len elements.
for (int i = 0; i < len; i++) {
print(nums[i]);
}
Example 1:
Input: nums = [1,1,2]
Output: 2, nums = [1,2]
Explanation: Your function should return length = 2, with the first two elements of nums being 1 and 2 respectively. It doesn't matter what you leave beyond the returned length.
My code:
/**
* @param {number[]} nums
* @return {number}
*/
var removeDuplicates = function(nums) {
nums.forEach((num,i) => {
if(nums[i+1] !== null && nums[i+1] == nums[i] ){
nums.splice(i, 1);
console.log(nums)
removeDuplicates(nums)
}
})
return nums.length;
};
For this problem, I got O(log n) from my research. Execution time halves each time it runs. Can someone please verify or determine if I am wrong?
Are all recursive functions inherently O(logn)? Even if there are multiple loops?

For this problem, I got O(log n) from my research. Execution time halves for each time it's run. Can someone please verify or determine if I am wrong?
The execution time does not halve for each run: imagine an extreme case where the input has 100 values and they are all the same. Then at each level of the recursion tree one of those duplicates will be found and removed. Then a deeper recursive call is made. So for every duplicate value there is a level in the recursion tree. So in this extreme case, the recursion tree will have a depth of 99.
Even if you would revise the algorithm, it would not be possible to make it O(log n), as all values in the array need to be read at least once, and that alone already gives it a time complexity of O(n).
Your implementation uses splice which needs to shift all the values that follow the deletion point, so one splice is already O(n), making your algorithm O(n²) (worst case).
Because of the recursion, it also uses O(n) extra space in the worst case (for the call stack).
Are all recursive functions inherently O(logn)?
No. Using recursion does not say anything about the overall time complexity; it could be anything. You typically get O(log n) when each recursive call can discard a constant fraction (such as half) of the current array. This is the case, for instance, with a binary search algorithm.
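For contrast, here is a minimal sketch of the binary search case mentioned above, assuming a sorted array of numbers; binarySearch is a hypothetical name. Each call keeps only half of the remaining range, which is what makes the recursion depth O(log n):

```javascript
// Recursive binary search on a sorted array: each call discards half of the
// remaining range, so the recursion depth (and running time) is O(log n).
// Returns the index of target, or -1 if it is absent.
function binarySearch(sorted, target, lo = 0, hi = sorted.length - 1) {
  if (lo > hi) return -1; // empty range: not found
  const mid = Math.floor((lo + hi) / 2);
  if (sorted[mid] === target) return mid;
  if (sorted[mid] < target) {
    return binarySearch(sorted, target, mid + 1, hi); // keep the right half
  }
  return binarySearch(sorted, target, lo, mid - 1); // keep the left half
}

console.log(binarySearch([1, 3, 5, 7, 9, 11], 7)); // 3
console.log(binarySearch([1, 3, 5, 7, 9, 11], 4)); // -1
```

Unlike removeDuplicates, this function never touches most of the array, which is why it can be sublinear; any algorithm that must read every element cannot be.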
Improvement
You can avoid the extra space by using an iterative method instead of recursion. Also, you are not required to actually change the length of the given array, only to return what its new length should be, so you can avoid using splice. Instead, use two indexes into the array: a fast one that runs ahead to the next value that is different, and a slower one marking the position to which you copy that new value. When the fast index reaches the end of the input, the slow one indicates the size of the part that holds the unique values.
Here is how that looks:
var removeDuplicates = function(nums) {
if (nums.length == 0) return 0;
let len = 1;
for (let j = 1; j < nums.length; j++) {
if (nums[j-1] !== nums[j]) nums[len++] = nums[j];
}
return len;
};

for..in loop loops over non-numeric indexes “clean” and “remove”

This is probably something very basic that I'm missing, but I haven't seen this behavior before.
I have a for loop where options.headers.length is 3, and inside the loop I dynamically create a table header. The loop should run three times, for 0, 1 and 2, but when I print index it prints 0, 1, 2, clean and remove. I haven't seen clean and remove as indexes before. I know this information may not be sufficient, but if you have any clue, please let me know. After debugging, all I have concluded is that something might be overriding this.
for (index in options.headers)

If you don't want to iterate over clean and remove, change the loop to:
for (var i = 0; i < options.headers.length; i++) {
//use i for getting the array data
}
If you use for (index in options.headers), it will also iterate over non-numeric keys.

1. Don't use a bare index: that creates an implicit global (window.index), and globals are bad. Use var index instead.
(read more here https://www.google.pl/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=globals+javascript+bad)
2. Check whether the array has the key as its own property, or whether it is some inherited function (more on this below).
for (var index in options.headers) {
if (options.headers.hasOwnProperty(index)) {
// code here
}
}
More about #2:
Let's say we have
var array = [0,1,2,3];
and, besides that, we extend Array with a function (in JavaScript you can add functions to arrays, and to strings too):
Array.prototype.sayHello = function() {
alert('Hello');
};
Then your loop would print sayHello as part of the array, even though it is not the array's own property; it is inherited from Array.prototype.
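A minimal sketch of this effect, using the sayHello extension from the example (with console.log instead of alert so it also runs outside a browser); collectKeys is a hypothetical helper that records which keys for...in visits with and without the own-property check:

```javascript
// Collect the keys a for...in loop visits, with and without an own-property check.
function collectKeys(array) {
  const allKeys = [];
  const ownKeys = [];
  for (const key in array) {
    allKeys.push(key); // includes inherited enumerable keys such as sayHello
    if (Object.prototype.hasOwnProperty.call(array, key)) {
      ownKeys.push(key); // only the array's own keys
    }
  }
  return { allKeys, ownKeys };
}

Array.prototype.sayHello = function () {
  console.log("Hello");
};

const result = collectKeys([0, 1, 2, 3]);
console.log(result.allKeys); // [ '0', '1', '2', '3', 'sayHello' ]
console.log(result.ownKeys); // [ '0', '1', '2', '3' ]

delete Array.prototype.sayHello; // undo the extension so other code is unaffected
```

Calling hasOwnProperty through Object.prototype avoids relying on the object's own hasOwnProperty, which could itself be overridden.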

I assume that options.headers is an Array?
This happens when you (or some framework you load) add methods to the Array prototype. The "for in" loop will also enumerate these added methods. Hence you should loop over an array with:
for (var i = 0; i < options.headers.length; i++)
That way you will only get the real values instead of added methods.

Loop through arrays using arr[i]!==undefined

I know the classic way of looping through an array arr is:
for(var i=0 ; i<arr.length ; i++) {
// code
}
But someone recently showed me a different way of implementing the condition inside that loop, like this:
for(var i=0 ; arr[i] !== undefined ; i++) {
I think this solution is interesting because this is exactly what you need when you loop through an array: you don't want to get undefineds when you try to access an undefined index.
I realize that if you count the characters it looks longer, and also that you might have some problems with arrays like this: ["Hello", , "World"], but apart from that - is there anything else I'm missing here? Why shouldn't we be using this technique instead?

Why shouldn't we be using this technique instead?
It doesn't work on sparse arrays (as you mentioned)
It doesn't work on arrays that contain undefined values
It's not as easily optimised (assuming the .length stays constant during the loop)
(In the old days, the undefined identifier could be overwritten, so you'd need to use typeof)
Of course, whether it "works" for you depends on the use case, and sometimes you might want to use it. Most of the time, you simply don't.
And even if both ways would work in your case, it's better practice to use the standard approach (i<arr.length), as there is lower mental overhead. Everyone recognises that pattern and knows what it does, while with arr[i]!==undefined one would need to think about why the uncommon approach was chosen.

Sometimes arrays have empty values and your way of iteration will fail.
var arr = [];
arr[5] = 5;
for (var i = 0; arr[i] !== undefined; ++i) {
console.log(arr[i]);
}
console.log('done');
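If you want to iterate over the real array values and skip undefined ones, I suggest filtering the array first and iterating afterwards, so your code will be more understandable. Example:
var arr = [];
arr[5] = 5;
// filter skips holes; the callback also drops explicit undefined values.
// (filter(Boolean) would additionally drop 0, "", false, null and NaN.)
arr.filter(function (e) { return e !== undefined; }).forEach(function (e) {
  console.log(e);
});
console.log('done');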

performance difference between for loop and for.. in loop when iterating an array in javascript?

Is there any performance difference between
var a = [10,20,30,40];// Assume we have thousands of values here
// Approach 1
var i, len = a.length;
for(i=0;i<len;i++){
alert(i);
alert(a[i]);
}
// Approach 2
for( i in a ){
alert(i);
alert(a[i]);
}

Use for (var i = 0, len = a.length; i < len; i++), because it's much faster and it's the correct way of iterating over the items in an array.
First: It's not correct to iterate arrays with for (i in a) because that iteration will include enumerable properties in addition to array elements. If any methods or properties have been added to the array, they will be part of the iteration when using for (i in a) which is never what you want when trying to traverse the elements of the array.
Second: The correct option is a lot faster (9-20x faster). See this jsPerf test which shows the for (var i = 0; i < len; i++) option to be about 9x faster in Chrome and even more of a speed difference in Firefox: http://jsperf.com/for-loop-comparison2.
As an example of the problems that can occur when using for (var i in a): when I use it with the MooTools library included in the project, I get all these values for i:
0
1
2
3
$family
$constructor
each
clone
clean
invoke
associate
link
contains
append
getLast
getRandom
include
combine
erase
empty
flatten
pick
hexToRgb
rgbToHex
which appears to be a bunch of methods that mootools has added to the array object.
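The first point also covers properties added directly to an array instance, not only to Array.prototype. A small sketch (the property name label is an arbitrary example) showing that for...in visits such a property even behind a hasOwnProperty check, because it really is an own property, and that for...in keys are strings rather than numbers:

```javascript
const a = [10, 20, 30];
a.label = "scores"; // an own, enumerable, non-index property

const forInKeys = [];
for (const key in a) {
  // key is a string ("0", "1", ...); "label" passes this check
  // because it is an own property of the array.
  if (Object.prototype.hasOwnProperty.call(a, key)) forInKeys.push(key);
}

const indexedValues = [];
for (let i = 0, len = a.length; i < len; i++) {
  indexedValues.push(a[i]); // visits only the elements 0..length-1
}

console.log(forInKeys);           // [ '0', '1', '2', 'label' ]
console.log(indexedValues);       // [ 10, 20, 30 ]
console.log(typeof forInKeys[0]); // string
```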

I don't know about other browsers, but in my test with Firefox there is: for (i=0; etc...) is much faster. Here is a jsfiddle example that shows the difference. http://jsfiddle.net/pseudosavant/VyRH3/
Add to that the problems you can encounter with (for i in etc) when the Array prototype has been extended (perhaps by a library), and you should always use for (i=0; etc...) for looping over arrays.
(for i in etc) should only ever be used on objects.
