I have been creating some data structures to keep my skills sharp. I built a BST and have been stress-testing its speed against an array. I noticed that the insertion speed for the BST is far slower than array.push. I have been using Math.random and adding millions of numbers to both data structures. Oddly, the BST is much faster at finding a value than array.includes/indexOf. Is there a better way to code my insert function, or is insertion just a slow part of using a BST?
Here is my code
insert(data) {
    if(this.root === null) {
        this.root = new Node(data)
        return this
    }
    let current = this.root
    while(current) {
        if(current.data === data) {
            console.log(data + ' Already exists in tree')
            return
        }
        if(data < current.data) {
            if(current.left === null) {
                current.left = new Node(data)
                this.size++
                return this
            }
            current = current.left
        }
        if(data > current.data) {
            if(current.right === null) {
                current.right = new Node(data)
                this.size++
                return this
            }
            current = current.right
        }
    }
}
Your code is perfectly fine.
TL;DR: Your comparison/performance benchmark is inaccurate
My answer assumes you are familiar with Big O notation for measuring the running time of algorithms; if you aren't, do say so in a comment and I will edit my answer.
The thing is, array.push is a simple operation since it always appends the element at the end of the array, which takes amortized O(1) (effectively constant) time, while inserting an element into a BST means you are looking for the right place to insert it, because the tree has to stay in order; you don't just chuck it at the end of the tree like you do with array.push. That operation takes more time (O(log n) on average, where n is the number of nodes in the tree, to be precise), so if you compare these two, of course array.push will be faster.
If you tried inserting an element in order into the array, it would be much, much slower than inserting it into a BST, because you would have to search through the elements in the array until you came to the right spot, then move everything after it to fit your new element in, which takes O(n) time, where n is the number of elements in the array.
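If you want to see that difference yourself, here is a rough, self-contained sketch you could run (sizes and names are illustrative, not taken from your benchmark; the second loop can take a while):

// Plain push vs. keeping the array sorted on every insert.
const N = 100000;

console.time('push (append at the end)');
const a = [];
for (let i = 0; i < N; i++) a.push(Math.random()); // amortized O(1) per push
console.timeEnd('push (append at the end)');

console.time('insert in order (splice)');
const b = [];
for (let i = 0; i < N; i++) {
    const x = Math.random();
    // find the insertion point...
    let lo = 0, hi = b.length;
    while (lo < hi) {
        const mid = (lo + hi) >> 1;
        if (b[mid] < x) lo = mid + 1; else hi = mid;
    }
    // ...then shift everything after it: O(n) per insert
    b.splice(lo, 0, x);
}
console.timeEnd('insert in order (splice)');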
So in conclusion, BSTs excel at finding elements and at inserting them in order; when the order doesn't matter and you can just put the element wherever, an array will usually do it faster. That's why searching in your BST is faster than includes or indexOf on an array.
I'm working in JavaScript, but readable pseudocode of any kind may be able to answer my question.
I'm going to struggle describing this so please feel free to comment clarifications - I will answer them and refine my post accordingly.
Essentially, I have an array that acts as a queue: items are added, and once they are processed they need to be removed, and more will eventually be added. What is the fastest way to get the first item in an array when I don't care about the index? I just want to iterate on a first-added basis without having to shift every entry in the array down each time. Also worth mentioning: I am not looping through the array. I have a master loop for my app that checks whether the array has items on each iteration, and if it does, it grabs that data and then removes it from the array. Currently, to do this quickly, I use pop, but I now recognize I need to get the oldest item in the queue first each time, not the newest.
For more clarification if needed:
In my app I have blocks of raw data that first need to be run through a function before they are ready to be used by other parts of my app. Each block has a unique ID that I pass to the function in order for it to be parsed. For optimization purposes I only parse the blocks of data as they are needed.
In order to do this, my current system is: when I realize I need a block to be parsed, I push its unique ID into an array, and a continuous loop in my app constantly checks said array to see if it has items in it. If it does, it pops the last item off the array and passes the unique ID into the parsing function.
For performance reasons, only one block of data can be parsed per iteration of the loop. The issue arises when multiple blocks of data are already in the queue array and I add more items before the loop can finish passing the existing IDs to the function. Basically, new IDs that need to be parsed are added to the end of the array before my loop can clear them out.
Now, this isn't all too bad because new data is needed only sparsely, but when it is, lots of IDs are added at once, and this is an attribute of the app I can't really change.
Since I'm using pop, the most recently added ID is obviously always parsed first, but I chose this method as I believed it to be the fastest way to iterate a queue like this. However, I've come to realize I would rather parse the oldest items in the list first.
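For reference, my current approach looks roughly like this (a simplified sketch; the names are just illustrative, and parseBlock stands for my existing parsing function):

const parseQueue = [];            // IDs of blocks waiting to be parsed

function masterLoopTick() {
    if (parseQueue.length > 0) {
        const id = parseQueue.pop();  // grabs the NEWEST ID, which is the problem
        parseBlock(id);               // only one block parsed per iteration
    }
}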
In essence, I'm looking for a way to loop through an array oldest to newest without having to re-organize the array each time. The index of the array is not important, I just need first-added, first-parsed behavior.
For example, I know I could always just pass the 0th item in the array to my function and then shift the rest of the entries down; however, I believe having to shift the remaining items in the array is too costly for performance and not really worth it. If I'm just dumb and that should have no real-world cost, please let me know, but it still seems like a band-aid fix. I'm certain there is a better solution out there.
I'm open to other data structures too as the array only holds strings.
Thank you
EDIT: While doing more googling I'm having a facepalm moment and have realized the problem I'm describing is a stack vs. a queue. But now my question becomes: what is the fastest implementation of a queue when the index isn't really of value to me?
The following is a FIFO queue implementation using a singly linked list in JavaScript.
For more information on linked lists, see https://www.geeksforgeeks.org/linked-list-set-1-introduction/
// queue implementation with linked list
var ListNode = function(val, next = null){
    this.val = val
    this.next = next
};

var Queue = function(){
    let head = null;
    let tail = null;

    this.show = function(){
        let curr = head;
        let q = [];
        while(curr){
            q.push(curr.val);
            curr = curr.next;
        }
        return q.join(' -> ');
    }

    this.enqueue = function(item){
        let node = new ListNode(item);
        if(!head) {
            head = node;
            tail = node;
        } else {
            tail.next = node;
            tail = node;
        }
    }

    this.dequeue = function(){
        if(!head) return null;
        else {
            let first = head;
            head = head.next;
            first.next = null;
            return first;
        }
    }
}
var myQueue = new Queue();
myQueue.enqueue(1); // head -> 1
console.log(myQueue.show())
myQueue.enqueue(2); // head -> 1 -> 2
console.log(myQueue.show())
myQueue.enqueue(3); // head -> 1 -> 2 -> 3
console.log(myQueue.show())
myQueue.dequeue(); // head -> 2 -> 3
console.log(myQueue.show())
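For your use case (a queue of ID strings drained by a master loop, oldest first), you could use it roughly like this; note that dequeue returns the removed node, so you read .val from it (a sketch only, where parseBlock stands for your existing parsing function):

const pending = new Queue();
pending.enqueue('block-42');      // example IDs
pending.enqueue('block-43');

// inside the master loop, one block per iteration:
const node = pending.dequeue();   // oldest item, or null if the queue is empty
if (node !== null) {
    parseBlock(node.val);
}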
After I complete a Leetcode question, I always try to also determine the asymptotic time complexity, for practice.
I am now looking at problem 26. Remove Duplicates from Sorted Array:
Given a sorted array nums, remove the duplicates in-place such that each element appears only once and returns the new length.
Do not allocate extra space for another array, you must do this by modifying the input array in-place with O(1) extra memory.
Clarification:
Confused why the returned value is an integer but your answer is an array?
Note that the input array is passed in by reference, which means a modification to the input array will be known to the caller as well.
Internally you can think of this:
// nums is passed in by reference. (i.e., without making a copy)
int len = removeDuplicates(nums);

// any modification to nums in your function would be known by the caller.
// using the length returned by your function, it prints the first len elements.
for (int i = 0; i < len; i++) {
    print(nums[i]);
}
Example 1:
Input: nums = [1,1,2]
Output: 2, nums = [1,2]
Explanation: Your function should return length = 2, with the first two elements of nums being 1 and 2 respectively. It doesn't matter what you leave beyond the returned length.
My code:
/**
 * @param {number[]} nums
 * @return {number}
 */
var removeDuplicates = function(nums) {
    nums.forEach((num, i) => {
        if(nums[i+1] !== null && nums[i+1] == nums[i] ){
            nums.splice(i, 1);
            console.log(nums)
            removeDuplicates(nums)
        }
    })
    return nums.length;
};
For this problem, I got O(log n) from my research. Execution time halves each time it runs. Can someone please verify or determine if I am wrong?
Are all recursive functions inherently O(log n)? Even if there are multiple loops?
For this problem, I got O(log n) from my research. Execution time halves for each time it's run. Can someone please verify or determine if I am wrong?
The execution time does not halve for each run: imagine an extreme case where the input has 100 values and they are all the same. Then at each level of the recursion tree one of those duplicates will be found and removed. Then a deeper recursive call is made. So for every duplicate value there is a level in the recursion tree. So in this extreme case, the recursion tree will have a depth of 99.
Even if you revised the algorithm, it would not be possible to make it O(log n), as all values in the array need to be read at least once, and that alone already gives it a time complexity of O(n).
Your implementation uses splice which needs to shift all the values that follow the deletion point, so one splice is already O(n), making your algorithm O(n²) (worst case).
Because of the recursion, it also uses O(n) extra space in the worst case (for the call stack).
Are all recursive functions inherently O(log n)?
No. Using recursion does not say anything about the overall time complexity; it could be anything. You typically get O(log n) when each recursive call can ignore a constant fraction (like half) of the current array. This is for instance the case with a binary search algorithm.
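For contrast, here is a minimal binary search sketch, a recursion that really is O(log n) because every call discards half of the remaining range:

// O(log n): each recursive call halves the range that is still searched.
function binarySearch(nums, target, lo = 0, hi = nums.length - 1) {
    if (lo > hi) return -1;                        // not found
    const mid = (lo + hi) >> 1;
    if (nums[mid] === target) return mid;
    return nums[mid] < target
        ? binarySearch(nums, target, mid + 1, hi)  // keep the right half
        : binarySearch(nums, target, lo, mid - 1); // keep the left half
}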
Improvement
You can avoid the extra space by not using recursion, and using an iterative method instead. Also, you are not required to actually change the length of the given array, only to return what its new length should be, so you can avoid using splice. Instead, use two indexes into the array: one that runs ahead to the next value that is different, and another, slower one, to which you copy that new value. When the faster index reaches the end of the input, the slower one indicates the size of the part that holds the unique values.
Here is how that looks:
var removeDuplicates = function(nums) {
    if (nums.length == 0) return 0;
    let len = 1;
    for (let j = 1; j < nums.length; j++) {
        if (nums[j-1] !== nums[j]) nums[len++] = nums[j];
    }
    return len;
};
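Used on the example input it behaves like this:

const nums = [1, 1, 2];
const len = removeDuplicates(nums);
console.log(len);                 // 2
console.log(nums.slice(0, len));  // [ 1, 2 ]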
Look at these two pieces of code; the second one only adds the third line (arr.push(1)), but it takes about 84 times as long. Can anybody explain why?
let LIMIT = 9999999;
let arr = new Array(LIMIT);
// arr.push(1);
console.time('Array insertion time');
for (let i = 1; i < LIMIT; i++) {
    arr[i] = i;
}
console.timeEnd('Array insertion time');
let LIMIT = 9999999;
let arr = new Array(LIMIT);
arr.push(1);
console.time('Array insertion time');
for (let i = 1; i < LIMIT; i++) {
    arr[i] = i;
}
console.timeEnd('Array insertion time');
The arr.push(1) operation creates a "sparse" array: it has a single element present at index 9999999. V8 switches the internal representation of such a sparse array to "dictionary mode", i.e. the array's backing store is an index→element dictionary, because that's significantly more memory efficient than allocating space for 10 million elements when only one of them is used.
The flip side is that accessing (reading or writing) elements of a dictionary-mode array is slower than for arrays in "fast/dense mode": every access has to compute the right dictionary index, and (in the scenario at hand) the dictionary has to be grown several times, which means copying all existing elements to a new backing store.
As the array is filled up, V8 notices that it's getting denser, and at some point transitions it back to "fast/dense mode". By then, most of the slowdown has already been observed. The remainder of the loop has some increased cost as well though, because by this time, the arr[i] = i; store has seen two types of arrays (dictionary mode and dense mode), so on every iteration it must detect which state the array is in now and handle it accordingly, which (unsurprisingly) costs more time than not having to make that decision.
Generalized conclusion: with JavaScript being as dynamic and flexible as it is, engines can behave quite differently for very similar-looking pieces of code; for example because the engine optimizes one case for memory consumption and the other for execution speed, or because one of the cases lets it use some shortcut that's not applicable for the other (for whatever reason). The good news is that in many cases, correct and understandable/intuitive/simple code also tends to run quite well (in this example, the stray arr.push looks a lot like a bug).
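If you want to sidestep the pitfall in this particular snippet, don't append past the pre-allocated length; for instance, write the extra element at index 0 instead of push-ing it (a sketch; whether and how much this helps depends on the engine and version):

let LIMIT = 9999999;
let arr = new Array(LIMIT);
arr[0] = 1;   // instead of arr.push(1), which would land at index 9999999
console.time('Array insertion time');
for (let i = 1; i < LIMIT; i++) {
    arr[i] = i;
}
console.timeEnd('Array insertion time');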
Preface
Notice: This question is about complexity. I use here a complex design pattern, which you don't need to understand in order to understand the question. I could have simplified it more, but I chose to keep it relatively untouched for the sake of preventing mistakes. The code is written in TypeScript which is a super-set of JavaScript.
The code
Regard the following class:
export class ConcreteFilter implements Filter {
    interpret() {
        // rows is a very large array
        return (rows: ReportRow[], filterColumn: string) => {
            return rows.filter(row => {
                // I've hidden the implementation for simplicity,
                // but it usually returns either an empty array or a very short one.
            }).map(row => <string>row[filterColumn]);
        }
    }
}
It receives an array of report rows, then filters the array by some logic that I've hidden. Finally, it does not return the whole rows, but only the one string column that is named by filterColumn.
Now, take a look at the following function:
function interpretAnd (filters: Filter[]) {
    return (rows: ReportRow[], filterColumn: string) => {
        var runFilter = filters[0].interpret();
        var intersectionResults = runFilter(rows, filterColumn);
        for (var i = 1; i < filters.length; i++) {
            runFilter = filters[i].interpret();
            var results = runFilter(rows, filterColumn);
            intersectionResults = _.intersection(intersectionResults, results);
        }
        return intersectionResults;
    }
}
It receives an array of filters and returns a distinct array of all the filterColumn values that the filters returned.
In the for loop, I get the results (a string array) from every filter and then intersect them.
The problem
The report row array is large, so every runFilter operation is expensive (while the filter array, on the other hand, is pretty short). I want to iterate over the report row array as few times as possible. Additionally, the runFilter operation is very likely to return zero results, or very few.
Explanation
Let's say that I have 3 filters and 1 billion report rows. The internal iteration, i.e. the iteration in ConcreteFilter, will happen 3 billion times, even if the first execution of runFilter returned 0 results, so I have 2 billion redundant iterations.
So, I could, for example, check whether intersectionResults is empty at the beginning of every iteration, and if so, break the loop. But I'm sure that there are better solutions mathematically.
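To illustrate, the early exit I have in mind is just one extra line inside interpretAnd (my own rough sketch):

function interpretAnd (filters: Filter[]) {
    return (rows: ReportRow[], filterColumn: string) => {
        var runFilter = filters[0].interpret();
        var intersectionResults = runFilter(rows, filterColumn);
        for (var i = 1; i < filters.length; i++) {
            // early exit: an empty intersection can never become non-empty again
            if (intersectionResults.length === 0) break;
            runFilter = filters[i].interpret();
            var results = runFilter(rows, filterColumn);
            intersectionResults = _.intersection(intersectionResults, results);
        }
        return intersectionResults;
    }
}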
Also, if the first runFilter execution returned, say, 15 results, I would expect the next execution to receive an array of only 15 report rows, meaning I want the intersection operation to influence the input of the next call to runFilter.
I can modify the report row array after each iteration, but I don't see how to do it in an efficient way that won't be even more expensive than now.
A good solution would be to remove the map operation and then pass the already-filtered array to each subsequent operation instead of the entire array, but I'm not allowed to do that because I must not change the result format of the Filter interface.
My question
I'd like to get the best solution you could think of as well as an explanation.
Thanks a lot in advance to everyone who spends their time trying to help me.
Not sure how effective this will be, but here's one possible approach you can take. If you preprocess the rows by the filter column, you'll have a way to retrieve the matched rows. If you typically have more than 2 filters then this approach may be more beneficial; however, it will be more memory intensive. You could branch the approach depending on the number of filters. There may be some TS constructs that are more useful; I'm not very familiar with it. There are some comments in the code below:
// unique() is assumed to be a helper that returns the distinct values of an array:
const unique = arr => [...new Set(arr)];

var map = {};
// Loop over every row, keep a map of rows with a particular filter value.
allRows.forEach(row => {
    const v = row[filterColumn];
    const items = map[v] = map[v] || [];
    items.push(row);
});

let rows = allRows;
filters.forEach(f => {
    // Run the filter (f.execute stands for running it, e.g. f.interpret()(rows, filterColumn))
    // and keep the unique set of matched strings.
    const matches = unique(f.execute(rows, filterColumn));
    // For each of the matched strings, look up the rows with that value and
    // concat them; they become the input for the next filter.
    rows = [].concat(...matches.map(m => map[m] || []));
});
// Loop over the rows that made it all the way through, extract the value and then unique() the collection.
return unique(rows.map(row => row[filterColumn]));
Thinking about it some more, you could use a similar approach but just do it on a per filter basis:
let rows = allRows;
filters.forEach(f => {
    const matches = f.execute(rows, filterColumn);
    // Build a lookup of the values matched by this filter...
    let map = {};
    matches.forEach(m => {
        map[m] = true;
    });
    // ...and keep only the rows whose value was matched, so the next
    // filter only sees the rows that survived this one.
    rows = rows.filter(row => !!map[row[filterColumn]]);
});
return unique(rows.map(row => row[filterColumn])); // unique() as above
Are there any advantages to using linked lists in JavaScript? Their main advantage over arrays (for example) is that we can insert an element at an arbitrary position without moving every other element, and that they are not limited in size the way arrays are.
However, arrays in JS expand and shrink dynamically, and arrays are faster at accessing data. We can also use the Array.prototype.splice() method to insert data (although a linked list could still be faster than that).
Are there any advantages (speed and so on) of using linked lists over arrays in JavaScript, then?
Here is the code of a basic linked list in JS:
function list() {
    this.head = null;
    this.tail = null;
    this.createNode = function(data) {
        return { data: data, next: null }
    };
    this.addNode = function(data) {
        if (this.head == null) {
            this.tail = this.createNode(data);
            this.head = this.tail;
        } else {
            this.tail.next = this.createNode(data);
            this.tail = this.tail.next;
        }
    };
    this.printNode = function() {
        var x = this.head;
        while (x != null) {
            console.log(x.data);
            x = x.next;
        }
    }
}
var list = new list();
list.addNode("one");
list.addNode("two");
list.printNode();
In a linked list, prepending or appending an element at the front or the back takes O(1) time. For an array, prepending (unshift) is O(n) because every existing element has to shift, while appending (push) is amortized O(1). On the other hand, retrieving an element by index is O(1) for an array, versus O(n) for a linked list.
So it depends on what you are trying to do; you need to create benchmarks and then test which operation takes how much time.
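As a starting point for such a benchmark, something along these lines compares prepending to an array against prepending to a bare node chain (a rough sketch; the numbers will vary by engine):

const N = 100000;

console.time('array unshift');
const arr = [];
for (let i = 0; i < N; i++) arr.unshift(i);                 // O(n) per unshift: shifts every existing item
console.timeEnd('array unshift');

console.time('linked list prepend');
let head = null;
for (let i = 0; i < N; i++) head = { data: i, next: head }; // O(1) per prepend
console.timeEnd('linked list prepend');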
You can also check the Wikipedia article on linked lists for a comparison of the operations.
I don't know the performance differences. As you say, linked lists have advantages over arrays in other languages in terms of memory allocation, garbage collection, and sparseness, but JavaScript arrays handle some of those problems. Nevertheless, you may still have reason to use a linked list if your use case calls for that kind of data structure: that is, you only need to reach items starting from the front (or from either end, with a doubly linked list), proceeding from one item to the next, without the need for random access by array index.
Some colorful metaphors about linked lists here: What is a practical, real world example of the Linked List?