I've started a new JavaScript project based on the example at:
http://bl.ocks.org/mbostock/4063570
Everything with the d3 Dendrogram is great so far except that my data will probably always contain duplicate leaves (terminal nodes). In my data only the leaves could ever contain duplicate data. All internal nodes (between root and leaves) are strictly distinct well before d3 comes into play.
I could add something to the node names (d.name) to make each node totally unique, but I'd rather 'reuse' leaf nodes and make all internal nodes point to and share a single leaf if possible.
Does anyone out there know how to do this?
Many thanks in advance!
Drew Barfield
The D3 data join expects that each DOM node will correspond to a different element in the data array. However, there's nothing stopping 2 elements in the data array from referring to the same underlying object.
It comes down to whether you are OK with the default join key (which is array index) or if you want to achieve a sense of "object permanence" on data update by mapping specific data elements to specific nodes. To have that happen you need to define a custom join key function, which by definition relies on some way to differentiate the data elements.
Personally, I think that if you're doing any amount of data updating involving enter/exit/update, life will be much easier if each data element is unique and has some kind of "id" or "key" property that you can use to identify it. Reusing data elements will likely be more headache than it's worth.
You didn't actually say what you are trying to achieve by sharing data. Is it just a memory-saving optimization, or is there another reason? If it's just memory, I wouldn't bother.
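To make the key-function idea concrete, here is a DOM-free sketch of how a keyed join partitions data into enter/update/exit groups (the `id` property is an assumption; d3 does this for you when you pass a key function as the second argument to `selection.data`):

```javascript
// Minimal illustration of a keyed join, independent of the DOM.
// `keyFn` plays the role of d3's key function: elements are matched
// by key rather than by array index, which is what gives you
// "object permanence" across data updates.
function diffByKey(oldData, newData, keyFn) {
  var oldKeys = new Set(oldData.map(keyFn));
  var newKeys = new Set(newData.map(keyFn));
  return {
    enter: newData.filter(function (d) { return !oldKeys.has(keyFn(d)); }),
    update: newData.filter(function (d) { return oldKeys.has(keyFn(d)); }),
    exit: oldData.filter(function (d) { return !newKeys.has(keyFn(d)); })
  };
}
```

With d3 itself, the equivalent is `selection.data(data, function (d) { return d.id; })`.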
Related
I have to access a value (direct access) many times in a very large 2D array. Is it better to assign it to a temporary variable, or should I use array[req.params.position.x][req.params.position.y].anyValue every time?
I know the "new variable" option would make the code easier to read; I was wondering whether it would have an impact on performance.
My hypothesis is that each access acts like some kind of forEach inside a forEach and thus takes more time to reach the value every time. Is that right?
From your description, array[req.params.position.x][req.params.position.y], it sounds like, whilst this is a 2D array, you also know up front the index into each array. This is direct access to the array, which is extremely quick. It would be different if you needed to search the array for something, but you don't need that here.
Internally, in browsers, this will be constant-time access no matter how big the array is. There is no "lookup" needed: the passed indexes reference the value's location in memory, from which it is retrieved directly.
So there is no performance concern here.
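To illustrate, both forms below are constant-time; caching the cell in a local variable is purely a readability (and micro-optimization) win, because it avoids re-walking the `req.params.position` property chain. The data shapes here are made up for the example:

```javascript
// Hypothetical data shaped like the question's access pattern.
var array = [[{ anyValue: 1 }, { anyValue: 2 }],
             [{ anyValue: 3 }, { anyValue: 4 }]];
var req = { params: { position: { x: 1, y: 0 } } };

// Repeated direct access: each lookup is O(1).
var a = array[req.params.position.x][req.params.position.y].anyValue;

// Caching the cell in a local variable: same cost per access,
// easier to read, and the property chain is walked only once.
var cell = array[req.params.position.x][req.params.position.y];
var b = cell.anyValue;
```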
I have an object store that has an inline key path and two indexes. The first index identifies a portfolio, such as '2'. The second index identifies the module under a portfolio, such as '2.3' for the 3rd module of portfolio '2'. And the key path identifies the data object within the module, such as '2.3.5'. As the user builds his or her modules, the individual objects are written to the database.
Suppose a user decides to delete an entire portfolio of large size. I'd like to understand which of two ways would be the best in terms of efficiency and memory usage for deleting all data from the object store for that specific portfolio.
One method could open a cursor on the desired portfolio index within a single transaction to delete each item in the object store having that portfolio index value.
A second method, in my particular case, is to use a pointer. I keep a pointer array that tracks every data object's key path in the object store. This is because a user could choose to insert a new item at position 2 within a module of 100 items. Rather than stepping through the store for that module and changing the key paths of items 2.3.2 through 2.3.100 to 2.3.3 through 2.3.101, for example, I add the inserted item as 2.3.101 and place it at position 2 in the pointer; it's much easier to update the pointer than to move large pieces of data in the database by copying them, deleting them, and writing them again under new key paths.
So, the second method could be to step through the pointer deleting all data objects by the key path stored in the pointer for that portfolio and to perform each deletion in a separate transaction.
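For reference, the first method might be sketched like this; the store name `'objects'` and index name `'portfolio'` are assumptions, not from your actual schema:

```javascript
// Sketch of method one: delete every object for a portfolio by
// walking a cursor over the portfolio index, all inside a single
// readwrite transaction. Store/index names are placeholders.
function deletePortfolio(db, portfolioId) {
  return new Promise(function (resolve, reject) {
    var tx = db.transaction('objects', 'readwrite');
    var index = tx.objectStore('objects').index('portfolio');
    var request = index.openCursor(IDBKeyRange.only(portfolioId));
    request.onsuccess = function (event) {
      var cursor = event.target.result;
      if (cursor) {
        cursor.delete();    // queue the delete within this transaction
        cursor.continue();  // advance to the next matching record
      }
    };
    tx.oncomplete = function () { resolve(); };
    tx.onerror = function () { reject(tx.error); };
  });
}
```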
The questions are:
Is it accurate that a large transaction, such as an open cursor across many data objects, requires the browser to store large amounts of data in memory in order to be able to rollback that transaction if any step fails along the way?
If so, is it better to employ the second approach since each transaction works on one data object only? Or, is it inefficient to repeatedly open small transactions on the same object store and search for each object by specific key path as opposed to just stepping through the store by an ordered index?
If the multiple-transaction approach is taken, would the browser release the memory quickly enough that there'd be a reduction in the total memory used during the entire process, or would it hold on to it until the entire process completes, such that it would accumulate to the same point anyway?
In this case, please assume the expected size of a large portfolio to be 50 modules of 100 objects each, such that the comparison is between a single transaction working on 5,000 data objects through a cursor versus performing 5,000 individual transactions on a single data object at a time through the known key path.
Perhaps, I am overthinking all of this because I am new to it and attempting to learn a bunch of things in a hurry. Thank you for any guidance you may be able to provide.
I'm a newbie to data structures + algorithms, and while practicing interview questions, I came across the following:
say that we have individual nodes connecting to and from other (possibly multiple) nodes:
(1) Generate the data structure to store it all
(2) Generate the function to retrieve a list of all the nodes within "n" hops from a particular node
I'm particularly unsure about what "data structure" to use. I thought, okay, maybe we can create a node class and have an array referencing all the nodes that a particular node points to. Then, to find the nodes within "n" hops of a particular node, we can iterate through that array, recursively calling the function (passing n-1 this time) until we hit a base case of 1 hop. Could someone point out the error in my thinking and/or improve on it?
Appreciate any suggestions!
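What you describe is essentially a graph stored as an adjacency list. One refinement: a plain recursive walk can loop forever on cycles, and with a naive visited check it can also miss nodes that are first reached along a long path, so a breadth-first traversal is the safer shape. A minimal sketch (names are illustrative):

```javascript
// Adjacency-list graph: each node keeps an array of the nodes it points to.
function Node(name) {
  this.name = name;
  this.neighbors = [];
}

// Collect all nodes reachable within n hops of `start`. BFS expands
// one hop level at a time, so every node is first seen at its minimum
// hop count, and the visited set guards against cycles.
function nodesWithinN(start, n) {
  var visited = new Set([start]);
  var frontier = [start];
  var result = [];
  for (var hop = 0; hop < n && frontier.length > 0; hop++) {
    var next = [];
    frontier.forEach(function (node) {
      node.neighbors.forEach(function (neighbor) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          result.push(neighbor);
          next.push(neighbor);
        }
      });
    });
    frontier = next;
  }
  return result;
}
```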
I am using the InfoVis SpaceTree to visualize a tree. The complete tree is loaded in one call to the loadJSON method. Each node's children are already in the correct order. But the nodes do not display in the order they are defined in the data structure, i.e. according to their array index.
How can I make them display in the right order? Any help would be greatly appreciated.
The tree does not display nodes based on their order in the JSON data structure / array index.
Instead, it is based on the 'id' attribute, which is used as the key for storing nodes in an internal hash (well, technically an object). Note it is a hash and not an array, so the order is irrelevant.
A node with id 100 will always be displayed before a node with id 101. If you want your node's children to display in a particular order, make sure your child nodes are sorted by id.
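A hedged sketch of that fix: renumber the ids in document order before handing the JSON to loadJSON, so SpaceTree's id-based ordering coincides with the array order. This assumes nothing else in your application depends on the original id values:

```javascript
// Walk the tree in preorder and assign increasing ids, so siblings
// end up numbered in the same order they appear in the children array.
function renumberIds(root) {
  var nextId = 0;
  (function walk(node) {
    node.id = nextId++;
    (node.children || []).forEach(walk);
  })(root);
  return root;
}
```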
I have a nested JSON object, similar to this.
In my case, I have a unique id field of type int (say, instead of name above). This is not a binary tree; it just depicts a parent-child relationship. I want a way to easily look up the child tree (children) rooted at, say, id = 121. In a brute-force way, I could compare all nodes until I find the right one and return its children. But I was thinking of keeping a map of {id, node}, for example {"121" : root[1][10]..[1]}. This may be super wasteful of memory (unless it holds a reference to the node rather than a copy). Not sure of any better way.
I have control over what the server sends, so I may augment the above data structure, but I need a quick way to get the child tree for a given node id on the client side.
EDIT:
I am considering keeping another data structure: a map of {id, []ids}, where ids is the ordered path from the root. Is there any better way?
Objects in javascript are true pointer-based objects, meaning that you can keep multiple references to them without using much more memory. Why not do a single traversal to assign the sub-objects to a new id-based parent object? Unless your hierarchical object is simply enormous, this should be very fast.
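A minimal sketch of that single traversal, assuming each node has an `id` and an optional `children` array:

```javascript
// Build a flat id → node index in one pass over the tree. Each entry
// is a reference to the original node, not a copy, so the memory
// overhead is just the index object itself.
function buildIndex(root) {
  var index = {};
  (function walk(node) {
    index[node.id] = node;
    (node.children || []).forEach(walk);
  })(root);
  return index;
}

// Looking up the subtree rooted at a given id is then O(1):
// var subtree = buildIndex(data)[121];
```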
In light of best practice and what would happen if the application you're building were to scale to millions of users, you might rethink whether you really want the server to do more work. The client's computer is sitting there, ready to provide you with remote computing power for FREE. Why move the work load to the server causing it to process fewer client requests per second? That may not be a direction you want to go.
Here is a fiddle demonstrating this index-building technique. You run through it once, and use the index over and over as you please. It only takes 4 or 5 ms to build said index. There is no performance problem!
One more note: if you are concerned with bandwidth, one simple way to help with that is to trim down your JSON. Don't put quotes around object key names (note that this makes it a JavaScript object literal rather than strict JSON, so JSON.parse will no longer accept it), use one-letter key names, and don't use whitespace and line breaks. That will get you a very large improvement. Performing this change on your example JSON takes it from 11,792 characters to 5,770, only 49% of the original size!
One minor note: object keys in JavaScript are always strings. The numeric ids I added to your example JSON are coerced to strings when used as key names. This should be no impediment to usage, but it is a subtle difference you may want to be aware of.
I don't assume that the ids are somehow ordered, but still it might help to prune at least parts of the tree if you add to each node the information about the minimum and maximum id value of its children (and sub... children).
This can quite easily be achieved on the server side, and when searching the tree you can check whether the id you're looking for is within a node's id range before stepping inside and searching all its children.
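A sketch of both halves, with the annotation done once up front (the field names `minId`/`maxId` are my invention):

```javascript
// Annotate each node with the minimum and maximum id found in its
// subtree (done once, e.g. server side).
function annotate(node) {
  var min = node.id, max = node.id;
  (node.children || []).forEach(function (child) {
    annotate(child);
    min = Math.min(min, child.minId);
    max = Math.max(max, child.maxId);
  });
  node.minId = min;
  node.maxId = max;
}

// Search for a node by id, skipping any branch whose id range
// cannot contain the target.
function findById(node, id) {
  if (id < node.minId || id > node.maxId) return null; // prune branch
  if (node.id === id) return node;
  var children = node.children || [];
  for (var i = 0; i < children.length; i++) {
    var found = findById(children[i], id);
    if (found) return found;
  }
  return null;
}
```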