I was reading the dojo query tutorial and saw
// retrieve an array of nodes with the class name "odd"
// from the first list using a selector
var odds1 = query("#list .odd");
// retrieve an array of nodes with the class name "odd"
// from the first list using a DOM node
var odds2 = query(".odd", document.getElementById("list"));
and they explain that odds2 is faster than odds1 because odds2 searches for .odd within the #list subtree rather than the whole document. What I'm wondering is: what are the advantages of doing odds1 (other than cleaner code, I guess)? It seems to me that whenever query is searching for elements inside an id'd element, the odds2 style should always be used (assuming proper id/class usage in the HTML). So why doesn't Dojo automatically parse the query string in odds1 and call odds2?
Well, looking at the code (http://svn.dojotoolkit.org/src/dojo/trunk/query.js for query, and http://svn.dojotoolkit.org/src/dojo/trunk/selector/acme.js for the default selector engine), it appears that the "big" performance improvement comes from the fact that the initial node list is reduced when you give the query method some help with document.getElementById("list"). However, you can also just pass the query method a string containing the parent node's id and achieve the same performance:
query(".odd", "list");
Then there is the fact that you can cache the parent node by storing the result of document.getElementById("list") in a variable and reusing it. However, in matters this trivial, readability tends to trump performance. Considering the number of problems that a bad JavaScript interpreter can hide, having readable code can end up saving you a lot of trouble.
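To make the subtree-scoping point concrete, here is a minimal sketch with plain objects standing in for DOM nodes and a hypothetical queryClass helper (none of this is Dojo's actual implementation; it just illustrates why a narrower root means less work):

```javascript
// Mock "DOM": plain objects with className and children.
var tree = {
  className: "", children: [
    { className: "", children: [] },              // unrelated subtree
    { className: "list", id: "list", children: [  // our #list
      { className: "odd", children: [] },
      { className: "even", children: [] },
      { className: "odd", children: [] }
    ]}
  ]
};

var visited = 0;

// Naive class-selector: walk every node under `root`.
function queryClass(root, cls) {
  var results = [];
  (function walk(node) {
    visited++;
    if (node.className === cls) results.push(node);
    node.children.forEach(walk);
  })(root);
  return results;
}

// Whole-"document" search visits every node...
visited = 0;
var all = queryClass(tree, "odd");
var wholeDocVisits = visited;

// ...while a search scoped to the #list subtree visits fewer.
visited = 0;
var scoped = queryClass(tree.children[1], "odd");
var scopedVisits = visited;

console.log(all.length === scoped.length);  // same matches
console.log(scopedVisits < wholeDocVisits); // fewer nodes inspected
```

Same result set, smaller haystack; that is the whole trick.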
Okay, so I'm working on a method that accepts any form of iterable as its argument. I'd like to perform what amounts to a splice on the parameter, but - and this is the crucial bit - without changing its Symbolic Type. That is to say, I'd like the return value's Symbol to match that of the input (NodeList, DOMTokenList, HTMLCollection, etc.)
Contrived Example:
let iterableList = document.querySelectorAll('p');
// Result: NodeList(5): p, p.body, p.body, p, p
Assumptions
Assume I wish to remove those two <p class="body"> tags (from the COLLECTION, NOT the DOM).
Assume iterableList is user-provided (e.g. I cannot simply alter the query to read p:not(.body))
Assume we do not know what kind of iterable the input is (i.e. in this example, by virtue of the fact that a querySelectorAll produced our results, it is a NodeList. Conversely, had it come from getElementsByTagName, it would be an HTMLCollection. Or any of a half dozen others)
Assume the return value should have the same Symbol as the input.
Assume I chose this example because the results are live nodes (and we know the subset could have come from a querySelectorAll('p:not(.body)'), so we know the subset is possible)
1. Now, obviously I can convert the symbol to an Array...
...through any number of means. I could then remove the two offending nodes, again by a variety of tactics. The problem is then my return value is an Array, and I've killed off the live node correspondence within the browser.
2. I could clone just the desired nodes...
...into a document fragment or similar, then re-query the fragment to return just the subset, but then I'd be targeting clones of the DOM nodes in question, not really those themselves, plus it would shake free any event bindings they had once had.
3a. I could insert a DOM node, append the array nodes I wish to keep, then query THAT, then put them back again, and destroy the inserted node...
... which keeps the nodes intact (replete with any events bound to them) and even auto-remaps their XPaths to their original DOM locations...
3b. ...or even use the unique attributes of each node to attempt to dynamically construct a selector that matched only those I wished to keep...
...but because I don't know the Symbolic Type of iterator the collection is prior to receiving it, I would need a massive collection of such workarounds to conditionally handle each of the possibilities (and that's entirely notwithstanding the fact it'd be a bicycle-tire-with-a-banana solution that I'D never let through code review, either).
So I guess my question is:
Is there a way to modify, selectively clone, or even iteratively construct such an output? I don't need to perform any operation other than REMOVING one or more of the offending nodes; no edits or inserts.
A for...of statement will let one iterate them without conversion to the Array prototype, although I'm not aware of a way to call the Iterator constructors on the fly, even if I could figure out a way to move one of the live nodes from one collection to the other (the spread operator coerces the collection into an array).
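For illustration, here is roughly the lazy-wrapper idea I keep circling (liveFilter is a name I made up, and this demonstrates exactly my problem: the wrapper stays "live" by re-reading the source on each iteration, but its type is a plain object, not a NodeList or HTMLCollection):

```javascript
// Hypothetical sketch: wrap any iterable in a lazily-filtering iterable.
// Because the generator re-reads `source` on every iteration, a live
// source collection keeps the view current -- but the wrapper's type
// no longer matches the input's type.
function liveFilter(source, keep) {
  return {
    *[Symbol.iterator]() {
      for (const item of source) {
        if (keep(item)) yield item;
      }
    }
  };
}

// Stand-in for the live collection (an array suffices for the demo):
const fakeNodes = [
  { tag: "p", cls: "" },
  { tag: "p", cls: "body" },
  { tag: "p", cls: "body" },
  { tag: "p", cls: "" },
  { tag: "p", cls: "" }
];

const view = liveFilter(fakeNodes, n => n.cls !== "body");
const kept = [...view];
console.log(kept.length); // 3
```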
Anyone know any good voodoo for this one? Or am I just SOL here?
I have a set of documents, each annotated with a set of tags, which may contain spaces. The user supplies a set of possibly misspelled tags, and I want to find the documents with the highest number of matching tags (optionally weighted).
There are several thousand documents and tags but at most 100 tags per document.
I am looking for a lightweight and performant solution where the search runs fully on the client side in JavaScript, but some preprocessing of the index with node.js is possible.
My idea is to create an inverted index from tags to documents using a multiset, plus a fuzzy index that finds the correct spelling of a misspelled tag; both are created in a preprocessing step in node.js and serialized as JSON files. In the search step, for each item of the query set I want to first consult the fuzzy index to get the most likely correct tag and, if one exists, consult the inverted index and add the result set to a bag (a set with counts). After doing this for all input tags, the contents of the bag, sorted by count in descending order, should give the best matching documents.
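The approach above can be sketched in a few dozen lines, with a naive Levenshtein distance standing in for whatever fuzzy index the preprocessing step would actually build (all function names here are mine, not from any library):

```javascript
// Naive edit distance (fine for short tags; a real fuzzy index would
// precompute something smarter, e.g. a trie or n-gram index).
function editDistance(a, b) {
  const d = Array.from({ length: a.length + 1 },
    (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 1; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,
        d[i][j - 1] + 1,
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)
      );
    }
  }
  return d[a.length][b.length];
}

// Preprocessing: inverted index, tag -> document ids.
const docs = { 1: ["foo", "bar baz"], 2: ["foo", "qux"] };
const inverted = {};
for (const [id, tags] of Object.entries(docs)) {
  for (const tag of tags) (inverted[tag] = inverted[tag] || []).push(id);
}

// Search: fuzzy-correct each query tag, then count hits per document.
function search(queryTags, maxDist = 1) {
  const bag = {}; // document id -> number of matching tags
  for (const q of queryTags) {
    let best = null, bestDist = maxDist + 1;
    for (const tag of Object.keys(inverted)) {
      const dist = editDistance(q, tag);
      if (dist < bestDist) { best = tag; bestDist = dist; }
    }
    if (best) for (const id of inverted[best]) bag[id] = (bag[id] || 0) + 1;
  }
  return Object.entries(bag).sort((a, b) => b[1] - a[1]);
}

console.log(search(["fob", "bar baz"])); // doc 1 ranks first with 2 hits
```

Note that tags with spaces ("bar baz") work unchanged, since tags are index keys rather than tokenized text.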
My Questions
This seems like a common problem, is there already an implementation for it that I can reuse? I looked at lunr.js and fuse.js but they seem to have a different focus.
Is this a sensible approach to the problem? Do you see any obvious improvements?
Is it better to keep the fuzzy step separate from the inverted index or is there a way to combine them?
You should be able to achieve what you want using Lunr; here is a simplified example (and a jsfiddle):
var documents = [{
  id: 1, tags: ["foo", "bar"]
}, {
  id: 2, tags: ["hurp", "durp"]
}]

var idx = lunr(function (builder) {
  builder.ref('id')
  builder.field('tags')

  documents.forEach(function (doc) {
    builder.add(doc)
  })
})

console.log(idx.search("fob~1"))
console.log(idx.search("hurd~2"))
This takes advantage of a couple of features in Lunr:
If a document field is an array, then Lunr assumes the elements are already tokenised. This allows you to index tags that include spaces as-is, i.e. "foo bar" would be treated as a single tag (if that is what you wanted; it wasn't clear from the question)
Fuzzy search is supported, here using the query string format. The number after the tilde is the maximum edit distance; there is some more documentation that goes into the details.
The results will be sorted by which document best matches the query, in simple terms, documents that contain more matching tags will rank higher.
Is it better to keep the fuzzy step separate from the inverted index or is there a way to combine them?
As ever, it depends. Lunr maintains two data structures, an inverted index and a graph. The graph is used for doing the wildcard and fuzzy matching. It keeps separate data structures to facilitate storing extra information about a term in the inverted index that is unrelated to matching.
Depending on your use case, it would be possible to combine the two; an interesting approach would be a finite state transducer, so long as the data you want to store is simple, e.g. an integer (think document id). There is an excellent article about this data structure, which is similar to what is used in Lunr: http://blog.burntsushi.net/transducers/
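As a toy illustration of storing a term together with a payload in one structure, here is a minimal trie mapping tags to a document id. This is only a sketch of the idea, not an FST (an FST additionally shares suffixes and can emit values along edges):

```javascript
// Minimal trie mapping terms to a single integer payload (a document id).
function Trie() { this.root = {}; }

Trie.prototype.insert = function (term, id) {
  let node = this.root;
  for (const ch of term) node = node[ch] = node[ch] || {};
  node.$ = id; // terminal marker carrying the payload ("$" is arbitrary)
};

Trie.prototype.lookup = function (term) {
  let node = this.root;
  for (const ch of term) {
    node = node[ch];
    if (!node) return undefined;
  }
  return node.$;
};

const trie = new Trie();
trie.insert("hurp", 2);
trie.insert("hurl", 3);
console.log(trie.lookup("hurp")); // 2
console.log(trie.lookup("hur"));  // undefined (prefix only, not a term)
```

The same walk that matches a term also yields its stored value, which is the combination the question asks about.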
Backbone provides options to select models from collections both by ID (a unique identifier attribute assigned to every model) and by index. Which of these is the fastest way to access items from a collection?
Cracking open Backbone.js, I can see that collection.get(id) (the select-by-ID function) uses a simple object-literal look-up and collection.at(index) (the select-by-index function) uses a simple array look-up.
from Backbone.js:
collection.get(id):
// Get a model from the set by id.
get: function(obj) {
  if (obj == null) return void 0;
  return this._byId[obj] || this._byId[obj.id] || this._byId[obj.cid];
}
collection.at(index):
// Get the model at the given index.
at: function(index) {
  return this.models[index];
}
Because of this, the answer to this question comes down to which is faster: array access or object-literal access (assuming here that .get is passed a plain ID, not a model with an id or cid on it).
According to this JSPerf, select by index (using collection.at(index)) is generally faster than select by ID (using collection.get(id)) but by how much varies widely by browser. On Chrome and at least one of the versions of Firefox I tested, the difference is negligible, but still systematically in favor of select by index; in IE11, however, select by index is consistently (and almost exactly) twice as fast.
The moral of the story here is to use select by index whenever possible; hashed object retrieval is fast and convenient, but lacks the raw efficiency of indexed look-ups.
To access objects from a hash, JavaScript engines must go through an additional look-up step; this, combined with the overall complexity of objects, makes them a less-than-ideal choice for any script where performance is a consideration.
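To make the two access patterns concrete, here is a stripped-down mock of what a collection maintains internally (property names borrowed from the Backbone excerpt above; timings omitted, since they vary by engine as the JSPerf results show):

```javascript
// A collection keeps both an ordered array and an id -> model hash.
var models = [
  { id: "a1", cid: "c1" },
  { id: "a2", cid: "c2" },
  { id: "a3", cid: "c3" }
];
var _byId = {};
models.forEach(function (m) { _byId[m.id] = m; _byId[m.cid] = m; });

// Select by index: a direct array look-up.
var byIndex = models[1];

// Select by id: a hash look-up (one extra hashing step under the hood).
var byId = _byId["a2"];

console.log(byIndex === byId); // true: same model, different look-up path
```

Both paths return the same object; the difference is purely in how the engine resolves the key.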
I found the following tutorial on creating a selector engine:
http://blog.insicdesigns.com/2010/04/creating-your-own-selector-engine/
In JavaScript we have functions like
getElementById()
getElementsByTagName()
getElementsByName()
etc. But for the same functionality, in their selector engine, they do checks like
this.nodes[i].tagName == nm.toUpperCase()
instead of getElementsByTagName. What is the advantage of this approach?
Also, what is the use of assigning all the nodes to a variable with
e.getElementsByTagName('*');
There is an inconsistency when you read the tagName property of elements: some browsers return uppercase and others lowercase. To normalize the value, you have to convert it to one or the other before doing any further comparison.
As for e.getElementsByTagName('*');, I recently answered a question where the OP wanted to find ALL elements carrying attribute names prefixed with mce_. The only way to get such elements is to get all elements in the DOM and inspect their attribute names.
There is also a good application of getElementsByTagName('*'): determining the direct children of an element. For instance, in a very deep DOM, if I were to find certain parent elements based on an attribute and then get their children, normally you would do a recursive search from body downwards to find the parents. That takes a lot of recursive operations, and afterwards you still have to determine their children.
Another way is to get all tags, check each one's parent node, and if the parent has the attribute, the node is a direct child. This method requires no recursion, only getElementsByTagName('*') and a single loop through the returned nodeList.
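That non-recursive check can be sketched with plain objects carrying parentNode pointers in place of real DOM nodes (the flat array below simulates what getElementsByTagName('*') would return; the mce_target attribute name is made up for the demo):

```javascript
// Mock nodes: parentNode pointers plus an optional marker attribute.
var parent = { name: "parent", attr: "mce_target", parentNode: null };
var child1 = { name: "child1", parentNode: parent };
var child2 = { name: "child2", parentNode: parent };
var grandchild = { name: "grandchild", parentNode: child1 };
var other = { name: "other", parentNode: null };

// What getElementsByTagName('*') would return: every node, flat.
var allNodes = [parent, child1, child2, grandchild, other];

// Single loop, no recursion: keep nodes whose parent carries the attribute.
var directChildren = allNodes.filter(function (node) {
  return node.parentNode && node.parentNode.attr === "mce_target";
});

console.log(directChildren.map(function (n) { return n.name; }));
// ["child1", "child2"]: grandchild is excluded, and no tree walk happened
```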
this.nodes[i].tagName == nm.toUpperCase() is part of a method that filters the list of nodes by tag name, so it has nothing to do with 'getting elements by their tag name'
The last point is not a real question. You want to know reasons for "why would you select all nodes"? Well, you are writing a selector engine...
The following line
this.nodes[i].tagName == nm.toUpperCase()
is inside the ofTag function. It filters a set of nodes, looking for those with the given tag name.
The next line
e.getElementsByTagName('*');
retrieves all the children/descendants under a node so you can later filter them, like the following:
new DOMNodes(Selector.getAll(document)).ofTag('p').hasClass('note');
I want to save an element's ID in a database column.
I also need to save the complete DOM hierarchy of that element in the same column. Later I will use this column's information to get the element's value by parsing and traversing with JavaScript.
I am uncertain about which pattern to use to save this information.
e.g:
I am thinking about following patterns:
elementLocation = "iframe[iframeName],iframe[iframeName2],element[elementID]"
elementLocation= "frame[frameName],frame[frameName2],element[elementID]"
elementLocation= "i[iframeName],i[iframeName2],e[elementID]"
elementLocation= "f[iframeName],f[iframeName2],e[elementID]"
Please suggest a better pattern that can be used to represent any kind of hierarchy.
You can use the XML Path Language (XPath), which is designed for this purpose.
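For example, the iframe/element hierarchy from the question could be stored as a list of XPath expressions, one per frame level, since a single XPath expression cannot cross frame boundaries. A rough sketch of serializing it (toXPaths is a name I made up, and the single-column encoding is only a suggestion):

```javascript
// Serialize a hierarchy of named iframes plus a final element id into
// one XPath expression per frame level; each segment would be evaluated
// inside the document of the frame matched by the previous segment.
function toXPaths(frameNames, elementId) {
  var paths = frameNames.map(function (name) {
    return "//iframe[@name='" + name + "']";
  });
  paths.push("//*[@id='" + elementId + "']");
  return paths;
}

var stored = toXPaths(["iframeName", "iframeName2"], "elementID");
console.log(stored.join(","));
// //iframe[@name='iframeName'],//iframe[@name='iframeName2'],//*[@id='elementID']
```

The comma-joined form fits your single-column constraint and can be split back apart before evaluating each segment with document.evaluate.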