Too often I find myself building selectors with string manipulation (split, search, replace, concat, +, join).
Good or bad?
What's wrong with that? What are the alternatives? Just hardcoding them as single strings? Your site probably has conventions for how the layout is organized, so if you define the selector components in one place and use them to build selectors, that sounds like less hassle than going through all the code and doing a search-and-replace everywhere a selector shows up.
I'd say it's good assuming you have the strings otherwise organized (defined in one place, used in several places).
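For example, a minimal sketch of what "defined in one place" might look like (the names here are made up for illustration):

// Hypothetical central registry of selector components.
var SEL = {
    product: 'div.product',
    price: 'span.price'
};

// Composed where needed; a markup change means editing SEL once.
var priceSelector = SEL.product + ' ' + SEL.price;
var prices = document.querySelectorAll(priceSelector);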
This is somewhat unrelated to your question, but:
One thing I would recommend is to be cautious with descendant-based or child selectors (e.g.: 'div.product > span.price'). Often, UI parts get reorganized, moved around, or wrapped in something else. When that happens, descendant-based selectors break.
Another thing to keep in mind is that attribute-based selectors (e.g.: 'input[value="Login"]') are often fragile when dealing with localized content (if attribute values are localized).
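For instance, if that price span later gets wrapped in an extra element, the child selector stops matching while a direct class hook keeps working (a sketch):

// Brittle: breaks the moment span.price is wrapped in another element.
var brittle = document.querySelectorAll('div.product > span.price');

// Sturdier: matches regardless of intermediate wrappers.
var sturdy = document.querySelectorAll('.price');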
I'm interested in finding a more-sophisticated-than-typical algorithm for finding differences between strings, one that can be "tuned" via parameters to balance such things as "maximize the count of identical characters" vs. "maximize the length of spans" vs. "try to keep whole words intact".
Ultimately, I want to be able to make the results as human-readable as possible. For instance, if a long sentence has been replaced with an entirely new sentence, where the only things it has in common with the original are the words "the", "and", and "a" in that order, I might want it treated as if the whole sentence changed, rather than as just 4 particular changed spans, which is how a reasonable person would see it.
Does such a thing exist? Although I'm working in JavaScript/Node.js, an algorithm in any language would be helpful.
I'm actually ok with something that uses Monte Carlo methods or the like, if its results are better. Computation time is not an issue (within reason), nor is determinism.
Note: although this is beyond the scope of what I'm asking, I'll throw one more thing out there just in case: it would also be great if it could recognize changes that are out of order. For instance, if someone changes the order of two paragraphs while leaving them otherwise identical, it would be awesome if it recognized that as a simple move, rather than as one deletion and one unrelated addition.
I've had good luck with diff_match_patch. There are some good options for tuning it for readability.
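For example, here is a minimal sketch of those tuning knobs (assuming the npm diff-match-patch package, which mirrors Google's original API):

var DiffMatchPatch = require('diff-match-patch');
var dmp = new DiffMatchPatch();

var oldText = 'The quick brown fox jumps over a lazy dog.';
var newText = 'The slow red hen walks under a sleepy cat.';

var diffs = dmp.diff_main(oldText, newText);

// diff_cleanupSemantic merges short coincidental matches ("The", "a")
// into the surrounding change, so the result reads as one rewrite
// rather than several tiny edits.
dmp.diff_cleanupSemantic(diffs);

// There is also Diff_EditCost (default 4) paired with
// diff_cleanupEfficiency: raising the cost trades more aggressively
// against keeping short unchanged spans.
console.log(diffs);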
Try http://prettydiff.com/. Its code is already formatted for compatibility with CommonJS, which is the module convention Node uses.
I am making an add-on for Firefox that loads an HTML page using AJAX (the add-on has its own XUL panel).
So far I haven't looked into creating a document object, placing the AJAX response contents into it, and then using XPath to find what I need.
Instead, I am loading the contents and parsing them as text with regular expressions.
But I have a question: which would be better to use, XPath or regular expressions? Which is faster?
The HTML page consists of hundreds of elements containing the same text, and what I basically want to do is count how many of them there are.
I want my add-on to work as fast as possible, and I don't know the mechanics behind regexes or XPath, so I don't know which is more effective.
Hope I was clear. Thanks.
Whenever you are dealing with XML, use XPath (or XSLT, XQuery, SAX, DOM, or any other XML-aware method to go through your data). Never use regular expressions for this task.
Why? XML processing is intricate, and dealing with all its oddities (external/parsed/unparsed entities, DTDs, processing instructions, whitespace handling and collapsing, Unicode normalization, CDATA sections, etc.) makes it very hard to create a reliable regex-based way of getting at your data. Just consider that it has taken the industry years to learn how to parse XML well; that should be reason enough not to try to do it yourself.
Answering your question: when it comes to speed (which should not be your primary concern here), it depends heavily on the implementation of the XPath or regex compiler/processor. Sometimes XPath will be faster (e.g., when using keys, if possible, or compiled XSLT); other times regexes will be faster (if you can use a precompiled regex and your query is easy). But regexes are never easy with HTML/XML, simply because of the matching-nested-parentheses (tags) problem, which cannot be reliably solved with regexes alone.
If the input is huge, regexes will tend to be faster, unless the XPath implementation can do streaming processing (which I believe is not the case inside Firefox).
You wrote:
"which is more effective"*
the one that brings you quickest to a reliable and stable implementation that's comparatively speedy. Use XPath. It's what's used inside Firefox and other browsers as well if you need your code to run from a browser.
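For this particular task (counting elements with a given text), XPath's count() can do the whole job in one evaluated expression. A minimal sketch, assuming the AJAX response can be parsed into a document and that the elements are spans (both assumptions for illustration):

// 'html' holds the AJAX response text. DOMParser is one way to get a
// document out of it; xhr.responseXML is another if it is well-formed.
var doc = new DOMParser().parseFromString(html, 'text/html');

// count() is computed by the XPath engine itself; no node list needs
// to be walked manually, so this stays cheap for hundreds of matches.
var result = doc.evaluate(
    "count(//span[contains(text(), 'some text')])",
    doc, null, XPathResult.NUMBER_TYPE, null);

console.log(result.numberValue);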
Implementing a large JavaScript application with a lot of scripts, it's become necessary to put together a build script. JavaScript labels being ubiquitous, I've decided to use them as annotations for a custom script collator. So far, I'm just employing the use statement, like this:
use: com.example.Class;
However, I want to support an 'optional quotes' syntax, so the following would be parsed correctly as well
use: 'com.example.Class';
I'm currently using this pattern to parse the first form:
/\s*use:\s*(\S+);\s*/g
The '\S+' gloms up all characters between the annotation name declaration and the terminating semicolon. What rule can I write to substitute for \S+ that will return an annotation value without quotes, whether or not it was quoted to begin with? I can do it in two steps, but I want to do it in one.
Thanks. I know I've put this a little awkwardly.
Edit 1.
I've been able to use this, but IMHO it's a mess. Any more elegant solutions? (By the way, this one will parse ALL label names.)
/\s*([a-z]+):\s*(?:['])([a-zA-Z0-9_.]+)(?:['])|([a-zA-Z0-9_.]+);/g
Edit 2.
The logic is the same, but expressed a little more succinctly. However, it poses a problem, as it seems to pull in all sorts of JavaScript code as well.
/\s*([a-z]+):\s*'([\w_\.]+)'|([\w_\.]+);/g
OK, this seemed to do it. Hope someone can improve on it.
/\s*([a-z]+): *('[\w_\/\.]+'|[\w_\/\.]+);/g
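One possible refinement (a sketch, not tested against your whole codebase): capture the optional quote and reuse it with a backreference, so the quotes must balance and stay out of the captured value, and anchor the match at the start of a line with the m flag so ordinary labeled statements deeper in the JavaScript are less likely to match:

// ^ with the m flag: only lines beginning with a label are considered.
// \2 requires the closing quote to match the opening one (or both to be
// absent), and the quotes never end up inside the captured value.
var annotationRe = /^[ \t]*([a-z]+):[ \t]*(['"]?)([\w\/.]+)\2;/gm;

// 'source' holds the script text being collated.
var m;
while ((m = annotationRe.exec(source)) !== null) {
    console.log(m[1], m[3]); // e.g. "use com.example.Class"
}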
I am using the jQuery.extend filter for filtering items. When I use * for filtering, there is a problem, because the filter is case-sensitive.
*People and *people are two different strings.
People and people filter the same items, but only when matching from the start of the string.
How do I disable case sensitivity for strings that start with *?
You should standardize on casing in your application; I prefer using camelCase. IE treats "People" and "people" as identical. Therefore, if you use those as ids then you may not even get the element you wanted. Also, when those strings are used for class names, the second will overwrite the first in IE. Other browsers follow the standard, and case matters. For that reason, it should also matter in jQuery.
This answer makes a lot of assumptions about your question; if you edit it, I can help you better.
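If the filter you're using boils down to jQuery's :contains-style text matching, one common workaround is to register a case-insensitive variant. A sketch (':icontains' is a made-up name, not built into jQuery, and this relies on older jQuery versions passing the selector argument as match[3]):

// Hypothetical case-insensitive variant of :contains.
jQuery.expr[':'].icontains = function (elem, i, match) {
    return jQuery(elem).text().toLowerCase()
        .indexOf(match[3].toLowerCase()) !== -1;
};

// Matches "People", "people", "PEOPLE", ...
jQuery('li:icontains(people)').show();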
I know that accessing and manipulating the DOM can be very costly, so I want to do this as efficiently as possible. My situation is that a certain div will always contain a list of items, but sometimes I want to refresh that list with a completely different set of items. In this case, I can build the new list and append it to the div, but I also need to clear out the old list. What's the best way? Set the innerHTML to the empty string? Iterate over the child nodes and call removeChild? Something else?
Have a look at QuirksMode. It may take some digging, but there are timings for exactly this operation in various browsers. Although the tests were done more than a year ago, setting innerHTML to "" was the fastest in most browsers.
P.S. Here is the page.
Set innerHTML to an empty string.
Emphasis on the "can be" costly. It is not always costly; in fact, when dealing with a small number of elements it can be trivial. Remember, optimizations should always be done last, and only after you have demonstrable evidence that the specific aspect you want to improve really is your performance bottleneck.
I recommend avoiding mucking with innerHTML if possible; it's easy to mess up and do something nasty to the DOM that you can't recover from as gracefully.
This method is quite fast for 99.9% of cases, unless you are removing massive hierarchy chunks from the DOM:
// Remove children from the end; earlier siblings never need re-indexing.
while (ele.lastChild) {
    ele.removeChild(ele.lastChild);
}
I'd recommend testing a couple of implementations in the browsers you care about. Different JavaScript engines have different performance characteristics, and different DOM manipulation methods perform differently across engines. Beware of premature optimization, and beware of differences between browsers.
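If you want numbers for your own pages, a rough micro-benchmark only takes a few lines. A sketch (the 'list' id and the 5000-item fill are made up for illustration; results vary widely by browser and by how many children the container holds):

function fill(el) {
    // Populate the container with a batch of hypothetical list items.
    for (var i = 0; i < 5000; i++) {
        var li = document.createElement('li');
        li.appendChild(document.createTextNode('item ' + i));
        el.appendChild(li);
    }
}

function timeClear(label, clear) {
    var container = document.getElementById('list'); // hypothetical element
    fill(container);
    var start = new Date().getTime();
    clear(container);
    console.log(label + ': ' + (new Date().getTime() - start) + 'ms');
}

timeClear('innerHTML', function (el) { el.innerHTML = ''; });
timeClear('removeChild', function (el) {
    while (el.lastChild) { el.removeChild(el.lastChild); }
});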