I have made a framework that generates an HTML "DOM" tree on the server, as a tree of Python objects, and then serializes it to a string to be sent to the client. It does this with a recursive depth-first traversal of the tree: a div, for example, emits its opening "div" tag, then the HTML of all its children, then its closing "/div" tag.
This tree is broken down into conceptual components, as shown below:
graph http://lhy.mit.edu/media/Flow_Chart.png
This only shows the first two levels of the hierarchy; the actual site has many more. For example, each comment in the comment bar is a self-contained component, and so is each button on the menu bar. As you can see, the various components do not need to be at the same depth in the tree. What constitutes a "component" is decided by me.
What I want is the complete html string for each component (everything from the root node of that component downwards), as well as the partial HTML string for every component (The HTML of that component, minus the HTML of its children). The partial HTML of main section, for example, would be the html, head and two div tags only. The complete html of main section, on the other hand, would be every node on the page.
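To make the two definitions concrete, here is a minimal Python sketch of how both strings could fall out of a single depth-first pass. The Node class, its component field, and the render function are hypothetical stand-ins for the framework's own objects, and text content and attributes are left out for brevity. Each node returns its complete HTML plus the fragment it contributes to the enclosing component's partial HTML; a component root contributes nothing to its parent's partial.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node; `component` names the component rooted here."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    component: Optional[str] = None


def render(node: Node, out: Dict[str, Dict[str, str]]) -> Tuple[str, str]:
    """Return (complete HTML, contribution to the enclosing partial HTML)."""
    complete_parts = []
    partial_parts = []
    for child in node.children:
        child_complete, child_partial = render(child, out)
        complete_parts.append(child_complete)
        partial_parts.append(child_partial)

    open_tag, close_tag = f"<{node.tag}>", f"</{node.tag}>"
    complete = open_tag + "".join(complete_parts) + close_tag
    partial = open_tag + "".join(partial_parts) + close_tag

    if node.component is not None:
        # Component boundary: record both strings, hide them from the parent.
        out[node.component] = {"complete": complete, "partial": partial}
        return complete, ""
    return complete, partial


# A tiny page: "main" owns html, head and two divs; the divs hold sub-components.
page = Node("html", component="main", children=[
    Node("head"),
    Node("div", children=[Node("ul", component="menu", children=[Node("li")])]),
    Node("div", children=[Node("p", component="comments")]),
])
results: Dict[str, Dict[str, str]] = {}
render(page, results)
print(results["main"]["partial"])
```

In this sketch the partial HTML simply omits sub-components; a real implementation would probably leave a placeholder at each boundary so the pieces can be reassembled.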
How would I do this? I could just find the complete HTML string of every component and sub-component, mark the boundaries of each sub-component with some string and do regex removals to find the partial HTML string for every component, but that feels clunky and inefficient.
I could do an iterative-deepening DFS, halting at the boundary between a component and its sub-components until every node in that component has been explored. I would then have the partial HTML for every component, but I would need similarly hacky regex inserts to build up the complete HTML for every component afterwards.
I could do both, but that would take two passes and would be expensive, though maybe not as expensive as the above Regex gymnastics.
I could do a priority-queue Dijkstra's, having each component be strictly higher priority than its children. It would traverse the tree in the correct order, finishing each component before moving on to its children, but I have no idea how I would get the final well-formed HTML string out of it.
The purpose of all this is so the server can intelligently and completely autonomously determine the minimal set of components on the client's page that need to change on a page-transition between two arbitrary pages.
If I create a new page on my site, I should need zero extra lines of code to have it ajax smoothly with any existing page.
But first I need to get my graph-traversing, HTML-spewing algorithms in order. Any ideas?
I am presuming your client is JavaScript code, as you didn't specify anything.
Don't do anything too complicated. In particular, for the love of god don't try using regexes to work with HTML.
Is your server sending you a fully functional HTML string? In that case, you can convert it into an actual DOM you can work with (there are many ways to do so), then use the .innerHTML of an element to get your "complete HTML" strings and .tagName to get a tag's name.
I still don't really get why you need all this complication. If you already went through the trouble of downloading the whole "new page", there isn't much of a reason to try to change as few parts as possible: just replace everything and forget about it (the calls to the server should be the most expensive thing anyway).
If you really want to use less brute force, then you should find a way to request, or be notified of, only the interesting changes without having to look at everything. Then, given the part that is to be changed and its new HTML, you just need to do something like
document.getElementById('mainCommentArea').innerHTML = newHTML;
Related
I am trying to add 'sticky note' annotations (which I call TourPoints) to a React-based prototype I am creating. I created a TourPoint component which I have been manually 'wrapping' around elements of my interface as I go. The TourPoint displays the 'content' as a pink tag on the side of the element.
<TourPoint content="Sticky note content goes in this prop">
<button id="elementToWrap">Button element to annotate</button>
</TourPoint>
However, this gets a little messy and bloats my code... With jQuery, I used to be able to write a script where I could keep something like TourPoints neatly in a separate javascript file, then simply target DOM ids or classes to append elements.
$("#elementToWrap").wrap( "<div class='tourpoint'>Sticky note content goes here</div>" );
// $.append() or $.insertBefore() were also useful functions for this kind of thing
I am wondering how I might do a similar thing in React, and thought refs { useRef } might come to the rescue - but have not used refs before and can't quite get my head around if this is the right approach, or whether I am barking up the wrong tree with this.
The idea would be to reference a ref globally (?) so that I can simply append the TourPoint to the element from a separate js/jsx file (sorry, no code example, as I really don't know what this would look like...)
The ease of having my TourPoints managed from a central file for the application is what I am trying to achieve. The application has multiple pages and uses React-Router.
Any pointers on how to think about this problem in the 'React' way would be most welcome.
I wanted to know if there is a way to filter the innerHTML of a DOM element so that it contains just the actual HTML and discards all the comment nodes?
I'm working with AngularJS and writing some tests with Selenium, and Angular litters the rendered HTML with a lot of comments such as:
<!-- ngSwitchWhen: join -->
<div data-ng-switch-when="leave">
<!-- ngIf: isNow -->
.
.
.
</div>
This is what I'm currently trying for matching the result (@client is the WebDriver instance):
@client.findElement(By.xpath("//*[@id='log']/li")).getAttribute('innerHTML').then (innerHtml) ->
  html = innerHtml.trim()
  expect(html).to.equal """
    <div class="image"><i class="icon-refresh"></i></div>
    <div class="fade-6 content">Getting more activities...</div>
  """
This creates a big problem when I'm trying to test the returned DOM's structure with Mocha. What do I test for? I can't possibly repeat all the useless comments in my expected value, that would be immensely wasteful.
Is there a better way?
Writing tests that rely on innerHTML is not a good idea at all.
When you fetch innerHTML, the browser serialises the information in the DOM into a new markup string which is not necessarily the same as the markup that was originally parsed to make the DOM.
Markup details such as:
what order attributes are in
what case tags are
what whitespace there is in tags
what quotes are used to delimit attribute values
what content characters are encoded as entity or character references
are not stored in the DOM information set so are not preserved. Different browsers can and will produce different output. In some cases IE even returns invalid markup, or markup that does not round-trip back to the same information set when parsed.
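As a rough illustration of the point, here is a small Python sketch that uses the standard library's html.parser as a stand-in for a browser's parser (an assumption made for the sake of a runnable example; real browsers differ in detail). The StructureCollector class and the structure helper are hypothetical names. The two markup strings differ in tag case, attribute order, quoting, whitespace inside the tag, and character references, yet they parse to the same information.

```python
from html.parser import HTMLParser


class StructureCollector(HTMLParser):
    """Records tags, attributes and text, ignoring how they were written."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True
        self.events = []

    def handle_starttag(self, tag, attrs):
        # dict() discards attribute order; the parser lowercases names.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))


def structure(markup):
    """Parse markup and return the list of recorded structural events."""
    parser = StructureCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


markup_a = '<DIV class="image" id=\'x\'>A &amp; B</DIV>'
markup_b = '<div id="x"  class=\'image\' >A &#38; B</div>'

print(markup_a == markup_b)                        # string comparison
print(structure(markup_a) == structure(markup_b))  # structural comparison
```

A test that compares serialized strings would treat these two as different, while a test that compares structure would not.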
+1 katspaugh's answer demonstrates ways to get the information out of the DOM rather than relying on innerHTML, which avoids this problem.
However, more generally, it is usually a bad idea to write tests that depend strongly on the exact markup your application uses. This is too-tight coupling between the requirements in the test and the implementation details. And any little change you make to the markup for even a trivial stylistic reason or textual update means you have to update all your tests to match. Tests are a useful tool to catch things that you didn't mean to break; tests that always break on every change are giving you no feedback on whether you broke something so are non-useful.
Whilst there's no magic bullet to separate tests completely from application markup, you should generally reduce the test to the minimum that satisfies the user's requirement, and add signalling to catch those cases. I don't know what exactly your app is doing, but I would guess the requirement is something like: "When the user clicks the 'more' button, a busy-spinner should appear to let them know the information is being fetched".
To test this you might do a check like "does the element with id 'log' contain an element with class 'icon-refresh'?". If you wanted to be more specific that it's a spinner to do with fetching activities, you could add a class like "refresh-activities" to the "Getting more activities..." div, and detect the element with that class instead of relying on text which is likely to change (especially if you ever translate your app).
Comment nodes are DOM nodes, as you know. You can iterate over all nodes and filter comments out by their node type:
recursivelyIterate(container, function (subNode) {
    if (subNode.nodeType == Node.COMMENT_NODE) {
        subNode.parentNode.removeChild(subNode);
    }
});
(I haven't included the code for recursivelyIterate function, but it should be trivial to write one.)
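For reference, the same idea can be sketched in Python with the standard library's xml.dom.minidom, which exposes the same nodeType, parentNode and removeChild names as the browser DOM. This is only an analogue under assumptions: minidom needs well-formed XML, and recursively_iterate and strip_comments are hypothetical helper names. The one subtlety is to iterate over a snapshot of the children, because the callback removes nodes during the walk.

```python
from xml.dom import Node, minidom


def recursively_iterate(node, visit):
    """Call visit() on every descendant of node, children before parents."""
    # Copy childNodes first: visit() may remove nodes while we iterate.
    for child in list(node.childNodes):
        recursively_iterate(child, visit)
        visit(child)


def strip_comments(markup):
    """Return the markup's root element serialized without comment nodes."""
    doc = minidom.parseString(markup)

    def drop_if_comment(sub_node):
        if sub_node.nodeType == Node.COMMENT_NODE:
            sub_node.parentNode.removeChild(sub_node)

    recursively_iterate(doc, drop_if_comment)
    return doc.documentElement.toxml()


sample = (
    '<li><!-- ngSwitchWhen: join -->'
    '<div data-ng-switch-when="leave"><!-- ngIf: isNow -->'
    '<span>hi</span></div></li>'
)
print(strip_comments(sample))
```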
Alternatively, leave the comments be and work with DOM elements rather than DOM nodes: getElementsByTagName, querySelectorAll and friends.
I have a bug I'm trying to track down, and it is very difficult to do so because of the complexity of the web app. There are many frames, and many instances of Javascript code that is embedded into the HTML in different ways.
The thing that needs to be fixed is a sub-page created with showModalDialog (so you already know it's going to be a disaster), and I am hoping that I can find a way to serialize as much of the DOM as possible within this dialog page context, so that I may open it to the same content both when the bug is present and when it is not, in hopes of detecting missing/extra/different Javascript, which would become apparent by pumping the result through a diff.
I tried jQuery(document).children().html(). This gets a little bit of the way there (it's able to serialize one of the outer <script> tags!) but does not include the contents of the iframe (most of the page content is about 3 iframe/frame levels deep).
I do have a custom script which I'm very glad I made, as it's able to walk down into the frame hierarchy recursively, so I imagine I can use .html() in conjunction with that to obtain my "serialization", which I can then check manually against what the web inspector tells me.
Perhaps there exists some flag I can give to html() to get it to recurse into the iframes/frames?
The real question, though, is about how to get a dump of all the JS code that is loaded in this particular page context. Because of the significant server-side component of this situation, javascript resources can be entirely dynamic and therefore should also be checked for differences. How would I go about (in JS on the client) extracting the raw contents of a <script src='path'> tag to place into the serialization? I can work around this by manually intercepting these resources but it would be nice if everything can go into one thing for use with the diff.
Is there no way to do this other than by separately re-requesting those JS resources (not from script tags) with ajax?
I've had this happen to me three times now and I feel it's time I learned how to avoid this scenario.
Typically, I build the HTML. Once I'm content with the structure and visual design, I start using jQuery to wire up events and other things.
Thing is, sometimes the client wants a small change or even a medium change that requires me to change the HTML, and this causes my javascript code to break because it depends on HTML selectors that no longer exist.
How can I avoid digging myself into this hole every time I create a website? Any articles I should read?
Make your selectors less brittle.
Don't use a selector by index, next sibling, immediate child, or the like
Use classes so even if you have to change the tag name and the element's position in the HTML, the selector will still work
Don't use parent() or children() without specifying a selector. Make sure you look for a parent or child with a specific class
Sometimes, depending on the amount of rework, you'll have to update the script. Keep them as decoupled as possible, but there's always some coupling, it's the interface between script and HTML. It's like being able to change an implementation without having to change the interface. Sometimes you need new behavior that needs a new interface.
I think the best way to help you is for you to show a small sample of a change in the HTML that required a change to your jQuery code. We could then show you how to minimize changes to JS as you update the HTML
I'm trying to make an AJAXy submission and have the resulting partial be inserted into my list at the proper place. I can think of a few options, but none is terribly good:
Option 1: Return JSON, do rendering in Javascript. That seems like the wrong place to render this, especially since the list itself is rendered in my application server. It has the benefit, though, of making it easy to access the value to be sorted (response.full_name).
Option 2: Return an HTML fragment, parse the sort value out. Parsing HTML in Javascript is probably worse than rendering it.
Option 3: Return an HTML fragment that also contains a <script> section that gets evaluated. This could add the DOM node to a master list and then make a JS call to insert itself at the right point. The downside here is that IE doesn't evaluate <script> tags when innerHTML or appendChild are called.
Personally I would do #1. Nothing is wrong with combining the server-side generated HTML with the client-side generated one, but if it is a complicated procedure it is better to keep it in one place (on the server in your case). So you may want to return (as JSON) two values: the sort value, and the HTML snippet.
After that it is simple: find the position, instantiate the snippet (e.g., using dojo.html.set()), and place it with dojo.place(). Or instantiate it directly in-place.
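The two-value response and the "find the position" step can be sketched independently of Dojo. The following Python is only an illustration of the logic under assumptions: build_response, insert_position and the field names sort_value and html are hypothetical, the list is assumed to be already sorted by the same key, and the comparison is plain case-sensitive string ordering. On the client the same search would run in JavaScript before calling dojo.place().

```python
import json
from bisect import bisect_left


def build_response(full_name, html):
    """Server side: return the sort value and the rendered snippet together."""
    return json.dumps({"sort_value": full_name, "html": html})


def insert_position(existing_sort_values, new_value):
    """Index at which new_value keeps an already sorted list sorted."""
    return bisect_left(existing_sort_values, new_value)


# Sort values of the items already in the list, in display order.
names = ["Adams, Ann", "Clark, Bo", "Young, Di"]

payload = json.loads(build_response("Baker, Cy", "<li>Baker, Cy</li>"))
position = insert_position(names, payload["sort_value"])
print(position, payload["html"])
```

Keeping the sort value out of the HTML means the client never has to parse the snippet to decide where it goes.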