How to find the height of an HTML element in node.js? - javascript

I am stuck with quite a tricky issue for a couple of days now. I have an auto-generated HTML page in a variable, as a string, in node.js. I need to find the height of some HTML elements and do HTML manipulations (like tag creation, deletion, appending, CSS attribute setting, etc.).
Obviously I need to build a DOM-like structure of my HTML page first and then proceed.
For the HTML manipulation I have many options like cheerio, node.io, jsdom, etc., but none of these let me find the height of an element in node.
So after wasting quite a lot of time on it, I have decided to look for heavier solutions, something like running a headless browser (phantomjs etc.) from node and reading an element's offsetHeight through plain JavaScript.
Can anyone tell me whether it is possible to reach my objective like this? Which headless browser would be best suited for this task?
If I am going in the wrong direction, can anyone suggest another working solution?
At this point I am ready to try anything.
Thanks in advance!
Note: Using JavaScript on the client side has many problems in my particular case, because the contents of the generated HTML page are meant to be pasted by the client into their own website. Leaving behind running JavaScript that restructures the HTML would make things difficult on their end.

Node's server-side HTML libraries (like cheerio and jsdom) are strictly DOM API emulation libraries. They do not attempt to actually render a document, which is necessary to compute element size and position.
If you really need to calculate the size of an element on the server, you need a headless browser like PhantomJS. It is a full WebKit renderer with a JavaScript API. It is entirely separate from node, so you either need to write a utility script using Phantom's API, or use an npm module that lets you control Phantom from node.
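If you go the utility-script route, a minimal sketch looks something like this (the inline markup and the #target id are placeholders for your generated page; run it with phantomjs height.js):
var page = require('webpage').create();

page.onLoadFinished = function () {
    var height = page.evaluate(function () {
        // This runs inside the rendered page, where layout information exists.
        return document.getElementById('target').offsetHeight;
    });
    console.log('height: ' + height + 'px');
    phantom.exit();
};

// Assigning page.content makes PhantomJS parse and render the string,
// which is what makes offsetHeight meaningful in the first place.
page.content = '<html><body><div id="target" style="width:200px">generated content</div></body></html>';
From node you would either spawn that script as a child process or drive it through one of the npm bridge modules mentioned above.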
After reading the comments under your question, it is pretty clear that you should not be calculating heights on the server. Client-side code is the proper place to do it.

Related

Retrieve website DOM via HTTP request in Go

In Go I use an HTTP request to get a site's HTML, and I see differences in some elements compared with what Inspect in Chrome shows. A search on Google and some reading led me to understand that what I see in Inspect is a stage called the DOM, which takes the raw HTML and runs some JavaScript that adds info and alters elements (go easy on me, I'm new at this ^_^).
Is there a way I can receive the DOM in Go instead of the raw HTML? I know I can use chromedp, but I'm hoping for something more like an HTTP package, because chromedp is a bit heavy on performance.
I would really appreciate any suggestions, thank you.
A simple HTTP request (via Go or anything else) will only ever get the raw HTML. The DOM is a browser-generated interpretation of the raw HTML. Yes, there is even something like the Shadow DOM.
JavaScript is interpreted by the browser's JavaScript engine, which applies changes to the DOM, adds event listeners, and dynamically manipulates said DOM.
This is why you cannot get the DOM state you see in a browser through an HTTP request. The response does not contain all the client-side DOM manipulations done by the browser's JavaScript engine. A request library is not a browser.
To get access to the fully rendered DOM you're accustomed to seeing in the Developer Tools, you're going to need a more involved web-scraping setup, usually involving a headless browser like Puppeteer. However, Puppeteer is written for Node.js; from Go, you may have better luck with chromedp or cdp.
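For comparison, the Puppeteer version of that setup is only a few lines of Node.js (the URL is a placeholder); chromedp follows the same launch, navigate and serialize pattern from Go:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Wait until the page's own scripts have settled before reading the DOM.
  await page.goto('https://example.com', { waitUntil: 'networkidle0' });
  const html = await page.content(); // the DOM as it stands after client-side JS has run
  console.log(html);
  await browser.close();
})();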
DOM stands for "Document Object Model", which is a tree of nodes where each node represents a piece of the underlying document. Nodes may correspond to elements, text, comments, etc. There are many Go-based DOM packages around. One you should look at is:
https://godoc.org/golang.org/x/net/html
It lets you parse HTML and traverse the elements of the document programmatically.

Programmatically reload CSS files with Selenium or JavaScript

I'm currently working on a tiny Sublime Text plugin that involves controlling a Chrome instance with Selenium. One of the features I'm working on is live reloading of styles. The idea is to reload the CSS in the browser (through Selenium) any time you make a change to any of your CSS files.
I can easily reload the browser, but I don't want to do that, because it's rather slow, and maybe you have some input on the page, which would be lost. Ideally I would like to force Chrome to reload all styles without reloading the page. Since I can inject JavaScript code into Chrome with Selenium, an answer using only JavaScript suffices; I can deal with the Selenium part. However, if there is some Selenium-specific way of doing this, even better!
I would rather not depend on jQuery or any other external libraries for this, but if needed, I can live with that also.
For the time being, I don't need compatibility with any other browser than Chrome (>31), but if there is a cross-browser compatible solution, it would be a plus!
EDIT: After reading a few answers to similar questions, I want to add a few constraints:
I'm not writing the HTML, so I cannot change the way stylesheets are added, or the order, nor can I expect them to have a specific id, or anything like that. I can inject any JavaScript I want, but I cannot control how the HTML was generated in the first place.
I don't want to append ?v=random() or anything like that, because I don't have control over the server either, so I don't know whether the server will look at that and do something different. Besides, I would not like to circumvent caching; if the server responds 304 and the browser uses its cached copy, then I'm OK with that.
If possible, I would like the solution to also work for programmatically injected CSS (as far as possible), since the page may be using some CSS AMD loader or other Ajax machinery. Ideally, I would like to tell Chrome "please reapply all CSS" as abstractly as possible, instead of relying on finding all link tags and such (see the sketch after this question), because those solutions always have a gotcha.
Just to clarify, I know these are kind of hard constraints, but I'm developing a plugin for a text editor, so I need to cope with whatever technology/web framework/methodology the developer is using, and I cannot trust too much that the developer will follow certain patterns. Of course, I would like to cope with as many situations as possible, but I'm willing to drop unrealistic constraints if necessary. However, I can force developers to use a specific version of Chrome, at least for development.
As of now, none of the answers I've found on SO so far provide this level of development-agnosticism.
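For reference, the link-tag approach the question alludes to is usually a few lines of injected JavaScript like the sketch below. It re-inserts each stylesheet so Chrome re-requests it under normal HTTP caching, with no ?v=... cache busting; it is also exactly the kind of solution with gotchas the question wants to move beyond (it misses programmatically injected styles, for example):
[].slice.call(document.querySelectorAll('link[rel="stylesheet"]')).forEach(function (link) {
    var clone = link.cloneNode(false);   // same attributes, same href, no cache-buster
    link.parentNode.insertBefore(clone, link.nextSibling);
    clone.onload = function () {
        // Remove the old copy only once the fresh one has been applied,
        // to avoid a flash of unstyled content.
        link.parentNode.removeChild(link);
    };
});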

How do I render an HTML file in JavaScript?

OK, I am using JavaScript server-side, including node.js. Because of performance issues, we have decided to move one page to being rendered server-side, not client-side, so the server returns a stream of HTML, fully rendered, back to the client.
I have seen this question and the related answers, but wondered if this was the best or right approach. In particular, what is the most appropriate way to render a page, and run all of the javascript on it within a js or node.js call?
Ideas that I have looked at:
Call the JavaScript code directly on the page, and invert everything to make it generate the HTML items needed. As this is urgent, I would rather avoid rewriting any more than I have to.
Render a document, with a simple iframe to generate the html. But how do I point to the page in the iframe, as I am server side? Surely this is just adding another level of abstraction to the same problem.
Using the ideas detailed above, but I am wondering whether this is the right route, given some of the problems I have seen encountered with it.
EDIT: Just to clarify - I want to, in effect, load the HTML page in a browser, let it finish rendering, and then capture the entire generated HTML for passing through to the client (saving the time to render on the client).
This is a simple example that does server-side templating (no express): https://github.com/FissionCat/handlebars-node-server-example
This is an example that serves html, js, css and an mp3 (but doesn't use express or any templating): https://github.com/FissionCat/Hue-Disco
There's some pretty useful documentation found here: http://www.hongkiat.com/blog/node-js-server-side-javascript/
Like you said, avoiding lots of rewriting is a bonus.
The information provided in the article might also be of some help.
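A minimal sketch of the kind of server-side templating the handlebars-node-server-example repo demonstrates (no Express; the template string and port are placeholders), assuming the handlebars package from npm:
var http = require('http');
var Handlebars = require('handlebars');

// Compile the template once at startup, render it per request.
var template = Handlebars.compile(
    '<h1>{{title}}</h1><ul>{{#each items}}<li>{{this}}</li>{{/each}}</ul>'
);

http.createServer(function (req, res) {
    // The page leaves the server as finished HTML -- nothing left to render client-side.
    var html = template({ title: 'Rendered on the server', items: ['one', 'two'] });
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end(html);
}).listen(3000);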

Appending base tag to head with JavaScript

Can you append a base tag to the head of a document, from a div in the body of the document, using JavaScript? More to the point, what are some drawbacks of doing that? My concern is that I'll run into a sort of race condition: because the base tag is understood to live in the head, it won't get respected if the page has already been rendered. I haven't yet experienced this problem, but I was wondering whether it should be a concern.
To be clear, I know how to do this via JavaScript. My question is whether the tag will be respected/honored if it's appended to the DOM after the page loads/renders...
My code is an HTML fragment that is likely to appear in the body, but I need to set the base tag because my assets are referenced relatively. Let's assume that I can't change that (because I can't. At least, not right away). You can also assume that setting the base won't break anything that's not my HTML fragment and that there are no other base tags...ever.
Yes, for example:
<script>
// Create a <base> element and append it to <head>, even from a script
// that runs after the document has already started rendering.
var base = document.createElement('base');
base.href = 'http://www.w3.org/';
document.getElementsByTagName('head')[0].appendChild(base);
</script>
I don’t see why you would want to do this, but it’s possible.
I might be wrong (or partially wrong, depending on how each browser chose to implement this), but AFAIK the document's base URL is parsed only once. By the time you append that base element to the DOM, it is already too late.
EDIT: Looks like I was wrong
Apparently, there is a way. But there are also downsides about search engines.
Jukka, to answer your question of why you would want to do it that way:
Example:
A mobile application, such as one built with PhoneGap, that is a thin wrapper around a web app, but smart enough to know whether it's running in a browser or on the device.
Once it knows that it's on a device, it needs to know the base URL so it can properly locate everything that was previously referenced with relative URLs.
In our case, we have four different systems: dev, test, beta and live, each with a different URL.
Usually changes are incremental, but a lot of the time we do want to test back and forth between each system, for instance in A/B testing.
Since the routing layouts are basically identical, switching the base URL back and forth makes a lot of sense.
Remember that many web apps use a static asset such as an HTML page for the application skeleton, JavaScript for the glue logic, and a web-based backend that is really nothing more than a thin layer over a DB; MEAN apps, for example, are frequently built this way.
Building your apps this way provides a phenomenal speed-up in scalability and responsiveness, since the "web" server doesn't have to slow down long enough to construct the page view, as happens with template languages.
Anyway, setting the base URL means being able to change where the app sources its data on the fly, and it can be an incredible boost to developer productivity due to code reuse.
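A rough sketch of what that looks like in practice (the window.cordova check and the URL map are illustrative assumptions, not part of the original setup):
// Map of the four backend systems; pick one and point all relative URLs at it.
var BASES = {
    dev:  'https://dev.example.com/',
    test: 'https://test.example.com/',
    beta: 'https://beta.example.com/',
    live: 'https://www.example.com/'
};

if (window.cordova) {                    // running inside the device wrapper
    var base = document.createElement('base');
    base.href = BASES.dev;               // switch to test/beta/live as needed
    document.head.insertBefore(base, document.head.firstChild);
}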
Search engines?
There was a time when search engines' crawling bots did not "understand" or run any JavaScript code. In that case, such bots would get all the links wrong and the crawling would stop right there.
So basically it might hamper some crawlers from crawling and indexing your links.

Safely parse/work with HTML from XMLHttpRequest

I'm writing code (right now it's a Chrome extension, though I may make it cross-browser, or try to get the site owners to include the enhancement) that works with the contents of a particular <div> on a webpage that the user is viewing, and any pages that are part of the same discussion thread. So I find links to the other pages of the thread, and get them with XMLHttpRequest. What I want to do is just be able to use .getElementsByClassName('foo') on the resulting page.
I know I can do that by loading the results of the request into a div (i.e. Optimal way to extract a URL from web page loaded via XMLHTTPRequest?). However, while figuring out the best way to do this, I read that there are security concerns (MDN - Safely Parsing Simple HTML to DOM).
In this case, I'm not sure that matters much, since the extension would just load a page from the same comment thread that the user was already looking at, but I'd still like to do this the right way.
So what's the right way to work with HTML from an XMLHttpRequest?
P.S. If the best answer is jQuery, then tell me that, but I've yet to start using jQuery, and would also like to know the fundamentals here.
Edit: I don't know why I phrased things the way I did, but let me be clearer: I'm really hoping for a non-jQuery answer. I've been trying to learn the basics of JavaScript before learning jQuery, and I'd prefer not to import a whole framework to call one function when I don't understand what I'm doing. That may seem irrational, but it's what I'm doing for the moment.
Since you say you're not opposed to using jQuery, you should look at the load function. It loads html from the address you specify, then places it into the matched elements. So for example
$("#formDiv").load("../AjaxContent/AdvSearchForm.aspx?ItemType=" + ItemType);
would load the HTML from ../AjaxContent/AdvSearchForm.aspx and then place it in the div with the id formDiv.
Optional parameters exist for passing data to the server with the request, and also for a callback function.
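For example, passing data and reacting once the HTML has been inserted (the field name and the handler body are illustrative):
$("#formDiv").load(
    "../AjaxContent/AdvSearchForm.aspx",
    { ItemType: ItemType },                      // sent to the server with the request
    function (responseText, textStatus, xhr) {   // runs after the HTML has been inserted
        if (textStatus === "error") {
            console.log("Load failed: " + xhr.status + " " + xhr.statusText);
        }
    }
);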
