Convert GET Request (HTML string) to a full DOM object

I'm writing a JavaScript script that periodically checks a page for new elements, that is, for DOM tree updates. One of those elements contains a hyperlink to another page. My objective is to perform a GET request for that page and convert the result to a DOM object so that I can trigger a particular event on a particular element within that page. I could do this with var newPage = window.open(hyperlink); and then access the elements within the page through newPage.document.getElementById('elementId');. However, the script iterates over many hyperlinks, and it is not efficient to open them all.
So, is there any way to work with an entire page as an object efficiently, i.e., without opening it (e.g., $.get(hyperlink, function(page) { // convert page to DOM });)?
Appreciate any answers,
Thanks.
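For the "convert page to DOM" step, one option in modern browsers is DOMParser: fetch the HTML as text, then parse it into a separate Document that is never rendered and whose scripts do not run, which is much cheaper than window.open. This is a sketch under assumptions: fetchAsDocument and parseHtml are illustrative names, the browser supports fetch and DOMParser, and the URL is same-origin or sends CORS headers that allow the read. One caveat for the stated goal: the parsed document contains only the markup the server sent, with none of the target page's scripts or event handlers attached, so dispatching an event there will not run that page's code.

```javascript
// Sketch: GET a page and turn the response text into a detached Document.
// Assumes a browser with fetch() and DOMParser, and a URL that is same-origin
// (or sends CORS headers allowing this page to read the response).
async function fetchAsDocument(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`GET ${url} failed with status ${response.status}`);
  }
  return parseHtml(await response.text());
}

// Parse an HTML string into a Document. Nothing is rendered and the page's
// <script> elements are not executed. The parser class is a parameter only
// so the function can be exercised outside a browser.
function parseHtml(html, Parser = globalThis.DOMParser) {
  return new Parser().parseFromString(html, "text/html");
}
```

Usage might look like fetchAsDocument(hyperlink).then(doc => doc.getElementById('elementId')), where hyperlink and 'elementId' are the placeholders from the question. Inside the existing $.get callback, parseHtml(page) would do the same job.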

Perhaps you're taking the wrong approach. Rather than converting the page to a DOM, you should simply do a regex search for links. That is likely the cheapest way to use a page's contents, since nothing has to be built into a DOM. Admittedly, however, it is also hard to do properly, and it doesn't account for links added by JavaScript.
It depends entirely on what your scope is. If you tell me you're looking for an efficient way of accomplishing this, then I offer this solution. Otherwise, there's no "quick" way of parsing an entire page into DOM no matter which way you slice it.
This will get you started on a regular expression for extracting html links.
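A rough sketch of the regex approach, assuming that only quoted href attributes on <a> tags matter. extractHrefs is an illustrative name, and this is not an HTML parser: it ignores links created by JavaScript, unquoted attribute values, and a ">" inside an earlier attribute value, and it does not decode HTML entities.

```javascript
// Rough regex-based link extraction from an HTML string (not a real parser).
// Matches <a ...> tags whose href value is quoted with " or '.
// \shref requires whitespace before "href", so data-href="..." is skipped;
// <a\b keeps tags like <abbr> from matching.
function extractHrefs(html) {
  const pattern = /<a\b[^>]*?\shref\s*=\s*(["'])(.*?)\1/gi;
  const links = [];
  for (const match of html.matchAll(pattern)) {
    links.push(match[2]); // group 2 is the text between the quotes
  }
  return links;
}
```

The i flag makes tag and attribute names case-insensitive, and the \1 backreference ensures the closing quote matches the opening one.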

Related

Page is not updating url or source code in chrome's dev tool when navigating the website

Goal
I'm making a Chrome extension to perform some manipulations on my university's website since the layout to select a course is bad. For this I need to access elements to read their inner information and also copy their CSS to add certain information that I will obtain from a different site, in a way that fits the style of the page.
Problem
When I open the source code on the exact page I want to use, it doesn't display the correct HTML. Instead, the dev tools show the main page's code. Interestingly, when I highlight a certain element, its code shows up and I'm able to make changes within the tool. But if I try to select a specific element in the console using $(id) or $$(id), I get either null or [].
This is a problem for me because I'm new to any sort of web development, and I would like to see the complete source so that I can select the elements I want and manipulate the page the way I would like. Maybe there is something I'm overlooking? That's why I need your help.
Possible reasons
I tried many things, did some research, and concluded that it might have to do with frames, since the URL does not change. However, I haven't been able to find any resources that teach me about frames (I know nothing about them), if that is the actual problem.
If the problem is another I would appreciate any assistance in solving it or any work around that I am not aware of.

The reason is definitely the use of frames. There are multiple documents at play here: the top-level document, plus one document for each frame. This matters because the JavaScript you are executing is almost certainly running against the top-level document, not a child frame's document. As a result, it doesn't find the DOM nodes, because it doesn't search the frames' documents.
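One way to reach into the frames is to search each frame's own document. The sketch below assumes the frames are same-origin with the page running the script (cross-origin frames report a null contentDocument); findInFrames is an illustrative name. In Chrome DevTools you can also switch the console to a frame's context with the context drop-down at the top of the Console panel, after which $ and $$ search that frame.

```javascript
// Sketch: look up an element by id in a document and, recursively, in every
// <frame>/<iframe> nested inside it. Only same-origin frames can be searched;
// cross-origin frames give a null contentDocument and are skipped.
function findInFrames(doc, id) {
  const direct = doc.getElementById(id);
  if (direct) return direct;
  for (const frame of doc.querySelectorAll("frame, iframe")) {
    let child = null;
    try {
      child = frame.contentDocument; // null for cross-origin frames
    } catch (e) {
      child = null; // some older browsers throw instead of returning null
    }
    if (child) {
      const found = findInFrames(child, id);
      if (found) return found;
    }
  }
  return null;
}
```

Usage would be findInFrames(document, "someId"), where "someId" stands for whatever id you are looking for.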

Why focus an input on page load instead of inline?

Almost all web pages that I see designed to set the focus to an input box add the code into a body onload event. This causes the code to execute once the entire html document has loaded. In theory, this seems like good practice.
However, in my experience, this usually causes double work for users: they have already entered data into two or three fields and are typing into another when their cursor jumps back without their knowledge. I've seen a staggering number of users type the last two-thirds of their password into the beginning of a username field. As such, I've always placed the JS focus code immediately after the input to ensure there is no delay.
My question is: Is there any technical reason not to place this focus code inline? Is there an advantage to calling it at the end of the page, or within an onload event? I'm curious why it has become common practice considering the obvious practical drawbacks.

A couple thoughts:
I would use a library like jQuery and have this type of code run in $(document).ready(...). window.onload doesn't run until everything on the page (including images) has fully loaded, which explains the delay you have experienced. $(document).ready(...) runs as soon as jQuery determines that the DOM has been loaded. You could write the same logic without jQuery, but older browsers differ in the events they offer; modern browsers all support DOMContentLoaded.
I prefer to keep my JavaScript separate from my HTML because it allows for a cleaner separation of concerns: your behavior is kept separate from your document structure, which in turn is separate from your presentation in CSS. This also makes it easier to reuse and maintain that code, possibly across projects.
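Without jQuery, a "DOM ready" helper can be sketched on top of the DOMContentLoaded event. onDomReady is an illustrative name, and the document is passed in explicitly so the helper can be called with document in a page. If parsing has already finished when the script runs, the event has already fired, so the callback is run immediately instead.

```javascript
// Sketch of a jQuery-free "DOM ready" helper.
// DOMContentLoaded fires once the HTML has been parsed, without waiting for
// images the way window.onload does.
function onDomReady(doc, callback) {
  if (doc.readyState === "loading") {
    // Still parsing: wait for the event (once: true removes the listener after it fires).
    doc.addEventListener("DOMContentLoaded", callback, { once: true });
  } else {
    // "interactive" or "complete": the event has already fired, so run now.
    callback();
  }
}
```

In a page this would be called as onDomReady(document, function () { ... }).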

Google and Yahoo both suggest placing scripts at the bottom of the html page for performance reasons.
The Yahoo article: http://developer.yahoo.com/performance/rules.html#js_bottom
You should definitely place the script wherever it produces the correct user experience. In fact, I would load that part of the script (used for tabbing inputs) before the inputs to ensure it always works, no matter how slow the connection.
The "document.ready" function ensures that the elements you want to reference are in the DOM; it fires as soon as the whole document's DOM is loaded (this does not mean images are fully loaded).
If you want, you could have the inputs start out disabled and then re-enable them on document ready. This handles the rare case where the script is not ready yet when the inputs are displayed.
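A sketch of the disable-then-re-enable idea. It assumes the inputs are served with both disabled and a marker attribute, here a hypothetical data-enable-on-ready (for example <input name="username" disabled data-enable-on-ready>), so that fields meant to stay disabled are not switched on by accident. enableDeferredInputs is an illustrative name.

```javascript
// Sketch: re-enable only the inputs marked with the hypothetical
// data-enable-on-ready attribute, then drop the marker.
function enableDeferredInputs(root) {
  const inputs = root.querySelectorAll("[data-enable-on-ready]");
  for (const el of inputs) {
    el.disabled = false;
    el.removeAttribute("data-enable-on-ready");
  }
  return inputs.length; // how many inputs were switched on
}
```

It would be called from a document-ready handler, e.g. with enableDeferredInputs(document).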

Well, if you call it before the whole page has loaded, you don't really know whether the element has been loaded yet when you make your call. And if you do make the call beforehand, you should check that the element really exists, even if you know it always should.
So making the call inline might seem ideal. On the other hand, it's really bad if a page takes so long to load that users can make several inputs during the loading phase.
You could also check whether any input has already been made, etc.

It is also possible to check whether any input on the page has focus, with if ($("input:focus, textarea:focus").length) ..., and otherwise set focus on the desired input.
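The same check can be sketched without jQuery using document.activeElement, which is the <body> (or null) when nothing in particular has focus. focusIfIdle is an illustrative name; the idea is simply not to steal focus from a field the user has already clicked into.

```javascript
// Sketch: move focus to `input` only if nothing else on the page has focus.
// document.activeElement is <body> (or null) when nothing in particular is focused.
function focusIfIdle(doc, input) {
  const active = doc.activeElement;
  const somethingElseFocused =
    Boolean(active) && active !== doc.body && active !== input;
  if (!somethingElseFocused) {
    input.focus();
  }
  return !somethingElseFocused; // true if we (re)focused the input
}
```

For example, focusIfIdle(document, document.getElementById("username")), where "username" is a hypothetical id.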

Use the autofocus HTML attribute to specify which element should initially receive focus. This decouples JavaScript and gracefully degrades in older browsers.

how to find a jquery or pure javascript associated with a class or div or any other elements in webpage

I was wondering if there is any way to find all the scripts associated with a particular element in a web page.
That is, if there is a photo with two jQuery functions attached, say on mouseover and on click, I need to get the details of these functions without looking through the entire script.

One way is with a bookmarklet called Visual Event

There isn't really an easy way. I spent a few days trying to write an augmentation wrapper/extension that would track all event assignments on a page and thus allow you to inspect them. The problem is that it required tweaking for each library, and, if I recall correctly, it wasn't useful if any native event assignment was used.
This is exactly why code needs to be well organized. Remember that "unobtrusive" doesn't mean "incomprehensible": try to keep all your event assignments well organized and easy to find for a particular element.
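The core of such an augmentation wrapper can be sketched by wrapping addEventListener on a prototype so every registration is recorded. This is a hypothetical sketch (trackListeners and listenersOf are illustrative names), and it shares the limitations described above: it must run before any other script registers listeners, it cannot see inline onclick="..." attributes or handlers assigned via el.onclick = fn, and for jQuery-bound handlers it records jQuery's internal dispatcher rather than your own function. In Chrome, the Event Listeners pane in the Elements panel and the console's getEventListeners(element) offer a built-in view instead.

```javascript
// Sketch: record every addEventListener call made on objects that inherit
// from `proto`, so registrations can be inspected later per target.
const listenerLog = new WeakMap(); // target -> [{ type, listener, options }]

function trackListeners(proto) {
  const original = proto.addEventListener;
  proto.addEventListener = function (type, listener, options) {
    if (!listenerLog.has(this)) listenerLog.set(this, []);
    listenerLog.get(this).push({ type, listener, options });
    // Still perform the real registration.
    return original.call(this, type, listener, options);
  };
}

function listenersOf(target) {
  return listenerLog.get(target) || [];
}
```

In a page this would be started as early as possible with trackListeners(EventTarget.prototype), and later queried with listenersOf(someElement).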

Is it more efficient to keep DOM elements on the page or to re-render them as needed?

I have a dialog box that has settings associated with it. When the user clicks the "settings" button, a form is displayed so they can modify them.
What is more efficient:
to have the settings div exist hidden on the page and display when needed
OR
to create the settings div and populate it with data when needed?
In the first scenario you don't need to create the DOM elements and populate them every time, but if many dialog boxes are open at once (a common situation), the number of elements on the page gets quite large, and many of them are rarely used. In the second scenario, elements are created and appended to the DOM each time, which gets expensive.

I'd suggest that you "cache" your HTML on the page, but prevent the browser from rendering it until necessary (until the user requests the data, or simply scrolls to it). The main idea is to add your HTML (with data) to the page but comment it out. For example:
<div id="cached-html">
<!--
<div>
...some custom html here
</div>
-->
</div>
Then, once the user requests the HTML, you can do the following:
var html = document.getElementById('cached-html'),
    inner = html.innerHTML.trim(); // drop the whitespace around the comment
// Strip the leading "<!--" (4 characters) and the trailing "-->" (3 characters).
html.innerHTML = inner.slice(4, -3);
The advantage is that the browser doesn't have to do the initial rendering (later you can simply use display:none to hide it again), so your page renders faster.
One more note: if your data (and consequently the inner HTML) changes frequently, it is better to re-render it each time the user requests it; if it is almost static, hiding and showing should be more efficient.

There can be problems either way; it depends on your page. If you already have a lot of elements on the page, it may be better to add them when you need them. If your page is already very "scripty", you may want to load the elements up front and show them when needed.
The real question is what would be better for your page, more script, or more dom elements.

When you have to display the same settings div in multiple places, keeping it hidden is a better solution.
Remember that creating a new DOM element and cloning an existing one give roughly the same performance, but for code clarity and maintenance, cloning or a template is better.
Implementation using a template: make a template of the settings div and keep it hidden:
<div class="template_setting">
Your settings(children of template_setting)
</div>
JavaScript/jQuery code:
- Whenever someone opens a dialog box, clone the children of template_setting and append them to div_dialogue.
- As you may have multiple settings divs on the same page (which is not always true), trigger a custom event on the newly created settings div to adjust its id. Keep the id of each settings div different; for example, append an incrementing number or character.
$('#dialogue_opener').click(function (event) {
  // div_dialogue: the dialog's container element, assumed to exist already
  $('.template_setting').children().clone()
    .appendTo(div_dialogue)        // appendTo returns the inserted clones
    .trigger('adjustSettingID');   // custom event to give the clones unique ids
});

Consider a hybrid solution. Load the "settings" div after the page is ready. This way, the user won't feel the extra "expense", and you'll have the div ready for when you need it.
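The hybrid approach can be sketched as a "create once, then reuse" wrapper: the settings div is built the first time it is needed, or earlier if you call the getter once the page has loaded, and every later call returns the same element. createOnce is an illustrative name, and buildPanel stands for a hypothetical factory that builds your settings div.

```javascript
// Sketch: build a value lazily on first request and reuse it afterwards.
// `buildPanel` is a hypothetical factory that creates the settings div.
function createOnce(buildPanel) {
  let panel = null;
  return function getPanel() {
    if (panel === null) {
      panel = buildPanel(); // built only on the first call
    }
    return panel;
  };
}
```

For example, const getSettingsPanel = createOnce(buildSettingsDiv); window.addEventListener("load", getSettingsPanel); would warm the panel up after the page has loaded, and the click handler would then just call getSettingsPanel().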

In my experience, rendering from JavaScript is pretty darn fast. I've built lots of "just in time" menus, grids, and forms, and users can't tell the difference. The nice thing about it is that you don't have to keep a form current: just blow it away and default everything to the data in your settings object. Makes for cleaner code, in my opinion.
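A minimal sketch of "just in time" rendering driven by a settings object: the form markup is regenerated from the data each time, so there is no hidden form to keep in sync. renderSettingsForm, escapeHtml, and the flat name/value settings shape are assumptions for illustration; values are escaped before being placed in the markup.

```javascript
// Sketch: rebuild the settings form from a plain object every time it is shown.
// Escape text so setting names/values cannot inject markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Assumes a flat settings object such as { theme: "dark", pageSize: 20 }.
function renderSettingsForm(settings) {
  const rows = Object.entries(settings).map(([name, value]) => {
    const n = escapeHtml(name);
    const v = escapeHtml(value);
    return `<label>${n} <input name="${n}" value="${v}"></label>`;
  });
  return `<form class="settings">${rows.join("")}</form>`;
}
```

Showing the dialog would then be container.innerHTML = renderSettingsForm(currentSettings), where container and currentSettings are placeholders for your own element and data.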

Using jQuery to disable everything on a page. Break my code

For my current project, I need to be able to remove all functionality from a page so that it becomes a completely static page: removing the ability to follow any links, and disabling any JavaScript listeners that allow content on the page to be changed. Here is my attempt so far:
$("*").unbind().attr("href", "#");
But in pursuit of a perfect script that works in every eventuality for any possible page (and given my uncertainty about whether a one-liner is effective enough), I thought I'd consult the experts here at Stack Overflow.
In summary, my question is, 'Can this be (and has it been) done in a one liner, is there anything this could miss?'. Please break this as best you can.

No. Nothing in this stops meta redirects, or timeouts or intervals already in flight, and it does nothing about same origin iframes (or ones that can become same origin via document.domain) that can reach back into the parent page to redynamize content.
EDIT:
The sheer number of ways scripts can stay submerged to pop up later is large, so unless you control all the code that can run before you want to do this, I would be inclined to say that it's impossible in practice to lock this down unless you have a team including some browser implementors working on this for some time.
Other possible sources of submarine scripts: XMLHttpRequests (and their onreadystatechange handlers), Flash objects that can script, web workers, and code embedded in things like Object.prototype.toString.
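For the "timeouts or intervals already in flight" point, a commonly used browser trick is sketched below. It relies on an assumption about observed behavior rather than a guarantee: browsers in practice hand out timer ids as increasing positive integers from a pool shared by setTimeout and setInterval, so allocating a fresh id and clearing every id at or below it cancels pending timers. clearAllTimers is an illustrative name, and the timer functions are a parameter only so the sketch can be tested. It does nothing about timers scheduled afterwards, or about any of the other sources listed above.

```javascript
// Sketch: cancel timers already scheduled in a browser page.
// ASSUMPTION: timer ids are increasing positive integers shared by
// setTimeout and setInterval (true in practice in browsers, not in Node).
function clearAllTimers(timers = globalThis) {
  const highestId = timers.setTimeout(() => {}, 0); // newest id handed out
  for (let id = highestId; id > 0; id--) {
    timers.clearTimeout(id);
    timers.clearInterval(id);
  }
}
```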

I did not want to write a lengthy comment so I'm posting this instead.
As @Felix Kling said, I don't think your code removes the href attributes on every element; rather, it removes every element and then selects their href attributes.
You probably need to write:
$("*").attr("href", "#").detach() ;
to remove the attributes instead of the elements.
Other than that, I doubt that you could remove the event handlers in one line. For one thing, you would need to account for both DOM Level 2 event registration (only settable with scripting) and DOM Level 0 (traditional) event registration (via attributes or scripting).
As far as I'm concerned, your best bet is to make a shallow copy of the document using an XML parser and replace the old document with it (you could save a backup of the old one on the window).

First: your code will remove everything from the page, leaving a blank page. I cannot see how that would make the page "static".
$('*').detach();
will remove every element from the DOM. Nothing is left. So yes, in a way you remove all functionality, but you also remove all the content.
Update: Even with the change from detach to unbind, the below points are still valid.
Event listeners added in the markup via oneventname="foo()" won't be affected.
javascript: URLs, e.g. in images, might still be triggered.
Event listeners added to window and document will persist.
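The first two points can be partly addressed element by element. This sketch, under stated assumptions, removes inline on* handler attributes (removing the attribute also removes that handler) and javascript: URLs in a few URL-bearing attributes. stripInlineScript is an illustrative name, and the check is deliberately simple, not exhaustive: it misses handlers assigned in script via el.onclick = fn, listeners on window and document, and obfuscated URLs the browser would still accept, in line with the answer above that this is very hard to lock down completely.

```javascript
// Sketch: strip inline on* handler attributes and javascript: URLs from a list
// of elements, e.g. document.querySelectorAll("*"). Not exhaustive.
const URL_ATTRIBUTES = ["href", "src", "action", "formaction"];

function stripInlineScript(elements) {
  let removed = 0;
  for (const el of elements) {
    // Copy first: el.attributes is live and shrinks as attributes are removed.
    for (const attr of Array.from(el.attributes)) {
      const name = attr.name.toLowerCase();
      const isHandler = name.startsWith("on"); // onclick, onerror, ...
      const isScriptUrl =
        URL_ATTRIBUTES.includes(name) && /^\s*javascript:/i.test(attr.value);
      if (isHandler || isScriptUrl) {
        el.removeAttribute(attr.name);
        removed++;
      }
    }
  }
  return removed; // number of attributes removed
}
```

In a page this would be run as stripInlineScript(document.querySelectorAll("*")).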
