Get at entire web page contents using Javascript - javascript

Is there a way to load the entire contents of a page into a JavaScript variable (the page is not properly formatted HTML), i.e. store the page contents as a string in a variable? It only needs to work with Firefox.
I have some javascript running in one firefox tab that accesses the content of a page in another tab (the target window). Normally the content of the target is an HTML page so I can get at its content like this...
targetWindowName.document.getElementsByTagName("html")[0].innerHTML;
However, I have come across a page that is not proper HTML, so the above doesn't work.
(The actual content of this awkward page is JSON. I know this would be best loaded up with AJAX or something but I have a framework already setup to process HTML pages and it would be very handy if I can treat this particular (one off) page just like a regular HTML page.)
Thanks

Guess you can use:
win.document.documentElement.innerHTML
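A minimal sketch of that idea, assuming `win` is a same-origin window reference (for example the return value of window.open) and that the browser exposes a documentElement even for the non-HTML page. The function names readPageContents and parseJsonOrNull are made up for illustration. Reading textContent alongside innerHTML gives you the text without any markup the browser wrapped around it, which is the part you would hand to JSON.parse:

```javascript
// Read the contents of another window's page as strings.
// `win` is assumed to be a same-origin window reference.
function readPageContents(win) {
  const root = win.document.documentElement;
  if (!root) {
    return { html: "", text: "" };
  }
  return {
    html: root.innerHTML,   // markup as the browser parsed it
    text: root.textContent, // the same content with all tags stripped
  };
}

// Hypothetical helper: parse text as JSON, or return null if it is not valid JSON.
function parseJsonOrNull(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return null;
  }
}
```

In the browser this would be called as readPageContents(targetWindowName), and the `text` field passed to parseJsonOrNull.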

Read the file into a variable. Like you would any text file.
So, Page "A" has code that goes out and gets the HTML page contents and loads it into a variable.

Related

'document.getElementById' only works on index.html

Ultimate goal is to cycle through photos on a blog page. Seems like 'document.getElementById().src' would be a good approach.
Problem: To make sure the javascript code is successfully linking to the blog page, I tried testing with this in my script.js file:
document.getElementById('testID').innerHTML = "Running test";
and this in my .html file:
<div id="testID"></div>
But, the text "Running Test" does not show up on the blog page. However, when running this same exact test in my index.html page, it does work. Both .html files load the same script file along with jQuery. I don't understand why it works in one html file and not the other.
NEW FINDING:
This line of code now works on the blog page when I remove it from inside
$(document).ready(function(){ ... });
Why would that be?
The Javascript in the current page can only access HTML elements that are in pages that are currently loaded into the browser.
More specifically, document.getElementById() ONLY searches the current web page's document for matching elements. It does not search any other pages and certainly does not search other files on your server that are not loaded into the browser. "current web page" means the HTML loaded from the current URL in the browser bar.
When a web page is no longer visible in the browser window (e.g. it's been replaced by some other page), it is gone and no longer reachable by any Javascript. In some specific cases, you can access documents loaded into other tabs or other frames (this is subject to same-origin security rules and requires a different method of access).
In addition, no changes to a web page are persistent in the browser. As soon as a web page is no longer loaded into an active browser window, it is gone and reloading it again will load the original, unmodified version of that document.
If you want the same code from one page to run in another page, then you must include that same code in the other page. If you want, you can share a reference to the code by putting the code into its own file and then using a <script src="xxx.js"> tag in each page to cause the same code to get loaded into each page.
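Since document.getElementById() returns null when the current document has no element with that id, a shared script can guard against running on a page that lacks the element. This is only a sketch; setTextIfPresent is a made-up helper name, and `doc` is whatever document object you pass in:

```javascript
// Set an element's innerHTML only if the element exists in the given document.
// Returns true when the element was found and updated, false otherwise.
function setTextIfPresent(doc, id, text) {
  const el = doc.getElementById(id);
  if (el === null) {
    return false; // this page has no element with that id
  }
  el.innerHTML = text;
  return true;
}
```

In a page you would call setTextIfPresent(document, "testID", "Running test") and log something when it returns false, which makes it obvious whether the script ran on a page that actually contains the element.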
If I interpret the question correctly, try using .load():
$("#container").load("/blog/blog_1.html #testID")

Where (and in what format) to store html code of a page within another html page?

I have a page that has an iframe with src=... etc
I need to build the same page but make it work off-line. One page, no multiple pages.
So I will do an http get, get the contents of the page in the iframe and try to store the html code in the main page. Then, onload, I will create an iframe on the fly and I will try to populate the iframe with the html code I stored earlier.
So, the stored html code will have <html><head> etc. The question is where to store the html code in the page (and in what format) in order to access it later?
In an <object> element?
In a <div> element?
In a JSON object or JavaScript variable?
Other idea?
In your filesystem, you could also store the second HTML document in the same folder and reference it like this:
src="framecontent.html"
For your main page: create an offline app. If the content of the <iframe> is under your control, do the same for that page too. Your users will need to visit the online version once, after which the browser will cache the pages and serve them locally. No need to worry about storage: the browser will do it all for you.
If the <iframe> is not under your control, put it in localStorage (reference here). Your other suggestions won't persist over a page load.
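A rough sketch of the localStorage route, assuming the fetched HTML is already available as a string and that the browser supports the iframe srcdoc attribute. The function names and the storage key are hypothetical; `storage` is expected to behave like window.localStorage (getItem returns null for a missing key):

```javascript
// Save the iframe's HTML so it survives page loads.
function saveFrameHtml(storage, key, html) {
  storage.setItem(key, html);
}

// On load: build an iframe on the fly and fill it from the stored HTML.
// Returns the new iframe, or null if nothing has been stored under `key`.
function restoreFrame(doc, storage, key) {
  const html = storage.getItem(key);
  if (html === null) {
    return null;
  }
  const frame = doc.createElement("iframe");
  frame.srcdoc = html; // the full <html>...</html> string goes in as-is
  doc.body.appendChild(frame);
  return frame;
}
```

In the page this would be saveFrameHtml(localStorage, "frameHtml", html) after the HTTP GET, and restoreFrame(document, localStorage, "frameHtml") in the onload handler.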

cURL returns full HTML via AJAX - how to display to user?

I am building a Wordpress plugin to display a list of jobs to a user pulled from a recruiting platform API. On click of a job, a cURL request is sent to the API that pulls the job details as a full HTML page (the online job advertisement). I have everything working fine in terms of pulling the HTML, but I cannot figure out how to display it to the user.
How can I either:
Open a new tab to display the HTML pulled from the AJAX request
or
Open the full HTML within a div on the same page (i.e. a modal)
I would prefer to open the HTML in a new page, but don't know how to use jQuery to do this... Opening within the page in a modal is also fine, but as far as I understand iFrames (which I would rather not use anyway), you have to pass a url (and I simply have the full markup). Is there a way to display this within a page, perhaps using canvas? It carries its own links to CSS and Javascript that need to apply only within that sub-page.
EDIT:
As a clarification, I know that I can simply place the HTML within the page. My issue is that it is a full page. This means it has a <head> <body>, and its own CSS links. Just putting it in the page messes with the rest of the CSS and produces invalid HTML.
This is what I already have:
$.post(ajaxurl, data, function(response) {
$('.sg-jobad-full').html(response);
});
It places the response within the page perfectly well... but it messes up the page by introducing a <body> within a <body> and competing CSS.
If you put the response in a <div>, it will mess the markup because css/js/meta definitions may not be put into the <body>.
If there is a way to retrieve the data without the markup already being in it, you could parse the data and print it via JavaScript, which is the method I'd prefer.
According to your comment, you should really go with iframes, all other methods will alter your markup to have <html> tags inside <html>, which is very bad practice.
Iframes can be styled just like a <div> element, and it is really not dirty to use iframes for the purpose you mentioned (it does not load from a foreign host, it is not hidden, it does not track).
<iframe class="job-offers-plugin" src=".../wp-content/plugins/yourplugin/getJobs.php">
</iframe>
Put some style into it like width;height;padding;margin;overflow; place it where you like..
This helps you with the database:
Using WPDB in standalone script?
Add permalinks to your plugin script:
http://teachingyou.net/wordpress/wordpress-how-to-create-custom-permalinks-to-use-in-your-plugins-the-easy-way/
If you get the full HTML in a jQuery.ajax(...) call, you can always just show it in a certain div on your page.
$.ajax({
    url: ajaxurl, // assumption: the same endpoint and data the question posts
    type: "POST",
    data: data,
    success: function (resp) {
        // resp should be your HTML code
        $("#div").html(resp);
    }
});
You can use the $(selector).html(htmlCode) everywhere you want. You can insert it into modals, divs, new pages...
If you have to inject a whole HTML page you can:
strip the tags you don't need
or
use an iframe and write the content to that iframe: How to set HTML content into an iframe
iframes aren't my favourite thing... but it's a possibility
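Both options can be sketched roughly as follows. extractBodyHtml and showInFrame are hypothetical names, the regex is a deliberately simple stand-in for a real HTML parser (in a browser, DOMParser would be more robust), and the second function assumes the browser supports the iframe srcdoc attribute:

```javascript
// Option 1: strip the wrapper tags and keep only what is inside <body>.
// Falls back to the input unchanged if no <body> element is found.
function extractBodyHtml(fullHtml) {
  const match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(fullHtml);
  return match ? match[1] : fullHtml;
}

// Option 2: hand the complete page to an iframe, so its <head>, CSS and
// scripts stay isolated from the host page.
function showInFrame(frame, fullHtml) {
  frame.srcdoc = fullHtml;
}
```

With the question's code, option 1 would be $('.sg-jobad-full').html(extractBodyHtml(response)), at the cost of dropping the job ad's own stylesheets; option 2 keeps them but needs an iframe element in the modal.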

Load pages via AJAX and execute javascript and CSS

I've been searching for a while now, but I can't figure out how to load an entire page via AJAX and still execute all javascript and css.
Mostly I just end up with the plain text without any CSS.
Is there a way to do this? I tried jQuery.get, jQuery.load and jQuery.ajax, but none really work like that.
I have a different solution. You may try it with an iframe. Use jQuery to append an iframe that points at the page to some part of your page (like some div). The framed page then loads with all of its own code, including its CSS, like this:
$('<iframe src="your_page.html"/>').appendTo('#your_div');
Or you may try something like;
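// .on('load', ...) replaces the .load(handler) event shorthand, which was removed in jQuery 3.0
$('<iframe src="your_page.html"/>').on('load', function(){
    alert('the iframe is done loading');
}).appendTo('#your_div');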
I have solved a similar problem as follows.
Download the webpage over ajax
Iterate over it and find any <script> and </script> tags
Get content from within these tags as text
Create new <script> element and insert there the code
Append the tag to your webpage
Another thing is you will need to somehow call the script..
I have done it this way:
I use standardized function names, such as an initAddedScript callback that I call after appending the script to the page. Likewise, I call deinitScript when I no longer need the code (and its variables, etc.).
I must say this is an awful solution, and needing it likely means you have a bad application architecture, as I had :)
With CSS it is the same, but you do not need any handlers. Just append the style tag to your document's head.
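The steps above might look something like this. It is only a sketch: extractInlineScripts and appendScript are made-up names, the regex is a simplification that ignores edge cases such as a literal "</script>" inside a string, and external scripts (those with only a src attribute) are skipped because they have no inline content:

```javascript
// Find every <script>...</script> pair in the downloaded HTML and return
// the code between the tags as text. Empty scripts are dropped.
function extractInlineScripts(html) {
  const pattern = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  return Array.from(html.matchAll(pattern), (m) => m[1].trim())
    .filter((code) => code.length > 0);
}

// Create a new <script> element, put the code in it, and append it to the
// document's head, which makes the browser execute it.
function appendScript(doc, code) {
  const el = doc.createElement("script");
  el.textContent = code;
  doc.head.appendChild(el);
  return el;
}
```

After the AJAX download you would loop over extractInlineScripts(responseText), call appendScript(document, code) for each entry, and then call the agreed entry point such as initAddedScript.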
If the page you load doesn't have any style data, then the external stylesheets must have relative paths that are not correct relative to the invoking document. Remember, this isn't an iFrame - you aren't framing an external document in your document, you're combining one document into another.
Another problem is that loading your complete page will also load the doctype, html, head, and body tags, which modern browsers will cope with most of the time, but the results are undefined because it's not valid HTML to jam one document into another wholesale. And this brings me to the third reason why it won't work: CSS links outside of the head section aren't valid, and your haphazard document-in-document collage leaves the loaded page's head section misplaced inside the host document's body.
What I'd do for compliance (and correct rendering) is this, which would be implemented in the Success callback:
Copy all link elements to a new jQuery element.
Copy the contents of all script in the head section
Copy the .html() contents from the loaded document's body tag
Append the link elements (copied out in step 1) to your host document's head
Create a new script tag with your copied script contents and stick it in the head too
Done!
Complicated? Kind of, I guess, but if you really want to load an entire page using AJAX it's your only option. It's also going to cause problems with the page's JavaScript no matter what you do, particularly code that's supposed to run during the initial load. There's nothing you can do about this. If it's a problem, you need to either rewrite the source page to be more load-friendly or you could figure out how to make an iFrame suit your needs.
It's also worth considering whether it'd work to just load your external CSS in the host document in the first place.
I suppose you are looking for something like this:
your page div --> load --> www.some-site.com
After a quick search the closest solution seems to be the one by "And": Load website into DIV
You have to run a web server and create a proxy.php page with this content:
Then your JQuery load() function should be like this:
$("#your_div_id").load("proxy.php?url=http://some-site.com");
NB. I have tested this solution and it does not load all the CSS from the target page, so you will probably have to recreate it. For example, the image files stored on the remote server will not be loaded, I suppose due to authentication policy.
You will also only be able to view the target page itself, without the possibility to browse the rest of the target site.
Anyway I hope this could be a step forward to your solution.
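One detail worth adding to the load() call: the target address should be URL-encoded before it goes into the query string, otherwise any "?" or "&" in it will be read as part of the proxy's own parameters. A small sketch (buildProxyUrl is a made-up helper; proxy.php and its url parameter are taken from the call above):

```javascript
// Build the proxy address with the target URL safely encoded as a parameter.
function buildProxyUrl(proxyPath, targetUrl) {
  return proxyPath + "?url=" + encodeURIComponent(targetUrl);
}
```

The call then becomes $("#your_div_id").load(buildProxyUrl("proxy.php", "http://some-site.com")).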
Get your entire webpage as text using ajax
document.open();
document.write(this.responseText);
document.close();
OR
document.documentElement.outerHTML = this.responseText;
But you need to change the paths of the CSS and JS files in the original webpage if the resulting webpage is in another directory.
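One way to change those paths is to rewrite every href and src in the fetched text to an absolute URL before writing it into the document. This is a sketch under assumptions: absolutizeUrls is a hypothetical name, `baseUrl` is the address the page was fetched from, and the regex only handles quoted href/src attributes (it will also rewrite plain links and "#anchor" hrefs):

```javascript
// Rewrite quoted href="..." and src="..." values so they are absolute,
// resolved against the URL the HTML was originally loaded from.
function absolutizeUrls(html, baseUrl) {
  return html.replace(/\b(href|src)=(["'])(.*?)\2/gi, (whole, attr, quote, value) => {
    try {
      return attr + "=" + quote + new URL(value, baseUrl).href + quote;
    } catch (err) {
      return whole; // leave anything that cannot be resolved untouched
    }
  });
}
```

You would then pass absolutizeUrls(this.responseText, pageUrl) to document.write instead of the raw response, where pageUrl is the address you requested.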

Help with editing an existing Javascript file

I'm trying to edit the readability.js file from http://code.google.com/p/arc90labs-readability/.
It's a bookmarklet that "cleans" the current page by stripping everything except for the web page/web article title and body.
However, I'd like to edit the script so that when the bookmarklet is active, the current page is untouched but outputs the "cleaned" html file to a specified local directory instead.
Can anyone help? Thank you!
Note: The clean HTML file is called 'document.body.innerHTML'
To begin with, it can't be done without touching the original page. The way the script works, it edits the current page (so image URLs continue to work, etc.). The best you could do would be to store the innerHTML of the root html element and then restore it after you have grabbed the content (or store the head and body separately). It would look something like this:
First you would need to store the existing innerHTML of the html element.
Next, you would have the script run as needed, just remove the readability-controls part.
Get the HTML contents of either the readability-content or the whole document and store it in a variable.
Restore the original content using the content stored in step 1 (so the page goes back to how it was before)
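Steps 1 to 4 could be sketched like this. captureCleanedHtml is a made-up name and `clean` stands in for whatever function runs the readability code. Be aware that restoring via innerHTML rebuilds the DOM, so event listeners that were attached to the old elements are lost:

```javascript
// Run a page-mutating cleanup, grab the resulting body HTML, then put the
// page back the way it was. `clean` is the function that edits the page.
function captureCleanedHtml(doc, clean) {
  const original = doc.documentElement.innerHTML; // step 1: remember the page
  try {
    clean();                                      // step 2: let the script run
    return doc.body.innerHTML;                    // step 3: grab the cleaned HTML
  } finally {
    doc.documentElement.innerHTML = original;     // step 4: restore the page
  }
}
```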
At this point, depending on your browser, you could either try to use a dataURI or you could dynamically add a reference to the Downloadify library, images, etc and add the download button to the page. Finally, clicking the "Download" button you could pre-supply the filename and the data stored in step 3, but the location would have to be selected every time.
Sorry this is so hypothetical, but it would take quite a bit of work to put this together.
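To make the data URI part a little less hypothetical, here is a minimal sketch. The function names are made up, and it assumes a browser that honours the download attribute on links pointing at data URIs. The filename is only a suggestion to the browser; where the file ends up is still decided by the user or the browser settings:

```javascript
// Encode an HTML string as a data URI.
function buildHtmlDataUri(html) {
  return "data:text/html;charset=utf-8," + encodeURIComponent(html);
}

// Build a "Download" link that offers the HTML as a file with the given name.
function createDownloadLink(doc, html, filename) {
  const link = doc.createElement("a");
  link.href = buildHtmlDataUri(html);
  link.download = filename; // pre-supplies the filename in the save dialog
  link.textContent = "Download";
  return link;
}
```

The returned link still has to be added to the page, for example with document.body.appendChild(createDownloadLink(document, cleanedHtml, "article.html")).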
You don't really need to modify the readability code. Just pull the contents of:
document.getElementById("readability-content");
You can then pass that onto a local script to be saved.
