Unable to access the whole content of the downloaded html file - javascript

My original task is to download multiple scientific publications as HTML files. Currently my script downloads the file in Chrome, but in Firefox it navigates to the URL instead. That is not my main question, though.
If you look at the downloaded HTML source, you will find that only some of the content has been downloaded. That is my problem: why am I not able to get the whole HTML document in the downloaded file? The file I want to download is this:
var links = [
'http://www.sciencedirect.com/science/article/pii/S2078152015000516'
];
I thought it was probably a CORS issue. But after implementing a CORS script, the responseText still contained only part of the content.
Any assistance will be appreciated.
Also, can someone tell me why in Firefox the script does not download the file and takes me to the URL instead?

The reason you are unable to download the entire page is that the page only loads partway at first, and the rest is added dynamically as you scroll down.
Therefore, when you try to download the page, you receive only the initially loaded part, without the dynamic part.
Since this is done using JavaScript, this particular website offers an alternative in case you have JavaScript disabled and do not want to or cannot enable it (for example, with a screen reader):
If you view the source of the page, you can find the following message box at the very beginning of the body:
<div class="ua_btn" role="region" aria-label="screen reader compatability">
<a role="button" rel="nofollow" href="http://www.sciencedirect.com/science/article/pii/S2078152015000516?np=y">
Screen reader users, click here to load entire article
</a>
This page uses JavaScript to progressively load the article content as a user scrolls.
Screen reader users, click the load entire article button to bypass dynamically loaded article content.
</div>
Here you are offered a link with the query string "np=y", which overrides the dynamic loading and loads the whole page right away:
http://www.sciencedirect.com/science/article/pii/S2078152015000516?np=y
Use this link to download the article and it will work.
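As a minimal sketch of applying this to the links array from the question: the helper name fullArticleUrl is made up for illustration, and it only rewrites the URL. The download logic itself stays whatever the script already uses.

```javascript
// Hypothetical helper: returns the article URL with the "np=y" query
// parameter set, the same parameter used by the page's screen-reader link.
function fullArticleUrl(articleUrl) {
  const url = new URL(articleUrl);
  url.searchParams.set("np", "y"); // keeps any query parameters already present
  return url.toString();
}

const links = [
  "http://www.sciencedirect.com/science/article/pii/S2078152015000516"
];

// Rewrite every link before handing it to the existing download code.
const fullLinks = links.map(fullArticleUrl);
console.log(fullLinks[0]);
```

Using the URL API rather than string concatenation avoids producing a second "?" if a link already carries a query string.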
Firefox:
As mentioned in the comments, Firefox by design does not support CORS downloads, because of potential security risks.

Related

How to download a canvas rendered pdf file opening in pdf.js?

I have zero knowledge of coding. I desperately need to download a PDF file that is displayed as shown in the image attached to this post. The download button is not working, and I've tried everything I can to download the file. Any help is welcome. The things I've tried are:
Finding the file source in the Network tab under Inspect Element.
Finding any URL leading to the PDF, again in the Inspect Element tab.
Saving the page as HTML, which downloads but never opens again with the required PDF.
Also, from the limited research I could make sense of, the page uses a canvas element to render the PDF. The text of the PDF appears to be entered manually in a separate layer.
The address of the PDF being rendered is actually visible in the URL in your image.
The ../../ means go up two directories.
So the absolute URL for your PDF is this:
https://www.time4education.com/MoodlePages/catmttt/cat20materialvideos/VAHO1002103.pdf
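The same resolution can be done programmatically. This is only a sketch: the viewer page URL and the relative path below are placeholders on example.com, since the real values are visible only in the screenshot.

```javascript
// Placeholder values for illustration; substitute the page URL and the
// relative file path shown in the browser's address bar.
const viewerPageUrl = "https://example.com/app/pdfjs/web/viewer.html";
const relativePdfPath = "../../materials/sample.pdf";

// Resolves a relative path against the page it appears on.
// Each "../" removes one directory from the base path.
function resolvePdfUrl(relativePath, pageUrl) {
  return new URL(relativePath, pageUrl).href;
}

console.log(resolvePdfUrl(relativePdfPath, viewerPageUrl));
// prints "https://example.com/app/materials/sample.pdf"
```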

How to create a chrome extension code to extract ads from a webpage and save it as an html file?

I am developing a Chrome extension for fetching ads from a web page. What I am trying to do is this:
My extension should look for HTML5 banner ads from the opened web page.
It should detach the ad code and save it to my computer as an html file.
The html file created should not depend on an external JS or CSS file. It means when it gets detached, the CSS or JS code attached to it should be detached and saved as a part of the html page (not a hyper link).
I was wondering if there are any existing libraries or open source plugins that do that. If not, can anyone point me in the right direction where to begin?
This won't directly pick out banner ads for you, you'll need to do that yourself, but all the functionality you're hoping for is available using content scripts.
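As a rough sketch of the "save as a self-contained HTML file" part: the helper below is hypothetical, not an existing library function. It assumes you have already located the ad element and collected the text of its CSS and JS yourself, and it only assembles them into one document with no external references.

```javascript
// Hypothetical helper: wraps an element's markup plus its CSS and JS text
// into a single HTML document that does not link to external files.
function buildStandaloneHtml(adMarkup, cssText, jsText) {
  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head><meta charset=\"utf-8\"><style>" + cssText + "</style></head>",
    "<body>",
    adMarkup,
    "<script>" + jsText + "<\/script>",
    "</body>",
    "</html>"
  ].join("\n");
}

// In a content script, the inputs might come from the page, for example
// (the selector and the collected* variables are assumptions):
//   const ad = document.querySelector("#hypothetical-ad-container");
//   const html = buildStandaloneHtml(ad.outerHTML, collectedCss, collectedJs);
```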

'document.getElementById' only works on index.html

Ultimate goal is to cycle through photos on a blog page. Seems like 'document.getElementById().src' would be a good approach.
Problem: To make sure the javascript code is successfully linking to the blog page, I tried testing with this in my script.js file:
document.getElementById('testID').innerHTML = "Running test";
and this in my .html file:
<div id="testID"></div>
But the text "Running test" does not show up on the blog page. However, when I run this exact same test in my index.html page, it does work. Both .html files load the same script file along with jQuery. I don't understand why it works in one HTML file and not the other.
NEW FINDING:
This line of code now works on the blog page when I remove it from inside
$(document).ready(function(){ ... });
Why would that be?
The JavaScript in the current page can only access HTML elements that are in pages currently loaded into the browser.
More specifically, document.getElementById() ONLY searches the current web page's document for matching elements. It does not search any other pages and certainly does not search other files on your server that are not loaded into the browser. "Current web page" means the HTML loaded from the current URL in the browser bar.
When a web page is no longer visible in the browser window (e.g. it has been replaced by some other page), it is gone and no longer reachable by any JavaScript. In some specific cases, you can access documents loaded into other tabs or other frames (this is subject to same-origin security rules and requires a different method of access).
In addition, no changes to a web page are persistent in the browser. As soon as a web page is no longer loaded into an active browser window, it is gone, and reloading it will load the original, unmodified version of that document.
If you want the same code from one page to run in another page, then you must include that same code in the other page. If you want, you can share the code by putting it into its own file and then using a <script src="xxx.js"> tag in each page so that the same code gets loaded into each page.
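Because a shared script runs against whichever page loaded it, one defensive pattern is to check that the element exists before using it. This is a minimal sketch; setTextIfPresent is a made-up helper name, and the document is passed in as a parameter only so the function can be exercised outside a browser.

```javascript
// Sets the content of an element only if it exists in the given document.
// Returns true when the element was found, false otherwise.
function setTextIfPresent(doc, id, text) {
  const el = doc.getElementById(id);
  if (el === null) {
    return false; // this page's document has no element with that id
  }
  el.innerHTML = text;
  return true;
}

// In the browser, pass the current page's document:
//   setTextIfPresent(document, "testID", "Running test");
```

A false return value tells you the id is missing from the page that is currently loaded, or that the script ran before that element was parsed.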
If I interpret the question correctly, try using .load():
$("#container").load("/blog/blog_1.html #testID")

changing images within a 'flash embedded image'

We are a little stuck however on the following embedded flash image:
[kml_flashembed movie="http://www.griffintaxfree.com/images/logos/stacklogos.swf" height="250" width="500" /]
I need to know how to open this up, change the images within it, and then repost it to our web page.
It was created by someone who no longer handles our site.
You need a Flash Decompiler. Once the program is compiled to SWF it is very difficult to extract what's inside.
If you can get the program to display the image it's probably easier to take a screenshot.
Downloading the file...
If you can navigate directly to the URL (your case):
Use the Save Page As... feature, typically under File, or Page Settings in most browsers
If it is embedded within a page:
You can usually navigate to the file itself, by looking at the available items on the page, such as using the following in Firefox:
Tools => Page Info => Media => Find the flash file => Save As...
or by scouring through the source and finding a link to the file itself and saving it.
Editing the file...
The editing process however, can be much more difficult, especially if you don't have anyone with experience working with Flash. You will need a Flash Decompiler to make the file editable.

How to show loading screen before PDF document

In our application we have links to dynamically generated PDF documents. The links look something like host/22-5/file_3136.pdf, so to the browser it looks like a static PDF document. When a link is clicked, it opens a new window. That window receives the PDF document only (no HTML), with headers like:
Content-Disposition: inline; filename=file_3136.pdf
Content-Type: application/pdf
We want users to be able to see the PDF in the browser if a PDF plug-in is installed, and to be able to save the document with the correct filename.
Now we want to add a loading screen that is shown while the PDF is being generated. What's the best way to do that while retaining the current functionality?
One option would be to show the loading screen and then redirect to the PDF when generation is complete. This would require me to retain the PDF on the server for some time. Currently the PDFs are deleted as soon as the response is sent.
Another option is to send some HTML and JavaScript (to show the loading page) with an <embed>, <iframe> or <object> tag that points to the PDF on the server.
What's the best approach? What works with most browsers?
On download sites, I often see an additional (small) window pop up. I believe that window acts as a "choreographer" to control what displays on the main page while also firing off a redirect to the download file.
HTML redirects. You create a page that redirects "to itself" every few seconds. When the PDF is done, you generate a redirect to that instead.
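A minimal sketch of that self-redirecting page, assuming the server can tell whether the PDF is ready. The function name, the waiting-page URL, and the use of a meta refresh for the final redirect (rather than an HTTP redirect) are all assumptions for illustration.

```javascript
// Hypothetical server-side helper: builds the HTML to send for a request.
// While the PDF is not ready, the page reloads itself after `seconds`;
// once it is ready, the page redirects to the PDF immediately.
function loadingOrRedirectPage(pdfReady, selfUrl, pdfUrl, seconds) {
  const target = pdfReady ? pdfUrl : selfUrl;
  const delay = pdfReady ? 0 : seconds;
  return "<!DOCTYPE html>\n<html><head>" +
    "<meta http-equiv=\"refresh\" content=\"" + delay + ";url=" + target + "\">" +
    "</head><body>" +
    (pdfReady ? "" : "Generating PDF, please wait...") +
    "</body></html>";
}

// Example with placeholder URLs: still generating, so reload in 3 seconds.
console.log(loadingOrRedirectPage(false, "/22-5/file_3136/wait", "/22-5/file_3136.pdf", 3));
```

Note that this approach still requires keeping the generated PDF on the server until the final redirect has been followed.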
You can also preload the PDF file to give the impression of instant loading. In the page before the one that links to the PDF, add a preload script:
<img id="pdfLoader" src="preloader.jpg"/>
You can get a preloader image from ajaxload.info
<script>
// Point the placeholder image at the PDF so that the browser requests the
// file in the background. The image is never expected to render it.
var pdfLoader = document.getElementById("pdfLoader");
pdfLoader.src = "http://mysite.com/mypdf.pdf";
</script>
The code above, placed in the page that contains the link to the .pdf file (or to the HTML file with the .pdf in it), instructs the browser to download the PDF file into the browser cache and just leave it there. (The file type is irrelevant; an image is convenient, and the same script works for any file type because the file is never going to be rendered.) The download happens after the page is fully rendered, so it does not delay the current page. When the user clicks the link to the .pdf file (or HTML page), the browser finds the .pdf in its cache and displays it from there instead of downloading it, at apparently blinding download speeds.
In browsers with JavaScript disabled, the function degrades gracefully.
If you linearize the PDF, you can also display the first page very quickly. I wrote two articles introducing linearized PDF at http://www.jpedal.org/PDFblog/2010/11/do-i-have-to-download-the-whole-pdf-if-i-view-it-across-the-internet/ and http://www.jpedal.org/PDFblog/2010/02/linearized-pdf-files/
