Get Source of Loaded URLs via Chrome Extension?

I'm writing a Chrome extension that needs to analyze the source code of a specific HTML page and all the external JavaScript and CSS files it loads, without fetching them again via XHR. In other words, it should analyze the copies the browser has already loaded.
Is that possible? I know it's possible to analyze the source of a particular open tab, but although these JavaScript files are loaded by the browser, they obviously don't occupy their own tab or window (only the HTML page that loads them does). Please help!

Out of the box, there is no way to get the source of the resources without resorting to the chrome.experimental.devtools.resources APIs.
However, when the experimental APIs are enabled using the --enable-experimental-extension-apis switch, you can do the following to retrieve the source of each resource:
chrome.experimental.devtools.resources.onFinished.addListener(function(resource) {
    resource.getContent(function(content, encoding) {
        if (encoding !== 'base64') {
            alert(content);
        }
    });
});
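As far as I know, this experimental namespace was later superseded by chrome.devtools.network, whose onRequestFinished event passes an entry with the same getContent(callback) method. The sketch below is a rough equivalent under that assumption; watchTextResources and its callback are names made up for this example, and the API is only available inside a DevTools page of the extension.

```javascript
// Sketch: report the text content of every finished request.
// `network` is expected to look like chrome.devtools.network (assumption:
// the code runs in a page declared as "devtools_page" in the manifest).
function watchTextResources(network, onText) {
  network.onRequestFinished.addListener(function (entry) {
    entry.getContent(function (content, encoding) {
      // Binary bodies arrive base64-encoded; skip them like the code above.
      if (encoding !== 'base64') {
        onText(entry.request.url, content);
      }
    });
  });
}

// Only runs inside a DevTools page; elsewhere `chrome.devtools` is undefined.
if (typeof chrome !== 'undefined' && chrome.devtools) {
  watchTextResources(chrome.devtools.network, function (url, content) {
    console.log(url, (content || '').length);
  });
}
```

Passing the network object in as a parameter keeps the filtering logic easy to exercise outside the browser.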

Related

How do I take a converted file (pdf to html), and open it locally in a new tab in google chrome?

Basically, I have a Django app that communicates with a Chrome extension. The extension provides a bunch of functionality that works with normal HTML pages, and I want to offer users the same functionality for PDF files. I have a Python script that translates a PDF file into an HTML page.
The problem I run into occurs when PDFs are opened locally in Chrome, for example:
file:///home/wcr5048/Downloads/sample_pdf.pdf
This is my current solution: it takes the converted HTML and uses it to replace the page's current HTML, which is just an embedded PDF. But I run into an issue because the "url" isn't really a URL, so I can't append HTML to something that doesn't exist.
function convert_to_html(request) {
    console.log('converting to html...');
    document.getElementsByTagName('html')[0].innerHTML = request.data;
    chrome.runtime.sendMessage({
        detail: 'refresh'
    });
}
What I don't want is for the user to have to download yet another file, like the PDF but converted into HTML. I would rather have everything happen automatically.
I only see two possible options:
I create a unique link for the converted pdf file for every user, and then send the raw html string to populate the corresponding view.
I somehow tell the extension to use a popup to cover the entire width of the screen, and then populate it with the data.
Are there any solutions that would be a better fit? If not, which of these two would be better?
Thanks for viewing

Chrome extension: How to show custom UI for a PDF file?

I'm trying to write a Google Chrome extension for showing PDF files. As soon as I detect that the browser is navigating to a URL pointing to a PDF file, I want to stop the default PDF viewer from loading and show my own UI instead. The UI will use PDF.js to render the PDF and jQuery UI to show some other stuff.
Question: how do I do this? It's very important to block the original PDF viewer, because I don't want to double memory consumption by showing two instances of the document. Therefore, I should somehow navigate the tab to my own view.

As the main author of the PDF.js Chrome extension, I can share some insights about the logic behind building a PDF Viewer extension for Chrome.
How to detect a PDF file?
In a perfect world, every website would serve PDF files with the standard application/pdf MIME-type. Unfortunately, the real world is not perfect, and in practice there are many websites which use an incorrect MIME-type. You will catch the majority of the cases by selecting requests that satisfy any of the following conditions:
The resource is served with the Content-Type: application/pdf response header.
The resource is served with the Content-Type: application/octet-stream response header, and its URL contains ".pdf" (case-insensitive).
Besides that, you also have to detect whether the user wants to view the PDF file or download the PDF file. If you don't care about the distinction, it's easy: Just intercept the request if it matches any of the previous conditions.
Otherwise (and this is the approach I've taken), you need to check whether the Content-Disposition response header exists and its value starts with "attachment".
If you want to support PDF downloads (e.g. via your UI), then you need to add the Content-Disposition: attachment response header. If the header already exists, then you have to replace the existing disposition type (e.g. inline) with "attachment". Don't bother with trying to parse the full header value, just strip the first part up to the first semicolon, then put "attachment" in front of it. (If you really want to parse the header, read RFC 2616 (section 19.5.1) and RFC 6266).
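The detection rules above can be sketched roughly as follows. This is not the code from the PDF.js extension; getHeader, isPdfResponse, isDownload and forceAttachment are names invented for this example, and `details` is assumed to have the shape of the object passed to chrome.webRequest.onHeadersReceived (a `url` plus a `responseHeaders` array of name/value pairs).

```javascript
// Look up a response header by name, case-insensitively.
function getHeader(responseHeaders, name) {
  var headers = responseHeaders || [];
  name = name.toLowerCase();
  for (var i = 0; i < headers.length; i++) {
    if (headers[i].name.toLowerCase() === name) {
      return headers[i].value || '';
    }
  }
  return '';
}

// True if the response looks like a PDF according to the two conditions above.
function isPdfResponse(details) {
  // Drop parameters such as "; charset=..." that may follow the media type.
  var type = getHeader(details.responseHeaders, 'Content-Type')
    .split(';')[0].trim().toLowerCase();
  if (type === 'application/pdf') return true;
  return type === 'application/octet-stream' &&
    details.url.toLowerCase().indexOf('.pdf') !== -1;
}

// True if the server asked for a download instead of inline display.
function isDownload(details) {
  var disposition = getHeader(details.responseHeaders, 'Content-Disposition');
  return /^\s*attachment/i.test(disposition);
}

// Replace the disposition type (everything before the first ";") with
// "attachment" and keep parameters such as filename=... untouched.
function forceAttachment(disposition) {
  var semicolon = disposition.indexOf(';');
  return 'attachment' + (semicolon === -1 ? '' : disposition.slice(semicolon));
}
```

For example, forceAttachment('inline; filename="a.pdf"') yields 'attachment; filename="a.pdf"'.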
Which Chrome (extension) APIs should I use to intercept PDF files?
The chrome.webRequest API can be used to intercept and redirect requests. With the following logic, you can intercept and redirect PDFs to your custom viewer that requests the PDF file from the given URL.
chrome.webRequest.onHeadersReceived.addListener(function(details) {
    // TODO: isPdfFile is a placeholder. Implement it using the detection
    // logic described at the top of this answer.
    if (!isPdfFile(details))
        return; // Nope, not a PDF file. Ignore this request.
    var viewerUrl = chrome.extension.getURL('viewer.html') +
        '?file=' + encodeURIComponent(details.url);
    return { redirectUrl: viewerUrl };
}, {
    urls: ["<all_urls>"],
    types: ["main_frame", "sub_frame"]
}, ["responseHeaders", "blocking"]);
(see https://github.com/mozilla/pdf.js/blob/master/extensions/chromium/pdfHandler.js for the actual implementation of the PDF detection using the logic described at the top of this answer)
With the above code, you can intercept any PDF file on http and https URLs.
If you want to view PDF files from the local filesystem and/or ftp, then you need to use the chrome.webRequest.onBeforeRequest event instead of onHeadersReceived. Fortunately, you can assume that if the file ends with ".pdf", then the resource is most likely a PDF file. Users who want to use the extension to view a local PDF file have to explicitly allow this at the extension settings page though.
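A minimal sketch of the onBeforeRequest variant for local files might look like this. The helper name makeLocalPdfRedirector and the viewer.html page are assumptions for the example, and the URL filters are my guess at suitable match patterns (match patterns are case-sensitive in the path, hence both spellings); check them against the linked pdfHandler.js before relying on them.

```javascript
// Build a listener that redirects any URL ending in ".pdf" to the viewer.
function makeLocalPdfRedirector(viewerUrl) {
  return function (details) {
    // No response headers exist at this stage, so only the file extension
    // can be checked. Ignore any query string or fragment while doing so.
    var path = details.url.split(/[?#]/)[0];
    if (!/\.pdf$/i.test(path)) {
      return; // Not a PDF; let the request continue.
    }
    return { redirectUrl: viewerUrl + '?file=' + encodeURIComponent(details.url) };
  };
}

// Registration only happens inside an extension background page.
if (typeof chrome !== 'undefined' && chrome.webRequest) {
  chrome.webRequest.onBeforeRequest.addListener(
    makeLocalPdfRedirector(chrome.runtime.getURL('viewer.html')),
    { urls: ['file://*/*.pdf', 'file://*/*.PDF'], types: ['main_frame', 'sub_frame'] },
    ['blocking']
  );
}
```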
On Chrome OS, use the chrome.fileBrowserHandler API to register your extension as a PDF Viewer (https://github.com/mozilla/pdf.js/blob/master/extensions/chromium/pdfHandler-vcros.js).
The methods based on the webRequest API only work for PDFs in top-level documents and frames, not for PDFs embedded via <object> and <embed>. Although they are rare, I still wanted to support them, so I came up with an unconventional method to detect and load the PDF viewer in these contexts. The implementation can be viewed at https://github.com/mozilla/pdf.js/pull/4549/files. This method relies on the fact that when an element is put in the document, it eventually has to be rendered. When it is rendered, CSS styles get applied. When I declare an animation for the embed/object elements in the CSS, animation events are triggered. These events bubble up in the document. I can then add a listener for this event, and replace the content of the object/embed element with an iframe that loads my PDF Viewer.
There are several ways to replace an element or content, but I've used Shadow DOM to change the displayed content without affecting the DOM in the page.
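To illustrate the animation-based detection, here is a simplified sketch. It is not the code from the pull request: the animation name, the 0.001s duration and the function names are invented for this example, and the Shadow DOM replacement step is left out.

```javascript
// Hypothetical animation name, used only to recognise our own events.
var DETECT_ANIMATION = 'detect-embed-or-object';

// CSS that starts a (practically invisible) animation on every
// <embed>/<object> as soon as the element is rendered.
function buildDetectorCss() {
  return '@keyframes ' + DETECT_ANIMATION + ' { from { opacity: 1; } to { opacity: 1; } }\n' +
    'embed, object { animation-name: ' + DETECT_ANIMATION + ';' +
    ' animation-duration: 0.001s; }';
}

// Inject the CSS and call onFound(element) for each rendered embed/object.
function watchEmbeds(doc, onFound) {
  var style = doc.createElement('style');
  style.textContent = buildDetectorCss();
  doc.documentElement.appendChild(style);
  // animationstart bubbles, so a single listener on the document suffices.
  doc.addEventListener('animationstart', function (event) {
    if (event.animationName === DETECT_ANIMATION) {
      onFound(event.target);
    }
  });
}

// In a content script:
if (typeof document !== 'undefined') {
  watchEmbeds(document, function (element) {
    console.log('Rendered', element.tagName, element.type);
  });
}
```

The callback is where the element's type would be inspected and, for PDFs, its content replaced with the viewer iframe.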
Limitations and notes
The method described here has a few limitations:
The PDF file is requested at least two times from the server: First a usual request to get the headers, which gets aborted when the extension redirects to the PDF Viewer. Then another request to request the actual data.
Consequently, if a file is valid only once, then the PDF cannot be displayed (the first request invalidates the URL and the second request fails).
This method only works for GET requests. There is no public API to directly get response bodies from a request in a Chrome extension (crbug.com/104058).
The method to get PDFs to work for <object> and <embed> elements requires a script to run on every page. I've profiled the code and found that the impact on performance is negligible, but you still need to be careful if you want to change the logic.
(I first tried to use Mutation Observers for detection, which slowed down the page load by 3-20% on huge documents, and caused an additional 1.5 GB peak in memory usage in a complex DOM benchmark).
The method to detect <object> / <embed> tags might still cause any NPAPI/PPAPI-based PDF plugins to load, because it only replaces the <embed>/<object> tag's content after the tag has already been inserted and rendered. Also, when a tab is inactive, animations are not scheduled, so the dispatch of the animation event will be significantly delayed.
Afterword
PDF.js is open-source; you can view the code for the Chrome extension at https://github.com/mozilla/pdf.js/tree/master/extensions/chromium. If you browse the source, you'll notice that the code is a bit more complex than I explained here. That's because extensions could not redirect requests at the onHeadersReceived event until I implemented it a few months ago (crbug.com/280464, Chrome 35).
And there is also some logic to make the URL in the omnibox look a bit better.
The PDF.js extension continues to evolve, so unless you want to significantly change the UI of the PDF Viewer, I suggest asking users to install the official PDF.js PDF Viewer from the Chrome Web Store, and/or opening issues on the PDF.js issue tracker for reasonable feature requests.

Custom PDF Viewer
Basically, to accomplish what you want to do you'll need to:
Intercept the PDF's URL when it's loaded;
Stop the PDF from loading;
Start your own PDF viewer and load the PDF inside it.
How to
Using the chrome.webRequest API you can easily listen to the web requests made by Chrome, and, more specifically, the ones that are going to load .pdf files. Using the chrome.webRequest.onBeforeRequest event you can listen to all the requests that end with ".pdf" and get the URL of the requested resource.
Create a page, for example display_pdf.html where you will show the PDFs and do whatever you want with them.
In the chrome.webRequest.onBeforeRequest listener, prevent the resource from being loaded by returning {redirectUrl: ...} to redirect to your display_pdf.html page.
Pass the PDF's URL to your page. This can be done in several ways, but, for me, the simplest one is to add the encoded PDF URL at the end of your page's url, like an encoded query string, something like display_pdf.html?url=http%3A%2F%2Fwww.example.com%2Fexample.pdf.
Inside the page, get the URL with JavaScript and process and render the PDF with any library you want, like PDF.js.
The code
Following the above steps, your extension will look like this:
<root>/
/background.js
/display_pdf.html
/display_pdf.js
/manifest.json
So, first of all, let's look at the manifest.json file: you will need to declare the permissions for webRequest and webRequestBlocking, so it should look like this:
{
    "manifest_version": 2,
    "name": "PDF Test",
    "version": "0.0.1",
    "background": {
        "scripts": ["/background.js"]
    },
    "permissions": ["webRequest", "webRequestBlocking", "<all_urls>"]
}
Then, in your background.js you will listen to the chrome.webRequest.onBeforeRequest event and redirect the request that is loading the PDF to your custom display_pdf.html page, like this:
chrome.webRequest.onBeforeRequest.addListener(function(details) {
    var displayURL;
    if (/\.pdf$/i.test(details.url)) { // if the resource URL ends with ".pdf"
        displayURL = chrome.runtime.getURL('/display_pdf.html') + '?url=' + encodeURIComponent(details.url);
        // stop the request and proceed to your custom display page
        return {redirectUrl: displayURL};
    }
}, {urls: ['*://*/*.pdf']}, ['blocking']);
And finally, in your display_pdf.js file you will extract the PDF's url from the query string and use it to do whatever you want:
var PDF_URL = decodeURIComponent(location.href.split('?url=')[1]);
// this will be something like http://www.somesite.com/path/to/example.pdf
alert('The PDF url is: ' + PDF_URL);
// do something with the pdf... like processing it with PDF.js
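Splitting on '?url=' works for the URL built above, but it misreads the value if another parameter or a fragment is ever appended to the page URL. A slightly more defensive sketch, using the standard URL and URLSearchParams objects, could look like this; buildDisplayUrl and extractPdfUrl are hypothetical helper names.

```javascript
// Build the display page URL the same way background.js does.
function buildDisplayUrl(pageUrl, pdfUrl) {
  return pageUrl + '?url=' + encodeURIComponent(pdfUrl);
}

// Read the "url" parameter back. searchParams.get() already percent-decodes
// the value and returns null when the parameter is missing.
function extractPdfUrl(href) {
  return new URL(href).searchParams.get('url');
}

// In display_pdf.js this would be used as:
if (typeof location !== 'undefined') {
  var pdfUrl = extractPdfUrl(location.href);
  console.log('The PDF url is: ' + pdfUrl);
}
```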
Working Example
A working example of what I said above can be found HERE.
Documentation links
I recommend taking a look at the official documentation of the APIs mentioned above, which you can find by following these links:
chrome.webRequest API
chrome.webRequest.onBeforeRequest event
chrome.runtime API
chrome.runtime.getURL method

How can I get the source code from a page using javascript

I'm trying to get the source code of this page (which uses JavaScript and the d3.js library):
http://www.brightpointinc.com/interactive/budget/index.html?source=d3js
I want to run it locally, so I've downloaded the source of the CSS file and the "scripts/d3.js" file and placed those files in the right location. However, it seems that the JavaScript doesn't load anyway. Is it possible to run the page locally from the downloaded source code? If so, how should I do it?

In Firefox, you can simply right-click the page and choose Save As... to download the complete website including all referenced files.
I looked at the website you mention, and the reason this doesn't work there is that the website is not completely client-side. The JavaScript makes requests to the server, which won't work from a local copy for security reasons.
This results in the following error:
NS_ERROR_DOM_BAD_URI: Access to restricted URI denied
d3.js (row 1674): request.send(data == null ? null : data);

Dynamically reload local JavaScript source / JSON data

What are the possible cross-browser (at least Firefox & Chrome) methods to dynamically reload a local JavaScript file that is referenced by a locally loaded HTML file?
Background:
A local HTML page is being used to render some data that is formatted and displayed by two referenced JavaScript files. One file contains the JavaScript code and the other file contains JSON data.
This JSON data is updated on disk by another program and it would be nice to have the UI automatically incorporate these updates without manually reloading the page (or opening a new page).
In Firefox, I believe the issue could be resolved using AJAX to load the HTML, but in Chrome this will not work due to the same origin policy failures (I unfortunately cannot necessarily rely on --disable-web-security to mitigate this since all prior instances of Chrome must be closed for that to work).
The only solution I see is to run a local web server, but I am hoping for something simpler and less invasive (Perhaps loading the JavaScript in an iframe and reloading the iframe, although I imagine this would be prevented by browser security).
Does anyone have any recommendations?

If your app starts up Chrome then you can include the --allow-file-access-from-files flag in the start command.

I use the following code to reload JavaScript and JSON files.
/* Load a JavaScript or JSON file to use its data */
function loadJsFile(filename, dataIsLoaded){
    var fileref = document.createElement('script');
    fileref.setAttribute("type","text/javascript");
    fileref.setAttribute("src", filename);
    fileref.onload = dataIsLoaded;
    if (typeof fileref!="undefined"){
        document.getElementsByTagName("head")[0].appendChild(fileref);
    }
}
/* The callback that is invoked when the file is loaded */
function dataIsLoaded(){
    console.log("Your data is ready to use");
}
Usage when the JSON file is in the same directory as the website:
var jsonFile= "myData.json";
loadJsFile(jsonFile, dataIsLoaded);
I tested it successfully in IE10 and Firefox 22; it doesn't work in Chrome though.
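If the browser keeps serving a cached copy when the file is loaded again, one common workaround is to append a changing query parameter to the file name. The sketch below is an assumption on my part rather than part of the tested code above: withCacheBuster is a made-up helper, the 5-second interval is arbitrary, and whether a query string is honoured for file:// URLs may differ between browsers.

```javascript
// Append a throwaway parameter so each load uses a distinct URL.
function withCacheBuster(filename, stamp) {
  var separator = filename.indexOf('?') === -1 ? '?' : '&';
  return filename + separator + '_=' + encodeURIComponent(stamp);
}

// Browser-only usage, reusing loadJsFile() and dataIsLoaded() from above:
// setInterval(function () {
//   loadJsFile(withCacheBuster('myData.json', Date.now()), dataIsLoaded);
// }, 5000);
```

Note that every reload adds another script element to the head, so a long-running page may want to remove the previous one first.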

QWebView can't display external image content set via javascript $("#<node_name>").html(<html_content>)

I'm migrating a project from MFC to Qt. It uses an embedded web browser that displays a local (resource) HTML page. The local page itself is displayed fine, no problems. But I have a problem setting HTML content on a child tag: QWebView can't display external images set via JavaScript $("#<node_name>").html(<html_content>); only text and local (resource) images are displayed. In the MFC version, with the IE web view, the same script works fine.
I've tried to use QWebElement::setInnerXml, but the result is the same: only local content is displayed.
After that I tried to use QWebFrame::setHtml, but after the call the app crashes somewhere in QWebPluginDatabase::searchPaths, even though I'm calling QWebFrame::setHtml from the main thread.
Has anyone run into the same problem? Does anyone have a solution?
Thank you

You may need to change a setting. Try:
QWebSettings::globalSettings()->setAttribute(
    QWebSettings::LocalContentCanAccessRemoteUrls, true);
The QWebSettings documentation describes the attribute as (emphasis mine):
Specifies whether locally loaded documents are allowed to access remote urls. This is disabled by default. For more information about security origins and local vs. remote content see QWebSecurityOrigin.
