How to reference HTML from external webpage - javascript

I apologize in advance for the rudimentary question.
I have web page A that has a link to web page B on it. I need to locate the link to web page B (easy enough), and then store the HTML from web page B in a variable in my javascript script.
To store the HTML from web page A, I know it's a simple:
html_A = document.body.innerHTML;
How do I store the HTML from web page B? I believe I need to use AJAX, correct? Or can I do it with plain JavaScript? And if it's the former, let's just assume the server for web page B allows for it.
Thank you in advance!

If you're trying to load HTML from a website that resides on a different server, you will get a Cross-Origin Request Blocked error. I dealt with this in the past and found a way to do it using YQL. Try it out:
//This code is located on Website A
$(document).ready(function() {
    var websiteB_url = 'http://www.somewebsite.com/page.html';
    var yql = '//query.yahooapis.com/v1/public/yql?q=' + encodeURIComponent('select * from html where url="' + websiteB_url + '"') + '&format=xml&callback=?';
    $.getJSON(yql, function(data) {
        // Strip markup that should not be injected into the results container.
        function filterDataCUSTOM(data) {
            data = data.replace(/<\/?body[^>]*>/g, ''); // no body tags (opening or closing)
            data = data.replace(/[\r\n]+/g, ''); // no linebreaks
            return data;
        }
        if (data.results[0]) {
            var res = filterDataCUSTOM(data.results[0]);
            $("div#results").html(res);
        } else {
            console.log("Error: Could not load the page.");
        }
    });
});

This is only possible if web page B is on the same domain due to the same-origin policy security feature of all major browsers.
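As a rough illustration of what "same domain" means here: browsers compare origins, i.e. scheme, host and port together. The helper below is a hypothetical sketch (the name isSameOrigin is not from any library) that uses the standard URL class to compare two absolute URLs.

```javascript
// Hypothetical helper: returns true when both absolute URLs share
// the same scheme, host and port (the "origin" the browser compares).
function isSameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Same scheme, host and (default) port: same origin.
console.log(isSameOrigin("http://example.com/a.html", "http://example.com/b.html"));
// Different scheme: a different origin, so the request would be blocked.
console.log(isSameOrigin("http://example.com/a.html", "https://example.com/b.html"));
```

If this returns false for page A and page B, a plain AJAX request from A to B will be blocked unless B's server explicitly allows it.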
If both pages are on the same domain you could do
$.get("/uri/to/webpage/b").then(function(html) {
//do something with the html;
});
Note that the html will be available only once the ajax request finishes inside the .then(...) function. It will NOT be available on the line after this code block.
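To make the timing point concrete, here is a minimal sketch that uses the browser's standard fetch API instead of jQuery's $.get. The function name loadHtml and its second parameter (a replaceable fetch implementation, handy for trying it out without a network) are assumptions made for this illustration.

```javascript
// Hypothetical helper: resolves with the HTML text of a same-origin page.
// fetchImpl defaults to the global fetch; it is a parameter only so the
// function can be exercised with a stand-in.
async function loadHtml(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  return response.text();
}

// Usage (in a page on the same origin as page B):
//   loadHtml("/uri/to/webpage/b").then(function (html_B) {
//     // html_B is available here, once the request has finished
//   });
//   // html_B is NOT available here; the request is still in flight
```

The call returns a promise immediately, which is why the HTML can only be used inside the callback (or after an await).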
It's hard to tell without knowing more about your situation, but this is rarely the correct thing to do. You might want to look into $.fn.load() (which is limited by the SOP) or iframes (displaying a page in an iframe is not limited by the SOP, although reading its contents from script still is), as one of these might be more appropriate.
I should note that the standard way of doing this when you need access to html from another domain is to pull it down on your webserver and then re-serve it from there. That being said, it is probably a violation of that website's terms of use.

Related

Print a form from another link on clicking a button

I'm trying to figure out how to print an image from another web page link on clicking a button.
I know window.print() but how could I specify the other link I want to print the image from?
Same domain
If the page you wish to print is from the same domain as the iframe's parent then MDN has a good example of how to do this.
You should create a hidden iframe, load your page in it, print the iframe contents and then remove the iframe.
JavaScript:
function printURL( url ) {
var frame = document.createElement( "iframe" );
frame.onload = printFrame;
frame.style.display = 'none';
frame.src = url;
document.body.appendChild(frame);
return false;
}
function printFrame() {
this.contentWindow.__container__ = this;
this.contentWindow.onbeforeunload = closeFrame;
this.contentWindow.onafterprint = closeFrame;
this.contentWindow.focus(); // Required for IE
this.contentWindow.print();
}
function closeFrame () {
document.body.removeChild(this.__container__);
}
HTML:
<button onclick="printURL('page.html');">Print external page!</button>
Cross domain
If the page you wish to print is from another domain then your browser will throw a Same-Origin Policy error. This is a security feature that forbids scripts accessing some data from different domains.
To print cross domain content you will need to scrape the page's source and load it into the iframe. The browser will then believe that the iframe's content comes from your domain and won't hiccough when you try to print.
However, if you try to do this in the frontend, this just pushes the problem back one step further, as the same-origin policy also won't let you scrape content from another domain in this way. But the same-origin policy for data scraping is the equivalent of tying a bull up with cotton thread - it doesn't really hold you back - so this hurdle is easily circumvented. You can either write your own backend script (in PHP or your choice of language) that will scrape the content and deliver it to your page, or you can use any one of a number of web services that already do this. https://multiverso.me/AllOrigins/ is as good as any, it doesn't require backend programming, and it's free so I'll use that in this example.
Using jQuery, the modified printURL function from above would be:
function printURL( url ) {
    // The web service that allows content to be scraped from different origins.
    var jsonUrl = 'http://allorigins.me/get?url=' + encodeURIComponent(url) + '&callback=?';
    $.getJSON( jsonUrl, function( data ) {
        // The scraped content is in data.contents.
        var frame = document.createElement( "iframe" );
        frame.onload = printFrame;
        frame.style.display = 'none';
        // Load the scraped markup as the iframe's own document.
        // Note: srcdoc is not supported by Internet Explorer.
        frame.srcdoc = data.contents;
        document.body.appendChild(frame);
    });
    return false;
}
The other functions from above would remain the same.
Note that if the page you're printing is built using AJAX calls or is significantly styled with scripting then the iframe may print something that looks quite unlike what you were expecting.

Browser does not replace an include file by its including file

I am an absolute beginner in JS.
1) What I'm trying to do:
My web pages are composed of an index.php which is the same for all the files of a directory and one of a set of content.inc, like this: index.php?open=content.inc. This is done by a PHP snippet in the index.php and works well.
However, Google indexes all the content.inc files. The user's browser then displays the content.inc without the framing index.php. This I want to avoid. I therefore add a modest script at the beginning of each content.inc (which I would convert into a function once it runs) to tell the browser that instead of displaying the content.inc, it should display index.php?open=content.inc.
2) My unworkable solution:
var url = window.location.pathname;
var filename = url.substring(url.lastIndexOf('/')+1);
if (filename.indexOf("index.php") = -1)
{ var frame_name = "index.php?open="+filename;
window.location.replace(frame_name);
};
The browser (Firefox 60) ignores this; it displays content.inc. (I also have versions of this script which get the browser into an endless loop.)
What is wrong here? Please help!
PS: Please be assured that I have done extensive web search on this problem and found many pages of complaints about location.replace getting into an infinite loop; but none matches my situation. However, I gratefully accept a helpful link as an answer.
For starters, you have an error in this line:
if (filename.indexOf("index.php") = -1)
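That's an assignment, not a comparison. You cannot assign to the result of a function call, so the script fails with an error and the redirect never runs. You need to use == or === (=== is generally preferred because it does no type coercion).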
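For illustration, here is the asker's check with the comparison corrected and wrapped in a function, so the logic can be tried without triggering a redirect. The function name framedTarget is made up for this sketch.

```javascript
// Hypothetical helper: given a pathname, returns the framed URL to
// redirect to, or null when the page is already index.php.
function framedTarget(pathname) {
  var filename = pathname.substring(pathname.lastIndexOf("/") + 1);
  // === compares; a single = would try to assign and fail.
  if (filename.indexOf("index.php") === -1) {
    return "index.php?open=" + filename;
  }
  return null;
}

// In the page itself this could be used as:
//   var target = framedTarget(window.location.pathname);
//   if (target) { window.location.replace(target); }
```

Returning null for index.php is what prevents the page from redirecting to itself in a loop.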
The faulty line is your test condition (see JP de la Torre's answer). To improve on it, here's a snippet that demonstrates how to analyze the URL with a regular expression:
function redirect(url) {
if(url && url.indexOf('.inc') >= 0) {
return url.replace(/\/(\w+)\.inc/, '/index.php?open=$1');
}
return url;
}
let urls = [
window.location.href,
'http://google.fr',
'http://example.com/index.php?open=wazaa',
'http://example.com/wazza.inc'
];
urls.forEach(url => {
console.log(url, ' => ', redirect(url));
});
The regexp captures the run of word characters (letters, digits and underscores) between a / and .inc. You can then use the captured text in the replacement value as $1.
Applied to your case, you simply need:
if(window.location.href.indexOf('.inc') >= 0) {
window.location.href = window.location.href.replace(/\/(\w+)\.inc/, '/index.php?open=$1');
}
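One limitation of the regular expression above is that \w+ does not match hyphens, so a file such as my-page.inc would be left untouched. The following variant is only a sketch under that assumption; it uses the standard URL class, and the name toFramedUrl is invented for the example.

```javascript
// Hypothetical variant: rewrites .../name.inc to .../index.php?open=name,
// accepting any file name characters except "/".
function toFramedUrl(href) {
  var url = new URL(href);
  var match = url.pathname.match(/([^\/]+)\.inc$/);
  if (!match) {
    return href; // not an .inc file: leave the URL unchanged
  }
  // Replace the trailing "name.inc" with "index.php" and pass the name along.
  url.pathname = url.pathname.slice(0, -match[0].length) + "index.php";
  url.search = "?open=" + encodeURIComponent(match[1]);
  return url.href;
}

console.log(toFramedUrl("http://example.com/dir/my-page.inc"));
```

Because only the path is inspected, an .inc that merely appears in the query string does not trigger a rewrite.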
You can also use .htaccess on the server side to redirect requests for .inc files to your index.php, if mod_rewrite is enabled.
The solution to the problem of including an INC file called separately is the one proposed by Bertrand in his second code snippet above. It presupposes (correctly) that the inc extension is omitted in the replacement.
As I reported above, Firefox may get into an endless loop if it opens a PHP file directly, i.e. without involving the local host (with its php module).

detect XHR on a page using javascript

I want to develop a Chrome extension, just imagine when Facebook loads you are allowed to add extra JS on it.
But my problem is I can't modify the DOM of the later content, meaning the newly loaded content that appears when the user scrolls down.
So I want to detect XHR using JavaScript.
I tried
send = XMLHttpRequest.prototype.send;
XMLHttpRequest.prototype.send = function() {
/* Wrap onreadystatechange callback */
var callback = this.onreadystatechange;
this.onreadystatechange = function() {
if (this.readyState == 4) {
/* We are in response; do something, like logging or anything you want */
alert('test');
}
callback.apply(this, arguments);
}
_send.apply(this, arguments);
}
But this is not working.. any ideas?
Besides Arun's correct remark that you should use _send for both, your approach doesn't work because of how Content Scripts work.
The code running in the content script works in an isolated environment, to prevent it from conflicting with page's own code. So it's not like you described - you're not simply adding JS to the page, you have it run isolated. As a result, your XHR replacement only affects XHR calls from your extension's content scripts and not the page.
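For reference, a version of the wrapper with the original send kept under one consistent name might look like the sketch below. The function name watchXhrSend and its parameters are assumptions for illustration; the class is passed in so the same code can be tried against a stand-in. As explained above, it only sees the page's requests if it runs in the page's own context rather than in the content script.

```javascript
// Hypothetical helper: wraps XhrClass.prototype.send so that onSend is
// called for every request, then forwards to the original send.
// Returns a function that removes the wrapper again.
function watchXhrSend(XhrClass, onSend) {
  var originalSend = XhrClass.prototype.send;
  XhrClass.prototype.send = function () {
    onSend(this, arguments);
    return originalSend.apply(this, arguments);
  };
  return function restore() {
    XhrClass.prototype.send = originalSend;
  };
}

// Usage in a page context:
//   watchXhrSend(XMLHttpRequest, function (xhr) {
//     xhr.addEventListener("loadend", function () { /* response arrived */ });
//   });
```

Using addEventListener on the request avoids having to wrap onreadystatechange, which may not be set yet when send is called.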
It's possible to inject the code into the page itself. This will affect XHRs made by the page. The page's Content Security Policy should not be a problem according to the docs, and Facebook's CSP appears to allow it in any case. So this approach should work; see the question I linked.
That said, you're not specifically looking for AJAX calls, you're looking for new elements being inserted in the DOM. You can detect that without modifying the page's code, using DOM MutationObservers.
See this answer for more information.
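A minimal sketch of the MutationObserver approach, assuming you only care about nodes being added somewhere under a given element: the helper name watchAddedNodes and its third parameter (a replaceable observer class, included only so the function can be exercised outside a browser) are inventions for this example.

```javascript
// Hypothetical helper: calls onAdded for every node inserted under target.
// ObserverClass defaults to the browser's MutationObserver.
function watchAddedNodes(target, onAdded, ObserverClass = MutationObserver) {
  var observer = new ObserverClass(function (mutations) {
    for (var mutation of mutations) {
      for (var node of mutation.addedNodes) {
        onAdded(node);
      }
    }
  });
  // childList + subtree: report insertions anywhere below target.
  observer.observe(target, { childList: true, subtree: true });
  return observer;
}

// Usage in a content script:
//   var observer = watchAddedNodes(document.body, function (node) {
//     // inspect or modify the newly inserted node
//   });
//   observer.disconnect(); // when no longer needed
```

This works from the content script's isolated environment, because content scripts share the page's DOM even though they do not share its JavaScript objects.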
To detect AJAX calls on a webpage, you have to inject the code directly into that page and then call .ajaxStart or .ajaxSuccess (these are jQuery methods, so this only works if the page itself uses jQuery).
Example:
// To Successfully Intercept AJAX calls, we had to embed the script directly in the Notifications page
var injectedCode = '(' + function() {
$('body').ajaxSuccess(function(evt, request, settings) {
if (evt.delegateTarget.baseURI == 'URL to check against if you want') {
// do your stuff
}
});
} + ')();';
// Inserting the script into the page
var script = document.createElement('script');
script.textContent = injectedCode;
(document.head || document.documentElement).appendChild(script);
script.parentNode.removeChild(script);

Is it possible to control Firefox's DNS requests in an addon?

I was wondering if it was possible to intercept and control/redirect DNS requests made by Firefox?
The intention is to set an independent DNS server in Firefox (not the system's DNS server)
No, not really. The DNS resolver is made available via the nsIDNSService interface. That interface is not fully scriptable, so you cannot just replace the built-in implementation with your own Javascript implementation.
But could you perhaps just override the DNS server?
The built-in implementation goes from nsDNSService to nsHostResolver to PR_GetAddrByName (nspr) and ends up in getaddrinfo/gethostbyname. And that uses whatever the system (or the library implementing it) has configured.
Any other alternatives?
Not really. You could install a proxy and let it resolve domain names (this requires some kind of proxy server, of course). But that is very much a hack and nothing I'd recommend (and if the user already has a real, non-resolving proxy configured, you would need to handle that as well).
You can detect the "problem loading page" and then probably use the redirectTo method on it.
Basically, these pages all load an about:neterror URL with a bunch of information after it, e.g.:
about:neterror?e=dnsNotFound&u=http%3A//www.cu.reporterror%28%27afew/&c=UTF-8&d=Firefox%20can%27t%20find%20the%20server%20at%20www.cu.reporterror%28%27afew.
about:neterror?e=malformedURI&u=about%3Abalk&c=&d=The%20URL%20is%20not%20valid%20and%20cannot%
This information is held in the document URI (docuri), so that is what you have to check. Here's example code that will detect problem loading pages:
var listenToPageLoad_IfProblemLoadingPage = function(event) {
var win = event.originalTarget.defaultView;
// This is bad practice: it returns the documentURI of the currently focused tab.
// It should get the linkedBrowser for the tab by going through the event, e.g.
// event.originalTarget.linkedBrowser.webNavigation.document.documentURI
// (I didn't test this linkedBrowser theory, but it has to be something like that.)
var docuri = window.gBrowser.webNavigation.document.documentURI;
var location = win.location + ''; //I add a " + ''" at the end so it makes it a string so we can use string functions like location.indexOf etc
if (win.frameElement) {
// Frame within a tab was loaded. win should be the top window of
// the frameset. If you don't want do anything when frames/iframes
// are loaded in this web page, uncomment the following line:
// return;
// Find the root document:
//win = win.top;
if (docuri.indexOf('about:neterror') == 0) {
Components.utils.reportError('IN FRAME - PROBLEM LOADING PAGE LOADED docuri = "' + docuri + '"');
}
} else {
if (docuri.indexOf('about:neterror') == 0) {
Components.utils.reportError('IN TAB - PROBLEM LOADING PAGE LOADED docuri = "' + docuri + '"');
}
}
}
window.gBrowser.addEventListener('DOMContentLoaded', listenToPageLoad_IfProblemLoadingPage, true);
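Once such a page has been detected, the query string of the about:neterror URL can be picked apart to find out which error occurred and for which address. The sketch below assumes the e, u and d parameters seen in the example URLs above; the function name parseNetError is made up for illustration.

```javascript
// Hypothetical helper: extracts the error code (e), the failed URL (u)
// and the description (d) from an about:neterror document URI.
// Returns null for any other URI.
function parseNetError(docuri) {
  var prefix = "about:neterror?";
  if (docuri.indexOf(prefix) !== 0) {
    return null;
  }
  var params = new URLSearchParams(docuri.slice(prefix.length));
  return {
    error: params.get("e"),
    url: params.get("u"),
    description: params.get("d")
  };
}

var info = parseNetError("about:neterror?e=dnsNotFound&u=http%3A//www.example.com/&c=UTF-8&d=Firefox%20can%27t%20find%20the%20server");
console.log(info.error, info.url);
```

Checking that error equals dnsNotFound would let the add-on react only to failed DNS lookups rather than to every kind of load error.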

Making a Same Domain iframe Secure

tl;dr Can I execute un-trusted scripts on an iframe safely?
Back story:
I'm trying to make secure JSONP requests. A lot of older browsers do not support Web Workers which means that the current solution I came up with is not optimal.
I figured I could create an <iframe> and load a script inside it. That script would perform a JSONP request (creating a script tag), which would post a message to the main page. The main page would get the message, execute the callback and destroy the iframe. I've managed to do this sort of thing.
function jsonp(url, data, callback) {
var iframe = document.createElement("iframe");
iframe.style.display = "none";
document.body.appendChild(iframe);
var iframedoc = iframe.contentDocument || iframe.contentWindow.document;
var sc = document.createElement("script");
sc.textContent = "(function(p){ cb = function(result){p.postMessage(result,'http://fiddle.jshell.net');};})(parent);";
//sc.textContent += "alert(cb)";
iframedoc.body.appendChild(sc);
var jr = document.createElement("script");
var getParams = ""; // serialize the GET parameters
for (var i in data) {
getParams += "&" + i + "=" + data[i];
}
jr.src = url + "?callback=cb" + getParams;
iframedoc.body.appendChild(jr);
window.onmessage = function (e) {
callback(e.data);
document.body.removeChild(iframe);
}
}
jsonp("http://jsfiddle.net/echo/jsonp/", {
foo: "bar"
}, function (result) {
alert("Result: " + JSON.stringify(result));
});
The problem is that since the iframes are on the same domain, the injected script still has access to the external scope through .top or .parent and such.
Is there any way to create an iframe that can not access data on the parent scope?
I want to create an iframe where scripts added through script tags will not be able to access variables on the parent window (and the DOM). I tried stuff like top=parent=null but I'm really not sure that's enough, there might be other workarounds. I tried running a for... in loop, but my function stopped working and I was unable to find out why.
NOTE:
I know optimally WebWorkers are a better isolated environment. I know JSONP is a "bad" technique (I even had some random guy tell me he'd never use it today). I'm trying to create a secure environment for scenarios where you have to perform JSONP queries.
You can't really delete the references; setting them to null will just silently fail, and there is always a way to get a reference to the parent DOM.
References like frameElement and frameElement.defaultView etc. cannot be deleted. Attempting to do so will either silently fail or throw an exception, depending on the browser.
You could look into Caja/Cajita though.
tl;dr no
Any untrusted script can steal cookies (like a session id!) or read information from the DOM like the value of a credit card input field.
JavaScript relies on the security model that all code is trusted code. Any attempt at access from another domain requires explicit whitelisting.
If you want to sandbox your iframe, you can serve the page from another domain. This does mean that you can't share a session or do any kind of communication, because it can be abused. It's just like including an unrelated website. Even then there are possibilities for abuse if you allow untrusted JavaScript. You can, for instance, do window.top.location.href = 'http://my.phishing.domain/'; and the user might not notice the redirect.
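If the iframe is served from another domain and reports back through postMessage, the receiving page should at least check where each message came from before acting on it. The following is a sketch of such a check; the helper name makeMessageHandler and the example origin are assumptions, not part of the question's code.

```javascript
// Hypothetical helper: builds a message event handler that only passes
// data from one explicitly trusted origin to the callback.
// The handler returns true when the message was accepted.
function makeMessageHandler(trustedOrigin, callback) {
  return function (event) {
    if (event.origin !== trustedOrigin) {
      return false; // ignore messages from any other origin
    }
    callback(event.data);
    return true;
  };
}

// Usage in the parent page (the origin shown is a placeholder):
//   window.addEventListener("message",
//     makeMessageHandler("https://sandbox.example.com", function (result) {
//       // handle the JSONP result
//     }));
```

This does not make the untrusted script safe; it only narrows which frames the parent page is willing to listen to.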
