I am using PhantomJS to retrieve CSS information from a page without executing its JavaScript. Here is a code snippet:
var page = require('webpage').create();
page.settings.javascriptEnabled = false;
page.open('file:///home/sample.html', function(status) {
  if (status !== 'success') {
    console.log('Unable to access network');
  } else {
    page.includeJs("file:///home/sample.js", function() {
      // "class" is a reserved word, so the variable needs a different name
      var className = page.evaluate(function() {
        return document.querySelector('body').className;
      });
      console.log(className);
    });
  }
});
If I disable JavaScript, the evaluate function always returns null. But when I enable JavaScript, the evaluate function returns a value. Is there any way to disable the page's own JavaScript while still letting my included JavaScript run?
No
page.evaluate() executes JavaScript on the page. If you disable JavaScript in PhantomJS, then you effectively can't use page.evaluate() anymore, and with it goes every way of accessing DOM elements. page.includeJs() will also not work, because the script cannot be executed on the page.
You can still access page.content, which provides the current page source (computed source). You may try to use a DOM library to parse the source into a DOM object [1] or, if the task is simple, you may try to use regular expressions.
[1] Note that PhantomJS and node.js have different execution environments, so most node.js modules that deal with the DOM won't work.
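As a rough illustration of the regular-expression route, here is a minimal sketch that pulls the class attribute of the body tag out of an HTML string such as page.content. The function name extractBodyClass is made up for this example, and the pattern assumes a quoted class attribute on a single body tag; it is not a general HTML parser.

```javascript
// Hypothetical helper: extract the class attribute of <body> from raw HTML.
// Assumes the attribute value is wrapped in single or double quotes.
function extractBodyClass(html) {
  var bodyTag = /<body\b[^>]*>/i.exec(html);
  if (!bodyTag) {
    return null; // no <body> tag found
  }
  // Require whitespace before "class" so attributes like data-class are skipped.
  var classAttr = /\sclass\s*=\s*(["'])(.*?)\1/i.exec(bodyTag[0]);
  return classAttr ? classAttr[2] : '';
}

// In PhantomJS this would be called as extractBodyClass(page.content).
console.log(extractBodyClass('<html><body class="home dark"><p>Hi</p></body></html>'));
```

For anything beyond a single attribute lookup, a real parser is likely the safer choice.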
As suggested by Artjom, there is no way to disable execution of the target website's JavaScript without disabling PhantomJS's ability to execute JavaScript on the page. However, there is a simple way to ensure that no scripts are executed by the target website (which achieves the same result in the end).
Create an HTTP proxy that intercepts all requests.
Detect responses with Content-Type: text/html.
Remove all <script> tags from the document.
You can configure PhantomJS to use the proxy with the --proxy option.
Use http-proxy to create a proxy server.
Use cheerio to remove, comment out, or otherwise invalidate the <script> tags.
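The steps above can be sketched roughly as follows. To keep the example free of dependencies, it swaps in Node's built-in http module and a regular expression in place of http-proxy and cheerio, so it is an approximation of the approach, not the setup described above. The names stripScripts and createStrippingProxy are hypothetical, and the sketch assumes plain HTTP targets and UTF-8 pages. It does not remove inline event-handler attributes such as onload.

```javascript
var http = require('http');

// Remove <script>...</script> pairs, then any stray opening <script> tags.
function stripScripts(html) {
  return html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '')
    .replace(/<script\b[^>]*>/gi, '');
}

// Hypothetical forward proxy: passes everything through untouched except
// text/html responses, which are buffered and stripped of script tags.
function createStrippingProxy() {
  return http.createServer(function (clientReq, clientRes) {
    var target;
    try {
      target = new URL(clientReq.url); // proxy requests carry an absolute URL
    } catch (e) {
      clientRes.writeHead(400);
      clientRes.end();
      return;
    }
    // Ask for an uncompressed body so it can be edited as text.
    var headers = Object.assign({}, clientReq.headers, { 'accept-encoding': 'identity' });
    var upstream = http.request({
      hostname: target.hostname,
      port: target.port || 80,
      path: target.pathname + target.search,
      method: clientReq.method,
      headers: headers
    }, function (upstreamRes) {
      var type = upstreamRes.headers['content-type'] || '';
      if (type.indexOf('text/html') !== 0) {
        clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(clientRes);
        return;
      }
      var chunks = [];
      upstreamRes.on('data', function (chunk) { chunks.push(chunk); });
      upstreamRes.on('end', function () {
        var html = Buffer.concat(chunks).toString('utf8');
        var body = Buffer.from(stripScripts(html), 'utf8');
        var outHeaders = Object.assign({}, upstreamRes.headers);
        delete outHeaders['transfer-encoding'];
        outHeaders['content-length'] = body.length;
        clientRes.writeHead(upstreamRes.statusCode, outHeaders);
        clientRes.end(body);
      });
    });
    upstream.on('error', function () {
      clientRes.writeHead(502);
      clientRes.end();
    });
    clientReq.pipe(upstream);
  });
}
```

Assuming port 8080 is free, calling createStrippingProxy().listen(8080) and starting PhantomJS with --proxy=127.0.0.1:8080 would route page loads through it.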
I have to use a local analytics.js that I serve from my server. I just want to use the local version if necessary - so is there a solution for checking if the call for analytics.js has failed?
I thought about checking it with a global window.onerror, but I don't think a failed call for an external file causes an error. I've tried checking if ga() is available, but it is even if analytics.js isn't loaded.
Any ideas? If you are wondering, not all users of this site have internet access, which is why I'm serving a local version. There are more things happening in this case, like adding a sendHitTask to redirect the answer from analytics.js to the local server.
EDIT
A solution where you check if the user has Internet access would also be OK. But I have not found any solution for this either that works on all modern browsers.
There's a function to track if the library has loaded. From the docs:
ga(function(tracker) {
var defaultPage = tracker.get('page');
});
The passed in function is executed when the library is loaded, so you could set a variable to keep track of whether or not it has loaded. You'd have to put it on some sort of timer to decide when you want to consider it failed:
var loaded = false;
ga(function() {
loaded = true;
});
// after one second do something if the library hasn't loaded
setTimeout(function(){
if (!loaded){
//do something
}
},1000);
Instead of waiting for a callback, you can check it directly with
if(window.ga && ga.loaded) {
// yeps... it is loaded!
}
You can easily see this in the Firefox documentation.
The same trick can be applied if you want to see whether the tracker is blocked (by a plugin, for example):
if(window.ga && ga.q) {
// yeps... blocked! >:o
}
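Combining the ga.loaded check with the timer idea from the earlier answer, a minimal sketch could poll until the library reports itself as loaded or a deadline passes. The names isAnalyticsLoaded and waitForAnalytics, the 50 ms polling interval, and the callback shape are assumptions made for this example.

```javascript
// True once analytics.js has replaced the command queue and set ga.loaded.
function isAnalyticsLoaded(win) {
  return Boolean(win.ga && win.ga.loaded);
}

// Hypothetical helper: poll for the library and call exactly one callback.
function waitForAnalytics(win, timeoutMs, onLoaded, onFailed) {
  var intervalMs = 50; // assumed polling interval
  var waited = 0;
  var timer = setInterval(function () {
    if (isAnalyticsLoaded(win)) {
      clearInterval(timer);
      onLoaded();
      return;
    }
    waited += intervalMs;
    if (waited >= timeoutMs) {
      clearInterval(timer);
      onFailed(); // e.g. load the local copy of analytics.js here
    }
  }, intervalMs);
  return timer;
}
```

In a page this might be used as waitForAnalytics(window, 1000, function () {}, loadLocalCopy), where loadLocalCopy is whatever function injects the locally served file.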
A particularly elegant solution would be to use RequireJS and leverage its support for fallback paths. I do this on my site to load a stub version of analytics.js if loading GA fails because the visitor uses a privacy tool blocking the request:
http://veithen.github.io/2015/02/14/requirejs-google-analytics.html
Your use case is similar, except that you want to fall back to a complete local copy. You also probably don't want to change all calls to GA as described in that article. If that's the case then you could use a hybrid approach where you only use RequireJS to load analytics.js (Google's version or the local copy), without changing any other code.
Setting this up would involve the following steps:
Add RequireJS to your site and configure it as follows:
require.config({
paths: {
"ga": [
"//www.google-analytics.com/analytics",
"local-copy-of-analytics"
]
}
});
Use the alternative version of the tracking code, but replace <script async src='//www.google-analytics.com/analytics.js'></script> with the following JavaScript code:
require(["ga"]);
Ultimately, what I want to do is look for the text It's just you! or It's not just you! within the container <div id="container"> in the source of the resulting page of isup.me/echelonservices.co.uk.
I have found a way to do this now, but cannot point the JavaScript to the correct site. I am using the following URL: isup.me/echelonservices.co.uk. However, I am having problems doing this. Can someone suggest a way, using JavaScript or another scripting language, to do this without hosting the page on a web server, that is, by running the page from the local client computer?
Here is my latest attempt, which has failed miserably so far:
<script type="text/javascript">
//Specify where the script should be running the code against.
window.location.protocol = "http"
window.location.host = "isup.me/echelonservices.co.uk"
//Look for the Class ID and output to a variable.
element = document.getElementById('container');
var Status = element.textContent || element.innerText;
//If the "Status" says: UP write UP, DOWN write DOWN and anything else write Status could not be determined!
if (Status=="UP") {document.write('<div style="color:#00BB00;font-family:Arial;font-weight:bold">UP</div>')}
else {if (Status=="DOWN") {document.write('<div style="color:#FF0000;font-family:Arial;font-weight:bold">DOWN</div>')}
else {document.write('<div style="color:#EDA200;font-family:Arial;font-weight:bold">WARNING:<br>Status could not be determined!</div>')};};
</script>
Fortunately, what you're doing isn't possible. JavaScript cannot read data from another domain (unless that domain is set up explicitly to allow it) - even if that domain is localhost. Otherwise, it would be possible to create a web page that loads Facebook in a hidden IFrame and steals a bunch of confidential user data.
What you'll need to do instead is implement this same logic up on your own web server, using the server side programming language of your choice (PHP, Java, C#, etc). You'd initiate an HTTP request to the desired server, parse the results accordingly, and then echo the results to the client. Basically, you're creating a proxy to that service.
If you're trying to do this without using a web server at all, you might want to check into another client-side technology such as WPF, Air, WinForms, Java, etc.
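Once the page has been fetched outside the browser (for example by a server-side proxy as described above), the "parse the results" step might look like the sketch below. The function name parseIsUpStatus is hypothetical, and the mapping of "It's just you" to UP and "It's not just you" to DOWN is an assumption about the wording that service used.

```javascript
// Hypothetical helper: classify the HTML returned by the status page.
function parseIsUpStatus(html) {
  // Drop tags and collapse whitespace so the phrases can be matched as text.
  var text = html.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ');
  // Accept a plain, typographic, or entity-encoded apostrophe.
  if (/It(?:'|\u2019|&#39;)s not just you/i.test(text)) {
    return 'DOWN';
  }
  if (/It(?:'|\u2019|&#39;)s just you/i.test(text)) {
    return 'UP';
  }
  return 'UNKNOWN';
}

console.log(parseIsUpStatus('<div id="container">It\'s just you. The site is up.</div>'));
```

The returned string can then drive the same UP, DOWN, or warning output that the original snippet writes to the page.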
If you are using an older version of Firefox, element.innerText isn't supported. Use element.innerHTML instead. Also, I'm not sure if this is a typo, but you have an extra closing bracket around your final else statement. I would recommend using a different syntax for your if statements to make them neater:
if (Status=="UP") {
document.write('<div style="color:#00BB00;font-family:Arial;font-weight:bold">UP</div>')
}
else if (Status=="DOWN"){
document.write('<div style="color:#FF0000;font-family:Arial;font-weight:bold">DOWN</div>')
}
else {
document.write('<div style="color:#EDA200;font-family:Arial;font-weight:bold">WARNING <br>Status could not be determined!</div>')
}
I'm currently saving a cookie in jQuery's document ready event handler, like:
$(function() {
document.cookie = <cookie with info not dependent on DOM>
});
Is it possible and safe to save a cookie even earlier, e.g. in a JavaScript statement outside any event handler that executes as the JavaScript file is being interpreted? Are there any browsers in which this may not be reliable?
It is 100% ok to read and write to cookies before the DOM has completed loading if you are not dependent on values from the DOM. If you use the Ghostery extension for Chrome and go to any website you can have a look at the tracking tags that load before the DOM is ready, most of which will be using normal cookies and that will give you an idea of how common it is to do this.
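A minimal sketch of writing the cookie at the top level of a script, with no DOM-ready handler, is shown below. The helper name buildCookie, the cookie name, and the one-hour lifetime are placeholders chosen for this example.

```javascript
// Hypothetical helper: build a cookie string with a lifetime in seconds.
function buildCookie(name, value, maxAgeSeconds) {
  return encodeURIComponent(name) + '=' + encodeURIComponent(value) +
    '; max-age=' + maxAgeSeconds + '; path=/';
}

// Runs as soon as the script file is evaluated, before the DOM is ready.
// The typeof guard only keeps the sketch runnable outside a browser.
if (typeof document !== 'undefined') {
  document.cookie = buildCookie('visited', 'yes', 3600);
}
```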
Using PhantomJS you can execute code in the browser by doing page.evaluate(). Are we opening ourselves up to an attack vector if we allow users to specify code which could be executed in that browser context? Is there a way to escape from the browser context into the phantomJS environment thereby executing commands on our servers?
Here's an example:
page.open(options.url, function(status) {
var test = function() {
return page.evaluate(function() {
return eval({{USER JAVASCRIPT STRING}});
});
};
var interval = setInterval(function() {
if (test()) {
clearInterval(interval);
// take screenshot, do other stuff, close phantom
}
}, 250);
});
From my understanding, the eval() occurring inside the page.evaluate() prevents them from ever escaping the context of the page which was opened. The user JavaScript string is passed as a string (it is not "compiled" into a single JavaScript file). It appears to me that it is no different than a user browsing to a site with a browser and attempting to hack away through their favorite JavaScript console. Thus, this usage does not represent a security vulnerability. Is this correct?
Update
To provide a little more clarity about the exact use case. The basic gist is that someone will go to a url, http://www.myapp.com/?url=http://anotherurl.com/&condition={{javascriptstring}}. When a worker is available, it will spin up a phantom instance, page.open the URL provided, and then when condition is met, it will take a screenshot of the webpage. The purpose for this is that some pages, especially those with massive amounts of async javascript, have bizarre "ready" conditions that aren't as simple as DOM ready or window ready. In this way the screenshot won't be taken until a javascript condition is true. Examples include $(".domNode").data("jQueryUIWidget").loaded == true or $(".someNode").length > 0.
I'm not very familiar with PhantomJS, but eval is inherently unsafe when it comes to running unknown code. It would be very easy to escape the intended context:
return page.evaluate(function() {
return eval({{javascriptstring}});
});
http://example.com/?url=http://anotherurl.com/&condition={{javascriptstring}}
How about where {{javascriptstring}} equals:
console.log('All your script are belong to us');
I'm not sure what kind of nasty things you could do with PhantomJS, but it's an example of a user being able to run any code they want, so this doesn't sound like a good idea. The user string could literally be an entire program.
To clarify, the injection vulnerability is not in page.evaluate(), it's in the eval in your code.
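One way to avoid the eval, sketched here under the assumption that a "does this element exist yet" check is enough, is to accept a CSS selector as data and pass it to page.evaluate() as an argument. The helper name makeSelectorCondition is made up for this example, and it only covers conditions like $(".someNode").length > 0, not arbitrary expressions.

```javascript
// Hypothetical helper: build a readiness check from a CSS selector instead of
// evaluating a user-supplied code string.
function makeSelectorCondition(page, selector) {
  return function () {
    // PhantomJS serializes the extra argument and hands it to the function
    // running in the page, so the selector arrives as a plain string.
    return page.evaluate(function (sel) {
      return document.querySelector(sel) !== null;
    }, selector);
  };
}
```

In the polling loop from the question, var test = makeSelectorCondition(page, '.someNode') would then take the place of the eval-based test function.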
Yes, this is DOM based XSS. This is a vulnerability that can be used to hijack user's (or administrative) sessions and expose users to other attacks.
If the input comes from a GET/POST or Fragment or part of the URL then it's very easy to exploit. If the input comes from the UI, then it can be exploited with clickjacking.
I work on a javascript library that customers include on their site to embed a UI widget. I want a way to test dev versions of the library live on the customer's site without requiring them to make any changes to their code. This would make it easy to debug issues and test new versions.
To do this I need to change the script include to point to my dev server, and then override the load() method that's called in the page to add an extra parameter to tell it what server to point to when making remote calls.
It looks like I can add JS to the page using a chrome extension, but I don't see any way to modify the page before it's loaded. Is there something I'm missing, or are chrome extensions not allowed to do this kind of thing?
I've done a fair amount of Chrome extension development, and I don't think there's any way to edit a page source before it's rendered by the browser. The two closest options are:
Content scripts allow you to toss in extra JavaScript and CSS files. You might be able to use these scripts to rewrite existing script tags in the page, but I'm not sure it would work out, since any script tags visible to your script through the DOM are already loaded or are being loaded.
WebRequest allows you to hijack HTTP requests, so you could have an extension reroute a request for library.js to library_dev.js.
Assuming your site is www.mysite.com and you keep your scripts in the /js directory:
chrome.webRequest.onBeforeRequest.addListener(
function(details) {
if( details.url == "http://www.mysite.com/js/library.js" )
return {redirectUrl: "http://www.mysite.com/js/library_dev.js" };
},
{urls: ["*://www.mysite.com/*.js"]},
["blocking"]);
The HTML source will look the same, but the document pulled in by <script src="library.js"></script> will now be a different file. This should achieve what you want.
Here's a way to modify content before it is loaded on the page using the WebRequest API. This requires the content to be loaded into a string variable before the onBeforeRequest listener returns. This example is for javascript, but it should work equally well for other types of content.
chrome.webRequest.onBeforeRequest.addListener(
function (details) {
var javascriptCode = loadSynchronously(details.url);
// modify javascriptCode here
return { redirectUrl: "data:text/javascript,"
+ encodeURIComponent(javascriptCode) };
},
{ urls: ["*://*.example.com/*.js"] },
["blocking"]);
loadSynchronously() can be implemented with a regular XMLHttpRequest. Synchronous loading will block the event loop and is deprecated in XMLHttpRequest, but it is unfortunately hard to avoid with this solution.
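A minimal sketch of such a loadSynchronously() helper is shown below. Returning null on a non-200 status is an assumption of this example; the listener would need to check for it before building the redirect URL.

```javascript
// Fetch a URL synchronously and return its text, or null on failure.
function loadSynchronously(url) {
  var request = new XMLHttpRequest();
  // The third argument (false) makes the request synchronous, so send()
  // blocks until the response has arrived.
  request.open('GET', url, false);
  request.send(null);
  return request.status === 200 ? request.responseText : null;
}
```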
You might be interested in the hooks available in the Opera browser. Opera used to have* very powerful hooks, available both to User JavaScript files (single-file things, very easy to write and deploy) and Extensions. Some of these are:
BeforeExternalScript:
This event is fired when a script element with a src attribute is encountered. You may examine the element, including its src attribute, change it, add more specific event listeners to it, or cancel its loading altogether.
One nice trick is to cancel its loading, load the external script in an AJAX call, perform text replacement on it, and then re-inject it into the webpage as a script tag, or using eval.
window.opera.defineMagicVariable:
This method can be used by User JavaScripts to override global variables defined by regular scripts. Any reference to the global name being overridden will call the provided getter and setter functions.
window.opera.defineMagicFunction:
This method can be used by User JavaScripts to override global functions defined by regular scripts. Any invocation of the global name being overridden will call the provided implementation.
*: Opera recently switched over to the Webkit engine, and it seems they have removed some of these hooks. You can still find Opera 12 for download on their website, though.
I have an idea that I haven't tried, but it should work in theory.
Run a content script before the document is loaded and have it register a ServiceWorker to replace the content of files requested by the page in real time. (A ServiceWorker can intercept all requests in the page, including those initiated directly through the DOM.)
Chrome extensions (Manifest V3) allow us to add rules for declarativeNetRequest:
chrome.declarativeNetRequest.updateDynamicRules({
addRules: [
{
"id": 1002,
"priority": 1,
"action": {
"type": "redirect",
"redirect": {
"url": "https://example.com/script.js"
}
},
"condition": {
"urlFilter": 'https://www.replaceme.com/js/some_script_to_replace.js',
"resourceTypes": [
'csp_report',
'font',
'image',
'main_frame',
'media',
'object',
'other',
'ping',
'script',
'stylesheet',
'sub_frame',
'webbundle',
'websocket',
'webtransport',
'xmlhttprequest'
]
}
},
],
removeRuleIds: [1002]
});
You can debug it by adding a listener:
chrome.declarativeNetRequest.onRuleMatchedDebug.addListener(
c => console.log('onRuleMatchedDebug', c)
)
It's not a Chrome extension, but Fiddler can change the script to point to your development server (see this answer for setup instructions from the author of Fiddler). Also, with Fiddler you can set up a search and replace to add that extra parameter that you need.