Anonymizing Google Pagespeed - javascript

I'm new to javascript. How do I anonymize Google Pagespeed?
Here is the original code:
http://pastebin.com/xRbTekDA. It works when I load the page.
Here is the anonymized code: http://pastebin.com/fj9rP7FM. It shows a JavaScript error every time I load the page: "ReferenceError: runPagespeedCallbacks is not defined". This happens because I anonymized it.
How do I anonymize that original code?

The problem is that the function the code expects to call is not in scope. If you modify the code slightly, the error should go away. This code should fix the issue: http://pastebin.com/RrQ2848j
Notice I'm just returning the callback function and assigning it to a variable. There are other approaches you can take, but there needs to be something in the global scope to call.
The reason is that a script element is created to fetch the script and data, because an AJAX (XHR) request would violate the same-origin policy by reaching out to google.com while executing on yourdomain.com. When the script is downloaded, it expects to call a function in the global scope and pass some data into it. That function is named in the query string of the src attribute when the script element is created, as shown here:
function runPagespeed() {
  var s = document.createElement('script');
  s.type = 'text/javascript';
  s.async = true;
  var query = [
    'url=' + YN_URL,
    'callback=runPagespeedCallbacks',
    'key=' + API_KEY
  ].join('&');
  s.src = API_URL + query;
  document.head.insertBefore(s, null);
}

The only difference between the two is that the second is wrapped in an immediately invoked function expression (IIFE). The IIFE encapsulates the code so that the free variables are not globally visible. Normally this is a good thing, but if other services rely on that code, it will not be visible.
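One way to keep the IIFE and still satisfy the downloaded script is to publish only the callback on the global object. The following is a minimal sketch, not the code from the pastebin links: getPagespeedResults and the sample payload are hypothetical and exist only so the example can be run and inspected outside a browser.

```javascript
// Sketch: keep helpers private in an IIFE, but publish the one function
// that the downloaded script needs to call by name.
(function (root) {
  // Private: not visible outside the IIFE.
  var received = [];

  function runPagespeedCallbacks(result) {
    received.push(result);
  }

  // Public: the name given in the callback= query parameter must exist globally.
  root.runPagespeedCallbacks = runPagespeedCallbacks;

  // Hypothetical helper, exposed only so this example can be inspected.
  root.getPagespeedResults = function () {
    return received.slice();
  };
})(typeof window !== 'undefined' ? window : globalThis);

// Simulate what the downloaded script does: call the global callback.
// The payload here is a placeholder, not real API output.
runPagespeedCallbacks({ id: 'example' });
```

Everything else inside the IIFE stays private, so only one name is added to the global scope.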

Related

Unable to access Global variables in chrome.tabs.executescript in Chrome version 55

I recently updated Chrome to version 55.0.2883.75.
I am using a self developed Chrome Plugin to parse my HTML files wherein I use chrome.tabs.executescript to get data from background HTML page.
So when I execute chrome.extension.onRequest, I save the background page's parsed data to a global variable and access it in the callback function of chrome.tabs.executescript and process it.
This was working fine till I update to Version 55.0.2883.75.
How can I access the global variables in the new version ??
My Code Below :
Step 1 :
chrome.extension.onRequest.addListener(
function (request, sender, sendResponse) {
parser = new DOMParser();
htmlDoc = parser.parseFromString(request.content, "text/html");
//outputJson is a global variable which is Populated here
outputJson = parseMyPage(outputJson, htmlDoc);
});
Step 2:
chrome.tabs.getSelected(null, function (tab) {
// Now inject a script onto the page
chrome.tabs.executeScript(tab.id,{
code: "chrome.extension.sendRequest({content: document.body.innerHTML}, function(response) { console.log('success'); });"
}, function () {
//my code to access global variables
if (outputJson && null != outputJson) {
// other stuff
}
});
});
The way your code is designed, you are relying on the order in which two asynchronous blocks of code are executed: the extension.onRequest event and the callback for tabs.executeScript(). Your code requires that the extension.onRequest event fires before the tabs.executeScript() callback is executed. There is no guarantee that this will be the order in which these occur. If this is a released extension, it is quite possible that this was failing on users' machines, depending on their configuration. It is also possible that the code in Chrome, prior to Chrome 55, resulted in the event and callback always happening in the order you required.
The solution is to rewrite this so that it does not require any particular order of execution for these asynchronous code blocks. Fortunately, there is a way to do that and reduce complexity at the same time.
You can transfer the information you desire from the content script to your background script directly into the callback of the tabs.executeScript(), without the need to explicitly pass a message. The value of the executed script is passed to the callback in an array containing one entry per frame in which the script was injected. This can very conveniently be used to pass data from a content script to the tabs.executeScript() callback. Obviously, you can only send back a single value per frame this way.
The following code should do what you desire. I hand edited this code from your code in this Question and my answer here. While the code in that answer is fully tested, the fact that I edited this only within this answer means that some errors may have crept in:
chrome.tabs.getSelected(null, function (tab) {
  // Now inject a script onto the page
  chrome.tabs.executeScript(tab.id, {
    code: "document.body.innerHTML;"
  }, function (results) {
    parser = new DOMParser();
    htmlDoc = parser.parseFromString(results[0], "text/html");
    //outputJson is a global variable which is Populated here
    outputJson = parseMyPage(outputJson, htmlDoc);
    //my code to access global variables
    if (outputJson && null != outputJson) {
      // other stuff
    }
  });
});
extension.sendRequest() and extension.onRequest have been deprecated since Chrome 33. You should replace these anywhere you are using them with runtime.sendMessage() and runtime.onMessage.
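Because the callback receives an array with one entry per frame, it may be worth guarding the results[0] access in case the array is missing or empty (for example if the injection did not succeed). A small sketch of such a guard follows; firstFrameResult is a hypothetical helper name, not part of the Chrome API.

```javascript
// Hypothetical helper: return the first entry of the array passed to the
// tabs.executeScript() callback, or null if there is nothing usable.
function firstFrameResult(results) {
  if (!Array.isArray(results) || results.length === 0) {
    return null;
  }
  var value = results[0];
  // The injected code evaluates to document.body.innerHTML, so a string is expected.
  return typeof value === 'string' ? value : null;
}

// Intended use inside the extension (not executed here):
// chrome.tabs.executeScript(tab.id, { code: "document.body.innerHTML;" }, function (results) {
//   var html = firstFrameResult(results);
//   if (html !== null) { /* parse and process html */ }
// });
```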

Making a Same Domain iframe Secure

tl;dr Can I execute un-trusted scripts on an iframe safely?
Back story:
I'm trying to make secure JSONP requests. A lot of older browsers do not support Web Workers which means that the current solution I came up with is not optimal.
I figured I could create an <iframe> and load a script inside it. That script would perform a JSONP request (creating a script tag), which would post a message to the main page. The main page would get the message, execute the callback and destroy the iframe. I've managed to do this sort of thing.
function jsonp(url, data, callback) {
var iframe = document.createElement("iframe");
iframe.style.display = "none";
document.body.appendChild(iframe);
var iframedoc = iframe.contentDocument || iframe.contentWindow.document;
sc = document.createElement("script");
sc.textContent = "(function(p){ cb = function(result){p.postMessage(result,'http://fiddle.jshell.net');};})(parent);";
//sc.textContent += "alert(cb)";
iframedoc.body.appendChild(sc);
var jr = document.createElement("script");
var getParams = ""; // serialize the GET parameters
for (var i in data) {
getParams += "&" + i + "=" + data[i];
}
jr.src = url + "?callback=cb" + getParams;
iframedoc.body.appendChild(jr);
window.onmessage = function (e) {
callback(e.data);
document.body.removeChild(iframe);
}
}
jsonp("http://jsfiddle.net/echo/jsonp/", {
foo: "bar"
}, function (result) {
alert("Result: " + JSON.stringify(result));
});
The problem is that since the iframes are on the same domain, the injected script still has access to the external scope through .top or .parent and such.
Is there any way to create an iframe that can not access data on the parent scope?
I want to create an iframe where scripts added through script tags will not be able to access variables on the parent window (and the DOM). I tried stuff like top=parent=null but I'm really not sure that's enough, there might be other workarounds. I tried running a for... in loop, but my function stopped working and I was unable to find out why.
NOTE:
I know optimally WebWorkers are a better isolated environment. I know JSONP is a "bad" technique (I even had some random guy tell me he'd never use it today). I'm trying to create a secure environment for scenarios where you have to perform JSONP queries.
You can't really delete the references. Setting them to null will just silently fail, and there is always a way to get a reference to the parent DOM.
References like frameElement and frameElement.defaultView etc. cannot be deleted. Attempting to do so will either silently fail or throw an exception, depending on the browser.
You could look into Caja/Cajita though.
tl;dr no
Any untrusted script can steal cookies (like a session id!) or read information from the DOM like the value of a credit card input field.
JavaScript relies on the security model that all code is trusted code. Any attempts at access from another domain requires explicit whitelisting.
If you want to sandbox your iframe you can serve the page from another domain. This does mean that you can't share a session or do any kind of communication, because it can be abused. It's just like including an unrelated website. Even then there are possibilities for abuse if you allow untrusted JavaScript. For instance, the script can run window.top.location.href = 'http://my.phishing.domain/'; and the user might not notice the redirect.
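If the iframe is served from another domain and talks to the page only through postMessage, the receiving page can at least check where each message came from before acting on it. The sketch below assumes a hypothetical sandbox origin (https://sandbox.example) and a hypothetical helper name. It does not make untrusted code safe, it only restricts which origin's messages the page accepts.

```javascript
// Hypothetical helper: build a message handler that ignores messages
// from any origin other than the one explicitly allowed.
function makeMessageHandler(allowedOrigin, callback) {
  return function (event) {
    // MessageEvent objects carry the sender's origin and the posted data.
    if (event.origin !== allowedOrigin) {
      return false; // ignored
    }
    callback(event.data);
    return true; // accepted
  };
}

// Intended use in the page (not executed here):
// window.onmessage = makeMessageHandler('https://sandbox.example', function (data) {
//   /* handle the JSONP result */
// });
```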

Why does the Segment.io loader script push method names/args onto a queue which seemingly gets overwritten?

I've been dissecting the following code snippet, which is used to asynchronously load the Segment.io analytics wrapper script:
// Create a queue, but don't obliterate an existing one!
var analytics = analytics || [];
// Define a method that will asynchronously load analytics.js from our CDN.
analytics.load = function(apiKey) {
// Create an async script element for analytics.js.
var script = document.createElement('script');
script.type = 'text/javascript';
script.async = true;
script.src = ('https:' === document.location.protocol ? 'https://' : 'http://') +
'd2dq2ahtl5zl1z.cloudfront.net/analytics.js/v1/' + apiKey + '/analytics.min.js';
// Find the first script element on the page and insert our script next to it.
var firstScript = document.getElementsByTagName('script')[0];
firstScript.parentNode.insertBefore(script, firstScript);
// Define a factory that generates wrapper methods to push arrays of
// arguments onto our `analytics` queue, where the first element of the arrays
// is always the name of the analytics.js method itself (eg. `track`).
var methodFactory = function (type) {
return function () {
analytics.push([type].concat(Array.prototype.slice.call(arguments, 0)));
};
};
// Loop through analytics.js' methods and generate a wrapper method for each.
var methods = ['identify', 'track', 'trackLink', 'trackForm', 'trackClick',
'trackSubmit', 'pageview', 'ab', 'alias', 'ready'];
for (var i = 0; i < methods.length; i++) {
analytics[methods[i]] = methodFactory(methods[i]);
}
};
// Load analytics.js with your API key, which will automatically load all of the
// analytics integrations you've turned on for your account. Boosh!
analytics.load('MYAPIKEY');
It's well commented and I can see what it's doing, but I'm puzzled when it comes to the methodFactory function, which pushes details (method name and arguments) of any method calls made before the main analytics.js script has loaded onto the global analytics array.
This is all well and good, but then if/when the main script does load, it seemingly just overwrites the global analytics variable (see last line here), so all that data will be lost.
I see how this prevents script errors in a web page by stubbing out methods which don't exist yet, but I don't understand why the stubs can't just return an empty function:
var methods = ['identify', 'track', 'trackLink', 'trackForm', 'trackClick',
'trackSubmit', 'pageview', 'ab', 'alias', 'ready'];
for (var i = 0; i < methods.length; i++) {
lib[methods[i]] = function () { };
}
What am I missing? Please, help me understand!
Ian here, co-founder at Segment.io—I didn't actually write that code, Calvin did, but I can fill you in on what it's doing.
You're right, the methodFactory is stubbing out the methods so that they are available before the script loads, which means people can call analytics.track without wrapping those calls in an if or ready() call.
But the methods are actually better than "dumb" stubs, in that they save the method that was called, so we can replay the actions later. That's this part:
analytics.push([type].concat(Array.prototype.slice.call(arguments, 0)));
To make that more readable:
var methodFactory = function (method) {
return function () {
var args = Array.prototype.slice.call(arguments, 0);
var newArgs = [method].concat(args);
analytics.push(newArgs);
};
};
It tacks on the name of the method that was called, which means if I analytics.identify('userId'), our queue actually gets an array that looks like:
['identify', 'userId']
Then, when our library loads in, it unloads all of the queued calls and replays them into the real methods (that are now available) so that all of the data recorded before load is still preserved. That's the key part, because we don't want to just throw away any calls that happen before our library has the chance to load. That looks like this:
// Loop through the interim analytics queue and reapply the calls to their
// proper analytics.js method.
while (window.analytics.length > 0) {
var item = window.analytics.shift();
var method = item.shift();
if (analytics[method]) analytics[method].apply(analytics, item);
}
analytics is a local variable at that point, and after we're done replaying, we replace the global with the local analytics (which is the real deal).
Hope that makes sense. We're actually going to have a series on our blog about all the little tricks for 3rd-party Javascript, so you might dig that soon!
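To see the whole cycle in one place, here is a self-contained sketch of the queue-and-replay pattern described above. It is not Segment's actual library code: page stands in for window, and realAnalytics is a hypothetical implementation that only records what it was asked to do.

```javascript
// Stand-in for the page's global object (window in a browser).
var page = {};

// 1. Snippet phase: an array with stub methods that record each call.
page.analytics = [];
['identify', 'track'].forEach(function (name) {
  page.analytics[name] = function () {
    page.analytics.push([name].concat(Array.prototype.slice.call(arguments, 0)));
  };
});

// Calls made before the library has loaded are queued, not lost.
page.analytics.identify('userId');
page.analytics.track('Signed Up', { plan: 'free' });

// 2. Library phase: a hypothetical "real" implementation.
var handled = [];
var realAnalytics = {
  identify: function (id) { handled.push('identify:' + id); },
  track: function (eventName) { handled.push('track:' + eventName); }
};

// 3. Replay the queued calls into the real methods, then swap the global.
while (page.analytics.length > 0) {
  var item = page.analytics.shift();
  var method = item.shift();
  if (realAnalytics[method]) realAnalytics[method].apply(realAnalytics, item);
}
page.analytics = realAnalytics;
```

The stubs that return an empty function, as proposed in the question, would skip step 3 entirely, so the two early calls would never reach the real methods.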
Not closely related to the question, but this may be useful to those who googled for the issue "segment not sends queued events".
In my code I assigned window.analytics to another variable at page loading stage:
let CLIENT = analytics;
Then I used this variable instead of using global analytics:
CLIENT.track();
CLIENT.page();
// etc
But I ran into a problem where events were sometimes sent and sometimes not. Which of the two happened varied between page reloads. Sometimes it also ignored all events fired during page load and then, without a reload, started sending the events bound after the page had loaded.
Then I debugged and found that CLIENT was holding all the unsent events in its queue. Obviously they had been put there by the stubs from methodFactory(). Then I found this SO question. So this is what I think is happening:
CLIENT holds a reference to the stub analytics object, whose methods are generated by methodFactory(). After Segment is fully loaded, it replaces window.analytics with the real object, while CLIENT still holds a reference to the old window.analytics. That explains the "sometimes": sometimes Segment replaced window.analytics before the main script that initializes CLIENT had loaded, and sometimes the main script loaded earlier than the Segment script.
New code:
let CLIENT = undefined;
if (CLIENT) {
CLIENT.page();
} else {
window.analytics.page();
}
I need to have this CLIENT because I'm using same analytics code for web and mobile. On mobile this CLIENT will be initialized separately while on web window.analytics is always available.
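The difference between capturing the object once and looking it up on every call can be shown without Segment at all. In this sketch, fakeWindow stands in for window, and the kind property is a made-up marker for telling the two objects apart.

```javascript
// Stand-in for window, holding the stub that the loader snippet creates.
var fakeWindow = { analytics: { kind: 'stub' } };

// Captured once: keeps pointing at whatever object was there at this moment.
var CLIENT = fakeWindow.analytics;

// Looked up on every call: always sees the current value.
function getClient() {
  return fakeWindow.analytics;
}

// Later, the library replaces the global with the real object.
fakeWindow.analytics = { kind: 'real' };

// CLIENT.kind is still 'stub', while getClient().kind is now 'real'.
```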

Using script tag to pass arguments to JavaScript

I need to implement a cross-site comet http server push mechanism using script tag long polling. (phew...) For this, I dynamically insert script tags into the DOM and the server sends back short js scripts that simply call a local callback function that processes the incoming messages. I am trying to figure out a way to associate each one of these callback calls with the script tag that sent it, to match incoming replies with their corresponding requests.
Clearly, I could simply include a request ID in the GET url, which is then returned back in the js script that the server generates, but this creates a bunch of unnecessary traffic and doesn't strike me as particularly elegant or clever.
What I would like to do is to somehow associate the request ID with the script tag that I generate and then read out this request ID from within the callback function that is called from inside this script tag. That way, all the request management would remain on the client.
This leads me to the following question: Is there a way to ask the browser for the DOM element of the currently executing script tag, so I can use the tag element to pass arguments to the contained javascript?
I found this thread:
Getting the currently executing, dynamically appended, script tag
Which is asking exactly this question, but the accepted answer isn't useful to me since it still requires bloat in the server-returned js script (setting marker-variables inside the script) and it relies on unique filenames for the scripts, which I don't have.
Also, this thread is related:
How may I reference the script tag that loaded the currently-executing script?
And, among other things, suggests to simply grab the last script in the DOM, as they are executed in order. But this seems to only work while the page is loading and not in a scenario where scripts are added dynamically and may complete loading in an order that is independent of their insertion.
Any thoughts?
PS: I am looking for a client-only solution, i.e. no request IDs or unique callback function names or other non-payload data that needs to get sent to and handled by the server. I would like for the server to (theoretically) be able to return two 100% identical scripts and the client still being able to associate them correctly.
I know you would like to avoid discussions about changing the approach, but that's really what you need to do.
First, each of the script tags being added to the DOM to fire off the poll request is disposable, i.e. each needs to be removed from the DOM as soon as its purpose has been served. Else you end up flooding your client DOM with hundreds or more dead script tags.
A good comparable example of how this works is JSONP implementations. You create a client-side named function, create your script tag to make the remote request, and pass the function name in the request. The response script wraps the JSON object in a function call with that name, which executes the function on return and passes the JSON payload into it. After execution, the client-side function is deleted. jQuery does this by creating randomly generated names (they exist in the global context, which is really the only way this process works), and then deletes the callback function when it's done.
With regard to long polling, it's a very similar process. Inherently, there is no need for the response function call to know, or care, which script tag initiated it.
Let's look at an example script:
window.callback = function(obj){
console.log(obj);
}
setInterval(function(){
var remote = document.createElement('script');
remote.src = 'http://jsonip.com/callback';
remote.addEventListener('load', function(){
remote.parentNode.removeChild(remote);
},false);
document.querySelector('head').appendChild(remote);
}, 2000);
This script keeps no references to the script elements because again, they are disposable. As soon as their jobs are done, they are summarily shot.
The example can be slightly modified to not use a setInterval, in which case you would replace setInterval with a named function and add logic into the remote load event to trigger the function when the load event completes. That way, the timing between script tag events depends on the response time of your server and is much closer to the actual long polling process.
You can extend this even further by using a queueing system to manage your callbacks. This could be useful if you have different functions to respond to different kinds of data coming back.
Alternatively, and probably better, is to have logic in your callback function that handles the data returned from each poll and executes whatever other specific client-side logic is needed at that point. This also means you only need one callback function and can get away from creating randomly generated callback names.
If you need more assistance with this, leave a comment with any specific questions and I can go into more detail.
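The variation without setInterval, where each request is started only after the previous one has completed, could look roughly like this. It is a sketch under assumptions: createPoller and requestOnce are hypothetical names, and requestOnce is expected to start one script-tag request and call the function it is given once that script's load event has fired.

```javascript
// Hypothetical helper: run requestOnce repeatedly, starting the next request
// only after the previous one has reported completion.
function createPoller(requestOnce, delayMs) {
  var stopped = false;

  function poll() {
    if (stopped) return;
    requestOnce(function onComplete() {
      // Schedule the next request relative to when this one finished.
      if (!stopped) setTimeout(poll, delayMs);
    });
  }

  return {
    start: poll,
    stop: function () { stopped = true; }
  };
}

// Intended use in a page (not executed here):
// var poller = createPoller(function (done) {
//   var remote = document.createElement('script');
//   remote.src = 'http://jsonip.com/callback';
//   remote.addEventListener('load', function () {
//     remote.parentNode.removeChild(remote);
//     done();
//   }, false);
//   document.querySelector('head').appendChild(remote);
// }, 2000);
// poller.start();
```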
It's most definitely possible but you need a little trick. It's a common technique known as JSONP.
In JavaScript:
var get_a_unique_name = (function () {
var counter = 0;
return function () {
counter += 1;
return "function_" + counter;
}
}()); // no magic, just a closure
var script = document.createElement("script");
var callback_name = get_a_unique_name();
script.src = "/request.php?id=12345&callback_name=" + callback_name;
// register the callback function globally
window[callback_name] = function (the_data) {
console.log(the_data);
handle_data(the_data); // implement this function
};
// add the script
document.head.appendChild(script);
On the server side you can have:
$callback_name = $_GET["callback_name"];
$the_data = handle_request($_GET["id"]); // implement handle_request
echo $callback_name . "(" . json_encode($the_data) . ");";
exit; // done
The script that is returned by /request.php?id=12345&callback_name=XXX will look something like this:
function_1({ "hello": "world", "foo" : "bar" });
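The callback registered above stays on window after the response has arrived. If each callback should be used only once, it can remove itself when it is called. The following sketch shows one way to do that: registerOneShotCallback and the jsonp_cb_ prefix are hypothetical names, and root stands for window in a browser.

```javascript
var nextCallbackId = 0;

// Hypothetical helper: register a uniquely named callback on root that
// deletes itself the first time it is called, then returns its name.
function registerOneShotCallback(root, handler) {
  nextCallbackId += 1;
  var name = 'jsonp_cb_' + nextCallbackId;
  root[name] = function (data) {
    delete root[name]; // clean up before handing the data on
    handler(data);
  };
  return name;
}

// Intended use in a page (not executed here):
// var name = registerOneShotCallback(window, handle_data);
// script.src = "/request.php?id=12345&callback_name=" + name;
```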
There may be a solution using onload/onreadystate events on the script. I can pass these events a closure function that carries my request ID. Then, the callback function doesn't handle the server reply immediately but instead stores it in a global variable. The onload/onreadystate handler then picks up the last stored reply and tags it with the request ID it knows and then processes the reply.
For this to work, I need to be able to rely on the order of events. If onload is always executed right after the corresponding script tag finishes execution, this will work beautifully. But if I have two tags loading simultaneously, they return at the same time, and there is a chance that the browser will execute both scripts and only afterwards fire both onload/onreadystate events, then I will lose one reply this way.
Does anyone have any insight on this?
Here's some code to demonstrate this:
function loadScript(url, requestID) {
var script = document.createElement('script');
script.setAttribute("src", url);
script.setAttribute("type", "text/javascript");
script.setAttribute("language", "javascript");
script.onerror = script.onload = function() {
script.onerror = script.onload = script.onreadystatechange = function () {}
document.body.removeChild(script);
completeRequest(requestID);
}
script.onreadystatechange = function () {
if (script.readyState == 'loaded' || script.readyState == 'complete') {
script.onerror = script.onload = script.onreadystatechange = function () {}
document.body.removeChild(script);
completeRequest(requestID);
}
}
document.body.appendChild(script);
}
var lastReply;
function myCallback(reply) {
lastReply = reply;
}
function completeRequest(requestID) {
processReply(requestID, lastReply);
}
function processReply(requestID, reply) {
// Do something
}
Now, the server simply returns scripts of the form
myCallback(message);
and doesn't need to worry at all about request IDs and such and can always use the same callback function.
The question is: If I have two scripts returning "simultaneously" is it possible that this leads to the following calling order:
myCallback(message1);
myCallback(message2);
completeRequest(requestID1);
completeRequest(requestID2);
If so, I would lose the actual reply to request 1 and wrongly associate the reply to request 2 with request 1.
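The consequence of that calling order can be checked by simulating it directly. This sketch only shows what would happen if the order occurred. It says nothing about whether a browser can actually produce it, and the processed array is added purely for inspection.

```javascript
var lastReply;
var processed = [];

function myCallback(reply) {
  lastReply = reply; // a single slot: each new reply overwrites the previous one
}

function completeRequest(requestID) {
  processed.push({ requestID: requestID, reply: lastReply });
}

// The hypothetical interleaving from the question.
myCallback('message1');
myCallback('message2');
completeRequest('requestID1');
completeRequest('requestID2');

// Both requests end up paired with 'message2', and 'message1' is gone.
```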
It should be quite simple. There is only one script element for each server "connection", and it can easily be stored in a scoped, static variable.
function connect(nameOfCallback, eventCallback) {
  var script;
  window[nameOfCallback] = function() { // this is what the response invokes
    reload();
    // forward the response's arguments to the event callback
    eventCallback.apply(null, arguments);
  };
  reload();
  function reload() {
    if (script && script.parentNode)
      script.parentNode.removeChild(script);
    script = document.createElement("script"); // the tag name must be a string
    script.src = "…";
    script.type = "text/javascript";
    document.head.appendChild(script);
    // you might use additional error handling, e.g. something like
    // script.onerror = reload;
    // but I guess you get the concept
  }
}

Hijacking a variable with a userscript for Chrome

I'm trying to change the variable in a page using a userscript.
I know that in the source code there is a variable
var smilies = false;
In theory I should be able to change it like that:
unsafeWindow.smilies = true;
But it doesn't work. When I'm trying to alert or log the variable to the console without hijacking I get that it's undefined.
alert(unsafeWindow.smilies); // undefined !!!
EDIT: I'm using Chrome if it changes anything...
http://code.google.com/chrome/extensions/content_scripts.html says:
Content scripts execute in a special environment called an isolated
world. They have access to the DOM of the page they are injected into,
but not to any JavaScript variables or functions created by the page.
It looks to each content script as if there is no other JavaScript
executing on the page it is running on.
It's about Chrome Extensions but I guess it's the same story with Userscripts too?
Thank you, Rob W. So the working code for people who need it:
var scriptText = "smilies = true;";
var rwscript = document.createElement("script");
rwscript.type = "text/javascript";
rwscript.textContent = scriptText;
document.documentElement.appendChild(rwscript);
rwscript.parentNode.removeChild(rwscript);
In Content scripts (Chrome extensions), there's a strict separation between the page's global window object, and the content script's global object.
To inject the code, a script tag has to be injected.
Overwriting a variable is straightforward.
Overwriting a variable so that the page cannot overwrite it again requires the use of Object.defineProperty.
The final Content script's code:
// This function is going to be stringified, and injected in the page
var code = function() {
// window is identical to the page's window, since this script is injected
Object.defineProperty(window, 'smilies', {
value: true
});
// Or simply: window.smilies = true;
};
var script = document.createElement('script');
script.textContent = '(' + code + ')()';
(document.head||document.documentElement).appendChild(script);
script.parentNode.removeChild(script);
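The reason Object.defineProperty protects the value is that a property defined with only value is non-writable by default. This can be checked on a plain object. In this sketch, pageGlobals is a hypothetical stand-in for the page's window.

```javascript
// Stand-in for the page's window object.
var pageGlobals = {};

// Only `value` is given, so writable, enumerable and configurable default to false.
Object.defineProperty(pageGlobals, 'smilies', { value: true });

// A later assignment is ignored in sloppy mode and throws a TypeError in strict mode.
var assignmentThrew = false;
try {
  pageGlobals.smilies = false;
} catch (e) {
  assignmentThrew = true;
}

// Either way, pageGlobals.smilies is still true.
```

With a plain assignment (window.smilies = true) the page's own code could set the value back to false at any time.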
