Is it possible to detect page modification through userscripts? [duplicate] - javascript

This question already has answers here:
Can a website know if I am running a userscript?
(2 answers)
Closed 9 years ago.
If you have a website, can you somehow find out if visitors are modifying your site with javascript userscripts?

In short: EEEEEEK! Don't do it! Rather, decide what needs to be guarded, and guard that. Avoid polling (periodic checking) at all costs. Especially, avoid periodic heavy checks of anything.
Not every change is possible to track. Most changes are just extremely hard to track, since there are so many things that could change.
Changes to the DOM (new nodes, removed nodes, changed attributes) can be detected. The other answer suggests checking innerHTML periodically, but it's better to use mutation observers (supported by Firefox, Chrome) or the older mutation events (DOMSubtreeModified et al.) (support varies by event) instead.
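For illustration, a minimal observer sketch (assumes a browser with MutationObserver support; the callback body is just a placeholder):
var observer = new MutationObserver(function (mutations) {
    mutations.forEach(function (m) {
        console.log(m.type, m.target); // "childList" or "attributes", plus the node affected
    });
});
observer.observe(document.documentElement, {
    childList: true,  // node additions and removals
    attributes: true, // attribute changes
    subtree: true     // watch the whole tree, not just the root
});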
Changes to standard methods cannot be reliably detected, except by comparing every single method and property manually (eeeek). This includes the need to reference tons of objects including, say, Array.prototype.splice (and Array and Array.prototype as well, of course), and run a heavy script periodically. However, this is not what a userscript typically does.
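To make the "eeeek" concrete, here is what a tiny slice of that comparison would look like; a real version would need hundreds of entries:
var trusted = {
    splice: Array.prototype.splice,
    Array: window.Array,
    appendChild: Node.prototype.appendChild
};
setInterval(function () {
    if (Array.prototype.splice !== trusted.splice ||
        window.Array !== trusted.Array ||
        Node.prototype.appendChild !== trusted.appendChild) {
        // a standard method or constructor has been replaced
    }
}, 1000); // and this heavy check now runs forever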
The state of an input is a property, not an attribute. This means that the document HTML won't change. If the state is changed by a script, the change event won't fire either. Again, the only solution is to poll every single input manually (eeek).
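A sketch of that polling, comparing each value property against its attribute (reflected as defaultValue for text inputs):
setInterval(function () {
    var inputs = document.getElementsByTagName("input");
    for (var i = 0; i < inputs.length; i++) {
        if (inputs[i].value !== inputs[i].defaultValue) {
            // the property diverged from the attribute (user or script change)
        }
    }
}, 500); // eeek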
There is no reliable way to detect if an event handler has been attached. For starters, you would need to guard the onX attributes (paragraph #2), detect any call to addEventListener (ek) (without tripping the paragraph #2 check), detect any calls to the respective methods by your library (jQuery.bind and several others).
One thing that might seem to play in your favor, and possibly the only one: user scripts run on page load (never sooner), so you would have plenty of time to prepare your defenses. Update: not even that plays in your favor (thanks Brock Adams for noting this, and for the link).
You can detect a standard method has been called by replacing it with your own (ek). There are many methods that you would need to instrument this way (eek), some by the browser, some by your framework. The fact that IE (and even Firefox can be instructed to, thanks @Brock) won't let you touch the prototypes of the DOM classes adds another "e" or two to the "eek". The fact that some methods can only be obtained via a method call (return value, callback arguments) adds another "e" or two, for a total of "eeeek". The idea of crawling across the entirety of window will be foiled by security exceptions and uncatchable security exceptions. That is, unless you don't use iFrames and you are not within an iFrame.
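For what a single instrumented method might look like (reportCall is a hypothetical hook of yours, and this assumes a browser that lets you touch the prototype at all):
(function () {
    var realAppendChild = Node.prototype.appendChild;
    Node.prototype.appendChild = function (child) {
        reportCall("appendChild", child); // hypothetical reporting hook
        return realAppendChild.call(this, child);
    };
})();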
Even if you detect every method call, DOM can be changed by writing to innerHTML. Firefox and Chrome support Mutation Observers, so you can use these.
Even if you detect every method call to a pre-existing method and listen to mutations, most properties are reflected by neither, so you need to watch all properties of every object as well. Pray someone does not add a non-enumerable property with a key you would never guess. Incidentally, this will catch DOM mutations as well. In ES6, it will be possible to observe an object's property set. I'm not sure if you can attach a setter to an existing object property in ES5 (while adhering to ES3 syntax). Polling every property is eeeek.
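For what it's worth, ES5's Object.defineProperty can attach a setter to an existing configurable property. A sketch (notifyWatchdog is a hypothetical hook):
var obj = { secret: 1 }; // stands in for whatever object you need to guard
var current = obj.secret;
Object.defineProperty(obj, "secret", {
    get: function () { return current; },
    set: function (v) {
        notifyWatchdog("secret", v); // hypothetical watchdog hook
        current = v;
    }
});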
Of course, you should allow your own scripts to do some changes. The work flow would be to set a flag (not accessible from the global scope!) "I'm legit", do your job, and clear the flag - remember to flank all your callbacks as well. The method observers will then check the flag is set. The property watchdogs will have a harder time detecting if a change is valid, but they could be notified from the script of every legit change (manually; again make sure the userscripts cannot see that notification stream). Eeek.
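A sketch of that flag work flow, with everything hidden in a closure (handleTampering is a hypothetical response):
(function () {
    var legit = false;
    function doLegitChange(change) { // flank every legit change, callbacks included
        legit = true;
        try { change(); } finally { legit = false; }
    }
    function onMutation() { // wire this into your observers, also inside this closure
        if (!legit) { handleTampering(); } // hypothetical response
    }
    doLegitChange(function () {
        document.title = "changed legitimately"; // an example whitelisted change
    });
})();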
There's an entirely different problem that I didn't realise at first: userscripts run at page load, but they can create an iFrame as well. It's not entirely inconceivable (but still unlikely now) that a userscript would: 1) detect your script blocker, 2) nuke the page from orbit (you can't prevent document.body.innerHTML =, at least not without heavily tampering with document.body), 3) insert a single iframe with the original URL (prevent double loads server-side?) and 4) have plenty of time to act on that empty iframe before your protection is even loaded.
Also, see the duplicate found by Brock Adams, which shows several other checks that I didn't think of that should be done.

If you don't have a script yourself that changes things, you could compare document.body.innerHTML and document.head.innerHTML with what they were.
When you do change the DOM in your script, you can update the values to compare against. Use setInterval to compare periodically.
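A naive version of that polling (with all the caveats from the answer above):
var savedBody = document.body.innerHTML;
var savedHead = document.head.innerHTML;
setInterval(function () {
    if (document.body.innerHTML !== savedBody ||
        document.head.innerHTML !== savedHead) {
        // something changed the page; your own script should have
        // refreshed savedBody/savedHead before making its changes
    }
}, 1000);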

Related

Why is checking for an attribute using dot notation before removing faster than removing the attribute outright?

I asked this question, and it turned out that when removing an attribute from an element, checking whether the attribute exists first using elem.xxx !== undefined makes the runtime faster. Proof.
Why is it quicker? There's more code to go through and you'll have to encounter the removeAttribute() method whichever way you go about this.
Well, the first thing you need to know is that elem.xxx is not the same as elem.getAttribute() or any other attribute-related method.
elem.xxx is a property of the DOM element, while an attribute lives on the HTML inside the DOM; the two are similar but different. For example, take this DOM element: <a href="#"> and this code:
// Let's say var a is the <a> tag
a.getAttribute('href'); // == #
a.href; // == http://www.something.com/# (i.e. the complete URL)
But let's take a custom attribute: <a custom="test">
// Let's say var a is the <a> tag
a.getAttribute('custom'); // == test
a.custom; // == undefined
So you can't really compare the speed of the two, since they don't achieve the same result. But one is clearly faster: properties are fast-access data, while attributes go through the getAttribute/hasAttribute DOM functions.
Now, why is it faster without the condition? Simply because removeAttribute doesn't care whether the attribute is missing; it checks for that itself.
So using hasAttribute before removeAttribute means doing the check twice, and the explicit condition is a little slower still, since it needs to decide whether to run the code at all.
I have a suspicion that the reason for the speed boost is trace trees.
Trace trees were first introduced by Andreas Gal and Michael Franz of the University of California, Irvine, in their paper Incremental Dynamic Code Generation with Trace Trees.
In his blog post Tracing the Web, Andreas Gal (the co-author of the paper) explains how tracing Just-in-Time compilers work.
To explain tracing JIT compilers as succinctly as possible (since my knowledge about the subject isn't profound), a tracing JIT compiler does the following:
Initially all the code to be run is interpreted.
A count is kept for the number of times each code path is executed (e.g. the number of times the true branch of an if statement is executed).
When the number of times a code path is taken is greater than a predefined threshold, the code path is compiled into machine code to speed up execution (e.g. I believe SpiderMonkey compiles code paths executed more than once).
Now let's take a look at your code and understand what is causing the speed boost:
Test Case 1: Check
if (elem.hasAttribute("xxx")) {
    elem.removeAttribute("xxx");
}
This code has a code path (i.e. an if statement). Remember that tracing JITs only optimize code paths and not entire functions. This is what I believe is happening:
Since the code is being benchmarked by JSPerf it's being executed more than once (an understatement). Hence it is compiled into machine code.
However it still incurs the overhead of the extra function call to hasAttribute which is not JIT compiled because it's not a part of the conditional code path (the code between the curly braces).
Hence although the code inside the curly braces is fast the conditional check itself is slow because it's not compiled. It is interpreted. The result is that the code is slow.
Test Case 2: Remove
elem.removeAttribute("xxx");
In this test case we don't have any conditional code paths. Hence the JIT compiler never kicks in. Thus the code is slow.
Test Case 3: Check (Dot Notation)
if (elem.xxx !== undefined) {
    elem.removeAttribute("xxx");
}
This is the same as the first test case with one significant difference:
The conditional check is a simple non-equivalence check. Hence it doesn't incur the full overhead of a function call.
Most JavaScript interpreters optimize simple equivalence checks like this by assuming a fixed data type for both the variables. Since the data type of elem.xxx or undefined is not changing every iteration this optimization makes the conditional check even faster.
The result is that the conditional check (although interpreted) does not slow down the compiled code path significantly. Hence this code is the fastest.
Of course, this is just speculation on my part. I don't know the internals of a JavaScript engine, and hence my answer is not canonical. However, I opine that it is a good educated guess.
Your proof is incorrect...
elem.class !== undefined always evaluates to false and thus elem.removeAttribute("class") is never called, therefore, this test will always be quicker.
The correct property on elem to use is className, e.g.:
typeof elem.className !== "undefined"
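Applied to the original test case, the corrected check reads (for elements, className is always a string, so the branch actually runs):
if (typeof elem.className !== "undefined") {
    elem.removeAttribute("class");
}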
As Karl-André Gagnon pointed out, accessing a [native] JavaScript property and invoking a DOM function/property are two different operations.
Some DOM properties are exposed as JavaScript properties via the DOM IDL; these are not the same as ad-hoc JS properties and require DOM access. Also, even though the DOM properties are exposed, there is no strict relation with DOM attributes!
For instance, inputElm.value = "x" will not update the DOM attribute, even though the element will display and report an updated value. If the goal is to deal with DOM attributes, the only correct method is to use hasAttribute/setAttribute, etc.
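A quick illustration of that split:
var input = document.createElement("input");
input.setAttribute("value", "initial");
input.value = "x";                        // updates the property only
console.log(input.getAttribute("value")); // "initial" - the attribute is untouched
console.log(input.value);                 // "x" - what the element reports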
I've been working on deriving a "fair" micro-benchmark for the different function calls, but it is fairly hard, and a lot of different optimizations occur. Here is my best result, which I will use to argue my case.
Note that there is no if or removeAttribute to muddle up the results and I am focusing only on the DOM/JS property access. Also, I attempt to rule out the claim that the speed difference is merely due to a function call and I assign the results to avoid blatant browser optimizations. YMMV.
Observations:
Access to a JS property is fast. This is to be expected [1], [2].
Calling a function can incur a higher cost than direct property access [1], but is not nearly as slow as DOM properties or DOM functions. That is, it is not merely a "function call" that makes hasAttribute so much slower.
DOM property access is slower than native JS property access; however, performance differs widely between DOM properties and browsers. My updated micro-benchmark shows a trend that DOM access - be it via DOM property or DOM function - may be slower than native JS property access [2].
And going back to the very top: Accessing a non-DOM [JS] property on an element is fundamentally different than accessing a DOM property, much less a DOM attribute, on the same element. It is this fundamental difference, and optimizations (or lack thereof) between the approaches across browsers, that accounts for the observed performance differences.
[1] IE 10 does some clever trick where the fake function call is very fast (and I suspect the call has been elided) even though it has abysmal JS property access. However, considering IE an outlier, or merely reinforcement that the function call is not what introduces the inherently slower behavior, doesn't detract from my primary argument: it is the DOM access that is fundamentally slower.
[2] I would love to say DOM property access is slower, but Firefox does some amazing optimization of input.value (but not img.src). There is some special magic that happens there. Firefox does not optimize the DOM attribute access.
And different browsers may exhibit entirely different results. However, I don't think that one has to consider any "magic" with the if or removeAttribute to at least isolate what I believe to be the "performance issue": actually using the DOM.

jQuery .remove() vs Node.removeChild() and external maps to DOM nodes

The jQuery API documentation for jQuery .remove() mentions:
In addition to the elements themselves, all bound events and jQuery
data associated with the elements are removed.
I assume "bound events" here means "event handlers"; documentation for the similar .empty() says:
To avoid memory leaks, jQuery removes other constructs such as data
and event handlers from the child elements before removing the
elements themselves.
It does sound like leaks would ensue if one were to not use these functions and use Node.removeChild() (or ChildNode.remove()) instead.
Is this true for modern browsers?
If so, why exactly can't properties and event handlers be collected once the node is removed?
If not, do we still need to use .data()? Is it only good to retrieve HTML5 data- attributes?
Documentation for jQuery.data() (lower-level function) says:
The jQuery.data() method allows us to attach data of any type to DOM
elements in a way that is safe from circular references and therefore
free from memory leaks. jQuery ensures that the data is removed when
DOM elements are removed via jQuery methods, and when the user leaves
the page.
This sounds an awful lot like a solution to the old IE DOM/JS circular leak pattern which, AFAIK, is solved in all browsers today.
However, a comment in the jQuery src/data.js code (snapshot) says:
Provide a clear path for implementation upgrade to WeakMap in 2014
Which suggests that the idea of storing data strictly associated to a DOM node outside of the DOM using a separate data store with a map is still considered in the future.
Is this just for backward-compatibility, or is there more to it?
Answers provided to other questions like this one also seem to imply that the sole reason for an external map is to avoid cyclic refs between DOM objects and JS objects, which I consider irrelevant in the context of this question (unless I'm mistaken).
Furthermore, I've seen plugins that now set properties on relevant DOM nodes directly (e.g. selectize.js) and it doesn't seem to bother anyone. Is this an OK practice? It certainly looks that way, as it makes removing entire DOM trees very easy. No need to walk it down, no need to clean up any external data store, just detach it from the parent node, lose the reference, and let the garbage collector do its thing.
Further notes, context and rationale to the question:
This kind of capability is especially interesting for frameworks that manage views (e.g. Durandal), which often times have to replace entire trees that represent said views in their architecture. While most of them certainly support jQuery explicitly, this solution does not scale at all. Every component that uses a similar data store must also be cleaned up. In the case of Durandal, it seems they (at least in one occurrence, the dialog plugin - snapshot) rely on Knockout's .removeNode() (snapshot) utility function, which in turn uses jQuery's internal cleanData(). That's, IMHO, a prime example of horrible special-casing (I'm not sure it even works as it is now if jQuery is used in noConflict mode, which it is in most AMD setups).
This is why I'd love to know if I can safely ignore all of this or if we'll have to wait for Web Components in order to regain our long-lost sanity.
"It does sound like leaks would ensue if one were to not use these functions and use Node.removeChild() (or ChildNode.remove()) instead.
Is this true for modern browsers?
If so, why exactly can't properties and event handlers be collected once the node is removed?"
Absolutely. The data (including event handlers) associated with an element is held in a global object, jQuery.cache, and is removed via a serial number jQuery puts on the element.
When it comes time for jQuery to remove an element, it grabs the serial number, looks up the entry in jQuery.cache, manually deletes the data, and then removes the element.
Destroy the element without jQuery, and you destroy the serial number and the only association to the element's entry in the cache. The garbage collector has no knowledge of what the jQuery.cache object is for, and so it can't garbage-collect entries for nodes that were removed. It just sees a strong reference to data that may be used in the future.
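A simplified sketch of that mechanism (the names here are illustrative, not jQuery's actual internals):
var cache = {}; // stands in for jQuery.cache
var uid = 0;
function setData(elem, key, value) {
    var id = elem.__serial || (elem.__serial = ++uid); // the serial number on the element
    (cache[id] || (cache[id] = {}))[key] = value;
}
// Remove elem natively and drop your references: elem itself can be collected,
// but cache[id] survives - the GC sees only a strong reference it must keep.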
While this was a useful approach for old browsers like IE6 and IE7 that had serious problems with memory leaks, modern implementations have excellent garbage collectors that reliably find things like circular references between JavaScript and the DOM. You can have some pretty nasty circular references via object properties and closures, and the GC will find them, so it's really not such a worry with those browsers.
However, since jQuery holds element data in the manner it does, we now have to be very careful when using jQuery to avoid jQuery-based leaks. This means never use native methods to remove elements. Always use jQuery methods so that jQuery can perform its mandatory data cleanup.
"Furthermore, I've seen plugins that now set properties on relevant DOM nodes directly (e.g. selectize.js) and it doesn't seem to bother anyone. Is this an OK practice?"
I think it is for the most part. If the data is just primitive data types, then there's no opportunity for any sort of circular references that could happen with functions and objects. And again, even if there are circular references, modern browsers handle this nicely. Old browsers (especially IE), not so much.
"This is why I'd love to know if I can safely ignore all of this or if we'll have to wait for Web Components in order to regain our long-lost sanity."
We can't ignore the need to use jQuery specific methods when destroying nodes. Your point about external frameworks is a good one. If they're not built specifically with jQuery in mind, there can be problems.
You mention jQuery's $.noConflict, which is another good point. This easily allows other frameworks/libraries to "safely" be loaded, which may overwrite the global $. This opens the door to leaks IMO.
AFAIK, $.noConflict also enables one to load multiple versions of jQuery. I don't know if there are separate caches, but I would assume so. If that's the case, I would imagine we'd have the same issues.
If jQuery is indeed going to use WeakMaps in the future as the comment you quoted suggests, that will be a good thing and a sensible move. It'll only help in browsers that support WeakMaps, but it's better than nothing.
"If not, do we still need to use .data()? Is it only good to retrieve HTML5 data- attributes?"
Just wanted to address the second question. Some people think .data() should always be used for HTML5 data- attributes. I don't, because using .data() for that imports the data into jQuery.cache, so there's more memory to potentially leak.
I can see it perhaps in some narrow cases, but not for most data. Even with no leaks, there's no need to have most data- values stored in two places. It increases memory usage with no benefit. Just use .attr() for most simple data stored as data- attributes.
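Concretely (assuming markup like <div id="row" data-id="42">):
var id = $("#row").attr("data-id"); // reads the attribute; nothing enters jQuery.cache
var id2 = $("#row").data("id");     // imports the value into jQuery's data store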
In order to provide some of its features, jQuery has its own storage for some things. For example, if you do
$(elem).data("greeting", "hello");
Then jQuery will store the key "greeting" and the data "hello" on its own object (not on the DOM object). If you then use .removeChild(elem) to remove that element from the DOM and there are no other references to it, then that DOM element will be freed by the GC, but the data that you stored with .data() will not. This is a memory leak, as the data is now orphaned forever (while you're on that web page).
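A hypothetical illustration of that orphaning:
var elem = document.getElementById("widget"); // assumes such an element exists
$(elem).data("greeting", "hello"); // stored in jQuery's side table, not on elem
elem.parentNode.removeChild(elem); // native removal: jQuery never runs its cleanup
elem = null;                       // the element itself can now be GC'd...
// ...but "hello" stays in jQuery's store for the life of the page.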
If you use:
$(elem).remove();
or:
$(some parent selector).empty()
Then, jQuery will not only remove the DOM elements, but also clean up its extra shadow data that it keeps on items.
In addition to .data(), jQuery also keeps some info on event handlers that are installed which allows it to perform operations that the DOM by itself can't do such as $(elem).off(). That data also will leak if you don't dispose of an object using jQuery methods.
In a touch of irony, the reason jQuery doesn't store data as properties on the DOM elements themselves (and uses this parallel storage) is because there are circumstances where storing certain types of data on the DOM elements can itself lead to memory leaks.
As for the consequences of all this, most of the time it is a negligible issue because it's a few bytes of data that is recovered by the browser as soon as the user navigates to a new page.
The kinds of things that could make it become material are:
If you have a very dynamic web page that is constantly creating and removing DOM elements thousands of times and using jQuery features that store side data on those elements (jQuery event handlers, .data()), then any memory leak per operation could add up over time and become material.
If you have a very long running web page (e.g. a single page app) that stays on screen for very long periods of time and thus over time the memory leaks could accumulate.

Identify javascript closures with developer tools

I am currently developing a website that is pure javascript and relies heavily on the jQuery & jQuery UI libraries (this site is not intended for use by a general public, hence progressive enhancement is not a strict requirement for this project). I am encountering a significant memory leak on executing the following code:
oDialogBox = $("<div>...</div>");
/* Add useful things to the dialog box here */
oDialogBox.appendTo("body");
oDialogBox.dialog({
    /* Other dialog box settings here */
    close: function(event, ui) {
        oDialogBox.dialog("destroy");
        oDialogBox.remove();
        oDialogBox = null;
    }
});
At any given time in this dialog box, I am creating, removing and modifying a large number of instances of jQuery UI buttons, multiselects (per the Multiselect widget created by Eric Hynds) and on click event handlers. According to jQuery UI documentation, calling .remove() on oDialogBox should result in all child widgets being unbound and deleted. Yet my detached DOM tree shows a significant number of garbage elements that the GC isn't collecting.
It is highly likely I have missed a large set of closures that need to be finished off safely. How do I do the following:
1) How do I identify which closures are keeping a given detached DOM object alive (either in Firefox or Chrome)?
2) Assuming the complete set of closures is identified, does anything beyond nulling the variable need to be done to assure marking the DOM element for garbage collection?
3) I have also noticed my list of arrays stored by the page is giant and contains references to DOM elements not being gathered by the GC. Is there a documented best practice for cleaning arrays from javascript and allowing all elements to be marked for deletion? (Note: this is a current prime suspect for the source of the memory leak)
I'm afraid that I don't have a great answer for #1. I haven't found any really good tools for this myself, even given how good the development tools have become over the last few years. The best advice I can give is to always keep things in the smallest scope you possibly can. If things don't escape, it's generally easier to simply figure out where the references must be.
As to #2, there can be further concerns. If the object referenced by variable v1 closes over the free variables of some function, removing v1 will not be enough to make them eligible for garbage collection if another variable v2 closes over v1 in some other function. So I guess if you really mean the "complete set of closures", then you should be all set. But this might get hairy. Again, if most objects have references only in narrow scopes, these problems are much less severe.
For #3, what sorts of arrays are you discussing? If it's jQuery collections, then perhaps you simply have too many of them around. The only reason I know for them to stay around for a long time is to bind event handlers to them, and that is almost always better handled by event delegation on parent elements (see the sketch below). If it's your own custom arrays, do you really have a good reason to store references to them in arrays that last for any substantial length of time? I've rarely found one.
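To illustrate the delegation point (the selectors and handler are made up):
// binds one handler per <li>, and tempts you to keep the collection around:
$("#list li").on("click", onItemClick);
// delegates a single handler on the long-lived parent; covers <li> added later too:
$("#list").on("click", "li", onItemClick);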

Javascript: how to really prove a frame is loaded? Global variables not working?

I've been fooling around with developing an IETMs interface (Interactive Electronic Technical Manual - like an interactive parts catalogue) to display the data live from an existing Access database. The idea is to be able to run this interface on a network hosted intranet with straight HTML, plain Javascript, VBScript & ActiveX objects, so that it doesn't require IIS etc to run ASP or PHP etc (I don't want to involve corporate IT for the IIS).
All is going pretty well, & I'm impressed with the setup except for a few minor things - checking if a frame is loaded, & global variables.
My setup is a HTML page hosting 5 frames, each containing an empty page (which gets its content written to it dynamically), but I need to ensure all frames are loaded before getting into the heavy stuff (which Javascript is handling brilliantly!). But I'm finding that Javascript sux at truly detecting if a frame is loaded (someone please prove me wrong!). I have all 5 frames call a function fnInitialiseIfReady(), then if I could either successfully test if all frames are loaded, or globally count that this function has been called 5 times, I could proceed with confidence & call my function fnInitialise(). But unfortunately neither is working for me. :(
From tireless internet searches, I've tried the 'frames always load in order' theory, & that is simply not correct. I have set up a test with the frames calling a function passing their name as a parameter, & the frames load in a different order every time. It is totally random. Note: I proved this by having the first 4 frames call a certain function (which contains an alert() line showing the frame name parameter passed), & having the last frame call a different function (which contains an alert("all are loaded!") line). The "all are loaded!" does not always appear last.
I've also tried the '.frames["FrameName"].document.loaded' approach, & it ALWAYS returns 'undefined' for every frame. Am I doing something wrong here?
I've also tried the '.frames["FrameName"].window.location.href' approach & it ALWAYS returns the html filename regardless of whether that page has loaded or not, so it is not an indicator of loading completion.
I've also tried the '.frames["FrameName"].document.location' approach & it's ALWAYS the same as the '.window.location.href' approach.
Also, I'm finding Javascript will not hold global variables for me at all. I don't know if it's a combination of multiple frames & using Javascript & VBScript together, but global variables just do not hold a value at all. Local variables (within functions) are fine. Any ideas? I don't have many globals, so I'm thinking of using a cookie. A valid solution?
BTW, the reason for also using VBScript is that it accesses the ActiveX controls by default, & being a corporate intranet app I can guarantee MSIE usage.
It's frustrating because if I can solve these 2 relatively minor issues, then I'm super impressed with the robustness of this Javascript/VBScript approach. By leveraging each of their strengths, it's crunching the data just as quickly as the VB, C#, & C++ programs I've written for this particular dataset. I'm impressed! :)
Thanks in advance,
Dave Lock.
AFAIK, each frame (window) has its own 'global' context. That's why your Javascript objects can't see each other without special effort.
Are the frames nested? If not (i.e. they're all in the same frameset), you could try to add an onload event handler for each, and have those refer to some central global object.
I'm thinking you might have to add those onload event handlers from code (using attachEvent), so that you could assign the same event handler to all frames. Otherwise, I think each onload would run within its own (window's) global context.
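A sketch of that approach, under the question's setup (five frames and a parent-page fnInitialise()); note it must run in the frameset page before any frame finishes loading:
var framesLoaded = 0;
function frameLoaded() {
    if (++framesLoaded === 5) {
        fnInitialise(); // all five frames have reported in
    }
}
var frameElems = document.getElementsByTagName("frame");
for (var i = 0; i < frameElems.length; i++) {
    if (frameElems[i].attachEvent) { // IE, as suggested above
        frameElems[i].attachEvent("onload", frameLoaded);
    } else {
        frameElems[i].onload = frameLoaded;
    }
}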

Javascript memory leak on page refresh; remedy?

I am experiencing a memory leak in IE that occurs upon a page refresh (as I described in this SO post).
All I want to know at this point is: is there a way, on the document "unload" event (which could get called when the page refreshes or closes), to clear EVERYTHING? I'm looking for a simple solution that would ensure that everything gets destroyed in order to avoid the leak. Is this even possible, or do I have to continue looking into the details of the leak and fixing it on an object by object basis?
Update: Ok, maybe I wasn't descriptive enough. I can't (at least I don't think I can) just set all of my objects to null: I have event handlers for click events etc., so the application needs to be "live" constantly until it is closed. Also, if I then think about just nulling everything out in an "unload" method (called when the page is exited), then all my objects would have to have global scope (right)? What is the best way to remedy this? Is there a way to get a list of all referenced objects so I can null them? Should I add every object I use to an array so that I can dereference it later?
Try window.onbeforeunload or window.onunload and set the variables you use to null.
Or are you looking for something more?
Set your objects to null and they won't be leaked.
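For instance, a minimal sketch assuming your app keeps its state under one global namespace (myApp and its members are hypothetical):
window.onunload = function () {
    myApp.handlers = null; // drop references to DOM nodes and closures
    myApp.cache = null;
    myApp = null;
};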
Check whether you are using a random anti-cache URL parameter; it might cause memory leaks.
IE tries to keep all scripts loaded from the same domain in memory as you navigate from page to page, because there is a high chance that on different pages you will need pretty much the same scripts.
A random anti-cache parameter added to the URL of a script makes it a different script (at least caching is fooled by that).
As we know, IE tries to load all possible scripts for the domain and keep them.
So a random anti-cache parameter leads to memory leaks: otherwise-identical scripts get a different URL every time, so IE thinks they are different, downloads them over and over on each reload, and keeps them all in memory.
