I am reviewing the privacy of data collected by Google Analytics when collecting on the default PageView action. Here is the code snippet being used:
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-1111111-11', 'auto');
ga('send', 'pageview');
</script>
I can't find any clear answer as to exactly which data is being collected. I want to make sure that no PII or PHI will be collected by accident if the page being tracked contains some, such as name, phone, medical info, etc.
Is there any clear guide that states which data is collected for PageView?
By default, pageviews should not collect PII, since the pageview data contains:
Page URL / page title: publicly available information about other things (whatever content is on the website), not about the user
User browser / system info: technical information (e.g. browser version)
As Marie explained, you can verify this yourself by inspecting the outgoing request in the browser's developer tools, for instance when browsing Stack Overflow:
The payload being:
v=1&_v=j68&a=1398701675&t=pageview&_s=1&dl=https%3A%2F%2Fstackoverflow.com%2F&ul=en-us&de=UTF-8&dt=Stack%20Overflow%20-%20Where%20Developers%20Learn%2C%20Share%2C%20%26%20Build%20Careers&sd=24-bit&sr=1920x1080&vp=1840x486&je=0&_u=SACAAEABE~&jid=&gjid=&cid=1389717770.1529853314&uid=148108&tid=UA-108242619-1&_gid=263772020.1532245622&cd1=148108&cd3=Home%2FIndex&z=522475539
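To read a payload like this field by field, it can be decoded with the standard URLSearchParams API. This is a minimal sketch: the helper name decodeHit is made up for illustration, the payload is a shortened copy of the one above, and the comments use the Measurement Protocol parameter names (t = hit type, dl = document location, uid = user ID).

```javascript
// Hypothetical helper: turn a raw analytics.js hit payload into a plain object.
function decodeHit(payload) {
  var fields = {};
  new URLSearchParams(payload).forEach(function (value, key) {
    fields[key] = value;
  });
  return fields;
}

// Shortened copy of the payload shown above.
var hit = decodeHit(
  'v=1&t=pageview&dl=https%3A%2F%2Fstackoverflow.com%2F&ul=en-us' +
  '&sr=1920x1080&cid=1389717770.1529853314&uid=148108&tid=UA-108242619-1'
);

console.log(hit.t);   // hit type
console.log(hit.dl);  // document location (page URL)
console.log(hit.uid); // user ID, present only because this site sets one
```

Scanning the decoded fields for anything that looks like a name, email address or record number is a quick way to spot accidental PII before going live.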
However, in some cases pageviews can collect PII, most commonly when the page URL or title itself contains PII. I faced this situation with a company that was running GA on its intranet, where PII was being exposed in two ways:
Employee profiles: https://myintranet.net/employees/firstname-lastname
Employee search: the most common task on the intranet (large corporation) was looking people up by email address, which added a search parameter to the URL (https://myintranet.net/search/q=f.lastname#company.com) that GA tracked as both a pageview AND a search keyword
General remark about PII warnings: you simply won't find them. Google will not accept liability by stating that something does or does not contain PII, because that is out of its control: analytics implementations are customizable, so any data point can potentially contain PII. It is therefore up to you (testing before implementation, monitoring once live) to ensure your GA implementation doesn't collect PII. If it does, you'll get a warning from Google, and if you take no action to correct it, they will shut down your account.
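One way to keep such values out of hits is to scrub the URL before the pageview is sent. This is a sketch only: redactUrl and its two patterns are hypothetical and cover just the two intranet cases described above (the /employees/ path layout and an email-like value; the second sample URL is an illustrative variant with a real @ sign). A real implementation would need patterns matched to the site's own URLs and page titles. The location and page fields are standard analytics.js fields.

```javascript
// Hypothetical scrubber for the two cases above; the patterns are illustrative only.
var EMAIL_PATTERN = /[^\s\/?&=#]+(@|%40)[^\s\/?&=#]+\.[a-z]{2,}/gi;
var PROFILE_PATTERN = /(\/employees\/)[^\/?#]+/i;

function redactUrl(url) {
  return url
    .replace(PROFILE_PATTERN, '$1[redacted]')
    .replace(EMAIL_PATTERN, '[redacted-email]');
}

console.log(redactUrl('https://myintranet.net/employees/firstname-lastname'));
console.log(redactUrl('https://myintranet.net/search?q=f.lastname@company.com'));

// In the page, the scrubbed values would replace the defaults before sending.
if (typeof ga === 'function' && typeof location !== 'undefined') {
  ga('set', 'location', redactUrl(location.href));
  ga('set', 'page', redactUrl(location.pathname + location.search));
  ga('send', 'pageview');
}
```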
Related
I have a JavaScript tag that tracks individual logged-in users with Google Analytics. It contains the statement "ga('send', 'pageview');", which makes the pages a particular user has viewed show up in the User Explorer report of Google Analytics. I want to send events (which I have already created in GTM) along with the pageviews, tied to the User-ID, i.e. shKey in my case.
Here is the code in my JS tag:
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-xxxxxxxxx-x' , 'auto');
ga('set', 'userId', arguments.shKey);
ga('set', 'dimension1', arguments.fname);
ga('set', 'dimension2', arguments.type);
ga('send', 'pageview');
ga('send', 'event');
</script>
You have 2 issues: your code won't send an event (you're missing parameters), and you need to address PII (see further below)
This is how you send an event programmatically:
https://developers.google.com/analytics/devguides/collection/analyticsjs/events
ga('send', 'event', 'My Category', 'My Action', 'My Label');
However if you're using GTM, you might want to look into how to handle events with GTM:
https://developers.google.com/tag-manager/devguide
// You need to configure GTM UI to handle below event as desired
dataLayer.push({'event': 'event_name'});
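With GTM, the data an event needs (including a user identifier) can travel in the same push. This is a sketch under assumptions: pushEvent is a made-up helper, the keys userId and eventLabel are hypothetical and only have an effect once matching triggers and Data Layer Variables are configured in the GTM UI, and 'abc123' stands in for the shKey value.

```javascript
// globalThis is window in a browser; create the array if GTM has not done so yet.
globalThis.dataLayer = globalThis.dataLayer || [];

// Hypothetical helper: push a named event plus extra key/value pairs.
function pushEvent(name, params) {
  var entry = Object.assign({ event: name }, params || {});
  globalThis.dataLayer.push(entry);
  return entry;
}

pushEvent('event_name', { userId: 'abc123', eventLabel: 'My Label' });
console.log(globalThis.dataLayer[globalThis.dataLayer.length - 1]);
```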
Regarding PII, this is the authoritative answer from Google:
https://support.google.com/analytics/answer/6366371?hl=en
Google policies mandate that no data be passed to Google that Google
could use or recognize as personally identifiable information (PII).
PII includes, but is not limited to, information such as email
addresses, personal mobile numbers, and social security numbers
If you want to send an identifier derived from PII, it must be hashed before being sent. From the same link as above:
You can send Google Analytics an encrypted identifier or custom
dimension that is based on PII, as long as you use the proper
encryption level. Google has a minimum hashing requirement of SHA256
On top of that there might be additional legal restrictions depending on the jurisdiction you and your customers are in (eg in Europe you would have GDPR).
Your ga('send', 'event') has a wrong signature. Events require category and action. So your event should look something like: ga('send', 'event', 'category', 'action', 'label');
Your approach is strategically wrong: you're solving in JS what is supposed to be solved via the GTM UI. In the short term it may seem easier and more elegant to do it in JS. In the long term, however, it makes GTM redundant and increases your technical debt, so the implementation becomes really difficult to maintain.
I want to evaluate the usage of a web app statistically. It has been publicly available since spring of this year.
The web app is linked to Google Analytics. In addition, I collect my own usage data as follows:
A unique user ID is created when the web app is opened for the first time. It is stored in localStorage and checked each time the page is loaded again.
if (localStorage.getItem("uuid") === null) {
localStorage.setItem("uuid", get_uuid());
}
function get_uuid() {
return ([1e7]+-1e3+-4e3+-8e3+-1e11).replace(/[018]/g, c =>
(c ^ crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> c / 4).toString(16)
)
}
This data is written to a database together with other information (concrete page, time, device type, etc.). Users without Javascript or localStorage will not be included; however, they will probably not be able to use the web app correctly anyway.
If I now compare the data from Google Analytics with my own variant, the discrepancy is considerable.
Distinct users according to Google: about 900
Distinct users according to UUID: about 400
In addition, about 100 visits (or interactions) without a UUID were registered.
Now my question is why the difference is so large. In my opinion, my own data collection should be fairly accurate. Is there a flaw in my UUID approach? Or does Google count quite differently, for example by including robots that don't leave a UUID behind?
Thank you very much for your answers and considerations.
I'm quite sure you have encountered Google Analytics (GA) spam.
This is possible because GA runs as client-side JavaScript and your tracking ID is visible in the HTML source.
So anyone who wants to inject spam into your data can use your ID.
Why would they, you ask? The spam shows up in your GA data as web pages you don't recognize; when you (the admin) open them, you get a virus or worse.
Don't open those web pages.
As far as I know there are two ways to fix it. A regex filter is the common way:
you block all referrals from domains you don't "know".
This takes time and is not a good approach.
My method is to pass a dimension from the HTML to GA.
If that dimension is missing, the data is not real.
Your JavaScript probably looks something like:
.....
ga('require', 'linkid', 'linkid.js');
ga('require', 'displayfeatures');
ga('send', 'pageview');
</script>
Now we add a dimension, which we will later pick up in the GA admin tools:
.....
ga('require', 'linkid', 'linkid.js');
ga('require', 'displayfeatures');
ga('send', 'pageview', {
'dimension1': 'FooBar'
});
</script>
Go to Admin -> Property (the middle column); at the bottom you will find Custom Definitions.
Open Custom Dimensions and add the dimension you added to the HTML.
Now you can set up a filter in the View column of GA Admin to only include data with your custom dimension "FooBar".
Any data that does not have this "FooBar" is spam that was not generated from your web page.
Just remember that you need to change every GA JavaScript snippet and add the dimension.
You can see this spam (if I'm correct) in the Acquisition -> All Traffic -> Referrals report.
If you see sources that you don't recognize and that look odd, they are most likely spam.
Before I used this method my Referrals looked something like this; there were about 50 of these fake referrals.
I recently entered a partnership with a website that allows me to put my Universal Analytics tracking code on a single web page. The website's owner already has his own tracking code there.
I've searched on Google Developers and Stack to find how to use multiple tracking codes. I built this code but I'm not sure whether it's OK, I don't want to interfere with his data. I just want to get common analytics data in my own account.
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
//1- Create default tracker and mytracker
ga('create', 'UA-XXXXX-1', 'auto'); // His default tracker
ga('create', 'UA-XXXXX-50', 'auto', 'mytracker'); // My tracker
//2- Get the pageview data of mytracker
ga(function(){
var nonDefaultTracker = ga.getByName('mytracker');
var mytrackerPageview = nonDefaultTracker.get('pageview');
console.log(mytrackerPageview);
});
//3- Update mytracker pageview data
ga('mytracker.set','pageview');
//4- Send mytracker and default tracker pageview data
ga('send','pageview');
ga('mytracker.send','pageview');
</script>
Has anyone already done this?
Thanks!
You do not need steps 2 and 3. The tracker's get method returns the field named in its argument, and since GA has no data field called pageview, it returns undefined (which is also the value you end up setting as the page path; you probably do not want this).
However, pageview tracking works fine without trying to retrieve values from the other tracker. Unless you have set fields specifically for a named tracker, both will report the same thing.
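Put together, a minimal sketch of the trimmed-down setup might look like this. The one-line command queue at the top is a stand-in for the full loader snippet so that the example runs on its own; the property IDs are the placeholders from the question.

```javascript
// Stand-in for the loader snippet: queues commands until analytics.js loads.
var ga = globalThis.ga = globalThis.ga || function () {
  (ga.q = ga.q || []).push(arguments);
};

ga('create', 'UA-XXXXX-1', 'auto');               // site owner's default tracker
ga('create', 'UA-XXXXX-50', 'auto', 'mytracker'); // additional named tracker

ga('send', 'pageview');           // pageview for the default tracker
ga('mytracker.send', 'pageview'); // same pageview for the named tracker
```

Commands without a prefix go to the default tracker and commands prefixed with mytracker. go to the named one, so the owner's data is not touched by the second tracker's hits.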
My project is built on AngularJS and serves as both a mobile app and a desktop site.
I have saved analytics.js locally and reference it in the GA tracking code.
My tracking code is in the index.html file:
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','js/analytics.js','ga');
ga('create', 'UA-57325467-1', 'auto');
ga('set', 'checkStorageTask', null);
ga('set', 'checkProtocolTask', null);
ga('send', 'pageview');
In the controller, I use the following code for page tracking:
ga('send', 'pageview', $location.url());
It shows 1 user online (that's me) and it tracks events on all pages, but there is a problem with pageviews.
When I visit different pages of my application and check Google Analytics under Real-Time -> Overview,
I see "/" there, whereas when I come back to the pages, page tracking works fine (but not always).
Basically it doesn't track pages reliably: sometimes it works and sometimes I see only "/".
For using google analytics in angular projects I recommend using the angular module.
It does what you want automatically.
I know this is not a direct answer to your question, but rather than debugging it yourself you might use a unit-tested library.
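If the tracking stays hand-written, a common pattern for single-page apps is to update the tracker's page field first and then send the pageview, so that later hits carry the new path too. This is a sketch, assuming ngRoute's $routeChangeSuccess event is available (with a different router the event would differ); trackVirtualPageview and the sample path are made up, and the tracker function is passed in so the helper can be tried outside a browser.

```javascript
// Hypothetical helper: record a virtual pageview for a client-side route.
function trackVirtualPageview(gaFn, path) {
  gaFn('set', 'page', path);  // later hits (e.g. events) inherit this path
  gaFn('send', 'pageview');
}

// Possible AngularJS wiring (assumes a module variable `app` and ngRoute):
// app.run(['$rootScope', '$location', function ($rootScope, $location) {
//   $rootScope.$on('$routeChangeSuccess', function () {
//     trackVirtualPageview(window.ga, $location.url());
//   });
// }]);

// Dry run with a fake tracker that only records the calls.
var calls = [];
trackVirtualPageview(function () {
  calls.push(Array.prototype.slice.call(arguments));
}, '/products/42');
console.log(calls);
```

With wiring like this, the ga('send', 'pageview') in index.html would probably have to be removed, otherwise the first view may be counted twice.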
Ensure that you have enabled Real-Time Analytics. Follow the steps in this reference document to enable it:
https://support.google.com/analytics/answer/1638635?hl=en
Other reasons your data may not be shown are:
Data collection limit:
If a property sends more hits per month to Analytics than allowed by the Analytics Terms of Service,
there is no assurance that the excess hits will be processed. If the property's hit volume exceeds this limit,
a warning may be displayed in the user interface and you may be prevented from accessing reports.
Data processing latency:
Processing latency is 24-48 hours. Standard accounts that send more than 200,000 sessions per day to Analytics will result in the reports being refreshed only once a day.
This can delay updates to reports and metrics for up to two days.
To restore intra-day processing, reduce the number of sessions your account sends to < 200,000 per day.
Reference:
https://support.google.com/analytics/answer/1070983?hl=en
UPDATE
http://jsfiddle.net/musicisair/rsKtp/embedded/result/
Google Analytics sets 4 cookies that will be sent with all requests to that domain (and all of its subdomains). From what I can tell, no server actually uses them directly; they're only sent with __utm.gif as query params.
Now, obviously Google Analytics reads, writes and acts on their values and they will need to be available to the GA tracking script.
So, what I am wondering is if it is possible to:
rewrite the __utm* cookies to local storage after ga.js has written them
delete them after ga.js has run
rewrite the cookies FROM local storage back to cookie form right before ga.js reads them
start over
Or, monkey patch ga.js to use local storage before it begins the cookie read/write part.
Obviously if we are going so far out of the way to remove the __utm* cookies we'll want to also use the Async variant of Analytics.
I'm guessing the down vote was because I didn't ask a question. DOH!
My questions are:
Can it be done as described above?
If so, why hasn't it been done?
I have a default HTML/CSS/JS boilerplate template that passes YSlow, PageSpeed, and Chrome's Audit with near perfect scores. I'm really looking for a way to squeeze those remaining cookie bytes from Google Analytics in browsers that support local storage.
Use this:
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
if(window.localStorage) {
ga('create', 'UA-98765432-1', 'www.example.com', {
'storage': 'none'
, 'clientId': window.localStorage.getItem('ga_clientId')
});
ga(function(tracker) {
window.localStorage.setItem('ga_clientId', tracker.get('clientId'));
});
}
else {
ga('create', 'UA-98765432-1', 'www.example.com');
}
ga('send', 'pageview');
First, I check if localStorage is supported. If it is supported then the 'storage': 'none' option will disable cookies. Now we can set the clientId from localStorage. If it is empty, Google Analytics will generate a new one for us. We save the new (or existing) clientid in localStorage after the tracker loads.
If localStorage is not supported, I just use the regular analytics method. After the initialization I send a pageview via ga('send', 'pageview').
Also, check out this plunk: http://plnkr.co/MwH6xwGK00u3CFOTzepK
Some experimentation in Chrome shows that it may be possible to use getters and setters to patch document.cookie for this, something like:
document.__defineGetter__('cookie', function () {
// Replace this with code to read from localstorage
return "hello";
});
document.__defineSetter__('cookie', function (value) {
// Replace this with code to save to localstorage
console.log(value);
});
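The same idea can be written with Object.defineProperty, the standard replacement for the deprecated __defineGetter__ / __defineSetter__. This is a rough sketch, not a tested drop-in: redirectCookies and the storage key are hypothetical, cookie attributes such as expires and path are ignored (so cookies never expire and cannot be deleted), and whether document.cookie can be redefined at all depends on the browser.

```javascript
// Hypothetical helper: serve doc.cookie reads and writes from a Storage-like object.
function redirectCookies(doc, storage, key) {
  function load() {
    return JSON.parse(storage.getItem(key) || '{}');
  }
  Object.defineProperty(doc, 'cookie', {
    configurable: true,
    get: function () {
      var jar = load();
      return Object.keys(jar).map(function (name) {
        return name + '=' + jar[name];
      }).join('; ');
    },
    set: function (value) {
      // Keep only "name=value"; attributes after the first ';' are dropped.
      var pair = String(value).split(';')[0];
      var eq = pair.indexOf('=');
      if (eq < 1) { return; }
      var jar = load();
      jar[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
      storage.setItem(key, JSON.stringify(jar));
    }
  });
}

// Dry run with stand-ins; in a page this would be
// redirectCookies(document, window.localStorage, 'cookieJar').
var fakeStore = {
  data: {},
  getItem: function (k) { return k in this.data ? this.data[k] : null; },
  setItem: function (k, v) { this.data[k] = String(v); }
};
var fakeDoc = {};
redirectCookies(fakeDoc, fakeStore, 'cookieJar');
fakeDoc.cookie = '__utma=1.2.3; path=/; expires=Fri, 01 Jan 2038 00:00:00 GMT';
fakeDoc.cookie = '__utmz=4.5.6; path=/';
console.log(fakeDoc.cookie);
```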
ga.js (or any other JavaScript) could run and access cookies as normal; they just would never get passed to the server.
Obviously this will only work in some browsers. Browsers in which it doesn't work will have to fall back to normal cookies.
There's some related ideas in this question: Is it possible to mock document.cookie in JavaScript?
Yes it can be done. You only have to request the __utm.gif with the parameters. The rest of the data is just used for keeping track of the source, session start time and/or previous visits.
You can easily transfer the cookies both ways, so your first approach should work fine.
Whether your second approach works, I'm not sure. I don't know the ga.js code well enough to estimate whether that would be easily possible.
There is also a third option, run your own version of ga.js. You are not required to use the Google version.
Can it be done as described above?
Yes
Why hasn't it been done?
the cookies are small, there isn't that much benefit if you use cookieless domains for all your static content
it's less convenient since a lot of browsers don't support it yet