Massive differences between Google Analytics and own data collection - javascript

I want to evaluate the usage of a web app statistically. It has been publicly available since spring of this year.
The web app is linked to Google Analytics. For my own user data collection, I do the following:
A unique user ID (UUID) is created when the web app is opened for the first time. It is stored in localStorage and checked each time the page is loaded again.
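// Create a UUID on the first visit and keep it in localStorage.
if (localStorage.getItem("uuid") === null) {
  localStorage.setItem("uuid", get_uuid());
}

// Random version-4 UUID (a widely copied one-liner). It can be called above
// its definition because function declarations are hoisted.
function get_uuid() {
  return ([1e7]+-1e3+-4e3+-8e3+-1e11).replace(/[018]/g, c =>
    (c ^ crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> c / 4).toString(16)
  );
}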
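The first-visit logic above can be wrapped in a function that also tolerates storage errors. This is a minimal sketch, not the author's code: `getOrCreateUserId` and `MemoryStorage` are hypothetical names, the storage object is passed in so the function also runs outside a browser, and the default generator assumes a modern browser where the standard `crypto.randomUUID()` is available (secure contexts only, i.e. HTTPS). Keep in mind that localStorage is scoped to one origin in one browser profile, so the same person on a phone and a laptop, or in a private window, gets a different ID.

```javascript
// Return the stored user ID, creating and saving one on the first visit.
// `storage` is any object with getItem/setItem (window.localStorage in a browser).
// `makeId` defaults to crypto.randomUUID(); it is only called when no ID exists.
function getOrCreateUserId(storage, makeId = () => crypto.randomUUID()) {
  try {
    let id = storage.getItem('uuid');
    if (id === null) {
      id = makeId();
      storage.setItem('uuid', id);
    }
    return id;
  } catch (e) {
    // Storage may be disabled or throw (e.g. blocked by settings or quota);
    // such visits would end up being recorded without a UUID.
    return null;
  }
}

// Minimal in-memory stand-in for localStorage (hypothetical, for testing).
class MemoryStorage {
  constructor() { this.map = new Map(); }
  getItem(key) { return this.map.has(key) ? this.map.get(key) : null; }
  setItem(key, value) { this.map.set(key, String(value)); }
}
```

In the page itself this would be called as `getOrCreateUserId(window.localStorage)`, and a `null` result sent along with the record marks a visit without a UUID.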
This data is written to a database together with other information (specific page, time, device type, etc.). Users without JavaScript or localStorage are not included; however, they probably could not use the web app correctly anyway.
If I now compare the data from Google Analytics with my own data, the discrepancy is considerable:
Distinct users according to Google: about 900
Distinct users according to UUID: about 400
Additionally, about 100 visits (or interactions) without a UUID were registered.
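The comparison above boils down to counting distinct identifiers. A small sketch, assuming each stored record has a `uuid` field (hypothetical name) that is null or missing when no ID was available. Both this count and, as far as I know, GA's Users metric (based on the client ID in the `_ga` cookie) count identifiers rather than people, so cleared storage, several devices, or bots can shift the two numbers independently.

```javascript
// Count distinct UUIDs and the records that arrived without one.
function summariseRecords(records) {
  const ids = new Set();
  let withoutId = 0;
  for (const record of records) {
    if (record.uuid) {
      ids.add(record.uuid); // repeated visits by the same ID count once
    } else {
      withoutId += 1; // null, empty or missing uuid
    }
  }
  return { distinctUsers: ids.size, recordsWithoutId: withoutId };
}
```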
Now my question is why these large differences exist. In my opinion, my data collection should be fairly accurate. But maybe there is a flaw in my reasoning about the UUID approach? Or could it be that Google counts quite differently, for example including bots that don't leave a UUID behind?
Thank you very much for your answers and considerations.

I'm quite sure you have encountered Google Analytics (GA) spam.
This is possible because GA runs in JavaScript and your tracking ID is visible in the HTML source.
So anyone who wants to pollute your data can send hits with your ID.
Why, you ask? When you notice it, you see websites you don't know listed in your GA data; if you (the admin) open them, you may get a virus or worse.
Don't open those websites...
As far as I know, there are two ways to fix it. The common way is a regex filter:
you need to block every referral from other domains that you don't "know".
This takes time and is not a good approach.
My method is to pass a dimension from the HTML to GA.
If that dimension is missing, the data is not real.
Your JavaScript probably looks something like:
.....
ga('require', 'linkid', 'linkid.js');
ga('require', 'displayfeatures');
ga('send', 'pageview');
</script>
Then we add a dimension, which we register in the GA admin tools:
.....
ga('require', 'linkid', 'linkid.js');
ga('require', 'displayfeatures');
ga('send', 'pageview', {
'dimension1': 'FooBar'
});
</script>
Go to Admin -> Property (the middle column); at the bottom you will find Custom Definitions.
Open Custom Dimensions and add the dimension you added to the HTML.
Now you can set up a filter in the view tab of GA admin to only show data with your custom dimension "FooBar".
Any data that does not have this "FooBar" value is spam that was not generated by your webpage.
Just remember that you need to update all of your GA JavaScript snippets to include the dimension.
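The view filter itself is configured in the GA admin UI, not in code. To make the rule concrete, here is a hypothetical sketch that applies the same check to raw hit payloads: in the Measurement Protocol format that analytics.js sends, custom dimension 1 travels as the `cd1` parameter, so a genuine hit from your pages carries `cd1=FooBar` while a hit sent by someone who only harvested your tracking ID does not. Note that the marker is as visible in your HTML source as the tracking ID, so a spammer who reads it could copy it too; the filter mainly stops spam sent blindly.

```javascript
// Mimic the "include only cd1 = marker" view filter on a raw hit payload.
// `isGenuineHit` is a hypothetical helper; the marker matches the snippet above.
function isGenuineHit(payload, marker = 'FooBar') {
  // URLSearchParams decodes the query string; get() returns null if cd1 is absent.
  return new URLSearchParams(payload).get('cd1') === marker;
}
```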
You can see this spam (if I'm correct) in the Acquisition -> All Traffic -> Referrals report.
If you see sources that you don't recognize and that look odd, they are most likely spam.
Before I used this method, my Referrals report listed about 50 of these fake referrals.

Related

Can you re-"init" a Facebook pixel with different dataProcessing options?

With the advent of the California Consumer Privacy Act (CCPA), it's been necessary for some of our clients to implement Limited Data Use (LDU) policies for Facebook. Our accepted practice has been to explicitly disable LDU (fbq('dataProcessingOptions', [])) until a user opts out (via a consent plugin). Here's the crux of my problem: once a user opts out, I'd like to re-initialize the Facebook pixel with LDU enabled (fbq('dataProcessingOptions', ['LDU'], 0, 0)) so that future events on the page are processed using the LDU policies. Is it possible to simply call fbq('init', '{pixel_id}') a second time and have this "flag" set?
The Google Chrome Facebook Pixel extension will show what is sent for each event.
I was hoping that maybe sending something like fbq('trackCustom', 'optOut') might trigger it to re-send updated dataProcessing options, but it doesn't seem to.
Facebook is shooting everyone in the foot by not making this process clearer. It should absolutely be possible to wipe out the data collected for the session, and that's clearly the best way to do it.
I've spent all weekend trying to do this correctly from both a technical and a legal standpoint, and it's just a nightmare. CCPA is supposed to be opt-out!
This doesn't work:
// CCPA Notice. We allow California users to opt-out from Facebook's data collection by means
// of our 'Do not sell my information' link at the bottom of our website. Please use this link
// to trigger an opt-out via Facebook's API. Questions: privacy at example.com
fbq('dataProcessingOptions', []);
fbq('init', account_id);
fbq('track', 'PageView');

function optOut() {
  // LDU for country 1 (USA) and state 1000 (California)
  fbq('dataProcessingOptions', ['LDU'], 1, 1000);
  fbq('trackCustom', 'registerOptOut');
}
I'd recommend putting a notice like the comment above into your code, because people are out to get us by looking for vulnerable websites, and this makes it look like I know what I'm doing.
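Since calling `fbq('dataProcessingOptions', ...)` after `init` does not seem to change anything on the current page (as described above), one possible workaround is to persist the opt-out and apply the LDU options before `fbq('init')` on every later page load, which is where the snippet above places them. A minimal sketch under assumptions: the storage key `ccpaOptOut` and the helpers `dataProcessingArgs`, `initPixel` and `recordOptOut` are hypothetical, `fbq` is passed in so the logic can be exercised with a stub, and this does not retract events already sent on the current page.

```javascript
// Hypothetical key under which the opt-out choice is persisted.
const OPT_OUT_KEY = 'ccpaOptOut';

// Arguments for fbq('dataProcessingOptions', ...) on this page load.
function dataProcessingArgs(storage) {
  const optedOut = storage.getItem(OPT_OUT_KEY) === '1';
  // ['LDU'], 1, 1000 = LDU for USA / California, as in the question.
  return optedOut ? [['LDU'], 1, 1000] : [[]];
}

// Apply the options before init, then track the page view.
function initPixel(fbq, storage, pixelId) {
  fbq('dataProcessingOptions', ...dataProcessingArgs(storage));
  fbq('init', pixelId);
  fbq('track', 'PageView');
}

// Called from the "Do not sell my information" link: remember the choice
// so that subsequent page loads initialise the pixel with LDU.
function recordOptOut(storage) {
  storage.setItem(OPT_OUT_KEY, '1');
}
```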

Can you prevent users editing the chrome.storage.local values in a Chrome extension? What is the best way to store persistent values for an extension?

So I'm working on a Chrome extension for someone else. I don't want to give away specific details about the project, so I'll use an equivalent example: let's assume it's an extension that runs on an image/forum board. Imagine I have variables such as userPoints, isBanned, etc. The latter is fairly self-explanatory, while the former corresponds to points the user acquires as they perform certain actions, unlocking additional features, etc.
Let's imagine I have code like:
if (accountType !== "banned") {
  if (userPoints > 10000) accountType = "gold";
  else if (userPoints > 5000) accountType = "silver";
  else if (userPoints > 2500) accountType = "bronze";
  else if (userPoints <= 0) accountType = "banned";
  else accountType = "standard";
} else {
  alert("Sorry, you're banned");
  stopExtension();
}
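For reference, the tier logic above can be written as a pure function, which makes the thresholds easy to test in isolation. This is only a restatement with a hypothetical name (`accountTypeFor`); it does nothing against tampering, because its input still comes from storage the user can edit.

```javascript
// Same thresholds as the snippet above; 'banned' stays banned once set.
function accountTypeFor(userPoints, currentType) {
  if (currentType === 'banned' || userPoints <= 0) return 'banned';
  if (userPoints > 10000) return 'gold';
  if (userPoints > 5000) return 'silver';
  if (userPoints > 2500) return 'bronze';
  return 'standard';
}
```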
Obviously, though, it becomes trivial for someone with the knowledge to just browse to the extension's background page and paste chrome.storage.local.set({'userPoints': 99999999}) into the console, giving them full access to the whole site. And with the Internet, someone can of course share this 'hack' on Twitter/YouTube/forums or wherever; since all anyone would need to do is copy and paste a simple one-liner, you could suddenly have thousands of people, even with no programming experience, using a compromised version of your extension.
I realise I could use a database on an external site, but realistically I might want to get/update variables such as userPoints 200+ times per hour if the user browsed the extension's target site the entire time. So the main issues I have with using an external db are:
efficiency: realistically, I don't want every user to be querying the db 200+ times per hour
ease-of-getting-started: I want the user to just download the extension and go. I certainly don't want them to have to sign up. I realise I could create a non-expiring cookie for the user's ID, which would be used to access their data in the db, but I don't want to do that, since users can e.g. clear all cookies etc.
by default, I want all features to be disabled (i.e. effectively being treated like a 'banned' user) - if, for some reason, the connection to the db on my site fails, the user wouldn't be able to use the extension, which I wouldn't want (and speaking from the experience of my parents being with Internet providers whose connection could drop 10 times per hour, failed connections could be a real issue for some people) - in contrast, accessing data from local storage will have something like a 99.999% success rate, I'd assume, so for non-critical extensions like the one I'm creating, that's more than good enough
Still, from what I've found searching, there's no Chrome storage method that doesn't also let the user edit the values. I would have thought there would be a storage method (or at least an option for chrome.storage.local.set(...)) to specify that the value can only be accessed from within the extension's context pages, but I haven't found one.
Currently I'm thinking of encrypting the value to increment by, then obfuscating the code using a tool like obfuscator.io. With that, I can turn a simple, roughly 30-character JS file such as this
userPoints = userPoints + 1000;
into about 80,000 characters... Still, among all the junk, if you have the patience to scroll through the nonsense, it's possible to find what you're looking for:
...[loads of code](_0x241f5c);}}}});_0x5eacdc(),***u=u+parseInt(decrypt('\u2300\u6340'))***;function _0x34ff36(_0x17398d)[loads more code]...
[note that, since it's an extension and the JS files are stored on the user's PC, things like file size or the time to load the JS files from a server are irrelevant]
This means a user couldn't simply run chrome.storage.local.set({'userPoints': 99999999}); they'd instead have to set it to the encrypted version of a number, say chrome.storage.local.set({'userPoints': "✀ເ찀삌ሀ"}). This is better, but obviously by no means secure.
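To see why encrypting the stored value only raises the bar, consider a stand-in scheme. This is a deliberately simple XOR encoding with a hypothetical key, not what the question uses (its obfuscated `decrypt` is unknown). Whatever the algorithm, the key and the encode/decode routines have to ship inside the extension, so anyone who extracts them can produce a valid encoded value for any number.

```javascript
// Hypothetical key: it must be bundled with the extension to decode values.
const EMBEDDED_KEY = 'not-really-secret';

// XOR each character with the key; applying it twice restores the input.
function xorWithKey(text) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const k = EMBEDDED_KEY.charCodeAt(i % EMBEDDED_KEY.length);
    out += String.fromCharCode(text.charCodeAt(i) ^ k);
  }
  return out;
}

const encodeValue = (n) => xorWithKey(String(n));
const decodeValue = (encoded) => parseInt(xorWithKey(encoded), 10);

// Anyone holding the bundled code can "forge" a stored value:
// chrome.storage.local.set({ userPoints: encodeValue(99999999) });
```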
So anyway, back to the original question: is there a way to store persistent values for a Chrome extension without the user being able to edit them?
Thanks

Which data is collected by Google Analytics PageView?

I am reviewing the privacy of the data that Google Analytics collects with the default pageview hit. Here is the code snippet being used:
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-1111111-11', 'auto');
ga('send', 'pageview');
</script>
I can't find any clear answer as to exactly which data is being collected. I want to make sure that no PII or PHI will be collected by accident if the page being tracked contains some, such as name, phone, medical info, etc.
Is there any clear guide that states which data is collected for PageView?
By default, pageviews should not collect any PII, since the pageview data contains:
Page URL / page title: publicly available information about other things (whatever content is on the website), not about the user
User browser / system info: which is technical information (eg browser version)
Like Marie explained, this is something you can verify yourself by inspecting the requests in your browser's developer tools, for instance when browsing Stack Overflow:
The payload being:
v=1&_v=j68&a=1398701675&t=pageview&_s=1&dl=https%3A%2F%2Fstackoverflow.com%2F&ul=en-us&de=UTF-8&dt=Stack%20Overflow%20-%20Where%20Developers%20Learn%2C%20Share%2C%20%26%20Build%20Careers&sd=24-bit&sr=1920x1080&vp=1840x486&je=0&_u=SACAAEABE~&jid=&gjid=&cid=1389717770.1529853314&uid=148108&tid=UA-108242619-1&_gid=263772020.1532245622&cd1=148108&cd3=Home%2FIndex&z=522475539
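The payload is a URL-encoded query string, so it can be decoded with `URLSearchParams` to list what was sent. A sketch with a hypothetical helper name that keeps the fields whose values come from the page or the site's own code rather than from the browser: `dl` (document location), `dp` (page path, if set), `dt` (document title), `uid` (user ID) and custom dimensions `cdN`. These are the fields worth reviewing for PII; in the Stack Overflow payload above, `uid` and `cd1` carry a site-assigned user ID.

```javascript
// Decode a hit payload and keep the page- or site-supplied fields.
function fieldsToReview(payload) {
  const params = new URLSearchParams(payload); // decodes %2F, %20, %26, ...
  const out = {};
  for (const [key, value] of params) {
    if (['dl', 'dp', 'dt', 'uid'].includes(key) || /^cd\d+$/.test(key)) {
      out[key] = value;
    }
  }
  return out;
}
```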
However, in some cases pageviews can collect PII, most commonly when page URLs or titles contain PII. I've faced such a situation with a company that was running GA on its intranet, where PII was getting exposed in 2 ways:
Employee profiles: https://myintranet.net/employees/firstname-lastname
Employee search: the most common task on the intranet (a large corporation) was looking people up by email address, which added a search parameter to the URL (https://myintranet.net/search/q=f.lastname@company.com) that GA tracked as both a pageview AND a search keyword
General remark about PII warnings: you simply won't find them. Google won't take on the liability of saying that something does or does not contain PII, because it's out of their control: analytics implementations are customizable, so any data point can potentially contain PII. It's up to you (testing before implementation, monitoring once live) to ensure your GA implementation doesn't collect PII. If it does, you'll get a warning from Google, and if you don't take action to correct it, they will shut down your account.
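One way to keep such data out of GA, sketched under assumptions: rewrite the page path before the hit is sent, using the standard analytics.js `page` field. The regex and the `redactEmails` helper are hypothetical and only catch e-mail addresses (plain or `%40`-encoded); name-based URLs like the employee profiles need route-specific rules, and search-keyword tracking needs its own handling.

```javascript
// Anything that looks like an e-mail address, with @ or its %40 encoding.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+(?:@|%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Replace every e-mail-like substring with a placeholder.
function redactEmails(text, placeholder = '[REDACTED_EMAIL]') {
  return text.replace(EMAIL_PATTERN, placeholder);
}
```

In the tracking snippet this would be applied with `ga('set', 'page', redactEmails(location.pathname + location.search));` before `ga('send', 'pageview');`.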

Google analytics page tracking is not working properly in angularjs app

My project is in AngularJS, and it serves both a mobile app and a desktop site.
I have saved analytics.js locally and use it in the GA tracking code.
My tracking code is in index.html file:
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','js/analytics.js','ga');
ga('create', 'UA-57325467-1', 'auto');
ga('set', 'checkStorageTask', null);
ga('set', 'checkProtocolTask', null);
ga('send', 'pageview');
In the controller, I use the following code for page tracking:
ga('send', 'pageview', $location.url());
It shows 1 user online (that's me), and it tracks events on all pages, but the problem is with pageviews.
When I visit different pages of my application and check Google Analytics Real-Time -> Overview,
I see "/" there, whereas when I come back to the pages, page tracking works fine (but not always).
Basically, it doesn't track pages all the time: sometimes it works and sometimes I see only "/".
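A possible explanation, not verified against this app: as far as I know, analytics.js records the page as `location.pathname + location.search` by default, so with AngularJS's default hash-based routes (`#/...`) the pageview sent from index.html shows up as "/". A common pattern is to set the `page` field from the route on every route change and then send the pageview. A minimal sketch; `trackRoute` is a hypothetical helper, and the wiring in the comment assumes ngRoute's `$routeChangeSuccess` event. If that event also fires on the initial load, the `ga('send', 'pageview')` in index.html would count the first page twice.

```javascript
// Attribute this and later hits to the client-side route, then send a pageview.
function trackRoute(ga, url) {
  ga('set', 'page', url);
  ga('send', 'pageview');
}

// Possible wiring (assumes ngRoute):
// angular.module('app').run(['$rootScope', '$location', function ($rootScope, $location) {
//   $rootScope.$on('$routeChangeSuccess', function () {
//     trackRoute(window.ga, $location.url());
//   });
// }]);
```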
For using Google Analytics in Angular projects, I recommend using the Angular module.
It does what you want automatically.
I know it's not a direct answer to your question, but instead of debugging it yourself, you might use a unit-tested library.
Ensure that you have enabled Real-Time Analytics. Follow the steps in this reference document to enable it:
https://support.google.com/analytics/answer/1638635?hl=en
Other reasons your data may not be shown:
Data collection limit:
If a property sends more hits per month to Analytics than allowed by the Analytics Terms of Service,
there is no assurance that the excess hits will be processed. If the property's hit volume exceeds this limit,
a warning may be displayed in the user interface and you may be prevented from accessing reports.
Data processing latency:
Processing latency is 24-48 hours. For standard accounts that send more than 200,000 sessions per day to Analytics, reports are refreshed only once a day.
This can delay updates to reports and metrics for up to two days.
To restore intra-day processing, reduce the number of sessions your account sends to < 200,000 per day.
Reference:
https://support.google.com/analytics/answer/1070983?hl=en

How to use pure JavaScript to determine whether Facebook/Twitter is blocked?

Some countries, like China, block Facebook/Twitter. How can I use JavaScript to check whether a website is not accessible?
update:
I am adding a "Share to Facebook" button on a web page. 50% of the visitors are from China and 50% are from outside of China.
Visitors from China would never see that Facebook button because Facebook is blocked. I want to use $.hide() or $.empty() to remove the related HTML if I detect that Facebook is blocked. How can I do that?
You can check whether loading the Facebook SDK (//connect.facebook.net/en_UK/all.js) is blocked in China.
If this is the case then you could do something like this:
$.getScript('//connect.facebook.net/en_UK/all.js')
  // .done() works in jQuery 1.5+; the old .success() alias was removed in 3.0
  .done(function () {
    // do something if Facebook is available
  });
Be careful: you need to define a timeout if you want a callback for the failure case. I need to check the correct settings later, but currently I don't have time.
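The timeout can also be handled without jQuery, which matches the question's "pure JavaScript" wording. A minimal sketch: `isReachable` races a loader against a timer and resolves to `true` or `false`, and `loadScript` injects a `<script>` tag that settles on its `load`/`error` events. Both names and the timeout value are hypothetical choices. A failed load only shows that the resource could not be fetched by this client, which can also be caused by ad or tracking blockers, not just national firewalls.

```javascript
// Resolve true if load() succeeds within `ms` milliseconds, otherwise false.
function isReachable(load, ms) {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(false), ms);
    Promise.resolve()
      .then(load) // a synchronous throw in load() becomes a rejection
      .then(
        () => { clearTimeout(timer); resolve(true); },
        () => { clearTimeout(timer); resolve(false); }
      );
  });
}

// Browser-only loader: resolves on the script's load event, rejects on error.
function loadScript(src) {
  return new Promise((resolve, reject) => {
    const script = document.createElement('script');
    script.src = src;
    script.async = true;
    script.onload = resolve;
    script.onerror = reject;
    document.head.appendChild(script);
  });
}
```

Usage, with a hypothetical button ID: `isReachable(() => loadScript('//connect.facebook.net/en_UK/all.js'), 5000).then(ok => { if (!ok) $('#fb-share').hide(); });`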
EDIT
Based on funkybro's comment, it would be better to do a JSONP request. Loading the API would inject a bunch of code you probably don't need.
So just request e.g.:
$.getJSON('https://graph.facebook.com/feed?callback=?')
  .done(function () {
    // do something if Facebook is available
  });
The request will return an error because you don't provide a graph node, but receiving an error message from Facebook means that it is reachable for the client.
Use jQuery.get like this:
$.get("http://facebook.com").fail(function() {
$(...).hide()
}).done(function() {
$(...).show()
})
Note that this is a cross-origin request, which will fail for security reasons unless you disable that browser feature.
If that's not possible for you, I suggest using GeoIP or similar technologies to determine the user's origin.
