Override UserID in Google Analytics

To import my CRM data into Google Analytics (GA), I linked the User ID of my users with the Client ID in GA.
For this, I used the following code from GA documentation:
ga('set', 'userId', '432432');
Over time, the format of the User IDs on my website has changed - instead of the numbers, hashes are now used.
Can I now use the same code above, only with the new identifiers of my users, to send User IDs to GA without damaging my current analytics?
In short, can I override the current User IDs in GA so that one user is not identified by the GA system as two different people?

You can't overwrite the historical data already processed by Google Analytics, only those of the current day.
You could apply the new algorithm only to new users of the crm, from a given id on, leaving the same encoding (numbers) for the previous ones (users already processed by Analytics).
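A minimal sketch of that idea, assuming each CRM user record carries both its old numeric ID (`legacyId`) and its new hash (`hashId`), and that you know the highest numeric ID ever sent to GA. The field names, the cutoff value and `currentUser` are all made up for illustration:

```javascript
// Hypothetical cutoff: the highest numeric ID that was ever sent to GA.
const LAST_NUMERIC_ID = 500000;

// Pick the identifier to send to GA for a CRM user.
// `legacyId` (number or null) and `hashId` (string) are assumed field names.
function gaUserIdFor(user) {
  const hasLegacyId =
    Number.isInteger(user.legacyId) && user.legacyId <= LAST_NUMERIC_ID;
  // Users GA has already seen keep their old numeric ID, sent as a string.
  return hasLegacyId ? String(user.legacyId) : user.hashId;
}

// Example record; in a real page this would come from your own backend.
const currentUser = { legacyId: 432432, hashId: 'a1b2c3d4' };

// Guarded so the sketch also runs outside a browser with analytics.js loaded.
if (typeof ga === 'function') {
  ga('set', 'userId', gaUserIdFor(currentUser));
}
```

This way users who were already tracked keep a single identity, and only users created after the cutoff are sent with the hashed format.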

If you have a mapping table between the new and old ids there is a solution.
You need to export the historical data from GA. (You can use the paid service from Scitylana for this - you get the data in BigQuery or in S3)
Then you need a copy of the new_id/old_id mapping table in the database where you put the exported data from Scitylana.
Since you can no longer rely on the ga:userType variable (new/returning), you need to create a query that calculates it again using the consolidated new IDs.
This can all be set up in a flow that updates nightly.
You will then need to analyze the data via SQL or use a dashboard tool like Power BI, Data Studio, Tableau etc.
Since the data from Scitylana is hit-level, you can calculate everything correctly, with no need to worry about double aggregations etc.
(I work at Scitylana)
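To illustrate the recalculation of new/returning, here is a rough sketch of the logic. In practice this would most likely be a SQL query over the exported tables; the JavaScript below only shows the idea. The row shape (`userId`, `startTime`) and the mapping being available as a `Map` are assumptions, not the actual export schema:

```javascript
// Map an exported user ID to its consolidated (new) ID.
// `idMap` is the old_id -> new_id mapping table, assumed loaded as a Map.
function consolidateId(userId, idMap) {
  return idMap.has(userId) ? idMap.get(userId) : userId;
}

// Label each session as new or returning after consolidating the IDs.
// `sessions` is assumed to be [{ userId, startTime }], startTime being a number.
function recomputeUserType(sessions, idMap) {
  const seen = new Set();
  return [...sessions]
    .sort((a, b) => a.startTime - b.startTime) // oldest session first
    .map((session) => {
      const id = consolidateId(session.userId, idMap);
      const userType = seen.has(id) ? 'Returning Visitor' : 'New Visitor';
      seen.add(id);
      return { ...session, userId: id, userType };
    });
}
```

A user who first appeared under the numeric ID and later under the hash is then counted once as new and afterwards as returning.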

Creating temp URLs in single page applications

In my react based single page application, my page is divided in two panes.
Left Pane: Filter Panel.
Right Pane: Grid (table containing data that passes through applied filters)
In summary, I have an application that looks very similar to amazon.com. By default, when user hits an application's root endpoint (/) in the browser, I fetch last 7 days of data from the server and show it inside the grid.
The filter panel has a couple of filters (e.g. a time filter to fetch data that falls inside a specified time interval, IDs to search for data with a specific ID, etc.) and a search button in the header of the filter panel. Hitting the search button makes a POST call to the server with the selected filters in the form body; the server returns the data that matches those filters, and my frontend application displays it inside the grid.
Now, when someone hits the search button in the filter panel, I want to reflect the selected filters in the query parameters of the URL. That would let me share these URLs with other users of my website, so that they can see the filters I applied and see only the matching data inside the grid.
The problem is that if I use an HTTP GET with query parameters on search button click, I will end up breaking the application because of the limits that different browsers impose on URL length.
Please suggest a correct way to create such URLs, so that the selected filters can be set in the filter panel without causing any side effects in my application.
Possible solution: Considering the fact that we cannot directly add plain strings in query parameter because of URL length limitation from different browsers (Note: Specification does not limit the length of an HTTP Get request but different browsers implement their own limitations), we can use something like message digest or hash (convert input of arbitrary length into an output of fixed length) and save it in DB for server to understand the request and serve content back. This is just a thought, I am not sure whether this is an ideal solution to this problem.
Behavior of other heavily used websites:
amazon.com, newegg.com -> use hashed URLs.
kayak.com -> since they have very well defined keywords, they use short forms like IN for INDIA, BLR for Bangalore etc. and combine this with negation logic to further optimize maximum URL length. Not checked, but this will presumably break after a large selection of filters.
flipkart.com -> appends strings directly to query parameters and breaks after the limit is breached. Verified this.

In response to @cauchy's answer, we need to make a distinction between hashing and encryption.
Hashing
Hashes are by necessity irreversible. In order to map the hash to the specific filter combination, you would either need to
1) hash each permutation of filters on the server for every request to try matching the requested hash (computationally intensive), or
2) store a map of hash to filter combination on the server (memory intensive).
For the vast majority of cases, option 1 is going to be too slow. Depending on the number of filters and options, option 2 may require a sizable map, but it's still your best option.
Encryption
In this scheme, the server would send its public key to the client, then the client could use that to encrypt its filter options. The server would then decrypt the encrypted data with its private key. This is good, but your encrypted data will not be fixed length. So, as more options are selected, you run into the same problem of indeterminate parameter length.
Thus, in order to ensure your URL is short for any number of filters and options, you will need to maintain a mapping of hash->selection on the server.
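As a rough sketch of such a hash->selection mapping on a Node.js server: the in-memory `Map` stands in for whatever persistent store you use, and all function names are made up for illustration. Note that this variant stores a selection the first time it is searched instead of pre-computing every permutation, and that it only sorts top-level keys when canonicalizing:

```javascript
const crypto = require('crypto');

// In-memory stand-in for the persistent hash -> filters store (assumption).
const filterStore = new Map();

// Serialize filters with sorted top-level keys so equal selections
// always produce the same string, and therefore the same hash.
function canonicalize(filters) {
  const sorted = {};
  for (const key of Object.keys(filters).sort()) sorted[key] = filters[key];
  return JSON.stringify(sorted);
}

// Store the selection and return a short, fixed-length token for the URL.
function saveFilters(filters) {
  const canonical = canonicalize(filters);
  const token = crypto
    .createHash('sha256')
    .update(canonical)
    .digest('hex')
    .slice(0, 16); // truncated for brevity; collisions become possible, if unlikely
  filterStore.set(token, canonical);
  return token;
}

// Look the token up again; returns null for unknown tokens.
function loadFilters(token) {
  const canonical = filterStore.get(token);
  return canonical === undefined ? null : JSON.parse(canonical);
}
```

The token has the same length no matter how many filters are selected, which is what keeps the URL short.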
How should we handle permanent vs temporary links?
You mentioned in your comment above
If we use some persistent store to save the mapping between this hash to actual filters, we would ideally want to segregate long-lived "permalinks" from short-lived ephemeral URLs, and use that understanding to efficiently expire the short-lived hashes.
You likely have a service on the server that handles all of the filters that you support in your application. The trick here is letting that service also manage the hashmap. As more filters and options are added/removed, the service will need to re-hash each permutation of filter selections.
If you need strong support for permalinks, then whenever you remove filters or options, you'll want to maintain the "expired" hashes and change their mapping to point to a reasonable alternative hash.
When do we update hashes in our DB?
There are lots of options, but I would generally prefer build time. If you're using a CI solution like Jenkins, Travis, AWS CodePipeline, etc., then you can add a build step to update your DB. Basically, you're going to...
1) Keep a persistent record of all the existing supported filters.
2) On build, check to see if there are any new filters. If so...
   a) Add those filters to the record from step 1.
   b) Hash all new filter permutations (just those that include your new filters) and store those in the hash DB.
3) Check to see if any filters have been removed. If so...
   a) Remove those filters from the record from step 1.
   b) Find all the hashes for permutations that include those filters and either...
      - remove those hashes from the DB (weak permalinks), or
      - point that hash to a reasonable alternative hash in the DB (strong permalinks).
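The bookkeeping part of such a build step could look roughly like this. It is only a sketch: the shape of the hash DB entries (`filters`, `redirectTo`) and the function names are assumptions, and the fallback hash is expected to point at a selection that is still valid:

```javascript
// Compare the recorded filters with the ones the current build supports.
function diffFilters(recorded, current) {
  const recordedSet = new Set(recorded);
  const currentSet = new Set(current);
  return {
    added: current.filter((f) => !recordedSet.has(f)),
    removed: recorded.filter((f) => !currentSet.has(f)),
  };
}

// Handle every hash whose selection uses a removed filter.
// `hashDb` is a Map of hash -> { filters: string[], redirectTo?: string }.
function retireHashes(hashDb, removed, fallbackHash) {
  for (const [hash, entry] of hashDb) {
    if (!entry.filters.some((f) => removed.includes(f))) continue;
    if (fallbackHash) {
      entry.redirectTo = fallbackHash; // strong permalinks: keep and redirect
    } else {
      hashDb.delete(hash); // weak permalinks: drop the hash
    }
  }
}
```

The `added` list tells you which new permutations need hashing, and `removed` drives the cleanup of the existing hashes.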

Let's analyse your problem and the possible solutions.
Problem: You want a URL that carries information about the applied filters, so that a user who opens the shared URL doesn't land on an arbitrary page.
Solutions:
1) Append the applied filters to the URL. To achieve this you will need to shorten the key for each filter type and the filter value, so that the URL does not grow much with each filter.
Drawback: This is not the most reliable solution, because the URL length inevitably grows as the number of filters increases.
2) Append a unique key (hash) for the applied filters to the URL. This requires changes on both the server and the client. On the client side you will need an encoding algorithm that converts the applied filters into a unique hash. On the server side you will need a decoding algorithm that converts the unique hash back into the applied filters. Whenever the client receives such a URL, it can make a POST API call that takes the hash and returns the array of applied filters, or the logic to convert the hash can live on the client side only.
Do all this in componentWillMount to avoid any side effects.
I think the second solution is scalable and efficient in almost all cases.
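For the client side of the second solution, reading and writing the key in the URL could be sketched like this. The query parameter name `f` is just an assumption, and the call that exchanges the key for the actual filters is left out:

```javascript
// Read the filter key from a shared URL; `f` is a hypothetical parameter name.
function tokenFromUrl(href) {
  return new URL(href).searchParams.get('f');
}

// Build a shareable URL for the given page and filter key.
function urlWithToken(href, token) {
  const url = new URL(href);
  url.searchParams.set('f', token);
  return url.toString();
}
```

When the component mounts, it could call tokenFromUrl(window.location.href) and, if a key is present, send it to the server in the POST call to get the filters back before rendering the grid.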

any reason not to send data with events as opposed to custom dimensions?

It seems that Custom Dimensions are the new way to send custom data from a website to GA. I'm new to GA but my manager has used GA in the past and I'm guessing this was before the CD structure existed in GA. Do you know when the CD structure was introduced in GA?
He has sent custom data to GA in the past using events. This seems like a viable way of sending data and another manager at my company had referred to this approach last week so it seems like maybe this was a standard approach before GA introduced CD's. So given the following request:
var myRequest =
{
UserID:1234,
SelectedReportType:1,
};
What are the tradeoffs between sending this request data to GA as a CD like this:
ga('set', 'dimension1', JSON.stringify(myRequest));
ga('send', 'pageview');
Vs sending this request data to GA as event data like this:
ga('send', 'event', {
'eventCategory':'MyWidgetUserSelection',
'eventAction':JSON.stringify(myRequest)
});
?

Custom dimensions were introduced with the switch from "classic" Analytics to Universal Analytics (IIRC that was in 2012), where they replaced (more or less) custom variables.
"Classic Analytics" (not an official name, AFAIK the previous GA version did not have a name other than GA) was a pretty messy thing that pretty much used the technology of the original Urchin tracker (Urchin was a web tracking company Google acquired in the early 2000s and rebranded their product as Google Analytics). Classic analytics pre-computed a lot of data on the client side (using up to five different cookies), including traffic source attribution, before it made a rather convoluted image request to the Google server.
In contrast, Universal Analytics was designed on top of a clean protocol, the measurement protocol. It is "universal" because any device or program that can make an HTTP request can now send data to Google Analytics. Universal Analytics does not compute any data on the client side; the data is processed only after it arrives at the Google tracking servers.
"Classic" Analytics had up to five custom variables in different scopes (hit, session, user). They were displayed in the "custom" menu item of the GA interface (which is still there, but is now useless unless you have old data that was collected with classic analytics). Five variables posed a pretty tight limit, plus it was not always easy to understand how exactly they were supposed to work. So people developed a habit of storing additional data not in custom variables, but in events.
Universal Analytics in the free (commercial) version offers 20 (200) custom dimensions in four different scopes, to wit hit, session, user and product (and an additional 20 (200) custom metrics, although very few people seem to use custom metrics). "Hit scope" means you can add a dimension to every single interaction. "Session scope" only retains the last value for a session. "User scope" is primarily for values that are set once per recurring user (e.g. when a user turns into a customer). With the product scope you can add additional properties to the products in an ecommerce transaction (or product impression etc. if you are using enhanced e-commerce tracking).
Conceptually, event tracking and custom dimensions are not remotely comparable. A dimension is a property that is connected to an interaction hit (or a collection of interaction hits like a session or a user) and allows you to break down metrics into individual rows. For example the "pageview" metric can be broken down by page path or page title, which are automatically collected. You might add a custom dimension "page category" and you can break down your total number of pageviews into separate rows that show the number of pageviews per category.
Custom dimensions do not have their own report; you can select them as secondary dimension in a standard report, or create custom reports based on them. You can also use custom dimensions to segment sessions or users by the respective values for the dimension.
Events, on the other hand, are interactions in their own right, with their own set of default metrics and dimensions (in fact you can amend events with their own custom dimensions). Proper usage of events is to track interactions that do not load a new page (or do not change the page content enough to warrant a pageview call).
You can use events for segmentation (i.e. "show only sessions where the user had a certain event"), but you cannot break down pageview metrics by event properties. That is actually the main difference.
A more practical concern is that events, unlike custom dimensions, count toward your data collection limit (the free version of Google Analytics allows for 10 million hits per month only, although the limit is so far not strictly enforced). Since custom dimensions are not interactions by themselves, they do not count towards the quota.
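Since a dimension is meant to break metrics down into rows, one value per dimension is probably more useful than a JSON string of the whole request. A small sketch of that, assuming two custom dimensions have been configured in the property; the slot numbers and the helper name are made up and have to match your own setup:

```javascript
// Hypothetical mapping from request fields to custom dimension slots
// (the indexes must match what is configured in the GA property).
const DIMENSION_SLOTS = {
  UserID: 'dimension1',
  SelectedReportType: 'dimension2',
};

// Turn the request into a GA fields object, one value per dimension.
function toDimensionFields(request) {
  const fields = {};
  for (const [name, slot] of Object.entries(DIMENSION_SLOTS)) {
    if (request[name] !== undefined) fields[slot] = String(request[name]);
  }
  return fields;
}

const myRequest = { UserID: 1234, SelectedReportType: 1 };

// Guarded so the sketch also runs where analytics.js is not loaded.
if (typeof ga === 'function') {
  ga('send', 'pageview', toDimensionFields(myRequest));
}
```

The dimensions travel with the pageview hit itself, so no additional hit is sent.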

Meteor facebook id vs accounts id

I am making an application which uses both the accounts package and Facebook's Graph API, specifically the friends API. The friends API returns all Facebook friends that have used the application. The problem is that it returns Facebook IDs, while the accounts package generates application-specific IDs. This is problematic when I want to retrieve information from a collection containing a friend's information, but stored with the application-specific ID. I have worked around this by storing both the FB ID and the accounts ID in the collection.
But I still can't update a user's data based on their FB ID, as update is only permitted using the application-specific ID. What I want, but which is not allowed:
UserData.update({fbId: friend.fbId},{$push: {some: data}});
The only solution I could think of is to get each user ID first, like this:
var friendId = UserData.findOne({fbId: friend.fbId})._id;
This is obviously not a good solution, as it needs one extra DB call for every update.
Is there a way of setting the accounts ID equal to the Facebook ID upon creation? Or do you have any other suggestions?

Extending on the comment above:
MoeRum: @Xinzz UserData is a custom collection. If I try updating with fbId I get the
following error: Uncaught Error: Not permitted. Untrusted code may
only update documents by ID. [403]
That is because you're trying to update on the client-side. You can only update by ID on the client-side. What you're trying to do should not be a problem as long as you do it on the server.
From the Meteor docs (for more reference: http://docs.meteor.com/#/full/update):
The behavior of update differs depending on whether it is called by
trusted or untrusted code. Trusted code includes server code and
method code. Untrusted code includes client-side code such as event
handlers and a browser's JavaScript console.
Trusted code can modify multiple documents at once by setting multi to
true, and can use an arbitrary Mongo selector to find the documents to
modify. It bypasses any access control rules set up by allow and deny.
The number of affected documents will be returned from the update call
if you don't pass a callback.
Untrusted code can only modify a single document at once, specified by
its _id. The modification is allowed only after checking any
applicable allow and deny rules. The number of affected documents will
be returned to the callback. Untrusted code cannot perform upserts,
except in insecure mode.
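A minimal sketch of doing the update on the server through a Meteor method. `UserData` is the collection from the question; the method name `pushToFriend` and the helper are made up, and a real method should also validate its arguments and check that the caller is allowed to modify that friend's document:

```javascript
// Core update, written against any collection that offers
// Meteor's update(selector, modifier) API.
function pushByFbId(collection, fbId, data) {
  return collection.update({ fbId: fbId }, { $push: { some: data } });
}

// Register as a method when running inside Meteor. Method code on the
// server is trusted, so an arbitrary selector is allowed there.
if (typeof Meteor !== 'undefined' && Meteor.isServer) {
  Meteor.methods({
    pushToFriend(fbId, data) {
      return pushByFbId(UserData, fbId, data);
    },
  });
}
```

On the client you would then call Meteor.call('pushToFriend', friend.fbId, data) instead of updating the collection directly, which avoids the extra findOne.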

only allow access to url in certain locations (qr codes)

My company has a social networking platform that is accessed via a URL.
We are trying to find a way to advertise our URL in sports stores, with access to our site only possible if you come to the store - we do not want the URL to be shared with anyone, anywhere.
We have considered QR codes and wonder if it's possible our site can only be accessed when a provided QR code is scanned.
Please let me know if you have any suggestions.

You are basically looking for keys/access codes that give your customers access to a site.
Those might have to be on a per-user basis, as otherwise one might just leak an access code for the whole public to use.
If sharing keys should be disallowed:
You need a database (SQL) to store your customers' information.
Depending on how you generate a key (dynamically, by a set of rules or randomly, using a catalogue of valid keys) you might need a further table to store the keys separately (in case you choose the more secure option of generating a predefined set of random keys)
You can then include those keys in your QR Codes' target URL like www.example.com?key=1jh303u or something similar.
(This means of course that you have to produce customized QR Codes, which in turn means they cannot be printed as a standard mass-produced offset job, only as a customized digital print, so you'd have to send all the different generated QR Codes to your printer)
Once the user visits this URL containing the query string, your site can then check to see if the key is a) valid (in the table) and b) unused, by taking the $_GET["key"] variable and querying the database.
If the key is invalid, output an "access denied" page.
If the key exists but has already been checked in, you can use a user-based login system to handle the login.
If the key is valid and hasn't been used yet, you can output your exclusive content at last.
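The three cases above can be sketched like this. The in-memory `Map` only stands in for the keys table in your database, and the function name and return values are made up for illustration:

```javascript
// In-memory stand-in for the keys table: key -> { used: boolean } (assumption).
const keys = new Map([
  ['1jh303u', { used: false }],
]);

// Decide what to show for a scanned key, following the three cases above.
function checkKey(key) {
  const entry = keys.get(key);
  if (!entry) return 'denied'; // unknown key: "access denied" page
  if (entry.used) return 'login'; // already checked in: hand over to the login system
  entry.used = true; // mark the key as checked in on first use
  return 'granted'; // valid and unused: show the exclusive content
}
```

With a real database, the lookup and the "mark as used" step should happen in one transaction so the same key cannot be redeemed twice at the same moment.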
If it doesn't matter whether people will be able to share their key, you don't need a database at all. You could build a keygen which creates keys following a certain distinct pattern, and use that same set of rules to validate the entered key.

google analytics API custom variable

I have a google analytics account for an asp.net application which determines one of many clients based on a queryString within the URL. The snippet for the GA page is located within the master page, so the GA code is consistent across all clients.
What is the most efficient process for getting a set of basic analytics through the GA portal, per client (per queryString)?
---Edit---
Offhand, though I have never done this, I want to know if I can set a variable var1 within the snippet that gets sent to GA identifying a client, then get discrete but identical reports per var1 on the users, so that var[0] .. var[n-1] = Visitors, where n is the number of clients.

You can just push the unique query string (assuming it's some sort of uID) as a custom variable into analytics. If it's sequential, you can assign ranges to clients (once again, am just guessing what your setup is like).
Alternatively, you can use the uID as key and have visit frequency as the value. (here is some prototyping code to check custom var key values https://github.com/vly/js_ga_cvars)
Just remember, when you set the custom vars you have to push a pageview or event to actually pass it to GA.
If you have access to the GA Universal beta program, you have the opportunity to define custom metrics (in the case of numeric data) or dimensions, which would make aggregation reporting a lot easier.
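With the classic (ga.js) snippet, pushing the client as a custom variable could look roughly like this. The query string parameter name `client`, the slot number and the variable name `Client` are assumptions about your setup:

```javascript
// Classic GA command queue; the real tracking snippet defines this on the page.
var _gaq = _gaq || [];

// Read the client identifier from the query string;
// `client` is an assumed parameter name.
function clientFromQuery(search) {
  return new URLSearchParams(search).get('client');
}

// Queue a page-level custom variable, then the pageview that carries it.
function trackClient(queue, clientId) {
  queue.push(['_setCustomVar', 1, 'Client', clientId, 3]); // slot 1, scope 3 = page-level
  queue.push(['_trackPageview']);
}

// Guarded so the sketch also runs outside a browser.
if (typeof window !== 'undefined') {
  trackClient(_gaq, clientFromQuery(window.location.search));
}
```

The custom variable has to be queued before the pageview, otherwise it is not sent along with that hit.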

If the client is an individual person, this is not allowed by the google terms of service.
You will not (and will not allow any third party to) use the Service
to track, collect or upload any data that personally identifies an
individual
http://www.google.com/analytics/terms/us.html
