How does HotJar generate their recordings?


Tracking mouse movement/scroll/click events is easy but how do they save the screen and keep it in sync so well?
The pages are rendered very well (at least for static HTML pages; I haven't tested on Angular or any SPA), and the sync is almost perfect.
Generating and uploading a 23fps recording of my screen (1920x1080) would take about 2Mbps of bandwidth. Even if they only record while there are mouse events, it would still take some 300-500Kbps on average. That seems way too much...

HTML content and DOM changes are sent through a websocket and stored by Hotjar (minus sensitive information such as the user's form inputs, unless you've whitelisted them). The CSS isn't stored; it gets loaded by your browser when you watch the recording.
Because they're only recording user activity and DOM changes, there's a lot less data to record than if they were capturing a full video. The downside is that some JavaScript-driven widgets won't function correctly in the replay.
Relevant information from Hotjar docs:
When it comes to recordings, changes to the page are captured using the MutationObserver API, which is built into every modern browser.
This makes it efficient since the change itself is already happening
on the page and the browser MutationObserver API allows us to record
this change which we then parse and also send through the websocket.
At regular short intervals, every 100ms or 10 times per second, the cursor position and scroll position are recorded. Clicks are recorded
when they happen, capturing the position of the cursor relative to the
element being clicked. These are functions which in no way hinder a
user's experience as they only capture the location of the pointer
when a click happens or every 100ms. The events are sent to the Hotjar
servers through frames within the websocket, which is more efficient
than sending XHR requests at regular intervals.
Source: https://help.hotjar.com/hc/en-us/articles/115009335727-Will-Hotjar-Slow-Down-My-Site-
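The sampling scheme described in the quote can be sketched roughly as follows. This is not Hotjar's code, only an illustration under assumptions: the name createCursorSampler and the wss://collector.example/session URL are made up. The pointer position is remembered on every mousemove, but a frame is emitted at most once per 100 ms tick, and only if the position changed.

```javascript
// Hypothetical sampler: remembers the latest pointer position and emits it
// at most once per tick, skipping ticks where nothing moved.
function createCursorSampler(emit) {
  let latest = null;   // most recent position seen
  let lastSent = null; // last position actually emitted
  return {
    // Call from a mousemove handler; cheap, no I/O.
    update(x, y) {
      latest = { x, y };
    },
    // Call on a fixed interval (100 ms in the quoted docs).
    tick(timestamp) {
      if (!latest) return false;
      if (lastSent && lastSent.x === latest.x && lastSent.y === latest.y) return false;
      lastSent = latest;
      emit({ type: 'cursor', t: timestamp, x: latest.x, y: latest.y });
      return true;
    },
  };
}

// Browser wiring (skipped outside a browser). The URL is a placeholder.
if (typeof document !== 'undefined' && typeof WebSocket !== 'undefined') {
  const socket = new WebSocket('wss://collector.example/session');
  const sampler = createCursorSampler((frame) => {
    if (socket.readyState === WebSocket.OPEN) socket.send(JSON.stringify(frame));
  });
  document.addEventListener('mousemove', (e) => sampler.update(e.clientX, e.clientY));
  setInterval(() => sampler.tick(Date.now()), 100);
}
```

Each frame is a few dozen bytes of JSON, which is why this approach needs far less bandwidth than a video stream.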

Related

Detect change to element id=ASIN when Ajax is making changes

This question is more of a general coding practice question. I am writing an extension for Google Chrome that gathers the ASIN number from Amazon.com when viewing an item page.
Design requirements:
When viewing a page that contains an element with id=ASIN, capture the ASIN. (E.g. B004FYJFNA from http://www.amazon.com/gp/product/B004FYJFNA/?tag=justinreinhart-20 )
When user changes platforms (e.g. from Playstation 3 to Xbox 360), detect that a change has occurred and capture the new ASIN value.
I have a "content script", injection.js, that is injected on every amazon page and can successfully perform requirement #1.
The issue I have is with #2. I am not able to efficiently detect a change to the ASIN value. Either the code doesn't fire OR I pick an element that fires ~100 times. I am at a loss how to do this efficiently.
Failure examples:
// (A) This fires ~100 times when user changes platforms,
// and also fires during mouseover events.
// Unacceptable but it does work.
$("#handleBuy").bind('DOMNodeRemoved', OnProductChange);
// (B) This doesn't fire at all. I have a few guesses why but no certainties.
$("#ASIN").on('change', OnProductChange);
Blathering:
Switching product platforms when the user clicks seems to tear the Amazon page apart and it destroys any event binding that I attempt. (I believe the element is removed and reinserted--not just changed.) I do not know javascript well enough to skillfully take these massive DOM changes into account.
Your help and knowledge is appreciated.

Track when the user received first bytes of the video

There is a web page which has an HTML5 video in it. When the user clicks start or navigates through the timeline, the video starts (either from the beginning or from the position he selected). But this does not always happen instantly. I want to find out how much time passes between the user's click and the moment the user receives the first bytes of the video.
Getting the time of the user's click is not a problem, but while looking through the HTML5 video API I was not able to find any event that is close to what I am looking for.
Is it possible to track such an event?

The event(s) you listen for after you receive the click (or "play" or "seeking") event depends on the state of the video before the time of the click.
If you have a fresh, unplayed video element with the preload attribute set to "none", then the first data you're going to receive from the network is the metadata, so you can listen for the "loadedmetadata" event.
If preload is set to "metadata", you might have already loaded metadata, depending on the browser and platform. (e.g., Safari on iPad will not load metadata or anything else until the first user interaction.) In that case, you want to listen for either "loadedmetadata" or "progress". It couldn't hurt to listen for "loadeddata" as well, but I think "progress" fires first.
If preload is set to "auto" or if you've already played some of the video, you might have some actual video data. And while you're likely to have data at the current point on the timeline, you may or may not have it at the seek destination. It depends at least on how far ahead (or behind) you're seeking, how fast data is coming in and how much spare room the browser has in the media cache.
If there is no data at the destination time (you can check this in advance if you want with the .buffered property, see TimeRanges), then the next event you see will be either "loadeddata" or "progress", probably followed by "canplay". If there is enough data buffered at the target time of the seek, then the question doesn't really apply because nothing else will be transferred.
However, in any of the above cases, once there is enough data to display the frame at the new point on that timeline and that data has been decoded, the "seeked" event will fire. So if you were to only pick one (no reason you can't use more), this is the one to pick.
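A rough sketch of the measurement described above, assuming a single video element on the page. The helper names isBuffered and timeToFirstEvent are made up for illustration; the event names and the .buffered property are the standard media element ones mentioned in this answer.

```javascript
// Returns true if `time` (seconds) falls inside any range of a TimeRanges-like
// object (anything with length, start(i) and end(i), such as video.buffered).
function isBuffered(buffered, time) {
  for (let i = 0; i < buffered.length; i++) {
    if (time >= buffered.start(i) && time <= buffered.end(i)) return true;
  }
  return false;
}

// Calls onResult(eventName, elapsedMs) for the first of `eventNames` to fire
// on `target` after this call. `now` is injectable to make timing testable.
function timeToFirstEvent(target, eventNames, onResult, now = () => performance.now()) {
  const startedAt = now();
  let done = false;
  const handler = (event) => {
    if (done) return;
    done = true;
    eventNames.forEach((name) => target.removeEventListener(name, handler));
    onResult(event.type, now() - startedAt);
  };
  eventNames.forEach((name) => target.addEventListener(name, handler));
}

// Browser wiring: start timing when a seek begins.
if (typeof document !== 'undefined') {
  const video = document.querySelector('video');
  if (video) {
    video.addEventListener('seeking', () => {
      // Data for the target time is already cached: nothing will be fetched.
      if (isBuffered(video.buffered, video.currentTime)) return;
      timeToFirstEvent(video, ['progress', 'loadeddata', 'seeked'], (name, ms) => {
        console.log(name + ' after ' + ms.toFixed(0) + ' ms');
      });
    });
  }
}
```

Listening for several events and taking whichever fires first avoids having to know the preload state in advance.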

Record and replay Javascript

I know it is possible to record mouse movements, scrolling and keystrokes. But what about changes to the document? How can I record changes to the document?
Here is my attempt. There must be a better, simpler way to store all events?
I am thankful for all tips I can get!
<!DOCTYPE html>
<html>
<head>
<title>Record And replay javascript</title>
</head>
<body id="if_no_other_id_exist">
<div style="height:100px;background:#0F0" id="test1">click me</div>
<div style="height:100px;background:#9F9" class="test2">click me</div>
<div style="height:100px;background:#3F9" id="test3">click me</div>
<div style="height:100px;background:#F96" id="test4">click me</div>
<script src="http://code.jquery.com/jquery-latest.min.js"></script>
<script>
$(document).ready(function() {
var the_time_document_is_ready = new Date().getTime();
var the_replay = '';
// .live() was removed in jQuery 1.9; a delegated .on() handler is the equivalent
$(document).on("click", "div", function (){
var the_length_of_visit = new Date().getTime() - the_time_document_is_ready;
// check if the element that is clicked has an id
if (this.id)
{
the_replay =
the_replay
+
"setTimeout(\"$('#"
+
this.id
+
"').trigger('click')\","
+
the_length_of_visit
+
");"
;
alert (
"The following javascript will be included in the file in the replay version:\n\n"
+
the_replay
) // end alert
} // end if
// if it does not have an id, check if the element that is clicked has an class
else if (this.className)
{
// find the closest id to better target the element (needed in my application)
var closest_div_with_id = $(this).closest('[id]').attr('id');
the_replay =
the_replay
+
"setTimeout(\"$('#"
+
closest_div_with_id
+
" ."
+
this.className
+
"').trigger('click')\","
+
the_length_of_visit
+
");"
;
alert (
"The following javascript will be included in the file in the replay version:\n\n"
+
the_replay
) // end alert
} // end if
});
// fall back if there are no other id's
$('body').attr('id','if_no_other_id_exist');
// example of how it will work in the replay version
setTimeout("$('#test1').trigger('click')",10000);
});
</script>
</body>
</html>

This question made me curious, so I implemented a proof of concept here:
https://codesandbox.io/s/jquery-playground-y46pv?fontsize=14&hidenavigation=1&theme=dark
Using the demo
Press record, click on some circles, type something in the input, press record again to stop the recording and finally click play.
You can tweak the size of the playback by editing the REPLAY_SCALE variable in the source code.
You can control the playback speed by changing the SPEED variable in the source code.
Note, I only tested this on Chrome.
Implementation details:
It monitors mousemove, click and typing events. It should be easily extensible to add others such as scroll, window resizing, hover, focus etc.
Playback creates an <iframe>, injects the original HTML and replays the user events.
The event listeners bypass any event.stopPropagation() by using capture when listening for events on the document.
Displaying playback in a different resolution is done using the CSS zoom property.
A transparent canvas could be overlaid to draw the trace lines of the mouse. I use just a simple div so no trace lines.
Considerations:
Imagine we are capturing user events on a real website. Since the page served could change between now and the playback, we can't rely on the client's server when replaying the recording in the iframe. Instead we have to snapshot the HTML and all ajax and resource requests made during the recording. In the demo I only snapshot the HTML, for simplicity. In practice, however, all extra requests would have to be stored on the server in real time as they are downloaded on the client page. Furthermore, during playback it is important that the requests are played back on the same timeline as the user perceived them, so the offset and duration of each request must also be stored. Uploading all page requests as they are downloaded will slow down the client page. One way to optimize this is to hash the contents of each request before uploading: if the hash is already present on the server, the request data need not be re-uploaded. The session of one user can also reuse request data uploaded by another user this way. Finally, the browser itself need not do the uploading: provided all requests go through a central server, the snapshotting can happen server side so as not to impact the user's experience.
Careful consideration will be needed when uploading all the user events. Since lots of events will be generated, this means lots of data. Perhaps some compression of the events could be made e.g. losing some of the less important mousemove events. An upload request should not be made per event to minimize number of requests. The events should be buffered until a buffer size or timeout is reached before each batch of events is uploaded. A timeout should be used as the user could close the page at any point thus losing some events.
During playback outgoing POST requests should be mocked to prevent duplicating events elsewhere.
During playback the user agent should be spoofed but this may be unreliable in rendering the original display.
The custom recording code could conflict with client code, e.g. jQuery. Namespacing will be required to avoid this.
There might be some edge cases where typing and clicking may not reproduce the same resulting HTML as seen in the original e.g. random numbers, date times. Mutation observers may be required to observe HTML changes, although not supported in all browsers. Screenshots could come in useful here but might prove OTT.
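The buffering and mousemove thinning mentioned in the considerations could look something like this. It is only a sketch: createBatcher, thinMouseMoves and their parameters are hypothetical names, and events are assumed to be plain objects with a type and a timestamp t in milliseconds.

```javascript
// Hypothetical batcher: buffers events and uploads them when either
// maxSize events have accumulated or maxWaitMs has passed since the first
// buffered event.
function createBatcher({ maxSize, maxWaitMs, upload }) {
  let buffer = [];
  let timer = null;

  function flush() {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    if (buffer.length === 0) return;
    const batch = buffer;
    buffer = [];
    upload(batch);
  }

  function push(event) {
    buffer.push(event);
    if (buffer.length >= maxSize) {
      flush();
    } else if (timer === null) {
      timer = setTimeout(flush, maxWaitMs);
    }
  }

  return { push, flush };
}

// Drops mousemove events that arrive less than minGapMs after the last kept
// mousemove; every other event type is kept.
function thinMouseMoves(events, minGapMs) {
  let lastKept = -Infinity;
  return events.filter((e) => {
    if (e.type !== 'mousemove') return true;
    if (e.t - lastKept < minGapMs) return false;
    lastKept = e.t;
    return true;
  });
}
```

The upload callback is where a real implementation would send the batch to the server; the timeout bounds how much data is lost if the user closes the page.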

Replaying user actions with just Javascript is a complex problem.
First of all, you can't move the mouse cursor, and you can't emulate mouseovers/hovers either. So a big part of the user's interaction with a page is lost.
Second, recorded actions usually have to be replayed in a different environment from the one they were recorded in: a screen with a smaller resolution, a different client browser, different content served based on the replaying browser's cookies, etc.
If you take the time to study the available services that record website visitors' actions (http://clicktale.com and http://userfly.com/, to name a few), you'll see that none of them can fully replay users' actions, especially when it comes to mouseovers, ajax and complex JS widgets.
As to your question about detecting changes made to the DOM: as Chris Biscardi stated in his answer, there are mutation events, which are designed for that. However, keep in mind that they are not implemented in every browser. Namely, IE doesn't support them (they will be supported as of IE 9, according to this blog entry on MSDN: http://blogs.msdn.com/b/ie/archive/2010/03/26/dom-level-3-events-support-in-ie9.aspx).
Relying on those events may be suitable for you, depending on your use case.
As to a "better more simple way to store all events": there are other ways (syntax-wise) of listening for the events of your choice, but handling (i.e. storing) them can't be done in a simple way, unless you want to serialize whole event objects, which wouldn't be a good idea if you were to send information about those events to a server for storage. Be aware that a massive number of events fire while a website is in use, hence a massive amount of potential data to store or send to the server.
I hope I made myself clear and that you find some of this information useful. I have been involved in a project that aimed to do what you're trying to achieve, so I know how complicated it can get once you start digging into the subject.

I believe you are looking for Mutation Events.
http://www.w3.org/TR/2000/REC-DOM-Level-2-Events-20001113/events.html#Events-eventgroupings-mutationevents
Here are some resources for you:
http://tobiasz123.wordpress.com/2009/01/19/utilizing-mutation-events-for-automatic-and-persistent-event-attaching/
http://forum.jquery.com/topic/mutation-events-12-1-2010
https://github.com/jollytoad/jquery.mutation-events
Update:
In Response to comment, a very, very basic implementation:
//callback function
function onNodeInserted(){alert('inserted')}
//add listener to dom(in this case the body tag)
document.body.addEventListener ('DOMNodeInserted', onNodeInserted, false);
//Add element to dom
$('<div>test</div>').appendTo('body')
Like WTK said, you are getting yourself into complex territory.

Record
Save the initial DOM of the page, remove the scripts from it, and change all relative URLs to absolute ones.
Then record DOM mutations and keyboard/mouse events.
Replay
Start with the initial saved DOM, then apply mutations and events in timestamp order.
Clicks themselves will not do anything, because we have removed the scripts, but since we have saved the DOM changes we can replay the effect of each click.
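A small sketch of two pieces of this scheme: making URLs absolute, and ordering recorded entries for replay. The function names and the entry shape (objects with a timestamp t in milliseconds) are assumptions for illustration.

```javascript
// Resolve a possibly relative URL against the URL of the recorded page.
function toAbsoluteUrl(url, pageUrl) {
  return new URL(url, pageUrl).href;
}

// Sort recorded entries by timestamp and convert them to delays relative
// to the first entry, ready to be passed to setTimeout during replay.
function buildReplaySchedule(entries) {
  const sorted = [...entries].sort((a, b) => a.t - b.t);
  const start = sorted.length ? sorted[0].t : 0;
  return sorted.map((entry) => ({ delay: entry.t - start, entry }));
}

// Replay: `apply` is a caller-supplied function that knows how to apply one
// entry (a DOM mutation or an input event) to the replay document.
function replay(entries, apply, speed = 1) {
  buildReplaySchedule(entries).forEach(({ delay, entry }) => {
    setTimeout(() => apply(entry), delay / speed);
  });
}
```

Because mutations and input events share one timeline, a mutation that was caused by a click is replayed right after the click is shown, even though no script runs.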

I found these two solutions on GitHub which allow you to capture the events and then replay them on a remote server.
https://github.com/ElliotNB/js-replay
and a more comprehensive solution
https://github.com/rrweb-io/rrweb
https://www.rrweb.io/#demos
Both have demos which you can try.

More recently, we can use MutationObserver:
MutationObserver provides developers with a way to react to changes in
a DOM. It is designed as a replacement for Mutation Events defined in
the DOM3 Events specification.
Slow demo, because the console.log message is huge.
var mutationObserver = new MutationObserver(function(mutations) {
mutations.forEach(function(mutation) {
console.log(mutation)
})
})
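// look the element up explicitly instead of relying on the implicit global for its id
mutationObserver.observe(document.getElementById('watchme'), {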
attributes: true,
characterData: true,
childList: true,
subtree: true,
attributeOldValue: true,
characterDataOldValue: true
})
<div id="watchme" contenteditable>
Hello world!
</div>
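For recording rather than logging, each MutationRecord has to be reduced to something that can be serialized and sent to a server. A minimal sketch, assuming nodes are addressed by their child-index path from the observed root (nodePath and serializeMutation are made-up names, and this is not a complete replay format):

```javascript
// Path from the observed root to a node, as a list of child indexes.
function nodePath(node, root) {
  const path = [];
  let current = node;
  while (current && current !== root) {
    const parent = current.parentNode;
    if (!parent) return null; // node is detached from the root
    path.unshift(Array.prototype.indexOf.call(parent.childNodes, current));
    current = parent;
  }
  return current === root ? path : null;
}

// Reduce a MutationRecord to a plain object that survives JSON.stringify.
function serializeMutation(record, root, timestamp) {
  const out = { t: timestamp, type: record.type, target: nodePath(record.target, root) };
  if (record.type === 'attributes') {
    out.name = record.attributeName;
    out.value = record.target.getAttribute(record.attributeName);
  } else if (record.type === 'characterData') {
    out.text = record.target.data;
  } else if (record.type === 'childList') {
    const describe = (n) => n.outerHTML || n.textContent;
    out.added = Array.from(record.addedNodes, describe);
    out.removed = Array.from(record.removedNodes, describe);
  }
  return out;
}

// Browser wiring (skipped outside a browser).
if (typeof document !== 'undefined' && typeof MutationObserver !== 'undefined') {
  const root = document.documentElement;
  const log = [];
  new MutationObserver((records) => {
    const t = Date.now();
    records.forEach((r) => log.push(serializeMutation(r, root, t)));
  }).observe(root, { attributes: true, characterData: true, childList: true, subtree: true });
}
```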

Recording user data for heatmap with JavaScript

I was wondering how sites such as crazyegg.com store user click data during a session. Obviously there is some underlying script which is storing each clicks data, but how is that data then populated into a database? It seems to me the simple solution would be to send data via AJAX but when you consider that it's almost impossible to get a cross browser page unload function setup, I'm wondering if there is perhaps some other more advanced way of getting metric data.
I even saw a site which records each mouse movement and I am guessing they are definitely not sending that data to a database on each mouse move event.
So, in a nutshell, what kind of technology would I need in order to monitor user activity on my site and then store this information in order to create metric data? I am not looking to recreate GA, I'm just very interested to know how this sort of thing is done.
Thanks in advance

Heatmap analytics turns out to be WAY more complicated than just capturing the cursor coordinates. Some websites are right-aligned, some are left-aligned, some are 100%-width, some are fixed-width-"centered"... A page element can be positioned absolutely or relatively, floated etc. Oh, and there's also different screen resolutions and even multi-monitor configurations.
Here's how it works in HeatTest (I'm one of the founders, have to reveal that due to the rules):
JavaScript handles the onClick event: document.onclick = function(e){ } (this will not work with <a> and <input> elements, have to hack your way around)
The script records the XPath address of the clicked element (since coordinates are not reliable, see above) in the form //body/div[3]/button[id=search], together with the coordinates within the element.
Script sends a JSONP request to the server (JSONP is used because of the cross-domain limitations in browsers)
Server records this data into the database.
Now, the interesting part - the server.
To calculate the heatmap the server launches a virtual instance of a browser in-memory (we use Chromium and IE9)
Renders the page
Takes a screenshot,
Finds the elements' coordinates and then builds the heatmap.
It takes a lot of CPU power and memory. A lot. So most heatmap services, including both us and CrazyEgg, have stacks of virtual machines and cloud servers for this task.
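The client-side part (element address plus coordinates within the element) might be sketched like this. It is not HeatTest's code: elementXPath and offsetWithin are hypothetical helpers, and the path format here uses positional indexes only.

```javascript
// Build a positional XPath such as /html[1]/body[1]/div[3]/button[1].
// Works on anything exposing tagName, parentElement and previousElementSibling.
function elementXPath(el) {
  const parts = [];
  for (let node = el; node; node = node.parentElement) {
    let index = 1;
    // Count preceding siblings with the same tag name.
    for (let sib = node.previousElementSibling; sib; sib = sib.previousElementSibling) {
      if (sib.tagName === node.tagName) index++;
    }
    parts.unshift(node.tagName.toLowerCase() + '[' + index + ']');
  }
  return '/' + parts.join('/');
}

// Click position relative to the clicked element's top-left corner.
function offsetWithin(rect, clientX, clientY) {
  return { x: clientX - rect.left, y: clientY - rect.top };
}

// Browser wiring: capture phase, so handlers that stop propagation don't hide clicks.
if (typeof document !== 'undefined') {
  document.addEventListener('click', (e) => {
    if (!(e.target instanceof Element)) return;
    const point = offsetWithin(e.target.getBoundingClientRect(), e.clientX, e.clientY);
    console.log(elementXPath(e.target), point);
  }, true);
}
```

Storing the element path with an offset inside the element keeps clicks meaningful when the same page is laid out differently on another screen.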

The fundamental idea used by many tracking systems is a 1x1px image which is requested with extra GET parameters. The request is added to the server log file, and the log files are then processed to generate statistics.
So a minimalist click tracking function might look like this:
document.onclick = function(e){
var trackImg = new Image();
trackImg.src = 'http://tracking.server/img.gif?x='+e.clientX+'&y='+e.clientY;
}
Plain AJAX is subject to the same-origin policy, so unless the tracking server explicitly allows cross-origin requests (CORS) you won't be able to send requests to it. And you'd have to add AJAX code to your tracking script.
If you want to send more data (like cursor movements) you'd store the coordinates in a variable and periodically poll for a new image with updated path in the GET parameter.
Now there are many many problems:
cross-browser compatibility - to make the above function work in all browsers that matter at the moment you'd probably have to add 20 more lines of code
getting useful data
many pages are fixed-width, centered, so raw X and Y coordinates won't let you create a visual overlay of clicks on the page
some pages have liquid-width elements, or use a combination of min- and max-height
users may use different font sizes
dynamic elements that appear on the page in response to user's actions
etc. etc.
When you have the tracking script worked out you only need to create a tool that takes raw server logs and turns them into shiny heatmaps :)
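A slightly fuller sketch of the pixel request, with parameter escaping and one possible way to handle centered fixed-width layouts. The helper names are made up, and measuring X from the viewport center is just an assumption about how one might normalize coordinates, not something the tracking services are known to do.

```javascript
// Build the URL of a tracking pixel; URLSearchParams handles the escaping.
function buildPixelUrl(baseUrl, params) {
  const url = new URL(baseUrl);
  Object.entries(params).forEach(([key, value]) => url.searchParams.set(key, String(value)));
  return url.href;
}

// Express a click's X coordinate relative to the horizontal center of the
// viewport, so clicks on centered fixed-width layouts line up across window sizes.
function xFromCenter(clientX, viewportWidth) {
  return Math.round(clientX - viewportWidth / 2);
}

// Browser wiring (skipped outside a browser).
if (typeof document !== 'undefined') {
  document.addEventListener('click', (e) => {
    const width = document.documentElement.clientWidth;
    new Image().src = buildPixelUrl('http://tracking.server/img.gif', {
      x: xFromCenter(e.clientX, width),
      y: e.pageY,
      w: width,
    });
  });
}
```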

I don't know the exact implementation details of how crazyegg does it, but the way I would do it is to store mouse events in an array and send it periodically over AJAX to the backend, e.g. the captured mouse events are collected and sent to the server every 30 seconds. This reduces the strain of creating a request for every event, and it also ensures that I lose at most 30 seconds of data. You can also add the sending to the unload event, which increases the amount of data you get, but you wouldn't be dependent on it.
Some example on how I'd implement it (using jQuery as my vanilla JS skills are a bit rusty):
$(function() {
var clicks = [];
// Capture every click
// $() with no selector is an empty set in current jQuery, so bind on document
$(document).click(function(e) {
clicks.push(e.pageX+','+e.pageY);
});
// Function to send clicks to server
var sendClicks = function() {
// Clicks will be in format 'x1,y1;x2,y2;x3,y3...'
var clicksToSend = clicks.join(';');
clicks = [];
$.ajax({
url: 'handler.php',
type: 'POST',
data: {
clicks: clicksToSend
}
});
}
// Send clicks every 30 seconds and on page leave
setInterval(sendClicks, 30000);
// the .unload() shorthand was removed in jQuery 3.0; .on('unload', ...) still works
$(window).on('unload', sendClicks);
});
Note that I haven't tested or tried this in any way but this should give you a general idea.
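In current browsers the same idea can be written without jQuery, using navigator.sendBeacon, which queues a small POST that the browser can complete even while the page is going away. This is a newer API than the answer above and is offered only as a sketch; handler.php and the 30-second interval are taken from the example above, and serializeClicks is a made-up helper.

```javascript
// Serialize clicks as 'x1,y1;x2,y2;...' (the format used in the answer above).
function serializeClicks(clicks) {
  return clicks.map((c) => c.x + ',' + c.y).join(';');
}

// Browser wiring (skipped outside a browser or where sendBeacon is missing).
if (typeof document !== 'undefined' && typeof navigator !== 'undefined' && navigator.sendBeacon) {
  const clicks = [];
  document.addEventListener('click', (e) => clicks.push({ x: e.pageX, y: e.pageY }));

  // Hand any pending clicks to the browser and empty the buffer.
  const flush = () => {
    if (clicks.length === 0) return;
    const body = new URLSearchParams({ clicks: serializeClicks(clicks.splice(0)) });
    navigator.sendBeacon('handler.php', body);
  };

  // Send every 30 seconds, and whenever the page is hidden or closed.
  setInterval(flush, 30000);
  document.addEventListener('visibilitychange', () => {
    if (document.visibilityState === 'hidden') flush();
  });
}
```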

If you're just looking for interaction, you could replace your <input type="button"> with <input type="image">. These are automatically submitted with the X, Y coordinates of where the user has clicked.
jQuery also has a good implementation of the mousemove event binding that can track the current mouse position. I don't know your desired end result, but you could use setInterval(submitMousePosition, 1000) to send an ajax call with the mouse position every second or something like that.

I really don't see why you think it is impossible to store all click points from one user session in the database.
Their motto is "See Where People Click".
Once you have gathered enough data it is fairly easy to make heat maps in batch processes.
People really underestimate databases, indexing and sharding. The only hard thing here is to gather enough money for the underlying architecture :)

How can I update a webpage when a change occurs on the server?

Is there a way to push pages on change rather than putting a timer on a web page to refresh every x mins? I guess what I'm trying to do is avoid refreshing an entire page when only a portion of it may have changed. I have seen on FB that when an update happens, there is a message saying new content is available.
Perhaps you could MD5 a page then when an update happens the MD5 changes and the page could be checking this. Not exactly push but it would reduce the traffic of an entire page.

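A good practice to "reduce the traffic" is to load content through AJAX requests.
The "timer" you mentioned above is my preferred method, combined with AJAX loading and a bit of extra logic. Requesting on a timer is known as polling; in long polling, a related technique, the server holds each request open until it has new data.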
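A minimal sketch of the timer approach combined with the hash idea from the question: the page periodically asks for a short version string and reacts only when it changes. The /version endpoint and the createChangeDetector name are hypothetical; the server is assumed to return something like a hash of the page content.

```javascript
// Remembers the last version string seen and calls onChange when it differs.
// The first value only establishes the baseline.
function createChangeDetector(onChange) {
  let lastSeen = null;
  return function check(version) {
    const changed = lastSeen !== null && version !== lastSeen;
    lastSeen = version;
    if (changed) onChange(version);
    return changed;
  };
}

// Browser wiring. '/version' is a hypothetical endpoint returning a short
// string that changes whenever the content is updated.
if (typeof document !== 'undefined') {
  const check = createChangeDetector(() => console.log('New content available'));
  setInterval(async () => {
    try {
      const response = await fetch('/version', { cache: 'no-store' });
      if (response.ok) check(await response.text());
    } catch (err) {
      // Network error: try again on the next tick.
    }
  }, 10000);
}
```

Only a few bytes travel on each tick, and the changed portion of the page can then be fetched through a separate AJAX request.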

One way is to watch for specific keyboard and/or mouse events and update the page if certain criteria are met within those events.
