Building an Enterprise Grade OpenSource Web Analytics System – Part 2: Client Tracking

This is the second part of a seven-part series explaining how to build an Enterprise Grade OpenSource Web Analytics System. In the last post we took a look at the system architecture that we are going to build. In this post we are setting up the Client Tracking using the Javascript tracker from Snowplow Analytics. If you are new to this series, it might help to start with the first post.

When building a mature Web Analytics system yourself, the first step is to build some function into your app or website to enable sending events to the backend analytics system. This is called client side tracking, since we rely on the application to send us events instead of looking at logfiles alone. For this series we are going to look at website tracking specifically, but the same principles apply to mobile apps or even server side tracking.

Almost every mature Web Analytics solution relies on client side tracking. No matter if you are using Google Analytics or Adobe Analytics (if you are lucky), they all use code that needs to be implemented in the app or website. This is also true for OpenSource systems like Matomo (which I will continue to call Piwik) or the ancient Open Web Analytics (which has become more active again recently). If you work anywhere near Web Analytics, you have very likely heard about Snowplow Analytics somewhere, so let’s have a look at that.

Introducing Snowplow Analytics

Giving a complete introduction to Snowplow is far beyond the scope of this post. They have both an OpenSource and a commercial offering for a complete analytics platform. This also includes processing and storing the data, which we are going to build on our own for this project. Since we are going to need the client code, Snowplow is an ideal solution because of all the different platforms they support (see the GitHub repo for a list of the supported platforms, it’s amazing!). The most important feature for our project is the Javascript Tracker, which is only a single file that you can download from GitHub.

This Javascript Tracker will help us a lot with data collection. For example, it saves requests that could not be sent due to connection issues and resends them later on. It also has some handy features for things like link tracking, time spent on site or loading times. We could build all of this ourselves, but that would be a whole project in itself.
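To get a feeling for what the tracker takes off our hands here, the following is a minimal sketch of such a retry queue. It is not Snowplow's actual implementation: the names createRetryQueue, enqueue, flush and the send callback are made up for illustration.

```javascript
// Hypothetical sketch of a retry queue, NOT Snowplow's real code.
// `send` is an assumed callback that returns true on success and false on failure.
function createRetryQueue(send) {
  const pending = [];

  // Try to deliver queued events in order; stop at the first failure
  // so that the failed event and everything after it stay queued.
  function flush() {
    while (pending.length > 0) {
      if (!send(pending[0])) {
        return false;
      }
      pending.shift();
    }
    return true;
  }

  return {
    // Add an event and immediately attempt delivery.
    enqueue(event) {
      pending.push(event);
      return flush();
    },
    flush,
    size: () => pending.length,
  };
}

// Usage: simulate a connection that is down at first and recovers later.
let online = false;
const delivered = [];
const queue = createRetryQueue((event) => {
  if (!online) return false;
  delivered.push(event);
  return true;
});

queue.enqueue("page_view"); // fails, stays queued
queue.enqueue("link_click"); // fails, stays queued
online = true;
queue.flush(); // both events are now delivered in their original order
console.log(delivered);
```

A real tracker would also have to keep the queue around across page loads, which is exactly the kind of detail we are happy not to build ourselves.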

The download gives us our client code as a single Javascript file called sp.js. Let’s integrate that into a webpage to start tracking some events! Snowplow needs some code beyond just including the file itself. For our experiment, include it like this:

<script type="text/javascript" async=1>
    ;(function(p,l,o,w,i,n,g){if(!p[i]){p.GlobalSnowplowNamespace=p.GlobalSnowplowNamespace||[];
    p.GlobalSnowplowNamespace.push(i);p[i]=function(){(p[i].q=p[i].q||[]).push(arguments)
    };p[i].q=p[i].q||[];n=l.createElement(o);g=l.getElementsByTagName(o)[0];n.async=1;
    n.src=w;g.parentNode.insertBefore(n,g)}}(window,document,"script","sp.js","snowplow_tracker"));
</script>

This code creates the tracking functions. If you have renamed the downloaded file, you can replace “sp.js” with the filename of your script file. Also, you can change the name of the tracking function by changing “snowplow_tracker” to whatever name you like.
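Since the minified snippet is hard to read, here is a de-minified sketch of what it does, as far as I can tell from reading it. The function and parameter names (installTracker, win, doc and so on) are my own, and the stand-ins for window and document only exist so that the example also runs outside of a browser.

```javascript
// Readable sketch of the loader snippet. Names are my own choice;
// the behaviour is meant to mirror the minified code.
function installTracker(win, doc, scriptUrl, functionName) {
  if (win[functionName]) {
    return; // tracking function already exists, nothing to do
  }

  // Remember the name of the tracking function so sp.js can find it.
  win.GlobalSnowplowNamespace = win.GlobalSnowplowNamespace || [];
  win.GlobalSnowplowNamespace.push(functionName);

  // Until sp.js has loaded, every call is simply pushed onto a queue.
  win[functionName] = function () {
    (win[functionName].q = win[functionName].q || []).push(arguments);
  };
  win[functionName].q = win[functionName].q || [];

  // Load sp.js asynchronously by inserting a script tag before the first one.
  const script = doc.createElement("script");
  const firstScript = doc.getElementsByTagName("script")[0];
  script.async = 1;
  script.src = scriptUrl;
  firstScript.parentNode.insertBefore(script, firstScript);
}

// Usage with minimal stand-ins for window and document (assumptions for this demo).
const inserted = [];
const fakeFirstScript = {
  parentNode: { insertBefore: (node) => inserted.push(node) },
};
const fakeDocument = {
  createElement: () => ({}),
  getElementsByTagName: () => [fakeFirstScript],
};
const fakeWindow = {};

installTracker(fakeWindow, fakeDocument, "sp.js", "snowplow_tracker");
fakeWindow.snowplow_tracker("trackPageView");
console.log(fakeWindow.snowplow_tracker.q.length); // 1
console.log(inserted[0].src); // sp.js
```

The important part is the queue: calls to the tracking function made before sp.js has finished loading are not lost, they are stored and can be processed once the tracker is ready.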

In the next step we are going to configure our tracker to match our system setup. For version 2.12.0 of the tracker, which I am using, the code for our experiment looks like this:

snowplow_tracker("newTracker", "webtracking", "127.0.0.1:8888/analytics", {
  appId: "testpage",
  eventMethod: "get",
  encodeBase64: false,
  userFingerprint: true,
  platform: "web",
  contexts: {
    webPage: true,
    performanceTiming: true,
    gaCookies: false,
    geolocation: false
  }
});

There is a lot going on here, so let’s start with line 1, where there are two important parameters to configure. First, we define a namespace for our tracker, which is “webtracking” in our case but could be your company name. Second, we give Snowplow a URL endpoint and define where the data should be sent to. With “127.0.0.1:8888/analytics” it is going to be sent to our own computer on port 8888. On a production system, you would put the address of your analytics servers there.

On line 2 we give our application an ID. If we had multiple apps or websites, we could separate them with this ID. Lines 3 and 4 help us with developing, since they allow our requests to be sent in a readable fashion via GET and unencoded. With lines 5 and 6, we enable Browser Fingerprinting for our clients and tell Snowplow that our platform is the web.

From line 7 on we enable some more variables for our web platform. For example, we get some load time information and website specific data. We also disable the Google Analytics integration on line 10 and the geolocation feature on line 11 (the latter would prompt our users for permission to use location data, which we don’t need yet).
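To illustrate why sending the data unencoded helps while developing, here is a small sketch that puts the same JSON data into a query string, once readable and once Base64 encoded. The parameter name “payload” and the helper functions are placeholders for this example and not the format Snowplow uses on the wire.

```javascript
// Sketch only: the parameter name "payload" and both helpers are
// placeholders, not Snowplow's actual request format.
function toBase64(text) {
  // btoa exists in browsers and recent Node.js versions; fall back to Buffer otherwise.
  return typeof btoa === "function"
    ? btoa(text)
    : Buffer.from(text, "utf8").toString("base64");
}

function buildQuery(data, encodeBase64) {
  const json = JSON.stringify(data);
  const value = encodeBase64 ? toBase64(json) : json;
  return "payload=" + encodeURIComponent(value);
}

const data = { "a String": "Hello there!" };

const readable = buildQuery(data, false);
const encoded = buildQuery(data, true);

// After URL decoding, the readable variant is plain JSON again,
// while the encoded one still needs an extra Base64 decoding step.
console.log(decodeURIComponent(readable));
console.log(decodeURIComponent(encoded));
```

When looking at requests in the browser’s network tab, the first variant can be read directly, which is all we want during development.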

With one more code block, we will enable some more cool features:

snowplow_tracker('enableLinkClickTracking');
snowplow_tracker('enableFormTracking');
snowplow_tracker('enableErrorTracking');
snowplow_tracker('enableActivityTracking', 10, 10);
snowplow_tracker('trackPageView', 'my custom page title',[{
    data: {
      "a Date": new Date().toString(),
      "a String": "Hello there!"
    }
  }]);

Let’s go through that. Line 1 enables us to track the links our website users click on. With line 2, we enable the tracking of our website forms, if there are any (caution, this might include personal information about our users!). To know about errors happening on our platform, we enable tracking for that in line 3 as well.
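Regarding the caution about personal information in forms: one possible way to deal with it is to blank out sensitive fields before any form data is handed to a tracking call. The sketch below shows the idea with a hypothetical helper called redactFormFields. It is not part of the Snowplow API, and the field names are examples only.

```javascript
// Hypothetical helper, NOT part of the Snowplow API: replaces the values of
// sensitive fields before form data is passed on to any tracking call.
function redactFormFields(fields, sensitiveNames) {
  return fields.map((field) =>
    sensitiveNames.includes(field.name)
      ? { ...field, value: "[redacted]" }
      : field
  );
}

// Example form data; names and values are made up.
const submitted = [
  { name: "newsletter", value: "yes" },
  { name: "email", value: "jane@example.com" },
];

console.log(redactFormFields(submitted, ["email", "password"]));
```

The helper returns a new array and leaves the original form data untouched, so the website itself keeps working with the real values.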

Now, line 4 is something really cool. Under normal circumstances, we define the time spent on a page by comparing the timestamp of the page load with the next page’s timestamp. But if there is no next page, there also is no time spent! To help with this issue, Snowplow will keep sending events as long as the user is active on a loaded page (every 10 seconds with our config). If you have a content driven website this is a killer feature and would be quite expensive with a commercial Web Analytics system.
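To make the idea more concrete, here is a minimal sketch of how the time on a page could be estimated from those recurring events later on. The function name estimateTimeOnPage and the plain millisecond timestamps are assumptions for this example; the real events carry a lot more information.

```javascript
// Sketch: estimate time on page from a page view plus its heartbeat events.
// Timestamps are in milliseconds; the function name and the input shape
// are assumptions made for this example.
function estimateTimeOnPage(pageViewTimestamp, heartbeatTimestamps) {
  if (heartbeatTimestamps.length === 0) {
    return 0; // no heartbeat and no next page: nothing to measure
  }
  const lastHeartbeat = Math.max(...heartbeatTimestamps);
  return lastHeartbeat - pageViewTimestamp;
}

// A visitor who stayed active long enough to send three heartbeats.
const pageView = 0;
const heartbeats = [10000, 20000, 30000];
console.log(estimateTimeOnPage(pageView, heartbeats) / 1000 + " seconds"); // 30 seconds
```

Note that this estimate can be too low by up to one heartbeat interval, since we never learn what happened after the last event was sent.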

On line 5 we send the actual Page View and give our page a very creative name. In the array after that, we set two custom variables named “a Date” and “a String” to demonstrate how to send custom data along with our page events to our backend.

Putting it all together

To demonstrate our tracker we should create a small test page. I went ahead and did that, see below:

<!DOCTYPE html>
<html>
  <head>
    <title>Welcome to the Testpage!</title>
    <script type="text/javascript" async=1>
      ;(function(p,l,o,w,i,n,g){if(!p[i]){p.GlobalSnowplowNamespace=p.GlobalSnowplowNamespace||[];
      p.GlobalSnowplowNamespace.push(i);p[i]=function(){(p[i].q=p[i].q||[]).push(arguments)
      };p[i].q=p[i].q||[];n=l.createElement(o);g=l.getElementsByTagName(o)[0];n.async=1;
      n.src=w;g.parentNode.insertBefore(n,g)}}(window,document,"script","sp.js","snowplow_tracker"));
    </script>
    <script type="text/javascript">
      snowplow_tracker("newTracker", "webtracking", "127.0.0.1:8888/analytics", {
        appId: "testpage",
        eventMethod: "get",
        encodeBase64: false,
        userFingerprint: true,
        platform: "web",
        contexts: {
          webPage: true,
          performanceTiming: true,
          gaCookies: false,
          geolocation: false
        }
      });
      snowplow_tracker('enableLinkClickTracking');
      snowplow_tracker('enableFormTracking');
      snowplow_tracker('enableErrorTracking');
      snowplow_tracker('enableActivityTracking', 10, 10);
      snowplow_tracker('trackPageView', 'my custom page title',[{
        data: {
          "a Date": new Date().toString(),
          "a String": "Hello there!"
        }
      }]);
    </script>
  </head>
  <body>
    <h1>Welcome to the test page!</h1>
    <form><input></form>
    <a href="#">Click!</a>
  </body>
</html>

Drop this as an HTML file in the same folder as our sp.js tracker file. When opening it, you will see a request in your browser’s network tab to the endpoint configured on line 12. This might give you an error right now, as we will create our endpoint in the next post. That request should include everything we configured above. Also, there will be events every 10 seconds (as enabled on line 28) or on clicks on the link according to line 25.
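If you prefer to inspect such a request in code instead of the network tab, you can copy its URL and split it into its parameters. The following sketch does that with the standard URL class. The example URL is made up for this demonstration, including its path and parameter names, so check your own network tab for what your tracker version really sends.

```javascript
// The URL below is made up for this example; path and parameter names
// are assumptions, not a recording of a real tracker request.
const exampleRequest =
  "http://127.0.0.1:8888/analytics/i?e=pv&aid=testpage&p=web&tna=webtracking";

// Turn the query string of a request URL into a plain object.
function parseTrackingRequest(requestUrl) {
  const url = new URL(requestUrl);
  return Object.fromEntries(url.searchParams.entries());
}

console.log(parseTrackingRequest(exampleRequest));
```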

That’s it for this post! In the next one, we will build the backend to receive the events we send here.
