Cookie-less Server Side Tracking with Adobe Customer Journey Analytics

If there is one big hot topic in digital analytics right now (besides the unfortunate sunset of Google Analytics 3 and GDPR news) it quite possibly is the recent trend of what many call server side tracking. Currently, server side tracking is an obligatory agenda item at every analytics conference and virtually every vendor of analytics or tag management systems is working on a way to serve the rising demand.

However, while there is a lot of talk around the topic, there is no shared definition in our industry of what server side tracking actually is. Jim Gordon has assembled a nice overview of what people might mean when they talk about any of the underlying concepts. In my personal experience, people usually refer to a form of server side tag management, often using Google’s server side tag manager, that still uses some logic in the client’s browser.

Adobe has the, in Jim’s terms, client-to-vendor use case covered with their normal implementation through Adobe Launch, where the tag manager in the user’s browser takes care of sending data to tool vendors directly. With the recent introduction of the new Web SDK, Adobe has created something that can be very similar to Google’s server side tag manager, where a small library in the user’s browser sends a single point of data to Adobe’s servers, from where it can be forwarded to Adobe’s own or even other vendors’ tools.

In this post I want to highlight another approach that Jim refers to as a server-to-vendor setup. Specifically, I want to show how we can use the log files a web server produces to track user behavior on a site. This comes with a big benefit: It does not require any implementation on the front end or even cookies, so it is also harder (if not impossible) for browser vendors to block. We are then going to use Adobe’s Experience Platform and Customer Journey Analytics to build our analytics system because both are very flexible in terms of data collection. Let’s get started!

Collecting Apache log files for analytics

To track information about our users we first need a way to capture interactions, like a page load. Luckily, most web servers that serve websites have a way to log which content has been served. On top of that, they often allow for some clever customization that we will use to identify users. There are many web servers out there, but we will focus on the widely used Apache web server today. I provided some examples for NGINX and OpenResty in my previous posts on log analysis and advanced open source analytics systems.

The Apache web server is quite flexible when it comes to logging requests. While we could use the built-in access or error logs, those don’t contain all the information we need to identify users and are harder to ingest because of their specific formatting. We are going to use a custom log format that has all the information we need and can collect from Apache. My instructions in the Apache config look like this:

LogFormat "{\"timestamp\":%{msec}t,\"userid\":\"%{REMOTE_IPHASH}e\", \"url\":\"%V%U%q\", \"host\":\"%V\", \"request\":\"%U\", \"method\":\"%m\", \"query\": \"%q\", \"status\":%>s, \"path\": \"%U\", \"referrer\": \"%{Referer}i\", \"agent\": \"%{User-agent}i\"}" tdf_hashed
CustomLog "logs/cja.log"    tdf_hashed  "expr=-T reqenv('log_hashed')"

This might look a bit cryptic if you are not familiar with Apache’s log config style. On the first line, we are defining the actual content of our log file as JSON for easier ingestion later on. Then, on the second line, we are instructing Apache to use that format for logging and telling it where to store our custom log file. The trailing expression makes the log conditional, so a line is only written when the log_hashed environment variable is set to a true value for the request. Let’s go through the different fields that I have used above:

  • timestamp: The exact time a page has been requested in the millisecond format that AEP and CJA need
  • userid: A hash of the client’s IP address and user agent to identify the user
  • url: A combined field to include the full URL that has been requested
  • host: The server’s name that served the request
  • request: The requested file
  • method: The method that was used to request the file
  • query: The query part of the url
  • status: The return code of the request
  • path: The path of the requested file
  • referrer: The referrer of the requested file
  • agent: The user’s browser’s user agent

As you can see, we are using a hashed combination of the user’s IP address and user agent to identify them. Of course, as always when identifying users, you should consult your legal team to decide if you are allowed to use those. To generate the userid we use an MD5 hash like this:

SetEnvIfExpr "true" log_hashed=true
SetEnvIfExpr "md5(%{REMOTE_ADDR}.%{HTTP_USER_AGENT}) =~ /(.*)$/" REMOTE_IPHASH=$1

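The first line sets the log_hashed flag on every request, which is what the condition on our CustomLog checks. The second line combines the user’s IP address and user agent into a single value and hashes it, returning the hash as a variable we can use in our log output. The full Apache log config looks like this in my example: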

LogFormat "{\"timestamp\":%{msec}t,\"userid\":\"%{REMOTE_IPHASH}e\", \"url\":\"%V%U%q\", \"host\":\"%V\", \"request\":\"%U\", \"method\":\"%m\", \"query\": \"%q\", \"status\":%>s, \"path\": \"%U\", \"referrer\": \"%{Referer}i\", \"agent\": \"%{User-agent}i\"}" tdf_hashed
CustomLog "logs/cja.log"    tdf_hashed  "expr=-T reqenv('log_hashed')"

SetEnvIfExpr "true" log_hashed=true
SetEnvIfExpr "md5(%{REMOTE_ADDR}.%{HTTP_USER_AGENT}) =~ /(.*)$/" REMOTE_IPHASH=$1
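To make the userid field more tangible, here is a minimal Python sketch of the same idea: concatenate IP address and user agent, then take the MD5 hex digest. The function name and the sample values are made up for illustration, and I'm assuming UTF-8 encoding of the input, so treat it as an approximation of what Apache computes rather than a byte-for-byte replica.

```python
import hashlib


def user_hash(remote_addr: str, user_agent: str) -> str:
    """Return the MD5 hex digest of IP address + user agent (hypothetical helper)."""
    # Plain concatenation, mirroring the "." operator in the Apache expression
    combined = remote_addr + user_agent
    # Assumption: the input is encoded as UTF-8 before hashing
    return hashlib.md5(combined.encode("utf-8")).hexdigest()


# Illustrative values only (documentation IP range, shortened user agent)
example = user_hash("203.0.113.7", "Mozilla/5.0 (X11; Linux x86_64)")
print(example)
```

The same visitor always produces the same 32-character value as long as neither the IP address nor the browser changes, which is exactly what makes it usable as an identifier, and also why it is less stable than a cookie.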

With this config in place, every requested file from our web server will produce a log line in the output file we specified in exactly the JSON format we want. An example could look like this:

{"timestamp":1648898074725,"userid":"f727df652d5959e27e70e1b15c0a8a3c", "url":"", "host":"", "request":"/launch.html", "method":"GET", "query": "", "status":200, "path": "/launch.html", "referrer": "-", "agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.141 Safari/537.36"}
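Before shipping the file anywhere, it can help to check that the lines really parse. The following is a small Python sketch, not part of my actual pipeline: it reads one log line, converts the millisecond timestamp into a UTC datetime, and skips lines that are not valid JSON. The function name and the added requested_at key are hypothetical. The skip is there because Apache may escape unusual characters in header values in a way (such as \xhh sequences) that a strict JSON parser rejects.

```python
import json
from datetime import datetime, timezone

# The example line from above, split across several string literals for readability
sample_line = (
    '{"timestamp":1648898074725,"userid":"f727df652d5959e27e70e1b15c0a8a3c", '
    '"url":"", "host":"", "request":"/launch.html", "method":"GET", "query": "", '
    '"status":200, "path": "/launch.html", "referrer": "-", '
    '"agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/98.0.4758.141 Safari/537.36"}'
)


def parse_log_line(line: str):
    """Parse one JSON log line; return None if the line is not valid JSON."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    # The timestamp is in epoch milliseconds; add a timezone-aware datetime for convenience
    record["requested_at"] = datetime.fromtimestamp(
        record["timestamp"] / 1000, tz=timezone.utc
    )
    return record


record = parse_log_line(sample_line)
print(record["path"], record["status"], record["requested_at"].isoformat())
```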

If we wanted, we could also include some cookie values in our log (like from Adobe’s Experience Cloud ID Service) to improve our chances of reliably identifying users, but we are going with a completely cookie-less variant for today.

Next, we would need a way to collect our log files from the, potentially many, web servers we use. Because this part will be highly customized depending on the company you work for I’m going to skip this section. For a personal recommendation: I’ve used Apache NiFi in the past and it has worked great for use cases like this!

Adobe Experience Platform setup

Now that we collect data whenever a file has been requested from our web server, we can start setting up the environment we need in Adobe Experience Platform. Like all things in AEP, our adventure begins with the XDM Schema definition. I’m using the AEP Web SDK ExperienceEvent Field Group that Adobe provides so we can benefit from some automated lookups in CJA later on. My very simple Schema looks like this. The only changed field is the device model, which I declared as an identity field:

Experience Platform Schema for Server Side Tracking

With this simple Schema in place we can now start mapping our log file structure to our Schema. The workflow will be slightly different depending on which connector you use but the general steps in Data Prep should be pretty similar. You can see on the screenshot below how I mapped the fields in my log file to the Schema fields from the XDM Schema. There shouldn’t be any big surprises in this mapping besides the fact that I’m automatically generating an ID for every row of data using the handy Data Prep uuid() function:

Full field mapping from Experience Platform
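For readers who prefer code over screenshots, the mapping can be approximated in Python like this. Please read it as a rough sketch under assumptions: the XDM paths below (web.webPageDetails, web.webReferrer, environment.browserDetails, device.model) are my best reconstruction of a typical mapping to the Web SDK field group, not an export of the Data Prep configuration, and uuid4() only stands in for Data Prep's uuid() function.

```python
import uuid


def to_xdm_event(record: dict) -> dict:
    """Map one parsed log record to an XDM-like event (field paths are assumptions)."""
    return {
        # One generated ID per row, similar in spirit to Data Prep's uuid()
        "_id": str(uuid.uuid4()),
        # Kept as epoch milliseconds here; the real mapping may convert the format
        "timestamp": record["timestamp"],
        # The hashed IP + user agent goes into the field declared as identity
        "device": {"model": record["userid"]},
        "web": {
            "webPageDetails": {"URL": record["url"], "name": record["path"]},
            "webReferrer": {"URL": record["referrer"]},
        },
        "environment": {"browserDetails": {"userAgent": record["agent"]}},
    }


sample = {
    "timestamp": 1648898074725,
    "userid": "f727df652d5959e27e70e1b15c0a8a3c",
    "url": "",
    "path": "/launch.html",
    "referrer": "-",
    "agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}
event = to_xdm_event(sample)
print(event["device"]["model"], event["web"]["webPageDetails"]["name"])
```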

This is the bare minimum of fields that can easily be mapped using the built-in Field Group we’ve used in the XDM Schema. We could use the other logged fields too if we extended our Schema with some custom fields or another Field Group. We’ll go ahead with what you see above for now.

Customer Journey Analytics setup

The setup in Customer Journey Analytics can be quite simple. First, we quickly create the Connection based on the Dataset we created when connecting our log file storage to Experience Platform. Note how I used the Device Model field as the Person ID:

Customer Journey Analytics Connection configuration

With the Connection in place we go ahead and create a Data View. I’ve added all fields that contain data, along with the built-in Standard Components. One thing you will likely want to do is rename some fields, like the Page URL and Referrer URL that are otherwise just called “URL” and “URL (2)”, and possibly rename the Event container to Request:

Customer Journey Analytics Data View configuration

One of Customer Journey Analytics’ best features (at least in my opinion) is that we can turn any dimension into an Entry- or Exit Dimension. This makes a lot of sense for our Page Name dimension so that we can use it for Entry- and Exit Page use cases. To do that, we just have to drag the dimension into the Data View one more time and change the settings to “First Known” for the Entry Page and “Last Known” for the Exit Page:

Creating Entry- and Exit Page Dimensions in Adobe Customer Journey Analytics
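Conceptually, those two settings boil down to "first value seen in the session" and "last value seen in the session". Here is a small Python sketch of that logic, assuming the hits are already sorted by time and tagged with a session ID. Both the function and the session IDs are hypothetical, and CJA's actual allocation engine is of course more configurable than this.

```python
def entry_and_exit_pages(hits):
    """Return {session_id: {"entry": ..., "exit": ...}} for time-ordered hits.

    hits: iterable of (session_id, page_name) tuples in chronological order.
    """
    result = {}
    for session_id, page in hits:
        if not page:
            # Hits without a page name never overwrite a known value
            continue
        if session_id not in result:
            # First known value becomes the entry page (and the exit page so far)
            result[session_id] = {"entry": page, "exit": page}
        else:
            # Every later known value replaces the exit page
            result[session_id]["exit"] = page
    return result


hits = [
    ("s1", "/home"),
    ("s1", "/products"),
    ("s2", "/launch.html"),
    ("s1", "/checkout"),
    ("s1", ""),
]
print(entry_and_exit_pages(hits))
```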

I love how easy it is to create those completely new dimensions, fully on-the-fly and after data has been collected and ingested!

Now that our data is flowing from Experience Platform into Customer Journey Analytics and is available in a Data View, it is ready to be analyzed in Analysis Workspace!

Server Side Tracking Dashboard in Adobe Customer Journey Analytics

As a first validation, I’ve built this small table to take a look at all the requests from my demo log file:

Adobe Customer Journey Analytics Workspace

Nice! As we can see above, the visitor identification through IP address and User Agent worked just fine. The number of Sessions is far lower than the number of individual Requests, meaning that multiple requests are being attributed to the same user and session instead of each request counting as a new visitor.
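If you want to sanity-check numbers like these outside of CJA, a rough Python sketch could look like the following. It counts people, sessions, and requests from parsed log records, and assumes a 30-minute inactivity timeout for sessions. The timeout value, the function name, and the demo records are my assumptions for illustration, so the result will only match CJA if the Data View uses the same session settings.

```python
from collections import defaultdict

# Assumption: a session ends after 30 minutes without a request
SESSION_TIMEOUT_MS = 30 * 60 * 1000


def summarize(records):
    """Count distinct users, sessions, and requests in a list of log records."""
    timestamps_by_user = defaultdict(list)
    for record in records:
        timestamps_by_user[record["userid"]].append(record["timestamp"])

    sessions = 0
    for timestamps in timestamps_by_user.values():
        timestamps.sort()
        sessions += 1  # the first request of a user always opens a session
        for previous, current in zip(timestamps, timestamps[1:]):
            if current - previous > SESSION_TIMEOUT_MS:
                sessions += 1  # a gap longer than the timeout opens a new session

    return {
        "people": len(timestamps_by_user),
        "sessions": sessions,
        "requests": len(records),
    }


demo_records = [
    {"userid": "a", "timestamp": 0},
    {"userid": "a", "timestamp": 1_000},
    {"userid": "a", "timestamp": 60_000},
    {"userid": "a", "timestamp": 2 * 60 * 60 * 1000},  # two hours later
    {"userid": "b", "timestamp": 5_000},
]
print(summarize(demo_records))
```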

Because we are using the full log file from our Apache web server, we don’t just see page loads in the data. There are quite a few other requests for images, the page icon, etc. So while our view is already perfect for any server admin, we may want to use a Filter like the one below to only analyze actual page load requests:

Filtering for Page Loads
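The same idea can be expressed as a small pre-filter on the log records themselves. The rules in this Python sketch are hypothetical and not a copy of the filter from the screenshot: I'm assuming that a page load is a successful GET request whose path either has no file extension or ends in .html or .htm.

```python
import posixpath

# Assumption: these extensions (or none at all) indicate an actual page
PAGE_EXTENSIONS = {"", ".html", ".htm"}


def is_page_load(record: dict) -> bool:
    """Heuristically decide whether a log record represents a page load."""
    _, extension = posixpath.splitext(record["path"])
    return (
        record["method"] == "GET"
        and record["status"] == 200
        and extension.lower() in PAGE_EXTENSIONS
    )


requests = [
    {"method": "GET", "status": 200, "path": "/launch.html"},
    {"method": "GET", "status": 200, "path": "/favicon.ico"},
    {"method": "GET", "status": 200, "path": "/images/logo.png"},
    {"method": "GET", "status": 404, "path": "/missing.html"},
    {"method": "GET", "status": 200, "path": "/blog/"},
]
page_loads = [r["path"] for r in requests if is_page_load(r)]
print(page_loads)
```

Filtering before ingestion reduces the data volume in Experience Platform, while filtering in CJA keeps the raw requests available for other analyses, so which one fits better depends on the use case.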

We could then apply this filter to our Workspace Panel or the whole Data View as shown below:

Filtering for Page Load Events on the Data View level in Customer Journey Analytics

Now that we collect data, bring it into Experience Platform, import it into CJA, and visualize it in Analysis Workspace, it’s finally time for the…

Wrap up

Wow, what a ride! We actually accomplished quite a few things in this short post, covering the full stack of analytics tasks (see what I did there?). Without using any frontend code or cookies, we still managed to recognize our users. While there would be even more options for us to explore, like setting a cookie on the web server and reading it again to allow for some more robust identification, the result is still pretty cool.

Another great thing is how easy it is to build standard dimensions, like Entry- and Exit Page, without any additional processing in Query Service thanks to Customer Journey Analytics’ great new features. This wasn’t possible a few months back and really shows the speed and dedication that Adobe puts into this new product.

Of course this single post can only scratch the surface of how this style of web analytics can be used. Imagine: If we were working at a Netflix-like company that already has a global log stream with all meaningful user interactions available, it would be super easy for us to plug into the stream of Kafka events with Experience Platform and use the power of Analysis Workspace to make sense of content usage, feature adoption, and retention drivers. How cool is that!

I hope you found this basic introduction as fun to read as it was for me to create. Have a great rest of your day!

What is Adobe Customer Journey Analytics (CJA)?

Adobe Customer Journey Analytics (CJA) is the event-driven successor of Adobe Analytics. It takes the same great interface, Analysis Workspace, and makes it available to every type of data using its new and improved database engine. There are a lot of features in CJA that will never be available in Adobe Analytics thanks to the new engine.

What is Server Side Tracking?

Server Side Tracking is a relatively new approach that tries to work around browser and privacy limitations by hiding the data collection from the client’s browser. All communication between marketing tools and a website therefore happens between the involved servers, leading to the name Server Side Tracking.

What is Cookie-less Tracking?

Cookie-less tracking is a new trend aiming to get rid of cookies for user identification in web analytics systems. This is due to increasingly strict limitations on cookies, especially third-party ones, in modern browsers, as well as GDPR restrictions. Many companies erroneously assume this approach will not have any privacy implications, but they still need to work with their legal teams to ensure compliance.

Can Adobe Customer Journey Analytics do Server Side or Cookie-less Tracking?

Absolutely! Customer Journey Analytics is perfectly equipped to analyze data from Server Side Tracking or Cookie-less Tracking as it can handle any type of data independent of the source system. This article shows how it can be done in detail.