Simple Time Series Analysis through Standard Deviation – Statistics in Adobe Analytics
In my last post, we took a look at how Descriptive Statistical Analysis can help us understand our site performance using the simple Mean. I introduced the concept of conditional counters to help us identify our top- and bottom-performing sites. Today we are going to extend our knowledge of descriptive statistical methods by using Standard Deviation on trended data and apply conditional counters to it as well, but with a new spin. If conditional counters are new to you, it might help to check out that last post!
As in the last post, we are setting ourselves a goal. At the end, we want to have a nice workspace to help us understand our trended data better. We need a way to judge whether the fluctuation in our data is within an expected range and how often it is not. This is what we are going to build:
Let’s first look at how to create this corridor of expected values.
Mean and Standard Deviation for trended data
We’ll start with a simple Freeform Table and Graph for the metric we are trying to describe. We want to understand our Unique Visitors better, so I will start with something like this for my demo dataset:
Our first metric for today is the simple Mean, like in the previous post. To achieve this, all we have to do is right-click our Unique Visitors metric, click on “Create metric from selection” and choose “Mean”. This will give us a new column in our table like this:
I personally find this quite interesting already. We can see that the Mean gives us a nice average across our trended data. There are days where the performance is above and below average, as we would expect. But how can we judge if one of those peaks is “special” compared to the others? Luckily, Statistics has a way to help us!
In Statistics, we can use the Standard Deviation to help us understand the distribution of our data a bit better. It expresses how much our data varies around the Mean, which is just what we want. There is also the very handy 68-95-99.7 rule, which shows how Standard Deviations work for data that is approximately normally distributed: about 68% of values lie within one Standard Deviation of the Mean, 95% within two and 99.7% within three. This sounds a lot like anomaly detection, and it is indeed quite similar!
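To get a feeling for the 68-95-99.7 rule outside of Adobe Analytics, here is a minimal Python sketch. It assumes simulated, normally distributed values (not real traffic), and it uses the population Standard Deviation from Python's standard library, which may or may not match the exact variant Adobe Analytics computes. The helper name `share_within` is made up for this illustration.

```python
import random
from statistics import mean, pstdev

random.seed(42)  # fixed seed so the sketch is reproducible

# Simulated, normally distributed "daily visitors" (illustrative values only).
values = [random.gauss(5740, 427) for _ in range(10_000)]

avg = mean(values)
sd = pstdev(values)  # population Standard Deviation (an assumption)


def share_within(k):
    """Fraction of values no more than k Standard Deviations from the Mean."""
    return sum(abs(v - avg) <= k * sd for v in values) / len(values)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

Real traffic data is rarely perfectly normal, so the shares in your own reports will usually differ somewhat from these textbook numbers.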
Luckily, our trusty Adobe Analytics has a very simple way to calculate the Standard Deviation. Open the Calculated Metric Builder and drag the “Standard Deviation” function into the definition. I use Unique Visitors as the metric and name it “Unique Visitors SD” like this:
Once we drag this into our Table, we immediately get a better view of our data:
Now we know: On average, we have 5740 Unique Visitors per day, and the Standard Deviation is 427. If our daily values are roughly normally distributed, about 68% of days fall within 427 Visitors above or below the Mean, that is, between 5313 and 6167 Unique Visitors. A more meaningful way to look at this is to actually see the corridor of expectation in the data, so let’s create it! All we have to do is select both the Mean and Standard Deviation and create two new Calculated Metrics by adding and subtracting them:
This gives us two new columns in our table. For our graph, I deselected the Standard Deviation to increase clarity:
Isn’t this awesome? With just a few clicks, we basically created our very own anomaly detection! Now we clearly see how our values are scattered around our corridor of expected values. Most of our values are within that corridor, with some outside of it. I wonder: How many are there of those outliers?
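If you want to double-check the corridor outside of Analysis Workspace, the same calculation can be sketched in a few lines of Python. The daily values below are hypothetical and not the demo dataset from this post, and the population Standard Deviation (`pstdev`) is an assumption about which variant fits best.

```python
from statistics import mean, pstdev

# Hypothetical daily Unique Visitors for one week (not the demo data above).
daily_visitors = [5300, 6200, 5700, 5100, 6400, 5800, 5680]

avg = mean(daily_visitors)
sd = pstdev(daily_visitors)  # population Standard Deviation (an assumption)

# The corridor of expectation: Mean plus and minus one Standard Deviation.
upper_expectation = avg + sd
lower_expectation = avg - sd

print(f"Mean: {avg:.0f}, Standard Deviation: {sd:.0f}")
print(f"Corridor: {lower_expectation:.0f} to {upper_expectation:.0f}")
```

Widening the corridor to two Standard Deviations would only require multiplying `sd` by 2 in both lines, which would flag far fewer days as outliers.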
Counting values outside of expectation
As we know from the previous post, before we can actually count anything, we need to know what we want to count. In our case, we want to know how many days are above the upper expectation or below the lower expectation. The first step is to create two new metrics: One subtracting the lower expectation from the Unique Visitors and one subtracting the upper expectation from Unique Visitors. When visualized, the two lines look like this. Look at how nicely they are centered around 0:
This will be the basis for our counters. What matters about those lines is where they sit relative to 0. Whenever the lower line (Unique Visitors minus the upper expectation) is above zero, that day’s performance was above the expectation (compare to the previous chart). Whenever the upper line (Unique Visitors minus the lower expectation) is below zero, that day was below the expectation. Now we almost have something to count!
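The logic of the two difference lines can be sketched in Python like this. Again, the data is hypothetical, and the names `vs_upper`, `vs_lower` and `statuses` are made up for this example; they do not correspond to anything in Adobe Analytics.

```python
from statistics import mean, pstdev

# Hypothetical daily Unique Visitors (not the demo data above).
daily_visitors = [5300, 6200, 5700, 5100, 6400, 5800, 5680]

avg = mean(daily_visitors)
sd = pstdev(daily_visitors)  # population Standard Deviation (an assumption)
upper_expectation = avg + sd
lower_expectation = avg - sd

# The two difference metrics: Unique Visitors minus each corridor border.
vs_upper = [v - upper_expectation for v in daily_visitors]
vs_lower = [v - lower_expectation for v in daily_visitors]

# The sign of each difference tells us where a day sits.
statuses = []
for up, low in zip(vs_upper, vs_lower):
    if up > 0:
        statuses.append("above expectation")
    elif low < 0:
        statuses.append("below expectation")
    else:
        statuses.append("within corridor")

for visitors, status in zip(daily_visitors, statuses):
    print(visitors, status)
```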
In this post, we are going to create our counters in a different way than before, to show how things can be done differently. First, let’s create two new metrics. One should be “1” if the current day is above expectation while the other is “1” if the current day is below expectation. This can be done by using the “Greater Than” and “Less Than” functions. All we have to do is open the two metrics from before in the Metric Builder and drag them into those functions, comparing them to zero. This is what it looks like for the “Below Expectation” Metric:
Once we drag them into our table, we can see that they actually work: All rows with a value outside the expectation have a “1” in one of the columns:
Last step! If we now want to see how many days there are below expectation, we just have to sum up the values of that column. This can be done via our handy right-click menu like so:
And there it is: We see how many days from our table are above and below expectation (last and third-to-last column):
Awesome! Now all we need is a nice dashboard!
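For comparison, here is how the same flag-and-sum counters could look in Python. This is only a sketch with hypothetical data: the comparisons mimic what “Greater Than” and “Less Than” against zero do in the steps above, and the final `sum` plays the role of summing up the flag columns.

```python
from statistics import mean, pstdev

# Hypothetical daily Unique Visitors (not the demo data above).
daily_visitors = [5300, 6200, 5700, 5100, 6400, 5800, 5680]

avg = mean(daily_visitors)
sd = pstdev(daily_visitors)  # population Standard Deviation (an assumption)
upper_expectation = avg + sd
lower_expectation = avg - sd

# Flag metrics: 1 if the difference crosses zero in the relevant direction.
above_flags = [1 if v - upper_expectation > 0 else 0 for v in daily_visitors]
below_flags = [1 if v - lower_expectation < 0 else 0 for v in daily_visitors]

# Counters: summing a column of flags counts the flagged days.
days_above = sum(above_flags)
days_below = sum(below_flags)

print(f"Days above expectation: {days_above}")
print(f"Days below expectation: {days_below}")
```

Because the flags are plain 0/1 values, they can be summed, averaged or filtered like any other metric, which is what makes this style of counter so flexible.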
Putting it all together
In our dashboard, I want to see how my metric is performing over time relative to the expected values and mean. On top of that, I would like to know both the mean and the standard deviation, as well as how many days are above or below expectation. With a Line Chart and some summary numbers, this can be done quite nicely:
You might have noticed: I divided the Standard Deviation by the Mean to get the relative traffic fluctuation, a ratio known in Statistics as the coefficient of variation. This way we can show how much our traffic fluctuates from day to day relative to the Mean! As always, we can use this together with Filters and Segments to learn more about our performance over time for groups of our users.
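The relative fluctuation is a simple ratio, which this short Python sketch illustrates with hypothetical data. The variable name `relative_fluctuation` is made up for this example.

```python
from statistics import mean, pstdev

# Hypothetical daily Unique Visitors (not the demo data above).
daily_visitors = [5300, 6200, 5700, 5100, 6400, 5800, 5680]

avg = mean(daily_visitors)
sd = pstdev(daily_visitors)  # population Standard Deviation (an assumption)

# Standard Deviation relative to the Mean (coefficient of variation).
relative_fluctuation = sd / avg

print(f"Relative traffic fluctuation: {relative_fluctuation:.1%}")
```

Since the ratio has no unit, it can be compared across segments or sites with very different traffic levels.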
That’s it for this post. I hope you found it helpful to get a different approach to how we can count things in Adobe Analytics. Let me know about the awesome stuff you create yourself!
German Analyst and Data Scientist working in and writing about (Web) Analytics and Online Marketing Tech.
2021&2022 Adobe Analytics Champion