# Predictive Regression Analysis – Statistics in Adobe Analytics

Adobe Analytics is awesome for analyzing historical data. Besides Segments, Drilldowns, and Derived Metrics, it also offers some advanced statistical functions, such as Regression Analysis. Here are some examples of the different regression models that are available today:

It would be really cool if we could use this functionality to predict the future with some regression models! This article describes how to do that with advanced calculated metrics. In the end, we want a graph like this, with historical and future data in the same visualization:

We will go through the whole process of building a metric like the one shown above. If you just want the result, scroll down to the bottom of this article, where I show the complete metric. Let’s start!

## Statistics 101: Simple Linear Regression in Adobe Analytics

To start things off, let’s remind ourselves what regression analysis does. To keep things simple, we will focus on linear regression for now. We won’t go into the statistical details, but will focus on the business application instead. This is the formula for it:

Y = aX + b

Simple linear regression formula, where Y is the predicted value, X represents time (in our application), a is the slope, and b is the intercept.
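To make the formula concrete, here is a minimal Python sketch of an ordinary least-squares fit. We assume this is what the “Linear Regression: Slope” and “Linear Regression: Intercept” functions compute. The function name `linear_fit` and the sample numbers are hypothetical:

```python
from statistics import fmean


def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of Y = aX + b; returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x, mean_y = fmean(xs), fmean(ys)
    # Slope a = covariance(X, Y) / variance(X)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept b: the fitted line passes through (mean_x, mean_y)
    b = mean_y - a * mean_x
    return a, b


# Hypothetical daily values, with X = 1, 2, 3, ... standing in for time
days = [1, 2, 3, 4, 5]
visitors = [10, 12, 14, 16, 18]
a, b = linear_fit(days, visitors)
print(a, b)  # 2.0 8.0
```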

Speaking in business terms, we want to have a function that approximates our real values as closely as possible over time. So if we build a simple regression metric in Adobe Analytics, we would use the variables like this:

With that information, we can start building our calculated metric. The first component is our X variable, which represents time. All we need is a simple counter that counts up by one for every row in a table; we will call it the “Incrementor”. We can build it with the “Cumulative” function in the calculated metric builder like this, with a constant “1” as the metric:
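Outside of Adobe Analytics, the Incrementor is simply a running total of 1s, one per table row. Here is a tiny Python sketch of the idea (the function name `incrementor` is hypothetical):

```python
from itertools import accumulate


def incrementor(row_count: int) -> list[int]:
    """Mimic a Cumulative(constant 1) metric: 1, 2, 3, ... one value per table row."""
    # Every row contributes the constant 1, so the running total equals the row number.
    return list(accumulate([1] * row_count))


print(incrementor(5))  # [1, 2, 3, 4, 5]
```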

With this metric, we can generate a table like this, where for every row our metric counts up by one:

Now if we want to only look at historical data, we can already build another metric to give us the predicted Y value from a linear regression. In the metric builder, we would just drag the Incrementor as the X variable and Unique Visitors (or any other metric) as the Y variable in the “Linear Regression: Predicted Y” function like this:
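Conceptually, “Linear Regression: Predicted Y” fits a and b on the X and Y columns and then evaluates aX + b for each row. The Python sketch below assumes that behavior. The helper names and visitor numbers are made up for illustration:

```python
from statistics import fmean


def linear_fit(xs, ys):
    """Least-squares slope a and intercept b for Y = aX + b."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return a, mean_y - a * mean_x


def predicted_y(xs, ys):
    """Rough analogue of "Predicted Y": fit on (X, Y), then evaluate a*x + b per row."""
    a, b = linear_fit(xs, ys)
    return [a * x + b for x in xs]


x = [1, 2, 3, 4]           # the Incrementor column
y = [100, 110, 118, 132]   # hypothetical Unique Visitors per day
print([round(v, 1) for v in predicted_y(x, y)])  # [99.4, 109.8, 120.2, 130.6]
```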

When we compare this to the actual values, we can see that the regression is working nicely and approximates our real values quite well:

But there is one problem with a metric like this: as soon as our date range includes days in the future, those empty days also get included in the calculation and pull the regression line down significantly:
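Why does the line drop? Assume the empty future days enter the calculation as zeros. This quick Python sketch with made-up numbers shows how a few zero rows at the end turn a clear upward trend into a negative slope:

```python
from statistics import fmean


def slope(xs, ys):
    """Least-squares slope a of Y = aX + b."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )


past = [100, 110, 120, 130]   # hypothetical days with collected data
future = [0, 0, 0]            # assumed: future days have no data and count as 0
x_past = list(range(1, len(past) + 1))
x_all = list(range(1, len(past) + len(future) + 1))

print(slope(x_past, past))          # 10.0: the real upward trend
print(slope(x_all, past + future))  # about -22.86: the empty days drag the line down
```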

This also happens when you include the current day, since today’s data has not been fully collected yet. Luckily, there is something we can do about that!

## Dealing with missing values for Regressions in Adobe Analytics

First, we are going to exclude the current day from our data with a simple segment like this:

With this segment, we now create a new Incrementor metric. This time, we exclude the current day and check whether the current row in our table contains any data at all. If it does, we return the value of a cumulative counter; if not, we return 0. There is one more condition: the counter itself should only increase on days that have a value. So instead of a constant 1, it accumulates an “If” function that returns 1 when the metric has data and 0 otherwise. The “Incrementor with Data” metric now looks like this:
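Expressed in Python, the logic of the “Incrementor with Data” might look like the sketch below. It is a hypothetical re-implementation, not Adobe’s code. `None` or 0 stands for a day without data, and the made-up `today_index` parameter stands in for the “Exclude Today” segment:

```python
from itertools import accumulate


def incrementor_with_data(values, today_index=None):
    """Sketch of the "Incrementor with Data" logic.

    values: metric per row (0 or None = no data); today_index: row to drop,
    mimicking the "Exclude Today" segment (hypothetical parameter).
    """
    # Apply the segment: today's partial data counts as "no data".
    cleaned = [
        0 if (i == today_index or v is None) else v for i, v in enumerate(values)
    ]
    # Inner If: 1 for rows with data, 0 otherwise; Cumulative sums those flags.
    running = accumulate(1 if v else 0 for v in cleaned)
    # Outer If: only show the counter on rows that actually have data.
    return [n if v else 0 for n, v in zip(running, cleaned)]


# Hypothetical week: a gap on day 3, today is day 5, days 6-7 are in the future.
week = [120, 95, 0, 130, 40, None, None]
print(incrementor_with_data(week, today_index=4))  # [1, 2, 0, 3, 0, 0, 0]
```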

When we look at a date range with fragmented data, we can see that it works nicely. The Incrementor only increases, and only shows a non-zero value, on days that actually have data:

If we swap this new Incrementor into our regression metric, the setup becomes quite complex:

In our graph, we can now see that the new metric is not affected by the missing future data (look at the range where there still is data):

Now that we are able to base our regression solely on existing data, we can start thinking about how we want to calculate the future values with our final metric!

## Predictive Analytics using Linear Regression in Adobe Analytics

Let’s remind ourselves what we are trying to achieve. We want to get the predicted future value of a metric based on past values. Like above, we want the regression to be calculated only on complete, past data.

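Sadly, this cannot be done with the “Predicted Y” function we used before, because it evaluates the regression at the same X values it was fitted on. Our fitting X, the “Incrementor with Data”, is 0 for today and every future day, so we need a different X to carry the line into the future. Instead, we calculate the regression ourselves with the “Linear Regression: Slope” and “Linear Regression: Intercept” functions in the Calculated Metric Builder. Going back to our formula from above, we now have to supply a and b ourselves. Without actual values, the scaffold metric looks like this: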

In the first X slot, we put our “Incrementor with Data” metric. In the first Y slot, we put the metric we want to predict, with the Exclude Today segment applied. The next slot is the X that the slope gets multiplied by. There we put the “normal” Incrementor from before, because it keeps counting up into the future. For the intercept, we do the same as for the slope, using the “Incrementor with Data” and the target metric without today. In the end, it should look like this:
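Putting the pieces together, the whole metric can be sketched in Python. It fits the slope and intercept on the “Incrementor with Data”, then evaluates aX + b with the plain Incrementor so the line continues into the future. Two assumptions are baked in, and both are labeled in the code: `include-zeros: false` is treated as skipping rows where X or Y is zero, and `today_index` is a hypothetical stand-in for the segment:

```python
from itertools import accumulate
from statistics import fmean


def predictive_linear_regression(values, today_index):
    """Hypothetical re-creation of the final metric: fit on past data, evaluate on all rows.

    values: metric per row (0 or None = no data); today_index: row dropped by
    the "Exclude Today" segment (a stand-in, not an Adobe parameter).
    """
    # Y with "Exclude Today" applied; empty future rows become 0.
    past = [0 if (i == today_index or v is None) else v for i, v in enumerate(values)]
    # X for fitting: the "Incrementor with Data" (counts only rows with data).
    running = accumulate(1 if v else 0 for v in past)
    x_data = [n if v else 0 for n, v in zip(running, past)]
    # Assumption: include-zeros = false skips rows where X or Y is zero.
    pairs = [(x, y) for x, y in zip(x_data, past) if x and y]
    if len(pairs) < 2:
        raise ValueError("need at least two days with data")
    mean_x = fmean([x for x, _ in pairs])
    mean_y = fmean([y for _, y in pairs])
    a = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sum(
        (x - mean_x) ** 2 for x, _ in pairs
    )
    b = mean_y - a * mean_x
    # X for evaluating: the plain Incrementor (1, 2, 3, ...) keeps counting into the future.
    return [a * x + b for x in accumulate([1] * len(values))]


# Hypothetical data: four complete days, a partial "today", two future days.
print(predictive_linear_regression([100, 110, 120, 130, 55, None, None], today_index=4))
# [100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0]
```

The last three rows continue the trend even though none of them contributed data to the fit. This is the same effect the two Incrementors produce in the calculated metric.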

Wow, that’s quite a metric! Now let’s look at the result in a graph and compare it with the previous regression metric:

It works! Our regression line now adjusts to the date range nicely and ignores both today and the future in the calculation.

With this metric, we could quite easily modify the calculation even further to model our data in a totally different way. For example, we could predict this year’s sales with the regression slope from only last year! Or we could model our mobile traffic by only taking the iOS growth into account. By modifying only certain components of our metric, there is basically no limit to what we can do.
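As an illustration of the first idea, here is a hypothetical Python variant that borrows the slope from last year. It anchors the line on this year’s data by passing it through this year’s mean. That is just one possible way to combine the two periods, chosen here as an assumption:

```python
from statistics import fmean


def slope(xs, ys):
    """Least-squares slope a of Y = aX + b."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )


def borrowed_slope_forecast(last_year, this_year, horizon):
    """Hypothetical variant: trend (slope) from last year, level from this year."""
    a = slope(list(range(1, len(last_year) + 1)), last_year)
    xs = list(range(1, len(this_year) + 1))
    # Anchor the line on this year's data: it passes through (mean X, mean Y).
    b = fmean(this_year) - a * fmean(xs)
    # Evaluate for this year's rows plus `horizon` future rows.
    return [a * x + b for x in range(1, len(this_year) + horizon + 1)]


print(borrowed_slope_forecast([10, 20, 30, 40], [200, 205], horizon=2))
# [197.5, 207.5, 217.5, 227.5]
```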

I also didn’t touch on the other regression models at all, like Quadratic or Log Regression. If your data fits it, you could also use the currently very popular Exponential Curve for your prediction. If you feel adventurous, you could also combine multiple models or incorporate other math functions. Just for fun, I added a simple sine function to our graph to make it look more groovy (as the statisticians say):

A more useful application would be to combine it with seasonal data, like Stefano demonstrated! Since we are able to look into the future with our two Incrementor functions, there is little we can’t do if we are creative (and willing to tackle the Calculated Metric Builder, which can be a challenge on its own). Changing the base metric in our example is quick, and could even be automated through the API. Speaking of the API…

If you are comfortable using the Analytics API to create calculated metrics, here is the definition from the 2.0 API endpoint. To use it, you will have to do one or two things first:

- Create the “Exclude Today” Segment and note the Segment ID. In the definition below, search and replace my ID (“s2030_5f0d50f80aa5c317200247b5”) with your own ID. It should appear four times. If you want to do this via the API as well, look below for the segment definition.
- (Optional) If you want to forecast something other than Unique Visitors, you can replace that metric as well: search and replace the six occurrences of “metrics/visitors” with your metric of choice.
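If you do the search and replace with a script, it helps to check the counts mentioned above before replacing anything. Here is a minimal Python sketch that treats the metric definition as plain text; the exported JSON has an empty owner ID, so it would not parse as-is. The function and constant names are hypothetical:

```python
from typing import Optional

# The segment ID used in the article's metric definition.
AUTHOR_SEGMENT_ID = "s2030_5f0d50f80aa5c317200247b5"
DEFAULT_METRIC = '"metrics/visitors"'


def customize_definition(
    definition: str, segment_id: str, metric_id: Optional[str] = None
) -> str:
    """Swap the segment ID (and optionally the metric) in the metric definition text."""
    # The segment ID should appear four times in the metric definition.
    if definition.count(AUTHOR_SEGMENT_ID) != 4:
        raise ValueError("expected the Exclude Today segment ID exactly four times")
    result = definition.replace(AUTHOR_SEGMENT_ID, segment_id)
    if metric_id is not None:
        # "metrics/visitors" should appear six times.
        if result.count(DEFAULT_METRIC) != 6:
            raise ValueError('expected "metrics/visitors" exactly six times')
        result = result.replace(DEFAULT_METRIC, f'"{metric_id}"')
    return result
```

This only swaps the metric ID. The `"description": "Unique Visitors"` labels stay unchanged, which presumably only affects how the components are labeled in the builder.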

I highly recommend looking at the Excel file from this post by Urs to help you create that segment.

This is the description and definition for the segment:

```
{
  "name": "Exclude Today",
  "description": "",
  "definition": {
    "container": {
      "func": "container",
      "pred": {
        "func": "without",
        "pred": {
          "val": {
            "func": "attr",
            "name": "variables/hitdatetime"
          },
          "func": "datetime-within",
          "description": "Today",
          "interval-value": {
            "func": "datetime-interval-ref",
            "id": "today"
          }
        }
      },
      "context": "hits"
    },
    "func": "segment",
    "version": [
      1,
      0,
      0
    ]
  },
  "id": "s2030_5f0d50f80aa5c317200247b5",
  "owner": {
    "id":
  },
  "migratedIds": [],
  "isPostShardId": true,
  "rsid": ""
}
```

The description and definition of the metric are this monster:

```
{
  "polarity": "positive",
  "precision": 0,
  "type": "decimal",
  "definition": {
    "formula": {
      "func": "add",
      "col1": {
        "func": "visualization-group",
        "col": {
          "func": "multiply",
          "col1": {
            "func": "ls-slope-linear",
            "description": "Linear regression: Slope",
            "x": {
              "func": "calc-metric",
              "formula": {
                "func": "if",
                "description": "If",
                "cond": {
                  "func": "metric",
                  "name": "metrics/visitors",
                  "description": "Unique Visitors"
                },
                "then": {
                  "func": "cumul",
                  "description": "Cumulative",
                  "n": 0,
                  "col": {
                    "func": "if",
                    "description": "If",
                    "cond": {
                      "func": "metric",
                      "name": "metrics/visitors",
                      "description": "Unique Visitors"
                    },
                    "then": 1,
                    "else": 0
                  }
                },
                "else": 0
              },
              "version": [
                1,
                0,
                0
              ],
              "filters": [
                {
                  "func": "segment-ref",
                  "description": "Exclude Today",
                  "id": "s2030_5f0d50f80aa5c317200247b5"
                }
              ]
            },
            "y": {
              "func": "calc-metric",
              "formula": {
                "func": "metric",
                "name": "metrics/visitors",
                "description": "Unique Visitors"
              },
              "version": [
                1,
                0,
                0
              ],
              "filters": [
                {
                  "func": "segment-ref",
                  "description": "Exclude Today",
                  "id": "s2030_5f0d50f80aa5c317200247b5"
                }
              ]
            },
            "include-zeros": false
          },
          "col2": {
            "func": "visualization-group",
            "col": {
              "func": "cumul",
              "description": "Cumulative",
              "n": 0,
              "col": 1
            }
          }
        }
      },
      "col2": {
        "func": "ls-intercept-linear",
        "description": "Linear regression: Intercept",
        "x": {
          "func": "calc-metric",
          "formula": {
            "func": "if",
            "description": "If",
            "cond": {
              "func": "metric",
              "name": "metrics/visitors",
              "description": "Unique Visitors"
            },
            "then": {
              "func": "cumul",
              "description": "Cumulative",
              "n": 0,
              "col": {
                "func": "if",
                "description": "If",
                "cond": {
                  "func": "metric",
                  "name": "metrics/visitors",
                  "description": "Unique Visitors"
                },
                "then": 1,
                "else": 0
              }
            },
            "else": 0
          },
          "version": [
            1,
            0,
            0
          ],
          "filters": [
            {
              "func": "segment-ref",
              "description": "Exclude Today",
              "id": "s2030_5f0d50f80aa5c317200247b5"
            }
          ]
        },
        "y": {
          "func": "calc-metric",
          "formula": {
            "func": "metric",
            "name": "metrics/visitors",
            "description": "Unique Visitors"
          },
          "version": [
            1,
            0,
            0
          ],
          "filters": [
            {
              "func": "segment-ref",
              "description": "Exclude Today",
              "id": "s2030_5f0d50f80aa5c317200247b5"
            }
          ]
        },
        "include-zeros": false
      }
    },
    "func": "calc-metric",
    "version": [
      1,
      0,
      0
    ]
  },
  "id": "",
  "name": "Predictive Linear Regression",
  "description": "",
  "rsid": "",
  "owner": {
    "id":
  }
}
```

German Analyst and Data Scientist working in and writing about (Web) Analytics and Online Marketing Tech.