Web Analytics with Adobe’s Customer Journey Analytics, Part 6: Advanced Data Processing in Query Service
This post is the sixth post of the eight-part-series Web Analytics with Adobe’s Customer Journey Analytics, showing how web sites can be analyzed better using Adobe’s next evolution of Adobe Analytics. In the previous post, we took a look at processing some basic data we need for our web analytics use case utilizing Query Service in Experience Platform. In this post, we are creating some advanced fields to our data in Query Service.
I think it’s fair to say that even with just the information from the previous part, we could have a very useful web analytics tool already. But if you know me, you know that I like to take things to the next level wherever I can, especially if it involves writing code. And is SQL not some sort of code too?
Entry and exit page were a nice start last time, but we have some fields still blank in our XDM schema. Today we are going to fill those by creating next- and previous page dimension, page view counters, and even more awesome fields. Whenever I still use Adobe Analytics today, I’m always very jealous of how we can just create those in Experience Platform with Query Service.
Then, after we have created all the data points we want, we still have to somehow get it out of Query Service and into a dedicated Experience Platform dataset to finally ingest it into Customer Journey Analytics. As that part is also somewhat tricky to get right, we are also going to take a good look at that.
Adding more, awesome fields
For a start, we are going to add the next- and previous page columns to our data. But there is one little things we need to keep in mind: Since we are adding the page name to every event (thanks to our merged data layer), we should only consider the page name if an actual page view happened. We can easily get that information by looking at the page view counter. Whenever it is larger than zero, we want to have the page name in a temporary column in the innermost query:
IF((web.webpagedetails.pageviews.value > 0), web.webpagedetails.name, '') conditional_page_name
Then, we can use that field in the next outer query to pull the next and previous page using the next and previous Adobe Defined Functions like this:
next(conditional_page_name, 1, true) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).value `next_page`,
previous(conditional_page_name, 1, true) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).value `previous_page`
Nice! Now, let’s create our page view counter column. With that, we want to have a column showing us which page view in a session a certain event happened on. Think of this like a Counter Evar in Adobe Analytics. To achieve this, we just have to count the page views that previously happened in a session with another window function:
sum(web.webpagedetails.pageviews.value) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) `page_view_counter`
By using the BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW condition in the window function, we only consider events before the current one, giving us a rolling sum or cumulative value. Neat!
We have two more of those fields left in our schema: The inverse hit depth and the inverse page view counter. They fundamentally work the same as the hit depth or page view counter, but count down from the last respective event of a session. In the outer query, we can just subtract the current hit depth and page view counter of every hit from the maximum hit depth and page views like this:
(session_depth_max - (session.depth - 1)) "inverse_hit_depth",
(session_page_views - (page_view_counter -1)) "inverse_page_view_counter"
The next field requires us to again edit our query on multiple levels. I’d like to have a time spent on page field, that only gives us the time until the next page view whenever a page view occurs. To do that, we first need to conditionally pull the current timestamp into a new column in the innermost query:
IF(web.webpagedetails.pageviews.value > 0, timestamp, '') conditional_page_timestamp
Then, in the next outer query, we pull in the next timestamp to the current row it has a page view on it:
IF(conditional_page_name != '', Next(conditional_page_timestamp, 1, true) OVER(PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).value, '') `next_page_timestamp`
Lastly, in the outermost query, we just have to subtract the next timestamp from the current one to get the time spent on a given page:
IF(conditional_page_name != '', to_unix_timestamp(next_page_timestamp) - to_unix_timestamp(timestamp), '') `time_on_page`
Super cool! This nicely fits my use case to analyse the time spent on a page. It does however not respect repeating page names (like reloads) and treats those as new page views. That could be changed easily, but I want it exactly how it is now.
As the last fields for this post, we are going to create some awesome path reports (something I always wanted to have in Adobe Analytics) based on my previous post. To do that, we first have to create an inner field that either gives us a page name (on page views) or the name of an event (from the eventType schema field). This can be done like this:
IF(web.webpagedetails.pageviews.value > 0, concat('Page View: ',web.webpagedetails.name), eventType) `pathing_event`
What we do with that is quite simple: In every session on every event, we want to have four new fields, containing the full path of events, the previous events up to the current row, the following events after the current row, as well as a field containing the relative path (where we see the current event in relation to the full session). This is quite simple using the concat_ws function:
concat_ws(' -> ',collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) "session_full_path",
concat_ws(' -> ',collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)) "session_preceding_path",
concat_ws(' -> ',collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) "session_following_path",
concat_ws(' -> ',collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),'Current Event',collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) "session_relative_path"
Did I mention already that I love the amount of freedom Query Service gives us to create the ultimate dataset in Experience Platform? If not: I love this! If you are curious, here is the complete query for all those amazing fields:
SELECT
*
, (to_unix_timestamp(session_timestamp_max) - to_unix_timestamp(session_timestamp_min)) `session_duration`
, (session_depth_max - (session.depth - 1)) `inverse_hit_depth`
, (session_page_views - (page_view_counter - 1)) `inverse_page_view_counter`
, IF((conditional_page_name <> ''), (to_unix_timestamp(next_page_timestamp) - to_unix_timestamp(timestamp)), '') `time_on_page`
FROM
(
SELECT
*
, attribution_first_touch(timestamp, '', conditional_page_name) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value `session_entry_page`
, attribution_last_touch(timestamp, '', conditional_page_name) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value `session_exit_page`
, min(timestamp) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) `session_timestamp_min`
, max(timestamp) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) `session_timestamp_max`
, max(session.depth) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) `session_depth_max`
, sum(web.webpagedetails.pageviews.value) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) `session_page_views`
, sum(web.webpagedetails.pageviews.value) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) `page_view_counter`
, approx_count_distinct(web.webpagedetails.name) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) `session_distinct_pages`
, next(conditional_page_name, 1, true) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).value `next_page`
, previous(conditional_page_name, 1, true) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).value `previous_page`
, IF((conditional_page_name <> ''), next(conditional_page_timestamp, 1, true) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).value, '') `next_page_timestamp`
, concat_ws(' -> ', collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) `session_full_path`
, concat_ws(' -> ', collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)) `session_preceding_path`
, concat_ws(' -> ', collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) `session_following_path`
, concat_ws(' -> ', collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 'Current Event', collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) `session_relative_path`
FROM
(
SELECT
*
, sess_timeout(timestamp, (60 * 30)) OVER (PARTITION BY identityMap.ecid ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) session
, IF((web.webpagedetails.pageviews.value > 0), web.webpagedetails.name, '') `conditional_page_name`
, IF((web.webpagedetails.pageviews.value > 0), timestamp, '') `conditional_page_timestamp`
, IF((web.webpagedetails.pageviews.value > 0), concat('Page View: ', web.webpagedetails.name), eventType) `pathing_event`
FROM
frederiks_website_dataset
ORDER BY timestamp ASC
)
)
Now we need to write all that data back into a new Experience Platform dataset, following the Experience Data Model schema.
Writing data back to Experience Platform
There are a few things we need to consider when we want our data from Query Service to be written back into an Experience Platform dataset. First, we need to write it to a dataset with an existing XDM schema, so that we can use the person and timestamp type fields for CJA. Second, we need to follow a specific syntax when mapping the data from our query into columns in the dataset.
If you follow my example, you should be able to use a query like this pretty much 1:1, replacing the company id, dataset names, and schema name with your own:
CREATE TABLE frederiks_enriched_website_dataset WITH (schema='Frederiks Website Schema') AS (
SELECT
STRUCT(STRUCT(STRUCT(session.depth "hit_depth", inverse_hit_depth, session.num "session_num", session_duration "time_spent") as meta,STRUCT(session_distinct_pages "distinct_pages", session_entry_page "entry", session_exit_page "exit", session_following_path "following_path", session_full_path "full_path", session_page_views "page_views", session_preceding_path "preceding_path", session_relative_path "relative_path") as page) as session) as _yourcompany,
Struct(Struct(Struct(web.webPageDetails._yourcompany.content, inverse_page_view_counter, next_page, page_view_counter, previous_page, time_on_page "time_spent") "_yourcompany", web.webPageDetails.pageViews "pageViews", web.webPageDetails.URL "URL", web.webPageDetails.name) "webPageDetails", web.webReferrer "webReferrer") "web",
_id "_id",eventType "eventType",identityMap "identityMap",timestamp "timestamp", device, environment, implementationDetails, placeContext
from (
SELECT
*
, (to_unix_timestamp(session_timestamp_max) - to_unix_timestamp(session_timestamp_min)) `session_duration`
, (session_depth_max - (session.depth - 1)) `inverse_hit_depth`
, (session_page_views - (page_view_counter - 1)) `inverse_page_view_counter`
, IF((conditional_page_name <> ''), (to_unix_timestamp(next_page_timestamp) - to_unix_timestamp(timestamp)), NULL) `time_on_page`
FROM
(
SELECT
*
, attribution_first_touch(timestamp, '', conditional_page_name) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value `session_entry_page`
, attribution_last_touch(timestamp, '', conditional_page_name) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value `session_exit_page`
, min(timestamp) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) `session_timestamp_min`
, max(timestamp) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) `session_timestamp_max`
, max(session.depth) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) `session_depth_max`
, sum(web.webpagedetails.pageviews.value) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) `session_page_views`
, sum(web.webpagedetails.pageviews.value) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) `page_view_counter`
, approx_count_distinct(web.webpagedetails.name) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) `session_distinct_pages`
, next(conditional_page_name, 1, true) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).value `next_page`
, previous(conditional_page_name, 1, true) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).value `previous_page`
, IF((conditional_page_name <> ''), next(conditional_page_timestamp, 1, true) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).value, '') `next_page_timestamp`
, concat_ws(' -> ', collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) `session_full_path`
, concat_ws(' -> ', collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)) `session_preceding_path`
, concat_ws(' -> ', collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) `session_following_path`
, concat_ws(' -> ', collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 'Current Event', collect_list(pathing_event) OVER (PARTITION BY identityMap.ecid, session.num ORDER BY timestamp ASC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) `session_relative_path`
FROM
(
SELECT
*
, sess_timeout(timestamp, (60 * 30)) OVER (PARTITION BY identityMap.ecid ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) session
, IF((web.webpagedetails.pageviews.value > 0), web.webpagedetails.name, '') `conditional_page_name`
, IF((web.webpagedetails.pageviews.value > 0), timestamp, NULL) `conditional_page_timestamp`
, IF((web.webpagedetails.pageviews.value > 0), concat('Page View: ', web.webpagedetails.name), eventType) `pathing_event`
FROM
frederiks_website_dataset
ORDER BY timestamp ASC
)
)
)
)
In production, we would schedule a query like this to run on an interval. Doing that, we could pull in new rows of data automatically.
Conclusion
I really like how we went from a basic web analytics dataset to something that is actually superior to what Adobe Analytics offers today. Sure, some of the out-of-the-box features are not there yet, but I hope I was able to demonstrate how we can basically build anything we could dream of using a bit of SQL. Luckily, Experience Platform allows us to experiment and play around with those new features by creating new datasets and pulling them into Customer Journey Analytics. Pretty cool!
In the next post, we are going to connect our new and enriched dataset to Customer Journey Analytics and create our first Data View. See you there!

German Analyst and Data Scientist working in and writing about (Web) Analytics and Online Marketing Tech.