Even more Query Service tips to level up your Customer Journey Analytics game
It has been more than two weeks since my last post. Where have I been? Easy answer: fully and blissfully engulfed in the blessing that is Adobe’s Customer Journey Analytics. I’ve imported my Adobe Analytics data, done some custom user stitching, and created some fun things that I didn’t even know I’d always wanted from Adobe Analytics. This experience (pun intended) has completely shifted the way I look at my own data.
Following up on my last post, I would like to share some new and cool things I discovered using some more Adobe Defined Functions, window functions, and standard SQL. I won’t give a long introduction to Query Service again, but feel free to revisit the previous post for some information on how Query Service interacts with Adobe Experience Platform datasets and how to write data back to them.
Getting creative with Sessionization
Recreating Adobe Analytics sessions in Query Service is easy thanks to the SESS_TIMEOUT function. As a quick reminder: it’s a window function that returns four values for each row of data, giving us something equivalent to the “Hit Depth” and “Time Spent” values in Adobe Analytics. This is the syntax:
SESS_TIMEOUT(timestamp, timeout_in_seconds) OVER ([partition] [order] [frame])
To get the standard Visit from Analytics back, we would use it like this:
sess_timeout(timestamp, (60 * 30)) OVER (PARTITION BY user_id ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) session
Which gives us an output like the rightmost column (in JSON format for easier readability):
Now we have our Analytics Visits back! timestamp_diff gives us both the time since the last event (seconds since last row of a visit) and, for the first hit, the time since the last visit (see timestamp_diff of second row). Nice!
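Put into context, the fragment above could sit in a complete query like the following sketch. The dataset name `analytics_events` and the columns `user_id` and `timestamp` are assumptions; substitute your own schema:

```sql
-- Sketch of a full sessionization query. Table and column names are
-- assumptions; adjust them to your own dataset.
SELECT
  user_id,
  timestamp,
  SESS_TIMEOUT(timestamp, 60 * 30)
    OVER (PARTITION BY user_id
          ORDER BY timestamp ASC
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS session
FROM analytics_events
ORDER BY user_id, timestamp;
```

The returned `session` struct exposes fields like `session.num`, `session.depth`, and `session.timestamp_diff`, which the later examples build on.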
SQL veterans will recognize this Adobe Defined Function as a variation on normal window functions. So let’s have some fun! For example, what would happen if we reversed the order of rows in the window frame? Let’s try this (note the “DESC” instead of “ASC”) and see what we get:
sess_timeout(timestamp, (60 * 30)) OVER (PARTITION BY user_id ORDER BY timestamp DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) reverse_session
Gives us:
Interesting! First thing to recognize: the num from reverse_session is useless now, since the function is rather confused by our reversed timestamps. But that doesn’t matter; we already have that information in the normal session output. The rest is in turn very helpful: depth now indicates how close to the end of a session an action occurred. Now we can easily answer questions like “what was the last thing a user did in their session?” or “how many events happened after the purchase event in a session?”, which are all quite helpful things to know.
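To make the first of those questions concrete: with the reversed ordering, a depth of 1 marks the final row of a session, so filtering on it yields each session’s last action. A hedged sketch, assuming both session columns were materialized into a hypothetical intermediate table named `sessionized_events`:

```sql
-- Sketch: the final event of every session is the row whose
-- reverse-ordered depth equals 1. Names are assumptions.
SELECT user_id, session.num AS session_num, event
FROM sessionized_events
WHERE reverse_session.depth = 1;
```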
Now let’s go even further!
Going crazy with Pathing
I already covered next- and previous-page reports in the last post. For this post I have another goody for you: we are going to recreate pathing reports from the ancient Reports & Analytics within Adobe Analytics! And not only that: we are going to use Adobe Analytics log files as our dataset to keep things interesting. The base dataset looks like this:
First, I would like to know how my users move through Adobe Analytics. This can be achieved by using two Spark SQL functions:
concat_ws(' -> ',collect_list(event) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) "full_path_session"
What are we doing here? First, we use collect_list() to gather a list of all the events for a session. This is, again, a window function, partitioned by userid and session.num. After we have collected the actions, we use concat_ws() to glue them all together, using ' -> ' as the separator. The output looks like this:
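For reference, embedded in a full statement the expression could read like this sketch (the table name `logfile_events` and the pre-computed `session` struct are assumptions):

```sql
-- Sketch: full-session path per row. Table and column names
-- are assumptions; the session struct comes from SESS_TIMEOUT.
SELECT
  userid,
  timestamp,
  event,
  CONCAT_WS(' -> ',
    COLLECT_LIST(event)
      OVER (PARTITION BY userid, session.num
            ORDER BY timestamp ASC
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
  ) AS full_path_session
FROM logfile_events;
```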
Awesome! We get the complete path of actions for every hit in a session. Another great detail: if we left out the session.num partition criterion, we would get the same path for the whole user journey instead of just the current session! Let’s modify the command a bit to see what else we can do:
, concat_ws(' -> ',collect_list(event) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)) "preceding_path_session"
, concat_ws(' -> ',collect_list(event) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) "following_path_session"
I didn’t need to modify much here. All that changed is the ROWS BETWEEN clause in the OVER statement. With this new frame limit, we only get the path before or after the current row, i.e. only past or future values. Now our result looks like this:
We see how, with each new event, our path grows by one element. The following path for the same session looks like this:
Which is the exact opposite. We could use this in a report to see what happened before or after a certain event. For example, we could break down our pages report by the following or preceding path to see what our users were up to before the current page. There is an even clearer way to do this (with a slightly more complicated syntax):
concat_ws(' -> ',collect_list(event) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),'Current Event',collect_list(event) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) "relative_path_session"
Here we now combine both the preceding and following path, inserting a “Current Event” item between them. This gives us paths like this:
Here we clearly see how the “Current Event” moves through the path with each new event! Pretty neat. What else can we do?
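One practical use of these path columns: keep only the rows for a specific event and count their following paths to see what users do next. A hedged sketch, assuming the path columns were materialized into a hypothetical table `pathed_events` and an event literally named 'purchase':

```sql
-- Sketch: what do users do right after a purchase?
-- Table name and event value are assumptions.
SELECT following_path_session, COUNT(*) AS occurrences
FROM pathed_events
WHERE event = 'purchase'
GROUP BY following_path_session
ORDER BY occurrences DESC;
```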
Counting and calculating
There is an awesome Adobe Defined Function to calculate the time until or since a certain event. While this is very helpful to analyze things like “time to first conversion” or even “time to next conversion”, we will focus on something else here. With the NEXT function, we can get values from future rows of data. Based on our already sessionized dataset, it’s easy to get the Time Spent value of a row like this:
NEXT(session.timestamp_diff,1,true) OVER (PARTITION BY userid, session.num ORDER BY timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).value "time_spent"
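In a complete statement, this fragment could read like the following sketch (the intermediate table `sessionized_events` is an assumption):

```sql
-- Sketch: time spent on each row, taken from the next row's
-- timestamp_diff within the same session. Names are assumptions.
SELECT
  userid,
  timestamp,
  session.timestamp_diff,
  NEXT(session.timestamp_diff, 1, true)
    OVER (PARTITION BY userid, session.num
          ORDER BY timestamp ASC
          ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).value AS time_spent
FROM sessionized_events;
```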
We can confirm this is working by looking at the output:

Perfect! Our function pulls the time spent by looking at the next row of the session. But looking at the depth parameter of our session, I wonder if we could do the same on a user level. Thanks to SQL, we can!
ROW_NUMBER() OVER (PARTITION BY userid ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) "user_depth"
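As a full statement, this could look like the following sketch (table name assumed as before):

```sql
-- Sketch: number every event per user across all sessions.
-- The table name is an assumption.
SELECT
  userid,
  timestamp,
  event,
  ROW_NUMBER()
    OVER (PARTITION BY userid
          ORDER BY timestamp ASC
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS user_depth
FROM sessionized_events;
```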
This is a new function for us, returning the number of an event relative to the user, thanks to our window function’s settings. We get an output like the rightmost column:
Neat! Now it’s easy for us to analyze the second thing a user ever did, no matter what session it was in! That gives me another idea: Can we know how many actions a session or user will take in total? Of course we can:
attribution_last_touch(timestamp, '', session.depth) OVER (PARTITION BY userid,session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value max_session_depth
, attribution_last_touch(timestamp, '', user_depth) OVER (PARTITION BY userid ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value max_user_depth
Using the attribution functions we already used in the last post, we can easily get the last value from a session or user, which looks like these two new columns:
That was surprisingly simple. We can now analyze how far sessions or even users went into our properties. Now let’s have some fun and throw in a bit of math:
int(session.depth / max_session_depth * 100) session_depth_percentage
, int(user_depth / max_user_depth * 100) user_depth_percentage
You read that right: I just calculated how far into a session or user journey, as a percentage, a certain event occurred. The result looks like these two new columns:
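These percentage columns make position-based filters trivial. A hedged sketch, again assuming a hypothetical table holding the derived columns:

```sql
-- Sketch: which events occur in the second half of a session?
-- The table name is an assumption.
SELECT event, COUNT(*) AS occurrences
FROM sessionized_events
WHERE session_depth_percentage >= 50
GROUP BY event
ORDER BY occurrences DESC;
```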
What does this tell us? Easy: we could now build a segment for every action in the second half of a session or user journey! We could also see whether a certain action occurs more often at the beginning or end of a session, regardless of how deep the session was. And we can do the same for the actual time with just a few more columns:
attribution_first_touch(timestamp, '', timestamp) OVER (PARTITION BY userid,session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value min_session_timestamp
, attribution_last_touch(timestamp, '', timestamp) OVER (PARTITION BY userid,session.num ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value max_session_timestamp
, attribution_first_touch(timestamp, '', timestamp) OVER (PARTITION BY userid ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value min_user_timestamp
, attribution_last_touch(timestamp, '', timestamp) OVER (PARTITION BY userid ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).value max_user_timestamp
First, we have to get the first and last timestamp of both the session and the user journey. Based on those, we can do some math again:
int(((to_unix_timestamp(timestamp) - to_unix_timestamp(min_session_timestamp))/(to_unix_timestamp(max_session_timestamp) - to_unix_timestamp(min_session_timestamp)))*100) session_time_percentage
, int(((to_unix_timestamp(timestamp) - to_unix_timestamp(min_user_timestamp))/(to_unix_timestamp(max_user_timestamp) - to_unix_timestamp(min_user_timestamp)))*100) user_time_percentage
This returns an output like:
Those two columns on the right now give us a percentage of where an event occurred relative to the session and user journey based on time. This is slightly different from the calculation based on hit depth, since hits can be unevenly distributed along the timeline of a session or journey. With these time-based calculations, we could even get the time prior to an event on both a session and user level, as well as the time after an event:
(to_unix_timestamp(timestamp) - to_unix_timestamp(min_session_timestamp)) session_time_prior_to_event
, (to_unix_timestamp(timestamp) - to_unix_timestamp(min_user_timestamp)) user_time_prior_to_event
, (to_unix_timestamp(max_session_timestamp) - to_unix_timestamp(timestamp)) session_time_after_event
, (to_unix_timestamp(max_user_timestamp) - to_unix_timestamp(timestamp)) user_time_after_event
The first two columns give us an indication of where in a visit or user journey an event occurred (similar to Time prior to Event in Adobe Analytics), while the last two columns show how close to the end of a session or journey something happened. That way we can answer questions like “what happened in the first five minutes of a user journey?” or “what does the last minute of a converting visit look like?” with ease and use them in segmentation.
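The first of those questions could be answered with a sketch like this, assuming the derived time columns live in a hypothetical table `journey_events`:

```sql
-- Sketch: what happened in the first five minutes of a user journey?
-- Table name is an assumption; 5 minutes = 300 seconds.
SELECT event, COUNT(*) AS occurrences
FROM journey_events
WHERE user_time_prior_to_event <= 300
GROUP BY event
ORDER BY occurrences DESC;
```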
Wrap up
Will I ever return to normal Adobe Analytics after all this Customer Journey Analytics and Query Service work? Yes, of course. It’s still my main driver for day-to-day analysis. But with all the exciting information I can get out of my boring old data, I will gradually spend more and more time in CJA.
With the amazing team at Adobe adding more and more features to CJA itself, it will get even easier to do simple tasks that require Query Service today. I’m quite excited for those as well, since they would spare me from going back and forth between CJA and Query Service. If any of the functionality described above makes its way into the CJA frontend, I would be quite happy.
That’s it for now. Have a nice day!