Tag: Open Source

Processing Adobe Analytics Data Feeds with Apache NiFi for Adobe Experience Platform

In the series of posts currently being released on this blog, I’m showing how companies can move from Adobe Analytics to the brand-new Customer Journey Analytics to utilize the many advantages of the new tool. However, I feel like the current Adobe-provided solution for bringing data from the old world to the new one lacks some essential information. I did an extensive comparison in the most recent post of the series, but will repeat some of the reasons here. When we use the Adobe Analytics Data Connector to bring data from an Adobe Analytics Report Suite into Experience Platform, we are dealing with some limitations: The data is based on what Adobe calls mid-values, which sit between raw, unprocessed data and fully processed data in the processing chain. Because of this, we don’t have access to dimensions like persisted eVars, Visit Number, and other data points we […]

Announcing the open collection of Adobe Analytics best practices

Imagine a situation like this: You are facing a new challenge when using or implementing Adobe Analytics. What do you do? If you are like me, you first check the documentation to make sure you’ve understood the available features correctly. Then, you start researching blog posts and articles around your topic to see if and how anyone has solved this before. If you are still unsure, you might ask some people on Twitter, LinkedIn, or the Measure Chat. As a last resort, you might even reach out to Client Care and ask for help. It’s easy to see why this approach is not ideal. For one, there is no way to know whether the way you approach a task is still the best way or whether newer solutions exist. Depending on which pages you found while researching, you might end up with an outdated solution or contradictory approaches from different authors […]

Privacy-centered Analytics with Matomo and Adobe’s Customer Journey Analytics

Legal Disclaimer: Data privacy is a diverse and ever-changing topic. This makes it nearly impossible to give reliable recommendations to a broad audience. Please consult your company’s legal department on whether the ideas described here are feasible under your jurisdiction. If there has been one predominant topic in the web analytics space for the last couple of years, it surely is data privacy. GDPR is a thing in Europe, COPPA in the US, ITP on planet Apple, and cookie consent banners on every website. Collecting data safely as a global business has become more and more challenging, pushing businesses to be increasingly careful. Because of this landscape, a lot of businesses are looking for a “bullet-proof” way to analyze the behavior of their website users. While Google Analytics is a data privacy nightmare, tools like Matomo (formerly Piwik) try to justify their existence by claiming to be more privacy […]

Call for contributions! Introducing the Open Adobe Analytics Component Repository

Over the last few months I have created quite a lot of Calculated Metrics and Segments for this blog. While the feedback has been great, it became more and more difficult, for me and others, to keep track of all the different metrics and where exactly I used them. I’ve been using a private GitHub repository to keep track of everything I create, which I am now making available to the public. I will put all the metrics and segments I have already created on there as I migrate them from my private repo. The same will be true for future posts on my own blog. My hope is that this will help me stay on top of all those components and maybe help somebody else find them more quickly. But since I host it on GitHub, why not make this a collaborative effort? Share your work and earn kudos From […]

Building an Enterprise Grade OpenSource Web Analytics System – Part 7: Analytics Dashboard

This is the seventh part of a seven-part series explaining how to build an Enterprise Grade OpenSource Web Analytics System. In this post we are building an Analytics Dashboard in Kibana for our data in Elasticsearch. In the previous post we built the connection from Kafka to Elasticsearch and ClickHouse to store the data. If you are new to this series it might help to start with the first post. We have come a long way in this series. We built everything from the client implementation with Snowplow to the processing and enrichment pipelines with Kafka and Python and stored all the data in Elasticsearch. Now it is time to make that data accessible in an appealing way to analysts and business users. The obvious solution for Elasticsearch is Kibana, which is developed by the same company and is designed to work perfectly with Elasticsearch! Web Analytics Dashboard in Kibana In Kibana, […]

Building an Enterprise Grade OpenSource Web Analytics System – Part 6: Data Storage

This is the sixth part of a seven-part series explaining how to build an Enterprise Grade OpenSource Web Analytics System. In this post we are taking a brief look at what we can do with the data we collected and processed, using ClickHouse. In the previous post we built a persistent visitor profile for our visitors with Python and Redis. If you are new to this series it might help to start with the first post. Over the course of this series we defined multiple topics within Kafka, so we now have data at different levels of processing available. If we want to keep any of it, we should put it into persistent storage like a Data Lake with Hadoop or a database. For this project, we are using Elasticsearch and dipping our toes into a database called ClickHouse for fun! Feeding Data into Elasticsearch From the previous part, we have a nice Kafka […]
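The full post walks through the actual setup; as a minimal sketch of the idea in Python, consuming processed events from Kafka and indexing them into Elasticsearch (the topic, host, and index names below are placeholders, not taken from the post):

```python
# Sketch only: topic, host, and index names are hypothetical.
import json
from kafka import KafkaConsumer          # pip install kafka-python
from elasticsearch import Elasticsearch  # pip install elasticsearch

consumer = KafkaConsumer(
    "processed-tracking",                 # hypothetical topic with processed events
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
es = Elasticsearch("http://localhost:9200")

for message in consumer:
    event = message.value
    # Daily indices make it easy to delete or archive old data later
    index = f"tracking-events-{event.get('date', 'unknown')}"
    es.index(index=index, document=event)
```

In practice you would batch the writes with the Elasticsearch bulk helper instead of indexing one document per request.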

Building an Enterprise Grade OpenSource Web Analytics System – Part 5: Visitor Profile

This is the fifth part of a seven-part series explaining how to build an Enterprise Grade OpenSource Web Analytics System. In this post we are going to build a visitor profile with Python and Redis to persist some of the data we track. In the last post we processed the raw data using Python and wrote it back to Kafka. If you are new to this series it might help to start with the first post. Now that we have a nice processed version of our events, we want to remember certain things about our users. To do this, we are going to create a Visitor Profile in Redis as a high-performance store. The process for persisting values will look like this: Building our Visitor Profile First, we set up a little helper script that will take our processed tracking events and flatten them. It looks […]
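The post contains the actual script; a rough sketch of what flattening an event and updating a Redis profile could look like is below (the event field names and Redis key layout are placeholders, not taken from the post):

```python
# Sketch only: event field names and the Redis key layout are hypothetical.
import redis  # pip install redis

def flatten(event: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten nested dictionaries into one level with dotted keys."""
    items = {}
    for key, value in event.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def update_profile(event: dict) -> None:
    flat = flatten(event)
    user_id = flat.get("user.id", "unknown")      # hypothetical field name
    key = f"profile:{user_id}"
    r.hset(key, mapping={
        "last_seen": str(flat.get("timestamp", "")),
        "last_page": str(flat.get("page.url", "")),
    })
    r.hincrby(key, "pageviews", 1)                # simple per-user counter
```

One hash per user keeps reads and writes cheap, which is the point of using Redis as the high-performance store here.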

Building an Enterprise Grade OpenSource Web Analytics System – Part 4: Data Processing

This is the fourth part of a seven-part series explaining how to build an Enterprise Grade OpenSource Web Analytics System. In this post we are building the processing layer to work with our raw log lines. In the last post we used Nginx and Filebeat to write our tracking events to Kafka. If you are new to this series it might help to start with the first post. At this point in the series, we have a lot of raw tracking events in our Kafka topic. We could already use this topic to store the raw log lines in our Hadoop cluster or a database. But doing some additional processing now will make our lives a lot easier later on. Since Python is the data science language of today, that is the language we will be using. The result will then be written to another Kafka topic for further processing […]
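As a minimal sketch of that processing step, assuming hypothetical topic names and a log format where the requested URL is the second whitespace-separated field of each line (neither detail is taken from the post):

```python
# Sketch only: topic names and the log line layout are assumptions.
import json
from urllib.parse import urlparse, parse_qs
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

consumer = KafkaConsumer("raw-tracking", bootstrap_servers="localhost:9092")
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    line = message.value.decode("utf-8")
    # Assume the requested URL (carrying the tracking parameters) is the second field
    parts = line.split(" ")
    request_url = parts[1] if len(parts) > 1 else parts[0]
    # Turn the query string parameters into a flat JSON event
    event = {k: v[0] for k, v in parse_qs(urlparse(request_url).query).items()}
    producer.send("processed-tracking", value=event)
```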

Building an Enterprise Grade OpenSource Web Analytics System – Part 3: Data Collection

This is the third part of a seven-part series explaining how to build an Enterprise Grade OpenSource Web Analytics System. In this post we are setting up the tracking backend with Nginx and Filebeat. In the last post we took care of the client side implementation of Snowplow Analytics. If you are new to this series it might help to start with the first post. Now that we have a lot of data being sent from our clients, we need to build a backend to take care of all the events we want to capture. Since we are sending our requests unencoded via GET, we can simply configure our web server to write all requests to a logfile and ship them off to the processing layer. Configuring Nginx with Filebeat In our last project we used a configuration just like the one we need. As web server, we used and will […]
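The post has the full configuration; as a rough sketch of the two pieces involved, with file paths, hostnames, and the topic name as placeholders rather than values from the post:

```nginx
# Nginx sketch (inside the http block): log every tracking request to its own file
log_format tracking '$remote_addr [$time_local] "$request" "$http_user_agent"';

server {
    listen 80;
    location /track {
        access_log /var/log/nginx/tracking.log tracking;
        return 204;  # answer the tracking request with "no content"
    }
}
```

Filebeat then tails that file and publishes each line to Kafka, roughly like this:

```yaml
# Filebeat sketch: ship the tracking log to a Kafka topic
filebeat.inputs:
  - type: log
    paths:
      - /var/log/nginx/tracking.log

output.kafka:
  hosts: ["localhost:9092"]
  topic: "raw-tracking"   # hypothetical topic name
```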

Building an Enterprise Grade OpenSource Web Analytics System – Part 2: Client Tracking

This is the second part of a seven-part series explaining how to build an Enterprise Grade OpenSource Web Analytics System. In this post we are setting up the Client Tracking using the JavaScript tracker from Snowplow Analytics. In the last post we took a look at the system architecture that we are going to build. If you are new to this series it might help to start with the first post. When building a mature Web Analytics system yourself, the first step is to build some functionality into your app or website that sends events to the backend analytics system. This is called client side tracking, since we rely on the application to send us events instead of looking at logfiles alone. For this series we are going to look at website tracking specifically, but the same principles apply to mobile apps or even server side tracking. Almost every mature […]
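The post covers the full setup; as a minimal sketch of what a Snowplow JavaScript tracker call looks like once the standard sp.js loader snippet is on the page (the collector hostname and appId below are placeholders, not taken from the post):

```javascript
// Sketch only: assumes the sp.js loader snippet has already defined window.snowplow.
// Collector hostname and appId are placeholders, not taken from the post.
window.snowplow('newTracker', 'sp', 'collector.example.com', {
  appId: 'my-website',
  platform: 'web'
});

// Send a page view event for the current page to the collector
window.snowplow('trackPageView');
```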

Building an Enterprise Grade OpenSource Web Analytics System – Part 1: Architecture

Some time ago I wrote a little series on how to amp up your log analytics activities. Ever since then I have wanted to start another project: building a fully fledged Analytics system out of OpenSource components, with client side tracking and unlimited scalability. That is what this series is about, since I had some time to kill during Easter in isolation 😊 This time, we will be using a tracker on the browser or mobile app of our users instead of logfiles alone, which is called client side tracking. That will give us a lot more information about our visitors and allow for some cool new use cases. It is also similar to how tools like Adobe Analytics or Google Analytics work. The data we collect then has to be processed and stored for analysis and future use. As a client side tracker, we will be using the Snowplow tracker. […]

Building your own Web Analytics from Log Files – Part 6: Conclusion

This is the sixth part of the six-part series “Building your own Web Analytics from Log Files”. In this series we built a rather sophisticated logging and tracking functionality for our website. We used OpenResty to identify and fingerprint our users via cookies, stored that information in log files which were shipped to Elasticsearch and visualized with Kibana. Web Analytics democratized By using those techniques, we are able to use what we already have (log file processing) to answer questions about our users. Under ideal conditions this doesn’t even lead to a bigger technical footprint. This way we can gain deep insights into our user behavior without external tools. Even as a startup or hobby developer you are now able to put the user first on your digital platforms. Next steps While this series is done for now, we have a starting point from which to build our platform further. With some frontend […]

Building your own Web Analytics from Log Files – Part 5: Building our first Dashboard

This is the fifth part of the six-part series “Building your own Web Analytics from Log Files”. At this point in the series we have our log files in Elasticsearch, with indices like “custom-filebeat-tracking-logs-7.4.0-2020.01.03”. The first thing to do is to set up a Kibana index pattern for them. Kibana Configuration In Kibana we go to Management -> Index Patterns -> Create index pattern. As the index pattern we use “custom-filebeat-tracking-logs-*”, which matches all the indices following our daily naming scheme. In the next step, we set the Time Filter field name to “@timestamp”. This is the timestamp that marks when Filebeat processed the document. This is fine for now; we click “Create index pattern” and are done with this part! Checking our Data Now, let’s head to the Discover section in Kibana and look at our index pattern. And there it is: our log entries show up just like we wanted: This […]

Building your own Web Analytics from Log Files – Part 4: Data Collection and Processing

This is the fourth part of the six-part series “Building your own Web Analytics from Log Files”. Legal Disclaimer: This post describes how to identify and track the users on your website using cookies, IP addresses and browser fingerprinting. The information and process described here may be subject to data privacy regulations under your legislation. It is your responsibility to comply with all regulations. Please educate yourself on whether things like GDPR apply to your use case (which is very likely), and act responsibly. In the last part we built a configuration for OpenResty to generate user and session IDs and store them in browser cookies. Now we need a way to actually log and collect those IDs together with the requests our web server handles. OpenResty Configuration To be able to log our custom variables we need to announce them to Nginx. This is done right in the server part of […]
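The exact configuration is in the post itself; a rough sketch of the idea looks like this, with variable, cookie, and file names as placeholders:

```nginx
# Sketch only: variable, cookie, and file names are hypothetical.
# Inside the http block: reference the custom variables in a log format ...
log_format tracking '$remote_addr [$time_local] "$request" uid=$uid sid=$sid';

server {
    listen 80;

    # ... announce them to Nginx with empty defaults in the server part ...
    set $uid "";
    set $sid "";

    access_log /var/log/nginx/tracking.log tracking;

    location / {
        # ... and fill them from the cookies generated in the previous part
        access_by_lua_block {
            ngx.var.uid = ngx.var.cookie_uid or ""
            ngx.var.sid = ngx.var.cookie_sid or ""
        }
    }
}
```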

Building your own Web Analytics from Log Files – Part 3: Setting up Nginx with OpenResty

This is the third part of the six-part series “Building your own Web Analytics from Log Files”. Legal Disclaimer: This post describes how to identify and track the users on your website using cookies and browser fingerprinting. The information and process described here may be subject to data privacy regulations under your legislation. It is your responsibility to comply with all regulations. Please educate yourself on whether things like GDPR apply to your use case (which is very likely), and act responsibly. Identifying Users and Sessions One of our goals for this project is to be able to tell how many people are using our site. This means we need a way to differentiate between the users on our site. One approach would be to look at their IP addresses. This is not very precise, since all devices sharing the same internet connection also share a public IP address. Especially for […]

How I contributed to Elasticsearch without writing any code

Open Source Software (OSS) is becoming more important every day. While in the early days most software was offered as a proprietary product, today even large products are available as OSS. On the one hand, this often includes the ability to use the software for free and change it if needed. On the other hand, those projects rely on contributions from individual and corporate volunteers to maintain the software. But the process for contributing can be intimidating, since those big projects seem too large and professional for anyone to make a meaningful contribution. Bricking my Elasticsearch cluster I am using OSS for almost all my projects. For personal projects, I simply don’t have the resources to adopt large-scale enterprise systems, and for professional projects it’s great to be able to save cost and help those projects grow. Like I’ve described before, I use Elasticsearch whenever I need to process logfiles or vastly […]