Building an Event Analytics Pipeline with Kafka, ksqlDB, and Druid

A presentation at Big Data Conference Europe in November 2023 in Vilnius, Lithuania by Hellmar Becker

Slide 1

Building an Event Analytics Pipeline with Kafka, ksqlDB, and Druid
Hellmar Becker, Senior Sales Engineer
©2022, Imply

Slide 2

About Me
Hellmar Becker
Sr. Sales Engineer at Imply
Lives near Munich
hellmar.becker@imply.io
https://www.linkedin.com/in/hellmarbecker/
https://blog.hellmar-becker.de/

Slide 3

Agenda
● The Case for Streaming Analytics
● How to Prepare Your Data: Streaming ETL
● How to Analyze Your Data: Streaming Analytics
● Apache Druid - A Streaming Analytics Database
● K2D - A Streaming Analytics Architecture
● Live Demo
● Q&A

Slide 4

The Case for Streaming Analytics
● Analytics: “the process of discovering, interpreting, and communicating significant patterns in data.”
● OLAP = Online Analytical Processing
● Classical pipeline: Source Data → Transactional Database (OLTP) → Batch ETL → Analytical Database (OLAP) → Client
But that’s not enough anymore!

Slide 5

The Case for Streaming Analytics (contd.)
● The Big Data hype gave us the Lambda Architecture
● Separate paths for batch and real-time processing
● One common serving layer
● Complex, hard to reconcile
Image source: https://www.ericsson.com/en/blog/2015/11/data-processing-architectures--lambda-and-kappa

Slide 6

The Case for Streaming Analytics (contd.)
● 2014, Jay Kreps: Kappa Architecture
● Avoids having separate code paths for batch and streaming
Image source: https://www.ericsson.com/en/blog/2015/11/data-processing-architectures--lambda-and-kappa
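The Kappa idea can be sketched in a few lines of plain Python. This is a minimal in-memory illustration under stated assumptions, not Kafka or ksqlDB code: a list stands in for the append-only, replayable log, and the names `EventLog`, `count_by_type` and `run_job` are hypothetical. The point it shows is that historical processing is simply a replay of the log from offset 0 through the same code that handles live events, so there is no second batch code path to keep in sync.

```python
from collections import Counter


class EventLog:
    """Hypothetical stand-in for an append-only, replayable log (e.g. a Kafka topic)."""

    def __init__(self):
        self._events = []

    def append(self, event: dict) -> int:
        """Append an event and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int):
        """Yield (offset, event) pairs from the given offset to the current end."""
        for i in range(offset, len(self._events)):
            yield i, self._events[i]


def count_by_type(state: Counter, event: dict) -> None:
    """The single processing logic, shared by replay and live processing."""
    state[event["type"]] += 1


def run_job(log: EventLog, state: Counter, start_offset: int) -> int:
    """Apply count_by_type to every event from start_offset on; return the next offset."""
    next_offset = start_offset
    for offset, event in log.read_from(start_offset):
        count_by_type(state, event)
        next_offset = offset + 1
    return next_offset


if __name__ == "__main__":
    log = EventLog()
    for event_type in ["click", "view", "click"]:
        log.append({"type": event_type})

    state = Counter()
    pos = run_job(log, state, 0)    # historical data: replay from offset 0
    log.append({"type": "view"})    # a new, "live" event arrives
    pos = run_job(log, state, pos)  # the same code path continues where it left off

    # Reprocessing (e.g. after changing count_by_type) = replaying into fresh state.
    rebuilt = Counter()
    run_job(log, rebuilt, 0)
    print(state, rebuilt, state == rebuilt)
```

In a real Kappa deployment the replayed job would typically write to a new output table, and consumers would switch over once it has caught up with the live end of the log.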

Slide 7

How to Prepare Your Data: Streaming ETL
ETL = Extract, Transform, Load. Let’s focus on the Transform part.
Simple Event Processing = one event at a time
● Filter
● Transform
● Cleanse
Complex Event Processing = relate events to each other
● Windowing
● Aggregations
● Joins
● Enrichment
ksqlDB is a tool by Confluent that does streaming ETL using streaming SQL.
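To make the complex event processing operations above concrete, here is a plain-Python sketch: it enriches each click event by joining it against a lookup table, then counts events per key in fixed, non-overlapping (tumbling) time windows. In ksqlDB the same ideas are expressed declaratively, as a stream-table join and a windowed aggregation (`WINDOW TUMBLING`); the Python version only illustrates the semantics. The field names (`ts`, `user_id`, `page`, `country`), the `USERS` table and the 60-second window are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical lookup table (in ksqlDB terms, a TABLE) used for enrichment.
USERS = {
    "u1": {"country": "DE"},
    "u2": {"country": "LT"},
}


def enrich(event: dict, users: dict) -> dict:
    """Stream-table join with left-join semantics: unknown users get country None."""
    extra = users.get(event["user_id"], {"country": None})
    return {**event, **extra}


def tumbling_window_counts(events, window_seconds: int, key: str) -> dict:
    """Count events per (window_start, key value) in fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for event in events:
        # Align the event timestamp (epoch seconds) to the start of its window.
        window_start = event["ts"] - event["ts"] % window_seconds
        counts[(window_start, event[key])] += 1
    return dict(counts)


if __name__ == "__main__":
    clicks = [
        {"ts": 0, "user_id": "u1", "page": "/home"},
        {"ts": 30, "user_id": "u2", "page": "/news"},
        {"ts": 59, "user_id": "u1", "page": "/news"},
        {"ts": 61, "user_id": "u3", "page": "/home"},
    ]
    enriched = [enrich(e, USERS) for e in clicks]
    print(tumbling_window_counts(enriched, 60, "country"))
```

Unlike this sketch, which works on a finite list, a streaming engine updates the window counts incrementally as events arrive and has to decide how long to wait for late events.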

Slide 8

How to Analyze Your Data: Streaming Analytics with Druid
For analytics applications that require:
1. Sub-second queries at any scale: interactive analytics on terabytes to petabytes of data
2. High concurrency at the lowest cost: hundreds to thousands of queries per second (QPS) via a highly efficient engine
3. Real-time and historical insights: true stream ingestion for Kafka and Kinesis
Plus, non-stop reliability with automated fault tolerance and continuous backup.
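Druid's native Kafka ingestion is configured by submitting a supervisor spec to the Druid API. The sketch below builds a minimal spec as a Python dictionary and prepares, but does not send, the HTTP request. The topic and datasource name `clickstream`, the broker address, the router URL on port 8888, and the column names are assumptions; check the Druid Kafka ingestion documentation for the options supported by your version.

```python
import json
import urllib.request

# Assumed endpoints; adjust for your environment.
DRUID_URL = "http://localhost:8888"  # Druid router (assumption)
KAFKA_BOOTSTRAP = "localhost:9092"   # Kafka broker (assumption)


def kafka_supervisor_spec(topic: str, datasource: str) -> dict:
    """Build a minimal Kafka supervisor spec for JSON events with an ISO 'ts' column."""
    return {
        "type": "kafka",
        "spec": {
            "ioConfig": {
                "type": "kafka",
                "consumerProperties": {"bootstrap.servers": KAFKA_BOOTSTRAP},
                "topic": topic,
                "inputFormat": {"type": "json"},
                "useEarliestOffset": True,  # start reading at the beginning of the topic
            },
            "tuningConfig": {"type": "kafka"},
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "ts", "format": "iso"},
                # Hypothetical column names for a clickstream event.
                "dimensionsSpec": {"dimensions": ["user_id", "page", "country"]},
                # Keep detail data: no rollup, hourly segments.
                "granularitySpec": {
                    "rollup": False,
                    "queryGranularity": "none",
                    "segmentGranularity": "hour",
                },
            },
        },
    }


def supervisor_request(spec: dict) -> urllib.request.Request:
    """Prepare a POST to the supervisor endpoint; pass it to urlopen() to submit."""
    return urllib.request.Request(
        f"{DRUID_URL}/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = supervisor_request(kafka_supervisor_spec("clickstream", "clickstream"))
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to submit to a running cluster
```

Once the supervisor is running, Druid manages the Kafka consumer tasks itself, which is what makes the Kafka-to-Druid integration work without custom connector code.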

Slide 9

Why do you need a Streaming Analytics Database?

Slide 10

K2D Architecture - Kafka to Druid
[Diagram: data sources → event streaming infrastructure (stream ETL, stream processor, µService) → event analytics infrastructure (real-time analytics) → consumers such as data/event-driven apps, custom visualizations, BI tools, root-cause analysis, and dashboards & reports. Source labels include streams, database CDC, files, messaging, app data, databases, and a data lake.]

Slide 11

Preprocessing - What we are going to do today
● Filter out data by type
● Filter out data by field values
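The two preprocessing steps are simple event processing: each event is kept or dropped on its own, without reference to other events. In the demo this runs in ksqlDB; the plain-Python generator below only sketches the same per-event logic. The event shape (`type` and `user_agent` fields), the accepted types in `KEEP_TYPES`, and the bot check are hypothetical choices for a publisher clickstream.

```python
from typing import Iterable, Iterator

# Hypothetical filter criteria for a publisher clickstream.
KEEP_TYPES = {"pageview", "click"}


def preprocess(events: Iterable[dict]) -> Iterator[dict]:
    """Stateless, one-event-at-a-time filtering, similar in spirit to a SQL WHERE clause."""
    for event in events:
        # 1. Filter out data by type: drop event types we do not analyze.
        if event.get("type") not in KEEP_TYPES:
            continue
        # 2. Filter out data by field values: drop traffic from crawlers.
        if "bot" in event.get("user_agent", "").lower():
            continue
        yield event


if __name__ == "__main__":
    raw = [
        {"type": "pageview", "user_agent": "Mozilla/5.0"},
        {"type": "heartbeat", "user_agent": "Mozilla/5.0"},
        {"type": "click", "user_agent": "Googlebot/2.1"},
        {"type": "click"},
    ]
    for kept in preprocess(raw):
        print(kept)
```

Because the filter is a generator, it processes an unbounded stream lazily, one event at a time, which matches how a streaming filter behaves.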

Slide 12

Use Case: Publisher Clickstream Data

Slide 13

Demo Architecture
Analytics Pipeline:
● Data Production: tracking, transactions
● Delivery: Kafka as an event streaming platform
● Processing: preprocessing with ksqlDB (filter, enrich, transform)
● Storage: Apache Druid, elastic storage model (in production backed by cloud storage)
● Query: Apache Druid, analytical queries against real-time, detail data
● Visualisation: Imply Pivot, a data exploration and ad hoc analytics GUI for Druid
Highly scalable, built-in DR
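For the query stage, Druid exposes an HTTP SQL API. The sketch prepares an ad hoc, OLAP-style query: the ten countries with the most events in the last hour. The router URL, the `clickstream` datasource and the `country` column are assumptions; `__time` is Druid's built-in timestamp column. Only `run_query` needs a running cluster.

```python
import json
import urllib.request

DRUID_URL = "http://localhost:8888"  # Druid router (assumption)

# Ad hoc aggregation over real-time detail data (hypothetical datasource/columns).
SQL = """
SELECT country, COUNT(*) AS event_count
FROM clickstream
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY country
ORDER BY event_count DESC
LIMIT 10
"""


def sql_request(query: str) -> urllib.request.Request:
    """Prepare a POST to Druid's SQL endpoint, asking for rows as JSON objects."""
    body = {"query": query, "resultFormat": "object"}
    return urllib.request.Request(
        f"{DRUID_URL}/druid/v2/sql",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query: str) -> list:
    """Send the query and decode the JSON result rows (requires a running cluster)."""
    with urllib.request.urlopen(sql_request(query)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = sql_request(SQL)
    print(req.get_method(), req.full_url)
```

A GUI such as Imply Pivot issues queries of this kind interactively; the sketch just shows the same request made by hand.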

Slide 14

Live Demo

Slide 15

Learnings
● Kafka and Druid complement each other
● Use ksqlDB for
  ● Preprocessing
  ● Enrichment
  ● Materialized views
● Use Druid for
  ● Scalable analytical applications
  ● Ad hoc data exploration
  ● OLAP-style analysis
● Integration is easy with native integration APIs

Slide 16

Questions?
hellmar.becker@imply.io
https://www.linkedin.com/in/hellmarbecker/
https://blog.hellmar-becker.de/