Let’s Do Data Lineage in Kafka, Flink and Druid by Tracking Aircraft!

A presentation at DevDays Europe 2024 in Vilnius, Lithuania, by Hellmar Becker

Data lineage means you can trace any piece of data in your system and know at any time where it came from and how it has been processed. Enterprise systems need to be able to prove lineage for compliance reasons, but lineage is also a significant aspect of data discoverability and governance more generally.

In this talk, Hellmar is going to connect several Raspberry Pi devices that collect ADS-B (aircraft radar) data to a KFD (Kafka-Flink-Druid) stack for analytical processing. He will deliver the data through Kafka, cleanse and enrich it with Flink, and run analytical queries on the results using Druid.

He will demonstrate how to track data lineage through Kafka metadata and show how this information can be maintained throughout the processing pipeline. This relies on Kafka headers, an underused feature of Kafka that also integrates readily with Druid!
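
To make the header idea concrete, here is a minimal sketch of how lineage could travel in record headers. It is not the code from the talk. It models headers as an ordered list of (key, bytes) pairs, the shape commonly accepted by Python Kafka clients, and it relies on the fact that Kafka allows the same header key to appear more than once. The header key "lineage", the stage names, and the device name "raspi-01" are all hypothetical.

```python
from typing import List, Tuple

# Kafka record headers: ordered (key, value) pairs, values are raw bytes.
Header = Tuple[str, bytes]

# Hypothetical header key used for this illustration only.
LINEAGE_KEY = "lineage"


def add_lineage_hop(headers: List[Header], stage: str, source: str) -> List[Header]:
    """Return a new header list with one lineage entry appended.

    Existing headers are preserved, so each processing stage adds
    its own entry without overwriting earlier ones.
    """
    hop = f"{stage}:{source}".encode("utf-8")
    return list(headers) + [(LINEAGE_KEY, hop)]


def lineage_path(headers: List[Header]) -> List[str]:
    """Decode all lineage entries in the order they were added."""
    return [value.decode("utf-8") for key, value in headers if key == LINEAGE_KEY]


# Simulate a record passing through two stages (names are assumptions).
headers: List[Header] = []
headers = add_lineage_hop(headers, "ingest", "raspi-01")
headers = add_lineage_hop(headers, "enrich", "flink-job")
print(lineage_path(headers))
```

In a real pipeline, a list like this would be passed as the headers of each produced record, and every downstream stage would copy the incoming headers before adding its own entry.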

You will learn how to implement data lineage using the open-source KFD stack and readily available data sources. This way, you can try out enterprise-style data lineage processing at home and prepare yourself for a question that will inevitably arise in any enterprise data engineering project!