Stream, Materialize, Serve – Knitting Flawless Pipelines with Kafka, Flink, and Pinot

A presentation at RTA Summit in May 2025 by Viktor Gamov

Slide 1

Stream, Materialize, Serve
Knitting Flawless Pipelines with Kafka, Flink, and Pinot
Tim Berglund, VP DevRel, Confluent
Viktor Gamov, Principal Developer Advocate
@gamussa | developer.confluent.io | @tlberglund

Slide 2

What is Apache Pinot™?

Slide 3

“Apache Pinot is a real-time distributed OLAP database, designed to serve OLAP workloads on streaming data with extreme low latency and high concurrency.”

Slide 4

The essence of real-time analytics
LATENCY: The amount of time it takes to execute a query
CONCURRENCY: The ability of a system to handle multiple queries simultaneously
FRESHNESS: The up-to-date nature of data in the system

Slide 5

The essence of real-time analytics
LATENCY: As low as 10 ms
CONCURRENCY: As many as 100,000 queries per second
FRESHNESS: Seconds from event time until queryable in Pinot

Slide 6

OLTP
• Transaction-focused
• Write-heavy workloads
• Often involves a single record per operation
OLAP
• Aggregation-focused
• Read-heavy workloads
• Often involves many records in one operation
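
The contrast between the two workload shapes can be sketched with Python's built-in sqlite3 module. The orders table and its rows are made up for illustration, and SQLite is only a convenient stand-in here, not one of the systems covered in the talk.

```python
import sqlite3

# In-memory database with a hypothetical orders table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# OLTP-style work: each operation writes or reads a single record.
rows = [(1, "alice", 20.0), (2, "bob", 35.5), (3, "alice", 14.5)]
for row in rows:
    conn.execute("INSERT INTO orders VALUES (?, ?, ?)", row)
single = conn.execute("SELECT amount FROM orders WHERE id = ?", (2,)).fetchone()

# OLAP-style work: one read-heavy query aggregates over many records.
totals = conn.execute(
    "SELECT customer, COUNT(*), SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()

print(single)
print(totals)
```

The point lookup touches one row by primary key, while the GROUP BY query has to scan every row to produce its result.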

Slide 7

Data Model
● Pinot uses the completely familiar tabular data model
● Table creation and schema definition expressed in JSON
● Queries expressed in SQL
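
As a rough sketch of what "schema in JSON, queries in SQL" can look like, the Python snippet below builds a schema document and a query string. The orders table, its column names, and the query are hypothetical; the top-level keys follow the Pinot schema layout as I understand it, so check them against the documentation for your Pinot version.

```python
import json

# Hypothetical schema for an "orders" table; all field names are illustrative.
schema = {
    "schemaName": "orders",
    "dimensionFieldSpecs": [
        {"name": "order_id", "dataType": "STRING"},
        {"name": "customer", "dataType": "STRING"},
    ],
    "metricFieldSpecs": [
        {"name": "amount", "dataType": "DOUBLE"},
    ],
    "dateTimeFieldSpecs": [
        {
            "name": "ts",
            "dataType": "LONG",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "1:MILLISECONDS",
        }
    ],
}


def column_names(schema_doc):
    """Collect every column name declared across the field-spec lists."""
    names = []
    for key in ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs"):
        names.extend(spec["name"] for spec in schema_doc.get(key, []))
    return names


# The schema travels as a JSON document; queries are ordinary SQL text.
schema_json = json.dumps(schema, indent=2)
query = (
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC LIMIT 10"
)

print(schema_json)
print(column_names(schema))
```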

Slide 8

Kafka + Pinot
Streaming Ingestion
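
A minimal sketch of how a real-time Pinot table might be pointed at a Kafka topic, written as a Python helper that produces the table config JSON. The helper name, table name, topic, and broker address are placeholders. The stream property keys and plugin class names reflect the Kafka stream settings as I recall them from the Pinot documentation; newer releases may prefer a different plugin package or config location, so treat them as assumptions to verify.

```python
import json


def realtime_table_config(table, topic, brokers, time_column):
    """Build a REALTIME table config dict for a Kafka-backed table (sketch)."""
    return {
        "tableName": table,
        "tableType": "REALTIME",
        "segmentsConfig": {
            "timeColumnName": time_column,
            "schemaName": table,
            "replication": "1",
        },
        "tenants": {},
        "tableIndexConfig": {
            "loadMode": "MMAP",
            # Assumed Kafka stream settings; verify for your Pinot version.
            "streamConfigs": {
                "streamType": "kafka",
                "stream.kafka.topic.name": topic,
                "stream.kafka.broker.list": brokers,
                "stream.kafka.consumer.type": "lowlevel",
                "stream.kafka.consumer.factory.class.name":
                    "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
                "stream.kafka.decoder.class.name":
                    "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
            },
        },
        "metadata": {},
    }


# Placeholder values: an "orders" topic on a local broker.
config = realtime_table_config("orders", "orders", "localhost:9092", "ts")
print(json.dumps(config, indent=2))
```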

Slide 9

Slide 10

Kafka + Flink + Pinot
Knitting Flawless Pipelines

Slide 11

Flink 101

Slide 12

«Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.»
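
To make "stateful computation over a stream" concrete, here is a toy sketch in plain Python. It does not use the Flink API; the function name and the sample events are invented. It keeps one running total per key and emits an updated result after every event, and because it is a generator it treats a bounded list and an unbounded source the same way.

```python
from collections import defaultdict


def running_totals(events):
    """Yield (key, running_total) after each event, keeping per-key state."""
    state = defaultdict(float)  # keyed state: one running total per key
    for key, amount in events:
        state[key] += amount
        yield key, state[key]


# A bounded stream of hypothetical (customer, amount) events.
events = [("alice", 20.0), ("bob", 35.5), ("alice", 14.5)]
results = list(running_totals(events))
print(results)
```

In a real engine the per-key state would be partitioned across workers and checkpointed; this sketch only shows the shape of the computation.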

Slide 13

Real-time services rely on stream processing
[Diagram: Sources → Real-time Stream Processing → Sinks; labels: Files, Kafka, Apps, Databases, Key/Value Stores]

Slide 14

What is Flink SQL?

Slide 15

A standards-compliant SQL engine for processing both batch and streaming data with the scalability, performance, and consistency of Apache Flink

Slide 16

How does Flink work with Kafka?
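
One common way to connect the two is to declare a Kafka topic as a table in Flink SQL through the Kafka connector. The sketch below uses a small, hypothetical Python helper to render such a CREATE TABLE statement. The table name, columns, topic, broker address, and consumer group are placeholders, and the option keys are the Kafka SQL connector options as I understand them, so confirm them against the Flink documentation for your version.

```python
def kafka_table_ddl(name, columns, options):
    """Render a Flink SQL CREATE TABLE statement with a WITH clause."""
    cols = ",\n".join(f"  {col} {typ}" for col, typ in columns)
    opts = ",\n".join(f"  '{key}' = '{value}'" for key, value in options.items())
    return f"CREATE TABLE {name} (\n{cols}\n) WITH (\n{opts}\n)"


# Placeholder table definition backed by an "orders" topic on a local broker.
ddl = kafka_table_ddl(
    "orders",
    [("order_id", "STRING"), ("customer", "STRING"), ("amount", "DOUBLE")],
    {
        "connector": "kafka",
        "topic": "orders",
        "properties.bootstrap.servers": "localhost:9092",
        "properties.group.id": "flink-orders",
        "scan.startup.mode": "earliest-offset",
        "format": "json",
    },
)
print(ddl)
```

Once a table like this is declared, ordinary SQL queries over it run continuously against the topic's records.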

Slide 17

Slide 18

Slide 19

Source: Streaming Databases, Hubert Dulay, Ralph Matthias Debusmann

Slide 20

Check out developer.confluent.io @tlberglund | @gamussa