One Does Not Simply Query a Stream

A presentation at Zurich Kafka Meetup in January 2025 in Zürich, Switzerland by Viktor Gamov

Slide 1

One Does Not Simply Query a Stream! Viktor Gamov, Confluent, @gamussa. Zurich Kafka Meetup, Switzerland 2025. @gamussa | @confluentinc | @apacheflink

Slide 2

Slide 3

Slide 4

Viktor Gamov, Principal Developer Advocate | Confluent. THE CLOUD CONNECTIVITY COMPANY. Twitter/X: @gamussa. Kong Confidential.

Slide 5

Simpler times: Monolith. @gamussa | gamov.dev/rel | @ConfluentInc

Slide 6

Simpler analytics: ETL and CDC

Slide 7

DWH -> Hadoop: the mobile era

Slide 8

Data Pipelines: streaming data pipelines and microservices

Slide 9

LOG

Slide 10

Slide 11

Slide 12

Stream vs. OLTP vs. OLAP: OLTP in streams, OLAP in streams
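The OLTP/OLAP distinction carries over to streams. An OLTP-style question asks about the current state of one entity. An OLAP-style question aggregates across many entities. The plain-Java sketch below uses a list as a stand-in for a stream of events. The `OrderEvent` type, its fields, and the sample data are hypothetical. No Kafka API is involved.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model in plain Java (no Kafka): a list stands in for a stream of events.
// OrderEvent, its fields, and the sample data are hypothetical.
public class OltpVsOlap {

    record OrderEvent(String orderId, String region, String status, double amount) {}

    static List<OrderEvent> sampleEvents() {
        return List.of(
            new OrderEvent("o1", "EU", "PLACED", 10.0),
            new OrderEvent("o2", "US", "PLACED", 25.0),
            new OrderEvent("o1", "EU", "SHIPPED", 10.0),
            new OrderEvent("o3", "EU", "PLACED", 5.0));
    }

    // OLTP-style question about one entity: "what is the current status of order X?"
    // Keeping only the latest event per key answers it with a point lookup.
    static Map<String, String> latestStatusByOrder(List<OrderEvent> events) {
        Map<String, String> latest = new HashMap<>();
        for (OrderEvent e : events) {
            latest.put(e.orderId(), e.status()); // later events overwrite earlier ones
        }
        return latest;
    }

    // OLAP-style question across many entities: "what is the total order value per region?"
    // Counting only PLACED events avoids double-counting an order when its status changes.
    static Map<String, Double> revenueByRegion(List<OrderEvent> events) {
        Map<String, Double> totals = new HashMap<>();
        for (OrderEvent e : events) {
            if (e.status().equals("PLACED")) {
                totals.merge(e.region(), e.amount(), Double::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<OrderEvent> events = sampleEvents();
        System.out.println("status of o1: " + latestStatusByOrder(events).get("o1"));
        System.out.println("revenue by region: " + revenueByRegion(events));
    }
}
```

Both answers come from the same events. What differs is the shape of the derived state each question needs.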

Slide 13

Skip the paywall: sign up for Confluent Cloud and get $400 worth of free credits for your first 30 days. Use promo code POPTOUT000MZG62 to skip the paywall!

Slide 14

Our Options
• Connect/Relational DB
• Kafka Streams
• Streaming SQL
• Data Warehouse
• Data Lake
• Real-Time OLAP Database

Slide 15

Kafka Connect

Slide 16

Connect/RDBMS (diagram: Data Source -> Kafka Connect -> Kafka cluster of three brokers -> Kafka Connect -> Data Sink)

Slide 17

Connect/RDBMS
• Suitable for smaller data
• Transactional
• Familiar to users
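The Connect-to-RDBMS path is usually configured rather than coded. The Java below is a sketch that builds the JSON body you would POST to a Kafka Connect worker's `/connectors` REST endpoint (port 8083 by default). It assumes the Confluent JDBC sink connector plugin is installed on the worker, since that plugin is not part of Apache Kafka itself. The connector name, topic, database URL, and `id` key column are hypothetical. The JDBC sink also expects records that carry schemas (for example Avro, or JSON with schemas enabled). This fragment does not configure that.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Builds a JSON payload for registering a JDBC sink connector through the
// Kafka Connect REST API (POST /connectors). Connector name, topic, JDBC URL,
// and the "id" key column are hypothetical placeholders.
public class JdbcSinkPayload {

    static Map<String, String> sinkConfig(String topic, String jdbcUrl) {
        Map<String, String> config = new LinkedHashMap<>(); // keeps insertion order in the JSON
        config.put("connector.class", "io.confluent.connect.jdbc.JdbcSinkConnector");
        config.put("topics", topic);
        config.put("connection.url", jdbcUrl);
        // Upsert keyed by the Kafka record key, so each row holds the latest value per key.
        config.put("insert.mode", "upsert");
        config.put("pk.mode", "record_key");
        config.put("pk.fields", "id");
        // Let the connector create the target table if it does not exist.
        config.put("auto.create", "true");
        return config;
    }

    // Minimal JSON string escaping (backslashes and double quotes only); enough for these values.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String toJson(String name, Map<String, String> config) {
        String fields = config.entrySet().stream()
            .map(e -> "    " + quote(e.getKey()) + ": " + quote(e.getValue()))
            .collect(Collectors.joining(",\n"));
        return "{\n  \"name\": " + quote(name) + ",\n  \"config\": {\n" + fields + "\n  }\n}";
    }

    public static void main(String[] args) {
        System.out.println(toJson("orders-to-postgres",
            sinkConfig("orders", "jdbc:postgresql://localhost:5432/shop")));
    }
}
```

Upsert mode is what makes the relational table a "latest value per key" view of the topic. That is the transactional, familiar-to-users shape the slide describes.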

Slide 18

Kafka Streams

Slide 19

Kafka Streams (transactional)
• Ingests directly from a topic
• KTable
  • Forms a local key/value state store (RocksDB-backed by default; in-memory stores are optional) suitable for querying by topic key
• Scalable across members of a consumer group
• Readable through Interactive Queries
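"Scalable across members of a consumer group" means each application instance holds only the state for the partitions it owns. An interactive query for a key therefore has to reach the instance that owns that key's partition. The plain-Java sketch below models only that routing decision. The hash is a stand-in, `String.hashCode`. Kafka's default partitioner applies murmur2 to the serialized key bytes, so real partition numbers will differ. The host names and the assignment map are hypothetical. In a real application, `KafkaStreams#queryMetadataForKey` reports which host holds a key's store. The application supplies its own RPC layer to forward the request there.

```java
import java.util.List;
import java.util.Map;

// Conceptual model of how keyed state is spread across instances of one application.
// The hash is a stand-in (String.hashCode); Kafka's default partitioner uses murmur2
// over the serialized key bytes, so real partition numbers will differ.
public class KeyRouting {

    static int partitionFor(String key, int numPartitions) {
        // Clear the sign bit so the modulo result is never negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // The instance that owns a key's partition is the only one holding that key's state,
    // so an interactive query for the key has to be answered there (or forwarded to it).
    static String ownerOf(String key, Map<Integer, String> assignment, int numPartitions) {
        return assignment.get(partitionFor(key, numPartitions));
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // Hypothetical partition-to-instance assignment produced by a consumer-group rebalance.
        Map<Integer, String> assignment = Map.of(
            0, "app-1:7070", 1, "app-2:7070", 2, "app-1:7070", 3, "app-2:7070");
        for (String key : List.of("alice", "bob", "carol")) {
            System.out.printf("%s -> partition %d -> %s%n",
                key, partitionFor(key, numPartitions), ownerOf(key, assignment, numPartitions));
        }
    }
}
```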

Slide 20

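Kafka Streams (transactional)
// Read the input topic as a stream of String keys and String values
final KStream<String, String> stream =
    builder.stream(inputTopic, Consumed.with(stringSerde, stringSerde));
// Reinterpret the stream as a table (latest value per key) in a named, queryable state store
final KTable<String, String> convertedTable =
    stream.toTable(Materialized.as("stream-converted-to-table"));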
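`stream.toTable(...)` changes how records are interpreted. Each record becomes an update for its key, and under the usual KTable interpretation a null value is a tombstone that deletes the key. The plain-Java sketch below models those upsert/delete semantics with a `Map`. It is an illustration of the idea, not how Kafka Streams stores state. The `Change` record and sample keys are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of table semantics over a changelog: each record upserts its key,
// and a null value (a tombstone) deletes it. Not Kafka Streams' actual storage code.
public class StreamToTable {

    record Change(String key, String value) {}

    static List<Change> sampleChangelog() {
        return List.of(
            new Change("user-1", "Zurich"),
            new Change("user-2", "Bern"),
            new Change("user-1", "Geneva"),   // update: replaces "Zurich"
            new Change("user-2", null));      // tombstone: removes user-2
    }

    static Map<String, String> materialize(List<Change> changelog) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Change c : changelog) {
            if (c.value() == null) {
                table.remove(c.key());          // delete the key
            } else {
                table.put(c.key(), c.value());  // latest value wins
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = materialize(sampleChangelog());
        System.out.println(table);               // {user-1=Geneva}
        System.out.println(table.get("user-1")); // Geneva (a point lookup by key)
    }
}
```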

Slide 21

Kafka Streams (analytical)
• Full-featured Java stream processing API
• Arbitrary streaming computation
• Can emit new streams (not this talk)
• KTables queryable by key
• Every read pattern requires its own topology
• Interactive Queries again

Slide 22

Kafka Streams (analytical)
// Split each line into lowercase words, group by word, and count occurrences in the "counts-store" state store
KTable<String, Long> wordCounts = textLines
    .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
    .groupBy((key, word) -> word)
    .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"));
// Publish the continuously updated counts to an output topic
wordCounts.toStream().to("WordsWithCountsTopic", Produced.with(Serdes.String(), Serdes.Long()));
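The point that "every read pattern requires its own topology" shows up even in a plain-Java model of the word count. A store keyed by word answers "how often does word X occur?" with a lookup. A different question, such as occurrences per first letter, needs a different grouping. In Kafka Streams that means another `groupBy`/`count` with its own store. The sketch below is JDK-only and is not Kafka Streams code. The first-letter question is a hypothetical second read pattern. Unlike the topology above, it also drops the empty token that `split("\\W+")` yields for a line starting with a non-word character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-JDK model of the word-count topology; not Kafka Streams code.
public class WordCountSketch {

    // Shared first step: lowercase each line and split on runs of non-word characters.
    static List<String> tokenize(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {   // a line starting with punctuation yields a leading ""
                    words.add(word);
                }
            }
        }
        return words;
    }

    // Read pattern 1: occurrences per word (the "counts-store" in the topology).
    static Map<String, Long> countByWord(List<String> words) {
        Map<String, Long> counts = new TreeMap<>();
        for (String w : words) counts.merge(w, 1L, Long::sum);
        return counts;
    }

    // Read pattern 2: occurrences per first letter. A store keyed by word cannot
    // answer this with a key lookup, so it gets its own grouping (its own topology).
    static Map<Character, Long> countByFirstLetter(List<String> words) {
        Map<Character, Long> counts = new TreeMap<>();
        for (String w : words) counts.merge(w.charAt(0), 1L, Long::sum);
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = tokenize(List.of("Hello Kafka", "hello Flink, hello streams"));
        System.out.println(countByWord(words));        // {flink=1, hello=3, kafka=1, streams=1}
        System.out.println(countByFirstLetter(words)); // {f=1, h=3, k=1, s=1}
    }
}
```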

Slide 23

Streaming SQL engines

Slide 24

Streaming SQL
• Materialize
• DeltaStream
• RisingWave
• ksqlDB

Slide 25

Why not Flink?

Slide 26

Slide 27

Materialize
• Replacement data warehouse
• Integrates with Kafka, Postgres, dbt
• The materialized view is the central abstraction
• Views are persistent and queryable
• Postgres wire-compatible
• Positioned as an analytics solution
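A materialized view is queryable cheaply because it is kept up to date as changes arrive, not recomputed on each read. Materialize maintains its views incrementally. The plain-Java sketch below shows the general technique, incremental view maintenance with +1/-1 diffs, for a `SUM ... GROUP BY` view. It is not Materialize's engine. The `orders` table, its columns, and the data are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of incremental view maintenance for
//   SELECT region, SUM(amount_cents) FROM orders GROUP BY region
// Each change arrives as (row, diff): diff = +1 for an insert, -1 for a delete.
// The table, columns, and data are hypothetical; this is not Materialize's engine.
public class IncrementalSumView {

    record Order(String region, long amountCents) {}

    private final Map<String, Long> sums = new HashMap<>();
    private final Map<String, Long> rowCounts = new HashMap<>();

    // Constant work per change instead of recomputing the aggregate from scratch.
    void apply(Order row, int diff) {
        sums.merge(row.region(), diff * row.amountCents(), Long::sum);
        long remaining = rowCounts.merge(row.region(), (long) diff, Long::sum);
        if (remaining == 0) {            // last row of the group deleted: the group disappears
            sums.remove(row.region());
            rowCounts.remove(row.region());
        }
    }

    // Reading the view is a lookup of an already-maintained result.
    Long sumFor(String region) {
        return sums.get(region);
    }

    static IncrementalSumView demo() {
        IncrementalSumView view = new IncrementalSumView();
        view.apply(new Order("EU", 1000), +1);
        view.apply(new Order("US", 2500), +1);
        view.apply(new Order("EU", 500), +1);
        // An UPDATE is a retraction of the old row plus an insertion of the new row.
        view.apply(new Order("EU", 500), -1);
        view.apply(new Order("EU", 700), +1);
        view.apply(new Order("US", 2500), -1);
        return view;
    }

    public static void main(String[] args) {
        IncrementalSumView view = demo();
        System.out.println("EU: " + view.sumFor("EU")); // EU: 1700
        System.out.println("US: " + view.sumFor("US")); // US: null
    }
}
```

Tracking a row count per group lets the view drop a group once its last row is retracted, matching SQL `GROUP BY` semantics. Amounts are kept in integer cents to avoid floating-point drift.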

Slide 28

DeltaStream
• Cloud-native streaming SQL
• Serverless, BYOC
• Kafka, Kinesis integration
• Materialized views and streaming pipelines
• Streaming database and streaming analytics

Slide 29

RisingWave
• Distributed SQL streaming database
• Cloud and OSS versions
• Flink-like stream processing, written in Rust
• Kafka, Pulsar, Kinesis integrations
• Flink + persistent views
• Postgres wire-compatible

Slide 30

ksqlDB
• «Streaming Database»
• Provides persistent TABLE abstraction
• Pull and push queries
• Like Kafka Streams, but in SQL
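ksqlDB's two query styles can be pictured with a simple model. A pull query (a `SELECT` against a table, without `EMIT CHANGES`) returns the current value and terminates. A push query (with `EMIT CHANGES`) keeps delivering results as the table changes. The plain-Java class below is a hypothetical in-memory stand-in for a ksqlDB table that shows only this difference. It is not how ksqlDB is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Conceptual model only: a pull query reads the table's current state once and returns,
// while a push query subscribes and receives every subsequent change.
public class PullVsPush {

    private final Map<String, Long> table = new HashMap<>();
    private final List<BiConsumer<String, Long>> subscribers = new ArrayList<>();

    void upsert(String key, long value) {
        table.put(key, value);
        for (BiConsumer<String, Long> s : subscribers) {
            s.accept(key, value); // every change is pushed to active subscribers
        }
    }

    // Pull: point-in-time lookup, analogous to a SELECT by key against a ksqlDB table.
    Long pull(String key) {
        return table.get(key);
    }

    // Push: continuous subscription, analogous to a query with EMIT CHANGES.
    void push(BiConsumer<String, Long> subscriber) {
        subscribers.add(subscriber);
    }

    public static void main(String[] args) {
        PullVsPush t = new PullVsPush();
        t.upsert("zurich", 1);
        List<String> received = new ArrayList<>();
        t.push((k, v) -> received.add(k + "=" + v)); // sees only changes after subscribing
        t.upsert("zurich", 2);
        t.upsert("bern", 5);
        System.out.println("pull zurich: " + t.pull("zurich")); // pull zurich: 2
        System.out.println("push received: " + received);      // push received: [zurich=2, bern=5]
    }
}
```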

Slide 31

Real-Time Analytics Database

Slide 32

Real-Time OLAP
• Designed for high-concurrency, low-latency queries
• Ingests from streaming and batch sources
• Tight integration with Kafka
• Conventional tables and SQL

Slide 33

Real-Time OLAP
• Analytics shaped like real-time data
• Analytics when users are decision makers

Slide 34

Cloud Data Warehouses

Slide 35

Cloud Data Warehouses

Slide 36

Cloud Data Warehouses
• The cloud-based heir of the legacy DWH
• Ingest from batch and streaming sources
• Biased towards structured data and batch access

Slide 37

Data Lake

Slide 38

Data Lake: anything else (“We’ll figure this out”)

Slide 39

Data Lakes
• Started as the HDFS cluster
• Became S3
• That didn’t help…
• ELT vs. ETL
• Iceberg/Hudi/Delta Lake

Slide 40

Data Lakes
• Storage and compute are radically decoupled
• Structure is relatively less important
• Reads are slow
• Streaming is historically difficult

Slide 41

Technology Selection: No Solutions, Only Trade-Offs

Slide 42

Sometimes you go with what you know

Slide 43

This is not bad!

Slide 44

Performance

Slide 45

Community/Adoption

Slide 46

Diagram: Differentiated Application Code / Area of Exploration / Kafka

Slide 47

Slide 48
