One Does Not Simply Query a Stream! Viktor Gamov, Confluent @gamussa Zurich Kafka Meetup, Switzerland 2025
@gamussa | @confluentinc | @apacheflink
Slide 2
Slide 3
Slide 4
Viktor GAMOV Principal Developer Advocate | Confluent
Twitter X: @gamussa
Slide 5
Simpler times Monolith
@gamussa | gamov.dev/rel | @ConfluentInc
Slide 6
Simpler analytics ETL and CDC
Slide 7
DWH -> Hadoop Mobile Era
Slide 8
Data Pipelines Streaming data pipelines and Microservices
Slide 9
LOG
Slide 10
Slide 11
Slide 12
OLAP vs. OLTP in Streams
OLTP stream vs OLAP streams
Slide 13

Skip Paywall
Sign Up for Confluent Cloud: get $400 worth of free credits for your first 30 days
Use Promo Code POPTOUT000MZG62 to skip the paywall!
Slide 14
Our Options
• Connect/Relational DB
• Kafka Streams
• Streaming SQL
• Data Warehouse
• Data Lake
• Real-Time OLAP Database
Slide 15
Kafka Connect
Slide 16
Connect/RDBMS
[Diagram: Data Source -> Kafka Connect -> Cluster (Broker, Broker, Broker) -> Kafka Connect -> Data Sink]
Slide 17
Connect/RDBMS
• Suitable for smaller data
• Transactional
• Familiar to users
Slide 18
Kafka Streams
Slide 19
Kafka Streams (transactional)
• Ingests directly from a topic
• KTable
• Forms an in-memory key/value store suitable for querying by topic key
• Scalable across members of a consumer group
• Readable through Interactive Queries
Slide 20
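Kafka Streams (transactional)
// Read the input topic as a stream of String keys and values
final KStream<String, String> stream =
    builder.stream(inputTopic, Consumed.with(stringSerde, stringSerde));

// Convert the stream to a table backed by a named, queryable state store
final KTable<String, String> convertedTable =
    stream.toTable(Materialized.as("stream-converted-to-table"));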
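As a rough illustration of what converting a stream to a table means, the plain-Java sketch below folds a log of keyed records into a map that keeps the latest value per key. It does not use the Kafka Streams API: `ChangelogTable` and `LogEntry` are hypothetical names invented for this example, and a real KTable adds state stores, partitioning, and fault tolerance that are left out here. A null value is treated as a delete, mirroring how KTables interpret tombstones.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a KTable; not part of Kafka Streams.
final class ChangelogTable {

    // One keyed record as it might appear on a topic (assumed shape).
    static final class LogEntry {
        final String key;
        final String value; // null is treated as a delete (tombstone)

        LogEntry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Map<String, String> store = new HashMap<>();

    // Apply one record: upsert on a non-null value, delete on null.
    void apply(LogEntry entry) {
        if (entry.value == null) {
            store.remove(entry.key);
        } else {
            store.put(entry.key, entry.value);
        }
    }

    // Replay a whole log in order to rebuild the table.
    static ChangelogTable fromLog(List<LogEntry> log) {
        ChangelogTable table = new ChangelogTable();
        for (LogEntry entry : log) {
            table.apply(entry);
        }
        return table;
    }

    // Point lookup by key, the only read pattern this table supports.
    Optional<String> get(String key) {
        return Optional.ofNullable(store.get(key));
    }

    int size() {
        return store.size();
    }
}
```

Replaying the same log always yields the same table, which is why the table can be rebuilt from the topic after a restart.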
Slide 21
Kafka Streams (analytical)
• Full-featured Java stream processing API
• Arbitrary streaming computation
• Can emit new streams (not this talk)
• KTables queryable by key
• Every read pattern requires its own topology
• Interactive Queries again
Slide 22
Kafka Streams (analytical)
// Split each line into lowercase words, group by word, and count occurrences
KTable<String, Long> wordCounts = textLines
    .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
    .groupBy((key, word) -> word)
    .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"));

// Emit every count update to an output topic
wordCounts.toStream().to("WordsWithCountsTopic", Produced.with(Serdes.String(), Serdes.Long()));
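The word-count table above is keyed by word, so it answers "how often did this word appear?" directly but not "which words are most frequent?". The plain-Java sketch below is one way to picture why each read pattern needs its own materialization: a second index, keyed by count, is maintained alongside the first. `WordCountViews` and its methods are hypothetical names for this illustration, not Kafka Streams API; in Kafka Streams the second view would be a separate topology and state store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration: two views over the same stream of words.
final class WordCountViews {
    // View 1: lookup by word (what a count store keyed by word offers).
    private final Map<String, Long> countsByWord = new HashMap<>();
    // View 2: lookup by count, needed for "top N" style reads.
    private final TreeMap<Long, TreeSet<String>> wordsByCount = new TreeMap<>();

    // Process one word and keep both views consistent.
    void observe(String word) {
        long old = countsByWord.getOrDefault(word, 0L);
        long updated = old + 1;
        countsByWord.put(word, updated);
        if (old > 0) {
            // Move the word out of its previous count bucket.
            TreeSet<String> bucket = wordsByCount.get(old);
            bucket.remove(word);
            if (bucket.isEmpty()) {
                wordsByCount.remove(old);
            }
        }
        wordsByCount.computeIfAbsent(updated, c -> new TreeSet<>()).add(word);
    }

    // Split a line the same way as the slide's example and observe each word.
    void observeLine(String line) {
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                observe(word);
            }
        }
    }

    // Read pattern 1: count for a given word.
    long count(String word) {
        return countsByWord.getOrDefault(word, 0L);
    }

    // Read pattern 2: the n most frequent words, ties broken alphabetically.
    List<String> top(int n) {
        List<String> result = new ArrayList<>();
        for (TreeSet<String> bucket : wordsByCount.descendingMap().values()) {
            for (String word : bucket) {
                if (result.size() == n) {
                    return result;
                }
                result.add(word);
            }
        }
        return result;
    }
}
```

The cost of the second read pattern is extra state and extra work on every update, which is the trade-off behind "every read pattern requires its own topology".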
Why not Flink?
Slide 26
Slide 27
Materialize
• Replacement data warehouse
• Integrates with Kafka, Postgres, dbt
• The Materialized View is the central abstraction
• Views are persistent and queryable
• Postgres wire-compatible
• Positioned as an analytics solution
Slide 28
DeltaStream
• Cloud-native streaming SQL
• Serverless, BYOC
• Kafka, Kinesis integration
• Materialized views and streaming pipelines
• Streaming database and streaming analytics
Slide 29
RisingWave
• Distributed SQL streaming database
• Cloud and OSS versions
• Implementation of Flink in Rust
• Kafka, Pulsar, Kinesis integrations
• Flink + persistent views
• Postgres wire-compatible
Slide 30
ksqlDB
• «Streaming Database»
• Provides persistent TABLE abstraction
• Pull and Push queries
• Like Kafka Streams, but in SQL
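To make the pull/push distinction concrete, here is a minimal plain-Java sketch: a pull query reads the current state once and returns, while a push query subscribes and receives every later change. `QueryableTable`, `pull`, `push`, and `upsert` are hypothetical names chosen for this illustration; they are not ksqlDB APIs, and ksqlDB itself expresses both query types in SQL.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiConsumer;

// Hypothetical in-memory table that supports both query styles.
final class QueryableTable {
    private final Map<String, Long> rows = new HashMap<>();
    private final List<BiConsumer<String, Long>> subscribers = new ArrayList<>();

    // Pull query: answer from the current state once, then finish.
    Optional<Long> pull(String key) {
        return Optional.ofNullable(rows.get(key));
    }

    // Push query: register a callback that sees every subsequent change.
    void push(BiConsumer<String, Long> onChange) {
        subscribers.add(onChange);
    }

    // An update arriving from the stream: change state, then notify subscribers.
    void upsert(String key, long value) {
        rows.put(key, value);
        for (BiConsumer<String, Long> subscriber : subscribers) {
            subscriber.accept(key, value);
        }
    }
}
```

In this sketch a subscriber only sees changes made after it registered; a client that needs the current state as well would issue a pull first.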
Slide 31
Real-Time Analytics Database
Slide 32
Real-Time OLAP
• Designed for high concurrency, low latency queries
• Ingests from streaming and batch sources
• Intimate integration with Kafka
• Conventional tables and SQL
Slide 33
Real-Time OLAP
• Analytics shaped like real-time data
• Analytics when users are decision makers
Slide 34
Cloud Data Warehouses
Slide 35
Cloud Data Warehouses
Slide 36
Cloud Data Warehouses
• The cloud-based heir of legacy DWH
• Ingest from batch and streaming sources
• Biased towards structured data and batch access
Slide 37
Data Lake
Slide 38
Data Lake
Anything else
We’ll figure this out
Slide 39
Data Lakes
• Started as the HDFS cluster
• Became S3
• That didn’t help…
• ELT vs. ETL
• Iceberg/Hudi/Delta Lake
Slide 40
Data Lakes
• Storage and compute are radically decoupled
• Structure is relatively less important
• Reads are slow
• Streaming is historically difficult
Slide 41
Technology Selection
No Solutions, only Trade-Offs
Slide 42
Sometimes you go with what you know
Slide 43
This is not bad!
Slide 44
Performance
Slide 45
Community/Adoption
Community
Slide 46
Differentiated Application Code
Area of Exploration Kafka
Slide 47
Slide 48
