One Does Not Simply Query a Stream

Slide 1

One Does Not Simply Query a Stream! Viktor Gamov, Confluent @gamussa Zurich Kafka Meetup, Switzerland 2025 @gamussa | @confluentinc | @apacheflink

Slide 2


Slide 3


Slide 4

Viktor GAMOV Principal Developer Advocate | Confluent THE CLOUD CONNECTIVITY COMPANY Twitter X: @gamussa Kong Confidential

Slide 5

Simpler times Monolith @gamussa | gamov.dev/rel | @ConfluentInc

Slide 6

Simpler analytics ETL and CDC

Slide 7

DWH->Hadoop Mobile Era

Slide 8

Data Pipelines Streaming data pipelines and Microservices

Slide 9

LOG

Slide 10


Slide 11


Slide 12

OLAP vs. OLTP in Streams
OLTP stream vs OLAP streams

Slide 13

Skip Paywall
Sign Up for Confluent Cloud
Get $400 worth of free credits for your first 30 days
Use promo code POPTOUT000MZG62 to skip the paywall!

Slide 14

Our Options
• Connect/Relational DB
• Kafka Streams
• Streaming SQL
• Data Warehouse
• Data Lake
• Real-Time OLAP Database

Slide 15

Kafka Connect

Slide 16

Connect/RDBMS
[Diagram labels: Data Source, Kafka Connect, Cluster (three Brokers), Kafka Connect, Data Sink]

Slide 17

Connect/RDBMS
• Suitable for smaller data
• Transactional
• Familiar to users

Slide 18

Kafka Streams

Slide 19

Kafka Streams (transactional)
• Ingests directly from a topic
• KTable
• Forms an in-memory key/value store suitable for querying by topic key
• Scalable across members of a consumer group
• Readable through Interactive Queries
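To make the KTable bullet concrete, here is a minimal plain-Java sketch of the idea behind it: replaying a keyed stream into a map so that the latest value per key can be looked up. It deliberately does not use the Kafka Streams API. `TableSketch`, `Event`, and `materialize` are hypothetical names, and treating a null value as a delete is an assumption meant to mirror Kafka's tombstone records.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TableSketch {
    // Hypothetical keyed record; a null value stands for a delete (tombstone).
    static final class Event {
        final String key;
        final String value;

        Event(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Replays the events in order and keeps only the latest value per key.
    static Map<String, String> materialize(List<Event> events) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Event e : events) {
            if (e.value == null) {
                table.remove(e.key);
            } else {
                table.put(e.key, e.value);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = materialize(List.of(
            new Event("alice", "Zurich"),
            new Event("bob", "Bern"),
            new Event("alice", "Geneva"), // overwrites alice's earlier value
            new Event("bob", null)));     // deletes bob
        System.out.println(table.get("alice"));
        System.out.println(table.containsKey("bob"));
    }
}
```

Looking up a key in the resulting map is the plain-Java analogue of querying the store by topic key; a real application would go through Interactive Queries instead.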

Slide 20

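Kafka Streams (transactional)
// Read the input topic as a stream of String key/value records
final KStream<String, String> stream = builder.stream(inputTopic, Consumed.with(stringSerde, stringSerde));
// Convert the stream to a table backed by a named state store
final KTable<String, String> convertedTable = stream.toTable(Materialized.as("stream-converted-to-table"));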

Slide 21

Kafka Streams (analytical)
• Full-featured Java stream processing API
• Arbitrary streaming computation
• Can emit new streams (not this talk)
• KTables queryable by key
• Every read pattern requires its own topology
• Interactive Queries again

Slide 22

Kafka Streams (analytical)
// Split each line into lowercase words, group by word, and count into a named store
KTable<String, Long> wordCounts = textLines
    .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
    .groupBy((key, word) -> word)
    .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"));
// Emit the running counts as a changelog stream to an output topic
wordCounts.toStream().to("WordsWithCountsTopic", Produced.with(Serdes.String(), Serdes.Long()));
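As a rough illustration of what the word-count store ends up holding, the following plain-Java sketch applies the same lowercase, split, group, and count steps to an in-memory list of lines. No Kafka is involved; `WordCountSketch` and `count` are hypothetical names, and the empty-token filter is an addition that the slide's topology does not have.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // Mirrors the flatMapValues -> groupBy -> count steps on in-memory input.
    static Map<String, Long> count(List<String> textLines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : textLines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (word.isEmpty()) {
                    continue; // assumption: skip the empty token split() yields for a leading separator
                }
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = count(List.of(
            "One does not simply",
            "query a stream, one does not"));
        System.out.println(counts.get("one"));    // lookup by key
        System.out.println(counts.get("stream"));
    }
}
```

The map is keyed by word, so a lookup by word is cheap, while a question such as "which words occur exactly twice" would need a scan or a second, differently keyed table. That is one way to read the "every read pattern requires its own topology" bullet.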

Slide 23

Streaming SQLs

Slide 24

Streaming SQL
• Materialize
• DeltaStream
• RisingWave
• ksqlDB

Slide 25

Why not Flink?

Slide 26


Slide 27

Materialize
• Replacement data warehouse
• Integrates with Kafka, Postgres, dbt
• The Materialized View is the central abstraction
• Views are persistent and queryable
• Postgres wire-compatible
• Positioned as an analytics solution
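The materialized-view idea can be sketched in plain Java: a view updates its stored result incrementally as each change arrives, so a read never recomputes from scratch. This illustrates the general concept only and is not Materialize's implementation or API; `RevenueView`, `apply`, and `totalFor` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class RevenueView {
    // Conceptually: SELECT customer, SUM(amount) FROM orders GROUP BY customer
    private final Map<String, Long> totals = new HashMap<>();

    // Applies one change: a positive delta for a new order, a negative delta for a retraction.
    void apply(String customer, long amountDelta) {
        totals.merge(customer, amountDelta, Long::sum);
    }

    // Reads the current result for one group without rescanning past orders.
    long totalFor(String customer) {
        return totals.getOrDefault(customer, 0L);
    }

    public static void main(String[] args) {
        RevenueView view = new RevenueView();
        view.apply("acme", 1200);
        view.apply("acme", 800);
        view.apply("acme", -1200); // the first order was cancelled
        System.out.println(view.totalFor("acme"));
    }
}
```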

Slide 28

DeltaStream
• Cloud-native streaming SQL
• Serverless, BYOC
• Kafka, Kinesis integration
• Materialized views and streaming pipelines
• Streaming database and streaming analytics

Slide 29

RisingWave
• Distributed SQL Streaming database
• Cloud and OSS versions
• Implementation of Flink in Rust
• Kafka, Pulsar, Kinesis integrations
• Flink+persistent views
• Postgres wire-compatible

Slide 30

ksqlDB
• «Streaming Database»
• Provides persistent TABLE abstraction
• Pull and Push queries
• Like Kafka Streams, but in SQL
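The difference between pull and push queries can be sketched in plain Java: a pull query returns the current value and completes, while a push query keeps receiving updates as the table changes. This is not the ksqlDB API; `QueryableTable`, `pull`, `push`, and `upsert` are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

class QueryableTable {
    private final Map<String, Long> rows = new HashMap<>();
    private final List<BiConsumer<String, Long>> subscribers = new ArrayList<>();

    // Pull query: look up the current value for a key and return immediately.
    Long pull(String key) {
        return rows.get(key);
    }

    // Push query: register a callback that receives every subsequent change.
    void push(BiConsumer<String, Long> subscriber) {
        subscribers.add(subscriber);
    }

    // Applies an update to the table and notifies all push subscribers.
    void upsert(String key, long value) {
        rows.put(key, value);
        for (BiConsumer<String, Long> s : subscribers) {
            s.accept(key, value);
        }
    }

    public static void main(String[] args) {
        QueryableTable table = new QueryableTable();
        List<String> seen = new ArrayList<>();
        table.upsert("stream", 1);                   // happens before anyone subscribes
        table.push((k, v) -> seen.add(k + "=" + v)); // push query starts here
        table.upsert("stream", 2);
        table.upsert("query", 1);
        System.out.println(table.pull("stream"));    // current value only
        System.out.println(seen);                    // changes since subscription
    }
}
```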

Slide 31

Real-Time Analytics Database

Slide 32

Real-Time OLAP
• Designed for high concurrency, low latency queries
• Ingests from streaming and batch sources
• Intimate integration with Kafka
• Conventional tables and SQL

Slide 33

Real-Time OLAP
• Analytics shaped like real-time data
• Analytics when users are decision makers

Slide 34

Cloud Data Warehouses

Slide 35

Cloud Data Warehouses

Slide 36

Cloud Data Warehouses
• The cloud-based heir of legacy DWH
• Ingest from batch and streaming sources
• Biased towards structured data and batch access

Slide 37

Data Lake

Slide 38

Data Lake
Anything else
We’ll figure this out

Slide 39

Data Lakes
• Started as the HDFS cluster
• Became S3
• That didn’t help…
• ELT vs. ETL
• Iceberg/Hudi/DeltaLake

Slide 40

Data Lakes
• Storage and compute are radically decoupled
• Structure is relatively less important
• Reads are slow
• Streaming is historically difficult