Debezium vs. the world: an overview of the CDC ecosystem

A presentation at Kafka Summit 2024 in March 2024 in London, UK by Marta Paes

Slide 1

Debezium vs. the world
An overview of the CDC ecosystem
Marta Paes, Sr. Product Manager @Materialize

Slide 2

This is not a 🌶 talk. Things move fast. If you notice inaccuracies, or are building a tool that could be featured in a future version of this talk, come around after the talk!

Slide 3

What we talk about when we talk about CDC
Query-based CDC
❌ Some data changes might get lost
❌ DELETE operations are not captured
❌ Trade-off: frequency vs. load on source DBs
❌ Can’t propagate schema changes
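To make the first two drawbacks concrete, here is a minimal sketch of query-based CDC against an in-memory SQLite database. The `customers` table, its `updated_at` column, and the `poll_changes` helper are assumptions made up for illustration; they are not from the talk.

```python
import sqlite3


def poll_changes(conn, last_seen):
    """Return rows modified after last_seen, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.execute("INSERT INTO customers VALUES (1, 'ada', 1), (2, 'bob', 2)")

# First poll sees both rows.
first_poll, mark = poll_changes(conn, 0)

# Between polls: row 1 is updated twice and row 2 is deleted.
conn.execute("UPDATE customers SET name = 'ada l.', updated_at = 3 WHERE id = 1")
conn.execute("UPDATE customers SET name = 'ada lovelace', updated_at = 4 WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 2")

# Second poll sees only the final state of row 1.
second_poll, mark = poll_changes(conn, mark)
print(second_poll)
```

The second poll returns only the latest version of row 1: the intermediate update is lost, and the deletion of row 2 never shows up. Polling more often narrows the window but adds load on the source database.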

Slide 4

What we talk about when we talk about CDC
Query-based CDC
What if we just tapped into the transaction log?

Slide 5

What we talk about when we talk about CDC
Query-based CDC
Log-based CDC
✅ All data changes are captured
✅ More context on the actual changes
✅ Low propagation delay (i.e. near real time)
✅ Less taxing on the source database
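As a rough illustration of why reading the log captures every change, the sketch below models a database that appends each write to an in-memory log, and a consumer that reads from a given position. `ToyDatabase`, `LogEntry`, `lsn`, and `read_log` are hypothetical names for this toy model; a real transaction log (a PostgreSQL WAL or MySQL binlog, for example) is far more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    lsn: int                  # position in the toy log
    op: str                   # "insert", "update" or "delete"
    key: int
    before: Optional[dict]    # row state before the change, if any
    after: Optional[dict]     # row state after the change, if any


class ToyDatabase:
    """Keeps current rows and appends every write to an ordered log."""

    def __init__(self):
        self.rows = {}
        self.log = []

    def _append(self, op, key, before, after):
        self.log.append(LogEntry(len(self.log) + 1, op, key, before, after))

    def insert(self, key, row):
        self.rows[key] = row
        self._append("insert", key, None, row)

    def update(self, key, row):
        before = self.rows[key]
        self.rows[key] = row
        self._append("update", key, before, row)

    def delete(self, key):
        before = self.rows.pop(key)
        self._append("delete", key, before, None)


def read_log(db, after_lsn):
    """Return every log entry past the consumer's last-read position."""
    return [entry for entry in db.log if entry.lsn > after_lsn]


db = ToyDatabase()
db.insert(1, {"name": "ada"})
db.insert(2, {"name": "bob"})
db.update(1, {"name": "ada l."})
db.update(1, {"name": "ada lovelace"})
db.delete(2)

# A consumer that had already read the two inserts resumes from position 2.
changes = read_log(db, 2)
print([entry.op for entry in changes])
```

Both intermediate updates and the delete are visible, each with its before and after state, and the consumer reads the log rather than querying the tables.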

Slide 6

Tale of the tape
Or, how it all started.

Slide 7

How it all started
Like most tools that are a commodity in streaming today, the first CDC systems were developed at internet-scale companies.
2013: Databus (LinkedIn), Wormhole (Facebook), MoSQL (Stripe)

Slide 8

How it all started
Like most tools that are a commodity in streaming today, the first CDC systems were developed at internet-scale companies.
2013: Databus (LinkedIn), Wormhole (Facebook), MoSQL (Stripe)
2015: Maxwell (Zendesk), Bottled Water (Confluent)

Slide 9

How it all started
Like most tools that are a commodity in streaming today, the first CDC systems were developed at internet-scale companies.
2013: Databus (LinkedIn), Wormhole (Facebook), MoSQL (Stripe)
2015: Maxwell (Zendesk), Bottled Water (Confluent)
2016: Debezium (Red Hat), MySQL Streamer (Yelp)

Slide 10

How it all started
Like most tools that are a commodity in streaming today, the first CDC systems were developed at internet-scale companies.
2013: Databus (LinkedIn), Wormhole (Facebook), MoSQL (Stripe)
2015: Maxwell (Zendesk), Bottled Water (Confluent)
2016: Debezium (Red Hat), MySQL Streamer (Yelp)
2018: Spinal Tap (Airbnb)
2019: DBLog (Netflix)

Slide 11

Where it landed
Debezium has become the standard CDC tool over time, with a strong community behind it. Like any tool, it has some good and some less good.
The good 😚
● Deployment via well-understood tools (Kafka + Kafka Connect).
● Standard schema for change events.
● Support for a large number of CDC connectors.
The less good 😕
● At-least-once delivery guarantees*, no transactional consistency OOTB.
● No graceful schema evolution OOTB.

* Exactly-once support (KIP-618) will gradually roll out, starting with the PostgreSQL connector in 2.3.
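Two of the points above, the standard change event schema and at-least-once delivery, can be sketched from the consumer's side. The `op`, `before`, and `after` fields follow the Debezium change event envelope (`c` for create, `u` for update, `d` for delete, `r` for snapshot read). Everything else is an assumption for illustration: the `position` key is a hypothetical stand-in for a source offset, and `apply_event` is not part of any Debezium API.

```python
def apply_event(table, applied, event, key_field="id"):
    """Apply one change event to an in-memory table, skipping redeliveries.

    Returns True if the event was applied, False if it was a duplicate.
    """
    position = event["position"]  # hypothetical offset, assumed unique per event
    if position in applied:
        return False  # already seen: at-least-once delivery sent it again

    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]
        table[row[key_field]] = row
    elif op == "d":
        table.pop(event["before"][key_field], None)
    else:
        raise ValueError(f"unexpected op: {op!r}")

    applied.add(position)
    return True


create = {"position": 1, "op": "c", "before": None,
          "after": {"id": 1, "name": "ada"}}
update = {"position": 2, "op": "u", "before": {"id": 1, "name": "ada"},
          "after": {"id": 1, "name": "ada lovelace"}}

table, applied = {}, set()
# The create event is delivered a second time after the update.
results = [apply_event(table, applied, e) for e in (create, update, create)]
print(results, table)
```

Without the `applied` check, the redelivered create would overwrite the newer row. Keeping a set of every position seen is only workable for a toy; a real consumer would more likely track a high-water mark per partition or rely on idempotent writes in the sink.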

Slide 12

Round 1 🔔
Same same, but different.

Slide 13

“Have you heard about this new CDC tool?”
Myth buster 👻: you don’t need Kafka and Kafka Connect to run Debezium! You can embed it in your applications using the Debezium Engine, or target other sink types (e.g. Amazon Kinesis, Google Pub/Sub) using the Debezium Server.

Slide 14

Running Debezium under the hood
Tools that leverage the Debezium Engine or the Debezium Server can:
● Abstract some of the complexity of operating Debezium et al. from the end user.
● Enable advanced features like schema evolution using existing primitives.
Examples
● Debezium
● Streamkap
● RisingWave CDC connectors
● Flink CDC connectors
● Confluent CDC connectors

Slide 15

Round 2 🔔
CDC for the rest of us.

Slide 16

“Have you heard about streaming?”
Tools building support for CDC from scratch can:
● Create a user experience that is tailored to long-time SQL users.
● Have more control over semantics.
Examples
● Artie
● Estuary
● Materialize
Acquisitions
● HVR (Fivetran)
● Arcion (Databricks)

Slide 17

Decision
Debezium isn’t going anywhere…

Slide 18

…but there’s a whole world to explore!
Check out Materialize and our native PostgreSQL and MySQL CDC sources if you’re considering streaming SQL!