Debezium vs. the world
An overview of the CDC ecosystem
Marta Paes, Sr. Product Manager @Materialize
Slide 2
This is not a 🌶 talk. Things move fast. If you notice inaccuracies, or are building a tool that could be featured in a future version of this talk, come around after the talk!
Slide 3
What we talk about when we talk about CDC
Query-based CDC
❌ Some data changes might get lost
❌ DELETE operations are not captured
❌ Trade-off: frequency vs. load on source DBs
❌ Can’t propagate schema changes
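To make the first two drawbacks concrete, here is a minimal sketch of a query-based poller using Python's built-in sqlite3 module. The `orders` table, its `updated_at` column, and the `poll_changes` helper are all hypothetical names chosen for illustration; real pollers differ in how they track their position.

```python
import sqlite3


def poll_changes(conn, last_seen):
    """Return rows modified after `last_seen`, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    new_last_seen = max((r[2] for r in rows), default=last_seen)
    return rows, new_last_seen


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
)

# Two orders are created, then the poller runs for the first time.
conn.execute("INSERT INTO orders VALUES (1, 'created', 1)")
conn.execute("INSERT INTO orders VALUES (2, 'created', 2)")
first_poll, watermark = poll_changes(conn, last_seen=0)

# Between polls: order 1 changes twice and order 2 is deleted.
conn.execute("UPDATE orders SET status = 'paid', updated_at = 3 WHERE id = 1")
conn.execute("UPDATE orders SET status = 'shipped', updated_at = 4 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")
second_poll, watermark = poll_changes(conn, last_seen=watermark)

print(first_poll)   # both inserts are seen
print(second_poll)  # only the latest state of order 1 is seen
```

The second poll only returns order 1 in its final `shipped` state: the intermediate `paid` update is gone, and nothing signals that order 2 was deleted. Polling more often narrows the gap but adds load on the source database.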
Slide 4
What we talk about when we talk about CDC
Query-based CDC
What if we just tapped into the transaction log?
Slide 5
What we talk about when we talk about CDC
Query-based CDC
Log-based CDC
✅ All data changes are captured
✅ More context on the actual changes
✅ Low propagation delay (i.e. near real time)
✅ Less taxing on the source database
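As a rough illustration of why reading the log captures every change, the sketch below models a toy database that appends an entry to an in-memory log on each write. `ToyDatabase`, `LogEntry`, and the `lsn` position are hypothetical stand-ins, not the format of any real transaction log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    lsn: int                 # hypothetical log position, increasing by 1 per write
    op: str                  # "insert", "update" or "delete"
    key: int
    before: Optional[dict]   # row state before the change (None for inserts)
    after: Optional[dict]    # row state after the change (None for deletes)


class ToyDatabase:
    """A toy table that appends every write to an append-only log."""

    def __init__(self):
        self.table = {}
        self.log = []

    def write(self, op, key, row=None):
        before = self.table.get(key)
        if op == "delete":
            self.table.pop(key, None)
        else:
            self.table[key] = row
        self.log.append(LogEntry(len(self.log) + 1, op, key, before, row))

    def read_log_from(self, lsn):
        """Return all entries after `lsn`, as a log-based reader would."""
        return [e for e in self.log if e.lsn > lsn]


db = ToyDatabase()
db.write("insert", 1, {"status": "created"})
db.write("insert", 2, {"status": "created"})
db.write("update", 1, {"status": "paid"})
db.write("update", 1, {"status": "shipped"})
db.write("delete", 2)

changes = db.read_log_from(0)
for entry in changes:
    print(entry.lsn, entry.op, entry.key, entry.before, entry.after)
```

Every intermediate update and the delete show up as their own entries, each with the row state before and after the change. The reader only needs to remember the last position it processed, so it never has to re-query the tables themselves.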
Slide 6
Tale of the tape Or, how it all started.
Slide 7
How it all started
Like most tools that are a commodity in streaming today, the first CDC systems were developed at internet-scale companies.
2013: Databus (LinkedIn), Wormhole (Facebook), MoSQL (Stripe)
Slide 8
How it all started
Like most tools that are a commodity in streaming today, the first CDC systems were developed at internet-scale companies.
2013: Databus (LinkedIn), Wormhole (Facebook), MoSQL (Stripe)
2015: Maxwell (Zendesk), Bottled Water (Confluent)
Slide 9
How it all started
Like most tools that are a commodity in streaming today, the first CDC systems were developed at internet-scale companies.
2013: Databus (LinkedIn), Wormhole (Facebook), MoSQL (Stripe)
2015: Maxwell (Zendesk), Bottled Water (Confluent)
2016: Debezium (Red Hat), MySQL Streamer (Yelp)
Slide 10
How it all started
Like most tools that are a commodity in streaming today, the first CDC systems were developed at internet-scale companies.
2013: Databus (LinkedIn), Wormhole (Facebook), MoSQL (Stripe)
2015: Maxwell (Zendesk), Bottled Water (Confluent)
2016: Debezium (Red Hat), MySQL Streamer (Yelp)
2018: Spinal Tap (Airbnb)
2019: DBLog (Netflix)
Slide 11
Where it landed
Debezium has become the standard CDC tool over time, with a strong community behind it. Like any tool, it has some good and some less good.
The good 😚
● Deployment via well-understood tools (Kafka + Kafka Connect).
● Standard schema for change events.
● Support for a large number of CDC connectors.
The less good 😕
● At-least-once delivery guarantees*, no transactional consistency OOTB.
● No graceful schema evolution OOTB.
* Exactly-once support (KIP-618) will gradually roll out, starting with the PostgreSQL connector in 2.3.
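At-least-once delivery means a consumer may see the same change event more than once, for example after a restart. One common way to cope is to make the consumer idempotent. The sketch below is a minimal illustration under stated assumptions: the `op` codes (`c`, `u`, `d`) and the `after` field loosely follow the Debezium change event envelope, while `key`, `position`, and the `IdempotentSink` class are hypothetical simplifications.

```python
class IdempotentSink:
    """Applies change events to a dict, ignoring redelivered events."""

    def __init__(self):
        self.rows = {}
        self.last_position = {}  # key -> highest position already applied

    def apply(self, event):
        key, position = event["key"], event["position"]
        if position <= self.last_position.get(key, -1):
            return False  # duplicate or stale redelivery: skip it
        if event["op"] == "d":
            self.rows.pop(key, None)
        else:  # "c" (create) or "u" (update)
            self.rows[key] = event["after"]
        self.last_position[key] = position
        return True


events = [
    {"key": 1, "position": 1, "op": "c", "after": {"status": "created"}},
    {"key": 2, "position": 2, "op": "c", "after": {"status": "created"}},
    {"key": 2, "position": 3, "op": "d", "after": None},
    # Redelivery of an event that was already applied.
    {"key": 2, "position": 2, "op": "c", "after": {"status": "created"}},
    {"key": 1, "position": 4, "op": "u", "after": {"status": "paid"}},
]

sink = IdempotentSink()
applied = [sink.apply(e) for e in events]
print(applied)
print(sink.rows)
```

Without the position check, the redelivered create event would bring the deleted row for key 2 back to life. This only handles duplicates per key; it does not restore transactional consistency across keys.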
Slide 12
Round 1 🔔 Same same, but different.
Slide 13
“Have you heard about this new CDC tool?”
Myth buster 👻: you don’t need Kafka and Kafka Connect to run Debezium! You can embed it in your applications using the Debezium Engine, or target other sink types (e.g. Amazon Kinesis, Google Pub/Sub) using the Debezium Server.
Slide 14
Running Debezium under the hood
Tools that leverage the Debezium Engine or the Debezium Server can:
● Abstract some complexity of operating Debezium et al. from the end user.
● Enable advanced features like schema evolution using existing primitives.
Examples
Debezium
Streamkap
RisingWave CDC connectors
Flink CDC connectors
Confluent CDC connectors
Slide 15
Round 2 🔔 CDC for the rest of us.
Slide 16
“Have you heard about streaming?”
Tools building support for CDC from scratch can:
● Create a user experience that is tailored to long-time SQL users.
● Have more control over semantics.
Examples
Artie
Estuary
Materialize
Acquisitions
HVR (Fivetran)
Arcion (Databricks)
Slide 17
Decision Debezium isn’t going anywhere…
Slide 18
…but there’s a whole world to explore! Check out Materialize and our native PostgreSQL and MySQL CDC sources if you’re considering streaming SQL!