The Changing Face of ETL: Event-Driven Architectures for Data Engineers
Photo by rmoff
@rmoff
Build data pipelines better. Currently they are inflexible, slow, and brittle. The technology now exists for scalable, flexible, low-latency pipelines. (Talk length: Big Data Tech Warsaw, 30 minutes; Kafka Paris Meetup, 43 minutes)
Photo by Samuel Sianipar on Unsplash
Think about pipelines:
• Traditional ETL: building the DW/DL
• Integration: building pipelines to feed other systems, e.g. IoT -> time series, log aggregation, etc.
Photo by Rohit Tandon on Unsplash
Pipelines grow larger, more complex
Photo by Theodore Moore on Unsplash
Pipelines become intertwined and tightly coupled, and difficult to unravel
Photo by Cristian Grecu on Unsplash
We’ve all got skeletons in our pipelines of which we’re not proud
@rmoff #stratadata
Photo by Patrick Fore on Unsplash
It used to be so simple
The Changing Face of ETL: Event-Driven Architectures for Data Engineers
It used to be a single DB from a single mandated vendor, with a few transactional systems. Load it into a single centralised DW.
Photo by Eugenio Mazzone on Unsplash
More Sources
Microservices store data wherever they want. Diverse technology, on-premises and cloud. SaaS and third-party data.
Photo by Tom Barrett on Unsplash
More Targets
More users of the data: not just a data warehouse / data mart / data lake any more. Other analytics platforms and stores (e.g. S3, HDFS, Snowflake, BigQuery). Specialised technologies: graph, full-text search, NoSQL.
Photo by Kirill on Unsplash
More Data
It seems obvious to mention Big Data, but we still build systems the same way we did 40 years ago. We have orders of magnitude more data (IoT, mobile, app-generated), and more diverse data sets too.
Batches and Buckets
Despite this, we do things the same way we always did: batch. We wait… and then we process the data. We land it, we pick it up, we process it. LATENCY. Downstream use is dictated by upstream assumptions.
Analytics: Tell Us What Happened (→ how many orders were placed)
Applications: Respond (→ an order was placed!)
Photo by Deva Darshan from Pexels
Data flows used to be one way. We need to think beyond our silos: it's the same data. Historically, the technology was such that you had to have this divide, the OLTP/OLAP compromise. Batch ETL was the inevitable sticking plaster on top of that.
All systems need a way of exchanging data. Analytics generates new data, which drives applications. Applications need contextual data to improve the user experience. Applications need to get data to analytics at lower latency.
Photo by NASA on Unsplash
Ultimately, we need a common way to work with data. Systems and teams across a company need to use the same data in a loosely coupled way. Not compromising, and not crowbarring everything into the new shiny technology, but adopting a unified platform. This enables both apps and analytics to be better: lower latency, a more flexible architecture, more scalable. The common denominator here is events.
Photo by Mark Kamalov on Unsplash
Events
• All data is built from events
• Events are the lowest granularity of data
• Events describe our business
An event is both:
• Notification
• State transfer
• “We sold something” -> what did we sell, to whom did we sell it
• “Someone clicked a link” -> what did they click, who clicked it
A Customer Experience
Events model our business. They can describe real-world interactions.
A Sensor Reading
Events can also be generated by machines
Events
Basket: Bread, Tinned Spaghetti
Usually we start with state and how we’re going to store it. Let’s consider an online retailer. A basket at checkout might look like this. What it doesn’t show is how that basket was created.
Events
Basket: Bread
Events: ItemAdd Bread
An event happened - “something was added to the basket” -> what was added
Events
Basket: Bread, Baked Beans
Events: ItemAdd Bread, ItemAdd Baked Beans
Then we add Baked beans
Events
Basket: Bread
Events: ItemAdd Bread, ItemAdd Baked Beans, ItemRemove Baked Beans
Then we change our minds, and take beans out
Events
Basket: Bread, Tinned Spaghetti
Events: ItemAdd Bread, ItemAdd Baked Beans, ItemRemove Baked Beans, ItemAdd Tinned Spaghetti
And then we add tinned spaghetti
The stream of events describes the behaviour: the interaction with our business. If we simply capture state (the final basket), we lose the behavioural information.
From a stream of events you can derive state. It's the same concept as in analytics: you can aggregate up, but not down. From state we cannot discern the events that created it, but from events we can build state.
So all we actually need to accurately model our business is the event stream. Everything else can be built from that. This event stream can be implemented using an event streaming platform, such as Apache Kafka.
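A minimal sketch of “from events we can build state”, in plain Java rather than any Kafka API: it replays the basket events from the slides and folds them into the current basket. The BasketEvent and EventType types and the replay method are hypothetical names, and the sketch assumes each ItemRemove cancels one earlier ItemAdd of the same item.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event type: an item was added to or removed from the basket.
enum EventType { ITEM_ADD, ITEM_REMOVE }

record BasketEvent(EventType type, String item) {}

class BasketReplay {
    // Fold the ordered event stream into the current basket state.
    // Assumes ITEM_REMOVE refers to one previously added unit of the item.
    static List<String> replay(List<BasketEvent> events) {
        List<String> basket = new ArrayList<>();
        for (BasketEvent e : events) {
            switch (e.type()) {
                case ITEM_ADD -> basket.add(e.item());
                case ITEM_REMOVE -> basket.remove(e.item()); // removes the first matching entry
            }
        }
        return basket;
    }

    public static void main(String[] args) {
        List<BasketEvent> events = List.of(
            new BasketEvent(EventType.ITEM_ADD, "Bread"),
            new BasketEvent(EventType.ITEM_ADD, "Baked Beans"),
            new BasketEvent(EventType.ITEM_REMOVE, "Baked Beans"),
            new BasketEvent(EventType.ITEM_ADD, "Tinned Spaghetti"));
        // State is derived from the events...
        System.out.println(replay(events)); // [Bread, Tinned Spaghetti]
        // ...while the events still carry behaviour that the final state has lost.
        long removals = events.stream().filter(e -> e.type() == EventType.ITEM_REMOVE).count();
        System.out.println("Items removed before checkout: " + removals); // 1
    }
}
```

The second line of output is the point of the slide: nothing in the final basket says that baked beans were ever considered, but the event stream does.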
Databases
More interestingly, databases are also streams of events. This might seem unintuitive at first. When most people think about databases, they immediately think of tables.
“The truth is the log. The database is a cache of a subset of the log.” (Pat Helland, “Immutability Changes Everything”, CIDR 2015: http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf)
Photo by Bobby Burch on Unsplash
What is an Event Streaming Platform?
(Diagram: Producer, Consumer, Connectors, The Log, Streaming Engine)
Apache Kafka is an event streaming platform: a persisted event stream, stream processing, and integration. It's a distributed system that scales horizontally, and is run at scale by companies such as Netflix and Uber.
Immutable Event Log
(Diagram: the log runs from Old to New; messages are added at the end of the log)
A distributed, append-only, immutable event log. Persisted. New messages are written to the end.
Topics
(Diagram: topics named Clicks, Orders and Customers)
Topics are similar in concept to tables in a database
Data is arranged as topics, akin to tables in a database.
Partitions
(Diagram: the Clicks topic split into partitions P0, P1 and P2)
Messages are guaranteed to be strictly ordered within a partition
Messages are just K/V bytes
plus headers + timestamp
(Diagram: a message in the Clicks topic, made up of Header, Timestamp, Key and Value)
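Since ordering is only guaranteed within a partition, messages that must stay in order (say, all the clicks by one user) need to land in the same partition, which is typically done by giving them the same key. The plain-Java sketch below models a message as headers, timestamp, key and value, and picks a partition by hashing the key. It is an illustration only: Kafka's default partitioner hashes the serialised key bytes with murmur2, whereas this sketch uses String.hashCode(), and the Message record and partitionFor method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a message: headers + timestamp + key + value.
// In Kafka the key and value are just bytes; Strings are used here for readability.
record Message(Map<String, String> headers, long timestamp, String key, String value) {}

class KeyPartitioning {
    // Same key -> same partition, so per-key order is preserved.
    // (Kafka's default partitioner uses murmur2 on the key bytes, not hashCode.)
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Append each message, in arrival order, to the partition chosen by its key.
    static List<List<Message>> partition(List<Message> messages, int numPartitions) {
        List<List<Message>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());
        for (Message m : messages) partitions.get(partitionFor(m.key(), numPartitions)).add(m);
        return partitions;
    }

    public static void main(String[] args) {
        List<Message> clicks = List.of(
            new Message(Map.of(), 1L, "user-1", "/home"),
            new Message(Map.of(), 2L, "user-2", "/basket"),
            new Message(Map.of(), 3L, "user-1", "/checkout"));
        List<List<Message>> p = partition(clicks, 3);
        // Both user-1 clicks are in the same partition, in the order they were written.
        System.out.println(p.get(partitionFor("user-1", 3)));
    }
}
```

Any deterministic function of the key gives the same guarantee; the catch is that changing the number of partitions changes the key-to-partition mapping.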
With great power comes great responsibility
Because messages are just bytes, producers and consumers must agree on a serialisation format:
• Avro (-> Confluent Schema Registry)
• Protobuf
• JSON
• CSV
https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf
Consumers have a position all of their own
(Diagram: the log runs from Old to New; Sally is here, scanning from her own position)
Consumers read from a position in the log and commit when done; Kafka stores their offset.
(Diagram: Fred is here too, scanning from a different position to Sally)
Other consumers can read the same data. Data is not transient: it is persisted according to the configured retention settings.
(Diagram: George, Fred and Sally each scan from their own position)
Message replay. No slow consumer problem: topics don't care who's reading them, and different consumers can read from different offsets.
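A minimal in-memory sketch of the log and consumer positions, assuming a single partition and no retention limit: an append-only log that hands out offsets, and consumers (Sally, Fred, George) that each track their own committed position, so a slow reader never holds up another and any reader can rewind to replay. EventLog and its methods are hypothetical names, not the Kafka consumer API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-partition, in-memory append-only log.
class EventLog {
    private final List<String> entries = new ArrayList<>();
    // Committed offset per consumer (Kafka stores these for consumer groups).
    private final Map<String, Integer> committed = new HashMap<>();

    // Messages are only ever added at the end; returns the new message's offset.
    int append(String message) {
        entries.add(message);
        return entries.size() - 1;
    }

    // Read up to max messages from the consumer's committed position, then commit.
    // Reading never removes data, so other consumers are unaffected.
    List<String> poll(String consumer, int max) {
        int from = Math.min(committed.getOrDefault(consumer, 0), entries.size());
        int to = Math.min(from + max, entries.size());
        List<String> batch = new ArrayList<>(entries.subList(from, to));
        committed.put(consumer, to);
        return batch;
    }

    // Message replay: move a consumer back to an earlier offset.
    void seek(String consumer, int offset) {
        committed.put(consumer, offset);
    }

    int position(String consumer) {
        return committed.getOrDefault(consumer, 0);
    }
}

class ConsumerDemo {
    public static void main(String[] args) {
        EventLog log = new EventLog();
        for (String m : List.of("a", "b", "c", "d")) log.append(m);
        System.out.println(log.poll("Sally", 4)); // [a, b, c, d]
        System.out.println(log.poll("Fred", 1));  // [a]  (Fred has his own position)
        log.seek("Sally", 0);                     // replay from the start
        System.out.println(log.poll("Sally", 2)); // [a, b]
    }
}
```

Because reading doesn't delete anything, adding George as a third consumer, or rewinding Sally for a replay, has no effect on anyone else.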
The Connect API
(Diagram: Producer, Consumer, Connectors, The Log, Streaming Engine)
Kafka Connect for Integration
Streaming Integration with Kafka Connect
(Diagram: sources such as syslog feed tasks running on Kafka Connect workers, which write to the Kafka brokers)
(Diagram: tasks running on Kafka Connect workers read from the Kafka brokers and write to sinks such as Amazon S3 and Google BigQuery)
(Diagram: syslog -> Kafka Connect -> Kafka brokers -> Kafka Connect -> Amazon S3 and Google BigQuery)
Stream Processing in Kafka
(Diagram: Producer, Consumer, Connectors, The Log, Streaming Engine)
Stream processing: transform messages as they pass through Kafka, and write them back to another Kafka topic.
Kafka Streams API
final StreamsBuilder builder = new StreamsBuilder();
// stringSerde and ordersSerde are assumed to be Serdes defined elsewhere
builder.stream("orders", Consumed.with(stringSerde, ordersSerde))
       .filter((key, order) -> "COMPLETE".equals(order.getStatus()))
       .to("complete_orders", Produced.with(stringSerde, ordersSerde));
Kafka Streams is part of Apache Kafka: a Java library that integrates stream processing natively into your application.
Stream Processing with KSQL
CREATE STREAM completedOrders AS SELECT * FROM orders WHERE status='COMPLETE';
KSQL is a project from Confluent. Build stream processing applications declared in a SQL-like language.
Photo by Ash from Modern Afflatus on Unsplash
This is Something New
Reset your assumptions; squeegee your third eye. Don't do the same thing just because that's what you always did.
• You don't need a database! The event log can rebuild state if you need it, but you don't always need it, so why add a database?
• It's human nature to look for the parallel between a new situation and a familiar one: driving a new car, where's the handbrake, where's the accelerator?
• An event stream is different
• Where we're going, we don't need cars
Events in Action
(Diagram: review events are streamed into the reviews topic)
Let’s take the example from earlier of our online retailer. Users can leave reviews for products. These get streamed into Kafka. You’ll note that we’re not writing them to a database!
(Diagram: review events -> reviews -> operational dashboard)
We want to do different things with the reviews. We can use Kafka Connect to stream them to Elasticsearch and provide a Kibana dashboard for the customer ops team.
(Diagram: review events -> reviews -> operational dashboard and data lake)
We can stream the same data to our data lake
(Diagram: review events -> reviews -> filter out bad data -> reviews_clean -> operational dashboard and data lake)
CREATE STREAM reviews_clean AS SELECT * FROM reviews WHERE id IS NOT NULL;
We can also tidy up the data as it passes through the system: write the transformed data back to Kafka, and stream that to the targets instead.
(Diagram: existing apps write user data to an RDBMS; Kafka Connect reads the RDBMS transaction log into the users topic in Kafka)
Whilst the review data is useful, it only includes the user ID, e.g. “42”. We want to know more about the users, and this information (name, email, loyalty status) is held in a database. Funnily enough, databases are also built on top of an immutable event log: the transaction log! We can mine the transaction log with Kafka Connect and stream it into Kafka.
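Mining the transaction log works because a table is just the result of applying its change events in order. A minimal plain-Java sketch, assuming a simplified change-event shape (operation, key, row after the change) that is hypothetical and much simpler than what real CDC connectors emit; the Op, Change and UsersTable names and the sample rows are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified CDC change event: the operation, the primary key,
// and the row after the change (null for deletes).
enum Op { INSERT, UPDATE, DELETE }

record Change(Op op, int userId, Map<String, String> after) {}

class UsersTable {
    // Materialise the current table state by applying changes in log order.
    static Map<Integer, Map<String, String>> materialise(List<Change> log) {
        Map<Integer, Map<String, String>> table = new LinkedHashMap<>();
        for (Change c : log) {
            switch (c.op()) {
                case INSERT, UPDATE -> table.put(c.userId(), c.after());
                case DELETE -> table.remove(c.userId());
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Change> txnLog = List.of(
            new Change(Op.INSERT, 42, Map.of("name", "Sam", "status", "Silver")),
            new Change(Op.UPDATE, 42, Map.of("name", "Sam", "status", "Platinum")),
            new Change(Op.INSERT, 7, Map.of("name", "Alex", "status", "Gold")),
            new Change(Op.DELETE, 7, null));
        Map<Integer, Map<String, String>> users = materialise(txnLog);
        System.out.println(users.get(42).get("status")); // Platinum
        System.out.println(users.containsKey(7));        // false
    }
}
```

This is the Pat Helland point in miniature: the table is a cache of the log, and replaying the same log always rebuilds the same table.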
(Diagram: review events -> reviews -> reviews_clean -> operational dashboard and data lake; user data -> users)
User data is now in a Kafka topic, synchronised with the database in real time. Join events to the reference information as they arrive, to improve the data written to the dashboard and the data lake.
CREATE STREAM reviews_clean AS SELECT * FROM reviews WHERE id IS NOT NULL;
CREATE STREAM enriched_reviews AS SELECT * FROM reviews_clean r INNER JOIN users u ON r.userid=u.userid;
(Diagram, “Join events to users, and filter”: reviews -> reviews_clean; reviews_clean joined with users -> enriched_reviews -> operational dashboard and data lake)
Use KSQL, or Kafka Streams, to do the transformation. Transform once, use many.
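Conceptually, the enriched_reviews join keeps the latest row per user (a table built from the users topic) and looks each review up as it arrives. A plain-Java sketch of that filter-then-inner-join, with hypothetical Review, User and EnrichedReview types and made-up sample values; it ignores the partitioning, timing and fault-tolerance concerns that KSQL and Kafka Streams handle for you.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical record types for the reviews and users topics.
record Review(Integer id, int userid, int rating, String text) {}
record User(int userid, String name, String status) {}
record EnrichedReview(Review review, User user) {}

class ReviewEnricher {
    // Table side: the latest row per userid, updated as user change events arrive.
    private final Map<Integer, User> users = new HashMap<>();

    void onUser(User u) {
        users.put(u.userid(), u);
    }

    // Stream side: drop bad data (WHERE id IS NOT NULL), then inner join on userid.
    // A review with no matching user produces no output, as with INNER JOIN.
    Optional<EnrichedReview> onReview(Review r) {
        if (r.id() == null) return Optional.empty();
        User u = users.get(r.userid());
        return u == null ? Optional.empty() : Optional.of(new EnrichedReview(r, u));
    }

    public static void main(String[] args) {
        ReviewEnricher enricher = new ReviewEnricher();
        enricher.onUser(new User(42, "Sam", "Platinum")); // made-up user row
        System.out.println(enricher.onReview(new Review(1, 42, 2, "Too salty")).isPresent()); // true
        System.out.println(enricher.onReview(new Review(2, 99, 5, "Lovely")).isPresent());    // false
    }
}
```

As in a stream-table join, a review that arrives before its user row finds no match; it is not retroactively joined when the user turns up later.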
(Diagram: review events and user data feed the operational dashboard, the data lake and a notification service)
We can also drive applications from these events. Let's imagine we want to alert our ops team if an important customer leaves a bad review. There's no need to write the reviews to a data store for the service to then poll and query. Instead, filter the reviews as they arrive and route them to a new topic. This gives a separation of responsibilities: the notification service just subscribes to the topic, and the same data can also be sent to the ops dashboard in Elasticsearch.
CREATE STREAM unhappy_vips AS SELECT * FROM enriched_reviews WHERE rating < 3 AND status = 'Platinum';
(Diagram, “Join events to users, and filter”: reviews -> reviews_clean; reviews_clean joined with users -> enriched_reviews; enriched_reviews -> unhappy_vips -> notification service; operational dashboard and data lake as before)
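The unhappy_vips query is a filter whose output becomes a new topic, and the notification service only subscribes to that topic. A minimal sketch of that separation of responsibilities using in-memory stand-ins: the Topic class (push-based, with no persistence, unlike a Kafka topic), the EnrichedRow record and the sample values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical flattened enriched_reviews row: rating from the review, status from the user.
record EnrichedRow(int rating, String status, String text) {}

// In-memory stand-in for a topic: delivers each published message to every subscriber.
class Topic<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();

    void subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(T message) {
        for (Consumer<T> s : subscribers) s.accept(message);
    }
}

class UnhappyVipRouting {
    // Equivalent of: WHERE rating < 3 AND status = 'Platinum'
    static boolean isUnhappyVip(EnrichedRow r) {
        return r.rating() < 3 && "Platinum".equals(r.status());
    }

    public static void main(String[] args) {
        Topic<EnrichedRow> enrichedReviews = new Topic<>();
        Topic<EnrichedRow> unhappyVips = new Topic<>();

        // Routing step: filter events as they arrive and write matches to a new topic.
        enrichedReviews.subscribe(r -> { if (isUnhappyVip(r)) unhappyVips.publish(r); });

        // The notification service only knows about unhappy_vips; it never polls a database.
        List<String> alerts = new ArrayList<>();
        unhappyVips.subscribe(r -> alerts.add("Alert ops: " + r.text()));

        enrichedReviews.publish(new EnrichedRow(1, "Platinum", "Arrived broken"));
        enrichedReviews.publish(new EnrichedRow(1, "Silver", "Late"));
        enrichedReviews.publish(new EnrichedRow(5, "Platinum", "Great"));
        System.out.println(alerts); // [Alert ops: Arrived broken]
    }
}
```

The notification service would need no changes if the filter rule changed, which is the loose coupling the slide is describing.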
Photo by rmoff
The Power of an Event-Driven Architecture
There are some powerful things that an event-first architecture gives you, but just like when you get taken out of the Matrix (your warm, comfortable way of doing things today), the shock can be extreme. Understand what your pain points are and relate events to those, and be aware of the other potential of events, so that you can build the best architecture holistically. Key benefits:
• Accurate modelling of what happened
• A simpler, more powerful, more flexible architecture
• Data when you need it
• Scale when you need it
Not Everything is a Nail
(Diagram, built up over several steps: events alongside an RDBMS, then Elasticsearch, then Graph)
Best tool for the job
Side-by-Side Tech Evaluation
(Diagram, built up over several steps: the same events streamed to HDFS, then also to BigQuery, then also to Snowflake)
Mix & match to find the best. No slow consumer problem.
Evolve Data Sources
(Diagram, built up over three steps: an on-premises producer feeds Consuming App A and Consuming App B; a cloud producer is added alongside it; finally only the cloud producer remains, and both consuming apps carry on unchanged)
Tight Coupling != Flexible
(Diagram, built up over three steps: Orders, RDBMS, HDFS, App)
Loose Coupling == Freedom to Evolve
(Diagram, built up over three steps: RDBMS, Orders, HDFS, App)
Still use a data lake, sure, but load it through Kafka, so that when you do want that feed of data you're not trying to retrofit some abomination of a pipeline such as HDFS -> Kafka -> app/analytics. If you're asking “How do I get data from HDFS into Kafka?”, you're doing it wrong. (Image idea: pipes taped together with gaffer tape; a nail in a screw hole.)
Transform Once, Use Many: Data Cleansing
(Diagram, built up over four steps: IoT devices write to the temp_raw topic, which is read by two apps and an RDBMS; then each consumer has to cleanse the data itself; finally a single SENSOR_ID IS NOT NULL filter writes a temp_clean topic that the consumers read instead)
temp_raw:
sensor_id  time_epoch  reading
42         1551136074  13.05
42         1551136125  13.11
(null)     1551136125  13.11
42         1551138129  13.04
temp_clean (SENSOR_ID IS NOT NULL):
sensor_id  time_epoch  reading
42         1551136074  13.05
42         1551136125  13.11
42         1551138129  13.04
Transform once, use many. Cleansing data: filter out bad records.
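The cleansing step is one filter applied once, upstream of every consumer. A plain-Java sketch using the temp_raw rows from the slide, where a null sensor_id marks the bad record; the Reading record and TempCleanse class are hypothetical stand-ins for the message schema and the stream processor. In KSQL this would presumably be written in the style of the reviews_clean query, e.g. CREATE STREAM temp_clean AS SELECT * FROM temp_raw WHERE SENSOR_ID IS NOT NULL;

```java
import java.util.List;

// Hypothetical schema for a temp_raw message; sensorId is nullable to model bad data.
record Reading(Integer sensorId, long timeEpoch, double reading) {}

class TempCleanse {
    // The one shared transformation: keep only records that have a sensor ID.
    static List<Reading> cleanse(List<Reading> tempRaw) {
        return tempRaw.stream()
                .filter(r -> r.sensorId() != null) // SENSOR_ID IS NOT NULL
                .toList();
    }

    public static void main(String[] args) {
        // The four temp_raw rows from the slide.
        List<Reading> tempRaw = List.of(
            new Reading(42, 1551136074L, 13.05),
            new Reading(42, 1551136125L, 13.11),
            new Reading(null, 1551136125L, 13.11),
            new Reading(42, 1551138129L, 13.04));
        List<Reading> tempClean = cleanse(tempRaw);
        System.out.println(tempClean.size()); // 3
    }
}
```

The apps and the RDBMS then all read temp_clean, so none of them carries its own copy of the cleansing logic.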
Transform Once, Use Many: Data Enrichment
(Diagram, built up over three steps: App 01 joins the events against the RDBMS itself; App 02, feeding Elasticsearch, does its own join too; finally a single join enriches the events once, for both App 01 and Elasticsearch)
Enrich the event stream with user info once, rather than having each application that needs it call back to the source directly.
Message Payload Compatibility
(Diagram, built up over three steps: a producer and a consuming app; a second producer is added; “Triangles to Squares”: its payload is transformed into the shape the consuming app expects)
Transforming. Schema compatibility.
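Schema compatibility is what lets producers and consumers evolve independently. As a deliberately simplified sketch of one rule, Avro-style backward compatibility (a consumer on the new schema can read data written with the old one): every field in the new schema must either exist in the old schema with the same type, or have a default value. Real schema registries also handle type promotion, nested types, aliases and other compatibility modes; the Field record and isBackwardCompatible method are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical, flattened view of a record schema field.
record Field(String name, String type, boolean hasDefault) {}

class SchemaCompat {
    // Simplified backward-compatibility check: can a reader using newSchema
    // decode data written with oldSchema?
    static boolean isBackwardCompatible(List<Field> oldSchema, List<Field> newSchema) {
        Map<String, Field> written = oldSchema.stream()
                .collect(Collectors.toMap(Field::name, Function.identity()));
        for (Field f : newSchema) {
            Field w = written.get(f.name());
            if (w == null) {
                if (!f.hasDefault()) return false; // new field with nothing to fill it from
            } else if (!w.type().equals(f.type())) {
                return false;                      // type changed (no promotion in this sketch)
            }
        }
        return true; // fields dropped from the new schema are simply ignored
    }

    public static void main(String[] args) {
        List<Field> v1 = List.of(new Field("id", "int", false), new Field("name", "string", false));
        List<Field> v2 = List.of(new Field("id", "int", false), new Field("name", "string", false),
                                 new Field("region", "string", true));
        List<Field> v3 = List.of(new Field("id", "string", false), new Field("name", "string", false));
        System.out.println(isBackwardCompatible(v1, v2)); // true: the new field has a default
        System.out.println(isBackwardCompatible(v1, v3)); // false: id changed type
    }
}
```

A registry that runs a check like this before accepting a new schema version is what stops a producer change from silently breaking its consumers.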
Build Resilient Pipelines with Schemas
(Diagram, two steps: first a producer writes CSV to the sales_csv topic, and App 01 and App 02 must each apply the schema themselves: COL1 -> ID INT, COL2 -> NAME VARCHAR. Then the schema is applied once, the data is written to the sales topic with its schema held in the Schema Registry, and both apps read from that)
CSV -> Avro
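Applying the schema once, upstream, means every consumer gets typed, validated records instead of re-parsing CSV. A minimal plain-Java sketch of the “Apply schema” step for the COL1 -> ID INT, COL2 -> NAME VARCHAR mapping on the slide; the Sale record, the naive comma split and the reject-on-error policy are assumptions, and serialising the result as Avro with the schema registered in the Schema Registry is not shown.

```java
import java.util.Optional;

// Typed record produced by applying the schema: COL1 -> ID INT, COL2 -> NAME VARCHAR.
record Sale(int id, String name) {}

class ApplySchema {
    // Parse one sales_csv line; return empty for lines that don't fit the schema,
    // so bad data is caught once rather than in every consuming app.
    // Naive split: assumes no quoted fields or embedded commas.
    static Optional<Sale> apply(String csvLine) {
        String[] cols = csvLine.split(",", -1);
        if (cols.length != 2) return Optional.empty();
        try {
            return Optional.of(new Sale(Integer.parseInt(cols[0].trim()), cols[1].trim()));
        } catch (NumberFormatException e) {
            return Optional.empty(); // COL1 is not an INT
        }
    }

    public static void main(String[] args) {
        System.out.println(apply("1,Widget"));   // Optional[Sale[id=1, name=Widget]]
        System.out.println(apply("one,Widget")); // Optional.empty
    }
}
```

Rejected lines could equally be routed to a separate topic for inspection; dropping them is just the simplest policy for a sketch.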
Photo by rmoff
Say NO to brittle pipelines
An event streaming platform gives you the freedom to EVOLVE as REQUIREMENTS and TECHNOLOGY change
Photo by Benjamin Lambert on Unsplash
EVOLVE, don't GAMBLE
How do I know what we'll want to use? At scale? Next month? Next year?
Latency requirements
Users of the data
GAMBLE ON:
• Latency requirements
• Number of applications for the data
• Data fidelity / event stream -> behaviour
• Scale
(Diagram: apps connected point-to-point to caches, MQs, monitoring, security, search, a DWH and Hadoop)
Here is what I've seen. Some apps use an enterprise MQ, and data moves around using custom ETL scripts in batch. Over time, this ad-hoc way of connecting every new type of source to every type of destination, where everything talks to everything else, just doesn't scale.
(Diagram: apps, the DWH and Hadoop all connected through KAFKA, which carries request-response, changelogs, messaging or stream processing, and streaming data pipelines)
Making event-centric thinking available at a company-wide level is very much why we built Apache Kafka. We had a very particular vision for what a company would look like if you reimagined its use of data around streams of events.
Photo by rmoff
Events model the real world
Don't be afraid: it's going to happen. “All I want to know is how many beans I sold.” Events can be aggregated to tell you that, but when your business wants to exploit the event stream to understand customer interactions with the business, events provide that too. Event streams give a perfect foundation for building “normal” ETL upon, so by adopting them now you prepare yourself for the future.
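To make “events can be aggregated to tell you that” concrete, here is a plain-Java sketch that counts units sold per product from a list of sale events. The SaleEvent shape and the product figures are hypothetical; in KSQL or Kafka Streams this would typically be a continuously updated GROUP BY or count, rather than the one-off calculation shown here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sale event: one line item sold.
record SaleEvent(String product, int quantity) {}

class UnitsSold {
    // Aggregate up from events to a total per product. The events themselves are
    // untouched, so a richer event shape could still answer other questions later.
    static Map<String, Integer> totals(List<SaleEvent> events) {
        Map<String, Integer> totals = new TreeMap<>();
        for (SaleEvent e : events) totals.merge(e.product(), e.quantity(), Integer::sum);
        return totals;
    }

    public static void main(String[] args) {
        List<SaleEvent> sales = List.of(
            new SaleEvent("Baked Beans", 2),
            new SaleEvent("Bread", 1),
            new SaleEvent("Baked Beans", 3));
        System.out.println(totals(sales).get("Baked Beans")); // 5
    }
}
```

Going the other way is impossible: from the total of 5 alone you cannot recover the individual sales, which is why the events are the better thing to keep.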
• Event streaming platform
• Data persistence
• Flexibility & scalability
• Native stream processing
• Data when you need it
Photo by rmoff
http://cnfl.io/book-bundle
Resources
• CDC Spreadsheet
• Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
• #partner-engineering on Slack for questions
• BD team (#partners / partners@confluent.io) can help with introductions on a given sales op
#EOF