Embrace the Anarchy: Apache Kafka's Role in Modern Data Architectures
Robin Moffatt (@rmoff) / Confluent
Photo by Jaak Horn on Unsplash
A presentation at Munich Microservices Meetup in August 2018 in Munich, Germany by Robin Moffatt
$ whoami
• Developer Advocate @ Confluent
• Working in data & analytics since 2001
• Oracle Developer Champion
• Blogging: http://rmoff.net & http://cnfl.io/rmoff
• Twitter: @rmoff
• Geek stuff
• Beer & Fried Breakfasts
https://speakerdeck.com/rmoff/
Apache Kafka is a Streaming Platform
Why do we need a streaming platform?
One of the reasons: Decoupling
A case in point… Analytics
Analytics: In the beginning… [diagram: Sales, DWH]
And then there were more data sources… [diagram: Sales, Inventory, DWH]
Batch Transformations… (ETL / ELT) [diagram: Sales, Inventory, DWH]
Add a Data Lake… [diagram: Sales, Inventory, DWH, Data Lake]
…or Replace the Data Warehouse [diagram: Sales, Inventory, Data Lake]
Still need to do Batch transformations… [diagram: Sales, Inventory, Data Lake]
Want your data anytime!? Batch is Latency built in by Design
The World has Changed: Microservices, Mobile, Machine Learning, Internet of Things
Photo by Denys Nevozhai on Unsplash
Lots of new technologies (whether you like it or not)
Photo by Rosie Fraser on Unsplash
[diagram: App ×4, search, Hadoop, DWH, monitoring, security, MQ ×2, cache ×2]
[diagram: KAFKA, DWH, Hadoop, App ×8]
request-response
messaging OR stream processing streaming data pipelines
changelogs
Apache Kafka is a Streaming Platform
Three Lenses
What is Apache Kafka?
01 Messaging Done Right
02 Scalable Streaming Data Pipelines
03 Foundation for Stream Processing
Lens 1: Messaging Done Right (Scalability, True Storage, Real-Time Processing)
Lens 2: Scalable Streaming Data Pipelines
Lens 3: Foundation for Stream Processing (KSQL is the Streaming SQL Engine for Apache Kafka)
The Streaming Platform
The Streaming Platform: Event-Driven, Scalable, Decoupled
Bold claim: all your data is event streams
A Customer Experience
A Sale
A Sensor Reading
An Application Log Entry
Databases
Do you think that’s a table you are querying?
The Table Stream Duality
Table (Account ID, Balance): 12345, €50
Stream (Account ID, Amount): 12345
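The duality can be sketched in a few lines of Python. This is only an illustration, not Kafka code: the function names and the transaction amounts are made up for the example. Folding a stream of amounts gives the table of balances, and comparing two table snapshots gives back a changelog stream.

```python
from collections import defaultdict


def table_from_stream(events):
    """Fold a stream of (account_id, amount) events into a balance table."""
    balances = defaultdict(int)
    for account_id, amount in events:
        balances[account_id] += amount
    return dict(balances)


def changelog_from_tables(before, after):
    """Emit (account_id, new_balance) for every row that changed between snapshots."""
    return [(key, value) for key, value in sorted(after.items())
            if before.get(key) != value]


# Hypothetical transactions in whole euros (not taken from the slide).
events = [("12345", 20), ("12345", 30), ("67890", 10)]

table = table_from_stream(events)
changes = changelog_from_tables(table_from_stream(events[:1]), table)

print(table)
print(changes)
```

The table is just the latest state per key; the stream keeps every change that led to it, so the table can always be rebuilt from the stream.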
“The truth is the log. The database is a cache of a subset of the log.”
Pat Helland, Immutability Changes Everything
http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
Photo by Bobby Burch on Unsplash
A Brief Look at Kafka's Technology
Apache Kafka
Kafka: A Distributed Commit Log. Publish and subscribe to streams of records. Highly scalable, high throughput. Supports transactions. Persisted data. Stream processing.
Writes are append only. Reads are a single seek & scan.
Producer & Consumer APIs: Open-source client libraries for numerous languages, to directly integrate with your applications.
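A toy, in-memory sketch of the commit log idea, assuming a single partition and no persistence. The `CommitLog` and `Consumer` classes are hypothetical names for this illustration and are not Kafka's client API. Writes only ever append, a read is a seek to an offset followed by a scan, and each consumer tracks its own offset, so a slow consumer and a fast one can share the same log.

```python
class CommitLog:
    """Append-only list of records; a record's offset is its position."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Write at the end of the log and return the new record's offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=100):
        """Seek to `offset`, then scan forward up to `max_records` records."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Reads from a log and remembers how far it has got."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=100):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = CommitLog()
realtime = Consumer(log)   # polls after every write
nightly = Consumer(log)    # polls once, later

seen_realtime = []
for event in ["order-1", "order-2", "order-3"]:
    log.append(event)
    seen_realtime.extend(realtime.poll())

seen_nightly = nightly.poll()
print(seen_realtime, seen_nightly)
```

Because reading does not remove anything from the log, the producer never needs to know how many consumers exist or how often they read.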
Apache Kafka [diagram: Orders, Customers, Table, Kafka Streams API]
Kafka Connect API: Reliable and scalable integration of Kafka with other systems, no coding required.
Kafka Streams API: Write standard Java applications & microservices to process your data in real-time
KSQL is a Declarative Stream Processing Language
KSQL is the Streaming SQL Engine for Apache Kafka
KSQL in Development and Production
Interactive KSQL for development and testing (REST): “Hmm, let me try out this idea...”
Headless KSQL for Production: Desired KSQL queries have been identified
KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• syslog data
• Sensor / IoT data

CREATE STREAM SYSLOG_INVALID_USERS AS
  SELECT HOST, MESSAGE
  FROM SYSLOG
  WHERE MESSAGE LIKE '%Invalid user%';

http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
KSQL for Anomaly Detection
Identifying patterns or anomalies in real-time data, surfaced in milliseconds

CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3;
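To make the windowing logic concrete, here is a rough Python equivalent of the query above. It is a sketch under assumptions: timestamps are whole seconds, the sample attempts are invented, and the function runs over a finished list, whereas KSQL evaluates the same logic continuously as events arrive.

```python
from collections import Counter


def possible_fraud(attempts, window_size=5, threshold=3):
    """Count attempts per (card, tumbling window) and keep counts above threshold.

    `attempts` is an iterable of (timestamp_seconds, card_number).
    """
    counts = Counter()
    for timestamp, card_number in attempts:
        # Tumbling windows do not overlap: each event falls in exactly one.
        window_start = (timestamp // window_size) * window_size
        counts[(card_number, window_start)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical authorization attempts: (seconds since start, card number).
attempts = [(0, "A"), (1, "A"), (2, "A"), (4, "A"),
            (4, "B"), (5, "A"), (6, "B")]

flagged = possible_fraud(attempts)
print(flagged)
```

Card "A" has four attempts in the window starting at second 0, so it is the only one flagged; its fifth attempt lands in the next window and starts a fresh count.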
KSQL for Streaming ETL
Joining, filtering, and aggregating streams of event data

CREATE STREAM vip_actions AS
  SELECT userid, page, action
  FROM clickstream c
  LEFT JOIN users u ON c.userid = u.user_id
  WHERE u.level = 'Platinum';
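The join above can be pictured as a lookup of each click event against the current state of the users table. The Python sketch below is an illustration with made-up sample data, assuming the table is a dictionary keyed by `user_id`. Clicks with no matching user survive the LEFT JOIN but are then dropped by the filter on `level`, which is how the SQL behaves as well.

```python
def vip_actions(clickstream, users, level="Platinum"):
    """Enrich each click with its user row, then keep only the given level."""
    for event in clickstream:
        user = users.get(event["userid"])  # LEFT JOIN: None when there is no match
        if user is not None and user["level"] == level:
            yield {"userid": event["userid"],
                   "page": event["page"],
                   "action": event["action"]}


# Hypothetical reference data and events.
users = {
    1: {"user_id": 1, "level": "Platinum"},
    2: {"user_id": 2, "level": "Gold"},
}
clickstream = [
    {"userid": 1, "page": "/home", "action": "view"},
    {"userid": 2, "page": "/cart", "action": "add"},
    {"userid": 3, "page": "/home", "action": "view"},  # unknown user
    {"userid": 1, "page": "/checkout", "action": "buy"},
]

result = list(vip_actions(clickstream, users))
print(result)
```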
What Problems does Kafka Solve?
Event-Centric Thinking: “A product was viewed” [diagram: Web app, Streaming Platform, Hadoop]
Event-Centric Thinking: “A product was viewed” [diagram: Web app, mobile app, APIs, Streaming Platform, Hadoop]
Event-Centric Thinking: “A product was viewed” [diagram: mobile app, web app, APIs, Streaming Platform, Hadoop, Security, Monitoring, Rec engine]
System Availability and Event Buffering [diagram: Producer, Consumer]
Varying Latency Requirements / Batch vs Stream [diagram: Producer, 24hr batch extract, Consumer A]
Varying Latency Requirements / Batch vs Stream [diagram: Producer, 24hr batch extract, Consumer A, Consumer B]
Varying Latency Requirements / Batch vs Stream [diagram: Producer, 24hr batch extract, Realtime, Consumer A, Consumer B]
Varying Latency Requirements / Batch vs Stream [diagram: Producer, Realtime, Realtime, 24hr batch extract, Consumer A, Consumer B]
Technology & Code/Algo Version Changes [diagram: Producer, Consumer (v1)]
Technology & Code/Algo Version Changes [diagram: Producer, Consumer (v1), Consumer (v2)]
Technology & Code/Algo Version Changes [diagram: Producer, Consumer (v2)]
Architectural Patterns with Apache Kafka
Building for the Future
Photo by Christopher Burns on Unsplash
Tightly-coupled = Inflexible
Analytics - Database Offload [diagram: RDBMS, CDC, HDFS / S3 / BigQuery etc]
Stream Processing with Apache Kafka and KSQL [diagram: RDBMS, CDC, customer, order events, Stream Processing, customer orders]
Real-time Event Stream Enrichment [diagram: RDBMS, CDC, customer, order events, Stream Processing, customer orders, <y>]
Transform Once, Use Many [diagram: RDBMS, CDC, customer, order events, Stream Processing, customer orders, <y>, New App <x>]
Transform Once, Use Many [diagram: RDBMS, CDC, customer, order events, Stream Processing, customer orders, <y>, HDFS / S3 / etc, New App <x>]
Evolve processing from old systems to new [diagram: RDBMS, CDC, Stream Processing, Existing App, New App <x>]
Evolve processing from old systems to new [diagram: RDBMS, CDC, Stream Processing, Existing App, New App <x>, New App <y>]
Want your data anytime!? Batch is Latency built in by Design
You say that like "latency" is a synonym for "evil"
It's all about the Events!
So… Analytics and Kafka
The Vision! "One version of the truth"
The Reality…
Pragmatism is… "One version of the truth"
"One version of the truth" [diagram: Streaming Platform, Stream Processing]
"One version of the truth" [diagram: Streaming Platform, Stream Processing, ML, App, <y>, NoSQL, Search, Graph]
Confluent Platform
Database Changes, Log Events, IoT Data, Web Events, …
CRM, Data Warehouse, Database, Hadoop, Data Integration, …
Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …
Apache Open Source: Apache Kafka® (Core | Connect API | Streams API)
Confluent Open Source: Data Compatibility (Schema Registry); Development and Connectivity (Clients | Connectors | REST Proxy | CLI); SQL Stream Processing (KSQL)
Confluent Enterprise: Monitoring & Administration (Confluent Control Center | Security); Operations (Replicator | Auto Data Balancing)
Confluent Open Source: Apache Kafka with a bunch of cool stuff! For free!
Free Books! https://www.confluent.io/apache-kafka-stream-processing-book-bundle
Confluent Streaming Event, Munich: http://cnfl.io/streaming-event-munich
@rmoff robin@confluent.io https://www.confluent.io/download/ http://cnfl.io/slack
Resources
• CDC Spreadsheet
• Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
• #partner-engineering on Slack for questions
• BD team (#partners / partners@confluent.io) can help with introductions on a given sales op
#EOF