Embrace the Anarchy: Apache Kafka's Role in Modern Data Architectures

A presentation at Munich Microservices Meetup in August 2018 in Munich, Germany by Robin Moffatt

Slide 1

@rmoff / Embrace the Anarchy—Apache Kafka's Role in Modern Data Architectures
Embrace the Anarchy: Apache Kafka's Role in Modern Data Architectures
Robin Moffatt / Confluent
Photo by Jaak Horn on Unsplash

Slide 2

$ whoami
• Developer Advocate @ Confluent
• Working in data & analytics since 2001
• Oracle Developer Champion
• Blogging: http://rmoff.net & http://cnfl.io/rmoff
• Twitter: @rmoff
• Geek stuff
• Beer & Fried Breakfasts
https://speakerdeck.com/rmoff/

Slide 3

Apache Kafka is a Streaming Platform

Slide 4

Why do we need a streaming platform?

Slide 5

One of the reasons: Decoupling

Slide 6

A case in point… Analytics

Slide 7

Analytics: In the beginning…
(Diagram: Sales, DWH)

Slide 8

And then there were more data sources…
(Diagram: Sales, Inventory, DWH)

Slide 9

Batch Transformations… (ETL / ELT)
(Diagram: Sales, Inventory, DWH)

Slide 10

Add a Data Lake…
(Diagram: Sales, Inventory, DWH, Data Lake)

Slide 11

…or Replace the Data Warehouse
(Diagram: Sales, Inventory, Data Lake)

Slide 12

Still need to do Batch transformations…
(Diagram: Sales, Inventory, Data Lake)

Slide 13

Want your data anytime?!
Batch is Latency built in by Design

Slide 14

The World has Changed
Microservices, Mobile, Machine Learning, Internet of Things
Photo by Denys Nevozhai on Unsplash

Slide 15

Lots of new technologies (whether you like it or not)
Photo by Rosie Fraser on Unsplash

Slide 16

(Diagram: point-to-point connections between four Apps, search, Hadoop, DWH, monitoring, security, two MQs and two caches)

Slide 17

(Diagram: KAFKA connected to DWH, Hadoop and eight Apps)
request-response
messaging OR stream processing
streaming data pipelines
changelogs

Slide 18

Apache Kafka is a Streaming Platform

Slide 19

Three Lenses

Slide 20

What is Apache Kafka?
01 Messaging Done Right
02 Scalable Streaming Data Pipelines
03 Foundation for Stream Processing

Slide 21

Lens 1: Messaging Done Right
Scalability, True Storage, Real-Time Processing

Slide 22

Lens 2: Scalable Streaming Data Pipelines

Slide 23

Lens 3: Foundation for Stream Processing
KSQL is the Streaming SQL Engine for Apache Kafka

Slide 24

The Streaming Platform

Slide 25

The Streaming Platform
Event-Driven, Scalable, Decoupled

Slide 26

Bold claim: all your data is event streams

Slide 27

A Customer Experience

Slide 28

A Sale

Slide 29

A Sensor Reading

Slide 30

An Application Log Entry

Slide 31

Databases

Slide 32

Do you think that’s a table you are querying?

Slide 33

The Table-Stream Duality
Stream (Account ID, Amount), over time:
12345  +€50
12345  +€25
12345  -€60
Table (Account ID, Balance), after each event:
12345  €50
12345  €75
12345  €15
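The duality can be sketched in a few lines of Python. This is a hypothetical in-memory illustration, not Kafka or KSQL code. The stream is a list of (account ID, amount) change events. The table is a dict holding the latest balance per account, and it is updated as each event is applied. Amounts are whole euros, an assumption that keeps the arithmetic exact.

```python
from typing import Dict, List, Tuple

# Hypothetical change events from the slide: (account_id, amount in euros).
stream: List[Tuple[int, int]] = [(12345, 50), (12345, 25), (12345, -60)]


def materialize(events: List[Tuple[int, int]]) -> List[Dict[int, int]]:
    """Fold the stream into a table, returning the table's state after each event."""
    table: Dict[int, int] = {}
    snapshots: List[Dict[int, int]] = []
    for account_id, amount in events:
        # Each event updates the current balance for its key.
        table[account_id] = table.get(account_id, 0) + amount
        # Copy the table so later events don't alter earlier snapshots.
        snapshots.append(dict(table))
    return snapshots


for state in materialize(stream):
    print(state)
```

Replaying the same events from the start always rebuilds the same table. That property is what allows the log, rather than the table, to be treated as the source of truth.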

Slide 34

The truth is the log.
The database is a cache of a subset of the log.
Pat Helland, Immutability Changes Everything: http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
Photo by Bobby Burch on Unsplash

Slide 35

A Brief Look at Kafka's Technology

Slide 36

Apache Kafka
Kafka: A Distributed Commit Log. Publish and subscribe to streams of records. Highly scalable, high throughput. Supports transactions. Persisted data. Stream processing.
Writes are append only; reads are a single seek & scan.
Producer & Consumer APIs: open-source client libraries for numerous languages, to directly integrate with your applications.
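A toy single-partition log in Python can make the "append-only writes, seek-and-scan reads" idea concrete. The CommitLog class and its methods are hypothetical. They say nothing about how Kafka stores data on disk and only show why an offset is all a reader needs in order to resume.

```python
from typing import List


class CommitLog:
    """A toy, in-memory, single-partition commit log (hypothetical, not Kafka's API)."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> int:
        """Writes only ever go on the end of the log; returns the record's offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[str]:
        """A read seeks to `offset`, then scans forward sequentially."""
        return self._records[offset:offset + max_records]


log = CommitLog()
for event in ["order-1 created", "order-1 paid", "order-2 created"]:
    log.append(event)

print(log.read(1))  # ['order-1 paid', 'order-2 created']
```

Existing records are never modified in place. As a result, many readers can scan the same log concurrently, with each one tracking its own offset.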

Slide 37

Apache Kafka
(Diagram: Orders, Customers, Table)
Kafka Connect API: Reliable and scalable integration of Kafka with other systems, no coding required.
Kafka Streams API: Write standard Java applications & microservices to process your data in real-time.

Slide 38

KSQL is a Declarative Stream Processing Language

Slide 39

KSQL is the Streaming SQL Engine for Apache Kafka

Slide 40

KSQL in Development and Production
Interactive KSQL (REST) for development and testing: "Hmm, let me try out this idea..."
Headless KSQL for Production: desired KSQL queries have been identified

Slide 41

KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• syslog data
• Sensor / IoT data

CREATE STREAM SYSLOG_INVALID_USERS AS
  SELECT HOST, MESSAGE
  FROM SYSLOG
  WHERE MESSAGE LIKE '%Invalid user%';

http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
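The filter above can be emulated over an in-memory list to show what the query does to each record. The sample syslog records, including the extra FACILITY field, are made up for illustration. The real KSQL statement runs continuously against SYSLOG and writes matching rows to a new stream backed by a Kafka topic. The LIKE pattern is modelled here as a case-sensitive substring test, which is an assumption of this sketch.

```python
from typing import Dict, Iterable, Iterator

# Made-up sample records; HOST and MESSAGE follow the KSQL query above.
SYSLOG = [
    {"HOST": "web01", "FACILITY": "auth", "MESSAGE": "Invalid user admin from 10.0.0.5"},
    {"HOST": "web01", "FACILITY": "auth", "MESSAGE": "Accepted publickey for robin"},
    {"HOST": "db01", "FACILITY": "auth", "MESSAGE": "Invalid user oracle from 10.0.0.9"},
]


def syslog_invalid_users(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield HOST and MESSAGE for records whose MESSAGE contains 'Invalid user'."""
    for record in records:
        # WHERE MESSAGE LIKE '%Invalid user%', modelled as a substring test.
        if "Invalid user" in record["MESSAGE"]:
            # SELECT HOST, MESSAGE: keep only the projected columns.
            yield {"HOST": record["HOST"], "MESSAGE": record["MESSAGE"]}


for row in syslog_invalid_users(SYSLOG):
    print(row)
```

Because the function is a generator, it handles records one at a time as they arrive. That makes it closer in spirit to a continuous query than filtering a finished list would be.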

Slide 42

KSQL for Anomaly Detection
Identifying patterns or anomalies in real-time data, surfaced in milliseconds

CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3;
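The windowed aggregation can be sketched in Python to show how events are bucketed. The sketch assumes millisecond timestamps and tumbling windows aligned to multiples of the window size, and the sample attempts are invented. This version computes the result in one pass over a finished list. KSQL instead updates each count as events arrive, so a card is flagged as soon as its fourth attempt within a window lands.

```python
from collections import Counter
from typing import Dict, List, Tuple

WINDOW_MS = 5_000  # WINDOW TUMBLING (SIZE 5 SECONDS)
THRESHOLD = 3      # HAVING count(*) > 3

# Invented (timestamp_ms, card_number) authorization attempts.
attempts: List[Tuple[int, str]] = [
    (1_000, "4111"), (1_500, "4111"), (2_000, "4111"), (4_900, "4111"),
    (3_000, "5500"),
    (5_100, "4111"),  # lands in the next window, [5000, 10000)
]


def possible_fraud(events: List[Tuple[int, str]]) -> Dict[Tuple[str, int], int]:
    """Count attempts per (card_number, window_start) and keep counts above THRESHOLD."""
    counts: Counter = Counter()
    for ts, card in events:
        # Tumbling windows are fixed-size and non-overlapping, so each
        # event belongs to exactly one window.
        window_start = ts - ts % WINDOW_MS
        counts[(card, window_start)] += 1
    return {key: n for key, n in counts.items() if n > THRESHOLD}


print(possible_fraud(attempts))  # {('4111', 0): 4}
```

Card 4111 makes five attempts in total, but only four fall in the same window. The fifth starts a new window's count instead of extending the first one.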

Slide 43

KSQL for Streaming ETL
Joining, filtering, and aggregating streams of event data

CREATE STREAM vip_actions AS
  SELECT userid, page, action
  FROM clickstream c
  LEFT JOIN users u ON c.userid = u.user_id
  WHERE u.level = 'Platinum';
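A stream-table join enriches each event with the current row for its key. The sketch below emulates the query in plain Python. The users dict stands in for the table and the clickstream list stands in for the stream, and all sample values are invented.

```python
from typing import Dict, Iterable, Iterator, Optional

# Stand-in for the users table: latest row per user_id (invented data).
users: Dict[int, Dict[str, str]] = {
    1: {"level": "Platinum"},
    2: {"level": "Silver"},
}

# Stand-in for the clickstream stream (invented data).
clickstream = [
    {"userid": 1, "page": "/checkout", "action": "click"},
    {"userid": 2, "page": "/home", "action": "view"},
    {"userid": 3, "page": "/home", "action": "view"},  # no matching users row
]


def vip_actions(clicks: Iterable[dict], users_table: Dict[int, Dict[str, str]]) -> Iterator[dict]:
    for c in clicks:
        # LEFT JOIN: look up the table row for this key; None if absent.
        u: Optional[Dict[str, str]] = users_table.get(c["userid"])
        level = u["level"] if u is not None else None
        # WHERE u.level = 'Platinum'
        if level == "Platinum":
            yield {"userid": c["userid"], "page": c["page"], "action": c["action"]}


print(list(vip_actions(clickstream, users)))
```

The WHERE clause tests a column from the table side, so clicks with no matching user (a NULL level) are filtered out. In this query the LEFT JOIN therefore returns the same rows an inner join would. In KSQL, each click is joined against the table's state at the time the click is processed. Later changes to a user's level do not rewrite earlier output.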

Slide 44

What Problems does Kafka Solve?

Slide 45

Event-Centric Thinking
(Diagram: web app, "A product was viewed", Streaming Platform, Hadoop)

Slide 46

Event-Centric Thinking
(Diagram: web app, mobile app, APIs, "A product was viewed", Streaming Platform, Hadoop)

Slide 47

Event-Centric Thinking
(Diagram: web app, mobile app, APIs, "A product was viewed", Streaming Platform, Hadoop, Security Monitoring, Rec engine)

Slide 48

System Availability and Event Buffering
(Diagram: Producer, Consumer)

Slide 49

System Availability and Event Buffering
(Diagram: Producer, Consumer)

Slide 50

Varying Latency Requirements / Batch vs Stream
(Diagram: Producer, 24hr batch extract, Consumer A)

Slide 51

Varying Latency Requirements / Batch vs Stream
(Diagram: Producer, 24hr batch extract, Consumer A, Consumer B)

Slide 52

Varying Latency Requirements / Batch vs Stream
(Diagram: Producer, 24hr batch extract, Consumer A, Consumer B)

Slide 53

Varying Latency Requirements / Batch vs Stream
(Diagram: Producer, 24hr batch extract, Realtime, Consumer A, Consumer B)

Slide 54

Varying Latency Requirements / Batch vs Stream
(Diagram: Producer, 24hr batch extract, Realtime, Consumer A, Consumer B)

Slide 55

Varying Latency Requirements / Batch vs Stream
(Diagram: Producer, Consumer A, 24hr batch extract, Realtime, Realtime, Consumer B)
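The slides on event buffering and varying latency make the same point: consumers read the same log independently, each from its own position. The following is a minimal sketch under these assumptions:
- The topic is a single-partition in-memory list.
- The hypothetical poll helper tracks one committed offset per consumer group.
- This is not the Kafka consumer API.

In the sketch, the realtime consumer polls after every event and the batch consumer polls once at the end of the day. A consumer that was offline simply resumes where it left off.

```python
from typing import Dict, List

topic: List[str] = []           # stand-in for a single-partition topic
committed: Dict[str, int] = {}  # consumer group -> next offset to read


def produce(event: str) -> None:
    # The producer only appends; it never waits for any consumer.
    topic.append(event)


def poll(group: str) -> List[str]:
    """Return everything after the group's committed offset, then commit."""
    start = committed.get(group, 0)
    batch = topic[start:]
    committed[group] = len(topic)
    return batch


realtime_seen: List[List[str]] = []
for event in ["sale-1", "sale-2", "sale-3"]:
    produce(event)
    realtime_seen.append(poll("consumer-b-realtime"))  # low latency: one event at a time

nightly_extract = poll("consumer-a-batch")  # 24hr batch: everything in one go

print(realtime_seen)    # [['sale-1'], ['sale-2'], ['sale-3']]
print(nightly_extract)  # ['sale-1', 'sale-2', 'sale-3']
```

Moving Consumer A from batch to realtime changes only how often it polls. Neither the producer nor the data has to change.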

Slide 56

Technology & Code/Algo Version Changes
(Diagram: Producer, Consumer (v1))

Slide 57

Technology & Code/Algo Version Changes
(Diagram: Producer, Consumer (v1), Consumer (v2))

Slide 58

Technology & Code/Algo Version Changes
(Diagram: Producer, Consumer (v2))
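Because the log retains events, a new version of a consumer can be deployed alongside the old one and process the same history from the beginning. The old version is retired afterwards. In this sketch the two classification rules and their thresholds are hypothetical. Reading from offset 0 stands in for starting a new consumer group with its offset reset to the earliest record.

```python
from typing import Callable, Dict, List

# Retained order events (invented data).
log: List[Dict[str, int]] = [
    {"order_id": 1, "amount": 120},
    {"order_id": 2, "amount": 80},
    {"order_id": 3, "amount": 300},
]


def consume(logic: Callable[[Dict[str, int]], str], from_offset: int = 0) -> List[str]:
    """Run a consumer's logic over the log, starting at `from_offset`."""
    return [logic(event) for event in log[from_offset:]]


def classify_v1(event: Dict[str, int]) -> str:
    return "large" if event["amount"] > 200 else "small"


def classify_v2(event: Dict[str, int]) -> str:
    # Revised (hypothetical) rule: lower threshold.
    return "large" if event["amount"] > 100 else "small"


v1_results = consume(classify_v1)
v2_results = consume(classify_v2)  # replays the full history from offset 0

print(v1_results)  # ['small', 'small', 'large']
print(v2_results)  # ['large', 'small', 'large']
```

Once the v2 output has been checked against v1, the v1 consumer can be switched off. The producer is untouched throughout.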

Slide 59

Architectural Patterns with Apache Kafka

Slide 60

Building for the Future
Photo by Christopher Burns on Unsplash

Slide 61

Tightly-coupled = Inflexible

Slide 62

Analytics: Database Offload
(Diagram: RDBMS, CDC, HDFS / S3 / BigQuery etc.)

Slide 63

Stream Processing with Apache Kafka and KSQL
(Diagram: RDBMS, CDC, order events, customer, Stream Processing, customer orders)

Slide 64

Real-time Event Stream Enrichment
(Diagram: RDBMS, CDC, order events, customer, Stream Processing, customer orders, <y>)

Slide 65

Transform Once, Use Many
(Diagram: RDBMS, CDC, order events, customer, Stream Processing, customer orders, <y>, New App <x>)

Slide 66

Transform Once, Use Many
(Diagram: RDBMS, CDC, order events, customer, Stream Processing, customer orders, <y>, HDFS / S3 / etc, New App <x>)
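The enrichment pattern can be sketched end to end in Python. The change-event shape (op, id, after) is a hypothetical simplification of what a CDC tool emits. The two downstream readers are stand-ins for the new application and the HDFS / S3 sink. The sketch works in three steps:
- The customer table is kept up to date from the CDC events.
- Each order is enriched with the customer row that is current at that moment.
- The enriched customer orders are written once and read by both consumers.

```python
from typing import Dict, List, Optional, Tuple

customers: Dict[int, Dict[str, str]] = {}  # table built from CDC events
customer_orders: List[dict] = []           # enriched output, written once


def apply_change(change: dict) -> None:
    """Apply one (hypothetical-format) CDC event to the customer table."""
    if change["op"] == "delete":
        customers.pop(change["id"], None)
    else:  # "insert" or "update": store the latest row image
        customers[change["id"]] = change["after"]


def enrich(order: dict) -> None:
    """Join an order event to the current customer row."""
    customer: Optional[Dict[str, str]] = customers.get(order["customer_id"])
    name = customer["name"] if customer is not None else None
    customer_orders.append({**order, "customer_name": name})


# Invented events from the two sources, in processing order.
events: List[Tuple[str, dict]] = [
    ("cdc", {"op": "insert", "id": 1, "after": {"name": "Alice"}}),
    ("order", {"order_id": 100, "customer_id": 1}),
    ("cdc", {"op": "update", "id": 1, "after": {"name": "Alice Smith"}}),
    ("order", {"order_id": 101, "customer_id": 1}),
    ("cdc", {"op": "delete", "id": 1}),
    ("order", {"order_id": 102, "customer_id": 1}),
]

for source, payload in events:
    if source == "cdc":
        apply_change(payload)
    else:
        enrich(payload)

# Transform once, use many: two independent readers of the same output.
new_app_view = [o for o in customer_orders if o["customer_name"] is not None]
archive = list(customer_orders)  # stand-in for an HDFS / S3 sink

print([o["customer_name"] for o in archive])  # ['Alice', 'Alice Smith', None]
```

The join runs once, and every downstream consumer reads its output. Each new use case therefore does not need to repeat the lookup against the source database.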

Slide 67

Evolve processing from old systems to new
(Diagram: RDBMS, CDC, Existing App, Stream Processing, New App <x>)

Slide 68

Evolve processing from old systems to new
(Diagram: RDBMS, CDC, Existing App, Stream Processing, New App <x>, New App <y>)

Slide 69

Want your data anytime?!
Batch is Latency built in by Design
"You say that like 'latency' is a synonym for 'evil'"

Slide 70

It's all about the Events!

Slide 71

So… Analytics and Kafka

Slide 72

The Vision!
"One version of the truth"

Slide 73

The Reality…

Slide 74

Pragmatism is…
"One version of the truth"

Slide 75

(Diagram: Streaming Platform, Stream Processing, "One version of the truth")

Slide 76

(Diagram: Streaming Platform, Stream Processing, ML, App <y>, NoSQL, Search, Graph, "One version of the truth")

Slide 77

Confluent Open Source: Apache Kafka with a bunch of cool stuff! For free!
(Diagram: Database Changes, Log Events, IoT Data, Web Events, …; CRM, Data Warehouse, Database, Hadoop, Data Integration, …; Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …)
Confluent Platform
Apache Open Source / Confluent Open Source:
• Apache Kafka®: Core | Connect API | Streams API
• Data Compatibility: Schema Registry
• Development and Connectivity: Clients | Connectors | REST Proxy | CLI
• SQL Stream Processing: KSQL
Confluent Enterprise:
• Monitoring & Administration: Confluent Control Center | Security
• Operations: Replicator | Auto Data Balancing

Slide 78

Free Books!
https://www.confluent.io/apache-kafka-stream-processing-book-bundle

Slide 79

Confluent Streaming Event, Munich
http://cnfl.io/streaming-event-munich

Slide 80

@rmoff
robin@confluent.io
https://www.confluent.io/download/
http://cnfl.io/slack

Slide 81

Resources
• CDC Spreadsheet
• Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
• #partner-engineering on Slack for questions
• BD team (#partners / partners@confluent.io) can help with introductions on a given sales opportunity
#EOF