The Changing Face of ETL: Event-Driven Architectures for Data Engineers

A presentation at Øredev 2019 in November 2019 in Malmö, Sweden by Robin Moffatt

Slide 1

The Changing Face of ETL: Event-Driven Architectures for Data Engineers
Photo by rmoff
@oredev @rmoff

Slide 2

Photo by Samuel Sianipar on Unsplash

Slide 3

Photo by Khai Sze Ong on Unsplash

Slide 4

Photo by Rainier Ridao on Unsplash

Slide 5

Photo by Rohit Tandon on Unsplash

Slide 6

Photo by Theodore Moore on Unsplash

Slide 7

Photo by Cristian Grecu on Unsplash

Slide 8

It used to be so simple
Photo by Patrick Fore on Unsplash

Slide 9

More Sources
Photo by Eugenio Mazzone on Unsplash

Slide 10

More Targets
Photo by Tom Barrett on Unsplash

Slide 11

More Data
Photo by Kirill on Unsplash

Slide 12

Batches and Buckets

Slide 13

Applications Respond → an order was placed!
Analytics Tell Us What Happened → how many orders were placed
Photo by Deva Darshan from Pexels

Slide 14

Slide 15

Photo by NASA on Unsplash

Slide 16

Events
Photo by Mark Kamalov on Unsplash

Slide 17

An event is both:
* Notification
* State transfer

Slide 18

A Customer Experience

Slide 19

A Sensor Reading

Slide 20

Databases

Slide 21

The Stream/Table Duality

Stream (events over time):
Account ID   Amount
12345        +€50
12345        +€25
12345        -€60

Table (state after each event):
Account ID   Balance
12345        €50
12345        €75
12345        €15
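
The duality can be sketched in plain Java: folding the account 12345 stream (+€50, +€25, -€60) reproduces each table state above. This is an illustrative in-memory model assuming whole-euro amounts; `StreamTableDuality`, `Txn` and `replay` are hypothetical names, not part of any Kafka API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch (hypothetical names): a table is the result of folding a stream of changes.
class StreamTableDuality {

    // One event on the stream: an amount (whole euros) applied to an account.
    static final class Txn {
        final String accountId;
        final int amountEur;

        Txn(String accountId, int amountEur) {
            this.accountId = accountId;
            this.amountEur = amountEur;
        }
    }

    // Replays the stream in order and returns a snapshot of the table after each event.
    static List<Map<String, Integer>> replay(List<Txn> stream) {
        Map<String, Integer> balances = new LinkedHashMap<>();
        List<Map<String, Integer>> snapshots = new ArrayList<>();
        for (Txn t : stream) {
            balances.merge(t.accountId, t.amountEur, Integer::sum);
            snapshots.add(new LinkedHashMap<>(balances)); // copy, so later events don't alter it
        }
        return snapshots;
    }

    public static void main(String[] args) {
        List<Txn> stream = List.of(
                new Txn("12345", 50),
                new Txn("12345", 25),
                new Txn("12345", -60));
        for (Map<String, Integer> table : replay(stream)) {
            System.out.println(table);
        }
    }
}
```

Running `main` prints `{12345=50}`, `{12345=75}` and `{12345=15}`. Keeping only the final map gives the table; keeping the list of `Txn` events gives the stream, from which every one of those states can be rebuilt.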

Slide 22

The truth is the log. The database is a cache of a subset of the log.
(Pat Helland, Immutability Changes Everything, http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf)
Photo by Bobby Burch on Unsplash

Slide 23

Events
Basket: Bread, Tinned Spaghetti

Slide 24

Events: ItemAdd Bread
Basket: Bread

Slide 25

Events: ItemAdd Bread, ItemAdd Baked Beans
Basket: Bread, Baked Beans

Slide 26

Events: ItemAdd Bread, ItemAdd Baked Beans, ItemRemove Baked Beans
Basket: Bread

Slide 27

Events: ItemAdd Bread, ItemAdd Baked Beans, ItemRemove Baked Beans, ItemAdd Tinned Spaghetti
Basket: Bread, Tinned Spaghetti

Slide 28

Events: ItemAdd Bread, ItemAdd Baked Beans, ItemRemove Baked Beans, ItemAdd Tinned Spaghetti
Basket: Bread, Tinned Spaghetti

Slide 29

Events: ItemAdd Bread, ItemAdd Baked Beans, ItemRemove Baked Beans, ItemAdd Tinned Spaghetti
Basket: Bread, Tinned Spaghetti

Slide 30

Events: ItemAdd Bread, ItemAdd Baked Beans, ItemRemove Baked Beans, ItemAdd Tinned Spaghetti
Basket: Bread, Tinned Spaghetti
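
The basket is derived state. A minimal sketch, assuming a hypothetical `BasketEvent` type that carries only an event type and an item name, replays the four events to get the current basket, while the event list itself still records that Baked Beans were added and later removed.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (hypothetical types): current basket state derived by replaying events.
class BasketReplay {

    enum Type { ITEM_ADD, ITEM_REMOVE }

    static final class BasketEvent {
        final Type type;
        final String item;

        BasketEvent(Type type, String item) {
            this.type = type;
            this.item = item;
        }
    }

    // Applies events in log order. The result is the current basket; the events keep the history.
    static List<String> currentBasket(List<BasketEvent> events) {
        List<String> basket = new ArrayList<>();
        for (BasketEvent e : events) {
            if (e.type == Type.ITEM_ADD) {
                basket.add(e.item);
            } else {
                basket.remove(e.item); // removes the first matching item, if any
            }
        }
        return basket;
    }

    public static void main(String[] args) {
        List<BasketEvent> events = List.of(
                new BasketEvent(Type.ITEM_ADD, "Bread"),
                new BasketEvent(Type.ITEM_ADD, "Baked Beans"),
                new BasketEvent(Type.ITEM_REMOVE, "Baked Beans"),
                new BasketEvent(Type.ITEM_ADD, "Tinned Spaghetti"));
        System.out.println(currentBasket(events));
        long removals = events.stream().filter(e -> e.type == Type.ITEM_REMOVE).count();
        System.out.println("Items removed along the way: " + removals);
    }
}
```

`main` prints `[Bread, Tinned Spaghetti]` and then `Items removed along the way: 1`; the second line answers a question that the basket on its own cannot.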

Slide 31

What is an Event Streaming Platform?
Producer, Consumer, Connectors, The Log, Streaming Engine

Slide 32

Immutable Event Log (Old → New)
Messages are added at the end of the log
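
The append-only behaviour can be modelled with a list that only ever grows. This hypothetical `AppendOnlyLog` class is a single-process illustration of the idea, not Kafka's storage format: each message receives the next offset, and a change in the world is recorded as a new message rather than as an edit to an old one.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (hypothetical class): an append-only log where each message is
// identified by its offset and existing entries are never updated or deleted.
class AppendOnlyLog {
    private final List<String> messages = new ArrayList<>();

    // Adds a message at the "new" end of the log and returns its offset.
    long append(String message) {
        messages.add(message);
        return messages.size() - 1L;
    }

    // Returns a copy of all messages from the given offset up to the newest one.
    List<String> readFrom(long offset) {
        return new ArrayList<>(messages.subList((int) offset, messages.size()));
    }

    // Offset that the next appended message will receive.
    long endOffset() {
        return messages.size();
    }

    public static void main(String[] args) {
        AppendOnlyLog log = new AppendOnlyLog();
        log.append("order 1 placed");
        log.append("order 2 placed");
        log.append("order 1 shipped"); // a change is a new event, not an edit of an old one
        System.out.println(log.readFrom(0));
        System.out.println("next offset: " + log.endOffset());
    }
}
```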

Slide 33

Topics (Clicks, Orders, Customers)
Topics are similar in concept to tables in a database

Slide 34

Partitions (Clicks: p0, p1, p2)
Messages are guaranteed to be strictly ordered within a partition
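
Ordering per partition follows from routing by key: every message with the same key lands in the same partition, so those messages keep their relative order, while nothing orders messages across partitions. The sketch assumes three partitions and uses `String.hashCode()` purely for illustration; Kafka's own default partitioner hashes the serialized key differently, and `PartitionedTopic` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (hypothetical class): a topic split into partitions, with each
// message routed to a partition chosen from its key.
class PartitionedTopic {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedTopic(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Same key -> same partition. Illustrative hash only; Kafka's partitioner differs.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Appends "key:value" to the chosen partition and returns the partition number.
    int send(String key, String value) {
        int p = partitionFor(key);
        partitions.get(p).add(key + ":" + value);
        return p;
    }

    List<String> partition(int p) {
        return partitions.get(p);
    }

    public static void main(String[] args) {
        PartitionedTopic clicks = new PartitionedTopic(3);
        clicks.send("user-1", "home");
        clicks.send("user-2", "search");
        clicks.send("user-1", "basket");
        clicks.send("user-1", "checkout");
        for (int p = 0; p < 3; p++) {
            System.out.println("p" + p + " " + clicks.partition(p));
        }
    }
}
```

All three `user-1` clicks come out of a single partition in the order they were sent; `user-2` may or may not share that partition.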

Slide 35

Messages are just K/V bytes, plus headers + timestamp
A Clicks message: Header, Timestamp, Key, Value

Slide 36

Messages are just K/V bytes: with great power comes great responsibility
Avro (→ Confluent Schema Registry), Protobuf, JSON, CSV
https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf

Slide 37

Consumers have a position all of their own (Old → New)
Sally is here (scan)

Slide 38

Consumers have a position all of their own (Old → New)
Fred is here (scan), Sally is here (scan)

Slide 39

Consumers have a position all of their own (Old → New)
George is here (scan), Fred is here (scan), Sally is here (scan)
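
Because reading does not remove anything from the log, each consumer only needs to remember its own offset. In this hypothetical sketch, `IndependentConsumers` keeps one position per consumer name, and a consumer it has not seen before starts at the oldest message (an assumption of the sketch, not a statement about Kafka's defaults).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch (hypothetical class): one shared log, many consumers,
// each tracking its own position (the offset of the next message to read).
class IndependentConsumers {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> positions = new HashMap<>();

    void append(String message) {
        log.add(message);
    }

    // Reads up to maxMessages from this consumer's position and advances only that position.
    // Unknown consumers start at offset 0, the oldest message (an assumption of this sketch).
    List<String> poll(String consumer, int maxMessages) {
        int from = positions.getOrDefault(consumer, 0);
        int to = Math.min(log.size(), from + maxMessages);
        positions.put(consumer, to);
        return new ArrayList<>(log.subList(from, to));
    }

    int position(String consumer) {
        return positions.getOrDefault(consumer, 0);
    }

    public static void main(String[] args) {
        IndependentConsumers topic = new IndependentConsumers();
        for (int i = 0; i < 5; i++) {
            topic.append("event-" + i);
        }
        topic.poll("sally", 4);
        topic.poll("fred", 2);
        topic.poll("george", 1);
        System.out.println("sally=" + topic.position("sally")
                + " fred=" + topic.position("fred")
                + " george=" + topic.position("george"));
        // Nothing was deleted by the other reads, so George simply continues.
        System.out.println(topic.poll("george", 1));
    }
}
```

`main` prints `sally=4 fred=2 george=1` and then `[event-1]`: George's read moves only George's position.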

Slide 40

The Connect API
Producer, Consumer, Connectors, The Log, Streaming Engine

Slide 41

Streaming Integration with Kafka Connect
Sources (syslog) → Kafka Connect (workers running tasks) → Kafka brokers

Slide 42

Streaming Integration with Kafka Connect
Kafka brokers → Kafka Connect (workers running tasks) → Sinks (Amazon S3, Google BigQuery)

Slide 43

Streaming Integration with Kafka Connect
syslog → Kafka Connect (workers running tasks) → Kafka brokers → Kafka Connect → Amazon S3, Google BigQuery

Slide 44

Stream Processing in Kafka
Producer, Consumer, Connectors, The Log, Streaming Engine

Slide 45

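Kafka Streams API

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

// stringSerde and ordersSerde: Serdes for the String key and the Order value (defined elsewhere)
final StreamsBuilder builder = new StreamsBuilder();
// Read "orders", keep only completed orders, write them to "complete_orders"
builder.stream("orders", Consumed.with(stringSerde, ordersSerde))
       .filter((key, order) -> order.getStatus().equals("COMPLETE"))
       .to("complete_orders", Produced.with(stringSerde, ordersSerde));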

Slide 46

Stream Processing with KSQL

CREATE STREAM completedOrders AS
  SELECT * FROM orders
  WHERE status='COMPLETE';

Slide 47

This is Something New
Photo by Ash from Modern Afflatus on Unsplash

Slide 48

Events in Action
Review events → reviews

Slide 49

Events in Action
Review events → reviews → Operational dashboard

Slide 50

Events in Action
Review events → reviews → Operational dashboard, Data lake

Slide 51

Events in Action: filter out bad data

CREATE STREAM reviews_clean AS
  SELECT * FROM reviews
  WHERE id IS NOT NULL;

Review events → reviews → reviews_clean. Outputs: Operational dashboard, Data lake

Slide 52

Events in Action
Existing apps → RDBMS (user data) → txn log → Kafka Connect → Kafka (users)

Slide 53

Events in Action
Inputs: Review events, User data. Topics: reviews, users, reviews_clean. Outputs: Operational dashboard, Data lake

Slide 54

Events in Action: join events to users, and filter

CREATE STREAM reviews_clean AS
  SELECT * FROM reviews
  WHERE id IS NOT NULL;

CREATE STREAM enriched_reviews AS
  SELECT *
  FROM reviews_clean r
  INNER JOIN users u
  ON r.userid=u.userid;

Inputs: Review events, User data. Topics: reviews, users, reviews_clean, enriched_reviews. Outputs: Operational dashboard, Data lake
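
The `enriched_reviews` join can be pictured as a lookup per event. The sketch is a simplified, hypothetical model: the `users` table is a fixed `Map` keyed on `userid`, whereas a real stream-table join looks up the table as it stands when each review arrives. As with `INNER JOIN`, a review whose user is missing produces no output. The `rating` and `status` fields are borrowed from the `unhappy_vips` query; the Java types themselves are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal sketch (hypothetical types): a stream-table INNER JOIN where each
// review event is enriched with the row for its user.
class ReviewEnrichment {

    static final class Review {
        final String id;
        final String userid;
        final int rating;

        Review(String id, String userid, int rating) {
            this.id = id;
            this.userid = userid;
            this.rating = rating;
        }
    }

    static final class User {
        final String userid;
        final String status;

        User(String userid, String status) {
            this.userid = userid;
            this.status = status;
        }
    }

    static final class EnrichedReview {
        final Review review;
        final User user;

        EnrichedReview(Review review, User user) {
            this.review = review;
            this.user = user;
        }

        @Override
        public String toString() {
            return review.id + " rating=" + review.rating + " status=" + user.status;
        }
    }

    // usersById plays the role of the users table, keyed on the join column (userid).
    static List<EnrichedReview> join(List<Review> reviewsClean, Map<String, User> usersById) {
        List<EnrichedReview> enriched = new ArrayList<>();
        for (Review r : reviewsClean) {
            User u = usersById.get(r.userid);
            if (u != null) { // INNER JOIN: a review with no matching user produces no output
                enriched.add(new EnrichedReview(r, u));
            }
        }
        return enriched;
    }

    public static void main(String[] args) {
        Map<String, User> users = Map.of(
                "u1", new User("u1", "Platinum"),
                "u2", new User("u2", "Bronze"));
        List<Review> reviews = List.of(
                new Review("r1", "u1", 2),
                new Review("r2", "u2", 5),
                new Review("r3", "u9", 4)); // no such user, so it is dropped
        join(reviews, users).forEach(System.out::println);
    }
}
```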

Slide 55

Events in Action
Adding a Notification service. Inputs: Review events, User data. Outputs: Operational dashboard, Data lake

Slide 56

Events in Action: alert on unhappy VIPs

CREATE STREAM unhappy_vips AS
  SELECT * FROM enriched_reviews
  WHERE rating < 3 AND status = 'Platinum';

Topics: reviews, users, reviews_clean, enriched_reviews, unhappy_vips (read by the Notification service). Outputs: Operational dashboard, Data lake
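
The `unhappy_vips` query is a plain predicate over each enriched event. A minimal Java rendering, assuming a hypothetical flattened `EnrichedReview` type with `rating` and `status` fields, keeps only the events that should reach the notification service.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal sketch (hypothetical type): the unhappy_vips filter applied to enriched reviews.
class UnhappyVips {

    static final class EnrichedReview {
        final String reviewId;
        final int rating;
        final String status;

        EnrichedReview(String reviewId, int rating, String status) {
            this.reviewId = reviewId;
            this.rating = rating;
            this.status = status;
        }
    }

    // Mirrors: WHERE rating < 3 AND status = 'Platinum'
    static boolean isUnhappyVip(EnrichedReview r) {
        return r.rating < 3 && "Platinum".equals(r.status);
    }

    // Returns the ids of matching reviews, i.e. what would be passed on for notification.
    static List<String> unhappyVipReviewIds(List<EnrichedReview> enriched) {
        return enriched.stream()
                .filter(UnhappyVips::isUnhappyVip)
                .map(r -> r.reviewId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<EnrichedReview> enriched = List.of(
                new EnrichedReview("r1", 2, "Platinum"),
                new EnrichedReview("r2", 1, "Bronze"),
                new EnrichedReview("r3", 5, "Platinum"));
        System.out.println(unhappyVipReviewIds(enriched)); // [r1]
    }
}
```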

Slide 57

Photo by rmoff The Power of an Event-Driven Architecture

Slide 58

Not Everything is a Nail
Events → RDBMS

Slide 59

Not Everything is a Nail
Events → RDBMS

Slide 60

Not Everything is a Nail
Events → RDBMS, Elasticsearch

Slide 61

Not Everything is a Nail
Events → RDBMS, Elasticsearch, Graph

Slide 62

Side-by-Side Tech Evaluation
Events → HDFS

Slide 63

Side-by-Side Tech Evaluation
Events → HDFS, BigQuery

Slide 64

Side-by-Side Tech Evaluation
Events → HDFS, BigQuery, Snowflake

Slide 65

Evolve Data Sources
On-premises Producer → Consuming App A, Consuming App B

Slide 66

Evolve Data Sources
On-premises Producer and Cloud Producer → Consuming App A, Consuming App B

Slide 67

Evolve Data Sources
Cloud Producer → Consuming App A, Consuming App B

Slide 68

Tight Coupling != Flexible
Orders, RDBMS

Slide 69

Tight Coupling != Flexible
Orders, RDBMS, HDFS

Slide 70

Tight Coupling != Flexible
Orders, RDBMS, HDFS, App

Slide 71

Loose Coupling == Freedom to Evolve
Orders, RDBMS

Slide 72

Loose Coupling == Freedom to Evolve
Orders, RDBMS, HDFS

Slide 73

Loose Coupling == Freedom to Evolve
Orders, RDBMS, HDFS, App

Slide 74

Transform Once, Use Many: Data Cleansing
IoT → temp_raw → App, App, RDBMS

Slide 75

Transform Once, Use Many: Data Cleansing
IoT → temp_raw → App, App, RDBMS

temp_raw:
sensor_id   time_epoch   reading
42          1551136074   13.05
42          1551136125   13.11
(null)      1551136125   13.11
42          1551138129   13.04

Slide 76

Transform Once, Use Many: Data Cleansing
IoT → temp_raw → App, App, RDBMS; each consumer has to cleanse the data itself

temp_raw:
sensor_id   time_epoch   reading
42          1551136074   13.05
42          1551136125   13.11
(null)      1551136125   13.11
42          1551138129   13.04

Slide 77

Transform Once, Use Many: Data Cleansing
IoT → temp_raw → filter SENSOR_ID IS NOT NULL → temp_clean → App, App, RDBMS

temp_raw:
sensor_id   time_epoch   reading
42          1551136074   13.05
42          1551136125   13.11
(null)      1551136125   13.11
42          1551138129   13.04

temp_clean:
sensor_id   time_epoch   reading
42          1551136074   13.05
42          1551136125   13.11
42          1551138129   13.04
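
The cleansing step itself is small; what matters is that it runs once, upstream, rather than inside every consumer. A minimal sketch of that single step, using the four readings from the slide and a hypothetical `TempCleanse.Reading` type in which a missing `sensor_id` is represented as `null`:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (hypothetical types): cleanse temp_raw once to produce temp_clean,
// so downstream consumers don't each repeat the same filter.
class TempCleanse {

    static final class Reading {
        final Integer sensorId; // null when the reading arrived without a sensor_id
        final long timeEpoch;
        final double reading;

        Reading(Integer sensorId, long timeEpoch, double reading) {
            this.sensorId = sensorId;
            this.timeEpoch = timeEpoch;
            this.reading = reading;
        }

        @Override
        public String toString() {
            return sensorId + " " + timeEpoch + " " + reading;
        }
    }

    // Keeps only readings that satisfy SENSOR_ID IS NOT NULL.
    static List<Reading> cleanse(List<Reading> tempRaw) {
        List<Reading> tempClean = new ArrayList<>();
        for (Reading r : tempRaw) {
            if (r.sensorId != null) {
                tempClean.add(r);
            }
        }
        return tempClean;
    }

    public static void main(String[] args) {
        List<Reading> tempRaw = List.of(
                new Reading(42, 1551136074L, 13.05),
                new Reading(42, 1551136125L, 13.11),
                new Reading(null, 1551136125L, 13.11),
                new Reading(42, 1551138129L, 13.04));
        cleanse(tempRaw).forEach(System.out::println);
    }
}
```

`main` prints the three readings for sensor 42; the two apps and the RDBMS would then all read this one cleansed result.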

Slide 78

Transform Once, Use Many: Data Enrichment
Events → Join (with RDBMS) → App 01

Slide 79

Transform Once, Use Many: Data Enrichment
Events → Join (with RDBMS) → App 01; Events → Join (with RDBMS) → Elasticsearch → App 02

Slide 80

Transform Once, Use Many: Data Enrichment
Events → Join (with RDBMS), once → App 01, Elasticsearch

Slide 81

Message Payload Compatibility
Producer → Consuming App

Slide 82

Message Payload Compatibility
Producer, Producer → Consuming App

Slide 83

Message Payload Compatibility
Producer, Producer → Consuming App
Triangles to Squares

Slide 84

Build Resilient Pipelines with Schemas
Producer → sales_csv → App 01 and App 02, each applying the schema itself (COL1 → ID INT, COL2 → NAME VARCHAR)

Slide 85

Build Resilient Pipelines with Schemas
Producer → sales_csv → apply schema once (COL1 → ID INT, COL2 → NAME VARCHAR) → sales (schema in the Schema Registry) → App 01, App 02
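
Applying the schema once means turning positional CSV text into named, typed fields a single time. This sketch assumes simple comma-separated lines with no quoting and a hypothetical `ApplyCsvSchema.apply` helper; in the pipeline on the slide the typed records would be written to the `sales` topic with the schema held in the Schema Registry, which this plain-Java example does not model.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch (hypothetical helper): apply a column schema to raw CSV once,
// so downstream apps get named, typed fields instead of positional text.
class ApplyCsvSchema {

    enum SqlType { INT, VARCHAR }

    static final class Column {
        final String name;
        final SqlType type;

        Column(String name, SqlType type) {
            this.name = name;
            this.type = type;
        }
    }

    // COL1 -> ID INT, COL2 -> NAME VARCHAR, as on the slide.
    static final List<Column> SALES_SCHEMA = List.of(
            new Column("ID", SqlType.INT),
            new Column("NAME", SqlType.VARCHAR));

    // Assumes plain comma-separated values with no quoting or embedded commas.
    // Throws IllegalArgumentException (including NumberFormatException) if the line doesn't fit.
    static Map<String, Object> apply(String csvLine, List<Column> schema) {
        String[] fields = csvLine.split(",", -1);
        if (fields.length != schema.size()) {
            throw new IllegalArgumentException(
                    "expected " + schema.size() + " fields but got " + fields.length + ": " + csvLine);
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < fields.length; i++) {
            Column column = schema.get(i);
            String raw = fields[i].trim();
            Object value = (column.type == SqlType.INT) ? (Object) Integer.valueOf(raw) : raw;
            row.put(column.name, value);
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(apply("1,Widget", SALES_SCHEMA));
    }
}
```

`apply("1,Widget", SALES_SCHEMA)` returns `{ID=1, NAME=Widget}`, while a line such as `abc,Widget` is rejected with an exception at the point of ingest instead of surfacing later in App 01 or App 02.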

Slide 86

Photo by rmoff Say NO to brittle pipelines

Slide 87

Apps wired point-to-point to cache, monitoring, MQ, DWH, security, search and Hadoop

Slide 88

Apps, DWH and Hadoop connected through KAFKA: request-response, changelogs, messaging OR stream processing, streaming data pipelines

Slide 89

Photo by rmoff Events model the real world

Slide 90

Event streaming platform: native stream processing; data when you need it; data persistence; flexibility & scalability
Photo by rmoff

Slide 91

Fully Managed Kafka as a Service

Slide 92

http://cnfl.io/book-bundle

Slide 93

Photo by rmoff
@rmoff https://talks.rmoff.net http://cnfl.io/slack