Streaming ETL on the Shoulders of Giants

A presentation at VoxxedDays Ticino 2019 in October 2019 in Lugano, Switzerland by Hans-Peter Grahsl

Slide 1

Streaming ETL on the Shoulders of G I A N T S

Slide 2

Hans-Peter Grahsl
• working & living in Graz
• technical trainer at
• independent consultant & engineer
• associate lecturer
• occasional conference speaker

@hpgrahsl | #VDT19 #VoxxedDays Ticino, 05th October 2019, Lugano - Switzerland

Slide 3

Speed & Agility

Slide 4

For businesses to stay relevant they must deliver value at a breakneck pace and be constantly seeking new sources of value …

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Diminishing Value of Data

Slide 10

Diminishing Value of Data

Slide 11

Diminishing Value of Data

Slide 12

Diminishing Value of Data

Slide 13

Diminishing Value of Data

Slide 14

Diminishing Value of Data

Slide 15

Historic ETL causes Pain
• batch-driven
• brittle / error-prone
• slow & late answers

Slide 16

Antipattern for Speed & Agility

Slide 17

Streaming ETL alleviates Pain
• event-centric
• stream-oriented
• fast & timely answers
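
The contrast between batch-driven and event-centric processing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event shape (customer id, order amount) and both function names are made up for this sketch and do not come from the talk. The batch variant answers once, after all input has been read; the streaming variant yields an updated answer after every event.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape for this sketch: (customer_id, order_amount)
Event = Tuple[str, float]


def batch_totals(events: Iterable[Event]) -> Dict[str, float]:
    """Batch style: no answer is available until the whole input was read."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in events:
        totals[customer] += amount
    return dict(totals)


def streaming_totals(events: Iterable[Event]) -> Iterator[Tuple[str, float]]:
    """Streaming style: emit the updated total right after each event."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in events:
        totals[customer] += amount
        yield customer, totals[customer]


if __name__ == "__main__":
    events = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]
    print(batch_totals(events))
    for update in streaming_totals(events):
        print(update)
```

Because `streaming_totals` is a generator, it also works on an input that never ends, which is where the batch variant would never return.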

Slide 18

Enabler for Speed & Agility

Slide 19

Modern Data Architecture?

Slide 20

Modern Data Architecture?

Slide 21

Modern Data Architecture?

Slide 22

Modern Data Architecture?

Slide 23

Modern Data Architecture?

Slide 24

Modern Data Architecture?

Slide 25

On the Shoulders of G I A N T S

Slide 26

Operational Data Store

Slide 27

MongoDB
• rich document model
• powerful queries & indexing
• ACID transactions
• transparent sharding & replication

Slide 28

Streaming Platform

Slide 29

Apache Kafka
• pub / sub to event streams
• (permanently) store event streams
• event streaming in near real-time
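
The combination of pub / sub and stored event streams can be pictured as an append-only log in which every consumer keeps its own read position. The following toy model is a sketch only: it runs in memory, the class and method names (`EventLog`, `publish`, `poll`) are hypothetical, and it is not the Kafka client API. It merely shows why a late subscriber can still read events that were published before it arrived.

```python
from typing import Dict, List


class EventLog:
    """Toy in-memory stand-in for a single topic partition (not the Kafka API)."""

    def __init__(self) -> None:
        self._events: List[str] = []        # append-only list of stored events
        self._offsets: Dict[str, int] = {}  # next offset to read, per consumer

    def publish(self, event: str) -> int:
        """Append an event and return the offset it was stored at."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, consumer: str, max_events: int = 10) -> List[str]:
        """Return events this consumer has not seen yet and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


if __name__ == "__main__":
    log = EventLog()
    log.publish("order-created")
    log.publish("order-paid")
    print(log.poll("billing"))    # both events published so far
    log.publish("order-shipped")
    print(log.poll("billing"))    # only the event added since the last poll
    print(log.poll("analytics"))  # a late subscriber still reads all three
```

Reading never removes anything from the log, so consumers stay independent of each other; that is the property the "(permanently) store event streams" bullet refers to.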

Slide 30

Slide 31

“… data processing that is designed with infinite data sets in mind.” (Tyler Akidau)

Slide 32

EVENTS EVENTS EVERYWHERE!

Slide 33

Kafka APIs for “everything”
• simple pub / sub scenario: Producer & Consumer API
• streaming data integration: Connect API
• powerful stream processing: KStreams API + KSQL

Slide 34

Kafka Connect

Slide 35

Kafka Connect

Slide 36

Kafka Connect

Slide 37

Kafka Connect

Slide 38

Kafka Connect

Slide 39

Kafka Connect
• often about data stores

Slide 40

Kafka Connect
• concrete examples

Slide 41

Kafka Connect
• concrete examples

Slide 42

Source Connectors

Slide 43

Source Connectors

Slide 44

Source Connectors

Slide 45

Source Connectors

Slide 46

Sink Connectors

Slide 47

Sink Connectors

Slide 48

Sink Connectors

Slide 49

Sink Connectors

Slide 50

Slide 51

MongoDB Connector
• officially supported by MongoDB
• developed open-source on GitHub
• verified Gold by Confluent
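
As a rough sketch of what using the connector involves, the Python helpers below build connector definitions of the kind that are typically submitted as JSON to the Kafka Connect REST API (`POST /connectors`). The helper names, connector names, URI, database, collection and topic values are all placeholders invented for this example. The connector class names and property keys follow the MongoDB Kafka connector documentation as far as I recall it and should be checked against the version actually installed.

```python
import json
from typing import Dict, List


def mongo_source_config(name: str, uri: str, database: str, collection: str) -> Dict:
    """Definition for a source connector: MongoDB change events -> Kafka topics."""
    return {
        "name": name,
        "config": {
            "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
            "tasks.max": "1",
            "connection.uri": uri,
            "database": database,
            "collection": collection,
        },
    }


def mongo_sink_config(name: str, uri: str, topics: List[str],
                      database: str, collection: str) -> Dict:
    """Definition for a sink connector: Kafka topics -> MongoDB collection."""
    return {
        "name": name,
        "config": {
            "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
            "tasks.max": "1",
            "topics": ",".join(topics),  # Connect expects a comma-separated list
            "connection.uri": uri,
            "database": database,
            "collection": collection,
        },
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    uri = "mongodb://localhost:27017"
    source = mongo_source_config("demo-source", uri, "shop", "orders")
    sink = mongo_sink_config("demo-sink", uri, ["orders-enriched"], "analytics", "orders")
    print(json.dumps(source, indent=2))
    print(json.dumps(sink, indent=2))
```

The point of the sketch is that both directions are configuration rather than custom integration code: a source definition names where change events come from, a sink definition names which topics land in which collection.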

Slide 52

Exemplary Use Cases

Slide 53

Single Customer View

Slide 54

Single Customer View

Slide 55

Single Customer View

Slide 56

Single Customer View

Slide 57

Single Customer View

Slide 58

Synchronization across Services

Slide 59

Synchronization across Services

Slide 60

Synchronization across Services

Slide 61

Synchronization across Services

Slide 62

Synchronization across Services

Slide 63

Real-Time Recommendations

Slide 64

Real-Time Recommendations

Slide 65

Real-Time Recommendations

Slide 66

Real-Time Recommendations

Slide 67

Real-Time Recommendations

Slide 68

Demo Scenario

Slide 69

Demo Scenario

Slide 70

Demo Scenario

Slide 71

Demo Scenario

Slide 72

Demo Scenario

Slide 73

Slide 74