Streaming ETL on the Shoulders of Giants

A presentation at MongoDB World 2019 in June 2019 in New York, NY, USA by Hans-Peter Grahsl

Slide 1

Streaming ETL on the Shoulders of Giants. Scott L’Hommedieu, MongoDB (llamadew); Hans-Peter Grahsl, NETCONOMY (hpgrahsl)

Slide 2

Streaming ETL on the Shoulders of Giants Why ETL is important How we can “ETL better” Let’s see (some use cases) + a DEMO!

Slide 3

Speed & Agility: A Top 5 Tech Risk* For businesses to stay relevant, they must deliver value at a breakneck pace and constantly seek new sources of value. *Google “top tech risks”

Slide 4

Managing, Processing and Analyzing Data: We use data to unlock insights and drive value

Slide 5

But historic ETL is painful: an antipattern for Speed and Agility. ETL = Batch(Error Prone, Brittle, Slow)

Slide 6

Solving the pain of ETL through Streaming Data: Speed and Agility. ETL = DataStream(Resilient, Loosely Coupled, Realtime)

Slide 7

Streaming ETL on the Shoulders of Giants Why ETL is important How we can “ETL better” Let’s see (some use cases) + a DEMO!

Slide 8

Architecture of a Modern Data Platform

Slide 9

Architecture of a Modern Data Platform Streaming Data Platform

Slide 10

Architecture of a Modern Data Platform Streaming Data Platform Connected Apps Datastores

Slide 11

Architecture of a Modern Data Platform Streaming Data Platform Connected Apps Connected Apps Stream Processors Datastores Datastores

Slide 12

On the shoulders of Giants MongoDB Kafka

Slide 13

Modern Data Platform

Slide 14

Modern Data Platform Doc Model Run Anywhere Distributed and Scalable Resilient and Performant

Slide 15

Apache Kafka 101

Slide 16

Streaming Platform

Slide 17

Streaming Platform • distributed • horizontally scalable • highly fault-tolerant

Slide 18

What is Streaming? “a type of data processing that is designed with infinite data sets in mind” –Tyler Akidau
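The idea of processing designed for infinite data sets can be illustrated with a small Python sketch. It is not taken from the talk: the `sensor_readings` source and its synthetic values are hypothetical, and a plain generator stands in for a real stream. The point is only that each result is emitted as data arrives, without ever waiting for the input to end.

```python
import itertools
from typing import Iterable, Iterator


def sensor_readings() -> Iterator[float]:
    """Hypothetical unbounded source: yields readings forever."""
    for i in itertools.count():
        yield 20.0 + (i % 5)  # synthetic values 20, 21, 22, 23, 24, 20, ...


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Emit an updated mean after every element, never waiting for the end."""
    total = 0.0
    for n, value in enumerate(stream, start=1):
        total += value
        yield total / n


# Take only the first five results; the source itself never terminates.
first_five = list(itertools.islice(running_mean(sensor_readings()), 5))
print(first_five)
```

A batch job would need the complete input before computing the mean; here the consumer decides how long to keep reading.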

Slide 19

“…everything that happens in a company – every customer interaction, every API request, every database change – can be represented as a real-time stream that anything else can tap into, process or react to.”

Slide 20

“…Kafka and the whole category of stream processing represents a fundamental paradigm shift in how the digital part of a company is built, how data is used, and how applications are built. This is actually a pretty rare thing…” – Jay Kreps

Slide 21

[Diagram of the Kafka APIs: Apps (Producer API, Consumer API), KStreams App (Streams API), KSQL App (KSQL), Data Sources and Data Sinks (Connect API)]

Slide 22

Kafka APIs in a Nutshell… • Producer & Consumer API → publish-subscribe scenarios • Connect API → streaming data integration scenarios • Streams API & KSQL → code or SQL-based streaming scenarios
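To make the publish-subscribe scenario concrete, here is a toy in-memory model in Python. It deliberately does not use Kafka or any Kafka client; `ToyTopic`, its methods, and the record names are all hypothetical. It only illustrates the pattern that the Producer and Consumer APIs serve: producers append to a log, and each subscriber reads at its own pace from its own position.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a topic: an append-only list of records."""

    def __init__(self) -> None:
        self._log: list[str] = []
        # Read position per subscriber; unknown subscribers start at 0.
        self._offsets: dict[str, int] = defaultdict(int)

    def publish(self, record: str) -> int:
        """Append a record and return its offset in the log."""
        self._log.append(record)
        return len(self._log) - 1

    def poll(self, subscriber: str) -> list[str]:
        """Return records this subscriber has not seen yet and advance its offset."""
        start = self._offsets[subscriber]
        self._offsets[subscriber] = len(self._log)
        return self._log[start:]


topic = ToyTopic()
topic.publish("order-created")
topic.publish("order-paid")
billing_first = topic.poll("billing")    # billing reads the first two records
topic.publish("order-shipped")
analytics_all = topic.poll("analytics")  # a late subscriber still sees everything
billing_next = topic.poll("billing")     # billing only gets what is new to it
```

Because records stay in the log after being read, adding a new subscriber later does not require any change on the producing side.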

Slide 23

A bit more about Kafka Connect …

Slide 24

Kafka Connect Basics ANY Source Connect Connect ANY Sink ANY → e.g. file systems, data stores, REST endpoints, …

Slide 25

Kafka Connect Basics often about data stores SOURCE Connect Connect SINK

Slide 26

Kafka Connect Basics or more concretely Source Connectors Sink Connectors https://hub.confluent.io → many many more

Slide 27

Kafka Connect Basics or more concretely MongoDB Source MongoDB Sink https://hub.confluent.io → many many more

Slide 28

How do connectors operate?

Slide 29

Kafka Source Connectors: Source Connector → SMT … SMT → Converter (Serialize). 1…N Single Message Transforms for basic in-flight manipulations
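The source-side flow on this slide (records leave the connector, pass through 1…N transforms, then get serialized by a converter) can be mimicked in a few lines of Python. This is a conceptual sketch only, not Kafka Connect code: the two transform functions, the `device-registry` value, and the JSON converter are hypothetical stand-ins for real SMTs and converters.

```python
import json
from typing import Callable

Record = dict
Transform = Callable[[Record], Record]


def insert_source_field(record: Record) -> Record:
    """Toy transform: add a static field naming where the record came from."""
    return {**record, "source": "device-registry"}  # hypothetical value


def mask_owner(record: Record) -> Record:
    """Toy transform: blank out a sensitive field if present."""
    return {**record, "owner": ""} if "owner" in record else record


def apply_transforms(record: Record, transforms: list[Transform]) -> Record:
    """Run the 1..N transforms in their configured order."""
    for transform in transforms:
        record = transform(record)
    return record


def json_converter(record: Record) -> bytes:
    """Toy converter: serialize the transformed record to bytes for the topic."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


raw = {"device_id": 42, "owner": "alice"}
payload = json_converter(apply_transforms(raw, [insert_source_field, mask_owner]))
print(payload)
```

Each transform sees one record at a time and returns a new one, which is why this stage suits light in-flight changes rather than joins or aggregations. The sink side runs the same stages in the opposite order: deserialize first, then transform, then write.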

Slide 30

Kafka Sink Connectors: Converter (Deserialize) → SMT … SMT → Sink Connector. 1…N Single Message Transforms for basic in-flight manipulations

Slide 31

Announcing …

Slide 32

MongoDB Connector for Apache Kafka Supported by MongoDB Available on the Confluent Hub: https://www.confluent.io/hub/mongodb/kafka-connect-mongodb Verified Gold by

Slide 33

MongoDB Connector for Apache Kafka Available on the Confluent Hub: https://www.confluent.io/hub/mongodb/kafka-connect-mongodb
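As a rough idea of what using the connector involves, the Python sketch below builds the JSON body that a Kafka Connect worker accepts on `POST /connectors`. It is not from the talk. The property names reflect the connector's documented sink settings as best understood here and should be checked against the connector version in use; the connector name, topic, URI, database, and collection values are purely hypothetical.

```python
import json


def mongo_sink_request(name: str, topics: list[str], uri: str,
                       database: str, collection: str) -> dict:
    """Build the JSON body for registering a MongoDB sink connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
            "topics": ",".join(topics),  # Connect expects a comma-separated list
            "connection.uri": uri,
            "database": database,
            "collection": collection,
            # Assumes plain JSON values without embedded schemas;
            # adjust to match the real format of the topic.
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false",
        },
    }


# All values below are placeholders for illustration.
request_body = mongo_sink_request(
    name="iot-sink",
    topics=["sensor-readings"],
    uri="mongodb://localhost:27017",
    database="iot",
    collection="readings",
)
print(json.dumps(request_body, indent=2))
```

The printed document could then be sent to a Connect worker's REST endpoint with any HTTP client. No application code runs inside the pipeline itself, because the connector is driven entirely by this configuration.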

Slide 34

Streaming ETL on the Shoulders of Giants Why ETL is important How we can “ETL better” Let’s see (some use cases) + a DEMO!

Slide 35

Streaming ETL Use Cases

Slide 36

Single Customer View for eCommerce Source Connectors MongoDB Sinks Single Source of Truth

Slide 37

Data Synchronization between Microservices Service 1 … Service N MongoDB Sinks

Slide 38

Recommendation Engine for Opinion Mining Change Streams User Recommendation Engine MongoDB Source Surveys & Polls Data Change Streams

Slide 39

IoT Demo Scenario in Action

Slide 40

IoT Demo Scenario [Diagram labels: REST, data generation, device management, Producer API, Change Streams, Stream Processor, KSQL, data serving, SSE]

Slide 41

IoT Demo Scenario [Diagram labels: REST, MongoDB Source Connector, data generation, Producer API, device management, MongoDB Sink Connector, Change Streams, Stream Processor, KSQL, data serving, SSE]

Slide 42

That’s all folks! THANK YOU

Slide 43
