Streaming ETL on the Shoulders of Giants
Scott L’Hommedieu, MongoDB llamadew
Hans-Peter Grahsl, NETCONOMY hpgrahsl
Slide 2
Streaming ETL on the Shoulders of Giants
Why ETL is important
How we can “ETL better”
Let’s see (some use cases) + a DEMO!
Slide 3
Speed & Agility
A Top 5 Tech Risk*
For businesses to stay relevant they must deliver value at a breakneck pace and be constantly seeking new sources of value.
*google “top tech risks”
Slide 4
Managing, Processing and Analyzing Data
We use data
to unlock insights
and drive value
Slide 5
But, historic ETL is painful
An antipattern for Speed and Agility
ETL = Batch(Error Prone, Brittle, Slow)
Slide 6
Solving the pain of ETL through Streaming Data
Speed and Agility
ETL = DataStream(Resilient, Loosely Coupled, Realtime)
Slide 7
Streaming ETL on the Shoulders of Giants
Why ETL is important
How we can “ETL better”
Let’s see (some use cases) + a DEMO!
Slide 8
Architecture of a Modern Data Platform
Slide 9
Architecture of a Modern Data Platform
Streaming Data Platform
Slide 10
Architecture of a Modern Data Platform
Streaming Data Platform
Connected Apps
Datastores
Slide 11
Architecture of a Modern Data Platform
Streaming Data Platform
Connected Apps
Stream Processors
Datastores
Slide 12
On the shoulders of Giants
MongoDB
Kafka
Slide 13
Modern Data Platform
Slide 14
Modern Data Platform
Doc Model
Run Anywhere
Distributed and Scalable
Resilient and Performant
What is Streaming?
“a type of data processing that is designed with infinite data sets in mind” –Tyler Akidau
Slide 19
“…everything that happens in a company – every customer interaction, every API request, every database change – can be represented as a real-time stream that anything else can tap into, process or react to.”
Slide 20
“…Kafka and the whole category of stream processing represents a fundamental paradigm shift in how the digital part of a company is built, how data is used, and how applications are built. This is actually a pretty rare thing…” – Jay Kreps
Slide 21
[Diagram: Kafka APIs. Apps connect through the Producer API and the Consumer API; Data Sources and Data Sinks connect through the Connect API; a KStreams App uses the Streams API; a KSQL App uses KSQL.]
Slide 22
Kafka APIs in a Nutshell…
- Producer & Consumer API
  → publish-subscribe scenarios
- Connect API
  → streaming data integration scenarios
- Streams API & KSQL
  → code or SQL-based streaming scenarios
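To make the publish-subscribe idea concrete, here is a minimal in-memory sketch in Python. It is not the Kafka Producer or Consumer API: `ToyTopic`, `publish`, and `poll` are hypothetical names used only for illustration. It models a single idea: a topic behaves like an append-only log, and each consumer group keeps its own read position, so several groups can read the same events independently.

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory stand-in for a single-partition topic."""

    def __init__(self):
        self._log = []                     # append-only list of events
        self._offsets = defaultdict(int)   # next offset to read, per group

    def publish(self, event):
        """Append an event and return the offset it was stored at."""
        self._log.append(event)
        return len(self._log) - 1

    def poll(self, group):
        """Return the events this group has not seen yet and advance its offset."""
        start = self._offsets[group]
        events = self._log[start:]
        self._offsets[group] = len(self._log)
        return events


topic = ToyTopic()
topic.publish({"device": "d1", "temp": 21.5})
topic.publish({"device": "d2", "temp": 19.0})

analytics_batch = topic.poll("analytics")   # the first two events
topic.publish({"device": "d1", "temp": 22.1})
analytics_next = topic.poll("analytics")    # only the newly published event
audit_batch = topic.poll("audit")           # a new group starts from the beginning
```

A real deployment would use a Kafka client library against a broker; the sketch leaves out partitions, persistence, and delivery guarantees.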
Slide 23
A bit more about Kafka Connect …
Slide 24
Kafka Connect Basics
ANY Source → Connect → [Kafka] → Connect → ANY Sink
ANY → e.g. file systems, data stores, REST endpoints, …
Slide 25
Kafka Connect Basics
often about data stores
SOURCE → Connect → [Kafka] → Connect → SINK
Slide 26
Kafka Connect Basics
or more concretely
Source Connectors
Sink Connectors
https://hub.confluent.io → many many more
Slide 27
Kafka Connect Basics
or more concretely
MongoDB Source
MongoDB Sink
https://hub.confluent.io → many many more
Slide 28
How do connectors operate?
Slide 29
Kafka Source Connectors
Source Connector → SMT → … → SMT → Converter (Serialize)
1…N Single Message Transforms (SMTs) for basic in-flight manipulations
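The source-side flow above (record, then 1…N transforms, then serialization) can be sketched in plain Python. This is a conceptual illustration only, not the Kafka Connect API: `mask_field`, `insert_field`, `json_converter_serialize`, and `run_source_pipeline` are hypothetical helpers, loosely inspired by the kind of work the built-in MaskField and InsertField transforms do.

```python
import json


def mask_field(field):
    """Build a transform that replaces one field's value with a placeholder."""
    def transform(record):
        return {**record, field: "***"} if field in record else record
    return transform


def insert_field(field, value):
    """Build a transform that adds a static field to every record."""
    def transform(record):
        return {**record, field: value}
    return transform


def json_converter_serialize(record):
    """Converter step: turn the record into bytes before it reaches a topic."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def run_source_pipeline(record, transforms):
    """Apply each transform in order, then serialize the result."""
    for transform in transforms:
        record = transform(record)
    return json_converter_serialize(record)


raw = {"user": "alice", "email": "alice@example.com"}
payload = run_source_pipeline(
    raw, [mask_field("email"), insert_field("origin", "crm")]
)
```

Each transform sees one record at a time and returns a new record, which is why SMTs suit light in-flight changes rather than joins or aggregations.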
Slide 30
Kafka Sink Connectors
Converter (Deserialize) → SMT → … → SMT → Sink Connector
1…N Single Message Transforms (SMTs) for basic in-flight manipulations
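The sink side runs in the opposite order: bytes are deserialized first, transforms are applied, and only then does the sink connector write. A minimal Python sketch under the same caveat (conceptual, not the Kafka Connect API; `rename_field`, `ListSink`, and `run_sink_pipeline` are hypothetical names, and a list stands in for the target data store):

```python
import json


def json_converter_deserialize(payload):
    """Converter step: turn bytes read from a topic back into a record."""
    return json.loads(payload.decode("utf-8"))


def rename_field(old, new):
    """Build a transform that renames one field and leaves the rest untouched."""
    def transform(record):
        if old not in record:
            return record
        renamed = {k: v for k, v in record.items() if k != old}
        renamed[new] = record[old]
        return renamed
    return transform


class ListSink:
    """Hypothetical sink that collects records in a list instead of a data store."""

    def __init__(self):
        self.written = []

    def put(self, record):
        self.written.append(record)


def run_sink_pipeline(payload, transforms, sink):
    """Deserialize, apply each transform in order, then hand the record to the sink."""
    record = json_converter_deserialize(payload)
    for transform in transforms:
        record = transform(record)
    sink.put(record)


sink = ListSink()
run_sink_pipeline(b'{"id": 7, "temp": 21.5}', [rename_field("id", "_id")], sink)
```

Renaming `id` to `_id` here is just an example of shaping a record to fit the target store before it is written.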
Slide 31
Announcing …
Slide 32
MongoDB Connector for Apache Kafka
Supported by MongoDB
Available on the Confluent Hub: https://www.confluent.io/hub/mongodb/kafka-connect-mongodb
Verified Gold by
Slide 33
MongoDB Connector for Apache Kafka
Available on the Confluent Hub: https://www.confluent.io/hub/mongodb/kafka-connect-mongodb
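As a rough idea of what using the connector looks like, the Python sketch below builds a sink connector definition of the kind Kafka Connect accepts as JSON. The connector name, topic, connection URI, database, and collection are placeholders assumed for illustration; the property keys follow the connector's documented settings as I understand them, so check them against the version you install.

```python
import json

# Assumed example values; only the property keys are meant to match the connector.
sink_connector = {
    "name": "mongo-sink-example",                        # hypothetical connector name
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "tasks.max": "1",
        "topics": "orders",                              # hypothetical Kafka topic
        "connection.uri": "mongodb://localhost:27017",   # placeholder URI
        "database": "shop",                              # hypothetical database
        "collection": "orders",                          # hypothetical collection
    },
}

# JSON body that could be sent to a Kafka Connect worker to register the connector.
request_body = json.dumps(sink_connector, indent=2)
```

In a running setup this body would typically be POSTed to the worker's `/connectors` REST endpoint; the sketch stops at building the payload so it runs without a cluster.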
Slide 34
Streaming ETL on the Shoulders of Giants
Why ETL is important
How we can “ETL better”
Let’s see (some use cases) + a DEMO!
Slide 35
Streaming ETL Use Cases
Slide 36
Single Customer View for eCommerce
Source Connectors
MongoDB Sinks
Single Source of Truth
Slide 37
Data Synchronization between Microservices
Service 1 … Service N
MongoDB Sinks
Slide 38
Recommendation Engine for Opinion Mining
Change Streams
User Recommendation Engine
MongoDB Source
Surveys & Polls Data
Change Streams
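For a use case like this one, the MongoDB source connector watches a collection through change streams and publishes the events to Kafka. The Python sketch below builds such a source definition. All names and values (`surveys`, `responses`, `opinions`, the URI) are assumptions made for illustration, and `expected_topic` is a hypothetical helper reflecting my understanding that the topic name is the prefix followed by database and collection.

```python
import json

# Only forward newly inserted documents (an aggregation stage on the change stream).
pipeline = [{"$match": {"operationType": "insert"}}]

source_connector = {
    "name": "mongo-source-example",                      # hypothetical connector name
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://localhost:27017",   # placeholder URI
        "database": "surveys",                           # hypothetical database
        "collection": "responses",                       # hypothetical collection
        "pipeline": json.dumps(pipeline),                # passed as a JSON string
        "topic.prefix": "opinions",                      # hypothetical prefix
    },
}


def expected_topic(config):
    """Assumed naming scheme: <topic.prefix>.<database>.<collection>."""
    return ".".join([config["topic.prefix"], config["database"], config["collection"]])


topic_name = expected_topic(source_connector["config"])
```

Change streams need a replica set or sharded cluster, so a standalone `mongod` would not work as the source here.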
Slide 39
IoT Demo Scenario in Action
Slide 40
IoT Demo Scenario
REST
data generation
device management
Producer API
Change Streams
Stream Processor KSQL
data serving
SSE
Slide 41
IoT Demo Scenario
REST
MongoDB Source Connector
data generation
Producer API
device management
MongoDB Sink Connector Change Streams
Stream Processor KSQL
data serving
SSE