From ksqlDB with LOVE: Detecting 007 with a Dash of Machine Learning

A presentation at Confluent VUG - Online Apache Kafka Meetup Event in July 2020 by Hans-Peter Grahsl

Slide 1

From ksqlDB with ♥ LOVE ♥ Detecting 007 with a dash of machine learning

Slide 2

Hans-Peter Grahsl
‣ technical trainer
‣ independent engineer & consultant
‣ associate lecturer
‣ occasional conference speaker

@hpgrahsl | #ConfluentVUG #ksqlDB #ApacheKafka | 2020-07-16

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Kafka’s streaming SQL engine

Slide 8

declarative stream processing language

Slide 9

KSQL in a Nutshell
‣ ANSI SQL inspired
‣ familiar syntax & semantics
‣ concise & expressive
‣ built on top of Kafka Streams
‣ NO(!) coding skills required
‣ entry barrier? “none”

Slide 10

KSQL in a Nutshell
‣ usual suspects OOTB:
‣ projections, filters
‣ joins, aggregations
‣ windowing

Slide 11

Query Examples

Slide 12

Criteria 1: Learning Curve
‣ SQL == widespread & successful 4GL
‣ algorithms & data structures handled for us
‣ KSQL
‣ very easy to learn
‣ quick implementation cycles
‣ productivity-wise hard to beat

Slide 13

“Lingua Franca” effect

Slide 14

Streaming Architecture

Slide 15

Streaming Architecture

Slide 16

Streaming Architecture

Slide 17

Streaming Architecture

Slide 18

Streaming Architecture

Slide 19

NOT the ETL of our ancestors

Slide 20

Complex Streaming Architecture

Slide 21

KSQL ➔ originally the [T] only

Slide 22

What if I told you it’s as easy to build a STREAMING app as it is to build a CRUD app?

Slide 23

Unified Streaming Architecture

Slide 24

Unified Streaming Architecture

Slide 25

Unified Streaming Architecture

Slide 26

Connecting with Sources
HINT: connector examples shown in demo later

Slide 27

Connecting with Sinks
HINT: connector examples shown in demo later

Slide 28

Unified Streaming Architecture

Slide 29

Unified Streaming Architecture

Slide 30

Unified Streaming Architecture

Slide 31

Unified Streaming Architecture

Slide 32

Unified Streaming Architecture

Slide 33

Instead, only try to realize the truth… ksqlDB is NO DATABASE as we know it.

Slide 34

Data Concepts in ksqlDB
STREAM
‣ immutable append-only sequence
‣ captures events representing a series of facts

Slide 35

Data Concepts in ksqlDB

Slide 36

Data Concepts in ksqlDB
TABLE
‣ mutable collection of events
‣ holds the last known value for each key
‣ also result from stateful operations
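The relationship between a STREAM and a TABLE can be modelled in a few lines of plain Python. This is only a conceptual sketch and not ksqlDB code: the event tuples, the agent names, and the `materialize` helper are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical (key, value) events; in ksqlDB these would be records in a Kafka topic.
Event = Tuple[str, str]


def materialize(stream: List[Event]) -> Dict[str, str]:
    """Fold an append-only stream into a table holding the last value per key."""
    table: Dict[str, str] = {}
    for key, value in stream:
        # A later event for the same key overwrites the earlier one.
        table[key] = value
    return table


# STREAM: every event is kept, in arrival order.
location_stream: List[Event] = [
    ("agent-007", "London"),
    ("agent-006", "Moscow"),
    ("agent-007", "Istanbul"),
]

# TABLE: only the last known value for each key survives.
location_table = materialize(location_stream)
print(location_table)
```

The stream still contains all three events, while the table has collapsed them to one row per key.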

Slide 37

Data Concepts in ksqlDB

Slide 38

Data Concepts in ksqlDB

Slide 39

ksqlDB: PUSH Queries
‣ act as subscription to query results
‣ fit asynchronous & reactive data flows
‣ run indefinitely
‣ new data causes continuous updates

Slide 40

ksqlDB: PUSH Query

Slide 41

ksqlDB: PULL Queries
‣ fetch point-in-time results
‣ fit request / response data flows
‣ terminate immediately
‣ lookup current state of materialized views
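The difference between the two query styles can be illustrated with a toy model in plain Python. It is a sketch under stated assumptions, not the ksqlDB API: the `MaterializedView` class and its `subscribe`, `pull`, and `apply` methods are hypothetical names that only mimic the behaviour listed on the PUSH and PULL slides.

```python
from typing import Callable, Dict, List, Optional, Tuple


class MaterializedView:
    """Toy stand-in for a materialized view; hypothetical, not the ksqlDB API."""

    def __init__(self) -> None:
        self._state: Dict[str, int] = {}
        self._subscribers: List[Callable[[str, int], None]] = []

    def subscribe(self, callback: Callable[[str, int], None]) -> None:
        """Push style: register once, then receive every future update."""
        self._subscribers.append(callback)

    def pull(self, key: str) -> Optional[int]:
        """Pull style: return the current value for a key and finish."""
        return self._state.get(key)

    def apply(self, key: str, delta: int) -> None:
        """Ingest a new event, update the state, and notify push subscribers."""
        self._state[key] = self._state.get(key, 0) + delta
        for callback in self._subscribers:
            callback(key, self._state[key])


view = MaterializedView()
pushed: List[Tuple[str, int]] = []
view.subscribe(lambda key, value: pushed.append((key, value)))

view.apply("agent-007", 1)
snapshot = view.pull("agent-007")  # point-in-time lookup, returns immediately
view.apply("agent-007", 1)         # the subscriber sees this update, the snapshot does not
```

The subscriber keeps receiving results as new data arrives, whereas the pulled value is a snapshot that has to be requested again to see later changes.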

Slide 42

ksqlDB: PULL Query

Slide 43

ksqlDB Functions
‣ choose from three categories
‣ scalar ➔ UDF
‣ aggregation ➔ UDAF
‣ table ➔ UDTF
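The three categories differ mainly in how many rows go in and how many come out. The plain-Python sketch below illustrates just that shape; the function names and sample data are hypothetical, and real ksqlDB custom functions are written in Java rather than Python.

```python
from typing import Iterable, List, Tuple


def scalar_upper(value: str) -> str:
    """Scalar (UDF-like): one input row produces exactly one output value."""
    return value.upper()


def aggregate_count(values: Iterable[str]) -> int:
    """Aggregation (UDAF-like): many input rows collapse into a single value."""
    return sum(1 for _ in values)


def table_explode(agent: str, gadgets: List[str]) -> List[Tuple[str, str]]:
    """Table (UDTF-like): one input row fans out into zero or more output rows."""
    return [(agent, gadget) for gadget in gadgets]


names = ["bond", "moneypenny"]
upper_names = [scalar_upper(name) for name in names]  # one output per input row
total = aggregate_count(names)                        # one output for all rows
exploded = table_explode("bond", ["watch", "car"])    # several outputs for one row
```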

Slide 44

Slide 45

Criteria 2: Extensibility Options
‣ custom functions (UDFs, UDAFs & UDTFs)
‣ enable flexible & powerful capabilities
‣ but Java code needed
HINT: custom UDF example shown in demo later

Slide 46

Fictional Use Case

Slide 47

ksqlDB Use Case

Slide 48

ksqlDB Use Case

Slide 49

ksqlDB Use Case

Slide 50

ksqlDB Use Case

Slide 51

Data Architecture

Slide 52

Data Architecture

Slide 53

Data Architecture

Slide 54

Data Architecture

Slide 55

Data Architecture

Slide 56

Data Architecture

Slide 57

Data Architecture

Slide 58

Criteria 3: ML Integration Paths
‣ call fully-managed ML services (external)
‣ run your own model server (co-located)
‣ package home-brewed model into UDF (embedded)
‣ completely separated: integrate ML results via Connectors

Slide 59

Everywhere… ksqlDB runs everywhere

Slide 60

Criteria 4: Deployment Options
‣ run however you want
‣ bare metal, VMs, containers
‣ deploy wherever you need
‣ on-premises, private / public cloud, hybrid
‣ something fully-managed?

Slide 61

Slide 62

Unfortunately, no one can be told what ksqlDB is… You have to TRY IT for yourselves!
https://ksqldb.io

Slide 63

Slide 64

reach out to me @hpgrahsl

Slide 65

Slide 66