Apache Kafka Event-Streaming Platform for .NET Developers

A presentation at NDC Porto in April 2020 by Viktor Gamov

Slide 1

#NDCPORTO @ndc_conferences Apache Kafka Event-Streaming Platform for .NET Developers April, 2020 @gamussa | #NDCPORTO | @ConfluentINc

Slide 2

Slide 3

I build highly scalable Hello World apps

Slide 4

A company is built on DATA FLOWS, but all we have is DATA STORES

Slide 5

Pre-Streaming

Slide 6

Slide 7

Slide 8

New World
Streaming first
• DB/DWH + many more distributed data systems
• Monolith -> Microservices
• Batch -> Real-time

Slide 9

Streaming Platform: Storage, Pub / Sub, Processing

Slide 10

Storage

Slide 11

Core Abstraction
● DB - table
● Hadoop - file
● Kafka - ?

Slide 12

Slide 13

LOG

Slide 14

The log is a simple idea
Messages are added at the end of the log
Diagram labels: Old, New
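
The append-only idea on this slide can be sketched in a few lines. This is a minimal in-memory illustration, not Kafka's storage format; the `Log` class and its method names are hypothetical.

```python
class Log:
    """A tiny in-memory append-only log (illustrative only)."""

    def __init__(self):
        self._entries = []

    def append(self, message):
        """Add a message at the end of the log and return its offset."""
        self._entries.append(message)
        return len(self._entries) - 1

    def read(self, offset):
        """Return the message stored at the given offset."""
        return self._entries[offset]

    def __len__(self):
        return len(self._entries)


log = Log()
first = log.append("order created")   # oldest entry, offset 0
second = log.append("order paid")     # newest entry, offset 1
```

Existing entries are never modified; new messages only ever land at the end, which is what makes an offset a stable position.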

Slide 15

Slide 16

Pub / Sub

Slide 17

Time

Slide 18

Time C1 C2 C3

Slide 19

Time
A B C D
hash(key) % numPartitions = N
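
The formula on this slide can be tried out directly. The sketch below uses `zlib.crc32` purely as a stand-in hash (an assumption for illustration; it is not the hash function Kafka clients use), and `partition_for` is a hypothetical helper name.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition: hash(key) % numPartitions."""
    # crc32 is only a deterministic stand-in for the client's real hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# The same key always maps to the same partition.
placement = {key: partition_for(key, 4) for key in ["A", "B", "C", "D"]}
```

Because the mapping depends on `num_partitions`, changing the partition count changes where a given key lands.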

Slide 20

Time
Messages will be produced in a round-robin fashion
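
Round-robin placement can be sketched as follows. This is an illustration of the idea only; `round_robin` is a hypothetical helper, not a client API.

```python
from itertools import cycle


def round_robin(messages, num_partitions):
    """Pair each message with the next partition in turn."""
    partitions = cycle(range(num_partitions))
    return [(next(partitions), message) for message in messages]


assignments = round_robin(["m0", "m1", "m2", "m3", "m4"], 3)
```

Here the five messages are spread over partitions 0, 1, 2, 0, 1, so load is even but related messages are not kept together.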

Slide 21

Consumers have a position all of their own
Diagram: a log running from Old to New; "Ricardo is here", "Robin is here", and "Viktor is here" each mark a separate position, and each consumer scans from its own position
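
A minimal sketch of independent consumer positions, assuming a plain Python list as the log. The starting offsets and the `poll` helper are made up for illustration.

```python
# The shared log, oldest entry first.
log = ["e0", "e1", "e2", "e3", "e4"]

# Each consumer tracks its own offset (hypothetical starting points).
positions = {"Ricardo": 4, "Robin": 2, "Viktor": 0}


def poll(consumer, max_records=2):
    """Scan forward from the consumer's own position and advance it."""
    start = positions[consumer]
    records = log[start:start + max_records]
    positions[consumer] = start + len(records)
    return records


robin_batch = poll("Robin")    # reads from offset 2
viktor_batch = poll("Viktor")  # reads from offset 0, unaffected by Robin
```

Reading never removes anything from the log, so one consumer's progress has no effect on another's.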

Slide 22

Slide 23

Slide 24

Only Sequential Access
Read to offset & scan
Diagram labels: Old, New

Slide 25

Diagram labels: CONSUMERS, CONSUMER GROUP COORDINATOR, CONSUMER GROUP

Slide 26

C

Slide 27

CC C1 CC C2

Slide 28

C C C C

Slide 29

0 1 2 3

Slide 30

Slide 31

0, 3 1 2 3
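
The labels on the last few slides appear to show partitions 0 to 3 being spread over the consumers in a group, with one consumer ending up with partitions 0 and 3. Assuming that reading is right, the sketch below shows the general idea of reassigning partitions when group membership changes. It is a simplified stand-in, not any of Kafka's actual assignment strategies; `assign` and the consumer names are hypothetical.

```python
def assign(partitions, consumers):
    """Spread partitions over the group's consumers, one at a time."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four consumers, four partitions: one partition each.
before = assign([0, 1, 2, 3], ["c1", "c2", "c3", "c4"])

# One consumer leaves: its partition is picked up by a remaining member.
after = assign([0, 1, 2, 3], ["c1", "c2", "c3"])
```

Within a group each partition has exactly one owner at a time, so adding consumers beyond the partition count leaves the extras idle.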

Slide 32

Linearly Scalable Architecture
Single topic:
- Many producer machines
- Many consumer machines
- Many broker machines
No bottleneck!!
Diagram labels: Producers, Consumers

Slide 33

From/to other systems: Kafka Connect and more
Tip: Great option to gradually move workloads to Kafka while keeping production running!

Slide 34

Kafka Connect
● Deployed standalone (development) or as a distributed cluster (production)
● Elastic service that works on bare-metal, VMs, containers, Kubernetes, …
● The individual ‘Connector’ determines delivery guarantees, e.g., exactly-once

Slide 35

Single Message Transforms for real-time ETL
Ingress: modify an Event before storing
● Obfuscate sensitive information, e.g. PII
● Add origin of event for lineage tracking
● Remove unnecessary data fields
● … and more
Egress: modify an Event on its way out
● Route high-priority events to faster stores
● Direct events to different Elasticsearch indexes
● Cast data types to match destination
● … and more
Example: { user: ab123, gender: female, ip: 1.2.3.95 } -> { user: ab123, ip: 1.2.3.XXX }
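
The before/after event on this slide can be reproduced with two small functions. In Kafka Connect itself, transforms are configured on the connector rather than written like this; the Python below only mimics the effect, and `mask_ip` and `drop_fields` are hypothetical names.

```python
def mask_ip(event):
    """Return a copy of the event with the last IP octet obfuscated."""
    masked = dict(event)
    octets = masked["ip"].split(".")
    masked["ip"] = ".".join(octets[:3] + ["XXX"])
    return masked


def drop_fields(event, fields):
    """Return a copy of the event without the named fields."""
    return {key: value for key, value in event.items() if key not in fields}


event = {"user": "ab123", "gender": "female", "ip": "1.2.3.95"}

# Apply the transforms one after another, one message at a time.
stored = drop_fields(mask_ip(event), {"gender"})
```

Each function looks at a single message in isolation, which is the defining constraint of a single message transform: no joins, no aggregation across events.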

Slide 36

Replicate to get fault tolerance
Diagram labels: leader, msg, Machine A, Machine B, replicate msg

Slide 37

Partition Leadership and Replication
Diagram: Topic1 has four partitions (partition1 to partition4), each with three replicas spread across Broker 1 to Broker 4; every replica is marked as Leader or Follower

Slide 38

Replication provides resiliency
A replica takes over on machine failure
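
A toy model of "a replica takes over", assuming each partition keeps an ordered list of brokers holding a copy. Real leader election in Kafka involves more than this (for example, which replicas are in sync); the `Partition` class and broker names are hypothetical.

```python
class Partition:
    """One partition with a leader and follower replicas (illustrative)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)   # brokers holding a copy
        self.leader = self.replicas[0]   # first replica leads initially

    def on_broker_failure(self, broker):
        """Drop the failed broker and promote a follower if it was leading."""
        if broker in self.replicas:
            self.replicas.remove(broker)
        if self.leader == broker:
            self.leader = self.replicas[0] if self.replicas else None


partition = Partition(["broker1", "broker2", "broker3"])
partition.on_broker_failure("broker1")   # the leader's machine fails
```

After the failure the partition is still served, now led by one of the former followers.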

Slide 39

Partition Leadership and Replication - node failure
Diagram: Topic1 has four partitions (partition1 to partition4), each with three replicas spread across Broker 1 to Broker 4; every replica is marked as Leader or Follower

Slide 40

The log is a type of durable messaging system
Similar to a traditional messaging system (ActiveMQ, RabbitMQ, etc.) but with:
(a) Far better scalability
(b) Built-in fault tolerance / HA
(c) Storage

Slide 41

Stop! Demo time!

Slide 42

Processing

Slide 43

Streaming is the toolset for dealing with events as they move!

Slide 44

What exactly is Stream Processing?
Diagram: authorization_attempts -> possible_fraud

Slide 45

What exactly is Stream Processing?
Diagram: authorization_attempts -> possible_fraud
CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;
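
To make the query's logic concrete, here is a small batch-style sketch of the same rule: count authorization attempts per card in 5-minute tumbling windows and keep the windows with more than 3 attempts. It processes a finished list rather than a live stream, and the input shape (timestamp in seconds, card number) is an assumption for illustration.

```python
from collections import Counter

WINDOW_SECONDS = 5 * 60  # WINDOW TUMBLING (SIZE 5 MINUTE)


def possible_fraud(attempts, threshold=3):
    """attempts: iterable of (timestamp_seconds, card_number) pairs."""
    counts = Counter()
    for timestamp, card_number in attempts:
        # Tumbling windows do not overlap: each event falls in exactly one.
        window_start = timestamp - timestamp % WINDOW_SECONDS
        counts[(card_number, window_start)] += 1
    # HAVING count(*) > 3
    return {key: n for key, n in counts.items() if n > threshold}


attempts = [
    (0, "1111"), (30, "1111"), (60, "1111"), (90, "1111"),
    (100, "2222"),
    (310, "1111"),  # falls in the next 5-minute window
]
flagged = possible_fraud(attempts)
```

Card `1111` is flagged for the window starting at second 0 with four attempts; its fifth attempt lands in the next window and counts separately.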

Slide 46

Slide 47

Slide 48

Slide 49

Slide 50

Slide 51

Lower the bar to enter the world of streaming
Chart axes: Coding Sophistication, User Population
● Core developers who use Java/Scala streams
● Core developers who don’t use Java/Scala, e.g. .NET, Go
● Data engineers, architects, DevOps/SRE
● BI analysts

Slide 52

KSQL #FTW
1 CLI (ksql>)
2 UI
3 REST (POST /query)
4 Headless
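
For the REST option, a request to the `/query` endpoint can be prepared with the standard library alone. This is a sketch under assumptions: the server address `http://localhost:8088`, the JSON payload shape, and the content type reflect the KSQL REST API as commonly documented and should be checked against the version in use; `build_query_request` is a hypothetical helper. The request is only built here, not sent.

```python
import json
from urllib import request


def build_query_request(server_url, ksql):
    """Prepare (but do not send) a POST /query request for a KSQL server."""
    body = json.dumps({"ksql": ksql, "streamsProperties": {}}).encode("utf-8")
    return request.Request(
        url=server_url.rstrip("/") + "/query",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json"},
        method="POST",
    )


# Assumed local server address; adjust for a real deployment.
req = build_query_request(
    "http://localhost:8088",
    "SELECT * FROM possible_fraud;",
)
```

Passing `req` to `urllib.request.urlopen` would send it; because this is plain HTTP, the same call can be made from any language, including .NET.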

Slide 53

Interaction with Kafka
● Kafka (data)
● KSQL (processing): does not run on Kafka brokers
● Application (processing), Java/KStreams or .NET: does not run on Kafka brokers

Slide 54

Find your local Meetup Group: https://cnfl.io/kafka-meetups
Grab Stream Processing books: https://cnfl.io/book-bundle
Join us in Slack: http://cnfl.io/slack

Slide 55

Thanks!
@gamussa
viktor@confluent.io