#NDCPORTO @ndc_conferences Apache Kafka Event-Streaming Platform for .NET Developers April, 2020 @gamussa | #NDCPORTO | @ConfluentINc

2 @gamussa | #NDCPorto | @ConfluentINc

3 I build highly scalable Hello World apps @gamussa | #NDCPorto | @ConfluentINc

4 A company is build on DATA FLOWS but All we have is DATA STORES @gamussa | #NDCPorto | @ConfluentINc

5 Pre-Streaming @gamussa | #NDCPorto | @ConfluentINc

6 @gamussa | #NDCPorto | @ConfluentINc

8 New World Streaming first • DB/DWH + Many more distributed data systems • Monolith -> Microservices • Batch -> Real-time @gamussa | #NDCPorto | @ConfluentINc

9 Streaming Platform Storage Pub / Sub Processing @gamussa | #NDCPorto | @ConfluentINc

10 Storage @gamussa | #NDCPorto | @ConfluentINc

11 Core Abstraction ● DB - table ● Hadoop - file ● Kafka - ? @gamussa | #NDCPorto | @ConfluentINc

13 LOG @gamussa | #NDCPorto | @ConfluentINc

14 The log is a simple idea New Old Messages are added at the end of the log @gamussa | #NDCPorto | @ConfluentINc

15 The log is a simple idea New Old Messages are added at the end of the log @gamussa | #NDCPorto | @ConfluentINc

16 Pub / Sub @gamussa | #NDCPorto | @ConfluentINc

17 Time @gamussa | #NDCPorto | @ConfluentINc

18 Time C1 @gamussa | C2 #NDCPorto C3 | @ConfluentINc

19 Time A B hash(key) % numPartitions = N C D @gamussa | #NDCPorto | @ConfluentINc

20 Time Messages will be produced in a round robin fashion @gamussa | #NDCPorto | @ConfluentINc

21 Consumers have a position all of their own Ricardo is here Scan New Old Robin is here Scan Viktor is here @gamussa | Scan #NDCPorto | @ConfluentINc

22 Consumers have a position all of their own Ricardo is here Scan New Old Robin is here @gamussa Viktor is here Scan | #NDCPorto | Scan @ConfluentINc

23 Consumers have a position all of their own Ricardo is here Scan New Old Robin is here @gamussa | Viktor is here Scan #NDCPorto | @ConfluentINc Scan

24 Only Sequential Access Old Read to offset & scan @gamussa | #NDCPorto | @ConfluentINc New

CONSUMERS CONSUMER GROUP COORDINATOR CONSUMER GROUP

26 C @gamussa | #NDCPorto | @ConfluentINc

27 CC C1 CC C2 @gamussa | #NDCPorto | @ConfluentINc

28 @gamussa | #NDCPorto | C C C C @ConfluentINc

29 @gamussa | #NDCPorto | 0 1 2 3 @ConfluentINc

30 @gamussa | #NDCPorto | 0 1 2 3 @ConfluentINc

31 @gamussa | #NDCPorto | 0, 3 1 2 3 @ConfluentINc

32 Linearly Scalable Architecture Producers Single topic: - Many producers machines - Many consumer machines - Many Broker machines No Bottleneck!! Consumers @gamussa | #NDCPorto | @ConfluentINc

33 From/to other systems: Kafka Connect and more Tip: Great option to gradually move workloads to Kafka while keeping production running!

34 Kafka Connect ● Deployed standalone (development) or as a distributed cluster (production) ● Elastic service that works on bare-metal, VMs, containers, Kubernetes, … ● The individual ‘Connector’ determines delivery guarantees, e.g., exactly-once VM VM

35 Single Message Transforms for real-time ETL Ingress: modify an Event before storing ●Obfuscate sensitive information, e.g. PII ●Add origin of event for lineage tracking ●Remove unnecessary data fields ●… and more { user: ab123, gender: female, ip: 1.2.3.95 } Egress: modify an Event on its way out ●Route high-priority events to faster stores ●Direct events to different Elasticsearch indexes ●Cast data types to match destination ●… and more { user: ab123, ip: 1.2.3.XXX }

36 Replicate to get fault tolerance leader msg Machine B Machine A @gamussa replicate msg | #NDCPorto | @ConfluentINc

37 Partition Leadership and Replication Topic1 partition1 Topic1 partition1 Topic1 partition1 Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4 Broker 1 Broker 2 Topic1 partition4 Broker 3 Broker 4 Leader @gamussa | #NDCPorto | @ConfluentINc Follower

38 Replication provides resiliency A replica takes over on machine failure @gamussa | #NDCPorto | @ConfluentINc

39 Partition Leadership and Replication - node failure Topic1 partition1 Topic1 partition1 Topic1 partition1 Topic1 partition2 Topic1 partition2 Topic1 partition2 Topic1 partition3 Topic1 partition3 Topic1 partition3 Topic1 partition4 Topic1 partition4 Broker 1 Broker 2 Topic1 partition4 Broker 3 Broker 4 Leader @gamussa | #NDCPorto | @ConfluentINc Follower

40 The log is a type of durable messaging system Similar to a traditional messaging system (ActiveMQ, Rabbit etc) but with: (a) Far better scalability (b) Built in fault tolerance / HA (c) Storage

Stop! Demo time! @gamussa | #NDCPorto | @ConfluentINc

42 Processing @gamussa | #NDCPorto | @ConfluentINc

43 Streaming is the toolset for dealing with events as they move! @gamussa | #NDCPorto | @ConfluentINc

44 What exactly is Stream Processing? authorization_attempts @gamussa possible_fraud | #NDCPorto | @ConfluentINc

45 What exactly is Stream Processing? possible_fraud authorization_attempts CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa | #NDCPorto | @ConfluentINc

46 What exactly is Stream Processing? possible_fraud authorization_attempts CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa | #NDCPorto | @ConfluentINc

47 What exactly is Stream Processing? possible_fraud authorization_attempts CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa | #NDCPorto | @ConfluentINc

48 What exactly is Stream Processing? possible_fraud authorization_attempts CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa | #NDCPorto | @ConfluentINc

49 What exactly is Stream Processing? possible_fraud authorization_attempts CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa | #NDCPorto | @ConfluentINc

50 What exactly is Stream Processing? possible_fraud authorization_attempts CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa | #NDCPorto | @ConfluentINc

51 Coding Sophistication Lower the bar to enter the world of streaming Core developers who use Java/Scala streams Core developers who don’t use Java/Scala, e.g. .NET, Go Data engineers, architects, DevOps/SRE BI analysts User Population @gamussa | #NDCPorto | @ConfluentINc

52 KSQL #FTW ksql> 1 UI POST /query CLI 2 @gamussa | #NDCPorto 3 | REST @ConfluentINc 4 Headless

53 Interaction with Kafka KSQL (processing) Application Kafka (processing) Java/KStreams, .NET (data) Does not run on Kafka brokers @gamussa Does not run on Kafka brokers | #NDCPorto | @ConfluentINc

54 Find your local Meetup Group https://cnfl.io/kafka-meetups Grab Stream Processing books https://cnfl.io/book-bundle Join us in Slack http://cnfl.io/slack @gamussa | #NDCPorto | @ConfluentINc

Thanks! @gamussa viktor@confluent.io @gamussa | @ #NDCPorto | @ConfluentINc