Crossing the Streams: Rethinking Stream Processing with KStreams and KSQL

A presentation at DevNexus 2019 in March 2019 in Atlanta, GA, USA by Viktor Gamov

Slide 1

Crossing the Streams: Rethinking Stream Processing with Kafka Streams and KSQL DEVNEXUS 2019 | @tlberglund | @gamussa

Slide 2

@gamussa | @tlberglund | #DEVnexus

Slide 3

https://cnfl.io/streams-movie-demo

Slide 4

Raffle, yeah 🚀 Follow @gamussa 📸🖼👬 Tag @gamussa @tlberglund with #devnexus

Slide 5

Preface

Slide 6

Streaming is the toolset for dealing with events as they move!

Slide 7

Streaming platform: Java apps with Kafka Streams or KSQL; a serving layer (microservices, Elastic, etc.); API-based clustering; high-throughput continuous computation.

Slide 8

Apache Kafka Event Streaming Platform 101

Slide 9

Event Streaming Platform Architecture: applications (native client library, Kafka Streams, or KSQL) behind a load balancer, plus REST Proxy, Schema Registry, Kafka Connect, the Kafka brokers, and ZooKeeper nodes.

Slide 10

The log is a simple idea: messages are added at the end of the log (old at one end, new at the other).


Slide 12

Consumers have a position all of their own: Ricardo, Robin, and Viktor each scan the log from their own offset between old and new.


Slide 15

Slide 15

Only Sequential Access Old Read to offset & scan @gamussa | @tlberglund | #DEVnexus New
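
These two ideas (append only at the end; each consumer keeps its own position and only scans forward) can be sketched in a few lines of plain Java. This is a toy model for illustration, not the Kafka API; the class and method names are invented:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a log: append-only, read sequentially from a per-consumer offset.
class ToyLog {
    private final List<String> messages = new ArrayList<>();

    // Producers only ever add at the end ("new"); nothing is updated in place.
    long append(String message) {
        messages.add(message);
        return messages.size() - 1; // the offset of the appended message
    }

    // Reading is a forward scan from an offset; it never mutates the log, so
    // Ricardo, Robin, and Viktor can each hold a different position in it.
    List<String> readFrom(long offset, int maxMessages) {
        int from = (int) Math.min(offset, messages.size());
        int to = Math.min(from + maxMessages, messages.size());
        return new ArrayList<>(messages.subList(from, to));
    }
}
```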

Slide 16

Shard data to get scalability: producers (1, 2, 3) send messages to different partitions, and partitions live on different machines in the cluster.
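
The key-to-partition mapping behind this sharding can be sketched as a simple hash. Note this is a simplification for illustration: Kafka's default partitioner actually runs murmur2 over the serialized key bytes, not Java's `hashCode`:

```java
// Same key -> same partition, so per-key ordering is preserved while
// different keys spread across the machines that host the partitions.
class ToyPartitioner {
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is non-negative, then wrap around.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```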

Slide 17

Consumers, consumer groups, and the consumer group coordinator.

Slide 18

Linearly scalable architecture: for a single topic there can be many producer machines, many consumer machines, and many broker machines. No bottleneck!

Slide 19

// in-memory store, not persistent
Map<String, Integer> groupByCounts = new HashMap<>();
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProperties());
     KafkaProducer<String, Integer> producer = new KafkaProducer<>(producerProperties())) {
    consumer.subscribe(Arrays.asList("A", "B"));
    while (true) { // consumer poll loop
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
        for (ConsumerRecord<String, String> record : records) {
            String key = record.key();
            Integer count = groupByCounts.get(key);
            if (count == null) {
                count = 0;
            }
            count += 1;
            groupByCounts.put(key, count);
        }
    }


Slide 23

while (true) { // consumer poll loop
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
    for (ConsumerRecord<String, String> record : records) {
        String key = record.key();
        Integer count = groupByCounts.get(key);
        if (count == null) {
            count = 0;
        }
        count += 1;
        groupByCounts.put(key, count);
    }
}

Slide 24

while (true) { // consumer poll loop
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
    for (ConsumerRecord<String, String> record : records) {
        String key = record.key();
        Integer count = groupByCounts.get(key);
        if (count == null) {
            count = 0;
        }
        count += 1;
        // actually doing something useful
        groupByCounts.put(key, count);
    }
}

Slide 25

        if (counter++ % sendInterval == 0) {
            for (Map.Entry<String, Integer> groupedEntry : groupByCounts.entrySet()) {
                ProducerRecord<String, Integer> producerRecord = new ProducerRecord<>(
                        "group-by-counts", groupedEntry.getKey(), groupedEntry.getValue());
                producer.send(producerRecord);
            }
            consumer.commitSync();
        }
    }
}


Slide 28

https://twitter.com/monitoring_king/status/1048264580743479296

Slide 29

LET'S TALK ABOUT THIS FRAMEWORK OF YOURS. I THINK IT'S GOOD, EXCEPT IT SUCKS

Slide 30

SO LET ME SHOW KAFKA STREAMS THAT WAY IT MIGHT BE REALLY GOOD

Slide 31

Talk is cheap! Show me code!

Slide 32

What every framework wants to be when it grows up: scalable, elastic, stateful, fault-tolerant, distributed.

Slide 33

Slide 34

Slide 35

The KAFKA STREAMS API is a JAVA API to BUILD REAL-TIME APPLICATIONS
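
The group-and-count job hand-rolled on the earlier slides collapses to a few lines in the Streams DSL. This is a sketch only: it assumes the kafka-streams dependency, a running cluster, and a streamsConfig() helper analogous to the consumerProperties()/producerProperties() helpers in the earlier snippets:

```java
import java.util.Arrays;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Produced;

StreamsBuilder builder = new StreamsBuilder();
builder.<String, String>stream(Arrays.asList("A", "B"))
       .groupByKey()
       .count()                      // stateful, fault-tolerant count per key
       .toStream()
       .to("group-by-counts", Produced.with(Serdes.String(), Serdes.Long()));

KafkaStreams streams = new KafkaStreams(builder.build(), streamsConfig());
streams.start();                     // scaling out = starting more instances
```

No HashMap, no poll loop, no manual commit: the state store, repartitioning, and fault tolerance come from the library.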

Slide 36

App with the Streams API: not running inside the brokers!

Slide 37

Same app, many instances: App (Streams API), App (Streams API), App (Streams API). Brokers? Nope!

Slide 38

Before: your job runs on a processing cluster, with a shared database feeding a dashboard.

Slide 39

After: an app with the Streams API feeds the dashboard directly.

Slide 40

This means you can DEPLOY your app ANYWHERE using WHATEVER TECHNOLOGY YOU WANT

Slide 41

So many places to run your app! …and many more…

Slide 42

Things Kafka Streams does:
• Open source, with enterprise support
• Powerful processing incl. filters, transforms, joins, aggregations, windowing
• Runs everywhere
• Supports streams and tables
• Elastic, scalable, fault-tolerant
• Exactly-once processing
• Event-time processing
• Kafka security integration

Slide 43

Talk is cheap! Show me code!


Slide 45

Slide 46

Do you think that's a table you are querying?

Slide 47

The Stream-Table Duality: a stream of payments arriving over time (Alice +€50, Bob +€18, Alice +€25, Alice −€60) and a table of balances evolving with it (Alice €50 → €75 → €15; Bob €18).
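
The duality is just a fold: replaying the payments stream and keeping the running sum per key yields the table. A toy in-memory sketch in plain Java (Kafka Streams materializes the same thing as a KTable backed by a changelog topic):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StreamTableDuality {
    // Fold a stream of (account, amount) payment events into a balances table.
    static Map<String, Integer> toTable(List<Map.Entry<String, Integer>> payments) {
        Map<String, Integer> balances = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> payment : payments) {
            // Latest state per key = sum of all events seen for that key so far.
            balances.merge(payment.getKey(), payment.getValue(), Integer::sum);
        }
        return balances;
    }
}
```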

Slide 48

Talk is cheap! Show me code!

Slide 49

What’s next?

Slide 50

https://twitter.com/IDispose/status/1048602857191170054

Slide 51

Lower the bar to enter the world of streaming (coding sophistication vs. user population):
• Core developers who use Java/Scala
• Core developers who don't use Java/Scala
• Data engineers, architects, DevOps/SRE
• BI analysts

Slide 52

KSQL #FTW. Four ways to use it: 1) UI, 2) CLI (ksql>), 3) REST (POST /query), 4) headless.
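
At this end of the spectrum, the continuous count from the earlier Java slides needs no Java at all. A sketch only: the stream and column names here are invented, and it assumes a KSQL server attached to the cluster:

```sql
-- a continuously updating table, maintained by the KSQL server
CREATE TABLE group_by_counts AS
  SELECT ticker, COUNT(*) AS event_count
  FROM events
  GROUP BY ticker;
```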

Slide 53

Interaction with Kafka: both KSQL (processing) and a JVM application with Kafka Streams (processing) talk to Kafka (data); neither runs on the Kafka brokers.

Slide 54

Standing on the shoulders of streaming giants (ease of use vs. flexibility): KSQL is powered by Kafka Streams, which is powered by the producer and consumer APIs.

Slide 55

One last thing…

Slide 56

https://kafka-summit.org Gamov30

Slide 57

Thanks! @gamussa viktor@confluent.io @tlberglund tim@confluent.io We are hiring! https://www.confluent.io/careers/