Crossing The Streams: Kafka For Spring Developers

A presentation at NYJavaSIG in September 2018 in New York, NY, USA by Viktor Gamov

Slide 1

Slide 1

Crossing the Streams: Kafka for Spring Developers @gamussa @ @NYJavaSig @confluentinc

Slide 2

Slide 2

@gamussa @ @NYJavaSig @confluentinc

Slide 3

Slide 3

The agenda Quick intro to Streaming Platform Gentle intro to Stream processing Wire everything nicely with Spring @gamussa @ @NYJavaSig @confluentinc

Slide 4

Slide 4

https://cnfl.io/streams-movie @gamussa @ @NYJavaSig @confluentinc

Slide 5

Slide 5

Who am I? Solutions Architect Developer Advocate @gamussa in internetz Hey you, yes, you, go follow me in twitter © @gamussa @ @NYJavaSig @confluentinc

Slide 6

Slide 6

Kafka & Confluent @gamussa @ @NYJavaSig @confluentinc

Slide 7

Slide 7

@gamussa @ @NYJavaSig @confluentinc

Slide 8

Slide 8

@gamussa @ @NYJavaSig @confluentinc

Slide 9

Slide 9

@gamussa @ @NYJavaSig @confluentinc

Slide 10

Slide 10

Origins in Stream Processing Kafka Streams / KSQL Kafka Serving Layer (Cassandra, KV-storage etc.) High Throughput Messaging @gamussa @ @NYJavaSig Continuous Computation API based clustering @confluentinc

Slide 11

Slide 11

Kafka is a Streaming Platform Producer Connectors Consumer The Log Connectors Streaming Engine @gamussa @ @NYJavaSig @confluentinc

Slide 12

Slide 12

Streaming @gamussa @ @NYJavaSig @confluentinc

Slide 13

Slide 13

What exactly is Stream Processing? authorization_attempts @gamussa possible_fraud @ @NYJavaSig @confluentinc

Slide 14

Slide 14

What exactly is Stream Processing? possible_fraud authorization_attempts CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa @ @NYJavaSig @confluentinc

Slide 15

Slide 15

What exactly is Stream Processing? possible_fraud authorization_attempts CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa @ @NYJavaSig @confluentinc

Slide 16

Slide 16

What exactly is Stream Processing? possible_fraud authorization_attempts CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa @ @NYJavaSig @confluentinc

Slide 17

Slide 17

What exactly is Stream Processing? possible_fraud authorization_attempts CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa @ @NYJavaSig @confluentinc

Slide 18

Slide 18

What exactly is Stream Processing? possible_fraud authorization_attempts CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa @ @NYJavaSig @confluentinc

Slide 19

Slide 19

What exactly is Stream Processing? possible_fraud authorization_attempts CREATE STREAM possible_fraud AS SELECT card_number, count() FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTE) GROUP BY card_number HAVING count() > 3; @gamussa @ @NYJavaSig @confluentinc

Slide 20

Slide 20

What is a Streaming Platform? Producer Consumer Connectors Connectors The Log Streaming Engine @gamussa @ @NYJavaSig @confluentinc

Slide 21

Slide 21

Kafka’s Distributed Log Producer Connectors Consumer Connectors The Log Streaming Engine @gamussa @ @NYJavaSig @confluentinc

Slide 22

Slide 22

The log - durable messaging system Similar to a traditional messaging system (ActiveMQ, Rabbit) but with: (a) Far better scalability (b) Built in fault tolerance / HA (c) Storage @gamussa @ @NYJavaSig @confluentinc

Slide 23

Slide 23

The log is a simple idea New Old Messages are added at the end of the log @gamussa @ @NYJavaSig @confluentinc

Slide 24

Slide 24

Consumers have a position all of their own George is here Scan New Old Fred is here @gamussa Sally is here Scan @ @NYJavaSig @confluentinc Scan

Slide 25

Slide 25

Only Sequential Access Old Read to offset & scan @gamussa @ @NYJavaSig @confluentinc New

Slide 26

Slide 26

Shard data to get scalability Messages are sent to different partitions Producer (1) Producer (2) Producer (3) Cluster of machines Partitions live on different machines @gamussa @ @NYJavaSig @confluentinc

Slide 27

Slide 27

Replicate to get fault tolerance leader msg Machine A @gamussa Machine B replicate @ @NYJavaSig msg @confluentinc

Slide 28

Slide 28

Replication provides resiliency A ‘replica’ takes over on machine failure @gamussa @ @NYJavaSig @confluentinc

Slide 29

Slide 29

Linearly Scalable Architecture Producers Single topic: - Many producers machines - Many consumer machines - Many Broker machines No Bottleneck!! Consumers @gamussa @ @NYJavaSig @confluentinc

Slide 30

Slide 30

Worldwide, localized views London Replicator Replicator Tokyo NY Replicator @gamussa @ @NYJavaSig @confluentinc !30

Slide 31

Slide 31

The Connect API Producer Connectors Consumer Connectors The Log Streaming Engine @gamussa @ @NYJavaSig @confluentinc

Slide 32

Slide 32

Ingest / Egest into any data source Kafka Connect @gamussa Kafka Connect @ @NYJavaSig @confluentinc

Slide 33

Slide 33

Ingest/Egest data from/to data sources Amazon S3 Elasticsearch HDFS JDBC Couchbase Cassandra Oracle SAP Vertica Blockchain JMX Kenesis MongoDB MQTT NATS Postgres Rabbit Redis Twitter Bintray DynamoDB FTP Github BigQuery Google Pub Sub RethinkDB Salesforce Solr Splunk @gamussa @ @NYJavaSig @confluentinc

Slide 34

Slide 34

Kafka Streams and KSQL Producer Connectors Consumer Connectors The Log Streaming Engine @gamussa @ @NYJavaSig @confluentinc

Slide 35

Slide 35

@gamussa @ @NYJavaSig @confluentinc

Slide 36

Slide 36

@gamussa @ @NYJavaSig @confluentinc

Slide 37

Slide 37

Before @gamussa @ @NYJavaSig @confluentinc

Slide 38

Slide 38

After @gamussa @ @NYJavaSig @confluentinc

Slide 39

Slide 39

Things Kafka Streams Does Runs everywhere Clustering done for you Integrated database Exactly-once processing Joins, windowing, aggregation @gamussa @ @NYJavaSig Event-time processing S/M/L/XL/XXL/XXXL sizes @confluentinc

Slide 40

Slide 40

Slide 41

Slide 41

Streams to Tables @gamussa @ @NYJavaSig @confluentinc

Slide 42

Slide 42

@gamussa @ @NYJavaSig @confluentinc

Slide 43

Slide 43

Stream/Table Duality @gamussa @ @NYJavaSig @confluentinc

Slide 44

Slide 44

Stream/Table Duality @gamussa @ @NYJavaSig @confluentinc

Slide 45

Slide 45

Join Streams and Tables Kafka Kafka Streams / KSQL Topic Stream Join Table Compacted Topic @gamussa @ @NYJavaSig @confluentinc

Slide 46

Slide 46

Kafka is a complete Streaming Platform Producer Connectors Consumer Connectors The Log Streaming Engine @gamussa @ @NYJavaSig @confluentinc

Slide 47

Slide 47

https://www.confluent.io/download/ @gamussa @ @NYJavaSig @confluentinc

Slide 48

Slide 48

Will it fly? Let’s try! @gamussa @ @NYJavaSig @confluentinc

Slide 49

Slide 49

One more thing… @gamussa @ @NYJavaSig @confluentinc

Slide 50

Slide 50

@gamussa @ @NYJavaSig @confluentinc

Slide 51

Slide 51

@gamussa @ @NYJavaSig @confluentinc

Slide 52

Slide 52

A Major New Paradigm @gamussa @ @NYJavaSig @confluentinc

Slide 53

Slide 53

www.kafka-summit.org promo: Gamov20 @gamussa @ @NYJavaSig @confluentinc

Slide 54

Slide 54

Thanks! @gamussa viktor@confluent.io We are hiring! https://www.confluent.io/careers/ @gamussa @ @NYJavaSig @confluentinc