Kafka as a Platform: the Ecosystem from the Ground Up

A presentation at DevSum in May 2021 by Robin Moffatt

Slide 1

Kafka as a Platform: the Ecosystem from the Ground Up Robin Moffatt @rmoff

Slide 2

EVENTS @rmoff

Slide 3

EVENTS @rmoff

Slide 4

EVENTS
• Something happened
• What happened

Slide 5

Human generated events A Sale A Stock movement @rmoff

Slide 6

Machine generated events Networking IoT Applications @rmoff

Slide 7

EVENTS are EVERYWHERE @rmoff

Slide 8

EVENTS are very POWERFUL @rmoff

Slide 9

Slide 10

Slide 11

K V

Slide 12

LOG @rmoff

Slide 13

K V

Slide 14

K V

Slide 15

K V

Slide 16

K V

Slide 17

K V

Slide 18

K V

Slide 19

K V

Slide 20

Immutable Event Log Old New Events are added at the end of the log @rmoff

Slide 21

TOPICS @rmoff

Slide 22

Topics Clicks Orders Customers Topics are similar in concept to tables in a database @rmoff

Slide 23

PARTITIONS @rmoff

Slide 24

Partitions Clicks P0 P1 P2 Messages are guaranteed to be strictly ordered within a partition @rmoff

Slide 25

PUB / SUB @rmoff

Slide 26

PUB / SUB @rmoff

Slide 27

Producing data Old New Messages are added at the end of the log @rmoff

Slide 28

partition 0 … partition 1 producer … partition 2 … Partitioned Topic

Slide 29

package main

import (
	"gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
)

func main() {
	topic := "test_topic"

	p, _ := kafka.NewProducer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092"})
	defer p.Close()

	p.Produce(&kafka.Message{
		TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: 0},
		Value:          []byte("Hello world")}, nil)
}

Slide 30

Producing to Kafka - No Key Time Partition 1 Partition 2 Partition 3 Messages will be produced in a round robin fashion Partition 4 @rmoff

Slide 31

Producing to Kafka - With Key Time Partition 1 A Partition 2 B hash(key) % numPartitions = N Partition 3 C Partition 4 D @rmoff
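
To make the key-based routing concrete, here is a minimal sketch using the same confluent-kafka-go client as the producer example on slide 29 (the topic name, keys, and values are illustrative assumptions). Leaving Partition as kafka.PartitionAny lets the client's partitioner derive the partition from the key hash, so every message with the same key lands in the same partition, in order.

package main

import (
	"gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
)

func main() {
	topic := "clicks"

	p, err := kafka.NewProducer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092"})
	if err != nil {
		panic(err)
	}
	defer p.Close()

	// With a key set and Partition left as PartitionAny, the partition is
	// chosen from the key hash (hash(key) % numPartitions), so all events
	// for user "A" go to one partition and stay strictly ordered.
	for _, user := range []string{"A", "B", "A", "C", "A"} {
		p.Produce(&kafka.Message{
			TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
			Key:            []byte(user),
			Value:          []byte("page_view"),
		}, nil)
	}

	// Wait for any outstanding messages to be delivered.
	p.Flush(5000)
}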

Slide 32

Producers

[Diagram: a producer writing to a partitioned topic — partition 0, partition 1, partition 2]

• A client application
• Puts messages into topics
• Handles partitioning, network protocol
• Java, Go, .NET, C/C++, Python
• Also every other language

Slide 33

PUB / SUB @rmoff

Slide 34

Consuming data - access is only sequential Read to offset & scan Old New @rmoff

Slide 35

Consumers have a position of their own Old New Sally is here Scan @rmoff

Slide 36

Consumers have a position of their own Old New Fred is here Scan Sally is here Scan @rmoff

Slide 37

Consumers have a position of their own Rick is here Scan Old New Fred is here Scan Sally is here Scan @rmoff

Slide 38

c, _ := kafka.NewConsumer(&cm)
defer c.Close()

c.Subscribe(topic, nil)

for {
	select {
	case ev := <-c.Events():
		switch ev.(type) {
		case *kafka.Message:
			km := ev.(*kafka.Message)
			fmt.Printf("✅ Message '%v' received from topic '%v'\n",
				string(km.Value), string(*km.TopicPartition.Topic))
		}
	}
}

Slide 39

Consuming From Kafka - Single Consumer Partition 1 Partition 2 C Partition 3 Partition 4 @rmoff

Slide 40

Consuming From Kafka - Multiple Consumers Partition 1 Partition 2 Partition 3 C1 C2 Partition 4 @rmoff

Slide 41

Consuming From Kafka - Grouped Consumers Partition 1 Partition 2 Partition 3 C1 C1 C2 Partition 4 @rmoff

Slide 42

CONSUMERS CONSUMER GROUP COORDINATOR CONSUMER GROUP
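
A minimal sketch of how a consumer joins a group, again with confluent-kafka-go (the group id and topic name are illustrative assumptions): every consumer configured with the same group.id shares the topic's partitions with the group's other members, and the group coordinator reassigns partitions whenever a member joins or leaves.

package main

import (
	"fmt"

	"gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
)

func main() {
	c, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092",
		// Every consumer started with this group.id becomes a member of the
		// same group and is assigned a share of the topic's partitions.
		"group.id":          "clicks-processor",
		"auto.offset.reset": "earliest",
	})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	c.Subscribe("clicks", nil)

	for {
		// Blocks until the next message from one of this member's partitions.
		msg, err := c.ReadMessage(-1)
		if err != nil {
			continue
		}
		fmt.Printf("%s[%d]: %s\n",
			*msg.TopicPartition.Topic, msg.TopicPartition.Partition, string(msg.Value))
	}
}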

Slide 43

Consuming From Kafka - Grouped Consumers Partition 1 Partition 2 Partition 3 Ci Cii Ciii Civ Partition 4 @rmoff

Slide 44

Consuming From Kafka - Grouped Consumers Partition 1 Partition 2 Partition 3 Ci Cii Ciii Civ Partition 4 @rmoff

Slide 45

Consuming From Kafka - Grouped Consumers Partition 1 Partition 2 Partition 3 Ci Cii Ciii Partition 4 @rmoff

Slide 46

Consumers

[Diagram: consumer group A (three instances) and consumer B reading from a partitioned topic — partition 0, partition 1, partition 2]

• A client application
• Reads messages from topics
• Horizontally, elastically scalable (if stateless)
• Java, Go, .NET, C/C++, Python, everything else

Slide 47

BROKERS and REPLICATION @rmoff

Slide 48

Leader Partition Leadership and Replication Follower Partition 1 Partition 2 Partition 3 Partition 4 Broker 1 Broker 2 Broker 3 @rmoff

Slide 49

Leader Partition Leadership and Replication Follower Partition 1 Partition 1 Partition 1 Partition 2 Partition 2 Partition 2 Partition 3 Partition 3 Partition 3 Partition 4 Partition 4 Partition 4 Broker 1 Broker 2 Broker 3 @rmoff

Slide 50

Leader Partition Leadership and Replication Follower Partition 1 Partition 1 Partition 1 Partition 2 Partition 2 Partition 2 Partition 3 Partition 3 Partition 3 Partition 4 Partition 4 Partition 4 Broker 1 Broker 2 Broker 3 @rmoff
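
For illustration, a topic replicated like the one pictured can be created through the AdminClient; this is a sketch with confluent-kafka-go, and the broker address, topic name, and sizing are assumptions. Each of the four partitions then has a leader on one broker plus two follower replicas on the others.

package main

import (
	"context"
	"fmt"
	"time"

	"gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
)

func main() {
	a, err := kafka.NewAdminClient(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092"})
	if err != nil {
		panic(err)
	}
	defer a.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// 4 partitions, each replicated to 3 brokers: 1 leader + 2 followers.
	results, err := a.CreateTopics(ctx, []kafka.TopicSpecification{{
		Topic:             "orders",
		NumPartitions:     4,
		ReplicationFactor: 3,
	}})
	if err != nil {
		panic(err)
	}
	for _, r := range results {
		fmt.Println(r)
	}
}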

Slide 51

DEMO @rmoff Photo by Raoul Droog on Unsplash

Slide 52

So far, this is Pretty good @rmoff

Slide 53

So far, this is Pretty good but I’ve not finished yet… @rmoff

Slide 54

Streaming Pipelines Amazon S3 RDBMS HDFS @rmoff

Slide 55

Evolve processing from old systems to new Existing App New App <x> RDBMS @rmoff

Slide 56

Slide 57

Streaming Integration with Kafka Connect syslog Sources Kafka Connect Kafka Brokers @rmoff

Slide 58

Streaming Integration with Kafka Connect Amazon Sinks Google Kafka Connect Kafka Brokers @rmoff

Slide 59

Streaming Integration with Kafka Connect Amazon syslog Google Kafka Connect Kafka Brokers @rmoff

Slide 60

Look Ma, No Code!

{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url": "jdbc:mysql://asgard:3306/demo",
  "table.whitelist": "sales,orders,customers"
}

@rmoff
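
A configuration like this is submitted to Kafka Connect over its REST interface (port 8083 by default). Below is a hedged sketch in Go; the Connect host and the connector name jdbc-source are illustrative assumptions.

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	config := []byte(`{
	  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
	  "connection.url": "jdbc:mysql://asgard:3306/demo",
	  "table.whitelist": "sales,orders,customers"
	}`)

	// PUT /connectors/{name}/config creates the connector if it doesn't
	// exist, or updates its configuration if it does.
	req, err := http.NewRequest(http.MethodPut,
		"http://localhost:8083/connectors/jdbc-source/config",
		bytes.NewReader(config))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}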

Slide 61

Extensible Connector Transform(s) Converter @rmoff

Slide 62

hub.confluent.io @rmoff

Slide 63

Single Kafka Connect node S3 Task #1 JDBC Task #1 JDBC Task #2 Kafka Connect cluster Worker Worker Offsets Config Status @rmoff

Slide 64

Kafka Connect - scalable and fault-tolerant S3 Task #1 JDBC Task #1 Kafka Connect cluster JDBC Task #2 Worker Worker Offsets Config Status @rmoff

Slide 65

Automatic fault tolerance S3 Task #1 JDBC Task #1 JDBC Task #2 Kafka Connect cluster Worker Worker Offsets Config Status @rmoff

Slide 66

K V

Slide 67

K V

Slide 68

K V

Slide 69

K V

Slide 70

K V Wait… what's this?

Slide 71

How do you serialise your data? JSON Avro Protobuf Schema JSON CSV @rmoff

Slide 72

APIs are contracts between services {user_id: 53, address: "2 Elm st."} Profile service Quote service {user_id: 53, quote: 580} @rmoff

Slide 73

But not all services {user_id: 53, address: "2 Elm st."} Profile service Quote service {user_id: 53, quote: 580} @rmoff

Slide 74

And naturally… {user_id: 53, address: "2 Elm st."} Profile service Quote service Profile database Stream processing @rmoff

Slide 75

Schemas are about how teams work together {user_id: 53, timestamp: 1497842472} new Date(timestamp) Profile service Quote service Profile database create table ( user_id number, timestamp number) Stream processing @rmoff

Slide 76

Things change… {user_id: 53, timestamp: "June 28, 2017 4:00pm"} Profile service Quote service Profile database Stream processing @rmoff

Slide 77

Moving fast and breaking things {user_id: 53, timestamp: "June 28, 2017 4:00pm"} Profile service Quote service Profile database Stream processing @rmoff

Slide 78

Lack of schemas – Coupling teams and services 2001 2001 Citrus Heights-Sunrise Blvd Citrus_Hghts 60670001 3400293 34 SAC Sacramento SV Sacramento Valley SAC Sacramento County APCD SMA8 Sacramento Metropolitan Area CA 6920 Sacramento 28 6920 13588 7400 Sunrise Blvd 95610 38 41 56 38.6988889 121 16 15.98999977 -121.271111 10 4284781 650345 52 @rmoff

Slide 79

Serialisation & Schemas JSON Avro Protobuf Schema JSON CSV @rmoff

Slide 80

Serialisation & Schemas JSON Avro Protobuf Schema JSON CSV 👍 👍 👍 😬 https://rmoff.dev/qcon-schemas @rmoff

Slide 81

Schemas Schema Registry Topic producer … consumer

Slide 82

Schemas Schema Registry (actually an internal topic!) Topic producer … consumer

Slide 83

Producers contain serializers

props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("schema.registry.url", "http://schema-registry:8081");
…
Producer<String, LogLine> producer = new KafkaProducer<String, LogLine>(props);

@rmoff

Slide 84

partition 0 … partition 1 … consumer A consumer A consumer A partition 2 … consumer B Partitioned Topic @rmoff

Slide 85

consumer A consumer A consumer A @rmoff

Slide 86

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}

Photo by Franck V. on Unsplash @rmoff

Slide 87

Streams of events Time @rmoff

Slide 88

Stream Processing Stream: widgets Stream: widgets_red @rmoff

Slide 89

Stream Processing with Kafka Streams

Stream: widgets

final StreamsBuilder builder = new StreamsBuilder();
builder.stream("widgets", Consumed.with(stringSerde, widgetsSerde))
       .filter((key, widget) -> widget.getColour().equals("RED"))
       .to("widgets_red", Produced.with(stringSerde, widgetsSerde));

Stream: widgets_red

@rmoff

Slide 90

Streams Application Streams Application Streams Application @rmoff

Slide 91

Stream Processing with ksqlDB

Stream: widgets

CREATE STREAM widgets_red AS
  SELECT * FROM widgets WHERE colour='RED';

Stream: widgets_red

@rmoff

Slide 92

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}

Photo by Franck V. on Unsplash @rmoff

Slide 93

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}

SELECT * FROM WIDGETS WHERE WEIGHT_G > 120

SELECT COUNT(*) FROM WIDGETS GROUP BY PRODUCTION_LINE

SELECT AVG(TEMP_CELCIUS) AS TEMP FROM WIDGETS GROUP BY SENSOR_ID HAVING TEMP>20

CREATE SINK CONNECTOR dw WITH (
  'connector.class' = 'S3Connector',
  'topics' = 'widgets'
  …);
(Object store, data warehouse, RDBMS)

Photo by Franck V. on Unsplash @rmoff

Slide 94

Stream Processing with ksqlDB Source stream @rmoff

Slide 95

Stream Processing with ksqlDB Source stream @rmoff

Slide 96

Stream Processing with ksqlDB Source stream @rmoff

Slide 97

Stream Processing with ksqlDB Source stream Analytics @rmoff

Slide 98

Stream Processing with ksqlDB Source stream Applications / Microservices @rmoff

Slide 99

Stream Processing with ksqlDB

…SUM(TXN_AMT) GROUP BY AC_ID

AC_ID=42 → BALANCE=94.00

Source stream Applications / Microservices @rmoff

Slide 100

ksqlDB or Kafka Streams? @rmoff Photo by Ramiz Dedaković on Unsplash

Slide 101

Standing on the Shoulders of Streaming Giants

Ease of use
  ksqlDB (ksqlDB UDFs)
    powered by Kafka Streams
      powered by Producer, Consumer APIs
Flexibility

@rmoff

Slide 102

DEMO @rmoff Photo by Raoul Droog on Unsplash

Slide 103

Summary @rmoff

Slide 104

@rmoff

Slide 105

K V @rmoff

Slide 106

K V @rmoff

Slide 107

The Log @rmoff

Slide 108

Producer Consumer The Log @rmoff

Slide 109

Producer Consumer The Log Connectors @rmoff

Slide 110

Producer Consumer The Log Connectors Streaming Engine @rmoff

Slide 111

Apache Kafka Producer Consumer The Log Connectors Streaming Engine @rmoff

Slide 112

Producer Security Schema Registry Consumer The Log Streaming Engine ksqlDB REST Proxy Connectors Confluent Control Center

Slide 113

EVENTS are EVERYWHERE @rmoff

Slide 114

EVENTS are very POWERFUL @rmoff

Slide 115

Learn Kafka. Start building with Apache Kafka at Confluent Developer. developer.confluent.io

Slide 116

@rmoff #EOF rmoff.dev/talks