Kafka as a Platform: the Ecosystem from the Ground Up

A presentation at Big Data Conference Europe in November 2020 by Robin Moffatt

Slide 1

Kafka as a Platform: the Ecosystem from the Ground Up Robin Moffatt | @rmoff

Slide 2

EVENTS @rmoff

Slide 3

EVENTS @rmoff

Slide 4

EVENTS • Something happened • What happened

Slide 5

Human-generated events: A Sale, A Stock movement @rmoff

Slide 6

Machine-generated events: Networking, IoT, Applications @rmoff

Slide 7

EVENTS are EVERYWHERE @rmoff

Slide 8

EVENTS are very POWERFUL @rmoff

Slide 9

Slide 10

Slide 11

K V

Slide 12

LOG @rmoff

Slide 13

K V

Slide 14

K V

Slide 15

K V

Slide 16

K V

Slide 17

K V

Slide 18

K V

Slide 19

K V

Slide 20

Immutable Event Log (Old → New): events are added at the end of the log @rmoff
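To make the append-only idea concrete, here is a minimal sketch in Go of a log where each event is a key/value pair, every append gets the next sequential offset, and existing entries are never modified. Purely illustrative: the Event and Log names are ours, not Kafka's internals.

package main

import "fmt"

// Event is a key/value pair, as in the K V slides above.
type Event struct {
	Key, Value string
}

// Log is an append-only sequence: an event's offset is simply its
// index, and entries are never updated or deleted once written.
type Log struct {
	events []Event
}

// Append adds an event at the end and returns its offset.
func (l *Log) Append(e Event) int {
	l.events = append(l.events, e)
	return len(l.events) - 1
}

func main() {
	var l Log
	fmt.Println(l.Append(Event{Key: "order-1", Value: "created"})) // 0
	fmt.Println(l.Append(Event{Key: "order-1", Value: "shipped"})) // 1
}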

Slide 21

TOPICS @rmoff

Slide 22

Topics (Clicks, Orders, Customers): topics are similar in concept to tables in a database @rmoff

Slide 23

PARTITIONS @rmoff

Slide 24

Partitions (Clicks: p0, p1, p2): messages are guaranteed to be strictly ordered within a partition @rmoff

Slide 25

PUB / SUB @rmoff

Slide 26

PUB / SUB @rmoff

Slide 27

Producing data (Old → New): messages are added at the end of the log @rmoff

Slide 28

producer → Partitioned Topic (partition 0 …, partition 1 …, partition 2 …)

Slide 29

package main

import (
	"gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
)

func main() {
	topic := "test_topic"

	// Error handling elided for slide brevity.
	p, _ := kafka.NewProducer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092"})
	defer p.Close()

	// Write one message to partition 0 of the topic.
	p.Produce(&kafka.Message{
		TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: 0},
		Value:          []byte("Hello world")}, nil)
}

Slide 30

Producing to Kafka - No Key (Partitions 1-4 over time): messages will be produced in a round-robin fashion @rmoff

Slide 31

Producing to Kafka - With Key (Partitions 1-4 over time): hash(key) % numPartitions = N, so key A → Partition 1, B → Partition 2, C → Partition 3, D → Partition 4 @rmoff
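A minimal sketch of the formula on the slide. One hedge: Kafka's default partitioner actually uses murmur2; FNV-1a here is only for illustration, so real partition assignments will differ.

package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a message key to a partition via
// hash(key) % numPartitions, so equal keys always land on the same
// partition and therefore stay strictly ordered relative to each other.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % numPartitions
}

func main() {
	// The repeated "A" lands on the same partition both times.
	for _, key := range []string{"A", "B", "C", "A"} {
		fmt.Printf("key %s -> partition %d\n", key, partitionFor(key, 4))
	}
}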

Slide 32

Producers (producer → Partitioned Topic: partition 0 …, partition 1 …, partition 2 …)
• A client application
• Puts messages into topics
• Handles partitioning, network protocol
• Java, Go, .NET, C/C++, Python
• Also every other language (plus REST proxy if not)

Slide 33

PUB / SUB @rmoff

Slide 34

Consuming data: access is only sequential; read to offset & scan (Old → New) @rmoff

Slide 35

Consumers have a position of their own (Old → New): Sally is here (scan) @rmoff

Slide 36

Consumers have a position of their own (Old → New): Fred is here (scan), Sally is here (scan) @rmoff

Slide 37

Consumers have a position of their own (Old → New): Rick is here (scan), Fred is here (scan), Sally is here (scan) @rmoff
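What these three slides show can be modelled in a few lines: one shared log, and per-consumer offsets that advance independently. A toy sketch (the Sally/Fred/Rick positions are illustrative):

package main

import "fmt"

func main() {
	// One shared, immutable log...
	log := []string{"e0", "e1", "e2", "e3", "e4"}

	// ...and one offset per consumer. Reading never removes events
	// and never disturbs anyone else's position.
	offsets := map[string]int{"Sally": 5, "Fred": 3, "Rick": 1}

	for name, pos := range offsets {
		fmt.Printf("%s still has %v left to read\n", name, log[pos:])
	}
}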

Slide 38

// Assumes cm is a kafka.ConfigMap with bootstrap.servers, group.id,
// and go.events.channel.enable set (defined earlier in the talk).
c, _ := kafka.NewConsumer(&cm)
defer c.Close()
c.Subscribe(topic, nil)

// Poll the events channel and print each message received.
for {
	select {
	case ev := <-c.Events():
		switch ev.(type) {
		case *kafka.Message:
			km := ev.(*kafka.Message)
			fmt.Printf("✅ Message '%v' received from topic '%v'\n",
				string(km.Value), string(*km.TopicPartition.Topic))
		}
	}
}

Slide 39

Consuming From Kafka - Single Consumer: consumer A reads Partitions 1-4 @rmoff

Slide 40

Consuming From Kafka - Multiple Consumers: A1 and A2 share Partitions 1-4 @rmoff

Slide 41

Consuming From Kafka - Grouped Consumers: Partitions 1-4, consumers C1, A1, A2 @rmoff

Slide 42

Consuming From Kafka - Grouped Consumers: C1, C2, C3, C4 across Partitions 1-4 @rmoff

Slide 43

Consuming From Kafka - Grouped Consumers: C1, C2, C3 across Partitions 1-4 @rmoff

Slide 44

Consuming From Kafka - Grouped Consumers: C1, C2, C3 across Partitions 1-4 @rmoff

Slide 45

Consumers (Partitioned Topic: partition 0 …, partition 1 …, partition 2 … → consumer A ×3, consumer B)
• A client application
• Reads messages from topics
• Horizontally, elastically scalable (if stateless)
• Java, Go, .NET, C/C++, Python, everything else (plus REST proxy if not)
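A sketch of the partition-assignment idea behind the grouped-consumer slides: spread the topic's partitions over the group's members, and a membership change simply produces a new spread. Real Kafka does this via a group coordinator with pluggable assignment strategies; this round-robin version is only illustrative.

package main

import "fmt"

// assign spreads partitions over group members round-robin. With
// fewer members than partitions some members take several partitions;
// with more members than partitions the surplus members sit idle.
func assign(numPartitions int, members []string) map[string][]int {
	out := make(map[string][]int)
	for p := 0; p < numPartitions; p++ {
		m := members[p%len(members)]
		out[m] = append(out[m], p)
	}
	return out
}

func main() {
	fmt.Println(assign(4, []string{"C1", "C2", "C3", "C4"})) // one partition each
	fmt.Println(assign(4, []string{"C1", "C2", "C3"}))       // C1 takes two
}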

Slide 46

BROKERS and REPLICATION @rmoff

Slide 47

Partition Leadership and Replication (Leader / Follower): Partitions 1-4 across Broker 1, Broker 2, Broker 3 @rmoff

Slide 48

Partition Leadership and Replication (Leader / Follower): each of Partitions 1-4 replicated on Broker 1, Broker 2, Broker 3 @rmoff

Slide 49

Partition Leadership and Replication (Leader / Follower): each of Partitions 1-4 replicated on Broker 1, Broker 2, Broker 3 @rmoff
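Replication factor is set per topic when the topic is created. A hedged sketch using the same confluent-kafka-go client as the earlier slides; the topic name "orders" and the broker address are placeholders, not from the deck.

package main

import (
	"context"
	"fmt"

	"gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
)

func main() {
	a, err := kafka.NewAdminClient(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092"})
	if err != nil {
		panic(err)
	}
	defer a.Close()

	// Four partitions, each with three replicas: one leader plus two
	// followers, spread over the brokers as in the diagram above.
	results, err := a.CreateTopics(context.Background(),
		[]kafka.TopicSpecification{{
			Topic:             "orders",
			NumPartitions:     4,
			ReplicationFactor: 3,
		}})
	if err != nil {
		panic(err)
	}
	for _, r := range results {
		fmt.Println(r)
	}
}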

Slide 50

So far, this is Pretty good @rmoff

Slide 51

So far, this is Pretty good but I’ve not finished yet… @rmoff

Slide 52

Streaming Pipelines Amazon S3 RDBMS HDFS @rmoff

Slide 53

Evolve processing from old systems to new Existing App New App <x> RDBMS @rmoff

Slide 54

Slide 55

Streaming Integration with Kafka Connect syslog Sources Kafka Connect Kafka Brokers @rmoff

Slide 56

Streaming Integration with Kafka Connect Amazon Sinks Google Kafka Connect Kafka Brokers @rmoff

Slide 57

Streaming Integration with Kafka Connect Amazon syslog Google Kafka Connect Kafka Brokers @rmoff

Slide 58

Look Ma, No Code!
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url": "jdbc:mysql://asgard:3306/demo",
  "table.whitelist": "sales,orders,customers"
}
@rmoff
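That configuration is deployed by handing the JSON to a Kafka Connect worker's REST API. A sketch in Go, assuming a worker on localhost:8083; the connector name jdbc-demo is an arbitrary choice, not from the deck.

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	config := []byte(`{
	  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
	  "connection.url": "jdbc:mysql://asgard:3306/demo",
	  "table.whitelist": "sales,orders,customers"
	}`)

	// PUT to /connectors/<name>/config creates the connector, or
	// updates it if a connector with that name already exists.
	req, err := http.NewRequest(http.MethodPut,
		"http://localhost:8083/connectors/jdbc-demo/config",
		bytes.NewReader(config))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}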

Slide 59

Extensible Connector Transform(s) Converter @rmoff

Slide 60

hub.confluent.io @rmoff

Slide 61

K V

Slide 62

K V

Slide 63

K V

Slide 64

K V

Slide 65

K V Wait… what's this?

Slide 66

Lack of schemas – Coupling teams and services 2001 2001 Citrus Heights-Sunrise Blvd Citrus_Hghts 60670001 3400293 34 SAC Sacramento SV Sacramento Valley SAC Sacramento County APCD SMA8 Sacramento Metropolitan Area CA 6920 Sacramento 28 6920 13588 7400 Sunrise Blvd 95610 38 41 56 38.6988889 121 16 15.98999977 -121.271111 10 4284781 650345 52 @rmoff

Slide 67

Serialisation & Schemas JSON Avro Protobuf Schema JSON CSV @rmoff

Slide 68

Serialisation & Schemas JSON Avro Protobuf Schema JSON CSV 👍 👍 👍 😬 https://rmoff.dev/qcon-schemas @rmoff

Slide 69

Schemas Schema Registry Topic producer … consumer

Slide 70

Partitioned Topic (partition 0 …, partition 1 …, partition 2 …) → consumer A ×3, consumer B @rmoff

Slide 71

consumer A consumer A consumer A @rmoff

Slide 72

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}
Photo by Franck V. on Unsplash
@rmoff

Slide 73

Streams of events Time @rmoff

Slide 74

Stream Processing Stream: widgets Stream: widgets_red @rmoff

Slide 75

Stream Processing with Kafka Streams
Stream: widgets → Stream: widgets_red

final StreamsBuilder builder = new StreamsBuilder();
builder.stream("widgets", Consumed.with(stringSerde, widgetsSerde))
       .filter((key, widget) -> widget.getColour().equals("RED"))
       .to("widgets_red", Produced.with(stringSerde, widgetsSerde));
@rmoff

Slide 76

Streams Application Streams Application Streams Application @rmoff

Slide 77

Stream Processing with ksqlDB
Stream: widgets → Stream: widgets_red

CREATE STREAM widgets_red AS
  SELECT * FROM widgets
  WHERE colour='RED';
@rmoff
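Statements like this are usually typed into the ksqlDB CLI, but they can also be POSTed to the server's /ksql REST endpoint. A sketch, assuming a ksqlDB server on localhost:8088:

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	stmt := []byte(`{
	  "ksql": "CREATE STREAM widgets_red AS SELECT * FROM widgets WHERE colour='RED';",
	  "streamsProperties": {}
	}`)

	// Persistent statements are submitted to the /ksql endpoint.
	resp, err := http.Post("http://localhost:8088/ksql",
		"application/vnd.ksql.v1+json", bytes.NewReader(stmt))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}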

Slide 78

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}
Photo by Franck V. on Unsplash
@rmoff

Slide 79

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}

SELECT * FROM WIDGETS WHERE WEIGHT_G > 120

SELECT COUNT(*) FROM WIDGETS GROUP BY PRODUCTION_LINE

SELECT AVG(TEMP_CELCIUS) AS TEMP FROM WIDGETS
  GROUP BY SENSOR_ID HAVING TEMP > 20

CREATE SINK CONNECTOR dw WITH (
  'connector.class' = 'S3Connector',
  'topics' = 'widgets'
…);
(Object store, data warehouse, RDBMS)

Photo by Franck V. on Unsplash
@rmoff

Slide 80

DEMO @rmoff Photo by Raoul Droog on Unsplash

Slide 81

Summary @rmoff

Slide 82

@rmoff

Slide 83

K V @rmoff

Slide 84

K V @rmoff

Slide 85

The Log @rmoff

Slide 86

Producer Consumer The Log @rmoff

Slide 87

Producer Consumer The Log Connectors @rmoff

Slide 88

Producer Consumer The Log Connectors Streaming Engine @rmoff

Slide 89

Apache Kafka Producer Consumer The Log Connectors Streaming Engine @rmoff

Slide 90

Confluent Platform ksqlDB Producer Consumer The Log Schema Registry Connectors Streaming Engine @rmoff

Slide 91

Fully Managed Kafka as a Service. $200 USD off your bill each calendar month for the first three months when you sign up: https://rmoff.dev/ccloud
Free money! Use promo code 60DEVADV for an additional $60 towards your bill 😄
* T&C: https://www.confluent.io/confluent-cloud-promo-disclaimer

Slide 92

Learn Kafka. Start building with Apache Kafka at Confluent Developer. developer.confluent.io

Slide 93

#EOF @rmoff rmoff.dev/talks youtube.com/rmoff