Keeping your Data Close and your Caches Hotter

A presentation at In-memory Computing Summit EU 2019 in June 2019 in London, UK by Viktor Gamov

Slide 1

Keep your Data Close and your Caches Hotter using Apache Kafka, Connect and KSQL @gamussa | @riferrei | #IMCSummit

Slide 3

Raffle, yeah 🚀

Slide 4

Raffle, yeah 🚀 Follow @gamussa 📸🖼👬 Tag @gamussa @riferrei With #IMCSummit

Slide 5

Data is only useful if it is Fresh and Contextual

Slide 7

What if the airbag deploys 30 seconds after the collision?

Slide 9

December 6th, 2010: Commuter rail train hits elderly driver

Slide 10

What if the information about the commuter rail train is outdated?

Slide 12

March 29th, 2019: Wife said that she had put new toilet paper

Slide 13

Caches can be a Solution for Data that is Fresh

Slide 18

APIs need to access data freely and easily

[Diagram: API, Read, Write, Cache, Read, Write]

● Data should never be treated as a scarce resource in applications
● Latency should be kept to a minimum to ensure a better user experience
● Data should not be static: keep the data fresh continuously
● Find ways to handle large amounts of data without breaking the APIs

Slide 23

Caches can be either built-in or distributed

Built-in Caches
[Diagram: API, Read, Write, Cache]
● If data can fit into the API memory, then you should use built-in caches

Distributed Caches
[Diagram: API, Read, Write, Cache, Cache, Cache]
● Otherwise, you may need to use distributed caches for large sizes
● Some cache implementations provide the best of both cases
● For distributed caches, make sure to always find a good way to keep access O(1)
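
One way to picture "the best of both cases" is a small built-in cache sitting in front of a distributed one, often called a near cache. The sketch below is only an illustration under stated assumptions: `DistributedCache` and `NearCache` are hypothetical names, the "cluster" is a list of in-process dictionaries, and the hash-modulo routing is just one simple way to locate the owning node in O(1).

```python
import zlib


class DistributedCache:
    """Hypothetical stand-in for a remote cache cluster (plain dicts as nodes)."""

    def __init__(self, node_count=3):
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash modulo the node count locates the owner in O(1).
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def get(self, key):
        return self._node_for(key).get(key)

    def put(self, key, value):
        self._node_for(key)[key] = value


class NearCache:
    """Small built-in cache in front of the distributed cache."""

    def __init__(self, remote, max_local_entries=2):
        self.remote = remote
        self.local = {}
        self.max_local_entries = max_local_entries
        self.remote_reads = 0  # counts how often we had to leave the process

    def get(self, key):
        if key in self.local:
            return self.local[key]
        self.remote_reads += 1
        value = self.remote.get(key)
        if value is not None:
            if len(self.local) >= self.max_local_entries:
                # Evict the oldest inserted entry (dicts keep insertion order).
                self.local.pop(next(iter(self.local)))
            self.local[key] = value
        return value


remote = DistributedCache()
remote.put("train-42", "on time")
near = NearCache(remote)
near.get("train-42")  # first read goes to the distributed cache
near.get("train-42")  # second read is served from local memory
```

A local copy can go stale, so a real setup would also need some invalidation or refresh mechanism; that part is left out of this sketch.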

Slide 24

DEMO

Slide 25

Application x-ray

● Confluent Cloud Cluster
● AWS and Terraform
● Spring Boot Application
● Apache Kafka Connect
● Confluent KSQL
● Redis Cache
● AWS Lambda
● Amazon Alexa

Slide 26

Source code of this application

Slide 27

Caching Patterns

Slide 28

Caching Pattern: Refresh Ahead

[Diagram: Cache, API, Kafka Connect, Kafka Connect]

● Proactively updates the cache
● Keeps the entries always in sync
● Ideal for latency-sensitive cases
● Ideal when data read is costly
● It may need initial data loading

Slide 29

Caching Pattern: Refresh Ahead / Adapt

[Diagram: Application, Cache, API, Kafka Connect, Kafka Connect]
Transform and adapt records before delivery
Schema Registry for canonical models

● Proactively updates the cache
● Keeps the entries always in sync
● Ideal for latency-sensitive cases
● Ideal when data read is costly
● It may need initial data loading
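
The refresh-ahead flow with an adapt step can be sketched in a few lines. This is a toy model, not the demo's code: a `deque` stands in for a Kafka topic, a dictionary stands in for the cache, and `adapt`, `RefreshAheadSink`, and `drain` are hypothetical names chosen for illustration.

```python
from collections import deque


def adapt(record):
    """Hypothetical transform: reshape a source row into the cached model."""
    value = {
        "name": record["name"].strip().title(),
        "status": record["status"],
    }
    return record["id"], value


class RefreshAheadSink:
    """Pushes every change into the cache before any reader asks for it."""

    def __init__(self, cache):
        self.cache = cache

    def put(self, records):
        for record in records:
            key, value = adapt(record)
            self.cache[key] = value  # latest record wins, keeping entries in sync


def drain(topic, sink):
    """Deliver records in arrival order, one at a time."""
    while topic:
        sink.put([topic.popleft()])


# Assumed sample data: an initial load followed by an update to the same key.
topic = deque([
    {"id": "train-42", "name": " north line ", "status": "on time"},
    {"id": "train-42", "name": "north line", "status": "delayed"},
])
cache = {}
drain(topic, RefreshAheadSink(cache))
```

Readers only ever touch `cache`, so the costly source read never sits on the request path; the price is that the cache has to be loaded once up front.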

Slide 30

Caching Pattern: Write Behind

[Diagram: Application, Cache, API, Kafka Connect, Kafka Connect]

● Removes I/O pressure from app
● Allows true horizontal scalability
● Ensures ordering and persistence
● Minimizes DB code complexity
● Totally handles DB unavailability

Slide 31

Caching Pattern: Write Behind / Adapt

[Diagram: Application, Cache, API, Kafka Connect, Kafka Connect]
Transform and adapt records before delivery
Schema Registry for canonical models

● Removes I/O pressure from app
● Allows true horizontal scalability
● Ensures ordering and persistence
● Minimizes DB code complexity
● Totally handles DB unavailability
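
A minimal write-behind sketch, under the assumption that an in-memory queue plays the role of the Kafka topic and a dictionary plays the role of the database. `FlakyDatabase` and `WriteBehindCache` are hypothetical names; the point is only to show writes being acknowledged by the cache first and delivered to the database later, in order, even across an outage.

```python
from collections import deque


class FlakyDatabase:
    """Hypothetical database that can be switched off to simulate an outage."""

    def __init__(self):
        self.rows = {}
        self.available = True

    def upsert(self, key, value):
        if not self.available:
            raise ConnectionError("database unavailable")
        self.rows[key] = value


class WriteBehindCache:
    def __init__(self, database):
        self.database = database
        self.entries = {}
        self.pending = deque()  # ordered log of writes, standing in for a topic

    def put(self, key, value):
        # The application only pays for the in-memory write.
        self.entries[key] = value
        self.pending.append((key, value))

    def get(self, key):
        return self.entries.get(key)

    def flush(self):
        """Deliver pending writes in order; stop and keep them on failure."""
        written = 0
        while self.pending:
            key, value = self.pending[0]
            try:
                self.database.upsert(key, value)
            except ConnectionError:
                break  # leave the record queued and retry on the next flush
            self.pending.popleft()
            written += 1
        return written


db = FlakyDatabase()
cache = WriteBehindCache(db)
db.available = False
cache.put("train-42", "on time")
cache.put("train-42", "delayed")
flushed_during_outage = cache.flush()
db.available = True
flushed_after_recovery = cache.flush()
```

In the pattern on the slide, the queue's job is done by Kafka, which is what gives the ordering and persistence guarantees; an in-process `deque` offers neither durability nor horizontal scale.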

Slide 32

Caching Pattern: Event Federation

[Diagram: Confluent Replicator, <<MirrorMaker>>]

● Replicates data across regions
● Keeps multiple regions in sync
● Great to improve RPO and RTO
● Handles lazy/slow networks well
● Works well if it's used along with Read-Through and Write-Through patterns.

Slide 33

Kafka Connect Implementation Strategies

Slide 34

Kafka Connect support for In-Memory Caches

● Connector for Redis is open source and it is available in Confluent Hub
● Connector for Memcached is open source and it is available in Confluent Hub
● Connectors for both GridGain and Apache Ignite implementations.
● Connector for Infinispan is open source and is maintained by Red Hat

Slide 35

Frameworks for other In-Memory Caches

[Diagram: Oracle GoldenGate, Hazelcast Jet, Spring Data, Spring Kafka, Connect, Any Cache]

● Oracle provides HotCache from GoldenGate for Oracle Coherence
● Hazelcast has the Jet framework, which provides support for Kafka
● Pivotal GemFire (Apache Geode) has good support from Spring Framework
● Good news: you can always write your own sink using Connect API

Slide 36

Interested in DB CDC? Then meet Debezium!

● Amazing CDC technology to pull data out from databases to Kafka
● Works at the log level, which means true CDC implementation for your projects instead of record polling
● Open source, maintained by Red Hat. Has broad support for many popular databases.
● It is built on top of Kafka Connect
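
To make the link between CDC and cache freshness concrete, here is a rough sketch of applying Debezium-style change events to a cache. It assumes a simplified event carrying only the envelope fields `op`, `before`, and `after` (real messages also carry source metadata and, depending on the converter, a schema wrapper); `apply_change_event` and the `id` key field are hypothetical choices for illustration.

```python
def apply_change_event(cache, event, key_field="id"):
    """Apply one simplified change event to a dictionary-backed cache."""
    op = event["op"]
    if op in ("c", "u", "r"):
        # Create, update, and snapshot read all carry the new row in "after".
        row = event["after"]
        cache[row[key_field]] = row
    elif op == "d":
        # Deletes only carry the old row in "before".
        row = event["before"]
        cache.pop(row[key_field], None)
    else:
        raise ValueError(f"unsupported operation: {op!r}")


# Assumed sample events for a single row: insert, update, then delete.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "on time"}},
    {"op": "u", "before": {"id": 1, "status": "on time"},
     "after": {"id": 1, "status": "delayed"}},
    {"op": "d", "before": {"id": 1, "status": "delayed"}, "after": None},
]

cache = {}
history = []
for event in events:
    apply_change_event(cache, event)
    history.append(dict(cache))  # snapshot of the cache after each event
```

Because the events come from the database log rather than from polling, deletes and intermediate updates are visible to the cache as well.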

Slide 37

Support for Running Kafka Connect Servers

● Run by yourself on bare metal: https://kafka.apache.org/downloads https://www.confluent.io/download
● IaaS on AWS or Google Cloud: https://github.com/confluentinc/ccloud-tools
● Running using Docker Containers: https://hub.docker.com/r/confluentinc/cp-kafkaconnect/
● Running using Kubernetes: https://github.com/confluentinc/cp-helm-chart https://www.confluent.io/confluent-operator/

Slide 38

Stay in touch

cnfl.io/blog
cnfl.io/slack
cnfl.io/meetups

Slide 39

Thanks!

@riferrei ricardo@confluent.io
@gamussa viktor@confluent.io
https://slackpass.io/confluentcommunity #connect #ksql
