Keeping your Data Close and your Caches Hotter

Keep your Data Close and your Caches Hotter using Apache Kafka, Connect and KSQL @gamussa | @riferrei | #IMCSummit

2 @gamussa | @riferrei | #IMCSummit

Raffle, yeah 🚀

Raffle, yeah 🚀 Follow @gamussa 📸🖼👬 Tag @gamussa @riferrei @riferrei With #IMCSummit

4 Data is only useful if it is Fresh and Contextual @gamussa | @riferrei | #IMCSummit

@gamussa | @riferrei | #IMCSummit

What if the airbag deploys 30 seconds after the collision? @gamussa | @riferrei | #IMCSummit

@gamussa | @riferrei | #IMCSummit

December 6th, 2010: Commuter rail train hits elderly driver @gamussa | @riferrei | #IMCSummit

7 What if the information about the commuter rail train is outdated? @gamussa | @riferrei | #IMCSummit

8

8 March 29th, 2019: Wife said that she had put new toilet paper

9 Caches can be a Solution for Data that is Fresh @gamussa | @riferrei | #IMCSummit

10 APIs need to access data freely and easily API Read Write Cache Read Write @gamussa | @riferrei | #IMCSummit

10 APIs need to access data freely and easily ● Data should never be treated as a API Read Write Cache scarce resource in applications Read Write @gamussa | @riferrei | #IMCSummit

10 APIs need to access data freely and easily ● Data should never be treated as a API Read Write Cache scarce resource in applications ● Latency should be kept as minimal to ensure a better user experience Read Write @gamussa | @riferrei | #IMCSummit

10 APIs need to access data freely and easily ● Data should never be treated as a API Read Write Cache scarce resource in applications ● Latency should be kept as minimal to ensure a better user experience Read Write ● Data should be not be static: keep the data fresh continuously @gamussa | @riferrei | #IMCSummit

10 APIs need to access data freely and easily ● Data should never be treated as a API Read Write Cache scarce resource in applications ● Latency should be kept as minimal to ensure a better user experience Read Write ● Data should be not be static: keep the data fresh continuously ● Find ways to handle large amounts of data without breaking the APIs @gamussa | @riferrei | #IMCSummit

11 Caches can be either built-in or distributed Built-in Caches API Read Cache Write Distributed Caches API Cache Read Write Cache Cache @gamussa | @riferrei | #IMCSummit

11 Caches can be either built-in or distributed Built-in Caches API Read Cache Write ● If data can fit into the API memory, then you should use built-in caches Distributed Caches API Cache Read Write Cache Cache @gamussa | @riferrei | #IMCSummit

11 Caches can be either built-in or distributed Built-in Caches API Read Cache Write ● If data can fit into the API memory, then you should use built-in caches Distributed Caches API Cache Read Write ● Otherwise, you may need to use distributed caches for large sizes Cache Cache @gamussa | @riferrei | #IMCSummit

11 Caches can be either built-in or distributed Built-in Caches API Read Cache Write ● If data can fit into the API memory, then you should use built-in caches Distributed Caches API Cache Read Write Cache ● Otherwise, you may need to use distributed caches for large sizes ● Some cache implementations provides the best of both cases Cache @gamussa | @riferrei | #IMCSummit

11 Caches can be either built-in or distributed Built-in Caches API Read Cache Write ● If data can fit into the API memory, then you should use built-in caches Distributed Caches API Cache Read Write Cache Cache ● Otherwise, you may need to use distributed caches for large sizes ● Some cache implementations provides the best of both cases ● For distributed caches, make sure to always find a good way to O(1) @gamussa | @riferrei | #IMCSummit

12 DEMO @gamussa | @riferrei | #IMCSummit

13 Application x-ray ● Confluent Cloud Cluster ● AWS and Terraform ● Spring Boot Application ● Apache Kafka Connect ● Confluent KSQL ● Redis Cache ● AWS Lambda ● Amazon Alexa @gamussa | @riferrei | #IMCSummit

14 sourcecode of this application @gamussa | @riferrei | #IMCSummit

15 Caching Patterns @gamussa | @riferrei | #IMCSummit

Caching Pattern: Cache API Kafka Connect Kafka Connect Refresh Ahead ● Proactively updates the cache ● Keep the entries always in-sync ● Ideal for latency sensitive cases ● Ideal when data read is costly ● It may need initial data loading @gamussa | @riferrei | #IMCSummit

Transform and adapt records before delivery Caching Pattern: Refresh Ahead / Adapt Application Cache API Kafka Connect Kafka Connect ● Proactively updates the cache ● Keep the entries always in-sync ● Ideal for latency sensitive cases ● Ideal when data read is costly ● It may need initial data loading Schema Registry for canonical models @gamussa | @riferrei | #IMCSummit

Caching Pattern: Write Behind Application Cache API Kafka Connect Kafka Connect ● Removes I/O pressure from app ● Allows true horizontal scalability ● Ensures ordering and persistence ● Minimizes DB code complexity ● Totally handles DB unavailability @gamussa | @riferrei | #IMCSummit

Transform and adapt records before delivery Caching Pattern: Write Behind / Adapt Application Cache API Kafka Connect Kafka Connect ● Removes I/O pressure from app ● Allows true horizontal scalability ● Ensures ordering and persistence ● Minimizes DB code complexity ● Totally handles DB unavailability Schema Registry for canonical models @gamussa | @riferrei | #IMCSummit

Caching Pattern: Event Federation ● Replicates data across regions ● Keep multiple regions in-sync ● Great to improve RPO and RTO ● Handles lazy/slow networks well ● Works well if its used along with Confluent Replicator Read-Through and Write-Through patterns. <<MirrorMaker>> @gamussa | @riferrei | #IMCSummit

21 Kafka Connect Implementation Strategies @gamussa | @riferrei | #IMCSummit

Kafka Connect Kafka Connect support for In-Memory Caches ● Connector for Redis is open and it Kafka Connect is available in Confluent Hub ● Connector for Memcached is open and it is available in Confluent Hub Kafka Connect ● Connectors for both GridGain and Apache Ignite implementations. Kafka Connect ● Connector for InfiniSpan is open and is maintained by Red Hat @gamussa | @riferrei | #IMCSummit

Oracle GoldenGate Frameworks for other In-Memory Caches ● Oracle provides HotCache from Hazelcast Jet GoldenGate for Oracle Coherence ● Hazelcast has the Jet framework, which provides support for Kafka Spring Data Spring Kafka ● Pivotal GemFire (Apache Geode) has good support from Spring Connect Framework ● Good news: you can always write your own sink using Connect API @gamussa | @riferrei Any Cache | #IMCSummit

Interested on DB CDC? Then meet Debezium! ● Amazing CDC technology to pull data out from databases to Kafka ● Works in a log level, which means true CDC implementation for your projects instead of record polling ● Open-source maintained by Red Hat. Have broad support for many popular databases. ● It is built on top of Kafka Connect @gamussa | @riferrei | #IMCSummit

Support for Running Kafka Connect Servers ● Run by yourself on BareMetal: https://kafka.apache.org/downloads https:// Kafka Connect www.confluent.io/download ● IaaS on AWS or Google Cloud: https://github.com/confluentinc/ccloud-tools ● Running using Docker Containers: https://hub.docker.com/r/confluentinc/cp-kafkaconnect/ ● Running using Kubernetes: https:// github.com/confluentinc/cp-helm-chart https:// www.confluent.io/confluent-operator/ @gamussa | @riferrei | #IMCSummit

26 Stay in touch cnfl.io/blog cnfl.io/slack cnfl.io/meetups

Thanks! @riferrei ricardo@confluent.io @gamussa viktor@confluent.io https://slackpass.io/confluentcommunity #connect #ksql @gamussa | @riferrei @ | #IMCSummit

28

Keeping your Data Close and your Caches Hotter

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

Slide 33

Slide 34

Slide 35

Slide 36

Slide 37

Slide 38

Slide 39

Slide 40