Ever changing data model - schema management for the future

A presentation at DataEngBytes in August 2020 by Vidya Venugopal

Slide 1

Ever changing data model - Schema management for the future
Vidya Venugopal @venuvid

Slide 2

● Software engineer
● Founder of thekafkanerd.io
● Spent a lot of time on Enterprise Integration Platforms & Distributed systems
● Now, full on with Event Streaming
● Senior Dev, Vanguard Australia
DataEngBytes Aug 2020 @venuvid

Slide 3

Agenda
● The problem space
● Intro to Schemas
● Era of Web Services
● Stepping into Event-driven Architecture
● Rise of Event Streaming
● Schema management way forward

Slide 4

Agenda, current section: The problem space

Slide 5

The need for Schemas

Slide 6

The need for schemas
Diagram: Customer, Service A (eCommerce)

Slide 7

The need for schemas
Diagram: Customer, Service A (eCommerce), Service B (Analytics)

Slide 8

The need for schemas
Diagram: Customer, Service A (eCommerce), Service B (Analytics)
Payload: { "keya": "valuea", "keyb": "valueb", "keyc": "valuec" }

Slide 9

The need for schemas
Diagram: Customer, Service A (eCommerce), Service B (Analytics)
Payload 1: { "keya": "valuea", "keyb": "valueb", "keyc": "valuec" }
Payload 2: { "keya": "valuea", "keyb": "valueb", "keyc": "valuec" }

Slide 10

The need for schemas
Diagram: Customer, Service A (eCommerce), Service B (Analytics)
Payload 1: { "keya": "valuea", "keyb": "valueb", "keyc": "valuec", "keyd": "valued" }
Payload 2: { "keya": "valuea", "keyb": "valueb", "keyc": "valuec", "keyd": "valued" }

Slide 11

The need for schemas
Diagram: Customer, Service A (eCommerce), Service B (Analytics)
Payload 1: { "keya": "valuea", "keyb": "valueb", "keyd": "valued" }
Payload 2: { "keya": "valuea", "keyb": "valueb", "keyc": "valuec", "keyd": "valued" }

Slide 12

The need for schemas
Diagram: Customer, Service A (eCommerce), Service B (Analytics)
Payload 1: { "keya": "valuea", "keyb": "valueb", "keyd": "valued" }
Payload 2: { "keya": "valuea", "keyb": "valueb", "keyc": "valuec", "keyd": "valued" }

Slide 13

The need for schemas
Diagram: Customer, Service A (eCommerce), Service B (Analytics)
Payload 1: { "keya": "valuea", "keyb": "valueb", "keyd": "valued" }
Payload 2: { "keya": "valuea", "keyb": "valueb", "keyd": "valued" }

Slide 14

The need for schemas
Diagram: Service A (eCommerce), Service B (Analytics), Service C (Delivery)
Payload 1: { "keya": "valuea", "keye": "valuee", "keyd": "valued" }
Payload 2: { "keya": "valuea", "keye": "valuee", "keyd": "valued" }
Payload 3: { "keya": "valuea", "keye": "valuee", "keyd": "valued" }

Slide 15

The need for schemas
Diagram: Service A (eCommerce), Service B (Analytics), Service C (Delivery)
Payload 1: { "keya": "valuea", "keye": "valuee", "keyd": "valued" }
Payload 2: { "keya": "valuea", "keye": "valuee", "keyd": "valued" }
Payload 3: { "keya": "valuea", "keye": "valuee", "keyd": "valued" }
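
The drift shown on these slides can be sketched in a few lines. This is only an illustration: the key names come from the slides, while `analytics_handler` and `try_handle` are hypothetical stand-ins for a Service B consumer that was written against the original payload and never told about the change.

```python
# Hypothetical Service B (Analytics) consumer, written against the original payload.
def analytics_handler(payload: dict) -> str:
    # Implicit contract: the consumer silently assumes "keyc" is always present.
    return payload["keya"] + "/" + payload["keyc"]

original = {"keya": "valuea", "keyb": "valueb", "keyc": "valuec"}
# Service A later drops "keyc" and adds "keyd" without telling Service B.
changed = {"keya": "valuea", "keyb": "valueb", "keyd": "valued"}

def try_handle(payload: dict) -> str:
    try:
        return analytics_handler(payload)
    except KeyError as missing:
        return f"broken: missing {missing}"

print(try_handle(original))
print(try_handle(changed))
```

Without an explicit contract, the break only shows up at runtime inside the consumer, which is the situation the rest of the talk sets out to avoid.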

Slide 16

DataEngBytes Aug 2020 @venuvid

Slide 17

Agenda, current section: Intro to Schemas

Slide 18

Schemas to the rescue
Diagram: Service A, Schema/Contract, Service B

Slide 19

Schemas to the rescue (Examples)

Slide 20

Schemas to the rescue
Diagram: Service A (Schema/Contract Ver 1.0, Ver 1.1), PAYLOAD, Service B (Schema/Contract Ver 1.0), Service C (Schema/Contract Ver 1.1)

Slide 21

Summary
● Schemas are the contracts between services
● Schemas improve data quality
● Schemas allow versioning
● Schemas are a way of documenting your interfaces
● Schemas enable contract testing
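
As a rough sketch of "schemas as versioned contracts", the snippet below checks one payload against two contract versions. The `CONTRACTS` table and the `violations` helper are assumptions made for illustration; real projects would normally use a schema language such as JSON Schema or Avro rather than a hand-rolled check.

```python
# Hypothetical contracts: each version maps a required field name to its type.
CONTRACTS = {
    "1.0": {"keya": str, "keyb": str},
    "1.1": {"keya": str, "keyb": str, "keyd": str},
}

def violations(payload: dict, version: str) -> list[str]:
    """Return a list of contract violations (empty means the payload conforms)."""
    problems = []
    for field, expected in CONTRACTS[version].items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            problems.append(f"wrong type for {field}")
    return problems

payload = {"keya": "valuea", "keyb": "valueb"}
print(violations(payload, "1.0"))
print(violations(payload, "1.1"))
```

The same payload conforms to version 1.0 and fails version 1.1, so a consumer can state which version it expects instead of discovering the difference in production.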

Slide 22

Agenda, current section: Era of Web Services

Slide 23

Era of Web Services
Diagram: Service A, HTTP (SOAP/REST), Service B; Schema/Contract (XSD/JSON Schema)

Slide 24

Era of Web Services

Slide 25

Era of Web Services

Slide 26

Era of Web Services

Slide 27

Contract Testing
● Integration testing, but just with contracts
● Validate the end-to-end flow without building one
● Find bugs upfront; don’t wait until the actual integration test cycle
Diagram: Schema/Contract, Contract Test, Contract Mock
Reference - https://martinfowler.com/bliki/ContractTest.html
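
One way to picture the idea is a consumer tested against a contract mock, with the provider checked against the same contract separately. Everything named here (`CONTRACT`, `mock_provider`, `real_provider`, `consumer_display_name`, the `/customers/42` path) is hypothetical, and dedicated tools such as Pact do considerably more than this sketch.

```python
# A shared contract: the request the consumer sends and the response shape it relies on.
CONTRACT = {
    "request": {"path": "/customers/42"},
    "response_fields": {"id": int, "name": str},
}

def conforms(response: dict) -> bool:
    """True if the response carries every contracted field with the contracted type."""
    return all(
        name in response and isinstance(response[name], kind)
        for name, kind in CONTRACT["response_fields"].items()
    )

def mock_provider(path: str) -> dict:
    # Contract mock: stands in for the provider during consumer tests.
    assert path == CONTRACT["request"]["path"]
    return {"id": 42, "name": "example"}

def consumer_display_name(fetch) -> str:
    # Consumer code under test; it only touches contracted fields.
    customer = fetch("/customers/42")
    return f"{customer['name']} (#{customer['id']})"

def real_provider(path: str) -> dict:
    # Stand-in for the provider's actual handler, verified against the same contract.
    return {"id": 42, "name": "example", "tier": "gold"}

print(consumer_display_name(mock_provider))
print(conforms(real_provider(CONTRACT["request"]["path"])))
```

Both sides are exercised without wiring them together: the consumer runs against the mock, and the provider is checked against the contract on its own.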

Slide 28

Contract Testing - Example
Reference - https://pactflow.io/how-pact-works/#slide-3

Slide 29

Agenda, current section: Stepping into Event-driven Architecture

Slide 30

Event Driven Architecture
Adoption of Message Queues = Event-driven communication + Being real-time + Being asynchronous

Slide 31

Event Driven Architecture
Diagram: Service A, MQ/JMS, Service B; Schema/Contract (XSD/JSON Schema)

Slide 32

EDA - Schema Validation
Diagram: Service A, MQ/JMS, Service B
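
A minimal sketch of schema validation around a queue, using Python's in-memory `queue.Queue` as a stand-in for an MQ/JMS destination. The slide does not say where validation happens; this sketch assumes the consumer side, and `ORDER_CONTRACT`, `publish`, `consume_all`, and the dead-letter list are hypothetical names.

```python
import json
import queue

# Hypothetical contract for messages on the queue: required field -> type.
ORDER_CONTRACT = {"order_id": int, "status": str}

def is_valid(message: dict) -> bool:
    return all(
        key in message and isinstance(message[key], kind)
        for key, kind in ORDER_CONTRACT.items()
    )

broker = queue.Queue()   # in-memory stand-in for an MQ/JMS destination
dead_letters = []        # messages rejected by validation

def publish(message: dict) -> None:
    # Service A serialises without checking; validation happens on the consumer side here.
    broker.put(json.dumps(message))

def consume_all() -> list:
    accepted = []
    while not broker.empty():
        message = json.loads(broker.get())
        if is_valid(message):
            accepted.append(message)
        else:
            dead_letters.append(message)
    return accepted

publish({"order_id": 1, "status": "PAID"})
publish({"order_id": "two"})
accepted = consume_all()
print(accepted)
print(dead_letters)
```

Because the producer and consumer never talk directly, the contract check is the only place a malformed message can be caught before it reaches business logic.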

Slide 33

Agenda, current section: Rise of Event Streaming

Slide 34

Event Streaming
Rise of Event Streams = Capture all possible events + at scale + Process them on the fly

Slide 35

Event Streaming - A brief
● Record all events at scale, in and around an enterprise
● Process/interrogate/analyse events on the fly (Event Stream Processing)
● Sophisticated real-time processing abilities: windowing, re-ordering, etc.
● Good examples: Apache Kafka, AWS Kinesis, Google Cloud Pub/Sub
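
To make "windowing" concrete, here is a small sketch of a tumbling (fixed, non-overlapping) window count over timestamped events. It runs over an in-memory list rather than a live stream, and `tumbling_window_counts` is a hypothetical helper, not an API of Kafka, Kinesis, or Pub/Sub.

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping time window.

    `events` is an iterable of (timestamp_seconds, key) pairs. Arrival order does
    not matter because each event is assigned to a window by its own timestamp.
    """
    counts = Counter()
    for timestamp, key in events:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

# The event at t=9 arrives after the one at t=12, yet still lands in the first window.
events = [(1, "click"), (4, "click"), (12, "view"), (9, "click"), (11, "click")]
result = tumbling_window_counts(events, window_seconds=10)
print(result)
```

Real stream processors add concerns this sketch ignores, such as deciding when a window is complete and how long to wait for late events.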

Slide 36

Rise of Schema Registry

Slide 37

Diagram: Schema Registry holding the Schema/Contract (JSON/AVRO Schema); PRODUCERS (A, B, C, D); STREAMS/TOPICS; CONSUMERS (1, 2, 3, 4)

Slide 38

Schema Registry - How it works
Diagram: Schema Registry holding the Schema/Contract (JSON/AVRO Schema); PRODUCERS (A, B, C, D); STREAMS/TOPICS; CONSUMERS (1, 2, 3, 4)

Slide 39

Schema Registry - How it works (Example)
Reference - https://docs.confluent.io/current/schema-registry/index.html

Slide 40

Schema Registry - How it works (Producer Example)
1. Define Schema
Reference - https://github.com/vidyavenu/schema-registry-example/

Slide 41

Schema Registry - How it works (Producer Example)
2. Register Schema to a topic
Reference - https://github.com/vidyavenu/schema-registry-example/

Slide 42

Schema Registry - How it works (Producer Example)
3. Produce data
Reference - https://github.com/vidyavenu/schema-registry-example/

Slide 43

Schema Registry - How it works (Producer Example)
4. Messages sent!
Reference - https://github.com/vidyavenu/schema-registry-example/

Slide 44

Schema Registry - How it works (Producer Example)
6. Producer modifies the payload
Reference - https://github.com/vidyavenu/schema-registry-example/

Slide 45

Schema Registry - How it works (Producer Example)
7. Validation fails :)
Reference - https://github.com/vidyavenu/schema-registry-example/
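
The producer walkthrough on these slides can be imitated with a toy, in-memory registry. This is not the Confluent Schema Registry API and not the code from the referenced repository: `ToySchemaRegistry`, `produce`, the `customers` topic, and the record fields are all assumptions chosen to mirror the numbered steps.

```python
class ToySchemaRegistry:
    """In-memory stand-in for a schema registry; not the Confluent API."""

    def __init__(self):
        self._schemas = {}  # topic -> {field name: type}

    def register(self, topic, schema):
        self._schemas[topic] = schema

    def validate(self, topic, record):
        schema = self._schemas[topic]
        if set(record) != set(schema):
            raise ValueError(f"fields {sorted(record)} do not match schema {sorted(schema)}")
        for field, kind in schema.items():
            if not isinstance(record[field], kind):
                raise ValueError(f"field {field!r} is not {kind.__name__}")

registry = ToySchemaRegistry()
topic_log = []  # stand-in for the topic

def produce(topic, record):
    registry.validate(topic, record)  # reject before anything reaches the topic
    topic_log.append(record)

# 1. Define schema  2. Register schema to a topic
registry.register("customers", {"name": str, "age": int})
# 3. Produce data  4. Messages sent
produce("customers", {"name": "Ada", "age": 36})
# 6. Producer modifies the payload  7. Validation fails
try:
    produce("customers", {"name": "Ada", "age": "thirty-six"})
    outcome = "sent"
except ValueError as error:
    outcome = f"rejected: {error}"
print(len(topic_log), outcome)
```

The point of the sketch is the ordering: the check runs before the write, so the malformed record never lands on the topic for consumers to trip over.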

Slide 46

Schema Registry - Summary
● Centralised schema registry between Producers, Consumers & Event Streams
● Simplified serialization and deserialization
● Allows versioning & compatibility checks (forward/backward compatibility)
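
Forward and backward compatibility can be sketched with a simplified Avro-style rule: a reader can decode a writer's data if every field the reader expects is either present in the writer's schema or has a default. The schema layout and function names below are assumptions for illustration, and the sketch deliberately ignores type promotion and nested records.

```python
# Simplified Avro-style record schemas: field name -> {"type": ..., optional "default": ...}.
def can_read(reader: dict, writer: dict) -> bool:
    """True if data written with `writer` can be decoded using `reader`."""
    for name, spec in reader.items():
        if name in writer:
            if writer[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            # The reader needs a value the writer never produced and has no fallback.
            return False
    return True

def is_backward_compatible(new: dict, old: dict) -> bool:
    # Consumers upgraded to the new schema can still read old data.
    return can_read(reader=new, writer=old)

def is_forward_compatible(new: dict, old: dict) -> bool:
    # Consumers still on the old schema can read data written with the new one.
    return can_read(reader=old, writer=new)

v1 = {"keya": {"type": "string"}, "keyb": {"type": "string"}}
v2_required = {**v1, "keyd": {"type": "string"}}
v2_optional = {**v1, "keyd": {"type": "string", "default": ""}}

print(is_backward_compatible(v2_required, v1))
print(is_backward_compatible(v2_optional, v1))
print(is_forward_compatible(v2_required, v1))
```

Adding a field without a default breaks backward compatibility but stays forward compatible, while adding it with a default is compatible in both directions under this simplified rule.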

Slide 47

Agenda, current section: Schema management way forward

Slide 48

DataEngBytes Aug 2020 @venuvid