A presentation at DataEngBytes by Vidya Venugopal
Ever changing data model - Schema management for the future Vidya Venugopal @venuvid
Software engineer. Founder of thekafkanerd.io. Spent a lot of time on Enterprise Integration Platforms & Distributed systems. Now, full on with Event Streaming. Senior Dev, Vanguard Australia. DataEngBytes Aug 2020 @venuvid
Agenda: The problem space; Intro to Schemas; Era of Web Services; Stepping into Event-driven Architecture; Rise of Event Streaming; Schema management way forward
The problem space
The need for Schemas
The need for schemas: Customer; Service A (eCommerce)
The need for schemas: Customer; Service A (eCommerce); Service B (Analytics)
The need for schemas: Customer; Service A (eCommerce); Service B (Analytics). Payload: { "keya": "valuea", "keyb": "valueb", "keyc": "valuec" }
The need for schemas: Customer; Service A (eCommerce); Service B (Analytics). Payloads: { "keya": "valuea", "keyb": "valueb", "keyc": "valuec" } and { "keya": "valuea", "keyb": "valueb", "keyc": "valuec" }
The need for schemas: Customer; Service A (eCommerce); Service B (Analytics). Payloads: { "keya": "valuea", "keyb": "valueb", "keyc": "valuec", "keyd": "valued" } and { "keya": "valuea", "keyb": "valueb", "keyc": "valuec", "keyd": "valued" }
The need for schemas: Customer; Service A (eCommerce); Service B (Analytics). Payloads: { "keya": "valuea", "keyb": "valueb", "keyd": "valued" } and { "keya": "valuea", "keyb": "valueb", "keyc": "valuec", "keyd": "valued" }
The need for schemas: Customer; Service A (eCommerce); Service B (Analytics). Payloads: { "keya": "valuea", "keyb": "valueb", "keyd": "valued" } and { "keya": "valuea", "keyb": "valueb", "keyd": "valued" }
The need for schemas: Service A (eCommerce); Service B (Analytics); Service C (Delivery). Payloads: { "keya": "valuea", "keye": "valuee", "keyd": "valued" }, { "keya": "valuea", "keye": "valuee", "keyd": "valued" } and { "keya": "valuea", "keye": "valuee", "keyd": "valued" }
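A minimal sketch of the failure these slides build up to, assuming Service B reads fields straight out of the payload with no agreed schema. The key names come from the slides; the functions `analytics_consumer` and `try_consume` are hypothetical and only illustrate the idea.

```python
# Hypothetical Service B (Analytics) consumer that assumes a fixed payload shape.
def analytics_consumer(payload: dict) -> str:
    # Direct key access: breaks as soon as the producer drops or renames a key.
    return f"{payload['keya']}|{payload['keyb']}|{payload['keyc']}"


def try_consume(payload: dict) -> str:
    """Run the consumer and report which key was missing, if any."""
    try:
        return analytics_consumer(payload)
    except KeyError as missing:
        return f"consumer failed, missing key: {missing}"


# The payload Service A (eCommerce) sent originally.
original = {"keya": "valuea", "keyb": "valueb", "keyc": "valuec"}
# Service A later adds keyd and removes keyc without telling Service B.
changed = {"keya": "valuea", "keyb": "valueb", "keyd": "valued"}

print(try_consume(original))
print(try_consume(changed))
```

Nothing in the producer fails when it drops `keyc`; the problem only surfaces downstream, in the consumer.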
Intro to Schemas
Schemas to the rescue: Schema/Contract; Service A; Service B
Schemas to the rescue (Examples)
Schemas to the rescue: Service A (Schema/Contract Ver 1.0, Ver 1.1); PAYLOAD; Service B (Schema/Contract Ver 1.0); Service C (Schema/Contract Ver 1.1)
Summary
● Schemas are the contracts between services
● Schemas improve data quality
● Schemas allow versioning
● Schemas are a way of documenting your interfaces
● Schemas enable contract testing
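One way to picture a schema acting as a versioned contract, as a sketch only. The version labels 1.0 and 1.1 appear on the slides, but the field sets assigned to them, the `SCHEMAS` table, and the `violations` helper are assumptions made for illustration.

```python
# Hypothetical contracts keyed by version; each maps a field name to its type.
SCHEMAS = {
    "1.0": {"keya": str, "keyb": str, "keyc": str},
    "1.1": {"keya": str, "keyb": str, "keyc": str, "keyd": str},
}


def violations(payload: dict, version: str) -> list:
    """Return contract violations; an empty list means the payload conforms."""
    schema = SCHEMAS[version]
    problems = []
    for field, expected_type in schema.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(f"wrong type for {field}")
    for field in payload:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


v1_payload = {"keya": "valuea", "keyb": "valueb", "keyc": "valuec"}
v1_1_payload = {**v1_payload, "keyd": "valued"}

print(violations(v1_payload, "1.0"))
print(violations(v1_1_payload, "1.0"))
print(violations(v1_1_payload, "1.1"))
```

Because the check is explicit, a service pinned to one version can tell a conforming payload from one written against a different version instead of failing somewhere deeper in its logic.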
Era of Web Services
Era of Web Services: Schema/Contract (XSD/JSON Schema); Service A; HTTP SOAP/REST; Service B
Contract Testing
● Integration testing, but just with contracts
● Validate the end-to-end flow without building one
● Find bugs upfront; don't wait till the actual integration test cycle
Diagram labels: Schema/Contract; Contract Test; Contract Mock
Contract Testing - Example Reference - https://pactflow.io/how-pact-works/#slide-3
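The idea can be sketched by hand in a few lines. This is not the Pact API; the `CONTRACT` structure, the endpoint path, and every function name below are hypothetical and exist only to show the two halves of a contract test: the consumer is tested against a mock shaped by the contract, and the provider is checked against the same contract.

```python
# Hypothetical contract written by the consumer: the request it sends and
# the response fields (with types) it relies on.
CONTRACT = {
    "request": {"method": "GET", "path": "/customers/42"},
    "response_fields": {"id": int, "name": str},
}


def mock_provider(request: dict) -> dict:
    """Contract mock: returns a canned response shaped by the contract."""
    assert request == CONTRACT["request"]
    return {"id": 42, "name": "example"}


def consumer_display_name(call_provider) -> str:
    """Consumer code under test; it only needs something callable as the provider."""
    response = call_provider({"method": "GET", "path": "/customers/42"})
    return response["name"].upper()


def real_provider(request: dict) -> dict:
    """Stand-in for the real provider implementation (hypothetical)."""
    return {"id": 42, "name": "Ada", "tier": "gold"}


def provider_honours_contract(provider) -> bool:
    """Provider-side contract test: replay the request, check the promised fields."""
    response = provider(CONTRACT["request"])
    return all(
        isinstance(response.get(field), expected)
        for field, expected in CONTRACT["response_fields"].items()
    )


print(consumer_display_name(mock_provider))
print(provider_honours_contract(real_provider))
```

Neither test needs the other service to be running, which is what lets the mismatch show up before the integration test cycle.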
Stepping into Event-driven Architecture
Event Driven Architecture: Adoption of Message Queues = Event-driven communication + Being real-time + Being asynchronous
Event Driven Architecture: Schema/Contract (XSD/JSON Schema); Service A; MQ/JMS; Service B
EDA - Schema Validation: Service A; MQ/JMS; Service B
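A rough sketch of schema validation around a message queue, assuming both sides validate: the producer before publishing and the consumer after receiving. The slide does not say where validation happens, so that placement is an assumption, as are `REQUIRED_FIELDS`, `SchemaError`, `publish`, and `consume`. A standard-library `queue.Queue` stands in for MQ/JMS.

```python
import json
import queue

REQUIRED_FIELDS = {"keya", "keyb", "keyd"}  # hypothetical contract


class SchemaError(ValueError):
    """Raised when a message does not satisfy the contract."""


def validate(message: dict) -> None:
    missing = REQUIRED_FIELDS - message.keys()
    if missing:
        raise SchemaError(f"missing fields: {sorted(missing)}")


broker = queue.Queue()  # in-process stand-in for an MQ/JMS queue


def publish(message: dict) -> None:
    validate(message)  # reject bad messages before they reach the queue
    broker.put(json.dumps(message))


def consume() -> dict:
    message = json.loads(broker.get_nowait())
    validate(message)  # the consumer re-checks rather than trusting the producer
    return message
```

With this arrangement a malformed message is refused at `publish` and never reaches the queue, so the asynchronous consumer is not left to discover the problem later.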
Rise of Event Streaming
Event Streaming: Rise of Event Streams = Capture all possible events + at scale + Process them on the fly
Event Streaming - In brief
● Record all events at scale, in and around an Enterprise
● Process/Interrogate/Analyse events on the fly (Event Stream Processing)
● Sophisticated real-time processing abilities: Windowing, re-ordering, etc.
● Good examples: Apache Kafka, AWS Kinesis, Google Cloud Pub/Sub
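Windowing, mentioned in the list above, can be illustrated with a tumbling (fixed-size, non-overlapping) window count. This is a simplified sketch computed over a finished list rather than a live stream, and it does not use any streaming platform's API; the function name and the sample events are made up.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed-size, non-overlapping window.

    `events` is an iterable of (timestamp_seconds, key) pairs in any order.
    Returns {(window_start, key): count}.
    """
    counts = Counter()
    for timestamp, key in events:
        # Every timestamp maps to exactly one window, identified by its start.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical events; note that (59, "click") arrives after (61, "view").
events = [(0, "click"), (4, "click"), (61, "view"), (59, "click"), (65, "view")]
print(tumbling_window_counts(events, 60))
```

Because each event is assigned to a window by its own timestamp, the late `click` at second 59 still lands in the first window even though it arrives out of order.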
Rise of Schema Registry
Schema Registry: Schema/Contract (JSON/Avro Schema); PRODUCERS A, B, C, D; STREAMS/TOPICS; CONSUMERS 1, 2, 3, 4
Schema Registry - How it works: Schema Registry; Schema/Contract (JSON/Avro Schema); PRODUCERS A, B, C, D; STREAMS/TOPICS; CONSUMERS 1, 2, 3, 4
Schema Registry - How it works (Example) Reference - https://docs.confluent.io/current/schema-registry/index.html
Schema Registry - How it works (Producer Example)
Reference - https://github.com/vidyavenu/schema-registry-example/
1. Define Schema
2. Register Schema to a topic
3. Produce data
4. Messages sent!
6. Producer modifies the payload
7. Validation fails :)
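The producer steps above can be sketched with a small in-memory stand-in. This is an illustration only: it is not the Confluent Schema Registry client API and not the code from the referenced repository. The `InMemorySchemaRegistry` class, the `produce` function, the `customers` topic, and the record fields are all hypothetical.

```python
class InMemorySchemaRegistry:
    """Toy registry: stores schemas per subject and hands out numeric ids."""

    def __init__(self):
        self._by_subject = {}  # subject -> list of (schema_id, schema)
        self._next_id = 1

    def register(self, subject: str, schema: dict) -> int:
        schema_id = self._next_id
        self._next_id += 1
        self._by_subject.setdefault(subject, []).append((schema_id, schema))
        return schema_id

    def latest(self, subject: str):
        return self._by_subject[subject][-1]


def produce(registry, topic: str, record: dict, log: list) -> None:
    """Validate `record` against the topic's latest schema, then append it to `log`."""
    schema_id, schema = registry.latest(topic)
    expected = {field["name"]: field["type"] for field in schema["fields"]}
    python_types = {"string": str, "int": int}
    if set(record) != set(expected):
        raise ValueError(f"fields {sorted(record)} do not match schema {sorted(expected)}")
    for name, type_name in expected.items():
        if not isinstance(record[name], python_types[type_name]):
            raise ValueError(f"{name} is not of type {type_name}")
    log.append((schema_id, record))


# 1. Define schema (Avro-style record; the field names are made up).
customer_schema = {
    "type": "record",
    "name": "Customer",
    "fields": [{"name": "id", "type": "int"}, {"name": "name", "type": "string"}],
}
registry = InMemorySchemaRegistry()
topic_log = []
# 2. Register schema to a topic.
schema_id = registry.register("customers", customer_schema)
# 3 and 4. Produce data; the message is sent.
produce(registry, "customers", {"id": 1, "name": "Ada"}, topic_log)
# 6 and 7. Producer modifies the payload; validation fails.
try:
    produce(registry, "customers", {"id": 1, "full_name": "Ada"}, topic_log)
except ValueError as error:
    print("rejected:", error)
```

The modified record never reaches `topic_log`, which mirrors the point of the slides: the change is stopped at the producer rather than discovered by a consumer.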
Schema Registry - Summary
● Centralised schema registry between Producers, Consumers & Event Streams
● Simplified serialization and deserialization
● Allows versioning & compatibility checks (forward/backward compatibility)
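To make forward and backward compatibility concrete, here is a deliberately simplified check in the spirit of Avro schema resolution: a reader can decode data if each of its fields either exists in the writer's schema with the same type or has a default. Type promotion, aliases, unions, and nested records are ignored, and the function names and sample schemas are hypothetical; a real registry applies fuller rules.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """True if data written with `writer` can be decoded using `reader`."""
    writer_fields = {field["name"]: field["type"] for field in writer["fields"]}
    for field in reader["fields"]:
        if field["name"] in writer_fields:
            if writer_fields[field["name"]] != field["type"]:
                return False  # same name, different type
        elif "default" not in field:
            return False  # reader expects a field the writer never wrote
    return True


def backward_compatible(new: dict, old: dict) -> bool:
    # Consumers on the new schema can read data written with the old one.
    return can_read(reader=new, writer=old)


def forward_compatible(new: dict, old: dict) -> bool:
    # Consumers on the old schema can read data written with the new one.
    return can_read(reader=old, writer=new)


v1 = {"fields": [{"name": "id", "type": "int"}, {"name": "name", "type": "string"}]}
email_with_default = {"name": "email", "type": "string", "default": ""}
email_no_default = {"name": "email", "type": "string"}
v2_with_default = {"fields": v1["fields"] + [email_with_default]}
v2_no_default = {"fields": v1["fields"] + [email_no_default]}
v2_dropped_name = {"fields": [{"name": "id", "type": "int"}]}

print(backward_compatible(v2_with_default, v1))
print(backward_compatible(v2_no_default, v1))
print(forward_compatible(v2_dropped_name, v1))
```

Under these simplified rules, adding a field with a default keeps both directions working, adding one without a default breaks backward compatibility, and dropping a field that old readers require breaks forward compatibility.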
Schema management way forward
Here’s what was said about this presentation on social media.
📣 SPEAKER ANNOUNCEMENT 📣
Very excited to let you know that we have @venuvid of @Vanguard_Group speaking at #DataEngBytes with a talk on ever changing data models and schema management for the future. pic.twitter.com/YK6hGMpDI6
DataEngBytes (@dataengconfau), August 5, 2020