Ever-changing data model:
Schema management for the future
Vidya Venugopal @venuvid
Slide 2
● Software engineer
● Founder of thekafkanerd.io
● Spent a lot of time on Enterprise Integration Platforms & Distributed Systems
● Now, full on with Event Streaming
● Senior Dev, Vanguard Australia
DataEngBytes Aug 2020
@venuvid
Slide 3
Agenda
The problem space
Intro to Schemas
Era of Web Services
Stepping into Event-driven Architecture
Rise of Event Streaming
Schema management way forward
Slide 4
Agenda: The problem space
Slide 5
The need for Schemas
Slide 6
The need for schemas
Customer
Service A (eCommerce)
Slide 7
The need for schemas
Customer
Service A (eCommerce)
Service B (Analytics)
Slide 8
The need for schemas
Customer
{ "keya":"valuea", "keyb":"valueb", "keyc":"valuec" }
Service A (eCommerce)
Service B (Analytics)
Slide 9
The need for schemas
Customer
{ "keya":"valuea", "keyb":"valueb", "keyc":"valuec" }
{ "keya":"valuea", "keyb":"valueb", "keyc":"valuec" }
Service A (eCommerce)
Service B (Analytics)
Slide 10
The need for schemas
Customer
{ "keya":"valuea", "keyb":"valueb", "keyc":"valuec", "keyd":"valued" }
{ "keya":"valuea", "keyb":"valueb", "keyc":"valuec", "keyd":"valued" }
Service A (eCommerce)
Service B (Analytics)
Slide 11
The need for schemas
Customer
{ "keya":"valuea", "keyb":"valueb", "keyd":"valued" }
{ "keya":"valuea", "keyb":"valueb", "keyc":"valuec", "keyd":"valued" }
Service A (eCommerce)
Service B (Analytics)
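The breakage on this slide can be sketched in a few lines of Python: a consumer written against the four-field payload fails as soon as `keyc` stops arriving. The consumer function and its reliance on `keyc` are hypothetical; the slides only show the payload shapes drifting apart, not how Service B (Analytics) is implemented.

```python
import json

# Payload shape the consumer was written against (includes keyc).
OLD_PAYLOAD = '{"keya": "valuea", "keyb": "valueb", "keyc": "valuec", "keyd": "valued"}'
# Payload after the producer drops keyc without telling anyone.
NEW_PAYLOAD = '{"keya": "valuea", "keyb": "valueb", "keyd": "valued"}'


def analytics_consumer(raw: str) -> str:
    """Hypothetical consumer logic that silently assumes keyc is always there."""
    customer = json.loads(raw)
    return customer["keyc"]  # raises KeyError as soon as keyc disappears


def try_consume(raw: str) -> str:
    """Run the consumer and report whether it coped with the payload."""
    try:
        return "ok: " + analytics_consumer(raw)
    except KeyError as missing:
        return f"broken: missing field {missing}"


print(try_consume(OLD_PAYLOAD))  # ok: valuec
print(try_consume(NEW_PAYLOAD))  # broken: missing field 'keyc'
```

Nothing fails on the producing side; the error only surfaces downstream, at read time, which is the gap a shared schema is meant to close.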
Slide 12
The need for schemas
Slide 13
The need for schemas
Customer
{ "keya":"valuea", "keyb":"valueb", "keyd":"valued" }
{ "keya":"valuea", "keyb":"valueb", "keyd":"valued" }
Service A (eCommerce)
Service B (Analytics)
Slide 14
The need for schemas
{ "keya":"valuea", "keye":"valuee", "keyd":"valued" }
{ "keya":"valuea", "keye":"valuee", "keyd":"valued" }
{ "keya":"valuea", "keye":"valuee", "keyd":"valued" }
Service A (eCommerce)
Service B (Analytics)
Service C (Delivery)
Slide 15
The need for schemas
Slide 16
Slide 17
Agenda: Intro to Schemas
Slide 18
Schemas to the rescue
Schema/Contract
Service A
Service B
Slide 19
Schemas to the rescue (Examples)
Slide 20
Schemas to the rescue
Schema/Contract (Ver 1.0, Ver 1.1)
Service A
PAYLOAD
Service B (Schema/Contract Ver 1.0)
Service C (Schema/Contract Ver 1.1)
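One way to picture the versioning on this slide: a single producer, and consumers pinned to different contract versions. The field sets below are hypothetical (reusing the `keya` to `keye` names from earlier slides), and the rule that unknown fields are ignored while missing required fields are rejected is an assumption of this sketch, not something the slide specifies.

```python
# Hypothetical field sets for the two contract versions on the slide.
CONTRACTS = {
    "1.0": {"keya", "keyb", "keyd"},
    "1.1": {"keya", "keyb", "keyd", "keye"},
}


def read_as(version: str, payload: dict) -> dict:
    """Read a payload through one contract version.

    Fields the version does not know are ignored; fields it requires but the
    payload lacks are an error.
    """
    expected = CONTRACTS[version]
    missing = expected - set(payload)
    if missing:
        raise ValueError(f"payload breaks contract {version}: missing {sorted(missing)}")
    return {field: payload[field] for field in sorted(expected)}


# Service A publishes one payload under Ver 1.1.
payload = {"keya": "valuea", "keyb": "valueb", "keyd": "valued", "keye": "valuee"}
print(read_as("1.0", payload))  # Service B: keye is simply ignored
print(read_as("1.1", payload))  # Service C: sees all four fields
```

A payload written under Ver 1.0 (no `keye`) would make `read_as("1.1", ...)` raise, which is exactly the kind of mismatch a compatibility rule has to decide about.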
Slide 21
Summary
● Schemas are the contracts between services
● Schemas improve data quality
● Schemas allow versioning
● They are a way of documenting your interfaces
● They enable contract testing
Slide 22
Agenda: Era of Web Services
Slide 23
Era of Web Services
Schema/Contract (XSD/JSON Schema)
Service A
HTTP (SOAP/REST)
Service B
Slide 24
Era of Web Services
Slide 25
Era of Web Services
Slide 26
Era of Web Services
Slide 27
Contract Testing
● Integration testing, but against just the Schema/Contract
● Validate the end-to-end flow without building it
● Find bugs up front; don't wait until the actual integration test cycle
(Diagram: Contract Test, Contract Mock)
Reference
https://martinfowler.com/bliki/ContractTest.html
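The two halves named on this slide, Contract Test and Contract Mock, can be sketched without any framework: the consumer is tested against a mock generated from the contract, and the provider's real response is checked against the same contract. The contract format (a field-to-type map) and all names below are hypothetical; tools such as Pact (next slide) work from recorded consumer interactions rather than a hand-written type map.

```python
from typing import Dict, List

# Hypothetical consumer-driven contract: field name -> expected Python type.
CONTRACT: Dict[str, type] = {"keya": str, "keyb": str, "keyd": int}

# Canned value per type used to build the mock (assumption of this sketch).
SAMPLE_VALUES = {str: "sample", int: 0}


def contract_mock(contract: Dict[str, type]) -> dict:
    """Contract Mock: a stub provider response generated from the contract."""
    return {field: SAMPLE_VALUES[ftype] for field, ftype in contract.items()}


def contract_test(response: dict, contract: Dict[str, type]) -> List[str]:
    """Contract Test: list every way a real provider response breaks the contract."""
    problems = []
    for field, ftype in contract.items():
        if field not in response:
            problems.append(f"missing {field}")
        elif not isinstance(response[field], ftype):
            problems.append(f"{field} should be {ftype.__name__}")
    return problems


def consumer_logic(customer: dict) -> str:
    """Hypothetical consumer code under test."""
    return f"{customer['keya']}:{customer['keyd']}"


# Consumer side: test against the mock, no provider needed.
assert consumer_logic(contract_mock(CONTRACT)) == "sample:0"
# Provider side: check a real response against the same contract.
print(contract_test({"keya": "a", "keyb": "b", "keyd": "4"}, CONTRACT))
```

Because both sides are checked against the same contract, a provider change that breaks the consumer shows up in the provider's own test run, before any end-to-end environment exists.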
Slide 28
Contract Testing - Example
Reference - https://pactflow.io/how-pact-works/#slide-3
Slide 29
Agenda: Stepping into Event-driven Architecture
Slide 30
Event-driven Architecture
Adoption of Message Queues = Event-driven communication + Real-time + Asynchronous
Slide 31
Event-driven Architecture
Schema/Contract (XSD/JSON Schema)
Service A
MQ/JMS
Service B
Slide 32
EDA - Schema Validation
Service A
MQ/JMS
Service B
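A minimal sketch of schema validation around a queue, with Python's in-process `queue.Queue` standing in for MQ/JMS and a hand-written field/type map standing in for the XSD or JSON Schema. Validating on both the sending and the receiving side is an assumption of this sketch; the slide does not say where validation runs.

```python
import json
import queue

# Hypothetical contract for the message body: required field -> type.
SCHEMA = {"keya": str, "keyb": str, "keyd": str}


def validate(message: dict) -> None:
    """Raise ValueError if a required field is missing or has the wrong type."""
    for field, ftype in SCHEMA.items():
        if not isinstance(message.get(field), ftype):
            raise ValueError(f"invalid or missing field: {field}")


def send(q: queue.Queue, message: dict) -> None:
    """Service A: validate before the message ever reaches the queue."""
    validate(message)
    q.put(json.dumps(message))


def receive(q: queue.Queue) -> dict:
    """Service B: take the next message and validate it again on the way in."""
    message = json.loads(q.get_nowait())
    validate(message)
    return message


mq = queue.Queue()  # stands in for an MQ/JMS queue
send(mq, {"keya": "valuea", "keyb": "valueb", "keyd": "valued"})
print(receive(mq))
```

A message that fails validation in `send` never reaches the queue, so a malformed payload is rejected at its source instead of being discovered by every consumer.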
Slide 33
Agenda: Rise of Event Streaming
Slide 34
Event Streaming
Rise of Event Streams = Capture all possible events + At scale + Process them on the fly
Slide 35
Event Streaming - A brief
● Record all events, at scale, in and around an enterprise
● Process/interrogate/analyse events on the fly (Event Stream Processing)
● Sophisticated real-time processing abilities: windowing, re-ordering, etc.
● Good examples: Apache Kafka, AWS Kinesis, Google Cloud Pub/Sub
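To make "windowing" and "re-ordering" concrete, here is a plain-Python sketch of a tumbling-window count over a finite list of events. It is not the API of Kafka, Kinesis or Pub/Sub; a real stream processor does this continuously and also has to decide when a window is complete, which this sketch ignores.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(events: Iterable[Tuple[int, str]], size: int) -> Dict[int, Counter]:
    """Count events per key in fixed, non-overlapping windows of `size` time units.

    Events are bucketed by their own timestamp (event time), so an event that
    arrives out of order still lands in the window it belongs to.
    """
    windows: Dict[int, Counter] = {}
    for timestamp, key in events:
        window_start = timestamp - timestamp % size
        windows.setdefault(window_start, Counter())[key] += 1
    return dict(sorted(windows.items()))


# (timestamp, event type); the event at t=3 arrives after the one at t=12.
events = [(1, "login"), (4, "click"), (12, "click"), (3, "click"), (15, "login")]
for start, counts in tumbling_window_counts(events, size=10).items():
    print(start, dict(counts))
```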
Slide 36
Rise of Schema Registry
Slide 37
Schema Registry
Schema/Contract (JSON/Avro Schema)
PRODUCERS (A, B, C, D) → STREAMS/TOPICS → CONSUMERS (1, 2, 3, 4)
Slide 38
Schema Registry - How it works
Schema Registry: Schema/Contract (JSON/Avro Schema)
PRODUCERS (A, B, C, D) → STREAMS/TOPICS → CONSUMERS (1, 2, 3, 4)
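A toy, in-memory stand-in for the registry in this diagram, assuming the model described in the Confluent documentation referenced on the next slide: schemas are registered under a subject (by default derived from the topic name), each distinct schema gets a unique ID, and messages carry that ID so consumers can fetch the exact schema the producer used. The class and method names are hypothetical and are not the registry's API.

```python
import json
from typing import Dict, List, Tuple


class InMemorySchemaRegistry:
    """Toy registry: subjects hold versioned schemas; each distinct schema gets one ID."""

    def __init__(self) -> None:
        self._id_by_text: Dict[str, int] = {}      # canonical schema JSON -> ID
        self._schema_by_id: Dict[int, dict] = {}   # ID -> schema
        self._versions: Dict[str, List[int]] = {}  # subject -> IDs, oldest first

    def register(self, subject: str, schema: dict) -> int:
        """Register a schema under a subject and return its ID (idempotent)."""
        text = json.dumps(schema, sort_keys=True)
        schema_id = self._id_by_text.get(text)
        if schema_id is None:
            schema_id = len(self._id_by_text) + 1
            self._id_by_text[text] = schema_id
            self._schema_by_id[schema_id] = schema
        versions = self._versions.setdefault(subject, [])
        if schema_id not in versions:
            versions.append(schema_id)
        return schema_id

    def schema_for(self, schema_id: int) -> dict:
        """What a consumer calls when it sees a schema ID on a message."""
        return self._schema_by_id[schema_id]

    def latest(self, subject: str) -> Tuple[int, dict]:
        """Latest (ID, schema) registered under a subject."""
        schema_id = self._versions[subject][-1]
        return schema_id, self._schema_by_id[schema_id]


registry = InMemorySchemaRegistry()
v1 = {"type": "record", "name": "Customer", "fields": [{"name": "keya", "type": "string"}]}
print(registry.register("customers-value", v1))  # producer registers the schema
print(registry.register("customers-value", v1))  # re-registering returns the same ID
```

Keeping schemas in one place and shipping only an ID with each message is what lets producers and consumers evolve independently while still agreeing on the exact schema of every record.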
Slide 39
Schema Registry - How it works (Example)
Reference - https://docs.confluent.io/current/schema-registry/index.html
Slide 40
Schema Registry - How it works (Producer Example)
1. Define Schema
Reference - https://github.com/vidyavenu/schema-registry-example/
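The schema from the example repository isn't reproduced in this transcript, so the following is a hypothetical Avro record schema for the Customer payload from earlier slides, written as a Python dict. The `["null", "string"]` union with a `null` default is standard Avro for an optional field: a reader using this schema fills in the default when reading data written before the field existed.

```python
import json

# Hypothetical Avro record schema for the Customer payload.
CUSTOMER_SCHEMA = {
    "type": "record",
    "name": "Customer",
    "namespace": "com.example.customer",  # hypothetical namespace
    "fields": [
        {"name": "keya", "type": "string"},
        {"name": "keyb", "type": "string"},
        {"name": "keyd", "type": "string"},
        # Optional field: union with "null" (listed first), defaulting to null.
        {"name": "keye", "type": ["null", "string"], "default": None},
    ],
}

# Avro schemas are JSON documents; this is the text a registry would store.
CUSTOMER_SCHEMA_JSON = json.dumps(CUSTOMER_SCHEMA, indent=2)
print(CUSTOMER_SCHEMA_JSON)
```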
Slide 41
Schema Registry - How it works (Producer Example)
2. Register Schema to a topic
Reference - https://github.com/vidyavenu/schema-registry-example/
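Registering a schema against a topic can be done through Schema Registry's REST API, which accepts `POST /subjects/<subject>/versions` with the schema sent as an escaped JSON string. The sketch below only builds that request with the standard library. The registry address (`localhost:8081`, the usual default port) and the topic name are assumptions, `<topic>-value` is the default subject naming (TopicNameStrategy) for message values, and the example repository may register its schema differently.

```python
import json
import urllib.request

REGISTRY_URL = "http://localhost:8081"  # assumption: a locally running Schema Registry


def build_register_request(topic: str, schema: dict) -> urllib.request.Request:
    """Build, but do not send, a request registering `schema` for a topic's values."""
    subject = f"{topic}-value"  # default TopicNameStrategy subject for message values
    # The API expects the schema itself as a string inside the JSON body.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        f"{REGISTRY_URL}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


schema = {"type": "record", "name": "Customer",
          "fields": [{"name": "keya", "type": "string"}]}
request = build_register_request("customers", schema)  # hypothetical topic name
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually send it; the registry answers with a JSON body containing the schema's `id`.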
Slide 42
Schema Registry - How it works (Producer Example)
3. Produce data
Reference - https://github.com/vidyavenu/schema-registry-example/
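When producing, Confluent's serializers do not send the schema itself: each message starts with a magic byte (0) and the 4-byte, big-endian schema ID, followed by the serialized data. The sketch below frames and unframes messages that way, but with a JSON body for readability where the real Avro serializer writes Avro binary. The function names are hypothetical.

```python
import json
import struct
from typing import Tuple

MAGIC_BYTE = 0
HEADER = struct.Struct(">BI")  # 1-byte magic + 4-byte big-endian schema ID


def frame(schema_id: int, record: dict) -> bytes:
    """Producer side: prefix the payload with the magic byte and schema ID."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + json.dumps(record).encode("utf-8")


def unframe(message: bytes) -> Tuple[int, dict]:
    """Consumer side: read the schema ID, then decode the payload."""
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return schema_id, json.loads(message[HEADER.size:])


message = frame(42, {"keya": "valuea"})
print(message[:HEADER.size].hex())  # 000000002a
print(unframe(message))
```

The consumer would pass the recovered ID to the registry to fetch the writer's schema before decoding the body.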
Slide 43
Schema Registry - How it works (Producer Example)
4. Messages sent!
Reference - https://github.com/vidyavenu/schema-registry-example/
Slide 44
Schema Registry - How it works (Producer Example)
6. Producer modifies the payload
Reference - https://github.com/vidyavenu/schema-registry-example/
Slide 45
Schema Registry - How it works (Producer Example)
7. Validation fails :)
Reference - https://github.com/vidyavenu/schema-registry-example/
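Step 7's failure can be approximated with a hand-rolled check of a record against an Avro-style record schema, rejecting undeclared fields, missing required fields and wrong types. It covers only primitives, `null` and unions, and stands in for the serializer-side check; it is not the code from the example repository. The schema and the modified payload are hypothetical, and real Avro libraries differ in how strictly they treat undeclared fields.

```python
from typing import Any

# Python types accepted for a few Avro primitive names (illustrative subset).
PRIMITIVES = {"string": str, "int": int, "boolean": bool}

# Hypothetical registered schema for the topic.
SCHEMA = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "keya", "type": "string"},
        {"name": "keyb", "type": "string"},
        {"name": "keye", "type": ["null", "string"], "default": None},
    ],
}


def matches(avro_type: Any, value: Any) -> bool:
    """True if `value` fits the given Avro type (primitives, null and unions only)."""
    if avro_type == "null":
        return value is None
    if isinstance(avro_type, list):  # union such as ["null", "string"]
        return any(matches(branch, value) for branch in avro_type)
    expected = PRIMITIVES[avro_type]
    if isinstance(value, bool) and expected is not bool:
        return False  # bool subclasses int in Python; don't let True pass as an int
    return isinstance(value, expected)


def validate_record(schema: dict, record: dict) -> None:
    """Raise ValueError if `record` does not fit the record schema."""
    names = {field["name"] for field in schema["fields"]}
    unknown = set(record) - names
    if unknown:
        raise ValueError(f"fields not in schema: {sorted(unknown)}")
    for field in schema["fields"]:
        value = record.get(field["name"], field.get("default"))
        if not matches(field["type"], value):
            raise ValueError(f"field {field['name']!r} does not match {field['type']}")


validate_record(SCHEMA, {"keya": "valuea", "keyb": "valueb"})  # passes; keye defaults to null
try:
    validate_record(SCHEMA, {"keya": "valuea", "keyc": "valuec"})  # modified payload
except ValueError as err:
    print("Validation fails:", err)
```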
Agenda: Schema management way forward