Resilience for Retail

A presentation at PagerDuty Webinar in October 2021 by Quintessence Anx

Slide 1

Resilience for Retail: A Tale Not About Ice Cream But Somehow Also About Ice Cream

Slide 2

Quintessence Anx DevOps Advocate @ PagerDuty @QuintessenceAnx

Slide 3

Don’t panic

Slide 4

Slide 5

Elevated response period

Slide 6

Slide 7

How to determine your Elevated Response Period

Slide 8

What support is needed

Slide 9

Build or Buy :: Make or Buy

Slide 10

Slide 11

⛔

Slide 12

v0 Architecture

Slide 13

Random Outage Graph

Slide 14

What, When, Where

Slide 15

Let’s Talk a Little About Resiliency Itself

Slide 16

A resilient system is a system that is able to withstand adversity.

Slide 17

Something is resilient if it is able to withstand adversity.

Slide 18

What can this look like?

Slide 19

Organizational Resilience can look like having the appropriate response structure(s) in place for IT systems, services, and users in the event of latency or an outage.

Slide 20

(IT) System Resilience can look like an application not going down, and/or autoscaling, in response to increased traffic.

Slide 21

Why is this important?

Slide 22

Slide 23

Response and Design

Slide 24

Slide 25

Resilient Response

Slide 26

Resilient Response Checklist
• Define elevated response
• Maximize experienced responders
  • Both primary and secondary
• Do not design around resources you do not have
• Minimize responder burnout
• Clear handoff procedures
• Clear ownership
• Dedicated, clear, responder roles
• Practiced response process
• Validate responder access to tools and data
• Updated documentation

Slide 27

Define elevated response

Slide 28

Maximize Experienced Responders

Slide 29

Do not design around resources you do not have

Slide 30

Responder Burnout

Slide 31

Clear Handoff Procedures

Slide 32

Clear Ownership

Slide 33

Dedicated, clear, responder roles

Slide 34

Practiced Response Process

Slide 35

Validate access to tools and data

Slide 36

Updated documentation

Slide 37

Resilient Response Checklist
• Define elevated response
• Maximize experienced responders
  • Both primary and secondary
• Do not design around resources you do not have
• Minimize responder burnout
• Clear handoff procedures
• Clear ownership
• Dedicated, clear, responder roles
• Practiced response process
• Validate responder access to tools and data
• Updated documentation

Slide 38

Slide 39

Resilient Design Checklist
• Build, test, secure with scalability in mind
• Build, test, secure with humans in mind
• Automate as much as is feasible
• Keep documentation updated in pace with releases
• Build, test, secure with redundancy and/or failover in mind
• Do not design around resources you do not have
• Build, test, secure with operator control in mind
• Build, test, secure with observability in mind
• Clear ownership
  • Who owns the service, writes the code, etc.

Slide 40

Build, test, secure

Slide 41

Build, test, secure: scalability

Slide 42

Build, test, secure: humans

Slide 43

Build, test, secure: redundancy / failover

Slide 44

Build, test, secure: operator control

Slide 45

Build, test, secure: observability

Slide 46

Automation

Slide 47

Updated documentation

Slide 48

Do not design around resources you do not have

Slide 49

Clear ownership

Slide 50

Resilient Design Checklist
• Build and test with scalability in mind
• Build and test with humans in mind
• Automate as much as is feasible
• Keep documentation updated in pace with releases
• Do not design around resources you do not have
• Clear ownership
  • Who owns the service, writes the code, etc.
• Build and test with redundancy and/or failover in mind
• Build and test with security in mind
• Build and test with operator control in mind
• Build and test with observability in mind

Slide 51

Practice with Ice Cream

Slide 52

Slide 53

Understand the Business

Slide 54

Slide 55

Resilient Response: Questions to Ask
• What cannot go wrong?
• What is at risk of going wrong?
• What responses are needed in each situation?
• Who is doing what step(s) in the response process(es)?
• Are we in an Elevated Response Period?
  • And are separate considerations for that period defined?

Slide 56

Resilient System: Questions to Ask
• How do we prevent “what cannot go wrong”?
• How do we mitigate risk for “what else can go wrong”?
• How do we support our response process(es)?
• How do we support our responders?
• How does an elevated response period impact our system?

Slide 57

Resiliency is not limited to IT systems and personnel

Slide 58

Resources & References: noti.st/quintessence

Slide 59

Questions? Quintessence Anx DevOps Advocate noti.st/quintessence @QuintessenceAnx