A presentation at PagerDuty Webinar by Quintessence Anx
Resilience for Retail: A Tale Not About Ice Cream But Somehow Also About Ice Cream
Quintessence Anx DevOps Advocate @ PagerDuty @QuintessenceAnx
Don’t panic
Elevated response period
How to determine your Elevated Response Period
What support is needed
Build or Buy :: Make or Buy
⛔
v0 Architecture
Random Outage Graph
What, When, Where
Let’s Talk a Little About Resiliency Itself
A resilient system is a system that is able to withstand adversity.
Something is resilient if it is able to withstand adversity.
What can this look like?
Organizational Resilience can look like having the appropriate response structure(s) in place for IT systems, services, and users in the event of latency or an outage.
(IT) System Resilience can look like an application not going down, and/or autoscaling, in response to increased traffic.
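As one illustration of the autoscaling idea on this slide, here is a minimal sketch of a proportional scaling rule. It is not taken from the talk: the function name `desired_replicas`, its parameters, and the replica bounds are all hypothetical, and the load units (for example, requests per second per replica) are an assumption.

```python
import math


def desired_replicas(current: int, observed_load: float, target_load: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Return a replica count that moves per-replica load toward the target.

    All names and defaults here are illustrative assumptions.
    """
    if current < 1 or target_load <= 0:
        raise ValueError("current must be >= 1 and target_load must be > 0")
    # Scale the replica count by how far the observed load is from the target.
    wanted = math.ceil(current * observed_load / target_load)
    # Clamp so the system never scales past capacity it actually has.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Traffic spike: 4 replicas each seeing 150 units against a target of 100.
    print(desired_replicas(4, 150, 100))
```

The upper bound reflects the checklist advice elsewhere in the deck not to design around resources you do not have: a scaling rule may ask for more capacity than exists, so the cap has to match what is really available.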
Why is this important?
Response and Design
Resilient Response
Resilient Response Checklist • Define elevated response • Maximize experienced responders • Both primary and secondary • Do not design around resources you do not have • Minimize responder burnout • Clear handoff procedures • Clear ownership • Dedicated, clear, responder roles • Practiced response process • Validate responder access to tools and data • Updated documentation
Define elevated response
Maximize Experienced Responders
Do not design around resources you do not have
Responder Burnout
Clear Handoff Procedures
Clear Ownership
Dedicated, clear, responder roles
Practiced Response Process
Validate access to tools and data
Updated documentation
Resilient Design Checklist • Build, test, secure with scalability in mind • Build, test, secure with humans in mind • Automate as much as is feasible • Keep documentation updated in pace of releases • Do not design around resources you do not have • Clear ownership • Who owns the service, writes the code, etc. • Build, test, secure with redundancy and/or failover in mind • Build, test, secure with operator control in mind • Build, test, secure with observability in mind
Build, test, secure
Build, test, secure: scalability
Build, test, secure: humans
Build, test, secure: redundancy / failover
Build, test, secure: operator control
Build, test, secure: observability
Automation
Clear ownership
Resilient Design Checklist • Build and test with scalability in mind • Build and test with humans in mind • Automate as much as is feasible • Keep documentation updated in pace of releases • Do not design around resources you do not have • Clear ownership • Who owns the service, writes the code, etc. • Build and test with redundancy and/or failover in mind • Build and test with security in mind • Build and test with operator control in mind • Build and test with observability in mind
Practice with Ice Cream
Understand the Business
Resilient Response: Questions to Ask • What cannot go wrong? • What is at risk of going wrong? • What responses are needed in each situation? • Who is doing what step(s) in the response process(es)? • Are we in an Elevated Response Period? • And are separate considerations for that period defined?
Resilient System: Questions to Ask • How do we prevent “what cannot go wrong”? • How do we mitigate risk for “what else can go wrong”? • How do we support our response process(es)? • How do we support our responders? • How does an elevated response period impact our system?
Resiliency is not limited to IT systems and personnel
Resources & References noti.st/quintessence
Questions? Quintessence Anx DevOps Advocate noti.st/quintessence
View Resilience for Retail on Notist.
Discussing how resilient software practices are relevant to retail employers.
The following resources were mentioned during the presentation or are useful additional information.