A presentation at Conf42: SRE by Quintessence Anx
In today’s world, service downtime has a significant impact on any business. Being able to respond quickly and proactively to issues can dramatically reduce the repercussions of any incident, both financial and reputational.
We’ll take lessons learned from formalized incident response practices, such as those used by first responders, and show you how to apply them to your organization. Using these methods, you’ll improve both the speed and the effectiveness of your team’s response and reduce the downtime you experience.
In this workshop, attendees will:
The following resources were mentioned during the presentation or provide useful additional information.
This documentation covers parts of the PagerDuty Incident Response process. It is a cut-down version of the internal documentation PagerDuty uses for major incidents and for preparing new employees for on-call responsibilities. It covers not only how to prepare for an incident but also what to do during and after one. It is intended for on-call practitioners and anyone involved in an operational incident response process, including those who want to establish a formal one. See the about page for more information on what this documentation is and why it exists.
Full-service ownership means that people take responsibility for supporting the software they deliver, at every stage of the software/service lifecycle. That level of ownership brings development teams much closer to their customers, the business, and the value being delivered. This guide provides a step-by-step process that software delivery operations teams can use to ensure they are able to fully own their services.
The postmortem concept is well known in the technology industry, but it can be difficult for newer individuals, teams, and organizations to adopt the cultural nuances required for effective postmortems. This guide will teach you how to build a culture of continuous learning, the most important components to include in your analysis, and how to conduct effective postmortem meetings.
Holding retrospectives regularly is one way for your team to learn what they’re doing right, where they can improve, and how to avoid repeating the same mistakes. Most importantly, it teaches them to think critically about how they work together. Well-designed retrospectives allow teams to iteratively improve both their end product and their collaboration process.
The concept of retrospectives is well known in the technology industry, but it can be difficult for newer individuals, teams, and organizations to adopt the mindset that effective retrospectives require. This guide will teach you how to build a culture of continuous improvement, the most important components to include in your retrospectives, and how to run them well.