Don’t Panic! Effective Incident Response

A presentation at Conf42: SRE by Quintessence Anx

In today’s world, service downtime has a significant impact on any business. Being able to respond quickly and proactively to issues can dramatically reduce the repercussions of any incident, both financial and reputational.

We’ll take lessons learned from formalized incident response practices, such as those used by first responders, and show you how to apply them to your organization. By using these methods, you’ll improve both the speed and effectiveness of your team’s response, reducing the amount of downtime experienced.

In this workshop, attendees will:

  • Be introduced to the Incident Command System and learn how it can be adapted to other industries.
  • Walk through the basics of incident response best practices.
  • Discuss examples of formal incident response from multiple organizations.
  • Perform hands-on exercises to practice incident response skills.
  • Have the opportunity to share their real-world experiences with each other.

Resources

The following resources were mentioned during the presentation or are useful additional information.

  • Incident Response Ops Guide

    This documentation covers parts of the PagerDuty Incident Response process. It is a cut-down version of our internal documentation used at PagerDuty for any major incident and to prepare new employees for on-call responsibilities. It provides information not only on preparing for an incident, but also on what to do during and after the incident. It is intended to be used by on-call practitioners and those involved in an operational incident response process (or those wishing to enact a formal incident response process). See the about page for more information on what this documentation is and why it exists.

  • Full Service Ownership Ops Guide

    Full-service ownership means that people take responsibility for supporting the software they deliver, at every stage of the software/service lifecycle. That level of ownership brings development teams much closer to their customers, the business, and the value being delivered. This guide provides a step-by-step process that software delivery operations teams can use to ensure they are able to fully own their services.

  • Post Mortems Ops Guide

    The postmortem concept is well known in the technology industry, but it can be difficult for newer individuals, teams, and organizations to adopt the cultural nuances required for effective postmortems. This guide will teach you how to build a culture of continuous learning, the most important components to include in your analysis, and how to conduct effective postmortem meetings.

  • Retrospectives Ops Guide

    Having retrospectives on a regular basis is one way for your team to learn what they’re doing right, where they can improve, how to avoid making the same mistakes again and again, and, most importantly, how to critically think about how they’re working together. Well-designed retrospectives allow teams to iteratively improve their end product and collaboration process.

    The concept of having retrospectives is well known in the technology industry, but it can be difficult for newer individuals, teams, and organizations to adopt the mindset required to run them well. This guide will teach you how to build a culture of continuous improvement, the most important components to include in your retrospectives, and how to conduct effective retrospectives.