Don't Panic! How to Cope Now You're Responsible for Production
Euan Finlay @efinlay24

https://theantimedia.com/bbc-accidentally-reports-the-death-of-queen-elizabeth-ii/

https://www.businessinsider.com/ft-publishes-incorrect-ecb-announcement-early-2015-12

https://twitter.com/Ludo_Dufour/status/672401158653218816

https://www.ft.com/content/c01f37ec-99c1-11e5-987b-d6cdef1b205c

/usr/bin/whoami @efinlay24

/usr/bin/whodoiworkfor No such file or directory.

https://www.ft.com

You've just been told you're on call. (and you're mildly terrified)

Obligatory audience interaction.

Everyone feels the same way at the start. (I still do today)

How do you get to the point where you're more comfortable?

A tenuous link to A Christmas Carol.

The Ghosts of Incidents... > Future Present Past

The Ghost of Incidents Future

Handling incidents is the same as any other skill.

Get comfortable with your alerts. (and bin the rubbish ones)

Have a plan for when things break.

Keep your documentation up to date.

Practice regularly.

The One Where We Decommissioned All Our Production Servers

Break things, and see what happens. Did your systems do what you expected?

The Planned Datacenter Disconnect

We got complacent, and stopped running datacenter failure tests...

The Unplanned Datacenter Disconnect

Have a central place for reporting changes and problems.

The Unplanned Datacenter Disconnect (Part II)

We should have followed our own advice.

We're not perfect. (but we always try to improve)

The Ghosts of Incidents... Future > Present Past

The Ghost of Incidents Present

Calm down, take a deep breath: it's (probably) ok.

Don't dive straight in. Go back to first principles.

What's the actual impact?

"All incidents are equal, but some incidents are more equal than others." George Orwell, probably

What's already been tried?

Is there definitely a problem?

What's the minimum viable solution?

Get it running before you get it fixed.

Check the basics first.

Don't be afraid to call for help.

The One Where a Manager Falls Through the Ceiling

The One Where a Director Falls Through the Ceiling

Communication is key. Especially to your customers.

Put someone in charge.

Create a temporary incident channel.

If you think you're over-communicating, it's probably just the right amount.

Tired people don't think good.

The One Where We Had to Serve Traffic from Staging

It wasn't great, but it wasn't the end of the world.

The Ghosts of Incidents... Future Present > Past

The Ghost of Incidents Past

Congratulations! You survived. It probably wasn't that bad, was it?

Run a post-mortem with everyone involved.

Incident reports are important.

Prioritise follow-up actions.

https://blog.travis-ci.com/2018-04-03-incident-post-mortem

Identify what can be done better next time.

The One with the Badly Named Database

Don't name your pre-production database 'pprod'. Seriously, who does that?

https://twitter.com/iamdevloper/status/1040171187601633280

Nearly the end. (don't clap yet)

Problems will always happen. (and that's ok)

The end. (please clap)

@efinlay24 euan.finlay@ft.com
We're hiring! https://ft.com/dev/null https://aboutus.ft.com/en-gb/careers/
Image links: https://goo.gl/3DeojV