TO ERR IS HUMAN: The Complexity of Security Failures – Kelly Shortridge (@swagitda_) – Hacktivity 2019

Hi, I’m Kelly @swagitda_

“To err is human; to forgive, divine.” – Alexander Pope

Humans make mistakes. It’s part of our nature (it’s mostly a feature, not a bug)

Infosec’s mistake: operating as if you can force humans to never err

This forces us into a futile war against nature. We cannot bend it to our will.

To build secure systems, we must work with nature, rather than against it.

Agenda: Clearing the Err · Hindsight & Outcome Bias · Unhealthy Coping Mechanisms · Making Failure Epic

Clearing the Err

Error: an action that leads to failure or that deviates from expected behavior

Security failure: the breakdown in our security coping mechanisms

“Human error” involves subjective expectations, including in infosec

Understanding why incidents happened is essential, but blame doesn’t help

Aviation, manufacturing, & healthcare are already undergoing this revolution

Slips (unintended actions) occur far more often than mistakes (inappropriate intentions)

The term “human error” is less grounded in reality than we believe…

Hindsight & Outcome Bias

Cognitive biases represent mental shortcuts that evolution has optimized for

We learn from the past to progress, but our “lizard brain” can take things too far

Hindsight bias: the “I knew it all along” effect aka the “curse of knowledge”

Once an outcome is known, people overestimate how predictable it was beforehand

e.g. skepticism of the North Korea attribution for the Sony Pictures leak; now it is “obvious”

Outcome bias: judging a decision based on its eventual outcome

Instead, evaluate decisions based on what was known at that time

All decisions involve some level of risk. Outcomes are largely based on chance.

We unfairly hold people accountable for events beyond their control

e.g. Capital One – did the breach really represent a failure in their strategy? (No.)

These biases change how we cope with failure…

Unhealthy Coping Mechanisms

Unhealthy coping mechanism #1: Blaming “human error”

Infosec’s fav hobbies: PICNIC (“problem in chair, not in computer”) & PEBKAC (“problem exists between keyboard and chair”)

This isn’t about removing accountability — malicious individuals certainly exist

Fundamental attribution error: your actions reflect innate traits, mine don’t

“You are inattentive, sloppy, & naïve for clicking a link. I was just super busy.”

An error represents the starting point for an investigation, not a conclusion

“Why did they click the link?” “Why did clicking a link lead to pwnage?”

These questions go unanswered if we accept the “human error” explanation

e.g. training devs to “care about security” completely misses the underlying issue

A “5 Whys” approach is a healthy start
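
A minimal sketch of how the link-click example above might be walked through with “5 Whys”; every answer here is hypothetical, and the point is only that each successive “why” should move from the person toward the system:

    # Hypothetical "5 Whys" chain for the phishing-link example.
    # Each answer deliberately moves from the individual toward the system.
    five_whys = [
        ("Why did the employee click the link?",
         "The email looked like a routine vendor invoice during a busy close period."),
        ("Why did a single click lead to compromise?",
         "The endpoint allowed macro execution and the account had broad privileges."),
        ("Why did the account have broad privileges?",
         "Access reviews are manual and had been deferred for two quarters."),
        ("Why were access reviews deferred?",
         "The team is measured on feature delivery, not on access hygiene."),
        ("Why is access hygiene not part of the team's goals?",
         "Security objectives are owned by a separate org and never made it into KPIs."),
    ]

    for why, answer in five_whys:
        print(f"{why}\n  -> {answer}")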

Equifax’s ex-CEO blamed “human error” for the breach. He was wrong.

What about frictional workflows, legacy dependence, org pressures for uptime?

90% of breaches cite “human error” as the cause. That stat is basically useless.

Bad theory: if humans are removed from the equation, error can’t occur

Unhealthy coping mechanism #2: Behavioral control

“An approach aimed at the individual is the equivalent of swatting individual mosquitoes rather than draining the swamp to address the source of the problem.” – Henriksen, et al.

“Policy violation” is a sneaky way to still rely on “human error” as an answer

The cornucopia of security awareness hullabaloo is a direct result of this

Solely restricting human behavior will never improve security outcomes.

We focus on forcing humans to fit our ideal mold vs. re-designing our systems

Formal policies are rarely written by those in the flow of work being policed

Infosec is mostly at the “blunt” end of systems; operators are at the “sharp” end

People tend to blame whoever resides closest to the error

Operator actions “add a final garnish to a lethal brew whose ingredients have already been long in the cooking.” – James Reason

e.g. Equifax’s 48-hour patching policy that was very obviously not followed

Creating words on a piece of paper & expecting results is… ambitious

Discipline doesn’t actually fix the “policy violation” cause (but it does scapegoat)

Case study: SS&C & business email compromise (BEC)

Solely implementing controls to regulate human behavior doesn’t beget resilience

Post-WWII analysis: improving cockpit-control design beat more pilot training

Communicate expert guidance, but tether it to reality

Checklists can be valuable aids if they’re based on knowledge of real workflows

Policies must encourage safer contexts, not lord over behavior with an iron fist.

Unhealthy coping mechanism #3: The just-world hypothesis

Attempting to find the ultimate causal seed of failure helps us cope with fear

The just-world hypothesis: humans like believing the world is orderly & fair

The same things can lead to both success & failure; that is hardly a “just world”

Case study: The Chernobyl disaster

Errors are really symptoms of pursuing goals while under resource constraints

How can security teams more productively deal with security failures?

Making Failure Epic

Infosec will progress when we ensure the easy way is the secure way

System perspective · Security UX · Chaos security engineering · Blameless culture

System perspective

Security failure is never the result of one factor, one vuln, or one dismissed alert

Security teams must expand their focus to the relationships between components

A system is “a set of interdependent components interacting to achieve a common specified goal.”

“A narrow focus on operator actions, physical component failures, and technology may lead to ignoring some of the most important factors in terms of preventing future accidents” – Nancy Leveson

The way humans use tech involves economic & social factors, too

Economic factors: revenue & profit goals, compensation schemes, budgeting, etc.

Social factors: KPIs, expectations, what behavior is rewarded or punished, etc.

Pressure to do more work, faster, is a vulnerability. So is a political culture.

Non-software vulns don’t appear in our threat models, but also erode resilience

We treat colleagues like Schrödinger’s attacker vs. dissecting org-level factors

Security is something a system does, not something a system has.

Think of it as helping our systems operate safely vs. “adding security”

Health & “security vanity” metrics don’t tell you whether systems are doing security

Number of vulns found matters less than their severity & how quickly they’re fixed
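
As a minimal illustration of that point, a sketch of reporting median time-to-fix per severity instead of a raw vuln count; the findings data and field names are hypothetical:

    from collections import defaultdict
    from datetime import date
    from statistics import median

    # Hypothetical findings export: severity plus opened/fixed dates.
    findings = [
        {"severity": "critical", "opened": date(2019, 9, 1),  "fixed": date(2019, 9, 3)},
        {"severity": "high",     "opened": date(2019, 8, 20), "fixed": date(2019, 9, 15)},
        {"severity": "low",      "opened": date(2019, 7, 1),  "fixed": date(2019, 9, 30)},
    ]

    # Median days-to-fix per severity says more than a raw count of vulns found.
    days_by_severity = defaultdict(list)
    for f in findings:
        days_by_severity[f["severity"]].append((f["fixed"] - f["opened"]).days)

    for severity, days in days_by_severity.items():
        print(f"{severity}: median {median(days)} days to fix ({len(days)} findings)")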

Infosec should analyze the mismatch between self-perception & reality

Alternative analysis for defenders is basically just user research…

Security UX

The pressure to meet competing goals is a strong source of security failure

What drives people’s promotions or firings? What are their performance goals?

Human attention is a finite & precious resource, so you must compete for it

User research can help you determine how to draw attention towards security

WARNING: CYBER ANOMALY (thanks Raytheon)

Choice architecture: organizing the context in which people make decisions

Place secure behavior on the path of least resistance by using defaults

e.g. Requiring 2FA to create an account, security tests in CI/CD pipelines
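
A minimal sketch of defaults as choice architecture, built around a hypothetical account-creation function; the exception policy is invented, and the point is only that the zero-effort path is the secure one:

    # Hypothetical sketch of secure-by-default choice architecture:
    # the easy path (no arguments) is the secure path.
    from dataclasses import dataclass

    @dataclass
    class Account:
        email: str
        two_factor_required: bool

    def create_account(email: str, *, two_factor_required: bool = True) -> Account:
        """Create an account; 2FA is the default, opting out must be explicit."""
        if not two_factor_required:
            # Make the insecure path the one that demands extra effort & review.
            raise ValueError("Opting out of 2FA requires a documented exception.")
        return Account(email=email, two_factor_required=True)

    print(create_account("user@example.com"))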

Slips require changes to the design of systems with which humans interact

Checklists, defaults, eliminating distractions, removing complexity…

Strong security design anticipates user workarounds & safely supports them

e.g. Self-service app approvals with a Slackbot to confirm the run request
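
A minimal sketch of what such a Slackbot confirmation could look like using the Slack Web API’s chat.postMessage; the channel name, request fields, and approval-by-reaction convention are assumptions, not a prescribed design:

    # Hypothetical self-service approval flow: when someone requests to run an
    # unapproved app, post a confirmation request to a security channel instead
    # of silently blocking them. Channel name & request fields are made up.
    import os
    from slack_sdk import WebClient

    client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

    def request_app_approval(requester: str, app_name: str, reason: str) -> None:
        client.chat_postMessage(
            channel="#security-approvals",
            text=(
                f"{requester} wants to run *{app_name}*.\n"
                f"Reason: {reason}\n"
                "React with :white_check_mark: to approve or :x: to deny."
            ),
        )

    request_app_approval("kelly", "new-profiling-tool", "debugging a prod latency issue")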

Think in terms of acceptable tradeoffs – create secure alternatives, not loopholes

How else can you better understand your systems & the context they create?

Chaos Security Engineering

We will never be able to eliminate the potential for error.

We must seek feedback on what creates success & failure in our systems

“Enhancing error tolerance, error detection, and error recovery together produce safety.” – Woods, et al.

Error tolerance: the ability to not get totally pwned when compromise occurs

Error detection: the ability to spot unwanted activity

Error recovery: the ability to restore systems to their intended functionality

Highest ROI: anticipating how the potential for failure evolves

Chaos eng: continual experimentation to evaluate response to unexpected failure

e.g. Retrograding: inject old versions of libs, containers, etc. into your systems
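
A minimal sketch of a retrograding experiment, assuming an isolated staging environment; the package pin and the security-test command are hypothetical placeholders:

    # Hypothetical "retrograde" chaos experiment: in an isolated staging
    # environment, downgrade a dependency to an old version and check whether
    # your detection & CI gates actually notice. Package, version, and check
    # command are made up; never run this against production.
    import subprocess
    import sys

    OLD_PIN = "somelib==1.2.0"  # hypothetical outdated version
    CHECK = [sys.executable, "-m", "pytest", "tests/security", "-q"]  # hypothetical gate

    def run_retrograde_experiment() -> None:
        subprocess.run([sys.executable, "-m", "pip", "install", OLD_PIN], check=True)
        result = subprocess.run(CHECK)
        if result.returncode == 0:
            print("Gap found: the old version sailed through our security checks.")
        else:
            print("Detection worked: the retrograded dependency was flagged.")

    if __name__ == "__main__":
        run_retrograde_experiment()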

Chaos engineering assumes existing knowledge hangs in a delicate balance

The potential for hazard is constantly changing, creating new blindspots

If you don’t understand your systems, you can’t ever hope to protect them

Chaos security engineering requires a blameless culture…

Blameless Culture

A blameless culture balances safety and accountability – not absolution

Supports a perpetual state of learning, in which critical info isn’t suppressed

Asking the right questions is the first step towards a blameless culture

Neutral questions prevent bias from seeping into our incident review

Ask other practitioners what they would do in the same original context

Case study: the stressed accountant

“Human error” becomes a reasonable action given the human’s circumstances

Your security program is set up to fail if it blames humans for reasonable actions

Neutral practitioner questions help sketch a portrait of local rationality

“Irrational behavior” is only irrational when considered without local context

Our goal is to change the context of decision-making to promote security

If you’re using an ad-hominem attack in incident review, you’ve veered astray

In Conclusion

Discard the crutch of “human error” so you can learn from failure

Always consider the messiness of systems, organizations, and minds

You aren’t exempt – your own emotions play a part in these systems

Work with human nature rather than against it, and think in terms of systems

Leverage UX & chaos eng to improve the context your systems engender

Ask neutral questions & ensure your teams feel safe enough to discuss errors

Infosec is erring. But we still have the chance to become divine.

“We may encounter many defeats, but we must not be defeated. It may even be necessary to encounter the defeat, so that we can know who we are. So that we can see, oh, that happened, and I rose.” – Maya Angelou

@swagitda_ · /in/kellyshortridge · kelly@greywire.net

Suggested Reading
• “The evolution of error: Error management, cognitive constraints, and adaptive decision-making biases.” Johnson, D., et al.
• “Hindsight bias impedes learning.” Mahdavi, S., & Rahimian, M. A.
• “Outcome bias in decision evaluation.” Baron, J., & Hershey, J. C.
• “Human error.” Reason, J.
• “Behind human error.” Woods, D., et al.
• “People or systems? To blame is human. The fix is to engineer.” Holden, R. J.
• “Understanding adverse events: a human factors framework.” Henriksen, K., et al.
• “Engineering a safer world: Systems thinking applied to safety.” Leveson, N.
• “‘Going solid’: a model of system dynamics and consequences for patient safety.” Cook, R., & Rasmussen, J.
• “Choice Architecture.” Thaler, R. H., Sunstein, C. R., & Balz, J. P.
• “Blameless PostMortems and a Just Culture.” Allspaw, J.