Controlled Chaos: The Inevitable Marriage of DevOps & Security

A presentation at Black Hat USA in August 2019 in Las Vegas, NV, USA by Kelly Shortridge

Slide 1

Controlled Chaos: The Inevitable Marriage of DevOps & Security
Kelly Shortridge (@swagitda_) and Dr. Nicole Forsgren (@nicolefv)
Black Hat USA 2019

Slide 2

Hi, I’m Kelly

Slide 3

Hi, I’m Nicole

Slide 4

“Chaos isn’t a pit. Chaos is a ladder.” ― Petyr Baelish, Game of Thrones

Slide 5

Software is eating the world. DevOps drives its devouring.

Slide 6

Infosec has a choice: marry DevOps or be rendered impotent & irrelevant

Slide 7

Infosec isn’t invincible. Denial & “DevSecOps” won’t save your future.

Slide 8

How should infosec control chaos & make a marriage to DevOps last?

Slide 9

  1. DevOps Dominion
  2. The Metamorphosis
  3. Time to D.I.E.
  4. A Phoenix Rises
  5. Marriage Vows

Slide 10

DevOps Dominion

Slide 11

DevOps is not automation or “agile”

Slide 12

DevOps is a mindset that unifies responsibility and accountability.

Slide 13

DevOps has “crossed the chasm” – the business benefits are too striking

Slide 14

DevOps integrates once-disparate roles, encouraging “shifting left”

Slide 15

Infosec can join DevOps or watch as DevOps carves its own secure path

Slide 16

Chaos & resilience is infosec’s future

Slide 17

Therefore, infosec & DevOps priorities actually align…

Slide 18

What are DevOps’s priorities?

Slide 19

Optimization of software delivery performance so tech delivers value

Slide 20

Stability & speed don’t conflict – resilience & innovation are bffs

Slide 21

CI/CD: implement changes in prod rapidly, sustainably, & safely

Slide 22

What metrics delineate “elite” DevOps performers from the rest?

Slide 23

Lead time for changes: How long does it take for committed code to successfully run in production?

Slide 24

Release frequency: How often is code deployed to production or released to end users?

Slide 25

Time to Recovery (TTR): How long does it take to restore service?

Slide 26

Change failure rate: What percentage of changes to production degrade service & require remediation?

Slide 27

Metric | Elite | High | Medium | Low
Lead time for changes | < 1 day | 1 day – 1 week | 1 week – 1 month | 1 month – 6 months
Release frequency | On demand (>1 daily) | 1 per day – 1 per month | 1 per week – 1 per month | 1 per month – 1 per 6 months
Time to recovery | < 1 hour | < 1 day | < 1 day | 1 week – 1 month
Change failure rate | 0% – 15% | 0% – 15% | 0% – 15% | 46% – 60%
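
The four metrics defined on the preceding slides can be computed from a simple log of deployments. The following is a minimal sketch, not part of the talk: the `Deployment` record, its field names, and the `delivery_metrics` function are hypothetical, and the choice of the median as the summary statistic is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one change shipped to production."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it was running in production
    failed: bool = False                    # degraded service and needed remediation?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def delivery_metrics(deploys: list[Deployment], window_days: int) -> dict:
    """Summarize the four delivery metrics over an observation window."""
    if not deploys:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "median_lead_time": median(lead_times),
        "deploys_per_day": len(deploys) / window_days,
        "median_time_to_recovery": median(recoveries) if recoveries else None,
        "change_failure_rate": len(failures) / len(deploys),
    }


# Illustrative data only; the timestamps are made up.
t0 = datetime(2019, 8, 1, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0, t0 + timedelta(hours=4), failed=True,
               restored_at=t0 + timedelta(hours=4, minutes=30)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=6)),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=1)),
]
metrics = delivery_metrics(deploys, window_days=2)
```

With this sample data the median lead time is 3 hours, the release frequency is 2 deploys per day, the time to recovery is 30 minutes, and the change failure rate is 25%.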

Slide 28

The evidence: no tradeoff between better infosec & DevOps leetness

Slide 29

Elites conduct security reviews & implement changes in mere days

Slide 30

“DevOps doesn’t care about security” is a lazy straw man. Stop it.

Slide 31

Security drives stronger DevOps results. Now infosec must evolve.

Slide 32

The Metamorphosis

Slide 33

Partitioning of responsibility & accountability engenders conflict

Slide 34

The real “DevSecOps”: DevOps will be held accountable for security fixes

Slide 35

What goals should infosec pursue in this evolution?

Slide 36

And… why should infosec goals diverge from DevOps goals?

Slide 37

Infosec should support innovation in the face of change – not add friction

Slide 38

Infosec has arguably failed, so “this is how we’ve always done it” is invalid

Slide 39

Cloud & microservices created the “Infosec Copernican Revolution”

Slide 40

But the data doesn’t lie: cloud & PaaS contribute to elite performance

Slide 41

The Security of Chaos

Slide 42

“Things will fail” naturally extends into “things will be pwned”

Slide 43

Security failure is when security controls don’t operate as intended

Slide 44

What are the principles of chaotic security engineering?

Slide 45

  1. Expect that security controls will fail & prepare accordingly

Slide 46

  2. Don’t try to avoid incidents – hone your ability to respond to them

Slide 47

What are the benefits of the chaos / resilience approach?

Slide 48

Benefits: lowers remediation costs & stress levels during real incidents

Slide 49

Benefits: minimizes end-user disruption & improves confidence

Slide 50

Benefits: creates feedback loops to foster understanding of systemic risk

Slide 51

The ability to automate “toil” away should also appeal to infosec

Slide 52

Toil: manual, repetitive, tactical work that doesn’t provide enduring value

Slide 53

Manual patching, provisioning 2FA / ACLs, firewall rule management, etc.

Slide 54

What other ways can infosec become more strategic?

Slide 55

Time to D.I.E.

Slide 56

C.I.A. triad: commonly used as a model to balance infosec priorities

Slide 57

Confidentiality: withhold info from people unauthorized to view it

Slide 58

Integrity: data is a trustworthy representation of the original info

Slide 59

Availability: organization’s services are available to end users

Slide 60

But these are security values, not qualities that create security

Slide 61

We need a model promoting qualities that make systems more secure

Slide 62

Instead, use the D.I.E. model: Distributed, Immutable, Ephemeral

Slide 63

Distributed: multiple systems supporting the same overarching goal

Slide 64

Distributed infrastructure reduces risk of DoS attacks by design

Slide 65

Immutable: infrastructure that doesn’t change after it’s deployed

Slide 66

Servers are now disposable “cattle” rather than cherished “pets”

Slide 67

Immutable infra is more secure by design – ban shell access entirely

Slide 68

Lack of control is scary, but unlimited lives are better than nightmare mode

Slide 69

Ephemeral: infrastructure with a very short lifespan (dies after a task)

Slide 70

Ephemerality creates uncertainty for attackers (persistence = nightmare)

Slide 71

Installing a rootkit on a resource that dies in minutes is a waste of effort

Slide 72

Optimizing for D.I.E. reduces risk by design & supports resilience

Slide 73

A Phoenix Rises

Slide 74

What metrics support resilient security engineering?

Slide 75

TTR is as important for infosec as it is for DevOps

Slide 76

Time Between Failure (TBF) will lead your infosec program astray

Slide 77

Extended downtime makes users sad, not more frequent but trivial blips

Slide 78

Prioritizing failure inhibits innovation

Slide 79

Instead, harness failure as a tool to help you prepare for the inevitable

Slide 80

TTR > TTD – who cares if you detect quickly if you don’t fix it?

Slide 81

Game days: like planned fire drills

Slide 82

Prioritize game days based on potential business impacts

Slide 83

Decision trees: start at target asset, work back to easiest attacker paths

Slide 84

Determine the attacker’s least-cost path (hint: it doesn’t involve 0day)
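
One way to make the least-cost path concrete is to treat the decision tree as a weighted graph and run a shortest-path search over it. The sketch below is an illustration under assumptions, not the speakers' method: the node names, the edge costs, and the `least_cost_path` helper are all hypothetical, and the costs are in arbitrary units of attacker effort. The graph is assumed to have been built by working back from the target asset, then searched forward from the attacker.

```python
import heapq


def least_cost_path(graph, start, target):
    """Dijkstra's algorithm: return (total_cost, path) or None if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for next_node, step_cost in graph.get(node, []):
            if next_node not in visited:
                heapq.heappush(queue, (cost + step_cost, next_node, path + [next_node]))
    return None


# Hypothetical attack graph: each edge is (next step, attacker effort).
attack_graph = {
    "attacker": [("phished_credentials", 2), ("zero_day_on_web_server", 50)],
    "phished_credentials": [("vpn_access", 1)],
    "vpn_access": [("customer_database", 3)],
    "zero_day_on_web_server": [("customer_database", 1)],
}

cost, path = least_cost_path(attack_graph, "attacker", "customer_database")
```

With these made-up costs the search returns the phishing route at a total cost of 6, against 51 for the 0day route, which matches the slide's hint.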

Slide 85

Architecting chaos

Slide 86

Begin with “dumb” testing before moving to “fancy” testing

Slide 87

Controlling Chaos: Availability

Slide 88

Turning security events into availability events appeals to DevOps

Slide 89

The existing repertoire of chaos eng tools primarily covers availability

Slide 90

Chaos Monkey, Azure Fault Analysis Service, Chaos-Lambda…

Slide 91

Kube-monkey, PowerfulSeal, Podreaper, Pumba, Blockade…

Slide 92

Infosec teams can use these tools but make attackers the source of failure

Slide 93

Controlling Chaos: Confidentiality

Slide 94

Microservices use multiple layers of auth that preserve confidentiality

Slide 95

A service mesh is like an on-demand VPN at the application level

Slide 96

Attackers are forced to escalate privileges to access the iptables layer

Slide 97

Test: inject failure into your service mesh to test authentication controls

Slide 98

Controlling Chaos: Integrity

Slide 99

Test: swap out certs in your ZTNs – all transactions should fail

Slide 100

Test: modify encrypted data & see if your FIM alerts on it
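
The shape of this test can be sketched with a toy hash-based integrity check standing in for a real FIM product. Everything here is an assumption for illustration: the `snapshot` and `changed_files` helpers, the file name `records.enc`, and the use of SHA-256 baselines. A real experiment would tamper with data in a test environment and confirm that the deployed FIM raises an alert.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under root to the SHA-256 digest of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(baseline: dict[str, str], current: dict[str, str]) -> set[str]:
    """Files added, removed, or modified since the baseline was taken."""
    return {
        name
        for name in baseline.keys() | current.keys()
        if baseline.get(name) != current.get(name)
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    blob = root / "records.enc"
    blob.write_bytes(bytes(range(64)))     # stand-in for encrypted data
    baseline = snapshot(root)

    tampered = bytearray(blob.read_bytes())
    tampered[10] ^= 0xFF                   # the injected failure: flip one byte
    blob.write_bytes(bytes(tampered))

    alerts = changed_files(baseline, snapshot(root))
```

The experiment passes if the tampered file shows up in `alerts`; an empty set means the integrity control failed silently.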

Slide 101

Test: retrograde libraries, containers, other resources in CI/CD pipelines

Slide 102

D.I.E.ing is an art, like everything else

Slide 103

Controlling Chaos: Distributed

Slide 104

Distributed mostly overlaps with availability in modern infra contexts

Slide 105

Multi-region services present a fun opportunity to mess with attackers

Slide 106

Shuffle IP blocks regularly to change attackers’ lateral movement game

Slide 107

Controlling Chaos: Immutable

Slide 108

Immutable infra is like a phoenix – it disappears & comes back a lot

Slide 109

Volatile environments with continually moving parts raise the cost of attack

Slide 110

Create rules like, “If there’s ever a write to disk, crash the node”

Slide 111

Attackers must stay in-memory, which hopefully makes them cry

Slide 112

Metasploit Meterpreter + webshell: Touch passwords.txt & kaboom

Slide 113

Infosec teams can build Docker images with a “bamboozle layer”

Slide 114

Mark garbage files as “unreadable” to craft enticing bait for attackers

Slide 115

A potential goal: architect immutability turtles all the way down

Slide 116

Test: inject attempts at writing to disk to ensure detection & reversion

Slide 117

Treat changes to disk by adversaries similarly to failing disks: mercy kill

Slide 118

Controlling Chaos: Ephemeral

Slide 119

Most infosec bugs are state-related – get rid of state, get rid of bugs

Slide 120

Reverse uptime: longer host uptime adds greater security risk
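
Reverse uptime can be tracked as a maximum allowed host age. A minimal sketch, assuming a hypothetical inventory that maps host names to boot times; the one-day threshold and the `hosts_to_recycle` helper are illustrative choices, not values from the talk.

```python
from datetime import datetime, timedelta


def hosts_to_recycle(boot_times: dict[str, datetime],
                     now: datetime,
                     max_uptime: timedelta) -> list[str]:
    """Return hosts that have been up longer than the allowed maximum."""
    return sorted(
        host for host, booted in boot_times.items()
        if now - booted > max_uptime
    )


# Made-up inventory for illustration.
now = datetime(2019, 8, 7, 12, 0)
boot_times = {
    "web-1": now - timedelta(hours=3),
    "web-2": now - timedelta(days=9),
    "batch-7": now - timedelta(days=2),
}
stale = hosts_to_recycle(boot_times, now, max_uptime=timedelta(days=1))
```

Here `web-2` and `batch-7` exceed the threshold and would be candidates for replacement, while `web-1` would not.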

Slide 121

Test: change API tokens & test if services still accept old tokens
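
The expected outcome of this test can be modeled in a few lines. The `TokenStore` class below is a hypothetical in-memory stand-in for a service's token validation, not a real API; against real services the same check would be made by rotating a token and replaying the old one.

```python
import hmac
import secrets


class TokenStore:
    """Hypothetical stand-in for a service that validates API tokens."""

    def __init__(self) -> None:
        self._current: dict[str, str] = {}

    def issue(self, service: str) -> str:
        """Issue a fresh token, replacing any previous token for the service."""
        token = secrets.token_urlsafe(32)
        self._current[service] = token
        return token

    def rotate(self, service: str) -> str:
        """Rotation is issuing a new token; the old one must stop working."""
        return self.issue(service)

    def accepts(self, service: str, token: str) -> bool:
        expected = self._current.get(service)
        # Constant-time comparison avoids leaking token contents via timing.
        return expected is not None and hmac.compare_digest(expected, token)


store = TokenStore()
old_token = store.issue("billing")
new_token = store.rotate("billing")

old_still_accepted = store.accepts("billing", old_token)
new_accepted = store.accepts("billing", new_token)
```

The test passes when `new_accepted` is true and `old_still_accepted` is false. If a service still honors the old token after rotation, stale credentials are persisting somewhere.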

Slide 122

Test: inject hashes of old pieces of data to ensure no data persistence

Slide 123

Use “arcade tokens” instead of using direct references to data
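
One reading of “arcade tokens” is tokenization: services pass around random tokens that are worthless outside the system that issued them, and only a vault can exchange a token for the underlying data. The sketch below follows that reading, which is an assumption, and the `TokenVault` class and its methods are hypothetical.

```python
import secrets


class TokenVault:
    """Hypothetical vault that swaps sensitive values for opaque tokens."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Store the value and hand back a random token that reveals nothing."""
        token = secrets.token_urlsafe(16)
        self._values[token] = value
        return token

    def redeem(self, token: str) -> str:
        """Exchange a token for its value; raises KeyError if unknown or revoked."""
        return self._values[token]

    def revoke(self, token: str) -> None:
        """Make a token worthless without touching the underlying data store."""
        self._values.pop(token, None)


vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")   # sample value, not real data
redeemed = vault.redeem(token)
vault.revoke(token)
```

Downstream services only ever see `token`. An attacker who steals it gains nothing once it is revoked, and never held a direct reference to the data.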

Slide 124

Leverage lessons from toll fraud – cloud billing becomes security signal

Slide 125

Test: exfil TBs or run a cryptominer to inform billing spike detection
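
Billing spike detection can start very simply. The following sketch flags any day whose cost exceeds a multiple of the trailing median; the `billing_spikes` function, the 7-day window, the 3x factor, and the sample figures are all assumptions for illustration, and a game day like the one on this slide is what would tell you whether such thresholds are sensible.

```python
from statistics import median


def billing_spikes(daily_costs: list[float],
                   window: int = 7,
                   factor: float = 3.0) -> list[int]:
    """Return indexes of days whose cost exceeds factor x the trailing median."""
    spikes = []
    for i in range(window, len(daily_costs)):
        baseline = median(daily_costs[i - window:i])
        if daily_costs[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Made-up daily cloud spend; day 8 simulates a cryptominer or bulk exfil.
daily_costs = [100, 102, 98, 101, 99, 103, 100, 104, 950, 101]
spike_days = billing_spikes(daily_costs)
```

Using the median rather than the mean keeps a single spike from inflating the baseline used for the following days.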

Slide 126

So, how should infosec work with DevOps to implement all of this?

Slide 127

Marriage Vows

Slide 128

Infosec + DevOps = scalable love

Slide 129

How does this scalable love look?

Slide 130

Sit in on early design decisions & demos – but say “No, and…” vs. “No.”

Slide 131

Provide input on tests so every testing suite has infosec’s stamp on it

Slide 132

By the last “no” gate in the delivery process, nearly all issues will be fixed

Slide 133

Infosec should focus on outcomes that are aligned with business goals

Slide 134

TTR should become the preliminary anchor of your security metrics

Slide 135

Security- & performance-related gamedays can’t be separate species

Slide 136

Cultivate buy-in together for resilience & chaos engineering

Slide 137

Visibility / observability: collecting system information is essential

Slide 138

Your DevOps colleagues are likely already collecting the data you need

Slide 139

Changing culture: change what people do, not what they think

Slide 140

Conclusion

Slide 141

Security cannot force itself into DevOps. It must marry it.

Slide 142

Chaos/resilience are natural homes for infosec & represent its future.

Slide 143

Infosec must now evolve to unify responsibility & accountability.

Slide 144

If not, infosec will sit at the kids’ table until it is uninvited from the business.

Slide 145

Giving up control isn’t a harbinger of doom. Resilience is a beacon of hope.

Slide 146

“You must have chaos within you to give birth to a dancing star.” ― Friedrich Nietzsche

Slide 147

@swagitda_ @nicolefv /in/kellyshortridge /in/nicolefv kelly@greywire.net nicolefv@google.com