I am Cluster Admin, Destroyer of Everything You Hold Dear

A presentation at All Day DevOps in November 2022 by Matt Williams

Slide 1

TRACK: SITE RELIABILITY ENGINEERING
NOVEMBER 10, 2022
I am Cluster Admin, Destroyer of Everything You Hold Dear
Matt Williams, Evangelist @ Infra
TW: @technovangelist - Mast: @technovangelist@fosstodon.org

Slide 2

Least Privilege

According to the Cybersecurity & Infrastructure Security Agency (CISA): "Only the minimum necessary rights should be assigned to a subject that requests access to a resource and should be in effect for the shortest duration necessary … careful delegation of access rights can limit attackers from damaging a system."

Slide 3

What happens when we skip Least Privilege

Slide 4

Target - 2013
• HVAC on main network
• Useful for monitoring energy consumption at various stores

Slide 5

Target - 2013
• HVAC on main network
• Useful for monitoring energy consumption at various stores
• Technician compromised

Slide 6

Target - 2013
• HVAC on main network
• Useful for monitoring energy consumption at various stores
• Technician compromised

Attackers stole 40 million debit and credit cards

Slide 7

GitLab - 2017
• SRE responding to incident
• Intended to drop replica database

Slide 8

GitLab - 2017
• SRE responding to incident
• Intended to drop replica database
• Fat-fingered the production database and had excessive privileges to do it

Slide 9

GitLab - 2017
• SRE responding to incident
• Intended to drop replica database
• Fat-fingered the production database and had excessive privileges to do it

GitLab went down for 6 hours; 5k projects lost (issues, etc.), along with comments and users

Slide 10

Marriott - 2018
• User compromised
• Had admin access for everything
• Ran some database queries

Slide 11

Marriott - 2018
• User compromised
• Had admin access for everything
• Ran some database queries

Hundreds of millions of customer records lost

Slide 12

Capital One - 2019
• Misconfigured firewall
• Generated temp account creds via SSRF exploit
• Had excessive privileges to sync S3 buckets

Slide 13

Capital One - 2019
• Misconfigured firewall
• Generated temp account creds via SSRF exploit
• Had excessive privileges to sync S3 buckets

30 GB of credit application data taken, affecting 100 million people in the US and 6 million in Canada

Slide 14

Verkada - 2021
• Credentials found for user
• Had excessive privileges

Slide 15

Verkada - 2021
• Credentials found for user
• Had excessive privileges

Accessed 150k live camera feeds in schools, prisons, and hospitals

Slide 16

Reported by Rocky Chen? - 2021
• User accidentally deleted a namespace
• Recreated it, but did it wrong
• He thought he was in his test cluster
• An assumed AWS role made it difficult to troubleshoot

Slide 17

Software company with tools used by law enforcement and security teams
• One of the devs ran a kubectl command
• Thought he was in test, was actually in prod
• Assumed roles were in use, so they never figured out who did it

All access to Kubernetes was removed and they started over

Slide 18

What is the cost of breaches?
• Avg cost: $4.24 million in 2021
• Avg time to identify: 212 days
• Avg lifecycle: 286 days from identification to containment
• Likelihood of being detected and prosecuted: 0.05%
• Personal data involved: 45%

https://www.securitymagazine.com/articles/93990-a-cluster-without-rbac-is-an-insecure-cluster

Slide 19

How is this relevant to this talk?
Let’s talk about Kubernetes & Cluster Admin

Slide 20

How is this relevant to this talk?
Let’s talk about Kubernetes & Cluster Admin
Cluster Admin is wonderful because you can do anything you want!!

Slide 21

How is this relevant to this talk?
Let’s talk about Kubernetes & Cluster Admin
Cluster Admin is wonderful because you can do anything you want!!
Cluster Admin is scary because you can do anything you want!!

Slide 22

How is this relevant to this talk?
Let’s talk about Kubernetes & Cluster Admin
Cluster Admin is wonderful because you can do anything you want!!
Cluster Admin is scary because you can do anything you want!!
Cluster Admin is the worst thing ever because you can do anything you want!!

Slide 23

So the answer is: don’t give cluster admin to everyone, right??

Slide 24

But creating users in k8s is HARD
Users don’t actually exist in Kubernetes

Slide 25

But creating users in k8s is HARD
Users don’t actually exist in Kubernetes
Everything in k8s is a resource.

Slide 26

But creating users in k8s is HARD
Users don’t actually exist in Kubernetes
Everything in k8s is a resource.
But there is no user resource

Slide 27

But creating users in k8s is HARD
Users don’t actually exist in Kubernetes
Everything in k8s is a resource.
But there is no user resource
It’s All About the Certs

Slide 28

But creating users in k8s is HARD
Users don’t actually exist in Kubernetes
Everything in k8s is a resource.
But there is no user resource
It’s All About the Certs
in your .kubeconfig

Slide 29

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: certgoeshere
    server: https://clusterendpoint.k8s.ondigitalocean.com
  name: mycluster
contexts:
- context:
    cluster: mycluster
    user: do-sfo3-matt-primary-admin
  name: mycontext
current-context: mycontext
kind: Config
preferences: {}
users:
- name: do-sfo3-matt-primary-admin
  user:
    token: dop_v1_dea9d7ff2b8eb092f53ffebogus31d2bd4602a62a19b5ac4

Slide 33

What is a Role?
• Defines the level of access a ‘user’ has to the cluster
• Resource
• Verb

Slide 34

What is a Role?

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: marketing-dev
  labels:
    app.infrahq.com/include-role: "true"
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
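
A role grants nothing until it is bound to a subject. As a minimal sketch, assuming a hypothetical user named jane and a hypothetical namespace named marketing (neither appears in the talk), a RoleBinding could attach the marketing-dev ClusterRole above so that its read-only pod access applies only inside that one namespace:

```yaml
# Sketch only: the binding name, namespace, and user are illustrative assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: marketing-dev-binding   # hypothetical name
  namespace: marketing          # hypothetical namespace; access is limited to it
subjects:
- kind: User
  name: jane                    # hypothetical; matched against the CN of the client certificate
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: marketing-dev           # the role defined on the slide
  apiGroup: rbac.authorization.k8s.io
```

A ClusterRoleBinding with the same subject and roleRef would grant those verbs in every namespace, which is usually more than least privilege calls for.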

Slide 38

How to create a User
• Create the user key (openssl genpkey…)
• Create the CSR (openssl req -new)
• Submit the CSR to the cluster (yaml)
• Approve the request (kubectl certificate approve…)
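
The "submit the CSR to the cluster" step takes a CertificateSigningRequest manifest. The following is a minimal sketch, not the speaker's exact file: the object name jane-csr and the one-day lifetime are assumptions, and csrgoeshere is a placeholder for the base64-encoded output of openssl req.

```yaml
# Sketch only: name, lifetime, and request value are illustrative placeholders.
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: jane-csr                 # hypothetical; this is the name passed to kubectl certificate approve
spec:
  request: csrgoeshere           # placeholder for the base64-encoded CSR file
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 86400       # requested lifetime of one day; the signer may issue a different one
  usages:
  - client auth
```

Kubernetes has no revocation mechanism for client certificates, so requesting a short lifetime is one way to limit how long a leaked certificate stays useful.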

Slide 39

How to create a User
• Get the approved request (kubectl get csr…)
• Build the kubeconfig (kubectl --kubeconfig myuserconfig config set-credentials, kubectl --kubeconfig myuserconfig config set-context)
• Then distribute the file

https://infrahq.com/blog/how-to-create-users
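
The file produced by the set-credentials and set-context steps might look roughly like the sketch below. The user name jane, the context name jane-context, and the usercertgoeshere and userkeygoeshere placeholders are assumptions for illustration; the cluster entry reuses the example values from the kubeconfig shown earlier in the deck.

```yaml
# Sketch only: a per-user kubeconfig that authenticates with a client certificate
# instead of the admin token. All user-specific values are placeholders.
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: certgoeshere
    server: https://clusterendpoint.k8s.ondigitalocean.com
  name: mycluster
users:
- name: jane                                   # hypothetical user
  user:
    client-certificate-data: usercertgoeshere  # base64 of the approved certificate
    client-key-data: userkeygoeshere           # base64 of the private key from the first step
contexts:
- context:
    cluster: mycluster
    user: jane
  name: jane-context                           # hypothetical context name
current-context: jane-context
preferences: {}
```

Because the private key is embedded, whoever holds this file holds the identity, which is why distribution of the file needs as much care as the steps that create it.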

Slide 40

How to create a User
• And then repeat often
• Ensure bad parties can’t access
• You can’t revoke a cert
• And redistribute

Slide 41

That’s a lot of steps. Can we automate it?

Slide 42

Slide 43

but… He doesn’t deal with file distribution

Slide 44

Is there something easier??

Slide 45

Slide 46

Infra
• Two deployment options
  • Self Hosted
  • Use Infra Cloud (coming soon)

Slide 47

DEMO

Slide 48

Summary
• Least Privilege is important
• but… complicated on Kubernetes
• RBAC
• You can automate…
• Infra makes it easier

Slide 49

NOVEMBER 10, 2022
I am Cluster Admin, Destroyer of Everything You Hold Dear
Matt Williams, Evangelist @ Infra
@technovangelist

Slide 50