TRACK: SITE RELIABILITY ENGINEERING
NOVEMBER 10, 2022
I am Cluster Admin, Destroyer of Everything You Hold Dear
Matt Williams, Evangelist @ Infra
TW: @technovangelist - Mast: @technovangelist@fosstodon.org
A presentation at All Day DevOps in November 2022 by Matt Williams

Least Privilege
According to the Cybersecurity & Infrastructure Security Agency (CISA):
"Only the minimum necessary rights should be assigned to a subject that requests access to a resource and should be in effect for the shortest duration necessary … careful delegation of access rights can limit attackers from damaging a system."

What happens when we skip Least Privilege?

Target - 2013
• HVAC on main network
• Useful for monitoring energy consumption at various stores
• Technician compromised
Attackers stole 40 million debit and credit cards

GitLab - 2017
• SRE responding to incident
• Intended to drop replica database
• Fat-fingered the production database instead, and had excessive privileges to do it
GitLab.com went down and about 6 hours of data was lost: roughly 5k projects (issues, etc.), comments, and users

Marriott - 2018
• User compromised
• Had admin access for everything
• Ran some database queries
Hundreds of millions of customer records exposed

Capital One - 2019
• Misconfigured firewall
• Generated temp account creds via SSRF exploit
• Had excessive privileges to sync S3 buckets
30GB of credit application data, affecting 100 million in US, 6 million in Canada

Verkada - 2021
• Credentials found for user
• Had excessive privileges
Accessed 150k live camera feeds in schools, prisons, and hospitals

Reported by Rocky Chen? - 2021
• User accidentally deleted a namespace
• Recreated it - but did it wrong
• He thought he was in his test cluster
• Assumed AWS role made it difficult to troubleshoot

Software company with tools used by law enforcement and security teams
• One of the devs ran a kubectl command
• Thought he was in test, was actually in prod
• Assumed roles, so they never figured out who did it
All access to Kubernetes was removed and they started over

What is the cost of breaches?
• Avg cost: $4.24 million in 2021
• Avg time to identify: 212 days
• Avg lifecycle: 286 days from identification to containment
• Likelihood of a breach being detected and prosecuted: 0.05%
• Personal data involved in 45% of breaches
https://www.securitymagazine.com/articles/93990-a-cluster-without-rbac-is-an-insecure-cluster

How is this relevant to this talk?
Let’s talk about Kubernetes & Cluster Admin
Cluster Admin is wonderful because you can do anything you want!!
Cluster Admin is scary because you can do anything you want!!
Cluster Admin is the worst thing ever because you can do anything you want!!

So the answer is don’t give cluster admin to everyone, right??

But creating users in k8s is HARD
Users don’t actually exist in Kubernetes
Everything in k8s is a resource.
But there is no user resource
It’s All About the Certs in your kubeconfig

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: certgoeshere
    server: https://clusterendpoint.k8s.ondigitalocean.com
  name: mycluster
contexts:
- context:
    cluster: mycluster
    user: do-sfo3-matt-primary-admin
  name: mycontext
current-context: mycontext
kind: Config
preferences: {}
users:
- name: do-sfo3-matt-primary-admin
  user:
    token: dop_v1_dea9d7ff2b8eb092f53ffebogus31d2bd4602a62a19b5ac4

What is a Role?
• Defines the level of access a ‘user’ has to the cluster
• Resource
• Verb

What is a Role?
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: marketing-dev
  labels:
    app.infrahq.com/include-role: "true"
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list"]

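A role grants nothing until it is bound to a subject. As a rough sketch of that next step, the script below writes a ClusterRoleBinding for the marketing-dev role and, only when APPLY_TO_CLUSTER=1 is set, applies it and asks the API server what the user may do. The user name myuser, the binding name, the file name, and the APPLY_TO_CLUSTER switch are assumptions made for illustration; none of them come from the talk.

```shell
#!/bin/sh
set -eu

USER_NAME="myuser"                          # hypothetical user (the CN of a client cert)
BINDING_FILE="marketing-dev-binding.yaml"   # hypothetical file name

# Bind the marketing-dev ClusterRole to a single user, cluster-wide.
cat > "${BINDING_FILE}" <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: marketing-dev-binding
subjects:
- kind: User
  name: ${USER_NAME}
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: marketing-dev
  apiGroup: rbac.authorization.k8s.io
EOF

# The cluster steps are opt-in so the script is safe to run without a cluster.
if [ "${APPLY_TO_CLUSTER:-0}" = "1" ]; then
  kubectl apply -f "${BINDING_FILE}"
  # can-i exits non-zero when the answer is "no", so keep set -e from aborting.
  kubectl auth can-i list pods --as="${USER_NAME}" || true
  kubectl auth can-i delete pods --as="${USER_NAME}" || true
fi
```

If this binding is the only one that names the user, the first check should answer yes and the second no. The --as flag relies on the caller being allowed to impersonate users, which a cluster admin is.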
How to create a User
• Create the user key (openssl genpkey…)
• Create the CSR (openssl req -new)
• Submit the CSR to the cluster (yaml)
• Approve the request (kubectl certificate approve…)

How to create a User
• Get the approved request (kubectl get csr…)
• Build the kubeconfig (kubectl --kubeconfig myuserconfig config set-credentials, kubectl --kubeconfig myuserconfig config set-context)
• Then distribute the file
https://infrahq.com/blog/how-to-create-users

How to create a User
• And then repeat often
• Ensure bad parties can’t access
• You can’t revoke a cert
• And redistribute

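The steps on these slides could be scripted roughly as follows. This is a sketch under stated assumptions, not the speaker's script: the names myuser and marketing, the RSA 2048 key type, the 86400-second expiry, and the APPLY_TO_CLUSTER switch are all invented for illustration, and mycluster and mycontext are borrowed from the kubeconfig slide. The set-cluster step is not listed on the slides but is needed for a usable file, and it assumes the admin kubeconfig embeds its CA as certificate-authority-data.

```shell
#!/bin/sh
set -eu

USER_NAME="myuser"            # hypothetical; becomes the certificate CN
GROUP_NAME="marketing"        # hypothetical; becomes the certificate O
KUBECONFIG_OUT="myuserconfig"

# 1. Create the user key.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out "${USER_NAME}.key"

# 2. Create the CSR. Kubernetes reads the user name from CN and groups from O.
openssl req -new -key "${USER_NAME}.key" \
  -subj "/CN=${USER_NAME}/O=${GROUP_NAME}" -out "${USER_NAME}.csr"

# 3. Wrap the CSR in a manifest. A short expirationSeconds stands in for
#    revocation, since an issued client cert cannot be revoked.
cat > "${USER_NAME}-csr.yaml" <<EOF
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: ${USER_NAME}
spec:
  request: $(base64 < "${USER_NAME}.csr" | tr -d '\n')
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 86400
  usages:
  - client auth
EOF

# The remaining steps talk to a live cluster, so they only run on request.
if [ "${APPLY_TO_CLUSTER:-0}" = "1" ]; then
  kubectl apply -f "${USER_NAME}-csr.yaml"
  # 4. Approve the request.
  kubectl certificate approve "${USER_NAME}"
  # 5. Fetch the signed cert (signing is asynchronous, so this may need a retry).
  kubectl get csr "${USER_NAME}" -o jsonpath='{.status.certificate}' \
    | base64 --decode > "${USER_NAME}.crt"
  # 6. Build the kubeconfig, copying endpoint and CA from the admin's current context.
  SERVER=$(kubectl config view --minify --raw -o jsonpath='{.clusters[0].cluster.server}')
  kubectl config view --minify --raw \
    -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' \
    | base64 --decode > ca.crt
  kubectl --kubeconfig "${KUBECONFIG_OUT}" config set-cluster mycluster \
    --server="${SERVER}" --certificate-authority=ca.crt --embed-certs=true
  kubectl --kubeconfig "${KUBECONFIG_OUT}" config set-credentials "${USER_NAME}" \
    --client-key="${USER_NAME}.key" --client-certificate="${USER_NAME}.crt" \
    --embed-certs=true
  kubectl --kubeconfig "${KUBECONFIG_OUT}" config set-context mycontext \
    --cluster=mycluster --user="${USER_NAME}"
  kubectl --kubeconfig "${KUBECONFIG_OUT}" config use-context mycontext
fi
```

Run without APPLY_TO_CLUSTER, the script only produces the key, the CSR, and the manifest locally. Distributing the resulting myuserconfig file, and repeating all of this whenever a certificate expires, is still left to the operator.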
That’s a lot of steps. Can we automate it?

but… He doesn’t deal with file distribution

Is there something easier??

Infra
• Two deployment options
  • Self Hosted
  • Use Infra Cloud (coming soon)

DEMO

Summary
• Least Privilege is important
• but… complicated on Kubernetes
• RBAC
• You can automate…
• Infra makes it easier