Kubernetes Operators: Operating Cloud Native services at scale

A presentation at CloudWeek in June 2020 by Horacio Gonzalez

Slide 1

Kubernetes Operators: Operating Cloud Native services at scale Horacio Gonzalez 2020-06-22

Slide 2

Who are we? Introducing myself and introducing OVHcloud

Slide 3

Horacio Gonzalez @LostInBrittany Spaniard lost in Brittany, developer, dreamer and all-around geek Flutter

Slide 4

OVHcloud: A Global Leader
● 200k Private cloud VMs running
● 1 Dedicated IaaS Europe
● 30 Datacenters
● Own 20Tbps Network with 35 PoPs
● Hosting capacity: 1.3M Physical Servers
● 360k Servers already deployed
● 1.3M Customers in 138 Countries

Slide 5

OVHcloud: 4 Universes of Products
● WebCloud
● Baremetal Cloud
● Public Cloud
● Hosted Private Cloud
[Product matrix: for each universe the slide lists its offerings, among them domain names and email, web hosting, VPS and dedicated servers, storage, databases, network, security, AI and big data, hybrid cloud and managed services]

Slide 6

OVHcloud & Poland
● The Klaba family comes from Poland
● OVHcloud data center in Warsaw
● OVHcloud office in Wroclaw

Slide 7

OVHcloud Managed Kubernetes You use it, we operate it

Slide 8

Built on top of our OpenStack-based Public Cloud

Slide 9

Some interesting features

Slide 10

Operating Kubernetes Easier said than done

Slide 11

Operating microservices? Are you sure you want to operate them by hand?

Slide 12

Taming microservices with Kubernetes

Slide 13

Declarative infrastructure

Slide 14

Desired State Management

Slide 15

Beyond a simple deployment Everything is good now, isn’t it?

Slide 16

Complex deployments

Slide 17

Complex deployments

Slide 18

Helm Charts are configuration Operating is more than installs & upgrades

Slide 19

Kubernetes is about automation How about automating human operators?

Slide 20

Kubernetes Controllers Keeping an eye on the resources

Slide 21

A control loop Controllers watch the state of the cluster, and make or request changes where needed

Slide 22

A reconcile loop Strives to reconcile current state and desired state
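
A minimal sketch of what such a reconcile loop could look like, written in plain Python rather than against the real Kubernetes API. The `State` type, the action names and the `max_iterations` guard are illustrative assumptions, not taken from the slides; the point is only that the loop compares desired and current state and acts on the difference until none remains.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    """Hypothetical state: just a replica count."""
    replicas: int


def reconcile(desired: State, current: State) -> List[str]:
    """Return the actions needed to move current state toward desired state."""
    diff = desired.replicas - current.replicas
    if diff > 0:
        return ["create-pod"] * diff
    if diff < 0:
        return ["delete-pod"] * (-diff)
    return []  # nothing to do: current state already matches desired state


def apply_actions(current: State, actions: List[str]) -> State:
    """Simulate the effect of the actions on the (in-memory) cluster state."""
    replicas = current.replicas
    for action in actions:
        replicas += 1 if action == "create-pod" else -1
    return State(replicas)


def control_loop(desired: State, current: State, max_iterations: int = 10) -> State:
    """Observe, diff, act, and repeat until the states converge."""
    for _ in range(max_iterations):
        actions = reconcile(desired, current)
        if not actions:
            break
        current = apply_actions(current, actions)
    return current


if __name__ == "__main__":
    print(control_loop(State(replicas=3), State(replicas=1)))
```

A real controller would read the current state from the API server on every pass instead of keeping it in memory, which is what makes the loop robust to changes made by anyone else.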

Slide 23

Custom Resource Definitions Extending Kubernetes API

Slide 24

Extending Kubernetes API By defining new types of resources
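
To make "defining new types of resources" concrete, here is a hedged sketch that builds a Custom Resource Definition manifest and one resource of that new type as Python dictionaries. The group `example.com`, the kind `Database` and the `replicas` field are hypothetical names chosen for illustration; only the overall `apiextensions.k8s.io/v1` layout comes from Kubernetes itself.

```python
import json
from typing import Any, Dict

# Hypothetical names, chosen for illustration only.
GROUP = "example.com"
VERSION = "v1alpha1"
KIND = "Database"
PLURAL = "databases"


def build_crd() -> Dict[str, Any]:
    """Build a CRD manifest that teaches the API server a new resource type."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # A CRD must be named <plural>.<group>.
        "metadata": {"name": f"{PLURAL}.{GROUP}"},
        "spec": {
            "group": GROUP,
            "scope": "Namespaced",
            "names": {"plural": PLURAL, "singular": "database", "kind": KIND},
            "versions": [{
                "name": VERSION,
                "served": True,
                "storage": True,
                "schema": {"openAPIV3Schema": {
                    "type": "object",
                    "properties": {"spec": {
                        "type": "object",
                        "properties": {"replicas": {"type": "integer", "minimum": 1}},
                        "required": ["replicas"],
                    }},
                }},
            }],
        },
    }


def build_resource(name: str, replicas: int) -> Dict[str, Any]:
    """Build one custom resource of the new type."""
    return {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": KIND,
        "metadata": {"name": name},
        "spec": {"replicas": replicas},
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(build_crd(), indent=2))
    print(json.dumps(build_resource("orders-db", 3), indent=2))
```

Once such a definition is registered, resources of the new kind can be created, listed and watched like built-in ones, but nothing acts on them until a controller does.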

Slide 25

Kubernetes Operator Automating operations

Slide 26

What’s a Kubernetes Operator?

Slide 27

Example: databases Things like adding an instance to a pool, doing a backup, sharding…

Slide 28

Knowledge encoded in CRDs and Controllers

Slide 29

Custom Controllers for Custom Resources Operators implement and manage Custom Resources using custom reconciliation logic
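
As a rough illustration of custom reconciliation logic, the sketch below encodes two pieces of database operations knowledge (growing or shrinking the instance pool, and taking a backup when one is due) in a single reconcile function. The `DatabaseSpec` and `DatabaseStatus` types, their fields and the step names are all hypothetical; a real operator would call the Kubernetes API and the database itself instead of returning strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatabaseSpec:
    """Desired state, as a user would declare it in a custom resource (hypothetical)."""
    instances: int
    backup_interval_s: int


@dataclass
class DatabaseStatus:
    """Observed state, as the controller would read it from the cluster (hypothetical)."""
    instances: int
    last_backup_s: float


def reconcile_database(spec: DatabaseSpec, status: DatabaseStatus, now_s: float) -> List[str]:
    """Return the operational steps a human operator would otherwise perform by hand."""
    steps: List[str] = []
    if status.instances < spec.instances:
        # Scale out: add the missing instances to the pool.
        steps.extend(f"add-instance-{i}" for i in range(status.instances, spec.instances))
    elif status.instances > spec.instances:
        # Scale in: drain and remove the highest-numbered instances first.
        steps.extend(
            f"drain-and-remove-instance-{i}"
            for i in range(status.instances - 1, spec.instances - 1, -1)
        )
    if now_s - status.last_backup_s >= spec.backup_interval_s:
        steps.append("run-backup")
    return steps


if __name__ == "__main__":
    spec = DatabaseSpec(instances=3, backup_interval_s=3600)
    status = DatabaseStatus(instances=1, last_backup_s=0.0)
    print(reconcile_database(spec, status, now_s=4000.0))
```

The function is deliberately idempotent: running it again after the steps have been carried out returns an empty list, which is the property a reconcile loop relies on.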

Slide 30

Operator Capability Model Gauging the operator maturity

Slide 31

How to write an Operator

Slide 32

Kubebuilder SDK for building Kubernetes APIs using CRDs

Slide 33

The Operator Framework Open source framework to accelerate the development of an Operator

Slide 34

Operator SDK Three different ways to build an Operator

Slide 35

Operator SDK and Capability Model

Slide 36

Operator Lifecycle Manager

Slide 37

OperatorHub.io

Slide 38

Harbor Operator Managing private registries at scale

Slide 39

We wanted to build a new product OVHcloud Managed Private Registry

Slide 40

Looking at the Open Source world Two main alternatives around Docker Registry

Slide 41

Harbor has more community traction Two main alternatives

Slide 42

Harbor has lots of components

Slide 43

But it has a Helm Chart It should be easy to install, shouldn’t it?
$ helm install harbor
What about configuration?
Installing a 200 GB K8s volume?
Nginx pods for routing requests?
One DB instance per customer?
Managing pods all around the cluster?

Slide 44

We wanted a Managed Private Registry

Slide 45

Using the platform Kubernetes tooling to the rescue

Slide 46

Let’s automate it We needed an operator… and there wasn’t any

Slide 47

Working with the community Harbor community also needed the operator

Slide 48

The challenge: reconciliation loop

Slide 49

The Harbor Operator

Slide 50

It’s Open Source https://github.com/goharbor/harbor-operator

Slide 51

LoadBalancer Operator A managed LoadBalancer at scale

Slide 52

Load Balancer: a critical cog Cornerstone of any Cloud Provider’s infrastructure

Slide 53

Our legacy Load Balancer stack
● Excellent performance
  ○ Built on bare metal servers + BGP
  ○ Custom made servers tuned for network traffic
● Carries the TLS termination
  ○ SSL / LetsEncrypt
● Not cloud ready
  ○ Piloted by configuration files
  ○ Long configuration loading time
● Custom made hardware
  ○ Slower to build
  ○ Needs to be deployed in 30 datacenters

Slide 54

Our needs for a new Load Balancer
● Supporting mass update
● Quickly reconfigurable
● Available anywhere quickly
● Easily operable
● Integrated into our Public Cloud

Slide 55

Building it on Kubernetes

Slide 56

A Load Balancer in a pod

Slide 57

Orchestrating one million LBs… kubectl apply -f lb is not an option!

Slide 58

We needed an Operator

Slide 59

Network: multus-cni Attaching multiple network interfaces to pods: Bridge + Host-local

Slide 60

Adding network interfaces on the fly Using annotations to add interfaces to a pod
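
A small sketch of the annotation mechanics, assuming the upstream Multus convention where the `k8s.v1.cni.cncf.io/networks` annotation carries a comma-separated list of network attachment names. The pod and network names below are hypothetical, and the slides do not say how OVHcloud's operator applies the change to running pods, so this only shows how the annotation value could be computed.

```python
import copy
from typing import Any, Dict, List

# Annotation read by Multus to attach additional interfaces to a pod.
NETWORKS_ANNOTATION = "k8s.v1.cni.cncf.io/networks"


def attach_networks(pod: Dict[str, Any], networks: List[str]) -> Dict[str, Any]:
    """Return a copy of the pod manifest with the extra networks requested."""
    patched = copy.deepcopy(pod)
    annotations = patched.setdefault("metadata", {}).setdefault("annotations", {})
    # Keep any networks already requested and skip duplicates.
    requested = [
        name.strip()
        for name in annotations.get(NETWORKS_ANNOTATION, "").split(",")
        if name.strip()
    ]
    for name in networks:
        if name not in requested:
            requested.append(name)
    annotations[NETWORKS_ANNOTATION] = ",".join(requested)
    return patched


if __name__ == "__main__":
    pod = {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "lb-0"}}
    print(attach_networks(pod, ["customer-a-bridge", "customer-b-bridge"]))
```

Each name in the list is expected to match a network attachment definition known to Multus, which is where the Bridge and Host-local settings would live.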

Slide 61

Config management Using ConfigMaps
How to detect a change in the ConfigMap files? Watch + Trigger?
More information on how ConfigMaps work: martensson.io/go-fsnotify-and-kubernetes-configmaps
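
One way to picture "watch + trigger" is the sketch below, which polls a mounted configuration directory, hashes its files and calls a callback when the content changes. Polling is a stand-in chosen to keep the example dependency-free; the linked article discusses file system notifications instead. The `ConfigWatcher` class and the `on_change` callback are hypothetical names.

```python
import hashlib
import os
from typing import Callable


def digest_dir(path: str) -> str:
    """Hash the names and contents of the regular files in a directory."""
    hasher = hashlib.sha256()
    for name in sorted(os.listdir(path)):
        # Kubernetes keeps its own bookkeeping entries (such as ..data) in
        # ConfigMap volumes; skip them and hash only the visible keys.
        if name.startswith(".."):
            continue
        full = os.path.join(path, name)
        if os.path.isfile(full):  # follows the symlinks used by ConfigMap volumes
            hasher.update(name.encode())
            hasher.update(b"\0")
            with open(full, "rb") as handle:
                hasher.update(handle.read())
            hasher.update(b"\0")
    return hasher.hexdigest()


class ConfigWatcher:
    """Detect content changes in a config directory and trigger a callback."""

    def __init__(self, path: str, on_change: Callable[[], None]) -> None:
        self.path = path
        self.on_change = on_change
        self._last = digest_dir(path)

    def poll(self) -> bool:
        """Check once; return True (and trigger) if the content changed."""
        current = digest_dir(self.path)
        if current == self._last:
            return False
        self._last = current
        self.on_change()
        return True
```

In use, `poll()` would be called on a timer, with `on_change` reloading the load balancer configuration. Hashing content rather than watching individual files sidesteps the fact that ConfigMap volumes are updated by swapping a symlink, not by rewriting the files in place.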

Slide 62

A Controller to watch and trigger

Slide 63

Observability
● Tried Prometheus Operator, limited to one container per pod
● Switched to Warp 10 with Beamium Operator

Slide 64

That’s all, folks! Thank you all!