From Containers to Kubernetes Operators

A presentation at DeveloperWeek Enterprise in November 2020 by Emanuil Tolev

Slide 1

From Containers to Kubernetes Operators
Emanuil Tolev @emanuil_tolev

Slide 2

Community Engineer

Slide 3

This is what we do

Slide 4

Agenda: Elastic’s journey through
• Docker images
• Helm Chart
• Kubernetes Operator

Slides PDF at the end

This talk covers a lot: introductions to a few less common concepts, the lessons that a few dozen people learned over 8+ years, and a bit of practical YAML to help you digest the abstract material.

Slide 5

Docker’s great but not without issues.

Slide 6

Containers are the new ZIP format to distribute software

Slide 7

One of many…
• RPM, DEB, TAR.GZ, MSI
• Ansible, Chef, Puppet

We don’t really care which one you use; any of those will work. But if you insist on containers, let’s see what is in there.

Slide 8

Fallacy: :latest
https://github.com/elastic/elasticsearch-docker/issues/75

Slide 9

Slide 10

No :latest, what about :7 and :7.7?
Exact only: 7.7.1

With a floating tag, nodes pulled at different times can end up on different versions. Your cluster is then stuck on the lowest version, so there is no gain: just the bugs of all versions and the potential for bad combinations. A shard created on one node may not be able to move to other nodes, resulting in eventual cluster imbalance.
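
The "exact only" rule can be checked mechanically. The following is a minimal sketch, assuming that an exact version always has the plain MAJOR.MINOR.PATCH shape; the helper name `is_pinned` is made up for this illustration and is not part of any Elastic or Docker tooling.

```python
import re

# Assumption: an exact version looks like MAJOR.MINOR.PATCH, e.g. 7.7.1.
EXACT_TAG = re.compile(r"\d+\.\d+\.\d+")


def is_pinned(image: str) -> bool:
    """Return True only if the image reference ends in an exact x.y.z tag."""
    # Split on the last colon. If the part after it contains a slash, the
    # colon belonged to a registry port (localhost:5000/...), not to a tag.
    _name, sep, tag = image.rpartition(":")
    if not sep or "/" in tag:
        return False  # no tag at all, which Docker resolves to :latest
    return EXACT_TAG.fullmatch(tag) is not None


base = "docker.elastic.co/elasticsearch/elasticsearch"
for ref in [base, base + ":latest", base + ":7", base + ":7.7", base + ":7.7.1"]:
    print(ref, "pinned" if is_pinned(ref) else "floating")
```

A check like this could run in CI against the image references in your manifests, so that a floating tag is caught before it reaches a cluster.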

Slide 11

The base image diversity and size debate

Only say: “Some people like to use different base images for each component to try to reduce size and fulfil other requirements. We did that too, but ended up with one common base Linux image in the end.”

Slide 12

Common base image since 5.4+
• CentOS 7
• Similar setup
• Shared layers: ultimately much better for size across components

Slide 13

Slide 14

Kubernetes is the answer. What was the question?
https://twitter.com/charlesfitz/status/1068203930683752448

Slide 15

This one is almost synonymous with Kubernetes: YAML

Slide 16

…lots of it

At some point we started to feel more like YAML engineers than software engineers.

Slide 17

Fun with YAML
http://www.yamllint.com

ports:
  - 80:80
  - 20:20

Imagine you want to deploy a service. You read the Docker documentation and see this example of opening some ports to the outside. Can anybody spot the problem?

(20*60^1 + 20*60^0) * 60 = 1220 * 60

Slide 18

Fun with YAML
https://docs.docker.com/compose/compose-file/#short-syntax-1

ports:
  - "80:80"
  - 73200

(20*60^1 + 20*60^0) * 60 = 1220 * 60 = 73200

Always quote Docker port mappings. A mapping with a container port lower than 60 will be evaluated as a base-60 number.
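
To make the base-60 behaviour concrete, here is a minimal sketch of the YAML 1.1 integer rule, written by hand rather than taken from any YAML library. The function name `yaml11_plain_scalar` is made up for this example, and the pattern is simplified (no sign, no underscores). Under this rule `20:20` resolves to 1220; the 73200 shown on the slide is that value multiplied by 60, matching the arithmetic in the notes.

```python
import re

# YAML 1.1 base-60 integers: a leading number, then one or more ":NN" groups
# where every group is in the range 0-59 (simplified for this sketch).
SEXAGESIMAL = re.compile(r"[1-9][0-9]*(:[0-5]?[0-9])+")


def yaml11_plain_scalar(text: str):
    """Resolve an unquoted scalar according to the base-60 integer rule."""
    if SEXAGESIMAL.fullmatch(text):
        value = 0
        for group in text.split(":"):
            value = value * 60 + int(group)
        return value
    return text  # everything else stays a string in this sketch


print(repr(yaml11_plain_scalar("80:80")))  # 80 is not in 0-59, so it stays a string
print(repr(yaml11_plain_scalar("20:20")))  # 20*60 + 20
print(yaml11_plain_scalar("20:20") * 60)   # the value shown on the slide
```

Quoting the mapping, as in "20:20", sidesteps the question entirely because a quoted scalar is always a string.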

Slide 19

Helm Chart
Advanced package management with support for templating and more complex resources

Slide 20

Building on existing Kubernetes primitives like StatefulSet, Service, Deployment,…

Slide 21

Elastic Helm Charts
Elasticsearch, Kibana, Filebeat, Metricbeat, APM Server, Logstash
https://github.com/elastic/helm-charts

Slide 22

Tested on GKE
• Default storage pd-ssd (network attached)
• Kubernetes >=1.10 supports Local PersistentVolumes for increased performance

Slide 23

Un-Opinionated
• Expose environment variables & mount secrets
• Multiple upgrade strategies

Doing this makes it much easier for this chart to support multiple versions with minimal changes.

Slide 24

Minikube Example
https://github.com/elastic/helm-charts/tree/master/elasticsearch/examples/minikube

helm repo add elastic https://helm.elastic.co
helm install --name elasticsearch elastic/elasticsearch [--set imageTag=7.7.1]

minikube addons enable default-storageclass
minikube addons enable storage-provisioner
cd examples/minikube
make

Won’t have time to dig into this.

Slide 25

---
# Permit co-located instances for solitary minikube virtual machines
antiAffinity: "soft"

# Shrink default JVM heap
esJavaOpts: "-Xmx128m -Xms128m"

# Allocate smaller chunks of memory per pod
resources:
  requests:
    cpu: "100m"
    memory: "512M"
  limits:
    cpu: "1000m"
    memory: "512M"

# Request smaller persistent volumes
volumeClaimTemplate:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: "standard"
  resources:
    requests:
      storage: 100M

https://github.com/elastic/helm-charts/tree/master/elasticsearch/examples/minikube/values.yaml

Slide 26

Kubernetes Operator pattern
Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop.

Mention the difference: an operator has a runtime component, covers the full lifecycle, and can do anything.

Slide 27

Custom Resource (CR)
• CRD == type definition (class)
• CR == instance (object)

Class and object in the sense of object orientation; maybe not exactly the same, but close.
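
The class and object comparison can be written down directly. This is only an analogy in Python, not how Kubernetes stores anything: the class name, its fields, and the validation rule are all invented for the illustration.

```python
from dataclasses import dataclass


# Plays the role of the CRD: it defines the type and the fields it accepts.
@dataclass
class Elasticsearch:
    name: str
    version: str
    node_count: int = 1  # hypothetical field for this analogy

    def __post_init__(self):
        # A CRD can carry a validation schema; this check stands in for one.
        if self.node_count < 1:
            raise ValueError("node_count must be at least 1")


# Plays the role of a CR: one concrete instance of that type.
sample = Elasticsearch(name="elasticsearch-sample", version="7.7.1")
print(sample)
```

Where the analogy stops: a Python object does something on its own, while a CR is only data until a controller acts on it.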

Slide 28

Custom Resource Definition (CRD)
• Think: Elasticsearch, Kibana, APM
• Contrast: Built-in resources like Pods, Services, Secrets, StatefulSets,…

Slide 29

Custom Controller
• Brings CRDs to “life” (reconciliation loop)
• Upgrades, secrets, certificate management,…

Cost of the custom controller: it is custom software that is expensive to develop and maintain, has its own lifecycle, needs rollout,… Reconciliation is stateless; the controller can crash at any time (and restart).
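
A reconciliation loop can be sketched in a few lines. This is a toy model under stated assumptions, not the Elastic operator's code: state is a plain dictionary, and the names `reconcile` and `carry_out` are invented here. What it tries to show is the stateless part: each pass recomputes the remaining work from desired and observed state alone, so a crash between passes loses nothing.

```python
def reconcile(desired: dict, observed: dict) -> list:
    """Compare desired and observed state and return the actions to take.

    Nothing is remembered between calls, so the function can be re-run after
    a crash and will work out the remaining steps from scratch.
    """
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name, spec))
        elif observed[name] != spec:
            actions.append(("update", name, spec))
    for name in observed:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions


def carry_out(actions: list, observed: dict) -> None:
    """Pretend to execute the actions by mutating the observed state."""
    for verb, name, spec in actions:
        if verb == "delete":
            del observed[name]
        else:
            observed[name] = spec


# Hypothetical cluster names and specs, for illustration only.
desired = {"elasticsearch-sample": {"version": "7.7.1", "count": 3}}
observed = {
    "elasticsearch-sample": {"version": "7.7.0", "count": 3},
    "old-cluster": {"version": "6.8.0", "count": 1},
}

while True:
    actions = reconcile(desired, observed)
    if not actions:
        break  # observed state now matches desired state
    carry_out(actions, observed)

print(observed)
```

A real controller reacts to watch events instead of spinning in a loop, and its actions take time and can fail, which is exactly why recomputing from scratch on every pass is useful.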

Slide 30

BTW, I try to avoid ECK as a name because it only confuses people. Everybody talks about operators, and we also want to keep this fairly generic rather than an ECK-only pitch.

Slide 31

Elastic Operator
Elasticsearch, Kibana, APM Server
https://github.com/elastic/cloud-on-k8s

Slide 32

• Golang 1.13
• Kubebuilder 2: SDK for building Kubernetes APIs using CRDs
• Kustomize: Generate patched CRDs for specific flavors

Kustomize, trivial-versions: no apiserver-side validation at all. Required for K8s <= 1.13.

Slide 33

Emphasise the integers: 1, 2, 3, 4. It’s a diagram, so make it exciting and emphasise the start of each point.

Missing the APM server, but it’s the same.

TODO: many more

Slide 34

Opinionated
• Encode best practices & operational knowledge
• Built-in certificate management, security,…

Slide 35

Example Opinions
• Scale down: Drain nodes first
• Upgrade: Disable shard allocation

Not sure if all indices are replicated, hence draining first; this is different from Helm. Delete the pod, and it will be recreated by the StatefulSet controller with the latest config.
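
Both opinions map onto Elasticsearch cluster settings. The sketch below only builds the request bodies; the two setting names are real Elasticsearch settings, but whether the operator issues exactly these calls is an assumption, and the function names and node name are made up for the example.

```python
import json


def drain_node_settings(node_name: str) -> dict:
    """Settings body asking Elasticsearch to move shards off one node."""
    return {"transient": {"cluster.routing.allocation.exclude._name": node_name}}


def disable_allocation_settings() -> dict:
    """Settings body that limits shard allocation to primaries."""
    return {"persistent": {"cluster.routing.allocation.enable": "primaries"}}


# Either body would be sent as: PUT _cluster/settings
# The node name below is hypothetical.
print(json.dumps(drain_node_settings("elasticsearch-sample-es-default-2"), indent=2))
print(json.dumps(disable_allocation_settings(), indent=2))
```

Draining first means a node is only removed once its shards live elsewhere, which matters when some indices have no replicas.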

Slide 36

You Can Still Shoot Yourself in the Foot
Configure 0 replicas and do an upgrade, for example

Slide 37

Running on Minikube

minikube config set memory 16384
minikube config set cpus 4
minikube start

Requires non-trivial resources, even just locally with minikube.

Slide 38

Running on Minikube

# Set up the entire operator: configs, deployment practices, monitoring, in one command
kubectl apply -f https://download.elastic.co/downloads/eck/1.1.2/all-in-one.yaml

# Monitor logs
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator

# And this is where you come in - the configs you write
kubectl apply -f apm_es_kibana.yaml

The following 3 code segments make up apm_es_kibana.yaml.

Slide 39

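---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 7.7.1
  # The v1 API uses nodeSets/count (nodes/nodeCount were the v1alpha1 names)
  nodeSets:
  - name: default  # nodeSets entries need a name; "default" is an arbitrary choice
    count: 1
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 2Gi
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data  # name of the data volume under the v1 API
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Gi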

Slide 40

---
apiVersion: apm.k8s.elastic.co/v1
kind: ApmServer
metadata:
  name: apm-server-sample
spec:
  version: 7.7.1
  count: 1  # the v1 API uses count (nodeCount was the v1alpha1 name)
  elasticsearchRef:
    name: "elasticsearch-sample"

Slide 41

---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-sample
spec:
  version: 7.7.1
  count: 1  # the v1 API uses count (nodeCount was the v1alpha1 name)
  elasticsearchRef:
    name: "elasticsearch-sample"

Slide 42

Running on Minikube

# Check status
kubectl get elasticsearch,kibana,apmserver

# Expose Kibana
kubectl port-forward service/kibana-sample-kb-http 5601

# Get the credentials
echo $(kubectl get secret elasticsearch-sample-es-elastic-user -o=jsonpath='{.data.elastic}' | base64 --decode)

Security on, self-generated certificates, default service exposure.
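
The credentials step works because Kubernetes stores secret values base64-encoded under `.data`. As a small sketch of what the `jsonpath` and `base64 --decode` pipeline does, the snippet below decodes the same field from a JSON document. The secret content and the password in it are fabricated for the example, and `elastic_password` is a made-up helper name.

```python
import base64
import json

# Hypothetical, trimmed stand-in for `kubectl get secret ... -o json` output.
# The password is invented for this example.
secret_json = json.dumps({
    "kind": "Secret",
    "data": {"elastic": base64.b64encode(b"s3cr3t-example").decode("ascii")},
})


def elastic_password(raw: str) -> str:
    """Decode the base64 value stored under .data.elastic."""
    secret = json.loads(raw)
    return base64.b64decode(secret["data"]["elastic"]).decode("utf-8")


print(elastic_password(secret_json))
```

Base64 is an encoding, not encryption, so anyone who can read the secret can read the password.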

Slide 43

Changes
Instance size / number, version,…

kubectl apply -f apm_es_kibana.yaml

Slide 44

Support
• GKE (Google Cloud)
• EKS (AWS)
• AKS (Azure)
• OpenShift (Red Hat)

Slide 45

StatefulSets
• Rolling upgrades with volume reuse
• “Standard” way to run stateful workloads: stable network ID, stable data volume that is re-attachable during rolling upgrades

Each logical group of Elasticsearch nodes (master, data,…) maps to a StatefulSet.

Slide 46

Deployment
CRDs require cluster admin level permissions to install

Slide 47

Global Namespace

• User installs ECK at global level
• ECK checks all namespaces for new ES / Kibana / APM Server objects
• Does not scale well with the number of clusters
• Requires elevated permissions on the cluster

Slide 48

Single Namespace

• User installs ECK in specific namespaces
• ECK only checks for definitions of ES / KB / APM Server in each namespace
• Does not play well with cross-namespace features (a single enterprise license pool for multiple clusters in multiple namespaces, cross-cluster search and replication on clusters across namespaces)
• Deploying 5 clusters in 5 different namespaces requires 5 running operators, where a single one would technically have been enough

Slide 49

Other Operators: MongoDB, Kafka, Redis, CockroachDB,…
Operator “Marketplaces” like https://operatorhub.io/

Slide 50

Conclusion

Slide 51

“Containers are disrupting the industry!”

With all the bad practices (root, :latest, yolo deploy), this is not surprising.

Slide 52

“Can I run Elasticsearch on Docker or Kubernetes?”

Yes

Slide 53

“Should I run Elasticsearch on Docker or Kubernetes?”

It depends, but if you’re unsure about Docker: no

Slide 54

Effective collaboration and solving production problems
Remember why you’re doing all this

It’s OK to try new stuff, but make sure it actually helps you be a more effective business and (ultimately) leaves people happier after some onboarding pain. A bunch of VMs set up with Ansible or Chef may be just the right thing for you at the stage you’re at, with the team skills you have. Do some prototype work, investigate, identify exactly which problems k8s will solve, commit and put in the effort, and then do retrospectives on whether it did solve them.

Slide 55

Helm Charts vs Operator

Un-opinionated vs opinionated. The first is good if you want to run many services, potentially in a similar fashion. The second is more specialized, with the good and the bad parts of that.

Slide 56

Where to next?
• Deeper look at the operator: https://www.elastic.co/blog/introducing-elastic-cloud-on-kubernetes-the-elasticsearch-operator-and-beyond
• The source code: https://github.com/elastic/cloud-on-k8s
• The slides: https://noti.st/emanuil-tolev/CqvknF/from-containers-to-kubernetes-operators

Slide 57

Elastic interest?
• community.elastic.co
• Free lunch info sessions, ping me

Slide 58

Questions & Discussion
@emanuil_tolev
etolev@elastic.co

Next: https://www.elastic.co/blog/introducing-elastic-cloud-on-kubernetes-the-elasticsearch-operator-and-beyond

ECE features that do not (some may never) exist in ECK:
• Rollback on plan failure
• Snapshot management
• Index curation
• RBAC access
• Custom plugins upload
• API + UI to manage deployments
• Deployment templates
• IP filtering
• Built-in logs & metrics
• Support for ES <v6