From Containers to Kubernetes Operators

A presentation at CNCF Birmingham August 2020 Meetup in August 2020 in Birmingham, AL, USA by Emanuil Tolev

Slide 1

From Containers to Kubernetes Operators
Emanuil Tolev
@emanuil_tolev

Slide 2

Community Engineer

Slide 3

This is what we do

Slide 4

Agenda - Elastic’s journey through
• Docker images
• Helm Chart
• Kubernetes Operator

Slides PDF at the end.

This talk covers a lot: introductions to a few less common concepts, the lessons that a few dozen people learned over 8+ years, and a bit of practical YAML to help you really digest the abstract stuff.

Slide 5


Slide 6

Slide 7

Containers are the new ZIP format to distribute software

Slide 8

One of many…
RPM, DEB, TAR.GZ, MSI
Ansible, Chef, Puppet

We don’t really care. Any of those will work. But if you insist on containers, let’s see what is in there.

Slide 9

…but not without issues

Slide 10

Fallacy
root and chmod 777

Slide 11

https://twitter.com/waxzce/status/1151874532686422017

Slide 12

The container runs Elasticsearch as user elasticsearch using uid:gid 1000:0.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html

Slide 13

https://github.com/elastic/elasticsearch-docker/issues/32

Slide 14

Slide 15

Slide 16

Fallacy
:latest
https://github.com/elastic/elasticsearch-docker/issues/75

Slide 17

Slide 18

No :latest, what about :7 and :7.7?
Exact only: 7.7.1

With a floating tag your cluster is effectively stuck on the lowest version any node runs, so there is no gain: just the bugs of all versions and the potential for bad combinations.
Creating a shard on one node may not allow it to move to other nodes, resulting in eventual cluster imbalance.
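The “exact only” rule is easy to check mechanically. Here is a minimal sketch, assuming image references of the usual name:tag form; the helper name is hypothetical, and treating only three-part x.y.z tags as exact is an assumption of this sketch rather than anything Docker enforces.

```python
import re

# Assumption for this sketch: an exact tag has three numeric parts, e.g. 7.7.1.
_EXACT_TAG = re.compile(r"^\d+\.\d+\.\d+$")


def is_pinned(image: str) -> bool:
    """Return True only if the image reference carries an exact x.y.z tag."""
    last_segment = image.rsplit("/", 1)[-1]  # skip any registry host:port prefix
    if ":" not in last_segment:
        return False  # no tag at all means an implicit :latest
    tag = last_segment.rsplit(":", 1)[1]
    return bool(_EXACT_TAG.match(tag))


if __name__ == "__main__":
    for ref in ("elasticsearch:latest", "elasticsearch:7",
                "elasticsearch:7.7", "elasticsearch:7.7.1"):
        print(ref, "->", "pinned" if is_pinned(ref) else "floating")
```

A check like this could run in CI against your manifests, so a floating tag never reaches a cluster in the first place.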

Slide 19

The base image diversity and size debate

Only say: “Some people like to use different base images for each component to try to reduce size and fulfil other requirements. We did that too, but ended up with one common base Linux image in the end.”

Slide 20

Common base image since 5.4+
CentOS 7
Similar setup
Shared layers - ultimately much better for size across components

Slide 21


Slide 22

“Kubernetes is the answer. What was the question?”
https://twitter.com/charlesfitz/status/1068203930683752448

Slide 23

This one is almost synonymous with Kubernetes: YAML

Slide 24

…lots of it

At some point we started to feel more like YAML engineers than software engineers.

Slide 25

Fun with YAML
http://www.yamllint.com

ports:
  - 80:80
  - 20:20    # 20 * 60^1 + 20 * 60^0 = 1220
  - 3:54     # 3 * 60 + 54 == 234

Imagine you want to deploy a service. You read the Docker documentation and see this example of opening some ports to the outside. Can anybody spot the problem?

Slide 26

Fun with YAML
https://docs.docker.com/compose/compose-file/#short-syntax-1

ports:
  - "80:80"
  - 73200

Always quote Docker port mappings. An unquoted mapping with a container port lower than 60 is evaluated as a base-60 number.
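To see why quoting matters, here is a small sketch of how a YAML 1.1 parser resolves an unquoted scalar that looks like a base-60 integer. It reimplements only that one rule from the YAML 1.1 integer type (it is not a YAML parser), and the function name is hypothetical.

```python
import re

# YAML 1.1 integer pattern for sexagesimal (base-60) values, e.g. 1:30 == 90.
# Every group after a colon must be in the range 0..59.
_SEXAGESIMAL = re.compile(r"^[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+$")


def resolve_unquoted(scalar: str):
    """Return an int if YAML 1.1 would read the scalar as base-60, else the string."""
    if not _SEXAGESIMAL.match(scalar):
        return scalar
    sign = -1 if scalar.startswith("-") else 1
    value = 0
    for part in scalar.lstrip("+-").replace("_", "").split(":"):
        value = value * 60 + int(part)
    return sign * value


if __name__ == "__main__":
    for mapping in ("80:80", "20:20", "20:20:00"):
        print(repr(mapping), "->", repr(resolve_unquoted(mapping)))
```

80:80 stays a string only because 80 is not a valid base-60 digit group. One input that resolves to 73200 is 20:20:00. Quoting the mapping sidesteps the rule entirely.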

Slide 27

Helm Chart
Advanced package management with support for templating and more complex resources

Slide 28

Building on existing Kubernetes primitives like StatefulSet, Service, Deployment,…

Slide 29

Elastic Helm Charts
Elasticsearch, Kibana, Filebeat, Metricbeat, APM Server, Logstash
https://github.com/elastic/helm-charts

Slide 30

Tested on GKE
Default storage pd-ssd (network attached)
Kubernetes >=1.10 supports Local PersistentVolumes for increased performance

Slide 31

Un-Opinionated
Expose environment variables & mount secrets
Multiple upgrade strategies

Doing this makes it much easier for this chart to support multiple versions with minimal changes.

Slide 32

Minikube Example
https://github.com/elastic/helm-charts/tree/master/elasticsearch/examples/minikube

helm repo add elastic https://helm.elastic.co
helm install --name elasticsearch elastic/elasticsearch [--set imageTag=7.7.1]

minikube addons enable default-storageclass
minikube addons enable storage-provisioner
cd examples/minikube
make

Slide 33

---
# Permit co-located instances for solitary minikube virtual machines
antiAffinity: "soft"

# Shrink default JVM heap
esJavaOpts: "-Xmx128m -Xms128m"

# Allocate smaller chunks of memory per pod
resources:
  requests:
    cpu: "100m"
    memory: "512M"
  limits:
    cpu: "1000m"
    memory: "512M"

# Request smaller persistent volumes
volumeClaimTemplate:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: "standard"
  resources:
    requests:
      storage: 100M

https://github.com/elastic/helm-charts/tree/master/elasticsearch/examples/minikube/values.yaml

Slide 34

Kubernetes Operator pattern
Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop.

Mention the difference from a Helm chart: an operator has a runtime component, covers the full lifecycle, and can do anything.

Slide 35

Custom Resource (CR)
CRD == type definition (class)
CR == instance (object)

Class and object in the sense of object orientation; maybe not exactly the same, but close.

Slide 36

Custom Resource Definition (CRD)
Think: Elasticsearch, Kibana, APM
Contrast: Built-in resources like Pods, Services, Secrets, StatefulSets,…

Slide 37

Custom Controller
Brings CRDs to “life” (reconciliation loop)
Upgrades, secrets, certificate management,…

Cost of the custom controller: it is custom software that is expensive to develop and maintain, has its own lifecycle, needs rollout,…
Reconciliation is stateless, so the controller can crash at any time (and restart).
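The reconciliation idea fits in a few lines. This is a deliberately tiny sketch, not how the Elastic operator is implemented: DesiredState is a hypothetical stand-in for a custom resource’s spec, and the “actions” are just strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DesiredState:
    """Hypothetical, heavily simplified stand-in for the spec of a custom resource."""
    version: str
    node_count: int


def reconcile(desired: DesiredState, running_versions: List[str]) -> List[str]:
    """Run one reconciliation pass and return the actions it would take.

    Nothing is remembered between calls: the plan is derived only from the
    desired state and what is observed right now, so a controller that
    crashes and restarts recomputes the same plan.
    """
    kept = running_versions[: desired.node_count]
    # Surplus nodes go first, then outdated ones are upgraded, then gaps are filled.
    actions = ["remove node"] * (len(running_versions) - len(kept))
    actions += [f"upgrade node to {desired.version}" for v in kept if v != desired.version]
    actions += [f"create node at {desired.version}"] * (desired.node_count - len(kept))
    return actions


if __name__ == "__main__":
    want = DesiredState(version="7.7.1", node_count=3)
    print(reconcile(want, ["7.7.0", "7.7.1"]))
    print(reconcile(want, ["7.7.1", "7.7.1", "7.7.1"]))  # nothing left to do
```

A real controller calls something like this in a loop, triggered by changes to the custom resource or the objects it owns, until the returned list is empty.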

Slide 38

BTW I try to avoid ECK as a name because it only confuses people. Everybody talks about operators, and we also want to keep this kind of generic and not only an ECK pitch.

Slide 39

Elastic Operator
Elasticsearch, Kibana, APM Server
https://github.com/elastic/cloud-on-k8s

Slide 40

Golang 1.13
Kubebuilder 2: SDK for building Kubernetes APIs using CRDs
Kustomize: Generate patched CRDs for specific flavors

Kustomize: the trivial-versions flavor has no apiserver-side validation at all. Required for K8s <= 1.13.

Slide 41

Emphasise the integers: 1, 2, 3, 4. It’s a diagram, make it exciting and emphasise the start of each point.
Missing the APM server, but it’s the same.
TODO: many more

Slide 42

Opinionated
Encode best practices & operational knowledge
Built-in certificate management, security,…

Slide 43

Example Opinions
Scale down: Drain nodes first
Upgrade: Disable shard allocation

Scale down: we cannot be sure that all indices are replicated, so nodes are drained first. This is different from Helm.
Upgrade: delete the pod; it will be recreated by the StatefulSet controller with the latest config.
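These two opinions are really about ordering, which a sketch makes visible. The function names and step strings are hypothetical. The “wait” and “re-enable” steps are not on the slide; they are an assumption based on the usual Elasticsearch rolling-upgrade procedure.

```python
from typing import List


def scale_down_plan(pods: List[str], target_count: int) -> List[str]:
    """Drain each surplus pod before removing it, highest ordinal first."""
    steps = []
    for pod in reversed(pods[target_count:]):
        steps.append(f"migrate shards away from {pod}")
        steps.append(f"remove {pod}")
    return steps


def rolling_upgrade_plan(pods: List[str]) -> List[str]:
    """Upgrade one pod at a time with shard allocation disabled around each restart."""
    steps = []
    for pod in pods:
        steps.append("disable shard allocation")
        # The StatefulSet controller recreates the deleted pod with the latest config.
        steps.append(f"delete {pod}")
        steps.append(f"wait for {pod} to rejoin the cluster")  # assumption, not on the slide
        steps.append("re-enable shard allocation")  # assumption, not on the slide
    return steps


if __name__ == "__main__":
    pods = ["es-0", "es-1", "es-2"]  # hypothetical pod names
    print(scale_down_plan(pods, target_count=1))
    print(rolling_upgrade_plan(pods))
```

The point of the opinionated approach is that nobody has to remember this order at 3 a.m.; it is encoded once.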

Slide 44

You Can Still Shoot Yourself in the Foot
Configure 0 replicas and do an upgrade, for example

Slide 45

Running on Minikube

minikube config set memory 16384
minikube config set cpus 4
minikube start

Requires non-trivial resources even just locally with minikube.

Slide 46

Running on Minikube

# Set up the entire operator: configs, deployment practices, monitoring, in one command
kubectl apply -f https://download.elastic.co/downloads/eck/1.1.2/all-in-one.yaml

# Monitor logs
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator

# And this is where you come in - the configs you write
kubectl apply -f apm_es_kibana.yaml

The following 3 code segments make up apm_es_kibana.yaml.

Slide 47

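---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 7.7.1
  # The v1 API calls this list nodeSets, with a required name and a count
  # per entry (earlier alpha versions used nodes and nodeCount).
  nodeSets:
    - name: default   # any name works; "default" is an arbitrary choice
      count: 1
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              resources:
                limits:
                  memory: 2Gi
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data   # claim name expected by the v1 API
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 2Gi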

Slide 48

---
apiVersion: apm.k8s.elastic.co/v1
kind: ApmServer
metadata:
  name: apm-server-sample
spec:
  version: 7.7.1
  count: 1   # the v1 API field; named nodeCount in earlier alpha versions
  elasticsearchRef:
    name: "elasticsearch-sample"

Slide 49

---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-sample
spec:
  version: 7.7.1
  count: 1   # the v1 API field; named nodeCount in earlier alpha versions
  elasticsearchRef:
    name: "elasticsearch-sample"

Slide 50

Running on Minikube

# Check status
kubectl get elasticsearch,kibana,apmserver

# Expose Kibana
kubectl port-forward service/kibana-sample-kb-http 5601

# Get the credentials
echo $(kubectl get secret elasticsearch-sample-es-elastic-user -o=jsonpath='{.data.elastic}' | base64 --decode)

Security on, self-generated certificates, default service exposure.
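The base64 step exists because Kubernetes stores every value under a Secret’s data field base64-encoded. If you would rather do the decoding in a script, here is a minimal sketch that works on the JSON you get from kubectl get secret ... -o json. The sample secret and its password are made up for illustration.

```python
import base64
import json
from typing import Dict


def decode_secret_data(secret_json: str) -> Dict[str, str]:
    """Decode every base64 value under .data of a Kubernetes Secret given as JSON."""
    secret = json.loads(secret_json)
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret.get("data", {}).items()
    }


if __name__ == "__main__":
    # Made-up stand-in for the output of `kubectl get secret <name> -o json`.
    sample = json.dumps({
        "kind": "Secret",
        "data": {"elastic": base64.b64encode(b"not-a-real-password").decode("ascii")},
    })
    print(decode_secret_data(sample))
```

Remember that base64 is an encoding, not encryption: anyone who can read the Secret can read the password.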

Slide 51

Changes
Instance size / number, version,…

kubectl apply -f apm_es_kibana.yaml

Slide 52

Support
GKE (Google Cloud)
EKS (AWS)
AKS (Azure)
OpenShift (Red Hat)

Slide 53

StatefulSets
Rolling Upgrades with Volume reuse

“Standard” way to run stateful workloads: stable network ID, stable data volume that is re-attachable during rolling upgrades.
Each logical group of Elasticsearch nodes (master, data,…) maps to a StatefulSet.
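“Stable” here comes down to naming: a StatefulSet numbers its pods from 0, and each pod’s volume claim is named after the claim template and the pod. A small sketch of that naming scheme; the StatefulSet and template names are hypothetical and not the ones the operator generates.

```python
from typing import List


def pod_names(statefulset: str, replicas: int) -> List[str]:
    """Pods of a StatefulSet are named <statefulset>-<ordinal>, starting at 0."""
    return [f"{statefulset}-{ordinal}" for ordinal in range(replicas)]


def claim_name(claim_template: str, pod: str) -> str:
    """Each pod's PersistentVolumeClaim is named <claim template>-<pod name>."""
    return f"{claim_template}-{pod}"


if __name__ == "__main__":
    # One StatefulSet per logical group of nodes (hypothetical names and sizes).
    groups = {"es-master": 3, "es-data": 2}
    for statefulset, replicas in groups.items():
        for pod in pod_names(statefulset, replicas):
            print(pod, "->", claim_name("data", pod))
```

Because a recreated pod gets the same name, it also finds the same claim again, which is what makes volume reuse during a rolling upgrade possible.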

Slide 54

Deployment
CRDs require cluster admin level permissions to install
Privileged Containers: Elasticsearch host kernel settings like vm.max_map_count

vm.max_map_count: disable the privileged container if the setting is already configured correctly on your hosts. It is probably being removed soon because it’s a bad practice.

Slide 55

Global Namespace

• User installs ECK at global level
• ECK checks all namespaces for new ES / Kibana / APM Server objects
• Does not scale well with the number of clusters
• Requires elevated permissions on the cluster

Slide 56

Single Namespace

• User installs ECK in specific namespaces
• ECK only checks for definitions of ES / KB / APM Server in each namespace
• Does not play well with cross-namespace features (a single enterprise license pool for multiple clusters in multiple namespaces, cross-cluster search and replication on clusters across namespaces)
• Deploying 5 clusters in 5 different namespaces requires 5 running operators, where a single one would technically have been enough
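The trade-off between the two modes is mostly arithmetic. A toy sketch, with hypothetical mode names and namespaces, of how many operator instances each mode needs:

```python
from typing import Iterable


def operators_required(cluster_namespaces: Iterable[str], mode: str) -> int:
    """Count operator instances needed for clusters living in the given namespaces."""
    if mode == "global":
        return 1  # one operator watches every namespace
    if mode == "single":
        return len(set(cluster_namespaces))  # one operator per namespace in use
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    namespaces = ["team-a", "team-b", "team-c", "team-d", "team-e"]
    for mode in ("global", "single"):
        print(mode, "->", operators_required(namespaces, mode))
```

The global mode wins on count but pays for it with elevated permissions; the single-namespace mode is the reverse.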

Slide 57

Other Operators: MongoDB, Kafka, Redis, CockroachDB,…
Operator “Marketplaces” like https://operatorhub.io/

Slide 58

Conclusion

Slide 59

“Containers are disrupting the industry!”

With all the bad practices (root, :latest, yolo deploy), this is not surprising.

Slide 60

“Can I run Elasticsearch on Docker or Kubernetes?”

Yes

Slide 61

“Should I run Elasticsearch on Docker or Kubernetes?”

It depends, but if you’re unsure about Docker: no

Slide 62

Effective collaboration and solving production problems
Remember why you’re doing all this

It’s OK to try new stuff, but be careful that it’s actually helping you be a more effective business and (ultimately) leaves people happier after some onboarding pain. A bunch of VMs set up with Ansible or Chef may be just the right thing for you at the stage you’re at with the team skills you have. Do some prototype work, investigate, identify exactly which problems k8s will solve, commit and put in the effort, and then do retrospectives on whether it did solve them.

Slide 63

Helm Charts vs Operator

Un-opinionated vs opinionated. The first is good if you want to run many services, potentially in a similar fashion. The second is more specialized, with the good and the bad parts of that.
If ECK is Cloud-on-Kubernetes, you can think of the Helm charts as Stack-on-Kubernetes. They’re a way of defining exact specifications for running our Docker images in a Kubernetes environment.
Finally, you could use Helm to bootstrap the Operator (this is an idea right now and still up for discussion).

Slide 64

Where to next?
• Deeper look at the operator: https://www.elastic.co/blog/introducing-elastic-cloud-on-kubernetes-the-elasticsearch-operator-and-beyond
• The source code: https://github.com/elastic/cloud-on-k8s
• The slides: https://noti.st/emanuil-tolev/CqvknF/from-containers-to-kubernetes-operators

Slide 65

Elastic interest?
community.elastic.co
Free lunch info sessions: ping me @emanuil_tolev

Slide 66

Questions & Discussion
@emanuil_tolev
etolev@elastic.co

Next: https://www.elastic.co/blog/introducing-elastic-cloud-on-kubernetes-the-elasticsearch-operator-and-beyond

ECE features that do not (some may never) exist in ECK:
• Rollback on plan failure
• Snapshot management
• Index curation
• RBAC access
• Custom plugins upload
• API + UI to manage deployments
• Deployment templates
• IP filtering
• Built-in logs & metrics
• Support for ES <v6