From Containers to Kubernetes Operators
Emanuil Tolev (@emanuil_tolev)
A presentation at DeveloperWeek Enterprise in November 2020 by Emanuil Tolev
Community Engineer
This is what we do
Agenda: Elastic’s journey through
Docker images
Helm Chart
Kubernetes Operator
Slides PDF at the end.
This talk covers a lot: introductions to a few less common concepts, the lessons that a few dozen people learned over 8+ years, and a bit of practical YAML to help you digest the abstract parts.
Docker’s great but not without issues.
Containers are the new ZIP format to distribute software
One of many…
RPM, DEB, TAR.GZ, MSI
Ansible, Chef, Puppet
We don’t really care which one you use; any of those will work. But if you insist on containers, let’s see what is in there.
Fallacy: :latest
https://github.com/elastic/elasticsearch-docker/issues/75
No :latest, so what about :7 and :7.7? Exact versions only: 7.7.1
With floating tags, nodes can end up on different versions. Your cluster is then stuck on the lowest version, so there is no gain: just the bugs of all versions and the potential for bad combinations.
^ A shard created on one node may not be able to move to other nodes, resulting in eventual cluster imbalance.
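The "exact only" rule is easy to check mechanically. A minimal sketch, assuming that "exact" means three numeric components as in 7.7.1; the helper name `is_exact_tag` is hypothetical and not part of any Elastic tooling:

```python
import re

# Hypothetical helper: accept only fully pinned tags such as "7.7.1"
# and reject floating ones such as "latest", "7" or "7.7".
_EXACT_TAG = re.compile(r"\d+\.\d+\.\d+")


def is_exact_tag(tag: str) -> bool:
    """Return True only for a major.minor.patch tag."""
    return _EXACT_TAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ["latest", "7", "7.7", "7.7.1"]:
        print(tag, is_exact_tag(tag))
```

A check like this could run in CI against image references before anything is deployed.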
The base image diversity and size debate
Only say: “Some people like to use different base images for each component to try to reduce size and fulfil other requirements. We did that too, but ended up with one common base Linux image in the end.”
Common base image since 5.4+
CentOS 7
Similar setup
Shared layers: ultimately much better for size across components
“Kubernetes is the answer. What was the question?”
https://twitter.com/charlesfitz/status/1068203930683752448
This one is almost synonymous with Kubernetes: YAML
…lots of it
At some point we started to feel more like YAML engineers than software engineers.
Fun with YAML
http://www.yamllint.com
ports:
  - 80:80
  - 20:20
Imagine you want to deploy a service. You read the Docker documentation and see this example of opening some ports to the outside. Can anybody spot the problem?
20:20 is read as a base-60 number: 20 * 60^1 + 20 * 60^0 = 1220. The value the validator shows, 73200, is 1220 * 60.
Fun with YAML
https://docs.docker.com/compose/compose-file/#short-syntax-1
ports:
  - "80:80"
  - 73200
Always quote Docker port mappings. A mapping with a container port lower than 60 is evaluated as a base-60 number.
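The base-60 rule can be reproduced without a YAML library. A minimal sketch, assuming the YAML 1.1 integer pattern for sexagesimal values; the function name `yaml11_port_value` is made up for illustration, and real parsers may differ (the validator output on the slide, 73200, is 60 times what this sketch returns for 20:20):

```python
import re

# YAML 1.1 treats an unquoted scalar such as 20:20 as a base-60 integer
# when every component after the first is in the range 0-59.
_SEXAGESIMAL = re.compile(r"[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+")


def yaml11_port_value(scalar: str):
    """Return the int a YAML 1.1 parser may produce, else the string itself."""
    if _SEXAGESIMAL.fullmatch(scalar) is None:
        return scalar  # e.g. "80:80": 80 is not a valid base-60 digit
    sign = -1 if scalar.startswith("-") else 1
    digits = scalar.lstrip("+-").replace("_", "")
    value = 0
    for part in digits.split(":"):
        value = value * 60 + int(part)
    return sign * value


if __name__ == "__main__":
    for mapping in ["80:80", "20:20", "22:22"]:
        print(mapping, "->", repr(yaml11_port_value(mapping)))
```

Only mappings whose container port is below 60 match the pattern, which is why 80:80 survives as a string while 20:20 does not.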
Helm Chart
Advanced package management with support for templating and more complex resources
Building on existing Kubernetes primitives like StatefulSet, Service, Deployment,…
Elastic Helm Charts
Elasticsearch, Kibana, Filebeat, Metricbeat, APM Server, Logstash
https://github.com/elastic/helm-charts
Tested on GKE
Default storage: pd-ssd (network attached)
Kubernetes >=1.10 supports Local PersistentVolumes for increased performance
Un-Opinionated
Expose environment variables & mount secrets
Multiple upgrade strategies
Doing this makes it much easier for this chart to support multiple versions with minimal changes.
Minikube Example
https://github.com/elastic/helm-charts/tree/master/elasticsearch/examples/minikube
helm repo add elastic https://helm.elastic.co
helm install --name elasticsearch elastic/elasticsearch [--set imageTag=7.7.1]
minikube addons enable default-storageclass
minikube addons enable storage-provisioner
cd examples/minikube
make
Won’t have time to dig into this.
---
# Permit co-located instances for solitary minikube virtual machines
antiAffinity: "soft"
# Shrink default JVM heap
esJavaOpts: "-Xmx128m -Xms128m"
# Allocate smaller chunks of memory per pod
resources:
  requests:
    cpu: "100m"
    memory: "512M"
  limits:
    cpu: "1000m"
    memory: "512M"
# Request smaller persistent volumes
volumeClaimTemplate:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: "standard"
  resources:
    requests:
      storage: 100M
https://github.com/elastic/helm-charts/tree/master/elasticsearch/examples/minikube/values.yaml
Kubernetes Operator pattern
Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop.
^ Mention the difference from a Helm chart: an operator has a runtime component, covers the full lifecycle, and can do anything.
Custom Resource (CR)
CRD == type definition (class)
CR == instance (object)
Class and object in the sense of object orientation; maybe not exactly the same, but close.
Custom Resource Definition (CRD)
Think: Elasticsearch, Kibana, APM
Contrast: Built-in resources like Pods, Services, Secrets, StatefulSets,…
Custom Controller
Brings CRDs to “life” (reconciliation loop)
Upgrades, secrets, certificate management,…
Cost of the custom controller: it is custom software that is expensive to develop and maintain, has its own lifecycle, needs rollout,… Reconciliation is stateless, so the controller can crash at any time and simply restart.
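The reconciliation idea fits in a few lines. A minimal sketch in Python rather than the operator's Go, assuming a made-up `ElasticsearchSpec` type and action strings; none of these names come from the Elastic operator:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElasticsearchSpec:
    """Hypothetical stand-in for the state declared in a custom resource."""
    version: str
    node_count: int


def reconcile(desired: ElasticsearchSpec, observed: ElasticsearchSpec) -> list:
    """Compare desired and observed state and return the actions to take.

    The function keeps no state between calls, so it can be re-run after a
    crash and will arrive at the same actions for the same inputs.
    """
    actions = []
    if observed.node_count < desired.node_count:
        actions.append(f"add {desired.node_count - observed.node_count} node(s)")
    elif observed.node_count > desired.node_count:
        actions.append(f"drain and remove {observed.node_count - desired.node_count} node(s)")
    if observed.version != desired.version:
        actions.append(f"rolling upgrade {observed.version} -> {desired.version}")
    return actions


if __name__ == "__main__":
    want = ElasticsearchSpec(version="7.7.1", node_count=3)
    have = ElasticsearchSpec(version="7.7.0", node_count=1)
    print(reconcile(want, have))
```

A real controller would run this comparison whenever a watched resource changes and would read the observed state from the cluster each time instead of caching it.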
BTW, I try to avoid ECK (Elastic Cloud on Kubernetes) as a name because it only confuses people. Everybody talks about operators, and we also want to keep this fairly generic and not only an ECK pitch.
Elastic Operator
Elasticsearch, Kibana, APM Server
https://github.com/elastic/cloud-on-k8s
Golang 1.13
Kubebuilder 2: SDK for building Kubernetes APIs using CRDs
Kustomize: generate patched CRDs for specific flavors
Kustomize, trivial-versions flavor: no apiserver-side validation at all. Required for K8s <= 1.13.
Emphasise the integers: 1, 2, 3, 4. It’s a diagram, so make it exciting and emphasise the start of each point.
^ Missing the APM server, but it’s the same
^ TODO: many more
Opinionated
Encode best practices & operational knowledge
Built-in certificate management, security,…
Example Opinions
Scale down: Drain nodes first
Upgrade: Disable shard allocation
Not sure if all indices are replicated; this is different from Helm. Delete the pod, and it will be recreated by the StatefulSet controller with the latest config.
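The "disable shard allocation" opinion can be pictured as an ordered plan. A rough sketch, loosely following the manual rolling restart procedure for Elasticsearch and not the operator's actual code; the function names and step labels are invented, while `cluster.routing.allocation.enable` is the real cluster setting:

```python
def allocation_settings(mode: str) -> dict:
    """Request body for the cluster settings API that sets shard allocation."""
    return {"persistent": {"cluster.routing.allocation.enable": mode}}


def rolling_upgrade_steps(nodes: list) -> list:
    """Return (action, payload) pairs, handling one node at a time."""
    steps = []
    for node in nodes:
        # Stop replicas from being shuffled around while the node is away.
        steps.append(("PUT _cluster/settings", allocation_settings("primaries")))
        steps.append(("restart with new version", node))
        # Let the cluster recover before touching the next node.
        steps.append(("PUT _cluster/settings", allocation_settings("all")))
        steps.append(("wait for green health", node))
    return steps


if __name__ == "__main__":
    for action, payload in rolling_upgrade_steps(["es-0", "es-1"]):
        print(action, payload)
```

Encoding the order in software is the point of an opinionated operator: nobody has to remember the steps during an upgrade.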
You Can Still Shoot Yourself in the Foot
For example, configure 0 replicas and then do an upgrade
Running on Minikube
minikube config set memory 16384
minikube config set cpus 4
minikube start
Requires non-trivial resources, even just locally with minikube.
Running on Minikube
# Set up the entire operator: configs, deployment practices, monitoring, in one command
kubectl apply -f https://download.elastic.co/downloads/eck/1.1.2/all-in-one.yaml
# Monitor logs
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator
# And this is where you come in - the configs you write
kubectl apply -f apm_es_kibana.yaml
The following 3 code segments make up apm_es_kibana.yaml.
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 7.7.1
  nodes:
  - nodeCount: 1
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            limits:
              memory: 2Gi
    volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Gi
---
apiVersion: apm.k8s.elastic.co/v1
kind: ApmServer
metadata:
  name: apm-server-sample
spec:
  version: 7.7.1
  nodeCount: 1
  elasticsearchRef:
    name: "elasticsearch-sample"
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-sample
spec:
  version: 7.7.1
  nodeCount: 1
  elasticsearchRef:
    name: "elasticsearch-sample"
Running on Minikube
# Check status
kubectl get elasticsearch,kibana,apmserver
# Expose Kibana
kubectl port-forward service/kibana-sample-kb-http 5601
# Get the credentials
echo $(kubectl get secret elasticsearch-sample-es-elastic-user -o=jsonpath='{.data.elastic}' | base64 --decode)
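The credentials step works because Kubernetes stores Secret values base64-encoded, so the value returned by the jsonpath query has to be decoded before use. A minimal sketch of that decoding, with a made-up password and a hypothetical helper name:

```python
import base64


def decode_secret_value(encoded: str) -> str:
    """Decode one base64-encoded value from a Secret's .data map."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    # Made-up value standing in for what the jsonpath query would return.
    stored = base64.b64encode(b"s3cr3t").decode("ascii")
    print(decode_secret_value(stored))
```

Base64 is an encoding, not encryption: anyone who can read the Secret can recover the password.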
Security on, self-generated certificates, default service exposure
Changes
Instance size / number, version,…
kubectl apply -f apm_es_kibana.yaml
Support
GKE (Google Cloud)
EKS (AWS)
AKS (Azure)
OpenShift (Red Hat)
StatefulSets
Rolling Upgrades with Volume reuse
“Standard” way to run stateful workloads: stable network ID, stable data volume that is re-attachable during rolling upgrades
Each logical group of Elasticsearch nodes (master, data,…) maps to a StatefulSet.
Deployment
CRDs require cluster admin level permissions to install
Global Namespace
User installs ECK at global level
ECK checks all namespaces for new ES / Kibana / APM Server objects
Does not scale well with the number of clusters
Requires elevated permissions on the cluster
Single Namespace
User installs ECK in specific namespaces
ECK only checks for definitions of ES / KB / APM Server in each of those namespaces
Does not play well with cross-namespace features (a single enterprise license pool for multiple clusters in multiple namespaces, cross-cluster search and replication on clusters across namespaces)
Deploying 5 clusters in 5 different namespaces requires 5 running operators, although a single one would technically have been enough.
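The trade-off between the two modes comes down to how many operator instances end up running. A toy sketch, assuming one operator per namespace in single-namespace mode as described above; the function name and mode strings are invented:

```python
def operators_needed(cluster_namespaces: list, mode: str) -> int:
    """Count operator instances required to manage the given clusters.

    cluster_namespaces holds one namespace name per deployed cluster.
    """
    if mode == "global":
        # One operator watches every namespace.
        return 1 if cluster_namespaces else 0
    if mode == "single":
        # One operator per namespace that contains at least one cluster.
        return len(set(cluster_namespaces))
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    namespaces = ["team-a", "team-b", "team-c", "team-d", "team-e"]
    print(operators_needed(namespaces, "global"))
    print(operators_needed(namespaces, "single"))
```

The count says nothing about permissions, which is the other half of the trade-off: the single global operator needs elevated rights across the whole cluster.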
Other Operators: MongoDB, Kafka, Redis, CockroachDB,… Operator “Marketplaces” like https://operatorhub.io/
Conclusion
“Containers are disrupting the industry!”
With all the bad practices (root, :latest, yolo deploy), this is not surprising.
“Can I run Elasticsearch on Docker or Kubernetes?”
Yes
“Should I run Elasticsearch on Docker or Kubernetes?”
It depends, but if you’re unsure about Docker: no
Effective collaboration and solving production problems
Remember why you’re doing all this
It’s OK to try new stuff, but be careful that it’s actually helping you be a more effective business and (ultimately) leaves people happier after some onboarding pain. A bunch of VMs set up with Ansible or Chef may be just the right thing for you at the stage you’re at, with the team skills you have. Do some prototype work, investigate, identify exactly which problems k8s will solve, commit and put in the effort, and then do retrospectives on whether it did solve them.
Helm Charts vs Operator
Un-opinionated vs opinionated. The first is good if you want to run many services, potentially in a similar fashion. The second is more specialized, with the good and the bad parts of that.
Where to next?
• Deeper look at the operator: https://www.elastic.co/blog/introducing-elastic-cloud-on-kubernetes-the-elasticsearch-operator-and-beyond
• The source code: https://github.com/elastic/cloud-on-k8s
• The slides: https://noti.st/emanuil-tolev/CqvknF/from-containers-to-kubernetes-operators
Elastic interest?
community.elastic.co
free lunch info sessions, ping me
Questions & Discussion
@emanuil_tolev
etolev@elastic.co
Next: https://www.elastic.co/blog/introducing-elastic-cloud-on-kubernetes-the-elasticsearch-operator-and-beyond
^ ECE (Elastic Cloud Enterprise) features that do not (and some may never) exist in ECK:
Rollback on plan failure
Snapshot management
Index curation
RBAC access
Custom plugins upload
API + UI to manage deployments
Deployment templates
IP filtering
Built-in logs & metrics
Support for ES <v6