Deploying and running your first application on K8s

A presentation at Kubernetes Community Days Berlin in June 2022 in Berlin, Germany by Alexander Reelsen

Slide 1

Deploying and running your first application on K8s
Alexander Reelsen
alex@elastic.co | @spinscale

Slide 2

Today’s goal
How to build, run & maintain a modern Java web application with minimal resources on K8s

Slide 3

Rocket-science-free zone!
Level: Intro
Perspective: developer, user of an existing K8s cluster
Journey: from nothing to a downtime-free rollout during peak traffic

Slide 4

About me
Developer & Advocate @Elastic
PaaS fan, IaC fan
K8s skeptic: the level of abstraction of its primitives
First rule of SWE: don’t write code if you don’t want to maintain it…

Slide 5

Elastic Community Conference
Organized by the Elastic Community Team
Virtual
Around the clock
Several languages
No talks from Elastic Community Team members
2021 was a success: 70 talks

Slide 6

2022: ElasticCC Registration via Elastic Cloud

Slide 7

Discussion
Decision: build vs. buy (registration, live streaming)
Platform: PaaS vs. K8s (no approval required)
Datastore: SQL vs. Elastic Cloud vs. API
Let’s do this: own web application

Slide 8

Discussion
Decision: build vs. buy (registration, live streaming)
Platform: PaaS vs. K8s (no approval required)
Datastore: SQL vs. Elastic Cloud vs. API
Let’s do this: own web application
“Use your own technologies in production.” (Me)

Slide 9

Login via Cloud

Slide 10

Schedule

Slide 11

Feedback

Slide 12

Architecture

Slide 13

How to build, run & maintain
No other teams involved after the initial setup
Collective ownership within the team
Well tested

Slide 14

… a modern Java web application
Javalin as the framework
Latest Java version
Latest GC (ZGC)
pac4j for SAML-based authentication
Frontend for backend developers with htmx and hyperscript
New Elasticsearch Java client
Elastic APM agent

Slide 15

… with minimal resources
Small pods
Fast rollouts
No one working full time on this
No user accounts/passwords should be stored
Easy rollout for everyone in the community team

Slide 16

… on K8s
Utilizing company-wide resources
Rollout: docker build && docker push && kubectl rollout restart …
imagePullPolicy: Always (so that restarted pods pull the freshly pushed image even though the tag did not change)

Slide 17

Secrets with Vault
apiVersion: vaultproject.io/v1
kind: SecretClaim
metadata:
  name: elasticcc-app
  namespace: community
spec:
  type: Opaque
  path: secret/k8s/elasticcc-app
  renew: 3600

Slide 18

Secrets with Vault
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: elasticcc-app
          env:
            - name: ELASTICSEARCH_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: elasticcc-app
                  key: elasticsearch_password
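On the application side, the Vault secret arrives as a plain environment variable. A minimal sketch, assuming the app reads ELASTICSEARCH_PASSWORD once at startup and should refuse to start without it; the class SecretConfig and its method required are hypothetical, not code from the talk:

```java
import java.util.Map;

/**
 * Sketch: reading a secret that Kubernetes injects via secretKeyRef.
 * SecretConfig and required() are hypothetical names used for illustration.
 */
final class SecretConfig {

    private SecretConfig() {}

    /** Returns the value of a required variable, failing fast if it is missing or blank. */
    static String required(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            // Crash at startup instead of failing on the first Elasticsearch request.
            throw new IllegalStateException("Missing required environment variable " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // System.getenv() is the process environment, including values from secretKeyRef.
        String password = required(System.getenv(), "ELASTICSEARCH_PASSWORD");
        System.out.println("Elasticsearch password loaded (" + password.length() + " chars)");
    }
}
```

Passing the environment as a Map keeps the check testable; failing fast turns a missing or misnamed secret into a crash-looping pod instead of one that only fails once a request needs Elasticsearch.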

Slide 19

Secrets with Vault
vault write secret/k8s/elasticcc-app \
    elasticsearch_password=S3cr3t \
    key=value

Slide 20

Rollouts without downtime
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 1
  strategy:
    rollingUpdate:
      maxUnavailable: 0
    type: RollingUpdate
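With replicas: 1 and maxUnavailable: 0, the only way forward is to surge: Kubernetes starts the new pod and waits for it to become ready before terminating the old one. The Deployment above leaves maxSurge at its default of 25%. A small sketch of how Kubernetes turns such percentages into pod counts (maxSurge rounds up, maxUnavailable rounds down); the class and method names are hypothetical:

```java
/**
 * Sketch of the pod-count bounds during a RollingUpdate. Percentages are
 * converted as the Deployment docs describe: maxSurge rounds up,
 * maxUnavailable rounds down. Names are hypothetical.
 */
final class RollingUpdateBounds {

    private RollingUpdateBounds() {}

    /** Absolute surge for a percentage value, rounded up. */
    static int surge(int replicas, int percent) {
        return (replicas * percent + 99) / 100; // integer ceiling for non-negative values
    }

    /** Absolute unavailability for a percentage value, rounded down. */
    static int unavailable(int replicas, int percent) {
        return (replicas * percent) / 100; // integer floor for non-negative values
    }

    public static void main(String[] args) {
        int replicas = 1;
        int maxSurge = surge(replicas, 25); // default maxSurge of 25%
        int maxUnavailable = 0;             // as in the Deployment above
        System.out.println("max pods during rollout: " + (replicas + maxSurge));
        System.out.println("min available pods:      " + (replicas - maxUnavailable));
    }
}
```

For a single replica the 25% surge rounds up to one extra pod, so the rollout briefly runs two pods and never drops below one. Setting maxSurge to 0 as well would leave no way to make progress, which is why Kubernetes rejects a Deployment where both values are 0.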

Slide 21

Rollouts without downtime
Just start more pods… not so easy
Requests are distributed via round robin
Javalin is a Servlet-based web framework with a notion of sessions…
… each user gets a session cookie with a corresponding map of attributes on the server side
Server side: User user = ctx.sessionAttribute("user")
Instance shutdown kills the session
Session affinity (sticky sessions)? Works until shutdown…
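Why in-memory sessions break as soon as there is more than one pod can be shown with a toy model. This is a sketch, not Javalin or Jetty code: each hypothetical Pod keeps its sessions in a local map, and a round-robin balancer alternates between pods.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model, not Javalin/Jetty code: pods with local in-memory session maps
 * behind a round-robin load balancer. All names are hypothetical.
 */
final class RoundRobinSessions {

    /** One application instance with a local, in-memory session store. */
    static final class Pod {
        final Map<String, String> sessions = new HashMap<>();
    }

    private final List<Pod> pods = new ArrayList<>();
    private int next = 0;

    RoundRobinSessions(int podCount) {
        for (int i = 0; i < podCount; i++) {
            pods.add(new Pod());
        }
    }

    /** Picks the next pod in round-robin order. */
    Pod route() {
        Pod pod = pods.get(next % pods.size());
        next++;
        return pod;
    }

    /** Simulates a rollout step: the pod is replaced by a fresh one and its map is gone. */
    void replacePod(int index) {
        pods.set(index, new Pod());
    }

    public static void main(String[] args) {
        RoundRobinSessions lb = new RoundRobinSessions(2);
        lb.route().sessions.put("cookie-123", "alice");      // login is handled by pod 0
        String seen = lb.route().sessions.get("cookie-123"); // next request goes to pod 1
        System.out.println("second request sees user: " + seen); // pod 1 never saw the login
    }
}
```

Sticky sessions would pin "cookie-123" to pod 0, but replacePod(0) shows why that only works until the pod is shut down. Moving sessions into a store shared by all pods removes both problems.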

Slide 22

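Rollouts without downtime
import co.elastic.clients.elasticsearch.ElasticsearchClient;
import io.javalin.Javalin;
import org.eclipse.jetty.server.session.NullSessionCache;
import org.eclipse.jetty.server.session.SessionCache;
import org.eclipse.jetty.server.session.SessionHandler;

// Register a custom Jetty session handler with Javalin
this.app = Javalin.create(cfg -> {
    cfg.sessionHandler(() -> createSessionHandler(elasticsearchClient));
});

public static SessionHandler createSessionHandler(ElasticsearchClient client) {
    SessionHandler sessionHandler = new SessionHandler();
    // session handler setup here…
    // NullSessionCache keeps no sessions in memory: each request loads its session from the store
    SessionCache sessionCache = new NullSessionCache(sessionHandler);
    sessionCache.setSaveOnCreate(true);          // persist new sessions immediately
    sessionCache.setFlushOnResponseCommit(true); // write dirty sessions before the response is sent
    // ElasticsearchSessionDataStore is the application's own SessionDataStore, not part of Jetty
    sessionCache.setSessionDataStore(new ElasticsearchSessionDataStore(client));
    sessionHandler.setSessionCache(sessionCache);
    return sessionHandler;
}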

Slide 23

Rollouts without downtime
Every request writes its session data to Elasticsearch when finished
Bad idea! The internet consists of bots… a lot of them
100k requests per hour before the announcement, due to security scanners
Solution: only persist the session if a login/logout has happened before
Major reduction of Elasticsearch write operations, resulting in faster responses
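The "only persist after login/logout" rule can be sketched as a small policy. This is one possible reading, assuming the decision is made per request; the class, the AuthEvent enum and both methods are hypothetical, and how the talk wired the rule into Jetty's session store is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Sketch of a "persist only on login/logout" rule. Not the talk's
 * implementation; all names are hypothetical.
 */
final class SessionPersistencePolicy {

    /** What happened during a request, as far as authentication goes. */
    enum AuthEvent { NONE, LOGIN, LOGOUT }

    private SessionPersistencePolicy() {}

    /** Only a change in authentication state needs to reach the shared session store. */
    static boolean shouldPersist(AuthEvent event) {
        return event == AuthEvent.LOGIN || event == AuthEvent.LOGOUT;
    }

    /** Counts the store writes a sequence of requests causes under this policy. */
    static long writes(List<AuthEvent> requests) {
        return requests.stream().filter(SessionPersistencePolicy::shouldPersist).count();
    }

    public static void main(String[] args) {
        // Hypothetical traffic: many anonymous scanner hits and one real login.
        List<AuthEvent> traffic = new ArrayList<>(Collections.nCopies(1_000, AuthEvent.NONE));
        traffic.add(AuthEvent.LOGIN);
        System.out.println("store writes: " + writes(traffic) + " of " + traffic.size() + " requests");
    }
}
```

Anonymous bot traffic never reaches Elasticsearch under this rule, which matches the reduction in write operations described on the slide.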

Slide 24

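No announcement, but 100k req/hour?
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: elasticcc-app-ngx
  namespace: community
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-production
(Presumably the reason: certificates issued by Let’s Encrypt are recorded in public Certificate Transparency logs, which scanners watch for new hostnames. Note that the extensions/v1beta1 Ingress API is no longer served since Kubernetes 1.22; networking.k8s.io/v1 replaces it.)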

Slide 25

Probes
livenessProbe:
  failureThreshold: 3
  periodSeconds: 30
  httpGet:
    path: /monitoring/health
    port: 8080
readinessProbe:
  failureThreshold: 15
  initialDelaySeconds: 10
  periodSeconds: 5
  httpGet:
    path: /monitoring/health
    port: 8080
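Both probes expect an HTTP endpoint at /monitoring/health on port 8080, and the kubelet treats any status from 200 to 399 as success. A minimal sketch of such an endpoint, using the JDK’s built-in com.sun.net.httpserver instead of Javalin so it runs without dependencies; the class HealthServer is hypothetical, and whether the talk’s endpoint also checks Elasticsearch is not shown:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/**
 * Minimal health endpoint for the probes above, using the JDK's built-in
 * HTTP server (not Javalin). HealthServer is a hypothetical name.
 */
final class HealthServer {

    private HealthServer() {}

    /** Starts a server exposing GET /monitoring/health; port 0 picks a free port. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/monitoring/health", exchange -> {
            byte[] body = "{\"status\":\"ok\"}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length); // 2xx/3xx counts as probe success
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080); // same port as in the probe configuration
    }
}
```

This handler always answers 200 once the server is up. A real endpoint might report not-ready until dependencies are available, which only affects the readiness probe if the two probes use different paths.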

Slide 26

Setting JVM memory
resources:
  requests:
    cpu: 2.0
    memory: 1Gi
  limits:
    cpu: 2.0
    memory: 1Gi

// build.gradle (Groovy DSL, application plugin)
def jvmOptions = ["-XX:+UseZGC", "-Xmx768m"]
startScripts {
    defaultJvmOpts = jvmOptions
}
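With a 1Gi memory limit and -Xmx768m, 256Mi remain for everything outside the heap (metaspace, thread stacks, code cache, direct buffers). An alternative not used in the talk is -XX:MaxRAMPercentage=75.0, which sizes the heap relative to the container limit. Either way, it is worth checking what the JVM actually sees inside the pod; a minimal sketch, with the class name JvmResources being hypothetical:

```java
/**
 * Logs the heap limit and CPU count the JVM sees inside the container,
 * to verify that -Xmx and the CPU limit took effect. Hypothetical name.
 */
final class JvmResources {

    private JvmResources() {}

    /** Summarizes the maximum heap (in MiB) and the number of available processors. */
    static String describe(Runtime runtime) {
        long maxHeapMiB = runtime.maxMemory() / (1024 * 1024);
        int cpus = runtime.availableProcessors();
        return "max heap: " + maxHeapMiB + " MiB, available processors: " + cpus;
    }

    public static void main(String[] args) {
        // With -Xmx768m the heap should be close to 768 MiB; container-aware JVMs
        // derive the processor count from the CPU limit (2.0 here).
        System.out.println(describe(Runtime.getRuntime()));
    }
}
```

Logging this line at startup makes a missing or overridden JVM option visible in the pod logs right away.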

Slide 27

Monitoring
spec:
  template:
    metadata:
      annotations:
        watcher.alerts.slack: "#community-downtime-notifications"
      labels:
        app: elasticcc-app
        watcher: enabled

Slide 28

Observability tradeoff
GraalVM native images for speed and a lower memory footprint
vs. APM agents, which require bytecode instrumentation

Slide 29

Observability

Slide 30

Observability

Slide 31

Slide 32

Debugging
Logs were not on the same instance, adding friction
Logs required a K8s configuration change in our case, which was tedious
A component that shipped logs over the network would have been great
Do you really need logs when exceptions are logged?

Slide 33

Missing
Automatic rollouts
Stateful services outsourced
Setup-as-code (e.g. via Terraform, to also include the Elasticsearch cluster)
APM tooling can be tricky: it is hard to distinguish single-service memory spikes when running several pods

Slide 34

Conference day
APM detected an exception thrown while rendering a template early on
A fix was rolled out before the main traffic came in
No issues during the 12 hours of the conference
More than 170k valid requests served, 1.7 million requests in total
95th percentile:
/schedule: 8.8ms
/speaker/{id}: 5.0ms
/session/{id}: 5.5ms
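The latency figures above are 95th percentiles: at least 95% of requests to an endpoint finished within the stated time. A small sketch of the nearest-rank definition, one of several common percentile definitions; Elastic APM may compute percentiles differently (for example from histograms), so this illustrates the meaning, not the tool. The class name is hypothetical and the sample latencies are made up:

```java
import java.util.Arrays;

/**
 * Nearest-rank percentile, one common definition. Illustration only;
 * APM tools may use other estimators. Percentile is a hypothetical name.
 */
final class Percentile {

    private Percentile() {}

    /** Returns the p-th percentile (0 < p <= 100) of the samples using the nearest-rank method. */
    static double nearestRank(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical latencies in milliseconds, not data from the conference.
        double[] latenciesMs = {4.1, 3.9, 5.2, 4.4, 12.0, 4.0, 4.8, 3.7, 4.2, 6.1};
        System.out.println("p50 = " + nearestRank(latenciesMs, 50) + " ms");
        System.out.println("p95 = " + nearestRank(latenciesMs, 95) + " ms");
    }
}
```

Computing p * n before dividing keeps the rank exact for integer percentiles, avoiding floating-point surprises right at the rank boundaries.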

Slide 35

Agility
Log4Shell: from Slack notification to assessment to rollout in 14 minutes
Impact: dropped the little one off at kindergarten later

Slide 36

Summary
10/10, would do again!
Don’t go crazy on automation (e.g. rollout on push)
Go with a cookie-based session store?
Go crazy on IaC!
Logs should be easily accessible, just like APM data
Level of abstraction:

Slide 37

Summary: Level of abstraction
Primitives are designed for operations (CPU, memory)
When to scale up/out? Application hints are required:
  number of concurrent requests
  duration of requests
  wait time until a request is processed
Scaling strategy: start pods if one is overloaded? Or all?
Talk to developers about this; the discussions within your company (especially about legacy apps) will be a great exercise for everyone
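The application hints listed above can be measured inside the app. As one example, a minimal sketch for the first hint (number of concurrent requests), assuming each request handler can be wrapped; the class InFlightRequests and its methods are hypothetical, and exporting the value to an autoscaler is not shown:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Sketch of an application-level scaling hint: the number of in-flight
 * requests and the peak seen so far. Hypothetical names throughout.
 */
final class InFlightRequests {

    private final AtomicInteger current = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    /** Runs a request handler while counting it as in flight. */
    <T> T track(Supplier<T> handler) {
        int now = current.incrementAndGet();
        peak.accumulateAndGet(now, Math::max); // remember the highest concurrency seen
        try {
            return handler.get();
        } finally {
            current.decrementAndGet(); // also runs if the handler throws
        }
    }

    int current() { return current.get(); }

    int peak() { return peak.get(); }

    public static void main(String[] args) {
        InFlightRequests inFlight = new InFlightRequests();
        String body = inFlight.track(() -> "rendered schedule"); // hypothetical handler
        System.out.println(body + ", in flight now: " + inFlight.current() + ", peak: " + inFlight.peak());
    }
}
```

Request duration and queueing time could be recorded the same way by taking System.nanoTime() before and after the handler.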

Slide 38

Thanks for listening
Q&A
Alexander Reelsen
alex@elastic.co | @spinscale

Slide 39

Discussion
What technologies would you use?
Where did I go wrong?
“Alex, this is not how you do it in the k8s world!11!!elf!” I’m sure, please talk to me

Slide 40

Thanks for listening
Q&A
Alexander Reelsen
alex@elastic.co | @spinscale