Deploying and running your first application on K8s
Alexander Reelsen alex@elastic.co | @spinscale
Slide 2
Today’s goal
How to build, run & maintain a modern Java web application with minimal resources on K8s
Slide 3
Rocket-science free zone!
Level: Intro
Perspective: Developer, user of existing K8s cluster
Journey from nothing to downtime-free rollout during peak traffic
Slide 4
About me
Developer & Advocate @Elastic
PaaS fan, IaC fan
K8s skeptic: primitives, level of abstraction
First rule of SWE: Don’t write code if you don’t want to maintain it…
Slide 5
Elastic Community Conference
Organized by the Elastic Community Team
Virtual
Around the clock
Several languages
No talks from Elastic Community Team members
2021 was a success, 70 talks
Slide 6
2022: ElasticCC Registration via Elastic Cloud
Slide 7
Discussion
Decision: Build vs. Buy (Registration, Live Streaming)
Platform: PaaS vs. K8s (no approval required)
Datastore: SQL vs. Elastic Cloud vs. API
Let’s do this: Own web application
"Use your own technologies in production" - Me
Slide 9
Login via Cloud
Slide 10
Schedule
Slide 11
Feedback
Slide 12
Architecture
Slide 13
How to build, run & maintain
No other teams involved after initial setup
Collective ownership within the team
Well tested
Slide 14
… a modern Java web application
Javalin as a framework
Latest Java version
Latest GC (ZGC)
pac4j for SAML based authorization
Frontend for backend developers with htmx and hyperscript
New Elasticsearch Java Client
Elastic APM Agent
Slide 15
… with minimal resources
Small pods
Fast rollouts
No one working full time on this
No user accounts/passwords should be stored
Easy rollout for everyone in the community team
Slide 16
… on K8s
Utilizing company-wide resources
Rollout: docker build && docker push && kubectl rollout restart …
imagePullPolicy: Always
Slide 17
Secrets with Vault
apiVersion: vaultproject.io/v1
kind: SecretClaim
metadata:
  name: elasticcc-app
  namespace: community
spec:
  type: Opaque
  path: secret/k8s/elasticcc-app
  renew: 3600
Rollouts without downtime
Just start more pods… not so easy
Requests are distributed via round robin
Javalin is a Servlet based web framework with a notion of sessions…
… each user gets a session cookie with a corresponding map of attributes on the server side
Server side: User user = ctx.sessionAttribute("user")
Instance shutdown kills session
Sticky sessions (session affinity)? Works until shutdown…
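Why round robin and in-memory sessions do not mix can be shown with a small simulation. This is a plain-Java sketch, not Javalin code: `Pod` and `RoundRobin` are hypothetical stand-ins for an application instance and the load balancer, and the session id is made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RoundRobinSessionDemo {
    public static void main(String[] args) {
        RoundRobin lb = new RoundRobin(List.of(new Pod(), new Pod()));
        // Request 1 hits the first pod and logs the user in there.
        lb.pick().login("cookie-123", "alex");
        // Request 2 hits the second pod, which has never seen this session.
        System.out.println(lb.pick().currentUser("cookie-123")); // prints "null"
        // Request 3 is back on the first pod, where the session lives.
        System.out.println(lb.pick().currentUser("cookie-123")); // prints "alex"
    }
}

// Hypothetical stand-in for one application pod that keeps sessions in memory.
class Pod {
    private final Map<String, Map<String, String>> sessions = new HashMap<>();

    void login(String sessionId, String user) {
        sessions.computeIfAbsent(sessionId, id -> new HashMap<>()).put("user", user);
    }

    // Returns the logged-in user, or null if this pod does not know the session.
    String currentUser(String sessionId) {
        Map<String, String> attributes = sessions.get(sessionId);
        return attributes == null ? null : attributes.get("user");
    }
}

// Hypothetical round-robin balancer: every call hands out the next pod in turn.
class RoundRobin {
    private final List<Pod> pods;
    private int next = 0;

    RoundRobin(List<Pod> pods) {
        this.pods = pods;
    }

    Pod pick() {
        Pod pod = pods.get(next);
        next = (next + 1) % pods.size();
        return pod;
    }
}
```

The same loss happens to every session on a pod when that pod is shut down during a rollout, which is why the session data has to live outside the instance.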
Rollouts without downtime
Every request writes its session data to Elasticsearch when finished
Bad idea! The internet consists of bots… a lot of them
100k requests per hour before the announcement due to security scanners
Solution: Only persist the session if a login/logout has happened
Major reduction of Elasticsearch write operations, resulting in faster responses
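The "only persist on login/logout" idea boils down to a dirty flag on the session. The following is a minimal plain-Java sketch under that assumption, not the application's actual code: `Session`, `CountingStore` (a stand-in for the Elasticsearch write) and `SessionPersister` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class DirtySessionDemo {
    public static void main(String[] args) {
        CountingStore store = new CountingStore();

        Session bot = new Session("bot-1");
        SessionPersister.afterRequest(bot, store);  // anonymous request: no write

        Session user = new Session("user-1");
        user.login("alex");
        SessionPersister.afterRequest(user, store); // login happened: one write
        SessionPersister.afterRequest(user, store); // nothing changed: no write

        System.out.println("writes: " + store.writes); // prints "writes: 1"
    }
}

// Hypothetical session object that remembers whether it changed.
class Session {
    final String id;
    private final Map<String, String> attributes = new HashMap<>();
    private boolean dirty = false;

    Session(String id) {
        this.id = id;
    }

    void login(String user) {
        attributes.put("user", user);
        dirty = true;
    }

    void logout() {
        attributes.remove("user");
        dirty = true;
    }

    boolean isDirty() {
        return dirty;
    }

    void markClean() {
        dirty = false;
    }

    Map<String, String> snapshot() {
        return Map.copyOf(attributes);
    }
}

// Stand-in for the remote session store; counts writes instead of calling Elasticsearch.
class CountingStore {
    final Map<String, Map<String, String>> documents = new HashMap<>();
    int writes = 0;

    void save(Session session) {
        documents.put(session.id, session.snapshot());
        writes++;
    }
}

// Runs at the end of every request and skips the write for unchanged sessions.
class SessionPersister {
    static void afterRequest(Session session, CountingStore store) {
        if (!session.isDirty()) {
            return;
        }
        store.save(session);
        session.markClean();
    }
}
```

With this shape, scanner and bot traffic never reaches the datastore, because those requests never log in or out.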
Slide 24
No announcement, but 100k req/hour?
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: elasticcc-app-ngx
  namespace: community
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-production
Observability
Tradeoff:
GraalVM for speed and lower memory footprint
APM agents require bytecode instrumentation
Slide 29
Observability
Slide 30
Observability
Slide 31
Slide 32
Debugging
Logs were not on the same instance, adding friction
Logs required a K8s configuration change in our case, tedious
A component that shipped logs over the network would have been great
Do you really need logs, when exceptions are logged?
Slide 33
Missing
Automatic rollouts
Stateful services outsourced
Setup-as-code (i.e. via Terraform to also include the Elasticsearch cluster)
APM tooling can be tricky: hard to distinguish single service memory spikes when running several pods
Slide 34
Conference day
APM detected an exception early, thrown when a template was rendered
Fix rolled out before the main traffic came in
No issues during the 12 hours of the conference
> 170k valid requests served, 1.7 million requests in total
95th percentile:
  /schedule: 8.8ms
  /speaker/{id}: 5.0ms
  /session/{id}: 5.5ms
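The slides do not say how the 95th percentile was computed. For readers unfamiliar with the metric, here is a minimal plain-Java sketch of one common definition (nearest rank); the class name and the sample durations are made up for illustration.

```java
import java.util.Arrays;

class PercentileDemo {
    public static void main(String[] args) {
        // Made-up request durations in milliseconds, including one slow outlier.
        double[] durationsMs = {4.1, 5.0, 3.2, 8.8, 4.7, 6.3, 5.5, 4.9, 120.0, 5.2};
        System.out.println("p50: " + Percentile.nearestRank(durationsMs, 50));
        System.out.println("p95: " + Percentile.nearestRank(durationsMs, 95));
    }
}

class Percentile {
    // Nearest-rank percentile: the smallest sample such that at least
    // `percentile` percent of all samples are less than or equal to it.
    static double nearestRank(double[] values, double percentile) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percentile <= 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in (0, 100]");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

A 95th percentile of 8.8ms therefore means that 95% of the requests to that endpoint finished in 8.8ms or less, while a few slower outliers may exist.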
Slide 35
Agility
Log4Shell: From Slack notification to assessing to rollout in 14 minutes
Impact: Dropped the little one off at kindergarten a bit later
Slide 36
Summary
10/10 Would do again!
Don’t go crazy on automation (i.e. push on rollout etc.)
Go with cookie based session store?
Go crazy on IaC!
Logs should be easily accessible, just like APM data
Level of abstraction:
Slide 37
Summary: Level of abstraction
Primitives are designed for operations (CPU, Memory)
When to scale up/out?
Application hint required:
  # of concurrent requests
  duration of requests
  wait time until processed
Scaling strategy: Start pods if one is overloaded? Or all?
Talk to developers about this. The discussions within your company (especially with legacy apps) will be a great exercise for everyone
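The three application hints listed above could be tracked inside the application roughly like this. It is a plain-Java sketch with hypothetical names (`LoadHints`, `overloaded`) and made-up thresholds; how such numbers would be exposed to a cluster autoscaler is out of scope here and not covered by the slides.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class LoadHintsDemo {
    public static void main(String[] args) {
        LoadHints hints = new LoadHints();
        // One request: arrived at t=0ms, picked up at t=2ms, finished at t=7ms.
        hints.started(0L, 2_000_000L);
        hints.finished(2_000_000L, 7_000_000L);
        System.out.println("concurrent: " + hints.concurrentRequests());
        System.out.println("avg wait ms: " + hints.averageWaitMillis());
        System.out.println("avg duration ms: " + hints.averageDurationMillis());
        // Thresholds are illustrative only.
        System.out.println("overloaded: " + hints.overloaded(50, 100.0));
    }
}

// Hypothetical collector for the three hints: concurrent requests,
// request duration, and wait time until a request is processed.
class LoadHints {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicLong startedCount = new AtomicLong();
    private final AtomicLong finishedCount = new AtomicLong();
    private final AtomicLong totalWaitNanos = new AtomicLong();
    private final AtomicLong totalDurationNanos = new AtomicLong();

    // Call when a worker picks up a request that arrived at enqueuedAtNanos.
    void started(long enqueuedAtNanos, long nowNanos) {
        inFlight.incrementAndGet();
        startedCount.incrementAndGet();
        totalWaitNanos.addAndGet(nowNanos - enqueuedAtNanos);
    }

    // Call when the response for a request picked up at startedAtNanos is done.
    void finished(long startedAtNanos, long nowNanos) {
        inFlight.decrementAndGet();
        finishedCount.incrementAndGet();
        totalDurationNanos.addAndGet(nowNanos - startedAtNanos);
    }

    int concurrentRequests() {
        return inFlight.get();
    }

    double averageWaitMillis() {
        long count = startedCount.get();
        return count == 0 ? 0.0 : totalWaitNanos.get() / 1_000_000.0 / count;
    }

    double averageDurationMillis() {
        long count = finishedCount.get();
        return count == 0 ? 0.0 : totalDurationNanos.get() / 1_000_000.0 / count;
    }

    // A pod reports itself as overloaded when either limit is exceeded.
    boolean overloaded(int maxConcurrent, double maxAverageWaitMillis) {
        return concurrentRequests() > maxConcurrent
                || averageWaitMillis() > maxAverageWaitMillis;
    }
}
```

Whether one overloaded pod or all of them should trigger a scale-out is exactly the strategy question raised on this slide; the sketch only produces the per-pod signal.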
Slide 38
Thanks for listening Q&A Alexander Reelsen alex@elastic.co | @spinscale
Slide 39
Discussion
What technologies would you use?
Where did I go wrong?
Alex, this is not how you do it in k8s world!11!!elf! - I’m sure, please talk to me