A presentation at Kubernetes Community Days Berlin in in Berlin, Germany by Alexander Reelsen
Deploying and running your first application on K8s Alexander Reelsen alex@elastic.co | @spinscale
Today’s goal How to build, run & maintain a modern java web application with minimal resources on K8s
Rocket-science free zone! Level: Intro Perspective: Developer, user of existing K8s cluster Journey from nothing to downtime free rollout during peak traffic
About me Developer & Advocate @Elastic PaaS fan, IaC fan K8s skeptic: Primitives level of abstraction First rule of SWE: Don’t write code, if you don’t want to maintain it…
Elastic Community Conference Organized by the Elastic Community Team Virtual Around the clock Several languages No talks from Elastic Community Team members 2021 was a success, 70 talks
2022: ElasticCC Registration via Elastic Cloud
Discussion Decision: Build vs. Buy (Registration, Live Streaming) Platform: PaaS vs. K8s (no approval required) Datastore: SQL vs. Elastic Cloud vs. API Let’s do this: Own web application
Discussion Decision: Build vs. Buy (Registration, Live Streaming) Platform: PaaS vs. K8s (no approval required) Datastore: Sql vs. Elastic Cloud vs. API Let’s do this: Own web application Use your own technologies in production —Me
Login via Cloud
Schedule
Feedback
Architecture
How to build, run & maintain No other teams involved after initial setup Collective ownership within the team Well tested
… a modern java web application Javalin as a framework Latest Java version Latest GC (ZGC) pac4j for SAML based authorization Frontend for backend developers with htmx and hyperscript New Elasticsearch Java Client Elastic APM Agent
… with minimal resources Small pods Fast rollouts No one working full time on this No user accounts/passwords should be stored Easy rollout for everyone in the community team
… on K8s Utilizing company wide resources Rollout: docker build && docker push && kubectl restart … imagePullPolicy: Always
Secrets with Vault apiVersion: vaultproject.io/v1 kind: SecretClaim metadata: name: elasticcc-app namespace: community spec: type: Opaque path: secret/k8s/elasticcc-app renew: 3600
Secrets with Vault apiVersion: apps/v1 kind: Deployment spec: template: spec: containers: - name: elasticcc-app env: - name: ELASTICSEARCH_PASSWORD valueFrom: secretKeyRef: name: elasticcc-app key: elasticsearch_password
Secrets with Vault vault write secret/k8s/elasticcc-app \ elasticsearch_password=S3cr3t \ key=value
Rollouts without downtime apiVersion: apps/v1 kind: Deployment spec: replicas: 1 strategy: rollingUpdate: maxUnavailable: 0 type: RollingUpdate
Rollouts without downtime Just start more pods… not so easy Requests are distributed via round robin Javalin is a Servlet based web framework with a notion of sessions… … each user gets a session cookie with a corresponding map of attributes on the server side Server side: User user = ctx.sessionAttribute(“user”) Instance shutdown kills session Session fixation? Works until shutdown…
Rollouts without downtime this.app = Javalin.create(cfg -> { cfg.sessionHandler(() -> createSessionHandler(elasticsearchClient)); }); public static SessionHandler createSessionHandler(ElasticsearchClient client) { SessionHandler sessionHandler = new SessionHandler(); // session handler setup here… SessionCache sessionCache = new NullSessionCache(sessionHandler); sessionCache.setSaveOnCreate(true); sessionCache.setFlushOnResponseCommit(true); sessionCache.setSessionDataStore(new ElasticsearchSessionDataStore(client)); } sessionHandler.setSessionCache(sessionCache); return sessionHandler;
Rollouts without downtime Every request writes its session data to Elasticsearch when finished Bad idea! The internet consists of bots… a lot 100k requests per hour before the announcement due to security scanners Solution: Only persist session if a login/logout has happened prior Major reduction of Elasticsearch write operations, resulting in faster responses
No announcement, but 100k req/hour? apiVersion: extensions/v1beta1 kind: Ingress metadata: name: elasticcc-app-ngx namespace: community annotations: kubernetes.io/ingress.class: nginx cert-manager.io/cluster-issuer: letsencrypt-production
Probes livenessProbe: failureThreshold: 3 periodSeconds: 30 httpGet: path: /monitoring/health port: 8080 readinessProbe: failureThreshold: 15 initialDelaySeconds: 10 periodSeconds: 5 httpGet: path: /monitoring/health port: 8080
Setting JVM memory resources: requests: cpu: 2.0 memory: 1Gi limits: cpu: 2.0 memory: 1Gi def jvmOptions = [“-XX:+UseZGC”, “-Xmx768m”] startScripts { defaultJvmOpts = jvmOptions }
Monitoring spec: template: metadata: annotations: watcher.alerts.slack: “#community-downtime-notifications” labels: app: elasticcc-app watcher: enabled
Observability Tradeoff GraalVM for speed and lower memory footprint APM agents require bytecode instrumentation
Observability
Observability
Debugging Logs were not on the same instance, adding friction Logs required k8s configuration change in our case, tedious Component that shipped logs over the network would have been great Do you really need logs, when exceptions are logged?
Missing Automatic rollouts Stateful services outsourced Setup-as-code (i.e. via terraform to also include Elasticsearch cluster) APM tooling can be tricky, hard to distinguish single service memory spikes when running several pods
Conference day APM early detected an exception thrown when a template was rendered Rolled out before main traffic was coming in No issue during the 12 hours of the conference > 170k valid requests served in total, 1.7 mio in total 95th percentile: /schedule : 8.8ms /speaker/{id} : 5.0ms /session/{id} : 5.5ms
Agility Log4Shell: From slack notification to assessing to rollout in 14 minutes Impact: Dropped the little one later to kindergarten
Summary 10/10 Would do again! Don’t go crazy on automation (i.e. push on rollout etc) Go with Cookie based session store? Go crazy on IaC! Logs should be easily accessible, just like APM data Level of abstraction:
Summary: Level of abstraction Primitives are designed for operations (CPU, Memory) When to scale up/out? Application hint required: # of concurrent requests duration of requests wait time until processed Scaling strategy: Start pods if one is overloaded? Or all? Talk to developers about this, the discussions within your company (especially with legacy apps) will be a great exercise for everyone
Thanks for listening Q&A Alexander Reelsen alex@elastic.co | @spinscale
Discussion What technologies would you use? Where did I go wrong? Alex, this is not how you do it in k8s world!11!!elf! - I’m sure, please talk to me
Thanks for listening Q&A Alexander Reelsen alex@elastic.co | @spinscale
You need to get a small web application up and running, without much fuss and complexity? Well, lucky you, if there is a dedicated k8s cluster already running within the company, whose resources you can use. But what are the next steps from a developer perspective? How to get up and running, how to manage secrets, ensure logs can be accessed, making sure your application scales, handles rollouts and shares state properly? This talk walks you through the steps we took within Elastic, when getting the JVM based Elastic Community Conference application from nothing to running in production within a few weeks, while juggling all the other conference organizing tasks. Was k8s the best & easiest choice? Anything we would do differently after our first k8s experience?