Cloud Native Computing Rheinland Breaking Free with Open Standards: OpenTelemetry and Perses for Observability Kasper Borg Nissen, Principal Developer Advocate at Dash0 kaspernissen.xyz /in/kaspernissen kaspernissen

Who?
● Principal Developer Advocate at Dash0
● KubeCon+CloudNativeCon EU/NA 24/25 Co-Chair (former)
● Author of OpenTelemetry for Dummies
● CNCF Ambassador
● Golden Kubestronaut
● CNCG Aarhus, KCD Denmark Organizer
● Co-founder & Community Lead Cloud Native Nordics

tl;dr; OpenTelemetry is standardizing telemetry collection. Perses is standardizing dashboarding. Applying Platform Engineering principles transforms observability into a seamless, scalable, and developer-friendly experience. Building on Open Standards allows you to freely move between vendors, ensuring they stay on their toes and provide you the best possible experience.

Today… … observability promised a lot.

Observability promised a lot
Expectation:
● Faster root cause analysis
● Understanding system behavior
● Lower MTTR
● Reduced guesswork
Reality:
● Where do I start?
● Which tool is right?
● Why is this metric spiking?
● Human correlation

The current reality: The browser tabs of Observability
● LOGS, METRICS, TRACES, RUM, PROFILING
● level=DEBUG, level=debug, LEVEL=DEBUG
● DISTRIBUTED SYSTEMS: 100s/1000s of Microservices
● Finding the needle in the haystack

The current reality: fragmentation
● Complex Query Languages
● Vendor lock-in
● No instrumentation due to high complexity
● Metadata Inconsistency
● Lack of unified insights

The cost & complexity paradox
More telemetry, more tooling - same time to recovery
[Chart: Cost, Complexity, and MTTR shown as Relative Impact over Time]

Up to 84% of current observability users struggle with the costs and complexity of their daily monitoring responsibilities.
Gartner Hype Cycle Report, 2025
Source: https://www.gartner.com/en/documents/6755734

A shift is happening.

A shift toward correlation
● Find related information
● Jump between signals
● Reconstruct chain of events
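
Reconstructing a chain of events across signals could look roughly like the sketch below. It assumes that spans and log records both carry a `traceId` and a timestamp. The record shapes, the field names, and the `timelineForTrace` helper are hypothetical and chosen only for illustration; they are not taken from any OpenTelemetry SDK.

```javascript
// Hypothetical, simplified records; real telemetry carries many more fields.
const spans = [
  { traceId: 'abc', name: 'GET /checkout', timeMs: 100 },
  { traceId: 'abc', name: 'SELECT orders', timeMs: 130 },
  { traceId: 'def', name: 'GET /health', timeMs: 110 },
];
const logs = [
  { traceId: 'abc', body: 'payment declined', timeMs: 140 },
  { traceId: 'def', body: 'ok', timeMs: 115 },
];

// Merge both signals for one trace into a single time-ordered list.
function timelineForTrace(traceId, allSpans, allLogs) {
  const events = [
    ...allSpans
      .filter((s) => s.traceId === traceId)
      .map((s) => ({ signal: 'trace', timeMs: s.timeMs, text: s.name })),
    ...allLogs
      .filter((l) => l.traceId === traceId)
      .map((l) => ({ signal: 'log', timeMs: l.timeMs, text: l.body })),
  ];
  return events.sort((a, b) => a.timeMs - b.timeMs);
}

console.log(timelineForTrace('abc', spans, logs));
```

The shared identifier is what lets you jump from a span to the log lines written while it was active, instead of matching timestamps by eye.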

A shift toward…

OpenTelemetry
OpenTelemetry (OTel) is an open source project designed to provide standardized tools and APIs for generating, collecting, and exporting telemetry data such as traces, metrics, and logs.
The de-facto standard for distributed tracing, metrics, logs, profiling & RUM (experimental).
The main goals of the project are:
● Unified telemetry
● Vendor neutrality
● Cross platform

OpenTelemetry in a nutshell
OpenTelemetry is a toolkit and a specification.
What it is
◗ Data models
◗ API specifications
◗ Semantic conventions
◗ Library implementations in many languages
◗ Utilities
◗ and much more
What it is NOT
◗ Proprietary
◗ An all-in-one observability tool
◗ A data storage or dashboarding solution
◗ A query language
◗ A Performance Optimizer
◗ Feature complete

1/1/2025 1/1/2026
Commits: 37.959 PRs+Issues: 46.709
Commits: 53.495 PRs+Issues: 40.597
Source: CNCF Velocity Report

49% of respondents are using OpenTelemetry in production.
26% of respondents are evaluating OpenTelemetry.
Source: https://www.cncf.io/wp-content/uploads/2026/01/CNCF_Annual_Survey_Report_final.pdf

Signals
● METRICS: 42
● LOGS: 20/JUN/2025 “GET / HTTP/1.1” 200
● TRACES
● PROFILES
● RUM

Let’s stop talking about the three pillars of observability … Kill The Three Pillars Manifesto Metrics Logs Traces

Let’s stop talking about the three pillars of observability … We don’t have a metrics problem, or a tracing problem. We have systems problems. Metrics Logs Traces

Correlation is the superpower
● METRICS: 42
● LOGS: 20/JUN/2025 “GET / HTTP/1.1” 200
● TRACES
● PROFILES
● RUM

OpenTelemetry: A 1000-mile view
● Generate and Emit: Instrumentation (OTel API & SDK, auto-instrumentation, …) and Infrastructure (Kubernetes, …)
● Collect, Convert, Process, Route, Export: The OpenTelemetry Collector (Receive, Process, Export)
● Store & Analyze: Telemetry Backends (Time-series database, Log database, Trace database, …) and Analysis Tools
● Generate and Emit → transmit → Collect, Convert, Process, Route, Export → transmit → Store & Analyze
Inspired by visualizations from LFS148
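
The receive, process, export flow can be pictured with a toy pipeline. This is a conceptual sketch in plain JavaScript, not Collector code or Collector configuration: `runPipeline`, `addAttribute`, `dropDebugLogs`, and `exportToMemory` are hypothetical stand-ins for receivers, processors, and exporters, and the sample records are made up.

```javascript
// Toy model of a pipeline: received batch -> processors -> exporters.
function runPipeline(batch, processors, exporters) {
  const processed = processors.reduce((items, step) => step(items), batch);
  exporters.forEach((exportFn) => exportFn(processed));
  return processed;
}

// Processor: enrich every record "in transit" with an extra attribute.
const addAttribute = (key, value) => (items) =>
  items.map((i) => ({ ...i, attributes: { ...i.attributes, [key]: value } }));

// Processor: drop noisy records before they reach a backend.
const dropDebugLogs = (items) => items.filter((i) => i.severity !== 'DEBUG');

// Exporter: an in-memory array stands in for a telemetry backend.
const stored = [];
const exportToMemory = (items) => { stored.push(...items); };

const result = runPipeline(
  [
    { body: 'cache miss', severity: 'DEBUG', attributes: {} },
    { body: 'order failed', severity: 'ERROR', attributes: {} },
  ],
  [dropDebugLogs, addAttribute('k8s.cluster.name', 'demo')],
  [exportToMemory, (items) => console.log(JSON.stringify(items))],
);
```

Because routing and enrichment happen in the middle stage, swapping or adding a backend only touches the exporter list, not the applications that emit the data.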

OpenTelemetry: A 1000-mile view
● Collection of Telemetry is standardized: OTel API & SDK, auto-instrumentation, Infrastructure (Kubernetes, …), The OpenTelemetry Collector (Receive, Process, Export)
● “The last observability agent you will ever install”
● Vendor space: Store & Analyze, … and many more.
● Generate and Emit → transmit → Collect, Convert, Process, Route, Export → transmit → Store & Analyze

Telemetry without context is just data

What are we looking at?

What are we looking at?
[Chart: “Cuteness Vs Number of legs”, Reddit /r/funny (circa 2010). X-axis: Number of Legs (0 to 8). Y-axis: Cuteness, ranging from “Awww… Adorable!” through Cute, Pretty Normal, Unfortunate, and Creepy to “Gaah! Kill it! Kill it!”]

How we talk about system context
1 Organization (By whom): Which team owns it? “Who you gonna call?” …
2 Architecture (What / Why): Which service / system component is this?
3 Compute (How/2): Which container? Which process? Pid? Startup args? Which runtime is it? Node.js? JVM? .NET? Which build? Which version? …
4 Platform (How): Kubernetes? Which cluster / namespace / deployment / cronjob / job / pod? AWS ECS? Which cluster / service / task? …
5 Infrastructure (Where): Which datacenter / Cloud region / availability zone / account does it run in? …

How to set resource attributes?
● Resource detectors & manual “hard-coding”.
● OTEL_RESOURCE_ATTRIBUTES env var
● Added to telemetry “in transit” using the OpenTelemetry Collector.
import { NodeSDK } from '@opentelemetry/sdk-node';
import { ConsoleSpanExporter } from '@opentelemetry/sdk-trace-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { envDetector, processDetector, Resource } from '@opentelemetry/resources';
import { awsEcsDetector } from '@opentelemetry/resource-detector-aws';
const sdk = new NodeSDK({
  traceExporter: new ConsoleSpanExporter(),
  // Metric exporter and more are skipped here. See
  // https://opentelemetry.io/docs/languages/js/getting-started/nodejs/
  instrumentations: [getNodeAutoInstrumentations()],
  // Specify which resource detectors to use
  resourceDetectors: [envDetector, processDetector, awsEcsDetector],
  // Hard-coded resource (NodeSDK takes a single `resource` option)
  resource: new Resource({
    team: 'awesome',
  }),
});
sdk.start();
Sample initialization of the OpenTelemetry JS Distro in a Node.js application
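
The `OTEL_RESOURCE_ATTRIBUTES` variable holds comma-separated `key=value` pairs. SDKs parse it for you, so the hand-rolled parser below is only a sketch to make the format concrete. `parseResourceAttributes` is a hypothetical helper, and the sample values are made up.

```javascript
// Parse "k1=v1,k2=v2" into a plain object. Values may be percent-encoded.
function parseResourceAttributes(raw) {
  const attributes = {};
  for (const pair of (raw || '').split(',')) {
    const idx = pair.indexOf('=');
    if (idx <= 0) continue; // skip empty or malformed entries
    const key = pair.slice(0, idx).trim();
    const value = pair.slice(idx + 1).trim();
    attributes[key] = decodeURIComponent(value);
  }
  return attributes;
}

// In a real process this string would come from process.env.
const example = 'service.name=checkout,service.version=1.4.2,team=awesome';
console.log(parseResourceAttributes(example));
```

Setting the variable in a Deployment manifest or a CI job is often the lowest-effort way to attach ownership and version information without touching application code.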

Telemetry without semantic conventions is just data

Semantic Conventions
Semantic Conventions define a common set of (semantic) attributes which provide meaning to data when collecting, producing and consuming it.
https://github.com/open-telemetry/semantic-conventions
Semantic Conventions by signals:
● Events: Semantic Conventions for event data.
● Logs: Semantic Conventions for logs data.
● Metrics: Semantic Conventions for metrics.
● Resource: Semantic Conventions for resources.
● Trace: Semantic Conventions for traces and spans.
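
To see why shared attribute names matter, consider two services that describe the same HTTP request with different keys. The sketch below maps them onto names from the semantic conventions (`http.request.method`, `http.response.status_code`). The ad-hoc key names, the `ALIASES` table, and the `normalize` helper are invented for illustration.

```javascript
// Hypothetical ad-hoc keys mapped to semantic-convention attribute names.
const ALIASES = {
  method: 'http.request.method',
  http_verb: 'http.request.method',
  status: 'http.response.status_code',
  statusCode: 'http.response.status_code',
};

// Rename known ad-hoc keys; leave everything else untouched.
function normalize(attributes) {
  const out = {};
  for (const [key, value] of Object.entries(attributes)) {
    const name = Object.hasOwn(ALIASES, key) ? ALIASES[key] : key;
    out[name] = value;
  }
  return out;
}

const fromServiceA = normalize({ method: 'GET', status: 200 });
const fromServiceB = normalize({ http_verb: 'GET', statusCode: 200 });
// Both records now share keys, so one query or dashboard covers both services.
console.log(fromServiceA, fromServiceB);
```

With conventions applied at the source, this kind of after-the-fact renaming is unnecessary, which is the point of the slide.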

OpenTelemetry semantic conventions to context layers
NOT A COMPREHENSIVE LIST!
1 Organization: 😢
2 Architecture: Service (stable) and (experimental), Deployment Environment
3 Compute: Telemetry SDK (stable) and (experimental), Compute Unit and Instance, Operating System, Process & Process Runtimes, Device, Browser, Webengine, …
4 Platform: Kubernetes, Cloud (cloud.platform specifically), Cloud-provider specific, …
5 Infrastructure: Cloud (general stuff)
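
One way to use this mapping is to check which context layers a resource actually describes. The sketch below is an illustration under assumptions: the grouping of attribute names into layers follows the slide only loosely, `coveredLayers` is a hypothetical helper, and `team` is treated as a custom attribute on the assumption that ownership has no convention of its own.

```javascript
// Assumed grouping of a few well-known attribute names by context layer.
const LAYERS = {
  organization: ['team'], // custom attribute, assumed not standardized
  architecture: ['service.name', 'service.version', 'deployment.environment.name'],
  compute: ['process.pid', 'process.runtime.name', 'container.id'],
  platform: ['k8s.cluster.name', 'k8s.namespace.name', 'k8s.pod.name'],
  infrastructure: ['cloud.provider', 'cloud.region', 'cloud.availability_zone'],
};

// Return the layers for which the resource carries at least one attribute.
function coveredLayers(resource) {
  return Object.entries(LAYERS)
    .filter(([, keys]) => keys.some((k) => k in resource))
    .map(([layer]) => layer);
}

const resource = {
  'service.name': 'checkout',
  'k8s.namespace.name': 'shop',
  'cloud.region': 'eu-central-1',
};
console.log(coveredLayers(resource));
```

A check like this could run in CI or in a Collector-side validation step to flag telemetry that is missing, for example, any organization-level context.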

So, why OpenTelemetry?
● Instrument once, use everywhere
● Separate telemetry generation from analysis
● Make software observable by default
● Improve how we use telemetry
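
Separating generation from analysis means the backend becomes a matter of configuration. A minimal sketch, assuming the standard `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_EXPORTER_OTLP_HEADERS` environment variables: the `exporterConfigFromEnv` helper and the vendor URLs are hypothetical, and real SDKs read these variables themselves.

```javascript
// Build exporter settings from an env-like object (pass process.env in practice).
function exporterConfigFromEnv(env) {
  const headers = {};
  for (const pair of (env.OTEL_EXPORTER_OTLP_HEADERS || '').split(',')) {
    const idx = pair.indexOf('=');
    if (idx > 0) headers[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  }
  return {
    // Fallback is the conventional local OTLP/HTTP port.
    endpoint: env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318',
    headers,
  };
}

// Same application code, two different backends (URLs and tokens are made up).
const vendorA = exporterConfigFromEnv({
  OTEL_EXPORTER_OTLP_ENDPOINT: 'https://otlp.vendor-a.example',
  OTEL_EXPORTER_OTLP_HEADERS: 'Authorization=Bearer token-a',
});
const vendorB = exporterConfigFromEnv({
  OTEL_EXPORTER_OTLP_ENDPOINT: 'https://otlp.vendor-b.example',
});
console.log(vendorA, vendorB);
```

Nothing in the instrumented code changes between the two configurations, which is what "instrument once, use everywhere" amounts to in practice.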

That’s all great, but how do I make it easily accessible for my developers?

The dual role of Platform Engineers in Observability

● Observe the platform (cloud, cluster, CI/CD, shared DBs, etc.)
● Enable developers (traces, metrics, logs, profiling)

What types of Telemetry do I need?
[Diagram: prevalent telemetry types per layer, ranging from infrastructure context to application context. Layers: End-user devices and IoT; Runtimes, applications and services; Cloud, FaaS, Container orchestration; Operating system; Virtualisation; Bare metal.]
Based on: “What is observability?” by ubuntu.com

Platform Engineering for Observability
Core Requirements:
● Self-Service Experience
● Explicit and Consistent APIs
● Golden Paths
● Modularity
● Platform as a Product

Platform as a Product 🥳
[Diagram: Developer; Product 1, Product 2, Product 3; Platform; Kubernetes]

Paved Paths for Observability
[Diagram: a Paved Observability Path covering Instrumentation, Collectors, Logs, Metrics, Traces, Storage, and a Correlation Engine]

That’s all great, but I ask again, how do I do that?

The answer: Auto-instrumentation + Operators = No-touch Instrumentation

OpenTelemetry Operator
● Instrumentation
● OpenTelemetryCollector
● OpAMPBridge
● TargetAllocator

Operators as the delivery mechanism
● Instrumentation: instructs how to inject auto-instrumentation
● OpenTelemetry Operator: injects instrumentation into the pod
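
In practice this usually means an `Instrumentation` resource plus an opt-in annotation on the workload. The sketch below builds both as plain objects and prints JSON that could be piped to `kubectl apply -f -`. It assumes the operator's `opentelemetry.io/v1alpha1` API and the `instrumentation.opentelemetry.io/inject-nodejs` annotation; the resource names, namespace, image, and collector endpoint are placeholders, and `withNodeInjection` is a hypothetical helper.

```javascript
// Instrumentation resource: tells the operator how to configure injected agents.
const instrumentation = {
  apiVersion: 'opentelemetry.io/v1alpha1',
  kind: 'Instrumentation',
  metadata: { name: 'default-instrumentation', namespace: 'shop' },
  spec: {
    exporter: { endpoint: 'http://otel-collector:4317' }, // placeholder endpoint
    propagators: ['tracecontext', 'baggage'],
  },
};

// Return a copy of a Deployment with the opt-in annotation on its pod template.
function withNodeInjection(deployment) {
  const copy = JSON.parse(JSON.stringify(deployment));
  const template = copy.spec.template;
  template.metadata = template.metadata || {};
  template.metadata.annotations = {
    ...template.metadata.annotations,
    'instrumentation.opentelemetry.io/inject-nodejs': 'true',
  };
  return copy;
}

// Placeholder workload; only the fields relevant to the example are shown.
const deployment = {
  apiVersion: 'apps/v1',
  kind: 'Deployment',
  metadata: { name: 'checkout', namespace: 'shop' },
  spec: {
    template: {
      metadata: { labels: { app: 'checkout' } },
      spec: { containers: [{ name: 'app', image: 'example/checkout:1.4.2' }] },
    },
  },
};

console.log(JSON.stringify([instrumentation, withNodeInjection(deployment)], null, 2));
```

The developer-facing surface is a single annotation; everything else is owned by the platform team, which is what makes the approach "no-touch".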

Observability doesn’t stop at instrumentation.

Perses An open specification for dashboards.

Dashboards as code
● Perses
● perses-operator: PersesDashboard, PersesDatasource
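
A dashboard defined as code might look roughly like the object below, printed as JSON for use with `kubectl apply -f -`. Treat this as a sketch under assumptions: the `perses.dev/v1alpha1` API version, the panel and layout field names, and the 24-column grid reflect one reading of the Perses dashboard model and should be checked against the Perses and perses-operator documentation for your version. The dashboard name, namespace, and PromQL query are placeholders.

```javascript
// Panels are keyed by name so that layouts can reference them.
const panels = {
  requestRate: {
    kind: 'Panel',
    spec: {
      display: { name: 'Request rate' },
      plugin: { kind: 'TimeSeriesChart', spec: {} },
      queries: [{
        kind: 'TimeSeriesQuery',
        spec: {
          plugin: {
            kind: 'PrometheusTimeSeriesQuery',
            // Placeholder query; the metric name depends on your setup.
            spec: { query: 'sum(rate(http_server_request_duration_seconds_count[5m]))' },
          },
        },
      }],
    },
  },
};

const dashboard = {
  apiVersion: 'perses.dev/v1alpha1', // assumed API version
  kind: 'PersesDashboard',
  metadata: { name: 'checkout-overview', namespace: 'shop' },
  spec: {
    display: { name: 'Checkout overview' },
    duration: '1h',
    panels,
    layouts: [{
      kind: 'Grid',
      spec: {
        // One full-width row per panel, stacked top to bottom.
        items: Object.keys(panels).map((name, i) => ({
          x: 0, y: i * 6, width: 24, height: 6,
          content: { $ref: `#/spec/panels/${name}` },
        })),
      },
    }],
  },
};

console.log(JSON.stringify(dashboard, null, 2));
```

Because the dashboard is plain data, it can live next to the service's source code, go through review, and be generated from templates like any other manifest.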

Observability doesn’t stop at instrumentation
● How telemetry is produced and collected: Your environment (k8s, cloud, etc), OSS
● Where telemetry lives and how it’s accessed: Vendors
● How humans and agents understand the system: Explorers, Dashboards, Alerting, Synthetic Checks, Service Maps, Agents, Cost Insights, … and many more.

A quick word about AI Garbage In Garbage Out

A quick word about AI Specifications and Semantic Conventions Your Telemetry

Demo

Deployment DaemonSet

Putting it all together…

Golden defaults for developers

Observability is evolving - fast.

OpenTelemetry is standardizing telemetry collection.

Perses is standardizing dashboarding.

Applying Platform Engineering principles can transform observability from an afterthought into a seamless, scalable, and developer-friendly experience.

Observability is a systems problem - not a tracing, logging, or metrics problem.

When we connect signals together, we empower developers to solve problems faster.

And last but not least, building on open standards allows you to freely move between vendors, ensuring they stay on their toes and provide you the best possible experience.

https://university.platformengineering.org/observability-for-platform-engineering

Get a free copy!

Follow us on Merge Forward (CNCF) Building a stronger open source future together! Learn more #merge-forward on Slack! community.cncf.io/merge-forward

Thank you! Kasper Borg Nissen, Principal Developer Advocate at Dash0 kaspernissen.xyz /in/kaspernissen kaspernissen
