Golden Paths for Async Workflows: Dapr Meets OpenTelemetry
Mauricio “Salaboy” Salatino and Kasper Borg Nissen

Who?

Mauricio Salatino
• Ecosystem Engineer & Passionate about Open Source
• Platform Engineering on Kubernetes Author

Kasper Borg Nissen
• Principal Developer Advocate at Dash0
• Former KubeCon Co-Chair NA/EU
• CNCF Ambassador
• Golden Kubestronaut
• CNCG Aarhus
• Cloud Native Denmark
• Cloud Native Nordics

TLTW (too long to watch)
If your applications and infrastructure grow in complexity, you must have the right tools to understand what is going on at all times.

Why Async is Powerful

The Rise of Async Microservices

Limitations with synchronous communication

Async?

Synchronous communication is familiar, Asynchronous communication is powerful, and in real systems you need both to work seamlessly together.

Demo #1

Recap

The Golden Path - Dapr + OpenTelemetry

The CNCF Opportunity: Shared Abstractions

Dapr Building Blocks
• APIs to help developers build scalable and resilient distributed applications
• PubSub, Workflows, Secrets/Configs, Conversation (LLMs), etc.
• Behind the scenes, all these APIs implement cross-cutting concerns:
  • Security
  • Resilience
  • Observability

How does it work?

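helm repo add dapr https://dapr.github.io/helm-charts/
helm install dapr dapr/dapr --namespace dapr-system --create-namespace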

Sidecars to the rescue
The application can use the PubSub APIs to publish messages.
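As a rough illustration of what "using the PubSub APIs" can look like from the application side, the sketch below builds the HTTP request an app would send to its local sidecar. It assumes the sidecar listens on port 3500 (Dapr's default HTTP port) and uses a hypothetical pub/sub component named `pubsub` and a hypothetical topic named `orders`; the class and method names are made up for this example, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class PizzaOrderPublisher {
    // Assumption: the Dapr sidecar's HTTP API is reachable on localhost:3500.
    static final String SIDECAR = "http://localhost:3500";

    /** Builds the publish request; the caller decides when to send it. */
    static HttpRequest publishRequest(String pubsubName, String topic, String jsonBody) {
        // Dapr's publish endpoint has the shape /v1.0/publish/<pubsub>/<topic>.
        URI uri = URI.create(SIDECAR + "/v1.0/publish/" + pubsubName + "/" + topic);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

The application only talks to localhost; which broker sits behind the `pubsub` component (Kafka, for instance) is a deployment detail the code never sees.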

Service-to-Service Invocation API
No need to complicate application logic with retries or circuit breakers (CBs).
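To make concrete what the sidecar takes off the application's hands, here is a deliberately naive sketch of the retry and circuit-breaker logic a service might otherwise hand-roll around every remote call. Everything in it (`NaiveResilience`, the threshold, the absence of backoff delays) is hypothetical and simplified; it is not how Dapr implements resilience, only the kind of code the slide argues you can leave out.

```java
import java.util.function.Supplier;

/** Illustrative only: hand-written resilience logic around a remote call. */
final class NaiveResilience {
    private final int failureThreshold;
    private int consecutiveFailures = 0;

    NaiveResilience(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    /** The breaker is "open" once too many calls in a row have failed. */
    boolean isOpen() {
        return consecutiveFailures >= failureThreshold;
    }

    /** Tries the call up to maxAttempts times, failing fast if the breaker opens. */
    <T> T call(Supplier<T> remoteCall, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isOpen()) {
                throw new IllegalStateException("circuit open: failing fast");
            }
            try {
                T result = remoteCall.get();
                consecutiveFailures = 0; // a success resets the failure count
                return result;
            } catch (RuntimeException e) {
                consecutiveFailures++;
                last = e; // remember the failure and try again
            }
        }
        throw last; // retries exhausted
    }
}
```

Even this toy version leaves out backoff, timeouts, and half-open probing, which is part of the argument for moving the concern out of application code.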

OK, but what happens when things go wrong?
• What happens if the kitchen service is down and the retries are exhausted?
• What happens if Kafka is down?
• We cannot leave our pizza customers without their pizzas!

Dapr Workflows: Resilient orchestrations
• Workflows are defined in code, executed by the Dapr sidecar
• Durable, long-running state management
• Retries, timers, wait-for-events all included
• No single point of failure or SaaS service needed
• Workflows will keep trying no matter what goes down! (even the workflow runtime!!!)
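One way to build intuition for "durable" workflows defined in code is replay: completed steps are recorded in a persistent history, and after a crash the workflow function runs again from the top, with already-recorded steps returning their stored results instead of executing a second time. The toy below sketches that idea in plain Java. `ReplayContext` and `PizzaWorkflow` are hypothetical names for this illustration and are not the Dapr Workflow SDK; the in-memory list stands in for a durable state store.

```java
import java.util.List;
import java.util.function.Supplier;

/** Toy replay-based workflow context (not the Dapr SDK). */
final class ReplayContext {
    private final List<String> history; // stand-in for a durable log of step results
    private int cursor = 0;

    ReplayContext(List<String> history) {
        this.history = history;
    }

    /** Runs the activity once; on replay, returns the recorded result instead. */
    String callActivity(String name, Supplier<String> activity) {
        if (cursor < history.size()) {
            return history.get(cursor++); // step finished before the restart: skip it
        }
        String result = activity.get();
        history.add(result); // record the result before moving to the next step
        cursor++;
        return result;
    }
}

final class PizzaWorkflow {
    /** Three sequential steps; sideEffects records which activities really ran. */
    static String run(ReplayContext ctx, List<String> sideEffects) {
        String order = ctx.callActivity("store", () -> {
            sideEffects.add("store");
            return "order-1";
        });
        String cooked = ctx.callActivity("kitchen", () -> {
            sideEffects.add("kitchen");
            return order + ":cooked";
        });
        return ctx.callActivity("delivery", () -> {
            sideEffects.add("delivery");
            return cooked + ":delivered";
        });
    }
}
```

If the history already contains the result of the first step when `run` is called, only the kitchen and delivery activities execute, which is the behaviour that lets an orchestration resume after the runtime itself has gone down.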

How does it work?

It is not that simple

Pizza Orchestration

It is not that simple ++

OpenTelemetry
OpenTelemetry (OTel) is an open source project designed to provide standardized tools and APIs for generating, collecting, and exporting telemetry data such as traces, metrics, and logs.
It is the de facto standard for distributed tracing and also supports metrics, logs, RUM, and profiling (experimental).
The main goals of the project are:
• Unified telemetry
• Vendor-neutrality
• Cross-platform

1/1/2024 to 1/1/2025
Commits: 27,168
PRs+Issues: 58,508
Source: CNCF Velocity

Commits: 44,486
PRs+Issues: 56,299

OpenTelemetry Collector

OpenTelemetry Operator

Making sense of all the complexity

Two Perspectives, One Goal

Who Owns Tracing? A Hidden Conundrum

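import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;

// Manual instrumentation: the application creates and ends its own span.
Tracer tracer = GlobalOpenTelemetry.getTracer("application");
Span span = tracer.spanBuilder("doWork").startSpan();
…
span.end();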

Why Observability is Critical with Dapr

Why Async Is Hard to Observe

A Shared Pain: Context Gets Lost

Trace propagation with Dapr

Context Propagation for Async Workflows
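A minimal sketch of the idea, under stated assumptions: when a message crosses an async boundary, there is no call stack or open HTTP request to carry the trace, so the producer copies the trace context into the message metadata and the consumer reads it back. The in-memory queue below stands in for a broker, and `AsyncContextDemo`, `Message`, and `TOPIC` are hypothetical names; real instrumentation would use a tracing library's propagator rather than string handling.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class AsyncContextDemo {
    /** Message envelope: the payload plus metadata headers that travel with it. */
    static final class Message {
        final String payload;
        final Map<String, String> headers;

        Message(String payload, Map<String, String> headers) {
            this.payload = payload;
            this.headers = headers;
        }
    }

    // Stand-in for a broker topic; a real system would use Kafka or similar.
    static final Queue<Message> TOPIC = new ConcurrentLinkedQueue<>();

    /** Producer side: copy the active trace context into the message headers. */
    static void publish(String payload, String traceparent) {
        Map<String, String> headers = new HashMap<>();
        headers.put("traceparent", traceparent);
        TOPIC.add(new Message(payload, headers));
    }

    /** Consumer side: returns the trace ID carried by the next message, or null. */
    static String consumeTraceId() {
        Message msg = TOPIC.poll();
        if (msg == null || !msg.headers.containsKey("traceparent")) {
            return null; // no message, or the context was lost along the way
        }
        // traceparent fields are dash-separated; the second field is the trace ID.
        return msg.headers.get("traceparent").split("-")[1];
    }
}
```

If any hop between producer and consumer drops the header, the consumer starts a fresh trace and the end-to-end view falls apart, which is the failure mode this part of the talk is concerned with.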

W3C Trace Context
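For reference, a W3C `traceparent` header (version `00`) has four dash-separated lowercase hex fields: a 2-character version, a 32-character trace ID, a 16-character parent (span) ID, and 2-character flags whose lowest bit marks the trace as sampled. The sketch below parses and re-emits that header in plain Java; the `TraceParent` class is a hypothetical helper for illustration, not part of any OpenTelemetry or Dapr API, and it ignores `tracestate` and any flag bits other than "sampled".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceParent {
    // Version 00: version, 32-hex trace ID, 16-hex parent ID, 2-hex flags.
    private static final Pattern FORMAT =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    final String traceId;
    final String parentId;
    final boolean sampled;

    private TraceParent(String traceId, String parentId, boolean sampled) {
        this.traceId = traceId;
        this.parentId = parentId;
        this.sampled = sampled;
    }

    /** Parses a header value, rejecting malformed or all-zero IDs. */
    static TraceParent parse(String header) {
        Matcher m = FORMAT.matcher(header);
        if (!m.matches()
                || m.group(1).equals("0".repeat(32))
                || m.group(2).equals("0".repeat(16))) {
            throw new IllegalArgumentException("invalid traceparent: " + header);
        }
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) == 1;
        return new TraceParent(m.group(1), m.group(2), sampled);
    }

    /** Same trace, new span ID: what a hop sends onward to the next service. */
    TraceParent child(String newSpanId) {
        return new TraceParent(traceId, newSpanId, sampled);
    }

    /** Serializes back to header form (only the sampled flag is preserved). */
    String format() {
        return "00-" + traceId + "-" + parentId + "-" + (sampled ? "01" : "00");
    }
}
```

The trace ID stays constant across every hop while the parent ID changes at each one; that shared trace ID is what lets a backend stitch sidecar, application, and workflow spans into a single trace.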

Demo setup

Workflow Details

Demo #2

What Works Today
• Dapr supports OpenTelemetry out of the box
• Sidecar emits spans for pub/sub, service invocation, and workflows
• OpenTelemetry Operator enables auto-instrumentation
• OpenTelemetry Collector handles ingestion, processing, export

Challenges
• Async boundaries break context
• Sidecars add additional hops
• Workflow engines introduce thread + process separation
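The thread-separation point can be reproduced in a few lines. Tracing libraries commonly keep the "current" context in thread-local storage, so a task handed to another thread does not see it unless the context is captured at submission time and restored on the worker. The sketch below uses a plain `ThreadLocal<String>` as a stand-in for that storage; `ContextHandoff` and its members are hypothetical names for this illustration, not an OpenTelemetry API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ContextHandoff {
    // Stand-in for a tracing library's thread-local "current context".
    static final ThreadLocal<String> CURRENT_TRACE = new ThreadLocal<>();

    /** Captures the caller's context now and restores it on the worker thread. */
    static <T> Callable<T> withContext(Callable<T> task) {
        String captured = CURRENT_TRACE.get();
        return () -> {
            String previous = CURRENT_TRACE.get();
            CURRENT_TRACE.set(captured);
            try {
                return task.call();
            } finally {
                CURRENT_TRACE.set(previous); // do not leak context into later tasks
            }
        };
    }

    /** Returns {trace seen by a plain task, trace seen by a wrapped task}. */
    static String[] demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CURRENT_TRACE.set("trace-123");
            Callable<String> readTrace = CURRENT_TRACE::get;
            String plain = pool.submit(readTrace).get();                // context lost
            String wrapped = pool.submit(withContext(readTrace)).get(); // context kept
            return new String[] {plain, wrapped};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            CURRENT_TRACE.remove();
            pool.shutdown();
        }
    }
}
```

In `demo()`, the plain task sees no trace on the worker thread, while the wrapped task sees `trace-123`. Process separation is the same problem one level up: the context has to be serialized and carried explicitly instead of being captured in a closure.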

Context Propagation for Async Workflows

Challenges with gRPC streaming

Gaps and Fixes in Dapr & OTel

• PR #57: Trace context, SemConv
• PR #9213: Trace context, SemConv, Pub/Sub Span kind
• PR #46: Propagating context to executors client side

Enabling the Golden Path…

Get Involved

Thank you!

Abstract
Async workflows power modern microservices, but they can be notoriously hard to observe. In this talk, we show how two CNCF projects - Dapr, for developer-friendly building blocks, and OpenTelemetry, for unified observability - create a golden path that bridges developer productivity and platform reliability. We’ll start by using Dapr Workflows and Pub/Sub to connect and orchestrate services without boilerplate. Then we’ll add the OpenTelemetry Operator for no-touch instrumentation, instantly delivering traces, metrics, and logs - even across asynchronous boundaries. You’ll see current OpenTelemetry capabilities for tracking async requests end-to-end, where the gaps are today, and practical ways to correlate events in complex workflows. Through a live demo, we’ll prove that with the right abstractions, shipping features fast and observing systems deeply can go hand in hand.