Golden Paths for Async Workflows: Dapr Meets OpenTelemetry
Mauricio “Salaboy” Salatino, Ecosystem Engineer at Diagrid.io
Kasper Borg Nissen, Principal Developer Advocate at Dash0

Who?

Mauricio Salatino: Ecosystem Engineer & passionate about Open Source • Author of Platform Engineering on Kubernetes
Kasper Borg Nissen: Principal Developer Advocate at Dash0 • Former KubeCon Co-Chair NA/EU • CNCF Ambassador • Golden Kubestronaut • CNCG Aarhus • Cloud Native Nordics/Denmark

TLTW (too long to watch): If your applications and infrastructure grow in complexity, you must have the right tools to understand what is going on at all times.

Part 1: Why Async is Powerful

The Rise of Async Microservices

Limitations of synchronous communication

Async?

Synchronous communication is familiar, asynchronous communication is powerful, and in real systems you need both to work seamlessly together.
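A minimal, hypothetical Java sketch of the difference, using the pizza domain from the demo: a synchronous order fails when the kitchen is down, while an asynchronous order is accepted into a queue and cooked once the kitchen recovers. `BlockingQueue` stands in for a real broker, and all class and method names are illustrative, not taken from the demo app.

```java
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: the queue stands in for a message broker such as Kafka.
class PizzaOrdering {

    // A kitchen service that can be temporarily down.
    static class Kitchen {
        boolean up = true;

        String cook(String order) {
            if (!up) {
                throw new IllegalStateException("kitchen is down");
            }
            return "cooked " + order;
        }
    }

    // Synchronous: the customer waits on the kitchen and fails with it.
    static String orderSync(Kitchen kitchen, String order) {
        return kitchen.cook(order);
    }

    // Asynchronous: the order is accepted as soon as it is enqueued.
    static boolean orderAsync(BlockingQueue<String> orders, String order) {
        return orders.offer(order);
    }

    // Later, the kitchen drains whatever accumulated while it was down.
    static int drain(Kitchen kitchen, BlockingQueue<String> orders) {
        int cooked = 0;
        String next;
        while ((next = orders.poll()) != null) {
            kitchen.cook(next);
            cooked++;
        }
        return cooked;
    }
}
```

The trade-off shows in the return types: the async caller only learns that the order was accepted, not that it was cooked, and following that order afterwards is exactly the observability problem this talk is about.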

Demo 1: Pizza ordering app

Recap

Part 2: The Golden Path (Dapr + OpenTelemetry)

The CNCF Opportunity: Shared Abstractions

Dapr Building Blocks
• APIs that help developers build scalable and resilient distributed applications
• PubSub, Workflows, Secrets/Configs, Conversation (LLMs), etc.
• Behind the scenes, all these APIs implement cross-cutting concerns:
  • Security
  • Resilience
  • Observability

How does it work?

helm install dapr

Sidecars to the rescue: the application can use the PubSub APIs to publish messages.
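As a sketch of what publishing through the sidecar looks like without any SDK, the request below targets the Dapr HTTP pub/sub endpoint `POST /v1.0/publish/{pubsubname}/{topic}` on localhost. The component name `pubsub` and topic `orders` used with it are assumptions; which broker actually receives the message is decided by the Dapr component configuration, not by this code.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: publish an event via the sidecar's HTTP pub/sub API.
// Component and topic names are supplied by the caller (hypothetical here).
class OrderPublisher {

    // The Dapr sidecar exposes its HTTP port to the app as DAPR_HTTP_PORT; 3500 is the default.
    static int daprHttpPort() {
        String port = System.getenv("DAPR_HTTP_PORT");
        return port == null ? 3500 : Integer.parseInt(port);
    }

    static HttpRequest publishRequest(String pubsubName, String topic, String jsonPayload) {
        URI uri = URI.create("http://localhost:" + daprHttpPort()
                + "/v1.0/publish/" + pubsubName + "/" + topic);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonPayload))
                .build();
    }
}
```

Sending it with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` hands the message to the sidecar; swapping Kafka for another broker would change the component configuration, not this class.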

Service-to-Service Invocation API: no need to complicate application logic with retries or circuit breakers (CBs).
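A minimal sketch of a call through the Dapr HTTP service invocation endpoint `/v1.0/invoke/{app-id}/method/{method}`. The app ID `kitchen-service` and method `prepare` are hypothetical. The point is what is missing: there is no retry loop or circuit breaker in the application, because in Dapr those policies can be declared in a Resiliency resource and applied by the sidecar.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: invoke another Dapr app by its app ID; the sidecar resolves and calls it.
// "kitchen-service" and "prepare" are hypothetical names.
class KitchenClient {

    static HttpRequest prepareRequest(int daprHttpPort, String orderJson) {
        URI uri = URI.create("http://localhost:" + daprHttpPort
                + "/v1.0/invoke/kitchen-service/method/prepare");
        // No retries or circuit breaker here: those are sidecar policy, not app code.
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(orderJson))
                .build();
    }
}
```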

Ok, but what happens when things go wrong?
• What happens if the kitchen service is down and the retries are exhausted?
• What happens if Kafka is down?
• We cannot leave our pizza customers without their pizzas!

Dapr Workflows: Resilient orchestrations
• Workflows are defined in code and executed by the Dapr sidecar
• Durable, long-running state management
• Retries, timers, and wait-for-events all included
• No single point of failure or SaaS service needed
• Workflows keep trying no matter what goes down (even the workflow runtime!)
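Durable workflow engines, Dapr Workflows included (it builds on the Durable Task Framework), record the result of each completed step in a history and replay the orchestration code against that history after a failure. The toy Java sketch below illustrates only that replay idea; it is not the Dapr Workflow SDK, and every type and method in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Toy replay-based durable execution. NOT the Dapr SDK: all names are hypothetical.
class ToyDurableWorkflow {

    // Durable history; a real engine persists this in a state store.
    static class History {
        final List<String> completed = new ArrayList<>();
    }

    static class Context {
        private final History history;
        private int step = 0;

        Context(History history) {
            this.history = history;
        }

        // Replay a recorded result if this step already ran; otherwise run and record it.
        String callActivity(Supplier<String> activity) {
            if (step < history.completed.size()) {
                return history.completed.get(step++);
            }
            String result = activity.get();
            history.completed.add(result);
            step++;
            return result;
        }
    }

    // Plain orchestration code, re-run from the top after a failure.
    static String orderPizza(Context ctx, Supplier<String> pay, Supplier<String> cook) {
        String receipt = ctx.callActivity(pay);
        String pizza = ctx.callActivity(cook);
        return receipt + " + " + pizza;
    }
}
```

Replay is also why workflow code must be deterministic: on the second run, `orderPizza` has to request the same steps in the same order so that recorded results line up with the right calls.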

How does it work?

It is not that simple

Pizza Orchestration

It is not that simple ++

OpenTelemetry
OpenTelemetry (OTel) is an open source project designed to provide standardized tools and APIs for generating, collecting, and exporting telemetry data such as traces, metrics, and logs. It is the de facto standard for distributed tracing and also supports metrics, logs, RUM, and profiling (experimental).

Goals of the project
• Unified telemetry
• Vendor neutrality
• Cross-platform

1/1/20241/1/2025 Commits: 27.168 PRs+Issues: 58.508 Source: CNCF Velocity - Commits: 44.486 PRs+Issues: 56.299

49% of respondents use OpenTelemetry in production, and 26% are evaluating it. Source: https://www.cncf.io/wp-content/uploads/2026/01/CNCF_Annual_Survey_Report_final.pdf

OpenTelemetry Collector

OpenTelemetry Operator

Part 3: Making sense of all the complexity

Two Perspectives, One Goal

Who Owns Tracing? A Hidden Conundrum

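```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

Tracer tracer = GlobalOpenTelemetry.getTracer("application");
Span span = tracer.spanBuilder("doWork").startSpan();
try (Scope scope = span.makeCurrent()) { // make it the parent of nested spans
    // …
} finally {
    span.end(); // always end the span, even on exceptions
}
```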

Why Observability is Critical with Dapr

Why Async Is Hard to Observe

A Shared Pain: Context Gets Lost

Trace propagation with Dapr
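When a request hops through several services, each hop needs to carry the W3C Trace Context headers forward so every span lands in the same trace. OpenTelemetry SDKs and auto-instrumentation normally do this for you; the sketch below shows only the header mechanics with the JDK HTTP client. The inbound header map and the target URI are illustrative assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.Map;

// Sketch: copy W3C Trace Context headers from an inbound request to an outbound call.
// The Map stands in for whatever the web framework exposes (hypothetical).
class TraceHeaderForwarding {

    static final List<String> W3C_HEADERS = List.of("traceparent", "tracestate");

    static HttpRequest forward(Map<String, String> inboundHeaders, URI target) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(target).GET();
        for (String name : W3C_HEADERS) {
            String value = inboundHeaders.get(name);
            if (value != null) { // only forward what actually arrived
                builder.header(name, value);
            }
        }
        return builder.build();
    }
}
```

`HttpHeaders.firstValue` is case-insensitive, but a plain `Map` is not, so a real implementation should normalize inbound header names first.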

Context Propagation for Async Workflows
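Across an async boundary there is no request to copy headers from, so the context has to travel inside the message. The CloudEvents Distributed Tracing extension defines `traceparent` and `tracestate` attributes for exactly this, and Dapr wraps pub/sub messages in CloudEvents by default. The sketch below uses a hypothetical `Message` type with a metadata map as the carrier.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: inject trace context into a message at publish time, extract it at consume time.
// Message is a hypothetical carrier, not a Dapr or CloudEvents SDK type.
class MessageContextPropagation {

    record Message(String body, Map<String, String> metadata) {}

    // Producer side: attach the current traceparent to the message.
    static Message inject(String body, String currentTraceparent) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("traceparent", currentTraceparent);
        return new Message(body, metadata);
    }

    // Consumer side: recover the trace ID (assumes a well-formed version-00 header).
    static String extractTraceId(Message message) {
        String traceparent = message.metadata().get("traceparent");
        if (traceparent == null) {
            return null; // context lost: the consumer would start an unrelated trace
        }
        return traceparent.split("-")[1];
    }
}
```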

W3C Trace Context
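The `traceparent` header has the form `version-traceid-parentid-traceflags`. The sketch below parses a version `00` header and derives the header for an outgoing call made from a new span (same trace ID, new span ID). It is a simplified illustration (for example, it rejects other versions outright), and production code should use an OpenTelemetry propagator instead.

```java
import java.security.SecureRandom;
import java.util.HexFormat;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified traceparent, e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
record TraceParent(String traceId, String parentId, String flags) {

    // Only version 00 is accepted here, a simplification of the spec.
    private static final Pattern FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");
    private static final String ZERO_TRACE_ID = "0".repeat(32);
    private static final String ZERO_SPAN_ID = "0".repeat(16);
    private static final SecureRandom RANDOM = new SecureRandom();

    static TraceParent parse(String header) {
        Matcher m = FORMAT.matcher(header);
        // All-zero trace IDs and parent IDs are invalid.
        if (!m.matches() || m.group(1).equals(ZERO_TRACE_ID) || m.group(2).equals(ZERO_SPAN_ID)) {
            throw new IllegalArgumentException("invalid traceparent: " + header);
        }
        return new TraceParent(m.group(1), m.group(2), m.group(3));
    }

    // Header for a call made from a new span: same trace ID, the new span's ID as parent-id.
    TraceParent child() {
        byte[] bytes = new byte[8];
        String spanId;
        do {
            RANDOM.nextBytes(bytes);
            spanId = HexFormat.of().formatHex(bytes); // lowercase hex
        } while (spanId.equals(ZERO_SPAN_ID));
        return new TraceParent(traceId, spanId, flags);
    }

    String header() {
        return "00-" + traceId + "-" + parentId + "-" + flags;
    }
}
```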

Demo setup

Workflow Details

Demo 2: Putting it all together

What Works Today
• Dapr supports OpenTelemetry out of the box
• The sidecar emits spans for pub/sub, service invocation, and workflows
• Consistent parent-child relationships
• The OpenTelemetry Operator enables auto-instrumentation
• The OpenTelemetry Collector handles ingestion, processing, and export

Challenges
• Async boundaries break context
• Sidecars add additional hops
• Workflow engines introduce thread and process separation
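Why async boundaries break context, in miniature: trace context usually lives in thread-local storage, and a task handed to a thread pool runs on a different thread that never saw it. The sketch captures the context at submit time and restores it on the worker, the same idea as OpenTelemetry Java's `Context.current().wrap(...)`. `CURRENT_TRACE` is a hypothetical stand-in, not the real OpenTelemetry storage.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;

class ExecutorContextDemo {

    // Hypothetical stand-in for the thread-local storage behind OTel's Context.
    static final ThreadLocal<String> CURRENT_TRACE = new ThreadLocal<>();

    // Capture the submitter's context now; install it on the worker thread later.
    static Runnable wrap(Runnable task) {
        String captured = CURRENT_TRACE.get();
        return () -> {
            String previous = CURRENT_TRACE.get();
            CURRENT_TRACE.set(captured);
            try {
                task.run();
            } finally {
                CURRENT_TRACE.set(previous); // don't leak context into pooled threads
            }
        };
    }

    // Runs a task on the pool and reports which trace the worker thread saw.
    static String seenByWorker(ExecutorService pool, boolean wrapped) {
        String[] seen = new String[1];
        Runnable task = () -> seen[0] = CURRENT_TRACE.get();
        try {
            // Future.get() also makes the worker's write visible to this thread.
            pool.submit(wrapped ? wrap(task) : task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```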

Context Propagation for Async Workflows

Challenges with gRPC streaming

Trace Topology Is a Design Choice
• Parent–child implies temporal enclosure
• Links imply a causal relationship
• Async systems force us to choose deliberately

Parent–Child vs Span Links

Parent-Child works when…
• Caller waits for completion
• Temporal enclosure is real
• Execution is synchronous or actively coordinated
Examples:
• HTTP / gRPC calls
• Service invocation
• Workflow orchestration (when the orchestrator is actively running)

Span links fit better when…
• Work is asynchronous
• No blocking relationship exists
• Causality does not imply execution order
Examples:
• Pub/Sub (producer → consumer)
• Fan-out / fan-in
• Fire-and-forget events
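A toy span model (not the OpenTelemetry API) makes the two topologies concrete: a child span shares its parent's trace ID and points at it as parent, while a linked span has no parent and keeps only a causal reference. In this sketch the linked consumer starts a new trace, which is one possible choice. In OpenTelemetry Java the corresponding builder calls are `SpanBuilder.setParent(Context)` and `SpanBuilder.addLink(SpanContext)`.

```java
import java.util.List;

// Toy span model; all types are hypothetical, not OpenTelemetry classes.
class TraceTopology {

    record SpanRef(String traceId, String spanId) {}

    record Span(String name, SpanRef ref, SpanRef parent, List<SpanRef> links) {}

    // Parent-child: the callee joins the caller's trace, nested under the caller.
    static Span childOf(String name, Span parent, String spanId) {
        return new Span(name, new SpanRef(parent.ref().traceId(), spanId), parent.ref(), List.of());
    }

    // Span link: a new root span that keeps a causal pointer to the producer.
    static Span linkedTo(String name, Span cause, String traceId, String spanId) {
        return new Span(name, new SpanRef(traceId, spanId), null, List.of(cause.ref()));
    }
}
```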

Why we started here
• Optimize for human understanding first
• Make the workflow readable as a single story
• Refine semantics once the mental model is clear

Proposing a maturity model for OpenTelemetry support
• A shared framework for evaluating OpenTelemetry support
• Inspired by the CNCF Platform Engineering maturity model
• Descriptive, not prescriptive
• Focused on evolution, not scoring
GitHub issue: https://github.com/open-telemetry/community/issues/3247

OpenTelemetry maturity evolves through real fixes
Level 0: Instrumented
Telemetry exists primarily to support internal debugging and development needs. OpenTelemetry is not yet a primary design concern.
Level 1: OpenTelemetry Aligned
OpenTelemetry is explicitly supported, often alongside legacy approaches. Telemetry works for common scenarios, but legacy assumptions still influence design.
• Context propagation exists and works
• Parent–child relationships everywhere
Level 2: OpenTelemetry Native
OpenTelemetry is the primary integration surface. Telemetry is designed intentionally, with correlation and user experience in mind.
• Intentional trace design (span links at async boundaries)
Level 3: OpenTelemetry Optimized
OpenTelemetry support is continuously refined based on real-world usage and feedback. Telemetry is treated as a long-lived product surface.
• Custom semantic conventions, refinement, and stewardship

Gaps and Fixes in Dapr & OTel
• PR #57: Trace context, SemConv
• PR #9213: Trace context, SemConv, Pub/Sub span kind
• PR #46: Propagating context to executors (client side)
If you want to learn more, check out Dapr University: https://www.diagrid.io/dapr-university#dapr-workflow

Get Involved

Enabling the Golden Path…

That’s all folks! Thank you!

Get in touch with us!
Kasper Borg Nissen

Mauricio Salatino