Slide 1
Golden Paths for Async Workflows:
Dapr Meets OpenTelemetry
Mauricio “Salaboy” Salatino, Ecosystem Engineer at Diagrid.io
Kasper Borg Nissen, Principal Developer Advocate at Dash0
Slide 2
Who?
Mauricio Salatino
• Ecosystem Engineer
• Passionate about Open Source
• Author of Platform Engineering on Kubernetes
Kasper Borg Nissen
• Principal Developer Advocate at Dash0
• Former KubeCon Co-Chair NA/EU
• CNCF Ambassador
• Golden Kubestronaut
• CNCG Aarhus
• Cloud Native Nordics/Denmark
Slide 3
TLTW (too long to watch)
If your applications and infrastructure grow in complexity, you must have the right tools to understand what is going on at all times.
Slide 4
Part 1
Why Async is Powerful
Slide 5
The Rise of Async Microservices
Slide 6
Limitations with synchronous communication
Slide 7
Slide 8
Synchronous communication is familiar, asynchronous communication is powerful, and in real systems you need both to work seamlessly together.
Slide 9
Demo 1:
Pizza ordering app
Slide 10
Slide 11
Part 2:
The Golden Path: Dapr + OpenTelemetry
Slide 12
The CNCF Opportunity: Shared Abstractions
Slide 13
Slide 14
Dapr Building Blocks
• APIs to help developers build scalable and resilient distributed applications
• PubSub, Workflows, Secrets/Configs, Conversation (LLMs), etc.
• All these APIs, behind the covers, implement cross-cutting concerns
  • Security
  • Resilience
  • Observability
Slide 15
Slide 16
Slide 17
Slide 18
Slide 19
Slide 20
How does it work?
helm install dapr
Slide 21
Sidecars to the rescue
The application can use the PubSub APIs to publish messages
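As a minimal sketch of what publishing through the sidecar could look like from Java, the snippet below builds a request against Dapr's HTTP publish endpoint using only the JDK HTTP client. The component name `pubsub`, the topic `orders`, the payload, and the default sidecar port 3500 are assumptions for illustration and are not taken from the demo.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class PublishOrderSketch {

    // Builds (but does not send) a publish call to the local Dapr sidecar.
    // Assumption: the sidecar listens on its default HTTP port, 3500.
    public static HttpRequest buildPublishRequest(String pubsubName, String topic, String jsonBody) {
        URI uri = URI.create("http://localhost:3500/v1.0/publish/" + pubsubName + "/" + topic);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical component, topic, and payload.
        HttpRequest request = buildPublishRequest("pubsub", "orders", "{\"pizza\":\"margherita\"}");
        System.out.println(request.method() + " " + request.uri());
        // Sending requires a running sidecar, for example:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding());
    }
}
```

The application only talks to localhost; which broker sits behind the `pubsub` component is a matter of Dapr configuration, not application code.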
Slide 22
Service-to-Service Invocation API
No need to complicate application logic with retries or circuit breakers.
Slide 23
OK, but what happens when things go wrong?
• What happens if the kitchen service is down and the retries are exhausted?
• What happens if Kafka is down?
• We cannot leave our pizza customers without their pizzas!
Slide 24
Dapr Workflows: Resilient orchestrations
• Workflows are defined in code, executed by the Dapr sidecar
• Durable, long-running state management
• Retries, timers, wait-for-events all included
• No single point of failure or SaaS service needed
• Workflows will keep trying no matter what goes down! (even the workflow runtime!!!)
Slide 25
Slide 26
Slide 27
Slide 28
Slide 29
Slide 30
OpenTelemetry
OpenTelemetry (OTel) is an open source project designed to provide standardized tools and APIs for generating, collecting, and exporting telemetry data such as traces, metrics, and logs. It is the de facto standard for distributed tracing and also supports metrics, logs, RUM, and profiling (experimental).
Goals of the project
• Unified telemetry
• Vendor-neutrality
• Cross-platform
Slide 31
1/1/2024
1/1/2025
Commits: 27.168 PRs+Issues: 58.508
Source: CNCF Velocity
Commits: 44.486 PRs+Issues: 56.299
Slide 32
49%
of respondents using OpenTelemetry in production.
26%
of respondents evaluating OpenTelemetry.
Source: https://www.cncf.io/wp-content/uploads/2026/01/CNCF_Annual_Survey_Report_final.pdf
Slide 33
Slide 34
Slide 35
Slide 36
Slide 37
Part 3:
Making sense of all the complexity
Slide 38
Two Perspectives, One Goal
Slide 39
Who Owns Tracing? A Hidden Conundrum
Slide 40
Slide 41
Slide 42
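import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;

// Obtain a tracer from the globally registered OpenTelemetry instance.
Tracer tracer = GlobalOpenTelemetry.getTracer("application");
// Start a span around the unit of work and end it when the work is done.
Span span = tracer.spanBuilder("doWork").startSpan();
…
span.end();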
Slide 43
Why Observability is Critical with Dapr
Slide 44
Why Async Is Hard to Observe
Slide 45
A Shared Pain: Context Gets Lost
Slide 46
Trace propagation with Dapr
Slide 47
Context Propagation for Async Workflows
Slide 48
Slide 49
Slide 50
Slide 51
Demo 2:
Putting it all together
Slide 52
What Works Today
• Dapr supports OpenTelemetry out of the box
• Sidecar emits spans for pub/sub, service invocation, and workflows
• Consistent parent-child relationships
• OpenTelemetry Operator enables auto-instrumentation
• OpenTelemetry Collector handles ingestion, processing, export
Slide 53
Slide 54
Challenges
• Async boundaries break context
• Sidecars add additional hops
• Workflow engines introduce thread + process separation
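To make "async boundaries break context" concrete, here is a minimal sketch of carrying a W3C `traceparent` value inside message metadata and picking it up again on the consumer side. It uses only the JDK; the record, the method names, and the metadata map are hypothetical and do not represent Dapr's or OpenTelemetry's implementation. The header value is the example from the W3C Trace Context specification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

public class TraceparentSketch {

    /** The fields of a version-00 W3C traceparent header (hypothetical helper type). */
    public record TraceParent(String traceId, String spanId, String flags) {

        public static TraceParent parse(String header) {
            String[] parts = header.split("-");
            if (parts.length != 4 || !parts[0].equals("00")
                    || parts[1].length() != 32 || parts[2].length() != 16 || parts[3].length() != 2) {
                throw new IllegalArgumentException("not a version-00 traceparent: " + header);
            }
            return new TraceParent(parts[1], parts[2], parts[3]);
        }

        public String format() {
            return "00-" + traceId + "-" + spanId + "-" + flags;
        }

        /** Same trace id, fresh span id: the consumer continues the producer's trace. */
        public TraceParent newChild() {
            long id;
            do {
                id = ThreadLocalRandom.current().nextLong();
            } while (id == 0L); // an all-zero span id is invalid
            return new TraceParent(traceId, String.format("%016x", id), flags);
        }
    }

    public static void main(String[] args) {
        // Producer side: copy the active context into the message metadata.
        Map<String, String> metadata = new HashMap<>();
        metadata.put("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");

        // Consumer side, possibly much later and in another process:
        // without the metadata entry there is nothing to continue from.
        TraceParent incoming = TraceParent.parse(metadata.get("traceparent"));
        TraceParent consumerSpan = incoming.newChild();
        System.out.println(consumerSpan.format());
    }
}
```

If any hop (broker, sidecar, or worker thread) drops the metadata entry, the consumer has to start a new, disconnected trace.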
Slide 55
Context Propagation for Async Workflows
Slide 56
Challenges with gRPC streaming
Slide 57
Trace Topology Is a Design Choice
• Parent–child implies temporal enclosure
• Links imply causal relationship
• Async systems force us to choose deliberately
Slide 58
Parent–Child vs Span Links
Parent–child works when…
• Caller waits for completion
• Temporal enclosure is real
• Execution is synchronous or actively coordinated
Examples:
• HTTP / gRPC calls
• Service invocation
• Workflow orchestration (when the orchestrator is actively running)

Span links fit better when…
• Work is asynchronous
• No blocking relationship exists
• Causality does not imply execution order
Examples:
• Pub/Sub (producer → consumer)
• Fan-out / fan-in
• Fire-and-forget events
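The difference between the two topologies can be sketched with a deliberately simplified span model. The types and helper methods below are hypothetical and use only the JDK; real OpenTelemetry spans carry far more data. Starting the linked span in a new trace is one possible choice made here for illustration, not the only way to use links.

```java
import java.util.List;

public class TraceTopologySketch {

    /** Pointer to another span (hypothetical, simplified). */
    public record SpanRef(String traceId, String spanId) {}

    /** Simplified span: parentSpanId is null for a root span. */
    public record SpanModel(String name, String traceId, String spanId,
                            String parentSpanId, List<SpanRef> links) {}

    /** Parent-child: the new span joins the caller's trace and records the caller as parent. */
    public static SpanModel childOf(SpanModel parent, String name, String spanId) {
        return new SpanModel(name, parent.traceId(), spanId, parent.spanId(), List.of());
    }

    /** Span link: the new span is a root in its own trace and points back at what caused it. */
    public static SpanModel linkedTo(SpanModel cause, String name, String traceId, String spanId) {
        SpanRef ref = new SpanRef(cause.traceId(), cause.spanId());
        return new SpanModel(name, traceId, spanId, null, List.of(ref));
    }

    public static void main(String[] args) {
        SpanModel publish = new SpanModel("publish order", "trace-1", "span-1", null, List.of());

        // Synchronous call: the caller waits, so temporal enclosure is real.
        SpanModel invoke = childOf(publish, "invoke kitchen", "span-2");

        // Asynchronous consumer: causally related, but not enclosed by the producer.
        SpanModel consume = linkedTo(publish, "process order", "trace-2", "span-3");

        System.out.println(invoke);
        System.out.println(consume);
    }
}
```

With parent-child, the consumer's work appears nested inside the producer's span; with a link, the two appear as separate stories that reference each other.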
Slide 59
Why we started here
• Optimize for human understanding first
• Make the workflow readable as a single story
• Refine semantics once the mental model is clear
Slide 60
Proposing a maturity model for OpenTelemetry support
• A shared framework for evaluating OpenTelemetry support
• Inspired by the CNCF Platform Engineering maturity model
• Descriptive, not prescriptive
• Focused on evolution, not scoring
GitHub issue: https://github.com/open-telemetry/community/issues/3247
Slide 61
OpenTelemetry maturity evolves through real fixes
Level 0: Instrumented
Telemetry exists primarily to support internal debugging and development needs. OpenTelemetry is not yet a primary design concern.
• Context propagation exists and works
Level 1: OpenTelemetry Aligned
OpenTelemetry is explicitly supported, often alongside legacy approaches. Telemetry works for common scenarios, but legacy assumptions still influence design.
• Parent–child relationships everywhere
Level 2: OpenTelemetry Native
OpenTelemetry is the primary integration surface. Telemetry is designed intentionally, with correlation and user experience in mind.
• Intentional trace design (span links at async boundaries)
Level 3: OpenTelemetry Optimized
OpenTelemetry support is continuously refined based on real-world usage and feedback. Telemetry is treated as a long-lived product surface.
• Custom semantic conventions, refinement, and stewardship
GitHub issue: https://github.com/open-telemetry/community/issues/3247
Slide 62
Gaps and Fixes in Dapr & OTel
PR #57: Trace context, SemConv
PR #9213: Trace context, SemConv, Pub/Sub Span kind
PR #46: Propagating context to executors client side
If you want to learn more, check out Dapr University:
https://www.diagrid.io/dapr-university#dapr-workflow
Slide 63
Get Involved
If you want to learn more, check out Dapr University:
https://www.diagrid.io/dapr-university#dapr-workflow
Slide 64
Enabling the Golden Path…
Slide 65
That’s all folks!
Thank you!
Slide 66
Get in touch with us!
Kasper Borg Nissen
Mauricio Salatino