The *Big State* Monster: Taming State Size in Multi-Way Joins with FLIP-516

A presentation at Flink Forward 2025 in Barcelona, Spain by Gustavo de Morais

Real-time data is essential, but the path to it is often fraught with hidden challenges. One of the biggest culprits in complex stream processing applications, particularly when dealing with multiple data sources, is the “big state problem.” When you chain together several binary joins in Apache Flink, each intermediate result needs to be stored in state, leading to an ever-growing monster that devours resources and degrades job performance.
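To make the state growth concrete, here is a minimal toy model in Python. It is not Flink code and does not reflect how Flink's operators actually lay out state; it only counts records under the assumption that every binary join keeps both of its inputs, while a multi-way join keeps only the original input records. The stream names, the shared key `"k"`, and both helper functions are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def chained_binary_state(streams):
    """Count records held in state by a chain of binary joins on a shared key.

    Toy assumption: each binary join stores both of its inputs, and its
    output becomes the left input of the next join in the chain.
    """
    total = 0
    left = streams[0]
    for right in streams[1:]:
        # State of this join: everything seen on both sides.
        total += len(left) + len(right)
        # Index the right side by key so the join output can be built.
        by_key = defaultdict(list)
        for key, value in right:
            by_key[key].append(value)
        # The joined rows are the intermediate result fed to the next join.
        left = [(key, (lv, rv)) for key, lv in left for rv in by_key[key]]
    return total


def multi_way_state(streams):
    """Count records held when only the original inputs are stored."""
    return sum(len(stream) for stream in streams)


# Hypothetical inputs: three streams that all share the join key "k".
orders = [("k", "o1"), ("k", "o2")]
payments = [("k", "p1"), ("k", "p2"), ("k", "p3")]
shipments = [("k", "s1"), ("k", "s2")]
inputs = [orders, payments, shipments]

print(chained_binary_state(inputs))  # 13: 5 in the first join, 6 + 2 in the second
print(multi_way_state(inputs))       # 7: only the original input records
```

In this sketch the six extra records in the chained case are the intermediate orders-payments result that the second join has to keep. With more inputs or more matches per key, that intermediate share grows multiplicatively, which is the effect the talk calls the "big state" problem.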

Key takeaways from this session will include:

- Understanding the “Big State” Problem: Why chaining binary joins leads to excessive state.
- Introducing the StreamingMultiJoinOperator: How it works to reduce state.
- Realized Benefits: Exploring the impact of zero intermediate state and the new optimizations it makes possible.
- Practical Considerations: Discussing current limitations and compatibility.
- Other Approaches: What other strategies exist for dealing with large join jobs?
- What’s Next: Where multi-way joins in Flink are headed.

By the end of this session, you’ll gain a clear understanding of how to mitigate the “big state” problem and unlock the full potential of efficient, high-performance multi-way joins in your Apache Flink streaming applications. The “big state” monster will be far less scary, and your real-time data journey will be smoother than ever!