Real-time SQL stream processing at scale with Apache Kafka and KSQL

A presentation at Strata Data Conference, London, in London, UK, by Robin Moffatt

Have you ever thought that you needed to be a programmer to do stream processing and build streaming data pipelines? Think again. Apache Kafka is a distributed, scalable, and fault-tolerant streaming platform that provides low-latency pub-sub messaging coupled with native storage and stream processing capabilities. Integrating Kafka with RDBMSs, NoSQL databases, and object stores is simple with Kafka Connect, which is part of Apache Kafka. KSQL, the open source SQL streaming engine for Apache Kafka, makes it possible to build stream processing applications at scale, written using a familiar SQL interface.

Robin Moffatt walks you through the architectural reasoning for Apache Kafka and the benefits of real-time integration. You’ll then build a streaming data pipeline using nothing but your bare hands, Kafka Connect, and KSQL.

Gasp as you filter events in real time! Be amazed at how we can enrich event streams with data from an RDBMS! Be astonished at the power of streaming aggregates for anomaly detection!
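
To make the three operations above (filter, enrich, aggregate) a little more concrete, here is a minimal plain-Python sketch of what each one does to a stream of events. It is not KSQL and not the code used in the talk; the `Event` fields, the `customers` lookup table, the 60-second tumbling window, and the anomaly threshold are all hypothetical choices for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Event:
    """A hypothetical event: event time in seconds, a key, and a value."""
    ts: int
    user_id: str
    amount: float


def filter_events(events: Iterable[Event], min_amount: float) -> Iterator[Event]:
    """Filter: pass through only events whose amount meets the threshold."""
    for e in events:
        if e.amount >= min_amount:
            yield e


def enrich(events: Iterable[Event],
           customers: Dict[str, str]) -> Iterator[Tuple[Event, str]]:
    """Enrich: a stream-table join that looks up each event's key in a table
    (standing in for reference data copied from an RDBMS)."""
    for e in events:
        yield e, customers.get(e.user_id, "UNKNOWN")


def windowed_counts(enriched: Iterable[Tuple[Event, str]],
                    window_secs: int) -> Dict[Tuple[str, int], int]:
    """Aggregate: count events per customer per tumbling window,
    keyed by (customer name, window start time)."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for e, name in enriched:
        window_start = e.ts - (e.ts % window_secs)
        counts[(name, window_start)] += 1
    return dict(counts)


def anomalies(counts: Dict[Tuple[str, int], int],
              threshold: int) -> List[Tuple[str, int]]:
    """Flag (customer, window start) pairs whose count exceeds the threshold."""
    return sorted(k for k, n in counts.items() if n > threshold)


if __name__ == "__main__":
    events = [
        Event(0, "u1", 50.0), Event(10, "u1", 5.0), Event(20, "u1", 70.0),
        Event(30, "u1", 90.0), Event(65, "u2", 40.0), Event(70, "u1", 60.0),
    ]
    customers = {"u1": "Alice", "u2": "Bob"}
    counts = windowed_counts(enrich(filter_events(events, 20.0), customers), 60)
    print(counts)                         # {('Alice', 0): 3, ('Bob', 60): 1, ('Alice', 60): 1}
    print(anomalies(counts, threshold=2))  # [('Alice', 0)]
```

The generator chain keeps the filter and the join record-at-a-time, as a streaming engine would; the aggregate is collected into a dict here only for simplicity, whereas a streaming engine keeps updating the counts as new events arrive. In KSQL the same steps would roughly correspond to a `WHERE` clause, a stream-table `JOIN`, and a windowed `GROUP BY` with `HAVING`.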

Topics include:

  • Introduction to Apache Kafka (including Kafka Connect for streaming data from databases into Apache Kafka)
  • Streaming concepts (all data is events; stream/table duality)
  • Introduction to KSQL
  • How to run KSQL
  • Exploring Kafka topics in KSQL
  • Defining KSQL streams and tables over source data
  • Filtering data in KSQL
  • Joining data in KSQL
  • Aggregating data in KSQL
  • Persisting stream queries
  • Examining derived Apache Kafka topics
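
The "stream/table duality" in the topic list can be illustrated without Kafka at all. The plain-Python sketch below (not KSQL, and not taken from the talk) treats a stream as an ordered log of key/value changes and a table as the latest value per key. Using `None` as a delete marker mirrors the tombstone convention of Kafka log compaction; the record shape and the example keys are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (key, value); a value of None plays the role of a tombstone (delete).
Record = Tuple[str, Optional[str]]


def stream_to_table(changelog: Iterable[Record]) -> Dict[str, str]:
    """Fold a changelog stream into a table: the latest value per key wins."""
    table: Dict[str, str] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # tombstone removes the key
        else:
            table[key] = value    # upsert
    return table


def table_to_stream(before: Dict[str, str], after: Dict[str, str]) -> List[Record]:
    """Turn the difference between two table states back into a change stream."""
    changes: List[Record] = []
    for key in sorted(set(before) | set(after)):
        if key not in after:
            changes.append((key, None))        # key deleted
        elif before.get(key) != after[key]:
            changes.append((key, after[key]))  # key inserted or updated
    return changes


if __name__ == "__main__":
    changelog = [("alice", "London"), ("bob", "Leeds"),
                 ("alice", "Manchester"), ("bob", None), ("carol", "York")]
    table = stream_to_table(changelog)
    print(table)  # {'alice': 'Manchester', 'carol': 'York'}
    print(table_to_stream(table, {"alice": "Leeds"}))
    # [('alice', 'Leeds'), ('carol', None)]
```

Replaying a table's change stream on top of its earlier state reproduces the later state, which is the sense in which a stream and a table are two views of the same data.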
