A presentation at Kafka Summit San Francisco 2019 in San Francisco, CA, USA by Robin Moffatt
Building stream processing applications with Apache Kafka® using KSQL @rmoff #KafkaSummit
STREAM PROCESSING
PROCESSING a STREAM of EVENTS
STREAMS of EVENTS ARE EVERYWHERE
A Customer Experience
A Sale
A Sensor Reading
An Application Log Entry
Databases
Immutable event log
Streams of events (axis: Time)
Stream Processing with KSQL
Stream: widgets
    CREATE STREAM widgets_red AS
      SELECT * FROM widgets WHERE colour='RED';
Stream: widgets_red
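As a rough mental model of the `widgets_red` statement, the sketch below filters a stream one event at a time in plain Python. It is not KSQL and says nothing about how KSQL is implemented; the sample events and the `filter_stream` helper are hypothetical.

```python
from typing import Dict, Iterable, Iterator

Event = Dict[str, str]


def filter_stream(events: Iterable[Event], colour: str) -> Iterator[Event]:
    """Lazily pass through only the events whose 'colour' field matches."""
    for event in events:
        if event.get("colour") == colour:
            yield event


# Hypothetical sample events standing in for the 'widgets' stream.
widgets = [
    {"id": "w1", "colour": "RED"},
    {"id": "w2", "colour": "BLUE"},
    {"id": "w3", "colour": "RED"},
]

widgets_red = list(filter_stream(widgets, "RED"))
print(widgets_red)
```

Because `filter_stream` is a generator, it never needs the whole input at once, which is the property that matters for an unbounded stream.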
Stream Processing with KSQL: Source stream; Analytics; Applications / Microservices
KSQL in action 🚀
DEMO
Code! https://rmoff.dev/kssf19-ksql-code
MQTT + Kafka + KSQL + Elastic = ❤
http://confluent.cloud/signup
Interacting with KSQL 📬
KSQL - Confluent Control Center
KSQL - CLI
KSQL - REST API
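A minimal sketch of what a REST API call might look like from Python. It only builds the request and does not send it, so it runs without a server. The server address `http://localhost:8088` is an assumption, and `build_ksql_request` is a hypothetical helper; the `/ksql` endpoint and the `application/vnd.ksql.v1+json` content type are the ones the KSQL REST API documents.

```python
import json
import urllib.request

# Assumption: a KSQL server listening locally on port 8088.
KSQL_SERVER = "http://localhost:8088"


def build_ksql_request(statement: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the server's /ksql endpoint."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{KSQL_SERVER}/ksql",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


request = build_ksql_request("LIST STREAMS;")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To actually submit the statement, the request object could be passed to `urllib.request.urlopen` against a running server.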
KSQL operations and deployment 💾
KSQL in Development and Production
Interactive KSQL for development and testing (REST): "Hmm, let me try out this idea…"
Headless KSQL for Production: Desired KSQL queries have been identified
How to run KSQL
KSQL Server (JVM process)
DEB, RPM, ZIP, TAR downloads: http://confluent.io/ksql
Docker images: confluentinc/cp-ksql-server, confluentinc/cp-ksql-cli
…and many more…
Think Applications, not database instances
Monitoring KSQL: Confluent Control Center, JMX
https://www.confluent.io/blog/troubleshooting-ksql-part-2
http://cnfl.io/book-bundle
#EOF
💬 Join the Confluent Community Slack group at http://cnfl.io/slack
https://talks.rmoff.net
Related Talks
• The Changing Face of ETL: Event-Driven Architectures for Data Engineers
• Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
• ATM Fraud detection with Kafka and KSQL
• No More Silos: Integrating Databases and Apache Kafka
• Embrace the Anarchy: Apache Kafka's Role in Modern Data Architectures
Bonus content!
KSQL in action 🚀
Filtering with KSQL
ORDERS
    CREATE STREAM ORDERS_NY AS
      SELECT * FROM ORDERS WHERE ADDRESS->STATE='New York';
ORDERS_NY
Schema manipulation with KSQL
ORDERS
    { "ordertime": 1560070133853, "orderid": 67, "itemid": "Item_9", "orderunits": 5, "address": { "street": "243 Utah Way", "city": "Orange", "state": "California" } }
KSQL
    CREATE STREAM ORDERS_NO_ADDRESS_DATA AS
      SELECT ORDERTIME, ORDERID, ITEMID, ORDERUNITS
      FROM ORDERS;
KSQL
    CREATE STREAM ORDERS_NO_ADDRESS_DATA AS
      SELECT TIMESTAMPTOSTRING(ROWTIME, 'yyyy-MM-dd HH:mm:ss') AS ORDER_TIMESTAMP, ORDERID, ITEMID, ORDERUNITS
      FROM ORDERS;
ORDERS_NO_ADDRESS_DATA
    { "order_ts": 1560070133853, "orderid": 67, "itemid": "Item_9", "orderunits": 5 }
Schema manipulation with KSQL
ORDERS
KSQL
    CREATE STREAM ORDERS_FLAT AS
      SELECT […]
             ADDRESS->STREET AS ADDRESS_STREET,
             ADDRESS->CITY AS ADDRESS_CITY,
             ADDRESS->STATE AS ADDRESS_STATE
      FROM ORDERS;
ORDERS_FLAT
    {"ordertime": 1560070133853, "orderid": 67, "itemid": "Item_9", "orderunits": 5, "address-street": "243 Utah Way", "address-city": "Orange", "address-state": "California"}
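The flattening step can be pictured in plain Python as promoting the fields of the nested `address` struct to top-level keys. This is only an illustration of the transformation, not of KSQL itself; the `flatten` helper and its `_` separator are hypothetical choices that mirror the `ADDRESS_STREET`-style aliases in the statement.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], separator: str = "_") -> Dict[str, Any]:
    """Promote the fields of one level of nested dicts to top-level keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for child_key, child_value in value.items():
                flat[f"{key}{separator}{child_key}"] = child_value
        else:
            flat[key] = value
    return flat


# The ORDERS record shown on the slide.
order = {
    "ordertime": 1560070133853,
    "orderid": 67,
    "itemid": "Item_9",
    "orderunits": 5,
    "address": {"street": "243 Utah Way", "city": "Orange", "state": "California"},
}

order_flat = flatten(order)
print(order_flat)
```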
Reserialising data with KSQL
ORDERS
    {"ordertime": 1560070133853, "orderid": 67, "itemid": "Item_9", "orderunits": 5, "address-street": "243 Utah Way", "address-city": "Orange", "address-state": "California"}
KSQL
    CREATE STREAM ORDERS_CSV WITH (VALUE_FORMAT='DELIMITED') AS
      SELECT * FROM ORDERS_FLAT;
ORDERS_CSV
    1560045914101,24644,Item_0,1,43078 De
    1560047305664,24643,Item_29,3,209 Mon
    1560057079799,24642,Item_38,18,3 Autu
    1560088652051,24647,Item_6,6,82893 Ar
    1560105559145,24648,Item_0,12,45896 W
    1560108336441,24646,Item_33,4,272 Hef
    1560123862235,24641,Item_15,16,0 Dort
    1560124799053,24645,Item_12,1,71 Knut
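To make the change of serialisation concrete, here is a small Python sketch that writes the flattened record from the slide as a single comma-separated line. It only illustrates the idea of the same fields in a different wire format; `to_delimited` is a hypothetical helper, and the exact quoting rules of KSQL's `DELIMITED` format are not modelled.

```python
import csv
import io
from typing import Any, Dict


def to_delimited(record: Dict[str, Any]) -> str:
    """Serialise the record's values, in field order, as one comma-separated line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="").writerow(record.values())
    return buffer.getvalue()


# The flattened record shown on the slide.
order_flat = {
    "ordertime": 1560070133853,
    "orderid": 67,
    "itemid": "Item_9",
    "orderunits": 5,
    "address-street": "243 Utah Way",
    "address-city": "Orange",
    "address-state": "California",
}

line = to_delimited(order_flat)
print(line)
```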
Lookups and Joins with KSQL
ORDERS
    {"ordertime": 1560070133853, "orderid": 67, "itemid": "Item_9", "orderunits": 5}
ITEMS
    { "id": "Item_9", "make": "Boyle-McDermott", "model": "Apiaceae", "unit_cost": 19.9 }
KSQL
    CREATE STREAM ORDERS_ENRICHED AS
      SELECT O.*, I.*,
             O.ORDERUNITS * I.UNIT_COST AS TOTAL_ORDER_VALUE
      FROM ORDERS O
           INNER JOIN ITEMS I
           ON O.ITEMID = I.ID;
ORDERS_ENRICHED
    { "ordertime": 1560070133853, "orderid": 67, "itemid": "Item_9", "orderunits": 5, "make": "Boyle-McDermott", "model": "Apiaceae", "unit_cost": 19.9, "total_order_value": 99.5 }
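The enrichment can be sketched in plain Python as a lookup of each order's `itemid` in a table keyed by item `id`. This is a simplified model under stated assumptions, not KSQL's join implementation: the `enrich` helper is hypothetical, the item's `id` column is dropped to match the output shown on the slide, and the second order (`Item_404`) is invented to show that an inner join discards unmatched events.

```python
from typing import Any, Dict, Iterable, Iterator

Record = Dict[str, Any]


def enrich(orders: Iterable[Record], items: Dict[str, Record]) -> Iterator[Record]:
    """Inner-join each order to the item table and add a derived total column."""
    for order in orders:
        item = items.get(order["itemid"])
        if item is None:
            continue  # inner join: orders without a matching item are dropped
        enriched = {**order, **{k: v for k, v in item.items() if k != "id"}}
        enriched["total_order_value"] = round(
            order["orderunits"] * item["unit_cost"], 2
        )
        yield enriched


# ITEMS keyed by id, using the record from the slide.
items = {
    "Item_9": {"id": "Item_9", "make": "Boyle-McDermott", "model": "Apiaceae", "unit_cost": 19.9},
}

orders = [
    {"ordertime": 1560070133853, "orderid": 67, "itemid": "Item_9", "orderunits": 5},
    # Hypothetical order with no matching item.
    {"ordertime": 1560070133900, "orderid": 68, "itemid": "Item_404", "orderunits": 1},
]

orders_enriched = list(enrich(orders, items))
print(orders_enriched)
```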
Connecting to other systems with Kafka Connect
KSQL
    CREATE STREAM ORDERS_ENRICHED AS
      SELECT […]
      FROM ORDERS O
           INNER JOIN ITEMS I
           ON O.ITEMID = I.ID;
Kafka Connect
Stateful Aggregation with KSQL
ORDERS
    SELECT MAKE, COUNT(*) AS ORDER_COUNT
      FROM ORDERS_ENRICHED
      GROUP BY MAKE;
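What makes this aggregation stateful is that a running count per `MAKE` has to be kept between events. The Python sketch below models that idea by emitting the updated count every time an event arrives. It is a simplification, not KSQL's implementation; `count_by_make` is a hypothetical helper and the `Acme` make is invented sample data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def count_by_make(events: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, int]]:
    """Keep a running count per make and emit the new count after each event."""
    counts: Dict[str, int] = defaultdict(int)  # the state held between events
    for event in events:
        make = event["make"]
        counts[make] += 1
        yield make, counts[make]


# Hypothetical sample of enriched orders; only the 'make' field is used.
orders_enriched = [
    {"make": "Boyle-McDermott"},
    {"make": "Acme"},
    {"make": "Boyle-McDermott"},
]

updates = list(count_by_make(orders_enriched))
print(updates)
```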
Transform data with KSQL - merge streams
ORDERS (US)
    INSERT INTO ORDERS_COMBINED
      SELECT 'US' AS SOURCE, ORDERTIME, ITEMID, ORDERUNITS, ADDRESS
      FROM ORDERS;
ORDERS_UK (UK)
    INSERT INTO ORDERS_COMBINED
      SELECT 'UK' AS SOURCE, ORDERTIME, ITEMID, ORDERUNITS, ADDRESS
      FROM ORDERS_UK;
ORDERS_COMBINED
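A small Python sketch of the merge: each input is tagged with a constant `source` value and the two are combined into one stream. Interleaving by `ordertime` with `heapq.merge` is an assumption made to keep the example deterministic and is not a statement about the order KSQL writes records in. The sample orders and the `tag_source` helper are hypothetical.

```python
import heapq
from typing import Any, Dict, Iterable, Iterator

Record = Dict[str, Any]


def tag_source(events: Iterable[Record], source: str) -> Iterator[Record]:
    """Add a constant 'source' field to every event of one input stream."""
    for event in events:
        yield {"source": source, **event}


# Hypothetical sample data, each input already ordered by 'ordertime'.
orders = [{"ordertime": 1, "itemid": "Item_1"}, {"ordertime": 3, "itemid": "Item_3"}]
orders_uk = [{"ordertime": 2, "itemid": "Item_2"}]

orders_combined = list(
    heapq.merge(
        tag_source(orders, "US"),
        tag_source(orders_uk, "UK"),
        key=lambda event: event["ordertime"],
    )
)
print(orders_combined)
```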
Transform data with KSQL - split streams
ORDERS_COMBINED
    CREATE STREAM ORDERS_US AS
      SELECT * FROM ORDERS_COMBINED WHERE SOURCE='US';
    CREATE STREAM ORDERS_UK AS
      SELECT * FROM ORDERS_COMBINED WHERE SOURCE='UK';
ORDERS_US
ORDERS_UK
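Splitting is the same filter applied once per output. As a plain-Python illustration (not KSQL), the hypothetical `split_by_source` helper below routes each event of a combined stream to a list named after its `source` value; the sample events are invented.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def split_by_source(events: Iterable[Record]) -> Dict[str, List[Record]]:
    """Route every event to the output that matches its 'source' field."""
    outputs: Dict[str, List[Record]] = defaultdict(list)
    for event in events:
        outputs[event["source"]].append(event)
    return outputs


# Hypothetical combined stream.
orders_combined = [
    {"source": "US", "itemid": "Item_1"},
    {"source": "UK", "itemid": "Item_2"},
    {"source": "US", "itemid": "Item_3"},
]

outputs = split_by_source(orders_combined)
orders_us, orders_uk = outputs["US"], outputs["UK"]
print(orders_us)
print(orders_uk)
```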
Apache Kafka is a de facto standard streaming data processing platform, being widely deployed as a messaging system, and having a robust data integration framework (Kafka Connect) and stream processing API (Kafka Streams) to meet the needs that commonly attend real-time message processing. But there’s more!
KSQL is a declarative, SQL-like stream processing language that lets you easily define powerful stream-processing applications. What once took some moderately sophisticated Java code can now be done at the command line with a familiar and eminently approachable syntax.
Filtering one stream of data into another, creating derived columns, even joining two topics together—it’s all possible with KSQL. Come to this talk for a thorough overview of KSQL. There’ll be plenty of live coding on streaming data to illustrate clearly KSQL’s awesome power!
Here’s what was said about this presentation on social media.
Excited to hear about KSQL from @rmoff - with live demos! #KafkaSummit pic.twitter.com/nknVP7ImW6
— Nikki (@NikkiThean) October 1, 2019
My friend @rmoff on #KafkaSummit explaining the concept of events to set the stage for furthermore explaining stream processing. pic.twitter.com/LGvRxuimuT
— Ricardo Ferreira (@riferrei) October 1, 2019
#KafkaSummit @rmoff KSQL applications pic.twitter.com/mXAxf343Dd
— lenny (@lny) October 1, 2019
— Stefan Frehse (@sfrehse) October 1, 2019
#speakerselfie #StreamingSelfie #KafkaSummit pic.twitter.com/QbqL0LO3TZ
— 𝚁𝚘𝚋𝚒𝚗 𝙼𝚘𝚏𝚏𝚊𝚝𝚝 🍻🏃🥓 (@rmoff) October 1, 2019
@rmoff killing it with the KSQL demo, one of the most amazing presentations this #KafkaSummit ! #kSQL #streaming #Kafka
People, do check the video out when it's out ! :) pic.twitter.com/a3QipPjHxi
— Namit Mahuvakar (@namit_mahuvakar) October 2, 2019
Live coding demo by @rmoff showcasing how to build streaming applications quickly with KSQL and @apachekafka, ft. MQTT and Elasticsearch. Video and slides will be available after #KafkaSummit. https://t.co/mT8fPWXZHA pic.twitter.com/Wwt6kfB7pK
— Michael G. Noll (@miguno) October 2, 2019
Subtle rickroll in @rmoff's data... #nevergonnagiveyouup #KafkaSummit
— Nikki (@NikkiThean) October 2, 2019
@rmoff showing that KSQL can be lots of BS but — in a good way on his session on #KafkaSummit. pic.twitter.com/dgR6mZ3paf
— Ricardo Ferreira (@riferrei) October 2, 2019
Run and race tracking with KSQL! @rmoff #KafkaSummit #kafkasummitbridgerun pic.twitter.com/FS8pWTzBUc
— Nikki (@NikkiThean) October 2, 2019
It’s hard to fill a room in the last slot of the day, but @rmoff does it, talking about KSQL. #kafkasummit pic.twitter.com/4dBLa86vxA
— Tim Berglund (@tlberglund) October 2, 2019
The rockstar speaker and #ksql expert isn't leaving not until all of the questions are answered. What a dedication to the #kafkasummit community pic.twitter.com/GIArocY8kg
— Viktor Gamov @ #KafkaSummit 🌁 (@gAmUssA) October 2, 2019