Game, Set, Match: Transforming Live Sports with AI-Driven Commentary
Dunith Danushka (Redpanda Data)
Mark Needham (ClickHouse)

Dunith Danushka, DevRel @ Redpanda
Mark Needham, Product @ ClickHouse

Can we create an AI Copilot to help live text writers?

What are we going to build?

The flow of events
Run window queries on ClickHouse and pass the results to OpenAI

What is Redpanda?

Redpanda is a Kafka API-compatible streaming data platform
● Not a Kafka fork!
● Kafka rewritten in C++
● Identical read/write interfaces to Kafka
● Designed for modern hardware

Simple to deploy, use and manage
Single binary
Kafka-compatible APIs
Easy Day 2 Ops
Dev-friendly interface
© 2023 REDPANDA DATA

rpk
Redpanda’s command line interface (CLI) utility.
Check health of cluster: rpk cluster health
Create a topic: rpk topic create my-topic -p 5
List topics: rpk topic list
Describe a topic: rpk topic describe

Redpanda Demo

What is ClickHouse?

What is ClickHouse?
Open Source: Developed since 2009; OSS 2016; 34k+ GitHub stars; 1k+ contributors; 500+ releases
Column-Oriented: Files per column; Vectorized query execution; Optimised for aggregations; Sorting and indexing; Background merges
Distributed: Replication; Sharding; Multi-master; Cross-region
OLAP Database: Analytics use cases; Aggregations; Visualizations; Mostly immutable data

Row Oriented vs Column Oriented

Row Oriented
location     ts                   temperature  wind_speed  humidity
Aberystwyth  2022-01-01 00:00:00  14           21          79
Blackpool    2022-01-01 00:20:00  13           9           82

Column Oriented
location     ts                   temperature  wind_speed  humidity
Aberystwyth  2022-01-01 00:00:00  14           21          79
Blackpool    2022-01-01 00:20:00  13           9           82
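As a rough illustration of the two layouts, the same weather readings can be held either as one record per row or as one list per column. This is only a sketch in plain Python, not how ClickHouse stores data on disk; the helper names (`to_columns`, `avg_temperature_row_store`, `avg_temperature_column_store`) are made up for this example.

```python
# Row-oriented: all fields of a record are kept together (one dict per row).
rows = [
    {"location": "Aberystwyth", "ts": "2022-01-01 00:00:00",
     "temperature": 14, "wind_speed": 21, "humidity": 79},
    {"location": "Blackpool", "ts": "2022-01-01 00:20:00",
     "temperature": 13, "wind_speed": 9, "humidity": 82},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented: all values of a field are kept together (one list per column).
columns = to_columns(rows)


def avg_temperature_row_store(rows):
    # Every full row is visited even though only one field is needed.
    return sum(row["temperature"] for row in rows) / len(rows)


def avg_temperature_column_store(columns):
    # Only the "temperature" column is read; the other columns are untouched.
    temperatures = columns["temperature"]
    return sum(temperatures) / len(temperatures)
```

Both functions return the same average; the difference is how much unrelated data an aggregation has to walk past, which is why a column layout suits analytical queries.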

Vectorised Query Execution
Row-at-a-time: process rows sequentially
Vectorised: process chunks of values
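The contrast between the two processing styles can be sketched in plain Python. This only illustrates the chunk-at-a-time idea (one operator call per block of column values instead of one per value); it does not show the SIMD-level work a real engine does. The function names, the chunk size, and the sample `wind_speed` values are made up for the example.

```python
from itertools import islice


def chunks(values, size):
    """Yield successive lists of at most `size` values."""
    iterator = iter(values)
    while True:
        block = list(islice(iterator, size))
        if not block:
            return
        yield block


def total_row_at_a_time(values):
    total = 0
    for value in values:  # one value handled per loop iteration
        total += value
    return total


def total_chunk_at_a_time(values, size=4):
    total = 0
    for block in chunks(values, size):
        total += sum(block)  # one call handles the whole chunk
    return total


# Made-up sample column of wind speeds.
wind_speed = [21, 9, 15, 30, 12, 7, 18, 25, 11]
```

Both functions compute the same total; the chunked version simply pays the per-call overhead once per block rather than once per value.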

Flavours of ClickHouse
chDB
ClickHouse Local
ClickHouse Server

ClickHouse Demo

What is Streamlit?

What is Streamlit?
Streamlit turns data scripts into shareable web apps in minutes. All in pure Python. No front-end experience required.

Streamlit Hello World

Building the AI Copilot

The flow of events
Run window queries on ClickHouse and pass the results to OpenAI

Generated Commentary Demo

How do Large Language Models work?
Diagram: Prompt → LLM

Ingesting context into the prompt
Diagram: Prompt = Instructions + Context, with the Context drawn from the sports events feed
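One way to picture the prompt on this slide is as fixed instructions followed by the most recent events serialised as context. The sketch below is an assumption about how that could look, not the code from the talk: the instruction wording, the event fields (`point`, `server`, `winner`, `score`), and the `build_prompt` helper are all hypothetical.

```python
import json

# Hypothetical instruction text; the wording used in the talk is not shown on the slide.
INSTRUCTIONS = (
    "You are helping a live text writer cover a tennis match. "
    "Write one or two sentences of commentary based only on the events below."
)


def build_prompt(instructions, events):
    """Combine fixed instructions with recent events serialised as context."""
    context = "\n".join(json.dumps(event, sort_keys=True) for event in events)
    return f"{instructions}\n\nContext (most recent events):\n{context}"


# Made-up events standing in for rows pulled from the sports events feed.
events = [
    {"point": 1, "server": "Player A", "winner": "Player A", "score": "15-0"},
    {"point": 2, "server": "Player A", "winner": "Player B", "score": "15-15"},
]

prompt = build_prompt(INSTRUCTIONS, events)
```

The resulting string is what would be sent to the LLM; keeping the instructions and the context separate makes it easy to tweak one without touching the other.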

Sports events feed

LLM Code

Pulling events from ClickHouse

Generated text events

Retrieval queries: Recent points
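The query itself is not reproduced in this text. As a stand-in, the idea of a "recent points" window can be sketched in plain Python: keep only the events whose timestamp falls inside the last N seconds. In the talk this filtering is done by a ClickHouse query; the field names (`ts`, `winner`), the 60-second default, and the `recent_points` helper here are assumptions for illustration.

```python
from datetime import datetime, timedelta


def recent_points(events, now, window_seconds=60):
    """Return events from the last `window_seconds`, oldest first."""
    cutoff = now - timedelta(seconds=window_seconds)
    selected = [event for event in events if cutoff < event["ts"] <= now]
    return sorted(selected, key=lambda event: event["ts"])


# Made-up events; a fixed "now" keeps the example deterministic.
now = datetime(2023, 10, 5, 14, 0, 0)
events = [
    {"ts": now - timedelta(seconds=90), "winner": "Player A"},
    {"ts": now - timedelta(seconds=40), "winner": "Player B"},
    {"ts": now - timedelta(seconds=5), "winner": "Player A"},
]

latest = recent_points(events, now)
```

With the default window, only the two events from the last minute are returned; widening the window to 120 seconds would include all three.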

Retrieval queries: Last completed game

Serving the live commentary
Multiple consumers on the Redpanda topic:
● FastAPI server that renders server-sent events (SSE)
● Post messages to Twitter API
Diagram labels: points, livetext
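For the SSE consumer, each commentary message has to be written in the `text/event-stream` wire format: optional `event:` line, one `data:` line per payload line, then a blank line. The sketch below shows only that encoding step, under the assumption that messages are JSON-serialisable dicts; `format_sse`, `commentary_stream`, and the use of `livetext` as the event name are hypothetical, not taken from the talk's code.

```python
import json


def format_sse(data, event=None):
    """Encode one message in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines needs one "data:" line per line.
    for line in str(data).splitlines() or [""]:
        lines.append(f"data: {line}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


def commentary_stream(messages):
    """Yield one SSE frame per commentary message."""
    for message in messages:
        yield format_sse(json.dumps(message), event="livetext")
```

In a FastAPI app, a generator like `commentary_stream` could be returned through `StreamingResponse` with `media_type="text/event-stream"`, with the messages coming from a consumer on the Redpanda topic rather than from a list.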

Live Commentary Demo

Future Ideas

How can we extend this work?
● Automatic summaries every <x> seconds
● A Copilot that has access to fine-grained statistics
● Text to SQL so that the writer can ask questions of the data
● Can we use more batch data?
● Store the generated commentary for later analysis/tweaking of the prompt
● Compare the generated commentary with what the analyst chooses to publish

Is it only for sports?
We'll be focusing on sports, but you could use it for any of the following problems:
● Live auctions
● Weather updates
● Local traffic reporting
● Current location of food delivery
● …
Any use case where you have events that you want to summarise into a more readable format.

Thanks and Questions
github.com/mneedham/devoxx-ai-sports-commentary
www.linkedin.com/in/dunithd
dunith.medium.com
www.linkedin.com/in/markhneedham
youtube.com/@LearnDataWithMark