@rmoff / 19 Mar 2024 / #kafkasummit · 🐉 Here be Dragons: Stacktraces · Flink SQL for Non-Java Developers · Robin Moffatt, Principal DevEx Engineer @ Decodable
A presentation at Kafka Summit 2024 in March 2024 in London, UK by Robin Moffatt
Actual footage of a SQL Developer looking at Apache Flink for the first time
What Is Apache Flink?
A Brief History of Flink
Back in the time of dinosaurs: Hadoop
• Started life as a research project in 2010 called Stratosphere.
• This was the time of MapReduce. Java and Scala were the only way to do this.
Flink is a big project
• Flink
• Stateful Functions
• ML
• Kubernetes Operator
• CDC Connectors
• Paimon (incubating)
Capabilities
Connect to Lots of Source and Target Systems
JDBC · CDC · Kinesis · DynamoDB · Object stores · Kinesis Firehose
Stateful and Stateless computations
• Filtering: SELECT * FROM myStream WHERE foo=42
• Joining: SELECT a.*, b.* FROM myStream a INNER JOIN myLookup b ON a.id=b.foo_id
• Transforming: SELECT cost * tax_rate AS total_cost FROM myStream
• Aggregations: SELECT SUM(order_value) AS total_order_values FROM orders
• Pattern matching: SELECT * FROM myStream MATCH_RECOGNIZE ( PARTITION BY id ORDER BY user_action_time […]
• …and a whole lot more
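As a sketch of how these compose, several of the operations above can be combined in a single query. This example assumes a hypothetical `orders` stream with `customer_id` and `order_value` columns (not one of the streams named on the slide):

```sql
-- Hypothetical example: filter plus aggregation in one query,
-- over an assumed `orders` stream.
SELECT customer_id,
       SUM(order_value) AS total_order_value,
       COUNT(*)         AS order_ct
FROM   orders
WHERE  order_value > 0
GROUP  BY customer_id;
```

In streaming mode this produces a continuously updating result per customer; in batch mode it runs once over the bounded input.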
Batch and Streaming
Bounded and Unbounded
How Does Flink Work?
<< magic 🪄 >>
// https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/
📣 Timo Walther: "Flink's SQL Engine: Let's Open the Engine Room!"
⏰ Tuesday 5:30pm · 🚪 Breakout Room 2
// https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/
Running Flink
Works on my machine…
$ ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host asgard08.
Starting taskexecutor daemon on host asgard08.

$ jps -l
14656 org.apache.flink.runtime.taskexecutor.TaskManagerRunner
14379 org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint
Using Flink
It's not just Java
• PyFlink: added in 1.9.0 in 2019
• Flink SQL: added in 1.5.0 in 2018
Flink SQL
SQL Language Support
• Built on Apache Calcite
• Common Table Expressions (CTEs) (WITH)
• Set-based operations
• Joins
• Aggregations
• And lots more…
Running Flink SQL
• SQL Client
• SQL Gateway
• REST API
• Hive
• JDBC Driver
• From Java or Python
Flink SQL Client
$ ./bin/sql-client.sh
Welcome! Enter 'HELP;' to list all available commands. 'QUIT;' to exit.
Command history file path: /opt/flink/.flink-sql-history
[ASCII-art squirrel and "Flink SQL" banner, tagged BETA]
Flink SQL>
DEMO // https://github.com/decodableco/examples/kafka-iceberg
A Few Useful Settings
Runtime Mode
SET 'execution.runtime-mode' = 'streaming';
• streaming [default]
• batch
Result Mode
SET 'sql-client.execution.result-mode' = 'table';
• table [default]
• changelog
• tableau
Colour Scheme
SET 'sql-client.display.color-schema' = 'Chester';
• Because why not?!
Changing the defaults: Setting up a SQL Client initialisation file
• Create a SQL file:
$ cat init.sql
SET 'execution.runtime-mode' = 'batch';
SET 'sql-client.execution.result-mode' = 'tableau';
• Launch SQL Client with the -i flag and pass the file as a parameter:
./bin/sql-client.sh -i init.sql
Submitting SQL as a job
• SQL Client
$ ./bin/sql-client.sh --file ~/my_query.sql
• SQL Gateway
curl --location 'localhost:8083/sessions/42/statements' \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --data '{ "statement": "SELECT * FROM foo;" }'
• Application mode support: FLIP-316: Support application mode for SQL Gateway
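The Gateway call on the slide assumes a session already exists. End to end, the flow is roughly: open a session, submit a statement within it, then fetch the results of the resulting operation. A sketch against the Gateway's v1 REST endpoints, with the returned handles abbreviated:

```shell
# Sketch of the SQL Gateway flow; assumes a gateway running on localhost:8083.

# 1. Open a session
curl -s -X POST http://localhost:8083/v1/sessions
# returns {"sessionHandle": "…"}

# 2. Submit a statement within that session
curl -s -X POST \
     --header 'Content-Type: application/json' \
     --data '{ "statement": "SELECT * FROM foo;" }' \
     http://localhost:8083/v1/sessions/<sessionHandle>/statements
# returns {"operationHandle": "…"}

# 3. Fetch the first page of results for that operation
curl -s http://localhost:8083/v1/sessions/<sessionHandle>/operations/<operationHandle>/result/0
```

Substitute the actual handles from each response; results are paged, and each page links to the next token.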
Some of the Gnarly Stuff
The Joy of JARs
• For each connector, format, and catalog you need to install dependencies.
• All of these are available as JARs (Java ARchive)
This might jar a bit…
Could not execute SQL statement. Reason: java.lang.ClassNotFoundException

org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'

Could not find any factory for identifier 'hive' that implements 'org.apache.flink.table.factories.CatalogFactory' in the classpath.
Finding JARs
• Usually the docs will tell you which JAR you need.
• JARs are very specific to the versions of the tools that you're using.
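Connector JARs are published to Maven Central, and the artifact version has to line up with both the connector release and the Flink release. As a sketch (the versions shown are illustrative; match them to your own installation):

```shell
# Download the Kafka SQL connector JAR into Flink's lib folder.
# 3.1.0-1.18 pairs connector release 3.1.0 with Flink 1.18;
# adjust both parts to match your Flink version.
cd /opt/flink/lib
wget https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-kafka/3.1.0-1.18/flink-sql-connector-kafka-3.1.0-1.18.jar
```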
$ tree /opt/flink/lib
├── aws
│   ├── aws-java-sdk-bundle-1.12.648.jar
│   └── hadoop-aws-3.3.4.jar
├── flink-cep-1.18.1.jar
├── flink-parquet_2.12-1.18.1.jar
├── flink-table-runtime-1.18.1.jar
├── hadoop
│   ├── commons-configuration2-2.1.1.jar
│   ├── commons-logging-1.1.3.jar
│   └── hadoop-auth-3.3.4.jar
├── hive
│   └── flink-sql-connector-hive-3.1.3_2.12-1.18.1.jar
├── iceberg
│   └── iceberg-flink-runtime-1.18-1.5.0.jar
└── kafka
    └── flink-sql-connector-kafka-3.1.0-1.18.jar
Don't forget to restart!
Tables, Connectors, and Catalogs
Tables
CREATE TABLE t_k_orders (
  orderid STRING,
  customerid STRING,
  ordernumber INT,
  product STRING,
  discountpercent INT
) WITH (
  'connector'                    = 'kafka',
  'topic'                        = 'orders',
  'properties.bootstrap.servers' = 'broker:29092',
  'scan.startup.mode'            = 'earliest-offset',
  'format'                       = 'json'
);
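Once the table is defined, it can be queried like any other table. A minimal sketch, assuming the `t_k_orders` definition above and a reachable Kafka broker:

```sql
-- Count orders per product from the Kafka-backed table.
-- In streaming mode this updates continuously as new
-- messages arrive on the `orders` topic.
SELECT product,
       COUNT(*) AS order_ct
FROM   t_k_orders
GROUP  BY product;
```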
This used to be simple
• The data and information about the data was all stored in the database
• Information Schema
• System Catalog
• Data Dictionary Views
Now it's not so simple
Flink Catalogs
Flink Catalogs
CREATE CATALOG c_hive WITH (
  'type' = 'hive',
  'hive-conf-dir' = './conf');
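Once created, the catalog can be switched to and inspected with standard Flink SQL catalog commands. A sketch, assuming the `c_hive` catalog above was created successfully:

```sql
-- Switch to the newly created catalog and look around.
USE CATALOG c_hive;
SHOW DATABASES;
SHOW TABLES;
```

Tables created while this catalog is active are persisted in the Hive Metastore, so they survive SQL Client restarts.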
Flink Catalogs
table.catalog-store.kind: file
table.catalog-store.file.path: ./conf/catalogs
DEMO // https://github.com/decodableco/examples/kafka-iceberg
In Conclusion…
Flink SQL is Fun! But there's a bit of a learning curve
• Run ad-hoc queries with the SQL Client
• Understand JAR dependencies for connectors, catalogs, formats, etc
• Don't be put off by the docs: there is SQL content there if you look hard enough
decodable.co/blog
EOF
@rmoff / 19 Mar 2024 / #kafkasummit