A presentation at Pittsburgh Apache Kafka® Meetup in Pittsburgh, PA, USA by Viktor Gamov
Who’s tweeting about #DSGTech and #KSQL?
Using Apache Kafka, Kafka Connect, KSQL and Kubernetes
February 2019 | @gamussa

apiVersion: cluster.confluent.com/v1alpha1
kind: KafkaCluster
spec:
  image: confluent-docker.jfrog.io/confluent-operator/kafka
  jvmConfig:
    heapSize: 4G
  metricReporter:
    bootstrapEndpoint: kafka:9071
    enabled: true
    internal: false
    publishMs: 30000
#devkafkaops @gamussa | @ #DSGTech | @confluentinc
https://twitter.com/kelseyhightower/status/963413508300812295
https://twitter.com/kelseyhightower/status/963414038603427840
Don’t despair… “… not even over the fact that you don’t despair. Just when everything seems over with, new forces come marching up, and precisely that means that you are alive” Franz Kafka
Kafka Streaming Architecture Fundamentals
Event Streaming Platform Architecture (diagram): Applications using the native client library, Kafka Streams, and KSQL; Load Balancer *; REST Proxy; Schema Registry; Kafka Brokers; Kafka Connect; Zookeeper Nodes
Kubernetes Fundamentals
Microservices Docker Kubernetes Monolith
https://twitter.com/sahrizv/status/1018184792611827712
Orchestration: Compute, Networking, Storage, Service Discovery
Kubernetes: Schedules and allocates resources; Networking between Pods; Storage; Service Discovery
Refresher - Kubernetes Architecture: kubectl https://thenewstack.io/kubernetes-an-overview/
Pod: Basic Unit of Deployment in Kubernetes. A collection of containers sharing: Namespace, Network, Volumes
Storage: Persistent Volume (PV) & Persistent Volume Claim (PVC)
Both PV and PVC are ‘resources’.
A PV is a piece of storage that is provisioned dynamically or statically, with a lifecycle independent of any individual pod that uses the PV.
A PVC is a request for storage by a user.
PVCs consume PVs.
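The PV/PVC relationship can be sketched with two manifests: a claim that requests storage, and a pod that mounts whatever volume ends up bound to that claim. This is only an illustration and not taken from the talk; the names `kafka-data` and `pvc-demo`, the `busybox` image, and the 10Gi size are all assumptions.

```yaml
# Illustrative sketch only: names, image, and sizes are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kafka-data              # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce             # mounted read-write by a single node
  resources:
    requests:
      storage: 10Gi             # the "request for storage by a user"
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-demo                # hypothetical pod name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data      # the bound PV appears here
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: kafka-data   # the pod refers to the claim, not to a PV
```

With a default StorageClass in the cluster, a PV would typically be provisioned dynamically to satisfy the claim; otherwise an administrator has to create a matching PV by hand.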
Stateful Workloads
StatefulSet
Relies on a Headless Service to provide network identity.
Ideal for highly available stateful workloads.
(Diagram: a Headless Service in front of Pod-0, Pod-1, and Pod-2, each with its own Containers and Volumes.)
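A structural sketch of that pairing, assuming a hypothetical `kafka` label, a placeholder image, and made-up sizes: the Service sets `clusterIP: None` (which is what makes it headless), and the StatefulSet points at it through `serviceName`. It shows the shape of the resources only and omits the broker configuration a real Kafka deployment needs.

```yaml
# Illustrative sketch only: labels, image, ports, and sizes are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: kafka-headless
spec:
  clusterIP: None               # headless: DNS resolves to the individual pods
  selector:
    app: kafka
  ports:
    - name: broker
      port: 9092
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka-headless   # gives each pod a stable DNS name
  replicas: 3                   # pods are named kafka-0, kafka-1, kafka-2
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: example/kafka:latest   # placeholder, not a real image
          ports:
            - name: broker
              containerPort: 9092
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:         # one PVC per pod, kept across restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Under these assumptions each pod would be reachable at a name of the form `kafka-0.kafka-headless.<namespace>.svc.cluster.local`, and would be reattached to its own claim (`data-kafka-0`, and so on) after a restart.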
Workloads Deployment
Helm Charts
https://cnfl.io/helm_video
Helm Charts: Package Manager. Package multiple K8s resources into one deployment unit: Chart
Kafka deployment checklist
ZooKeeper: StatefulSet for 3-node ZK; Headless Svc; PVC for Storage; Optional Pod Anti-Affinity to spread the ZK ensemble across nodes; ConfigMap for Prometheus JMX exporter
Kafka: Uses ZK; StatefulSet for n-node Kafka; Headless Service; PVC for Storage; A group of NodePort Services for external traffic; ConfigMap for Prometheus JMX exporter
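One item from the checklist, the NodePort Services for external traffic, might look like the following for a single broker. This is a sketch under assumptions: the pod name `kafka-0` and both port numbers are hypothetical. The `statefulset.kubernetes.io/pod-name` label is set by Kubernetes on every StatefulSet pod, which lets one Service target exactly one broker.

```yaml
# Illustrative sketch only: pod name and port numbers are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: kafka-0-nodeport
spec:
  type: NodePort
  selector:
    statefulset.kubernetes.io/pod-name: kafka-0   # exactly one broker pod
  ports:
    - name: external
      port: 19092          # hypothetical external listener port
      targetPort: 19092
      nodePort: 31090      # must fall in the cluster's NodePort range
```

A group of such Services, one per broker, is needed because Kafka clients connect to specific brokers rather than to a load-balanced pool.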
Basic components are not enough
Meet Kubernetes Operator
Kubernetes Operator: Embedded with operational knowledge of both data software and Kubernetes: Backup/restore; Scale up/down; Rebalance data; Regular health checks
Controller: Brain behind Kubernetes resources, e.g. replication controller, namespace controller, etc.
Custom Resource Definition (CRD)
Extends the existing Kubernetes API.
Usually works together with a Custom Controller.
Users can create and access Custom Resources with kubectl, just as they do for built-in resources like pods.
(Diagram: the API holds StatefulSet, ReplicaSet, … and CRD definitions; the StatefulSet Controller, ReplicaSet Controller, … and Custom Controller manage the StatefulSet, ReplicaSet, … and Custom Resource instances.)
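To make the CRD idea concrete, here is a minimal sketch that registers a new resource type and then creates one instance of it. Everything named here is hypothetical (the `example.com` group, the `StreamCluster` kind, the `replicas` field); it is not the Confluent Operator's actual definition.

```yaml
# Illustrative sketch only: group, kind, and fields are hypothetical.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: streamclusters.example.com   # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: streamclusters
    singular: streamcluster
    kind: StreamCluster
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
---
# A custom resource instance of the type defined above.
apiVersion: example.com/v1alpha1
kind: StreamCluster
metadata:
  name: demo
spec:
  replicas: 3
```

Once the definition is registered (apply it first, then the instance), `kubectl get streamclusters` lists the instance like any built-in resource. On its own the CRD only stores data; a custom controller has to watch these objects and act on them.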
Operator: Deploy and manage your production streaming platform with Confluent Operator. Automated Provisioning; Platform Operations; Resiliency; Monitoring
Confluent Platform Reference Architecture: Each Confluent Platform component has specific characteristics: Security (SSL certificates); DNS names and zones; Host selection; Fault tolerance; Scaling. (Diagram: Applications using the native client library and Kafka Streams; Load Balancer *; REST Proxy; Schema Registry; Kafka Brokers; Kafka Connect; Zookeeper Nodes.)
Confluent Operator: Automated Provisioning. (Diagram: a Load Balancer in front of three Kafka Pods, backed by Storage.)
Confluent Operator: Scale Horizontally. Automate scaling: Spin up new broker pod(s); Distribute partitions to the new broker(s); Determine balancing plan; Execute balancing plan; Monitor resources
Confluent Operator: Rolling Upgrade. Automated rolling upgrade with no downtime for Kafka: Stop broker; Wait for leader election to complete; Start broker with new version; Wait for zero under-replicated partitions; Repeat
Will it fly? Let’s see
Confluent Operator: Automate provisioning; Scale your Kafkas and CP clusters elastically; Monitor SLAs through Confluent Control Center or Prometheus; Operate at scale with enterprise support from Confluent
Advanced use cases vs.
Don’t despair!
Lower the bar to enter the world of streaming. (Chart: Coding Sophistication vs. User Population.) Core developers who use Java/Scala streams; Core developers who don’t use Java/Scala; Data engineers, architects, DevOps/SRE; BI analysts
KSQL #FTW: 1. UI; 2. CLI (ksql>); 3. REST (POST /query); 4. Headless
Interaction with Kafka: KSQL (processing) and a JVM application with Kafka Streams (processing) each talk to Kafka (data). Neither runs on the Kafka brokers.
Fault-Tolerance, powered by Kafka
Differences | KSQL | Kafka Streams
You write… | KSQL statements | JVM applications
UI included for human interaction | Yes, in Confluent Platform | No
CLI included for human interaction | Yes | No
Data formats | Avro, JSON, CSV (today) | Any data format, including Avro, JSON, CSV, Protobuf, XML
REST API included | Yes | No, but you can DIY
Runtime included | Yes, the KSQL server | Not needed, applications run as standard JVM processes
Queryable state | Not yet | Yes
Standing on the shoulders of Streaming Giants: KSQL and KSQL UDFs are powered by Kafka Streams, which is powered by the Producer and Consumer APIs. (Axis: Ease of use at the KSQL end, Flexibility at the Producer, Consumer APIs end.)
One last thing…
https://kafka-summit.org Gamov30
Resources and Next Steps https://cnfl.io/helm_video https://cnfl.io/cp-helm https://cnfl.io/k8s https://slackpass.io/confluentcommunity #kubernetes
Thanks! @gamussa viktor@confluent.io
What could be more interesting than building data processing pipelines? Let’s tackle a flood of tweets right now, using popular technologies: Apache Kafka, Kafka Connect, and KSQL! We all know and love SQL, right? Well, KSQL is almost like SQL, only for Kafka. KSQL lets you build complex stream processing systems without writing Java or Scala (sick!) code. But the most interesting part is that we’ll deploy everything to Kubernetes, using Confluent Operator and Helm. We’ll then process the Twitter feed in real time and find out who tweets the most about Dick’s Sporting Goods!
Here’s what was said about this presentation on social media.
Can anyone help me find a xxl of @gAmUssA ‘s shirt? #DSGTech pic.twitter.com/0aKCORYrOW
— A. Brooks Renoll (@BrooksRenoll) March 1, 2019
#ksql is awesome thanks @gAmUssA
— Abdulrahman asiri (@Abdulrahmanasi1) March 1, 2019
#ksql #dsgtech loving the live demo of confluent kafka at dsg tech talk Thursday! @gAmUssA
— Greg Barker (@GBark1204) March 1, 2019
Learned a lot about running Kafka on Kubernetes by @gAmUssA #DSGTech pic.twitter.com/sNyDiIYUtf
— Jorge Balderas (@jorgerbf) March 1, 2019
At #dsgtech learning Kafka with k8s by @gAmUssA
— Javier Ochoa (@javier_ochoa) March 1, 2019
Enjoying a talk by @gAmUssA of @confluentinc on #apachekafka at tonight's #DSGTechTalkThursday #dsgtech #dsglife pic.twitter.com/nOdaCPfgQK
— Brian Surratt (@bpsurratt) February 28, 2019
Thanks @gAmUssA for making the trip to Pittsburgh to present tonight’s @DICKS #DSGTechTalkThursday. Today was a great opportunity for the @DICKSCareers tech team to learn and grow together. #DSGTech #dsglife https://t.co/ecWV1a9eMG https://t.co/GSGVsZHm8G
— A. Brooks Renoll (@BrooksRenoll) February 28, 2019
Tech talk Thursdays with @gAmUssA and @confluentinc. KAFKA, KSQL and K8s #DSGTech pic.twitter.com/y49U5S8b5n
— Oz (@richardkoswald) February 28, 2019
I'm at #DSGTech taking another opportunity to learn from @gAmUssA from @confluentinc about Kafka. pic.twitter.com/RPyYoINb6w
— Mud Runner Codes (@mudrunnercodes) February 28, 2019
Thanks to @confluentinc and @gAmUssA for giving a tech talk at @DICKS. Can't wait to start writing code #dsgtech #kafka
— Jesse Coddington (@JesseCoddington) February 28, 2019
Very excited to be at the #DSGTech meeting and getting the chance to learn from @gAmUssA pic.twitter.com/GjJ2jicixv
— Mud Runner Codes (@mudrunnercodes) February 28, 2019
#DSGTech very interesting topics can't wait to learn more about it #DicksSportingGoods pic.twitter.com/E1GvFz1dit
— Abdulrahman asiri (@Abdulrahmanasi1) February 28, 2019
#PGHTech: Join us at our next #TechTalk Thursday with Viktor Gamov, Developer Advocate @Confluentinc, on Feb 28 at 6 pm. We’re excited to co-host this event with Pittsburgh @ApacheKafka Meetup. Get more details here: https://t.co/4T2vg31eyT #DSGTech #TechTalk @gAmUssA pic.twitter.com/zzN9hER7wn
— DICK'S Careers (@DICKSCareers) February 20, 2019