A presentation at Boston Apache Kafka® Meetup by Confluent in Boston, MA, USA by Viktor Gamov
Apache Kafka Event Streaming Platform
March 2019 / Boston, MA
@gamussa | #BostonKafka | @ConfluentINc
Raffle, yeah! 🚀 Follow @gamussa 📸🖼🏋 Tag @gamussa with #BostonKafka
A company is built on DATA FLOWS, but all we have is DATA STORES.
Pre-Streaming
New World: Streaming First
• DB/DWH + many more distributed data systems
• Monolith -> Microservices
• Batch -> Real-time
Origins in Stream Processing
[Diagram: a high-throughput streaming platform feeds Java apps built with Kafka Streams or KSQL (continuous computation, API-based clustering), which in turn feed a serving layer (microservices, Elastic, etc.)]
Streaming Platform: Storage • Pub / Sub • Processing
Storage
Core Abstraction
● DB - table
● Hadoop - file
● Kafka - ?
LOG
The log is a simple idea: messages are added at the end of the log, so it is always ordered from Old to New.
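The append-only log can be sketched as a toy in-memory model. This is an illustration, not Kafka's storage engine, and the `Log` class and its method names are hypothetical. Each message is appended at the end and gets the next offset. Reads start at an offset and scan forward, and reading never removes anything.

```python
from typing import List


class Log:
    """Toy append-only log: messages are only ever added at the end."""

    def __init__(self) -> None:
        self._entries: List[str] = []

    def append(self, message: str) -> int:
        """Append a message and return its offset (its position in the log)."""
        self._entries.append(message)
        return len(self._entries) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[str]:
        """Scan forward from `offset`; reading does not delete messages."""
        return self._entries[offset:offset + max_messages]


log = Log()
for msg in ["a", "b", "c"]:
    log.append(msg)
print(log.read(1))  # ['b', 'c']
```

Offsets are just positions. A reader therefore needs to remember only one number to know where it is.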
Pub / Sub
[Diagram: messages laid out along a time axis; labels C1, C2, C3]
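Messages with a key are routed to a partition by hash(key) % numPartitions = N, so messages with the same key always land in the same partition.
[Diagram: messages A, B, C, D along a time axis]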
Messages without a key will be produced in a round-robin fashion across partitions.
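The two routing rules can be sketched as a toy partitioner: `hash(key) % numPartitions` for keyed messages, and round robin for messages with no key (the Java client's default for unkeyed messages at the time of this talk). The `ToyPartitioner` class is hypothetical. `zlib.crc32` is a stand-in hash; Kafka's Java client actually uses murmur2 on the serialized key bytes.

```python
import itertools
import zlib
from typing import Optional


class ToyPartitioner:
    """Illustrative partitioner; not the Kafka client's implementation."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        # Cycles 0, 1, ..., num_partitions - 1, 0, ... for messages without a key.
        self._next_partition = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: spread messages round robin across partitions.
            return next(self._next_partition)
        # Key present: hash(key) % numPartitions, so equal keys share a partition.
        # zlib.crc32 is a stand-in here; Kafka's Java client uses murmur2.
        return zlib.crc32(key) % self.num_partitions


p = ToyPartitioner(num_partitions=4)
print([p.partition(None) for _ in range(6)])  # [0, 1, 2, 3, 0, 1]
print(p.partition(b"card-42") == p.partition(b"card-42"))  # True
```

Ordering is guaranteed only within a partition, so keying by an entity (for example a card number) keeps that entity's events in order. The mapping also depends on `numPartitions`, so adding partitions later moves keys to different partitions.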
Consumers have a position all of their own
[Diagram: a log running from Old to New; Ricardo, Robin, and Viktor are each at a different position, and each scans forward independently]
Only sequential access: read from an offset and scan forward, from Old to New.
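The "position of their own" idea can be sketched as a shared list plus one offset per reader. This is a simplified model. In Kafka, positions are tracked per consumer group as committed offsets. The starting positions for Ricardo, Robin, and Viktor and the `poll` helper are hypothetical.

```python
from typing import Dict, List

# A shared, append-only log (offsets 0..4). Reading does not delete anything.
events: List[str] = ["m0", "m1", "m2", "m3", "m4"]

# Each consumer tracks its own position: the next offset it will read.
positions: Dict[str, int] = {"Ricardo": 4, "Robin": 2, "Viktor": 0}


def poll(consumer: str, max_messages: int = 2) -> List[str]:
    """Read sequentially from the consumer's own offset, then advance it."""
    start = positions[consumer]
    batch = events[start:start + max_messages]
    positions[consumer] = start + len(batch)
    return batch


print(poll("Viktor"))   # ['m0', 'm1']
print(poll("Robin"))    # ['m2', 'm3']
print(poll("Ricardo"))  # ['m4']
print(positions)        # {'Ricardo': 5, 'Robin': 4, 'Viktor': 2}
```

Because a read only moves the reader's own offset, a slow consumer never blocks a fast one. A consumer can also rewind simply by resetting its offset.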
Consumers
[Diagrams: consumer groups C1 and C2 registered with the consumer group coordinator (CC); four consumers (C) in one group sharing partitions 0, 1, 2, 3; after a rebalance, one consumer owns partitions 0 and 3 while the others keep 1 and 2]
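Within a consumer group, each partition is owned by exactly one consumer. The rebalance in the diagrams can be sketched by dealing partitions round robin over the group's members. This is only loosely modeled on round-robin assignment and is not Kafka's assignor code. In Kafka the group coordinator manages membership and triggers rebalances, and the assignment strategy is pluggable (`partition.assignment.strategy`). The function and consumer names are hypothetical.

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Deal partitions out to a group's consumers like cards.

    Each partition goes to exactly one consumer; a consumer may own several.
    """
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {c: [] for c in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
print(assign_round_robin(partitions, ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [3]}

# c4 leaves the group: a rebalance spreads its partition over the survivors.
print(assign_round_robin(partitions, ["c1", "c2", "c3"]))
# {'c1': [0, 3], 'c2': [1], 'c3': [2]}
```

A group can also have more consumers than partitions. In that case the extra consumers receive nothing and sit idle, so the partition count caps a group's parallelism.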
Linearly Scalable Architecture
Single topic:
- Many producer machines
- Many consumer machines
- Many broker machines
No bottleneck!
Replicate to get fault tolerance: the leader on Machine A replicates each message (msg) to Machine B.
Partition Leadership and Replication
[Diagram: Topic1 partitions 1 to 4, each stored as three replicas spread across Brokers 1 to 4; for each partition, one replica is the leader and the others are followers]
Replication provides resiliency: a replica takes over on machine failure.
Partition Leadership and Replication: node failure
[Diagram: the same layout of Topic1 partitions 1 to 4 across Brokers 1 to 4 after a broker fails; followers on the surviving brokers take over as leaders]
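Failover can be sketched with a hypothetical replica layout: Topic1 has 4 partitions, each with 3 replicas on 4 brokers. The leader of each partition is simply the first replica whose broker is alive. This is a simplification. In Kafka, the controller elects a new leader from the partition's in-sync replicas (ISR), which this toy does not track.

```python
from typing import Dict, List, Optional, Set

# Hypothetical layout: partition -> brokers holding a replica.
# The first broker listed is the preferred leader.
replicas: Dict[int, List[int]] = {
    1: [1, 2, 3],
    2: [2, 3, 4],
    3: [3, 4, 1],
    4: [4, 1, 2],
}


def leaders(alive: Set[int]) -> Dict[int, Optional[int]]:
    """For each partition, the first replica on a live broker acts as leader."""
    result: Dict[int, Optional[int]] = {}
    for partition, brokers in replicas.items():
        result[partition] = next((b for b in brokers if b in alive), None)
    return result


print(leaders({1, 2, 3, 4}))  # {1: 1, 2: 2, 3: 3, 4: 4}
print(leaders({1, 2, 4}))     # broker 3 fails: {1: 1, 2: 2, 3: 4, 4: 4}
```

Spreading each partition's replicas over different brokers means that losing one machine costs at most one replica of any partition. A surviving follower can then take over that partition's leadership.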
The log is a type of durable messaging system.
Similar to a traditional messaging system (ActiveMQ, RabbitMQ, etc.), but with:
(a) Far better scalability
(b) Built-in fault tolerance / HA
(c) Storage
Stop! Demo time!
Processing
Streaming is the toolset for dealing with events as they move!
What exactly is stream processing?
[Diagram: the authorization_attempts stream is processed into possible_fraud]
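What exactly is stream processing? Count authorization attempts per card in 5-minute tumbling windows, and flag any card with more than 3 attempts in a window:
CREATE TABLE possible_fraud AS
  SELECT card_number, COUNT(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING COUNT(*) > 3;
Because it aggregates with GROUP BY, the result is a continuously updated table of counts per card and window.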
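The logic of the `possible_fraud` query above can be re-created in Python for a finite batch of events. This is a sketch of what the query computes, not how KSQL executes it: KSQL runs on Kafka Streams and updates its counts incrementally as each event arrives. Events are assumed to be hypothetical `(card_number, timestamp_ms)` pairs. Windows are aligned to multiples of the window size, as in Kafka Streams.

```python
from collections import Counter
from typing import Dict, List, Tuple

WINDOW_MS = 5 * 60 * 1000  # WINDOW TUMBLING (SIZE 5 MINUTE)
THRESHOLD = 3              # HAVING COUNT(*) > 3


def possible_fraud(attempts: List[Tuple[str, int]]) -> Dict[Tuple[str, int], int]:
    """Count attempts per (card_number, window start); keep counts above THRESHOLD.

    Tumbling windows are fixed-size, non-overlapping, and aligned to multiples
    of WINDOW_MS, so each event falls into exactly one window.
    """
    counts = Counter()
    for card_number, ts in attempts:
        window_start = ts - ts % WINDOW_MS
        counts[(card_number, window_start)] += 1
    return {k: n for k, n in counts.items() if n > THRESHOLD}


minute = 60 * 1000
attempts = [("1111", t * minute) for t in range(4)]       # 4 attempts, one window
attempts += [("2222", t * minute) for t in (1, 2, 6, 7)]  # 2 + 2, split over two windows
print(possible_fraud(attempts))  # {('1111', 0): 4}
```

Card 2222 also makes four attempts, but they straddle a window boundary, so neither window exceeds the threshold. That is the trade-off of tumbling windows compared with overlapping (hopping) windows.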
Lower the bar to enter the world of streaming
[Chart: coding sophistication versus user population, ranging from core developers who use Java/Scala streams, to core developers who don't use Java/Scala, to data engineers, architects, and DevOps/SRE, to BI analysts]
KSQL #FTW: four ways to use it
1. UI
2. CLI (ksql>)
3. REST (POST /query)
4. Headless
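The REST option can be sketched by building a `POST /query` request with Python's standard library. The server address is an assumption: a KSQL server on `localhost` at its default port 8088. The JSON body (`ksql` plus `streamsProperties`) and the `application/vnd.ksql.v1+json` content type follow the KSQL REST API as documented for releases of that era. The sketch builds the request without sending it, and `build_query_request` is a hypothetical helper name.

```python
import json
import urllib.request

KSQL_SERVER = "http://localhost:8088"  # assumption: local KSQL server, default port


def build_query_request(statement: str) -> urllib.request.Request:
    """Build (but do not send) a POST /query request for the KSQL REST API."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=KSQL_SERVER + "/query",
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
    )


req = build_query_request("SELECT * FROM possible_fraud;")
print(req.get_method(), req.full_url)  # POST http://localhost:8088/query

# Against a live server, the results stream back line by line:
# with urllib.request.urlopen(req) as resp:
#     for line in resp:
#         print(line.decode("utf-8").strip())
```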
Interaction with Kafka
• Kafka: data
• KSQL: processing; does not run on Kafka brokers
• JVM application with Kafka Streams: processing; does not run on Kafka brokers
Standing on the shoulders of streaming giants (from ease of use at the top to flexibility at the bottom):
• KSQL, extensible with KSQL UDFs
• Kafka Streams, which powers KSQL
• Producer and Consumer APIs, which power Kafka Streams
Find your local Meetup group: https://cnfl.io/kafka-meetups
Grab stream processing books: https://cnfl.io/book-bundle
Join us in Slack: http://cnfl.io/slack
One more thing…
Kafka Summit: https://kafka-summit.org (code Gamov30)
Thanks!
@gamussa
viktor@confluent.io
We are hiring! https://www.confluent.io/careers/
Streaming platforms have emerged as a popular new trend, but what exactly is a streaming platform? Part messaging system, part Hadoop made fast, part fast ETL and scalable data integration, with Apache Kafka at the core, streaming platforms offer an entirely new perspective on managing the flow of data. This talk will explain what a streaming platform such as Apache Kafka is and some of the use cases and design patterns around its use, including several examples of where it is solving real business problems. New developments in this area, such as KSQL, will also be discussed.
The following code examples from the presentation can be tried out live.
KLyfft demo app demonstrated during the meetup
Here’s what was said about this presentation on social media.
Stream processing is possible with Kafka using either KSQL or Producer/ Consumer API @confluentinc #Boston #Kafka #BostonKafka @gAmUssA pic.twitter.com/UYih96OjKK
— Microsoft Developers (@DevBostonDotOrg) March 12, 2019
Distributed event log, offsets, pub/sub, the message will not disappear after consumer reads it, guaranteed order in the partition #BostonKafka @gAmUssA pic.twitter.com/AdmMnMJWvz
— Microsoft Developers (@DevBostonDotOrg) March 12, 2019
Huge turnout at our first ever Boston based Kafka Meetup co-hosted by @confluentinc and @WayfairTech pic.twitter.com/COLTqW3WbC
— Vinay Narayana ☁️ (@nvinay26) March 13, 2019
Joining @gAmUssA & @nvinay26 to find out what is #ApacheKafka & to learn about Wayfair's Journey with Apache Kafka on March 12 @wayfairatwork #Meetup #Boston #Kafka @apachekafka @confluentcloud pic.twitter.com/SeyZUhEakM
— Microsoft Developers (@DevBostonDotOrg) March 12, 2019
Great time with the Boston Tribe @TribalScale learning about #ApacheKafka & Wayfair’s Journey with Kafka. Great presentations from @gAmUssA & @nvinay26! “Three things changed the course of history: invention of fire, invention of the wheel, and invention of Kafka” #BostonKafka pic.twitter.com/cLChEQpyzc
— Jacob Zweifel (@jacob_zweifel) March 13, 2019
@gAmUssA awesome presentation ! #bostonkafka
— raj-raj (@rajk_r) March 12, 2019
Watching @gAmUssA at #bostonkafka. pic.twitter.com/8fITxTSmFc
— urayoan (@urayoan) March 12, 2019