Slide 8
New World: Streaming first
• DB/DWH + Many more distributed data systems
• Monolith -> Microservices
• Batch -> Real-time
@gamussa | #BostonKafka | @ConfluentINc
Slide 9
Origins in Stream Processing
Java Apps with Kafka Streams or KSQL
Serving Layer (Microservices, Elastic, etc.)
High Throughput Streaming platform
Continuous Computation
API based clustering
Slide 14
The log is a simple idea
Old -> New
Messages are added at the end of the log
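The append-only idea can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Kafka's storage format; the class name `Log` and its methods are hypothetical names chosen for this example.

```python
class Log:
    """A minimal in-memory append-only log (illustrative only)."""

    def __init__(self):
        self._records = []

    def append(self, message):
        """Add a message at the end of the log and return its offset."""
        self._records.append(message)
        return len(self._records) - 1

    def read(self, offset):
        """Return the message stored at the given offset."""
        return self._records[offset]

    def __len__(self):
        return len(self._records)


log = Log()
first = log.append("order created")   # oldest message, lowest offset
second = log.append("order paid")     # newest message, highest offset
```

Existing entries are never modified; new messages only ever get a higher offset than everything already in the log.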
Slide 16
Pub / Sub
Slide 17
Time
Slide 18
Time
C1
C2
C3
Slide 19
Time
A  B  C  D
hash(key) % numPartitions = N
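The formula hash(key) % numPartitions = N can be tried out with a small sketch. As an assumption for illustration, it uses `zlib.crc32` as a stand-in hash; this is not the hash function Kafka's own partitioner uses, and `partition_for` is a hypothetical helper name.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition number N = hash(key) % num_partitions.

    zlib.crc32 is only a stand-in hash for this sketch.
    """
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 4
assignments = {
    key: partition_for(key.encode("utf-8"), NUM_PARTITIONS)
    for key in ["A", "B", "C", "D"]
}
```

Because the hash is deterministic, every message with the same key lands in the same partition, which is what preserves per-key ordering.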
Slide 20
Time
Messages will be produced in a round-robin fashion
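Round-robin distribution can be sketched as cycling through the partition numbers. This is a simplified illustration with a hypothetical helper name; how a real Kafka client spreads keyless messages depends on the client version and the configured partitioner.

```python
from itertools import cycle


def round_robin_partitions(num_partitions):
    """Return an endless iterator over partition numbers 0..num_partitions-1."""
    return cycle(range(num_partitions))


partitions = round_robin_partitions(3)
# Pick a target partition for seven consecutive messages
targets = [next(partitions) for _ in range(7)]
```

With three partitions, the seven messages go to partitions 0, 1, 2, 0, 1, 2, 0.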
Slide 21
Consumers have a position all of their own
Old -> New
Ricardo is here (Scan)
Robin is here (Scan)
Viktor is here (Scan)
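A rough sketch of independent consumer positions: each consumer keeps its own offset into a shared log, and reading does not remove anything. The `Consumer` class below is a hypothetical in-memory stand-in, not the Kafka consumer API.

```python
class Consumer:
    """Tracks one reader's own position in a shared log (illustrative only)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.position = 0  # offset of the next message to read

    def poll(self, max_records=1):
        """Read up to max_records messages and advance this consumer's position."""
        records = self.log[self.position:self.position + max_records]
        self.position += len(records)
        return records


log = ["m0", "m1", "m2", "m3", "m4"]
ricardo = Consumer("Ricardo", log)
robin = Consumer("Robin", log)
viktor = Consumer("Viktor", log)

ricardo.poll(4)   # Ricardo has scanned four messages
robin.poll(2)     # Robin has scanned two
# Viktor has not read anything yet
```

Each consumer advances at its own pace, and the log itself is unchanged by any of them.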
Slide 24
Only Sequential Access
Old -> New
Read to offset & scan
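"Read to offset & scan" amounts to jumping to a starting offset and then reading forward in order. A minimal sketch over an in-memory list, with `scan_from` as a hypothetical helper name:

```python
def scan_from(log, offset):
    """Yield (offset, message) pairs sequentially, starting at `offset`."""
    for position in range(offset, len(log)):
        yield position, log[position]


log = ["a", "b", "c", "d", "e"]
scanned = list(scan_from(log, 2))
```

Here the scan starts at offset 2 and returns the messages "c", "d" and "e" in order.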
Slide 25
CONSUMERS
CONSUMER GROUP COORDINATOR
CONSUMER GROUP
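Within a consumer group, each partition is read by one consumer of the group at a time. The sketch below spreads partitions over group members in a simple round-robin way; it is an assumption for illustration only, since Kafka's real assignment strategies are pluggable and differ in detail, and `assign_partitions` is a hypothetical name.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions across the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four partitions shared by a group of two, then by a group of four
two_members = assign_partitions([0, 1, 2, 3], ["C1", "C2"])
four_members = assign_partitions([0, 1, 2, 3], ["C1", "C2", "C3", "C4"])
```

Adding consumers to the group reduces the number of partitions each one has to read, up to one partition per consumer.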
Slide 26
C
Slide 27
CC  C1
CC  C2
Slide 28
C  C  C  C
Slide 37
The log is a type of durable messaging system
Similar to a traditional messaging system (ActiveMQ, RabbitMQ, etc.) but with:
(a) Far better scalability
(b) Built-in fault tolerance / HA
(c) Storage
Slide 40
Streaming is the toolset for dealing with events as they move!
Slide 41
What exactly is Stream Processing?
authorization_attempts -> possible_fraud
Slide 42
What exactly is Stream Processing?
authorization_attempts -> possible_fraud
-- A GROUP BY aggregation produces a table in KSQL, hence CREATE TABLE
CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;
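To make the windowing logic concrete, here is a plain-Python sketch of the same idea: count authorization attempts per card in fixed 5-minute windows and keep the cards with more than 3 attempts. It runs over a finite list rather than a continuous stream, and the function name, the tuple layout and the sample data are assumptions made for this example.

```python
from collections import Counter

WINDOW_SIZE_SECONDS = 5 * 60  # tumbling window of 5 minutes


def possible_fraud(attempts, threshold=3):
    """Count attempts per (card_number, window) and keep counts above threshold.

    attempts: iterable of (timestamp_seconds, card_number) tuples.
    """
    counts = Counter()
    for timestamp, card_number in attempts:
        # Tumbling windows do not overlap: each event belongs to exactly one
        window_start = timestamp - (timestamp % WINDOW_SIZE_SECONDS)
        counts[(card_number, window_start)] += 1
    return {key: count for key, count in counts.items() if count > threshold}


attempts = [
    (0, "1111"), (60, "1111"), (120, "1111"), (299, "1111"),  # same window
    (300, "1111"),                                            # next window
    (10, "2222"), (20, "2222"),
]
flagged = possible_fraud(attempts)
```

Card "1111" has four attempts in the window starting at 0 seconds, so it is the only entry flagged; its fifth attempt falls into the next window and is counted separately.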
Slide 48
Lower the bar to enter the world of streaming
[Chart: Coding Sophistication (vertical axis) vs. User Population (horizontal axis)]
Core developers who use Java/Scala (streams)
Core developers who don’t use Java/Scala
Data engineers, architects, DevOps/SRE
BI analysts
Slide 50
Interaction with Kafka
Kafka (data)
KSQL (processing): Does not run on Kafka brokers
JVM application with Kafka Streams (processing): Does not run on Kafka brokers
Slide 51
Standing on the shoulders of Streaming Giants
Ease of use
KSQL
Powered by
KSQL UDFs
Kafka Streams
Powered by
Producer, Consumer APIs
Flexibility
Slide 52
Find your local Meetup Group: https://cnfl.io/kafka-meetups
Grab Stream Processing books: https://cnfl.io/book-bundle
Join us in Slack: http://cnfl.io/slack
Slide 53
One more thing…