About Ververica
Original Creators of Apache Flink®
2
@morsapaes
Enterprise Stream Processing With Ververica Platform
Part of Alibaba Group
Slide 3
Apache Flink Flink is an open source framework and distributed engine for stateful stream processing.
Flink Runtime Stateful Computations over Data Streams
3
@morsapaes
Learn more: flink.apache.org
Slide 4
Apache Flink Flink is an open source framework and distributed engine for stateful stream processing.
High Performance
Fault Tolerance Stateful Processing
Flexible APIs
Flink Runtime Stateful Computations over Data Streams
4
@morsapaes
Learn more: flink.apache.org
Slide 5
Use Cases This gives you a robust foundation for a wide range of use cases:
Streaming Analytics & ML
Stateful Stream Processing
Event-Driven Applications
Streams, State, Time SQL, PyFlink, Tables
Stateful Functions
Flink Runtime Stateful Computations over Data Streams
5
@morsapaes
Learn more: flink.apache.org
Slide 6
Use Cases Classical, core stream processing use cases that build on the primitives of streams, state and time.
Streaming Analytics & ML
Stateful Stream Processing
Event-Driven Applications
Streams, State, Time SQL, PyFlink, Tables
Stateful Functions
Flink Runtime Stateful Computations over Data Streams
6
@morsapaes
Learn more: flink.apache.org
Slide 7
Stateful Stream Processing Classical, core stream processing use cases that build on the primitives of streams, state and time.
●
Explicit control over these primitives
●
Complex computations and customization
●
Maximize performance and reliability
Example Use Cases
Large-scale Data Pipelines
7
@morsapaes
ML-Based Fraud Detection
Service Monitoring & Anomaly Detection
Slide 8
Use Cases More high-level or domain-specific use cases that can be modeled with SQL or Python and dynamic tables.
Streaming Analytics & ML
Stateful Stream Processing
Event-Driven Applications
Streams, State, Time SQL, PyFlink, Tables
Stateful Functions
Flink Runtime Stateful Computations over Data Streams
8
@morsapaes
Learn more: flink.apache.org
Slide 9
Streaming Analytics & ML More high-level or domain-specific use cases that can be modeled with SQL or Python and dynamic tables.
●
Focus on logic, not implementation
●
Mixed workloads (batch and streaming)
●
Maximize developer speed and autonomy
Example Use Cases
Unified Online/Offline Model Training
9
@morsapaes
E2E Streaming Analytics Pipelines
ML Feature Generation
Slide 10
More Flink Users
10
@morsapaes
Learn More: Powered by Flink, Speakers – Flink Forward San Francisco 2019, Speakers – Flink Forward Europe 2019
Slide 11
11
@morsapaes
Slide 12
Python is…pretty stacked?
Mature analytics stack, with libraries that are fast and intuitive.
12
@morsapaes
Source: JetBrains’ Developer Ecosystem Report 2020
Slide 13
…and also timeless!
1995 2008 2003 2015
Mature analytics stack, with libraries that are fast and intuitive.
2001
13
@morsapaes
Source: JetBrains’ Developer Ecosystem Report 2020
Slide 14
…and also timeless!
1995 2008 2003 2015
Mature analytics stack, with libraries that are fast and intuitive.
2001
Older libraries are mostly restricted to a data size that fits in memory (RAM), and designed to run on a single core (CPU).
14
@morsapaes
Slide 15
This is a problem.
15
@morsapaes
Slide 16
16
@morsapaes
Slide 17
But you still want to use these powerful libraries, right?
17
@morsapaes
Slide 18
Why PyFlink?
18
@morsapaes
Slide 19
Why PyFlink?
Expose the functionality of Flink to Python users
19
@morsapaes
Slide 20
Why PyFlink?
Distribute and scale the functionality of Python through Flink
20
@morsapaes
Learn more: The Integration of Pandas into PyFlink.
Slide 21
Flink at Alibaba scale Double 11 / Singles Day incl. sub-second updates to the GMV dashboard
Real-time Data Applications
Search
Recomm.
Infrastructure
5K nodes
Ads
Data Size
CPU cores
21
@morsapaes
100TB
Security
Throughput (Peak)
2.5B
985PB State Size (Biggest)
500K
BI
events/sec
Latency
Sub-sec
Learn more: Optimizations in Blink Runtime for Global Shopping Festival at Alibaba
Slide 22
PyFlink in a Nutshell*
22
●
Native SQL integration
●
Unified APIs for batch and streaming
●
Support for a large set of operations (incl. complex joins, windowing, pattern matching/CEP)
@morsapaes
As of Flink 1.11, only the Table API is exposed through PyFlink. The low-level DataStream API is on the roadmap (FLIP-130).
Slide 23
PyFlink in a Nutshell* ●
Native SQL integration
●
Unified APIs for batch and streaming
●
Support for a large set of operations (incl. complex joins, windowing, pattern matching/CEP)
Execution
Streaming
Batch
UDF Support
23
Python UDF
Pandas UDF
+UDAF (WIP)
+UDAF (WIP)
@morsapaes
As of Flink 1.11, only the Table API is exposed through PyFlink. The low-level DataStream API is on the roadmap (FLIP-130).
Slide 24
PyFlink in a Nutshell* ●
Native SQL integration
●
Unified APIs for batch and streaming
●
Support for a large set of operations (incl. complex joins, windowing, pattern matching/CEP)
Execution
Streaming
Native Connectors
Formats
Batch FileSystems
Apache Kafka
ML Library (WIP)
FLIP-39
Notebooks
UDF Support
Kinesis
Python UDF
Pandas UDF
+UDAF (WIP)
+UDAF (WIP)
HBase
JDBC Elasticsearch
Apache Zeppelin
24
@morsapaes
As of Flink 1.11, only the Table API is exposed through PyFlink. The low-level DataStream API is on the roadmap (FLIP-130).
Slide 25
PyFlink in a Nutshell* ●
Native SQL integration
●
Unified APIs for batch and streaming
●
Support for a large set of operations (incl. complex joins, windowing, pattern matching/CEP)
Execution
Streaming
Native Connectors
Formats
Batch FileSystems
Apache Kafka
ML Library (WIP)
FLIP-39
Notebooks
UDF Support
Kinesis
Python UDF
Pandas UDF
+UDAF (WIP)
+UDAF (WIP)
HBase
JDBC Elasticsearch
Apache Zeppelin
25
@morsapaes
As of Flink 1.11, only the Table API is exposed through PyFlink. The low-level DataStream API is on the roadmap (FLIP-130).
Slide 26
26
@morsapaes
Slide 27
Apache Zeppelin Web-based notebook that provides an interactive and collaborative computing environment.
…
27
@morsapaes
Advantages ●
Support for a lot of interpreters
●
Polyglot notes
●
Built-in interactive visualizations
●
Multi-tenancy
●
Pluggable notebook storage (e.g. git)