A presentation at ApacheCon Asia by Tim Spann
STREAMING FLANK STACK WITH FLINK FOR STREAMING USE CASES
Timothy Spann, Developer Advocate
FLaNK and FLiP Stacks
FLaNK:
● Apache Flink
● Apache NiFi
● Apache Kafka
FLiP:
● Apache Flink
● Apache Pulsar
● StreamNative’s Flink Connector for Pulsar
● Apache +++
Apache projects are the way for all streaming use cases.
FLiP Stack (FLink-integrate-Pulsar)
StreamNative’s Flink Connector for Pulsar is the bridge to FLiP your streams at speed.
https://hub.streamnative.io/data-processing/pulsar-flink/2.7.0/
WHAT IS APACHE PULSAR?
Apache Pulsar is an open-source, cloud-native distributed messaging and streaming platform.
APACHE PULSAR
Enable Geo-Replicated Messaging
● Pub-Sub
● Geo-Replication
● Pulsar Functions
● Horizontal Scalability
● Multi-tenancy
● Tiered Persistent Storage
● Pulsar Connectors
● REST API
● CLI
● Many clients available
● Four Different Subscription Types
● Multi-Protocol Support
  ○ MQTT
  ○ AMQP
  ○ JMS
  ○ Kafka
  ○ …
https://hub.streamnative.io/
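The four subscription types mentioned above are Exclusive, Failover, Shared, and Key_Shared. As a rough illustration of how they differ, here is a toy model in plain Python. It does not use the Pulsar client; the `dispatch` function, the consumer names, and the hash-modulo key routing are all assumptions made for this sketch (Pulsar's real key routing and failover handling are more involved).

```python
import zlib

def dispatch(messages, consumers, mode):
    """Assign (key, payload) messages to consumers under a simplified
    model of one subscription type. Returns {consumer: [payloads]}."""
    if mode == "exclusive" and len(consumers) != 1:
        # Exclusive: only a single consumer may attach to the subscription.
        raise ValueError("exclusive allows exactly one consumer")
    assigned = {c: [] for c in consumers}
    for i, (key, payload) in enumerate(messages):
        if mode in ("exclusive", "failover"):
            # One active consumer receives everything; in failover the
            # others are standbys (no failure is simulated here).
            target = consumers[0]
        elif mode == "shared":
            # Shared: messages are spread round-robin across consumers.
            target = consumers[i % len(consumers)]
        elif mode == "key_shared":
            # Key_Shared: the same key always goes to the same consumer.
            # Hash-modulo is a simplification chosen for this sketch.
            target = consumers[zlib.crc32(key.encode()) % len(consumers)]
        else:
            raise ValueError(f"unknown mode: {mode}")
        assigned[target].append(payload)
    return assigned

messages = [("k1", "a"), ("k2", "b"), ("k1", "c"), ("k3", "d")]
shared = dispatch(messages, ["c1", "c2"], "shared")
failover = dispatch(messages, ["c1", "c2"], "failover")
key_shared = dispatch(messages, ["c1", "c2"], "key_shared")
```

In this model, `shared` alternates messages between the two consumers, `failover` sends everything to `c1`, and `key_shared` keeps both `k1` messages on one consumer.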
WHAT IS APACHE FLINK?
3B+ data points stream in daily from 25 million customers, feeding real-time machine learning prediction.
USE CASE
Streaming real-time data pipelines that need to handle complex stream or batch event processing, analytics, and/or support event-driven applications. For example: an event-time window job with state, with connectors for basic writes to HDFS, Pulsar, and Kafka. These pipelines need event-at-a-time or micro-batch processing, stateful or stateless operations, and exactly-once or at-least-once processing.
TECHNOLOGY
● Flink performs compute at in-memory speed at any scale
● Flink parses SQL using Apache Calcite, which supports standard ANSI SQL
● Flink runs standalone, on YARN and Kubernetes
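To make the "event-time window job with state" idea concrete, here is a minimal plain-Python sketch of a tumbling event-time window count. It is not Flink code and uses no Flink API; the function name, the sample events, and the 5-second window are assumptions for illustration, and it leaves out watermarks, checkpointing, and delivery guarantees.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_ms):
    """Count events per (key, window_start), bucketing by the timestamp
    carried in each event rather than by arrival order."""
    state = defaultdict(int)  # keyed state: (key, window_start) -> count
    for ts_ms, key in events:
        window_start = ts_ms - (ts_ms % window_ms)
        state[(key, window_start)] += 1
    return dict(state)

# (event_time_ms, key) pairs; the last one arrives out of order
events = [
    (1_000, "sensor-a"),
    (4_500, "sensor-a"),
    (5_200, "sensor-b"),
    (9_999, "sensor-a"),
    (3_000, "sensor-b"),  # late arrival, still assigned to window 0 by event time
]
counts = tumbling_window_counts(events, window_ms=5_000)
```

Because windows are assigned from the event's own timestamp, the out-of-order `sensor-b` event at 3,000 ms is counted in the window starting at 0, not in the window that was current when it arrived.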
Flink SQL
● Streaming Analytics
● Continuous SQL
● Continuous ETL
● Kafka, Pulsar and More…
● Complex Event Processing
● Standard SQL Powered by Apache Calcite
● Deployed Apache Flink Apps on YARN
● Scalable Stream Processing
https://www.datainmotion.dev/2021/04/cloudera-sql-stream-builder-ssb-updated.html
ALL DATA - ANYTIME - ANYWHERE - MULTI-CLOUD - MULTI-PROTOCOL
[Diagram: Multi-ingest, Merge, Priority]
DEEPER CONTENT
● https://www.datainmotion.dev/2020/10/running-flink-sql-against-kafka-using.html
● https://www.datainmotion.dev/2020/10/top-25-use-cases-of-cloudera-flow.html
● https://github.com/tspannhw/EverythingApacheNiFi
● https://github.com/tspannhw/CloudDemo2021
● https://github.com/tspannhw/StreamingSQLExamples
● https://github.com/tspannhw/SpeakerProfile/blob/main/2021/talks/UsingFLaNKStackEdgetspann2021.pdf
● https://www.linkedin.com/pulse/2021-schedule-tim-spann/
THANK YOU
QUESTIONS?
@PaasDev
timothyspann
https://www.pulsardeveloper.com/
Using the FLaNK and FLiP stacks for streaming use cases, built on Apache open-source projects including Apache Flink and Apache Pulsar, to ingest data at scale.