@redpilla @stewartbryson

Founder & CEO Red Pill Analytics twitter: @stewartbryson medium: @stewartbryson linkedin: stewartbryson

Technologies We Use Data Warehouse Analytics ANALYTICS @redpilla Data-engineering & ETL

@redpilla We’re hiring. @stewartbryson

https://medium.com/free-code-camp/the-rise-of-the-data-engineer-91be18f1e603 @redpilla We’re hiring. @stewartbryson

@redpilla @stewartbryson “There’s a multitude of reasons why complex pieces of software are not developed using drag and drop tools: it’s that ultimately code is the best abstraction there is for software.” Maxime Beauchemin, The Rise of the Data Engineer, Jan 20, 2017

@redpilla @stewartbryson

What’s so different about ETL? @redpilla @stewartbryson

Code versus Clicks Source control CI/CD Collaboration Code anywhere @redpilla Faster Easier to start Any discipline Business friendly @stewartbryson

Assumption: Code is better, but harder. @redpilla @stewartbryson

Data Warehouses SQL >> GUI (ETL) >> GUI (ELT) Modern Data Platforms SQL >> GUI (ETL) >> GUI (ELT) >> Distributed >> SQL @redpilla @stewartbryson

@redpilla @stewartbryson

@redpilla @stewartbryson

@redpilla @stewartbryson

CREATE OR REPLACE VIEW stage.promotions_vw AS SELECT promo_id, promo_name, promo_subcategory, promo_begin_date, promo_end_date, time_id status_date, status, weekday_flag FROM ( SELECT time_id, CASE WHEN day_name IN (‘Saturday’,’Sunday’) THEN ‘N’ ELSE ‘Y’ END weekday_flag, promo_id, MAX(promo_name) OVER (partition BY promo_id) promo_name, MAX(promo_subcategory) OVER (partition BY promo_id) promo_subcategory, MAX(promo_begin_date) OVER (partition BY promo_id) promo_begin_date, MAX(promo_end_date) OVER (partition BY promo_id) promo_end_date, nvl(status, ‘ongoing’) status FROM stage.promotions_stg partition BY (promo_id) right outer JOIN sh.times ON time_id=status_date) WHERE time_id BETWEEN promo_begin_date AND promo_end_date @redpilla https://www.rittmanmead.com/blog/2009/04/using-a-partition-by-join-to-fill-in-the-gaps

@redpilla @stewartbryson https://www.rittmanmead.com/blog/2009/04/using-a-partition-by-join-to-fill-in-the-gaps

@redpilla @stewartbryson

@redpilla @stewartbryson

Code >> GUI >> Code @redpilla @stewartbryson

I would rather write this. @redpilla SELECT first_name, last_name, email, phone_number, hire_date, job_id, manager_id, department_id, salary, commission_pct, eff_from_date, eff_to_date, current_flag FROM EMPLOYEES JOIN DEPARTMENTS using (DEPARTMENT_ID) JOIN JOBS using (JOB_ID) @stewartbryson

@redpilla @stewartbryson We aren’t starting with a blank slate. @stewartbryson

An embarrassment of riches. @redpilla @stewartbryson

Shouldn’t my pipelines just live in my data warehouse? @redpilla @stewartbryson

Data Build Tool (dbt) @redpilla @stewartbryson

Am I feeding other downstream systems? Data warehouses typically aren’t very good at that. @redpilla @stewartbryson

TABLE STREAMS & TASKS Web IoT External Stage (S3, Azure Blobs, GCS) Auto-Ingest Table Streams & Tasks Mobile Enterprise Apps © 2019 Snowflake Inc. All Rights Reserved Continuous Bulk Load (Snowpipe) OR (COPY) Storage Integration Staging Transformations Table involving any DML Target Table 1 Target Table 2 Unload External Stage (S3, Azure Blobs, GCS)

TABLE STREAMS & TASKS Table Streams & Tasks Target Table 1 Staging Table © 2019 Snowflake Inc. All Rights Reserved Transformations involving any DML Target Table 2 Unload External Stage (S3, Azure Blobs, GCS)

What are some other use cases we might want to consider? @redpilla @stewartbryson

Realtime Events © 2016 RED PILL Analytics

© 2016 RED PILL Analytics Mobile Analytics

Search © 2016 RED PILL Analytics

© 2016 RED PILL Analytics Machine Learning

Why Spark? @redpilla @stewartbryson

Why Databricks? @redpilla @stewartbryson

@redpilla @stewartbryson

@redpilla @stewartbryson

Demonstration @redpilla @stewartbryson