Data Engineering and the Death of Visual ETL

Slide 1

@redpilla @stewartbryson

Slide 2

Founder & CEO Red Pill Analytics twitter: @stewartbryson medium: @stewartbryson linkedin: stewartbryson

Slide 3

Slide 4

Slide 5

Slide 6

Partnerships Data Warehouse Analytics ANALYTICS @redpilla Data-engineering & ETL

Slide 7

https://medium.com/free-code-camp/the-rise-of-the-data-engineer-91be18f1e603 @redpilla We’re hiring. @stewartbryson

Slide 8

@redpilla @stewartbryson “There’s a multitude of reasons why complex pieces of software are not developed using drag and drop tools: it’s that ultimately code is the best abstraction there is for software.” Maxime Beauchemin, The Rise of the Data Engineer, Jan 20, 2017

Slide 9

@redpilla @stewartbryson

Slide 10

What’s so different about ETL? @redpilla @stewartbryson

Slide 11

Code versus Clicks Source control CI/CD Collaboration Code anywhere @redpilla Faster Easier to start Any discipline Business friendly @stewartbryson

Slide 12

Assumption: Code is better, but harder. @redpilla @stewartbryson

Slide 13

@redpilla @stewartbryson

Slide 14

@redpilla @stewartbryson

Slide 15

@redpilla @stewartbryson

Slide 16

CREATE OR REPLACE VIEW stage.promotions_vw AS SELECT promo_id, promo_name, promo_subcategory, promo_begin_date, promo_end_date, time_id status_date, status, weekday_flag FROM ( SELECT time_id, CASE WHEN day_name IN (‘Saturday’,’Sunday’) THEN ‘N’ ELSE ‘Y’ END weekday_flag, promo_id, MAX(promo_name) OVER (partition BY promo_id) promo_name, MAX(promo_subcategory) OVER (partition BY promo_id) promo_subcategory, MAX(promo_begin_date) OVER (partition BY promo_id) promo_begin_date, MAX(promo_end_date) OVER (partition BY promo_id) promo_end_date, nvl(status, ‘ongoing’) status FROM stage.promotions_stg partition BY (promo_id) right outer JOIN sh.times ON time_id=status_date) WHERE time_id BETWEEN promo_begin_date AND promo_end_date @redpilla https://www.rittmanmead.com/blog/2009/04/using-a-partition-by-join-to-fill-in-the-gaps

Slide 17

@redpilla @stewartbryson https://www.rittmanmead.com/blog/2009/04/using-a-partition-by-join-to-fill-in-the-gaps

Slide 18

@redpilla @stewartbryson mapping.create(‘MY_PROJECT’, ‘DEMO_FOLDER’, ‘EMPLOYEE_DIM_LOAD’) .datastores([ [name: “HR.EMPLOYEES”], [name: “HR.DEPARTMENTS”], [name: “HR.JOBS”], [name: “PERF.D_EMPLOYEE”, integration_type: “SCD”], ]) .select(“EMPLOYEES”) .filter(‘NAME_FILTER’, [filter_condition: “EMPLOYEES.FIRST_NAME LIKE ‘D%’” ]) .join(‘EMP_DEPT’, [‘DEPARTMENTS’], [join_condition: “EMPLOYEES.DEPARTMENT_ID = DEPARTMENTS.DEPARTMENT_ID” ]) .join(‘DEPT_JOBS’, [‘JOBS’], [join_condition: “EMPLOYEES.JOB_ID = JOBS.JOB_ID” ]) .connect(“D_EMPLOYEE”, [[ attr: “employee_id”, key_indicator: true ], [ attr: “eff_from_date”, expression: “sysdate”, execute_on_hint: “TARGET”], [ attr: “eff_to_date”, expression: “sysdate”, execute_on_hint: “TARGET”], [ attr: “current_flag”, expression: 1, execute_on_hint: “TARGET”], [ attr: “surr_key”, expression: “:RM_PROJECT.D_EMPLOYEE_SEQ_NEXTVAL”, execute_on_hint: “TARGET”],]) .commit() .validate() https://www.rittmanmead.com/blog/2017/03/accelerating-your-odi-implementation-rittmanmead-style/ https://www.rittmanmead.com/blog/2009/04/using-a-partition-by-join-to-fill-in-the-gaps

Slide 19

@redpilla @stewartbryson https://www.rittmanmead.com/blog/2017/03/accelerating-your-odi-implementation-rittmanmead-style/

Slide 20

@redpilla @stewartbryson

Slide 21

Code >> GUI >> Code @redpilla @stewartbryson

Slide 22

I would rather write this. @redpilla SELECT first_name, last_name, email, phone_number, hire_date, job_id, manager_id, department_id, salary, commission_pct, eff_from_date, eff_to_date, current_flag FROM EMPLOYEES JOIN DEPARTMENTS using (DEPARTMENT_ID) JOIN JOBS using (JOB_ID) @stewartbryson

Slide 23

@redpilla @stewartbryson

Slide 24

@redpilla @stewartbryson

Slide 25

@redpilla @stewartbryson We aren’t starting with a blank slate. @stewartbryson

Slide 26

An embarrassment of riches. @redpilla @stewartbryson

Slide 27

Why Spark? @redpilla @stewartbryson

Slide 28

Why Databricks? @redpilla @stewartbryson

Slide 29

@redpilla @stewartbryson

Slide 30

@redpilla @stewartbryson

Slide 31

Demonstration @redpilla @stewartbryson

Data Engineering and the Death of Visual ETL

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32