Become a Data Scientist Francesco Tisiot Analytics Tech Lead

Francesco Tisiot Analytics Tech Lead Verona, Italy http://ritt.md/ftisiot Over 10 Years in Analytics ft@rittmanmead.com @FTisiot Oracle ACE Director ITOUG Board President

info@rittmanmead.com www.rittmanmead.com @rittmanmead Data Engineering Analytics Data Science

Agenda • OAC • Data Scientist • Become a Data Scientist

Oracle Analytics Cloud • Platform Services (PaaS) • Delivered entirely in the cloud: • No infrastructure footprint • Flexibility • Simplified, metered licensing • Several options to suit your needs: • BYOL • Functionality bundled into 2 editions • Professional • Enterprise

Functions OAC supports Every type of analytics Classic Modern

Classic Enterprise BI • Similar to OBIEE 12c • Centrally maintained & governed • Semantic model • Interactive Dashboards • KPI measurement & monitoring • Guided navigation paths • BI Publisher • Highly formatted, burst outputs • Action Framework • Navigation actions • Scheduled agents

Modern Data Discovery • Data Preparation • Acquire data • Clean/Enrich • Transform • Repeatable Flows • Data Visualisation • Create visual insights rapidly • Construct narrated storyboards • Share findings

Unified Analytics Free Discovery Centralised Reporting Unique Source of Truth Specific Access Control Raw Data To Insights Data Enrichment and Cleaning https://speakerdeck.com/ftisiot/become-an-equilibrista-find-the-right-balance-in-the-analytics-tech-ecosystem

Augmented Analytics Data Enrichment Suggestions Natural Language Processing Explain One-Click Advanced Analytics Advanced Machine Learning

Data Scientist

Data Scientist Is a person who has the knowledge and skills to conduct sophisticated and systematic analyses of data. A data scientist extracts insights from data sets, and evaluates and identifies strategic opportunities. https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/

Data Scientist Is a Data Analyst who lives in California! https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/

Data Scientist Skills

Brendan Tierney Oracle ACE Director https://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html

Data Scientist …Company Missing a Data Scientist

Low Hanging Fruit Theory Democratise Data Science

Basic Operations Based on my Experience I can Guess…. What are the Drivers for My Sales? Statistically Significant Drivers for Sales Are … Augmented Analytics

Basic Operations YES/NO Is this Client going to accept the Offer? 50% Basic ML Model 70%

Become a Data Scientist with OAC

Before Starting…. Define the Problem!

Problem Definition: Predicting Wine Quality

TEP • Task: Classify Good/Bad Wine • Experience: Corpus of Wine Descriptions with Rating • Performance: Accuracy

Become a Data Scientist with OAC Connect

Connection Options in OAC Pre-Defined Data Models External Data Sources

Select Relevant Columns and Apply Filters

Become a Data Scientist with OAC Connect Clean

What Everybody Thinks a Data Scientist Does vs. What He Really Does

https://www.infoworld.com/article/3228245/data-science/the-80-20-data-science-dilemma.html

Cleaning What?
• Missing Values: N/A
• Wrong Values: Mark <> MArk
• Irrelevant Observations: City “Rome”
• Labelling Columns: Col1 -> Name
• Handling Outliers: Role: CIO, Salary: 500 K$
• Feature Scaling: 0-200k -> 0-1
• Aggregation: # of Clicks
• Train/Test Set Split: Train: 80%, Test: 20%
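The 80/20 train/test split can be sketched in a few lines of plain Python. This is a minimal illustration, not how OAC performs the split internally; the function name `train_test_split`, the fixed `seed`, and the 100 placeholder rows are assumptions made for the example.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into a train and a test part."""
    shuffled = list(rows)
    # A seeded generator keeps the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the real dataset: 100 row ids.
rows = list(range(100))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Shuffling before cutting matters when the source rows are sorted (for example by rating), otherwise the test set would not be representative.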

Cleaning How? Data Flows - Filter - Aggregate - Join
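The three Data Flow steps named on this slide (Filter, Aggregate, Join) map onto very small operations. The sketch below shows the idea in plain Python rather than in OAC; all sample rows, the `roles` lookup, and the field names are made up for illustration.

```python
from collections import Counter

# Hypothetical sample data; names and values are illustrative only.
clicks = [
    {"client": "Mark", "city": "Verona"},
    {"client": "Mark", "city": "Verona"},
    {"client": "Anna", "city": "Rome"},
    {"client": "Luca", "city": "Milan"},
]
roles = {"Mark": "CIO", "Luca": "Analyst"}

# Filter: drop the observations that are irrelevant for the analysis.
kept = [row for row in clicks if row["city"] != "Rome"]

# Aggregate: count the number of clicks per client.
clicks_per_client = Counter(row["client"] for row in kept)

# Join: attach the role to each aggregated row (inner join on client).
joined = [
    {"client": client, "clicks": n, "role": roles[client]}
    for client, n in clicks_per_client.items()
    if client in roles
]
print(joined)
```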

Cleaning What?
• Missing Values (N/A): CASE … WHEN…, Automated
• Wrong Values (Mark <> MArk): UPPER
• Irrelevant Observations (City “Rome”): FILTER
• Labelling Columns (Col1 -> Name): COLUMN RENAME
• Handling Outliers (Role: CIO, Salary: 500 K$): FILTER?
• Feature Scaling (0-200k -> 0-1): KPI/(MAX-MIN), Automated
• Aggregation (# of Clicks): COUNT
• Train/Test Set Split (Train: 80%, Test: 20%): FILTER, Automated
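A rough Python equivalent of the UPPER, CASE … WHEN, RENAME and feature-scaling steps is sketched below. The sample rows are invented, and filling a missing salary with the mean of the known values is an assumption chosen for the example, not something the slide prescribes. The scaling uses the usual min-max form (value - MIN)/(MAX - MIN), which reduces to the slide's KPI/(MAX-MIN) when the minimum is 0, as in the 0-200k example.

```python
# Hypothetical raw rows: inconsistent casing, a missing salary, a generic column name.
raw = [
    {"Col1": "Mark", "Salary": 50_000},
    {"Col1": "MArk", "Salary": None},
    {"Col1": "Anna", "Salary": 200_000},
    {"Col1": "Luca", "Salary": 0},
]


def clean(row, default_salary):
    return {
        # RENAME Col1 -> Name, and UPPER so that "Mark" and "MArk" match.
        "Name": row["Col1"].upper(),
        # CASE WHEN Salary IS NULL THEN default ELSE Salary END
        "Salary": default_salary if row["Salary"] is None else row["Salary"],
    }


# Assumption: impute missing salaries with the mean of the known ones.
known = [r["Salary"] for r in raw if r["Salary"] is not None]
default = sum(known) / len(known)
cleaned = [clean(r, default) for r in raw]

# Feature scaling: map the salary range onto 0-1 (min-max scaling).
lo = min(r["Salary"] for r in cleaned)
hi = max(r["Salary"] for r in cleaned)
for r in cleaned:
    r["SalaryScaled"] = (r["Salary"] - lo) / (hi - lo)

print(cleaned)
```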

Why Remove an Outlier?
Years Experience   Salary
1                  30.000
2                  32.000
3                  35.000
4                  35.500
5                  36.000
6                  40.000
7                  50.000
8                  70.000
9                  90.000
10                 500.000
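The effect of the last row is easy to quantify with the salaries from the slide. The sketch below only uses the standard library; comparing mean and median is one illustrative way to show the distortion, not necessarily the one used in the talk.

```python
from statistics import mean, median

# Salaries from the slide, ordered by years of experience (1 to 10).
salaries = [30_000, 32_000, 35_000, 35_500, 36_000,
            40_000, 50_000, 70_000, 90_000, 500_000]

with_outlier = mean(salaries)          # all ten rows
without_outlier = mean(salaries[:-1])  # last row (500.000) dropped
middle = median(salaries)              # barely affected by the extreme value

print(with_outlier, without_outlier, middle)
```

With the 500.000 row included the mean is 91.850; without it the mean is 46.500, so a single observation almost doubles the "typical" salary a model would learn.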

How To Find Outliers? One Dimension
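For a single column, one common rule of thumb is the interquartile-range (IQR) fence, the same rule behind box-plot whiskers. The sketch below applies it to the salary column from the earlier table; whether the slide's chart uses exactly this rule is not stated, so treat the choice of method and the factor `k=1.5` as assumptions.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


salaries = [30_000, 32_000, 35_000, 35_500, 36_000,
            40_000, 50_000, 70_000, 90_000, 500_000]
print(iqr_outliers(salaries))
```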

How To Find Outliers? Two Dimensions
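With two dimensions, a point can be unusual relative to the relationship between the columns rather than in either column alone. A minimal sketch, assuming a straight-line relationship between years of experience and salary: fit a least-squares line and flag the points whose residual is more than `threshold` standard deviations of the residuals. The function name and the threshold of 2 are illustrative choices.

```python
from statistics import mean, pstdev


def residual_outliers(xs, ys, threshold=2.0):
    """Flag (x, y) points lying far from the least-squares line through the data."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [(x, y) for x, y, r in zip(xs, ys, residuals)
            if abs(r) > threshold * spread]


years = list(range(1, 11))
salaries = [30_000, 32_000, 35_000, 35_500, 36_000,
            40_000, 50_000, 70_000, 90_000, 500_000]
print(residual_outliers(years, salaries))
```

Note that the outlier itself pulls the fitted line towards it, so this simple check works best when extreme points are rare; more robust fits exist but are beyond this sketch.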

Become a Data Scientist with OAC Connect Clean Transform & Enrich

Feature Engineering • Location -> ZIP Code • Name -> Sex • 2 Locations -> Distance • Day/Month/Year -> Date • Additional Data Sources? • Data Flow
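Two of these derived features are simple enough to sketch directly: "2 Locations -> Distance" and "Day/Month/Year -> Date". The example below uses the haversine great-circle formula with a mean Earth radius of 6371 km; the function names and the approximate Verona and Rome coordinates are assumptions for illustration, and in OAC this would be done inside a Data Flow or with spatial tooling instead.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def to_date(day, month, year):
    """Combine three separate columns into a single date value."""
    return date(year, month, day)


# Approximate city-centre coordinates, for illustration only.
verona = (45.44, 10.99)
rome = (41.90, 12.50)
print(round(haversine_km(*verona, *rome)), "km")
print(to_date(15, 7, 2019))
```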

Data Preparation Recommendations

Spatial Enrichment Oracle Spatial Studio https://www.rittmanmead.com/blog/2019/07/oracle-spatial-studio/

Become a Data Scientist with OAC Connect Analyse Clean Transform & Enrich

Data Overview

Explain

Explain - Key Drivers

Become a Data Scientist with OAC Connect Clean Analyse Train & Evaluate Transform & Enrich

What Problem are we Trying to Solve? • Supervised: “I want to predict the value of Y, here are some examples” (Regression, Classification) • Unsupervised: “Here is a dataset, make sense out of it!” (Clustering) https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d

Easy Models

NLP

DataFlow Train Model

Which Model - Parameters?

Select, Try, Save, Change, Try, Save …..

Compare - Classification: Real Value (Good, Bad) vs Predicted Value (Good, Bad)
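The Real Value vs Predicted Value grid is a confusion matrix: each cell counts how often a real class was predicted as a given class. A minimal sketch with invented labels follows; the six sample predictions are hypothetical and are not results from the wine model.

```python
from collections import Counter


def confusion_matrix(real, predicted):
    """Count every (real, predicted) label pair."""
    return Counter(zip(real, predicted))


# Hypothetical labels, for illustration only.
real = ["Good", "Good", "Bad", "Bad", "Good", "Bad"]
predicted = ["Good", "Bad", "Bad", "Good", "Good", "Bad"]

matrix = confusion_matrix(real, predicted)
for r in ("Good", "Bad"):
    print(r, [matrix[(r, p)] for p in ("Good", "Bad")])

# Accuracy: share of rows on the diagonal (real == predicted).
accuracy = (matrix[("Good", "Good")] + matrix[("Bad", "Bad")]) / len(real)
print(accuracy)
```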

There is No Single Truth… 502/(502+896) = Precision 64.77% 471/(471+866)= 64.09%
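Precision is the share of positive predictions that are correct: true positives / (true positives + false positives). The flattened slide text does not label which count is which. The two percentages are reproduced if 896 and 866 are read as the true positives and 502 and 471 as the false positives, so the sketch below makes that assumption; the names `model_a` and `model_b` are also hypothetical.

```python
def precision(true_positives, false_positives):
    """Share of positive predictions that were actually positive."""
    return true_positives / (true_positives + false_positives)


# Assumed reading of the slide's counts (see the note above).
model_a = precision(true_positives=896, false_positives=502)
model_b = precision(true_positives=866, false_positives=471)

print(f"{model_a:.2%}")
print(f"{model_b:.2%}")
```

Under this reading the second model has fewer correct positive predictions in absolute terms but a slightly higher precision, which is one way to interpret "There is No Single Truth": the ranking of models depends on the metric chosen.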

Compare - Regression

Become a Data Scientist with OAC Connect Clean Transform & Enrich Analyse Train & Evaluate Predict

Use On the Fly

Step of a Data Flow

Congratulations! …You are now a Data Scientist!

Nearly There

Required Knowledge 50% 60% 80% 90% 95% 97%

…But Data Cleaning Feature Engineering Model Creation & Evaluation Feature Selection 80% > 50%

ML Production Deployment Data Scientist ML -> Data Oracle Advanced Analytics

Become a Data Scientist with OAC http://ritt.md/OAC-datascience

ML in Action with OAC http://ritt.md/OAC-ML-Video

Insights Lab https://www.rittmanmead.com/insight-lab/

OAC Data Science
