A presentation at Oracle OpenWorld 2019 in September 2019 in San Francisco, CA, USA by Francesco Tisiot
Become a Data Scientist Francesco Tisiot Analytics Tech Lead
Francesco Tisiot Analytics Tech Lead Verona, Italy http://ritt.md/ftisiot Over 10 Years in Analytics ft@rittmanmead.com @FTisiot Oracle ACE Director ITOUG Board President
info@rittmanmead.com www.rittmanmead.com @rittmanmead Data Engineering Analytics Data Science
Agenda
• OAC
• Data Scientist
• Become a Data Scientist
Oracle Analytics Cloud
• Platform Services (PaaS)
• Delivered entirely in the cloud:
  • No infrastructure footprint
  • Flexibility
  • Simplified, metered licensing
• Several options to suit your needs:
  • BYOL
  • Functionality bundled into 2 editions
    • Professional
    • Enterprise
Functions OAC supports Every type of analytics Classic Modern
Classic Enterprise BI
• Similar to OBIEE 12c
• Centrally maintained & governed
  • Semantic model
• Interactive Dashboards
  • KPI measurement & monitoring
  • Guided navigation paths
• BI Publisher
  • Highly formatted, burst outputs
• Action Framework
  • Navigation actions
  • Scheduled agents
Modern Data Discovery
• Data Preparation
  • Acquire data
  • Clean/Enrich
  • Transform
  • Repeatable Flows
• Data Visualisation
  • Create visual insights rapidly
  • Construct narrated storyboards
  • Share findings
Unified Analytics Free Discovery Centralised Reporting Unique Source of Truth Specific Access Control Raw Data To Insights Data Enrichment and Cleaning https://speakerdeck.com/ftisiot/become-an-equilibrista-find-the-right-balance-in-the-analytics-tech-ecosystem
Augmented Analytics Data Enrichment Suggestions Natural Language Processing Explain One-Click Advanced Analytics Advanced Machine Learning
Data Scientist
Data Scientist Is a person who has the knowledge and skills to conduct sophisticated and systematic analyses of data. A data scientist extracts insights from data sets, and evaluates and identifies strategic opportunities. https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/
Data Scientist Is a Data Analyst who lives in California! https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/
Data Scientist Skills
Brendan Tierney Oracle ACE Director https://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html
Data Scientist …Company Missing a Data Scientist
Low Hanging Fruit Theory Democratise Data Science
Basic Operations Based on my Experience I can Guess…. What are the Drivers for My Sales? Statistically Significant Drivers for Sales Are … Augmented Analytics
Basic Operations YES/NO Is this Client going to accept the Offer? 50% Basic ML Model 70%
Become a Data Scientist with OAC
Before Starting…. Define the Problem!
Problem Definition: Predicting Wine Quality
TEP Task Experience Performance Classify Good/Bad Wine Corpus of Wine Descriptions with Rating Accuracy
Become a Data Scientist with OAC Connect
Connection Options in OAC Pre-Defined Data Models External Data Sources
Select Relevant Columns and Apply Filters
Become a Data Scientist with OAC Connect Clean
What Everybody Thinks a Data Scientist Does What He Really Does
https://www.infoworld.com/article/3228245/data-science/the-80-20-data-science-dilemma.html
Cleaning What?
• Missing Values: N/A
• Wrong Values: Mark <> MArk
• Irrelevant Observations: City “Rome”
• Labelling Columns: Col1 -> Name
• Handling Outliers: Role: CIO, Salary: 500 K$
• Feature Scaling: 0-200k -> 0-1
• Aggregation
• Train/Test Set Split: Train: 80%, Test: 20%
Cleaning How? Data Flows - Filter - Aggregate - Join
Cleaning What?
• Missing Values (N/A): CASE … WHEN… or Automated
• Wrong Values (Mark <> MArk): UPPER
• Irrelevant Observations (City “Rome”): FILTER
• Labelling Columns (Col1 -> Name): COLUMN RENAME
• Handling Outliers (Role: CIO, Salary: 500 K$): FILTER?
• Feature Scaling (0-200k -> 0-1): KPI/(MAX-MIN) or Automated
• Train/Test Set Split (Train: 80%, Test: 20%): FILTER or Automated
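In OAC these cleaning operations are Data Flow steps. Outside OAC, the same ideas can be sketched in plain Python. This is only an illustration: the helper names (`normalise_name`, `min_max_scale`, `train_test_split`) and the sample values are assumptions, not part of the talk, and the scaling uses the common (x - min) / (max - min) form.

```python
import random

def normalise_name(name):
    """Upper-case and trim so that 'Mark' and 'MArk' compare equal."""
    return name.strip().upper()

def min_max_scale(values):
    """Rescale numbers to the 0-1 range using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical sample data, for illustration only
names = ["Mark", "MArk ", "mark"]
salaries = [0, 50_000, 100_000, 200_000]

clean_names = {normalise_name(n) for n in names}   # one distinct name left
scaled = min_max_scale(salaries)                   # values between 0 and 1
train, test = train_test_split(range(10))          # 8 rows train, 2 rows test
```

Fixing the seed keeps the 80/20 split repeatable, which matters when several models are compared on the same test set.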
Why Remove an Outlier?
Years Experience   Salary
1                  30,000
2                  32,000
3                  35,000
4                  35,500
5                  36,000
6                  40,000
7                  50,000
8                  70,000
9                  90,000
10                 500,000
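To see why the last row matters, fit a straight line through the table with and without it. The sketch below is an illustration rather than anything OAC runs: it takes the salaries as whole amounts (thirty thousand up to five hundred thousand) and computes an ordinary least-squares slope by hand, with `least_squares_slope` as a hypothetical helper name.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Data from the slide: years of experience and salary
years = list(range(1, 11))
salaries = [30_000, 32_000, 35_000, 35_500, 36_000,
            40_000, 50_000, 70_000, 90_000, 500_000]

slope_all = least_squares_slope(years, salaries)
slope_trimmed = least_squares_slope(years[:-1], salaries[:-1])

print(f"with outlier:    {slope_all:,.0f} per year")
print(f"without outlier: {slope_trimmed:,.0f} per year")
```

With all ten rows the fitted line rises by roughly 29,445 per year of experience. Without the last row it rises by 6,475. That single observation dominates the model.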
How To Find Outliers? One Dimension
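The slide does not name a method, so this sketch assumes one common choice for a single column: the interquartile-range rule (Tukey fences). Anything further than 1.5 × IQR outside the first or third quartile is flagged. The function name `iqr_outliers` and the multiplier `k` are illustrative, and the salaries are the ones from the earlier table.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

salaries = [30_000, 32_000, 35_000, 35_500, 36_000,
            40_000, 50_000, 70_000, 90_000, 500_000]

outliers = iqr_outliers(salaries)
print(outliers)  # only the 500,000 salary is flagged
```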
How To Find Outliers? Two Dimensions
Become a Data Scientist with OAC Connect Clean Transform & Enrich
Feature Engineering Location -> ZIP Code Additional Data Sources? Name -> Sex 2 Locations -> Distance Data Flow Day/Month/Year -> Date
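Two of the derived features on this slide are easy to sketch outside OAC: combining day, month and year into a date, and turning two locations into a distance. The code below is an illustration in plain Python. The helper names are hypothetical, the distance uses the haversine great-circle formula with a mean Earth radius of 6371 km, and the coordinates for Verona and San Francisco are approximate.

```python
import math
from datetime import date

def to_date(day, month, year):
    """Day/Month/Year -> Date: combine three columns into one date value."""
    return date(year, month, day)

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """2 Locations -> Distance: great-circle distance in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Approximate coordinates, for illustration only
verona = (45.44, 10.99)
san_francisco = (37.77, -122.42)

talk_date = to_date(16, 9, 2019)  # arbitrary example date
distance = haversine_km(*verona, *san_francisco)
print(talk_date.isoformat(), f"{distance:,.0f} km")
```

A distance column like this can be more useful to a model than four raw latitude and longitude columns, because it states the relationship directly.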
Data Preparation Recommendations
Spatial Enrichment Oracle Spatial Studio https://www.rittmanmead.com/blog/2019/07/oracle-spatial-studio/
Become a Data Scientist with OAC Connect Clean Transform & Enrich Analyse
Data Overview
Explain
Explain - Key Drivers
Become a Data Scientist with OAC Connect Clean Transform & Enrich Analyse Train & Evaluate
What Problem are we Trying to Solve? Supervised Unsupervised “I want to predict the value of Y, here are some examples” “Here is a dataset, make sense out of it!” Regression Classification Clustering https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d
Easy Models
NLP
DataFlow Train Model
Which Model - Parameters?
Select, Try, Save, Change, Try, Save …..
Compare - Classification Real Value Predicted Value Good Bad Good Bad
There is No Single Truth… 502/(502+896) = Precision 64.77% 471/(471+866)= 64.09%
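Precision is only one way to read a confusion matrix, which is why no single number settles which model is best. The sketch below computes accuracy, precision and recall from the four cells. The counts are hypothetical and do not come from the slide, and `classification_metrics` is an illustrative helper name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Summarise a 2x2 confusion matrix for the 'Good' class."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are right
        "precision": tp / (tp + fp),     # of wines predicted Good, how many are Good
        "recall": tp / (tp + fn),        # of wines really Good, how many were found
    }

# Hypothetical counts, for illustration only
metrics = classification_metrics(tp=80, fp=20, fn=40, tn=60)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

The same model scores 80% on precision but only about 67% on recall, so the metric should match the cost of each type of error in the problem being solved.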
Compare - Regression
Become a Data Scientist with OAC Connect Clean Transform & Enrich Analyse Train & Evaluate Predict
Use On the Fly
Step of a Data Flow
Congratulations! …You are now a Data Scientist!
Nearly There
Required Knowledge 50% 60% 80% 90% 95% 97%
…But Data Cleaning Feature Engineering Model Creation & Evaluation Feature Selection 80% > 50%
ML Production Deployment Data Scientist ML -> Data Oracle Advanced Analytics
Become a Data Scientist with OAC http://ritt.md/OAC-datascience
ML in Action with OAC http://ritt.md/OAC-ML-Video
Insights Lab https://www.rittmanmead.com/insight-lab/
OAC Data Science