Become a Data Scientist
Francesco Tisiot, Analytics Tech Lead
Slide 2
Francesco Tisiot, Analytics Tech Lead
Verona, Italy • http://ritt.md/ftisiot • Over 10 Years in Analytics • ft@rittmanmead.com • @FTisiot • Oracle ACE Director • ITOUG Board President
Slide 3
info@rittmanmead.com www.rittmanmead.com @rittmanmead
Data Engineering
Analytics
Data Science
Slide 4
Agenda
• OAC
• Data Scientist
• Become a Data Scientist
Slide 5
Oracle Analytics Cloud
• Platform Services (PaaS)
• Delivered entirely in the cloud:
  • No infrastructure footprint
  • Flexibility
  • Simplified, metered licensing
• Several options to suit your needs:
  • BYOL (Bring Your Own License)
• Functionality bundled into 2 editions:
  • Professional
  • Enterprise
Slide 6
Functions: OAC Supports Every Type of Analytics
Classic
Modern
Slide 7
Classic Enterprise BI
• Similar to OBIEE 12c
• Centrally maintained & governed
• Semantic model
• Interactive Dashboards
  • KPI measurement & monitoring
  • Guided navigation paths
• BI Publisher
  • Highly formatted, burst outputs
• Action Framework
  • Navigation actions
  • Scheduled agents
Slide 8
Modern Data Discovery
• Data Preparation
  • Acquire data
  • Clean/Enrich
  • Transform
  • Repeatable Flows
• Data Visualisation
  • Create visual insights rapidly
  • Construct narrated storyboards
  • Share findings
Slide 9
Unified Analytics
• Free Discovery: Raw Data to Insights, Data Enrichment and Cleaning
• Centralised Reporting: Unique Source of Truth, Specific Access Control
https://speakerdeck.com/ftisiot/become-an-equilibrista-find-the-right-balance-in-the-analytics-tech-ecosystem
Slide 10
Augmented Analytics
• Data Enrichment Suggestions
• Natural Language Processing
• Explain
• One-Click Advanced Analytics
• Advanced Machine Learning
Slide 11
Data Scientist
Slide 12
Data Scientist
Is a person who has the knowledge and skills to conduct sophisticated and systematic analyses of data. A data scientist extracts insights from data sets, and evaluates and identifies strategic opportunities.
https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/
Slide 13
Data Scientist
Is a Data Analyst who lives in California!
https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/
Slide 14
Data Scientist Skills
Slide 15
Brendan Tierney, Oracle ACE Director https://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html
Slide 16
Data Scientist …Company Missing a Data Scientist
Slide 17
Low-Hanging Fruit Theory
Democratise Data Science
Slide 18
Basic Operations
What are the Drivers for My Sales?
• Based on My Experience I Can Guess…
• Augmented Analytics: Statistically Significant Drivers for Sales Are…
Slide 19
Basic Operations: YES/NO
Is this Client Going to Accept the Offer?
From 50% to 70% with a Basic ML Model
Slide 20
Become a Data Scientist with OAC
Slide 21
Before Starting… Define the Problem!
Slide 22
Problem Definition: Predicting Wine Quality
Slide 23
TEP
• Task: Classify Good/Bad Wine
• Experience: Corpus of Wine Descriptions with Rating
• Performance: Accuracy
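The TEP framing can be made concrete in a few lines. A minimal Python sketch, assuming ratings are numeric scores and that a hypothetical cut-off (`threshold`, not given in the slides) separates Good from Bad wines: the Experience becomes labels, and Performance is measured as accuracy.

```python
def to_label(rating: float, threshold: float) -> str:
    """Task: map a numeric rating to Good/Bad (threshold is a hypothetical cut-off)."""
    return "Good" if rating >= threshold else "Bad"


def accuracy(actual: list[str], predicted: list[str]) -> float:
    """Performance: share of wines whose predicted class matches the real one."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Experience: a tiny, made-up set of ratings (illustrative values only)
ratings = [85, 92, 88, 95]
actual = [to_label(r, threshold=90) for r in ratings]
predicted = ["Bad", "Good", "Good", "Good"]
print(accuracy(actual, predicted))  # 0.75
```

Three of the four made-up predictions match, so accuracy is 0.75; with a real corpus the predictions would come from a trained model.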
Slide 24
Become a Data Scientist with OAC: Connect
Slide 25
Connection Options in OAC Pre-Defined Data Models
External Data Sources
Slide 26
Select Relevant Columns and Apply Filters
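Selecting columns and filtering rows is a projection plus a predicate. A minimal Python sketch on in-memory rows; the column names and values are hypothetical, not taken from the deck's dataset.

```python
# Hypothetical rows: column names and values are illustrative only.
rows = [
    {"country": "Italy", "points": 91, "price": 30.0, "description": "Ripe cherry"},
    {"country": "France", "points": 87, "price": None, "description": "Crisp apple"},
    {"country": "Italy", "points": 84, "price": 12.0, "description": "Light and fresh"},
]


def select_and_filter(rows, columns, keep):
    """Keep only the requested columns of the rows for which keep(row) is True."""
    return [{col: row[col] for col in columns} for row in rows if keep(row)]


subset = select_and_filter(
    rows,
    columns=["description", "points"],
    keep=lambda row: row["country"] == "Italy",
)
print(subset)
```

Dropping irrelevant columns early keeps later cleaning and training steps focused on the attributes that matter for the problem.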
Slide 27
Become a Data Scientist with OAC: Connect -> Clean
Slide 28
What Everybody Thinks a Data Scientist Does
What He Really Does
Become a Data Scientist with OAC: Connect -> Clean -> Transform & Enrich
Slide 37
Feature Engineering
• Additional Data Sources? Location -> ZIP Code, Name -> Sex
• Data Flow: 2 Locations -> Distance, Day/Month/Year -> Date
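Two of the derivations listed need no external data and can be sketched directly; Location -> ZIP Code and Name -> Sex would need reference data and are not shown. A minimal Python sketch, assuming locations are latitude/longitude pairs in degrees and a spherical Earth (haversine formula), which is an approximation.

```python
import math
from datetime import date


def to_date(day: int, month: int, year: int) -> date:
    """Day/Month/Year -> Date: combine three columns into one date value."""
    return date(year, month, day)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """2 Locations -> Distance: great-circle distance in km on a spherical Earth."""
    earth_radius_km = 6371.0  # mean Earth radius, a common approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


print(to_date(15, 7, 2019))                # 2019-07-15
print(round(haversine_km(0, 0, 0, 1), 1))  # 111.2
```

One degree of longitude along the equator is about 111.2 km, which is a quick sanity check for the formula.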
Slide 38
Data Preparation Recommendations
Slide 39
Spatial Enrichment
Oracle Spatial Studio https://www.rittmanmead.com/blog/2019/07/oracle-spatial-studio/
Slide 40
Become a Data Scientist with OAC: Connect -> Clean -> Transform & Enrich -> Analyse
Slide 41
Data Overview
Slide 42
Explain
Slide 43
Explain - Key Drivers
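The deck does not describe how Explain computes key drivers. As a simple stand-in (not OAC's method), candidate columns can be ranked by the absolute Pearson correlation with the target; this captures only linear association and does not test statistical significance. Column names and values below are made up.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equally long columns (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def rank_drivers(columns: dict[str, list[float]], target: list[float]) -> list[tuple[str, float]]:
    """Rank candidate columns by the strength of their linear link to the target."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Illustrative, made-up data: 'quality' is the value we want to explain
quality = [5.0, 6.0, 7.0, 8.0]
candidates = {
    "alcohol": [9.5, 10.5, 11.5, 12.5],
    "batch_id": [1.0, -1.0, 1.0, -1.0],
}
ranking = rank_drivers(candidates, quality)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```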
Slide 44
Become a Data Scientist with OAC: Connect -> Clean -> Transform & Enrich -> Analyse -> Train & Evaluate
Slide 45
What Problem are we Trying to Solve?
• Supervised: “I want to predict the value of Y, here are some examples” (Regression, Classification)
• Unsupervised: “Here is a dataset, make sense out of it!” (Clustering)
https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d
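The supervised/unsupervised split can be seen in two tiny algorithms. A minimal Python sketch on made-up 1-D data: a nearest-centroid classifier learns from labelled examples, while a simple k-means groups the same values without labels. Both are illustrative choices, not the models OAC uses.

```python
from statistics import mean


def train_nearest_centroid(xs: list[float], labels: list[str]) -> dict[str, float]:
    """Supervised: learn one centroid per label from labelled examples."""
    return {label: mean(x for x, y in zip(xs, labels) if y == label) for label in set(labels)}


def predict(centroids: dict[str, float], x: float) -> str:
    """Classification: assign the label of the closest centroid."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def kmeans_1d(xs: list[float], k: int = 2, iterations: int = 10) -> list[float]:
    """Unsupervised: group unlabelled values into k clusters (simple k-means)."""
    centers = sorted(xs)[:: max(1, len(xs) // k)][:k]  # naive, deterministic start
    for _ in range(iterations):
        clusters: list[list[float]] = [[] for _ in centers]
        for x in xs:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster (keep it if the cluster is empty)
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["Bad", "Bad", "Bad", "Good", "Good", "Good"]

centroids = train_nearest_centroid(values, labels)  # needs the labels
print(predict(centroids, 9.0))                     # Good
print(kmeans_1d(values))                           # [2.0, 11.0], no labels used
```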
Slide 46
Easy Models
Slide 47
NLP
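The deck does not detail how OAC processes text. A bag of words is one classic way to turn wine descriptions into numeric features a model can use. A minimal Python sketch; the stop-word list and vocabulary are tiny illustrative assumptions.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "of", "with"}  # tiny illustrative list


def bag_of_words(text: str) -> Counter:
    """Lower-case the text, split it into words and count them, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def to_vector(counts: Counter, vocabulary: list[str]) -> list[int]:
    """Fixed-order feature vector: one count per vocabulary word (0 if absent)."""
    return [counts[word] for word in vocabulary]


counts = bag_of_words("Ripe cherry and plum, with a ripe finish.")
print(to_vector(counts, ["ripe", "plum", "oak"]))  # [2, 1, 0]
```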
Slide 48
Data Flow: Train Model
Slide 49
Which Model - Parameters?
Slide 50
Select, Try, Save, Change, Try, Save…
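The select/try/save cycle is a parameter search. A minimal Python sketch, assuming a hypothetical one-parameter model (a score threshold) and made-up validation data; in practice the comparison should use data not seen during training.

```python
def accuracy(actual: list[str], predicted: list[str]) -> float:
    """Share of predictions that match the real class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def threshold_model(threshold: float):
    """Hypothetical one-parameter model: 'Good' when the score reaches the threshold."""
    return lambda score: "Good" if score >= threshold else "Bad"


def try_parameters(candidates, scores, actual):
    """Try each parameter value, save its accuracy, and return the best one."""
    results = {}
    for threshold in candidates:                          # select / change the parameter
        model = threshold_model(threshold)                # try it
        predicted = [model(s) for s in scores]
        results[threshold] = accuracy(actual, predicted)  # save the result
    best = max(results, key=results.get)
    return best, results


# Made-up validation data for illustration
scores = [0.2, 0.4, 0.6, 0.8]
actual = ["Bad", "Bad", "Good", "Good"]
best, results = try_parameters([0.3, 0.5, 0.7], scores, actual)
print(best, results)  # 0.5 {0.3: 0.75, 0.5: 1.0, 0.7: 0.75}
```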
Slide 51
Compare - Classification
Confusion matrix: Real Value (Good/Bad) vs. Predicted Value (Good/Bad)
Slide 52
There is No Single Truth…
Precision: 896/(502+896) = 64.09%
866/(471+866) = 64.77%
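Precision for a class is the number of correct predictions of that class divided by all predictions of that class, so each class gets its own value. A minimal Python sketch; the `confusion[predicted][actual]` layout and the small matrix in the usage note are hypothetical.

```python
def precision(correct: int, wrong: int) -> float:
    """Precision for one class: correct predictions / all predictions of that class."""
    total = correct + wrong
    return correct / total if total else 0.0


def per_class_precision(confusion: dict[str, dict[str, int]]) -> dict[str, float]:
    """confusion[predicted][actual] holds counts; returns precision per predicted class."""
    result = {}
    for predicted, actual_counts in confusion.items():
        correct = actual_counts.get(predicted, 0)
        wrong = sum(n for actual, n in actual_counts.items() if actual != predicted)
        result[predicted] = precision(correct, wrong)
    return result


print(f"{precision(896, 502):.2%}")  # 64.09%
print(f"{precision(866, 471):.2%}")  # 64.77%
```

For a hypothetical matrix `{"Good": {"Good": 8, "Bad": 2}, "Bad": {"Bad": 6, "Good": 4}}`, `per_class_precision` gives 0.8 for Good and 0.6 for Bad.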
Slide 53
Compare - Regression
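The deck does not list which metrics OAC reports for regression. MAE, RMSE and R² are common choices for comparing models that predict a number, such as a wine rating. A minimal Python sketch with two made-up sets of predictions.

```python
import math


def mae(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean squared error: like MAE, but penalises large errors more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Share of the variance in the actual values explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [3.0, 5.0, 7.0]
model_a = [2.0, 5.0, 9.0]
model_b = [3.5, 4.5, 7.5]
for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, round(mae(actual, pred), 3), round(rmse(actual, pred), 3), round(r_squared(actual, pred), 3))
```

Lower MAE and RMSE and higher R² favour model B here; on real data the metrics can disagree, which is why it helps to look at more than one.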
Slide 54
Become a Data Scientist with OAC: Connect -> Clean -> Transform & Enrich -> Analyse -> Train & Evaluate -> Predict
Slide 55
Use On the Fly
Slide 56
Step in a Data Flow
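Applying a saved model, on the fly or as a flow step, amounts to scoring each new row and adding the result as a column. A minimal Python sketch; `saved_model` is a hypothetical fixed rule standing in for a trained model, and the output column name `prediction` is an assumption, not OAC's naming.

```python
from typing import Any, Callable

Row = dict[str, Any]


def apply_model(rows: list[Row], model: Callable[[Row], Any], output_column: str = "prediction") -> list[Row]:
    """Score each row and return copies with an extra prediction column."""
    return [{**row, output_column: model(row)} for row in rows]


def saved_model(row: Row) -> str:
    """Hypothetical stand-in for a trained classifier."""
    return "Good" if row["points"] >= 90 else "Bad"


new_wines = [{"name": "Wine A", "points": 93}, {"name": "Wine B", "points": 86}]
scored = apply_model(new_wines, saved_model)
print(scored)
```

Returning copies leaves the input rows unchanged, similar to a flow step that writes its output to a new dataset.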
Slide 57
Congratulations!
…You are now a Data Scientist!
Slide 58
Nearly There
Slide 59
Required Knowledge
(Chart: 50%, 60%, 80%, 90%, 95%, 97%)
Slide 60
…But
• Data Cleaning
• Feature Engineering
• Model Creation & Evaluation
• Feature Selection
80% > 50%
Slide 61
ML Production Deployment
• Data Scientist
• ML -> Data: Oracle Advanced Analytics
Slide 62
Become a Data Scientist with OAC
http://ritt.md/OAC-datascience