Rediscover the known Universe with NASA datasets @LostInBrittany @AurrelH95
@PierreZ @moyowi
@jugsummercamp
@helloexoworld
Slide 2
Pierre Zemb @PierreZ Infrastructure Engineer working on distributed systems
Slide 3
Aurélien Hébert @AurrelH95 Software Engineer and data lover
Slide 4
Horacio Gonzalez @LostInBrittany Spaniard lost in Brittany, developer, dreamer and all-around geek
Slide 5
Emmanuel we ♥ you
Slide 6
HelloExoWorld
Looking for exoplanets in NASA datasets
Slide 7
HelloExoWorld Once upon a time...
Slide 8
What not to do if you love astronomy
Live in Brest
Slide 9
Looking for solutions
Computer stuff
Astronomy
Mixing passions
Slide 10
Google is your friend...
Let's find a project
Slide 11
Exoplanets?
Planets orbiting distant stars
Slide 12
How do we find them?
The transit method seems the best
Slide 13
Exoplanet detection: from theory to practice
Slide 14
The transit method
Credits: NASA’s Goddard Space Flight Center
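A back-of-the-envelope sketch of what the transit method measures, under simplifying assumptions that are not from the slides (circular orbit, no limb darkening, planet entirely in front of the stellar disk): the fractional dip in brightness is roughly the ratio of the two disk areas, (R_planet / R_star)², and a periodic transit can be modelled as a flat light curve with box-shaped dips. The function names are hypothetical.

```python
def transit_depth(planet_radius_km: float, star_radius_km: float) -> float:
    """Fraction of starlight blocked when the planet is in front of the star."""
    return (planet_radius_km / star_radius_km) ** 2


def box_light_curve(times, period, t0, duration, depth):
    """Normalized flux (1.0 = out of transit) for a box-shaped transit model.

    A sample is in transit when its distance to the nearest transit centre
    (t0 + k * period) is less than half the transit duration.
    """
    flux = []
    for t in times:
        # Signed time to the nearest transit centre, in [-period/2, period/2)
        phase = (t - t0 + period / 2) % period - period / 2
        flux.append(1.0 - depth if abs(phase) < duration / 2 else 1.0)
    return flux


if __name__ == "__main__":
    # Approximate radii in km: Earth 6371, Jupiter 69911, Sun 696340
    print(f"Earth-like:   {transit_depth(6371, 696340) * 1e6:.0f} ppm")  # 84 ppm
    print(f"Jupiter-like: {transit_depth(69911, 696340):.1%}")           # 1.0%
    times = [0.1 * i for i in range(100)]
    curve = box_light_curve(times, period=3.0, t0=1.0, duration=0.3, depth=0.01)
    print(sum(1 for f in curve if f < 1.0), "samples in transit")
```

An Earth-sized planet crossing a Sun-like star dims it by less than a ten-thousandth, which is why the instruments have to measure brightness so precisely.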
Slide 15
How do we look for transits?
Kepler (image credits: NASA)
TESS (image credits: NASA)
Slide 16
Slide 17
Watching the sky
By Carter Roberts [Public domain], via Wikimedia Commons
Slide 18
Kepler image
A star: 12×12 px
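Turning such a small pixel stamp into one brightness value per exposure is essentially aperture photometry: add up the pixels that belong to the star and subtract the sky background. A minimal sketch, assuming each frame is a calibrated 12×12 grid of pixel values and the aperture is a boolean mask of the same shape; the helper names and the median background estimate are illustrative choices, not the Kepler pipeline itself.

```python
from statistics import median

STAMP_SIZE = 12  # a 12x12 pixel stamp, as on the slide


def circular_mask(radius, center=(5.5, 5.5)):
    """Aperture: True for pixels whose centre lies within `radius` of `center`."""
    cy, cx = center
    return [
        [(r - cy) ** 2 + (c - cx) ** 2 <= radius ** 2 for c in range(STAMP_SIZE)]
        for r in range(STAMP_SIZE)
    ]


def aperture_flux(frame, mask):
    """Sum of background-subtracted pixel values inside the aperture."""
    pixels = [(frame[r][c], mask[r][c]) for r in range(STAMP_SIZE) for c in range(STAMP_SIZE)]
    inside = [v for v, m in pixels if m]
    outside = [v for v, m in pixels if not m]
    # Illustrative background estimate: median of the pixels outside the aperture
    background = median(outside) if outside else 0.0
    return sum(v - background for v in inside)


def light_curve(frames, mask):
    """One flux value per exposure; the resulting sequence is the star's light curve."""
    return [aperture_flux(frame, mask) for frame in frames]
```

Repeating this for every exposure yields the light curve; a transit shows up as a few consecutive exposures with a slightly lower sum.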
Slide 19
And what kind of data do we get?
Pleiades. By NASA, ESA, AURA/Caltech, Palomar Observatory, via Wikimedia Commons
Slide 20
Well, that's the problem
Seven stars, seven different profiles
Slide 21
Kinda big data
Over 40 million light curves
Slide 22
Big AND open data
Lots of datasets in #opendata
Slide 23
And we can help with that!
Let's use our tools to analyse the data
Slide 24
Time Series
To analyse Kepler datasets
Slide 25
Kepler: time series from space
Definition of a time series: a series of data points indexed in time order
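To make "indexed in time order" concrete, here is a minimal sketch of an in-memory time series that keeps its timestamps sorted, so that a time-range query is a binary search rather than a full scan. The class and method names are hypothetical, and timestamps are assumed to be integers (for example microseconds).

```python
from bisect import bisect_left, bisect_right, insort


class TimeSeries:
    """Datapoints kept sorted by timestamp (hypothetical helper, not a library API)."""

    def __init__(self):
        self._ticks = []   # timestamps, always in ascending order
        self._values = {}  # timestamp -> value

    def add(self, tick, value):
        if tick not in self._values:
            insort(self._ticks, tick)  # insert while keeping time order
        self._values[tick] = value     # a repeated timestamp overwrites the value

    def between(self, start, end):
        """Datapoints with start <= tick <= end, oldest first."""
        lo = bisect_left(self._ticks, start)
        hi = bisect_right(self._ticks, end)
        return [(t, self._values[t]) for t in self._ticks[lo:hi]]

    def __len__(self):
        return len(self._ticks)
```

Points can arrive out of order, but they are always read back chronologically, which is the property every analysis tool on the following slides relies on.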
Slide 26
Time Series
● Stock Market Analysis
● Economic Forecasting
● Budgetary Analysis
● Process and Quality Control
● Workload Projections
● Census Analysis
● ...
Slide 27
Time Series
Applications:
● Understanding the data
● Fitting a model
○ Monitoring
○ Forecasting
Slide 28
Time Series
Stock Market Analytics
Economic Forecasting
$$$
Study & Research
Slide 29
Time Series
Many specific analytical tools:
● Moving average
● ARMA (AutoRegressive Moving Average)
● Multivariate ARMA models
● ARCH (AutoRegressive Conditional Heteroscedasticity)
● Dynamic time warping
● ...
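The moving average, first in that list, is the simplest of these tools to show. A minimal sketch of a centred moving average, assuming evenly spaced samples and an odd window; near the edges it averages over a shrinking window instead of padding, which is one design choice among several.

```python
def moving_average(values, window=5):
    """Centred moving average; edge points use the samples that are available."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    print(moving_average([1, 2, 3, 4, 5], window=3))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

With a window much wider than a transit, the average follows the star's slow variability; dividing the raw flux by it is one simple way to flatten a light curve before looking for short dips.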
Slide 30
Time Series
Specific applications of general tools:
● Artificial neural networks
● Hidden Markov models
● Fourier and wavelet transforms
● Entropy encoding
● ...
Slide 31
Dealing with Time Series
The three Vs: volume, velocity, variety
Slide 32
Monitoring OVH with Time Series
Slide 33
OVH Metrics A metrics data platform
Slide 34
Tools to deal with Time Series
Many options
Slide 35
Metrics Data Platform
Slide 36
Metrics’ metrics
● 1.5M datapoints/s, 24/7
● Peaks at ~10M datapoints/s
● 500M unique series
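For scale, a quick check of what the sustained rate alone (ignoring the peaks) means per day, with one day being 86,400 seconds:

```latex
1.5 \times 10^{6}\,\frac{\text{datapoints}}{\text{s}} \times 86\,400\,\frac{\text{s}}{\text{day}} \approx 1.3 \times 10^{11}\,\frac{\text{datapoints}}{\text{day}}
```

That is roughly 130 billion datapoints ingested every day.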
Slide 37
Metrics Data Platform
Slide 38
Why Warp 10?
Warp 10 is a software platform that:
● Ingests and stores time series
● Manipulates and analyzes time series
Slide 39
Analytics is the key to success
Fetching data is only the tip of the iceberg
Slide 40
Manipulating Time Series with Warp 10
A true Time Series analysis toolbox
○ Hundreds of functions
○ Manipulation frameworks
○ Analysis workflow
Slide 41
Anatomy of a time series
Each time series is composed of:
● Metadata
○ Class name
○ Labels
org.nasa.kepler.starlight { keplerId: 52163778 }
● Datapoints
○ Timestamp
○ Value
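The anatomy above maps naturally onto a small record type: metadata (class name and labels) plus a list of (timestamp, value) pairs. A minimal sketch; the `Series` class and its methods are hypothetical, and it leaves out the optional location and elevation that Warp 10 can also attach to each datapoint.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Series:
    """Hypothetical in-memory model of one time series: metadata plus datapoints."""

    class_name: str         # what is measured, e.g. "org.nasa.kepler.starlight"
    labels: Dict[str, str]  # traits of this particular series, e.g. {"keplerId": "52163778"}
    datapoints: List[Tuple[int, float]] = field(default_factory=list)  # (timestamp, value)

    def add(self, timestamp: int, value: float) -> None:
        self.datapoints.append((timestamp, value))

    def describe(self) -> str:
        """Render the metadata the way the slide shows it."""
        inner = ", ".join(f"{k}: {v}" for k, v in sorted(self.labels.items()))
        return f"{self.class_name} {{ {inner} }}"


if __name__ == "__main__":
    star = Series("org.nasa.kepler.starlight", {"keplerId": "52163778"})
    star.add(1, 1.0)
    print(star.describe())  # org.nasa.kepler.starlight { keplerId: 52163778 }
```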
Slide 42
Class names and labels
● Class names define the kind of measurement
○ Starlight, heart rate, speed…
● Labels define the particular traits of a time series
○ Device ID, device type, private user ID...
org.nasa.kepler.starlight { keplerId: 52163778 }
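Labels are what let you pick one star's light curve out of millions. A minimal sketch of that lookup over an in-memory catalogue of (class name, labels) pairs; the `find` helper is hypothetical, and a platform such as Warp 10 resolves its own selectors server-side rather than like this.

```python
from typing import Dict, Iterable, List, Tuple

Meta = Tuple[str, Dict[str, str]]  # (class name, labels)


def find(catalog: Iterable[Meta], class_name: str, **wanted: str) -> List[Meta]:
    """Series with exactly this class name whose labels contain every wanted pair."""
    return [
        (cls, labels)
        for cls, labels in catalog
        if cls == class_name
        and all(labels.get(k) == v for k, v in wanted.items())
    ]
```

For example, `find(catalog, "org.nasa.kepler.starlight", keplerId="52163778")` keeps only that star, while omitting the label returns every starlight series.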
Slide 43
A match made in heaven Warp 10, OVH Metrics and HelloExoWorld
Slide 44
What we have done
● Downloaded and parsed 40 million FITS files
● Pushed them to OVH Metrics
● Selected a cool subset as a training set
● Verified we could find the same planets as NASA
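Pushing parsed light curves to a Warp 10 based platform such as OVH Metrics means sending datapoints in Warp 10's plain-text GTS input format, one line per datapoint: `TS/LAT:LON/ELEV CLASS{LABELS} VALUE`. A minimal sketch that renders those lines with location and elevation left empty (`TS//`). It assumes timestamps are already integers in the platform's time unit (microseconds by default) and that values are numeric and rendered with Python's `str()`; strings and booleans have their own encodings, not handled here. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple
from urllib.parse import quote


def _enc(text: str) -> str:
    # Percent-encode everything except letters, digits and "_.-~", so that
    # characters such as "{", "}", ",", "=" or spaces cannot break the line.
    return quote(text, safe="")


def to_gts_input(
    class_name: str,
    labels: Dict[str, str],
    datapoints: Iterable[Tuple[int, float]],
) -> List[str]:
    """One input line per datapoint, with empty location and elevation."""
    label_str = ",".join(f"{_enc(k)}={_enc(v)}" for k, v in sorted(labels.items()))
    head = f"{_enc(class_name)}{{{label_str}}}"
    return [f"{ts}// {head} {value}" for ts, value in datapoints]


if __name__ == "__main__":
    for line in to_gts_input("org.nasa.kepler.starlight", {"keplerId": "52163778"},
                             [(1000000, 1.5), (2000000, 1.25)]):
        print(line)  # e.g. 1000000// org.nasa.kepler.starlight{keplerId=52163778} 1.5
```

The resulting lines would then be sent in the body of an ingestion (update) request; check the platform documentation for the exact endpoint and authentication header.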
Slide 45
From Kepler-11 raw data
Slide 46
To candidate exoplanets
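Going from raw data to candidates starts with spotting dips. A minimal sketch of one naive rule, which is an illustration and not the HelloExoWorld detection method: normalize the flux by its median, estimate the noise robustly with the median absolute deviation, and report runs of consecutive points more than `n_sigma` noise levels below normal. Real searches also look for the dips repeating at a fixed period (for example with Box Least Squares); the function name and defaults here are assumptions.

```python
from statistics import median


def find_dips(flux, n_sigma=3.0, min_points=2):
    """Return (first_index, last_index) for each run of unusually low points."""
    med = median(flux)
    norm = [f / med for f in flux]  # 1.0 means "median brightness"
    # Robust noise estimate: median absolute deviation scaled to a Gaussian sigma
    sigma = 1.4826 * median(abs(x - 1.0) for x in norm)
    threshold = 1.0 - n_sigma * sigma
    dips, start = [], None
    for i, x in enumerate(norm + [1.0]):  # trailing sentinel closes an open run
        if x < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_points:
                dips.append((start, i - 1))
            start = None
    return dips
```

Requiring at least `min_points` consecutive low samples filters out single-point outliers such as cosmic-ray hits, at the cost of missing very short transits.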
Slide 47
Your job
Slide 48
Let's get started!
1. Connect to https://bit.ly/2H7Z5b3 or connect to the Wi-Fi network HEW-5G (or HEW)
2. The password is helloexoworld
3. Click Cancel on the username/password window
4. Open Chrome/Chromium on 192.168.1.2
Reach step 3.2 and enjoy!
Slide 49
What's next? Where do we go from here?
Slide 50
Only the beginning
Better detection
New import method
Explorer
Deep learning
Satellite/star location
Yours?
Slide 51
A growing team
Slide 52
And you!
Join us!
https://helloexo.world
https://xkcd.com/1371/
Slide 53
OVH Metrics
Come talk to us about your time series or monitoring projects and OVH Metrics