Rediscover the known Universe with NASA dataset

A presentation at DevFest Nantes 2018 in October 2018 in Nantes, France by Horacio Gonzalez

Slide 1

Rediscover the known Universe with NASA dataset
Pierre Zemb, Aurélien Hébert, Horacio Gonzalez

Slide 2

Pierre Zemb @PierreZ
Infrastructure Engineer working on Metrics / Kubernetes

Slide 3

Aurélien Hébert @AurrelH95
Software Engineer and data lover 😍

Slide 4

Horacio Gonzalez @LostInBrittany
Spaniard lost in Brittany, developer, dreamer and all-around geek

Slide 5

HelloExoWorld
Looking for exoplanets in NASA datasets

Slide 6

Once upon a time...
HelloExoWorld

Slide 7

What not to do if you love astronomy?
Live in Brest

Slide 8

Looking for solutions
Mixing passions

Slide 9

Google is your friend...
Let's find a project

Slide 10

Exoplanets?
Planets orbiting stars far away

Slide 11

How do we find them?
The transit method seems to be the best

Slide 12

Exoplanet detection
From theory to practice

Slide 13

The transit method
Credits: NASA's Goddard Space Flight Center
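
The transit method looks for a small, repeated dip in a star's brightness when a planet passes in front of it. The fractional dip, the transit depth, is roughly the ratio of the disc areas, (Rp/Rs)². The sketch below illustrates this on a synthetic, noise-free light curve with box-shaped transits; the function names, the threshold and the numbers are illustrative assumptions, not the HelloExoWorld code.

```python
from statistics import median

def box_light_curve(n_points, period, duration, depth, baseline=1.0):
    """Synthetic light curve: flat baseline with box-shaped transits (illustrative only)."""
    flux = []
    for t in range(n_points):
        in_transit = (t % period) < duration
        flux.append(baseline * (1.0 - depth) if in_transit else baseline)
    return flux

def transit_depth(flux, threshold=0.999):
    """Estimate depth as 1 - median(in-transit flux) / median(out-of-transit flux).

    Points below threshold * (overall median) are treated as in transit.
    """
    reference = median(flux)
    in_transit = [f for f in flux if f < threshold * reference]
    out_transit = [f for f in flux if f >= threshold * reference]
    if not in_transit:
        return 0.0  # no dip found
    return 1.0 - median(in_transit) / median(out_transit)

# A Jupiter-sized planet around a Sun-like star: Rp/Rs ~ 0.1, so depth ~ 0.01 (1 %)
ratio = 0.1
flux = box_light_curve(n_points=1000, period=100, duration=5, depth=ratio ** 2)
print(round(transit_depth(flux), 4))  # 0.01
```

A 1 % dip is already a strong signal; an Earth-sized planet around a Sun-like star (Rp/Rs ~ 0.009) dims it by less than 0.01 %, which is why real light curves need careful detrending.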

Slide 14

How do we look for transits?
Image credits: NASA Kepler, NASA TESS

Slide 15

Slide 16

Watching the sky
By Carter Roberts [Public domain], via Wikimedia Commons

Slide 17

Kepler image
A star: 12×12 px
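
With a star covering only a small stamp of pixels, one common way to turn those pixels into a light curve is simple aperture photometry: for each frame, sum the pixel values that fall inside an aperture mask. The sketch below assumes each frame is a 12×12 list of pixel values and uses a hypothetical aperture (the central 4×4 pixels); the names and data are illustrative, not Kepler's actual pipeline.

```python
from typing import List

Frame = List[List[float]]  # one 2-D pixel stamp, e.g. 12x12
Mask = List[List[bool]]    # True where the pixel belongs to the aperture

SIZE = 12

def aperture_flux(frame: Frame, mask: Mask) -> float:
    """Sum the pixel values selected by the aperture mask in one frame."""
    return sum(
        value
        for row, mask_row in zip(frame, mask)
        for value, selected in zip(row, mask_row)
        if selected
    )

def light_curve(frames: List[Frame], mask: Mask) -> List[float]:
    """One summed flux value per frame, i.e. a very simplified light curve."""
    return [aperture_flux(frame, mask) for frame in frames]

# Hypothetical aperture: the central 4x4 pixels of the 12x12 stamp
mask = [[4 <= r < 8 and 4 <= c < 8 for c in range(SIZE)] for r in range(SIZE)]

def stamp(star_level: float, background: float = 1.0) -> Frame:
    """Synthetic frame: the star fills the aperture, background everywhere else."""
    return [[star_level if mask[r][c] else background for c in range(SIZE)]
            for r in range(SIZE)]

print(light_curve([stamp(100.0), stamp(99.0)], mask))  # [1600.0, 1584.0]
```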

Slide 18

And what kind of data do we get?
Pleiades. By NASA, ESA, AURA/Caltech, Palomar Observatory, via Wikimedia Commons

Slide 19

Well, that's the problem
Seven stars, seven different profiles

Slide 20

Kinda big data
Over 40 million light curves

Slide 21

Big AND open data
Lots of datasets in #opendata

Slide 22

And we can help with that!
Let's use our tools to analyse the data

Slide 23

Time Series
To analyse Kepler datasets

Slide 24

Kepler: Time Series from space
Definition of a Time Series: a series of data points indexed in time order

Slide 25

Time Series
● Stock Market Analysis
● Economic Forecasting
● Budgetary Analysis
● Process and Quality Control
● Workload Projections
● Census Analysis
● ...

Slide 26

Time Series applications:
● Understanding the data
● Fitting a model
  ○ Monitoring
  ○ Forecasting

Slide 27

Time Series
● Stock market analytics, economic forecasting: $$$
● Study & research

Slide 28

Time Series
Many specific analytical tools:
● Moving average
● ARMA (AutoRegressive Moving Average)
● Multivariate ARMA models
● ARCH (AutoRegressive Conditional Heteroscedasticity)
● Dynamic time warping
● ...
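
The moving average is the simplest tool on that list, and it is also handy for light curves: dividing the flux by a wide moving average removes slow stellar or instrumental trends while keeping short dips such as transits. A minimal sketch of a centered simple moving average and a detrending step; the function names and the odd-window requirement are our own assumptions for the example.

```python
from typing import List

def moving_average(values: List[float], window: int) -> List[float]:
    """Centered simple moving average over `window` points (window must be odd).

    Near the edges the window is truncated, so fewer points are averaged.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]  # slicing clips at the end
        averaged.append(sum(chunk) / len(chunk))
    return averaged

def detrend(flux: List[float], window: int) -> List[float]:
    """Divide the flux by its moving average so the baseline sits around 1.0."""
    trend = moving_average(flux, window)
    return [f / t for f, t in zip(flux, trend)]

print(moving_average([1, 2, 3, 4, 5], 3))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

The window should be several times longer than a transit: if it is about as long as the dip, the trend follows the dip and dividing by it flattens the very signal we are looking for.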

Slide 29

Time Series
Specific applications of general tools:
● Artificial neural networks
● Hidden Markov models
● Fourier & wavelet transforms
● Entropy encoding
● ...

Slide 30

Dealing with Time Series
The 3 V's

Slide 31

Monitoring OVH with Time Series

Slide 32

OVH Metrics
A metrics data platform

Slide 33

Tools to deal with Time Series
Many options

Slide 34

Metrics Data Platform

Slide 35

Metrics' metrics
● 1.5M datapoints/s, 24/7
● Peaks at ~10M datapoints/s
● 500M unique series
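
For scale, the sustained rate alone translates into the following daily volume, assuming the 1.5M datapoints/s average holds around the clock (peaks not included):

```latex
1.5 \times 10^{6}\ \frac{\text{datapoints}}{\text{s}} \times 86\,400\ \frac{\text{s}}{\text{day}}
  = 1.296 \times 10^{11}\ \frac{\text{datapoints}}{\text{day}}
```

That is roughly 130 billion datapoints ingested every day.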

Slide 36

Metrics Data Platform

Slide 37

Why Warp 10?
Warp 10 is a software platform that:
● Ingests and stores time series
● Manipulates and analyzes time series

Slide 38

Analytics is the key to success
Fetching data is only the tip of the iceberg

Slide 39

Manipulating Time Series with Warp 10
A true Time Series analysis toolbox:
  ○ Hundreds of functions
  ○ Manipulation frameworks
  ○ Analysis workflow
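
One of Warp 10's manipulation frameworks is BUCKETIZE, which aligns a series on regular time buckets and aggregates the datapoints that fall into each bucket. The Python sketch below only illustrates the idea with a mean aggregator; it is not WarpScript, the function name is hypothetical, and labelling each bucket by its start timestamp is our own simplification rather than BUCKETIZE's exact semantics.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Datapoint = Tuple[int, float]  # (timestamp, value)

def bucketize_mean(points: List[Datapoint], bucket_span: int) -> List[Datapoint]:
    """Group datapoints into fixed-width time buckets and keep the mean of each one.

    Each output datapoint is labelled with the start timestamp of its bucket.
    """
    if bucket_span <= 0:
        raise ValueError("bucket_span must be positive")
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        start = (ts // bucket_span) * bucket_span  # bucket containing ts
        buckets[start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]

points = [(0, 1.0), (10, 3.0), (25, 2.0), (31, 4.0), (39, 6.0)]
print(bucketize_mean(points, 20))  # [(0, 2.0), (20, 4.0)]
```

Downsampling like this is a typical first step before comparing light curves sampled at slightly different times.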

Slide 40

Anatomy of a time series
Each time series is composed of:
● Metadata
  ○ Class name
  ○ Labels
● Datapoints
  ○ Timestamp
  ○ Value
org.nasa.kepler.starlight { keplerId: 52163778 }
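
To make this anatomy concrete, here is a minimal Python model of a series (class name, labels, time-ordered datapoints) that renders each datapoint as a line of Warp 10's text input format, `TS// CLASS{LABELS} VALUE`. We assume microsecond timestamps (Warp 10's default time unit), leave the optional location and elevation slots empty, and percent-encode names and values. The `TimeSeries` class and its methods are hypothetical helpers, not part of any Warp 10 client library; note the label is written `keplerId=52163778`, the input format's syntax.

```python
from bisect import insort
from dataclasses import dataclass, field
from typing import Dict, List, Tuple
from urllib.parse import quote

@dataclass
class TimeSeries:
    class_name: str          # kind of measure, e.g. starlight
    labels: Dict[str, str]   # particular traits of this series
    datapoints: List[Tuple[int, float]] = field(default_factory=list)  # (timestamp in µs, value)

    def add(self, timestamp_us: int, value: float) -> None:
        """Insert a datapoint, keeping the series indexed in time order."""
        insort(self.datapoints, (timestamp_us, value))

    def to_input_format(self) -> List[str]:
        """One 'TS// CLASS{LABELS} VALUE' line per datapoint (no location/elevation)."""
        labels = ",".join(
            f"{quote(k, safe='')}={quote(v, safe='')}"
            for k, v in sorted(self.labels.items())
        )
        selector = f"{quote(self.class_name, safe='')}{{{labels}}}"
        return [f"{ts}// {selector} {value}" for ts, value in self.datapoints]

series = TimeSeries("org.nasa.kepler.starlight", {"keplerId": "52163778"})
series.add(2_000_000, 0.99)
series.add(1_000_000, 1.0)  # added out of order, stored in time order
for line in series.to_input_format():
    print(line)
# 1000000// org.nasa.kepler.starlight{keplerId=52163778} 1.0
# 2000000// org.nasa.kepler.starlight{keplerId=52163778} 0.99
```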

Slide 41

Class names and labels
● Class names define the kind of measurement
  ○ Starlight, heart rate, speed…
● Labels define particular traits of a time series
  ○ Device ID, device type, private user ID…
org.nasa.kepler.starlight { keplerId: 52163778 }

Slide 42

A match made in heaven
Warp 10, OVH Metrics and HelloExoWorld

Slide 43

What we have done:
● Downloaded and parsed 40 million FITS files
● Pushed them to OVH Metrics
● Selected a cool subset as a training set
● Verified we could find the same planets as NASA

Slide 44

From Kepler-11 raw data

Slide 45

To candidate exoplanets

Slide 46

Your job

Slide 47

Let's get started!
1. Connect to https://bit.ly/2H7Z5b3, or connect to the Wi-Fi network HEW-5G (or HEW)
2. The Wi-Fi password is helloexoworld
3. Click Cancel on the username/password window
4. Open 192.168.1.2 in Chrome/Chromium
Reach step 3.2 and enjoy!

Slide 48

What's next?
Where do we go from here?

Slide 49

Only the beginning
● Better detection
● New import method
● Explorer
● Deep learning
● Satellite/star location
● Yours?

Slide 50

A growing team

Slide 51

And you! Join us!
https://helloexo.world
https://xkcd.com/1371/

Slide 52

OVH Metrics
Come speak with us about your monitoring and Kubernetes projects!

Slide 53

Thank you, dear sponsors!

Slide 54

Thank you!