Rediscover the known Universe with NASA dataset

A presentation at JUG Summer Camp in September 2018 in La Rochelle, France, by Horacio Gonzalez

Slide 1

Rediscover the known Universe with NASA dataset @LostInBrittany @AurrelH95 @PierreZ @moyowi @jugsummercamp @helloexoworld

Slide 2

Pierre Zemb (@PierreZ): Infrastructure Engineer working on distributed systems

Slide 3

Aurélien Hébert (@AurrelH95): Software Engineer and data lover

Slide 4

Horacio Gonzalez (@LostInBrittany): Spaniard lost in Brittany, developer, dreamer and all-around geek

Slide 5

Emmanuel, we ♥ you

Slide 6

HelloExoWorld: looking for exoplanets in NASA datasets

Slide 7

HelloExoWorld: once upon a time...

Slide 8

What not to do if you love astronomy: live in Brest

Slide 9

Looking for solutions: computer stuff, astronomy, mixing passions

Slide 10

Google is your friend... Let's find a project

Slide 11

Exoplanets? Planets orbiting faraway stars

Slide 12

How do we find them? The transit method seems the best

Slide 13

Exoplanet detection: from theory to practice

Slide 14

The transit method (credits: NASA's Goddard Space Flight Center)
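A transit dims the star only very slightly. As a rough illustration, here is a minimal sketch of the standard textbook approximation: the fractional drop in brightness is about the ratio of the disk areas, (R_planet / R_star)². It ignores limb darkening and grazing transits, and the radii are nominal reference values rather than figures from the talk.

```python
# Minimal sketch of the textbook transit-depth approximation:
# depth ~= (R_planet / R_star) ** 2, i.e. the fraction of the stellar disk
# hidden by the planet. Limb darkening and grazing transits are ignored.

SUN_RADIUS_KM = 695_700.0      # nominal solar radius
EARTH_RADIUS_KM = 6_371.0      # mean Earth radius
JUPITER_RADIUS_KM = 69_911.0   # mean Jupiter radius


def transit_depth(planet_radius_km: float, star_radius_km: float) -> float:
    """Fractional flux drop (0.0 to 1.0) during a central transit."""
    if planet_radius_km <= 0 or star_radius_km <= 0:
        raise ValueError("radii must be positive")
    return (planet_radius_km / star_radius_km) ** 2


if __name__ == "__main__":
    for name, radius in [("Earth", EARTH_RADIUS_KM), ("Jupiter", JUPITER_RADIUS_KM)]:
        depth = transit_depth(radius, SUN_RADIUS_KM)
        print(f"{name}-size planet, Sun-like star: {depth * 1e6:.0f} ppm")
```

With these numbers an Earth-size planet in front of a Sun-like star removes roughly 84 parts per million of the light, and a Jupiter-size planet about 1%, which is why transit searches need very precise and very long photometric series.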

Slide 15

How do we look for transits? Kepler and TESS (image credits: NASA)

Slide 16


Slide 17

Watching the sky (image by Carter Roberts [Public domain], via Wikimedia Commons)

Slide 18

Kepler image: a star is 12×12 px
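If a star only covers a small pixel stamp, one brightness value per frame can be obtained by summing the pixels that belong to the star. The sketch below is a hypothetical, simplified aperture photometry: the square `central_mask` aperture and the helper names are assumptions for illustration, and real Kepler processing uses optimized apertures and background subtraction that are left out here.

```python
# Hypothetical, simplified aperture photometry on a small pixel stamp
# (12x12 px, as on the slide). Summing the pixels selected by an aperture
# mask gives one flux value per frame; the values over time form a light curve.
# No background subtraction or optimal aperture selection is done here.

from typing import List

Stamp = List[List[float]]  # rows of pixel fluxes for one frame
Mask = List[List[bool]]    # True where a pixel belongs to the aperture


def aperture_flux(stamp: Stamp, mask: Mask) -> float:
    """Sum the flux of the pixels selected by the aperture mask."""
    if len(stamp) != len(mask) or any(len(r) != len(m) for r, m in zip(stamp, mask)):
        raise ValueError("stamp and mask must have the same shape")
    return sum(
        value
        for row, mask_row in zip(stamp, mask)
        for value, selected in zip(row, mask_row)
        if selected
    )


def light_curve(frames: List[Stamp], mask: Mask) -> List[float]:
    """One aperture flux per frame, in time order."""
    return [aperture_flux(frame, mask) for frame in frames]


def central_mask(size: int = 12, half_width: int = 2) -> Mask:
    """Square aperture around the stamp centre (a stand-in for a real aperture)."""
    centre = size // 2
    return [
        [abs(r - centre) <= half_width and abs(c - centre) <= half_width for c in range(size)]
        for r in range(size)
    ]
```

For example, with the default 5×5 aperture, a frame where every pixel holds 1.0 yields a flux of 25.0, and a frame at half brightness yields 12.5.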

Slide 19

And what kind of data do we get? The Pleiades (image by NASA, ESA, AURA/Caltech, Palomar Observatory, via Wikimedia Commons)

Slide 20

Well, that's the problem: seven stars, seven different profiles

Slide 21

Kinda big data: over 40 million light curves

Slide 22

Big AND open data: lots of datasets in #opendata

Slide 23

And we can help with that! Let's use our tools to analyse the data

Slide 24

Time Series to analyse Kepler datasets

Slide 25

Kepler: spatial Time Series. Definition of a time series: a series of data points indexed in time order

Slide 26

Time Series
● Stock market analysis
● Economic forecasting
● Budgetary analysis
● Process and quality control
● Workload projections
● Census analysis
● ...

Slide 27

Time Series applications:
● Understanding the data
● Fitting a model
  ○ Monitoring
  ○ Forecasting

Slide 28

Time Series: stock market analytics, economic forecasting, $$$, study & research

Slide 29

Time Series: many specific analytical tools
● Moving average
● ARMA (AutoRegressive Moving Average)
● Multivariate ARMA models
● ARCH (AutoRegressive Conditional Heteroscedasticity)
● Dynamic time warping
● ...
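The simplest tool in that list, the moving average, can be sketched in a few lines. This is a minimal illustration, not code from the talk: the centred window with shrinking edges and the `detrend` helper are assumptions. Dividing a light curve by a smoothed version of itself is one common way to flatten slow stellar variability before looking for short dips, provided the window is much longer than a transit.

```python
# Minimal sketch of a centred moving average and a simple detrending step.
# The window size is an arbitrary, illustrative choice.

from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Centred moving average; near the edges only the available points are used."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def detrend(values: List[float], window: int) -> List[float]:
    """Divide each value by the local average (assumes strictly positive fluxes)."""
    trend = moving_average(values, window)
    return [v / t for v, t in zip(values, trend)]
```

For instance, `moving_average([1, 2, 3, 4, 5], 3)` gives `[1.5, 2.0, 3.0, 4.0, 4.5]`: the edge points average over two values instead of three.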

Slide 30

Time Series: specific applications of general tools
● Artificial neural networks
● Hidden Markov models
● Fourier and wavelet transforms
● Entropy encoding
● ...
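As an example of a general tool applied to time series, a Fourier transform can reveal the dominant period of an evenly sampled signal. The sketch below uses a naive O(n²) discrete Fourier transform purely for illustration; a real analysis would use an FFT (for example `numpy.fft`), and for short box-shaped transits, period searches such as box least squares are usually a better fit.

```python
# Minimal sketch: naive discrete Fourier transform (O(n^2)) used to find the
# dominant period of an evenly sampled series. Illustrative only.

import cmath
import math
from typing import List


def power_spectrum(values: List[float]) -> List[float]:
    """Power |X_k|^2 for frequencies k = 0 .. n//2, after removing the mean."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centred))
        spectrum.append(abs(x_k) ** 2)
    return spectrum


def dominant_period(values: List[float]) -> float:
    """Period, in samples, of the strongest non-zero frequency."""
    spectrum = power_spectrum(values)
    k_best = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    return len(values) / k_best
```

A sine wave with a period of 10 samples, sampled 100 times, puts all its power at k = 10, so `dominant_period` returns 100 / 10 = 10.0.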

Slide 31

Dealing with Time Series: the 3 V's

Slide 32

Monitoring OVH with Time Series

Slide 33

OVH Metrics: a metrics data platform

Slide 34

Tools to deal with Time Series: many options

Slide 35

Metrics Data Platform

Slide 36

Metrics’ metrics
● 1.5M datapoints/s, 24/7
● Peaks at ~10M datapoints/s
● 500M unique series
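To put those figures in perspective, a quick back-of-the-envelope calculation of what a sustained 1.5M datapoints/s means over a full day, and how far above it the peaks go:

```python
# Back-of-the-envelope arithmetic on the platform figures quoted above.

SUSTAINED_RATE = 1_500_000     # datapoints per second, 24/7
PEAK_RATE = 10_000_000         # approximate peak datapoints per second
SECONDS_PER_DAY = 24 * 60 * 60


def datapoints_per_day(rate_per_second: int) -> int:
    """Total datapoints ingested in one day at a constant rate."""
    return rate_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"sustained: {datapoints_per_day(SUSTAINED_RATE):,} datapoints/day")
    print(f"peak ratio: {PEAK_RATE / SUSTAINED_RATE:.1f}x the sustained rate")
```

That is about 130 billion datapoints per day at the sustained rate, with peaks roughly 6.7 times higher.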

Slide 37

Metrics Data Platform

Slide 38

Why Warp 10? Warp 10 is a software platform that:
● Ingests and stores time series
● Manipulates and analyzes time series

Slide 39

Analytics is the key to success: fetching data is only the tip of the iceberg

Slide 40

Manipulating Time Series with Warp 10
A true Time Series analysis toolbox:
  ○ Hundreds of functions
  ○ Manipulation frameworks
  ○ Analysis workflow

Slide 41

Anatomy of a time series
Each time series is composed of:
● Metadata
  ○ Class name: org.nasa.kepler.starlight
  ○ Labels: { keplerId: 52163778 }
● Datapoints
  ○ Timestamp
  ○ Value

Slide 42

Class names and labels
● Class names define the kind of measure
  ○ Starlight, heart rate, speed…
● Labels define particular traits of a time series (TS)
  ○ Device ID, device type, private user ID...
Example: org.nasa.kepler.starlight { keplerId: 52163778 }

Slide 43

A match made in heaven: Warp 10, OVH Metrics and HelloExoWorld

Slide 44

What we have done
● Downloaded and parsed 40 million FITS files
● Pushed them to OVH Metrics
● Selected a cool subset as a training set
● Verified we could find the same planets as NASA

Slide 45

From Kepler-11 raw data

Slide 46

To candidate exoplanets
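Going from raw light curves to transit candidates means spotting repeated dips in brightness. The sketch below is a deliberately naive, hypothetical dip finder, not the detection method used by NASA or by HelloExoWorld: it normalises the curve by its median, flags points that fall more than a chosen relative `threshold` below it, groups consecutive flagged points into dips, and estimates a period from the spacing between dips. The threshold and the synthetic data are illustrative assumptions.

```python
# Hypothetical, naive transit-candidate finder on a single light curve.
# Not a production detection pipeline; thresholds are illustrative.

from statistics import median
from typing import List, Optional, Tuple


def find_dips(flux: List[float], threshold: float) -> List[Tuple[int, int]]:
    """(start, end) index pairs, end exclusive, of runs below (1 - threshold) x median."""
    baseline = median(flux)
    dips = []
    start: Optional[int] = None
    for i, value in enumerate(flux):
        below = value / baseline < 1.0 - threshold
        if below and start is None:
            start = i                 # a dip begins
        elif not below and start is not None:
            dips.append((start, i))   # the dip ends before index i
            start = None
    if start is not None:
        dips.append((start, len(flux)))
    return dips


def estimate_period(dips: List[Tuple[int, int]]) -> Optional[float]:
    """Mean spacing, in samples, between the centres of consecutive dips."""
    if len(dips) < 2:
        return None
    centres = [(s + e - 1) / 2 for s, e in dips]
    gaps = [b - a for a, b in zip(centres, centres[1:])]
    return sum(gaps) / len(gaps)
```

On a flat synthetic curve with three 1% dips starting at samples 10, 40 and 70, `find_dips(flux, 0.005)` returns `[(10, 13), (40, 43), (70, 73)]` and the estimated period is 30 samples. Around a star like Kepler-11, which hosts six known planets, dips from different planets interleave, so a real search would need to separate them, for example by folding the curve at trial periods.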

Slide 47

Your job

Slide 48

Let's get started!
1. Connect to https://bit.ly/2H7Z5b3, or connect to the Wi-Fi network HEW-5G (or HEW)
2. The password is helloexoworld
3. Click Cancel on the user/password window
4. Open 192.168.1.2 in Chrome/Chromium
Reach step 3.2 and enjoy!

Slide 49

What's next? Where do we go from here?

Slide 50

Only the beginning
● Better detection
● New import method
● Explorer
● Deep learning
● Satellite/star location
● Yours?

Slide 51

A growing team

Slide 52

And you! Join us! https://helloexo.world https://xkcd.com/1371/

Slide 53

OVH Metrics: come talk to us about your time series or monitoring projects and OVH Metrics

Slide 54

Thank you, dear sponsors!

Slide 55

Thank you!