Rediscover the known Universe with NASA datasets

A presentation at Code d'Armor in June 2018 in 22300 Lannion, France by Horacio Gonzalez

Slide 1

Rediscover the known Universe with NASA datasets Horacio Gonzalez @LostInBrittany Introduction to Time Series @LostInBrittany

Slide 2

Horacio Gonzalez @LostInBrittany Spaniard lost in Brittany, developer, dreamer and all-around geek

Slide 3

HelloExoWorld Looking for exoplanets in NASA datasets

Slide 4

HelloExoWorld Once upon a time...

Slide 5

An amateur astronomer: Pierre Zemb, DevOps at OVH

Slide 6

What not to do if you love astronomy: live in Brest

Slide 7

Looking for solutions Mixing passions: computer stuff and astronomy

Slide 8

Google is your friend... Let's find a project

Slide 9

Exoplanets? Planets orbiting stars far away

Slide 10

How do we find them? The transit method seems the best

Slide 11

The transit method Credits: NASA’s Goddard Space Flight Center
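
As a rough illustration of why the transit method works (this is not from the talk): when a planet crosses the disk of its star, the fraction of light it blocks is approximately the ratio of the two disk areas, (Rp / Rs)². The sketch below assumes a central transit and ignores limb darkening; the function name and the rounded radii are illustrative choices.

```python
def transit_depth(planet_radius_km: float, star_radius_km: float) -> float:
    """Approximate fractional drop in flux during a transit: (Rp / Rs) ** 2."""
    return (planet_radius_km / star_radius_km) ** 2


# Rounded reference radii, in kilometres.
SUN_RADIUS_KM = 695_700.0
JUPITER_RADIUS_KM = 69_911.0
EARTH_RADIUS_KM = 6_371.0

if __name__ == "__main__":
    jupiter_like = transit_depth(JUPITER_RADIUS_KM, SUN_RADIUS_KM)
    earth_like = transit_depth(EARTH_RADIUS_KM, SUN_RADIUS_KM)
    print(f"Jupiter-size planet, Sun-size star: {jupiter_like:.4%}")
    print(f"Earth-size planet, Sun-size star:   {earth_like:.4%}")
```

A Jupiter-size planet dims a Sun-size star by about 1%, an Earth-size planet by less than 0.01%, which is why the dips are so hard to see in raw light curves.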

Slide 12

How do we look for transits? Image credits: NASA Kepler

Slide 13

Watching the sky By Carter Roberts [Public domain], via Wikimedia Commons

Slide 14

And what kind of data do we get? Pleiades. By NASA, ESA, AURA/Caltech, Palomar Observatory, via Wikimedia Commons

Slide 15

Well, that's the problem Seven stars, seven different profiles

Slide 16

Kinda big data Over 40 million light curves

Slide 17

Big AND open data Lots of datasets in #opendata

Slide 18

And we can help with that! Let's use our tools to analyse the data

Slide 19

A match made in heaven Warp 10, OVH Metrics and HelloExoWorld

Slide 20

What we have done
● Downloaded and parsed 40 million FITS files
● Pushed them to OVH Metrics
● Selected a cool subset as a training set
● Verified we could find the same planets as NASA

Slide 21

Choosing a star: Kepler-11 Image credit: NASA/Tim Pyle

Slide 22

Looking at the raw signal... SAP_FLUX: The flux in units of electrons per second contained in the optimal aperture pixels collected by the spacecraft.

Slide 23

Looking at the raw signal... ? SAP_FLUX: The flux in units of electrons per second contained in the optimal aperture pixels collected by the spacecraft.

Slide 24

Looking at one record Perturbations in dirty signals

Slide 25

Transits are tiny: ~40 electrons per second

Slide 26

First step: downsampling

Slide 27

First step: downsampling You can see the transit candidates… but how can we teach the computer to see them?

Slide 28

If you ♥ signal processing: High pass filter

Slide 29

Poor person's high pass filter: Using the trend

Slide 30

Signal - Trend Now you can see them well

Slide 31

After some tuning We have our transit candidates
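
The talk does not say which tuning produced the candidates. One simple way to pick dips out of a detrended signal (signal minus trend) is a robust threshold: flag every point that falls several scaled median absolute deviations below the median. The sketch below is only that assumption; `find_dips` and the factor `k` are hypothetical, not the project's actual detector.

```python
from statistics import median


def find_dips(residual, k=3.0):
    """Return the indices of points lying more than k robust sigmas below the median.

    `residual` is the detrended signal (signal - trend) as a list of floats.
    The robust sigma is 1.4826 * MAD (median absolute deviation).
    """
    med = median(residual)
    mad = median([abs(x - med) for x in residual])
    sigma = 1.4826 * mad
    if sigma == 0:
        return []  # Flat signal: nothing stands out.
    return [i for i, x in enumerate(residual) if x < med - k * sigma]


if __name__ == "__main__":
    # Made-up residual, in electrons per second, with one dip of about -40.
    residual = [0.5, -0.4, 0.3, -0.6, 0.2, -40.0, -38.0, 0.4, -0.3, 0.1]
    print(find_dips(residual))
```

Using the median and MAD rather than the mean and standard deviation keeps the threshold from being dragged down by the dips themselves.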

Slide 32

What's next? Where do we go from here?

Slide 33

Only the beginning: better detection, new import method, explorer, deep learning, satellite/star location... Yours?

Slide 34

A growing team

Slide 35

And you! Join us! https://helloexo.world https://xkcd.com/1371/

Slide 36

Thank you!

Slide 37

Want to know more? Analysing with WarpScript

Slide 38

WarpScript: Reverse Polish Notation
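
Reverse Polish Notation means operands are pushed on a stack first and the operator comes last, consuming what is on top. To make the idea concrete, here is a tiny RPN calculator in Python. It is only an illustration of stack evaluation, not WarpScript itself; `eval_rpn` is a hypothetical helper.

```python
import operator

# Binary operators understood by this toy evaluator.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(expression: str) -> list:
    """Evaluate a space-separated RPN expression and return the final stack."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            right = stack.pop()  # The top of the stack is the second operand.
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))  # Anything else is pushed as a number.
    return stack


if __name__ == "__main__":
    print(eval_rpn("1 2 + 3 *"))  # (1 + 2) * 3
```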

Slide 39

Variables
  'hello, world!'   // Push the string 'hello, world!' on the stack
  'exo' STORE       // Store it in a variable called exo
  $exo              // Then push the exo variable back on the stack

Slide 40

What are the available series?
  [
    $readToken   // Application authentication
    '~.*'        // Selector for classname
    {}           // Selector for labels
  ] FIND

Slide 41

Get raw data
  [
    $readToken                      // Application authentication
    'sap.flux'                      // Selector for classname
    { 'KEPLERID' '6541920' }        // Selector for labels
    '2009-05-02T00:56:10.000000Z'   // Start date
    '2013-05-11T12:02:06.000000Z'   // End date
  ] FETCH

Slide 42

Kepler-11: Raw data

Slide 43

Time manipulation

Slide 44

Time related functions

Slide 45

How to split a Time series
  $gts       // Singleton (or list of) GTS
  6 h        // Minimum of time without data-points
  100        // Minimum of data-points required
  'record'   // New label to subdivide the result
  TIMESPLIT
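
To show what the three parameters do, here is a plain-Python sketch of the same idea: cut a series wherever there is a long enough gap without data-points, drop pieces that are too short, and tag each remaining piece with a numbered label. The function `time_split`, the dictionary layout, the strict `>` comparison on the gap, and numbering the kept pieces from 1 are assumptions of this sketch, not a description of Warp 10 internals.

```python
def time_split(points, quiet_period, min_points, label="record"):
    """Split (timestamp, value) points, sorted by timestamp, on quiet gaps.

    A new piece starts when the gap to the previous point exceeds quiet_period.
    Pieces with fewer than min_points points are discarded.
    """
    pieces, current = [], []
    for ts, value in points:
        if current and ts - current[-1][0] > quiet_period:
            pieces.append(current)
            current = []
        current.append((ts, value))
    if current:
        pieces.append(current)

    kept = [piece for piece in pieces if len(piece) >= min_points]
    # Assumption: kept pieces are numbered from 1, as strings, under `label`.
    return [
        {"labels": {label: str(number)}, "points": piece}
        for number, piece in enumerate(kept, start=1)
    ]


if __name__ == "__main__":
    points = [(0, 1.0), (1, 1.1), (2, 0.9), (10, 1.0), (20, 1.2), (21, 1.1)]
    for record in time_split(points, quiet_period=5, min_points=2):
        print(record)
```

In the example, the lone point at timestamp 10 forms a piece of its own and is dropped because it has fewer than two points.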

Slide 46

Filtering
  [
    $gts               // Singleton (or list of) GTS
    []                 // Equivalence classes
    { 'record' '5' }   // Labels to select
    filter.bylabels    // Type of filter
  ] FILTER
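
The filter keeps only the series whose labels match the selector, here the record numbered 5. A minimal Python equivalent, assuming each series is a dictionary with a `labels` map (a layout made up for this sketch) and handling exact matches only:

```python
def filter_by_labels(series_list, selector):
    """Keep the series whose labels contain every key/value pair of the selector."""
    return [
        series
        for series in series_list
        if all(series["labels"].get(key) == value for key, value in selector.items())
    ]


if __name__ == "__main__":
    records = [
        {"labels": {"KEPLERID": "6541920", "record": "4"}, "points": []},
        {"labels": {"KEPLERID": "6541920", "record": "5"}, "points": []},
    ]
    print(filter_by_labels(records, {"record": "5"}))
```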

Slide 47

Reference record: 5

Slide 48

Downsampling

Slide 49

Bucketize

Slide 50

Syntax: Time series parameter
  [ $gts bucketizer.min 0 2 h 0 ] BUCKETIZE
  $gts: Singleton or Time-series set

Slide 51

Syntax: Bucketizer
  [ $gts bucketizer.min 0 2 h 0 ] BUCKETIZE
  bucketizer.min: Type of operator to apply on each bucket (last, max, mean, and, count ...)

Slide 52

Syntax: Lastbucket
  [ $gts bucketizer.min 0 2 h 0 ] BUCKETIZE
  First 0: End timestamp of the most recent bucket

Slide 53

Syntax: Bucketspan
  [ $gts bucketizer.min 0 2 h 0 ] BUCKETIZE
  2 h: Width of a bucket

Slide 54

Syntax: Bucketcount
  [ $gts bucketizer.min 0 2 h 0 ] BUCKETIZE
  Last 0: Number of buckets to keep
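
Put together, the parameters describe a regular grid of buckets counted backwards from the end of the most recent one, with one operator applied to the values inside each bucket. The Python sketch below mimics that with integer timestamps. Treating a lastbucket of 0 as "use the last timestamp of the series" and a bucketcount of 0 as "as many buckets as needed" is an assumption of this sketch, and `bucketize` is a hypothetical helper.

```python
def bucketize(points, reducer, lastbucket, bucketspan, bucketcount):
    """Downsample (timestamp, value) points onto regular buckets.

    Bucket i (0 = most recent) ends at lastbucket - i * bucketspan and covers
    the interval (end - bucketspan, end]. Timestamps are integers.
    """
    if not points:
        return []
    ticks = [ts for ts, _ in points]
    if lastbucket == 0:
        lastbucket = max(ticks)  # Assumption: align on the newest point.
    if bucketcount == 0:
        # Assumption: enough buckets to reach the oldest point.
        bucketcount = (lastbucket - min(ticks)) // bucketspan + 1

    buckets = {}
    for ts, value in points:
        if ts > lastbucket:
            continue  # Newer than the last bucket: ignored.
        index = (lastbucket - ts) // bucketspan
        if index < bucketcount:
            buckets.setdefault(index, []).append(value)

    # Oldest bucket first; each bucket is reported at its end timestamp.
    return [
        (lastbucket - index * bucketspan, reducer(values))
        for index, values in sorted(buckets.items(), reverse=True)
    ]


if __name__ == "__main__":
    points = [(1, 5.0), (2, 3.0), (3, 4.0), (4, 1.0), (5, 6.0), (6, 2.0)]
    print(bucketize(points, min, lastbucket=0, bucketspan=2, bucketcount=0))
```

Passing `min` as the reducer plays the role of `bucketizer.min`: each bucket keeps its lowest value.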

Slide 55

Actual

Slide 56

Trend

Slide 57

Mapper

Slide 58

Syntax: Time series parameter
  [ $gts mapper.mean 2 2 0 ] MAP
  $gts: Singleton or Time-series set

Slide 59

Syntax: Mapper
  [ $gts mapper.mean 2 2 0 ] MAP
  mapper.mean: Type of operator to apply on each window (add, gt, rate, and, count...)

Slide 60

Syntax: Pre
  [ $gts mapper.mean 2 2 0 ] MAP
  First 2: Number of data-points before

Slide 61

Syntax: Post
  [ $gts mapper.mean 2 2 0 ] MAP
  Second 2: Number of data-points after

Slide 62

Syntax: Occurrence
  [ $gts mapper.mean 2 2 0 ] MAP
  Last 0: Maximum number of calculations for a data-point
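
In other words, the mapper slides a window over the series: for each data-point it gathers `pre` points before and `post` points after, and replaces the value with the result of the operator. With a mean, that gives the smoothed trend. A minimal Python sketch, assuming windows are simply truncated at both ends of the series and leaving out the occurrence limit; `map_window` is a hypothetical helper.

```python
from statistics import mean


def map_window(points, mapper, pre, post):
    """Apply `mapper` to a sliding window around each (timestamp, value) point.

    The window holds up to `pre` points before and `post` points after the
    current one, plus the current point itself.
    """
    result = []
    for i, (ts, _) in enumerate(points):
        window = [value for _, value in points[max(0, i - pre): i + post + 1]]
        result.append((ts, mapper(window)))
    return result


if __name__ == "__main__":
    points = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0), (4, 5.0)]
    print(map_window(points, mean, pre=1, post=1))
```

The wider the window, the smoother the trend, and the more a short transit dip survives when the trend is subtracted from the signal.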

Slide 63

Actual

Slide 64

Trend

Slide 65

Actual - trend

Slide 66

Actual - trend

Slide 67

Time to level-up!

Slide 68

Time series operation
  [
    $gts0          // First series pull
    …              // …
    $gtsN          // N series pull
    [ 'record' ]   // Key labels list
    op.add         // Type of operator
  ] APPLY

Slide 69

Syntax: Time series parameter
  [ $gts0 … $gtsN [ 'record' ] op.add ] APPLY
  $gts0 … $gtsN: Singleton or Time-series set

Slide 70

Syntax: Equivalence class
  [ $gts0 … $gtsN [ 'record' ] op.add ] APPLY
  [ 'record' ]: the records data are grouped by label value (Record 1, Record 2, Record 3)

Slide 71

Syntax: Operator
  [ $gts0 … $gtsN [ 'record' ] op.add ] APPLY
  op.add: Type of operator to apply on each class (sub, gt, mask, and, mul ...)
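
The equivalence class is what lets one call process every record at once: series that share the same value for the key labels are grouped, and the operator is applied inside each group, timestamp by timestamp. The Python sketch below does this for two sets of series, which is enough to compute "actual - trend" per record. The dictionary layout and the helper `apply_op` are assumptions of this sketch, and it only handles two inputs rather than N.

```python
import operator


def apply_op(left, right, key_labels, op):
    """Combine two sets of series pairwise, matching them on `key_labels`.

    Each series is {"labels": {...}, "points": [(timestamp, value), ...]}.
    The operator is applied only on timestamps present in both series.
    """
    def class_key(series):
        return tuple(series["labels"].get(name) for name in key_labels)

    right_by_class = {class_key(series): series for series in right}
    results = []
    for a in left:
        b = right_by_class.get(class_key(a))
        if b is None:
            continue  # No partner in the same equivalence class.
        b_values = dict(b["points"])
        points = [(ts, op(v, b_values[ts])) for ts, v in a["points"] if ts in b_values]
        labels = {name: a["labels"].get(name) for name in key_labels}
        results.append({"labels": labels, "points": points})
    return results


if __name__ == "__main__":
    actual = [{"labels": {"record": "5"}, "points": [(0, 10.0), (1, 8.0), (2, 10.0)]}]
    trend = [
        {"labels": {"record": "5"}, "points": [(0, 9.0), (1, 9.0), (2, 9.0)]},
        {"labels": {"record": "6"}, "points": [(0, 1.0)]},
    ]
    print(apply_op(actual, trend, ["record"], operator.sub))
```

Swapping `operator.sub` for `operator.add` gives the behaviour of the `op.add` example on the slides.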

Slide 72

Final result