Rome | March 22 - 23, 2019
Rediscover the known Universe with NASA datasets
Horacio Gonzalez @LostInBrittany
Slide 2
Horacio Gonzalez
Spaniard lost in Brittany, developer, dreamer and all-around geek
Slide 3
HelloExoWorld
Looking for exoplanets in NASA datasets
Slide 4
HelloExoWorld
Once upon a time…
Slide 5
An amateur astronomer
Pierre Zemb, DevOps OVH
Slide 6
What not to do if you love astronomy
Live in Brest
Slide 7
Looking for solutions
Computer stuff
Astronomy
Mixing passions
Slide 8
Google is your friend…
Let’s find a project
Slide 9
Exoplanets?
Planets orbiting stars other than the Sun
Slide 10
How do we find them?
The transit method seems the best
Slide 11
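The transit method
A planet passing in front of its star blocks a tiny fraction of the starlight, causing a periodic dip in the star’s measured brightness.
Credits: NASA’s Goddard Space Flight Center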
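For a sense of scale, a standard approximation gives the fractional dip in flux as the ratio of the planet’s disk area to the star’s. It assumes a uniformly bright stellar disk and a planet entirely in front of it. Here R_p is the planet radius and R_star the stellar radius, and the Earth/Sun radii are used purely as an illustration:

```latex
\frac{\Delta F}{F} \approx \left(\frac{R_p}{R_\star}\right)^2,
\qquad
\text{Earth/Sun: } \left(\frac{6371\ \text{km}}{695\,700\ \text{km}}\right)^2 \approx 8.4 \times 10^{-5} \approx 84\ \text{ppm}
```

An Earth-sized planet around a Sun-like star therefore dims it by less than 0.01 %.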
Slide 12
How do we look for transits?
Image credits: NASA
Kepler
Slide 13
Watching the sky
By Carter Roberts [Public domain], via Wikimedia Commons
Slide 14
And what kind of data do we get?
Pleiades. Credits: NASA, ESA, AURA/Caltech, Palomar Observatory, via Wikimedia Commons
Slide 15
Well, that’s the problem
Seven stars, seven different profiles
Slide 16
Kinda big data
Over 40 million light curves
Slide 17
Big AND open data
Lots of datasets in #opendata
Slide 18
And we can help with that!
Let’s use our tools to analyse the data
Slide 19
Time Series
To analyse Kepler datasets
Slide 20
Kepler: Time Series from space
Definition of Time Series: a series of data points indexed in time order
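This plain-Python sketch makes the definition concrete with a time series in the Warp 10 style: a class name, labels, and timestamped values kept in time order. The `GTS` class is hypothetical and much simpler than Warp 10’s Geo Time Series, which can also carry locations and elevations.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class GTS:
    """Hypothetical, simplified time series: a class name, a set of labels
    and (timestamp, value) points kept sorted by timestamp."""
    classname: str
    labels: dict = field(default_factory=dict)
    points: list = field(default_factory=list)

    def add(self, ts, value):
        # bisect.insort keeps the list ordered by timestamp on every insert
        bisect.insort(self.points, (ts, value))


flux = GTS("sap.flux", {"KEPLERID": "6541920"})
flux.add(20, 1.5)
flux.add(10, 1.0)
print(flux.points)  # [(10, 1.0), (20, 1.5)]
```

Points may arrive out of order, but reading them back always yields time order, which is what "indexed in time order" requires.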
Slide 21
Time Series
● Stock Market Analysis
● Economic Forecasting
● Budgetary Analysis
● Process and Quality Control
● Workload Projections
● Census Analysis
● …
Slide 22
Time Series
Applications:
● Understanding the data
● Fitting a model
  ○ Monitoring
  ○ Forecasting
Slide 23
Time Series
Stock market analytics
Economic forecasting
$$$
Study & Research
Slide 24
Time Series
Many specific analytical tools:
● Moving average
● ARMA (AutoRegressive Moving Average)
● Multivariate ARMA models
● ARCH (AutoRegressive Conditional Heteroscedasticity)
● Dynamic time warping
● …
Slide 25
Time Series
Specific applications of general tools:
● Artificial neural networks
● Hidden Markov models
● Fourier & wavelet transforms
● Entropy encoding
● …
Slide 26
Dealing with Time Series
The 3 V’s: volume, velocity, variety
Slide 27
A match made in heaven
Warp 10, OVH Observability and HelloExoWorld
Slide 28
Monitoring OVH with Time Series
Slide 29
OVH Observability Data Platform
Some of OVH Observability metrics:
● 1.5M datapoints/s, 24/7
● Peaks at ~10M datapoints/s
● 500M unique series
Slide 30
Tools to deal with Time Series
Many options
Slide 31
Metrics Data Platform
Slide 32
Metrics Data Platform
Slide 33
Why Warp 10?
Warp 10 is a software platform that
● Ingests and stores time series
● Manipulates and analyzes time series
Slide 34
Analytics is the key to success
Fetching data is only the tip of the iceberg
Slide 35
Manipulating Time Series with Warp 10
A true Time Series analysis toolbox
○ Hundreds of functions
○ Manipulation frameworks
○ Analysis workflow
Slide 36
What we have done
● Downloaded and parsed 40 million FITS files
● Pushed them to OVH Metrics
● Selected a cool subset as a training set
● Verified we could find the same planets as NASA
Slide 37
Choosing a star: Kepler-11
Image credit: NASA/Tim Pyle
Slide 38
Looking at the raw signal…
SAP_FLUX: The flux in units of electrons per second contained in the optimal aperture pixels collected by the spacecraft.
Slide 39
Looking at the raw signal…
?
Slide 40
Looking at one record
Perturbations in dirty signals
Slide 41
Transits are tiny
~40 electrons per second
Slide 42
First step: downsampling
Slide 43
First step: downsampling
You can see the transit candidates… but how can we teach the computer to see them?
Slide 44
If you ♥ signal processing
High-pass filter
Slide 45
Poor person’s high-pass filter
Using the trend (a moving average)
Slide 46
Signal - Trend
Now you can see them well
Slide 47
After some tuning
We have our transit candidates
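The deck does not say what the tuning consisted of. As a hypothetical illustration only, one simple way to turn a detrended signal into candidates is to flag runs of consecutive points that drop below a threshold. `transit_candidates`, `threshold` and `min_len` are made-up names, and the values are toy numbers, not Kepler data.

```python
def transit_candidates(residuals, threshold, min_len=1):
    """Return (start, end) index pairs, end exclusive, of runs where the
    detrended flux stays below -threshold for at least min_len points."""
    runs = []
    start = None
    for i, r in enumerate(residuals):
        if r < -threshold:
            if start is None:
                start = i  # a dip begins
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))  # the dip was long enough to keep
            start = None
    # A dip still open at the end of the series
    if start is not None and len(residuals) - start >= min_len:
        runs.append((start, len(residuals)))
    return runs


residuals = [0.0, 1.0, -45.0, -50.0, 2.0, -41.0, 0.5]
print(transit_candidates(residuals, threshold=30, min_len=2))  # [(2, 4)]
```

Requiring a minimum run length is one way to discard single-point glitches, at the cost of missing very short transits.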
Slide 48
What’s next?
Where do we go from here?
Slide 49
Only the beginning
Better detection
New import method
Explorer
Deep learning
Satellite/star location
Yours?
Slide 50
A growing team
Slide 51
And you!
Join us!
https://helloexo.world
https://xkcd.com/1371/
Slide 52
Thank you!
Slide 53
Want to know more?
Analysing with WarpScript
Variables
'hello, world!'   // Push the Hello World string on the stack
'exo' STORE       // Store it in a variable called exo
$exo              // Then push the exo variable back on the stack
Slide 56
What are the available series?
[ $readToken   // Application authentication
  '~.*'        // Selector for class name (regular expression matching everything)
  {}           // Selector for labels (empty: no constraint)
] FIND
Slide 57
Get raw data
[ $readToken                     // Application authentication
  'sap.flux'                     // Selector for class name
  { 'KEPLERID' '6541920' }       // Selector for labels
  '2009-05-02T00:56:10.000000Z'  // Start date
  '2013-05-11T12:02:06.000000Z'  // End date
] FETCH
Slide 58
Kepler-11: Raw data
Slide 59
Time manipulation
Slide 60
Time-related functions
Slide 61
How to split a Time series
$gts        // Singleton (or list of) GTS
6 h         // Minimum gap without data points that starts a new split
100         // Minimum number of data points a split must contain
'record'    // Name of the new label that numbers each split
TIMESPLIT
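This plain-Python sketch shows what TIMESPLIT does, under stated assumptions:

- Points are `(timestamp, value)` pairs in time order.
- A gap of at least `quiet_period` starts a new split.
- Splits with fewer than `min_values` points are dropped.
- Kept splits are numbered from 1 under the given label.

The function name and the numbering origin are assumptions. Timestamps here are seconds, whereas Warp 10 uses its platform time unit (microseconds by default).

```python
def timesplit(points, quiet_period, min_values, label, labels=None):
    """Hypothetical gap-based split of time-ordered (timestamp, value) points."""
    chunks, current = [], []
    for ts, value in points:
        # A long enough silence closes the current chunk
        if current and ts - current[-1][0] >= quiet_period:
            chunks.append(current)
            current = []
        current.append((ts, value))
    if current:
        chunks.append(current)

    result = []
    for chunk in chunks:
        if len(chunk) >= min_values:  # too-short chunks are discarded
            tags = dict(labels or {})
            tags[label] = str(len(result) + 1)  # numbering from 1 is an assumption
            result.append((tags, chunk))
    return result


HOUR = 3600
pts = [(0, 1.0), (60, 1.1), (120, 0.9), (7 * HOUR, 1.2), (7 * HOUR + 60, 1.3)]
for tags, chunk in timesplit(pts, quiet_period=6 * HOUR, min_values=2, label="record"):
    print(tags, len(chunk))
# {'record': '1'} 3
# {'record': '2'} 2
```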
Slide 62
Filtering
[ $gts              // Singleton (or list of) GTS
  []                // Equivalence classes
  { 'record' '5' }  // Labels to select
  filter.bylabels   // Type of filter
] FILTER
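The selection performed by `filter.bylabels` can be pictured in plain Python as keeping the series whose labels contain every requested key/value pair. This hypothetical `filter_by_labels` only does exact matches. It is a sketch of the idea, not Warp 10’s implementation.

```python
def filter_by_labels(series, selector):
    """Keep the (labels, points) pairs whose labels contain every
    key/value pair of `selector` (exact matches only)."""
    return [
        (labels, points)
        for labels, points in series
        if all(labels.get(key) == value for key, value in selector.items())
    ]


records = [
    ({"KEPLERID": "6541920", "record": "4"}, [(0, 1.0)]),
    ({"KEPLERID": "6541920", "record": "5"}, [(10, 2.0)]),
]
selected = filter_by_labels(records, {"record": "5"})
print(selected)  # [({'KEPLERID': '6541920', 'record': '5'}, [(10, 2.0)])]
```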
Slide 63
Reference record: 5
Slide 64
Downsampling
Slide 65
Bucketize
Slide 66
Syntax: time series parameter
[ $gts bucketizer.min 0 2 h 0 ] BUCKETIZE
$gts: a singleton GTS or a time-series set (list of GTS)
Slide 67
Syntax: bucketizer
[ $gts bucketizer.min 0 2 h 0 ] BUCKETIZE
bucketizer.min: type of operator to apply on each bucket (last, max, mean, and, count…)
Slide 68
Syntax: lastbucket
[ $gts bucketizer.min 0 2 h 0 ] BUCKETIZE
lastbucket (here 0): end timestamp of the most recent bucket
Slide 69
Syntax: bucketspan
[ $gts bucketizer.min 0 2 h 0 ] BUCKETIZE
bucketspan (here 2 h): width of a bucket
Slide 70
Syntax: bucketcount
[ $gts bucketizer.min 0 2 h 0 ] BUCKETIZE
bucketcount (last parameter, here 0): number of buckets to keep
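This plain-Python sketch of a min bucketizer shows how the four BUCKETIZE parameters interact. The hypothetical `bucketize_min` relies on three assumptions about the semantics:

- Each bucket covers the half-open interval (end − span, end] and is stamped with its end timestamp.
- A `lastbucket` of 0 aligns the last bucket on the most recent point.
- A `bucketcount` of 0 keeps every bucket.

```python
def bucketize_min(points, lastbucket, bucketspan, bucketcount):
    """Downsample (timestamp, value) points, keeping the minimum of each bucket.

    Assumed semantics: buckets cover (end - bucketspan, end], are stamped
    with their end timestamp, lastbucket == 0 aligns the last bucket on the
    most recent point, and bucketcount == 0 keeps every bucket.
    """
    if not points:
        return []
    if lastbucket == 0:
        lastbucket = max(ts for ts, _ in points)
    buckets = {}
    for ts, value in points:
        if ts > lastbucket:
            continue  # newer than the last bucket: ignored
        index = (lastbucket - ts) // bucketspan  # 0 is the most recent bucket
        if bucketcount and index >= bucketcount:
            continue  # older than the buckets we keep
        end = lastbucket - index * bucketspan
        buckets[end] = min(value, buckets.get(end, value))
    return sorted(buckets.items())


HOUR = 3600
flux = [(0, 5.0), (HOUR, 3.0), (2 * HOUR, 4.0), (3 * HOUR, 1.0), (4 * HOUR, 2.0)]
print(bucketize_min(flux, 0, 2 * HOUR, 0))  # [(0, 5.0), (7200, 3.0), (14400, 1.0)]
```

Keeping the minimum rather than the mean preserves the depth of short dips such as transits, which is a plausible reason for choosing `bucketizer.min` here.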
Slide 71
Actual
Slide 72
Trend
Slide 73
Mapper
Slide 74
Syntax: time series parameter
[ $gts mapper.mean 2 2 0 ] MAP
$gts: a singleton GTS or a time-series set (list of GTS)
Slide 75
Syntax: mapper
[ $gts mapper.mean 2 2 0 ] MAP
mapper.mean: type of operator to apply on each window (add, gt, rate, and, count…)
Slide 76
Syntax: pre
[ $gts mapper.mean 2 2 0 ] MAP
pre (the first 2): number of data points before the current one in the window
Slide 77
Syntax: post
[ $gts mapper.mean 2 2 0 ] MAP
post (the second 2): number of data points after the current one in the window
Slide 78
Syntax: occurrences
[ $gts mapper.mean 2 2 0 ] MAP
occurrences (here 0): maximum number of times the mapper is computed; 0 means no limit
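This plain-Python sketch shows the sliding window behind `mapper.mean` with pre = 2 and post = 2. The hypothetical `map_mean` makes two simplifying assumptions. It truncates the window at the edges of the series, and it ignores the occurrences parameter, computing a value for every point.

```python
def map_mean(points, pre, post):
    """Sliding-window mean over (timestamp, value) points: each output value
    averages the `pre` points before, the point itself and the `post` points
    after it (window truncated at the edges of the series)."""
    values = [value for _, value in points]
    result = []
    for i, (ts, _) in enumerate(points):
        window = values[max(0, i - pre): i + post + 1]
        result.append((ts, sum(window) / len(window)))
    return result


pts = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0), (4, 100.0)]
trend = map_mean(pts, pre=2, post=2)
print(trend[:2])  # [(0, 2.0), (1, 2.5)]
```

A wider window gives a smoother trend, so more of the short-lived variation (such as a transit) remains once the trend is subtracted.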
Slide 79
Actual
Slide 80
Trend
Slide 81
Actual - trend
Slide 82
Actual - trend
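Subtracting the trend from the actual signal is the poor person’s high-pass filter. This plain-Python sketch assumes the two series are matched on identical timestamps and drops points present in only one of them. `detrend` is a hypothetical name, and the numbers are toy values.

```python
def detrend(actual, trend):
    """Subtract the trend from the actual signal, point by point, at the
    timestamps present in both (timestamp, value) series."""
    trend_at = dict(trend)  # timestamp -> trend value
    return [(ts, value - trend_at[ts]) for ts, value in actual if ts in trend_at]


actual = [(0, 100.0), (1, 101.0), (2, 60.0), (3, 99.0)]
trend = [(0, 100.0), (1, 100.0), (2, 90.0), (3, 100.0)]
print(detrend(actual, trend))  # [(0, 0.0), (1, 1.0), (2, -30.0), (3, -1.0)]
```

The slow variations cancel out, and what remains is centred on zero, so dips stand out as negative values.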
Slide 83
Time to level-up!
Slide 84
Time series operation
[ $gts0         // First set of series
  …             // …
  $gtsN         // Nth set of series
  [ 'record' ]  // List of key labels
  op.add        // Type of operator
] APPLY
Slide 85
Syntax: time series parameters
[ $gts0 … $gtsN [ 'record' ] op.add ] APPLY
$gts0 … $gtsN: each one a singleton GTS or a time-series set
Slide 86
Syntax: equivalence class
[ $gts0 … $gtsN [ 'record' ] op.add ] APPLY
[ 'record' ]: labels that group the series into equivalence classes (Record 1, Record 2, Record 3…)
Slide 87
Syntax: operator
[ $gts0 … $gtsN [ 'record' ] op.add ] APPLY
op.add: type of operator to apply on each class (sub, gt, mask, and, mul…)
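The grouping-then-combining behaviour of APPLY can be sketched in plain Python. Series from each input set are put into equivalence classes by the values of the key labels. Within each class, they are then combined point-wise at the timestamps they share. `apply_op` is hypothetical and uses Python’s `operator` module in place of `op.add`. It assumes exactly one series per input set in each class, and it skips classes missing from any set.

```python
import operator
from collections import defaultdict


def apply_op(series_sets, key_labels, op):
    """Group (labels, points) series from several input sets by the values
    of `key_labels`, then combine one series per set, within each class,
    point-wise with `op` at the timestamps common to all of them."""
    classes = defaultdict(dict)  # class key -> {set index: {ts: value}}
    for set_index, series_set in enumerate(series_sets):
        for labels, points in series_set:
            key = tuple(labels.get(k) for k in key_labels)
            classes[key][set_index] = dict(points)

    result = {}
    for key, members in classes.items():
        if len(members) != len(series_sets):
            continue  # class missing from an input set: skipped in this sketch
        ordered = [members[i] for i in range(len(series_sets))]
        common = set(ordered[0]).intersection(*ordered[1:])
        combined = []
        for ts in sorted(common):
            acc = ordered[0][ts]
            for other in ordered[1:]:
                acc = op(acc, other[ts])
            combined.append((ts, acc))
        result[key] = combined
    return result


set_a = [({"record": "1"}, [(0, 1.0), (1, 2.0)]), ({"record": "2"}, [(0, 5.0)])]
set_b = [({"record": "1"}, [(0, 10.0), (1, 20.0), (2, 30.0)]), ({"record": "2"}, [(0, 50.0)])]
print(apply_op([set_a, set_b], ["record"], operator.add))
# {('1',): [(0, 11.0), (1, 22.0)], ('2',): [(0, 55.0)]}
```

Passing `operator.sub` instead, with the actual signal as the first set and the trend as the second, computes a per-record difference such as actual minus trend.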