Elastic Stack Logging Workshop

A presentation at Elastic Logging Workshop in October 2019 by Alexander Reelsen

Slide 1

Workshop: Logging with the Elastic Stack Alexander Reelsen @spinscale alex@elastic.co

Slide 2

Slide 3

Agenda • Why use a search engine for logging? • Log centralization • Logging challenges • Deployment • Demo & workshop • Logging patterns • Q&A

Slide 4

Prerequisite • docker • docker-compose • git • java

Slide 5

Prerequisites • git clone https://github.com/xeraa/java-logging • cd java-logging • ./gradlew assemble • docker-compose up --build

Slide 6

Logging? Why use Elastic Stack for logging?

Slide 7

But why? • Fundamental for debugging production issues • Logs are decentralized • Containers containing logs are ephemeral • Logs are not standardized • Correlations are hard

Slide 8

No standards… 1.2.3.4 - - [06/Nov/2014:19:10:38 +0600] "GET /news/foo.html HTTP/1.1" 404 177 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Slide 9

No standards… Sep 12 10:15:08 rhincodon logd[64]: #DECODE failed to resolve UUID: [pc:0x7fff65ec1ac7 ns:0x06 type:0x82 flags:0x8208 main:A52374C3-0F9D-3062-A636-131B737C4589 pid:945]

Slide 10

No standards… [2019-09-12T10:23:45,900][INFO ][o.e.c.s.ClusterApplierService] [rhincodon] master node changed {previous [], current [{rhincodon}{q3RjloGxRdm176yLo9d9UA}{Vq4FpFklRbCyVFAVKU7ukQ} {127.0.0.1}{127.0.0.1:9300}{dim}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20}]}, term: 6, version: 57, reason: Publication{term=6, version=57}

Slide 11

Preprocessing to the rescue • Date normalization • Information extraction • Field normalization
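
The three preprocessing steps can be sketched against the access log line from slide 8. This is a minimal illustration in plain Python, not how Logstash or an ingest pipeline implements it; the function name `preprocess`, the regular expression and the output field names are assumptions made for this example, and the user agent in the sample line is shortened.

```python
import re
from datetime import datetime, timezone

# Information extraction: a regex for the "combined" access log layout.
# The group names are illustrative, not an official field schema.
COMBINED = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def preprocess(line):
    """Turn one raw access log line into a structured document (or None)."""
    match = COMBINED.match(line)
    if match is None:
        return None
    doc = match.groupdict()

    # Date normalization: parse the local timestamp, store ISO 8601 in UTC.
    parsed = datetime.strptime(doc.pop("timestamp"), "%d/%b/%Y:%H:%M:%S %z")
    doc["@timestamp"] = parsed.astimezone(timezone.utc).isoformat()

    # Field normalization: numbers become numbers, "-" becomes 0 bytes.
    doc["status"] = int(doc["status"])
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    return doc


LINE = (
    '1.2.3.4 - - [06/Nov/2014:19:10:38 +0600] "GET /news/foo.html HTTP/1.1" '
    '404 177 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X)"'
)

if __name__ == "__main__":
    print(preprocess(LINE))
```

Once every source is reduced to the same timestamp format and typed fields, lines from different systems can be searched and aggregated together.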

Slide 12

Time series have a lifecycle • Recent data is more important • Recent data is queried more often • Older data is searched less often • Old data may require archival due to compliance

Slide 13

Time series is a search • Max response time per 10 minute window since yesterday • Documents: All documents from yesterday till now • Aggregate in 10 minute buckets (6*24) • For each bucket, extract max value
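
The steps on this slide map onto a range query plus a `date_histogram` aggregation with a nested `max`. The sketch below builds such a request body as a Python dict and also reproduces the bucketing locally, so the logic can be followed without a cluster. The field names `@timestamp` and `response_time` and both function names are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def max_response_time_query(field="response_time", interval="10m"):
    """Search body: all documents since yesterday, max value per time bucket."""
    return {
        "size": 0,  # only the aggregation result is needed, no hits
        "query": {"range": {"@timestamp": {"gte": "now-1d/d", "lte": "now"}}},
        "aggs": {
            "per_window": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": interval},
                "aggs": {"max_response_time": {"max": {"field": field}}},
            }
        },
    }


def bucket_max(events, window=timedelta(minutes=10)):
    """Local equivalent: events is an iterable of (aware datetime, value)."""
    seconds = int(window.total_seconds())
    buckets = {}
    for timestamp, value in events:
        epoch = int(timestamp.timestamp())
        start = datetime.fromtimestamp(epoch - epoch % seconds, tz=timezone.utc)
        buckets[start] = max(value, buckets.get(start, value))
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    def at(hour, minute):
        return datetime(2019, 10, 1, hour, minute, tzinfo=timezone.utc)

    for start, peak in bucket_max([(at(10, 1), 120), (at(10, 9), 300), (at(10, 15), 50)]).items():
        print(start.isoformat(), peak)
```

A full day at this interval yields the 6*24 = 144 buckets mentioned on the slide.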

Slide 14

Dashboards & Time Series

Slide 15

Dashboards & Time Series

Slide 16

Dashboards & Time Series

Slide 17

Dashboards & Time Series

Slide 18

Dashboards & Time Series

Slide 19

Standardizing data

Slide 20

Elasticsearch overview

Slide 21

Elasticsearch in 10 seconds • Search Engine (FTS, Analytics, Geo), real-time • Distributed, scalable, highly available, resilient • Interface: HTTP & JSON • Centrepiece of the Elastic Stack

Slide 22

Elasticsearch - a distributed system node 1 p0

Slide 23

Elasticsearch - a distributed system node 1 p0 p1

Slide 24

Elasticsearch - a distributed system node 1 node 2 p0 p1

Slide 25

Elasticsearch - a distributed system node 1 node 2 p0 p1 node 3 node 4

Slide 26

Elasticsearch - a distributed system node 1 node 2 node 3 node 4 p0 p1 r0 r1
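
The progression on slides 22 to 26 (primaries p0 and p1, then replicas r0 and r1 spread over four nodes) can be imitated with a toy allocator. This is a simplified sketch and not Elasticsearch's actual allocation algorithm: it only enforces that a replica never shares a node with another copy of the same shard, and it uses CRC32 as a stand-in for the real routing hash. All function names are made up for the example.

```python
import zlib


def route(doc_id, primaries):
    """Pick a primary shard for a document: hash of the routing value modulo
    the number of primaries. CRC32 is an assumption standing in for the
    hash Elasticsearch really uses."""
    return zlib.crc32(doc_id.encode("utf-8")) % primaries


def allocate(nodes, primaries, replicas=1):
    """Place primaries first, then replicas, always on the least loaded node
    that does not already hold a copy of the same shard."""
    if replicas >= len(nodes):
        raise ValueError("need more nodes than replicas per shard")
    placement = {node: [] for node in nodes}
    holders = {shard: set() for shard in range(primaries)}
    copies = [("p", shard) for shard in range(primaries)]
    copies += [("r", shard) for _ in range(replicas) for shard in range(primaries)]
    for kind, shard in copies:
        candidates = [node for node in nodes if node not in holders[shard]]
        target = min(candidates, key=lambda node: len(placement[node]))
        placement[target].append(f"{kind}{shard}")
        holders[shard].add(target)
    return placement


if __name__ == "__main__":
    print(allocate(["node 1", "node 2", "node 3", "node 4"], primaries=2))
    print(route("log-line-42", primaries=2))
```

With four nodes and two primaries this ends up with one shard copy per node, matching the final diagram; with only two nodes each node holds one primary and the replica of the other.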

Slide 27

Ingest overview

Slide 28

Ingestion • Logstash: extensible dynamic data collection • Beats: specialized single purpose data shipper • Roll your own integration, it's all HTTP!
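
As a rough idea of what a self-built integration involves, the sketch below prepares a request for the `_bulk` endpoint using only the Python standard library. The bulk body is newline-delimited JSON: one action line followed by one document line, with a trailing newline. The index name, the base URL and the helper names are assumptions for this example, and nothing is sent unless `urlopen` is called.

```python
import json
import urllib.request


def bulk_body(index, docs):
    """Newline-delimited JSON: an action line, then the document itself."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the body must end with a newline


def bulk_request(base_url, index, docs):
    """Build (but do not send) the HTTP request for the _bulk endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/_bulk",
        data=bulk_body(index, docs).encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


if __name__ == "__main__":
    docs = [
        {"@timestamp": "2019-10-01T10:00:00Z", "message": "service started"},
        {"@timestamp": "2019-10-01T10:00:05Z", "message": "request handled"},
    ]
    request = bulk_request("http://localhost:9200", "my-logs", docs)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"), end="")
    # To actually ship the documents to a running cluster:
    # urllib.request.urlopen(request)
```

Beats and Logstash add what this sketch leaves out, such as retries, backpressure and keeping track of what has already been shipped.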

Slide 29

Logstash

Slide 30

Logstash

Slide 31

Logstash

Slide 32

Beats • Filebeat • Metricbeat • Packetbeat • Winlogbeat • Auditbeat • Heartbeat • Functionbeat • Journalbeat

Slide 33

Filebeat modules • Apache • Google Cloud • Logstash • Palo Alto Networks • Auditd • haproxy • MongoDB • PostgreSQL • AWS • IBM MQ • MSSQL • RabbitMQ • CEF • Icinga • MySQL • Redis • Cisco • IIS • nats • Santa • Coredns • Iptables • NetFlow • Suricata • Elasticsearch • Kafka • Nginx • Traefik • Envoyproxy • Kibana • Osquery • Zeek (Bro)

Slide 34

Metricbeat modules • Aerospike • Elasticsearch • Logstash • Redis • Apache • envoyproxy • Memcached • Statsd • aws • Etcd • MongoDB • System • Golang • MSSQL • traefik • Munin • uwsgi • MySQL • vSphere • Nats • Windows • Nginx • ZooKeeper • Beat • Ceph • Graphite • CockroachDB • HAProxy • consul • HTTP • coredns • Jolokia • Couchbase • Kafka • couchdb • Kibana • PostgreSQL • Docker • Kubernetes • Prometheus • Dropwizard • kvm • RabbitMQ • Oracle • PHP_FPM

Slide 35

Solutions

Slide 36

Elastic APM • Distributed tracing • APM server • Kibana application • Agents: Java, .NET, Node, Python, Ruby, RUM, Go • Alerting & ML integration

Slide 37

Elastic Logs

Slide 38

Elastic SIEM

Slide 39

Elastic Metrics

Slide 40

Elastic Uptime

Slide 41

Elastic Uptime

Slide 42

Elastic Infrastructure

Slide 43

Elastic Infrastructure

Slide 44

Deployment options

Slide 45

Distributions • zip, tar.gz, RPM, DEB • debian/rpm repositories, homebrew tap • Docker, Helm chart • K8s Operator (ECK)

Slide 46

Elastic Cloud

Slide 47

Elastic Cloud Enterprise

Slide 48

meetup.com RSVP stream demo Time series data…

Slide 49

logging workshop demo start your engines…

Slide 50

Logging patterns

Slide 51

Time based data • time based data has properties • current data gets indexed • more recent data gets searched more • old data is still required ‘just in case’

Slide 52

Homogeneous architecture

Slide 53

Hot warm architecture

Slide 54

Hot warm architecture Index

Slide 55

Hot warm architecture Index

Slide 56

Hot warm architecture Index

Slide 57

Index Lifecycle Management • Hot: read & write • Warm: frequently read • Cold: seldom read • Delete: no longer needed

Slide 58

Index Lifecycle Management: Hot • rollover • set priority • unfollow

Slide 59

Index Lifecycle Management: Warm • set priority • unfollow • read-only • allocate • shrink • force merge

Slide 60

Index Lifecycle Management: Cold • set priority • unfollow • allocate • freeze
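
The phase slides can be combined into a single policy document. The sketch below assembles one as a Python dict using the action names listed on the slides (`unfollow` is left out because it only applies to cross-cluster replication followers). Every threshold, priority and the function name are placeholder assumptions, not recommendations from the talk.

```python
import json


def logging_ilm_policy(rollover_age="1d", rollover_size="50gb",
                       warm_after="7d", cold_after="30d", delete_after="90d"):
    """Build an ILM policy body with hot, warm, cold and delete phases.
    All default values are illustrative placeholders."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_age": rollover_age, "max_size": rollover_size},
                        "set_priority": {"priority": 100},
                    }
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {
                        "set_priority": {"priority": 50},
                        "readonly": {},
                        "allocate": {"number_of_replicas": 1},
                        "shrink": {"number_of_shards": 1},
                        "forcemerge": {"max_num_segments": 1},
                    },
                },
                "cold": {
                    "min_age": cold_after,
                    "actions": {"set_priority": {"priority": 0}, "freeze": {}},
                },
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


if __name__ == "__main__":
    # The resulting JSON would be the body of PUT _ilm/policy/<policy name>.
    print(json.dumps(logging_ilm_policy(), indent=2))
```

Each `min_age` states when an index enters a phase; for indices managed by rollover that age is counted from the rollover, not from index creation.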

Slide 61

More lifecycle topics • SLM: create snapshots based on cron • Rollup: Summarize and store historical data • Transform: Pivot data to entity centric indices

Slide 62

Architecture patterns

Slide 63

Start small

Slide 64

Grow big

Slide 65

https://ela.st/cfcamp-workshop-munich

Slide 66

Slide 67

Q&A