Cloud Native Telegraf

A presentation at Cloud Native London in September 2019 in London, UK by David McKay

Slide 1

Cloud Native Telegraf
Cloud Native London, September 2019

Slide 2

David McKay (@rawkode)
InfluxData Developer Advocate
🏴󠁧󠁢󠁳󠁣󠁴󠁿 Scottish
💙 Esoteric Programming Languages
☸ Kubernetes Release Team
🚒 Former SRE
🍝 Former Developer

© 2019 InfluxData. All rights reserved.

Slide 3

Cloud Native Telegraf

Slide 4

Can I have one Telegraf, please?

Slide 5

Telegraf
github.com/influxdata/telegraf
Telegraf is an agent for collecting, processing, aggregating, and writing metrics.

Slide 6

Architecture
Diagram labels: GCP, Third Party Systems, Your Application, Telegraf, ?

Slide 7

Telegraf is Agnostic

Slide 8

Architecture
Diagram labels: GCP, Third Party Systems, Your Application, Telegraf, StackDriver, InfluxDB, Prometheus

Slide 9

Plugins

Inputs
★ Docker
★ Kafka
★ Kubernetes
★ Nats
★ Postgres
★ System
  ○ CPU
  ○ Disk
  ○ Disk IO
  ○ Mem
  ○ Process

Outputs
➔ CrateDB
➔ CloudWatch
➔ DataDog
➔ Elasticsearch
➔ Graphite
➔ InfluxDB
➔ OpenTSDB
➔ Prometheus
➔ StackDriver
➔ Wavefront

Slide 10

Plugins
Inputs: 160
Outputs: 35

Slide 11

Input: activemq
Slide 9 / 247

Slide 12

Input: kubernetes
Slide 12 / 48

Slide 13

Kubernetes
➔ Should be run as a DaemonSet
➔ Hits the stats/summary endpoint of each kubelet
➔ Is responsible for gathering metrics for pods and their containers
➔ Will produce high cardinality data

Slide 14

Kubernetes

[[inputs.kubernetes]]
  url = "https://localhost:10255"
  bearer_token = "/run/secrets/token"
  insecure_skip_verify = true

Slide 15

Kubernetes
For cloud providers' managed Kubernetes, or minikube

[[inputs.kubernetes]]
  url = "https://kubernetes.default/api/v1/nodes/$NODE_NAME/proxy/"

Slide 16

Kubernetes Improvements
➔ 99.97% of the time, this plugin will run in-cluster
  ◆ No reference, I made this number up
➔ So we don’t need any configuration
  ◆ We should trust you to manage RBAC
  ◆ We’ll use mounted ServiceAccount
  ◆ We’ll infer URL
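Until the plugin infers these values on its own, an in-cluster configuration can spell them out by hand. The sketch below is only an illustration: it assumes the standard ServiceAccount mount path, that NODE_NAME is injected into the pod's environment (for example through the Downward API), and that the cluster CA signed the API server certificate. None of these details come from the slides.

```toml
[[inputs.kubernetes]]
  # Reach the kubelet through the API server proxy.
  # Assumption: NODE_NAME is set in the pod's environment.
  url = "https://kubernetes.default/api/v1/nodes/$NODE_NAME/proxy/"

  # Path to the token file mounted for the pod's ServiceAccount
  # (standard location, assumed here).
  bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"

  # Assumption: verify TLS against the mounted cluster CA
  # instead of setting insecure_skip_verify.
  tls_ca = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
```

The ServiceAccount still needs RBAC permissions for the node proxy endpoint; that part stays with the cluster operator.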

Slide 17

Input: kube_inventory
Slide 10 / 20

Slide 18

Kube Inventory
➔ Should be run as a Deployment, with a single replica
➔ Hits the API server for resource information
➔ Will give you information on Deployments, DaemonSets, Volumes, etc.
➔ Will produce high cardinality data

Slide 19

Kube Inventory

[[inputs.kube_inventory]]
  url = "https://kubernetes.default"
  bearer_token = ""
  resource_exclude = []
  resource_include = []
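A filled-in version of the same configuration might look like the sketch below. The token path is the standard ServiceAccount mount and the resource names are taken from memory of the plugin's README, so treat both as assumptions and check them against the Telegraf version you run.

```toml
[[inputs.kube_inventory]]
  url = "https://kubernetes.default"

  # Token file mounted for the pod's ServiceAccount (standard path, assumed here).
  bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"

  # Hypothetical selection: collect only these resource types
  # to keep cardinality under control.
  resource_include = ["deployments", "daemonsets", "persistentvolumes"]
  resource_exclude = []
```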

Slide 20

Kube Inventory Improvements
➔ 99.97% of the time, this plugin will run in-cluster
  ◆ I heard this once before
➔ So we don’t need any configuration
  ◆ We should trust you to manage RBAC
  ◆ We’ll use mounted ServiceAccount
  ◆ We’ll infer URL

Slide 21

Input: prometheus
Slide 10 / 20

Slide 22

Prometheus
➔ Run it however you want
  ◆ Globally
  ◆ Per Namespace
  ◆ Depends on your workloads
➔ Will scrape Prometheus endpoints
➔ Will discover services through Prometheus annotations

Slide 23

Prometheus

[[inputs.prometheus]]
  monitor_kubernetes_pods = true
  # monitor_kubernetes_pods_namespace = ""
  bearer_token = ""
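For a per-namespace deployment, the commented-out option can be enabled. This is a sketch under assumptions: the namespace name is hypothetical and the token path is the standard ServiceAccount mount, neither of which appears in the slides.

```toml
[[inputs.prometheus]]
  # Discover and scrape pods that carry the prometheus.io/scrape annotation.
  monitor_kubernetes_pods = true

  # Hypothetical namespace: restrict discovery to a single team's workloads.
  monitor_kubernetes_pods_namespace = "my-team"

  # Token file mounted for the pod's ServiceAccount (standard path, assumed here).
  bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
```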

Slide 24

Prometheus Improvements
➔ 99.97% of the time, this plugin will run in-cluster
  ◆ Definite fact, I’ve heard this more than once
➔ So we don’t need any configuration
  ◆ We should trust you to manage RBAC
  ◆ We’ll use mounted ServiceAccount

Slide 25

Prometheus Improvements
➔ Support ServiceMonitor CRD (Prometheus Operator)

Slide 26

Output: influxdb

Slide 27

InfluxDB

[[outputs.influxdb]]
  urls = ["http://influxdb.monitoring:8086"]

[[outputs.influxdb_v2]]
  urls = ["http://influxdb.monitoring:9999"]
  organization = "InfluxData"
  bucket = "kubernetes"
  token = "secret-token"

Slide 28

Output: prometheus_client

Slide 29

Prometheus Client

[[outputs.prometheus_client]]
  ## Address to listen on.
  listen = ":9273"

Slide 30

Telegraf Super Powers

Slide 31

Proxying

Slide 32

Proxying
influxdb_listener is a service input plugin that listens for requests sent according to the InfluxDB HTTP API. The intent of the plugin is to allow Telegraf to serve as a proxy/router for the /write endpoint of the InfluxDB HTTP API.
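A minimal proxy setup pairs the listener with an InfluxDB output. The sketch below assumes port 8186 for the listener and reuses the database URL shown earlier in the deck; adjust both for a real cluster.

```toml
# Accept InfluxDB-style writes on a local port (port number is an assumption).
[[inputs.influxdb_listener]]
  service_address = ":8186"

# Forward everything that was received to the real database.
[[outputs.influxdb]]
  urls = ["http://influxdb.monitoring:8086"]
```

Applications then point their InfluxDB client at Telegraf rather than at the database itself.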

Slide 33

Proxying
http_listener_v2 is a service input plugin that listens for metrics sent via HTTP. Metrics may be sent in ANY supported data format.
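As an illustration, the HTTP listener (named http_listener_v2 in the Telegraf plugin list) could be configured to accept JSON instead of line protocol. The port, path, and chosen format are assumptions, and the option names follow the Telegraf 1.x releases of this period, so check the README of the version in use.

```toml
[[inputs.http_listener_v2]]
  # Address and port to listen on (arbitrary choice for this sketch).
  service_address = ":8080"

  # URL path that accepts metrics (hypothetical value).
  path = "/telegraf"

  # HTTP methods the listener will accept.
  methods = ["POST", "PUT"]

  # Any supported input data format can go here; JSON is used as an example.
  data_format = "json"
```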

Slide 34

Proxying
There’s also socket_listener, tcp_listener, and udp_listener
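A socket-based listener follows the same pattern. The sketch below assumes a UDP listener on port 8094 receiving line protocol; both the port and the format are illustrative choices rather than anything stated in the talk.

```toml
[[inputs.socket_listener]]
  # Scheme selects the transport; tcp:// and unix:// addresses are also accepted.
  service_address = "udp://:8094"

  # Incoming payloads are parsed as InfluxDB line protocol.
  data_format = "influx"
```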

Slide 35

Batching

Slide 36

Batching
Telegraf will send metrics to outputs in batches of at most metric_batch_size metrics. This controls the size of writes that Telegraf sends to output plugins.

Slide 37

Buffering

Slide 38

Buffering
If a write to an output fails, Telegraf will hold metric_buffer_limit worth of metrics in memory before data is lost. This is PER output.

Slide 39

These 2 simple settings get you redundancy, high availability, and performance optimisation of the write path.
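Both settings live in the [agent] table of the Telegraf configuration. The values below are, to the best of my knowledge, the shipped defaults; treat them as a starting point rather than a recommendation.

```toml
[agent]
  # How often inputs are collected and how often outputs are flushed.
  interval = "10s"
  flush_interval = "10s"

  # Maximum number of metrics sent to an output in one write.
  metric_batch_size = 1000

  # Maximum number of unwritten metrics kept in memory, per output.
  metric_buffer_limit = 10000
```

With these numbers each output can hold the equivalent of ten full batches while its destination is unreachable. Once the buffer is full, the oldest metrics are dropped to make room for new ones.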

Slide 40

Telegraf as a Sidecar

Slide 41

Telegraf as a Sidecar
Hopefully, from everything I’ve discussed, you can see how Telegraf could be a useful addition to any application as a sidecar.
1. It can consume logs
2. You can write events / traces from your code
3. It can act as a local metric buffer during DB downtime

Slide 42

Telegraf as a Sidecar
Unfortunately …
The Telegraf binary is around 80MiB
The Telegraf image is around 250MiB / 80MiB

Slide 43

BYOT: Bring Your Own Telegraf

Slide 44

Bring Your Own Telegraf

FROM rawkode/telegraf:byo AS build

FROM alpine:3.7 AS telegraf
COPY --from=build /etc/telegraf /etc/telegraf
COPY --from=build /go/src/github.com/influxdata/telegraf/telegraf /bin/telegraf

Slide 45

Telegraf Operator

Slide 46

Telegraf Operator

apiVersion: influxdata.com/v1
kind: Telegraf
metadata:
  name: mine
spec:
  version: "1.12"
  scrape_prometheus: false
  sidecar_injection: true
  metric_server: true

Slide 47

Demo Time

Slide 48

Slide 49

Thank You
🐦 @rawkode