A presentation at Cloud Native London in September 2019 in London, UK by David McKay
Cloud Native Telegraf
Cloud Native London, September 2019
David McKay, Developer Advocate at InfluxData (@rawkode)
🏴 Scottish
💙 Esoteric Programming Languages
☸ Kubernetes Release Team
🚒 Former SRE
🍝 Former Developer
© 2019 InfluxData. All rights reserved.
Cloud Native Telegraf
Can I have one Telegraf, please?
Telegraf (github.com/influxdata/telegraf)
Telegraf is an agent for collecting, processing, aggregating, and writing metrics.
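To make the collect, process, aggregate, write pipeline concrete, here is a minimal Python sketch of the path a metric takes through Telegraf. It is a toy model, not Telegraf's Go implementation: the `Metric` class, the plugin callables and `run_pipeline` are hypothetical names. As Telegraf's aggregators do by default, the sketch forwards the original metrics alongside the aggregates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Metric:
    name: str
    tags: Dict[str, str]
    fields: Dict[str, float]


def run_pipeline(
    inputs: Iterable[Callable[[], List[Metric]]],
    processors: Iterable[Callable[[Metric], Metric]],
    aggregator: Callable[[List[Metric]], List[Metric]],
    outputs: Iterable[Callable[[List[Metric]], None]],
) -> List[Metric]:
    # Collect: every input plugin contributes its metrics.
    collected = [metric for gather in inputs for metric in gather()]

    # Process: each metric passes through every processor, in order.
    processors = list(processors)
    processed = []
    for metric in collected:
        for process in processors:
            metric = process(metric)
        processed.append(metric)

    # Aggregate: keep the originals and append the aggregates.
    batch = processed + aggregator(processed)

    # Write: every output receives the same batch.
    for write in outputs:
        write(batch)
    return batch
```

Each stage is just a function here, which mirrors how Telegraf lets you mix any inputs with any outputs.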
Architecture (diagram): GCP, Third Party Systems and Your Application ➔ Telegraf ➔ ?
Telegraf is Agnostic
Architecture (diagram): GCP, Third Party Systems and Your Application ➔ Telegraf ➔ StackDriver, InfluxDB, Prometheus
Plugins
Inputs
★ Docker
★ Kafka
★ Kubernetes
★ Nats
★ Postgres
★ System
  ○ CPU
  ○ Disk
  ○ Disk IO
  ○ Mem
  ○ Process
Outputs
➔ CrateDB
➔ CloudWatch
➔ DataDog
➔ Elasticsearch
➔ Graphite
➔ InfluxDB
➔ OpenTSDB
➔ Prometheus
➔ StackDriver
➔ Wavefront
Plugins
Inputs: 160
Outputs: 35
Input: activemq
Input: kubernetes
Kubernetes
➔ Should be run as a DaemonSet
➔ Hits the stats/summary endpoint of each kubelet
➔ Is responsible for gathering metrics for pods and their containers
➔ Will produce high-cardinality data
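To show why this plugin produces high-cardinality data, here is a Python sketch that flattens a kubelet stats/summary payload into one point per container. The payload shape used in the test is a trimmed, hypothetical sample (real responses carry many more fields), and the measurement, tag and field names are modelled on Telegraf's `kubernetes_pod_container` measurement but should be treated as assumptions. Because pod names usually carry a generated suffix, every rollout creates a fresh set of series.

```python
from typing import Any, Dict, List, Set, Tuple


def summary_to_points(summary: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a stats/summary-shaped dict into one point per container."""
    node = summary["node"]["nodeName"]
    points = []
    for pod in summary.get("pods", []):
        ref = pod["podRef"]
        for container in pod.get("containers", []):
            points.append({
                "measurement": "kubernetes_pod_container",  # assumed name
                "tags": {
                    "node_name": node,
                    "namespace": ref["namespace"],
                    "pod_name": ref["name"],
                    "container_name": container["name"],
                },
                "fields": {
                    "cpu_usage_nanocores": container.get("cpu", {}).get("usageNanoCores", 0),
                    "memory_usage_bytes": container.get("memory", {}).get("usageBytes", 0),
                },
            })
    return points


def series_keys(points: List[Dict[str, Any]]) -> Set[Tuple]:
    # A series is identified by its measurement plus its full tag set.
    return {(p["measurement"], tuple(sorted(p["tags"].items()))) for p in points}
```

Two containers in one pod give two series; the same pod redeployed under a new generated name gives two more, even though nothing about the workload changed.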
Kubernetes
[[inputs.kubernetes]]
  url = "https://localhost:10255"
  bearer_token = "/run/secrets/token"
  insecure_skip_verify = true
Kubernetes for Cloud Providers' Managed Kubernetes or minikube
[[inputs.kubernetes]]
  url = "https://kubernetes.default/api/v1/nodes/$NODE_NAME/proxy/"
Kubernetes Improvements
➔ 99.97% of the time, this plugin will run in-cluster
  ◆ No reference, I made this number up
➔ So we don’t need any configuration
  ◆ We should trust you to manage RBAC
  ◆ We’ll use the mounted ServiceAccount
  ◆ We’ll infer the URL
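A minimal Python sketch of this zero-configuration idea, assuming the plugin runs in a pod: the API server address comes from the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` variables Kubernetes sets in each container, and the token comes from the ServiceAccount mounted at `/var/run/secrets/kubernetes.io/serviceaccount/token`. The `NODE_NAME` variable is an assumption (commonly set from `spec.nodeName` via the Downward API), and the URL follows the API-server proxy form from the managed-Kubernetes slide. `infer_in_cluster_config` is a hypothetical helper, not an existing Telegraf function.

```python
import os
from typing import Callable, Dict, Mapping

# Standard mount path for the pod's ServiceAccount token.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def _read_file(path: str) -> str:
    with open(path) as handle:
        return handle.read()


def infer_in_cluster_config(
    env: Mapping[str, str] = os.environ,
    read_file: Callable[[str], str] = _read_file,
) -> Dict[str, str]:
    """Hypothetical helper: derive the kubelet URL and token without user config."""
    host = env["KUBERNETES_SERVICE_HOST"]
    port = env.get("KUBERNETES_SERVICE_PORT", "443")
    if ":" in host:  # IPv6 literals must be bracketed in URLs
        host = f"[{host}]"
    # Assumption: NODE_NAME is injected via the Downward API (spec.nodeName).
    node = env["NODE_NAME"]
    return {
        "url": f"https://{host}:{port}/api/v1/nodes/{node}/proxy/",
        "token": read_file(TOKEN_PATH).strip(),
    }
```

Passing `env` and `read_file` in explicitly keeps the helper testable outside a cluster; inside a pod the defaults read the real environment and token file.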
Input: kube_inventory
Kube Inventory
➔ Should be run as a Deployment, with a single replica
➔ Hits the API server for resource information
➔ Will give you information on Deployments, DaemonSets, Volumes, etc.
➔ Will produce high-cardinality data
Kube Inventory
[[inputs.kube_inventory]]
  url = "https://kubernetes.default"
  bearer_token = ""
  resource_exclude = []
  resource_include = []
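The `resource_include` and `resource_exclude` lists narrow down which resource types are collected, and leaving both empty collects everything available. The Python sketch below shows include-then-exclude filtering; treating the entries as glob patterns, and the resource names used in the test, are assumptions for illustration, and `select_resources` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_resources(available: Iterable[str], include: List[str], exclude: List[str]) -> List[str]:
    selected = []
    for resource in available:
        # An empty include list means "include everything".
        if include and not any(fnmatchcase(resource, pattern) for pattern in include):
            continue
        # Exclusions are applied after inclusions.
        if any(fnmatchcase(resource, pattern) for pattern in exclude):
            continue
        selected.append(resource)
    return selected
```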
Kube Inventory Improvements
➔ 99.97% of the time, this plugin will run in-cluster
  ◆ I heard this once before
➔ So we don’t need any configuration
  ◆ We should trust you to manage RBAC
  ◆ We’ll use the mounted ServiceAccount
  ◆ We’ll infer the URL
Input: prometheus
Prometheus
➔ Run it however you want
  ◆ Globally
  ◆ Per Namespace
  ◆ Depends on your workloads
➔ Will scrape Prometheus endpoints
➔ Will discover scrape targets through Prometheus annotations on pods
Prometheus
[[inputs.prometheus]]
  monitor_kubernetes_pods = true
  # monitor_kubernetes_pods_namespace = ""
  bearer_token = ""
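With `monitor_kubernetes_pods = true`, the plugin looks for `prometheus.io/*` annotations on pods. This Python sketch turns a pod's IP and annotations into a scrape URL. The defaults used here (scheme `http`, port `9102`, path `/metrics`) are the ones the plugin README documented at the time, but treat them as assumptions and check your Telegraf version; `scrape_url` is a hypothetical helper.

```python
from typing import Dict, Optional


def scrape_url(pod_ip: str, annotations: Dict[str, str]) -> Optional[str]:
    # Only pods that opt in are scraped.
    if annotations.get("prometheus.io/scrape") != "true":
        return None
    # A pod without an IP yet cannot be scraped; wait for the next update.
    if not pod_ip:
        return None
    # Assumed defaults, taken from the plugin README at the time of writing.
    scheme = annotations.get("prometheus.io/scheme") or "http"
    port = annotations.get("prometheus.io/port") or "9102"
    path = annotations.get("prometheus.io/path") or "/metrics"
    if not path.startswith("/"):
        path = "/" + path
    return f"{scheme}://{pod_ip}:{port}{path}"
```

For example, a pod annotated with `prometheus.io/scrape: "true"` and `prometheus.io/port: "8080"` would be scraped at `http://<pod IP>:8080/metrics`.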
Prometheus Improvements
➔ 99.97% of the time, this plugin will run in-cluster
  ◆ Definite fact, I’ve heard this more than once
➔ So we don’t need any configuration
  ◆ We should trust you to manage RBAC
  ◆ We’ll use the mounted ServiceAccount
Prometheus Improvements
➔ Support the ServiceMonitor CRD (Prometheus Operator)
Output: influxdb
InfluxDB
[[outputs.influxdb]]
  urls = ["http://influxdb.monitoring:8086"]

[[outputs.influxdb_v2]]
  urls = ["http://influxdb.monitoring:9999"]
  organization = "InfluxData"
  bucket = "kubernetes"
  token = "secret-token"
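Both outputs ultimately write InfluxDB line protocol over HTTP. The Python sketch below formats one point and builds, without sending, the request an InfluxDB 2.0 `/api/v2/write` endpoint expects: `org`, `bucket` and `precision` query parameters plus an `Authorization: Token <token>` header. Escaping is simplified and the helper names are hypothetical; the URL, organization, bucket and token in the test are copied from the config above.

```python
from typing import Dict, Union
from urllib.parse import urlencode
from urllib.request import Request

FieldValue = Union[bool, int, float, str]


def _escape(text: str, chars: str) -> str:
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, FieldValue], ts_ns: int) -> str:
    head = _escape(measurement, ", ")
    head += "".join(f",{_escape(k, ',= ')}={_escape(v, ',= ')}" for k, v in sorted(tags.items()))
    body = ",".join(f"{_escape(k, ',= ')}={_format_field(v)}" for k, v in fields.items())
    return f"{head} {body} {ts_ns}"


def v2_write_request(base_url: str, org: str, bucket: str, token: str, lines: str) -> Request:
    query = urlencode({"org": org, "bucket": bucket, "precision": "ns"})
    return Request(
        f"{base_url.rstrip('/')}/api/v2/write?{query}",
        data=lines.encode("utf-8"),
        headers={"Authorization": f"Token {token}", "Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
```

Sending the request is then a matter of `urllib.request.urlopen(req)` against a reachable server.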
Output: prometheus_client
Prometheus Client
[[outputs.prometheus_client]]
  ## Address to listen on.
  listen = ":9273"
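The `prometheus_client` output exposes Telegraf's metrics for Prometheus to scrape, by default at `/metrics` on the `listen` address. As a rough Python sketch of that translation, each field becomes a metric named `<measurement>_<field>` and each tag becomes a label. Metric types, timestamps and HELP/TYPE lines are left out, so read this as an approximation rather than the plugin's exact output; the helper names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

_BAD_METRIC_CHARS = re.compile(r"[^a-zA-Z0-9_:]")
_BAD_LABEL_CHARS = re.compile(r"[^a-zA-Z0-9_]")

Point = Tuple[str, Dict[str, str], Dict[str, float]]


def _sanitize(name: str, label: bool = False) -> str:
    # Replace characters Prometheus does not allow; names may not start with a digit.
    pattern = _BAD_LABEL_CHARS if label else _BAD_METRIC_CHARS
    name = pattern.sub("_", name)
    return "_" + name if name[:1].isdigit() else name


def _escape_label_value(value: str) -> str:
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_exposition(points: List[Point]) -> str:
    lines = []
    for measurement, tags, fields in points:
        labels = ",".join(
            f'{_sanitize(k, label=True)}="{_escape_label_value(v)}"'
            for k, v in sorted(tags.items())
        )
        for field, value in sorted(fields.items()):
            name = _sanitize(f"{measurement}_{field}")
            selector = f"{name}{{{labels}}}" if labels else name
            lines.append(f"{selector} {value}")
    return "\n".join(lines) + "\n"
```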
Telegraf Super Powers
Proxying
Proxying
influxdb_listener is a service input plugin that listens for requests sent according to the InfluxDB HTTP API. The intent of the plugin is to allow Telegraf to serve as a proxy/router for the /write endpoint of the InfluxDB HTTP API.
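Because every metric passes through the same pipeline, a Telegraf instance fronting `/write` can fan metrics out to different outputs. Telegraf does this with per-output metric filters such as `namepass`; the Python sketch below only mimics that idea by routing line-protocol lines on their measurement name with glob patterns. The output names, patterns and the `route` and `measurement_of` helpers are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def measurement_of(line: str) -> str:
    """Return the measurement: everything before the first unescaped comma or space."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the escaped character
            continue
        if line[i] in ", ":
            break
        i += 1
    return line[:i].replace("\\,", ",").replace("\\ ", " ")


def route(body: str, namepass: Dict[str, List[str]]) -> Dict[str, List[str]]:
    routed: Dict[str, List[str]] = {output: [] for output in namepass}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name = measurement_of(line)
        for output, patterns in namepass.items():
            if any(fnmatchcase(name, pattern) for pattern in patterns):
                routed[output].append(line)
    return routed
```

A single line can match several outputs, just as one metric can pass the filters of more than one Telegraf output.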
Proxying
http_listener_v2 is a service input plugin that listens for metrics sent via HTTP. Metrics may be sent in ANY supported data format.
Proxying
There’s also socket_listener, tcp_listener, and udp_listener.
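For the socket-based listeners, a client only needs to send newline-terminated line protocol. This Python sketch sends one line over UDP, assuming a `socket_listener` configured with `service_address = "udp://:8094"` and `data_format = "influx"`; the port and the `send_udp` helper are assumptions.

```python
import socket


def send_udp(line: str, host: str = "127.0.0.1", port: int = 8094) -> int:
    """Send one line-protocol line as a UDP datagram; returns bytes sent."""
    payload = (line.rstrip("\n") + "\n").encode("utf-8")  # one metric per line
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

UDP is fire-and-forget: there is no acknowledgement, so a sidecar on localhost is a better fit than sending across the network.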
Batching
Batching
Telegraf sends metrics to outputs in batches of at most metric_batch_size metrics. This controls the size of each write that an output plugin makes.
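The batching rule can be sketched in a few lines of Python: split the pending metrics into chunks of at most `metric_batch_size`. Telegraf also flushes on a timer (`flush_interval`), which this sketch ignores; `batches` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(metrics: Sequence[T], metric_batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches holding at most metric_batch_size metrics."""
    if metric_batch_size < 1:
        raise ValueError("metric_batch_size must be at least 1")
    for start in range(0, len(metrics), metric_batch_size):
        yield list(metrics[start:start + metric_batch_size])
```

With 2,500 pending metrics and a batch size of 1,000, an output makes three writes of 1,000, 1,000 and 500 metrics.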
Buffering
Buffering
If a write to an output fails, Telegraf holds up to metric_buffer_limit metrics in memory; once that limit is reached, the oldest metrics are dropped and that data is lost. This buffer is PER output.
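A minimal Python sketch of a per-output buffer, assuming the behaviour described above: it holds at most `metric_buffer_limit` metrics, evicts the oldest once full, and removes metrics only after a successful write, so a failed write leaves them in place for the next flush. `OutputBuffer` and its methods are hypothetical names.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, Generic, Iterable, List, TypeVar

T = TypeVar("T")


class OutputBuffer(Generic[T]):
    """Hypothetical per-output buffer: bounded, drops the oldest metrics when full."""

    def __init__(self, metric_buffer_limit: int, metric_batch_size: int) -> None:
        self.buffer: Deque[T] = deque(maxlen=metric_buffer_limit)
        self.batch_size = metric_batch_size
        self.dropped = 0

    def add(self, metrics: Iterable[T]) -> None:
        for metric in metrics:
            if len(self.buffer) == self.buffer.maxlen:
                self.dropped += 1  # append() below evicts the oldest metric
            self.buffer.append(metric)

    def flush(self, write: Callable[[List[T]], bool]) -> int:
        """Write batches until the buffer is empty or a write fails."""
        written = 0
        while self.buffer:
            batch = list(islice(self.buffer, self.batch_size))
            if not write(batch):
                break  # output is down: keep the metrics for the next flush
            for _ in batch:
                self.buffer.popleft()
            written += len(batch)
        return written
```

Because each output owns its own buffer, one failing output does not hold back writes to the others.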
These two simple settings give you redundancy against output outages, higher availability of the write path, and room to optimise its performance.
Telegraf as a Sidecar
Telegraf as a Sidecar
Hopefully, from everything I’ve discussed, you can see how Telegraf could be a useful addition to any application as a sidecar:
1. It can consume logs
2. You can write events / traces from your code
3. It can act as a local metric buffer during DB downtime
Telegraf as a Sidecar
Unfortunately …
The Telegraf binary is around 80MiB
The Telegraf image is around 250MiB / 80MiB
BYOT: Bring Your Own Telegraf
Bring Your Own Telegraf
# Build stage: provides the Telegraf config and compiled binary copied below
FROM rawkode/telegraf:byo AS build

# Runtime stage: a small Alpine base with only the config and binary
FROM alpine:3.7 AS telegraf
COPY --from=build /etc/telegraf /etc/telegraf
COPY --from=build /go/src/github.com/influxdata/telegraf/telegraf /bin/telegraf
Telegraf Operator
Telegraf Operator
apiVersion: influxdata.com/v1
kind: Telegraf
metadata:
  name: mine
spec:
  version: "1.12"
  scrape_prometheus: false
  sidecar_injection: true
  metric_server: true
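The manifest above does not show how injection works, so this is only an illustration of what `sidecar_injection: true` might do, for example inside a mutating admission webhook: a Python sketch that adds a Telegraf container and a config volume to a pod spec. The container name, the `telegraf-sidecar` ConfigMap and the mount path are hypothetical; the image tag reuses `version` from the spec.

```python
import copy
from typing import Any, Dict


def inject_telegraf_sidecar(pod: Dict[str, Any], version: str = "1.12") -> Dict[str, Any]:
    """Hypothetical: return a copy of the pod with a Telegraf sidecar added."""
    patched = copy.deepcopy(pod)  # never mutate the caller's object
    spec = patched.setdefault("spec", {})
    containers = spec.setdefault("containers", [])
    if any(c.get("name") == "telegraf" for c in containers):
        return patched  # idempotent: already injected
    containers.append({
        "name": "telegraf",
        "image": f"telegraf:{version}",
        "volumeMounts": [{"name": "telegraf-config", "mountPath": "/etc/telegraf"}],
    })
    # Hypothetical ConfigMap holding the sidecar's telegraf.conf.
    spec.setdefault("volumes", []).append(
        {"name": "telegraf-config", "configMap": {"name": "telegraf-sidecar"}}
    )
    return patched
```

Keeping the function idempotent matters for webhooks, which may see the same pod more than once.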
Demo Time
🐦 @rawkode 🐦 Thank You