Lessons Learned When a Dev Does Opsy Things

A presentation at DeliveryConf 2020 in January 2020 in Seattle, WA, USA by Cora Fedesna

Slide 1

Lessons Learned When a Dev Does Opsy Things
Cora Fedesna (she/her)
@CorainChicago noti.st/corainchicago @DeliveryConf

Slide 2

What is ActiveCampaign? Automation. Email. CRM. Messaging. For over 80,000 growing businesses.

Slide 3

Software Engineer Developer Conference Organizer

Slide 4

Never been a:
SRE
OPS
SysAdmin

Slide 5

Our story begins with dealignment of the SREs (Also known as when I join the company)

Slide 6

SRE Alignment
SRE Dealignment

Slide 7

We get to do opsy things

Slide 8

We have to do opsy things

Slide 9

Slide 10

Slide 11

Slide 12

Continuous Integration for your 1000% coverage

Slide 13

Continuous Integration Locally
gradle/mvn clean test

Slide 14

Continuous Deployment Locally
gradle/mvn clean build

Slide 15

Tasks
1. Write terraform to make my CI/CD pipeline

Slide 16

Tasks
1. Write terraform to make my CI/CD pipeline
2. Write terraform to create the infrastructure for my application

Slide 17

Tasks
1. Write terraform to make my CI/CD pipeline
2. Write terraform to create the infrastructure for my application
3. Get someone to apply the terraform

Slide 18

Tasks
1. Write terraform to make my CI/CD pipeline
2. Write terraform to create the infrastructure for my application
3. Get someone to apply the terraform
4. Learn how to write terraform

Slide 19

terraform/ci

Slide 20

terraform/ci
terraform/cd

Slide 21

Tutorial
Code
PR
PR Comments - SRE
PR update
PR Comments - SRE
PR update
1:1 with SRE
Code
PR
PR Comments - SRE
PR Update
PR Comments - SRE
PR Update
1:1
Apply - SRE

Slide 22

DEALIGNMENT
All teams are on a paging schedule.

Slide 23

GitHub GitLab

Slide 24

Kubernetes

Slide 25

.gitlab-ci.yml

  include:
    - project: 'devops/name-of-project'
      ref: v1.1.1
      file: 'filename.yml'

Slide 26

.gitlab-ci.yml

  cache:
    paths:
      - .gradle/wrapper
      - .gradle/caches

Slide 27

.gitlab-ci.yml

  test:
    stage: test
    image: docker-image-for-jdk
    services:
      - docker:18-dind
    script:
      - export GRADLE_USER_HOME=`pwd`/.gradle
      - apk add docker
      - apk add bash
      - setup_docker
      - ./gradlew clean test

Slide 28

.gitlab-ci.yml

  build:
    stage: build
    image: image-jdk-used
    services:
      - docker:18-dind
    script:
      - export GRADLE_USER_HOME=`pwd`/.gradle
      - apk add docker
      - apk add bash
      - setup_docker
      - ecr_login name-of-iam-role
      - ./gradlew build
      - docker pull $REPOSITORY:latest || true
      - docker build --cache-from $REPOSITORY:latest --tag $REPOSITORY:$TAG --tag $REPOSITORY:latest .
      - docker push $REPOSITORY:$TAG
      - docker push $REPOSITORY:latest

Slide 29

.gitlab-ci.yml

  variables:
    VARIABLES_FROM_DEVOP_PROJECT_KEYS: value
    STAGING_ENABLED: "true"
    TEST_DISABLED: "true"
    REVIEW_DEPLOY_VALUES: >
      --env SPRING_PROFILES_ACTIVE=staging
      --env DEFAULT_LOG_LEVEL=debug
    STAGING_DEPLOY_VALUES: >
      --env SPRING_PROFILES_ACTIVE=staging
      --env DEFAULT_LOG_LEVEL=debug

Slide 30

app.yaml

  name: service-name
  namespace: service-name
  image: service-name:latest
  imagePullPolicy: IfNotPresent
  http: true
  gateway: service-name.test
  iam: k8s/segue-e2e-role
  livenessDelay: 360
  livenessTimeout: 15
  readinessDelay: 360
  readinessTimeout: 50
  memoryRequest: 2Gi
  memoryLimit: 4Gi
  vaultAddr: https://vault.staging.app-us1.com:8200
  containerPort: 8080
  secrets: ["vault/path/to/secret"]
  healthPath: /actuator/health
  prometheusPath: /actuator/prometheus
  minScale: 1
  maxScale: 1

Slide 31

app.yaml
Oops! Our devops team abstracted this work away from us.

  name: service-name
  namespace: service-name
  image: service-name:latest
  imagePullPolicy: IfNotPresent
  http: true
  gateway: service-name.test
  iam: k8s/segue-e2e-role
  livenessDelay: 360
  livenessTimeout: 15
  readinessDelay: 360
  readinessTimeout: 50
  memoryRequest: 2Gi
  memoryLimit: 4Gi
  vaultAddr: https://vault.staging.app-us1.com:8200
  containerPort: 8080
  secrets: ["vault/path/to/secret"]
  healthPath: /actuator/health
  prometheusPath: /actuator/prometheus
  minScale: 1
  maxScale: 1

Slide 32

Book Club Time!

Slide 33

Book Club Time! (actually reading chapters 1-7)

Slide 34

Kubernetes

Slide 35

With CI/CD now much easier, let’s go back to Terraform

Slide 36

Terraform

Slide 37

Slide 38

Playing with AWS == Scary

Slide 39

Dev View: Version, Username, Password

  resource "aws_db_instance" "name_mysql" {
    instance_class         = "${var.mysql_instance_class}"
    db_subnet_group_name   = "${aws_db_subnet_group.mysql.name}"
    engine                 = "mysql"
    allocated_storage      = "${var.allocated_storage}"
    engine_version         = "${var.mysql_version}"
    username               = "${var.mysql_user_name}"
    password               = "${var.mysql_password}"
    vpc_security_group_ids = [
      "${data.aws_security_group.rds.id}",
    ]
    skip_final_snapshot    = true
    identifier             = "${var.mysql_idenitifier}"
    tags = "${merge(
      local.common_tags,
      map(
        "Name", "${var.application_name}-db-instance"
      )
    )}"
    lifecycle {
      ignore_changes = [
        "password",
      ]
    }
  }

Slide 40

CODE Good Bad

Slide 41

Terraform Coding == Java Coding

Good
● Modular
● Variables used
● Documentation
● Examples available

Bad
● All in one
● Hard code values
● Don’t follow examples

Slide 42

Terraform Coding == Java Coding

Good
● Modular
● Variables used
● Documentation
● Examples

Bad
● All in one
● Hard code values
● Don’t follow examples
● Doesn’t run

Slide 43

Terraform Coding == Java Coding

Good
● Modular structure
● Variables used
● Documentation
● Examples

Bad
● All in one
● Hard code values
● Don’t follow examples
● Doesn’t run
● Runs and does something inadvertent and horrible

Slide 44

Module

  module "service name" {
    source               = "../modules/filename"
    environment_type     = "staging"
    mysql_instance_class = "${var.mysql_instance_class}"
    mysql_version        = "${var.mysql_version}"
    mysql_password       = "${var.mysql_password}"
    mysql_user_name      = "${var.mysql_user_name}"
  }

Slide 45

Resources

  resource "aws_s3_bucket" "bucket name" {
    bucket        = "${var.application_name}-controls"
    acl           = "private"
    force_destroy = true
    tags = "${merge(
      local.common_tags,
      map(
        "Name", "${var.application_name}-controls"
      )
    )}"
  }

Slide 46

Resources

Slide 47

Slide 48

Data Blocks

  data "aws_vpc" "my_name_for_it" {
    tags {
      Name = "region I want to filter by"
    }
  }

Slide 49

Data Blocks

  data "aws_security_group" "rds" {
    name = "rds"
    tags {
      Zone = "zone name"
    }
  }

  resource "aws_db_instance" "name_mysql" {
    instance_class         = "${var.mysql_instance_class}"
    db_subnet_group_name   = "${aws_db_subnet_group.mysql.name}"
    engine                 = "mysql"
    allocated_storage      = "${var.allocated_storage}"
    engine_version         = "${var.mysql_version}"
    username               = "${var.mysql_user_name}"
    password               = "${var.mysql_password}"
    vpc_security_group_ids = [
      "${data.aws_security_group.rds.id}",
    ]
  }

Slide 50

Variables

  variable "environment_type" {
    default     = "staging"
    description = "notes about it, explanation"
  }

  variable "mysql_password" {
    description = "The password for the mysql database"
    type        = "string"
  }

Slide 51

Data vs. Variable

Slide 52

terraform.tfvars

  mysql_instance_class = "db.m5.12xlarge"
  mysql_user_name      = "username"
  kubernetes_vpc_name  = "vpc_name"

Slide 53

Slide 54

Slide 55

Terraform - Tricky Things
Destroy database
Commits passwords

Slide 56

Terraform - Tricky Things
Destroy database
Commits passwords
DEAR LORD - IT CAN FEEL SIDEWAYS

Slide 57

terraform fmt
Commit code
terraform init
terraform plan
terraform apply
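
The command sequence on this slide only makes sense in order, and a failed plan should never be followed by an apply. The sketch below is one way to express that gating; it is an illustration in Python, not something shown in the talk. The function name `run_in_order` and the injectable `run` callable are assumptions made for the example, and the "Commit code" step is left out because it is a version-control action rather than a Terraform command.

```python
import subprocess
from typing import Callable, List, Sequence

# Order taken from the slide. "Commit code" is a version-control step and is
# deliberately left out of this list.
TERRAFORM_STEPS: List[List[str]] = [
    ["terraform", "fmt"],
    ["terraform", "init"],
    ["terraform", "plan"],
    ["terraform", "apply"],
]


def run_in_order(
    steps: Sequence[Sequence[str]],
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> List[str]:
    """Run each command in turn and stop at the first non-zero exit code.

    Returns the commands that completed successfully, so the caller can see
    how far the sequence got. `run` is injectable (hypothetical design choice)
    so the ordering logic can be exercised without Terraform installed.
    """
    completed: List[str] = []
    for cmd in steps:
        if run(cmd) != 0:
            break  # e.g. a failed plan means apply is never attempted
        completed.append(" ".join(cmd))
    return completed
```

Calling `run_in_order(TERRAFORM_STEPS)` from a directory containing Terraform files would invoke the real CLI; passing a stub for `run` keeps the example safe to try anywhere.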

Slide 58

Tutorial
Code
PR
PR Comments - SRE
PR update
PR Comments - SRE
PR update
1:1 with SRE (briefer)
Code
PR
PR Comments - SRE
PR Update
PR Comments - SRE
PR Update
1:1
Apply - SRE

Slide 59

DEALIGNMENT
My team’s SRE stops coming to stand-up.

Slide 60

Java Spring Docker

Slide 61

Java Spring Docker Database S3 API security K8 Cluster

Slide 62

What’s still a struggle?

Slide 63

Things Devs Rarely Need to Think About
IAM Policies
CIDR Blocks
VPCs

Slide 64

Things Devs Rarely Deal With
SECURITY

Slide 65

IAM: Identity and Access Management
CIDR Blocks: Classless Inter-Domain Routing
VPC: Virtual Private Cloud

Slide 66

IAM - Identity and Access Management
An entity that, when attached to an identity or resource, defines their permissions.
AWS Permissions Thing

Slide 67

IAM - Identity and Access Management

  data "aws_iam_policy_document" "policy_name" {
    statement {
      effect = "Allow"
      actions = [
        "s3:PutObject",
        "s3:GetObject"
      ]
      resources = [
        "${aws_s3_bucket.name.arn}/*",
      ]
    }
  }
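
To show what a statement like this one grants, here is a deliberately simplified Python sketch of matching a request against a single Allow statement. It is not how AWS implements evaluation: real IAM also considers explicit Deny, conditions, principals, and wildcards in action names. The bucket ARN and the `is_allowed` helper are hypothetical, chosen only for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for the statement on the slide; the bucket name is made up.
STATEMENT = {
    "Effect": "Allow",
    "Action": ["s3:PutObject", "s3:GetObject"],
    "Resource": ["arn:aws:s3:::example-bucket/*"],
}


def is_allowed(statement: dict, action: str, resource: str) -> bool:
    """Return True only if the statement is an Allow that names the action
    and has a resource pattern matching the requested ARN.

    Simplification: anything not matched here is treated as denied, which
    mirrors IAM's default-deny behaviour but ignores every other rule.
    """
    if statement["Effect"] != "Allow":
        return False
    if action not in statement["Action"]:
        return False
    # "*" in the pattern matches any run of characters, including "/".
    return any(fnmatchcase(resource, pattern) for pattern in statement["Resource"])


print(is_allowed(STATEMENT, "s3:GetObject", "arn:aws:s3:::example-bucket/report.csv"))
print(is_allowed(STATEMENT, "s3:DeleteObject", "arn:aws:s3:::example-bucket/report.csv"))
```

Note that the trailing `/*` scopes the statement to objects inside the bucket; the bucket ARN on its own does not match the pattern.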

Slide 68

IAM - Identity and Access Management

  data "aws_iam_policy_document" "policy_name" {
    statement {
      effect = "Allow"
      actions = [
        "s3:PutObject",
        "s3:GetObject",
      ]
      resources = [
        "${aws_s3_bucket.name.arn}/*",
      ]
    }
  }

  resource "aws_iam_policy" "name_task_policy" {
    name   = "name-task-profile"
    policy = "${data.aws_iam_policy_document.policy_name.json}"
  }

  resource "aws_iam_policy_attachment" "attachment_policy" {
    name       = "name-task-profile-attachment"
    roles      = ["${aws_iam_role.name_role.name}"]
    policy_arn = "${aws_iam_policy.name_task_policy.arn}"
  }

Slide 69

CIDR - Classless Inter-Domain Routing
192.168.100.14/26
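
To make the notation concrete, the following sketch uses Python's standard `ipaddress` module to unpack the address on the slide. Python is used here only for illustration; the `describe` helper is a name made up for this example.

```python
import ipaddress

# The address from the slide: a host address plus a /26 prefix length.
iface = ipaddress.ip_interface("192.168.100.14/26")
network = iface.network  # the CIDR block that contains the host


def describe(net: ipaddress.IPv4Network) -> dict:
    """Summarise a CIDR block: network address, mask, broadcast, and size."""
    return {
        "network": str(net.network_address),
        "netmask": str(net.netmask),
        "broadcast": str(net.broadcast_address),
        "addresses": net.num_addresses,
    }


summary = describe(network)
print(summary)
```

A /26 prefix fixes the first 26 bits and leaves 6 bits for hosts, so the block holds 2^6 = 64 addresses: 192.168.100.0 through 192.168.100.63, with netmask 255.255.255.192.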

Slide 70

CIDR - Classless Inter-Domain Routing

Slide 71

VPC - Virtual Private Cloud
A logically isolated section of the AWS (or any) cloud
(Has a CIDR block)

Slide 72

  data "aws_vpc" "kubernetes" {
    tags {
      Name = "${var.kubernetes_vpc_name}"
    }
  }

  cidr_blocks = ["${data.aws_vpc.kubernetes.cidr_block}"]

Slide 73

IAM: Identity and Access Management
CIDR Blocks: Classless Inter-Domain Routing
VPC: Virtual Private Cloud

Slide 74

TUNING

Slide 75

Generate Load

Slide 76

Volume hitting service?
Database Connections - good?
Services falling over?
Error logs?

Slide 77

Volume hitting service?
Will my service cause an alert or page?
Database Connections - good?
Services falling over?
Error logs?

Slide 78

Volume hitting service?
Services falling over?
Can I prevent it?
Database Connections - good?
Error logs?

Slide 79

AWS Console Datadog Dashboards Kibana Logs Kubectl Commands

Slide 80

Slide 81

Where’d my db go?

Slide 82

Where’d my db go?

Slide 83

Slide 84

Slide 85

Slide 86

Datadog

Slide 87

Datadog

Slide 88

Datadog

Slide 89

Datadog

Slide 90

Slide 91

Slide 92

Slide 93

Kibana

Slide 94

Slide 95

Logs

  LOGGER.error("Here's the exact information you will need {}", request);
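
The slide's point is that an error log should carry the exact input needed to investigate. As a rough analogue of the Java call on the slide, here is the same idea with Python's standard `logging` module. The `Request` type, its fields, and the `handle` function are assumptions made for this sketch and do not come from the talk.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("service")


@dataclass
class Request:
    """Hypothetical request type; the field names are assumptions."""
    path: str
    customer_id: Optional[int] = None


def handle(request: Request) -> bool:
    """Process a request, logging the full request if it is rejected."""
    if request.customer_id is None:
        # Pass the request as an argument so the message carries the exact
        # input needed to reproduce the failure; formatting is deferred
        # until a handler actually emits the record.
        logger.error("Here's the exact information you will need: %s", request)
        return False
    return True
```

Passing the object as a logging argument, rather than building the string first, mirrors the `{}` placeholder style on the slide.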

Slide 96

Kubectl Commands
● kubectl staging get pods -n namespace
● kubectl staging describe pods -n namespace
● kubectl staging logs pod_id -n namespace -c container -f

Slide 97

Modify

Slide 98

Repeat

Slide 99

Tutorial
Code
PR
PR Comments - SRE Dev
PR update
PR Comments - SRE
PR update
1:1 with SRE (briefer)
Code
PR
PR Comments - SRE
PR Update
PR Comments - SRE
PR Update
1:1
Apply - SRE
SRE Ticket to apply

Slide 100

Our story ends with… (Also known as now)

Slide 101

Thank you! (We’re hiring!)

Slide 102

Thank you!
Cora Fedesna (she/her)
@CorainChicago noti.st/corainchicago @DeliveryConf