A presentation at DeliveryConf 2020 in Seattle, WA, USA by Cora Fedesna
Lessons Learned When a Dev Does Opsy Things Cora Fedesna (she/her) @CorainChicago noti.st/corainchicago @DeliveryConf
What is ActiveCampaign? Automation. Email. CRM. Messaging. For over 80,000 growing businesses.
Software Engineer Developer Conference Organizer
Never been a: SRE OPS SysAdmin
Our story begins with dealignment of the SREs (Also known as when I join the company)
SRE Alignment SRE Dealignment
We get to do opsy things
We have to do opsy things
Continuous Integration for your 1000% coverage
Continuous Integration Locally gradle/mvn clean test
Continuous Deployment Locally gradle/mvn clean build
Tasks 1. Write terraform to make my CI/CD pipeline
Tasks 1. Write terraform to make my CI/CD pipeline 2. Write terraform to create the infrastructure for my application
Tasks 1. Write terraform to make my CI/CD pipeline 2. Write terraform to create the infrastructure for my application 3. Get someone to apply the terraform
Tasks 1. Write terraform to make my CI/CD pipeline 2. Write terraform to create the infrastructure for my application 3. Get someone to apply the terraform 4. Learn how to write terraform
terraform/ci
terraform/ci terraform/cd
1. Tutorial 2. Code 3. PR 4. PR Comments - SRE 5. PR update 6. PR Comments - SRE 7. PR update 8. 1:1 with SRE 9. Code 10. PR 11. PR Comments - SRE 12. PR Update 13. PR Comments - SRE 14. PR Update 15. 1:1 Apply - SRE
DEALIGNMENT All teams are on a paging schedule.
Github Gitlab
Kubernetes
.gitlab-ci.yml
include:
  - project: 'devops/name-of-project'
    ref: v1.1.1
    file: 'filename.yml'
.gitlab-ci.yml
cache:
  paths:
    - .gradle/wrapper
    - .gradle/caches
.gitlab-ci.yml
test:
  stage: test
  image: docker-image-for-jdk
  services:
    - docker:18-dind
  script:
    - export GRADLE_USER_HOME=`pwd`/.gradle
    - apk add docker
    - apk add bash
    - setup_docker
    - ./gradlew clean test
.gitlab-ci.yml
build:
  stage: build
  script:
    - export GRADLE_USER_HOME=`pwd`/.gradle
    - apk add docker
    - apk add bash
    - setup_docker
.gitlab-ci.yml
variables:
  VARIABLES_FROM_DEVOP_PROJECT_KEYS: value
  STAGING_ENABLED: "true"
  TEST_DISABLED: "true"
  REVIEW_DEPLOY_VALUES: >
    --env SPRING_PROFILES_ACTIVE=staging
    --env DEFAULT_LOG_LEVEL=debug
  STAGING_DEPLOY_VALUES: >
    --env SPRING_PROFILES_ACTIVE=staging
    --env DEFAULT_LOG_LEVEL=debug
app.yaml
name: service-name
namespace: service-name
image: service-name:latest
imagePullPolicy: IfNotPresent
http: true
gateway: service-name.test
iam: k8s/segue-e2e-role
livenessDelay: 360
livenessTimeout: 15
readinessDelay: 360
readinessTimeout: 50
memoryRequest: 2Gi
memoryLimit: 4Gi
vaultAddr: https://vault.staging.app-us1.com:8200
containerPort: 8080
secrets: ["vault/path/to/secret"]
healthPath: /actuator/health
prometheusPath: /actuator/prometheus
minScale: 1
maxScale: 1
app.yaml
Oops! Our devops team abstracted this work away from us.
name: service-name
namespace: service-name
image: service-name:latest
imagePullPolicy: IfNotPresent
livenessDelay: 360
livenessTimeout: 15
readinessDelay: 360
http: true
gateway: service-name.test
iam: k8s/segue-e2e-role
containerPort: 8080
healthPath: /actuator/health
minScale: 1
maxScale: 1
readinessTimeout: 50
memoryRequest: 2Gi
memoryLimit: 4Gi
vaultAddr: https://vault.staging.app-us1.com:8200
secrets: ["vault/path/to/secret"]
prometheusPath: /actuator/prometheus
Book Club Time!
Book Club Time! (actually reading chapters 1-7)
Kubernetes
CI/CD being much easier, let’s go back to Terraform
Terraform
Playing with AWS == Scary
Dev View
resource "aws_db_instance" "name_mysql" {
  instance_class       = "${var.mysql_instance_class}"
  db_subnet_group_name = "${aws_db_subnet_group.mysql.name}"
  engine               = "mysql"
  allocated_storage    = "${var.allocated_storage}"
  engine_version       = "${var.mysql_version}"
  username             = "${var.mysql_user_name}"
  password             = "${var.mysql_password}"
  vpc_security_group_ids = [
    "${data.aws_security_group.rds.id}",
  ]
  skip_final_snapshot = true
  identifier          = "${var.mysql_idenitifier}"
  tags = "${merge(
    local.common_tags,
    map(
      "Name", "${var.application_name}-db-instance"
    )
  )}"
  lifecycle {
    ignore_changes = [
      "password",
    ]
  }
}
(Callouts on the slide: Version, Username, Password)
CODE Good Bad
Terraform Coding == Java Coding Good ● Modular ● Variables used ● Documentation ● Examples available Bad ● All in one ● Hard code values ● Don’t follow examples
Terraform Coding == Java Coding Good ● Modular ● Variables used ● Documentation ● Examples Bad ● All in one ● Hard code values ● Don’t follow examples ● Doesn’t run
Terraform Coding == Java Coding Good ● Modular structure ● Variables used ● Documentation ● Examples Bad ● All in one ● Hard code values ● Don’t follow examples ● Doesn’t run ● Runs and does something inadvertent and horrible
Module
module "service name" {
  source               = "../modules/filename"
  environment_type     = "staging"
  mysql_instance_class = "${var.mysql_instance_class}"
  mysql_version        = "${var.mysql_version}"
  mysql_password       = "${var.mysql_password}"
  mysql_user_name      = "${var.mysql_user_name}"
}
Resources
resource "aws_s3_bucket" "bucket name" {
  bucket        = "${var.application_name}-controls"
  acl           = "private"
  force_destroy = true
  tags = "${merge(
    local.common_tags,
    map(
      "Name", "${var.application_name}-controls"
    )
  )}"
}
Resources
Data Blocks
data "aws_vpc" "my_name_for_it" {
  tags {
    Name = "region I want to filter by"
  }
}
Data Blocks
data "aws_security_group" "rds" {
  name = "rds"
  tags {
    Zone = "zone name"
  }
}
resource "aws_db_instance" "name_mysql" {
  instance_class       = "${var.mysql_instance_class}"
  db_subnet_group_name = "${aws_db_subnet_group.mysql.name}"
  engine               = "mysql"
  allocated_storage    = "${var.allocated_storage}"
  engine_version       = "${var.mysql_version}"
  username             = "${var.mysql_user_name}"
  password             = "${var.mysql_password}"
  vpc_security_group_ids = [
    "${data.aws_security_group.rds.id}",
  ]
}
Variables
variable "environment_type" {
  default     = "staging"
  description = "notes about it, explanation"
}
variable "mysql_password" {
  description = "The password for the mysql database"
  type        = "string"
}
Data vs. Variable
terraform.tfvars
mysql_instance_class = "db.m5.12xlarge"
mysql_user_name      = "username"
kubernetes_vpc_name  = "vpc_name"
Terraform - Tricky Things Destroy database Commits passwords
Terraform - Tricky Things Destroy database Commits passwords DEAR LORD - IT CAN FEEL SIDEWAYS
1. terraform fmt 2. Commit code 3. terraform init 4. terraform plan 5. terraform apply
1. Tutorial 2. Code 3. PR 4. PR Comments - SRE 5. PR update 6. PR Comments - SRE 7. PR update 8. 1:1 with SRE (briefer) 9. Code 10. PR 11. PR Comments - SRE 12. PR Update 13. PR Comments - SRE 14. PR Update 15. 1:1 Apply - SRE
DEALIGNMENT My team’s SRE stops coming to stand up.
Java Spring Docker
Java Spring Docker Database S3 API security K8 Cluster
What’s still a struggle?
Things Devs Rarely Need to Think About: IAM Policies, CIDR Blocks, VPCs
Things Devs Rarely Deal With SECURITY
IAM - Identity and Access Management; CIDR Blocks - Classless Inter-Domain Routing; VPC - Virtual Private Cloud
IAM - Identity and Access Management An entity that, when attached to an identity or resource, defines their permissions. AWS Permissions Thing
IAM - Identity and Access Management
data "aws_iam_policy_document" "policy_name" {
  statement {
    effect = "Allow"
    actions = [
      "s3:PutObject",
      "s3:GetObject"
    ]
    resources = [
      "${aws_s3_bucket.name.arn}/*",
    ]
  }
}
IAM - Identity and Access Management
data "aws_iam_policy_document" "policy_name" {
  statement {
    effect = "Allow"
    actions = [
      "s3:PutObject",
      "s3:GetObject",
    ]
    resources = [
      "${aws_s3_bucket.name.arn}/*",
    ]
  }
}
resource "aws_iam_policy" "name_task_policy" {
  name   = "name-task-profile"
  policy = "${data.aws_iam_policy_document.policy_name.json}"
}
resource "aws_iam_policy_attachment" "attachment_policy" {
  name       = "name-task-profile-attachment"
  roles      = ["${aws_iam_role.name_role.name}"]
  policy_arn = "${aws_iam_policy.name_task_policy.arn}"
}
CIDR - Classless Inter-Domain Routing 192.168.100.14/26
CIDR - Classless Inter-Domain Routing
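A worked example may help decode the address on the CIDR slide. The sketch below is not from the talk: the class and method names are hypothetical, and it only handles IPv4. It shows what the `/26` in 192.168.100.14/26 means: the first 26 bits name the network and the remaining 6 bits leave room for 64 addresses.

```java
final class CidrSketch {
    /** Parses a dotted-quad IPv4 address into a 32-bit value held in a long. */
    static long toLong(String dottedQuad) {
        long value = 0;
        for (String part : dottedQuad.split("\\.")) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    /** Renders a 32-bit value back into dotted-quad form. */
    static String toDottedQuad(long value) {
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
                + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }

    /** Netmask with the top prefixLength bits set, e.g. 26 gives 255.255.255.192. */
    static long mask(int prefixLength) {
        return prefixLength == 0 ? 0 : (0xFFFFFFFFL << (32 - prefixLength)) & 0xFFFFFFFFL;
    }

    /** First address of the block: the host bits are cleared. */
    static String networkAddress(String cidr) {
        String[] parts = cidr.split("/");
        return toDottedQuad(toLong(parts[0]) & mask(Integer.parseInt(parts[1])));
    }

    /** Last address of the block: the host bits are all set. */
    static String lastAddress(String cidr) {
        String[] parts = cidr.split("/");
        long netmask = mask(Integer.parseInt(parts[1]));
        return toDottedQuad((toLong(parts[0]) & netmask) | (~netmask & 0xFFFFFFFFL));
    }

    /** Number of addresses in the block: 2^(32 - prefix length). */
    static long blockSize(String cidr) {
        return 1L << (32 - Integer.parseInt(cidr.split("/")[1]));
    }

    public static void main(String[] args) {
        String cidr = "192.168.100.14/26";
        System.out.println("netmask: " + toDottedQuad(mask(26)));   // 255.255.255.192
        System.out.println("network: " + networkAddress(cidr));     // 192.168.100.0
        System.out.println("last:    " + lastAddress(cidr));        // 192.168.100.63
        System.out.println("size:    " + blockSize(cidr));          // 64
    }
}
```

So the block containing 192.168.100.14 runs from 192.168.100.0 to 192.168.100.63. A smaller prefix number means a bigger block.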
VPC - Virtual Private Cloud: a logically isolated section of the AWS (or any) cloud. (Has a CIDR block.)
data "aws_vpc" "kubernetes" {
  tags {
    Name = "${var.kubernetes_vpc_name}"
  }
}
cidr_blocks = ["${data.aws_vpc.kubernetes.cidr_block}"]
IAM - Identity and Access Management; CIDR Blocks - Classless Inter-Domain Routing; VPC - Virtual Private Cloud
TUNING
Generate Load
Volume hitting service? Database Connections - good? Services falling over? Error logs?
Volume hitting service? Will my service cause an alert or page? Database Connections - good? Services falling over? Error logs?
Volume hitting service? Services falling over? Can I prevent it? Database Connections - good? Error logs?
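The slides do not show how the load was generated. As a minimal sketch of the idea, assuming Java 11+ and its built-in `java.net.http.HttpClient`, the code below fires a burst of requests and tallies successes against failures. The class names, the request count, and the in-process stub server are hypothetical stand-ins; only the `/actuator/health` path is borrowed from the app.yaml slide. A real run would point at a staging URL instead.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class LoadSketch {
    /** Outcome of one burst: how many requests returned 200 and how many did not. */
    static final class Result {
        final int ok;
        final int failed;
        Result(int ok, int failed) {
            this.ok = ok;
            this.failed = failed;
        }
    }

    /** Sends `total` GET requests concurrently and waits for all of them to finish. */
    static Result run(URI target, int total) {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        HttpRequest request = HttpRequest.newBuilder(target).GET().build();
        List<CompletableFuture<Boolean>> inFlight = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            inFlight.add(client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                    .thenApply(response -> response.statusCode() == 200)
                    .exceptionally(error -> false)); // connection errors count as failures
        }
        int ok = 0;
        for (CompletableFuture<Boolean> future : inFlight) {
            if (future.join()) {
                ok++;
            }
        }
        return new Result(ok, total - ok);
    }

    /** Hypothetical stand-in for the real service: answers 200 "UP" on the health path. */
    static HttpServer startStubService() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/actuator/health", exchange -> {
            byte[] body = "UP".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startStubService();
        try {
            URI target = URI.create(
                    "http://127.0.0.1:" + server.getAddress().getPort() + "/actuator/health");
            Result result = run(target, 20);
            System.out.println("ok=" + result.ok + " failed=" + result.failed);
        } finally {
            server.stop(0);
        }
    }
}
```

The failure count is the number to watch while asking the questions on these slides: does it climb as volume goes up, and do the dashboards and logs explain why?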
AWS Console Datadog Dashboards Kibana Logs Kubectl Commands
Where’d my db go?
Datadog
Kibana
Logs LOGGER.error("Here's the exact information you will need {}", request);
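The `{}` placeholder on the slide is SLF4J-style. To stay dependency-free, the sketch below uses the JDK's `java.util.logging` instead, whose placeholder is `{0}`. The `OrderRequest` type and its fields are hypothetical. The point it illustrates is the one on the slide: log the request itself, with identifiers you can search for later in Kibana.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

final class LogSketch {
    private static final Logger LOGGER = Logger.getLogger("LogSketch");

    /** Hypothetical request type; toString() carries the fields worth searching for. */
    static final class OrderRequest {
        final String requestId;
        final String customerId;

        OrderRequest(String requestId, String customerId) {
            this.requestId = requestId;
            this.customerId = customerId;
        }

        @Override
        public String toString() {
            return "OrderRequest{requestId=" + requestId + ", customerId=" + customerId + "}";
        }
    }

    /** Logs the failure with the request substituted into the {0} placeholder. */
    static void reportFailure(OrderRequest request) {
        LOGGER.log(Level.SEVERE, "Order failed, request={0}", request);
    }

    /** Collects formatted messages in memory so the output can be inspected. */
    static List<String> captureMessages() {
        List<String> messages = new ArrayList<>();
        LOGGER.addHandler(new Handler() {
            @Override
            public void publish(LogRecord record) {
                messages.add(new SimpleFormatter().formatMessage(record));
            }

            @Override
            public void flush() {}

            @Override
            public void close() {}
        });
        return messages;
    }

    public static void main(String[] args) {
        List<String> messages = captureMessages();
        reportFailure(new OrderRequest("req-42", "cust-7"));
        System.out.println(messages.get(0));
    }
}
```

One caveat with `java.util.logging`: the message is a `MessageFormat` pattern, so an apostrophe in the message text is treated as a quote character. SLF4J's `{}` does not have that quirk.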
Kubectl Commands ● kubectl staging get pods -n namespace ● kubectl staging describe pods -n namespace ● kubectl staging logs pod_id -n namespace -c container -f
Modify
Repeat
1. Tutorial 2. Code 3. PR 4. PR Comments - SRE Dev 5. PR update 6. PR Comments - SRE 7. PR update 8. 1:1 with SRE (briefer) 9. Code 10. PR 11. PR Comments - SRE 12. PR Update 13. PR Comments - SRE 14. PR Update 15. 1:1 16. Apply - SRE SRE Ticket to apply
Our story ends with… (Also known as now)
Thank you! (We’re hiring!)
Thank you! Cora Fedesna (she/her) @CorainChicago noti.st/corainchicago @DeliveryConf
The story of a Dev wading into the land of ops to get her deployment environment created. The story starts in the land of terraforming. It’s a different land for coding, with scary and unfamiliar things like data blocks, IAM roles, and CIDR. What is CIDR? Well, you can’t drink it! She travels through the land of the AWS console, only to wind up running load tests to tune her service’s resources.
This talk is a discovery of terraforming, AWS dashboards, and the digging (Datadog/logs) needed to tune a service with load tests. It will show code, but there will be no live coding. In a story format, the talk walks the audience through these steps from the perspective of a Dev.
Here’s what was said about this presentation on social media.
Another good one I got the privilege of sitting in on from #DeliveryConf. Go watch @CorainChicago talk about learning ops from a dev's perspective! https://t.co/6IIXAtDA1w
— Laura 👩🏽💻🧶 (@nimbinatus) January 28, 2020
Super cool #deliveryconf talk from @CorainChicago just now - pretty awesome that I lucked into her discussion of being a #terraform using SRE at @ActiveCampaign! Great storytelling and a compelling discussion. Thanks for the session!
— Jon Schulman (@vaficionado) January 21, 2020
Great #deliveryconf talk from @CoraInChicago on learning the hard lessons about application ownership and site reliability. 👏 for getting hands-on!! https://t.co/zMu05fRQsi
— Rosalind Benoit (@dnilas0r) January 21, 2020
When it’s going wrong, it feels very #wrong
Because you’re dealing with #infrastructure
And that somehow feels terribly serious #developer does #devops @CorainChicago @DeliveryConf pic.twitter.com/Tk3yVGTbLJ
— Sasha Rosenbaum (@DivineOps) January 21, 2020
“These things are still a struggle”
Omg, right?! Same. Same. @CorainChicago #DeliveryConf pic.twitter.com/pVzt5r22gs
— David Killmon 🌤 (@kohidave) January 21, 2020
Bad #Terraform practices:
Hard-coded values 🙁🙁🙁
Worse:
Doesn’t run 😣😣😣
Much worse:
Runs, but does what you didn’t intent it to do 😩😩😩 @CorainChicago @DeliveryConf pic.twitter.com/zDxCfZ1xtW
— Sasha Rosenbaum (@DivineOps) January 21, 2020
I love story talks, and @CorainChicago shared an awesome journey doing opsy things as a dev. Lots of personal, shared experience that doesn't get shared often enough! @DeliveryConf #deliveryconf pic.twitter.com/wDawMZy4I5
— Steve Pereira (@SteveElsewhere) January 21, 2020
This is the first talk at #DeliveryConf I've seen where pronouns have been included. Now I'm EXTRA excited about this talk!
Thanks for being wonderful, @CorainChicago! pic.twitter.com/UcxkrMiLtw
— Dr. Io, Hugs Reliability Engineer (@DevelopingHugs) January 21, 2020
We are really excited to welcome Cora Fedesna @CorainChicago starting a conversation about “Lessons Learned When A Dev Does Opsy Things”
See you all there! https://t.co/REuDd3oQwD https://t.co/efxHUn1fDc pic.twitter.com/XqzW1goYl0
— DeliveryConf (@DeliveryConf) November 28, 2019