Blood-curdling tales of microservices misadventure, devops dread, and grisly governance

A presentation at Software Circus in October 2020 by Holly Cummins

Slide 1

Blood-curdling tales of microservices misadventure, devops dread, and grisly governance. Holly Cummins, IBM Garage. @holly_cummins

Slide 2

I’m a consultant with the IBM Garage. These are my scary stories #IBMGarage @holly_cummins

Slide 3

is this thing on? http://sli.do #L750 #IBMGarage @holly_cummins

Slide 4

doom! the murky goal

Slide 5

what problem are we trying to solve? #IBMGarage @holly_cummins

Slide 6

doom! microservices envy

Slide 7

#IBMGarage @holly_cummins

Slide 8

we need to microservices #IBMGarage @holly_cummins

Slide 9

microservices are not the goal #IBMGarage @holly_cummins

Slide 10

microservices are not the goal; they are the means #IBMGarage @holly_cummins

Slide 11

“we’re going too slowly. we need to get rid of COBOL and make microservices!” #IBMGarage @holly_cummins

Slide 12

“we’re going too slowly. we need to get rid of COBOL and make microservices!” “… but our release board only meets twice a year.” #IBMGarage @holly_cummins

Slide 13

distributed monolith #IBMGarage @holly_cummins

Slide 14

distributed monolith, but without compile-time checking … or guaranteed function execution #IBMGarage @holly_cummins

Slide 15

reasons not to do microservices: small team; not planning to release independently; don’t want complexity of a service mesh (or worse yet, rolling your own); domain model doesn’t split nicely #IBMGarage @holly_cummins

Slide 16

doom! cloud-native spaghetti

Slide 17

“each of our microservices has duplicated the same object model … with twenty classes and seventy fields” #IBMGarage @holly_cummins

Slide 18

“every time we touch one microservice, the others break” #IBMGarage @holly_cummins

Slide 19

Microservice #IBMGarage @holly_cummins

Slide 20

Domain Microservice #IBMGarage @holly_cummins

Slide 21

Domain Microservice #IBMGarage @holly_cummins

Slide 22

distributed != decoupled #IBMGarage @holly_cummins

Slide 23

doom! microservices ops mayhem

Slide 24

doom! microservices ops mayhem

Slide 25

do you know how to operate these things? (there are quite a few of them) #IBMGarage @holly_cummins

Slide 26

do you know how to operate these things? (there are quite a few of them) #IBMGarage @holly_cummins

Slide 27

observability #IBMGarage @holly_cummins

Slide 28

observability #IBMGarage @holly_cummins

Slide 29

SRE #IBMGarage @holly_cummins

Slide 30

SRE #IBMGarage @holly_cummins

Slide 31

doom! the ‘someday’ automation

Slide 32

“our ops isn’t automated” #IBMGarage @holly_cummins

Slide 33

“our tests aren’t automated” #IBMGarage @holly_cummins

Slide 34

“we don’t know if our code works” #IBMGarage @holly_cummins

Slide 35

“we don’t know if our code works” #IBMGarage @holly_cummins

Slide 36

microservices need automated integration tests #IBMGarage @holly_cummins

Slide 37

microservices need automated contract tests #IBMGarage @holly_cummins
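
To make the idea of a contract test concrete, here is a minimal sketch, assuming a consumer that depends on three fields of a provider's response. The contract, field names, and sample responses are all hypothetical and not taken from the talk; real projects often use a dedicated contract-testing tool rather than a hand-rolled check like this.

```python
# Hypothetical contract: the fields (and types) a consumer relies on
# in the provider's response. All names here are illustrative only.
ORDER_CONTRACT = {"id": str, "total_pence": int, "status": str}


def contract_violations(response: dict, contract: dict) -> list:
    """Return human-readable contract violations (empty list if none)."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    return problems


# A provider response that still honours the contract (extra fields are fine).
ok_response = {"id": "o-1", "total_pence": 1250, "status": "PAID", "note": "gift"}
# A provider response after a breaking rename of total_pence.
broken_response = {"id": "o-1", "total": 1250, "status": "PAID"}
```

Run in the provider's build, a check like this can flag a breaking change before it reaches the consumer, which is one way to avoid "every time we touch one microservice, the others break".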

Slide 38

the rotting automation #IBMGarage @holly_cummins

Slide 39

#IBMGarage @holly_cummins

Slide 40

“oh yes, that build has been broken for a few weeks…” #IBMGarage @holly_cummins

Slide 41

“we don’t know when the build is broken” #IBMGarage @holly_cummins

Slide 42

let’s talk about your build http://sli.do #L750 #IBMGarage @holly_cummins

Slide 43

doom! the not-actually-continuous continuous integration and continuous deployment

Slide 44

“we have a CI/CD” #IBMGarage @holly_cummins

Slide 45

“we have a CI/CD” #IBMGarage @holly_cummins

Slide 46

CI/CD is something you do, not a tool you buy #IBMGarage @holly_cummins

Slide 47

“i’ll merge my branch into our CI next week” #IBMGarage @holly_cummins

Slide 48

“CI/CD … CI/CD … CI/CD … we release every six months … CI/CD …” #IBMGarage @holly_cummins

Slide 49

what is CD? http://sli.do #L750 #IBMGarage @holly_cummins

Slide 50

continuous. I don’t think that word means what you think it means. #IBMGarage @holly_cummins

Slide 51

how do you do continuous? http://sli.do #L750 #IBMGarage @holly_cummins

Slide 52

how do you CD? http://sli.do #L750 #IBMGarage @holly_cummins

Slide 53

“we can’t ship until we have more confidence in the quality” #IBMGarage @holly_cummins

Slide 54

doom! the software crypt

Slide 55

“we can’t actually release this.” #IBMGarage @holly_cummins

Slide 56

why? #IBMGarage @holly_cummins

Slide 57

“we’ve scheduled the architecture board review for a month after the project is ready to ship” #IBMGarage @holly_cummins

Slide 58

“we can’t release this microservice… we deploy all our microservices at the same time.” #IBMGarage @holly_cummins

Slide 59

why? oh yes, we don’t know if they work #IBMGarage @holly_cummins

Slide 60

“we can’t ship until every feature is complete” #IBMGarage @holly_cummins

Slide 61

what’s the point of architecture that can go faster, if you don’t go faster? #IBMGarage @holly_cummins

Slide 62

feedback is good engineering #IBMGarage @holly_cummins

Slide 63

deferred wiring #IBMGarage @holly_cummins

Slide 64

feature flags #IBMGarage @holly_cummins
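
As an illustration of how a feature flag lets unfinished work ship without being switched on, here is a minimal sketch. The `FEATURE_<NAME>` environment-variable convention, the function names, and the discount logic are assumptions made up for this example, not something described in the talk.

```python
import os


def flag_enabled(name: str, environ=os.environ) -> bool:
    """Read a hypothetical FEATURE_<NAME> variable as a boolean (default off)."""
    value = environ.get(f"FEATURE_{name.upper()}", "off")
    return value.strip().lower() in {"1", "true", "on", "yes"}


def checkout_total(prices, environ=os.environ) -> int:
    """Sum prices in pence; the unfinished discount path stays dark behind a flag."""
    total = sum(prices)
    if flag_enabled("new_discount", environ):
        # Hypothetical 10% discount that is still in development.
        total = total * 90 // 100
    return total
```

With the flag unset, `checkout_total([100, 200])` takes the existing path, so the code can be deployed before "every feature is complete" and turned on later without a new release.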

Slide 65

a/b testing, canary deploys #IBMGarage @holly_cummins
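
One common way to implement a canary (or an a/b split) is to hash a stable user identifier into a bucket and send only a small percentage of buckets to the new version. The sketch below assumes that approach; the function name and the choice of 100 buckets are illustrative, and in practice this routing often lives in a load balancer or service mesh rather than in application code.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets and send
    the lowest `canary_percent` buckets to the canary version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < canary_percent
```

Because the bucket depends only on the user ID, a given user sees the same version on every request, and raising `canary_percent` only ever adds users to the canary group.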

Slide 66

doom! the locked-down, totally rigid, inflexible, un-cloudy cloud

Slide 67

“this provisioning software is broken” #IBMGarage @holly_cummins

Slide 68

what we sold: 10 minute provision-time. “this provisioning software is broken” #IBMGarage @holly_cummins

Slide 69

what we sold: 10 minute provision-time. what the client thought they’d got: 3 month provision-time. “this provisioning software is broken” #IBMGarage @holly_cummins

Slide 70

what we sold: 10 minute provision-time. what the client thought they’d got: 3 month provision-time. the reason: 84-step pre-approval process. “this provisioning software is broken” #IBMGarage @holly_cummins

Slide 71

#IBMGarage @holly_cummins

Slide 72

governance #IBMGarage @holly_cummins

Slide 73

#IBMGarage @holly_cummins

Slide 74

Provider A Provider B “we’re going to change cloud provider #IBMGarage @holly_cummins

Slide 75

Provider A Provider B “we’re going to change cloud provider to fix our procurement process!” #IBMGarage @holly_cummins

Slide 76

Provider A Provider B “we’re going to change cloud provider to fix our procurement process!” #IBMGarage @holly_cummins

Slide 77

#IBMGarage @holly_cummins

Slide 78

“we’ve configured our network! #IBMGarage @holly_cummins

Slide 79

“we’ve configured our network! you can either access the cloud servers … or access jira. #IBMGarage @holly_cummins

Slide 80

“we’ve configured our network! you can either access the cloud servers … or access jira. to access both you’d need two machines.” #IBMGarage @holly_cummins

Slide 81

“it takes us a week to start coding.” #IBMGarage @holly_cummins

Slide 82

“it takes us a week to start coding.” “two days to get a repo … two days to get a pipeline …” #IBMGarage @holly_cummins

Slide 83

there is a cost: developers flee #IBMGarage @holly_cummins

Slide 84

doom! the mystery money pit

Slide 85

the cloud makes it so easy to provision hardware. IBM Garage @holly_cummins

Slide 86

that doesn’t mean the hardware is free. IBM Garage @holly_cummins

Slide 87

or useful. IBM Garage @holly_cummins

Slide 88

zombie workload #IBMGarage @holly_cummins

Slide 89

2017 survey: 25% of 16,000 servers doing no useful work #IBMGarage @holly_cummins

Slide 90

2017 survey: 25% of 16,000 servers doing no useful work #IBMGarage @holly_cummins

Slide 91

Hey boss, I created a Kubernetes cluster. #IBMGarage @holly_cummins

Slide 92

Hey boss, I created a Kubernetes cluster. I forgot it for 2 months. #IBMGarage @holly_cummins

Slide 93

Hey boss, I created a Kubernetes cluster. I forgot it for 2 months. … and it’s £1000 a month. #IBMGarage @holly_cummins

Slide 94

“we have 28 cloud instances. or maybe it’s 35.” IBM Garage @holly_cummins

Slide 95

“we have no idea how much we’re spending on cloud.” IBM Garage @holly_cummins
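
A first step toward knowing what is being spent is simply an inventory with a cost and a utilisation figure per instance. The sketch below assumes such an inventory already exists; the `Instance` type, the instance names, and the 2% CPU threshold are hypothetical, and only the £1000 a month cluster echoes the earlier slide. Low average CPU is a crude proxy for "doing no useful work", not a definitive test.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    monthly_cost_gbp: float
    avg_cpu_percent: float  # average utilisation over the review period


def monthly_spend(instances) -> float:
    """Total monthly cost across the whole inventory."""
    return sum(i.monthly_cost_gbp for i in instances)


def likely_zombies(instances, cpu_threshold: float = 2.0):
    """Instances whose average CPU is below the (assumed) idle threshold."""
    return [i for i in instances if i.avg_cpu_percent < cpu_threshold]


# Made-up inventory for illustration.
inventory = [
    Instance("forgotten-k8s-cluster", 1000.0, 0.5),
    Instance("prod-api", 400.0, 35.0),
]
```

Here `monthly_spend(inventory)` adds up both entries, and `likely_zombies(inventory)` singles out the forgotten cluster as a candidate for review before anyone deletes anything.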

Slide 96

finops, multicloud management IBM Garage @holly_cummins

Slide 97

@holly_cummins