Metrics-Driven Development and DevOps

A presentation at Agile 2014 in July 2014 in Orlando, FL, USA by Karthik Gaekwad

Slide 1

• Ernest Mueller (@ernestmueller)
• Karthik Gaekwad (@iteration1)

Slide 2

@ernestmueller #Agile2014 @iteration1

Slide 3

• Senior Engineer at Signal Sciences
• Previous:
• 10 years building products, agile/cloud/devops teams

Slide 4

• Product Manager at CopperEgg
• Previous:
• 20 years in IT – dev, ops, management

Slide 5

Our Goal For You Today
• Empower you with new ideas to bring your organization together!
• Metrics. What are they?
• How to use metrics for good, as illustrated by three Epic Rap Battles of History!
  – Dev vs Ops (What is this… DevOps?)
  – Small vs Large Org
  – Scrum vs Kanban

Slide 6

Slide 7

What Are Metrics?
• A quantifiable measure of any component or process whose change is of interest to your business.
  – Business!
  – Application!
  – System!
  – People!
  – Process!
  – Not: Meaningless numbers!

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Story Time: Our context
• Tasked with building a new cloud business for the organization.
• Understand how cloud technologies can impact the bottom line.
• Build products customers will want from the new business unit.
  – Read: a startup inside a bigger organization

Slide 21

Slide 22

Slide 23

Lean Startup Applied
• Used ‘Lean Startup’ ideas to power the new area.
  – Able to define an MVP (Minimum Viable Product).
  – Easier to define workflow for something brand new.
  – No confusion with existing processes.
• Once we started to see value, we retrofitted the approach to other parts of the org.

Slide 24

Showing progress
• Initially, we had weekly progress/status meetings with stakeholders.
• Cross-functional team with business/marketing/engineering.

Slide 25

Slide 26

Slide 27

Metrics
• Pivot: change conversations to metrics instead.
• Agreed on metrics that we wanted to track
  – Stakeholder input
    • “What do you want out of this?”
    • “How quickly do you want this?”
    • …Okay, let’s measure this!

Slide 28

Tracked Metrics
• Tracked actionable metrics (dev and business):
  – # of users signing up per week
  – # of active sessions per day/week
  – # of compiles sent per week
  – # of unique data points sent per week

Slide 29

Pro Tip: Metrics
• Link all your metrics from one dashboard.
  – Business (Ex: User logins)
  – Dev (Ex: Performance metrics)
  – Ops (Ex: DB CPU Usage)
• One bookmark to rule them all.

Slide 30

Pro Tip: Metrics
• Don’t use yet another username/password scheme.
• You’ll lose your users really fast!

Slide 31

Pro Tip: Metrics
• Try to use a tool that can handle different kinds of metrics.
• Shoutouts:
  – StatsD
  – Datadog
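
To make "different kinds of metrics in one tool" concrete, here is a minimal sketch of how business, dev, and ops numbers could all be emitted in the plain-text StatsD line format (`<name>:<value>|<type>`). The metric names are hypothetical, and the host and port in `send_metric` assume a StatsD-compatible agent listening locally on UDP 8125; this is an illustration, not the setup used in the talk.

```python
import socket

# Metric type suffixes from the plain-text StatsD line protocol:
# "c" = counter, "g" = gauge, "ms" = timer in milliseconds.
VALID_TYPES = {"c", "g", "ms"}


def statsd_line(name: str, value: float, metric_type: str) -> str:
    """Format one metric as '<name>:<value>|<type>'."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric line over UDP (fire and forget, no reply expected)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric names, one per audience on the shared dashboard.
signup_line = statsd_line("business.user_signups", 1, "c")
latency_line = statsd_line("dev.api.response_time", 120, "ms")
db_cpu_line = statsd_line("ops.db.cpu_percent", 42.5, "g")

if __name__ == "__main__":
    # Printing only; call send_metric(line) when an agent is actually running.
    for line in (signup_line, latency_line, db_cpu_line):
        print(line)
```

Because every team writes to the same namespace, a single dashboard can chart all three lines side by side.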

Slide 32

End Result
• Business and engineering on the same page.
• Management looking at metrics without having “meetings to look at metrics”.
• Became a part of the culture.
• Innovated faster because different teams were in sync.

Slide 33

Slide 34

Operations
What is it? Why Do You Care?

Slide 35

Other Kinds Of Operations
• Wikipedia quoth:
• Business operations is the harvesting of value from assets owned by a business
• Operations management is […] overseeing, designing, and controlling the process of production and redesigning business operations in the production of goods or services.

Slide 36

Technical Operations
• “Operations: The New Secret Sauce” – Tim O’Reilly (2006)
• Without the ability to
  – Release changes
  – Quickly respond to change
  – Provide a service without interruption
  – Operate cost effectively
Your service is borked.

Slide 37

What Does Operations Do?
• Build Servers, OS, Virtualization/Cloud
• Install/Upgrade Software
• Install Applications/Release Process/Move to Prod
• Configure Network, Load Balancers, Storage, etc.
• Security testing, reporting, and hardening
• Reliability (scaling, backups)
• Performance management (apps, systems)
• Scalability (capacity planning to autoscaling)

Slide 38

What Else Does Operations Do?
• Availability – Responsible for service being up
• Incident Response
• Fulfill Requests
• Budgeting/Contracts/Cost Tracking/Reduction
• Monitor all of that
• Much more
• So besides “they run the services,” the critical final piece of your value chain, they have access to many of the things you want metrics from

Slide 39

Code – Operations =

Slide 40

Story Time: Black Friday
• Every year, a huge spike in usage
• Uptime and performance critical to retailers during the period
• Product directly contributed to conversion
• Metrics crucial to plan the period, execute through the period, and report how we did

Slide 41

Slide 42

Metrics From…
From:
• Servers
• Applications/App Logs
• App Servers/Software
• Network
• Data Stores
• Web Servers/CDNs
• Client Browsers
• Alerts
• Tickets
Using:
• Open source monitoring tools (Zabbix, Nagios)
• SaaS (SumoLogic, PagerDuty)
• JIRA
• Custom (Web front end analytics w/ Hadoop)
• Custom (Amazon cost analytics w/ GoodData)
• Custom (Metrics Dashboard)

Slide 43

Many Tools Are Awful

Slide 44

Slide 45

Slide 46

YOU DECIDE

Slide 47

DEVOPS

Slide 48

Slide 49

Traditional Dev and Ops

Slide 50

What is DevOps?

Slide 51

What is DevOps?

Slide 52

Slide 53

Scrumming away…

Slide 54

Ready to deploy…

Slide 55

Slide 56

Metrics Promote DevOps
• How do you get the cat inside the circle? Herding cats is hard. Some people aren’t cat people.
• Metrics can be used to promote culture, understanding, and collaboration
• Metrics help keep those different disciplines in sync by providing tangible collaboration points
• MTTD, MTTR, performance metrics, events
• Bringing all the disciplines’ metrics together covers your whole value chain, “code to cash”
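
As one example of a shared Dev/Ops collaboration point, MTTD and MTTR can be derived from three timestamps per incident. The sketch below assumes MTTD means "mean time to detect" (start to detection) and MTTR means "mean time to recover" measured from the start of the incident; teams define MTTR differently, so treat that choice, the `Incident` type, and the sample data as illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import List


@dataclass
class Incident:
    started: datetime   # when the problem actually began
    detected: datetime  # when monitoring or a person noticed it
    resolved: datetime  # when service was restored


def mttd_minutes(incidents: List[Incident]) -> float:
    """Mean time to detect: average of (detected - started), in minutes."""
    return mean((i.detected - i.started).total_seconds() for i in incidents) / 60


def mttr_minutes(incidents: List[Incident]) -> float:
    """Mean time to recover: average of (resolved - started), in minutes."""
    return mean((i.resolved - i.started).total_seconds() for i in incidents) / 60


# Made-up sample incidents for illustration only.
incidents = [
    Incident(datetime(2014, 7, 1, 10, 0), datetime(2014, 7, 1, 10, 5),
             datetime(2014, 7, 1, 10, 45)),
    Incident(datetime(2014, 7, 8, 14, 0), datetime(2014, 7, 8, 14, 15),
             datetime(2014, 7, 8, 15, 15)),
]

if __name__ == "__main__":
    print(f"MTTD: {mttd_minutes(incidents):.1f} min")
    print(f"MTTR: {mttr_minutes(incidents):.1f} min")
```

Tracking both numbers separates "we noticed late" (a monitoring problem) from "we fixed it slowly" (a release or runbook problem).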

Slide 57

Operations – Code = IT

Slide 58

Slide 59

IT + DevOps = ?
• Many IT teams implement Agile today
• They can implement DevOps too
• But to do either, they have to change how they interact with others
• Focus on the customer’s needs, not your own; cloud/SaaS is providing “competitive pressure”
• Practice Theory of Constraints – embed when possible, even if you need to add some
• Add devs and automate

Slide 60

Slide 61

Slide 62

SMALL ORG

Slide 63

Slide 64

Slide 65

Slide 66

Slide 67

Slide 68

Slide 69

Slide 70

Slide 71

Metrics 101: Culture of communication
• Talk in terms of metrics
  – Builds common ground between different roles.
  – Understand different perspectives.
  – Find the best way to get everyone talking in one place.

Slide 72

Metrics 201
• Push your metrics into your conversation tool
• Use tools that everyone likes:
  – IRC vs. HipChat/Slack
• Integrate your metrics into a channel
  – “Deployment channel” in your chat
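
Pushing a metric into a chat channel can be as small as one HTTP POST. The sketch below builds a request in the shape Slack incoming webhooks accept (a JSON body with a `text` field); the webhook URL, service name, and message wording are placeholders invented for illustration, and nothing is sent unless you call `urlopen` on the request yourself.

```python
import json
import urllib.request


def deployment_message(service: str, version: str, error_rate_percent: float) -> str:
    """Render a one-line deploy summary for the chat channel."""
    return (f"Deployed {service} {version}: "
            f"error rate {error_rate_percent:.1f}% since release")


def build_webhook_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying the message text."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: a real one comes from your chat tool's integration settings.
WEBHOOK_URL = "https://chat.example.com/hooks/deployments"

message = deployment_message("checkout-api", "v1.4.2", 0.4)
request = build_webhook_request(WEBHOOK_URL, message)

if __name__ == "__main__":
    print(message)
    # To actually post: urllib.request.urlopen(request)
```

Calling this at the end of a deploy script puts the number where business, PMs, and engineers are already talking.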

Slide 73

Culture of communication
• Find a way to get people talking.
• Find face-to-face time with stakeholders.
• Metrics are that specific item to have a conversation around.
• Engineering teams love IRC, but business and PMs might not as much.
• Transitioned to Slack/HipChat (integrations and message history)
• Leads to visibility and builds trust

Slide 74

End Result
• Metrics drive conversations between everyone.
• Enhances productivity.
• Helped us streamline our process.

Slide 75

LARGE ORG

Slide 76

Large Mature Org
• Hundreds of developers
• Many teams (many goals, processes)
• Distributed teams
• International teams
• Outsourcers
• Various Weird Partner Relationships

Slide 77

Large Org Problems
• Silos Galore
• Communication Problems
• Annoying Compliance Requirements
• Profitability Actually Important
• Less pure greenfield work – also responsibility for many existing mature systems

Slide 78

Story Time
• SaaS product, 40 engineers, 2/3 outsourced, mostly maintenance but extreme scale (1/3 of staff were Ops)
• Lots of support-initiated urgent customer requests
• Dev still required for features, integration/transition with newer services, bug fixes, scaling
• Team morale issues

Slide 79

Metrics 101
• First, add Agile. (Previously the ‘stew method’)
• Basic metrics – number of tickets (100+ in queue at any time), size of backlog (500 or so bugs and stories), rate of new inflow and completion.
• Used to fix misunderstandings from upper management and correct resourcing
• Next step on metrics – how to balance the support work and new work?

Slide 80

Metrics 201
Support SLA

Slide 81

Metrics 201
Velocity

Slide 82

Metrics 201
• Balancing these two metrics was the key to satisfying customers short and long term.
• But it’s not an either-or: by seeing the effects of people, process, and technology changes on those metrics, we drove SLA from <50% to 100% and kept velocity growing (20… 50… 200…)
• Having the metrics to focus on gave shared purpose and eased communication with the large distributed team
• Experiment, see the impact, pivot.
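
The two balancing metrics can each be computed with a few lines. This sketch assumes "support SLA" means the percentage of tickets resolved within their agreed time, and "velocity" means story points completed in a sprint; the `Ticket` fields and all sample numbers are hypothetical and are not the figures from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ticket:
    hours_to_resolve: float  # how long the support request actually took
    sla_hours: float         # the agreed limit for this class of request


def sla_attainment_percent(tickets: List[Ticket]) -> float:
    """Percentage of tickets resolved within their SLA window."""
    if not tickets:
        raise ValueError("no tickets to measure")
    met = sum(1 for t in tickets if t.hours_to_resolve <= t.sla_hours)
    return 100.0 * met / len(tickets)


def sprint_velocity(completed_story_points: List[int]) -> int:
    """Velocity for one sprint: total points of the stories finished in it."""
    return sum(completed_story_points)


# Made-up sample data for illustration only.
tickets = [
    Ticket(hours_to_resolve=4, sla_hours=8),
    Ticket(hours_to_resolve=10, sla_hours=8),
    Ticket(hours_to_resolve=2, sla_hours=4),
    Ticket(hours_to_resolve=30, sla_hours=24),
]
sprint_points = [5, 8, 3, 5]

if __name__ == "__main__":
    print(f"SLA attainment: {sla_attainment_percent(tickets):.0f}%")
    print(f"Velocity: {sprint_velocity(sprint_points)} points")
```

Charting both per sprint shows whether a process change helped support, new work, or both.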

Slide 83

Metrics 301
• Monthly “Operational Excellence (Metrics) Meeting”
• Teams presented their metrics portfolio – with some variation as appropriate
• Drawn from system info, app metrics, db reports, Salesforce, surveys, etc.
• Keep it lean!!!

• Revenue and Cost
• Product Usage
• Performance
• Availability
• Client Satisfaction
• Employee Satisfaction
• Quality
• Security

Slide 84

Slide 85

Metrics 401 - A/B Testing
• All features had usage measured
• Feature flags would turn features on for customer subsets to measure usage, effect on conversion, etc. before committing
• Sometimes you had to kill it despite work spent
• Retooling could save a high-profile failure
• “Yes, product guy, you have to.”
• Look for things metrics say you can kill – it’s the only way to stay lean long term
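
One common way to turn a feature on for a customer subset is deterministic hash bucketing, sketched below. This is an assumed technique chosen for illustration, not necessarily how the team in the talk did it; the feature name, customer IDs, and rollout percentage are made up.

```python
import hashlib


def bucket(customer_id: str, feature: str) -> int:
    """Map a customer to a stable bucket in [0, 100) for a given feature."""
    digest = hashlib.sha256(f"{feature}:{customer_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(customer_id: str, feature: str, rollout_percent: int) -> bool:
    """True when the customer's bucket falls inside the rollout percentage."""
    return bucket(customer_id, feature) < rollout_percent


def conversion_rate(visits: int, conversions: int) -> float:
    """Conversions per visit; 0.0 when there were no visits."""
    return conversions / visits if visits else 0.0


if __name__ == "__main__":
    # Hypothetical feature exposed to roughly 20% of customers.
    customers = [f"customer-{n}" for n in range(1000)]
    exposed = [c for c in customers if is_enabled(c, "new-checkout", 20)]
    print(f"{len(exposed)} of {len(customers)} customers see the feature")
    # Compare conversion between groups before committing to the feature.
    print(f"variant: {conversion_rate(200, 30):.2%}")
    print(f"control: {conversion_rate(800, 96):.2%}")
```

Hashing on the feature name plus customer ID keeps each customer's experience stable across visits while giving different features independent subsets.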

Slide 86

YOU DECIDE

Slide 87

Slide 88

Slide 89

KANBAN

Slide 90

Slide 91

Here’s why…
• How many meetings?
• “Short planning meeting”?
• How often do these go long?
• Wait how long before prioritizing a feature/bug?
• Role of a dedicated scrum master is a luxury.
• Derailed sprints because of changing business priorities…

Slide 92

Slide 93

Why Kanban?
• Limited number of WIP tasks in play.
• Easier to prioritize. There is only 1 list!
• Standups are simpler.
• Task estimates in days versus hours (1/2 day to 7 days).
• Research tasks to figure out how long something may take.

Slide 94

Kanban benefits
• Kanban + CI == Solved our issue of “when to release”. Didn’t have to wait for release windows like in Scrum.
• Less stressful == Only x number of tasks going on at once. Easier to measure.
• Velocity is awesome!

Slide 95

Kanban Metrics
• Things we track:
  – Visualized board (JIRA GreenHopper)
  – Cycle time (how fast something gets done)
  – WIP (work/tasks in progress)
  – Flow diagram
  – % of bugs
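
Three of the tracked numbers fall out of a simple list of cards with start and finish dates. In this sketch cycle time is assumed to run from when work starts to when it finishes, WIP counts cards started but not finished, and "% of bugs" is the share of all cards flagged as bugs; the `Card` type and sample board are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import List, Optional


@dataclass
class Card:
    title: str
    started: Optional[date]   # None while the card is still in the backlog
    finished: Optional[date]  # None until the card is done
    is_bug: bool = False


def average_cycle_time_days(cards: List[Card]) -> float:
    """Mean days from start to finish, over finished cards only."""
    durations = [(c.finished - c.started).days
                 for c in cards if c.started and c.finished]
    if not durations:
        raise ValueError("no finished cards to measure")
    return mean(durations)


def wip_count(cards: List[Card]) -> int:
    """Cards that have been started but not yet finished."""
    return sum(1 for c in cards if c.started and not c.finished)


def bug_percent(cards: List[Card]) -> float:
    """Share of all cards on the board that are bugs."""
    return 100.0 * sum(1 for c in cards if c.is_bug) / len(cards)


# Made-up sample board for illustration only.
board = [
    Card("Signup form", date(2014, 7, 1), date(2014, 7, 4)),
    Card("Fix login error", date(2014, 7, 2), date(2014, 7, 7), is_bug=True),
    Card("Usage dashboard", date(2014, 7, 3), None),
    Card("Research caching", None, None),
]

if __name__ == "__main__":
    print(f"Average cycle time: {average_cycle_time_days(board)} days")
    print(f"WIP: {wip_count(board)}")
    print(f"Bugs: {bug_percent(board):.0f}%")
```

Comparing `wip_count` against the team's WIP limit is the check that keeps "only x tasks at once" honest.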

Slide 96

SCRUM

Slide 97

Scrum Rules
• Have used Scrum for both pure Ops and Dev+Ops teams

Slide 98

Kanban Drools
• Deadlines help maintain tempo – we had multiple releases a sprint, don’t need to tie them together
• You can reliably commit to a near-term ETA with Scrum instead of just “when it’s done”
• Scrum has a better backlog (esp. in JIRA!)
• Many people “doing Kanban” are really “doing nothing”, like some doing “Agile” are really doing “cowboy coding.” Kanban takes more discipline and training than Scrum.

Slide 99

Scrum and Metrics
• Velocity is easier for people to understand than flow diagrams

Slide 100

But I Hear Kanban Is Better For Ops
• In a DevOps world, most Ops work SHOULD NOT be interrupt driven – it’s project work just like the devs are doing
• Dev and Ops expedite work approach each other in magnitude over time, assuming appropriate investment in automation
• You may be thinking of “Level 1 Support” or “The Helpdesk” – that is NOT an Ops Engineer

Slide 101

Scrum for Ops?
• Devs have to be involved in major incidents too!
• Over the length of a sprint, the interrupt level evens out – my metrics show that velocity doesn’t vary more than with dev teams
• You manage WIP in your Scrum too

Slide 102

Complications Scrum Helps
• Distributed teams need more communication ceremonies
• Foreign/contract workers need more communication ceremonies
• Same process across teams is better – in most cases other teams were using Scrum
• Simple common metrics -> better collaboration
• When starting from zero, Scrum was the quickest path to team continuous improvement

Slide 103

YOU DECIDE

Slide 104

Slide 105

Using Metrics For Evil

Slide 106

Too Many Metrics

Slide 107

Cargo Cult Metrics

Slide 108

Demand Perfection

Slide 109

Weaponized Metrics

Slide 110

Slide 111

Recap
• Metrics are good: use them, be guided by them, communicate with them.

Slide 112

Recap
• Metrics can enhance your:
  – Culture
  – Productivity
  – Process

Slide 113

Recap
• Use them for good, not for evil.

Slide 114

@ernestmueller @iteration1 theagileadmin.com