The future of DevOps Sasha Rosenbaum @DivineOps

Sasha Rosenbaum, Red Hat, @DivineOps
Background: Israeli Air Force, Defense Industry R&D, Cloud Consulting, Microsoft, GitHub
DevOpsDays Chicago since 2014

How about you?

The Past

Technology

1990s: Getting a new server for an application: 2-3 months

Backup

Date  Release name
1990  SQL Server 1.1 (16-bit)
1992  SQL Server 4.2A
1993  SQL Server 4.21a
1995  SQL Server 6.0
1996  SQL Server 6.5
1998  SQL Server 7.0
2000  SQL Server 2000
2003  SQL Server 2000 64-bit
2005  SQL Server 2005
2008  SQL Server 2008
2010  Azure SQL database

Software release cadence: 2-3-year cycle

Merge hell: merging the development branches and completing the test procedures could take months

Maintenance windows

Expected availability: < 99% (more than 3.65 days of downtime per year)

Unavailable systems were estimated to have cost American businesses $4.54 billion in 1996. Source: IBM Global Services, Improving systems availability, 1998.

Culture

Traditional IT: Dev and Ops separated by the wall of confusion

Speed Reliability

The problem isn’t technical. The problem isn’t people. The problem is socio-technical.

Darmok and Jalad at Tanagra

Patrick and Andrew at Agile TO 2008

10 deploys per day: Dev and Ops collaboration at Flickr Velocity 09: John Allspaw and Paul Hammond

Agile Infrastructure Velocity 09: Andrew Clay Shafer

DevOpsDays Ghent 2009: Patrick Debois

Speed Reliability

Charity Majors, CEO of Honeycomb

Nicole Forsgren. State of DevOps Report 2019

Speed Reliability

Software delivery is like a muscle. The more you use it, the stronger it gets.

Software Services: Needs to be Operated
Platform Services: Needs to be Operated
Infrastructure Services: Needs to be Operated

In the beginning…

Deployment Checklists

Scripts

OS-level APIs

PowerShell (Windows): configuration management framework and scripting language. Jeffrey Snover, 2006

Source Control for Ops

GitHub launch: 2008

Distributed version control + Pull Request system = Global collaboration

Infrastructure-level APIs

Amazon Web Services: 2002
Amazon Cloud Computing: 2006

Darwinian Pressure: the new models evolved due to pressure to deliver adaptable services at scale.

Netflix, Amazon, Google, and every ‘cloud native’ company built a platform. Because they had to…

‘Cloud’ evolved from lessons learned building and operating these internal services

Infrastructure as code

Configuration management minimizes manual toil and infrastructure configuration drift
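To make "configuration drift" concrete, here is a minimal sketch, assuming the desired state lives in version control and the actual state has been read from a server. The `detect_drift` helper and the setting names are hypothetical and do not represent any particular configuration management tool.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Return {key: (desired_value, actual_value)} for every setting that differs.

    A value of None in the result means the key is absent on that side.
    """
    missing = object()  # sentinel for keys absent on one side
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key, missing)
        have = actual.get(key, missing)
        if want != have:
            drift[key] = (
                None if want is missing else want,
                None if have is missing else have,
            )
    return drift


# Hypothetical example: what is declared in source control vs. what is on the host.
desired = {"nginx_version": "1.24", "max_connections": 1024, "tls": True}
actual = {"nginx_version": "1.24", "max_connections": 512, "debug": True}

for key, (want, have) in detect_drift(desired, actual).items():
    print(f"{key}: desired={want!r} actual={have!r}")
```

A real tool would go on to reconcile each reported difference automatically, which is where the reduction in manual toil comes from.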

“The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service.” –Werner Vogels, CTO Amazon 2006

Jez Humble and Dave Farley: 2010

Continuous Integration (CI): the process of automating the build and testing of code every time a team member commits changes to version control.

Continuous Delivery (CD): the approach in which teams produce software in short cycles, ensuring that the software can be reliably released at any time.

Software Services: Needs to be Operated
Platform Services: Needs to be Operated
Infrastructure Services: Needs to be Operated

You will automate me out of a job!

Toil: the kind of work that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.

We would not be able to achieve the availability, reliability and speed we have today without automation

The problem isn’t technical. The problem isn’t people. The problem is socio-technical.

The Present

The future is already here. It’s just not evenly distributed ~ William Gibson

DevOps across Microsoft (http://aka.ms/DevOps-Stories)
105K 4.4M 5M 2M 500M 500K
Engineers use the DevOps platform | Git commits per month | Builds per month | Test executions per day | Work items viewed per day | Work items updated per day
85,000 Deployments per day
Shared with permission from Microsoft. Internal data snapshot taken some time in 2019.

DevOps and SRE engineers command a higher salary

We’ve created new jobs ¯_(ツ)_/¯

The new jobs got people higher salaries and more interesting work! ヽ(•‿•)ノ

We’ve created new disciplines!

SRE

SRE ≃ Google’s DevOps implementation

100% reliability is unattainable

Availability: 99.999% (5.26 minutes of downtime per year)
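The arithmetic behind these availability figures can be sketched as follows, assuming a 365-day year. The function name is illustrative. At 99% it gives the 3.65 days per year quoted for the 1990s, and at 99.999% it gives roughly 5.26 minutes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year permitted at the given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    days = minutes / (24 * 60)
    print(f"{target}% -> {minutes:.2f} min/year ({days:.4f} days)")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why the cost question on the next slide matters.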

How much does that cost?

Risk and Error Budgets
Error budget: an acceptable level of unreliability. It’s a budget. It can be allocated.
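One way to read "it's a budget" is as a count of failures a service may spend in a period. The sketch below assumes a request-based objective. The `error_budget` function and the example numbers are hypothetical.

```python
def error_budget(slo_percent: float, total_requests: int, failed_requests: int) -> dict:
    """Compute how much of a request-based error budget has been spent.

    slo_percent is the target success rate, e.g. 99.9. A 100% target leaves
    no budget at all, so it is rejected.
    """
    if not 0 < slo_percent < 100:
        raise ValueError("SLO must be strictly between 0 and 100 percent")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed = (1 - slo_percent / 100) * total_requests
    return {
        "allowed_failures": allowed,
        "remaining": allowed - failed_requests,
        "consumed_fraction": failed_requests / allowed,
    }


# Hypothetical month: 1,000,000 requests, 400 failures, 99.9% objective.
budget = error_budget(99.9, 1_000_000, 400)
print(f"allowed failures: {budget['allowed_failures']:.0f}")
print(f"remaining:        {budget['remaining']:.0f}")
print(f"consumed:         {budget['consumed_fraction']:.0%}")
```

A team with budget remaining can allocate it to riskier releases or experiments. A team that has overspent would typically slow down and invest in reliability.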

Service Level Terminology: SLI, SLO, and SLA
Describe the metrics that matter, the values we want, and how we will react
Indicators (SLI): Defined measurement of an aspect of a service.
Objectives (SLO): Target value (or range of values) as measured by an SLI
Agreements (SLA): Explicit or implicit contract with users or customers, with consequences of meeting or missing objectives
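As an illustration of how an indicator and an objective relate, here is a small sketch using a latency-based SLI. The `LatencySLO` class, the 300 ms threshold, and the sample data are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencySLO:
    threshold_ms: float     # a request counts as "good" if served within this time
    target_fraction: float  # objective: fraction of requests that must be good


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """SLI: the measured fraction of requests served within the threshold."""
    if not latencies_ms:
        raise ValueError("no measurements")
    good = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return good / len(latencies_ms)


def meets_slo(latencies_ms: Sequence[float], slo: LatencySLO) -> bool:
    """Compare the measured SLI against the objective."""
    return latency_sli(latencies_ms, slo.threshold_ms) >= slo.target_fraction


# Hypothetical measurements: one of ten requests is slower than 300 ms.
samples = [120, 80, 95, 310, 101, 99, 87, 150, 92, 105]
slo = LatencySLO(threshold_ms=300, target_fraction=0.95)

print(f"SLI: {latency_sli(samples, slo.threshold_ms):.2f}")
print(f"SLO met: {meets_slo(samples, slo)}")
```

The SLA is deliberately absent from the code: it is the contract that says what happens when `meets_slo` is false, not something that gets measured.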

Monitoring

“Monitoring is how you manage your known-unknowns, which involves checking values for predefined thresholds, creating actionable alerts and runbooks and so forth.” Charity Majors, CEO, Honeycomb
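Checking values against predefined thresholds can be sketched in a few lines. The metric names, limits, and the `check_thresholds` function below are hypothetical and not tied to any monitoring product.

```python
# Hypothetical metrics and limits; real values depend on the service.
THRESHOLDS = {
    "cpu_percent": 90.0,
    "error_rate": 0.01,
    "p99_latency_ms": 500.0,
}


def check_thresholds(sample: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return one alert message per metric that is missing or over its limit."""
    alerts = []
    for metric, limit in thresholds.items():
        value = sample.get(metric)
        if value is None:
            # Missing data is itself a signal: the service may not be reporting.
            alerts.append(f"{metric}: no data")
        elif value > limit:
            alerts.append(f"{metric}: {value} exceeds threshold {limit}")
    return alerts


sample = {"cpu_percent": 95.0, "error_rate": 0.002}
for alert in check_thresholds(sample):
    print(alert)
```

Every check here was written in advance for a failure someone anticipated, which is exactly the "known-unknowns" limit the quote describes.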

Without monitoring, you have no way to tell whether the service is even working

Observability

“Observability is how you handle unknown-unknowns, by instrumenting your code and capturing the right level of detail that lets you answer any question …” Charity Majors, CEO, Honeycomb

All of this requires collecting and analyzing massive amounts of data

“If we have data, let’s look at data. If all we have are opinions, let’s go with mine” - Jim Barksdale

Infrastructure as code ?

MLOps

MLOps
• Massive amounts of data
• Data model versioning
• Model re-use
• Model decay over time
• Compliance considerations

Chaos Engineering

Chaos engineering: the discipline of experimenting on a software system in production in order to build confidence in the system’s capability to withstand turbulent and unexpected conditions.

Everybody tests in production

Incident Response

Blameless postmortems

There is no root cause

DevSecOps

More code = more problems

Cyberattacks are at an all-time high

Security must be an integral part of the software development lifecycle

There is so much more…

The industry has evolved

Open Source

Open source is defining the new industry standards
1M+ projects, 100M+ repositories, 40M developers, 2.1M businesses
Source: https://github.com, August 2019

90% of IT leaders are using enterprise open source. Source: Red Hat State of Open Source Report 2021

Cloud

Cloud Numbers
Public Cloud
● IaaS - Dominated by 6-8 Clouds
● PaaS - Dominated by 50-100 Clouds
● SaaS - Over 4000 SaaS offerings
Data Center | Private Cloud
● “Only 20% is in the Public Cloud” - IBM
● “Only 5% is in the Public Cloud” - AWS
2020 Worldwide Public Cloud Revenues: ~$235B USD
2020 Worldwide Data Center Revenues: $2-4T USD

Kubernetes

85% of global IT leaders agree that Kubernetes is key to cloud-native application strategies. Source: Red Hat State of Open Source Report 2021

The Future

The future is already here. It’s just not evenly distributed ~ William Gibson

seeking advantage seeking legitimacy

Agile DevOps SRE

“DevOps is a solved problem” - Someone from Google, 2019

Source: StackOverflow Developer Survey, 2018

Source: StackOverflow Developer Survey, 2018

If knowledge was all it took, we’d all have six pack abs.

We must spend time on making sure that the “standard of living” improves for everyone

Companies, just like people, don’t like to change

“Smart people don’t learn … because they have too much invested in proving what they know and avoiding being seen as not knowing.” - Chris Argyris

Learning requires vulnerability

The age of continuous updates

“In this era of becoming, everyone becomes a newbie. Worse, we will be newbies forever.” - Kevin Kelly

More innovation in the new DevOps disciplines

Managed Services

Spend More Time On What Matters: The Cloud Native Organization
Figure: Traditional IT (“Artisanal Projects”) compared with Cloud Native (“Industrial Products”) across Dev, Arch, and Ops.
Figure labels: anchored uninspired apps; protective gatekeeping; per project infrastructure; self service creation; brittle deployments; toil driven operations; enabling constraints; collapse complexity; unchained differentiated value; responsive innovation; platform services; ubiquitous automation; standardize infra

“This is one of the innovator’s dilemmas: Blindly following the maxim that good managers should keep close to their customers can sometimes be a fatal mistake.” - Clayton Christensen

The DevOps evolution continues, as we solve new problems every day

Good DevOps copy. Great DevOps steal.

Thank you! @DivineOps