DevOps Patterns & Antipatterns for Continuous Software Updates

A presentation at Cloud and AI DevFest Montreal 2019 in September 2019 in Montreal, QC, Canada by Baruch Sadogursky

Slide 1

DevOps Patterns & Antipatterns for Continuous Software Updates “What can possibly go wrong?!”

Slide 2

Why software updates? @jbaruch #LiquidSoftware #CloudAIdevfestMTL19 http://jfrog.com/shownotes

Slide 3

Slide 4

Slide 5

Slide 6

“As every company becomes a software company, security vulnerabilities are the new oil spills”

Slide 7

Can be helped with tech

Slide 8

Identify, Fix, Deploy

Slide 9

Identify: immediately. Fix: OS upgrade. Deploy: years.

Slide 10

Identify: 2 months. Fix: Struts upgrade. Deploy: 2 months.

Slide 11

Slide 12

Slide 13

Identify: as fast as possible. Fix: as fast as possible. Deploy: as fast as possible.

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

This is not a new idea!
XP: short feedback
Scrum: reducing cycle time to the absolute minimum
TPS: decide as late as possible and deliver as fast as possible
Kanban: incremental change

Slide 19

Slide 20

Shownotes: http://jfrog.com/shownotes
Slides, video, links, comments, ratings, raffle

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Flowchart: Update available? Do we want it? Are there any high risks? Do we trust the update? Outcomes: “Let’s update!” or “How about no”.

Slide 27

Slide 28

Number of artifacts as a symptom of complexity
Timeline, 2000 to today: Agile, Continuous Integration, Continuous Delivery, Infrastructure as Code, Microservices, Docker, Serverless, IoT
@jbaruch @jfrog #LiquidSoftware www.liquidsoftware.com

Slide 29

The problem is not the code, it’s the data. Big data.

Slide 30

#emptyenvelopefromchina

Slide 31

Slide 32

Flowchart: Update available? Do we want it? Are there any high risks? Do we trust the update? Can we verify the update? (time-consuming verification) Outcomes: “Let’s update!” or “How about no”.

Slide 33

Slide 34

Features that we want
Acceptance tests costs

Slide 35

Slide 36

Your browser, Twitter in your browser, Twitter on your smartphone, your smartphone OS?!
Flowchart: Update available? Do we want it? No one asked you (auto update). Are there any high risks? No. Let’s update!

Slide 37

What can possibly go wrong?

Slide 38

Slide 39

Slide 40

Continuous updates pattern: Local rollback
Problem: An update went catastrophically wrong and an over-the-air patch can’t reach the device.
Solution: Keep the previous version saved on the device before updating. Roll back if a problem occurs.
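
A minimal sketch of the local rollback idea, assuming a hypothetical `Device` class and a caller-supplied `health_check` (neither comes from the talk). The previous version is stored on the device before the update, so recovery does not depend on the network:

```python
from typing import Callable, Optional


class Device:
    """Hypothetical device that keeps the previous version for local rollback."""

    def __init__(self, version: str) -> None:
        self.current = version
        self.previous: Optional[str] = None

    def update(self, new_version: str, health_check: Callable[[str], bool]) -> bool:
        # Save the running version locally before switching.
        self.previous = self.current
        self.current = new_version
        if health_check(self.current):
            return True
        # The new version is unhealthy: restore the saved copy, no network needed.
        self.current = self.previous
        return False
```

A real device would more likely keep two firmware slots and let the bootloader choose; the in-memory fields here only stand in for that.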

Slide 41

Slide 42

Slide 43

Slide 44

Continuous updates pattern: OTA software updates
Problem: Physical recalls are costly. Extremely costly. Also, you can’t force an upgrade.
Solution: Implement over-the-air software updates, preferably continuous updates.

Slide 45

Continuous OTA updates are like normal OTA updates, but better

Slide 46

Slide 47

Slide 48

Slide 49

Continuous updates pattern: continuous updates
Problem: In batch updates, important features wait for non-important features.
Solution: Implement continuous updates.

Slide 50

Slide 51

Slide 52

Nub’s horror
New feature update
Uses templating with $ symbol
Apple’s staging servers return prices without $ symbol
Some of Apple’s production servers return prices with $ symbol
As a result, some users suffer crashes
It took time to understand what went wrong
It took time to get the fix through Apple review

Slide 53

Continuous updates pattern: Canary releases
Problem: Releasing a bug affects ALL the users.
Solution: Release to a small number of users first and observe. If a problem occurs, stop the release, revert or update the affected users.
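
One possible way to pick the "small number of users" is a deterministic hash bucket, sketched below. The function name `in_canary` and the percentage-based cohort are assumptions for illustration, not something the slides prescribe:

```python
import hashlib


def in_canary(user_id: str, percent: int) -> bool:
    """Return True if this user falls into the first `percent` of 100 buckets."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the user to a stable bucket in 0..99 so repeated calls agree.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Because the bucket is stable per user, raising `percent` only adds users to the cohort, and setting it to 0 stops the release for everyone.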

Slide 54

Continuous updates pattern: observability
Problem: Some problems are hard to trace when relying on user feedback only.
Solution: Implement tracing, monitoring and logging.

Slide 55

Continuous updates pattern: Rollbacks
Problem: Fixes might take time, and users suffer in the meantime.
Solution: Implement rollback, the ability to deploy a previous version without delay.
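
As a rough sketch, rollback without delay implies that earlier releases stay on record and remain deployable. The `ReleaseHistory` class below is hypothetical and keeps versions in memory only:

```python
from typing import List, Optional


class ReleaseHistory:
    """Hypothetical record of deployed versions, newest last."""

    def __init__(self) -> None:
        self._deployed: List[str] = []

    def deploy(self, version: str) -> None:
        self._deployed.append(version)

    @property
    def live(self) -> Optional[str]:
        return self._deployed[-1] if self._deployed else None

    def rollback(self) -> str:
        # Rolling back needs an earlier release that is still available.
        if len(self._deployed) < 2:
            raise RuntimeError("no previous version to roll back to")
        self._deployed.pop()
        return self._deployed[-1]
```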

Slide 56

Continuous updates pattern: feature flags
Problem: Rollbacks are not always supported by the deployment target platform.
Solution: Embed two versions of the feature in the app itself and switch between them with API calls.
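
A small sketch of the feature flag idea, assuming a hypothetical flag named `new_price_template` held in a plain dict. In a real app the dict would be refreshed from a remote API; that call is left out here:

```python
from typing import Dict


class PriceFormatter:
    """Ships both the old and the new formatting; a flag picks one at runtime."""

    def __init__(self, flags: Dict[str, bool]) -> None:
        self.flags = flags

    def format(self, amount: float) -> str:
        if self.flags.get("new_price_template", False):
            return f"${amount:,.2f}"   # new behaviour
        return f"{amount:.2f} USD"     # old behaviour, still embedded in the app
```

Turning the flag off restores the old behaviour without shipping a new build.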

Slide 57

Slide 58

You thought your problems are hard?

Things under your control         Server-side updates   IoT (mobile, automotive, edge) updates
The availability of the target    ✓                     ✕
The state of the target           ✓                     ✕
The version on the target         ✓                     ✕
The access to the target          ✓                     ✕

Slide 59

Slide 60

KNIGHT-MARE
New system reused old APIs
1 out of 8 servers was not updated
New clients sent requests to the machine that contained old code
Engineers undeployed working code from updated servers, increasing the load on the not-updated server
No monitoring, no alerting, no debugging

Slide 61

Continuous updates pattern: Automated deployment
Problem: People suck at repetitive tasks.
Solution: Automate everything.
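
A hedged sketch of what automating the rollout could look like: install on every host in a loop, then verify what each host reports, so that a missed server is caught by the script and not by users. The `deploy_everywhere` function and the `install` callback are hypothetical stand-ins for real tooling:

```python
from typing import Callable, Dict, List


def deploy_everywhere(
    fleet: Dict[str, str],
    version: str,
    install: Callable[[str, str], str],
) -> List[str]:
    """Install `version` on every host; return hosts still reporting another version."""
    for host in fleet:
        # Record the version the host reports after the install attempt.
        fleet[host] = install(host, version)
    return [host for host, reported in fleet.items() if reported != version]
```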

Slide 62

Continuous updates pattern: frequent updates
Problem: Infrequent deployments generate anxiety and stress, leading to errors.
Solution: Update frequently to develop skill and habit.

Slide 63

Continuous updates pattern: state awareness
Problem: Target state can affect the update process and the behavior of the system after the update.
Solution: Know and consider target state when updating. Reverting might require reverting the state.
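
To illustrate why reverting code may also mean reverting state, here is a sketch with a made-up two-version schema. The field names and the functions `migrate_up`, `migrate_down` and `revert_app` are assumptions for this example only:

```python
from typing import Dict


def migrate_up(state: Dict) -> Dict:
    # Schema 1 stored a single "name"; schema 2 splits it into two fields.
    first, _, last = state["name"].partition(" ")
    return {"schema": 2, "first_name": first, "last_name": last}


def migrate_down(state: Dict) -> Dict:
    # Inverse migration, needed because version 1 code cannot read schema 2.
    name = f'{state["first_name"]} {state["last_name"]}'.strip()
    return {"schema": 1, "name": name}


def revert_app(state: Dict) -> Dict:
    """Return state that version 1 code can read, whatever schema is stored now."""
    return migrate_down(state) if state["schema"] == 2 else state
```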

Slide 64

Slide 65

Slide 66

Slide 67

Real life pattern: be kind
Problem: You shame someone publicly; a week later shit happens to you.
Solution: Don’t be a shmuck.

Slide 68

Cloud-dark
New rules are deployed frequently to battle attacks
Deployment of a single misconfigured rule
It included a regex that spiked CPU to 100%
“Affected region: Earth”

Slide 69

Continuous updates pattern: Canary releases
Problem: Releasing a bug affects ALL the users.
Solution: Release to a small number of users first, effectively reducing the blast radius, and observe. If a problem occurs, stop the release, revert or update the affected users.

Slide 70

Slide 71

Slide 72

Continuous updates pattern: zero downtime updates
Problem: You will probably lose all your users if you shut down for 5 weeks (and counting) to perform an update.
Solution: Perform small, frequent, zero-downtime continuous OTA updates.
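
One common way to approach zero downtime is a blue-green style switch; the slide does not name a technique, so this is an assumed example. The hypothetical `Router` below keeps serving with the old handler until the new one passes a smoke test:

```python
from typing import Callable

Handler = Callable[[str], str]


class Router:
    """Hypothetical in-process router standing in for a load balancer."""

    def __init__(self, live: Handler) -> None:
        self.live = live

    def handle(self, request: str) -> str:
        # Requests are served by whichever version is live right now.
        return self.live(request)

    def switch_to(self, candidate: Handler, smoke_test: Callable[[Handler], bool]) -> bool:
        # The candidate is checked while the old version keeps serving traffic.
        if not smoke_test(candidate):
            return False
        self.live = candidate  # single reference swap, no gap without a handler
        return True
```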

Slide 73

Continuous updates
Frequent
Automatic
Tested
Canary
State-aware
Observability
*Local Rollbacks

Slide 74

Flowchart: Update available? Do we want it? “Sure, why not?” (auto update). Are there any high risks? Do we trust the update? Outcome: “Let’s update!”

Slide 75

“Our goal is to transition from bulk and rare software updates to extremely tiny and extremely frequent software updates; so tiny and so frequent that they provide an illusion of software flowing from development to the update target. We call it the Liquid Software vision.”

Slide 76

Slide 77

Corner cases?

Slide 78

Q&A and Twitter ads
@jbaruch #LiquidSoftware #CloudAIdevfestMTL19 https://liquidsoftware.com https://jfrog.com/shownotes