Don't Panic – Launching Websites Confidently and Successfully

A presentation at DevOps Tallinn 2018 in May 2018 in Tallinn, Estonia by Ryan Townsend

Slide 1

Don’t Panic! How to launch a large-scale website confidently and successfully Photo by SpaceX on Unsplash DevOps Tallinn 2018

Slide 2

Who am I? @ryantownsend Ryan Townsend, CTO

Slide 3

Relaunched May 2017

Slide 4

“Just use auto-scaling and forget about it” – Kris Quigley, Lead Developer @ SHIFT (sarcasm)

Slide 5

Timeline: Development, Pre-launch, Launch, Post-launch

Slide 6

• Functional Testing
• Deployment Pipelines
• Configuration & Implementation

Slide 7

Development http://www.spacex.com/media-gallery/detail/149431/9391

Slide 8

Keep Things Simple

Slide 9

Limit Project Scope

Slide 10

New Problem or New Technology

Slide 11

“Almost all the cases where I've heard of a system that was built as a microservice system from scratch, it has ended up in serious trouble.” – Martin Fowler, Chief Scientist, ThoughtWorks

Slide 12

Clear Decoupling

Slide 13

Admin Panel, API, Website

Slide 14

Use Boring Mature Technology

Slide 15

Load Testing

Slide 16

Don’t wait until the end

Slide 17

It’s A LOT harder than people let on

Slide 18

• Use real metrics and logged user behaviour
• Use a wide variety of metrics, not just traffic
• Post-test validate the metrics at source
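One way to ground a load test in logged user behaviour is to derive the request mix from real access logs and then sample from it. The sketch below is illustrative only: the "METHOD PATH STATUS" log format, the sample paths, and the function names are assumptions, not something taken from the talk.

```python
import random
from collections import Counter


def request_mix(log_lines):
    """Return each path's share of logged traffic as {path: fraction}."""
    # Assumed (hypothetical) log format: "METHOD PATH STATUS", one request per line.
    paths = Counter(line.split()[1] for line in log_lines if line.strip())
    total = sum(paths.values())
    return {path: count / total for path, count in paths.items()}


def sample_requests(mix, n, seed=0):
    """Draw n paths at random, weighted by their observed share of traffic."""
    rng = random.Random(seed)  # seeded so a test plan can be reproduced
    paths = list(mix)
    weights = [mix[path] for path in paths]
    return rng.choices(paths, weights=weights, k=n)


# Hypothetical log excerpt standing in for real production logs.
logs = [
    "GET /products 200",
    "GET /products 200",
    "GET /basket 200",
    "POST /checkout 302",
]
mix = request_mix(logs)
plan = sample_requests(mix, 1000)
```

The resulting plan could then be fed to whichever load-testing tool is in use, so the simulated traffic follows the same proportions as the logged traffic.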

Slide 19

Assume user behaviour will change

Slide 20

Stress Test

Slide 21

Web Performance Testing

Slide 22

Remember: it’s not just for you!

Slide 23

Caching

Slide 24

Client → CDN → Application → Database

Slide 25

Write-through caches
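As a rough illustration of the write-through idea, the sketch below updates the cache as part of every write, so reads never see a value older than the store. A plain dict stands in for the database, and the class name is hypothetical.

```python
class WriteThroughCache:
    """Minimal write-through cache: every write updates store and cache together."""

    def __init__(self, store):
        self.store = store  # source of truth; a dict stands in for a database here
        self.cache = {}

    def write(self, key, value):
        self.store[key] = value  # persist to the source of truth first
        self.cache[key] = value  # then refresh the cache in the same operation

    def read(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit: the store is not touched
        value = self.store[key]  # cache miss: load from the store
        self.cache[key] = value  # and remember it for next time
        return value
```

Because the cache is refreshed on write rather than invalidated, there is no window in which a reader has to fall through to the database after an update.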

Slide 26

Start small… low TTLs
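A minimal sketch of a time-to-live cache, assuming an in-process cache and an injectable clock (both hypothetical choices made for illustration). A low TTL bounds how long a stale value can be served, which makes it a cautious starting point.

```python
import time


class TTLCache:
    """Cache entries that expire ttl_seconds after they were stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self.entries = {}  # key -> (expires_at, value)

    def get(self, key, compute):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: serve the cached value
        value = compute()  # missing or expired: recompute
        self.entries[key] = (now + self.ttl, value)
        return value
```

Starting with a TTL of a few seconds and raising it once the behaviour is understood keeps the cost of a caching mistake small.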

Slide 27

Front-end – static assets & redirects

Slide 28

Higher hit ratios = less traffic hitting our servers
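The relationship can be made concrete with a little arithmetic. The function name and the traffic figures below are illustrative assumptions, not measurements from the talk.

```python
def origin_requests(total_requests, hit_ratio):
    """Requests that miss the cache and reach our servers."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Rounded because floating-point subtraction is not exact.
    return round(total_requests * (1.0 - hit_ratio))
```

With 1,000,000 requests, a 90% hit ratio leaves 100,000 requests for the origin, while 95% leaves 50,000: five extra points of hit ratio halve the origin load.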

Slide 29

Feature Toggles

Slide 30

Diagram: feature toggle (On, Off) switching between the Ideal and Fallback experience

Slide 31

Diagram: feature toggle (On, Off) switching between the Ideal and Fallback experience

Slide 32

• Built into your application
• Content Delivery Network
• A/B testing tool
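For the first option, a toggle built into the application, a minimal sketch might look like the following. The flag name, the class, and the two rendered strings are hypothetical; a real system would usually load flags from configuration that can be changed without a deploy.

```python
class FeatureToggles:
    """In-application feature toggles; unknown flags default to off."""

    def __init__(self, flags=None):
        self.flags = dict(flags or {})

    def is_on(self, name):
        return self.flags.get(name, False)

    def set(self, name, enabled):
        self.flags[name] = bool(enabled)


def render_recommendations(toggles):
    # "recommendations" is a hypothetical flag name used for illustration.
    if toggles.is_on("recommendations"):
        return "personalised recommendations"  # ideal experience
    return "static bestsellers"  # cheaper fallback when the toggle is off
```

Defaulting unknown flags to off means a missing or mistyped flag degrades to the fallback rather than enabling an expensive feature by accident.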

Slide 33

Circuit Breakers

Slide 34

Diagram: circuit breaker (Closed, Error, Open) switching between the Ideal and Fallback paths

Slide 35

Diagram: circuit breaker (Closed, Error, Open) switching between the Ideal and Fallback paths

Slide 36

Diagram: circuit breaker (Closed, Error, Open) switching between the Ideal and Fallback paths
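The circuit breaker pattern can be sketched as follows. This is a simplified, single-threaded illustration under assumed defaults (three failures to trip, a 30-second cool-down); the class and parameter names are hypothetical and not taken from the talk.

```python
import time


class CircuitBreaker:
    """Closed: call the ideal path. Open: skip it and use the fallback."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to stay open before retrying
        self.clock = clock  # injectable so timing can be tested
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, ideal, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: do not touch the failing dependency
            self.opened_at = None  # cool-down over: allow one trial call
        try:
            result = ideal()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback()
        self.failures = 0  # success closes the breaker fully
        return result
```

While the breaker is open the struggling dependency receives no traffic at all, which gives it room to recover instead of being hit by retries.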

Slide 37

Pre-launch Preparations https://www.flickr.com/photos/spacex/31450835954/

Slide 38

Communication

Slide 39

• Build a trusting relationship with stakeholders
• Understand their metrics
• Get their perspective
• Determine authority

Slide 40

Visibility

Slide 41

• System monitoring 
 – infrastructure & client-side

• Client / stakeholder dashboards & reporting 
 – see what they see

• Customer engagement 
 – social media, customer support

• Instant access to logs 
 – filterable, searchable

Slide 42

This slide shows how New Relic tracked a third-party script that was harming site performance while the server side remained fine.

Slide 43

Roleplay

Slide 44

• What could go wrong?
• Who would you escalate to?
• How would you solve it?
• What people do you need access to?
• What systems do you need access to?

Slide 45

Traffic Reduction

Slide 46

Slide 47

• Avoid scheduling big campaigns
• Paid advertising is easy to turn off
• Reduce offering

Slide 48

Launch Day https://unsplash.com/photos/yJv97tE7GDM

Slide 49

Scale-up

Slide 50

“Big Bang” vs Canary Release
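A canary release sends only a fraction of users to the new site and raises that fraction as confidence grows. One common way to pick the fraction is to hash a stable user identifier into a bucket; the sketch below assumes that approach, and the function name and user IDs are hypothetical.

```python
import hashlib


def in_canary(user_id, percent):
    """Return True if this user falls inside the canary percentage (0 to 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket, 0..99
    return bucket < percent
```

Because the bucket depends only on the user ID, each user gets a consistent experience between requests, and anyone in the canary at 10% is still in it at 50%. A "Big Bang" launch is the same function with percent fixed at 100.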

Slide 51

Feature Toggles: Off

Slide 52

Keep Calm and Carry On

Slide 53

• Expect issues
• Keep a level head
• Remain professional
• You’re an expert – you’ve got this

Slide 54

Post-launch https://unsplash.com/photos/-p-KCm6xB9I

Slide 55

Continue Building Confidence

Slide 56

• Gather real metrics & usage patterns
• Revisit your load tests and re-assess
• Re-run load tests for future releases
• Ship some safe releases
• Ship small releases, often

Slide 57

Since Launch https://unsplash.com/photos/MEW1f-yu2KI

Slide 58

Optimising Caching

Slide 59

Strong Migrations

Slide 60

Started working towards macro-services (rather than microservices)

Slide 61

Event Sourcing

Slide 62

Static Site Generation

Slide 63

Communication is Paramount

Slide 64

Thank you

@ryantownsend