A presentation at Velocity Conf by Ryan Neal
How Netlify Migrated to a Multicloud Architecture (And No One Noticed)
ryan@ rybit
Who am I? @ry_boflavin
Dog Dad
Engineer
Fire Spinner
Netlify
What is Netlify? Netlify is the simplest way to build, deploy, and manage web projects on the JAMstack. We're changing the way the web is built by collapsing the modern front-end development process into a single, simplified workflow.
What am I going to talk about?
1. Intro to the system
2. Why we did all this work
3. How we accomplished it
4. The actual migration
5. Next steps
[Diagram: system overview showing the DB Cluster, the Origin Cluster (six origin nodes), the CDN (six cdn nodes) handling content delivery, Cloud Files, and DNS]
Health checking for everything
Getting Data into the system
[Diagram, animated over several slides: DB Cluster, Origin Cluster (six origin nodes), CDN (six cdn nodes) handling content delivery, Cloud Files, and DNS]
Getting Data out of the system
[Diagram, animated over several slides: DB Cluster, Origin Cluster (six origin nodes), CDN (six cdn nodes) handling content delivery, Cloud Files, and DNS]
Cool, but where are the actual servers?
But where does it live?
[Diagram: DB Cluster, Origin Cluster (six origin nodes), CDN]
And when it fails?
[Diagram: DB Cluster, Origin Cluster (six origin nodes), CDN, DNS]
What happens when things go wrong?
Higher traffic sites are going to be happier
Multicloud Setup
Same cloud is fastest
[Diagram labels: Cloud Files]
Why do all of this? Because clouds fail
But how do we build around that?
Steps to Multicloud
1. Double check assumptions
2. Replicate all the objects
3. Prepare the database
4. Make the origin services cloud agnostic
5. Test everything
6. Do the actual cutover
Assumption Checking https://github.com/rybit/cloud-bench
Replicate it all
[Diagram: four origin nodes, the primary DB, and CF, with steps numbered 1 to 3]
{
    "_id" : "9c74b7c31a3c04634ddf1d54d2339e0163dcf4a7",
    "size" : 9935,
    "sha" : "9c74b7c31a3c04634ddf1d54d2339e0163dcf4a7",
    "created_at" : ISODate("2018-06-07T21:02:29.240Z")
}
Replicate it all
[Diagram: four origin nodes, the primary DB, and CF, with steps numbered 1 to 3]
{
    "_id" : "9c74b7c31a3c04634ddf1d54d2339e0163dcf4a7",
    "size" : 9935,
    "sha" : "9c74b7c31a3c04634ddf1d54d2339e0163dcf4a7",
    "created_at" : ISODate("2018-06-07T21:02:29.240Z"),
    "m" : 1
}
Upload mask
RAX = 1
AWS = 2
GCP = 4
Example:
m = 6 → AWS & GCP
m = 3 → AWS & RAX
m = 1 → RAX only
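The upload mask above is an ordinary bit field: each cloud owns one bit, and the value of "m" is the bitwise OR of the clouds that hold a copy. A minimal Python sketch of decoding and updating such a mask follows. The bit values come from the slide; the names Cloud, clouds_in, and mark_uploaded are hypothetical and are not taken from Netlify's code.

```python
from enum import IntFlag


class Cloud(IntFlag):
    """One bit per object store, using the values shown on the slide."""
    RAX = 1
    AWS = 2
    GCP = 4


def clouds_in(mask: int) -> list[str]:
    """Return the names of the clouds whose bit is set in the mask."""
    return [c.name for c in Cloud if mask & c]


def mark_uploaded(mask: int, cloud: Cloud) -> int:
    """Return the mask with the given cloud's bit set (hypothetical helper)."""
    return int(mask | cloud)


print(clouds_in(6))  # ['AWS', 'GCP']
print(clouds_in(3))  # ['RAX', 'AWS']
print(clouds_in(1))  # ['RAX']
print(mark_uploaded(1, Cloud.GCP))  # 5
```

A single integer field keeps the metadata document small, and adding another provider only requires assigning the next power of two.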
BlobSync
Records progress and errors
Replicate it all
{
    "_id" : "9c74b7c31a3c04634ddf1d54d2339e0163dcf4a7",
    "size" : 9935,
    "sha" : "9c74b7c31a3c04634ddf1d54d2339e0163dcf4a7",
    "created_at" : ISODate("2018-06-07T21:02:29.240Z"),
    "m" : 1,
    "r" : true
}
Replication Flag
Sparse index in Mongo
[Diagram: four origin nodes, the primary DB, and CF, with steps numbered 1 to 3]
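The slides do not show how BlobSync works internally, so the following Python sketch is only one plausible reading of the "r" flag and the "m" mask. It assumes that "r" marks a document that still needs replication, that the flag is removed once every cloud holds a copy (which is what would keep a sparse index on "r" small), and that progress and errors are recorded per upload. The function sync_pending and the upload callable are hypothetical, and plain dictionaries stand in for the database.

```python
RAX, AWS, GCP = 1, 2, 4          # bit values from the upload mask slide
ALL_CLOUDS = RAX | AWS | GCP


def sync_pending(docs, upload):
    """Copy every flagged object to the clouds its mask is missing.

    docs   -- list of dicts shaped like the metadata document on the slide
    upload -- hypothetical callable(sha, cloud_bit) that raises on failure
    Returns a list of (sha, cloud_bit, error message) for failed uploads.
    """
    errors = []
    for doc in docs:
        if not doc.get("r"):
            continue  # not flagged, so nothing to replicate
        for bit in (RAX, AWS, GCP):
            if doc["m"] & bit:
                continue  # this cloud already holds a copy
            try:
                upload(doc["sha"], bit)
                doc["m"] |= bit  # record progress after each success
            except Exception as exc:
                errors.append((doc["sha"], bit, str(exc)))
        if doc["m"] == ALL_CLOUDS:
            doc.pop("r")  # fully replicated: clear the flag
    return errors


def flaky_upload(sha, bit):
    """Stand-in uploader that fails for GCP only."""
    if bit == GCP:
        raise IOError("GCP unavailable")


doc = {"sha": "9c74b7c31a3c04634ddf1d54d2339e0163dcf4a7", "m": 1, "r": True}
print(sync_pending([doc], flaky_upload))
print(doc)
```

Because the mask is updated after each successful upload, a failed run can be retried later and only the missing copies are attempted again.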
State of the world
[Diagram: four origin nodes, primary DB, CF, CDN]
State of the world
[Diagram: four origin nodes, primary DB, CF, GCS, S3, BlobSync, CDN]
State of the world
[Diagram: four origin nodes, CF, GCS, S3, BlobSync, CDN, primary DB]
Forceable overrides
Smart Resolution
[Diagram, animated over several slides: origin, CDN, primary DB]
{
    "sha" : "9c74b7c31a3c04634ddf1d54d2339e0163dcf4a7",
    "m" : 7
}
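The slides name "Smart Resolution" and "Forceable overrides" without spelling out the rules, so the Python sketch below is an assumption built from two statements made elsewhere in the talk: the mask records which clouds hold the object, and reading from the same cloud is fastest. The function resolve, its parameters, and the fallback order are hypothetical.

```python
RAX, AWS, GCP = 1, 2, 4  # bit values from the upload mask slide


def resolve(mask, local, override=None):
    """Pick the cloud to read an object from.

    mask     -- the "m" field of the object's metadata document
    local    -- bit of the cloud this origin server runs in
    override -- optional bit forced by an operator (assumed behaviour)
    """
    if override is not None:
        return override  # a forced choice wins unconditionally
    if mask & local:
        return local  # same cloud is fastest
    for bit in (RAX, AWS, GCP):  # assumed fallback order
        if mask & bit:
            return bit
    raise LookupError("object is not stored in any known cloud")


print(resolve(7, GCP))                 # 4: the object is in the local cloud
print(resolve(1, GCP))                 # 1: only RAX has it
print(resolve(7, AWS, override=RAX))   # 1: the override is honoured
```

With "m" equal to 7 the object exists everywhere, so each origin can serve it from its own cloud and cross-cloud reads are only needed for objects that have not finished replicating.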
State of the world
[Diagram: three groups of four origin nodes, one each alongside CF, GCS, and S3; BlobSync; CDN; primary DB]
Pulling the trigger
1. Spin up enough origin services
2. Fail over the DB
3. Update the consul entry
4. Aggressively stare at monitors
State of the world
[Diagram: three groups of four origin nodes, one each alongside CF, GCS, and S3; BlobSync; CDN; primary DB]
State of the world
[Diagram: primary DB; three groups of four origin nodes, one each alongside CF, GCS, and S3; BlobSync; CDN]
Summary
Automated failover
More monitoring
WE ARE HIRING
ryan@ rybit
Find me to talk! @ry_boflavin
Netlify’s CDN runs on six different cloud providers, but up until recently, its origin servers relied on only one. That meant its origin servers were subject to the performance and uptime characteristics of a single provider. The company needed to make sure that, in the face of a major outage from an underlying provider anywhere in its network, Netlify’s service would continue with minimal interruption.
Ryan Neal explains how Netlify planned, tested, and executed its first multicloud migration that could direct traffic to Google Cloud (GCP), AWS, and Rackspace Cloud on demand, without any service interruptions. Along the way, Ryan shares lessons learned and key takeaways you can apply to your own infrastructure decisions.