OCP Case Studies From Servers to Serverless in Ten Minutes Erik Riedel, PhD Senior Vice President, Engineering ITRenew
A presentation at OCP Virtual Summit in May 2020 by Erik Riedel.
The Power of Hyperscale For All
Optimized for your workload, from deskside to data center. No assembly. No guesswork. Just plug them in.
PROVEN HYPERSCALE TECH • BUILT ON OPEN ARCHITECTURE • CONSISTENT PRODUCT • DEPENDABLE SUPPLY • BETTER-THAN-EVER TCO • FLEXIBLE, SCALABLE
Servers
Serverless
in 10 minutes…or so…
Project goal and genesis
Many service provider and enterprise SaaS companies are looking for solutions they can roll onto the floor, plug in, and use to run workloads quickly. OCP adoption could be accelerated by offering pre-designed and pre-qualified solutions for key computing use cases. We are working with infrastructure software stacks and software partners to pre-design and pre-qualify solutions with OCP equipment.
Servers SERVER
Figure: Open Systems rack, up to 45 nodes across power zones AA and BB: external TOR switches (2x), internal TOR switches (2x), 3 ingress nodes, 27 compute nodes (single- or 2-socket, 25 GbE connectivity), 9 flash-based storage nodes (millions of IOPS and terabytes of capacity), 3 mgmt nodes, and 3 infra nodes.
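As a quick sanity check on the rack layout, the node roles read off the diagram can be tallied against the "up to 45 nodes" label. This is a minimal sketch; the counts are taken from the diagram labels, the TOR switches are assumed not to count as nodes, and the names `RACK_LAYOUT` and `total_nodes` are hypothetical.

```python
# Node roles and counts as read from the Open Systems rack diagram.
# TOR switches are excluded on the assumption that they do not count as nodes.
RACK_LAYOUT = {
    "ingress": 3,
    "compute": 27,
    "storage": 9,
    "mgmt": 3,
    "infra": 3,
}

MAX_NODES = 45  # "up to 45 nodes for Open Systems"


def total_nodes(layout: dict[str, int]) -> int:
    """Sum the node counts across all roles."""
    return sum(layout.values())


if __name__ == "__main__":
    n = total_nodes(RACK_LAYOUT)
    print(f"{n} nodes, fits in rack: {n <= MAX_NODES}")
```

The 27 compute nodes here match the "27 workload nodes (production platform)" in the case study later in the deck.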
Figure: Fast-Start, up to 5 nodes with its own power supply + switch, shown alongside the full Open Systems rack (up to 45 nodes: single- or 2-socket compute nodes with 25 GbE connectivity, flash-based storage nodes with millions of IOPS and terabytes of capacity, plus ingress, mgmt, and infra nodes across power zones AA and BB).
Prerequisites – Before You Start
Provisioning server: Linux, DevOps, automation
Our choices: Ubuntu 19.04; docker, Ansible; Rancher
Deployed services: DRP (PXE server), DHCP (static IPs), Digital Rebar, docker registry, Rancher
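The provisioning server hands out static IPs over DHCP. A minimal sketch of one way to build a deterministic MAC-to-IP reservation table is below; the subnet `10.0.0.0/24`, the starting host number, the MAC addresses, and the function name `static_leases` are all hypothetical, and this is not a description of how DRP stores its reservations.

```python
import ipaddress


def static_leases(macs, subnet="10.0.0.0/24", first_host=10):
    """Map each MAC address to a fixed IPv4 address in `subnet`.

    MACs are lowercased, de-duplicated, and sorted, so the same set of
    nodes always receives the same addresses. Assignment starts at host
    number `first_host` (10.0.0.10 with the default arguments).
    """
    hosts = list(ipaddress.ip_network(subnet).hosts())  # .1 through .254 for a /24
    leases = {}
    for i, mac in enumerate(sorted({m.lower() for m in macs})):
        idx = first_host - 1 + i
        if idx >= len(hosts):
            raise ValueError(f"subnet {subnet} has no free address for {mac}")
        leases[mac] = str(hosts[idx])
    return leases


if __name__ == "__main__":
    # Hypothetical MAC addresses for a four-node cluster.
    nodes = [
        "AA:BB:CC:00:00:02",
        "aa:bb:cc:00:00:01",
        "aa:bb:cc:00:00:04",
        "aa:bb:cc:00:00:03",
    ]
    for mac, ip in static_leases(nodes).items():
        print(mac, ip)
```

Sorting before assignment keeps the table stable across re-runs, which matters when the same rack is wiped and re-provisioned repeatedly.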
1st Step – PXE Entire Rack
• Power on all discovered nodes via IPMI for wipe, install, and first boot
Power on -> Disk erased: 03 min 51 sec
Reboot -> Linux installed (mSATA device): 07 min 34 sec
Reboot -> Boot from installed disk (*): 04 min 02 sec
(*) including 2 min of PXE timeout and 30 sec of grub menu
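Adding up the three phase timings gives the end-to-end time for this step, and subtracting the fixed PXE timeout and grub menu delay shows how much of it is pure waiting. A minimal sketch, assuming the three phases run back to back and all nodes proceed in parallel; the names `PHASES`, `total_seconds`, and `fmt` are hypothetical.

```python
# Phase durations (minutes, seconds) as reported for the 1st Step.
PHASES = [
    ("Power on -> disk erased", 3, 51),
    ("Reboot -> Linux installed (mSATA device)", 7, 34),
    ("Reboot -> boot from installed disk", 4, 2),
]

# Fixed delays included in the last phase: 2 min PXE timeout + 30 sec grub menu.
FIXED_DELAYS_S = 2 * 60 + 30


def total_seconds(phases):
    """Sum (name, minutes, seconds) tuples into a total in seconds."""
    return sum(m * 60 + s for _, m, s in phases)


def fmt(seconds):
    """Format seconds in the slide's 'MM min SS sec' style."""
    m, s = divmod(seconds, 60)
    return f"{m:02d} min {s:02d} sec"


if __name__ == "__main__":
    total = total_seconds(PHASES)
    print("PXE to first boot:", fmt(total))
    print("without PXE timeout and grub menu:", fmt(total - FIXED_DELAYS_S))
```

Under these assumptions the step takes 15 min 27 sec, of which 2 min 30 sec is timeout and menu delay rather than actual work.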
Figure: 4-node cluster of CRIMSON compute nodes (each 2x 12c, 256GB, 10G), 96 cores and 1 TB memory in total; the infra node (power supply + switch) runs DRP (PXE server) and DHCP (static IPs).
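The cluster totals follow directly from the per-node spec. A minimal sketch, reading "2x 12c 256GB" as two 12-core sockets with 256 GB of memory per node; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    sockets: int
    cores_per_socket: int
    memory_gb: int


# "CRIMSON 2x 12c 256GB", read as 2 sockets x 12 cores with 256 GB of memory.
CRIMSON = NodeSpec(sockets=2, cores_per_socket=12, memory_gb=256)


def cluster_totals(spec: NodeSpec, count: int) -> tuple[int, int]:
    """Return (total cores, total memory in GB) for `count` identical nodes."""
    cores = spec.sockets * spec.cores_per_socket * count
    memory_gb = spec.memory_gb * count
    return cores, memory_gb


if __name__ == "__main__":
    cores, mem = cluster_totals(CRIMSON, 4)
    print(f"{cores} cores, {mem} GB ({mem / 1024:g} TB) memory")
```

Four nodes give 4 × 2 × 12 = 96 cores and 4 × 256 GB = 1024 GB, matching the 96 cores and 1 TB on the slide.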
1st Step – Success! OSes Deployed
2nd Step – Deploy Rancher
• Deploy the provisioning server via Ansible
• The provisioning server uses Ansible to deploy RKE k8s to all nodes
• The provisioning server deploys Rancher and its prerequisites into the deployed k8s (*)
(*) we use one controller/etcd node + three workers in a 4-node cluster
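The one-controller/three-worker split can be expressed as the node list that RKE reads from its cluster.yml. To our understanding, RKE nodes carry `address`, `user`, and `role` fields, with roles such as `controlplane`, `etcd`, and `worker`. This is a minimal sketch, not the deck's actual Ansible playbook; the IP addresses, the `ubuntu` SSH user, and the function name `rke_nodes` are hypothetical.

```python
import json


def rke_nodes(addresses, user="ubuntu"):
    """Build an RKE-style `nodes` list.

    The first address gets the controlplane and etcd roles; every other
    address becomes a worker (one controller/etcd + N workers).
    """
    if len(addresses) < 2:
        raise ValueError("need at least one controller and one worker")
    nodes = []
    for i, addr in enumerate(addresses):
        role = ["controlplane", "etcd"] if i == 0 else ["worker"]
        nodes.append({"address": addr, "user": user, "role": role})
    return {"nodes": nodes}


if __name__ == "__main__":
    # Hypothetical addresses for the 4-node cluster.
    cluster = rke_nodes(["10.0.0.10", "10.0.0.11", "10.0.0.12", "10.0.0.13"])
    print(json.dumps(cluster, indent=2))
```

The output is printed as JSON, which YAML 1.2 parsers accept; converting it with a YAML library would give the more familiar block style for cluster.yml.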
2nd Step – Success! Rancher Deployed
3rd Step – Configure Storage
There are more details than we can show here due to time constraints. See our blog post at link.
4th Step – Configure Workloads
• Use Ansible, through Rancher, to provision pods that exercise the cluster
• Use Ansible, through Rancher, to provision other monitoring tools
• The workloads are then visible at the k8s master IP, which shows cluster and node health, a dashboard, and graphs of load and resource usage
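The deck does not say how the cluster-exercising pods were defined. One plausible shape is a single Kubernetes Deployment whose replica count sets the number of pods. The sketch below builds such a manifest as a Python dict; the name `cluster-exercise`, the pause image tag, and the resource requests are hypothetical. A small helper also estimates pods per worker, assuming pods spread evenly over the three workers only; kubelet's default `--max-pods` is 110, so any per-node figure above that requires raising the limit.

```python
import json
import math

KUBELET_DEFAULT_MAX_PODS = 110  # kubelet's default --max-pods value


def make_load_deployment(name, replicas, image="registry.k8s.io/pause:3.9"):
    """Return an apps/v1 Deployment manifest running `replicas` lightweight pods."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template's labels.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "resources": {"requests": {"cpu": "10m", "memory": "16Mi"}},
                        }
                    ]
                },
            },
        },
    }


def pods_per_worker(total_pods, workers):
    """Pods per worker node if `total_pods` are spread evenly (rounded up)."""
    return math.ceil(total_pods / workers)


if __name__ == "__main__":
    dep = make_load_deployment("cluster-exercise", 500)
    print(json.dumps(dep, indent=2))
    per_node = pods_per_worker(500, 3)
    print(per_node, "pods per worker; above kubelet default:",
          per_node > KUBELET_DEFAULT_MAX_PODS)
```

Scaling one Deployment up or down is simpler to drive from automation than creating hundreds of individual Pod objects, which is why we sketch it this way.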
4th Step – Success! Workloads Configured
Let’s Review – How Long Did It All Take?
OCP Case Study: a large-scale Sesame customer in the media & entertainment space
Figure: Fast-Start for Open Systems with 4 workload nodes (PoC platform), shown alongside a full rack with 27 workload nodes (production platform), with external TOR switches (2x), storage and infra nodes, and power zones AA and BB.
We Did This With Three Different Stacks
RANCHER: 20 minutes to full cluster readiness; 1,650 pods
TALOS: 31.5 minutes to full cluster readiness; 3,000 pods
Third stack: 500 pods*
(*) containers began to die, kubelet crashed at 550 pods
Conclusion
Not quite 10 minutes, as we had hoped… but 20 minutes is achievable (on a 4-node cluster), and the gaps to 10 minutes are clear (for both 4-node and 45-node clusters).
QUESTIONS?
Product/Facility Info
Fast-Start for Open Systems
Sesame for Open Systems
ITRenew Marketplace: https://www.itrenew.com/sesame-solutions
Call to Action CHECK US OUT AT THE OCP MARKETPLACE: https://www.opencompute.org/circular-economy/5/sesame-for-open-systems QUESTIONS OR COMMENTS, REACH US: @RiedelAtWork https://github.com/SesameEngineering
The Power of Hyperscale For All Fast-Start and Rack-Scale Solutions optimized for your workload, from deskside to data center. NO ASSEMBLY. NO GUESSWORK. JUST PLUG THEM IN. learn more at: www.ITRenew.com/Sesame