From Servers to Serverless

A presentation at OCP Virtual Summit in May 2020 by Erik Riedel

Slide 1

OCP Case Studies
From Servers to Serverless in Ten Minutes
Erik Riedel, PhD, Senior Vice President, Engineering, ITRenew

Slide 2

Slide 3

The Power of Hyperscale For All
Optimized for your workload, from deskside to data center. No assembly. No guesswork. Just plug them in.
  • PROVEN HYPERSCALE TECH
  • BUILT ON OPEN ARCHITECTURE
  • CONSISTENT PRODUCT
  • DEPENDABLE SUPPLY
  • BETTER-THAN-EVER TCO
  • FLEXIBLE, SCALABLE

Slide 4

Servers

Slide 5

Serverless

Slide 6

in 10 minutes…or so…

Slide 7

Project goal and genesis
  • OCP adoption could be accelerated by offering pre-designed and pre-qualified solutions for key computing use cases.
  • Many service provider & enterprise SaaS companies are looking for solutions to roll onto the floor, plug in, and quickly run workloads.
  • We are working with infrastructure software stacks and software partners to pre-design and pre-qualify solutions with OCP equipment.

Slide 8

Servers

Slide 9

Servers
Rack diagram: for Open Systems, up to 45 nodes
  • external TOR switches (2x)
  • ingress nodes
  • internal TOR switches (2x)
  • compute nodes: single or 2-socket nodes, 25 GbE connectivity
  • storage nodes: flash-based; millions of IOPS and terabytes of capacity
  • mgmt and infra nodes
  • two power zones (AA and BB)

Slide 10

Servers

Slide 11

Servers
Rack diagram, two configurations:
  • Fast-Start: up to 5 nodes (compute and infra nodes, with power supply + switch)
  • for Open Systems: up to 45 nodes (ingress, compute, storage, mgmt and infra nodes across power zones AA and BB, with 2x external and 2x internal TOR switches)
  • compute nodes: single or 2-socket nodes, 25 GbE connectivity
  • storage nodes: flash-based; millions of IOPS and terabytes of capacity

Slide 12

Servers

Slide 13

Prerequisites – Before You Start
Provisioning Server and Our Choices:
  • Linux: Ubuntu 19.04
  • DevOps: docker, Ansible
  • Automation: Rancher
Deployed Services:
  • DRP (PXE server)
  • DHCP (static IPs)
  • Digital Rebar
  • docker registry
  • Rancher

Slide 14

1st Step – PXE Entire Rack
  • IPMI power on all the discovered nodes for wipe, install, first boot
  • 03 min 51 sec: Power on -> Disk Erased
  • 07 min 34 sec: Reboot -> Linux Installed (mSATA device)
  • 04 min 02 sec: Reboot -> Boot from installed disk (*)
  • (*) including 2 min of PXE timeout and 30 sec of grub menu
  • caveats for investigation: in some instances, using IPMI chassis bootdev at the end of the install reverts UEFI to traditional BIOS mode and requires an additional reboot step, extending the time
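The three phase timings above can be added up to see how much of the budget the first step consumes. The sketch below only does that arithmetic; the variable names are illustrative, and treating the PXE timeout and grub menu as removable waits is an assumption, not something the slide claims.

```python
from datetime import timedelta

# Phase durations as reported on the slide.
PHASES = [
    ("Power on -> Disk Erased", timedelta(minutes=3, seconds=51)),
    ("Reboot -> Linux Installed", timedelta(minutes=7, seconds=34)),
    ("Reboot -> Boot from installed disk", timedelta(minutes=4, seconds=2)),
]

# Fixed waits the slide says are included: 2 min PXE timeout + 30 sec grub menu.
FIXED_WAITS = timedelta(minutes=2, seconds=30)


def fmt(duration: timedelta) -> str:
    """Render a duration in the slide's 'MM min SS sec' style."""
    minutes, seconds = divmod(int(duration.total_seconds()), 60)
    return f"{minutes:02d} min {seconds:02d} sec"


total = sum((d for _, d in PHASES), timedelta())
without_waits = total - FIXED_WAITS  # assumption: both waits could be tuned away

print("Step 1 total:       ", fmt(total))
print("Without fixed waits:", fmt(without_waits))
```

Under these figures the first step alone takes a little over 15 minutes, which is consistent with the talk's later conclusion that 10 minutes was not reached.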

Slide 15

Servers
Fast-Start diagram:
  • 4 compute nodes, each: CRIMSON, 2x 12c, 256GB, 10G
  • total: 96 cores, 1 TB memory
  • infra node: DRP (PXE server), DHCP (static IPs)
  • power supply + switch
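The totals on this slide follow from the per-node figures. A minimal sketch of that arithmetic, assuming "2x 12c" means two sockets of 12 cores each and that there are four such compute nodes (the `Node` class and its field names are illustrative):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    sockets: int
    cores_per_socket: int
    memory_gb: int

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket


# Assumed reading of "CRIMSON 2x 12c 256GB": 2 sockets x 12 cores, 256 GB RAM.
CRIMSON = Node(sockets=2, cores_per_socket=12, memory_gb=256)
compute_nodes = [CRIMSON] * 4

total_cores = sum(node.cores for node in compute_nodes)
total_memory_gb = sum(node.memory_gb for node in compute_nodes)

print(f"{total_cores} cores, {total_memory_gb / 1024:g} TB memory")
```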

Slide 16

1st Step – Success! OSes Deployed

Slide 17

2nd Step – Deploy Rancher
  • Deploy provisioning server via Ansible
  • Provisioning server uses Ansible to deploy RKE k8s to all nodes
  • Provisioning server deploys Rancher and prerequisites into the deployed k8s (*)
  • (*) we use one controller/etcd + three workers in a 4-node cluster
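To make the one controller/etcd plus three workers layout concrete, here is a minimal sketch that renders the `nodes` section of an RKE `cluster.yml`. The IP addresses, the SSH user, and the helper names are hypothetical, and the talk does not show the actual configuration its Ansible playbooks generate.

```python
def rke_nodes(controller: str, workers: list[str], user: str = "ubuntu") -> list[dict]:
    """Build RKE node entries: one controlplane/etcd node plus worker nodes."""
    nodes = [{"address": controller, "user": user, "role": ["controlplane", "etcd"]}]
    nodes += [{"address": w, "user": user, "role": ["worker"]} for w in workers]
    return nodes


def to_cluster_yml(nodes: list[dict]) -> str:
    """Render only the 'nodes' section of cluster.yml as YAML text."""
    lines = ["nodes:"]
    for node in nodes:
        lines.append(f"  - address: {node['address']}")
        lines.append(f"    user: {node['user']}")
        lines.append(f"    role: [{', '.join(node['role'])}]")
    return "\n".join(lines) + "\n"


# Hypothetical static IPs for the 4-node cluster.
nodes = rke_nodes("10.0.0.11", ["10.0.0.12", "10.0.0.13", "10.0.0.14"])
cluster_yml = to_cluster_yml(nodes)
print(cluster_yml)
```

A file like this is what `rke up --config cluster.yml` consumes; in the setup described on the slide, that step is driven by Ansible from the provisioning server.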

Slide 18

2nd Step – Success! – Rancher Deployed

Slide 19

3rd Step – Configure Storage
More details than can be shown here due to time constraints. See our blog post at link

Slide 20

4th Step – Configure Workloads
  • Ansible to Rancher to provision pods to exercise cluster
  • Ansible to Rancher to provision other monitoring tools
  • Then workloads are visible at the k8s master IP, which shows cluster and node health + dashboard + graphs of loads & resource usage
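As an illustration of the kind of object that "pods to exercise the cluster" could correspond to, the sketch below builds a Kubernetes Deployment manifest for idle sleep containers. The deployment name, image tag, and replica count are assumptions; the talk does not show the manifests its Ansible and Rancher tooling actually applied.

```python
import json


def sleep_deployment(name: str, replicas: int, image: str = "busybox:1.36") -> dict:
    """Return an apps/v1 Deployment whose pods just run `sleep`."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": "sleep", "image": image, "command": ["sleep", "3600"]}
                    ]
                },
            },
        },
    }


# Hypothetical name and replica count, chosen only for the example.
manifest = sleep_deployment("exercise-sleep", replicas=100)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Kubernetes accepts JSON manifests as well as YAML, so the printed output could be saved to a file and applied with `kubectl apply -f`. Scaling the `replicas` field up is one simple way to load a cluster with many pods.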

Slide 21

4th Step – Success! Workloads Configured

Slide 22

Let’s Review – How Long Did It All Take?

Slide 23

OCP Case Studies
Case Study: large-scale Sesame customer in the media & entertainment space
  • Fast-Start: 4 workload nodes (PoC platform), with infra node and power supply + switch
  • for Open Systems: 27 workload nodes (production platform), with compute, storage and infra nodes across power zones AA and BB and external TOR switches (2x)

Slide 24

We Did This With Three Different Stacks

RANCHER: 20 minutes to full cluster readiness
  • sleep container: 1,650 pods
  • nginx container: 500 pods* (docker errors started at 507 pods, crashed at 600 pods)

KSPHERE: 60 minutes to full cluster readiness
  • sleep container: 1,800 pods
  • nginx container: 500 pods* (containers began to die, kubelet crashed at 550 pods)

TALOS: 31.5 minutes to full cluster readiness
  • sleep container: 3,000 pods
  • nginx container: 500 pods* (health checks began to fail at 500 pods)

** note that all these tests were done after overriding the default 110 maximum pods per node, as set by Kubernetes
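The note about overriding the default of 110 pods per node can be made concrete with a small calculation. The sketch below assumes the pod counts are cluster-wide totals spread evenly over three worker nodes, as in the 4-node Rancher layout. The slide confirms neither assumption, and the helper names and the 10% headroom are illustrative. `maxPods` is the kubelet configuration field that controls this limit.

```python
DEFAULT_MAX_PODS = 110  # Kubernetes default maximum pods per node


def required_max_pods(target_pods: int, worker_nodes: int, headroom_pct: int = 10) -> int:
    """Per-node pod limit needed for target_pods spread evenly, plus headroom."""
    per_node = -(-target_pods // worker_nodes)  # integer ceiling division
    return (per_node * (100 + headroom_pct) + 99) // 100  # ceiling, integer-only


def kubelet_config(max_pods: int) -> dict:
    """Minimal KubeletConfiguration fragment that raises the per-node pod limit."""
    return {
        "apiVersion": "kubelet.config.k8s.io/v1beta1",
        "kind": "KubeletConfiguration",
        "maxPods": max_pods,
    }


# Assumption: 1,650 sleep pods in total, scheduled across 3 workers.
limit = required_max_pods(1650, worker_nodes=3)
print(f"needs maxPods >= {limit} (default is {DEFAULT_MAX_PODS})")
print(kubelet_config(limit))
```

How each stack exposes this setting differs, so the fragment shows only the underlying kubelet field, not the way Rancher, KSPHERE, or Talos were configured in these tests.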

Slide 25

Conclusion
Not quite 10 minutes, as we had hoped, but 20 minutes is achievable (on a 4-node cluster), and the gaps to 10 minutes are clear (for both 4-node and 45-node clusters).

Slide 26

QUESTIONS?

Slide 27

Product/Facility Info
  • Fast-Start for Open Systems
  • Sesame for Open Systems
  • ITRenew Marketplace: https://www.itrenew.com/sesame-solutions

Slide 28

Call to Action
  • CHECK US OUT AT THE OCP MARKETPLACE: https://www.opencompute.org/circular-economy/5/sesame-for-open-systems
  • QUESTIONS OR COMMENTS, REACH US: @RiedelAtWork, https://github.com/SesameEngineering

Slide 29

The Power of Hyperscale For All Fast-Start and Rack-Scale Solutions optimized for your workload, from deskside to data center. NO ASSEMBLY. NO GUESSWORK. JUST PLUG THEM IN. learn more at: www.ITRenew.com/Sesame

Slide 30
