Scaling Down While Scaling Up

A presentation at OCP Regional Summit in April 2024 in Lisbon, Portugal by Erik Riedel

Slide 1

Scaling Down While Scaling Up – Design Choices That Increase Efficiency & Performance

Slide 2

EMEA • DEPLOYMENTS • SUSTAINABILITY

Scaling Down While Scaling Up – Design Choices That Increase Efficiency & Performance

Erik Riedel, PhD, Chief Engineering Officer, Flax Computing
Hesham Ghoneim, PhD, Strategic Product Development, Sims Lifecycle Services

Slide 3

Abstract

This talk presents a detailed design study of several OCP hyperscale systems vs previous server & rack designs, including the trade-offs & choices that reduce complexity, reduce costs, increase performance, and increase scalability. We will also quantify a clear side-effect of these choices: a reduced carbon footprint for data center deployments & operations. These carbon savings arise from both operational & scope 3 benefits, and they continue to accrue the longer systems are productively utilized. The power of collaborative, “out of the box” design approaches that cross traditional industry silos – across hardware, software, & operations – has enabled innovative designs that have proven their value and influence across the industry. Because these designs were pursued with simplicity and efficiency in mind, they also inherently consume fewer resources – energy, materials, and operational effort – paying dividends in both the short run and the long run.

Slide 4

Expected Learnings

Talk attendees will learn about the details of several OCP server, storage & GPU designs that rethink the traditional data center server to provide clear benefits in reduced complexity, reduced costs, increased performance, & increased efficiency at scale. By reviewing a small subset of open hardware designs from the 10-year history of the Open Compute Project, we will outline the clear engineering trade-offs that were made and the resulting gains in efficiency & scalable deployment. Attendees will also learn how these optimizations directly reduce carbon footprints – both energy & materials consumption – when combined with modern data center infrastructure, deployment, & software architectures. The combination of creative hardware designs, software architectures, & operational approaches, developed in wide industry collaborations where traditionally siloed teams and companies operate together, has turned these individual evolutions into a true revolution.

Slide 5

Social (280 characters)

Scaling Down While Scaling Up – learn about OCP designs that reduce complexity, costs & carbon footprints, while increasing performance & efficiency. By innovating across industry silos, these collaborations have made a series of design evolutions into a true revolution.

Slide 6

Outline
• Intro – carbon footprint of computing
• Design Footprints – compare OCP to traditional server designs
• Materials Footprints – material + carbon impacts
• Materials Reuse / Recycle – extend life, reduce carbon impacts
• Server Density – can we improve further?
• Workload / Carbon – example: memory density
• Conclusions + Call To Action

Slide 7

Slide 8

• natural resources
• carbon footprint
• demand growth

Slide 9


design footprints

Slide 10

Slide 11

PowerEdge R630 – 2 servers / 2U
- dual CPUs
- 24x memory sockets
- two 1U heatsinks
- twelve 1U fans
- eight SATA drives 2.5”
- two PSUs

OCP Leopard – 3 servers / 2 OU
- dual CPUs
- 16x memory sockets
- two 2U heatsinks
- two 2U fans
- six NVMe drives M.2
- shared PSUs

Slide 12

Dell R630

  component     per server   per rack
  cpus               2           80
  dimms             24          960
  heatsinks          2           80
  fans              12          480
  2.5” SATA          8          320
  psus               2           80

• 2 servers / 2U
• dual CPUs
• 24x memory sockets
• two 1U heatsinks
• twelve 1U fans
• eight SATA drives 2.5”
• two PSUs
• 40 servers / rack

Slide 13

OCP Leopard/v2

  component     per server   per rack
  cpus               2           96
  dimms             16          768
  heatsinks          2           96
  fans               2           96
  M.2 NVMe           6          288
  psus               —            6

• 3 servers / 2 OU
• dual CPUs
• 16x memory sockets
• two 2U heatsinks
• two 2U fans
• six NVMe drives M.2
• shared PSUs
• 48 servers / rack
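To make the scaling explicit, here is a minimal illustrative sketch in Python (ours, not from the deck) that derives the per-rack component counts above from the per-server counts and the stated servers per rack (40 for the R630, 48 for the Leopard); the shared Leopard PSUs live at the rack level and are therefore left out of the per-server dictionary.

# Illustrative sketch only; names and structure are ours, counts are from the slides.
r630_per_server = {"cpus": 2, "dimms": 24, "heatsinks": 2, "fans": 12, "2.5in SATA": 8, "psus": 2}
leopard_per_server = {"cpus": 2, "dimms": 16, "heatsinks": 2, "fans": 2, "M.2 NVMe": 6}

def per_rack(per_server, servers_per_rack):
    # per-rack count = per-server count x number of servers in the rack
    return {part: count * servers_per_rack for part, count in per_server.items()}

print(per_rack(r630_per_server, 40))     # cpus: 80, dimms: 960, fans: 480, ...
print(per_rack(leopard_per_server, 48))  # cpus: 96, dimms: 768, M.2 NVMe: 288, ...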

Slide 14


materials footprints

Slide 15

Slide 16

Dell R630 rack – materials

  component             per server   unit weight   per rack   rack total (kg)
  cpus                       2          54 g           80           4.3
  dimms                     24          10 g          960           9.6
  heatsinks                  2         165 g           80          13.2
  motherboard                1         875 g           40          35.0
  fans                      12          20 g          480           9.6
  2.5” SATA                  8          90 g          320          28.8
  2.5” drive caddy           8          54 g          320          17.3
  NIC 4x 10G                 1         320 g           40          12.8
  PCI riser (NIC)            1          90 g           40           3.6
  PCI riser (unused)         1         130 g           40           5.2
  psus                       2         565 g           80          45.2
  chassis                    1       17.6 kg           40         704.0
  pdu                        —         35 kg            1          35.0
  power cables               2         120 g           80           9.6
  rack                       —        145 kg            1         145.0
  TOTAL                                                           1078 kg

Slide 17


OCP Leopard/v2 rack – materials

  component              per server   unit weight   per rack   rack total (kg)
  cpus                        2          54 g            96           5.2
  dimms                      16          10 g           768           7.7
  heatsinks                   2         185 g            96          17.8
  motherboard                 1         325 g            48          15.6
  fans                        2          30 g            96           2.9
  M.2 NVMe                    6          75 g           288          21.6
  AVA board                   2         307 g            96          29.5
  NIC 2x 25G mezz             1         154 g            48           7.4
  PCI riser (x16 + x8)        1          92 g            48           4.4
  —                           —           0 g            48           0.0
  psus                        —         5.5 kg            6          33.0
  chassis                     1         7.5 kg           48         360.0
  power shelf                 —          25 kg            1          25.0
  busbar                      —          15 kg            1          15.0
  rack                        —         175 kg            1         175.0
  TOTAL                                                              720 kg

Slide 18


(Same Dell R630 materials table as Slide 16)

1,120 cores per rack, 1078 kg per rack
→ just over 1 core per kg

Slide 19


(Same Dell R630 materials table as Slide 16)

cpus: 0.4% of rack mass; dimms: 0.9%
→ only 1.3% of the rack is the “computing” elements
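A minimal worked calculation (Python; ours, not from the deck) showing how the cores-per-kg and “computing fraction” figures on the last two slides follow from the Slide 16 table:

# Dell R630 rack figures quoted on Slides 16, 18, and 19 (illustrative check only).
cores_per_rack = 1_120          # 80 CPUs per rack, i.e. 14 cores per CPU
rack_mass_kg = 1_078
cpu_mass_kg, dimm_mass_kg = 4.3, 9.6

print(cores_per_rack / rack_mass_kg)                      # ~1.04 -> "just over 1 core per kg"
print((cpu_mass_kg + dimm_mass_kg) / rack_mass_kg * 100)  # ~1.3% of rack mass is CPUs + DIMMs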

Slide 20


(Same OCP Leopard/v2 materials table as Slide 17)

1,344 cores per rack, 720 kg per rack
→ almost 2 cores per kg

Slide 21


(Same OCP Leopard/v2 materials table as Slide 17)

cpus: 0.7% of rack mass; dimms: 1.1%
→ only 1.8% of the rack is the “computing” elements

Slide 22


(Same OCP Leopard/v2 materials table as Slide 17)

cpus: 0.7% of rack mass; dimms: 1.1%
→ only 1.8% of the rack is the “computing” elements
→ 3.0% to 4.8% with storage elements included
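The Leopard figures follow from the same arithmetic; a small illustrative sketch (ours, not from the deck), using the Slide 17 masses:

# OCP Leopard/v2 rack figures quoted on Slides 17 and 20-22 (illustrative check only).
cores_per_rack = 1_344          # 96 CPUs per rack, i.e. 14 cores per CPU
rack_mass_kg = 720
cpu_kg, dimm_kg, nvme_kg = 5.2, 7.7, 21.6

print(cores_per_rack / rack_mass_kg)                       # ~1.87 -> "almost 2 cores per kg"
print((cpu_kg + dimm_kg) / rack_mass_kg * 100)             # ~1.8% "computing" elements
print((cpu_kg + dimm_kg + nvme_kg) / rack_mass_kg * 100)   # ~4.8% with storage included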

Slide 23

Example – Fans
• twelve 1U fans
• two 2U fans

Slide 24

Example – Fans
• twelve 1U fans → 24 fans / 2U
• two 2U fans → 6 fans / 2OU

Slide 25


…but wait, there’s more… (even denser server options)

Slide 26

PowerEdge FC830 – 2 servers / 2U (8 sockets, 96 dimms)
- quad CPUs
- 48x memory sockets
- four 1U heatsinks
- twenty 1U fans
- eight SAS/SATA drives 2.5”
- two PSUs

OCP Leopard – 3 servers / 2 OU (6 sockets, 48 dimms)
- dual CPUs
- 16x memory sockets
- two 2U heatsinks
- two 2U fans
- six NVMe drives M.2
- shared PSUs

Slide 27

PowerEdge FC830 rack – materials

  component             per server   unit weight   per rack   rack total (kg)
  cpus                       4          54 g          160           8.6
  dimms                     48          10 g         1920          19.2
  heatsinks                  4         165 g          160          26.4
  motherboard                1         875 g           40          35.0
  fans                      12          20 g          480           9.6
  2.5” SATA                  8          90 g          320          28.8
  2.5” drive caddy           8          54 g          320          17.3
  NIC 4x 10G                 1         185 g           40           7.4
  PCI riser (NIC)            1          65 g           40           2.6
  PCI riser (unused)         3          65 g          120           7.8
  psus                       1         565 g           40          22.6
  chassis                    1       18.4 kg           40         736.0
  pdu                        —         35 kg            1          35.0
  power cables               1         120 g           40           4.8
  rack                       —        145 kg            1         145.0
  TOTAL                                                           1106 kg

3,520 cores per rack
→ over 3 cores per kg

Slide 28


materials reuse

Slide 29

Slide 30

(Same OCP Leopard/v2 materials table as Slide 17)

Slide 31


REUSE (MEMORY COMPONENTS)

Slide 32

cpus & dimms (from the Slide 17 table):

  component   per server   unit weight   per rack   rack total (kg)
  cpus             2          54 g            96          5.2      → 1,344 cores
  dimms           16          10 g           768          7.7      → 12 TB memory

• socketed CPUs – fit any LGA2011 Haswell/Broadwell server or workstation socket
• memory modules – fit any DDR4 server ECC socket

Slide 33

So why does it matter?
1. Internal / external reuse
2. Economics – market value
3. Avoided emissions per rack

DIMM lifecycle: DIMM manufacturer → in use, typically 3–5 years → server decommissioning → harvested, re-tested, re-programmed DIMM

kg CO2 emissions avoided, as compared to manufacturing new units:
• Rack recycle: 1,471
• Reuse CPUs, DIMMs, recycle rest: 23,198

(other figure labels on the slide: one tonne recycled ferrous; harvested, re-tested, re-programmed DIMM; five new DIMMs)

Slide 34

Automation is key for reuse of components
• automated dismantling / removal from rack

Slide 35

Automation is key for reuse of components
• automate DIMM removal, code scanning and boxing

Slide 36

Automation is key for reuse of components
• automated heat sink and CPU removal

Slide 37


REUSE SHRED (STORAGE COMPONENTS)

Slide 38

  component   per server   unit weight   per rack   rack total (kg)
  M.2 NVMe         6          75 g           288         21.6

kg CO2 emissions avoided:
• Rack recycle: 1,471
• Reuse CPUs, DIMMs, recycle rest: 23,198
• Reuse SSDs, CPUs, DIMMs, recycle rest: 49,740

• flash drives fit any M.2/NVMe server, workstation, or desktop
• but… usually need to be shredded

Slide 39


RECYCLE (METAL COMPONENTS)

Slide 40

metal extracted for direct recycling

  component    per server   unit weight   per rack   rack total (kg)
  heatsinks         2          185 g           96          17.8
  chassis           1          7.5 kg          48         360.0
  busbar            —           15 kg           1          15.0
  rack              —          175 kg           1         175.0

kg CO2 emissions avoided:
• Entire rack recycle: 1,471
• Rack recycle excl. computing: 788

Slide 41


RECYCLE REUSE (SYSTEMS COMPONENTS)

Slide 42

  component              per server   unit weight   per rack   rack total (kg)
  motherboard                 1          325 g            48          15.6
  fans                        2           30 g            96           2.9
  AVA board                   2          307 g            96          29.5
  NIC 2x 25G mezz             1          154 g            48           7.4
  PCI riser (x16 + x8)        1           92 g            48           4.4

• difficult materials extraction
• extract precious metals such as Au, Ag, Pd, and Cu from PCBs

Slide 43

…can we do better? (more reuse, repurpose, redeploy)

kg CO2 emissions avoided:
• Entire rack recycle: 1,471
• Reuse CPUs, DIMMs, recycle rest: 23,198
• Reuse SSDs, CPUs, DIMMs, recycle rest: 49,740
• Reuse all 48 servers: 60,406
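As a quick sanity check on the relative sizes, a small illustrative script (ours, not part of the deck) comparing the quoted scenarios against recycling alone:

# Avoided-emissions figures (kg CO2) per Leopard rack, as quoted on this slide.
scenarios = {
    "entire rack recycle": 1_471,
    "reuse CPUs + DIMMs, recycle rest": 23_198,
    "reuse SSDs + CPUs + DIMMs, recycle rest": 49_740,
    "reuse all 48 servers": 60_406,
}
baseline = scenarios["entire rack recycle"]
for name, avoided in scenarios.items():
    print(f"{name}: {avoided:,} kg CO2 avoided ({avoided / baseline:.0f}x recycle-only)")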

Slide 44

Slide 45

…additional details… (higher density design optimization)

Slide 46

1OU Tioga Pass
• OCP Tioga Pass
• in 1 OU form factor
• low-profile heat sinks
• low-profile 4x NVMe M.2
• dual SkyLake-SP, CascadeLake-SP
• 768GB memory (12x 64GB)
• 15 TB NVMe (4x 3.84TB)
• dual 25G net + host mgmt
• optional dual 100G net

Slide 47

PowerEdge FC830 – 2 servers / 2U (8 sockets, 96 dimms)
- quad CPUs
- 48x memory sockets
- four 1U heatsinks
- twenty 1U fans
- eight SAS/SATA drives 2.5”
- two PSUs

OCP Tioga Pass – 3 servers / 1 OU (12 sockets, 72 dimms per 2 OU)
- dual CPUs
- 12x memory sockets
- two 2U heatsinks
- two 2U fans
- eight NVMe drives M.2
- shared PSUs

Slide 48

OCP Tioga Pass rack – materials

  component              per server   unit weight   per rack   rack total (kg)
  cpus                        2          54 g           192          10.4
  dimms                      16          10 g          1536          15.4
  heatsinks                   2         185 g           192          35.5
  motherboard                 1         325 g            96          31.2
  fans                        2          30 g           192           5.8
  M.2 NVMe                    6          75 g           576          43.2
  AVA board                   2         307 g           192          58.9
  NIC 2x 25G mezz             1         154 g            96          14.8
  PCI riser (x16 + x8)        1          92 g            96           8.8
  —                           —           0 g            48           0.0
  psus                        —         5.5 kg           10          55.0
  chassis                     1         7.5 kg           96         720.0
  power shelf                 —          25 kg            1          25.0
  busbar                      —          15 kg            1          15.0
  rack                        —         175 kg            1         175.0
  TOTAL                                                             1215 kg

3,840 cores per rack
→ over 3 cores per kg
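Pulling the four rack studies together, here is a small illustrative calculation (ours, not from the deck) of the cores-per-kg figures quoted on Slides 18, 20, 27, and 48:

# Cores per rack and total rack mass (kg) as quoted in the preceding slides.
racks = {
    "Dell R630":      (1_120, 1_078),
    "OCP Leopard/v2": (1_344,   720),
    "Dell FC830":     (3_520, 1_106),
    "OCP Tioga Pass": (3_840, 1_215),
}
for name, (cores, kg) in racks.items():
    print(f"{name}: {cores / kg:.2f} cores per kg")
# roughly 1.0, 1.9, 3.2, and 3.2 cores per kg, respectively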

Slide 49


workload / carbon

Slide 50

Slide 51

Example – DIMMs
• Dell R630 – 24x DDR4 modules, 2x CPU Broadwell
• OCP Leopard/v2 – 16x DDR4 modules, 2x CPU Broadwell
• OCP Tioga Pass – 12x DDR4 modules, 2x CPU SkyLake
• Dell FC830 – 32x DDR4 modules, 4x CPU Broadwell

Slide 52

Examples – DIMMs by Size (GB per server)

  DIMM size     R630   Leopard   Tioga Pass   FC830
  8 GB DDR4      192      128         96        256
  16 GB DDR4     384      256        192        512
  32 GB DDR4     768      512        384       1024
  64 GB DDR4    1536     1024        768       2048

Slide 53

Examples – DIMMs by Size (GB per rack)

  DIMM size       R630    Leopard   Tioga Pass    FC830
  8 GB DDR4      7,680      6,144       9,216     10,240
  16 GB DDR4    15,360     12,288      18,432     20,480
  32 GB DDR4    30,720     24,576      36,864     40,960
  64 GB DDR4    61,440     49,152      73,728     81,920
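Both DIMM tables follow directly from the DIMM counts on Slide 51 and the servers per rack quoted earlier (40 for the R630, 48 for the Leopard, 96 for the Tioga Pass, 40 for the FC830); a minimal illustrative sketch (ours, not from the deck):

# Reproduce the per-server and per-rack memory-capacity tables above.
configs = {               # (DIMMs per server, servers per rack)
    "R630":       (24, 40),
    "Leopard":    (16, 48),
    "Tioga Pass": (12, 96),
    "FC830":      (32, 40),
}
for dimm_gb in (8, 16, 32, 64):
    for name, (dimms, servers) in configs.items():
        per_server = dimms * dimm_gb
        per_rack = per_server * servers
        print(f"{dimm_gb} GB DDR4, {name}: {per_server} GB/server, {per_rack:,} GB/rack")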

Slide 54


See It For Yourself

Slide 55

Slide 56

Slide 57

Call to Action
• Reach out to us to get involved
• Engage us to evaluate / quantify your server carbon footprints
  - www.flaxcomputing.com
  - www.simslifecycle.com
• Evaluate your own servers, share the results with us: report @ flaxcomputing.com
• Contribute measurements and component details (reuse stories, workload information, etc.) and share what components you would be most interested in: data @ flaxcomputing.com / sls.sustainability@simsmm.com

Slide 58


Thank you!