When Bad Things Happen To Good Disks - aka Disks Don’t Have File Descriptors

A presentation at LinuxCon + CloudOpen + ContainerCon North America 2015 in Seattle, WA, USA by Erik Riedel

This talk outlines some of the complexity challenges faced by devs (at their desks) and ops personnel (in the data centers, 6 months later) when trying to design for, and then diagnose, a widely distributed storage system subject to the slings & arrows of outrageous fortune. A modestly sized system with 50 disks per node and 500 nodes has 25,000 disk drives; 30,000 file systems (when everything is working fine); 100 billion files; 1 million open file descriptors (when fine); and 10 million log messages per hour (when fine; 1 billion when not). The layering in the Linux storage stack (sata, sas, ses, sg, sd, dm, lvm, fs, etc.) is great when trying to find a creative solution for a single-node storage setup, but can be a real pain when trying to diagnose what is going wrong at these scales. We’ll outline how we’ve attacked the problem so far, and where we still feel the pain daily.
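
To make the scale figures above a little more concrete, here is a rough back-of-the-envelope sketch in Python. The input numbers are the ones quoted in the abstract; the per-node and per-disk averages are simple divisions derived from them, not figures from the talk, and the function and key names are made up for illustration.

```python
# Figures quoted in the abstract.
DISKS_PER_NODE = 50
NODES = 500
FILE_SYSTEMS = 30_000              # when everything is working fine
FILES = 100_000_000_000            # 100 billion
OPEN_FDS = 1_000_000               # when fine
LOGS_PER_HOUR_OK = 10_000_000      # when fine
LOGS_PER_HOUR_BAD = 1_000_000_000  # when not


def scale_summary():
    """Derive simple averages from the quoted totals (illustrative only)."""
    disks = DISKS_PER_NODE * NODES
    return {
        "disks": disks,
        "file_systems_per_node": FILE_SYSTEMS // NODES,
        "files_per_disk": FILES // disks,
        "open_fds_per_node": OPEN_FDS // NODES,
        "logs_per_second_ok": LOGS_PER_HOUR_OK / 3600,
        "log_blowup_factor": LOGS_PER_HOUR_BAD // LOGS_PER_HOUR_OK,
    }


if __name__ == "__main__":
    for name, value in scale_summary().items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions the averages work out to about 4 million files per disk, 60 file systems and 2,000 open file descriptors per node, and a hundredfold jump in log volume when things go wrong.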