<div dir="ltr"><div>Thread out.</div><div><br></div><div>Kind regards<br><br>Paul Wilkins</div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 22 Aug 2019 at 14:12, Matthew Moyle-Croft <<a href="mailto:mmc@mmc.com.au">mmc@mmc.com.au</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I preface this by saying - “I recommend that all my competitors spend their engineering time and effort on solving for this problem”.<br>
<br>
> 3 - If we figure a drive is good for 1M restarts, then you'd expect precession to cause 0.2% of disks to fail over a 5 year lifespan<br>
<br>
So, let’s look at the Backblaze numbers (<a href="https://www.backblaze.com/blog/2018-hard-drive-failure-rates/" rel="noreferrer" target="_blank">https://www.backblaze.com/blog/2018-hard-drive-failure-rates/</a>). Their DCs, in LA and AZ, are roughly as far north as Adelaide and Sydney are south. <br>
<br>
They see a 1.27% annual failure rate across their fleet. Looking at their numbers, the variation is driven far more by drive model than by anything else. If you’re claiming 0.2% over FIVE years (i.e. not annually but across five years), then, as before, this isn’t going to have a significant impact on the fleet. It’s also worth noting that they don’t seem to distinguish controller failures from mechanical failures. <br>
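To put the two figures on the same footing, here is a rough Python sketch. It assumes the 0.2% is a cumulative probability over a five-year life with failures spread evenly across those years, converts it to a per-year rate, and compares that with the 1.27% fleet AFR from the Backblaze post.<br>

```python
# Compare the claimed precession-induced failure rate with the fleet AFR.
# Assumption: the 0.2% is a cumulative probability over a 5-year life and
# failures are spread evenly, so the per-year rate is 1 - (1 - p) ** (1 / years).

FLEET_AFR = 0.0127        # 1.27% per year, from the Backblaze 2018 post
PRECESSION_5YR = 0.002    # 0.2% over 5 years, from the quoted claim
LIFESPAN_YEARS = 5


def annualize(cumulative: float, years: int) -> float:
    """Convert a cumulative failure probability over `years` into a per-year rate."""
    return 1 - (1 - cumulative) ** (1 / years)


extra_afr = annualize(PRECESSION_5YR, LIFESPAN_YEARS)
print(f"precession AFR: {extra_afr:.4%}")
print(f"relative to fleet AFR: {extra_afr / FLEET_AFR:.1%}")
```

Under those assumptions, precession adds roughly 0.04% per year, about 3% on top of the fleet’s existing failure rate, which is small next to the model-to-model variation in the Backblaze numbers.<br>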
<br>
An extra 40 drives per year, against 4455 failures per annum. 40 drives per annum is around US$8k ($200/drive at scale). If you said to Backblaze, “I’d like to save you $8k, but you’ve got to spend a few million to realign the racks”, well, it’s not going to be a business case I’d defend. (Most of the cost would be rebuilding the DCs so that the cooling, cabling, hot-aisle containment, etc. were all arranged properly.)<br>
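A back-of-envelope Python sketch of the cost side. The fleet size and retrofit cost are hypothetical: 100,000 drives is roughly what “an extra 40 drives per year” implies at 0.2% over five years, and US$3M stands in for “a few million”. Only the $200/drive figure comes from the text above.<br>

```python
# Back-of-envelope business case for realigning racks to avoid precession failures.
# Hypothetical inputs: a ~100,000-drive fleet (implied by "40 extra drives per year"
# at 0.2% over 5 years) and USD 3M as a stand-in for "a few million".

FLEET_SIZE = 100_000             # hypothetical
PRECESSION_5YR = 0.002           # 0.2% over 5 years
LIFESPAN_YEARS = 5
COST_PER_DRIVE_USD = 200         # "$200/drive at scale"
RETROFIT_COST_USD = 3_000_000    # hypothetical


def extra_failures_per_year(fleet: int) -> float:
    """Extra drive failures per year, spreading the 5-year figure evenly."""
    return fleet * PRECESSION_5YR / LIFESPAN_YEARS


def payback_years(fleet: int) -> float:
    """Years of avoided drive replacements needed to pay for the retrofit."""
    annual_saving = extra_failures_per_year(fleet) * COST_PER_DRIVE_USD
    return RETROFIT_COST_USD / annual_saving


print(round(extra_failures_per_year(FLEET_SIZE)))                        # 40
print(round(extra_failures_per_year(FLEET_SIZE) * COST_PER_DRIVE_USD))   # 8000
print(round(payback_years(FLEET_SIZE)))                                  # 375
```

At those assumed figures, the retrofit would take centuries of avoided drive replacements to pay for itself.<br>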
<br>
At 0.2% across five years, you’d need more than 1,000 drives before this is worth caring about (1,000 drives means 2 extra failed drives over five years).<br>
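Whether 2 extra failures would even be visible depends on the normal failure noise. A rough Python sketch, assuming baseline failures are Poisson-distributed at the 1.27% AFR (so the standard deviation is the square root of the expected count) and precession adds 0.2% of the fleet over the same five years; the fleet sizes are illustrative.<br>

```python
import math

# Is the precession effect distinguishable from ordinary failures over 5 years?
# Assumptions: baseline failures are Poisson at the 1.27% AFR, and precession
# adds 0.2% of the fleet over the same 5-year window.

FLEET_AFR = 0.0127
PRECESSION_5YR = 0.002
LIFESPAN_YEARS = 5


def compare(fleet: int) -> tuple[float, float, float]:
    """Return (extra failures, expected baseline failures, baseline std dev) over 5 years."""
    extra = fleet * PRECESSION_5YR
    baseline = fleet * FLEET_AFR * LIFESPAN_YEARS
    noise = math.sqrt(baseline)  # Poisson: the variance equals the mean
    return extra, baseline, noise


for fleet in (1_000, 10_000, 100_000):  # illustrative fleet sizes
    extra, baseline, noise = compare(fleet)
    print(f"{fleet:7d} drives: {extra:6.0f} extra vs {baseline:7.0f} baseline (+/- {noise:.0f})")
```

At 1,000 drives the 2 extra failures sit well inside the roughly ±8 spread around the ~64 expected baseline failures. Poisson noise is also a lower bound; the model-to-model variation noted above widens the real spread further.<br>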
<br>
> 4 - Whether this shows up in MTBF depends on measurement techniques, and whether the effect is above the random noise<br>
<br>
See above.<br>
<br>
Feel free to point out where I’ve butchered the probability, etc., but I doubt it makes the business case any “better”. <br>
<br>
MMC</blockquote></div>