RAID: Not A Perfect Panacea
Continued from ‘Storage Impermanence: Now It’s Personal’…
At this point, I did not immediately initiate a RAID rebuild. Why? To answer this question, I’m going to finally get a to-write-about topic off my list. Last year’s USENIX Conference on FAST (File and Storage Technologies) showcased two particularly notable papers, one from Carnegie Mellon University and the other from Google, whose compelling findings included:
- Expensive ‘enterprise’ drives don’t have notably better reliability than their ‘consumer’ counterparts (consider this conclusion in the context of my past recommendation of Western Digital 10,000 RPM Raptor SATA HDDs as a credible alternative to other manufacturers’ much more costly SAS drives)
- S.M.A.R.T. error reporting only encompasses a fraction of all experienced HDD failure mechanisms, and, specifically to this writeup’s theme,
- RAID 1 and 5 are less robust than might appear to be the case at first glance…particularly when (as in my case…ahem) all of the drives in the RAID array come from the same manufacturer, and especially when they come from the same manufacturing lot. If one drive fails, the likelihood that a second drive will fail shortly thereafter is uncomfortably…likely.
I therefore needed to get the information off the NAS as quickly as possible. RAID rebuild sessions span the entire array…the RAID controller has no knowledge, nor does it care, whether a given HDD sector contains a portion or the entirety of a valid file, a deleted file, or not-yet-written space. My TeraStation was only partially full. The last thing I wanted to happen was for another drive to croak in the midst of a full RAID rebuild.
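To put rough numbers on that reasoning, here is a minimal back-of-the-envelope sketch. The drive capacity and fill level are hypothetical placeholders (not my TeraStation's actual figures), and the function name is made up for illustration. It also assumes that stored data is spread evenly across the drives, which is only approximately true of a real array.

```python
def gb_read_per_surviving_drive(drive_capacity_gb, fill_fraction):
    """Estimate how much each surviving drive must read in two scenarios.

    Returns (rebuild_gb, copy_gb):
      rebuild_gb - a rebuild walks every sector, used or not
      copy_gb    - a file-level copy only touches sectors holding live data
                   (assumes data is spread evenly across the drives)
    """
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must be between 0 and 1")
    rebuild_gb = drive_capacity_gb
    copy_gb = drive_capacity_gb * fill_fraction
    return rebuild_gb, copy_gb


if __name__ == "__main__":
    # Hypothetical example: 250 GByte drives, array 40% full.
    rebuild_gb, copy_gb = gb_read_per_surviving_drive(250, 0.4)
    print(f"Rebuild reads about {rebuild_gb:.0f} GB per surviving drive")
    print(f"File copy reads about {copy_gb:.0f} GB per surviving drive")
```

The emptier the array, the bigger the gap between the two figures, and the stronger the case for copying the data off before asking aging drives to survive a full rebuild.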
So the first thing I tried to do was a brute-force copy of the entire TeraStation contents to a sub-directory of my ReadyNAS, with a Windows XP laptop as the intermediary. This attempt failed partway through, because both Windows and OS X clients had previously written to the TeraStation (this is one of the many curses of a dual-O/S office). OS X, for those of you that don’t know, sometimes uses ‘unusual’ characters in its file and directory names, which Windows isn’t fond of.
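In hindsight, a pre-flight scan for troublesome names would have saved me a failed copy. What follows is a minimal sketch, not the procedure I actually used. It flags names that break Windows' documented naming rules: the characters `< > : " / \ | ? *`, ASCII control characters, a trailing space or period, and reserved device names such as CON or LPT1. The function names are hypothetical, and the sketch assumes the NAS share is mounted somewhere that preserves the original names (a Mac or Linux client, for example).

```python
import os
import re

# Characters Windows rejects in file and directory names, plus ASCII control characters.
_FORBIDDEN_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Device names Windows reserves, with or without an extension.
_RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_windows_unsafe(name):
    """Return True if a single file or directory name would upset Windows."""
    if _FORBIDDEN_CHARS.search(name):
        return True
    if name.endswith((" ", ".")):
        return True
    stem = name.split(".", 1)[0].upper()
    return stem in _RESERVED_NAMES


def find_unsafe_paths(root):
    """Walk a directory tree and list every path whose final component is unsafe."""
    unsafe = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if is_windows_unsafe(name):
                unsafe.append(os.path.join(dirpath, name))
    return unsafe
```

Calling `find_unsafe_paths()` on the mounted share's root would return the list of entries to rename, or to route through a Mac instead of a Windows PC, before starting the bulk copy.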
Attempt #2: direct backup to a USB-tethered HDD. I hadn’t tried this technique first because TeraStation-managed backups employ the NAS’s unconventional XFS file system, and frankly I was already thinking it was time to retire Buffalo’s product. Unfortunately, this endeavor also didn’t succeed; no matter what I tried, I couldn’t get the TeraStation to ‘see’ a Western Digital 250 GByte external drive connected to any of the NAS’s four USB ports.
Attempt #3: folder-by-folder copy from the TeraStation to the ReadyNAS, scrupulously avoiding the offending OS X bits in the process. Tedious? Yes. Content-incomplete? Quite (thereby acting as one reminder that RAID alone isn’t enough…supplement it with periodic backups to comprehend situations such as virus infections, software installations and upgrades gone awry, etc…which would corrupt the entire RAID image). But successful? Also yes. The iTunes tunes were safe.
By this time, it was late on Monday night. I returned to the TeraStation GUI, selected the new drive, clicked on the ‘Restructure RAID Array’ button, and was greeted by the following pop-up box:
What did that mean? Would it obliterate only the new drive…which was fine, since the drive was blank anyway? Or (completely counter to the spirit of RAID rebuild) would it wipe the entire array clean? I had no clue, given the dubious wording. The documentation was no help. Note, too, the capacity discrepancy between the new and existing drives, which I hoped was simply a function of the former’s unformatted status.
I clicked ‘OK’, hoping for the best, and went to bed. Three hours later, when I spontaneously woke up again, the cacophony of blinking front panel LEDs had stopped, my stored data was thankfully still intact, and I found this:
As I type these words, I’m copying (first via a Windows PC intermediary for applicable files and directories, then a Mac) the TeraStation’s contents to a RAID 1-configured Linksys NAS200 containing two Western Digital 750 GByte SATA HDDs. Although I had a spare ReadyNAS NV enclosure in inventory, the only SATA drives I had four of were WD 10,000 RPM Raptors, whose capacity was limited and whose performance potential even a GbE network interface wouldn’t come close to harnessing. Better to save them for my next desktop PC build.
Anyway, I wanted to save the Infrant enclosure, which is functionally identical to the one I’m currently using but contains a lithography-shrunk (therefore higher-speed) storage processor, for performance-boosting my existing ReadyNAS RAID build. The NAS200 doesn’t support simultaneous striping for performance and mirroring for redundancy (you’ll see that I chose safety over speed), nor does it offer GbE bandwidth. But since the LAN here is composed predominantly of HomePlug AV and 802.11g tethers, the performance potential of a RAID 0 (or RAID 5)-striped, GbE-equipped NAS would largely go unrealized, anyway.
Continue reading with ‘HDD Failures: Transience Is Still Noxious’…















