SSD Performance Variability, Wearout Failure, and Help With Specifying SSDs
The maturity and consistent operation of hard disk drives (HDDs) from the few vendors still making them may lull you into expecting similar consistency from solid-state disks (SSDs). After all, HDDs and SSDs use the same disk interfaces (PATA, SATA, and serial-attached SCSI, or SAS), so where's the inconsistency? You will not get variability information from the SSD vendors themselves, and at present there seem to be about 170 such vendors, based on estimates I heard this week at the Flash Memory Summit. You need a more objective view. I got one at the Summit by interviewing Mike Ajlouny, VP of Sales at Flexstar Technology, a drive-tester manufacturer in Fremont, CA.
Ajlouny has been in the disk-drive industry for 25 years, long before SSDs showed up. He started as a mechanical engineer and spent most of that quarter century at IBM, which was a leading commercial drive vendor until it sold the business to Hitachi a few years back. Ajlouny now spends a lot of his time helping people understand the ins and outs of SSDs.
The first thing you should understand is that the SSDs introduced so far have largely been developed by systems houses or semiconductor vendors, not drive vendors. One consequence of this lineage is that SSDs behave differently depending on the vendor. They may conform to the letter of the disk-interface specification, but they do not all queue and prioritize commands the same way, to cite just one example. As a result, SSDs don't deliver the vendor-to-vendor consistency in system performance that HDDs do: the time required to execute a given command sequence varies from one vendor's drive to the next. So if your system design assumes consistent drive behavior, it may develop operational problems when you switch from one vendor's SSD to another's. It's a matter of industry maturity, says Ajlouny.
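One way to compare drives on consistency rather than on headline speed is to log the completion time of every command in a fixed workload and compare the tails of the distributions, not just the averages. The following is a minimal sketch that assumes you already have per-command latencies in microseconds from whatever test tool you use. The helper `latency_summary` and the sample data are hypothetical, not part of any tester's interface.

```python
import statistics

def latency_summary(latencies_us):
    """Summarize per-command completion latencies (in microseconds) for one drive."""
    ordered = sorted(latencies_us)
    n = len(ordered)
    # Nearest-rank 99th percentile: rank = ceil(0.99 * n), using integer math
    rank = max(1, -(-99 * n // 100))
    return {
        "mean": statistics.mean(ordered),
        "stdev": statistics.pstdev(ordered),
        "p99": ordered[rank - 1],
        "max": ordered[-1],
    }

# Hypothetical samples: two drives with the same mean latency but different tails
drive_a = [149] * 100
drive_b = [100] * 99 + [5000]  # mostly faster, but with one long stall
for name, samples in (("drive_a", drive_a), ("drive_b", drive_b)):
    print(name, latency_summary(samples))
```

Both hypothetical drives average 149 µs, but drive_b occasionally stalls for 5 ms. A system tuned around drive_a's behavior could trip over exactly that kind of difference.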
There can also be timing variations in the way an SSD's internal controller manages the drive's Flash memory. NAND Flash chips perform reads very quickly. Writes are slower, and erases are glacial. A Flash page can be written only once before it must be erased, and erasure works only on whole blocks of many pages, so write and erase operations must be managed carefully to avoid erasures whenever possible. Different vendors' drives use different management schemes, resulting in quite a bit of performance variation.
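The following toy model shows why writes and erases need managing. In NAND Flash, pages are programmed once and are reclaimed only when their whole block is erased. A controller therefore typically writes new data out of place and marks the old copy stale, erasing a block only when it runs out of free pages. This is a sketch under heavy simplifying assumptions: no spare capacity, no power-loss protection, and a greedy choice of which block to erase. `ToyFTL` and its methods are hypothetical names. Real controllers differ substantially in exactly these choices, which is one reason drives from different vendors behave so differently.

```python
class ToyFTL:
    """Toy flash translation layer (hypothetical): pages are programmed once,
    and stale pages are reclaimed only by erasing their whole block."""

    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        # Each physical page is "free", "valid", or "invalid" (stale)
        self.state = [["free"] * pages_per_block for _ in range(num_blocks)]
        self.erase_counts = [0] * num_blocks
        self.mapping = {}  # logical page number -> (block, page)

    def _find_free_page(self):
        for b, pages in enumerate(self.state):
            for p, s in enumerate(pages):
                if s == "free":
                    return b, p
        return None

    def _garbage_collect(self):
        # Greedy victim choice: the block holding the most stale pages
        victim = max(range(len(self.state)),
                     key=lambda b: self.state[b].count("invalid"))
        if self.state[victim].count("invalid") == 0:
            raise RuntimeError("no stale pages to reclaim: logical capacity exceeded")
        # Valid data in the victim is buffered and rewritten after the erase
        survivors = [lpn for lpn, (b, _) in self.mapping.items() if b == victim]
        self.state[victim] = ["free"] * self.pages_per_block
        self.erase_counts[victim] += 1
        for i, lpn in enumerate(survivors):
            self.state[victim][i] = "valid"
            self.mapping[lpn] = (victim, i)

    def write(self, lpn):
        # Never overwrite in place: first mark the old copy stale...
        old = self.mapping.pop(lpn, None)
        if old is not None:
            b, p = old
            self.state[b][p] = "invalid"
        # ...then program a free page, erasing a block only when none is left
        loc = self._find_free_page()
        if loc is None:
            self._garbage_collect()
            loc = self._find_free_page()
        b, p = loc
        self.state[b][p] = "valid"
        self.mapping[lpn] = loc


ftl = ToyFTL(num_blocks=2, pages_per_block=4)
for lpn in range(4):
    ftl.write(lpn)       # fill block 0 with logical pages 0-3
for _ in range(10):
    ftl.write(0)         # keep rewriting logical page 0
print(ftl.erase_counts)  # → [0, 2]
```

Rewriting one logical page ten times costs two block erasures here, and all of them land on the same block. That uneven wear is the problem wear leveling is meant to address.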
Next, you should understand that SSDs wear out. So do HDDs, whose spindle bearings and other moving parts wear. The SSD industry is acutely aware of the wearout issue, and SSD vendors talk a lot more about wearout than HDD vendors do. Perhaps that's because SSDs can more easily monitor the health of the underlying NAND Flash and give the host system predictions of the onset of wearout failure.
The subsystem-level design of an SSD determines how fast it wears out. For example, all SSDs implement wear-leveling algorithms that spread writes to the NAND Flash across as much of the storage array as possible. Some wear-leveling algorithms are better than others, so some SSDs will wear out faster than others. Flexstar's drive testers can test SSDs to failure, and they have done so. It's therefore possible to test drive samples from various SSD vendors to evaluate the relative effectiveness of their wear-leveling algorithms, as well as the quality of the NAND Flash chips incorporated into each SSD's design.
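A deliberately simplified simulation shows how much a wear-leveling policy can matter. It assumes a block-level model in which one frequently rewritten ("hot") logical block is written over and over. Each write goes to a freshly erased physical block, and the block holding the stale copy is erased and returned to the free pool. The names `simulate`, `lowest_index`, and `least_worn` are hypothetical. The second policy is a basic form of dynamic wear leveling. Production controllers use more elaborate, proprietary schemes, including static wear leveling, which also relocates rarely changed data.

```python
def simulate(num_blocks, hot_writes, pick):
    """Rewrite one hot logical block repeatedly; return per-block erase counts."""
    erase_counts = [0] * num_blocks
    free = set(range(1, num_blocks))  # block 0 initially holds the hot data
    current = 0
    for _ in range(hot_writes):
        target = pick(free, erase_counts)  # allocation policy chooses a free block
        free.remove(target)
        erase_counts[current] += 1         # erase the block with the stale copy...
        free.add(current)                  # ...and return it to the free pool
        current = target
    return erase_counts

def lowest_index(free, erase_counts):
    # No wear leveling: always reuse the lowest-numbered free block
    return min(free)

def least_worn(free, erase_counts):
    # Dynamic wear leveling: prefer the free block erased the fewest times
    return min(free, key=lambda b: (erase_counts[b], b))

naive = simulate(8, 1000, lowest_index)
leveled = simulate(8, 1000, least_worn)
print(max(naive), max(leveled))  # → 500 125
```

Both policies perform the same 1,000 erasures. The naive policy concentrates them on two blocks, while the leveled policy spreads them evenly across all eight. That is a fourfold difference in peak wear, and therefore in the time until the most-worn block fails.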
Beyond wearout failure, Ajlouny says that SSDs behave differently under stress testing, which pushes a drive to the corners of its operational envelope for throughput, bandwidth, and overall performance. A year ago, Flexstar had to rework its HDD testers to make them fast enough to stress-test SSDs, because SSDs execute far more IOPS (IO operations per second) than HDDs. SSDs can have IOPS ratings 100x or 200x higher than HDDs, so they are harder to push into those performance corners.
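A common back-of-the-envelope estimate helps put "100x or 200x" in perspective. It derives an HDD's random-access IOPS from its mechanical delays: each random IO pays an average seek plus, on average, half a revolution of rotational latency. This sketch ignores transfer time, command queuing, and caching. The seek time used below is an assumed, illustrative value, not a figure from the article or from Flexstar.

```python
def hdd_random_iops(rpm, avg_seek_ms):
    """Rough random-IO rate for an HDD: one average seek plus half a revolution per IO.

    Ignores transfer time, command queuing, and caching.
    """
    avg_rotational_ms = 0.5 * 60_000 / rpm  # half a revolution, in milliseconds
    return 1000 / (avg_seek_ms + avg_rotational_ms)

# Assumed, illustrative drive parameters (not from the article)
hdd = hdd_random_iops(rpm=7200, avg_seek_ms=8.5)
print(f"HDD: ~{hdd:.0f} IOPS; 100x-200x: ~{100 * hdd:,.0f} to ~{200 * hdd:,.0f} IOPS")
```

At 7200 rpm, half a revolution takes about 4.17 ms. With an assumed 8.5 ms average seek, this yields roughly 79 IOPS, so a 100x to 200x advantage puts an SSD somewhere around 8,000 to 16,000 IOPS. A tester built to saturate the HDD figure falls far short of the SSD range.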
The good news is that an SSD that passes a tester's stress tests is likely to last a long time, says Ajlouny. The bad news is that you can't test an SSD for very long without wearing it out. You probably want to stress-test some initial drive samples to failure so that you can evaluate the drives' underlying NAND Flash management algorithms, but you're not going to test every production drive that way, or you'll have no production.
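A rough estimate of how long a test-to-failure run takes follows from the drive's write budget. Divide capacity times rated program/erase (P/E) cycles by write amplification (the ratio of NAND writes to host writes), then divide by the sustained host write rate. This is a sketch that assumes perfect wear leveling, so every block reaches its rating at the same moment. `days_to_wearout` is a hypothetical helper, and every number in the example is an illustrative assumption, not Flexstar data or any vendor's specification.

```python
def days_to_wearout(capacity_gb, pe_cycles, write_amplification, host_mb_per_s):
    """Rough days of continuous writing before the rated P/E budget is used up.

    Assumes perfect wear leveling, so every block reaches its rated
    program/erase cycle count at the same time.
    """
    host_bytes_budget = capacity_gb * 1e9 * pe_cycles / write_amplification
    seconds = host_bytes_budget / (host_mb_per_s * 1e6)
    return seconds / 86_400

# Hypothetical drive: 64 GB, 10,000 rated P/E cycles, write amplification of 2,
# written continuously at 100 MB/s
print(f"{days_to_wearout(64, 10_000, 2.0, 100):.1f} days")
```

Under these assumptions, the drive absorbs 3.2 × 10^14 bytes of host writes, which takes about 37 days of nonstop writing. That drive is then consumed. This is why wearing drives out makes sense for qualifying a handful of samples but not for screening production.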