
SSD Performance Variability, Wearout Failure, and Help With Specifying SSDs

August 14, 2009

The maturity level and consistent operation of hard disk drives (HDDs) from the few vendors still making them may lull you into expecting similar consistency from solid-state disks (SSDs). After all, both HDDs and SSDs use the same disk interfaces (PATA, SATA, and SAS (serial-attached SCSI)), so where’s the inconsistency? You will not get variability information from the SSD vendors themselves, and at present there seem to be about 170 such vendors, based on estimates I heard this week at the Flash Memory Summit. You need a more objective view. I got one at the Summit by interviewing Mike Ajlouny, VP of Sales at Flexstar Technology, a drive-tester manufacturer in Fremont, CA.

Ajlouny has been in the disk-drive industry for 25 years, long before SSDs showed up. He started as a mechanical engineer and spent most of that quarter century at IBM, which was a leading commercial drive vendor until it sold the business to Hitachi a few years back. Ajlouny now spends a lot of his time helping people understand the ins and outs of SSDs.

The first thing you should understand is that the SSDs introduced so far have largely been developed by systems houses or semiconductor vendors, not drive vendors. The consequence of this lineage is that SSDs behave differently depending on the vendor. Although they may conform to the letter of the disk-interface specification, they do not all queue and prioritize commands the same way, to take just one example. As a result, SSDs don’t deliver the vendor-to-vendor consistency in system performance that HDDs do. There will be some variation in the time required to execute command sequences. So if your system design assumes consistent behavior, it may develop operational issues when you move from one vendor’s SSD to another’s. It’s a matter of industry maturity, says Ajlouny.

There can also be time variations in the way that the SSD’s internal controller manages the drive’s Flash memory. NAND flash chips perform reads very quickly. Writes are slower and erases are glacial. You can only write a Flash block once before you need to erase it, so write and erase operations need to be managed carefully to avoid erasures whenever possible. Different drives from different vendors use different management schemes, resulting in quite a bit of performance variation.
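The write-once-then-erase constraint described above can be pictured with a toy model. This is only a sketch under simplifying assumptions: the `FlashBlock` class, its four-page block size, and its method names are hypothetical illustrations, not any vendor's controller design or a real device interface.

```python
class FlashBlock:
    """Toy model of one NAND erase block (hypothetical, for illustration only)."""

    def __init__(self, pages_per_block=4):
        # None marks an erased page, which is the only state that can be written.
        self.pages = [None] * pages_per_block
        self.erase_count = 0

    def program(self, page, data):
        # A page that already holds data cannot be overwritten in place.
        if self.pages[page] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page] = data

    def erase(self):
        # Erasing clears every page in the block and uses up some block lifetime.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1


block = FlashBlock()
block.program(0, b"v1")

try:
    block.program(0, b"v2")  # attempted in-place overwrite
    overwrite_ok = True
except ValueError:
    overwrite_ok = False

block.erase()                # the slow step a controller tries to avoid or defer
block.program(0, b"v2")
```

How and when a controller schedules those erases is exactly where one vendor's management scheme can differ from another's.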

Next, you should understand that SSDs wear out. So do HDDs, which can wear out spindle bearings and other moving parts. The SSD industry is acutely aware of the wearout issue and SSD vendors talk a lot more about wearout problems than do HDD vendors. Perhaps that’s because the SSDs can more easily monitor the health of the underlying NAND Flash technology and provide a host system with predictions of the onset of wearout failure.

The subsystem-level design of an SSD determines how fast it wears out. For example, all SSDs implement wear-leveling algorithms that spread writes to the NAND Flash across as much of the storage array as possible. Some wear-leveling algorithms are better than others, so some SSDs will wear out faster than others. Flexstar’s drive testers can test SSDs to failure, and they have. So it’s possible to test drive samples from various SSD vendors to evaluate the relative effectiveness of their wear-leveling algorithms as well as the quality of the NAND Flash chips incorporated into the SSD’s design.
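To see why wear leveling matters, consider a deliberately simplified simulation. It assumes a hypothetical drive with eight physical blocks and a host that rewrites the same logical block over and over, with each rewrite costing one erase. The least-worn-first policy shown here is just one simple strategy chosen for illustration; real SSD controllers use proprietary and far more elaborate schemes.

```python
def simulate(num_blocks, writes, wear_leveled):
    """Return per-block erase counts after `writes` rewrites of one logical block."""
    erase_counts = [0] * num_blocks
    current = 0  # without wear leveling, every rewrite lands on physical block 0
    for _ in range(writes):
        if wear_leveled:
            # Redirect the rewrite to the physical block with the fewest erases.
            current = min(range(num_blocks), key=lambda b: erase_counts[b])
        # Each rewrite costs one erase of the chosen physical block.
        erase_counts[current] += 1
    return erase_counts


naive = simulate(num_blocks=8, writes=8000, wear_leveled=False)
leveled = simulate(num_blocks=8, writes=8000, wear_leveled=True)

print(max(naive), max(leveled))  # prints "8000 1000"
```

In this toy case the most-worn block sees one-eighth the erases once writes are spread out, which is the kind of difference a test-to-failure run would expose between a good wear-leveling algorithm and a poor one.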

Beyond wearout failure, Ajlouny says that SSDs behave differently under stress testing, which pushes an SSD to the corners of the operational envelope for throughput, bandwidth, and overall performance. Flexstar had to remake its HDD drive testers a year ago to make them fast enough to stress-test SSDs because SSDs have considerably higher performance specs relative to HDDs based on the number of IOPS (IO operations per second) they can execute. SSDs can have IOPS performance ratings 100x or 200x larger than HDDs, so SSDs are harder to push into those performance corners.

The good news is that an SSD that passes a tester’s stress tests is likely to last a long time, says Ajlouny. The bad news is that you can’t test an SSD for very long without wearing it out. You probably want to stress-test some initial drive samples to failure so that you can evaluate the drives’ underlying NAND Flash management algorithms, but you’re not going to test every production drive that way or you’ll have no production.

 

Posted by Steve Leibson on August 14, 2009 | Comments (10)

August 20, 2009
In response to: SSD Performance Variability, Wearout Failure, and Help With Specifying SSDs
DGCasey commented:

What I want to know is how much faster my computer will be when doing things that require hundreds of disk swaps. I'd like to see an article somewhere on performance improvements in actual operating situations. Maybe I'd buy now just to speed things up even if I needed a new SSD in a year.


August 16, 2009
In response to: SSD Performance Variability, Wearout Failure, and Help With Specifying SSDs
Smart Sand commented:

Personally, I'm waiting for SSD products designed with MRam or FeRam (magnetic-based)...no Flash-stress wearout mechanisms or charge-leakage-based 'forgetfulness' - just not quite there on the cost-per-bit, yet, but I can be patient. Hell, I just might come out of retirement and design one myself.


August 15, 2009
In response to: SSD Performance Variability, Wearout Failure, and Help With Specifying SSDs
lance commented:

... no delusion here. Just look at what happened so far:
- Intel degraded with time, reported on review sites and denied by Intel (at first).
- JMicron stuttering: reported on review sites and basically ignored by JMicron.
- Intel BIOS incompatibility with certain systems reported on review sites.
- Intel password corruption only found out upon fielding the G2 SSD.
The technology is immature and the companies making the technology are not talking. It will take years for the market to confirm the issues and rid the technology of the hype. Sure, you can't make an omelet without breaking a few eggs, but it is ridiculous to drink the sugar water directly.


August 14, 2009
In response to: SSD Performance Variability, Wearout Failure, and Help With Specifying SSDs
Steve Leibson commented:

Lance, you seem to be harboring the delusion that hard drives are reliable. The maxim for HDDs is that they WILL fail. Usually in about three years. Abruptly. Without warning. Contrast this with a well-done SSD that monitors and manages bad blocks. SSDs, if designed well, fail soft by monitoring wear on all NAND blocks, shedding capacity from bad blocks, and moving data from worn blocks to good blocks. SSDs can report their wear statistics and can predict failure so the drive can be replaced before data loss. Can't do that with HDDs. Raw NAND and raw disk platters are unreliable storage media. Both require ECC, and lots of it, to create reliable storage media. Most of the speakers at the Flash Memory Summit cited these facts as though they understood them well. A couple of them mentioned using the same ECC schemes developed for disk drives in their SSDs. I don't think you'll ever see any of these factors discussed in an Amazon review. That's why there's an EDN.


August 14, 2009
In response to: SSD Performance Variability, Wearout Failure, and Help With Specifying SSDs
lance commented:

... exactly. Design engineers can't wait that long. Neither can marketers and entrepreneurs. The whole point of this article and issue is that people that buy storage need to be very skeptical because designers and business people "can't wait that long". For example: They would rather see two shuttles blow up in midair than consider an engineering report because "they can't wait that long". In the future I may get some X25-E SLC SSDs just to use them as daily caches to backup, but as far as reliable disk storage, no way. SSDs solve speed problems, but backups and plans to counter "being screwed" are still needed. Anyone that is buying the sugar water of SSD reliability needs to be more skeptical.


August 14, 2009
In response to: SSD Performance Variability, Wearout Failure, and Help With Specifying SSDs
Steve Leibson commented:

Hmmm, quote a 25-year industry vet who doesn't even sell drives and get smacked with the "air of expertism." I think you should indeed wait for those Amazon reviewers that complain because the instruction book was the wrong shade of bone or ecru or that the drive doesn't install itself by walking from the box to the PC on little legs. Seriously, I like Amazon reviews myself, but design engineers can't wait that long or depend on the highly subjective results.


August 14, 2009
In response to: SSD Performance Variability, Wearout Failure, and Help With Specifying SSDs
lance commented:

... exactly. Best to ignore the experts, paid talking heads and technical specifications and wait five years for the flood of long-term reviews to come in at Amazon, or not. Ironic that the best technical documentation is from raging consumers at Amazon and other user-review sites. This article and posts seem to give the air of "expertism". I think I'll just wait for the truth to emerge at Amazon.


August 14, 2009
In response to: SSD Performance Variability, Wearout Failure, and Help With Specifying SSDs
Steve Leibson commented:

The whole topic of wearout failure for SSDs is pretty hot and contested right now. The MTBF spec on the Intel drive doesn't mean that you can write to it continuously for 228 years and it won't fail. It also doesn't mean 228-year data retention. There are different reliability specs for those attributes. The industry spec for NAND Flash data retention is 10 years, and there's plenty of controversy for that spec too. SanDisk is trying to interest the industry in an SSD write-wearout rating in terabytes: you can write this many terabytes to the SSD before wearout failure. However, my impression, based on the SanDisk speaker's voiced frustration, is that SanDisk isn't getting any traction with this spec.


August 14, 2009
In response to: SSD Performance Variability, Wearout Failure, and Help With Specifying SSDs
MTBF Explained commented:

Probably because MTBF (Mean Time Between Failures) is meant to be confusing... It's not unusual for things to have MTBFs that significantly exceed their lifetime as defined by wear-out. In fact, I'll use an example most of us should be familiar with: a "thirty-something" American (well within their constant failure rate phase) has a failure (death) rate of about 1.1 deaths per 1,000 person-years and therefore has an MTBF of 900 years (of course it's really 900 person-years per death). Although, even the best of us wear out long before this. I know it's confusing and probably seems almost useless, but it has its purpose... The previous example points out one important characteristic of MTBF: it is an ensemble metric. It should be applied to populations (i.e. "arrays") of things; it is not a sample characteristic, which would then only apply to one specific thing. So basically, MTBF is an excellent characteristic for determining how many spare hard drives are needed to support 1,000 PCs, but a poor characteristic for letting you know when you should change your hard drive to avoid a crash. So why continue this confusion? Simple: if you're a hard drive vendor and you have no idea when your drive will fail, you can't just say "Our hard disk drives will die sometime between 1-7 years, we just don't know when exactly", so you do a little marketing spin and say "Our hard disk drives have an MTBF of 1,000,000 hours". Just my two cents worth :)
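The arithmetic in this comment is easy to check. The sketch below is illustrative only: the function names are made up for this example, it assumes a constant failure rate, and it assumes every drive in the fleet runs around the clock. It reproduces the person-years figure (about 909, which the comment rounds to 900), converts the 2,000,000-hour MTBF quoted elsewhere in this thread into years, and estimates yearly failures for 1,000 drives rated at 1,000,000 hours.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def mtbf_from_rate(failures, unit_years):
    """MTBF in years, given a failure count observed over unit_years of exposure."""
    return unit_years / failures


def expected_failures_per_year(fleet_size, mtbf_hours):
    """Expected yearly failures for a fleet, assuming a constant failure rate
    and continuous (24/7) operation of every unit."""
    return fleet_size * HOURS_PER_YEAR / mtbf_hours


person_mtbf = mtbf_from_rate(1.1, 1000)                      # about 909 person-years per death
drive_mtbf_years = 2_000_000 / HOURS_PER_YEAR                # about 228 years
spares_needed = expected_failures_per_year(1000, 1_000_000)  # about 8.76 drives per year

print(round(person_mtbf), int(drive_mtbf_years), round(spares_needed, 2))
```

The last number is the useful one: a 1,000,000-hour MTBF says little about when any single drive will die, but it does suggest budgeting for roughly nine replacements a year across 1,000 continuously running drives.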


August 14, 2009
In response to: SSD Performance Variability, Wearout Failure, and Help With Specifying SSDs
lance commented:

Intel says their X25-E lasts 2,000,000 hours MTBF, which is 228 years. Something sounds fishy one way or the other. If I let an SSD sit for 228 years without moving it or plugging it in, I bet one of its bits would fail on its own without even using it. Something sounds fishy.

© 2011 UBM Electronics. All rights reserved.