Feature
Digital audio gets an audition, Part 1: lossless compression
Theory is fine, but reality is better. In this two-part series (see our Jan 18 issue for Part 2), I dust off my PC, stopwatch, oscilloscope, and spectrum analyzer and answer your audio-compression questions.
By Brian Dipert, Technical Editor -- EDN, 1/4/2001
Last year, I wrote about the theory behind digital-audio compression, basing my research mostly on second-hand anecdotes, technical papers, and marketing collateral (Reference 1 to Reference 4). Theory is all well and good, but as I'm sure you've already discovered in your careers, even the best academic analysis doesn't always match real-life results. I prefer to come to my own conclusions, after running my own experiments, so I jumped at the chance to work on this article series. I'll be putting a number of lossless and lossy codecs under the magnifying glass (as well as through my headphones and speakers) to see how they perform.
Part 2 of this article series will appear in EDN's Jan 18, 2001, issue. Additional subsequent work will appear on EDN's Web site (see sidebar "Browse to your eyes' and ears' content").
When people seem to want to talk only about MP3 and other lossy codecs, why should you care about lossless compression (Table 1, below)? Well, people who tape live performances want to archive their recordings in the best-quality format while still shrinking the files as much as possible. Remember that an uncompressed dual-channel, 16-bit, 44.1-kHz-sampled audio stream gobbles up nearly 11 Mbytes of storage space per minute of music. Even if lossless compression shrinks the file by only one-third or one-half, that size reduction is better than nothing. For similar reasons, audiophiles prefer to listen to their multichannel, surround-sound, extended-precision, and high-sampling-rate music in a lossless format, such as the MLP used in DVD Audio, rather than in a lossy-compression format, such as DTS or Dolby Digital (Reference 5).
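As a quick sanity check on the "nearly 11 Mbytes per minute" figure, the arithmetic is simply sample rate times bytes per sample times channels times 60 seconds. The sketch below assumes plain linear PCM and ignores the few dozen bytes of WAV header; the function name is purely illustrative.

```python
def pcm_bytes_per_minute(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bytes of uncompressed linear PCM per minute (WAV header not included)."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * 60


cd_audio = pcm_bytes_per_minute(44_100, 16, 2)
print(cd_audio)              # 10584000
print(cd_audio / 1e6)        # 10.584 (decimal Mbytes)
print(cd_audio / 2**20)      # about 10.09 (binary Mbytes)

# Size left after lossless compression removes one-third or one-half of the data
for saved in (1 / 3, 1 / 2):
    print(round(cd_audio * (1 - saved)))
```

The two leftover sizes, roughly 7.06 and 5.29 Mbytes per minute, show why even a modest lossless ratio adds up over a full album.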
Lossless compression may also make sense for musicians and audio engineers who want to temporarily store tracks before editing and mixing, and to archive the final master tracks for future fame and fortune, as well as for folks who decide to digitize those LP records collecting dust in their garages. Once you convert a sound clip to a lossy format, you cannot recover the information that the encoder discarded. And just as with repeated saving of JPEG still images, every time you modify and resave an MP3 or other lossy-compression audio file, or transcode it from one lossy format to another, the audio quality further degrades.
Storing digital audio clips as compressed data files instead of burning them onto an audio CD has another key advantage: There's no need for a subsequent DAE process to copy the information from the audio CD. DAE is notorious for injecting clicks, pops, stutters, blank spots, and other irregularities into the WAV file created during the extraction process. These flaws result from less-than-perfect laser-head positioning, defects in the audio-CD media, and programs running in the background that periodically distract the PC from its primary extraction task. If the lossless codec you choose contains built-in playback capability, or if the vendor supplies a plug-in for Nullsoft's WinAmp or other popular audio-playback software, your customers can enjoy tunes without needing to first decompress the files.
Table 1—Representative lossless compression algorithms

| Codec | Version | URL |
|---|---|---|
| DAKX | | www.dakx.com |
| LPAC | | www-ft.ee.tu-berlin.de/~liebchen/lpac.html |
| LTAC | | www-ft.ee.tu-berlin.de/~liebchen/ltac.html |
| MKW | | http://home.att.net/~mkw |
| Monkey's Audio | | www.monkeysaudio.com |
| *MusiCompress (WaveZip) | 2.01 | http://members.aol.com/sndspace |
| Perfect Clarity Audio | | www.sonicfoundry.com |
| *RAR | 2.71 | www.rarsoft.com |
| RKAU | | http://rksoft.virtualave.net |
| *Shorten | 2.3a1 | www.softsound.com/Shorten.html |
| SPS | | www.krishnasoft.com/sps.htm; www.pegasusimaging.com/sound.html |
| Waveform Archiver | | www.simtel.net/pub/simtelnet/msdos/arcers/wavarc10.zip |
| *WavPack | 3.2 | www.wavpack.com |
| *WinZip | 8.0 | www.winzip.com |
| Zap | | www.emagic.de |

\* Algorithms that I tested.
For the lossless-compression analysis, I wanted to measure several key parameters. First, I wanted to determine how long each algorithm takes to both encode (compress) WAV files and decode (decompress) back to WAV files. I also wanted to compare the compressed file sizes with the originals to determine each algorithm's compression efficiency. Certainly, I needed to compare each original WAV file to the one that the encode-and-decode process created to ensure that the algorithm was indeed lossless. (Fortunately, in all cases, it was.) I also originally hoped to differentiate the algorithms based on the percentage of CPU resources they consumed during encoding and decoding, but I found that they all took whatever spare MIPS capability was available during at least a portion of their compression and decompression routines.
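The measurements described above (encode time, decode time, compressed size, and a bit-exact comparison of the round trip) fit naturally into a small harness. The sketch below is hypothetical: it wraps any encode/decode pair of Python callables and uses the standard library's zlib (Deflate, the method standard Zip archives use) purely as a stand-in codec. The programs in this study were stand-alone Windows applications, not Python functions.

```python
import time
import zlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class CodecResult:
    encode_s: float     # wall-clock seconds to compress
    decode_s: float     # wall-clock seconds to decompress
    size_ratio: float   # compressed size / original size (smaller is better)
    lossless: bool      # True only if the round trip reproduces every byte


def benchmark(data: bytes,
              encode: Callable[[bytes], bytes],
              decode: Callable[[bytes], bytes]) -> CodecResult:
    if not data:
        raise ValueError("need a non-empty input file")
    t0 = time.perf_counter()
    packed = encode(data)
    t1 = time.perf_counter()
    restored = decode(packed)
    t2 = time.perf_counter()
    return CodecResult(encode_s=t1 - t0,
                       decode_s=t2 - t1,
                       size_ratio=len(packed) / len(data),
                       lossless=(restored == data))


# Example: Deflate at its fastest (1) and strongest (9) levels on a stand-in buffer.
sample = bytes(range(256)) * 400
for level in (1, 9):
    print(level, benchmark(sample, lambda b: zlib.compress(b, level), zlib.decompress))
```

With real files you would read each WAV from disk and repeat every run a few times, because a single timing on a multitasking OS is noisy.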
Lossless compression results depend greatly on the characteristics of the source material: for example, the total frequency range that the material spans; the amount of sample-to-sample variation; whether the material is monophonic or stereo; and, if the material is stereo, the differences in the channels' frequency, amplitude, and phase. Table 2 lists the variety of music genres I threw at the codecs. I chose not only modern, digitally captured music but also older analog recordings and live performances, which potentially exhibit worst-case channel-to-channel phase differences.
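One way to see why channel-to-channel differences matter: many lossless audio coders can code a stereo pair as a sum ("mid") and a difference ("side") channel instead of left and right, although which of the tested codecs do so isn't established here. The sketch below uses one common integer-reversible mid/side transform; the function names are illustrative. When the channels are identical, the side channel is all zeros and costs almost nothing to store; amplitude or phase differences between the channels show up directly as side-channel content.

```python
from typing import List, Tuple


def to_mid_side(left: List[int], right: List[int]) -> Tuple[List[int], List[int]]:
    """Integer mid/side transform; side needs one more bit than the input samples."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]   # floor of the average
    side = [l - r for l, r in zip(left, right)]         # channel difference
    return mid, side


def from_mid_side(mid: List[int], side: List[int]) -> Tuple[List[int], List[int]]:
    """Exact inverse: L+R and L-R share parity, so the bit dropped from mid is recovered from side."""
    left, right = [], []
    for m, s in zip(mid, side):
        total = (m << 1) | (s & 1)        # rebuild L + R
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right


left = [1200, -340, 5000, -32768, 32767]
twin = list(left)                          # identical channels
quieter = [x // 2 for x in left]           # same waveform, right channel about 6 dB down

for right in (twin, quieter):
    mid, side = to_mid_side(left, right)
    assert from_mid_side(mid, side) == (left, right)   # bit-exact round trip
    print(sum(abs(s) for s in side))       # 0 for identical channels
```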
To convert most of the audio tracks to WAV files, I used an enhanced version of Windows 98's CDFX.VXD driver, which automatically displays the tracks in Windows Explorer as stereo and mono data files sampled at 11.025, 22.05, and 44.1 kHz (Reference 6). I had only to copy them to the hard drive. I could also have used specialized audio-extraction utilities, such as Exact Audio Copy or Plextor Manager (Reference 7 and Reference 8). The Beastie Boys hybrid CD contained both audio tracks and PC software and was therefore unusable with CDFX.VXD; I extracted the song using the WAV-conversion utility built into MusicMatch Jukebox. Note that all the music tracks are stereo; one common way some codec developers cheat when benchmarking against the competition is to run their own codec on a low-fidelity, mono, less random version of a sound clip and the alternatives on a "harder" high-fidelity stereo version.
To fully understand the codecs' capabilities and gain some insight into how they work, I used both "real" music clips and test tones. I generated the first 10 files that Table 3 shows using Syntrillium Software's Cool Edit Pro. White noise is a random audio pattern spanning the entire frequency spectrum, with equal power at all frequencies. Pink noise's power, in contrast, follows a 1/frequency pattern that places equivalent audio energy in each frequency octave. In other words, because each higher octave spans twice as many hertz as the one below it, pink noise's energy per hertz falls as frequency rises, so low frequencies are more prominent than in white noise. The pink-noise pattern more closely matches the response of the human auditory system, giving pink noise a more natural sound, with less high-frequency "hiss" than white noise. Brown-noise power, in turn, follows a 1/frequency² ratio, giving brown noise an audio presentation dominated by bass much more so than its colorful counterparts. Part two of this article discusses the remainder of Table 3's test tones.
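The equal-energy-per-octave property follows directly from the spectral shapes. Assuming idealized power spectral densities S(f) with an arbitrary constant k, the energy in the octave from f_0 to 2f_0 works out as follows:

$$
\begin{aligned}
E(f_0) &= \int_{f_0}^{2f_0} S(f)\,df \\
\text{white: } S(f) = k &\;\Rightarrow\; E(f_0) = k f_0 \\
\text{pink: } S(f) = \frac{k}{f} &\;\Rightarrow\; E(f_0) = k \ln 2 \\
\text{brown: } S(f) = \frac{k}{f^2} &\;\Rightarrow\; E(f_0) = k\left(\frac{1}{f_0} - \frac{1}{2f_0}\right) = \frac{k}{2f_0}
\end{aligned}
$$

So per-octave energy doubles (about +3 dB) from one octave to the next for white noise, stays constant for pink noise, and halves (about -3 dB) for brown noise, which is why brown noise sounds so bass-heavy.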
Keep in mind when you analyze the results that the PC I used has a fast, full-featured microprocessor (Table 4). If a given format's encoders and decoders take advantage of the integer and floating-point SIMD instructions that the Pentium III supports, they might run faster on my PC than their nonenhanced counterparts; the differences would be less significant on a more conventional CPU. Other important CPU-related questions include whether the algorithms fit within the L1 or L2 caches and whether they use streaming-data instructions to prevent cache pollution. With 256 Mbytes of Direct Rambus DRAM, the PC should have sufficient memory for even the largest music clip. And, although some system overhead is inevitable when reading and writing large audio files, the PC's 7200-rpm Ultra ATA/66 hard drive should keep this overhead to a minimum.
One other note about my PC's processor: As a Katmai-generation CPU (the 533B), it differs from the latest Coppermine-generation Pentium IIIs (such as the 533EB) coming out of Intel's fabs. Both processors employ a 32-kbyte (16-kbyte code, 16-kbyte data) internal L1 cache, and both use a 133-MHz system bus. My 533B, however, includes a 512-kbyte half-speed external L2 cache; the 533EB and its Coppermine companions employ a smaller (256-kbyte) but faster (full-speed) integrated L2 cache with more advanced features. Coppermine-era processors also beef up system buffering to increase bus usage: They include four write-back buffers, six fill buffers, and eight bus-queue buffers. My 533B should still run rings around a 533 (Pentium II core) or 533A (Pentium III Coppermine core) Celeron CPU, both of which have 66-MHz system buses and even smaller 128-kbyte integrated L2 caches.
My PC ran Microsoft's Windows 98 Second Edition. Before encoding or decoding, I used the task list (accessed via the Ctrl-Alt-Del keystroke combination) to terminate all running applications except Explorer and Systray. Eliminating all unnecessary background programs pushed available system resources above 95% and ensured that random interruptions wouldn't handicap one codec and not another.
Results
I first decided to run each of the WAV files through the ubiquitous (at least for Windows-based PCs) WinZip program. I ran two sets of tests: the first at the program's "fastest speed" setting, the second at its "highest compression" configuration. I'd previously heard that the compression algorithms that generic file-compression utilities such as PKZip, WinRAR, and WinZip use are not optimal for multimedia files, and the results showed that this opinion is correct (Table 5). (A version of Table 5, expanded from our print version, appears on the Web site of CommVerge, our sister publication; you can view it in PDF format. The expanded table contains results for all 19 music genres shown in Table 2, as well as for pink- and white-noise test tones 1 through 4 in Table 3.) Even with advance warning, I was shocked by how much poorer the WinZip compression ratios were than those of the other programs and by how long its compression and decompression routines took to complete.
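Why do generic compressors fare so poorly on audio? Dictionary coders such as Deflate, the method behind standard Zip files, win by finding exact repeats of byte strings, and digitized audio, with its ever-changing low-order bits, contains few of them. The sketch below uses Python's zlib as a stand-in for WinZip and a synthetic tone in place of real music (both assumptions). It compares Deflate on raw 16-bit samples with Deflate on the residual left after the simplest possible prediction, "each sample equals the previous one." It makes no claim about how WinRAR's multimedia mode works internally; it only shows why a prediction step helps.

```python
import math
import random
import zlib
from array import array


def synth_tone(n: int, freq_hz: float = 97.3, rate_hz: int = 44_100,
               amplitude: int = 8000, seed: int = 1) -> list[int]:
    """Hypothetical test signal: a sine tone plus a little random hiss, as 16-bit integers."""
    rng = random.Random(seed)
    return [round(amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz))
            + rng.randint(-3, 3)
            for i in range(n)]


def first_difference(samples: list[int]) -> list[int]:
    """Residual of the order-1 predictor: keep the first sample, then store x[n] - x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


samples = synth_tone(44_100)                              # one second, one channel
raw = array("h", samples).tobytes()                       # 16-bit PCM bytes
# The residual fits in 16 bits for this gentle tone; a real codec would widen the type.
resid = array("h", first_difference(samples)).tobytes()

print("raw PCM bytes:      ", len(raw))
print("Deflate on raw PCM: ", len(zlib.compress(raw, 9)))
print("Deflate on residual:", len(zlib.compress(resid, 9)))
```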
I also found it ironic that WinZip's "highest compression" option, although it took noticeably longer to execute than its "fastest speed" counterpart, didn't achieve proportionally better compression. WinRAR, a peer of WinZip, is unique among generic file-compression programs in offering a multimedia-optimized compression option. And, as the name promises, WinRAR's multimedia mode delivered significantly better compression than WinZip. However, on average, WinRAR took roughly twice as long to compress a WAV file as WinZip did at its slowest setting.
Now for the audio-specific compression algorithms. Comparing the data for MusiCompress, Shorten, and WavPack, you might be struck by the similarity of the results. This near-parity is surprising because, if you believe the programs' documentation, the algorithms vary widely in complexity. Literature from Soundspace Audio, for example, touts MusiCompress as a lean, mean, integer-only routine employing a relatively simple predictive algorithm. Shorten, at the opposite end of the spectrum, is a more elaborate and highly configurable floating-point-based scheme. In fairness to Shorten, note that I did not experiment with altering its default settings. As they say in college textbooks, "This exercise is left to the interested student" (see sidebar "Revision enhancements").
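For readers who haven't met a "predictive" lossless coder, here's a minimal integer-only sketch of the general idea: predict each sample from the previous few with a fixed polynomial (orders 0 through 3), keep only the prediction error, and entropy-code the errors with a Rice code. This is a generic textbook construction assumed for illustration; it is not MusiCompress's, Shorten's, or WavPack's actual code or bitstream, and all names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Fixed integer predictors: coefficients applied to x[n-1], x[n-2], x[n-3]
PREDICTORS: Dict[int, List[int]] = {
    0: [],            # predict 0 (no prediction)
    1: [1],           # predict x[n-1]
    2: [2, -1],       # extrapolate a straight line
    3: [3, -3, 1],    # extrapolate a parabola
}


def residuals(x: List[int], order: int) -> List[int]:
    """Prediction errors; the first `order` samples pass through unchanged as warm-up."""
    coeffs = PREDICTORS[order]
    out = list(x[:order])
    for n in range(order, len(x)):
        pred = sum(c * x[n - 1 - k] for k, c in enumerate(coeffs))
        out.append(x[n] - pred)
    return out


def reconstruct(e: List[int], order: int) -> List[int]:
    """Decoder: rerun the same predictor on already-decoded samples and add the error back."""
    coeffs = PREDICTORS[order]
    x = list(e[:order])
    for n in range(order, len(e)):
        pred = sum(c * x[n - 1 - k] for k, c in enumerate(coeffs))
        x.append(e[n] + pred)
    return x


def rice_bits(values: List[int], k: int) -> int:
    """Bits to Rice-code the values with parameter k after zigzag-mapping signed to unsigned."""
    total = 0
    for v in values:
        u = 2 * v if v >= 0 else -2 * v - 1      # 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
        total += (u >> k) + 1 + k                # unary quotient, stop bit, k low-order bits
    return total


def cheapest(x: List[int]) -> Tuple[int, int, int]:
    """Try every predictor order and Rice parameter; return (bits, order, k) for the smallest."""
    return min((rice_bits(residuals(x, p), k), p, k)
               for p in PREDICTORS for k in range(16))


tone = [round(1000 * math.sin(i / 20)) for i in range(2000)]
bits, order, k = cheapest(tone)
print(f"order {order}, k={k}: {bits / len(tone):.2f} bits/sample vs 16 uncompressed")
```

Practical encoders generally pick the predictor per block of samples rather than once per clip, and may adapt their coefficients; this brute-force search only illustrates the size-versus-effort trade-off.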
It's possible that hard-drive read-and-write delays normalized the results, although my gut feeling is that HDD overhead was only a secondary factor. Also, remember that in this study I'm using a fast, full-featured microprocessor. A simpler or slower CPU, such as those in many embedded applications, might produce greater discrepancies among the competing alternatives. For similar reasons, I decided not to use my other PC, which has an even faster PIII 800 CPU and a dual-Rambus-channel i840 chip set. A 533-MHz CPU-based system represents a more mainstream configuration, particularly when you consider not only the PCs selling today but also those already in users' hands. In this era of 1.5-GHz Pentium 4s, it's easy to forget that not too long ago, 533 MHz represented the state of the art.
Not surprisingly, the compression routines targeting audio struggled the most with the random white-noise and pseudorandom pink-noise sound files, achieving limited compression comparable with that of WinZip, which targets generic data. When you examine Table 5, compare the compression ratios for files in which the right and left channels have equivalent intensity with those for files in which one channel is "louder" than the other. Also, note that the supposedly compressed WavPack "fast-option" white-noise equivalent-channel file is actually larger than the original WAV clip. That a "compressed" version of a highly random data pattern is larger than the pattern itself isn't a surprise; the growth comes from the control bits and other overhead of the compression scheme. But in such a case, most compression algorithms automatically store the original data instead.
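The overhead effect is easy to reproduce, and so is the usual remedy: compress, compare sizes, and fall back to storing the original bytes behind a one-byte flag. The sketch below uses zlib as a stand-in compressor and seeded random bytes as a stand-in for white noise; the flag values and framing are hypothetical, not WavPack's or any other real format. A real codec would typically make this decision per frame rather than per file.

```python
import random
import zlib

STORED, DEFLATED = b"\x00", b"\x01"   # hypothetical one-byte block flags


def pack(data: bytes) -> bytes:
    squeezed = zlib.compress(data, 9)
    if len(squeezed) >= len(data):
        return STORED + data          # compression didn't help: keep the original bytes
    return DEFLATED + squeezed


def unpack(blob: bytes) -> bytes:
    flag, body = blob[:1], blob[1:]
    if flag == STORED:
        return body
    if flag == DEFLATED:
        return zlib.decompress(body)
    raise ValueError("unknown block flag")


noise = random.Random(0).randbytes(100_000)           # stand-in for white noise (Python 3.9+)
print(len(zlib.compress(noise, 9)) - len(noise))     # bytes of pure overhead added by Deflate
print(len(pack(noise)) - len(noise))                 # the fallback caps the growth at the flag byte
```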
Different types of music, even with similar time duration, achieve widely differing compression ratios and speeds. Instrumental-only classical data compresses to less than half the original file size, but more modern music types, such as hard rock, rap, and techno, are nearly as random and difficult to compress as pink and white noise. Perhaps our parents were right, after all, when they claimed that rock and roll was nothing but noise. MusiCompress was unable to handle the Beastie Boys rap WAV file I generated using MusicMatch Jukebox. I've heard that some "ripping" software packages produce nonstandard WAV files that other programs sometimes find difficult to read and play. I believe that this scenario occurred here.
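Exactly what was nonstandard about that file is unknown, but a quick way to check any WAV file a codec refuses is to walk its RIFF chunks and look at the fmt header. The sketch below is a hypothetical diagnostic that assumes the standard RIFF/WAVE layout (a "RIFF" header, then chunks that each carry a 4-byte ID, a little-endian 32-bit length, and even-length padding). It flags files whose format tag isn't plain PCM (tag 1) or that lack a data chunk; it can't prove what MusicMatch Jukebox actually wrote.

```python
import io
import struct
import wave


def inspect_wav(blob: bytes) -> dict:
    """Walk the RIFF chunks of a WAV image and summarize what a strict PCM reader would see."""
    if len(blob) < 12 or blob[:4] != b"RIFF" or blob[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    info: dict = {"chunks": []}
    pos = 12
    while pos + 8 <= len(blob):
        chunk_id, size = struct.unpack_from("<4sI", blob, pos)
        info["chunks"].append(chunk_id.decode("latin-1"))
        body = blob[pos + 8 : pos + 8 + size]
        if chunk_id == b"fmt " and len(body) >= 16:
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from("<HHIIHH", body)
            info.update(format_tag=tag, channels=channels, rate=rate, bits=bits)
        elif chunk_id == b"data":
            info["data_bytes"] = size
        pos += 8 + size + (size & 1)   # odd-length chunks carry one pad byte
    # Tag 1 is plain PCM; other tags (0xFFFE, WAVE_FORMAT_EXTENSIBLE, for example)
    # may be rejected by tools that only understand the simple case.
    info["plain_pcm"] = info.get("format_tag") == 1 and "data_bytes" in info
    return info


# Usage: build a tiny, well-formed 16-bit stereo file in memory and inspect it.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44_100)
    w.writeframes(bytes(400))          # 100 frames of silence
print(inspect_wav(buf.getvalue()))
```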
Stay tuned for part 2 of my hands-on project in the Jan 18 issue of EDN.
| Term | Definition |
|---|---|
| AAC | Advanced Audio Coding |
| codec | compressor/decompressor; also sometimes used to describe a single-chip A/D-plus-D/A converter |
| DAE | digital audio extraction |
| DTS | Digital Theater Systems |
| EBU | European Broadcasting Union |
| FDD | floppy-disk drive |
| HDD | hard-disk drive |
| LPAC | Lossless Predictive Audio Compression |
| MIPS | million instructions per second |
| MLP | Meridian Lossless Packing |
| RIMM | Rambus in-line memory module |
| SIMD | single instruction/multiple data |
| SPS | Sound Processing Software |
| SQAM | Sound Quality Assessment Material |
Browse to your eyes' and ears' content

For additional information, visit the Web-site addendum to this article series at http://www.edn.com/info/35503.html.

Readers of this article's upcoming part two may notice that I haven't exactly been fair to MP3. Due to page-count limitations, I've shown and discussed only the 64-kbps "fastest encode" results, not the "highest quality" outputs, which theoretically might not exhibit the same type or extent of artifacts. Also, for each codec, a whole range of higher bit rates exists for which I have not provided frequency- and time-based snapshots. The Web site contains all of these graphics, including electronic versions of the figures from part two of the article.

The Web-site addendum also provides downloadable WAV source files for the first 10 test tones shown in Table 3. I'll also post some of the lossy-compressed versions of the tones that I generated. Copyright restrictions preclude me from posting Table 2's song clips, as well as the EBU SQAM CD tracks. But the Massachusetts Institute of Technology (Cambridge, MA) has obtained permission to post some of the EBU SQAM tracks, so the Web-site addendum will include a link to MIT's Web site. Also included are links to other sources of audio files, comprising both synthetic test patterns and real music and voice, that you might find useful in your own analyses, as well as to other lossless and lossy compression studies that I came across in my research.

As time allows, I plan to work on additional analyses. I'd like to retest MP3 with other encoder and decoder combinations. I'd also like to look more closely at the three lossy codecs that part two of this article describes for evidence of other compression artifacts and to compile additional quantifiable comparison data. And plenty of other codecs await inspection, chief among them AAC. (See sidebar "The never-ending sonic story" in this article's part two in
At this point, I've been working on this project for nearly six months. My lossless-compression study took place in the late summer of 2000, after which I shared my results with vendors. David Bryant, developer of the WavPack algorithm, cleared up some of my post-analysis confusion (and tempted me to do even more analysis) when he told me that the compressors I tested actually are similar, despite vendors' claims. However, he advised me that two of the compressors that I didn't test (LPAC and RKAU) are different and produce better compression, albeit at slower speeds. Unfortunately, "much" better compression may amount to only a 5% improvement or a little more.

Although I tested the latest available version of WavPack, Bryant has been hard at work improving his product, and several of the other algorithm developers also follow an aggressive updating schedule. This rapid-revision phenomenon was largely why, for example, I chose not to include the popular Monkey's Audio algorithm in the study: When software goes through multiple revisions in a short time, analysis results become obsolete long before they can appear in print. Bryant claims that if I had tested the newer WavPack 3.6 beta, the results would have shown more variety. According to Bryant, the new WavPack is a 32-bit program, so it executes much faster than the version I tested, and it now contains a "high"-compression option (-h) that gives compression ratios much closer to those of the best programs while still being reasonably fast. Bryant says he achieves this combination of speed and high compression by including more samples in the predictor while retaining an all-integer approach.

For readers who would like to test new versions of WavPack, any of the other algorithms I evaluated, or codecs that I didn't have time to benchmark, most of the test files are available at
For more information...

For information on subjects discussed in this article, use EDN's information-request service. When you contact any of the following manufacturers directly, please let them know you read about their products in EDN.

| Company | Phone | Web site | Enter No. |
|---|---|---|---|
| RARSoft | | www.rarsoft.com | 301 |
| SoftSound Ltd | +44 1223 421754 | www.softsound.com | 302 |
| Soundspace Audio | 1-408-221-1191 | members.aol.com/sndspace | 303 |
| WinZip Computing | 1-612-253-8488 | www.winzip.com | 304 |

Other companies mentioned in this article: Adaptec, www.adaptec.com; Belkin, www.belkin.com; Cirque, www.cirque.com; Digigram, www.digigram.com; DTS (Digital Theater Systems), www.dtsonline.com; Dolby Labs, www.dolby.com; Ego-Sys, www.egosys.net; High Criteria, www.highcriteria.com; Hitachi, www.hitachi.com; Intel, www.intel.com; Iomega, www.iomega.com; Kingston Technology, www.kingston.com; Linksys, www.linksys.com; M-Audio (Midiman), www.m-audio.com; Maxtor, www.maxtor.com; Microsoft, www.microsoft.com; NEC, www.nec.com; Nullsoft, www.nullsoft.com; Nvidia, www.nvidia.com; Plextor, www.plextor.com; Sonic Foundry, www.sonicfoundry.com; Syntrillium Software, www.syntrillium.com; Zoltrix, www.zoltrix.com
References
1. Dipert, Brian, "Now hear this," EDN, Feb 3, 2000, pg 50.
2. Dipert, Brian, "Digital audio breaks the sound barrier," EDN, July 20, 2000, pg 71.
3. Dipert, Brian, "Security scheme doesn't hold water(marking)," EDN, Dec 21, 2000, pg 35.
4. Dipert, Brian, "Listen up," CommVerge, January 2000, pg 46.
5. Partkya, Jeff, "Stop, hey, what's that sound," EMedia Magazine, January 1999, pg 42.
6. Starrett, Robert A, "Ripping Off Recordings: Digital Audio Extraction Do's, Don'ts, and Do-ers," EMedia Magazine, July 1999, pg 34.
7. Dipert, Brian, "Software delivers the sound of savings," EDN, Sept 1, 2000, pg 24.
8. Dipert, Brian, "CD-RW drive, tutorial prevent a 'slow burn,'" EDN, Oct 26, 2000, pg 34.
Contact Technical Editor Brian Dipert at 1-916-454-5242, fax 1-530-937-8147, e-mail 
