Feature
Signal to noise: calculating the high-resolution-audio reality-to-hype ratio
Is high-resolution audio a "sound" investment, or will it bring your next design to a crashing halt in the market?
By Brian Dipert, Technical Editor -- EDN, 2/6/2003
Music labels and equipment suppliers hope that high-resolution-audio formats, such as DVD-Audio and SACD (Super Audio CD), will be the latest in a string of mostly successful upgrade "pitches" (Reference 1). Beginning with the 78-rpm record and quarter-inch tape, the audio industry has sold consumers on a series of claimed ever-higher-quality formats: 33⅓-rpm albums, 45-rpm singles, and eight-track tapes, all of which audio CDs and cassettes and, less successfully, DAT and MiniDiscs have now superseded. Along the way, consumers have upgraded their audio gear, refreshed their music libraries, and purchased more expensive variants of new music. With history as a guide to future trends, why should this latest format jump be any different?
Well, with every generational up-tick, the incremental quality improvement has diminished. I'd argue, in fact, that the reason audio CDs so quickly supplanted LPs had little or nothing to do with higher sonic quality, and the even more rapid acceptance of degraded-quality MP3 and other lossy-compression formats supports this claim. The embrace of the audio CD was all about portability and durability, not first-play sound quality, and some folks still insist that LPs sound better. The latest audio formats often offer surround sound, which is a credible upgrade motivator for at least some consumers. But will larger samples and higher sample rates further tempt consumers to pull out their wallets? Will consumers care about these features? Or, as the actions of Sony suggest, will the plethora of formats have a detrimental effect on sales? (Sony refused to label its recent two-channel re-releases of the Rolling Stones' library as the hybrid SACDs they in fact are, and it sells them at conventional-CD prices.)
What's the theory behind the auditory benefit claims of the new high-resolution formats? And how well does this theory hold up outside the laboratory—that is, in the real world? To set a framework for the discussion that follows, let's make sure we're using the same vocabulary, and the same definitions for the words in that vocabulary. I adapted the descriptions that follow from an Analog Devices application note (Reference 2):
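- Decibel: describes the sound-level (sound-pressure-level) ratio or power and voltage ratios; $\mathrm{dB}_{\mathrm{volts}} = 20\log_{10}(V_o/V_i)$, $\mathrm{dB}_{\mathrm{watts}} = 10\log_{10}(P_o/P_i)$, $\mathrm{dB}_{\mathrm{SPL}} = 20\log_{10}(p/p_{\mathrm{ref}})$, where $p_{\mathrm{ref}} = 20\ \mu\mathrm{Pa}$.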
- Dynamic range: the difference between the loudest and the quietest representable signal level or, if noise is present, the difference between the loudest (maximum-level) signal and the noise floor; measured in decibels; dynamic range = (peak level) – (noise floor), in dB.
- SNR: the difference between the nominal level and the noise floor; measured in decibels; other authors define SNR for analog systems as the ratio of the largest representable signal to the noise floor when no signal is present, which more closely parallels SNR for a digital system.
- Headroom: the difference between nominal line level and peak level where signal clipping occurs; measured in decibels; the larger the headroom, the better the audio system handles very loud signal peaks before distortion occurs.
- Peak operating level: the maximum representable signal level at which clipping of the signal occurs.
- Line level: nominal operating level (0 dB or, more precisely, –10 to +4 dB).
- Noise floor: the noise floor for human hearing is the average level of "just audible" white noise; analog-audio equipment can generate noise from components; with a DSP, noise can come from quantization errors.
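To make the three decibel definitions above concrete, here is a minimal Python sketch. The function names are hypothetical; the 20-µPa constant is the standard 0-dB SPL reference pressure in air.

```python
import math

# Standard 0-dB SPL reference pressure in air: 20 micropascals.
P_REF_PA = 20e-6


def db_voltage(v_out: float, v_in: float) -> float:
    """Voltage (amplitude) ratio in decibels: 20*log10(Vo/Vi)."""
    return 20 * math.log10(v_out / v_in)


def db_power(p_out: float, p_in: float) -> float:
    """Power ratio in decibels: 10*log10(Po/Pi)."""
    return 10 * math.log10(p_out / p_in)


def db_spl(pressure_pa: float) -> float:
    """Sound-pressure level in dB SPL: 20*log10(p/p_ref), p_ref = 20 uPa."""
    return 20 * math.log10(pressure_pa / P_REF_PA)


if __name__ == "__main__":
    print(round(db_voltage(2.0, 1.0), 2))  # doubling the voltage: 6.02 dB
    print(round(db_power(2.0, 1.0), 2))    # doubling the power: 3.01 dB
    print(round(db_spl(1.0), 1))           # 1 Pa rms: about 94 dB SPL
```

Because power into a fixed load scales with the square of voltage, db_power(4.0, 1.0) and db_voltage(2.0, 1.0) return the same value; that is why voltage and pressure ratios use a multiplier of 20 rather than 10. The 6.02 dB per doubling of voltage is also the 6.02 dB per bit that appears in the quantization-noise formula.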
Analog Devices' documentation also points out that you can assume that the sum of headroom and SNR of an electrical analog signal is equal to the dynamic range, although this statement is not entirely accurate because signals can still be audible below the noise floor. It also points out that, in undithered DSP-based systems, you cannot directly apply the SNR definition because no noise is present in the absence of a signal. In the digital domain, which this article series will primarily discuss, dynamic range and SNR both often describe the ratio of the largest representable signal to the quantization error or noise floor.
More bits to represent a signal mean more available quantization levels (Figure 1 and Table 1). Having more levels means lower quantization noise, a wider dynamic range, and a more accurate representation of the original signal. Again quoting the Analog Devices literature, the ratio of "the maximum representable signal amplitude to the maximum quantization error for an ideal A/D converter or DSP-based digital system is calculated as":
$$\mathrm{SNR}_{\mathrm{RMS}}\ (\mathrm{dB}) = 6.02n + 1.76\ \mathrm{dB}$$
$$\text{dynamic range (dB)} = 6.02n + 1.76\ \mathrm{dB} \approx 6n$$
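As a sanity check on the 6.02n + 1.76 dB figure, the following Python sketch quantizes a full-scale sine with an ideal, undithered n-bit quantizer and measures the ratio of signal power to quantization-error power. The mid-rise quantizer, the 48,000-sample record length, and the 997-cycle test tone are illustrative assumptions, not details from the Analog Devices note.

```python
import math


def theoretical_snr_db(bits: int) -> float:
    """Ideal SNR of an n-bit quantizer driven by a full-scale sine: 6.02*n + 1.76 dB."""
    return 6.02 * bits + 1.76


def measured_snr_db(bits: int, n_samples: int = 48000, cycles: int = 997) -> float:
    """Quantize a full-scale sine with an ideal, undithered n-bit mid-rise
    quantizer and return signal power over quantization-error power, in dB."""
    step = 2.0 / 2 ** bits       # 2**n levels spanning -1.0 .. +1.0
    top = 2 ** (bits - 1) - 1    # highest code index
    bottom = -(2 ** (bits - 1))  # lowest code index
    signal_power = 0.0
    error_power = 0.0
    for k in range(n_samples):
        # cycles and n_samples are coprime, so each sample lands on a distinct
        # phase; reducing the phase with integer math keeps sin()'s argument small.
        phase = (cycles * k) % n_samples
        x = math.sin(2 * math.pi * phase / n_samples)
        index = min(top, max(bottom, math.floor(x / step)))  # clip to n-bit range
        q = (index + 0.5) * step  # mid-rise reconstruction level
        signal_power += x * x
        error_power += (q - x) ** 2
    return 10 * math.log10(signal_power / error_power)


if __name__ == "__main__":
    for n in (8, 16, 24):
        print(f"{n:2d} bits: formula {theoretical_snr_db(n):6.2f} dB, "
              f"simulated {measured_snr_db(n):6.2f} dB")
```

The simulated and theoretical values should agree to within a few tenths of a decibel. The small residual comes from the formula's assumption that the error is uniformly distributed, which holds less well near the sine's peaks, where the waveform lingers.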
The documentation bases 1.76 dB on sinusoidal waveform statistics, and "this figure would vary for other waveforms"; n represents the data-word length. Providing more bits means providing better sound, then, at least to a point. How much accuracy between the sampled signal and the original is good enough, and how much is too much? Supporting more bits requires more processing muscle and more storage, both of which negatively impact cost. Ironically, Meridian Audio's Bob Stuart, one of the founding fathers of the 24-bit DVD-Audio format, along with a number of equally well-regarded peers, published a paper a few years ago that stated that 20-bit precision at a 48-kHz sampling rate and 14-bit precision at a 96-kHz sampling rate (in both cases incorporating noise shaping) were the maximum-required specifications for high-quality audio (Reference 3).
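To put a number on the storage cost of longer samples, this Python sketch computes raw, uncompressed PCM bit rates. The three channel/bit/sample-rate combinations are illustrative choices, and the figures ignore any lossless packing or compression a real disc format might apply, so they are not the actual on-disc data rates of any particular format.

```python
def pcm_bit_rate(channels: int, bits: int, sample_rate_hz: int) -> int:
    """Raw, uncompressed PCM bit rate in bits per second."""
    return channels * bits * sample_rate_hz


def minutes_per_gigabyte(bit_rate: int) -> float:
    """Playing time that fits in 10**9 bytes at the given bit rate."""
    return 1e9 * 8 / bit_rate / 60


# Illustrative configurations: (channels, bits per sample, sample rate in Hz).
FORMATS = {
    "CD (2 ch, 16 bit, 44.1 kHz)": (2, 16, 44_100),
    "2 ch, 24 bit, 96 kHz": (2, 24, 96_000),
    "6 ch, 24 bit, 96 kHz": (6, 24, 96_000),
}

if __name__ == "__main__":
    for name, (ch, bits, fs) in FORMATS.items():
        rate = pcm_bit_rate(ch, bits, fs)
        print(f"{name}: {rate / 1e6:.3f} Mbit/s, "
              f"{minutes_per_gigabyte(rate):.1f} min per GB")
```

Moving from two-channel 16-bit, 44.1-kHz audio to six channels of 24-bit, 96-kHz audio multiplies the raw bit rate nearly tenfold (about 1.41 Mbit/s versus about 13.8 Mbit/s), before any compression is applied.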
Thinking along similar lines, sound engineer Thomas Sandmann from Master Orange Entertainment points out that the theoretical quantization noise of a 24-bit A/D converter at –144 dB is significantly lower than the thermal noise of a single resistor connected to the ADC input (Reference 4). And Sound and Vision editor David Ranada, in a recent review of DVD-Audio and SACD players, notes that even the best of them, with an effective dynamic range of 18 to 19 bits, delivers A-weighted noise levels approximately 34 dB "worse" than ideal, 24-bit PCM performance (Reference 5).
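Both observations can be checked with a little arithmetic. This Python sketch computes the thermal (Johnson) noise of a resistor, sqrt(4kTRB), and inverts the 6.02n + 1.76 dB formula to get an effective number of bits. The 1-kΩ source, 300-K temperature, 20-kHz bandwidth, and 2-V-rms converter full scale are hypothetical values chosen for illustration, not figures from Sandmann or Ranada.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K


def johnson_noise_vrms(resistance_ohm: float, temperature_k: float,
                       bandwidth_hz: float) -> float:
    """RMS thermal (Johnson) noise voltage of a resistor: sqrt(4*k*T*R*B)."""
    return math.sqrt(4 * BOLTZMANN * temperature_k * resistance_ohm * bandwidth_hz)


def dbfs(v_rms: float, full_scale_vrms: float) -> float:
    """Level in dB relative to a full-scale signal of the given RMS voltage."""
    return 20 * math.log10(v_rms / full_scale_vrms)


def effective_bits(snr_db: float) -> float:
    """Invert SNR = 6.02*n + 1.76 dB to get the effective number of bits."""
    return (snr_db - 1.76) / 6.02


if __name__ == "__main__":
    # Hypothetical front end: 1-kilohm source at 300 K, 20-kHz audio band,
    # converter full scale of 2 V rms.
    noise = johnson_noise_vrms(1_000, 300, 20_000)
    print(f"resistor noise: {noise * 1e6:.2f} uV rms, {dbfs(noise, 2.0):.1f} dBFS")
    ideal_24 = 6.02 * 24 + 1.76
    print(f"ideal 24-bit noise floor: {-ideal_24:.1f} dBFS")
    # A player whose noise is 34 dB worse than ideal 24-bit performance:
    print(f"effective resolution: {effective_bits(ideal_24 - 34):.1f} bits")
```

Under these assumptions the resistor alone sits near -131 dBFS, roughly 15 dB above the ideal 24-bit floor of about -146 dBFS, and a player 34 dB short of ideal works out to roughly 18.4 effective bits, in line with the 18-to-19-bit figure.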
The human auditory system, in an ideal anechoic listening environment, discerns a 120-dB dynamic range. Literature often quotes the typical ambient masking-noise level in a living room as 45 dB SPL (sound-pressure level); the noise level in a moving automobile is significantly higher. Quantization noise, which is most noticeable in audio with low signal levels, must rise to near this ambient noise floor (near is enough, because the noise is correlated with the audio) or above it before it becomes audible. Even with 16-bit audio CDs, such a scenario would require extensive signal amplification, which would likely blow out speakers and eardrums when the audio returned to nominal levels (Table 2).
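The arithmetic behind that claim is short enough to sketch in Python. The model is deliberately simplistic and is an assumption of this example: it compares the ideal broadband quantization-noise level directly with the 45-dB-SPL ambient figure and ignores frequency weighting and masking details.

```python
def peak_spl_if_noise_audible(bits: int, ambient_noise_db_spl: float = 45.0) -> float:
    """Playback level (dB SPL) a full-scale sine would reach if the ideal n-bit
    quantization-noise floor were amplified up to the ambient noise level."""
    noise_floor_below_full_scale = 6.02 * bits + 1.76  # ideal SNR in dB
    return ambient_noise_db_spl + noise_floor_below_full_scale


if __name__ == "__main__":
    for n in (16, 20, 24):
        print(f"{n} bits: full-scale peaks at about "
              f"{peak_spl_if_noise_audible(n):.0f} dB SPL")
```

For 16-bit audio, lifting the noise floor to 45 dB SPL would push full-scale peaks to roughly 143 dB SPL, well beyond the 120-dB range the ear can handle.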
You may be getting the sense at this point that a 24-bit sample is overkill for audio storage. Even if you believe that 16-bit samples are insufficient, which I don't, a few more bits of resolution will keep sample size from becoming the weak link in an audio chain that begins with the microphones in the recording studio and ends in the listening room and your ears. The choice of a 24-bit sample primarily results from the fact that modern memory, processing, and input/output circuits most readily handle information in 8-bit groups. But at least two scenarios exist for which I'd argue that 24 bits might not be enough.
The first situation occurs during the original recording, mixing, and mastering of the audio, before producers transfer it to optical storage, a downloadable file, or some other mass-distribution vehicle. Think about all of the operations that occur during music creation: Audio engineers combine, equalize, speed up, and slow down multiple tracks' worth of recordings; acoustically manipulate vocals to turn marginal singers into divas; and invariably compress the dynamic range of the final product for as-loud-as-possible radio broadcast. Each of these numerous steps involves arithmetic calculations that, with insufficient precision, result in overflow, rounding, and truncation errors. These incremental errors build on each other and may audibly degrade the final product (see sidebar "Put on a new record"). Even so, audio engineers are still grumbling over the number of expensive hardware and software upgrades that larger samples, flowing into and out of machines at faster rates, require (References 6 to 8).
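A toy model shows how per-step errors compound. This Python sketch passes a -6 dBFS sine through a hypothetical chain of gain stages (alternating 0.8 and 1.25, so each pair is nominally unity gain), requantizes to a fixed word length after every stage, and compares the result with the same math done in double precision. The chain, tone, and record length are illustrative assumptions, not a model of any real mixing console.

```python
import math


def requantize(x: float, bits: int, mode: str = "round") -> float:
    """Snap x onto an n-bit grid spanning -1.0 .. +1.0 by rounding or truncation."""
    step = 2.0 / 2 ** bits
    if mode == "truncate":
        return math.floor(x / step) * step
    return round(x / step) * step


def chain_error_db(bits: int, stages: int, mode: str = "truncate",
                   n_samples: int = 4800) -> float:
    """RMS error, in dB relative to full scale, after a chain of gain stages
    whose output is requantized to `bits` after every stage."""
    # Hypothetical processing chain: alternating cuts and boosts (0.8 then 1.25),
    # so every pair of stages is nominally unity gain.
    gains = [0.8, 1.25] * (stages // 2) + [0.8] * (stages % 2)
    error_power = 0.0
    for k in range(n_samples):
        phase = (997 * k) % n_samples
        # Start from an n-bit "recording" of a -6 dBFS sine.
        x = requantize(0.5 * math.sin(2 * math.pi * phase / n_samples), bits)
        exact = x  # reference path: double-precision math, no requantization
        y = x      # fixed-word-length path: requantized after every stage
        for g in gains:
            exact *= g
            y = requantize(y * g, bits, mode)
        error_power += (y - exact) ** 2
    return 10 * math.log10(error_power / n_samples)


if __name__ == "__main__":
    for bits in (16, 24):
        for stages in (2, 20):
            print(f"{bits}-bit words, {stages:2d} truncating stages: "
                  f"{chain_error_db(bits, stages):7.1f} dB re full scale")
```

With truncation, each stage adds a one-sided error, so the bias accumulates roughly in proportion to the number of stages; passing mode="round" gives roughly zero-mean errors that tend to grow more slowly. Either way, a longer word length lowers the whole error curve by about 6 dB per added bit.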
An analogous scenario occurs in the decoding and postprocessing stages of audio playback. The latest-generation audio formats, such as DVD-Audio, DTS 96/24, WMA Professional, and PCM-transformed SACD, require long data words, along with postdecoding tasks such as surround virtualization and bass management that further add to the potential for calculation error and subsequent loss of acoustic "transparency." Even in the era of 16-bit audio, 32-bit fixed- and floating-point DSPs commonly found use in midrange and high-end equipment. With the migration to 24-bit audio, the 32-bit DSP will likely become pervasive, and all but the lowest end systems will employ floating-point variants.
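The word-length pressure on playback DSPs follows from simple bit-growth rules, sketched below in Python. The helper names are hypothetical, and the six-channel sum is only an illustrative stand-in for a mixing step such as bass management. The last helper uses Python's struct module to test whether an integer survives a round trip through IEEE 754 single precision, whose 24-bit significand holds any 24-bit sample exactly.

```python
import math
import struct


def product_bits(sample_bits: int, coeff_bits: int) -> int:
    """Bits that suffice to hold the exact product of two signed fixed-point words."""
    return sample_bits + coeff_bits


def accumulator_bits(term_bits: int, n_terms: int) -> int:
    """Bits needed to sum n_terms signed values of term_bits each without overflow."""
    return term_bits + math.ceil(math.log2(n_terms))


def survives_float32(value: int) -> bool:
    """True if an integer round-trips exactly through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0] == value


if __name__ == "__main__":
    print(product_bits(24, 24))       # 24-bit sample x 24-bit coefficient
    print(accumulator_bits(24, 6))    # summing six 24-bit channels
    print(survives_float32(2**23 - 1), survives_float32(-(2**23)))
    print(survives_float32(2**24 + 1))  # needs a 25-bit significand
```

A full-precision 24-by-24-bit multiply yields up to 48 bits, more than a 32-bit word holds, which is one reason fixed-point DSPs commonly provide accumulators wider than their data words. Summing six full-scale 24-bit channels needs only 27 bits, leaving guard bits to spare in a 32-bit word, and every 24-bit sample maps exactly onto a 32-bit float, while 2**24 + 1 does not.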
In Part 2 of this article series, I'll discuss the other half of the technology behind the high-resolution audio hype: high sampling rates. Have a relaxing intermission and tune in to the next issue of EDN for the rest of the show.
For more information...
When you contact any of the following manufacturers directly, please let them know you read about their products in EDN.
- Analog Devices: www.analog.com
- Audio Engineering Society (AES): www.aes.org
- Digigram: www.digigram.com
- Digital Theater Systems (DTS): www.dtsonline.com
- Master Orange Entertainment: www.master-orange.de
- Meridian Audio: www.meridian-audio.com
- NEC: www.nec.com
- Oktava: http://oktava.tula.net
- Sony: www.sony.com
- Sound and Vision Magazine: www.soundandvisionmag.com
Author Information
Technical editor Brian Dipert is off to listen to the culinary frequencies emanating from his microwave oven (note: not the interference it creates in his cordless phone, of which he is already intimately aware) and the quantization noise in his latest AC/DC CD. (Any excuse to "crank it up" is a good excuse.) Reach him, and his amplifier that goes to "11," at 1-916-454-5242, fax 1-916-454-5101, bdipert@edn.com, and www.bdipert.com.
References
