3-D audio: above and behind you or moving low left to right?
By Maury Wright, Technical Editor - June 6, 1996
The latest in a seemingly endless succession of audio technologies, 3-D audio promises to quickly pervade desktop computer systems. Vendors loosely define 3-D audio as techniques that range from low-cost spreading of a stereo signal to accurate, DSP-based positioning of sound sources in a 3-D space. Only the former is shipping in volume on PCs. The latter will ultimately serve in any number of embedded and entertainment applications. However, the industry will have to make significant cost reductions before true 3-D audio moves into every PC.
Most people are familiar with some type of audio-positioning technology, especially in applications such as home- or commercial-theater systems. For example, Dolby Pro Logic Surround Sound home-entertainment systems use five speakers plus a subwoofer (5.1 channels) to enhance the sound field. When watching "Top Gun" on such a home-theater system, the viewer perceives that the Navy jets are actually passing overhead.
Dolby Pro Logic is actually a standard for recording and playing multichannel audio in an analog format. The follow-on Dolby AC-3 standard provides similar capabilities but assumes that each channel is digitally encoded. Dolby AC-3 audio will accompany MPEG-2 video on next-generation digital video disks (DVDs). Both technologies primarily target enhanced soundtracks for movies. The technologies require six speakers to play the multichannel soundtrack.
The application for 3-D audio on a PC or in an embedded application is quite different from that of the home theater. One of the most compelling applications for 3-D audio is interactive games. True 3-D audio technology can create perceptions such as a door creaking from behind or a rocket streaking by the user. The audio enhancement fully immerses the gamer into the fictional world.
The technology has implementations outside the computer-game environment, however. For example, embedded implementations for the medical world could include hearing or reflex-reaction test or therapy systems. Other applications could include flight, driving, or other types of simulators.
Compared with Dolby-type soundtracks, computer-oriented 3-D audio applications are typically interactivity-based. The user's actions regularly alter the sound sequence. Computer and embedded installations are also unlikely to have six speakers available because of cost, space, portability, or environmental constraints. Therefore, 3-D audio for the computer environment must create the positioning perception for the user with only two speakers or a pair of headphones. To achieve this level of realism, the audio applications rely on signal-processing algorithms armed with knowledge of the human hearing system (see box, "3-D audio biophysics"). These algorithms can accurately position sounds using only two speakers.
|3-D audio biophysics|
|When you are perusing the various 3-D audio offerings, it may be helpful to understand the biophysical techniques the companies use to create the 3-D perception. Each vendor may have distinct algorithms, but all algorithms rely on some basic biophysical properties of the human hearing system; the structure of the human ear; and even the effect of the head, torso, and shoulders.
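Generally, all 3-D audio implementations take advantage of three biophysical phenomena: interaural time difference (ITD), interaural intensity difference (IID), and the head-related transfer function (HRTF).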
ITD refers to the sound's hitting one ear before the other. Audio vendors can use variable delay in stereo channels to create the perception of left/right movement or the location of a sound source.
The IID phenomenon results from the audio shadow of the head. Sound waves that come from one side of the head are quieter, or lower in intensity, at the far ear because the head partially blocks them. IID effects can significantly broaden the perception of left/right audio control and can be thought of as a pan control.
The vendors can inexpensively leverage both ITD and IID. Most of the 3-D audio implementations shipping in PCs take advantage of both phenomena. Such products offer the ability to spread or enhance stereo sound but lack the ability to accurately locate sound sources in a 3-D space. Some products can locate sources along a 2-D arc. The actual enhanced-stereo implementations can be handled by analog filters on the audio output or in a DSP-based codec that drives the audio DAC. The implementations that locate sound along a 2-D arc typically depend on signal processing that is performed on the host processor.
Conversely, the HRTF phenomenon requires substantial signal-processing power to harness and can yield accurate 3-D sound positioning. HRTF refers to the difference between a source sound and the actual sound that reaches the inner ear. The main contributor to the HRTF phenomenon is the asymmetrical shape of the outer and inner ear. To a lesser degree, the head, torso, and shoulders also affect the sound that reaches the inner ear. True 3-D audio implementations must leverage the HRTF to offer up/down and front/back sound control.
Full 3-D implementations that use all three phenomena leverage sophisticated filtering techniques that present sounds to the ears just as if the sounds came from a specific direction. The most popular of these techniques is called "binaural synthesis" and is designed for use with headphones. Crosstalk cancellation can be added to binaural synthesis for speaker-based systems.
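The ITD and IID cues described in this box can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's algorithm: the ITD uses Woodworth's spherical-head approximation, a constant-power pan law stands in for IID, and the head radius, the function names, and the azimuth convention (0° straight ahead, +90° hard right) are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s at room temperature
HEAD_RADIUS = 0.0875     # m; an assumed average head radius

def itd_seconds(azimuth_deg):
    """Woodworth spherical-head estimate of ITD for azimuths from -90 to +90."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

def pan_gains(azimuth_deg):
    """Constant-power pan law standing in for IID; returns (left, right) gains."""
    # Map -90..+90 degrees onto 0..pi/2.
    angle = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(angle), math.sin(angle)

def position_mono(samples, azimuth_deg, sample_rate):
    """Return (left, right) lists; the far ear is delayed (ITD) and quieter (IID)."""
    delay = int(round(abs(itd_seconds(azimuth_deg)) * sample_rate))
    gain_l, gain_r = pan_gains(azimuth_deg)
    near = list(samples) + [0.0] * delay   # undelayed copy, padded to equal length
    far = [0.0] * delay + list(samples)    # delayed copy for the far ear
    if azimuth_deg >= 0:                   # source on the right: left ear is far
        left, right = far, near
    else:
        left, right = near, far
    return [gain_l * s for s in left], [gain_r * s for s in right]

left, right = position_mono([1.0, 0.0, 0.0, 0.0], 45, 44_100)
```

With these assumed values, the maximum delay at 90° is roughly two-thirds of a millisecond. Cues of this kind move a source left or right only; they cannot supply the up/down and front/back control that the HRTF provides.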
There are overlaps between the Dolby world and the computer-audio environment. For example, DVD players available next year will be used in both computers and home-entertainment systems. Inevitably, users will demand that the computer multimedia subsystem play DVD movie titles. Using true 3-D audio and only two speakers, a computer could project all six AC-3 channels so that a listener would perceive six speakers positioned around the room.
As stated before, 3-D audio is a loosely used term. Low-end 3-D audio refers to technologies that enhance standard stereo signals. Vendors use such adjectives as "spatial-enhancement" or "sound-diffusion" to describe these low-end technologies that spread sound on a 2-D arc around a listener.
Azimuth-positioning (left/right-control) technologies that can position or move sound sources along the 2-D arc comprise the midrange of 3-D audio. At the high end of the spectrum are true 3-D sound-positioning technologies. True 3-D technologies can locate multiple sound sources, each of which can be anywhere in a 3-D space around the listener. The positioning technologies cost significantly more than the stereo-enhancement technologies and require content that developers have specifically coded for the 3-D environment.
You should start your consideration of 3-D technologies at the low end because of the low cost and the wide availability. In the simplest case, the spatial-enhancement technologies operate on a standard stereo signal. A listener in front of a pair of closely spaced speakers would perceive the speakers as being far to each side. Parts of the stereo signal that are piped to both channels, such as vocals in a music soundtrack, appear centered between the speakers.
Simple spatial enhancement can be performed in analog filters on standard stereo outputs or in a DSP-based codec that feeds the audio-output DAC of a digital sound system, such as a sound card. The implementations vary frequency and phase to present an enhanced sound field (Figure 1). Desktop implementations are widely available. QSound, Spatializer, and SRS Labs have been pushing the technology for more than a year. Each company has a list of systems, sound cards, and speakers that implement the enhancements.
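A generic mid/side widener gives a feel for how a stereo signal can be spread while centered material stays put. This sketch is not the QSound, Spatializer, or SRS algorithm; those products vary frequency and phase, and their details are proprietary. The function name and the `width` parameter are hypothetical.

```python
def widen_stereo(left, right, width=1.5):
    """Mid/side widening: width=1.0 leaves the signal unchanged, >1.0 widens it."""
    out_l, out_r = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)    # content common to both channels, such as vocals
        side = 0.5 * (l - r)   # content that differs between the channels
        out_l.append(mid + width * side)
        out_r.append(mid - width * side)
    return out_l, out_r

# A sound present only in the left channel is pushed further left.
wide_l, wide_r = widen_stereo([1.0], [0.0], width=2.0)
```

Because a signal that is identical in both channels has no side component, it passes through unchanged, which matches the observation that vocals piped to both channels appear centered between the speakers.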
The designer can turn to a number of sources to buy stand-alone, spatial-enhancement ICs. QSound, for example, directly markets the QX-2010 and QX-2130 ICs that primarily differ in supply voltage and current. ESS, OnChip Systems, and Panasonic offer ICs under licensing agreements with Spatializer. Seponix Corp offers the SRS5250 under license from SRS. All of these stand-alone chips are analog implementations and cost as little as $2 in the large volumes that are common with the target markets.
Binaura Corp offers what is perhaps the simplest spatial-enhancement circuit, the Universal Discrete analog design. The circuit requires only a quad op amp and a handful of resistors and capacitors. The company hopes to have a stand-alone IC available this year but is licensing the design for now. Creative Labs ships the Binaura circuit on all newer sound cards. Creative sells the circuit, along with sound-generation ICs, on an OEM basis.
In many cases, however, designers will want to buy 3-D spatial-enhancement technology in a more highly integrated form. For example, both QSound and SRS have licensed Crystal Semiconductor to produce audio codecs with integrated spatial enhancement. Crystal offers the $42.50 (10,000) CS4237 with SRS 3-D stereo technology and the $44.50 (10,000) CS4238 with QXpander technology. Both codecs are pin-compatible with the $39.50 (10,000) CS4236. The devices include standard features, such as plug-and-play compatibility, delta-sigma converters, a MIDI (musical-instrument digital-interface) UART, and a joystick and CD-ROM interface.
In both cases, Crystal's designers implemented the 3-D audio on the DSP embedded in the codec. The DSP implementation offers software developers and, potentially, end users the opportunity to change filter settings and modify such parameters as the width of the sound field. The digital implementation also leaves the possibility of adding other special effects, such as reverberation.
Analog vs digital
Some vendors claim an analog implementation is preferable. For example, Analog Devices includes spatial enhancement in the AD1816 codec ($20 (10,000)). The company developed its own spatial-enhancement technology. Despite designing a DSP into the codec, the designers chose to partition the Stereo Phase Expander (SPX) as an analog function at the output of the codec (Figure 2). The analog implementation easily supports 3-D processing of external sources, such as an audio CD or a microphone, as well as digital sources that pass through the codec. To process external analog sources on a DSP implementation, the analog source must be digitized, spatially enhanced, and then converted back to analog.
You can also implement spatial-enhancement technologies using algorithms on standard DSPs. For example, both Motorola and Yamaha have licensed SRS 3-D Stereo and are planning to offer the algorithms for use on their DSPs. Similarly, QXpander algorithms are available on the Pine core from the DSP Group, on devices from Yamaha, and on the SPX DSP from NEC. Prices for algorithm implementations are rarely quoted, but royalties are typically equal to or less than the cost of a stand-alone spatial-enhancement IC.
Positioning technologies pose a challenge for audio designers. Designers must choose between a lower-cost, azimuth-only implementation and a full-blown 3-D implementation. The difficulty lies in how a programmer specifies the positioning information in the coding process. Understanding the software issues is paramount to making the best decision on implementing 3-D positional audio.
All of the positional technologies allow a developer to preprocess an audio program on a professional audio workstation, add all of the positional sounds, and play the program through standard stereo speakers. Preprocessing works fine for audio recordings, and much of the 3-D audio technology has a professional-audio heritage. Preprocessing does not work in an interactive environment in which actions of a user require changes in the way the system implements a sound. Interactive applications such as games require a real-time, runtime processing resource that can respond to the user.
Companies shipping positional technology have custom application-programming interfaces (APIs) that programmers can use to develop interactive games. In some cases, programmers can use the APIs on consumer-game consoles, and, in other cases, the programmers can use the APIs on a PC.
Vendors have implemented current APIs for the PC as custom dynamic-link libraries (DLLs). In fact, a software developer has to write software for a specific 3-D audio target. As you may expect, Microsoft (Redmond, WA) holds the key to an industry-standard API. Such an API would ultimately allow programmers to write software on any Windows-based PC. Moreover, a standard API is equally important to developers of embedded applications as it is to game developers. Embedded-system designers will almost assuredly find that an embedded PC provides the lowest cost avenue to using 3-D audio.
At the Windows Hardware Engineering Conference (WinHEC) (April 1 to 3, 1996, San Jose, CA), Microsoft announced the Direct3DSound API as yet another component of the DirectX API suite. An alpha version of the API is available, a beta version will ship with a developer's kit in July, and the company will include a production release with DirectX III in August.
As usual, however, Microsoft's API plans have met with a mixture of vehement boos and cheers. Microsoft set out to make the 3-D API available across all future PCs. To do so, the company decided it must have a common API on all systems, regardless of the capabilities of the runtime resource that implements 3-D positioning. A low-end system may rely strictly on the host processor and deliver only left/right azimuth control. A high-end system, meanwhile, could run the same software and deliver full 3-D positional audio by leveraging an auxiliary DSP.
The 3-D audio plan seems like a reincarnation of Intel's native-signal-processing (NSP) initiative. Unlike NSP, however, the controversy has little to do with host-based signal processing and everything to do with the envelopment of the 3-D audio algorithm within the API. Microsoft plans to break the positioning operation into two steps. The first step, which the company calls the "interesting-but-not-computationally complex" step, consists of calculating the 3-D filtering parameters. The second step applies the parameters to a simple filter. Iteratively running the filter in real time is computationally complex and, in practice, requires a DSP for the second step to produce 3-D sound.
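A rough operation count shows why the second step dominates. The sketch below assumes the "simple filter" is an FIR filter run once per ear per source, which Microsoft has not confirmed, and every number in it (tap count, update rate, operations per parameter update) is hypothetical and chosen only to show the ratio between the two steps.

```python
def parameter_ops_per_second(ops_per_update, updates_per_second, sources):
    """Step 1: operations spent recomputing filter parameters as sources move."""
    return ops_per_update * updates_per_second * sources

def filter_macs_per_second(taps, sample_rate_hz, sources, ears=2):
    """Step 2: multiply-accumulates to run one FIR filter per ear per source."""
    return taps * sample_rate_hz * sources * ears

# Hypothetical workload: four moving sources at a 22.05-kHz sample rate.
step1 = parameter_ops_per_second(ops_per_update=500, updates_per_second=30, sources=4)
step2 = filter_macs_per_second(taps=32, sample_rate_hz=22_050, sources=4)
ratio = step2 / step1
```

The parameter calculation runs only when a source moves, whereas the filter runs on every sample. Under these assumed figures the filter costs nearly 100 times as much, which is consistent with a partitioning that leaves step one on the host and hands step two to a DSP.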
The controversial aspect of the first step is that it includes the very intellectual property on which 3-D audio companies hope to profit. At this point, Microsoft has not indicated whether it has developed its own algorithm based on public-domain knowledge and the general principles in the box, "3-D audio biophysics," or whether it has licensed technology from one of the vendors covered in this article. If Microsoft has licensed the technology, the chosen vendor is maintaining a good poker face for now.
Assuming that the 3-D audio API continues as planned, it's unclear how the 3-D audio vendors will sell their products. The companies may be able to replace Microsoft's API implementation with their own APIs implemented as DLLs and offer better quality or some other differentiating characteristic. For now, however, the companies are at a loss to describe which products they plan to offer or how much they plan to charge for their technology.
Several companies are pursuing positional technologies and have products available in some form today. The first step up from spatial enhancement, azimuth positioning, allows one or more monaural sound sources to be positioned along a 2-D arc. In fact, the arc can extend to as much as 270°, so that the user might perceive the sound as coming from over his shoulder.
QSound is currently the only company shipping an azimuth-positioning product. The company's Q1 algorithm is available in several formats. You can buy preprocessing software that can be suitable for some applications. QSound sells add-on software modules that work with popular audio-editing software for the Macintosh and the PC. Software that can generate simple sounds costs approximately $500, and a professional-audio configuration can cost more than $1000.
QSound has also licensed Q1 to a number of game vendors for both consumer-console and PC games. The company has its own Windows DLL available for now. The company offers licenses on a one-title, one-platform basis or on a two-year site license that doesn't limit titles and games. Licenses are individually negotiated, and pricing isn't made public. Demonstration software is available free, however, so you can easily evaluate the technology.
QSound believes that azimuth-only 3-D is appropriate near term for the largest group of PCs, because a 100-MHz or faster Pentium can generate the left/right positioning in real time without affecting performance. For example, positioning four channels of 11-kHz audio requires only 2.8% of the cycles in a 100-MHz Pentium.
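The 2.8% figure translates into a per-sample cycle budget. The arithmetic below is a back-of-the-envelope check, assuming that "11-kHz" means the common 11,025-Hz PC sample rate; the function name is hypothetical.

```python
def cycles_per_sample(cpu_hz, load_fraction, channels, sample_rate_hz):
    """CPU cycles available for each output sample of each channel."""
    cycles_per_second = cpu_hz * load_fraction
    samples_per_second = channels * sample_rate_hz
    return cycles_per_second / samples_per_second

# 100-MHz Pentium, 2.8% load, four channels at an assumed 11,025 Hz.
budget = cycles_per_sample(100e6, 0.028, 4, 11_025)
```

Under that assumption, the budget works out to between 63 and 64 cycles per sample per channel, enough for a delay and a gain change but far short of what a long HRTF filter would need.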
Q1 is a speaker-based algorithm, and QSound is working on adding elevation and range dimensions. The company also has a headphone-based algorithm under development that has yet to ship but that will support true 3-D.
Spatializer, meanwhile, demonstrated a technology called "Worf" at WinHEC that offers true 3-D but that doesn't require a DSP. The algorithm executes on the host and leverages the capabilities of the Spatializer 3-D Stereo ICs from ESS, OnChip, and Panasonic. The demonstration at WinHEC was rough around the edges. It worked on speakers or headphones and did deliver azimuth, range, and elevation perceptions. Sound movement, however, was highly nonlinear. Moreover, the simple demonstration provided no way in which to judge how much burden the algorithm places on the host processor. Spatializer has promised a production version around press time and has hinted that per-unit cost will be only incrementally higher than the price of the Spatializer IC.
True positional sound
As you might expect, leading sound-card vendor Creative Labs is developing positional audio. In fact, any of its boards that use the E-mu 8000 IC can handle positional audio through speakers or headphones. For example, the popular AWE32 card can handle wave-table and positional audio and sells for around $225 from discount retailers.
Creative hasn't heavily promoted the positional capability to date due to a lack of a common API, but it does offer an API for the E-mu 8000. The E-mu 8000 IC was actually developed by E-mu Systems (Creative Labs and E-mu Systems are both subsidiaries of Creative Technology Ltd in Singapore). The chip is DSP-based but is heavily customized for sound. Creative offers the chip on an OEM basis for the PC market, and E-mu sells the chips into other markets such as set-top boxes or embedded systems. Depending on volume, the chip sells for $30 to $40.
To date, the most complete implementation of 3-D audio comes from Crystal River Engineering. The company's AudioReality algorithm requires the horsepower of a DSP but can deliver complex sounds in a 3-D space. For example, a DSP-based AudioReality implementation can play 5.1-channel Dolby AC-3 over two speakers and achieve the perception of six speakers. This capability may quickly become the benchmark by which algorithms earn the true 3-D audio moniker.
Crystal River's algorithm is based on binaural synthesis and is designed for use over headphones (Figure 3). For speaker-based applications, the signal must be filtered to add crosstalk cancellation. A sound card adds this filter between the headphone jack and the speaker jacks.
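At its core, binaural synthesis filters a monaural source with a pair of head-related impulse responses, one per ear. The sketch below is a generic illustration and not Crystal River's AudioReality code. The impulse-response values are made-up placeholders rather than measured HRTF data, and the crosstalk-cancellation filter needed for speakers is omitted.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def binaural_synthesis(mono, hrir_left, hrir_right):
    """Filter one mono source with a left/right impulse-response pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Placeholder responses for a source on the listener's right:
# the left ear hears it later and quieter than the right ear.
hrir_left = [0.0, 0.0, 0.4, 0.2]
hrir_right = [0.9, 0.3, 0.1, 0.0]
left, right = binaural_synthesis([1.0, 0.5], hrir_left, hrir_right)
```

A real implementation would select or interpolate a measured response pair for each source direction and update the pair as the source moves, which is where the per-source processing load comes from.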
Using Crystal River's algorithm, a typical 16-bit 22-kHz sound requires 2 to 3 MIPS of DSP power. Hosting 10 such sources on an Analog Devices 2181 DSP would require 50% of the chip's power. In addition to the 2181, the algorithms are available for the Motorola 56000 family and Aureal Semiconductor (formerly, MediaVision) ASP301. The latter is designed specifically for sound cards and together with a PCI chip sells in the $30 to $40 range, depending on volume.
The company offers free demo versions of its software for designers that want to evaluate the AudioReality algorithm. The software runs on several DSP cards, including Turtle Beach sound cards that have an onboard DSP. Motorola also offers a 56000-based evaluation board called the DSP-56009EVM. The board was designed for evaluation of Dolby AC-3 but can also run Crystal River's software.
Only two other vendors have demonstrated true 3-D sound to date. Harman Interactive (sister company to audiovisual-product company Harman Kardon) demonstrated an implementation running on a Motorola 56000 processor at WinHEC. The speaker-based demonstration used binaural-synthesis technology similar to that used by Crystal River. Harman has yet to announce actual product plans.
Yamaha is the only other vendor that's currently shipping true 3-D audio in addition to offering QSound and SRS algorithms. The company's YSS225 Effects Processor includes an embedded DSP core and implements what the company calls Y Sound, a speaker-oriented 3-D audio technique. The YSS225 IC can play AC-3 5.1-channel audio over two speakers. The $10 (10,000) IC can also generate other effects, such as reverb, chorus, echo, distortion, delay, and karaoke.
Perhaps Aureal Semiconductor, Creative/E-mu, and Yamaha have the most clear-cut path to making a volume business out of positional 3-D audio. Crystal River, QSound, Spatializer, and SRS have and will always have safe niches in professional-audio development. Their future in the volume business depends largely on Microsoft.
Depending on what Microsoft does with an API, E-mu or Yamaha may make money on their library of intellectual property. Spatial-enhancement audio will soon be the standard on PCs. Positional techniques will be used on top of soundtracks already bolstered by spatial enhancement. E-mu and Yamaha own all of these capabilities. Aureal, Creative/E-mu, and Yamaha should, in any case, stand to generate revenue via the combination of ICs and positional algorithms.
|Shortly, every PC sound card and most stereos will include spatial-enhancement technology as a standard feature. The technology is practically free today in either IC or algorithmic form. Two obstacles, meanwhile, stand in the way of widely available positional 3-D audio. The first block is Microsoft. Designers hope that the company will deliver an application-programming interface on time (for a change) and leave a way for companies to make money from their intellectual property. We can only wait patiently for this obstacle to fall.
The second obstacle is the cost of true 3-D audio. Most early implementations will require a dedicated DSP that can add significantly to the cost of a consumer product, such as a sound card. Suitable stand-alone DSPs from Analog Devices, Motorola, or others cost $20 or more and likely need external memory, other support circuits, and a royalty for the 3-D audio algorithm used. Embedded applications can absorb this premium, but widespread desktop deployment will require lower cost through integration. Even the Aureal and E-mu ICs are being targeted at premium-priced systems rather than at baseline PCs.
Perhaps more highly integrated ICs will debut this summer. Yamaha is rumored to be adding its positional technology to a future version of an OPL3-family IC. The OPL3 ICs are widely used on sound cards, and some already integrate codecs, synthesizer, plug-and-play interface, and other features for just over $10. The ICs already include a DSP core, and Yamaha will have a sure winner if it can add positional audio at a small price premium.
You can reach Technical Editor Maury Wright at (619) 748-6785, fax (619) 679-1861.
|For free information|
|Aureal Semiconductor Inc
Menlo Park, CA
|Creative Labs Inc
|Crystal River Engineering
Palo Alto, CA
|Crystal Semiconductor Corp
|DSP Group Inc
Santa Clara, CA
|E-mu Systems Inc
Scotts Valley, CA
|ESS Technology Inc
|Harman Interactive Group
San Jose, CA
Mountain View, CA
|OnChip Systems Inc
San Jose, CA
|Panasonic Industrial Corp
|QSound Labs Inc
Calgary, AB, Canada
San Jose, CA
|Spatializer Audio Laboratories Inc
Woodland Hills, CA
|SRS Labs Inc
Santa Ana, CA
|Yamaha Systems Technology Inc
San Jose, CA