May 22, 1997

Videoconferencing goes to POTS

Stephen Kempainen, Technical Editor

Believe it or not, the POTS infrastructure can now deliver videoconferencing. Because of new standardized codecs, low-cost, interoperable products provide toll-quality audio as well as video that ranges from acceptable to very good.

Telecommuters beware: Videoconferencing is ready to invade your home office, bringing an end to teleconferences in which you're still unshaven or wearing your pajamas. At the same time, however, videoconferencing introduces exciting and as yet unimagined adventures in communications.

Enabling these adventures is H.324, the standard from the International Telecommunication Union's Telecommunication Standardization Sector (ITU-T) for multimedia conferencing on plain-old telephone service (POTS). H.324 defines new standards for transmitting video, voice, and data over one analog telephone line. Possibilities abound because related ITU-T recommendations for videoconferencing on LANs and WANs use the same interoperable substandards as H.324, enabling even greater interoperability among videophones with different features and manufacturers.

Before H.324, products required ISDN phone lines, which added cost and hassle to videoconferencing. In addition, an expensive and proprietary "boardroom" product from one manufacturer could not conduct a call with the product of another manufacturer. Add the fact that the video and audio quality were inconsistent, and you can see why videophones have not enjoyed widespread use among the general public.

H.324 addresses these problems, but another roadblock to widespread use is that even the adventurous consumer expects a videophone to have the quality and reliability of his telephone. The H.324 standard also addresses these expectations. Delivery occurs through the digitally switched, worldwide POTS infrastructure that has the bit rate and guaranteed circuit connections necessary for multimedia calls.
In addition, the H.324 standard relies on the proven bandwidth of V.34 modems to carry simultaneous video, audio, and data. However, since the completion of H.324 in March 1996, few products have appeared on the shelves, so consumer purchases have yet to show whether the POTS signal can carry enough video data to meet quality expectations. The audio quality is not in question, because the new codec techniques provide toll quality or better. But the video quality is suspect.

Consumers will have to adapt to videophone calls as they do to regular phone calls, making sure that calls happen in quiet surroundings and that participants sit relatively still in a brightly lit place with a visually quiet background. Only then will the new video codec be able to properly transfer facial expressions in color for full-sized displays. If the codec must analyze excessive motion and poor lighting, the video frame rate will slow, and the images will blur. If consumers are willing to adjust their expectations, videoconferencing will ride H.324 to mainstream consumer products.

The almost-universal connection of every home and business to POTS provides a ready market for videoconferencing. The popularity of V.34 modems and powerful, cheap computers and videoconferencing chips, in addition to software techniques for compression, will pave the way for multimedia conferencing on low-bit-rate, circuit-switched networks.

H.324 provides for interoperability among diverse terminals, including PC-based systems, PBX videophone systems, stand-alone videophones, Web browsers with live video, telemedicine terminals, and remote-security and surveillance cameras. ITU-T standards for other networks provide even more variety for interoperable videoconferencing products. Standards for all network connections consist of substandards (Table 1), both mandatory and optional, for the audio and video codecs.
The table shows how some of the substandards are common across a number of the overall videoconferencing standards, meaning that interoperation could be as simple as translating from one communication format to another. For example, the video codec H.261 is mandatory for ISDN, POTS, LAN, and asynchronous-transfer-mode (ATM) implementations. Also evident is the commonality of control-function and data-transfer standards, which further eases interoperability.

As the first videoconferencing standard, created by the ITU-T in 1990, H.320 for ISDN served as the architectural model for all the follow-on standards. H.324, the first follow-on standard, focused on the analog telephone local loop; therefore, performance on low-bit-rate networks was of primary importance. Also important were flexibility, options, and backward compatibility with H.320. However, H.324 video quality will not match that of H.320 because of POTS' limited bandwidth: 33.6 kbps compared with ISDN's 128 kbps. (Because the new 56-kbps modems provide their bit rate in only one direction, they might only minimally improve videoconferencing bandwidth.)

In addition, there are two videoconferencing standards for LANs: one for those that can guarantee bandwidth and another for those that can't. H.323 is for packet-switched networks that cannot guarantee bandwidth, such as Ethernet. Ethernet's dominance in the business market is driving much product development around this recommendation. The other standard, for guaranteed-bandwidth LANs such as ATM, is H.310. This standard is making strides in compatibility with the MPEG standards, so future products will be able to interoperate with the next MPEG standard for video compression, MPEG-4.

A primary goal of H.323 was to establish interoperability with all other terminal types for videoconferencing.
Hence, in addition to the terminal specifications, H.323 defines such components as gatekeepers for conference admissions, multipoint controllers for group conferences, and gateways for interoperability with H.324, H.310, and H.320. The gateways translate call signaling, control-channel messages, and multiplexing techniques between terminals. A gateway can often avoid transcoding the compressed audio and video, because the terminals on either side may share a common mandatory or optional algorithm.
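
The gateway's choice between passing a compressed stream through and transcoding it comes down to whether the two terminals share a codec. The following is a minimal sketch of that check, not an implementation of any ITU-T procedure. The `Terminal` class, the function names, and the capability sets are illustrative assumptions; they are not a copy of Table 1, and real terminals exchange capabilities through control-protocol messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Terminal:
    """Hypothetical description of one terminal's codec capabilities."""
    name: str
    audio: frozenset
    video: frozenset


def common_codecs(a: Terminal, b: Terminal) -> dict:
    """Return the codecs both terminals support, per media type."""
    return {
        "audio": sorted(a.audio & b.audio),
        "video": sorted(a.video & b.video),
    }


def needs_transcoding(a: Terminal, b: Terminal) -> dict:
    """A media type needs transcoding only when no codec is shared."""
    return {media: not codecs for media, codecs in common_codecs(a, b).items()}


# Illustrative capability sets (assumptions for this sketch only).
h324 = Terminal("POTS videophone", frozenset({"G.723.1"}),
                frozenset({"H.261", "H.263"}))
h323 = Terminal("LAN terminal", frozenset({"G.711", "G.723.1"}),
                frozenset({"H.261"}))
h320 = Terminal("ISDN terminal", frozenset({"G.711"}),
                frozenset({"H.261"}))

print(common_codecs(h324, h323))
print(needs_transcoding(h324, h320))
```

With these assumed sets, the POTS and LAN terminals share both an audio and a video codec, so the gateway only has to translate signaling and multiplexing. The POTS and ISDN terminals share H.261 video but no audio codec, so only the audio would need transcoding.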
Toll-quality audio is critical

Even though the term "videoconferencing" implies that video is of primary importance, without the audio there would be no conference. Inconsistent or dropped video frames are annoying but tolerable, whereas bad audio can end a call. Therefore, the audio stream and its quality need primary design attention. Recent work to optimize audio coders for specific applications focuses on trade-offs in attributes such as bit rate, complexity, delay, and quality. (See box, "Speech coders for low-bit-rate multimedia communications," for an explanation of speech-coding attributes.)

The G.723.1 audio-codec standard addresses low-bit-rate videoconferencing systems and produces a near-toll-quality audio signal. The two bit rates for G.723.1 are 5.3 and 6.3 kbps; the higher bit rate uses the less complex algorithm and therefore less processor power. The estimated processing power needed to implement the algorithm on a fixed-point DSP is 23 MIPS for the low bit rate and 18 MIPS for the high bit rate. As speech coders go, both algorithms are toward the low end of the complexity scale.

G.723.1 also has a provision for silence-suppression coding, which is possible because usually only one party talks at a time. The listener's terminal can eliminate its audio portion from the multimedia stream and use that bandwidth to improve the video signal when the other person is talking. However, complete silence on a call might make a listener think he has lost his connection. For this reason, the receive-end audio DSP adds simulated background noise to increase the user's comfort level with the technology. When the listener starts talking, the voice-activity detector turns the audio codec back on.

The talker's transmitter-control unit decides whether to use the high or low bit rate for the audio codec.
The listener's receiver signals its preference for either high or low bit rate to the transmitter using H.245, the new substandard for control protocols, during call setup and whenever the receiver's preference changes. The transmitter can change rates during a call, depending on receive-end feedback and audio complexity. This change is possible because each audio bit-stream frame carries the coder rate as a part of its syntax.

The G.723.1 speech coder adds a one-way system delay of about 100 msec to the audio stream, which is less than the video coder adds. If lip-synch capability is important to the end product, a delay in the receive-audio path synchs the audio to the video (Figure 1). The transmitter uses H.245 to send a message describing the skew between the transmitted audio and video streams. The receiver uses this information to adjust the delay in the receive-audio path. The receiver may choose no added delay if the video-frame rate is low, because at five to seven frames/sec, the motion of facial expressions is not going to closely match the audio anyway. However, at 10 to 15 frames/sec, the absence of lip-synch becomes distracting. Therefore, the receive terminal should add audio delay to match the video delay.

Another problem that the audio-stream delay causes is acoustic echo from the receiving speakerphone. In a full-duplex videophone, the listener's microphone picks up the room's audio energy radiating from the listener's speaker and sends it back to the talker. This echo can be disconcerting to the talker and becomes worse after adding delay for lip-synch. Therefore, including acoustic-echo cancellation (AEC) in a videophone is a very good idea. A good AEC algorithm typically uses about 15 MIPS. Total audio processing uses about 23 MIPS for coding, 15 MIPS for AEC, and 5 MIPS for audio-control messages, for a total of approximately 43 MIPS.
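
The audio-processing total above is simple arithmetic, and a small sketch makes it easy to rerun with different assumptions. The MIPS figures are the estimates quoted in this article. The `mips_budget` helper and the task labels are hypothetical names for illustration, and the 60-MIPS capacity is the figure this article quotes for Hitachi's SH-DSP.

```python
# Estimated audio-processing loads in MIPS, as quoted in the article.
AUDIO_TASKS_LOW_RATE = {
    "G.723.1 coding at 5.3 kbps": 23,
    "acoustic-echo cancellation": 15,
    "audio-control messages": 5,
}

# Same tasks with the less complex 6.3-kbps coder (18 MIPS).
AUDIO_TASKS_HIGH_RATE = {
    "G.723.1 coding at 6.3 kbps": 18,
    "acoustic-echo cancellation": 15,
    "audio-control messages": 5,
}


def mips_budget(tasks: dict, capacity_mips: int) -> tuple:
    """Return (MIPS used, MIPS left over) for a processor of given capacity."""
    used = sum(tasks.values())
    return used, capacity_mips - used


used, spare = mips_budget(AUDIO_TASKS_LOW_RATE, capacity_mips=60)
print(f"low rate:  {used} MIPS used, {spare} MIPS spare")

used_hi, spare_hi = mips_budget(AUDIO_TASKS_HIGH_RATE, capacity_mips=60)
print(f"high rate: {used_hi} MIPS used, {spare_hi} MIPS spare")
```

On a 60-MIPS part, the low-rate coder leaves 17 MIPS of headroom and the high-rate coder leaves 22 MIPS, which is the margin available for control functions and other features.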
Because low system cost is crucial to H.324, a single processor that can handle the task and still have MIPS to spare, such as Hitachi's SH-DSP, is beneficial. The SH-DSP provides 60 MIPS, which is enough to run any of the videoconferencing speech coders in Table 1 and still have MIPS for control functions and other features, such as speech recognition. Hitachi has the algorithms available for all the ITU-T speech coders. Using one powerful processor is probably better than the alternatives: leaving out some of the features, such as AEC, or using multiple processors.

You wouldn't be alone in assuming that it is the video in POTS videoconferencing (and not the audio) that will go to pot. Video sent over analog phone lines creates images with jerky motion, blurring, and stair-step or jagged-edge artifacts. The low bit rate that analog telephone lines offer presents a challenge to producing good-quality video. Previous videoconferencing experience shows that a rate of approximately 20 frames/sec gives motion images that compare with TV quality. At 15 frames/sec, the motion quality is very good, but the skew between audio and video must stay below 50 msec, or the audio will not match the mouth moving on the video. At five frames/sec, motion quality deteriorates until the lip movement is undetectable anyway, but image clarity is still important.

If you have a bigger display, motion and clarity become even more critical. The size of the display and whether it is LCD or CRT set user expectations. Consider a TV-based videophone connected to POTS by a set-top box. The viewer expects the image on his TV to be TV-quality and cringes at a jerky picture. On the other hand, a videophone on your desktop with a 4-in., color LCD screen or a quarter-common-intermediate-format (QCIF) (176×144 pixels) video window on your notebook PC does not create the same user expectations that the TV display does; viewers are more likely to accept a less-than-TV-quality image on such a display.
So, the task is to cost-effectively transmit enough information at low bit rates and process the received information to fill video displays that satisfy each consumer. The ITU-T undertook this task when it created the recommendation H.263, which addresses video coding for low-bit-rate communications. It specifies a coding technique to compress the moving-picture component of audio-visual signals. The video-compression technique comes from H.261 (the video-compression algorithm of H.320 for ISDN conferencing), with significant improvements in the motion-compensation quality at low frame rates. The target bit rate for H.263 was 28.8 kbps, the rate that a V.34 modem could deliver over POTS in 1995. A low-complexity algorithm, which translates into low cost, was another objective that the ITU-T wanted to include. In addition, H.263 had to be as generic as possible to fit the range of products that are possible in POTS-videoconferencing connections.

Flexibility is just what H.263 delivers. It is so flexible that either CPU software routines or a hardware accelerator can satisfy its processing requirements. It works in a TV-based or stand-alone videophone or in a multimedia PC equipped with a videophone. The motion-picture quality can vary from three to 30 frames/sec, depending on the processing power and bandwidth available. In addition, there are five standardized picture formats: sub-QCIF (128×96 pixels), QCIF, CIF, 4CIF, and 16CIF. Therefore, the video can be a small window with only three to five frames/sec or a full-screen window with 15 or more frames/sec. Product designers make a trade-off between picture quality and cost/complexity by choosing from the H.263 options for compression, image size, and frame rate.

The video quality for H.263 can be very good. Even though the design goal was 28.8 kbps minus the 5.3 or 6.3 kbps for audio, H.263 also slightly outperforms H.261 at higher bit rates.
But at the lower bit rates, the video quality of H.263 is equal to that of H.261 working at five to 10 times H.263's bit rate. H.263 worked so well that MPEG adopted it as the basis for MPEG-4. MPEG and ITU-T will likely converge on one recommendation for next-generation products.

The trade-off between a software and a hardware video codec for H.263 is simply between motion quality and cost. One example of a software codec is Intel's Video Phone v1.2, which runs best on a multimedia-extension (MMX) Pentium processor. In general, the software technique can eliminate the need for a separate video codec, but it provides only a QCIF window at about three to 15 frames/sec; the frame rate depends on available CPU cycles. And you still need a video-capture card when using an analog camera with a software video codec. On the other hand, a hardware video codec can deliver a CIF (352×288 pixels) or bigger window at 15 frames/sec. Hardware can deliver more frames per second, because software doesn't use all the complex, full-motion estimation algorithms and compression options offered by H.263 that a hardware video codec does.
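
The bit rates, picture formats, and frame rates quoted above imply a rough per-frame bit budget, which shows how hard the video codec has to work. The sketch below is back-of-the-envelope arithmetic under stated assumptions: it ignores multiplexing and data-channel overhead, and it assumes uncompressed video at 12 bits per pixel (4:2:0 sampling). The sub-QCIF, QCIF, and CIF dimensions appear in this article; the 4CIF and 16CIF dimensions are the standard multiples of CIF. All function names are hypothetical.

```python
# Picture formats as (width, height) in pixels.
PICTURE_FORMATS = {
    "sub-QCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}


def video_bps(line_bps: int, audio_bps: int) -> int:
    """Bits per second left for video after audio (overhead ignored)."""
    return line_bps - audio_bps


def bits_per_frame(video_rate_bps: int, frames_per_sec: float) -> float:
    """Average compressed bits available for each video frame."""
    return video_rate_bps / frames_per_sec


def raw_bits_per_frame(fmt: str, bits_per_pixel: int = 12) -> int:
    """Uncompressed frame size; 12 bits/pixel assumes 4:2:0 sampling."""
    width, height = PICTURE_FORMATS[fmt]
    return width * height * bits_per_pixel


def required_compression(fmt: str, video_rate_bps: int,
                         frames_per_sec: float) -> float:
    """Ratio of raw frame size to the per-frame bit budget."""
    return raw_bits_per_frame(fmt) / bits_per_frame(video_rate_bps,
                                                    frames_per_sec)


# 28.8-kbps modem with the 6.3-kbps G.723.1 rate, QCIF at 10 frames/sec.
rate = video_bps(28_800, 6_300)
ratio = required_compression("QCIF", rate, 10)
print(f"{rate} bps for video, about {ratio:.0f}:1 compression needed")
```

Under these assumptions, a QCIF window at 10 frames/sec has about 2250 bits per frame to spend, so the codec must compress by roughly two orders of magnitude. That gap is why frame rate, window size, and picture quality trade off so directly on a POTS connection.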
An interesting insight on software systems comes from Microsoft's (Redmond, WA) NetMeeting 2.0, which adds H.324 and H.323 audioconferencing and videoconferencing to the company's Internet-telephony and multipoint-data-conferencing capabilities. Beta testing of NetMeeting 2.0 reveals poor video-image quality. Microsoft lists some possible causes, such as too large an image and insufficient camera lighting, both of which overload the CPU. Suggested fixes include connecting the camera through a video-capture card to help unload the CPU, closing programs to conserve CPU power, reducing the size of the image, and increasing camera lighting. You can get more information on NetMeeting for Windows 95 and NT at www.microsoft.com/netmeeting.

The video-capture card for software codecs can be simple, but it is essential for speeding video-image processing. For example, the Brooktree Bt848 Video Capture for PCI chip integrates the video-capture system. It incorporates functions from the analog video-input circuitry to the PCI-master interface. The Bt848 helps the CPU by directly delivering down-scaled video to the system memory in YUV format. The resized and formatted video is ready for immediate codec processing by the CPU.

Filtering noise from the video-input signal helps not only software codecs but all video codecs. Noise from poor video-input circuitry, poor lighting, and electrical interference wreaks havoc on compression algorithms. The Bt848 has a five-tap vertical filter at the video input. The filter reduces high-frequency noise in the video signal, enabling better compression of the signal.

You can eliminate the need for a separate video-capture card by using a video-accelerator chip with video input, such as the CL-GD5480 VisualMedia Accelerator from Cirrus.
The chip uses Cirrus' V-Port standard interface for a digital-camera input, performs linear scaling by filtering for scale-down, and performs interpolation for zooming without using the host processor. The chip also has two hardware windows and a mirroring feature that allows the local videoconferencing user to see himself.
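
To illustrate why a five-tap vertical filter at the video input helps compression, the sketch below low-pass filters each column of a small luma frame. It is a generic example, not the Bt848's filter: the binomial taps (1, 4, 6, 4, 1), the edge clamping, and the `vertical_lowpass` name are all assumptions chosen for illustration.

```python
def vertical_lowpass(frame, taps=(1, 4, 6, 4, 1)):
    """Apply a five-tap low-pass filter down each column of `frame`.

    `frame` is a list of rows of integer luma samples. The taps are an
    assumed binomial kernel, not the coefficients of any real chip.
    Rows beyond the top and bottom edges are clamped to the nearest row.
    """
    height = len(frame)
    width = len(frame[0])
    norm = sum(taps)
    half = len(taps) // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            acc = 0
            for k, tap in enumerate(taps):
                yy = min(max(y + k - half, 0), height - 1)  # clamp at edges
                acc += tap * frame[yy][x]
            row.append((acc + norm // 2) // norm)  # rounded integer divide
        out.append(row)
    return out


# Line-to-line flicker: alternating dark and bright rows, the highest
# vertical frequency a frame can hold.
noisy = [[0, 0], [255, 255], [0, 0], [255, 255], [0, 0], [255, 255]]
smoothed = vertical_lowpass(noisy)
print(smoothed[2], smoothed[3])
```

With this kernel, the interior rows of the alternating pattern collapse to the same mid-gray value, while a flat frame passes through unchanged. Removing that kind of line-to-line variation before encoding leaves the codec fewer spurious differences to spend bits on.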
In addition to the audio and video media channels, H.324 provides for data channels. The data-channel substandard is T.120, and it provides for a plethora of data-transmission functions. T.120 includes electronic whiteboards, computer-application sharing, file transfers, and camera remote control. Standardized formats do not limit the type of data exchange possible in T.120; it allows two terminals to negotiate any type of data exchange between them.

The ITU-T standard recommendations for videoconferencing open the door for more useful products, implying guaranteed interoperability for all products designed for the same recommendation, regardless of manufacturer or complexity. Also, the common substandards within the recommendations enable the design of gateways that can translate compression and multiplexing schemes from one standard to another for even greater interoperability.
Copyright © 1997 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Publishing Company, a unit of Reed Elsevier Inc.