Feature
Video improvements obviate big bit streams
As digital television continues its frustrating nonemergence, interest in interim technologies that make current video sources look their best is on the rise. These enhancements may obsolete HDTV before it gets off the ground.
By Brian Dipert, Technical Editor -- EDN, 3/15/2001
Display progress(ive)
PAL defines a two-field, 625-line-per-frame interlaced signal, which refreshes at a 50-Hz field rate and is equivalent to a 25-Hz frame rate or 15.625-kHz line frequency; SECAM uses PAL-like timings and vertical resolution but employs unique chroma-handling techniques. This article uses NTSC-based terminology. If you live outside of the United States, or if you live within the United States but, in search of maximum-possible vertical image resolution, have invested in a PAL-compatible DVD player, movies, and display, use 25 frames/sec and 50 fields/sec instead of 30 frames/sec and 60 fields/sec, respectively.
All but the latest generation of TVs are interlaced, so until recently, all video cameras also captured interlaced images: first all odd lines in the frame, followed by even lines 1/60 sec later. And, in an era in which televisions that measured more than 20 in. diagonally (the largest size for which the 1939 black-and-white television specification was intended) were unimaginable, interlacing worked well. However, a field-refresh rate of 60 Hz isn't fast enough to prevent the onset of visible phosphor decay between redrawn lines. The television freshly redraws one set of lines, even or odd, as the other set fades. This step helps minimize the phosphor-decay problem, because your eyes and brain interpolate across scan lines and recreate some semblance of the missing information. However, the result is an overall softening of the image, along with flicker that's particularly noticeable in a dark room.
Interlaced-image capturing produces acceptable results if the subjects you videotape are still. But if the subjects move, their locations within the frame shift between the time that the camera captures one set of lines (for example, odd) and when it captures the other set (even) (Figure 3). Look closely at an interlaced display, and you can see artifacts, which go by names such as feathering, jaggies, twitter, judder, and line crawl.
Keep in mind that although the dimensions of the average home and, therefore, the distance from the television screen to the sofa in the living room haven't grown significantly in the decades since NTSC's unveiling, the average screen size
Part of the solution to the video-quality problem involves employing a progressive-scan display, such as a computer-monitor-like CRT, an LCD, a DLP, or a plasma unit. Progressive-scan displays refresh all of the scan lines consecutively, from top to bottom (Figure 1c). The CRT phosphor-decay problem is even more critical with progressive-scan displays than with interlaced displays, because you don't have fresh, even scan lines to visually reinforce fading odd lines and vice versa. So the entire progressive-scan CRT frame refreshes at 60 Hz. Progressive-scan displays based on LCD, DLP, or plasma technology aren't subject to the same phosphor-decay problem as CRTs and can refresh more slowly. However, they're still subject to the approximately 24-frame/sec minimum refresh rate required to fool your eyes and brain into thinking that consecutive still frames are actually continuous motion.
Bob and weave
Once you have a progressive-scan display, how are you going to present interlaced-captured 60-field/sec content on it? If the objects in the image are at rest, the deinterlacing, or line-doubling, solution is simple (Figure 4a): You stitch the odd and even fields together, in a technique commonly called weave.
An alternative approach involves doubling up the odd scan lines to form an entire frame, then duplicating the even lines to form the next frame (Figure 4b). This approach is often referred to as bob.
More elaborate versions of bob and weave interpolate the missing scan-line information in each artificially constructed frame, either from nearby pixels in the same field's scan lines or from pixels at identical locations in past and future fields (Figure 4c).
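The three approaches just described can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: it assumes a field is a list of scan lines, each scan line is a list of luminance values, and the function names are invented for the example. The interpolating variant uses a simple two-line average, which is only one of many possible choices.

```python
def weave(odd_field, even_field):
    """Stitch two fields into one frame by interleaving their scan lines."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(list(odd_line))
        frame.append(list(even_line))
    return frame


def bob(field):
    """Build a full frame from one field by showing every scan line twice."""
    frame = []
    for line in field:
        frame.append(list(line))
        frame.append(list(line))  # duplicate fills the missing line
    return frame


def bob_interpolated(field):
    """Like bob, but fill each missing line with the average of the field
    lines above and below it (the last line is simply repeated)."""
    frame = []
    for i, line in enumerate(field):
        below = field[i + 1] if i + 1 < len(field) else line
        frame.append(list(line))
        frame.append([(a + b) / 2 for a, b in zip(line, below)])
    return frame
```

Weave keeps full vertical detail but combines lines captured 1/60 sec apart, so it suits only still content. Bob never mixes capture times, but each frame carries only half the captured vertical detail.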
The number of pixels that the interpolation process uses and the proportional priority given to the information in each of these pixels differentiates the alternatives. The more complex the algorithm, the more logic gates or lines of code you need to execute it and the faster those gates or the processor running that code needs to operate. Also, the more pixels you use in the interpolation process, the more buffer memory you need to hold the pixels' respective scan lines.
The best approach to deinterlacing combines the best aspects of both bob and weave. Motion-adaptive deinterlacing selects a temporal- or spatial-interpolation algorithm for stationary and moving objects, respectively (Figure 5a and Figure 5b). The selection occurs on a field-by-field, pixel-group-by-pixel-group, or, ideally, pixel-by-pixel basis, because different sections of the image often move in different directions and at different speeds.
How can you tell whether a pixel is in motion? The answer to this question represents the black-magic proprietary technology that no developer is willing to publicly divulge. The Faroudja division of Sage, for example, touts its DCDI, which, according to the company, works well on diagonal edges. If the deinterlacer resides within a DVD or DTV decoder chip prior to the digital-to-analog video-conversion step, you may think it can use the MPEG motion vectors; some first-generation deinterlacers have exclusively employed this technique. Reliance only on motion-prediction vectors is, however, of questionable benefit. Motion vectors do not always correspond to actual motion; rather, they are useful for mathematical expediency. A motion-vector shift may, for example, reflect nothing more than a change in scene lighting that creates a better block-to-block match elsewhere, even with no object motion present.
The deinterlacing-algorithm selection represents a balance of quality and cost.
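Because commercial motion detectors are proprietary, the following Python sketch substitutes the simplest plausible test and should be read as an assumption, not as any vendor's method. It treats a pixel as moving when the co-located pixels in the previous and next fields differ by more than a threshold. The function name, the arguments, and the threshold value of 10 are all hypothetical.

```python
def deinterlace_missing_line(above, below, prev_line, next_line, threshold=10):
    """Reconstruct one missing scan line, pixel by pixel.

    above, below: neighboring lines from the current field (spatial data).
    prev_line, next_line: the line at the missing position, taken from the
        previous and next fields (temporal data).
    threshold: assumed luminance difference that counts as motion.
    """
    out = []
    for a, b, p, n in zip(above, below, prev_line, next_line):
        if abs(p - n) > threshold:
            # Moving pixel: interpolate spatially within the current field.
            out.append((a + b) // 2)
        else:
            # Stationary pixel: reuse the temporal neighbor, as weave does.
            out.append(p)
    return out
```

Even this crude per-pixel test needs three fields in memory at once, which illustrates why algorithm quality, buffer size, and memory bandwidth trade off against one another.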
National Semiconductor's Mediamatics DVD decoding chips, for example, use flag-controlled deinterlacing algorithms, and the company's software DVD decoders employ bob and weave techniques. National plans to migrate to three-field motion-adaptive deinterlacing for next-generation devices and software revisions. Videophiles would probably insist on more sophisticated techniques; whether they can
Remember, too, that DVD decoders, like DTV decoders and graphics accelerators, employ a unified memory architecture and that they use memory not only for deinterlacing but also for audio and video decoding, scaling, and other functions. The choice of deinterlacing algorithm hinges not only on available memory density but also on available memory bandwidth and on the number of functions simultaneously contending for that bandwidth. Graphics accelerators integrated in core-logic chip sets tend to offer limited features and performance compared with stand-alone high-end graphics chips, and similarly, you can't expect an integrated deinterlacer to match the quality of a separate chip tuned for that purpose.
Formatting film
Extracting maximum quality from an interlaced video source for output to a progressive display involves a lot of work. Many video sources, however, aren't interlaced. Examples include progressive-scan video cameras, film, and computer graphics. Theoretically, it should be much easier to progressively display this material. However, reality is more complicated, specifically if the video creators assume their products will appear on an interlaced display.
For example, consider a DVD player. To encode 24-frame/sec film onto a DVD, movie studios put the film through a video-conversion process called telecining, also called 3:2 pulldown (Figure 6a). Not all of the resultant fields are stored on the DVD video disc; embedded control flags instruct the DVD decoder chip to repeat_first_field and put top_field_first, for example.
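To make the 3:2 cadence concrete, the Python sketch below expands film frames into fields and then pairs the fields into interlaced video frames. It assumes the common cadence in which film frames alternately contribute three and two fields while field parity keeps alternating. The function names and the (frame, parity) tuple representation are invented for the example.

```python
def pulldown_3_2(film_frames):
    """Expand film frames into a field sequence using a 3:2 cadence.

    Returns (film_frame, parity) tuples; parity alternates top/bottom
    across the whole sequence, as an interlaced display requires.
    """
    fields = []
    top_next = True
    for index, frame in enumerate(film_frames):
        repeat = 3 if index % 2 == 0 else 2  # 3 fields, then 2, then 3...
        for _ in range(repeat):
            fields.append((frame, "top" if top_next else "bottom"))
            top_next = not top_next
    return fields


def pair_fields(fields):
    """Group consecutive fields into two-field video frames."""
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


def count_mixed(video_frames):
    """Count video frames whose two fields come from different film frames."""
    return sum(1 for first, second in video_frames if first[0] != second[0])


fields = pulldown_3_2(["A", "B", "C", "D"])
video = pair_fields(fields)
```

In this sketch, four film frames become 10 fields, or five video frames, so 24 film frames/sec become 60 fields/sec. The video frames that mix fields from two different film frames are the ones that cause trouble on a progressive display.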
Note, though, that use of these flags, as well as picture_structure=frame, picture_structure=top field, picture_structure=bottom field, and the self-explanatory progressive_frame, is optional. If the flags are absent or if they're incorrectly coded, interlaced display is relatively unaffected. But missing or incorrect flags can cause havoc for an inverse-telecine algorithm.
Why bother with inverse telecining? Two out of every five telecine-encoded video frames contain fields from
Because the inverse-telecine algorithm can't rely on control flags alone, it must buffer, analyze, and attempt to match successive fields to detect the presence of 3:2-encoded film material. Remember, if the algorithm's location is after the digital-to-analog conversion step, it doesn't even have access to the flags for use as a guide or sanity check. Note that even if the inverse-telecine algorithm successfully rejoins the correct fields into frames, it still needs to repeat two out of every five frames to meet the 30-frame/sec television requirement (Figure 6b). This repetition causes slight display stuttering, which explains why among the 18 ATSC formats, several formats support a native display of 24-frame/sec material (Table 1).
You might think that instead of replicating film frames, you could interpolate intermediate frames between the actual 24 frames to fill the 30-frame/sec rate. And you can. However, the artifacts that this process induces are sometimes more visually unpleasant than either replicated frame-video stuttering or the artifacts that result from not doing an inverse 3:2 pulldown at all. Also, full-frame interpolation is computationally intensive; therefore, it is appropriate only for offline rendering or for low-resolution video frames.
Two other huge obstacles await the inverse-telecine algorithm. First, the editing process might have eliminated film frames or, more likely, inserted video material, such as commercials or news clips, between them (Figure 6c).
After detecting telecining, if the algorithm blindly executes an inverse 3:2 pulldown on subsequent frames, it matches up the wrong fields past the point of the edit break. The algorithm should continuously monitor the frame sequence to prevent artifacts as a result of bad editing. The more challenging problem is that 24-frame/sec film-sourced and 60-field/sec video-sourced material, both in motion, can coexist
You might see references to 2:2 pulldowns in video literature. This phrase refers to the conversion of 24-frame/sec film to 25-frame/sec (50-field/sec) PAL or SECAM video. Typically, the telecine algorithm speeds the video and audio by a factor of 25/24 (approximately 1.04) and then interlaces it. Inverse telecining of PAL and SECAM is much simpler than the inverse-3:2-pulldown technique. However, the algorithm must correctly detect whether it's transforming an NTSC or PAL-or-SECAM source and apply the correct pulldown splicing.
Bigger, smaller, taller, and wider
Computer-graphics subsystems and LCD controllers need to upscale and downscale horizontal and vertical dimensions of images and refresh rates to match users' resolution settings and display capabilities. Early computers could drive both interlaced and progressive-scan CRT monitors, as well as TVs, but a progressive-scan CRT or LCD is today's dominant PC display option (see sidebar "High-quality video meets the Internet"). Not surprisingly then, video-enhancement technology is coming not only from companies that traditionally focused on home theaters but also from chip suppliers targeting PCs, such as ATI Technologies, Focus Enhancements, Genesis Microchip, Nvidia, PixelWorks, Sage, Silicon Image, and SmartASIC. This trend is accelerating as PCs expand beyond a 2- and 3-D graphics-only platform and process more still- and video-image content.
Most upscaling and downscaling operations stretch or compress both the horizontal and vertical image dimensions by the same multiplication or division factor to prevent distortion. Upscaling tends to be the algorithmically easier of the two operations and is analogous either to how a digital still camera interpolates from a small CCD or CMOS sensor-captured image to create a larger picture or to the digital-zoom feature of camcorders. Because you're inventing pixels that didn't exist in the original image, the downside of upscaling is the inevitable blurring of previously distinct object edges. Nearest-neighbor, bilinear, and bicubic algorithms all find use in upscaling; nearest-neighbor is the simplest, fastest, and least memory-intensive algorithm, and bicubic produces the most accurate and artifact-free results. To experience upscaling, resize a VGA still image to XGA on your computer; your image-editing software should give you several interpolation-algorithm alternatives. A reconstruction filter that insufficiently bandlimits the interpolated content and, therefore, inadequately suppresses frequency harmonics can cause moiré and jagged edges.
Maintaining good video quality is more difficult when downscaling, which occurs, for example, when you play a DVD movie in a less-than-full-screen window, in a window whose native aspect ratio doesn't match the aspect ratio of the display, or in picture-in-picture applications. You don't want to inadvertently discard important image details, such as the contours of a human face, and end up with a presentation that viewers find disagreeable. You also don't want to distort the image by disproportionally shrinking an object's dimensions. Envision a picket fence consisting of equal-width boards, a crosshatch grid, or any other sequence of parallel lines.
Displaying a downscaled image in which some lines disappear and others end up fatter or thinner than others or in which downscaling alters the spacing of a group of previously equidistant lines won't work. The inverse relationship between time, or in this case location, and frequency requires that the downscaling filter length increase in proportion to the downscaling factor. A discussion of scaling would be incomplete without a review of resolution as it applies to displays. First, you should make sure your terminology is precise. The phrase "lines of resolution" has different meanings depending on whether you're talking about film, which measures the number of differentiable black lines in an image, or video, which counts not only the black lines but also the white spacing between them (Figure 7). Even though NTSC, DVD, and 480p ATSC can deliver 480 lines of vertical resolution, few CRT-based direct-view or rear-projection televisions can display them all. Be careful of televisions that claim that they can decode or even display 720p or 1080i HDTV content; the vendors' careful wording might obscure the reality, which is that the vertical resolution you see is actually much lower. The electron guns inside all but the biggest front-projection CRTs aren't accurate enough to deliver this high resolution. Even and odd scan lines might converge at portions of the screen, or the guns' aim might not exactly match up with the dot pitch of the display, partially illuminating two dots instead of fully illuminating one, for example. This phenomenon is called the Kell factor, and 0.7 is a common value for it. Even if televisions and computer monitors are both progressive-scan devices and cost roughly the same price, they differ in that the TV monitors have larger, brighter screens but with a more relaxed dot pitch than computer monitors; most TV monitors also deliver a slower maximum line-refresh frequency. 
These factors limit a progressive-scan television's maximum visible vertical resolution compared with that of a computer monitor. What's the resolution? Interlaced displays deliver even lower effective vertical resolution than progressive displays, resulting from phosphor decay at low frame-refresh rates and resultant image softening. Keep in mind that the horizontal display resolutions quoted in specification sheets are for only black, white, and gray-shaded patterns dominated by image luminance. Most video formats subsample image chrominance to save storage space and transmission bandwidth, a trade-off that decreases the maximum vertical and horizontal color resolution. Part 2 of this article explores in depth the reasons behind this subsampling (see sidebar "What's next?"). Also, composite video sources combine luminance, chrominance, and sometimes audio in the same broadcast channel. Inaccurate notch or comb filtering to separate the luminance and chrominance, particularly when object motion is involved, can create artifacts and reduce resolution. Lowpass filtering the luminance information to separate it from audio also reduces high-frequency detail. For a high-end 36-in. progressive-scan (34-in. viewable) television with a 0.77-mm center dot pitch, application of the Kell factor results in a calculated visible resolution of 628×470. This result closely matches the vertical resolution that a 480p digital television signal delivers (Table 2). Note, too, that the calculated horizontal resolution, although commonly specified in an edge-to-edge fashion, can be inaccurate. Vendors are supposed to specify only the number of lines of horizontal resolution contained within the diameter of a circle whose dimensions don't extend beyond the screen's top and bottom edges. Only the largest, most expensive front-projection CRT systems can deliver all 720 progressive or 1080 interlaced lines of resolution to the screen.
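The article does not show the arithmetic behind the calculated visible resolution of the 34-in.-viewable set, so the Python sketch below is a reconstruction under stated assumptions: a 4:3 screen, a dot pitch that applies equally in both directions, and a Kell factor of 0.7. The function name and its parameters are invented for the example.

```python
import math

MM_PER_INCH = 25.4


def visible_resolution(diagonal_in, dot_pitch_mm, kell=0.7, aspect=(4, 3)):
    """Estimate visible (horizontal, vertical) resolution of a CRT.

    Divides each screen dimension by the dot pitch to count phosphor
    dots, then derates the count by the Kell factor.
    """
    aspect_w, aspect_h = aspect
    diagonal_units = math.hypot(aspect_w, aspect_h)  # 5 for a 4:3 screen
    width_mm = diagonal_in * aspect_w / diagonal_units * MM_PER_INCH
    height_mm = diagonal_in * aspect_h / diagonal_units * MM_PER_INCH
    horizontal = width_mm / dot_pitch_mm * kell
    vertical = height_mm / dot_pitch_mm * kell
    return horizontal, vertical


horizontal, vertical = visible_resolution(34, 0.77)
```

Under these assumptions, the estimate comes out at roughly 628 by 471, within a line of the figure quoted above. The small difference presumably reflects rounding in the original calculation.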
DLP and LCD technologies may not match CRT's wide viewing angle and color accuracy. However, they're rapidly improving in both of these areas, and they have an edge in applications that value high resolution. Why would you want to horizontally upscale or downscale by a different factor than you vertically scale? Consider, for example, the display of a 16:9 movie frame (sometimes called a 1.77:1 movie frame) on a 4:3 (1.33:1) computer monitor or television (Figure 8a). Unless you select the pan-and-scan mode, which discards portions of the image, you may end up with horizontal blank bars at the top and bottom of the screen (Figure 8b). These bars leave unused available vertical-display resolution and can result in permanent CRT damage in the form of burn-in caused by uneven aging of the CRT's phosphors. Similarly, if uncorrected, a 4:3 image creates black bars at the right and left sides of a 16:9 display. Either simple horizontal or vertical linear stretching causes undesirable fat and short or skinny and tall distortions of objects within each frame. Alternatively, you can nonlinearly stretch the image, with more distortion at the edges of the screen and less distortion at its center, where, theoretically, most viewer attention focuses. Or, if CRT burn-in is your primary concern, you can eliminate the black bars by simply projecting a gray frame or a frame of another color onto the display instead, as Silicon Image does with subsidiary DVDO's iScan Pro. You may occasionally hear DVDs described as "anamorphic" or "16:9 enhanced" DVDs. What do these terms mean? Moviemakers who want to use all of the available vertical resolution of film place on the camera a special lens that squeezes an image's horizontal dimensions (Figure 8c). When a movie theater projects film through a reverse-effect lens, the correct dimensions are restored (Figure 8d). 
Similarly, instead of placing a wide-screen image within a 4:3 frame as part of the film-to-DVD transfer, thereby throwing away vertical resolution at the top and bottom of the frame, a video engineer can do an anamorphic transfer. The television, along with an appropriately configured DVD player, handles the restretching of the frame to correct dimensions, and the resulting image uses all 480 vertical lines of resolution that the DVD video format supports.
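The vertical-resolution benefit of an anamorphic transfer comes down to simple arithmetic, which the following Python sketch works through. It assumes a 480-line frame and computes how many of those lines carry picture when an image of one aspect ratio is letterboxed into a frame of another. The function name is invented for the example.

```python
from fractions import Fraction


def active_lines(frame_lines, frame_aspect, image_aspect):
    """Return the number of scan lines that carry picture when an image
    is letterboxed into a frame; the remaining lines are black bars."""
    if image_aspect <= frame_aspect:
        return frame_lines  # image is no wider than the frame: no bars
    return int(frame_lines * frame_aspect / image_aspect)


# 16:9 movie letterboxed inside a 4:3 frame.
letterboxed = active_lines(480, Fraction(4, 3), Fraction(16, 9))

# The same movie stored anamorphically, so the frame itself is 16:9.
anamorphic = active_lines(480, Fraction(16, 9), Fraction(16, 9))
```

In the letterboxed case, only 360 of the 480 lines carry picture, and the other 120 lines store black bars. The anamorphic transfer uses all 480 lines, one-third more vertical detail from the same disc.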
Author Information
ACKNOWLEDGMENTS
Special thanks to Dr Nikhil Balram, Vice President of Advanced Technology for the Faroudja division of Sage, to video-processor product manager Paul Wolf from Silicon Image, and to Ayre Acoustics' senior design engineer Charles Hansen. I'd also like to acknowledge the contributions of George Alfs from Intel, Kent Goodin and Raj Narayan from National Semiconductor, Diane Vanasse from Nvidia, and Brad Garofalo and Biao Zhang from SmartASIC. This article ran on page 83 of the March 15, 2001 issue of EDN.















The reasons behind the slow-as-molasses rollout of digital terrestrial-television programming are well-documented and frequently discussed (
Contact Technical Editor Brian Dipert at 1-916-454-5242, fax 1-530-937-8147, e-mail 
