Feature
DCT scaling enables universal MPEG decoder
In rapidly evolving multimedia technology, a need exists for higher-resolution pictures. Image scaling can meet that need, but traditional methods can
Ram Prabhakar, Cirrus Logic -- EDN, 6/6/1996
An MPEG encoder can specify a 4:2:0, 4:2:2, or 4:4:4 image format. Most VGA terminals need a 4:2:2 format, however, and D-1 recorders need the 4:4:4 format. For these devices to display the video with the appropriate resolution, the video's chrominance components (Cb and Cr) need rescaling.
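As a rough illustration of what rescaling between these formats means for the chrominance planes, the following Python sketch tabulates the chroma plane size for each format and the scale factor between two formats. The function names and the 352×240 luminance size (a SIF-sized picture, consistent with the 176×120 chrominance size the article uses later) are assumptions made for the example.

```python
from fractions import Fraction

# Chroma-plane divisors relative to luma: (horizontal, vertical).
CHROMA_DIVISORS = {
    "4:2:0": (2, 2),  # half width, half height
    "4:2:2": (2, 1),  # half width, full height
    "4:4:4": (1, 1),  # full width, full height
}


def chroma_size(luma_width, luma_height, fmt):
    """Return (width, height) of one chroma plane (Cb or Cr)."""
    h_div, v_div = CHROMA_DIVISORS[fmt]
    return luma_width // h_div, luma_height // v_div


def chroma_scale(src_fmt, dst_fmt):
    """Return (horizontal, vertical) chroma scale factors from src to dst."""
    src_h, src_v = CHROMA_DIVISORS[src_fmt]
    dst_h, dst_v = CHROMA_DIVISORS[dst_fmt]
    return Fraction(src_h, dst_h), Fraction(src_v, dst_v)


if __name__ == "__main__":
    # Assumed SIF-sized luma plane of 352x240 samples.
    print(chroma_size(352, 240, "4:2:0"))   # (176, 120)
    print(chroma_scale("4:2:0", "4:2:2"))   # vertical doubling only
    print(chroma_scale("4:4:4", "4:2:0"))   # halving in both directions
```

Only the Cb and Cr planes change size; the luminance plane is the same in all three formats.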
The common method of image scaling works in the spatial domain. Scaling in the spatial domain, however, requires large memory arrays and adds significant latency to image reconstruction. An alternative approach works in the frequency, or DCT, domain. This approach manipulates the encoded video's DCT coefficients so that the decoding step produces the display format an application requires. Only the video's chrominance components change; the luminance components (Y) remain unaffected.

Performing image scaling for an MPEG decoder in the DCT domain affects the function of the three highlighted blocks: the inverse DCT (IDCT), the motion-compensation step, and the frame-store memory (Figure 1).

There are two types of IDCTs. The MPEG standard recommends use of the 2-D type-2 IDCT. The type-2 IDCT produces image pixels f(x,y) from DCT coefficients F(u,v) using Equation 1, where C(u) and C(v) are 1 for u or v=1, 2, 3,..., (N-1) and are 1/√2 for u or v=0 or N.

To produce a 4:2:0 image, the type-2 IDCT works with an 8×8 block, a total of 64 coefficients. Scaling up to a 4:4:4 image uses a 16×16 block. Scaling down from 4:4:4 to 4:2:0 uses 4×4 blocks. Both cases involving the 4:4:4 format manipulate 256 coefficients.

The scaling operation replaces the standard IDCT block with the functions in Figure 2. To scale up an image, the method first upsamples the chrominance components, X(m), using Equation 2 and then multiplies the resulting block with the DCT coefficients of an anti-imaging lowpass filter. The convolution property of the DCT makes this multiplication equivalent to interpolating in the spatial domain.

The anti-imaging lowpass filter is an even-length symmetrical filter with a cut-off frequency of π/2. The filter's design uses the Remez exchange algorithm to obtain a proper frequency response. If the filter function is h(n), as Equation 3 gives, then the filter's right half is hr(n), as Equation 4 gives, where L is the number of filter coefficients in h(n) and N is the IDCT's block size.
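The right-half construction can be sketched in a few lines of Python. This is a minimal illustration, assuming h(n) is stored as a list ordered from n=-L/2 to n=L/2-1. The function name `right_half` and the four-tap example filter are hypothetical; the taps are illustrative values, not the output of a Remez design.

```python
def right_half(h, block_size):
    """Return hr(n) for n = 0..N-1 from an even-length symmetric filter h.

    h is assumed to be ordered n = -L/2, ..., 0, ..., L/2 - 1, so list
    index i corresponds to n = i - L/2. hr(n) copies h(n) for
    n = 0..L/2-1 and is zero for n = L/2..N-1.
    """
    length = len(h)
    if length % 2 != 0:
        raise ValueError("filter must have an even number of taps")
    if length > 2 * block_size:
        raise ValueError("filter may have at most 2*N taps")
    half = list(h[length // 2:])              # h(0) .. h(L/2 - 1)
    return half + [0.0] * (block_size - len(half))


if __name__ == "__main__":
    # Hypothetical 4-tap symmetric lowpass filter, N = 8.
    h = [0.125, 0.375, 0.375, 0.125]
    print(right_half(h, 8))
```

The length check mirrors the article's constraint that the filter can have at most twice as many taps as the block size, so a 16-point block admits up to 32 taps.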
To use the anti-imaging filter in the DCT domain, you need its filter-transform coefficients (Equation 5). Multiplying these coefficients with the results of Equation 2 produces an interpolated DCT block. Performing an IDCT on the interpolated block produces the rescaled image.

Downscaling uses a similar procedure but reverses the order. The method first applies an antialiasing filter, hr(m), to the chrominance components, X(m), and then down-samples the result. Equation 6 gives the filtered chrominance components, and down-sampling yields Yd(m) (Equation 7). The inverse DCT of Yd(m) yields the rescaled image.

The upscaling and downscaling of images affects only the chrominance components. To upscale a 4:2:0 format to a 4:2:2 format, for example, the method doubles the number of chrominance components in the vertical direction. Downscaling a 4:4:4 format to 4:2:0 halves the chrominance components both horizontally and vertically. An 8×8 block becomes a 4×4 block.

Motion estimation also scales

These rescaling algorithms operate on a fully encoded MPEG image frame. Part of MPEG's compression algorithm, however, replaces individual frames with motion-estimation data that allows the decoder's motion-compensation block to reconstruct the missing frame from other frames. Typically, the block must calculate motion vectors for both luminance and chrominance components. The motion vectors for chrominance components are scaled versions of the luminance vectors. For example, the chrominance vectors for an image upscaled from 4:2:0 to 4:4:4 are the same as the luminance vectors.

Rescaling images in the DCT domain outperforms spatial-domain interpolation, because calculations in the DCT domain are relatively loss-free. However, the filters needed for DCT rescaling are larger than the filters used in spatial rescaling. The size difference stems partly from symmetry requirements.
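The chrominance motion-vector scaling described in this section can be illustrated with a short Python sketch. It assumes the chrominance vector is the luminance vector divided by the chroma subsampling factor in each direction, with the quotient truncated toward zero. The truncation rule and the function names are assumptions for the example; the article itself states only that the chrominance vectors are scaled versions of the luminance vectors and that they are identical for 4:4:4.

```python
# Chroma-plane divisors relative to luma: (horizontal, vertical).
CHROMA_DIVISORS = {
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),
    "4:4:4": (1, 1),
}


def _div_toward_zero(value, divisor):
    """Integer division that truncates toward zero (assumed rounding rule)."""
    magnitude = abs(value) // divisor
    return magnitude if value >= 0 else -magnitude


def chroma_vector(luma_vector, display_format):
    """Scale a luminance motion vector (dx, dy) for the chroma planes."""
    h_div, v_div = CHROMA_DIVISORS[display_format]
    dx, dy = luma_vector
    return _div_toward_zero(dx, h_div), _div_toward_zero(dy, v_div)


if __name__ == "__main__":
    luma = (6, -3)
    for fmt in ("4:2:0", "4:2:2", "4:4:4"):
        print(fmt, chroma_vector(luma, fmt))
```

When the decoder upscales the chrominance to 4:4:4, the divisors are both 1, so the chrominance planes reuse the luminance vectors unchanged.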
Because DCT-domain resizing is a point-wise convolution-multiplication and a sampling-rate change, the DCT-domain filter should have an even number of taps. Further, for symmetric convolution, the filter can have at most twice as many taps as the DCT block size and must have a right half, as Equation 4 defines. The largest block you encounter in the DCT domain is 16×16, so the largest filter size is 32 taps.

The hardware needed to implement a filter in the DCT domain does not depend on the filter's size. Therefore, you can use the largest filter with no impact on hardware or latency. In the spatial domain, by contrast, the hardware needed grows with the filter's length. The most common spatial filter is a seven-tap filter with coefficients (-29, 0, 140, 256, 140, 0, -29), as recommended by the MPEG standard. The 32-tap DCT-domain filter requires more operations than the seven-tap spatial filter, however, as the following comparisons show.

DCT interpolation takes more processing

Interpolation in the spatial domain from a 4:2:0 SIF (source input format) picture to 4:2:2 works with a chrominance size of 176×120 samples. Using a seven-tap filter, interpolation of one chrominance sample takes three multiplies and two additions. Each multiply operation uses three shifts and two adds, yielding 3×5+2=17 operations on each pixel. Interpolating 176×120=21,120 pixels takes 359,040 basic operations per chrominance component. There are two chrominance components, so interpolation in the spatial domain from 4:2:0 to 4:2:2 format takes approximately 720,000 operations.

Interpolation from 4:2:0 to 4:4:4 in the spatial domain requires interpolation from 4:2:0 to 4:2:2 and then interpolation from 4:2:2 to 4:4:4 using the same basic principle. The number of basic operations to interpolate 4:2:0 to 4:4:4 is, thus, approximately 1.4 million.

Interpolation in the DCT domain from 4:2:0 to 4:2:2 works with 16×8-sample blocks.
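The spatial-domain operation count above can be reproduced with a few lines of Python. The constants come straight from the article's assumptions (three multiplies and two adds per sample, each multiply costing three shifts and two adds); the variable and function names are made up for the sketch, and the 4:4:4 total follows the article's approximation that the second pass costs the same as the first.

```python
# Cost model taken from the article's spatial-domain assumptions.
MULTIPLIES_PER_SAMPLE = 3
ADDS_PER_SAMPLE = 2
OPS_PER_MULTIPLY = 3 + 2   # three shifts plus two adds

CHROMA_WIDTH, CHROMA_HEIGHT = 176, 120   # 4:2:0 SIF chroma plane
CHROMA_COMPONENTS = 2                    # Cb and Cr


def spatial_ops_per_sample():
    """Basic operations (shifts and adds) to interpolate one sample."""
    return MULTIPLIES_PER_SAMPLE * OPS_PER_MULTIPLY + ADDS_PER_SAMPLE


def spatial_ops_420_to_422():
    """Basic operations to interpolate both chroma planes to 4:2:2."""
    samples = CHROMA_WIDTH * CHROMA_HEIGHT
    return samples * spatial_ops_per_sample() * CHROMA_COMPONENTS


if __name__ == "__main__":
    print(spatial_ops_per_sample())       # 17
    print(spatial_ops_420_to_422())       # 718080, roughly 720,000
    # Article's approximation: a second, equal-cost pass reaches 4:4:4.
    print(2 * spatial_ops_420_to_422())   # 1436160, roughly 1.4 million
```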
After manipulating the inverse-quantized DCT coefficients in the DCT domain using Equation 2, interpolation requires a type-2 IDCT on the block. The transform uses 160 multiplies and 864 adds. Assuming one multiply is four shifts and three adds, the transform translates to 2014 basic operations. For a chrominance of 176×120, 330 blocks remain after manipulating the DCT coefficients. Thus, interpolating in the DCT domain uses approximately 665,000 basic operations. For two chrominance components, interpolation takes 1.33 million operations.

To interpolate from 4:2:0 to 4:4:4, the IDCT block grows from 16×8 to 16×16, and the number of operations doubles. Thus, the number of operations needed to interpolate both the chrominance components is 2.6 million.

Interpolating in the DCT domain is programmable by just changing the IDCT coefficients and the block size. Although the examples show interpolation by two in each direction, decimation and other combinations for interpolation are possible. The resized image in the DCT domain has a better resolution than the resized image in the spatial domain. Further, unlike spatial-domain interpolation, DCT-domain image resizing can upscale and downscale MPEG picture formats without changing the hardware required. This unique architecture can, thus, process and decode MPEG bit streams in any format.

Author's biography

Ram Prabhakar has been a design engineer at the Visual Systems Division of Cirrus Logic Inc (Fremont, CA) for eight months. In his current position, he designs, implements, and develops video and graphics products, such as MPEG decoders and 3-D chips. He attended State University of New York (Stony Brook, NY). In his spare time, Prabhakar enjoys basketball, golf, bridge, and tennis.
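The DCT-domain operation count from the section "DCT interpolation takes more processing" can be checked the same way. The sketch below uses the article's figures; all names are made up for the example. One caveat: with a multiply costed at four shifts plus three adds, the stated 160 multiplies and 864 adds sum to 1,984 operations per block, slightly below the 2014 the article quotes. The totals of 665,000 and 1.33 million follow from the quoted 2014, so the sketch uses that figure for the totals and computes the 1,984 separately for comparison.

```python
# Figures taken from the article's DCT-domain assumptions.
IDCT_MULTIPLIES = 160
IDCT_ADDS = 864
OPS_PER_MULTIPLY = 4 + 3            # four shifts plus three adds
ARTICLE_OPS_PER_BLOCK = 2014        # per-block figure quoted in the article

CHROMA_WIDTH, CHROMA_HEIGHT = 176, 120
INPUT_BLOCK = 8                     # 8x8 coded chroma blocks
CHROMA_COMPONENTS = 2               # Cb and Cr


def ops_per_block_from_parts():
    """Per-block cost recomputed from the multiply and add counts."""
    return IDCT_MULTIPLIES * OPS_PER_MULTIPLY + IDCT_ADDS


def block_count():
    """Number of 8x8 chroma blocks in one 176x120 chroma plane."""
    return (CHROMA_WIDTH // INPUT_BLOCK) * (CHROMA_HEIGHT // INPUT_BLOCK)


def dct_ops_420_to_422(ops_per_block=ARTICLE_OPS_PER_BLOCK):
    """Basic operations to interpolate both chroma planes to 4:2:2."""
    return block_count() * ops_per_block * CHROMA_COMPONENTS


if __name__ == "__main__":
    print(ops_per_block_from_parts())    # 1984
    print(block_count())                 # 330
    print(dct_ops_420_to_422())          # 1329240, roughly 1.33 million
    # The 16x16 IDCT for 4:4:4 doubles the count, per the article.
    print(2 * dct_ops_420_to_422())      # 2658480, roughly 2.6 million
```

Under either per-block figure, the DCT-domain total stays close to twice the spatial-domain count of about 720,000 operations, which is the comparison the article draws.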
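Figure 1 highlights the three decoder blocks that DCT-domain scaling affects. This scaling least affects the frame-store memory, which must simply be large enough to handle the largest format, 4:4:4. The IDCT block performs most of the work.

Equations

(1) Type-2 IDCT, in the standard form that MPEG defines:
$$f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\,C(v)\,F(u,v) \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N}$$

(2) [Equation: upsampling of the chrominance components, X(m)]

(3) $h(n)$, for $n = -[L/2], \ldots, 0, \ldots, [L/2]-1$

(4) $$h_r(n) = \begin{cases} h(n), & n = 0, 1, 2, \ldots, [L/2]-1 \\ 0, & n = [L/2], \ldots, N-1 \end{cases}$$

(5) [Equation: filter-transform coefficients of the anti-imaging filter]

(6) [Equation: filtered chrominance components]

(7) [Equation: down-sampled result, Yd(m)]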













