Read the information-rich Addendum to this article.
The embedded systems you design are transmitting, receiving, and manipulating increasingly rich data types, such as high-resolution-color still and video images, 2- and 3-D graphics, and high-fidelity audio. This information expands system versatility and enhances the user-interaction experience, but it also increases the required sizes of archival and temporary storage. RAM, ROM, and disk drives are getting cheaper and denser all the time, but until they're zero-cost and infinitely large, you can't afford to ignore them.
All those bits passing within a system or from one system to another consume precious I/O bandwidth. Wider, faster, and more complex buses burn power; generate noise; and increase pin count, packaging costs, and silicon costs. If your system's incoming or outgoing information passes through a toll-based interconnect, such as the Internet, you must also consider these operating expenses. However, a bandwidth-starved system with slower or more unpredictable response time than its less-featured predecessor is an unacceptable compromise. A more realistic scenario is that users expect both richer information and faster perceived performance.
If bandwidth and memory aren't free, how can you still deliver a rich multimedia experience to those individuals who use the systems you design? More and more, engineers are turning to data compression, implemented in both software algorithms and hard-coded circuits, to keep bandwidth and storage requirements at manageable levels while effectively transporting more information in a given amount of time. This trend makes good sense when you consider that the rate of logic integration and performance improvement is outpacing advancements in both interconnect speed and memory cost.
Two advanced compression techniques, wavelets and fractals, promise to reduce the memory and bandwidth needed to achieve acceptable quality levels and to significantly improve the multimedia experience at a given level of bandwidth and memory over today's mainstream techniques. Before exploring these new developments, this article reviews the theory behind the algorithms currently in widespread use. You also might want to refresh your knowledge of basic lossless and lossy compression concepts.
JPEG is today the most common way of storing and exchanging still-photograph image information. JPEG standardization efforts began in 1986, with the first specification version published in 1991. The JPEG committee's goal was to develop a technique that retained maximum image quality while keeping file sizes at reasonable levels and remaining in line with cost-effective processing capabilities and memory densities. Realize that when the JPEG committee first met, the first 386 microprocessor had just begun shipping, and DRAM cost more than $100/Mbyte.
Most JPEG implementations manipulate images in their YCrCb (or luminance/chrominance) format rather than their RGB or CMYK (cyan-magenta-yellow-black) formats. The human eye also uses RGB-to-luminance/chrominance conversion to reduce redundancy and dynamic range and to best use available bandwidth in the optic nerves leading to the brain. Because the eye responds more precisely to brightness information than it does to color, JPEG optionally supports subsampling (decimating) the two chrominance matrices but not the luminance matrix. In a 4:4:4 scheme, each 8×8 matrix of RGB pixels converts to three YCrCb 8×8 matrices: one for luminance (Y) and one for each of the two chrominance bands (Cr and Cb). A 4:2:2 scheme also creates one 8×8 luminance matrix but decimates every two horizontal pixels to create each chrominance-matrix entry. As a result, for every two 8×8 RGB-pixel-matrix sources, you end up with two 8×8 luminance matrices, one 8×8 Cr matrix, and one 8×8 Cb matrix--two-thirds the amount of data in a 4:4:4 scheme.
Ratios of 4:2:0 decimate chrominance both horizontally and vertically, resulting in four Y, one Cr, and one Cb 8×8 matrix for every four 8×8 pixel-matrix sources. This conversion creates half the data required in a 4:4:4 chroma ratio. For this reason, 256-level gray-scale JPEG images aren't usually much smaller than their 24-bit color counterparts, because most JPEG implementations aggressively subsample the color information. Color data therefore represents a small percentage of the total file size. Decimation takes many forms, from simply using every other data value to averaging two values or using even more complex algorithms.
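The data ratios quoted for 4:2:2 and 4:2:0 can be checked with a short Python sketch. It assumes the full-range conversion constants commonly used with JFIF files and decimation by simple averaging, which is only one of the forms mentioned above; the function names are illustrative and not part of any JPEG library.

```python
def ycbcr_from_rgb(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF-style constants, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample(plane, h_factor, v_factor):
    """Decimate a 2-D plane by averaging each h_factor x v_factor neighborhood."""
    rows, cols = len(plane), len(plane[0])
    out = []
    for r in range(0, rows, v_factor):
        out_row = []
        for c in range(0, cols, h_factor):
            block = [plane[rr][cc]
                     for rr in range(r, min(r + v_factor, rows))
                     for cc in range(c, min(c + h_factor, cols))]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def sample_count(width, height, h_factor, v_factor):
    """Total number of Y, Cb, and Cr samples after chroma decimation."""
    luma = width * height
    chroma = (width // h_factor) * (height // v_factor)
    return luma + 2 * chroma


full = sample_count(16, 16, 1, 1)          # 4:4:4
print(sample_count(16, 16, 2, 1) / full)   # 4:2:2, two-thirds of 4:4:4
print(sample_count(16, 16, 2, 2) / full)   # 4:2:0, one-half of 4:4:4
```

A neutral gray pixel converts to chrominance values at the midpoint (128), which is why gray-scale content carries almost no information in the Cb and Cr planes.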
Lossy JPEG (there's also a lossless variant) takes advantage of pixel-to-pixel correlation by converting each matrix of pixels' information from the spatial to the frequency domain using a discrete cosine transform (DCT). The committee chose a matrix size of 8×8 after carefully considering processing and memory impacts. An 8×8 DCT yields one dc and 63 ac coefficients, whereas a 16×16 matrix, for example, would have significantly increased the logic-engine burden and would have resulted in 256 coefficients to manage in on-chip registers or memory locations.
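As a rough illustration of the spatial-to-frequency conversion, the following Python sketch computes an orthonormal 8×8 DCT directly from its definition. It is deliberately unoptimized (real codecs use fast factorizations), and the function name is hypothetical.

```python
import math

N = 8  # JPEG block size


def dct_2d(block):
    """Return the 8x8 DCT-II coefficients of an 8x8 block (direct, slow form)."""
    def alpha(k):
        # Normalization that makes the transform orthonormal.
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


# A perfectly flat block has energy only in the single dc coefficient;
# all 63 ac coefficients are (numerically) zero.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))
```

The more slowly pixel values vary across a block, the more of the block's energy lands in the dc term and the first few ac terms, which is what the later quantization step exploits.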
Keeping in mind that accuracy in low-frequency information is more important than high-frequency detail, lossy JPEG then quantizes the ac coefficients, with a larger division constant for high-frequency data. This fine-to-coarse quantization results in a large number of zeros in the data set, which JPEG further encourages by ordering the coefficients from dc to high frequency in a zigzag pattern to produce long runs of zeros in the bitstream (Figure 1). Run-length-encoding (RLE) compression of ac components transforms these long zero strings into more manageable bit lengths. JPEG codes the dc coefficient as a difference from the dc coefficient of the previous 8×8 matrix. The final compression step uses variable-length Huffman or arithmetic compression to reduce commonly occurring RLE-value-number-pair and dc-difference-coefficient sizes.
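The quantize, zigzag, and run-length steps described above can be sketched in a few lines of Python. The quantization table here is a made-up ramp, not the table from the JPEG specification, and the run-length coder is simplified: it omits the run-length cap and the Huffman stage that a real encoder applies.

```python
def zigzag_order(n=8):
    """Return (row, col) positions in zigzag order, from dc to highest frequency."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[int(round(coeffs[r][c] / table[r][c])) for c in range(8)]
            for r in range(8)]


def run_length_encode(ac_values):
    """Encode ac values as (zeros_skipped, value) pairs plus an end-of-block mark."""
    pairs, run = [], 0
    for value in ac_values:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # any trailing zeros are implied
    return pairs


def encode_block(coeffs, table, previous_dc):
    """Return (dc difference, run-length pairs) for one block of DCT coefficients."""
    q = quantize(coeffs, table)
    scanned = [q[r][c] for r, c in zigzag_order()]
    return scanned[0] - previous_dc, run_length_encode(scanned[1:])


# Hypothetical table: divisors grow with frequency (coarser at high frequency).
TABLE = [[8 + 4 * (r + c) for c in range(8)] for r in range(8)]

coeffs = [[3.0] * 8 for _ in range(8)]   # small high-frequency "noise"
coeffs[0][0], coeffs[0][1], coeffs[1][1] = 800.0, -60.0, 48.0
print(encode_block(coeffs, TABLE, previous_dc=90))
# -> (10, [(0, -5), (2, 3), 'EOB'])
```

All 61 small coefficients quantize to zero, so the whole block reduces to a dc difference, two run-length pairs, and an end-of-block marker.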
One key JPEG advantage over other lossy compression schemes is that, regardless of file size, the software routines and hardware-acceleration circuits are identical. Larger-pixel-count images just take longer to encode and decode. As a result, JPEG-based systems benefit from industry standardization, high-volume-manufacturing economies of scale, and multiple vendor sources. JPEG can typically compress a color image to 15-to-1 before the compression artifacts appear (Figure 2). These artifacts manifest themselves as blocky false pixels (because of low-frequency coefficient quantization) and edge blurring (because of high-frequency coefficient quantization). Fortunately, the artifacts, being pixel-bound, are more noticeable on low-resolution output devices, so an image may look much better on a 300-dpi printer than when artificially enlarged on a 72-dpi monitor.
JPEG-conversion and -compression software almost always allows the user to select a desired performance, file-size, and quality level, controlling variables such as the quantization-constant table entries and chrominance-decimation ratios. Unfortunately, there's no standardized way of defining quality. Adobe (www.adobe.com) Photoshop uses a 0-to-10 scale with the larger number corresponding to higher quality. JASC's (www.jasc.com) Paintshop Pro uses a 0-to-100 scale with the smaller numbers giving the highest quality results. Other programs offer only high, medium, and low quality settings. Because these settings don't normally change, regardless of whether you're saving a JPEG for the first time or modifying an already-converted JPEG file, many users complain of rapid image degradation after repeated editing.
Another problem with JPEG is that it's difficult to predict in advance how big your compressed file will be, because results vary depending on each image's characteristics and the desired quality level. Progressive JPEG, with support finally becoming widespread in editing software and World Wide Web browsers, goes a long way toward solving this issue. A progressive-JPEG encoding algorithm makes multiple passes through a source file, storing compressed data at increasingly higher resolution levels without repeating low-resolution information. The multipass encoding algorithm can stop or prompt the user when the algorithm reaches a desired file-size threshold, even if it hasn't achieved the specified quality level.
Progressive JPEG also creates files that are more Web-friendly than baseline (standard) files. The sizes of the files are comparable with standard JPEG versions, but when downloading these files from a Web server, the viewer receives an almost-instantaneous, full-sized, low-quality version of the image. The progressive-JPEG image improves in quality with additional download time. Decoding a progressively coded JPEG file requires more processing power than does a standard baseline equivalent--another factor that, until recently, limited progressive JPEG's pervasiveness.
Variable quantization, a recently added JPEG capability, enables the encoder to vary the nonlinear quantization-constant table within a file on an 8×8-matrix boundary. You can now aggressively compress most of an image, while retaining fine detail where it's needed, such as a book's text or someone's face. Variable quantization has yet to establish widespread support in imaging hardware and software.
Advanced JPEG encoders and decoders improve image quality at aggressive compression levels but still remain compliant with the JPEG specification. Because the encoder begins with a lossless source file, it can predict where edge artifacts may occur and make appropriate quantization adjustments to compensate. Intelligent decoders can detect probable edge artifacts and more smoothly blend the pixel-to-pixel decoded color variations before outputting to a screen or printer. Logic and memory have significantly improved in both performance and cost reduction--including gaining the ability to embed significant amounts of memory and logic on a single chip--since JPEG work began a decade ago. As a result, support for 16×16 DCT conversions in future JPEG-specification revisions is possible, although a conversion to wavelets is more likely.
MJPEG (Motion JPEG) was one of the first industry attempts to devise a standard video-compression scheme. MJPEG consists of a bit stream of individually JPEG-converted still-image frames. This approach enables frame-by-frame editing but is not ideal for a number of reasons. Even using aggressive lossy compression, MJPEG files are extremely large and require significant bandwidth and processing power to achieve reasonably sized color display rates of even a few frames/second.
JPEG's broad definition also encompasses a range of color depths, chrominance-decimation ratios, and quantization factors. However, multiframe/second decoding requires custom hardware that assumes that the MJPEG file was encoded in a specific way. Encoder and decoder incompatibilities have, to date, ensured MJPEG a minor place in digital-video history.
MPEG has proved to be a far more successful approach. Development of the MPEG-1 specification began in the late 1980s, spurred by (among other factors) Intel's (www.intel.com) purchase of Sarnoff Lab's (www.sarnoff.com) proprietary DVI (digital-video-interactive) algorithms, which Intel later renamed Indeo. MPEG-1 targeted a maximum 1.86-Mbps bit rate, the best sustained performance that the double-speed CD-ROM drives of the era could deliver, and a 60-field-per-second progressive color-pixel CIF (Common Intermediate Format) display in the United States. Two interlaced CIF frames with interpolation combined to form a VHS-quality NTSC video output. Display dimensions and frame rates differ slightly for PAL (phase-alternation line) and SECAM (Systeme Electronique Couleur Avec Memoire). Note that MPEG-1 also supports larger-than-CIF frame sizes and rates at correspondingly greater-than-1.86-Mbps bandwidth.
MPEG-1 is both a subset and a superset of JPEG. It allows only the 4:2:0 luminance-chrominance ratio and a 24-bit color depth. MPEG-1 developers also realized that MJPEG encoders typically transmit a great deal of redundant information from one frame to another. Therefore, the developers defined three picture-coding types, which together significantly reduce the MPEG-1 average bandwidth over MJPEG.
Intra (I) frames are essentially 4:2:0 24-bit color JPEG-encoded still images. Interspersed between any two I frames are forward-predicted (P) and bidirectional-predicted (B) frames. Each P frame's 16×16-pixel macroblock, comprising four 8×8 luminance and two 8×8 chrominance matrices (one each for Cr and Cb), may contain vector or difference DCT coefficients from previous I and P frames. However, if the encoder doesn't find sufficient compatibility with previous frames, a macroblock contains full I-encoded information. The encoder must comprehend luminance, color, and frame-to-frame spatial (movement) data. Motion compensation occurs both between even and odd scan lines of the two MPEG-frame-derived interlaced-video outputs and from one video frame to another.
The encoder creates B frames from the closest two I or P frames, using one in the past and one in the future (Figure 3). B frames are especially useful for fast-forward and -rewind functions and also tend to contain less information than their I- and P-frame counterparts. As with P frames, each B-frame macroblock can contain either difference or full-encoded data. In this case, however, the encoder has three potential data sets from which to calculate the difference: the preceding I or P frame, the next I or P frame, or an average of the preceding and next I or P frames.
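The three-way choice for a B-frame macroblock can be illustrated with a small Python sketch. It assumes macroblocks are flattened lists of pixel values, uses the sum of absolute differences as the matching cost, and ignores motion vectors entirely; a real MPEG encoder is free to use other criteria, and the function names are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_b_prediction(block, past, future):
    """Pick the reference (past, future, or their average) with the smallest residual."""
    candidates = {
        "past": past,
        "future": future,
        "average": [(p + f) / 2 for p, f in zip(past, future)],
    }
    mode = min(candidates, key=lambda m: sad(block, candidates[m]))
    residual = [x - y for x, y in zip(block, candidates[mode])]
    return mode, residual


# A block that lies halfway between its two references is predicted
# perfectly by the average, leaving an all-zero residual to encode.
print(best_b_prediction([10, 20, 30, 40], [0, 10, 20, 30], [20, 30, 40, 50]))
```

The smaller the residual, the more of its DCT coefficients quantize to zero, which is one reason B frames tend to be the smallest of the three picture types.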
I-, P-, or B-frame selection is up to the encoder, as is macroblock encoding within a frame. For this reason, the frame pattern and per-frame data-set size vary, depending on the scene being encoded. MPEG-1 proponents estimate an average 100-to-1 compression ratio, although the compression you see partially depends on your counting scheme. The sequence of frames output from the encoder is also different from what you ultimately see on a monitor, because the decoder must receive I and P frames before it receives corresponding B frames. To minimize performance impacts, most MPEG decoders can simultaneously store one frame to local memory, decode another, and output a third.
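The difference between display order and bit-stream order can be shown with a minimal Python sketch. It assumes frames are labeled by type and display index (for example "B1") and that every B frame depends on the nearest I or P frame on each side; the function name is illustrative.

```python
def transmit_order(display_order):
    """Reorder frames so each B frame follows both of its reference frames."""
    out, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)   # hold until the future reference is sent
        else:
            out.append(frame)         # I or P frame goes out first
            out.extend(pending_b)     # then the B frames that needed it
            pending_b = []
    out.extend(pending_b)             # trailing B frames without a future reference
    return out


print(transmit_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# -> ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The decoder has to hold P3 in memory while it decodes B1 and B2, which is why decoders that store, decode, and output different frames at the same time hide most of the reordering cost.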
MPEG-2, the video foundation for digital versatile disk (DVD), expands on MPEG-1's compression concepts in several ways. MPEG-2 optionally supports the 4:2:2 luminance-chrominance ratio for professional-video applications. The MPEG-2 committee also prioritized efficient conversion of 24-frame/second film, accurate representation of interlaced broadcast video, and high-quality reproduction of more-than-two-channel stereo. Other MPEG-2 enhancements include separate quantization tables for luminance and chrominance and the ability to represent the dc component of the DCT-coefficient set with as much as 10-bit precision.
Videoconferencing standards, such as H.261 and H.263, tend to be application-optimized versions of the same concepts represented in MPEG. For example, H.263 specifies QCIF (quarter-CIF) dimensions but uses MPEG's 4:2:0 chroma ratio. Encoders also assume more limited motion, a safe presumption in a teleconferencing setup, and simplify encoding by eliminating B frames. Other proprietary video-compression algorithms also manipulate the same key variables that drive MPEG's definitions: frame rate, frame size, color depth, frame-to-frame redundancy, available bandwidth, unidirectional or bidirectional communication over the available bandwidth, encoder and decoder cost and complexity, and user quality expectations.
Both lossless and lossy techniques commonly take advantage of the pixel-to-pixel, field-to-field, and frame-to-frame consistency that is common in still and video images. However, because accurately decoding difference information can be successful only if the decoder previously received the reference data intact, many types of compression schemes are highly sensitive to single-bit errors. If, for example, the MPEG decoder receives a bad I-frame-macroblock data set, all frames from that point to the next I frame will display garbage information in that macroblock.
The impact of this single-bit-error sensitivity depends on your application. If the end user can solve the problem simply by requesting a repeat download of the image file or by exchanging the defective DVD for a replacement, the issue is primarily one of inconvenience. In videoconferencing or security applications, however, in which lost data is irreplaceable, you should consider more robust redundancy schemes.
Wavelet transforms, just like the Fourier transforms exemplified by JPEG and MPEG's DCT, convert spatial (for images) or time-based (for audio) data into the frequency domain. Both Fourier and wavelet functions integrate to zero; in other words, they vary symmetrically above and below the x axis. They both also manipulate sample data in the form of multiple frequency coefficients. However, Fourier transforms comprise periodic and infinitely repeating sine and cosine functions and are therefore localized in frequency but not in time or space. These transforms assume that the analyzed signal is periodic and infinitely long, and they can't effectively resolve nonrepeating transients.
Lack of an inherent timebase also means that Fourier-transform results average during the entire signal. To use an analogy, excerpts from a Mozart symphony and a Beatles single sound completely different from each other over any reasonable time interval but could have identical Fourier-transform frequency plots. The same issue holds true for images if you substitute the words "pixel location" for "time." DCTs therefore use repeated 8×8 matrices both to minimize logic complexity and to artificially retain some pixel-location details.
Wavelet transforms, on the other hand, use time- and spatial-bound functions, such as single-cycle square waves and sin(x)/x curves, or other periodic functions combined with a Gaussian distribution. Wavelet transforms enable retention of frequency and time/space information, and, therefore, wavelet image compression decomposes the entire image in one operation. At each spatial-frequency level, wavelet transforms capture both averages and differences of neighboring pixels or groups of pixels. In many cases, difference components are close to zero, and the compression algorithm can aggressively quantize or even discard them (Listing 1).
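The averages-and-differences idea is easiest to see with the Haar wavelet, the simplest wavelet, used here as a stand-in rather than as the transform any particular product uses. This Python sketch assumes an input whose length is a power of two, and the function names are illustrative.

```python
def haar_step(values):
    """Split a list into pairwise averages and pairwise half-differences."""
    averages = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    details = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return averages, details


def haar_decompose(values):
    """Repeat haar_step on the averages until one overall average remains."""
    details_by_level = []
    current = list(values)
    while len(current) > 1:
        current, details = haar_step(current)
        details_by_level.append(details)   # finest detail level first
    return current[0], details_by_level


def haar_reconstruct(average, details_by_level):
    """Rebuild the original values from the average and all detail levels."""
    current = [average]
    for details in reversed(details_by_level):   # coarsest level first
        rebuilt = []
        for a, d in zip(current, details):
            rebuilt.extend([a + d, a - d])
        current = rebuilt
    return current


average, details = haar_decompose([9, 7, 3, 5])
print(average, details)                      # 6.0 [[1.0, -1.0], [2.0]]
print(haar_reconstruct(average, details))    # [9.0, 7.0, 3.0, 5.0]
```

With unquantized details the reconstruction is exact. Setting small detail values to zero before reconstruction gives a lossy, smoothed version of the input.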
One common form of wavelet transformation used in image compression is the Daubechies family, named after its inventor, Princeton University mathematics professor Ingrid Daubechies. The input data signal conceptually passes through multiple stages of a pyramid arrangement of lowpass and highpass filters, or "quadrature-mirror filter pair," which looks somewhat similar to an octave-tuned stereo-equalizer circuit (Figure 4). The result is a series of images decimated both horizontally and vertically, the first image representing the lowest resolution smoothed average of all pixels (the resolution 1 average in Listing 1) and the other images representing detail coefficients at various resolutions (Figure 5).
In digital-filter form, lowpass and highpass coefficient matrices repeatedly multiply with and decimate each row and column of the 2-D pixel array. The wavelet-transform algorithm stores the high-frequency detail values, representing one-half of the result, and a smaller coefficient set multiplies with low-frequency average data. At the end of the transform, you're left with one high-frequency detail coefficient and one low-frequency "seed" value for each row's and column's luminance and chrominance. Assuming that you haven't quantized the detail coefficients, reversing the process exactly reconstructs the original image, beginning with a blurred, low-resolution version. Each iteration of the decoding algorithm builds on the previously calculated averages, eliminating redundancy and optimizing available bandwidth.
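One level of the row-then-column filtering can be sketched in Python as follows. The sketch assumes the two-tap Haar filter pair rather than the longer filters used in practice, and an image with even dimensions; the sub-band names (ll, lh, hl, hh) are a common convention, not something the text above defines.

```python
def split(values):
    """Lowpass (pairwise average) and highpass (pairwise half-difference), decimated by 2."""
    lo = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    hi = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return lo, hi


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def filter_columns(half):
    """Apply the filter pair down every column of a row-filtered half image."""
    lo_cols, hi_cols = [], []
    for column in transpose(half):
        lo, hi = split(column)
        lo_cols.append(lo)
        hi_cols.append(hi)
    return transpose(lo_cols), transpose(hi_cols)


def haar_2d_level(image):
    """Return (ll, lh, hl, hh): one smoothed quarter-size image and three detail images."""
    lo_rows, hi_rows = [], []
    for row in image:
        lo, hi = split(row)
        lo_rows.append(lo)
        hi_rows.append(hi)
    ll, lh = filter_columns(lo_rows)
    hl, hh = filter_columns(hi_rows)
    return ll, lh, hl, hh


print(haar_2d_level([[1, 3], [5, 7]]))
# -> ([[4.0]], [[-2.0]], [[-1.0]], [[0.0]])
```

Feeding the ll output back into haar_2d_level produces the next, coarser stage of the pyramid.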
Quantization, however, is key to wavelets' compression advantage over JPEG and MJPEG for a similar perceived quality level. Because wavelet compression operates on the entire image--thousands of pixels--the strings of zeros produced for high-frequency terms can be much longer than those created by the 8×8 DCT matrix, allowing RLE compression to operate more effectively. Wavelet techniques strive to establish relationships between the "parent" image and the higher frequency "children" to optimize the quality-versus-compression trade-off. Wavelet lossy artifacts generally appear not as blocks (as in JPEG) but as an overall smoothing of the image, which many viewers prefer (Figure 2). Because row (horizontal) details are more crucial than column (vertical) details, you might choose to more aggressively quantize column details.
Analog Devices' ADV601, ADV601LC, and ADV611 are among the first commercially available hardware devices for wavelet video compression and decompression. They use a biorthogonal 9,7 approach with the first filter stage operating on only horizontal pixel information to reduce memory and processing requirements as well as costs. Whereas the ADV601 supports 10-bit luminance and chrominance precision, the ADV601LC uses MPEG-1-compatible, 8-bit precision. The ADV601 also offers a serial port for optional external DSP assistance in bit-rate conversions, and the ADV601LC comes in a smaller, 128-pin TQFP. The ADV601LC costs $14.95 (10,000), roughly half the price of the $30.90 (10,000) ADV601.
The Analog Devices $199 VideoPipe board is an easy-to-use means of evaluating ADV601LC image quality. It accepts NTSC or PAL and composite, S-video, or CCIR656 inputs, and it outputs composite, S-video, or CCIR656 to a monitor. Pushbuttons on the board allow you to freeze a frame. They also allow you to vary the compressed bit rate from 128 kbps to 33 Mbps, the field rate from one field per second to 50 PAL or 60 NTSC fields per second, and the image resolution from QQQQCIF to CCIR-601. Analog has also designed the VideoLab, a PCI-based board for the ADV610.
The new ADV611 is a variant of the ADV601LC that allows you to selectively enhance image quality in a portion of the frame, with results conceptually similar to JPEG's variable quantization or Geo Publishing's (www.emblaze.com) Emblaze Webcharger. Other wavelet-hardware options include decimation and biorthogonal-filter logic cores for Altera (www.altera.com) Flex10K and Flex8000 programmable-logic devices from Fastman. Xilinx (www.xilinx.com) has also worked on several wavelet projects with customers using its FPGAs. The company hopes to have a wavelet-core design kit in beta by the third quarter with general availability before the end of the year.
If you'd like to use a generic DSP for hardware-assisted wavelet encoding and decoding, companies such as Mathsoft, The MathWorks, Visual Numerics, and Wolfram Research can assist your development with off-the-shelf functions and utilities. Compression results depend on how well you optimize the wavelet-transform function for your assumed input pattern, and these companies' software provides rapid, straightforward, and valuable experimentation.
Analog Devices has also developed free, completely software-based wavelet encoders and decoders for video editing and display on a PC. The codecs require a minimum 266-MHz Pentium II-class processor. Other software-only wavelet-transform options for both still images and video come from Compression Engines, Infinop, and Summus. Each company sells compression and decompression development kits, usable with multiple microprocessors and operating systems. Intel also uses wavelet transforms for compression in Indeo Version 5.0.
Compression Engines' video-encoding algorithm uses a spline wavelet transform. According to the company, the transform can compress a 40-frame/second 4:2:2 video stream of QCIF dimensions in real time on a Pentium Pro-200, Windows NT-based PC. The transform can also achieve 10-frame/second encoding of a CIF-sized video stream. Both Compression Engines and Summus report that they're working on ports of their Win32-based codecs to Windows CE. Summus' wavelet support is built into Corel (www.corel.com) Draw, and Summus even claims that NASA will use Summus' wavelet technology on the next Mars Pathfinder mission.
Unlike MPEG's P and B frames, wavelet video compression stores all image information within each frame's data stream. This approach has strengths and shortcomings. For a given resolution, color depth, frame rate, and total video duration, the wavelet file is larger than that of MPEG. On the other hand, eliminating the need to construct and deconstruct predicted frames smoothes out the bit-stream rate and substantially reduces encoder and decoder complexity and cost. You can display and edit each wavelet-compressed frame as an individual picture, whereas all but the most complex (and expensive) MPEG editing gear restricts you to I frames. Intrafield encoding is also more forgiving of single-bit errors than is MPEG's interfield-dependent approach.
Wavelet compression is a relatively immature technology compared with today's mainstream compression schemes. Unlike the 8×8 DCT at the core of JPEG and MPEG, there are literally an infinite number of possible wavelet-transform functions. This situation gives each vendor working on the technology tremendous latitude to innovate. On the other hand, each vendor's wavelet techniques and file formats are currently incompatible with all others, and the degree of symmetry between encoding and decoding stages also varies. If you want to convert wavelet video files to MPEG-1 or MPEG-2, Xing Technology (www.xingtech.com) offers transcoders for MPEG-1 conversion, and Darim Vision (www.darvision.com) offers transcoders for MPEG-2.
For still images, you need an intermediate software package, such as Adobe Photoshop, to open the file in one format and save it in another. Video equivalents to Photoshop, such as Adobe Premiere, also support reading and writing files using wavelet codecs. Most vendors' wavelet software also allows you to import and export still images to standard file formats, such as JPEG, PNG, and TIFF (tagged image file format). Fortunately, help is on the way. The JPEG 2000 committee is seriously considering adding standardized wavelet transforms to the specification, and the MPEG-4-standards body documentation also discusses still-image wavelet support.
The US government funded much of the early research on the commercialization possibilities for wavelets and fractals. Its interest had a dual purpose. A number of applications, such as battlefield communication of real-time audio, still- and video-image information, and orbit-to-ground transmission of surveillance and weather-satellite data, valued highly compressed files that retained fine image details. Other applications, such as fingerprint and facial identification, were interested in not only reducing database size but also using frequency sub-band or arithmetic comparisons for image matching.
Fractals are extensions of traditional Euclidean shapes, such as lines, squares, and circles, with two fundamental properties. First, when you view fractals, you can magnify them an infinite number of times, and they contain structure at every magnification level. Second, you can generate fractals using finite and typically small sets of instructions and data. Fractals grew out of the goal of mathematicians to completely describe the world using standard geometrical expressions. IBM mathematician Benoit B Mandelbrot, PhD, proved and published the theory behind fractals in 1981 and was the first to view computer-generated fractal structures. The well-known Mandelbrot fractal set is named in his honor. Another famous fractal researcher, with a set also named for him, is French mathematician Gaston Julia.
The standard Mandelbrot fractal equation takes the form z(n+1) = z(n)^2 + c, where c is the complex number x+iy corresponding to any point on the (x,y) coordinate plane. Fractal equations are iterative, in that the result of one calculation of the fractal equation becomes the z input to the next calculation. Over repeated evaluations of a fractal equation, values for each point in the (x,y) coordinate space either converge at single points, move toward the (0,0) origin point, or move toward infinity. The diverse colors in fractal plots reflect the rate of this movement for each point. Discussions of chaos theory frequently use fractals as examples, because slight variations in the fractal equation produce radically different results.
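The iteration is short enough to write out directly. This Python sketch counts how many evaluations it takes a point to exceed a magnitude of 2, the usual escape test; the iteration limit of 100 is an arbitrary assumption, and plotting programs typically map the returned count to a color.

```python
def escape_time(c, max_iter=100, bailout=2.0):
    """Iterate z = z*z + c from z = 0; return the step at which |z| exceeds bailout."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c              # the result becomes the input to the next step
        if abs(z) > bailout:
            return n               # the point is moving toward infinity
    return max_iter                # still bounded after max_iter steps


print(escape_time(0j))        # 100: stays at the origin
print(escape_time(-1 + 0j))   # 100: alternates between -1 and 0
print(escape_time(1 + 0j))    # 2: the sequence 1, 2, 5 escapes quickly
```

Points that never escape within the limit are treated as members of the Mandelbrot set; the escape counts of their neighbors produce the familiar colored bands.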
What's all of this theory got to do with image compression? Conceptually, you might consider the world to be one complex fractal image. For example, from space you can see very little detail of any point on the earth. The closer you get, the more you see. First, the North American continent becomes distinct, then you can pick out California. At some point, you see Sacramento; then, my house. You can even extend this process and examine the subparticles of an atom contained in my house's roof or on the surface of a flea on one of my dogs.
For a slightly more tangible example, consider a tree. Ideally, you could envision each tree limb as a modified version of the trunk, transformed in x, y, and z coordinates; luminance; and chrominance. Similarly, each piece of bark on the idealized tree is a transformed version of every other piece, as is every leaf. Optimized fractal equations can create impressive simulation images of trees, ferns, clouds, waves, mountains, blood vessels, neurons, and other objects. Note, however, that these are computer-generated simulations.
Fractal compression attempts to compute arithmetic expressions that fully describe real-life scenes. Several factors complicate this already-challenging effort. For example, real-life leaves aren't directly transformed variants of each other: Some are torn or twisted; others have missing portions. Also, the input source file that a fractal transform analyzes isn't infinite in detail like a real fractal but is finitely bound by the pixel count, arrangement, and bit precision of the digitized source.
Iterated Systems is a key proponent of fractal compression and holds several of the fundamental patents on the technology. The company uses a variant of fractal transformation called an "affine transform." As an example, each iteration of the letter F is a modified version of the original in location, orientation, and dimension (Figure 6). With every iteration of the affine transform in the contractive affine map, the differences between each result shrink, eventually converging on a fixed point--"the attractor."
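Convergence toward an attractor can be demonstrated with a single contractive affine map. In this Python sketch the six coefficients are made up for illustration (a rotation combined with a shrink and a shift); they are not taken from any fractal codec.

```python
def affine(point, a, b, c, d, e, f):
    """Apply x' = a*x + b*y + e, y' = c*x + d*y + f to one point."""
    x, y = point
    return (a * x + b * y + e, c * x + d * y + f)


def iterate(point, coeffs, steps):
    """Apply the same affine map repeatedly, feeding each result back in."""
    for _ in range(steps):
        point = affine(point, *coeffs)
    return point


# Hypothetical contractive map: each application shrinks distances by about 0.56.
COEFFS = (0.5, -0.25, 0.25, 0.5, 1.0, 0.0)

# Very different starting points end up at the same fixed point, near (1.6, 0.8).
print(iterate((0.0, 0.0), COEFFS, 60))
print(iterate((100.0, -50.0), COEFFS, 60))
```

Because the result does not depend on the starting point, a decoder only needs the map coefficients, not the original data, to regenerate the attractor.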
Affine transformation of a digitized image breaks the pixels into groupings called "domains" and "ranges" (Figure 7). By evaluating each possible domain/range pair for redundancy and similarity of pixel pattern, luminance, and chrominance, the compression algorithm eventually produces a set of affine transforms that converge on the original image, or attractor. The compression ratio and the quality of the fractal-compressed file depend on the resolution of the source input, the degree of similarity in domain/range pairs, and the amount of time you allow the transform algorithm to run. The fractal-transform algorithm enables you with reasonable precision to control the size of the resultant compressed file.
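The domain/range search can be illustrated with a one-dimensional toy in Python. This is a textbook-style sketch under stated assumptions, not Iterated Systems' patented method: ranges are short runs of samples, each candidate domain is twice as long and is shrunk by averaging pairs, and a least-squares brightness scale and offset map the shrunk domain onto the range. Practical coders also restrict the scale to keep the map contractive.

```python
def fit(domain, rng):
    """Least-squares scale s and offset o so that s*domain + o approximates rng."""
    n = len(rng)
    mean_d, mean_r = sum(domain) / n, sum(rng) / n
    var = sum((d - mean_d) ** 2 for d in domain)
    cov = sum((d - mean_d) * (r - mean_r) for d, r in zip(domain, rng))
    s = 0.0 if var == 0 else cov / var
    o = mean_r - s * mean_d
    err = sum((s * d + o - r) ** 2 for d, r in zip(domain, rng))
    return s, o, err


def best_domain(signal, range_start, size):
    """Search all aligned domains; return (start, scale, offset, error) of the best match."""
    rng = signal[range_start:range_start + size]
    best = None
    for start in range(0, len(signal) - 2 * size + 1, size):
        # Shrink the double-length domain to the range's length by averaging pairs.
        shrunk = [(signal[start + 2 * i] + signal[start + 2 * i + 1]) / 2
                  for i in range(size)]
        s, o, err = fit(shrunk, rng)
        if best is None or err < best[3]:
            best = (start, s, o, err)
    return best


signal = [0, 0, 2, 2, 4, 4, 6, 6, 1, 2, 3, 4, 9, 9, 9, 9]
print(best_domain(signal, range_start=8, size=4))
# -> (0, 0.5, 1.0, 0.0): the range [1, 2, 3, 4] is 0.5 * [0, 2, 4, 6] + 1
```

The exhaustive search over every domain/range pair is what makes encoding slow; stopping the search early trades quality and compression ratio for time.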
Fractal-compression promoters cite several advantages to their approach. Images can compress to JPEG or wavelet-sized files with comparable quality. Once you compress the files, you can expand the images to create versions several times the original size that retain the detail of the original digitized file. For example, compare the size of a bit-map file (such as that created in the Windows Paintbrush raster-graphics program), representing a bezier curve or other 2- or 3-D structure, with its equivalent created in a vector-graphics program, such as Micrografx (www.micrografx.com) Designer or Corel Draw. As you enlarge both objects, the bit-map file exponentially increases because of growth in all dimensions, and the vector-based file remains constant.
Enlarging pixel-based files also requires the editing program to create pixels where they didn't exist in the original bit-map matrix. Techniques such as bilinear, trilinear, and anisotropic approximation derive new pixel information from surrounding "real" pixels, but results include fuzzy edge transitions and artificial-looking pixel-to-pixel color progressions. Because fractal-compressed files store pixel groupings as grid-independent mathematical expressions, fractal proponents claim that such files can better retain the original pixel-transition relationships.
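The fuzzy edges that interpolation produces are easy to reproduce. The sketch below is a generic bilinear enlargement written for illustration, not the algorithm of any particular editing program; the function name and the tiny test image are assumptions.

```python
def enlarge_bilinear(image, factor):
    """Enlarge a grayscale image (a list of equal-length rows) by an integer factor."""
    src_h, src_w = len(image), len(image[0])
    dst_h, dst_w = src_h * factor, src_w * factor

    def source_coord(index, src_size, dst_size):
        # Map an output index to a fractional position in the source grid.
        if dst_size == 1:
            return 0, 0, 0.0
        pos = index * (src_size - 1) / (dst_size - 1)
        low = int(pos)
        high = min(low + 1, src_size - 1)
        return low, high, pos - low

    result = []
    for row in range(dst_h):
        y0, y1, fy = source_coord(row, src_h, dst_h)
        out_row = []
        for col in range(dst_w):
            x0, x1, fx = source_coord(col, src_w, dst_w)
            # Blend the four surrounding "real" pixels by distance.
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            out_row.append(top * (1 - fy) + bottom * fy)
        result.append(out_row)
    return result


# A hard black-to-white edge, two pixels wide.
edge = [[0, 255], [0, 255]]
enlarged = enlarge_bilinear(edge, 2)
```

Each output row now steps through roughly 0, 85, 170, and 255. The abrupt transition in the source has become a gradual ramp, which is the softening effect described above.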
It's important to note that fractal enlargement does not create detail where it didn't originally exist, a fact that fractal proponents sometimes gloss over. For example, if you enlarge a picture of someone's face, you may or may not get realistic-looking skin. However, you don't see pores unless they were captured in the pixel matrix used to create the fractal file in the first place.
Currently, the most significant shortcoming of fractal compression is the large amount of processing power and memory needed to identify and calculate the affine-transform domain/range relationships. For example, a 40-kbyte JPEG conversion of the 900-kbyte source TIFF in Figure 2c took 1.2 seconds on my Pentium-133 non-MMX desktop PC with 64 Mbytes of EDO DRAM using Adobe Photoshop 4.01. Wavelet conversions using Infinop's Lightning Strikes 3.1 plug-in (Figure 2d) and Summus' 4U2C plug-in took 0.6 and 4.6 seconds, respectively. A 41-kbyte fractal compression using Altamira Group's Genuine Fractals plug-in containing Iterated Systems' affine-transform engine took 1 minute, 19.7 seconds (Figure 2e).
When reviewing these numbers, keep in mind that JPEG is a far more mature technology than wavelets or fractals; therefore, companies such as Adobe have had much more time to optimize their JPEG codecs. Because all of these compression schemes are math-intensive, results might also be quite different on an MMX-enabled system. To address the performance issue, Iterated Systems has developed several generations of ASIC-based fractal-transform hardware for confidential US government projects.
The Multimedia Access Osprey-1000 board is another fractal hardware-acceleration platform for PCI-based Windows NT systems. This board can render low-bit-rate Video for Windows files in real time. Fractal compression, like MPEG, is an asymmetrical approach: Opening the aforementioned 41-kbyte fractal file took only 1.6 seconds, compared with the nearly 80 seconds needed to create it. Iterated also blends proprietary wavelet techniques into the fractal compression in its latest Sting technology.
The type and amount of compression you choose depend on a variety of system characteristics. What output image size are you targeting, and what is the resolution of the output device? A 4×5-in. image on a 72-dpi computer monitor requires a factor of 17 fewer pixels than the same-sized image output to a 300-dpi printer. Common digital-video resolutions (pixels×lines) include 720×486 (CCIR-601), 640×480 (VGA), 360×243 (CIF, or Standard Intermediate Format (SIF)), and 180×121 (QCIF, or quarter SIF (QSIF)).
Analog-video-display standards include NTSC at 525 scan lines and PAL and SECAM at 625 scan lines each.
For high-definition TV, the Advanced Television Systems Committee and Grand Alliance specify 18 video formats, as high as 1920×1088 (30 frames/second interlaced) or 1280×720 (60 frames/second progressive).
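The factor-of-17 difference between the 72-dpi monitor and the 300-dpi printer follows directly from the pixel counts. A short sketch of the arithmetic, with a hypothetical helper name:

```python
def pixel_count(width_in, height_in, dpi):
    """Pixels needed to cover a width x height image (inches) at a given dpi."""
    return round(width_in * dpi) * round(height_in * dpi)


monitor_pixels = pixel_count(4, 5, 72)    # 288 x 360
printer_pixels = pixel_count(4, 5, 300)   # 1200 x 1500
ratio = printer_pixels / monitor_pixels
```

The monitor image needs 103,680 pixels and the printer image needs 1,800,000, a ratio of about 17.4. Because resolution applies in both dimensions, the ratio is simply (300/72) squared.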
What is the maximum number of colors that your output device can accurately display? It makes little sense to store the images at 24-bit/pixel color if you are outputting to an 8-bit color printer or graphics card. The 24-bit color file is larger than necessary (assuming that you compress the alternative 8-bit file using the same algorithms), and performance suffers as the output device reduces the color depth of each image before displaying it, with results of varying accuracy. On the other hand, 8-bit color is generally insufficient for accurately representing the kind of information in photographs, such as skin tones and subtle shades of color.
For video applications, what frame rate do you need? NTSC uses a frame rate of slightly less than 30 frames/second, and PAL and SECAM specify slightly less than 25 frames/second. Movies made with 16-mm film use a 24-frame/second specification. A lower frame rate means a smaller data set and less required bandwidth, or it can mean less compression per frame to achieve a particular data-set size and bandwidth capability. However, when viewing images at rates lower than 15 frames/second, most viewers can quickly recognize the choppy, unnatural output made famous by the TV show Max Headroom.
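To see how resolution, color depth, and frame rate combine, you can estimate the uncompressed data rate. This sketch uses a hypothetical helper and rounds the NTSC rate up to 30 frames/second for simplicity:

```python
def raw_bytes_per_second(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video data rate in bytes/second."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * frames_per_second


# 640x480 (VGA) at 24 bits/pixel, assuming a rounded 30 frames/second.
full_rate = raw_bytes_per_second(640, 480, 24, 30)
# The same video at 15 frames/second.
half_rate = raw_bytes_per_second(640, 480, 24, 15)
# The same video at 30 frames/second but 8 bits/pixel.
low_color = raw_bytes_per_second(640, 480, 8, 30)
```

Full-rate VGA video at 24 bits/pixel comes to 27,648,000 bytes/second before compression. Halving the frame rate halves that figure, and dropping to 8-bit color cuts it to one-third.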
What level of processing power and memory exists at the encoder and decoder to handle compression and decompression tasks? Some compression algorithms, such as MPEG and fractals, are asymmetrical, requiring significantly more MIPS, scratchpad RAM, and storage to encode the image at the originator than to decode it at the recipient. Asymmetrical compression is attractive in one-source, many-destination applications, such as video distribution to either set-top boxes or Internet-connected PCs, in which decoder cost is a significant issue.
An example of "reverse" asymmetrical compression occurs with digital still cameras and videocameras that transmit a minimally processed, lossless, compressed file of the raw sensor-output data to the computer for decompression and processing. Another example of asymmetry, this time with respect to memory storage, is the FlashPix file format. FlashPix stores multiple resolutions of an image on the server, resulting in a file roughly 50% larger than its highest single-resolution alternative. However, because the client communicates the resolution and percentage of the total image it needs at any time, FlashPix minimizes both the required download bandwidth and client processing and memory.
Symmetrical compression schemes, such as wavelets, are advantageous when the application has no clearly defined "source" and "destination." If any system or subsystem could at a given time either broadcast or receive an image file, symmetrical compression would reduce overall cost. This cost-reduction argument is especially appropriate if the compression and decompression engines share some of the same circuits.
What is your system user's perceived quality expectation? Also, does the application demand a constant quality level, a constant data-bandwidth rate, or both? Lossless and lossy compression take advantage of certain combinations of data patterns and image characteristics that may not exist in every frame. If the decoder demands a constant-rate data stream, the encoder needs either to turn off compression or to eliminate a varying amount of data, often in the lossy-compression stages, to maintain a consistent per-frame file size. The result is varying per-frame quality.
If the destination can accept a variable-rate data pattern, perhaps by buffering the decoder input and output, the source can vary the per-frame file size and maintain a consistent image-quality level. Available bandwidth depends on the characteristics of the transmission channel and on whether the communication is one-way, as in video viewing, or two-way, as in videoconferencing.
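The trade-off between constant rate and constant quality can be illustrated with a deliberately simple toy model. It assumes that each frame has a known "cost" (the bits it needs at full quality) and that quality falls in proportion to the fraction of those bits actually sent. Real codecs behave far less linearly, and the function names here are hypothetical.

```python
def constant_rate(frame_costs, budget):
    """Every frame gets exactly `budget` bits; quality varies per frame.

    Returns a list of (bits_sent, quality) pairs, where quality is the
    fraction of the needed bits that fit, capped at 1.0.
    """
    return [(budget, min(1.0, budget / cost)) for cost in frame_costs]


def variable_rate(frame_costs):
    """Every frame gets the bits it needs; quality stays constant at 1.0."""
    return [(cost, 1.0) for cost in frame_costs]


# Hypothetical per-frame costs in bits: an average, a busy, and a simple frame.
costs = [40_000, 100_000, 20_000]

fixed = constant_rate(costs, budget=50_000)
free = variable_rate(costs)
```

With a fixed 50,000-bit budget, the busy frame keeps only half of its information, and the simple frames have headroom to spare. With a variable rate, every frame keeps full quality, but the per-frame size swings from 20,000 to 100,000 bits, which the decoder must absorb through buffering.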
Kevin Leary and Roger Smith from Analog Devices, Tom Lane from the Independent JPEG Group, Burt Smith from Iterated Systems, Bert Hornbock and Bruce Totty from Irvine Sensors, Majid Rabbani from Kodak, and Allen Rush from Sierra Imaging all provided valuable information during the research and review phases of this article. I'm also grateful to Lane, Rabbani, and Rush for their detailed and comprehensive feedback on early article drafts.
For more information...
When you contact any of the following manufacturers directly, please let them know you read about their products on EDN's website.
Altamira Group Inc, Burbank, CA, 1-818-556-6099, fax 1-818-556-3365, www.altamira-group.com
Analog Devices Inc, Norwood, MA, 1-781-461-3881, fax 1-781-326-8703, www.analog.com
C-Cube Microsystems Inc, Milpitas, CA, 1-408-944-6300, fax 1-408-940-8590, www.c-cube.com
Compression Engines LLC, Houston, TX, 1-281-876-3976, fax 1-281-876-3974, www.cengines.com
Fastman Inc, Austin, TX, 1-512-328-9088, fax 1-512-328-9317, www.fastman.com
Infinop Inc, Denton, TX, 1-940-484-1165, www.infinop.com
Iterated Systems, Atlanta, GA, 1-404-264-8000, fax 1-404-264-8300, www.iterated.com
Mathsoft Inc, Cambridge, MA, 1-617-577-1017, fax 1-617-577-8829, www.mathsoft.com
The Mathworks Inc, Natick, MA, 1-508-647-7000, fax 1-508-647-7001, www.mathworks.com
Multimedia Access Corp, Dallas, TX, 1-972-488-7200, fax 1-972-488-7299, www.mmac.com
Summus Technologies Inc, Fort Lauderdale, FL, 1-954-486-2000, fax 1-954-486-9664, www.summus.com
Visual Numerics Inc, Houston, TX, 1-713-784-3131, fax 1-713-781-9260, www.vni.com
Wolfram Research Inc, Champaign, IL, 1-217-398-0700, fax 1-217-398-0747, www.wolfram.com
You can reach Technical Editor Brian Dipert at 1-916-454-5242, fax 1-916-454-5101, e-mail edndipert@worldnet.att.net, URL http://members.aol.com/bdipert.