Good enough data compression: Samplify could open the door to novel uses
When designers talk about data compression these days, they are typically thinking about highly evolved lossy compression algorithms for particular media types: MP3 for audio, for example, or H.264 for video. These algorithms are relatively complex, but they can produce very high compression ratios, roughly 10:1 to 50:1, with losses in content quality that many find acceptable.
But data compression technology started with another concept altogether: lossless data compression, as best exemplified by the Lempel-Ziv (L-Z) family of algorithms that underlies zip files. Rather than trying to find acceptable patterns of data loss in order to achieve very high compression ratios, L-Z guarantees an exact replica of the original data, but cannot guarantee a useful compression ratio. The algorithm works by building a dictionary of recurring patterns, scanning the incoming data, and replacing each pattern it recognizes with a shorter symbol from the dictionary. For data streams that actually have large numbers of recurring sequences of bits, the compression ratio can be reasonably good: 2:1 or better. For true white noise, there would be almost no compression at all. L-Z is conceptually simple, but relies on exhaustive pattern-matching or clever hashing, and a large dictionary in memory.
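The dependence on recurring patterns is easy to see with a small experiment. The sketch below uses Python's standard zlib module, whose DEFLATE format combines an L-Z style matcher with Huffman coding, so it is a stand-in for the L-Z family rather than a pure L-Z coder. The test strings and the `compression_ratio` helper are made up for illustration.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size divided by DEFLATE-compressed size."""
    return len(data) / len(zlib.compress(data, 9))


# Highly repetitive input: the same short pattern repeated many times.
repetitive = b"sensor=0042;" * 4096

# Seeded pseudo-random bytes stand in for white noise.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(repetitive)))

ratio_repetitive = compression_ratio(repetitive)
ratio_noise = compression_ratio(noise)

print(f"repetitive input: {ratio_repetitive:.1f}:1")
print(f"noise-like input: {ratio_noise:.2f}:1")
```

The repetitive input should compress far beyond 2:1, while the noise-like input should come out at essentially its original size.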
Even before L-Z, thoughtful designers asked themselves what they knew about their data, and how they could turn that foreknowledge into simple, ad-hoc compression tricks: shortening delimiter strings, for instance, run-length encoding the gaps, or reducing the number of significant bits or decimating the sample stream when the designer knew that the loss would make little or no difference. Such techniques live in between lossless but relatively ineffective compression and highly effective but unacceptably lossy compression.
One such algorithm has turned out to have a life beyond its original purpose, or at least so hope the employees and investors of a small but eager company, Samplify Systems. The algorithm has a number of interesting attributes: it is simple and, in an FPGA implementation, very fast and compact. It can be lossless, lossy with a fixed bit rate, or lossy with fixed quality when an easily computable metric exists. And like L-Z, it is conceptually elegant.
Al Wegener, CTO and founder of Samplify, explains the algorithm in general terms as follows. The Samplify engine comprises two major blocks. The first block is a preprocessor that does application-specific things to reduce the amount of work the compression engine will have to do. These things include removing one or more LSBs from each data word in a lossy compression scheme, for instance, or shifting the data from a high frequency band to start at DC.
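As a rough illustration of the simplest preprocessing step mentioned here, the sketch below drops a chosen number of LSBs from each integer sample and then restores the original scale. It is not Samplify's preprocessor; the function names and sample values are assumptions made up for the example.

```python
def drop_lsbs(samples, n_bits):
    """Discard the n_bits least significant bits of each integer sample (lossy)."""
    return [s >> n_bits for s in samples]


def restore_scale(reduced, n_bits):
    """Shift back to the original scale; the discarded bits return as zeros."""
    return [r << n_bits for r in reduced]


# Hypothetical signed samples from a converter.
samples = [1000, 1003, 1010, 1021, 1035, -512, -509]
n_bits = 2

reduced = drop_lsbs(samples, n_bits)      # narrower words for the compressor
approx = restore_scale(reduced, n_bits)   # what the receiver can rebuild
max_error = max(abs(s - a) for s, a in zip(samples, approx))

print(reduced)
print(approx)
print("largest error:", max_error)
```

Each word handed to the compressor is `n_bits` narrower, and the error per sample stays below 2 to the power `n_bits`.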
The second block executes Samplify’s secret sauce: a purely lossless algorithm developed by Wegener before he founded the company. Here’s where the elegance comes in. You decimate the data stream, by, for example, splitting it into odd- and even-numbered samples. You then reconstruct the original stream using just, for example, the odd stream. Now you create an error stream that is simply the difference between your reconstruction and the original data. In this example, the odd data samples plus the error stream would be the output. If you have done your preprocessing thoughtfully, the error stream will be significantly smaller than the stream of even-numbered samples, so you have achieved compression.
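The decimate, reconstruct, and subtract idea can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not Samplify's algorithm: it keeps the odd-numbered samples (counting from one), predicts each even-numbered sample as the integer average of its two neighbors, and stores only the prediction error. The averaging predictor and the names `encode`, `decode`, and `predict` are choices made for this example.

```python
import math


def predict(kept, i):
    """Predict the sample lying between kept[i] and kept[i + 1] by integer averaging."""
    right = kept[i + 1] if i + 1 < len(kept) else kept[i]  # repeat the edge sample
    return (kept[i] + right) // 2


def encode(samples):
    """Return the kept samples plus the error stream for the samples left out."""
    kept = samples[0::2]      # 1st, 3rd, 5th, ... samples
    left_out = samples[1::2]  # 2nd, 4th, 6th, ... samples
    errors = [x - predict(kept, i) for i, x in enumerate(left_out)]
    return kept, errors


def decode(kept, errors):
    """Rebuild the original stream exactly from kept samples and errors."""
    out = []
    for i, k in enumerate(kept):
        out.append(k)
        if i < len(errors):
            out.append(predict(kept, i) + errors[i])
    return out


# A slowly varying test signal: one sine cycle, amplitude 1000, 65 samples.
samples = [round(1000 * math.sin(2 * math.pi * k / 64)) for k in range(65)]
kept, errors = encode(samples)

print("largest left-out sample:", max(abs(x) for x in samples[1::2]))
print("largest error value:    ", max(abs(e) for e in errors))
print("lossless round trip:    ", decode(kept, errors) == samples)
```

For a smooth signal like this one, the error values stay within a few counts while the samples they replace span the full amplitude, so the error stream needs far fewer bits per value. On noise-like input the errors would be as large as the samples, and nothing would be gained.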
Obviously this approach is not going to produce high compression ratios in lossless mode. Depending on how you decimate the data, which of course depends on the nature of the data, it shouldn’t often do much better than 2:1. Add the preprocessor, and in some applications the engine can do acceptable lossy compression at up to 8:1. And in fixed-quality mode, the algorithm can produce good compression at fixed error levels, whether measured in dB, in error vector magnitude (EVM), or by some other quantitative measure.
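One way to picture a fixed-quality mode is as a search for the most aggressive setting that still meets an error target. The sketch below is an assumption-laden illustration, not a description of how Samplify's engine works: it uses signal-to-noise ratio in dB as the metric, LSB truncation as the only lossy step, and a hypothetical helper named `max_bits_for_quality`.

```python
import math


def quantize(samples, n_bits):
    """Zero out the n_bits least significant bits of each integer sample."""
    return [(s >> n_bits) << n_bits for s in samples]


def snr_db(original, approx):
    """Signal-to-noise ratio in dB between original and approximated samples."""
    signal = sum(s * s for s in original)
    noise = sum((s - a) ** 2 for s, a in zip(original, approx))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def max_bits_for_quality(samples, target_db, max_bits=15):
    """Return the largest LSB count reached before the SNR first drops below target_db."""
    best = 0
    for n in range(1, max_bits + 1):
        if snr_db(samples, quantize(samples, n)) < target_db:
            break
        best = n
    return best


# Hypothetical 16-bit test tone and a 60 dB quality target.
samples = [round(20000 * math.sin(2 * math.pi * k / 50)) for k in range(500)]
target_db = 60.0
n_bits = max_bits_for_quality(samples, target_db)

print("LSBs that can be dropped:", n_bits)
print("resulting SNR (dB):", round(snr_db(samples, quantize(samples, n_bits)), 1))
```

A real engine would presumably evaluate the metric on the fly and per block of samples; the point here is only that a cheap, computable metric lets the amount of loss be chosen automatically.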
So who wants just-good-enough compression with relatively low compression ratios? The answer may be that lots of people do, many of them without knowing it yet. Currently, a couple of the more promising applications with which Samplify has engaged are medical imaging and cellular base stations.
In computed tomography (CT) applications, lossless data compression at even less than 2:1 can mean nearly doubling the amount of data that you can get through the moving mechanical connector between the rotating ring of x-ray detectors and the processing unit. Accepting a little more noise, while staying below the threshold that diagnosticians regard as visible, can mean a quite significant reduction in the data rate through this bottleneck.
In cellular base stations, the situation is similar. Here, the bottleneck is the cable running up from the baseband processor to the power amplifiers that now hang on the antenna pole. Again, a very modest compression ratio can mean preserving the same physical connection while carrying more data: either more antennas or, perhaps, migration to a more demanding air interface. Surprisingly, this application also turns out to be tolerant of small levels of distortion, Wegener observes, so base station designers have also begun looking at the higher-ratio, lossy algorithms.
We mentioned earlier that the algorithm is fast and simple. It’s time to quantify that. Wegener says that the 16-bit version of the full engine, including both preprocessor and lossless compression block, occupies about 1200 slices in a Virtex-5 FPGA, and it can handle data at 300 Msamples/s in that implementation. A software-only version of the engine running on a 2 GHz dual-core Xeon processor can handle data at up to 100 Msamples/s. The company is currently investigating other processing environments, including the IBM Cell processor.
There are some very intriguing hints at other applications. For one, the algorithm lends itself to use inside signal-processing pipelines. Wegener says that because of the straightforward arithmetic relationship between the original and compressed bit streams, you can partially decompress the data, perform arithmetic operations on the stream in this intermediate form, and then recompress the results, sometimes significantly reducing the depth or frequency of a pipeline. Beamforming in particular has proven compatible with this approach, Wegener says. Additionally, it is possible in principle to combine other numerical algorithms with the compression, so that, for instance, a single engine might be able to do both crest-factor reduction and compression on an OFDM baseband signal headed up an antenna cable.
In another direction, the fact that the engine is relatively simple opens some interesting possibilities in die-to-die interconnect. Off-die interconnect is increasingly moving toward high-speed serial links to reduce the pin count, power, and signal integrity issues involved in moving large amounts of data between chips. This trend is probably most advanced in FPGA-based logic emulation systems, but it is showing up elsewhere in the multi-die design world as well, including inside multi-die packages. A small, low-power compression engine sitting between the SerDes and the digital portion of an SoC could substantially reduce the bandwidth requirements for these inter-die links. It’s worth thinking about.















