
Knowing which FPGA (field-programmable gate array) or CPLD (complex programmable-logic device) will match your application can save you significant design time. The first issue to resolve--but one which is beyond the scope of this article--concerns in-system versus off-line programmability. Next, you should determine whether a device's capacity and performance fit your application. Some simple selection guides in this article can help you compare various devices' capacity and performance. Once you resolve the fitness question, you can choose among a manageable number of devices, looking at cost, support, availability, etc (see box, "Articles that cover other aspects of multifaceted devices").
You can easily use all the selection guides in this article without spending an enormous amount of time studying device architectures and performing abstruse fitting calculations. Because design involves a delicate balancing act in which intuition plays a crucial role, no off-the-shelf benchmark can replace experience.
Unfortunately, some of the most common metrics for FPGAs and CPLDs are worthless. Take, for example, a number borrowed from the gate-array world, the so-called equivalent-gate count. The count is the number of 2-input NAND gates needed to implement a design that the device can also implement. Because the fundamental building blocks in most programmable devices are not NAND gates, the gate count depends on which circuit is chosen for the implementation and on which vendor does the counting. Besides, actual circuits seldom consist solely of 2-input NAND gates. Thus, gate count is not a good capacity measure for programmable devices. Other common comparison parameters are the numbers of I/Os, macrocells, flip-flops, and clocks, and speed or frequency.
Specifications not standardized
The lack of standardized specifications presents a major difficulty in interpreting vendor data books. In addition, data-book specifications are not necessarily usable. Take fMAX as an example: this vendor specification often does not account for setup and hold times for register-to-register operations. Also, data books sometimes cite fMAX without indicating whether the frequency is internal or external.
In addition, most devices' parameters change with operating conditions such as frequency, temperature, loading, voltage level, etc. But the industry has no standardized environment for measuring parameters.
Finally, vendors derive most of their specifications from devices programmed with elementary logic structures that have little or no relation to real designs.
Interpreting PREP benchmark results
PREP benchmarks (Table 1), on the other hand, provide data derived from standardized tests in a standard format. Unfortunately, no recognized way of using PREP results exists, and every company will put its own twist on interpreting PREP benchmark results. Already some companies emphasize cost per instantiation while others stress the importance of system characteristics that are not benchmarked. Still other companies use biased "averages" of benchmark results, and accusations of misrepresentation and "unfair benchmarking" also exist. When reading advertisements, remember what Mark Twain once said, "There are three kinds of lies: lies, damned lies, and statistics."
Capacity comparison
One way to compare the capacities of different devices is to take the mean of the benchmark repetition numbers for each device. Because benchmark circuits are of varying sizes, the simple arithmetic mean is not appropriate. For example, suppose that the repetition numbers of devices A and B for benchmarks 1, 2, and 3 are as follows:
ERRONEOUS AVERAGE CAPACITY

| Benchmark | Device A | Device B |
|---|---|---|
| 1 | 2 | 3 |
| 2 | 2 | 4 |
| 3 | 20 | 17 |
| Arithmetic mean | 8 | 8 |
| Harmonic mean | 2.86 | 4.67 |
The arithmetic means of the two devices' capacities indicate that they have the same capacity. But device B is 50% "larger" than device A for benchmark 1, and 100% "larger" for benchmark 2. Device B is only 15% "smaller" than device A for benchmark 3. So, saying that they have the same capacity is not accurate.
You could use a weighted arithmetic mean, perhaps giving larger circuits higher weights. However, capacity measurements for each individual benchmark circuit are highly architecture-dependent, and different vendors implement circuits in different ways, so getting vendors to agree upon a set of weights would be very difficult. A better way to compute a valid average capacity is to look at resource utilization.
The following two measures, each the inverse of the other, can provide a comparable metric for programmable-device "sizes."
Utilization measure (average resources used per benchmark):
$$UM = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{R_i}$$
Capacity measure (average number of repetitions per device):
$$CM = \frac{n}{\sum_{i=1}^{n} 1/R_i}$$
PREP's benchmark method repeats a simple, standardized circuit until no more copies fit in the device under test. Therefore, if $R_i$ is the repetition number of benchmark circuit $i$, then $1/R_i$ is the portion of the device's resources that one instance of circuit $i$ uses, and $\sum_{i=1}^{n} 1/R_i$ is the portion that one instance of each of the $n$ benchmark circuits uses together. Dividing by $n$ gives the average usage per benchmark circuit, $UM$. The inverse of this average value is the average capacity measure, $CM$.
Note that the capacity measure is the harmonic mean of the capacity measurements of the individual benchmarks. Depending on the nature of your data, you can obtain a mean value in three different ways: arithmetic mean, harmonic mean, or geometric mean. Although the properties of these three means come under the heading of elementary statistics, surprisingly few engineers apply them properly, so a review of a math text may be in order.
In this case, taking the harmonic mean of capacity benchmarks is reasonable because the numbers are rates (instantiations per device). The harmonic mean automatically gives higher weights to smaller numbers (ie, larger benchmark circuits). The harmonic means for devices A and B in Table 1 are 2.86 and 4.67, respectively. Based on these numbers, device B has the larger overall capacity.
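As a quick check, the arithmetic and harmonic means in the example table can be reproduced in a few lines of Python. This is a minimal sketch; the function names are illustrative and not part of any PREP tool, and the repetition numbers are simply those of devices A and B above.

```python
def arithmetic_mean(values):
    return sum(values) / len(values)


def utilization_measure(reps):
    """UM: average fraction of the device one benchmark instance uses."""
    return sum(1.0 / r for r in reps) / len(reps)


def capacity_measure(reps):
    """CM: harmonic mean of the repetition numbers, the inverse of UM."""
    return 1.0 / utilization_measure(reps)


# Repetition numbers for benchmarks 1, 2, and 3 from the example table.
device_a = [2, 2, 20]
device_b = [3, 4, 17]

for name, reps in (("A", device_a), ("B", device_b)):
    print(f"Device {name}: arithmetic {arithmetic_mean(reps):.2f}, "
          f"CM {capacity_measure(reps):.2f}")
```

Both devices share an arithmetic mean of 8, while CM separates them (2.86 vs 4.67) because the small repetition numbers of the large benchmark circuits dominate the sum of reciprocals.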
For a specific application, you can obtain a more accurate estimate by dividing a design into parts that match PREP's benchmark circuits. For example, suppose that a certain device's benchmark shows that it can accommodate ten 16-bit counters. If your application needs two 16-bit counters, they would occupy roughly 20% of the device. If you are trying to fit all the parts into one device, the percentages for all parts of the design should add up to somewhat less than 100%.
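The part-by-part estimate amounts to summing count/repetitions over the parts of the design. The sketch below assumes each part maps cleanly onto one PREP benchmark circuit; the large-state-machine entry and its repetition number are hypothetical.

```python
def fraction_used(parts):
    """Estimate the fraction of one device that a design occupies.

    parts is a list of (count_needed, repetitions) pairs: count_needed is
    how many copies of a PREP-like circuit the design needs, and repetitions
    is how many copies of that benchmark circuit fit in the device.
    """
    return sum(count / reps for count, reps in parts)


# Two 16-bit counters in a device whose benchmark shows ten fit.
print(f"{fraction_used([(2, 10)]):.0%} of the device")  # 20% of the device

# Hypothetical extra part: one large state machine, with 4 fitting per device.
design = [(2, 10), (1, 4)]
used = fraction_used(design)
# Staying below 100% leaves some margin, as the text recommends.
print(f"{used:.0%} used; fits: {used < 1.0}")
```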
Weighted-capacity measurements
The weighted-capacity measure is the weighted harmonic mean of the individual benchmark capacities:
$$CWM = \frac{\sum_{i=1}^{n} w_i}{\sum_{i=1}^{n} w_i / R_i}$$
where $w_i$ is the weight you assign to benchmark $i$.
For example, if a certain type of application uses large state machines, 16-bit accumulators, and 4×4 ALUs in a ratio of about 1:2:4, then the weights you assign to the respective PREP benchmarks can simply be 1, 2, and 4. Give "unused" PREP benchmark circuits weights of zero.
For capacity comparisons, compute the weighted harmonic means for all candidate devices to determine which one has the best capacity for the given type of application. If the weights are the actual numbers of each circuit type in your design, $\sum w_i / R_i$ is the fraction of the device the design occupies, so for a device to fit the application, its weighted harmonic mean must be greater than $\sum w_i$. Note that if you replace $\sum w_i$ by 1 in the numerator, CWM estimates the number of repetitions of the entire design that would fit into a given device. In this case, CWM should be greater than, but close to, 1 for a good capacity match between the device and your application.
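The weighted-capacity test can be sketched as follows. This assumes the weights are the actual circuit counts in the design (here 1, 2, and 4, matching the 1:2:4 example); the repetition numbers 4, 8, and 16 are hypothetical, not published PREP results.

```python
def weighted_capacity(weights, reps):
    """CWM = sum(w_i) / sum(w_i / R_i), the weighted harmonic mean of R_i."""
    # Zero-weight benchmarks are unused by the design, so skip them.
    used = [(w, r) for w, r in zip(weights, reps) if w > 0]
    return sum(w for w, _ in used) / sum(w / r for w, r in used)


def design_copies(weights, reps):
    """CWM with 1 in the numerator: copies of the whole design that fit."""
    used = [(w, r) for w, r in zip(weights, reps) if w > 0]
    return 1.0 / sum(w / r for w, r in used)


# Large state machines, 16-bit accumulators, 4x4 ALUs (hypothetical data).
weights = [1, 2, 4]
reps = [4, 8, 16]

cwm = weighted_capacity(weights, reps)
print(f"CWM = {cwm:.2f}, fits: {cwm > sum(weights)}")
print(f"design copies = {design_copies(weights, reps):.2f}")
```

Here the design uses 1/4 + 2/8 + 4/16 = 75% of the device, so CWM (about 9.33) exceeds the sum of the weights (7), and roughly 1.33 copies of the design would fit.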
Of course, you cannot properly match all real circuits with PREP's benchmark circuits. In these cases, you may have to use some "equivalence" estimations. For example, you may equate a 31-stage, linear-feedback shift register to two of PREP's 16-bit counters.
RENT'S RULE PROVIDES ANOTHER DEVICE METRIC

An interesting architectural measure is the logic-to-pin ratio. According to Rent's rule (an empirical equation relating I/O pins to circuits, derived by IBM engineer E Rent),
$$I = kC^{p}$$
where $I$ is the number of I/O lines, $C$ is the number of circuits, and $k$ and $p$ are positive constants. For programmable devices, $I$ is the number of user I/O pins and $C$ is the number of usable gates. While various studies have produced different values of $k$ and $p$, $p = 0.5$ is generally a good value to use for VLSI circuits (making the number of I/O pins proportional to the square root of the number of circuits). Solving for $k$,
$$k = \frac{I}{C^{0.5}}$$
turns out to be between 0.5 and 2.5 (mostly between 1 and 2) for programmable devices. Because $k$ grows as pins are added to a fixed amount of logic, use $k = 1.5$ as the dividing line: consider a device as having a high pin-to-logic ratio if $k > 1.5$ and a high logic-to-pin ratio if $k < 1.5$. In general, high pin-to-logic devices are good for I/O-intensive applications, while high logic-to-pin devices are good for logic-intensive applications. Consequently, some PLD vendors produce devices having the same internal logic but with different numbers of I/O pins.

Another useful measure is the ratio between the number of registers (flip-flops) and the number of logic gates. In general, high register-to-logic devices are good for sequential applications, and high logic-to-register devices are good for combinational applications.
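A sketch of the Rent's-rule metric, assuming p = 0.5 and the 1.5 dividing line discussed above. Because k = I/C^0.5 rises when pins are added to a fixed amount of logic, a k above 1.5 indicates a high pin-to-logic ratio. The pin and gate counts are hypothetical, and the "balanced" label for k exactly at the dividing line is an illustrative addition.

```python
def rent_k(io_pins, usable_gates, p=0.5):
    """Solve Rent's rule I = k * C**p for k."""
    return io_pins / usable_gates ** p


def pin_logic_class(k, dividing_line=1.5):
    """Larger k means more user I/O pins per unit of logic."""
    if k > dividing_line:
        return "high pin-to-logic"
    if k < dividing_line:
        return "high logic-to-pin"
    return "balanced"  # exactly on the dividing line (illustrative label)


# Hypothetical devices: (user I/O pins, usable gates).
for pins, gates in ((160, 10_000), (100, 10_000)):
    k = rent_k(pins, gates)
    print(f"{pins} pins, {gates} gates: k = {k:.2f} -> {pin_logic_class(k)}")
```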
Performance comparison
Again, the simple arithmetic mean is not a good measure of performance. In digital systems, the slower, lower-frequency parts tend to dominate (limit) performance; hence, you should give them higher weights. The argument here parallels the one for the capacity measure: the following two alternatives, each the inverse of the other, can serve as measures of programmable-device speed.
Delay measure (average delay per benchmark):
$$TM = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{f_i}$$
Performance measure (average frequency for the device):
$$f_M = \frac{n}{\sum_{i=1}^{n} 1/f_i}$$
If $f_i$ is the average frequency for benchmark $i$, then $1/f_i$ is the average delay for benchmark circuit $i$, and $\sum_{i=1}^{n} 1/f_i$ is the total delay of the $n$ benchmark circuits. Therefore, $\frac{1}{n}\sum_{i=1}^{n} 1/f_i$ is the average delay per benchmark circuit. The inverse of this average value is the average frequency.
Because frequency benchmarks are also rates (cycles per second), the harmonic mean is a reasonable single performance measure. Note that the harmonic mean is always less than or equal to the arithmetic mean; in a way, it incorporates a penalty for high variance. For a device with uniform performance results across the entire benchmark suite, the harmonic mean equals the arithmetic mean. For a device with varying benchmark results, however, the harmonic mean is always below the arithmetic mean, and the wider the variation, the further below it falls.
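The variance penalty is easy to see numerically. This minimal sketch uses two hypothetical sets of benchmark frequencies with the same arithmetic mean; with frequencies in MHz, the delay measure comes out in microseconds.

```python
def arithmetic_mean(values):
    return sum(values) / len(values)


def delay_measure(freqs_mhz):
    """TM: average delay per benchmark (microseconds when f is in MHz)."""
    return sum(1.0 / f for f in freqs_mhz) / len(freqs_mhz)


def performance_measure(freqs_mhz):
    """fM: harmonic mean of the benchmark frequencies, i.e. 1 / TM."""
    return 1.0 / delay_measure(freqs_mhz)


# Hypothetical benchmark frequencies (MHz); both average 50 MHz arithmetically.
uniform = [50.0, 50.0, 50.0]
varying = [20.0, 50.0, 80.0]

for label, freqs in (("uniform", uniform), ("varying", varying)):
    print(f"{label}: arithmetic {arithmetic_mean(freqs):.1f} MHz, "
          f"fM {performance_measure(freqs):.1f} MHz")
```

For the uniform set, fM equals the arithmetic mean; for the varying set, the slow 20-MHz result pulls fM down to about 36.4 MHz.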
Again, for a specific design, you need only examine the numbers from benchmark circuits that match your design. For example, if the target design requires circuits similar to PREP's large state machine, a 4-bit ALU, and a 16-bit counter, each at a certain operating frequency, then every candidate device's results for these three benchmark circuits must exceed the respective target operating frequencies. Of course, your actual circuits are not going to be the same as the benchmark circuits, so use the benchmark results only as first-order estimates.
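The per-circuit screening rule can be written as a single check over the benchmarks a design resembles. The benchmark names, target frequencies, and device results below are all hypothetical; a device missing a required result is treated as failing.

```python
def meets_requirements(device_mhz, required_mhz):
    """True if the device's result for every benchmark the design resembles
    exceeds the target operating frequency for that part of the design."""
    return all(
        name in device_mhz and device_mhz[name] > target
        for name, target in required_mhz.items()
    )


# Hypothetical targets and benchmark results, all in MHz.
required = {"large state machine": 30.0, "4-bit ALU": 25.0, "16-bit counter": 60.0}
device_x = {"large state machine": 40.0, "4-bit ALU": 30.0, "16-bit counter": 75.0}
device_y = {"large state machine": 45.0, "4-bit ALU": 20.0, "16-bit counter": 90.0}

for name, results in (("X", device_x), ("Y", device_y)):
    print(name, meets_requirements(results, required))
```

Device Y fails despite its fast counter, because its ALU result falls short of the 25-MHz target.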
Weighted harmonic performance mean
A weighted harmonic mean for performance makes no sense because, in general, you cannot trade the operating frequency of one part of the circuit for the frequency of another part. In a synchronous system, the part of the circuit with the worst delay often determines the design's overall performance.
Examining the best and worst cases can give you some idea of the upper and lower bounds of performance. For example, if you use the critical-path design method, you could use the best-case result, because the benchmark shows that proper design can achieve the reported speed.
Some devices do not have all nine benchmark results reported because vendors were unable either to implement or to measure some of the benchmark circuits. Compute the harmonic means for these devices using an appropriately reduced n (the number of benchmarks actually reported).
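When some results are missing, the harmonic mean is taken over only the reported benchmarks, so n shrinks to match. In this sketch, None marks an unreported result (an assumed convention); the same function serves for repetition numbers or frequencies, and the values are hypothetical.

```python
def harmonic_mean_reported(results):
    """Harmonic mean over reported results only; None marks a missing one.

    n is reduced to the number of benchmarks actually reported.
    """
    reported = [r for r in results if r is not None]
    if not reported:
        raise ValueError("no benchmark results reported")
    return len(reported) / sum(1.0 / r for r in reported)


# Hypothetical frequencies (MHz) with two of five benchmarks unreported.
print(harmonic_mean_reported([40.0, None, 60.0, None, 24.0]))
```

Here n = 3, and 1/40 + 1/60 + 1/24 = 1/12, so the reduced harmonic mean is 36 MHz.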
Articles that cover other aspects of multifaceted devices
EDN's editorial office does not handle reprints. To order reprints of EDN articles, contact
University Microfilm International, 300 N Zeeb Rd, Ann Arbor, MI 48106, (313) 761-4700. Acct # 1497
Single-number comparison
The capacity-speed product is an option if you want a single figure of merit for capacity-performance comparisons. This number is the simple product of CM and fM, and "retz" (repetition-megahertz) can be the unit of measure. However, you must be aware that you cannot trade capacity for speed or vice versa as you can with the gain-bandwidth product for op amps. The capacity-speed product is only a figure of merit with which you can quickly eliminate unsuitable devices. For example, if an application requires a device to have 500 retz, then all devices with retz numbers less than 500 would not qualify for the application. From among qualified devices, you then need to further qualify capacity and speed individually.
If you wish to include the cost in a single figure of merit, one way is the retz-per-dollar (rpd). That is, divide the retz number of the device by its cost. Thus, if a 500-retz device costs $100, then it is a 5-rpd device. However, device prices change regularly so be aware that rpd numbers can become outdated quickly.
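The retz screen and the rpd figure reduce to a product and a quotient. The candidate CM and fM values below are hypothetical; the 500-retz requirement and the $100 price come from the examples above.

```python
def retz(cm, fm_mhz):
    """Capacity-speed product in retz (repetition-megahertz): CM x fM."""
    return cm * fm_mhz


def rpd(retz_value, price_dollars):
    """Retz per dollar."""
    return retz_value / price_dollars


# Hypothetical candidates: name -> (CM, fM in MHz).
candidates = {"dev1": (10.0, 60.0), "dev2": (5.0, 80.0), "dev3": (8.0, 70.0)}
required_retz = 500.0

# Devices below the required retz are eliminated; survivors still need
# separate capacity and speed checks.
qualified = sorted(
    name for name, (cm, fm) in candidates.items() if retz(cm, fm) >= required_retz
)
print(qualified)           # ['dev1', 'dev3']
print(rpd(500.0, 100.0))   # 5.0
```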
Additional information
Benchmark results also provide some idea of the predictability of performance. A device with small differences between best- and worst-case frequencies has more predictable timing. Similarly, a device that permits a high number of instantiations of one particular benchmark circuit, but is comparable to or worse than other devices on the other benchmark circuits, probably best suits applications similar to that particular benchmark circuit. The percent-fill numbers indicate whether the particular device architecture is amenable to the type of circuit the benchmark represents.
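One simple way to quantify the best/worst-case spread is the ratio of the highest to the lowest benchmark frequency. This ratio is an illustrative choice rather than a PREP-defined metric, and the data are hypothetical.

```python
def frequency_spread(freqs_mhz):
    """Ratio of best-case to worst-case benchmark frequency.

    A ratio close to 1 means small best/worst differences, which suggests
    more predictable timing.
    """
    return max(freqs_mhz) / min(freqs_mhz)


steady = [45.0, 50.0, 55.0]   # hypothetical results
erratic = [20.0, 50.0, 90.0]  # hypothetical results
print(f"steady: {frequency_spread(steady):.2f}, "
      f"erratic: {frequency_spread(erratic):.2f}")
```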
In any case, be aware of routing congestion, which can significantly reduce the number of usable logic cells. While digital designers usually have a good idea of timing and logic requirements early in the design, they have little idea of the interconnection requirements, because these depend heavily on device architecture and implementation.
In summary, to use the benchmark results, first choose devices that are of comparable capacity (using raw benchmark results, harmonic means, or weighted harmonic means) and big enough for the application. Then choose, from among these candidate devices, the ones that can satisfy the performance requirement (using either raw benchmark results or harmonic means).
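The two-step procedure can be sketched as a capacity screen followed by a performance screen. This version assumes plain harmonic means for both steps (raw or weighted results could be substituted), and the device names, results, and thresholds are hypothetical.

```python
def harmonic_mean(values):
    return len(values) / sum(1.0 / v for v in values)


def select_devices(devices, min_capacity, min_frequency):
    # Step 1: capacity screen using the harmonic mean of repetition numbers.
    big_enough = {
        name: d for name, d in devices.items()
        if harmonic_mean(d["reps"]) >= min_capacity
    }
    # Step 2: performance screen on the survivors, using the harmonic mean
    # of benchmark frequencies (MHz).
    return sorted(
        name for name, d in big_enough.items()
        if harmonic_mean(d["freqs_mhz"]) >= min_frequency
    )


# Hypothetical benchmark data for three candidate devices.
devices = {
    "P": {"reps": [2, 2, 20], "freqs_mhz": [40.0, 40.0, 40.0]},
    "Q": {"reps": [3, 4, 17], "freqs_mhz": [30.0, 30.0, 30.0]},
    "R": {"reps": [6, 6, 6], "freqs_mhz": [50.0, 50.0, 50.0]},
}
print(select_devices(devices, min_capacity=4.0, min_frequency=35.0))
```

P drops out on capacity (about 2.86), Q passes capacity but fails on speed (30 MHz), and only R survives both steps.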
Obviously, you should give the highest priority to the critical parts of your design when looking at benchmark results. For example, if a critical part of your design is a high-speed counter, then the counter benchmark should weigh more than other benchmark results, and perhaps even be the determining factor. Do not mix benchmark results obtained under different conditions (eg, automatic vs manual routing, or optimized for capacity vs optimized for speed).
Comments on PREP benchmarks
Using PREP benchmarks to measure programmable devices is like running benchmark programs to measure computer performance. Even though some benchmark programs are extracts from real programs, real programs seldom resemble these benchmark programs.
PREP benchmarks have the same problem: not enough benchmark circuits exist to represent all possible real design needs. PREP benchmarks are useful for applications similar to existing MSI circuits. They are small- to medium-sized circuits that are not necessarily useful for designing complex circuits such as video graphics.
Remember, also, that the results of individual benchmarks are highly architecture-dependent. Architecture dependency explains why a device can have higher capacity than others in one benchmark circuit but lower than others in another. CPLDs and FPGAs should use different benchmark suites because they differ from each other.
PREP's step-and-repeat method for filling up a device with repetition of the same basic circuit does not reflect real PLD applications. The step-and-repeat method emphasizes local interconnections and downplays global interconnections.
Since benchmarks do not address other real-life factors such as power consumption, number of I/O pins, output-drive capability, packaging, 3-state capability, testability, in-circuit programmability, etc, data books are still important sources of information.
The role of software in benchmark results introduces another complicating factor. Do the benchmark results reflect the software tools or the expertise of the designer using the tools? Remember that versions 1.2 and 1.3 of PREP's benchmark results differ simply because the tools changed. For example, some software can optimize a design, thus producing more impressive benchmark results. Practitioners also disagree about just what constitutes "manual" routing.
Also, vendors may concoct devices or software specifically tailored to the benchmark circuits. Thus, a device may have high benchmark ratings yet yield poor results for real applications.
The computer industry has wasted considerable manpower on such benchmark-specific tuning just to make inflated claims. What we really need is a universal design tool that allows designers to quickly benchmark devices using their actual designs.
In spite of these limitations, however, PREP benchmarks do provide some useful information. I hope future improvements will make them even more useful.
1. PREP Inc, 504 Nino Ave, Los Gatos, CA 95032. (408) 356-2169. FAX (408) 356-0195.