FROM EDN EUROPE: Reconfigurable systems shape up for diverse application tasks
Hard times for the electronics industry have led to conservative design approaches. With design starts recovering, designers are renewing their interest in radical techniques in the race to gain an edge in system cost and power demand.
By Graham Prophet, Editor -- EDN, January 8, 2004
Recent advances in silicon-process technology have greatly benefited the makers of programmable logic. The high density of logic possible at process geometries such as 130 nm (0.13 microns) and, now, 90 nm has suited the high densities that programmable logic needs to pack in not just the active logic gates, but also the overhead of the programming infrastructure that it must carry. And, the increasing number of levels of metal interconnect that these processes can support has proved ideal for the eight and even 10 layers of complex programmable interconnect that FPGAs require. FPGAs now stand alongside DRAM as the vehicles of choice to first exploit the capabilities of the most advanced process in volume production. Xilinx (www.xilinx.com), for one, has said that it will some time in 2004 build its first device with 1 billion transistors.
Naturally, the availability of parts of this complexity renews interest in the creation of complete systems on programmable silicon, greatly encouraging silicon vendors. In a related strand of thinking, there is also a renewed interest in reconfigurable systems. After all, the most elaborate FPGAs are and always have been costly, so why not make the programmable elements work harder, multiplexing functions into them at different times? Such an approach could allow you to use a smaller device and reduce the project's cost.
FPGAs with multiplexed functions are just one example of the many devices that you can place into the broad category of "reconfigurable systems" (Figure 1). In one sense, of course, every microprocessor-based system is reconfigurable, by means of its software programming. The differentiating factor in this discussion is that the hardware itself is changeable, for a variety of reasons. At the most esoteric level, a line of reasoning envisages a computing architecture that executes all functions on hardware specifically structured to carry them out with maximum efficiency. Such a system would build these structures from a limited pool of resources just before the flow of computation requires them and then release the resources for reuse as soon as they fulfil their roles. However, such an approach remains largely the province of academic research. At the other end of the scale is a conventional, programmable-logic-based option whose hardware configuration you can revise or update throughout its life to correct flaws or to perform system upgrades.
A chart from Altera (www.altera.com) provides a convenient set of subdivisions of the ground between the two extremes; it classifies reconfigurable systems into four divisions according to the intervals between reloading the configuration file of the programmable logic. Revisions that happen monthly or yearly are the province of bug fixes, upgrades, or, perhaps, the temporary replacement of regular operating code with a diagnostic or service configuration. For revisions that happen hourly or daily, you might load configuration files into a system at boot-up or reboot; such an approach might use a single hardware platform to carry out different functions for different users or applications. More frequent reconfiguration marks out the third category: systems in which reconfiguration happens at intervals of 100 msec to minutes. You might load functions into a programmable section of these systems, perhaps to implement different algorithms at different stages of their operation. All of these systems, as Altera says, are in use today in production, and if you are seeking to design in these styles, an increasing range of point tools can help.
In the fourth category, reconfiguration takes place on a time-scale of microseconds or milliseconds. This approach, Altera contends, requires a radically different design style. In the first three cases, you explicitly write the code to perform the full or partial reloading of the programmable logic's configuration file in predetermined circumstances and with your design in a known state. The most complex case is characterised by true reconfigurable computing, in which the program flow determines the hardware configuration to be loaded, and a context switch takes place under software control. This approach cedes control of the form of the hardware of the system to the system itself. It not only is inherently difficult to program, because it is a scheduling problem of considerable complexity, but also presents a significant verification challenge, because the number of states that the system can potentially occupy rises massively.
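As a minimal sketch of the four divisions in Altera's chart, the Python fragment below maps a reconfiguration interval to a category. Only the 100-msec boundary comes from the chart as described here; the one-hour and one-week boundaries, and all of the names, are assumptions made for illustration.

```python
from enum import Enum


class ReconfigClass(Enum):
    UPGRADE = "monthly or yearly: bug fixes, upgrades, service configurations"
    BOOT_TIME = "hourly or daily: configuration loaded at boot-up or reboot"
    RUNTIME_SWAP = "100 msec to minutes: functions swapped during operation"
    CONTEXT_SWITCH = "microseconds or milliseconds: reconfigurable computing"


# Boundaries in seconds. Only the 100-msec figure is taken from the text.
CONTEXT_SWITCH_LIMIT_S = 0.1
RUNTIME_SWAP_LIMIT_S = 3600.0          # assumption: "minutes" ends at one hour
BOOT_TIME_LIMIT_S = 7 * 24 * 3600.0    # assumption: "daily" ends at one week


def classify(interval_s: float) -> ReconfigClass:
    """Return the category for a given interval between reconfigurations."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if interval_s < CONTEXT_SWITCH_LIMIT_S:
        return ReconfigClass.CONTEXT_SWITCH
    if interval_s < RUNTIME_SWAP_LIMIT_S:
        return ReconfigClass.RUNTIME_SWAP
    if interval_s < BOOT_TIME_LIMIT_S:
        return ReconfigClass.BOOT_TIME
    return ReconfigClass.UPGRADE
```

The interval alone does not capture the real dividing line for the fourth category, which is that the program flow, rather than explicitly written code, decides when a new configuration loads.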
Why, then, aspire to create such systems? There is a distinction in the list of operation models between simply exploiting the versatility of systems already built in programmable logic (providing the facility for system upgrades, for example) and using the power of reprogrammability to implement different functions at different stages of the product's runtime. Underlying many of the concepts is the increase in performance and efficiency that you can gain by implementing functions in hardware: Individual functions or complete systems in hardware run much faster or at lower power than they do in software. A separate but related motivation is to reduce the total amount of hardware involved by reusing it during operation.
The quest for true reconfigurable computing might have progressed further if Moore's Law had not maintained its relentless drive to higher densities in process technology. The concept of invoking only the functions you require at the time you require them and using the minimum amount of reusable hardware has always offered an attractive elegance. In other words, why build millions of transistors if thousands will do? The answer in many cases has been that, given the progress of silicon technology, building millions of transistors has been the easiest thing to do, even if on some chips most of the transistors spend most of their time switched off. However, much (mostly academic) work continues on the subject, and has been reported in a variety of conferences (see, for example, Reference 1).
To the attractions of elegance, you must now add the efficiency of executing functions in hardware and in a minimal footprint, as more and more devices go handheld and battery-powered. A group within the Belgian semiconductor-and-systems research organisation IMEC (Interuniversity Microelectronics Centre, www.imec.be) has for some time been working on reconfigurable systems. Its Gecko demonstrator, which has been through a number of iterations, offers a broader scope than reconfigurable hardware alone (Figure 2). It demonstrates software/hardware multitasking, switching tasks between the two, on a platform that is itself hardware-reconfigurable.
The multimedia handheld terminal bases its configurable processing scheme on a Xilinx Virtex FPGA. The architecture divides the FPGA into a number of "tiles," which you can configure separately; a bus scheme constructed using the on-chip routing links the tiles. Gecko demonstrates application scaling using reconfiguration. In "video" mode, it loads an MPEG processor in hardware using most of the programmable tiles and displays video on the full screen at more than 20 frames/sec. (A PDA provides a demonstrator screen.) It can also run multimedia games. In that case, the operating system reallocates most of the FPGA tiles to 3-D graphics acceleration and reconfigures them accordingly to process the game. Meanwhile, the system reconfigures one or two tiles as the core of a software-based MPEG decoder. The video continues to run at a few frames per second in a small window on the screen. The system maintains and seamlessly transfers all program threads during reconfiguration.
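The tile reallocation that Gecko performs can be pictured with a small Python sketch. It is not IMEC's code: the tile count of eight, the task names, and both functions are hypothetical, chosen only to show how comparing two tile plans yields the set of tiles that need a new configuration.

```python
def plan_tiles(num_tiles, demands):
    """Assign tiles to tasks in order; demands maps task name -> tile count.

    Returns a list with one entry per tile: the task name, or None if unused.
    """
    if sum(demands.values()) > num_tiles:
        raise ValueError("plan needs more tiles than the device provides")
    assignment = []
    for task, count in demands.items():
        assignment.extend([task] * count)
    assignment.extend([None] * (num_tiles - len(assignment)))
    return assignment


def tiles_to_reconfigure(old_plan, new_plan):
    """Return the indices of tiles whose function differs between two plans."""
    return [i for i, (old, new) in enumerate(zip(old_plan, new_plan)) if old != new]


# Hypothetical figures: 8 tiles, 7 of them used by the hardware MPEG processor.
video_mode = plan_tiles(8, {"mpeg_hw": 7})
# In game mode most tiles go to 3-D graphics and two host the software decoder core.
game_mode = plan_tiles(8, {"gfx3d": 6, "mpeg_sw_core": 2})
changed = tiles_to_reconfigure(video_mode, game_mode)
```

A real system must also carry the state of each running thread across the switch, which this sketch leaves out.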
In another exercise, the IMEC team is turning its attention to a design study for a reconfigurable radio modem that could handle multiple radio-data standards. The ultimate objective is a modem that will scale transparently while roaming across multiple networks with widely different data rates. Conventionally, you would implement all the necessary modems and switch between them; transistors may be inexpensive, says IMEC's vice president of design technology, Rudy Lauwereins, but they're not that inexpensive. And you cannot tackle this class of problem with a general-purpose FPGA; the overhead in area and power is far too high for a degree of versatility that you cannot afford and do not need. This design is not a "software-radio" design in the sense that it still contains an RF front end, but all of the baseband functions are modular and can interconnect in different configurations depending on the data standard they're handling. Analysis of the problem shows that for a set of 14 modular circuit blocks, you could use 11 in all of the configurations that the design uses. For the three blocks that change, one or two options cover all possibilities.
Appropriate flexibility, IMEC's Lauwereins emphasises, is the key to designing a reconfigurable system. To achieve the ultimate efficiencies, he says, "you must choose a level of granularity that is optimal for the particular system you are designing; it needs to be just flexible enough." In the case of the modem, you can begin to identify the appropriate level by noting common features, such as the types of coding involved; at the right level, only a few things change as you change from one protocol to another. Lauwereins acknowledges, however, that little help in the form of commercial tools is now available to designers who want to carry out that analysis.
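The first step of that analysis, separating the blocks that every configuration shares from those that change, reduces to set arithmetic. The Python sketch below assumes each radio standard is described as a set of block names; the standards and blocks shown are invented for illustration and are not IMEC's partitioning.

```python
def split_blocks(configs):
    """Split blocks into those used by every configuration and those that vary.

    configs maps a standard's name to the set of block names it uses.
    Returns (common, variable) as two sets.
    """
    if not configs:
        raise ValueError("at least one configuration is required")
    block_sets = list(configs.values())
    common = set.intersection(*block_sets)
    variable = set.union(*block_sets) - common
    return common, variable


# Hypothetical standards and block names.
modem_configs = {
    "standard_a": {"sync", "equaliser", "demapper", "viterbi"},
    "standard_b": {"sync", "equaliser", "demapper", "turbo"},
    "standard_c": {"sync", "equaliser", "demapper", "viterbi", "fft"},
}
common, variable = split_blocks(modem_configs)
```

Repeating the split at several levels of granularity shows where the variable set is smallest, which is one way to look for a "just flexible enough" partition.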
A global objective of IMEC's research into portable products is the reduction of power by orders of magnitude. The process begins with "software washing," in which a software tool set optimises "dirty" C code that simply describes system functions and "cleans" it into a multithreaded form. A key component of this process is the Atomium tool set. Atomium is not yet commercially available, but at press time, IMEC said that it was in the late stages of negotiation to license it to a commercial EDA concern. A basic premise of the tool set is that in a data-dominated application, your system expends a great deal of power in memory transactions. This amount, it turns out, is reducible by a factor that is far greater than intuitive reasoning might suggest. You reduce power by optimising memory transactions in two ways: by cutting the frequency of memory reads and writes, and by creating an architecture in which addressed memory is always close to the processing element addressing it, which avoids driving long lines to remote memory. Atomium, which IMEC built on an earlier "data-transfer and storage-exploration methodology," analyses an application's code to identify memory bottlenecks. You can see which data structures and arrays your host processor core is most frequently accessing and from which functions it is accessing them. You can also see which functions or parts of functions require the most memory accesses. You can then design the optimal memory architecture, basing it on timing analysis, and transform code to reuse memory locations and minimise the total amount of memory the application needs.
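To make the idea of counting memory transactions concrete, here is a toy Python sketch. It is not Atomium and does not reproduce its analysis: the CountedArray wrapper, the kernels, and the array size are all hypothetical. It counts reads and writes per array and then shows how keeping a value in a local variable, standing in for nearby storage, cuts the number of reads.

```python
from collections import Counter


class CountedArray:
    """A list wrapper that records every read and write against its name."""

    def __init__(self, name, size, counts):
        self.name = name
        self._data = [0] * size
        self._counts = counts

    def __getitem__(self, index):
        self._counts[(self.name, "read")] += 1
        return self._data[index]

    def __setitem__(self, index, value):
        self._counts[(self.name, "write")] += 1
        self._data[index] = value


def moving_sum(src, dst, n):
    """Straightforward kernel: reads two source elements per output."""
    for i in range(n - 1):
        dst[i] = src[i] + src[i + 1]


def moving_sum_reuse(src, dst, n):
    """Transformed kernel: carries the previous element in a local variable."""
    prev = src[0]
    for i in range(n - 1):
        cur = src[i + 1]
        dst[i] = prev + cur
        prev = cur


def profile(kernel, n=8):
    """Run a kernel on fresh arrays and return its access counts."""
    counts = Counter()
    src = CountedArray("src", n, counts)
    dst = CountedArray("dst", n, counts)
    kernel(src, dst, n)
    return counts
```

With eight elements, the first kernel performs 14 reads of the source array and the second performs 8, for the same 7 writes.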
To produce a more general approach to the problems of reconfigurable systems for versatile portable products, Lauwereins anticipates an emerging silicon technology that will combine a VLIW (very-long-instruction-word) processor with a "coarse-grained" reconfigurable block structure. Such a design would embody the concept of processing element tiles, including the "just-flexible-enough" granularity. Almost by definition, chips built in this way would be specific to a set of applications. Lauwereins believes that several semiconductor companies are working on this class of device, but vertically integrated companies may well use it first for internal products. Therefore, it may not quickly emerge as a commercial technology. Other tools that IMEC has created to assist with the design of runtime-reconfigurable systems include software to partition threads between hardware and software; all of this work is available to partners in the reconfigurable-system programme (Reference 2).
Returning to approaches in use today, if you want to implement a range of algorithmic solutions on an FPGA, as alternatives to configure as your product is running, you need support for partial reconfiguration. Conventionally, you create a configuration file for your programmable chip in the design process and load it at system boot-up; all of the FPGA vendors support, to a greater or lesser extent, reloading a fraction of the configuration file as a separate operation.
At Xilinx, outbound marketing manager Giles Peckham notes a recent resurgence of interest in reconfigurable designs. One example that Xilinx quotes may be atypical, but it symbolises the aspects of in-system upgrade and resource limitation: The FedSat Australian research satellite is currently flying with programmable logic as the computing core of one of its payload modules, allowing upgrades and mission changes via the remote loading of new configurations. More typical, Peckham says, is a range of systems that pair a DSP with a programmable part to implement an algorithm in, say, image processing or medical imaging.
You may find that a relatively small amount of hardware assistance drastically reduces processing time, yet the range of possible algorithms that you can call on is large, and it would be uneconomic to program them all in hardware. In such cases, a reconfigurable coprocessor is appropriate. You might have an FPGA perform algorithm processing and reload it completely to adjust the system's function, or you might adjust the processing by resetting bits on the fly. Xilinx supports this kind of activity with its PART (Partial Reconfiguration Toolkit), which builds on its Java-based JBits software. From a base configuration, the tool kit compares revised or reconfigured programming files and extracts their differences to produce a configuration file for the FPGA that you can use to reset only the functions on the arrays that you require. However, as Peckham puts it, "You need to know where your data is when you do that."
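The compare-and-extract step can be sketched in a few lines of Python. This is not the PART or JBits API and does not reflect any real Xilinx bit-stream format: the fixed four-byte frame, the function names, and the dictionary used as a "partial configuration" are all assumptions made for illustration.

```python
FRAME_BYTES = 4  # hypothetical frame size; real devices define their own


def split_frames(bitstream, frame_bytes=FRAME_BYTES):
    """Cut a configuration image into equal-sized frames."""
    if len(bitstream) % frame_bytes:
        raise ValueError("image length is not a whole number of frames")
    return [bitstream[i:i + frame_bytes] for i in range(0, len(bitstream), frame_bytes)]


def partial_config(base, revised, frame_bytes=FRAME_BYTES):
    """Return {frame index: new frame} for every frame that differs from the base."""
    base_frames = split_frames(base, frame_bytes)
    new_frames = split_frames(revised, frame_bytes)
    if len(base_frames) != len(new_frames):
        raise ValueError("images must describe the same device")
    return {
        index: new
        for index, (old, new) in enumerate(zip(base_frames, new_frames))
        if old != new
    }


def apply_partial(base, partial, frame_bytes=FRAME_BYTES):
    """Rewrite only the listed frames of the base image."""
    frames = split_frames(base, frame_bytes)
    for index, frame in partial.items():
        frames[index] = frame
    return b"".join(frames)
```

Only the frames in the returned dictionary need to be written to the device, which is why the state held in the untouched frames matters.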
Some of the Virtex parts use internal reconfiguration as part of their operation. You can adjust the transmission parameters for high-speed transceivers on the Virtex-II Pro chips as part of the configuration file to cope with varying impedances on real pc-board tracks. An on-board processor can perform this task.
Reconfiguration can also turn an FPGA into a large crossbar switch. Around 70% of the switches on a complex FPGA are interconnection switches; thus the configuration file is an effective way to set up a crossbar switch of nearly 1000 inputs and outputs. Storing every possible configuration file would be impossible, due to the amount of data involved. However, you can use the JBits software to calculate a new bit stream on the fly from a high-level parametric definition of the desired connectivity.
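A short Python sketch shows why computing a configuration beats storing one. It does not use JBits; the switch-matrix representation, the rotation rule used as the "parametric definition," and the function names are assumptions. The matrix is generated from a rule on demand, while the count of one-to-one configurations, n factorial, shows how quickly a stored library becomes impractical.

```python
import math


def crossbar_matrix(n_ports, routes):
    """Build an n x n matrix of switch settings from {output: input} routes.

    Row index is the output port, column index is the input port.
    """
    matrix = [[0] * n_ports for _ in range(n_ports)]
    for out_port, in_port in routes.items():
        if not (0 <= out_port < n_ports and 0 <= in_port < n_ports):
            raise ValueError("port number out of range")
        matrix[out_port][in_port] = 1
    return matrix


def rotate_routes(n_ports, shift):
    """Example parametric definition: output i listens to input (i + shift) mod n."""
    return {i: (i + shift) % n_ports for i in range(n_ports)}


def count_one_to_one_configs(n_ports):
    """Number of distinct one-to-one routings of an n-port crossbar."""
    return math.factorial(n_ports)


small_switch = crossbar_matrix(4, rotate_routes(4, 1))
```

Calling count_one_to_one_configs(1000) returns an integer with thousands of digits, so even one-to-one routings alone are far too numerous to precompute.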
OS and tool support
As part of a reconfiguration initiative, Xilinx has an ongoing partnership with RTOS company Wind River (www.windriver.com) and software supplier Celoxica (www.celoxica.com). Celoxica has recently been pursuing an approach that identifies it more with the system-level design effort in the EDA community. Nevertheless, its products, including the compiler in its DK (design kit), continue to offer the ability to take algorithmic descriptions directly to FPGA configuration files. The company recently released the second version of its DK2. Celoxica's vice president of marketing, Jeff Jussel, anticipates that users will employ the reconfigurable capability that this kit affords in areas such as cellular-radio-base-station designs. You can implement key functions in the protocol in hardware for performance reasons and reshape them to suit changing traffic conditions.
A typical problem that Celoxica sees in its customer base is signal processing plus some measure of optimisation that benefits from hardware acceleration. "If you code the algorithm in RTL, you lose the ability to fine-tune it," Celoxica's Jussel says. These applications, however, are usually the kind for which you perform reconfigurations periodically rather than dynamically at runtime. One of the objectives of the Xilinx/Celoxica/Wind River programme was to explore the issues of maintaining operation during dynamic reconfiguration. Jussel notes that, so far, such applications have been few, and most of them "are signal-processing or other algorithmic operations, where the function is static."
Similarly at Wind River, whose chairman Jerry Fiddler launched the reconfigurable programme with some fanfare, alliance manager Stewart Newton confirms that the last two years' industry downturn removed the impetus from many ambitious projects that might have used the reconfigurability concept. The basic platform to support dynamic reconfiguration of FPGAs with VxWorks as the main OS was in place two years ago, Newton says, and he confirms that a revival of interest has recently occurred. Whereas activity with runtime reconfiguration lies with military and telecom programmes, Newton cites efforts such as embedding the reconfiguration mechanism within automotive systems such as entertainment platforms.
These efforts are examples of introducing a long-life product into a market with fluid standards, and reconfiguration of the hardware might allow, say, retrofitting the video capability to an automotive-entertainment platform. This approach uses FPGA systems as risk-reduction mechanisms (tracking changing standards without replacing embedded systems) and also as possible after-market revenue streams, fitting new options by downloading code alone. Newton also sees a role for the technology in the system "bring-up" (initial commissioning) and debugging stages of product development, using FPGAs in conjunction with JTAG access to add flexibility. Wind River is finding that large FPGAs are increasingly appearing in locations such as massive-computation image processing. In telecommunications, they are also appearing at the front end of network processing, in which they implement, in effect, hardware acceleration of packet processing, typically requiring frequent reconfiguration as traffic-analysis requirements change.
Newton notes, however, that every company has its own design style, and it is not yet possible to define a standard design flow. He also cites an appreciation of the risks associated with using on-the-fly reconfiguration. This risk again brings up the system-verification problem: Flexible architectures, such as base stations, might increase versatility and reduce power demand, but they may also increase the number of potential routes to a system crash, which you cannot afford. Newton says that tools, application-programming interfaces (to enable calls from host processors out to configurable hardware), and services are in place and ready if and when the industry decides to move forward.
Technical marketing manager for Altera Europe, Pat Mead, referring again to the four categories of reconfigurability, says that the company explicitly supports all but the single-cycle-context-switch model. The system-upgrade model is easy to implement but rarely activated in practice. Reconfigure-at-reboot occurs, for example, in cellular base stations, in which Altera's customers are using its FPGAs to alter baseband processing for different traffic loads or to shape antenna patterns. The third group is the most common, Mead says, typified by test-and-measurement equipment in which the FPGA may change from data-capture to data-analysis mode. These tasks are relatively easy to implement: You simply store the alternative configuration files and load them as required. Altera has supported examples of its Nios processor, an IP (intellectual-property) block, as an onboard configuration controller. The company supports this use with a development kit that includes a Web server, to enable such configurations over LANs or WANs.
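The store-and-load pattern that Mead describes for the third group might look like the following Python sketch. It is not Altera's software or the Nios kit's interface: the ConfigStore class, the mode names, and the loader callable (a stand-in for whatever mechanism actually programs the device) are hypothetical.

```python
class ConfigStore:
    """Keep alternative configuration images and load one on request."""

    def __init__(self, loader):
        # loader is any callable that accepts a bytes image and programs the device.
        self._loader = loader
        self._images = {}
        self.active = None

    def register(self, mode, image):
        """Store the configuration image for a named operating mode."""
        self._images[mode] = bytes(image)

    def switch_to(self, mode):
        """Load the image for mode; return False if it is already active."""
        if mode not in self._images:
            raise KeyError(f"no configuration stored for mode {mode!r}")
        if mode == self.active:
            return False
        self._loader(self._images[mode])
        self.active = mode
        return True
```

For a test instrument, you would register one image for data capture and one for data analysis and call switch_to when the instrument changes phase; skipping the reload when the mode is unchanged avoids needless downtime.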
With a large part of its business focusing on antifuse technology, Actel (www.actel.com) sees its flash-based parts configured for infrequent, in-the-field upgrade capability but counsels a high level of caution when it comes to dynamic techniques. "A verification hell" is how the company's applications/IP director Yankin Tanurhan describes such an ambitious design approach. The reality for most systems, he says, is that they are "tweaked" in the lab, and, in many cases, refinements mean removing features to reduce costs rather than looking for ways to add more features.
At a board level, commercially available examples of reprogrammable hardware packaged to provide reconfigurable functions include National Instruments' (www.ni.com) PXI card, the PXI 7831R (Figure 3). An FPGA provides configurable I/O and control functions, speeding some control functions by tens or hundreds of times by placing them in hardware. NI has created its own software environment for this card to take it into the company's LabView environment. You program the required function in the graphical environment of that package and, transparently, the software first creates HDL (hardware-description-language) code to describe the hardware you require and then compiles it to a configuration bit stream. The code can describe multiple parallel processes, and the software will create separate hardware instances to match them, so you can have true parallel processing of timing-critical operations.
A number of specialist suppliers manufacture board-level products, providing reconfigurable computing to a range of markets. For example, Nallatech's (www.nallatech.com) DIME and DIME-II architecture boards package large FPGAs with standard interfaces and provide a development environment plus configurable operating system that allows you to add customised application hardware—implemented in the FPGA—to an embedded system that may be running on a conventional host or on one of Nallatech's motherboards.
The most recent introduction is a PC/104-format card stack that uses two Xilinx series parts (Figure 4); one or more Virtex-IIs provide as many as 56 million gates of user space, and a further part is preconfigured to provide board control and interfacing. Nallatech's Web site includes a paper that outlines a theoretical basis for high-performance reconfigurable computing that concurs with the IMEC team's view, stating that a machine that uses only the processing power it needs to solve a problem tends to automatically make better use of other peripheral resources, which can significantly benefit I/O and memory usage. For example, the paper continues, a reconfigurable machine often has functions built into local memory or local state data, so loading and interpreting instructions consumes less memory bandwidth and power than it does on traditional machines.
PicoChip (www.picochip.com) offers an architecture that emphasises the significance of having a reconfigurable granularity optimised for your application domain. The company produces an array processor targeting the base-station market. The chip is reconfigurable in software rather than hardware. The company has fixed the device's granularity as an array of simple but full-function processors for a relatively narrow market. Although the reconfiguration is in software, you change the software for some or all of the array of processors to revise the chip's functions. PicoChip's chief technical officer, Doug Pulley, points to the importance of an easy design flow: When you are dealing with architectures in which so much is changeable, excessive features add complication. PicoChip has opted for a deterministic approach in which functions and resources are fixed at compilation time and in which no shared resources exist to compete for during the design. Second silicon of the company's PC102 chip (Figure 5) is due in the next few weeks. It will incorporate enhancements for communications functions, such as spread spectrum and forward error correction. As with all reconfigurable approaches, Pulley says, it's important not to compromise flexibility and scalability when adding features; they remain some of the key points of the overall approach.
You can reach Editor Graham Prophet at +44 118 935 1650, fax +44 118 935 1670, e-mail gprophet@reedbusiness.com.
References