EDN Access

 

June 19, 1997


CPLDs readily replace precious µP resources

Damon E Domke, Design Engineer

Using CPLDs to offload your CPU lets you create a device that hits an effective performance and cost balance between the conflicting attributes of standard and custom parts.

If your system requires functions that would excessively burden your CPU, you may want to offload them to external hardware built from standard logic devices. However, you may find that available standard devices lack the precise combination of features you need, cannot handle your processing sequence, or have too many features and thus cost too much. Meanwhile, a custom design takes too much time and carries too much uncertainty.

As an alternative, consider using a CPLD, a standard device that you can customize via design software. Just as firmware allows a standard µP to perform many functions, the programmability of a CPLD makes it ideal for producing many logic functions from one standard device.

Each CPLD macrocell can assume one of several logic functions. Beyond this capability, the modular structure of the CPLD macrocell allows you to assemble many configurations by linking together several macrocells. Furthermore, programming the CPLD is rapid, so you can quickly create and edit entire logic structures. CPLDs have enough logic density within one package to accomplish fairly complex tasks.

Characters are complex

The easiest way to see what a CPLD can do, and how, is to work through an example, such as a character-imager (CI) design. The CI, a subblock of a larger device, reads input data in one format, translates that format to the output-display format, and sends the result to the output-display device.

The global functions of the CI are fairly straightforward (Figure 1). The imager receives data from an RS-232C line and separates the data into host-control commands and raw data characters. The imager uses the combination of these two host data types to map the incoming characters to a valid output matrix space, as they should appear on the final display device. The imager then converts data from this output-matrix space to a serial bit stream for each pixel line. Finally, the CI clocks out these serial bit streams--one row of pixels at a time--to the display device. The main constraints of this project are low overall cost, rapid time to production, fast character processing, and small physical design.

The transform function that maps the incoming characters to the output space, the parallel-to-serial converter, and the serial shifter require the most attention because they have the biggest resource-allocation problems. You base your design-resource requirements on the design functions, and you can implement these functions in one of two ways. Therefore, you can split resource requirements into two design methods.

In Design A, a standard UART processes the RS-232C data (Table 1). The main processor handles the entire input-data to an output bit-stream transform function. A standard synchronous serial port clocks the output pixel data to the display device.

In Design B, a standard UART processes the RS-232C data, and the main processor handles only a portion of the input-data to an output bit-stream transform function (Table 2). This function maps the input data to intermediate entities with attributes and variable sizes. A variable-length parallel-to-serial converter then uses the entity information to clock the processor output data to the display device.

Design B has these requirements because the incoming characters may have upside-down, scaled, inverted, and similar attributes that the µP must process in real time. Also, the output serial stream must be an exact pixel-per-pixel representation of the output image. The biggest difference between designs A and B is the amount of formatting that the µP must do to place the output data in a form suitable for the synchronous-output serial port. Because bit-manipulation instructions are time-intensive in a µP, you can save considerable time by using a synchronous-output serial port that can handle variable-length character sizes.

Resources: not always available

The high-speed µP in Design A was too expensive to meet the stringent price constraint of the CI example. For Design B, the variable-length parallel-to-serial converter is not available on any standard µP that meets the design constraints of this project. To find a design that works, first look at the functions whose resource requirements are easy to meet. The pixel-mapping transform function has modest resource requirements, which you can meet with a moderate-speed µP and a moderate amount of SRAM. This design works only if you keep the transform function computationally light and leave the pixel-to-bit processing to the parallel-to-serial converter.

However, significant problems arise when you are searching for the required variable-length parallel-to-serial converter. Most standard synchronous serial ports on µPs transmit a fixed number of bits at once. These ports simply accept 8 bits in parallel as an input and start automatically clocking out the bits. This technique is unsurprising because most applications need a synchronous serial port that can support this function.

When you analyze the critical portions of the image-transform function, you find that the slowest part is handling every single pixel and aligning all pixels into an 8-bit memory location. By changing the pixel-mapping algorithm to an object-manufacturing algorithm, you can alleviate the problem of the µP's shifting and storing bits, which it does inefficiently. This approach allows you to redefine the parallel-to-serial converter and the output serial shifter into a new function, the variable-length parallel-to-serial (VLPS) function.
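To illustrate why per-pixel alignment is expensive in software, the following Python sketch shows the kind of shift-and-store work the µP would face when packing character rows whose widths are not multiples of 8 into 8-bit memory locations. The function name `pack_rows` and the MSB-first bit order are assumptions made for illustration; they do not come from the original design.

```python
def pack_rows(rows):
    """Pack (value, width) pixel rows MSB-first into a list of bytes.

    Each entry holds `width` pixels in the low bits of `value`.
    The final byte, if partial, is zero-padded on the right.
    """
    out = []
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    for value, width in rows:
        # Append the new pixels to the right of the leftover bits.
        acc = (acc << width) | (value & ((1 << width) - 1))
        nbits += width
        # Emit every complete byte now available.
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the leftover bits
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return out


# Two 9-pixel rows straddle byte boundaries and need three bytes.
packed = pack_rows([(0b101010101, 9), (0b111111111, 9)])
print([hex(b) for b in packed])
```

Every row costs several shifts, masks, and stores, which is the work that the VLPS function removes from the µP.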

The VLPS function accepts and processes arbitrary-length (within some range) default character-font data, producing the serial bit stream of the output pixels of a character with all attributes assigned (Figure 2). In other words, the VLPS function receives the regular-sized font data for a row of a character. Then, using the attribute definitions for that character, the function automatically expands the font data for that row to include the requested attributes, such as upside-down or inverse.
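As a rough behavioral model of this expansion, the Python sketch below turns one row of font data into the serial pixel stream. It is not the AHDL from the article's listings. The name `vlps_row`, its parameters, and several behaviors are assumptions: the MSB is taken as the leftmost pixel, zoom repeats each pixel horizontally, the backward flag reverses the pixel order within the row, and trailing spaces go out as 0 pixels that are neither zoomed nor inverted.

```python
def vlps_row(font_bits, width, zoom=1, inverse=False, backward=False, spaces=0):
    """Return the serial pixel stream (list of 0/1) for one character row.

    font_bits: row data, MSB = leftmost pixel (assumed)
    width:     number of valid pixels in font_bits
    zoom:      horizontal repeat count per pixel (1 to 8)
    inverse:   complement each pixel (inverse video)
    backward:  send the pixels in reverse order (assumed meaning)
    spaces:    trailing blank pixels (0 to 7), sent as 0 (assumed)
    """
    pixels = [(font_bits >> (width - 1 - i)) & 1 for i in range(width)]
    if backward:
        pixels.reverse()
    if inverse:
        pixels = [p ^ 1 for p in pixels]
    stream = []
    for p in pixels:
        stream.extend([p] * zoom)   # repeat each pixel zoom times
    stream.extend([0] * spaces)     # trailing spaces
    return stream


# An 8-pixel row, zoomed by two, with three trailing spaces: 19 pixels out.
row = vlps_row(0b10110000, 8, zoom=2, spaces=3)
print(len(row), row)
```

The µP supplies only the compact row data and attribute values; the pixel-by-pixel expansion happens in the hardware function.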

The specifications of the VLPS function are complex because they must carefully mesh with the software-mapping transfer function of the overall system operation. The function must process character objects from the µP's software algorithm, with each character having several possible attributes (Table 3). In addition, the VLPS function must process each character within a certain time. Because of the display device requirements, the VLPS function must send an entire line of dots of the output image within 2.5 msec.

This speed requirement applies to all characters with any combination of attributes. Therefore, you must design for the worst-case situation of the most time-consuming character and attribute set. Because of the parallel-processing relationship between this VLPS function and the µP's character-object transform function, the VLPS function usually waits for the µP to send the next character's data.

The output requirements of the VLPS are simple: It must transmit the desired image to the display one pixel at a time until it has sent the entire pixel row. Then, the VLPS sends the next pixel row (Figure 3). Besides sending the image data and synchronized clock, the VLPS must inform the master CPU when the VLPS is busy performing a conversion, so that new data does not overwrite the current data in the VLPS. The VLPS must be able to output 8- to 16-pixel-wide characters.

The TTL-compatible inputs to the VLPS include data and control signals. The master reset for the CI system initializes the VLPS, and the data bus from the master CPU writes data to the VLPS. The data bus is functionally multiplexed, so it requires several control signals to receive the proper data from the master CPU. A 50%-duty-cycle, square-wave clock signal, CLOCKIN, times all transfers. This clock signal should be four times the frequency of the desired output synchronous serial clock signal, SERIALCLOCK. The maximum serial-shifting frequency of the output-display device is 5 MHz, so the maximum frequency of CLOCKIN is 20 MHz.
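The clock relationship can be checked with a few lines of Python. The sketch uses the 5-MHz limit and the 4:1 ratio from the text, plus the 640-pixel row width that the timing analysis uses; the variable names are illustrative, and the row time assumes continuous shifting at the maximum rate.

```python
SERIALCLOCK_MAX_HZ = 5_000_000  # maximum shift rate of the display device
CLOCK_RATIO = 4                 # CLOCKIN runs at four times SERIALCLOCK
PIXELS_PER_ROW = 640            # row width used in the timing analysis

# Maximum CLOCKIN frequency.
clockin_max_hz = CLOCK_RATIO * SERIALCLOCK_MAX_HZ

# Time to shift one full pixel row at the maximum serial rate (integer ns).
row_shift_time_ns = PIXELS_PER_ROW * 1_000_000_000 // SERIALCLOCK_MAX_HZ

print(clockin_max_hz, "Hz")     # 20000000 Hz
print(row_shift_time_ns, "ns")  # 128000 ns
```

Under these assumptions, raw shifting takes about 128 µsec of each 2.5-msec row period, so most of the row time remains for the µP to prepare and write character data.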

The TTL-compatible outputs of the VLPS are two lines running to the display device, SERIALDATA and SERIALCLOCK, and one control signal, BUSY, running back to the master CPU to act as a busy signal. The design intentionally keeps all internal variables that represent character-object attributes as short as possible to minimize the CPLD's requirements.

Timing is critical

Your analysis of the extreme requirements of the VLPS must include how the function must perform as a part of the entire system as well as a single unit. Because the worst-case situation is to transmit an output-dot line within 2.5 msec, the master µP mapping algorithm and the VLPS must be well-matched to accommodate the extreme timing situations. The most extreme condition involves the number of characters per row. You can calculate the maximum number of characters per row, nchar, from the following equation:

nchar=pwidth/cwidth,

where pwidth is the width of the display row in pixels per row, and cwidth is the width of the smallest character in pixels per character. The worst-case situation occurs when the current font to display is one dot larger than an 8-bit boundary, zoom is set to one, the font is upside-down, and no extra spaces are next to the character. This setup yields the most characters to send per pixel row. Use the following equation to calculate the worst-case time for transmission of one character:

tvlpssetup+tcharsend+tvlpshold=trow/nchar,

which, when you rearrange it, yields

tcharsend=trow/nchar-tvlpssetup-tvlpshold in seconds per character,

where trow is the maximum time to transmit a row of pixels to the display in seconds per row, tvlpssetup is the maximum time to set up the VLPS function in seconds per character, tvlpshold is the maximum hold time required at the end of the VLPS function in seconds per character, and tcharsend is the maximum time to send one character in the VLPS function in seconds per character.

Thus, for a display with 640 pixels per row and a minimum character width of 8 pixels (Table 3), nchar is 80, and the VLPS has 2.5 msec/80=31.25 µsec for each character. This budget must also cover the setup-and-hold times for the VLPS function, which are specific to the physical device you choose to implement it. The smaller these two time values are, the more time the µP has to produce the character-object data to send to the VLPS function.
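The two equations above can be worked through numerically. The Python sketch below is a minimal calculation in microseconds; the function name `char_send_budget_us` and the sample setup-and-hold values are hypothetical and serve only to show how those times reduce the per-character budget.

```python
def char_send_budget_us(t_row_us, p_width, c_width, t_setup_us=0.0, t_hold_us=0.0):
    """Return (tcharsend in usec, nchar) for the worst-case row.

    nchar     = pwidth / cwidth (whole characters only)
    tcharsend = trow / nchar - tvlpssetup - tvlpshold
    """
    n_char = p_width // c_width
    t_char_send = t_row_us / n_char - t_setup_us - t_hold_us
    return t_char_send, n_char


# 2.5 msec per row, 640 pixels per row, 8-pixel characters.
budget, n_char = char_send_budget_us(2500.0, 640, 8)
print(n_char, budget)  # 80 31.25

# With illustrative (hypothetical) setup-and-hold times of 0.01 usec each.
budget_with_overhead, _ = char_send_budget_us(2500.0, 640, 8, 0.01, 0.01)
print(budget_with_overhead)
```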

VLPS enters the real world

Now, you can implement the design specification of the VLPS in an actual CPLD. Many companies provide viable CPLDs, and each CPLD has various strengths and weaknesses. Select one that contains the macrocells and resources that the design needs. Often, this requirement means you have to perform an iterative search during the design to specify the best CPLD.

You can specify the functions you need implemented within a CPLD in one of several semantic levels. Today's advanced PLD-design software and fitters generally insulate you from the macrocells and routing when you use HDL equations or schematic-entry tools. However, because of cost constraints, you must design the VLPS function at a low level to use the smallest CPLD possible. To effectively design this function, research the macrocell and I/O details of the CPLD, exploiting strengths and avoiding weaknesses. You can still use HDL equations to specify the behavior of the CPLD, but you will choose better equations because of the low-level information that you have about the macrocells and I/O pins.

Implementing a function into a CPLD is mostly an exercise in resource distribution and management, and if you have more available resources than you have function requirements, you'll need less time to map the function into the CPLD. This approach gives the design-time constraint a higher priority. Proper selection of the CPLD is crucial for effective resource and cost control.

This application uses the EPM7032 CPLD from Altera's (San Jose, CA) MAX7000 family. The EPM7032 contains 32 macrocells and four additional inputs (Figure 4). Its input and output lines do not have to be connected to the accompanying macrocells. Therefore, you can use the I/O pins for inputs and the accompanying macrocell for a buried register. The EPM7032 also features product-term clocking, which allows a distinct ORed combination of AND terms for each macrocell's clock input. Although this feature may not be useful in some designs, for many designs--such as the VLPS--much of the buried register logic requires a different set of clock terms for the macrocell flip-flops. General CPLDs provide only global clocking terms, thus effectively limiting the number of clock terms that the entire design uses. Demanding designs require product-term clock sourcing for maximum logic density per macrocell.

The EPM7032's timing specifications must meet the worst-case time requirements for the VLPS function. The fastest clock that you can input into the device is the 20-MHz CLOCKIN signal. Because the three internal clock signals--CLOCKIN, CLOCKINHALF, and SYSTEMCLK--combine to define states in the system, the smallest time per state is one-half of the CLOCKIN period, or 25 nsec. The maximum time for a signal to propagate through the EPM7032-10 is approximately 10 nsec.
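A quick check of these numbers, as a Python sketch: the names are illustrative, and the margin is only a first-order estimate because it ignores register setup time and clock skew.

```python
CLOCKIN_HZ = 20_000_000  # fastest clock entering the device
T_PD_NS = 10             # approximate EPM7032-10 propagation delay

period_ns = 1_000_000_000 // CLOCKIN_HZ  # CLOCKIN period
state_ns = period_ns // 2                # shortest state: half a period
margin_ns = state_ns - T_PD_NS           # slack left after propagation

print(period_ns, state_ns, margin_ns)    # 50 25 15
```

Roughly 15 nsec of each 25-nsec state remains after the propagation delay under these simplifying assumptions.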

The EPM7032's setup-and-hold times, which partially determine the number of characters that the VLPS function can reasonably transfer, are less than 10 nsec. However, the µP must make three byte writes to properly set up and start the VLPS function. This action makes the setup-and-hold times of the EPM7032 negligible for this function and places the timing responsibility on your µP. Therefore, this CPLD should be fast enough for the VLPS function.

Start by assigning the CPLD's resources to the VLPS-function requirements. The listings use Altera MAX+Plus II design software with Altera's AHDL syntax for the equations. Think of HDL as a programming language for defining the behavior of the physical part. It uses symbolic references and binary and arithmetic operators for the behavior descriptions.

First, examine the requirements on the device to support the inputs to the VLPS function. The VLPS requires eight data-input lines for DATAIN0 through DATAIN7, three write-enable control-input lines, one input line for the CLOCKIN signal, and one input line for the RESET signal (Listing 1). Next, examine the requirements to support the outputs of the VLPS function. You need only one output-data line and register for SERIALDATA, one output line for SERIALCLOCK, and one output-control line for the BUSY signal (Listing 2).

Keeping the internal variables of the VLPS function brief allows the use of a smaller, economical CPLD. For instance, even though you could leave the zoom information in an 8-bit (1-byte) format, it makes little sense to waste extra bits when the largest zoom is eight times, or 3 bits. Therefore, the internal variable requirements are eight registers for font data (barrel shifter), three registers for character-size information (down-ring counter), one register for an inverted-character flag (status flag), one register for a backward-character flag (status flag), three registers for spaces information (down-ring counter), and three registers for zoom information (down-ring counter) (Listing 3).

However, these items are just the requirements for storage of needed data. Specifying the proper logic for the VLPS function within the CPLD requires several other internal variables, including two macrocell registers to divide the incoming clock by four and a register set of three macrocells to store the original zoom value so that you can zoom each pixel. In addition, you require several internal control signals, including one register for a clock-synchronizing signal, one register for a transform-initialization signal, and one register for an end-of-transform control signal (Listing 4).
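The register counts in the two preceding paragraphs can be tallied against the 32 macrocells of the EPM7032. The Python sketch below counts only the registers that the text names; how many of the remaining macrocells the SERIALDATA register and the output logic consume is not stated here, so the tally leaves them as headroom rather than guessing.

```python
MACROCELLS_AVAILABLE = 32  # EPM7032

# Storage registers for character-object data.
storage = {
    "font data (barrel shifter)": 8,
    "character size": 3,
    "inverted-character flag": 1,
    "backward-character flag": 1,
    "spaces": 3,
    "zoom": 3,
}

# Additional registers for clocking and control.
control = {
    "divide-by-four clock": 2,
    "original zoom value": 3,
    "clock-synchronizing signal": 1,
    "transform-initialization signal": 1,
    "end-of-transform signal": 1,
}

storage_total = sum(storage.values())
control_total = sum(control.values())
registers_total = storage_total + control_total
headroom = MACROCELLS_AVAILABLE - registers_total

print(storage_total, control_total, registers_total, headroom)  # 19 8 27 5
```

The named registers use 27 of the 32 macrocells, which shows why keeping each internal variable as short as possible matters in a device this small.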

Because several I/O pins are unused, you should map some of the internal variables and control signals to output pins during design and testing. Even though sophisticated simulation software is available, nothing can replace actual hardware signals that you can monitor with a logic analyzer or an oscilloscope.

Next, you can implement the internal logic for the hardware parallel-to-serial converter (HPSC). This function is the actual conversion engine of the VLPS peripheral. Using the input, internal, and output variables described, this function generates synchronous serial data from the parallel data. This task is probably the most difficult portion of the design because, even though you know the HPSC function, many logic specifications exist that will properly perform this function. One way of specifying the behavioral-HDL equations produces the logic specification that the HPSC function requires (Listing 5).

This task completes the design of the function, but, if you use a different CPLD, you will probably get a different set of HDL equations because of the subtleties of the resources within that CPLD. Another possible change in the logic specification results from pin placement. Because of pc-board routing issues or signal placement, you may need to force certain inputs or outputs to certain pins on the CPLD. Using a different CPLD may also affect the pin placement. For instance, some CPLDs offer more or fewer AND or sharable-expander terms than does the EPM7032.

Testing, validation complete the design

You must test to ensure that the behavior of the physical device matches the expected behavior of the VLPS function. Testing also provides a visual representation of the operation of the system. This approach provides helpful insight for future debugging or enhancement of the design.

You can use several different test methods. Your initial testing should begin with functional and timing simulation if the CPLD-design software that you are using supports these features. For example, MAX+Plus II includes a simulator that allows you to import test vectors or create a custom wavetable that simulates the design. A useful feature in a simulator is the ability to let you specify I/O signals and buried combinatorial and register logic in several formats. This feature allows you to generate test plots that emphasize many aspects of the design, which is useful during debugging. You can simulate a waveform plot from the setup and start of the VLPS function and the waveform plot from the ending of the VLPS function (Figures 5 and 6).

Your next step in testing is to connect the CPLD to your circuit and check the CPLD outputs while varying its inputs. Extra I/O pins are useful during this task because they let you temporarily patch the buried combinatorial and register logic to output pins. This approach gives you a better picture of the system during operation. A good logic analyzer is the best tool to use at this stage, although you may need an oscilloscope to examine certain signals for rise time, noise, actual signal level, and similar signal characteristics.

One of the best approaches to ensure adequate testing is to initially and progressively document which input test vectors and resultant output behavior expectations are critical for proper operation of the function. Then, as you take each test plot, save the waveform printouts from the logic analyzer and the list of documented test vectors. This printout is valuable not only for future design verification, but also for future design analysis.


Reference

1. Altera 1996 Data Book.


Acknowledgment

Thanks to Richard Terrill of Altera Corp for data on the MAX7000 CPLD and MAX+Plus II software.


Table 1 - Design A resources and functions

Resource | Function
Standard asynchronous serial port | Receives input data
Fast µP that can handle all of the pixel manipulation | Processes the input data to produce the output-pixel data
Large memory space for intermediate pixel storage | Provides a work area on which to temporarily paint the output pixels
Standard synchronous serial port | Converts output-pixel data that is stored in parallel to an output bit stream for each pixel line

Table 2 - Design B resources and functions

Resource | Function
Standard asynchronous serial port | Receives input data
Fairly fast µP that can partially process the pixel manipulation | Processes the input data to produce the output-pixel data that is not 8-bit-aligned, as the standard synchronous serial port requires
Modest amount of memory for pixel storage | Provides a work area on which to temporarily paint the output pixels
Variable-length-character synchronous serial port | Converts variable lengths of output-pixel data that is stored in parallel to an output bit stream for each pixel line

Table 3 - Character attributes

Character attribute | Range
Size | 8 to 16 pixels wide
Zoom | One to eight times
Inverse video (highlighting) | On or off
Upside-down | On or off
Trailing spaces (tabs or spacing) | 0 to 7 pixels


Damon E Domke is a design engineer who has developed embedded-µP projects for more than 10 years. He has a BSEE from the DeVry Institute of Technology (Kansas City, MO) and an MS in Electrical and Computer Engineering from the University of California--Irvine.




Copyright © 1997 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Publishing Company, a unit of Reed Elsevier Inc.