Design Feature: May 9, 1996
The latest generation of content-addressable memories (CAMs) makes it possible to handle list searches and data translation as embedded functions within systems. The combination of a CAM and a state machine creates an economical controller for real-time processes that need to perform look-ups, data translations, and entry maintenance in sparsely populated tablesones with few entries compared to the address space required for direct look-up.
By simplifying many of the tasks required to deal with lists and tables of data, a CAM can offer you one of two possible benefits. You can simplify the controller by switching to a lower cost processor or a state machine. Or, you can dedicate the additional processing power that the CAM provides to additional features and functions. In either case, more processing power is available to an embedded controller at a lower cost.
A CAM is a hardware implementation of associative processing. Associative processing manipulates data based on matching, or associating, an input value with other values stored in an array. Associative-processing hardware incorporates a limited amount of computational capability at each memory location and allows you to examine an entire memory array at once. CAMs combine these functions with a control structure to perform associative processing.
This type of processing stands in contrast to today's predominant computational architecture: the Von Neumann engine. This architecture relies on address-based memories to store instructions and data. A few computational elements (most often one) sequentially process the instructions, operating on a single element of data at a time. Parallel-computing systems collect many of these processors, either operating together or independently, into large arrays.
CAMs implement associative processing
Associative-processing theory has been around for a long time. However, the first large CAMs were expensive, and only a few specialized applications could effectively use small CAMs. Now, larger, general-purpose CAMs allow you to use associative processing to solve a wider variety of problems. Several vendors make devices that are part of this latest generation of CAMs. Although some differences exist among vendors in array organization, interface specifics, and product variety, many similarities among offerings also exist. The following application examples are based on MUSIC Semiconductors' MU9C1480 LANCAM.
A CAM compares an input value, the comparand, to all the associative data stored in the memory array at once. The output from a CAM can be either a flag that indicates one or more matches or associated data that is related in some way to the matched values.
The external connections to this type of memory device are simple. A data bus provides command inputs (Table 1) and data I/O. Output flags indicate the comparison results and indicate when the memory is full. Flag inputs allow you to connect multiple devices in a daisy chain without external logic, thereby expanding the number of words in the memory array. Table 2 lists the CAM memory and data locations.
| Table 1LANCAM Instruction Summary | |
|---|---|
| MOV dest,src[mr][,V] | Move data |
| VBC memdest,vb | Set validity bits |
| CMP vb | Compare on specified validity types |
| NOP | No operation |
| SPS src | Set persistent data source |
| SPD dest[mr][,vb] | Set persistent data destination |
| TCO contrlreg | Read or write to configuration and control registers |
| SFF | Set Full Flag for initialization of a daisy chain |
|
src: data register or memory-array location dest: data register or memory-array location memdest: memory-array location mr: Mask Register 1 or 2 (moves to and from comparand or memory array) V: Set Validity bits Valid (memory-array destinations only) vb: any validity bit setting (V, E, S, R) contrlreg: source or destination for command read or write | |
Note that the CAM logic symbol has no external address lines
The search routine loads the VPI and VCI into the comparand register with two data writes. A command read attempts to fetch the match address. If a match exists, the routine uses the match address to look up the VPI/VCI associated data.
If no match occurred, a TCO command to the control register switches to mask register 1 and triggers a second search of the database for a VPI-only match. A second command-read cycle again attempts to read the match address. The match address for this comparison looks up the associated data for the VPI. Another TCO CT command switches back to no mask register for the next VPI/VCI comparison.
The add routine runs when setting up a new connection. The new VPI or VPI/VCI combination loads into the comparand register and moves to the next free location in the CAM with a MOV NF,CR,V instruction. A forced compare matches the new entry, so that a command read of the status register recovers the entry's CAM address. The match address then serves as the external RAM address for storing the associated data.
The delete routine runs when tearing down a connection. The routine loads the entry to be removed into the comparand register. A VBC ALM,E command purges all matching entries from the list by changing the validity bits to empty (even though having more than one matching entry should be considered an error condition). The routine need not clear the associated data from the RAM; this data will not be accessed until the next time the corresponding CAM location is used.
CAMs fit into embedded systems
You commonly find CAMs in two types of applications: data filters and data translators. Data filters search a table or list and generate a flag that indicates whether the filter finds the input comparand. An example of a data filter is an address filter in network bridges, routers, and firewalls. Data translators search a table and output data associated with the comparand. Examples are switches (header translation), caches (main storage to cache location translation), and compression (symbol translation).
Various boxes accompanying this article show typical instruction routines for a number of these applications, including network bridges, ATM switches, disk caches, branch tables, and compression. Also see Table 3, which shows the corresponding CAM cycle keys. By studying these routines, you can see how a CAM's associative properties simplify not only the search sequence, but also the maintenance of a data list. The use of a CAM reduces these tasks to operations that simple state machines can easily handle.
As one example, the cycle sequence in In Figure 3, the first three cycles write the data to the three segments of the comparand register and trigger a comparison. A command-read cycle gets the match address from the status register, which is the default source for command-read cycles. A single read cycle fetches the associated data from segment 0 of the highest priority matching location. Because the device's configuration is programmable, all five cycles deliver and fetch useful data without any additional overhead. This efficiency is very important in time-critical processes.
Searches are only part of the challenge of working with lists of data, however. Many embedded applications must also maintain the list over time by adding or learning new entries and deleting or purging obsolete ones. The associative-processing power of a CAM is well-suited to this task. New entries in the data list typically load into the next empty location in the CAM's memory array. A related concept is learning: After a comparison that finds no match, a single command (MOV NF,CR,V) moves data from the comparand register to the next free address. Because a typical list comprises unique entries with equal priority, the CAM can store a new entry anywhere in the memory array. Thus, you can construct an efficient sequence that adds data to the list by taking advantage of a CAM's ability to associatively access the next empty location.
Sometimes, you may want to update an entry that exists in the CAM's memory array, such as a time stamp in a segment of the comparand. Either the CAM/RAM configuration or a mask register usually masks out the time-stamp segment from the normal compare operations. When the CAM finds a match for the unmasked portion of the comparand, a single instruction (MOV HM,CR,MR1) moves only the bits indicated by mask register 1 to the highest priority matching location from the comparand register, updating only the time stamp in the process.
When it is time to delete or purge one or more entries from the data list, the CAM can use associative locations Highest Priority Match or All Matching. To perform the purge, the comparand register loads the entry to be removed, which forces a compare. Then, a single instruction (VBC ALM,E) changes the validity bits at all the matching locations to mark them as empty, thus removing them from compare operations and making them available for new entries as the next free location.
CAMs make hash of the alternatives
Choosing between CAMs and the alternatives requires making trade-offs between processing power and speed. Linear and binary searches are one alternative, but they require extensive shuffling of the data list when you add or delete entries. To add to a sorted data list, every entry from the end of the table to the location of the new entry must be read and then written into the next location. Removing an entry requires the same process in reverse. Also, search times depend on the size of the data list.
Another alternative is a hash table. Hash tables are the software implementation of associative processing and are fairly efficient at making the table smaller. However, hash tables require an increased level of computing power to quickly calculate the hash-table key. List maintenance, which comprises calculating the hash key for the new entry and placing the data at this location, is fairly straightforward until a new entry maps to an already-used location, causing a collision. Resolving collisions involves performing additional hash-key calculations until the data finds an open location in the hash table. Collisions cause nondeterministic hash-table search times. That is, the compare time may vary based on the number of collisions that occur when constructing the table.
When should you, as a designer, consider using CAMs? A CAM costs more than SRAM, because a CAM cell requires more transistors to build. However, a CAM's advantages outweigh the additional cost compared to other alternatives for the following conditions:
Your application need not meet all of these conditions for you to choose a CAM. At times, only one of these items may justify the use of a CAM. However, most suitable applications can take advantage of several CAM and associative-processing attributes.
The dictionary assumes that the first 258 entries comprise the literal 8-bit codes plus a clear code and an end-of-file code, so the first 258 CAM locations are marked "S" or "R" to let you use the match addresses as the codes for pairs of symbols. Initially, the comparison is 17 bits wide but can grow as the dictionary fills with new symbol combinations. The comparand for a 4-kbyte dictionary can typically grow to as many as 20 bits12 bits for the code and 8 bits for the input symbol. The compression adds entries to the dictionary and assigns a code to each one. The additional entries comprise combinations of two ASCII literals or an ASCII literal and the code for a previous dictionary entry.
If the base string is found in the table (as shown at steps 5 and 7 of the example) the routine sends nothing to the output. The routine reads the code (the match address) for this base string from the status register (the default location for command reads). The routine creates the base string for the next look-up by appending the next input symbol to this code.
The algorithm repeats this process until the input ends or the table fills. Even with an index of only two symbols, this procedure can build long strings, producing a good compression ratio. If the dictionary fills, you can extend the table length by going to codes of 10, then 11, and then 12 bits for symbol pairs that are added to the dictionary. When the dictionary reaches maximum size, the table is cleared, and a new table starts.
The decompression table comprises contiguous entries, having been built from incremental codes (the next free location was always one address higher than the previous CAM location used for a new entry). Therefore, a direct RAM look-up scheme can efficiently perform the decompression.
Author's biography
Table 2CAM memory and data locations Associative memory array locations: HM Highest priority matching memory location NF Highest priority empty (Next Free) memory location (destination only) ALM All matching memory locations (validity bits only) RAM-array locations: [AR]
aaaHMemory at the location specified by the address register. Memory at a specified hex address Data Registers: CR Comparand Register MR1 Mask Register 1 MR2 Mask Register 2
(Figure 1). In a CAM, most memory accesses are associative, that is, based on the contents of the memory, so external address lines are unnecessary for most operations. An internal address register, accessible through the command structure of the device, allows address-based random access of the CAM array when necessary. Two mask registers provide bit-level control of the comparand register for compare operations, allowing comparisons on a limited number of bits for special situations. You can also use the mask registers to update specific memory-array bits during moves from the comparand register.
Storing the associative data within the CAM device minimizes the component count in an embedded system. However, if the total number of associative and associated bits exceeds the width of the CAM, you can store the associated data in an external RAM (Figure 2).
Application: ATM switch
An asynchronous-transfer-mode (ATM) switch performs many functions. The switch must search internal tables that hold the necessary information for each connection that routes through the switch. The index to these tables is the virtual-path identifier (VPI) or the VPI/virtual-channel-identifier (VCI) combination from the header of an incoming data cell. The switch uses this information to look up the VPI and VCI for the outgoing link, the internal path through the switch to the correct output port, the billing rates for accounting purposes, traffic-flow parameters, and flags for any special functions incorporated into the switch.
A CAM configured for such an application requires at least 28 bits of associative data for matching VPI/VCI combinations and, possibly, more bits to include a port ID if several input ports share the CAM. Using a mask register for VPI-only look-ups lets you combine the channel and path lists, which, in turn, allows you to make trade-offs between speed and hardware simplicity.
The accompanying figure shows the VPI and VCI in separate segments, but the bit-level comparison control of the mask register allows you to distribute the identifiers across the two segments in any convenient fashion. Because of the amount of data that you can associate with each connection, this example uses external RAM for associated data storage; the CAM match address is the address of the associative data in the RAM.
Table 3CAM cycle key Cycle Type /CM /W /EC C2R Command read LOW HIGH HIGH CW Command write LOW LOW HIGH DR Data read HIGH HIGH HIGH DW Data write HIGH LOW HIGH CRE Command read, /EC LOW LOW HIGH LOW CWE Command write, /EC LOW LOW LOW LOW DRE Data Read, /EC LOW HIGH HIGH LOW DWE Data Write, /EC LOW HIGH LOW LOW
Figure 3 shows how the CAM architecture simplifies the control of a 48-bit comparison, a read of the match address, and a read of 16 bits of associated data. (Assume that the CAM has a 16-bit external bus and a 64-bit-wide memory array. Four 16-bit segments, numbered 0 through 3, access the memory array.) When you initialize the device, the memory array comprises 48 bits (segments 1, 2, and 3) of CAM and 16 bits (segment 0) of RAM. Data writes go to segments 1, 2, and 3 of the comparand register; data reads come from segment 0 of the highest priority matching location.
These alternatives are cumbersome in embedded applications, requiring processing power and long sequences of instructions that are unnecessary with a CAM. In an embedded system, the simple control interface makes it easy for a state machine to control the CAM's operation (Figure 4). Also, a CAM's powerful associative operations reduce the number of instructions required for any application, making it possible for simple state machines to control otherwise-complex operations.
Application: compression
Most lossless compression schemes, such as an LZW algorithm, are dictionary-based. These schemes typically search a look-up table and output translated data, such as a symbol code or expanded data. The dictionary table builds from the data stream that is compressing, so the dictionary contains only a subset of all possible entries. In other words, the dictionary is a sparse table.
The compression routine appends the first two input symbols to create a base string. Two data-write cycles load the base string into the comparand. If the base string is not in the dictionary (typical examples are shown at steps 1 and 6 of the example), the routine sends the first symbol in the base string to the output. The MOV NF,V command adds the base string to the dictionary, which causes the 9-bit code assigned to the symbol pair to correspond to the CAM memory address. The second symbol from the base string then forms a new base string with the next symbol from the input.
Compression example Step Input Comparand
[Seg1],[Seg0]Code
[Match Addr]Table entry Output M 1 i M,i - [258]=M+i M 2 s i,s - [259]=i+s i 3 s s,s - [260]=s+s s 4 i s,i - [261]=s+i s 5 s i,s 259 -- 6 s 259,s - [262]=259+s 259 7 i s,i 261 -- 8 p 261,p - [263]=261+p 261/td> 9 p p,p - [264]=p+p p 10 i p,i - [265]=p+i p 11 -- i,_ - [266]=i+_ i
Tom Weldon is director of product marketing and applications at MUSIC Semiconductors (Colorado Springs, CO), where he has worked for six years. He holds a BSE from Harvey Mudd College (Claremont, CA).
| design features | out in front | design ideas | columnist | departments | products |