February 2, 1998
Analyzing and implementing SDRAM and SGRAM controllers
Christian Green, MoSys Inc

Designing your own synchronous-DRAM controller lets you tune its cost, complexity, and performance to your application needs. You've got lots of options, so research the trade-offs before proceeding.

Today's graphics engines, communications chip sets, and µPs are running faster than ever. High-speed core internal clock rates now exceed 300 MHz and are rising quickly. However, slow external-memory accesses, especially those involving DRAM, can greatly restrict overall system performance. Fortunately, higher-performance synchronous memories are now available. Although these devices offer potentially much higher bandwidth, obtaining optimum performance from them requires a controller that takes advantage of their enhanced capabilities.

Synchronous DRAMs (SDRAMs), including synchronous graphics RAMs (SGRAMs), differ from their asynchronous DRAM counterparts, such as fast-page-mode and extended-data-out (EDO) RAM, in more ways than the simple presence or absence of a clock input. First, the row-address-strobe (RAS)/activate operation is independent of precharge. Most asynchronous DRAMs precharge whenever you drive the RAS input high. With SDRAMs, however, precharge is an explicit command, separate from RAS/activate, which allows multiple memory accesses to the same row. Leaving a row activated is called RAS parking.

Second, the burst features of SDRAMs generate their column addresses using an on-chip counter. Asynchronous DRAMs, on the other hand, require an explicit address that the memory controller supplies for each access. Also, SDRAMs include two or four internal banks. Each bank is an independent memory that you can activate and precharge separately from the other banks. This independent operation allows the controller to transfer data to or from one bank during another bank's precharge and activate latencies, maximizing performance (see box "DRAM basics").
SGRAMs differ from SDRAMs in two key areas. First, SGRAMs support a block-write command that fills many addresses with one value in a single operation. Second, SGRAMs offer a write-per-bit function that lets you selectively write a portion of a word. Block write and write per bit are especially valuable in graphics applications, but any design that needs these functions can benefit from SGRAM. If you tie the Device Special Function (DSF) pin on an SGRAM to ground, the device behaves identically to an SDRAM.

Memory-controller architectures

The simplest SDRAM controller, an autoprecharge controller, issues a precharge command after every memory access. The chief advantage of the autoprecharge controller is its simplicity. However, subsequent accesses must issue an activate command before reading or writing, even when the next access is on the same row. The bandwidth lost by this approach may not be too severe if one or more of the following conditions exist:
If you decide to use an autoprecharge controller, you'll find that the easiest way to perform the precharge after each access is to use the write-with-autoprecharge and read-with-autoprecharge commands.
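As a rough illustration of the autoprecharge approach, the following Python sketch lists the commands such a controller would issue for one access. The function name and the command mnemonics (`ACTIVATE`, `READ_AP`, `WRITE_AP`) are hypothetical labels chosen for this example, not names from any device data sheet, and timing between commands is ignored.

```python
from typing import List


def autoprecharge_commands(bank: int, row: int, col: int, write: bool) -> List[str]:
    """Return the command sequence for one access (hypothetical mnemonics)."""
    # Every bank is precharged between accesses, so each access must
    # begin by activating the target row, even if it repeats the last row.
    activate = f"ACTIVATE bank={bank} row={row}"
    # The read/write-with-autoprecharge command closes the row once the
    # burst finishes, so no separate precharge command is issued.
    op = "WRITE_AP" if write else "READ_AP"
    access = f"{op} bank={bank} col={col}"
    return [activate, access]


if __name__ == "__main__":
    for cmd in autoprecharge_commands(bank=0, row=5, col=8, write=False):
        print(cmd)
```

Note that two back-to-back accesses to the same row produce the same two-command sequence twice; that repeated activate is the bandwidth cost the text describes.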
If a bank miss occurs with the valid bit set, the controller must activate the bank to access and precharge the last bank activated. However, because these two commands are to different banks, the controller can interleave their latencies. For example, the controller can activate the bank to access and in the next clock cycle precharge the last bank accessed. This interleaving greatly improves the probability that the new bank will be ready for activation the next time the task attempts to access it.

A row miss (but a bank hit) that occurs with the valid bit set means that the task is trying to access an address in the same bank but in a different row from the last access. Therefore, the controller must precharge and activate the same bank before the access can take place. This scenario has the largest performance penalty, which you can reduce only if the memory controller can interleave access to another memory chip in parallel with the precharge and activate operations.

At initial power-up and immediately after a refresh command, the contents of the bank/row register may be invalid. You must therefore implement a valid/invalid bit so that the controller knows whether the bank the register points to is precharged or activated. If a task requests an access when all banks are precharged, then the controller must perform an activate operation before the read or write.

The single-comparator controller is optimal for applications with only one or a few tasks, those in which task identification is impossible, and those in which the tasks have locality of reference (a high probability that accessed locations will be close to those of previous accesses).
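The cases above (register invalid, row hit, bank miss, and row miss) can be sketched as a small Python model. This is a minimal sketch under stated assumptions: the class name, the `access` method, and the command strings (`ACT`, `PRE`, `RW`) are all hypothetical, and the model tracks only command order, not clock-level timing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SingleComparator:
    """One bank/row register plus a valid bit (hypothetical model)."""
    valid: bool = False  # False: every bank is precharged
    bank: int = 0
    row: int = 0

    def access(self, bank: int, row: int) -> List[str]:
        if not self.valid:
            # All banks precharged: activate before the read or write.
            cmds = [f"ACT b{bank} r{row}"]
        elif bank == self.bank and row == self.row:
            # Row hit: the row is already open.
            cmds = []
        elif bank != self.bank:
            # Bank miss: activate the new bank, then precharge the old
            # one in the following cycle so the latencies overlap.
            cmds = [f"ACT b{bank} r{row}", f"PRE b{self.bank}"]
        else:
            # Row miss in the same bank: precharge, then activate.
            cmds = [f"PRE b{bank}", f"ACT b{bank} r{row}"]
        self.valid, self.bank, self.row = True, bank, row
        cmds.append(f"RW b{bank}")  # the read or write itself
        return cmds


if __name__ == "__main__":
    ctrl = SingleComparator()
    for bank, row in [(0, 5), (0, 5), (1, 2), (1, 9)]:
        print((bank, row), ctrl.access(bank, row))
```

The four accesses in the usage example exercise, in order, the invalid-register case, a row hit, a bank miss, and a row miss.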
Just as with a single-comparator architecture, the data in the register may be invalid after initial power-up and immediately after the controller has issued a refresh command, so you must implement a valid/invalid bit. An application that uses four 256k×32-bit SGRAMs or SDRAMs on a 64-bit bus (4 Mbytes total) will contain four banks. Thus, the memory-controller design requires four tags.

Whenever a miss occurs with the valid bit set, the memory controller must precede the activate command with a precharge, increasing the lead-off latency. Thus, this design performs poorly if the tasks exhibit extremely poor locality of reference. The autoprecharge architecture is a better and simpler choice in these situations.

The comparator-per-bank design is optimal for applications with many tasks, those in which task identification is impossible, and those in which the tasks have locality of reference. This scenario is typical of main memory in PCs, workstations, or embedded processors in which the tasks are software processes.
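A comparator-per-bank design with four tags might be modeled as follows. This is only a sketch: the class and method names and the command strings are hypothetical, and a tag value of `None` stands in for a cleared valid bit (bank precharged).

```python
from typing import List, Optional


class PerBankComparator:
    """One row tag per bank; None plays the role of the invalid bit."""

    def __init__(self, num_banks: int = 4) -> None:
        self.tags: List[Optional[int]] = [None] * num_banks

    def access(self, bank: int, row: int) -> List[str]:
        open_row = self.tags[bank]
        if open_row is None:
            # Bank is precharged: activate only.
            cmds = [f"ACT b{bank} r{row}"]
        elif open_row == row:
            # Hit: the requested row is already open in this bank.
            cmds = []
        else:
            # Miss with the valid bit set: precharge must precede the
            # activate, which lengthens the lead-off latency.
            cmds = [f"PRE b{bank}", f"ACT b{bank} r{row}"]
        self.tags[bank] = row
        cmds.append(f"RW b{bank}")
        return cmds


if __name__ == "__main__":
    ctrl = PerBankComparator()
    for bank, row in [(0, 5), (1, 2), (0, 5), (1, 3)]:
        print((bank, row), ctrl.access(bank, row))
```

Unlike the single-comparator case, the third access in the usage example still hits in bank 0 even though bank 1 was accessed in between, because each bank keeps its own tag.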
The objective of the register-per-task architecture is to use the knowledge of each task's unique behavior to improve performance via selective optimizations. A comprehensive description of this architecture is impossible because the strategy is built around the special cases of the tasks. For example, a standard DRAM autoprecharge access best serves low-bandwidth clients with truly random-access patterns. On the other hand, keeping separate tags for both the source and the destination registers best serves graphics engines.

The difficulty in implementing the register-per-task architecture lies in handling bank contention. If two tasks are trying to access the same bank, then the controller tends to "thrash" the memory with incompatible protocols and repeated precharge and activation cycles, negatively impacting performance. To solve this problem, the controller must use a well-designed scheduling algorithm that minimizes costly precharge/activate sequences and still satisfies the latency requirements of each task.

Handling memory refresh

SDRAMs provide a refresh command to keep charge in each DRAM bit capacitor from decaying, which would cause loss of stored data. Before issuing the refresh command, the memory controller must precharge all banks in the device. Internally, the memory contains a register with the address of the next row to refresh. When the controller issues a refresh command, the device activates the row in the register, immediately precharges it, and increments the register. The refresh command must execute every 15.6 µsec.

Refresh implementation in the autoprecharge architecture is easy because all the banks are in a precharged condition between memory accesses. The controller needs only to insert the refresh command between memory accesses. In the register-per-task architecture, the memory controller can simply treat refresh as another task, which will request a refresh every 15.6 µsec.
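One way to picture refresh as "another task" is a counter that raises a request every 15.6 µsec. The sketch below is an assumption-laden illustration: the 100-MHz clock in the usage example, the class name, and the `pending`/`acknowledge` handshake are hypothetical and do not come from the article.

```python
REFRESH_INTERVAL_NS = 15_600  # 15.6 µsec, the interval given in the text


def refresh_interval_clocks(clock_mhz: int) -> int:
    """Clocks between refresh requests, rounded down so refresh is never late."""
    return REFRESH_INTERVAL_NS * clock_mhz // 1000


class RefreshTimer:
    """Counts clocks and flags a pending refresh request (hypothetical)."""

    def __init__(self, clock_mhz: int) -> None:
        self.period = refresh_interval_clocks(clock_mhz)
        self.count = 0
        self.pending = False

    def tick(self) -> None:
        # Call once per controller clock.
        self.count += 1
        if self.count >= self.period:
            self.count = 0
            self.pending = True

    def acknowledge(self) -> None:
        # Call when the controller has issued the refresh command.
        self.pending = False


if __name__ == "__main__":
    timer = RefreshTimer(clock_mhz=100)  # assumed clock rate
    print(timer.period)
```

Integer arithmetic in nanoseconds avoids floating-point rounding; a scheduler would then arbitrate this request against the other tasks.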
The controller must take into account the state of the memory in the single-comparator and comparator-per-bank architectures, because the controller must precharge all banks before issuing a refresh command. If a bank's register bit is invalid, the bank is already precharged, and the controller can immediately issue a refresh command. If the register bit indicates that the bank is activated, the controller must precharge the bank before issuing a refresh command.
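The refresh rule for the comparator-based architectures can be sketched as a function over the per-bank tags. As before, this is a hypothetical model: the function name and command strings are invented for illustration, and `None` represents an invalid register bit (bank already precharged).

```python
from typing import List, Optional


def refresh_commands(tags: List[Optional[int]]) -> List[str]:
    """Commands needed to refresh, given each bank's open row (None = precharged)."""
    cmds: List[str] = []
    for bank, open_row in enumerate(tags):
        if open_row is not None:
            # An activated bank must be precharged before the refresh.
            cmds.append(f"PRE b{bank}")
            tags[bank] = None  # the tag is invalid after the refresh
    cmds.append("REFRESH")
    return cmds


if __name__ == "__main__":
    tags = [None, 7, None, 3]
    print(refresh_commands(tags))
    print(tags)
```

A single-comparator controller is the same idea with a one-entry list. A real device may also offer a precharge-all command; whether to use it is a design choice outside this sketch.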
The controller uses the following signals:
State transitions occur as follows:
READ: IDLE -> ACT -> CMD -> D1 -> D2 -> IDLE
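The READ sequence can be written as a small table-driven state machine. The state names come from the article; everything else in this sketch is an assumption: the `request` input, the function names, and the simplification that each state lasts exactly one clock. The `abort` argument models the article's ABORT input, which returns the controller to IDLE from D1 or D2.

```python
from enum import Enum
from typing import List, Optional


class State(Enum):
    IDLE = "IDLE"
    ACT = "ACT"
    CMD = "CMD"
    D1 = "D1"
    D2 = "D2"


# READ path from the article: IDLE -> ACT -> CMD -> D1 -> D2 -> IDLE
READ_NEXT = {
    State.IDLE: State.ACT,
    State.ACT: State.CMD,
    State.CMD: State.D1,
    State.D1: State.D2,
    State.D2: State.IDLE,
}


def next_state(state: State, request: bool = False, abort: bool = False) -> State:
    if state is State.IDLE and not request:
        return State.IDLE  # nothing to do
    if abort and state in (State.D1, State.D2):
        return State.IDLE  # burst stopped early
    return READ_NEXT[state]


def trace_read(abort_in: Optional[State] = None) -> List[str]:
    """Walk one read, optionally asserting abort while in the given state."""
    state = State.IDLE
    visited = [state.value]
    state = next_state(state, request=True)
    while state is not State.IDLE:
        visited.append(state.value)
        state = next_state(state, abort=(state is abort_in))
    visited.append(state.value)
    return visited


if __name__ == "__main__":
    print(" -> ".join(trace_read()))
    print(" -> ".join(trace_read(abort_in=State.D1)))
```

A hardware implementation would add wait states for the device's activate-to-read and CAS latencies; the table form simply makes the transition order easy to check.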
If the ABORT input goes high during states D1 or D2, the controller issues a burst-stop command to the SGRAM and immediately returns to the IDLE state. To correctly interface to this memory controller, the host should support the following requirements:
You should consider two main factors when evaluating the net throughput of an SDRAM device. The first factor is the minimum cycle time; vendors often market their SDRAMs using this parameter. Minimum cycle time tells how fast data can be read or written during a burst. The second factor is the latency values of the device, which determine how much idle time is on the bus between data bursts. For example, CAS-to-data latency is the number of clocks before valid data appears on the bus after issuing a read command. CAS-to-data is perhaps the single most important latency value because it occurs on every read, regardless of whether the access is a row hit. Additional latencies, such as precharge-to-activate (tRP) and activate-to-read (tRCD), are also important. These latencies occur whenever the controller accesses a different row from the previous access.

For standard SDRAM, a four-word read burst preceded by a precharge/activate sequence takes 12 clocks to complete. With a higher-performance SDRAM, the same read cycle takes only nine clocks. This latency decrease raises net bandwidth from 133 to about 177 Mbytes/sec (roughly a 33% improvement) without any change in clock speed. You can clearly see the benefits of using low-latency SGRAMs for graphics applications at the system level, with increased performance of memory-intensive applications, such as large screen-to-screen transfers and 3-D. Overall WinBench 3D scores can increase by 5 to 10%. The performance of memory-intensive 3-D operations that frequently switch rows, such as drawing large triangles that use Gouraud shading, can increase by as much as 35%.
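The bandwidth arithmetic behind the 12-clock and nine-clock figures can be checked with a few lines of Python. The bus width and clock rate are not stated in the text; the sketch assumes a 32-bit (4-byte) word and a 100-MHz clock, which reproduce the 133-Mbytes/sec starting point. The function name is hypothetical.

```python
def burst_bandwidth_mbytes(words: int, bytes_per_word: int,
                           clocks: int, clock_mhz: float) -> float:
    """Net bandwidth in Mbytes/sec for one burst that occupies `clocks` cycles."""
    return words * bytes_per_word * clock_mhz / clocks


if __name__ == "__main__":
    # Assumed: 4-byte words, 100-MHz clock (neither is given in the article).
    standard = burst_bandwidth_mbytes(4, 4, 12, 100.0)
    fast = burst_bandwidth_mbytes(4, 4, 9, 100.0)
    print(f"standard: {standard:.1f} Mbytes/sec")
    print(f"low latency: {fast:.1f} Mbytes/sec")
    print(f"improvement: {100 * (fast / standard - 1):.1f}%")
```

Whatever the bus width or clock rate, the gain depends only on the ratio of clock counts: 12/9 is a one-third increase in bandwidth for row-miss reads.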
Copyright © 1997 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Publishing Company, a unit of Reed Elsevier Inc.