Optimizing High Performance CPUs, GPUs and DSPs? Use logic and memory IP—Part II

-November 21, 2013


In Part I of this two-article series we described how the combination of logic libraries and embedded memories within an EDA design flow can be used to optimize area in CPU, GPU or DSP cores. In Part II we explore methods by which logic libraries and embedded memories can be used to optimize performance and power consumption in these processor cores.

Maximizing Performance in CPU, GPU and DSP Cores

Clock frequency is the most highly publicized attribute of CPU, GPU and DSP cores. Companies that sell products that employ CPU cores often use clock frequency as a proxy for system-level value. Historically, for standalone processors in desktop PCs, this technique has had value. However, for embedded CPUs it’s not always easy to compare one vendor’s performance number to another’s, since the measurements are heavily influenced by many design and operating parameters. Often, those measurement parameters do not accompany the performance claims made in public materials and, even when the vendors make them available, it’s still difficult to compare the performance of two processors not implemented identically or measured under the same operating conditions.

Further complicating matters for consumers of processor IP, real-world applications have critical product goals beyond just performance. Practical tradeoffs in performance, power consumption and die area — to which we refer collectively as “PPA” — must be made in virtually every SoC implementation; rarely does the design team pursue frequency at all costs. Schedule, total cost and other configuration and integration factors are also significant criteria that should be considered when selecting processor IP for an SoC design.

Understanding the effect that common processor implementation parameters have on a core’s PPA, and on other important criteria such as cost and yield, is key to putting IP vendors’ claims in perspective. Table 3 summarizes the effects that a CPU core’s common processor implementation parameters may have on its performance and other key product metrics.

 


Elusive Critical Paths

To achieve optimal performance, designers must reduce the delay in the critical paths of CPU, GPU and DSP designs. These critical paths can lie in register-to-register (logic) paths or in the memory access paths to and from the L1/L2 caches. Critical paths can move between memory and logic during the design process, and chasing them can feel like playing Whac-A-Mole®, but well-characterized logic and memory IP, a solid EDA flow and mastery of design techniques can help designers achieve timing closure.
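The hunt described above starts from timing reports: find the worst path, see whether it is a logic path or a memory access path, and check its slack against the clock period. The sketch below illustrates that bookkeeping; the path names and delay values are invented for illustration, as real numbers come from static timing analysis reports.

```python
# Toy slack check over a mix of logic and memory paths. All names and
# delays are invented; real values come from an STA tool's reports.
def worst_slack(paths, clock_period_ns):
    """Return (path_name, slack_ns) for the most critical path."""
    name, delay = max(paths.items(), key=lambda kv: kv[1])
    return name, clock_period_ns - delay

paths = {
    "reg_to_reg_alu":  0.95,  # logic (register-to-register) path, ns
    "l1_read_to_reg":  1.10,  # memory access path, ns
    "reg_to_l2_write": 1.02,  # memory access path, ns
}

name, slack = worst_slack(paths, clock_period_ns=1.0)
print(name, round(slack, 2))  # here the L1 read path is the critical one
```

Negative slack on a memory path versus a logic path calls for different fixes, which is why the critical path can appear to jump around as each one is addressed.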

 

High Performance Critical Path Optimization Techniques

Performance of CPU critical paths can be maximized by selecting high-speed logic libraries and memories and by applying design techniques such as proper floorplanning, careful library usage, incremental synthesis, tuned script settings, path-group optimization and over-constraining.

 

One of the best ways to minimize these critical paths is to start with a good initial floorplan that minimizes the physical distance between the memory I/O pins and the critical registers within the processor logic. The ability to revise this floorplan as the design progresses, and as engineering tradeoffs are made to achieve the goals, is equally important. A good floorplan, based on the number of cores and the interconnectivity requirements of the rest of the high-performance core, can minimize physical distance at the top level of the design and reduce timing bottlenecks.
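The distance argument can be made concrete with a back-of-the-envelope check: estimate the Manhattan distance from a memory I/O pin to a critical register and convert it to wire delay. The coordinates and the delay-per-micron figure below are made up for illustration; real numbers depend on the process and metal stack.

```python
# Rough floorplan sanity check: Manhattan distance from a memory I/O pin
# to a critical register, converted to an estimated wire delay.
def manhattan_um(a, b):
    """Manhattan distance between two (x, y) points in microns."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

DELAY_PS_PER_UM = 0.5  # illustrative wire delay figure, ps per micron

def wire_delay_ps(pin_xy, reg_xy):
    return manhattan_um(pin_xy, reg_xy) * DELAY_PS_PER_UM

print(wire_delay_ps((0, 0), (400, 200)))  # 300.0 ps of estimated wire delay
```

Even this crude estimate shows why a memory placed far from its consumers can dominate a path's delay budget before any logic is counted.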

Library usage refers to selecting the best library architecture (in this case, high speed) and the optimal VT and channel-length libraries to introduce into the synthesis and place-and-route flows. It also refers to the practice of using don’t_use lists to steer the tools toward the highest-performance cells by “hiding” some of the more area-efficient cells, trading area for performance.
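Conceptually, building a don’t_use list is just filtering the cell library by attribute. The sketch below mimics common cell-naming conventions (VT flavor encoded in a suffix), but the names and the suffix scheme are invented here; real flows mark cells via tool commands rather than string matching.

```python
# Sketch of deriving a don't_use list: hide cells whose VT flavor is not
# in the allowed set, so synthesis prefers the faster cells. Cell names
# and the suffix convention are invented for illustration.
def dont_use_list(cells, keep_vt=("lvt", "svt")):
    """Return the cells to hide from the tools."""
    return [c for c in cells if c.rsplit("_", 1)[-1] not in keep_vt]

cells = ["nand2_x1_lvt", "nand2_x1_svt", "nand2_x1_hvt", "inv_x2_hvt"]
print(dont_use_list(cells))  # ['nand2_x1_hvt', 'inv_x2_hvt']
```

The hidden high-VT cells can be reintroduced later for leakage recovery on paths with positive slack, which is one reason the don’t_use list changes over the course of a project.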

Incremental compile techniques include running synthesis multiple times, sometimes introducing different synthesis options and additional libraries or cells to improve performance with each run.
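The control structure of such an incremental flow is a simple loop: rerun with progressively richer options, keep the best result seen so far, and stop once timing is met. In the sketch below, `run_synthesis` is a stand-in for invoking the real tool, faked here with a table of invented worst-slack outcomes.

```python
# Sketch of an incremental-compile loop. run_synthesis is a stand-in for
# the real synthesis tool; its results are faked with invented numbers.
FAKE_RESULTS = {
    ("baseline",): -0.12,
    ("baseline", "retime"): -0.04,
    ("baseline", "retime", "extra_lvt_lib"): 0.01,
}

def run_synthesis(options):
    """Pretend to run synthesis; return worst slack in ns (invented)."""
    return FAKE_RESULTS[tuple(options)]

def incremental_compile(option_schedule):
    """Add one option per run, keeping the best configuration found."""
    best_slack, best_opts, opts = float("-inf"), None, []
    for opt in option_schedule:
        opts.append(opt)
        slack = run_synthesis(opts)
        if slack > best_slack:
            best_slack, best_opts = slack, list(opts)
        if best_slack >= 0:
            break  # timing met; no need for further runs
    return best_opts, best_slack

print(incremental_compile(["baseline", "retime", "extra_lvt_lib"]))
```

Each iteration is expensive in practice, so the schedule of options is usually ordered from cheap tweaks (script settings) toward costly ones (adding libraries or cells).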
