Optimizing High Performance CPUs, GPUs and DSPs? Use logic and memory IP—Part II
In Part I of this two-article series we described how the combination of logic libraries and embedded memories within an EDA design flow can be used to optimize area in CPU, GPU or DSP cores. In Part II we explore methods by which logic libraries and embedded memories can be used to optimize performance and power consumption in these processor cores.
Maximizing Performance in CPU, GPU and DSP Cores
Clock frequency is the most highly publicized attribute of CPU, GPU and DSP cores. Companies that sell products employing CPU cores often use clock frequency as a proxy for system-level value. Historically, for standalone processors in desktop PCs, this technique has had value. However, for embedded CPUs it's not always easy to compare one vendor's performance number to another's, since the measurements are heavily influenced by many design and operating parameters. Often, those measurement parameters do not accompany the performance claims made in public materials and, even when vendors do make them available, it remains difficult to compare two processors that were not implemented identically or measured under the same operating conditions.
Further complicating matters for consumers of processor IP, real-world applications have critical product goals beyond just performance. Practical tradeoffs in performance, power consumption and die area — to which we refer collectively as “PPA” — must be made in virtually every SoC implementation; rarely does the design team pursue frequency at all costs. Schedule, total cost and other configuration and integration factors are also significant criteria that should be considered when selecting processor IP for an SoC design.
Understanding the role that common processor implementation parameters play in a core's PPA, as well as in other important criteria such as cost and yield, is key to putting IP vendors' claims in perspective. Table 3 summarizes the effects that a CPU core's common processor implementation parameters may have on its performance and other key product metrics.
High Performance Critical Path Optimization Techniques
The performance of CPU critical paths can be maximized by selecting high-speed logic libraries and memories and by applying a range of design techniques: starting with a proper floorplan, careful library usage, incremental synthesis, tuned script settings, path group optimization and the use of over-constraints.
One of the best ways to shorten these critical paths is to start with a good initial floorplan that minimizes the physical distance between the memory I/O pins and the critical registers within the processor logic. The ability to revise this floorplan is critical as the design progresses and engineering tradeoffs are made to meet the design goals. A good floorplan, based on the number of cores and the interconnectivity requirements of the rest of the high-performance core, can minimize physical distance at the top level of the design and reduce timing bottlenecks.
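The effect of floorplan distance on a critical path can be illustrated with a first-order model. The sketch below is not a real EDA flow; the placements, the per-millimeter delay figure and the function names are all hypothetical, used only to show why pulling a register bank closer to the memory I/O pins directly reduces path delay.

```python
# Illustrative sketch (hypothetical numbers): estimate the wire delay between
# a memory I/O pin and a critical register from their floorplan placements,
# using Manhattan routing distance and an assumed per-mm wire delay.

MEM_IO_PIN = (0.8, 1.2)      # (x, y) placement in mm, hypothetical
CRITICAL_REG = (1.1, 1.5)    # initial register placement, hypothetical

DELAY_PER_MM_PS = 120.0      # assumed wire delay per mm, in picoseconds

def manhattan_mm(a, b):
    """Manhattan routing distance between two placements, in mm."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def wire_delay_ps(a, b):
    """First-order wire-delay estimate for the route between a and b."""
    return manhattan_mm(a, b) * DELAY_PER_MM_PS

before = wire_delay_ps(MEM_IO_PIN, CRITICAL_REG)

# A better floorplan moves the register bank next to the memory pins.
IMPROVED_REG = (0.9, 1.25)
after = wire_delay_ps(MEM_IO_PIN, IMPROVED_REG)

print(f"delay before: {before:.0f} ps, after: {after:.0f} ps")
```

Real flows account for buffering, layer RC and congestion, but the proportional relationship between placement distance and wire delay is what a good initial floorplan exploits.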
Library usage refers to selecting the best library architecture (in this case, high speed) and selecting the optimal VT and channel-length libraries to introduce into the synthesis and place-and-route flows. It also covers the practice of maintaining don't_use lists, which encourage the tools to select the highest performance cells by "hiding" some of the more area-efficient cells, trading area for performance.
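How a don't_use list biases cell selection can be sketched in a few lines. The cell names and the delay/area figures below are invented for illustration; in a real flow the list is applied through the synthesis tool's own commands rather than code like this.

```python
# Illustrative sketch: a don't_use list "hides" slower, area-efficient cells
# so that cell selection lands on a faster (larger) drive strength.
# All names and numbers are hypothetical, not from any real library.

# (cell_name, delay_ps, area_um2) for several variants of a NAND2 gate.
CELLS = [
    ("NAND2_X1_LVT", 28.0, 0.9),
    ("NAND2_X2_LVT", 19.0, 1.4),
    ("NAND2_X4_LVT", 14.0, 2.6),
    ("NAND2_X1_HVT", 41.0, 0.9),
]

# Hide the small, slow cells so only higher-drive cells remain visible.
DONT_USE = {"NAND2_X1_LVT", "NAND2_X1_HVT"}

def pick_cell(cells, dont_use):
    """Return the lowest-delay cell that is not hidden by the don't_use list."""
    visible = [c for c in cells if c[0] not in dont_use]
    return min(visible, key=lambda c: c[1])

name, delay_ps, area_um2 = pick_cell(CELLS, DONT_USE)
print(f"selected {name}: {delay_ps} ps, {area_um2} um^2")
```

The selected cell is faster but larger, which is exactly the area-for-performance trade the don't_use list is meant to force on timing-critical paths.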
Incremental compile techniques include running synthesis multiple times, sometimes introducing different synthesis options and additional libraries or cells to improve performance with each run.
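The iterative loop described above can be sketched as follows. The `run_synthesis` function, the library names and the slack values are all stand-ins invented for this example; a real flow would invoke the synthesis tool and read its timing report at each pass.

```python
# Illustrative sketch of an incremental-compile loop: rerun "synthesis"
# with progressively more aggressive options and additional libraries
# until the timing target is met. Everything here is hypothetical.

TARGET_SLACK_PS = 0.0  # timing is met when worst slack is non-negative

# Each pass adds libraries or options; the paired value is a made-up
# worst-slack result standing in for the tool's timing report.
PASSES = [
    ({"libs": ["hs_svt"]},                              -85.0),
    ({"libs": ["hs_svt", "hs_lvt"]},                    -20.0),
    ({"libs": ["hs_svt", "hs_lvt"], "retime": True},      5.0),
]

def run_synthesis(options, canned_slack_ps):
    """Stand-in for a real synthesis run; returns worst slack in ps."""
    return canned_slack_ps

slack = float("-inf")
for options, canned in PASSES:
    slack = run_synthesis(options, canned)
    print(f"options={options} -> worst slack {slack:+.0f} ps")
    if slack >= TARGET_SLACK_PS:
        break  # timing met; no further passes needed
```

Each pass reuses the results of the previous one as a starting point, which is what distinguishes incremental compilation from simply re-running synthesis from scratch.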