The Automata Processor - Practical processing in memory
The semiconductor industry has done a good job of delivering incremental improvements to the processor-memory interface. The steady progression of memory interfaces, from fast page mode (FPM) through a variety of DDR implementations, has kept processors fed with the data needed to continually advance the state of computing. The speed of the interface remains a primary focus, with one of the most recent innovations in memory technology, the Hybrid Memory Cube, taking memory performance to an unprecedented 160 GB/s of throughput.
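To put that throughput figure in perspective, a short back-of-envelope sketch can translate peak bandwidth into a lower bound on the time needed to stream a dataset through memory once. The 160 GB/s value is the Hybrid Memory Cube throughput cited above; the dataset sizes are illustrative assumptions, not figures from this article.

```python
# Back-of-envelope: lower bound on the time to stream a dataset
# once through memory at a given peak bandwidth. Assumes the scan
# is purely bandwidth-bound (no compute or latency effects).

def scan_time_seconds(dataset_gb: float, bandwidth_gb_s: float = 160.0) -> float:
    """Minimum seconds to move `dataset_gb` gigabytes at `bandwidth_gb_s` GB/s."""
    return dataset_gb / bandwidth_gb_s

# Illustrative dataset sizes (assumptions, not from the article).
for size_gb in (16, 256, 1024):
    print(f"{size_gb:>5} GB scanned in >= {scan_time_seconds(size_gb):.2f} s")
```

Even at this unprecedented bandwidth, a single pass over a terabyte of data takes several seconds, which hints at why moving computation closer to memory, rather than moving data to the processor, is an attractive alternative.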
Figure 1. Basic von Neumann architecture
The dominance of the von Neumann architecture has compelled the industry to take a relatively narrow view of the primary methods used for scaling system performance. The old cliché, “If it ain’t broke, don’t fix it,” could certainly be used to describe the evolution of computer architectures over the past three decades. While the von Neumann architecture provided the framework, Moore’s Law provided the means for the architecture to scale in performance and capability. Although the transistor budgets have continued to increase, as predicted by Moore’s Law, extracting more performance from these transistors is becoming increasingly difficult. The way these transistors are being utilized has shifted rapidly over several CPU generations. There are several reasons for this shift, including:
- Frequency Scaling Challenges: Previously, increasing the CPU operating frequency was one of the most effective and cost-efficient means of scaling CPU performance. The growing importance of power efficiency, however, has driven CPU vendors to consider alternatives to frequency scaling.
- Maturity of CPU Architectures: Since the introduction of the Intel® 4004 in 1971, the capability of the instruction execution pipeline has consistently improved. These improvements have included integration of the numerical coprocessor, the introduction of superscalar architectures, and continual refinement and extension of the basic x86 instruction sets.
- Emergence of New Performance Drivers: Over time, it is normal for industries to mature. This is certainly true for an increasingly large segment of the computing industry. From the 1980s through the mid-2000s, most computer users were able to absorb the increases in compute performance that the industry delivered. Today, however, the performance of many applications is sufficient for a wide range of customers. All but the most complex spreadsheets can be computed nearly instantly. Web browsing is fluid, and performance issues are most often related to Internet bandwidth. Creation and rendering of high-definition video content is becoming practical and commonplace in the home. The need for extreme performance is largely driven by commercial, scientific, and government/military applications.
These factors have led the industry to change the way that CPUs are architected. The challenge facing CPU designers is to continue absorbing the increasing transistor budgets afforded by Moore’s Law and at the same time, better align the CPU’s performance gains with the most pressing computational problems facing the industry. Increasing cache memory capacity has certainly helped to consume transistors, but performance increases along this vector begin to approach asymptotic limits. This is especially true when considering more mature applications where performance gains are more elusive.