Future of computers - Part 2: The Power Wall

January 6, 2012

The quest for speed is not new.

"Captain," the old lady said excitedly, "I will give you some of my lard to use if you will beat that riverboat in the race."

"Madame," said the Captain, "you have a deal."

The Captain had the lard put on the logs in the boiler, and the old lady watched in excitement as the fire in the boiler grew hotter, the great wheels turned faster, the riverboat quivered and shook. The boats drew even, but they could not pass their rival.1

Other than boiler explosions and extensive loss of life, steamboat racing was not that different from a CPU over-clocking convention. More speed, more heat ... until something breaks. Both competitions really concern money. Faster boats and faster chips are worth more.

It's All About Energy
Energy is the ability to do work; it is power expended over a period of time. In the case of computers there is useful power and there is wasted power. Useful power does work when the computer executes instructions or performs other operations. It is usually called "dynamic power" because it varies with the speed of computer operation. Dynamic power is spent constantly storing and removing electrical charge on the billions of capacitors found in a current microprocessor. If the microprocessor quits clocking, dynamic power consumption stops.
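The relationship between clock speed, voltage, and dynamic power is commonly modeled as P ≈ α·C·V²·f: activity factor times switched capacitance times supply voltage squared times clock frequency. A minimal sketch, with illustrative numbers that are not drawn from any particular chip:

```python
def dynamic_power(alpha, c_switched, vdd, freq):
    """Switching power in watts: activity factor * switched
    capacitance (farads) * supply voltage squared * clock (Hz)."""
    return alpha * c_switched * vdd ** 2 * freq

# Illustrative numbers only: 100 nF of switched capacitance,
# 1.0 V supply, 3 GHz clock, 10% of nodes toggling per cycle.
p = dynamic_power(0.1, 100e-9, 1.0, 3e9)  # about 30 W
```

Note that stopping the clock (f = 0) drives this term to zero, which is exactly why dynamic power vanishes when the microprocessor quits clocking.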

Wasted power is consumed whenever a computer is powered on, even if the clock is stopped. For this reason it is usually called "static power". In integrated circuits static power is consumed by the billions of resistors inherent in the leaky transistors that constitute a circuit. Even if a microprocessor is dead stopped, the static power consumption is draining energy from its power supply or battery.

Microprocessor power consumption is doubly bad for users thanks to the First Law of Thermodynamics.2 In essence, energy is never destroyed; it just changes form. In the computer world this means the 100 watts being consumed by the latest multicore Xeon change form into 100 watts of heat.

Why We Should Care About Power Consumption
One obvious reason we should care is that power is neither free nor infinite. On the small scale, Apple's A4 powers the iPad. Intel's Atom powers the HP Slate. One operates for 10 hours. The other for five.3 One is the overwhelming market leader and the other is ... not.

On the larger scale, supercomputers of today are the video games of tomorrow.4 Study of their problems can give insight into future problems of computers of all sizes.

Supercomputer centers and cloud server farms purchase energy at around 10 cents a kilowatt-hour. The average power consumption of the Top 10 supercomputers is 4.3 MW.5 If those machines run at full capacity, the power budget for a single year is $3.7M. In addition, the 4.3 MW of heat generated must be disposed of in some fashion, usually by re-circulating refrigerated water through the chips.
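The $3.7M figure follows from straightforward arithmetic on the numbers above, assuming the machines draw full power around the clock:

```python
def annual_energy_cost(power_mw, dollars_per_kwh=0.10):
    """Cost to run a machine flat-out for one year."""
    hours_per_year = 24 * 365              # 8,760 hours
    kwh = power_mw * 1000 * hours_per_year  # MW -> kW, then kWh
    return kwh * dollars_per_kwh

cost = annual_energy_cost(4.3)  # the Top 10 average from the text
# 4,300 kW * 8,760 h * $0.10/kWh = $3,766,800 -- about $3.7M
```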

The Swiss National Computer Center pumps water from Lake Lugano at a depth of 45 meters, where the temperature is 6 degrees Celsius.6 The pump is connected by an 80 cm pipe to two 13 ton, six meter high suction baskets. Three pumps push nearly 7,000 gallons per minute to the computer center.

Why We Should Care About Heat
Heat is the enemy of engines and electronics. In engines heat compromises the mechanical characteristics of materials causing reduced reliability and eventual failure. In electronics, heat alters both the mechanical and electrical characteristics of semiconductor circuits, circuit packaging, and electrical wiring.

In 2008 Charlie Demerjian analyzed overheating of NVIDIA GPUs.7 In short, as the multicore GPU chips got really hot, the electrical connections between the die and the package substrate separated. As much as 40% of some products experienced early failure.

Early in its life Microsoft's Xbox 360 faced a similar problem users referred to as "The Ring of Death".8 Infant mortality of early systems was reported to be between 23.7% and 54.2%. Some analysts claimed the failures resulted from the multicore IBM Cell derivative causing solder connections to melt and separate from the circuit board, much like NVIDIA's problem.

Temperature also affects performance in the following non-catastrophic ways:

- Increasing temperature decreases transistor speed.
- Increasing temperature increases transistor leakage.

Imagine running across a basketball court unimpeded. Now imagine running across the same court while being pummeled by basketballs hurled from all directions. This is what happens to electrons that carry charge when a transistor turns on. The greater the temperature, the faster the basketballs come. You must work harder and harder to cross the court. So must the electrons carrying the logical information.

The increasing leakage with temperature causes circuits that are overheating to consume more power. As they consume more power the die temperature increases further. As the temperature increases more the leakage increases more, and so on, until either the thermal protection circuit kicks in or the chip destroys itself.
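This feedback loop can be sketched as a small iteration: leakage grows with temperature, and temperature grows with total power. All coefficients below are made up purely for illustration; they are not from any real part:

```python
def thermal_runaway(t_ambient=50.0, p_dynamic=80.0, theta=0.5,
                    leak0=20.0, t_ref=50.0, doubling=30.0,
                    t_max=110.0, steps=100):
    """Iterate the leakage/temperature feedback loop.
    Assumed toy model: leakage power doubles every `doubling` deg C
    above t_ref; die temp = ambient + theta * total power, where
    theta is a thermal resistance in deg C per watt.
    Returns (temperature history, tripped_thermal_protection)."""
    t = t_ambient
    history = [t]
    for _ in range(steps):
        leak = leak0 * 2 ** ((t - t_ref) / doubling)
        t_new = t_ambient + theta * (p_dynamic + leak)
        if t_new >= t_max:
            return history, True   # thermal protection kicks in
        if abs(t_new - t) < 1e-6:
            return history, False  # settled at a safe equilibrium
        t = t_new
        history.append(t)
    return history, False
```

With the default (made-up) numbers the loop runs away and trips the thermal limit in a couple of iterations; with a lighter dynamic load (say `p_dynamic=30.0`) it settles at a stable die temperature instead.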

Up Against the Wall, Just How Bad Is It?
The smart guys say it's bad.

From the National Research Council, "The Future of Computing Performance, Game Over or Next Level?":9

    "Even as multicore hardware systems are tailored to support software that can exploit multiple computation units, thermal constraints will continue to be a primary concern."

    "....fundamental power and energy constraints mean that even the best efforts might not yield a complete solution."

    "...there appears to be little opportunity to significantly increase performance by improving the internal structure of existing sequential processors."

    "Even when new parallel models and solutions are found, most future computing systems' performance will be limited by power....."

    "...it is an open question whether power and energy will be showstoppers.........."

From the DARPA ExaScale Computing Study:10

    "...there are four major challenges..."

    "The Energy and Power Challenge is the most pervasive of the four..."

    "...microprocessors of the future will not only NOT run faster than today, they will actually decline in clock rate."

Study Chairman and DARPA darling, Peter Kogge, lent his cheery view:11

    "The party isn't exactly over, but the police have arrived, and the music has been turned way down."

The uniform conventional wisdom of the experts seems to be, "All is woe".12

Another Brick in the Wall13
To understand this dystopian view of computer future we can examine various attempts at mitigating power consumption and the limits of those attempts.

The most important efforts to reduce power consumption have come from semiconductor process improvements. Cheap reliable MOS (metal oxide semiconductor) transistors enabled invention of the monolithic microprocessor.

We can thank a handful of men for the gift of MOS transistors and the technology to manufacture them. An incomplete list includes:

Jean Hoerni – inventor of the "planar process"14
Bob Noyce – inventor of the "silicon integrated circuit"15
Kerwin, Klein, and Sarace - inventors of the "self aligned silicon gate"16
Andy Grove and others – solvers of heavy metal contamination17

Most contemporary engineers only know of Grove as Intel's management guru and author of "High Output Management".18 A small group that goes back 40 years knows he literally wrote the book on semiconductor device physics.19

An even smaller group knows he was a major intellect in solving the "game over" technical problem that could have prevented the microprocessor from ever being invented.

In the late 1960's MOS integrated circuits could be manufactured fairly easily. However, in many cases after a few weeks or months, the devices would cease to function. Some mechanism was causing the transistors to leak more current over time, and at some point, the transistors would fail as shorts. This low reliability made the devices commercially unsuitable.

Grove figured out that heavy metal ions were contaminating the circuits and that these ions migrated over time in a way that caused the failure. This led to the introduction of "gettering",20 a process step that removes the heavy metals causing the contamination and one that is used to this day.

"Moore's Law"21 (or more accurately Moore's Trend) describes the tendency for the number of transistors that can economically be place on an integrated circuit to double every two years.

The technological insight that makes Moore's Law work is called "Dennard Scaling,"22 after IBM scientist Robert Dennard. In his 1974 paper published in the IEEE Journal of Solid-State Circuits, Dennard postulated:

MOSFETs continue to function as voltage-controlled switches while all key figures of merit such as layout density, operating speed, and energy efficiency improve provided geometric dimensions, voltages, and doping concentrations are consistently scaled to maintain the same electric field.

To the layman this means that as you make transistors smaller, they get better. Furthermore, as the power supply voltage decreases, power consumption decreases by the square of the voltage. This is the reason that Intel spends billions to be the first company to produce commercial volumes at a smaller process node. Dennard Scaling predicts they will always have the fastest parts, the lowest power, and the lowest costs.
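Dennard's constant-field scaling rules can be summarized in a few lines. The figures below are the standard first-order relationships, not measurements from any particular process:

```python
def dennard_scale(k):
    """First-order constant-field scaling by factor k (e.g. k = 1.4
    for one process generation). Returns relative figures of merit,
    each as a multiplier versus the previous generation."""
    return {
        "dimension":        1 / k,       # gate length, oxide, wiring
        "voltage":          1 / k,       # supply and threshold shrink
        "delay":            1 / k,       # gates get faster
        "power_per_device": 1 / k ** 2,  # (C/k) * (V/k)^2 * (f*k)
        "density":          k ** 2,      # devices per unit area
        "power_density":    1.0,         # the free lunch: stays flat
    }
```

Multiplying `power_per_device` by `density` shows why this worked for 40 years: more, faster transistors per square millimeter at the same watts per square millimeter.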

For 40 years the microprocessor business has lived off Dennard, but now transistor dimensions are approaching the atomic level and scaling is reaching its limits.

The gate is the electrical connection that controls the MOS switch. The gate is separated from the rest of the MOS transistor by an insulating layer. As this layer gets thinner, the transistor performance improves. However, at a certain point the layer is so thin that the gate leaks electrons.

Silicon dioxide was the insulator of choice for three decades, but as gate leakage became an increasing problem, it was replaced by other materials that were less likely to leak. These materials are known as high-k (for high dielectric constant).23

As power supply voltages were reduced, the voltage at which the transistor turns on necessarily had to be reduced as well. Currently the threshold voltage is as low as 0.3 volt. The closer that voltage gets to zero, the more difficult it becomes to turn the transistor completely off.

In fact, the transistors are so leaky in a current multicore microprocessor that it consumes 50 watts of power while standing still.24
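The difficulty of turning a low-threshold transistor "completely off" shows up in the standard exponential subthreshold model, where off-state current grows as exp(-Vth/(n·kT/q)). A rough sketch; the slope factor n is an assumed typical value, and the numbers are illustrative:

```python
import math

def relative_leakage(vth, n=1.5, v_thermal=0.026):
    """Relative subthreshold (off-state) current at Vgs = 0, using
    the standard exponential model I_off ~ exp(-Vth / (n * kT/q)).
    v_thermal = kT/q, about 26 mV at room temperature; n is an
    assumed subthreshold slope factor."""
    return math.exp(-vth / (n * v_thermal))

# Lowering Vth from 0.5 V to 0.3 V multiplies off-state leakage
# by more than two orders of magnitude:
ratio = relative_leakage(0.3) / relative_leakage(0.5)
```

Note that `v_thermal` itself rises with temperature, which is one of the mechanisms behind the leakage-heat feedback loop described earlier.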

Reducing the supply voltage causes additional problems by increasing the current for the same power level. For an oversimplified example, assume a 100 watt multicore microprocessor running on 1 volt. The power pins must provide 100 amps of current. For this reason, power pins constitute 70% of the pins on some packages.

In addition, these huge currents cause voltage droop across the internal power buses. If the droop is great enough, the circuits they supply will cease operation.
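The arithmetic behind the current and droop numbers is simple Ohm's-law bookkeeping. A sketch, with the 1-milliohm bus resistance as an assumed illustrative figure:

```python
def supply_current(power_w, vdd):
    """I = P / V: total current the power pins must deliver."""
    return power_w / vdd

def bus_droop(current_a, r_bus_ohms):
    """IR droop across the on-die power distribution network."""
    return current_a * r_bus_ohms

i = supply_current(100.0, 1.0)  # the 100 W / 1 V example: 100 A
droop = bus_droop(i, 0.001)     # assumed 1 milliohm bus: 0.1 V
```

Even a single milliohm of distribution resistance eats a tenth of the 1-volt supply, which is why droop becomes an operational limit at these current levels.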

Circuit Tricks
Computer architects have some tricks that further reduce power beyond that achievable by process improvements alone.

Some microprocessors adjust the internal voltage of different sections of the logic depending upon what operation is being performed. This is known as "Dynamic Voltage Scaling".25 By running certain sections at a lower voltage than the rest of the chip, power in that section can be reduced.

When taken to the limit, entire sections of a chip can be powered down when they are not needed. This trick is less helpful than it used to be since many microprocessors have 2/3 of their area filled with memory caches. Caches cannot be powered down without losing their contents.

"Dynamic Frequency Scaling"26 similarly reduces the clock speed of some circuits under certain conditions. Dynamic power reduces linearly with reductions in clock speed. At state of the art process nodes, static power is greater than dynamic power for many microprocessors. As a result, frequency scaling is less helpful than in the past.

And the Wall Come'a Tumblin' Down27
There is another way to attack the Power Wall. Venray designs CPU cores to be built on commodity DRAMs.

When Venray engineers designed TOMI™ Aurora (4-core 64M) and TOMI™ Borealis (8-core 1 GB), they started with several advantages over legacy approaches:

  1. DRAM processes were inherently low leakage, because if the transistors leaked, the memory would forget.
  2. The capacitance of buses between cores and memory was really small.
  3. DRAM processes produce about the cheapest transistors in existence.

They also had several disadvantages:

  1. DRAM transistors were about 20% slower than transistors in a comparable logic process.
  2. DRAMs were mostly analog devices and sensitive to high current spikes.
  3. DRAM processes usually had 3 layers of metal for connection compared to 10 or 12 layers in microprocessor processes.

The primary technique used to save power was to invent a really simple computer architecture that was both efficient on "Big Data"28 benchmarks and parsimonious in transistor count. The resulting Borealis core was 22K transistors. This was easily routable with 3 metal layers. The caches added an additional 393K transistors per core. You can run your gcc benchmarks on TOMI Borealis at http://www.venraytechnology.com.

Power was reduced even further by making extensive use of differential signaling wherever possible. DRAMs already do much of their work differentially, so this was fairly straightforward.

The result was a 2.1-GHz 98-mW 32-bit CPU core.

The Future
For nearly 100 years, steamboat designers incrementally improved engine designs, boiler construction, materials, and fuels. Captains plying the Mississippi cargo trade pushed the technology envelope of heat and speed, eventually reaching the unbelievable rate of 9 mph.

Then trains were invented and put the boats out of business. The same might happen to makers of today's legacy CPUs.

Next time we will examine the last of Patterson's Walls, Instruction Level Parallelism and pipelines.

The future of computers - Part 1: Multicore and the Memory Wall

Go to Part 3: Future of computing: The ILP Wall and pipelines.

About the author:
Russell Fish III's three-decade career dates from the birth of the microprocessor. One or more of his designs are licensed into most computers, cell phones, and video games manufactured today.

Russell and Chuck Moore created the Sh-Boom Processor which was included in the IEEE's "25 Microchips That Shook The World". He has a BSEE from Georgia Tech and an MSEE from Arizona State.

1 Riverboat racing folklore: http://americanfolklore.net/folklore/2010/10/riverboat_racing.html

2 First Law of Thermodynamics: http://en.wikipedia.org/wiki/First_law_of_thermodynamics. Static power as 40% of total power (Sec. 5.2): http://users.eecs.northwestern.edu/~rjoseph/publications/cmp-adapt.pdf

3 Comparing iPad and Slate: http://www.engadget.com/2010/04/05/hp-slate-to-cost-549-have-1-6ghz-atom-z530-5-hour-battery

4 Kogge: http://spectrum.ieee.org/computing/hardware/nextgeneration-supercomputers/0

5 Power consumption of the Top 10: http://www.top500.org/lists/2011/06/press-release

6 Supercomputer cooling: http://hpc-ch.org/blog/category/cscs/

7 NVIDIA failure: http://www.theinquirer.net/inquirer/news/1004378/why-nvidia-chips-defective

8 Microsoft Ring of Death: http://en.wikipedia.org/wiki/Xbox_360_technical_problems

9 National Research Council report: http://sites.nationalacademies.org/CSTB/CurrentProjects/CSTB_042221

10 DARPA report: http://www.er.doe.gov/ascr/Research/CS/DARPA%20exascale%20-%20hardware%20(2008).pdf

11 Kogge interview: http://insidehpc.com/2011/01/28/is-the-party-over-for-exascale-ambitions

12 "The Flood of Mighty Waters": http://books.google.com/books?id=3UcaAQAAIAAJ&pg=RA1-PA191&lpg=RA1-PA191&dq=%22all+is+woe%22&source=bl&ots=s7f6NRk0TB&sig=CcwPUZE9pEDn1a7pAK6A3cz7nrI&hl=en

13 http://en.wikipedia.org/wiki/Another_Brick_in_the_Wall

14 Planar process: http://en.wikipedia.org/wiki/Planar_process

15 Integrated circuit: http://en.wikipedia.org/wiki/Integrated_circuit

16 Self-aligned gate: http://en.wikipedia.org/wiki/Self-aligned_gate

17 http://scitation.aip.org/getpdf/servlet/GetPDFServlet?filetype=pdf&id=APCPCS000683000001000003000001&idtype=cvips&prog=normal&bypassSSO=1

18 http://www.amazon.com/High-Output-Management-Andrew-Grove/dp/0679762884

19 http://www.amazon.com/Physics-Technology-Semiconductor-Devices-international/dp/0471329983

20 Gettering: http://scitation.aip.org/getpdf/servlet/GetPDFServlet?filetype=pdf&id=APCPCS000683000001000003000001&idtype=cvips&prog=normal&bypassSSO=1

21 Moore's Law: http://en.wikipedia.org/wiki/Moore's_law

22 Dennard: http://www.ieee.org/portal/cms_docs_societies/sscs/PrintEditions/200701.pdf

23 High-k dielectric: http://en.wikipedia.org/wiki/High-k_dielectric

24 p. 6, Sec. 4.1: http://207530779760502934-a-1802744773732722657-s-sites.googlegroups.com/site/mingchenhomepage/published-papers/ics11.pdf

25 Dynamic Voltage Scaling: http://en.wikipedia.org/wiki/Dynamic_voltage_scaling

26 Dynamic Frequency Scaling: http://en.wikipedia.org/wiki/Dynamic_frequency_scaling

27 http://en.wikipedia.org/wiki/Joshua_Fit_the_Battle_of_Jericho

28 Big data: http://en.wikipedia.org/wiki/Big_data
