Special Report: Inside the new multicore processors

By Ed Sperling, Editor in Chief -- EDN, April 13, 2007

Electronic News/Electronic Business sat down to discuss the future of multicore chips and multiple chips with Phil Hester, chief technology officer at AMD; Harold Hughes, president and CEO of Rambus; and Neil Knox, CEO of PeakStream.

Q: There has always been some version of multicore, whether it’s a microkernel or multiple processors or even multiple cores within a processor. Except for very specific applications such as database searches, no one has been able to write the software to utilize it effectively. Will it ever have mainstream applications?
Hester: You need to distinguish between the client and the server space. I take exception to the comment about it being hard to program. In the server space today, it looks exactly like SMP (symmetric multiprocessing) on a chip. We have 20-plus years of experience in RISC Unix machines being able to run applications well. From a software perspective, it’s business as usual. The challenge is in the client space, and your point is exactly right. There are 20-plus years of history where the client applications were never designed to run more than one thread or one core. That started to change two to three years ago, somewhat driven by the gaming space, whether it’s Xbox or PS3.

Q: Those are multicore machines, but the majority of applications still only use one core at a time.
Hester: But the development of those platforms drove select applications to start becoming enabled for multicore environments. Does it make sense for PowerPoint to be multithreaded? Probably not. On the other hand, we are starting to see more and more media-rich applications in the client space. If you look at the number of people editing videos on their PCs, more and more people are doing that. There’s a transition from traditional definition to HD video streams. The encoding/transcoding of that can use multicore technology to run a virus scan in the background. To us, the question is, ‘What is the number of cores that the client applications can effectively utilize?’ It’s certainly a lot less in the client space than in the server space. And because the applications are becoming so diverse in the client space, it’s not going to be homogeneous multicores. It’s going to be heterogeneous multicores. If you look at software stacks, Vista is going to make 3D graphics standard the way floating point became standard in the 486 generation. The first heterogeneous multicore will be a CPU (central processing unit) and GPU (graphics processing unit) in the client space.
Knox: There is one other element, and that’s the customer environment. The customer is used to getting horsepower improvement from single-threaded CPUs every 18 to 24 months. Going forward, the CPU companies have declared their architectures are going multicore. We ran a survey of customers and found that 65 percent of their applications are still single-threaded. If they want to improve performance on a multicore environment, they’re going to have to tweak their applications.
Hughes: I first encountered multicore when I was in charge of Intel Capital. That was the point in time when the transistor budget was of a size that you could put multiple cores on it. But when you thought about the platform impact, very few of the applications even recognized the existence of multicores. So we began developing relationships with compiler companies to convince them of the value of recompiling their applications to take advantage of multicores. There were certain types of applications that showed real performance improvements, but many of the standard business applications showed very marginal improvement. We were going to some fairly exotic types of compilers, like the long-word compiler. Remember the if-then? You went down both streams until you figured out which was the more likely, then you went in that direction. That became the basis of the Itanium processor. The transistor budgets of today are staggering. You have to find ways to put something on there, and you have to find ways to eke out performance in any way possible. Running applications like virus scans in the background will give the consumer some benefit.

Q: How many cores is optimal?
Hester: The danger here is the industry getting into the core war like they got into the megahertz war. For a finite silicon budget that’s affordable in a client microprocessor, you can put four cores in that with a decent cache or you can put three cores in that with a better cache. Which one of those is going to give you a better user experience? I’d argue that in many cases the three-core would do better. There is a huge transistor budget so you can put more on a chip. The question is how you feed it, and there is a growing disparity between the speed of the microprocessors, the rate at which they can consume data, and what standard memory technologies can deliver. There are more cycles outside the core than inside. You have to look at the memory hierarchies and how you feed the cores as well as how many cores you have.

Q: As we progress to 65- and 45-nanometer processors for client machines, is the future going to be one application that takes advantage of multiple cores, or will it be multiple applications running on individual cores?
Hester: I think it’s a combination of both. There will be some applications that will inherently be able to easily exploit multiple cores. Rendering, 3D graphics and transcoding are all good examples of things that scale across multiple cores. Being able to render a PowerPoint slide in 70 milliseconds versus 100 milliseconds isn’t important. For a lot of the client applications, medium-range microprocessor technology is at the human limitation point, not the computer limitation point. Putting a lot of effort into trying to speed up those applications isn’t a good way for the software companies to invest their skills. On the other hand, we will see more intelligence in power management of the individual cores. In our future microprocessors, each of the cores is autonomous enough to figure out what’s running on it to be able to adjust its power level up and down in hardware. At the same time, a lot of the issues in the data center involve cooling, not the physical footprint. Having software that can aggregate power consumption across a number of servers and being able to dynamically adjust the power up and down based on managing the peak resources of the data center is an effective way of using resources.

Q: You’re referring to hypervisors, no doubt. How much overhead does that take?
Hester: That depends on how good a job you do in the hardware. If you’re talking about IBM in the LPARs (logical partitions), there’s single-digit overhead. If you look at the PC space and the early partitioning, it was 20 to 50 percent. One of the things we’re spending transistors on is hardware acceleration to enable hypervisors, both in the processor and in the I/O subsystems.
Hughes: AMD has done such a good job lately that Intel undoubtedly will fire up their engines and AMD will respond. The monolithic structures will go away and different techniques will be used more and the consumer will undoubtedly be the beneficiary. We’re very happy because all these multicores need bandwidth, and that’s what we do. The monolithic memory structure is gone. Not only are there power issues with mobile computers, but there are heat issues with the FB DIMM process. There will be some very unique memory issues now that AMD has bought ATI. The needs of a normal processor in terms of memory are different than the needs of a graphics processor. Presumably you want one type of memory, not multiple types.
Knox: It’s a very exciting time from an engine perspective. But the system requires a balanced architecture. Now, with multicore, you need a balanced hardware architecture. I’ve never met a customer who did not expect more horsepower on an annual basis. I don’t think we should become complacent that we’ve developed a faster CPU and the customers won’t need any more power. That’s delusional. The hardware companies have to embrace the software environment, whether it’s power management or an easy way to develop applications for these multicore environments. They have to help develop solutions or the customers are going to run into a wall, and then the hardware vendors are going to find out they can’t sell the same volume of CPUs as they sell today.

Q: We’re hearing about 80-core chips. Is the solution a more rational use of cores, and possibly systems in a package where there are multiple chips for specific functions?
Hester: The diversity of the workload, particularly in the client space, is continuing to grow. When the x86 architecture was invented, you were running a character-oriented spreadsheet and you were emulating a mechanical teletype terminal. The x86 architecture has evolved very well over the past 20 years, but you’re going to have to have some specialized architectures to deal with this diversity of workloads. It’s not going to be more homogeneous cores. If you think about Amdahl’s Law, there are two critical paths for every piece of software. One is the amount of code that has to be done sequentially. The other piece is the amount that can be done in parallel. The x86 has been optimized throughout its life for sequential. The GPUs grew up on the parallel path. As they did, they created different memory technologies because they’re fetching large amounts of data in parallel. There’s a fair amount of overhead setting that up, but once you get it set up those things scream over and over again with a large piece of data. Now, with the x86, you want to be using the GPU, or a piece of it, for general-purpose applications. That will be the first good example of heterogeneous computing. But you can see the need for certain specialized pieces of hardware. On today’s high-end dual-core machine, you can edit standard definition video at about 4x real-time. If you’ve got a 60-minute tape, you can do that in about 15 minutes. If you do that same thing on an HD tape, that same 60 minutes takes 20 hours. If I want to take that back to 15 minutes, I have to use 80 cores. That’s not a good way to use hardware, when I can have one core do that same thing in less time. Over time, you’re going to see a range of cores that matches what the end user workload looks like. Part of the challenge there is the software stack.
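
Hester’s HD-video arithmetic and his reference to Amdahl’s Law can be worked through in a few lines of Python. This is only an illustrative sketch, not something presented in the interview: the function name `amdahl_speedup` and the 95 percent parallel fraction are assumptions chosen for the example, and the 80x figure is treated here as a pure speedup factor under ideal linear scaling.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n).

    parallel_fraction (p) is the share of the work that can run in parallel;
    the remaining (1 - p) must run sequentially no matter how many cores exist.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Figures quoted in the interview.
tape_minutes = 60
sd_realtime_factor = 4                               # SD editing at about 4x real-time
sd_minutes = tape_minutes / sd_realtime_factor       # 15 minutes
hd_minutes = 20 * 60                                 # 20 hours for the same tape in HD

# How much faster HD processing must get to match the SD turnaround.
required_speedup = hd_minutes / sd_minutes           # 80.0

# Best case: the job is perfectly parallel, so 80 cores give the full factor.
ideal = amdahl_speedup(1.0, 80)

# Assumed case (hypothetical 95% parallel fraction): the sequential 5%
# caps the gain far below the required factor.
realistic = amdahl_speedup(0.95, 80)

print(f"required speedup: {required_speedup:.0f}x")
print(f"80 cores, fully parallel: {ideal:.1f}x")
print(f"80 cores, 95% parallel:   {realistic:.1f}x")
```

Even under the generous assumption of perfect scaling, matching the standard-definition turnaround takes the full 80x factor, and any sequential portion pushes the required core count higher still, which is consistent with the case Hester makes for specialized hardware over more homogeneous cores.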

Q: But that’s always been the challenge in multicores, right?
Hester: Yes. I’ve worked with both the Microsoft and the Linux communities to make sure they can exploit this. Our view is it’s a journey. It’s not going to happen overnight. Incrementally, more and more of this hardware capability will get exploited. But it’s a matter of impedance matching between the application and what features you can put in the hardware and the software.
Hughes: In all businesses, to solve a class problem, standardization happens. Much of the microprocessor was about standardization. Having all of the infrastructure necessary to do that, we’re now going to go into specialization. How that will work out is hard to say. There are plenty of people who want a standard PC, but there are also people who will want something different. How do you solve that with what’s available in the various PCs and how does that affect the software world?
Hester: This is a good news/bad news thing. When enough applications used floating point, it made sense to incur the die size increase and the resulting customer cost increase because enough applications benefited so you could afford to charge everyone for it. That’s the problem today. Enough people have to benefit to incur the incremental die size cost. In the client space, are we there for CPUs and GPUs? Yes. But in the high-performance technical space, what may be a better interim answer is to enable an add-in optional piece of hardware. You want to be able to take advantage of the PC architecture, but use a standard interface—whether it’s a PCI card or an HTX (HyperTransport Expansion) connector—to allow people to build these differentiated systems. The first example of that is using a GPU card for high-performance technical, through-the-stream and close-to-the-metal applications. We would see specialized vector processors as an add-in, where they can take advantage of the standard platform and the cost efficiency of that, but only have to design the add-in piece and make it optional for people who care about that. One of the challenges is to define an application programming interface that is easy enough to add around the standard Windows or Linux environment and also thin enough to deliver the performance of the hardware.

Coming Tuesday: Part II, the future of computing.

© 2012 UBM Electronics. All rights reserved.