Refining multicore concepts, part two
By Ann Steffora Mutschler, Senior Editor -- Electronic News, 7/2/2007
Simon Davidmann, CEO of Imperas; David Stewart, CEO of Critical Blue; EDA industry analyst Gary Smith; Ian Mackintosh, president of OCP/IP; and Chris Lennard, director of strategic marketing for the system design division at ARM, sat down with Electronic News/Electronic Business during the Design Automation Conference held earlier this month in San Diego to discuss multicore issues. What follows are excerpts of that conversation. This is the second of two parts.
Q: Where are we at today with multicore programming?
Davidmann: Where we are at today with multicore from a programming point of view is very much like the semiconductor industry was before there was that convergence on representation and a flow. Three or four years ago, it was just sprouting, but divergence has happened, and now we’ve got to get convergence on the right approach, the right tools, the right methodology, and the right standards.
Mackintosh: I don’t think there will be a convergence in the short term just because of the nature of the variables. The issues are: the number of processors, whether the processors are the same or not, whether they have the same interfaces, protocols and the way they interoperate, and the folks who are trying to program them – do they have simple needs and deep knowledge of the elements or are they trying to program from the outside. These variables are so large, I don’t think you are going to get solutions that embrace all the needs of complex SoCs. They are just too diverse, so all you will see is more divergence and more point solutions. I don’t think you are going to see a unified force.
Davidmann: And that’s what happened in the ’70s and early ’80s with [hardware description languages], and if you look at Verilog, it’s a very simple, strange, funny little language but it actually isn’t specific to a DSP or a specific multiprocessor. It’s like ‘C’ – you can pretty much program anything in it, and in Verilog, you can pretty much describe any hardware. I believe that, for programming a concurrent platform, there will be a representation which is completely independent of the implementation. How it’s implemented doesn’t matter – you have to think of the applications independent of the [implementation] and you will need technology to map it. When you write RTL, you don’t know whether it’s going to go [to] this ASIC library or whatever – you don’t care actually, because it’s the functionality that you’re trying to represent and then you have to verify that functionality and then you can do the implementation. I believe that’s how software has to be implemented. You run into this problem when you put the processors down: the software guy sees the architecture and starts fiddling with the register addresses in his peripheral and he builds his software that sits tight to one platform, and suddenly you get a platform with three times the stuff in it, and the application is thrown away. You have to start moving the applications so that they are concurrent and independent of the implementation that they sit on, and then we can program something.
Lennard: This was a strategy adopted by the EDA industry and pretty much anyone who had ESL tools in the late 90s through to 2001/2002 and those tools are all dead now because of this concept of separation of behavior from architecture. And the reason why they collapsed is that they could not deal with the concept of asymmetric multiprocessing.
Smith: There is a general convergence on this stuff and there always has been. Verilog was there when everybody had converged on what an RTL should look like, and yeah mine was a little bit different. In essence, Verilog was a standard. It was a standard we all jumped on because we knew we needed a standard to get the tools and methodologies, then methodologies converged.
Davidmann: Yes, and I think language enables methodologies, but it was because of the tools that it actually accelerated, because Verilog in ’85 was a great language with a good gate-level simulator and this weird programming language, and you could model a telephone going, ‘ring-ring.’ It was only when Design Compiler could map some of these strange concepts that you could actually get some productivity gain; it reduced the bugs you put in by hand and it improved your productivity. And then it became adopted.
Smith: The reason Design Compiler was successful was because it mapped to a standard. The only standard I’ve seen that’s taken off that initially was extremely immature and grew like wildfire, because for some reason it was built right, was SPIRIT. SPIRIT was supposed to be this bold language thing and within three years it was doing far more than anybody ever expected.
Stewart: I don’t necessarily deal with all these much bigger concepts. I can’t disagree with them, but the people I speak to have got an awful lot of things they need to do in a short period of time, and a huge amount of legacy – from programming, from tools, from software, from cores they are targeting – and you can’t run away from it, and you can’t assume that suddenly they are going to have a great religious experience and change everything. That’s the practical reality of trying to introduce new ideas and new tools to people. They are terribly constrained, and I think they are constrained in a way they were not constrained 10 or 15 years ago when all the Verilog stuff happened. I think that people have less time; they are less open to new ideas not because they don’t like new ideas, but because they have a job to do. And so I think if we are to get in this direction, then it’s going to be a series of small steps that build upon the way we already do something. It’s the only practical way it’s going to happen.
Mackintosh: Speaking of small steps, [the OCP/IP industry consortium] is very uniquely positioned to handle the situation of heterogeneous multiprocessor designs simply because it is configurable. We are working in three areas which all help towards solving problems. First, in cache coherence, we have snooping schemes and directory-based schemes being developed through this year. Second, in the area of debug, we shall have a large unifying effect on pulling together disparate and disaggregated debug schemes, be they hardware, software or inter-SoC schemes, by essentially creating a debug socket which is inclusive of other technologies being done in the industry. And the third front is developing network-on-chip (NoC) benchmarking schemes, which really connect people to share common ground in terms of evaluating networks-on-chip, whether they be in the development world or in the industry. So I think that’s some small steps.
Q: What else is missing in terms of refining the whole concept of multicore?
Mackintosh: A fundamental thing that is missing is a taxonomy. I think that’s a fundamental issue. I hear everybody talking – you have to listen really hard to know exactly what folks are talking about.
Davidmann: In the Multicore Association, the first couple of discussions were, “Is it ‘multicore,’ is it ‘multiprocessor,’ is it ‘multiprogramming,’ what is it?” It’s a real challenge. The fundamental problem is, can we come up with a standard way of programming applications that is independent of a platform, that is easily adoptable, and that is not PhD type of stuff. And it’s got to work with legacy [code]. But really, none of this will come about until the tools exist. The interesting thing is computer science has been worrying about concurrency and parallelism since the year ‘dot.’ And it’s only recently that it’s actually had a problem to solve, in that we’re not going to have a 40-gigahertz processor, so if you want to run H.264 encode on a processor, you’ve actually got to do it in parallel – no question. You put it in hardware or you put it on 12 processors – there’s no in between.
Mackintosh: You made an interesting point: you said nothing is going to happen until the tools exist. The reality is that, of all the MP SoCs that exist today, only 10 percent are using four or more processors. If that is the case, it is really a niche market in terms of tools being developed – that’s not a majority play for anyone. When you have vastly different types of processors being used and different numbers of them, and different objectives in mind in terms of being able to program them, the tools that will be put in place are very much point solutions.
Davidmann: I think that’s the current understanding. People that built schematic editors had similar comments, because an AND gate is very different from an ADDER… so the applications have to be very independent from the implementation. It’s a complex problem, but we don’t have a choice here. We used to have Moore’s Law and it’s concurrency that’s going to double every 18 months. If we don’t wake up and sort these concurrent problems, we might as well forget the chip business because you’ll never be able to build anything with it. Every company that builds a multiprocessor chip has to have a way of programming or they die. The smart ones spend more than two-thirds of their effort on software.
Q: Once we have a concurrent programming model/architecture, how will it address the legacy issues?
Stewart: It has to.
Lennard: The primary driver for ARM as it stands today with the ARM11 MP core is the fact that you have to deal with legacy and you need to be able to import things that have been dealing with these dominant cores, such as those dealing with the ARMv6 ISA as well as the ARM architecture, and be able to import that from unicore over to multicore. The issue is that when you’re moving those things over, you can, for example, get a significant benefit and you can get, through improved utilization, power benefits of 85 percent, and things like that. That’s one of the major reasons people are moving over to multicore. But the legacy is one of the major inhibitors for moving. If you don’t resolve that issue, you won’t move and that’s one of the key things – and whilst I do agree that there is an interesting idea of being able to build top down – if you look at a chip today, it has a microprocessor, memory and some engines that work on things. Inside of those engines you have synthesis running because it is a well-contained problem, but you don’t try to synthesize everything the chip does unless you are doing, for example, a super ASIC, which is very dedicated. So there is this blend between the concept of the general programmable and those things which are a little bit more specific. Once you get into the specific area, there is a great opportunity for some very new and novel software programming languages and software mapping approaches. But I still think the legacy issue is going to drive multicore decisions over the next five years.
Smith: The semiconductor guys, when you talk to them, there’s two types. There’s the homogeneous guys, that, when you get the guys that are willing to talk, say, ‘We’re in trouble, we have four processors,’ and then there’s the heterogeneous guys saying that, ‘We’re about at six processors right now, and we’re going to 12 next year, and we don’t have a clue how we’re going to do it.’ So if you look at all the roadmaps, they’re saying, ‘Fix the problem please or Moore’s Law comes crashing down around our ears.’
Davidmann: The ITRS roadmap shows in 2011, 29 processors in production on the average chip. There’s no question, that’s the future, and I don’t believe the existing solutions solve that problem. There are new solutions needed.
Stewart: There is some good news in this. If you look at the legacy code, particularly as it relates to the accelerator part – the compute intensive pieces – generally speaking, because of the way the human brain works, [programmers] don’t work entirely in parallel, so they tend to think, ‘This is how I have to process the data first, and then once I’ve got that I get some kind of intermediate result and then I process the next thing.’ It is sequential, but it’s sequential in blocks – in a pipeline. And it doesn’t mean you can immediately rip it apart and run it all in parallel, but it’s an interesting starting point – getting back to the idea of having functionality. If you can then manage the communication between those blocks, then maybe there’s a way to get something based on the legacy that can be somewhat modified and then put onto a multicore architecture.
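
[Editor's illustration, not part of the conversation: the idea Stewart describes, sequential blocks that hand intermediate results to the next block, can be sketched roughly as follows. This is a minimal sketch under stated assumptions. The three stage functions (scale, offset, clamp) and the helpers run_stage and run_pipeline are hypothetical names invented here, not anything the panelists or their companies describe. Each "legacy" block runs in its own thread, and the communication between blocks is managed by FIFO queues.]

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the data stream


def run_stage(func, inbox, outbox):
    """Apply func to every item read from inbox; forward results to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(stages, items):
    """Run each stage in its own thread, linked by unbounded FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for worker in workers:
        worker.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for worker in workers:
        worker.join()
    return results


# Hypothetical "legacy" blocks: each one was written as a sequential step.
def scale(x):
    return x * 2


def offset(x):
    return x + 1


def clamp(x):
    return min(x, 10)


if __name__ == "__main__":
    print(run_pipeline([scale, offset, clamp], range(8)))
```

[The stage functions themselves are unchanged sequential code; only the hand-off between them is new. Note that in standard CPython, threads do not execute CPU-bound code on several cores at once, so this shows only the structure of the decomposition. On a real multicore target each stage would presumably be mapped to its own core or process.]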