
Brian Dipert

EDN Senior Technical Editor Brian Dipert exposes, analyzes and opines on diverse topics in technology. Follow the Brian's Brain Twitter feed at www.twitter.com/BrianzBrain.




Wednesday, October 12, 2005

Instigating a Platform Tug Of War: General-Purpose GPUs, Going Beyond Graphics

Oct 12 2005 3:18PM


This blog post references my cover story, 'Instigating a Platform Tug Of War: Graphics Vendors Hunger for CPU Suppliers' Turf' in the October 13, 2005 edition of EDN.

Take a look at the print article's Figure 2c (link is to a PDF). Now consider the fast bidirectional PCI Express bus linking the CPU and GPU. Raise your hands: how many of you are thinking 'general-purpose coprocessor' or 'DSP' right now? Congratulations; I commend you on your vision.

Why are GPUs so much more powerful than CPUs, at roughly similar die sizes, when crunching through mathematics calculations? As the print article points out, the contenders are slowly but steadily encroaching on each other's turf (GPUs, among other factors, are now software-programmable and, in latest-generation iterations, even support shader instruction branching). The fact remains, though, that a GPU at its heart is an application-tuned streaming media processor. This means that GPUs, for example, don't need large on-chip caches, and can instead devote that silicon area to additional computational circuitry.
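To make the 'streaming' idea a bit more concrete, here is a minimal Python sketch of the programming model, not of any real GPU API. One small kernel is applied independently to every element of a stream, so the work can be spread across as many parallel lanes as the hardware offers, and no element needs to revisit another's data (which is why a big cache buys little). The names `shade`, `run_stream` and `lanes` are hypothetical, and a thread pool merely stands in for the GPU's parallel pipelines.

```python
from concurrent.futures import ThreadPoolExecutor


def shade(pixel):
    """Hypothetical per-element kernel: brighten an (r, g, b) pixel by 20%
    and clamp each channel to 255. It reads only its own input element."""
    return tuple(min(255, channel * 6 // 5) for channel in pixel)


def run_stream(kernel, stream, lanes=4):
    """Apply `kernel` to every element of `stream` using `lanes` workers.

    Elements are independent, so they can be processed in any order;
    pool.map still returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        return list(pool.map(kernel, stream))


if __name__ == "__main__":
    pixels = [(100, 200, 250), (0, 10, 20)]
    print(run_stream(shade, pixels))  # [(120, 240, 255), (0, 12, 24)]
```

The point of the sketch is the shape of the workload: a fixed kernel, a long stream of inputs, and no dependencies between elements.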

Don't underestimate the importance of this differentiation; it's why I put Figure 2a (link is to a PDF), the die shot of the dual-core Pentium 4 CPU, in the article. See all those sections of the CPU that look like regularly repeating farmers' fields? Most of that's cache. Another important memory-related differentiation between CPUs and GPUs is that whereas in the CPU case there's a complex multi-DIMM link between the processor and main memory (in the Intel case, also including an intermediary DRAM controller in the core logic chipset's 'north bridge'), GPUs have a much simpler (and therefore faster and wider) point-to-point processor-to-memory hookup. ATI's recently introduced X1800XT GPU, for example, touts a 256-bit-wide frame buffer interface running at 750 MHz to double-data-rate GDDR memory (translating to 1.5 Gbps of per-pin peak bandwidth).
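As a quick back-of-the-envelope check on the X1800XT figures above, the following Python sketch works out the per-pin rate and the aggregate peak bandwidth they imply. The function name `peak_bandwidth` and its parameters are my own for illustration; the result is a theoretical peak that assumes two transfers per clock on every pin, not a measured throughput.

```python
def peak_bandwidth(bus_width_bits, clock_mhz, transfers_per_clock=2):
    """Return (per-pin rate in Gbps, aggregate peak in GB/s).

    transfers_per_clock=2 models double-data-rate signaling (an assumption
    matching the DDR interface described in the text).
    """
    per_pin_gbps = clock_mhz * transfers_per_clock / 1000.0
    total_gbytes_per_s = per_pin_gbps * bus_width_bits / 8.0
    return per_pin_gbps, total_gbytes_per_s


if __name__ == "__main__":
    per_pin, total = peak_bandwidth(bus_width_bits=256, clock_mhz=750)
    print(f"{per_pin} Gbps per pin, {total} GB/s peak")
    # 1.5 Gbps per pin, 48.0 GB/s peak
```

In other words, 750 MHz at double data rate gives the 1.5 Gbps per pin quoted above, and 256 such pins add up to a peak of 48 GB/s.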

Before you get too excited, though, realize that the 'G' in GPU stands for 'graphics'; these devices will for the foreseeable future remain graphics- and otherwise imaging-optimized devices (Nvidia's Chief Scientist David Kirk was quite adamant on this point when I asked him about it at the conclusion of his Hot Chips keynote address), and that fact will limit their general-purpose applicability. To that point, after you check out the IEEE Computer and IEEE Computer Graphics and Applications article links at this blog post, make sure you read through Stanford's paper published at the Eurographics Graphics Hardware 2004 conference. The inability of GPUs to do rapid, efficient random accesses to/from texture memory (a feature that's not generally needed in graphics applications), coupled with a deficit of on-chip cache memory that would otherwise hide the long access latency, is just one example of what I'm talking about here.

GPGPU (quick aside: the GPGPU site is an outstanding resource that I highly recommend as a jumping-off point for anyone interested in continued research on this topic) reminds me a lot of the C-on-FPGA (aka reconfigurable computing) experimentation that I spent my first eight years at EDN following. In both cases, there's tremendous potential for performance improvement, and for cost and energy consumption savings. In both cases, though, there's a tremendous 'paradigm shift' (I hate that term, but nothing else comes to mind at the moment) that'll have to occur for the potential to be fully realized, and until then there's tremendous difficulty in force-fitting software originally architected to run on one silicon platform (a CPU) to instead run on a different platform (as software on the GPU, or as logic gates on an FPGA). As a result, in both cases, the development activities today remain predominantly the bailiwick of academia, although if you re-read David Kirk's Hot Chips comments at the beginning of the print article, it's clear that sooner or later the GPU vendors expect (or, said another way, are dependent upon) the concept to go mainstream. And to that point, one key difference between FPGAs and GPUs is that the latter technology has 'crossed the chasm'; i.e., shader-based GPUs, poised and ready to leverage beyond-graphics applications, are now pervasive in PCs.

Continued with 'Instigating a Platform Tug Of War: General-Purpose GPUs, Alternatives and Another Perspective'....


Reader Comments



at 4/12/2006 1:48:15 PM, SIDDHARTH DOSHI said:
The ability of a CPU to perform at least two tasks or threads simultaneously is multitasking, which is also coordinated by the OS's scheduler, which must support pre-emptive multitasking (a task can be performed on a PC before finishing a job ALREADY AT HAND, e.g. opening the Start menu before finishing listening to a song on Winamp).

The technology underlying this is obviously an ideal division of the processor's clock speed (time) among the other tasks that are running as well.

I realise today (April 13, 2006, 1:34 a.m. IST), foolishly, that a process needs only a part of the clock speed, whose magnitude depends on its load on the CPU; the rest goes wasted. But when multitasking we see our system hanging every now and then. This may be due to a lack of hardware resources (RAM amount/CPU speed).

Now, to the point, finally: is a GPU faster than a CPU?

Why?

1. Clock speeds of the core
2. Memory speeds of GPU > RAM/CPU cache?
3. Memory bus width > CPU's FSB
4. Parallel pipes of GPU > CPU

Answer: 3 and 4

Today's CPUs move 64 bits of data at an instant; a GPU's memory moves 256 in the same time.
A CPU's FSB is 1066 MHz < GPU max. memory frequency = 1500 MHz

An Nvidia GeForce 7800 GTX GPU has 24 pixel pipes [= 24 simultaneous tasks can be run on a PC at a time] to process polygons while gaming. A D840EE Pentium has the 'ability' to process 4 threads at a time = the OS scheduler can send 4 simultaneous threads [corresponding to 2 cores, each of which has a virtual extra one too = 4 cores] at a time. This DOES NOT mean you can run only 4 tasks at a time.

Physics: more work is done by the GPU than the CPU => the GPU is a more powerful machine than the CPU.

Next come core speeds: a 7800 GTX GPU [G70] is 500 MHz max., and a Pentium maxes out at 3.8 GHz. So why is the Pentium slower?

I have no answer to this, except that CPU clock speed alone doesn't give a responsive PC. You also need RAM etc. with the CPU, which may be higher in the GPU.

Now, taking the pipelines point into consideration: they are elements of difference between the CPU and GPU, due to which the GPU has its current super-parallel performance. The pipes are made of transistors; the more transistors there are, the more pipes.

But here's the catch: if I used a 24-core CPU, it would need a very big die/socket and a mainboard to fit in. The bigger the die, the greater the chances of data corruption due to electricity, noise and magnetism around it.

So we first reduce the size of each core => a net reduction in size and an increase in clock speeds through the use of smaller but more numerous transistors [and thus higher yielding (I got this today, April 13, 2006) = offering more overall performance in the real world]. The problem is that these transistors will have a net increase in size for 2 CPUs of the same size = a dual-core CPU. So we again have to decrease the size of the CPU die/transistors in a dual-core CPU [Note: the smaller the die and the more transistors on it = less current and heat generation per transistor relative to the previous size = less heat = a cool CPU]. This, though helpful, is not the end of the problem, as reduction is only possible up to a limit AS OF NOW. Back to the same-size case: if you do so, you lose out on clock speeds. This is the reason why a GPU's core clock speed is low.

Just try working on a 400 MHz CPU for a change and spot the difference.

Therefore, in addition to low GPU memory caches and its point-to-point architecture, this may be the reason why we can't replace the CPU with a GPU TODAY.

This hard work was possible due to Digit and CHIP [technology magazines], the Net [for the role of buses], and Anandtech.com [transistor theory].
