A coworker and I were debating what the likes of Intel, IBM and AMD do differently that allows them to design circuits at 3GHz+. In contrast, FPGAs for the most part run on a similar process node (e.g. 65 or 40nm), but even their major static blocks (e.g. DSP blocks) are only capable of around 500MHz. Also compare the fastest ARM chips, graphics chips, most ASICs and other chips, which may get up to 1.5GHz but rarely faster (yes, faster chips do exist, but they are the exception rather than the rule).

So we had some theories about the cause of the difference:
- Intel/IBM are way ahead in their technology development over the likes of TSMC and UMC. Doesn't AMD use UMC?
- The 3.5GHz logic (i.e. the execution unit pipeline) in an Intel CPU doesn't actually run at 3.5GHz. There is a 3.5GHz clock, but it turns into a mess of clock enables, with logic effectively running at a much slower rate. Effective 3GHz performance is still achieved through parallelism.
- The difference is dynamic logic/domino logic/etc. Most common logic designs (ASICs, FPGAs, ARM processors) use static logic: a mess of conventional CMOS gates separated by flops. High-performance chips use dynamic logic, lots of latches and similar tricks to avoid the overhead of static logic. This idea may not stand up to scrutiny, as I understand that the latest Intel architecture (Nehalem) is fully static.
- The designers of ASICs/GPUs/FPGAs knowingly trade lower speed for lower power consumption. That is, you could get a 3.5GHz ARM processor, but it'd be 100W.

Anyone have any ideas or knowledge to clarify the issue? Why can Intel, AMD, and IBM create 3-4GHz chips, when most other chips seem to be limited to somewhere between 500MHz and 1.5GHz?

Chris
OT: Fast Circuits
Started by Chris Maryan, January 7, 2011
Reply on January 7, 2011
Chris Maryan <kmaryan@gmail.com> wrote:
>A coworker and I were debating what do the likes of Intel, IBM and AMD do differently that allows them to design circuits at 3GHz+. (snip)
>
>So we had some theories about the cause of the difference:
>- Intel/IBM are way ahead in their technology development over the likes of TSMC and UMC. Doesn't AMD use UMC?

Just compare the power consumption and there is your answer.

--
Failure does not prove something is impossible, failure simply indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
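Nico's answer, and the "3.5GHz ARM at 100W" theory in the original post, both rest on the standard CMOS switching-power relation P = α·C·V²·f. Because a higher clock usually also needs a higher supply voltage, power climbs much faster than frequency. A minimal sketch, where the activity factor, switched capacitance and both operating points are hypothetical values, not figures for any real chip (leakage power is ignored):

```python
def dynamic_power(alpha, c_farads, vdd, f_hz):
    """CMOS switching power in watts: P = alpha * C * Vdd^2 * f (leakage ignored)."""
    return alpha * c_farads * vdd ** 2 * f_hz

# Hypothetical design parameters, for illustration only.
ALPHA = 0.1      # fraction of the capacitance switched each cycle
C_SW = 100e-9    # total switched capacitance: 100 nF

# Same design at two hypothetical voltage/frequency operating points.
slow = dynamic_power(ALPHA, C_SW, vdd=0.9, f_hz=1.0e9)
fast = dynamic_power(ALPHA, C_SW, vdd=1.3, f_hz=3.5e9)

print(f"1.0 GHz at 0.9 V: {slow:.1f} W")
print(f"3.5 GHz at 1.3 V: {fast:.1f} W")
print(f"{fast / slow:.1f}x the power for 3.5x the clock")
```

With these made-up numbers the 3.5x clock costs roughly 7x the power, because the V² term grows along with f.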
Reply on January 7, 2011
On 01/07/2011 11:49 AM, Chris Maryan wrote:
> A coworker and I were debating what do the likes of Intel, IBM and AMD do differently that allows them to design circuits at 3GHz+. (snip)

Well, one of the differences is that CPUs are predetermined logic. They only have one "configuration" to get timing closure on, not an infinitely variable number of possibilities. I think that makes a HUGE difference. When they complete the design of some particular functional block, they can know EVERYTHING about it, such as setup and hold times, clock loading, clock skew within the module, etc. With an FPGA, there are a number of variables that add a large "fuzz factor" to the timing margins and make it a lot harder to operate every FF at the maximum rate.

FPGAs are designed to WORK correctly, but are clearly not completely optimized for speed. If you want max speed, you may need a custom part. Because the CPU has only one configuration, they can optimize the speed to the utmost.

This only explains part of the difference, of course.

Jon
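The quantities Jon lists (setup and hold times, clock skew, margins) are exactly what a static timing check adds up. Below is a minimal sketch of register-to-register setup and hold slack, with a single hypothetical `uncertainty` term standing in for his "fuzz factor". The class name and every delay value are invented for illustration; in a real FPGA the combinational delay would also be larger, but here it is held equal so only the margin differs.

```python
from dataclasses import dataclass

@dataclass
class FlopPath:
    """Hypothetical flop-to-flop path; all times in picoseconds."""
    t_clk_to_q: float   # launch flop clock-to-Q
    t_comb: float       # logic plus wiring between the flops
    t_setup: float      # capture flop setup requirement
    t_hold: float       # capture flop hold requirement
    skew: float         # capture clock arrival minus launch clock arrival
    uncertainty: float  # lumped margin for jitter/variation ("fuzz factor")

def setup_slack(p: FlopPath, period_ps: float) -> float:
    # Data must arrive before the next capture edge, minus setup and margin.
    arrival = p.t_clk_to_q + p.t_comb
    required = period_ps + p.skew - p.t_setup - p.uncertainty
    return required - arrival

def hold_slack(p: FlopPath) -> float:
    # New data must not reach the capture flop before its hold window closes.
    arrival = p.t_clk_to_q + p.t_comb
    required = p.skew + p.t_hold + p.uncertainty
    return arrival - required

PERIOD_PS = 1e12 / 3.5e9  # one 3.5 GHz cycle, about 285.7 ps

# Same logic, different margin: fully characterized block vs. programmable fabric.
custom = FlopPath(t_clk_to_q=40, t_comb=200, t_setup=20, t_hold=10, skew=0, uncertainty=20)
fabric = FlopPath(t_clk_to_q=40, t_comb=200, t_setup=20, t_hold=10, skew=0, uncertainty=120)

for name, p in (("custom", custom), ("fabric", fabric)):
    print(f"{name}: setup slack {setup_slack(p, PERIOD_PS):+.1f} ps, "
          f"hold slack {hold_slack(p):+.1f} ps")
```

The only difference between the two paths is the margin, yet that alone turns a passing 3.5 GHz path into a failing one.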
Reply on January 7, 2011
On 7 Jan., 18:49, Chris Maryan <kmar...@gmail.com> wrote:
> A coworker and I were debating what do the likes of Intel, IBM and AMD do differently that allows them to design circuits at 3GHz+. (snip)
>
> Anyone have any ideas or knowledge to clarify the issue? Why can Intel, AMD, and IBM create 3-4GHz chips, when most other chips seem to be limited to somewhere between 500MHz-1.5GHz.
>
> Chris

The same question came to my mind a few days ago...
Surely they really are running at essentially the stated clock rate. And of course the difference from an FPGA is clear (DSP blocks running at 2GHz would simply make no sense when the logic fabric is not fast enough). But why is Intel faster than, e.g., ARM in terms of maximum clock rate? Better RTL design (i.e. fewer gates between the flip-flops)? Better process technology? Better use of dynamic logic? Preferring speed over power in the process? Something else? My guess is that it is a little bit of all of them...

I have crossposted this to comp.arch, as we may get a better answer there.

Thomas
Reply on January 7, 2011
On Jan 7, 6:03 pm, Thomas Entner <thomas.entne...@gmail.com> wrote:
> On 7 Jan., 18:49, Chris Maryan <kmar...@gmail.com> wrote:
> > A coworker and I were debating what do the likes of Intel, IBM and AMD do differently that allows them to design circuits at 3GHz+. (snip)
>
> The same question came to my mind a few days ago... (snip) But why is
> Intel faster than e.g. ARM in terms of maximum clock rate? (snip)
>
> Thomas

Whip out your Virtex-6 datasheet to find some answers. In the FPGA, the logic cells (slices, whatever you want to call them) are pretty fast. Internal timing values may be under 100 ps for clock-to-Q or setup. All of this is completely swamped by the multi-nanosecond routing delays in the same architecture. It's a bit like a miniature version of a PC board full of tiny ASICs: every time you leave an ASIC, you get hit with big I/O buffer delays and board routing delays.

Clearly the same process can do quite well timing-wise when you insert "hard" blocks like PowerPC processors or PCIe endpoint blocks. Altera is touting 25 Gb/s SERDES on their latest process. So really the big culprit is the fabric interconnect. You pay a big price for programmability, and a bigger price for finer-grain programmability.

--
Gabor
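Gabor's argument is about where the time goes on a register-to-register path. The sketch below splits one path into cell delays and routing delays and computes the resulting maximum clock. The picosecond values are hypothetical placeholders chosen to echo his "sub-100 ps cells, multi-nanosecond routing" description; they are not Virtex-6 datasheet numbers, and the function name is invented for illustration.

```python
def fmax_mhz(t_clk_to_q, cell_delays, route_delays, t_setup):
    """Maximum clock in MHz for one flop-to-flop path; all delays in picoseconds."""
    period_ps = t_clk_to_q + sum(cell_delays) + sum(route_delays) + t_setup
    return 1e6 / period_ps  # 1 / (period_ps * 1e-12) Hz, expressed in MHz

# Hypothetical path: three logic levels, each followed by one routing hop.
T_CQ, T_SU = 90, 60
cells = [80, 80, 80]
fabric_routes = [600, 600, 600]   # programmable interconnect hops
hardwired_routes = [30, 30, 30]   # short fixed wiring, as inside a hard block

fabric = fmax_mhz(T_CQ, cells, fabric_routes, T_SU)
hard = fmax_mhz(T_CQ, cells, hardwired_routes, T_SU)
routing_share = sum(fabric_routes) / (T_CQ + sum(cells) + sum(fabric_routes) + T_SU)

print(f"fabric path: {fabric:.0f} MHz, routing is {routing_share:.0%} of the period")
print(f"same cells, hardwired: {hard:.0f} MHz")
```

The logic cells are identical in both cases; only the interconnect changes, which is the "price for programmability" he describes.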
Reply on January 7, 2011
In comp.arch.fpga Thomas Entner <thomas.entner99@gmail.com> wrote:
(snip)
> The same question came to my mind a few days ago... For sure they are
> really running basically on the mentioned clock-rate. And of course
> the difference to an FPGA is clear (DSP blocks with 2GHz would simply
> make no sense when the logic fabric is not fast enough).

The FPGA routing fabric is slower than direct wiring, and that comes through to the final speed. But if you do things in parallel, you can get enough done in a given time.

> But why is Intel faster than e.g. ARM in terms of maximum clock rate?

Clock rate is not a good measure of processor speed. You also have to see how much gets done each clock cycle. For a pipelined design, the clock rate is determined by the logic between pipeline registers, and faster is usually better. The tradeoffs are not easy, though, and sometimes the slower clock gets more done.

> Better RTL-
> design (i.e. fewer gates between the flip-flops)? Better process
> technology? Better use of dynamic logic? Prefering speed over power in
> the process? Something else? My guess is that it is a little bit of
> all...

> I have crossposted this to comp.arch, as we may get there a better
> answer.

--
glen
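glen's two points, that the clock is set by the logic between pipeline registers and that throughput is work per cycle times cycles per second, fit in a few lines. This is a simplified model under stated assumptions: a fixed amount of logic split evenly across N stages, a fixed per-stage register overhead, and made-up IPC figures. Real pipelines never balance perfectly, and deeper pipelines usually lose IPC to hazards and branch mispredictions.

```python
def cycle_time_ps(total_logic_ps, stages, reg_overhead_ps):
    """Clock period when the logic is split evenly over `stages` pipeline stages."""
    return total_logic_ps / stages + reg_overhead_ps

def instructions_per_second(ipc, cycle_ps):
    """Throughput = instructions per cycle * clock frequency."""
    return ipc * 1e12 / cycle_ps

LOGIC_PS = 4000.0    # hypothetical total logic delay for one instruction's work
OVERHEAD_PS = 50.0   # hypothetical flop setup + clock-to-Q + skew per stage

shallow = cycle_time_ps(LOGIC_PS, 10, OVERHEAD_PS)  # 450 ps, about 2.2 GHz
deep = cycle_time_ps(LOGIC_PS, 20, OVERHEAD_PS)     # 250 ps, 4.0 GHz

# Hypothetical IPCs: the deeper pipeline pays more for stalls and flushes.
perf_shallow = instructions_per_second(ipc=1.5, cycle_ps=shallow)
perf_deep = instructions_per_second(ipc=0.8, cycle_ps=deep)

print(f"10 stages: {1e3 / shallow:.2f} GHz, {perf_shallow / 1e9:.2f} G instr/s")
print(f"20 stages: {1e3 / deep:.2f} GHz, {perf_deep / 1e9:.2f} G instr/s")
```

With these particular numbers the slower-clocked, shallower pipeline gets slightly more done, which is the case glen warns about; different IPC assumptions can easily flip the result.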
Reply on January 7, 2011
> The FPGA routing fabric is slower than direct wiring, and that
> comes through to the final speed. But if you do things in parallel,
> you can get enough done in a given time.

As the OP wrote, the question is a little bit off-topic for comp.arch.fpga: I think it is clear to all that, in the same process, an ASIC will always be faster than an FPGA, for various reasons.

> > But why is Intel faster than e.g. ARM in terms of maximum clock rate?
>
> Clock rate is not a good measure of processor speed. You have to
> also see how much gets done each clock cycle. For a pipelined
> design, clock rate is determined by the logic between pipeline
> registers, and faster is usually better. The tradeoffs are not
> easy, though, and sometimes the slower clock gets more done.

But to my knowledge, modern x86 CPUs, with all their out-of-order stuff, etc., are still more complex than the latest ARM CPUs. Still, they achieve higher clock rates...

Thomas
Reply on January 7, 2011
On 1/7/2011 3:03 PM, Thomas Entner wrote:
> On 7 Jan., 18:49, Chris Maryan<kmar...@gmail.com> wrote:
>> A coworker and I were debating what do the likes of Intel, IBM and AMD do differently that allows them to design circuits at 3GHz+. (snip)
>
> The same question came to my mind a few days ago... (snip) But why is
> Intel faster than e.g. ARM in terms of maximum clock rate? (snip)

The main difference: full custom VLSI is faster than ASIC cell-based design, which is faster than FPGA design.

I am often amazed the other way, at how fast FPGAs are: every time I look at FPGAs from first principles, I see them as 10X to 16X slower than full custom design. Yet they are actually closer than that. Ditto ASIC cell-based design.

Now, Intel and AMD both do cell-based design, but not necessarily everywhere, and/or they are quite willing to rework the cell library in critical areas. You can often see the difference on a die photo: the stack of boxes and then routing that is typical of cell-based design, versus the really dense datapaths typical of full custom.

Other differences: Intel's big design teams. There's a lot more manual work at Intel than at many other places. Given Intel's manufacturing volumes, it's worth spending a lot of money to make the chip 10% smaller, since that translates almost directly to profit.

Other points from the original poster and crossposter:

>> - Intel/IBM are way ahead in their technology development over the likes of TSMC and UMC. Doesn't AMD use UMC?

Intel fabs are often ahead of TSMC, UMC and GF. But this doesn't explain the whole difference, not by a long shot, especially given AMD's situation.

By the way, an interesting conversation is why IBM is above 5GHz, whereas Intel is not.

>> - The 3.5GHz logic (i.e. the execution unit pipeline) in an Intel CPU doesn't actually run at 3.5GHz. There is a 3.5G clock, but it turns into a mess clock enables and logic effectively running at a much slower rate. Though effective 3GHz performance is still achieved through parallelism.

While this trick should certainly be in every designer's toolbox, it is not true overall. Large parts of the CPUs run at the above-3GHz frequency. (Way back in Willamette, the important parts of the chip ran at 2X the published frequency. Not so much any more.)

>> - The difference is dynamic logic/domino logic/etc. Most common logic designs (ASICs, FPGAs, ARM processors) use static logic - a mess of conventional CMOS gates separated by flops. High performance chips use dynamic logic, lots of latches and similar tricks to avoid the overhead of static logic. This idea may not stand up to scrutiny as I understand that the latest Intel architectures (Nehalem) are fully static.

Again, while domino, etc., are an option for every project, many recent Intel projects have been fully static. E.g. googling "intel nehalem static cmos":

    IDF: Inside Nehalem - HotHardware
    Aug 22, 2008 ... Another way Intel managed to keep the power
    requirements for Nehalem relatively low (130 watts TDP) was by using
    static CMOS for all of the ...
    hothardware.com/Articles/IDF-Inside-Nehalem/?page=3

    [PDF] Intel and Core i7 (Nehalem) Dynamic Power Management
    ... hungry. To save power, Intel circuit designers decided to switch
    from domino logic to static CMOS based logic circuits when
    implementing Nehalem. ...
    cs466.andersonje.com/public/pm.pdf

(I haven't found similarly clear statements for Intel Sandy Bridge, Atom, or AMD. But I haven't bothered to look at ISSCC papers. Yet.)
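Andy's point about spending engineering money to shrink the die is easiest to see in dies-per-wafer arithmetic. The sketch uses the common first-order approximation DPW ≈ π(d/2)²/A − πd/√(2A) for a wafer of diameter d and die area A, which ignores scribe lines and edge exclusion. The die areas are hypothetical, and yield (which usually also improves as the die shrinks) is left out.

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """First-order gross dies per wafer; ignores scribe lines and edge exclusion."""
    d, a = wafer_diameter_mm, die_area_mm2
    return math.floor(math.pi * (d / 2) ** 2 / a - math.pi * d / math.sqrt(2 * a))

base = dies_per_wafer(300, 200.0)    # hypothetical 200 mm^2 die on a 300 mm wafer
shrunk = dies_per_wafer(300, 180.0)  # the same design made 10% smaller
gain = shrunk / base - 1

print(f"{base} -> {shrunk} dies per wafer ({gain:.1%} more)")
```

Because fewer partial dies are lost at the wafer edge, a 10% area cut yields a little more than 10% extra dies per wafer, before any yield improvement.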
Reply on January 8, 2011
On Jan 7, 8:38 pm, "Andy \"Krazy\" Glew" <a...@SPAM.comp-arch.net> wrote:
> Intel fabs are often ahead of TSMC and UMC and GF. But this doesn't
> explain the whole difference, not by a long shot, especially given AMD's
> situation.

When I was at AMD, and AMD still owned the Dresden fabs, we would look at the process technologies from (say) TSMC,... and find that if we dumped our chips in that fab they would run about 1/2 as fast. So there is about a factor of 2X in the fab technology.

> By the way, an interesting conversation is why IBM is above 5GHz,
> whereas Intel is not.

The market limits Intel to a 100-watt air-cooled envelope; IBM is not so limited.

> >> - The difference is dynamic logic/domino logic/etc. (snip) This idea may not stand up to scrutiny as I
> understand that the latest Intel architectures (Nehalem) are fully static.

Note: except for RAMs and ROMs, there is almost no dynamic logic in one of the x86 manufacturers' products. Dynamic logic is hard and takes a lot more designers to get right (some of them in the fab). Dynamic logic is sensitive to process window swings. In many cases, dynamic logic is not really faster once you consider not being able to use the logic in the other 1/2 of the clock cycle and the added skew on the falling edge of the clock.

> E.g. googling "intel nehalem static cmos"
> (snip)
> hungry. To save power, Intel circuit designers decided to switch
> from domino logic to static CMOS based logic circuits when
> implementing Nehalem. ...

A good decision, given that the power envelope does not let the speed of dynamic logic be utilized to its fullest.

Mitch
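For readers who haven't worked with it, here is a cycle-level behavioral model (not a circuit simulation) of a domino AND gate. It shows two properties behind Mitch's remarks: the output is only meaningful in the evaluate half of the clock (during precharge it is forced low), and once the dynamic node discharges it cannot recover until the next precharge, so inputs must be glitch-free, i.e. only rise, during evaluation. The class name and interface are invented for illustration.

```python
class DominoAnd:
    """Behavioral model of a footed domino AND gate (dynamic node + output inverter).

    clk == 0: precharge. The dynamic node is pulled high, so the output is low.
    clk == 1: evaluate. If all inputs are high, the node discharges and the
    output goes high. A discharged node stays low until the next precharge.
    """

    def __init__(self):
        self.node = 1  # dynamic node, as a logic level

    def step(self, clk, inputs):
        if clk == 0:
            self.node = 1      # precharge device conducts
        elif all(inputs):
            self.node = 0      # series pull-down network conducts
        # otherwise the node floats and simply holds its charge
        return 1 - self.node   # static inverter drives the output

g = DominoAnd()
pre = g.step(0, [1, 1])        # precharge: output low regardless of inputs
clean = g.step(1, [1, 1])      # evaluate with stable inputs: correct AND

g.step(0, [0, 0])              # next cycle's precharge
g.step(1, [1, 1])              # input B glitches high early in evaluate...
settled = g.step(1, [1, 0])    # ...then settles low, but the node already discharged
static = int(all([1, 0]))      # what a static AND gate would output

print(pre, clean, settled, static)
```

The glitch leaves the domino output stuck at 1 while the settled inputs call for 0, which is one reason dynamic logic needs extra design care and process margin.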
Reply on January 8, 2011
In comp.arch.fpga MitchAlsup <MitchAlsup@aol.com> wrote:
> On Jan 7, 8:38 pm, "Andy \"Krazy\" Glew" <a...@SPAM.comp-arch.net>
> wrote:
>> Intel fabs are often ahead of TSMC and UMC and GF. But this doesn't
>> explain the whole difference, not by a long shot, especially given AMD's
>> situation.

(snip)

> Note: excepting for RAMs and ROMs, there is almost no dynamic logic in
> one of the x86 manufactures products. Dynamic logic is hard and takes
> a lot more designers to get right (some of them in the FAB.) Dynamic
> logic is sensitive to the process window swings. In many cases,
> dynamic logic is not really faster once you consider not being able to
> use the logic in the other 1/2 of the clock cycle and the added skew
> on the falling edge of the clock.

The 8080 and 8086 used dynamic logic (possibly only for registers). One reason the Z80 became more popular than the 8080 was its use of static logic, and the ability to debug with a slow clock.

Processors with a built-in PLL can't be slow-clocked, even if the logic is static.

--
glen
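glen's point about debugging with a slow clock comes down to charge leakage: a dynamic storage node that the clock does not refresh decays, which puts a floor under the clock frequency. The sketch assumes a simple exponential decay V(t) = V0·e^(−t/τ), a node that must be refreshed once per clock period, and invented values for the supply, logic threshold and leakage time constant. Real leakage varies strongly with temperature and process.

```python
import math

def max_hold_time_s(v_initial, v_threshold, tau_s):
    """Time until V(t) = V0 * exp(-t / tau) decays to v_threshold."""
    return tau_s * math.log(v_initial / v_threshold)

def min_clock_hz(hold_time_s):
    """Assumes the dynamic node is refreshed once per clock period."""
    return 1.0 / hold_time_s

# Hypothetical values: 5 V node, a logic '1' lost below 2.4 V, 1 ms time constant.
t_hold = max_hold_time_s(5.0, 2.4, 1e-3)
f_min = min_clock_hz(t_hold)

print(f"node holds for {t_hold * 1e3:.2f} ms, so the clock must stay above {f_min:.0f} Hz")
```

A fully static design has no such floor and can be stopped or single-stepped, although, as glen notes, an on-chip PLL can impose a minimum of its own.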





