
Updated Stratix II Power Specs & Explanation

Started by Paul Leventis February 14, 2005
glen herrmannsfeldt wrote:
> Austin Lesea wrote:
> (big snip)
>
> > Is comparing our middle speed grade with
> > their fastest honest? Well, if that is the only thing they can get
> > their hands on, perhaps it is. Would it be better to compare their
> > slowest with our slowest? Who would be excited about that?
>
> I wonder what fraction of FPGA designs are not very speed
> sensitive. I used to wonder about that in TTL days, building
> digital clocks (60Hz for the fastest signals) out of 30MHz TTL
> chips.
>
> Many designs could easily only require one half or one tenth of
> what current FPGAs are capable of, and still be worth putting in
> an FPGA. For those, the slowest devices, especially with lower
> static power, might be very useful.
Something similar crossed my mind when we first talked with our local Xilinx sales rep several months ago about how the S2 compares to the V4. As has seemingly occurred with microprocessors, will there come a time when FPGA's are fast enough for all but a small number of fringe applications? Except for the gamers, you don't hear people talking about needing a faster CPU anymore - they are "fast enough."

Before the vendors jump all over me, I'm not saying that FPGA's have reached that point yet. And yes, I realize that there is still plenty of innovation that they can do on their side. And each one of those responses would miss my point.

We push our FPGA's pretty hard where I work, and in the past have always come up with a list of things that we want or need for the next generation device. But when Xilinx came around asking their thousand questions about what they can improve on over V4 for the next gen device, our suggestion list was very short.

I am most certainly not saying innovation is over when some engineers at a startup can't come up with really good ideas for next gen parts. I'm just saying that from where we sit, the FPGA's seem to be approaching "good enough." Not there yet, but approaching. I'm sure we'll want a 40 Gbps SERDES (with CDR) on every I/O pin someday, and we'll want to run several levels of logic at 1 GHz or more. But the wish list is pretty short compared to what it used to be - and I'm willing to wait a number of years for it to come true (whereas in the past, there's usually been something that we needed immediately and had to "design around" the current architecture).

Anyone else out there see this? Anyone seeing something that a V4 or S2 won't do fairly well, that you think someone might want or need in the next year or three?
> For the most part, I don't find this discussion very useful. > We all know about marketing departments, and having engineers > argue this doesn't cause me to look favorable on their company > or products.
While I agree with Glen that the quibbling about power was highly annoying, especially when everyone knew they were dealing with pre-release numbers and products that the vendors know are not final, after putting all the information through the FUD filter, I did come away with a better understanding of the issues involved, helped somewhat by the (probably rushed) update of the S2 power numbers.

Have fun,

Marc
Glen,

Well, here is something useful:  suppose I told you that with decreasing 
geometries, the models are getting both faster, leakier, AND slower, and 
  less leaky?

This brings up an interesting question:  what if the next product had two 
extra speed grades SLOWER than the slowest?  Perhaps leakage grades as well?

Basically this is the implication of designing with the ever 
increasingly small geometries:  some transistors are faster, but some 
will be slower, and the process control will be more difficult.

For all those who do not need the speed, one could offer lower cost 
parts, as well as offer four (or five) more speed grades at an 
increasing premium.

Sort of like, if you get lemons, make lemonade.

The issue right now is the sales force freaks out when they hear that 
the next generation is both FASTER, AND SLOWER (it can be both, as it 
turns out).

But I agree with you, that not everyone needs the fastest part.  A 
survey of system clock speeds was quite revealing:  big use at 33 MHz, 
66 MHz, 100 MHz, 155 MHz, with a decreasing tail past 200 MHz.  Funny 
thing, all these frequencies are "magic" and coincide with PCI, 
SONET/SDH, SDRAM, etc.  No magic at all?

Austin

glen herrmannsfeldt wrote:

> Austin Lesea wrote:
> (big snip)
>
>> The devil is in the details: is a static power reduction at 25C an
>> improvement? Yes, and No. Is comparing our middle speed grade with
>> their fastest honest? Well, if that is the only thing they can get
>> their hands on, perhaps it is. Would it be better to compare their
>> slowest with our slowest? Who would be excited about that?
>
> I wonder what fraction of FPGA designs are not very speed sensitive. I
> used to wonder about that in TTL days, building
> digital clocks (60Hz for the fastest signals) out of 30MHz TTL chips.
>
> Many designs could easily only require one half or one tenth of what
> current FPGAs are capable of, and still be worth putting in an FPGA.
> For those, the slowest devices, especially with lower static power,
> might be very useful.
>
> For the most part, I don't find this discussion very useful.
> We all know about marketing departments, and having engineers
> argue this doesn't cause me to look favorable on their company or products.
>
> -- glen
Vaughn,

Shell and pea game:  no, you do not get the entire benefit of reduced C.

Also, not all layer dielectrics are Lo-K.  For example, the clock tree 
is near the top, where regular dielectric is used, isn't it?

At least, we evaluated both with, and without Lo-K devices (from the 
same masks and fab), and were surprised to see only a 5% improvement.

Did you do the same experiment?  We were surprised.

Turns out, there is a lot more in the equations than just C.

If it was just that simple, extracted simulations in spice would be 
unneeded.

Austin

Vaughn Betz wrote:

> Thanks to John for a thoughtful posting. I enjoy reading this newsgroup and
> helping customers when I can. I also can't resist correcting errors /
> misrepresentations when I see them. I don't think name calling or hyperbole
> is enjoyable for either the people posting or the people reading posts, so I
> appreciate the effort to encourage civility.
>
> Hopefully correcting the error below will not cause a firestorm in response:
>
>>>It appears Low-K is a win for Altera.
>>
>>5% less C, means 5% less core power, and ~5% more speed over regular K.
>>All good.
>
> FSG dielectric (Virtex4) has a dielectric constant of 3.7. Black diamond
> (Stratix II) has a dielectric constant of about 3.0. See
> http://www.micromagazine.com/archive/04/03/applied.html for details on the
> dielectric constant of black diamond. That means you get a 19% capacitance
> reduction with black diamond vs. FSG. Everybody remembers capacitance is
> directly proportional to dielectric constant, right? :).
>
> So all metal capacitance drops by 19%. Nowadays metal capacitance is about
> 2/3 of the switching capacitance in an FPGA, while the remaining 1/3 is gate
> & diffusion capacitance that is unaffected by low-k (since it's transistor
> capacitance rather than in the metal stack). So, you get an ~13% speed up
> and ~13% dynamic power reduction from the use of a low-k dielectric.
>
> Vaughn Betz
> Altera
> [v b e t z (at) altera.com]
In article <1108532703.846896.232700@z14g2000cwz.googlegroups.com>,
Marc Randolph <mrand@my-deja.com> wrote:

>Anyone else out there see this? Anyone seeing something that a V4 or >S2 won't do fairly well, that you think someone might want or need in >the next year or three?
Yes, I do. Gigabit Copper has become this all purpose glue: a cheap way of connecting stuff together. Currently, it takes an external PHY or MAC/PHY: not a big deal on an expensive board with an expensive FPGA, but it's a big deal on a cheap board.

I'd love to see a Spartan/Cyclone FX, with multiple 10/100/1000-T MAC/PHYs as hardcores, from one on the smallest part to perhaps as many as eight.

I see a world which needs a ton of high speed, low cost programmable Gb network devices (mostly security applications, but who knows what else?)

Never happen, but an interesting thought.

-- 
Nicholas C. Weaver. to reply email to "nweaver" at the domain icsi.berkeley.edu
Nicholas Weaver wrote:
> In article <1108532703.846896.232700@z14g2000cwz.googlegroups.com>, > Marc Randolph <mrand@my-deja.com> wrote: > > >>Anyone else out there see this? Anyone seeing something that a V4 or >>S2 won't do fairly well, that you think someone might want or need in >>the next year or three? > > > Yes, I do. Gigabit Copper has become this all purpose glue: a cheap > way of connecting stuff together. Currently, it takes an external PHY > or MAC/PHY: not a big deal on an expensive board with an expensive > FPGA, but its a big deal on a cheap board. > > I'd love to see a Spartan/Cyclone FX, with multiple 10/100/1000-T > MAC/PHYs as hardcores, from one on the smallest part to perhaps as > many as eight. > > I see a world which needs a ton of high speed, low cost programmable > Gb network devices (mostly security applications, but who knows what > else?) > > Never happen, but an interesting thought.
This chip was interesting, as it includes SATA PHY, at 1.5GHz

http://www.oxsemi.com/press/feb05/index.html

so I am sure we will see the same in FPGAs. Longer range PHYs, probably are determined by power/voltage swings

-jg
On 15 Feb 2005 21:45:03 -0800, "Marc Randolph" <mrand@my-deja.com>
wrote:

>glen herrmannsfeldt wrote: >> Austin Lesea wrote: >> (big snip)
>> Many designs could easily only require on half or one tenth of >> what current FPGAs are capable of, and still be worth putting in >> an FPGA. For those, the slowest devices, especially with lower >> static power, might be very useful. > >Something similar crossed my mind when we first talked with our local >Xilinx sales rep several months ago about how the S2 compares to the >V4.
>Anyone else out there see this? Anyone seeing something that a V4 or >S2 won't do fairly well, that you think someone might want or need in >the next year or three?
I suspect that as hot-spots are tackled, new ones will appear. For example, multipliers used to take up a huge area.

I'm finding big multiplexers to be an issue, for a couple of reasons. Barrel shifters, and normalisation (which will become hugely important if the floating point synthesisable packages take off) for one, and the replacement of internal tri-states with mux logic for another.

They tend to be quite large, not very well structured (for floorplanning), and not particularly well pipelined, as they tend to synthesise to several levels of logic without using the slice FFs.

On a couple of recent designs I'd estimate 30 to 50% of the area has been multiplexers (and I didn't have the option of using multipliers as barrel shifters). MUXF5s help ... a little.

I believe something like a 4/8/16 to 1 MUX function 16(18?) bits in width (preferably also configurable as a barrel shifter) would be worthy of consideration as a next generation block function (or MegaFunction)

- Brian
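To make the mux cost in the post above concrete, here is a minimal behavioural sketch in Python (not HDL, and not anything Brian posted). It models a 16-bit barrel rotator as log2(width) stages of 2:1 muxes, counts the idealised 2:1 mux bits that structure needs, and shows the "multiplier as barrel shifter" trick he mentions, where a left shift by n is a multiply by 2**n. The function names and the 2:1-mux counting model are assumptions for illustration; real LUT and slice usage depends on the device and the synthesis tool.

```python
WIDTH = 16  # word width discussed in the post (16, possibly 18, bits)


def barrel_rotate_left(word: int, amount: int, width: int = WIDTH) -> int:
    """Rotate `word` left by `amount` using one 2:1 mux stage per select bit."""
    mask = (1 << width) - 1
    word &= mask
    stage = 0
    while (1 << stage) < width:
        step = 1 << stage
        # Each stage is a row of `width` 2:1 muxes: pass through, or rotate by `step`.
        if (amount >> stage) & 1:
            word = ((word << step) | (word >> (width - step))) & mask
        stage += 1
    return word


def mux2_count(width: int = WIDTH) -> int:
    """Idealised number of 2:1 mux bits: `width` per stage, ceil(log2(width)) stages."""
    stages = (width - 1).bit_length()
    return width * stages


def shift_left_by_multiply(word: int, amount: int, width: int = WIDTH) -> int:
    """Left shift implemented as a multiply by the one-hot value 2**amount."""
    mask = (1 << width) - 1
    return (word * (1 << amount)) & mask


if __name__ == "__main__":
    print(hex(barrel_rotate_left(0x1234, 4)))      # rotated word
    print(mux2_count(16))                          # 2:1 mux bits for one 16-bit rotator
    print(hex(shift_left_by_multiply(0x00FF, 4)))  # shift done by a multiplier
```

Under this counting model a single 16-bit rotator already needs 64 mux bits spread over four logic levels, which is consistent with the complaint that such structures are large and hard to pipeline.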
Brian Drummond wrote:
> On 15 Feb 2005 21:45:03 -0800, "Marc Randolph" <mrand@my-deja.com>
> wrote:
>
> >glen herrmannsfeldt wrote:
> >> Austin Lesea wrote:
> >> (big snip)
>
> >> Many designs could easily only require one half or one tenth of
> >> what current FPGAs are capable of, and still be worth putting in
> >> an FPGA. For those, the slowest devices, especially with lower
> >> static power, might be very useful.
> >
> >Something similar crossed my mind when we first talked with our local
> >Xilinx sales rep several months ago about how the S2 compares to the
> >V4.
>
> >Anyone else out there see this? Anyone seeing something that a V4 or
> >S2 won't do fairly well, that you think someone might want or need in
> >the next year or three?
>
> I suspect that as hot-spots are tackled, new ones will appear. For
> example, multipliers used to take up a huge area.
>
> I'm finding big multiplexers to be an issue, for a couple of reasons.
> Barrel shifters, and normalisation (which will become hugely important
> if the floating point synthesisable packages take off) for one, and the
> replacement of internal tri-states with mux logic for another.
>
> They tend to be quite large, not very well structured (for
> floorplanning), and not particularly well pipelined, as they tend to
> synthesise to several levels of logic without using the slice FFs.
>
> On a couple of recent designs I'd estimate 30 to 50% of the area has
> been multiplexers (and I didn't have the option of using multipliers as
> barrel shifters). MUXF5s help ... a little.
>
> I believe something like a 4/8/16 to 1 MUX function 16(18?) bits in
> width (preferably also configurable as a barrel shifter) would be worthy
> of consideration as a next generation block function (or MegaFunction)

Howdy Brian,

Not knowing your device size, I'm not sure how many muxes you have, but any way you think of it, 50% of an FPGA for muxes is quite a bit. But I think the V4 DSP block does most of what you've discussed (wide muxes, barrel shifters, and counters), doesn't it?

Have fun,

Marc
Nicholas Weaver wrote:
> In article <1108532703.846896.232700@z14g2000cwz.googlegroups.com>,
> Marc Randolph <mrand@my-deja.com> wrote:
>
> >Anyone else out there see this? Anyone seeing something that a V4 or
> >S2 won't do fairly well, that you think someone might want or need in
> >the next year or three?
>
> Yes, I do. Gigabit Copper has become this all purpose glue: a cheap
> way of connecting stuff together. Currently, it takes an external PHY
> or MAC/PHY: not a big deal on an expensive board with an expensive
> FPGA, but its a big deal on a cheap board.
>
> I'd love to see a Spartan/Cyclone FX, with multiple 10/100/1000-T
> MAC/PHYs as hardcores, from one on the smallest part to perhaps as
> many as eight.
>
> I see a world which needs a ton of high speed, low cost programmable
> Gb network devices (mostly security applications, but who knows what
> else?)
>
> Never happen, but an interesting thought.
Yes, it is - and I agree that there are tons of applications just begging for gigabit ethernet connectivity (PVR/DVR's or HDTV's, not to mention almost all "normal" communication or networking equipment). But the analog circuitry required to support the five level (!) signaling that gigabit copper uses seems like a little much to ask of a multi-hundred MHz digital device with tens of thousands of gates toggling all at the same time.

Even so, you gave me an idea... what if you could have the analog front end in a cheapish external device and use the internal DSP blocks to do the signal processing? Of course, to make it worthwhile, that cheapish device would need to be at least 3 or 4x lower cost than the $10/port gigabit copper phy's you can buy right now. As you said - unlikely that it will ever happen.

The MAC's are a completely different issue. They could do that easily right now, and in all devices. So I think you did hit on something there. There is no reason that every FPGA couldn't ship with 1 or 4 or 12 of them (or for a start, at least more than one family). When I first heard that the 4VFX20 devices were going to have 8 MGT's, I immediately started thinking of all the things I could do not only with those 8 MGT's, but with the 8 hard MAC's they would surely include. Only later did I discover they put only 2 MAC's in the FX20!?! The MAC's obviously aren't designed with networking/telecommunications equipment in mind or they would have included one for each MGT. Instead, they seem to have made it 2x the number of 405 processors.

Great idea though!

Have fun,

Marc
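For readers unfamiliar with the "five level" signaling mentioned above, the sketch below works through the line-rate arithmetic for 1000BASE-T. The constants are not from the post: they are the standard published parameters (four wire pairs, 125 Mbaud per pair, PAM-5, 8 data bits carried per four-pair symbol), and the function names are hypothetical. It is a back-of-the-envelope illustration, not a model of the PHY's DSP.

```python
import math

PAIRS = 4                    # 1000BASE-T drives all four pairs at once
SYMBOL_RATE_BAUD = 125e6     # symbols per second on each pair
LEVELS = 5                   # PAM-5: five voltage levels per symbol
DATA_BITS_PER_4D_SYMBOL = 8  # one byte per symbol period across the four pairs


def data_rate_bps() -> float:
    """Payload rate: one byte per symbol period."""
    return SYMBOL_RATE_BAUD * DATA_BITS_PER_4D_SYMBOL


def code_points() -> int:
    """Distinct four-pair symbols available with five levels per pair."""
    return LEVELS ** PAIRS


def raw_bits_per_pair_symbol() -> float:
    """Information-theoretic capacity of one five-level symbol."""
    return math.log2(LEVELS)


if __name__ == "__main__":
    print(data_rate_bps())             # payload bits per second
    print(code_points())               # available code points vs. 256 needed for a byte
    print(raw_bits_per_pair_symbol())  # bits per symbol per pair before coding
```

The gap between the available code points and the 256 needed for a byte is what the standard spends on coding redundancy and control symbols, and resolving five levels on each pair while cancelling echo and crosstalk is the analog and DSP burden the post is worried about.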
"Austin Lesea" <austin@xilinx.com> wrote in message 
news:cuvptt$baj6@cliff.xsj.xilinx.com...
> Vaugn, > > Shell and pea game: no, you do not get the entire benefit of reduced C.
The entire benefit would be 19% speed and dynamic power reduction. As I said, we get about 2/3 of that maximum benefit, since not all C is metal C, but most is.
> > Also, not all layer dielectrics are Lo-K. For example, the clock tree is > near the top, where regular dielectric is used, isn't it?
We use low-k to near the top of the metal stack. At the very top, where you're routing power and ground, you don't need (or even want it), since high capacitance on power and ground is beneficial (helps prevent ground bounce & vcc sag). The vast majority of the switching capacitance (clocks, routing, ALMs, MACs, etc.) is in metal surrounded by low-k.
> At least, we evaluated both with, and without Lo-K devices (from the same > masks and fab), and were surprised to see only a 5% improvement. > > Did you do the same experiment? We were surprised.
We simulated everything with and without low-K, and got the ~13% improvement I previously mentioned. I am also surprised you got only 5%. That is certainly well below mainstream for the industry -- if everyone were seeing such small gains, I doubt the fabs and semiconductor equipment vendors would be pumping billions into developing low-k (and next generation extra-low-k) dielectrics. Sounds like you may have used low-k for only a few metal layers, so perhaps that explains your disappointing experience.
> Turns out, there is a lot more in the equations that just C. > > If it was just that simple, extracted simulations in spice would be > unneeded.
This is backwards. As metal capacitance has become the dominant capacitance, extracting layouts to obtain all the metal parasitics before running SPICE has become essential to getting accurate answers. Go back enough process generations and this was less true -- you could write up your transistor-level schematic in a SPICE deck, simulate it with no thought of metal, and you wouldn't be that far off for most circuits, since transistor parasitics dominated. Now that metal dominates, you have to extract layouts to get the metal C or you get bad answers.

Vaughn Betz
Altera
[v b e t z (at) altera.com]
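The low-k arithmetic in this post can be checked with a few lines of Python. This is a minimal sketch that assumes, as the post does, that capacitance scales linearly with dielectric constant and that only a fraction of the switched capacitance sits in low-k dielectric. The dielectric constants and the 2/3 fraction are the figures quoted in the thread; the function names are hypothetical, and the last helper, which back-solves the fraction implied by a measured gain, is an added illustration rather than a claim made by either poster.

```python
K_FSG = 3.7            # FSG dielectric (Virtex-4), as quoted in the thread
K_BLACK_DIAMOND = 3.0  # Black Diamond low-k (Stratix II), as quoted in the thread


def metal_cap_reduction(k_old: float, k_new: float) -> float:
    """Fractional drop in capacitance, assuming C is proportional to k."""
    return 1.0 - k_new / k_old


def switching_cap_reduction(k_old: float, k_new: float, lowk_fraction: float) -> float:
    """Drop in total switched C when only `lowk_fraction` of it sees the new dielectric."""
    return lowk_fraction * metal_cap_reduction(k_old, k_new)


def implied_lowk_fraction(k_old: float, k_new: float, observed: float) -> float:
    """Fraction of switched C that must be in low-k to explain an observed gain."""
    return observed / metal_cap_reduction(k_old, k_new)


if __name__ == "__main__":
    full = metal_cap_reduction(K_FSG, K_BLACK_DIAMOND)
    part = switching_cap_reduction(K_FSG, K_BLACK_DIAMOND, 2 / 3)
    print(f"metal C reduction:      {full:.1%}")
    print(f"with 2/3 of C in metal: {part:.1%}")
    print(f"fraction implied by 5%: {implied_lowk_fraction(K_FSG, K_BLACK_DIAMOND, 0.05):.2f}")
```

Under this simple linear model the two vendors' numbers differ mainly in how much of the switched capacitance is assumed to benefit from the low-k layers, which is exactly the point being argued in the thread.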
Vaughn,

Well, you certainly have been fooled.

See below,

Austin

Vaughn Betz wrote:
> "Austin Lesea" <austin@xilinx.com> wrote in message > news:cuvptt$baj6@cliff.xsj.xilinx.com... > >>Vaugn, >> >>Shell and pea game: no, you do not get the entire benefit of reduced C. > > > The entire benefit would be 19% speed and dynamic power reduction. As I > said, we get about 2/3 of that maximum benefit, since not all C is metal C, > but most is. > >>Also, not all layer dielectrics are Lo-K. For example, the clock tree is >>near the top, where regular dielectric is used, isn't it? > > > We use low-k to near the top of the metal stack. At the very top, where > you're routing power and ground, you don't need (or even want it), since > high capacitance on power and ground is beneficial (helps prevent ground > bounce & vcc sag). The vast majority of the switching capacitance (clocks, > routing, ALMs, MACs, etc.) is in metal surrounded by low-k.
I doubt it. The dielectric above the transistors is regular undoped glass (SiO2). K = 4.3. Then comes the lo-K after M1. M1 through M5 is all they can do as lo-K; if they do more, it suffers major yield and reliability issues. Or maybe you haven't noticed the delamination yet?
> > >>At least, we evaluated both with, and without Lo-K devices (from the same >>masks and fab), and were surprised to see only a 5% improvement. >> >>Did you do the same experiment? We were surprised. > > > We simulated everything with and without low-K, and got the ~13% improvement
Nope. You did not. If you did, you would discover that the layer above the transistors and below metal 1, as well as the upper layers for clocks, etc. leads to less than expected improvements. I am pretty sure your ICDES folks just scaled everything. It would be a major project to develop, and QC spice models for both processes, and I seriously doubt anyone would bother.
> I previously mentioned. I am also surprised you got only 5%. That is
> certainly well below mainstream for the industry -- if everyone were seeing
> such small gains,

which they are.

> I doubt the fabs and semiconductor equipment vendors would
> be pumping billions into developing low-k (and next generation extra-low-k)
> dielectrics.

The only folks making money on this are the equipment suppliers. No one I know asked for it. Yes, it can be a major benefit to ASIC, uP, and perhaps memories. But, it just isn't doing anything for us. Now, we will get lo-K for free, as they have the equipment and process now, but guess what? We still do not see more than a 5% improvement from V4 without lo-K to V4 with lo-K. Wow, two generations and two sets of side by side lo-K and regular experiments. Ignorance I guess is bliss.

> Sounds like you may have used low-k for only a few metal
> layers, so perhaps that explains your disappointing experience.

Nope, as I described, the only layers allowed to be lo-K for lifetime delamination issues and quality are the ones above M1, and below M5. Any more than that, and we have seen problems with fab process qual (not on our parts, but their test structures).
> > >>Turns out, there is a lot more in the equations that just C. >> >>If it was just that simple, extracted simulations in spice would be >>unneeded. > > > This is backwards. As metal capacitance has become the dominant > capacitance, extracting layouts to obtain all the metal parasitics before > running SPICE has become essential to getting accurate answers. Go back > enough process generations and this was less true -- you could write up your > transistor-level schematic in a SPICE deck, simulate it with no thought of > metal, and you wouldn't be that far off for most circuits, since transistor > parasitics dominated. Now that metal dominates, you have to extract layouts > to get the metal C or you get bad answers.
I can see you really have no clue about where the wire models are going. How thick is the metal, how thick is the dielectric? How close are the wires? There is R there (and lots of it). There is C there, too. There is also side wall C (the sidewalls are regular FSG, or SiO2 -- no lo-K advantage).

Again, you go back and ask if they actually had foundry models for with, and without, and what the actual stack up was. One of the biggest overstatements we have seen recently is all of this nonsense about the superiority of lo-K. It's nice, don't get me wrong, but don't tout it as a miracle if you have never proven it is.

You don't know. We do. Take the time to do it right, or at least study it right. Get an ICDES wire model expert to talk to you about where the lo-K is, and isn't.
> > Vaughn Betz > Altera > [v b e t z (at) altera.com] > >