comp.arch.fpga | Updated Stratix II Power Specs & Explanation| page 3

Reply by Marc Randolph ● February 16, 2005

glen herrmannsfeldt wrote:
> Austin Lesea wrote:
> (big snip)
>
> > Is comparing our middle speed grade with
> > their fastest honest?  Well, if that is the only thing they can get
> > their hands on, perhaps it is.  Would it be better to compare their
> > slowest with our slowest?  Who would be excited about that?
>
> I wonder what fraction of FPGA designs are not very speed
> sensitive.  I used to wonder about that in TTL days, building
> digital clocks (60Hz for the fastest signals) out of 30MHz TTL
> chips.
>
> Many designs could easily only require one half or one tenth of
> what current FPGAs are capable of, and still be worth putting in
> an FPGA.  For those, the slowest devices, especially with lower
> static power, might be very useful.

Something similar crossed my mind when we first talked with our local
Xilinx sales rep several months ago about how the S2 compares to the
V4.

As has seemingly occurred with microprocessors, will there come a time
when FPGA's are fast enough for all but a small number of fringe
applications?  Except for the gamers, you don't hear people talking
about needing a faster CPU anymore - they are "fast enough."  Before
the vendors jump all over me, I'm not saying that FPGA's have reached
that point yet.  And yes I realize that there is still plenty of
innovation that they can do on their side.  And each one of those
responses would miss my point.

We push our FPGA's pretty hard where I work, and in the past, have
always come up with a list of things that we want or need for the next
generation device.  But when Xilinx came around asking their thousand
questions about what they can improve on over V4 for the next gen
device, our suggestion list was very short.  I am most certainly not
saying innovation is over when some engineers at a startup can't come
up with really good ideas for next gen parts.  I'm just saying that
from where we sit, the FPGA's seem to be approaching "good enough."
Not there yet, but approaching.

I'm sure we'll want a 40 Gbps SERDES (with CDR) on every I/O pin
someday, and we'll want to run several levels of logic at 1 GHz or
more.  But the wish list is pretty short compared to what it used to be
- and I'm willing to wait a number of years for it to come true (whereas
in the past, there's usually been something that we needed
immediately and had to "design around" the current architecture).

Anyone else out there see this?  Anyone seeing something that a V4 or
S2 won't do fairly well, that you think someone might want or need in
the next year or three?

> For the most part, I don't find this discussion very useful.
> We all know about marketing departments, and having engineers
> argue this doesn't cause me to look favorable on their company
> or products.

While I agree with Glen that the quibbling about power was highly
annoying, especially when everyone knew they were dealing with
pre-release numbers and products that the vendors know are not final,
after putting all the information through the FUD filter, I did come
away with a better understanding of the issues involved, helped
somewhat by the (probably rushed) update of the S2 power numbers.

Have fun,

   Marc

Reply by Austin Lesea ● February 16, 2005

Glen,

Well, here is something useful:  suppose I told you that with decreasing 
geometries, the models are getting both faster and leakier, AND slower and 
less leaky?

This brings up an interesting question: what if the next product had two 
extra speed grades SLOWER than the slowest?  Perhaps leakage grades as well?

Basically this is the implication of designing with the ever 
increasingly small geometries:  some transistors are faster, but some 
will be slower, and the process control will be more difficult.

For all those who do not need the speed, one could offer lower cost 
parts, as well as offer four (or five) more speed grades at an 
increasing premium.

Sort of like, if you get lemons, make lemonade.

The issue right now is the sales force freaks out when they hear that 
the next generation is both FASTER, AND SLOWER (it can be both, as it 
turns out).

But I agree with you, that not everyone needs the fastest part.  A 
survey of system clock speeds was quite revealing:  big use at 33 MHz, 
66 MHz, 100 MHz, 155 MHz, with a decreasing tail past 200 MHz.  Funny 
thing, all these frequencies are "magic" and coincide with PCI, 
SONET/SDH, SDRAM, etc.  No magic at all?

Austin

glen herrmannsfeldt wrote:

> Austin Lesea wrote:
> (big snip)
> 
>> The devil is in the details: is a static power reduction at 25C an 
>> improvement?  Yes, and No.  Is comparing our middle speed grade with 
>> their fastest honest?  Well, if that is the only thing they can get 
>> their hands on, perhaps it is.  Would it be better to compare their 
>> slowest with our slowest?  Who would be excited about that?
> 
> 
> I wonder what fraction of FPGA designs are not very speed sensitive.  I 
> used to wonder about that in TTL days, building
> digital clocks (60Hz for the fastest signals) out of 30MHz TTL chips.
> 
> Many designs could easily only require one half or one tenth of what 
> current FPGAs are capable of, and still be worth putting in an FPGA.  
> For those, the slowest devices, especially with lower static power, 
> might be very useful.
> 
> For the most part, I don't find this discussion very useful.
> We all know about marketing departments, and having engineers
> argue this doesn't cause me to look favorable on their company or products.
> 
> -- glen
>

Reply by Austin Lesea ● February 16, 2005

Vaughn,

Shell and pea game:  no, you do not get the entire benefit of reduced C.

Also, not all layer dielectrics are Lo-K.  For example, the clock tree 
is near the top, where regular dielectric is used, isn't it?

At least, we evaluated both with, and without Lo-K devices (from the 
same masks and fab), and were surprised to see only a 5% improvement.

Did you do the same experiment?  We were surprised.

Turns out, there is a lot more in the equations than just C.

If it were just that simple, extracted simulations in SPICE would be 
unneeded.

Austin

Vaughn Betz wrote:

> Thanks to John for a thoughtful posting.  I enjoy reading this newsgroup and 
> helping customers when I can. I also can't resist correcting errors / 
> misrepresentations when I see them.  I don't think name calling or hyperbole 
> is enjoyable for either the people posting or the people reading posts, so I 
> appreciate the effort to encourage civility.
> 
> Hopefully correcting the error below will not cause a firestorm in response:
> 
> 
>>>It appears Low-K is a win for Altera.
>>
>>5% less C, means 5% less core power, and ~5% more speed over regular K. 
>>All good.
> 
> 
> FSG dielectric (Virtex4) has a dielectric constant of 3.7.  Black diamond 
> (Stratix II) has a dielectric constant of about 3.0.  See 
> http://www.micromagazine.com/archive/04/03/applied.html for details on the 
> dielectric constant of black diamond.  That means you get a 19% capacitance 
> reduction with black diamond vs. FSG.  Everybody remembers capacitance is 
> directly proportional to dielectric constant, right? :).
> 
> So all metal capacitance drops by 19%.  Nowadays metal capacitance is about 
> 2/3 of the switching capacitance in an FPGA, while the remaining 1/3 is gate 
> & diffusion capacitance that is unaffected by low-k (since it's transistor 
> capacitance rather than in the metal stack).  So,  you get an ~13% speed up 
> and ~13% dynamic power reduction from the use of a low-k dielectric.
> 
> Vaughn Betz
> Altera
> [v b e t z (at) altera.com]
> 
>

Reply by Nicholas Weaver ● February 16, 2005

In article <1108532703.846896.232700@z14g2000cwz.googlegroups.com>,
Marc Randolph <mrand@my-deja.com> wrote:

>Anyone else out there see this?  Anyone seeing something that a V4 or
>S2 won't do fairly well, that you think someone might want or need in
>the next year or three?

Yes, I do.  Gigabit Copper has become this all purpose glue: a cheap
way of connecting stuff together.  Currently, it takes an external PHY
or MAC/PHY: not a big deal on an expensive board with an expensive
FPGA, but it's a big deal on a cheap board.

I'd love to see a Spartan/Cyclone FX, with multiple 10/100/1000-T
MAC/PHYs as hardcores, from one on the smallest part to perhaps as
many as eight.

I see a world which needs a ton of high speed, low cost programmable
Gb network devices (mostly security applications, but who knows what
else?)

Never happen, but an interesting thought.
-- 
Nicholas C. Weaver.  to reply email to "nweaver" at the domain
icsi.berkeley.edu

Reply by Jim Granville ● February 16, 2005

Nicholas Weaver wrote:
> In article <1108532703.846896.232700@z14g2000cwz.googlegroups.com>,
> Marc Randolph <mrand@my-deja.com> wrote:
> 
> 
>>Anyone else out there see this?  Anyone seeing something that a V4 or
>>S2 won't do fairly well, that you think someone might want or need in
>>the next year or three?
> 
> 
> Yes, I do.  Gigabit Copper has become this all purpose glue: a cheap
> way of connecting stuff together.  Currently, it takes an external PHY
> or MAC/PHY: not a big deal on an expensive board with an expensive
> FPGA, but it's a big deal on a cheap board.
> 
> I'd love to see a Spartan/Cyclone FX, with multiple 10/100/1000-T
> MAC/PHYs as hardcores, from one on the smallest part to perhaps as
> many as eight.
> 
> I see a world which needs a ton of high speed, low cost programmable
> Gb network devices (mostly security applications, but who knows what
> else?)
> 
> Never happen, but an interesting thought.

This chip was interesting, as it includes a SATA PHY at 1.5 Gbps:
http://www.oxsemi.com/press/feb05/index.html
so I am sure we will see the same in FPGAs.
Longer-range PHYs are probably determined by power/voltage swings.

-jg

Reply by Brian Drummond ● February 16, 2005

On 15 Feb 2005 21:45:03 -0800, "Marc Randolph" <mrand@my-deja.com>
wrote:

>glen herrmannsfeldt wrote:
>> Austin Lesea wrote:
>> (big snip)

>> Many designs could easily only require one half or one tenth of
>> what current FPGAs are capable of, and still be worth putting in
>> an FPGA.  For those, the slowest devices, especially with lower
>> static power, might be very useful.
>
>Something similar crossed my mind when we first talked with our local
>Xilinx sales rep several months ago about how the S2 compares to the
>V4.

>Anyone else out there see this?  Anyone seeing something that a V4 or
>S2 won't do fairly well, that you think someone might want or need in
>the next year or three?

I suspect that as hot-spots are tackled, new ones will appear. For
example, multipliers used to take up a huge area.

I'm finding big multiplexers to be an issue, for a couple of reasons.
Barrel shifters, and normalisation (which will become hugely important
if the floating point synthesisable packages take off) for one, and the
replacement of internal tri-states with mux logic for another.

They tend to be quite large, not very well structured (for
floorplanning), and not particularly well pipelined, as they tend to
synthesise to several levels of logic without using the slice FFs.

On a couple of recent designs I'd estimate 30 to 50% of the area has
been multiplexers (and I didn't have the option of using multipliers as
barrel shifters). MUXF5s help ... a little.

I believe something like a 4/8/16 to 1 MUX function 16(18?) bits in
width (preferably also configurable as a barrel shifter) would be worthy
of consideration as a next generation block function (or MegaFunction)
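
To give a feel for why wide shifters and muxes eat area, here is a minimal sketch of a logarithmic barrel shifter modelled as stages of 2:1 multiplexers. The function names are hypothetical, and the count is in abstract 2:1 muxes only; how many LUTs or slices each stage really costs depends on the device and the synthesis tool, which this does not model.

```python
def barrel_shift_left(value: int, shift: int, width: int = 16) -> int:
    """Logical left shift built from log2(width) stages of 2:1 muxes.

    Stage i either passes the word through or shifts it by 2**i,
    selected by bit i of the shift amount.
    """
    if width < 2 or width & (width - 1):
        raise ValueError("width must be a power of two")
    if not 0 <= shift < width:
        raise ValueError("shift out of range")
    mask = (1 << width) - 1
    value &= mask
    stages = width.bit_length() - 1  # log2(width) for a power of two
    for stage in range(stages):
        if (shift >> stage) & 1:     # mux select for this stage
            value = (value << (1 << stage)) & mask
    return value


def mux2_count(width: int) -> int:
    """Number of 2:1 muxes: one per bit per stage."""
    return width * (width.bit_length() - 1)


if __name__ == "__main__":
    print(hex(barrel_shift_left(0x0001, 4)))  # shift a single bit left by 4
    print(mux2_count(16))                     # 2:1 muxes in a 16-bit shifter
```

The same result can be had from a multiplier by multiplying with 2**shift and truncating to the word width, which is the "multipliers as barrel shifters" trick mentioned above.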

- Brian

Reply by Marc Randolph ● February 16, 2005

Brian Drummond wrote:
> On 15 Feb 2005 21:45:03 -0800, "Marc Randolph" <mrand@my-deja.com>
> wrote:
>
> >glen herrmannsfeldt wrote:
> >> Austin Lesea wrote:
> >> (big snip)
>
> >> Many designs could easily only require one half or one tenth of
> >> what current FPGAs are capable of, and still be worth putting in
> >> an FPGA.  For those, the slowest devices, especially with lower
> >> static power, might be very useful.
> >
> >Something similar crossed my mind when we first talked with our local
> >Xilinx sales rep several months ago about how the S2 compares to the
> >V4.
>
> >Anyone else out there see this?  Anyone seeing something that a V4 or
> >S2 won't do fairly well, that you think someone might want or need in
> >the next year or three?
>
> I suspect that as hot-spots are tackled, new ones will appear. For
> example, multipliers used to take up a huge area.
>
> I'm finding big multiplexers to be an issue, for a couple of reasons.
> Barrel shifters, and normalisation (which will become hugely important
> if the floating point synthesisable packages take off) for one, and the
> replacement of internal tri-states with mux logic for another.
>
> They tend to be quite large, not very well structured (for
> floorplanning), and not particularly well pipelined, as they tend to
> synthesise to several levels of logic without using the slice FFs.
>
> On a couple of recent designs I'd estimate 30 to 50% of the area has
> been multiplexers (and I didn't have the option of using multipliers as
> barrel shifters). MUXF5s help ... a little.
>
> I believe something like a 4/8/16 to 1 MUX function 16(18?) bits in
> width (preferably also configurable as a barrel shifter) would be worthy
> of consideration as a next generation block function (or MegaFunction)

Howdy Brian,

Not knowing your device size, I'm not sure how many muxes you have, but
any way you think of it, 50% of an FPGA for muxes is quite a bit.  But
I think the V4 DSP block does most of what you've discussed (wide
muxes, barrel shifters, and counters), doesn't it?

Have fun,

   Marc

Reply by Marc Randolph ● February 16, 2005

Nicholas Weaver wrote:
> In article <1108532703.846896.232700@z14g2000cwz.googlegroups.com>,
> Marc Randolph <mrand@my-deja.com> wrote:
>
> >Anyone else out there see this?  Anyone seeing something that a V4 or
> >S2 won't do fairly well, that you think someone might want or need in
> >the next year or three?
>
> Yes, I do.  Gigabit Copper has become this all purpose glue: a cheap
> way of connecting stuff together.  Currently, it takes an external PHY
> or MAC/PHY: not a big deal on an expensive board with an expensive
> FPGA, but it's a big deal on a cheap board.
>
> I'd love to see a Spartan/Cyclone FX, with multiple 10/100/1000-T
> MAC/PHYs as hardcores, from one on the smallest part to perhaps as
> many as eight.
>
> I see a world which needs a ton of high speed, low cost programmable
> Gb network devices (mostly security applications, but who knows what
> else?)
>
> Never happen, but an interesting thought.

Yes, it is - and I agree that there are tons of applications just
begging for gigabit ethernet connectivity (PVR/DVR's or HDTV's, not to
mention almost all "normal" communication or networking equipment).
But the analog circuitry required to support the five level (!)
signaling that gigabit copper uses seems like a little much to ask of a
multi-hundred MHz digital device with tens of thousands of gates
toggling all at the same time.  Even so, you gave me an idea... what if
you could have the analog front end in a cheapish external device and
use the internal DSP blocks to do the signal processing?  Of course, to
make it worthwhile, that cheapish device would need to be at least 3 or
4x lower cost than the $10/port gigabit copper phy's you can buy right
now.  As you said - unlikely that it will ever happen.
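
For readers unfamiliar with the "five level" remark, the following is a rough sketch of the line-rate arithmetic for gigabit copper. The figures (4 pairs, 125 Mbaud per pair, 5-level PAM carrying 2 data bits per symbol) are taken from the 1000BASE-T standard rather than from this post, and the function names are hypothetical.

```python
import math

PAIRS = 4                     # 1000BASE-T uses all four twisted pairs
SYMBOL_RATE_BAUD = 125e6      # symbols per second on each pair
LEVELS = 5                    # PAM-5: five voltage levels per symbol
DATA_BITS_PER_SYMBOL = 2      # information bits carried per symbol per pair


def raw_bits_per_symbol(levels: int) -> float:
    """Upper bound on bits one symbol could carry with this many levels."""
    return math.log2(levels)


def data_rate_bps(pairs: int, baud: float, bits_per_symbol: int) -> float:
    """Aggregate data rate across all pairs."""
    return pairs * baud * bits_per_symbol


if __name__ == "__main__":
    print(f"raw capacity per symbol: {raw_bits_per_symbol(LEVELS):.2f} bits")
    print(f"data rate: {data_rate_bps(PAIRS, SYMBOL_RATE_BAUD, DATA_BITS_PER_SYMBOL) / 1e6:.0f} Mb/s")
```

The gap between the raw capacity of five levels and the 2 data bits actually carried is used for coding redundancy, which is part of why the receive path needs real signal processing and not just a comparator.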

The MAC's are a completely different issue.  They could do that easily
right now, and in all devices.  So I think you did hit on something
there.  There is no reason that every FPGA couldn't ship with 1 or 4 or
12 of them (or for a start, at least more than one family).  When I
first heard that the 4VFX20 devices were going to have 8 MGT's, I
immediately started thinking of all the things I could do not only with
those 8 MGT's, but with the 8 hard MAC's they would surely include.
Only later did I discover they put only 2 MAC's in the FX20!?!  The
MAC's obviously aren't designed with networking/telecommunications
equipment in mind or they would have included one for each MGT.
Instead, they seem to have made it 2x the number of 405 processors.
Great idea though!

Have fun,

   Marc

Reply by Vaughn Betz ● February 17, 2005

"Austin Lesea" <austin@xilinx.com> wrote in message 
news:cuvptt$baj6@cliff.xsj.xilinx.com...
> Vaughn,
>
> Shell and pea game:  no, you do not get the entire benefit of reduced C.

The entire benefit would be 19% speed and dynamic power reduction.  As I 
said, we get about 2/3 of that maximum benefit, since not all C is metal C, 
but most is.
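
The arithmetic behind the 19% and ~13% figures can be sketched as below. The dielectric constants (3.7 and 3.0) and the 2/3 metal share are the numbers quoted in this thread; the model itself, capacitance scaling linearly with k and gate/diffusion capacitance staying unchanged, is the simplifying assumption under dispute, and the function names are hypothetical.

```python
K_FSG = 3.7              # FSG dielectric constant, as quoted for Virtex-4
K_LOW_K = 3.0            # Black Diamond dielectric constant, as quoted for Stratix II
METAL_FRACTION = 2 / 3   # quoted share of switching capacitance that is metal


def metal_cap_reduction(k_old: float, k_new: float) -> float:
    """Fractional drop in metal capacitance, assuming C is proportional to k."""
    return 1.0 - k_new / k_old


def total_cap_reduction(k_old: float, k_new: float, metal_fraction: float) -> float:
    """Fractional drop in total switching capacitance.

    Only the metal share is assumed to benefit from the lower k.
    """
    return metal_fraction * metal_cap_reduction(k_old, k_new)


if __name__ == "__main__":
    metal = metal_cap_reduction(K_FSG, K_LOW_K)
    total = total_cap_reduction(K_FSG, K_LOW_K, METAL_FRACTION)
    print(f"metal C reduction: {metal:.1%}")
    print(f"total C reduction: {total:.1%}")
```

Under these assumptions the metal capacitance drops by about 19% and the total by about 13%. Any layers that are not low-k reduce the effective metal share and pull the total down accordingly.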
>
> Also, not all layer dielectrics are Lo-K.  For example, the clock tree is 
> near the top, where regular dielectric is used, isn't it?

We use low-k to near the top of the metal stack.  At the very top, where 
you're routing power and ground, you don't need (or even want it), since 
high capacitance on power and ground is beneficial (helps prevent ground 
bounce & vcc sag).  The vast majority of the switching capacitance (clocks, 
routing, ALMs, MACs, etc.) is in metal surrounded by low-k.

> At least, we evaluated both with, and without Lo-K devices (from the same 
> masks and fab), and were surprised to see only a 5% improvement.
>
> Did you do the same experiment?  We were surprised.

We simulated everything with and without low-K, and got the ~13% improvement 
I previously mentioned.  I am also surprised you got only 5%.  That is 
certainly well below mainstream for the industry -- if everyone were seeing 
such small gains, I doubt the fabs and semiconductor equipment vendors would 
be pumping billions into developing low-k (and next generation extra-low-k) 
dielectrics.  Sounds like you may have used low-k for only a few metal 
layers, so perhaps that explains your disappointing experience.

> Turns out, there is a lot more in the equations than just C.
>
> If it was just that simple, extracted simulations in spice would be 
> unneeded.

This is backwards.  As metal capacitance has become the dominant 
capacitance, extracting layouts to obtain all the metal parasitics before 
running SPICE has become essential to getting accurate answers.  Go back 
enough process generations and this was less true -- you could write up your 
transistor-level schematic in a SPICE deck, simulate it with no thought of 
metal, and you wouldn't be that far off for most circuits, since transistor 
parasitics dominated.  Now that metal dominates, you have to extract layouts 
to get the metal C or you get bad answers.

Vaughn Betz
Altera
[v b e t z (at) altera.com]

Reply by Austin Lesea ● February 17, 2005

Vaughn,

Well, you certainly have been fooled.

See below,

Austin

Vaughn Betz wrote:
> "Austin Lesea" <austin@xilinx.com> wrote in message 
> news:cuvptt$baj6@cliff.xsj.xilinx.com...
> 
>>Vaughn,
>>
>>Shell and pea game:  no, you do not get the entire benefit of reduced C.
> 
> 
> The entire benefit would be 19% speed and dynamic power reduction.  As I 
> said, we get about 2/3 of that maximum benefit, since not all C is metal C, 
> but most is.
> 
>>Also, not all layer dielectrics are Lo-K.  For example, the clock tree is 
>>near the top, where regular dielectric is used, isn't it?
> 
> 
> We use low-k to near the top of the metal stack.  At the very top, where 
> you're routing power and ground, you don't need (or even want it), since 
> high capacitance on power and ground is beneficial (helps prevent ground 
> bounce & vcc sag).  The vast majority of the switching capacitance (clocks, 
> routing, ALMs, MACs, etc.) is in metal surrounded by low-k.

I doubt it.  The dielectric above the transistors is regular undoped 
glass (SiO2).  K = 4.3.  Then comes the lo-K after M1.  M1 through M5 is 
all they can do as lo-K; if they do more, it suffers major yield and 
reliability issues.  Or maybe you haven't noticed the delamination yet?
> 
> 
>>At least, we evaluated both with, and without Lo-K devices (from the same 
>>masks and fab), and were surprised to see only a 5% improvement.
>>
>>Did you do the same experiment?  We were surprised.
> 
> 
> We simulated everything with and without low-K, and got the ~13% improvement 

Nope.  You did not.  If you did, you would discover that the layer above 
the transistors and below metal 1, as well as the upper layers for 
clocks, etc. leads to less than expected improvements.  I am pretty sure 
your ICDES folks just scaled everything.  It would be a major project to 
develop, and QC spice models for both processes, and I seriously doubt 
anyone would bother.

> I previously mentioned.  I am also surprised you got only 5%.  That is 
> certainly well below mainstream for the industry -- if everyone were seeing 
> such small gains,

which they are.

> I doubt the fabs and semiconductor equipment vendors would
> be pumping billions into developing low-k (and next generation extra-low-k) 
> dielectrics.

The only folks making money on this are the equipment suppliers.  No one 
I know asked for it.  Yes, it can be a major benefit to ASIC, uP, and 
perhaps memories.  But, it just isn't doing anything for us.  Now, we 
will get lo-K for free, as they have the equipment and process now, 
but guess what?  We still do not see more than a 5% improvement from V4 
without lo-K to V4 with lo-K.  Wow, two generations and two sets of side 
by side lo-K and regular experiments.

Ignorance I guess is bliss.

> Sounds like you may have used low-k for only a few metal
> layers, so perhaps that explains your disappointing experience.

Nope, as I described, the only layers allowed to be lo-K, for lifetime 
delamination and quality reasons, are the ones above M1 and below M5. 
Any more than that, and we have seen problems with fab process qual (not 
on our parts, but their test structures).

> 
> 
>>Turns out, there is a lot more in the equations than just C.
>>
>>If it was just that simple, extracted simulations in spice would be 
>>unneeded.
> 
> 
> This is backwards.  As metal capacitance has become the dominant 
> capacitance, extracting layouts to obtain all the metal parasitics before 
> running SPICE has become essential to getting accurate answers.  Go back 
> enough process generations and this was less true -- you could write up your 
> transistor-level schematic in a SPICE deck, simulate it with no thought of 
> metal, and you wouldn't be that far off for most circuits, since transistor 
> parasitics dominated.  Now that metal dominates, you have to extract layouts 
> to get the metal C or you get bad answers.

I can see you really have no clue about where the wire models are going. 
  How thick is the metal, how thick is the dielectric?  How close are 
the wires?  There is R there (and lots of it).  There is C there, too. 
There is also side wall C (the sidewalls are regular FSG, or SiO2 -- no 
lo-K advantage).

Again, you go back and ask if they actually had foundry models for with, 
and without, and what the actual stack up was.  One of the biggest 
overstatements we have seen recently is all of this nonsense about the 
superiority of lo-K.

It's nice, don't get me wrong, but don't tout it as a miracle if you have 
never proven it is.  You don't know.  We do.

Take the time to do it right, or at least study it right.  Get an ICDES 
wire model expert to talk to you about where the lo-K is, and isn't.
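
One way to see how far apart the 5% and ~13% claims are is to ask what share of total switching capacitance would have to sit in low-k dielectric to produce each number. This is only a sketch under the linear "C proportional to k" model, using the k values quoted earlier in the thread (3.7 and 3.0); it ignores wire resistance and geometry, and the function name is hypothetical.

```python
K_REGULAR = 3.7   # quoted dielectric constant for FSG
K_LOW_K = 3.0     # quoted dielectric constant for the low-k material


def implied_low_k_share(observed_reduction: float,
                        k_old: float = K_REGULAR,
                        k_new: float = K_LOW_K) -> float:
    """Share of total switching C that must be in low-k to explain a result.

    Inverts: observed = share * (1 - k_new / k_old).
    """
    if k_new >= k_old:
        raise ValueError("k_new must be lower than k_old")
    return observed_reduction / (1.0 - k_new / k_old)


if __name__ == "__main__":
    for claimed in (0.05, 0.13):
        share = implied_low_k_share(claimed)
        print(f"{claimed:.0%} improvement implies {share:.0%} of C in low-k")
```

On this simple model a 5% improvement corresponds to roughly a quarter of the switching capacitance benefiting from low-k, and 13% to roughly two thirds, so the disagreement comes down to how much of the stack is really low-k and how much of the capacitance lives there.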

> 
> Vaughn Betz
> Altera
> [v b e t z (at) altera.com]
> 
>
