Jim,

Yes, they are marked so they can be identified easily.

For qualification we advise customers to obtain parts from both fabs (i.e., 
request them) so they do not have to do qualification more than once.

Of course, customers are free to request parts from a single vendor, 
although that may subject them to supply issues if they are not planning 
far enough ahead.

Austin

Jim Granville wrote:

> Austin Lesea wrote:
> 
>>
>> Two fabs:  It is a challenge, but then having two qualified sources of 
>> supply is a definite advantage for our customers.
> 
> 
>  Hmm.. Do these parts have different label codes, so users can
> readily identify which FAB they came from ?
>  This could be an advantage to some, but to others it might mean having 
> to double-up on the qualification work, or worse...
> 
> -jg
>

Austin Lesea wrote:
> 
> Two fabs:  It is a challenge, but then having two qualified sources of 
> supply is a definite advantage for our customers.

  Hmm.. Do these parts have different label codes, so users can
readily identify which FAB they came from ?
  This could be an advantage to some, but to others it might mean having 
to double-up on the qualification work, or worse...

-jg

> Low-K: Don't get me wrong, I like low K, I like low pin capacitance too.
> I also like fine wine, and a good meal.  I had already asked you to
> fab the S2 without low-K and measure it.  We did that for V2 and V2P,
> and again for V4 at Toshiba and UMC.  We know.  You guess.

Do you like Spice too?  Try taking a routing path and re-spicing with
20% lower metal capacitance.  Either your chips have no metal in them
or you'll see an improvement in delay with low-k.
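
For anyone without a Spice deck handy, the same point can be sketched with a first-order Elmore delay estimate. This is only a back-of-the-envelope illustration, not a substitute for re-spicing a real path: the driver resistance, wire resistance, wire capacitance and load capacitance below are made-up placeholder values, and only the 20% reduction in metal capacitance comes from the text above.

```python
def elmore_delay(r_drv, r_wire, c_wire, c_load):
    """Elmore delay (seconds) of a driver feeding a distributed RC wire and a load.

    The driver sees all of the capacitance; the wire resistance sees half of
    the wire capacitance (distributed line) plus the full load.
    """
    return r_drv * (c_wire + c_load) + r_wire * (c_wire / 2.0 + c_load)


# Hypothetical values for illustration only (not taken from any datasheet).
R_DRV = 1000.0      # ohms, driver output resistance
R_WIRE = 500.0      # ohms, total wire resistance
C_WIRE = 200e-15    # farads, total metal capacitance
C_LOAD = 20e-15     # farads, gate load at the far end

base = elmore_delay(R_DRV, R_WIRE, C_WIRE, C_LOAD)
# 20% lower metal capacitance, as suggested above; the load is left unchanged.
low_k = elmore_delay(R_DRV, R_WIRE, 0.8 * C_WIRE, C_LOAD)
improvement = (base - low_k) / base

print(f"base  = {base * 1e12:.0f} ps")
print(f"low-k = {low_k * 1e12:.0f} ps")
print(f"delay improvement = {improvement:.1%}")
```

Under these assumptions the delay improves, but by less than the full 20%, because the gate load does not scale with the dielectric.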

And yes, we have measured the difference in silicon.  We fabbed Stratix
in FSG and low-k; this was part of our qualification and testing of
low-k.  We didn't ever ship these low-k Stratix devices because we had
sufficient yield into our fast devices.  But we measured a performance
advantage matching our expectations.

> Low power:  What is low, is our power dissipation.  The static leakage
> kills you folks as the part gets hot.  And what FPGA in the high end
> isn't running hot?  Yours just run even hotter due to the leakage (or
> require more expensive heatsink solutions).  This one is so easy to
> prove it is silly for you to even try to compete on total power.

At worst we're talking about a 1W difference in 2S180-sized part, for
worst-case leakage.  How much dynamic power is being consumed in a chip
that size?  In the vast majority of applications it will be a fair bit
-- somewhere in the 5-10W vicinity wouldn't surprise me.  Our dynamic
power advantage on logic/routing, RAMs, DSPs, and especially I/Os will
cover the difference in static power.

And where are those V4 worst-case leakage specs?  It seems that you
don't really have a handle on static power if a year after a product
introduction you still don't know how bad it can be.

BTW, did you notice that the 2S60/LX60 devices used in your recent net
seminar had the same static power (extrapolate from the dynamic power
data)?  And I love how you imply that our 2S90 chip you tested is
out-of-spec on leakage, when in fact it falls between our "typical" and
"worst-case" spec.

Regards,

Paul Leventis
Altera Corp.

Paul,

I am sure the newsgroup is getting really bored with this.  I certainly 
am.  Short and sweet:

Two fabs:  It is a challenge, but then having two qualified sources of 
supply is a definite advantage for our customers.

Low-K: Don't get me wrong, I like low K, I like low pin capacitance too. 
  I also like fine wine, and a good meal.  I had already asked you to 
fab the S2 without low-K and measure it.  We did that for V2 and V2P, 
and again for V4 at Toshiba and UMC.  We know.  You guess.

Low power:  What is low, is our power dissipation.  The static leakage 
kills you folks as the part gets hot.  And what FPGA in the high end 
isn't running hot?  Yours just run even hotter due to the leakage (or 
require more expensive heatsink solutions).  This one is so easy to 
prove it is silly for you to even try to compete on total power.

Austin

> And, stop with the low-K dielectric.  All of the Toshiba parts are low
> K.  Guess what?  We do not speed grade or power grade them differently,
> because it just doesn't make that much of a difference!

So the long delay in getting the -12 speed grade out had nothing to do
with this fab transition?  It must be fun characterizing one product
produced in two fabs with two different processes (one low-k, one not,
and who knows what else is different).

> Perhaps an ASIC can take proper advantage of low K, but the FPGAs just
> do not show much of an improvement at all.

I wish we had this "defy the laws of physics" technology you use on
Virtex-4.  First you claim your devices do not draw more current with
increased voltage.  Then you claim that increased metal capacitance has
no impact on speed or power.  I'm waiting for you to claim that I/O pin
capacitance doesn't matter for performance, signal integrity or
power...

> The Japanese engineer who touched the S2 and V4 chips on our
> demonstrator said it all:  "S2 hot! V4 cool..."

A very scientific test!  Let's do some quick math here... Even if you
found some demo with a 1W VccInt difference, this should only translate
to ~10 C difference in chip temperature (still air, no heat sink on
2S60 --> Theta-JA = 10.4 C/W), which would hardly be discernible to the
touch.  Why was this demo so much hotter to the touch then?  My
educated guess (based on the analysis of one of our customers) is that
you had unequal I/O settings, causing lots more I/O dissipation in our
chip.  Really, that is rather low.
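
The quick math above is just power times thermal resistance. A minimal sketch of that check, using only the 1W difference and the Theta-JA of 10.4 C/W quoted above (the function name is made up for illustration, and steady state is assumed):

```python
def junction_rise_c(extra_power_w, theta_ja_c_per_w):
    """Steady-state temperature rise (deg C) caused by extra dissipated power."""
    return extra_power_w * theta_ja_c_per_w


# Figures quoted in the post: 1 W VccInt difference, Theta-JA = 10.4 C/W
# (2S60, still air, no heat sink).
rise = junction_rise_c(1.0, 10.4)
print(f"extra temperature rise: {rise:.1f} C")
```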

Regards,

Paul Leventis
Altera Corp.

Paul,

Yes, you can get the fastest speed grade.  Really a cheap shot, that 
one.  I sense some real desperation.

And, stop with the low-K dielectric.  All of the Toshiba parts are low 
K.  Guess what?  We do not speed grade or power grade them differently, 
because it just doesn't make that much of a difference!

Perhaps an ASIC can take proper advantage of low K, but the FPGAs just 
do not show much of an improvement at all.

And stop with the power "advantages of S2."

The Japanese engineer who touched the S2 and V4 chips on our 
demonstrator said it all:  "S2 hot! V4 cool..."

Austin

Paul Leventis (at home) wrote:

> You make good points.  If you need 66, what does it matter if you get 70 vs. 
> 75?  The problem is at the time most customers select a part, they do not 
> have a complete (or even partial) design.

True. Picking an FPGA footprint can defer to
the simulation of its contents.

Making a board used to be a high risk, critical path,
long lead time task. Four or five spins was the norm.
The fpga was a detail and getting the software guys
something to play with was the priority.

Today boards have fewer parts with more balls,
and making a board is not such a big deal.
There is little reason to make a quick board
for the software guys because all the interesting
registers are in the fpga. HDL simulation is
a critical path item.

> In my mind, speed matters most as a time-saving feature.  If the CAD tools 
> and chip you are using enable you to hit your performance requirements using 
> plain, architecture-agnostic HDL, push-button in the CAD tools, you've saved 
> yourself a bundle of hurt. 

Amen to that.

   -- Mike Treseler

> Warning:  Ranty, opinionated (and quite probably wrong):

Those are the best kind of posts...

> How much does performance really matter?

You make good points.  If you need 66, what does it matter if you get 70 vs. 
75?  The problem is at the time most customers select a part, they do not 
have a complete (or even partial) design.  You know your MHz requirement, 
but have no idea if you will hit it.  If you select a faster part, you are 
more *likely* to hit your Fmax target.  How much more likely?  It's hard to 
say.

But consider the downside to missing performance.  At best, you have to push 
the tools, or floorplan, or re-pipeline, or restructure your HDL.  At worst, 
you need to respin your board, select a new product, maybe get a faster 
speedgrade, or change other aspects of your system design to accommodate a 
lower clock speed.  All of this costs time, and time-to-market is one of the 
big FPGA sales points.

Not all clock domains are defined by external requirements.  Sometimes the 
faster you can run your core, the better the performance of your system 
(example -- graphics processor) even though your bus and memory speeds are 
still the same.  Also, if you get fast enough in your internal clock 
domains, you might be able to cut the data width or multiplicity of your 
internal logic, allowing you to migrate into a smaller (and thus cheaper) 
part.

In my mind, speed matters most as a time-saving feature.  If the CAD tools 
and chip you are using enable you to hit your performance requirements using 
plain, architecture-agnostic HDL, push-button in the CAD tools, you've saved 
yourself a bundle of hurt.  It's interesting -- we see the results of having 
fast chips and good out-of-the-box software performance, as these features 
translate into lower support costs of the "I need help meeting my timing" 
variety.

Having speed is not enough.  We have to have the features you need, but not 
too many as to exceed your cost requirements (is a feature that costs 3% but 
is only used by 1% of designers worth it?).  Our software and support have 
to be up to your needs.  And so on.  But that won't stop us from discussing 
speed, or power, or SI in isolation of these other design requirements.

Regards,

Paul Leventis
Altera Corp.

> Then there is the interconnect.  V4 is 500 ps faster for full chip routes, 
> 400 ps faster for 1/2 chip routes, 100-200 ps faster for a few CLBs, LABs, 
> and 100-200ps for neighbor routes.  Some very short routes are 30ps better 
> in S2.

I would guess that you did not normalize to take into account packing 
density.  How do you define a "short" route?  Do you multiply the # of CLBs 
and # of LABs by the right ratio of logic?  I'd argue that 1 LAB = 8 ALMs = 
~10-10.5 slices (based on our density analysis).

Anyway, the average distance of a hop in a critical path is roughly 3 LABs, 
so short connections are the most important.  Our data shows a performance 
advantage in hops of this length.

> Of course, anything you can direct into the DSP48s will just scream, and 
> outperform anything S2 has.

That's interesting... did you miss the news that we've increased Stratix II 
DSP performance to 550 MHz in Quartus II 5.0?   Not to mention that the S2 
DSP can do 36-bit multiplies in hardware (vs. 18-bit for DSP48)... but I 
will not digress into a feature pissing contest.

> I think that the newsgroup here will basically tell you to try a design in 
> both architectures, and play with the constraints to see how well it does.

On this, I agree with Austin.  Kick the tires.  Just be sure to set timing 
constraints before doing so, and also make sure not to use "toy" designs 
(neither tool is particularly well optimized for very small designs in very 
large chips).  And beware numerical noise -- placement & routing is a 
heuristic.  If you perturb any aspect of the input, the output can change 
due to random differences in algorithm outcome.
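
One way to guard against that noise is to compile each design with several seeds and compare the spread against the difference you think you see. The sketch below is only an illustration: the Fmax numbers are invented placeholders, not measurements of any device or tool, and the "more than twice the larger standard deviation" rule is an arbitrary rule of thumb rather than a proper statistical test.

```python
import statistics


def compare_seed_sweeps(fmax_a, fmax_b):
    """Compare two lists of Fmax results (MHz), one entry per seed."""
    diff = statistics.mean(fmax_b) - statistics.mean(fmax_a)
    # Use the larger sample standard deviation as the noise estimate.
    noise = max(statistics.stdev(fmax_a), statistics.stdev(fmax_b))
    return {
        "diff_mhz": diff,
        "noise_mhz": noise,
        # Assumed rule of thumb: believe the difference only if it is
        # larger than twice the seed-to-seed noise.
        "meaningful": abs(diff) > 2.0 * noise,
    }


# Invented example data: five seeds per tool/device combination.
runs_a = [198.0, 203.0, 201.0, 196.0, 202.0]
runs_b = [200.0, 205.0, 199.0, 204.0, 202.0]

result = compare_seed_sweeps(runs_a, runs_b)
print(result)
```

With these placeholder numbers the 2 MHz gap between the averages is smaller than the seed-to-seed scatter, so a single compile of each would not tell you much.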

Regards,

Paul Leventis
Altera Corp.

Hi Joseph,

First, I must stress that comparing "micro parameters" is difficult at best 
and dangerous at worst.  There are fairly arbitrary decisions made during 
timing modeling about where you lump various delays.  For example, where 
does "LUT delay" begin and end -- is it at the output of the 1st stage 
buffer after the multiplexor before the LUT?  Or is that multiplexor's delay 
included as part of LUT delay?

The Stratix/Stratix II/Cyclone/Cyclone II/Max II timing models are 
sufficiently complicated that there is little point to making datasheet 
entries for various internal timing parameters.  For example, the ALM is 
fairly complicated and depending on how your logic is synthesized and 
exactly how the router chooses to hook it up, your delay can vary 
considerably.  So your best bet is to look at real circuits with real timing 
constraints, since Quartus II will do its best to put the critical signals 
on the fastest paths.  That said...

As some posters have already pointed out, RAM speeds have increased in 
Quartus 5.0.  The latest comparison I've seen shows us with a Tco advantage 
vs. Virtex-4 when the RAM output registers are used, and a slight 
disadvantage when the RAM is unregistered -- in either case a few hundred ps 
difference.

As for LUT delays, here are the latest numbers I've got for a fastest speed 
grade 7-input LUT (the ALM can do some functions of 7 inputs, and all
functions of 6 inputs), as well as for a 4-LUT (the ALM can do two
independent 4-LUTs).

Input    7-LUT    4-LUT
A        378 ps   366 ps
B        357 ps   228 ps
C        240 ps   225 ps
D        240 ps   53 ps
E        144 ps
F         53 ps
G        234 ps

According to Austin's post, Virtex-4 (fastest speed grade -- I dare you to 
try to buy one ;-)) shows 165 ps across-the-board (seems bogus to me, but 
what do I know).  So which LUT is faster based on this data?  Well, it 
depends on how we lumped our delays into logic vs. routing (see above).  It 
also depends on how often Quartus II will manage to route your critical 
signal on the fast LUT inputs -- usually it does a very good job of this.
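
To make the "it depends" concrete, here is a small sketch that tabulates the numbers above against the flat 165 ps figure. It only restates the data from this post; the function and variable names are made up for illustration, and the caveats about how delays are lumped into logic vs. routing still apply to whatever it prints.

```python
# Per-input delays in ps, copied from the table above.
LUT7_PS = {"A": 378, "B": 357, "C": 240, "D": 240, "E": 144, "F": 53, "G": 234}
LUT4_PS = {"A": 366, "B": 228, "C": 225, "D": 53}
V4_LUT_PS = 165  # flat Virtex-4 figure quoted from Austin's post


def summarize(delays_ps, reference_ps):
    """Return min/max/mean delay and the inputs faster than the reference."""
    values = list(delays_ps.values())
    faster = sorted(name for name, d in delays_ps.items() if d < reference_ps)
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
        "faster_inputs": faster,
    }


for label, table in (("7-LUT", LUT7_PS), ("4-LUT", LUT4_PS)):
    s = summarize(table, V4_LUT_PS)
    print(f"{label}: min {s['min']} ps, max {s['max']} ps, "
          f"mean {s['mean']:.0f} ps, inputs under {V4_LUT_PS} ps: {s['faster_inputs']}")
```

The spread between the fastest and slowest input is the reason the answer hinges on which inputs the critical signals end up on.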

The other critical component for logic fabric performance is the routing. 
Based on an analysis of routing delay between registers placed a varying 
distance apart in the X- and Y-directions, we've found that we have a ~20% 
delay advantage (fastest speed grade vs. fastest speed grade).  Of course, 
even this type of study has its caveats -- how do you normalize distance to 
take into account differences in logic density?

Stratix II employs a low-k inter-metal dielectric (k = 2.9) vs. Virtex-4's 
"reduced-k" dielectric (k = 3.6), giving us a ~20% metal capacitance 
advantage.  If you set aside architectural and circuit differences, to first 
order you'd expect this to translate into a performance advantage for 
Stratix II.
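
The ~20% figure follows from the ratio of the two k values, assuming identical metal geometry so that capacitance scales linearly with the dielectric constant (that assumption is a simplification for this sketch, and the function name is made up):

```python
K_STRATIX2 = 2.9  # low-k inter-metal dielectric, from the post
K_VIRTEX4 = 3.6   # "reduced-k" dielectric, from the post


def capacitance_reduction(k_low, k_high):
    """Fractional reduction in capacitance when k_high is replaced by k_low.

    Assumes the same geometry, so C is directly proportional to k.
    """
    return 1.0 - k_low / k_high


reduction = capacitance_reduction(K_STRATIX2, K_VIRTEX4)
print(f"metal capacitance reduction: {reduction:.1%}")
```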

Regards,

Paul Leventis
Altera Corp.