Reply by Austin Lesea May 17, 2005
Jim,

Yes, they are marked so they can be identified easily.

For qualification we advise customers to obtain parts from both fabs (i.e.,
request them) so they do not have to do qualification more than once.

Of course, customers are free to request parts from a single vendor, 
although that may subject them to supply issues if they are not planning 
far enough ahead.

Austin

Jim Granville wrote:

> Austin Lesea wrote:
>
>> Two fabs:  It is a challenge, but then having two qualified sources of
>> supply is a definite advantage for our customers.
>
> Hmm.. Do these parts have different label codes, so users can
> readily identify which FAB they came from?
> This could be an advantage to some, but to others it might mean having
> to double-up on the qualification work, or worse...
>
> -jg
Reply by Jim Granville May 17, 2005
Austin Lesea wrote:
> Two fabs: It is a challenge, but then having two qualified sources of
> supply is a definite advantage for our customers.
Hmm.. Do these parts have different label codes, so users can readily identify which FAB they came from?

This could be an advantage to some, but to others it might mean having to double-up on the qualification work, or worse...

-jg
Reply by Paul Leventis May 17, 2005
> Low-K: Don't get me wrong, I like low K, I like low pin capacitance too.
> I also like fine wine, and a good meal. I had already asked you to
> fab the S2 without low-K and measure it. We did that for V2 and V2P,
> and again for V4 at Toshiba and UMC. We know. You guess.
Do you like Spice too? Try taking a routing path and re-spicing with 20% lower metal capacitance. Either your chips have no metal in them or you'll see an improvement in delay with low-k. And yes, we have measured the difference in silicon. We fabbed Stratix in FSG and low-k; this was part of our qualification and testing of low-k. We didn't ever ship these low-k Stratix devices because we had sufficient yield into our fast devices. But we measured a performance advantage matching our expectations.
> Low power: What is low, is our power dissipation. The static leakage
> kills you folks as the part gets hot. And what FPGA in the high end
> isn't running hot? Yours just run even hotter due to the leakage (or
> require more expensive heatsink solutions). This one is so easy to
> prove it is silly for you to even try to compete on total power.
At worst we're talking about a 1W difference in a 2S180-sized part, for worst-case leakage. How much dynamic power is being consumed in a chip that size? In the vast majority of applications it will be a fair bit -- somewhere in the 5-10W vicinity wouldn't surprise me. Our dynamic power advantage on logic/routing, RAMs, DSPs, and especially I/Os will cover the difference in static power.

And where are those V4 worst-case leakage specs? It seems that you don't really have a handle on static power if a year after a product introduction you still don't know how bad it can be.

BTW, did you notice that the 2S60/LX60 devices used in your recent net seminar had the same static power (extrapolate from the dynamic power data)? And I love how you imply that our 2S90 chip you tested is out-of-spec on leakage, when in fact it falls between our "typical" and "worst-case" spec.

Regards,

Paul Leventis
Altera Corp.
Reply by Austin Lesea May 17, 2005
Paul,

I am sure the newsgroup is getting really bored with this.  I certainly 
am.  Short and sweet:

Two fabs:  It is a challenge, but then having two qualified sources of 
supply is a definite advantage for our customers.

Low-K: Don't get me wrong, I like low K, I like low pin capacitance too. 
  I also like fine wine, and a good meal.  I had already asked you to 
fab the S2 without low-K and measure it.  We did that for V2 and V2P, 
and again for V4 at Toshiba and UMC.  We know.  You guess.

Low power:  What is low, is our power dissipation.  The static leakage 
kills you folks as the part gets hot.  And what FPGA in the high end 
isn't running hot?  Yours just run even hotter due to the leakage (or 
require more expensive heatsink solutions).  This one is so easy to 
prove it is silly for you to even try to compete on total power.

Austin
Reply by Paul Leventis May 17, 2005
> And, stop with the low-K dielectric. All of the Toshiba parts are low
> K. Guess what? We do not speed grade or power grade them differently,
> because it just doesn't make that much of a difference!
So the long delay in getting the -12 speed grade out had nothing to do with this fab transition? It must be fun characterizing one product produced in two fabs with two different processes (one low-k, one not, and who knows what else is different).
> Perhaps an ASIC can take proper advantage of low K, but the FPGAs just
> do not show much of an improvement at all.
I wish we had this "defy the laws of physics" technology you use on Virtex-4. First you claim your devices do not draw more current with increased voltage. Then you claim that increased metal capacitance has no impact on speed or power. I'm waiting for you to claim that I/O pin capacitance doesn't matter for performance, signal integrity or power...
> The Japanese engineer who touched the S2 and V4 chips on our
> demonstrator said it all: "S2 hot! V4 cool..."
A very scientific test! Let's do some quick math here... Even if you found some demo with a 1W VccInt difference, this should only translate to a ~10 C difference in chip temperature (still air, no heat sink on 2S60 --> Theta-JA = 10.4 C/W), which would hardly be discernible to the touch.

Why was this demo so much hotter to the touch then? My educated guess (based on the analysis of one of our customers) is that you had unequal I/O settings, causing lots more I/O dissipation in our chip. Really, that is rather low.

Regards,

Paul Leventis
Altera Corp.
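The quick math above is just temperature rise = extra power times Theta-JA. A minimal Python sketch of that arithmetic, using only the 10.4 C/W figure quoted in the post (the function and constant names are made up for illustration):

```python
# Temperature rise from extra dissipated power: delta_T = P * Theta-JA.
# 10.4 C/W is the still-air, no-heat-sink 2S60 figure quoted in the post.
THETA_JA_C_PER_W = 10.4


def temperature_rise_c(extra_power_w: float,
                       theta_ja_c_per_w: float = THETA_JA_C_PER_W) -> float:
    """Return the chip temperature rise in degrees C for extra power in watts."""
    return extra_power_w * theta_ja_c_per_w


if __name__ == "__main__":
    # A 1 W VccInt difference, as in the post.
    print(f"{temperature_rise_c(1.0):.1f} C")
```

With a heat sink or airflow, Theta-JA drops and the same 1 W produces a smaller rise, so the still-air case is the pessimistic one.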
Reply by Austin Lesea May 17, 2005
Paul,

Yes, you can get the fastest speed grade.  Really a cheap shot, that 
one.  I sense some real desperation.

And, stop with the low-K dielectric.  All of the Toshiba parts are low 
K.  Guess what?  We do not speed grade or power grade them differently, 
because it just doesn't make that much of a difference!

Perhaps an ASIC can take proper advantage of low K, but the FPGAs just 
do not show much of an improvement at all.

And stop with the power "advantages of S2."

The Japanese engineer who touched the S2 and V4 chips on our 
demonstrator said it all:  "S2 hot! V4 cool..."

Austin
Reply by Mike Treseler May 17, 2005
Paul Leventis (at home) wrote:

> You make good points. If you need 66, what does it matter if you get 70 vs.
> 75? The problem is at the time most customers select a part, they do not
> have a complete (or even partial) design.
True. Picking an FPGA footprint can defer to the simulation of its contents. Making a board used to be a high risk, critical path, long lead time task. Four or five spins was the norm. The fpga was a detail and getting the software guys something to play with was the priority. Today boards have fewer parts with more balls, and making a board is not such a big deal. There is little reason to make a quick board for the software guys because all the interesting registers are in the fpga. HDL simulation is a critical path item.
> In my mind, speed matters most as a time-saving feature. If the CAD tools
> and chip you are using enable you to hit your performance requirements using
> plain, architecture-agnostic HDL, push-button in the CAD tools, you've saved
> yourself a bundle of hurt.
Amen to that.

-- Mike Treseler
Reply by Paul Leventis (at home) May 17, 2005
> Warning: Ranty, opinionated (and quite probably wrong):
Those are the best kind of posts...
> How much does performance really matter?
You make good points. If you need 66, what does it matter if you get 70 vs. 75?

The problem is at the time most customers select a part, they do not have a complete (or even partial) design. You know your MHz requirement, but have no idea if you will hit it. If you select a faster part, you are more *likely* to hit your Fmax target. How much more likely? It's hard to say. But consider the downside to missing performance. At best, you have to push the tools, or floorplan, or re-pipeline, or restructure your HDL. At worst, you need to respin your board, select a new product, maybe get a faster speedgrade, or change other aspects of your system design to accommodate a lower clock speed. All of this costs time, and time-to-market is one of the big FPGA sales points.

Not all clock domains are defined by external requirements. Sometimes the faster you can run your core, the better the performance of your system (example -- graphics processor) even though your bus and memory speeds are still the same. Also, if you get fast enough in your internal clock domains, you might be able to cut the data width or multiplicity of your internal logic, allowing you to migrate into a smaller (and thus cheaper) part.

In my mind, speed matters most as a time-saving feature. If the CAD tools and chip you are using enable you to hit your performance requirements using plain, architecture-agnostic HDL, push-button in the CAD tools, you've saved yourself a bundle of hurt. It's interesting -- we see the results of having fast chips and good out-of-the-box software performance, as these features translate into lower support costs of the "I need help meeting my timing" variety.

Having speed is not enough. We have to have the features you need, but not so many as to exceed your cost requirements (is a feature that costs 3% but is only used by 1% of designers worth it?). Our software and support have to be up to your needs. And so on.

But that won't stop us from discussing speed, or power, or SI in isolation of these other design requirements.

Regards,

Paul Leventis
Altera Corp.
Reply by Paul Leventis (at home) May 17, 2005
> Then there is the interconnect. V4 is 500 ps faster for full chip routes,
> 400 ps faster for 1/2 chip routes, 100-200 ps faster for a few CLBs, LABs,
> and 100-200ps for neighbor routes. Some very short routes are 30ps better
> in S2.
I would guess that you did not normalize to take into account packing density. How do you define a "short" route? Do you multiply the # of CLBs and # of LABs by the right ratio of logic? I'd argue that 1 LAB = 8 ALMs = ~10-10.5 slices (based on our density analysis). Anyway, the average distance of a hop in a critical path is roughly 3 LABs, so short connections are the most important. Our data shows a performance advantage in hops of this length.
> Of course, anything you can direct into the DSP48s will just scream, and
> outperform anything S2 has.
That's interesting... did you miss the news that we've increased Stratix II DSP performance to 550 MHz in Quartus II 5.0? Not to mention that the S2 DSP can do 36-bit multiplies in hardware (vs. 18-bit for DSP48)... but I will not digress into a feature pissing contest.
> I think that the newsgroup here will basically tell you to try a design in
> both architectures, and play with the constraints to see how well it does.
On this, I agree with Austin. Kick the tires. Just be sure to set timing constraints before doing so, and also make sure not to use "toy" designs (neither tool is particularly well optimized for very small designs in very large chips). And beware numerical noise -- placement & routing is a heuristic. If you perturb any aspect of the input, the output can change due to random differences in algorithm outcome.

Regards,

Paul Leventis
Altera Corp.
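One way to guard against the numerical noise mentioned above is to compile the same design several times with small perturbations (different seeds, say) and check whether the gap between two flows is bigger than the run-to-run spread. The sketch below is only an illustration: the Fmax numbers are invented, the function names are hypothetical, and the "two standard deviations" threshold is an arbitrary rule of thumb rather than anything from either vendor's tools.

```python
from statistics import mean, stdev


def summarize(fmax_mhz):
    """Return (mean, sample standard deviation) of a list of Fmax results."""
    return mean(fmax_mhz), stdev(fmax_mhz)


def gap_exceeds_noise(runs_a, runs_b, sigmas=2.0):
    """True if the mean Fmax gap is larger than `sigmas` times the larger spread.

    This is a crude heuristic (an assumption of this sketch), not a formal
    statistical test.
    """
    mean_a, sd_a = summarize(runs_a)
    mean_b, sd_b = summarize(runs_b)
    noise = max(sd_a, sd_b)
    return abs(mean_a - mean_b) > sigmas * noise


if __name__ == "__main__":
    # Hypothetical Fmax results (MHz) from five perturbed compiles per flow.
    flow_a = [201.0, 195.0, 208.0, 199.0, 204.0]
    flow_b = [203.0, 197.0, 210.0, 200.0, 198.0]
    print("gap exceeds noise:", gap_exceeds_noise(flow_a, flow_b))
```

A single compile per flow tells you very little if the spread between perturbed runs is several MHz; the comparison only means something once the gap clears that spread.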
Reply by Paul Leventis (at home) May 17, 2005
Hi Joseph,

First, I must stress that comparing "micro parameters" is difficult at best 
and dangerous at worst.  There are fairly arbitrary decisions made during 
timing modeling about where you lump various delays.  For example, where 
does "LUT delay" begin and end -- is it at the output of the 1st stage 
buffer after the multiplexor before the LUT?  Or is that multiplexor's delay 
included as part of LUT delay?

The Stratix/Stratix II/Cyclone/Cyclone II/Max II timing models are 
sufficiently complicated that there is little point to making datasheet 
entries for various internal timing parameters.  For example, the ALM is 
fairly complicated and depending on how your logic is synthesized and 
exactly how the router chooses to hook it up, your delay can vary 
considerably.  So your best bet is to look at real circuits with real timing 
constraints, since Quartus II will do its best to put the critical signals 
on the fastest paths.  That said...

As some posters have already pointed out, RAM speeds have increased in 
Quartus 5.0.  The latest comparison I've seen shows us with a Tco advantage 
vs. Virtex-4 when the RAM output registers are used, and a slight 
disadvantage when the RAM is unregistered -- in either case a few hundred ps 
difference.

As for LUT delays, here are the latest numbers I've got for a fastest speed
grade 7-input LUT (the ALM can do some functions of 7 inputs, and all functions
of 6 inputs), as well as for a 4-LUT (the ALM can do two independent 4-LUTs).

Input    7-LUT    4-LUT
A        378 ps   366 ps
B        357 ps   228 ps
C        240 ps   225 ps
D        240 ps   53 ps
E        144 ps
F         53 ps
G        234 ps

According to Austin's post, Virtex-4 (fastest speed grade -- I dare you to 
try to buy one ;-)) shows 165 ps across-the-board (seems bogus to me, but 
what do I know).  So which LUT is faster based on this data?  Well, it 
depends on how we lumped our delays into logic vs. routing (see above).  It 
also depends on how often Quartus II will manage to route your critical 
signal on the fast LUT inputs -- usually it does a very good job of this.
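To make the "it depends on which input you land on" point concrete, here is a small Python sketch that takes the per-input delays from the table above and lists which inputs come in under the 165 ps figure quoted for Virtex-4. The names are made up for illustration, and the comparison inherits every caveat above about how delays are lumped into logic vs. routing.

```python
# Per-input LUT delays in ps, copied from the table in this post.
LUT7_DELAY_PS = {"A": 378, "B": 357, "C": 240, "D": 240, "E": 144, "F": 53, "G": 234}
LUT4_DELAY_PS = {"A": 366, "B": 228, "C": 225, "D": 53}

# Flat LUT delay quoted for Virtex-4 in Austin's post.
REFERENCE_PS = 165


def inputs_faster_than(delays_ps, reference_ps):
    """Return the input names whose delay is below the reference, in table order."""
    return [name for name, delay in delays_ps.items() if delay < reference_ps]


if __name__ == "__main__":
    print("7-LUT inputs under reference:", inputs_faster_than(LUT7_DELAY_PS, REFERENCE_PS))
    print("4-LUT inputs under reference:", inputs_faster_than(LUT4_DELAY_PS, REFERENCE_PS))
```

Whether the critical signal actually ends up on one of those fast inputs is up to the router, which is why whole-design results matter more than this kind of table lookup.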

The other critical component for logic fabric performance is the routing. 
Based on an analysis of routing delay between registers placed a varying 
distance apart in the X- and Y-directions, we've found that we have a ~20% 
delay advantage (fastest speed grade vs. fastest speed grade).  Of course, 
even this type of study has its caveats -- how do you normalize distance to 
take into account differences in logic density?

Stratix II employs a low-k inter-metal dielectric (k = 2.9) vs. Virtex-4's
"reduced-k" dielectric (k = 3.6), giving us a ~20% metal capacitance
advantage.  If you set aside architectural and circuit differences, to first 
order you'd expect this to translate into a performance advantage for 
Stratix II.
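The ~20% figure follows from the ratio of the two k values, assuming (to first order, and as a simplification made for this sketch) that metal capacitance scales linearly with k for the same wire geometry. A short Python check, with illustrative names:

```python
# Dielectric constants quoted in this post.
K_LOW = 2.9      # Stratix II low-k inter-metal dielectric
K_REDUCED = 3.6  # Virtex-4 "reduced-k" dielectric


def capacitance_reduction(k_new: float, k_old: float) -> float:
    """Fractional drop in capacitance, assuming C is proportional to k
    with identical geometry (a first-order assumption)."""
    return 1.0 - k_new / k_old


if __name__ == "__main__":
    reduction = capacitance_reduction(K_LOW, K_REDUCED)
    print(f"metal capacitance reduction: {reduction:.1%}")
```

Real interconnect also sees fringing and coupling through other layers, so the achieved reduction on silicon need not match this ratio exactly.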

Regards,

Paul Leventis
Altera Corp.