Jim,
Yes, they are marked so they can be identified easily.
For qualification we advise customers to obtain parts from both fabs (i.e.,
request them) so they do not have to do qualification more than once.
Of course, customers are free to request parts from a single vendor,
although that may subject them to supply issues if they are not planning
far enough ahead.
Austin
Jim Granville wrote:
> Austin Lesea wrote:
>
>>
>> Two fabs: It is a challenge, but then having two qualified sources of
>> supply is a definite advantage for our customers.
>
>
> Hmm.. Do these parts have different label codes, so users can
> readily identify which FAB they came from ?
> This could be an advantage to some, but to others it might mean having
> to double-up on the qualification work, or worse...
>
> -jg
>
Reply by Jim Granville●May 17, 2005
Austin Lesea wrote:
>
> Two fabs: It is a challenge, but then having two qualified sources of
> supply is a definite advantage for our customers.
Hmm.. Do these parts have different label codes, so users can
readily identify which FAB they came from ?
This could be an advantage to some, but to others it might mean having
to double-up on the qualification work, or worse...
-jg
Reply by Paul Leventis●May 17, 2005
> Low-K: Don't get me wrong, I like low K, I like low pin capacitance too.
> I also like fine wine, and a good meal. I had already asked you to
> fab the S2 without low-K and measure it. We did that for V2 and V2P,
> and again for V4 at Toshiba and UMC. We know. You guess.
Do you like Spice too? Try taking a routing path and re-spicing with
20% lower metal capacitance. Either your chips have no metal in them
or you'll see an improvement in delay with low-k.
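A rough stand-in for the re-spicing experiment is a first-order Elmore delay estimate. The sketch below is not SPICE and uses made-up driver, wire, and load values (all of them assumptions, not device data); it only shows that cutting metal capacitance by 20% shortens the delay of any path that has wire in it.

```python
def elmore_delay(r_driver, r_wire, c_wire, c_load):
    """Elmore delay (seconds) of a driver, a distributed RC wire, and a lumped load."""
    # The driver resistance charges the whole wire plus the load;
    # the distributed wire sees half of its own capacitance plus the load.
    return r_driver * (c_wire + c_load) + r_wire * (c_wire / 2.0 + c_load)

# Illustrative values only (assumptions, not taken from any datasheet).
R_DRIVER = 500.0   # ohms
R_WIRE = 200.0     # ohms
C_WIRE = 200e-15   # farads
C_LOAD = 20e-15    # farads

base = elmore_delay(R_DRIVER, R_WIRE, C_WIRE, C_LOAD)
low_k = elmore_delay(R_DRIVER, R_WIRE, 0.8 * C_WIRE, C_LOAD)  # 20% less metal capacitance
improvement = 1.0 - low_k / base

print(f"baseline {base * 1e12:.1f} ps, low-k {low_k * 1e12:.1f} ps, "
      f"improvement {improvement:.1%}")
```

The improvement comes out below 20% because the load capacitance does not scale with the dielectric, but it is never zero unless the wire capacitance is.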
And yes, we have measured the difference in silicon. We fabbed Stratix
in FSG and low-k; this was part of our qualification and testing of
low-k. We didn't ever ship these low-k Stratix devices because we had
sufficient yield into our fast devices. But we measured a performance
advantage matching our expectations.
> Low power: What is low, is our power dissipation. The static leakage
> kills you folks as the part gets hot. And what FPGA in the high end
> isn't running hot? Yours just run even hotter due to the leakage (or
> require more expensive heatsink solutions). This one is so easy to
> prove it is silly for you to even try to compete on total power.
At worst we're talking about a 1W difference in a 2S180-sized part, for
worst-case leakage. How much dynamic power is being consumed in a chip
that size? In the vast majority of applications it will be a fair bit
-- somewhere in the 5-10W vicinity wouldn't surprise me. Our dynamic
power advantage on logic/routing, RAMs, DSPs, and especially I/Os will
cover the difference in static power.
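To put the break-even point in numbers: with a 1W static penalty and 5-10W of dynamic power (the figures quoted above), the dynamic saving needed to cancel the penalty is just the ratio of the two. A minimal sketch, where the function name is purely illustrative:

```python
def break_even_dynamic_saving(static_penalty_w, dynamic_power_w):
    """Fraction of dynamic power that must be saved to offset a static-power penalty."""
    return static_penalty_w / dynamic_power_w

STATIC_PENALTY_W = 1.0  # worst-case leakage difference quoted in the post

for dynamic_w in (5.0, 10.0):  # dynamic power range guessed in the post
    saving = break_even_dynamic_saving(STATIC_PENALTY_W, dynamic_w)
    print(f"{dynamic_w:.0f} W dynamic: need a {saving:.0%} dynamic saving to break even")
```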
And where are those V4 worst-case leakage specs? It seems that you
don't really have a handle on static power if a year after a product
introduction you still don't know how bad it can be.
BTW, did you notice that the 2S60/LX60 devices used in your recent net
seminar had the same static power (extrapolate from the dynamic power
data)? And I love how you imply that our 2S90 chip you tested is
out-of-spec on leakage, when in fact it falls between our "typical" and
"worst-case" spec.
Regards,
Paul Leventis
Altera Corp.
Reply by Austin Lesea●May 17, 2005
Paul,
I am sure the newsgroup is getting really bored with this. I certainly
am. Short and sweet:
Two fabs: It is a challenge, but then having two qualified sources of
supply is a definite advantage for our customers.
Low-K: Don't get me wrong, I like low K, I like low pin capacitance too.
I also like fine wine, and a good meal. I had already asked you to
fab the S2 without low-K and measure it. We did that for V2 and V2P,
and again for V4 at Toshiba and UMC. We know. You guess.
Low power: What is low, is our power dissipation. The static leakage
kills you folks as the part gets hot. And what FPGA in the high end
isn't running hot? Yours just run even hotter due to the leakage (or
require more expensive heatsink solutions). This one is so easy to
prove it is silly for you to even try to compete on total power.
Austin
Reply by Paul Leventis●May 17, 2005
> And, stop with the low-K dielectric. All of the Toshiba parts are low
> K. Guess what? We do not speed grade or power grade them differently,
> because it just doesn't make that much of a difference!
So the long delay in getting the -12 speed grade out had nothing to do
with this fab transition? It must be fun characterizing one product
produced in two fabs with two different processes (one low-k, one not,
and who knows what else is different).
> Perhaps an ASIC can take proper advantage of low K, but the FPGAs just
> do not show much of an improvement at all.
I wish we had this "defy the laws of physics" technology you use on
Virtex-4. First you claim your devices do not draw more current with
increased voltage. Then you claim that increased metal capacitance has
no impact on speed or power. I'm waiting for you to claim that I/O pin
capacitance doesn't matter for performance, signal integrity or
power...
> The Japanese engineer who touched the S2 and V4 chips on our
> demonstrator said it all: "S2 hot! V4 cool..."
A very scientific test! Let's do some quick math here... Even if you
found some demo with a 1W VccInt difference, this should only translate
to ~10 C difference in chip temperature (still air, no heat sink on
2S60 --> Theta-JA = 10.4 C/W), which would hardly be discernable to the
touch. Why was this demo so much hotter to the touch then? My
educated guess (based on the analysis of one of our customers) is that
you had unequal I/O settings, causing lots more I/O dissipation in our
chip. Really, that is rather low.
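The quick math above is just power times thermal resistance. A minimal sketch using only the two numbers quoted in the post (1W difference, Theta-JA = 10.4 C/W for a 2S60 in still air with no heat sink):

```python
THETA_JA_C_PER_W = 10.4  # from the post: 2S60, still air, no heat sink

def junction_rise_c(power_w, theta_ja=THETA_JA_C_PER_W):
    """Temperature rise (deg C) above ambient for a given power dissipation."""
    return power_w * theta_ja

delta = junction_rise_c(1.0)  # 1 W VccInt difference
print(f"{delta:.1f} C rise for a 1 W difference")
```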
Regards,
Paul Leventis
Altera Corp.
Reply by Austin Lesea●May 17, 2005
Paul,
Yes, you can get the fastest speed grade. Really a cheap shot, that
one. I sense some real desperation.
And, stop with the low-K dielectric. All of the Toshiba parts are low
K. Guess what? We do not speed grade or power grade them differently,
because it just doesn't make that much of a difference!
Perhaps an ASIC can take proper advantage of low K, but the FPGAs just
do not show much of an improvement at all.
And stop with the power "advantages of S2."
The Japanese engineer who touched the S2 and V4 chips on our
demonstrator said it all: "S2 hot! V4 cool..."
Austin
Reply by Mike Treseler●May 17, 2005
Paul Leventis (at home) wrote:
> You make good points. If you need 66, what does it matter if you get 70 vs.
> 75? The problem is at the time most customers select a part, they do not
> have a complete (or even partial) design.
True. Picking an FPGA footprint can defer to
the simulation of its contents.
Making a board used to be a high risk, critical path,
long lead time task. Four or five spins was the norm.
The fpga was a detail and getting the software guys
something to play with was the priority.
Today boards have fewer parts with more balls,
and making a board is not such a big deal.
There is little reason to make a quick board
for the software guys because all the interesting
registers are in the fpga. HDL simulation is
a critical path item.
> In my mind, speed matters most as a time-saving feature. If the CAD tools
> and chip you are using enable you to hit your performance requirements using
> plain, architecture-agnostic HDL, push-button in the CAD tools, you've saved
> yourself a bundle of hurt.
Amen to that.
-- Mike Treseler
Reply by Paul Leventis (at home)●May 17, 2005
You make good points. If you need 66, what does it matter if you get 70 vs.
75? The problem is at the time most customers select a part, they do not
have a complete (or even partial) design. You know your MHz requirement,
but have no idea if you will hit it. If you select a faster part, you are
more *likely* to hit your Fmax target. How much more likely? It's hard to
say.
But consider the downside to missing performance. At best, you have to push
the tools, or floorplan, or re-pipeline, or restructure your HDL. At worst,
you need to respin your board, select a new product, maybe get a faster
speedgrade, or change other aspects of your system design to accommodate a
lower clock speed. All of this costs time, and time-to-market is one of the
big FPGA sales points.
Not all clock domains are defined by external requirements. Sometimes the
faster you can run your core, the better the performance of your system
(example -- graphics processor) even though your bus and memory speeds are
still the same. Also, if you get fast enough in your internal clock
domains, you might be able to cut the data width or multiplicity of your
internal logic, allowing you to migrate into a smaller (and thus cheaper)
part.
In my mind, speed matters most as a time-saving feature. If the CAD tools
and chip you are using enable you to hit your performance requirements using
plain, architecture-agnostic HDL, push-button in the CAD tools, you've saved
yourself a bundle of hurt. It's interesting -- we see the results of having
fast chips and good out-of-the-box software performance, as these features
translate into lower support costs of the "I need help meeting my timing"
variety.
Having speed is not enough. We have to have the features you need, but not
so many as to exceed your cost requirements (is a feature that costs 3% but
is only used by 1% of designers worth it?). Our software and support have
to be up to your needs. And so on. But that won't stop us from discussing
speed, or power, or SI in isolation of these other design requirements.
Regards,
Paul Leventis
Altera Corp.
Reply by Paul Leventis (at home)●May 17, 2005
> Then there is the interconnect. V4 is 500 ps faster for full chip routes,
> 400 ps faster for 1/2 chip routes, 100-200 ps faster for a few CLBs, LABs,
> and 100-200ps for neighbor routes. Some very short routes are 30ps better
> in S2.
I would guess that you did not normalize to take into account packing
density. How do you define a "short" route? Do you multiply the # of CLBs
and # of LABs by the right ratio of logic? I'd argue that 1 LAB = 8 ALMs =
~10-10.5 slices (based on our density analysis).
Anyway, the average distance of a hop in a critical path is roughly 3 LABs,
so short connections are the most important. Our data shows a performance
advantage in hops of this length.
> Of course, anything you can direct into the DSP48s will just scream, and
> outperform anything S2 has.
That's interesting... did you miss the news that we've increased Stratix II
DSP performance to 550 MHz in Quartus II 5.0? Not to mention that the S2
DSP can do 36-bit multiplies in hardware (vs. 18-bit for DSP48)... but I
will not digress into a feature pissing contest.
> I think that the newsgroup here will basically tell you to try a design in
> both architectures, and play with the constraints to see how well it does.
On this, I agree with Austin. Kick the tires. Just be sure to set timing
constraints before doing so, and also make sure not to use "toy" designs
(neither tool is particularly well optimized for very small designs in very
large chips). And beware numerical noise -- placement & routing is a
heuristic. If you perturb any aspect of the input, the output can change
due to random differences in algorithm outcome.
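One crude way to guard against that noise is to run each tool several times with slightly perturbed inputs and check whether the gap between the averages is larger than the run-to-run spread. The sketch below is only a rule of thumb, not a proper statistical test, and the Fmax numbers in it are hypothetical placeholders rather than measured results.

```python
from statistics import mean, stdev

def summarize(fmax_mhz):
    """Return (mean, sample standard deviation) of a list of Fmax results."""
    return mean(fmax_mhz), stdev(fmax_mhz)

def difference_exceeds_noise(a, b, n_sigma=2.0):
    """True if the gap between the means is larger than n_sigma times the larger spread."""
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    return abs(mean_a - mean_b) > n_sigma * max(sd_a, sd_b)

# Hypothetical Fmax results (MHz) from five perturbed runs per tool; not real data.
tool_a = [201.0, 197.5, 204.2, 199.1, 202.3]
tool_b = [203.4, 198.8, 200.9, 205.0, 199.6]

print("difference is meaningful:", difference_exceeds_noise(tool_a, tool_b))
```

With placeholder numbers like these the gap between the two averages is well inside the spread, so a single run of each tool would tell you nothing.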
Regards,
Paul Leventis
Altera Corp.
Reply by Paul Leventis (at home)●May 17, 2005
Hi Joseph,
First, I must stress that comparing "micro parameters" is difficult at best
and dangerous at worst. There are fairly arbitrary decisions made during
timing modeling about where you lump various delays. For example, where
does "LUT delay" begin and end -- is it at the output of the 1st stage
buffer after the multiplexor before the LUT? Or is that multiplexor's delay
included as part of LUT delay?
The Stratix/Stratix II/Cyclone/Cyclone II/Max II timing models are
sufficiently complicated that there is little point to making datasheet
entries for various internal timing parameters. For example, the ALM is
fairly complicated and depending on how your logic is synthesized and
exactly how the router chooses to hook it up, your delay can vary
considerably. So your best bet is to look at real circuits with real timing
constraints, since Quartus II will do its best to put the critical signals
on the fastest paths. That said...
As some posters have already pointed out, RAM speeds have increased in
Quartus 5.0. The latest comparison I've seen shows us with a Tco advantage
vs. Virtex-4 when the RAM output registers are used, and a slight
disadvantage when the RAM is unregistered -- in either case a few hundred ps
difference.
As for LUT delays, here are the latest numbers I've got for a fastest speed
grade 7-input LUT (the ALM can do some functions of 7 inputs, and all
functions of 6 inputs), as well as for a 4-LUT (the ALM can do two
independent 4-LUTs).
Input   7-LUT    4-LUT
A       378 ps   366 ps
B       357 ps   228 ps
C       240 ps   225 ps
D       240 ps    53 ps
E       144 ps
F        53 ps
G       234 ps
According to Austin's post, Virtex-4 (fastest speed grade -- I dare you to
try to buy one ;-)) shows 165 ps across-the-board (seems bogus to me, but
what do I know). So which LUT is faster based on this data? Well, it
depends on how we lumped our delays into logic vs. routing (see above). It
also depends on how often Quartus II will manage to route your critical
signal on the fast LUT inputs -- usually it does a very good job of this.
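To see why the answer depends on which input carries the critical signal, the per-input numbers from the table above can be compared against the single 165 ps figure. This is only a summary of the numbers already quoted in this thread; the variable names are illustrative.

```python
from statistics import mean

# Per-input delays (ps) from the table in this post, fastest speed grade.
LUT7_PS = {"A": 378, "B": 357, "C": 240, "D": 240, "E": 144, "F": 53, "G": 234}
LUT4_PS = {"A": 366, "B": 228, "C": 225, "D": 53}
V4_LUT_PS = 165  # across-the-board figure attributed to Austin's post

for name, table in (("7-LUT", LUT7_PS), ("4-LUT", LUT4_PS)):
    # Inputs that beat the single quoted Virtex-4 number.
    faster = sorted(pin for pin, delay in table.items() if delay < V4_LUT_PS)
    print(f"{name}: min {min(table.values())} ps, max {max(table.values())} ps, "
          f"mean {mean(table.values()):.0f} ps, inputs under {V4_LUT_PS} ps: {faster}")
```

The spread between the fastest and slowest input is several times larger than the gap to 165 ps, which is why the router's choice of input matters more than any single headline number.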
The other critical component for logic fabric performance is the routing.
Based on an analysis of routing delay between registers placed a varying
distance apart in the X- and Y-directions, we've found that we have a ~20%
delay advantage (fastest speed grade vs. fastest speed grade). Of course,
even this type of study has its caveats -- how do you normalize distance to
take into account differences in logic density?
Stratix II employs a low-k inter-metal dielectric (k = 2.9) vs. Virtex-4's
"reduced-k" dielectric (k = 3.6), giving us a ~20% metal capacitance
advantage. If you set aside architectural and circuit differences, to first
order you'd expect this to translate into a performance advantage for
Stratix II.
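The ~20% figure follows directly from the two k values if you assume identical wire geometry, so that metal capacitance scales linearly with the dielectric constant (a first-order assumption):

```python
K_STRATIX_II = 2.9  # low-k dielectric constant quoted in the post
K_VIRTEX_4 = 3.6    # "reduced-k" dielectric constant quoted in the post

# Assumes capacitance is proportional to k for the same metal geometry.
reduction = 1.0 - K_STRATIX_II / K_VIRTEX_4
print(f"metal capacitance reduction: {reduction:.1%}")
```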
Regards,
Paul Leventis
Altera Corp.