
Comp.Arch.FPGA | V4 vs. Stratix-II...

There are 28 messages in this thread.

You are currently looking at messages 0 to 10.

V4 vs. Stratix-II... - Joseph H Allen - 2005-05-12 16:08:00

I'm upgrading a design, and I'm in the early phases of choosing a vendor.
I'm trying to compare parts based on experience I've had in the past, so I'm
focusing on block RAM clock to out delay as a critical performance number:

Altera M4K vs. Xilinx Block RAM clock to out delay, non-registered outputs:

Stratix-II -3	2.46 ns
Stratix-II -4	2.828 ns
Stratix-II -5	3.393 ns

Xilinx-V4 -11	1.83 ns
Xilinx-V4 -10	2.10 ns

Xilinx-V2 -4	2.65 ns (current part)

V4 appears to be 1.62 times faster than Stratix-II for the slowest speed grade
parts (which I'm probably most interested in, though I should really compare
equal priced parts), and Stratix-II looks slower even than the original V2
design.  Am I missing something?  Several posts here suggest that Stratix-II
interconnect is faster; is there any datasheet evidence to back this up?  Let's
say the RAM output is at least feeding a 2:1 MUX before being registered, and
probably has to travel ~1/3 the width of the chip.
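For reference, the ratios behind that comparison can be reproduced from the table above. This is only a sketch: the delays are the datasheet figures quoted in this post, and the helper name `ratio` is made up for illustration.

```python
# Block RAM clock-to-out delays in ns, copied from the table in this post.
TCO_NS = {
    "Stratix-II -3": 2.46,
    "Stratix-II -4": 2.828,
    "Stratix-II -5": 3.393,
    "Xilinx-V4 -11": 1.83,
    "Xilinx-V4 -10": 2.10,
    "Xilinx-V2 -4": 2.65,
}


def ratio(slower: str, faster: str) -> float:
    """Return how many times longer the `slower` part's clock-to-out is."""
    return TCO_NS[slower] / TCO_NS[faster]


# Slowest speed grades: Stratix-II -5 against V4 -10.
slowest = ratio("Stratix-II -5", "Xilinx-V4 -10")
# Fastest speed grades: Stratix-II -3 against V4 -11.
fastest = ratio("Stratix-II -3", "Xilinx-V4 -11")
# Slowest Stratix-II against the current V2 -4 part.
vs_v2 = ratio("Stratix-II -5", "Xilinx-V2 -4")

print(f"slowest grades: {slowest:.2f}x")
print(f"fastest grades: {fastest:.2f}x")
print(f"S2 -5 vs V2 -4: {vs_v2:.2f}x")
```

These are clock-to-out numbers only; they say nothing about the routing or LUT delay that follows the RAM output.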

Also, help me fill in my chart:

LUT delay:

Xilinx-V2 -4	439 ps
Xilinx-V4 -10	200 ps
Xilinx-V4 -11	170 ps
Stratix-II	? (can't find any data)

Carry delay:

Xilinx-V2 -4	106 ps
Xilinx-V4 -10	90 ps
Xilinx-V4 -11	80 ps
Stratix-II	? (can't find any data)

Routing delay:

I can do this with fpga_editor in Xilinx.  How to do it for Stratix-II ?

-- 
/*  j...@world.std.com (192.74.137.5) */               /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}



Re: V4 vs. Stratix-II... - Ben Twijnstra - 2005-05-12 17:19:00

Hi Joseph,

I stopped reading data sheets since they're way too big and the information
is never organized the way I need to have it. So I tend to simply write
little test cases and let the tools tell me what I need to know.

I would personally just compile the design with your new constraints in both
ISE and Quartus II (v5 has just been released) and see who comes out best.

> Altera M4K vs. Xilinx Block RAM clock to out delay, non-registered
> outputs:
> 
> Stratix-II -3 2.46 ns
> Stratix-II -4 2.828 ns
> Stratix-II -5 3.393 ns
> 
> Xilinx-V4 -11 1.83 ns
> Xilinx-V4 -10 2.10 ns
> 
> Xilinx-V2 -4  2.65 ns (current part)

I suggest you re-check Stratix-II timing with Quartus II 5.0 - Altera has
been doing some re-characterization which seemingly hasn't made it to the
handbook yet. In an M4K I am using in a Stratix II I'm getting 1.85ns for a
-3 part and 2.4ns for a -5 part.
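As a rough sketch of how much those re-characterized figures would change the picture, the ratios against V4 can be recomputed. The Quartus II 5.0 values are the two single-instance measurements quoted above, not handbook numbers, and pairing -3 with -11 and -5 with -10 as "fastest" and "slowest" is an assumption carried over from the original post.

```python
# Clock-to-out delays in ns. V4 and handbook values are from the original
# post; the Quartus II 5.0 values are the two measurements quoted above.
V4_NS = {"fastest": 1.83, "slowest": 2.10}            # V4 -11, V4 -10
S2_HANDBOOK_NS = {"fastest": 2.46, "slowest": 3.393}  # Stratix-II -3, -5
S2_QUARTUS5_NS = {"fastest": 1.85, "slowest": 2.4}    # Stratix-II -3, -5


def ratios_vs_v4(s2_ns: dict) -> dict:
    """Stratix-II delay divided by V4 delay for each speed-grade pairing."""
    return {grade: s2_ns[grade] / V4_NS[grade] for grade in V4_NS}


handbook = ratios_vs_v4(S2_HANDBOOK_NS)
quartus5 = ratios_vs_v4(S2_QUARTUS5_NS)

for grade in V4_NS:
    print(f"{grade}: handbook {handbook[grade]:.2f}x, "
          f"Quartus II 5.0 {quartus5[grade]:.2f}x")
```

If those measurements hold up, the gap narrows from roughly 1.3x-1.6x to roughly 1.0x-1.1x, which is a good reason to re-run the timing in the current tools before deciding.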

> LUT Delay:
> Stratix-II    ? (can't find any data)

Well, it kind of varies between (off the cuff) 83ps and 400ps depending on
the input that changes and the mode the ALM is in. 

Easy to check in Quartus with, for example, an 8-input AND or so. I'm
getting cell delays between 0.047 and 0.404ns depending on the mode and the
input of the ALM (see below on how to do this).

> Carry delay:
> 
> Xilinx-V2 -4  106ps
> Xilinx-V4 -10 90 ps
> Xilinx-V4 -11 80 ps
> Stratix-II    ? (can't find any data)
>
> Routing delay:
> 
> I can do this with fpga_editor in Xilinx.  How to do it for Stratix-II ?

Open the timing analyzer. Right-click a path and select "List Paths" from
the menu. When expanding the messages in the status window you should get
detailed info on both cell and routing delay of the path.

Best regards,


Ben



Re: V4 vs. Stratix-II... - Peter Sommerfeld - 2005-05-12 17:38:00

Hi Joseph,

Remember that in Q II 5.0 the M4K performance has increased from 400 to
550 MHz. It looks like you're using the out-of-date numbers for tCO.
The new ones should be ~ 1.88 ns (I'm guessing).

There are a few ways to find the routing delays in Q II. The most
detailed way is to open the Timing Floorplanner (Assignments/Timing
Closure Floorplan), right-click a used logic cell, and choose
Locate>Chip Editor.

From here you can multi-select resources, choose View/Show Delays,
right-click, and choose "Generate Connections Between Nodes". You can
show the actual routes used with View/Highlight Routing.

The easier way is to stay in the Timing Floorplanner, Ctrl-click the
stuff you want to find delays for, make sure View/Routing/"Show Routing
Delays" is selected, and choose View/Routing/"Show Paths Between
Nodes".

Interesting ... the Stratix II handbook doesn't have LUT timing params.
I was sure they were there for Stratix. Well it shouldn't be too
difficult with Chip Editor ... maybe someone gets an answer before I do
...

-- Pete





Re: V4 vs. Stratix-II... - Austin Lesea - 2005-05-12 18:43:00

Joseph,

I just saw a presentation that shows that V4 is faster on all 
interconnect paths (by as much as 500 ps for long paths) except the 
immediate neighbor paths, where we are just ever so slightly slower than 
S2 neighbor paths.

I also saw LUT comparisons, which took 8 slides, with animations, as 
comparing the 4LUTs to the ALM-LUT is not trivial:  you have to look at 
each and every input to output delay.  And then you have to make a guess 
as to how your logic will get synthesized.  Yes, we are faster for 4 LUT 
(most inputs), and they are faster for wider functions (but not all 
inputs).

For example:  S2 4LUT input delays to output (in order): 155ps, 382ps, 
360ps, 275ps.  V4 4LUT:  165ps, 165ps, 165ps, 165ps. (fastest speed 
grades, both companies).
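To make the per-input comparison concrete, here is a small sketch that works only from the eight figures just quoted (fastest speed grades). The variable names are illustrative, and the mean is simple arithmetic over the four inputs, not a vendor metric.

```python
# 4-LUT input-to-output delays in ps, in input order, as quoted above.
S2_PS = [155, 382, 360, 275]
V4_PS = [165, 165, 165, 165]

# Positive difference means the S2 input is slower than the V4 input.
diffs = [s2 - v4 for s2, v4 in zip(S2_PS, V4_PS)]

s2_mean = sum(S2_PS) / len(S2_PS)
s2_best, s2_worst = min(S2_PS), max(S2_PS)

for i, d in enumerate(diffs):
    faster = "S2" if d < 0 else "V4"
    print(f"input {i}: {faster} faster by {abs(d)} ps")
print(f"S2 mean {s2_mean} ps, best {s2_best} ps, worst {s2_worst} ps")
```

Which of these numbers matters in a real design depends on which LUT input the critical path lands on, and that is decided by synthesis and place and route rather than by the datasheet.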

Then there is the interconnect.  V4 is 500 ps faster for full chip 
routes, 400 ps faster for 1/2 chip routes, 100-200 ps faster for a few 
CLBs, LABs, and 100-200ps for neighbor routes.  Some very short routes 
are 30ps better in S2.

Below 32 bits, S2 is slightly better for an adder, and over 32 bits, V4 
is better.  Same for carry chain, where S2 is ~ 200 ps better at ~ 16 
bits, and V4 is >500ps better at 48 bits, and longer carry chains (equal 
at 24 bits).

In our suite of test designs, we come out ~9% faster (on average) with a 
+/- 4% error margin.  Of course some designs will be faster than that, 
and some slower, too.  We generally favor wider arithmetic, and 
pipelining, where S2 favors empty designs, and small arithmetic 
functions.  We tend to excel when the design gets full, and complex 
(like it does at the end of your project!).

BRAM functionality depends a lot on the use of registers, as use of the 
fabric registers really slows things down (and takes more power) compared 
with using the registers built into the BRAM.  Of course, anything you can 
direct into the DSP48s will just scream, and outperform anything S2 has.

I think that the newsgroup here will basically tell you to try a design 
in both architectures, and play with the constraints to see how well it 
does.

Or, what I prefer, is to contact the FAEs of the respective companies, 
and ask them to show you how your design will perform (let them drive 
the tools).

Or, do both.

Austin


Re: V4 vs. Stratix-II... - Tommy Thorn - 2005-05-12 19:29:00

Austin Lesea wrote:
> Joseph,
> 
> I just saw a presentation that shows that V4 is faster on all 
> interconnect paths (by as much as 500 ps for long paths) except the 
> immediate neighbor paths, where we are just ever so slightly slower than 
> S2 neighbor paths.

...(lots of numbers deleted)...

Without detailing what you're comparing (i.e., which device at which 
speed grade) none of this is meaningful.

Tommy -- not affiliated with either of the fighting bulls.

Re: V4 vs. Stratix-II... - austin - 2005-05-12 21:40:00

Tommy,

I thought I was clear: fastest speed grade, S2 and V4.

Austin


Re: V4 vs. Stratix-II... - Jim Granville - 2005-05-12 23:34:00

Austin Lesea wrote:
<snip>
> For example:  S2 4LUT input delays to output (in order): 155ps, 382ps, 
> 360ps, 275ps.  V4 4LUT:  165ps, 165ps, 165ps, 165ps. (fastest speed 
> grades, both companies).

Since this is side-by-side, I was wondering why Xilinx specs all paths 
the same.

Is that actually the worst path, so that the SW is free to use any path?
[but your physical speed margin might change on a re-route]

Or is there really such a difference in the implementation that Xilinx's
end up precisely identical, and Altera's vary over 2:1?

-jg


Re: V4 vs. Stratix-II... - John M - 2005-05-13 09:46:00

Joseph,

I agree with Ben.  With so many variables and so much marketing B.S.,
your best bet is to compile using both a V4 and SII.  I've found that
performance is highly dependent on implementation, synthesis tools, and
how full the device is.  These are all variables outside of your FPGA
vendor selection.  You also note that you're probably going with the
slowest speed grade, so I assume cost is an issue.  A true comparison
cannot be made with cost included.  In addition, you should also
consider whether EasyPath for Xilinx or Hardcopy for Altera are
alternatives to help lower your cost.  Finally, I would like to make
one point about interconnect.  Who cares if V4 or SII is slightly
faster?  It's the routing software that is going to make the major
difference.  Whichever software requires me to do the least amount of
floorplanning is the one that wins.  Also, how well does the software
perform as the chip gets full?  Personally, I think the floorplanning
tools of ISE are easier to use than Quartus.  However, I think Quartus
does a much better job at placement and routing as a design gets very
full (>90% utilization).

John


Re: V4 vs. Stratix-II...fabric only thread...LUT details... - Austin Lesea - 2005-05-13 10:50:00

Jim,

I have been corrected by many.  No, they are not all the same (in the 
hardware, and as an IC designer, I already knew that).  However, in the 
past they were treated as all equal (for efficiency, finding and using 
the faster path is not necessarily a big benefit).

I do not know if the paths are treated the same or not (on the 4LUT) in 
V4 p&r.  I am sure someone will tell me (now).

I think the point I was trying to make is that the 4LUT is faster than 
the ALM for a class of functions (4 inputs or less), and slower for 
wider functions (on some pins).  So, the quality of the synthesis, 
followed by the place and route (constraints) will make a huge 
difference in the performance.

I have been told that every design that is better in S2 can, after some 
work, be made even better in V4.  I do not doubt that Altera can, and 
does, make the exact same claim.

I disagree that the ultimate (best) performance in S2 is better, as that 
is not what our research has shown.  Again, Altera has their own suite 
of XX designs that they use to benchmark their device, and they also 
make exactly the same claim.

Given the state of the marketing wars (see the "mine is...." thread), I 
think I'll stay safely in the engineering camp, and say:  if you are 
really adamant about comparing the two, go take your finished design, 
and run it through both design tools, and make your own decision.  Our 
FAEs are available to help you with that chore.

And please take into account that we offer:  DSP48, EMAC, PPC, FIFO-BRAM 
that can be used to even greater advantage.

Austin

Re: V4 vs. Stratix-II...fabric only thread...LUT details... - Rudolf Usselmann - 2005-05-13 12:07:00

Austin Lesea wrote:

> Jim,
> 
...
> 
> I disagree that the ultimate (best) performance in S2 is better, as that
> is not what our research has shown.  Again, Altera has their own suite
> of XX designs that they use to benchmark their device, and they also
> make exactly the same claim.

Austin,

to settle this argument once and for all, why not take a bunch
of designs that are freely available on OpenCores, and present
utilization and performance reports without doing any tweaking
of the designs? There are many VHDL and Verilog designs available
on OpenCores from CPUs, to Crypto cores to communication cores.

Both companies could present their own results including with
a script as to how to reproduce the results, in case somebody
wanted to double check.

If you could agree to do this for Xilinx, and perhaps we get a
volunteer from the Altera Camp, we can openly choose some designs ...

Best Regards,
rudi
=============================================================
Rudolf Usselmann,  ASICS World Services,  http://www.asics.ws
Your Partner for IP Cores, Design, Verification and Synthesis
