FPGARelated.com
Forums

Spartan 3A counter speed?

Started by Jon Elson March 23, 2012
Hello,

Does anybody have a very rough estimate of how fast
you can run a 32-bit counter in a Spartan 3AN FPGA?

Thanks,

Jon
[This followup was posted to comp.arch.fpga and a copy was sent to the 
cited author.]

In article <XtKdnbZLV6lQVPHSnZ2dnUVZ_oadnZ2d@giganews.com>, 
jmelson@wustl.edu says...
> Hello,
>
> Does anybody have a very rough estimate of how fast
> you can run a 32-bit counter in a Spartan 3AN FPGA?
>
> Thanks,
>
> Jon

The answer to your question will be had by reading the Xilinx Product
Data Sheets for the basic information which comes in a modular format
including timing for I/O, logic cells, routing and globally distributed
networks. The actual timing achievable will be determined by the
specific design of the counter and how that design is instantiated in
the part.

--
Michael Karas
Carousel Design Solutions
http://www.carousel-design.com

Michael Karas wrote:

> [This followup was posted to comp.arch.fpga and a copy was sent to the
> cited author.]
>
> In article <XtKdnbZLV6lQVPHSnZ2dnUVZ_oadnZ2d@giganews.com>,
> jmelson@wustl.edu says...
>>
>> Hello,
>>
>> Does anybody have a very rough estimate of how fast
>> you can run a 32-bit counter in a Spartan 3AN FPGA?
>>
>> Thanks,
>>
>> Jon
>
> The answer to your question will be had by reading the Xilinx Product
> Data Sheets for the basic information which comes in a modular format
> including timing for I/O, logic cells, routing and globally distributed
> networks. The actual timing achievable will be determined by the
> specific design of the counter and how that design is instantiated in
> the part.

Thanks, that's as close to a non-answer as you can get. The tricky
part is the carry chain for long counters, and they really don't
give you much info there, unless there's a secret manual I have
not been able to find.

Jon

On Mon, 26 Mar 2012 14:33:33 -0500
Jon Elson <jmelson@wustl.edu> wrote:

> Thanks, that's as close to a non-answer as you can get. The tricky
> part is the carry chain for long counters, and they really don't
> give you much info there, unless there's a secret manual I have
> not been able to find.
>
> Jon

If I recall correctly, carry chain propagation was on the order of 700
ps/2 bits, but don't quote me on that.

Part of the problem is that I don't think your question is answerable
in the general case of "Spartan 3A". Different device sizes may or may
not allow you to run 32-bits all on one carry chain. If you have to
use two columns instead of just one, the additional performance hit of
that next bit would be substantial.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.

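For a feel of what that recalled figure would imply, here is a back-of-the-envelope sketch in Python. The 700 ps per 2 bits value is only the from-memory number quoted above, and the function name is made up for illustration. It tallies the ripple portion alone and ignores clock-to-out, setup, and routing, so treat it as a crude estimate rather than a timing analysis.

```python
def carry_chain_ns(bits, ps_per_pair=700.0):
    """Ripple delay in ns if every 2-bit carry stage costs ps_per_pair picoseconds."""
    pairs = bits // 2          # one carry stage per pair of counter bits
    return pairs * ps_per_pair / 1000.0

# 32-bit counter at the recalled 700 ps per 2 bits (an unverified figure).
chain = carry_chain_ns(32)
print(f"{chain:.1f} ns of ripple, so at best about {1000.0 / chain:.0f} MHz")
```

Passing a different per-stage figure, for example the 0.130 ns Tbyp value that appears in the timing report posted later in the thread (`carry_chain_ns(32, 130.0)`), shows how strongly the answer depends on that one number.
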
Jon Elson <jmelson@wustl.edu> wrote:

(snip)
>> The answer to your question will be had by reading the Xilinx Product
>> Data Sheets for the basic information which comes in a modular format
>> including timing for I/O, logic cells, routing and globally distributed
>> networks. The actual timing achievable will be determined by the
>> specific design of the counter and how that design is instantiated in
>> the part.

> Thanks, that's as close to a non-answer as you can get. The tricky
> part is the carry chain for long counters, and they really don't
> give you much info there, unless there's a secret manual I have
> not been able to find.

Well, also it depends on how you use the counter. If you need to be
able to latch the bits from the counter, then the timing might
depend on that, and not the counter itself. (In race terms, to be
able to get lap times while the counter continues to run.)

-- glen

Rob Gaddi wrote:
> On Mon, 26 Mar 2012 14:33:33 -0500
> Jon Elson <jmelson@wustl.edu> wrote:
>
>> Thanks, that's as close to a non-answer as you can get. The tricky
>> part is the carry chain for long counters, and they really don't
>> give you much info there, unless there's a secret manual I have
>> not been able to find.
>>
>> Jon
>
> If I recall correctly, carry chain propagation was on the order of 700
> ps/2 bits, but don't quote me on that.
>
> Part of the problem is that I don't think your question is answerable
> in the general case of "Spartan 3A". Different device sizes may or may
> not allow you to run 32-bits all on one carry chain. If you have to
> use two columns instead of just one, the additional performance hit of
> that next bit would be substantial.

Well, it took me about 5 minutes to code up a simple project with
a 32-bit counter and enough registers to prevent other logic from
being the worst-case path. In a XC3S50AN-5 there are enough rows
to keep 32-bits in a single carry chain. With no constraints, the
design built and reported 4.402 ns minimum clock period (after
place & route) or about 227 MHz.

YMMV

-- Gabor

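The period-to-frequency step in that post is simple arithmetic. A minimal Python sketch, using only the 4.402 ns figure reported above (the helper name is made up for illustration):

```python
def period_ns_to_mhz(period_ns):
    """Maximum clock frequency in MHz for a minimum clock period given in ns."""
    return 1000.0 / period_ns

# 4.402 ns is the post-place-and-route minimum period reported in the post.
fmax = period_ns_to_mhz(4.402)
print(f"{fmax:.1f} MHz")
```
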
Gabor wrote:
> Rob Gaddi wrote:
>> On Mon, 26 Mar 2012 14:33:33 -0500
>> Jon Elson <jmelson@wustl.edu> wrote:
>>
>>> Thanks, that's as close to a non-answer as you can get. The tricky
>>> part is the carry chain for long counters, and they really don't
>>> give you much info there, unless there's a secret manual I have
>>> not been able to find.
>>>
>>> Jon
>>
>> If I recall correctly, carry chain propagation was on the order of 700
>> ps/2 bits, but don't quote me on that.
>>
>> Part of the problem is that I don't think your question is answerable
>> in the general case of "Spartan 3A". Different device sizes may or may
>> not allow you to run 32-bits all on one carry chain. If you have to
>> use two columns instead of just one, the additional performance hit of
>> that next bit would be substantial.
>>
>
> Well, it took me about 5 minutes to code up a simple project with
> a 32-bit counter and enough registers to prevent other logic from
> being the worst-case path. In a XC3S50AN-5 there are enough rows
> to keep 32-bits in a single carry chain. With no constraints, the
> design built and reported 4.402 ns minimum clock period (after
> place & route) or about 227 MHz.
>
> YMMV
>
> -- Gabor

Timing constraint: TS_clk = PERIOD TIMEGRP "clk" 4.4 ns HIGH 50%;
For more information, see Period Analysis in the Timing Closure User Guide (UG612).

 574 paths analyzed, 124 endpoints analyzed, 1 failing endpoint
 1 timing error detected. (1 setup error, 0 hold errors, 0 component switching limit errors)
 Minimum period is 4.402ns.
--------------------------------------------------------------------------------

Paths for end point count_31 (SLICE_X11Y23.CIN), 30 paths
--------------------------------------------------------------------------------
Slack (setup path):     -0.002ns (requirement - (data path - clock path skew + uncertainty))
  Source:               count_0 (FF)
  Destination:          count_31 (FF)
  Requirement:          4.400ns
  Data Path Delay:      4.363ns (Levels of Logic = 16)
  Clock Path Skew:      -0.039ns (0.230 - 0.269)
  Source Clock:         clk_BUFGP rising at 0.000ns
  Destination Clock:    clk_BUFGP rising at 4.400ns
  Clock Uncertainty:    0.000ns

  Maximum Data Path: count_0 to count_31
    Location             Delay type         Delay(ns)  Physical Resource
                                                       Logical Resource(s)
    -------------------------------------------------  -------------------
    SLICE_X11Y8.XQ       Tcko                  0.495   count<0>
                                                       count_0
    SLICE_X11Y8.F3       net (fanout=1)        0.318   count<0>
    SLICE_X11Y8.COUT     Topcyf                1.026   count<0>
                                                       Mcount_count_lut<0>_INV_0
                                                       Mcount_count_cy<0>
                                                       Mcount_count_cy<1>
    SLICE_X11Y9.CIN      net (fanout=1)        0.000   Mcount_count_cy<1>
    SLICE_X11Y9.COUT     Tbyp                  0.130   count<2>
                                                       Mcount_count_cy<2>
                                                       Mcount_count_cy<3>
    SLICE_X11Y10.CIN     net (fanout=1)        0.000   Mcount_count_cy<3>
    SLICE_X11Y10.COUT    Tbyp                  0.130   count<4>
                                                       Mcount_count_cy<4>
                                                       Mcount_count_cy<5>
    SLICE_X11Y11.CIN     net (fanout=1)        0.000   Mcount_count_cy<5>
    SLICE_X11Y11.COUT    Tbyp                  0.130   count<6>
                                                       Mcount_count_cy<6>
                                                       Mcount_count_cy<7>
    SLICE_X11Y12.CIN     net (fanout=1)        0.000   Mcount_count_cy<7>
    SLICE_X11Y12.COUT    Tbyp                  0.130   count<8>
                                                       Mcount_count_cy<8>
                                                       Mcount_count_cy<9>
    SLICE_X11Y13.CIN     net (fanout=1)        0.000   Mcount_count_cy<9>
    SLICE_X11Y13.COUT    Tbyp                  0.130   count<10>
                                                       Mcount_count_cy<10>
                                                       Mcount_count_cy<11>
    SLICE_X11Y14.CIN     net (fanout=1)        0.000   Mcount_count_cy<11>
    SLICE_X11Y14.COUT    Tbyp                  0.130   count<12>
                                                       Mcount_count_cy<12>
                                                       Mcount_count_cy<13>
    SLICE_X11Y15.CIN     net (fanout=1)        0.000   Mcount_count_cy<13>
    SLICE_X11Y15.COUT    Tbyp                  0.130   count<14>
                                                       Mcount_count_cy<14>
                                                       Mcount_count_cy<15>
    SLICE_X11Y16.CIN     net (fanout=1)        0.000   Mcount_count_cy<15>
    SLICE_X11Y16.COUT    Tbyp                  0.130   count<16>
                                                       Mcount_count_cy<16>
                                                       Mcount_count_cy<17>
    SLICE_X11Y17.CIN     net (fanout=1)        0.000   Mcount_count_cy<17>
    SLICE_X11Y17.COUT    Tbyp                  0.130   count<18>
                                                       Mcount_count_cy<18>
                                                       Mcount_count_cy<19>
    SLICE_X11Y18.CIN     net (fanout=1)        0.000   Mcount_count_cy<19>
    SLICE_X11Y18.COUT    Tbyp                  0.130   count<20>
                                                       Mcount_count_cy<20>
                                                       Mcount_count_cy<21>
    SLICE_X11Y19.CIN     net (fanout=1)        0.000   Mcount_count_cy<21>
    SLICE_X11Y19.COUT    Tbyp                  0.130   count<22>
                                                       Mcount_count_cy<22>
                                                       Mcount_count_cy<23>
    SLICE_X11Y20.CIN     net (fanout=1)        0.000   Mcount_count_cy<23>
    SLICE_X11Y20.COUT    Tbyp                  0.130   count<24>
                                                       Mcount_count_cy<24>
                                                       Mcount_count_cy<25>
    SLICE_X11Y21.CIN     net (fanout=1)        0.000   Mcount_count_cy<25>
    SLICE_X11Y21.COUT    Tbyp                  0.130   count<26>
                                                       Mcount_count_cy<26>
                                                       Mcount_count_cy<27>
    SLICE_X11Y22.CIN     net (fanout=1)        0.000   Mcount_count_cy<27>
    SLICE_X11Y22.COUT    Tbyp                  0.130   count<28>
                                                       Mcount_count_cy<28>
                                                       Mcount_count_cy<29>
    SLICE_X11Y23.CIN     net (fanout=1)        0.000   Mcount_count_cy<29>
    SLICE_X11Y23.CLK     Tcinck                0.704   count<30>
                                                       Mcount_count_cy<30>
                                                       Mcount_count_xor<31>
                                                       count_31
    -------------------------------------------------  ---------------------------
    Total                                      4.363ns (4.045ns logic, 0.318ns route)
                                                       (92.7% logic, 7.3% route)

That's the worst-case path in the timing report after adding a
period constraint of 4.4 ns. Same achievable period of 4.402...

-- Gabor

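To see where the 4.363 ns and 4.402 ns figures come from, the report's numbers can be added up by hand. The Python sketch below copies the delays from the report; the variable names are my own, and the count of 14 bypass stages is read off the slice list (X11Y9 through X11Y22). It applies the slack formula printed in the report itself.

```python
# Delays in ns, copied from the posted worst-case path (count_0 to count_31).
T_CKO = 0.495     # clock-to-out of the count_0 flip-flop
NET = 0.318       # the only routed net with nonzero delay (XQ to F3)
T_OPCYF = 1.026   # LUT input to carry-out of the first slice
T_BYP = 0.130     # carry bypass through each intermediate slice
N_BYP = 14        # slices X11Y9 through X11Y22
T_CINCK = 0.704   # carry-in to clock setup at count_31

data_path = T_CKO + NET + T_OPCYF + N_BYP * T_BYP + T_CINCK

requirement = 4.400   # PERIOD constraint
skew = -0.039         # clock path skew from the report
uncertainty = 0.000   # clock uncertainty from the report

# Slack = requirement - (data path - clock path skew + uncertainty)
min_period = data_path - skew + uncertainty
slack = requirement - min_period

print(f"data path  {data_path:.3f} ns")
print(f"min period {min_period:.3f} ns")
print(f"slack      {slack:.3f} ns")
```

The sums match the report: a 4.363 ns data path, a 4.402 ns minimum period, and -0.002 ns of slack against the 4.4 ns constraint. Most of the delay sits in the first and last stages; the 14 bypass stages together contribute only 1.82 ns.
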
Gabor wrote:


> Well, it took me about 5 minutes to code up a simple project with
> a 32-bit counter and enough registers to prevent other logic from
> being the worst-case path. In a XC3S50AN-5 there are enough rows
> to keep 32-bits in a single carry chain. With no constraints, the
> design built and reported 4.402 ns minimum clock period (after
> place & route) or about 227 MHz.

OK, thanks very much. The request was to run a counter at 100 MHz,
and it sounds like this is doable. There are some concerns about
crossing clock boundaries that need to be figured out, but it looks
like it can be done.

Thanks VERY much for doing the leg work on this! I've been learning
how to set up a GUI for the project, the one area I really didn't
know enough about. The FPGA part seemed simple except for the
performance.

Jon

glen herrmannsfeldt wrote:


> Well, also it depends on how you use the counter. If you need to be
> able to latch the bits from the counter, then the timing might
> depend on that, and not the counter itself. (In race terms, to be
> able to get lap times while the counter continues to run.)

Yes, this is part of the design. I guess one needs to make a
constraint so that the counter latches get a coherent sample.

I'm thinking I should synchronize the external clock for each
counter to a 150 MHz internal clock, and use a clock edge
detector in the external clock domain to activate the clock
enable of the counter on the internal clock.

Thanks

Jon

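One way to picture that scheme is a small behavioral model in Python. This is only a sketch under assumptions that are not stated in the post: a two-flop synchronizer, an edge detector placed after the synchronizer (that is, running on the internal clock), and a made-up function name. It is plain software, so it cannot model metastability or real timing.

```python
def count_rising_edges(samples):
    """Count rising edges of an external signal sampled once per internal clock.

    samples: iterable of 0/1 values, one per internal clock cycle.
    """
    sync1 = sync2 = prev = 0   # two synchronizer flops plus one edge-detect flop
    count = 0
    for s in samples:
        # Clock enable: synchronized signal is high now and was low last cycle.
        enable = sync2 == 1 and prev == 0
        if enable:
            count = (count + 1) & 0xFFFFFFFF   # 32-bit counter with clock enable
        # All flops update together, as they would on one clock edge.
        sync1, sync2, prev = s, sync1, sync2
    return count

# Two pulses, followed by idle cycles so the pipeline can flush.
print(count_rising_edges([0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0]))
```

An edge is counted two internal clock cycles after it is sampled, which is why the example pads the input with trailing zeros. The model also assumes each high and low phase of the external signal lasts at least one internal clock period; with a 150 MHz internal clock and a roughly 50% duty cycle, that means an external clock below 75 MHz.
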
Jon Elson <jmelson@wustl.edu> wrote:

(snip, I wrote)
>> Well, also it depends on how you use the counter. If you need to be
>> able to latch the bits from the counter, then the timing might
>> depend on that, and not the counter itself. (In race terms, to be
>> able to get lap times while the counter continues to run.)

> Yes, this is part of the design. I guess one needs to make a
> constraint so that the counter latches get a coherent sample.

Without that constraint, you might get to 300 MHz or so. If S3A
isn't so different from S3E, 100 MHz shouldn't be so hard with
the latch.

> I'm thinking I should synchronize the external clock for each
> counter to a 150 MHz internal clock, and use a clock edge
> detector in the external clock domain to activate the clock
> enable of the counter on the internal clock.

I think that sounds right. You have to meet the setup and hold
times for the latch.

-- glen