FPGARelated.com

New(?) fast binary counter for FPGAs without carry logic (e.g. Actel) -- Request For Comment

Started by robotron September 11, 2012
On Wednesday, September 12, 2012 6:40:30 PM UTC+2, rickman wrote:
> (..)
> If the counter is free running, you really only need to phase each bit
> correctly. The first bit is 0, 1, 0, 1... so there are only two phases,
> either one or no FFs to delay it rather than n-1. The second bit has a
> pattern of four states so the delay is modulo 4 and can be 0, 1, 2 or 3
> rather than n-2. Does that help? It should take some of the sting out
> of a long counter.
Yes, but not much. As I wrote, the delay for bit k (k = 1 for the LSB) is (n-k) mod 2^k.
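A quick way to sanity-check that rule is a small cycle model. The sketch below is a hypothetical Python model, not Marek's HDL: it assumes a free-running counter in which every bit's carry-out is registered, so bit k (k = 1 for the LSB) lags the true count by k-1 clocks. It then delays each bit by (n-k) mod 2^k and checks that the reassembled word is a plain binary count delayed by n-1 clocks.

```python
def deskew_delays(n):
    """Extra delay, in clocks, for counter bit k = 1..n (k = 1 is the LSB):
    (n - k) mod 2**k."""
    return [(n - k) % (2 ** k) for k in range(1, n + 1)]


def pipelined_counter_history(n, cycles):
    """Hypothetical cycle model of a free-running n-bit counter whose
    per-bit carry-outs are registered. Returns hist[t][i] = bit i at clock t."""
    q = [0] * n          # counter bits, q[0] = LSB
    p = [0] * n          # registered carry out of each bit
    hist = []
    for _ in range(cycles):
        hist.append(q[:])
        cin = [1] + p[:-1]                        # count enable tied high on bit 0
        p = [q[i] & cin[i] for i in range(n)]     # computed from the old q
        q = [q[i] ^ cin[i] for i in range(n)]
    return hist


def deskewed_words(hist, n):
    """Reassemble binary words after delaying bit i by deskew_delays(n)[i]."""
    delays = deskew_delays(n)
    words = []
    for t in range(len(hist)):
        word = 0
        for i, d in enumerate(delays):
            if t - d >= 0:
                word |= hist[t - d][i] << i
        words.append(word)
    return words


if __name__ == "__main__":
    n = 8
    print(deskew_delays(n))   # [1, 2, 5, 4, 3, 2, 1, 0]
    words = deskewed_words(pipelined_counter_history(n, 600), n)
    # From clock n-1 on, the output is the count delayed by n-1 clocks.
    assert all(words[t] == (t - (n - 1)) % 2 ** n for t in range(n - 1, 600))
```

For n = 8 this needs 18 delay flip-flops instead of the 28 that plain n-k delays would need. The saving is confined to the few low bits where n-k is at least 2^k, which is why it doesn't buy much for a long counter.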
> (..)
> Have you thought of switching to a device with a built-in carry chain?
Yes and no. Currently, the Actel architecture suits our needs for radiation-tolerant aerospace designs quite well. Of course, we will port our design to other families, e.g. the (much slower) Atmel and the (more recent) Spartan-6. I only wanted to mention the idea because it may be useful elsewhere.

Marek
robotron wrote:
> Dear Gabor,
>
> On Wednesday, September 12, 2012 4:17:56 PM UTC+2, Gabor wrote:
>> Just to see if this has some application in Xilinx FPGA's I gave it
>> a whirl in a Spartan 6. For a 32-bit counter with registered inputs
>> and only the final p and q going offchip (again with additional
>> registers) the best I could do in the -3 speed grade was 425 MHz.
>> The same size counter and architecture (including a carry out)
>> using the built-in carry chain logic for a normal binary counter
>> resulted in more than 470 MHz. Looking through the timing numbers
>> it appears that routing delays for this counter negate any help
>> you might get by losing the carry chain (in Spartan 6). I imagine
>> it would be a win in a CPLD (if you have the extra macrocells
>> for the 2x register count). In the past I have used LFSR's for
>> long counters in CPLD's - partly for speed, but mostly because
>> of the reduced connectivity requirements.
>
> It seems the dedicated carry logic is of real help there.
> OK, it makes no sense for these FPGAs.
>
> It is certainly *much* better at Actel/MicroSemi, we have already put it into our recent design.
>
> Thank you very much for the try.
> Marek
Yes, the Spartan 6 look-ahead carry logic is quite fast, and in addition to the fast gates it has its own fast dedicated routing. Still, 425 MHz is likely much faster than the system clock rate achievable in all but the most carefully tuned designs. And it turns out that two flip-flops plus a LUT per bit matches the slice architecture of the newer Xilinx parts, so the resource usage at the slice level is not bad for the pcounter. In fact, if I remove the async reset from my code, the pcounter and the standard carry-chain counter use the same resources (8 slices for 32 bits).

Getting back to the inversion issue, it seems to me that if you want to work back to a binary number and also use the enable input, you would need to base the inversion on the p bits as well as the q bits, or else on the history of the en input as well as the q bits. Essentially, knowing the current p bit values allows the software to finish the carry propagation.

-- Gabor
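That last point can be illustrated with a generic carry-save counter model. This is an assumption-laden sketch, not the pcounter source: it models each bit as a value flip-flop q[i] plus a registered carry p[i] of weight 2^(i+1), uses a plain enable as the carry into bit 0, and omits whatever output inversion the original design applies. Because each clock moves a carry only one position, the true count is (Q + 2P) mod 2^n, which software can compute from a snapshot of both registers even when en has been toggling.

```python
def carry_save_step(q, p, en):
    """One clock of a hypothetical carry-save counter.
    q[i]: value bit i (weight 2**i); p[i]: registered carry out of bit i
    (weight 2**(i+1)). The carry into bit 0 is the enable input."""
    cin = [en] + p[:-1]
    new_q = [qi ^ ci for qi, ci in zip(q, cin)]
    new_p = [qi & ci for qi, ci in zip(q, cin)]
    return new_q, new_p


def decode(q, p):
    """Finish the carry propagation in software: count = (Q + 2*P) mod 2**n."""
    n = len(q)
    big_q = sum(bit << i for i, bit in enumerate(q))
    big_p = sum(bit << i for i, bit in enumerate(p))
    return (big_q + 2 * big_p) % (2 ** n)


if __name__ == "__main__":
    n = 8
    q, p = [0] * n, [0] * n
    count = 0
    for t in range(1000):
        en = int(t % 7 not in (2, 5))   # irregular enable pattern
        q, p = carry_save_step(q, p, en)
        count += en
        assert decode(q, p) == count % 2 ** n
```

The invariant behind decode() is that each bit satisfies q[i] + cin[i] = q'[i] + 2*p'[i], so every clock adds exactly en to Q + 2P; the carry out of the top bit (weight 2^n) drops out of the modulo.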
On Tuesday, September 11, 2012 8:47:01 PM UTC+2, robotron wrote:

> - does this design exist, is it being used, and if so, what is its name?
If I understand your description correctly, it is a carry-save accumulator:
http://en.wikipedia.org/wiki/Carry-save_adder

Quote: "To put it another way, we are taking a carry digit from the position on our right, and passing a carry digit to the left, just as in conventional addition; but the carry digit we pass to the left is the result of the previous calculation and not the current one. In each clock cycle, carries only have to move one step along, and not n steps as in conventional addition."

Kolja Sulimma
www.cronologic.de
robotron wrote:
>
>> I've seen this used before.  They added delay lines after the
>> counter bits to produce a count output that is simple binary.
> <snip>
> 1. *please*, could you find the original work?
Similar sorts of carry pipelining were common in the early Xilinx XC2000/3000 parts; I recall there being some fast counter techniques in application notes and Xcell journals of that era.

Pipelined carry chains at one or two bits per carry were also commonly used for accumulators and counters in the GHz-clock-rate GaAs standard-cell designs that I worked on in the early '90s.

The figures in the following TriQuint patent show a few variants of the input/output deskew trees that can be implemented for delay equalization of a loadable accumulator with carry pipelining:
http://www.google.com/patents/US5140540

(Disclaimer: I worked with some of the authors back when I was doing a foundry design through TriQuint.)

- Brian
Dear colleagues,

thank you for the pointers to prior art.
I have added a link to this newsgroup thread on the project page.

Best regards,
Marek
Here are some more links regarding counter & accumulator 
carry techniques.

--------------------
Links to early Xilinx counter app notes:

Ultra-Fast Synchronous Counters in XC3000 & XC4000 FPGAs
http://www.cs.york.ac.uk/rts/docs/Xilinx-datasource-2003-q1/appnotes/xapp014.pdf

Loadable Binary Counters in a XC3000 FPGA
http://www.cs.york.ac.uk/rts/docs/Xilinx-datasource-2003-q1/appnotes/xapp004.pdf

pages 15-18 of Xcell Journal #7
http://www.xilinx.com/publications/archives/xcell/Xcell7.pdf

--------------------
Haven't found a pdf for XAPP 001 yet:
"
" High-Speed Synchronous Prescaler Counter
" (XAPP 001)
"
" This simple design provides a very basic non-loadable,
" up counter with a count-enable control. However, this 
" simplicity permits it to be both the densest and the 
" second fastest design.
"
" A prescaler (CEP/CET) technique is used to gain speed,
" permitting the ripple-carry portion of the counter 
" eight clock periods in which to settle. Without special 
" adaptation, however, this technique precludes loading 
" the counter. As a non-loadable counter, three bits can 
" be implemented in three CLBs (1 CLB/bit), with the least 
" significant six bits requiring only four CLBs; this 
" explains the compactness. Only one TILO delay is incurred
" in the ripple-carry path for each three bits. 
"
 
This technique of making the low N bits run fast, with
the upper bits running 2^N times slower, should map well
into a compact yet fast implementation of a non-loadable
binary counter for your Actel part.

I.e., use something like the pcounter scheme for the low
few bits, then make the upper bits with a ripple carry,
enabled by the carry out of the low bits.

You will probably need to add special timing constraints
to get the tools to understand the multicycle carry, and
that the ripple chain is a false path after FF reset.

The advantage of this is that you would now only need to
deskew the N LSBs of the counter for straight binary output.
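A cycle-level sketch of that split, as a hypothetical Python model rather than anything taken from XAPP 001: an n_low-bit fast counter (a plain binary counter here for brevity; the pcounter scheme could replace it), a registered terminal-count flag, and an n_high-bit slow counter that loads high + 1 only when the flag and en are both true. The function and parameter names are illustrative only.

```python
def prescaled_counter(enables, n_low, n_high):
    """Hypothetical cycle model of a prescaled (CEP/CET-style) counter, n_low >= 1.
    Returns the combined count observed at each clock, before that clock's edge."""
    low_max = 2 ** n_low - 1
    high_mod = 2 ** n_high
    low, high = 0, 0
    tc = 0                    # registered flag: 1 exactly when low == low_max
    values = []
    for en in enables:
        values.append((high << n_low) | low)
        # The slow incrementer output (high + 1) is only loaded when
        # tc & en, at most once per 2**n_low clocks, so it can be given a
        # multicycle constraint; tc & en itself is still a single-cycle path.
        next_high = (high + 1) % high_mod if (tc and en) else high
        next_low = (low + en) & low_max
        next_tc = int(next_low == low_max)
        low, high, tc = next_low, next_high, next_tc
    return values


if __name__ == "__main__":
    enables = [int(t % 5 != 3) for t in range(2000)]
    vals = prescaled_counter(enables, n_low=3, n_high=5)
    assert all(v == sum(enables[:t]) % 2 ** 8 for t, v in enumerate(vals))
```

Only the n_low fast bits then need carry pipelining and deskew; the slow upper bits can use an ordinary ripple incrementer.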

--------------------
ORCA-3 FPGAs had an optional register in the dedicated carry chain:

" Fast-carry logic and routing to adjacent PFUs for 
" nibble-wide, byte-wide, or longer arithmetic functions,
" with the option to register the PFU carry-out.
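The same carry-save idea works with wider segments: ripple inside each k-bit group and register only the group carry-out, which is what a registered PFU carry-out permits. Below is a hypothetical Python model (the function names and the 4-bit segment width are assumptions, not ORCA library primitives); the binary value is again recoverable by weighting each pending registered carry at the base of the next segment.

```python
def segmented_step(segs, carries, en, k):
    """One clock of a hypothetical counter built from k-bit segments.
    segs[j]: value of segment j (weight 2**(k*j)); carries[j]: registered
    carry out of segment j (weight 2**(k*(j+1))). Inside a segment the carry
    ripples in the same clock; between segments it takes one clock."""
    m = 2 ** k
    cin = [en] + carries[:-1]
    new_segs = [(s + c) % m for s, c in zip(segs, cin)]
    new_carries = [int(s + c >= m) for s, c in zip(segs, cin)]
    return new_segs, new_carries


def segmented_value(segs, carries, k):
    """Binary count implied by the segment values plus pending carries."""
    width = k * len(segs)
    total = sum(s << (k * j) for j, s in enumerate(segs))
    total += sum(c << (k * (j + 1)) for j, c in enumerate(carries))
    return total % (2 ** width)


if __name__ == "__main__":
    k, nsegs = 4, 4                     # 16-bit counter built from nibbles
    segs, carries = [0] * nsegs, [0] * nsegs
    for t in range(5000):
        segs, carries = segmented_step(segs, carries, 1, k)
        assert segmented_value(segs, carries, k) == (t + 1) % 2 ** 16
```

With k = 1 this reduces to a one-bit-per-carry counter; larger k trades a longer in-segment ripple for fewer registered carries and fewer deskew stages.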

--------------------
More carry-pipelined accumulator references:

(I've mentioned accumulators because they are a more
general carry design problem than counters are, and
because I know where to look for literature describing
high-speed pipelined versions.)

"Direct Digital Synthesizers: Theory, Design and Applications", Vankka
lib.tkk.fi/Diss/2000/isbn9512253186/isbn9512253186.pdf
See pages 48-49 for accumulator pipelining techniques.


"Single Chip 500 MHz Function Generator"
P.H. Saul, W. Barber, D.G. Taylor, T. Ward
IEE Proceedings, Vol. 138, No. 2, pp 239-243, April 1991

Reprinted in "Direct Digital Frequency Synthesizers", Kroupa (ed), IEEE Press, 1999

Fig. 2 shows the one-bit-per-carry accumulator structure
Fig. 5 shows the accumulator output deskew tree
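To make the two figures concrete, here is a hypothetical cycle model of a one-bit-per-carry accumulator with both deskew trees. It is a Python sketch of the general technique, not a reproduction of the paper's circuit: addend bit i is delayed i clocks on the way in, sum bit i is delayed n-1-i clocks on the way out, and the reassembled output is the running sum delayed by n-1 clocks.

```python
def pipelined_accumulator(addends, n):
    """Hypothetical cycle model of an n-bit accumulator with one registered
    carry per bit, input deskew (addend bit i delayed i clocks) and output
    deskew (sum bit i delayed n-1-i clocks). Returns the deskewed output word
    at each clock: out[t] == sum(addends[:t - n + 1]) % 2**n for t >= n - 1,
    and 0 before that."""
    s = [0] * n              # sum-bit registers, s[0] = LSB
    c = [0] * n              # registered carry out of each bit
    s_hist = []              # s_hist[t] = sum-bit registers during clock t
    for t in range(len(addends)):
        s_hist.append(s[:])
        new_s, new_c = [0] * n, [0] * n
        for i in range(n):
            # Input deskew: bit i sees the addend from i clocks ago.
            a_i = (addends[t - i] >> i) & 1 if t >= i else 0
            cin = c[i - 1] if i > 0 else 0
            new_s[i] = s[i] ^ a_i ^ cin
            new_c[i] = (s[i] & a_i) | (s[i] & cin) | (a_i & cin)
        s, c = new_s, new_c
    # Output deskew: take sum bit i from n-1-i clocks earlier.
    out = []
    for t in range(len(s_hist)):
        word = 0
        for i in range(n):
            src = t - (n - 1 - i)
            if src >= 0:
                word |= s_hist[src][i] << i
        out.append(word)
    return out


if __name__ == "__main__":
    print(pipelined_accumulator([1] * 6, 3))   # [0, 0, 0, 1, 2, 3]
```

Each bit position only ever waits for a carry computed one clock earlier by its neighbour, so the critical path is a single full adder regardless of n; the price is the triangular input and output delay trees.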

-Brian