FPGARelated.com
Forums

Combinatorial logic delay plus routing delay exceeds clock period

Started by Sam Worth June 1, 2008
Generally speaking, it is not good if a module's combinatorial logic delay 
plus routing delay exceeds the clock period.

A certain module calculates a 100-bit accumulation in approximately 5ns (in 
a Virtex 5LXT-1). However, the design clock period is 4ns.

I could break the sum logic into two pieces: register the high 50 bits, 
add the low 50 bits into a register, add the registered high 50 bits with 
the carry from the low 50 bits, and finally concatenate the high 50-bit sum 
with the registered low 50-bit sum. That would satisfy timing requirements. 
However, the extra 100 bits of register resources do not seem to offer a 
benefit and increase the likelihood that a larger device will be required. 
There could be hundreds of these 100-bit adders in the design.

So, the module calculates the 100-bit accumulation in one go, taking 5ns and 
100 fewer registers than the preceding example. Obviously, the input terms 
must remain stable for two clock periods for the sum to be valid, and this is 
accounted for in the upper logic layers.

My issue is how to tell the ISE tool chain that the 5ns total delay is 
acceptable in these modules. The Constraints Guide may cover this issue, but 
I do not see it. Can someone give me an example of the appropriate 
constraint usage?

Thanks in advance.

- Sam 
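
The two-piece split described in the post above can be checked with a small behavioural model. This is only a Python sketch of the arithmetic, not HDL and not the poster's code; the names `stage1`, `stage2`, and the register dictionary are made up for illustration, and it assumes a plain two-operand add rather than a feedback accumulator.

```python
HALF = 50                      # width of each half of the 100-bit add
MASK = (1 << HALF) - 1         # mask selecting one 50-bit half


def stage1(a, b):
    """First clock: add the low halves, register the high halves."""
    low = (a & MASK) + (b & MASK)
    return {
        "low_sum": low & MASK,        # registered low 50-bit sum
        "carry": low >> HALF,         # registered carry out of the low half
        "a_hi": (a >> HALF) & MASK,   # registered high 50 bits of a
        "b_hi": (b >> HALF) & MASK,   # registered high 50 bits of b
    }


def stage2(regs):
    """Second clock: add the registered high halves plus the carry."""
    high_sum = (regs["a_hi"] + regs["b_hi"] + regs["carry"]) & MASK
    # Concatenate the high sum with the registered low sum.
    return (high_sum << HALF) | regs["low_sum"]


def split_add(a, b):
    """100-bit add (modulo 2**100) done as two 50-bit adds."""
    return stage2(stage1(a, b))
```

Each stage only ever performs a 50-bit add, which is why the split version can meet a shorter clock period at the cost of the extra registers.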


On Jun 1, 5:00 pm, "Sam Worth" <no-re...@some.org> wrote:
> Generally speaking, it is not good if a module's combinatorial logic delay
> plus routing delay exceeds the clock period.
>
> A certain module calculates a 100-bit accumulation in approximately 5ns (in
> a Virtex 5LXT-1). However, the design clock period is 4ns.
>
> I could break the sum logic into two pieces: register the high 50 bits,
> add the low 50 bits into a register, add the registered high 50 bits with
> the carry from the low 50 bits, and finally concatenate the high 50-bit sum
> with the registered low 50-bit sum. That would satisfy timing requirements.
> However, the extra 100 bits of register resources do not seem to offer a
> benefit and increase the likelihood that a larger device will be required.
> There could be hundreds of these 100-bit adders in the design.
>
> So, the module calculates the 100-bit accumulation in one go, taking 5ns and
> 100 fewer registers than the preceding example. Obviously, the input terms
> must remain stable for two clock periods for the sum to be valid, and this is
> accounted for in the upper logic layers.
>
> My issue is how to tell the ISE tool chain that the 5ns total delay is
> acceptable in these modules. The Constraints Guide may cover this issue, but
> I do not see it. Can someone give me an example of the appropriate
> constraint usage?
>
> Thanks in advance.
>
> - Sam

I would create two accumulators and drive their Enable inputs at half
frequency, in counterphase.
Peter Alfke

"Sam Worth" <no-reply@some.org> wrote in message 
news:4843384c$0$12931$4c368faf@roadrunner.com...
> So, the module calculates the 100-bit accumulation in one go, taking 5ns
> and 100 fewer registers than the preceding example. Obviously, the input
> terms must remain stable for two clock periods for the sum to be valid, and
> this is accounted for in the upper logic layers.
>
> My issue is how to tell the ISE tool chain that the 5ns total delay is
> acceptable in these modules. The Constraints Guide may cover this issue,
> but I do not see it. Can someone give me an example of the appropriate
> constraint usage?
>
> Thanks in advance.
>
> - Sam

Hi Sam,

Read the constraints guide. Also, Google this:-
multi cycle path

NET "clock_en" TNM=FFS "clock_enable";
TIMESPEC TS1 = FROM : clock_enable : TO : clock_enable : 8ns ;

HTH., Syms.

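The 8ns in the TIMESPEC above is two 4ns clock periods, which is what makes a 5ns path acceptable. A rough way to see the budget, sketched in Python; the function name is hypothetical and the model deliberately ignores setup time, clock skew, and jitter, which the real timing analyzer does account for.

```python
def multicycle_slack_ns(clock_period_ns, cycles, path_delay_ns):
    """Slack left when a path is allowed `cycles` clock periods to settle."""
    return clock_period_ns * cycles - path_delay_ns


# Single-cycle requirement: the 5ns path misses a 4ns period.
single = multicycle_slack_ns(4.0, 1, 5.0)

# Two-cycle requirement (the 8ns FROM:TO spec): the same path fits.
double = multicycle_slack_ns(4.0, 2, 5.0)

print("single-cycle slack:", single, "ns")
print("two-cycle slack:", double, "ns")
```

The constraint only relaxes the analysis; the design still has to guarantee, through the clock enable, that the destination registers really do capture only every second clock.
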
Sam Worth wrote:

> Generally speaking, it is not good if a module's combinatorial logic delay
> plus routing delay exceeds the clock period.

I believe there have been designs that depended on the delay being long
enough.

> A certain module calculates a 100-bit accumulation in approximately 5ns (in
> a Virtex 5LXT-1). However, the design clock period is 4ns.

I would say that 5ns is too close, and in an FPGA you can't predict the
timing that well anyway.

There are many stories about Cray and his computers related to logic delay.
(There is also the Cray-2 resonant box story, where to accomplish a 4ns clock
cycle they designed a box resonant at 250MHz. Unfortunately, the clock cycle
ended up being 4.2ns, and there was much delay in the product release.)

> I could break the sum logic into two pieces: register the high 50 bits,
> add the low 50 bits into a register, add the registered high 50 bits with
> the carry from the low 50 bits, and finally concatenate the high 50-bit sum
> with the registered low 50-bit sum. That would satisfy timing requirements.
> However, the extra 100 bits of register resources do not seem to offer a
> benefit and increase the likelihood that a larger device will be required.
> There could be hundreds of these 100-bit adders in the design.

FPGAs usually have so many FFs that registers are free. Without knowing
anything else about the design, though, pipelining is usually a good thing.
The question always is where to put the pipeline registers for best effect.

> So, the module calculates the 100-bit accumulation in one go, taking 5ns and
> 100 fewer registers than the preceding example. Obviously, the input terms
> must remain stable for two clock periods for the sum to be valid, and this is
> accounted for in the upper logic layers.

Is there no other loss in reducing the clock rate for that part of the
design?

-- glen

"Peter Alfke" <alfke@sbcglobal.net> wrote in message 
news:1778944e-c20b-4263-9979-de97a3ad80d7@z24g2000prf.googlegroups.com...
> I would create two accumulators and drive their Enable inputs at half
> frequency, in counterphase.
> Peter Alfke

That sounds interesting, and it is something I would like to learn about. 
Do you have an example of such a technique?

- Sam 


Sam Worth wrote:
> "Peter Alfke" <alfke@sbcglobal.net> wrote in message
>> I would create two accumulators and drive their Enable inputs at half
>> frequency, in counterphase.
>> Peter Alfke
>
> That sounds interesting and something I would like to learn about. Do you
> have an example of such technique?

Follow the red wire here for two register banks on opposite enable phases:
http://mysite.verizon.net/miketreseler/count_enable.pdf

-- Mike Treseler

Mike Treseler wrote:
> Sam Worth wrote:
>> "Peter Alfke" <alfke@sbcglobal.net> wrote in message
>>> I would create two accumulators and drive their Enable inputs at half
>>> frequency, in counterphase.
>>> Peter Alfke
>>
>> That sounds interesting and something I would like to learn about. Do
>> you have an example of such technique?
>
> Follow the red wire here for two register banks
> on opposite enable phases:
> http://mysite.verizon.net/miketreseler/count_enable.pdf

Sorry. Make that:
http://mysite.verizon.net/miketreseler/stack.pdf

-- Mike Treseler

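One possible reading of the two-accumulator suggestion, modelled in Python rather than HDL. This is an interpretation, not something taken from the linked PDF: it assumes one input term arrives per clock, that the two banks take alternate terms, and that the final result is formed by adding the two banks once at the end. The function and variable names are hypothetical.

```python
WIDTH = 100
MASK = (1 << WIDTH) - 1


def interleaved_accumulate(samples):
    """Accumulate with two banks enabled on opposite clock phases.

    Each bank's register updates only every second clock, so its adder
    has two clock periods to settle before the result is captured.
    """
    acc = [0, 0]   # bank 0 is enabled on even clocks, bank 1 on odd clocks
    phase = 0
    for x in samples:
        acc[phase] = (acc[phase] + x) & MASK
        phase ^= 1   # toggle the enable phase every clock
    # One final add combines the two partial sums.
    return (acc[0] + acc[1]) & MASK
```

In hardware each bank would presumably also need its input term held for both clocks, and the slow paths would still need a multi-cycle constraint so the tools do not time them against a single 4ns period.
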
"Symon" <symon_brewer@hotmail.com> wrote in message 
news:g1vebu$q48$1@aioe.org...
> Hi Sam,
>
> Read the constraints guide. Also, Google this:-
> multi cycle path
>
> NET "clock_en" TNM=FFS "clock_enable";
> TIMESPEC TS1 = FROM : clock_enable : TO : clock_enable : 8ns ;
>
> HTH., Syms.

Thanks, Symon. That did the trick.

- Sam

On Jun 2, 8:35 am, "Sam Worth" <no-re...@some.org> wrote:
> "Symon" <symon_bre...@hotmail.com> wrote in message
> news:g1vebu$q48$1@aioe.org...
> > Hi Sam,
> >
> > Read the constraints guide. Also, Google this:-
> > multi cycle path
> >
> > NET "clock_en" TNM=FFS "clock_enable";
> > TIMESPEC TS1 = FROM : clock_enable : TO : clock_enable : 8ns ;
> >
> > HTH., Syms.
>
> Thanks, Symon. That did the trick.
>
> - Sam

Sam, here is an even simpler solution that works if you accumulate for
many clock ticks and can sacrifice two or three clock ticks before you
get the result.

You just divide the long accumulator into 2, 3, or 4 parts, with a
single carry flip-flop between them (you thus pipeline the carry signal).
Then, at the end, you use 1, 2, or 3 clock ticks to flush the carry
through the accumulator. It costs you no additional hardware at all
(Virtex-5 has the pipeline flip-flop built in), and it runs as fast as
a short accumulator. You pay with the latency at the end. There is no
free lunch...

Peter Alfke

"Peter Alfke" <peter@xilinx.com> wrote in message 
news:182802ac-178e-46b6-92c7-
> Sam, here is an even simpler solution that works if you accumulate for
> many clock ticks and can sacrifice two or three clock ticks before you
> get the result.
> You just divide the long accumulator into 2, 3, or 4 parts, with a
> single carry flip-flop between (you thus pipeline the carry signal)
> Then, at the end, you use 1, 2, or 3 clock ticks to flush the carry
> through the accumulator. It costs you no additional hardware at all,
> (Virtex-5 has the pipeline flip-flop built-in) and it runs as fast as
> a short accumulator. You pay with the latency at the end. There is no
> free lunch...
> Peter Alfke

Thanks, Peter.
The added latency is fine. I am already fine with 2 ticks as it is. But I 
do not understand what you mean by "flush the carry through the 
accumulator". Is there an HDL example you can refer to?

Thanks in advance.
- Sam
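
No HDL example appears in the thread, but the carry-pipelining idea can be illustrated with a behavioural model. The following is a Python sketch under stated assumptions, not Peter Alfke's implementation: the class name, the segment count, and the segment width are hypothetical, and "flush" is interpreted as clocking the accumulator with a zero input until every pending carry flip-flop has been absorbed by the next segment.

```python
class CarryPipelinedAccumulator:
    """Accumulator split into segments with a carry flip-flop between them."""

    def __init__(self, parts=2, seg_bits=50):
        self.parts = parts
        self.seg_bits = seg_bits
        self.mask = (1 << seg_bits) - 1
        self.seg = [0] * parts      # one sum register per segment
        self.carry = [0] * parts    # carry flip-flop at each segment output

    def clock(self, x=0):
        """One clock tick: every segment does only a short add."""
        prev_carry = self.carry[:]  # flip-flop outputs from the previous tick
        for i in range(self.parts):
            x_i = (x >> (i * self.seg_bits)) & self.mask
            cin = prev_carry[i - 1] if i > 0 else 0
            total = self.seg[i] + x_i + cin
            self.seg[i] = total & self.mask
            self.carry[i] = total >> self.seg_bits  # used on the next tick

    def flush(self):
        """Clock in zeros so pending carries ripple into the upper segments."""
        for _ in range(self.parts - 1):
            self.clock(0)

    def value(self):
        """Concatenate the segments; only valid after flush()."""
        return sum(s << (i * self.seg_bits) for i, s in enumerate(self.seg))
```

A usage pattern would be to call `clock(x)` once per input term, then `flush()`, then read `value()`. With 2, 3, or 4 parts the flush takes 1, 2, or 3 extra ticks, matching the latency described above; the carry out of the top segment is simply dropped, so the result is modulo 2**(parts * seg_bits).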