Generally speaking, it is not good if a module's combinatorial logic delay plus the routing delay exceeds the clock period.

A certain module calculates a 100-bit accumulation in approximately 5ns (in a Virtex 5LXT-1). However, the design clock period is 4ns.

I could break the sum logic into two pieces: register the high 50 bits, add the low 50 bits into a register, add the registered high 50 bits with the carry from the low 50 bits, and finally concatenate the high-50-bit sum with the registered low-50-bit sum. That would satisfy timing requirements. However, the extra 100 bits of register resources do not seem to offer a benefit and increase the likelihood that a larger device will be required. There could be hundreds of these 100-bit adders in the design.

So, the module calculates the 100-bit accumulation in one go, taking 5ns and 100 fewer registers than the preceding example. Obviously, the input terms must remain stable for two clock periods for the sum to be valid, and this is accounted for in the upper logic layers.

My issue is how to tell the ISE tool-chain that the 5ns total delay is acceptable in these modules. The Constraints Guide may cover this issue, but I do not see it. Can someone give me an example of the appropriate constraint usage?

Thanks in advance.

- Sam
Combinatorial logic delay plus routing delay exceeds clock period
Started by ●June 1, 2008
Reply by ●June 1, 2008
On Jun 1, 5:00 pm, "Sam Worth" <no-re...@some.org> wrote:
> Generally speaking, it is not good if a module's combinatorial logic delay plus the routing delay exceeds the clock period.
>
> A certain module calculates a 100-bit accumulation in approximately 5ns (in a Virtex 5LXT-1). However, the design clock period is 4ns.
>
> I could break the sum logic into two pieces: register the high 50 bits, add the low 50 bits into a register, add the registered high 50 bits with the carry from the low 50 bits, and finally concatenate the high-50-bit sum with the registered low-50-bit sum. That would satisfy timing requirements. However, the extra 100 bits of register resources do not seem to offer a benefit and increase the likelihood that a larger device will be required. There could be hundreds of these 100-bit adders in the design.
>
> So, the module calculates the 100-bit accumulation in one go, taking 5ns and 100 fewer registers than the preceding example. Obviously, the input terms must remain stable for two clock periods for the sum to be valid, and this is accounted for in the upper logic layers.
>
> My issue is how to tell the ISE tool-chain that the 5ns total delay is acceptable in these modules. The Constraints Guide may cover this issue, but I do not see it. Can someone give me an example of the appropriate constraint usage?
>
> Thanks in advance.
>
> - Sam

I would create two accumulators and drive their Enable inputs at half frequency, in counterphase.

Peter Alfke
Reply by ●June 1, 2008
"Sam Worth" <no-reply@some.org> wrote in message news:4843384c$0$12931$4c368faf@roadrunner.com...
> So, the module calculates the 100-bit accumulation in one go, taking 5ns and 100 fewer registers than the preceding example. Obviously, the input terms must remain stable for two clock periods for the sum to be valid, and this is accounted for in the upper logic layers.
>
> My issue is how to tell the ISE tool-chain that the 5ns total delay is acceptable in these modules. The Constraints Guide may cover this issue, but I do not see it. Can someone give me an example of the appropriate constraint usage?
>
> Thanks in advance.
>
> - Sam

Hi Sam,

Read the constraints guide. Also, Google this:-
multi cycle path

NET "clock_en" TNM=FFS "clock_enable";
TIMESPEC TS1 = FROM : clock_enable : TO : clock_enable : 8ns ;

HTH., Syms.
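The arithmetic behind the 8ns figure can be checked with a small sketch. It uses only the numbers quoted in this thread (4ns clock, 5ns path, 8ns FROM:TO specification) and deliberately ignores setup time, clock skew and jitter, so it is a simplification and not what the timing analyzer actually computes. The helper name `slack_ns` is made up for illustration.

```python
# Simplified slack check for a multi-cycle path.
# Numbers come from the thread; setup time, skew and jitter are ignored.
CLOCK_PERIOD_NS = 4.0      # design clock period
PATH_DELAY_NS = 5.0        # logic + routing delay of the 100-bit accumulation
MULTICYCLE_SPEC_NS = 8.0   # FROM:TO value in the TIMESPEC (two clock periods)


def slack_ns(requirement_ns, delay_ns):
    """Positive slack means the path meets the requirement."""
    return requirement_ns - delay_ns


single_cycle_slack = slack_ns(CLOCK_PERIOD_NS, PATH_DELAY_NS)
multicycle_slack = slack_ns(MULTICYCLE_SPEC_NS, PATH_DELAY_NS)

print("single-cycle slack:", single_cycle_slack, "ns")
print("multi-cycle slack: ", multicycle_slack, "ns")
```

The relaxed requirement is only safe if the flip-flops in the group really are enabled no more often than every second clock, which is what the original post says the upper logic layers guarantee.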
Reply by ●June 1, 2008
Sam Worth wrote:
> Generally speaking, it is not good if a module's combinatorial logic delay plus the routing delay exceeds the clock period.

I believe there have been designs that depended on the delay being long enough.

> A certain module calculates a 100-bit accumulation in approximately 5ns (in a Virtex 5LXT-1). However, the design clock period is 4ns.

I would say that 5ns is too close, and in an FPGA you can't predict the times that well anyway. There are many stories about Cray and his computers related to logic delay. (There is also the Cray-2 resonant box story, where to accomplish a 4ns clock cycle they designed a box resonant at 250MHz. Unfortunately, the clock cycle ended up being 4.2ns, which caused much delay in the product release.)

> I could break the sum logic into two pieces: register the high 50 bits, add the low 50 bits into a register, add the registered high 50 bits with the carry from the low 50 bits, and finally concatenate the high-50-bit sum with the registered low-50-bit sum. That would satisfy timing requirements. However, the extra 100 bits of register resources do not seem to offer a benefit and increase the likelihood that a larger device will be required. There could be hundreds of these 100-bit adders in the design.

FPGAs usually have so many FFs that registers are free. Without knowing anything else about the design, though, pipelining is usually a good thing. The question always is where to put the pipeline registers for best effect.

> So, the module calculates the 100-bit accumulation in one go, taking 5ns and 100 fewer registers than the preceding example. Obviously, the input terms must remain stable for two clock periods for the sum to be valid, and this is accounted for in the upper logic layers.

Is there no other loss in reducing the clock rate for that part of the design?

-- glen
Reply by ●June 2, 2008
"Peter Alfke" <alfke@sbcglobal.net> wrote in message news:1778944e-c20b-4263-9979-de97a3ad80d7@z24g2000prf.googlegroups.com...
> I would create two accumulators and drive their Enable inputs at half frequency, in counterphase.
> Peter Alfke

That sounds interesting and something I would like to learn about. Do you have an example of such a technique?

- Sam
Reply by ●June 2, 2008
Sam Worth wrote:
> "Peter Alfke" <alfke@sbcglobal.net> wrote in message
>> I would create two accumulators and drive their Enable inputs at half frequency, in counterphase.
>> Peter Alfke
>
> That sounds interesting and something I would like to learn about. Do you have an example of such a technique?

Follow the red wire here for two register banks on opposite enable phases:
http://mysite.verizon.net/miketreseler/count_enable.pdf

-- Mike Treseler
Reply by ●June 2, 2008
Mike Treseler wrote:
> Sam Worth wrote:
>> "Peter Alfke" <alfke@sbcglobal.net> wrote in message
>>> I would create two accumulators and drive their Enable inputs at half frequency, in counterphase.
>>> Peter Alfke
>>
>> That sounds interesting and something I would like to learn about. Do you have an example of such a technique?
>
> Follow the red wire here for two register banks on opposite enable phases:
> http://mysite.verizon.net/miketreseler/count_enable.pdf

Sorry. Make that:
http://mysite.verizon.net/miketreseler/stack.pdf

-- Mike Treseler
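One possible reading of the two-accumulator suggestion, sketched as a behavioral Python model rather than HDL: a toggling phase bit enables accumulator 0 on even clocks and accumulator 1 on odd clocks, so each register is written only every second clock and its adder path has two clock periods to settle. The thread does not spell out how the inputs are held or how the two partial sums are merged, so the final addition and the function name `interleaved_accumulate` are assumptions made for illustration.

```python
def interleaved_accumulate(samples, width=100):
    """Behavioral model: two accumulators enabled on opposite clock phases.

    Accumulator 0 takes the samples on even clocks, accumulator 1 on odd
    clocks. Each register is therefore updated at half the clock rate.
    Returns the two partial sums, each truncated to `width` bits.
    """
    mask = (1 << width) - 1
    acc = [0, 0]
    phase = 0                      # toggles every clock: the half-rate enable
    for sample in samples:
        acc[phase] = (acc[phase] + sample) & mask
        phase ^= 1
    return acc[0], acc[1]


def merged_total(samples, width=100):
    """Assumed final step: add the two partial sums modulo 2**width."""
    mask = (1 << width) - 1
    even_sum, odd_sum = interleaved_accumulate(samples, width)
    return (even_sum + odd_sum) & mask


print(merged_total([1, 2, 3, 4, 5]))   # partial sums are 1+3+5 and 2+4
```

Note that the merge is itself a full-width addition, so in hardware it would need the same multi-cycle or pipelining treatment as the accumulators.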
Reply by ●June 2, 2008
"Symon" <symon_brewer@hotmail.com> wrote in message news:g1vebu$q48$1@aioe.org...
> Hi Sam,
>
> Read the constraints guide. Also, Google this:-
> multi cycle path
>
> NET "clock_en" TNM=FFS "clock_enable";
> TIMESPEC TS1 = FROM : clock_enable : TO : clock_enable : 8ns ;
>
> HTH., Syms.

Thanks, Symon. That did the trick.

- Sam
Reply by ●June 2, 2008
On Jun 2, 8:35 am, "Sam Worth" <no-re...@some.org> wrote:
> "Symon" <symon_bre...@hotmail.com> wrote in message
> news:g1vebu$q48$1@aioe.org...
>
> > Hi Sam,
> >
> > Read the constraints guide. Also, Google this:-
> > multi cycle path
> >
> > NET "clock_en" TNM=FFS "clock_enable";
> > TIMESPEC TS1 = FROM : clock_enable : TO : clock_enable : 8ns ;
> >
> > HTH., Syms.
>
> Thanks, Symon. That did the trick.
>
> - Sam

Sam, here is an even simpler solution that works if you accumulate for many clock ticks and can sacrifice two or three clock ticks before you get the result.

You just divide the long accumulator into 2, 3, or 4 parts, with a single carry flip-flop between them (you thus pipeline the carry signal). Then, at the end, you use 1, 2, or 3 clock ticks to flush the carry through the accumulator.

It costs you no additional hardware at all (Virtex-5 has the pipeline flip-flop built in), and it runs as fast as a short accumulator. You pay with the latency at the end. There is no free lunch...

Peter Alfke
Reply by ●June 3, 2008
"Peter Alfke" <peter@xilinx.com> wrote in message news:182802ac-178e-46b6-92c7-
> Sam, here is an even simpler solution that works if you accumulate for many clock ticks and can sacrifice two or three clock ticks before you get the result.
> You just divide the long accumulator into 2, 3, or 4 parts, with a single carry flip-flop between them (you thus pipeline the carry signal). Then, at the end, you use 1, 2, or 3 clock ticks to flush the carry through the accumulator.
> It costs you no additional hardware at all (Virtex-5 has the pipeline flip-flop built in), and it runs as fast as a short accumulator. You pay with the latency at the end. There is no free lunch...
> Peter Alfke

Thanks, Peter. The added latency is fine. I am already fine with 2 ticks as it is. But I do not understand what you mean by "flush the carry through the accumulator". Is there an HDL example you can refer to?

Thanks in advance.

- Sam
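No HDL example appears in the thread. As a stand-in, here is a behavioral Python model of what "flush the carry through the accumulator" could mean, under the assumption that each segment adds its own slice of the input every clock and receives the lower segment's carry one clock late through a single flip-flop. The class name `SegmentedAccumulator` and its methods are hypothetical and only illustrate the idea. It is not synthesizable code.

```python
class SegmentedAccumulator:
    """Model of a wide accumulator split into segments with pipelined carries."""

    def __init__(self, width=100, segments=2):
        if width % segments != 0:
            raise ValueError("width must be divisible by segments in this sketch")
        self.seg_width = width // segments
        self.mask = (1 << self.seg_width) - 1
        self.sums = [0] * segments            # one sum register per segment
        self.carries = [0] * (segments - 1)   # carry flip-flop between segments

    def clock(self, value):
        """One clock tick: every register updates from the previous state."""
        new_sums, new_carries = [], []
        for i, seg_sum in enumerate(self.sums):
            operand = (value >> (i * self.seg_width)) & self.mask
            carry_in = self.carries[i - 1] if i > 0 else 0   # one tick late
            total = seg_sum + operand + carry_in
            new_sums.append(total & self.mask)
            if i < len(self.sums) - 1:
                new_carries.append(total >> self.seg_width)  # 0 or 1
        self.sums, self.carries = new_sums, new_carries

    def flush(self):
        """Clock in zeros until no carry is left in flight (segments - 1 ticks)."""
        for _ in range(len(self.sums) - 1):
            self.clock(0)

    def value(self):
        """Concatenate the segment registers (valid only after flush)."""
        result = 0
        for i, seg_sum in enumerate(self.sums):
            result |= seg_sum << (i * self.seg_width)
        return result


acc = SegmentedAccumulator(width=8, segments=2)
acc.clock(0x0F)
acc.clock(0x0F)
print(hex(acc.value()))   # carry still sits in the flip-flop
acc.flush()
print(hex(acc.value()))   # carry has been added into the upper segment
```

In the small usage example the register contents read 0x0e before the flush and 0x1e after it, because the carry out of the low nibble is still waiting in the flip-flop when accumulation stops. With 2, 3, or 4 segments the flush takes 1, 2, or 3 ticks, matching the latency described above.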





