Hello all, In my design I am using a 32-bit adder followed by some combinational logic. I want to constrain the full path to twice the clock period (20 ns), but it does not meet the constraint. When I analysed the critical path, I saw a long carry chain for the adder and a large routing delay within the combinational logic (which I never expected). Is the long carry chain causing trouble for the router? I am thinking of buffering the output of the adder with a register on the negative clock edge (constraining that path to 5 ns), and then constraining the path from that register to the next-stage FF to 16 ns. Will this buffering ease the routing effort? Please advise. Thanks and regards Sumesh V S
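For reference, the budget arithmetic behind the proposed split can be sketched as below. It assumes a 10 ns clock (so the two-cycle path is the 20 ns from the post), a 50% duty cycle, and an inserted register clocked on the first falling edge after launch. The function name is illustrative only. Under those assumptions the two segments get 5 ns and 15 ns, which add back up to the original 20 ns. The split creates no extra time, and the second segment has 15 ns available, not 16 ns.

```python
# Timing budget for splitting a two-cycle path with an opposite-edge register.
# Assumptions (not from any tool): 10 ns clock, 50% duty cycle, launch on the
# rising edge at t = 0, final capture on the rising edge two cycles later.

PERIOD_NS = 10.0
DUTY = 0.5
MULTICYCLE = 2


def split_budgets(period_ns: float, duty: float, multicycle: int) -> tuple:
    """Return (first_segment_ns, second_segment_ns) for a path split by a
    register clocked on the falling edge of the launch cycle."""
    falling_edge = period_ns * duty          # first falling edge after launch
    capture_edge = period_ns * multicycle    # rising edge that ends the multicycle path
    return falling_edge, capture_edge - falling_edge


first, second = split_budgets(PERIOD_NS, DUTY, MULTICYCLE)
print(f"rising -> falling: {first} ns, falling -> capture: {second} ns, total: {first + second} ns")
```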
Buffering the critical path.
Started by ●September 19, 2006
Reply by ●September 19, 2006
What kind of device are you using? 20 ns for a 32-bit adder (using dedicated carry) would be ridiculously slow... Dedicated carry, available in all Xilinx FPGA devices, uses less than 50 ps per bit (plus some basic delay). Peter Alfke
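A back-of-the-envelope version of Peter's estimate, as a sketch: the 50 ps per bit is his upper bound, while the fixed 2 ns for getting on and off the chain is an assumed placeholder (Ray later puts it at "more than 2 ns"), not a datasheet value. Either way, a 32- or 37-bit carry chain comes to a few nanoseconds, far short of 20 ns.

```python
# Rough carry-chain delay estimate for an adder built on the dedicated carry chain.
# per_bit_ps is an upper bound quoted in the thread; overhead_ns is an ASSUMED
# placeholder for entering and leaving the chain, not a datasheet figure.


def carry_chain_delay_ns(bits: int, per_bit_ps: float = 50.0, overhead_ns: float = 2.0) -> float:
    """Estimate: per-bit propagation times width, plus a fixed entry/exit cost."""
    return bits * per_bit_ps / 1000.0 + overhead_ns


for width in (32, 37):
    print(width, "bits:", round(carry_chain_delay_ns(width), 2), "ns")
```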
Reply by ●September 19, 2006
Peter Alfke wrote: > What kind of device are you using? > 20 ns for a 32-bit adder (using dedicated carry) would be ridiculously slow... No, the 20 ns is for the adder plus the remaining combinational logic. As you said, the adder delay is much less.
Reply by ●September 19, 2006
Vessumesh, if you refuse to answer specific helpful questions, then I suggest you figure this out yourself, and do not bother this newsgroup. Peter vssumesh wrote: > No, the 20 ns is for the adder plus the remaining combinational logic. As you said, the adder delay is much less.
Reply by ●September 19, 2006
Peter Alfke wrote: > Vessumesh, if you refuse to answer specific helpful questions, then I suggest you figure this out yourself, and do not bother this newsgroup. > Peter Peter, he probably got thrown off by the top-posting format of your question. Forgive him. :) http://www.catb.org/jargon/html/T/top-post.html > A: No. > Q: Should I include quotations after my reply? -Dave -- David Ashley http://www.xdr.com/dash Embedded Linux, device drivers, system architecture
Reply by ●September 19, 2006
Geez, I doubt he uses a Xnillix :) Peter Alfke wrote: > Vessumesh, if you refuse to answer specific helpful questions, then I suggest you figure this out yourself, and do not bother this newsgroup.
Reply by ●September 19, 2006
vssumesh wrote: > No, the 20 ns is for the adder plus the remaining combinational logic. As you said, the adder delay is much less. Sounds like there are several layers of combinatorial logic. Pipeline the design. Also, I think he is using the term "buffer" to mean adding a register stage. The adder bits are, as Peter said, about 50 ps per bit, but getting on and off the carry chain adds more than 2 ns. That is still nowhere near the 20 ns. It isn't the carry chain causing the problem. The problem comes from using many levels of logic (i.e., the signal goes through many LUTs) between the flip-flops, plus the propagation delay of the carry chain. You need to look at the ratio of logic delay to routing delay.
If the routing delay on the critical path is greater than the logic delay, you can likely fix the problem with some manual placement. The placer does a very poor job of placing the additional layers of LUTs in multi-layer combinatorial logic. The LUT connected to the flip-flop places well, but the LUTs leading up to it get scattered to the far reaches of the chip. You could try a higher placement effort level, but that may not provide enough improvement. You'll get better results by floorplanning the additional layers of LUTs so they are laid out logically and close to the rest of the LUTs in the path. The trouble is that LUT names are subject to change on subsequent synthesis runs, so you have to be really careful. The best solution, if your design can support it, is to pipeline the logic more deeply.
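Ray's diagnosis can be made concrete with a small sketch that sums the logic and routing components of a path, as read off a timing report, and applies his rule of thumb: routing delay greater than logic delay points to placement and floorplanning as the fix. The `PathElement` type and every delay value below are hypothetical, made up for illustration rather than taken from a real report for this design.

```python
from dataclasses import dataclass


@dataclass
class PathElement:
    name: str
    logic_ns: float    # cell delay (clock-to-out, carry, LUT, setup)
    routing_ns: float  # net delay from this element to the next one


def summarize(path, budget_ns: float) -> dict:
    """Split a register-to-register path into logic and routing totals."""
    logic = sum(e.logic_ns for e in path)
    routing = sum(e.routing_ns for e in path)
    return {
        "logic_ns": logic,
        "routing_ns": routing,
        "slack_ns": budget_ns - (logic + routing),
        # Rule of thumb from the thread: routing-dominated paths are
        # candidates for manual placement / floorplanning.
        "routing_dominated": routing > logic,
    }


# HYPOTHETICAL critical path; numbers invented for illustration only.
path = [
    PathElement("source FF clock-to-out", 0.4, 0.8),
    PathElement("adder carry chain", 3.6, 1.5),
    PathElement("LUT level 1", 0.2, 4.0),
    PathElement("LUT level 2", 0.2, 6.5),
    PathElement("LUT level 3", 0.2, 5.0),
    PathElement("BRAM setup", 0.5, 0.0),
]
print(summarize(path, budget_ns=20.0))
```

Each extra LUT level in this model brings its own net, which is where scattered placement shows up as routing delay.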
Reply by ●September 19, 2006
Sorry Peter, I did not mean that. Sorry for the confusion. I am using a V4LX60 for my design. There is a requirement to add two 37-bit numbers and do some combinational logic based on the result. The total time is 20 ns. The adder takes very little time; the full logic itself takes around 4 ns. The main problem is routing delay. I forgot to mention that it is a block-RAM-based design: it uses 128 BRAMs of the V4LX60 (implementing a 16-port RAM), and the block RAMs are scattered. I have now placed this block in the central region, so the final routing to the block RAMs is taking a lot of delay. In the previous version there was no combinational logic after the adder, and I met timing. But not now. What I was asking about is adding registers to latch the output of the adder. I thought it would be better for PAR to see two paths instead of one path from the source FF to the destination FF. Also, Ray, there are 32*16 such signals. Is it possible to manually route all of them? I think pipelining is not possible, since this is part of a pipeline stage of a processor that expects the result in the same cycle. So pipelining is not an option. > It isn't the carry chain causing the problem. The problem comes from using many levels of logic (i.e., the signal goes through many LUTs) between the flip-flops, plus the propagation delay of the carry chain. Ray, I was asking whether breaking that long path into separate parts, using the rising and falling edges of the clock, could help the tool achieve a better PAR. Thanks and regards Sumesh V S
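Whether the falling-edge register Sumesh asks about can help is easiest to see with a sketch that compares one 20 ns path against the same path split into 5 ns and 15 ns halves. It assumes a 10 ns clock with a 50% duty cycle; the setup and clock-to-out values and the segment delays are placeholders, not Virtex-4 datasheet numbers.

```python
# Sketch: does a falling-edge register in the middle of a 20 ns path help?
# ASSUMPTIONS: 10 ns clock, 50% duty cycle, placeholder register timing.

SETUP_NS = 0.3           # assumed setup time of the inserted register
CLK_TO_Q_NS = 0.4        # assumed clock-to-out of the inserted register
FIRST_BUDGET_NS = 5.0    # rising edge -> falling edge
SECOND_BUDGET_NS = 15.0  # falling edge -> rising edge two cycles after launch


def fits_unsplit(seg_a_ns: float, seg_b_ns: float, budget_ns: float = 20.0) -> bool:
    """One register-to-register path: only the sum has to fit."""
    return seg_a_ns + seg_b_ns <= budget_ns


def fits_split(seg_a_ns: float, seg_b_ns: float) -> bool:
    """Two paths: each must close on its own, slack cannot move across the
    new register, and the register adds its own setup and clock-to-out."""
    return (seg_a_ns + SETUP_NS <= FIRST_BUDGET_NS
            and CLK_TO_Q_NS + seg_b_ns <= SECOND_BUDGET_NS)


# HYPOTHETICAL segment delays in ns: (adder part, post-adder logic + routing).
for seg_a, seg_b in [(4.0, 14.0), (3.0, 16.5), (4.0, 17.0)]:
    print(seg_a, seg_b, "unsplit:", fits_unsplit(seg_a, seg_b),
          "split:", fits_split(seg_a, seg_b))
```

With fixed delays, any case that passes the split check also passes the unsplit one, because the new register only adds overhead and pins the slack to each half. The split can help only if the placer and router happen to produce shorter delays when given the two separate constraints, and that is not guaranteed.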
Reply by ●September 20, 2006
Sumesh, I have a special place in the dungeon for people who ask questions leaving out the most important details, and then tell us afterwards, "Oh, by the way..." You started by mentioning adders and long carry chains, which, as we know by now, are completely irrelevant to your problem. You have a big routing mess, and you are not allowed to pipeline. Tough luck! I think Ray has given the best possible advice, but I do not see an easy solution. Look at how you arrange your dual-port RAMs and how you can exchange data between them. Are there any unexplored addressing tricks? Have you looked at Virtex-5 LX devices? They can perform not only arithmetic but also logic in the DSP slice (also called the multiplier-accumulator). And they are available, as I posted yesterday (funny, neither praise nor outrage in the ng. Everyone asleep?) Good luck, you may need it! Peter
Reply by ●September 20, 2006
Peter Alfke wrote: > And they are available, as I posted yesterday > (funny, neither praise nor outrage in the ng. Everyone asleep?) Well, we can't have that, so... congratulations! I look forward to having a chance to play with them, but that will not happen until the XC5VLX50 is supported by the ISE WebPACK and the ML501 becomes available. LUT6s are nice and will help combinational-heavy designs. The logic functions in the DSP slice may come in handy if they are flexible enough (I haven't checked yet). The biggest surprise in Virtex-5, though, was the new routing network. I can only hope that the more regular structure translates into shorter P&R times. Oh, and kudos for the ML501. The specs look outstanding, and I'm really happy to see a DDR2 SODIMM slot and not just some soldered-down part. Regards, Tommy
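Why LUT6s help combinational-heavy logic can be sketched by counting LUT levels: Virtex-4 has 4-input LUTs and Virtex-5 has 6-input LUTs. The sketch assumes a wide function that decomposes as a tree (for example a wide AND or OR); real synthesis results depend on the actual function and the tools. The function name is illustrative.

```python
def lut_levels(inputs: int, lut_size: int) -> int:
    """Minimum tree depth needed to reduce `inputs` signals to one output
    using k-input LUTs, for a tree-decomposable function (e.g. wide AND/OR)."""
    levels = 0
    reach = 1  # number of inputs a tree of the current depth can absorb
    while reach < inputs:
        reach *= lut_size
        levels += 1
    return levels


for n in (16, 32, 37):
    print(n, "inputs: LUT4 levels =", lut_levels(n, 4), ", LUT6 levels =", lut_levels(n, 6))
```

Each level removed is one LUT delay and, per Ray's point about placement, one fewer net for the placer to scatter.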