3 input adder in Spartan 3E

Started by skyworld July 24, 2007
Hi,

I have to design a 3-input adder, i.e. D = A + B + C, in a Spartan
3E. The addition has to finish within one 153.6 MHz clock cycle. When I
run PAR, I get timing violations. Can anybody give me some advice on how
to implement this design? (I can't upgrade to another device because of
cost.) Thanks very much.


best regards

skyworld

You are probably way off timing, depending on the bit-width of your words.
You could use a pipelined adder, if you can stand the extra latency. If
you need the add completed in a single stage for, let's say, 16-bit words,
you're not going to hit 150 MHz on that device, but you could go faster by
using a 3-2 compressor and then adding the final two terms using a carry
look-ahead adder.


---Matthew Hicks
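To make the 3-2 compressor suggestion concrete, here is a small behavioral model in Python (not HDL). It only demonstrates the arithmetic identity: three operands are reduced to a sum word and a carry word with no carry propagation, and a single ordinary addition finishes the job. The function names and the 16-bit default width are assumptions for illustration, and whether this structure actually beats two chained adders on a Spartan 3E is something only the timing report can tell you.

```python
def compress_3_to_2(a: int, b: int, c: int, width: int) -> tuple[int, int]:
    """Carry-save (3:2) compression: one full adder per bit, no ripple."""
    mask = (1 << width) - 1
    a, b, c = a & mask, b & mask, c & mask
    sum_bits = a ^ b ^ c                        # per-bit sum
    carry_bits = (a & b) | (a & c) | (b & c)    # per-bit majority
    return sum_bits, carry_bits << 1            # each carry has twice the weight


def add3(a: int, b: int, c: int, width: int = 16) -> int:
    """Three-operand add using one compressor level and one real adder."""
    s, cy = compress_3_to_2(a, b, c, width)
    return s + cy   # the only carry-propagating addition


if __name__ == "__main__":
    print(add3(1000, 2000, 3000))   # same value as 1000 + 2000 + 3000
```

The result can need up to two more bits than the operands, so the final adder has to be sized accordingly.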


How large are the vectors you're adding?

Have you looked at the timing analysis of your path to sanity-check the timing violations? If you have long routing delays, that can be accommodated with relative placements to keep the register-to-adder and adder-to-adder routing delays down. Getting on and off the carry chain may limit you at this speed.

You may have to ask yourself if your requirement is *really* three adds in one cycle. What do you do with the data after the clock? Comparing to a constant, for instance, would allow you to take the difference of the constant with A in the first cycle and compare to a B+C result rather than a direct comparison to A+B+C. If the logic generating any two vectors is simple enough, the values could be generated and the difference taken before the first register.

I do know that a divider that needed to completely daisy-chain four 14-bit add/subtract stages per cycle was only happy at 66 MHz with some RLOC constraints. Your situation is a little better since you're not MSB carry-out to LSB, but not much. The time to get on and off that carry chain may swamp your results.

- John_H
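The compare-to-a-constant rearrangement above can be checked with a short Python model. It assumes, purely for illustration, that the downstream logic tests A + B + C > K for some constant K. The names `stage1`, `stage2` and `direct_compare` are made up for this sketch. The point is that each clock then contains only two-input operations.

```python
def direct_compare(a: int, b: int, c: int, k: int) -> bool:
    """What the design nominally wants: a three-operand sum, then a compare."""
    return a + b + c > k


def stage1(a: int, b: int, c: int, k: int) -> tuple[int, int]:
    """First clock: two independent two-input operations side by side."""
    return k - a, b + c


def stage2(k_minus_a: int, b_plus_c: int) -> bool:
    """Second clock: a single comparison of the two registered values."""
    return b_plus_c > k_minus_a


if __name__ == "__main__":
    a, b, c, k = 300, 200, 100, 512
    print(direct_compare(a, b, c, k), stage2(*stage1(a, b, c, k)))
```

In hardware, K - A can go negative, so the registered difference would need a sign bit; Python's unbounded integers hide that detail.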

Hi John,

In fact, this design is for a sigma-delta transmission modulator. I need a filter to transform 10-bit parallel input data (15.36 MHz) into a one-bit output stream (153.6 MHz). The filter is composed of three stages of adders. Each adder has three inputs, i.e. the D = A + B + C I mentioned. Every adder has to finish A + B + C within one 153.6 MHz clock so that the 153.6 Mbps data stream works. I have tried a pipelined adder, but it failed for this structure. So I am searching for a "fast adder algorithm" or "fast three-input adder algorithm" that could be implemented in a Spartan 3E and run fast enough. Thanks very much.

skyworld
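For a sense of scale, the numbers quoted above translate into a per-cycle time budget as follows. This is plain arithmetic on the two stated frequencies; the constant names are chosen for this sketch only.

```python
CLOCK_HZ = 153_600_000        # output bit clock from the post
INPUT_RATE_HZ = 15_360_000    # 10-bit parallel input rate from the post

# Time available for register clock-to-out, logic, routing and setup.
period_ns = 1e9 / CLOCK_HZ

# Number of fast clocks per input word.
ratio, remainder = divmod(CLOCK_HZ, INPUT_RATE_HZ)

if __name__ == "__main__":
    print(f"clock period: {period_ns:.2f} ns")
    print(f"fast clocks per input sample: {ratio} (remainder {remainder})")
```

So each three-input add gets roughly 6.5 ns in total, and there are exactly ten fast clocks per input word.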

Still no answer to my questions:

What is the size of the vectors?
Have you sanity-checked the timing report for long routing?
Is there something you can do before or after this "I have to have it now" cycle?

It's not obvious to me that you can't simply move some things around. Where do the three values come from? Are they each the results of previous 3-value adders?

So terribly often, the problem can be repartitioned without compromising the system requirements. If you isolate your problem to a 3-value adder, you won't achieve your goals. If you expand your problem to the stages before and after, or to the system level, you can make this work. You just probably can't make a 3-value adder work.

And if you do answer my questions or provide more details, you might also include the speed grade of the device you're targeting.

- John_H

As mentioned, if you're not feeding back your outputs into your filter, then you probably can pipeline this. Can you let us know why this doesn't work?

if rising_edge(clock) then
  A_plus_B <= A + B;
  C_delay <= C;
  A_plus_B_plus_C <= A_plus_B + C_delay;
end if;

HTH., Syms.
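A cycle-by-cycle Python model of the pipelined fragment above may help show what it does and what it costs. The class name `PipelinedAdd3` is invented for this sketch. The register names follow the VHDL, and every register is updated from the values held before the clock edge, as VHDL signal assignment does.

```python
class PipelinedAdd3:
    """Behavioral model of the two-stage adder: one two-input add per stage."""

    def __init__(self) -> None:
        self.a_plus_b = 0
        self.c_delay = 0
        self.a_plus_b_plus_c = 0

    def clock(self, a: int, b: int, c: int) -> int:
        # Second stage uses the OLD first-stage registers.
        next_sum = self.a_plus_b + self.c_delay
        # First stage captures the new inputs; C is delayed to stay aligned.
        self.a_plus_b = a + b
        self.c_delay = c
        self.a_plus_b_plus_c = next_sum
        return self.a_plus_b_plus_c


if __name__ == "__main__":
    pipe = PipelinedAdd3()
    samples = [(1, 2, 3), (10, 20, 30), (100, 200, 300), (0, 0, 0), (0, 0, 0)]
    print([pipe.clock(*s) for s in samples])   # [0, 6, 60, 600, 0]
```

A new result comes out every clock, but each one reflects inputs from one clock earlier. That is harmless in a feed-forward filter and is exactly what causes trouble if the result must feed back into the same adder on the next clock.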
Skyworld, you still have not told us the width of the three vectors
that you want to add at 153.6 MHz.
If the width is one bit, i.e. you want to add three serial bitstreams,
this is very simple and takes only 4 LUTs plus two flip-flops, with a
combinatorial delay through only two LUTs. That should easily run at
your 153.6 MHz.
Peter Alfke
=============
We still do not know the width of the adder.
If it's a bit-serial 3-input adder, that takes only 4 LUTs plus two
flip-flops, and the combinatorial chain is only through two LUTs, so
it should easily meet the speed requirements.
Peter Alfke, Xilinx Applications
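One arrangement that matches the resource count quoted above is two cascaded bit-serial full adders, each with its own carry flip-flop. That is an assumption about the intended circuit, not something stated in the post. The Python model below processes the three streams LSB first and marks which expressions would map to the four LUTs. The helper names are invented for this sketch.

```python
def serial_add3(a_bits, b_bits, c_bits):
    """LSB-first bit-serial sum of three streams; two carry bits are the only state."""
    carry1 = 0   # flip-flop 1: carry of the A + B adder
    carry2 = 0   # flip-flop 2: carry of the (A + B) + C adder
    out = []
    for a, b, c in zip(a_bits, b_bits, c_bits):
        s1 = a ^ b ^ carry1                            # LUT 1
        n1 = (a & b) | (a & carry1) | (b & carry1)     # LUT 2
        s2 = s1 ^ c ^ carry2                           # LUT 3, the output bit
        n2 = (s1 & c) | (s1 & carry2) | (c & carry2)   # LUT 4
        carry1, carry2 = n1, n2                        # clock edge
        out.append(s2)
    return out


def to_bits(value: int, n: int):
    """Split value into n bits, least significant first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits) -> int:
    """Reassemble an LSB-first bit list into an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    n = 12
    result = serial_add3(to_bits(1000, n), to_bits(900, n), to_bits(123, n))
    print(from_bits(result))   # 2023
```

The longest combinational path runs through LUT 1 and then LUT 3 or LUT 4, which is consistent with the two-LUT delay mentioned above. The result is the sum modulo 2^n for n-bit streams.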


Why can't you pipeline like this?

always @(posedge clk) begin
  AB <= A+B;
  D <= AB + C;
end

--
Uwe Bonnes                bon@elektron.ikp.physik.tu-darmstadt.de

Institut fuer Kernphysik  Schlossgartenstrasse 9  64289 Darmstadt
--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------

In the OP's second post he did say:
"I need a filter to transform 10 bit parallel input data (15.36MHz) to one bit output stream (153.6MHz)."

So I assumed that he meant 10-bit parallel data is input, then multiplied and added, and then the ten-bit result is shifted out one bit at a time.

-Dave Pollum