FPGARelated.com
Forums

Adders with multiple inputs?

Started by Unknown May 25, 2009
On 26 Mai, 05:10, sbatt...@yahoo.co.jp wrote:
> I didn't really post here because I feel like I would have any issues > with my design, I was more just curious on how XST would handle: > Result <= A + B + C + D + E + F + .... + X + Y;
At first it surely will create a linear chain of adders in the order that the equation has to be parsed based on VHDL semantics. (I am not sure whether that is left to right or right to left) There might be some extra logic to optimize the structure either at the RTL level or later on the bitlevel. It actually is rather simple to do that after placement. See my paper: http://eis.eit.uni-kl.de/eis/research/publications/papers/iccd04.pdf I do not know whether ISE does implement any of this so it is probably better to force the order the equation is parsed using brackets and create a minimum depth tree: Result <= (A + B) + (C+D); Beware: The performance of this might actually be worse. The reason for this is, that when you go through N adders of M Bits your critical path does not have length N*M but only N+M because all the carries run in the same direction. Therefore a regularly laid out array of adders might be faster than a tree if the the regularity significantly reduces the routing delay between the adders. Do not be concerned about the additional input bits that receive 0 inputs when extending your adders. Logic optimizers are very good at propagating constants. Unnecessary LUTs will be removed. But I agree with the other posters: You do not need 25 adders, 15 adders and some shift registers are enough for your application. You only need to keep track of 5 sums and update them with every new data word using two adders. Have fun, Kolja Sulimma
On May 25, 7:14=A0pm, sbatt...@yahoo.co.jp wrote:
> On May 26, 3:58=A0am, Weng Tianxiang <wtx...@gmail.com> wrote: > > > > > > > On May 25, 9:16=A0am, Peter Alfke <al...@sbcglobal.net> wrote: > > > > On May 25, 2:11=A0am, sbatt...@yahoo.co.jp wrote: > > > > > Hi guys, > > > > At the moment I'm waiting to find out whether I will be using Xilin=
x
> > > > or Actel for my project, and so I'm putting it together for both ju=
st
> > > > in case. > > > > > In the Actel IP cores, there is an array adder which allows a good > > > > number of inputs, and there's some optional pipelining. I figure it=
's
> > > > sufficient to just drop this in and wire up as many inputs as I nee=
d.
> > > > > Xilinx IP cores seem to have only 2-input adders, and I guess these > > > > are probably inferred by XST with the + operator anyway, so I don't > > > > want to bother with the IP core gen unless there's some reason why =
I
> > > > should. > > > > Supposing I want: > > > > > Result <=3D A + B + C + D + E; > > > > Note, I used only five inputs in my example for brevity, I will hav=
e
> > > > more like 25 in my actual system. > > > > > (looking in the XST manual, I can either pad the inputs with leadin=
g
> > > > zeros or convert to integer and back to std_logic_vector to get car=
ry
> > > > bits to fill my wider result) > > > > > At the end of the day, when I synthesize this, would there be any > > > > difference between coding it in stages (adding pairs of two togethe=
r,
> > > > then adding their sums together, and so on until all are added up) =
and
> > > > just putting A+B+C+D+E in one statement? > > > > All I can think of is that (depending how well conversions to/from > > > > integer are optimized in XST) I might save a few bits of space in t=
he
> > > > first stages. > > > > Using the bit padding method, I suppose that all of the adders in t=
he
> > > > first stages would wind up unnecessarily being the same width as th=
e
> > > > result. > > > > > Anyway, I'm just curious how this will end up working... any insigh=
t
> > > > appreciated! > > > > > Steve > > > > If I understand you right, you have 25 parallel inputs, each sending > > > you bit-serial data. > > > You need to convert the 25 inputs into one 6-bit binary word, and the=
n
> > > accumulate these words with increasing (or decreasing) binary weight. > > > > Conversion of 25 lines to 6 bits can be done in many ways, including > > > sequential scanning or shifting, which requires a faster clock of > > > > 1.5 MHz. > > > But here is an unconventional and simpler way: > > > Use 13 inputs as address to one port of a BlockRAM with 4 parallel > > > outputs =A0(8K x 4) > > > Use the remaining 12 inputs as address to the other port of the same > > > BlockRAM. > > > Store the conversion of (# of active inputs to a binary value) in the > > > BlockRAM. > > > > Add the two 4 bit binary words together to form a 5-bit word that > > > always represents the number of active inputs. > > > Then feed this 5-bit value into a 13-bit accumulator, where you shift > > > the content after each clock tick. > > > > This costs you one BlockRAM plus three or four CLBs in Xilinx > > > nomenclature, a tiny portion of the smallest Spartan or Virtex device=
,
> > > and it could be run a few thousand times faster than you need. > > > If you have more than 26 inputs, just add another BlockRAM for a tota=
l
> > > of up to 52 inputs, and extend the adder and accumulator by one bit. > > > (Yes, I know in Spartan you are limited to 12 address inputs, (4K x > > > 4), but you can add the remaining bit outside...) > > > > Peter Alfke, from home.- Hide quoted text - > > > > - Show quoted text - > > > Hi Steve, > > 1. Set up a 16*8 FIFO; > > 2. Each of 25 data sources is first registered in its 8-bit register > > with valid bit when data bits are full from its serial data source; > > 3. When valid =3D '1', push the data into FIFO and clear the valid bit; > > 4. Set up a 13-bit register with initialized 0 data when a new > > calculation starts; > > 5. When FIFO is not empty, add 13-bit register with high 5-bit being > > '0' and low 8-bit from FIFO output. > > > There is no need for 25 data sources. > > > Weng > > Hi Weng, > > I'm sorry I didn't explain in full what I am doing. There is only one > serial source feeding a string of delay lines, and at the end of the > delay lines is a 5x5 array of 8-bit registers whose sum I need to > calculate. Each time the serial source gets a byte in, everything in > the delay lines and 5x5 array gets shifted, and I have a new sum to > calculate (this happens once every ms or so, though, so I'm not really > worried about carry propagation). > So in this case, I don't think a FIFO would help any? > > As far as I can see, as noted in my reply to Andy's post, my options > are (a slightly modified version of) his suggested accumulator > solution, or feeding my 25 inputs into a tree of adders. There could > be some other clever solution though? > > I was originally just wondering if XST would generate such a tree with > 25 operands in a sum statement, or if I would have to build the tree > myself in a few statements. > The Actel array adder IP apparently uses the DADDA algorithm to handle > multiple inputs, but I haven't seen anything in the XST docs about > multiple-operand addition.- Hide quoted text - > > - Show quoted text -
"Each time the serial source gets a byte in, everything in the delay lines and 5x5 array gets shifted, and I have a new sum to calculate", OK, there is no FIFO needed. Every time you get a byte from the serial input, a 13-bit adder is used to add the data with high 5-bit '0' on the same clock when the byte is being shifted. Weng
On May 27, 12:17=A0am, Weng Tianxiang <wtx...@gmail.com> wrote:
> On May 25, 7:14=A0pm, sbatt...@yahoo.co.jp wrote: > > > > > On May 26, 3:58=A0am, Weng Tianxiang <wtx...@gmail.com> wrote: > > > > On May 25, 9:16=A0am, Peter Alfke <al...@sbcglobal.net> wrote: > > > > > On May 25, 2:11=A0am, sbatt...@yahoo.co.jp wrote: > > > > > > Hi guys, > > > > > At the moment I'm waiting to find out whether I will be using Xil=
inx
> > > > > or Actel for my project, and so I'm putting it together for both =
just
> > > > > in case. > > > > > > In the Actel IP cores, there is an array adder which allows a goo=
d
> > > > > number of inputs, and there's some optional pipelining. I figure =
it's
> > > > > sufficient to just drop this in and wire up as many inputs as I n=
eed.
> > > > > > Xilinx IP cores seem to have only 2-input adders, and I guess the=
se
> > > > > are probably inferred by XST with the + operator anyway, so I don=
't
> > > > > want to bother with the IP core gen unless there's some reason wh=
y I
> > > > > should. > > > > > Supposing I want: > > > > > > Result <=3D A + B + C + D + E; > > > > > Note, I used only five inputs in my example for brevity, I will h=
ave
> > > > > more like 25 in my actual system. > > > > > > (looking in the XST manual, I can either pad the inputs with lead=
ing
> > > > > zeros or convert to integer and back to std_logic_vector to get c=
arry
> > > > > bits to fill my wider result) > > > > > > At the end of the day, when I synthesize this, would there be any > > > > > difference between coding it in stages (adding pairs of two toget=
her,
> > > > > then adding their sums together, and so on until all are added up=
) and
> > > > > just putting A+B+C+D+E in one statement? > > > > > All I can think of is that (depending how well conversions to/fro=
m
> > > > > integer are optimized in XST) I might save a few bits of space in=
the
> > > > > first stages. > > > > > Using the bit padding method, I suppose that all of the adders in=
the
> > > > > first stages would wind up unnecessarily being the same width as =
the
> > > > > result. > > > > > > Anyway, I'm just curious how this will end up working... any insi=
ght
> > > > > appreciated! > > > > > > Steve > > > > > If I understand you right, you have 25 parallel inputs, each sendin=
g
> > > > you bit-serial data. > > > > You need to convert the 25 inputs into one 6-bit binary word, and t=
hen
> > > > accumulate these words with increasing (or decreasing) binary weigh=
t.
> > > > > Conversion of 25 lines to 6 bits can be done in many ways, includin=
g
> > > > sequential scanning or shifting, which requires a faster clock of > > > > > 1.5 MHz. > > > > But here is an unconventional and simpler way: > > > > Use 13 inputs as address to one port of a BlockRAM with 4 parallel > > > > outputs =A0(8K x 4) > > > > Use the remaining 12 inputs as address to the other port of the sam=
e
> > > > BlockRAM. > > > > Store the conversion of (# of active inputs to a binary value) in t=
he
> > > > BlockRAM. > > > > > Add the two 4 bit binary words together to form a 5-bit word that > > > > always represents the number of active inputs. > > > > Then feed this 5-bit value into a 13-bit accumulator, where you shi=
ft
> > > > the content after each clock tick. > > > > > This costs you one BlockRAM plus three or four CLBs in Xilinx > > > > nomenclature, a tiny portion of the smallest Spartan or Virtex devi=
ce,
> > > > and it could be run a few thousand times faster than you need. > > > > If you have more than 26 inputs, just add another BlockRAM for a to=
tal
> > > > of up to 52 inputs, and extend the adder and accumulator by one bit=
.
> > > > (Yes, I know in Spartan you are limited to 12 address inputs, (4K x > > > > 4), but you can add the remaining bit outside...) > > > > > Peter Alfke, from home.- Hide quoted text - > > > > > - Show quoted text - > > > > Hi Steve, > > > 1. Set up a 16*8 FIFO; > > > 2. Each of 25 data sources is first registered in its 8-bit register > > > with valid bit when data bits are full from its serial data source; > > > 3. When valid =3D '1', push the data into FIFO and clear the valid bi=
t;
> > > 4. Set up a 13-bit register with initialized 0 data when a new > > > calculation starts; > > > 5. When FIFO is not empty, add 13-bit register with high 5-bit being > > > '0' and low 8-bit from FIFO output. > > > > There is no need for 25 data sources. > > > > Weng > > > Hi Weng, > > > I'm sorry I didn't explain in full what I am doing. There is only one > > serial source feeding a string of delay lines, and at the end of the > > delay lines is a 5x5 array of 8-bit registers whose sum I need to > > calculate. Each time the serial source gets a byte in, everything in > > the delay lines and 5x5 array gets shifted, and I have a new sum to > > calculate (this happens once every ms or so, though, so I'm not really > > worried about carry propagation). > > So in this case, I don't think a FIFO would help any? > > > As far as I can see, as noted in my reply to Andy's post, my options > > are (a slightly modified version of) his suggested accumulator > > solution, or feeding my 25 inputs into a tree of adders. There could > > be some other clever solution though? > > > I was originally just wondering if XST would generate such a tree with > > 25 operands in a sum statement, or if I would have to build the tree > > myself in a few statements. > > The Actel array adder IP apparently uses the DADDA algorithm to handle > > multiple inputs, but I haven't seen anything in the XST docs about > > multiple-operand addition.- Hide quoted text - > > > - Show quoted text - > > "Each time the serial source gets a byte in, everything in =A0the delay > lines and 5x5 array gets shifted, and I have a new sum to > calculate", OK, there is no FIFO needed. > > Every time you get a byte from the serial input, a 13-bit adder is > used to add the data with high 5-bit '0' on the same clock when the > byte is being shifted. > > Weng
Alright, thanks for all the replies! For now I guess I will go with having an accumulator on each row of the array, adding the new data that is being shifted in, and subtracting off the end of each row that is being shifted out into nowhere land. Including the accumulators, there are then 9 adders and 5 subtracters. Cheers, Steve