FPGARelated.com
Forums

Adders with multiple inputs?

Started by Unknown May 25, 2009
Hi guys,
At the moment I'm waiting to find out whether I will be using Xilinx
or Actel for my project, and so I'm putting it together for both just
in case.

In the Actel IP cores, there is an array adder which allows a good
number of inputs, and there's some optional pipelining. I figure it's
sufficient to just drop this in and wire up as many inputs as I need.

Xilinx IP cores seem to have only 2-input adders, and I'd guess XST
infers those from the + operator anyway, so I don't want to bother with
the core generator unless there's some reason why I should.
Supposing I want:

Result <= A + B + C + D + E;
Note: I used only five inputs in my example for brevity; I will have
more like 25 in my actual system.

(Looking at the XST manual, I can either pad the inputs with leading
zeros or convert to integer and back to std_logic_vector so the carry
bits fit in my wider result.)

At the end of the day, when I synthesize this, would there be any
difference between coding it in stages (adding the inputs in pairs,
then adding their sums, and so on until everything is summed up) and
just putting A+B+C+D+E in one statement?
All I can think of is that (depending on how well conversions to/from
integer are optimized in XST) I might save a few bits of space in the
first stages.
Using the bit-padding method, I suppose that all of the adders in the
first stages would wind up unnecessarily being the same width as the
result.
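
To make the comparison concrete, here is roughly what I mean in code (just a
sketch, untested; I'm assuming numeric_std, 8-bit unsigned inputs, and made-up
entity/port names, with the result sized at 13 bits so it also covers the
25-input case):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sum5 is
  port (
    A, B, C, D, E : in  std_logic_vector(7 downto 0);
    Result        : out std_logic_vector(12 downto 0));  -- 13 bits covers even the 25-input case
end entity sum5;

architecture rtl of sum5 is
  -- intermediate sums for the staged version, each only as wide as it needs to be
  signal ab, cd : unsigned(8 downto 0);
  signal abcd   : unsigned(9 downto 0);
begin
  -- Style 1: everything in one statement, each operand resized so the carries have room
  Result <= std_logic_vector(resize(unsigned(A), 13) + resize(unsigned(B), 13) +
                             resize(unsigned(C), 13) + resize(unsigned(D), 13) +
                             resize(unsigned(E), 13));

  -- Style 2: explicit pairwise stages (disable Style 1 above before using this)
  -- ab     <= resize(unsigned(A), 9)  + resize(unsigned(B), 9);
  -- cd     <= resize(unsigned(C), 9)  + resize(unsigned(D), 9);
  -- abcd   <= resize(ab, 10)          + resize(cd, 10);
  -- Result <= std_logic_vector(resize(abcd, 13) + resize(unsigned(E), 13));
end architecture rtl;

The staged version is where the first adders get to be only 9 and 10 bits
wide, which is the bit of space I was wondering about saving.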

Anyway, I'm just curious how this will end up working... any insight
appreciated!

Steve
<sbattazz@yahoo.co.jp> wrote in message 
news:2508079e-f147-4e15-b6bd-ac96f220afbd@s1g2000prd.googlegroups.com...
How fast do you need to clock it? How many bits wide is your result?
On May 25, 6:43 pm, "Andrew Holme" <a...@nospam.co.uk> wrote:
Assuming 25 8-bit inputs, the maximum result is 25*255 = 6375, meaning a
13-bit output.

Serial data comes in at 57.6 kilobits/second = 7200 bytes/second, and the sum
of my array is checked once per byte, so there will be a little over 1 ms
between clock pulses (I can't imagine that coming anywhere near causing
timing issues). For this project I won't need anything faster than that.

I'm just wondering how XST would handle such an addition statement with
multiple operands (my synthesis report doesn't say anything about adders). Is
it smart enough to automatically do some kind of tree algorithm, or would it
do a "dumb" array of one adder feeding into the next for each extra operand?

Thanks for the quick response!

Steve
On May 25, 4:56 am, sbatt...@yahoo.co.jp wrote:
Do you really need to recompute the entire array sum every time, or can you
compute a running sum (accumulator) as the data comes in? You can also
subtract the last discarded term from your running sum if you are looking for
a continuous N-term running sum (as is used in a boxcar filter, etc.).

As long as integers will handle your data size, you are much better off using
them than padding vectors. Simulations will run much faster, and there is no
hardware associated with converting between integer and std_logic_vector,
signed, or unsigned.

Andy
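
P.S. A rough sketch of the running-sum idea using integer types (untested; N,
the port names, and the new_byte strobe are invented here for illustration):

library ieee;
use ieee.std_logic_1164.all;

entity running_sum is
  generic (N : positive := 25);                      -- number of terms in the window
  port (
    clk      : in  std_logic;
    rst      : in  std_logic;
    new_byte : in  std_logic;                        -- one pulse per incoming byte
    din      : in  integer range 0 to 255;
    sum      : out integer range 0 to N * 255);
end entity running_sum;

architecture rtl of running_sum is
  type window_t is array (0 to N - 1) of integer range 0 to 255;
  signal window : window_t := (others => 0);
  signal acc    : integer range 0 to N * 255 := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        window <= (others => 0);
        acc    <= 0;
      elsif new_byte = '1' then
        -- add the newest term, subtract the one falling out of the window
        acc                <= acc + din - window(N - 1);
        window(0)          <= din;
        window(1 to N - 1) <= window(0 to N - 2);
      end if;
    end if;
  end process;
  sum <= acc;
end architecture rtl;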
On May 25, 2:11 am, sbatt...@yahoo.co.jp wrote:
If I understand you right, you have 25 parallel inputs, each sending you
bit-serial data. You need to convert the 25 inputs into one 6-bit binary
word, and then accumulate these words with increasing (or decreasing) binary
weight.

Conversion of 25 lines to 6 bits can be done in many ways, including
sequential scanning or shifting, which requires a faster clock of > 1.5 MHz.
But here is an unconventional and simpler way:

Use 13 inputs as the address to one port of a BlockRAM with 4 parallel
outputs (8K x 4). Use the remaining 12 inputs as the address to the other
port of the same BlockRAM. Store the conversion of (# of active inputs to a
binary value) in the BlockRAM.

Add the two 4-bit binary words together to form a 5-bit word that always
represents the number of active inputs. Then feed this 5-bit value into a
13-bit accumulator, where you shift the content after each clock tick.

This costs you one BlockRAM plus three or four CLBs in Xilinx nomenclature, a
tiny portion of the smallest Spartan or Virtex device, and it could be run a
few thousand times faster than you need. If you have more than 26 inputs,
just add another BlockRAM for a total of up to 52 inputs, and extend the
adder and accumulator by one bit. (Yes, I know in Spartan you are limited to
12 address inputs (4K x 4), but you can add the remaining bit outside...)

Peter Alfke, from home.
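
P.S. To put that datapath in code: a behavioral sketch of the
count-and-accumulate idea (untested; the BlockRAM lookup is written here as a
plain popcount function, and the clk/bit_clk/start names are invented):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity serial_sum is
  port (
    clk     : in  std_logic;
    bit_clk : in  std_logic;                        -- one pulse per serial bit time
    lines   : in  std_logic_vector(24 downto 0);    -- the 25 bit-serial inputs, MSB first
    start   : in  std_logic;                        -- asserted with the first (MSB) bit of a byte
    sum     : out unsigned(12 downto 0));           -- up to 25 * 255 = 6375
end entity serial_sum;

architecture rtl of serial_sum is
  -- behavioral stand-in for the BlockRAM lookup: count the '1' bits
  function popcount(v : std_logic_vector) return unsigned is
    variable c : unsigned(4 downto 0) := (others => '0');
  begin
    for i in v'range loop
      if v(i) = '1' then
        c := c + 1;
      end if;
    end loop;
    return c;
  end function;

  signal acc : unsigned(12 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if bit_clk = '1' then
        if start = '1' then
          -- first bit of a new byte: start a fresh accumulation
          acc <= resize(popcount(lines), 13);
        else
          -- shift-and-add: double the running value, add this bit position's count
          acc <= shift_left(acc, 1) + resize(popcount(lines), 13);
        end if;
      end if;
    end if;
  end process;
  sum <= acc;
end architecture rtl;

In the real thing, popcount(lines) would be the sum of the two 4-bit BlockRAM
outputs.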
On May 25, 9:16 am, Peter Alfke <al...@sbcglobal.net> wrote:
Hi Steve,

1. Set up a 16 x 8 FIFO.
2. Each of the 25 data sources is first registered in its own 8-bit register,
with a valid bit that is set once all of its data bits have arrived from its
serial data source.
3. When valid = '1', push the data into the FIFO and clear the valid bit.
4. Set up a 13-bit register that is initialized to 0 when a new calculation
starts.
5. When the FIFO is not empty, add to the 13-bit register the value whose
high 5 bits are '0' and whose low 8 bits come from the FIFO output.

There is no need to add all 25 data sources in parallel.

Weng
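
P.S. Steps 4 and 5 in code would be roughly this (untested; the
fifo_empty/fifo_dout/fifo_rd signals are assumed to come from a
first-word-fall-through FIFO, and the names are invented):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fifo_accumulate is
  port (
    clk        : in  std_logic;
    start      : in  std_logic;                      -- begin a new 25-term calculation
    fifo_empty : in  std_logic;
    fifo_dout  : in  std_logic_vector(7 downto 0);
    fifo_rd    : out std_logic;
    total      : out unsigned(12 downto 0));
end entity fifo_accumulate;

architecture rtl of fifo_accumulate is
  signal acc : unsigned(12 downto 0) := (others => '0');
begin
  fifo_rd <= not fifo_empty;                         -- pop whenever data is waiting
  process (clk)
  begin
    if rising_edge(clk) then
      if start = '1' then
        acc <= (others => '0');                      -- step 4: clear the 13-bit register
      elsif fifo_empty = '0' then
        -- step 5: high 5 bits '0', low 8 bits from the FIFO, added to the running total
        acc <= acc + resize(unsigned(fifo_dout), 13);
      end if;
    end if;
  end process;
  total <= acc;
end architecture rtl;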
On May 25, 10:29 pm, Andy <jonesa...@comcast.net> wrote:
Well, I have thought about the accumulator approach, but this is a 5x5 array
being fed data through some delay lines, so I would then need one accumulator
for each 5-byte row. Then I would still have to sum up the output of the five
accumulators, leaving me one stage deeper than a tree of adders taking all 25
inputs.

Peter, I'm not converting 25 lines to 6 bits, but rather taking a sum of 25
8-bit values resulting in a 13-bit value.

Thanks again for the replies so far!
On May 26, 3:58 am, Weng Tianxiang <wtx...@gmail.com> wrote:
Hi Weng,

I'm sorry I didn't explain in full what I am doing. There is only one serial
source feeding a string of delay lines, and at the end of the delay lines is
a 5x5 array of 8-bit registers whose sum I need to calculate. Each time the
serial source gets a byte in, everything in the delay lines and the 5x5 array
gets shifted, and I have a new sum to calculate (this happens once every ms
or so, though, so I'm not really worried about carry propagation). So in this
case, I don't think a FIFO would help any?

As far as I can see, as noted in my reply to Andy's post, my options are (a
slightly modified version of) his suggested accumulator solution, or feeding
my 25 inputs into a tree of adders. There could be some other clever solution
though?

I was originally just wondering if XST would generate such a tree with 25
operands in a sum statement, or if I would have to build the tree myself in a
few statements. The Actel array adder IP apparently uses the Dadda algorithm
to handle multiple inputs, but I haven't seen anything in the XST docs about
multiple-operand addition.
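
If I do end up building the tree myself, I imagine something along these
lines (untested; the package, the byte_array type, and the fixed 13-bit
result are all just illustrative, and I don't know whether XST accepts
recursive functions):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package adder_tree_pkg is
  type byte_array is array (natural range <>) of unsigned(7 downto 0);
  function tree_sum(v : byte_array) return unsigned;  -- 13-bit result
end package adder_tree_pkg;

package body adder_tree_pkg is
  -- Pairwise (balanced-tree) summation; whether this differs from what XST
  -- builds for a flat a+b+c+... expression is exactly my question.
  function tree_sum(v : byte_array) return unsigned is
    constant n  : natural := v'length;
    constant va : byte_array(0 to n - 1) := v;       -- normalize the index range
    variable lo, hi : unsigned(12 downto 0);
  begin
    if n = 0 then
      return to_unsigned(0, 13);
    elsif n = 1 then
      return resize(va(0), 13);
    else
      lo := tree_sum(va(0 to n / 2 - 1));
      hi := tree_sum(va(n / 2 to n - 1));
      return lo + hi;
    end if;
  end function;
end package body adder_tree_pkg;

The architecture would then just do something like sum <= tree_sum(taps);.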
On May 25, 7:00 pm, sbatt...@yahoo.co.jp wrote:
Well, where do you store the 200 bits (5 x 5 x 8) and how do you move them
into the FPGA? Remember, the internal logic is thousands of times faster than
you need...

Peter
On May 26, 11:48 am, Peter Alfke <al...@sbcglobal.net> wrote:
I declared five arrays (0 to 4) of std_logic_vector(7 downto 0) in signals
which get assigned in a clocked process, so if I understand correctly, each
of these bits gets a DFF. I'm not worried about resources, as I am only using
about 20% of the slices available in a 400k-gate Spartan-3 and I don't have
that many more things to add. I may even be able to get away with the
200k-gate chip. And when I added this set of adders to the design, it didn't
seem to take significantly more resources.

I understand that the logic is much faster than I need, so I guess it doesn't
really matter how I code it for this project, but I might like to go through
later (just for myself) and see how well I could optimize it and how fast it
could ultimately run if the incoming data rate were not limited.

I didn't post because I'm worried about issues with my design; I was more
just curious how XST would handle:

Result <= A + B + C + D + E + F + .... + X + Y;

Cheers!

Steve
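
P.S. For reference, a boiled-down sketch of the kind of sum I mean (not my
actual code; the tap_types package, the entity, and the port names are
invented here, and the delay-line shifting is left out):

library ieee;
use ieee.std_logic_1164.all;

package tap_types is
  type row_t   is array (0 to 4) of std_logic_vector(7 downto 0);
  type array_t is array (0 to 4) of row_t;
end package tap_types;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.tap_types.all;

entity array_sum_5x5 is
  port (
    clk    : in  std_logic;
    taps   : in  array_t;                 -- the 5x5 window, maintained by the delay lines
    result : out unsigned(12 downto 0));  -- max 25 * 255 = 6375
end entity array_sum_5x5;

architecture rtl of array_sum_5x5 is
begin
  process (clk)
    variable acc : unsigned(12 downto 0);
  begin
    if rising_edge(clk) then
      acc := (others => '0');
      for r in taps'range loop
        for c in taps(r)'range loop
          acc := acc + resize(unsigned(taps(r)(c)), 13);
        end loop;
      end loop;
      result <= acc;
    end if;
  end process;
end architecture rtl;

Written with the loops, the elaborated expression is the same long chain of
"+" as writing all 25 operands out, so it comes back to the same question of
what XST does with it.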