FPGARelated.com
Forums

Adders with multiple inputs?

Started by Unknown May 25, 2009
Hi guys,
At the moment I'm waiting to find out whether I will be using Xilinx
or Actel for my project, and so I'm putting it together for both just
in case.

In the Actel IP cores, there is an array adder which allows a good
number of inputs, and there's some optional pipelining. I figure it's
sufficient to just drop this in and wire up as many inputs as I need.

Xilinx IP cores seem to have only 2-input adders, and I'd guess XST
infers those from the + operator anyway, so I don't want to bother with
the core generator unless there's some reason why I should.
Supposing I want:

Result <= A + B + C + D + E;
Note: I used only five inputs in my example for brevity; I will have
more like 25 in my actual system.

(Looking at the XST manual, I can either pad the inputs with leading
zeros or convert to integer and back to std_logic_vector so the carry
bits fit in my wider result.)

At the end of the day, when I synthesize this, would there be any
difference between coding it in stages (adding the inputs in pairs,
then adding their sums, and so on until everything is summed up) and
just putting A+B+C+D+E in one statement?
All I can think of is that (depending on how well conversions to/from
integer are optimized in XST) I might save a few bits of space in the
first stages.
Using the bit-padding method, I suppose that all of the adders in the
first stages would wind up unnecessarily being the same width as the
result.
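
To make the comparison concrete, here is roughly what I mean in code (just a
sketch, untested; I'm assuming numeric_std, 8-bit unsigned inputs, and made-up
entity/port names, with the result sized at 13 bits so it also covers the
25-input case):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sum5 is
  port (
    A, B, C, D, E : in  std_logic_vector(7 downto 0);
    Result        : out std_logic_vector(12 downto 0));  -- 13 bits covers even the 25-input case
end entity sum5;

architecture rtl of sum5 is
  -- intermediate sums for the staged version, each only as wide as it needs to be
  signal ab, cd : unsigned(8 downto 0);
  signal abcd   : unsigned(9 downto 0);
begin
  -- Style 1: everything in one statement, each operand resized so the carries have room
  Result <= std_logic_vector(resize(unsigned(A), 13) + resize(unsigned(B), 13) +
                             resize(unsigned(C), 13) + resize(unsigned(D), 13) +
                             resize(unsigned(E), 13));

  -- Style 2: explicit pairwise stages (disable Style 1 above before using this)
  -- ab     <= resize(unsigned(A), 9)  + resize(unsigned(B), 9);
  -- cd     <= resize(unsigned(C), 9)  + resize(unsigned(D), 9);
  -- abcd   <= resize(ab, 10)          + resize(cd, 10);
  -- Result <= std_logic_vector(resize(abcd, 13) + resize(unsigned(E), 13));
end architecture rtl;

The staged version is where the first adders get to be only 9 and 10 bits
wide, which is the bit of space I was wondering about saving.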

Anyway, I'm just curious how this will end up working... any insight
appreciated!

Steve
<sbattazz@yahoo.co.jp> wrote in message 
news:2508079e-f147-4e15-b6bd-ac96f220afbd@s1g2000prd.googlegroups.com...
How fast do you need to clock it? How many bits wide is your result?
On May 25, 6:43 pm, "Andrew Holme" <a...@nospam.co.uk> wrote:
Assuming 25 8-bit inputs, the maximum result is 25*255 = 6375, meaning a
13-bit output.

Serial data comes in at 57.6 kilobits/second = 7200 bytes/second, and the sum
of my array is checked once per byte, so there will be a little over 1 ms
between clock pulses (I can't imagine that coming anywhere near causing
timing issues). For this project I won't need anything faster than that.

I'm just wondering how XST would handle such an addition statement with
multiple operands (my synthesis report doesn't say anything about adders). Is
it smart enough to automatically do some kind of tree algorithm, or would it
do a "dumb" array of one adder feeding into the next for each extra operand?

Thanks for the quick response!

Steve
On May 25, 4:56 am, sbatt...@yahoo.co.jp wrote:
Do you really need to recompute the entire array sum every time, or can you
compute a running sum (accumulator) as the data comes in? You can also
subtract the last discarded term from your running sum if you are looking for
a continuous N-term running sum (as is used in a boxcar filter, etc.).

As long as integers will handle your data size, you are much better off using
them than padding vectors. Simulations will run much faster, and there is no
hardware associated with converting between integer and std_logic_vector,
signed, or unsigned.

Andy
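
P.S. A rough sketch of the running-sum idea using integer types (untested; N,
the port names, and the new_byte strobe are invented here for illustration):

library ieee;
use ieee.std_logic_1164.all;

entity running_sum is
  generic (N : positive := 25);                      -- number of terms in the window
  port (
    clk      : in  std_logic;
    rst      : in  std_logic;
    new_byte : in  std_logic;                        -- one pulse per incoming byte
    din      : in  integer range 0 to 255;
    sum      : out integer range 0 to N * 255);
end entity running_sum;

architecture rtl of running_sum is
  type window_t is array (0 to N - 1) of integer range 0 to 255;
  signal window : window_t := (others => 0);
  signal acc    : integer range 0 to N * 255 := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        window <= (others => 0);
        acc    <= 0;
      elsif new_byte = '1' then
        -- add the newest term, subtract the one falling out of the window
        acc                <= acc + din - window(N - 1);
        window(0)          <= din;
        window(1 to N - 1) <= window(0 to N - 2);
      end if;
    end if;
  end process;
  sum <= acc;
end architecture rtl;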
On May 25, 2:11 am, sbatt...@yahoo.co.jp wrote:
If I understand you right, you have 25 parallel inputs, each sending you
bit-serial data. You need to convert the 25 inputs into one 6-bit binary
word, and then accumulate these words with increasing (or decreasing) binary
weight.

Conversion of 25 lines to 6 bits can be done in many ways, including
sequential scanning or shifting, which requires a faster clock of > 1.5 MHz.
But here is an unconventional and simpler way:

Use 13 inputs as the address to one port of a BlockRAM with 4 parallel
outputs (8K x 4). Use the remaining 12 inputs as the address to the other
port of the same BlockRAM. Store the conversion of (# of active inputs to a
binary value) in the BlockRAM.

Add the two 4-bit binary words together to form a 5-bit word that always
represents the number of active inputs. Then feed this 5-bit value into a
13-bit accumulator, where you shift the content after each clock tick.

This costs you one BlockRAM plus three or four CLBs in Xilinx nomenclature, a
tiny portion of the smallest Spartan or Virtex device, and it could be run a
few thousand times faster than you need. If you have more than 26 inputs,
just add another BlockRAM for a total of up to 52 inputs, and extend the
adder and accumulator by one bit. (Yes, I know in Spartan you are limited to
12 address inputs (4K x 4), but you can add the remaining bit outside...)

Peter Alfke, from home.
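
P.S. To put that datapath in code: a behavioral sketch of the
count-and-accumulate idea (untested; the BlockRAM lookup is written here as a
plain popcount function, and the clk/bit_clk/start names are invented):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity serial_sum is
  port (
    clk     : in  std_logic;
    bit_clk : in  std_logic;                        -- one pulse per serial bit time
    lines   : in  std_logic_vector(24 downto 0);    -- the 25 bit-serial inputs, MSB first
    start   : in  std_logic;                        -- asserted with the first (MSB) bit of a byte
    sum     : out unsigned(12 downto 0));           -- up to 25 * 255 = 6375
end entity serial_sum;

architecture rtl of serial_sum is
  -- behavioral stand-in for the BlockRAM lookup: count the '1' bits
  function popcount(v : std_logic_vector) return unsigned is
    variable c : unsigned(4 downto 0) := (others => '0');
  begin
    for i in v'range loop
      if v(i) = '1' then
        c := c + 1;
      end if;
    end loop;
    return c;
  end function;

  signal acc : unsigned(12 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if bit_clk = '1' then
        if start = '1' then
          -- first bit of a new byte: start a fresh accumulation
          acc <= resize(popcount(lines), 13);
        else
          -- shift-and-add: double the running value, add this bit position's count
          acc <= shift_left(acc, 1) + resize(popcount(lines), 13);
        end if;
      end if;
    end if;
  end process;
  sum <= acc;
end architecture rtl;

In the real thing, popcount(lines) would be the sum of the two 4-bit BlockRAM
outputs.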
On May 25, 9:16 am, Peter Alfke <al...@sbcglobal.net> wrote:
Hi Steve,

1. Set up a 16 x 8 FIFO.
2. Each of the 25 data sources is first registered in its own 8-bit register,
with a valid bit that is set once all of its data bits have arrived from its
serial data source.
3. When valid = '1', push the data into the FIFO and clear the valid bit.
4. Set up a 13-bit register that is initialized to 0 when a new calculation
starts.
5. When the FIFO is not empty, add to the 13-bit register the value whose
high 5 bits are '0' and whose low 8 bits come from the FIFO output.

There is no need to add all 25 data sources in parallel.

Weng
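
P.S. Steps 4 and 5 in code would be roughly this (untested; the
fifo_empty/fifo_dout/fifo_rd signals are assumed to come from a
first-word-fall-through FIFO, and the names are invented):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fifo_accumulate is
  port (
    clk        : in  std_logic;
    start      : in  std_logic;                      -- begin a new 25-term calculation
    fifo_empty : in  std_logic;
    fifo_dout  : in  std_logic_vector(7 downto 0);
    fifo_rd    : out std_logic;
    total      : out unsigned(12 downto 0));
end entity fifo_accumulate;

architecture rtl of fifo_accumulate is
  signal acc : unsigned(12 downto 0) := (others => '0');
begin
  fifo_rd <= not fifo_empty;                         -- pop whenever data is waiting
  process (clk)
  begin
    if rising_edge(clk) then
      if start = '1' then
        acc <= (others => '0');                      -- step 4: clear the 13-bit register
      elsif fifo_empty = '0' then
        -- step 5: high 5 bits '0', low 8 bits from the FIFO, added to the running total
        acc <= acc + resize(unsigned(fifo_dout), 13);
      end if;
    end if;
  end process;
  total <= acc;
end architecture rtl;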
On May 25, 10:29 pm, Andy <jonesa...@comcast.net> wrote:
Well, I have thought about the accumulator approach, but this is a 5x5 array
being fed data through some delay lines, so I would then need one accumulator
for each 5-byte row. Then I would still have to sum up the output of the five
accumulators, leaving me one stage deeper than a tree of adders taking all 25
inputs.

Peter, I'm not converting 25 lines to 6 bits, but rather taking a sum of 25
8-bit values resulting in a 13-bit value.

Thanks again for the replies so far!
On May 26, 3:58 am, Weng Tianxiang <wtx...@gmail.com> wrote:
Hi Weng,

I'm sorry I didn't explain in full what I am doing. There is only one serial
source feeding a string of delay lines, and at the end of the delay lines is
a 5x5 array of 8-bit registers whose sum I need to calculate. Each time the
serial source gets a byte in, everything in the delay lines and the 5x5 array
gets shifted, and I have a new sum to calculate (this happens once every ms
or so, though, so I'm not really worried about carry propagation). So in this
case, I don't think a FIFO would help any?

As far as I can see, as noted in my reply to Andy's post, my options are (a
slightly modified version of) his suggested accumulator solution, or feeding
my 25 inputs into a tree of adders. There could be some other clever solution
though?

I was originally just wondering if XST would generate such a tree with 25
operands in a sum statement, or if I would have to build the tree myself in a
few statements. The Actel array adder IP apparently uses the Dadda algorithm
to handle multiple inputs, but I haven't seen anything in the XST docs about
multiple-operand addition.
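
If I do end up building the tree myself, I imagine something along these
lines (untested; the package, the byte_array type, and the fixed 13-bit
result are all just illustrative, and I don't know whether XST accepts
recursive functions):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package adder_tree_pkg is
  type byte_array is array (natural range <>) of unsigned(7 downto 0);
  function tree_sum(v : byte_array) return unsigned;  -- 13-bit result
end package adder_tree_pkg;

package body adder_tree_pkg is
  -- Pairwise (balanced-tree) summation; whether this differs from what XST
  -- builds for a flat a+b+c+... expression is exactly my question.
  function tree_sum(v : byte_array) return unsigned is
    constant n  : natural := v'length;
    constant va : byte_array(0 to n - 1) := v;       -- normalize the index range
    variable lo, hi : unsigned(12 downto 0);
  begin
    if n = 0 then
      return to_unsigned(0, 13);
    elsif n = 1 then
      return resize(va(0), 13);
    else
      lo := tree_sum(va(0 to n / 2 - 1));
      hi := tree_sum(va(n / 2 to n - 1));
      return lo + hi;
    end if;
  end function;
end package body adder_tree_pkg;

The architecture would then just do something like sum <= tree_sum(taps);.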
On May 25, 7:00 pm, sbatt...@yahoo.co.jp wrote:
Well, where do you store the 200 bits (5 x 5 x 8) and how do you move them
into the FPGA? Remember, the internal logic is thousands of times faster than
you need...

Peter
On May 26, 11:48 am, Peter Alfke <al...@sbcglobal.net> wrote:
I declared five arrays (0 to 4) of std_logic_vector(7 downto 0) in signals
which get assigned in a clocked process, so if I understand correctly, each
of these bits gets a DFF. I'm not worried about resources, as I am only using
about 20% of the slices available in a 400k-gate Spartan-3 and I don't have
that many more things to add. I may even be able to get away with the
200k-gate chip. And when I added this set of adders to the design, it didn't
seem to take significantly more resources.

I understand that the logic is much faster than I need, so I guess it doesn't
really matter how I code it for this project, but I might like to go through
later (just for myself) and see how well I could optimize it and how fast it
could ultimately run if the incoming data rate were not limited.

I didn't post because I'm worried about issues with my design; I was more
just curious how XST would handle:

Result <= A + B + C + D + E + F + .... + X + Y;

Cheers!

Steve
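
P.S. For reference, a boiled-down sketch of the kind of sum I mean (not my
actual code; the tap_types package, the entity, and the port names are
invented here, and the delay-line shifting is left out):

library ieee;
use ieee.std_logic_1164.all;

package tap_types is
  type row_t   is array (0 to 4) of std_logic_vector(7 downto 0);
  type array_t is array (0 to 4) of row_t;
end package tap_types;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.tap_types.all;

entity array_sum_5x5 is
  port (
    clk    : in  std_logic;
    taps   : in  array_t;                 -- the 5x5 window, maintained by the delay lines
    result : out unsigned(12 downto 0));  -- max 25 * 255 = 6375
end entity array_sum_5x5;

architecture rtl of array_sum_5x5 is
begin
  process (clk)
    variable acc : unsigned(12 downto 0);
  begin
    if rising_edge(clk) then
      acc := (others => '0');
      for r in taps'range loop
        for c in taps(r)'range loop
          acc := acc + resize(unsigned(taps(r)(c)), 13);
        end loop;
      end loop;
      result <= acc;
    end if;
  end process;
end architecture rtl;

Written with the loops, the elaborated expression is the same long chain of
"+" as writing all 25 operands out, so it comes back to the same question of
what XST does with it.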