FPGARelated.com

FPGA multiplier

Started by sutejok September 5, 2006
From the Xilinx Virtex-4 spec:

· XtremeDSP™ Slice
- 18x18, two's complement, signed Multiplier
- Optional pipeline stages
- Built-In Accumulator (48-bits) & Adder/Subtracter


I'm not too familiar with DSP on FPGAs. What does it mean when it says
18x18 multiplier? Is it a hardware multiplier? Is there anywhere I can
get information on them and how to use them?
Something specific to Virtex-4 would be nice.

thx
tejo

Click on
http://www.xilinx.com/bvdocs/userguides/ug073.pdf
for an extensive User Guide.
Peter Alfke, Xilinx
Tejo,

http://direct.xilinx.com/bvdocs/userguides/ug073.pdf

Yes, the 18x18 multiplier/accumulator is a hardened block, so performing
this function uses 8 to 20 times less power than it would if it were
done in the general logic of the FPGA (LUTs, DFFs, interconnect, etc.).
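For readers new to the terms: "18x18, two's complement, signed" means each multiplier input is an 18-bit signed integer (-131072 to 131071), so the full product needs 36 bits, and the 48-bit accumulator adds successive products into a running sum. The Python sketch below is only a behavioural model under stated assumptions (overflow wraps modulo 2^48, no saturation; pipeline registers and the other DSP48 modes described in UG073 are ignored). The function names are hypothetical; this is not the Xilinx primitive or its ports.

```python
# Behavioural model of an 18x18 signed multiply with a 48-bit accumulator.
# Assumption: the accumulator wraps around on overflow (no saturation), and
# pipeline registers and other DSP48 modes are ignored.

OPERAND_BITS = 18   # width of each multiplier input
ACC_BITS = 48       # width of the accumulator


def to_signed(value: int, bits: int) -> int:
    """Keep the low `bits` bits of value and read them as two's complement."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def check_operand(x: int) -> int:
    """Reject values that do not fit in an 18-bit signed input."""
    lo, hi = -(1 << (OPERAND_BITS - 1)), (1 << (OPERAND_BITS - 1)) - 1
    if not lo <= x <= hi:
        raise ValueError(f"{x} is outside the 18-bit signed range [{lo}, {hi}]")
    return x


def mac(samples, coeffs, acc: int = 0) -> int:
    """Accumulate samples[i] * coeffs[i] in a 48-bit register."""
    for a, b in zip(samples, coeffs):
        product = check_operand(a) * check_operand(b)  # always fits in 36 bits signed
        acc = to_signed(acc + product, ACC_BITS)        # wrap to 48 bits
    return acc


if __name__ == "__main__":
    print(mac([-131072], [-131072]))    # 17179869184, i.e. 2**34, the largest product
    print(mac([1, -2, 3], [4, 5, -6]))  # -24
```

The dot product in the usage lines is the inner loop of an FIR filter, which is the kind of work the hardened slice takes off the general logic.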

The guide above details how to use the V4 for "extreme" DSP applications.

FPGAs are useful for tasks that DSP processors are too slow for;
otherwise, DSP processors are generally far easier to use and better
suited to DSP. For example, a video conference processor, where multiple
streams must be encoded, decoded, and combined, along with all the audio
processing, is one task where an FPGA would excel in cost, power, and
performance.

http://www.demosondemand.com/clients/xilinx/001/page/index_dsp_review.asp

Austin

Austin Lesea <austin@xilinx.com> wrote:

>FPGAs are useful for tasks that DSP processors are too slow for,
>otherwise, DSP processors are generally far easier and better suited for
>DSP.  For example, a video conference processor, where multiple streams
>must be encoded, decoded, combined, along with all audio processing is
>one such task where a FPGA would excel for both cost, power, and
>performance.
I have my doubts about the cost and performance. Developing such a device would probably take so much time that an ASIC is just as cost-effective and uses even less power. An older PC is already capable of doing these functions, with a development time that can be expressed in days, not years.

Even when dealing with loads of video streams at high resolutions, it is more cost-effective, reliable, and flexible to use PCs on a fast network to do the processing rather than building a custom FPGA/ASIC solution.

--
Reply to nico@nctdevpuntnl (punt=.)
You can find businesses and shops at www.adresboekje.nl
Nico,

We (Peter and I) have decided to avoid any marketing discussions.

On technical subjects, at least I can (usually) post something useful.

You decide when, where, and how to use Xilinx FPGAs.

I am happy if you use them at all, for whatever you find profitable.

The website speaks for itself:  there are lots of customer testimonials
for those who wish to read them.  You decide.

Perhaps others can post why they use our FPGAs for extreme DSP?

Austin



Nico Coesel wrote:
> Even when dealing with loads of videostreams at high resolutions it is
> more cost effective, reliable and flexible to use PCs on a fast
> network to do the processing rather than building a custom FPGA/ASIC
> solution.
Probably there are other constraints. Do you need 1,000 of these? Is it a mass-market product? Does it need to fit in a rack? In a unit the size of a pack of cigarettes? Getting something working in a lab in just any old manner, perhaps for getting funding, is one thing. Getting to a fieldable solution...

There is no way one can make a blanket statement about the best solution without knowing all the requirements.

In my case I worked on a project that had to encode live NTSC video to MPEG-2 I-frames with a minimum of latency. The solution ended up being DSP based: Blackfin DSPs, 4 video/audio inputs on a PCI card. It was a big project. In the end we were limited by the performance of the DSPs. The frame size ended up being 352x240 at 60 Hz. There was just no way we could ever do 720x240; we had to downsample in X.

Now if we'd opted for an FPGA, if it was big enough we'd have had lots of options to improve performance. We could have licensed some existing IP and modified it to suit. It would have been a bigger unknown since none of us had direct FPGA experience, but we did have low-level programming experience. Going the FPGA route might have been a better investment in the long run...

Use the right technology/solution for the task at hand. No one size fits all.

-Dave

--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
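To put the 352x240 versus 720x240 figures in perspective, the Python sketch below works out the raw pixel rates at 60 Hz with 240 lines, per input and for the 4 inputs on the card. It counts luma samples only and ignores chroma, audio, and blanking, so it understates the real data volume; the function name is illustrative.

```python
# Luma pixel rates for the two frame sizes mentioned above.
# Counts luma samples only (no chroma, audio or blanking).

INPUTS = 4          # video inputs on the card, from the post
LINES = 240         # lines per field, from the post
FIELD_RATE_HZ = 60  # fields per second, from the post


def pixels_per_second(width: int, lines: int = LINES, rate_hz: int = FIELD_RATE_HZ) -> int:
    return width * lines * rate_hz


if __name__ == "__main__":
    for width in (352, 720):
        per_input = pixels_per_second(width)
        print(f"{width}x{LINES} @ {FIELD_RATE_HZ} Hz: {per_input:,} px/s per input, "
              f"{per_input * INPUTS:,} px/s for {INPUTS} inputs")
    print(f"720/352 ratio: {pixels_per_second(720) / pixels_per_second(352):.2f}x")
```

This prints 5,068,800 px/s per input at 352 wide and 10,368,000 px/s at 720 wide, a ratio of about 2.05x, so full horizontal resolution would have roughly doubled the per-pixel work on hardware that was already at its limit.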
David Ashley wrote:
> Now if we'd opted for an FPGA, if it was big enough we'd have had
> lots of options to improve performance. We could have licensed
> some existing IP, modified it to suit. It would have been a bigger
> unknown since none of us had direct FPGA experience, but we did
> have low level programming experience. Going an FPGA route might
> have been a better investment in the long run...
One more thing occurred to me. With the Analog Devices Blackfin DSP approach we found out the hard way that memory bandwidth was a severely limiting factor. The DSP had a small amount of on-chip memory that ran at the CORE clock frequency. The SYSTEM clock was a fraction of that, say 1/5th or 1/6th. Accessing external SDRAM took something like 6 SCLOCKS, which translates to 36 CORE clocks. That's a *long* time in DSP space. The SDRAM controller never did burst accesses, as I recall.

What that means is you can't effectively do anything unless you use the on-chip fast memory, which operates at the CORE clock frequency (600 MHz to 750 MHz, for example). But that was a limited resource. And there was no way to improve the SDRAM controller; that was part of the chip. So the resolution *couldn't* improve, we didn't have memory for it, and no amount of optimization would work.

With an FPGA, on the other hand, one could have modified the external memory controller to burst out a whole 64-byte chunk of memory, or burst one into a BRAM. Then operate in the BRAM, then write it back out. Doing burst accesses would really speed things up, and the memory I/O could go on in parallel with other processing. In short, there would be almost unlimited ability to optimize, as needed.

With the fixed-CPU DSP approach, we found out the limits the hard way. Then we had to reduce our expectations. In the end it was OK, but it might have been a disaster.

-Dave
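A rough way to see why bursting matters: the Python sketch below compares moving one 64-byte block with single accesses (6 SCLKs each, i.e. the 36 core clocks per access quoted above at a 6:1 clock ratio) against one burst. The 16-bit bus width and the "latency once, then one word per SCLK" burst model are hypothetical assumptions, not figures from the post, and real SDRAM also adds row-activate and precharge overhead that this ignores.

```python
# Rough cost of moving one 64-byte block over the SDRAM interface.
# From the post: core clock = 6 x system clock, ~6 SCLKs per non-burst access.
# Hypothetical assumptions (not from the post): a 16-bit SDRAM data bus, and a
# burst that pays the access latency once, then delivers one word per SCLK.

CORE_PER_SCLK = 6   # core clocks per system clock (the 1/6 case in the post)
ACCESS_SCLK = 6     # system clocks per single, non-burst access
BUS_BYTES = 2       # assumed 16-bit data bus
BLOCK_BYTES = 64    # block size mentioned in the post


def single_access_sclk(nbytes: int) -> int:
    """Every word pays the full access latency."""
    return (nbytes // BUS_BYTES) * ACCESS_SCLK


def burst_sclk(nbytes: int) -> int:
    """Latency is paid once; each following word takes one SCLK."""
    return ACCESS_SCLK + (nbytes // BUS_BYTES - 1)


if __name__ == "__main__":
    single = single_access_sclk(BLOCK_BYTES)
    burst = burst_sclk(BLOCK_BYTES)
    print(f"single accesses: {single} SCLK = {single * CORE_PER_SCLK} core clocks")
    print(f"one burst:       {burst} SCLK = {burst * CORE_PER_SCLK} core clocks")
    print(f"speed-up:        {single / burst:.1f}x")
```

Under these assumptions the block costs 192 SCLKs one word at a time versus 37 SCLKs as a burst, about 5.2x faster. With two BRAM buffers, the next burst can fill one while the processing logic works on the other, which is one way to get memory I/O running in parallel with processing.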
On 2006-09-05, Austin Lesea <austin@xilinx.com> wrote:
> Tejo,
>
> http://direct.xilinx.com/bvdocs/userguides/ug073.pdf
The TOC of that document refers to instantiation templates on pp. 58/60, while the PDF seems to jump directly from page 56 to page 63. Trouble with my PDF reader??

--
Ben Jackson AD7GD
<ben@ben.com>
http://www.ben.com/
Dave,

Funny you should mention this.  We came across the same issue, but we found
out the limitations of the memory interface before we decided to spin
boards.  We ultimately went with an FPGA.  You can't beat an FPGA for parallel
processing.  In our particular application an FPGA running at 67 MHz
outperformed the Blackfin running at 500 MHz, all because of the FPGA's
inherent power of parallel processing.
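The 67 MHz versus 500 MHz comparison is really a statement about work per clock. The Python sketch below estimates how many multiply-accumulates the FPGA must complete in parallel each cycle to match the DSP's peak rate. It assumes (not stated in the post) that the DSP sustains at most 2 MACs per cycle and that the FPGA design completes one MAC per DSP slice per clock; the function name is illustrative.

```python
import math

# Peak-throughput comparison for the clock rates in the post above.
# Assumptions (not from the post): the DSP sustains at most 2 MACs per cycle
# (dual MAC units), and the FPGA design completes one MAC per DSP slice per clock.

DSP_CLOCK_HZ = 500e6
DSP_MACS_PER_CYCLE = 2
FPGA_CLOCK_HZ = 67e6


def fpga_macs_needed(dsp_hz: float, dsp_macs_per_cycle: int, fpga_hz: float) -> int:
    """Parallel MAC units the FPGA needs to match the DSP's peak MAC rate."""
    return math.ceil(dsp_hz * dsp_macs_per_cycle / fpga_hz)


if __name__ == "__main__":
    print(f"clock ratio: {DSP_CLOCK_HZ / FPGA_CLOCK_HZ:.2f}")
    print(f"DSP peak: {DSP_CLOCK_HZ * DSP_MACS_PER_CYCLE / 1e6:.0f} MMAC/s")
    print(f"parallel MACs needed: "
          f"{fpga_macs_needed(DSP_CLOCK_HZ, DSP_MACS_PER_CYCLE, FPGA_CLOCK_HZ)}")
```

Under these assumptions about 15 MACs per clock are enough to match the DSP's peak, and Virtex-4 devices provide dozens to hundreds of DSP48 slices depending on the part. These are peak figures only; as Dave's post describes, memory bandwidth can keep a DSP well below its peak, so the practical gap can be larger.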

Take care,
Rob
A PC on a fast network is far from an embedded solution, and many
applications require an embedded solution.