Reply by David Ashley September 7, 2006
Rob wrote:
> Funny you should mention this. We came across the same issue, but we found
> out the limitations of the memory interface before we decided to spin
> boards. We ultimately went with an FPGA. You can't beat an FPGA for parallel
> processing. In our particular application an FPGA running at 67MHz
> outperformed the Blackfin running at 500MHz, all because of the FPGA's inherent
> power of parallel processing.
No argument here. In this particular project we outsourced the hardware design, but did all the software in house. We had limited time to review the hardware designer's choice of chips -- he did some digging and we were content to trust his instincts.

Getting to the point where we would have studied the datasheet in terms of memory bandwidth...there's just no way we would have invested the time in that then. There would have been an element of faith that AD had designed their memory controller efficiently.

Realistically their SDRAM controller seems to be just an afterthought. I can't really see how it would be *that* hard to add in bursting...and it would completely solve the bottlenecks...

Note: AD = Analog Devices, and this discussion concerns their Blackfin line of DSP products. Not really FPGAs. :)

-Dave

-- 
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
Reply by Rob September 6, 2006
A PC on a fast network is far from an embedded solution.  And many 
applications require an embedded solution.

"Nico Coesel" <nico@puntnl.niks> wrote in message 
news:44fdbcad.1055864262@news.kpnplanet.nl...
> Austin Lesea <austin@xilinx.com> wrote:
>
>>Tejo,
>>
>>http://direct.xilinx.com/bvdocs/userguides/ug073.pdf
>>
>>Yes, the 18X18 multiplier/accumulator is a hardened block, so that
>>performing this function results in from 8 to 20 times less power than
>>performing this function would if it was done in the logic of the FPGA
>>(luts, dff, interconnect, etc.)
>>
>>The above guide details use of the V4 for "extreme" DSP uses.
>>
>>FPGAs are useful for tasks that DSP processors are too slow for,
>>otherwise, DSP processors are generally far easier and better suited for
>>DSP. For example, a video conference processor, where multiple streams
>>must be encoded, decoded, combined, along with all audio processing is
>>one such task where a FPGA would excel for both cost, power, and
>>performance.
>
> I have doubts about cost and performance. Developing such a device would
> probably take so much time that an ASIC is just as cost effective and
> uses even less power. An older PC is already capable of doing these
> functions with a development time that can be expressed in days, not
> years.
>
> Even when dealing with loads of videostreams at high resolutions it is
> more cost effective, reliable and flexible to use PCs on a fast
> network to do the processing rather than building a custom FPGA/ASIC
> solution.
>
> --
> Reply to nico@nctdevpuntnl (punt=.)
> Bedrijven en winkels vindt U op www.adresboekje.nl
Reply by Rob September 6, 2006
Dave,

Funny you should mention this.  We came across the same issue, but we found 
out the limitations of the memory interface before we decided to spin 
boards.  We ultimately went with an FPGA.  You can't beat an FPGA for parallel 
processing.  In our particular application an FPGA running at 67MHz 
outperformed the Blackfin running at 500MHz, all because of the FPGA's inherent 
power of parallel processing.
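
A rough way to see how a 67MHz part can beat a 500MHz one: the Python sketch below computes how many operations per clock the slower device needs to break even. The function name and the `fast_ops_per_cycle` parameter are made up for illustration, and the assumption of one useful operation per DSP clock is mine, not a measured figure from this application.

```python
import math


def break_even_parallelism(fast_hz: float, slow_hz: float,
                           fast_ops_per_cycle: float = 1.0) -> float:
    """Operations per clock the slower device needs to match the faster one.

    fast_ops_per_cycle is a hypothetical knob: how many useful operations
    the faster device completes per clock (1.0 is an assumption).
    """
    return fast_hz * fast_ops_per_cycle / slow_hz


# Clock rates quoted in the post: Blackfin at 500 MHz, FPGA at 67 MHz.
ratio = break_even_parallelism(500e6, 67e6)
parallel_units_needed = math.ceil(ratio)

print(f"break-even: {ratio:.2f} ops/clock, i.e. {parallel_units_needed} parallel units")
```

Anything beyond that number of operations completed per FPGA clock is net gain, which is where wide datapaths and many multipliers working side by side come in.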

Take care,
Rob


"David Ashley" <dash@nowhere.net.dont.email.me> wrote in message 
news:44fded00$1_3@x-privat.org...
> David Ashley wrote:
>> Now if we'd opted for an FPGA, if it was big enough we'd have had
>> lots of options to improve performance. We could have licensed
>> some existing IP, modified it to suit. It would have been a bigger
>> unknown since none of us had direct FPGA experience, but we did
>> have low level programming experience. Going an FPGA route might
>> have been a better investment in the long run...
>
> One more thing occurred to me. With the Analog Devices Blackfin DSP
> approach we found out the hard way memory bandwidth was a severely
> limiting factor. The DSP had some small amount of on chip memory
> that ran at the CORE clock frequency. The SYSTEM clock was a fraction
> of that, say 1/5th or 1/6th. Accessing external SDRAM took something
> like 6 SCLOCKS, which translates to 36 CORE clocks. That's a *long*
> time in DSP space. The SDRAM controller never did burst accesses, as I
> recall.
>
> What that means is you can't effectively do anything unless you use
> the on chip fast memory, which operates at the CORE clock frequency
> (600 MHz to 750 MHz for example). But that was a limited resource.
> And there was no way to improve the SDRAM controller, that was part
> of the chip. So the resolution *couldn't* improve, we didn't have memory
> for it, and no amount of optimization would work.
>
> With an FPGA, on the other hand, one could have modified the external
> memory controller to burst out a whole 64 byte chunk of memory, or
> burst one into a BRAM. Then operate in the BRAM, then write it back
> out. Doing burst accesses would really speed things up, and the memory
> IO could go on in parallel with other processing. In short there would be
> almost unlimited ability to optimize, as needed.
>
> With the fixed-cpu DSP approach, we found out the limits the hard way.
> Then we had to reduce our expectations. In the end it was OK, but it
> might have been a disaster.
>
> -Dave
>
> --
> David Ashley http://www.xdr.com/dash
> Embedded linux, device drivers, system architecture
Reply by Ben Jackson September 6, 2006
On 2006-09-05, Austin Lesea <austin@xilinx.com> wrote:
> Tejo,
>
> http://direct.xilinx.com/bvdocs/userguides/ug073.pdf
The TOC of that document refers to instantiation templates on p58/60 while the PDF seems to go from 56..63 directly. Trouble with my pdf reader??

-- 
Ben Jackson AD7GD
<ben@ben.com>
http://www.ben.com/
Reply by David Ashley September 5, 2006
David Ashley wrote:
> Now if we'd opted for an FPGA, if it was big enough we'd have had
> lots of options to improve performance. We could have licensed
> some existing IP, modified it to suit. It would have been a bigger
> unknown since none of us had direct FPGA experience, but we did
> have low level programming experience. Going an FPGA route might
> have been a better investment in the long run...
One more thing occurred to me. With the Analog Devices Blackfin DSP approach we found out the hard way memory bandwidth was a severely limiting factor. The DSP had some small amount of on chip memory that ran at the CORE clock frequency. The SYSTEM clock was a fraction of that, say 1/5th or 1/6th. Accessing external SDRAM took something like 6 SCLOCKS, which translates to 36 CORE clocks. That's a *long* time in DSP space. The SDRAM controller never did burst accesses, as I recall.

What that means is you can't effectively do anything unless you use the on chip fast memory, which operates at the CORE clock frequency (600 MHz to 750 MHz for example). But that was a limited resource. And there was no way to improve the SDRAM controller, that was part of the chip. So the resolution *couldn't* improve, we didn't have memory for it, and no amount of optimization would work.

With an FPGA, on the other hand, one could have modified the external memory controller to burst out a whole 64 byte chunk of memory, or burst one into a BRAM. Then operate in the BRAM, then write it back out. Doing burst accesses would really speed things up, and the memory IO could go on in parallel with other processing. In short there would be almost unlimited ability to optimize, as needed.

With the fixed-cpu DSP approach, we found out the limits the hard way. Then we had to reduce our expectations. In the end it was OK, but it might have been a disaster.

-Dave

-- 
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
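
The clock arithmetic above can be sketched in a few lines of Python. This is only an illustration built on the recollected figures in the post (a 6:1 core-to-system clock ratio and roughly 6 system clocks per access), not datasheet values. The burst timing of one word per system clock after the first access, and the 16-word block size, are hypothetical.

```python
def core_clocks_for_block(n_words: int, sclk_first: int, sclk_next: int,
                          core_to_sclk_ratio: int) -> int:
    """Core clocks needed to move n_words from external SDRAM.

    The first word costs sclk_first system clocks and every following
    word costs sclk_next system clocks.
    """
    if n_words <= 0:
        return 0
    sclks = sclk_first + (n_words - 1) * sclk_next
    return sclks * core_to_sclk_ratio


RATIO = 6         # core clock / system clock ("say 1/5th or 1/6th")
SCLK_ACCESS = 6   # system clocks per access ("something like 6 SCLOCKS")
WORDS = 16        # hypothetical: a 64-byte chunk moved as 32-bit words

# One access with no bursting: 6 SCLK * 6 = 36 core clocks.
single = core_clocks_for_block(1, SCLK_ACCESS, SCLK_ACCESS, RATIO)

# Sixteen independent accesses, each paying the full latency.
no_burst = core_clocks_for_block(WORDS, SCLK_ACCESS, SCLK_ACCESS, RATIO)

# Hypothetical burst: full latency once, then one word per system clock.
burst = core_clocks_for_block(WORDS, SCLK_ACCESS, 1, RATIO)

print(single, no_burst, burst)
```

Under these assumptions the 16-word block costs 576 core clocks without bursting against 126 with it, which is the kind of gap a burst-capable controller would close.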
Reply by David Ashley September 5, 2006
Nico Coesel wrote:
> Even when dealing with loads of videostreams at high resolutions it is
> more cost effective, reliable and flexible to use PCs on a fast
> network to do the processing rather than building a custom FPGA/ASIC
> solution.
>
Probably there are other constraints. Do you need 1,000 of these? Is it a mass market product? Does it need to fit in a rack? In a unit the size of a pack of cigarettes?

Getting something working in a lab in just any old manner, perhaps for getting funding, is one thing. Getting to a fieldable solution...

There is no way one can make a blanket statement about the best solution without knowing all the requirements.

In my case I worked on a project that had to encode live NTSC video to MPEG-2 I-frames with a minimum of latency. The solution ended up being DSP based: Blackfin DSPs, 4 video/audio inputs on a PCI card. It was a big project. In the end we were limited by the performance of the DSPs. The frame size ended up being 352x240 at 60 Hz. There was just no way we could ever do 720x240 -- had to downsample in X.

Now if we'd opted for an FPGA, if it was big enough we'd have had lots of options to improve performance. We could have licensed some existing IP, modified it to suit. It would have been a bigger unknown since none of us had direct FPGA experience, but we did have low level programming experience. Going an FPGA route might have been a better investment in the long run...

Use the right technology/solution for the task at hand. No one size fits all.

-Dave

-- 
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
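
For a sense of scale, the two frame sizes mentioned above can be compared by raw pixel rate. This is a minimal Python sketch that assumes "60 Hz" means 60 pictures of the stated size per second per input. It ignores chroma sampling and encoder overhead, and the function name is invented for illustration.

```python
def pixel_rate(width: int, height: int, rate_hz: int) -> int:
    """Luma pixels per second for one video input (chroma ignored)."""
    return width * height * rate_hz


achieved = pixel_rate(352, 240, 60)   # the size the DSPs could sustain
wanted = pixel_rate(720, 240, 60)     # the size that was out of reach

print(f"achieved: {achieved} pixels/s")
print(f"wanted:   {wanted} pixels/s")
print(f"ratio:    {wanted / achieved:.2f}x")
```

So full horizontal resolution would have needed roughly twice the throughput per input, before counting the 4 inputs on the card.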
Reply by Austin Lesea September 5, 2006
Nico,

We (Peter and I) have decided to avoid any marketing discussions.

On technical subjects, at least I can (usually) post something useful.

You decide when, where, and how to use Xilinx FPGAs.

I am happy if you use them at all, for whatever you find profitable.

The website speaks for itself:  there are lots of customer testimonials
for those who wish to read them.  You decide.

Perhaps others can post why they use our FPGAs for extreme DSP?

Austin



Nico Coesel wrote:
> Austin Lesea <austin@xilinx.com> wrote:
>
>> Tejo,
>>
>> http://direct.xilinx.com/bvdocs/userguides/ug073.pdf
>>
>> Yes, the 18X18 multiplier/accumulator is a hardened block, so that
>> performing this function results in from 8 to 20 times less power than
>> performing this function would if it was done in the logic of the FPGA
>> (luts, dff, interconnect, etc.)
>>
>> The above guide details use of the V4 for "extreme" DSP uses.
>>
>> FPGAs are useful for tasks that DSP processors are too slow for,
>> otherwise, DSP processors are generally far easier and better suited for
>> DSP. For example, a video conference processor, where multiple streams
>> must be encoded, decoded, combined, along with all audio processing is
>> one such task where a FPGA would excel for both cost, power, and
>> performance.
>
> I have doubts about cost and performance. Developing such a device would
> probably take so much time that an ASIC is just as cost effective and
> uses even less power. An older PC is already capable of doing these
> functions with a development time that can be expressed in days, not
> years.
>
> Even when dealing with loads of videostreams at high resolutions it is
> more cost effective, reliable and flexible to use PCs on a fast
> network to do the processing rather than building a custom FPGA/ASIC
> solution.
>
Reply by Nico Coesel September 5, 2006
Austin Lesea <austin@xilinx.com> wrote:

>Tejo,
>
>http://direct.xilinx.com/bvdocs/userguides/ug073.pdf
>
>Yes, the 18X18 multiplier/accumulator is a hardened block, so that
>performing this function results in from 8 to 20 times less power than
>performing this function would if it was done in the logic of the FPGA
>(luts, dff, interconnect, etc.)
>
>The above guide details use of the V4 for "extreme" DSP uses.
>
>FPGAs are useful for tasks that DSP processors are too slow for,
>otherwise, DSP processors are generally far easier and better suited for
>DSP. For example, a video conference processor, where multiple streams
>must be encoded, decoded, combined, along with all audio processing is
>one such task where a FPGA would excel for both cost, power, and
>performance.
I have doubts about cost and performance. Developing such a device would probably take so much time that an ASIC is just as cost effective and uses even less power. An older PC is already capable of doing these functions with a development time that can be expressed in days, not years.

Even when dealing with loads of videostreams at high resolutions it is more cost effective, reliable and flexible to use PCs on a fast network to do the processing rather than building a custom FPGA/ASIC solution.

-- 
Reply to nico@nctdevpuntnl (punt=.)
Bedrijven en winkels vindt U op www.adresboekje.nl
Reply by Austin Lesea September 5, 2006
Tejo,

http://direct.xilinx.com/bvdocs/userguides/ug073.pdf

Yes, the 18X18 multiplier/accumulator is a hardened block, so that
performing this function results in from 8 to 20 times less power than
performing this function would if it was done in the logic of the FPGA
(luts, dff, interconnect, etc.)

The above guide details use of the V4 for "extreme" DSP uses.

FPGAs are useful for tasks that DSP processors are too slow for,
otherwise, DSP processors are generally far easier and better suited for
DSP.  For example, a video conference processor, where multiple streams
must be encoded, decoded, combined, along with all audio processing is
one such task where a FPGA would excel for both cost, power, and
performance.

http://www.demosondemand.com/clients/xilinx/001/page/index_dsp_review.asp

Austin

sutejok wrote:
> from the Xilinx Virtex4 spec:
>
> · XtremeDSP™ Slice
> - 18x18, two's complement, signed Multiplier
> - Optional pipeline stages
> - Built-In Accumulator (48-bits) & Adder/Subtracter
>
>
> i'm not too familiar with dsp on fpga - what does it mean when it says
> 18x18 multiplier? is it a hardware multiplier? is there anywhere i can
> get informations on and how to use them?
> something specific to virtex4 would be nice
>
> thx
> tejo
>
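
As a rough picture of what "18x18, two's complement, signed multiplier" with a 48-bit accumulator means arithmetically, here is a behavioural Python sketch. It is not the Xilinx primitive or its port list: the function names are invented, and the optional pipeline stages and the adder/subtracter modes are not modelled.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    mask = (1 << bits) - 1
    value &= mask
    if value & (1 << (bits - 1)):   # sign bit set: negative number
        return value - (1 << bits)
    return value


def mac(acc: int, a: int, b: int) -> int:
    """One multiply-accumulate step: acc + a*b, wrapped to 48 bits.

    a and b are treated as 18-bit signed operands, so the product
    always fits in 36 bits; the running sum wraps at 48 bits.
    """
    product = to_signed(a, 18) * to_signed(b, 18)
    return to_signed(acc + product, 48)


# Dot product of two short vectors, one MAC per element.
xs = [3, -2, 100]
ys = [-2, -2, 7]
acc = 0
for x, y in zip(xs, ys):
    acc = mac(acc, x, y)

print(acc)
```

The 12 bits of headroom between the 36-bit product and the 48-bit accumulator are what allow long sums, such as FIR filter taps, to be accumulated without overflow.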
Reply by Peter Alfke September 5, 2006
Click on
http://www.xilinx.com/bvdocs/userguides/ug073.pdf
for an extensive User Guide.
Peter Alfke, Xilinx
================================
sutejok wrote:
> from the Xilinx Virtex4 spec:
>
> · XtremeDSP™ Slice
> - 18x18, two's complement, signed Multiplier
> - Optional pipeline stages
> - Built-In Accumulator (48-bits) & Adder/Subtracter
>
>
> i'm not too familiar with dsp on fpga - what does it mean when it says
> 18x18 multiplier? is it a hardware multiplier? is there anywhere i can
> get informations on and how to use them?
> something specific to virtex4 would be nice
>
> thx
> tejo