> Funny you should mention this. We came across the same issue, but we found
> out the limitations of the memory interface before we decided to spin
> boards. We ultimately went with an FPGA. You can't beat an FPGA for parallel
> processing. In our particular application an FPGA running at 67 MHz
> outperformed the Blackfin running at 500 MHz, all because of the FPGA's
> inherent power of parallel processing.
No argument here. In this particular project we outsourced the hardware
design but did all the software in house. We had limited time to review
the hardware designer's choice of chips -- he did some digging and we
were content to trust his instincts. As for studying the datasheet in
terms of memory bandwidth...there's just no way we would have invested
the time in that back then. There was an element of faith that AD had
designed their memory controller efficiently.
Realistically their SDRAM controller seems to be just an afterthought.
I can't really see how it would be *that* hard to add in bursting...and
it would completely solve the bottlenecks...
Note: AD = Analog Devices, and this discussion is about their
Blackfin line of DSP products, not really FPGAs. :)
-Dave
--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
Reply by Rob, September 6, 2006
A PC on a fast network is far from an embedded solution, and many
applications require an embedded solution.
"Nico Coesel" <nico@puntnl.niks> wrote in message
news:44fdbcad.1055864262@news.kpnplanet.nl...
> Even when dealing with loads of video streams at high resolutions, it is
> more cost-effective, reliable and flexible to use PCs on a fast
> network to do the processing than to build a custom FPGA/ASIC
> solution.
Reply by Rob, September 6, 2006
Dave,
Funny you should mention this. We came across the same issue, but we found
out the limitations of the memory interface before we decided to spin
boards. We ultimately went with an FPGA. You can't beat an FPGA for parallel
processing. In our particular application an FPGA running at 67 MHz
outperformed the Blackfin running at 500 MHz, all because of the FPGA's
inherent power of parallel processing.
Take care,
Rob
"David Ashley" <dash@nowhere.net.dont.email.me> wrote in message
news:44fded00$1_3@x-privat.org...
> One more thing occurred to me. With the Analog Devices Blackfin DSP
> approach we found out the hard way that memory bandwidth was a severely
> limiting factor.
Reply by Ben Jackson, September 6, 2006
On 2006-09-05, Austin Lesea <austin@xilinx.com> wrote:
The TOC of that document refers to instantiation templates on pp. 58/60,
while the PDF seems to jump directly from page 56 to page 63. Trouble
with my PDF reader?
--
Ben Jackson AD7GD
<ben@ben.com>
http://www.ben.com/
Reply by David Ashley, September 5, 2006
David Ashley wrote:
> Now if we'd opted for an FPGA, if it was big enough we'd have had
> lots of options to improve performance. We could have licensed
> some existing IP, modified it to suit. It would have been a bigger
> unknown since none of us had direct FPGA experience, but we did
> have low level programming experience. Going an FPGA route might
> have been a better investment in the long run...
One more thing occurred to me. With the Analog Devices Blackfin DSP
approach we found out the hard way that memory bandwidth was a severely
limiting factor. The DSP had a small amount of on-chip memory
that ran at the CORE clock frequency. The SYSTEM clock was a fraction
of that, say 1/5th or 1/6th. Accessing external SDRAM took something
like 6 SCLOCKS, which at a 1:6 ratio translates to 36 CORE clocks.
That's a *long* time in DSP space. The SDRAM controller never did burst
accesses, as I recall.
What that means is you can't effectively do anything unless you use
the on-chip fast memory, which operates at the CORE clock frequency
(600 MHz to 750 MHz, for example). But that was a limited resource.
And there was no way to improve the SDRAM controller, since it was part
of the chip. So the resolution *couldn't* improve: we didn't have the memory
for it, and no amount of optimization would help.
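The arithmetic behind those numbers, as a rough Python sketch. The 6-SCLK access time and the 1/5 and 1/6 clock ratios come from the post; the 600 MHz core clock and the 16-bit (2-byte) external bus used for the bandwidth figure are illustrative assumptions, not datasheet values, and the model ignores refresh, open-page hits and caching.

```python
# Back-of-envelope model of non-burst external SDRAM access on a DSP whose
# system clock (SCLK) runs at a fraction of the core clock (CCLK).
# Assumed, not from a datasheet: 600 MHz core clock, 16-bit external bus.

def core_cycles_per_access(sclk_latency: int, cclk_per_sclk: int) -> int:
    """Core clocks stalled for one external access lasting `sclk_latency` SCLKs."""
    return sclk_latency * cclk_per_sclk


def effective_bandwidth(core_hz: float, cycles_per_access: int, bytes_per_access: int) -> float:
    """Bytes per second if every access is a separate, non-burst transaction."""
    return core_hz / cycles_per_access * bytes_per_access


if __name__ == "__main__":
    for ratio in (5, 6):
        cycles = core_cycles_per_access(6, ratio)
        bw = effective_bandwidth(600e6, cycles, 2)
        print(f"CCLK/SCLK = {ratio}: {cycles} core clocks/access, ~{bw / 1e6:.1f} MB/s")
```

Under those assumptions each non-burst access stalls the core for 30 to 36 cycles, and sustained external bandwidth comes out around 33 to 40 MB/s, which illustrates why the working set had to live in on-chip memory.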
With an FPGA, on the other hand, one could have modified the external
memory controller to burst out a whole 64-byte chunk of memory, or
burst one into a BRAM, operate on it in the BRAM, then write it back
out. Doing burst accesses would really speed things up, and the memory
IO could go on in parallel with other processing. In short, there would be
almost unlimited ability to optimize, as needed.
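A rough cost model of that idea, sketched in Python. It assumes a hypothetical SDRAM that pays the 6-cycle latency only on the first word of a burst and then delivers one 16-bit word per cycle; real SDRAM timing (row activation, CAS latency, refresh) is more involved, and write-back traffic is ignored. The function names are illustrative only.

```python
# Hypothetical cost model: burst a 64-byte chunk into on-chip block RAM,
# process it there, and overlap the next transfer with the current
# computation (ping-pong buffering).
# Assumed: first word costs LATENCY cycles, each further word in a burst
# costs 1 cycle, 16-bit data bus.

LATENCY = 6    # memory-clock cycles for the first word (figure from the post)
BUS_BYTES = 2  # assumed 16-bit data bus


def single_access_cycles(nbytes: int) -> int:
    """No bursting: every word pays the full latency."""
    return (nbytes // BUS_BYTES) * LATENCY


def burst_cycles(nbytes: int) -> int:
    """Bursting: first word pays the latency, the rest stream one per cycle."""
    words = nbytes // BUS_BYTES
    return LATENCY + (words - 1) if words else 0


def sequential_time(n_blocks: int, transfer: int, compute: int) -> int:
    """Load a block, process it, then load the next one."""
    return n_blocks * (transfer + compute)


def overlapped_time(n_blocks: int, transfer: int, compute: int) -> int:
    """Two-stage pipeline: fetch block i+1 while computing on block i."""
    if n_blocks == 0:
        return 0
    return transfer + compute + (n_blocks - 1) * max(transfer, compute)


if __name__ == "__main__":
    print(single_access_cycles(64), burst_cycles(64))  # 192 37
    t = burst_cycles(64)
    print(sequential_time(100, t, 40), overlapped_time(100, t, 40))  # 7700 4037
```

With these assumptions a 64-byte chunk costs 192 memory cycles without bursting versus 37 with it, and overlapping transfers with computation brings 100 blocks of 37 transfer plus 40 compute cycles down from 7700 to 4037 cycles, because the pipeline runs at the pace of its slower stage.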
With the fixed-CPU DSP approach, we found out the limits the hard way.
Then we had to reduce our expectations. In the end it was OK, but it
might have been a disaster.
-Dave
Reply by David Ashley, September 5, 2006
Nico Coesel wrote:
> Even when dealing with loads of video streams at high resolutions, it is
> more cost-effective, reliable and flexible to use PCs on a fast
> network to do the processing than to build a custom FPGA/ASIC
> solution.
>
Probably there are other constraints. Do you need 1,000 of these?
Is it a mass market product? Does it need to fit in a rack? In a
unit the size of a pack of cigarettes?
Getting something working in a lab in just any old manner, perhaps
for getting funding, is one thing. Getting to a fieldable solution...
There is no way one can make a blanket statement about the best
solution without knowing all the requirements.
In my case I worked on a project that had to encode live NTSC video
to MPEG-2 I-frames with minimal latency. The solution ended
up being DSP based: Blackfin DSPs, 4 video/audio inputs on a PCI
card. It was a big project. In the end we were limited by the performance
of the DSPs. The frame size ended up being 352x240 at 60 Hz. There
was just no way we could ever do 720x240 -- we had to downsample in X.
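The post doesn't say how the horizontal downsampling was done. One simple way to get from a 720-pixel line to the 352-pixel SIF width is sketched below in Python, assuming 8 pixels are cropped from each edge (720 to 704) and each adjacent pair is then averaged with rounding. The crop amount and the 2-tap filter are assumptions for illustration; a real encoder front end would normally apply a longer low-pass filter before decimating.

```python
# Hypothetical 2:1 horizontal decimation of one video line, 720 -> 352 pixels.
# Assumed: crop 8 pixels from each edge (720 -> 704), then average each
# adjacent pixel pair with round-half-up.

def downsample_row_x(row: list[int], crop: int = 8) -> list[int]:
    """Crop `crop` pixels from both ends, then halve the width by pair-averaging."""
    active = row[crop:len(row) - crop]
    return [(active[i] + active[i + 1] + 1) // 2 for i in range(0, len(active) - 1, 2)]


if __name__ == "__main__":
    line = list(range(720))          # synthetic ramp standing in for luma samples
    out = downsample_row_x(line)
    print(len(out), out[:3])         # 352 [9, 11, 13]
```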
Now if we'd opted for an FPGA, if it was big enough we'd have had
lots of options to improve performance. We could have licensed
some existing IP and modified it to suit. It would have been a bigger
unknown since none of us had direct FPGA experience, but we did
have low level programming experience. Going an FPGA route might
have been a better investment in the long run...
Use the right technology/solution for the task at hand. No one size
fits all.
-Dave
Reply by Austin Lesea, September 5, 2006
Nico,
We (Peter and I) have decided to avoid any marketing discussions.
On technical subjects, at least I can (usually) post something useful.
You decide when, where, and how to use Xilinx FPGAs.
I am happy if you use them at all, for whatever you find profitable.
The website speaks for itself: there are lots of customer testimonials
for those who wish to read them. You decide.
Perhaps others can post why they use our FPGAs for extreme DSP?
Austin
Nico Coesel wrote:
> I have my doubts about the cost and performance. Developing such a device would
> probably take so much time that an ASIC would be just as cost-effective and
> use even less power. An older PC is already capable of performing these
> functions, with a development time that can be expressed in days, not
> years.
Reply by Nico Coesel, September 5, 2006
Austin Lesea <austin@xilinx.com> wrote:
>Tejo,
>
>http://direct.xilinx.com/bvdocs/userguides/ug073.pdf
>
>Yes, the 18x18 multiplier/accumulator is a hardened block, so performing
>this function in it takes 8 to 20 times less power than implementing
>the same function in the FPGA logic (LUTs, DFFs, interconnect, etc.).
>
>The guide above details how to use the V4 (Virtex-4) for "extreme" DSP applications.
>
>FPGAs are useful for tasks that DSP processors are too slow for;
>otherwise, DSP processors are generally far easier to use and better suited
>for DSP. A video conference processor, for example, where multiple streams
>must be encoded, decoded, and combined, along with all the audio processing,
>is one task where an FPGA would excel in cost, power, and
>performance.
I have my doubts about the cost and performance. Developing such a device would
probably take so much time that an ASIC would be just as cost-effective and
use even less power. An older PC is already capable of performing these
functions, with a development time that can be expressed in days, not
years.
Even when dealing with loads of video streams at high resolutions, it is
more cost-effective, reliable and flexible to use PCs on a fast
network to do the processing than to build a custom FPGA/ASIC
solution.
--
Reply to nico@nctdevpuntnl (punt=.)
Companies and shops can be found at www.adresboekje.nl
Reply by Austin Lesea, September 5, 2006
Tejo,
http://direct.xilinx.com/bvdocs/userguides/ug073.pdf
Yes, the 18x18 multiplier/accumulator is a hardened block, so performing
this function in it takes 8 to 20 times less power than implementing
the same function in the FPGA logic (LUTs, DFFs, interconnect, etc.).
The guide above details how to use the V4 (Virtex-4) for "extreme" DSP applications.
FPGAs are useful for tasks that DSP processors are too slow for;
otherwise, DSP processors are generally far easier to use and better suited
for DSP. A video conference processor, for example, where multiple streams
must be encoded, decoded, and combined, along with all the audio processing,
is one task where an FPGA would excel in cost, power, and
performance.
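To make "18x18 multiplier/accumulator" concrete, here is a bit-accurate Python sketch of one multiply-accumulate chain: signed 18-bit inputs, a product of up to 36 bits, and a 48-bit two's-complement accumulator that wraps on overflow in this model. It is only loosely modelled on the XtremeDSP slice described in the guide; pipeline registers, the adder/subtracter modes and cascading between slices are left out, and the function names are hypothetical.

```python
# Bit-accurate sketch of an 18x18 signed multiply feeding a 48-bit
# accumulator. Not Xilinx code: pipeline stages, add/subtract modes and
# slice cascading are not modelled.

IN_BITS = 18
ACC_BITS = 48


def wrap_signed(value: int, bits: int) -> int:
    """Reduce an integer to a `bits`-wide two's-complement value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def check_18bit(x: int) -> int:
    """Reject values that would not fit a signed 18-bit multiplier port."""
    lo, hi = -(1 << (IN_BITS - 1)), (1 << (IN_BITS - 1)) - 1
    if not lo <= x <= hi:
        raise ValueError(f"{x} does not fit in a signed {IN_BITS}-bit port")
    return x


def mac(samples, coeffs) -> int:
    """Dot product (e.g. one FIR filter output) with 48-bit wrap-around accumulation."""
    acc = 0
    for x, h in zip(samples, coeffs):
        product = check_18bit(x) * check_18bit(h)  # always fits in 36 bits
        acc = wrap_signed(acc + product, ACC_BITS)
    return acc


if __name__ == "__main__":
    print(mac([1, 2, 3], [4, 5, 6]))  # 32
```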
http://www.demosondemand.com/clients/xilinx/001/page/index_dsp_review.asp
Austin
sutejok wrote:
> from the Xilinx Virtex4 spec:
>
> · XtremeDSP™ Slice
> - 18x18, two's complement, signed Multiplier
> - Optional pipeline stages
> - Built-In Accumulator (48-bits) & Adder/Subtracter
>
>
> I'm not too familiar with DSP on FPGA - what does it mean when it says
> 18x18 multiplier? Is it a hardware multiplier? Is there anywhere I can
> get information on them and how to use them?
> Something specific to Virtex-4 would be nice.
>
> thx
> tejo
>
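In practical terms, "18x18, two's complement, signed" means each multiplier input is an 18-bit signed integer, so the product needs at most 36 bits, and the 48-bit accumulator leaves headroom to sum many products before it can overflow. The short Python calculation below (an illustration, not Xilinx code) works out those ranges.

```python
# Range arithmetic for an 18x18 signed multiplier feeding a 48-bit accumulator.

IN_BITS, PRODUCT_BITS, ACC_BITS = 18, 36, 48

in_min = -(1 << (IN_BITS - 1))      # most negative 18-bit input
in_max = (1 << (IN_BITS - 1)) - 1   # most positive 18-bit input
largest_product = in_min * in_min   # (-2**17)**2 = 2**34, the largest possible product
acc_max = (1 << (ACC_BITS - 1)) - 1 # most positive 48-bit signed value

# Every product fits in a 36-bit signed word.
fits_36 = (-(1 << (PRODUCT_BITS - 1)) <= in_min * in_max
           and largest_product < (1 << (PRODUCT_BITS - 1)))

# How many worst-case (largest positive) products can be summed
# before the 48-bit accumulator overflows?
safe_terms = acc_max // largest_product

if __name__ == "__main__":
    print(in_min, in_max)              # -131072 131071
    print(largest_product, safe_terms) # 17179869184 8191
```

So the inputs range from -131072 to 131071, and even the largest possible product (2**34) can be accumulated 8191 times before a 48-bit signed accumulator overflows.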
Reply by Peter Alfke, September 5, 2006
Click on
http://www.xilinx.com/bvdocs/userguides/ug073.pdf
for an extensive User Guide.
Peter Alfke, Xilinx