FPGARelated.com
Forums

Xcell Article on 1.2Gsamples/sec FFT

Started by Andrew FPGA October 10, 2007
Hi all,
Just read an interesting article in Xilinx's Xcell publication. Lots of
technical detail, and no "marketing" to speak of.
http://www.xilinx.com/publications/magazines/dsp_03/xc_pdf/p42-44-3dsp-andraka.pdf

After reading this I had a couple of burning questions that I'm wondering
if anyone, or Ray himself, can shed some light on:
1) 1.2 Gsamples/s seems like a pretty high input data rate - no doubt
there are a few applications around that need it. But what about the
1.2 Gsamples/s data output rate? What systems can take the FFT
outputs at this rate, and do something sensible with the data?
Although the FFT engine has done a bunch of processing, it hasn't
really reduced the amount of data in any way. I mean, you can't just
hook up 1.2 GS/s to a PC-based platform. Even 10 Gigabit Ethernet
cannot transport this amount of data, let alone the CPU do much
processing with it.

2) I didn't understand the comparison between the 66 GFLOPS FPGA FFT
core and the 48 GFLOPS Cell processor implementation. Was the Cell
processor implementation processing samples at 1.2 GS/s? Was it also
a 32- to 2048-point transform?

Cheers
Andrew
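Andrew's objection about shipping the output to a PC can be checked with quick arithmetic. This is a back-of-envelope sketch only; the sample width is an assumption (16-bit I and Q), as the article does not state it:

```python
# Rough bandwidth check for streaming the FFT output off-board.
# Assumption: complex samples with 16-bit real and imaginary parts.
sample_rate = 1.2e9              # samples per second
bits_per_sample = 2 * 16         # I + Q, 16 bits each (assumed width)

rate_gbps = sample_rate * bits_per_sample / 1e9
print(rate_gbps)                 # 38.4 Gb/s, well beyond a 10 GbE link
```

Even before any protocol overhead, the raw stream is nearly four times what 10 Gigabit Ethernet can carry, which supports the point that such a core is consumed by downstream hardware rather than a PC.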

Andrew FPGA wrote:
> After reading this I had a couple of burning questions I'm wondering
> if anyone, or Ray himself, can shed some light on
> 1) 1.2 Gsamples/s seems like a pretty high input data rate - no doubt
> there are a few applications around that need it. But what about the
> 1.2Gsamples/sec data output rate? What systems can take the FFT
> outputs at this rate, and do something sensible with the data?
Actually, I would say there are far more fixed-point FFT cores in use than
this floating-point one, because the fixed-point cores can achieve even
faster throughput. If you look at the Andraka Consulting web page you will
see explanations of where such cores are used. In general an FFT core is
not a stand-alone block; it is usually used in connection with other
functionality. So the core is embedded in an application, and from the
outside you don't see that data rate anymore.
> Although the FFT engine has done a bunch of processing, it hasn't
> really reduced the amount of data in any way? I mean you can't go
> hookup 1.2Gsps to a pc based platform. Even 10 gigabit ethernet cannot
> transport this amount of data, let alone the cpu do much processing
> with it.
Well, in many cases the FFT will actually increase the amount of data. If
you come from a real-world application, you usually have real input data
and can set the imaginary part to 0, but the output of the FFT is complex.
Also, if you want to use that core in connection with a PC, you probably
will not hook it up over an Ethernet connection, but use it with an FPGA
on a PCI or PCI-E plug-in card.

Cheers,
Guenter
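Guenter's point that an FFT of real data grows the data volume is easy to demonstrate: a real block of N samples produces N complex bins, i.e. twice as many scalar values, although conjugate symmetry means only N/2 + 1 bins are unique. A minimal NumPy illustration:

```python
import numpy as np

x = np.random.randn(1024)   # real input: 1024 scalars
X = np.fft.fft(x)           # complex output: 1024 bins = 2048 scalars

# Conjugate symmetry of a real-input FFT means only N/2 + 1 bins
# carry unique information; rfft returns just those.
Xr = np.fft.rfft(x)

print(X.size, Xr.size)      # 1024 513
```

So unless the design exploits the symmetry (as `rfft` does), the raw output stream really is twice the size of the input stream.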
Andrew FPGA wrote:
> 1) 1.2 Gsamples/s seems like a pretty high input data rate - no doubt
> there are a few applications around that need it. But what about the
> 1.2Gsamples/sec data output rate? What systems can take the FFT
> outputs at this rate, and do something sensible with the data?
> [...]
> 2)I didn't understand the comparison between the 66 Gflop fpga FFT
> core and the 48 GFLOP Cell processor implementation. Was the cell
> processor implementation processing samples at 1.2Gsps? was it also at
> from 32 to 2048 point transform?
That particular application was for image processing; the FFT was used in
two passes to perform a 2D FFT of various sizes. Fast FFTs are also
commonly used in communications, digital radio and SIGINT applications,
all of which need to run the FFT on incoming data streams sampled at high
rates. The 1.2 GS/s is the upper bound for this architecture in this
device. The application in question needed a sustained 1.0 GS/s to keep
up with the frame data. The FFT is surrounded by other hardware, not
connected (at least on the data path) to a computer.

The Cell processor was not working at 1.2 GS/s; in fact it would not be
able to achieve that data rate. The comparison was to show that the FPGA
design could substantially out-perform the Cell processor. The Cell
application was actually a large FFT, 512K points as I recall. The large
FFT is essentially the same process as a 2D FFT, except that there is a
phase rotation between passes for the large FFT that is not there for the
2D FFT. While the comparison is not exactly 1:1, it is similar enough to
draw a valid conclusion. I have used the same floating-point core to
perform large FFTs instead of 2D.
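Ray's remark that a large FFT is a 2D FFT plus a phase rotation between the passes is the classic four-step decomposition. The NumPy sketch below is a software model of that idea only, not the FPGA implementation; the 64x64 factorization is an arbitrary choice for illustration:

```python
import numpy as np

def four_step_fft(x, n1, n2):
    """Length n1*n2 FFT via two passes of short FFTs with a
    twiddle (phase-rotation) stage in between -- the same data
    flow as a 2D FFT plus the extra rotation."""
    a = x.reshape(n1, n2)                 # a[i, j] = x[i*n2 + j]
    b = np.fft.fft(a, axis=0)             # pass 1: FFT down the columns
    k1 = np.arange(n1).reshape(-1, 1)
    j = np.arange(n2).reshape(1, -1)
    b = b * np.exp(-2j * np.pi * k1 * j / (n1 * n2))  # phase rotation
    c = np.fft.fft(b, axis=1)             # pass 2: FFT along the rows
    return c.ravel(order='F')             # reorder bins to natural order

x = np.random.randn(4096) + 1j * np.random.randn(4096)
print(np.allclose(four_step_fft(x, 64, 64), np.fft.fft(x)))  # True
```

Dropping the `np.exp(...)` twiddle stage turns the same data flow into a plain 64x64 2D FFT, which is why the two workloads exercise essentially the same hardware.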
Guenter Dannoritzer wrote:


> Actually, I would say there are far more fixed-point FFT cores used than
> this floating-point one, because the fixed-point cores can achieve even
> faster throughput.
In this case, the fixed-point version has about the same speed as the
floating-point one, but with considerably less latency. A single instance
of the core runs at up to 400 MHz in a -10 V4SX55 for both the
floating-point and fixed-point versions. That speed is limited by the
maximum clock of the DSP48 and BRAM elements. The fixed-point core is
smaller, which means more instances can fit into a device for higher
overall throughput.