FPGARelated.com
Forums

Xcell Article on 1.2Gsamples/sec FFT

Started by Andrew FPGA October 10, 2007
Hi all,
Just read an interesting article in Xilinx's Xcell publication. Lots of
technical detail, and no "marketing" to speak of.
http://www.xilinx.com/publications/magazines/dsp_03/xc_pdf/p42-44-3dsp-andraka.pdf

After reading this I had a couple of burning questions that I'm wondering
if anyone, or Ray himself, can shed some light on:
1) 1.2 Gsamples/s seems like a pretty high input data rate - no doubt
there are a few applications around that need it. But what about the
1.2 Gsamples/s data output rate? What systems can take the FFT
outputs at this rate, and do something sensible with the data?
Although the FFT engine has done a bunch of processing, it hasn't
really reduced the amount of data in any way. I mean, you can't just
hook up 1.2 GS/s to a PC-based platform. Even 10 Gigabit Ethernet
cannot transport this amount of data, let alone the CPU do much
processing with it.

2) I didn't understand the comparison between the 66 GFLOPS FPGA FFT
core and the 48 GFLOPS Cell processor implementation. Was the Cell
processor implementation processing samples at 1.2 GS/s? Was it also
a 32- to 2048-point transform?

Cheers
Andrew
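Andrew's objection about shipping the output to a PC can be checked with quick arithmetic. This is a back-of-envelope sketch only; the sample width is an assumption (16-bit I and Q), as the article does not state it:

```python
# Rough bandwidth check for streaming the FFT output off-board.
# Assumption: complex samples with 16-bit real and imaginary parts.
sample_rate = 1.2e9              # samples per second
bits_per_sample = 2 * 16         # I + Q, 16 bits each (assumed width)

rate_gbps = sample_rate * bits_per_sample / 1e9
print(rate_gbps)                 # 38.4 Gb/s, well beyond a 10 GbE link
```

Even before any protocol overhead, the raw stream is nearly four times what 10 Gigabit Ethernet can carry, which supports the point that such a core is consumed by downstream hardware rather than a PC.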

Andrew FPGA wrote:
> After reading this I had a couple of burning questions I'm wondering
> if anyone, or Ray himself, can shed some light on
> 1) 1.2 Gsamples/s seems like a pretty high input data rate - no doubt
> there are a few applications around that need it. But what about the
> 1.2Gsamples/sec data output rate? What systems can take the FFT
> outputs at this rate, and do something sensible with the data?
Actually, I would say there are far more fixed-point FFT cores in use than
this floating-point one, because the fixed-point cores can achieve even
faster throughput. If you look at the Andraka Consulting web page you will
see explanations of where such cores are used. In general an FFT core is
not a stand-alone block; it is usually used in connection with other
functionality. So the core is embedded in an application, and from the
outside you don't see that data rate anymore.
> Although the FFT engine has done a bunch of processing, it hasn't
> really reduced the amount of data in any way? I mean you can't go
> hookup 1.2Gsps to a pc based platform. Even 10 gigabit ethernet cannot
> transport this amount of data, let alone the cpu do much processing
> with it.
Well, in many cases the FFT will actually increase the amount of data. If
you come from a real-world application, you usually have real input data
and can set the imaginary part to 0, but the output of the FFT is complex.
Also, if you want to use that core in connection with a PC, you probably
will not hook it up over an Ethernet connection, but use it with an FPGA
on a PCI or PCI-E plug-in card.

Cheers,
Guenter
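Guenter's point that an FFT of real data grows the data volume is easy to demonstrate: a real block of N samples produces N complex bins, i.e. twice as many scalar values, although conjugate symmetry means only N/2 + 1 bins are unique. A minimal NumPy illustration:

```python
import numpy as np

x = np.random.randn(1024)   # real input: 1024 scalars
X = np.fft.fft(x)           # complex output: 1024 bins = 2048 scalars

# Conjugate symmetry of a real-input FFT means only N/2 + 1 bins
# carry unique information; rfft returns just those.
Xr = np.fft.rfft(x)

print(X.size, Xr.size)      # 1024 513
```

So unless the design exploits the symmetry (as `rfft` does), the raw output stream really is twice the size of the input stream.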
Andrew FPGA wrote:
> 1) 1.2 Gsamples/s seems like a pretty high input data rate - no doubt
> there are a few applications around that need it. But what about the
> 1.2Gsamples/sec data output rate? What systems can take the FFT
> outputs at this rate, and do something sensible with the data?
> [...]
> 2)I didn't understand the comparison between the 66 Gflop fpga FFT
> core and the 48 GFLOP Cell processor implementation. Was the cell
> processor implementation processing samples at 1.2Gsps? was it also at
> from 32 to 2048 point transform?
That particular application was for image processing; the FFT was used in
two passes to perform a 2D FFT of various sizes. Fast FFTs are also
commonly used in communications, digital radio and SIGINT applications,
all of which need to run the FFT on incoming data streams sampled at high
rates. The 1.2 GS/s is the upper bound for this architecture in this
device. The application in question needed a sustained 1.0 GS/s to keep
up with the frame data. The FFT is surrounded by other hardware, not
connected (at least on the data path) to a computer.

The Cell processor was not working at 1.2 GS/s; in fact it would not be
able to achieve that data rate. The comparison was to show that the FPGA
design could substantially out-perform the Cell processor. The Cell
application was actually a large FFT, 512K points as I recall. The large
FFT is essentially the same process as a 2D FFT, except that there is a
phase rotation between passes for the large FFT that is not there for the
2D FFT. While the comparison is not exactly 1:1, it is similar enough to
draw a valid conclusion. I have used the same floating-point core to
perform large FFTs instead of 2D.
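Ray's remark that a large FFT is a 2D FFT plus a phase rotation between the passes is the classic four-step decomposition. The NumPy sketch below is a software model of that idea only, not the FPGA implementation; the 64x64 factorization is an arbitrary choice for illustration:

```python
import numpy as np

def four_step_fft(x, n1, n2):
    """Length n1*n2 FFT via two passes of short FFTs with a
    twiddle (phase-rotation) stage in between -- the same data
    flow as a 2D FFT plus the extra rotation."""
    a = x.reshape(n1, n2)                 # a[i, j] = x[i*n2 + j]
    b = np.fft.fft(a, axis=0)             # pass 1: FFT down the columns
    k1 = np.arange(n1).reshape(-1, 1)
    j = np.arange(n2).reshape(1, -1)
    b = b * np.exp(-2j * np.pi * k1 * j / (n1 * n2))  # phase rotation
    c = np.fft.fft(b, axis=1)             # pass 2: FFT along the rows
    return c.ravel(order='F')             # reorder bins to natural order

x = np.random.randn(4096) + 1j * np.random.randn(4096)
print(np.allclose(four_step_fft(x, 64, 64), np.fft.fft(x)))  # True
```

Dropping the `np.exp(...)` twiddle stage turns the same data flow into a plain 64x64 2D FFT, which is why the two workloads exercise essentially the same hardware.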
Guenter Dannoritzer wrote:


> Actually, I would say there are far more fixed-point FFT cores used than
> this floating-point one, because the fixed-point cores can achieve even
> faster throughput.
In this case, the fixed-point version has about the same speed as the
floating-point one, but with considerably less latency. A single instance
of the core runs at up to 400 MHz in a -10 V4SX55 for both the
floating-point and fixed-point versions. That speed is limited by the
maximum clock of the DSP48 and BRAM elements. The fixed-point core is
smaller, which means more instances can fit into a device for higher
overall throughput.